Home

Awesome

MalwareTrainingSets

Please check it out: https://marcoramilli.com/2016/12/16/malware-training-sets-a-machine-learning-dataset-for-everyone/

For an updated followUP please check it out: https://marcoramilli.com/2019/05/14/malware-training-sets-followup/

Cite The DataSet
If you find those results useful please cite them :

@misc{ MR,
   author = "Marco Ramilli",
   title = "Malware Training Sets: a machine learning dataset for everyone",
   year = "2016",
   url = "https://marcoramilli.com/2016/12/16/malware-training-sets-a-machine-learning-dataset-for-everyone/",
   note = "[Online; December 2016]"
 }

UPDATE Many people asked me about the scripts I used to generate MIST-Modified JSON. So here there are ! (take a look to scripts section). You might use mist_json.py as a reporting module from CuckooSandbox and the script fromMongoToARFF.py to generate ARFF files suitables for WEKA.

If you are going to create new datasets by running your local CuckooSandbox using mist_json.py module and you wanto to share them, please feel free to make pool requests !

If you want to know more about the working flow, please check this update: https://marcoramilli.com/2019/05/14/malware-training-sets-followup/