Ryan Kovar

Splunk

Datasets for the Everyman

Security data can be surprisingly hard to come by when you don't have users generating it for you, so we created or found datasets and hosted them for the community. This talk discusses the Splunk Datasets Project and how data scientists, new and experienced alike, can use it to test machine learning hypotheses across a variety of datasets in a curated environment. From the Endgame EMBER malware dataset to Windows Event Logs, the Splunk Datasets Project aims to give researchers and newcomers a place to try new ML techniques with tools such as Splunk's Machine Learning Toolkit (MLTK), which bundles Python libraries including NumPy, SciPy, pandas, scikit-learn, and statsmodels.