Linking Exploits from the Dark Web to Known Vulnerabilities for Proactive Cyber Threat Intelligence: An Attention-Based Deep Structured Semantic Model Approach

Sagar Samtani

The Dark Web has emerged as a valuable source for proactively developing cyber threat intelligence (CTI) capabilities. Despite its value, Dark Web data comprises tens of thousands of unstructured, unsanitized text records containing significant non-natural language. This prevents the direct application of standard CTI analytics (e.g., malware analysis, IP reputation services) and text mining methodologies to critical tasks. One such challenge is systematically linking Dark Web exploits to known vulnerabilities present within modern organizations. In this talk, I will present my recent work extending a deep learning technique known as the Deep Structured Semantic Model (DSSM), drawn from the neural information retrieval literature, to incorporate emerging attention mechanisms from the literature on interpretability for deep learning. The resultant Exploit Vulnerability Attention DSSM (EVA-DSSM) automatically links hacker forum exploits to vulnerabilities reported by enterprise vulnerability assessment tools based on their names, outputs interpretable and explainable text features that are critical for creating links, and provides prioritized links for subsequent remediation and mitigation efforts. Rigorous evaluation indicates that EVA-DSSM outperforms baseline methods drawn from distributional semantics, probabilistic matching, and deep learning-based short text matching algorithms in matching relevant vulnerabilities from major vulnerability assessment tools to 0-day, web application, remote, local, and denial of service exploits. The framework’s utility is demonstrated in two contexts: the systems of selected major US hospitals and Supervisory Control and Data Acquisition (SCADA) systems worldwide.
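
To make the name-based matching idea concrete, here is a minimal sketch of the input side of a DSSM-style matcher: each name is broken into letter trigrams (the "word hashing" step of the original DSSM), and candidates are ranked by cosine similarity. This is not EVA-DSSM itself; it omits the learned neural layers and the attention mechanism entirely, and the function names and example strings are hypothetical, chosen only for illustration.

```python
from collections import Counter
from math import sqrt


def letter_trigrams(text):
    """Bag of letter trigrams, with '#' marking word boundaries."""
    counts = Counter()
    for word in text.lower().split():
        padded = f"#{word}#"
        for i in range(len(padded) - 2):
            counts[padded[i:i + 3]] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_vulnerabilities(exploit_name, vulnerability_names):
    """Return (score, name) pairs, most similar vulnerability name first."""
    query = letter_trigrams(exploit_name)
    scored = [(cosine(query, letter_trigrams(v)), v) for v in vulnerability_names]
    return sorted(scored, reverse=True)


# Hypothetical exploit and vulnerability names, not taken from any dataset.
ranked = rank_vulnerabilities(
    "WordPress Plugin SQL Injection",
    ["OpenSSH Denial of Service", "WordPress SQL Injection Vulnerability"],
)
for score, name in ranked:
    print(f"{score:.3f}  {name}")
```

In a full DSSM, the trigram vectors would pass through trained nonlinear layers before the cosine comparison; the raw trigram overlap used here only illustrates why short, noisy names can still be matched without relying on whole-word vocabulary.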