1. Building Machine Learning-based Threat Hunting System from Scratch
- Author
- Yung-Tien Chu, Chun-Ying Huang, Szu-Chun Huang, Chung-Kuan Chen, Chin-Laung Lei, and Si-Chen Lin
- Subjects
Computer Networks and Communications, business.industry, Computer science, Weak signal, Machine learning, computer.software_genre, Imbalanced data, Computer Science Applications, True negative, Hardware and Architecture, Scratch, Anomaly detection, Artificial intelligence, Graph algorithms, business, Safety Research, computer, True positive rate, Software, Information Systems, computer.programming_language
- Abstract
Machine learning has been widely used to solve challenging problems in diverse areas. However, to the best of our knowledge, little of the literature has discussed in depth how machine learning approaches can be used effectively to “hunt” (identify) threats, especially advanced persistent threats (APTs), in a monitored environment. In this study, we share our experiences in building machine learning-based threat-hunting models. A security team must consider several challenges when it attempts to build such models: (1) weak signals, (2) imbalanced data sets, (3) a lack of high-quality labels, and (4) no storyline. We propose Fuchikoma and APTEmu to demonstrate how we tackle these challenges. The former is a proof-of-concept system that demonstrates the ideas behind autonomous threat hunting. It is a machine learning-based anomaly detection and threat-hunting system that leverages natural language processing (NLP) and graph algorithms. The latter is an APT emulator that emulates the behavior of a well-known APT called APT3, the target used in the first round of the MITRE ATT&CK Evaluations. APTEmu generates attacks on Windows machines in a virtualized environment, and the captured system events can then be used to train and enhance Fuchikoma’s capabilities. We illustrate the steps and experiments we used to build the models, discuss each model’s effectiveness and limitations, and propose countermeasures and solutions to improve the models. Our evaluation results show that machine learning algorithms can effectively assist threat-hunting processes and significantly reduce security analysts’ efforts. Fuchikoma correctly identifies malicious commands, achieving a True Positive Rate and a True Negative Rate of over 80% and an F3 score of over 60%. We believe our proposed approaches provide valuable experience in this area and shed light on automated threat-hunting research.
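The abstract reports its results as True Positive Rate, True Negative Rate, and F3. As a minimal sketch of how these metrics relate, assuming that "F3" means the standard F-beta score with beta = 3 (which weights recall more heavily than precision), the Python below computes all three from confusion-matrix counts. The function names and the example counts are hypothetical and chosen only for illustration; they are not results from the study.

```python
def true_positive_rate(tp: int, fn: int) -> float:
    """Fraction of malicious items that were flagged (recall)."""
    total = tp + fn
    return tp / total if total else 0.0


def true_negative_rate(tn: int, fp: int) -> float:
    """Fraction of benign items that were correctly left unflagged."""
    total = tn + fp
    return tn / total if total else 0.0


def f_beta(tp: int, fp: int, fn: int, beta: float = 3.0) -> float:
    """F-beta score; beta > 1 favors recall over precision."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


# Hypothetical, heavily imbalanced counts (illustration only):
# 10 malicious commands among 1000 events.
tp, fn, fp, tn = 8, 2, 40, 950

tpr = true_positive_rate(tp, fn)
tnr = true_negative_rate(tn, fp)
f3 = f_beta(tp, fp, fn, beta=3.0)

print(f"TPR={tpr:.3f} TNR={tnr:.3f} F3={f3:.3f}")
```

With imbalanced data like this, a recall-weighted score such as F3 penalizes missed attacks far more than false alarms, which may be why a threat-hunting evaluation would report it alongside TPR and TNR.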
- Published
- 2022