Deciphering Malware's use of TLS (without Decryption)

July 06, 2016 · Declared Dead · 🏛 Journal of Computer Virology and Hacking Techniques

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Blake Anderson, Subharthi Paul, David McGrew
arXiv ID: 1607.01639
Category: cs.CR (Cryptography & Security)
Citations: 197
Venue: Journal of Computer Virology and Hacking Techniques
Last Checked: 4 months ago

Abstract

The use of TLS by malware poses new challenges to network threat detection because traditional pattern-matching techniques can no longer be applied to its messages. However, TLS also introduces a complex set of observable data features that allow many inferences to be made about both the client and the server. We show that these features can be used to detect and understand malware communication, while at the same time preserving the privacy of benign uses of encryption. These data features also allow for accurate malware family attribution of network communication, even when restricted to a single, encrypted flow. To demonstrate this, we performed a detailed study of how TLS is used by malware and enterprise applications. We provide a general analysis of millions of TLS encrypted flows, and a targeted study of 18 malware families composed of thousands of unique malware samples and tens of thousands of malicious TLS flows. Importantly, we identify and accommodate the bias introduced by the use of a malware sandbox. The performance of a malware classifier is correlated with a malware family's use of TLS, i.e., malware families that actively evolve their use of cryptography are more difficult to classify. We conclude that malware's usage of TLS is distinct from benign usage in an enterprise setting, and that these differences can be effectively used in rules and machine learning classifiers.
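To make the idea of "observable data features" concrete, here is a minimal Python sketch of how unencrypted TLS handshake metadata might be turned into a fixed-length vector for a rule or classifier. The abstract does not list the features the authors used, so everything here is an assumption for illustration: the choice of offered cipher suites and extensions as features, the two small vocabularies, and the function name `client_hello_features` are all hypothetical. The numeric code points are the standard IANA-registered values.

```python
from typing import List, Sequence

# Hypothetical vocabulary of cipher suite code points (IANA registry values).
# A real system would use a much larger list; this one is illustrative only.
CIPHER_SUITE_VOCAB: List[int] = [
    0x0004,  # TLS_RSA_WITH_RC4_128_MD5
    0x0005,  # TLS_RSA_WITH_RC4_128_SHA
    0x002F,  # TLS_RSA_WITH_AES_128_CBC_SHA
    0xC02B,  # TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
    0xC02F,  # TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
]

# Hypothetical vocabulary of TLS extension types (IANA registry values).
EXTENSION_VOCAB: List[int] = [
    0,   # server_name
    10,  # supported_groups
    11,  # ec_point_formats
    13,  # signature_algorithms
    35,  # session_ticket
]


def client_hello_features(
    offered_suites: Sequence[int],
    offered_extensions: Sequence[int],
) -> List[int]:
    """Encode one ClientHello as a binary vector.

    Each position is 1 if the corresponding vocabulary entry was offered
    by the client and 0 otherwise. Only handshake metadata sent in the
    clear is used; no application data is decrypted.
    """
    suites = set(offered_suites)
    extensions = set(offered_extensions)
    suite_bits = [int(code in suites) for code in CIPHER_SUITE_VOCAB]
    extension_bits = [int(ext in extensions) for ext in EXTENSION_VOCAB]
    return suite_bits + extension_bits


if __name__ == "__main__":
    # A made-up client that offers only older suites and no extensions.
    print(client_hello_features([0x0004, 0x0005, 0x002F], []))
```

A vector like this could be passed to any standard classifier or matched against hand-written rules. The sketch shows only the encoding step and makes no claim about which feature values indicate malware.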