Papers, code and datasets about deep learning for Android malware defenses and malware detection

yueyueL/DL-based-Android-Malware-Defenses-review

Deep Learning for Android Malware Defenses

This is a continuously updated version of the survey of deep learning-based Android malware defenses presented in the manuscript "Deep Learning for Android Malware Defenses: a Systematic Literature Review" by Yue Liu, Li Li, Chakkrit Tantithamthavorn and Yepang Liu. The paper has been accepted by ACM Computing Surveys.

To the best of our knowledge, no systematic literature review focusing on deep learning approaches for Android malware defenses exists. In this paper, we conducted a systematic literature review to search and analyze how deep learning approaches have been applied in the context of malware defenses in the Android environment. As a result, a total of 132 studies covering the period 2014-2021 were identified. Our investigation reveals that, while the majority of these studies focus on DL-based Android malware detection, 53 primary studies (40.1 percent) design defense approaches based on other scenarios. This review also discusses research trends, research focuses, challenges, and future research directions in DL-based Android malware defenses.

Please kindly cite this paper if it helps your research:

@article{liu2022deep,
	author = {Liu, Yue and Tantithamthavorn, Chakkrit and Li, Li and Liu, Yepang},
	title = {Deep Learning for Android Malware Defenses: A Systematic Literature Review},
	year = {2022},
	issue_date = {August 2023},
	publisher = {Association for Computing Machinery},
	address = {New York, NY, USA},
	volume = {55},
	number = {8},
	issn = {0360-0300},
	url = {https://doi.org/10.1145/3544968},
	doi = {10.1145/3544968},
	journal={ACM Computing Surveys},
	month = {dec},
	articleno = {153},
	numpages = {36},
}

You are welcome to update our review list!!

  • fork this repository, add your paper, and submit a pull request;
  • or email us.

If you see a project or link here that is no longer maintained or is not a good fit, please submit a pull request to improve this document. Thank you!

Content

Systematic review process and paper lists

We collected primary studies related to DL-based Android malware defenses from a variety of sources (IEEE, ACM Digital Library, Springer, Science Direct, Wiley Online Library, Google Scholar and Web of Knowledge). Only studies related to deep learning-based Android malware defenses were considered for further review. In addition, we proposed quality appraisal criteria to retain only high-quality studies. The complete list of exclusion criteria and quality appraisal criteria is available on this page. After this process, we obtained 132 relevant primary studies.

Paper structure

We uploaded our complete paper lists, with detailed review information, to Google Drive.

(Review paper lists)

Our paper is structured as follows:

  • Malware Defenses Objectives
    • binary malware classification
    • malware family attribution
    • repackaged/fake app detection
    • adversarial learning attacks and protections
    • malware evolution detection and defense
    • malicious behavior analysis
  • APK Characterization
    • Program analysis approaches (static analysis, dynamic analysis, hybrid analysis)
    • Feature categories (permission, API calls, filtered intents, app component, url, string, hardware component, app metadata, system call, dynamic activities, program graph, opcode, bytecode, java code)
    • Feature encoding approaches (categorical, text-based, graph-based, image-based, hybrid)
  • Deep Learning Techniques
    • Learning paradigms (supervised, supervised & unsupervised, unsupervised, reinforcement learning)
    • Deep learning models (Multilayer Perceptrons, Convolutional Neural Networks, Recurrent Neural Networks, Deep Belief Networks, Autoencoders, Generative Adversarial Networks, Graph Neural Networks, Attention-based neural networks, Deep Reinforcement Learning, Transformers, Hybrid models)
    • Model explanation
  • Deployment
    • Off-device, Distributed, On-device
  • Performance evaluation
    • Dataset
    • Evaluation approaches
    • Evaluation metrics
    • Availability
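As a rough illustration of the "categorical" feature encoding listed under APK Characterization, the sketch below turns the set of permissions requested by an app into a fixed-length multi-hot vector that a model such as an MLP could consume. The vocabulary, the function name `encode_categorical`, and the sample app are hypothetical and chosen only for illustration; the surveyed studies build their vocabularies and encodings in their own ways.

```python
from typing import Iterable, List

# Hypothetical feature vocabulary; a real study would derive it from its training set.
VOCABULARY = [
    "android.permission.INTERNET",
    "android.permission.READ_SMS",
    "android.permission.SEND_SMS",
    "android.permission.CAMERA",
]


def encode_categorical(features: Iterable[str], vocabulary: List[str]) -> List[int]:
    """Return a multi-hot vector: 1 where the vocabulary entry occurs in the app, else 0."""
    present = set(features)
    return [1 if name in present else 0 for name in vocabulary]


# Hypothetical app that requests two of the four permissions in the vocabulary.
app_permissions = ["android.permission.SEND_SMS", "android.permission.INTERNET"]
vector = encode_categorical(app_permissions, VOCABULARY)
print(vector)  # [1, 0, 1, 0]
```

Features that are not in the vocabulary are silently ignored, so every app maps to a vector of the same length regardless of how many permissions it requests.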

If you are interested in a summary of each subtopic across these 132 primary studies, please read our survey. If you want to check the detailed information for each primary study, please read our review table.

Malware data collection

| Data source | Still updated | Paper | Details |
|---|---|---|---|
| Drebin | - | NDSS-2014 | 123,453 benign samples and 5,560 malware samples (176 malware families); 2010-2012 samples |
| Genome | - | S&P-2012 | 863 benign and 1,260 malware samples; 2010-2011 samples |
| Contagio | - | - | 11,960 mobile malware samples and 16,800 benign samples, collected until 2018 |
| AMD | - | DIMVA-2017 | 24,553 malware samples (2010-2016) |
| AndroZoo | Yes | MSR-2016 | AndroZoo is a growing collection of Android applications collected from several sources, including the official Google Play app market. It currently contains 17,951,878 different APKs. |
| VirusTotal | Yes | - | VirusTotal aggregates many antivirus products and online scan engines. It also provides datasets for researchers. |
| VirusShare | Yes | - | VirusShare is a repository of malware samples for security researchers. It currently contains 44,390,572 malware samples. |
| CICMalDroid | - | - | More than 17,341 Android samples, collected until 2018 |
| RmvDroid | - | MSR-2019 | 9,133 malware samples belonging to 56 malware families |
| Google Play | Yes | - | Google Play is the official Android market. PlayDrone: Google Play crawler |
| Third-party markets | Yes | - | HUAWEI, APKpure, MI store, Tencent, 360, Wandoujia, Aptoide, Anzhi, APKmirror, Amazon Appstore, 9APPS |
| Google Play Malware | No | ICSE-2022 | 1,238 Android malware samples from 134 distinct malware families |

Anti-virus tools

Public tools

Deep learning-based Android malware defense approaches

  • Deep Android Malware Detection
    • Binary Malware Classification; CNN; Opcode Sequence
    • in CODASPY '17 [pdf] [Code]
  • A Multimodal Deep Learning Method for Android Malware Detection Using Various Features
    • Binary Malware Classification; CNN; Multiple features (string, method opcode, method API, shared library function opcode, permission, app component, environmental feature)
    • in TIFS 2018, [pdf] [Code]
  • Detecting Android malware using Long Short-term Memory (LSTM)
    • Binary Malware Classification; LSTM; Permissions, dynamic behaviour
    • in Journal of Intelligent & Fuzzy Systems, 2018, [pdf][Code]
  • TESSERACT: Eliminating experimental bias in malware classification across space and time
    • Malware Evolution Detection and Defense; MLP; Drebin's features
    • in USENIX Security Symposium, 2019, [pdf][Code]
  • DeepIntent: Deep Icon-Behavior Learning for Detecting Intention-Behavior Discrepancy in Mobile Apps
    • Malicious Behavior Analysis; CNN, RNN, AE; App metadata
    • in CCS'19, [pdf] [Code]
  • An Android mutation malware detection based on deep learning using visualization of importance from codes
    • Binary Malware Classification; CNN; Java code
    • in Microelectronics Reliability, 2019, [pdf] [Code]
  • Familial Clustering for Weakly-Labeled Android Malware Using Hybrid Representation Learning
    • Malware Family Attribution; MLP; Java code, app components, action, requested permission, hardware, instrumentation classes, requested API, package name, version, referenced libraries
    • in TIFS 2019, [pdf] [Code]
  • Android Malware Detection Based on System Calls Analysis and CNN Classification
    • Binary malware classification; CNN; System Call
    • in IEEE Wireless Communications and Networking Conference Workshop (WCNCW), 2019, [pdf] [Code]
  • Adversarial Deep Ensemble: Evasion Attacks and Defenses for Malware Detection
    • Adversarial Learning Attacks and Protections; MLP;
    • in TIFS 2020, [pdf][Code]
  • Evaluating explanation methods for deep learning in security
    • Binary malware classification; MLP, CNN
    • in EuroS&P'20; [pdf] [code]
  • Enhancing State-of-the-art Classifiers with API Semantics to Detect Evolved Android Malware
    • Malware Evolution Detection and Defense, Binary Malware Detection; MLP
    • in CCS'20, [pdf] [code]
  • A Multi-modal Neural Embeddings Approach for Detecting Mobile Counterfeit Apps: A Case Study on Google Play Store
    • Repackaged/Fake App Detection; CNN
    • in TMC'20, [pdf] [code]
  • DENAS: Automated Rule Generation by Knowledge Extraction from Neural Networks
  • Experiences of Landing Machine Learning onto Market-Scale Mobile Malware Detection
  • Hybrid Analysis of Android Apps for Security Vetting using Deep Learning
    • Binary Malware Classification; LSTM (Bi-LSTM and Attn-BiLSTM)
    • in IEEE Conference on Communications and Network Security (CNS), 2020 [pdf][Code]
  • Understanding Privacy Awareness in Android App Descriptions Using Deep Learning
    • Malicious Behavior Analysis; CNN
    • in ACM Conference on Data and Application Security and Privacy, 2020, [pdf][Code]
  • Combining multi-features with a neural joint model for Android malware detection
    • Binary Malware Detection, Malware Family Identification; RNN, CNN
    • in Journal of Intelligent & Fuzzy Systems, 2020, [pdf] [Code]
  • Experimental comparison of features and classifiers for Android malware detection
    • Binary Malware Classification; MLP, CNN, RNN
    • in International Conference on Mobile Software Engineering and Systems, 2020, [pdf][Code]
  • A Framework for Enhancing Deep Neural Networks Against Adversarial Malware
    • Adversarial Learning Attacks and Protections; AE, MLP
    • in IEEE Transactions on Network Science and Engineering, 2021 [pdf][Code]
  • Towards an interpretable deep learning model for mobile malware detection and family identification
    • Malware Family Identification; CNN
    • in Computers & Security 2021 [pdf][Code]
  • Explanation-Guided Backdoor Poisoning Attacks Against Malware Classifiers
    • Adversarial Learning Attacks and Protections; MLP
    • in USENIX Security Symposium 2021 [pdf][Code]
  • CADE: Detecting and Explaining Concept Drift Samples for Security Applications
    • Malware Evolution Detection and Defense; AE
    • in USENIX Security Symposium 2021 [pdf][Code]
  • DexRay: A Simple, yet Effective Deep Learning Approach to Android Malware Detection Based on Image Representation of Bytecode
    • Binary Malware Detection; CNN
    • in Deployable Machine Learning for Security Defense 2021 [pdf][Code]
  • Can We Leverage Predictive Uncertainty to Detect Dataset Shift and Adversarial Examples in Android Malware Detection?
    • Malware Evolution Detection and Defense, Adversarial Learning Attacks and Protections; MLP, CNN, RNN
    • in ACSAC'21 [pdf][Code]
  • Why an Android App is Classified as Malware? Towards Malware Classification Interpretation
    • Malware Detection; Attention-based neural networks
    • in TOSEM 2021, [pdf][Code]
  • Heterogeneous Temporal Graph Transformer: An Intelligent System for Evolving Android Malware Detection
    • Malware Evolution Detection and Defense, Binary Malware Detection; transformers, GNN
    • in SIGKDD'21[pdf][Code]
  • Robust Android Malware Detection against Adversarial Example Attacks
    • Adversarial Learning Attacks and Protections; Hybrid
    • in WWW'21 [pdf][Code]
  • PetaDroid: Adaptive Android Malware Detection Using Deep Learning
    • Binary Malware Detection; Hybrid
    • In Detection of Intrusions and Malware, and Vulnerability Assessment 2021 [pdf][Code]
  • Structural Attack against Graph Based Android Malware Detection
    • Adversarial Learning Attacks and Protections; DRL
    • in CCS'21 [pdf][Code]
  • Continuous Learning for Android Malware Detection

Machine learning-based tools

Program analysis tools

  • Apktool: A tool for reverse engineering Android apk files [link]
  • Androguard: Reverse engineering, Malware and goodware static analysis of Android applications ... and more [link]
  • FlowDroid: FlowDroid statically computes data flows in Android apps and Java programs. [link]
  • Monkey: The UI/Application Exerciser Monkey is a program that runs on an Android emulator or device and generates pseudo-random streams of user events (such as clicks, touches, or gestures) as well as a number of system-level events. It is commonly used to stress-test apps and to drive them during dynamic analysis. [link]
  • DroidBox: Dynamic analysis of Android apps [link]
  • DroidBot: A lightweight test input generator for Android. Similar to Monkey, but with more intelligence and cool features. [link]
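To give a sense of how the output of such tools feeds a static-analysis pipeline, here is a minimal sketch that pulls the requested permissions out of an AndroidManifest.xml that has already been decoded to readable XML (as Apktool does when decoding an APK). It uses only the Python standard library. The sample manifest, the package name `com.example.app`, and the function name `extract_permissions` are hypothetical and serve only as an illustration, not as the method used by any of the listed papers.

```python
import xml.etree.ElementTree as ET
from typing import List

# XML namespace used for the android: attribute prefix in Android manifests.
ANDROID_NS = "http://schemas.android.com/apk/res/android"

# Hypothetical decoded manifest standing in for a real AndroidManifest.xml file.
SAMPLE_MANIFEST = """
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
    package="com.example.app">
    <uses-permission android:name="android.permission.INTERNET"/>
    <uses-permission android:name="android.permission.SEND_SMS"/>
    <application android:label="Example"/>
</manifest>
""".strip()


def extract_permissions(manifest_xml: str) -> List[str]:
    """Return the android:name of every <uses-permission> element, in document order."""
    root = ET.fromstring(manifest_xml)
    name_attr = f"{{{ANDROID_NS}}}name"  # ElementTree spells namespaced names as {uri}local
    return [
        element.attrib[name_attr]
        for element in root.iter("uses-permission")
        if name_attr in element.attrib
    ]


print(extract_permissions(SAMPLE_MANIFEST))
# ['android.permission.INTERNET', 'android.permission.SEND_SMS']
```

For a manifest on disk, the file contents can be read into a string and passed to the same function. Binary manifests taken directly from an APK must be decoded first, since ElementTree cannot parse Android's binary XML format.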

Supplementary materials

Deep learning

Research Papers

  • Deep learning - LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. Nature, 2015, [pdf]
  • Deep learning - Goodfellow, Ian, et al. MIT press, 2016, [pdf1][pdf2]
  • Deep learning in neural networks: An overview - Schmidhuber, Jürgen. Neural networks, 2015, [pdf]

Online Tutorials and Repositories

Tools: TensorFlow, Keras, scikit-learn, PyTorch

Android Malware Analysis

Research Papers

  • Android security: a survey of issues, malware penetration, and defenses - Faruki P, Bharmal A, Laxmi V, et al. IEEE communications surveys & tutorials, 2014, [pdf]
  • A taxonomy and qualitative comparison of program analysis techniques for security assessment of android software - Sadeghi A, Bagheri H, Garcia J, et al. IEEE Transactions on Software Engineering, 2016, [pdf]
  • The Evolution of Android Malware and Android Analysis Techniques - Tam K, Feizollah A, Anuar N B, et al. ACM Computing Surveys (CSUR), 2017, [pdf]
  • Static analysis of android apps: A systematic literature review - Li L, Bissyandé T F, Papadakis M, et al. Information and Software Technology, 2017, [pdf] [Project link]
  • A Survey on Malware Detection Using Data Mining Techniques - Ye Y, Li T, Adjeroh D, et al. ACM Computing Surveys (CSUR), 2017, [pdf]
  • A survey on various threats and current state of security in android platform - Bhat P, Dutta K. ACM Computing Surveys (CSUR), 2019, [pdf]
  • A survey of Android malware detection with deep neural models - Qiu J, Zhang J, Luo W, et al. ACM Computing Surveys (CSUR), 2020, [pdf]
  • Comprehensive Android Malware Detection Based on Federated Learning Architecture - Deldar F, Abadi M. ACM Computing Surveys (CSUR), 2023, [pdf]

Recent Publications

Recent relevant studies (Last update: 2023-02, we welcome our fellow researchers to update recent works)

2023

2022

2021