Looking beyond Dalvik Bytecode

This repository contains the code for Android malware detection based on image representations of APKs.

For more technical details, please refer to our A-Mobile '21 paper:

"Android Malware Detection: Looking beyond Dalvik Bytecode"

Data Availability

Due to the large size of the APKs and images, we share them upon request.
The hash list of all original APKs is in the `ApkHashList` directory; the APKs themselves can be downloaded from AndroZoo.
The images can be generated with the `apk2images.py` script.

To generate images, use the `apk2images.py` script:

This script generates 3 gray-scale images and 1 color-scale image from a given APK.

INPUT is:

- The path of an APK to convert into images.

OUTPUTs are:

- 3 gray-scale images (from the .dex, .so and .xml files) and 1 color-scale image (combined from the 3 types of files).

Example

python3 apk2images.py APK_PATH
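
To make the byte-to-pixel idea concrete, here is a minimal sketch of how an APK's files could be turned into gray-scale and color pixel data. It is not the code of `apk2images.py`: the helper names (`collect_bytes`, `bytes_to_gray_rows`, `stack_channels`), the roughly square layout, the zero padding and the channel order are all assumptions made for illustration, and the script may size and combine images differently.

```python
import math
import zipfile


def collect_bytes(apk, suffix):
    """Concatenate the raw bytes of every APK entry whose name ends with suffix.

    `apk` may be a path or a binary file-like object (an APK is a ZIP archive).
    """
    with zipfile.ZipFile(apk) as zf:
        names = sorted(n for n in zf.namelist() if n.endswith(suffix))
        return b"".join(zf.read(n) for n in names)


def bytes_to_gray_rows(data, width=None):
    """Map each byte to one pixel (0-255) and fold the stream into rows.

    Assumption: the width defaults to ceil(sqrt(len(data))) so the image is
    roughly square, and the last row is padded with zeros.
    """
    if not data:
        return []
    if width is None:
        width = math.isqrt(len(data) - 1) + 1  # integer ceil(sqrt(n)) for n >= 1
    padded = data + bytes(-len(data) % width)
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]


def stack_channels(red, green, blue):
    """Combine three byte streams into a list of (R, G, B) pixel tuples.

    Assumption: shorter streams are zero-padded to the longest one.
    """
    length = max(len(red), len(green), len(blue))
    padded = [b + bytes(length - len(b)) for b in (red, green, blue)]
    return list(zip(*padded))
```

With these helpers, `bytes_to_gray_rows(collect_bytes("app.apk", ".dex"))` would give the pixel rows of a gray-scale image for the .dex files, and `stack_channels` applied to the .dex, .so and .xml byte streams would give one pixel per byte position for a combined color image. Writing the pixels to an image file is left out on purpose.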

Models Training and Testing

Notes:

The evaluation is repeated 10 times using the holdout technique.
The training, validation and test hashes are provided in the `data_splits` directory.
To run the scripts below, you need to:
- Extract the gray-scale and color-scale images for the goodware and malware applications listed in goodware_hashes.txt and malware_hashes.txt using the `apk2images.py` script.
- Organize the resulting directory structure as in `dataset.example`.
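
Before training, it can help to check that every hash in a list has a generated image. The sketch below is one possible way to do that; it assumes one hash per line in the hash files and images named `<hash>.png` in a single folder, neither of which is confirmed by this repository, so adapt it to the layout shown in `dataset.example`. The function names `read_hashes` and `missing_images` are hypothetical.

```python
from pathlib import Path


def read_hashes(path):
    """Return the non-empty lines of a hash list file (assumed: one hash per line)."""
    with open(path, encoding="utf-8") as fh:
        return [line.strip() for line in fh if line.strip()]


def missing_images(hashes, image_dir, extension=".png"):
    """List the hashes that have no image file named <hash><extension> in image_dir.

    The file naming scheme is an assumption; the comparison ignores letter case.
    """
    present = {p.stem.lower() for p in Path(image_dir).glob(f"*{extension}")}
    return [h for h in hashes if h.lower() not in present]
```

For example, `missing_images(read_hashes("malware_hashes.txt"), "dataset_images/malware")` would return the malware hashes that still need to be converted, where the folder path is a placeholder.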

Model based on Gray-scale Image

To train and test a model based on gray-scale images, use the `ModelGray.py` script:

This script trains the neural network on the gray-scale training images and evaluates it on the gray-scale test images.

INPUTs are:

- The path to the directory that contains malware and goodware image folders.
- The name of the directory where to save the model.
- The type of the image source files, which can only be one of 'dex', 'so' or 'xml'.

OUTPUTs are:

- A file containing the accuracy, precision, recall, and F1-score of each of the ten trained models,
  together with their average scores.
- The ten trained models.

Example:

python3 ModelGray.py -p "dataset_images" -d "results_dir" -t "dex"
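
For reference, the four reported scores follow the standard definitions based on true/false positives and negatives. The sketch below computes them for one run and averages them over several runs; it is only an illustration of the definitions, not the code used by the scripts, and the function names `scores` and `average_scores` are hypothetical. Malware is assumed to be the positive class.

```python
def scores(tp, fp, tn, fn):
    """Compute accuracy, precision, recall and F1-score from confusion counts.

    Ratios with a zero denominator are reported as 0.0.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def average_scores(runs):
    """Average each score over a non-empty list of per-run score dictionaries."""
    return {key: sum(run[key] for run in runs) / len(runs) for key in runs[0]}
```

Calling `average_scores` on the ten per-run dictionaries gives the kind of averaged summary described in the OUTPUTs above.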

Model based on Color-scale Image

To train and test a model based on color-scale images, use the `ModelColor.py` or `ModelEnsemble.py` script:

These two scripts train neural networks on the color-scale training images and evaluate them on the color-scale test images.

INPUTs are:

- The path to the directory that contains malware and goodware image folders.
- The name of the directory where to save the model.

OUTPUTs are:

- A file containing the accuracy, precision, recall, and F1-score of each of the ten trained models,
  together with their average scores.
- The ten trained models.

Example:

python3 ModelColor.py -p "dataset_images" -d "results_dir"
# or
python3 ModelEnsemble.py -p "dataset_images" -d "results_dir"

Trustworthy-Software/Looking-beyond-Dalvik-Bytecode

Folders and files

Latest commit

History

Repository files navigation

Looking beyond Dalvik Bytecode

Data Availability

To generate images, use apk2images.py script:

INPUT is:

OUTPUTs are

Example

Models Training and Testing

Notes:

Model based on Gray-scale Image

To train and test a model based on gray-scale image, use ModelGray.py script:

INPUTs are:

OUTPUTs are

Example:

Model based on Color-scale Image

To train and test a model based on color-scale image, use ModelColor.py or ModelEnsemble.py scripts:

INPUTs are:

OUTPUTs are

Example:

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

To generate images, use `apk2images.py` script:

To train and test a model based on gray-scale image, use `ModelGray.py` script:

To train and test a model based on color-scale image, use `ModelColor.py` or `ModelEnsemble.py` scripts:

Packages