reJARchiver

Effort to analyze and clean up J2ME archives.

Original idea

Structure

So far, the procedure consists of four Python scripts:

1. Extract

Recursively extract ZIP, 7Z and RAR archives.
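A minimal sketch of recursive extraction, handling ZIP only with the standard library's zipfile module. 7Z and RAR support is left out here; third-party packages such as py7zr and rarfile are common choices for those formats. The function name extract_recursive and the "<stem>_extracted" output folder are hypothetical, not taken from the repo's scripts.

```python
import zipfile
from pathlib import Path
from typing import List


def extract_recursive(archive: Path, dest: Path) -> List[Path]:
    """Extract a ZIP archive into dest, then recurse into any nested ZIPs.

    Returns every file that was extracted. Files ending in .jar are ZIPs
    too, but they are left packed because they are the data being collected.
    """
    extracted: List[Path] = []
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            # extract() sanitizes the member name and returns the real output path.
            extracted.append(Path(zf.extract(info, dest)))
    for member in list(extracted):
        if member.suffix.lower() == ".zip" and zipfile.is_zipfile(member):
            # Unpack each nested archive into a sibling "<stem>_extracted" folder.
            sub_dest = member.parent / (member.stem + "_extracted")
            extracted.extend(extract_recursive(member, sub_dest))
    return extracted
```

Skipping the .jar suffix matters: since a JAR is a ZIP file, a naive recursive extractor would unpack the games themselves.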

2. Index

Create a JSON index of all JARs found, including their hashes and manifest data.
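One way the indexing step could look, as a sketch. The entry layout (path, size, md5, sha1, manifest) and the choice of MD5 and SHA-1 are assumptions, not the project's actual JSON schema. Manifest parsing follows the JAR manifest rule that a line starting with a single space continues the previous value; decoding as UTF-8 is also an assumption, since older J2ME manifests may use other encodings.

```python
import hashlib
import json
import zipfile
import zlib
from pathlib import Path
from typing import Dict, List, Optional


def parse_manifest(text: str) -> Dict[str, str]:
    """Parse the main section of a JAR manifest into a dict.

    A line that starts with a single space continues the previous value;
    the first blank line after some attributes ends the main section.
    """
    attrs: Dict[str, str] = {}
    key: Optional[str] = None
    for line in text.splitlines():
        if not line.strip():
            if attrs:
                break
            continue
        if line.startswith(" ") and key is not None:
            attrs[key] += line[1:]  # continuation line
        elif ":" in line:
            name, _, value = line.partition(":")
            key = name.strip()
            attrs[key] = value[1:] if value.startswith(" ") else value
    return attrs


def index_jar(path: Path) -> dict:
    """Hash one JAR and read its manifest; manifest is None if unreadable."""
    data = path.read_bytes()
    entry = {
        "path": str(path),
        "size": len(data),
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
        "manifest": None,
    }
    try:
        with zipfile.ZipFile(path) as zf:
            # Match the manifest name case-insensitively, in case of odd casing.
            name = next((n for n in zf.namelist() if n.upper() == "META-INF/MANIFEST.MF"), None)
            if name is not None:
                raw = zf.read(name)
                entry["manifest"] = parse_manifest(raw.decode("utf-8", errors="replace"))
    except (zipfile.BadZipFile, zlib.error, NotImplementedError):
        pass  # broken or unsupported archive: leave manifest as None
    return entry


def build_index(root: Path, out_file: Path) -> List[dict]:
    """Index every .jar under root (suffix matched case-insensitively) and write JSON."""
    jars = sorted(p for p in root.rglob("*") if p.is_file() and p.suffix.lower() == ".jar")
    entries = [index_jar(p) for p in jars]
    out_file.write_text(json.dumps(entries, indent=2), encoding="utf-8")
    return entries
```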

We also need to check whether the JARs are valid J2ME MIDlets, since some of them may be broken files, Java desktop applications, or libraries.
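A hedged sketch of such a check, working on an already-parsed manifest dict (None meaning the archive or manifest could not be read). The heuristic is an assumption, not the project's actual rule: treat a JAR as a MIDlet suite if it declares MIDlet-Name and at least one MIDlet-<n> entry, as desktop software if it only has a Main-Class, and as a library otherwise.

```python
import re
from typing import Dict, Optional

# MIDlet-1, MIDlet-2, ... declare the individual MIDlets in a suite.
MIDLET_ENTRY = re.compile(r"MIDlet-\d+")


def classify_jar(manifest: Optional[Dict[str, str]]) -> str:
    """Return "broken", "midlet", "desktop" or "library" (heuristic)."""
    if manifest is None:
        return "broken"
    if "MIDlet-Name" in manifest and any(MIDLET_ENTRY.fullmatch(k) for k in manifest):
        return "midlet"
    if "Main-Class" in manifest:
        return "desktop"
    return "library"
```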

3. Filter

Remove entries from the index that show signs of modification by third parties, such as pirate sites that put their own name in the manifest (we call these markers "bad keywords").
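A minimal sketch of bad-keyword filtering, assuming each index entry is a dict with a "manifest" key holding a dict of manifest attributes (or None). The keywords in BAD_KEYWORDS are hypothetical placeholders; matching is a case-insensitive substring search over each manifest value.

```python
from typing import Dict, List, Sequence

# Hypothetical examples; a real list would come from inspecting the index.
BAD_KEYWORDS = ["freegamessite", "wap.example"]


def has_bad_keyword(manifest: Dict[str, str], keywords: Sequence[str]) -> bool:
    """True if any manifest value contains a keyword (case-insensitive)."""
    lowered = [v.lower() for v in manifest.values()]
    return any(kw.lower() in v for v in lowered for kw in keywords)


def filter_index(entries: List[dict], keywords: Sequence[str]) -> List[dict]:
    """Drop entries whose manifest mentions a bad keyword."""
    return [e for e in entries if not has_bad_keyword(e.get("manifest") or {}, keywords)]
```

Matching each value separately, rather than one joined string, avoids false hits where a keyword straddles two unrelated attributes.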

At the same time, we de-duplicate files based on their hashes.
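Hash-based de-duplication can be sketched as keeping the first entry seen for each digest. The "sha1" and "path" keys are assumed index fields; the function also returns the dropped paths so duplicates can be reviewed rather than silently lost.

```python
from typing import Dict, List, Tuple


def dedupe(entries: List[dict]) -> Tuple[List[dict], Dict[str, List[str]]]:
    """Keep the first entry per SHA-1; report the paths of later copies."""
    first: Dict[str, dict] = {}
    dupes: Dict[str, List[str]] = {}
    for e in entries:
        digest = e["sha1"]
        if digest in first:
            dupes.setdefault(digest, []).append(e["path"])
        else:
            first[digest] = e  # dicts keep insertion order, so output order is stable
    return list(first.values()), dupes
```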

4. Sort

Sort each JAR file into a directory named after the app, and rename it using a standard naming scheme so that variants of the same game are easy to find.
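A sketch of the sorting step under a hypothetical naming scheme, "<Name>/<Name> v<Version> (<Vendor>) [<first 8 hex digits of SHA-1>].jar", built from the standard MIDlet-Name, MIDlet-Version and MIDlet-Vendor manifest attributes. The scheme and function names are assumptions, not the repo's actual convention. Characters that Windows rejects in file names are replaced with underscores.

```python
import re
import shutil
from pathlib import Path
from typing import Dict

# Characters that are invalid in Windows file names, plus ASCII control codes.
_UNSAFE = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_name(text: str) -> str:
    """Make a manifest value usable as a file or directory name."""
    cleaned = _UNSAFE.sub("_", text).strip().rstrip(". ")
    return cleaned or "Unknown"


def target_path(root: Path, manifest: Dict[str, str], sha1: str) -> Path:
    """Build root/<Name>/<Name> v<Version> (<Vendor>) [<hash8>].jar."""
    name = safe_name(manifest.get("MIDlet-Name", ""))
    vendor = safe_name(manifest.get("MIDlet-Vendor", ""))
    version = manifest.get("MIDlet-Version", "").strip()
    version_part = f" v{safe_name(version)}" if version else ""
    # The short hash keeps different builds with identical metadata apart.
    return root / name / f"{name}{version_part} ({vendor}) [{sha1[:8]}].jar"


def sort_jar(src: Path, root: Path, manifest: Dict[str, str], sha1: str) -> Path:
    """Copy src into its sorted location (the original file is left untouched)."""
    dest = target_path(root, manifest, sha1)
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dest)
    return dest
```

Copying instead of moving keeps the extracted tree intact, so a run with a changed naming scheme can simply be repeated.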

Other scripts

Some miscellaneous scripts are provided in this repo for further data analysis:

  • vendor_count.py: Outputs a list of MIDlet vendors in the JSON index sorted by frequency.
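The idea behind vendor_count.py can be sketched with collections.Counter. This is an illustration, not the repo's script: it assumes the JSON index is a list of entries whose "manifest" dict may contain MIDlet-Vendor, and groups entries with no vendor under a hypothetical "(none)" label.

```python
import json
from collections import Counter
from pathlib import Path
from typing import List, Tuple


def count_vendors(entries: List[dict]) -> List[Tuple[str, int]]:
    """Return (vendor, count) pairs, most frequent first."""
    counts = Counter(
        ((e.get("manifest") or {}).get("MIDlet-Vendor") or "").strip() or "(none)"
        for e in entries
    )
    return counts.most_common()


def print_vendor_counts(index_file: Path) -> None:
    """Load a JSON index and print one "count<TAB>vendor" line per vendor."""
    entries = json.loads(Path(index_file).read_text(encoding="utf-8"))
    for vendor, n in count_vendors(entries):
        print(f"{n}\t{vendor}")
```

Counter.most_common() orders vendors with equal counts by first appearance, so the output is deterministic for a given index.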