
Python script to transform the package information received from monster into a form `kafka` can handle.


endocode/transform-monster-data-fasten


Python script to transform the data collected from monster into a form kafka can handle.

To transform the data just run

python3 transformData.py

and the program will take the filename of the input file and the filename of the output file. The output file will be created if it does not already exist.

The data set folder contains the original data received from fasten (monster-dataset*) and some already transformed sets (*.mvn.coords.txt). Each transformed data set contains a random selection of packages from the original dataset (10, 1000, or the full set), picked with:

shuf -n 1000 monster-dataset-full-original.txt > monster-dataset-1000.txt
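Where `shuf` is not available, the same kind of random pick can be approximated in Python. This is only a minimal sketch and not part of the repository: the function name `sample_lines` and its `seed` parameter are hypothetical, and it assumes one package per line, as the shuf command above implies.

```python
import random


def sample_lines(src_path, dst_path, n, seed=None):
    """Write up to n randomly chosen lines from src_path to dst_path.

    Hypothetical helper, roughly equivalent to `shuf -n <n> src > dst`.
    Returns the number of lines written.
    """
    with open(src_path, encoding="utf-8") as src:
        # Drop line endings so a final line without a newline is handled too.
        lines = [line.rstrip("\n") for line in src]

    rng = random.Random(seed)  # pass a seed for a reproducible pick
    # Sample without replacement; like shuf, never return more lines than exist.
    picked = rng.sample(lines, min(n, len(lines)))

    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.writelines(p + "\n" for p in picked)
    return len(picked)
```

For example, `sample_lines("monster-dataset-full.txt", "monster-dataset-1000.txt", 1000)` would produce a 1000-package subset. Note that this reads the whole input file into memory.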

The file monster-dataset-full.txt is a version of the monster-dataset-full-original.txt file without leading spaces before values. Those spaces cause the Python script to crash, because its regex no longer matches.
For example:
Most of the time, the string containing the package version is:
"version":"version number"
But in some cases it is:
"version": "version number"
The same goes for other key-value pairs.
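Two ways of coping with the inconsistent spacing are sketched below. Neither is taken from transformData.py: the names `normalize_spacing`, `extract_version`, and both patterns are assumptions made for illustration. The first removes whitespace around the colon between a quoted key and a quoted value, and the second matches the version with or without that whitespace.

```python
import re

# Closing quote of a key, optional whitespace around the colon,
# opening quote of the value. Hypothetical pattern, not the script's own.
KEY_VALUE_GAP = re.compile(r'"\s*:\s*"')

# Whitespace-tolerant match for the version pair (also an assumption).
VERSION_RE = re.compile(r'"version"\s*:\s*"([^"]*)"')


def normalize_spacing(line):
    """Strip surrounding whitespace and collapse `"key": "value"` to `"key":"value"`."""
    return KEY_VALUE_GAP.sub('":"', line.strip())


def extract_version(line):
    """Return the version string from a line, or None if no version pair is found."""
    match = VERSION_RE.search(line)
    return match.group(1) if match else None
```

Normalizing the input keeps a strict regex usable, at the cost of a preprocessing pass; it would also rewrite a string value that happens to contain `" : "`. Making the regex itself tolerant with `\s*` avoids changing the data.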
