- Create customers data file (see generate_customers.bat for syntax)
- Create transactions using the customer file from the previous step (see various .sh/.bat for syntax)
This code is heavily modified from original code by Josh Plotkin. A change log of the modifications to the original code follows.
- Made only surface-level changes to the scripts so that the simulation runs under Python 3
- Corrected bat files to generate transactions files.
- Completely reworked profiles / segmentation of customers
- Introduced fraudulent transactions
- Introduced fraudulent profiles
- Modified transaction amount generation to use a Gamma distribution
- Added 150k_ shell scripts for parallel data generation (one Python process per segment, launched in the background)
- Added Unix timestamp for transactions for easier programmatic evaluation.
- Individual profiles modified so that there is more variation in the data.
- Modified random generation of age/gender. The original code did not appear to work correctly.
- Added batch files for Windows users
- Transaction times are now included instead of just dates
- Profile-specific spending windows (AM/PM with weighting of transaction times)
- Merchant names (specific to spending categories) are now included (along with code for generation)
- Travel probability is added, with profile-specific options
- Maximum travel distance is added as a per-profile option
- Merchant location is randomized based on home location and profile travel probabilities
- Simulated transaction numbers via faker MD5 hash (replacing sequential 0..n numbering)
- Includes credit card number via faker
- Improved cross-platform file path compatibility
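
To illustrate three of the changes above (Gamma-distributed amounts, Unix timestamps, and hash-style transaction numbers), here is a minimal sketch using only the Python standard library. It is not the code from this repository: the function names, the `shape`/`scale` values, the UTC assumption, and the field names are all hypothetical, and `hashlib.md5` over random bytes stands in for the faker-generated MD5 hash.

```python
import hashlib
import random
from datetime import datetime, timezone


def sample_amount(rng, shape=2.0, scale=30.0):
    """Draw a transaction amount from a Gamma distribution.

    shape and scale are illustrative values, not the profile settings
    used by the actual generator.
    """
    return round(rng.gammavariate(shape, scale), 2)


def to_unix_time(date_str, time_str):
    """Convert 'YYYY-MM-DD' and 'HH:MM:SS' to Unix seconds.

    The time zone is assumed to be UTC for this sketch.
    """
    dt = datetime.strptime(f"{date_str} {time_str}", "%Y-%m-%d %H:%M:%S")
    return int(dt.replace(tzinfo=timezone.utc).timestamp())


def transaction_id(rng):
    """Return a 32-character hex digest, replacing sequential 0..n numbering."""
    random_bytes = rng.getrandbits(128).to_bytes(16, "big")
    return hashlib.md5(random_bytes).hexdigest()


if __name__ == "__main__":
    rng = random.Random(42)  # seeded so repeated runs give the same row
    row = {
        "trans_num": transaction_id(rng),
        "trans_date": "2020-01-01",
        "trans_time": "13:45:10",
        "unix_time": to_unix_time("2020-01-01", "13:45:10"),
        "amt": sample_amount(rng),
    }
    print(row)
```

A Gamma distribution is a common choice for amounts because it is non-negative and right-skewed, so most draws are small with an occasional large purchase. Storing the Unix timestamp next to the date and time fields lets downstream code sort and compute time differences with plain integer arithmetic.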