- Mohammad Firas Sada
- Arjun Aggarwal
- Lauren-Charlise Walls
We have developed a Python 3 script that generates a CSV file mimicking the data produced by the experimental setup. Each row represents one sensor report, and each column holds a random value within column-specific bounds. In this initial version, every column is sampled from a uniform distribution over its specified bounds. If the data is timestamped, the timestamps advance in order from row to row. Optionally, noise can be injected into the non-time values: the program draws a small delta from a Gaussian distribution and adds it to the data. This data generator is intended for testing and validation.
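The generation scheme described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the column names and bounds in `BOUNDS`, the function names `generate_rows` and `write_csv`, the fixed timestamp step, and the `noise_sigma` parameter are all assumptions made for the example.

```python
import csv
import random
from datetime import datetime, timedelta

# Hypothetical columns and bounds; the real sensor columns are not listed here.
BOUNDS = {"temperature": (15.0, 30.0), "humidity": (20.0, 80.0)}


def generate_rows(n_rows, bounds, start, step_seconds, noise_sigma=0.0, seed=None):
    """Return n_rows dicts with an increasing timestamp and uniform random values."""
    rng = random.Random(seed)
    rows = []
    ts = start
    for _ in range(n_rows):
        row = {"timestamp": ts.isoformat()}
        for name, (lo, hi) in bounds.items():
            value = rng.uniform(lo, hi)  # uniform draw within the column bounds
            if noise_sigma > 0:
                # Optional Gaussian noise delta, applied to non-time values only
                value += rng.gauss(0.0, noise_sigma)
            row[name] = value
        rows.append(row)
        ts += timedelta(seconds=step_seconds)  # assumed fixed step between rows
    return rows


def write_csv(path, rows):
    """Write the generated rows to a CSV file with a header line."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
```

For instance, `write_csv("sensor.csv", generate_rows(100, BOUNDS, datetime(2024, 1, 1), 10, noise_sigma=0.1))` would produce a 100-row file. Note that with noise enabled, a value can fall slightly outside its column bounds.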
We have also developed a second Python 3 script, the synthesizer, that generates multiple CSV files synthesizing the values produced by the experimental setup. It follows the same scheme as the single-file generator: each row represents one sensor report, each column is sampled uniformly within column-specific bounds, timestamps (if present) advance in order from row to row, and a small Gaussian noise delta can optionally be added to the non-time values.
The synthesizer takes an input file and the average timestamp step as inputs. It starts right after the last timestamp in the last file and keeps timestamps strictly increasing across all of the synthesized files, so no timestamp is duplicated. A further parameter sets the bounds of the injected noise. This script is the first iteration of the synthesizer; its output can be used as synthetic data for training machine learning models.
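A rough sketch of how timestamps can be continued across several synthesized files without duplicates is shown below. It is an illustration under several assumptions, not the actual synthesizer: rows are held in memory as dicts with a numeric `timestamp` field (for example epoch seconds), column bounds are taken from the minimum and maximum of the input data, each step is drawn from 0.5 to 1.5 times the average step, and the Gaussian noise is clipped to `±noise_bound`. The function name `synthesize_files` and its parameters are hypothetical.

```python
import random


def synthesize_files(input_rows, n_files, rows_per_file, avg_step, noise_bound, seed=None):
    """Return n_files lists of rows whose timestamps continue after input_rows."""
    rng = random.Random(seed)
    columns = [c for c in input_rows[0] if c != "timestamp"]
    # Assumption: per-column bounds are the min/max observed in the input rows.
    bounds = {
        c: (min(r[c] for r in input_rows), max(r[c] for r in input_rows))
        for c in columns
    }
    # Continue right after the last timestamp seen in the input.
    ts = max(r["timestamp"] for r in input_rows)
    files = []
    for _ in range(n_files):
        rows = []
        for _ in range(rows_per_file):
            # Step is always positive, so timestamps strictly increase
            # across rows and across files (no duplicates).
            ts += rng.uniform(0.5 * avg_step, 1.5 * avg_step)
            row = {"timestamp": ts}
            for c in columns:
                # Gaussian noise clipped to the requested bound (assumed scheme).
                noise = rng.gauss(0.0, noise_bound / 3)
                noise = max(-noise_bound, min(noise_bound, noise))
                row[c] = rng.uniform(*bounds[c]) + noise
            rows.append(row)
        files.append(rows)
    return files
```

Because a single running timestamp is carried through every file, ordering holds globally rather than only within each file. Each returned list could then be written out as its own CSV file.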
All source code included is licensed under the GNU General Public License.