From 8d96c9b41fc5c5e3daabc422caf2428628a26d7a Mon Sep 17 00:00:00 2001 From: Alexander Nikitin <1243786+AlexanderVNikitin@users.noreply.github.com> Date: Fri, 7 Jun 2024 12:38:26 +0300 Subject: [PATCH] detailed readme --- README.md | 130 ++++++++++++++++++++++++----------- docs/guides/introduction.rst | 7 +- docs/modules/root.rst | 7 ++ tsgm/metrics/metrics.py | 2 +- 4 files changed, 100 insertions(+), 46 deletions(-) diff --git a/README.md b/README.md index 2b9ae1f..21f1cef 100644 --- a/README.md +++ b/README.md @@ -1,7 +1,17 @@ -
+Create and evaluate synthetic time series datasets effortlessly +
+ ++ Get Started • + Tutorials • + Augmentations • + Generators • + Metrics • + Datasets • + Contributing • + Citing +
-[Documentation](https://tsgm.readthedocs.io/en/latest/) | -[Tutorials](https://github.com/AlexanderVNikitin/tsgm/tree/main/tutorials) -## About TSGM +## :jigsaw: Get Started TSGM is an open-source framework for synthetic time series generation and augmentation. The framework can be used for: -- creating synthetic data, using historical data, black-box models, or a combined approach, -- augmenting time series data, -- evaluating synthetic data with respect to consistency, privacy, downstream performance, and more. +- creating synthetic datasets (see :hammer: Generators ), +- augmenting time series data (see :art: Augmentations ), +- evaluating synthetic data with respect to consistency, privacy, downstream performance, and more (see :chart_with_upwards_trend: Metrics ), +- using common time series datasets (TSGM provides easy access to more than 140 datasets, see :floppy_disk: Datasets ). + +We provide: +* [Documentation](https://tsgm.readthedocs.io/en/latest/) with a complete overview of the implemented methods, +* [Tutorials](https://github.com/AlexanderVNikitin/tsgm/tree/main/tutorials) that describe practical use-cases of the framework. 
-## Install TSGM -``` +### Install TSGM +```bash pip install tsgm ``` -### M1 and M2 chips: +#### M1 and M2 chips: To install `tsgm` on Apple M1 and M2 chips: ```bash # Install tensorflow @@ -43,15 +67,7 @@ conda install tensorflow-probability scipy antropy statsmodels dtaidistance netw ``` -## Train your generative model - -- [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1l2VB6eUwvrxyu8iB30faGiQM5AKthc82?usp=sharing) Introductory Tutorial "[Getting started with TSGM](https://github.com/AlexanderVNikitin/tsgm/blob/main/tutorials/GANs/cGAN.ipynb)" -- [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1Vw9t4TlI1Nek_t6bMPyKcPPPqCiXfOK3?usp=sharing) Tutorial on using [Time Series Augmentations](https://github.com/AlexanderVNikitin/tsgm/blob/main/tutorials/augmentations.ipynb) -- [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1hubtddSX94KyLzuCTwmU6pAFBgBeiEB-?usp=sharing) Tutorial on [Evaluation of Synthetic Time Series Data](https://github.com/AlexanderVNikitin/tsgm/blob/main/tutorials/evaluation.ipynb) -- [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1wpf9WeNVj5TkUcPF6EavVx-hUCOfyvUd?usp=sharing) Tutorial on using [Multiple GPUs or TPU with TSGM](https://github.com/AlexanderVNikitin/tsgm/blob/main/tutorials/Using%20Multiple%20GPUs%20or%20TPU.ipynb) - -For more examples, see [our tutorials](./tutorials). 
- +### Train your generative model ```python import tsgm @@ -79,16 +95,60 @@ gan.fit(dataset, epochs=N_EPOCHS) result = gan.generate(100) ``` +## :anchor: Tutorials -## Getting started +- [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1l2VB6eUwvrxyu8iB30faGiQM5AKthc82?usp=sharing) Introductory Tutorial "[Getting started with TSGM](https://github.com/AlexanderVNikitin/tsgm/blob/main/tutorials/GANs/cGAN.ipynb)" +- [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1Vw9t4TlI1Nek_t6bMPyKcPPPqCiXfOK3?usp=sharing) Tutorial on using [Time Series Augmentations](https://github.com/AlexanderVNikitin/tsgm/blob/main/tutorials/augmentations.ipynb) +- [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1hubtddSX94KyLzuCTwmU6pAFBgBeiEB-?usp=sharing) Tutorial on [Evaluation of Synthetic Time Series Data](https://github.com/AlexanderVNikitin/tsgm/blob/main/tutorials/evaluation.ipynb) +- [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1wpf9WeNVj5TkUcPF6EavVx-hUCOfyvUd?usp=sharing) Tutorial on using [Multiple GPUs or TPU with TSGM](https://github.com/AlexanderVNikitin/tsgm/blob/main/tutorials/Using%20Multiple%20GPUs%20or%20TPU.ipynb) -We provide: -* [Documentation](https://tsgm.readthedocs.io/en/latest/) with a complete overview of the implemented methods, -* [Tutorials](https://github.com/AlexanderVNikitin/tsgm/tree/main/tutorials) that describe practical use-cases of the framework. +For more examples, see [our tutorials](./tutorials). +## :art: Augmentations +TSGM provides a number of time series augmentations. 
+| Augmentation | Class in TSGM | Reference | +| ------------- | ------------- | ------------- | +| Gaussian Noise / Jittering | `tsgm.augmentations.GaussianNoise` | - | +| Slice-And-Shuffle | `tsgm.augmentations.SliceAndShuffle` | - | +| Shuffle Features | `tsgm.augmentations.Shuffle` | - | +| Magnitude Warping | `tsgm.augmentations.MagnitudeWarping` | [Data Augmentation of Wearable Sensor Data for Parkinson’s Disease Monitoring using Convolutional Neural Networks](https://dl.acm.org/doi/pdf/10.1145/3136755.3136817) | +| Window Warping | `tsgm.augmentations.WindowWarping` | [Data Augmentation for Time Series Classification using Convolutional Neural Networks](https://shs.hal.science/halshs-01357973/document) | +| DTW Barycentric Averaging | `tsgm.augmentations.DTWBarycentricAveraging` | [A global averaging method for dynamic time warping, with applications to clustering.](https://www.sciencedirect.com/science/article/pii/S003132031000453X) | -## 💾 Datasets +## :hammer: Generators +TSGM implements several generative models for synthetic time series data. + +| Method | Link to docs | Type | Notes | +| ------------- | ------------- | ------------- | ------------- | +| Structural Time Series model | [tsgm.models.sts.STS](https://tsgm.readthedocs.io/en/latest/modules/root.html#tsgm.models.sts.STS) | Data-driven | Great for modeling time series when prior knowledge is available (e.g., trend or seasonality). | +| GAN | [tsgm.models.cgan.GAN](https://tsgm.readthedocs.io/en/latest/modules/root.html#tsgm.models.cgan.GAN) | Data-driven | A generic implementation of GAN for time series generation. It can be customized with architectures for generators and discriminators. | +| ConditionalGAN | [tsgm.models.cgan.ConditionalGAN](https://tsgm.readthedocs.io/en/latest/modules/root.html#tsgm.models.cgan.ConditionalGAN) | Data-driven | A generic implementation of conditional GAN. It supports scalar as well as temporal conditioning. 
| +| BetaVAE | [tsgm.models.cvae.BetaVAE](https://tsgm.readthedocs.io/en/latest/modules/root.html#tsgm.models.cvae.BetaVAE) | Data-driven | A generic implementation of Beta VAE for TS. The loss function is customized to work well with multi-dimensional time series. | +| cBetaVAE | [tsgm.models.cvae.cBetaVAE](https://tsgm.readthedocs.io/en/latest/modules/root.html#tsgm.models.cvae.cBetaVAE) | Data-driven | Conditional version of BetaVAE. It supports temporal and scalar conditioning. | +| TimeGAN | [tsgm.models.timegan.TimeGAN](https://tsgm.readthedocs.io/en/latest/modules/root.html#tsgm.models.timegan.TimeGAN) | Data-driven | TSGM implementation of TimeGAN from [the paper](https://papers.nips.cc/paper_files/paper/2019/hash/c9efe5f26cd17ba6216bbe2a7d26d490-Abstract.html) | +| SineConstSimulator | [tsgm.simulator.SineConstSimulator](https://tsgm.readthedocs.io/en/latest/modules/root.html#tsgm.simulator.SineConstSimulator) | Simulator-based | Simulator-based synthetic signal that switches between constant and periodic functions. | +| LotkaVolterraSimulator | [tsgm.simulator.LotkaVolterraSimulator](https://tsgm.readthedocs.io/en/latest/modules/root.html#tsgm.simulator.LotkaVolterraSimulator) | Simulator-based | Simulator-based synthetic signal that switches between constant and periodic functions. | +| PredictiveMaintenanceSimulator | [tsgm.simulator.PredictiveMaintenanceSimulator](https://tsgm.readthedocs.io/en/latest/modules/root.html#tsgm.simulator.PredictiveMaintenanceSimulator) | Simulator-based | Simulator of predictive maintenance with multiple pieces of equipment from [the paper](https://arxiv.org/pdf/2206.11574) | + +## :chart_with_upwards_trend: Metrics +TSGM implements many metrics for synthetic time series evaluation. See Section 3 of [our paper](https://arxiv.org/pdf/2305.11567) for more detail on the evaluation of synthetic time series. 
+ +| Metric | Link to docs | Type | Notes | +| ------------- | ------------- | ------------- | ------------- | +| Distance in the space of summary statistics | [tsgm.metrics.DistanceMetric](https://tsgm.readthedocs.io/en/latest/autoapi/tsgm/metrics/index.html#tsgm.metrics.DistanceMetric) | Distance | Calculates a set of summary statistics in the original and synthetic data, and measures the distance between those. | +| Maximum Mean Discrepancy (MMD) | [tsgm.metrics.MMDMetric](https://tsgm.readthedocs.io/en/latest/autoapi/tsgm/metrics/index.html#tsgm.metrics.MMDMetric) | Distance | This metric calculates MMD between real and synthetic samples. | +| Discriminative Score | [tsgm.metrics.DiscriminativeMetric](https://tsgm.readthedocs.io/en/latest/autoapi/tsgm/metrics/index.html#tsgm.metrics.DiscriminativeMetric) | Distance | The DiscriminativeMetric measures the discriminative performance of a model in distinguishing between synthetic and real datasets. | +| Demographic Parity Score | [tsgm.metrics.DemographicParityMetric](https://tsgm.readthedocs.io/en/latest/autoapi/tsgm/metrics/index.html#tsgm.metrics.DemographicParityMetric) | Fairness | This metric assesses the disparity in the distributions of a target variable among different groups in two datasets. | +| Privacy Membership Inference Attack Score | [tsgm.metrics.PrivacyMembershipInferenceMetric](https://tsgm.readthedocs.io/en/latest/autoapi/tsgm/metrics/index.html#tsgm.metrics.PrivacyMembershipInferenceMetric) | Privacy | The metric measures the possibility of membership inference attacks. | +| Spectral Entropy | [tsgm.metrics.EntropyMetric](https://tsgm.readthedocs.io/en/latest/autoapi/tsgm/metrics/index.html#tsgm.metrics.EntropyMetric) | Diversity | Calculates the spectral entropy of a dataset or tensor as a sum of individual entropies. 
| +| Shannon Entropy | [tsgm.metrics.ShannonEntropyMetric](https://tsgm.readthedocs.io/en/latest/autoapi/tsgm/metrics/index.html#tsgm.metrics.ShannonEntropyMetric) | Diversity | Shannon Entropy calculated over the labels of a dataset. | +| Pairwise Distance | [tsgm.metrics.PairwiseDistanceMetric](https://tsgm.readthedocs.io/en/latest/autoapi/tsgm/metrics/index.html#tsgm.metrics.PairwiseDistanceMetric) | Diversity | Measures pairwise distances in a set of time series. | +| Downstream Effectiveness | [tsgm.metrics.DownstreamPerformanceMetric](https://tsgm.readthedocs.io/en/latest/autoapi/tsgm/metrics/index.html#tsgm.metrics.DownstreamPerformanceMetric) | Downstream Effectiveness | The downstream performance metric evaluates the performance of a model on a downstream task. It returns performance gains achieved with the addition of synthetic data. | +| Qualitative Evaluation | [tsgm.utils.visualization](https://tsgm.readthedocs.io/en/latest/modules/root.html#module-tsgm.utils.visualization) | Qualitative | Various tools for visual assessment of a generated dataset. | + + +## :floppy_disk: Datasets | Dataset | API | Description | | ------------- | ------------- | ------------- | | UCR Dataset | `tsgm.utils.UCRDataManager` | https://www.cs.ucr.edu/%7Eeamonn/time_series_data_2018/ | @@ -102,22 +162,10 @@ We provide: | Samples from GPs | `tsgm.utils.get_gp_samples_data()` | https://en.wikipedia.org/wiki/Gaussian_process | | Physionet 2012 | `tsgm.utils.get_physionet2012()` | https://archive.physionet.org/pn3/challenge/2012/ | -TSGM provides API for convenient use of many time-series datasets (currently more than 20 datasets). The comprehensive list of the datasets in the [documentation](https://tsgm.readthedocs.io/en/latest/guides/datasets.html) - -## Augmentations -TSGM provides a number of time series augmentations. 
- -| Augmentation | Class in TSGM | Reference | -| ------------- | ------------- | ------------- | -| Gaussian Noise / Jittering | `tsgm.augmentations.GaussianNoise` | - | -| Slice-And-Shuffle | `tsgm.augmentations.SliceAndShuffle` | - | -| Shuffle Features | `tsgm.augmentations.Shuffle` | - | -| Magnitude Warping | `tsgm.augmentations.MagnitudeWarping` | [Data Augmentation of Wearable Sensor Data for Parkinson’s Disease Monitoring using Convolutional Neural Networks](https://dl.acm.org/doi/pdf/10.1145/3136755.3136817) | -| Window Warping | `tsgm.augmentations.WindowWarping` | [Data Augmentation for Time Series Classification using Convolutional Neural Networks](https://shs.hal.science/halshs-01357973/document) | -| DTW Barycentric Averaging | `tsgm.augmentations.DTWBarycentricAveraging` | [A global averaging method for dynamic time warping, with applications to clustering.](https://www.sciencedirect.com/science/article/pii/S003132031000453X) | +TSGM provides an API for convenient use of many time-series datasets (currently more than 140 datasets). The comprehensive list of the datasets is in the [documentation](https://tsgm.readthedocs.io/en/latest/guides/datasets.html). -## Contributing +## :hammer_and_wrench: Contributing We appreciate all contributions. To learn more, please check [CONTRIBUTING.md](CONTRIBUTING.md). #### For contributors @@ -137,14 +185,14 @@ To check static typing: mypy ``` -## CLI +## :computer: CLI We provide two CLIs for convenient synthetic data generation: - `tsgm-gd` generates data by a stored sample, - `tsgm-eval` evaluates the generated time series. Use `tsgm-gd --help` or `tsgm-eval --help` for documentation. 
-## Citing +## :mag: Citing If you find this repo useful, please consider citing our paper: ``` @article{ diff --git a/docs/guides/introduction.rst b/docs/guides/introduction.rst index 03dc193..bc03d89 100644 --- a/docs/guides/introduction.rst +++ b/docs/guides/introduction.rst @@ -42,7 +42,7 @@ The training of data-driven simulators can be done via likelihood optimization, - `tsgm.models.cgan.ConditionalGAN` - conditional GAN model for labeled and temporally labeled time-series simulation,\\ - `tsgm.models.cvae.BetaVAE` - beta-VAE model adapted for time-series simulation,\\ - `tsgm.models.cvae.cBetaVAE` - conditional beta-VAE model for labeled and temporally labeled time-series simulation,\\ -- `tsgm.models.cvae.TimeGAN` - extended GAN-based model for time series generation. +- `tsgm.models.timegan.TimeGAN` - extended GAN-based model for time series generation. A minimalistic example of synthetic data generation with VAEs: @@ -107,8 +107,9 @@ In `tsgm.metrics`, we implemented several metrics for evaluation of generated ti - predictive consistency: `tsgm.metrics.ConsistencyMetric`, - fairness: `tsgm.metrics.DemographicParityMetric`, - privacy: `tsgm.metrics.PrivacyMembershipInferenceMetric`, +- diversity: `tsgm.metrics.EntropyMetric`, `tsgm.metrics.ShannonEntropyMetric`, `tsgm.metrics.PairwiseDistanceMetric`, - downstream effectiveness: `tsgm.metrics.DownstreamPerformanceMetric`, -- qualitative analysis: `tsgm.visualization`. +- qualitative analysis: `tsgm.utils.visualization`. See the following code for an example of using metrics: @@ -151,5 +152,3 @@ If you find the *TSGM* useful, please consider citing our paper: journal={arXiv preprint arXiv:2305.11567}, year={2023} } - - diff --git a/docs/modules/root.rst b/docs/modules/root.rst index f6b8e8c..c0bffd2 100644 --- a/docs/modules/root.rst +++ b/docs/modules/root.rst @@ -85,6 +85,13 @@ Datasets :undoc-members: +Simulators +-------------- +.. 
automodule:: tsgm.simulator + :members: + :undoc-members: + + Data Processing Utils -------------- .. automodule:: tsgm.utils.data_processing diff --git a/tsgm/metrics/metrics.py b/tsgm/metrics/metrics.py index c5c7bae..22f5a85 100644 --- a/tsgm/metrics/metrics.py +++ b/tsgm/metrics/metrics.py @@ -172,7 +172,7 @@ def __call__(self, D1: tsgm.dataset.DatasetOrTensor, D2: tsgm.dataset.DatasetOrT class PrivacyMembershipInferenceMetric(Metric): """ - The metric that measures the possibility of membership inference attacks. + The metric measures the possibility of membership inference attacks. """ def __init__(self, attacker: T.Any, metric: T.Optional[T.Callable] = None) -> None: """