detailed readme
AlexanderVNikitin committed Jun 7, 2024
1 parent 50a0c6e commit a5f23b9
Showing 7 changed files with 209 additions and 51 deletions.
131 changes: 88 additions & 43 deletions README.md
@@ -1,35 +1,55 @@
<div style="text-align:center">
<div align="center">
<img src="https://github.com/AlexanderVNikitin/tsgm/raw/main/docs/_static/logo.png">
</div>

<h3 align="center">
Time Series Generative Modeling (TSGM)
</h3>

<p align="center">
Create and evaluate synthetic time series datasets effortlessly
</p>

<div align="center">

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1l2VB6eUwvrxyu8iB30faGiQM5AKthc82?usp=sharing)
[![Pypi version](https://img.shields.io/pypi/v/tsgm)](https://pypi.org/project/tsgm/)
[![unit-tests](https://github.com/AlexanderVNikitin/tsgm/actions/workflows/test.yml/badge.svg?event=push)](https://github.com/AlexanderVNikitin/tsgm/actions?query=workflow%3ATests+branch%3Amain)
[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/release/python-380/)
[![codecov](https://codecov.io/gh/AlexanderVNikitin/tsgm/branch/main/graph/badge.svg?token=UD38ANZ0M1)](https://codecov.io/gh/AlexanderVNikitin/tsgm)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11567-b31b1b.svg)](https://arxiv.org/abs/2305.11567)

# Time Series Generative Modeling (TSGM)
</div>

[Documentation](https://tsgm.readthedocs.io/en/latest/) |
[Tutorials](https://github.com/AlexanderVNikitin/tsgm/tree/main/tutorials)
<p align="center">
<a href="#jigsaw-get-started">Get Started</a> •
<a href="#anchor-tutorials">Tutorials</a> •
<a href="#art-augmentations">Augmentations</a> •
<a href="#hammer-generators">Generators</a> •
<a href="#chart_with_upwards_trend-metrics">Metrics</a> •
<a href="#floppy_disk-datasets">Datasets</a> •
<a href="#hammer_and_wrench-contributing">Contributing</a> •
<a href="#mag-citing">Citing</a>
</p>

## About TSGM

TSGM is an open-source framework for synthetic time series generation and augmentation.
## :jigsaw: Get Started

The framework can be used for:
- creating synthetic data, using historical data, black-box models, or a combined approach,
- augmenting time series data,
- evaluating synthetic data with respect to consistency, privacy, downstream performance, and more.
TSGM is an open-source framework for synthetic time series dataset generation and evaluation.

The framework can be used for creating synthetic datasets (see <a href="#hammer-generators">:hammer: Generators</a>), augmenting time series data (see <a href="#art-augmentations">:art: Augmentations</a>), evaluating synthetic data with respect to consistency, privacy, downstream performance, and more (see <a href="#chart_with_upwards_trend-metrics">:chart_with_upwards_trend: Metrics</a>), and working with common time series datasets (TSGM provides easy access to more than 140 datasets; see <a href="#floppy_disk-datasets">:floppy_disk: Datasets</a>).

## Install TSGM
```
We provide:
* [Documentation](https://tsgm.readthedocs.io/en/latest/) with a complete overview of the implemented methods,
* [Tutorials](https://github.com/AlexanderVNikitin/tsgm/tree/main/tutorials) that describe practical use-cases of the framework.


### Install TSGM
```bash
pip install tsgm
```

### M1 and M2 chips:
#### M1 and M2 chips:
To install `tsgm` on Apple M1 and M2 chips:
```bash
# Install tensorflow
@@ -43,15 +63,7 @@ conda install tensorflow-probability scipy antropy statsmodels dtaidistance netw
```


## Train your generative model

- [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1l2VB6eUwvrxyu8iB30faGiQM5AKthc82?usp=sharing) Introductory Tutorial "[Getting started with TSGM](https://github.com/AlexanderVNikitin/tsgm/blob/main/tutorials/GANs/cGAN.ipynb)"
- [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1Vw9t4TlI1Nek_t6bMPyKcPPPqCiXfOK3?usp=sharing) Tutorial on using [Time Series Augmentations](https://github.com/AlexanderVNikitin/tsgm/blob/main/tutorials/augmentations.ipynb)
- [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1hubtddSX94KyLzuCTwmU6pAFBgBeiEB-?usp=sharing) Tutorial on [Evaluation of Synthetic Time Series Data](https://github.com/AlexanderVNikitin/tsgm/blob/main/tutorials/evaluation.ipynb)
- [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1wpf9WeNVj5TkUcPF6EavVx-hUCOfyvUd?usp=sharing) Tutorial on using [Multiple GPUs or TPU with TSGM](https://github.com/AlexanderVNikitin/tsgm/blob/main/tutorials/Using%20Multiple%20GPUs%20or%20TPU.ipynb)

For more examples, see [our tutorials](./tutorials).

### Train your generative model
```python
import tsgm

@@ -79,16 +91,61 @@ gan.fit(dataset, epochs=N_EPOCHS)
result = gan.generate(100)
```

## :anchor: Tutorials

## Getting started
- [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1l2VB6eUwvrxyu8iB30faGiQM5AKthc82?usp=sharing) Introductory Tutorial "[Getting started with TSGM](https://github.com/AlexanderVNikitin/tsgm/blob/main/tutorials/GANs/cGAN.ipynb)"
- [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1Vw9t4TlI1Nek_t6bMPyKcPPPqCiXfOK3?usp=sharing) Tutorial on using [Time Series Augmentations](https://github.com/AlexanderVNikitin/tsgm/blob/main/tutorials/augmentations.ipynb)
- [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1hubtddSX94KyLzuCTwmU6pAFBgBeiEB-?usp=sharing) Tutorial on [Evaluation of Synthetic Time Series Data](https://github.com/AlexanderVNikitin/tsgm/blob/main/tutorials/evaluation.ipynb)
- [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1wpf9WeNVj5TkUcPF6EavVx-hUCOfyvUd?usp=sharing) Tutorial on using [Multiple GPUs or TPU with TSGM](https://github.com/AlexanderVNikitin/tsgm/blob/main/tutorials/Using%20Multiple%20GPUs%20or%20TPU.ipynb)

We provide:
* [Documentation](https://tsgm.readthedocs.io/en/latest/) with a complete overview of the implemented methods,
* [Tutorials](https://github.com/AlexanderVNikitin/tsgm/tree/main/tutorials) that describe practical use-cases of the framework.
For more examples, see [our tutorials](./tutorials).

## :art: Augmentations
TSGM provides a number of time series augmentations.

| Augmentation | Class in TSGM | Reference |
| ------------- | ------------- | ------------- |
| Gaussian Noise / Jittering | `tsgm.augmentations.GaussianNoise` | - |
| Slice-And-Shuffle | `tsgm.augmentations.SliceAndShuffle` | - |
| Shuffle Features | `tsgm.augmentations.Shuffle` | - |
| Magnitude Warping | `tsgm.augmentations.MagnitudeWarping` | [Data Augmentation of Wearable Sensor Data for Parkinson’s Disease Monitoring using Convolutional Neural Networks](https://dl.acm.org/doi/pdf/10.1145/3136755.3136817) |
| Window Warping | `tsgm.augmentations.WindowWarping` | [Data Augmentation for Time Series Classification using Convolutional Neural Networks](https://shs.hal.science/halshs-01357973/document) |
| DTW Barycentric Averaging | `tsgm.augmentations.DTWBarycentricAveraging` | [A global averaging method for dynamic time warping, with applications to clustering.](https://www.sciencedirect.com/science/article/pii/S003132031000453X) |

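To make the first row of the table concrete, here is a minimal standard-library sketch of Gaussian noise (jittering) augmentation. It is an illustration of the technique only, not TSGM's implementation: the helper `jitter` and its parameters `n_samples`, `sigma`, and `seed` are hypothetical names chosen for this example, and it handles a single univariate series rather than a full dataset.

```python
import random


def jitter(series, n_samples=3, sigma=0.1, seed=0):
    """Return n_samples noisy copies of a univariate series.

    Hypothetical helper for illustration; not part of TSGM.
    """
    rng = random.Random(seed)  # local generator keeps the output reproducible
    return [
        # Add independent zero-mean Gaussian noise to every time step.
        [x + rng.gauss(0.0, sigma) for x in series]
        for _ in range(n_samples)
    ]


original = [0.0, 0.5, 1.0, 0.5, 0.0]
augmented = jitter(original, n_samples=3, sigma=0.1)
print(len(augmented), len(augmented[0]))
```

Each augmented copy keeps the length of the original series and differs from it only by small random perturbations. In TSGM the corresponding class is `tsgm.augmentations.GaussianNoise`; see the documentation for its actual interface.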
## 💾 Datasets
## :hammer: Generators
TSGM implements several generative models for synthetic time series data.

| Method | Link to docs | Type | Notes |
| ------------- | ------------- | ------------- | ------------- |
| Structural Time Series model | [tsgm.models.sts.STS](https://tsgm.readthedocs.io/en/latest/modules/root.html#tsgm.models.sts.STS) | Data-driven | Great for modeling time series when prior knowledge is available (e.g., trend or seasonality). |
| GAN | [tsgm.models.cgan.GAN](https://tsgm.readthedocs.io/en/latest/modules/root.html#tsgm.models.cgan.GAN) | Data-driven | A generic implementation of GAN for time series generation. It can be customized with architectures for generators and discriminators. |
| ConditionalGAN | [tsgm.models.cgan.ConditionalGAN](https://tsgm.readthedocs.io/en/latest/modules/root.html#tsgm.models.cgan.ConditionalGAN) | Data-driven | A generic implementation of conditional GAN. It supports both scalar and temporal conditioning. |
| BetaVAE | [tsgm.models.cvae.BetaVAE](https://tsgm.readthedocs.io/en/latest/modules/root.html#tsgm.models.cvae.BetaVAE) | Data-driven | A generic implementation of Beta VAE for TS. The loss function is customized to work well with multi-dimensional time series. |
| cBetaVAE | [tsgm.models.cvae.cBetaVAE](https://tsgm.readthedocs.io/en/latest/modules/root.html#tsgm.models.cvae.cBetaVAE) | Data-driven | Conditional version of BetaVAE. It supports temporal and scalar conditioning. |
| TimeGAN | [tsgm.models.timegan.TimeGAN](https://tsgm.readthedocs.io/en/latest/modules/root.html#tsgm.models.timegan.TimeGAN) | Data-driven | TSGM implementation of TimeGAN from the [paper](https://papers.nips.cc/paper_files/paper/2019/hash/c9efe5f26cd17ba6216bbe2a7d26d490-Abstract.html). |
| SineConstSimulator | [tsgm.simulator.SineConstSimulator](https://tsgm.readthedocs.io/en/latest/modules/root.html#tsgm.simulator.SineConstSimulator) | Simulator-based | Simulator-based synthetic signal that switches between constant and periodic functions. |
| LotkaVolterraSimulator | [tsgm.simulator.LotkaVolterraSimulator](https://tsgm.readthedocs.io/en/latest/modules/root.html#tsgm.simulator.LotkaVolterraSimulator) | Simulator-based | Simulator-based synthetic signal generated from the Lotka-Volterra (predator-prey) equations. |
| PredictiveMaintenanceSimulator | [tsgm.simulator.PredictiveMaintenanceSimulator](https://tsgm.readthedocs.io/en/latest/modules/root.html#tsgm.simulator.PredictiveMaintenanceSimulator) | Simulator-based | Simulator of predictive maintenance with multiple pieces of equipment from the [paper](https://arxiv.org/pdf/2206.11574). |

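As an illustration of what a simulator-based generator produces, the sketch below integrates the classic Lotka-Volterra (predator-prey) equations with explicit Euler steps. It is a standalone example and makes no claim about how `tsgm.simulator.LotkaVolterraSimulator` is implemented or parameterized: the function `lotka_volterra` and the parameter names `alpha`, `beta`, `delta`, `gamma`, `dt`, and `n_steps` are assumptions chosen for this example.

```python
def lotka_volterra(prey0, predator0, alpha, beta, delta, gamma, dt, n_steps):
    """Integrate the Lotka-Volterra equations with explicit Euler steps.

    dx/dt = alpha * x - beta * x * y    (prey)
    dy/dt = delta * x * y - gamma * y   (predator)

    Hypothetical helper for illustration; not part of TSGM.
    """
    x, y = prey0, predator0
    trajectory = [(x, y)]
    for _ in range(n_steps):
        dx = (alpha * x - beta * x * y) * dt
        dy = (delta * x * y - gamma * y) * dt
        x, y = x + dx, y + dy
        trajectory.append((x, y))
    return trajectory


# A two-feature time series: (prey, predator) at each time step.
series = lotka_volterra(
    10.0, 5.0, alpha=1.1, beta=0.4, delta=0.1, gamma=0.4, dt=0.01, n_steps=1000
)
print(len(series), series[0])
```

Unlike the data-driven models in the table, a simulator needs no training data: the time series follows entirely from the chosen equations and parameters. Explicit Euler is used here only for brevity; a production simulator would more likely rely on a higher-order ODE solver.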
## :chart_with_upwards_trend: Metrics
TSGM implements many metrics for synthetic time series evaluation. For more detail on the evaluation of synthetic time series, see Section 3 of [our paper](https://arxiv.org/pdf/2305.11567).

| Metric | Link to docs | Type | Notes |
| ------------- | ------------- | ------------- | ------------- |
| Distance in the space of summary statistics | [tsgm.metrics.DistanceMetric](https://tsgm.readthedocs.io/en/latest/autoapi/tsgm/metrics/index.html#tsgm.metrics.DistanceMetric) | Distance | Calculates a set of summary statistics in the original and synthetic data, and measures the distance between those. |
| Maximum Mean Discrepancy (MMD) | [tsgm.metrics.MMDMetric](https://tsgm.readthedocs.io/en/latest/autoapi/tsgm/metrics/index.html#tsgm.metrics.MMDMetric) | Distance | This metric calculates the MMD between real and synthetic samples. |
| Discriminative Score | [tsgm.metrics.DiscriminativeMetric](https://tsgm.readthedocs.io/en/latest/autoapi/tsgm/metrics/index.html#tsgm.metrics.DiscriminativeMetric) | Distance | The DiscriminativeMetric measures the discriminative performance of a model in distinguishing between synthetic and real datasets. |
| Demographic Parity Score | [tsgm.metrics.DemographicParityMetric](https://tsgm.readthedocs.io/en/latest/autoapi/tsgm/metrics/index.html#tsgm.metrics.DemographicParityMetric) | Fairness | This metric assesses the difference in the distributions of a target variable among different groups in two datasets. Refer to [this paper](https://fairware.cs.umass.edu/papers/Verma.pdf) to learn more. |
| Predictive Parity Score | [tsgm.metrics.PredictiveParityMetric](https://tsgm.readthedocs.io/en/latest/autoapi/tsgm/metrics/index.html#tsgm.metrics.PredictiveParityMetric) | Fairness | This metric assesses the discrepancy in the predictive performance of a model among different groups in two datasets. Refer to [this paper](https://fairware.cs.umass.edu/papers/Verma.pdf) to learn more. |
| Privacy Membership Inference Attack Score | [tsgm.metrics.PrivacyMembershipInferenceMetric](https://tsgm.readthedocs.io/en/latest/autoapi/tsgm/metrics/index.html#tsgm.metrics.PrivacyMembershipInferenceMetric) | Privacy | The metric measures the possibility of membership inference attacks.|
| Spectral Entropy | [tsgm.metrics.EntropyMetric](https://tsgm.readthedocs.io/en/latest/autoapi/tsgm/metrics/index.html#tsgm.metrics.EntropyMetric) | Diversity | Calculates the spectral entropy of a dataset or tensor as a sum of individual entropies. |
| Shannon Entropy | [tsgm.metrics.ShannonEntropyMetric](https://tsgm.readthedocs.io/en/latest/autoapi/tsgm/metrics/index.html#tsgm.metrics.ShannonEntropyMetric) | Diversity | Shannon Entropy calculated over the labels of a dataset. |
| Pairwise Distance | [tsgm.metrics.PairwiseDistanceMetric](https://tsgm.readthedocs.io/en/latest/autoapi/tsgm/metrics/index.html#tsgm.metrics.PairwiseDistanceMetric) | Diversity | Measures pairwise distances in a set of time series. |
| Downstream Effectiveness | [tsgm.metrics.DownstreamPerformanceMetric](https://tsgm.readthedocs.io/en/latest/autoapi/tsgm/metrics/index.html#tsgm.metrics.DownstreamPerformanceMetric) | Downstream Effectiveness | The downstream performance metric evaluates the performance of a model on a downstream task. It returns performance gains achieved with the addition of synthetic data. |
| Qualitative Evaluation | [tsgm.utils.visualization](https://tsgm.readthedocs.io/en/latest/modules/root.html#module-tsgm.utils.visualization) | Qualitative | Various tools for visual assessment of a generated dataset. |

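To illustrate the fairness rows of the table, the sketch below computes a predictive parity gap in plain Python. It assumes one common reading of predictive parity (equal precision across groups) and compares per-group precision between a historical and a synthetic dataset. Whether `tsgm.metrics.PredictiveParityMetric` computes exactly this quantity is an assumption; the helper names are hypothetical, and the toy arrays are the ones used in `test_predictive_parity` in this commit.

```python
def precision(y_true, y_pred):
    """Precision = TP / (TP + FP); 0.0 when nothing is predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == 0)
    return tp / (tp + fp) if tp + fp else 0.0


def precision_by_group(y_true, y_pred, groups):
    """Compute precision separately for every group label."""
    result = {}
    for g in sorted(set(groups)):
        idx = [i for i, v in enumerate(groups) if v == g]
        result[g] = precision([y_true[i] for i in idx], [y_pred[i] for i in idx])
    return result


def predictive_parity_gap(y_true_hist, y_pred_hist, groups_hist,
                          y_true_synth, y_pred_synth, groups_synth):
    """Per-group precision on historical data minus precision on synthetic data."""
    hist = precision_by_group(y_true_hist, y_pred_hist, groups_hist)
    synth = precision_by_group(y_true_synth, y_pred_synth, groups_synth)
    return {g: hist[g] - synth[g] for g in hist if g in synth}


gap = predictive_parity_gap(
    [0, 1, 0, 0, 0, 1], [0, 1, 1, 0, 0, 1], [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 0, 0, 1], [1, 0, 0, 0, 0, 1], [0, 0, 0, 1, 1, 1],
)
print(gap)  # {0: 0.5, 1: 0.0}
```

A gap of zero for a group means the model is equally precise for that group on both datasets. A nonzero gap, as for group 0 here, signals that the synthetic data changes the model's predictive performance for that group.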

## :floppy_disk: Datasets
| Dataset | API | Description |
| ------------- | ------------- | ------------- |
| UCR Dataset | `tsgm.utils.UCRDataManager` | https://www.cs.ucr.edu/%7Eeamonn/time_series_data_2018/ |
@@ -102,22 +159,10 @@ We provide:
| Samples from GPs | `tsgm.utils.get_gp_samples_data()` | https://en.wikipedia.org/wiki/Gaussian_process |
| Physionet 2012 | `tsgm.utils.get_physionet2012()` | https://archive.physionet.org/pn3/challenge/2012/ |

TSGM provides an API for convenient use of many time-series datasets (currently more than 20 datasets). The comprehensive list of the datasets is available in the [documentation](https://tsgm.readthedocs.io/en/latest/guides/datasets.html).

## Augmentations
TSGM provides a number of time series augmentations.

| Augmentation | Class in TSGM | Reference |
| ------------- | ------------- | ------------- |
| Gaussian Noise / Jittering | `tsgm.augmentations.GaussianNoise` | - |
| Slice-And-Shuffle | `tsgm.augmentations.SliceAndShuffle` | - |
| Shuffle Features | `tsgm.augmentations.Shuffle` | - |
| Magnitude Warping | `tsgm.augmentations.MagnitudeWarping` | [Data Augmentation of Wearable Sensor Data for Parkinson’s Disease Monitoring using Convolutional Neural Networks](https://dl.acm.org/doi/pdf/10.1145/3136755.3136817) |
| Window Warping | `tsgm.augmentations.WindowWarping` | [Data Augmentation for Time Series Classification using Convolutional Neural Networks](https://shs.hal.science/halshs-01357973/document) |
| DTW Barycentric Averaging | `tsgm.augmentations.DTWBarycentricAveraging` | [A global averaging method for dynamic time warping, with applications to clustering.](https://www.sciencedirect.com/science/article/pii/S003132031000453X) |
TSGM provides an API for convenient use of many time-series datasets (currently more than 140 datasets). The comprehensive list of the datasets is available in the [documentation](https://tsgm.readthedocs.io/en/latest/guides/datasets.html).


## Contributing
## :hammer_and_wrench: Contributing
We appreciate all contributions. To learn more, please check [CONTRIBUTING.md](CONTRIBUTING.md).

#### For contributors
@@ -137,14 +182,14 @@ To check static typing:
mypy
```

## CLI
## :computer: CLI
We provide two CLIs for convenient synthetic data generation:
- `tsgm-gd` generates data from a stored sample,
- `tsgm-eval` evaluates the generated time series.

Use `tsgm-gd --help` or `tsgm-eval --help` for documentation.

## Citing
## :mag: Citing
If you find this repo useful, please consider citing our paper:
```
@article{
9 changes: 4 additions & 5 deletions docs/guides/introduction.rst
@@ -42,7 +42,7 @@ The training of data-driven simulators can be done via likelihood optimization,
- `tsgm.models.cgan.ConditionalGAN` - conditional GAN model for labeled and temporally labeled time-series simulation,\\
- `tsgm.models.cvae.BetaVAE` - beta-VAE model adapted for time-series simulation,\\
- `tsgm.models.cvae.cBetaVAE` - conditional beta-VAE model for labeled and temporally labeled time-series simulation,\\
- `tsgm.models.cvae.TimeGAN` - extended GAN-based model for time series generation.
- `tsgm.models.timegan.TimeGAN` - extended GAN-based model for time series generation.

A minimalistic example of synthetic data generation with VAEs:

@@ -105,10 +105,11 @@ In `tsgm.metrics`, we implemented several metrics for evaluation of generated ti

- data similarity / distance: `tsgm.metrics.DistanceMetric`, `tsgm.metrics.MMDMetric`, `tsgm.metrics.DiscriminativeMetric`,
- predictive consistency: `tsgm.metrics.ConsistencyMetric`,
- fairness: `tsgm.metrics.DemographicParityMetric`,
- fairness: `tsgm.metrics.DemographicParityMetric`, `tsgm.metrics.PredictiveParityMetric`,
- privacy: `tsgm.metrics.PrivacyMembershipInferenceMetric`,
- diversity: `tsgm.metrics.EntropyMetric`, `tsgm.metrics.ShannonEntropyMetric`, `tsgm.metrics.PairwiseDistanceMetric`,
- downstream effectiveness: `tsgm.metrics.DownstreamPerformanceMetric`,
- qualitative analysis: `tsgm.visualization`.
- qualitative analysis: `tsgm.utils.visualization`.

See the following code for an example of using metrics:

@@ -151,5 +152,3 @@ If you find the *TSGM* useful, please consider citing our paper:
journal={arXiv preprint arXiv:2305.11567},
year={2023}
}


7 changes: 7 additions & 0 deletions docs/modules/root.rst
@@ -85,6 +85,13 @@ Datasets
:undoc-members:


Simulators
--------------
.. automodule:: tsgm.simulator
:members:
:undoc-members:


Data Processing Utils
--------------
.. automodule:: tsgm.utils.data_processing
12 changes: 12 additions & 0 deletions tests/test_metrics.py
@@ -255,3 +255,15 @@ def test_demographic_parity():
2: 0,
3: -np.inf
}


def test_predictive_parity():
metric = tsgm.metrics.PredictiveParityMetric()
y_pred_hist = np.array([0, 1, 1, 0, 0, 1])
y_true_hist = np.array([0, 1, 0, 0, 0, 1])
groups_hist = np.array([0, 0, 0, 1, 1, 1])
y_true_synth = np.array([0, 0, 1, 0, 0, 1])
y_pred_synth = np.array([1, 0, 0, 0, 0, 1])
groups_synth = np.array([0, 0, 0, 1, 1, 1])
result = metric(y_true_hist, y_pred_hist, groups_hist, y_true_synth, y_pred_synth, groups_synth)
assert result[0] > result[1] and result[1] == 0
17 changes: 17 additions & 0 deletions tests/test_utils.py
@@ -11,10 +11,24 @@
import tensorflow as tf
import sklearn.metrics.pairwise
from unittest import mock
from functools import wraps

import tsgm


def skip_on(exception, reason="default"):
def decorator_func(f):
@wraps(f)
def wrapper(*args, **kwargs):
try:
return f(*args, **kwargs)
except exception:
pytest.skip(reason)

return wrapper
return decorator_func


def test_TSFeatureWiseScaler():
ts = np.array([[[0, 2], [1, 0], [1, 2]]])
scaler = tsgm.utils.TSFeatureWiseScaler()
@@ -102,19 +116,22 @@ def test_split_dataset_into_objects():
assert y.shape == (2225, 1)


@skip_on(urllib.error.HTTPError, reason="HTTPError due to connection")
def test_get_eeg():
X, y = tsgm.utils.get_eeg()

assert X.shape == (14980, 14)
assert y.shape == (14980,)


@skip_on(urllib.error.HTTPError, reason="HTTPError due to connection")
def test_get_power_consumption():
X = tsgm.utils.get_power_consumption()

assert X.shape == (2075259, 7)


@skip_on(urllib.error.HTTPError, reason="HTTPError due to connection")
def test_get_power_consumption_second_call(mocker):
X = tsgm.utils.get_power_consumption()
file_download_mock = mocker.patch('tsgm.utils.download')