Add models US-GAN and GP-VAE, update docs, refactor testing cases, add cal_internal_cluster_validation_metrics() #190

Merged: 22 commits, Sep 21, 2023
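The PR title mentions a new `cal_internal_cluster_validation_metrics()` helper, but its signature and body are not shown in this diff. A hedged sketch of what such a function typically computes: internal cluster validation metrics need only the data and the predicted labels (no ground truth), and the usual trio is silhouette, Calinski-Harabasz, and Davies-Bouldin. The function name below comes from the PR title; the return shape and use of scikit-learn are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import (
    silhouette_score,
    calinski_harabasz_score,
    davies_bouldin_score,
)


def cal_internal_cluster_validation_metrics(X, labels):
    # Internal metrics judge cluster quality from the data and the
    # predicted labels alone, with no ground-truth classes required.
    return {
        "silhouette": silhouette_score(X, labels),
        "calinski_harabasz": calinski_harabasz_score(X, labels),
        "davies_bouldin": davies_bouldin_score(X, labels),
    }


rng = np.random.default_rng(0)
# two well-separated blobs, so all three metrics should look good
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(5, 0.3, (50, 2))])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
metrics = cal_internal_cluster_validation_metrics(X, labels)
print(metrics)
```

Higher silhouette and Calinski-Harabasz values and a lower Davies-Bouldin value indicate better-separated clusters.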
60 changes: 39 additions & 21 deletions .github/workflows/testing_ci.yml
@@ -15,43 +15,61 @@ jobs:
runs-on: ${{ matrix.os }}
defaults:
run:
shell: bash -l {0}
shell: bash {0}
strategy:
fail-fast: false
matrix:
os: [ubuntu-latest, windows-latest, macOS-latest]
python-version: ["3.7", "3.8", "3.9", "3.10"]
python-version: ["3.7", "3.10"]
torch-version: ["1.13.1"]

steps:
- name: Check out the repo code
uses: actions/checkout@v3

- name: Set up Conda
uses: conda-incubator/setup-miniconda@v2
- name: Determine the Python version
uses: haya14busa/action-cond@v1
id: condval
with:
activate-environment: pypots-test
python-version: ${{ matrix.python-version }}
environment-file: tests/environment_for_conda_test.yml
auto-activate-base: false
cond: ${{ matrix.python-version == 3.7 && matrix.os == 'macOS-latest' }}
# Note: the latest 3.7 subversion 3.7.17 for MacOS has "ModuleNotFoundError: No module named '_bz2'"
if_true: "3.7.16"
if_false: ${{ matrix.python-version }}

- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: ${{ steps.condval.outputs.value }}
check-latest: true
cache: pip
cache-dependency-path: |
setup.cfg

- name: Install PyTorch ${{ matrix.torch-version }}+cpu
# we have to install torch in advance because torch_sparse needs it for compilation,
# refer to https://github.com/rusty1s/pytorch_sparse/issues/156#issuecomment-1304869772 for details
run: |
which python
which pip
python -m pip install --upgrade pip
pip install torch==${{ matrix.torch-version }} -f https://download.pytorch.org/whl/cpu
python -c "import torch; print('PyTorch:', torch.__version__)"

- name: Install other dependencies
run: |
pip install pypots
pip install torch-geometric torch-scatter torch-sparse -f "https://data.pyg.org/whl/torch-${{ matrix.torch-version }}+cpu.html"
pip install -e ".[dev]"

- name: Fetch the test environment details
run: |
which python
conda info
conda list
pip list

- name: Test with pytest
run: |
# run tests separately here due to Segmentation Fault in test_clustering when run all in
# one command with `pytest` on MacOS. Bugs not caught, so this is a trade-off to avoid SF.
python -m pytest -rA tests/test_classification.py -n auto --cov=pypots --dist=loadgroup --cov-config=.coveragerc
python -m pytest -rA tests/test_imputation.py -n auto --cov=pypots --cov-append --dist=loadgroup --cov-config=.coveragerc
python -m pytest -rA tests/test_clustering.py -n auto --cov=pypots --cov-append --dist=loadgroup --cov-config=.coveragerc
python -m pytest -rA tests/test_forecasting.py -n auto --cov=pypots --cov-append --dist=loadgroup --cov-config=.coveragerc
python -m pytest -rA tests/test_optim.py -n auto --cov=pypots --cov-append --dist=loadgroup --cov-config=.coveragerc
python -m pytest -rA tests/test_data.py -n auto --cov=pypots --cov-append --dist=loadgroup --cov-config=.coveragerc
python -m pytest -rA tests/test_utils.py -n auto --cov=pypots --cov-append --dist=loadgroup --cov-config=.coveragerc
python -m pytest -rA tests/test_cli.py -n auto --cov=pypots --cov-append --dist=loadgroup --cov-config=.coveragerc
rm -rf tests/__pycache__
python -m pytest -rA tests/*/* -n auto --cov=pypots --dist=loadgroup --cov-config=.coveragerc

- name: Generate the LCOV report
run: |
@@ -61,4 +79,4 @@ jobs:
uses: coverallsapp/github-action@master
with:
github-token: ${{ secrets.GITHUB_TOKEN }}
path-to-lcov: 'coverage.lcov'
path-to-lcov: "coverage.lcov"
60 changes: 21 additions & 39 deletions .github/workflows/testing_daily.yml
@@ -10,61 +10,43 @@ jobs:
runs-on: ${{ matrix.os }}
defaults:
run:
shell: bash {0}
shell: bash -l {0}
strategy:
fail-fast: false
matrix:
os: [ubuntu-latest, windows-latest, macOS-latest]
python-version: ["3.7", "3.8", "3.9", "3.10"]
torch-version: ["1.13.1"]
python-version: ["3.7", "3.10"]

steps:
- name: Check out the repo code
uses: actions/checkout@v3

- name: Determine the Python version
uses: haya14busa/action-cond@v1
id: condval
- name: Set up Conda
uses: conda-incubator/setup-miniconda@v2
with:
cond: ${{ matrix.python-version == 3.7 && matrix.os == 'macOS-latest' }}
# Note: the latest 3.7 subversion 3.7.17 for MacOS has "ModuleNotFoundError: No module named '_bz2'"
if_true: "3.7.16"
if_false: ${{ matrix.python-version }}

- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: ${{ steps.condval.outputs.value }}
check-latest: true
cache: pip
cache-dependency-path: |
setup.cfg

- name: Install PyTorch ${{ matrix.torch-version }}+cpu
# we have to install torch in advance because torch_sparse needs it for compilation,
# refer to https://github.com/rusty1s/pytorch_sparse/issues/156#issuecomment-1304869772 for details
run: |
which python
which pip
python -m pip install --upgrade pip
pip install torch==${{ matrix.torch-version }} -f https://download.pytorch.org/whl/cpu
python -c "import torch; print('PyTorch:', torch.__version__)"

- name: Install other dependencies
run: |
pip install pypots
pip install torch-geometric torch-scatter torch-sparse -f "https://data.pyg.org/whl/torch-${{ matrix.torch-version }}+cpu.html"
pip install -e ".[dev]"
activate-environment: pypots-test
python-version: ${{ matrix.python-version }}
environment-file: tests/environment_for_conda_test.yml
auto-activate-base: false

- name: Fetch the test environment details
run: |
which python
pip list
conda info
conda list

- name: Test with pytest
run: |
coverage run --source=pypots -m pytest --ignore tests/test_training_on_multi_gpus.py
# ignore the test_training_on_multi_gpus.py because it requires multiple GPUs which are not available on GitHub Actions
# run tests separately here due to Segmentation Fault in test_clustering when run all in
# one command with `pytest` on MacOS. Bugs not caught, so this is a trade-off to avoid SF.
python -m pytest -rA tests/classification/* -n auto --cov=pypots --dist=loadgroup --cov-config=.coveragerc
python -m pytest -rA tests/imputation/* -n auto --cov=pypots --cov-append --dist=loadgroup --cov-config=.coveragerc
python -m pytest -rA tests/clustering/* -n auto --cov=pypots --cov-append --dist=loadgroup --cov-config=.coveragerc
python -m pytest -rA tests/forecasting/* -n auto --cov=pypots --cov-append --dist=loadgroup --cov-config=.coveragerc
python -m pytest -rA tests/optim/* -n auto --cov=pypots --cov-append --dist=loadgroup --cov-config=.coveragerc
python -m pytest -rA tests/data/* -n auto --cov=pypots --cov-append --dist=loadgroup --cov-config=.coveragerc
python -m pytest -rA tests/utils/* -n auto --cov=pypots --cov-append --dist=loadgroup --cov-config=.coveragerc
python -m pytest -rA tests/cli/* -n auto --cov=pypots --cov-append --dist=loadgroup --cov-config=.coveragerc

- name: Generate the LCOV report
run: |
@@ -74,4 +56,4 @@ jobs:
uses: coverallsapp/github-action@master
with:
github-token: ${{ secrets.GITHUB_TOKEN }}
path-to-lcov: "coverage.lcov"
path-to-lcov: 'coverage.lcov'
3 changes: 2 additions & 1 deletion .gitignore
@@ -14,7 +14,8 @@ docs/_build
.coverage
.pytest_cache
*__pycache__*
*testing_results*
*test*

# ignore specific kinds of files like all PDFs
*.pdf
*.ipynb
12 changes: 8 additions & 4 deletions README.md
@@ -2,7 +2,8 @@
<img src="https://pypots.com/figs/pypots_logos/PyPOTS_logo_FFBG.svg?sanitize=true" width="200" align="right">
</a>

## <p align="center">Welcome to PyPOTS</p>
<h2 align="center">Welcome to PyPOTS</h2>

**<p align="center">A Python Toolbox for Data Mining on Partially-Observed Time Series</p>**

<p align="center">
@@ -161,6 +162,8 @@ PyPOTS supports imputation, classification, clustering, and forecasting tasks on
| **Type** | **Abbr.** | **Full name of the algorithm/model/paper** | **Year** |
| Neural Net | SAITS | Self-Attention-based Imputation for Time Series [^1] | 2023 |
| Neural Net | Transformer | Attention is All you Need [^2];<br>Self-Attention-based Imputation for Time Series [^1];<br><sub>Note: proposed in [^2], and re-implemented as an imputation model in [^1].</sub> | 2017 |
| Neural Net | US-GAN | Generative Semi-supervised Learning for Multivariate Time Series Imputation [^10] | 2021 |
| Neural Net | GP-VAE | GP-VAE: Deep Probabilistic Time Series Imputation [^11] | 2020 |
| Neural Net | BRITS | Bidirectional Recurrent Imputation for Time Series [^3] | 2018 |
| Neural Net | M-RNN | Multi-directional Recurrent Neural Network [^9] | 2019 |
| Naive | LOCF | Last Observation Carried Forward | - |
@@ -253,7 +256,7 @@ We care about the feedback from our users, so we're building PyPOTS community on
If you have any suggestions or want to contribute ideas or share time-series related papers, join us and tell.
PyPOTS community is open, transparent, and surely friendly. Let's work together to build and improve PyPOTS!


[//]: # (Use APA reference style below)
[^1]: Du, W., Cote, D., & Liu, Y. (2023). [SAITS: Self-Attention-based Imputation for Time Series](https://doi.org/10.1016/j.eswa.2023.119619). *Expert systems with applications*.
[^2]: Vaswani, A., Shazeer, N.M., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., & Polosukhin, I. (2017). [Attention is All you Need](https://papers.nips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html). *NeurIPS 2017*.
[^3]: Cao, W., Wang, D., Li, J., Zhou, H., Li, L., & Li, Y. (2018). [BRITS: Bidirectional Recurrent Imputation for Time Series](https://papers.nips.cc/paper/2018/hash/734e6bfcd358e25ac1db0a4241b95651-Abstract.html). *NeurIPS 2018*.
@@ -263,12 +266,13 @@ PyPOTS community is open, transparent, and surely friendly. Let's work together
[^7]: Jong, J.D., Emon, M.A., Wu, P., Karki, R., Sood, M., Godard, P., Ahmad, A., Vrooman, H.A., Hofmann-Apitius, M., & Fröhlich, H. (2019). [Deep learning for clustering of multivariate clinical patient trajectories with missing values](https://academic.oup.com/gigascience/article/8/11/giz134/5626377). *GigaScience*.
[^8]: Chen, X., & Sun, L. (2021). [Bayesian Temporal Factorization for Multidimensional Time Series Prediction](https://arxiv.org/abs/1910.06366). *IEEE transactions on pattern analysis and machine intelligence*.
[^9]: Yoon, J., Zame, W. R., & van der Schaar, M. (2019). [Estimating Missing Data in Temporal Data Streams Using Multi-Directional Recurrent Neural Networks](https://ieeexplore.ieee.org/document/8485748). *IEEE Transactions on Biomedical Engineering*.

[^10]: Miao, X., Wu, Y., Wang, J., Gao, Y., Mao, X., & Yin, J. (2021). [Generative Semi-supervised Learning for Multivariate Time Series Imputation](https://ojs.aaai.org/index.php/AAAI/article/view/17086). *AAAI 2021*.
[^11]: Fortuin, V., Baranchuk, D., Raetsch, G. & Mandt, S.. (2020). [GP-VAE: Deep Probabilistic Time Series Imputation](https://proceedings.mlr.press/v108/fortuin20a.html). *AISTATS 2020*.

<details>
<summary>🏠 Visits</summary>
<a href="https://github.com/WenjieDu/PyPOTS">
<img alt="PyPOTS visits" align="left" src="https://hits.seeyoufarm.com/api/count/incr/badge.svg?url=https%3A%2F%2Fgithub.com%2FPyPOTS%2FPyPOTS&count_bg=%23009A0A&title_bg=%23555555&icon=&icon_color=%23E7E7E7&title=Visits%20since%20May%202022&edge_flat=false">
</a>
</details>
<br>
<br>
2 changes: 1 addition & 1 deletion docs/about_us.rst
@@ -33,5 +33,5 @@ PyPOTS exists thanks to all the nice people (sorted by contribution time) who co

.. raw:: html

<object data="https://pypots.com/figs/PyPOTS_contributors.svg">
<object data="https://pypots.com/figs/pypots_logos/PyPOTS_contributors.svg">
</object>
9 changes: 9 additions & 0 deletions docs/pypots.data.rst
@@ -10,6 +10,15 @@ pypots.data.base module
:show-inheritance:
:inherited-members:

pypots.data.saving module
-----------------------------

.. automodule:: pypots.data.saving
:members:
:undoc-members:
:show-inheritance:
:inherited-members:

pypots.data.generating module
-----------------------------

24 changes: 22 additions & 2 deletions docs/pypots.forecasting.rst
@@ -1,11 +1,31 @@
pypots.forecasting package
==========================

Subpackages
-----------

pypots.forecasting.bttf module
.. toctree::
:maxdepth: 4

pypots.forecasting.bttf
pypots.forecasting.template

Submodules
----------

pypots.forecasting.base module
------------------------------

.. automodule:: pypots.forecasting.bttf
.. automodule:: pypots.forecasting.base
:members:
:undoc-members:
:show-inheritance:
:inherited-members:

Module contents
---------------

.. automodule:: pypots.forecasting
:members:
:undoc-members:
:show-inheritance:
18 changes: 18 additions & 0 deletions docs/pypots.imputation.rst
@@ -19,6 +19,24 @@ pypots.imputation.transformer module
:show-inheritance:
:inherited-members:

pypots.imputation.usgan module
------------------------------

.. automodule:: pypots.imputation.usgan
:members:
:undoc-members:
:show-inheritance:
:inherited-members:

pypots.imputation.gpvae module
------------------------------

.. automodule:: pypots.imputation.gpvae
:members:
:undoc-members:
:show-inheritance:
:inherited-members:

pypots.imputation.brits module
------------------------------

5 changes: 3 additions & 2 deletions pypots/base.py
@@ -96,7 +96,9 @@ def _setup_device(self, device: Union[None, str, torch.device, list]):
self.device = device
elif isinstance(device, list):
if len(device) == 0:
raise ValueError("The list of devices should have at least 1 device, but got 0.")
raise ValueError(
"The list of devices should have at least 1 device, but got 0."
)
elif len(device) == 1:
return self._setup_device(device[0])
# parallely training on multiple CUDA devices
@@ -176,7 +178,6 @@ def _send_data_to_given_device(self, data):
if isinstance(self.device, torch.device): # single device
data = map(lambda x: x.to(self.device), data)
else: # parallely training on multiple devices

# randomly choose one device to balance the workload
# device = np.random.choice(self.device)

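The `pypots/base.py` hunk above reformats the empty-device-list check inside `_setup_device`. A framework-free sketch of the validation logic that hunk touches, with a stand-in function (the real method lives on the model base class and also handles `torch.device` objects and multi-GPU setup not shown here):

```python
def setup_device(device):
    """Stand-in for the list handling in `BaseModel._setup_device`:
    an empty device list is rejected, a single-element list is unwrapped
    and treated like a scalar device spec, and a longer list is kept
    as-is for parallel training."""
    if isinstance(device, list):
        if len(device) == 0:
            raise ValueError(
                "The list of devices should have at least 1 device, but got 0."
            )
        if len(device) == 1:
            # recurse so a one-element list behaves like its sole element
            return setup_device(device[0])
        return device
    return device


print(setup_device(["cuda:0"]))            # prints cuda:0
print(setup_device(["cuda:0", "cuda:1"]))  # prints ['cuda:0', 'cuda:1']
```

Unwrapping the single-element case keeps the rest of the model code free of "list of one device" special cases.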
1 change: 0 additions & 1 deletion pypots/classification/base.py
@@ -256,7 +256,6 @@ def _train_model(
training_loader: DataLoader,
val_loader: DataLoader = None,
) -> None:

# each training starts from the very beginning, so reset the loss and model dict here
self.best_loss = float("inf")
self.best_model_dict = None
2 changes: 1 addition & 1 deletion pypots/classification/grud/data.py
@@ -123,7 +123,7 @@ def _fetch_data_from_file(self, idx: int) -> Iterable:
if self.file_handle is None:
self.file_handle = self._open_file_handle()

X = torch.from_numpy(self.file_handle["X"][idx])
X = torch.from_numpy(self.file_handle["X"][idx]).to(torch.float32)
missing_mask = (~torch.isnan(X)).to(torch.float32)
X_filledLOCF = self.locf._locf_torch(X.unsqueeze(dim=0)).squeeze()
X = torch.nan_to_num(X)
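The one-line GRU-D fix above adds `.to(torch.float32)` when reading `X` from an HDF5 file: h5py typically hands back float64 arrays, and `torch.from_numpy` inherits that dtype, which then clashes with float32 model weights. A numpy-only sketch of the same load-then-cast pattern, including the mask and fill steps from the surrounding code:

```python
import numpy as np

# As read from an HDF5 file: float64, with NaNs marking missing values.
X = np.array([1.0, np.nan, 3.0])

# The fix: cast right after loading, before anything derives from X.
X32 = X.astype(np.float32)

# Same derivations as in _fetch_data_from_file: a float mask of observed
# entries, and NaNs replaced by 0 so the tensor is safe to feed to a model.
missing_mask = (~np.isnan(X32)).astype(np.float32)
X_filled = np.nan_to_num(X32)

print(X32.dtype, missing_mask.tolist(), X_filled.tolist())
```

Casting once at the load site is cheaper and less error-prone than casting every derived tensor later.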
1 change: 0 additions & 1 deletion pypots/classification/raindrop/modules.py
@@ -174,7 +174,6 @@ def forward(
edge_attr: OptTensor = None,
return_attention_weights=None,
) -> Tuple[torch.Tensor, Any]:

r"""
Args:
return_attention_weights (bool, optional): If set to :obj:`True`,
1 change: 0 additions & 1 deletion pypots/clustering/base.py
@@ -244,7 +244,6 @@ def _train_model(
training_loader: DataLoader,
val_loader: DataLoader = None,
) -> None:

"""

Parameters
1 change: 0 additions & 1 deletion pypots/clustering/crli/model.py
@@ -226,7 +226,6 @@ def __init__(
saving_path: Optional[str] = None,
model_saving_strategy: Optional[str] = "best",
):

super().__init__(
n_clusters,
batch_size,
12 changes: 9 additions & 3 deletions pypots/clustering/vader/data.py
@@ -6,12 +6,12 @@
# License: GLP-v3


from typing import Union
from typing import Union, Iterable

from ..crli.data import DatasetForCRLI
from ...data.base import BaseDataset


class DatasetForVaDER(DatasetForCRLI):
class DatasetForVaDER(BaseDataset):
"""Dataset class for model VaDER.

Parameters
@@ -45,3 +45,9 @@ def __init__(
file_type: str = "h5py",
):
super().__init__(data, return_labels, file_type)

def _fetch_data_from_array(self, idx: int) -> Iterable:
return super()._fetch_data_from_array(idx)

def _fetch_data_from_file(self, idx: int) -> Iterable:
return super()._fetch_data_from_file(idx)
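The `vader/data.py` hunk rebases `DatasetForVaDER` onto `BaseDataset` directly instead of going through `DatasetForCRLI`, and adds explicit pass-through overrides. Stub classes sketching the new shape (the class names are real; the constructor arguments and bodies below are simplified stand-ins, not the actual implementations):

```python
class BaseDataset:
    """Simplified stand-in for pypots.data.base.BaseDataset."""

    def __init__(self, data, return_labels=True, file_type="h5py"):
        self.data = data
        self.return_labels = return_labels
        self.file_type = file_type

    def _fetch_data_from_array(self, idx):
        return self.data[idx]


class DatasetForVaDER(BaseDataset):
    # The overrides in the diff simply delegate to the parent; behaviour is
    # unchanged, but the hook points VaDER relies on are now stated
    # explicitly instead of being inherited invisibly through CRLI.
    def _fetch_data_from_array(self, idx):
        return super()._fetch_data_from_array(idx)


ds = DatasetForVaDER([10, 20, 30])
print(ds._fetch_data_from_array(1))  # prints 20
```

Depending on the shared base rather than a sibling model's dataset class means future CRLI-specific changes can no longer silently alter VaDER's data loading.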