[python] Initial work toward PyTorch data loaders #2823

Closed
wants to merge 84 commits into from
Changes from 71 commits
Commits
dae4e98
initial commit of pytorch datapipe/loader
bkmartinjr Jul 31, 2024
a030e2f
update pre-commit versions for black, ruff and mypy; fix newly detect…
bkmartinjr Jul 31, 2024
70488e6
update comments
bkmartinjr Jul 31, 2024
8c73f0c
update workflow
bkmartinjr Jul 31, 2024
1c54c42
fix syntax for py38 support
bkmartinjr Jul 31, 2024
209a8de
add explicit target version of python 3.8 to mypy and ruff; fix lint
bkmartinjr Jul 31, 2024
80e18da
Merge branch 'bkmartinjr/update-pre-commit-versions' into bkmartinjr/…
bkmartinjr Jul 31, 2024
7734d15
more lint
bkmartinjr Jul 31, 2024
1eca542
Merge branch 'main' into bkmartinjr/experimentdatapipe
bkmartinjr Jul 31, 2024
81110b3
lint
bkmartinjr Jul 31, 2024
d4384cd
add test
bkmartinjr Jul 31, 2024
c6d832b
merge with main
bkmartinjr Jul 31, 2024
9108d95
add notebook to demonstrate pytorch experimentdatapipe
bkmartinjr Jul 31, 2024
db58b75
add /tmp/ to .gitignore
bkmartinjr Jul 31, 2024
bc1643f
fix typos
bkmartinjr Aug 1, 2024
485e4e8
rework for performance
bkmartinjr Aug 20, 2024
21b4132
merge with main
bkmartinjr Aug 20, 2024
1316337
tuning
bkmartinjr Aug 21, 2024
9ddc85f
tweaks, checkpoint
bkmartinjr Aug 21, 2024
c16a56b
lint
bkmartinjr Aug 21, 2024
7438524
py 3.8 lint
bkmartinjr Aug 22, 2024
8ef682c
expand test coverage
bkmartinjr Aug 22, 2024
9a68b7e
rework io and shuffle buffer size params
bkmartinjr Aug 22, 2024
2a1aff5
cleanup notebook
bkmartinjr Aug 23, 2024
4e9180a
lint
bkmartinjr Aug 24, 2024
33fae47
remove encoders; more perf work
bkmartinjr Aug 24, 2024
c185cd9
Merge branch 'main' into bkmartinjr/experimentdatapipe
bkmartinjr Aug 24, 2024
4677074
reorganize into separate python package
bkmartinjr Aug 25, 2024
a920f62
add CI
bkmartinjr Aug 25, 2024
3c3d278
fix name
bkmartinjr Aug 25, 2024
a54148c
add more paths to CI
bkmartinjr Aug 25, 2024
f6d84eb
fix typo in ci
bkmartinjr Aug 25, 2024
7f73184
fix a second typo in ci
bkmartinjr Aug 25, 2024
095fd02
set working dir in CI
bkmartinjr Aug 25, 2024
ba46743
make batched 3.12 compat
bkmartinjr Aug 25, 2024
278ae13
debugging pre-commit failure
bkmartinjr Aug 25, 2024
34ef84d
lint, lint, lint
bkmartinjr Aug 25, 2024
c8ecf43
more CI debugging
bkmartinjr Aug 25, 2024
2c499e2
add build test to CI
bkmartinjr Aug 25, 2024
2681cdf
add code coverage
bkmartinjr Aug 25, 2024
0e32457
update GHA
bkmartinjr Aug 25, 2024
5dc6cfd
test TypeAlias
bkmartinjr Aug 25, 2024
45edc7c
add missing dependencies
bkmartinjr Aug 25, 2024
9ff0313
extend tests
bkmartinjr Aug 25, 2024
5658757
remove coverage reporting from CI for now
bkmartinjr Aug 25, 2024
c71e03d
docstrings
bkmartinjr Aug 25, 2024
05774d5
more file organization
bkmartinjr Aug 25, 2024
bea3085
add missing test
bkmartinjr Aug 25, 2024
9295f85
re-run notebook
bkmartinjr Aug 25, 2024
525aff8
revert change to type ignore statement
bkmartinjr Aug 25, 2024
db36708
clarify comment
bkmartinjr Aug 25, 2024
a6261d6
update changelog
bkmartinjr Aug 25, 2024
44194b9
add collate unit test
bkmartinjr Aug 26, 2024
39f2f4a
clean up experiment_dataloader function
bkmartinjr Aug 26, 2024
7484315
docstrings
bkmartinjr Aug 27, 2024
cb4e408
fix typo in notebook name (thanks Ryan!)
bkmartinjr Aug 28, 2024
d78fcfd
checkpoint updates
bkmartinjr Aug 30, 2024
91f8ed1
Merge branch 'main' into bkmartinjr/experimentdatapipe
bkmartinjr Aug 30, 2024
3ffc511
update tests to include _CSR tests
bkmartinjr Aug 30, 2024
80fb71e
fix typo in method name
bkmartinjr Aug 30, 2024
118d532
tuning
bkmartinjr Aug 30, 2024
f49c150
update demo notebook
bkmartinjr Aug 30, 2024
c89c64d
add to README
bkmartinjr Aug 30, 2024
da81a68
concurrency tweak
bkmartinjr Aug 31, 2024
c8d7a68
additional memory reductions
bkmartinjr Aug 31, 2024
e5873a2
DDP/multi-GPU support
bkmartinjr Sep 4, 2024
ce6426b
add further concurrency to CSR construction
bkmartinjr Sep 6, 2024
de44410
Merge branch 'main' into bkmartinjr/experimentdatapipe
bkmartinjr Sep 6, 2024
33e8ff6
cleanup
bkmartinjr Sep 10, 2024
0b73c23
Merge branch 'main' into bkmartinjr/experimentdatapipe
bkmartinjr Sep 12, 2024
b6d6230
fix multi-gpu hang due to incorrect __len__ return value
bkmartinjr Sep 13, 2024
23f7119
compat with Lightning
bkmartinjr Sep 14, 2024
d26d13f
PR review edits
bkmartinjr Sep 14, 2024
3926beb
merge with main
bkmartinjr Sep 14, 2024
5dae82a
formatting
bkmartinjr Sep 14, 2024
fdf6a90
add py.typed to package
bkmartinjr Sep 16, 2024
0806d99
add sparse support
bkmartinjr Sep 16, 2024
bb8cc3a
Merge branch 'main' into bkmartinjr/experimentdatapipe
bkmartinjr Sep 16, 2024
614b0af
start draft of Ligtning notebook
bkmartinjr Sep 16, 2024
3ff42e1
lint
bkmartinjr Sep 16, 2024
17f2260
update notebook for lightning
bkmartinjr Sep 17, 2024
e66b4c2
run notebooks
bkmartinjr Sep 17, 2024
38f4e41
fix RNG state bug in shuffle; add multi-worker notebook
bkmartinjr Sep 18, 2024
124a510
Merge branch 'main' into bkmartinjr/experimentdatapipe
bkmartinjr Sep 18, 2024
96 changes: 96 additions & 0 deletions .github/workflows/python-tiledbsoma-ml.yml
@@ -0,0 +1,96 @@
+name: python-tiledbsoma-ml
+
+on:
+  pull_request:
+    branches: ["*"]
+    paths:
+      - "!**"
+      - "other_packages/python/tiledbsoma_ml/**"
+      - ".github/workflows/python-tiledbsoma-ml.yml"
+
+  push:
+    branches: [main]
+    paths:
+      - "!**"
+      - "other_packages/python/tiledbsoma_ml/**"
+      - ".github/workflows/python-tiledbsoma-ml.yml"
+
+  workflow_dispatch:
+
+jobs:
+  lint:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.11"
+
+      - name: Restore pre-commit cache
+        uses: actions/cache@v4
+        with:
+          path: ~/.cache/pre-commit
+          key: pre-commit-${{ hashFiles('.pre-commit-config.yaml') }}
+
+      - name: Install pre-commit
+        run: pip -v install pre-commit
+
+      - name: Run pre-commit hooks on all files
+        run: pre-commit run -v -a
+
+  tests:
+    runs-on: ubuntu-latest
+    strategy:
+      fail-fast: false
+      matrix:
+        python-version: ["3.8", "3.9", "3.10", "3.11", "3.12"]
+    steps:
+      - uses: actions/checkout@v4
+
+      - uses: actions/setup-python@v5
+        with:
+          python-version: ${{ matrix.python-version }}
+          cache: pip
+          cache-dependency-path: python-spec/requirements-py${{ matrix.python-version }}.txt
+
+      - name: Install prereqs
+        working-directory: ./other_packages/python/tiledbsoma_ml/
+        run: |
+          pip install --upgrade pip wheel pytest pytest-cov setuptools
+          pip install .
+
+      - name: Run tests
+        run: |
+          PYTHONPATH=$(pwd)/other_packages/python/tiledbsoma_ml python -m pytest \
+            --cov=other_packages/python/tiledbsoma_ml/src \
+            --cov-report=xml other_packages/python/tiledbsoma_ml/tests \
+            -v
+
+      # - name: Report coverage to Codecov
+      #   if: ${{ matrix.python-version == '3.11' }}
+      #   uses: codecov/codecov-action@v4
+      #   with:
+      #     flags: python
+      #     # Although Codecov isn't supposed to require an auth token for public repos like this one,
+      #     # the uploader can be unreliable without one; see
+      #     # https://github.com/codecov/codecov-action/issues/557#issuecomment-1216749652
+      #     # As of this writing (8 Nov 2022) the CODECOV_TOKEN was generated by @aaronwolen in his
+      #     # Codecov settings page for this repo, then filled into the GitHub Actions secrets.
+      #     token: ${{ secrets.CODECOV_TOKEN }}
+
+  build:
+    # for now, just do a test build to ensure that it works
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.11"
+
+      - name: Do build
+        working-directory: ./other_packages/python/tiledbsoma_ml/
+        run: |
+          pip install --upgrade build pip wheel setuptools setuptools-scm
+          python -m build .
2 changes: 2 additions & 0 deletions .gitignore
@@ -28,6 +28,7 @@ examples/obj/*
 scripts/deps-staging/*
 .idea
 *temp*
+**/tmp/*
 *.DS_Store
 doc/venv
 doc/source/__pycache__
@@ -52,6 +53,7 @@ apis/python/src/tiledbsoma/libtiledb.*
 apis/python/src/tiledbsoma/libtiledbsoma.so
 apis/python/src/tiledbsoma/libtiledbsoma.dylib
 apis/python/src/tiledbsoma/pytiledbsoma.*
+**/dist/
 
 /.quarto/
 
60 changes: 45 additions & 15 deletions .pre-commit-config.yaml
@@ -1,24 +1,54 @@
+# note that many hooks are organized by the package to which they apply (files: param).
 exclude: ^doc/source/
 repos:
   - repo: https://github.com/psf/black
-    rev: "24.4.2"
+    rev: "24.8.0"
     hooks:
-    - id: black
+      - id: black
+
   - repo: https://github.com/astral-sh/ruff-pre-commit
-    rev: v0.5.5
+    rev: v0.6.2
     hooks:
-    - id: ruff
-      args: ["--config=apis/python/pyproject.toml"]
+      - id: ruff
+        name: "ruff for tiledbsoma"
+        files: "^apis/python/"
+        args: ["--config=apis/python/pyproject.toml"]
+
+      - id: ruff
+        name: "ruff for tiledbsoma_ml"
+        files: "^other_packages/python/tiledbsoma_ml/"
+        args: ["--config=other_packages/python/tiledbsoma_ml/pyproject.toml"]
+
   - repo: https://github.com/pre-commit/mirrors-mypy
     rev: v1.11.1
     hooks:
-    - id: mypy
-      additional_dependencies:
-        # Pandas types changed between 1.x and 2.x. Our setup.py permits both, but for type-checking purposes we use the
-        # Pandas 2.x types (e.g. `pd.Series[Any]`). See `_types.py` or https://github.com/single-cell-data/TileDB-SOMA/issues/2839
-        # for more info.
-        - "pandas-stubs>=2"
-        - "somacore==1.0.14"
-        - types-setuptools
-      args: ["--config-file=apis/python/pyproject.toml", "apis/python/src", "apis/python/devtools"]
-      pass_filenames: false
+      - id: mypy
+        name: "mypy for tiledbsoma"
+        files: "^apis/python/"
+        additional_dependencies:
+          # Pandas types changed between 1.x and 2.x. Our setup.py permits both, but for type-checking purposes we use the
+          # Pandas 2.x types (e.g. `pd.Series[Any]`). See `_types.py` or https://github.com/single-cell-data/TileDB-SOMA/issues/2839
+          # for more info.
+          - "pandas-stubs>=2"
+          - "somacore==1.0.14"
+          - types-setuptools
+        args:
+          [
+            "--config-file=apis/python/pyproject.toml",
+            "apis/python/src",
+            "apis/python/devtools",
+          ]
+        pass_filenames: false
+
+      - id: mypy
+        name: "mypy for tiledbsoma_ml"
+        files: "^other_packages/python/tiledbsoma_ml/"
+        args: ["--config=other_packages/python/tiledbsoma_ml/pyproject.toml"]
+        additional_dependencies:
+          - attrs
+          - types-requests
+          - pytest
+          - "pandas-stubs>=2"
+          - numpy
+          - typing_extensions
+          - types-setuptools
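
Reviewer note on the per-package hooks: pre-commit treats each hook's `files:` value (and the top-level `exclude:`) as a Python regular expression and, as far as I understand its behavior, applies it to repo-relative paths with `re.search`. The sketch below is only an illustration of that selection logic, not pre-commit's actual code. The regexes are copied from the config in this diff; the helper name `hooks_for`, the short package labels, and the example file paths are hypothetical.

```python
import re

# Patterns copied from the `files:` keys and the top-level `exclude:` in this
# PR's .pre-commit-config.yaml. The dict keys are just labels for this sketch.
HOOK_FILES = {
    "tiledbsoma": r"^apis/python/",
    "tiledbsoma_ml": r"^other_packages/python/tiledbsoma_ml/",
}
GLOBAL_EXCLUDE = r"^doc/source/"


def hooks_for(path):
    """Return the labels of the package-scoped hooks that would see `path`.

    Hypothetical helper: mimics an include/exclude regex filter, assuming
    patterns are applied to repo-relative paths with re.search.
    """
    if re.search(GLOBAL_EXCLUDE, path):
        return []
    return [label for label, pattern in HOOK_FILES.items() if re.search(pattern, path)]


# Example paths are made up for illustration.
for example in (
    "apis/python/src/tiledbsoma/__init__.py",
    "other_packages/python/tiledbsoma_ml/src/example.py",
    "doc/source/conf.py",
    "README.md",
):
    print(example, "->", hooks_for(example))
```

Under those assumptions, a file in one package is only linted and type-checked with that package's `pyproject.toml`, and files outside both trees are seen only by hooks without a `files:` filter (here, `black`).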