* modelscope-sora news (#323)
* News/modelscope sora (#327)
* modelscope-sora news
* remove empower
* debug for gpu rank for analyser (#329)
* debug for gpu rank for analyser
* spec_numprocs -> num_proc
* Add more unittest (#304)
* add unittest env with gpu
* fix unittest yml
* add environment for unittest
* update workflow trigger
* update install step
* fix install command
* update working dir
* update container
* update working dir
* change working directory
* change working directory
* change working directory
* change working directory
* change unittest
* use test tag
* finish tag support
* support run op with different executor
* fix pre-commit
* add hf mirror
* add hf mirror
* run all test in standalone mode by default
* ignore image face ratio
* update tags
* add ray testcase
* add ray test in workflow
* update ray unittest workflow
* delete old unittest
---------
Co-authored-by: root <panxuchen>
* Add source tag (#317)
* add source tag for some mapper op
* fix no attribute 'current_tag' when executing local tests
* move op process logic from executor to base op
* fix typo
* move export outside op
* init refactor
* update analyser
* fix format
* clean up
* bring back batch mapper
* Improve fault tolerance & Fix Ray executor
* fix wrapper
* fix batched filter
* Remove use_actor as it is not compatible with the refactored OP class, unless the dataset class is refactored
* make wrappers work with unittests
* Compatible with unit tests and works with ray
* fix unittest
* fix wrappers with ray, map, filter
* unify unittests
* wrap deduplicators
* Compatible with non-batched calls
* Class-level wrappers
  - compatible with dataset.filter
  - bring back nested wrappers
* Instance-level wrappers
* Refined instance-level wrappers
  - Remove incomplete dataset.filter wrappers
  - Simplify code
  - Stack wrappers
* fix use_cuda
* Refactor dataset (#348)
* refactor dataset
* update unittest with DJDataset
* fix unittest
* update ray data load
* add test
* ray read json
* update docker image version
* actor is no longer supported
* Regress filter's stats export logic
---------
Co-authored-by: BeachWang <[email protected]>
Co-authored-by: Xuchen Pan <[email protected]>
Co-authored-by: chenhesen <[email protected]>
Co-authored-by: garyzhang99 <[email protected]>
1 parent da79345, commit b4cef5d
Showing 108 changed files with 1,012 additions and 592 deletions.
@@ -0,0 +1,65 @@
+version: '3'
+services:
+  ray-head:
+    image: data-juicer-unittest:0.2.1
+    pull_policy: never
+    command: ray start --head --dashboard-host 0.0.0.0 --include-dashboard true --block
+    environment:
+      - HF_HOME=/data/huggingface
+      - HF_ENDPOINT=https://hf-mirror.com
+      - TORCH_HOME=/data/torch
+      - NLTK_DATA=/data/nltk
+      - DATA_JUICER_CACHE_HOME=/data/dj
+      - RAY_ADDRESS=auto
+    working_dir: /workspace
+    networks:
+      - ray-network
+    volumes:
+      - huggingface_cache:/data
+      - ../../..:/workspace
+    ports:
+      - "6379:6379"
+      - "8265:8265"
+    shm_size: "64G"
+    deploy:
+      resources:
+        reservations:
+          devices:
+            - driver: nvidia
+              device_ids: ['0', '1']
+              capabilities: [gpu]
+
+  ray-worker:
+    image: data-juicer-unittest:0.2.1
+    pull_policy: never
+    command: ray start --address=ray-head:6379 --block
+    environment:
+      - HF_HOME=/data/huggingface
+      - HF_ENDPOINT=https://hf-mirror.com
+      - TORCH_HOME=/data/torch
+      - NLTK_DATA=/data/nltk
+      - DATA_JUICER_CACHE_HOME=/data/dj
+    working_dir: /workspace
+    volumes:
+      - huggingface_cache:/data
+      - ../../..:/workspace
+    depends_on:
+      - ray-head
+    networks:
+      - ray-network
+    shm_size: "64G"
+    deploy:
+      resources:
+        reservations:
+          devices:
+            - driver: nvidia
+              device_ids: ['2', '3']
+              capabilities: [gpu]
+
+networks:
+  ray-network:
+    driver: bridge
+
+volumes:
+  huggingface_cache:
+    external: true
@@ -1,58 +1,63 @@
 # This workflow will install Python dependencies, run tests and lint with a single version of Python
 # For more information see: https://docs.github.com/en/actions/automating-builds-and-tests/building-and-testing-python
 
-name: Unit Test
+name: unittest
 
-on: [push, pull_request, workflow_dispatch]
+on:
+  workflow_dispatch:
+  pull_request:
+  push:
+    branches:
+      - main
 
 permissions:
   contents: read
 
-jobs:
-  build:
-
-    runs-on: ubuntu-latest
+env:
+  ACTIONS_ALLOW_USE_UNSECURE_NODE_VERSION: true
 
+jobs:
+  unittest-single:
+    runs-on: [self-hosted, linux]
+    environment: Testing
     steps:
     - uses: actions/checkout@v3
-    - name: Check disk space
-      run: |
-        df -h
-    - name: Set up Python 3.8
-      uses: actions/setup-python@v3
       with:
-        python-version: "3.8"
-    - name: Check disk space
+        path: dj-${{ github.run_id }}
+
+    - name: Setup docker compose
+      working-directory: dj-${{ github.run_id }}/.github/workflows/docker
       run: |
-        df -h
-    - name: Install dependencies
+        docker compose up -d
+    - name: Install data-juicer
+      working-directory: dj-${{ github.run_id }}/.github/workflows/docker
       run: |
-        sudo apt-get install ffmpeg
-        python -m pip install --upgrade pip
-        pip install -v -e .[all]
-        pip install -v -e .[sandbox]
-    - name: Increase swapfile
+        docker compose exec ray-head pip install -e .\[all\]
+        docker compose exec ray-worker pip install -e .\[all\]
+    - name: Clean dataset cache
+      working-directory: dj-${{ github.run_id }}/.github/workflows/docker
       run: |
-        df -h
-        free -h
-        sudo swapoff -a
-        sudo fallocate -l 12G /mnt/swapfile
-        sudo chmod 600 /mnt/swapfile
-        sudo mkswap /mnt/swapfile
-        sudo swapon /mnt/swapfile
-        sudo swapon --show
-    - name: Clean data-juicer assets and models after cached
-      uses: webiny/[email protected]
-      with:
-        run: rm -rf ~/.cache/data_juicer
-    - name: Cache data-juicer assets and models
-      uses: actions/cache@v3
-      with:
-        path: ~/.cache/data_juicer
-        key: dj-assets-models
-    - name: Check disk space
+        docker compose exec ray-head rm -rf /data/huggingface/dataset
+    - name: Run unittest standalone
+      working-directory: dj-${{ github.run_id }}/.github/workflows/docker
+      run: |
+        docker compose exec ray-head python tests/run.py --tag standalone
+    - name: Run unittest ray
+      working-directory: dj-${{ github.run_id }}/.github/workflows/docker
       run: |
-        df -h
-    - name: Run the test
+        docker compose exec ray-head python tests/run.py --tag ray
+    - name: Remove docker compose
+      working-directory: dj-${{ github.run_id }}/.github/workflows/docker
+      if: always()
+      run: |
+        docker compose down --remove-orphans
+    - name: Cleanup workspace
+      if: always()
       run: |
-        python tests/run.py
+        rm -rf dj-${{ github.run_id }}
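
The workflow runs `tests/run.py` twice, once with `--tag standalone` and once with `--tag ray`, and the commit message notes that all tests run in standalone mode by default. The contents of `tests/run.py` are not shown on this page, so the following is only a minimal sketch of how tag-based selection could work with the standard `unittest` module. The `tagged` decorator, the `_test_tags` attribute, `select_by_tag`, and `ExampleTest` are hypothetical names invented for illustration, not the repository's actual API.

```python
import argparse
import unittest

# Hypothetical attribute name; the real tests/run.py may store tags differently.
TAG_ATTR = "_test_tags"
# Assumption taken from the commit message: untagged tests run in standalone mode.
DEFAULT_TAG = "standalone"


def tagged(*tags):
    """Hypothetical decorator that records a set of tags on a test method."""
    def decorator(func):
        setattr(func, TAG_ATTR, set(tags))
        return func
    return decorator


def iter_tests(suite):
    """Flatten a possibly nested TestSuite into individual test cases."""
    for item in suite:
        if isinstance(item, unittest.TestSuite):
            yield from iter_tests(item)
        else:
            yield item


def select_by_tag(suite, tag):
    """Keep tests whose method carries `tag`; untagged tests count as DEFAULT_TAG."""
    selected = unittest.TestSuite()
    for test in iter_tests(suite):
        method = getattr(test, test._testMethodName)
        tags = getattr(method, TAG_ATTR, {DEFAULT_TAG})
        if tag in tags:
            selected.addTest(test)
    return selected


class ExampleTest(unittest.TestCase):
    def test_local(self):  # untagged, so treated as "standalone"
        self.assertEqual(1 + 1, 2)

    @tagged("standalone", "ray")
    def test_both(self):
        self.assertTrue(True)

    @tagged("ray")
    def test_ray_only(self):
        self.assertTrue(True)


def main(argv=None):
    """Parse --tag, filter the example suite, run it, and return the result."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--tag", default=DEFAULT_TAG)
    args = parser.parse_args(argv)
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTest)
    runner = unittest.TextTestRunner(verbosity=0)
    return runner.run(select_by_tag(suite, args.tag))
```

Calling `main(["--tag", "ray"])` in this sketch runs only `test_both` and `test_ray_only`, while `main([])` falls back to the standalone tag and runs `test_local` and `test_both`. Keeping the tag on the test method rather than in a separate list means a single test can opt in to both executors.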