Optimize OP docs for better usage (#472)
* * update tags for OPs in docs

* + add code and unit tests links for each OP in the docs

* * sync OP doc with the latest main branch
HYLcool authored Nov 5, 2024
1 parent 65d7c91 commit 6badfa8
Showing 6 changed files with 288 additions and 275 deletions.
8 changes: 4 additions & 4 deletions docs/DeveloperGuide.md
@@ -277,13 +277,13 @@ the corresponding documents, including the following docs:
```markdown
## Overview
...
-| [ Filter ]( #filter ) | 21 (+1 HERE) | Filters out low-quality samples |
+| [ Filter ]( #filter ) | 43 (+1 HERE) | Filters out low-quality samples |
...
## Filter <a name="filter"/>
...
-| suffix_filter | General | en, zh | Keeps samples with specified suffixes |
-| text_length_filter | General | en, zh | Keeps samples with total text length within the specified range |
-| token_num_filter | General | en, zh | Keeps samples with token count within the specified range |
+| text_entity_dependency_filter | ![General](https://img.shields.io/badge/General-5FBF50?style=plastic) ![Text](https://img.shields.io/badge/Text-010326?style=plastic) ![en](https://img.shields.io/badge/en-A60D1A?style=plastic) ![zh](https://img.shields.io/badge/zh-F2D6A2?style=plastic) | Keeps samples containing dependency edges for an entity in the dependency tree of the texts | [code](../data_juicer/ops/filter/text_entity_dependency_filter.py) | [tests](../tests/ops/filter/test_text_entity_dependency_filter.py) |
+| text_length_filter | ![General](https://img.shields.io/badge/General-5FBF50?style=plastic) ![Text](https://img.shields.io/badge/Text-010326?style=plastic) ![en](https://img.shields.io/badge/en-A60D1A?style=plastic) ![zh](https://img.shields.io/badge/zh-F2D6A2?style=plastic) | Keeps samples with total text length within the specified range | [code](../data_juicer/ops/filter/text_length_filter.py) | [tests](../tests/ops/filter/test_text_length_filter.py) |
+| token_num_filter | ![General](https://img.shields.io/badge/General-5FBF50?style=plastic) ![Text](https://img.shields.io/badge/Text-010326?style=plastic) ![en](https://img.shields.io/badge/en-A60D1A?style=plastic) ![zh](https://img.shields.io/badge/zh-F2D6A2?style=plastic) ![GPU](https://img.shields.io/badge/GPU-F27649?style=plastic) | Keeps samples with token count within the specified range | [code](../data_juicer/ops/filter/token_num_filter.py) | [tests](../tests/ops/filter/test_token_num_filter.py) |
...
```
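
The updated rows in this hunk follow a regular pattern: OP name, one shields.io badge per tag, description, then links to the OP's code and unit tests. As a minimal sketch of that pattern (not part of this commit), the row could be generated as below. The helper names `badge` and `op_row` are hypothetical, and the badge colors and relative paths are simply copied from the rows shown in the diff.

```python
# Hypothetical helpers for illustration only; they are not part of the
# repository. Badge colors are copied from the rows in the diff above.
BADGE_COLORS = {
    "General": "5FBF50",
    "Text": "010326",
    "en": "A60D1A",
    "zh": "F2D6A2",
    "GPU": "F27649",
}


def badge(tag: str) -> str:
    """Return the shields.io badge markdown for a single tag."""
    color = BADGE_COLORS[tag]
    return f"![{tag}](https://img.shields.io/badge/{tag}-{color}?style=plastic)"


def op_row(name: str, op_type: str, tags: list[str], description: str) -> str:
    """Build one table row: name, badges, description, code link, tests link."""
    badges = " ".join(badge(tag) for tag in tags)
    # Path layout assumed from the links that appear in the diff.
    code = f"[code](../data_juicer/ops/{op_type}/{name}.py)"
    tests = f"[tests](../tests/ops/{op_type}/test_{name}.py)"
    return f"| {name} | {badges} | {description} | {code} | {tests} |"


if __name__ == "__main__":
    print(op_row(
        "text_length_filter",
        "filter",
        ["General", "Text", "en", "zh"],
        "Keeps samples with total text length within the specified range",
    ))
```

Keeping the tag-to-color mapping in one place is a design choice that makes every OP render the same badge for the same tag; an unknown tag raises `KeyError` rather than silently producing a malformed badge.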

8 changes: 4 additions & 4 deletions docs/DeveloperGuide_ZH.md
@@ -266,13 +266,13 @@ if __name__ == '__main__':
```markdown
## Overview
...
-| [ Filter ]( #filter ) | 21 (+1 HERE) | Filters out low-quality samples |
+| [ Filter ]( #filter ) | 43 (+1 HERE) | Filters out low-quality samples |
...
## Filter <a name="filter"/>
...
-| suffix_filter | General | en, zh | Keeps samples with specified suffixes |
-| text_length_filter | General | en, zh | Keeps samples with total text length within the specified range |
-| token_num_filter | General | en, zh | Keeps samples with token count within the specified range |
+| text_entity_dependency_filter | ![General](https://img.shields.io/badge/General-5FBF50?style=plastic) ![Text](https://img.shields.io/badge/Text-010326?style=plastic) ![en](https://img.shields.io/badge/en-A60D1A?style=plastic) ![zh](https://img.shields.io/badge/zh-F2D6A2?style=plastic) | Keeps samples containing dependency edges for an entity in the dependency tree of the texts | [code](../data_juicer/ops/filter/text_entity_dependency_filter.py) | [tests](../tests/ops/filter/test_text_entity_dependency_filter.py) |
+| text_length_filter | ![General](https://img.shields.io/badge/General-5FBF50?style=plastic) ![Text](https://img.shields.io/badge/Text-010326?style=plastic) ![en](https://img.shields.io/badge/en-A60D1A?style=plastic) ![zh](https://img.shields.io/badge/zh-F2D6A2?style=plastic) | Keeps samples with total text length within the specified range | [code](../data_juicer/ops/filter/text_length_filter.py) | [tests](../tests/ops/filter/test_text_length_filter.py) |
+| token_num_filter | ![General](https://img.shields.io/badge/General-5FBF50?style=plastic) ![Text](https://img.shields.io/badge/Text-010326?style=plastic) ![en](https://img.shields.io/badge/en-A60D1A?style=plastic) ![zh](https://img.shields.io/badge/zh-F2D6A2?style=plastic) ![GPU](https://img.shields.io/badge/GPU-F27649?style=plastic) | Keeps samples with token count within the specified range | [code](../data_juicer/ops/filter/token_num_filter.py) | [tests](../tests/ops/filter/test_token_num_filter.py) |
...
```
