
Commit

developer doc done
BeachWang committed Dec 12, 2024
1 parent a0da444 commit 02f8dda
Showing 2 changed files with 10 additions and 6 deletions.
8 changes: 5 additions & 3 deletions docs/DeveloperGuide.md
@@ -209,7 +209,9 @@ __all__ = [
]
```

-4. Now you can use this new OP with custom arguments in your own config files!
+4. When an operator has package dependencies listed in `environments/science_requires.txt`, you need to add the corresponding dependency packages to the `OPS_TO_PKG` dictionary in `data_juicer/utils/auto_install_mapping.py` to support dependency installation at the operator level.
+
+5. Now you can use this new OP with custom arguments in your own config files!

```yaml
# other configs
@@ -222,7 +224,7 @@ process:
max_len: 1000
```
-5. (Strongly Recommend) It's better to add corresponding tests for your own OPs. For `TextLengthFilter` above, you would like to add `test_text_length_filter.py` into `tests/ops/filter/` directory as below.
+6. (Strongly Recommend) It's better to add corresponding tests for your own OPs. For `TextLengthFilter` above, you would like to add `test_text_length_filter.py` into `tests/ops/filter/` directory as below.

```python
import unittest
@@ -244,7 +246,7 @@ if __name__ == '__main__':
unittest.main()
```

-6. (Strongly Recommend) In order to facilitate the use of other users, we also need to update this new OP information to
+7. (Strongly Recommend) In order to facilitate the use of other users, we also need to update this new OP information to
the corresponding documents, including the following docs:
1. `configs/config_all.yaml`: this complete config file contains a list of all OPs and their arguments, serving as an
important document for users to refer to all available OPs. Therefore, after adding the new OP, we need to add it to the process
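The new step 4 in this commit asks developers to add an OP's dependencies from `environments/science_requires.txt` to the `OPS_TO_PKG` dictionary in `data_juicer/utils/auto_install_mapping.py`. The guide does not show the dictionary's format, so the following is only a minimal sketch under stated assumptions: it assumes `OPS_TO_PKG` maps an OP's registered name to a list of pip package names. The OP names `my_new_filter` and `my_new_mapper` and the helper `packages_for_ops` are hypothetical and only illustrate how a per-OP mapping lets you install just the packages that the OPs in a config actually need.

```python
from typing import Dict, Iterable, List

# Hypothetical excerpt: the real OPS_TO_PKG in data_juicer/utils/auto_install_mapping.py
# may use a different value format. Here each OP name maps to the pip packages
# (as listed in environments/science_requires.txt) that the OP needs.
OPS_TO_PKG: Dict[str, List[str]] = {
    'my_new_filter': ['nltk'],            # hypothetical new OP
    'my_new_mapper': ['nltk', 'jieba'],   # hypothetical new OP
}


def packages_for_ops(op_names: Iterable[str]) -> List[str]:
    """Hypothetical helper: collect de-duplicated packages for the given OPs, in first-seen order."""
    seen = set()
    packages: List[str] = []
    for name in op_names:
        # OPs without extra dependencies are simply absent from the mapping.
        for pkg in OPS_TO_PKG.get(name, []):
            if pkg not in seen:
                seen.add(pkg)
                packages.append(pkg)
    return packages


if __name__ == '__main__':
    # OP names as they would appear in the `process` list of a config file.
    print(packages_for_ops(['text_length_filter', 'my_new_filter', 'my_new_mapper']))
    # → ['nltk', 'jieba']
```

Under this assumed layout, an OP with no extra dependencies (such as `TextLengthFilter`) needs no entry, and a config only pulls in the packages of the OPs it lists.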
8 changes: 5 additions & 3 deletions docs/DeveloperGuide_ZH.md
@@ -202,7 +202,9 @@ __all__ = [
]
```

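-4. All done! Now you can use the newly added OP in your own config files:
+4. When an OP has package dependencies listed in `environments/science_requires.txt`, you need to add the corresponding dependency packages to `OPS_TO_PKG` in `data_juicer/utils/auto_install_mapping.py` to support dependency installation at the OP level.
+
+5. All done! Now you can use the newly added OP in your own config files: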

```yaml
# other configs
@@ -215,7 +217,7 @@ process:
max_len: 1000
```
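-5. (Strongly recommended) It is best to write unit tests for the newly added OP. For the `TextLengthFilter` OP above, we suggest implementing a test file such as `test_text_length_filter.py` in `tests/ops/filter/`:
+6. (Strongly recommended) It is best to write unit tests for the newly added OP. For the `TextLengthFilter` OP above, we suggest implementing a test file such as `test_text_length_filter.py` in `tests/ops/filter/`: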

```python
import unittest
@@ -238,7 +240,7 @@ if __name__ == '__main__':
unittest.main()
```

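-6. (Strongly recommended) To make the new OP easy for other users to use, we also need to add its information to the corresponding documents, specifically the following:
+7. (Strongly recommended) To make the new OP easy for other users to use, we also need to add its information to the corresponding documents, specifically the following: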
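1. `configs/config_all.yaml`: this complete config file keeps a list of all OPs and their arguments and serves as an important reference for users looking up the available OPs. Therefore, after adding a new OP, you need to add it to the `process` list in this file (grouped by OP type and sorted alphabetically):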

```yaml
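The `configs/config_all.yaml` step above asks for the new OP to be placed in the `process` list grouped by OP type and sorted alphabetically. A minimal sketch of a check for that ordering follows. It assumes the OP type can be read from the name suffix (as in `text_length_filter` or `document_deduplicator`) and that the groups appear in the order given by the hypothetical `DEFAULT_TYPE_ORDER`, which should be adjusted to match the actual file. It works on a plain list of OP names and does not parse the YAML itself.

```python
from typing import List, Sequence

# Assumed group order (hypothetical); adjust it to match configs/config_all.yaml.
DEFAULT_TYPE_ORDER = ('mapper', 'filter', 'deduplicator', 'selector')


def op_type(op_name: str) -> str:
    """Infer the OP type from the name suffix, e.g. 'text_length_filter' -> 'filter'."""
    return op_name.rsplit('_', 1)[-1]


def ordering_problems(op_names: Sequence[str],
                      type_order: Sequence[str] = DEFAULT_TYPE_ORDER) -> List[str]:
    """Return a list of problems; an empty list means the OPs are grouped by
    type (in `type_order`) and sorted alphabetically within each group."""
    rank = {t: i for i, t in enumerate(type_order)}
    # Names whose suffix is not a known type cannot be placed, so report them first.
    unknown = [f'unknown OP type for {name!r}' for name in op_names
               if op_type(name) not in rank]
    if unknown:
        return unknown
    # Expected order: by group rank, then by plain string comparison of the name.
    expected = sorted(op_names, key=lambda name: (rank[op_type(name)], name))
    return [f'position {i}: found {got!r}, expected {want!r}'
            for i, (got, want) in enumerate(zip(op_names, expected))
            if got != want]


if __name__ == '__main__':
    process = ['clean_email_mapper', 'text_length_filter',
               'alphanumeric_filter', 'document_deduplicator']
    for problem in ordering_problems(process):
        print(problem)
    # prints:
    # position 1: found 'text_length_filter', expected 'alphanumeric_filter'
    # position 2: found 'alphanumeric_filter', expected 'text_length_filter'
```

Comparing against a fully sorted copy keeps the check simple: any OP inserted into the wrong group or out of alphabetical order shows up as a mismatched position.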
