diff --git a/docs/DeveloperGuide.md b/docs/DeveloperGuide.md
index e736b5ade..734f1201a 100644
--- a/docs/DeveloperGuide.md
+++ b/docs/DeveloperGuide.md
@@ -209,7 +209,9 @@ __all__ = [
 ]
 ```
 
-4. Now you can use this new OP with custom arguments in your own config files!
+4. When an operator has package dependencies listed in `environments/science_requires.txt`, you need to add the corresponding dependency packages to the `OPS_TO_PKG` dictionary in `data_juicer/utils/auto_install_mapping.py` to support dependency installation at the operator level.
+
+5. Now you can use this new OP with custom arguments in your own config files!
 
 ```yaml
 # other configs
@@ -222,7 +224,7 @@ process:
       max_len: 1000
 ```
 
-5. (Strongly Recommend) It's better to add corresponding tests for your own OPs. For `TextLengthFilter` above, you would like to add `test_text_length_filter.py` into `tests/ops/filter/` directory as below.
+6. (Strongly Recommend) It's better to add corresponding tests for your own OPs. For `TextLengthFilter` above, you would like to add `test_text_length_filter.py` into `tests/ops/filter/` directory as below.
 
 ```python
 import unittest
@@ -244,7 +246,7 @@ if __name__ == '__main__':
     unittest.main()
 ```
 
-6. (Strongly Recommend) In order to facilitate the use of other users, we also need to update this new OP information to
+7. (Strongly Recommend) In order to facilitate the use of other users, we also need to update this new OP information to
 the corresponding documents, including the following docs:
    1. `configs/config_all.yaml`: this complete config file contains a list of all OPs and their arguments, serving as an
    important document for users to refer to all available OPs. Therefore, after adding the new OP, we need to add it to the process
diff --git a/docs/DeveloperGuide_ZH.md b/docs/DeveloperGuide_ZH.md
index e9d746d7c..fcc76aafe 100644
--- a/docs/DeveloperGuide_ZH.md
+++ b/docs/DeveloperGuide_ZH.md
@@ -202,7 +202,9 @@ __all__ = [
 ]
 ```
 
-4. 全部完成!现在您可以在自己的配置文件中使用新添加的算子:
+4. 算子有`environments/science_requires.txt`中列举的包依赖时,需要在`data_juicer/utils/auto_install_mapping.py`里的`OPS_TO_PKG`中添加对应的依赖包,以支持算子粒度的依赖安装。
+
+5. 全部完成!现在您可以在自己的配置文件中使用新添加的算子:
 
 ```yaml
 # other configs
@@ -215,7 +217,7 @@ process:
       max_len: 1000
 ```
 
-5. (强烈推荐)最好为新添加的算子进行单元测试。对于上面的 `TextLengthFilter` 算子,建议在 `tests/ops/filter/` 中实现如 `test_text_length_filter.py` 的测试文件:
+6. (强烈推荐)最好为新添加的算子进行单元测试。对于上面的 `TextLengthFilter` 算子,建议在 `tests/ops/filter/` 中实现如 `test_text_length_filter.py` 的测试文件:
 
 ```python
 import unittest
@@ -238,7 +240,7 @@ if __name__ == '__main__':
     unittest.main()
 ```
 
-6. (强烈推荐)为了方便其他用户使用,我们还需要将新增的算子信息更新到相应的文档中,具体包括如下文档:
+7. (强烈推荐)为了方便其他用户使用,我们还需要将新增的算子信息更新到相应的文档中,具体包括如下文档:
    1. `configs/config_all.yaml`:该全集配置文件保存了所有算子及参数的一个列表,作为用户参考可用算子的一个重要文档。因此,在新增算子后,需要将其添加到该文档process列表里(按算子类型分组并按字母序排序):
 
    ```yaml