From 17c2c763572058d43197a8af19595398bd6eee17 Mon Sep 17 00:00:00 2001
From: Daoyuan Chen <67475544+yxdyc@users.noreply.github.com>
Date: Tue, 7 May 2024 16:59:05 +0800
Subject: [PATCH] Minor update the github io page, for KDD'24 tutorial placeholder (#310)

* init for KDD'24 tutorial placeholder
---
 docs/sphinx_doc/README.md                     |   8 +-
 docs/sphinx_doc/README_ZH.md                  |   9 +-
 .../source/_static/tutorial_kdd24.html        | 107 ++++++++++++++++++
 docs/sphinx_doc/source/conf.py                |   2 +-
 docs/sphinx_doc/source/index.rst              |  13 ++-
 5 files changed, 122 insertions(+), 17 deletions(-)
 create mode 100644 docs/sphinx_doc/source/_static/tutorial_kdd24.html

diff --git a/docs/sphinx_doc/README.md b/docs/sphinx_doc/README.md
index 43127bbeb..395e6ac41 100644
--- a/docs/sphinx_doc/README.md
+++ b/docs/sphinx_doc/README.md
@@ -17,10 +17,4 @@ pip install sphinx sphinx-autobuild sphinx_rtd_theme recommonmark
 mv build/html position_to_publish
 ```
 
-- For convenience (you don’t have to compile from scratch again), the built
-  directory (including the html files) can be download as follows:
-```bash
-# cd docs/sphinx_doc
-wget https://dail-wlcb.oss-cn-wulanchabu.aliyuncs.com/data_juicer/sphinx_API_build_0801.zip
-unzip sphinx_API_build_0801.zip
-```
+Automatic action in github can be found in [here](https://github.com/modelscope/data-juicer/blob/main/.github/workflows/deploy_sphinx_docs.yml).
\ No newline at end of file
diff --git a/docs/sphinx_doc/README_ZH.md b/docs/sphinx_doc/README_ZH.md
index e02179532..960d18b6d 100644
--- a/docs/sphinx_doc/README_ZH.md
+++ b/docs/sphinx_doc/README_ZH.md
@@ -15,10 +15,5 @@ pip install sphinx sphinx-autobuild sphinx_rtd_theme recommonmark
 mv build/html position_to_publish
 ```
 
-- 为了方便起见(不必再次从头开始编译),可以按如下方式下载构建的目录(包括 html 文件):
-
-```bash
-# cd docs/sphinx_doc
-wget https://dail-wlcb.oss-cn-wulanchabu.aliyuncs.com/data_juicer/sphinx_API_build_0801.zip
-unzip sphinx_API_build_0801.zip
-```
+Github上的自动化部署配置可参考 [该处](
+https://github.com/modelscope/data-juicer/blob/main/.github/workflows/deploy_sphinx_docs.yml).
\ No newline at end of file
diff --git a/docs/sphinx_doc/source/_static/tutorial_kdd24.html b/docs/sphinx_doc/source/_static/tutorial_kdd24.html
new file mode 100644
index 000000000..25270d5c9
--- /dev/null
+++ b/docs/sphinx_doc/source/_static/tutorial_kdd24.html
@@ -0,0 +1,107 @@
+
+
+
+
+
+
+
+ Multi-modal Data Processing for Foundation Models: Practical Guidances and Use Cases
+
+
+
+
+
+
+
+
+
+
+
+
+

KDD 2024 Hands-on Tutorial

+

Multi-modal Data Processing for Foundation Models: Practical Guidances and Use Cases

+

Date & Time: X:XX pm - Y:YY pm, August XX, 2024

+

Location: To be updated

+
+
+
In the era of foundation models, the ability to process multi-modal data efficiently and effectively has become paramount.
+In this tutorial, participants will dive into the essential techniques for processing multi-modal data. We will explore how large-scale high-quality data enhances model performance and introduce the open-sourced Data-Juicer system, designed to tackle the complexities of data variety, quality and scale.
+Attendees will gain practical experience with Data-Juicer's operators, mastering data formatting, mapping, filtering, deduplication and selection.
+A significant portion of the tutorial is dedicated to the Data-Juicer Sandbox Lab and typical use cases for static and dynamic data, including text, image, audio, and video. The lab is a playground integrated with unified models and evaluators, and facilitates experiments with data recipes that represent methodical sequences of operators and streamline the creation of scalable data processing pipelines. This experience is designed to not only solidify the concepts discussed but also to provide a space for innovation and exploration, highlighting how data recipes can be optimized and deployed in high-performance distributed environments.
+

By the end of this tutorial, attendees will be equipped with the practical knowledge and skills to navigate the complexities of multi-modal data processing. They will leave with actionable knowledge with an industrial open-source system and an enriched perspective on the importance of high-quality data in AI, poised to implement sustainable and scalable solutions in their projects.
+
+
+
+
+

+

+

Tutorial Slides
slides.pdf

+
+
+
+
+
+
+
+
+
+
+

Schedule

+
+
+
+
+
Date: August XX, 2024
+
Location: To be updated.
+
(xx min) | Introduction and Overview: Multi-modal Data Processing and the
+Data-Juicer System
+
(xx min) | Building Blocks of Data Processing: Data-Juicer’s Operators
+
(xx min) | Composing Atomic Capabilities: Data-Juicer’s Data Recipes
+
(xx min) | Exploring Data Recipes: The Data-Juicer Sandbox Lab
+
(xx min) | From Exploration to Production: High-Performance Data Factory
+
(xx min) | Static Data Use Cases: Text and Image Data Processing
+
(xx min) | Dynamic Data Use Cases: Video and Audio Data Processing
+
(xx min) | Conclusion and Resources
+

+
+
+
+
+
+
+
+

Organizers

+
We are the Data-Juicer team from Alibaba Tongyi
+ Data-Juicer
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
\ No newline at end of file
diff --git a/docs/sphinx_doc/source/conf.py b/docs/sphinx_doc/source/conf.py
index 8b5921558..9d8650ee3 100644
--- a/docs/sphinx_doc/source/conf.py
+++ b/docs/sphinx_doc/source/conf.py
@@ -7,7 +7,7 @@
 # https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information
 
 project = 'data_juicer'
-copyright = '2023, Data-Juicer Team'
+copyright = '2024, Data-Juicer Team'
 author = 'Data-Juicer Team'
 
 # The theme to use for HTML and HTML Help pages. See the documentation for
diff --git a/docs/sphinx_doc/source/index.rst b/docs/sphinx_doc/source/index.rst
index 78f525425..27fed339f 100644
--- a/docs/sphinx_doc/source/index.rst
+++ b/docs/sphinx_doc/source/index.rst
@@ -6,10 +6,19 @@
 Welcome to data-juicer's documentation!
 =======================================
 
+Tutorial
+---------------------------------------
+
+We will give a tutorial on KDD'24, Multi-modal Data Processing for Foundation Models: Practical Guidances and Use Cases, see more details `here <_static/tutorial_kdd24.html>`_!
+
+
+
+
+
 .. toctree::
    :maxdepth: 2
    :glob:
-   :caption: Data-Juicer API Reference
+   :caption: API Reference
 
    data_juicer.core
    data_juicer.ops
@@ -22,7 +31,7 @@ Welcome to data-juicer's documentation!
    data_juicer.config
    data_juicer.format
 
-Indices and tables
+Indices and Tables
 ==================
 
 * :ref:`genindex`