docs: Use sphinx-ext-toc to control examples (NVIDIA-Merlin#404)

mikemckiernan · May 9, 2022 · b6042db
1 parent 279e732

Showing 14 changed files with 284 additions and 203 deletions.
diff --git a/.gitignore b/.gitignore
@@ -66,6 +66,7 @@ docs/_build/
 docs/source/_build/
 docs/source/generated
 docs/source/README.md
+docs/source/CONTRIBUTING.md
 docs/source/examples
 
 # PyBuilder
@@ -103,4 +104,4 @@ dmypy.json
 
 
 # Experiment files
-_test.py
+_test.py
diff --git a/README.md b/README.md
@@ -1,4 +1,5 @@
-## Merlin Models 
+## Merlin Models
+
 [![PyPI version shields.io](https://img.shields.io/pypi/v/merlin-models.svg)](https://pypi.python.org/pypi/merlin-models/)
 ![GitHub License](https://img.shields.io/github/license/NVIDIA-Merlin/models)
 [![Documentation](https://img.shields.io/badge/documentation-blue.svg)](https://nvidia-merlin.github.io/models/main/)
@@ -33,12 +34,13 @@ To address the challenge, Merlin has custom, highly-optimized dataloaders to acc
 The Merlin dataloaders can lead to a speedup that is nine times faster than the same training pipeline used with the GPU.
 
 With the Merlin dataloaders, you can:
+
 - Remove bottlenecks from data loading by processing large chunks of data at a time instead of item by item.
 - Process datasets that don't fit within the GPU or CPU memory by streaming from the disk.
 - Prepare batches asynchronously into the GPU to avoid CPU-to-GPU communication.
 - Integrate easily into existing TensorFlow or PyTorch training pipelines by using a similar API.
 
-To learn about the core features of Merlin Models, see the [Models Overview](docs/source/models_overview.md) page.
+To learn about the core features of Merlin Models, see the [Models Overview](https://nvidia-merlin.github.io/models/main/models_overview.html) page.
 
 ### Installation
 
@@ -61,7 +63,7 @@ Refer to the [Merlin Containers](https://nvidia-merlin.github.io/Merlin/main/con
 
 #### Installing Merlin Models from Source
 
-Merlin Models can be installed from source by running the following commands: 
+Merlin Models can be installed from source by running the following commands:
 
 ```shell
 git clone https://github.com/NVIDIA-Merlin/models
@@ -73,7 +75,7 @@ cd models && pip install -e .
 Merlin Models makes it straightforward to define architectures that adapt to different input features.
 This adaptability is provided by building on a core feature of the NVTabular library.
 When you use NVTabular for feature engineering, NVTabular creates a schema that identifies the input features.
-You can see the `Schema` object in action by looking at the [Applying to your own dataset with Merlin Models and NVTabular](https://github.com/NVIDIA-Merlin/models/examples/02-Merlin-Models-and-NVTabular-applying-to-your-own-dataset.html) example notebook.
+You can see the `Schema` object in action by looking at the [From ETL to Training RecSys models - NVTabular and Merlin Models integrated example](https://nvidia-merlin.github.io/models/main/examples/02-Merlin-Models-and-NVTabular-integration.html) example notebook.
 
 You can easily build popular RecSys architectures like [DLRM](http://arxiv.org/abs/1906.00091), as shown in the following code sample.
 After you define the model, you can train and evaluate it with a typical Keras model.
@@ -98,11 +100,11 @@ model.fit(train, validation_data=valid, batch_size=1024)
 eval_metrics = model.evaluate(valid, batch_size=1024, return_dict=True)
 ```
 
- 1. To build the internal input layer, the model identifies them from the schema object.
-  The schema identifies the continuous features and categorical features, for which embedding tables are created.
- 2. To define the body of the architecture, MLP layers are used with configurable dimensions.
- 3. The head of the architecture is created from the chosen task, `BinaryClassificationTask` in this example.
-  The target binary feature is also inferred from the schema (i.e., tagged as 'TARGET').
+1. To build the internal input layer, the model identifies them from the schema object.
+ The schema identifies the continuous features and categorical features, for which embedding tables are created.
+2. To define the body of the architecture, MLP layers are used with configurable dimensions.
+3. The head of the architecture is created from the chosen task, `BinaryClassificationTask` in this example.
+ The target binary feature is also inferred from the schema (i.e., tagged as 'TARGET').
 
 You can find more details and information about a low-level API in our overview of the
 [Deep Learning Recommender Model](https://nvidia-merlin.github.io/models/main/models_overview.html#deep-learning-recommender-model).
@@ -118,4 +120,3 @@ The same notebooks are available in the `examples` directory from the [Merlin Mo
 If you'd like to contribute to the library directly, see the [CONTRIBUTING.md](CONTRIBUTING.md) file.
 We're particularly interested in contributions or feature requests for our feature engineering and preprocessing operations.
 To further advance our Merlin Roadmap, we encourage you to share all the details regarding your recommender system pipeline in this [survey](https://developer.nvidia.com/merlin-devzone-survey).
-
diff --git a/docs/README.md b/docs/README.md
@@ -4,7 +4,7 @@ This folder contains the scripts necessary to build the Merlin Models
 documentation. You can view the generated
 [documentation here](https://nvidia-merlin.github.io/models).
 
-# Contributing to Docs
+## Contributing to Docs
 
 Refer to the following instructions to build the docs.
 
@@ -13,46 +13,128 @@ Refer to the following instructions to build the docs.
 1. Follow the instructions to create a Python developer environment. See the
  [installation instructions](https://github.com/NVIDIA-Merlin/models).
 
-2. Install required documentation tools and extensions:
+1. Install required documentation tools and extensions:
 
- ```sh
+ ```shell
  cd models
  python3 -m virtualenv -p=python3.9 env
  source env/bin/activate
  pip install -r requirements/base.txt
  pip install -r requirements/dev.txt
  ```
 
-3. If you updated docstrings, you need to delete the `docs/source/api` directory
- and then run the following command within the `docs` directory:
+1. Build the documentation to HTML output:
 
- ```sh
- sphinx-apidoc -f -o source/api ../merlin.models
+ ```shell
+ make -C docs clean html
  ```
 
-4. Navigate to `models/docs/` and transform the documentation to HTML output:
+ This runs Sphinx in your shell and writes the HTML output to
+ `docs/build/html/`.
+
+1. Start an HTTP server and review your updates:
 
- ```sh
- make html
+ ```shell
+ python -m http.server 8000 -d docs/build/html
  ```
 
- This should run Sphinx in your shell, and output HTML in
- `build/html/index.html`
+ Navigate a web browser to the IP address or hostname of the host machine at port 8000:
 
-## Preview the documentation build
+ `http://localhost:8000`
 
-1. To view the docs build, run the following command from the `build/html`
- directory:
+ Check that your doc edits are formatted correctly and read well.
 
- ```sh
- python -m http.server
+## Decisions
 
- # or
+### Source management: README and index files
 
- python -m SimpleHTTPServer 8000
- ```
+- To preserve Sphinx's expectation that all source files are child files and directories
+ of the `docs/source` directory, other content, such as the `examples` directory is
+ copied to the source directory. You can determine which directories are copied by
+ viewing `docs/source/conf.py` and looking for the `copydirs_additional_dirs` list.
+ Directories are specified relative to the Sphinx source directory, `docs/source`.
+
+- One consequence of the preceding bullet is that any change to the original files,
+ such as adding or removing a topic, requires a similar change to the `docs/source/toc.yaml`
+ file. Updating the `docs/source/toc.yaml` file is not automatic.
+
+- Because the GitHub browsing expectation is that a `README.md` file is rendered when you
+ browse a directory, when a directory is copied, the `README.md` file is renamed to
+ `index.md` to meet the HTML web server expectation of locating an `index.html` file
+ in a directory.
+
+- When you add a notebook or other file, add it to the `docs/source/toc.yaml` file. Keep in mind that notebooks are
+ copied into the `docs/source/` directory, so the paths are relative to that location.
+ Follow the pattern that is already established in the file.
+
+### Adding links
+
+TIP: When adding a link to a method or any heading that has underscores in it, repeat
+the underscores in the link even though they are converted to hyphens in the HTML.
+
+Refer to the following examples from HugeCTR:
+
+- `../QAList.md#24-how-to-set-workspace_size_per_gpu_in_mb-and-slot_size_array`
+- `./api/python_interface.md#save_params_to_files-method`
+
+#### Docs-to-docs links
+
+There is no concern for the GitHub browsing experience for files in the `docs/source/` directory.
+You can use a relative path for the link. For example, in the HugeCTR repository, the
+following link is in the `docs/source/hugectr_user_guide.md` file and links to the
+"Build HugeCTR from Source" heading in the `docs/source/hugectr_contributor_guide.md` file:
+
+```markdown
+To build HugeCTR from scratch, refer to
+[Build HugeCTR from source code](./hugectr_contributor_guide.md#build-hugectr-from-source).
+```
+
+#### Docs-to-repository links
+
+To refer a reader to a README or program in a repository directory, state that
+the link is to the repository:
+
+```markdown
+Refer to the sample Python programs in the
+[examples/blah](https://github.com/NVIDIA-Merlin/models/tree/main/examples/blah)
+directory of the repository.
+```
+
+The idea is to let a reader know that following the link&mdash;whether from an HTML docs page or
+from browsing GitHub&mdash;results in viewing our repository on GitHub.
+
+> TIP: In the `release_notes.md` file, use the tag such as `v1.1.0` instead of `main` so that
+> the link is durable.
+
+#### Links to notebooks
+
+The notebooks are published as documentation. The few exceptions are identified in the
+`docs/source/conf.py` file in the `exclude_patterns` list:
+
+```python
+exclude_patterns = [
+ # list RST, MD, and IPYNB files to ignore here
+]
+```
+
+If the document that you link from is also published as docs, such as `release_notes.md`, then
+a relative path works both in the HTML docs page and in the repository browsing experience:
+
+```markdown
+### Some awesome feature
+
+ + ...snip...
+ + ...snip...
+ + Added the [awesome notebook](examples/awesome_notebook.ipynb) to show how to use the feature.
+```
+
+#### Links from notebooks to docs
+
+Use a link to the HTML page like the following:
 
-1. Open a web browser to the IP address or hostname of the host machine at
- port 8000.
+```markdown
+<https://nvidia-merlin.github.io/NVTabular/main/Introduction.html>
+```
 
- Check that the doc edits format correctly and read well.
+> I'd like to change this in the future. My preference would be to use a relative
+> path, but I need to research and change how Sphinx handles relative links.
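
Reviewer note on the "Source management" bullets in `docs/README.md` above: the copy-and-rename behavior they describe (copy a directory such as `examples` under the Sphinx source directory, then rename each `README.md` to `index.md`) can be illustrated with a minimal sketch. This is not the extension that the docs build actually uses. The function name `copy_dir_for_docs` and its arguments are hypothetical and exist only for illustration.

```python
import shutil
from pathlib import Path


def copy_dir_for_docs(src: Path, dest: Path) -> list:
    """Copy ``src`` to ``dest`` and rename every README.md to index.md.

    Hypothetical helper: it only mirrors the behavior described in the
    README bullets, not the real Sphinx extension configured in conf.py.
    Returns the list of renamed paths.
    """
    # Copy the whole tree; missing parent directories of dest are created.
    shutil.copytree(src, dest, dirs_exist_ok=True)

    renamed = []
    # Collect the matches first so renaming does not disturb the directory walk.
    for readme in sorted(dest.rglob("README.md")):
        target = readme.with_name("index.md")
        # Path.replace overwrites an existing index.md in the same directory.
        readme.replace(target)
        renamed.append(target)
    return renamed
```

With a layout like the one the README describes, a call such as `copy_dir_for_docs(Path("examples"), Path("docs/source/examples"))` would leave the original `examples/README.md` untouched and produce `docs/source/examples/index.md`, which is why paths in `docs/source/toc.yaml` are written relative to `docs/source`.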
diff --git a/docs/source/additional_resources.rst b/docs/source/additional_resources.rst
@@ -0,0 +1,8 @@
+Additional Resources
+====================
+
+.. toctree::
+ :maxdepth: 2
+
+ Contributing to Merlin Models <CONTRIBUTING.md>
+ GitHub Repo <https://github.com/NVIDIA-Merlin/models>
diff --git a/docs/source/api.rst b/docs/source/api.rst
@@ -185,35 +185,6 @@ Schema Functions
  merlin.models.utils.schema_utils.get_embedding_size_from_cardinality
 
 
-Data
-----
-
-.. currentmodule:: merlin.models
-
-.. autosummary::
- :toctree: generated
-
- merlin.models.data.synthetic.SyntheticData
-
-
-Loader Utility Functions
-~~~~~~~~~~~~~~~~~~~~~~~~
-
-.. autosummary::
- :toctree: generated
-
- merlin.models.loader.utils.device_mem_size
-
-Loader Utility Functions for TensorFlow
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-.. autosummary::
- :toctree: generated
-
- merlin.models.loader.tf_utils.configure_tensorflow
- merlin.models.loader.tf_utils.get_dataset_schema_from_feature_columns
-
-
 Utilities
 ---------