docs: Use sphinx-ext-toc to control examples (NVIDIA-Merlin#404)
mikemckiernan authored May 9, 2022
1 parent 279e732 commit b6042db
Showing 14 changed files with 284 additions and 203 deletions.
3 changes: 2 additions & 1 deletion .gitignore
@@ -66,6 +66,7 @@ docs/_build/
docs/source/_build/
docs/source/generated
docs/source/README.md
docs/source/CONTRIBUTING.md
docs/source/examples

# PyBuilder
@@ -103,4 +104,4 @@ dmypy.json


# Experiment files
_test.py
21 changes: 11 additions & 10 deletions README.md
@@ -1,4 +1,5 @@
## Merlin Models

[![PyPI version shields.io](https://img.shields.io/pypi/v/merlin-models.svg)](https://pypi.python.org/pypi/merlin-models/)
![GitHub License](https://img.shields.io/github/license/NVIDIA-Merlin/models)
[![Documentation](https://img.shields.io/badge/documentation-blue.svg)](https://nvidia-merlin.github.io/models/main/)
@@ -33,12 +34,13 @@ To address the challenge, Merlin has custom, highly-optimized dataloaders to acc
The Merlin dataloaders can make a training pipeline nine times faster than the same pipeline running on the GPU without them.

With the Merlin dataloaders, you can:

- Remove bottlenecks from data loading by processing large chunks of data at a time instead of item by item.
- Process datasets that don't fit within the GPU or CPU memory by streaming from the disk.
- Prepare batches asynchronously into the GPU to avoid CPU-to-GPU communication.
- Integrate easily into existing TensorFlow or PyTorch training pipelines by using a similar API.
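The first two ideas above, handling data in large chunks instead of item by item and streaming data that does not fit in memory, can be illustrated with a small, library-agnostic sketch. It assumes a CSV file on local disk and uses only the Python standard library; `stream_batches` is a hypothetical helper for illustration, not the Merlin dataloader implementation or API.

```python
import csv
import os
import tempfile
from itertools import islice
from typing import Iterator, List


def stream_batches(path: str, batch_size: int) -> Iterator[List[List[str]]]:
    """Hypothetical helper: yield lists of up to `batch_size` CSV rows.

    The file is read lazily, so only one batch of rows is materialized at a
    time, and the caller receives whole batches instead of one row per call.
    """
    with open(path, newline="") as f:
        reader = csv.reader(f)
        while True:
            batch = list(islice(reader, batch_size))
            if not batch:
                return
            yield batch


# Usage: write five rows to a temporary file, then read them back in batches of two.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "interactions.csv")
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows([[i, i % 2] for i in range(5)])
    sizes = [len(batch) for batch in stream_batches(path, batch_size=2)]

print(sizes)  # [2, 2, 1]
```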

To learn about the core features of Merlin Models, see the [Models Overview](docs/source/models_overview.md) page.
To learn about the core features of Merlin Models, see the [Models Overview](https://nvidia-merlin.github.io/models/main/models_overview.html) page.

### Installation

@@ -61,7 +63,7 @@ Refer to the [Merlin Containers](https://nvidia-merlin.github.io/Merlin/main/con

#### Installing Merlin Models from Source

Merlin Models can be installed from source by running the following commands:

```shell
git clone https://github.com/NVIDIA-Merlin/models
@@ -73,7 +75,7 @@ cd models && pip install -e .
Merlin Models makes it straightforward to define architectures that adapt to different input features.
This adaptability is provided by building on a core feature of the NVTabular library.
When you use NVTabular for feature engineering, NVTabular creates a schema that identifies the input features.
You can see the `Schema` object in action by looking at the [Applying to your own dataset with Merlin Models and NVTabular](https://github.com/NVIDIA-Merlin/models/examples/02-Merlin-Models-and-NVTabular-applying-to-your-own-dataset.html) example notebook.
You can see the `Schema` object in action by looking at the [From ETL to Training RecSys models - NVTabular and Merlin Models integrated example](https://nvidia-merlin.github.io/models/main/examples/02-Merlin-Models-and-NVTabular-integration.html) example notebook.

You can easily build popular RecSys architectures like [DLRM](http://arxiv.org/abs/1906.00091), as shown in the following code sample.
After you define the model, you can train and evaluate it as you would a typical Keras model.
@@ -98,11 +100,11 @@ model.fit(train, validation_data=valid, batch_size=1024)
eval_metrics = model.evaluate(valid, batch_size=1024, return_dict=True)
```

1. To build the internal input layer, the model identifies the input features from the schema object.
The schema identifies the continuous features and categorical features, for which embedding tables are created.
2. To define the body of the architecture, MLP layers are used with configurable dimensions.
3. The head of the architecture is created from the chosen task, `BinaryClassificationTask` in this example.
The target binary feature is also inferred from the schema (i.e., tagged as 'TARGET').
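How the steps above lean on the schema can be sketched in plain Python. The `Column` class, the tag strings, and `select_by_tag` are hypothetical stand-ins chosen for illustration; they are not the Merlin `Schema` API. The point is that the model finds its inputs and its target by tag rather than by hard-coded column names.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List


@dataclass(frozen=True)
class Column:
    """Hypothetical stand-in for one feature entry in a schema."""
    name: str
    tags: FrozenSet[str] = field(default_factory=frozenset)


def select_by_tag(schema: List[Column], tag: str) -> List[str]:
    """Return the names of the columns that carry `tag`, in schema order."""
    return [col.name for col in schema if tag in col.tags]


schema = [
    Column("user_id", frozenset({"CATEGORICAL"})),
    Column("item_id", frozenset({"CATEGORICAL"})),
    Column("price", frozenset({"CONTINUOUS"})),
    Column("click", frozenset({"TARGET", "BINARY_CLASSIFICATION"})),
]

# Step 1: the input layer needs the categorical features (one embedding table each)
# and the continuous features.
categorical = select_by_tag(schema, "CATEGORICAL")  # ['user_id', 'item_id']
continuous = select_by_tag(schema, "CONTINUOUS")    # ['price']
# Step 3: the binary target for the task head is found by its tag.
targets = select_by_tag(schema, "TARGET")           # ['click']
```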

You can find more details and information about a low-level API in our overview of the
[Deep Learning Recommender Model](https://nvidia-merlin.github.io/models/main/models_overview.html#deep-learning-recommender-model).
@@ -118,4 +120,3 @@ The same notebooks are available in the `examples` directory from the [Merlin Mo
If you'd like to contribute to the library directly, see the [CONTRIBUTING.md](CONTRIBUTING.md) file.
We're particularly interested in contributions or feature requests for our feature engineering and preprocessing operations.
To further advance our Merlin Roadmap, we encourage you to share all the details regarding your recommender system pipeline in this [survey](https://developer.nvidia.com/merlin-devzone-survey).

128 changes: 105 additions & 23 deletions docs/README.md
@@ -4,7 +4,7 @@ This folder contains the scripts necessary to build the Merlin Models
documentation. You can view the generated
[documentation here](https://nvidia-merlin.github.io/models).

# Contributing to Docs
## Contributing to Docs

Refer to the following instructions to build the docs.

@@ -13,46 +13,128 @@ Refer to the following instructions to build the docs.
1. Follow the instructions to create a Python developer environment. See the
[installation instructions](https://github.com/NVIDIA-Merlin/models).

2. Install required documentation tools and extensions:
1. Install required documentation tools and extensions:

```sh
```shell
cd models
python3 -m virtualenv -p=python3.9 env
source env/bin/activate
pip install -r requirements/base.txt
pip install -r requirements/dev.txt
```

3. If you updated docstrings, you need to delete the `docs/source/api` directory
and then run the following command within the `docs` directory:
1. Build the documentation to HTML output:

```sh
sphinx-apidoc -f -o source/api ../merlin.models
```shell
make -C docs clean html
```

4. Navigate to `models/docs/` and transform the documentation to HTML output:
This should run Sphinx in your shell and write the HTML output to
`docs/build/html/`.

1. Start an HTTP server and review your updates:

```sh
make html
```shell
python -m http.server 8000 -d docs/build/html
```

This should run Sphinx in your shell, and output HTML in
`build/html/index.html`
Navigate a web browser to the IP address or hostname of the host machine at port 8000:

## Preview the documentation build
`http://localhost:8000`

1. To view the docs build, run the following command from the `build/html`
directory:
Check that your docs edits are formatted correctly and read well.

```sh
python -m http.server
## Decisions

# or
### Source management: README and index files

python -m SimpleHTTPServer 8000
```
- To preserve Sphinx's expectation that all source files are child files and directories
of the `docs/source` directory, other content, such as the `examples` directory, is
copied to the source directory. You can determine which directories are copied by
viewing `docs/source/conf.py` and looking for the `copydirs_additional_dirs` list.
Directories are specified relative to the Sphinx source directory, `docs/source`.
- One consequence of the preceding bullet is that any change to the original files,
such as adding or removing a topic, requires a similar change to the `docs/source/toc.yaml`
file. Updating the `docs/source/toc.yaml` file is not automatic.
- GitHub renders a `README.md` file when you browse a directory, but an HTML web server
expects to find an `index.html` file in a directory. To meet both expectations, when a
directory is copied, its `README.md` file is renamed to `index.md`.
- When you add a topic, add its file to the `docs/source/toc.yaml` file. Keep in mind that notebooks are
copied into the `docs/source/` directory, so the paths are relative to that location.
Follow the pattern that is already established and you'll be fine.
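The copy-and-rename behavior described in these bullets can be sketched as follows. `copy_dir_for_sphinx` is a hypothetical helper written for illustration, assuming a directory such as `examples` is copied into `docs/source`; the real behavior is driven by the `copydirs_additional_dirs` list in `docs/source/conf.py` and may differ in detail. As noted above, `docs/source/toc.yaml` still has to be updated by hand.

```python
import shutil
import tempfile
from pathlib import Path


def copy_dir_for_sphinx(src: Path, source_dir: Path) -> Path:
    """Hypothetical helper: copy `src` under `source_dir`, renaming README.md to index.md."""
    dest = source_dir / src.name
    if dest.exists():
        shutil.rmtree(dest)  # start from a clean copy on every build
    shutil.copytree(src, dest)
    # GitHub renders README.md for a directory; a web server looks for index.html.
    for readme in list(dest.rglob("README.md")):
        readme.rename(readme.with_name("index.md"))
    return dest


# Usage: build a small throwaway tree and copy it into a fake docs/source directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "examples" / "getting-started").mkdir(parents=True)
    (root / "examples" / "README.md").write_text("# Examples\n")
    (root / "examples" / "getting-started" / "README.md").write_text("# Getting started\n")
    (root / "examples" / "01-demo.ipynb").write_text("{}\n")
    dest = copy_dir_for_sphinx(root / "examples", root / "docs" / "source")
    copied = sorted(p.relative_to(dest).as_posix() for p in dest.rglob("*") if p.is_file())

print(copied)  # ['01-demo.ipynb', 'getting-started/index.md', 'index.md']
```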

### Adding links

TIP: When adding a link to a method or any heading that has underscores in it, repeat
the underscores in the link even though they are converted to hyphens in the HTML.

Refer to the following examples from HugeCTR:

- `../QAList.md#24-how-to-set-workspace_size_per_gpu_in_mb-and-slot_size_array`
- `./api/python_interface.md#save_params_to_files-method`
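The anchors in these examples follow GitHub's heading-to-anchor convention, which keeps underscores. A minimal approximation is sketched below, assuming the rule is: lowercase the heading, drop punctuation other than hyphens and underscores, and replace each space with a hyphen. GitHub's real rules have more edge cases (for example, duplicate headings get a numeric suffix), and the heading text passed in here is a guess reconstructed from the first example link.

```python
import re


def github_anchor(heading: str) -> str:
    """Approximate the anchor that GitHub generates for a Markdown heading.

    Underscores are regex word characters, so they survive the filtering step.
    """
    slug = heading.strip().lower()
    slug = re.sub(r"[^\w\- ]", "", slug)  # drop '.', '?', and other punctuation
    return slug.replace(" ", "-")


# Hypothetical heading text that would produce the first example anchor.
first = github_anchor("2.4 How to set workspace_size_per_gpu_in_mb and slot_size_array?")
second = github_anchor("save_params_to_files method")
print(first)   # 24-how-to-set-workspace_size_per_gpu_in_mb-and-slot_size_array
print(second)  # save_params_to_files-method
```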

#### Docs-to-docs links

There is no concern for the GitHub browsing experience for files in the `docs/source/` directory.
You can use a relative path for the link. For example, in the HugeCTR repository, the
following link is in the `docs/source/hugectr_user_guide.md` file and links to the
"Build HugeCTR from Source" heading in the `docs/source/hugectr_contributor_guide.md` file:

```markdown
To build HugeCTR from scratch, refer to
[Build HugeCTR from source code](./hugectr_contributor_guide.md#build-hugectr-from-source).
```
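Because both files live under `docs/source/`, the relative part of such a link can be computed from the two paths. `relative_link` is a hypothetical helper built on `posixpath.relpath`; it adds a `./` prefix for targets in the same or a child directory to match the style of the example above.

```python
import posixpath


def relative_link(from_file: str, to_file: str, anchor: str = "") -> str:
    """Hypothetical helper: build a Markdown link target from one docs file to another."""
    rel = posixpath.relpath(to_file, posixpath.dirname(from_file))
    if not rel.startswith("../"):
        rel = "./" + rel  # match the "./file.md" style used in these docs
    return f"{rel}#{anchor}" if anchor else rel


link = relative_link(
    "docs/source/hugectr_user_guide.md",
    "docs/source/hugectr_contributor_guide.md",
    "build-hugectr-from-source",
)
up = relative_link("docs/source/api/index.md", "docs/source/README.md")
print(link)  # ./hugectr_contributor_guide.md#build-hugectr-from-source
print(up)    # ../README.md
```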

#### Docs-to-repository links

To direct a reader to a README or program in a repository directory, state that
the link is to the repository:

```markdown
Refer to the sample Python programs in the
[examples/blah](https://github.com/NVIDIA-Merlin/models/tree/main/examples/blah)
directory of the repository.
```

The idea is to let a reader know that following the link, whether from an HTML docs page or
while browsing GitHub, opens our repository on GitHub.

> TIP: In the `release_notes.md` file, use the tag such as `v1.1.0` instead of `main` so that
> the link is durable.
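A small helper can keep these repository links consistent and make the tag-versus-`main` choice explicit. `repo_tree_url` is a hypothetical sketch; the repository slug and the `tree/<ref>/<path>` URL shape are taken from the example link above.

```python
def repo_tree_url(path: str, ref: str = "main", repo: str = "NVIDIA-Merlin/models") -> str:
    """Hypothetical helper: URL for browsing a repository directory on GitHub.

    Pass a release tag such as "v1.1.0" as `ref` in release notes so the link
    keeps pointing at the files as they were in that release.
    """
    return f"https://github.com/{repo}/tree/{ref}/{path.strip('/')}"


latest = repo_tree_url("examples/blah")
pinned = repo_tree_url("/examples/blah/", ref="v1.1.0")
print(latest)  # https://github.com/NVIDIA-Merlin/models/tree/main/examples/blah
print(pinned)  # https://github.com/NVIDIA-Merlin/models/tree/v1.1.0/examples/blah
```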

#### Links to notebooks

The notebooks are published as documentation. The few exceptions are identified in the
`docs/source/conf.py` file in the `exclude_patterns` list:

```python
exclude_patterns = [
# list RST, MD, and IPYNB files to ignore here
]
```
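To check whether a given notebook will be published, you can compare its path with the patterns in that list. The sketch below fills the list with hypothetical entries and uses `fnmatch.fnmatchcase` only as an approximation: Sphinx matches these patterns against paths relative to the source directory with its own glob rules, in which `*` does not match a `/` and `**` does, whereas `fnmatchcase` lets `*` match `/` as well.

```python
from fnmatch import fnmatchcase
from typing import List

# Hypothetical entries; the real list is in docs/source/conf.py.
exclude_patterns = [
    "examples/drafts/*",
    "examples/scratch.ipynb",
]


def is_published(path: str, patterns: List[str]) -> bool:
    """Approximate check: True unless `path` (relative to docs/source) matches a pattern."""
    return not any(fnmatchcase(path, pattern) for pattern in patterns)


print(is_published("examples/01-getting-started.ipynb", exclude_patterns))  # True
print(is_published("examples/drafts/idea.ipynb", exclude_patterns))         # False
print(is_published("examples/scratch.ipynb", exclude_patterns))             # False
```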

If the document that you link from is also published as docs, such as `release_notes.md`, then
a relative path works both in the HTML docs page and in the repository browsing experience:

```markdown
### Some awesome feature
+ ...snip...
+ ...snip...
+ Added the [awesome notebook](examples/awesome_notebook.ipynb) to show how to use the feature.
```

#### Links from notebooks to docs

Use a link to the HTML page like the following:

1. Open a web browser to the IP address or hostname of the host machine at
port 8000.
```markdown
<https://nvidia-merlin.github.io/NVTabular/main/Introduction.html>
```

Check that the doc edits format correctly and read well.
> I'd like to change this in the future. My preference would be to use a relative
> path, but I need to research and change how Sphinx handles relative links.
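The HTML URL for a page can be derived from its source path. `docs_page_url` is a hypothetical sketch that assumes the `https://nvidia-merlin.github.io/<project>/<version>/` layout shown in the example link, that a source file such as `page.md` or `page.ipynb` is published as `page.html`, and that a copied `README.md` becomes `index.html` because it is renamed to `index.md`.

```python
from pathlib import PurePosixPath


def docs_page_url(project: str, source_path: str, version: str = "main") -> str:
    """Hypothetical helper: published HTML URL for a file under docs/source."""
    page = PurePosixPath(source_path)
    if page.name == "README.md":
        page = page.with_name("index.md")  # copied READMEs are renamed to index.md
    page = page.with_suffix(".html")
    return f"https://nvidia-merlin.github.io/{project}/{version}/{page.as_posix()}"


intro = docs_page_url("NVTabular", "Introduction.md")
notebook = docs_page_url("models", "examples/02-Merlin-Models-and-NVTabular-integration.ipynb")
index = docs_page_url("models", "examples/README.md")
print(intro)     # https://nvidia-merlin.github.io/NVTabular/main/Introduction.html
print(notebook)  # https://nvidia-merlin.github.io/models/main/examples/02-Merlin-Models-and-NVTabular-integration.html
print(index)     # https://nvidia-merlin.github.io/models/main/examples/index.html
```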
8 changes: 8 additions & 0 deletions docs/source/additional_resources.rst
@@ -0,0 +1,8 @@
Additional Resources
====================

.. toctree::
   :maxdepth: 2

   Contributing to Merlin Models <CONTRIBUTING.md>
   GitHub Repo <https://github.com/NVIDIA-Merlin/models>
29 changes: 0 additions & 29 deletions docs/source/api.rst
@@ -185,35 +185,6 @@ Schema Functions
   merlin.models.utils.schema_utils.get_embedding_size_from_cardinality


Data
----

.. currentmodule:: merlin.models

.. autosummary::
   :toctree: generated

   merlin.models.data.synthetic.SyntheticData


Loader Utility Functions
~~~~~~~~~~~~~~~~~~~~~~~~

.. autosummary::
   :toctree: generated

   merlin.models.loader.utils.device_mem_size

Loader Utility Functions for TensorFlow
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autosummary::
   :toctree: generated

   merlin.models.loader.tf_utils.configure_tensorflow
   merlin.models.loader.tf_utils.get_dataset_schema_from_feature_columns


Utilities
---------
