Generate unified Python/C++ docs #13846

Merged
merged 80 commits into from
Jan 17, 2024
80 commits
2ebf609
Add breathe dep
vyasr Aug 9, 2023
efdd554
Add all libcudf doc pages
vyasr Aug 9, 2023
a2a5598
Remove extraneous file
vyasr Aug 9, 2023
91db8c7
Temporarily allow building with warnings so that CI can complete and …
vyasr Aug 10, 2023
0ec51ea
Temporarily disable CI doxygen check
vyasr Aug 10, 2023
0aee382
Fix cmake format
vyasr Aug 10, 2023
fd6f9e7
Start handling more missing refs
vyasr Nov 8, 2023
3229d52
Add lexer for pseudocode
vyasr Nov 8, 2023
ec0d1ba
Add the default stream to a group and link it
vyasr Nov 8, 2023
fe34bf3
Add a few more types to ignore
vyasr Nov 9, 2023
86ca9d5
Add extra intersphinx lookup step
vyasr Nov 9, 2023
884c5b8
Add more types to ignore
vyasr Nov 9, 2023
269f7ba
Add more robust logic for parsing namespaces
vyasr Nov 9, 2023
4e25e15
Add expressions to Sphinx
vyasr Nov 10, 2023
81eb699
Ignore detail APIs for now
vyasr Nov 10, 2023
327ec95
Add io_types to Sphinx docs
vyasr Nov 10, 2023
e100ee4
Also ignore md_regex
vyasr Nov 21, 2023
740c4f7
Breathe doesn't support deprecated tag.
vyasr Nov 21, 2023
3c610ed
Add anchors for namespaces
vyasr Nov 21, 2023
fbb1396
Add md_regex page
vyasr Nov 30, 2023
2952b4b
Also search the numeric namespace
vyasr Nov 30, 2023
7383e92
Add range_window_bounds to group
vyasr Nov 30, 2023
4ba01d3
Add nvtext namespace and clean up namespace logic
vyasr Nov 30, 2023
fcc9fc3
Ignore kafka objects
vyasr Nov 30, 2023
a182b49
Make sure to use the template-stripped reftarget when searching inter…
vyasr Nov 30, 2023
5ed9a0d
Add spans to Sphinx
vyasr Nov 30, 2023
0850b1c
Ignore dlmanagedtensor for now
vyasr Nov 30, 2023
7dcf8f9
Add tdigest to Sphinx
vyasr Nov 30, 2023
2c111eb
Ignore char_utf8
vyasr Nov 30, 2023
7b690f8
Also account for std namespaced objects
vyasr Dec 1, 2023
de3e5de
Add a couple more specific names to remap
vyasr Dec 1, 2023
f0e030e
Add io::datasource
vyasr Dec 1, 2023
153a695
Repoint intersphinx to the online docs
vyasr Dec 1, 2023
6a72a57
Add missing orc types
vyasr Dec 1, 2023
b3c4157
Ignore bpe pairs impl
vyasr Dec 1, 2023
6ddaa4e
Add newly added doxygen namespaces to Sphinx
vyasr Dec 1, 2023
7ceb302
Remove unused ingroup from src file and ignore symbol instead
vyasr Dec 2, 2023
4f19a29
Ignore TypeKind
vyasr Dec 2, 2023
b224641
Fix header
vyasr Dec 2, 2023
93a432d
Add script to parse xml and fix known issues
vyasr Dec 15, 2023
6817217
Parse more precisely and remove potential SFINAE duplicates
vyasr Dec 15, 2023
9ea9c75
Remove nonexistent group from Sphinx
vyasr Dec 15, 2023
3ed3a2f
Simplify script
vyasr Dec 15, 2023
8151aad
Make checks strict again
vyasr Dec 15, 2023
da266dc
Temporarily move parsing script
vyasr Dec 15, 2023
06625bf
Moving parsing into conf.py
vyasr Dec 15, 2023
527181f
Remove outdated reference
vyasr Dec 15, 2023
59c5844
Remove ignores that are no longer necessary
vyasr Dec 15, 2023
90c63e5
Add links for dlpack
vyasr Dec 15, 2023
238a553
Remove old test changes
vyasr Dec 15, 2023
895caf8
Put back detail ignore
vyasr Dec 15, 2023
15942d4
Temporarily disable text docs for cudf
vyasr Dec 16, 2023
0663d2e
Make table compatible with text output
vyasr Dec 16, 2023
90c89c5
Optimize missing reference hook
vyasr Dec 17, 2023
5161a3a
Reenable notebooks
vyasr Dec 17, 2023
b4ccc3b
Reenable text builds
vyasr Dec 17, 2023
a90679b
Address PR feedback
vyasr Dec 18, 2023
2edd7ad
Add one more note
vyasr Dec 18, 2023
bd3a9e1
Merge remote-tracking branch 'origin/branch-24.02' into feat/unify_docs
vyasr Dec 18, 2023
42604fa
Match group layout of modules from doxygen HTML
vyasr Dec 18, 2023
5ece824
Reorganize to add in non-API pages
vyasr Dec 18, 2023
0ede73e
Require new Breathe
vyasr Dec 18, 2023
beaceaf
Fix issues with developer guide links
vyasr Dec 18, 2023
7a9581f
Merge remote-tracking branch 'origin/branch-24.02' into feat/unify_docs
vyasr Dec 19, 2023
89ce4dc
Test parallel builds
vyasr Dec 19, 2023
08797fb
Move parallelism flag to build script so that it's not hardcoded in M…
vyasr Dec 19, 2023
cf40777
More optimizations
vyasr Dec 19, 2023
6c2fa6b
Merge branch 'branch-24.02' into feat/unify_docs
vyasr Dec 19, 2023
39bfa1a
Merge branch 'branch-24.02' into feat/unify_docs
vyasr Jan 9, 2024
8ea7a61
Merge remote-tracking branch 'origin/branch-24.02' into feat/unify_docs
vyasr Jan 9, 2024
aab3e86
Fix style
vyasr Jan 9, 2024
ff12064
Merge remote-tracking branch 'origin/branch-24.02' into feat/unify_docs
vyasr Jan 11, 2024
0efd3bc
Put back doxygen HTML generation for now.
vyasr Jan 11, 2024
3f598a7
Fix typo
vyasr Jan 12, 2024
2bcb370
Merge remote-tracking branch 'origin/branch-24.02' into feat/unify_docs
vyasr Jan 16, 2024
7445e50
Merge remote-tracking branch 'origin/branch-24.02' into feat/unify_docs
vyasr Jan 17, 2024
a5bc91a
Fix one more doxygen error
vyasr Jan 17, 2024
b2dae66
Revert all changes that break the doxygen build
vyasr Jan 17, 2024
7f8a50d
Fix a typo
vyasr Jan 17, 2024
13cd4c0
Disable APIs containing tables for now due to failing text builds
vyasr Jan 17, 2024
6 changes: 3 additions & 3 deletions ci/build_docs.sh
@@ -1,5 +1,5 @@
#!/bin/bash
-# Copyright (c) 2023, NVIDIA CORPORATION.
+# Copyright (c) 2023-2024, NVIDIA CORPORATION.

set -euo pipefail

@@ -40,8 +40,8 @@ popd

rapids-logger "Build Python docs"
pushd docs/cudf
-make dirhtml
-make text
+make dirhtml O="-j 4"
+make text O="-j 4"
mkdir -p "${RAPIDS_DOCS_DIR}/cudf/"{html,txt}
mv build/dirhtml/* "${RAPIDS_DOCS_DIR}/cudf/html"
mv build/text/* "${RAPIDS_DOCS_DIR}/cudf/txt"
1 change: 1 addition & 0 deletions conda/environments/all_cuda-118_arch-x86_64.yaml
@@ -12,6 +12,7 @@ dependencies:
- benchmark==1.8.0
- boto3>=1.21.21
- botocore>=1.24.21
- breathe>=4.35.0
- c-compiler
- cachetools
- clang-tools=16.0.6
1 change: 1 addition & 0 deletions conda/environments/all_cuda-120_arch-x86_64.yaml
@@ -12,6 +12,7 @@ dependencies:
- benchmark==1.8.0
- boto3>=1.21.21
- botocore>=1.24.21
- breathe>=4.35.0
- c-compiler
- cachetools
- clang-tools=16.0.6
6 changes: 3 additions & 3 deletions cpp/doxygen/developer_guide/TESTING.md
@@ -464,9 +464,9 @@ the host (`to_host`).

### Background

-libcudf employs a custom-built [preload library
-docs](https://man7.org/linux/man-pages/man8/ld.so.8.html) to validate its internal stream usage (the
-code may be found
+libcudf employs a custom-built [preload
+library](https://man7.org/linux/man-pages/man8/ld.so.8.html) to validate its internal stream usage
+(the code may be found
[`here`](https://github.com/rapidsai/cudf/blob/main/cpp/tests/utilities/identify_stream_usage.cpp)).
This library wraps every asynchronous CUDA runtime API call that accepts a stream with a check to
ensure that the passed CUDA stream is a valid one, immediately throwing an exception if an invalid
2 changes: 1 addition & 1 deletion cpp/include/cudf/strings/strings_column_view.hpp
@@ -106,7 +106,7 @@ class strings_column_view : private column_view {
/**
* @brief Returns the internal column of chars
*
- * @throw cudf::logic error if this is an empty column
+ * @throw cudf::logic_error if this is an empty column
* @param stream CUDA stream used for device memory operations and kernel launches
* @return The chars column
*/
1 change: 1 addition & 0 deletions dependencies.yaml
@@ -457,6 +457,7 @@ dependencies:
common:
- output_types: [conda]
packages:
- breathe>=4.35.0
- dask-cuda==24.2.*
- *doxygen
- make
256 changes: 249 additions & 7 deletions docs/cudf/source/conf.py
@@ -1,4 +1,4 @@
-# Copyright (c) 2018-2023, NVIDIA CORPORATION.
+# Copyright (c) 2018-2024, NVIDIA CORPORATION.
#
# cudf documentation build configuration file, created by
# sphinx-quickstart on Wed May 3 10:59:22 2017.
@@ -16,11 +16,33 @@
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
import glob
import os
import re
import sys
import xml.etree.ElementTree as ET

from docutils.nodes import Text
from sphinx.addnodes import pending_xref
from sphinx.highlighting import lexers
from sphinx.ext import intersphinx
from pygments.lexer import RegexLexer
from pygments.token import Text as PText


class PseudoLexer(RegexLexer):
    """Trivial lexer for pseudocode."""

    name = 'pseudocode'
    aliases = ['pseudo']
    tokens = {
        'root': [
            (r'.*\n', PText),
        ]
    }


lexers['pseudo'] = PseudoLexer()

# -- Custom Extensions ----------------------------------------------------
sys.path.append(os.path.abspath("./_ext"))
@@ -35,6 +57,7 @@
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
"breathe",
"sphinx.ext.intersphinx",
"sphinx.ext.autodoc",
"sphinx.ext.autosummary",
@@ -46,6 +69,67 @@
"myst_nb",
]

# Preprocess doxygen xml for compatibility with latest Breathe
def clean_definitions(root):
    # Breathe can't handle SFINAE properly:
    # https://github.com/breathe-doc/breathe/issues/624
    seen_ids = set()
    for sectiondef in root.findall(".//sectiondef"):
        for memberdef in sectiondef.findall("./memberdef"):
            id_ = memberdef.get("id")
            for tparamlist in memberdef.findall("./templateparamlist"):
                for param in tparamlist.findall("./param"):
                    for type_ in param.findall("./type"):
                        # CUDF_ENABLE_IF or std::enable_if
                        if "enable_if" in ET.tostring(type_).decode().lower():
                            if id_ not in seen_ids:
                                # If this is the first time we're seeing this function,
                                # just remove the template parameter.
                                seen_ids.add(id_)
                                tparamlist.remove(param)
                            else:
                                # Otherwise, remove the overload altogether and just
                                # rely on documenting one of the SFINAE overloads.
                                sectiondef.remove(memberdef)
                            break

                    # In addition to enable_if, check for overloads set up by
                    # ...*=nullptr.
                    for type_ in param.findall("./defval"):
                        if "nullptr" in ET.tostring(type_).decode():
                            try:
                                tparamlist.remove(param)
                            except ValueError:
                                # May have already been removed in above,
                                # so skip.
                                pass
                            break


    # All of these in type declarations cause Breathe to choke.
    # For friend, see https://github.com/breathe-doc/breathe/issues/916
    strings_to_remove = ("__forceinline__", "CUDF_HOST_DEVICE", "decltype(auto)", "friend")
    for field in (".//type", ".//definition"):
        for type_ in root.findall(field):
            if type_.text is not None:
                for string in strings_to_remove:
                    type_.text = type_.text.replace(string, "")
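
To illustrate the SFINAE deduplication that `clean_definitions` performs, here is a minimal sketch run against a tiny hand-written XML fragment. The fragment, the `f_1` id, and the name `drop_sfinae_duplicates` are hypothetical and much simpler than real Doxygen output. The sketch assumes, as the hook appears to, that SFINAE overloads of one function share a single `id`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified stand-in for Doxygen XML: two SFINAE
# overloads of `f` that carry the same id.
XML = """
<doxygen>
  <sectiondef kind="func">
    <memberdef id="f_1">
      <name>f</name>
      <templateparamlist>
        <param><type>typename T</type></param>
        <param><type>std::enable_if_t&lt;is_a&lt;T&gt;()&gt; *</type></param>
      </templateparamlist>
    </memberdef>
    <memberdef id="f_1">
      <name>f</name>
      <templateparamlist>
        <param><type>typename T</type></param>
        <param><type>std::enable_if_t&lt;is_b&lt;T&gt;()&gt; *</type></param>
      </templateparamlist>
    </memberdef>
  </sectiondef>
</doxygen>
"""


def drop_sfinae_duplicates(root):
    """Keep one documented overload per id and strip its enable_if parameter."""
    seen_ids = set()
    for sectiondef in root.findall(".//sectiondef"):
        for memberdef in sectiondef.findall("./memberdef"):
            id_ = memberdef.get("id")
            for tparamlist in memberdef.findall("./templateparamlist"):
                for param in tparamlist.findall("./param"):
                    type_ = param.find("./type")
                    if type_ is None:
                        continue
                    if "enable_if" not in ET.tostring(type_).decode().lower():
                        continue
                    if id_ not in seen_ids:
                        # First overload with this id: drop only the SFINAE parameter.
                        seen_ids.add(id_)
                        tparamlist.remove(param)
                    else:
                        # Later overload with the same id: drop the whole member.
                        sectiondef.remove(memberdef)
                    break


root = ET.fromstring(XML)
drop_sfinae_duplicates(root)
members = root.findall(".//memberdef")
remaining_params = members[0].findall("./templateparamlist/param")
```

After the call, one `memberdef` remains and its template parameter list holds only `typename T`.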


def clean_all_xml_files(path):
    for fn in glob.glob(os.path.join(path, "*.xml")):
        tree = ET.parse(fn)
        clean_definitions(tree.getroot())
        tree.write(fn)


# Breathe Configuration
breathe_projects = {"libcudf": "../../../cpp/doxygen/xml"}
for project_path in breathe_projects.values():
    clean_all_xml_files(project_path)
breathe_default_project = "libcudf"
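
The in-place rewrite that runs when `conf.py` is loaded can be sketched as follows. This is a minimal illustration that uses a temporary directory and a made-up `example.xml`, not the real `cpp/doxygen/xml` output, and the helper name `strip_unsupported_tokens` is hypothetical. It covers only the token-stripping half of `clean_definitions`.

```python
import glob
import os
import tempfile
import xml.etree.ElementTree as ET

# Same tokens that the diff removes from <type> and <definition> text.
STRINGS_TO_REMOVE = ("__forceinline__", "CUDF_HOST_DEVICE", "decltype(auto)", "friend")


def strip_unsupported_tokens(root):
    """Delete the listed tokens from the text of type and definition elements."""
    for field in (".//type", ".//definition"):
        for element in root.findall(field):
            if element.text is not None:
                for token in STRINGS_TO_REMOVE:
                    element.text = element.text.replace(token, "")


with tempfile.TemporaryDirectory() as xml_dir:
    path = os.path.join(xml_dir, "example.xml")
    with open(path, "w") as f:
        f.write(
            "<doxygen><memberdef>"
            "<type>CUDF_HOST_DEVICE int</type>"
            "<definition>friend bool operator==</definition>"
            "</memberdef></doxygen>"
        )

    # Parse, clean, and overwrite every XML file in the directory.
    for fn in glob.glob(os.path.join(xml_dir, "*.xml")):
        tree = ET.parse(fn)
        strip_unsupported_tokens(tree.getroot())
        tree.write(fn)

    cleaned = ET.parse(path).getroot()
    cleaned_type = cleaned.find(".//type").text
    cleaned_definition = cleaned.find(".//definition").text
```

Because the files are overwritten, the Doxygen XML on disk no longer matches what Doxygen emitted once Sphinx has loaded the configuration. The stripping is idempotent, so repeated builds leave the cleaned files unchanged.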


nb_execution_excludepatterns = ['performance-comparisons.ipynb']

nb_execution_mode = "force"
@@ -195,11 +279,13 @@

# Example configuration for intersphinx: refer to the Python standard library.
 intersphinx_mapping = {
-"python": ("https://docs.python.org/3", None),
 "cupy": ("https://docs.cupy.dev/en/stable/", None),
+"dlpack": ("https://dmlc.github.io/dlpack/latest/", None),
 "numpy": ("https://numpy.org/doc/stable", None),
-"pyarrow": ("https://arrow.apache.org/docs/", None),
 "pandas": ("https://pandas.pydata.org/docs/", None),
+"pyarrow": ("https://arrow.apache.org/docs/", None),
+"python": ("https://docs.python.org/3", None),
+"rmm": ("https://docs.rapids.ai/api/rmm/nightly/", None),
"typing_extensions": ("https://typing-extensions.readthedocs.io/en/stable/", None),
}

@@ -238,14 +324,170 @@ def resolve_aliases(app, doctree):
text_node.parent.replace(text_node, Text(text_to_render, ""))


-def ignore_internal_references(app, env, node, contnode):
-    name = node.get("reftarget", None)
-    if name == "cudf.core.index.GenericIndex":
def _generate_namespaces(namespaces):
    all_namespaces = []
    for base_namespace, other_namespaces in namespaces.items():
        all_namespaces.append(base_namespace + "::")
        for other_namespace in other_namespaces:
            all_namespaces.append(f"{other_namespace}::")
            all_namespaces.append(f"{base_namespace}::{other_namespace}::")
    return all_namespaces

_all_namespaces = _generate_namespaces({
    # Note that io::datasource is actually a nested class
    "cudf": {"io", "io::datasource", "strings", "ast", "ast::expression"},
    "numeric": {},
    "nvtext": {},
})
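
For a concrete view of what `_generate_namespaces` produces, the sketch below reuses the same logic on a smaller, hypothetical input. It passes lists instead of sets so that the output order is deterministic, which is an assumption made for the example only.

```python
def generate_namespaces(namespaces):
    """Expand each base namespace and its nested namespaces into '::' prefixes."""
    all_namespaces = []
    for base_namespace, other_namespaces in namespaces.items():
        all_namespaces.append(base_namespace + "::")
        for other_namespace in other_namespaces:
            # Both the bare nested prefix and the fully qualified one are
            # emitted, since call sites may use either spelling.
            all_namespaces.append(f"{other_namespace}::")
            all_namespaces.append(f"{base_namespace}::{other_namespace}::")
    return all_namespaces


prefixes = generate_namespaces({"cudf": ["io", "strings"], "numeric": []})
print(prefixes)
# ['cudf::', 'io::', 'cudf::io::', 'strings::', 'cudf::strings::', 'numeric::']
```

Each nested namespace contributes two prefixes, so the list grows with the number of nested namespaces, and the missing-reference hook tries every prefix for each unresolved target.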

_names_to_skip = {
# External names
"thrust",
"cuda",
"arrow",
# Unknown types
"int8_t",
"int16_t",
"int32_t",
"int64_t",
"__int128_t",
"size_t",
"uint8_t",
"uint16_t",
"uint32_t",
"uint64_t",
# Internal objects
"id_to_type_impl",
"type_to_scalar_type_impl",
"type_to_scalar_type_impl",
"detail",
# kafka objects
"python_callable_type",
"kafka_oauth_callback_wrapper_type",
# Template types
"Radix",
# Unsupported by Breathe
# https://github.com/breathe-doc/breathe/issues/355
"deprecated",
# TODO: This type is currently defined in a detail header but it's in
# the public namespace. However, it's used in the detail header, so it
# needs to be put into a public header that can be shared.
"char_utf8",
# TODO: This is currently in a src file but perhaps should be public
"orc::column_statistics",
# Sphinx doesn't know how to distinguish between the ORC and Parquet
# definitions because Breathe doesn't preserve namespaces for enums.
"TypeKind",
}

_domain_objects = None
_prefixed_domain_objects = None
_intersphinx_cache = {}

_intersphinx_extra_prefixes = ("rmm", "rmm::mr", "mr")


def _cached_intersphinx_lookup(env, node, contnode):
    """Perform an intersphinx lookup and cache the result.

    Have to manually manage the intersphinx cache because lru_cache doesn't
    handle the env object properly.
    """
    key = (node, contnode)
    if key in _intersphinx_cache:
        return _intersphinx_cache[key]
    if (ref := intersphinx.resolve_reference_detect_inventory(env, node, contnode)) is not None:
        _intersphinx_cache[key] = ref
    return ref
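
The manual memoization pattern in `_cached_intersphinx_lookup` can be sketched without Sphinx. Here `slow_lookup` and its one-entry inventory are hypothetical stand-ins for the intersphinx call, and the cache is keyed on a plain string rather than on `(node, contnode)`. As in the hook, only successful lookups are stored.

```python
_cache = {}
calls = []  # records every call that reaches the slow path


def slow_lookup(target):
    """Hypothetical stand-in for an expensive inventory lookup."""
    calls.append(target)
    inventory = {"rmm::cuda_stream_view": "resolved-reference"}
    return inventory.get(target)


def cached_lookup(target):
    """Return a cached result if present; cache only non-None results."""
    if target in _cache:
        return _cache[target]
    if (ref := slow_lookup(target)) is not None:
        _cache[target] = ref
    return ref


first = cached_lookup("rmm::cuda_stream_view")
second = cached_lookup("rmm::cuda_stream_view")  # served from the cache
miss_one = cached_lookup("missing")
miss_two = cached_lookup("missing")  # reaches the slow path again
```

In this sketch a failed lookup is repeated on every call because `None` is never stored. Whether caching misses as well would pay off in the real build depends on how often the same unresolved key recurs, which the source does not say.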


def on_missing_reference(app, env, node, contnode):
    # These variables are defined outside the function to speed up the build.
    global _all_namespaces, _names_to_skip, _intersphinx_extra_prefixes, \
        _domain_objects, _prefixed_domain_objects, _intersphinx_cache

    # Precompute and cache domains for faster lookups
    if _domain_objects is None:
        _domain_objects = {}
        _prefixed_domain_objects = {}
        for (name, _, _, docname, _, _) in env.domains["cpp"].get_objects():
            _domain_objects[name] = docname
            for prefix in _all_namespaces:
                _prefixed_domain_objects[f"{prefix}{name}"] = name

    reftarget = node.get("reftarget")
    if reftarget == "cudf.core.index.GenericIndex":
        # We don't expose docs for `cudf.core.index.GenericIndex`
        # hence we would want the docstring & mypy references to
        # use `cudf.Index`
        node["reftarget"] = "cudf.Index"
        return contnode
    if "namespacecudf" in reftarget:
        node["reftarget"] = "cudf"
        return contnode
    if "classcudf_1_1column__device__view_" in reftarget:
        node["reftarget"] = "cudf::column_device_view"
        return contnode

    if (refid := node.get("refid")) is not None and "hpp" in refid:
        # We don't want to link to C++ header files directly from the
        # Sphinx docs, those are pages that doxygen automatically
        # generates. Adding those would clutter the Sphinx output.
        return contnode

    if node["refdomain"] in ("std", "cpp") and reftarget is not None:
        if any(toskip in reftarget for toskip in _names_to_skip):
            return contnode

        # Strip template parameters and just use the base type.
        if match := re.search("(.*)<.*>", reftarget):
            reftarget = match.group(1)

        # Try to find the target prefixed with e.g. namespaces in case that's
        # all that's missing.
        # We need to do this search because the call sites may not have used
        # the namespaces and we don't want to force them to, and we have to
        # consider both directions because of issues like
        # https://github.com/breathe-doc/breathe/issues/860
        # (there may be other related issues, I haven't investigated all
        # possible combinations of failures in depth).
        if (name := _prefixed_domain_objects.get(reftarget)) is None:
            for prefix in _all_namespaces:
                if f"{prefix}{reftarget}" in _domain_objects:
                    name = f"{prefix}{reftarget}"
                    break
        if name is not None:
            return env.domains["cpp"].resolve_xref(
                env,
                _domain_objects[name],
                app.builder,
                node["reftype"],
                name,
                node,
                contnode,
            )

        # Final possibility is an intersphinx lookup to see if the symbol
        # exists in one of the other inventories. First we check the symbol
        # itself in case it was originally templated and that caused the lookup
        # to fail.
        if reftarget != node["reftarget"]:
            node["reftarget"] = reftarget
            if (ref := _cached_intersphinx_lookup(env, node, contnode)) is not None:
                return ref

        # If the template wasn't the (only) issue, we check the various
        # namespace prefixes that may need to be added or removed.
        for prefix in _intersphinx_extra_prefixes:
            if prefix not in reftarget:
                node["reftarget"] = f"{prefix}::{reftarget}"
                if (ref := _cached_intersphinx_lookup(env, node, contnode)) is not None:
                    return ref
            else:
                node["reftarget"] = reftarget.replace(f"{prefix}::", "")
                if (ref := _cached_intersphinx_lookup(env, node, contnode)) is not None:
                    return ref

    return None
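
The two name-normalization steps in `on_missing_reference` (template stripping and the two-way namespace search) can be tried in isolation. This is a minimal sketch: `strip_template`, `find_domain_name`, and the sample object table are hypothetical, and the real hook builds its tables from `env.domains["cpp"].get_objects()`.

```python
import re


def strip_template(reftarget):
    """Drop template arguments using the same regex as the hook."""
    if match := re.search("(.*)<.*>", reftarget):
        return match.group(1)
    return reftarget


def find_domain_name(reftarget, domain_objects, namespaces):
    """Search in both directions for a registered name matching reftarget."""
    # Direction 1: the reference carries a prefix the registered name lacks.
    prefixed = {f"{p}{name}": name for name in domain_objects for p in namespaces}
    if (name := prefixed.get(reftarget)) is not None:
        return name
    # Direction 2: the registered name carries a prefix the reference lacks.
    for p in namespaces:
        if f"{p}{reftarget}" in domain_objects:
            return f"{p}{reftarget}"
    return None


# Hypothetical object table: registered name -> docname.
domain_objects = {"cudf::column_view": "doc_a", "span": "doc_b"}
namespaces = ["cudf::"]

simple = strip_template("cudf::device_span<int>")
nested = strip_template("a<b<c>>")
added_prefix = find_domain_name("column_view", domain_objects, namespaces)
removed_prefix = find_domain_name("cudf::span", domain_objects, namespaces)
unknown = find_domain_name("unknown", domain_objects, namespaces)
```

One observation from running the regex: because `(.*)` is greedy, a nested target such as `a<b<c>>` is reduced to `a<b` rather than `a`. A non-greedy `(.*?)` would stop at the first `<`. Whether nested template targets occur in practice in these docs is not stated in the source.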


@@ -261,4 +503,4 @@ def setup(app):
app.add_css_file("https://docs.rapids.ai/assets/css/custom.css")
app.add_js_file("https://docs.rapids.ai/assets/js/custom.js", loading_method="defer")
app.connect("doctree-read", resolve_aliases)
-    app.connect("missing-reference", ignore_internal_references)
+    app.connect("missing-reference", on_missing_reference)
1 change: 1 addition & 0 deletions docs/cudf/source/index.rst
@@ -29,4 +29,5 @@ other operations.

user_guide/index
cudf_pandas/index
libcudf_docs/index
developer_guide/index
@@ -0,0 +1,5 @@
Aggregation Factories
=====================

.. doxygengroup:: aggregation_factories
:members:
@@ -0,0 +1,5 @@
Aggregation Groupby
===================

.. doxygengroup:: aggregation_groupby
:members:
@@ -0,0 +1,5 @@
Aggregation Reduction
=====================

.. doxygengroup:: aggregation_reduction
:members: