Ensure deterministic resolution of toctree #12888
Ideally we'd have a test for this, but I'm not sure how easy it would be, so I don't mind as much not having one.
Please could you add an entry to CHANGES, though?
A
great work guys!
This PR only resolves the non-determinism. Users may still have their own expectations about which parent should be chosen, and that is not trivial to solve. I think we should warn the user when a document is included in multiple toctrees (in different files). What do you think? Edit: I have added the warning in the recent commit, but feel free to discard it.
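To make the proposed warning concrete, here is a minimal sketch of how documents with more than one toctree parent could be detected. It assumes a `toctree_includes` mapping from each parent docname to the docnames its toctrees list; the function name `find_multi_parent_docs` and the sample project layout are hypothetical, and this is not the code from the PR.

```python
from collections import defaultdict


def find_multi_parent_docs(toctree_includes):
    """Map each document listed by more than one toctree to its sorted parents."""
    parents_of = defaultdict(set)
    for parent, children in toctree_includes.items():
        for child in children:
            parents_of[child].add(parent)
    # Sort the documents and their parents so the report is stable across builds.
    return {
        doc: sorted(parents)
        for doc, parents in sorted(parents_of.items())
        if len(parents) > 1
    }


# Hypothetical layout: 'usage/configuration' is listed by two toctrees.
toctree_includes = {
    'index': ['usage/index', 'usage/configuration'],
    'usage/index': ['usage/configuration', 'usage/installation'],
}

for doc, parents in find_multi_parent_docs(toctree_includes).items():
    # Picking max() of the sorted candidates is one deterministic choice among several.
    print(f'{doc}: referenced in multiple toctrees {parents}, selecting {max(parents)}')
```

Because the parents are collected into a set and then sorted, the result does not depend on the order in which the source files were read.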
Along the lines of "correctness first, performance / etc second", I like this and would suggest that we merge it and gather feedback 👍
There are two thoughts I had while reviewing this:
- Whether tree depth might be better than lexicographic order as a sort key (@khanxmetu has also noted this as a possibility in an updated comment)
- A question really: it puzzles me slightly that we have nondeterminism here and not in the breadcrumb navigation trail (e.g. System Emulation > QEMU System Emulator Targets > PowerPC System emulator > pSeries family boards (pseries) > ... in the header of the referenced qemu page). Does that component gather the navtree differently? (I should, and will eventually, check)
This seems a better approach than relying solely upon lexicographic order. Note, however, that sorting by lexicographic order is still required to break ties between equal path depths. Besides that, are you proposing that the minimum or the maximum path depth be chosen?
Thank you for bringing this up. I found two important issues:
I propose that the shortest-depth path should be chosen. We can add a BFS traversal function similar to
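As an illustration of the shortest-depth idea, the sketch below runs a breadth-first traversal from the root document and records, for every document, a parent that lies on a shortest path. Equally shallow parents are tie-broken lexicographically with `max`, mirroring the `max(parents)` choice that appears in the diff in this thread. The function name `shortest_depth_parents`, the `root` default, and the sample data are assumptions for illustration only.

```python
from collections import deque


def shortest_depth_parents(toctree_includes, root='index'):
    """Pick, for each document, a parent on a shortest path from the root.

    Ties between equally shallow parents are broken lexicographically.
    """
    depth = {root: 0}
    parent_of = {}
    queue = deque([root])
    while queue:
        parent = queue.popleft()
        for child in toctree_includes.get(parent, []):
            if child not in depth:
                # First discovery in BFS order is always at minimum depth.
                depth[child] = depth[parent] + 1
                parent_of[child] = parent
                queue.append(child)
            elif depth[child] == depth[parent] + 1:
                # Another parent at the same depth: apply the tie-breaker.
                parent_of[child] = max(parent_of[child], parent)
    return parent_of


# Hypothetical layout with one shallow and one deep route to the same page.
toctree_includes = {
    'index': ['usage/index', 'usage/configuration'],
    'usage/index': ['usage/configuration', 'usage/installation'],
}
print(shortest_depth_parents(toctree_includes))
```

Documents already seen are never queued again, so a cyclic toctree does not cause an infinite loop. The cost is one extra traversal of the include graph, which is the build-performance trade-off raised elsewhere in this conversation.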
Thanks @khanxmetu, that makes sense that a tiebreaker will still be required if we use a path-depth approach. To answer whether I'm suggesting to include path-depth in the sorting: initially I say no, let's continue to use solely lexicographic sorting -- because whatever method we choose, I think some projects/pages will still emit sidebar navigation menus that seem unexpected to some people, because the origin of the ambiguity is in the source files. Even so, I think additional discussion of the navtree is worthwhile:
I wouldn't worry too much about ensuring that the navtree is always consistent with the sidebar toctree; as you've noticed in #12926, table-of-contents displays can be customized by themes and layout; my sense is that it could be difficult to get them to correspond precisely, and also that additional tree traversals might introduce unpredictable build performance changes (especially for large projects). To resolve the bug we can focus on making the build output stable (ensuring that it stays the same for two or more subsequent builds) - and your changeset here already does that. If it seems easy to re-use logic between
# Conflicts:
#	CHANGES.rst
#	sphinx/environment/__init__.py
Our docs are failing... should we consider "INFO" for this or is "WARNING" appropriate? This is just to fix determinism so I think "WARNING" is a bit heavy...
Implementation seems fine, but needs a decision on if/when/how to emit logging (cc @jayaddison)
A
These seem to be false positives in the case of the Sphinx documentation: yes, there are multiple entries, but in each of these three examples the ancestor path always resolves to the same value (in other words: So I'd probably agree with downgrading to
Is there a simple way to avoid these? False positives are pretty frustrating for users.
A
Maybe; if so I think it may require some kind of relative-path-resolution approach (I'm experimenting with it at the moment).
I think this is more like it, but supporting unit test coverage is required:

diff --git a/sphinx/environment/__init__.py b/sphinx/environment/__init__.py
index 28e53aa7d..9780ef396 100644
--- a/sphinx/environment/__init__.py
+++ b/sphinx/environment/__init__.py
@@ -789,19 +789,27 @@ def _traverse_toctree(
 def _check_toc_parents(toctree_includes: dict[str, list[str]]) -> None:
-    toc_parents: dict[str, list[str]] = {}
+    toc_parents: dict[str, set[str]] = {}
     for parent, children in toctree_includes.items():
         for child in children:
-            toc_parents.setdefault(child, []).append(parent)
+            # Remove duplicate ancestry if it exists
+            base = path.commonprefix([parent, child])
+            if base and base.endswith('/'):
+                parent = parent.removeprefix(base)
+                child = child.removeprefix(base)
+            # Record de-duplicated toctree routes to each document
+            absolute_path = "/".join(filter(None, [base, parent]))
+            toc_parents.setdefault(child, set()).add(absolute_path)
     for doc, parents in sorted(toc_parents.items()):
         if len(parents) > 1:
+            candidates = sorted(parent for parent in parents)
             logger.info(
                 __(
                     'document is referenced in multiple toctrees: %s, selecting: %s <- %s'
                 ),
-                parents,
-                max(parents),
+                candidates,
+                max(candidates),
                 doc,
                 location=doc,
                 type='toc',
Possibly also wrong, in particular due to incorrect in-place modification of the
I'll try to write some unit test coverage within the next day or so to help implement more-precise messaging
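One detail of the patch above that is easy to miss: `commonprefix` compares strings character by character, not path component by path component, which is presumably why the patch also checks `base.endswith('/')`. The sketch below shows the difference using `posixpath` directly, on the assumption that docnames always use `/` as the separator; the docnames are made up, and `commonpath` is shown only as a component-wise alternative, not as something the PR uses.

```python
import posixpath

# Character-wise prefix: stops in the middle of a path component.
print(posixpath.commonprefix(['usage/extensions/index', 'usage/example']))

# The prefix ends on a separator only when whole components match.
print(posixpath.commonprefix(['usage/index', 'usage/configuration']))

# Component-wise alternative: returns whole shared directories only.
print(posixpath.commonpath(['usage/extensions/index', 'usage/example']))

# Documents in different directories share no prefix at all.
print(repr(posixpath.commonprefix(['index', 'usage/index'])))
```

With the first pair, the character-wise prefix is `usage/ex`, which is not a directory, so a guard such as `endswith('/')` is needed before treating the prefix as shared ancestry.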
@jayaddison I don't quite understand how the warnings are false positives. I don't think it's best to think of the ancestor path as a single string representing the path to the root, because the underlying implementation has no sense of directories: the path is merely a list of docnames/docpaths, stored as strings, each pointing to the next child in the chain. ['index', 'usage/configuration'] != ['index', 'usage/index', 'usage/configuration']. A graphical representation similar to what I had in tests:
There exists an ambiguity as to what parent should be chosen by Had it been that
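The point that an ancestor path is a chain of docnames rather than a directory string can be sketched as follows. Given a mapping from each document to its chosen parent, walking the parent links yields the chain; choosing a different parent for the same document yields a different chain. The helper `ancestor_chain` and both parent mappings are hypothetical and only reproduce the two lists compared in the comment above.

```python
def ancestor_chain(docname, parent_of):
    """Follow parent links from docname up to the root, returning root-first order."""
    chain = [docname]
    seen = {docname}
    while chain[-1] in parent_of:
        parent = parent_of[chain[-1]]
        if parent in seen:  # guard against cyclic toctrees
            break
        chain.append(parent)
        seen.add(parent)
    return chain[::-1]


# Two possible resolutions of the same ambiguous project.
via_root = {'usage/index': 'index', 'usage/configuration': 'index'}
via_usage = {'usage/index': 'index', 'usage/configuration': 'usage/index'}

print(ancestor_chain('usage/configuration', via_root))
print(ancestor_chain('usage/configuration', via_usage))
```

The two chains differ in length and content even though both documents live under the same `usage/` directory, which is why the choice of parent changes how a sidebar would be expanded.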
Ok, thanks @khanxmetu. What you say makes sense. I suppose the particular scenario I'm considering is cases where apparently-ambiguous toctree references all in fact map to a single path. In other words:
It's certainly possible that I made a mistake here; but I think that by implicitly expanding an
My apologies: reading the code, it appears that, apart from the root document, docpaths ending in Given that, I think the existing
I'll re-approve the PR to indicate that.
@AA-Turner it's taken me a long-winded route to get there, but I believe that the ambiguity messages about the Sphinx documentation are genuine, in that they make it unclear (in machine terms, if not in human terms) how the corresponding sidebar toctree should be expanded. I still feel that it might be possible somehow to devise a more advanced algorithm that can resolve a subset of the multi-toctree reference cases unambiguously; however, I don't have 100% confidence that it's feasible in a performant way (mostly I don't feel I have a complete mental model of the problem space yet), nor do I think we should do it during this PR even if it is. So in short: I'm comfortable with merging the pull request.
# Conflicts:
#	CHANGES.rst
Semi-OT: from #6714 (comment)
Is a non-tree toc arrangement really desirable? Or is it just a structure that happens to be allowed through the way I mean the name "toctree" already implies a tree. And from a user perspective, I find a clear hierarchical TOC helpful to understand the structure of docs. For my docs, I'd rather place a topic in one place and cross-reference from another place, rather than including it fully in two locations. So, while it's nice to make this deterministic, the ambiguity persists: I go into a subtopic from one location and find myself in a completely different toc path. I would consider that a bug for my documentation. Wouldn't it be better (or at least an option) to warn on multiple inclusions of the same document?
@timhoffm This PR has two purposes in case of multiple parents:
The workflow you mentioned is ideal for a local toc (in-document; see the issue with qemu-docs, for example, where the toctree directive is (ab)used for spapr-numa: https://github.com/qemu/qemu/blob/master/docs/system/ppc/pseries.rst?plain=1#L144). However, in some cases it could be desirable to have a secondary reference in the global toctree (sidebar navigation), as is the case with Sphinx's documentation, where the "Configuration" link is also put at the root level in the sidebar for convenience. If this issue seems significant, perhaps in the future we could have primary (to be used as toc parent) and secondary (not to be used as toc parent) toc entries in the case of conflicts.
This reverts commit 8351936.
Subject: Ensure deterministic toctree generation
Feature or Bugfix
Purpose
Detail
Relates