Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add missing docstring normalization #1736

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from
Draft

Add missing docstring normalization #1736

wants to merge 3 commits into from

Conversation

jberdine
Copy link
Collaborator

Without this docstrings with UTF8 withing [] have miscalculated lengths.

This adds (some of?) the normalization needed for [...] docstring elements. Currently this change also incorrectly applies to preformatted code block elements {[...]}. I do not know the best way to distinguish these cases in odoc, see #1735.

Copy link
Collaborator

@Julow Julow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not know the best way to distinguish these cases in odoc

Multi-line code blocks are `Code_block in the AST and are handled in fmt_nestable_block_element, immediately calling fmt_code_block.
Inline code spans are `Code_span.

lib/Fmt_odoc.ml Show resolved Hide resolved
@jberdine
Copy link
Collaborator Author

I don't know the best implementation strategy. The issues I am trying to resolve are, I think, similar in nature to the following example. Formatting:

  val trms : t -> trm iter
  (** [trms a] enumerates the maximal foreign or noninterpreted proper
      subterms of [a]. Considering an arithmetic term as a polynomial,
      [trms (c × (Σᵢ₌₁ⁿ cᵢ × Πⱼ₌₁ᵐᵢ Xᵢⱼ^pᵢⱼ))] is the sequence of monomials
      [Πⱼ₌₁ᵐᵢ Xᵢⱼ^pᵢⱼ] for each [i]. If the arithmetic term is a monomial,
      [trms (Πⱼ₌₁ᵐ Xⱼ^pⱼ)] is the sequence of factors [Xⱼ] for each [j]. *)

yields

  val trms : t -> trm iter
  (** [trms a] enumerates the indeterminate terms appearing in [a].
      Considering an arithmetic term as a polynomial,
      [trms (c × (Σᵢ₌₁ⁿ cᵢ × Πⱼ₌₁ᵐᵢ
      Xᵢⱼ^pᵢⱼ))] is the sequence of terms [Xᵢⱼ] for each [i] and
      [j]. *)

while with this PR this docstring is unchanged, as desired.

There seems to be something amiss where without this PR, the length of strings with UTF8 characters is not calculated correctly, where with this PR they are. @Julow, this behavior that I see doesn't match up with your explanation in the inline comment, so I'm confused. Do you see a different / better solution?

@Julow
Copy link
Collaborator

Julow commented Jul 21, 2021

Are you using 0.18.0 as a comparison ?

I get no diff between this PR and the current master with your example but I do see it compared with 0.18.0.
UTF8 length (grapheme clusters) is used on every strings (including keywords and identifiers) since #1673

@jberdine
Copy link
Collaborator Author

Hmm, I am literally comparing this PR with master. I'll dig some more to see if I can understand what is happening.

@jberdine
Copy link
Collaborator Author

Ugh, this is an instance of #1675. If you start with master and format

val trms : t -> trm iter
  (** [trms a] enumerates the indeterminate terms appearing in [a].
      Considering an arithmetic term as a polynomial,
      [trms (c × (Σᵢ₌₁ⁿ cᵢ × Πⱼ₌₁ᵐᵢ
      Xᵢⱼ^pᵢⱼ))] is the sequence of terms [Xᵢⱼ] for each [i] and
      [j]. *)

then it is unchanged. Switch to this PR and format and it produces:

  val trms : t -> trm iter
  (** [trms a] enumerates the maximal foreign or noninterpreted proper
      subterms of [a]. Considering an arithmetic term as a polynomial,
      [trms (c × (Σᵢ₌₁ⁿ cᵢ × Πⱼ₌₁ᵐᵢ Xᵢⱼ^pᵢⱼ))] is the sequence of monomials
      [Πⱼ₌₁ᵐᵢ Xᵢⱼ^pᵢⱼ] for each [i]. If the arithmetic term is a monomial,
      [trms (Πⱼ₌₁ᵐ Xⱼ^pⱼ)] is the sequence of factors [Xⱼ] for each [j]. *)

@Julow
Copy link
Collaborator

Julow commented Jul 22, 2021

The remaining question is: How should be change the formatting of code spans.

This PR allows every spaces in a code span to break. This makes it more stable (there's only one stable formatting) but it changes Odoc's AST.

(** [a
    b] *)

This is parsed as "a\n\ \ \ \ b" by Odoc. (The HTML output is "a b" but not in other backends)
Perhaps we could change Odoc to normalize whitespaces inside code spans ? For now we can't break code spans.

@jberdine
Copy link
Collaborator Author

Hmm, this feels like another case of odoc just doing something different from what is in the ocaml manual, so I don't know. To my understanding, {[...]} are described as preformatted, and hence parsing their contents in a whitespace-preserving way makes sense, while [...] are not described as preformatted, and so their contents should be normalized. If the current parsing in odoc prevents this, then it seems like an odoc improvement is needed. Does that make sense to anyone other than me?

Without this docstrings with UTF8 withing `[]` have miscalculated lengths.
@jberdine jberdine marked this pull request as ready for review August 4, 2021 23:19
@gpetiot
Copy link
Collaborator

gpetiot commented Aug 26, 2021

I tend to agree that [] contents should be normalized while {[]} should be kept verbatim, is there an issue open on odoc? We will comply with whatever consensus there will be between ocaml and odoc on the language

@jberdine
Copy link
Collaborator Author

I don't think that there is an odoc issue about this. I'm not sure of the constraints on the odoc side, and on the interop between ocamlformat and odoc. We are currently using this PR internally (this branch) with no issues, even though we do full docstring normalization. So I'm not sure what the observed problem just for odoc looks like.

@gpetiot gpetiot requested a review from Julow August 26, 2021 14:55
@Julow
Copy link
Collaborator

Julow commented Sep 14, 2021

I would support normalizing [ ... ] code blocks but this cannot be done yet. With the current Odoc, this means changing the AST and inserting unwanted newlines into the rendered output.
Also, we must be careful at how code blocks are used, this might break some documentations:

- (** [trim "  "] is [""] *)
+ (** [trim " "] is [""] *)

I believe that this PR doesn't fix the original issue (which was fixed in #1673).

Perhaps we should open an issue on Odoc ?

@jberdine
Copy link
Collaborator Author

In the original report I was mistaken to bring up utf length computation, that turned out not to be the issue. The issue in the comment above remains though, and is fixed by this PR.

@Julow, are you saying that any change in ocamlformat that allows every space in a code span to break would cause problems with odoc? Are the odoc ast changes you mention something that are checked in ocamlformat analogous to how we check the main parsetree is preserved by formatting? I'm curious as we have been running with this PR in production for a while with no issues.

@Julow
Copy link
Collaborator

Julow commented Oct 1, 2021

@Julow, are you saying that any change in ocamlformat that allows every space in a code span to break would cause problems with odoc? Are the odoc ast changes you mention something that are checked in ocamlformat analogous to how we check the main parsetree is preserved by formatting? I'm curious as we have been running with this PR in production for a while with no issues.

Any change in whitespace/newlines in code spans would change the rendered documentation. Every strings in doc comments are normalized: https://github.com/ocaml-ppx/ocamlformat/blob/main/lib/Normalize.ml#L79-L83 which is wrong at least for code spans with the current implementation of Odoc.

I believe it is reasonable to change Odoc to normalize line breaks (and the following spaces) in code spans (but not consecutive spaces).

@gpetiot gpetiot marked this pull request as draft October 8, 2021 21:28
@gpetiot gpetiot removed the CLA Signed label Apr 6, 2023
@gpetiot
Copy link
Collaborator

gpetiot commented Oct 11, 2023

I believe it is reasonable to change Odoc to normalize line breaks (and the following spaces) in code spans (but not consecutive spaces).

@Julow So is this something worth fixing in odoc? we will synchronize with odoc as necessary. Otherwise we should close this PR.

@Julow
Copy link
Collaborator

Julow commented Oct 11, 2023

I think it's worth clarifying Odoc's code span syntax independently from this PR.
This PR still makes sense in my opinion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants