Add missing docstring normalization #1736

jberdine · 2021-07-21T13:21:06Z

Without this docstrings with UTF8 withing [] have miscalculated lengths.

This adds (some of?) the normalization needed for [...] docstring elements. Currently this change also incorrectly applies to preformatted code block elements {[...]}. I do not know the best way to distinguish these cases in odoc, see #1735.

Julow

I do not know the best way to distinguish these cases in odoc

Multi-line code blocks are `Code_block in the AST and are handled in fmt_nestable_block_element, immediately calling fmt_code_block.
Inline code spans are `Code_span.

lib/Fmt_odoc.ml

jberdine · 2021-07-21T16:01:20Z

I don't know the best implementation strategy. The issues I am trying to resolve are, I think, similar in nature to the following example. Formatting:

  val trms : t -> trm iter
  (** [trms a] enumerates the maximal foreign or noninterpreted proper
      subterms of [a]. Considering an arithmetic term as a polynomial,
      [trms (c × (Σᵢ₌₁ⁿ cᵢ × Πⱼ₌₁ᵐᵢ Xᵢⱼ^pᵢⱼ))] is the sequence of monomials
      [Πⱼ₌₁ᵐᵢ Xᵢⱼ^pᵢⱼ] for each [i]. If the arithmetic term is a monomial,
      [trms (Πⱼ₌₁ᵐ Xⱼ^pⱼ)] is the sequence of factors [Xⱼ] for each [j]. *)

yields

  val trms : t -> trm iter
  (** [trms a] enumerates the indeterminate terms appearing in [a].
      Considering an arithmetic term as a polynomial,
      [trms (c × (Σᵢ₌₁ⁿ cᵢ × Πⱼ₌₁ᵐᵢ
      Xᵢⱼ^pᵢⱼ))] is the sequence of terms [Xᵢⱼ] for each [i] and
      [j]. *)

while with this PR this docstring is unchanged, as desired.

There seems to be something amiss where without this PR, the length of strings with UTF8 characters is not calculated correctly, where with this PR they are. @Julow, this behavior that I see doesn't match up with your explanation in the inline comment, so I'm confused. Do you see a different / better solution?

Julow · 2021-07-21T17:56:23Z

Are you using 0.18.0 as a comparison ?

I get no diff between this PR and the current master with your example but I do see it compared with 0.18.0.
UTF8 length (grapheme clusters) is used on every strings (including keywords and identifiers) since #1673

jberdine · 2021-07-21T20:11:46Z

Hmm, I am literally comparing this PR with master. I'll dig some more to see if I can understand what is happening.

jberdine · 2021-07-21T20:47:31Z

Ugh, this is an instance of #1675. If you start with master and format

val trms : t -> trm iter
  (** [trms a] enumerates the indeterminate terms appearing in [a].
      Considering an arithmetic term as a polynomial,
      [trms (c × (Σᵢ₌₁ⁿ cᵢ × Πⱼ₌₁ᵐᵢ
      Xᵢⱼ^pᵢⱼ))] is the sequence of terms [Xᵢⱼ] for each [i] and
      [j]. *)

then it is unchanged. Switch to this PR and format and it produces:

  val trms : t -> trm iter
  (** [trms a] enumerates the maximal foreign or noninterpreted proper
      subterms of [a]. Considering an arithmetic term as a polynomial,
      [trms (c × (Σᵢ₌₁ⁿ cᵢ × Πⱼ₌₁ᵐᵢ Xᵢⱼ^pᵢⱼ))] is the sequence of monomials
      [Πⱼ₌₁ᵐᵢ Xᵢⱼ^pᵢⱼ] for each [i]. If the arithmetic term is a monomial,
      [trms (Πⱼ₌₁ᵐ Xⱼ^pⱼ)] is the sequence of factors [Xⱼ] for each [j]. *)

Julow · 2021-07-22T13:25:28Z

The remaining question is: How should be change the formatting of code spans.

This PR allows every spaces in a code span to break. This makes it more stable (there's only one stable formatting) but it changes Odoc's AST.

(** [a
    b] *)

This is parsed as "a\n\ \ \ \ b" by Odoc. (The HTML output is "a b" but not in other backends)
Perhaps we could change Odoc to normalize whitespaces inside code spans ? For now we can't break code spans.

jberdine · 2021-07-22T20:50:53Z

Hmm, this feels like another case of odoc just doing something different from what is in the ocaml manual, so I don't know. To my understanding, {[...]} are described as preformatted, and hence parsing their contents in a whitespace-preserving way makes sense, while [...] are not described as preformatted, and so their contents should be normalized. If the current parsing in odoc prevents this, then it seems like an odoc improvement is needed. Does that make sense to anyone other than me?

Without this docstrings with UTF8 withing `[]` have miscalculated lengths.

gpetiot · 2021-08-26T14:14:35Z

I tend to agree that [] contents should be normalized while {[]} should be kept verbatim, is there an issue open on odoc? We will comply with whatever consensus there will be between ocaml and odoc on the language

jberdine · 2021-08-26T14:43:01Z

I don't think that there is an odoc issue about this. I'm not sure of the constraints on the odoc side, and on the interop between ocamlformat and odoc. We are currently using this PR internally (this branch) with no issues, even though we do full docstring normalization. So I'm not sure what the observed problem just for odoc looks like.

Julow · 2021-09-14T14:44:31Z

I would support normalizing [ ... ] code blocks but this cannot be done yet. With the current Odoc, this means changing the AST and inserting unwanted newlines into the rendered output.
Also, we must be careful at how code blocks are used, this might break some documentations:

- (** [trim "  "] is [""] *)
+ (** [trim " "] is [""] *)

I believe that this PR doesn't fix the original issue (which was fixed in #1673).

Perhaps we should open an issue on Odoc ?

jberdine · 2021-09-30T22:13:22Z

In the original report I was mistaken to bring up utf length computation, that turned out not to be the issue. The issue in the comment above remains though, and is fixed by this PR.

@Julow, are you saying that any change in ocamlformat that allows every space in a code span to break would cause problems with odoc? Are the odoc ast changes you mention something that are checked in ocamlformat analogous to how we check the main parsetree is preserved by formatting? I'm curious as we have been running with this PR in production for a while with no issues.

Julow · 2021-10-01T12:55:39Z

@Julow, are you saying that any change in ocamlformat that allows every space in a code span to break would cause problems with odoc? Are the odoc ast changes you mention something that are checked in ocamlformat analogous to how we check the main parsetree is preserved by formatting? I'm curious as we have been running with this PR in production for a while with no issues.

Any change in whitespace/newlines in code spans would change the rendered documentation. Every strings in doc comments are normalized: https://github.com/ocaml-ppx/ocamlformat/blob/main/lib/Normalize.ml#L79-L83 which is wrong at least for code spans with the current implementation of Odoc.

I believe it is reasonable to change Odoc to normalize line breaks (and the following spaces) in code spans (but not consecutive spaces).

gpetiot · 2023-10-11T03:44:46Z

I believe it is reasonable to change Odoc to normalize line breaks (and the following spaces) in code spans (but not consecutive spaces).

@Julow So is this something worth fixing in odoc? we will synchronize with odoc as necessary. Otherwise we should close this PR.

Julow · 2023-10-11T08:18:03Z

I think it's worth clarifying Odoc's code span syntax independently from this PR.
This PR still makes sense in my opinion.

facebook-github-bot added the CLA Signed label Jul 21, 2021

Julow reviewed Jul 21, 2021

View reviewed changes

lib/Fmt_odoc.ml Show resolved Hide resolved

jberdine added 3 commits August 4, 2021 23:57

Add missing docstring normalization

b139da8

Without this docstrings with UTF8 withing `[]` have miscalculated lengths.

promote tests

82f553e

changes

eba87ba

jberdine force-pushed the norm branch from 82b388d to eba87ba Compare August 4, 2021 23:18

jberdine marked this pull request as ready for review August 4, 2021 23:19

gpetiot requested a review from Julow August 26, 2021 14:55

gpetiot mentioned this pull request Oct 8, 2021

[RFC] modifications to the conventional profile #1851

Closed

gpetiot marked this pull request as draft October 8, 2021 21:28

gpetiot removed the CLA Signed label Apr 6, 2023

panglesd mentioned this pull request Feb 27, 2024

Normalize whitespaces in codespans ocaml/odoc#1085

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add missing docstring normalization #1736

Add missing docstring normalization #1736

jberdine commented Jul 21, 2021

Julow left a comment

jberdine commented Jul 21, 2021

Julow commented Jul 21, 2021

jberdine commented Jul 21, 2021

jberdine commented Jul 21, 2021

Julow commented Jul 22, 2021 •

edited

Loading

jberdine commented Jul 22, 2021

gpetiot commented Aug 26, 2021

jberdine commented Aug 26, 2021

Julow commented Sep 14, 2021

jberdine commented Sep 30, 2021

Julow commented Oct 1, 2021 •

edited

Loading

gpetiot commented Oct 11, 2023 •

edited

Loading

Julow commented Oct 11, 2023

Add missing docstring normalization #1736

Are you sure you want to change the base?

Add missing docstring normalization #1736

Conversation

jberdine commented Jul 21, 2021

Julow left a comment

Choose a reason for hiding this comment

jberdine commented Jul 21, 2021

Julow commented Jul 21, 2021

jberdine commented Jul 21, 2021

jberdine commented Jul 21, 2021

Julow commented Jul 22, 2021 • edited Loading

jberdine commented Jul 22, 2021

gpetiot commented Aug 26, 2021

jberdine commented Aug 26, 2021

Julow commented Sep 14, 2021

jberdine commented Sep 30, 2021

Julow commented Oct 1, 2021 • edited Loading

gpetiot commented Oct 11, 2023 • edited Loading

Julow commented Oct 11, 2023

Julow commented Jul 22, 2021 •

edited

Loading

Julow commented Oct 1, 2021 •

edited

Loading

gpetiot commented Oct 11, 2023 •

edited

Loading