`commonmark-pandoc`: calculate relative cell widths for pipe tables #128

max-heller · 2023-11-18T17:27:23Z

Pandoc's Markdown parser sets relative widths for each column in a pipe table when the table contains long rows, which allows LaTeX to wrap cells to avoid overflowing the page (implemented in jgm/pandoc@eb8aee4).
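For concreteness, here is a rough Haskell sketch of the rule as the pandoc manual describes it for pipe tables. If any source line of the table is longer than the column limit (`--columns`, 72 by default), each column's relative width is its separator dash count divided by the total. `relativeWidths` is a hypothetical name and this is not pandoc's actual implementation. Lengths are simply counted in characters here.

```haskell
-- Hypothetical sketch of the manual's pipe-table width rule (not pandoc's code).
-- columns:     the column limit (pandoc's --columns, 72 by default)
-- sourceLines: every source line of the table, including header and separator
-- dashCounts:  number of dashes in each separator cell
-- Nothing means "leave widths to the writer", i.e. ColWidthDefault.
relativeWidths :: Int -> [String] -> [Int] -> Maybe [Double]
relativeWidths columns sourceLines dashCounts
  | any ((> columns) . length) sourceLines, total > 0 =
      Just [fromIntegral d / fromIntegral total | d <- dashCounts]
  | otherwise = Nothing
  where
    total = sum dashCounts
```

For the manual's own example, a separator of `---|-` in a table containing an over-long line gives widths of 3/4 and 1/4.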

commonmark-pandoc doesn't implement this functionality and always sets the column width to default, resulting in poor PDF rendering:

commonmark-hs/commonmark-pandoc/src/Commonmark/Pandoc.hs, lines 139 to 140 in 6ec393d:

    colspecs = map (\al -> (toPandocAlignment al, ColWidthDefault))
                   aligns

It'd be great if this feature could be implemented in commonmark-pandoc, but I don't have the familiarity with Haskell or the codebase to do it myself.


jgm · 2023-11-18T23:09:38Z

commonmark-pandoc can only do what the core commonmark-extensions API for tables supports, so that is where the change would need to happen. `commonmark-pandoc` is just a thin wrapper.

max-heller · 2023-11-19T00:06:59Z

> commonmark-pandoc can only do what the core commonmark-extensions API for tables supports, so that is where the change would need to happen. `commonmark-pandoc` is just a thin wrapper.

Not sure I follow: the code is currently passing ColWidthDefault, so it seems like it could also pass a specific ColWidth width.

jgm · 2023-11-19T20:52:00Z

Well, how would it figure out what ColWidth to use? commonmark-pandoc doesn't have access to the widths of separator lines.

Another issue is that computing relative widths requires a standard for 100%. In pandoc this is adjustable using --columns. Not sure how this would be handled in commonmark.

max-heller · 2023-11-20T13:31:37Z

> Well, how would it figure out what ColWidth to use? commonmark-pandoc doesn't have access to the widths of separator lines.

Ah, I understand now: commonmark-extensions's HasPipeTable typeclass doesn't pass through the separator widths, so that'd have to be changed first.
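As a sketch of the missing piece, the separator widths are cheap to recover from the separator line itself. `separatorWidths` below is a hypothetical helper, not part of commonmark-extensions. It counts only dashes (not alignment colons), following the manual's wording; whether colons should count is something to check against pandoc's reader.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd)

-- Hypothetical helper (not commonmark-extensions API): count the dashes in
-- each cell of a pipe-table separator line such as "| -------- | :----: |".
-- Alignment colons and surrounding spaces are not counted.
separatorWidths :: String -> [Int]
separatorWidths line = map (length . filter (== '-')) (splitOnPipe inner)
  where
    trimmed = dropWhileEnd isSpace (dropWhile isSpace line)
    inner = dropTrailingPipe (dropLeadingPipe trimmed)

    dropLeadingPipe ('|' : rest) = rest
    dropLeadingPipe s = s

    dropTrailingPipe s
      | not (null s) && last s == '|' = init s
      | otherwise = s

-- Split on every '|'. Separator lines contain only '-', ':', '|' and
-- spaces, so no escape handling is needed.
splitOnPipe :: String -> [String]
splitOnPipe s = case break (== '|') s of
  (cell, []) -> [cell]
  (cell, _ : rest) -> cell : splitOnPipe rest
```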

> Another issue is that computing relative widths requires a standard for 100%. In pandoc this is adjustable using --columns. Not sure how this would be handled in commonmark.

Perhaps commonmark-pandoc could always compute relative widths (using the separator line approach) and then leave it to the pandoc writer to determine whether to use them or determine widths automatically (if pandoc thinks the rows fit within --columns)?

jgm · 2023-11-20T17:01:24Z

> Perhaps commonmark-pandoc could always compute relative widths (using the separator line approach) and then leave it to the pandoc writer to determine whether to use them or determine widths automatically (if pandoc thinks the rows fit within --columns)?

This would require pandoc to be parsing the document rather than handing off parsing to commonmark. Pandoc doesn't know what the line widths are -- it just sends commonmark the whole text and gets back an AST.

max-heller · 2023-11-20T21:24:57Z

> This would require pandoc to be parsing the document rather than handing off parsing to commonmark. Pandoc doesn't know what the line widths are -- it just sends commonmark the whole text and gets back an AST.

I was imagining Pandoc would parse with commonmark, get an AST, then estimate rendered line widths based on the AST for each row.

If I understand the current approach, it seems like the column width of the markdown source is used to determine whether a line is long:

    | column a | column b |
    | -------- | -------- |
    | normal   | more normal |
    | something | reaaaaaaaaaaaaaaaaaaaaaally long | # considered long

With commonmark doing the parsing, Pandoc wouldn't be able to tell the length of the source line, but it'd get an AST that looks something like:

    headers: ["column a", "column b"]
    separators: ["--------", "--------"] or [8, 8] # separator widths added to the `commonmark-extensions` API
    rows: [
        ["normal", "more normal"],
        ["something", "reaaaaaaaaaaaaaaaaaaaaaally long"],
    ]

With the AST, Pandoc would then sum up an approximate width for each cell in a row and determine whether the row exceeds --columns.

Approximating the rendered width -- based on either the source markdown or the AST -- seems tricky in the presence of short commands that expand to long output and long commands that expand to short output (e.g. \textendash), but if a rough approximation is okay, it seems doable to perform on the AST layer as well.
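A minimal sketch of that AST-side estimate, assuming the cell contents have already been flattened to plain text (for instance with pandoc's `stringify`): pretend each row is written as `| cell | cell |` and compare that length to `--columns`. Both function names are hypothetical, and the result is only an approximation of the real source-line length.

```haskell
-- Hypothetical heuristic: estimated width of a row written as
-- "| cell | cell |". Each cell contributes its text, one space on either
-- side and one trailing pipe (3 extra characters); the row also needs a
-- single leading pipe.
estimatedRowWidth :: [String] -> Int
estimatedRowWidth cells = sum (map length cells) + 3 * length cells + 1

-- A row counts as "long" when its estimated width exceeds the column limit.
rowIsLong :: Int -> [String] -> Bool
rowIsLong columns cells = estimatedRowWidth cells > columns
```

For the row `["normal", "more normal"]` the estimate is 24, the same length as the source line `| normal | more normal |`.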

jgm · 2023-11-20T22:12:53Z

You could get a rough approximation that way, maybe.
But there are lots of cases where it would fail. E.g. a Link in the AST could have been a short reference link or a long inline link in the source.

max-heller · 2023-11-20T22:50:13Z

> You could get a rough approximation that way, maybe. But there are lots of cases where it would fail. E.g. a Link in the AST could have been a short reference link or a long inline link in the source.

Is that not already an issue with the source-based approach? `[link](reaaaaaaaaaaaaaaaaaaaally loooooong link)` looks long in the source but is short when rendered.

jgm · 2023-11-25T20:37:15Z

What I mean is that it could not match the behavior specified in the manual, which refers to the length of the source line -- which we could only guess about.

max-heller · 2023-11-25T21:28:17Z

> What I mean is that it could not match the behavior specified in the manual, which refers to the length of the source line -- which we could only guess about.

I see. As a clunky but correct approach then, how about passing the source (or the source length) for each row through commonmark-extensions along with the parsed rows?
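Roughly, each parsed row could carry the length of the source line it came from, so the consumer can apply the manual's rule exactly instead of guessing. `SourcedRow` and `anyRowTooLong` are hypothetical names, not existing commonmark-extensions API; the header and separator lines would need their lengths passed along too.

```haskell
-- Hypothetical row type: parsed cells plus the length of the original
-- source line, so the "any line longer than --columns" rule can be applied
-- without re-reading the source.
data SourcedRow il = SourcedRow
  { rowSourceLength :: Int -- characters in the original source line
  , rowCells :: [il] -- parsed cell contents
  }
  deriving (Show, Eq)

-- True when any row's source line exceeds the column limit.
anyRowTooLong :: Int -> [SourcedRow il] -> Bool
anyRowTooLong columns = any ((> columns) . rowSourceLength)
```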

jgm · 2024-02-07T03:58:37Z

We should be able to change

    class HasPipeTable il bl where
      pipeTable :: [ColAlignment] -> [il] -> [[il]] -> bl

to something like

    class HasPipeTable il bl where
      pipeTable :: [(ColAlignment, ColWidth)] -> [il] -> [[il]] -> bl

in commonmark-extensions. We can have the extension calculate widths in the same way as pandoc. Then it would be trivial to have commonmark-pandoc use this information.
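A self-contained sketch of what the proposed signature would look like from the consumer side. `ColAlignment`, `ColWidth` and `Alignment` are local stand-ins that mirror (as an assumption) the commonmark-extensions and pandoc-types definitions, and `MiniTable` is a toy replacement for pandoc's `Table`. The point is that the instance just passes the widths through instead of hard-coding `ColWidthDefault`.

```haskell
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE MultiParamTypeClasses #-}

-- Local stand-ins so the sketch compiles on its own (assumed to mirror
-- commonmark-extensions' ColAlignment and pandoc-types' ColWidth/Alignment).
data ColAlignment = LeftAlignedCol | CenterAlignedCol | RightAlignedCol | DefaultAlignedCol
  deriving (Show, Eq)

data ColWidth = ColWidth Double | ColWidthDefault
  deriving (Show, Eq)

data Alignment = AlignLeft | AlignRight | AlignCenter | AlignDefault
  deriving (Show, Eq)

-- The proposed class: each column's alignment travels with its width.
class HasPipeTable il bl where
  pipeTable :: [(ColAlignment, ColWidth)] -> [il] -> [[il]] -> bl

-- Toy block type standing in for pandoc's Table.
data MiniTable = MiniTable [(Alignment, ColWidth)] [String] [[String]]
  deriving (Show, Eq)

-- Local version of the alignment conversion used in commonmark-pandoc.
toPandocAlignment :: ColAlignment -> Alignment
toPandocAlignment LeftAlignedCol = AlignLeft
toPandocAlignment CenterAlignedCol = AlignCenter
toPandocAlignment RightAlignedCol = AlignRight
toPandocAlignment DefaultAlignedCol = AlignDefault

-- The widths computed by the extension are passed straight through.
instance HasPipeTable String MiniTable where
  pipeTable specs headers rows =
    MiniTable [(toPandocAlignment al, w) | (al, w) <- specs] headers rows
```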

The one potential drawback is that our pipe tables would sometimes render differently from GFM's -- and this might be a problem for some users.

To fix that, we could make pipeTableSpec parameterizable.
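One hypothetical way to parameterize it: an options record that selects GFM-compatible behavior (never emit widths) or pandoc-compatible behavior (apply the long-line rule with a configurable limit). `PipeTableOptions`, its fields, and `columnWidths` are invented for illustration and are not part of commonmark-extensions; `ColWidth` mirrors pandoc-types.

```haskell
-- Stand-in mirroring pandoc-types' ColWidth.
data ColWidth = ColWidth Double | ColWidthDefault
  deriving (Show, Eq)

-- Hypothetical options a parameterized pipeTableSpec could take.
data PipeTableOptions = PipeTableOptions
  { computeRelativeWidths :: Bool -- False: GFM-like, never emit widths
  , widthColumns :: Int -- column limit for the long-line rule
  }
  deriving (Show, Eq)

gfmOptions, pandocOptions :: PipeTableOptions
gfmOptions = PipeTableOptions {computeRelativeWidths = False, widthColumns = 72}
pandocOptions = PipeTableOptions {computeRelativeWidths = True, widthColumns = 72}

-- longestLine: length of the longest source line in the table.
-- dashCounts:  number of dashes in each separator cell.
columnWidths :: PipeTableOptions -> Int -> [Int] -> [ColWidth]
columnWidths opts longestLine dashCounts
  | computeRelativeWidths opts
  , longestLine > widthColumns opts
  , total > 0 =
      [ColWidth (fromIntegral d / fromIntegral total) | d <- dashCounts]
  | otherwise = map (const ColWidthDefault) dashCounts
  where
    total = sum dashCounts
```

With this shape, users who need GFM-identical rendering keep the current output, while pandoc can opt in to the relative widths.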

max-heller mentioned this issue Jan 16, 2024

Lines not wrapping in PDF tables google/comprehensive-rust#1709 (closed)
