Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

commonmark-pandoc: calculate relative cell widths for pipe tables #128

Open
max-heller opened this issue Nov 18, 2023 · 11 comments
Open

commonmark-pandoc: calculate relative cell widths for pipe tables #128

max-heller opened this issue Nov 18, 2023 · 11 comments

Comments

@max-heller
Copy link

Pandoc's Markdown parser sets relative widths for each column in a pipe table when the table contains long rows, which allows LaTeX to wrap cells to avoid overflowing the page (implemented in jgm/pandoc@eb8aee4).

commonmark-pandoc doesn't implement this functionality and always sets the column width to default, resulting in poor PDF rendering:

colspecs = map (\al -> (toPandocAlignment al, ColWidthDefault))
aligns

It'd be great if this feature could be implemented in commonmark-pandoc, but I don't have the familiarity with Haskell or the codebase to do it myself.

@jgm
Copy link
Owner

jgm commented Nov 18, 2023

commonmark-pandoc can only do what the core commonmark-extensions API for tables supports, so that is where the change would need to happen. `commonmark-pandoc' is just a thin wrapper.

@max-heller
Copy link
Author

commonmark-pandoc can only do what the core commonmark-extensions API for tables supports, so that is where the change would need to happen. `commonmark-pandoc' is just a thin wrapper.

Not sure I follow: the code is currently passing ColWidthDefault, so it seems like it could also pass a specific ColWidth width.

@jgm
Copy link
Owner

jgm commented Nov 19, 2023

Well, how would it figure out what ColWidth to use? commonmark-pandoc doesn't have access to the widths of separator lines.

Another issue is that computing relative widths requires a standard for 100%. In pandoc this is adjustable using --columns. Not sure how this would be handled in commonmark.

@max-heller
Copy link
Author

Well, how would it figure out what ColWidth to use? commonmark-pandoc doesn't have access to the widths of separator lines.

Ah, I understand now: commonmark-extensions's HasPipeTable typeclass doesn't pass through the separator widths, so that'd have to be changed first.

Another issue is that computing relative widths requires a standard for 100%. In pandoc this is adjustable using --columns. Not sure how this would be handled in commonmark.

Perhaps commonmark-pandoc could always compute relative widths (using the separator line approach) and then leave it to the pandoc writer to determine whether to use them or determine widths automatically (if pandoc thinks the rows fit within --columns)?

@jgm
Copy link
Owner

jgm commented Nov 20, 2023

Perhaps commonmark-pandoc could always compute relative widths (using the separator line approach) and then leave it to the pandoc writer to determine whether to use them or determine widths automatically (if pandoc thinks the rows fit within --columns)?

This would require pandoc to be parsing the document rather than handing off parsing to commonmark. Pandoc doesn't know what the line widths are -- it just sends commonmark the whole text and gets back an AST.

@max-heller
Copy link
Author

This would require pandoc to be parsing the document rather than handing off parsing to commonmark. Pandoc doesn't know what the line widths are -- it just sends commonmark the whole text and gets back an AST.

I was imagining Pandoc would parse with commonmark, get an AST, then estimate rendered line widths based on the AST for each row.

If I understand the current approach, it seems like the column width of the markdown source is used to determine whether a line is long:

| column a | column b |
| -------- | -------- |
| normal   | more normal |
| something | reaaaaaaaaaaaaaaaaaaaaaally long | # considered long

With commonmark doing the parsing, Pandoc wouldn't be able to tell the length of the source line, but it'd get an AST that looks something like:

headers: ["column a", "column b"]
separators: ["--------", "--------"] or [8, 8] # separator widths added to the `commonmark-extensions` API
rows: [
    ["normal", "more normal"],
    ["something", "reaaaaaaaaaaaaaaaaaaaaaally long"],
]

With the AST, Pandoc would then sum up an approximation of width for each cell in a row, and determine whether row exceeds --columns.

Approximating the rendered width -- based on either the source markdown or the AST -- seems tricky in the presence of short commands that expand to long output and long commands that expand to short output (e.g. \textendash), but if a rough approximation is okay, it seems doable to perform on the AST layer as well.

@jgm
Copy link
Owner

jgm commented Nov 20, 2023

You could get a rough approximation that way, maybe.
But there are lots of cases where it would fail. E.g. a Link in the AST could have been a short reference link or a long inline link in the source.

@max-heller
Copy link
Author

You could get a rough approximation that way, maybe. But there are lots of cases where it would fail. E.g. a Link in the AST could have been a short reference link or a long inline link in the source.

Is that not already an issue with the source-based approach? [link](reaaaaaaaaaaaaaaaaaaaally loooooong link) looks long in the source but is short when rendered

@jgm
Copy link
Owner

jgm commented Nov 25, 2023

What I mean is that it could not match the behavior specified in the manual, which refers to the length of the source line -- which we could only guess about.

@max-heller
Copy link
Author

max-heller commented Nov 25, 2023

What I mean is that it could not match the behavior specified in the manual, which refers to the length of the source line -- which we could only guess about.

I see. As a clunky but correct approach then, how about passing the source (or the source length) for each row through commonmark-extensions along with the parsed rows?

@jgm
Copy link
Owner

jgm commented Feb 7, 2024

We should be able to change

class HasPipeTable il bl where
  pipeTable :: [ColAlignment] -> [il] -> [[il]] -> bl

to something like

class HasPipeTable il bl where
  pipeTable :: [(ColAlignment, ColWidth)] -> [il] -> [[il]] -> bl

in commonmark-extensions. We can have the extension calculate widths in the same way as pandoc. Then it would be trivial to have commonmark-pandoc use this information.

The one potential drawback is that our pipe tables would sometimes render differently from GFM's -- and this might be a problem for some users.

To fix that, we could make pipeTableSpec parameterizable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants