Support custom extensions "interrupting" built-in tokens #3435

calculuschild · 2024-08-29T17:48:14Z

What pain point are you perceiving?
Not sure of the best way to describe this. Currently, custom extensions have the start property, which we use to interrupt the paragraph element. But there are other tokens that are interruptible according to the CommonMark/GFM spec. For example, GFM tables must end when they encounter another block-level token.

The difficulty comes with enforcing that rule for custom extensions. Say I make a new block-level token via custom extensions:

{{block
....
}}

If this were placed immediately after a table, the table would just consume it, because the table tokenizer does not interact with the start property the way the paragraph tokenizer does. You could roll your own table tokenizer that does nothing except add a few more characters to the rules regex, but that seems like a lot of effort just to make your extension compatible with GFM rules.
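For concreteness, a tokenizer for the `{{block ... }}` syntax above might look roughly like the sketch below. It is standalone and does not import marked; the `BlockToken` shape and the `blockTokenizer` name are hypothetical, modeled loosely on the `{ type, raw }` objects that marked extension tokenizers return.

```typescript
// Hypothetical token shape, loosely modeled on marked's `{ type, raw }` tokens.
interface BlockToken {
  type: "customBlock";
  raw: string;  // exact source text consumed, including delimiters
  body: string; // text between the `{{block` and `}}` lines
}

// `{{block` on its own line, a body, then `}}` on its own line.
// Anchored with `^` so it only matches at the lexer's current position.
const BLOCK_RULE = /^\{\{block[ \t]*\n([\s\S]*?)\n\}\}[ \t]*(?:\n|$)/;

function blockTokenizer(src: string): BlockToken | undefined {
  const match = BLOCK_RULE.exec(src);
  if (!match) return undefined; // not our token; let other tokenizers try
  return { type: "customBlock", raw: match[0], body: match[1] };
}
```

Because the pattern is anchored with `^`, it only fires when the lexer is positioned at `{{block`, which never happens if a preceding table has already swallowed those lines as rows.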

Describe the solution you'd like
I really don't know how this would be implemented, but ideally there would be a way for an extension to signal which tokens it can interrupt. Or, perhaps better, the other way around: allow a token to specify which types of other tokens can interrupt it.

One thing to consider is that each token also differs in the points at which it can be interrupted. Blockquotes can only be interrupted during the "lazy continuation" step. Paragraphs can be interrupted at any time. Tables can only be interrupted by a line that does not start with |. Not every token can be interrupted by the same kinds of tokens.
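The "let a token specify what can interrupt it" idea, combined with the per-token rules above, could be expressed as a small policy table. This is a hypothetical sketch, not an existing marked API; `Container`, `interruptPolicy`, and `canInterrupt` are illustrative names, and the line tests are simplified versions of the rules as stated in this issue.

```typescript
// Hypothetical container kinds taken from the discussion above.
type Container = "paragraph" | "table" | "blockquote";

// Hypothetical policy: each container decides whether a line that starts a
// custom block token may end it. This is not an existing marked API.
const interruptPolicy: Record<Container, (line: string) => boolean> = {
  // Paragraphs can be interrupted on any line.
  paragraph: () => true,
  // Tables can only be interrupted by a line that does not start with `|`.
  table: (line) => !/^ {0,3}\|/.test(line),
  // Blockquotes can only be interrupted on a lazy continuation line,
  // i.e. a line that does not carry its own `>` marker.
  blockquote: (line) => !/^ {0,3}>/.test(line),
};

// True if `line` begins a custom token AND the container allows that line
// to interrupt it.
function canInterrupt(
  container: Container,
  line: string,
  startsCustomToken: (line: string) => boolean,
): boolean {
  return startsCustomToken(line) && interruptPolicy[container](line);
}

const startsBlock = (line: string) => line.includes("{{block");
console.log(canInterrupt("table", "{{block", startsBlock));     // true
console.log(canInterrupt("table", "| {{block |", startsBlock)); // false: still a table row
```

Keeping the rule on the container side means an extension only has to say where it starts; each built-in token decides when that start is allowed to end it.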

I kind of hacked my way around this for tables in my own extension, Marked-Extended-Tables, by letting the user supply a "termination" regex that is appended to the tokenizer and makes the table stop lexing on that line.

https://github.com/calculuschild/marked-extended-tables/blob/9e56b24598e07de71e225d6c50a50d40c366965f/src/index.js#L23-L25
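The termination idea can be sketched as a loop that collects table rows until a line matches either a built-in terminator or a user-supplied pattern. The issue describes appending the pattern to the tokenizer's regex; this sketch instead tests each line separately, which is easier to follow. The terminator list is deliberately simplified, and `collectTableRows` is a hypothetical name, not code from marked or Marked-Extended-Tables.

```typescript
// Simplified built-in terminators for a table body in this sketch:
// a blank line, an ATX heading, a blockquote marker, or a code fence.
const BUILT_IN_END = /^(?:\s*$| {0,3}(?:#{1,6}(?:[ \t]|$)|>|`{3}|~{3}))/;

// Collect table body rows until a line matches a built-in terminator or the
// optional user-supplied `termination` pattern (use a non-global RegExp,
// since `test` on a global one is stateful). Returns rows and the leftover source.
function collectTableRows(
  body: string,
  termination?: RegExp,
): { rows: string[]; rest: string } {
  const lines = body.split("\n");
  const rows: string[] = [];
  let i = 0;
  for (; i < lines.length; i++) {
    const line = lines[i];
    if (BUILT_IN_END.test(line)) break;
    if (termination !== undefined && termination.test(line)) break;
    rows.push(line);
  }
  return { rows, rest: lines.slice(i).join("\n") };
}
```

Without a termination pattern, the `{{block` lines in `"| 1 | 2 |\n{{block\n....\n}}"` all become rows; passing `/^\{\{block/` stops the table after the first row and leaves the custom block for its own tokenizer.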

Not sure if this is the easiest way to go about it, but the trickiest part is applying it to the built-in tokens without ending up rewriting every tokenizer anyway.

Mostly I'm just kind of stumped on any better way to do this.


UziTech · 2024-08-29T20:10:28Z

The way we interrupt paragraph is by clipping src when passing it into the tokenizer

marked/src/Lexer.ts (line 235 in 2124b5d):

// prevent paragraph consuming extensions by clipping 'src' to extension start

We could do something similar with other tokenizers.
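A standalone sketch of the clipping idea, modeled on the paragraph logic referenced above: each extension's start function reports where its token could next begin, and the source handed to the tokenizer is cut just before the earliest such position. `StartFn` and `clipAtExtensionStart` are illustrative names, not marked exports; reusing this for another tokenizer would mean passing it the clipped string instead of the full `src`.

```typescript
// A `start` function returns the index where its token may begin, or
// undefined / a negative number if it cannot be found.
type StartFn = (src: string) => number | undefined;

// Clip `src` so a tokenizer cannot consume past the earliest position where
// any extension could start. Position 0 is skipped (by searching
// `src.slice(1)`) so the current token is never clipped to an empty string.
function clipAtExtensionStart(src: string, starts: StartFn[]): string {
  let startIndex = Infinity;
  const tempSrc = src.slice(1);
  for (const getStart of starts) {
    const idx = getStart(tempSrc);
    if (typeof idx === "number" && idx >= 0) {
      startIndex = Math.min(startIndex, idx);
    }
  }
  // +1 converts an index in `tempSrc` back to an index in `src`.
  return startIndex < Infinity ? src.substring(0, startIndex + 1) : src;
}
```

For example, with a start function `(s) => s.indexOf("{{block")`, the source `"text\n{{block\n....\n}}"` is clipped to `"text\n"`, so the preceding tokenizer stops before the custom block.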

Although I'm not sure this is needed if we just say built-in tokens take precedence over custom tokens. In well formatted markdown every block token should be separated by a blank line. The only reason start is actually needed is for inline tokens.

UziTech · 2024-08-29T20:13:17Z

For example, the KaTeX extension's block tokenizer does not have a start function because we expect a blank line before it, so even a paragraph takes precedence.

https://github.com/UziTech/marked-katex-extension/blob/main/src/index.js#L63
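To illustrate why a block extension without a start function still works when a blank line precedes it, here is a toy block lexer (not marked's Lexer) that tries extension tokenizers first at each block position and otherwise reads a paragraph up to the next blank line. `Token`, `Tokenizer`, `lexBlocks`, and `customBlock` are all hypothetical names.

```typescript
type Token = { type: string; raw: string };
type Tokenizer = (src: string) => Token | undefined;

// Hypothetical minimal block lexer: extensions are tried first at each
// position; otherwise a paragraph runs until the next blank line. No `start`
// functions are consulted, so paragraphs are never clipped.
// Assumes tokenizers never return an empty `raw`.
function lexBlocks(src: string, extensions: Tokenizer[]): Token[] {
  const tokens: Token[] = [];
  while (src.length > 0) {
    // Skip blank lines between blocks.
    const blank = /^(?:[ \t]*\n)+/.exec(src);
    if (blank) { src = src.slice(blank[0].length); continue; }

    let token: Token | undefined;
    for (const tokenize of extensions) {
      token = tokenize(src);
      if (token) break;
    }
    // Fallback: a paragraph is every line up to the next blank line.
    if (!token) {
      const para = /^[^\n]+(?:\n(?![ \t]*(?:\n|$))[^\n]*)*\n?/.exec(src)!;
      token = { type: "paragraph", raw: para[0] };
    }
    tokens.push(token);
    src = src.slice(token.raw.length);
  }
  return tokens;
}

// A block extension with no `start` function, in the spirit of the KaTeX one.
const customBlock: Tokenizer = (src) => {
  const m = /^\{\{block\n[\s\S]*?\n\}\}(?:\n|$)/.exec(src);
  return m ? { type: "customBlock", raw: m[0] } : undefined;
};
```

With a blank line before `{{block`, the paragraph ends first and the extension gets its turn (`paragraph, customBlock`); without it, the paragraph absorbs the custom block, which is exactly the case that paragraph clipping via start exists to handle.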

calculuschild · 2024-08-29T21:23:09Z

> The way we interrupt paragraph is by clipping src when passing it into the tokenizer

I remember. I wrote that. 😜

> In well formatted markdown every block token should be separated by a blank line.

Pretty markdown might, but the specs still make it clear that it is valid to place certain block tokens directly against each other (demo example).

> The only reason start is actually needed is for inline tokens.

Remember, we have separate handling for paragraphs and inline text. Paragraphs are clipped by block tokens:

marked/src/Lexer.ts (line 237 in 2124b5d):

if (this.options.extensions && this.options.extensions.startBlock) {

And inline text is clipped by inline tokens:

marked/src/Lexer.ts (line 436 in 2124b5d):

if (this.options.extensions && this.options.extensions.startInline) {

They are both needed.

> We could do something similar with other tokenizers.

If we did, I think it would only need to cover tables and blockquotes to keep with the GFM spec. The other block tokens either have a clear ending symbol (fences) or are allowed to simply absorb other block tokens (lists). Maybe that's not too bad?
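If clipping were extended to tables, it would also need to respect the table rule from the original post (only a line not starting with | can end the table). The sketch below is hypothetical: it reuses the idea of extension start functions but only honors a match at the very beginning of an eligible body line. `clipTableSource` and the fixed two-line header are illustrative assumptions, not marked behavior; a blockquote version would instead check only lazy continuation lines.

```typescript
// A `start` function returns where its token may begin, or undefined / -1.
type StartFn = (src: string) => number | undefined;

// Hypothetical: cut a table's source before the first body line on which a
// custom block extension starts, skipping lines that begin with `|`, which
// (per the rule described in this issue) are still table rows.
function clipTableSource(src: string, starts: StartFn[]): string {
  const lines = src.split("\n");
  let offset = 0; // character offset of `lines[i]` within `src`
  for (let i = 0; i < lines.length; i++) {
    const line = lines[i];
    // Lines 0 and 1 are assumed to be the header and delimiter rows.
    const canEndTable = i >= 2 && !/^ {0,3}\|/.test(line);
    if (canEndTable && starts.some((start) => start(line) === 0)) {
      return src.slice(0, offset);
    }
    offset += line.length + 1; // +1 for the "\n" removed by split
  }
  return src;
}
```

Requiring the start index to be exactly 0 on the line keeps a token that merely appears inside a row (for example `| {{block |`) from ending the table.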

UziTech · 2024-08-29T22:21:08Z

> Remember, we have separate handling for paragraphs and inline text. Paragraphs are clipped by block tokens. They are both needed.

The block tokenizer start function is not needed if you don't need to interrupt a paragraph. Paragraphs are automatically interrupted by blank lines.

calculuschild changed the title ~~Support custom extensions "interrupting" other tokens~~ Support custom extensions "interrupting" built-in tokens Aug 29, 2024

UziTech added the proposal label Sep 7, 2024
