Skip to content
This repository has been archived by the owner on Apr 3, 2024. It is now read-only.

Support syntax highlighting in <pre> #402

Open
Krinkle opened this issue Jul 28, 2017 · 4 comments
Open

Support syntax highlighting in <pre> #402

Krinkle opened this issue Jul 28, 2017 · 4 comments

Comments

@Krinkle
Copy link

Krinkle commented Jul 28, 2017

Originally reported at npm/www#207.


In GitHub Flavoured Markdown use of ```javascript renders the same as <pre lang="javascript">. Same HTML output, same syntax highlighting and other styling. Use of <pre> is often preferred when compatibility is desired with other Markdown parsers (such as Doxygen, Phabricator, GitBlit, GitWeb etc.), especially when the code is published from a repository outside GitHub (but possibly still mirrored to GitHub).

Many @wikimedia projects are maintained in Phabricator (with its own Markdown-like parser for readme files), and documentation generated by Doxygen (with its own Markdown-like parser to create the landing page from the readme file).

It seems marky-markdown (used on npmjs.com) treats these two syntaxes differently.

```javascript on GitHub ```javascript on npmjs.com
screen shot screen shot
<pre lang="javascript"> on GitHub <pre lang="javascript"> on npmjs.com
screen shot screen shot

rendering of the above tables

Notice how the <pre> element rendering at https://www.npmjs.com/package/oojs has no background color or syntax highlighting.

It'd be cool if marky-markdown could support this for improved interoperability (given that <pre> is supported by all, and also falls back nicely, whereas the tripple-backtick approach is mostly GFM-only. ReMarkup is the only other one I know of that supports tripple-backticks, but unfortunately it uses an incompatible syntax for specifying the language (it requires ```lang=javascript insteaed, which GFM doesn't support), which means the only common way to both is to use <pre>. And Doxygen/GitWeb/GitBlit etc. only take <pre> anyway.

@revin
Copy link
Collaborator

revin commented Jul 30, 2017

Hey @Krinkle very good observation. If GitHub does it, we definitely should do it too. Thanks for pointing it out!

@karanjthakkar
Copy link
Contributor

@revin If I wanted to give this a shot, where do you think I should start from? Any pointers?

@revin
Copy link
Collaborator

revin commented Sep 1, 2017

Hey @karanjthakkar, thanks for asking!

This one might be somewhat complicated, I think. I don't know the exact code that will get this feature implemented, off the top of my head, but here's how I'd attack it:

  1. Get the raw contents of the README in question so you're working with real data. Notice that there are a couple of instances of <pre lang="…"> there.
  2. Paste the README into the live marky tester and click on the debug tab over on the output side of the page to see the internal markdown-it token stream represented as JSON.
  3. Notice that the <pre lang="…"> markup will occur inside html_block tokens.
  4. Notice that marky-markdown does code highlighting by passing a function in the markdown-it options: see render.js.
  5. Notice that markdown-it only runs the highlighter on what CommonMark calls "code fences" (the ```-delimited markdown blocks): see markdown-it/lib/renderer.js

So, the technical task here is to convince markdown-it to also run the highlighter on html_block tokens when there's a lang attribute provided.

Here's where the process is a little foggy for me, and I can't tell you with absolute certainty what will work. What I think you'd need to do is create a markdown-it plugin that overrides the renderer for html_block tokens to inspect their contents, and to run the highlighter provided in the markdown-it instance's options when the block starts with a pre element having a lang attribute, plus make any necessary changes to the code that runs the highlighting stuff (which is all in our render.js) in case it turns out the code makes assumptions about the nature of the markdown-it token type or structure that are invalid now that highlighting might happen on more than just code fences.

We maintain a couple of plugins already that deal with processing html_block tokens; maybe review those first to get an idea of how the general process works, then add a new one and hook it up in render.js.

We'd also need good tests around this code as well. We're not gonna try and dictate whether you do them beforehand, TDD style; or after the fact once you've got it working; all we care about is that we have good test coverage for features like this. If you get all of this working and then want some help with getting the tests in place, I'm more than happy to lend a hand on that.

I hope this helps. If any part of it is unclear or you need more detail or whatever, just ask, and I'll answer anything I can. Thanks again! ✨✨✨

@karanjthakkar
Copy link
Contributor

@revin Thanks a lot for the detailed reply! 💯 This seems much more involved than I initially thought it would be. Let me go through the existing flow over the weekend and see if I can figure out what's happening. Then I can get back to you if I have any doubts!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants