Proposal - redesign how Elixir adds links to source code formatted by Pygments #307
Comments
First of all, that's all great stuff. Intuitively, this looks like the right direction forward.
Can you expand on that? If pygments exposes an interface to add links to tokens (using a specific format, From the threads (pygments/pygments#1930 and pygments/pygments#1091), people mostly ask about generating anchors that match the definition/reference name. That is a bit different from our use case, which is:
Is the second use case already supported? This seems pretty unrelated to link injection. Will we be parsing the Pygments output tree to find our symbols list? Could we contribute a change to Pygments so that it tries to handle the body of macros?
See
We would have to generate a ctags index file with information about references in the processed file (and then save it somewhere in the filesystem, though that could probably be worked around; it might be smarter to do that from update.py).
I think both issues mention both anchors and links? But regardless, if we have a use case, then I don't think existing issues matter that much, unless any of them argue against this feature. Also, adding functionality that lets users attach link/anchor information to any token would cover the more specific use case of "generating anchors that match the definition/reference name".
Yes, we would use the token list generated by Pygments to find references in the code. We could alternatively stay with the existing tokenizers just for update.py, but I just don't see the point.
Maybe. Splitting macros into more tokens shouldn't affect formatting; it would just generate more tokens.
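To make the idea of finding references from a token list concrete, here is a minimal sketch. It assumes the lexer output is a stream of (token type, text) pairs, which is how Pygments lexers present tokens; the token types are plain strings here as stand-ins for real Pygments token types, and `find_references` and the `known` set are hypothetical names, not existing Elixir code.

```python
from typing import Iterable, Iterator, Set, Tuple

# (token type, text). The type is a plain-string stand-in for a Pygments token type.
Token = Tuple[str, str]


def find_references(tokens: Iterable[Token], known: Set[str]) -> Iterator[Tuple[str, int]]:
    """Yield (identifier, line number) for every name token that is a known identifier."""
    line = 1
    for ttype, value in tokens:
        if ttype.startswith("Name") and value in known:
            yield value, line
        # Track the current line by counting newlines in the text seen so far.
        line += value.count("\n")


tokens = [
    ("Keyword.Type", "int"), ("Text", " "), ("Name.Function", "main"),
    ("Punctuation", "("), ("Punctuation", ")"), ("Text", "\n"),
    ("Punctuation", "{"), ("Text", "\n"),
    ("Text", "    "), ("Name", "printk"), ("Punctuation", "("),
    ("Literal.String", '"hi"'), ("Punctuation", ")"), ("Punctuation", ";"), ("Text", "\n"),
    ("Punctuation", "}"), ("Text", "\n"),
]
refs = list(find_references(tokens, {"main", "printk"}))
```

Because the lexer has already separated strings and comments from names, this kind of scan would not need its own tokenizer to avoid matching identifiers inside them.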
Ah yes. And it doesn't give a way to configure what the links should look like; it only describes generating anchors (i.e. putting title attributes). It is not a solution for our need.
Yes, we'll see what the maintainers think of it. Indeed, our need is more generic, so it should also cover what those issues ask for.
The nice thing is that working on this does not require any pygments patch.
It would (1) give us our desired symbols inside macros and (2) maybe improve syntax highlighting for the (probable) majority of macros that can be parsed. That requires a really lean lexer, however; I don't know what Pygments internals look like.
You actually can configure link format, see
Maybe it would actually be a good idea to start there, to see if updates that use Pygments lexers to find references give satisfying results.
I think we agree; I skipped a few steps there. What I meant is that if a lexer generated more tokens for a macro, then as long as nothing was skipped and the token types were correct, highlighting wouldn't change (and yes, someone could use these new tokens to improve macro highlighting).
I'm wondering if it wouldn't be a good idea to redesign how Elixir turns identifiers into links in formatted source code. The current approach is kind of hacky, and some of the bugs found in it seem rather hard to fix.
How filters work right now
Some more notes:
Problems with this approach
While some of this could of course be fixed while leaving the filter design mostly the same, I started wondering whether there is a better way to design this whole part.
Alternative design with Pygments
First, something that I want to be sure you noticed - the responsibility of filters seems to be to work around a missing functionality of Pygments. Pygments currently does not allow adding links to selected tokens (except if you generate a ctags file, but that's another hack, please, let's not do that).
My idea is simple:
Pygments is split into lexers and formatters - the lexer turns source into a sequence of tokens. The formatter then takes the tokens and turns them into formatted code. Pygments also supports filters - functions that take a sequence of tokens and transform it in some way.
This could be leveraged. Unfortunately, Pygments currently does not allow filters to add link information to tokens.
What I would like to do:
This should lead to less and simpler code - for example, filters won't have to track what they replaced in the unformatted source. Filters will also receive more information which in the future could allow them to, for example, track context to supply links with more information. And, less regex (in Elixir at least).
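As a rough illustration of the direction described above, here is a self-contained sketch of a filter that attaches link information to identifier tokens and a formatter that renders it. None of this is an existing Pygments API: `LinkedToken`, `link_identifiers`, `format_html`, `resolve` and the `/ident/...` URL shape are all hypothetical, and token types are plain strings standing in for Pygments token types.

```python
from html import escape
from typing import Callable, Iterable, Iterator, NamedTuple, Optional, Tuple


class LinkedToken(NamedTuple):
    """A token that can optionally carry a link target (hypothetical structure)."""
    ttype: str
    value: str
    href: Optional[str] = None


def link_identifiers(
    tokens: Iterable[Tuple[str, str]],
    resolve: Callable[[str], Optional[str]],
) -> Iterator[LinkedToken]:
    """Filter step: attach a link to each name token that `resolve` recognizes."""
    for ttype, value in tokens:
        href = resolve(value) if ttype.startswith("Name") else None
        yield LinkedToken(ttype, value, href)


def format_html(tokens: Iterable[LinkedToken]) -> str:
    """Formatter step: escape the text and wrap linked tokens in anchors."""
    parts = []
    for tok in tokens:
        text = escape(tok.value)
        if tok.href is not None:
            text = f'<a href="{escape(tok.href, quote=True)}">{text}</a>'
        parts.append(text)
    return "".join(parts)


def resolve(name: str) -> Optional[str]:
    # Stand-in for a lookup in the identifier database; the URL shape is made up.
    return f"/ident/{name}" if name in {"kmalloc"} else None


tokens = [
    ("Name", "p"), ("Operator", "="), ("Name", "kmalloc"),
    ("Punctuation", "("), ("Name", "n"), ("Punctuation", ")"), ("Punctuation", ";"),
]
html_out = format_html(link_identifiers(tokens, resolve))
```

The point of the sketch is that the filter never touches the unformatted source text, so there is nothing to track and restore after formatting.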
Possible problems
A problem I see so far is that Pygments treats each macro line as a single token. Macros are easily recognizable because they use a separate token type, so I think this could be handled by a filter that lexes all macros into smaller tokens and supplies them with identifier information. I realize that this runs into some of my own objections above, but I still think this approach is better - most code would be handled by the cleaner codepath. Support for identifiers in macros is also already very hacky at best - maybe this would allow us to improve something, since the job of recognizing macros will now be done for us.
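A macro-splitting filter of this kind could look roughly like the sketch below. It assumes macro tokens arrive with a dedicated type (written here as the string "Comment.Preproc", a stand-in for the corresponding Pygments token type) and re-lexes only those tokens with a small regex. `split_macros` and the regex are hypothetical and deliberately simplistic.

```python
import re
from typing import Iterable, Iterator, Tuple

Token = Tuple[str, str]  # (token type, text); types are plain-string stand-ins

# Order matters: string literals first, so identifiers inside them are not split out.
_PIECE = re.compile(
    r'"(?:\\.|[^"\\])*"'                  # double-quoted string literal
    r"|(?P<ident>[A-Za-z_][A-Za-z0-9_]*)"  # identifier
    r"|\d[\w.]*"                           # number, including forms like 0x10 or 1.5f
    r"|\s+"                                # whitespace run
    r"|.",                                 # any other single character
    re.DOTALL,
)


def split_macros(tokens: Iterable[Token]) -> Iterator[Token]:
    """Split macro tokens into smaller tokens, marking identifiers as names."""
    for ttype, value in tokens:
        if ttype != "Comment.Preproc":
            yield ttype, value
            continue
        for match in _PIECE.finditer(value):
            if match.lastgroup == "ident":
                yield "Name", match.group()
            else:
                # Keep the macro type for everything that is not an identifier.
                yield ttype, match.group()


tokens = [
    ("Comment.Preproc", "#define MAX_LEN BUF_SIZE\n"),
    ("Keyword.Type", "int"), ("Text", " "), ("Name", "x"),
]
out = list(split_macros(tokens))
names = [value for ttype, value in out if ttype == "Name"]
```

Note that directive keywords such as `define` also come out as names here; a later step that only links known identifiers would ignore them. Concatenating the output text reproduces the input exactly, which matches the observation in the comments that splitting should not skip anything.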