Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rust user observations #61

Open
qtfkwk opened this issue Sep 28, 2022 · 3 comments
Open

Rust user observations #61

qtfkwk opened this issue Sep 28, 2022 · 3 comments
Labels
enhancement New feature or request

Comments

@qtfkwk
Copy link

qtfkwk commented Sep 28, 2022

I've been testing both tree-sitter-md and tree-sitter-markdown over the past couple days and have the following observations. Note that each observation is standalone and they aren't listed in any particular order. Also note that my use case is general in that it is not specifically or only concerned with syntax highlighting but includes other uses such as general markdown parsing/processing similar to pulldown-cmark.

  1. While tree-sitter-md (this crate) is more recently updated and maintained than tree-sitter-markdown, and depends on tree-sitter 0.20, its grammar seems more basic than tree-sitter-markdown's. In particular:

    • Block node inline kinds are present, and there are inline node inline kinds, but there seems to be no kind for just the plain text (?), so this would require additional calculations to extract the plain text, or replace parts of the rendered content with the inline spans... (?)
    • No support for tables due to strict commonmark spec (?)
    • Unsure if there are others... (?)
  2. tree-sitter-md uses a 2-pass approach which forces walking those multiple trees. Perhaps the MarkdownTree struct could provide a method to walk all of the trees? tree-sitter-markdown doesn't work like this, and there aren't any drawbacks afaik, because whether a node is a block or inline can be determined by its kind. Here's an example of usage for this crate, but tree-sitter-markdown does not need the nested bit to walk the inline tree(s).

    let input = String::from("Markdown content here...");
    
    let mut parser = MarkdownParser::default();
    let tree = parser.parse(&input.as_bytes(), None).unwrap();
    
    // walk block tree...
    let mut cursor = tree.block_tree().walk();
    'outer: loop {
        let node = cursor.node();
    
        // do something with node...
    
        // walk inline tree...
        if let Some(inline_tree) = tree.inline_tree(&node) {
            let mut inline_cursor = inline_tree.walk();
            'inline_outer: loop {
                let inline_node = inline_cursor.node();
    
                // do something with inline_node...
    
                if !inline_cursor.goto_first_child() {
                    if !inline_cursor.goto_next_sibling() {
                        loop {
                            if !inline_cursor.goto_parent() {
                                break 'inline_outer;
                            }
                            if inline_cursor.goto_next_sibling() {
                                break;
                            }
                        }
                    }
                }
            }
        }
    
        if !cursor.goto_first_child() {
            if !cursor.goto_next_sibling() {
                loop {
                    if !cursor.goto_parent() {
                        break 'outer;
                    }
                    if cursor.goto_next_sibling() {
                        break;
                    }
                }
            }
        }
    }
  3. In my initial testing, parsing 2 large but slightly different files both took roughly the same amount of time (50 ms) for each file, which should not be the case because it should only be processing the changes (right?). Using tree-sitter-markdown, it took 50 ms for the first one and 4 ms for the second one. Perhaps I'm doing something wrong?

  4. Saw an error during cargo test: thread 'project::tests::doc_from_empty' panicked at 'Could not load injection query: QueryError { row: 0, column: 1, offset: 1, message: "inline", kind: NodeType }', tree-sitter-md-0.1.1/bindings/rust/lib.rs:113:14. I did not see this on a standalone attempt; it only cropped up when I combined both tree-sitter-md and tree-sitter-markdown into the same project, so would guess that is causing some specific conflict.

@qtfkwk qtfkwk added the bug Something isn't working label Sep 28, 2022
@MDeiml
Copy link
Collaborator

MDeiml commented Sep 30, 2022

The big difference between tree-sitter-md and tree-sitter-markdown is, that the latter basically uses a handwritten C parser and then couples this to the tree-sitter parsing system with some hacks. That created a few problems for people and was also a bit unstable. That motivated me to write this parser, which mostly uses "normal" tree-sitter grammars.

There are some caveats though. This parser (tree-sitter-md) is slower and parsing has to follow some restricting rules.

First of all it's hard / maybe impossible to have explicit "text content" nodes. I might give this another try though. I think it's still ok, as the information is present, but I see the impracticality. Another solution would be to provide methods to extract text in the binding library.

The double tree structure is also caused by these restrictions. But please note that almost all markdown parsers use 2 passes (the spec even recommends to do so). Here as well it might make sense to abstract a bit more in the bindings and provide something like a MarkdownTreeCursor.

There are some inefficiencies still and I'm trying to work on those. For the testing you did in (3.) you should make sure though that you use MarkdownTree::edit to mark the changes and then pass the old tree to MarkdownParser::parse.

The error you mention in (4.) is definitely due to both libraries being loaded, as they both add C bindings with the same name and cargo can't deal with that. (That's ok though, as noone should probably use both libraries :D)

Finally there is already support for tables. I just didn't get around to pushing these changes to crates.io. (Also tables are still not tested very well)

@MDeiml
Copy link
Collaborator

MDeiml commented Sep 30, 2022

Also, thanks! This is very valuable feedback

@MDeiml MDeiml added enhancement New feature or request and removed bug Something isn't working labels Oct 27, 2022
MDeiml added a commit that referenced this issue Oct 27, 2022
* Implement `parse_with` for `MarkdownTree`
* Add `MarkdownCursor` to traverse `MarkdownTree` more easily
* Add tests for `MarkdownCursor` and pipe tables
@MDeiml
Copy link
Collaborator

MDeiml commented Oct 27, 2022

I implemented a lot of your feedback. I added MarkdownCursor to traverse the tree more easily and without having to think about the double structure. Also the newest release includes the grammar with support for pipe tables and more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants