Packrat Parsers resolve exponential parsing time #248

Open: wants to merge 2 commits into master


@kmizu commented Dec 14, 2016

Hi, I have been studying packrat parsing and PEGs a bit. Since I saw exponential parsing times, I thought the problem might be resolved by packrat parsing.

To emulate packrat parsing, I added @MemoMismatches to every Rule-returning method that takes no arguments.

As a result, exponential parsing-time problems such as #43 and #104 were resolved on my machine.
That said, it may be fine to annotate fewer methods with @MemoMismatches.

This resolves the exponential parsing time at the cost of higher memory consumption!
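A toy model of why memoizing mismatches helps: the sketch below is a hand-written recursive-descent recognizer in plain Java, not pegdown or parboiled code. The class name, the simplified `Link`/`Inline` rules, and the choice to cache only failures (in the spirit of @MemoMismatches) are assumptions for illustration. On input made only of `[`, every `Link` attempt fails, so plain backtracking retries it at the same positions again and again; caching the failing positions trades memory for that repeated work.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Toy PEG recognizer (hypothetical, simplified rules):
 *   Link   <- '[' Inline* ']'
 *   Inline <- Link / any character except ']'
 */
class MemoMismatchDemo {
    private final String input;
    private final boolean memoizeMismatches;
    // Positions where Link has already failed (only used when memoizing).
    private final Set<Integer> linkMismatches = new HashSet<>();
    long linkEvaluations = 0; // how many times the body of Link actually ran

    MemoMismatchDemo(String input, boolean memoizeMismatches) {
        this.input = input;
        this.memoizeMismatches = memoizeMismatches;
    }

    /** Link <- '[' Inline* ']'. Returns the end position, or -1 on mismatch. */
    int link(int pos) {
        if (memoizeMismatches && linkMismatches.contains(pos)) {
            return -1; // known mismatch: fail without re-parsing
        }
        linkEvaluations++;
        int result = -1;
        if (pos < input.length() && input.charAt(pos) == '[') {
            int p = pos + 1;
            int next;
            // Inline always consumes at least one character, so this loop terminates.
            while ((next = inline(p)) != -1) {
                p = next;
            }
            if (p < input.length() && input.charAt(p) == ']') {
                result = p + 1;
            }
        }
        if (result == -1 && memoizeMismatches) {
            linkMismatches.add(pos); // remember the failure at this position
        }
        return result;
    }

    /** Inline <- Link / non-']' character. Returns the end position, or -1. */
    int inline(int pos) {
        int end = link(pos); // ordered choice: Link is tried first
        if (end != -1) {
            return end;
        }
        return (pos < input.length() && input.charAt(pos) != ']') ? pos + 1 : -1;
    }

    int cachedMismatches() {
        return linkMismatches.size();
    }

    public static void main(String[] args) {
        String input = "[".repeat(20);
        MemoMismatchDemo plain = new MemoMismatchDemo(input, false);
        MemoMismatchDemo memo = new MemoMismatchDemo(input, true);
        plain.link(0);
        memo.link(0);
        System.out.println("plain Link evaluations:    " + plain.linkEvaluations);  // 1048576
        System.out.println("memoized Link evaluations: " + memo.linkEvaluations);   // 21
        System.out.println("cached mismatch positions: " + memo.cachedMismatches()); // 21
    }
}
```

With 20 `[` characters the plain version runs the `Link` body 2^20 = 1,048,576 times, while the memoizing version runs it 21 times and keeps 21 cached positions. Memoizing every rule would multiply that table by the number of rules, which is where the extra memory goes.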

@kmizu (Author) commented Dec 14, 2016

I'm sorry. Some tests failed because I added @MemoMismatches too broadly. I'll fix the problem. Still, it is effective at reducing parsing time.

@vsch (Contributor) commented Dec 14, 2016

@kmizu, it may help in some cases, but not in most. The parsing-time issue, including the exponential cases, is rooted in the PEG architecture, not in its implementation. Markdown input allows too many possible interpretations to prune them early, so the number of alternatives the parser tries keeps growing, and the parsing time grows with it.

No matter what you do, you will not be able to eliminate all the pathological input cases, because you have to identify them first, and that is a career decision all by itself. For example, I found that a few dozen repeated `[` characters are enough to take a coffee break before the parser finishes. On the other hand, commonmark-java and my project flexmark-java (which started as a fork of commonmark-java) use CommonMark's "blocks first, inlines second" parsing strategy, and they use 100,000 repeated `[` as a pathological performance test that completes in under 100 ms.
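For contrast, a hypothetical sketch (not commonmark-java's or flexmark-java's actual code) of a single left-to-right pass that pairs brackets with a stack, loosely modeled on the bracket handling described in the CommonMark spec's parsing-strategy appendix. It ignores link destinations, reference definitions, and the rules for deactivating openers; the point is only that each `[` is pushed once and popped at most once, so 100,000 unmatched openers cost linear time and are simply left as literal text.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class BracketScan {
    /** Returns [open, close] index pairs; unmatched brackets stay literal text. */
    static List<int[]> matchBrackets(String text) {
        Deque<Integer> openers = new ArrayDeque<>(); // positions of pending '['
        List<int[]> pairs = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '[') {
                openers.push(i); // remember a potential opener, no backtracking
            } else if (c == ']' && !openers.isEmpty()) {
                pairs.add(new int[] {openers.pop(), i}); // close the nearest opener
            }
        }
        return pairs; // anything still on the stack is rendered as a literal '['
    }

    public static void main(String[] args) {
        String pathological = "[".repeat(100_000);
        long start = System.nanoTime();
        List<int[]> pairs = matchBrackets(pathological);
        long micros = (System.nanoTime() - start) / 1_000;
        // Prints 0 pairs plus the elapsed time, which depends on the machine.
        System.out.println(pairs.size() + " pairs in " + micros + " us");
        System.out.println(matchBrackets("[a [b] c]").size()); // 2
    }
}
```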

You can mitigate some parsing issues by adding constructs and optimizations on top of PEG, but the inherent shortcomings of using any formal grammar for Markdown, not just PEG, cannot be resolved that way. Markdown is not a regular language.

A tool has to be applied to tasks it is suited for. PEG is not a suitable tool for Markdown parsing, and pegdown has already pushed the envelope of what can be done with PEG for Markdown parsing.

Pegdown was tightly integrated into my plugin project, and when I had to make the hard decision to switch parsers, I spent some time looking for alternatives. I chose commonmark-java for its amazing speed and for how easy its parsing rules are to maintain, tweak, and debug. Its rules are just Java code, which is easy to debug, unlike a PEG grammar.

commonmark-java was not at all suited to replace pegdown out of the box, but I made the decision based on its architecture and implementation. I already had enough experience fixing and extending pegdown to know what to avoid in the next parser. In the end it was well worth it: average file parsing is 30x faster than with pegdown. There are no pathological cases and no timeouts, and parsing rules are easy to tweak per element because they have minimal interdependencies, unlike one big PEG grammar.
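The "blocks first, inlines second" structure and the independent per-element rules can be sketched as two phases. All names here (`TwoPhaseSketch`, `BlockRule`, the two rules) are hypothetical and far simpler than the real commonmark-java or flexmark-java APIs: phase 1 assigns lines to blocks using small, independent rules, and phase 2 runs an inline parser on each block's text separately, so inline parsing work is confined to one block at a time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

class TwoPhaseSketch {
    /** A block found in phase 1: its kind plus raw text for phase 2. */
    static final class Block {
        final String kind;
        final StringBuilder text = new StringBuilder();

        Block(String kind, String firstLine) {
            this.kind = kind;
            text.append(firstLine);
        }
    }

    /** One small rule per block kind; returns null when the line is not its kind. */
    interface BlockRule {
        Block tryStart(String line);
    }

    // Rules are independent: adding or tweaking one does not touch the others.
    static final List<BlockRule> RULES = List.<BlockRule>of(
        line -> line.startsWith("# ") ? new Block("heading", line.substring(2)) : null,
        line -> line.strip().equals("***") ? new Block("hr", "") : null
    );

    /** Phase 1: assign each line to a block without looking at inline syntax. */
    static List<Block> parseBlocks(String source) {
        List<Block> blocks = new ArrayList<>();
        Block paragraph = null; // currently open paragraph, if any
        for (String line : source.split("\n", -1)) {
            if (line.isBlank()) {
                paragraph = null; // a blank line closes the paragraph
                continue;
            }
            Block started = null;
            for (BlockRule rule : RULES) {
                started = rule.tryStart(line);
                if (started != null) {
                    break;
                }
            }
            if (started != null) {
                blocks.add(started);
                paragraph = null;
            } else if (paragraph != null) {
                paragraph.text.append('\n').append(line); // paragraph continuation line
            } else {
                paragraph = new Block("paragraph", line);
                blocks.add(paragraph);
            }
        }
        return blocks;
    }

    /** Phase 2: run the inline parser on each block's text on its own. */
    static List<String> parseInlines(List<Block> blocks, UnaryOperator<String> inlineParser) {
        List<String> rendered = new ArrayList<>();
        for (Block block : blocks) {
            rendered.add(block.kind + ": " + inlineParser.apply(block.text.toString()));
        }
        return rendered;
    }

    public static void main(String[] args) {
        List<Block> blocks = parseBlocks("# Title\nsome [text\nmore]\n\n***\nlast");
        // Stand-in inline parser: just joins soft line breaks with spaces.
        System.out.println(parseInlines(blocks, text -> text.replace('\n', ' ')));
        // [heading: Title, paragraph: some [text more], hr: , paragraph: last]
    }
}
```

Adding a new block kind in this sketch means appending one more rule to `RULES` without touching the existing ones.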

2 participants