ANTLR memory allocations look excessive? #3765
Replies: 11 comments 25 replies
-
I am afraid this data cannot be shrunk much, because the ATN is a static representation of the grammar. It's encoded into the generated code, deserialized on loading, and used during lexing/parsing. But it can be shrunk moderately if you use ATN optimizations, for instance this one: #3676
-
Hi, thanks for this.
I thought ANTLR was recursive descent, and that the C# output was really the encoding of the grammar? (I should know more about ALL(*); I don't.) Putting that aside, that explains part of what I saw, which is that one part (ATN transitions, it might have been) didn't really grow when you had more input to parse, but other stuff did. For example, the very topmost item, ATNConfig, drastically increased. I'll have to do some more tests tomorrow and post evidence, not tonight. But it also fails to explain other things, for example the ATN.Transition enumerator being allocated in huge numbers, and nearly 20,000 Sharpen.BitSet allocations? That just makes no sense to me. So, a question: is it possible to reuse a parser once constructed, without throwing it away and recreating it from scratch? For example, for me the code looks like this:
and I do that every time. If it's so expensive, then I'd rather recycle than re-create from scratch; a one-second startup, if you're doing huge numbers of these, will be crippling. I'll look into this tomorrow and try to give a further breakdown. jan
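On the reuse question: the C# runtime generally lets you rebind an existing lexer/parser pair to new input instead of reconstructing them, which also keeps the warmed-up DFA cache. A minimal sketch, assuming hypothetical generated classes `MyLexer`/`MyParser` with a start rule `startRule`; the member names follow the Antlr4.Runtime C# API but may differ between runtime versions:

```csharp
using Antlr4.Runtime;

// Build once; the deserialized ATN and the shared DFA cache live in
// static state behind these objects and survive across parses.
var lexer = new MyLexer(new AntlrInputStream(""));
var tokens = new CommonTokenStream(lexer);
var parser = new MyParser(tokens);

ParserRuleContext Parse(string text)
{
    lexer.SetInputStream(new AntlrInputStream(text)); // rebind lexer to new input
    tokens = new CommonTokenStream(lexer);            // token buffers are cheap to recreate
    parser.TokenStream = tokens;                      // or parser.SetInputStream(tokens)
    parser.Reset();                                   // clear per-parse state, keep the DFA cache
    return parser.startRule();
}
```

Recreating the CommonTokenStream per input is cheap next to the parser itself; the expensive pieces (ATN, DFA) are static and shared by all instances of a generated class.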
-
ATNConfig() constructors are not called during deserialization. But I doubt deserialization is a performance problem at all. For plsql, the largest grammar we have, with many thousands of rules, deserialization takes ~0.05s for each ATN, lexer and parser (Ryzen 7 2700 at standard clock speed, 16GB DDR4, SSD), out of ~2s total (plsql/examples/aggregate01.sql). I would first eliminate grammar issues. Check the ambiguity with IntelliJ on the small input. For example, for plsql, there's a max k of 26!! associated with unary_expression, and lots of ambiguity in atom. Nothing good can happen with numbers like these; that's why it takes 2s to parse a 73-line example. (The IntelliJ IDE also had a fatal error, but that's another issue.) Remember: ANTLR gives you a lot of rope to hang yourself with. Everything starts with a quality grammar.
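You can also get ambiguity reports from the runtime itself, without the IDE: attach the diagnostic listener and turn on exact ambiguity detection. A sketch against the hypothetical `MyParser` from above (the enum member spellings follow the standard C# runtime; other runtime builds name them slightly differently):

```csharp
using Antlr4.Runtime;
using Antlr4.Runtime.Atn;

// Print a report for every ambiguous decision (and every fallback to
// full-context prediction) encountered while parsing.
parser.AddErrorListener(new DiagnosticErrorListener());

// Detect the exact set of ambiguous alternatives instead of stopping at
// the first conflict; slower, but the reports are precise.
parser.Interpreter.PredictionMode = PredictionMode.LL_EXACT_AMBIG_DETECTION;

var tree = parser.startRule();
```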
-
BTW, a similar issue exists: #821
-
Excuse the delay in responding; I just had my first covid. I will skip gathering data, do as Ken Domino said, and check lookahead. Also thanks to Ivan Kochurkin; I will check out the links. Will report back if I find anything noteworthy.
-
I recall this being mentioned.
I think Ken Domino's point is likely the answer: I've just written myself a very suboptimal grammar and should look at that first.
jan
…On 02/07/2022 18:25, Terence Parr wrote:
There is also the discussion of freeze-drying the ATN state so it can be reloaded, to avoid the start-up cost. Don't have a link handy.
-
We see a similar pattern in the C++ runtime. Here is a recent profile: it has ~8500 total samples, of which (613+530+870+615+356) = 2984 are memory allocation, deallocation, and dynamic casts. It is probably not easy to move the existing code to use the stack instead of the heap, or some fixed structure, but it would give a big boost. The dynamic casts can normally be replaced with virtual methods (which still cost one virtual-table lookup compared to non-virtual calls). These were small trees with 1-3 leaves.
-
Well, unless you're parsing some large input, all measurements will be dominated by initialization.
…On Thu, Mar 23, 2023 at 13:40, Gabor Nagy wrote:
Something very simple, just an int and a float in it.
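A quick way to separate that one-time initialization from the steady-state cost is to parse once before starting the clock, since the deserialized ATN and the DFA cache live in static state shared by all instances of a generated parser. A rough sketch (`MyLexer`/`MyParser`/`startRule` hypothetical, as above):

```csharp
using System.Diagnostics;
using Antlr4.Runtime;

string input = "int x = 1; float y = 2.0;"; // stand-in for the tiny test input

ParserRuleContext ParseOnce(string text)
{
    var lexer = new MyLexer(new AntlrInputStream(text));
    var parser = new MyParser(new CommonTokenStream(lexer));
    return parser.startRule();
}

// The first parse pays the one-time costs: ATN deserialization and
// DFA warm-up, both stored in static state.
ParseOnce(input);

const int runs = 1000;
var sw = Stopwatch.StartNew();
for (int i = 0; i < runs; i++)
    ParseOnce(input); // steady-state cost (still re-creating objects each time)
sw.Stop();
System.Console.WriteLine($"{sw.Elapsed.TotalMilliseconds * 1000.0 / runs:F1} us/parse");
```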
-
Are you reusing the parser and lexer?
…On Thu, Mar 23, 2023 at 13:48, Gabor Nagy wrote:
Yes, noticed that all results seem to have a 2000ns baseline.
-
You’re welcome
…On Fri, Mar 24, 2023 at 11:18, Gabor Nagy wrote:
The overhead came down from ~2000ns to ~1500ns, which is important for us even if we use Antlr4 only on larger grammars / more complexity. SLL did not produce an eyeball-visible difference in this case.
Thank you, was very helpful.
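Since SLL came up: the usual way to use it is two-stage parsing, where the fast SLL mode handles the common case and the parser falls back to full LL only when SLL fails. A sketch under the same assumptions (hypothetical `MyParser`/`startRule`; the strategy and exception types are standard runtime classes):

```csharp
using Antlr4.Runtime;
using Antlr4.Runtime.Atn;
using Antlr4.Runtime.Misc;

ParserRuleContext ParseTwoStage(MyParser parser, CommonTokenStream tokens)
{
    // Stage 1: fast SLL prediction, bail out on the first syntax error.
    parser.Interpreter.PredictionMode = PredictionMode.SLL;
    parser.ErrorHandler = new BailErrorStrategy();
    try
    {
        return parser.startRule();
    }
    catch (ParseCanceledException)
    {
        // Stage 2: rewind and reparse with full LL prediction.
        tokens.Seek(0);
        parser.Reset();
        parser.Interpreter.PredictionMode = PredictionMode.LL;
        parser.ErrorHandler = new DefaultErrorStrategy();
        return parser.startRule();
    }
}
```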
-
For the C++ runtime, I just opened issue #4518; maybe that's interesting to some readers here.
-
For a tiny little parsing input in C#, I noticed it took a full second to start up. Using the profiler, I looked at the memory allocated and found that it seems to be doing a gigantic amount of object generation, which then has to be cleaned up by the GC.
Here is my input, 413 bytes of it:
Attached is an image of the profiler's output. The top graph, I guess, is the allocation nursery, and you can see what appear to be five garbage collections in just that. I don't know what the bottom graph is, but it goes to 48 MB.
Below that are the allocation counts from ANTLR; I have no idea how it could possibly be doing this much.
A while back I did some profiling with the Python output and tentatively concluded that the slowdown had something to do with the number of objects allocated, which makes sense because Python uses an expensive reference-counting GC with a fallback mark-and-sweep for cycles, whereas Java and C# have much more efficient generational collectors.
I don't have time to dig into this, but I would like other people's opinion. Can anyone reproduce this?
Please note that the stuff I'm parsing actually produces an AST inside, and a few other bits and pieces, but if you look at the breakdown in the attached image, you'll see that it really is utterly overwhelmed by ANTLR-related objects.
Keep in mind I may be doing something wrong, but given that, any thoughts?
jan
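For anyone wanting to reproduce the allocation figures without a memory profiler, .NET can report bytes allocated on the current thread directly (GC.GetAllocatedBytesForCurrentThread, available on .NET Core 3.0 and later). A sketch with the usual hypothetical generated classes:

```csharp
using System;
using System.IO;
using Antlr4.Runtime;

string input = File.ReadAllText("input.txt"); // e.g. the 413-byte test input

long before = GC.GetAllocatedBytesForCurrentThread();

var lexer = new MyLexer(new AntlrInputStream(input));
var parser = new MyParser(new CommonTokenStream(lexer));
var tree = parser.startRule();

long after = GC.GetAllocatedBytesForCurrentThread();
Console.WriteLine($"~{(after - before) / 1024} KB allocated for {input.Length} bytes of input");
```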