Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Loops may not collect garbage, leading to very slow execution #2123

Open
quexxon opened this issue Nov 16, 2024 · 7 comments
Open

Loops may not collect garbage, leading to very slow execution #2123

quexxon opened this issue Nov 16, 2024 · 7 comments

Comments

@quexxon
Copy link
Collaborator

quexxon commented Nov 16, 2024

I tried unsuccessfully to boil this down to a simpler reproducible example. Something about the interplay of get without a default value in the filter procedure appears to cause the issue. Sample code for reproduction:

#!/usr/bin/env ysh

proc filter (; predicate) {
	for line in (io.stdin) {
		if (io->evalExpr(predicate, vars={_val: fromJson8(line)})) {
			write -- $line
		}
	}
}

var f = $(mktemp)

for i in (0..=100_000) {
	json write ({}, space=0) >> $f
}

write -- u'get() with default\n'
write -- 'BEFORE ------------------------------------------------------'

for _ in (0..<3) {
	time for line in (io.stdin) {
		call fromJson8(line)
	} < $f
	write
}

filter [get(_val, 'missing-key', 0) === 0] < $f >/dev/null

write -- 'AFTER -------------------------------------------------------'

for _ in (0..<3) {
	time for line in (io.stdin) {
		call fromJson8(line)
	} < $f
	write
}

write -- u'get() without default\n'
write -- 'BEFORE ------------------------------------------------------'

for _ in (0..<3) {
	time for line in (io.stdin) {
		call fromJson8(line)
	} < $f
	write
}

filter [get(_val, 'missing-key')] < $f >/dev/null

write -- 'AFTER -------------------------------------------------------'

for _ in (0..<3) {
	time for line in (io.stdin) {
		call fromJson8(line)
	} < $f
	write
}

rm $f

Output on my machine:

ysh-0.24.0$ uname -a
Linux rodan 6.8.0-48-generic #48-Ubuntu SMP PREEMPT_DYNAMIC Fri Sep 27 14:04:52 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
ysh-0.24.0$ ./bug.ysh
get() with default

BEFORE ------------------------------------------------------
real    0.071
user    0.071
sys     0.000

real    0.070
user    0.070
sys     0.000

real    0.070
user    0.070
sys     0.000

AFTER -------------------------------------------------------
real    0.071
user    0.071
sys     0.000

real    0.070
user    0.070
sys     0.000

real    0.071
user    0.071
sys     0.000

get() without default

BEFORE ------------------------------------------------------
real    0.071
user    0.071
sys     0.000

real    0.070
user    0.070
sys     0.000

real    0.070
user    0.070
sys     0.000

AFTER -------------------------------------------------------
real    0.928
user    0.927
sys     0.001

real    0.929
user    0.928
sys     0.001

real    0.937
user    0.932
sys     0.001
@quexxon
Copy link
Collaborator Author

quexxon commented Nov 16, 2024

I'm on 5a67993f

@andychu
Copy link
Contributor

andychu commented Nov 16, 2024

Hm thanks for the report! I have repro'd this


real    0.077
user    0.077
sys     0.000

AFTER -------------------------------------------------------
real    1.126
user    1.126
sys     0.000

real    1.120
user    1.120
sys     0.000

Let me see what's happening ...

first hunch: maybe OILS_GC_STATS=1 will tell us something

@andychu
Copy link
Contributor

andychu commented Nov 16, 2024

Gah, so yeah my hunch was right, this is a GC bug. If we do 3 iterations of the last loop, the GC spends 3+ seconds. If we do 4, it spends 4+ seconds

I am not sure why at the moment, but it is a bug

AFTER -------------------------------------------------------
real    1.148
user    1.148
sys     0.000

real    1.156
user    1.156
sys     0.000

real    1.140
user    1.140
sys     0.000

real    1.141
user    1.141
sys     0.000

  num live         =       6254
  max survived     =       3658

  num allocated    =   64310218
  num in heap      =    3600881
  num in pool 1    =   29104704
  num in pool 2    =   31604633
bytes allocated    =  410828957

  num gc points    =    1500048
  num collections  =       1256

   gc threshold    =      50000
  num growths      =          0

  max gc millis    =       28.4
total gc millis    =     4529.3

@andychu
Copy link
Contributor

andychu commented Nov 16, 2024

The command was

OILS_GC_STATS=1 _bin/cxx-opt/ysh test/bug-2123.ysh

The weird thing is that "num live" is small, which means the GC should be working correctly?

Or maybe it is growing to take a huge amount of memory, and then it's all freed?

So yeah the first thing to look at would be a memory leak ... possibly triggered by the get() somehow?

Yes I think so

@andychu
Copy link
Contributor

andychu commented Nov 16, 2024

OK, it does look like some kind of memory leak

  • if I increase the number of iterations from 3 to 4, the maximum memory usage does not increase
  • however, if I increase the lines from 100_000 to 200_00, the max RSS doubles

So basically the GC is thrashing on the internal data structures for 200,000 lines ....


going deeper ...

andychu pushed a commit that referenced this issue Nov 16, 2024
This is bug #2123.

Unrelated:

[spec/ysh-builtin-eval] Nicer example
andychu pushed a commit that referenced this issue Nov 17, 2024
This resolves issue #2123.

The 'filter' loop would evaluate expressions, and allocate, without
every hitting a GC point.

To break out of a loop or recursion, you will need 'if' or 'case'.  So
I believe this should work everywhere, just as it does the
test/bug-2123.* repro.

I'm not quite sure why the bug depended on get() NOT having a default
arg though.  Could look into that more.
andychu pushed a commit that referenced this issue Nov 17, 2024
Related to issue #2123.

Because control flow raises C++ exceptions, we can skip a _LeafTick()
after, which would lead to the same perf bug:

    if (true) {
      continue  # exception, interpreter doesn't reach the end of 'if'
    }
@andychu
Copy link
Contributor

andychu commented Nov 17, 2024

Thank you for the report, this is fixed!

http://op.oilshell.org/uuu/github-jobs/8263/

http://op.oilshell.org/uuu/github-jobs/8263/cpp-tarball.wwz/_release/oils-for-unix.tar

The problem was that in the filter loop we weren't GC-ing enough. Basically if the condition is FALSE, then we were skipping GC, and if it was true, then we would GC


It actually was not a memory leak -- it was building up a huge heap with the 100_000 line computation, and then GC'ing it all at the end

After that, I think there was less locality in all subsequence GC's! They went from 0.2 ms to ~30ms

andychu pushed a commit that referenced this issue Nov 17, 2024
Using the test case from issue #2123
@andychu
Copy link
Contributor

andychu commented Nov 17, 2024

Also, I noticed some places where we are allocating a lot in a loop, which is low hanging fruit for speeding up the interpreter. I checked in a change to benchmarks/callgrind.sh to show that

For the near future, we're focused on semantics more than speed

But there is plenty left to optimize, and I appreciate perf bugs like this one!

@andychu andychu changed the title Calling the get() function without a default value can cause a performance regression Loops may not collect garbage, leading to very slow execution Nov 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants