Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix ooms #92

Merged
merged 6 commits into from
Mar 25, 2024
Merged

fix ooms #92

merged 6 commits into from
Mar 25, 2024

Conversation

tjjfvi
Copy link
Contributor

@tjjfvi tjjfvi commented Mar 20, 2024

Fixes #91

  • adds some missing calls to half_free
  • switches from a stack to a queue for the redex bag, ensuring that reductions are processed in an memory-friendly order

I can now run the program in #91 with 1G of memory for 2^28, and with only 1K of memory when single-threaded.

@Janiczek can you confirm this fixes it for you?

@tjjfvi tjjfvi requested a review from FranchuFranchu March 20, 2024 13:20
@tjjfvi tjjfvi marked this pull request as draft March 20, 2024 14:17
@tjjfvi
Copy link
Contributor Author

tjjfvi commented Mar 20, 2024

switches from a stack to a queue for the redex bag, ensuring that reductions are processed in an memory-friendly order

This change is... problematic. It's great for tail recursive things, but now a lot of our tests that operate on trees OOM.

@tjjfvi
Copy link
Contributor Author

tjjfvi commented Mar 20, 2024

Probably, we should categorize redexes into 'shrinking' and 'growing', and process shrinking ones first?

@FranchuFranchu
Copy link
Contributor

Probably, we should categorize redexes into 'shrinking' and 'growing', and process shrinking ones first?

@tjjfvi I agree, a pair of stacks is the right way to do this. Any usecase that's solved by having a queue will be solved by having a shrinking redex stack and a growing redex stack. This is because some of the elements at the front of the queue will be shrinking, and other ones will be growing (having a queue works because there are enough shrinking redexes to keep hvm-core from OOMing), and with a pair of stacks, all elements in the shrinking redex stack will be processed first.

@Janiczek
Copy link
Contributor

Janiczek commented Mar 20, 2024

@tjjfvi Let me preface this by saying I needed to gut out the builtins out of hvm-lang to be able to use this PR's commit 22a0c89c301a36953a1bfcfd55e26ef113498c96 in hvm-lang main. But hopefully that's orthogonal.

The sequential program indeed now works for eg. 2^29 and so on, but the parallel one now takes much longer, and doesn't use the CPU cores nearly as well as before.

what s/run
sequential 2^26 9.182
parallel 2^26 22.061

I guess correct and slow > incorrect and fast, but yeah maybe this approach needs some tweaking :)

Here's the parallel program for reference

Main n = (Sum (PowOfTwo n))

PowOfTwo n = (PowOfTwoHelp 1 n)
PowOfTwoHelp acc 0 = acc
PowOfTwoHelp acc e = (* 2 (PowOfTwoHelp acc (- e 1)))

Sum 0 = 0
Sum 1 = 1
Sum n = 
  let half1 = (/ n 2);
  let half2 = (- n half1);
  (+ (Sum half1) (Sum half2))

@FranchuFranchu FranchuFranchu force-pushed the fix-ooms branch 2 times, most recently from d19a52a to de3d1fc Compare March 22, 2024 19:24
@tjjfvi
Copy link
Contributor Author

tjjfvi commented Mar 22, 2024

@Janiczek can you test again?

@Janiczek
Copy link
Contributor

Janiczek commented Mar 24, 2024

That one works much better!

The parallel program now utilizes all cores to the max as expected.

The memory usage in all the program runs seems to be stable around 1840 MB, rising slightly (eg. stopping around 1860 MB) as time goes on. So maybe some forgotten frees are still there, but much less prominent.

power of 2 seq [s] par [s]
27 17.91 6.11
28 35.79 11.65
29 71.72 22.10
30 143.00 43.29
31 286.75 85.76
32 572.77 174.00

Screenshot 2024-03-24 171508

src/run/parallel.rs Outdated Show resolved Hide resolved
@tjjfvi tjjfvi force-pushed the fix-ooms branch 3 times, most recently from fd7c820 to 3e1ca16 Compare March 25, 2024 17:18
@HigherOrderBot

This comment has been minimized.

@HigherOrderBot

This comment has been minimized.

@tjjfvi
Copy link
Contributor Author

tjjfvi commented Mar 25, 2024

ooh, perf wins across the board, love to see it

@tjjfvi tjjfvi marked this pull request as ready for review March 25, 2024 18:57
@tjjfvi tjjfvi enabled auto-merge March 25, 2024 18:57
@HigherOrderBot
Copy link
Collaborator

Perf run for 2a7da0e:

file              mode        main          2a7da0e057a0
========================================================
sum_rec           intr-singl      17.650 s      14.047 s
                  intr-multi       8.069 s       7.426 s
--------------------------------------------------------
bitonic_sort_lam  intr-singl      11.871 s      11.261 s
                  intr-multi       5.827 s       5.467 s
--------------------------------------------------------
merge_sort        intr-singl       7.881 s       7.391 s
                  intr-multi       4.074 s       3.759 s
--------------------------------------------------------
sum_tail          intr-singl       1.486 s       1.142 s
                  intr-multi       1.353 s       1.137 s
--------------------------------------------------------
c2                intr-singl       0.000 s       0.000 s
                  intr-multi       0.001 s       0.001 s
--------------------------------------------------------
sum_tree          intr-singl      11.505 s       9.220 s
                  intr-multi       4.971 s       4.494 s
--------------------------------------------------------
boom              intr-singl       2.252 s       2.230 s
                  intr-multi       2.356 s       2.617 s
--------------------------------------------------------
radix_sort_lam    intr-singl      10.046 s       9.518 s
                  intr-multi       4.937 s       4.756 s

@tjjfvi tjjfvi added this pull request to the merge queue Mar 25, 2024
Merged via the queue into main with commit fee98fb Mar 25, 2024
5 checks passed
@tjjfvi tjjfvi deleted the fix-ooms branch March 25, 2024 19:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

OOM in a program that shouldn't grow
4 participants