fix ooms #92

tjjfvi · 2024-03-20T13:20:14Z

Fixes #91

adds some missing calls to half_free
switches from a stack to a queue for the redex bag, ensuring that reductions are processed in an memory-friendly order

I can now run the program in #91 with 1G of memory for 2^28, and with only 1K of memory when single-threaded.

@Janiczek can you confirm this fixes it for you?

tjjfvi · 2024-03-20T14:18:08Z

switches from a stack to a queue for the redex bag, ensuring that reductions are processed in an memory-friendly order

This change is... problematic. It's great for tail recursive things, but now a lot of our tests that operate on trees OOM.

tjjfvi · 2024-03-20T14:20:46Z

Probably, we should categorize redexes into 'shrinking' and 'growing', and process shrinking ones first?

FranchuFranchu · 2024-03-20T14:45:10Z

Probably, we should categorize redexes into 'shrinking' and 'growing', and process shrinking ones first?

@tjjfvi I agree, a pair of stacks is the right way to do this. Any usecase that's solved by having a queue will be solved by having a shrinking redex stack and a growing redex stack. This is because some of the elements at the front of the queue will be shrinking, and other ones will be growing (having a queue works because there are enough shrinking redexes to keep hvm-core from OOMing), and with a pair of stacks, all elements in the shrinking redex stack will be processed first.

Janiczek · 2024-03-20T15:18:03Z

@tjjfvi Let me preface this by saying I needed to gut out the builtins out of hvm-lang to be able to use this PR's commit 22a0c89c301a36953a1bfcfd55e26ef113498c96 in hvm-lang main. But hopefully that's orthogonal.

The sequential program indeed now works for eg. 2^29 and so on, but the parallel one now takes much longer, and doesn't use the CPU cores nearly as well as before.

what	s/run
sequential 2^26	9.182
parallel 2^26	22.061

I guess correct and slow > incorrect and fast, but yeah maybe this approach needs some tweaking :)

Here's the parallel program for reference

Main n = (Sum (PowOfTwo n))

PowOfTwo n = (PowOfTwoHelp 1 n)
PowOfTwoHelp acc 0 = acc
PowOfTwoHelp acc e = (* 2 (PowOfTwoHelp acc (- e 1)))

Sum 0 = 0
Sum 1 = 1
Sum n = 
  let half1 = (/ n 2);
  let half2 = (- n half1);
  (+ (Sum half1) (Sum half2))

tjjfvi · 2024-03-22T20:35:56Z

@Janiczek can you test again?

Janiczek · 2024-03-24T15:48:48Z

That one works much better!

The parallel program now utilizes all cores to the max as expected.

The memory usage in all the program runs seems to be stable around 1840 MB, rising slightly (eg. stopping around 1860 MB) as time goes on. So maybe some forgotten frees are still there, but much less prominent.

power of 2	seq [s]	par [s]
27	17.91	6.11
28	35.79	11.65
29	71.72	22.10
30	143.00	43.29
31	286.75	85.76
32	572.77	174.00

src/run/parallel.rs

…ate extra memory to prevent some OOMs.

tjjfvi · 2024-03-25T18:25:19Z

ooh, perf wins across the board, love to see it

HigherOrderBot · 2024-03-25T19:02:08Z

Perf run for 2a7da0e:

file              mode        main          2a7da0e057a0
========================================================
sum_rec           intr-singl      17.650 s      14.047 s
                  intr-multi       8.069 s       7.426 s
--------------------------------------------------------
bitonic_sort_lam  intr-singl      11.871 s      11.261 s
                  intr-multi       5.827 s       5.467 s
--------------------------------------------------------
merge_sort        intr-singl       7.881 s       7.391 s
                  intr-multi       4.074 s       3.759 s
--------------------------------------------------------
sum_tail          intr-singl       1.486 s       1.142 s
                  intr-multi       1.353 s       1.137 s
--------------------------------------------------------
c2                intr-singl       0.000 s       0.000 s
                  intr-multi       0.001 s       0.001 s
--------------------------------------------------------
sum_tree          intr-singl      11.505 s       9.220 s
                  intr-multi       4.971 s       4.494 s
--------------------------------------------------------
boom              intr-singl       2.252 s       2.230 s
                  intr-multi       2.356 s       2.617 s
--------------------------------------------------------
radix_sort_lam    intr-singl      10.046 s       9.518 s
                  intr-multi       4.937 s       4.756 s

tjjfvi requested a review from FranchuFranchu March 20, 2024 13:20

tjjfvi marked this pull request as draft March 20, 2024 14:17

FranchuFranchu approved these changes Mar 20, 2024

View reviewed changes

FranchuFranchu force-pushed the fix-ooms branch 2 times, most recently from d19a52a to de3d1fc Compare March 22, 2024 19:24

Janiczek reviewed Mar 24, 2024

View reviewed changes

src/run/parallel.rs Outdated Show resolved Hide resolved

tjjfvi force-pushed the fix-ooms branch 3 times, most recently from fd7c820 to 3e1ca16 Compare March 25, 2024 17:18

This comment has been minimized.

Sign in to view

tjjfvi and others added 5 commits March 25, 2024 14:19

fix ooms

5943780

deque is totally a word

b06e34d

improve test printing

e0be4b0

Implement a separate fast redex stack for redexes that do not alloc…

ad21c4d

…ate extra memory to prevent some OOMs.

tweaks

45a33af

tjjfvi force-pushed the fix-ooms branch from 3e1ca16 to 45a33af Compare March 25, 2024 18:19

This comment has been minimized.

Sign in to view

tjjfvi marked this pull request as ready for review March 25, 2024 18:57

bump version

eb6b9fd

tjjfvi enabled auto-merge March 25, 2024 18:57

tjjfvi added this pull request to the merge queue Mar 25, 2024

Merged via the queue into main with commit fee98fb Mar 25, 2024
5 checks passed

tjjfvi deleted the fix-ooms branch March 25, 2024 19:09

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix ooms #92

fix ooms #92

tjjfvi commented Mar 20, 2024

tjjfvi commented Mar 20, 2024

tjjfvi commented Mar 20, 2024

FranchuFranchu commented Mar 20, 2024

Janiczek commented Mar 20, 2024 •

edited

Loading

tjjfvi commented Mar 22, 2024

Janiczek commented Mar 24, 2024 •

edited

Loading

This comment has been minimized.

This comment has been minimized.

tjjfvi commented Mar 25, 2024

HigherOrderBot commented Mar 25, 2024

fix ooms #92

fix ooms #92

Conversation

tjjfvi commented Mar 20, 2024

tjjfvi commented Mar 20, 2024

tjjfvi commented Mar 20, 2024

FranchuFranchu commented Mar 20, 2024

Janiczek commented Mar 20, 2024 • edited Loading

tjjfvi commented Mar 22, 2024

Janiczek commented Mar 24, 2024 • edited Loading

This comment has been minimized.

This comment has been minimized.

tjjfvi commented Mar 25, 2024

HigherOrderBot commented Mar 25, 2024

Janiczek commented Mar 20, 2024 •

edited

Loading

Janiczek commented Mar 24, 2024 •

edited

Loading