Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

llama uses a lot of memory with parallel compiles #45

Open
jpeach opened this issue Jun 14, 2021 · 4 comments
Open

llama uses a lot of memory with parallel compiles #45

jpeach opened this issue Jun 14, 2021 · 4 comments

Comments

@jpeach
Copy link
Contributor

jpeach commented Jun 14, 2021

Once the number of parallel compiles climbs (20 - 30), the llama daemon starts using a lot of memory and the OOM killer gets it. If I adjust the OOM score so it doesn't get killed, it will hit ~5G resident for build w/ 56 max in-flight.

Probably need to do some memory profiling to understand what's happening here.

@nelhage
Copy link
Owner

nelhage commented Jun 15, 2021

llama supports -mem-profile, so you should be able to measure this with something like:

llama daemon -shutdown;
llama -mem-profile memprof.out daemon -start &
# do your build here
llama daemon -shutdown

And then using go tool pprof to explore the profile data. If you have a chance to look at that it will hopefully be pretty informative. I can also potentially try to repro and look when I have some time.

@jpeach
Copy link
Contributor Author

jpeach commented Jun 19, 2021

I poked at this a bit, and my bet is that the memory use comes from GC lagging the large RPC calls. The preprocessed source tends to be pretty large (5 - 10MB) and that's all read into memory and later copied and compressed.

Memory profile svg

I added some debug logging to the server:

2021/06/19 13:04:51 invoke gcc with 21 args, 6224045 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:51 invoke gcc with 21 args, 6221758 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:51 invoke gcc with 21 args, 6219036 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:51 invoke gcc with 20 args, 882309 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:51 invoke gcc with 20 args, 795040 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:51 invoke gcc with 21 args, 6314550 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:51 invoke gcc with 20 args, 948303 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:52 invoke gcc with 21 args, 6220531 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:52 invoke gcc with 20 args, 819762 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:52 invoke gcc with 20 args, 1007963 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:52 invoke gcc with 20 args, 795447 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:52 invoke gcc with 20 args, 662430 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:52 invoke gcc with 20 args, 817853 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:52 invoke gcc with 31 args, 1139504 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:52 invoke gcc with 20 args, 824722 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:52 invoke gcc with 20 args, 887429 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:52 invoke gcc with 20 args, 632164 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:52 invoke gcc with 20 args, 1239761 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:52 invoke gcc with 20 args, 655148 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:52 invoke gcc with 20 args, 822111 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:52 invoke gcc with 20 args, 819140 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:52 invoke gcc with 20 args, 1086661 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:52 invoke gcc with 20 args, 913843 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:52 invoke gcc with 20 args, 994983 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:52 invoke gcc with 20 args, 817727 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:52 invoke gcc with 20 args, 792992 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:52 invoke gcc with 20 args, 795138 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:52 invoke gcc with 20 args, 850226 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:52 invoke gcc with 20 args, 1590191 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:52 invoke gcc with 20 args, 816673 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:52 invoke gcc with 20 args, 824437 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:52 invoke gcc with 31 args, 293768 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:52 invoke gcc with 20 args, 867605 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:52 invoke gcc with 20 args, 817177 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:52 invoke gcc with 55 args, 2989972 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:53 invoke gcc with 20 args, 842875 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:53 invoke gcc with 20 args, 1199644 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:53 invoke gcc with 21 args, 594118 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:53 invoke gcc with 20 args, 629307 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:53 invoke gcc with 44 args, 7333004 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:53 invoke gcc with 42 args, 6044028 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:53 invoke gcc with 19 args, 3558 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:53 invoke gcc with 18 args, 157426 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:54 invoke gcc with 55 args, 2780302 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:54 invoke gcc with 21 args, 3297519 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:54 invoke gcc with 21 args, 2428054 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:54 invoke gcc with 55 args, 4094532 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:54 invoke gcc with 55 args, 422063 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:54 invoke gcc with 46 args, 10917823 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:54 invoke gcc with 55 args, 990792 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:54 invoke gcc with 55 args, 1020103 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:54 invoke gcc with 46 args, 10830322 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:54 invoke gcc with 21 args, 4550776 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:54 invoke gcc with 55 args, 4787234 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:54 invoke gcc with 27 args, 8734638 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:54 invoke gcc with 19 args, 3558 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:54 invoke gcc with 55 args, 3892367 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:55 invoke gcc with 42 args, 6039800 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:55 invoke gcc with 55 args, 2985431 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:55 invoke gcc with 55 args, 5525730 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:55 invoke gcc with 55 args, 5481249 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:55 invoke gcc with 18 args, 145061 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:55 invoke gcc with 55 args, 5621991 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:55 invoke gcc with 55 args, 5491415 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:55 invoke gcc with 20 args, 1021930 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:55 invoke gcc with 55 args, 5117929 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:55 invoke gcc with 20 args, 58 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:55 invoke gcc with 42 args, 6041808 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:55 invoke gcc with 55 args, 4092807 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:55 invoke gcc with 55 args, 3254789 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:55 invoke gcc with 55 args, 3867740 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:55 invoke gcc with 27 args, 133171 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:55 invoke gcc with 55 args, 3399403 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:55 invoke gcc with 55 args, 684537 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:55 invoke gcc with 55 args, 5597704 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:55 invoke gcc with 55 args, 3556953 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:55 invoke gcc with 55 args, 2062523 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:56 invoke gcc with 42 args, 6011005 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:56 invoke gcc with 21 args, 58 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:56 invoke gcc with 42 args, 6010749 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:56 invoke gcc with 55 args, 6006803 bytes on stdin, 0 files for 0 bytes
2021/06/19 13:04:56 invoke gcc with 55 args, 5291457 bytes on stdin, 0 files for 0 bytes

@jpeach
Copy link
Contributor Author

jpeach commented Jun 19, 2021

GODEBUG=gctrace=1 llama daemon -start -cc-concurrency 500
gc 1 @3.839s 0%: 0.019+2.1+0.003 ms clock, 0.15+0/1.6/0+0.025 ms cpu, 14->14->12 MB, 15 MB goal, 8 P
gc 2 @4.012s 0%: 2.9+5.4+0.17 ms clock, 23+0/4.9/0+1.4 ms cpu, 27->27->27 MB, 28 MB goal, 8 P
gc 3 @4.079s 0%: 23+31+0.035 ms clock, 188+2.7/13/0+0.28 ms cpu, 67->84->83 MB, 68 MB goal, 8 P
gc 4 @4.713s 3%: 34+64+98 ms clock, 272+0.29/4.3/0+788 ms cpu, 156->170->164 MB, 166 MB goal, 8 P
gc 5 @5.171s 3%: 30+85+14 ms clock, 242+0.20/21/0+115 ms cpu, 296->332->313 MB, 328 MB goal, 8 P
gc 6 @5.847s 4%: 16+56+23 ms clock, 128+0.68/5.4/0+187 ms cpu, 560->575->529 MB, 627 MB goal, 8 P
gc 7 @8.231s 4%: 71+55+63 ms clock, 573+45/47/0.52+505 ms cpu, 1005->1184->1089 MB, 1058 MB goal, 8 P
gc 8 @9.965s 4%: 49+1155+0.026 ms clock, 394+0.22/565/0+0.21 ms cpu, 1908->2141->2014 MB, 2179 MB goal, 8 P
gc 9 @63.759s 0%: 0.032+2.3+0.003 ms clock, 0.26+0/3.8/7.8+0.029 ms cpu, 3592->3592->1784 MB, 4028 MB goal, 8 P
gc 10 @87.312s 0%: 7.4+2.5+0.008 ms clock, 59+0.17/4.4/5.9+0.066 ms cpu, 3372->3372->1323 MB, 3568 MB goal, 8 P
gc 11 @106.963s 0%: 0.34+7.5+0.038 ms clock, 2.7+0/12/24+0.31 ms cpu, 2594->2594->1684 MB, 2647 MB goal, 8 P
...

@jpeach
Copy link
Contributor Author

jpeach commented Jun 19, 2021

If I use a temporary file to send the preprocessed source to the daemon, net/rpc doens't appeat in the memory profile at all, it's all in the encoding and upload SVG

jpeach added a commit to jpeach/llama that referenced this issue Jun 19, 2021
Preprocessed output tends to be quite large, often multiple megabytes.
Rather than transferring it over the RPC interface, use a tempfile so
that the server doesn't use a lot of memory uneccessarily.

This updates nelhage#45.
jpeach added a commit to jpeach/llama that referenced this issue Jun 20, 2021
Rather than read large files into memory and process them multiple
times (for hashing and for compression), use streaming compression
for files so that only the compressed output needs to be fully in
memory.

This has the side-effect of the object ID being generated by hashing
the compressed text (which is better because there's less to hash after
compression). This means that the function image needs to be regenerated
to match.

This updates nelhage#45.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants