[bug report] about the mem_limit of dockers #547

Open
ShawnShawnYou opened this issue Oct 16, 2024 · 1 comment
Comments

@ShawnShawnYou

Hi Team,

I've been trying to test several algorithms on the benchmark and used the following command:

python3 run.py --parallelism 31 --dataset gist-960-euclidean --runs 5 --force

I found that many algorithms failed with error 137. Checking the log, I saw that some algorithms were allocated less memory than others. The machine I used is the same as Erikbern's (i.e., an r6i.16xlarge machine on AWS with 512 GB of memory). Part of the log reads as follows:

[Screenshot of the run log showing the per-container memory limits]

We expect each algorithm to be limited to about 512 GB / 32 = 16 GB of memory. However, as the log shows, these algorithms are only allocated about 11 GB, which is far less than expected.

About the fix

Checking the code, I found a bug in how mem_limit is set at Line 73 of ann_benchmarks/main.py:

mem_limit = int((psutil.virtual_memory().available - memory_margin) / args.parallelism)

When using "available," the algorithms in the first batch will get 16 GB of memory, while the algorithms in the latter batches will get less than 16 GB of memory.

So I think this line should be modified as follows to compute the correct memory limit:

mem_limit = int((psutil.virtual_memory().total - memory_margin) / args.parallelism)

Or am I misunderstanding how mem_limit is supposed to be set? Thanks!
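For context on why the wrong value surfaces as error 137: the computed limit ends up as the Docker container's memory cap, and a container that exceeds it is OOM-killed (exit code 137 = 128 + SIGKILL). A rough, hypothetical sketch of that path using the docker SDK (image name, command, and values are placeholders, not the repository's actual runner code):

```python
# Hypothetical sketch of how a computed mem_limit is enforced by Docker;
# the image name and command are placeholders, not ann-benchmarks internals.
import docker
import psutil

memory_margin = 4 * 1024**3   # assumed margin
parallelism = 32              # assumed parallelism

mem_limit = int((psutil.virtual_memory().total - memory_margin) / parallelism)

client = docker.from_env()
container = client.containers.run(
    "ann-benchmarks-example",            # placeholder image name
    command="python3 run_algorithm.py",  # placeholder command
    mem_limit=mem_limit,   # exceeding this cap gets the container SIGKILLed,
    detach=True,           # which shows up as exit code 137
)
```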

Environment
an r6i.16xlarge machine on AWS

@maumueller
Collaborator

Thanks @ShawnShawnYou. I'm a bit split here, because that change only works for machines that are completely devoted to running the benchmark. In a setting where you share the machine, it would be wrong to split it up like you propose.
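A rough numeric illustration of that concern (the numbers are assumed, not taken from this issue): if another workload already occupies part of the machine, splitting the total rather than the available memory promises more than is actually free.

```python
# Illustration only, with assumed numbers: total-based splitting on a shared
# machine can hand out more memory than is actually free.
total = 512 * 1024**3           # physical memory of the machine
other_workload = 200 * 1024**3  # memory already used by other processes (assumed)
available = total - other_workload
parallelism = 32

per_container_from_total = total // parallelism    # ~16 GiB each
promised = per_container_from_total * parallelism  # 512 GiB in total
print(promised > available)  # True: the containers can collectively exceed
                             # what is free and get OOM-killed (exit 137)
```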

The problem only appears when a container is done and a new one is spawned, doesn't it?
