-
I'm computing cosine similarities with code:

```python
embeddings = [ [...], [...] ]
query1 = [...]
query2 = [...]
results1 = nn.losses.cosine_similarity_loss((query1,), embeddings)
results2 = nn.losses.cosine_similarity_loss((query2,), embeddings)
```

My questions are:
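For reference, the computation being asked about can be sketched in plain NumPy (a hypothetical `cosine_similarities` helper, not part of MLX), which may help sanity-check results from `nn.losses.cosine_similarity_loss`:

```python
import numpy as np

def cosine_similarities(query, embeddings, eps=1e-8):
    # query: shape (d,); embeddings: shape (n, d) -> (n,) similarities
    q = query / (np.linalg.norm(query) + eps)
    e = embeddings / (np.linalg.norm(embeddings, axis=1, keepdims=True) + eps)
    return e @ q

embeddings = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
query = np.array([1.0, 0.0])
print(cosine_similarities(query, embeddings))  # ≈ [1.0, 0.0, 0.7071]
```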
Replies: 7 comments
-
No, unfortunately MLX is not in general thread safe at that level. If you are looking to explicitly multi-thread the eval, the way it is supported in MLX is using two CPU streams. You can make a new stream on the CPU with `mx.new_stream(mx.cpu)`. Note, not every workload will actually go faster with multi-threading. For example, matmuls use the co-processor, and if there is only one, you won't see much speedup, so they probably won't get faster. Here is an example of something which runs almost twice as fast with two streams:

```python
import mlx.core as mx
import time

cpu1 = mx.new_stream(mx.cpu)
cpu2 = mx.new_stream(mx.cpu)

a = mx.random.uniform(shape=(2048, 2048))

def fun(s1, s2):
    c = mx.exp(a, stream=s1)
    d = mx.exp(a, stream=s2)
    mx.eval(c, d)

def timeit(f, *args):
    # warmup
    for _ in range(5):
        f(*args)
    tic = time.time()
    for _ in range(100):
        f(*args)
    toc = time.time()
    ms = 10 * (toc - tic)  # milliseconds per iteration, averaged over 100 runs
    return ms

print(timeit(fun, cpu1, cpu1))  # both ops on the same stream
print(timeit(fun, cpu1, cpu2))  # ops split across two streams
```
-
I should add a couple things:
-
Thanks for the multi-thread example. Suppose I'm serving 16 … I think what I need is not really thread safety in MLX. I'm currently writing a demo of a vector database with MLX, which handles multiple requests in parallel and mainly runs on Linux. To implement it, I need to run
-
Are you using the C++ API or the Python API? I don't think we were really aiming for this to be widely used, but in the C++ API there is the following method:

```cpp
auto y = f(x);
async_eval({y});
// The following should be valid in any thread if y.status() != Status::unscheduled
y.event().wait();
```

In general this is kind of testing the limits of what is possible with the current API and the assumptions it makes.
-
I'm using the C++ API in a JavaScript binding to implement something like
It only works if the graph is only evaluated by one worker thread, but I want to run evaluations in multiple workers to implement something like a vector database, which is my original question:
-
You can hack it using
I mentioned running
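Since evaluation isn't thread safe at this level, one workaround for the multi-request serving case is to funnel every evaluation through a single worker thread and hand callers futures to wait on. This is a hypothetical sketch using only the Python standard library, not an MLX API; in a real server, the submitted function would build and eval an MLX graph:

```python
import concurrent.futures
import queue
import threading

class EvalWorker:
    """Serialize all evaluations through one dedicated thread."""

    def __init__(self):
        self._q = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            item = self._q.get()
            if item is None:  # shutdown sentinel
                break
            fn, fut = item
            try:
                fut.set_result(fn())
            except Exception as exc:
                fut.set_exception(exc)

    def submit(self, fn):
        # Callers from any thread get a Future to wait on.
        fut = concurrent.futures.Future()
        self._q.put((fn, fut))
        return fut

    def close(self):
        self._q.put(None)
        self._thread.join()

worker = EvalWorker()
# Stand-in for per-request graph evaluations.
futures = [worker.submit(lambda i=i: i * i) for i in range(4)]
print([f.result() for f in futures])  # [0, 1, 4, 9]
worker.close()
```

The same pattern maps onto a JS binding by resolving a promise when the future completes.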
-
Thanks for the code; it should work. I understand I'm hacking things to do what they were not designed for, but an awaitable async eval is much needed when writing JS code. I'll try it and see if things break.