-
I'm computing cosine similarities with code:

```python
embeddings = [ [...], [...] ]
query1 = [...]
query2 = [...]
results1 = nn.losses.cosine_similarity_loss((query1,), embeddings)
results2 = nn.losses.cosine_similarity_loss((query2,), embeddings)
```

My questions are:
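For reference, the computation being asked about can be sketched in plain NumPy (a hypothetical `cosine_similarities` helper, not part of MLX), which may help sanity-check results from `nn.losses.cosine_similarity_loss`:

```python
import numpy as np

def cosine_similarities(query, embeddings, eps=1e-8):
    # query: shape (d,); embeddings: shape (n, d) -> (n,) similarities
    q = query / (np.linalg.norm(query) + eps)
    e = embeddings / (np.linalg.norm(embeddings, axis=1, keepdims=True) + eps)
    return e @ q

embeddings = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
query = np.array([1.0, 0.0])
print(cosine_similarities(query, embeddings))  # ≈ [1.0, 0.0, 0.7071]
```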
Replies: 7 comments
-
No, unfortunately MLX is not in general thread safe at that level. If you are looking to explicitly multi-thread the eval, the way it is supported in MLX is using two CPU streams. You can make a new stream on the CPU with `mx.new_stream(mx.cpu)`. Note, not every workload will actually go faster with multi-threading. For example, matmuls use the co-processor, and if there is only one, you won't see much speedup, so they probably won't get faster. Here is an example of something which runs almost twice as fast with two streams:

```python
import mlx.core as mx
import time

cpu1 = mx.new_stream(mx.cpu)
cpu2 = mx.new_stream(mx.cpu)

a = mx.random.uniform(shape=(2048, 2048))

def fun(s1, s2):
    c = mx.exp(a, stream=s1)
    d = mx.exp(a, stream=s2)
    mx.eval(c, d)

def timeit(f, *args):
    # warmup
    for _ in range(5):
        f(*args)
    tic = time.time()
    for _ in range(100):
        f(*args)
    toc = time.time()
    ms = 10 * (toc - tic)  # milliseconds per iteration, averaged over 100 runs
    return ms

print(timeit(fun, cpu1, cpu1))  # both ops on the same stream
print(timeit(fun, cpu1, cpu2))  # ops split across two streams
```
-
I should add a couple things:
-
Thanks for the multi-thread example. Suppose I'm serving 16 … I think what I need is not really thread safety in MLX. I'm currently writing a demo of a vector database with MLX, which handles multiple requests in parallel and mainly runs on Linux. To implement it, I need to run
-
Are you using the C++ API or the Python API? I don't think we were really aiming for this to be widely used, but in the C++ API there is the following method:

```cpp
auto y = f(x);
async_eval({y});
// The following should be valid in any thread if y.status() != Status::unscheduled
y.event().wait();
```

In general this is kind of testing the limits of what is possible with the current API and the assumptions it makes.
-
I'm using the C++ API in a JavaScript binding to implement something like
It only works if the graph is only evaluated by one worker thread, but I want to run evaluations in multiple workers to implement something like a vector database, which is my original question:
-
You can hack it using
I mentioned running
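Since evaluation isn't thread safe at this level, one workaround for the multi-request serving case is to funnel every evaluation through a single worker thread and hand callers futures to wait on. This is a hypothetical sketch using only the Python standard library, not an MLX API; in a real server, the submitted function would build and eval an MLX graph:

```python
import concurrent.futures
import queue
import threading

class EvalWorker:
    """Serialize all evaluations through one dedicated thread."""

    def __init__(self):
        self._q = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            item = self._q.get()
            if item is None:  # shutdown sentinel
                break
            fn, fut = item
            try:
                fut.set_result(fn())
            except Exception as exc:
                fut.set_exception(exc)

    def submit(self, fn):
        # Callers from any thread get a Future to wait on.
        fut = concurrent.futures.Future()
        self._q.put((fn, fut))
        return fut

    def close(self):
        self._q.put(None)
        self._thread.join()

worker = EvalWorker()
# Stand-in for per-request graph evaluations.
futures = [worker.submit(lambda i=i: i * i) for i in range(4)]
print([f.result() for f in futures])  # [0, 1, 4, 9]
worker.close()
```

The same pattern maps onto a JS binding by resolving a promise when the future completes.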
-
Thanks for the code; it should work. I understand I'm hacking things to do what they were not designed for, but an awaitable async eval is much needed when writing JS code. I'll try it and see if things break.