addjob() blocking even if the queue is not full? #66

Open
mgard opened this issue Jun 29, 2016 · 1 comment

Comments

mgard commented Jun 29, 2016

I'm currently writing a small threaded application and ran into an issue using torch.threads. Basically, my code can be reduced to this:

tpool:addjob(threadFunc)
-- Do another long task here
-- ...
while currentPos <= shuffledIdx:storage():size() do
    att = sys.clock()
    tpool:synchronize()
    print("Time elapsed waiting for synchronize : " .. sys.clock() - att)

    local dataIn = threadBufferIn[{{1, bsize}, {}, {}, {}}]:clone()
    local dataOut = threadBufferOut[{{1, bsize}, {}, {}, {}}]:clone()

    currentPos = currentPos + bsize
    if currentPos <= shuffledIdx:storage():size() then
        att2 = sys.clock()
        print(tpool:acceptsjob())
        tpool:addjob(threadFunc)
        print("Time elapsed waiting for addjob : " .. sys.clock() - att2)
    end

    --  Do another long task in the main thread
end

where tpool is a properly initialized thread pool that I do not use anywhere else in my code. I understand that tpool:synchronize() should block until threadFunc is done, so the time elapsed in synchronize could be > 0. But here is the output I get:

Time elapsed waiting for synchronize : 9.9897384643555e-05
true
Time elapsed waiting for addjob : 2.5406899452209

I also print the return value of acceptsjob(), which is true. I also tried printing the return value of hasjob(), which is always false at this point. According to the documentation, this should indicate that the corresponding queue is not full, and thus that addjob will not block. Then why does it hang for more than 2 seconds there? I would understand synchronize blocking (and it does, a few times), but addjob?
Is there a way for me to gather more debug information?
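One way to gather more information is to time addjob() in isolation, once with a job that captures nothing and once with a job that captures a large tensor as an upvalue. This is only a sketch under an assumption the thread does not confirm: that threadFunc closes over large objects, which torch.threads serializes together with the function when the job is submitted. The helper timeAddjob, the tensor big and its size are hypothetical names and values chosen for illustration.

```lua
require 'torch'
local threads = require 'threads'
local sys = require 'sys'

-- One worker thread; load torch in the worker so tensors can be deserialized.
local pool = threads.Threads(1, function() require 'torch' end)

-- Hypothetical helper: measure how long a single addjob() call takes.
local function timeAddjob(label, job)
   local t0 = sys.clock()
   pool:addjob(job)
   print(string.format('%s: addjob took %.6f s', label, sys.clock() - t0))
   pool:synchronize() -- drain the queue so the next measurement starts empty
end

-- Baseline: the job captures no upvalues.
timeAddjob('no upvalues', function() return 0 end)

-- Assumption for illustration: the job captures a large tensor as an upvalue,
-- so the tensor is serialized along with the function when the job is added.
local big = torch.FloatTensor(64, 3, 224, 224):zero()
timeAddjob('large upvalue', function() return big:size(1) end)

pool:terminate()
```

If the second call is much slower than the first, the time in addjob() is probably spent serializing the closure rather than waiting for a free slot in the queue.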

soumith (Member) commented Aug 3, 2016

While I understand the behavior you are seeing, I am not sure why it is happening.
Can you try putting threadFunc in a separate file, so that it is not affected by upvalue-based serialization or whatever else is going on?

Simple thread pool usage seems to work fine without blocking in several of our projects.
For example: https://github.com/soumith/dcgan.torch/blob/master/data/data.lua
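A minimal sketch of the separate-file suggestion, assuming threadFunc can take its inputs as arguments instead of reading them from enclosing variables. The file name threadfunc.lua, the argument names and the batch values are hypothetical; the real threadFunc from the report is not shown in this thread.

```lua
--[[ Contents of threadfunc.lua (hypothetical file on package.path):

local function threadFunc(pos, n)
   -- load and preprocess one batch here
   return pos, n
end
return threadFunc
]]

local threads = require 'threads'

-- Each worker loads the module once, in its own Lua state.
-- threadFunc is deliberately a global of the worker, not of the main script.
local pool = threads.Threads(1, function()
   require 'torch'
   threadFunc = require 'threadfunc'
end)

local currentPos, bsize = 1, 32

pool:addjob(
   -- Runs in the worker: refers only to the worker's global threadFunc and
   -- to its arguments, so no upvalues from the main script are serialized.
   function(pos, n) return threadFunc(pos, n) end,
   -- Runs in the main thread with the values returned by the job.
   function(pos, n) print('batch ready', pos, n) end,
   currentPos, bsize)

pool:synchronize()
pool:terminate()
```

With this layout each addjob() call only sends a tiny closure and two numbers to the worker, which makes it easier to tell whether the delay comes from serialization or from something else.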
