-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Nursery does not re-raise error from .run_in_actor()
remote task
#287
Comments
I can workaround this by doing: async def main() -> None:
async with tractor.open_nursery() as n:
t1 = await n.run_in_actor(not_raising, name="raising")
t2 = await n.run_in_actor(raising, name="not raising")
async with trio.open_nursery() as tn:
tn.start_soon(t1.result)
tn.start_soon(t2.result)
print("outside the tractor nursery") but the traceback is weird (reported below)
|
@didimelli ahah! Nice catch. This does appear to be a bug todo with the nursery reaping. My suspicion is the issue is a race between when the nursery exit happens versus when the error is collected from the portal. This is likely a regression after the latest release since we changed some cancel request semantics to be more deterministic and this may have touched some error retrieval logic which we aren't testing in this way. Would you mind trying this on Cheers and thanks for finding this bug! |
As some tips for your example (and how I figured out this was indeed a bug):
|
Huh i just tried on the last 2 alphas and the same issue still seems to exist!? I will dig in this afternoon. |
Here the logs with
lol you are right! 🤦🏼♂️ |
With a2, it hangs with these logs:
|
Just tried on all the alphas, and it is hanging with all of them! |
@didimelli yah i found the root cause and put up the patch in #288. We need a test and hopefully this change won't blow up CI 😂 |
Heh it's definitely blowing up CI looks like3 tests; so we might need a slightly different approach or to fix those tests. Having a test in the suite would be great if you wanted to write up your test script as one 😉 |
.run_in_actor()
remote task
Ok, I will look into this tomorrow! |
@didimelli sweet! Yeah just to clarify the issue, basically it's summarized as follows:
Some solutions:
|
Hello! I was having a look at this, and I had an idea: I'm by no means good at trio (or tractor), but I think this would trigger cancellation for all the other tasks, ensuring that they are cancelled (and errors accounted for) cleanly. Are there any side effects I am missing? I can push some code to the PR that does what I said, if you want to have a look. |
@houtenjack thanks for the input! Yes, cancellation will also work but (and this is important) we also want to make sure we aren't limiting the ability of users to eventually be able to define custom supervisors as is discussed in #22 😎 So the question is if we decide to use a cancel-actor-nursery-on-any- To clarify the reason CI is failing in #288, error aggregation:
|
Another question is whether a child actor which is cancelled due to an error from a sibling in the nursery group (in this case |
Let's see if I understand the issue:
In the second solution you proposed, you want to cancel How would the supervisor "tap" into a child nursery to check for errors before all the tasks are collected, if tasks are not raising errors? It seems to me that supervisor code needs to branch for implementing different strategies, but this would require "lower level" branching for handling these cases, which means having supervisor logic in places like I hope everything is clear, please let me know if I'm making wrong assumptions! |
💯. Only things I might emphasize is that
Only because we have a default OCA strat right now. The rest of it i'd say you have correct and I'd encourage solidifying your understanding by running the OP's example and enabling
A very good question! Something I've been thinking about is that we may need to make our default strat one-cancels-one which would mean every actor started with OCO seems to actually be the most general case (since you'll need error handling around each and every task it spawns and a way to ask the super for control of sibling lifetimes) and likely can be implemented solely as a generalization of a
Presuming tasks do raise errors (as per the comment on OCO above), then I think we can maybe just offer a strat API implemented as async context managers? Worth checking out is the ctx mngr style API i was dreaming up in the comment in #22 though I'm not sure the implications of limitations of that chaining style. Ideally we can get ourselves some kind of nice, composable strategy combining API in the long run. |
Heh, so despite trying all the things the test suite is still be grumpy and it's making me wonder if we should just redo the entire done = trio.Event()
result = None
async def target():
await trio.sleep(1)
return 'blah'
async with (
trio.open_nursery() as tn,
tractor.open_nursery() as an,
):
portal = an.start_actor('name', enable_modules=[__name__])
async def run_task_and_get_result():
result = await portal.run(target)
done.set()
t.start_soon(run_task_and_get_result)
# simulate waiting on result
await done.wait()
assert result In this case you can never really have errors swallowed since the I dunno, open to suggestions here. I personally have not used If we were to let's say (😲 ) drop this api it would definitely simplify a lot of things and makes me wonder if we'd even need anything non- |
Yeah, after spelunking the original design a bit yesterday I think I'm of the opinion that we should at the least decouple Argument:
-> we can offer a "wrapper" high level API that provides The benefits include:
Interested to see what y'all think. Also linking #172 as maybe another tidbit to consider as sugar level API. |
Added a couple more patches to #288 to attempt to make the test suite happy (in terms of errors we previously expected in a |
I am not particularly interested in I think that since this project is heavily inspired by trio, many people will approach tractor the same way I did, and so they will reach for a trio-like api. But apart from this, I understand the benefits and agree with your arguments on the API. |
@didimelli i agree i'm just becoming convinced the layering here is wrong. One point I do want to emphasize: an actor nursery is actually quite a bit different.
Right now the error collection that can be done above our actor nursery probably should be instead of intermingled in our nursery's cancellation/teardown logic with special cases all throughout the spawning code 😂 to deal with the The issue is when I first started working on the idea of an actor nursery I wasn't entirely sure how RPC semantics would work but now that they're more refined with the IMO it's super confusing that Why would you ever want to use Further, if we drop It'd be interesting if someone wanted to try writing a To me |
Yeah, if there's one thing I've discovered since adding In fact rolling your own worker pool that caches streams kind of seems so trivial once you're familiar with these APIs that you wonder why one shot work was ever a thing 😂 |
Hello,
I am on WSL2, with python 3.9.9 and tractor 0.1.0a4.
I am testing the cancellation of the tasks when one raises. I am using this simple snippet:
I would expect that when
raising
task raisesValueError
, the other task gets cancelled (as Trio does, as inmain2
function). However, when I run this code, i get:and the programs hangs indefinitely. I need to press
ctrl+c
to kill the process.The traceback shows that tractor is trying to cancel the
not_raising
task (Cancelling ('not raising', '8e3309a2-2e39-4a8e-a3ec-15ce0977c1a8') after error ('not raising', '8e3309a2-2e39-4a8e-a3ec-15ce0977c1a8')
), but the program hangs there.If instead of
sleep_forever()
, i usesleep(20)
, the program actually sleeps for 20 seconds before exiting.Since this is a basic feature of the library, i feel like i am missing something.
Could you please help me understand what it is happening?
The text was updated successfully, but these errors were encountered: