-
Notifications
You must be signed in to change notification settings - Fork 79
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use new ExitCode type and values, change ActorError to make bad exit codes less likely #191
Conversation
@@ -36,6 +35,8 @@ pub enum Method { | |||
Collect = 4, | |||
} | |||
|
|||
pub const ERR_CHANNEL_STATE_UPDATE_AFTER_SETTLED: ExitCode = ExitCode::new(32); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calling attention to this actor-defined exit code, as an example.
@@ -61,6 +61,8 @@ pub enum Method { | |||
CurrentTotalPower = 9, | |||
} | |||
|
|||
pub const ERR_TOO_MANY_PROVE_COMMITS: ExitCode = ExitCode::new(32); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another actor-specific exit code.
actors/runtime/src/actor_error.rs
Outdated
} | ||
pub fn internal_send_aborted(msg: String) -> Self { | ||
// TODO use symbolic constant for USR_SEND_ABORTED when it's defined. | ||
Self { exit_code: ExitCode::from(24), msg } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tada! I can define this in builtin-actors without waiting for an enum update in FVM.
actors/runtime/src/runtime/fvm.rs
Outdated
@@ -60,7 +60,7 @@ impl<B> FvmRuntime<B> { | |||
fn assert_not_validated(&mut self) -> Result<(), ActorError> { | |||
if self.caller_validated { | |||
return Err(actor_error!( | |||
SysErrIllegalActor, | |||
unspecified, // FIXME is this the right one? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review: which standard exit code should we use for caller validation errors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, this would be "unspecified". But maybe we do need an extra USR_ASSERTION_FAILED
error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actors/runtime/src/runtime/fvm.rs
Outdated
@@ -263,28 +261,40 @@ where | |||
value: TokenAmount, | |||
) -> Result<RawBytes, ActorError> { | |||
if self.in_transaction { | |||
return Err(actor_error!(SysErrIllegalActor; "runtime.send() is not allowed")); | |||
return Err(actor_error!(unspecified; "runtime.send() is not allowed")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review: which standard exit code for state transaction violations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actors/runtime/src/runtime/fvm.rs
Outdated
match fvm::send::send(&to, method, params, value) { | ||
Ok(ret) => { | ||
if ret.exit_code.is_success() { | ||
Ok(ret.return_data) | ||
} else { | ||
Err(ActorError::from(ret.exit_code)) | ||
// The returned code can't be simply propagated as it may be a system exit code. | ||
Err(ActorError::internal_send_aborted(format!( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's where failures of internal sends are caught.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't this hide the error code from the caller?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. As noted elsewhere, returning ActorError from send isn't appropriate. I'm wary of breaking the consistency with other runtime methods, but maybe I'll try fixing this one now.
} | ||
} | ||
Err(err) => Err(match err { | ||
// Some of these errors are from operations in the Runtime or SDK layer | ||
// before or after the underlying VM send syscall. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my commentary here as I was trying to get to grips with things. I'm reaching the conclusion that error numbers are not good as error values, because context is lost and it's hard to know if the right message is being attached later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They seem to be only enough in the case we are okay with loosing that information, otherwise we must track additional information
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're using "numbers" here (in the syscall interface) because they're consensus critical and we want to keep things as "simple" as possible. If we can't distinguish between two cases, we should add additional exit codes for that. But I think all cases are covered.
https://github.com/filecoin-project/fvm-specs/blob/main/07-errors.md#error-numbers-by-syscall should tell you exactly what each error number means with respect to each syscall.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my commentary here as I was trying to get to grips with things. I'm reaching the conclusion that error numbers are not good as error values, because context is lost and it's hard to know if the right message is being attached later.
Or are you referring to the exit code in ActorError
? In that case, I agree. My attitude there is:
- At the point where we can definitively say "we should exit with some specific exit code", we should just exit with that exit code (abort).
- Before then, we should return descriptive (structured) errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Here" is not the syscall interface, though. It's calling the SDK, which has more code and makes multiple syscalls. We need a descriptive structured error here. Propagating an error number is an antipattern.
value, *balance | ||
return Err(ActorError::unchecked( | ||
ExitCode::SYS_SENDER_STATE_INVALID, | ||
format!("cannot send value: {:?} exceeds balance: {:?}", value, *balance), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The mock has to replicate what the FVM would do sometimes. I have a hunch that ActorError is not a good type to return from the Runtime API.
Actually this code might be wrong, and it should be an "internal send failed", masking the system code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah using an ActorError
here seems weird at best, wrong at worst.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As Steb points out elsewhere, we don't want to mask the system exit code either, we just can't propagate it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #197 for a possible fix
@@ -965,51 +965,64 @@ impl Syscalls for MockRuntime { | |||
) -> anyhow::Result<Cid> { | |||
let exp = | |||
self.expectations.borrow_mut().expect_compute_unsealed_sector_cid.take().ok_or_else( | |||
|| actor_error!(ErrIllegalState; "Unexpected syscall to ComputeUnsealedSectorCID"), | |||
|| actor_error!(illegal_state; "Unexpected syscall to ComputeUnsealedSectorCID"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The mock is totally messed up here. It should not be returning ActorError for mock expectation violations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filed as #196
@@ -24,7 +24,7 @@ impl ActorDowncast for anyhow::Error { | |||
match downcast_util(self) { | |||
Ok(actor_error) => actor_error.wrap(msg), | |||
Err(other) => { | |||
ActorError::new(default_exit_code, format!("{}: {}", msg.as_ref(), other)) | |||
ActorError::unchecked(default_exit_code, format!("{}: {}", msg.as_ref(), other)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This downcasting smells iffy, but is used a lot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the solution to this is: remove the usage of anyhow
in all the ipld crates (amt,hamt, blockstore,encoding) using conrete error enums. anyhow
is great as a "Final" error type, but not as a generic error from which you want to retrieve the original error again. (I have some wip code that does this, but it's a bit more work to finish)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is now implemented in filecoin-project/ref-fvm#463
actors/runtime/src/runtime/fvm.rs
Outdated
@@ -60,7 +60,7 @@ impl<B> FvmRuntime<B> { | |||
fn assert_not_validated(&mut self) -> Result<(), ActorError> { | |||
if self.caller_validated { | |||
return Err(actor_error!( | |||
SysErrIllegalActor, | |||
unspecified, // FIXME is this the right one? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, this would be "unspecified". But maybe we do need an extra USR_ASSERTION_FAILED
error.
actors/runtime/src/runtime/fvm.rs
Outdated
@@ -60,7 +60,7 @@ impl<B> FvmRuntime<B> { | |||
fn assert_not_validated(&mut self) -> Result<(), ActorError> { | |||
if self.caller_validated { | |||
return Err(actor_error!( | |||
SysErrIllegalActor, | |||
unspecified, // FIXME is this the right one? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actors/runtime/src/runtime/fvm.rs
Outdated
@@ -263,28 +261,40 @@ where | |||
value: TokenAmount, | |||
) -> Result<RawBytes, ActorError> { | |||
if self.in_transaction { | |||
return Err(actor_error!(SysErrIllegalActor; "runtime.send() is not allowed")); | |||
return Err(actor_error!(unspecified; "runtime.send() is not allowed")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actors/runtime/src/runtime/fvm.rs
Outdated
match fvm::send::send(&to, method, params, value) { | ||
Ok(ret) => { | ||
if ret.exit_code.is_success() { | ||
Ok(ret.return_data) | ||
} else { | ||
Err(ActorError::from(ret.exit_code)) | ||
// The returned code can't be simply propagated as it may be a system exit code. | ||
Err(ActorError::internal_send_aborted(format!( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't this hide the error code from the caller?
actors/runtime/src/runtime/fvm.rs
Outdated
// How do we know the receiver is what triggered this? | ||
// An error number doesn't carry enough information at this level | ||
// above the raw syscall (and even then). | ||
actor_error!(not_found; "receiver not found") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The send syscall is defined to return NotFound
if, and only if, the receiver doesn't exist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah but this is not the syscall! This is invoking an SDK method that makes three syscalls. We don't know where the NotFound
came from here.
actors/runtime/src/runtime/fvm.rs
Outdated
// Actually this is more like an illegal argument where the caller attempted | ||
// to transfer an amount larger than that representable by the VM's | ||
// token amount type. Yes, the caller doesn't have that amount, but if they'd | ||
// attempted to transfer a representable amount it would fail with | ||
// SYS_INSUFFICIENT_FUNDS instead, so this difference is wierd. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'll fail with InsufficientFunds
(the syscall error) in both cases. See https://github.com/filecoin-project/fvm-project/blob/724650bf92b239bb7d2079cde7d5bad121d7c79f/07-errors.md#L53-L57 for the only "system" exit codes that can appear here.
Basically, the rules are:
- Syscall error numbers indicates that something went wrong with the syscall.
- Exit codes indicates that the syscall succeeded, but the called actor failed or did something wrong.
In other words:
- If we fail before we invoke the target actor, we get a syscall error.
- If we fail afterwards, we get an exit code. Usually a "user" exit code, but sometimes a "system" exit code if the actor fails without yielding a usable exit code.
Other than that, the system exit codes exist for recording failures on-chain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I traced the code. The FVM SDK will return ErrorNumber::InsufficientFunds
prior to invoking the syscall in the case I describe. This is not the syscall interface. The SDK makes three syscalls and performs some other checks below here. This match branch is specifically for when we don't get a system exit code from the send (and so we can't invoke one here in userland).
I know we want it to behave how you described, but it doesn't. This is another example of error numbers being a poor representation except for directly at the actual syscall.
} | ||
} | ||
Err(err) => Err(match err { | ||
// Some of these errors are from operations in the Runtime or SDK layer | ||
// before or after the underlying VM send syscall. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're using "numbers" here (in the syscall interface) because they're consensus critical and we want to keep things as "simple" as possible. If we can't distinguish between two cases, we should add additional exit codes for that. But I think all cases are covered.
https://github.com/filecoin-project/fvm-specs/blob/main/07-errors.md#error-numbers-by-syscall should tell you exactly what each error number means with respect to each syscall.
} | ||
} | ||
Err(err) => Err(match err { | ||
// Some of these errors are from operations in the Runtime or SDK layer | ||
// before or after the underlying VM send syscall. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my commentary here as I was trying to get to grips with things. I'm reaching the conclusion that error numbers are not good as error values, because context is lost and it's hard to know if the right message is being attached later.
Or are you referring to the exit code in ActorError
? In that case, I agree. My attitude there is:
- At the point where we can definitively say "we should exit with some specific exit code", we should just exit with that exit code (abort).
- Before then, we should return descriptive (structured) errors.
7d62272
to
0093286
Compare
Updates all uses of the FVM's
ExitCode
type to the new one that will be released in the next crates. Updates to use the codes defined in the spec.Changes
ActorError
, the type used to propagate abort codes to the trampoline, to make it easier to construct common ones and a bit harder to accidentally attempt to exit with a system exit code. As part of this, introduces exit code 24 for "internal send failed", because the exit code of the internal send may be a system exit code and can't be propagated by the caller.It's going to be painful, but please do review each change in exit code (or at least those that used to be
SysErrXxx
). There are many places where I've now had to replace a system exit code with a user exit code. The tests all pass locally, but I am somewhat surprised at that.fvm_shared
crates before merging.FYI @dignifiedquire