Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Switch to std::process::Command when invoking the program binary #573

Conversation

ratankaliani
Copy link
Contributor

@ratankaliani ratankaliani commented Sep 25, 2024

The use of command-fds with tokio::process::Command causes rare, deterministic race conditions that lead to witness generation failures. These issues consistently occur for specific block ranges on OP Sepolia, Base, and OP Mainnet in op-succinct.

We should switch to std::process::Command to avoid these race conditions. This will increase resource usage for concurrent witness generation, and we can move back to tokio once the race condition issues are resolved.

I'm not exactly certain the source of the issue, but the Safety section of tokio::process::Command mentions some issues arising from file descriptors: https://tikv.github.io/doc/tokio/process/struct.Command.html#safety

Copy link

codecov bot commented Sep 25, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.8%. Comparing base (38cf22e) to head (4be3cb4).

Additional details and impacted files

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@clabby
Copy link
Collaborator

clabby commented Sep 26, 2024

It's a bit concerning that we have to move to a blocking task to fix this. I'd prefer if we found out the root cause of the issue before making it such that the future that spawns this command cannot be cancelled. That can cause hanging behavior in ways we may not anticipate, if the host program panics.

I think a better approach here, at least for the native run, is to move towards just executing the client program from within the host. It'll require a bit of a larger refactor (likely including some or all of #398 + #361 + #553), but would remove the child process.

My primary suspect here is #553 - that's where I'd start. fwiw, we have not yet ran into this in the upstream kona-client + kona-host pair either.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants