Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor operation dispatcher #127

Open
eguzki opened this issue Oct 30, 2024 · 3 comments
Open

refactor operation dispatcher #127

eguzki opened this issue Oct 30, 2024 · 3 comments

Comments

@eguzki
Copy link
Contributor

eguzki commented Oct 30, 2024

The current implementation works (famous last words). However, it does not fit well with the lifetime of the tasks, which may (or may not) run (one or multiple) GRPC requests.

Description of the workflow

  • The operation dispatcher is created on create_http_context, which has the lifetime of one single HTTP request. It is not being reused for multiple HTTP requests re-using the same connection.
  • Each operation dispatcher needs to run a list of operations (actions in the configuration) one right after the other. There cannot be parallel execution. Therefore, only one of the actions can be waiting for the GRPC response.
  • Each operation can run one or multiple (for this latter we do not have use case for now) GRPC requests.
  • It is the responsibility of the operation (action) to send HTTP response when something goes wrong. For example: hostcalls::send_http_response(status_code: 429, response_headers, body: Some(b"Too Many Requests\n"))
  • Each operation is responsible to take actions based on failureMode.

The high level structure

The GRPC operation

pub enum OperationResult {  
    Done,  
    Wait,    
    Failed,   
}                  

trait GRPCOperation {
       // Will start the operation task 
       // The return can be either Done, Wait or Failed
       fn start(&self) -> OperationResult;

       // Will resume operation task
       // The return can be either Done, Wait or Failed
       fn on_grpc_response(&mut self, token_id: u32, status_code: u32, resp_size: usize) -> OperationResult;
}

The GRPC operations are started and then they can run N GRPC requests. Each time a GRPC request is done, the operation will be waiting for response that will be signaled with the on_grpc_response.

The operation result has the following meaning:

  • Done => The operation is done and its methods start and on_grpc_response should not be called.
  • Wait => The operation is in waiting state and it is expecting that the on_grpc_response will eventually be called
  • Failed => The operation has failed and entered unrecoverable state. Its methods start and on_grpc_response should not be called.

When something failed on the operation work, the operation should check FailureMode. If failure mode is deny, then the operation should send HTTP response self.send_http_response(status_code: 500, headers: vec![], body: Some(b"Internal Server Error.\n")) or whatever it needs to be send. For instance, on over the limit in a rate limiting action, it should send hostcalls::send_http_response(status_code: 429, response_headers, body: Some(b"Too Many Requests\n")). Then return OperationResult::Failed

On the other hand, if failure mode is allow, the operation should decide what to do. It can either be OperationResult::Failed or OperationResult::Done, but never OperationResult::Wait

The operation dispatcher

pub enum RunAction {  
    DoneOK,  
    DoneFailed,  
    Wait,
}                  

trait OperationDispatcher {
       // Will run the list of operations
       // On RunAction::DoneOK, the operation dispatcher is signaling it is done and does not expect to be called again in `run`
       // On RunAction::DoneFailed, the operation dispatcher is signaling it is done and the workflow failed and does not expect to be called again in `run`
       // On RunAction::Wait, the operation dispatcher is signaling it is NOT done and does expect to be called again in `run`
       fn run(&mut self) -> RunAction;

       // Will resume the work of one of the GRPC operations
       // It does not return anything. The dispatcher expects to be called on the "run" method again.
       fn on_grpc_response(&mut self, token_id: u32, status_code: u32, resp_size: usize);
}

From the Wasm HTTPContext, on http_request_headers, the operation dispatcher should be populated with the action sets and those which conditions do not apply, then filtered out. Then, the run method should be called and map RunAction to proxy_wasm::types::Action. The mapping should be straightforward

  • RunAction::DoneOK | RunAction::DoneFailed => proxy_wasm::types::Action::Continue
  • RunAction::Wait => proxy_wasm::types::Action::Wait.

From the Wasm HTTPContext, on on_grpc_call_response, the on_grpc_response method should be called. That should update the internal status of the current operation waiting for GRPC response. Then, resume dispatcher work running again run method. From on_grpc_call_response, it is needed to know whether the dispatcher is done and then call self.resume_http_request() method. This is what RunAction will tell.

  • RunAction::Wait | RunAction::DoneFailed => // Do nothing. For RunAction::DoneFailed case the oepration have already sent the error response like hostcalls::send_http_response(status_code: 429, response_headers, body: Some(b"Too Many Requests\n"))
  • RunAction::DoneOK => self.resume_http_request()

When run method it called, internally, the operations will be executed. When one of the operations returns "Fail" state, from the dispatcher perspective, the entire workflow should be halted and the return value should be "RunAction::DoneFailed. When ALL the operations are done and the queue is empty, the return value should be "RunAction::DoneOK. When one of the operations return OperationResult::Wait, the dispatcher should return RunAction::Wait.

When on_grpc_response is called, the dispatcher knows which operation is waiting (the first of the list) and can call its on_grpc_response method. Then. wait run method to be called.

Operation dispatcher state machine

graph TD;
    INIT-->WAIT;
    WAIT-->WAIT;
    INIT-->OK;
    WAIT-->OK;
    INIT-->ERROR;
    WAIT-->ERROR;
Loading
@eguzki
Copy link
Contributor Author

eguzki commented Oct 30, 2024

Not for today. I would love your thoughts @didierofrivia

@didierofrivia
Copy link
Member

I think at the moment there's no reason for considering a refactor... specially after #125 that is solving the use cases you presented with the e2e tests. However, it will be definitely good to review not only the OperationDispatcher but the whole of the wasm shim.... starting from it's name. We might need to present real use cases and/or performance/complexity tests in order to not be biased by our assumptions on hypothetical use cases and workflows. However, these kind of analyses are good and we could revisit it after GA in order to assess its viability 👍🏼

@eguzki
Copy link
Contributor Author

eguzki commented Nov 21, 2024

Should fix #149

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: No status
Development

No branches or pull requests

2 participants