Renewed function invocation workflow to avoid metadata downloads #660

Merged
merged 174 commits into from
Jul 31, 2024

@afsalthaj afsalthaj commented Jul 15, 2024

Optimizations around Component Metadata Downloads

worker-service mainly consists of two components: a service that acts as the entry point for invoking worker functions, and worker-gateway. The service is used mainly by golem-cli and by worker-gateway. worker-service also includes a gRPC service, which is used mainly by worker-executor.

Every worker invocation (a gateway call, a direct invocation through the CLI, or a callback from worker-proxy in worker-executor) used to trigger a metadata download, which in turn calls component-service. This document and PR describe how we avoided this without compromising the type safety we have in golem-rib evaluation, and hopefully with better performance, since we reduce the number of RPC calls to component-service.

Note that this optimization is not limited to worker-gateway (or worker-bridge).

New invoke functions and Optimisations

  • New invoke functions that take precise JSON (as a vector) as input, which is converted to wasm_rpc::Val internally before calling worker-executor.
  • The internals of worker-executor now store TypeAnnotatedValue, so there is no need to download the metadata on the client side to interpret Vec<wasm_rpc::Val>.
  • In short, there are two invoke-and-await functions: one takes Vec<Val> as input and the other Vec<PreciseJson>; one returns Vec<Val> as output and the other a TypeAnnotatedValue, which can be converted to JSON.
  • The golem-cli support for the WASM wave syntax still downloads the metadata on the CLI side. We confirmed with @vigoo that we are not changing this.
  • While working on this PR, we found that the drop functionality, which was storing an empty array, failed when converted to TypeAnnotatedValue. This is fixed in a separate PR: Fix resource drop functionality in worker-executor - #717 #718
  • In summary, all interpretation has moved to worker-executor, and the client sides are now lightweight.
  • WorkerBridge is broken for now, until we fix https://app.zenhub.com/workspaces/golem-cloud-65f5b68988aba80f7d690199/issues/gh/golemcloud/golem/715
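
To illustrate why returning a type-annotated value removes the need for a client-side metadata lookup, here is a minimal sketch. The types `PlainVal` and `TypedVal` are hypothetical, simplified stand-ins and are not the real `wasm_rpc::Val` or `TypeAnnotatedValue` definitions; the assumption is only that a plain record is positional while an annotated record carries its field names.

```rust
// Hypothetical, simplified stand-ins for illustration only.
// They are NOT the real wasm_rpc::Val / TypeAnnotatedValue types.

/// Positional value: a record is just a list of field values, with no names.
#[derive(Debug, Clone, PartialEq)]
pub enum PlainVal {
    U64(u64),
    Str(String),
    Record(Vec<PlainVal>),
}

/// Self-describing value: a record carries its field names alongside the values.
#[derive(Debug, Clone, PartialEq)]
pub enum TypedVal {
    U64(u64),
    Str(String),
    Record(Vec<(String, TypedVal)>),
}

/// Renders JSON using only the information inside the value itself,
/// so no metadata lookup is needed. String escaping is deliberately
/// minimal (backslash and double quote only).
pub fn to_json(value: &TypedVal) -> String {
    match value {
        TypedVal::U64(n) => n.to_string(),
        TypedVal::Str(s) => {
            format!("\"{}\"", s.replace('\\', "\\\\").replace('"', "\\\""))
        }
        TypedVal::Record(fields) => {
            let body: Vec<String> = fields
                .iter()
                .map(|(name, v)| format!("\"{}\":{}", name, to_json(v)))
                .collect();
            format!("{{{}}}", body.join(","))
        }
    }
}

/// Dropping the names yields the positional form. Going the other way
/// (PlainVal to JSON with field names) would require the component metadata.
pub fn erase(value: &TypedVal) -> PlainVal {
    match value {
        TypedVal::U64(n) => PlainVal::U64(*n),
        TypedVal::Str(s) => PlainVal::Str(s.clone()),
        TypedVal::Record(fields) => {
            PlainVal::Record(fields.iter().map(|(_, v)| erase(v)).collect())
        }
    }
}
```

In this sketch, `to_json` never consults anything outside the value, which is the property the client sides rely on once worker-executor returns annotated values.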

Removed Metadata Download when Interpreting Rib in Worker Gateway

When evaluating Rib, we used to download the metadata for two main purposes:

  • To determine whether an identifier that looks like a function call is really a function in the component metadata, or is instead constructing a variant, etc.
  • To determine whether an empty {} is an empty flag, an empty record, etc.

However, we removed these downloads from Rib. Rib now passes such values through as they are to the worker-service functions (mentioned above).
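
The two ambiguities above can be sketched as a deferred resolution step: the evaluator keeps the term unresolved, and the side that already holds the type information decides. Everything in this sketch (`Ambiguous`, `ExpectedType`, `Resolved`, `Metadata`, `resolve`) is a hypothetical name chosen for illustration and does not mirror the actual golem-rib or worker-executor code.

```rust
use std::collections::HashSet;

/// A term the evaluator cannot classify without type information (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub enum Ambiguous {
    /// A literal `{}`.
    EmptyBraces,
    /// Something that looks like `name(...)`.
    CallLike(String),
}

/// The type expected at the position where the term is used (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub enum ExpectedType {
    Record,
    Flags,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Resolved {
    EmptyRecord,
    EmptyFlags,
    FunctionCall(String),
    VariantConstructor(String),
}

/// Stand-in for component metadata held by the resolving side.
pub struct Metadata {
    pub exported_functions: HashSet<String>,
}

/// Resolves an ambiguous term where the metadata is already available,
/// so the caller never has to download it.
pub fn resolve(
    term: &Ambiguous,
    expected: Option<&ExpectedType>,
    metadata: &Metadata,
) -> Result<Resolved, String> {
    match term {
        Ambiguous::EmptyBraces => match expected {
            Some(ExpectedType::Record) => Ok(Resolved::EmptyRecord),
            Some(ExpectedType::Flags) => Ok(Resolved::EmptyFlags),
            None => Err("cannot resolve `{}` without an expected type".to_string()),
        },
        Ambiguous::CallLike(name) => {
            if metadata.exported_functions.contains(name) {
                Ok(Resolved::FunctionCall(name.clone()))
            } else {
                // Simplifying assumption: anything that is not an exported
                // function is treated as a variant constructor.
                Ok(Resolved::VariantConstructor(name.clone()))
            }
        }
    }
}
```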

Function Name Parsing - ParsedFunctionName and Display instance

  • When evaluating a Rib expression, we already parse the function name as ParsedFunctionName. However, the functions exposed by worker-service, as well as the gRPC interface of worker-executor, all take the function name as a String. To keep the changes in this PR small, I have introduced a Display instance for ParsedFunctionName, and worker-bridge simply reuses the existing functions instead of introducing a new function that accepts ParsedFunctionName.
  • This implies that the function name is parsed into ParsedFunctionName twice: once during Rib evaluation, and once when worker-executor executes the worker function.
  • All other parsing that was done in worker-service has been removed.
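
The Display round trip described above can be sketched as follows. `SimpleFunctionName` is a hypothetical, heavily simplified stand-in for ParsedFunctionName: it only covers names of the form `interface.{function}` (such as `golem:it/api.{start-polling}`, which appears in the logs on this page), and the real type may handle more cases.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical, simplified stand-in for ParsedFunctionName.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SimpleFunctionName {
    pub interface: String,
    pub function: String,
}

impl fmt::Display for SimpleFunctionName {
    /// Renders the name back into the string form that String-based APIs accept.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}.{{{}}}", self.interface, self.function)
    }
}

impl FromStr for SimpleFunctionName {
    type Err = String;

    /// The second parse, done by whoever receives the String.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let (interface, rest) = s
            .split_once(".{")
            .ok_or_else(|| format!("missing `.{{` in `{}`", s))?;
        let function = rest
            .strip_suffix('}')
            .ok_or_else(|| format!("missing closing `}}` in `{}`", s))?;
        if interface.is_empty() || function.is_empty() {
            return Err(format!("empty interface or function in `{}`", s));
        }
        Ok(SimpleFunctionName {
            interface: interface.to_string(),
            function: function.to_string(),
        })
    }
}
```

Calling `to_string()` on the parsed name and parsing the result again yields an equal value, which is what makes reusing the String-based functions safe, at the cost of the double parse noted above.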

Please find the follow-up ticket, raised based on a review comment from @vigoo:
#724

@afsalthaj

Finally, an important test is passing:

cargo test --package golem-worker-executor-base --test integration api::counter_resource_test_1 -- --nocapture --test-threads=1
   INFO golem_worker_executor_base::http_server: Stopping Http server...
    at golem-worker-executor-base/src/http_server.rs:43
    in integration::api::counter_resource_test_1

ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 183 filtered out; finished in 1.20s

   INFO golem_test_framework::components::redis::spawned: Stopping Redis
    at golem-test-framework/src/components/redis/spawned.rs:92

afsalthaj commented Jul 31, 2024

The integration test log gets cut off and the check shows red, but it works on my machine.

-07-31T09:53:15.839285Z  INFO golem_test_framework::components: [workersvc] 2024-07-31T09:53:15.839196Z  INFO api_request{api="get_workers_metadata" api_type="grpc" component_id="f93a419e-4ef4-48ff-9f64-2bd0a7a43376"}: golem_common::metrics::api: API request succeeded elapsed_ms=3
test worker::get_workers ... ok
2024-07-31T09:53:15.849059Z  INFO golem_test_framework::components: [worker-9101] 2024-07-31T09:53:15.791625Z DEBUG api_request{api="create_worker" api_type="grpc" worker_id="ab599159-f504-487a-9e32-a1effed6bd9a/worker-http-client-14" component_version=0 account_id="-1"}:waiting-for-permits{worker_id="ab599159-f504-487a-9e32-a1effed6bd9a/worker-http-client-14"}:invocation-loop{worker_id="ab599159-f504-487a-9e32-a1effed6bd9a/worker-http-client-14"}:invocation{worker_id="ab599159-f504-487a-9e32-a1effed6bd9a/worker-http-client-14" idempotency_key="637b7fb7-48b0-4248-8461-bdd9a1c062c4" function="golem:it/api.{start-polling}"}: golem_worker_executor_base::metrics::wasm: golem http::types::outgoing_request::set_method called
2024-07-31T09:53:15.849308Z  INFO golem_test_framework::components: [worker-9100] 2024-07-31T09:53:15.800828Z DEBUG api_request{api="create_worker" api_type="grpc" worker_id="ab599159-f504-487a-9e32-a1effed6bd9a/worker-http-client-2" component_version=0 account_id="-1"}:waiting-for-permits{worker_id="ab599159-f504-487a-9e32-a1effed6bd9a/worker-http-client-2"}:invocation-loop{worker_id="ab599159-f504-487a-9e32-a1effed6bd9a/worker-http-client-2"}:invocation{worker_id="ab599159-f504-487a-9e32-a1effed6bd9a/worker-http-client-2" idempotency_key="3463d13a-82bb-4e2a-81ae-21764c3b6c29" function="golem:it/api.{start-polling}"}: golem_worker_executor_base::metrics::wasm: golem http::types::future_incoming_response::drop called

test result: ok. 8 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out2024-07-31T09:53:15.850528Z  INFO golem_test_framework::components: [worker-9100] 2024-07-31T09:53:15.800827Z DEBUG api_request{api="create_worker" api_type="grpc" worker_id="ab599159-f504-487a-9e32-a1effed6bd9a/worker-http-client-4" component_version=0 account_id="-1"}:waiting-for-permits{worker_id="ab599159-f504-487a-9e32-a1effed6bd9a/worker-http-client-4"}:invocation-loop{worker_id="ab599159-f504-487a-9e32-a1effed6bd9a/worker-http-client-4"}:invocation{worker_id="ab599159-f504-487a-9e32-a1effed6bd9a/worker-http-client-4" idempotency_key="19be4681-cf41-4a5f-b723-0414bec75325" function="golem:it/api.{start-polling}"}: golem_worker_executor_base::metrics::wasm: golem http::types::future_incoming_response::get called
2024-07-31T09:53:15.849923Z  INFO golem_test_framework::components: [worker-9101] 2024-07-31T09:53:15.791679Z DEBUG api_request{api="create_worker" api_type="grpc" worker_id="ab599159-f504-487a-9e32-a1effed6bd9a/worker-http-client-14" component_version=0 account_id="-1"}:waiting-for-permits{worker_id="ab599159-f504-487a-9e32-a1effed6bd9a/worker-http-client-14"}:invocation-loop{worker_id="ab599159-f504-487a-9e32-a1effed6bd9a/worker-http-client-14"}:invocation{worker_id="ab599159-f504-487a-9e32-a1effed6bd9a/worker-http-client-14" idempotency_key="637b7fb7-48b0-4248-8461-bdd9a1c062c4" function="golem:it/api.{start-polling}"}: golem_worker_executor_base::metrics::wasm: golem http::types::outgoing_request::set_path_with_query called
; finished in 259.06s

2024-07-31T09:53:15.850823Z  INFO golem_test_framework::components: [worker-9100] 2024-07-31T09:53:15.800679Z DEBUG api_request{api="create_worker" api_type="grpc" worker_id="ab599159-f504-487a-9e32-a1effed6bd9a/worker-http-client-8" component_version=0 account_id="-1"}:waiting-for-permits{worker_id="ab599159-f504-487a-9e32-a1effed6bd9a/worker-http-client-8"}:invocation-loop{worker_id="ab599159-f504-487a-9e32-a1effed6bd9a/worker-http-client-8"}:invocation{worker_id="ab599159-f504-487a-9e32-a1effed6bd9a/worker-http-client-8" idempotency_key="76d3637d-52b1-4538-b4c6-11f1e403d347" function="golem:it/api.{start-polling}"}: golem_worker_executor_base::metrics::wasm: golem http::types::future_incoming_response::drop called
2024-07-31T09:53:15.851221Z  INFO golem_test_framework::components: [worker-9100] 2024-07-31T09:53:15.804206Z DEBUG api_request{api="create_worker" api_type="grpc" worker_id="ab599159-f504-487a-9e32-a1effed6bd9a/worker-http-client-8" component_version=0 account_id="-1"}:waiting-for-permits{worker_id="ab599159-f504-487a-9e32-a1effed6bd9a/worker-http-client-8"}:invocation-loop{worker_id="ab599159-f504-487a-9e32-a1effed6bd9a/worker-http-client-8"}:invocation{worker_id="ab599159-f504-487a-9e32-a1effed6bd9a/worker-http-client-8" idempotency_key="76d3637d-52b1-4538-b4c6-11f1e403d347" function="golem:it/api.{start-polling}"}: wasmtime::runtime::gc::enabled::rooting: Exiting GC root set LIFO scope: 0
2024-07-31T09:53:15.851092Z  INFO golem_test_framework::components::worker_executor_cluster::spawned: Killing all worker executors
2024-07-31T09:53:15.851265Z  INFO golem_test_framework::components: [worker-9100] 2024-07-31T09:53:15.801312Z DEBUG api_request{api="create_worker" api_type="grpc" worker_id="ab599159-f504-487a-9e32-a1effed6bd9a/worker-http-client-1" component_version=0 account_id="-1"}:waiting-for-permits{worker_id="ab599159-f504-487a-9e32-a1effed6bd9a/worker-http-client-1"}:invocation-loop{worker_id="ab599159-f504-487a-9e32-a1effed6bd9a/worker-http-client-1"}:invocation{worker_id="ab599159-f504-487a-9e32-a1effed6bd9a/worker-http-client-1" idempotency_key="d0e02879-6474-4b72-ab03-03d359412142" function="golem:it/api.{start-polling}"}: golem_worker_executor_base::metrics::wasm: golem http::types::future_incoming_response::get called
2024-07-31T09:53:15.851273Z  INFO golem_test_framework::components::worker_executor::spawned: Stopping golem-worker-executor 9100
2024-07-31T09:53:15.85099

@afsalthaj afsalthaj marked this pull request as ready for review July 31, 2024 10:26
@afsalthaj

I will address @vigoo's comments in a separate PR. We have come a long way by now, and I would like to merge a working version.

@afsalthaj afsalthaj merged commit dc397bf into main Jul 31, 2024
14 checks passed
@afsalthaj afsalthaj deleted the delete_metadata branch July 31, 2024 13:11
@justcoon justcoon mentioned this pull request Jul 31, 2024