
Implement AWS key value store #2883

Draft · wants to merge 5 commits into base: main

Conversation

@ogghead (Contributor) commented Oct 10, 2024

Hi folks! I am creating this draft PR to solicit feedback on an initial AWS key value store implementation. I appreciate any and all discussions on this PR!

Some points for thought:

  1. Implementation uses DynamoDB, though for large blob storage S3 is preferable (DynamoDB records are limited to 400 KB); see Add an S3 key/value storage provider interface #2606. DynamoDB is cheaper and faster for performing many rapid reads/writes of small amounts of data, though, and occupies roughly the same niche as Azure CosmosDB.
  2. Auth currently requires generating AWS STS token credentials and passing them to the Spin app in a runtime config file. https://github.com/spinkube/skips/pull/9/files discusses better patterns to fetch credentials. Curious to hear thoughts on how this implementation can integrate better with that proposal!
  3. The Azure key-value implementation supports reading credentials from environment variables; however, the AWS Rust SDK does not offer a synchronous API to load config, so this would require the MakeKeyValueStore::make_store function to be async for all implementations -- leading to a chain of async function coloring. It is possible to fill the SdkConfig object manually, and I did this to pass STS tokens from a runtime config file, but it would be ideal to rely on the SDK's defaults and its many credential-loading fallbacks if possible. Curious to hear thoughts on how best to handle env var credential loading.
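A minimal, std-only sketch of the function coloring problem described in point 3 (no AWS code involved; `load_defaults_sim` is a hypothetical stand-in for the async `aws_config::load_defaults`):

```rust
// Stand-in for aws_config::load_defaults, which is async.
async fn load_defaults_sim() -> String {
    "default-config".to_string()
}

// A synchronous constructor (like MakeKeyValueStore::make_store) cannot await,
// so it cannot use the async loader at all without an executor:
fn make_store_sync() -> String {
    // load_defaults_sim().await // ERROR: `await` is only allowed inside async fns
    "manually-filled-config".to_string() // e.g. filling SdkConfig by hand instead
}

// Making the constructor async compiles, but now every caller must be async
// too -- the "chain of async function coloring" mentioned above.
async fn make_store_async() -> String {
    load_defaults_sim().await
}

fn main() {
    assert_eq!(make_store_sync(), "manually-filled-config");
    // make_store_async() can only be driven by an async runtime, which a
    // synchronous trait interface does not have access to.
    let _ = make_store_async(); // constructing the future does nothing by itself
    println!("sync path ok");
}
```
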

@itowlson (Contributor) commented

Kia ora @ogghead and thanks for this. We've got work going on in #2895 to implement some additional key-value interfaces, and I think it works better to land that one first, then have this PR include all the AWS stuff. This is partly down to what has the biggest compatibility implications combined with the present release timeline, but also will hopefully provide enough infrastructure to make extending this PR to the new interfaces easy! I hope that's okay with you.

In the meantime, I'll try to have a look at your points for thought!

@itowlson (Contributor) commented

> Implementation uses DynamoDB, though for large blob storage, S3 is preferable

I'd say this is fine for now. We could call this a DynamoDB KV store, which would leave us the flexibility to add an S3 KV backend later if users had large-object use cases - nothing here would preclude that as long as we think of it as an "AWS product X" store rather than an "AWS" store.

> Auth currently requires generating AWS STS token credentials

I don't think we are bound to offer a "tokens in the runtime config" option if that doesn't make sense or is painful to implement. I'm not sure we can rely on the workload identity idea from that SKIP across all Spin runtime environments, but absolutely open to doing things differently as appropriate.

> it would be ideal to rely on the SDK's defaults and many credential loading fallbacks if possible

This is what we do in the SQS trigger. There's no credential configuration; we just load the SDK and let it figure out the credentials, whether from ambient EVs or whatever. I'm told that's idiomatic enough, so I'd have no problem with doing the same thing here. I'm sure someone will shout if it's a problem for them - but we could presumably retrofit additional configuration methods if need be - the Cosmos one certainly went through a few sets of extensions...
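(For the record, "ambient EVs" here means the standard AWS environment variables, which the SDK's default credential chain reads automatically -- a hypothetical example with placeholder values:)

```shell
# Standard AWS SDK environment variables; values are placeholders, not real credentials.
export AWS_ACCESS_KEY_ID="AKIAEXAMPLE"
export AWS_SECRET_ACCESS_KEY="example-secret-key"
export AWS_SESSION_TOKEN="example-session-token" # only needed for temporary (STS) credentials
export AWS_REGION="us-east-1"
```
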

@ogghead (Contributor, Author) commented Oct 29, 2024

> Kia ora @ogghead and thanks for this. We've got work going on in #2895 to implement some additional key-value interfaces, and I think it works better to land that one first, then have this PR include all the AWS stuff. This is partly down to what has the biggest compatibility implications combined with the present release timeline, but also will hopefully provide enough infrastructure to make extending this PR to the new interfaces easy! I hope that's okay with you.
>
> In the meantime, I'll try to have a look at your points for thought!

Sounds good to me! I'm excited to see that work land. It makes sense to hold off on this for now, then rework after that is merged and support all WASI KV interfaces for AWS.

Thanks for taking a look!

@ogghead (Contributor, Author) commented Oct 29, 2024

> > Implementation uses DynamoDB, though for large blob storage, S3 is preferable
>
> I'd say this is fine for now. We could call this a DynamoDB KV store, which would leave us the flexibility to add an S3 KV backend later if users had large-object use cases - nothing here would preclude that as long as we think of it as an "AWS product X" store rather than an "AWS" store.

Great callout! The config specifies "Dynamo" as the KV store type, so it should be flexible enough to add other backends. I will keep this in mind when implementing the full WASI KV interface.

> > Auth currently requires generating AWS STS token credentials
>
> I don't think we are bound to offer a "tokens in the runtime config" option if that doesn't make sense or is painful to implement. I'm not sure we can rely on the workload identity idea from that SKIP across all Spin runtime environments, but absolutely open to doing things differently as appropriate.

The runtime config token setup works in local testing -- but the "use the default SDK config loading" approach is proving challenging with the current interface constraints, mainly because the default AWS config loader is implemented as an async function.

> > it would be ideal to rely on the SDK's defaults and many credential loading fallbacks if possible
>
> This is what we do in the SQS trigger. There's no credential configuration; we just load the SDK and let it figure out the credentials, whether from ambient EVs or whatever. I'm told that's idiomatic enough, so I'd have no problem with doing the same thing here. I'm sure someone will shout if it's a problem for them - but we could presumably retrofit additional configuration methods if need be - the Cosmos one certainly went through a few sets of extensions...

Agreed -- this is the pattern I followed in the Kinesis trigger as well, and I would love to have it here too! The challenge is the function coloring from the AWS config default loader: using that async function here appeared to require a chain of refactoring across the general KV traits, and I must admit that my async Rust knowledge hit a wall when trying to reconcile the changes required.

@itowlson (Contributor) commented

Ah, I misread your comment about using the default SDK config - sorry about that.

It seems like you could call Tokio's block_on to wrap the async call in a blocking wrapper. It's not going to win any prizes for elegance but should, hopefully, get the job done. The section "A synchronous interface to mini-redis" on the Tokio "Bridging to sync code" page seems like it might be close to what you want, although you may well have tried that already? We do have some Giant Async Brains floating around who might be able to help if you share what you tried and what you ran into.

@ogghead (Contributor, Author) commented Oct 29, 2024

> Ah, I misread your comment about using the default SDK config - sorry about that.
>
> It seems like you could call Tokio's block_on to wrap the async call in a blocking wrapper. It's not going to win any prizes for elegance but should, hopefully, get the job done. The section "A synchronous interface to mini-redis" on the Tokio "Bridging to sync code" page seems like it might be close to what you want, although you may well have tried that already? We do have some Giant Async Brains floating around who might be able to help if you share what you tried and what you ran into.

Good callout -- I tried using block_on to get this working with

```rust
Handle::current().block_on(aws_config::load_defaults(BehaviorVersion::latest()))
```

as well as

```rust
let rt = tokio::runtime::Builder::new_current_thread()
    .enable_all()
    .build()?;
rt.block_on(aws_config::load_defaults(BehaviorVersion::latest()))
```

While these do compile, I see crashes immediately on Spin app startup with:

```
Cannot start a runtime from within a runtime. This happens because a function (like `block_on`) attempted to block the current thread while the thread is being used to drive asynchronous tasks.
```

Alternatively, when I went down the path of asyncifying everything required to await this function, I hit a wall at `store_from_toml_fn`, where closures are returned. Async closure enhancements in Rust might be needed to make the closures returned there async, but this is where my knowledge of async Rust was lacking.

If any Giant Async Brains (or anyone) have ideas on the best path forward on this, much appreciated!

@itowlson (Contributor) commented

All right. I think I have a way around this for you. "But," in the words of Deep Thought, "you're not going to like it."

So I asked the Giant Async Brains about blocking on `load_defaults` and they said "don't do that." Instead, the idea is to create a future for the Client and capture that in a lazy or once-cell. Then await it each time you want to use it (which will be cheap after the first time, especially compared to the network activity that follows).

Here is what I did, which seems to work (but you may find a less awful way, this was just the first stab that didn't make the compiler mad at me):

  • Add the async-once-cell crate
  • Change KeyValueAwsDynamo::client to be a (deep breath) `async_once_cell::Lazy<Client, std::pin::Pin<Box<dyn std::future::Future<Output = Client> + Send>>>`
  • In KeyValueAwsDynamo::new, shunt the existing code into an async move. Then Box::pin the async move. And capture that as a future. Then put that into a Lazy::from_future. So it looks like:

```rust
let client_fut: std::pin::Pin<Box<dyn std::future::Future<Output = Client> + Send>> = Box::pin(async move {
    let config = match auth_options {
        KeyValueAwsDynamoAuthOptions::RuntimeConfigValues(config) => /* as current */,
        KeyValueAwsDynamoAuthOptions::Environmental => {
            aws_config::load_defaults(BehaviorVersion::latest()).await // as before but uncommented
        }
    };
    Client::new(&config)
});

let client_cell = async_once_cell::Lazy::from_future(client_fut);

Ok(Self { client: client_cell, table })
```

(Some of the naming here is poor, this was throwaway code.)

  • Change StoreManager::get to get_unpin().await the Lazy:

```rust
async fn get(&self, name: &str) -> Result<Arc<dyn Store>, Error> {
    Ok(Arc::new(AwsDynamoStore {
        _name: name.to_owned(),
        client: self.client.get_unpin().await.clone(), // <-- this bit
        table: self.table.clone(),
    }))
}
```

NOTE: this breaks StoreManager::summary. You'll need to either make that async or capture the summary info as extra fields, but this should be routine. (I hope. I admit punting on this.)

Let me know if you need more info or want a proper diff.
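For readers following along, the core of this pattern can be sketched without the AWS SDK or tokio at all. Everything below -- `Client`, `LazyClient`, the tiny `block_on` -- is an illustrative, std-only stand-in for the real `aws_sdk_dynamodb::Client`, `async_once_cell::Lazy`, and an actual executor:

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Stand-in for aws_sdk_dynamodb::Client.
#[derive(Clone, Debug, PartialEq)]
struct Client(String);

// The boxed-future type from the comment above, with Client as the output.
type ClientFut = Pin<Box<dyn Future<Output = Client> + Send>>;

// Hand-rolled stand-in for async_once_cell::Lazy: drive the stored future
// to completion on first access, then serve the cached value.
enum LazyClient {
    Pending(ClientFut),
    Ready(Client),
}

impl LazyClient {
    fn get(&mut self) -> Client {
        if let LazyClient::Pending(fut) = self {
            let client = block_on(fut.as_mut());
            *self = LazyClient::Ready(client);
        }
        match self {
            LazyClient::Ready(c) => c.clone(),
            LazyClient::Pending(_) => unreachable!(),
        }
    }
}

// Tiny single-future executor, just enough to drive the future here.
fn block_on<F: Future + Unpin>(mut fut: F) -> F::Output {
    fn clone(_: *const ()) -> RawWaker { RawWaker::new(std::ptr::null(), &VTABLE) }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    let waker = unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) };
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(v) = Pin::new(&mut fut).poll(&mut cx) {
            return v;
        }
    }
}

fn main() {
    // As in the comment: shunt the construction code into an async move,
    // Box::pin it, and store the future for later. No await happens here,
    // so a synchronous constructor can do this.
    let client_fut: ClientFut = Box::pin(async move {
        // In the real code this is where aws_config::load_defaults(...).await
        // would run; here we just build a dummy client.
        Client("dynamodb".to_string())
    });

    let mut lazy = LazyClient::Pending(client_fut);
    let a = lazy.get(); // first call resolves the future
    let b = lazy.get(); // second call hits the cache
    assert_eq!(a, b);
    println!("{:?}", a);
}
```

The key point is that the future is created synchronously, so the constructor never needs to be async; the future is only driven to completion on first use, inside an already-async method.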

@ogghead (Contributor, Author) commented Oct 29, 2024

Excellent! This is exactly the Galaxy Async Brain thinking I was sorely lacking 😄

I will give this a go tonight, thanks for the tips!

@ogghead (Contributor, Author) commented Oct 30, 2024

Can confirm this worked like a charm! Pushed the changes to reflect it, and I will keep an eye on the full WASI KV implementation PR. I am in your debt for your help on this @itowlson :)

One does not simply walk into `async_once_cell::Lazy<Client, std::pin::Pin<Box<dyn std::future::Future<Output = Client> + Send>>>` (or Mordor)

@itowlson (Contributor) commented

I'm delighted to have helped! Thanks once again for your effort, your patience, and your good humour throughout this...

...

...because you will need them when I call that debt in. ominous music and cheesy lightning FX in which the viewer can vaguely make out the looming shape of wasi:blobstore

(also, and at the risk of bathos, please ignore MQTT CI failures - it's a known flake)
