Proposal: object storage building block #18

Open · wants to merge 2 commits into base: main
Conversation

Contributor
@ItalyPaleAle ItalyPaleAle commented Dec 14, 2022

Consolidating and updating the proposals for a new object storage building block: dapr/dapr#4804, dapr/dapr#4934

Proposes the creation of a new "object storage" building block, which allows storing objects of unstructured data and arbitrary size and is optimized for reading/writing with streams.

Member

@artursouza artursouza left a comment

I really like this proposal.

0010-B-object-storage-building-block.md
To retrieve an object, clients make a GET request:

```text
GET http://localhost:<port>/v1.0-alpha1/objects/<component>/<key>
```
Member

So, we would call it the "objects" API and not "blob"? I think "blob" is clear about the unstructured nature of it, while "object" reminds me of JSON. It might just be me, but food for thought.

Contributor Author

I had the same doubt. I landed on "object" because the industry-standard term seems to be "object storage", but I agree that "blob" does sound better. I'm fine either way.

Member

It would be great for people to vote on the naming for this here.


I think the points you two just mentioned should be part of the overview: both "blob" and "object" are commonly used to refer to the concept at hand, and the industry usually uses "object".

I will say that "blob" makes it unambiguous to me what you are targeting with this building block.

Contributor

Arguably "Object storage" contains more than just a binary blob, there's often metadata attached to it -- AWS and Google whereas blob storage usually refers to an opaque large binary object so I think naming matters; are we purely offering the opaque data or something that requires metadata be associated with it?

Contributor Author

Metadata can be added (as described in the doc) but isn't required.

Member

I'd go for "object storage" here: metadata, while optional, can play a major part in how the API behaves, and this matches the object storage notion in AWS and GCP, which also treat metadata as an optional construct.

Contributor

Then I would agree that it feels more like an object store than a blob store, which is fine: objects can be blobs (i.e., no metadata associated), but blobs can't really be objects by their accepted definitions.

* Updated: 2022-12-14
* Original proposals / discussions: dapr/dapr#4808, dapr/dapr#4934

## Overview
Member

What is really missing from this proposal is the why (why users would care about this), the actual problems it helps solve, and a mention of the alternatives. All these sections are indicated in the proposed template and need to be added here for clarity.

Contributor Author

Added

Member

Thanks, can you mention the alternatives? I have additional comments that would best go under that section.

Contributor Author

Besides Dapr state stores (which I sort of mentioned), what other alternatives do you think are worth including? Are non-Dapr solutions (like libraries that abstract various object storage services) something I should include?

Member

@yaron2 yaron2 Dec 14, 2022

Exactly. What a user would do without Dapr (for example, going with a native SDK) is important. It was included in other proposals, and it serves as a great way to gauge whether the value added to the user is worth both the maintenance effort for the project and the overhead for users who need to run it. (Dapr has overhead, and we need to make sure the APIs/features we add have considerable value.)

The proposal involves creating a new `objectstore` building block with the following interface:

```go
type ObjectStore interface {
```
Member

What about listing objects?

Contributor Author

I am unsure about listing objects at this point. The problem with this is that it's a very hard API to build in a scalable, performant, and consistent way.

There could be hundreds of thousands or millions of files in a storage account / bucket, and returning them all is not practical, both for Dapr and for the service. Users will try to list all objects in a storage account or bucket (it's inevitable!) and that will cause problems.

The "obvious" solution here is to add pagination, which means we limit the number of objects we return in each list response, but that's actually pretty complicated in practice because some backends have no native support for pagination (local filesystems), and those who do, are inconsistent in the way it's implemented.

It's also very hard to do pagination in a consistent way (in the sense of data consistency), because if files are modified between pagination requests, the result is undefined.

My preference for now would be to move forward without listing objects, so this (already very important) proposal isn't delayed due to complications in design and implementation for supporting listing objects. It is something we could add later on, if we can figure out the correct way to do that.

Member

@yaron2 yaron2 Dec 14, 2022

so this (already very important) proposal isn't delayed

Very important based on what? That's a very subjective statement; I can't relate to it as a reviewer (and in this case, a voter), and any sort of urgency expressed here is redundant to the proposal.

Two things to consider here:

  1. Proposals are separated from milestones. This can go anywhere from 1.11 to 1.14 and beyond.
  2. Because proposals are separated from milestones, there is no requirement for every feature listed in a proposal to fit within a single milestone. Specifying the wanted interfaces serves as a decision record we can look back to and then derive milestone work for. How that actually works in practice remains to be seen as this is a new process for us.

I don't feel too strongly about listing objects and merely wanted to know if we view listing objects as an integral part of the overall experience.

Contributor Author

Alright, it wasn't clear to me that proposals did not have to be accepted and implemented in toto.

I am still unsure how to correctly implement, and even design, an API for listing; I will need to let this sit with me for a bit and perhaps even make some POCs. But I'm happy to accept suggestions here and I can update this proposal if someone has good ideas and/or experience with designing such an API.

Also, no particular urgency (in fact, the initial proposal was made in June). Just hoping the entire proposal doesn't get stalled because of lack of clarity on the listing API.

Member

Just hoping the entire proposal doesn't get stalled because of lack of clarity on the listing API.

That should not happen, at least from my perspective.

Contributor

Regarding the list operation, I agree that for most any object store it is very costly and inefficient (and therefore largely unused, except by individuals/tools doing administrative operations such as cleaning up stale data). A common solution is keeping a secondary index stored elsewhere in something that is queryable, so I think it's okay not to add it to the building block.


Listing is super important, but perhaps consider it an "optional" feature of the underlying storage? For example, local storage on disk can list files very quickly and cheaply, while it may be a burden for S3/Azure.

The "obvious" solution here is to add pagination

To avoid paging, why not make the listing a stream that you can consume at your convenience...?
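
As a concrete illustration of the stream idea, here is a minimal Go sketch assuming a hypothetical `ObjectInfo` record and a local-filesystem backend; none of these names are part of the proposal:

```go
package main

import (
	"context"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// ObjectInfo is a hypothetical record for one listed object.
type ObjectInfo struct {
	Key string
}

// listObjects streams keys over a channel instead of returning pages.
// A local-filesystem component can fill the channel cheaply, while a
// cloud component could hide its native pagination behind the same
// channel; the consumer reads at its own pace and cancels via ctx.
func listObjects(ctx context.Context, root string) <-chan ObjectInfo {
	out := make(chan ObjectInfo)
	go func() {
		defer close(out)
		_ = filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
			if err != nil || d.IsDir() {
				return err
			}
			select {
			case out <- ObjectInfo{Key: path}:
				return nil
			case <-ctx.Done():
				return ctx.Err() // consumer stopped reading; abort the walk
			}
		})
	}()
	return out
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	for obj := range listObjects(ctx, os.TempDir()) {
		fmt.Println(obj.Key)
	}
}
```

This keeps the wire contract free of page tokens, though the data-consistency caveat raised above still applies: objects modified mid-stream may or may not appear.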


#### Storing an object

To store an object, clients make a PUT request:
Member

If a user has a state store component that is used both for KV and Blob, will a key from one API override the other? If so, this might create inconsistencies that are very hard to reason about, especially during runtime.

Contributor Author

We should recommend that users not use the same storage accounts / buckets with multiple building blocks.

I don't see this as much different from someone using, for example, Postgres as a Dapr state store while also interacting with the database directly using the Postgres APIs.

Member

I see a great difference, because in a Dapr state Postgres + direct Postgres scenario, saving the same key would actually work as expected by default, whereas with Dapr state S3 and Dapr object S3 you get values being overridden by default.

Contributor Author

I am not following, sorry.

What do you mean with:

with Dapr state S3 and Dapr object S3 you get values being overridden by default.

Member

If you're calling a Dapr S3 state store for saving state with key 1, then call a Dapr object store to store an object with key 1, a GET operation for the S3 state store would return the data saved by the object store.

Contributor Author

Right, that's the same as using the Dapr Postgres state store to save state with key 1, then using the Postgres client to query for key 1. It is working as intended IMHO. (Although keep in mind that the Dapr state stores apply a prefix by default, so users who do want to overwrite 1 would actually need to write prefix/1.)

The documentation should not encourage users to use multiple building blocks to access the same resources.

Member

Saving state with Dapr for key 1 results in a key of the format app-id||1. Saving state with an SDK for key 1 would result in a key of 1.

So if a user saves two different values, with and without Dapr, and performs a GET call with and without Dapr, the values would be taken from two different keys, as expected. That's not the case with Dapr state and object storage, where both operations target app-id||1, the same key.

Member

The documentation should not encourage users to use multiple building blocks to access the same resources.

Best to incorporate into the proposal as a note then.

Signed-off-by: ItalyPaleAle <[email protected]>

Where `<component>` is the name of the object store component, and `<key>` is the key (name) of the object.

The request body contains the raw bytes of the object, which are read by Dapr in a streamed way and are sent to the object store as soon as they are available (once again, Dapr does not buffer the data in-memory before storing it).
Contributor

It may be a good idea to include the object size in this request. I am not certain about other platforms, but S3, for example, requires you to use the multipart upload mechanism if the object to be stored exceeds the 5 GB put-operation maximum, which would require knowledge of the expected size so that the component can use the right API.

Contributor Author

I'm afraid that's not possible because knowing the object size ahead of time would require the client to have all the data already assembled in memory. This won't work for data that is generated on-the-fly and streamed to the object storage service.

When implementing this, we can err on the side of caution and always leverage multipart upload if that's the most universal solution.


I think it is possible to stream files to S3 without knowing the size of the file -- https://stackoverflow.com/questions/8653146/can-i-stream-a-file-upload-to-s3-without-a-content-length-header?rq=1

Contributor Author

@ItalyPaleAle ItalyPaleAle Dec 16, 2022

Yup, I have done it before (with the SDK), confirming it's possible.

Contributor

Yes, it's possible in S3 using multipart, but that requires different permissions and a different implementation than a simple put operation. If the requirement is that the size is explicitly not known ahead of time, that should be clear to component developers so that, for example, the S3 implementation can default to using multipart (or at least make its limitations obvious to the developer).


I'm afraid that's not possible because knowing the object size ahead of time would require the client to have all the data already assembled in memory.

Since metadata can only be submitted as a header, if you want to compute a checksum, you'll need to traverse the file once before you send it anyway.

Contributor Author

We could make object size an optional parameter. Callers can pass it using a Content-Length header in the request, and components can (but don't have to) use it to make decisions.
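
For reference, a minimal sketch of size-unknown streaming to S3 with the AWS SDK for Go (v1), roughly what a component could do internally: the uploader accepts a plain io.Reader and performs a multipart upload under the hood, so no Content-Length is needed. Bucket and key names are placeholders.

```go
package main

import (
	"io"
	"log"
	"os"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3/s3manager"
)

// upload streams r to S3 without knowing its total size up front: the
// uploader buffers fixed-size parts in memory and issues a multipart
// upload, so only one part at a time is ever held.
func upload(r io.Reader, bucket, key string) error {
	sess := session.Must(session.NewSession())
	uploader := s3manager.NewUploader(sess)
	_, err := uploader.Upload(&s3manager.UploadInput{
		Bucket: aws.String(bucket),
		Key:    aws.String(key),
		Body:   r, // any io.Reader; no Content-Length required
	})
	return err
}

func main() {
	// Stream stdin to a placeholder bucket/key.
	if err := upload(os.Stdin, "example-bucket", "example-key"); err != nil {
		log.Fatal(err)
	}
}
```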

@yaron2
Member

yaron2 commented Dec 4, 2023

@dapr/maintainers-dapr - review ping.


- This building block will be stream-first. It will work with input and output data as binary streams and will not perform any transformation or encoding on the data. Thanks to streams, there are no limits on the size of the input/output data, bypassing `MaxBodySize`.
- Dapr only supports the "last-write-wins" concurrency pattern for object storage (unlike with state stores). This makes sense considering the intended usage of object storage, which is storing documents that usually do not change. When documents do change, last-write-wins is consistent with how regular filesystems work, for example.
- Dapr will not calculate and store the checksum of objects. This is because checksums must be transmitted as a header, but Dapr stores data in the object store in a streaming way, so it can't compute the full checksum until the end of the stream. Apps that want to store checksums (such as the `Content-MD5` header) should compute them beforehand and submit them as a metadata header.


Can metadata also be submitted via a trailer? Imagine that I have a massive file that I want to compute the checksum for: if I can only send it as a header, I must traverse the file twice, once to compute the checksum and again to submit it to the object store.

I think it would make sense to allow trailing metadata and/or being able to submit metadata separately.

Contributor Author

How about adding a method to replace metadata on a blob?

Using trailers is a less ideal option, because it means you can't use streaming and need to buffer the entire blob in memory: SDKs need to have the metadata when the call is made.
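
To make the two-pass flow being discussed concrete, a hedged sketch: pass one computes the MD5 over the file, pass two streams the same file as the PUT body with the checksum sent up front. The URL shape follows the proposal, but the port, the component/key names, and the x-dapr-content-md5 header name are illustrative.

```go
package main

import (
	"crypto/md5"
	"encoding/base64"
	"fmt"
	"io"
	"net/http"
	"os"
)

// putWithChecksum traverses the file twice: once to compute the MD5,
// then again as the streamed request body, with the checksum sent as a
// metadata header (name illustrative) before any payload bytes.
func putWithChecksum(path, url string) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	h := md5.New()
	if _, err := io.Copy(h, f); err != nil { // pass 1: checksum
		return err
	}
	if _, err := f.Seek(0, io.SeekStart); err != nil { // rewind for pass 2
		return err
	}

	req, err := http.NewRequest(http.MethodPut, url, f) // pass 2: stream the body
	if err != nil {
		return err
	}
	req.Header.Set("x-dapr-content-md5", base64.StdEncoding.EncodeToString(h.Sum(nil)))
	res, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer res.Body.Close()
	if res.StatusCode >= 300 {
		return fmt.Errorf("unexpected status: %s", res.Status)
	}
	return nil
}

func main() {
	if err := putWithChecksum("report.pdf", "http://localhost:3500/v1.0-alpha1/objects/mystore/report.pdf"); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

A metadata-replace method, as suggested above, would remove the need for the first pass entirely.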



#### Retrieving an object

To retrieve an object, clients make a GET request:


I would also recommend allowing a HEAD request if I just want to get the metadata.

Contributor Author

I think that's a good suggestion.
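
A minimal sketch of the suggested metadata-only call, assuming the URL shape from the proposal (the port and the component/key names are placeholders):

```go
package main

import (
	"fmt"
	"log"
	"net/http"
)

func main() {
	// HEAD returns only the headers (i.e. the metadata); no body is transferred.
	res, err := http.Head("http://localhost:3500/v1.0-alpha1/objects/mystore/report.pdf")
	if err != nil {
		log.Fatal(err)
	}
	defer res.Body.Close()
	for name, values := range res.Header {
		fmt.Println(name, values) // would include any stored metadata headers
	}
}
```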


- `204 No Content` for successful operations
- `400 Bad Request` if the object store component doesn't exist or the key parameter is missing
- `409 Conflict` if the object already exists and there's an ETag mismatch
- `500 Internal Server Error` for all other errors


What about write-only object storage?

Contributor Author

What do you mean?


For example, this might be backed by an HSM that allows storing new key material and reading public keys, but not deletions. Or the object might be backed by a filesystem where the file is marked read-only.

I'd suggest a `403 Forbidden` to handle the case where deletion isn't allowed.

I'd also recommend a `410 Gone` if the content is already deleted and no longer exists.

Contributor Author

Support for HSM is out of scope of this building block. It would more likely be aligned with the crypto and/or secret stores building blocks.

But in general, I support adding more status codes to indicate various scenarios like the ones you describe.


Support for HSM is out of scope of this building block.

I was illustrating the error, but yeah, I agree.

// Parameter tag is optional and can be used to pass an etag or any similar concurrency-control tag.
// In case of conflict (e.g. etag mismatch), the method returns ErrConflict.
DeleteObject(ctx context.Context, key string, tag any) (err error)
}


What about getting a public URL or reference to the object? For example, S3/Azure (and probably GCP) allow getting a pre-signed URL that can be accessed for some time. That's incredibly useful when you want to "share" objects with other systems/clients in a secure way.

Contributor Author

I think that could be a good suggestion, but I would consider it an optional method, as I'm not sure all blob storage services have a concept of pre-signed URLs.
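
A hedged sketch of how such an optional method could be layered on without burdening backends that lack the concept: a separate Go interface that the runtime feature-detects with a type assertion. All names here are hypothetical, not part of the proposal.

```go
package objectstore

import (
	"context"
	"time"
)

// PresignURLer is a hypothetical optional interface; only components
// backed by services with pre-signed URLs would implement it alongside
// ObjectStore.
type PresignURLer interface {
	// PresignGetURL returns a URL granting time-limited read access to key.
	PresignGetURL(ctx context.Context, key string, ttl time.Duration) (string, error)
}

// At call time, the runtime would feature-detect the capability:
//
//	if p, ok := store.(PresignURLer); ok {
//		url, err := p.PresignGetURL(ctx, key, 15*time.Minute)
//		// return url to the caller
//	}
//	// otherwise respond that the component doesn't support pre-signed URLs
```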

- The first message in the stream MUST contain all other required keys.
- The first message in the stream MAY contain a `payload`, but that is not required.
- Subsequent messages (any message except the first in the stream) MUST contain a `payload` and MUST NOT contain any other property.
- The last message in the stream MUST contain a `payload` with `complete` set to `true`. That message is assumed to be the last one from the sender and no more messages are to be sent in the stream after that.


See above, but I'd also recommend being able to send any other metadata at the end: for example, content-length, content-hash, or whatever else is important to the application and can only be computed once the entire file is consumed.

Contributor Author

I will let the server populate content-hash and/or content-length.

For checksumming during upload, if the service and SDK support that, it can be done transparently by the component. Because components will most likely need to perform chunked uploads, the checksum should be included in each chunk anyway.
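
For illustration, the sequencing rules from the excerpt above paraphrased into Go; putMessage and the send callback stand in for the generated gRPC types, so treat the exact shapes as assumptions rather than the proposal's wire format:

```go
package objectstream

import "io"

// putMessage paraphrases the proposed stream message: required keys on
// the first message only, payload chunks afterwards, complete on the last.
type putMessage struct {
	Key      string // required keys: set on the first message only
	Payload  []byte
	Complete bool // true marks the final message in the stream
}

// sendObject chunks r into messages that follow the rules above.
func sendObject(send func(putMessage) error, key string, r io.Reader) error {
	first := true
	buf := make([]byte, 2<<20) // 2 MiB chunks; the size is arbitrary
	for {
		n, err := r.Read(buf)
		msg := putMessage{Payload: append([]byte(nil), buf[:n]...)}
		if first {
			msg.Key = key
			first = false
		}
		if err == io.EOF {
			msg.Complete = true // last message; no more sends after this
			return send(msg)
		}
		if err != nil {
			return err
		}
		if err := send(msg); err != nil {
			return err
		}
	}
}
```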


There can be two types of headers:

- Custom ones have the `x-dapr-` prefix in requests sent between Dapr and the app. When they are stored, they generally do not have the `x-dapr-` prefix, which is removed (although this could depend on the implementation of the object store component).


IMHO, `x-dapr-` headers feel like they would be for out-of-band communication between the app and sidecar, not for application metadata. If I recall correctly, most object stores will also send stored headers as-is when an object is requested externally, thus potentially leaking information, such as the fact that Dapr is being used (which may be undesirable, much like wp-content URLs let any attacker know immediately that they're dealing with WordPress).

I don't have an answer here, just something to consider.

@yaron2
Member

yaron2 commented Dec 5, 2023

+1 binding

This is a good proposal and barring any new major issues I think it'd be good to move forward with this. Outstanding comments can be addressed as implementation details IMO.

@daixiang0
Member

+1 binding

Object storage is widely used; I really wish for Dapr to support it. As @yaron2 said, outstanding comments can be addressed as implementation details IMO.

@WhitWaldo

WhitWaldo commented Jun 17, 2024

Rather than putting this under the kitchen sink of the existing state management building block, I propose that this instead be part of a new, next-generation suite of purpose-built state management building blocks.

I propose that we leave the existing building block where it is, as it's a dependency for several other blocks, but split this out as a completely separate thing conceptually, leaving the door open for a collection of other specialty state stores to be built alongside it later.

Why a clean break? The existing key/value store is confusing to new developers and arguably incomplete as a full-featured key/value store (e.g., no key query support). Discord is filled with people inquiring about the (discontinued, but still present and very limited) query API, and every now and then someone asks why their large files mysteriously aren't saving, because they're exceeding the request limits. It's difficult to add any new functionality to today's state store because it already supports so many storage providers that are themselves so distinctly purpose-built. For developers that need something today, it's a fine option, but I propose that we draw a line here on the first-generation state store as a whole and adopt a more specific second-generation model going forward. Object stores don't need querying or transactions, and they'll never be a primary store for actors; they're distinctly different from what we've already got. I think this is the opportune time to start fresh.

Finally, I propose that this start with the core Get/Set streaming and Delete functionality, with a pile of optional interfaces to come later that provide search/filter functionality against the metadata (for those providers that support it), event notifications (potentially paired with pub/sub), or lifecycle management. We can save more blob-specific state storage (with its own purpose-built component providers, like append-only blob storage or page blobs) for another purpose-built implementation with its own distinct requirements.
