List API for the state store building block #61
Signed-off-by: Elena Kolevska <[email protected]>
This proposal is quite similar to a discussion I've been having for the last several months. I can tell you already that I don't expect the proposal, as-is, to receive much traction, for the following reasons:
Rather, I've been working to propose, and build support for, a next generation of specialty state stores in Dapr, each supporting a core feature set specific to its purpose. I just finished updating my proposal for a refreshed Key/Value Store that would support streaming and key retrieval based on prefix filtering; @berndverst has an outstanding proposal for a Document Store that would specifically allow document-focused queries; and I just wrote one up for a dedicated Centralized Cache Store. While querying keys is out of scope for the cache proposal, and I'm not initially inclined to do more than a prefix key search on the key/value store, I'd love to hear your thoughts on whether I'm scoping either too narrowly. Or perhaps your approach is better suited to the Document Store proposal, since it would interact with query-based APIs that are more amenable to programmatic constraints. Either way, I'd urge you to follow up on any of those linked items.
While a list operation would be very useful, and I also recognize the desire for listing by prefix, I currently do not support this proposal. Here is why:
Memcached is the first example that comes to mind (even before I saw that you mention it in your proposal) that highlights the limitations of this proposal. I think this proposal can be reduced to a List operation for all keys and easily approved that way. I agree with @WhitWaldo that having specialty state stores (separate building blocks, in fact) makes a lot more sense!
The requirements for the API are:

- Ability to list all keys in a state store
- Ability to list keys in a state store with a certain prefix
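A minimal sketch of the two requirements above, assuming a plain Go map stands in for the state store. `listKeys` is a hypothetical helper, not part of Dapr or components-contrib; a nil prefix plays the role of an unset `optional` prefix field in the request.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// listKeys returns the keys of store in ascending order. A nil prefix lists
// every key; a non-nil prefix keeps only the keys that start with it.
func listKeys(store map[string][]byte, prefix *string) []string {
	keys := make([]string, 0, len(store))
	for k := range store {
		if prefix == nil || strings.HasPrefix(k, *prefix) {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys) // map iteration order is random; sort for stable output
	return keys
}

func main() {
	store := map[string][]byte{
		"order|1": nil,
		"order|2": nil,
		"user|7":  nil,
	}
	prefix := "order|"
	fmt.Println(listKeys(store, nil))     // [order|1 order|2 user|7]
	fmt.Println(listKeys(store, &prefix)) // [order|1 order|2]
}
```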
not supported by all state stores
Yes, @elena-kolevska has a table with her analysis, which she will share here. We can modify the proposal to reduce the feature set of the List API and maximize coverage across state stores, but that will not be 100%. We already have that situation today for existing state store features. I don't think every single state store must support a feature before it can be added.
Yes, I probably should have shared that from the beginning. I've now updated the proposal with the table in an addendum.
20240627-BC-listapi.md
- The results can be sorted
not supported by all state stores
same.
- The results can be paginated
not supported by all state stores
same
## Default behaviour for state stores with missing features

Some of the state stores Dapr supports don't provide the capabilities needed to implement the List API. For example, Memcached doesn't provide a way to list keys, and Azure Table Storage can't sort keys in descending order. In those cases the List API will make a best effort to provide the closest functionality to the one defined in the API. That functionality will be specific to each data store and will be implemented at the component level.
Different philosophies, but I no longer support this approach in Dapr building blocks and think we should instead have a new specialty building block.
I like the idea of having specialty building blocks. I think the state store is doing too much. On the other hand, it also lacks the List API, which is a basic operation. I think we should consider moving some state store components out of the building block into a new one and renaming state store to key/value store. Another route is to deprecate the state store API and have multiple specialized building blocks. Given people's limited time, the most realistic path is to rebrand state store as key/value store and add the List API. Non-compliant components will partially implement the API until a specialized building block is created.
I agree with @berndverst and I'm a 👎 on this proposal. I have done some research on listing before, and it's a really hard problem. Even for backends that do support listing, pagination is implemented very differently, and even then, pagination is not consistent if the underlying state changes between requests.
@elena-kolevska did great research on this problem as well. I agree it is not an easy problem. Regarding pagination specifically, we can offer a simple API for the "next" page only, which enables a common scenario across state stores. The fact that it is not in a session (items listed can change between page requests) is acceptable behavior in many applications; yes, not all. Dapr has never intended to cover 100% of application use cases; it has been about covering what most apps need. So, for any new API proposed, not covering all scenarios has never been a blocker, and it will be no different for the List API.
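A sketch of the "next page only" model, assuming the page token is simply the last key of the previous page and keys come back in ascending order. `listPage` and `page` are hypothetical names; a real component would more likely pass through the backend's own continuation token (Azure Blob Storage has one, per the table in the addendum) as an opaque string. Because no session is kept, keys written or deleted between calls may or may not appear.

```go
package main

import (
	"fmt"
	"sort"
)

// page is a hypothetical response: the keys in this page plus the token for
// the next one (empty when there are no more results).
type page struct {
	Keys      []string
	NextToken string
}

// listPage returns up to limit keys (limit > 0) that sort strictly after
// token. The token is the last key of the previous page, so nothing is kept
// on the server between calls.
func listPage(store map[string]bool, token string, limit int) page {
	keys := make([]string, 0, len(store))
	for k := range store {
		if k > token { // an empty token matches every non-empty key
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	if len(keys) <= limit {
		return page{Keys: keys}
	}
	keys = keys[:limit]
	return page{Keys: keys, NextToken: keys[limit-1]}
}

func main() {
	store := map[string]bool{"k1": true, "k2": true, "k3": true, "k4": true, "k5": true}
	token := ""
	for {
		p := listPage(store, token, 2)
		fmt.Println(p.Keys)
		if p.NextToken == "" {
			break
		}
		token = p.NextToken
	}
	// Prints:
	// [k1 k2]
	// [k3 k4]
	// [k5]
}
```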
This is different from the Query API in one fundamental way: it does not offer a query language that acts on the state values. This proposal includes filtering by key prefix, which is nowhere near the complexity of the Query API. Filtering by key prefix can be valuable for applications that want to show only the items that belong to a user, composing keys in a smart way to process them; it is a very common scenario.
The current state store API does not expect all state stores to support all of the API; see the transaction API, for example. The List API has been in demand for a while, and the Query API failed to satisfy that demand (I can be blamed for that, since I was part of that design). The problem with the Query API is that it is difficult for components to implement and it is an all-or-nothing deal. With this List API, we can allow components to implement it partially (no prefix matching, for example).
I agree that this is the way to go. The current state store building block is trying to do too much and does not provide any abstraction well enough (without a List API, it is not even a KV store). On the other hand, I would suggest viewing this proposal as a complement: it makes the state store a KV store, and non-compliant components will be moved to a specialized building block. Given the number of full-time contributors, waiting for all the new building blocks to be implemented, plus components and SDKs, is not realistic. This proposal is a small and realistic step in the right direction. If there are enough contributors who can commit to delivering the new building blocks in a timely manner, we can still do the List API and rebrand state store as KV store.
I agree that cache should not be in the KV store. Also, a KV store should not do queries. Document store and relational store are the correct abstractions for querying. That way, applications will pick the best abstraction for their problem. Again, this proposal turns the current state store into a KV store. Lastly, if we all decide that we should create a new building block for KV store and deprecate the current state store (also a valid path), we can do that, but it will be a bigger commitment. To repeat myself: I agree with specialized building blocks, and I also think the List API in the existing state store is a realistic step given the dev cycles currently available for the Dapr project. Time-to-market must be considered.
My summary (time-to-market matters):
One of the first questions I had when trying to use Dapr for the first time was: how do I list items? Any new user of Dapr will probably have the same question. That is expected for CRUD operations; yes, CRUD does not contain list, but it is just expected. How many REST CRUD APIs do you see without it? I concur with Artur's assessment. This is an achievable step forward that does not invalidate or exclude the current concerns or future implementations of the loftier goal of separating specialized state stores. Moving toward our goal almost never means doing it in a straight line.
```proto
message ListStateRequest {
  // The prefix that should be used for listing.
  optional string prefix = 1;
  // ...
}
```
I think prefix matching is important for apps that filter by customer ID, in scenarios like listing all orders whose key starts with "Customer1035143531|", since keys are composed as "customer Id|Order Id". So, we should keep it.
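A sketch of the composite-key pattern described above, assuming `|` never appears inside an ID. The helper names (`orderKey`, `customerPrefix`, `ordersFor`) are hypothetical. One detail worth noting: the prefix should include the delimiter, otherwise a prefix of `Customer1` would also match keys belonging to `Customer10`.

```go
package main

import (
	"fmt"
	"strings"
)

// sep separates the parts of a composite key; assumed never to occur in an ID.
const sep = "|"

// orderKey composes a key as "<customerID>|<orderID>".
func orderKey(customerID, orderID string) string {
	return customerID + sep + orderID
}

// customerPrefix ends with the delimiter so that "Customer1" does not also
// match keys that belong to "Customer10".
func customerPrefix(customerID string) string {
	return customerID + sep
}

// ordersFor keeps only the keys that belong to customerID.
func ordersFor(keys []string, customerID string) []string {
	prefix := customerPrefix(customerID)
	var out []string
	for _, k := range keys {
		if strings.HasPrefix(k, prefix) {
			out = append(out, k)
		}
	}
	return out
}

func main() {
	keys := []string{
		orderKey("Customer1", "A"),
		orderKey("Customer10", "B"),
		orderKey("Customer1", "C"),
	}
	fmt.Println(ordersFor(keys, "Customer1")) // [Customer1|A Customer1|C]
}
```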
How should case-sensitivity be handled, if at all?
```proto
optional string page_token = 4;

// Sorting order options
enum Sort {
  // ...
}
```
We might remove sorting from the initial proposal. Some state stores may support a metadata param to handle it.
| Store | Cursor listing | Offset listing | Sorting | Number of Items per Page | Prefix Search | Comments |
|---|---|---|---|---|---|---|
| **cockroachdb** | Yes, if sorting is required | Yes | Yes | Yes | Yes | Need to create an index on the search column |
| **gcp firestore** | Yes | | | | | |
| **in-memory** | No | No | No | No | No | We can implement all the features, but it's not trivial to aggregate data across multiple instances |
| **memcached** | No | No | No | No | No | |
This does not belong in the state store IMO, so we should not dismiss the List API just because of this one.
Here's a list of the relevant capabilities of all the stable state stores:

| Store | Cursor listing | Offset listing | Sorting | Number of Items per Page | Prefix Search | Comments |
|---|---|---|---|---|---|---|
From this list, we can clearly see that cursor listing, page limit and prefix search together will give us plenty of coverage.
| Store | Cursor listing | Offset listing | Sorting | Number of Items per Page | Prefix Search | Comments |
|---|---|---|---|---|---|---|
| **azure blob store** | Yes (continuation token) | No | Always sorted in ascending order; descending or unsorted is not possible | Yes | Yes | Results are always sorted by key name in ascending order. |
| **azure cosmos db** | Yes | Yes | Yes | Yes | Yes | |
| **azure table storage** | Yes | No | Yes, ascending only | Yes, with `$top` | Yes, with range search | Partition key is the application ID. |
| **cassandra** | Yes | No | No | Yes | No | Can't prefix search and sort across all partitions. We could consider maintaining a new table containing all keys, mirroring each original key's TTL. |
Can Cassandra do filtering within the same partition? That might be enough to begin with. It is a common pattern in Cosmos DB too, for the transaction API, for example.
@artursouza I'm eager to move forward on (1) myself and have put some thought into it. What's the best path forward to get the ball rolling? I intend to write up more formal proposals for each in the coming days. Should I be doing anything else as well?
That is a great start! You may present them (all together or one at a time) in our Tuesday calls at 9am PST: https://zoom.us/j/91940016938?pwd=bGNRVmlPK094a0tQZWRlTTJIZUl6UT09 Also, feel free to ping me directly on Discord to remind me to review them :) I can set up a separate recurring call for us to work on those proposals together as well, for a faster feedback loop.
As agreed in our contributors meeting yesterday, I removed the sorting capability from the proposal.
If sorting is going to be removed from the API surface (presumably due to technical restrictions), does that imply that token-based pagination also isn't possible, given that it needs sorting too? From the proposal: "Token-based pagination relies on a token usually equal to, or derived from, the last element in the last returned page. It is very common in NoSQL databases that do a scan across the keyspace. In relational databases this method relies on an indexed column, such as a timestamp or an ID, to ensure efficient sorting and querying. For example:"
Great question. It's not going to affect token-based pagination, because we'll be doing the sorting internally, in the call from components-contrib to the database.
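A sketch of why a token derived from the last returned key keeps working without a user-facing sort option, as long as the component applies one fixed ordering internally. It contrasts offset and token pagination when a key is inserted between two page requests; `sortedKeys`, `byOffset` and `afterToken` are hypothetical helpers over an in-memory map, not components-contrib code.

```go
package main

import (
	"fmt"
	"sort"
)

// sortedKeys is the single, fixed ordering the component applies internally.
func sortedKeys(store map[string]bool) []string {
	keys := make([]string, 0, len(store))
	for k := range store {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

// byOffset returns up to limit keys starting at position offset.
func byOffset(store map[string]bool, offset, limit int) []string {
	keys := sortedKeys(store)
	if offset >= len(keys) {
		return nil
	}
	end := offset + limit
	if end > len(keys) {
		end = len(keys)
	}
	return keys[offset:end]
}

// afterToken returns up to limit keys that sort strictly after token.
func afterToken(store map[string]bool, token string, limit int) []string {
	var out []string
	for _, k := range sortedKeys(store) {
		if len(out) == limit {
			break
		}
		if k > token {
			out = append(out, k)
		}
	}
	return out
}

func main() {
	store := map[string]bool{"b": true, "c": true, "d": true, "e": true}
	first := byOffset(store, 0, 2) // [b c]
	store["a"] = true              // inserted between the two page requests

	fmt.Println(byOffset(store, 2, 2))                      // [c d]: "c" is returned again
	fmt.Println(afterToken(store, first[len(first)-1], 2)) // [d e]: resumes right after "c"
}
```

With offsets, a key inserted ahead of the current position shifts everything and causes repeats (or, with deletions, skips); a key-based token only depends on the ordering staying the same from call to call.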
+1 binding
This proposal introduces a List API for Dapr's state store building block. The List API will enable retrieving keys from a state store based on certain criteria, giving users the visibility they need into the keys they have stored.