List API for the state store building block #61

Open
wants to merge 9 commits into
base: main

elena-kolevska
Contributor

@elena-kolevska elena-kolevska commented Jun 27, 2024

This proposal introduces a List API for Dapr's state store building block. The List API will enable retrieval of the keys in a state store based on certain criteria, giving users the visibility they need into stored keys.

Signed-off-by: Elena Kolevska <[email protected]>
@elena-kolevska elena-kolevska marked this pull request as ready for review June 27, 2024 13:35
@WhitWaldo

This proposal is quite similar to a discussion I've been having for the last several months. I can tell you already that I would not expect it's likely the proposal as-is would receive much traction for the following reasons:

  • It proposes functionality quite similar to the Query API which proposes filtering, sorting and paging on key or value. This API has been discontinued because of the difficulty of maintaining it against the great many providers the state management building block supports.
  • Too few of the providers supported in the state building block support all that you propose. Coincidentally, I did a quick check earlier tonight against the list just to see which ones support key prefix filtering alone and it's a clear minority of the currently-supported stores, especially when you exclude relational databases.

Rather, I've been working to propose and solicit traction for a next-generation of specialty state stores in Dapr that each support a core feature set specific to their unique purposes. I just finished some updates to my proposal for a refreshed Key/Value Store that would support streaming and key retrieval based on prefix filtering, @berndverst has an outstanding proposal for a Document Store that would specifically allow document-focused queries and I just typed up one for a dedicated Centralized Cache Store.

While querying keys is out of scope for the cache proposal and I'm not initially inclined to do more than a prefix key search on the key/values store, I'd love to hear your thoughts on why I'm scoping either too narrowly. Or perhaps your approach is better suited to the document store proposal as it would be interacting with query-based APIs that are more amenable to programmatic constraints, but I'd urge you to follow up on any of those linked items.

@berndverst
Member

berndverst commented Jul 3, 2024

While a list operation would be very useful, and I also recognize the desire for list with prefix, I currently do not support this proposal. Here is why:

  • List with prefix is not natively available in all underlying state store backend services. I do not support in memory filtering within the sidecar itself - this can lead to OOM problems.
  • Pagination is not natively supported in all underlying state store backend services.
  • Sorting is not natively supported in all underlying state store backend services.

Memcached is the first example that comes to mind (even before I saw that you mention it in your proposal) that highlights the limitations of this proposal.

I think this proposal can be reduced to a List operation for all keys and easily approved that way.
However, I cannot support augmenting our building block interface with yet more methods or parameters that only a subset of components can implement. And as I mentioned the in-memory handling workaround would be problematic.

I agree with @WhitWaldo that having specialty state stores - separate building blocks for these in fact - makes a lot more sense!

The requirements for the API are:

- Ability to list all keys in a state store
- Ability to list keys in a state store with a certain prefix
Member

not supported by all state stores

Member

@artursouza artursouza Sep 23, 2024

Yes, @elena-kolevska has a table with her analysis and she will be sharing here. We can modify the proposal to reduce the feature set of the list API and maximize coverage across state stores but that will not be 100%. We already have that today for existing state store features. I don't think that every single state store must support a feature to be added.

Contributor Author

Yes, I probably should have shared that from the beginning. I've now updated the proposal with the table in an addendum.


- Ability to list all keys in a state store
- Ability to list keys in a state store with a certain prefix
- The results can be sorted
Member

not supported by all state stores

Member

same.

- Ability to list all keys in a state store
- Ability to list keys in a state store with a certain prefix
- The results can be sorted
- The results can be paginated
Member

not supported by all state stores

Member

same


## Default behaviour for state stores with missing features

Some of the state stores Dapr supports don’t provide the capabilities needed to implement the list API. For example, Memcached doesn’t provide a way to list keys, Azure Table Storage can’t sort keys in descending order, and so on. In those cases the list API will make a best effort to provide the closest functionality to the one defined in the API. That functionality will be specific to the data store and will be implemented at the component level.
Member

Different philosophies - but I no longer support this approach in Dapr building blocks and think we should instead have a new specialty building block.

Member

I like the idea of having specialty building blocks. I think the state store is doing too much. On the other hand, it also misses the List API, which is a basic operation. I think we should consider moving some state store components out of the building block into a new one and rename state store to key-value store. Another route is to deprecate the state store API and have multiple specialized building blocks. For the sake of people's time availability, the most realistic path is to rebrand state store to key/value store and add list API. Non-compliant components will partially implement the API until a specialized building block is created.

@ItalyPaleAle
Contributor

I agree with @berndverst and I'm a 👎 on this proposal

I had done some research about listing before, and it's a really hard problem.

Even for backends that do support listing, pagination is implemented very differently, and even then, pagination is not consistent if the underlying state changes between requests.

@artursouza
Member

I agree with @berndverst and I'm a 👎 on this proposal

I had done some research about listing before, and it's a really hard problem.

Even for backends that do support listing, pagination is implemented very differently, and even then, pagination is not consistent if the underlying state changes between requests.

@elena-kolevska did great research on this problem as well. I agree it is not an easy problem.

Regarding pagination specifically, we can offer a simple API for "next" page only and that will allow a common scenario across state stores. The fact that it is not in a session (items listed can change between page requests) is an acceptable behavior in many applications - yes, not all.
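
To make the "next page only, no session" behaviour concrete, here is a minimal in-memory sketch. Everything in it is an assumption for illustration: `list_page`, the plain `dict` standing in for a state store, and the choice of the last returned key as the page token are hypothetical and not part of the proposal's API.

```python
from typing import List, Optional, Tuple


def list_page(
    store: dict, page_size: int, token: Optional[str] = None
) -> Tuple[List[str], Optional[str]]:
    """Return up to page_size keys that sort after `token`, plus the next token."""
    keys = sorted(k for k in store if token is None or k > token)
    page = keys[:page_size]
    # A next token is only returned when more keys remain after this page.
    next_token = page[-1] if len(keys) > page_size else None
    return page, next_token


store = {"a": 1, "c": 2, "e": 3, "g": 4}
first, token = list_page(store, 2)

# Writes that happen between page requests (there is no session or snapshot):
store["b"] = 5  # sorts before the current token, so no later page returns it
store["f"] = 6  # sorts after the current token, so a later page picks it up

second, token = list_page(store, 2, token)
third, token = list_page(store, 2, token)
```

In this sketch the caller never sees `"b"` but does see `"f"`, which is the kind of inconsistency the comment describes as acceptable for many, though not all, applications.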

Dapr has never intended to cover 100% of the use cases for applications, it has been about covering what most apps need. So, for any new API proposed, not covering all scenarios has never been a blocker and it will not be different for list API.

@artursouza
Member

artursouza commented Sep 23, 2024

This proposal is quite similar to a discussion I've been having for the last several months. I can tell you already that I would not expect it's likely the proposal as-is would receive much traction for the following reasons:

  • It proposes functionality quite similar to the Query API which proposes filtering, sorting and paging on key or value. This API has been discontinued because of the difficulty of maintaining it against the great many providers the state management building block supports.

This is different from Query API in one fundamental way - it does not offer a query language that acts on the state values. This proposal includes filtering by key prefix, which is not even close to the query API. Filtering by key prefix can be valuable for applications that want to show only the items that belong to a user and compose their keys in a smart way to process those - it is a very common scenario.

  • Too few of the providers supported in the state building block support all that you propose. Coincidentally, I did a quick check earlier tonight against the list just to see which ones support key prefix filtering alone and it's a clear minority of the currently-supported stores, especially when you exclude relational databases.

The current state store API does not expect all state stores to match all of the API, see the "transaction" API, for example. The list API has been in demand for a while and the Query API failed to satisfy that (I can be blamed for that since I was part of that design). The problem with the Query API is that it is difficult for components to implement and it is an all-or-nothing type of deal. With this List API, we can allow components to implement it partially (no prefix matching, for example).
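
One way to picture "partial implementation" is a component that advertises which list capabilities it has. The sketch below is purely illustrative: `ListFeature`, `UnsupportedError`, and `Store` are hypothetical names, and rejecting an unsupported request (rather than emulating it in memory) is an assumption, not something this thread has settled.

```python
from enum import Flag, auto
from typing import Iterable, List, Optional


class ListFeature(Flag):
    """Hypothetical capability flags a component could advertise."""
    NONE = 0
    LIST = auto()    # can enumerate keys
    PREFIX = auto()  # can filter keys by prefix natively


class UnsupportedError(Exception):
    """Raised instead of emulating a missing capability in memory."""


class Store:
    def __init__(self, keys: Iterable[str], features: ListFeature) -> None:
        self._keys = sorted(keys)
        self.features = features

    def list_keys(self, prefix: Optional[str] = None) -> List[str]:
        if ListFeature.LIST not in self.features:
            raise UnsupportedError("list is not supported by this store")
        if prefix is not None and ListFeature.PREFIX not in self.features:
            raise UnsupportedError("prefix filtering is not supported by this store")
        return [k for k in self._keys if prefix is None or k.startswith(prefix)]


keys = ["a|1", "a|2", "b|1"]
full = Store(keys, ListFeature.LIST | ListFeature.PREFIX)
list_only = Store(keys, ListFeature.LIST)  # lists keys, but no prefix matching
no_list = Store(keys, ListFeature.NONE)    # e.g. a store that cannot list at all
```

Here `full.list_keys("a|")` returns the two matching keys, `list_only.list_keys()` returns every key, and the unsupported combinations raise `UnsupportedError` so the caller can decide what to do.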

Rather, I've been working to propose and solicit traction for a next-generation of specialty state stores in Dapr that each support a core feature set specific to their unique purposes. I just finished some updates to my proposal for a refreshed Key/Value Store that would support streaming and key retrieval based on prefix filtering, @berndverst has an outstanding proposal for a Document Store that would specifically allow document-focused queries and I just typed up one for a dedicated Centralized Cache Store.

I agree that this is the way to go. The current state store building block is trying to do too much and does not do any abstraction well enough (lack of List API makes it not even a KV store). On the other hand, I would suggest viewing this proposal as a complement: it makes the state store a KV store, and components that are non-compliant will be moved to a specialized building block. Given the number of full-time contributors, waiting for all the new building blocks to be implemented, plus components and SDKs, is not realistic. This proposal is a small and realistic step in the right direction. If there are enough contributors that can commit to deliver the new building blocks in a timely manner, we can still do the list API and rebrand state store as KV store.

While querying keys is out of scope for the cache proposal and I'm not initially inclined to do more than a prefix key search on the key/values store, I'd love to hear your thoughts on why I'm scoping either too narrowly. Or perhaps your approach is better suited to the document store proposal as it would be interacting with query-based APIs that are more amenable to programmatic constraints, but I'd urge you to follow up on any of those linked items.

I agree that cache should not be in the KV store. Also, KV store should not do query. Document store and relational store should be correct abstractions for query. In that case, applications will pick the best abstraction for their problem. Again, this proposal is making the current state store into a KV store.

Lastly, if we all decide that we should create a new building block for KV store and deprecate the current state store (also a valid path), we can do that, but it will be a bigger commitment. Repeating myself, I agree with specialized building blocks, and I also agree that the List API in the existing state store is a realistic step given the dev cycles we currently have available for the Dapr project. Time-to-market must be considered.

@artursouza
Member

artursouza commented Sep 23, 2024

My summary (time-to-market matters):

  1. Dapr's state store does too much and it should be split into specialized state stores instead: cache, kv store, document store, relational store.
  2. Time-to-market to have those specialized stores is unknown (meaning too far in the future) and the list API is satisfying an immediate need that the Query API failed to do (reasons discussed above).
  3. (1) and (2) are not mutually exclusive.
  4. I am happy to work with contributors that want to deliver any building block from (1).

@filintod

One of the first questions I had when trying Dapr for the first time was: how do I list items? Any new user of Dapr will probably have the same question. That is kind of expected for CRUD operations, and yeah, CRUD does not contain "list", but it is just expected. How many REST CRUD APIs do you see without it?

I concur with Artur's assessment. This is an achievable step forward that does not invalidate or exclude the current concerns or future implementations of the loftier goal of separate, specialized state stores. Moving toward our goal almost never means doing it in a straight line.


message ListStateRequest {
// The prefix that should be used for listing.
optional string prefix = 1;
Member

@artursouza artursouza Sep 24, 2024

I think prefix matching is important for apps to filter based on customer ID, scenarios like: all orders where key starts with "Customer1035143531|" since keys are composed as "customer Id|Order Id".

So, we should keep it.
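
A small sketch of the composite-key scenario described above, assuming keys of the form "customer ID|order ID". The helper `list_keys_with_prefix` and the sample keys are hypothetical; the in-memory filter only illustrates the matching semantics, not where filtering would actually run (a real component would push the prefix down to the backing store).

```python
from typing import Iterable, List


def list_keys_with_prefix(keys: Iterable[str], prefix: str) -> List[str]:
    """Return the keys that start with `prefix`, in ascending order."""
    return sorted(k for k in keys if k.startswith(prefix))


# Hypothetical keys composed as "<customer id>|<order id>".
keys = [
    "Customer1035143531|Order2",
    "Customer1035143531|Order1",
    "Customer10351435310|Order9",  # a different customer sharing the leading digits
    "Customer2|Order1",
]

# Including the "|" separator in the prefix keeps the longer customer ID out.
orders = list_keys_with_prefix(keys, "Customer1035143531|")

# Without the separator, the other customer's order would match as well.
too_broad = list_keys_with_prefix(keys, "Customer1035143531")
```

The matching here is case-sensitive because `str.startswith` compares exactly; whether the List API should behave that way is still an open question.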

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How should case-sensitivity be handled, if at all?

optional string page_token = 4;

// Sorting order options
enum Sort {
Member

@artursouza artursouza Sep 24, 2024

We might remove sorting from the initial proposal. Some state stores may support a metadata param to handle it.

20240627-BC-listapi.md (outdated)
| **cockroachdb** | Yes, if sorting is required | Yes | Yes | Yes | Yes | Need to create an index on the search column |
| **gcp firestore** | Yes |   |   |   |   |   |
| **in-memory** | No | No | No | No | No | We can implement all the features, but it’s not trivial to aggregate data across multiple instances |
| **memcached** | No | No | No | No | No |   |
Member

This does not belong to state store IMO, so we should not dismiss list API just because of this one.


Here's a list of the relevant capabilities of all the stable state stores:

| Store | Cursor listing | Offset listing | Sorting | Number of Items per Page | Prefix Search | Comments |
Member

From this list, we can clearly see that with cursor listing, page limit and prefix search, we will have plenty of coverage.

| **azure blob store** | Yes (continuation token) | No | Always sorted in ASC order. Desc, or unsorted is not possible. | Yes | Yes | Results are always sorted by key name in ascending order. |
| **azure cosmos db** | Yes | Yes | Yes | Yes | Yes |   |
| **azure table storage** | Yes | No | Yes, just ASC | Yes, with $top | Yes, with range search | Partition key is the application id. |
| **cassandra** | Yes | No | No | Yes | No | Can’t prefix search and sort across all partitions. We could consider maintaining a new table containing all keys, and mirroring the original key’s ttl. |
Member

Can Cassandra do filtering within the same partition? It might be enough to begin with. It is a common thing in CosmosDb too, for the transaction API, for example.

@WhitWaldo

WhitWaldo commented Sep 24, 2024

My summary (time-to-market matters):

1. Dapr's state store does too much and it should be split into specialized state stores instead: cache, kv store, document store, relational store.

2. Time-to-market to have those specialized stores is unknown (meaning too far in the future) and the list API is satisfying an immediate need that the Query API failed to do (reasons discussed above).

3. (1) and (2) are not mutually exclusive.

4. I am happy to work with contributors that want to deliver any building block from (1).

@artursouza I'm eager to move forward on (1) myself and have put some thought into it. What's the best path forward to get the ball rolling on it? I intend to type up more formal proposals for each in the coming days. Should I be doing anything else as well?

@artursouza
Member

My summary (time-to-market matters):

1. Dapr's state store does too much and it should be split into specialized state stores instead: cache, kv store, document store, relational store.

2. Time-to-market to have those specialized stores is unknown (meaning too far in the future) and the list API is satisfying an immediate need that the Query API failed to do (reasons discussed above).

3. (1) and (2) are not mutually exclusive.

4. I am happy to work with contributors that want to deliver any building block from (1).

@artursouza I'm eager to move forward on (1) myself and have put some thought into it. What's the best path forward to get the ball rolling on it? I intend to type up more formal proposals for each in the coming days. Should I be doing anything else as well?

That is a great start! You may present them (or one at a time) in our Tuesday calls at 9am PST: https://zoom.us/j/91940016938?pwd=bGNRVmlPK094a0tQZWRlTTJIZUl6UT09

Also, feel free to ping me directly on Discord to remind me to review them :) I can also set up a separate recurring call for us to work together on those proposals, for a faster feedback loop.

Signed-off-by: Elena Kolevska <[email protected]>
@elena-kolevska
Contributor Author

As agreed in our contributors meeting yesterday, I removed the sorting capability from the proposal.

@olitomlinson

As agreed in our contributors meeting yesterday, I removed the sorting capability from the proposal.

If sorting is going to be removed from the API surface, presumably due to technical restrictions, does that imply that token-based pagination also isn't possible, given that it needs sorting too?

From the proposal:

Token-based pagination

Relies on a token usually equal to, or derived from the last element in the last returned page.

Very common in no-sql databases that do a scan across the keyspace.

In relational databases this method relies on an indexed column, such as a timestamp or an ID, to ensure efficient sorting and querying. For example:

SELECT * FROM items WHERE key > last_key_id ORDER BY key;

@elena-kolevska
Contributor Author

If sorting is going to be removed from the API surface, presumably due to technical restrictions, does that imply that token-based pagination also isn't possible, given that it needs sorting too?

Great question. It's not going to affect token-based pagination, because we'll be doing the sorting internally, in the call from components-contrib to the database.
All the no-sql databases (with the exception of memcached) support token pagination natively.
None of the SQL databases support token pagination natively, but all of them support sorting, so we'll implement token-based pagination ourselves. When we call list on a SQL database, we would receive all the keys sorted in ASC order, without specifying any order ourselves.
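
As a rough illustration of token-based pagination implemented on top of a SQL store, here is a sketch using SQLite from the Python standard library. The `state` table, its `key` column, the `list_keys` helper, and the use of the last returned key as the token are all assumptions made for the example; they are not taken from components-contrib.

```python
import sqlite3
from typing import List, Optional, Tuple


def list_keys(
    conn: sqlite3.Connection,
    page_size: int,
    prefix: str = "",
    token: Optional[str] = None,
) -> Tuple[List[str], Optional[str]]:
    """Return one page of keys in ascending order, plus the token for the next page."""
    rows = conn.execute(
        "SELECT key FROM state "
        "WHERE substr(key, 1, length(?)) = ? AND key > ? "
        "ORDER BY key ASC LIMIT ?",
        # An empty token means "start from the beginning" (assumes non-empty keys).
        # One extra row is fetched to learn whether another page exists.
        (prefix, prefix, token or "", page_size + 1),
    ).fetchall()
    keys = [row[0] for row in rows]
    page = keys[:page_size]
    next_token = page[-1] if len(keys) > page_size else None
    return page, next_token


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE state (key TEXT PRIMARY KEY, value TEXT)")
conn.executemany(
    "INSERT INTO state VALUES (?, ?)",
    [(k, "v") for k in ["c1|o3", "c1|o1", "c2|o1", "c1|o2"]],
)

page1, token1 = list_keys(conn, 2, prefix="c1|")
page2, token2 = list_keys(conn, 2, prefix="c1|", token=token1)
```

The caller never specifies a sort order; the ascending `ORDER BY` is internal to the query, which is what lets the last key of a page serve as the token for the next one.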

Member

@artursouza artursouza left a comment

+1 binding

Projects
Status: In progress

7 participants