Allow queries without batch keys #170

Open · wants to merge 1 commit into main

Conversation

arnodirlam

I have a use case where a query should run in dataloader without batching, together with all other (possibly batched) queries.

A few examples (see tests added in this PR):

```elixir
# all entries in the table
Dataloader.load(Test, {:many, Post}, [])

# first entry in the entire table
Dataloader.load(Test, {:one, Post, limit: 1, order_by: [asc: :id]}, [])
```

The changes needed are minimal.

I'm happy to make another PR for the v1 branch, if and when this gets merged.

Thanks a lot for your consideration! 🙏

@benwilson512
Contributor

Hi @arnodirlam, can you elaborate on what this accomplishes beyond calling Repo.all in your resolver?

@arnodirlam
Author

> Hi @arnodirlam, can you elaborate on what this accomplishes beyond calling Repo.all in your resolver?

Hi @benwilson512 thanks for your quick reply!

For context, I'm using dataloader outside of GraphQL as a dependency for my library dx.

The idea here is a Repo.all with the following advantages that dataloader already brings to the table:

  • unified API for executing queries
  • caching & loading results for duplicate queries
  • concurrent execution when several queries need to be made

I'll try to make arguments in the context of running in a GraphQL resolver, though:

For example, say you have a resolver that needs to load a database table with a global set of data. Could be some kind of app-wide config. In the app I'm working on, for example, we have tables for user roles that are global. That could be exposed via GraphQL.

In that case, there is no batch key, because the data is global. But writing Repo.all in the resolver without using dataloader is a sequential ("blocking") call: it won't run concurrently with queries from other resolvers. Also, depending on the GraphQL schema, it might be executed multiple times, for example if the data is used to evaluate some further authorization logic.

Does that make sense?
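To make the use case concrete, here is a minimal resolver sketch using the batching proposed in this PR together with Absinthe's on_load helper. The source name Roles, the Role schema, and the resolver itself are hypothetical, not part of this PR:

```elixir
# Assumes: Absinthe.Resolution.Helpers.on_load/2 is imported, a dataloader
# source named Roles is configured, and Role is an Ecto schema.
def list_roles(_parent, _args, %{context: %{loader: loader}}) do
  loader
  # no batch key: load all rows of the (global) roles table
  |> Dataloader.load(Roles, {:many, Role}, [])
  |> on_load(fn loader ->
    roles = Dataloader.get(loader, Roles, {:many, Role}, [])
    {:ok, roles}
  end)
end
```

The point is that the query joins the loader's normal run: it is cached across duplicate loads and executed concurrently with the other sources' queries, rather than as a standalone blocking Repo.all.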

@benwilson512
Contributor

It does. My only thought from an API design standpoint is that maybe we should require that users pass in a specific special value like :root or :global or :all instead of []. My concern is that [] might just be a mistake where someone forgot to put in a batch key.

@arnodirlam
Author

I see. I might be biased here, but to me [] makes most sense.

To me, it's a list of filters, where [] means not filtered.

I see a point in helping prevent user mistakes by using :all or similar instead. However,

  1. I think [] is already quite "far away" from [color: "red"] etc. The latter is either passed in as a keyword list such as ... , color: "red"), in which case you'd have to replace all of that with [] to make a mistake; or it's generated programmatically, in which case [] makes more sense, because users don't have to make an extra pass to replace [] with :all or similar.
  2. I don't see a heightened need to protect users from this kind of mistake, which they could make anywhere else, really. Think of crafting an Ecto query, or calling a function, and forgetting to pass in a filter: that would lead to errors in most other contexts as well, and this one is not fundamentally different, in my eyes.

On another note, I'm considering a PR to enable multiple filters in that list, internally detecting the most efficient key to batch on and executing the queries accordingly. For example, given [color: "red", type: "special"], it would count which key has the highest cardinality, use that key for batching, and generate queries to satisfy the other conditions. I've already implemented this in my library and could extract it and move it upstream to dataloader, if you're interested. Then the "list of filters/conditions" paradigm makes even more sense.
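A rough sketch of the cardinality idea described above (this is illustrative only, not the code from the PR or from the dx library): collect the filter keyword lists of all pending loads, count distinct values per key, and batch on the key with the most distinct values, so the remaining keys collapse into per-batch conditions.

```elixir
defmodule BatchKeyChoice do
  # Hypothetical helper: pick the filter key with the highest cardinality
  # of distinct values across all pending load calls.
  def best_batch_key(filter_lists) do
    filter_lists
    # flatten [[color: "red", ...], [color: "blue", ...]] into {key, value} pairs
    |> Enum.flat_map(& &1)
    # group all values seen for each key
    |> Enum.group_by(fn {key, _value} -> key end, fn {_key, value} -> value end)
    # choose the key with the most distinct values
    |> Enum.max_by(fn {_key, values} -> values |> Enum.uniq() |> length() end)
    |> elem(0)
  end
end

# BatchKeyChoice.best_batch_key([
#   [color: "red", type: "special"],
#   [color: "blue", type: "special"]
# ])
# would pick :color (2 distinct values) over :type (1 distinct value),
# so the batch becomes WHERE type = 'special' AND color = ANY('{red,blue}').
```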

@benwilson512
Contributor

benwilson512 commented May 15, 2024

I had not really conceptualized the batch key as a filter; rather, that's the job of the middle arg. The params in the middle argument are filters that apply to the entire batch to be loaded. The right-most arg is what uniquely identifies this particular item within the batch. In general you want a single value there, so that the SQL is ultimately some form of WHERE $col = ANY($ids). You can do that with multiple columns, but it gets a lot uglier.

EDIT: Put another way, the argument you're calling the "batch key" is the "key of this item in the batch" rather than the key of the batch as a whole. The key of the batch as a whole within dataloader is the middle arg.
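The distinction above can be sketched as follows (source and schema names are hypothetical; the SQL comment shows the rough shape of the query, not exact output):

```elixir
# The middle argument scopes the whole batch; the right-most argument
# identifies one item within that batch. Both calls below join the same
# batch (posts where published: true) but fetch different users' posts:
loader
|> Dataloader.load(Source, {:many, Post, published: true}, user_id: 1)
|> Dataloader.load(Source, {:many, Post, published: true}, user_id: 2)

# Dataloader can then issue a single query of roughly this form:
#   SELECT * FROM posts
#   WHERE published = TRUE AND user_id = ANY('{1,2}')
```

Under this reading, the PR's empty right-most argument means "there is no per-item key; the batch is the whole result".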

@arnodirlam
Author

Hi there, sorry for letting this linger. I'd still be interested in getting it merged. Just rebased my branch on main.

> My only thought from an API design standpoint is that maybe we should require that users pass in a specific special value like :root or :global or :all instead of []. My concern is that [] might just be a mistake where someone forgot to put in a batch key.

I'd prefer either [] or :all, but I'm happy with any solution and will modify my PR accordingly. Up to you!

Thanks! ❤️
