Enable shallow and deep queries #53

Open
nocollier opened this issue Apr 30, 2024 · 0 comments
Labels
enhancement New feature or request

Comments

@nocollier
Member

When looking for data in ESGF, a common way of working is to search on a few facets first and then use the unique column values to refine the search iteratively. This is currently very slow when the initial query returns many records. Here is what we currently do when you call search():

  • Query each index node with your search and build a pandas dataframe from the responses
  • Merge this information into a single dataframe, removing older versions and populating lists of dataset_ids which contain location information
  • Collect the unique values of each facet column (df[col].unique()) and return them as the __repr__ of the catalog
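
The three steps above can be sketched roughly as follows. This is a dependency-free illustration that uses plain lists of dicts in place of pandas dataframes; the record layout, the `FACETS` tuple, and the helper names `merge_responses` and `unique_facets` are all assumptions made for the example, not the package's actual code.

```python
from collections import defaultdict

# Assumed facet columns for this sketch.
FACETS = ("source_id", "variable_id")

# Made-up records, shaped as two index nodes might return them.
node_a = [
    {"source_id": "CESM2", "variable_id": "tas", "version": "20190101",
     "id": "CESM2.tas.v20190101|node-a"},
    {"source_id": "CESM2", "variable_id": "tas", "version": "20200101",
     "id": "CESM2.tas.v20200101|node-a"},
]
node_b = [
    {"source_id": "CESM2", "variable_id": "tas", "version": "20200101",
     "id": "CESM2.tas.v20200101|node-b"},
    {"source_id": "CanESM5", "variable_id": "pr", "version": "20190429",
     "id": "CanESM5.pr.v20190429|node-b"},
]


def merge_responses(responses):
    """Merge per-node records, keep the newest version, collect ids in lists."""
    latest = {}              # facet key -> newest version string seen so far
    ids = defaultdict(list)  # (facet key, version) -> dataset ids
    for records in responses:
        for rec in records:
            key = tuple(rec[f] for f in FACETS)
            ids[(key, rec["version"])].append(rec["id"])
            # YYYYMMDD strings compare correctly as plain strings.
            if key not in latest or rec["version"] > latest[key]:
                latest[key] = rec["version"]
    merged = []
    for key, version in latest.items():
        row = dict(zip(FACETS, key))
        row["version"] = version
        row["id"] = ids[(key, version)]  # one id per location
        merged.append(row)
    return merged


def unique_facets(rows):
    """Unique values of each facet column, as the catalog repr would show."""
    return {f: sorted({row[f] for row in rows}) for f in FACETS}


merged = merge_responses([node_a, node_b])
summary = unique_facets(merged)
```

The point of the sketch is the cost: every record from every node has to be downloaded and merged before `unique_facets` can say anything.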

The Solr indices take a long time to return the complete response, and even if Globus is faster, the full query consumes a lot of resources for information we don't really need in the early stages of a search.

Instead, we could have search() perform what I will call a shallow query. That is, we request 0 records but ask the index for the unique facet values that match the search. We then use this response to build up the unique facet columns manually, while the underlying dataframe remains empty initially.
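
A minimal sketch of what a shallow query could look like against a Solr-backed index. It assumes the index accepts `limit`, `facets`, and `format` parameters and returns Solr-style `facet_counts` in which each facet maps to a flat `[value, count, value, count, ...]` list. The helper names and the sample response are hypothetical, and a Globus index would need its own handling.

```python
def shallow_query_params(facets, **search):
    """Build parameters for a query that returns no records, only facet counts."""
    params = dict(search)
    params["limit"] = 0                        # ask for zero records
    params["facets"] = ",".join(facets)        # but request these facet counts
    params["format"] = "application/solr+json"
    return params


def unique_from_facet_counts(response):
    """Turn Solr-style flat [value, count, ...] lists into unique facet values."""
    fields = response.get("facet_counts", {}).get("facet_fields", {})
    return {
        facet: sorted(v for v, n in zip(flat[0::2], flat[1::2]) if n > 0)
        for facet, flat in fields.items()
    }


# Made-up response, trimmed to the parts the sketch reads.
sample_response = {
    "response": {"numFound": 1234, "docs": []},
    "facet_counts": {
        "facet_fields": {
            "source_id": ["CanESM5", 800, "CESM2", 434],
            "variable_id": ["tas", 1234, "pr", 0],
        }
    },
}

params = shallow_query_params(["source_id", "variable_id"], experiment_id="historical")
uniques = unique_from_facet_counts(sample_response)
```

The response stays small regardless of how many records match, which is what makes this cheap enough for the early, exploratory searches.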

When the user references cat.df, either directly or indirectly by calling something that uses it (such as to_dataset_dict()), we pay the price of the full search, which I will call a deep query, hoping that by this point they have a better idea of what they need.
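
One way to defer the full search is a lazily evaluated property. The sketch below is only an illustration of that pattern: `LazyCatalog` and the two callables passed to it are hypothetical stand-ins, not the existing catalog class.

```python
class LazyCatalog:
    """Hypothetical catalog: shallow on search(), deep on first access to df."""

    def __init__(self, shallow, deep):
        self._shallow = shallow  # callable returning {facet: [unique values]}
        self._deep = deep        # callable returning the full set of records
        self._search = {}
        self._facets = {}
        self._df = None

    def search(self, **facets):
        self._search = facets
        self._facets = self._shallow(**facets)  # cheap: facet counts only
        self._df = None                         # drop any stale deep result
        return self

    @property
    def df(self):
        if self._df is None:                    # pay for the deep query once
            self._df = self._deep(**self._search)
        return self._df

    def __repr__(self):
        # Built from the shallow response, so printing never triggers df.
        return "\n".join(f"{k}: {v}" for k, v in self._facets.items())


# Stand-in query functions that record when they are called.
calls = []


def fake_shallow(**facets):
    calls.append("shallow")
    return {"variable_id": ["pr", "tas"]}


def fake_deep(**facets):
    calls.append("deep")
    return [{"variable_id": "tas", "id": ["made-up-dataset-id"]}]
```

With this shape, repeated search() calls and printing the catalog only ever issue shallow queries; the deep query runs the first time df is touched and is reused until the next search().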

@nocollier nocollier added the enhancement New feature or request label Apr 30, 2024