Generic utility for querying existing BioThings APIs to retrieve canonical identifier (_id) used in specific APIs #198

Open
zcqian opened this issue Oct 22, 2021 · 0 comments

Contributor

zcqian commented Oct 22, 2021

When developing additional data source uploads (or data plugins), the identifiers used in the source may not match the identifier used to join documents together.

Previously, each uploader had to implement this functionality separately. For instance, MyChem mostly uses the datatransform module, which queries the MongoDB collections where other data is stored, while some MyDisease plugins queried MyDisease for the primary _id.

The downside of datatransform is that it performs a lot of queries, its exact behavior is not well documented, and it depends heavily on MongoDB (lookups through the BioThings APIs are not implemented in practice).

On the other hand, querying each service, whether through something bundled within BioThings SDK or done separately, introduces a chicken-and-egg problem: the API must be up before it can be queried. This either makes bootstrapping impossible or requires running the upload-build-release-install process at least twice to get the most up-to-date data, because each time the identifier is retrieved using data from a previous release.

Either way, until BioThings SDK is capable of building documents by joining on arbitrary fields (i.e. not limited to joining on _id), we should provide a well-documented standard interface for this type of lookup.
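
To make the idea concrete, here is a rough sketch of what such an interface could look like. It is not an existing BioThings SDK API: the class name `CanonicalIdLookup`, the helper `http_post_form`, the URL, the scope field `xrefs.example`, and the identifiers in the demo are all hypothetical. It assumes the target service accepts the usual BioThings batch query (a form-encoded POST with `q`, `scopes`, and `fields`) and returns a list of hits that each carry `query` and either `_id` or `notfound`. The transport is passed in as a function so the demo can run offline against a stub.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, Iterable, List, Optional

# Transport signature: (url, form fields) -> parsed JSON list of hits.
PostFn = Callable[[str, Dict[str, str]], List[dict]]


def http_post_form(url: str, form: Dict[str, str]) -> List[dict]:
    """POST form-encoded fields and decode the JSON response (stdlib only)."""
    data = urllib.parse.urlencode(form).encode("utf-8")
    request = urllib.request.Request(url, data=data)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


class CanonicalIdLookup:
    """Hypothetical helper: map source identifiers to a service's canonical _id."""

    def __init__(self, query_url: str, scopes: Iterable[str],
                 post: PostFn = http_post_form) -> None:
        self.query_url = query_url
        self.scopes = list(scopes)
        self.post = post
        self._cache: Dict[str, Optional[str]] = {}

    def lookup(self, source_ids: Iterable[str],
               batch_size: int = 500) -> Dict[str, Optional[str]]:
        """Return {source_id: canonical _id, or None if the service has no match}."""
        wanted = list(dict.fromkeys(source_ids))  # de-duplicate, keep order
        pending = [sid for sid in wanted if sid not in self._cache]
        for start in range(0, len(pending), batch_size):
            batch = pending[start:start + batch_size]
            # Assumption: identifiers contain no commas, since q is comma-joined.
            hits = self.post(self.query_url, {
                "q": ",".join(batch),
                "scopes": ",".join(self.scopes),
                "fields": "_id",
            })
            found: Dict[str, str] = {}
            for hit in hits:
                if not hit.get("notfound") and "_id" in hit:
                    # If a term matches several documents, keep the first hit.
                    found.setdefault(hit["query"], hit["_id"])
            for sid in batch:
                self._cache[sid] = found.get(sid)
        return {sid: self._cache[sid] for sid in wanted}


# Offline demo with a stub transport; the identifiers below are made up.
calls: List[Dict[str, str]] = []


def fake_post(url: str, form: Dict[str, str]) -> List[dict]:
    calls.append(form)
    table = {"SRC:1": "CANON:1"}
    return [
        {"query": term, "_id": table[term]} if term in table
        else {"query": term, "notfound": True}
        for term in form["q"].split(",")
    ]


lookup = CanonicalIdLookup("https://example.org/v1/query",
                           scopes=["xrefs.example"], post=fake_post)
result = lookup.lookup(["SRC:1", "UNKNOWN:9", "SRC:1"])
repeat = lookup.lookup(["SRC:1"])  # answered from the in-memory cache
```

This sketch only standardizes how a lookup is requested and what it returns. It does not solve the chicken-and-egg problem described above, since the default transport still needs a running API; keeping the transport injectable is one way to later back the same interface with a different source without changing the uploaders that call it.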
