When developing additional data source uploads (or data plugins), the identifiers used in the source may not match the identifier used to join documents together.
Previously, each uploader had to implement this functionality separately. For instance, MyChem mostly uses the datatransform module, which queries the MongoDB collections where other data is stored. Some MyDisease plugins queried MyDisease for the primary _id.
The downside of using datatransform is that it performs a lot of queries, its exact behavior is not well documented, and it depends heavily on MongoDB (using the BioThings APIs instead is not implemented in practice).
On the other hand, querying each service, whether through something bundled with BioThings SDK or done separately, introduces a chicken-and-egg problem: the API must be up before it can be queried. This makes bootstrapping impossible, or it may require running the upload-build-release-install process at least twice to get the most up-to-date data, because each time the identifier is retrieved using data from a previous release.
Either way, until BioThings SDK is capable of building documents by joining on arbitrary fields (i.e. not limited to joining on _id), we should provide a well-documented standard interface for this type of lookup.
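To make the proposal more concrete, here is a minimal sketch of what such a lookup interface could look like. Everything in it is hypothetical and for illustration only: `IdResolver`, `InMemoryResolver`, `rekey_documents`, and the sample identifiers are not part of BioThings SDK. The point is that uploaders would depend only on a small `resolve` contract, so the backend (MongoDB, a live API, a local mapping file) could be swapped without touching each plugin.

```python
from typing import Dict, Iterable, Iterator, Optional, Protocol, Tuple


class IdResolver(Protocol):
    """Hypothetical contract: map a source identifier to the primary _id."""

    def resolve(self, id_type: str, value: str) -> Optional[str]:
        ...


class InMemoryResolver:
    """Toy backend using a dict; a real one might query MongoDB or an API."""

    def __init__(self, table: Dict[Tuple[str, str], str]) -> None:
        self._table = table

    def resolve(self, id_type: str, value: str) -> Optional[str]:
        # Returns None when the identifier is unknown to this backend.
        return self._table.get((id_type, value))


def rekey_documents(
    docs: Iterable[dict], resolver: IdResolver, id_type: str, field: str
) -> Iterator[dict]:
    """Yield documents with _id replaced by the resolved primary identifier.

    Documents that cannot be resolved are passed through unchanged.
    """
    for doc in docs:
        value = doc.get(field)
        primary = resolver.resolve(id_type, value) if value is not None else None
        if primary is None:
            yield doc
        else:
            yield {**doc, "_id": primary}


# Made-up identifiers, purely for demonstration.
resolver = InMemoryResolver({("source_id", "SRC:1"): "PRIMARY:1"})
docs = [
    {"_id": "SRC:1", "source_id": "SRC:1"},
    {"_id": "SRC:2", "source_id": "SRC:2"},
]
rekeyed = list(rekey_documents(docs, resolver, "source_id", "source_id"))
```

In this sketch the first document is re-keyed to `PRIMARY:1`, while the second keeps its original `_id` because the resolver has no mapping for it. Whether unresolved documents should be kept, dropped, or logged is a design decision the real interface would need to document.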