-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Release/drs 1.5.0 #408
base: master
Are you sure you want to change the base?
Release/drs 1.5.0 #408
Conversation
DRS 1.4.0 adds bulk operations, this update includes a syntax fix in the schema
… made against master instead of develop
* adding stats endpoint * stats endpoint * move stats into status_info * adding the two new variables to the service-info response (objectCloud and totalObjectSize). --------- Co-authored-by: Brian O'Connor <[email protected]>
@@ -3,7 +3,6 @@ description: The object that contains the DRS object IDs array | |||
properties: | |||
bulk_object_ids: | |||
type: array | |||
description: DRS object IDs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor question: why was this description deleted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a duplicate "description" entry here which caused a silent build failure for our documentation (took a long time for me to find that!!)
All of Us +1 on this PR -- nice list of small incremental enhancements, with no downside |
Thanks @dglazer !! Can you edit your comment to indicate the driver project(s) you're voting for? Assuming All of Us but just wanted to make it clear as we tally the votes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Roche imCORE +1
description: >- | ||
Name of the cloud service provider that the object belongs to. | ||
If the cloud service is Amazon Web Services, Google Cloud Platform or Azure the values should be `aws`, `gcp`, or `azure` respectively. | ||
example: aws, gcp, or azure |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure how pedantic you want to be but this is more a repeat of the description and not a valid example. Maybe aws
to keep it consistent with your region example?
|
||
With DRS objects it may be necessary to attach additional metadata to your objects. We believe that a change to the API of DRS to include metadata is not in the spirit of the DRS spec and in general DRS should have no knowledge of the metadata associated with the objects. We have found that a good GA4GH standard to support this is Data Connect (https://github.com/ga4gh-discovery/data-connect). The general approach would be to have a Data Connect service on your platform and to include "tables" with the ID matching your DRS ID for the same object. This means that if you have metadata associated with an object id `abcd` (ex. additional information about Compound Objects) all you need to do is request the information from the Data Connect client at `/tables/abcd/info`. There are optional functionalities of Data Connect, such as querying of tables, but we do not explore them or give any recommendations here. | ||
|
||
Here is an example of using Data Connect with DRS in the fasp-scripts repository (https://github.com/ga4gh/fasp-scripts/blob/master/notebooks/drs/DRS%20File%20Data.ipynb). In this notebook we can see that data connect is used to get DRS IDs from a platform. Those DRS IDs are then used to gather aditional information about the file that might be necessary for analysis. This is just one example of how DRS and Data Connect can interact with each other to gather information about data on a platform. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use "Data Connect" here "... can see that data connect is used to ...".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Vote: -1
I'm part of the PCGL driver project. In its current state, I am voting against merging this version of the DRS spec, for the reasons outlined below.
@@ -6,6 +6,12 @@ properties: | |||
maxBulkRequestLength: | |||
type: integer | |||
description: The max length the bullk request endpoints can handle (>= 1) before generating a 413 error e.g. how long can the arrays bulk_object_ids and bulk_object_access_ids be for this server. | |||
objectCount: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the documentation, these stats show as top-level in the "response sample", vs in a "stats" object in the example
} | ||
... | ||
"stats": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this does not match what the response sample indicates
additionally, I would prefer some consistency with projects like refget and htsget, where the service-specific additions to the service-info response are nested in an object named after the service artifact
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey David, could you point me to the specifics for refget and htsget. I would love to understand how they are written so we could incorporate those into DRS
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing this out @davidlougheed , I think it's a good idea to be consistent here so Michael or I will take a look and try to address your feedback ASAP.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For refget, see http://samtools.github.io/hts-specs/refget.html - specifically GET /sequence/service-info
and the Example JSON request and response section below it. They put myriad service-specific service info properties in a refget
sub-object, which I like as a pattern since it makes it clear it's an extension of the service-info spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although I don't feel as strongly about how the stats are nested in the response as much as I feel that they should be nested, in one form or another.
This pull request compares the primary branch (master) with the release/drs-1.5.0 branch.
DRS 1.5.0 has several small but useful changes that set us up for more significant changes in 1.6 and beyond. These include:
Documentation for 1.5 can be found here and the schema can be examined here.
Please feel free to use this PR to comment on DRS 1.5.0 and, for GA4GH Driver Projects, we ask you to use this PR to vote on approval for DRS 1.5. The voting process is simple: add a comment with 1) your driver project name, 2) +1 (approved), 0 (neutral), or -1 (concerns blocking approval), and 3) any comments (required if you vote "-1", the cloud lead and DRS champions will reach out to discuss).
We would like to have voting by driver projects done by Oct 1st, 2024 with a followup discussion and merge (if no "-1" votes are received) by the following Cloud WS meeting.