the entirety of datasets through identifiers.org #15

Open
yarikoptic opened this issue Jan 10, 2018 · 1 comment

yarikoptic commented Jan 10, 2018

Cons

Analysis/possible difficulties

  • I do not yet see how to discover individual IDs/datasets for a particular prefix. I sent a question via their web interface; the answer was: not at the moment, but it sounded like an interesting feature to them, so it might come at some point.
  • Not all prefixes relate to "datasets"; some are known as "(data) collections": https://www.ebi.ac.uk/miriam/main/collections/
  • I do not think there is any versioning; most probably it is assumed that an identifier points to an immutable dataset.
  • There will be a lot of datasets, so we would need some sensible structure/hierarchy. The first level would be the identifier. Then we could partition even further by splitting IDs on / and -.
  • There seems to be no "filename" information provided, so we would have these choices:
    • follow the default git-annex behavior: just use the entire URL to compose a unique filename
    • take one from the URL (often from the Content-Disposition header field), but that might lead to conflicts since we would allow only a flat structure:
      • we could pre-analyze the entire list of those first and see if conflicts arise. If there are conflicts, try to somehow deduce a disambiguating structure. But that is unreliable in case a dataset record changes, e.g. gains more files
      • just add a numeric index in addition, either arbitrary or based on some metadata?
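
To make the last two points concrete, here is a rough Python sketch, not a proposed implementation. It assumes each dataset is addressed by a prefix plus an ID, and that we already have the list of file URLs for a record. The function names (`dataset_path`, `filename_from_url`, `assign_filenames`), the `"file"` fallback name, and the index-as-prefix scheme are all made up for illustration. It only looks at the URL path; reading a Content-Disposition header would need an actual request and is left out.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import unquote, urlsplit


def dataset_path(prefix, identifier):
    """Build a nested path: the prefix, then the ID split on '/' and '-'."""
    parts = [p for p in re.split(r"[/-]", identifier) if p]
    return PurePosixPath(prefix, *parts)


def filename_from_url(url):
    """Return the last component of the URL path, or None if it is empty."""
    path = unquote(urlsplit(url).path)
    return path.rsplit("/", 1)[-1] or None


def assign_filenames(urls):
    """Map each URL to a filename that is unique within one flat directory.

    The first URL to claim a name keeps it; later URLs with the same name
    get a numeric index prepended (an arbitrary choice for this sketch).
    """
    used = set()
    assigned = {}
    for url in urls:
        if url in assigned:  # the same URL listed twice needs only one name
            continue
        base = filename_from_url(url) or "file"  # hypothetical fallback name
        name, index = base, 0
        while name in used:
            index += 1
            name = f"{index}_{base}"
        used.add(name)
        assigned[url] = name
    return assigned


if __name__ == "__main__":
    # The IDs and URLs below are invented purely for the demonstration.
    print(dataset_path("neurovault.collection", "12-34/5"))
    for url, name in assign_filenames([
        "http://a.example/x/data.csv",
        "http://b.example/y/data.csv",
    ]).items():
        print(url, "->", name)
```

Note that the index depends on the order of the URLs, so this shares the weakness mentioned above: if a record later gains or reorders files, the assigned names may change.
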
@chrisgorgo

BTW NeuroVault uses identifiers.org: http://identifiers.org/neurovault.collection and http://identifiers.org/neurovault.image. Happy to answer any questions I can.
