Disambiguate Authors: format is not specified, and not enriched with ORCID #82

Open
yarikoptic opened this issue Sep 23, 2024 · 10 comments
Labels
consistency Aspect requiring special treatment/logic outside of generic common principles linked-data

Comments

@yarikoptic
Contributor

ATM the Authors field is just a list of free-form strings. That makes it hard to interlink BIDS datasets across authors/contributors, as expressed e.g. by @lzehl for their pipeline for inclusion of BIDS datasets within the openMINDS KG. ATM in
https://github.com/bids-standard/bids-specification/blob/HEAD/src/schema/objects/metadata.yaml#L207
we have

Authors:
  name: Authors
  display_name: Authors
  description: |
    List of individuals who contributed to the creation/curation of the dataset.
  type: array
  items:
    type: string

I feel that we had better start defining some notion of a Person which would have a clear mapping to schema.org's Person record. Critically:

  • separate first (given) from last (family) name

Ideally, provide additional relevant metadata per person:

  • ORCID
  • Affiliation(s), also in a standardized form
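
To make the proposal above concrete, here is a minimal sketch of what such a Person record could look like and how it might map to a schema.org Person. The `Person` class, its field names, and the `to_schema_org` method are hypothetical and not part of the BIDS specification; only the schema.org property names (`givenName`, `familyName`, `identifier`, `affiliation`) come from schema.org. The ORCID below is a placeholder.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Person:
    """Hypothetical structured author record (an assumption, not BIDS)."""

    given_name: str
    family_name: str
    orcid: Optional[str] = None  # bare identifier, without the URL prefix
    affiliations: List[str] = field(default_factory=list)

    def to_schema_org(self) -> dict:
        """Render the record as a schema.org Person (JSON-LD style dict)."""
        record = {
            "@type": "Person",
            "givenName": self.given_name,
            "familyName": self.family_name,
        }
        if self.orcid:
            # ORCID iDs are conventionally written as https://orcid.org/ URIs.
            record["identifier"] = f"https://orcid.org/{self.orcid}"
        if self.affiliations:
            record["affiliation"] = [
                {"@type": "Organization", "name": name}
                for name in self.affiliations
            ]
        return record


# Placeholder person and ORCID, for illustration only.
person = Person("Jane", "Doe", orcid="0000-0000-0000-0000",
                affiliations=["Example University"])
print(person.to_schema_org())
```

Keeping given and family names as separate fields is what makes the mapping unambiguous; ORCID and affiliations stay optional so that the critical part can be adopted first.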

If we solve the critical issue, we could then rely on CITATION.cff to provide the necessary, more structured data (ORCID, affiliations), but we need to keep in mind some of the already identified shortcomings (which might later be worked out in CITATION.cff):

and unfortunately CITATION.cff has not really seen much "development" in the past year(s).

@yarikoptic yarikoptic added the consistency Aspect requiring special treatment/logic outside of generic common principles label Sep 23, 2024
@yarikoptic yarikoptic changed the title Disambiguate Authors Disambiguate Authors: format is not specified, and not enriched with ORCID Sep 23, 2024
@yarikoptic yarikoptic moved this to Todo in BIDS 2.0 Sep 24, 2024
@kabilar

kabilar commented Sep 25, 2024

Thanks Yarik. I like the idea of disambiguating the authors list in the dataset_description.json, and using the CITATION.cff for ORCID, affiliations, etc. A couple of additional suggestions are below.

  1. If we are using both the Authors key in the dataset_description.json file and the CITATION.cff file, perhaps we will need to change the current docs:

    Requirement level for Authors key in dataset_description.json:

    RECOMMENDED if CITATION.cff is not present

    CITATION.cff

    For most redundant fields between CITATION.cff and dataset_description.json, the CITATION.cff SHOULD take precedence. To avoid inconsistency, metadata present in CITATION.cff SHOULD NOT be included in dataset_description.json, with the exception of Name and DatasetDOI, to ensure that CITATION.cff-unaware tools can generate references to the dataset. In particular, if CITATION.cff is present, the "Authors" field of dataset_description.json MUST be omitted, and the "HowToAcknowledge", "License" and "ReferencesAndLinks" SHOULD be omitted in favor of the CITATION.cff fields message/preferred-citation, license and references.

  2. Perhaps BIDS validators should test for inconsistencies between the authors listed in the dataset_description.json and the CITATION.cff.
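
The consistency check suggested in point 2 could be sketched roughly as follows. This is not how the BIDS validator works; the function names are hypothetical, both files are assumed to be already parsed into dictionaries, and the `Authors` strings are assumed to be in "Given Family" order. The `authors`, `given-names`, `family-names` and `name` keys are the ones used by the Citation File Format.

```python
from typing import Dict, List, Set


def normalize(name: str) -> str:
    """Lower-case and collapse whitespace so trivial differences are ignored."""
    return " ".join(name.lower().split())


def cff_author_names(cff: dict) -> Set[str]:
    """Collect normalized "given family" names from a parsed CITATION.cff."""
    names = set()
    for author in cff.get("authors", []):
        if "name" in author:
            # CFF entities (e.g. organizations) carry a single "name" key.
            names.add(normalize(author["name"]))
        else:
            given = author.get("given-names", "")
            family = author.get("family-names", "")
            names.add(normalize(f"{given} {family}"))
    return names


def author_mismatches(dataset_description: dict, cff: dict) -> Dict[str, List[str]]:
    """Report authors that appear in only one of the two files."""
    described = {normalize(a) for a in dataset_description.get("Authors", [])}
    cited = cff_author_names(cff)
    return {
        "only_in_dataset_description": sorted(described - cited),
        "only_in_citation_cff": sorted(cited - described),
    }


# Made-up example data.
description = {"Authors": ["Jane Doe", "John Smith"]}
citation = {"authors": [{"given-names": "Jane", "family-names": "Doe"}]}
print(author_mismatches(description, citation))
```

A comparison like this is only as reliable as the author strings themselves, which is one more argument for fixing their format first.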

@Remi-Gau

Since BIDS datasets can already include a CITATION.cff to curate author information in a more structured manner, I am not sure what this issue is meant to bring.

@Remi-Gau

If anything, for BIDS 2 I would remove authors from the dataset description altogether and force them into CITATION.cff.

@kabilar

kabilar commented Sep 25, 2024

> If anything, for BIDS 2 I would remove authors from the dataset description altogether and force them into CITATION.cff.

Thanks @Remi-Gau. For the BIDS 1.x to BIDS 2.0 migration script, would we then parse the Authors key from the dataset_description.json, generate a CITATION.cff file if it doesn't already exist, and add the authors to the CITATION.cff?
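
As a rough illustration of the migration step asked about above, the sketch below builds a minimal CITATION.cff text from a parsed dataset_description.json. It is not the actual migration script and is unrelated to any existing tool; `split_author` and `citation_cff_from_description` are hypothetical names, the name splitting is a heuristic, and the `message` text is a placeholder. Strings are written with `json.dumps`, relying on JSON strings being valid YAML double-quoted scalars.

```python
import json


def split_author(author: str) -> dict:
    """Guess family/given names from a free-form author string (heuristic)."""
    if "," in author:
        # Assume "Family, Given".
        family, given = (part.strip() for part in author.split(",", 1))
    else:
        # Assume "Given [Middle] Family": the last token is the family name.
        parts = author.split()
        given = " ".join(parts[:-1])
        family = parts[-1] if parts else ""
    entry = {"family-names": family}
    if given:
        entry["given-names"] = given
    return entry


def citation_cff_from_description(description: dict) -> str:
    """Build a minimal CITATION.cff text from dataset_description.json content."""
    authors = description.get("Authors", [])
    if not authors:
        raise ValueError("CITATION.cff needs at least one author")
    lines = [
        "cff-version: 1.2.0",
        "message: " + json.dumps("If you use this dataset, please cite it."),
        "title: " + json.dumps(description["Name"]),
        "type: dataset",
        "authors:",
    ]
    for author in authors:
        first = True
        for key, value in split_author(author).items():
            prefix = "  - " if first else "    "
            lines.append(f"{prefix}{key}: {json.dumps(value)}")
            first = False
    return "\n".join(lines) + "\n"


# Made-up dataset description.
print(citation_cff_from_description(
    {"Name": "My dataset", "Authors": ["Doe, Jane", "John Smith"]}
))
```

The heuristic in `split_author` is exactly where free-form strings cause trouble (multi-word family names, suffixes), so a real migration would likely need manual review of the generated file.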

@Remi-Gau

I started working on a CLI tool to help with this a long time ago, but I was waiting for the BIDS validator to validate CITATION.cff files before getting back to it.

https://github.com/Remi-Gau/bids2cite

So I should probably start dusting it off.

@lzehl

lzehl commented Sep 25, 2024

Hi. To put this into context from our side: any clear recommendation from BIDS on how to write an author string within the specification would already lower the variability across BIDS datasets. Of course, long-term we would definitely prefer a structured registration of authors through e.g. the CFF.

@kabilar

kabilar commented Sep 26, 2024

> I started working on a CLI tool to help with this a long time ago, but I was waiting for the BIDS validator to validate CITATION.cff files before getting back to it.
>
> https://github.com/Remi-Gau/bids2cite
>
> So I should probably start dusting it off.

This is great. Thank you @Remi-Gau.

@kabilar

kabilar commented Sep 26, 2024

> Hi. To put this into context from our side: any clear recommendation from BIDS on how to write an author string within the specification would already lower the variability across BIDS datasets. Of course, long-term we would definitely prefer a structured registration of authors through e.g. the CFF.

Thank you @lzehl. I definitely agree.

@yarikoptic
Contributor Author

By "any", do you think "Last, First" would be sufficient, @lzehl?

@lzehl

lzehl commented Oct 7, 2024

Yes. That would already harmonize things a bit.
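
For illustration, a "Last, First" convention like the one discussed above could be checked with something as small as the sketch below. The function names are hypothetical, and the rule used here (exactly one comma, non-empty text on both sides) is an assumption rather than anything the thread or the specification defines; it would, for instance, reject suffixes written as "Doe, Jane, Jr.".

```python
from typing import List, Tuple


def parse_last_first(author: str) -> Tuple[str, str]:
    """Split a "Last, First" string into (family, given) or raise ValueError."""
    family, sep, given = author.partition(",")
    family, given = family.strip(), given.strip()
    # Require exactly one comma with non-empty text on both sides.
    if not sep or not family or not given or "," in given:
        raise ValueError(f'expected "Last, First", got {author!r}')
    return family, given


def nonconforming_authors(authors: List[str]) -> List[str]:
    """Return the author strings that do not follow "Last, First"."""
    bad = []
    for author in authors:
        try:
            parse_last_first(author)
        except ValueError:
            bad.append(author)
    return bad


# Made-up author strings.
print(parse_last_first("Doe, Jane"))
print(nonconforming_authors(["Doe, Jane", "Jane Doe", "Doe,"]))
```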

Projects
Status: Todo

4 participants