How to accommodate programmatic metadata alternations? #2

Open
dhimmel opened this issue Jan 29, 2020 · 4 comments

Comments

@dhimmel
Collaborator

dhimmel commented Jan 29, 2020

expanding manubot/manubot#187 (comment) into an issue

In certain cases, it makes sense for users to enter only a subset of the final metadata that is needed by Pandoc filters and templates, and have a program auto-complete metadata.

For example, the following approaches would be convenient for users and help avoid error-prone data duplication:

  1. assume author is a key-value object. If author.orcid is set, auto-complete missing author fields that can be retrieved from the ORCID API like author.name, author.email, author.affiliations.

  2. assume author affiliations are described via an alphanumeric key or even inline. Add an affiliations object with numbered affiliations for use in front matter.

  3. assume license is a key-value object. If license.spdx is set, detect license details from the SPDX API, such as name, URL, full text.

  4. add metadata for which the user does not explicitly provide any seed value at all. For example, the commit hash of HEAD if executed within a git repository.
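As a rough illustration of approach 2, here is a minimal sketch of numbering affiliations. The field names (`affiliations` on each author, `affiliation_numbers`, and the top-level `affiliations` list) are assumptions made for the example, not an agreed schema.

```python
def number_affiliations(metadata):
    """Return a copy of `metadata` with a numbered top-level `affiliations` list.

    Assumes (hypothetically) that each author lists affiliations as strings
    under an `affiliations` key. Numbers are assigned in first-seen order.
    """
    numbered = {}  # affiliation string -> 1-based number
    authors = []
    for author in metadata.get("author", []):
        author = dict(author)  # shallow copy so the input is not mutated
        author["affiliation_numbers"] = [
            numbered.setdefault(key, len(numbered) + 1)
            for key in author.get("affiliations", [])
        ]
        authors.append(author)
    affiliations = [{"number": n, "name": key} for key, n in numbered.items()]
    return {**metadata, "author": authors, "affiliations": affiliations}


example = {
    "author": [
        {"name": "A", "affiliations": ["Univ X", "Inst Y"]},
        {"name": "B", "affiliations": ["Inst Y"]},
    ]
}
result = number_affiliations(example)
```

Here both authors end up pointing at the same number for the shared affiliation, so a template can print the affiliation once and reference it by number.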

Do we need to make our schema aware of auto-completion / auto-population? Do we need multiple schemas, like a user-schema that describes what the user should input rather than the final output-schema? Should output-schema be a superset of input-schema, such that auto-complete/populate only fills in additional values but does not delete any existing values?
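To make the superset idea concrete, here is a minimal sketch of a merge that only fills in missing values and never deletes or overwrites what the user entered. The function name `fill_missing` and the sample values are assumptions for illustration only.

```python
def fill_missing(user, auto):
    """Merge auto-detected values into user metadata without overwriting.

    Keys present in `user` always win; nested objects are merged recursively.
    """
    result = dict(user)
    for key, value in auto.items():
        if key not in result:
            result[key] = value
        elif isinstance(result[key], dict) and isinstance(value, dict):
            result[key] = fill_missing(result[key], value)
    return result


user = {"license": {"spdx": "CC-BY-4.0", "name": "My preferred name"}}
auto = {
    "license": {
        "name": "Creative Commons Attribution 4.0 International",
        "url": "https://creativecommons.org/licenses/by/4.0/",
    },
    "commit": "abc123",  # placeholder value, not a real hash
}
merged = fill_missing(user, auto)
```

With this rule, every key the user provided survives unchanged in the output, which is what "output is a superset of input" would require.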

@tarleb any general thoughts?

@tarleb
Member

tarleb commented Jan 29, 2020

> Do we need to make our schema aware of auto-completion / auto-population? Do we need multiple schemas, like a user-schema that describes what the user should input rather than the final output-schema? Should output-schema be a superset of input-schema, such that auto-complete/populate only fills in additional values but does not delete any existing values?

Two schemas seem like a good idea. I would prefer the output schema to be mostly independent of the input schema, which should give us more flexibility. Automatically populated fields could be marked as optional (or rather: not be marked as required), and I would like to see them included in the schema.
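A minimal sketch of what that could look like, written as JSON Schema style dictionaries in Python. All property names are hypothetical, and the checker below only looks at the `required` keyword; it is not a full JSON Schema validator.

```python
# Hypothetical input schema: what the user is expected to write.
INPUT_SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "license": {
            "type": "object",
            "properties": {"spdx": {"type": "string"}},
            "required": ["spdx"],
        },
    },
    "required": ["title"],
}

# Hypothetical output schema: auto-populated fields (license.name,
# license.url, commit) are described but not listed under "required".
OUTPUT_SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "license": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "url": {"type": "string"},
            },
        },
        "commit": {"type": "string"},
    },
    "required": ["title"],
}


def missing_required(instance, schema, path=""):
    """List dotted paths of required keys that are absent from `instance`."""
    missing = [path + key for key in schema.get("required", []) if key not in instance]
    for key, sub in schema.get("properties", {}).items():
        if isinstance(instance.get(key), dict):
            missing.extend(missing_required(instance[key], sub, path + key + "."))
    return missing
```

Metadata that has not been auto-completed still passes the output schema here, because the populated fields are documented but optional.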

Would it make sense to develop the auto-population scripts here as well? Pandoc's Lua is currently lacking appropriate support to deal with web APIs (unless we get to do this GSoC project). Maybe Python?

@jcolomb
Contributor

jcolomb commented Jan 29, 2020

I think it makes little sense, because unfortunately very few things can be completely automated. And when they can, users will want to proofread the results most of the time.
As an example, getting the affiliation from ORCID is very difficult or impossible: you will get multiple affiliations per user, and the right one might be missing. The right one is also not necessarily the latest one, because authors should indicate the affiliation they had when they did the work (which can be years before the manuscript is written).
etc.

So I would just work on the output-schema one wants, and if some tools can autocomplete stuff, it needs to be done before pandoc takes action (it can be done via Python/R or other tools, but probably needs interaction with the user).

@jcolomb
Contributor

jcolomb commented Jan 29, 2020

This might be done via a bot like whedon at JOSS. When asked, it would:

  • make a new branch
  • autocomplete the metadata from entries already given (get all information available)
  • commit the change on the new branch.

The user would then be asked to delete wrong/outdated information before merging.
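A rough sketch of those three steps, assuming the metadata lives in a JSON file (chosen only to stay within the Python standard library) and that `autocomplete` is some user-supplied function. The function name, branch name, and commit message are all hypothetical.

```python
import json
import subprocess
from pathlib import Path


def propose_autocompletion(repo, metadata_path, autocomplete,
                           branch="autocomplete-metadata", run=subprocess.run):
    """Create a branch, rewrite the metadata file, and commit the result.

    `autocomplete` takes the parsed metadata and returns the completed version.
    `run` is injectable so the git calls can be replaced in tests.
    """
    def git(*args):
        run(["git", *args], cwd=repo, check=True)

    git("checkout", "-b", branch)  # 1. make a new branch
    path = Path(repo) / metadata_path
    metadata = json.loads(path.read_text())
    completed = autocomplete(metadata)  # 2. autocomplete from existing entries
    path.write_text(json.dumps(completed, indent=2) + "\n")
    git("add", metadata_path)
    git("commit", "-m", "Auto-complete metadata for review")  # 3. commit
```

Nothing is merged automatically: the commit sits on its own branch so the author can review and prune it first.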

@dhimmel
Collaborator Author

dhimmel commented Jan 31, 2020

> So I would just work on the output-schema one wants, and if some tools can autocomplete stuff, it needs to be done before pandoc

Okay, let's focus for now on the schema for metadata provided to pandoc and not pre-processors. And keep this topic in the back of our minds.

> Would it make sense to develop the auto-population scripts here as well?

I think this would expand the scope of this project too much at the moment. And the solutions won't be universal since different users will have different computational constraints. That being said, perhaps eventually we could create an official set of Python / Haskell / Lua auto-completion scripts.

With Manubot, we set a considerable amount of metadata automatically (example) in Python. I think there is a lot of opportunity to split out some of the more general-purpose auto-completion, but first we should create the schema.

Pandoc does some additional metadata tweaks during runtime, which further complicates things a bit... like if the --bibliography option is supplied.
