Store custom metadata separately #65

Open
sciepsilon opened this issue Nov 7, 2020 · 0 comments
Labels
enhancement hours or days Tasks that will take more than an hour, but less than 20 hours

Comments

@sciepsilon
Member

LingView stores metadata about each text. This metadata comes from three places: direct metadata from the original ELAN or FLEx file (e.g. title and tiers), "autometadata" from LingView's automated processes (e.g. the uploaded date), and custom metadata, which is any info the LingView user chooses to set when they run the edit.js script (e.g. description, genre, a better title). If you're curious, see the wiki for a full description of metadata. Metadata that comes directly from the ELAN or FLEx file can be safely forgotten and recreated each time LingView rebuilds, but LingView needs to remember the other two kinds or else they'll be lost.

The current setup: Metadata is stored in index.json and in the individual text's json files; each one contains a complete copy of the metadata. Whenever LingView rebuilds the site, index.json is updated, never destroyed, in order to preserve the metadata. The individual text json files are recreated using the metadata from the index. LingView uses the index's copy of the metadata when showing the index of texts and the search page, and it uses an individual text's file when displaying a text.

The current setup is brittle; it's easy for metadata to be lost. Currently, deleting a FLEx or ELAN file and then rebuilding the site will cause its metadata to be deleted. You can bring back the FLEx or ELAN file and rebuild again, but its old uploaded date and anything that was set using the edit.js script are gone forever. Metadata can also be lost by making large changes to a file: when a text is exported from ELAN to FLEx or vice versa, its URI identifier (sometimes? always?) changes, so LingView decides these are separate texts and it won't carry over the old metadata to the new file. On the other hand, removing metadata from the ELAN or FLEx file and then rebuilding isn't a reliable way to get rid of that metadata, even metadata that originally came from the ELAN or FLEx file.

Sometimes LingView users do want to clear out the old metadata and start fresh, but this should only be an option, not something that happens by surprise.

The plan:

  • Just like before, the index.json and individual text's json files should each contain a copy of the metadata.
  • Additionally, the "metadata to be remembered" - that is, autometadata and custom metadata - should each be stored in some other place. Maybe we should actually have two separate places, one for autometadata and one for edit.js metadata. There should be a separate file for each text, and they should be named based on the URI identifier.
  • When rebuilding the site, the old index.json should be ignored, and the "metadata to be remembered" files should be used in combination with the FLEx or ELAN file to build the full set of metadata. Just like before, we store the full set of metadata in index.json and we also store a copy of it in the individual text's json file.
  • When we rebuild after a FLEx or ELAN file has been deleted, we should remove its individual json file and remove it from the index, but we should not remove its "metadata to be remembered" files. Like the media files, these should only be deleted if a human manually deletes them.
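
A minimal sketch of how the plan above might look in code, to make the rebuild rules concrete. Everything here is an assumption for illustration, not LingView's actual code: the function names (`rememberedFileName`, `buildMetadata`, `rebuildIndex`), the `uploaded` field name, the `auto`/`custom` file suffixes, and the precedence order in which custom metadata overrides autometadata, which in turn overrides metadata from the source file. The "files" are modelled as a plain in-memory object so the logic can be read without any disk access.

```javascript
// Hypothetical sketch; none of these names come from LingView's codebase.

// Turn a text's URI identifier into a file name for one of its
// "metadata to be remembered" files. `kind` is "auto" or "custom"
// (assumed labels for the two separate stores).
function rememberedFileName(uri, kind) {
  return encodeURIComponent(uri) + "." + kind + ".json";
}

// Build the full metadata for one text. Later sources win
// (assumed precedence): source file < autometadata < custom metadata.
function buildMetadata(sourceMeta, autoMeta, customMeta) {
  return { ...sourceMeta, ...autoMeta, ...customMeta };
}

// Rebuild the index from scratch, ignoring any old index.
//   sources:    URI -> metadata read from the ELAN/FLEx files present now
//   remembered: file name -> stored object (stands in for files on disk)
//   now:        value to record as the uploaded date for newly seen texts
function rebuildIndex(sources, remembered, now) {
  const index = {};
  for (const [uri, sourceMeta] of Object.entries(sources)) {
    const autoName = rememberedFileName(uri, "auto");
    const customName = rememberedFileName(uri, "custom");
    // Autometadata is created only the first time a text is seen.
    if (!(autoName in remembered)) {
      remembered[autoName] = { uploaded: now };
    }
    index[uri] = buildMetadata(
      sourceMeta,
      remembered[autoName],
      remembered[customName] || {}
    );
  }
  // Texts missing from `sources` simply drop out of the index; their
  // entries in `remembered` are deliberately left untouched.
  return index;
}
```

Under these assumptions, deleting a source file and rebuilding removes the text from the index, and restoring the file and rebuilding again brings back its original uploaded date and any custom fields, because `remembered` is never pruned automatically. Clearing out old metadata then becomes an explicit act: a human deletes the corresponding remembered file.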
@sciepsilon sciepsilon added hours or days Tasks that will take more than an hour, but less than 20 hours enhancement labels Nov 7, 2020