Metadata for publicly hosted fibsem datasets

Motivation

We store large electron microscopy datasets on cloud storage and offer a website, OpenOrganelle (openorganelle.janelia.org), for browsing those datasets. That site uses this GitHub repository to discover which datasets exist, what their metadata is, which volumes each dataset contains, and so on.

Metadata model

At a high level, this repository represents a collection of independent datasets, where each dataset contains the following elements:

a thumbnail image
metadata describing the entire dataset
metadata describing the views associated with the dataset
a collection of metadata describing each source associated with the dataset

This metadata is stored in the metadata folder. Here is how this specification is implemented for a single dataset:

metadata/jrc_fly-fsb-1
├── metadata.json
├── sources
│   └── fibsem-uint16.json
├── thumbnail.jpg
└── views.json
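
As a rough illustration of the layout above, the following sketch reads one dataset folder into a plain dictionary. The function name load_dataset_metadata and the keys of the returned dictionary are assumptions made for this example; they are not part of this repository's code, and the sketch does not apply the repository's own validation.

```python
import json
from pathlib import Path


def load_dataset_metadata(dataset_dir: Path) -> dict:
    """Read the files of one dataset folder into a plain dict (hypothetical helper)."""
    dataset_dir = Path(dataset_dir)
    sources_dir = dataset_dir / "sources"
    return {
        # The folder name doubles as the dataset name, e.g. "jrc_fly-fsb-1".
        "name": dataset_dir.name,
        "metadata": json.loads((dataset_dir / "metadata.json").read_text()),
        "views": json.loads((dataset_dir / "views.json").read_text()),
        # One entry per file in sources/, keyed by file stem (e.g. "fibsem-uint16").
        "sources": {
            path.stem: json.loads(path.read_text())
            for path in sorted(sources_dir.glob("*.json"))
        },
        # The thumbnail is binary, so only record whether it is present.
        "has_thumbnail": (dataset_dir / "thumbnail.jpg").exists(),
    }
```

Calling load_dataset_metadata(Path("metadata/jrc_fly-fsb-1")) from the repository root would return the contents of that dataset's JSON files without interpreting them.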

The structures of the different JSON files are defined by Python classes in this repository's source package, src/fibsem_metadata.

Metadata API

Consuming applications (like OpenOrganelle) benefit from minimizing the number of I/O requests needed to discover datasets, so ideally all the metadata for a single dataset should be consolidated into a single file. However, editing metadata is much simpler when it is distributed across multiple logically separable files. This repository therefore contains two representations of the same information: in addition to the write-optimized metadata folder described above, there is a read-optimized api folder that contains consolidated metadata for each dataset, as well as additional metadata describing the set of all datasets. For example:

api/jrc_fly-fsb-1
├── manifest.json
└── thumbnail.jpg

OpenOrganelle accesses these files directly from GitHub for each dataset.

Because the api directory is derived entirely from the contents of the metadata directory, you should not edit the contents of api directly. Instead, the api directory is generated programmatically by the generate_endpoints.py script, which is run by a GitHub Actions workflow triggered on each commit.
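
To make the relationship between the two folders concrete, here is a minimal sketch of a consolidation step. It is not the generate_endpoints.py script: the function name build_api and the layout of the generated manifest.json (the keys "name", "metadata", "views", and "sources") are assumptions chosen for illustration, and the real manifest schema may differ.

```python
import json
import shutil
from pathlib import Path


def build_api(metadata_root: Path, api_root: Path) -> list[str]:
    """Write one consolidated manifest.json per dataset and return the dataset names.

    Hypothetical sketch; the manifest layout below is assumed, not the repository's schema.
    """
    names = []
    dataset_dirs = sorted(p for p in Path(metadata_root).iterdir() if p.is_dir())
    for dataset_dir in dataset_dirs:
        # Merge the separately edited files into a single document.
        manifest = {
            "name": dataset_dir.name,
            "metadata": json.loads((dataset_dir / "metadata.json").read_text()),
            "views": json.loads((dataset_dir / "views.json").read_text()),
            "sources": {
                path.stem: json.loads(path.read_text())
                for path in sorted((dataset_dir / "sources").glob("*.json"))
            },
        }
        out_dir = Path(api_root) / dataset_dir.name
        out_dir.mkdir(parents=True, exist_ok=True)
        (out_dir / "manifest.json").write_text(json.dumps(manifest, indent=2))
        # Copy the thumbnail unchanged so the api folder is self-contained.
        thumbnail = dataset_dir / "thumbnail.jpg"
        if thumbnail.exists():
            shutil.copyfile(thumbnail, out_dir / "thumbnail.jpg")
        names.append(dataset_dir.name)
    return names
```

With this arrangement a consumer needs one request per dataset for manifest.json, while editors keep working on the smaller files under metadata.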

Adding metadata

To modify, add, or remove metadata, clone this repository, make your changes, and submit a pull request. Although the models are defined as Python data structures and Python is used for data validation, the metadata itself is all JSON, so no coding is needed for simple metadata updates.
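
Hand-edited JSON is easy to break with a stray comma, so an optional syntax check before opening a pull request can save a round trip. The sketch below only checks that each file parses as JSON; it does not run the repository's Python-based validation, and the function name find_invalid_json is an assumption made for this example.

```python
import json
from pathlib import Path


def find_invalid_json(root: Path) -> list[tuple[Path, str]]:
    """Return (path, error message) for every .json file under root that fails to parse."""
    problems = []
    for path in sorted(Path(root).rglob("*.json")):
        try:
            json.loads(path.read_text())
        except json.JSONDecodeError as err:
            # The message includes the line and column of the syntax error.
            problems.append((path, str(err)))
    return problems
```

Running find_invalid_json(Path("metadata")) from the repository root returns an empty list when every metadata file is syntactically valid JSON.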

Local development

To develop locally, clone this repository, install the Poetry package manager in your Python environment, and run poetry install in the root of the cloned directory to install the dependencies for this project.
