We store large electron microscopy datasets on cloud storage and offer a website, OpenOrganelle (openorganelle.janelia.org), for browsing those datasets. That site uses this GitHub repository to discover which datasets exist, what their metadata is, which volumes are contained within each dataset, and so on.
At a high level, this repository represents a collection of independent datasets, where each dataset contains the following elements:
- a thumbnail image
- metadata describing the entire dataset
- metadata describing the views associated with the dataset
- a collection of metadata describing each source associated with the dataset
This metadata is stored in the metadata folder. Here is how this specification is implemented for a single dataset:
metadata/jrc_fly-fsb-1
├── metadata.json
├── sources
│ └── fibsem-uint16.json
├── thumbnail.jpg
└── views.json
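As a rough illustration of how a consumer might read this layout, the sketch below loads the files of one dataset folder into a single dictionary. The function name load_dataset_metadata and the keys of the returned dictionary are assumptions made for this example; only the file and folder names come from the tree above.

```python
import json
from pathlib import Path


def load_dataset_metadata(dataset_dir):
    """Read the per-dataset JSON files into one dictionary (hypothetical helper)."""
    root = Path(dataset_dir)
    sources = {}
    # Each file under sources/ describes one source; key it by its file stem.
    for path in sorted((root / "sources").glob("*.json")):
        sources[path.stem] = json.loads(path.read_text())
    return {
        "name": root.name,
        "metadata": json.loads((root / "metadata.json").read_text()),
        "views": json.loads((root / "views.json").read_text()),
        "sources": sources,
        # The thumbnail is an image, so only its path is recorded here.
        "thumbnail": root / "thumbnail.jpg",
    }
```

Calling load_dataset_metadata("metadata/jrc_fly-fsb-1") from the repository root would then return the parsed contents of that dataset's JSON files.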
The structures of the different JSON files are defined by Python classes.
Consuming applications (like OpenOrganelle) benefit from minimizing the number of I/O requests needed to discover datasets. Ideally, this means that all the metadata for a single dataset should be consolidated into a single file. However, editing metadata is much simpler when it is distributed across multiple logically separable files. Thus, this repository contains two representations of the same information: in addition to the write-optimized metadata folder described above, there is a read-optimized api folder that contains consolidated metadata for each dataset, as well as additional metadata describing the set of all datasets, e.g.:
api/jrc_fly-fsb-1
├── manifest.json
└── thumbnail.jpg
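The consolidation step can be pictured with a minimal sketch like the one below. It is not the repository's actual generation script: the function name build_manifest and the top-level keys of the manifest are hypothetical, chosen only to show how several small files might be merged into one manifest.json.

```python
import json
import shutil
from pathlib import Path


def build_manifest(metadata_dir, api_dir):
    """Merge one dataset's metadata files into api_dir/manifest.json (illustrative only)."""
    src = Path(metadata_dir)
    dst = Path(api_dir)
    dst.mkdir(parents=True, exist_ok=True)
    manifest = {
        # These key names are assumptions, not the repository's real schema.
        "metadata": json.loads((src / "metadata.json").read_text()),
        "views": json.loads((src / "views.json").read_text()),
        "sources": {
            p.stem: json.loads(p.read_text())
            for p in sorted((src / "sources").glob("*.json"))
        },
    }
    (dst / "manifest.json").write_text(json.dumps(manifest, indent=2))
    # Copy the thumbnail alongside the manifest when one is present.
    thumbnail = src / "thumbnail.jpg"
    if thumbnail.exists():
        shutil.copyfile(thumbnail, dst / "thumbnail.jpg")
    return manifest
```

With a layout like this, a consumer needs one request for manifest.json instead of one request per metadata file.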
OpenOrganelle accesses these files directly from GitHub for each dataset.
Because the api directory is derived entirely from the contents of the metadata directory, you should not edit the contents of api directly. Instead, the api directory is created programmatically by the generate_endpoints.py script, which is run by a GitHub Actions workflow triggered on each commit.
To modify, add, or remove metadata, clone this repository, make your changes, and submit a pull request. Although the models are defined as Python data structures and Python is used for data validation, the metadata itself is all JSON, so no coding is needed for simple metadata updates.
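Before submitting a pull request, it can help to confirm that every edited file is still well-formed JSON. The sketch below is one possible way to do that; the function name find_invalid_json is hypothetical, and the check covers JSON syntax only, not the structure enforced by the repository's Python models.

```python
import json
from pathlib import Path


def find_invalid_json(root):
    """Return (path, error message) pairs for JSON files under root that fail to parse."""
    problems = []
    for path in sorted(Path(root).rglob("*.json")):
        try:
            json.loads(path.read_text())
        except json.JSONDecodeError as err:
            problems.append((path, str(err)))
    return problems
```

An empty list from find_invalid_json("metadata") means every JSON file under that folder parsed successfully.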
To develop locally, clone this repository, install the poetry package manager in your Python environment, and run poetry install in the root of the cloned directory to install the dependencies for this project.