This repository accompanies the preprint Neurotransmitter Classification from Electron Microscopy Images at Synaptic Sites in Drosophila.
Predictions for the most common neurotransmitters (GABA, acetylcholine, glutamate, serotonin, octopamine, and dopamine) are publicly available for the following two datasets:
We used our method to predict the neurotransmitter identity of all annotated synapses in the FAFB volume. Our predictions are available through FlyWire and as a direct download in Apache Feather format.
Neurotransmitter predictions for all automatically located synapses in the HemiBrain are available here in Apache Feather format.
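Both downloads can be inspected directly with pandas; a minimal sketch (the file name and any column names are illustrative, check the actual file for the exact schema):

```python
import pandas as pd

# Load a neurotransmitter prediction table; the file name is illustrative.
df = pd.read_feather("fafb_neurotransmitter_predictions.feather")

# Inspect the actual schema rather than relying on assumed column names.
print(df.shape)
print(df.columns.tolist())
```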
The following shows how to recreate our results on the FAFB volume and can also serve as a guide for training and predicting on other datasets.
- Singularity
cd singularity
make
- Conda
conda create -n synister -c conda-forge -c funkey python numpy scipy cython pylp pytorch-gpu
conda activate synister
pip install -r requirements.txt
pip install .
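A quick sanity check of the environment (a minimal sketch; it assumes the package installs under the name synister):

```python
# Verify that the package and a CUDA-enabled PyTorch are importable.
import torch
import synister  # assumed package name after `pip install .`

print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
```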
An export of the three collections constituting the synister FAFB database used for all described experiments can be found at data/fafb_v3. The three files contain:
- Location, id, skid, brain region and split for each synapse (synapses(_v3).json).
- Skid, neurotransmitter, hemilineage id for each skeleton (skeletons(_v3).json).
- Hemilineage name, hemilineage id for each hemilineage (hemilineages(_v3).json).
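For a first look at the exports, the files can be loaded with the standard json module; a minimal sketch, assuming each file holds a single JSON array (a mongoexport-style dump may instead be newline-delimited):

```python
import json

# Inspect the exported synapse collection. The file name and the
# assumption of a single JSON array should be verified against the
# actual export.
with open("data/fafb_v3/synapses_v3.json") as f:
    synapses = json.load(f)

print(len(synapses), "synapses")
print(synapses[0])  # expected fields: location, id, skid, brain region, split
```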
To reproduce the experiments, import each JSON file as a collection named "synapses", "skeletons" and "hemilineages", respectively, into one MongoDB database, e.g. with mongoimport (a programmatic alternative is sketched below). Dictionary keys are field names. The provided splits can be reproduced with synister/split.py, which searches for the optimally balanced split in terms of neurotransmitter distribution for any given superset, such as hemilineage id, skeleton id or brain region.
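The programmatic alternative mentioned above, as a minimal pymongo sketch (host, database name and file names are illustrative):

```python
import json
from pymongo import MongoClient

# Illustrative import of the three exports into a MongoDB database named
# "synister_v3"; adjust host, credentials and file names to your setup.
# Assumes each file holds one JSON array.
client = MongoClient("mongodb://localhost:27017")
db = client["synister_v3"]

for name in ["synapses", "skeletons", "hemilineages"]:
    with open(f"data/fafb_v3/{name}_v3.json") as f:
        documents = json.load(f)
    db[name].insert_many(documents)
    print(f"imported {len(documents)} documents into '{name}'")
```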
To train on other data, recreate the database schema shown here (a "synapses" and a "skeletons" collection are required) and adapt the config files to match the new database name.
python prepare_training.py -d <base_dir> -e <experiment_name> -t <train_id>
This creates a new directory at the specified path and initializes default config files for the run.
cd <base_dir>/<experiment_name>/02_train/setup_t<train_id>
Edit the config files to match the architecture, database and compute resources to train with.
For example configs that train a VGG on the skeleton split, inside a Singularity container, on a GPU queue, see:
example_configs/train_config.ini
[Training]
synapse_types = gaba, acetylcholine, glutamate, serotonin, octopamine, dopamine
input_shape = 16, 160, 160
fmaps = 12
batch_size = 8
db_credentials = synister_data/credentials/db_credentials.ini
db_name_data = synister_v3
split_name = skeleton
voxel_size = 40, 4, 4
raw_container = /nrs/saalfeld/FAFB00/v14_align_tps_20170818_dmg.n5
raw_dataset = volumes/raw/s0
downsample_factors = (1, 2, 2), (1, 2, 2), (1, 2, 2), (2, 2, 2)
network = VGG
fmap_inc = 2, 2, 2, 2
n_convolutions = 2, 2, 2, 2
network_appendix = None
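As a sanity check on this config, only the downsampling stages change the spatial shape (assuming padded convolutions), so the 16, 160, 160 input is reduced to an 8, 10, 10 feature map before the fully connected layers:

```python
# Spatial size after the four downsampling stages in the VGG config above
# (assumes padded convolutions, so only the pooling factors matter).
shape = (16, 160, 160)  # input_shape (z, y, x)
factors = [(1, 2, 2), (1, 2, 2), (1, 2, 2), (2, 2, 2)]  # downsample_factors

for factor in factors:
    shape = tuple(s // f for s, f in zip(shape, factor))

print(shape)  # -> (8, 10, 10)
```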
example_configs/worker_config.ini
[Worker]
singularity_container = synister/singularity/synister.img
num_cpus = 5
num_block_workers = 1
num_cache_workers = 5
queue = gpu_any
mount_dirs = /nrs, /scratch, /groups, /misc
Finally, to submit the train job with the desired number of iterations, run:
python train.py <num_iterations>
We recommend training for at least 500,000 iterations for FAFB_v3 splits.
To visualize training progress, run:
tensorboard --logdir <base_dir>/<experiment_name>/02_train/setup_t<train_id>/log
Snapshots are written to:
<base_dir>/<experiment_name>/02_train/setup_t<train_id>/snapshots
python prepare_prediction.py -v -d <base_dir> -e <experiment_name> -t <train_id> -i <iter_0> <iter_1> <iter_2> ... <iter_N>
This will create N prediction directories with appropriately initialized config files, one for each given train iteration <iter_k>. The -v flag sets the split part of the chosen split type to validation, so that only synapses tagged as validation synapses are pulled from the database.
example_configs/predict_config.ini
[Predict]
train_checkpoint = <experiment_name>/02_train/setup_t<train_id>/model_checkpoint_<iter_k>
experiment = <experiment_name>
train_number = 0
predict_number = 0
synapse_types = gaba, acetylcholine, glutamate, serotonin, octopamine, dopamine
input_shape = 16, 160, 160
fmaps = 12
batch_size = 8
db_credentials = synister_data/credentials/db_credentials.ini
db_name_data = synister_v3
split_name = skeleton
voxel_size = 40, 4, 4
raw_container = /nrs/saalfeld/FAFB00/v14_align_tps_20170818_dmg.n5
raw_dataset = volumes/raw/s0
downsample_factors = (1, 2, 2), (1, 2, 2), (1, 2, 2), (2, 2, 2)
split_part = validation
overwrite = False
network = VGG
fmap_inc = 2, 2, 2, 2
n_convolutions = 2, 2, 2, 2
network_appendix = None
For most use cases the automatically initialized predict config does not require any edits. If run as is, predictions will be written into the database under:
<db_name>_predictions.<split_name>_<experiment_name>_t<train_id>_p<predict_id>
To start the prediction, run:
cd <base_dir>/<experiment_name>/03_predict/setup_t<train_id>_p<predict_id>
python predict.py
If the collection already exists, the script will abort. A collection can be overwritten by setting overwrite=True in the predict config. Parallel prediction with multiple GPUs can be done by setting num_block_workers=num_gpus in the worker_config file. Prediction speed and the expected time to finish are shown in the console.
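Once a prediction has finished, the written documents can be read back with pymongo; a minimal sketch, with hypothetical values substituted for <experiment_name>, <train_id> and <predict_id>:

```python
from pymongo import MongoClient

# Illustrative read-back; database and collection names follow the pattern
# above ("synister_v3", split "skeleton", experiment "fafb", t0, p0 are
# hypothetical). Document field names vary, so just print a few documents.
client = MongoClient("mongodb://localhost:27017")
collection = client["synister_v3_predictions"]["skeleton_fafb_t0_p0"]

for doc in collection.find().limit(5):
    print(doc)
```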
To submit multiple predictions to the cluster at once, run the provided convenience script:
python start_predictions.py -d <base_dir> -e <experiment_name> -t <train_id> -p <predict_id_0> <predict_id_1> ... <predict_id_N>
python prepare_prediction.py -d <base_dir> -e <experiment_name> -t <train_id> -i <iter_0> <iter_1> <iter_2> ... <iter_N>
Similar to validation, this prepares the relevant directories and config files, but sets the split part to "test" (note the omitted -v flag).
Starting the prediction follows the same pattern as before:
cd <base_dir>/<experiment_name>/03_predict/setup_t<train_id>_p<predict_id>
python predict.py
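With the test split predicted, accuracy can be estimated by comparing each prediction against its ground-truth label; a hedged sketch, assuming each document stores one probability per transmitter and the known label (all field and collection names below are hypothetical):

```python
from pymongo import MongoClient

SYNAPSE_TYPES = ["gaba", "acetylcholine", "glutamate",
                 "serotonin", "octopamine", "dopamine"]

# Hypothetical evaluation loop; adjust database, collection and field
# names to match your actual prediction documents.
client = MongoClient("mongodb://localhost:27017")
collection = client["synister_v3_predictions"]["skeleton_fafb_t0_p0"]

correct = total = 0
for doc in collection.find():
    probs = [doc[nt] for nt in SYNAPSE_TYPES]   # assumed per-class fields
    predicted = SYNAPSE_TYPES[probs.index(max(probs))]
    correct += predicted == doc["nt_known"]     # assumed ground-truth field
    total += 1

print(f"accuracy: {correct / total:.3f} over {total} synapses")
```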