Convolutional neural networks (CNNs) were trained to predict the locations of postsynaptic sites and their corresponding presynaptic sites in the EM dataset using the method described in Buhmann et al. 2021, Nature Methods. The implementation of this pipeline is available on GitHub at funkelab/synful.
This neuroglancer link shows the two layers output by the two CNNs: one predicts whether a given pixel is a postsynaptic site, and the other predicts vectors pointing from postsynaptic sites to presynaptic sites.
From the postsynaptic site probabilities, 83,917,332 discrete postsynaptic sites were extracted. The corresponding presynaptic sites were identified via the vector predictions at each postsynaptic site. The full set of ~84 million synaptic links can be found in the google cloud storage bucket gs://zetta_lee_fly_vnc_001_alignment_temp/v4/fill_nearest_mip1/img/img_seethrough/synful_extraction/229bd2f77b2adf7c0e2c5b90ed605098/8.6_8.6_45/
A series of filters was applied to prune the ~84 million synaptic links down to the ~45 million that constitute the final synapse table.
Any synaptic link where either the presynaptic or the postsynaptic site isn't associated with any reconstructed object (specifically, has no supervoxel at its position) was excluded. This filtered out 836,640 synapses (~1%), bringing the number of remaining synapses from 83,917,332 to 83,080,692.
The synful package provides a score for each predicted synaptic link. We found that keeping only the predictions with sum_score > 12 (roughly meaning that the postsynaptic site had more than 12 voxels predicted to be a postsynaptic location) produced the maximal F-score when evaluating performance on ground truth synapse annotations. We applied this threshold, filtering out 26,566,091 synapses (~32%), bringing the number of remaining synapses from 83,080,692 to 56,514,601.
Any synaptic link that connects a given supervoxel to itself was excluded. This removed 798,050 synapses (~1%), bringing the number of remaining synapses from 56,514,601 to 55,716,551.
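The three row-wise filters described so far (missing supervoxel, score threshold, self-links) can be sketched with pandas as follows. This is a minimal illustration, not the pipeline's actual code: the column names `pre_sv`, `post_sv`, and `sum_score` and the convention that a supervoxel ID of 0 means "no supervoxel" are assumptions made for the example.

```python
import pandas as pd

# Toy table. Column names are hypothetical; the real schema may differ.
# Assumption: a supervoxel ID of 0 means "no reconstructed object here".
links = pd.DataFrame({
    "pre_sv":    [11, 0, 12, 13, 14],
    "post_sv":   [21, 22, 12, 23, 0],
    "sum_score": [30.5, 40.0, 50.0, 8.0, 20.0],
})

def apply_basic_filters(df: pd.DataFrame, score_threshold: float = 12) -> pd.DataFrame:
    """Apply the three row-wise filters in the order described in the text."""
    # 1. Drop links whose pre- or postsynaptic site has no supervoxel.
    df = df[(df["pre_sv"] != 0) & (df["post_sv"] != 0)]
    # 2. Keep only links scoring strictly above the threshold.
    df = df[df["sum_score"] > score_threshold]
    # 3. Drop links that connect a supervoxel to itself.
    df = df[df["pre_sv"] != df["post_sv"]]
    return df.reset_index(drop=True)

filtered = apply_basic_filters(links)
print(filtered)
```

In this toy table only the first row survives: the others fail the supervoxel check, the self-link check, or the score threshold.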
Sometimes a single pair of supervoxels is connected by two (or more) different synaptic links. This occurs due to limitations in the synful approach: in essentially all of these cases, the pair of supervoxels should be connected only once. We removed these duplicates, keeping only the single link with the largest score for any given pair of supervoxels. This removed 5,724,380 synapses (~7%), bringing the number of remaining synapses from 55,716,551 to 49,992,171.
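Keeping only the highest-scoring link per supervoxel pair could look like the sketch below. It is illustrative only; the column names `pre_sv`, `post_sv`, and `sum_score` are assumptions, not the actual table schema.

```python
import pandas as pd

# Toy table with hypothetical column names.
links = pd.DataFrame({
    "pre_sv":    [1, 1, 1, 2],
    "post_sv":   [5, 5, 6, 5],
    "sum_score": [20.0, 35.0, 15.0, 40.0],
})

def keep_best_link_per_pair(df: pd.DataFrame) -> pd.DataFrame:
    """For each (pre_sv, post_sv) pair, keep only the link with the largest score."""
    # Sort so the best-scoring link of each pair comes first...
    ordered = df.sort_values("sum_score", ascending=False)
    # ...then keep the first occurrence of every pair.
    deduped = ordered.drop_duplicates(subset=["pre_sv", "post_sv"], keep="first")
    # Restore the original row order.
    return deduped.sort_index()

best = keep_best_link_per_pair(links)
print(best)
```

Rows 0 and 1 connect the same pair (1, 5), so only row 1, which has the higher score, is retained.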
Similarly to the previous filter, sometimes a single dendritic twig is connected to the same presynaptic neuron by multiple links that do not share the exact same pair of supervoxels. We removed duplicate links that connect the same two segIDs if the links' presynaptic locations are within 150 nm of one another. We examined some of the cases identified by this approach, and all the instances we examined were indeed duplicates that deserved to be removed. Here's a neuroglancer link to 5 examples. Thanks to Sven Dorkenwald for providing code for this step, which was also applied to the FAFB/FlyWire synapse table. This step removed 4,936,246 synapses (5.9%), bringing the number of remaining synapses from 49,992,171 to 45,055,925.
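One way such a proximity-based deduplication might work is sketched below. This is not the code that was actually used: the field names, the choice to keep the highest-scoring link of each cluster, and the greedy strategy are all assumptions made for illustration.

```python
import math
from collections import defaultdict

def drop_nearby_duplicates(links, max_dist_nm=150.0):
    """Return the indices of links to keep.

    `links` is a list of dicts with hypothetical keys:
    pre_seg, post_seg, pre_xyz_nm (x, y, z in nm), score.
    """
    # Group link indices by the pair of segment IDs they connect.
    groups = defaultdict(list)
    for idx, link in enumerate(links):
        groups[(link["pre_seg"], link["post_seg"])].append(idx)

    kept = []
    for indices in groups.values():
        # Visit the highest-scoring links first (assumed tie-breaking rule).
        indices.sort(key=lambda i: links[i]["score"], reverse=True)
        survivors = []
        for i in indices:
            # Drop link i if its presynaptic site is close to a kept link.
            too_close = any(
                math.dist(links[i]["pre_xyz_nm"], links[j]["pre_xyz_nm"]) <= max_dist_nm
                for j in survivors
            )
            if not too_close:
                survivors.append(i)
        kept.extend(survivors)
    return sorted(kept)

links = [
    {"pre_seg": 1, "post_seg": 2, "pre_xyz_nm": (0, 0, 0), "score": 30},
    {"pre_seg": 1, "post_seg": 2, "pre_xyz_nm": (100, 0, 0), "score": 50},   # 100 nm from link 0
    {"pre_seg": 1, "post_seg": 2, "pre_xyz_nm": (1000, 0, 0), "score": 20},  # far away
    {"pre_seg": 3, "post_seg": 2, "pre_xyz_nm": (0, 0, 0), "score": 10},     # different pair
]
kept = drop_nearby_duplicates(links)
print(kept)
```

The pairwise comparison within each group is quadratic, which is fine for a sketch; at the scale of tens of millions of links a spatial index such as a k-d tree would likely be needed.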
The final set of 45,055,925 synapses is available:
- as a CSV file on Google Cloud Storage at gs://lee-lab_female-adult-nerve-cord/alignmentV4/synapses/synapses_Nov2022/ (Link to view files through browser.)
- through a CAVE table named synapses_nov2022 (Link to view table through browser.)
The score column is the sum_score provided by synful, converted from float to int.
We next check which region(s)/neuropil(s) each synapse is in.
Algorithm: For each neuropil-synapse pair, we first check whether the synapse lies within the bounding box of the neuropil mesh by simple arithmetic comparisons of x, y, z coordinates. If the synapse is in the bounding box, we further check whether it is actually contained in the mesh using a ray-casting-based algorithm from the Trimesh package.
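The two-stage check can be sketched as below. This is an illustration rather than the code in locate_neuropil.py: the function name `points_in_region` is hypothetical, and a sphere stands in for a real mesh so the example runs without mesh files. With Trimesh, one would presumably pass `mesh.bounds` and `mesh.contains` for a loaded mesh instead.

```python
import numpy as np

def points_in_region(points, bounds, contains_fn):
    """Return a boolean array marking which points lie inside a region.

    points      : (n, 3) array of x, y, z coordinates
    bounds      : (2, 3) array holding the region's min and max corners
    contains_fn : exact (expensive) test, called only on points in the box
    """
    points = np.asarray(points, dtype=float)
    lower, upper = np.asarray(bounds, dtype=float)
    # Stage 1: cheap bounding-box test by coordinate comparison.
    in_box = np.all((points >= lower) & (points <= upper), axis=1)
    # Stage 2: exact containment test, only for points that passed stage 1.
    inside = np.zeros(len(points), dtype=bool)
    if in_box.any():
        inside[in_box] = contains_fn(points[in_box])
    return inside

# Stand-in for a mesh: the unit sphere and its bounding box.
def sphere_contains(pts):
    return np.linalg.norm(pts, axis=1) <= 1.0

bounds = np.array([[-1.0, -1.0, -1.0], [1.0, 1.0, 1.0]])
pts = np.array([
    [0.0, 0.0, 0.0],   # inside the box and the sphere
    [0.9, 0.9, 0.9],   # inside the box, outside the sphere
    [2.0, 0.0, 0.0],   # outside the box, never reaches stage 2
])
result = points_in_region(pts, bounds, sphere_contains)
print(result)
```

The bounding-box stage matters because most synapses lie far from any given neuropil, so the expensive ray-casting test runs on only a small fraction of them.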
Implementation: Practically, we load the CSV synapse table dump and add boolean columns "is_in_<region_name>" to indicate whether each synapse is in the corresponding region. This allows for overlapping/hierarchical organization of regions. We also split the synapse table into chunks and distribute the workload among many worker processes using a payload pool. Each payload checks whether a set of synapses is in a single given neuropil. The results are merged and written into a Parquet file.
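The chunk-and-merge bookkeeping can be sketched as follows. It is a serial stand-in under stated assumptions: the helper names (`make_payloads`, `merge_results`, `run`) and the toy region test are hypothetical, and the payloads are evaluated in a plain loop here, whereas the real script distributes them across worker processes.

```python
import numpy as np

def make_payloads(n_synapses, chunk_size, regions):
    """Yield one (region, start, stop) payload per region and chunk."""
    for region in regions:
        for start in range(0, n_synapses, chunk_size):
            yield region, start, min(start + chunk_size, n_synapses)

def merge_results(n_synapses, regions, results):
    """Assemble per-chunk masks into one is_in_<region> column per region."""
    columns = {f"is_in_{r}": np.zeros(n_synapses, dtype=bool) for r in regions}
    for region, start, stop, mask in results:
        columns[f"is_in_{region}"][start:stop] = mask
    return columns

def run(coords, regions, check_fn, chunk_size):
    """Evaluate every payload (serially here) and merge the results."""
    payloads = make_payloads(len(coords), chunk_size, regions)
    results = ((r, a, b, check_fn(r, coords[a:b])) for r, a, b in payloads)
    return merge_results(len(coords), regions, results)

# Toy region test: "left" is x < 2, "right" is x >= 2.
def toy_check(region, chunk):
    return chunk[:, 0] < 2 if region == "left" else chunk[:, 0] >= 2

regions = ["left", "right"]
coords = np.array([[0, 0, 0], [1, 0, 0], [2, 0, 0], [3, 0, 0], [4, 0, 0]], dtype=float)
columns = run(coords, regions, toy_check, chunk_size=2)
print(columns)
```

Because each payload covers one region and one contiguous chunk, the results can arrive in any order and still be written into the right slice of the right column.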
Usage: Use neuropil_identification/locate_neuropil.py:
    usage: locate_neuropil [-h] [-c CHUNK_SIZE] [-p PROCS]
                           input_file output_file mesh_dir

    Identify which neuropil/tract each synapse is in

    positional arguments:
      input_file            Input CSV file listing all synapses
      output_file           Output Parquet file identifying the regions
      mesh_dir              Path to mesh files

    options:
      -h, --help            show this help message and exit
      -c CHUNK_SIZE, --chunk_size CHUNK_SIZE
                            Synapses are localized in small chunks. Set the chunk
                            size here.
      -p PROCS, --procs PROCS
                            Number of worker processes
For example:
    # Positional arguments, in order: input CSV, output Parquet, mesh directory.
    # -c 10000 -p 12: run in chunks of 10,000 synapses with 12 worker processes.
    python locate_neuropil.py \
        ~/Data/fanc/synapse_table_20221120/raw_dump/20221117_fanc_syn.csv \
        ~/Data/fanc/synapse_table_20221120/localization_res/synapse_location.parquet \
        FANC_auto_recon/data/volume_meshes/JRC2018_VNC_UNISEX_to_FANC/meshes_by_side \
        -c 10000 -p 12
Alternatively, one can import locate_neuropil from neuropil_identification/locate_neuropil.py in Python. See the docstring for details.
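Once the output has been written, the boolean is_in_<region_name> columns make per-region tallies straightforward. The sketch below assumes only that those columns exist. The helper name and the region names `regionA` and `regionB` are made up for the example, and a small in-memory table stands in for the real file, which could be loaded with `pd.read_parquet`.

```python
import pandas as pd

def count_synapses_per_region(table: pd.DataFrame) -> pd.Series:
    """Count how many synapses fall in each region, largest first."""
    region_cols = [c for c in table.columns if c.startswith("is_in_")]
    # Summing a boolean column counts its True entries.
    counts = table[region_cols].sum()
    # Strip the "is_in_" prefix so the index holds plain region names.
    counts.index = [c[len("is_in_"):] for c in region_cols]
    return counts.sort_values(ascending=False)

# In practice: table = pd.read_parquet("synapse_location.parquet")
table = pd.DataFrame({
    "is_in_regionA": [True, True, False, True],
    "is_in_regionB": [False, True, False, False],
})
counts = count_synapses_per_region(table)
print(counts)
```

Because regions may overlap or nest, a synapse can be counted in more than one region, so the per-region counts need not sum to the total number of synapses.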