Home

Awesome

Local Shape Descriptors (for Neuron Segmentation)

This repository contains code to compute Local Shape Descriptors (LSDs) from an instance segmentation. LSDs can then be used during training as an auxiliary target, which we found to improve boundary prediction and therefore segmentation quality. Read more about it in our paper and/or blog post.

PaperBlog Post
PaperBlog post

Quick 2d Examples

Notebooks

Example networks & pipelines

Parallel processing


Cite:

@article{sheridan_local_2022,
	title = {Local shape descriptors for neuron segmentation},
	issn = {1548-7091, 1548-7105},
	url = {https://www.nature.com/articles/s41592-022-01711-z},
	doi = {10.1038/s41592-022-01711-z},
	urldate = {2023-01-12},
	journal = {Nature Methods},
	author = {Sheridan, Arlo and Nguyen, Tri M. and Deb, Diptodip and Lee, Wei-Chung Allen and Saalfeld, Stephan and Turaga, Srinivas C. and Manor, Uri and Funke, Jan},
	month = dec,
	year = {2022},
}

Notes:

conda create -n lsd_env python=3
conda activate lsd_env
conda install lsds -c conda-forge

also via pip:

mamba create -n lsd_env python=3
mamba activate lsd_env
pip install lsds
from lsd.train import local_shape_descriptor

<a name="example"></a>

Quick 2d Examples

The following tutorial allows you to run in the browser using google colab. In order to replicate the tutorial locally, create a conda environment and install the relevant packages. E.g:

  1. conda create -n lsd_test python=3
  2. conda activate lsd_test
  3. pip install lsds

tutorial: Open In Colab


<a name="nbook"></a>

Notebooks


<a name="networks"></a>

Example networks & pipelines

Training

Inference

<details> <summary>Visualizations of example training/prediction pipelines</summary> <br/><br/>

Vanilla affinities training:

<br/><br/> Autocontext LSD and affinities prediction:

</details>

<a name="parallel"></a>

Parallel processing

  1. Predict boundaries (this can involve the use of LSDs as an auxiliary task)
  2. Generate supervoxels (fragments) using seeded watershed. The fragment centers of mass are stored as region adjacency graph nodes.
  3. Generate edges between nodes using hierarchical agglomeration. The edges are weighted by the underlying affinities. Edges with lower scores are merged earlier.
  4. Cut the graph at a predefined threshold and relabel connected components. Store the node - component lookup tables.
  5. Use the lookup tables to relabel supervoxels and generate a segmentation.

Inference

The worker logic is located in individual predict.py scripts (example). The master script distributes using daisy.run_blockwise. The only need for MongoDb here is for the block check function (to check which blocks have successfully completed). To remove the need for mongo, one could remove the check function (remember to also remove block_done_callback in predict.py) or replace with custom function (e.g check chunk completion directly in output container).

<details> <summary>Example roi config</summary>
{
  "container": "hemi_roi_1.zarr",
  "offset": [140800, 205120, 198400],
  "size": [3000, 3000, 3000]
}
</details> <details> <summary>Example predict config</summary>
 {
  "base_dir": "/path/to/base/directory",
  "experiment": "hemi",
  "setup": "setup01",
  "iteration": 400000,
  "raw_file": "predict_roi.json",
  "raw_dataset" : "volumes/raw",
  "out_base" : "output",
  "file_name": "foo.zarr",
  "num_workers": 5,
  "db_host": "mongodb client",
  "db_name": "foo",
  "queue": "gpu_rtx",
  "singularity_image": "/path/to/singularity/image"
}
</details>

Watershed

The worker logic is located in a single script which is then distributed by the master script. By default the nodes are stored in mongo using a MongoDbGraphProvider. To write to file (i.e compressed numpy arrays), you can use the FileGraphProvider instead (inside the worker script).

<details> <summary>Example watershed config</summary>
{
  "experiment": "hemi",
  "setup": "setup01",
  "iteration": 400000,
  "affs_file": "foo.zarr",
  "affs_dataset": "/volumes/affs",
  "fragments_file": "foo.zarr",
  "fragments_dataset": "/volumes/fragments",
  "block_size": [1000, 1000, 1000],
  "context": [248, 248, 248],
  "db_host": "mongodb client",
  "db_name": "foo",
  "num_workers": 6,
  "fragments_in_xy": false,
  "epsilon_agglomerate": 0,
  "queue": "local"
}
</details>

Agglomerate

Same as watershed. Worker script, master script. Change to FileGraphProvider if needed.

<details> <summary>Example agglomerate config</summary>
{
  "experiment": "hemi",
  "setup": "setup01",
  "iteration": 400000,
  "affs_file": "foo.zarr",
  "affs_dataset": "/volumes/affs",
  "fragments_file": "foo.zarr",
  "fragments_dataset": "/volumes/fragments",
  "block_size": [1000, 1000, 1000],
  "context": [248, 248, 248],
  "db_host": "mongodb client",
  "db_name": "foo",
  "num_workers": 4,
  "queue": "local",
  "merge_function": "hist_quant_75"
}
</details>

Find segments

In contrast to the above three methods, when creating LUTs there just needs to be enough RAM to hold the RAG in memory. The only thing done in parallel is reading the graph (graph_provider.read_blockwise()). It could be adapted to use multiprocessing/dask for distributing the connected components for each threshold, but if the rag is too large there will be pickling errors when passing the nodes/edges. Daisy doesn't need to be used for scheduling here since nothing is written to containers.

<details> <summary>Example find segments config</summary>
{
  "db_host": "mongodb client",
  "db_name": "foo",
  "fragments_file": "foo.zarr",
  "edges_collection": "edges_hist_quant_75",
  "thresholds_minmax": [0, 1],
  "thresholds_step": 0.02,
  "block_size": [1000, 1000, 1000],
  "num_workers": 5,
  "fragments_dataset": "/volumes/fragments",
  "run_type": "test"
}
</details>

Extract segmentation

This script does use daisy to write the segmentation to file, but doesn't necessarily require bsub/sbatch to distribute (you can run locally).

<details> <summary>Example extract segmentation config</summary>
{
  "fragments_file": "foo.zarr",
  "fragments_dataset": "/volumes/fragments",
  "edges_collection": "edges_hist_quant_75",
  "threshold": 0.4,
  "block_size": [1000, 1000, 1000],
  "out_file": "foo.zarr",
  "out_dataset": "volumes/segmentation_40",
  "num_workers": 3,
  "run_type": "test"
}
</details>

Evaluate volumes

Evaluate Voi scores. Assumes dense voxel ground truth (not skeletons). This also assumes the ground truth (and segmentation) can fit into memory, which was fine for hemi and fib25 volumes assuming ~750 GB of RAM. The script should probably be refactored to run blockwise.

<details> <summary>Example evaluate volumes config</summary>
{
  "experiment": "hemi",
  "setup": "setup01",
  "iteration": 400000,
  "gt_file": "hemi_roi_1.zarr",
  "gt_dataset": "volumes/labels/neuron_ids",
  "fragments_file": "foo.zarr",
  "fragments_dataset": "/volumes/fragments",
  "db_host": "mongodb client",
  "rag_db_name": "foo",
  "edges_collection": "edges_hist_quant_75",
  "scores_db_name": "scores",
  "thresholds_minmax": [0, 1],
  "thresholds_step": 0.02,
  "num_workers": 4,
  "method": "vanilla",
  "run_type": "test"
}
</details>

Evaluate annotations

For the zebrafinch, ground truth skeletons were used due to the size of the dataset. These skeletons were cropped, masked, and relabelled for the sub Rois that were tested in the paper. We evaluated voi, erl, and the mincut metric on the consolidated skeletons. The current implementation could be refactored / made more modular. It also uses node_collections which are now deprecated in daisy. To use with the current implementation, you should checkout daisy commit 39723ca.

<details> <summary>Example evaluate annotations config</summary>
{
  "experiment": "zebrafinch",
  "setup": "setup01",
  "iteration": 400000,
  "config_slab": "mtlsd",
  "fragments_file": "foo.zarr",
  "fragments_dataset": "/volumes/fragments",
  "edges_db_host": "mongodb client",
  "edges_db_name": "foo",
  "edges_collection": "edges_hist_quant_75",
  "scores_db_name": "scores",
  "annotations_db_host": "mongo client",
  "annotations_db_name": "foo",
  "annotations_skeletons_collection_name": "zebrafinch",
  "node_components": "zebrafinch_components",
  "node_mask": "zebrafinch_mask",
  "roi_offset": [50800, 43200, 44100],
  "roi_shape": [10800, 10800, 10800],
  "thresholds_minmax": [0.5, 1],
  "thresholds_step": 1,
  "run_type": "11_micron_roi_masked"
}
</details>