FLoRIN¶
FLoRIN, the Flexible Learning-free Reconstruction of Neural Volumes pipeline, is a framework for large-scale parallel and distributed computer vision. Offering easy setup and access to hierarchical parallelism, FLoRIN is ideal for scaling computer vision to HPC systems.
Originally, this project was our response to the question of how to segment and reconstruct neural microscopy data (e.g., micro-CT tomography, low-resolution electron microscopy, fluorescence microscopy, etc.) without large amounts of training data for a neural network. We tackled this problem by revisiting classical computer vision methods, eventually developing the N-Dimensional Neighborhood Thresholding (NDNT) algorithm as a modern update to integral image-based thresholding. FLoRIN has since been shown to be a fast, robust segmentation and reconstruction engine across different imaging modalities and datasets.
This package implements the NDNT algorithm, as well as a straightforward API for mixed serial, parallel, and distributed computer vision. These docs provide examples of how to use FLoRIN with various mixtures of serial and parallel processing and how to customize the FLoRIN pipeline with new functions and features.
Publications¶
- Shahbazi, Ali, Jeffery Kinnison, Rafael Vescovi, Ming Du, Robert Hill, Maximilian Jösch, Marc Takeno, et al. “Flexible Learning-Free Segmentation and Reconstruction of Neural Volumes.” Scientific Reports 8, no. 1 (2018): 14247.
Installation¶
FLoRIN can be installed with all of its Python dependencies through the Python Package Index or Anaconda.
PyPI¶
pip install florin
Anaconda¶
conda install -c jeffkinnison florin
Python Dependencies¶
- Python 3.4+
- numpy
- scipy
- scikit-image
- pathos
- mpi4py
- h5py
Examples using FLoRIN¶
A First Example¶
This example walks through basic FLoRIN usage by segmenting and reconstructing a small X-ray volume.
Segmenting¶
The following code sets up a serial pipeline to segment the image:
import florin
import florin.conncomp as conncomp
import florin.morphology as morphology
import florin.thresholding as thresholding
# Set up a serial pipeline
pipeline = florin.Serial(
# Load in the volume from file
florin.load(),
# Tile the volume into overlapping 64 x 64 x 10 subvolumes
florin.tile(shape=(10, 64, 64), stride=(10, 32, 32)),
# Threshold with NDNT
thresholding.ndnt(shape=(10, 64, 64), threshold=0.3),
# Clean up a little bit
morphology.binary_opening(),
# Save the output to a TIFF stack
florin.save('segmented.tiff')
)
# Run the pipeline
segmented = pipeline()
At the end of the pipeline, a TIFF stack containing the binary segmentation is written to segmented.tiff.
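Because the result is a plain TIFF stack, it can be sanity-checked with any TIFF reader. For example, a quick inspection using scikit-image (already a FLoRIN dependency) might look like the following sketch; the filename matches the save() call above.

import numpy as np
from skimage import io

# Load the binary segmentation written by florin.save('segmented.tiff').
segmentation = io.imread('segmented.tiff')
print(segmentation.shape, segmentation.dtype)
print('Foreground voxels:', np.count_nonzero(segmentation))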
Weak Classification¶
After we have the binary mask, we want to determine what type of structure each object is. The previous pipeline can be extended to perform weak classification using user-defined bounds on the segmented objects:
import florin
import florin.conncomp as conncomp
import florin.morphology as morphology
import florin.thresholding as thresholding
# Set up a serial pipeline
pipeline = florin.Serial(
# Load in the volume from file
florin.load(),
# Tile the volume into overlapping 64 x 64 x 10 subvolumes
florin.tile(shape=(10, 64, 64), stride=(10, 32, 32)),
# Threshold with NDNT
thresholding.ndnt(shape=(10, 64, 64), threshold=0.3),
# Clean up a little bit
morphology.binary_opening(),
# Save the output to a TIFF stack
florin.save('segmented.tiff'),
# Find connected components
conncomp.label(),
morphology.remove_small_holes(min_size=20),
conncomp.regionprops(),
# Classify the connected components by their volume and dimensions
florin.classify(
florin.bounds_classifier(
'cell',
area=(100, 300),
depth=(10, 25),
width=(50, 100),
height=(50, 100)
),
florin.bounds_classifier('vasculature')
),
# Reconstruct the labeled volume
florin.reconstruct(),
# Write out the labeled volume
florin.save('labeled.tiff')
)
# Run the pipeline
segmented = pipeline()
This pipeline saves both the binary segmentation and the labeled volume, where each class is represented by a different color.
Closing Remarks¶
Rolling out a basic FLoRIN pipeline is relatively easy (roughly 20 lines of code without the comments and whitespace). This example runs everything on a single core, but the next example demonstrates parallel processing, which is just as easy to set up.
Parallel Processing Pipelines¶
This example will show how to convert the previous example to perform multiprocessing on the tiles and connected components created during segmentation and weak classification, respectively.
Parallelism¶
Parallel processing can be invoked by creating sub-pipelines around commands that will receive multiple inputs.
Multithreading
import florin
import florin.conncomp as conncomp
import florin.morphology as morphology
import florin.thresholding as thresholding
# Set up a serial pipeline
pipeline = florin.Serial(
# Load in the volume from file
florin.load(),
# Tile the volume into overlapping 64 x 64 x 10 subvolumes
florin.tile(shape=(10, 64, 64), stride=(10, 32, 32)),
florin.Multithread(
# Threshold with NDNT
thresholding.ndnt(shape=(10, 64, 64), threshold=0.3),
# Clean up a little bit
morphology.binary_opening()
),
# Save the output to a TIFF stack
florin.save('segmented.tiff'),
# Find connected components
conncomp.label(),
morphology.remove_small_holes(min_size=20),
conncomp.regionprops(),
# Classify the connected components by their volume and dimensions
florin.Multithread(
florin.classify(
florin.bounds_classifier(
'cell',
area=(100, 300),
depth=(10, 25),
width=(50, 100),
height=(50, 100)
),
florin.bounds_classifier('vasculature')
)
),
# Reconstruct the labeled volume
florin.reconstruct(),
# Write out the labeled volume
florin.save('labeled.tiff')
)
# Run the pipeline
segmented = pipeline()
Multiprocessing
import florin
import florin.conncomp as conncomp
import florin.morphology as morphology
import florin.thresholding as thresholding
# Set up a serial pipeline
pipeline = florin.Serial(
# Load in the volume from file
florin.load(),
# Tile the volume into overlapping 64 x 64 x 10 subvolumes
florin.tile(shape=(10, 64, 64), stride=(10, 32, 32)),
florin.Multiprocess(
# Threshold with NDNT
thresholding.ndnt(shape=(10, 64, 64), threshold=0.3),
# Clean up a little bit
morphology.binary_opening()
),
# Save the output to a TIFF stack
florin.save('segmented.tiff'),
# Find connected components
conncomp.label(),
morphology.remove_small_holes(min_size=20),
conncomp.regionprops(),
# Classify the connected components by their volume and dimensions
florin.Multiprocess(
florin.classify(
florin.bounds_classifier(
'cell',
area=(100, 300),
depth=(10, 25),
width=(50, 100),
height=(50, 100)
),
florin.bounds_classifier('vasculature')
)
),
# Reconstruct the labeled volume
florin.reconstruct(),
# Write out the labeled volume
florin.save('labeled.tiff')
)
# Run the pipeline
segmented = pipeline()
MPI
import florin
import florin.conncomp as conncomp
import florin.morphology as morphology
import florin.thresholding as thresholding
# Set up a serial pipeline
pipeline = florin.Serial(
# Load in the volume from file
florin.load(),
# Tile the volume into overlapping 64 x 64 x 10 subvolumes
florin.tile(shape=(10, 64, 64), stride=(10, 32, 32)),
florin.MPI(
# Threshold with NDNT
thresholding.ndnt(shape=(10, 64, 64), threshold=0.3),
# Clean up a little bit
morphology.binary_opening()
),
# Save the output to a TIFF stack
florin.save('segmented.tiff'),
# Find connected components
conncomp.label(),
morphology.remove_small_holes(min_size=20),
conncomp.regionprops(),
# Classify the connected components by their volume and dimensions
florin.MPI(
florin.classify(
florin.bounds_classifier(
'cell',
area=(100, 300),
depth=(10, 25),
width=(50, 100),
height=(50, 100)
),
florin.bounds_classifier('vasculature')
)
),
# Reconstruct the labeled volume
florin.reconstruct(),
# Write out the labeled volume
florin.save('labeled.tiff')
)
# Run the pipeline
segmented = pipeline()
All of these examples scale to the number of available cores (or MPI ranks in the MPI version), and they can be parameterized to use a specific number of workers when the sub-pipelines are created.
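As a rough illustration, a fixed worker count could be requested when constructing a sub-pipeline. The keyword name used below (processes) is an assumption for illustration only and is not confirmed by these docs; check florin.pipelines for the actual signature.

import florin
import florin.morphology as morphology
import florin.thresholding as thresholding

# Hypothetical sketch: cap the sub-pipeline at four worker processes.
# The ``processes`` keyword is an assumed name, not a documented parameter.
segmentation_stage = florin.Multiprocess(
    thresholding.ndnt(shape=(10, 64, 64), threshold=0.3),
    morphology.binary_opening(),
    processes=4
)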
Mixed Parallelism¶
Using the sub-pipeline model from the above example, it is possible to mix parallel processing paradigms. For example, segmenting tiles with NDNT uses vectorized operations and may be better suited to multi-node parallelism with MPI, but classification is more lightweight and can be carried out in threads. Such a pipeline would look like this:
import florin
import florin.conncomp as conncomp
import florin.morphology as morphology
import florin.thresholding as thresholding
# Set up a serial pipeline
pipeline = florin.Serial(
# Load in the volume from file
florin.load(),
# Tile the volume into overlapping 64 x 64 x 10 subvolumes
florin.tile(shape=(10, 64, 64), stride=(10, 32, 32)),
florin.MPI(
# Threshold with NDNT
thresholding.ndnt(shape=(10, 64, 64), threshold=0.3),
# Clean up a little bit
morphology.binary_opening()
),
# Save the output to a TIFF stack
florin.save('segmented.tiff'),
# Find connected components
conncomp.label(),
morphology.remove_small_holes(min_size=20),
conncomp.regionprops(),
# Classify the connected components by their volume and dimensions
florin.Multithread(
florin.classify(
florin.bounds_classifier(
'cell',
area=(100, 300),
depth=(10, 25),
width=(50, 100),
height=(50, 100)
),
florin.bounds_classifier('vasculature')
)
),
# Reconstruct the labeled volume
florin.reconstruct(),
# Write out the labeled volume
florin.save('labeled.tiff')
)
# Run the pipeline
segmented = pipeline()
In this case, an implicit join after the MPI sub-pipeline merges the segmented tiles into a single volume. Connected components are then computed over the whole volume and classified concurrently using a multithreading model.
Closing Remarks¶
Parallel processing with FLoRIN is as easy as specifying the type of parallel pipeline to use, and the pipeline types are roughly interchangeable (MPI requires launching the script with the standard mpirun or mpiexec invocations, or an equivalent).
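As with the installation commands above, the MPI variant is launched from the command line. Assuming the pipeline script is saved as segment_volume.py (an illustrative filename), a typical launch might be:

mpirun -np 8 python segment_volume.py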
Using Custom Functions in FLoRIN¶
Because no single library can cover the wide array of computer vision methods, FLoRIN comes with utilities for preparing arbitrary functions for use in a pipeline. This section goes over the two cases: functions without extra parameters and functions with parameters.
Single-Argument Functions¶
Functions with a single argument (e.g., those taking a single image or a single numpy array and no other arguments) require no additional preparation. This example shows how to incorporate np.squeeze into a pipeline:
import florin
import florin.conncomp as conncomp
import florin.morphology as morphology
import florin.thresholding as thresholding
import numpy as np
# Set up a serial pipeline
pipeline = florin.Serial(
# Load in the volume from file
florin.load(),
# Tile the volume into overlapping 64 x 64 x 10 subvolumes
florin.tile(shape=(10, 64, 64), stride=(10, 32, 32)),
# Remove any axes with shape 1. Simply pass np.squeeze without invoking
np.squeeze,
# Threshold with NDNT
thresholding.ndnt(shape=(10, 64, 64), threshold=0.3),
# Clean up a little bit
morphology.binary_opening(),
# Save the output to a TIFF stack
florin.save('segmented.tiff')
)
# Run the pipeline
segmented = pipeline()
Note that np.squeeze is not invoked. The function is just passed to the pipeline as-is, and FLoRIN will call it later.
Parameterizing Functions with florinate¶
Functions with parameters can also be used within FLoRIN by wrapping them with florin.florinate. This function records any parameters passed while setting up the pipeline and then automatically applies them when the data comes through (i.e., partial function application):
Decorator

import florin
import florin.conncomp as conncomp
import florin.morphology as morphology
import florin.thresholding as thresholding

# Create the custom function and decorate it with ``florinate``
@florin.florinate
def scale(image, scalar=1):
    """Scale an image's values by some number.

    Parameters
    ----------
    image : array_like
    scalar : int or float

    Returns
    -------
    image * scalar
    """
    return image * scalar

# Set up a serial pipeline
pipeline = florin.Serial(
    # Load in the volume from file
    florin.load(),
    # Tile the volume into overlapping 64 x 64 x 10 subvolumes
    florin.tile(shape=(10, 64, 64), stride=(10, 32, 32)),
    # Add the custom function to the pipeline
    scale(scalar=2.0),
    # Threshold with NDNT
    thresholding.ndnt(shape=(10, 64, 64), threshold=0.3),
    # Clean up a little bit
    morphology.binary_opening(),
    # Save the output to a TIFF stack
    florin.save('segmented.tiff')
)

# Run the pipeline
segmented = pipeline()

In-Line

import florin
import florin.conncomp as conncomp
import florin.morphology as morphology
import florin.thresholding as thresholding

# Create the custom function
def scale(image, scalar=1):
    """Scale an image's values by some number.

    Parameters
    ----------
    image : array_like
    scalar : int or float

    Returns
    -------
    image * scalar
    """
    return image * scalar

# Set up a serial pipeline
pipeline = florin.Serial(
    # Load in the volume from file
    florin.load(),
    # Add the custom function to the pipeline and wrap it in ``florinate``
    florin.florinate(scale)(scalar=2.0),
    # Tile the volume into overlapping 64 x 64 x 10 subvolumes
    florin.tile(shape=(10, 64, 64), stride=(10, 32, 32)),
    # Threshold with NDNT
    thresholding.ndnt(shape=(10, 64, 64), threshold=0.3),
    # Clean up a little bit
    morphology.binary_opening(),
    # Save the output to a TIFF stack
    florin.save('segmented.tiff')
)

# Run the pipeline
segmented = pipeline()
florinate will handle any number of arguments and keyword arguments passed to it, applying them every time the function is called during the pipeline.
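As a concrete illustration, the following sketch wraps a hypothetical helper that takes several keyword arguments; florinate stores them at pipeline-definition time and re-applies them whenever the wrapped function is called on data.

import numpy as np
import florin

# Hypothetical helper (not part of FLoRIN): clip an image to a range and
# rescale it. The extra keyword arguments are captured by florinate when
# the pipeline step is created.
@florin.florinate
def clip_and_scale(image, low=0, high=255, gain=1.0):
    return np.clip(image, low, high) * gain

step = clip_and_scale(low=10, high=200, gain=0.5)
# When the pipeline later calls step(image), it is equivalent to
# clip_and_scale(image, low=10, high=200, gain=0.5).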
Why florinate?¶
The functools module already has an implementation of partial functions (functools.partial), so the natural question is: why reinvent the wheel? When building FLoRIN, we noticed that most computer vision functions take the image as the first argument; functools.partial, however, will only append arguments when called. florinate solves this by prepending the incoming data to the stored arguments when called, lining up with the norm for computer vision APIs. If a custom function takes the image as the last argument, functools.partial can be used in place of florinate with no changes.
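For instance, a hypothetical helper that expects the image as its final argument can be prepared with functools.partial instead:

import numpy as np
from functools import partial

# Hypothetical helper (not a FLoRIN function) that takes the image last.
def blend(alpha, background, image):
    return alpha * image + (1 - alpha) * background

background = np.zeros((10, 64, 64))

# partial appends the call-time argument after the stored ones, so
# blend_step(image) evaluates blend(0.75, background, image).
blend_step = partial(blend, 0.75, background)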
Adding New Pipeline Types to FLoRIN¶
FLoRIN offers a number of pipeline options (Serial, Multithread, Multiprocess, etc.) out of the box, but what if you need a different model? This example will show how to create a custom pipeline class with a different style of execution.
SLURMPipeline¶
Suppose you work on a cluster that uses SLURM and want to submit a job to a queue. This requires a pipeline that:
- Accepts parameters to configure sbatch
- Sets up a job script
- Submits the job script for processing
- Blocks until all jobs are finished
Such a pipeline might look like this:
import re
import subprocess
import time
import dill # dill is installed with florin
from florin.pipelines import Pipeline
class SLURMPipeline(Pipeline):
"""Pipeline that sets up and runs a SLURM job.
Parameters
----------
operations : callables
The functions of the pipeline.
Other Parameters
----------------
Keyword arguments corresponding to SLURM directives, e.g. qos='debug',
time=60, etc. These are dynamically added to the jobscript before
submission.
"""
def __init__(self, *operations, **kwargs):
super(SLURMPipeline, self).__init__(*operations)
self.slurm_directives = kwargs
def run(self, data):
"""Submit and run a pipeline on SLURM.
Parameters
----------
data : list
The input to the first function in the pipeline, e.g. a
filepath for florin.load().
"""
# Serialize this current pipeline
pipeline_path = 'my_pipeline.pkl'
self.dump(pipeline_path)
# Set up the job script. This sets up the shebang header, then
# iterates over the provided #SBATCH directives and sets each one
# up on its own line, then finally invokes srun to deserialize the
# pipeline and run it on the data.
jobscript = '\n'.join(
['#!/usr/bin/env bash'] +
['#SBATCH --{}={}'.format(key, val) for key, val in self.slurm_directives.items()] +
['srun python -m florin.run {} $1'.format(pipeline_path)])
# Dump the jobscript to file
with open('my_jobscript.job', 'w') as f:
f.write(jobscript)
jobids = []
# Submit one job for each data item.
for item in data:
out = subprocess.check_output(['sbatch', 'my_jobscript.job', item])
jobids.append(re.search(r'(\d+)', out.decode()).group())
# Wait until all jobs have completed to exit.
while len(jobids) > 0:
time.sleep(10)
completed = []
for jid in jobids:
out = subprocess.check_output(['sacct', '-j', jid])
if re.search(r'COMPLETED', out.decode()):
completed.append(jid)
for jid in completed:
jobids.remove(jid)
Note that this code is untested and by no means guaranteed to work; it is only meant to be a non-trivial example of what a custom pipeline might look like.
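Once defined, the custom pipeline can be constructed like the built-in ones. The sketch below assumes that calling the pipeline dispatches to run() (not confirmed by these docs) and uses illustrative directive values and file paths:

import florin
import florin.thresholding as thresholding

# Sketch only: the qos/time values and input paths are illustrative.
pipeline = SLURMPipeline(
    florin.load(),
    thresholding.ndnt(shape=(10, 64, 64), threshold=0.3),
    florin.save('segmented.tiff'),
    qos='debug',
    time=60
)

# Submits one SLURM job per input item and blocks until all have finished.
pipeline(['volume_01.h5', 'volume_02.h5'])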
Other Examples¶
Another great source of examples for setting up custom pipelines is the florin.pipelines module, which contains the source code for the officially supported pipelines.
API Documentation¶
- florin: The FLoRIN pipeline for large-scale learning-free computer vision.
- florin.classification: Utilities for classifying connected components.
- florin.closure: Closure decorator for delayed processing.
- florin.compose: Deferred function composition with functools.
- florin.conncomp: Convenience functions for image connected components operations.
- florin.io: I/O functions for loading and saving data in a variety of formats.
- florin.morphology: Convenience functions for image morphological operations.
- florin.ndnt: N-Dimensional Neighborhood Thresholding for any-dimensional data.
- florin.pipelines: Deferred execution pipelines with different computational models.
- florin.reconstruction: Reconstruct connected components as an array of pixel-wise class labels.
- florin.thresholding: Convenience functions for image thresholding operations.
- florin.tiling: Utilities for tiling images and volumes.