FLoRIN

FLoRIN, the Flexible Learning-free Reconstruction of Neural Volumes pipeline, is a pipeline for large-scale parallel and distributed computer vision. Offering easy setup and access to hierarchical parallelism, FLoRIN is well suited to scaling computer vision to HPC systems.

Originally, this project was our response to the question of how to segment and reconstruct neural microscopy (e.g., micro-CT tomography, low-resolution electron microscopy, and fluorescence microscopy) when large amounts of training data are not available to train a neural network. We tackled this problem by revisiting classical computer vision methods, eventually developing the N-Dimensional Neighborhood Thresholding (NDNT) algorithm as a modern update to integral image-based thresholding. FLoRIN has since been shown to be a fast, robust segmentation and reconstruction engine across different imaging modalities and datasets.
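
For readers unfamiliar with integral image-based thresholding, the following is a minimal 2D sketch of the classical technique that NDNT builds on. It is not the NDNT implementation shipped with FLoRIN: the helper names, the square neighborhood given by radius, and the choice to mark pixels darker than (1 - t) times the local mean are all assumptions made for illustration.

```python
def integral_image(img):
    """Return a summed-area table with an extra leading row and column of zeros."""
    rows, cols = len(img), len(img[0])
    sat = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0
        for c in range(cols):
            row_sum += img[r][c]
            sat[r + 1][c + 1] = sat[r][c + 1] + row_sum
    return sat


def neighborhood_threshold(img, radius=1, t=0.3):
    """Mark pixels that are darker than (1 - t) times their local mean.

    The polarity (dark pixels become foreground) is an assumption of
    this sketch.
    """
    rows, cols = len(img), len(img[0])
    sat = integral_image(img)
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Clip the neighborhood to the image borders
            r0, r1 = max(r - radius, 0), min(r + radius + 1, rows)
            c0, c1 = max(c - radius, 0), min(c + radius + 1, cols)
            count = (r1 - r0) * (c1 - c0)
            # Four lookups give the neighborhood sum in constant time
            total = sat[r1][c1] - sat[r0][c1] - sat[r1][c0] + sat[r0][c0]
            if img[r][c] * count < total * (1 - t):
                out[r][c] = 1
    return out


image = [
    [10, 10, 10],
    [10, 1, 10],
    [10, 10, 10],
]
mask = neighborhood_threshold(image, radius=1, t=0.3)
```

The summed-area table is what keeps the cost per pixel independent of the neighborhood size, which is the property that makes this family of methods attractive for large volumes.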

This package implements the NDNT algorithm, as well as a straightforward API for mixed serial, parallel, and distributed computer vision. These docs provide examples of how to use FLoRIN with various mixtures of serial and parallel processing and how to customize the FLoRIN pipeline with new functions and features.

Installation

pip

pip install florin

anaconda

conda install -c jeffkinnison florin

Publications

  1. Shahbazi, Ali, Jeffery Kinnison, Rafael Vescovi, Ming Du, Robert Hill, Maximilian Jösch, Marc Takeno et al. “Flexible Learning-Free Segmentation and Reconstruction of Neural Volumes.” Scientific Reports 8, no. 1 (2018): 14247.

Installation

FLoRIN can be installed with all of its Python dependencies through the Python Package Index or Anaconda.

PyPI

pip install florin

Anaconda

conda install -c jeffkinnison florin

Python Dependencies

  • Python 3.4+
  • numpy
  • scipy
  • scikit-image
  • pathos
  • mpi4py
  • h5py

Examples using FLoRIN

A First Example

This example will walk through basic FLoRIN usage segmenting and reconstructing a small X-Ray volume.

Segmenting

The following code sets up a serial pipeline to segment the image:

import florin
import florin.conncomp as conncomp
import florin.morphology as morphology
import florin.thresholding as thresholding

# Set up a serial pipeline
pipeline = florin.Serial(
    # Load in the volume from file
    florin.load(),

    # Tile the volume into overlapping 64 x 64 x 10 subvolumes
    florin.tile(shape=(10, 64, 64), stride=(10, 32, 32)),

    # Threshold with NDNT
    thresholding.ndnt(shape=(10, 64, 64), threshold=0.3),

    # Clean up a little bit
    morphology.binary_opening(),

    # Save the output to a TIFF stack
    florin.save('segmented.tiff')
)

# Run the pipeline
segmented = pipeline()

At the end of the pipeline, a TIFF stack with the binary segmentation will be output.
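
To make the tiling step concrete, the sketch below enumerates where each tile would start for a given tile shape and stride. It is not FLoRIN's implementation: tile_origins is a hypothetical helper, the (20, 128, 128) volume size is made up, and the sketch assumes that tiles which would run past the edge of the volume are simply not emitted.

```python
from itertools import product


def tile_origins(volume_shape, tile_shape, stride):
    """Return the starting index of every tile, one tuple per tile."""
    axes = []
    for size, tile, step in zip(volume_shape, tile_shape, stride):
        # The last valid start keeps the whole tile inside the volume
        last = max(size - tile, 0)
        axes.append(range(0, last + 1, step))
    return list(product(*axes))


# Same tile shape and stride as the pipeline above; the volume size is
# a made-up value for illustration.
origins = tile_origins((20, 128, 128), (10, 64, 64), (10, 32, 32))
print(len(origins))
print(origins[0], origins[-1])
```

Because the stride is half the tile size along the last two axes, neighboring tiles share half of their extent along those axes, while tiles along the first axis do not overlap.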

Weak Classification

After we have the binary mask, we want to determine what type of structure each object is. The previous pipeline can be extended to perform weak classification by user-defined bounds on the segmented objects:

import florin
import florin.conncomp as conncomp
import florin.morphology as morphology
import florin.thresholding as thresholding

# Set up a serial pipeline
pipeline = florin.Serial(
    # Load in the volume from file
    florin.load(),

    # Tile the volume into overlapping 64 x 64 x 10 subvolumes
    florin.tile(shape=(10, 64, 64), stride=(10, 32, 32)),

    # Threshold with NDNT
    thresholding.ndnt(shape=(10, 64, 64), threshold=0.3),

    # Clean up a little bit
    morphology.binary_opening(),

    # Save the output to a TIFF stack
    florin.save('segmented.tiff'),

    # Find connected components
    conncomp.label(),
    morphology.remove_small_holes(min_size=20),
    conncomp.regionprops(),

    # Classify the connected components by their volume and dimensions
    florin.classify(
        florin.bounds_classifier(
            'cell',
            area=(100, 300),
            depth=(10, 25),
            width=(50, 100),
            height=(50, 100)
        ),
        florin.bounds_classifier('vasculature')
    ),

    # Reconstruct the labeled volume
    florin.reconstruct(),

    # Write out the labeled volume
    florin.save('labeled.tiff')
)

# Run the pipeline
segmented = pipeline()

This pipeline saves both the binary segmentation and the labeled volume, in which each class is represented by a different color.
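
The idea behind classification by bounds can be sketched in a few lines of plain Python. This is an illustration only, not the code behind florin.classify or florin.bounds_classifier: the helper names are hypothetical, the bounds are assumed to be inclusive, and the first matching class is assumed to win, so a classifier with no bounds acts as a catch-all.

```python
def make_bounds_classifier(label, **bounds):
    """Return (label, predicate) where predicate checks each bounded property."""
    def matches(props):
        return all(low <= props[name] <= high
                   for name, (low, high) in bounds.items())
    return label, matches


def classify(props, classifiers):
    """Return the label of the first classifier whose bounds all hold."""
    for label, matches in classifiers:
        if matches(props):
            return label
    return None


classifiers = [
    make_bounds_classifier('cell', area=(100, 300), depth=(10, 25),
                           width=(50, 100), height=(50, 100)),
    # No bounds: matches anything that was not classified earlier
    make_bounds_classifier('vasculature'),
]

small_blob = {'area': 150, 'depth': 12, 'width': 60, 'height': 70}
long_object = {'area': 5000, 'depth': 40, 'width': 300, 'height': 20}
labels = [classify(small_blob, classifiers), classify(long_object, classifiers)]
```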

Closing Remarks

Rolling out a basic FLoRIN pipeline is relatively easy (20 lines of code without the comments and whitespace). This example runs everything on a single core, but the next example demonstrates parallel processing, which is just as easy to set up.

Parallel Processing Pipelines

This example will show how to convert the previous example to perform multiprocessing on the tiles and connected components created during segmentation and weak classification, respectively.

Parallelism

Parallel processing can be invoked by creating sub-pipelines around commands that will receive multiple inputs.

Multithreading

import florin
import florin.conncomp as conncomp
import florin.morphology as morphology
import florin.thresholding as thresholding

# Set up a serial pipeline
pipeline = florin.Serial(
    # Load in the volume from file
    florin.load(),

    # Tile the volume into overlapping 64 x 64 x 10 subvolumes
    florin.tile(shape=(10, 64, 64), stride=(10, 32, 32)),

    florin.Multithread(
        # Threshold with NDNT
        thresholding.ndnt(shape=(10, 64, 64), threshold=0.3),

        # Clean up a little bit
        morphology.binary_opening()
    ),

    # Save the output to a TIFF stack
    florin.save('segmented.tiff'),

    # Find connected components
    conncomp.label(),
    morphology.remove_small_holes(min_size=20),
    conncomp.regionprops(),

    # Classify the connected components by their volume and dimensions
    florin.Multithread(
        florin.classify(
            florin.bounds_classifier(
                'cell',
                area=(100, 300),
                depth=(10, 25),
                width=(50, 100),
                height=(50, 100)
            ),
            florin.bounds_classifier('vasculature')
        )
    ),

    # Reconstruct the labeled volume
    florin.reconstruct(),

    # Write out the labeled volume
    florin.save('labeled.tiff')
)

# Run the pipeline
segmented = pipeline()

Multiprocessing

import florin
import florin.conncomp as conncomp
import florin.morphology as morphology
import florin.thresholding as thresholding

# Set up a serial pipeline
pipeline = florin.Serial(
    # Load in the volume from file
    florin.load(),

    # Tile the volume into overlapping 64 x 64 x 10 subvolumes
    florin.tile(shape=(10, 64, 64), stride=(10, 32, 32)),

    florin.Multiprocess(
        # Threshold with NDNT
        thresholding.ndnt(shape=(10, 64, 64), threshold=0.3),

        # Clean up a little bit
        morphology.binary_opening()
    ),

    # Save the output to a TIFF stack
    florin.save('segmented.tiff'),

    # Find connected components
    conncomp.label(),
    morphology.remove_small_holes(min_size=20),
    conncomp.regionprops(),

    # Classify the connected components by their volume and dimensions
    florin.Multiprocess(
        florin.classify(
            florin.bounds_classifier(
                'cell',
                area=(100, 300),
                depth=(10, 25),
                width=(50, 100),
                height=(50, 100)
            ),
            florin.bounds_classifier('vasculature')
        )
    ),

    # Reconstruct the labeled volume
    florin.reconstruct(),

    # Write out the labeled volume
    florin.save('labeled.tiff')
)

# Run the pipeline
segmented = pipeline()

MPI

import florin
import florin.conncomp as conncomp
import florin.morphology as morphology
import florin.thresholding as thresholding

# Set up a serial pipeline
pipeline = florin.Serial(
    # Load in the volume from file
    florin.load(),

    # Tile the volume into overlapping 64 x 64 x 10 subvolumes
    florin.tile(shape=(10, 64, 64), stride=(10, 32, 32)),

    florin.MPI(
        # Threshold with NDNT
        thresholding.ndnt(shape=(10, 64, 64), threshold=0.3),

        # Clean up a little bit
        morphology.binary_opening()
    ),

    # Save the output to a TIFF stack
    florin.save('segmented.tiff'),

    # Find connected components
    conncomp.label(),
    morphology.remove_small_holes(min_size=20),
    conncomp.regionprops(),

    # Classify the connected components by their volume and dimensions
    florin.MPI(
        florin.classify(
            florin.bounds_classifier(
                'cell',
                area=(100, 300),
                depth=(10, 25),
                width=(50, 100),
                height=(50, 100)
            ),
            florin.bounds_classifier('vasculature')
        )
    ),

    # Reconstruct the labeled volume
    florin.reconstruct(),

    # Write out the labeled volume
    florin.save('labeled.tiff')
)

# Run the pipeline
segmented = pipeline()

All of these examples scale to the number of available cores (or MPI ranks in the MPI version), and can be parameterized to use a specific number when the sub-pipelines are created.
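
The general pattern behind these sub-pipelines, mapping one function over many tiles with a configurable number of workers, can be sketched with the standard library. This sketch uses concurrent.futures rather than FLoRIN's own machinery; run_parallel and process_tile are hypothetical names, and the per-tile work is a stand-in for thresholding.

```python
from concurrent.futures import ThreadPoolExecutor


def process_tile(tile):
    """Stand-in for per-tile work: binarize values above the tile mean."""
    mean = sum(tile) / len(tile)
    return [1 if value > mean else 0 for value in tile]


def run_parallel(func, items, max_workers=None):
    """Apply func to every item using a pool of worker threads.

    max_workers=None lets the executor pick a default based on the
    machine; pass an integer to pin the number of workers.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as the inputs
        return list(pool.map(func, items))


tiles = [[1, 5, 9], [2, 2, 8], [4, 4, 4]]
results = run_parallel(process_tile, tiles, max_workers=2)
```

Results come back in input order, which is what makes a later join of the processed tiles straightforward.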

Mixed Parallelism

Using the sub-pipeline model in the above example, it is possible to mix parallel processing paradigms. For example, segmenting tiles with NDNT uses vectorized operations and may be better suited to multi-node parallelism with MPI, but classification is more lightweight and can be carried out in threads. This sort of a pipeline would look like:

import florin
import florin.conncomp as conncomp
import florin.morphology as morphology
import florin.thresholding as thresholding

# Set up a serial pipeline
pipeline = florin.Serial(
    # Load in the volume from file
    florin.load(),

    # Tile the volume into overlapping 64 x 64 x 10 subvolumes
    florin.tile(shape=(10, 64, 64), stride=(10, 32, 32)),

    florin.MPI(
        # Threshold with NDNT
        thresholding.ndnt(shape=(10, 64, 64), threshold=0.3),

        # Clean up a little bit
        morphology.binary_opening()
    ),

    # Save the output to a TIFF stack
    florin.save('segmented.tiff'),

    # Find connected components
    conncomp.label(),
    morphology.remove_small_holes(min_size=20),
    conncomp.regionprops(),

    # Classify the connected components by their volume and dimensions
    florin.Multithread(
        florin.classify(
            florin.bounds_classifier(
                'cell',
                area=(100, 300),
                depth=(10, 25),
                width=(50, 100),
                height=(50, 100)
            ),
            florin.bounds_classifier('vasculature')
        )
    ),

    # Reconstruct the labeled volume
    florin.reconstruct(),

    # Write out the labeled volume
    florin.save('labeled.tiff')
)

# Run the pipeline
segmented = pipeline()

In this case, an implicit join after the MPI pipeline merges the segmented tiles into a single volume. Connected components are then computed over the whole volume and classified concurrently using a multithreading model.

Closing Remarks

Parallel processing with FLoRIN is as easy as specifying the type of parallel pipeline to use, and the pipeline types are roughly interchangeable (MPI pipelines must be launched with the standard mpirun or mpiexec invocations, or an equivalent).

Using Custom Functions in FLoRIN

Because of the wide array of computer vision methods, FLoRIN comes with utilities to prepare arbitrary functions for use in a pipeline. This section covers the two cases for preparing functions: without parameters, and with parameters.

Single-Argument Functions

Functions with a single argument (e.g., those taking a single image or a single numpy array and no other arguments) require no additional preparation. This example shows how to incorporate np.squeeze into a pipeline:

import florin
import florin.conncomp as conncomp
import florin.morphology as morphology
import florin.thresholding as thresholding

import numpy as np

# Set up a serial pipeline
pipeline = florin.Serial(
    # Load in the volume from file
    florin.load(),

    # Tile the volume into overlapping 64 x 64 x 10 subvolumes
    florin.tile(shape=(10, 64, 64), stride=(10, 32, 32)),

    # Remove any axes with shape 1. Pass np.squeeze itself, without calling it
    np.squeeze,

    # Threshold with NDNT
    thresholding.ndnt(shape=(10, 64, 64), threshold=0.3),

    # Clean up a little bit
    morphology.binary_opening(),

    # Save the output to a TIFF stack
    florin.save('segmented.tiff')
)

# Run the pipeline
segmented = pipeline()

Note that np.squeeze is not invoked. The function is just passed to the pipeline as-is, and FLoRIN will call it later.
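
The reason an uninvoked function works can be seen in a minimal model of a serial pipeline: each operation is a callable that receives the output of the previous one. The run_serial helper below is a hypothetical stand-in for illustration, not how florin.Serial is implemented.

```python
from functools import reduce


def run_serial(data, *operations):
    """Feed data through each operation in order and return the final value."""
    return reduce(lambda value, op: op(value), operations, data)


# Functions are passed as objects; run_serial calls them later.
cleaned = run_serial('  FLoRIN  ', str.strip, str.lower)
count = run_serial([3, 1, 2], sorted, tuple, len)
```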

Parameterizing Functions with florinate

Functions with parameters can also be used within FLoRIN by wrapping them with florin.florinate. This function records any parameters passed while setting up the pipeline and then automatically applies them when the data comes through (i.e. partial function application):

Decorator

import florin
import florin.conncomp as conncomp
import florin.morphology as morphology
import florin.thresholding as thresholding

# Create the custom function and decorate it with ``florinate``
@florin.florinate
def scale(image, scalar=1):
    """Scale an image's values by some number.

    Parameters
    ----------
    image : array_like
    scalar : int or float

    Returns
    -------
    image * scalar
    """
    return image * scalar

# Set up a serial pipeline
pipeline = florin.Serial(
    # Load in the volume from file
    florin.load(),

    # Tile the volume into overlapping 64 x 64 x 10 subvolumes
    florin.tile(shape=(10, 64, 64), stride=(10, 32, 32)),

    # Add the custom function to the pipeline
    scale(scalar=2.0),

    # Threshold with NDNT
    thresholding.ndnt(shape=(10, 64, 64), threshold=0.3),

    # Clean up a little bit
    morphology.binary_opening(),

    # Save the output to a TIFF stack
    florin.save('segmented.tiff')
)

# Run the pipeline
segmented = pipeline()

In-Line

import florin
import florin.conncomp as conncomp
import florin.morphology as morphology
import florin.thresholding as thresholding

# Create the custom function
def scale(image, scalar=1):
    """Scale an image's values by some number.

    Parameters
    ----------
    image : array_like
    scalar : int or float

    Returns
    -------
    image * scalar
    """
    return image * scalar

# Set up a serial pipeline
pipeline = florin.Serial(
    # Load in the volume from file
    florin.load(),

    # Tile the volume into overlapping 64 x 64 x 10 subvolumes
    florin.tile(shape=(10, 64, 64), stride=(10, 32, 32)),

    # Wrap the custom function in ``florinate`` and add it to the pipeline
    florin.florinate(scale)(scalar=2.0),

    # Threshold with NDNT
    thresholding.ndnt(shape=(10, 64, 64), threshold=0.3),

    # Clean up a little bit
    morphology.binary_opening(),

    # Save the output to a TIFF stack
    florin.save('segmented.tiff')
)

# Run the pipeline
segmented = pipeline()

florinate will handle any number of arguments and keyword arguments passed to it, applying them every time the function is called during the pipeline.

Why florinate?

The functools module already has an implementation of partial functions (functools.partial), so the natural question is: why reinvent the wheel? When building FLoRIN, we noticed that most computer vision functions take the image as the first argument. functools.partial, however, places the positional arguments supplied at call time after the stored ones, so the image would arrive last. florinate solves this by prepending the call-time argument(s), lining up with the norm for computer vision APIs.

If a custom function takes the image as the last argument, functools.partial can be used in place of florinate with no changes.
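
The difference in argument order can be demonstrated with a small stand-in. The prepend_partial helper below is a hypothetical sketch of the prepending behavior described here, not the source of florin.florinate, and the list-based scale and subtract functions are made up for the example.

```python
from functools import partial


def prepend_partial(func, *args, **kwargs):
    """Store arguments now; place call-time arguments (the image) first later."""
    def wrapper(*call_args):
        return func(*(call_args + args), **kwargs)
    return wrapper


def scale(image, scalar=1):
    return [value * scalar for value in image]


def subtract(image, offset):
    return [value - offset for value in image]


# Keyword arguments work with either approach
doubled = prepend_partial(scale, scalar=2)([1, 2, 3])

# Stored positional arguments land after the image: subtract([1, 2, 3], 1)
shifted = prepend_partial(subtract, 1)([1, 2, 3])

# functools.partial puts the stored argument first: subtract(1, [1, 2, 3]),
# which fails here because the integer 1 is treated as the image.
try:
    partial(subtract, 1)([1, 2, 3])
    partial_failed = False
except TypeError:
    partial_failed = True
```

The two approaches only diverge when parameters are stored positionally; when every stored parameter is passed by keyword, as with scalar=2 above, the image still arrives as the first positional argument either way.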

Adding New Pipeline Types to FLoRIN

FLoRIN offers a number of pipeline options (Serial, Multithread, Multiprocess, etc.) out of the box, but what if you need a different model? This example will show how to create a custom pipeline class with a different style of execution.

SLURMPipeline

Suppose you work on a cluster that uses SLURM and want to submit a job to a queue. This requires a pipeline that

  1. Accepts parameters to configure sbatch
  2. Sets up a job script
  3. Submits the job script for processing
  4. Blocks until all jobs are finished

Such a pipeline may look like this

import re
import subprocess
import time

import dill  # dill is installed with florin

from florin.pipelines import Pipeline

class SLURMPipeline(Pipeline):
    """Pipeline that sets up and runs a SLURM job.

    Parameters
    ----------
    operations : callables
        The functions of the pipeline.

    Other Parameters
    ----------------
    Keyword arguments corresponding to SLURM directives, e.g. qos='debug',
    time=60, etc. These are dynamically added to the jobscript before
    submission.
    """

    def __init__(self, *operations, **kwargs):
        super(SLURMPipeline, self).__init__(*operations)
        self.slurm_directives = kwargs

    def run(self, data):
        """Submit and run a pipeline on SLURM.

        Parameters
        ----------
        data : list
            The input to the first function in the pipeline, e.g. a
            filepath for florin.load().
        """
        # Serialize this current pipeline
        pipeline_path = 'my_pipeline.pkl'
        self.dump(pipeline_path)

        # Set up the job script. This writes the shebang header, then
        # iterates over the provided #SBATCH directives and puts each one
        # on its own line, then finally invokes srun to deserialize the
        # pipeline and run it on the data.
        jobscript = '\n'.join(
            ['#!/usr/bin/env bash'] +
            ['#SBATCH --{}={}'.format(key, val) for key, val in self.slurm_directives.items()] +
            ['srun python -m florin.run {} $1'.format(pipeline_path)])

        # Dump the jobscript to file
        with open('my_jobscript.job', 'w') as f:
            f.write(jobscript)

        jobids = []

        # Submit one job for each data item. check_output returns bytes,
        # so decode the output before searching it for the job ID.
        for item in data:
            out = subprocess.check_output(['sbatch', 'my_jobscript.job', item])
            jobids.append(re.search(r'(\d+)', out.decode()).group())

        # Wait until all jobs have completed to exit.
        while len(jobids) > 0:
            time.sleep(10)
            completed = []

            for jid in jobids:
                out = subprocess.check_output(['sacct', '-j', jid])
                if re.search(r'(COMPLETE)', out.decode()):
                    completed.append(jid)

            for jid in completed:
                jobids.remove(jid)

Note that this code is untested and by no means guaranteed to work; it is only meant to be a non-trivial example of what a custom pipeline may look like.
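
The string handling in such a pipeline can be checked without access to a cluster by pulling it into plain functions. The sketch below is one way to do that. build_jobscript and parse_job_id are hypothetical helpers, the florin.run invocation is copied from the example above and is not verified here, and the sample sbatch output assumes the usual "Submitted batch job <id>" message.

```python
import re


def build_jobscript(directives, pipeline_path):
    """Return the text of a SLURM job script for one serialized pipeline."""
    lines = ['#!/usr/bin/env bash']
    # Sort the directives so the script is reproducible across runs
    lines += ['#SBATCH --{}={}'.format(key, val)
              for key, val in sorted(directives.items())]
    # "$1" is the data item handed to sbatch after the script name
    lines.append('srun python -m florin.run {} "$1"'.format(pipeline_path))
    return '\n'.join(lines) + '\n'


def parse_job_id(sbatch_output):
    """Extract the job ID from sbatch output, or None if it is absent."""
    match = re.search(r'Submitted batch job (\d+)', sbatch_output)
    return match.group(1) if match else None


script = build_jobscript({'time': 60, 'qos': 'debug'}, 'my_pipeline.pkl')
job_id = parse_job_id('Submitted batch job 12345\n')
```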

Other Examples

Another great source of examples for setting up custom pipelines is the florin.pipelines module, which contains the source code for the officially supported pipelines.

API Documentation

  • florin: The FLoRIN pipeline for large-scale learning-free computer vision.
  • florin.classification: Utilities for classifying connected components.
  • florin.closure: Closure decorator for delayed processing.
  • florin.compose: Deferred function composition with functools.
  • florin.conncomp: Convenience functions for image connected components operations.
  • florin.io: I/O functions for loading and saving data in a variety of formats.
  • florin.morphology: Convenience functions for image morphological operations.
  • florin.ndnt: N-Dimensional Neighborhood Thresholding for any-dimensional data.
  • florin.pipelines: Deferred execution pipelines with different computational models.
  • florin.reconstruction: Reconstruct connected components as an array of pixel-wise class labels.
  • florin.thresholding: Convenience functions for image thresholding operations.
  • florin.tiling: Utilities for tiling images and volumes.