Add PreprocessingPipeline #3438

chrishalcrow · 2024-09-25T10:13:57Z

A proposal to add a PreprocessingPipeline class, which contains ordered preprocessing steps and their kwargs in a dictionary.

You can apply the class to a recording, or use the helper function create_preprocessed to make a preprocessed recording:

```python
# Ordered preprocessing steps, each mapped to its kwargs
preprocessor_dict = {'bandpass_filter': {'freq_max': 3000}, 'common_reference': {}}

# Option 1: build the pipeline, then apply it to a recording
from spikeinterface.preprocessing import PreprocessingPipeline
pipeline = PreprocessingPipeline(preprocessor_dict)
preprocessed_recording = pipeline.apply_to(recording)

# Option 2: use the helper function
from spikeinterface.preprocessing import create_preprocessed
preprocessed_recording = create_preprocessed(recording, preprocessor_dict)
```
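To illustrate the idea of applying an ordered dictionary of steps, here is a minimal, dependency-free sketch. It is not the implementation in this PR: `SketchPipeline`, `NAME_TO_FUNCTION`, `scale` and `subtract_mean` are hypothetical stand-ins that operate on plain lists rather than recordings.

```python
# Hypothetical stand-in "preprocessing" functions working on a list of samples.
# They are unrelated to the spikeinterface implementations.
def scale(samples, factor=1.0):
    return [s * factor for s in samples]


def subtract_mean(samples):
    mean = sum(samples) / len(samples)
    return [s - mean for s in samples]


# Hypothetical registry mapping step names to callables.
NAME_TO_FUNCTION = {"scale": scale, "subtract_mean": subtract_mean}


class SketchPipeline:
    """Holds ordered steps and their kwargs, and applies them in sequence."""

    def __init__(self, step_dict):
        unknown = [name for name in step_dict if name not in NAME_TO_FUNCTION]
        if unknown:
            raise ValueError(f"Unknown preprocessing steps: {unknown}")
        # dicts preserve insertion order (Python 3.7+), so step order is kept
        self.step_dict = dict(step_dict)

    def apply_to(self, data):
        # Each step returns a new object, so the input is never modified in place
        for name, kwargs in self.step_dict.items():
            data = NAME_TO_FUNCTION[name](data, **kwargs)
        return data

    def __repr__(self):
        steps = ", ".join(f"{name}({kwargs})" for name, kwargs in self.step_dict.items())
        return f"SketchPipeline: {steps}"


raw = [1.0, 2.0, 3.0]
pipeline = SketchPipeline({"scale": {"factor": 2.0}, "subtract_mean": {}})
processed = pipeline.apply_to(raw)
```

Because `apply_to` returns a new object and leaves `raw` untouched, the name does not suggest an in-place operation.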

This also adds a function which takes a recording.json provenance file and makes a preprocessor_dict:

```python
from spikeinterface.preprocessing import get_preprocessing_dict_from_json
my_dict = get_preprocessing_dict_from_json('/path/to/recording.json')
```
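As a rough sketch of how a provenance file could be turned into such a dictionary, the following walks a nested structure from the outermost step inwards. It assumes a layout where each step is stored as `{"class": ..., "kwargs": {...}}` and the parent recording sits under `kwargs["recording"]`. That layout, the class paths in the example, and the helper names `camel_to_snake` and `steps_from_provenance` are assumptions for illustration; the function added in this PR may work differently.

```python
import json
import re


def camel_to_snake(name):
    # "BandpassFilter" -> "bandpass_filter"
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def steps_from_provenance(node):
    """Collect (step name, kwargs) pairs from a nested provenance dict.

    Assumed layout: every preprocessing step keeps its parent under
    kwargs["recording"]; the raw recording at the bottom has no such key.
    """
    steps = []
    while isinstance(node.get("kwargs"), dict) and "recording" in node["kwargs"]:
        class_name = node["class"].split(".")[-1]
        if class_name.endswith("Recording"):
            class_name = class_name[: -len("Recording")]
        kwargs = {k: v for k, v in node["kwargs"].items() if k != "recording"}
        steps.append((camel_to_snake(class_name), kwargs))
        node = node["kwargs"]["recording"]
    # The file is nested outermost-first, so reverse to get application order.
    # Note: a plain dict cannot hold the same step name twice.
    return dict(reversed(steps))


# Illustrative provenance content (made-up class paths and file names)
example_json = """
{
  "class": "example.CommonReferenceRecording",
  "kwargs": {
    "operator": "median",
    "recording": {
      "class": "example.BandpassFilterRecording",
      "kwargs": {
        "freq_max": 3000,
        "recording": {"class": "example.RawRecording", "kwargs": {"file_paths": ["raw.bin"]}}
      }
    }
  }
}
"""
example_dict = steps_from_provenance(json.loads(example_json))
```

The resulting dictionary contains only step names and kwargs, with no file paths from the original recording.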

This allows for some cool things:

1. Users can pass a single dictionary to construct a preprocessed recording (as above). This completes the “dictionary workflow”, since you can already use dicts for sorting (in run_sorter_jobs) and for postprocessing (in compute).
2. Users can easily visualise their preprocessing pipeline using the repr, including an HTML repr in Jupyter notebooks.
3. It increases portability between labs, since you can reconstruct the preprocessing steps from the recording.json file without the original recording (and without worrying about paths).

The repr currently looks like this:

Note that point 3 only works for preprocessing steps that are in some sense “global”, i.e. that can be applied to any recording. Not every preprocessing step qualifies: for example, interpolate_bad_channels needs bad_channel_ids, which are recording-dependent. However, many of these functions could be modified to apply more globally: e.g. if bad_channel_ids is None, interpolate_bad_channels could detect the bad channels and then interpolate them. That version could be applied to any recording, so it would be “global”.
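The “detect if not given” pattern could look roughly like the sketch below. It uses toy stand-ins on a plain dict of channel traces; `detect_flat_channels`, `interpolate_channels` and `interpolate_bad_channels_global` are hypothetical names, and the detection and interpolation rules here are deliberately simplistic and are not those used by spikeinterface.

```python
def detect_flat_channels(channels):
    """Toy detector: flag channels whose samples are all identical."""
    return [cid for cid, samples in channels.items() if len(set(samples)) <= 1]


def interpolate_channels(channels, bad_ids):
    """Toy interpolation: replace each bad channel by the mean of the good ones."""
    good = [samples for cid, samples in channels.items() if cid not in bad_ids]
    if not good:
        raise ValueError("No good channels left to interpolate from")
    mean_trace = [sum(values) / len(values) for values in zip(*good)]
    return {
        cid: (list(mean_trace) if cid in bad_ids else list(samples))
        for cid, samples in channels.items()
    }


def interpolate_bad_channels_global(channels, bad_channel_ids=None):
    # With no ids given, detect them from the data, so the same call
    # (and the same kwargs dict) can be reused on any recording.
    if bad_channel_ids is None:
        bad_channel_ids = detect_flat_channels(channels)
    return interpolate_channels(channels, bad_channel_ids)


channels = {"ch0": [1.0, 3.0], "ch1": [0.0, 0.0], "ch2": [3.0, 5.0]}
fixed = interpolate_bad_channels_global(channels)
```

With this pattern, a pipeline entry with empty kwargs stays valid for any recording, while recording-specific ids can still be passed explicitly.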

It’s important to get the names right. I read this: https://melevir.medium.com/python-functions-naming-tips-376f12549f9. I think it’s important that create_preprocessed doesn’t sound in-place, given the number of problems we’ve had with set_probe. Hence I’m against something like apply_preprocessing(recording), and would rather have make, create, construct, produce or similar in the function name. I also like the idea (from the article) that you don’t need to include e.g. recording in the name if recording is a required argument. Hence I prefer something like my_pipeline.apply_to(recording) to something like my_pipeline.apply_pipeline_to_recording(recording).

To do:

Tests
~~Add "allowed preprocessing steps" for get_preprocessing_dict_from_json~~

chrishalcrow added 2 commits September 25, 2024 11:06

add PreprocessingPipeline

d7bb297

Merge branch 'main' into preprocessing-pipeline

d0e74f7

chrishalcrow added the enhancement (New feature or request) and preprocessing (Related to preprocessing module) labels Sep 25, 2024

alejoe91 modified the milestone: 0.101.2 Oct 1, 2024

chrishalcrow added 2 commits December 6, 2024 15:57

add motion correct and nice html repr

8252f8f

add preprocessing names_to_funcitons dict

8436eb2
