Thoughts on extracting waveforms from Kilosort's `temp_wh.dat` #1916

JoeZiminski · 2023-08-10T11:32:50Z

Currently (I believe) the canonical way (e.g. in the docs) to run a sorting pipeline in SpikeInterface is:

1. Generate a recording object for preprocessing.
2. Pass this to a sorter and get a sorting object.
3. Extract waveforms using the preprocessed recording and the sorting output. The waveform data is extracted from the preprocessed recording.

(please let me know if this is wrong off the bat)

One problem with this workflow is that in some instances the sorter performs its own preprocessing; in the case of Kilosort, the result is saved as the temp_wh.dat file. In this case, the waveforms saved by SpikeInterface will not be the same as those shown in Phy, i.e. the data that was actually sorted. I guess this will be similar whenever a sorter does its own preprocessing, although I am not sure of the specifics of other sorters.

I wonder what the best approach for this is (if it is indeed a problem) - could it be solved by documentation? Or, should the sorting object hold a reference to the output of a sorter's preprocessing steps (e.g. temp_wh.dat loaded as an SI recording object in the Kilosort case) which waveform extraction can use under the hood? For example instead of:

recording = load_data(...)
prepro_recording = SomePreproSteps(recording)
sorting = run_sorter(prepro_recording)
we = extract_waveforms(prepro_recording, sorting, ...)

the call to extract_waveforms would be:

we = extract_waveforms(sorting, ...)

and under the hood sorting would extract waveforms from temp_wh.dat, or whatever other sorter-specific preprocessing file there is. Otherwise, it could load the preprocessing file directly from the metadata in the sorting output, or optionally take a recording object as is done now.
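To make the proposed fallback concrete, here is a rough sketch of how the source data could be chosen. Everything in it is hypothetical: `SortingOutput`, `preprocessed_file` and `resolve_waveform_source` are made-up names for illustration and are not part of SpikeInterface, and the priority order (an explicit recording wins over the sorter's own file) is just one possible design.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class SortingOutput:
    """Hypothetical stand-in for a sorting object (not the SpikeInterface class)."""

    sorter_name: str
    # Path to the sorter's own preprocessed data, if the sorter kept one.
    preprocessed_file: Optional[Path] = None


def resolve_waveform_source(sorting, recording=None):
    """Decide which data waveforms would be extracted from.

    Assumed priority: explicit recording, then sorter-preprocessed file.
    """
    if recording is not None:
        return "recording", recording
    path = sorting.preprocessed_file
    if path is not None and Path(path).exists():
        return "sorter_file", Path(path)
    raise ValueError(
        f"No recording given and no preprocessed file found for {sorting.sorter_name}"
    )


# Small demonstration with a dummy file standing in for temp_wh.dat.
with tempfile.TemporaryDirectory() as tmp:
    temp_wh = Path(tmp) / "temp_wh.dat"
    temp_wh.write_bytes(b"\x00\x00")
    sorting = SortingOutput("kilosort", temp_wh)

    kind_default, _ = resolve_waveform_source(sorting)
    kind_explicit, _ = resolve_waveform_source(sorting, recording="my_recording")
```

With this ordering the current behaviour (passing a recording) is unchanged, and the sorter's file is only used when no recording is given.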


alejoe91 · 2023-08-21T12:26:49Z

Hi @JoeZiminski

You can already load the KS pre-processed data as a plain binary file with the si.read_binary() function.
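For illustration, here is a minimal sketch of what reading such a flat binary file amounts to. It uses plain NumPy rather than si.read_binary() itself, and it writes synthetic data instead of opening a real temp_wh.dat. The int16 dtype, the sample-major (samples x channels) layout, the channel count and the spike times are all assumptions; for a real file they would have to be taken from the Kilosort output.

```python
import tempfile
from pathlib import Path

import numpy as np

# Assumed values; for a real temp_wh.dat these come from the sorter output.
n_channels = 4
n_samples = 1000
dtype = np.int16

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "temp_wh.dat"

    # Synthetic stand-in for the preprocessed data, written sample-major.
    rng = np.random.default_rng(0)
    fake = rng.integers(-100, 100, size=(n_samples, n_channels)).astype(dtype)
    fake.tofile(path)

    # The file has no header, so the sample count is inferred from its size.
    bytes_per_sample = n_channels * np.dtype(dtype).itemsize
    n_samples_on_disk = path.stat().st_size // bytes_per_sample

    # Memory-map the file so that only the requested snippets are read.
    traces = np.memmap(
        path, dtype=dtype, mode="r", shape=(n_samples_on_disk, n_channels)
    )

    # Cut a fixed window around each (made-up) spike sample index.
    spike_samples = np.array([100, 500, 900])
    n_before, n_after = 20, 40
    waveforms = np.stack(
        [np.array(traces[s - n_before : s + n_after]) for s in spike_samples]
    )

    # Release the memory map before the temporary folder is deleted.
    del traces
```

The resulting `waveforms` array has shape (spikes, samples, channels). Spikes closer to the file edges than the window size would need extra handling, which is left out here.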

The main "problem" is that the preprocessed data will not contain all channels, as some are removed for lack of activity.
I guess that we could make a KilosortTempWhRecording that loads the preprocessed data and reads the metadata (such as which channels have been removed) from the KS output. Then one can extract waveforms with whatever preferred recording. What do you think?
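As a rough sketch of the metadata step, the snippet below maps the columns of the preprocessed data back to the original channel ids. It assumes the retained channel indices are available as an integer array (for example from a file such as channel_map.npy in the sorter output, which is an assumption here). `map_to_original_channels` is a hypothetical helper, not an existing SpikeInterface function.

```python
import numpy as np


def map_to_original_channels(original_channel_ids, channel_map):
    """Return (kept_ids, removed_ids) given the indices of retained channels.

    kept_ids[i] is the original id of column i in the preprocessed data.
    """
    original_channel_ids = np.asarray(original_channel_ids)
    channel_map = np.asarray(channel_map).ravel()
    kept_ids = original_channel_ids[channel_map]
    removed_ids = np.setdiff1d(original_channel_ids, kept_ids)
    return kept_ids, removed_ids


# Made-up example: five original channels, channel index 2 was dropped.
original_ids = np.array(["ch0", "ch1", "ch2", "ch3", "ch4"])
channel_map = np.array([0, 1, 3, 4])
kept_ids, removed_ids = map_to_original_channels(original_ids, channel_map)
```

A recording wrapper around the preprocessed file could then expose `kept_ids` as its channel ids, so that waveforms extracted from it line up with the original probe layout.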

JoeZiminski · 2023-08-29T12:53:28Z

I think this is a nice idea, happy to open a PR on this this week. Thanks!

JoeZiminski · 2024-05-13T20:00:15Z

Closed further to discussion on #1986 and #1954 as it is not very feasible.

alejoe91 added the question General question regarding SI label Aug 21, 2023

JoeZiminski mentioned this issue Aug 29, 2023

Cleanup of kilosort temporary files causing issues with Phy #1896

Closed

rat-h mentioned this issue Apr 1, 2024

Import temp_wh.dat as recording object #2650

Closed

JoeZiminski closed this as completed May 13, 2024
