Thoughts on extracting waveforms from Kilosort's temp_wh.dat #1916

Closed
JoeZiminski opened this issue Aug 10, 2023 · 3 comments
Labels
question General question regarding SI

Comments

@JoeZiminski
Collaborator

JoeZiminski commented Aug 10, 2023

Currently (I believe) the canonical way (e.g. in the docs) to run a sorting pipeline in SpikeInterface is:

  1. generate a recording object for preprocessing
  2. pass this to a sorter and get a sorting object
  3. extract waveforms using the preprocessed recording and the sorting output. The waveform data is extracted from the preprocessed recording.

(please let me know if this is wrong off the bat)

One problem with this workflow is that in some instances the sorter performs its own preprocessing; in the case of Kilosort, the result is saved as the temp_wh.dat file. In this case, the waveforms saved by SpikeInterface will not be the same as those shown in Phy, i.e. the data that was actually sorted. I guess this will be similar whenever a sorter does its own preprocessing, although I am not sure of the specifics of other sorters.

I wonder what the best approach for this is (if it is indeed a problem). Could it be solved by documentation? Or should the sorting object hold a reference to the output of a sorter's preprocessing steps (e.g. temp_wh.dat loaded as an SI recording object in the Kilosort case), which waveform extraction can use under the hood? For example, instead of:

recording = load_data(...)
prepro_recording = SomePreproSteps(recording)
sorting = run_sorter(prepro_recording)
we = extract_waveforms(prepro_recording, sorting, ...)

the call to extract_waveforms would be:

we = extract_waveforms(sorting, ...)

and under the hood sorting would extract waveforms from temp_wh.dat, or from whatever other sorter-specific preprocessing file there is. Otherwise, it could load the preprocessing file directly from the metadata in the sorting output, or optionally take a recording object as is done now.
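
To make the proposal concrete, here is a rough sketch of the lookup it implies. Everything in it is hypothetical: `SortingSketch`, `preprocessed_file` and `resolve_waveform_source` are illustration-only names, not SpikeInterface classes or functions, and letting an explicitly passed recording take priority over the sorter's own file is just one possible design choice.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple


@dataclass
class SortingSketch:
    """Hypothetical stand-in for a sorting object (not a SpikeInterface class)."""
    sorter_name: str
    # Hypothetical field: path to the sorter's own preprocessed data,
    # e.g. Kilosort's temp_wh.dat, if the sorter wrote one.
    preprocessed_file: Optional[str] = None


def resolve_waveform_source(sorting: SortingSketch,
                            recording: Optional[Any] = None) -> Tuple[str, Any]:
    """Decide which data waveforms would be extracted from.

    Assumed priority: an explicitly passed recording (current behaviour),
    then the sorter's preprocessed file, otherwise an error.
    """
    if recording is not None:
        return ("recording", recording)
    if sorting.preprocessed_file is not None:
        return ("sorter_file", sorting.preprocessed_file)
    raise ValueError(
        f"{sorting.sorter_name} sorting has no preprocessed file; pass a recording"
    )


ks_sorting = SortingSketch("kilosort", preprocessed_file="temp_wh.dat")
print(resolve_waveform_source(ks_sorting))
print(resolve_waveform_source(ks_sorting, recording="prepro_recording"))
```

The first call falls back to the sorter's file, the second keeps today's behaviour of using the recording that was passed in.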

@alejoe91 alejoe91 added the question General question regarding SI label Aug 21, 2023
@alejoe91
Member

Hi @JoeZiminski

You can already load the KS pre-processed data as a plain binary file with the si.read_binary() function.
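
As an illustration of what "plain binary file" means here, the sketch below memory-maps such a file with NumPy directly rather than through SpikeInterface. It assumes the file holds int16 samples stored sample-major (all channels of frame 0, then frame 1, and so on) and that the channel count is known; both are assumptions to check against the actual sorter output. `open_flat_binary` is a made-up helper name, and the demo uses a small synthetic file in place of a real temp_wh.dat.

```python
import os
import tempfile

import numpy as np


def open_flat_binary(path, num_channels, dtype="int16"):
    """Memory-map a flat binary file as a (num_samples, num_channels) array.

    Assumes sample-major layout and no header (assumptions, not verified
    against any particular Kilosort version).
    """
    frame_bytes = np.dtype(dtype).itemsize * num_channels
    file_size = os.path.getsize(path)
    if file_size % frame_bytes != 0:
        raise ValueError("file size is not a whole number of frames")
    num_samples = file_size // frame_bytes
    return np.memmap(path, dtype=dtype, mode="r",
                     shape=(num_samples, num_channels))


# Demo on a synthetic 5-sample, 4-channel file standing in for temp_wh.dat.
demo_path = os.path.join(tempfile.mkdtemp(), "temp_wh_demo.dat")
demo_data = np.arange(5 * 4, dtype="int16").reshape(5, 4)
demo_data.tofile(demo_path)

traces = open_flat_binary(demo_path, num_channels=4)
print(traces.shape)
```

A wrong channel count either fails the frame-size check or silently scrambles the channels, which is why the number of channels actually written by the sorter matters.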

The main "problem" is that the preprocessed data will not contain all channels, as some are removed for lack of activity.
I guess that we could make a KilosortTempWhRecording that loads the preprocessed data and reads the metadata (such as which channels have been removed) from the KS output. Then one can extract waveforms with whatever preferred recording. What do you think?
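
The channel bookkeeping such a recording would need can be sketched as follows. This assumes the sorter output provides the indices of the channels it kept (for Kilosort, presumably the saved channel map); `map_kept_channels` and the channel ids are made up for illustration.

```python
def map_kept_channels(all_channel_ids, kept_indices):
    """Split the original channel ids into those kept and those removed.

    `kept_indices` are positions into `all_channel_ids`, in the order the
    channels appear in the sorter's preprocessed file (an assumption).
    """
    kept_ids = [all_channel_ids[i] for i in kept_indices]
    kept_set = set(kept_indices)
    removed_ids = [cid for i, cid in enumerate(all_channel_ids)
                   if i not in kept_set]
    return kept_ids, removed_ids


all_ids = ["ch0", "ch1", "ch2", "ch3", "ch4", "ch5"]
kept_ids, removed_ids = map_kept_channels(all_ids, kept_indices=[0, 1, 3, 5])
print(kept_ids)
print(removed_ids)
```

With this mapping, column j of the preprocessed file corresponds to `kept_ids[j]` in the original recording, so waveforms extracted from either source can be compared channel by channel.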

@JoeZiminski
Collaborator Author

I think this is a nice idea, happy to open a PR on this this week. Thanks!

@JoeZiminski
Collaborator Author

Closed further to discussion on #1986 and #1954 as it is not very feasible.
