
Whitening fix - compute covariance matrix in float #3505

Merged
7 commits merged into SpikeInterface:main on Nov 4, 2024

Conversation

JoeZiminski
Collaborator

@JoeZiminski commented Oct 25, 2024

In whiten.py there is a datatype argument that sets the datatype of the output recording. This cast is applied after whitening and does not affect the datatype used for computing the whitening matrix. Up until 0.100.8 the covariance matrix was always computed in float32, but in 0.101.0 the line random_data = np.astype(random_data, np.float32) (which sat just below this line) was removed. Now, if the recording is int16, the covariance matrix estimation is performed in int16 and will overflow in nearly all cases.

This PR reinstates that line, for now casting to float64. The recording will be cast back to the user-specified dtype or the input recording dtype, so I don't think there is any downside to using maximum precision. The only consideration is that we may want to move to GPU support in future, in which case using float32 would keep things consistent.
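As a rough illustration of the failure mode (a simplified sketch, not the actual whiten.py code; the names and values here are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
# stand-in for the random chunks used to estimate the covariance matrix
random_data = rng.normal(0, 5000, size=(10_000, 4)).astype(np.int16)

# int16 accumulation wraps around, so the covariance estimate is garbage
cov_int16 = random_data.T @ random_data / random_data.shape[0]

# the reinstated cast: estimate the covariance in floating point instead
random_data = random_data.astype(np.float32)
cov_float = random_data.T @ random_data / random_data.shape[0]
```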

Unless I'm missing something, I think it would be worth rolling this out in an update ASAP and making a note of it in the release note header. My understanding is that the majority of recordings are stored as int16, so this regression may have caught out quite a lot of users since 0.101.0, and it is quite hard to track down.

TODOs

  • extend tests

@zm711
Collaborator

zm711 commented Oct 25, 2024

Out of curiosity, since Kilosort 1-3 use int16 (and even 4 prefers it), how do they deal with their whitening? I haven't looked.

@JoeZiminski
Collaborator Author

JoeZiminski commented Oct 25, 2024

I was curious about this also; they (see the sketch after this list):

  • have int16 raw data
  • scale to float32 for CAR, filtering, whitening
  • scale up by 200 and convert back to int16.
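
Rough numpy sketch of that round trip (placeholders only, not Kilosort's actual code):

```python
import numpy as np

rng = np.random.default_rng(0)
raw_int16 = rng.integers(-150, 150, size=(30_000, 4)).astype(np.int16)

data = raw_int16.astype(np.float32)                 # int16 -> float32
data -= np.median(data, axis=1, keepdims=True)      # CAR / filtering / whitening in float32
scaled = np.clip(data * 200.0, -32768, 32767)       # scale up by 200...
processed_int16 = scaled.astype(np.int16)           # ...and convert back to int16
```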

kilosort4

In kilosort4, the processing is done in float32, as mentioned here. The covariance matrix is computed in float32 (X is in float32 and torch.zeros initialises in float32 by default). You can see the rescaling by 200 and int16 conversion for example here.
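A tiny sketch of the float32-by-default point (not Kilosort4's actual code; the accumulation pattern here is only illustrative):

```python
import torch

n_chan, chunk_len = 16, 60_000
CC = torch.zeros(n_chan, n_chan)        # float32 by default
X = torch.randn(n_chan, chunk_len)      # chunk already converted to float32
CC += (X @ X.T) / chunk_len             # covariance accumulates in float32, no overflow
print(CC.dtype)                         # torch.float32
```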

Curiously, I can see in the data loading class here that if the data is uint16 it is cast to float. But in the class __init__ it states that the default datatype expected is int16. I can't find at the moment where the cast from int16 to float32 is performed (it is definitely performed for uint16). I need to look into this further; from the other comments and code sections, the data is definitely expected to be in float32 for processing.

EDIT: Ah, it is done here; the main purpose of the uint16 block is the centering around 0.

kilosort<4 (based on 2.5)

First, the whitening matrix is computed in float32 (from data processed by gpufilter.py, which performs CAR and filtering in float32). Then float32 conversion, filtering, and CAR are performed in the main preprocessing loop, as well as whitening. Finally, the cast back to int16 is performed here, with the 200 scaling incorporated into the whitening matrix.
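
To make that last point concrete, a hypothetical numpy version of folding the x200 gain into the whitening matrix before the int16 cast (Wrot and the chunk are stand-ins, not values from the real code):

```python
import numpy as np

n_chan = 4
Wrot = np.eye(n_chan, dtype=np.float32)            # placeholder whitening matrix
Wrot_scaled = Wrot * np.float32(200.0)             # gain baked into the matrix

chunk = np.random.default_rng(1).normal(0, 30, (30_000, n_chan)).astype(np.float32)
out_int16 = np.clip(chunk @ Wrot_scaled, -32768, 32767).astype(np.int16)
```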

@zm711
Collaborator

zm711 commented Oct 25, 2024

Cool, I guess they cast too. Did you find the commit/PR where it was changed? Was there a comment about dropping the cast? Some of the preprocessing dtype stuff is a mystery to me, so maybe someone had a reason.

@JoeZiminski
Collaborator Author

Thanks @zm711, good point. I was just looking at the version diff; I think it was here.

@yger
Collaborator

yger commented Oct 28, 2024

Then indeed, this is a big mistake, and random_data should always be cast to float32 before estimating the whitening matrix.

@alejoe91 added the preprocessing (Related to preprocessing module) label on Oct 29, 2024
@alejoe91
Member

@JoeZiminski are you planning to extend tests here or is it good to merge?

@zm711
Collaborator

zm711 commented Oct 31, 2024

Could we open an issue to add the regression test? We use MS5, which relies on the whitening machinery. Would love to have this on main for everyone.

@JoeZiminski
Collaborator Author

Hey @alejoe91 @zm711, sorry, I got waylaid with something but will add some tests today! Agree this will be good to merge ASAP.

@zm711
Collaborator

zm711 commented Oct 31, 2024

Sounds good to me. Thanks Joe!

@JoeZiminski
Collaborator Author

JoeZiminski commented Oct 31, 2024

Hey @alejoe91 @zm711, in the end adding the tests was taking longer than expected, so I'll make a new PR for this. Feel free to merge! At the next meeting we can discuss where to include a warning (e.g. maybe the 0.101.0 release notes, or the release notes for a new version).

In the end I reverted to np.float32 to a) match what was there before, b) match the other implementation (Kilosort), and c) match a GPU version if this is ever implemented. The overflowing computation on int16 ephys data is the dot product between two channel timeseries, whose length is determined by the random chunks (by default 10k samples). In an extreme worst case, if somehow the raw data sits around +32767, demeaning is not selected, and the user requests 1M samples instead of 10k, the value would be about 1.07e+15; the float32 maximum is about 3.4e+38, so I think we'll be fine!
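
For the record, the back-of-envelope check (plain numpy, just re-doing the arithmetic above):

```python
import numpy as np

worst_dot = np.float32(32767) ** 2 * np.float32(1_000_000)
print(worst_dot)                  # ~1.07e15
print(np.finfo(np.float32).max)   # ~3.4e38, so plenty of headroom
```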

@alejoe91 merged commit d78a0da into SpikeInterface:main on Nov 4, 2024
15 checks passed