-
Notifications
You must be signed in to change notification settings - Fork 190
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Whitening fix - compute covariance matrix in float #3505
Whitening fix - compute covariance matrix in float #3505
Conversation
Out of curiosity since kilosort 1-3 (and even 4 prefers) int16 how do they deal with their whitening? I have looked. |
I was curious about this also, they:
kilosort4 In kilosort4, the processing is done in Curiously, I can see in the data loading class here that if the data is EDIT: Ah it is done here, the main purpose of the kilosort<4 (based on 2.5) First, the whitening matrix is computed in |
Cool I guess they cast too. did you find the commit+PR where it was changed was there a comment about the dropping? Some of the preprocessing dtype stuff is a mystery to me so maybe someone had a reason? |
Then indeed, this is a big mistake and random_data should always be broadcasted to float32 before estimating the whitening matrix |
@JoeZiminski are you planning to extend tests here or is it good to merge? |
Could we add the regression test in an issue to add? We use MS5 which relies on the whitening machinery. Would love to have this on main for everyone. |
Sounds good to me. Thanks Joe! |
Hey @alejoe91 @zm711 in the end adding test tests was taking longer than expected, I'll make a new PR for this. Feel free to merge! At the next meeting we can discuss where to include a warning (e.g. maybe 0.101.0 release notes, or the release notes for a new version). In the end I reverted to |
for more information, see https://pre-commit.ci
In
whiten.py
there is a datatype argument to set the datatype of the output recording. The casting is performed after whitening has been performed and does not affect the datatype used for computing the whitening matrix. Up until0.100.8
, the covariance matrix was always computed infloat32
, but in0.101.0
the linerandom_data = np.astype(random_data, np.float32)
that was under this line was removed. Now if the recording isint16
the covariance matrix estimation is performed inint16
and will overflow in nearly all cases.This PR reinstates that line, for now casting to
float64
. The recordingdtype
will be cast back to the user-specifieddtype
or the input recordingdtype
, so I don't think there is any downside in using maximum precision. The only thing is, maybe we will want to move to GPU support in future, and using float32 will keep it consistent.Unless I'm missing something, I think it would be worth rolling this out in an update ASAP and making a note of this on the release note header. My understanding is the majority of recording datatypes are stored
int16
and so this regression may catch out quite a lot of users since0.101.0
and is quite hard to track down.TODOs