
zscore fails when using whitenoise as an input. #1972

Open
h-mayorquin opened this issue Sep 6, 2023 · 4 comments
Labels
bug (Something isn't working), preprocessing (Related to preprocessing module)

Comments

h-mayorquin (Collaborator) commented Sep 6, 2023

Related to this #1971:

Bug:

import numpy as np

from spikeinterface.preprocessing import normalize_by_quantile, scale, center, zscore
from spikeinterface.core import generate_recording

seed = 1
rec = generate_recording(seed=seed, mode="lazy")
rec_int = scale(rec, dtype="int16", gain=100)

zscore_recording = zscore(rec_int, dtype="int16", int_scale=256, mode="mean+std", seed=seed)
traces = zscore_recording.get_traces(segment_index=0)
trace_mean = np.mean(traces, axis=0)
trace_std = np.std(traces, axis=0)
assert np.all(np.abs(trace_mean) < 1)
assert np.all(np.abs(trace_std - 256) < 1)

For most seeds this fails when using mode="lazy".

@samuelgarcia you implemented the int_scale option for z-scores; any idea why this might be?

h-mayorquin added the bug and preprocessing labels on Sep 6, 2023
h-mayorquin (Collaborator, Author) commented:

This functionality was implemented here:
#1437

alejoe91 (Member) commented Sep 7, 2023

Interesting! I'll also take a look! Thanks for tracking this down @h-mayorquin

h-mayorquin (Collaborator, Author) commented Sep 7, 2023

I thought more about this this morning: it is just a statistical problem with the test, not a bug.
The thing is that when you scale your data you also scale its standard deviation, which in turn increases the standard error of the mean:

$ SE = \frac{\sigma}{\sqrt{n}} $

You then need a larger n. If you compute the standard error of the mean for the number of samples in the available chunks, it is not small enough for the test above to be sound.
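As a rough check (assuming an estimate based on roughly 50_000 samples, as in the simulation below — the actual chunk sizes may differ), the standard error at int_scale = 256 already exceeds the test tolerance of 1:

```python
import math

# Standard error of the mean for data scaled to a standard deviation of 256
# (the int_scale used in the failing test). n = 50_000 is an assumed sample
# count matching the simulation further down, not the actual chunk size.
sigma = 256
n = 50_000
se = sigma / math.sqrt(n)
print(se)  # ≈ 1.14, already above the test's tolerance of 1
```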

Plus, casting to int is not a linear function (it behaves like floor for positives and like ceiling for negatives) which also increases the standard deviation of the samples and will make the standard error larger in practice.
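For example, NumPy's cast to an integer dtype truncates toward zero:

```python
import numpy as np

# astype to an integer dtype truncates toward zero: it acts like floor
# for positive values and like ceiling for negative values.
x = np.array([1.9, 0.6, -0.6, -1.9])
print(x.astype("int16"))  # [ 1  0  0 -1]
```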

Here I estimate the mean 1_000 times with 50_000 samples each. You can see that many of the estimates are farther than 1 from the true mean (which is zero) and therefore fail the test above.

import numpy as np
import matplotlib.pyplot as plt

# Scale first, then cast to int16 (as the zscore int_scale path does),
# and take the mean of each column of 50_000 samples.
estimates = (256 * np.random.randn(int(5e4), 1000)).astype("int16").mean(axis=0)
plt.hist(estimates, bins=100)
plt.xlabel("Estimated mean from the process")
plt.ylabel("Frequency")
plt.show()

[Histogram of the 1_000 estimated means]

h-mayorquin (Collaborator, Author) commented Sep 7, 2023

I would like us to shut down this test and slowly deprecate the int-range functionality in ZScoreRecording so we can confine this complexity to kilosort only. Sam probably tested this thoroughly when he was implementing the kilosort support, and since this is an ad-hoc processing step for them we don't need to test it in general.

Within the kilosort pipeline, or anywhere else the need for an operation like this arises, it can be performed by composing the casting and scaling operations already available in the preprocessing module. No need for flags.
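A sketch of that composition in plain NumPy (illustrative only, not the actual spikeinterface calls; the array and variable names here are assumptions):

```python
import numpy as np

# Illustrative data; in practice these would be the recording's traces.
rng = np.random.default_rng(0)
traces = rng.normal(loc=50.0, scale=10.0, size=(50_000, 4))

# Step 1: z-score per channel (what zscore(rec) computes).
z = (traces - traces.mean(axis=0)) / traces.std(axis=0)

# Step 2: explicit gain + integer cast, expressed as its own step
# (roughly what scale(rec, gain=256, dtype="int16") would do).
scaled = (z * 256).astype("int16")
```

Keeping the two steps separate makes the integer quantization an explicit, composable choice rather than a flag hidden inside the z-score step.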

Similar to previous discussions:
#1815 (comment)
#914
