Add scale_to_uV preprocessing #3053

Merged
Changes from 3 commits
3 changes: 3 additions & 0 deletions src/spikeinterface/preprocessing/preprocessinglist.py
@@ -24,6 +24,8 @@
CenterRecording,
center,
)
from .scale import ScaleTouV, scale_to_uV

from .whiten import WhitenRecording, whiten, compute_whitening_matrix
from .rectify import RectifyRecording, rectify
from .clip import BlankSaturationRecording, blank_staturation, ClipRecording, clip
@@ -54,6 +56,7 @@
ScaleRecording,
CenterRecording,
ZScoreRecording,
ScaleTouV,
# decorrelation stuff
WhitenRecording,
# re-reference
46 changes: 46 additions & 0 deletions src/spikeinterface/preprocessing/scale.py
@@ -0,0 +1,46 @@
from __future__ import annotations

from spikeinterface.core import BaseRecording
from spikeinterface.preprocessing.basepreprocessor import BasePreprocessor


class ScaleTouV(BasePreprocessor):
"""
Scale raw traces to microvolts (µV).

This preprocessor uses the channel-specific gain and offset information
stored in the recording extractor to convert the raw traces to µV units.

Parameters
----------
recording : BaseRecording
The recording extractor to be scaled. The recording extractor must
have gains and offsets otherwise an error will be raised.

Raises
------
AssertionError
If the recording extractor does not have scaleable traces.
"""

name = "scale_to_uV"

def __init__(self, recording: BaseRecording):
assert recording.has_scaleable_traces(), "Recording must have scaleable traces"
from spikeinterface.preprocessing.normalize_scale import ScaleRecordingSegment

dtype = recording.get_dtype()
BasePreprocessor.__init__(self, recording, dtype=dtype)

gain = recording.get_channel_gains()[None, :]
offset = recording.get_channel_offsets()[None, :]
for parent_segment in recording._recording_segments:
rec_segment = ScaleRecordingSegment(parent_segment, gain, offset, self._dtype)
self.add_recording_segment(rec_segment)

self._kwargs = dict(
recording=recording,
)


scale_to_uV = ScaleTouV
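
For context, a minimal usage sketch of the new preprocessor (not part of the diff), assuming scale_to_uV is exported from spikeinterface.preprocessing as the import above suggests; the gain value here is only illustrative:

import numpy as np
from spikeinterface.core.testing_tools import generate_recording
from spikeinterface.preprocessing import scale_to_uV  # alias for ScaleTouV defined above

# Build a small synthetic recording and attach per-channel gains/offsets
recording = generate_recording(num_channels=4, durations=[1], sampling_frequency=30_000.0)
recording.set_channel_gains(np.full(4, 0.195, dtype=np.float32))  # illustrative gain in uV per raw unit
recording.set_channel_offsets(np.zeros(4, dtype=np.float32))

# The returned recording lazily yields traces already converted to microvolts
recording_uV = scale_to_uV(recording=recording)
traces_uV = recording_uV.get_traces(start_frame=0, end_frame=100)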
36 changes: 36 additions & 0 deletions src/spikeinterface/preprocessing/tests/test_scaling.py
@@ -0,0 +1,36 @@
import pytest
import numpy as np
from spikeinterface.core.testing_tools import generate_recording
from spikeinterface.preprocessing import ScaleTouV


def test_scale_to_uv():
# Create a sample recording extractor with fake gains and offsets
num_channels = 4
sampling_frequency = 30_000.0
durations = [1] # seconds
recording = generate_recording(
num_channels=num_channels,
durations=durations,
sampling_frequency=sampling_frequency,
)

rng = np.random.default_rng(0)
gains = rng.random(size=(num_channels)).astype(np.float32)
Collaborator:

is it worth parameterising over some extreme cases (e.g. very small values, typical values, extremely large values) for gains and offsets?

Collaborator Author:

Mmm, do you have some suggestions of what the purpose would be, more specifically? I agree with this philosophy (e.g. "think about what will break the test") but nothing concrete comes to mind other than testing overflow problems. Let me think more, but if you come up with anything it would be good.

I would not like to hold up the PR if we don't come up with good extreme-case criteria, though :P

@zm711 (Collaborator), Jun 24, 2024:

I think gains and offsets should really be based on reality. We could check a bunch of gains and offsets from Neo if we want to parameterize against real values. For example I think gain=1, offset=0 comes up because some data formats don't scale. We could test against Intan's scaling for example since the format is pretty common.

I.e., to be clearer, I don't think our tests should specifically look at "wrong" values that a user could input, but rather test for real inputs that our library should handle.

@JoeZiminski (Collaborator), Jun 24, 2024:

Yes, good points, in general I am in two minds. For sure I think it's good to test with a range of expected values. But the tests can also be viewed as a way of stress-testing the software, and putting extreme values in can reveal strange or unexpected behaviours. For example, we sometimes struggle with overflow errors from numpy dtypes that would probably be caught in tests if we were using some extreme values. So I think, as including more values for these short tests is basically free, we might as well, since more information is better, but I'm not strong on this.

Collaborator Author:

My idea was that the random gains kind of cover all the common use cases, save the stress (outlier / weird) part. For the overflow problem I think we could come up with something, but I am leaning towards testing overflow in a more general check. I actually do feel that that one needs generalization to be useful.

Collaborator:

Yes, agreed it's not super important to check the overflow errors in sporadic tests and it would be better centralised somewhere. I like the rand, but isn't rand bounded to [0, 1), while gains can in theory be (0, some large number]? To me it seems most efficient, rather than guessing bounds on realistic values, to just test a very small, a commonplace, and a very large number, and for sure cover all cases.

Collaborator Author:

OK, to move this forward I propose the following:

  1. You give me the numbers you want to test.
  2. I increase the bounds of float to be very large.

Collaborator:

This sounds good, this was a suggestion that I'm not super-strong on and I wouldn't want it to be a blocker! Please feel free to proceed however you think best; personally I would do something like parameterize over small, medium and large numbers for gain and offset:

@pytest.mark.parametrize("gains", [1e-6,, 50, 1e6])
@pytest.mark.parametrize("offsets", [1e-6,, 50, 1e6])

I'm not advocating this as necessarily the best approach, but it is the kind of habit I have fallen into for testing bounds.
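
As a rough sketch only, that suggestion could be applied to the existing test along these lines (the test name and parameter values are placeholders, not vetted bounds):

import numpy as np
import pytest
from spikeinterface.core.testing_tools import generate_recording
from spikeinterface.preprocessing import ScaleTouV


@pytest.mark.parametrize("gain", [1e-6, 50.0, 1e6])
@pytest.mark.parametrize("offset", [1e-6, 50.0, 1e6])
def test_scale_to_uv_parametrized(gain, offset):
    # Same structure as the test in this PR, but with explicit small/medium/large values
    num_channels = 4
    recording = generate_recording(num_channels=num_channels, durations=[1], sampling_frequency=30_000.0)
    recording.set_channel_gains(np.full(num_channels, gain, dtype=np.float32))
    recording.set_channel_offsets(np.full(num_channels, offset, dtype=np.float32))

    scaled_recording = ScaleTouV(recording=recording)
    np.testing.assert_allclose(scaled_recording.get_traces(), recording.get_traces(return_scaled=True))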

Collaborator:

(To avoid repeatedly generating a recording in such a case, the recording can be generated in a session-scoped fixture.)
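
A possible shape for that fixture, sketched as an assumption (the fixture name is hypothetical):

import pytest
from spikeinterface.core.testing_tools import generate_recording


@pytest.fixture(scope="session")
def base_recording():
    # Generated once per test session; individual tests set their own gains/offsets on it
    return generate_recording(num_channels=4, durations=[1], sampling_frequency=30_000.0)

One caveat: if tests mutate the shared recording's gains and offsets, state can leak between parametrized cases, so a function- or module-scoped fixture may be the safer trade-off.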

offsets = rng.random(size=(num_channels)).astype(np.float32)
recording.set_channel_gains(gains)
recording.set_channel_offsets(offsets)

# Apply the preprocessor
scaled_recording = ScaleTouV(recording=recording)

# Check if the traces are indeed scaled
expected_traces = recording.get_traces(return_scaled=True)
scaled_traces = scaled_recording.get_traces()

np.testing.assert_allclose(scaled_traces, expected_traces)

# Test for the error when recording doesn't have scaleable traces
recording.set_channel_gains(None) # Remove gains to make traces unscaleable
with pytest.raises(AssertionError):
ScaleTouV(recording)