Refactor generate.py #1948

Merged: 18 commits merged into SpikeInterface:main on Sep 1, 2023

Conversation

samuelgarcia (Member)

This big PR is a major refactor of generate.py:

  • move InjectTemplatesRecording into generate.py
  • change GeneratorRecording into NoiseGeneratorRecording and improve it
  • move and refactor generate_templates() into generate.py
  • create generate_ground_truth_recording() to generate a totally lazy recording + sorting pair
  • rewrite toy_example() using generate_ground_truth_recording()

TODO: add the upsampling concept to InjectTemplatesRecording

A major improvement will be to use this for testing the API, algorithms, metrics, and visualization without any disk usage and with very little memory use.
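
For instance, a minimal sketch of the intended test-side usage (argument names below are illustrative and may not match the final signatures exactly):

from spikeinterface.core.generate import generate_ground_truth_recording

# Everything is generated lazily: nothing is written to disk and traces are only
# materialized when get_traces() is called. Argument names are assumptions here.
recording, sorting = generate_ground_truth_recording(
    durations=[10.0],
    sampling_frequency=25_000.0,
    num_channels=4,
    num_units=5,
    seed=42,
)
traces = recording.get_traces(start_frame=0, end_frame=100)
spike_train = sorting.get_unit_spike_train(unit_id=sorting.unit_ids[0])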

@samuelgarcia (Member, Author)

@alejoe91 @h-mayorquin @yger @DradeAW ready to review.
We will need to merge this quickly because it blocks other important PRs like #1941 and #1944, which are needed for better spike_localization.

As a side effect, we will be able to refactor many tests based on this, to avoid generating and writing the toy example to disk many times.

@samuelgarcia (Member, Author)

@h-mayorquin: the test you designed to check that get_traces does not consume more memory than allocated is not passing on macOS... It will be hard to debug from my side...

@alejoe91 (Member)

Can you run pre-commit? Also, tests are failing!

@DradeAW (Contributor) left a comment

I unfortunately don't have the time to do an extensive review, but I think for my part it's ok!

I highlighted some details :)

src/spikeinterface/core/generate.py (inline comment, outdated, resolved)
src/spikeinterface/core/generate.py (inline comment, resolved)
@samuelgarcia (Member, Author)

@alejoe91: pre-commit will be run at the end!
@DradeAW: thanks, done

@samuelgarcia (Member, Author)

> @h-mayorquin: the test you designed to check that get_traces does not consume more memory than allocated is not passing on macOS... It will be hard to debug from my side...

Finally I fixed it.

@samuelgarcia added the labels enhancement (New feature or request), core (Changes to core module), extractors (Related to extractors module), refactor (Refactor of code, with no change to functionality), and testing (Related to test routines) on Aug 31, 2023
@samuelgarcia added this to the 0.98.0 milestone on Aug 31, 2023
@samuelgarcia (Member, Author)

Tests are passing and the pre-commit CI is back.

@alejoe91 (Member) commented on Sep 1, 2023

@samuelgarcia good on my side! Great work!!!

  • I changed hyperpolarization to recovery (as discussed yesterday)
  • Exposed decay_power in the generate_templates params (default 1.2-1.8); see the sketch below
  • Improved docs and fixed some typos
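
A purely illustrative sketch of what a per-unit decay_power drawn from that default range could look like (the actual generate_templates signature is not shown in this thread):

import numpy as np

# Hypothetical illustration only: draw one decay exponent per unit from the
# discussed default range 1.2-1.8, then pass it to generate_templates.
rng = np.random.default_rng(seed=42)
num_units = 10
decay_power = rng.uniform(1.2, 1.8, size=num_units)
# templates = generate_templates(..., decay_power=decay_power)  # signature assumed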

@alejoe91 merged commit 23aef27 into SpikeInterface:main on Sep 1, 2023
b = refractory_sample * 20
shift = a + (b - a) * x**2
spike_times[some] += shift
times0 = times0[(0 <= times0) & (times0 < N)]
@h-mayorquin (Collaborator)

The N here is not defined.

Member

Thanks @h-mayorquin

@samuelgarcia can you make a PR to fix it?

@samuelgarcia (Member, Author)

Oops, thanks for catching this. I will fix it.
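
A minimal sketch of one possible fix, assuming N was meant to be the number of samples in the segment (names below are illustrative, not the actual patch):

# assumed to be available in this scope
num_samples = int(sampling_frequency * duration)
spike_times = spike_times[(0 <= spike_times) & (spike_times < num_samples)]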

num_channels: int,
sampling_frequency: float,
durations: List[float],
noise_level: float = 5.0,
Collaborator

I think that 1.0 is the sensible default here. Why 5.0?



def generate_lazy_recording(
def generate_recording_by_size(
Collaborator

I wish we went all the way and named this "generate_recording_by_memory_size" to be completely specific.
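
For reference, a minimal sketch of the idea behind generating a recording by memory size, i.e. deriving a duration from a target in-memory size (the helper name and signature here are assumptions, not the actual API):

import numpy as np

def duration_for_target_size(target_gib, num_channels=32, sampling_frequency=30_000.0, dtype="float32"):
    # bytes produced per second of recording
    bytes_per_second = num_channels * sampling_frequency * np.dtype(dtype).itemsize
    return target_gib * 1024**3 / bytes_per_second

# ~1 GiB of float32 traces at 32 channels / 30 kHz is about 280 seconds
print(duration_for_target_size(1.0))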

@h-mayorquin (Collaborator)

OK, this new mode of NoiseGeneratorRecording is terribly slow for long recordings (one hour).

on_the_fly:
Execution Time: 127.724 seconds

tile_pregenerated:
Execution Time: 5.90863 seconds

21 times slower (and that's after the improvements in #1948).

Script:

import time
from spikeinterface.core.generate import NoiseGeneratorRecording
import cProfile


def generate_noise():
    # Toggle manually between the two strategies and re-run the script
    strategy = "on_the_fly"
    strategy = "tile_pregenerated"
    print(strategy)
    recording = NoiseGeneratorRecording(
        num_channels=32, sampling_frequency=30_000.0, durations=[3600], strategy=strategy
    )
    for i in range(5):
        print(i)
        x = recording.get_traces()
        del x
    return recording

# Profile the execution
profiler = cProfile.Profile()
profiler.enable()
start_time = time.time()
recording = generate_noise()
end_time = time.time()
profiler.disable()
profiler.print_stats(sort="cumulative")

print(f"Execution Time: {end_time - start_time:.5f} seconds")
print(recording)

Complete cProfile output:

on_the_fly
0
1
2
3
4
         450257 function calls (450251 primitive calls) in 127.724 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.123    0.123  127.724  127.724 dev_profile.py:5(generate_noise)
        5    0.002    0.000  127.600   25.520 baserecording.py:238(get_traces)
        5    9.913    1.983  127.598   25.520 generate.py:585(get_traces)
    18005  115.661    0.006  115.661    0.006 {method 'standard_normal' of 'numpy.random._generator.Generator' objects}
    18007    0.579    0.000    2.024    0.000 {numpy.random._generator.default_rng}
    18007    0.148    0.000    1.046    0.000 contextlib.py:76(inner)
    36014    0.222    0.000    0.508    0.000 _ufunc_config.py:32(seterr)
    18007    0.073    0.000    0.362    0.000 _ufunc_config.py:429(__enter__)
    36012    0.149    0.000    0.304    0.000 <__array_function__ internals>:177(concatenate)
    18007    0.055    0.000    0.273    0.000 _ufunc_config.py:434(__exit__)
    18007    0.240    0.000    0.240    0.000 {function SeedSequence.generate_state at 0x7f1cf2fd5a20}
    36014    0.130    0.000    0.171    0.000 _ufunc_config.py:131(geterr)
    36012    0.109    0.000    0.109    0.000 {built-in method numpy.core._multiarray_umath.implement_array_function}
    18007    0.052    0.000    0.095    0.000 abc.py:117(__instancecheck__)
    72028    0.091    0.000    0.091    0.000 {built-in method numpy.geterrobj}
    36014    0.064    0.000    0.064    0.000 {built-in method numpy.seterrobj}
    36012    0.047    0.000    0.047    0.000 multiarray.py:148(concatenate)
    18007    0.043    0.000    0.043    0.000 {built-in method _abc._abc_instancecheck}
    18007    0.023    0.000    0.023    0.000 contextlib.py:63(_recreate_cm)
        1    0.000    0.000    0.000    0.000 generate.py:506(__init__)
        1    0.000    0.000    0.000    0.000 generate.py:18(_ensure_seed)
        5    0.000    0.000    0.000    0.000 {built-in method numpy.empty}
        6    0.000    0.000    0.000    0.000 {built-in method builtins.print}
        1    0.000    0.000    0.000    0.000 baserecording.py:34(__init__)
        1    0.000    0.000    0.000    0.000 baserecordingsnippets.py:22(__init__)
        1    0.000    0.000    0.000    0.000 base.py:43(__init__)
        5    0.000    0.000    0.000    0.000 base.py:73(_check_segment_index)
        1    0.000    0.000    0.000    0.000 {built-in method numpy.array}
        1    0.000    0.000    0.000    0.000 _dtype.py:344(_name_get)
      6/3    0.000    0.000    0.000    0.000 abc.py:121(__subclasscheck__)
        2    0.000    0.000    0.000    0.000 {method 'integers' of 'numpy.random._generator.Generator' objects}
      6/3    0.000    0.000    0.000    0.000 {built-in method _abc._abc_subclasscheck}
        5    0.000    0.000    0.000    0.000 baserecording.py:110(get_num_segments)
        1    0.000    0.000    0.000    0.000 _dtype.py:330(_name_includes_bit_suffix)
        5    0.000    0.000    0.000    0.000 base.py:82(ids_to_indices)
        1    0.000    0.000    0.000    0.000 numerictypes.py:356(issubdtype)
        1    0.000    0.000    0.000    0.000 {built-in method numpy.arange}
        1    0.000    0.000    0.000    0.000 generate.py:559(__init__)
        1    0.000    0.000    0.000    0.000 random.py:826(getrandbits)
        1    0.000    0.000    0.000    0.000 generate.py:531(<listcomp>)
        2    0.000    0.000    0.000    0.000 numerictypes.py:282(issubclass_)
        6    0.000    0.000    0.000    0.000 {built-in method builtins.len}
        1    0.000    0.000    0.000    0.000 baserecording.py:121(add_recording_segment)
        1    0.000    0.000    0.000    0.000 baserecording.py:677(__init__)
        2    0.000    0.000    0.000    0.000 {built-in method time.time}
        1    0.000    0.000    0.000    0.000 base.py:111(annotate)
        4    0.000    0.000    0.000    0.000 {built-in method builtins.issubclass}
        1    0.000    0.000    0.000    0.000 {built-in method posix.urandom}
        1    0.000    0.000    0.000    0.000 {method 'format' of 'str' objects}
        1    0.000    0.000    0.000    0.000 _dtype.py:24(_kind_name)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.000    0.000    0.000    0.000 base.py:1084(__init__)
        1    0.000    0.000    0.000    0.000 base.py:1091(set_parent_extractor)
        1    0.000    0.000    0.000    0.000 {method 'update' of 'dict' objects}
        1    0.000    0.000    0.000    0.000 {built-in method from_bytes}
        1    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}


Execution Time: 127.72357 seconds
NoiseGeneratorRecording: 32 channels - 30.0kHz - 1 segments - 108,000,000 samples 
                         3,600.00s (1.00 hours) - float32 dtype - 12.87 GiB
(spikeinterface_env) @h-laptop$ /home/heberto/miniconda3/envs/spikeinterface_env/bin/python /home/heberto/development/spikeinterface/bin/dev_profile.py
tile_pregenerated
0
1
2
3
4
         150 function calls (146 primitive calls) in 5.909 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.130    0.130    5.909    5.909 dev_profile.py:6(generate_noise)
        5    0.000    0.000    5.771    1.154 baserecording.py:238(get_traces)
        5    5.770    1.154    5.770    1.154 generate.py:585(get_traces)
        1    0.000    0.000    0.008    0.008 generate.py:506(__init__)
        1    0.000    0.000    0.008    0.008 generate.py:559(__init__)
        1    0.007    0.007    0.007    0.007 {method 'standard_normal' of 'numpy.random._generator.Generator' objects}
        3    0.000    0.000    0.000    0.000 {numpy.random._generator.default_rng}
        1    0.000    0.000    0.000    0.000 generate.py:18(_ensure_seed)
        5    0.000    0.000    0.000    0.000 {built-in method numpy.empty}
        3    0.000    0.000    0.000    0.000 contextlib.py:76(inner)
        6    0.000    0.000    0.000    0.000 {built-in method builtins.print}
        6    0.000    0.000    0.000    0.000 _ufunc_config.py:32(seterr)
        3    0.000    0.000    0.000    0.000 _ufunc_config.py:429(__enter__)
        5    0.000    0.000    0.000    0.000 base.py:73(_check_segment_index)
        3    0.000    0.000    0.000    0.000 _ufunc_config.py:434(__exit__)
        1    0.000    0.000    0.000    0.000 _dtype.py:344(_name_get)
        3    0.000    0.000    0.000    0.000 abc.py:117(__instancecheck__)
        3    0.000    0.000    0.000    0.000 {function SeedSequence.generate_state at 0x7fbab8fcda20}
        3    0.000    0.000    0.000    0.000 {built-in method _abc._abc_instancecheck}
        2    0.000    0.000    0.000    0.000 {method 'integers' of 'numpy.random._generator.Generator' objects}
        1    0.000    0.000    0.000    0.000 baserecording.py:34(__init__)
        3    0.000    0.000    0.000    0.000 <__array_function__ internals>:177(concatenate)
        6    0.000    0.000    0.000    0.000 _ufunc_config.py:131(geterr)
        5    0.000    0.000    0.000    0.000 baserecording.py:110(get_num_segments)
      4/2    0.000    0.000    0.000    0.000 abc.py:121(__subclasscheck__)
        1    0.000    0.000    0.000    0.000 _dtype.py:330(_name_includes_bit_suffix)
      4/2    0.000    0.000    0.000    0.000 {built-in method _abc._abc_subclasscheck}
        1    0.000    0.000    0.000    0.000 baserecordingsnippets.py:22(__init__)
        1    0.000    0.000    0.000    0.000 numerictypes.py:356(issubdtype)
       12    0.000    0.000    0.000    0.000 {built-in method numpy.geterrobj}
        1    0.000    0.000    0.000    0.000 baserecording.py:121(add_recording_segment)
        5    0.000    0.000    0.000    0.000 base.py:82(ids_to_indices)
        1    0.000    0.000    0.000    0.000 {built-in method numpy.arange}
        6    0.000    0.000    0.000    0.000 {built-in method numpy.seterrobj}
        1    0.000    0.000    0.000    0.000 random.py:826(getrandbits)
        3    0.000    0.000    0.000    0.000 {built-in method numpy.core._multiarray_umath.implement_array_function}
        1    0.000    0.000    0.000    0.000 base.py:43(__init__)
        6    0.000    0.000    0.000    0.000 {built-in method builtins.len}
        2    0.000    0.000    0.000    0.000 numerictypes.py:282(issubclass_)
        1    0.000    0.000    0.000    0.000 generate.py:531(<listcomp>)
        4    0.000    0.000    0.000    0.000 {built-in method builtins.issubclass}
        1    0.000    0.000    0.000    0.000 baserecording.py:677(__init__)
        2    0.000    0.000    0.000    0.000 {built-in method time.time}
        1    0.000    0.000    0.000    0.000 {built-in method numpy.array}
        1    0.000    0.000    0.000    0.000 {built-in method posix.urandom}
        3    0.000    0.000    0.000    0.000 multiarray.py:148(concatenate)
        1    0.000    0.000    0.000    0.000 base.py:111(annotate)
        1    0.000    0.000    0.000    0.000 base.py:1091(set_parent_extractor)
        1    0.000    0.000    0.000    0.000 {method 'format' of 'str' objects}
        3    0.000    0.000    0.000    0.000 contextlib.py:63(_recreate_cm)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.000    0.000    0.000    0.000 _dtype.py:24(_kind_name)
        1    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'update' of 'dict' objects}
        1    0.000    0.000    0.000    0.000 base.py:1084(__init__)
        1    0.000    0.000    0.000    0.000 {built-in method from_bytes}


Execution Time: 5.90863 seconds
NoiseGeneratorRecording: 32 channels - 30.0kHz - 1 segments - 108,000,000 samples 
                         3,600.00s (1.00 hours) - float32 dtype - 12.87 GiB

@samuelgarcia (Member, Author)

Thank you @h-mayorquin for taking a deeper look at this.
I will open a new PR with some changes and let's discuss over there.

on_the_fly is slower because it randomly generates every sample for every channel!!!
The ratio of computation time to real time is not so bad: 127 s / 3600 s.
But I think it is good to have the two options: one fast and the other with better randomness.
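
To illustrate the difference (a simplified sketch, not the actual NoiseGeneratorRecording implementation; block size and names are assumptions, and reads spanning several blocks are not handled):

import numpy as np

num_channels, block_size = 32, 30_000
rng = np.random.default_rng(seed=0)

# tile_pregenerated: draw one noise block once, then reuse (tile) it for every read
pregenerated_block = rng.standard_normal((block_size, num_channels), dtype=np.float32)

def traces_tiled(start_frame, end_frame):
    # cheap: only index into the cached block (modulo its length)
    frames = np.arange(start_frame, end_frame) % block_size
    return pregenerated_block[frames]

def traces_on_the_fly(start_frame, end_frame, base_seed=0):
    # expensive: fresh random samples for every frame and channel on every call,
    # seeded per block so repeated reads of the same chunk stay deterministic
    block_rng = np.random.default_rng([base_seed, start_frame // block_size])
    return block_rng.standard_normal((end_frame - start_frame, num_channels), dtype=np.float32)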

@h-mayorquin (Collaborator)

Yeah, I actually like your solution a lot. I am hoping that at some point we can think of an optimization that keeps the advantage of your implementation (not requiring a pre-allocated block of constant memory) without the speed cost.

But as you say, right now, let's keep the two options as they are useful in two different scenarios.

after_instanciation_MiB = measure_memory_allocation() / bytes_to_MiB_factor
memory_usage_MiB = after_instanciation_MiB - before_instanciation_MiB
expected_allocation_MiB = dtype.itemsize * num_channels * noise_block_size / bytes_to_MiB_factor
ratio = expected_allocation_MiB / expected_allocation_MiB
Collaborator

@samuelgarcia You modified this test so it always passes? : P

@samuelgarcia (Member, Author)

Hé hé, I see it now.
This is some kind of mistake.

Collaborator

Let's fix it later : D
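
For the record, a minimal sketch of what the assertion presumably intends, reusing the variable names from the snippet above (the current code divides expected_allocation_MiB by itself, so the ratio is always 1.0 and the check always passes); the tolerance below is an assumption:

ratio = memory_usage_MiB / expected_allocation_MiB
assert ratio <= 1.05, (
    f"Instantiation allocated {memory_usage_MiB:.2f} MiB, "
    f"{ratio:.2f}x the expected {expected_allocation_MiB:.2f} MiB"
)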
