Expose `t_start` in `BaseRecording` #3117

JoeZiminski · 2024-07-01T21:26:50Z

This PR exposes t_start by allowing it to be set through set_times(). The times parameter of set_times() can now take a vector (np.ndarray) or a scalar (int | float). If the former, it is treated as a time vector; if the latter, as a t_start. Previously t_start was only available if set when loading from file.
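As a rough illustration of the overloading described above, here is a minimal sketch. It is not the SpikeInterface implementation: `ToySegment` is a hypothetical stand-in, plain Python lists stand in for `np.ndarray`, and the choice to clear one representation when the other is set is an assumption based on the "mutually exclusive" behaviour discussed later in this thread.

```python
from numbers import Real
from typing import List, Optional, Sequence, Union


class ToySegment:
    """Hypothetical stand-in for a recording segment (illustration only)."""

    def __init__(self, num_samples: int, sampling_frequency: float):
        self.num_samples = num_samples
        self.sampling_frequency = sampling_frequency
        self.t_start: Optional[float] = None
        self.time_vector: Optional[List[float]] = None

    def set_times(self, times: Union[Real, Sequence[float]]) -> None:
        if isinstance(times, Real):
            # Scalar input: interpret it as t_start and drop any time vector.
            self.t_start = float(times)
            self.time_vector = None
        else:
            # Sequence input: interpret it as a full time vector and drop t_start.
            values = [float(t) for t in times]
            if len(values) != self.num_samples:
                raise ValueError("times must contain one entry per sample")
            self.time_vector = values
            self.t_start = None

    def get_times(self) -> List[float]:
        if self.time_vector is not None:
            return list(self.time_vector)
        t0 = 0.0 if self.t_start is None else self.t_start
        return [t0 + i / self.sampling_frequency for i in range(self.num_samples)]


segment = ToySegment(num_samples=4, sampling_frequency=2.0)
segment.set_times(10)                      # scalar, treated as t_start
times_from_t_start = segment.get_times()
segment.set_times([0.0, 0.4, 1.1, 1.5])    # sequence, treated as a time vector
times_from_vector = segment.get_times()
```

The single entry point keeps both ways of setting custom times in one place, at the cost of a parameter whose meaning depends on its type.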

The PR extends the tests to cover the t_start cases and a couple of other cases such as save_to_memory(), which currently has a bug (#2515). In creating these tests a couple of other small issues were found; I cherry-picked the fixes into separate PRs to keep the diffs easier to review, though if everyone is happy they can all stay here. The tests will fail until this branch is rebased onto master once the other PRs are reviewed.

JoeZiminski · 2024-07-01T22:30:42Z

src/spikeinterface/core/tests/test_time_handling.py



+# TODO: deprecate original implementations ###


TODO: this was messing up the diff so left for the end.

h-mayorquin · 2024-07-02T18:36:53Z

We do use the time machinery of spikeinterface in neuroconv. I would like to participate in the review. Among the PRs that you have opened, where should I start?

JoeZiminski · 2024-07-02T19:07:25Z

Thanks @h-mayorquin! That would be great, I think:

Round instead of int for time_to_sample_index. #3119 is a very quick one, mostly unrelated to the others.
Add time vector case to get_durations. #3118 is also mostly standalone, only related to the time_vector concept.
Then this PR (I will undraft it now) for the changes to t_start.
Then Fix t_starts not propagated to save_to_memory. #3120, which fixes a failing test case here.

Cheers!
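For context on the first item in the list above, the snippet below sketches why rounding rather than truncating matters when converting a time to a sample index. The two function names are hypothetical and only illustrate the idea in the PR title; they are not the library's code.

```python
def time_to_index_truncate(time_s: float, sampling_frequency: float, t_start: float = 0.0) -> int:
    # int() truncates toward zero, so a product that lands just below
    # an integer because of floating-point error loses a whole sample.
    return int((time_s - t_start) * sampling_frequency)


def time_to_index_round(time_s: float, sampling_frequency: float, t_start: float = 0.0) -> int:
    # round() picks the nearest integer, which absorbs tiny representation errors.
    return round((time_s - t_start) * sampling_frequency)


truncated = time_to_index_truncate(0.29, 100.0)
rounded = time_to_index_round(0.29, 100.0)
```

With these inputs the two functions disagree by one sample, because 0.29 * 100.0 evaluates to a value slightly below 29 in double precision.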

h-mayorquin

What is the purpose of this? I think knowing about how you intend this to be used would help clarify the debate and inform the API.

I find the testing a bit complicated to read. I think there is too much indirection of the fixtures. I will have to do a second reading but I think that's a code smell.

Plus, it seems that many of the tests don't really belong here. Why are the memory tests not in the memory PR? If you are already iterating over the segments in some of the fixtures, you can just set their t_start attribute directly; you don't need a special method at the BaseRecording level to set t_start, as it is a public attribute of BaseRecordingSegment. I don't think that the tests for the interface should rely on the interface.

I am requesting changes here because I think some of the testing does not correspond to this PR.

h-mayorquin · 2024-07-02T20:02:04Z

src/spikeinterface/core/baserecording.py


        Parameters
        ----------
-        times : 1d np.array
-            The time vector
+        times : int | float | 1d np.array


I personally would prefer not to overload this function and to create a new one instead: set_t_start.

What are the advantages of overloading this? How are you thinking about it?

But I think the kind of API we should have will become clear once I understand how you envision this being used.

Yes, I couldn't decide between this and separate functions, and @samuelgarcia suggested this approach. On balance I think I prefer it, as t_start and time_vector are mutually exclusive ways of setting custom times, so it makes sense to change them in one place. It would be slightly strange to call set_t_start() and have it remove the time_vector attribute. But I'm not sure either way.

Interesting. Thanks for sharing, I will add this as something to discuss at the meeting.

I guess that my take is the following:

We use set_times because the sampling of the recording is slightly irregular and we want more precision. This is a good name for what the function does; it has a clear purpose and semantics.

Why would we need to set t_start?

The use case that I can think of is that we would like to shift the whole recording to the right or the left in time. But if that is the use case, it would be better to have a method that shifts the recording times and works independently of whether times are handled internally with t_start and sampling frequency or with a time vector.
In the first case, you shift t_start (in most cases from 0), and in the second you shift the time vector.

If it is not for shifting, I can't think of another use for setting t_start.
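A representation-independent shift of the kind suggested here could look roughly like the following. This is only a sketch under stated assumptions: `ToySegment` and `shift_times` are hypothetical names introduced for illustration, and plain lists stand in for NumPy arrays.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToySegment:
    """Hypothetical segment: times come from a time vector or from t_start."""

    sampling_frequency: float
    num_samples: int
    t_start: Optional[float] = None
    time_vector: Optional[List[float]] = None


def shift_times(segment: ToySegment, shift: float) -> None:
    """Shift all times of a segment by `shift` seconds, whatever the representation."""
    if segment.time_vector is not None:
        # Time-vector case: move every timestamp by the same amount.
        segment.time_vector = [t + shift for t in segment.time_vector]
    else:
        # Regular-sampling case: only t_start changes (an unset t_start counts as 0).
        current = 0.0 if segment.t_start is None else segment.t_start
        segment.t_start = current + shift


regular = ToySegment(sampling_frequency=2.0, num_samples=3)
irregular = ToySegment(sampling_frequency=2.0, num_samples=3, time_vector=[0.0, 0.5, 1.25])
shift_times(regular, 100.0)
shift_times(irregular, 100.0)
```

The caller expresses the intent ("move everything by 100 s") without needing to know which of the two internal representations the segment uses.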

I think the use case would be if you had separate sessions in a single day, for example a session, a 10-minute break to change some equipment, and another recording session. If these sessions are held as different segments of a recording (or as separate recordings), the researcher may want to keep the true recording times (e.g. session 1 started at 1 pm, session 2 at 1:30 pm).

OK, I changed my answer below, as I think this type of case is not well served by the current implementation.

h-mayorquin · 2024-07-02T20:04:59Z

src/spikeinterface/core/tests/test_time_handling.py

+        spaced timeseries data. Return the original recording,
+        recoridng with time vectors added and list including the added time vectors.
+        """
+        times_recording = copy.deepcopy(raw_recording)


we have clone for this as an extractor method but if you really require this, why make the raw recording fixture per session?

The benefit of the raw_recording fixture is that durations only needs to be defined once, then copied as set_times() is in place. But I agree it is a lot of indirection and it is probably more readable to incorporate into the individual fixtures, possibly with DURATIONS=[...] set at the top of the script?

JoeZiminski · 2024-07-02T20:48:28Z

Thanks @h-mayorquin! The problem is that t_start and time_vector are mutually exclusive, so if you set time_vector the t_start is removed (and the same is required vice versa). Therefore some kind of setter is required to remove time_vector if it is already set when setting t_start. Also, t_start and time_vector seem like core ways of handling custom times in SpikeInterface, so it might seem unusual to have a setter for one but not the other.

Agree the tests are kind of messed up, in adding these for the t_start I found some other bugs and split them into separate PRs to ease review, but was too lazy to do the tests 😅 . The tests in the other PRs require some of the machinery in this PR, so happy to merge this PR first removing some of those tests, then add them in other PRs once this is merged.

Tomorrow I can try to remove some of the indirection in the tests and remove tests that do not correspond directly to this PR, or at least those that fail if included in this PR. WDYT?

h-mayorquin · 2024-07-02T21:16:46Z

Thanks @h-mayorquin! The problem is that t_start and time_vector are mutually exclusive, so if you set time_vector the t_start is removed (and the same is required vice versa). Therefore some kind of setter is required to remove time_vector if it is set when setting t_start. Also, t_start and time_vector are core ways of handling custom times in SpikeInterface, so for me it seems unusual to have a setter for one while the other is set through a public attribute on a segment.

Yes, I think this is at the core of why I don't like this approach. It requires the end user to be aware of internal details of spikeinterface. I feel it mixes how spikeinterface handles time internally with what the user wants to do with their recording object. Let me elaborate:

I think that a good interface will abstract those implementations details away from the user and will allow them to express things that they want. Let me illustrate a different approach with an example:

My times are irregular, so I will set the correct times that I got from a TTL process with set_times.
But also, this is the second segment of an experiment and it should start an hour after the first.

What I claim is that this should be as easy as:

set_times(times_from_ttl)
set_segment_start_time(t=time_when_the_second_segment_started)

The interface proposed in this PR will say: sorry, those things are exclusive because of this internal spikeinterface reason; what you need to do is compose the time_vector on your own, shift it, and then use set_times differently for each segment.

What I am proposing is that we have a set_segment_start_time that is agnostic to the internal implementation details of spikeinterface. It should do what the user wants:

If the recording has a time_vector because the format is like that or the user set those times before, then when the user calls set_segment_start_time it should modify the time vector such that the time_vector[0] is t_start.
If the recording handles time with sampling_frequency and t_start then it should modify t_start to what the user wants.

Advanced users, or testing cases like the one in this PR, can modify the attributes directly for anything that is not covered.
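The dispatch described above can be sketched as follows. Everything here is hypothetical: `ToySegment` is an illustrative stand-in, `set_segment_start_time` is the proposed (not existing) method name from this comment, plain lists stand in for NumPy arrays, and the choice that `set_times` replaces any previously stored `t_start` is an assumption.

```python
from typing import List, Optional


class ToySegment:
    """Hypothetical segment holding either a time vector or t_start (illustration only)."""

    def __init__(self, sampling_frequency: float, num_samples: int):
        self.sampling_frequency = sampling_frequency
        self.num_samples = num_samples
        self.t_start: Optional[float] = None
        self.time_vector: Optional[List[float]] = None

    def set_times(self, times: List[float]) -> None:
        # Assumption: a time vector replaces any previously stored t_start.
        self.time_vector = list(times)
        self.t_start = None

    def set_segment_start_time(self, t: float) -> None:
        if self.time_vector is not None:
            # Time-vector case: shift the vector so that its first entry equals t.
            offset = t - self.time_vector[0]
            self.time_vector = [x + offset for x in self.time_vector]
        else:
            # Regular-sampling case: simply store the new start time.
            self.t_start = t

    def get_times(self) -> List[float]:
        if self.time_vector is not None:
            return list(self.time_vector)
        t0 = 0.0 if self.t_start is None else self.t_start
        return [t0 + i / self.sampling_frequency for i in range(self.num_samples)]


# Second segment of an experiment: irregular times, starting one hour in.
times_from_ttl = [0.0, 0.5, 1.25]
second_segment = ToySegment(sampling_frequency=2.0, num_samples=3)
second_segment.set_times(times_from_ttl)
second_segment.set_segment_start_time(t=3600.0)
```

In this sketch the user never has to know whether the segment stores a time vector or a `t_start`; the method picks the right internal update.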

JoeZiminski · 2024-07-03T07:49:35Z

Thanks Herberto, I like the semantics of set_segment_start_time(); it is very clear. I also don't like the mutual exclusivity of the t_start and time_vector mechanisms and the requirement for the user to understand the SpikeInterface internals. I have put some feedback on the proposed approach in the dropdown below (so as not to bloat the response).

on `set_segment_start_time`

I like the semantics of this, but I think it still requires the user to understand how SpikeInterface represents time internally, and it creates some hidden dependencies that could be even more confusing. At present there is a hidden dependency, but at least that dependency is mutual exclusivity, so the two concepts can't interact and you only really need to track the one you want to use. In the proposed case there are some interactions going on under the hood which could end up in confusing behaviour. For example:

If you have times that start from zero but you want them relative, you need to pass some times that are 'wrong' and then use another function to make them correct.
If you have times that do not start from zero, you need to remove the offset (?), pass them in an incorrect form, then make them correct again through a second function.
Or maybe set_times can take a time_vector that does not start at zero. But now there are two ways to set the segment start time for a time vector (directly through the time vector passed to set_times, or through this second approach).
It could lead to the case where someone calls set_segment_start_time and later adds the time_vector, but now their time vector is wrong because the previously set t_starts are interacting. It would be quite strange behaviour to call set_times() and then get_times() and find that get_times() returns something different.

I think this is a very important discussion to have but it would require a major reworking of how SpikeInterface handles time which I think is outside the scope of the PR. I think we should make an issue to discuss this, I'd be very keen as it is important for #3072.

For now, let me better motivate this PR with respect to the existing implementation of times:

At present, custom times can be represented as t_start or time_vector. time_vector has its own setter, set_times(). However, t_start cannot be set except directly on the segment, which is a pattern not otherwise used in the SpikeInterface API. The only proper way to set t_start through the API is through some extractors when loading certain file types. So t_start is in this weird state of sometimes turning up in the codebase without actually being settable in general.
Therefore I think there are two possible approaches. Either t_start is not a useful concept and should be removed from SpikeInterface (if t_start is required in extractors, it could be converted to a full time_vector during load; I am not averse to this), or it should be fully exposed and documented in the SpikeInterface API. This PR takes the second approach.
Tests are added to cover all uses of time_vector and t_start in the core machinery.

Let me know what you think of the above. I think there is definitely room for further optimisation of the API generally, but I think this PR represents a digestible improvement on the current situation (alternatively, I would also agree with removing t_start from the SpikeInterface side and relying entirely on time_vector).

h-mayorquin · 2024-07-03T12:48:59Z

Or maybe set_times can take a time_vector that does not start at zero. But now there are two ways to set the segment start time for a time vector (directly through the time vector passed to set_times, or through this second approach).

Yes, that's how I am thinking about it. This is the way it is now. I don't understand the first two examples, but maybe they rely on this behavior not being available? If not, can you illustrate the examples?

It could lead to the case where someone calls set_segment_start_time and later adds the time_vector, but now their time vector is wrong because the previously set t_starts are interacting. It would be quite strange behaviour to call set_times() and then get_times() and find that get_times() returns something different.

This one I agree is confusing. The current behavior is that set_times is the source of truth, so I would keep that overwrite behavior. set_segment_start_time is a utility that works whether or not you have a time vector, but if you later set the time vector, that overrides what set_segment_start_time did before.

At present, custom times can be represented as t_start or time_vector. time_vector has its own setter, set_times(). However, t_start cannot be set except directly on the segment, which is a pattern not otherwise used in the SpikeInterface API. The only proper way to set t_start through the API is through some extractors when loading certain file types. So t_start is in this weird state of sometimes turning up in the codebase without actually being settable in general.

Yeah, maybe it is an improvement, but I don't think it is a big one. Setting t_start at the segment level is fine for the purposes of this PR, for example, which is testing. But if we are going to discuss how to expose a simpler API to the end user, then I want to have the more general discussion that we are having.

We should not try to remove the concept of handling the internal timings with t_start, because that is there for memory-efficiency reasons, but maybe the user does not need to interact with it. Another possibility, besides the one I am proposing, is to only set times with a time vector but transform them to t_start and sampling_frequency internally if they are regular enough. In general I think we should separate the internal handling from how the user interacts with it and avoid a power-user bias.
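The "transform internally if regular enough" idea could be sketched like this. The function name, the tolerance values and the use of plain lists instead of NumPy arrays are all assumptions made for illustration; nothing here reflects an existing SpikeInterface function.

```python
import math
from typing import Optional, Sequence, Tuple


def compress_regular_times(
    times: Sequence[float], rel_tol: float = 1e-9
) -> Optional[Tuple[float, float]]:
    """Return (t_start, sampling_frequency) if `times` is regularly spaced, else None.

    Hypothetical helper: the tolerances are arbitrary illustrative choices.
    """
    if len(times) < 2:
        return None
    # Average step between the first and last timestamps.
    dt = (times[-1] - times[0]) / (len(times) - 1)
    if dt <= 0:
        return None
    for i, t in enumerate(times):
        expected = times[0] + i * dt
        # Each sample must sit on the regular grid within the tolerance.
        if not math.isclose(t, expected, rel_tol=rel_tol, abs_tol=dt * 1e-6):
            return None
    return times[0], 1.0 / dt


regular_result = compress_regular_times([1.0, 1.5, 2.0, 2.5])
irregular_result = compress_regular_times([0.0, 0.4, 1.1, 1.5])
```

When the helper returns a pair, only two numbers need to be stored instead of one timestamp per sample, which is the memory-efficiency argument made above; when it returns None, the full time vector would be kept.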

JoeZiminski · 2024-07-03T13:04:08Z

Thanks! Agree these can be split into two separate discussions. Would you agree with:

Move the tests from here to Fix t_starts not propagated to save_to_memory. #3120 (with some refactoring of the tests) and close this PR
Open an issue to discuss the time API and how this should be presented in Understanding how Spikeinterface handles recording times. #2627

h-mayorquin · 2024-07-03T13:36:37Z

Mmm, this is just my opinion; I don't know if I have convinced you. I think that Sam will side with you if you want to make your life more convenient here.

But I think not using this machinery for the tests is a good idea regardless of whether we make this a user interface or not.

JoeZiminski · 2024-07-05T10:00:02Z

I think it's worthwhile discussing further: if we pick this lane (i.e. what is introduced in this PR) and write the documentation, it will be hard to undo, and there may be better options. I definitely agree there is room to make this cleaner and more intuitive, though for me it is safer to keep t_start and time_vector as distinct concepts rather than allow them to interact. I'll move the tests to the other PR ASAP, and then we can keep this PR with the bare minimum of changes and discuss the API further here.

JoeZiminski · 2024-07-08T15:17:38Z

I will close this PR; the tests can be included in #3120 and #3118, and we are converging on a much better time API in #3157.

Add t_start option to get_times().

6b617ab

JoeZiminski force-pushed the add_t_start_to_set_times branch 7 times, most recently from f6d9e67 to acd57a6 Compare July 1, 2024 21:52

Add tests.

6c87d22

JoeZiminski force-pushed the add_t_start_to_set_times branch from acd57a6 to 62bd935 Compare July 1, 2024 21:59

This was referenced Jul 1, 2024

Add time vector case to get_durations. #3118

Merged

Fix t_starts not propagated to save_to_memory. #3120

Merged

JoeZiminski force-pushed the add_t_start_to_set_times branch from 62bd935 to 6c87d22 Compare July 1, 2024 22:29

Fix docstring typing.

7da035c

JoeZiminski changed the title ~~Expose t_start in BaseRecording, extend tests and a couple of fixes.~~ Expose t_start in BaseRecording Jul 1, 2024

alejoe91 added this to the 0.101.0 milestone Jul 2, 2024

alejoe91 added the core Changes to core module label Jul 2, 2024

JoeZiminski marked this pull request as ready for review July 2, 2024 19:07

h-mayorquin requested changes Jul 2, 2024

h-mayorquin mentioned this pull request Jul 5, 2024

Proposal for handling the user interaction with time #3157

Open

JoeZiminski closed this Jul 8, 2024

JoeZiminski mentioned this pull request Nov 1, 2024

Add shift start time function. #3509

Merged
