Question about Match Event Count in Comparison between Two Sorting Objects #2343

Closed
lavanv1107 opened this issue Dec 15, 2023 · 7 comments
Labels
comparison Related to comparison module

Comments

@lavanv1107

I have two sorting objects, both with the same number of samples and labels. The event times are also exactly the same for both objects.

However, when running both of them through the si.compare_two_sorters function, the computed match event count has fewer events than what was in both objects. Why is that?

This is true for the default delta_time=0.4 as well as for delta_time=0.

I have attached the two numpy sorting objects I created for comparison here.
sorting_objects.zip

Here is the code I used:

cmp_dss_peaks = si.compare_two_sorters(
    sorting1=sorting_dss,
    sorting2=sorting_peaks,
    sorting1_name='DeepSpikeSort',
    sorting2_name='Peaks',
    verbose=True,
    delta_time=0
)

# Sum of match event counts
# There are 7251 events in both objects, but the sum of match events is 7168
np.sum(np.sum(cmp_dss_peaks.match_event_count))

@rly

@alejoe91 alejoe91 added the comparison Related to comparison module label Dec 18, 2023
@alejoe91
Member

alejoe91 commented Dec 18, 2023

Hi @lavanv1107

One question: are you sure that all spikes from the two sorting objects match precisely?

We recently refactored the matching function for efficiency. (see #2114, #2182, #2191)

The main change is the following:
In the "old" implementation, every spike train pair was computed separately, which resulted in (possibly) more accurate matching counts, at the expense of speed. This option became intractable for recordings with hundreds of units.
The "new" implementation tries to match every possible pair at once by traversing the spike vectors (which include all spikes from all units).
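
For illustration only, the per-pair idea can be sketched in plain Python. This is a toy sketch, not the spikeinterface implementation: the helper names `count_matches` and `match_event_count`, the dict-of-lists layout, and the `<= delta` tolerance are all assumptions made for the example. Spike trains are sorted lists of frames.

```python
def count_matches(train_a, train_b, delta):
    """Count one-to-one matches between two sorted spike trains (in frames).

    Two spikes match when they are at most `delta` frames apart, and each
    spike is used at most once. (Hypothetical helper, for illustration.)
    """
    i = j = n = 0
    while i < len(train_a) and j < len(train_b):
        diff = train_a[i] - train_b[j]
        if abs(diff) <= delta:
            n += 1
            i += 1
            j += 1
        elif diff < 0:
            i += 1  # train_a[i] is too early to match any remaining spike of train_b
        else:
            j += 1  # train_b[j] is too early to match any remaining spike of train_a
    return n


def match_event_count(units_1, units_2, delta):
    """Per-pair counts for every (unit of sorting 1, unit of sorting 2) pair."""
    return {
        (u1, u2): count_matches(t1, t2, delta)
        for u1, t1 in units_1.items()
        for u2, t2 in units_2.items()
    }


# Toy data: the same five spike times in both sortings, labelled differently.
sorting_1 = {"a": [10, 50, 90], "b": [30, 70]}
sorting_2 = {"x": [10, 30, 50], "y": [70, 90]}

counts = match_event_count(sorting_1, sorting_2, delta=0)
total = sum(counts.values())
print(counts)
print(total)
```

In this toy case every spike appears in exactly one unit per sorting and no unit has repeated or near-coincident spikes, so the per-pair counts add up to the total number of events. The cost is one pass per unit pair, which is the part that scales poorly with hundreds of units.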

@h-mayorquin has also been implementing a third method which is based on functional similarity, see #2192

Maybe worth trying that out?

@h-mayorquin
Collaborator

@lavanv1107 I can take a look, but can you add the script showing how you loaded the data in spikeinterface?

@samuelgarcia
Member

@lavanv1107: the way we implemented the algorithm makes it not totally symmetric (even though it should be) when one of the two spike trains violates the refractory period, i.e. has inter-spike intervals below delta_time (here 0.4 ms).

So my guess is that np.sum(np.diff(sorting_dss) < delta) is not null.

Could you check this?

@lavanv1107
Author

lavanv1107 commented Dec 20, 2023

Hi @alejoe91,

I created the two sorting objects using the si.NumpySorting.from_times_labels() function.

The lists of times (frames) I used to create the two NumpySorting objects are exactly the same. As for the labels:

  • sorting_dss has labels generated from my sorting algorithm
  • sorting_peaks has labels I retrieved from the NWB file I used

I'll take a look at the other methods you listed.

@lavanv1107
Author

lavanv1107 commented Dec 20, 2023

@samuelgarcia

In np.sum(np.diff(sorting_dss) < delta), is sorting_dss the list of times (frames) used to make the sorting object?

If so, yes, it is not null.

@samuelgarcia
Member

Oops, sorting_dss is the sorting object. I was unclear. My formula was meant for the spike train times themselves.
I had in mind something like this:

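# Assumption: delta is delta_time (0.4 ms here) converted to frames, since spike trains are in frames
delta = int(0.4 / 1000 * sorting_dss.get_sampling_frequency())
for unit_id in sorting_dss.unit_ids:
    spiketrain = sorting_dss.get_unit_spike_train(unit_id=unit_id)
    if np.sum(np.diff(spiketrain) < delta) > 0:
        print(f"unit_id {unit_id} has some spikes under refractory period")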

If you have some units like this, then compare_two_sorters will not give an exact matching count, because there are ambiguous cases and we simplified the algorithm to make it faster.
In short, in the extreme situation where two units burst at the same time, with several intervals under delta, some spikes can be counted twice in our implementation.
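
To make the ambiguity concrete, here is a toy sketch in plain Python. It is not the spikeinterface algorithm: `count_with_partner` and `short_intervals` are hypothetical helpers, and the frame values are made up. It only shows that once a spike train has an interval shorter than delta, a simple "is there a partner within delta?" count depends on which side you count from.

```python
from bisect import bisect_left


def count_with_partner(train, other, delta):
    """Count spikes in `train` with at least one spike of `other` within `delta` frames.

    Both trains are sorted lists of frames. (Hypothetical helper, for illustration.)
    """
    n = 0
    for t in train:
        k = bisect_left(other, t - delta)  # first spike of `other` at or after t - delta
        if k < len(other) and other[k] <= t + delta:
            n += 1
    return n


def short_intervals(train, delta):
    """Number of inter-spike intervals strictly below `delta` frames."""
    return sum(1 for a, b in zip(train, train[1:]) if b - a < delta)


delta = 5                  # tolerance in frames (made-up value)
unit_1 = [100, 103, 500]   # contains a 3-frame interval, below delta
unit_2 = [102, 500]

n_short = short_intervals(unit_1, delta)
from_1 = count_with_partner(unit_1, unit_2, delta)
from_2 = count_with_partner(unit_2, unit_1, delta)
print(n_short, from_1, from_2)  # prints: 1 3 2
```

Spikes 100 and 103 of `unit_1` both claim spike 102 of `unit_2`, so counting from `unit_1` gives 3 while counting from `unit_2` gives 2. A strict one-to-one matching would resolve this, at extra cost.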

@lavanv1107
Author

Hello,

It seems that after updating to the latest version, the comparison does account for all peaks now. Thank you so much for your assistance!

4 participants