
Optimize make_match_count_matrix #2114

Merged
merged 16 commits into SpikeInterface:main from speed_up_matrix on Oct 31, 2023

Conversation

h-mayorquin
Collaborator

@h-mayorquin commented Oct 18, 2023

OK guys. This was really tedious to test and think about (I did the other Hilbert-space approach out of pure spite), but here it is.

It is 70 times faster in the benchmark below. It would be interesting to know whether you think some other parameter set or regime would be more relevant.
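For reference, this is roughly the quantity being benchmarked: for each pair of units, the count of coincident spikes within delta_frames. A naive numpy sketch of that idea (illustrative only; it is not the optimized implementation in this PR, and the exact matching rule of make_match_count_matrix may differ, e.g. in how duplicate matches are handled):

```python
import numpy as np

def match_count_naive(spike_trains1, spike_trains2, delta_frames):
    # matrix[i, j] counts spikes of unit i (from sorting1) that have at least
    # one spike of unit j (from sorting2) within +/- delta_frames.
    matrix = np.zeros((len(spike_trains1), len(spike_trains2)), dtype=np.int64)
    for i, train1 in enumerate(spike_trains1):
        for j, train2 in enumerate(spike_trains2):
            # train2 must be sorted; for each spike in train1, find the window
            # of spikes in train2 that fall within +/- delta_frames
            left = np.searchsorted(train2, train1 - delta_frames, side="left")
            right = np.searchsorted(train2, train1 + delta_frames, side="right")
            matrix[i, j] = np.count_nonzero(right > left)
    return matrix
```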

(make_match_count_matrix_si is the current method on main)

durations=[18000.0], num_units=100
function.__name__='make_match_count_matrix_si'
--------------------------------------------------
Mean time over 5 iterations: 44.444810 seconds
Std over 5 iterations: 1.043592 seconds
Std without compilation: 1.166766 seconds
Mean without compilation: 44.443244 seconds
--------------------------------------------------
function.__name__='make_match_count_matrix'
--------------------------------------------------
Mean time over 5 iterations: 0.714384 seconds
Std over 5 iterations: 0.189878 seconds
Std without compilation: 0.009023 seconds
Mean without compilation: 0.619530 seconds
--------------------------------------------------
Speedup: 71.7369745920062
Speedup including compilation: 62.21417947326248

Here is the code:

import timeit

import numpy as np

# Import paths assumed here: generate_sorting lives in spikeinterface.core and
# make_match_count_matrix in spikeinterface.comparison.
from spikeinterface.core import generate_sorting
from spikeinterface.comparison import make_match_count_matrix
# make_match_count_matrix_si is a local copy of the implementation currently on main.

num_units = 100
# num_units = 3
durations = [5 * 60 * 60.0]  # 18000 seconds
# durations = [60 * 60.0]
sampling_frequency = 30_000.0
seed = 20
sorting = generate_sorting(num_units, durations=durations, sampling_frequency=sampling_frequency, seed=seed)

samples_per_second = sampling_frequency
samples_per_millisecond = samples_per_second / 1000
delta_milliseconds = 3.334
delta_frames = int(delta_milliseconds * samples_per_millisecond)

sorting1 = sorting
sorting2 = sorting

number_it = 5

functions = [
    make_match_count_matrix_si,
    make_match_count_matrix,
    # make_match_count_matrix_theoretical,
]

results = {}

# Iterate through the functions and collect timing information
for function in functions:
    print(f"{function.__name__=}")

    # Run the function multiple times to gather timing data
    times = [timeit.timeit(lambda: function(sorting1, sorting2, delta_frames=delta_frames), number=1) for _ in range(number_it)]

    # Calculate the mean and std of the times; the first run includes numba
    # compilation, so the "without compilation" statistics drop it
    mean_time = np.mean(times)
    std_time = np.std(times)
    mean_without_compilation = np.mean(times[1:])
    std_without_compilation = np.std(times[1:])

    print("-" * 50)
    print(f"Mean time over {number_it} iterations: {mean_time:2.6f} seconds")
    print(f"Std over {number_it} iterations: {std_time:2.6f} seconds")
    print(f"Std without compilation: {std_without_compilation:2.6f} seconds")
    print(f"Mean without compilation: {mean_without_compilation:2.6f} seconds")

    results[function.__name__] = {
        "iterations": number_it,
        "times": times,
        "mean": mean_time,
        "std": std_time,
        "mean_without_compilation": mean_without_compilation,
    }

    print("-" * 50)

# Divide the mean (without compilation) of the first function by that of the second
speedup = results[functions[0].__name__]["mean_without_compilation"] / results[functions[1].__name__]["mean_without_compilation"]
speedup_including_compilation = results[functions[0].__name__]["mean"] / results[functions[1].__name__]["mean"]
print(f"Speedup: {speedup}")
print(f"Speedup including compilation: {speedup_including_compilation}")

@h-mayorquin added the comparison (Related to comparison module) label on Oct 18, 2023
@h-mayorquin self-assigned this on Oct 18, 2023
@samuelgarcia
Member

Yeah! You are a true hero.
Thanks to this, I think we will be able to remove the joblib dependency soon!!!

@h-mayorquin
Collaborator Author

@alejoe91
You are totally right, Alessio!

I wanted to avoid the bare try-except (as that still imports numba even when you are not using it), but for numba the kind of solution that preserves performance is a monster. Check the last commit.

I wonder if we should just eat the import cost and go with the simple try-except.
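For reference, the lazy-import idea looks roughly like this (a minimal sketch, not the actual commit):

```python
import functools

def lazy_njit(func):
    # Defer the numba import and jit-compilation until the decorated function
    # is first called, so importing the module stays cheap if numba is unused.
    compiled = None

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        nonlocal compiled
        if compiled is None:
            import numba  # only imported on the first call
            compiled = numba.njit(func)
        return compiled(*args, **kwargs)

    return wrapper
```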

@h-mayorquin
Collaborator Author

There is always a quality metrics test failing somewhere; any idea about this one, @alejoe91?

It seems unrelated to the change here, but I have been wrong about this before...

@zm711
Collaborator

zm711 commented Oct 20, 2023

I actually had a semi-related question about the point of this PR, so I'm leaving it here so I can come back to it if I forget. I understand the point of keeping dependencies minimal, but there is so much numba in this codebase that I wonder why it isn't a dependency?

> There is always a quality metrics test failing somewhere; any idea about this one, @alejoe91?
>
> It seems unrelated to the change here, but I have been wrong about this before...

I've seen that test fail on my PRs before too. I think it is a non-deterministic test that fails on some runs and not others; it probably needs a patch at some point.

@samuelgarcia
Member

About dependencies: it is OK to have heavier dependencies in specific submodules.
comparison having numba is OK.
What I do not like is core having numba and scipy.

So if comparison needs numba, we can use a global import for numba, I think, like we do everywhere else.
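That is, the usual module-level pattern (a sketch of the convention, not code from this PR; the function name is hypothetical):

```python
# Try the import once at module scope and record a flag; functions that need
# numba check the flag before running.
try:
    import numba
    HAVE_NUMBA = True
except ImportError:
    HAVE_NUMBA = False

def some_numba_accelerated_function(*args, **kwargs):
    assert HAVE_NUMBA, "please install numba to use this function"
    ...
```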

@alejoe91
Member

Thanks Ramon! This looks great to me! @samuelgarcia do you have any more comments?

@samuelgarcia
Member

Let me read this again.

@h-mayorquin
Collaborator Author

Just had a discussion with Sam. Some points:

  1. I was not aware that sorting objects can have their times reset. For that reason and others, spikes from different sortings should not be compared the way I am currently doing it. I will change this.
  2. I will switch to the spike_vector, which will be improved later; I think it will make the code easier to read (see the sketch below).
  3. We are both wary of the current solution for dynamically importing numba. We should think about a general solution for how to do this.
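To illustrate point 2 (assuming the to_spike_vector() API on sorting objects; field names may differ across versions):

```python
from spikeinterface.core import generate_sorting

sorting = generate_sorting(num_units=3, durations=[10.0], seed=0)

# All spikes from all units in one structured array ordered by sample time,
# which is convenient for a single sweep instead of per-unit-pair loops.
spike_vector = sorting.to_spike_vector()

print(spike_vector.dtype.names)          # e.g. ('sample_index', 'unit_index', 'segment_index')
print(spike_vector["sample_index"][:5])  # spike times in samples
print(spike_vector["unit_index"][:5])    # which unit fired each spike
```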

@alejoe91 added this to the 0.99.0 milestone on Oct 25, 2023
@samuelgarcia
Member

Hi @h-mayorquin.
Thanks a lot for this. It is really super, super useful.
I made a few changes on the fly; I hope that is OK with you.
I am not super happy with the numba-decorator mechanism for avoiding a global numba import.
I will merge this now, but I propose opening an issue later to unify the numba decorators across the codebase for a future release.

@h-mayorquin
Collaborator Author

Great. Let's work on the numba thing.

@samuelgarcia merged commit 40210c8 into SpikeInterface:main on Oct 31, 2023
9 checks passed
@h-mayorquin deleted the speed_up_matrix branch on November 29, 2023