Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

get_behavioral_distance() raises errors under certain circumstances #130

Open
quantumdot opened this issue Jul 30, 2021 · 0 comments
Open

Comments

@quantumdot
Copy link
Contributor

Calling moseq2_viz.model.dist.get_behavioral_distance() is crashing on syllables that have no emissions in the dataset when computing the distance metric pca[dtw]. Given the call

get_behavioral_distance(sorted_index, model, max_syllable=100, sort_labels_by_usage=False, count='usage', distances=['pca[dtw]'])

I see the (partial) stack trace:

  File "c:\users\josh\git\moseq2\moseq2-viz\moseq2_viz\model\dist.py", line 165, in get_behavioral_distance
    **use_options)
  File "c:\users\josh\git\moseq2\moseq2-viz\moseq2_viz\model\util.py", line 1092, in retrieve_pcs_from_slices
    inds = np.random.randint(0, len(filtered_slices), size=max_samples)
  File "mtrand.pyx", line 741, in numpy.random.mtrand.RandomState.randint
  File "_bounded_integers.pyx", line 1353, in numpy.random._bounded_integers._rand_int32
ValueError: low >= high

Stepping through with a debugger, it looks like moseq2_viz.model.util.retrieve_pcs_from_slices() is failing if slices is an empty list, specifically when trying to sample random slices (because there are none!). This was not the behavior in earlier versions of moseq2-viz. I suspect the same errors would occur when computing the distance metric combined.

I'm not really sure the best course of action to resolve this bug. While investigating, I found a few other related issues:

  • The kwargs provided by get_behavioral_distance() for retrieve_pcs_from_slices() includes 'max_samples': None,, but then retrieve_pcs_from_slices() will fail when trying to enumerate(use_slices). This is due to np.random.randint() only returning a single value when size is None, so use_slices ends up with a value of a single slice and not a list of slices. Not sure if the use of None for max_samples is intentional or not.
  • The unit tests do not cover some of the possible execution paths for get_behavioral_distance(). Namely, they omit the distance metrics of pca[dtw] and combined. Probably this contributes to these problems not being discovered sooner.

My suggested patch would be something along the lines of below, which seems to work on my machine.

diff --git a/moseq2_viz/model/util.py b/moseq2_viz/model/util.py
index 5c2a8b2..10ac67c 100644
--- a/moseq2_viz/model/util.py
+++ b/moseq2_viz/model/util.py
@@ -1089,8 +1089,13 @@ def retrieve_pcs_from_slices(slices, pca_scores, max_dur=60, min_dur=3,
                          get(0))
     filtered_slices = _gen_to_arr(filter(filter_dur, slices))
     # select random samples
-    inds = np.random.randint(0, len(filtered_slices), size=max_samples)
-    use_slices = filtered_slices[inds]
+    if(len(filtered_slices) > 0):
+        inds = np.random.randint(0, len(filtered_slices), size=max_samples)
+        if not hasattr(inds, 'shape'):
+            inds = [inds]
+        use_slices = filtered_slices[inds]
+    else:
+        use_slices = []

     syllable_matrix = np.zeros((len(use_slices), max_dur, npcs), 'float32')
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant