get_behavioral_distance() raises errors under certain circumstances #130

quantumdot · 2021-07-30T23:42:24Z

Calling moseq2_viz.model.dist.get_behavioral_distance() is crashing on syllables that have no emissions in the dataset when computing the distance metric pca[dtw]. Given the call

get_behavioral_distance(sorted_index, model, max_syllable=100, sort_labels_by_usage=False, count='usage', distances=['pca[dtw]'])

I see the (partial) stack trace:

  File "c:\users\josh\git\moseq2\moseq2-viz\moseq2_viz\model\dist.py", line 165, in get_behavioral_distance
    **use_options)
  File "c:\users\josh\git\moseq2\moseq2-viz\moseq2_viz\model\util.py", line 1092, in retrieve_pcs_from_slices
    inds = np.random.randint(0, len(filtered_slices), size=max_samples)
  File "mtrand.pyx", line 741, in numpy.random.mtrand.RandomState.randint
  File "_bounded_integers.pyx", line 1353, in numpy.random._bounded_integers._rand_int32
ValueError: low >= high

Stepping through with a debugger, it looks like moseq2_viz.model.util.retrieve_pcs_from_slices() is failing if slices is an empty list, specifically when trying to sample random slices (because there are none!). This was not the behavior in earlier versions of moseq2-viz. I suspect the same errors would occur when computing the distance metric combined.

I'm not really sure the best course of action to resolve this bug. While investigating, I found a few other related issues:

The kwargs provided by get_behavioral_distance() for retrieve_pcs_from_slices() includes 'max_samples': None,, but then retrieve_pcs_from_slices() will fail when trying to enumerate(use_slices). This is due to np.random.randint() only returning a single value when size is None, so use_slices ends up with a value of a single slice and not a list of slices. Not sure if the use of None for max_samples is intentional or not.
The unit tests do not cover some of the possible execution paths for get_behavioral_distance(). Namely, they omit the distance metrics of pca[dtw] and combined. Probably this contributes to these problems not being discovered sooner.

My suggested patch would be something along the lines of below, which seems to work on my machine.

diff --git a/moseq2_viz/model/util.py b/moseq2_viz/model/util.py
index 5c2a8b2..10ac67c 100644
--- a/moseq2_viz/model/util.py
+++ b/moseq2_viz/model/util.py
@@ -1089,8 +1089,13 @@ def retrieve_pcs_from_slices(slices, pca_scores, max_dur=60, min_dur=3,
                          get(0))
     filtered_slices = _gen_to_arr(filter(filter_dur, slices))
     # select random samples
-    inds = np.random.randint(0, len(filtered_slices), size=max_samples)
-    use_slices = filtered_slices[inds]
+    if(len(filtered_slices) > 0):
+        inds = np.random.randint(0, len(filtered_slices), size=max_samples)
+        if not hasattr(inds, 'shape'):
+            inds = [inds]
+        use_slices = filtered_slices[inds]
+    else:
+        use_slices = []

     syllable_matrix = np.zeros((len(use_slices), max_dur, npcs), 'float32')

The text was updated successfully, but these errors were encountered:

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

get_behavioral_distance() raises errors under certain circumstances #130

get_behavioral_distance() raises errors under certain circumstances #130

quantumdot commented Jul 30, 2021

get_behavioral_distance() raises errors under certain circumstances #130

get_behavioral_distance() raises errors under certain circumstances #130

Comments

quantumdot commented Jul 30, 2021