Clarity on data description #6

YiqiJ · 2024-12-19T07:02:13Z

Hi,
Thank you for sharing the dataset. I am processing the data but encountered several issues:

Center Out Reaching 001057 (nwb), file sub-Dataset-3-Animals-1-2-3-&-4, has no Target_ID field, only a target_dir field. Additionally, there are 15 unique target_dir values. Applying /(2*np.pi) * 360, I obtain the unique values [0, 45, 90, 135, 180, 225, 270, 325, 1, -1, 2, -2, 3, -3, nan]. Specifically, Animal 3 has the correct radian values, while Animals 1, 3, 4 all have non-radian values. I initialized a map from the index to radians. However, if you plot plt.plot(hand_vel_x, hand_vel_y), the index labels (-3 to 3) seem incorrect. As shown here, this is trial_dir equal to -2, which presumably corresponds to 225 degrees.

Center Out Reaching 001057 (nwb), file sub-Dataset-5-Animal-1, has a Target_ID field, but with values array([ 6., 3., 2., 7., 4., 5., 0., 1., 12., nan]), which look confusing.
Currently, I am using df_raw , bin = get_dataframe(data,filter_result=[b'R']). But when applying df=rebin(df_raw,prev_bin_size = bin ,new_bin_size = 30) and df = align_event(df, start_event='EventTarget_Onset', bin_size=30,offset_min=-50,offset_max=400), the trial length decreases from ~120 time bins to fewer than 10 time bins. Additionally, after calling these two functions, each trial appears to include several different target_dir values, which should not be the case. I expected the align function to first find the key point (e.g., target onset) and then include 50 time bins before and 400 time bins after.
The Kaggle dataset ending in .parquet looks very different from the nwb files. If you plot the hand velocity

Can you please specify

How to interpret the trial type information, including trial_dir = {-3, -2, -1, 1, 2, 3} and target_ID = array([ 7., 2., 3., 1., 0., 6., 5., 4., nan])?
How to get 50 ms before the key point and 400 ms after it? That should result in 45 time bins if the bin width is 10 ms.
Is the original data with timestamps and hand position available? (It looks like dataset 4 has cursor_pos information available, but datasets 3 and 5 do not.) I feel like this could be extremely useful.

Thank you!


acarolinafilipe · 2024-12-19T11:19:16Z

Hi,

Thank you for your detailed feedback and observations. Let me address each point:

Target_ID in Dataset 3
You are correct that the original data for Dataset 3 does not include a Target_ID field but rather a target_dir field. I have double-checked the data, and the values in target_dir are the same as in the original dataset. According to their documentation, the target_dir values should represent angles in radians for all animals. I did not apply any further processing to this variable. However, I agree that the values you pointed out do not look consistent or interpretable. I will contact the original authors of this dataset to clarify and verify the correctness of these values.
Target_ID in Dataset 5
The Target_ID values in Dataset 5 indicate that the experiment involved 13 unique targets across all experiments. However, not all targets were used for every animal or task, as this dataset includes both a center-out task and a random-target task. The nan values represent incomplete or aborted trials. I will update the labeling in the data to ensure these cases are explicitly identified. Thank you for catching this issue. (note: cursor position is available in this dataset.)
Alignment and Rebinning
I re-tested the alignment function and confirmed that it works as expected. The offsets specified in the function are in milliseconds (ms), not time bins. Please check if you are filtering by dataset, animal, session, and trial when verifying single-trial lengths. Filtering inconsistently or across datasets can lead to discrepancies.
Your intuition is correct; it should work as you described. If the trial length decreases significantly after alignment, this could be caused by misaligned event timestamps or inconsistencies in the way trials are filtered. Can you tell me in which dataset you observed this problem? That will help me replicate the problem and investigate it further.
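As a rough illustration of the millisecond-versus-bin point above, here is a minimal sketch of how a window given in milliseconds maps onto bin indices. The helper names `offsets_to_bins` and `extract_window` are hypothetical and are not part of the NeuroTask API; the sketch assumes evenly spaced bins and that the event falls on a bin boundary.

```python
import numpy as np

def offsets_to_bins(offset_min_ms, offset_max_ms, bin_size_ms):
    """Convert a window given in ms into (start, stop) offsets in bins."""
    start = int(round(offset_min_ms / bin_size_ms))
    stop = int(round(offset_max_ms / bin_size_ms))
    return start, stop

def extract_window(values, event_bin, offset_min_ms, offset_max_ms, bin_size_ms):
    """Slice `values` around `event_bin`; hypothetical helper, not align_event."""
    start, stop = offsets_to_bins(offset_min_ms, offset_max_ms, bin_size_ms)
    lo, hi = event_bin + start, event_bin + stop
    if lo < 0 or hi > len(values):
        raise ValueError("window extends beyond the recorded trial")
    return values[lo:hi]

# -50 ms .. 400 ms at 10 ms bins: 5 bins before the event, 40 after.
signal = np.arange(200)
window = extract_window(signal, event_bin=100, offset_min_ms=-50,
                        offset_max_ms=400, bin_size_ms=10)
print(len(window))
```

With these offsets the window holds 45 bins at a 10 ms bin size and 15 bins at 30 ms, so the expected trial length after alignment depends on the bin size of the dataframe that is actually passed in.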

Thank you for using our dataset and for your detailed observations. We're excited to see people working with the data, and your feedback is invaluable for improving it. Please don't hesitate to contact us if you have any other problems and share details about the datasets or specific problems so that we can provide further assistance.

YiqiJ · 2024-12-19T21:18:57Z

Hi Carolina,

Thank you for the quick response! These are very helpful. A few more questions:

These are some sample behavioral trajectories in Dataset-3-5. Note that these are plots of [cursor_vel_x, cursor_vel_y]. The colors are assigned based on the Target_ID. I ignored trials whose Target_ID is either 12 or nan.

Q1 (Dataset 5): It looks like a fraction of trials are labeled correctly, while a fraction have wrong Target_ID labels (some random colors in each target direction). I wonder if this is because Dataset-5 includes two tasks? Two follow-up questions: 1) Does the ID in the table in Q4 of NeuroTask_datasheet.pdf refer to the dataset ID or some other ID? 2) If Dataset-5 includes both the CenterOut and RandomTarget tasks, how do we differentiate between the two?

Q2 (Dataset 4): Here are the results I got comparing with and without rebinning and alignment. It looks like rebinning and alignment are not giving me the correct results. I'm doing

fpath = "data/NeuroTask/CenterOutReaching/001057/sub-Dataset-3-Animals-1-2-3-&-4/sub-Dataset-3-Animals-1-2-3-&-4.nwb"
data = nap.load_file(fpath)
df_raw , bin = get_dataframe(data,filter_result=[b'R'])
print(bin)
df = rebin(df_raw,prev_bin_size = bin ,new_bin_size = 10)
df = align_event(df_raw, start_event='EventTarget_Onset', bin_size=10,offset_min=-50,offset_max=400)

Q3 (Dataset 3): For Animal-1, it seems like the target_dir might be incorrect. The colors are consistent across sessions, which might indicate that some directions are consistently mislabeled. (It is hard to believe that the monkey is performing incorrectly, because we are selecting only the rewarded trials by calling df_raw , bin = get_dataframe(data,filter_result=[b'R']).) Animal-2 looks correct. For Animal-3, all sessions appear to have only 4 directions, but the direction labels are correct. For Animal-4, even though there are 12 sessions, it seems like sessions 1 and 2 are identical, sessions 3 and 4 are identical, etc., so in total there are only 6 different sessions. Furthermore, the target_dir labels look incorrect for this animal.

Note that the animals with false target_dir labels (Animal-1 and Animal-4) are those whose target_dir has values [-3, -2, -1, 0, 1, 2, 3] after division by 2 pi and multiplication by 360, while Animal-2 and Animal-3, which have correct target_dir values, are those whose target_dir has values [0, 45, 90, 135, 180, 225, 270, 325] after division by 2 pi and multiplication by 360.
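One way to flag the suspicious labels automatically is to check whether each target_dir value, read as radians, lands on an evenly spaced grid of directions. The following is a minimal sketch only: the helper name `on_degree_grid` is hypothetical, and the 45-degree spacing is an assumption based on the eight-direction values reported above, not something taken from the dataset documentation.

```python
import numpy as np

def on_degree_grid(target_dir_rad, step_deg=45.0, tol_deg=1e-3):
    """Boolean mask: True where the angle (in radians) is a multiple of step_deg."""
    deg = np.mod(np.rad2deg(np.asarray(target_dir_rad, dtype=float)), 360.0)
    remainder = np.mod(deg, step_deg)
    # Distance to the nearest grid point, measured in both directions.
    distance = np.minimum(remainder, step_deg - remainder)
    # NaN entries compare as False, so aborted trials are never marked valid.
    return distance < tol_deg

# Toy values mimicking the two groups described above (assumed, not real data).
good = np.deg2rad([0.0, 45.0, 90.0, 225.0])
suspect = np.deg2rad([1.0, -2.0, 3.0, np.nan])
print(on_degree_grid(good))
print(on_degree_grid(suspect))
```

Note that a value of 0 passes this check in either group, so the mask is more useful when summarized per animal or per session than when read trial by trial.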

For Dataset-3, I obtain an original bin size of 30 ms. I'm confused about this BIN value from df , BIN = get_dataframe(data,filter_result=[b'R']). Is that the sampling rate? If so, 30 Hz sounds too slow for electrophysiology data. From the original paper, it seems the sampling rate is 30 kHz. If the original data is already binned, shouldn't the data in each bin represent a spike rate, and thus be a real value rather than a binary value?
For datasets that have multiple tasks, how can we differentiate between them?
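To make the bin-size question concrete, here is a minimal sketch of how spike times are commonly turned into per-bin values. It is a generic illustration and does not describe how this dataset was actually produced: the helper `bin_spikes` and the toy spike times are hypothetical. The sketch only shows that a 30 ms bin width is a separate quantity from the acquisition sampling rate, and that binned data can hold integer counts that become rates only after dividing by the bin width.

```python
import numpy as np

def bin_spikes(spike_times_s, t_start_s, t_stop_s, bin_size_ms):
    """Count spikes in consecutive bins of width bin_size_ms (hypothetical helper)."""
    n_bins = int(round((t_stop_s - t_start_s) * 1000 / bin_size_ms))
    edges = t_start_s + np.arange(n_bins + 1) * bin_size_ms / 1000.0
    counts, _ = np.histogram(spike_times_s, bins=edges)
    return counts

# Toy spike times in seconds; their resolution could come from a 30 kHz recording.
spikes = np.array([0.001, 0.012, 0.029, 0.031, 0.075])
counts = bin_spikes(spikes, t_start_s=0.0, t_stop_s=0.09, bin_size_ms=30)
rates_hz = counts / 0.030  # spikes per bin divided by bin width in seconds
print(counts, rates_hz)
```

Under this convention a bin can hold more than one spike, so whether a column looks binary depends on the firing rate and the bin width.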

Thanks!

acarolinafilipe · 2024-12-23T10:41:25Z

I wanted to let you know that I’ve already contacted the authors of the original datasets to investigate the issue with the target ID and dir. I plan to upload the revised dataset after the Christmas break, in early January.

Regarding the align_event function, please ensure the dataframe used is consistent with the re-binning process. In your code, when calling this function, use the dataframe obtained after re-binning rather than the original one to maintain consistency.

On DANDI, each dandiset corresponds to a task, while on Kaggle, this information is indicated in the file name. Currently, RTT (for dataset 5) is only available in the Parquet format and not in NWB, but I will include it in the upcoming version.

Yes, the ID corresponds to datasetID.
If there’s any other field or feature you’d like to see in the dataset, let me know, and I’ll include it in the new version.

Thank you again for your feedback, and happy holidays!

acarolinafilipe · 2025-01-14T16:56:41Z

I've uploaded dataset 3 with the corrected targets, on both Kaggle and DANDI. The numbering of animal 4's sessions has also been changed, because two brain areas were recorded at the same time, which could have been a little misleading previously.
