Convert any sample columns in events files to integers. #356

Closed

Conversation

rwblair
Member

@rwblair rwblair commented Feb 15, 2023

As per:
https://bids-specification.readthedocs.io/en/stable/04-modality-specific-files/05-task-events.html

Script I ran from the root of the repository:

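#!/bin/bash
# For every *_events.tsv that mentions "sample", strip the fractional part
# from the values in the column named exactly "sample".
find . -name "*_events.tsv" -exec grep -l sample {} \; | while IFS= read -r line; do
    # 1-based index of the "sample" column in the header row.
    indx=$(head -n1 "$line" | tr "\t" "\n" | grep -nx sample | cut -d":" -f1)
    if [[ $indx ]]; then
        # Detect CRLF line endings so gawk splits records correctly.
        if grep -qzP "\r\n$" "$line"; then
            rs="\r\n"
        else
            rs="\n"
        fi
        echo "$line at column $indx"
        # Keep only the integer part of the first decimal number in the column.
        # Note: records are written back with LF endings (gawk's default ORS).
        gawk -v RS="$rs" -v col="$indx" 'BEGIN { FS=OFS="\t" } { $col=gensub(/([0-9]+)\.[0-9]+/, "\\1", 1, $col); print }' "$line" > tmp && mv tmp "$line"
    fi
done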

@@ -1,300 +1,300 @@
onset duration trial_type response_time sample value

Samples with .5

@@ -1,553 +1,553 @@
onset duration sample event_type face_type rep_status trial rep_lag value stim_file
0.004 n/a 1.0 setup_right_sym n/a n/a n/a n/a 3 n/a
24.2098181818 n/a 6052.4545 show_face_initial unfamiliar_face first_show 1 n/a 13 u032.bmp

Samples with lots of extra digits

Contributor

@effigies effigies left a comment

It would be good to get feedback from the contributors of eeg_face13, which has non-integer samples (all halves), and of ds003654s_*, where the sample column is the onset column divided by 0.004.

I agree that samples should be integers and this is explicit in the spec.
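A quick way to find the affected rows is to scan the sample column for values that are not written as integers. The following is a minimal sketch, not part of this PR: the function name `non_integer_samples` is hypothetical, the first two data rows are taken from the diff excerpt above, and the third row is made up for contrast.

```python
import csv
import io


def non_integer_samples(tsv_text, column="sample"):
    """Return (line_number, value) pairs whose `column` entry is not written as an integer."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    bad = []
    for line_number, row in enumerate(reader, start=2):  # the header is line 1
        value = row.get(column)
        if value is None or value == "n/a":
            continue
        try:
            int(value)  # int("1.0") raises ValueError, int("7500") does not
        except ValueError:
            bad.append((line_number, value))
    return bad


# Rows 1-2 come from the diff excerpt in this thread; row 3 is a hypothetical clean row.
example = (
    "onset\tduration\tsample\n"
    "0.004\tn/a\t1.0\n"
    "24.2098181818\tn/a\t6052.4545\n"
    "30.0\tn/a\t7500\n"
)
print(non_integer_samples(example))
```

Note that a value such as `1.0` is flagged too, because the check is on how the value is written rather than on its numeric value.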

@effigies
Contributor

"Authors": [
"James A. Desjardins",
"Sidney J. Segalowitz"
],

"Authors": [
"Daniel G. Wakeman",
"Richard N Henson",
"Dung Truong (curation)",
"Kay Robbins (curation)",
"Scott Makeig (curation)",
"Arno Delorme (curation)"
],

@VisLab you might be a good person to check in with here.

Member

@sappelhoff sappelhoff left a comment

Thanks, Ross. I'll take care of mirroring these changes to eeg_matchingpennies, where apparently I used floats instead of integers, so I'll just cut off the .0 at each sample.

Some further notifications:

@VisLab
Member

VisLab commented Mar 15, 2023

I'll fix these both on the example datasets (all of the hed-related examples have this problem) and on the openNeuro version.

BTW, I noticed that this is referencing bids-examples/eeg_ds003654s_hed/dataset_description.json. However, we did a correction pull request, which has already been merged, so that the datasets are named eeg_ds003645s in bids-examples.

I thought I had caught and corrected all of the 54 --> 45 errors. Where did this one come from @effigies? I'd like to correct that while I am at it. Thx

@christinerogers

@Andesha is this something we could update in the next month perhaps?

(thanks @sappelhoff for bringing my attention to this)

@sappelhoff
Member

> I thought I had caught and corrected all of the 54 --> 45 errors. Where did this one come from @effigies? I'd like to correct that while I am at it. Thx

That is simply a git merge conflict. In master, the files are named correctly, see e.g.:

They show up here as wrong because @rwblair's branch is old.

@Andesha
Contributor

Andesha commented Mar 15, 2023

Feel free to cut off the decimal values for the samples. That's just a result of downsampling from the original recording rate of 1024 Hz to 512 Hz.
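To illustrate where the halves come from: if a sample index recorded at 1024 Hz is rescaled to 512 Hz by the ratio of the two rates, every odd index lands on a half sample. This is a minimal sketch under that assumption; the helper `rescale_sample` and the example indices are hypothetical and do not come from the dataset's actual processing pipeline.

```python
from fractions import Fraction


def rescale_sample(sample, old_rate_hz, new_rate_hz):
    """Express a sample index from the old sampling rate in units of the new rate."""
    return Fraction(sample) * Fraction(new_rate_hz, old_rate_hz)


# Hypothetical event positions in the original 1024 Hz recording.
for original in (2048, 2049):
    print(original, "->", float(rescale_sample(original, 1024, 512)))
```

Even indices map to whole samples at 512 Hz, while odd indices end in .5, which matches the pattern seen in eeg_face13.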

@effigies
Contributor

I reran Ross' script on master and pushed to #362.

@VisLab To confirm, is truncating (rounding down to the nearest integer) an acceptable transform?

@arnodelorme
Contributor

Sample latencies can sometimes be fractional because they come from a different machine that has more resolution than the EEG sampling frequency. This is important for reaction time, for example. Even with EEG sampled at 250 Hz, you need to be able to determine reaction time with 1ms (1000 Hz equivalent sampling rate) precision.
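The cost of truncation can be made concrete with a small calculation. The sketch below assumes a hypothetical response latency of 0.439 s measured with millisecond precision and EEG sampled at 250 Hz; the function names are illustrative only.

```python
from decimal import Decimal


def latency_to_sample(latency_s, sfreq_hz):
    """Convert a latency in seconds to (possibly fractional) sample units."""
    return latency_s * sfreq_hz


def sample_to_latency(sample, sfreq_hz):
    """Convert sample units back to seconds."""
    return Decimal(sample) / sfreq_hz


sfreq_hz = Decimal(250)        # EEG sampling rate from the example in the text
latency_s = Decimal("0.439")   # hypothetical reaction time with 1 ms precision

fractional = latency_to_sample(latency_s, sfreq_hz)   # 109.75 samples
truncated = int(fractional)                           # 109 after truncation
lost = latency_s - sample_to_latency(truncated, sfreq_hz)

print(fractional, truncated, lost)
```

At 250 Hz one sample spans 4 ms, so truncating to an integer sample can discard up to just under 4 ms of timing information; in this example it discards 3 ms.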

@effigies
Contributor

sample was introduced in BIDS 1.2.0, saying only:

| Column name | Description |
| --- | --- |
| sample | OPTIONAL. Onset of the event according to the sampling scheme of the recorded modality (i.e., referring to the raw data file that the events.tsv file accompanies). |

In 1.7.0, it became:

| Column name | Requirement Level | Data type | Description |
| --- | --- | --- | --- |
| sample | OPTIONAL | integer | Onset of the event according to the sampling scheme of the recorded modality (that is, referring to the raw data file that the events.tsv file accompanies). |

The integer type was introduced in bids-standard/bids-specification@034c6a8 (part of bids-standard/bids-specification#827) as part of the schematization. There doesn't appear to have been written discussion on whether sample is an integer or a float. From @arnodelorme's post, it sounds like fractional samples are a desired feature, so it's less an index than an alternative unit of time. From this perspective, the fix is not truncation but relaxing the data type to be number.

There is an open issue about whether it is a 0- or 1-based index here: bids-standard/bids-specification#499. If it is an alternative temporal unit, then I think starting at 0 makes more sense than starting at 1.
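As a rough check of the "alternative unit of time" reading, the two rows quoted from the diff earlier in this thread can be converted back to seconds. This sketch assumes a 250 Hz sampling rate (implied by the onset column being divided by 0.004); the helper `onset_from_sample` and its `first_index` parameter are hypothetical.

```python
import math

SFREQ_HZ = 250.0  # assumption: 1 / 0.004 s, as implied by onset / 0.004

# (onset in seconds, sample) pairs copied from the diff excerpt in this thread.
rows = [(0.004, 1.0), (24.2098181818, 6052.4545)]


def onset_from_sample(sample, sfreq_hz, first_index=0):
    """Convert a sample value to seconds, given the index of the first sample."""
    return (sample - first_index) / sfreq_hz


for onset, sample in rows:
    zero_based = onset_from_sample(sample, SFREQ_HZ, first_index=0)
    one_based = onset_from_sample(sample, SFREQ_HZ, first_index=1)
    print(onset, zero_based, one_based)
```

With `first_index=0` the converted values agree with the onset column to within rounding of the printed digits, whereas `first_index=1` shifts every event by one sample period (4 ms).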

@VisLab
Member

VisLab commented Mar 15, 2023 via email

@VisLab
Member

VisLab commented Mar 15, 2023 via email

@dorahermes
Member

+1 on zero-based indexing

@effigies
Contributor

Thanks for the quick responses, Dora and Kay. I've proposed bids-standard/bids-specification#1441, if any interested parties from this thread would care to comment there.

@sappelhoff
Member

The sample column will be considered an arbitrary column in a future BIDS version. As such, its use is dataset-specific and SHOULD be documented in the accompanying JSON files. Thanks for the discussion in #1441, everyone.
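For illustration, a dataset could document its own use of the column in an events JSON sidecar along the following lines. This is only a sketch: the `LongName` and `Description` keys are the usual BIDS fields for describing tabular columns, but the wording of the values is hypothetical and would need to reflect what a given dataset actually stores.

```python
import json

# Hypothetical sidecar content describing a dataset-specific "sample" column.
events_sidecar = {
    "sample": {
        "LongName": "Event onset in samples",
        "Description": (
            "Onset of the event in units of samples of the accompanying raw "
            "data file, counted from 0; values may be fractional."
        ),
    }
}

# Serialized form, as it might be written to an *_events.json file.
sidecar_text = json.dumps(events_sidecar, indent=2)
print(sidecar_text)
```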

@sappelhoff sappelhoff closed this Apr 11, 2023