-
Notifications
You must be signed in to change notification settings - Fork 169
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SCHEMA: Update existence checks to consider empty lists #1747
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #1747 +/- ##
=======================================
Coverage 87.93% 87.93%
=======================================
Files 16 16
Lines 1351 1351
=======================================
Hits 1188 1188
Misses 163 163 ☔ View full report in Codecov by Sentry. |
@@ -3,35 +3,31 @@ SubjectRelativeIntendedFor: | |||
selectors: | |||
- datatype != "ieeg" | |||
- type(sidecar.IntendedFor) != "null" | |||
- length(sidecar.IntendedFor) > 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but is it an issue really?
do we have some common requirement on why no empty lists or no columns full of n/a
?
if not -- then I don't see why this particular attributes/columns are special.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are responses to existing datasets. We can warn on empty lists if desired, but we shouldn't raise these errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From in-person convo, special-casing empty lists isn't great. The problem here is that the value could be either a string or a list of strings. exists()
handles that fine, but exists() == length()
would fail on non-empty strings.
One way to do this is to update the check to:
exists(sidecar.IntendedFor, "bids-uri") + exists(sidecar.IntendedFor, "subject")
== length(type(sidecar.IntendedFor) == "string" && 1 || length(sidecar.IntendedFor))
@rwblair Any opinions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the RHS of that equality be wrapped in a length
? Otherwise it's good. It is complicated enough that we should probably have some unit tests to make sure it works how we think it does.
Simple but more verbose option would be to have multiple rules based on type of IntendedFor
. One for it being a string and another on it being a list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, should drop that outer length
Maybe two rules is best. Was trying to avoid it, but it would be more readable IMO.
9467f62
to
de4bb1a
Compare
Fixes two issues found in real datasets:
Guessing this got rebased out before pushing.stim_file
columns with alln/a
. This is already handled in a duplicated check:IntendedFor
fields with an empty array ([]
) as a value. Other reference checks suffer the same potential bug, so fixed at the same time.