-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BIDS-like format #52
BIDS-like format #52
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried running this on an example CSV downloaded after Alex gave us access to the development project in RedCap. There are five columns in the face-to-face CSV that are not in the CSV I downloaded with "Bridge2AI - Enrolled Participants - All Sites."
It would be good to make sure this is robust to differences in columns as much possible.
The problematic function was:
def get_df_of_repeat_instrument(df: DataFrame, repeat_instrument: RepeatInstrument) -> pd.DataFrame:
columns = get_columns_of_repeat_instrument(repeat_instrument)
if repeat_instrument in (RepeatInstrument.PARTICIPANT,):
idx = df['redcap_repeat_instrument'].isnull()
else:
idx = df['redcap_repeat_instrument'] == repeat_instrument.value
return df.loc[idx, columns].copy()
Interesting! I didn't expect us to have additional columns in the F2F one. I can make that work. We actually need to have a better idea of the RedCap export in general so that it is somewhat reproducible. |
Ahhh i see what happened, for the F2F they removed columns that the ethics team flagged as having identifiable information. I expect that these columns could change for each release since that process seems to be ever evolving fyi @alistairewj @ibevers |
35dff68
to
6b772a3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After running this code, it appears to work as intended, although I suspect it is hard to manually check. It would be good to add some validation to this, although I understand if that is not feasible. I don't have deep enough knowledge of RedCap or BIDS to provide additional feedback on the functionality just from running and reading the code.
I understand that the priority of this PR has been getting the functionality working in the limited time that you have, @alistairewj, so the following style feedback can be seen as a note about improvements that can be addressed in future PR.
Stylistically, this code has a lot of room for improvement (details in inline comments). A few issues:
- hard-coded values
- largish data structure constants that should be in separate files and loaded
- redundant operations that could be looped
- functions that are too short
- functions that are too long and need to be decomposed
- mostly missing docstrings
- inconsistent docstring formatting
- multiline comments in the middle of functions should be in docstrings
Thanks! I addressed most of your concerns. I haven't added tests yet but I am also not sure we will keep this dataset API yet, it merits further discussion. My use of the word "questionnaire" in the dataset API is probably wrong. |
This is an initial PR for a BIDS-like format for the data. This PR adds (1) a prepare module which reformats data into BIDS-like data structure, and (2) a BIDSDataset and VBAIDataset class which provide utilities for loading data in from this format.
The conversion can be run with:
Once that's done, the data is in the output folder. See the tutorial.ipynb for an example of how to use the dataset classes for loading in dataframes from this format.