BIDS-like format #52

alistairewj · 2024-06-03T19:39:58Z

This is an initial PR for a BIDS-like format for the data. This PR adds (1) a prepare module which reformats data into a BIDS-like data structure, and (2) BIDSDataset and VBAIDataset classes which provide utilities for loading data from this format.

The conversion can be run with:

b2aiprep-cli redcap2bids bridge2ai_voice_data.csv --outdir output --audiodir audio

Once that's done, the data is in the output folder. See tutorial.ipynb for an example of how to use the dataset classes to load dataframes from this format.

ibevers

I tried running this on an example CSV downloaded after Alex gave us access to the development project in RedCap. There are five columns in the face-to-face CSV that are not in the CSV I downloaded with "Bridge2AI - Enrolled Participants - All Sites."

It would be good to make sure this is robust to differences in columns as much as possible.
The problematic function was:

def get_df_of_repeat_instrument(df: DataFrame, repeat_instrument: RepeatInstrument) -> pd.DataFrame:
    columns = get_columns_of_repeat_instrument(repeat_instrument)
    if repeat_instrument in (RepeatInstrument.PARTICIPANT,):
        idx = df['redcap_repeat_instrument'].isnull()
    else:
        idx = df['redcap_repeat_instrument'] == repeat_instrument.value
    return df.loc[idx, columns].copy()
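
One way to make this selection tolerant of columns that are absent from a given export is sketched below. It is only a sketch, not the code merged in this PR: `select_repeat_instrument_rows` and its plain-string arguments are hypothetical stand-ins for the `RepeatInstrument` enum and `get_columns_of_repeat_instrument`, and the toy column names are made up. The only thing assumed about the data is the `redcap_repeat_instrument` column. Missing columns trigger a warning and are filled with nulls.

```python
import warnings
from typing import Optional, Sequence

import pandas as pd


def select_repeat_instrument_rows(
    df: pd.DataFrame,
    columns: Sequence[str],
    instrument_value: Optional[str] = None,
) -> pd.DataFrame:
    """Select rows for one instrument, tolerating missing columns.

    Hypothetical helper: `instrument_value=None` selects rows where
    `redcap_repeat_instrument` is null (participant-level rows).
    """
    if instrument_value is None:
        idx = df["redcap_repeat_instrument"].isnull()
    else:
        idx = df["redcap_repeat_instrument"] == instrument_value

    missing = [c for c in columns if c not in df.columns]
    if missing:
        warnings.warn(f"Columns missing from export, inserting nulls: {missing}")

    # reindex keeps the requested column order and fills absent columns with NaN
    return df.loc[idx].reindex(columns=list(columns)).copy()


# Toy export lacking the "zip_code" column (all names here are made up)
export = pd.DataFrame(
    {
        "record_id": ["a", "a", "b"],
        "redcap_repeat_instrument": [None, "Session", None],
        "age": [30, None, 41],
    }
)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    participants = select_repeat_instrument_rows(
        export, ["record_id", "age", "zip_code"]
    )
```

Using `reindex` rather than `df.loc[idx, columns]` avoids the `KeyError` that pandas raises when any requested label is absent, while the warning keeps the gap visible.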

alistairewj · 2024-06-04T14:43:07Z

Interesting! I didn't expect us to have additional columns in the F2F one. I can make that work. We actually need to have a better idea of the RedCap export in general so that it is somewhat reproducible.

Rahul-Brito · 2024-06-04T14:48:29Z

Ahhh, I see what happened: for the F2F they removed columns that the ethics team flagged as having identifiable information. I expect that these columns could change for each release, since that process seems to be ever-evolving. FYI @alistairewj @ibevers
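
Since the set of columns may shift between releases, a quick header comparison can show exactly which columns differ between two exports. The following is a minimal sketch using only the standard library; `read_header` and `diff_columns` are hypothetical helpers, and the in-memory CSVs and their column names are invented for illustration.

```python
import csv
import io


def read_header(csv_file) -> list:
    """Return the column names from the first row of an open CSV file."""
    return next(csv.reader(csv_file))


def diff_columns(header_a, header_b):
    """Return (columns only in a, columns only in b), preserving order."""
    set_a, set_b = set(header_a), set(header_b)
    only_a = [c for c in header_a if c not in set_b]
    only_b = [c for c in header_b if c not in set_a]
    return only_a, only_b


# In-memory stand-ins for two exports; the column names are made up
export_a = io.StringIO("record_id,age,session_id\n1,30,s1\n")
export_b = io.StringIO("record_id,age,zip_code,session_id\n1,30,02139,s1\n")

only_a, only_b = diff_columns(read_header(export_a), read_header(export_b))
print("only in export A:", only_a)
print("only in export B:", only_b)
```

With real files, the `io.StringIO` objects would be replaced by `open(path, newline="")` handles.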

ibevers

After running this code, it appears to work as intended, although I suspect it is hard to manually check. It would be good to add some validation to this, although I understand if that is not feasible. I don't have deep enough knowledge of RedCap or BIDS to provide additional feedback on the functionality just from running and reading the code.
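
As one possible form of validation, the converted output could be checked for files that are expected but absent. The sketch below assumes nothing about the actual BIDS-like layout produced by this PR: `find_missing_files` is a hypothetical helper, and the `sub-01/recording-*.wav` paths are invented placeholders.

```python
import tempfile
from pathlib import Path


def find_missing_files(root: Path, expected_relative_paths):
    """Return the expected paths (relative to `root`) that do not exist on disk."""
    return [p for p in expected_relative_paths if not (root / p).is_file()]


# Demo in a temporary folder: one expected recording exists, one does not
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub-01").mkdir()
    (root / "sub-01" / "recording-1.wav").touch()
    expected = ["sub-01/recording-1.wav", "sub-01/recording-2.wav"]
    missing = find_missing_files(root, expected)

print("missing:", missing)
```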

I understand that the priority of this PR has been getting the functionality working in the limited time that you have, @alistairewj, so the following style feedback can be seen as a note about improvements that can be addressed in a future PR.

Stylistically, this code has a lot of room for improvement (details in inline comments). A few issues:

hard-coded values
largish data structure constants that should be in separate files and loaded
redundant operations that could be looped
functions that are too short
functions that are too long and need to be decomposed
mostly missing docstrings
inconsistent docstring formatting
multiline comments in the middle of functions should be in docstrings
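
To illustrate the points about large constants and repeated operations, here is a minimal sketch of keeping column lists in JSON files and loading them in a single loop. The helper `load_column_lists`, the file names, and the column names are all hypothetical; the actual file layout in the repository may differ.

```python
import json
import tempfile
from pathlib import Path


def load_column_lists(folder: Path) -> dict:
    """Map each JSON file stem in `folder` to the list of column names it holds."""
    column_lists = {}
    for path in sorted(folder.glob("*.json")):
        with path.open() as f:
            column_lists[path.stem] = json.load(f)
    return column_lists


# Demo: write two small JSON files to a temp folder, then load them in one loop
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "demographics.json").write_text(json.dumps(["record_id", "age"]))
    (folder / "session.json").write_text(json.dumps(["record_id", "session_id"]))
    columns_by_name = load_column_lists(folder)

print(columns_by_name)
```

Adding a new column list then only requires dropping a new JSON file into the folder, with no change to the Python code.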

src/b2aiprep/fhir_utils.py

src/b2aiprep/prepare.py

docs/tutorial.ipynb


alistairewj · 2024-06-05T20:28:09Z

Thanks! I addressed most of your concerns. I haven't added tests yet, but I am also not sure we will keep this dataset API; it merits further discussion. My use of the word "questionnaire" in the dataset API is probably wrong.

alistairewj added 3 commits June 3, 2024 13:43

add conversion from redcap csv / audio folder into BIDS like structure

bc6a8d6

add json files with column names for each questionnaire

7a8da6f

fix a few bugs

cde0fc0

alistairewj requested review from Rahul-Brito and ibevers June 3, 2024 19:40

ibevers reviewed Jun 4, 2024


alistairewj added 8 commits June 5, 2024 10:23

init tutorial using BIDS format

a9d8527

additional usage notes

f7a6ca8

fix typo in fhir resources dependency

745864c

raise warning if columns are missing and insert nulls

99024e0

reorganize json column files and add detection/fixing of column names

996530d

add mapping from coded redcap columns to free-text column names

9569758

fix name for recording files to match BIDS schema

b6e7a61

add validation that audio recordings exist

6b772a3

alistairewj force-pushed the alistair/bids_format branch from 35dff68 to 6b772a3 Compare June 5, 2024 14:23

ibevers reviewed Jun 5, 2024


alistairewj added 3 commits June 5, 2024 16:24

add more doc to tutorial

f649a50

add utility to list all questionnaires in the beh folder

f5e2afc

clean up code with docstrings and a new constants.py file for hard-coded values

5f400cd

alistairewj merged commit c271df1 into main Jun 5, 2024
2 checks passed

alistairewj deleted the alistair/bids_format branch June 5, 2024 20:30

ibevers mentioned this pull request Jun 6, 2024

Discussion for whether to keep the dataset API #53

Open
