BIDS-like format #52

Merged
merged 14 commits into from
Jun 5, 2024

alistairewj
Collaborator

This is an initial PR for a BIDS-like format for the data. This PR adds (1) a prepare module, which reformats data into a BIDS-like data structure, and (2) the BIDSDataset and VBAIDataset classes, which provide utilities for loading data from this format.

The conversion can be run with:

b2aiprep-cli redcap2bids bridge2ai_voice_data.csv --outdir output --audiodir audio

Once that's done, the data is in the output folder. See the tutorial.ipynb for an example of how to use the dataset classes for loading in dataframes from this format.

Contributor

@ibevers ibevers left a comment

I tried running this on an example CSV downloaded after Alex gave us access to the development project in RedCap. There are five columns in the face-to-face CSV that are not in the CSV I downloaded with "Bridge2AI - Enrolled Participants - All Sites."

It would be good to make sure this is as robust as possible to differences in columns.
The problematic function was:

def get_df_of_repeat_instrument(df: DataFrame, repeat_instrument: RepeatInstrument) -> pd.DataFrame:
    columns = get_columns_of_repeat_instrument(repeat_instrument)
    if repeat_instrument in (RepeatInstrument.PARTICIPANT,):
        idx = df['redcap_repeat_instrument'].isnull()
    else:
        idx = df['redcap_repeat_instrument'] == repeat_instrument.value
    return df.loc[idx, columns].copy()
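
One way to tolerate exports with differing columns is to select only the expected columns that are actually present, and to log the rest. The sketch below is an illustration under stated assumptions, not the fix that was merged: `split_expected_columns` is a hypothetical helper, and every column name in the usage note other than `record_id` and `redcap_repeat_instrument` is made up.

```python
import logging
from typing import Iterable, List, Tuple

logger = logging.getLogger(__name__)


def split_expected_columns(
    expected: Iterable[str], available: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Partition `expected` into (present, missing) relative to `available`.

    Hypothetical helper: order follows `expected`, so downstream column
    order stays stable regardless of how the export orders its columns.
    """
    available_set = set(available)
    present: List[str] = []
    missing: List[str] = []
    for column in expected:
        (present if column in available_set else missing).append(column)
    if missing:
        # Surface the gap instead of failing with a KeyError on selection.
        logger.warning(
            "Export is missing %d expected column(s): %s", len(missing), missing
        )
    return present, missing
```

Inside `get_df_of_repeat_instrument`, this could be used as `present, _ = split_expected_columns(columns, df.columns)` followed by `df.loc[idx, present].copy()`, so that a column removed from one export (for example, a hypothetical `zip_code`) produces a warning rather than an exception.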

@alistairewj
Collaborator Author

Interesting! I didn't expect us to have additional columns in the F2F one. I can make that work. We actually need to have a better idea of the RedCap export in general so that it is somewhat reproducible.

@Rahul-Brito
Contributor

Ahhh, I see what happened: for the F2F they removed columns that the ethics team flagged as having identifiable information. I expect that these columns could change for each release, since that process seems to be ever-evolving. FYI @alistairewj @ibevers

Contributor

@ibevers ibevers left a comment

After running this code, it appears to work as intended, although I suspect it is hard to manually check. It would be good to add some validation to this, although I understand if that is not feasible. I don't have deep enough knowledge of RedCap or BIDS to provide additional feedback on the functionality just from running and reading the code.

I understand that the priority of this PR has been getting the functionality working in the limited time that you have, @alistairewj, so the following style feedback can be seen as a note about improvements that can be addressed in a future PR.

Stylistically, this code has a lot of room for improvement (details in inline comments). A few issues:

  • hard-coded values
  • largish data structure constants that should be in separate files and loaded
  • redundant operations that could be looped
  • functions that are too short
  • functions that are too long and need to be decomposed
  • mostly missing docstrings
  • inconsistent docstring formatting
  • multiline comments in the middle of functions should be in docstrings
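
To illustrate the points about large constants and repeated operations, here is a minimal sketch of loading a mapping from a separate JSON file. The function name `load_column_map`, the file layout, and the idea that the constants map instrument names to column lists are all assumptions made for illustration; none of them come from this PR.

```python
import json
from pathlib import Path
from typing import Dict, List


def load_column_map(path: Path) -> Dict[str, List[str]]:
    """Load a mapping of instrument name -> column names from a JSON file.

    Hypothetical example: the real constants in the PR may have a
    different shape.
    """
    with path.open(encoding="utf-8") as f:
        data = json.load(f)
    # Fail early if the file does not have the expected shape.
    if not isinstance(data, dict) or not all(
        isinstance(columns, list) for columns in data.values()
    ):
        raise ValueError(f"{path} must map instrument names to lists of columns")
    return data
```

With the data in one mapping, repeated per-instrument statements can collapse into a single loop such as `for instrument, columns in column_map.items(): ...`.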

src/b2aiprep/fhir_utils.py
src/b2aiprep/fhir_utils.py Outdated
src/b2aiprep/prepare.py Outdated
src/b2aiprep/prepare.py Outdated
src/b2aiprep/prepare.py Outdated
src/b2aiprep/prepare.py Outdated
src/b2aiprep/prepare.py
src/b2aiprep/prepare.py Outdated
src/b2aiprep/prepare.py
docs/tutorial.ipynb Outdated
@alistairewj
Collaborator Author

Thanks! I addressed most of your concerns. I haven't added tests yet, but I am also not sure we will keep this dataset API; it merits further discussion. My use of the word "questionnaire" in the dataset API is probably wrong.

@alistairewj alistairewj merged commit c271df1 into main Jun 5, 2024
2 checks passed
@alistairewj alistairewj deleted the alistair/bids_format branch June 5, 2024 20:30
3 participants