Skip to content
This repository has been archived by the owner on Dec 20, 2024. It is now read-only.

ENH: Outsource leave-one-out splitter so it can be used across data types #98

Merged
merged 34 commits into from
Apr 3, 2024

Conversation

teresamg
Copy link
Contributor

@teresamg teresamg commented Nov 30, 2022

(Empty message edited by @oesteban)

This PR covers two targets:

  • Extract the "logo" splitter from the DWI object. The rationale behind this is allowing its use across different data types (e.g., with PET).
  • Incidentally, change the name from "logo" (for leave-one-gradient-out) to "lovo" (for leave-one-volume-out).

@oesteban
Copy link
Member

After refactoring the package's structure, I have allowed myself to rebase into main.

I would suggest we pull the logo_split function out of the dmri.py submodule, and instead one of these two options:

  • Under the eddymotion.data module, e.g. in eddymotion.data.splitting.
  • Under the eddymotion.model module, which is closer to what scikit-learn has (I believe it is something along the lines of sklearn.model.model_selection). However, in our case this has nothing to do with model selection, but rather a data manipulation routine.

I would favor creating a module under eddymotion.data - whether it is splitting.py, I don't really care much.

WDYT?

@oesteban
Copy link
Member

In e2b8f68 I went ahead and just started the eddymotion.data.splitting submodule. Please amend if you feel that's necessary.

@teresamg my rebasing will require you to update your working copy. If you don't have changes that you did not push, then I would just switch to main, git branch -D teresamg and then git checkout teresamg. If you did have changes, then let me know and I can help how to best transfer them.

src/eddymotion/data/dmri.py Outdated Show resolved Hide resolved
src/eddymotion/estimator.py Outdated Show resolved Hide resolved
src/eddymotion/data/splitting.py Outdated Show resolved Hide resolved
src/eddymotion/data/splitting.py Outdated Show resolved Hide resolved
src/eddymotion/data/splitting.py Outdated Show resolved Hide resolved
src/eddymotion/data/splitting.py Outdated Show resolved Hide resolved
src/eddymotion/data/splitting.py Outdated Show resolved Hide resolved
src/eddymotion/data/splitting.py Outdated Show resolved Hide resolved
src/eddymotion/data/splitting.py Outdated Show resolved Hide resolved
@arokem arokem changed the title WIP: Starting to refactor logo_split logo_split => lovo_split Dec 15, 2022
@teresamg teresamg marked this pull request as ready for review December 15, 2022 22:23
@arokem
Copy link
Collaborator

arokem commented Dec 15, 2022

@oesteban : This integrates all of your proposed changes, and updates things so that everything runs. It does also include the refactor of the em_affines list into a preallocated array, which helps with memory.

@oesteban oesteban changed the title logo_split => lovo_split ENH: Outsource leave-one-out splitter so it can be used across data types Mar 27, 2024
@oesteban
Copy link
Member

I have resolved the conflicts and dropped some changes that did not pertain to this PR. These spurious changes addressed ancient memory issues which have been addressed elsewhere (namely, fixed within nitransforms, which was the original cause of the issues).

@esavary this is ready for your code review and perhaps some hands-on testing. Since I assume it is not immediate how to test eddymotion locally, I think we could add some documentation about that. Let us know how you want to proceed.

@oesteban oesteban requested a review from esavary March 27, 2024 11:13
Copy link
Member

@oesteban oesteban left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've marked two important features that have disappeared in the outsourced function. @esavary - may you take care of those issues?

src/eddymotion/estimator.py Outdated Show resolved Hide resolved
src/eddymotion/estimator.py Outdated Show resolved Hide resolved
src/eddymotion/data/dmri.py Show resolved Hide resolved
src/eddymotion/data/dmri.py Show resolved Hide resolved
@esavary
Copy link
Member

esavary commented Mar 27, 2024

I have resolved the conflicts and dropped some changes that did not pertain to this PR. These spurious changes addressed ancient memory issues which have been addressed elsewhere (namely, fixed within nitransforms, which was the original cause of the issues).

@esavary this is ready for your code review and perhaps some hands-on testing. Since I assume it is not immediate how to test eddymotion locally, I think we could add some documentation about that. Let us know how you want to proceed.

I was thinking of creating a 'test_splitting.py' module to test the splitting functionality. The idea is to generate a mock DWI (Diffusion-Weighted Imaging) object with 'dataobj' and 'gradients' attributes filled with zeros, except at one index where we'll set the values to 1. ( In the same direction as what you suggested yesterday) Then checking if the 'lovo_split' function, applied at this index, produces the expected results. Does this align with your thoughts?

Regarding documentation, could you provide more details? Do you expect a dedicated section focusing only on testing the splitting feature or more general about testing the entire module?

@esavary
Copy link
Member

esavary commented Mar 28, 2024

I have resolved the conflicts and dropped some changes that did not pertain to this PR. These spurious changes addressed ancient memory issues which have been addressed elsewhere (namely, fixed within nitransforms, which was the original cause of the issues).

@esavary this is ready for your code review and perhaps some hands-on testing. Since I assume it is not immediate how to test eddymotion locally, I think we could add some documentation about that. Let us know how you want to proceed.

I've gone ahead and included the test I used to check the 'lovo_split' function in the test folder. However, feel free to remove it if you don't find it relevant or if you had something else in mind @oesteban .

Copy link
Member

@oesteban oesteban left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm doubtful about the right way to implement this. I'm not sure anymore if we want to access the filesystem here. In any case, I think the above variable changes will clarify the picture.

cc @effigies - what do you think? this is related to the new issue #145.

src/eddymotion/data/splitting.py Outdated Show resolved Hide resolved
src/eddymotion/data/splitting.py Outdated Show resolved Hide resolved
src/eddymotion/data/splitting.py Outdated Show resolved Hide resolved
src/eddymotion/data/splitting.py Outdated Show resolved Hide resolved
src/eddymotion/data/splitting.py Outdated Show resolved Hide resolved
src/eddymotion/data/splitting.py Outdated Show resolved Hide resolved
src/eddymotion/data/splitting.py Outdated Show resolved Hide resolved
src/eddymotion/data/splitting.py Outdated Show resolved Hide resolved
@esavary esavary requested a review from oesteban March 28, 2024 13:50
@arokem
Copy link
Collaborator

arokem commented Mar 28, 2024

I think that @teresamg also had some test scripts that she used in order to run experiments with a sample dataset (I think that it was the traveling brain study from OpenNeuro). Maybe she can share those scripts (e.g., on a separate repo?) so that @esavary can run those for profiling/testing of this code?

@esavary
Copy link
Member

esavary commented Apr 2, 2024

I think that @teresamg also had some test scripts that she used in order to run experiments with a sample dataset (I think that it was the traveling brain study from OpenNeuro). Maybe she can share those scripts (e.g., on a separate repo?) so that @esavary can run those for profiling/testing of this code?

Thanks, that sounds great! I'll run them as soon as I have them.

@oesteban oesteban merged commit a1bcacd into nipreps:main Apr 3, 2024
8 checks passed
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants