This repository has been archived by the owner on Dec 20, 2024. It is now read-only.

ENH: Improving GP models #244

Merged: 8 commits from enh/gp-model-improvements into main on Oct 28, 2024

Conversation

oesteban
Member

No description provided.


codecov bot commented Oct 24, 2024

Codecov Report

Attention: Patch coverage is 75.67568% with 18 lines in your changes missing coverage. Please review.

Project coverage is 66.24%. Comparing base (d3f5eac) to head (6e6ff4a).
Report is 11 commits behind head on main.

Files with missing lines Patch % Lines
src/eddymotion/model/_sklearn.py 75.67% 16 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #244      +/-   ##
==========================================
+ Coverage   66.18%   66.24%   +0.05%     
==========================================
  Files          19       19              
  Lines         905      942      +37     
  Branches      112      119       +7     
==========================================
+ Hits          599      624      +25     
- Misses        264      274      +10     
- Partials       42       44       +2     


@oesteban
Member Author

@jhlegarreta our simulated data may not be realistic enough for our tests:

[Image: synthetic-covariance-plot]

That said, convergence issues also arise with the b=3000 shell of the HCP dataset.

@oesteban oesteban force-pushed the enh/gp-model-improvements branch 2 times, most recently from 28c371e to 22686a0 Compare October 26, 2024 05:58
@oesteban oesteban force-pushed the enh/gp-model-improvements branch from 22686a0 to 5fd0125 Compare October 26, 2024 06:07
@jhlegarreta @yasseraleman -- I may have found the culprit that makes
fitting so weird.

By looking at the covariance vs. angle plots (at the beginning of the
notebook), I realized that they massaged the data before calculating
the covariance.
They certainly removed the mean (I don't know whether the gradient-wise
mean or the grand mean), and possibly scaled to unit standard deviation.

Then I went back to the paper and found the little comment I have now
included in the docstring of ``EddymotionGPR.fit()``.

This doesn't completely resolve all problems, but I'm positive it's one
little step we needed to take.
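For the record, a minimal sketch of the kind of "massaging" described above, on hypothetical synthetic data. It uses the gradient-wise interpretation (demeaning each gradient image over voxels before computing covariance between gradients); whether the original analysis used this or the grand mean is exactly the open question.

```python
import numpy as np

rng = np.random.default_rng(1234)

# Hypothetical DWI data: one row per diffusion gradient, one column per voxel.
n_gradients, n_voxels = 60, 500
y = rng.normal(loc=100.0, scale=10.0, size=(n_gradients, n_voxels))

# Remove the gradient-wise mean (the grand mean alternative would subtract
# a single scalar, y.mean()) and scale to unit standard deviation.
y_centered = y - y.mean(axis=1, keepdims=True)
y_standardized = y_centered / y_centered.std(axis=1, keepdims=True)

# Covariance between pairs of gradients, to be plotted against the angle
# separating the corresponding b-vectors.
cov = y_standardized @ y_standardized.T / n_voxels
```

With this standardization the diagonal is exactly 1, so the plot effectively shows correlations rather than raw covariances.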
@oesteban
Member Author

@jhlegarreta @yasseraleman -- the new plot, accounting differently for the normalization of the data (see my commit message in 3136256):

[Image: covariance-plot]

Perhaps the optimizer is moving $a$ towards $\frac{\pi}{2}$ because it is indeed a better fit than $a = 1.23$.

@jhlegarreta
Collaborator

> @jhlegarreta @yasseraleman -- the new plot, accounting differently for the normalization of the data (see my commit message in 3136256):

@oesteban 🎉 looks reasonable.

It is still unclear to me how $f^{*}$ is interpolated from $\hat{\mathbf{f}}$ (grand mean?), or whether the fitted model accounts for that factor (I doubt it).

Maybe I'm being very naive, but do they assume that $f^{*}$ is the target value, and simply add $\hat{f}$ back once they're done with the estimation, following the text? That said, I have not found where in the code the eddy folks add back the mean:
https://git.fmrib.ox.ac.uk/fsl/eddy/-/blob/2480dda293d4cec83014454db3a193b87921f6b0/DiffusionGP.cpp#L252

Maybe they add it back to the predictions when writing the volumes, since they have kept track of the value. But I have not found the place, unless it is this one:
https://git.fmrib.ox.ac.uk/fsl/eddy/-/blob/2480dda293d4cec83014454db3a193b87921f6b0/DiffusionGP.cpp#L132

@oesteban
Member Author

> Maybe I'm being very naive, but do they assume that $f^{*}$ is the target value, and simply add $\hat{f}$ back once they're done with the estimation, following the text? That said, I have not found where in the code the eddy folks add back the mean

I don't think that's naive, and I understand the text the same way. But then the question is: you can obtain $\bar{\mathbf{f}}$ by just calculating the mean along the voxel axis (obtaining one mean per b-vector). However, at prediction time you don't know the mean of the orientations. So perhaps eddy is kind of cheating and getting the parameter from the testing data (fair enough, and it would explain why you didn't find it in the prediction part of the code), or they are actually interpolating $\bar{\mathbf{f}}$ on the surface (which doesn't seem to be implemented within eddy).
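A tiny sketch of the mean bookkeeping being discussed, on hypothetical data: $\bar{\mathbf{f}}$ computed along the voxel axis (one mean per b-vector), subtracted before fitting, and added back at reconstruction time. The "cheating" hypothesis amounts to reusing these stored training means (or some interpolation of them) for unseen orientations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical signal: one row per b-vector (orientation), one column per voxel.
n_bvecs, n_voxels = 30, 1000
f = rng.normal(loc=50.0, scale=5.0, size=(n_bvecs, n_voxels))

# \bar{f}: one mean per b-vector, computed along the voxel axis.
f_bar = f.mean(axis=1, keepdims=True)

# Demeaned signal, which would be what the GP is fitted on.
f_hat = f - f_bar

# At prediction time, the per-orientation mean of unseen data is unknown;
# adding back the stored training means reconstructs the original scale.
f_reconstructed = f_hat + f_bar
```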

@oesteban
Member Author

Addendum: This would invalidate our cross-validation unless we regress data to $\bar{\mathbf{f}}$ before entering cross-validation.

It seems I was wrong about the need for standardizing: Scikit-learn's
implementation is sufficient and follows what the paper specifies.
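For reference, a hedged sketch of what relying on scikit-learn's built-in target normalization looks like. The RBF kernel and the random inputs below are placeholders, not the angular kernel or data the actual model uses; the point is only that ``normalize_y=True`` demeans (and scales) the targets internally and undoes the transform at prediction time.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(42)
X = rng.uniform(size=(40, 3))    # placeholder inputs (e.g., unit b-vectors)
y = 100.0 + rng.normal(size=40)  # targets far from zero mean

# normalize_y=True standardizes y before fitting, so no explicit
# standardization step is needed on our side.
gpr = GaussianProcessRegressor(kernel=RBF(), normalize_y=True).fit(X, y)
y_pred = gpr.predict(X)
```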

There are a few additional details that have surfaced as relevant:

* $\sigma^2$ should not be very large, and ~100 was VERY large.
  With small values, optimizers seem to settle on more plausible
  outcomes.
* $a$ may take values above $\frac{\pi}{2}$, even though $\theta$ will
  always be smaller than that.
  Indeed, for our real dataset the optimal $a$ seems to be around
  $\frac{\pi}{2}$.
* I haven't cracked the manipulations of the covariance plot yet, but
  it's clear to me now that they are indeed considering $a$ and $\lambda$.
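To make the roles of $a$ and $\sigma^2$ concrete, here is a sketch of a spherical covariance function of the angle, following the general form in the paper. The exact parametrization in ``src/eddymotion/model/_sklearn.py`` may differ, and the noise term $\lambda$ is omitted here (it would only add to the diagonal of the covariance matrix).

```python
import numpy as np

def spherical_covariance(theta, a, sigma_sq):
    """Spherical covariance as a function of the angle ``theta`` (radians).

    Decays from ``sigma_sq`` at theta=0 to zero at theta=a, and is zero
    beyond that; ``a`` therefore sets the angular support of the kernel.
    """
    theta = np.asarray(theta, dtype=float)
    return np.where(
        theta <= a,
        sigma_sq * (1.0 - 1.5 * theta / a + 0.5 * (theta / a) ** 3),
        0.0,
    )

# Angles between (antipodally symmetric) b-vectors never exceed pi/2,
# yet nothing prevents the optimal ``a`` from sitting at or above pi/2,
# as observed for the real dataset.
theta = np.linspace(0.0, np.pi / 2, 5)
cov = spherical_covariance(theta, a=np.pi / 2, sigma_sq=1.0)
```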

The most relevant inclusion in this commit is the repetition of the
covariance plot with the covariance of predicted data, which is very
encouraging.

cc @jhlegarreta.
@oesteban
Member Author

It occurred to me that we could plot the covariance of predictions (out of sample, btw):

[Image: covariance-plot]

Promising!

(see updated notebook)

@oesteban
Member Author

Improved:
[Image: covariance-plot]

@oesteban oesteban marked this pull request as ready for review October 28, 2024 13:01
@oesteban oesteban merged commit ef91d4f into main Oct 28, 2024
7 checks passed
@oesteban oesteban deleted the enh/gp-model-improvements branch October 28, 2024 13:05