Add coordinate-based coactivation-based parcellation class #533
base: main
Conversation
@mriedel56 @62442katieb if possible, I'd love it if you could check out the new class (especially the […]). Ultimately, I want this class to be fairly basic, meaning not including too many tunable parameters, with some documentation pointing toward […]. Additional questions: […]
nimare/parcellate.py (Outdated)

        images = {"labels": labels}
        return images

    def _filter_selection(self):
Chase 2020:
We implemented a two-step procedure that involved a decision on those filter sizes to be included in the final analysis and subsequently a decision on the optimal cluster solution. In the first step, we examined the consistency of the cluster assignment for the individual voxels across the cluster solutions of the co-occurrence maps performed at different filter sizes. We selected a filter range with the lowest number of deviants, that is, number of voxels that were assigned differently compared with the solution from the majority of filters. In other words, we identified those filter sizes which produced solutions most similar to the consensus-solution across all filter sizes. For example, the proportion of deviants for the second parcellation is illustrated in Figure S1; this shows the borders of the filter range to be used for subsequent steps was based on the Z-scores of the number of deviants.
I interpret this to mean:
- Derive a mode array of label assignments for each cluster count across filter sizes.
  - I assume this means the mode of each voxel determined independently, rather than the mode of the full set of assignments.
  - What if label numbers don't match? E.g., label 1 in filter size 1 is most similar to label 2 in filter size 2.
    - I assume we should do some kind of synchronization, unless there's some inherent order to KMeans labels? (See the label-alignment sketch after this list.)
- Count the number of voxels that don't match the mode for each filter size.
- Calculate the proportion of deviants in each cluster solution and filter size.
- Calculate a weighted z-score for each filter size (across cluster solutions) somehow?
  - What is it weighted by?
- Select the range of filter sizes with the lowest z-scores. (See the filter-selection sketch after this list.)
  - How? Is there some kind of threshold? Figure S1 grabs the range with z-scores < -0.5. No clue if that's a meaningful threshold, something like 2 standard deviations from the average z-score, or what.
  - What if there are multiple dips? Does amplitude (z-scores of filters below threshold) or width (number of filters below threshold) matter more?
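On the label-matching question: since KMeans labels are arbitrary, one common way to synchronize solutions is to relabel each one against a reference solution by maximizing label overlap with the Hungarian algorithm. This is only a sketch of that idea, not code from this PR; the `align_labels` helper and the choice of reference are my own assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def align_labels(labels, reference):
    """Permute ``labels`` so they agree with ``reference`` as much as possible.

    Both arguments are 1D integer arrays of per-voxel cluster assignments
    (same voxels, same number of clusters), assumed to be 0-based.
    """
    n_clusters = int(max(labels.max(), reference.max())) + 1
    # Contingency table: overlap between each (reference, candidate) label pair.
    contingency = np.zeros((n_clusters, n_clusters), dtype=int)
    np.add.at(contingency, (reference, labels), 1)
    # Hungarian algorithm: one-to-one relabeling that maximizes total overlap.
    ref_ids, cand_ids = linear_sum_assignment(-contingency)
    mapping = np.empty(n_clusters, dtype=int)
    mapping[cand_ids] = ref_ids
    return mapping[labels]
```

With all filter-size solutions relabeled against the same reference (e.g., the solution from a middle filter size), the per-voxel mode across filter sizes is at least well defined.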
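And for the deviant-counting / filter-range-selection steps, here is a rough sketch of my reading of the procedure for a single cluster count. The z-scoring and the -0.5 cutoff just mirror Figure S1; how the "weighted" z-score combines multiple cluster solutions is exactly the open question above, so none of this should be taken as what Chase et al. actually did.

```python
import numpy as np


def select_filter_range(labels_per_filter, z_threshold=-0.5):
    """Pick the filter sizes whose solutions best match the consensus.

    Parameters
    ----------
    labels_per_filter : (n_filters, n_voxels) int array
        Label-aligned cluster assignments for one cluster count,
        one row per filter size.
    z_threshold : float
        Filter sizes with z-scored deviant proportions below this value
        are retained (the Figure S1 cutoff; an assumption, not a rule).
    """
    labels_per_filter = np.asarray(labels_per_filter)
    n_filters, n_voxels = labels_per_filter.shape

    # Consensus solution: per-voxel mode across filter sizes.
    consensus = np.array(
        [np.bincount(voxel_labels).argmax() for voxel_labels in labels_per_filter.T]
    )

    # Deviants: voxels whose assignment differs from the consensus, per filter size.
    prop_deviants = (labels_per_filter != consensus).sum(axis=1) / n_voxels

    # Z-score the deviant proportions across filter sizes and keep the
    # filter sizes that deviate least from the consensus.
    z_deviants = (prop_deviants - prop_deviants.mean()) / prop_deviants.std()
    return np.flatnonzero(z_deviants < z_threshold)
```

Across cluster counts, the per-filter z-scores could then be averaged (possibly weighted, per the question above) and only a contiguous run below the threshold kept, but that part of the procedure is the unclear bit.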
Codecov Report
Base: 88.55% // Head: 84.29% // Decreases project coverage by 4.27%.
Additional details and impacted files

@@            Coverage Diff             @@
##             main     #533      +/-   ##
==========================================
- Coverage   88.55%   84.29%   -4.27%
==========================================
  Files          38       36       -2
  Lines        4370     4069     -301
==========================================
- Hits         3870     3430     -440
- Misses        500      639     +139
☔ View full report at Codecov.
Will move most of the work to the main fit loop, because this metric requires too much information.
@62442katieb has some code from her naturalistic meta-analysis that may implement some of these metrics: https://github.com/62442katieb/meta-analytic-kmeans/blob/daf3904caad990aeadc89bc98769aaed32857e09/evaluating_clustering_solutions.ipynb
Closes #260. Tagging @DiveicaV in case she wants to look at this.
We are using Chase et al. (2020) as the basis for our general approach, especially the metrics we're using for kernel and order selection.
EDIT: A recommendation from @SBEickhoff is to look at Liu et al. (2020) and Plachti et al. (2019) as well.
To do:
- `r` and `n` parameters. These correspond to the "filter sizes" in Chase et al. (2020).

Changes proposed in this pull request:
- `n` option to `Dataset.get_studies_by_coordinate()`.
- `parcellate` module with `CoordCBP` class.
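For reviewers skimming this, a rough idea of how these pieces might fit together. The `n` keyword is the option this PR proposes for `Dataset.get_studies_by_coordinate()`; the `CoordCBP` arguments (target mask, cluster counts, filter sizes) are placeholders for illustration, not the actual API.

```python
from nimare.dataset import Dataset
from nimare.parcellate import CoordCBP  # module added in this PR

dset = Dataset("dataset.json")

# Existing radius-based selection: studies with a coordinate within 6 mm of the seed.
ids_by_r = dset.get_studies_by_coordinate([[0, -52, 26]], r=6)

# Proposed n-based selection: the n studies reporting coordinates closest to the seed.
ids_by_n = dset.get_studies_by_coordinate([[0, -52, 26]], n=50)

# Hypothetical CoordCBP usage; parameter names are guesses based on the discussion above.
cbp = CoordCBP(
    target_mask="my_roi.nii.gz",      # region to parcellate
    n_clusters=range(2, 9),           # candidate cluster counts
    filter_sizes=range(20, 201, 20),  # candidate n values ("filter sizes")
)
results = cbp.fit(dset)
```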