Add metric morans i #303
base: main
Conversation
I still have some questions about the rationale for the rescaling and for the pre/post-integration comparison. It seems that, in the end, you default to not comparing the score to pre-integration? Do we have ground truth that a Moran's I of 1 is desirable for all the HVGs we're using? That seems hard to argue for. Why not go for the pre/post comparison instead? Also, I'm not sure why we should care about distinguishing values of -1 and 0.
    adata_pre,
    adata_post,
    batch_key,
    n_hvg=1000,
1000 HVGs sounds like a lot as a default here. Is there any reason for this? Can it be reduced?
I agree, this can be reduced - maybe to 100?
sc.pp.neighbors(adata_post, use_rep=embed)
# Prepare pre data
adata_pre = adata_pre.copy()
adata_pre.obs[batch_key] = adata_pre.obs[batch_key].astype("category")
Please add a call to check_sanity(), or the check_adata / check_batch util functions, from here:

Line 25 in f35b0c9

def check_sanity(adata, batch, hvg):
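A minimal sketch of what such a check could do (the real check_batch/check_sanity utilities live in this repo's utils module; the function below is a simplified, hypothetical stand-in operating on a plain DataFrame):

```python
import pandas as pd

def check_batch(batch, obs):
    """Simplified stand-in for the suggested check_batch utility:
    verify the batch column exists and make it categorical."""
    if batch not in obs.columns:
        raise ValueError(f"column {batch} is not in obs")
    # cast to categorical, as the diff above does manually
    if not isinstance(obs[batch].dtype, pd.CategoricalDtype):
        obs[batch] = obs[batch].astype("category")
    return obs

# toy obs table standing in for adata_pre.obs
obs = pd.DataFrame({"batch": ["a", "b", "a"], "n_genes": [10, 12, 9]})
obs = check_batch("batch", obs)
```

Calling this once at the top of the metric would replace the manual astype("category") line and fail early with a clear message on bad input.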
cond = (
    np.array(np.mean(data[:, hvgs].X, axis=0))
    != np.array(data[0, hvgs].X.todense())
).squeeze()
The mean being equal to the first cell's value could also happen for genes that are not constant, no? Why not define this via the standard deviation?
You are right! std is better!
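A sketch of the std-based check on a toy dense matrix (the real code operates on adata.X, which may be sparse, so an extra densifying step would be needed there):

```python
import numpy as np

# Toy expression matrix: rows are cells, columns are genes.
X = np.array([[2.0, 0.0, 1.0],
              [1.0, 0.0, 2.0],
              [3.0, 0.0, 3.0]])

# A gene is kept only if it actually varies across cells.
cond = X.std(axis=0) > 0

# Failure mode of the mean-vs-first-cell test: gene 0 varies
# (std > 0) but its mean (2.0) equals the first cell's value, so
# the original condition would wrongly flag it as constant.
mean_test = X.mean(axis=0) != X[0, :]
```

Here cond keeps genes 0 and 2 (gene 1 is constant), while mean_test wrongly drops the varying gene 0.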
)
adata_sample_scl = adata_sample.copy()
# PP each sample: scaling, pca, neighbours
sc.pp.scale(adata_sample_scl, max_value=10)
Just out of curiosity: is there a reason for scaling here specifically, or is it out of habit?
Most of the code was adapted from Karin. @Hrovatin, do you have an opinion here?
batch_mis = pd.concat(batch_mis, axis=1)
# Difference with integrated data
mi_diffs = adata_post.var["morans_i"] - batch_mis.max(axis=1)
avg_mi_diff = mi_diffs.mean()
I don't entirely understand this part. You are expecting optimal integration to mean that Moran's I is the same in the full integration as the max in any individual batch? Is the max used because of the possibility of having unique cell types in a single batch? Is this affected by cell-type composition differences across batches?
I guess the max gives an upper bound on the degree to which an individual gene can be clustered. I changed it in #304 such that only worse Moran's I scores are counted - if you agree that the max is an upper bound per gene, then a difference > 0 would mean that the integration was successful in maintaining this trait of the data.
To compare different integrations, I believe that the compare_pre=False computation would be sensible, as the scores (their mean) are directly comparable.
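A toy sketch of the "only count worse scores" variant described here (gene names and values are made up; the real code uses adata_post.var["morans_i"] and the per-batch batch_mis frame):

```python
import pandas as pd

# Per-gene Moran's I on the integrated data (hypothetical values)
mi_post = pd.Series({"geneA": 0.8, "geneB": 0.3, "geneC": 0.5})

# Per-gene Moran's I computed within each batch separately
batch_mis = pd.DataFrame({
    "batch1": {"geneA": 0.7, "geneB": 0.6, "geneC": 0.5},
    "batch2": {"geneA": 0.9, "geneB": 0.4, "geneC": 0.2},
})

# Difference to the per-batch maximum (the per-gene upper bound)
mi_diffs = mi_post - batch_mis.max(axis=1)

# Count only losses: genes where integration kept or improved the
# autocorrelation contribute 0 instead of inflating the score
avg_mi_diff = mi_diffs.clip(upper=0).mean()
```

With these toy numbers geneC is preserved exactly (contributes 0) while geneA and geneB lose 0.1 and 0.3 respectively, so the average is about -0.133.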
# Rescale so that it is between [0,1] where 1 is better
# Moran's I will be in [-1,1] and thus difference can be in [-2,2]
if rescale:
    res = (avg_mi_diff + 2) / 4
Another note on rescaling. Here it seems that a difference of 2 (perfect autocorrelation of markers in the integrated data while the max per-batch autocorrelation is -1) is optimal. I'm not sure this should be the case. Wouldn't the idea be to detect the same spatial correlation structure in a single batch as across all of them? I guess you want 1 - (avg_mi_diff + 2) / 4 here, no?
Also, should a Moran's I of -1 and one of 0 be distinguished for the comparison at all? Both mean that no biologically meaningful information is encoded, no?
-1 means perfectly dispersed; if we are looking for a clustered signal we could do max(0, score) to have the score in [0,1] from the beginning - what do you think?
Wrt the 1 - ..., I totally agree. Minimal differences should yield a high score.
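The max(0, score) idea sketched on toy values (assuming per-gene scores are clipped before any difference is taken, so the difference would already lie in [-1, 1] and the [-2, 2] rescaling above would no longer be needed):

```python
import numpy as np

def clip_morans_i(scores):
    """Sketch of the suggestion above: treat dispersed (negative)
    Moran's I like no autocorrelation, keeping scores in [0, 1]."""
    return np.clip(scores, 0, None)

scores = np.array([-1.0, -0.2, 0.0, 0.7])
clipped = clip_morans_i(scores)
```

Under this clipping, -1 (perfectly dispersed) and 0 (no autocorrelation) are deliberately indistinguishable, which matches the point that neither encodes a clustered biological signal.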
EDIT: I changed it so that only "bad" differences are considered; please check in #304.
res = adata_post.var["morans_i"].mean()
if rescale:
    # Moran's I will be in [-1,1]
    res = (res + 1) / 2
Also here: should a Moran's I of 0 be better than one of -1? Could that difference be due only to the sparsity of the marker?
I changed it to consider only scores > 0, as we are mostly interested in clustering.
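In the compare_pre=False branch, that change might look like this sketch (per-gene values are made up):

```python
import numpy as np

# Hypothetical per-gene Moran's I on the integrated data
morans_i = np.array([-0.2, 0.0, 0.6, 0.8])

# Clip negative (dispersed) scores to 0 so they count the same as
# "no autocorrelation", then average; the result is already in
# [0, 1], so the (res + 1) / 2 rescaling is no longer needed.
res = np.clip(morans_i, 0, None).mean()
```

With these values the two negative-or-zero genes contribute nothing and the mean is 0.35, directly interpretable as "average clustered signal retained".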