Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add two distance metrics, three-way comparison and bootstrapping #608

Merged
merged 36 commits into from
Jun 24, 2024
Merged

Conversation

wxicu
Copy link
Collaborator

@wxicu wxicu commented May 26, 2024

PR Checklist

Description of changes
Add distance metrics: mean_var_distn and mahalanobis

Technical details

Additional context

There is a question whether to store/return/accept the expensive inverse of the covariance matrix for mahalanobis metrix in the draft implementations on the hackathon branch. Do you think it make sense to save the invert in a multidimensional array? I didnt come up with a more efficient solution.

@wxicu wxicu requested a review from Zethson May 26, 2024 21:45
Copy link
Member

@Zethson Zethson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much!

Left some comments.

  1. We might be able to speed it up with numba but it's not that important for now.
  2. Can we use the aggregation functions for other distances? As it is, you can pass them generally, but then it's only used in one distance? This is super weird UX. This needs improvement.

pertpy/tools/_distances/_distances.py Outdated Show resolved Hide resolved
pertpy/tools/_distances/_distances.py Outdated Show resolved Hide resolved
pertpy/tools/_distances/_distances.py Outdated Show resolved Hide resolved
pertpy/tools/_distances/_distances.py Outdated Show resolved Hide resolved
pertpy/tools/_distances/_distances.py Outdated Show resolved Hide resolved
pertpy/tools/_distances/_distances.py Outdated Show resolved Hide resolved
pertpy/tools/_distances/_distances.py Outdated Show resolved Hide resolved
pertpy/tools/_distances/_distances.py Outdated Show resolved Hide resolved
pertpy/tools/_distances/_distances.py Outdated Show resolved Hide resolved
tests/tools/_distances/test_distances.py Outdated Show resolved Hide resolved
@wxicu
Copy link
Collaborator Author

wxicu commented Jun 3, 2024

It takes about 5 seconds to calculate KDE of a group pair for the metric mean_var_disn. There are 32 groups in the test dataset so I subset to only 5 of them for speeding up.

@wxicu wxicu requested a review from Zethson June 3, 2024 18:42
Copy link
Member

@Zethson Zethson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am honestly not sure whether it makes to use the aggregation functions for all of them^^

I think I might have gotten it wrong. It says "pseudobulk" but I think that how you're using them in the distance now is something different? Do I get it wrong?

pertpy/tools/_distances/_distances.py Outdated Show resolved Hide resolved
pertpy/tools/_distances/_distances.py Outdated Show resolved Hide resolved
pertpy/tools/_distances/_distances.py Outdated Show resolved Hide resolved
pertpy/tools/_distances/_distances.py Outdated Show resolved Hide resolved
tests/tools/_distances/test_distances.py Show resolved Hide resolved
@Zethson Zethson requested a review from eroell June 3, 2024 19:44
pertpy/tools/_distances/_distances.py Outdated Show resolved Hide resolved
pertpy/tools/_distances/_distances.py Show resolved Hide resolved
pertpy/tools/_distances/_distances.py Show resolved Hide resolved
pertpy/tools/_distances/_distances.py Outdated Show resolved Hide resolved
pertpy/tools/_distances/_distances.py Outdated Show resolved Hide resolved
pertpy/tools/_distances/_distances.py Outdated Show resolved Hide resolved
pertpy/tools/_distances/_distances.py Outdated Show resolved Hide resolved
pertpy/tools/_distances/_distances.py Outdated Show resolved Hide resolved
pertpy/tools/_distances/_distances.py Outdated Show resolved Hide resolved
pertpy/tools/_distances/_distances.py Show resolved Hide resolved
@wxicu
Copy link
Collaborator Author

wxicu commented Jun 6, 2024

I am honestly not sure whether it makes to use the aggregation functions for all of them^^

I think I might have gotten it wrong. It says "pseudobulk" but I think that how you're using them in the distance now is something different? Do I get it wrong?

To be honest I copied "aggregation part" from the hackathon branch where aggregation function accepts np.mean/variance/median etc.
From my understanding: currently the majority of the distance metrics calculate the distance between 2 pseudobulk vectors. For those metrics, the aggregation function just provides different modes to aggregate the counts then just mean expression. For metrics which not really calculate the distance between 2 pseudobulk, , e.g. mean_var_distribution, we dont use any aggregation funtcion there.

However the problem is that I am not sure whether it makes sense to generate pseudobulk vector with variance. So for now I would like to keep only mean, median and add sum for aggregation functions in distance metrics. For mediod, I dont think it is really an aggregation method, but a clustering method? So i also removed it. What do you think?

@Zethson
Copy link
Member

Zethson commented Jun 6, 2024

@wxicu all of this sounds reasonable to me.

@wxicu wxicu changed the title Add two distance metrics Add two distance metrics, three-way comparison and bootstrapping Jun 7, 2024
@wxicu
Copy link
Collaborator Author

wxicu commented Jun 12, 2024

But.....I didn't come up with more descriptive names for the parameters

Copy link
Member

@Zethson Zethson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good stuff!

pertpy/tools/_distances/_distances.py Outdated Show resolved Hide resolved
pertpy/tools/_distances/_distances.py Outdated Show resolved Hide resolved
pertpy/tools/_distances/_distances.py Show resolved Hide resolved
pertpy/tools/_distances/_distances.py Outdated Show resolved Hide resolved
tests/tools/test_metrics_3g.py Outdated Show resolved Hide resolved
@wxicu wxicu requested review from Zethson and eroell June 20, 2024 19:10
@wxicu wxicu enabled auto-merge (squash) June 24, 2024 09:41
Copy link
Collaborator Author

@wxicu wxicu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mergeeeeeee

@wxicu wxicu disabled auto-merge June 24, 2024 09:46
@wxicu wxicu dismissed Zethson’s stale review June 24, 2024 09:49

All changes have been adapted

@wxicu wxicu enabled auto-merge (squash) June 24, 2024 09:49
@wxicu wxicu requested a review from Zethson June 24, 2024 09:50
Copy link
Member

@Zethson Zethson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not in the docs yet because the API needs an overhaul. But good enough for the figure for the manuscript.

The tests are getting slower and slower but well we need another take on that.

@wxicu wxicu merged commit a22aaab into main Jun 24, 2024
3 of 7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants