Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GMIC #1

Open
wants to merge 44 commits into
base: main
Choose a base branch
from
Open

GMIC #1

wants to merge 44 commits into from

Conversation

tingshanL
Copy link
Owner

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Any other comments?

tliu68 and others added 30 commits May 24, 2023 16:37
…d without a y argument (scikit-learn#29402)

Co-authored-by: Loïc Estève <[email protected]>
Co-authored-by: Lucy Liu <[email protected]>
…nd local caching (scikit-learn#29354)

Co-authored-by: Guillaume Lemaitre <[email protected]>
Co-authored-by: Loïc Estève <[email protected]>
@tingshanL tingshanL changed the title Gm ic GMIC Oct 30, 2024
Copy link

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: e2f9a77. Link to the linter CI: here

@tingshanL
Copy link
Owner Author

tingshanL commented Oct 30, 2024

Hello Adam,

We have reviewed and addressed all feedback provided here. Below is a summary of the primary comments and our responses:

  • "significantly richer init capabilities than the underlying GaussianMixture class in scikit-learn": currently, GaussianMixtureIC selects models only based on covariance constraints and the number of components, the same as the GM model selection example in the library.
  • "orders of magnitude slower than GaussianMixture": using current GaussianMixtureIC to run the same task as in the example does not take more time. On my computer, both fitting procedures took around 1.3s.
  • "to configure continuous integration to run the tests, including a test to run the check_estimator function to make sure that the code stays compatible with future scikit-learn versions": all checks, including the tests related to check_estimator, have passed.
  • "joblib-based multi-threading": it is no longer in the code.
  • "regularization": also not included in the code anymore.

With these revisions, we believe the code is ready for the next review. Please let us know if there is anything further we should adjust. Thank you very much for your time and insights!

@tingshanL
Copy link
Owner Author

Hello Adam,

Thank you for your patience while we worked on addressing the feedback. We’ve been focused on adding the tied-diagonal covariance type to the GaussianMixture class. This option is supported in mclust but not in sklearn.GaussianMixture, which highlights some broader discrepancies between the two packages in their GMM implementations.

Here’s a brief summary of the key differences:

Covariance Types

  • sklearn.GaussianMixture:
    Supports full, tied, diag, and spherical.
  • mclust:
    Supports all four above (referred to as VVV, EEE, VVI, VII) and ten additional covariance types. A complete list is in Table 3 of this paper.

Initialization Schemes

  • sklearn.GaussianMixture:
    Options include kmeans, k-means++, random, and random_from_data.
  • mclust:
    Uses model-based hierarchical agglomerative clustering for initialization.

Model Selection

  • sklearn.GaussianMixture:
    Lacks built-in automation. Users must implement their own model selection logic, as shown in this example.
  • mclust:
    Automates model selection across n_components and covariance_type based on BIC values.

Our Proposal to Bridge the Gap

  1. Enable automatic model selection across n_components and covariance_type.
  2. Add support for additional covariance_type options, such as tied-diagonal (referred to as "EEI" in mclust).

We believe these enhancements will significantly improve the utility and feature parity of sklearn.GaussianMixture.

Thank you for your consideration, and we look forward to hearing your feedback!

Best regards,
Tingshan

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.