ClusterBasedNormalizer vs GaussianNormalizer vs PowerTransformer #613
Labels: question (General question about the software)
candalfigomoro added the new and question labels on Feb 13, 2023
Hi @candalfigomoro, thanks for the feedback. We'll keep this issue open to share any information as we investigate the specifics of these transformers. Some considerations:
If you have done any exploration yourself along these lines, we'd be very eager to see it!
npatki added the under discussion label (Issue is currently being discussed) and removed the new label on Mar 29, 2023
When using CTGAN, data is normalized using ClusterBasedNormalizer.
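For context, the idea behind cluster-based normalization, as described in the CTGAN paper ("mode-specific normalization"), can be sketched as follows. The sketch assumes scikit-learn's BayesianGaussianMixture as the variational mixture model. The function name `mode_specific_normalize`, its defaults, and the argmax mode selection are illustrative choices, not RDT's actual implementation. Each value is assigned to one mode of a fitted mixture and then rescaled by 4 standard deviations of that mode. The model would see the mode index (e.g. one-hot encoded) together with the scaled value `alpha`.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture


def mode_specific_normalize(x, max_clusters=10, weight_threshold=0.005, seed=0):
    """CTGAN-style mode-specific normalization (illustrative sketch, not RDT's code).

    Returns (alpha, component): `component` is the index of the mode chosen for
    each value and `alpha` is the value rescaled within that mode.
    """
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    # Variational GMM with a Dirichlet-process prior; it can give unneeded
    # components near-zero weight, so max_clusters is only an upper bound.
    gmm = BayesianGaussianMixture(
        n_components=max_clusters,
        weight_concentration_prior_type="dirichlet_process",
        max_iter=500,
        random_state=seed,
    ).fit(x)
    # Keep only modes whose mixture weight exceeds the threshold.
    active = gmm.weights_ > weight_threshold
    means = gmm.means_.ravel()[active]
    stds = np.sqrt(gmm.covariances_.ravel()[active])
    # Pick the most probable active mode per value (CTGAN samples a mode from
    # the posterior instead; argmax keeps this sketch deterministic).
    component = gmm.predict_proba(x)[:, active].argmax(axis=1)
    # Scale by 4 standard deviations of the chosen mode, as in the CTGAN paper.
    alpha = (x.ravel() - means[component]) / (4 * stds[component])
    return alpha, component


# Hypothetical bimodal column used only for illustration.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-5, 1, 500), rng.normal(5, 0.5, 500)])
alpha, component = mode_specific_normalize(x)
print("modes used:", np.unique(component))
print("alpha range:", alpha.min(), alpha.max())
```

Fitting the mixture is the expensive step here. That cost is why a cheaper transform is tempting.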
RDT also implements a GaussianNormalizer.
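GaussianNormalizer follows a Gaussian-copula approach. It fits a parametric distribution to the column, then maps each value through that distribution's CDF and the inverse standard-normal CDF. The following is a rough sketch of that idea. It assumes `scipy.stats` as a stand-in for whatever distribution fitting RDT actually performs. The helper `fit_gaussianizer` and the gamma marginal are hypothetical choices made only for illustration.

```python
import numpy as np
from scipy import stats


def fit_gaussianizer(x, dist=stats.gamma, eps=1e-9, **fit_kwargs):
    """CDF-based Gaussianization (illustrative stand-in for GaussianNormalizer).

    Fits a parametric marginal with scipy.stats, then maps values through its
    CDF and the inverse standard-normal CDF. Returns (transform, inverse).
    """
    params = dist.fit(x, **fit_kwargs)  # maximum-likelihood fit of the marginal
    marginal = dist(*params)

    def transform(values):
        # Clip probabilities so norm.ppf never returns +/-inf at the tails.
        u = np.clip(marginal.cdf(values), eps, 1 - eps)
        return stats.norm.ppf(u)

    def inverse(z):
        # Reverse the mapping: normal CDF, then the fitted marginal's quantile.
        return marginal.ppf(stats.norm.cdf(z))

    return transform, inverse


# Hypothetical skewed, positive column; floc=0 fixes the gamma location at zero.
rng = np.random.default_rng(0)
x = rng.gamma(shape=2.0, scale=3.0, size=2000)
transform, inverse = fit_gaussianizer(x, stats.gamma, floc=0)
z = transform(x)
print("mean:", z.mean(), "std:", z.std())
print("round trip ok:", np.allclose(inverse(z), x, rtol=1e-5))
```

The result is only as Gaussian as the chosen marginal is a good fit. A unimodal parametric family fitted to a multimodal column will not remove the modes.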
What are the advantages of ClusterBasedNormalizer and GaussianNormalizer over sklearn's PowerTransformer (https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PowerTransformer.html) with the Yeo-Johnson method? Couldn't a power transform be used instead? It would perhaps be faster than ClusterBasedNormalizer.
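The trade-off in this question can be probed with a small experiment. It does not settle the question. The Yeo-Johnson transform is strictly increasing for every lambda, so it can reduce skew but cannot merge two separate modes into one bump. The sketch below uses sklearn's PowerTransformer on a hypothetical bimodal column and checks that the two modes stay apart after the transform.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

# Hypothetical bimodal column: two well-separated modes of 500 values each.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-5, 1, 500), rng.normal(5, 0.5, 500)]).reshape(-1, 1)

# One lambda per column is fitted, then the output is standardized.
pt = PowerTransformer(method="yeo-johnson", standardize=True)
z = pt.fit_transform(x).ravel()
print("fitted lambda:", pt.lambdas_[0])

# Because the transform is monotone, the left mode stays entirely below the
# right mode: the output is still two clusters, just rescaled.
left, right = z[:500], z[500:]
print("max of left mode:", left.max(), "min of right mode:", right.min())

# The transform is invertible, so the original values can be recovered.
x_back = pt.inverse_transform(z.reshape(-1, 1))
print("round trip ok:", np.allclose(x_back, x))
```

Fitting a single lambda is cheap, which supports the speed intuition. The limitation is that a single monotone map leaves multimodal columns multimodal, and that is the case mode-specific normalization in the CTGAN paper is meant to handle.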
Thank you