Matbench task model accuracy for xtal2png representation #50

Open
sgbaird opened this issue Jun 1, 2022 · 4 comments
Labels
manuscript-enhancements Interesting things to explore that can enhance the manuscript

Comments


sgbaird commented Jun 1, 2022

The task is to train a CNN on the xtal2png representation (as an image, as an array, or both) and submit it to Matbench for formation energy regression. This will show how "good" the xtal2png representation is from a model-accuracy perspective, though I don't necessarily expect it to set new benchmarks.

This might look like using skorch with some type of PyTorch CNN module (e.g. ResNetUNet, Net) and an MSE loss function. This skorch tutorial looks like it might help with loading images, though this Stack Overflow answer is probably better for making the actual dataset to pass to skorch.
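As a rough sketch of the CNN-plus-MSE idea, the snippet below trains a tiny network for a few steps. It uses plain PyTorch rather than skorch to keep the dependencies minimal. The `SmallCNN` architecture, the 64x64 single-channel input size, and the random placeholder data are all assumptions for illustration, not anything taken from xtal2png or Matbench.

```python
import torch
from torch import nn


class SmallCNN(nn.Module):
    """Tiny CNN mapping a 1-channel image to one scalar (hypothetical architecture)."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(8, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # makes the head independent of image size
        )
        self.head = nn.Linear(16, 1)

    def forward(self, x):
        # (N, 1, H, W) -> (N, 16) -> (N,)
        return self.head(self.features(x).flatten(1)).squeeze(-1)


torch.manual_seed(0)
# Placeholder data: random arrays standing in for images scaled to [0, 1]
# and random numbers standing in for formation energies.
X = torch.rand(32, 1, 64, 64)
y = torch.randn(32)

model = SmallCNN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

losses = []
for epoch in range(5):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
```

The same module could presumably be handed to skorch's `NeuralNetRegressor` to get a scikit-learn style `fit`/`predict` interface, which is closer to what the comment has in mind.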

If regression is too much of a pain (CNNs aren't used as often for property regression in the image-processing domain), an easy fallback is to do the mp_is_metal binary classification task instead of the e_form regression task.

Related:

Maybe Faris is interested in working on this, given that he'll be doing some image processing.


sgbaird commented Jun 1, 2022

sgbaird added the manuscript-enhancements label Jun 1, 2022

sgbaird commented Jun 1, 2022

The following tutorial uses the grayscale MNIST dataset for classification and might be one of the easiest to adapt to mp_is_metal (or to mp_e_form by adjusting the final layer and the loss function).
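To make "adjusting the final layer and the loss function" concrete, here is a minimal PyTorch sketch. The helper `make_head_and_loss`, the 16-dimensional feature size, and the example targets are hypothetical; the tutorial's own backbone is not reproduced here.

```python
import torch
from torch import nn


def make_head_and_loss(task, in_features=16):
    """Return (final layer, loss function) for a task name (hypothetical helper)."""
    if task == "mp_is_metal":
        # Two logits with cross-entropy, mirroring a 10-class MNIST head.
        return nn.Linear(in_features, 2), nn.CrossEntropyLoss()
    if task == "mp_e_form":
        # One continuous output with mean squared error.
        return nn.Linear(in_features, 1), nn.MSELoss()
    raise ValueError(f"unknown task: {task}")


torch.manual_seed(0)
features = torch.rand(4, 16)  # stand-in for the output of a CNN backbone

clf_head, clf_loss = make_head_and_loss("mp_is_metal")
is_metal = torch.tensor([0, 1, 1, 0])  # class indices (int64)
clf_value = clf_loss(clf_head(features), is_metal)

reg_head, reg_loss = make_head_and_loss("mp_e_form")
e_form = torch.tensor([-1.2, 0.3, -0.5, 0.1])  # made-up float targets
reg_value = reg_loss(reg_head(features).squeeze(-1), e_form)
```

Besides the layer width and the loss, the target dtype changes too: integer class indices for classification, floats for regression.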


sgbaird commented Jul 8, 2022

@faris-k did a classification task on mp_is_metal and is getting the files ready for a Matbench PR. See the notebook.


sgbaird commented Jul 9, 2022

It does leave me wondering why the regression results are so poor (much worse than dummy), whereas the classification results are OK (a bit better than dummy).
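For reference, a "worse than dummy" comparison can be sketched as follows. This assumes the dummy model predicts the mean of the training targets and that the metric is mean absolute error; the numbers are made up and are not results from this issue.

```python
from statistics import mean


def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def dummy_mae(y_train, y_test):
    """MAE of a baseline that always predicts the training-set mean."""
    baseline = mean(y_train)
    return mae(y_test, [baseline] * len(y_test))


# Made-up numbers for illustration only.
y_train = [-1.0, 0.0, 1.0, 2.0]
y_test = [0.0, 1.0]
model_pred = [0.25, 0.75]

baseline_error = dummy_mae(y_train, y_test)
model_error = mae(y_test, model_pred)

# ratio > 1 means the model is worse than the dummy baseline.
ratio = model_error / baseline_error
```

With this convention, "much worse than dummy" corresponds to a ratio well above 1, and "a bit better than dummy" to a ratio slightly below 1.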

A follow-up computational experiment (which I think we should leave on the back burner until further notice) is to reuse the classification model with binned formation energies as the classes (e.g. one class for formation energy between 0 and 0.05). Implementing ordinal classification would be extra work, so first treat the bins as categorical. I'm putting this here mostly for future reference as things progress with xtal2png.
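A minimal sketch of the binning step, using only the standard library. The bin edges and the `to_class` helper are hypothetical choices for illustration; only the 0 to 0.05 bin comes from the comment above.

```python
from bisect import bisect_right

# Hypothetical bin edges; values below the first edge or at/above the last
# edge fall into open-ended classes at either end.
edges = [-0.10, -0.05, 0.0, 0.05, 0.10]


def to_class(value, edges):
    """Map a continuous value to a class index.

    Class 0 is value < edges[0]; class k is edges[k-1] <= value < edges[k];
    class len(edges) is value >= edges[-1].
    """
    return bisect_right(edges, value)


e_form = [-0.5, -0.07, 0.0, 0.02, 0.05, 0.3]  # made-up values
labels = [to_class(v, edges) for v in e_form]
n_classes = len(edges) + 1
```

Here both 0.0 and 0.02 land in the same class (the 0 to 0.05 bin). Feeding `labels` to a standard cross-entropy classifier treats the bins as unordered categories, so a prediction that is off by one bin is penalized the same as one that is off by five.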

It's also interesting that hyperparameter-tuned XGBoost did a pretty good job on the regression task (~4x better than the CNN regression), and with much less information. We'll see if the results still hold once we double-check that data leakage wasn't coming into play. See #51 and specifically #51 (comment).

2 participants