Classification performance #13

Open
orelgueta opened this issue Jan 29, 2021 · 7 comments

Comments

@orelgueta
Collaborator

The plots below compare the scores of all classification models for 3 event types. Surprisingly (or not!), MLP_small is also the best model for classification (apologies for its colour changing between plots...).
The drop in performance in the two highest energy bins is interesting. Perhaps we need to split those bins into smaller ones?

scores_features_classifier_1

scores_features_classifier_2

scores_features_classifier_3
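
A minimal sketch of how a per-energy-bin score like the ones plotted above could be computed. It assumes the score is plain classification accuracy (the thread does not say which metric the plots use); the function name, the toy events, and the bin edges are all hypothetical and not taken from the project code.

```python
from bisect import bisect_right


def accuracy_per_energy_bin(energies, true_types, predicted_types, bin_edges):
    """Return the fraction of correctly classified events in each energy bin.

    bin_edges must be sorted; bin i covers [bin_edges[i], bin_edges[i + 1]).
    Bins without events get None instead of a score.
    """
    n_bins = len(bin_edges) - 1
    correct = [0] * n_bins
    total = [0] * n_bins
    for energy, truth, prediction in zip(energies, true_types, predicted_types):
        i = bisect_right(bin_edges, energy) - 1
        if 0 <= i < n_bins:  # skip events outside the binned range
            total[i] += 1
            correct[i] += int(truth == prediction)
    return [c / t if t else None for c, t in zip(correct, total)]


# Hypothetical toy data: energies in TeV, event types labelled 1..3.
energies = [0.05, 0.08, 0.3, 0.5, 0.7, 2.0, 5.0, 20.0]
true_types = [1, 2, 3, 1, 2, 3, 1, 2]
predicted_types = [1, 2, 3, 2, 2, 3, 2, 1]
bin_edges = [0.01, 0.1, 1.0, 10.0, 100.0]  # assumed edges, one bin per decade

scores = accuracy_per_energy_bin(energies, true_types, predicted_types, bin_edges)
print(scores)
```

Splitting the highest bins would then only mean passing a finer `bin_edges` list, although each narrower bin holds fewer events, so its score is noisier.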

@orelgueta
Collaborator Author

Below are the confusion matrices for the best model (MLP_small) for 3 event types. Both 1D and 2D versions are shown.
What I conclude from these matrices is that 3 types is perhaps too many and 2 types may be more realistic (plots for 2 types to come soon). We probably need to see the IRFs to decide.
Also, I don't think the performance is better than regression, and definitely not significantly better.

MLP_small_classifier_ntypes_3_confusion_matrix_n_types_3

MLP_small_classifier_ntypes_3_1d_confusion_matrix_n_types_3
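
For readers who want to reproduce this kind of matrix, here is a rough sketch of a row-normalised confusion matrix in plain Python. It is not the project's plotting code: the function names and the toy labels are hypothetical, event types are assumed to be labelled 1..n_types, and the normalisation (per true type) is an assumption about how the plots above were made.

```python
def confusion_matrix(true_types, predicted_types, n_types):
    """Row-normalised confusion matrix for labels 1..n_types.

    Entry [i][j] is the fraction of events of true type i + 1 that were
    predicted as type j + 1, so every non-empty row sums to 1.
    """
    counts = [[0] * n_types for _ in range(n_types)]
    for truth, prediction in zip(true_types, predicted_types):
        counts[truth - 1][prediction - 1] += 1
    matrix = []
    for row in counts:
        row_total = sum(row)
        matrix.append([c / row_total if row_total else 0.0 for c in row])
    return matrix


def overall_accuracy(true_types, predicted_types):
    """Fraction of events whose predicted type equals the true type."""
    matches = sum(t == p for t, p in zip(true_types, predicted_types))
    return matches / len(true_types)


# Hypothetical toy labels for 3 event types (4 events per true type).
true_types = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
predicted_types = [1, 1, 1, 2, 1, 2, 2, 3, 2, 3, 3, 3]

matrix = confusion_matrix(true_types, predicted_types, n_types=3)
accuracy = overall_accuracy(true_types, predicted_types)
for row in matrix:
    print(row)
print(accuracy)
```

In this toy case the middle type is the hardest one (only half of its events land on the diagonal), which is the kind of pattern that would argue for fewer types.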

@orelgueta
Collaborator Author

Below is the same score comparison as above, but this time for 2 event types. Here the best model is not so clear. MLP_small is probably still the best overall, but it has a drop in one energy bin (1.2 < E < 1.7 TeV), and at higher energies even the linear models (Ridge) perform a bit better. It is unclear why, but since we will switch to the Prod5 dataset anyway, add more variables, and might use regression in the end, I won't investigate at the moment. I still take MLP_small as the default for the rest of the plots.

scores_features_classifier_n_types_2_1

scores_features_classifier_n_types_2_2

scores_features_classifier_n_types_2_3

@orelgueta
Collaborator Author

Confusion matrices for two event types can be seen below. I think they are not significantly better than the regression confusion matrices.

MLP_small_classifier_ntypes_2_confusion_matrix_n_types_2

MLP_small_classifier_ntypes_2_1d_confusion_matrix_n_types_2

@orelgueta
Collaborator Author

Please see below updated plots for classification with the Prod5 sample. Additional details about the sample can be seen in #4. Below are the results for 2 event types, where we can see that MLP_small is clearly the best. I don't see a reason to assume it will not be the best also for other numbers of event types, so for 3 event types I only train MLP_small (results in the next comment).

scores_features_classifier_n_types_2_1

scores_features_classifier_n_types_2_2

@TarekHC
Collaborator

TarekHC commented Feb 4, 2021

Question: Aren't they way too good?! Getting 70-80% of the classifications right seems really good... Are we expecting such a performance?

@orelgueta
Collaborator Author

With just two types? Yea that doesn't surprise me considering the regression performance we saw before. Soon I will get the 3-event-type results and then we will see if it's too good.

@orelgueta
Collaborator Author

Below are the confusion matrices for the MLP_small classification model with 2 and 3 event types. With 3 event types we get 60-65% classification accuracy, which is more reasonable, I guess. The comparison with regression will be discussed in #2.

MLP_small_classifier_ntypes_2_confusion_matrix_n_types_2

MLP_small_classifier_ntypes_3_confusion_matrix_n_types_3
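
One possible way to put the 2-type and 3-type accuracies quoted in this thread on a common footing is to compare each with random guessing. The sketch below assumes that all event types are equally populated, which the thread does not state, so that chance accuracy is 1/n_types. The helper names are hypothetical.

```python
def chance_accuracy(n_types):
    """Accuracy of random guessing, ASSUMING equally populated event types."""
    return 1.0 / n_types


def gain_over_chance(accuracy, n_types):
    """Ratio of a measured accuracy to the random-guessing accuracy."""
    return accuracy / chance_accuracy(n_types)


# Accuracy ranges quoted in the thread: 70-80% for 2 types, 60-65% for 3 types.
quoted = {2: (0.70, 0.80), 3: (0.60, 0.65)}

gains = {
    n_types: tuple(gain_over_chance(a, n_types) for a in accuracies)
    for n_types, accuracies in quoted.items()
}
for n_types, (low, high) in gains.items():
    print(f"{n_types} types: {low:.2f}x to {high:.2f}x the chance level")
```

Under that assumption, 70-80% with 2 types is about 1.4 to 1.6 times the chance level, while 60-65% with 3 types is about 1.8 to 1.95 times, so a lower raw accuracy for 3 types does not by itself mean a weaker classifier.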
