Classification vs regression #2
Comments
Decided to go with regression for now. Working on the performance (see #4).
Yes, makes sense to me. Although we should probably find a way to create a direct comparison between classification and regression. For instance, we could rank the test sample into the N event types for each regression algorithm applied. This way we can evaluate performance exactly the same way for classification and regression.
The way I understood it, the reason to go for regression was motivated by the freedom to divide the events after training. It wasn't motivated by performance. I actually assume that we will be able to get slightly better performance with classification. Let's discuss this later and decide if there's a reason to go back and look at classification.
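One way to implement that ranking (a minimal NumPy sketch; the function name, the toy lognormal scores and the choice of 4 types are hypothetical stand-ins for the real sample) is to cut the regressed score at its quantiles so each type gets equal statistics:

```python
import numpy as np

def rank_into_types(pred_scores, n_types):
    """Assign each event to one of n_types bins with equal statistics,
    based on a regressed score (e.g. the predicted angular error)."""
    edges = np.quantile(pred_scores, np.linspace(0, 1, n_types + 1)[1:-1])
    return np.digitize(pred_scores, edges)  # labels 0 .. n_types - 1

rng = np.random.default_rng(0)
scores = rng.lognormal(size=1000)       # toy regression output
types = rank_into_types(scores, 4)
print(np.bincount(types))               # 250 events per type
```

With the test sample ranked this way, the same confusion matrix can be produced for both the classifiers and the regressors.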
Yes, the reason to go for regression is clear. But I would find it very informative to show how much better/worse regression will be. If there were an enormous improvement, then not having such control over the test statistics would not be such a big deal...
Yeah, I agree, if the performance is significantly better for classification, it might be worth it. We will keep this on the to-do list then. However, I would first like to solve all of the pending issues so we can get a good estimate on the regression performance (I am sure it will help for classification as well).
Testing the regression performance for classification, the confusion matrix plots below were obtained. Three plots are shown below, for 2, 3 and 4 event type partitions. When partitioning into event types, the sample is divided based on the true angular difference so as to have equal statistics in each sub-sample. I think it is fairly safe to say that 4 classes is too much based on these results. Maybe 3 classes is OK. Need to see what happens when we add all samples (gamma-cone, electrons and protons). In terms of classification vs regression, we need to run the classification again, but regression doesn't look too bad at the moment compared to the results in the first comment (hard to compare of course).
Hi Orel. The plots look really really good! The performance seems way more than enough... So I'm really hoping the IRFs will look super different. One suggestion: perhaps convert these to row-wise %? Meaning that in a row, 70% are correct, 15% not so bad, 7% wrong, etc... To be fair with classification vs regression we should then devote a bit of time also to optimize the classification exercise. I would definitely not devote a lot of time to it, but if you feel like playing with ML some more... It could be fun. I'll open one issue now!
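The row-wise % suggestion amounts to normalizing each row of the confusion matrix by its sum (a sketch with a made-up 4x4 matrix; the numbers are illustrative, not the actual results):

```python
import numpy as np

# Illustrative confusion matrix: rows are true types, columns predicted.
cm = np.array([[70, 15, 10,  5],
               [20, 50, 20, 10],
               [10, 25, 45, 20],
               [ 5, 10, 25, 60]])

# Convert counts to row-wise percentages: each row now sums to 100, so the
# diagonal reads directly as "% correctly classified per true type".
cm_pct = 100 * cm / cm.sum(axis=1, keepdims=True)
print(cm_pct.round(1))
```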
I am not as impressed by the performance to be honest and I wonder how this will translate when looking at the IRFs, but we'll see.
Good idea, see corresponding plots below. (Still need to make the size change dynamically with the number of types, but it's a small detail.)
Yeah, I definitely have it already on the to-do list.
I am actually positively impressed by the performance! :-) Regarding the plots, I was thinking of producing the same N by N plots with % instead of number of events, but the plots you produced are also super informative. We leave this issue open to show the performance of classification algorithms in the future.
Please see the updated confusion matrices below, after fixing the mistake mentioned in #4. Namely, the event type bins are now defined based on the reconstructed angular error rather than the true one.
Hi Orel, Interesting!
Actually, I think there is a very clear effect on the intermediate types (now, they have a significantly worse determination over all energies), which is much more consistent with my previous tests (I was very surprised about how good your classification was in those!). In any case, it is not a show stopper at all: It just means that we are very good at identifying good and bad events, and it might simply mean that the types we may define in the future are "very good", "very bad" and "average performance" event types, not really requiring equal statistics.
I wasn't referring to the 3 and 4 type cases because I am not sure it makes sense to define an "average performance" type if we can't classify to that event type well. We can discuss later of course and maybe it will improve if we improve performance. What's important now is that it looks like regression is not significantly worse than classification.
Latest results of regression using Prod5 sample (see #4 for more details, first comment mentioning Prod5 results).
Hi Orel, Regarding the "only define 2 event types", I'm not sure that would be the best approach, mainly due to statistics. In the 4-type classification you show, 25% of the best/worst events are very well classified. This means that if you select just 2 event types (each with 50% of the signal) we would be "dirtying" the great resolution of those super good events with others that we know are not as good. Maybe, as we have already discussed in the past, what we are seeing here is that we need 3 event types with uneven statistics: "very good", "very bad" and "average performance".
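Assigning uneven statistics per type is the same quantile cut with non-uniform fractions (a sketch; the 25/50/25 split and variable names are hypothetical, assuming the regressed angular error is the ranking variable):

```python
import numpy as np

def assign_event_types(pred_log_err, fractions):
    """Split events into types holding chosen fractions of the statistics,
    e.g. 25% 'very good', 50% 'average', 25% 'very bad'."""
    cuts = np.quantile(pred_log_err, np.cumsum(fractions)[:-1])
    return np.digitize(pred_log_err, cuts)

rng = np.random.default_rng(3)
pred = rng.normal(size=2000)                 # toy predicted log10 error
types = assign_event_types(pred, [0.25, 0.50, 0.25])
print(np.bincount(types))                    # 500 / 1000 / 500 events
```

Since the cut is applied to the regression output at IRF production time, the fractions can be changed later without retraining anything.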
I feel we need to start calculating IRFs to be able to answer this question. For example, if we are able to improve resolution by selecting just the top 10% of the events, then it could definitely be worth it. If resolution does not really improve when going from 20% to 10%, then it makes no sense to go further than that...
Yes, I agree that calculating the IRFs would be the way to go here to make a decision. Also, I will try making the confusion matrices for different partitionings of the events instead of equal statistics.
Going back to the main topic of this issue, which is better, classification or regression?
Hi Orel, These plots show that their performance is very comparable... So it seems reasonable to assume that the benefits provided by the regression algorithms (flexibility in deciding the amount of statistics for each event type at the IRF calculation stage) are more relevant than improving the classification at the 1-2% level. Although what I would show in the paper is probably just an accuracy vs energy plot comparing our best classifier with the best regressor (what you have here, but in a classical 1D plot). Maybe we could add the "2 off" lines just for reference... But again, these results seem pretty solid: classifiers don't seem to show a very significant improvement in performance over regressors (which is what we wanted!). :)
By the way, this could actually be a plot to add to the proceeding. It's a bit technical, but it's a good test any reviewer would ask for.
Sure, I can make this plot for the paper no problem. Considering the space limitations of the proceedings, not sure if it will go there as well, but we'll see when the time comes. OK, so regression it is. Maybe we can even close this issue.
Yep, this is definitely good enough!
We can use two approaches: multi-class classification and regression.
Multi-class classification:
The performance of most algorithms is really bad (roughly 35-40% precision), but I generally chose algorithms that, when they mislabel an event, are usually relatively close:
Each of these plots is a different energy bin in log scale, each showing the confusion matrix of the classifier: the Y axis shows the true event types and the X axis the predicted ones.
As you can see, it seems the "bad" events are generally well labeled across all energies (event type 3), while the best events are also more or less well labeled. The intermediate event types seem rather random to me... But we will probably need to wait for the IRFs to see how good the separation really is. The best algorithm seems to be a One vs One ensemble of random forest classifiers.
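The per-energy-bin confusion matrices can be computed with plain NumPy (a sketch; the random toy labels, the energy range and the four log-spaced bins are hypothetical stand-ins for the real sample and trained classifier):

```python
import numpy as np

def confusion_matrix(true_types, pred_types, n_types):
    """Rows are the true event types (Y), columns the predicted ones (X)."""
    cm = np.zeros((n_types, n_types), dtype=int)
    np.add.at(cm, (true_types, pred_types), 1)  # count (true, pred) pairs
    return cm

rng = np.random.default_rng(1)
energy = rng.uniform(1e-2, 1e2, size=5000)   # toy energies
true_t = rng.integers(0, 4, size=5000)       # toy true types
pred_t = rng.integers(0, 4, size=5000)       # toy predicted types

edges = np.logspace(-2, 2, 5)                # 4 energy bins in log scale
for lo, hi in zip(edges[:-1], edges[1:]):
    sel = (energy >= lo) & (energy < hi)
    cm = confusion_matrix(true_t[sel], pred_t[sel], 4)
```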
Regression:
Instead of just dividing into 4 groups, we can also try to estimate the expected angular difference between true and reconstructed direction. For that, I used the same variables as in the previous step.
Following a similar approach as before, I show the true (Y) vs reconstructed (X) log10(angular difference):
For the moment the best results are given by a Ridge linear regression, but I probably need to play around more.
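For reference, Ridge linear regression has a closed form, w = (X^T X + alpha*I)^-1 X^T y (a minimal NumPy stand-in for a library implementation such as scikit-learn's Ridge; the toy features, weights and target below are made up):

```python
import numpy as np

def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression: w = (X^T X + alpha*I)^-1 X^T y."""
    A = X.T @ X + alpha * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)

# Toy setup: regress log10(angular difference) on a few image parameters.
rng = np.random.default_rng(2)
X = rng.normal(size=(500, 3))
w_true = np.array([0.5, -1.0, 0.2])
log_ang_diff = X @ w_true + 0.05 * rng.normal(size=500)
w = ridge_fit(X, log_ang_diff, alpha=0.1)    # recovers roughly w_true
```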
The good thing about performing a regression is that we can decide the statistics falling into each event type during IRF production, while in the case of classification we can only control the training statistics. I have not yet compared which method provides better classifications, but it will be trivial to do.