
Classifier comparison with noise dimensions #17

Merged (4 commits) Dec 19, 2019

Conversation


@sahanasrihari sahanasrihari commented Dec 11, 2019

A tutorial testing the performance of Random Forest, Support Vector Machine, and K-Nearest Neighbors classifiers when additional noise dimensions with different variance values are added to the data.

Reference Issues/PRs

This is in reference to the issue stated in neurodata#1

What does this implement/fix? Explain your changes.

This is a new tutorial demoing the effect of additional noise dimensions on the accuracy of three classifiers. It gives insight into one setting: which classification algorithm performs best amidst noise dimensions.
Here is a link to the code: https://github.com/sahanasrihari/scikit-learn/blob/master/examples/classification/CLASSIFIER_COMPARISON_PR.ipynb

Any other comments?

Random Forest is known to be robust to additional noise dimensions, especially with respect to their variance. It outperforms both SVM and KNN in the experiments run here.
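The setup described above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the dataset, classifier settings, noise dimension count, and variance grid are all placeholder assumptions.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)


def add_noise_dims(X, n_dims, variance, rng):
    """Append n_dims Gaussian noise columns with the given variance."""
    noise = rng.normal(0.0, np.sqrt(variance), size=(X.shape[0], n_dims))
    return np.hstack([X, noise])


# Illustrative two-dimensional dataset (the tutorial may use different data).
X, y = make_moons(n_samples=400, noise=0.2, random_state=0)

classifiers = {
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}

results = {}
for variance in [0.5, 1.0, 2.0]:  # illustrative variance grid
    X_noisy = add_noise_dims(X, n_dims=20, variance=variance, rng=rng)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_noisy, y, test_size=0.3, random_state=0
    )
    for name, clf in classifiers.items():
        results[(variance, name)] = accuracy_score(
            y_te, clf.fit(X_tr, y_tr).predict(X_te)
        )
        print(f"variance={variance}  {name}: {results[(variance, name)]:.3f}")
```

As the variance of the appended noise grows, the informative signal is increasingly swamped, which is what makes the comparison between the three classifiers interesting.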


@bdpedigo

  • typo: "trails" should be "trials"
  • call fit_predict something else, since that is already a common sklearn method name; maybe just fit_models or something like that
  • typo: stray \ in "Computation of accuracy"
  • you have a bug where you are not passing a new variance into fit_predict on each call
  • compute is too vague; maybe run_classification_experiment? Open to better suggestions
  • make the file name not all caps
  • remove ticks and tick labels from the dataset visualizations
  • remove xtick labels for all accuracy plots besides the bottom row; you can keep the ticks themselves
  • add a "Noise dimensions" label to the bottom row of all 3 columns
  • add an "Accuracy" label to the leftmost column of all rows
  • after doing all of the above, see if you can increase the font size by 1.5x or 2x without it looking too cluttered
  • put everything on the same y-scale; then you can remove all ytick labels besides the leftmost ones. Could probably keep the ticks themselves, but see how it looks
  • I wonder whether it is worth pointing out where chance is, e.g. 0.5, with a line or something like that. You decide; maybe it would look bad.

Despite the comments above (which I think should actually be easy to address), the code is very clear and has come a long way. Nice work!
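The plotting suggestions above (shared y-scale, stripped inner tick labels, axis labels only on the outer row/column, an optional chance line) can be sketched with matplotlib. The accuracy curves here are fabricated placeholder data purely to show the layout:

```python
import numpy as np
import matplotlib

matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
noise_dims = np.arange(1, 11)
classifiers = ["RF", "SVM", "KNN"]

# sharex/sharey puts every panel on the same scale and automatically
# hides tick labels on all but the bottom row and leftmost column.
fig, axes = plt.subplots(3, 3, sharex=True, sharey=True, figsize=(9, 9))

for i, ax_row in enumerate(axes):
    for j, ax in enumerate(ax_row):
        # Placeholder accuracy curve, not real experiment output.
        acc = 1.0 - 0.4 * noise_dims / 10 + rng.normal(0, 0.02, 10)
        ax.plot(noise_dims, acc)
        ax.axhline(0.5, color="gray", linestyle="--")  # chance, 2 classes
        if i == 0:
            ax.set_title(classifiers[j])
        if i == 2:
            ax.set_xlabel("Noise dimensions")
        if j == 0:
            ax.set_ylabel("Accuracy")

fig.tight_layout()
```

Keeping the chance line dashed and gray keeps it from competing visually with the accuracy curves.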

@bdpedigo

@sahanasrihari let me know the status of

  • you have a bug where you are not passing a new variance into fit_predict each time
    in particular, I am curious to see new results if I am right about this
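The bug described above is the classic pattern of a variance value being set once and silently reused on every call. The fix is to thread the variance through as an explicit parameter each iteration; a minimal sketch, using the reviewer's suggested name run_classification_experiment, with a placeholder dataset, classifier, and noise dimension count:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)


def run_classification_experiment(clf, X_tr, y_tr, X_te, variance, rng):
    """Append fresh noise columns with the *given* variance, then fit/predict."""
    noise_tr = rng.normal(0, np.sqrt(variance), (X_tr.shape[0], 10))
    noise_te = rng.normal(0, np.sqrt(variance), (X_te.shape[0], 10))
    clf.fit(np.hstack([X_tr, noise_tr]), y_tr)
    return clf.predict(np.hstack([X_te, noise_te]))


# The buggy version generated the noise with a variance fixed outside the
# loop, so every "different variance" run actually used the same one.
# Passing variance on every call makes each experiment use its own value.
for variance in [0.5, 1.0, 2.0]:
    preds = run_classification_experiment(
        KNeighborsClassifier(), X_tr, y_tr, X_te, variance, rng
    )
```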


@sahanasrihari sahanasrihari left a comment


@bdpedigo Files changed to address the feedback.

@bdpedigo bdpedigo merged commit 1a201fc into NeuroDataDesign:master Dec 19, 2019