Description
scikit-learn's example comparing classifier accuracies does not include multiple settings for testing various scenarios, and there is no concrete example showing when some of these algorithms win and when they lose.
One scenario worth considering: given a dataset of relatively low dimension, how does classifier accuracy change as noise dimensions are added? Noise dimensions are extra features, concatenated to the dataset, that bear no relevance to the original signal dimensions.
The experiment tests the performance of three classifiers in this setting: Random Forest, Support Vector Machine, and K-Nearest Neighbours. Noise dimensions sampled from Gaussian distributions with several variance values are concatenated to the original signal, and the classifiers are evaluated across the number of noise dimensions and the variance values, using the accuracy score as the metric.
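The noise-concatenation step can be sketched as follows. This is a minimal illustration, not the notebook's code; the dataset choice (`make_moons`), the helper name `add_noise_dims`, and the parameter values are assumptions for the example.

```python
import numpy as np
from sklearn.datasets import make_moons

def add_noise_dims(X, n_noise, noise_var, rng):
    """Append `n_noise` irrelevant Gaussian features (variance `noise_var`)
    to the signal matrix X, leaving the original columns untouched."""
    noise = rng.normal(0.0, np.sqrt(noise_var), size=(X.shape[0], n_noise))
    return np.hstack([X, noise])

rng = np.random.default_rng(0)
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)
X_noisy = add_noise_dims(X, n_noise=5, noise_var=1.0, rng=rng)
print(X_noisy.shape)  # (200, 7): 2 signal dimensions + 5 noise dimensions
```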
Expected Results
We expect Random Forest to perform better than SVM and KNN in this setting, since it is known to be more invariant to the variance of the data. We also expect the accuracy of all classifiers to drop as the number of noise dimensions increases.
Actual Results
The actual results match our expectations: across the three noise variance values and the three datasets considered, RF outperforms SVM and KNN. At the same time, the accuracy of all classifiers drops as successive noise dimensions are added.
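A condensed sketch of the comparison loop, for one dataset and one noise variance; the classifier settings, noise-dimension counts, and dataset are placeholder assumptions, and the full notebook would sweep all three datasets and variances over repeated trials.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)
X, y = make_moons(n_samples=300, noise=0.2, random_state=42)

classifiers = {
    "RF": RandomForestClassifier(random_state=42),
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(),
}

# Accuracy of each classifier as irrelevant Gaussian features are appended.
results = {name: [] for name in classifiers}
for n_noise in [0, 5, 20]:
    noise = rng.normal(0.0, 1.0, size=(X.shape[0], n_noise))
    X_aug = np.hstack([X, noise])
    X_tr, X_te, y_tr, y_te = train_test_split(X_aug, y, random_state=42)
    for name, clf in classifiers.items():
        clf.fit(X_tr, y_tr)
        results[name].append(accuracy_score(y_te, clf.predict(X_te)))

print(results)
```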
Proposed changes in the form of a PR
I am proposing a new tutorial in the form of a Jupyter notebook containing all the code, from data generation to the computation of accuracies across noise dimensions.
The final figure will contain a plot of the original datasets, adapted from https://github.com/sahanasrihari/scikit-learn/blob/master/examples/classification/CLASSIFIER_COMPARISON_PR.ipynb, and 9 plots of "Accuracy vs. Number of Noise Dimensions" for the 3 datasets and the 3 variances of Gaussian noise. Each plot will contain test accuracies averaged across 50 trials of the experiment. The top row of the figure will show the original datasets used for the experiment.
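The 3x3 panel layout described above could be set up roughly as follows. The dataset names and variance values here are placeholders; the notebook would fill each panel with the mean accuracy curves for RF, SVM, and KNN.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripted use
import matplotlib.pyplot as plt

datasets = ["moons", "circles", "linearly separable"]  # assumed names
variances = [0.5, 1.0, 2.0]                            # assumed values

# Rows = datasets, columns = Gaussian noise variances; 9 panels total.
fig, axes = plt.subplots(len(datasets), len(variances),
                         figsize=(12, 10), sharex=True, sharey=True)
for i, ds in enumerate(datasets):
    for j, var in enumerate(variances):
        axes[i, j].set_title(f"{ds}, variance={var}")
for ax in axes[-1]:
    ax.set_xlabel("Number of noise dimensions")
for ax in axes[:, 0]:
    ax.set_ylabel("Accuracy")
fig.tight_layout()
print(len(fig.axes))  # 9 panels
```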