Processing NASA Kepler space telescope data through machine learning models capable of classifying candidate exoplanets from the raw dataset.
Over nine years in deep space, NASA's Kepler space telescope carried out a planet-hunting mission to discover hidden planets outside our solar system.
To help process this data, you will create machine learning models capable of classifying candidate exoplanets from the raw dataset.
In this project, you will need to:
- Preprocess the dataset prior to fitting the model.
- Perform feature selection and remove unnecessary features.
- Use `MinMaxScaler` to scale the numerical data.
- Separate the data into training and testing data.
- Use `GridSearchCV` to tune model parameters.
- Tune and compare at least two different classifiers.

The Random Forest model outperforms the Support Vector Machine, even before its hyperparameters are tuned with `GridSearchCV`. Of the three models considered (SVM, Random Forest, and Gradient Boosting), Gradient Boosting yielded the highest accuracy, 91%.
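
The steps listed above can be sketched roughly as follows. This is a minimal sketch, not the project's actual notebook: it uses a synthetic three-class dataset from `make_classification` as a stand-in for the Kepler table, the parameter grids are illustrative, and picking features by random-forest importance with `SelectFromModel` is only one assumed way to do the feature selection. The selector and scaler are fit on the training split only, so the test split stays unseen until evaluation.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Assumption: synthetic data standing in for the preprocessed Kepler table.
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=8,
    n_classes=3, random_state=42,
)
class_names = ["CANDIDATE", "CONFIRMED", "FALSE POSITIVE"]

# Separate the data into training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42
)

# Feature selection: keep features whose importance is at least the mean.
selector = SelectFromModel(
    RandomForestClassifier(n_estimators=100, random_state=42),
    threshold="mean",
)
X_train_sel = selector.fit_transform(X_train, y_train)
X_test_sel = selector.transform(X_test)

# Scale the numerical features to the [0, 1] range.
scaler = MinMaxScaler().fit(X_train_sel)
X_train_scaled = scaler.transform(X_train_sel)
X_test_scaled = scaler.transform(X_test_sel)

# Tune and compare two classifiers (illustrative grids only).
searches = {
    "SVM": GridSearchCV(
        SVC(), {"C": [1, 10], "gamma": ["scale", 0.1]}, cv=3
    ),
    "Random Forest": GridSearchCV(
        RandomForestClassifier(random_state=42),
        {"n_estimators": [50, 100], "max_depth": [None, 10]},
        cv=3,
    ),
}

scores = {}
for name, search in searches.items():
    search.fit(X_train_scaled, y_train)
    scores[name] = search.score(X_test_scaled, y_test)  # test accuracy
    print(name, search.best_params_)
    print(classification_report(
        y_test, search.predict(X_test_scaled), target_names=class_names
    ))
```

A third entry such as `GradientBoostingClassifier` could be added to `searches` in the same way. Each `classification_report` call prints a per-class table with the same layout as the reports below.
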
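
| | precision | recall | f1-score | support |
|---|---|---|---|---|
| CANDIDATE | 0.76 | 0.61 | 0.68 | 422 |
| CONFIRMED | 0.69 | 0.79 | 0.74 | 450 |
| FALSE POSITIVE | 0.99 | 1.00 | 0.99 | 876 |
| accuracy | | | 0.85 | 1748 |
| macro avg | 0.81 | 0.80 | 0.80 | 1748 |
| weighted avg | 0.85 | 0.85 | 0.85 | 1748 |

| | precision | recall | f1-score | support |
|---|---|---|---|---|
| CANDIDATE | 0.84 | 0.71 | 0.77 | 422 |
| CONFIRMED | 0.76 | 0.86 | 0.80 | 450 |
| FALSE POSITIVE | 0.99 | 1.00 | 0.99 | 876 |
| accuracy | | | 0.89 | 1748 |
| macro avg | 0.86 | 0.85 | 0.86 | 1748 |
| weighted avg | 0.89 | 0.89 | 0.89 | 1748 |

| | precision | recall | f1-score | support |
|---|---|---|---|---|
| CANDIDATE | 0.76 | 0.84 | 0.80 | 378 |
| CONFIRMED | 0.85 | 0.82 | 0.84 | 469 |
| FALSE POSITIVE | 1.00 | 0.97 | 0.98 | 901 |
| accuracy | | | 0.90 | 1748 |
| macro avg | 0.87 | 0.88 | 0.87 | 1748 |
| weighted avg | 0.91 | 0.90 | 0.90 | 1748 |

| | precision | recall | f1-score | support |
|---|---|---|---|---|
| CANDIDATE | 0.84 | 0.75 | 0.79 | 422 |
| CONFIRMED | 0.81 | 0.86 | 0.83 | 450 |
| FALSE POSITIVE | 0.97 | 1.00 | 0.99 | 876 |
| accuracy | | | 0.90 | 1748 |
| macro avg | 0.88 | 0.87 | 0.87 | 1748 |
| weighted avg | 0.90 | 0.90 | 0.90 | 1748 |

| | precision | recall | f1-score | support |
|---|---|---|---|---|
| CANDIDATE | 0.85 | 0.79 | 0.82 | 422 |
| CONFIRMED | 0.82 | 0.85 | 0.83 | 450 |
| FALSE POSITIVE | 0.99 | 1.00 | 0.99 | 876 |
| accuracy | | | 0.91 | 1748 |
| macro avg | 0.88 | 0.88 | 0.88 | 1748 |
| weighted avg | 0.91 | 0.91 | 0.91 | 1748 |

| | precision | recall | f1-score | support |
|---|---|---|---|---|
| CANDIDATE | 0.84 | 0.80 | 0.82 | 422 |
| CONFIRMED | 0.82 | 0.85 | 0.83 | 450 |
| FALSE POSITIVE | 0.99 | 1.00 | 1.00 | 876 |
| accuracy | | | 0.91 | 1748 |
| macro avg | 0.88 | 0.88 | 0.88 | 1748 |
| weighted avg | 0.91 | 0.91 | 0.91 | 1748 |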
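
To make the report columns easier to read, here is a small sketch of how the f1-score column relates to precision and recall, using the per-class values from the fifth report above (the first one with accuracy 0.91). The helper name `f1_from` is hypothetical. Because the inputs are already rounded to two decimals, recomputing other reports this way can differ from the printed value in the last digit.

```python
# Per-class (precision, recall, reported f1) copied from the fifth report.
reported = {
    "CANDIDATE": (0.85, 0.79, 0.82),
    "CONFIRMED": (0.82, 0.85, 0.83),
    "FALSE POSITIVE": (0.99, 1.00, 0.99),
}


def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


for label, (precision, recall, reported_f1) in reported.items():
    computed = f1_from(precision, recall)
    print(f"{label}: computed {computed:.2f}, reported {reported_f1:.2f}")

# The macro average is the unweighted mean of the per-class scores.
macro_f1 = sum(f1 for _, _, f1 in reported.values()) / len(reported)
print(f"macro avg f1: {macro_f1:.2f}")
```

The weighted average works the same way, except that each class score is weighted by its support (the number of test samples in that class).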