This project uses the Kaggle breast cancer dataset to train and evaluate an SVM model for predicting breast cancer diagnosis. The project includes data processing, model training, performance evaluation, and result visualization.
data.csv
: The breast cancer dataset downloaded from Kaggle.myData.csv
: The output file containing the predicted labels (YTestPred) and actual labels (YTest).202406140459.gif
: A gif showing a demonstration of runningbreast_cancer.m
in MATLAB and reviewingmyData.xlsx
.myData.xlsx
: An Excel file calculating the number of false and true predictions (7 false, 163 true, total 170).breast_cancer.m
: The source code for the project.report.html
: The HTML report detailing the project results.
The dataset was loaded from a CSV file. Predictors (features) were separated from the response (labels). The first two columns were excluded as they contain the ID and diagnosis information. The remaining columns were used as features.
The data was split into training and testing sets with a 70-30 split. The cvpartition
function was used to create a holdout cross-validation partition to ensure reproducibility.
An SVM model with a linear kernel was trained on the training set. The features were standardized before training.
The trained SVM model was used to predict the labels for the test set.
The accuracy of the model on the test set was calculated, and a confusion matrix was created to evaluate performance in detail.
-
Accuracy on test set: 95.88%
-
Confusion Matrix:
Actual / Predicted Positive Negative Positive 62 2 Negative 5 101
There are 7 false predictions (5 false positives and 2 false negatives) and 163 true predictions, resulting in a total of 170 test samples.
The 202406140459.gif
file shows a demonstration of running breast_cancer.m
in MATLAB and reviewing myData.xlsx
.
- Ensure all the files are in the same directory.
- Run the
breast_cancer.m
script to process the data, train the model, make predictions, and generate themyData.csv
output file. - Open
report.html
in a web browser to view the detailed report.
The breast cancer dataset used in this project is available on Kaggle and is subject to the following license: