This repository contains the code and example implementations for my Medium article on building k-Nearest Neighbors from scratch and evaluating it with k-Fold Cross-Validation, which is also built from scratch.
For the PyPI package version, please refer to this repository
Neighbors (Image Source: Freepik)
k-Nearest Neighbors, kNN for short, is a very simple but powerful technique for making predictions. The principle behind kNN is to use the “most similar historical examples to the new data.” The basic steps, with a small code sketch after the list, are:
- Choose a value for k
- Find the distance from the new point to each record in the training data
- Get the k nearest neighbors
- Make predictions:
  - For a classification problem, the new data point belongs to the class that most of its neighbors belong to.
  - For a regression problem, the prediction can be the average or weighted average of the labels of the k nearest neighbors.
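A minimal sketch of these steps for the classification case, assuming Euclidean distance and majority voting (the function names are illustrative and not taken from the article's code):

```python
import math
from collections import Counter

def euclidean_distance(a, b):
    # straight-line distance between two feature vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, new_point, k=3):
    # 1. distance from the new point to every training record
    distances = [(euclidean_distance(row, new_point), label)
                 for row, label in zip(train_X, train_y)]
    # 2. keep the k nearest neighbors
    neighbors = sorted(distances, key=lambda pair: pair[0])[:k]
    # 3. classification: majority vote over the neighbors' labels
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

# toy example: two features per record, binary labels
train_X = [[1.0, 2.0], [2.0, 3.0], [8.0, 8.0], [9.0, 7.0]]
train_y = [0, 0, 1, 1]
print(knn_predict(train_X, train_y, [8.5, 7.5], k=3))  # -> 1
```

For regression, the last line of `knn_predict` would instead average the neighbors' labels.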
Finally, we evaluate the model using the k-Fold Cross-Validation technique.
This technique involves randomly dividing the dataset into k groups, or folds, of approximately equal size. One fold is kept for testing and the model is trained on the remaining k-1 folds; this is repeated so that each fold serves as the test set once.
5-fold cross-validation. The blue block is the fold used for testing. (Image Source: sklearn documentation)
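A from-scratch sketch of this procedure, assuming the data fits in plain Python lists; `predict_fn` is a hypothetical stand-in for any per-record classifier, such as the `knn_predict` sketch above:

```python
import random

def k_fold_split(n_records, k=5, seed=42):
    # shuffle the record indices, then cut them into k folds of roughly equal size
    random.seed(seed)
    indices = list(range(n_records))
    random.shuffle(indices)
    fold_size = n_records // k
    folds = [indices[i * fold_size:(i + 1) * fold_size] for i in range(k)]
    # hand out any leftover records to the first folds
    for j, idx in enumerate(indices[k * fold_size:]):
        folds[j].append(idx)
    return folds

def cross_validate(X, y, predict_fn, k=5):
    # each fold takes one turn as the test set; the remaining folds form the training set
    scores = []
    for fold in k_fold_split(len(X), k):
        test_idx = set(fold)
        train_X = [X[i] for i in range(len(X)) if i not in test_idx]
        train_y = [y[i] for i in range(len(X)) if i not in test_idx]
        correct = sum(predict_fn(train_X, train_y, X[i]) == y[i] for i in fold)
        scores.append(correct / len(fold))
    return scores
```

The mean of the returned per-fold accuracies gives the overall cross-validation score.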
The datasets used here are taken from the UCI Machine Learning Repository
The Car Evaluation and Breast Cancer datasets contain text attributes. Since the classifier cannot run on text attributes, we need to convert the categorical input features to numeric values. This is done using the LabelEncoder from sklearn.preprocessing. LabelEncoder can be applied to a DataFrame or a list, and it encodes labels with values between 0 and n_classes-1.
Applying LabelEncoder to an entire DataFrame:

```python
from sklearn import preprocessing
import pandas as pd

# data holds the raw records with categorical (text) columns
df = pd.DataFrame(data)
# encode every column of the DataFrame to integer labels
df = df.apply(preprocessing.LabelEncoder().fit_transform)
```
Applying LabelEncoder to a list:

```python
# encode a single list of categorical values to integer labels
labels = preprocessing.LabelEncoder().fit_transform(inputList)
```
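For instance (the values below are illustrative, not taken from the repository's datasets):

```python
from sklearn import preprocessing

inputList = ["low", "med", "high", "med"]
labels = preprocessing.LabelEncoder().fit_transform(inputList)
# classes are ordered alphabetically: high -> 0, low -> 1, med -> 2
print(labels)  # [1 2 0 2]
```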
- More info on Cross-Validation can be found here
- kNN
- kFold Cross Validation