chaitanyakasaraneni/knnFromScratch

kNN From Scratch

Introduction

This repository contains the code and example implementations for my Medium article on building k-Nearest Neighbors from scratch and evaluating it with k-Fold Cross-Validation, which is also built from scratch.

For the PyPI package version, please refer to this repository.

Neighbors (Image Source: Freepik)

k-Nearest Neighbors

k-Nearest Neighbors (kNN for short) is a simple but powerful technique for making predictions. The principle behind kNN is to use the most similar historical examples to classify new data.

k-Nearest Neighbors in 4 easy steps

  • Choose a value for k
  • Compute the distance from the new point to each record in the training data
  • Select the k nearest neighbors
  • Make a prediction
    • For a classification problem, the new data point belongs to the class that most of its neighbors belong to.
    • For a regression problem, the prediction is the average (or weighted average) of the labels of the k nearest neighbors.
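The four steps above can be sketched as a minimal classifier. This is a plain-Python illustration, not the repository's implementation; the function names are made up for this example:

```python
import math
from collections import Counter

def euclidean(a, b):
    # Distance between two feature vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, new_point, k=3):
    # Step 2: distance from the new point to every training record
    distances = [(euclidean(row, new_point), label)
                 for row, label in zip(train_X, train_y)]
    # Step 3: keep the k nearest neighbors
    neighbors = sorted(distances)[:k]
    # Step 4 (classification): majority vote among the neighbors
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

X = [[1, 1], [2, 2], [8, 8], [9, 9]]
y = ["a", "a", "b", "b"]
print(knn_predict(X, y, [1.5, 1.5], k=3))  # prints "a"
```

For a regression variant, the majority vote in step 4 would be replaced by the mean of the neighbors' labels.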

Finally, we evaluate the model using the k-Fold Cross-Validation technique.

k-Fold Cross Validation

This technique involves randomly dividing the dataset into k groups, or folds, of approximately equal size. One fold is held out for testing and the model is trained on the remaining k-1 folds; this is repeated k times so that each fold serves as the test set exactly once.
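The splitting procedure can be sketched in a few lines. This is an illustrative sketch (shuffled index-based splits), not the repository's own code:

```python
import random

def kfold_splits(n, k, seed=0):
    # Shuffle the row indices, then slice them into k folds of near-equal size
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    # Yield (train_indices, test_indices); each fold is the test set once
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

for train, test in kfold_splits(10, k=5):
    print(len(train), len(test))  # prints "8 2" on each of the 5 iterations
```

The model's overall score is then the average of its scores across the k test folds.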

5-fold cross-validation. The blue block is the fold used for testing. (Image Source: sklearn documentation)

Datasets Used

The datasets used here are taken from the UCI Machine Learning Repository.

The Car Evaluation and Breast Cancer datasets contain text attributes. As we cannot run the classifier on text attributes, we need to convert the categorical input features to numbers. This is done with the LabelEncoder from sklearn.preprocessing, which can be applied to a DataFrame or a list and encodes labels with values between 0 and n_classes-1.

Applying LabelEncoder on entire dataframe

import pandas as pd
from sklearn import preprocessing

df = pd.DataFrame(data)
# Encode every column; fit_transform is applied to each column in turn
df = df.apply(preprocessing.LabelEncoder().fit_transform)

Applying LabelEncoder on a list

# Encode a single list of categorical values
labels = preprocessing.LabelEncoder().fit_transform(inputList)

