Retail-Store-Customer-Segmentation-Using-K-Means-Clustering

This project focuses on segmenting customers of a retail store using the K-Means clustering algorithm. By leveraging customer data, including purchase history and demographic information, we aim to identify distinct customer groups. These segments can then be used to tailor marketing strategies and improve customer relationship management.

Dataset Description

The dataset used in this project contains transactional records from a retail store. Key attributes include:

Invoice Number: Unique identifier for each transaction.
Stock Code: Unique identifier for each product.
Description: Description of the product.
Quantity: Number of units purchased.
Invoice Date: Date of the transaction.
Unit Price: Price per unit of the product.
Customer ID: Unique identifier for each customer.
Country: Country where the customer resides.
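As a rough illustration of the schema above, the sketch below checks that a transactions table carries the eight attributes and parses the invoice date. The column names (`InvoiceNo`, `StockCode`, `UnitPrice`, `CustomerID`, and so on), the helper `validate_transactions`, and the sample rows are assumptions made for this example; the headers in the actual dataset may be spelled differently.

```python
import pandas as pd

# ASSUMPTION: these header spellings are illustrative, not taken from the dataset file.
EXPECTED_COLUMNS = [
    "InvoiceNo", "StockCode", "Description", "Quantity",
    "InvoiceDate", "UnitPrice", "CustomerID", "Country",
]


def validate_transactions(df: pd.DataFrame) -> pd.DataFrame:
    """Check that every expected column exists and parse InvoiceDate as a datetime."""
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    out = df.copy()
    out["InvoiceDate"] = pd.to_datetime(out["InvoiceDate"])
    return out


# Made-up rows, used only so the sketch runs without the real data file.
sample = pd.DataFrame({
    "InvoiceNo": ["A001", "A001", "A002"],
    "StockCode": ["P10", "P11", "P10"],
    "Description": ["Mug", "Plate", "Mug"],
    "Quantity": [2, 1, 4],
    "InvoiceDate": ["2011-01-05 10:15", "2011-01-05 10:15", "2011-01-06 14:30"],
    "UnitPrice": [3.50, 5.25, 3.50],
    "CustomerID": [101, 101, 102],
    "Country": ["United Kingdom", "United Kingdom", "France"],
})

transactions = validate_transactions(sample)
```

With the real data, the same function could be applied to the DataFrame loaded in the notebooks, after adjusting `EXPECTED_COLUMNS` to the actual headers.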

Methodology

Data Collection: Collecting data on customers' purchase history and demographics.
Data Cleaning and Preprocessing: Handling missing values, outliers, and creating new features such as TotalCost.
Feature Engineering: Extracting relevant features for clustering.
Defining the Number of Clusters: Using the Silhouette Score to determine the optimal number of clusters.
K-Means Clustering: Applying the K-Means algorithm to segment the customers.
Evaluation and Interpretation: Analyzing the resulting clusters to identify distinct customer segments.
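The steps above can be sketched end to end as follows. This is a minimal illustration, not the notebooks' actual code: the column names, the cleaning rules (dropping rows without a customer ID or with non-positive quantity or price), the three per-customer features, the candidate range for k, and the synthetic data are all assumptions. `TotalCost` is assumed here to be `Quantity * UnitPrice`.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def build_customer_features(df: pd.DataFrame) -> pd.DataFrame:
    """Clean transactions and aggregate them to one row per customer."""
    df = df.dropna(subset=["CustomerID"])                  # rows with no customer cannot be segmented
    df = df[(df["Quantity"] > 0) & (df["UnitPrice"] > 0)]  # assumed rule: drop non-positive values
    df = df.assign(TotalCost=df["Quantity"] * df["UnitPrice"])  # assumed definition of TotalCost
    return df.groupby("CustomerID").agg(
        n_invoices=("InvoiceNo", "nunique"),
        total_quantity=("Quantity", "sum"),
        total_spend=("TotalCost", "sum"),
    )


def choose_k(X: np.ndarray, candidates=range(2, 7), seed: int = 0) -> int:
    """Return the candidate k whose K-Means labelling has the highest silhouette score."""
    scores = {}
    for k in candidates:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return max(scores, key=scores.get)


# Synthetic transactions standing in for the real dataset.
rng = np.random.default_rng(0)
n = 600
toy = pd.DataFrame({
    "InvoiceNo": rng.integers(1, 200, size=n),
    "CustomerID": rng.integers(1, 41, size=n).astype(float),
    "Quantity": rng.integers(-2, 20, size=n),
    "UnitPrice": rng.uniform(0.5, 30.0, size=n).round(2),
})
toy.loc[::50, "CustomerID"] = np.nan  # simulate missing customer IDs

features = build_customer_features(toy)
X = StandardScaler().fit_transform(features)  # K-Means is distance based, so scale first
k = choose_k(X)
features["cluster"] = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
```

Scaling before clustering keeps a large-valued feature such as total spend from dominating the Euclidean distances K-Means relies on. Per-cluster means, for example `features.groupby("cluster").mean()`, are one simple way to start interpreting the segments.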

Dependencies

Python 3.x
pandas
numpy
matplotlib
seaborn
scikit-learn
jupyter

Usage

Clone the Repository:

git clone https://github.com/Bhargavi-Joshi/Retail-Store-Customer-Segmentation-Using-K-Means-Clustering.git
cd Retail-Store-Customer-Segmentation-Using-K-Means-Clustering

Install Dependencies:
```
pip install pandas numpy matplotlib seaborn scikit-learn jupyter
```
Run the Jupyter Notebooks: Open and run the customer-segmentation.ipynb and exploratory-data-analysis.ipynb notebooks to reproduce the results.

Results

Customer Segmentation: Customers were classified into distinct clusters based on their purchase behavior and demographics.
Visualization: t-SNE visualization was used to display the clusters.
Model Accuracy: Various models were tested for customer classification, with Random Forest achieving the highest accuracy.
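For readers who want to see what the visualization and classification steps might look like, here is a hedged sketch on synthetic data. The README does not say what the classifiers were trained to predict; this example assumes the target is the K-Means cluster label, which is only one plausible reading. The data, the choice of three clusters, and the five-fold cross-validation are likewise assumptions, and no accuracy figure from the project is reproduced here.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier
from sklearn.manifold import TSNE
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the scaled per-customer feature matrix.
X, _ = make_blobs(n_samples=150, n_features=4, centers=3, random_state=0)
X = StandardScaler().fit_transform(X)

# Cluster in the original feature space.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Project to 2-D for plotting only; the clustering itself does not use the embedding.
embedding = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

fig, ax = plt.subplots(figsize=(6, 5))
ax.scatter(embedding[:, 0], embedding[:, 1], c=labels, cmap="viridis", s=15)
ax.set_xlabel("t-SNE 1")
ax.set_ylabel("t-SNE 2")
ax.set_title("K-Means clusters (t-SNE projection)")

# ASSUMPTION: the classification target is the cluster label assigned above.
rf = RandomForestClassifier(n_estimators=100, random_state=0)
accuracy = cross_val_score(rf, X, labels, cv=5, scoring="accuracy").mean()
```

In a notebook, dropping the `matplotlib.use("Agg")` line lets the figure render inline. Because t-SNE preserves local neighbourhoods rather than global distances, the gaps between clusters in the plot should not be read as a measure of how different the segments are.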

Authors

Developed by:

Bhargavi Joshi
Pranjal Gautam
