This project focuses on segmenting customers of a retail store using the K-Means clustering algorithm. By leveraging customer data, including purchase history and demographic information, we aim to identify distinct customer groups. These segments can then be used to tailor marketing strategies and improve customer relationship management.
The dataset used in this project contains transactional records from a retail store. Key attributes include:
- Invoice Number: Unique identifier for each transaction.
- Stock Code: Unique identifier for each product.
- Description: Description of the product.
- Quantity: Number of units purchased.
- Invoice Date: Date of the transaction.
- Unit Price: Price per unit of the product.
- Customer ID: Unique identifier for each customer.
- Country: Country where the customer resides.
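As a rough illustration of the attributes listed above, the sketch below builds a tiny in-memory table and checks a few basics. The column names (`InvoiceNo`, `StockCode`, `UnitPrice`, and so on) and every row value are assumptions made for this example; the actual file in the repository may use different names.

```python
import pandas as pd

# Hypothetical sample rows; column names and values are illustrative only.
rows = [
    {"InvoiceNo": "A0001", "StockCode": "P100", "Description": "Sample mug",
     "Quantity": 6, "InvoiceDate": "2021-01-05 08:26:00", "UnitPrice": 2.50,
     "CustomerID": 101, "Country": "United Kingdom"},
    {"InvoiceNo": "A0001", "StockCode": "P200", "Description": "Sample lamp",
     "Quantity": 1, "InvoiceDate": "2021-01-05 08:26:00", "UnitPrice": 14.00,
     "CustomerID": 101, "Country": "United Kingdom"},
    {"InvoiceNo": "A0002", "StockCode": "P100", "Description": "Sample mug",
     "Quantity": 2, "InvoiceDate": "2021-01-06 10:03:00", "UnitPrice": 2.50,
     "CustomerID": None, "Country": "France"},
]
df = pd.DataFrame(rows)

# Parse the transaction timestamp so it can be used for date arithmetic later.
df["InvoiceDate"] = pd.to_datetime(df["InvoiceDate"])

# Quick sanity summary: row count, distinct customers, rows lacking a customer.
summary = {
    "n_rows": len(df),
    "n_customers": int(df["CustomerID"].nunique()),
    "missing_customer_ids": int(df["CustomerID"].isna().sum()),
}
print(df.dtypes)
print(summary)
```

Rows without a Customer ID cannot be attributed to anyone, so a count like `missing_customer_ids` is a useful first check before clustering.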
- Data Collection: Collecting data on customers' purchase history and demographics.
- Data Cleaning and Preprocessing: Handling missing values and outliers, and creating new features such as TotalCost.
- Feature Engineering: Extracting relevant features for clustering.
- Defining the Number of Clusters: Using the Silhouette Score to determine the optimal number of clusters.
- K-Means Clustering: Applying the K-Means algorithm to segment the customers.
- Evaluation and Interpretation: Analyzing the resulting clusters to identify distinct customer segments.
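The steps above can be sketched end to end as follows. This is a minimal illustration on synthetic data, not the notebooks' actual code: the column names, the cleaning rules, the three per-customer features, and the range of cluster counts tried are all assumptions. Only `TotalCost`, the Silhouette Score, and K-Means come from the project description.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic transactions standing in for the real data (assumed column names).
rng = np.random.default_rng(0)
n = 600
tx = pd.DataFrame({
    "InvoiceNo": rng.integers(1000, 1200, size=n).astype(str),
    "CustomerID": rng.integers(1, 61, size=n).astype(float),
    "Quantity": rng.integers(-2, 20, size=n),
    "UnitPrice": rng.uniform(0.5, 30.0, size=n).round(2),
})
tx.loc[rng.choice(n, size=20, replace=False), "CustomerID"] = np.nan

# Cleaning (assumed rules): drop rows with no customer or a non-positive quantity/price.
clean = tx.dropna(subset=["CustomerID"])
clean = clean[(clean["Quantity"] > 0) & (clean["UnitPrice"] > 0)].copy()

# New feature named in the steps above: cost of each transaction line.
clean["TotalCost"] = clean["Quantity"] * clean["UnitPrice"]

# Feature engineering: one row per customer (this feature set is an assumption).
features = clean.groupby("CustomerID").agg(
    n_invoices=("InvoiceNo", "nunique"),
    total_spend=("TotalCost", "sum"),
    avg_line_cost=("TotalCost", "mean"),
)

# K-Means is distance based, so put the features on a common scale first.
X = StandardScaler().fit_transform(features)

# Try several cluster counts and keep the one with the highest silhouette score.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)
best_k = max(scores, key=scores.get)

# Final model and a per-cluster profile for interpretation.
model = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit(X)
features["cluster"] = model.labels_
print(scores)
print(features.groupby("cluster").mean())
```

The per-cluster means printed at the end are one simple way to interpret segments, for example by comparing average spend and order count across clusters.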
- Python 3.x
- pandas
- numpy
- matplotlib
- seaborn
- scikit-learn
- jupyter
- Clone the Repository:
  git clone https://github.com/yourusername/Retail-Store-Customer-Segmentation.git
  cd Retail-Store-Customer-Segmentation
- Install Dependencies:
pip install -r requirements.txt
- Run the Jupyter Notebooks:
  Open and run the customer-segmentation.ipynb and exploratory-data-analysis.ipynb notebooks to reproduce the results.
- Customer Segmentation: Customers were classified into distinct clusters based on their purchase behavior and demographics.
- Visualization: t-SNE visualization was used to display the clusters.
- Model Accuracy: Various models were tested for customer classification, with Random Forest achieving the highest accuracy.
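The visualization and classification results above could be reproduced along these lines. This is a hedged sketch on synthetic blobs: the description does not state which features or target the classifiers used, so the code assumes the target is the K-Means cluster label. The t-SNE and Random Forest parameters are likewise illustrative choices.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier
from sklearn.manifold import TSNE
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the scaled per-customer feature matrix.
X, _ = make_blobs(n_samples=300, centers=4, n_features=5, random_state=0)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

# 2-D t-SNE embedding, intended for plotting only (not used for modelling).
embedding = TSNE(
    n_components=2, perplexity=30, init="pca", random_state=0
).fit_transform(X)

# Assumption: the classifier learns to predict the cluster label from the features.
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.25, stratify=labels, random_state=0
)
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
accuracy = accuracy_score(y_test, clf.predict(X_test))

print(embedding.shape)
print(round(accuracy, 3))
```

The two columns of `embedding` can be passed to a matplotlib or seaborn scatter plot, colored by `labels`, to display the clusters. A classifier trained this way is mainly useful for assigning new customers to existing segments without re-running the clustering.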
Developed by:
- Bhargavi Joshi
- Pranjal Gautam