This project, titled Customer Insights and Sales Performance Dashboard, focuses on analyzing retail transactional data to uncover insights into customer behavior, segmentation, and sales patterns. The project was undertaken to develop a comprehensive understanding of customer dynamics, particularly focusing on retention rates, repeat customer behavior, and sales performance across various segments.
View the full interactive dashboard here
Retail businesses thrive on understanding their customers and optimizing sales strategies based on data-driven insights. However, raw data is often messy and requires significant cleaning and processing before it can be used for meaningful analysis. In this project, I started with a retail transactional dataset that contained several inconsistencies and missing values. The challenge was to clean this data, derive meaningful insights, and present them in a user-friendly dashboard.
The dataset used in this project was sourced from Kaggle (Retail Transactional Dataset). The dataset contains detailed information on customer transactions, including:
- Customer Information: Customer ID, Name, Email, Phone, Address, City, State, Zipcode, Country, Age, Gender, Income, Customer Segment.
- Transaction Details: Last Purchase Date, Total Purchases, Amount Spent.
- Product Information: Product Category, Product Brand, Product Type.
- Feedback: Customer feedback or rating related to the product or service received.
- Transaction Logistics: Shipping Method, Payment Method, Order Status.
The first step involved cleaning the raw dataset to remove inconsistencies, handle missing values, and prepare the data for analysis. This was achieved through a series of steps, which are documented in the Jupyter notebook Clean_The_Data.ipynb. Key steps include:
- Dropping rows with missing critical information: Rows with missing values in key columns like
Transaction_ID
,Customer_ID
,Email
,Phone
, etc., were removed to ensure data integrity. - Handling duplicates: Duplicates were identified based on
Transaction_ID
andCustomer_ID
, and only the first occurrence was retained. - Filling missing values: Missing values in columns such as
Amount
,Total_Purchases
, andTotal_Amount
were calculated and filled based on existing data. - Parsing dates: Dates were parsed and converted into a consistent format. Additional columns for
Year
,Month
, andDay_of_Week
were created to facilitate time-based analysis. - Dropping rows with missing
Zipcode
orAddress
: Rows missing bothZipcode
andAddress
were removed, and further missing values inCity
,State
, orCountry
were filled where possible.
The cleaned dataset was saved as both a CSV and an Excel file to ensure compatibility with various tools.
To make the cleaned data compatible with Power BI, it was saved as an Excel file using the notebook csv to xlsx.ipynb. This step ensured no loss of data and allowed seamless integration into Power BI for visualization.
The cleaned data was imported into Power BI to create a dynamic and interactive dashboard. The dashboard, titled Customer Insights and Sales Performance Dashboard, includes the following components:
-
Main Left Column - Customer Overview:
-
Top Row:
-
Middle Row:
-
Bottom Row:
To further enhance the dashboard, DAX (Data Analysis Expressions) was used to create custom columns and measures:
-
Combined Customer Segment Column:
CombinedCustomerSegment = SWITCH( TRUE(), 'Retail-Sales-Analysis'[Customer_Segment] = "Premium", "Returning Customer", 'Retail-Sales-Analysis'[Customer_Segment] = "Regular", "Returning Customer", 'Retail-Sales-Analysis'[Customer_Segment] = "New", "New", BLANK() // Handle any unexpected values )
-
Repeat Customer Rate Measure:
RepeatCustomerRate = DIVIDE( CALCULATE(DISTINCTCOUNT('Retail-Sales-Analysis'[Customer_ID]), 'Retail-Sales-Analysis'[Customer_Segment] = "Regular"), DISTINCTCOUNT('Retail-Sales-Analysis'[Customer_ID]) )
-
Retention Rate Measure:
RetentionRate = DIVIDE( CALCULATE( COUNTROWS('Retail-Sales-Analysis'), 'Retail-Sales-Analysis'[CombinedCustomerSegment] = "Returning Customer" ), COUNTROWS('Retail-Sales-Analysis'), 0 // Optional: Default value to return if the denominator is 0 )
The final output is a Power BI dashboard that provides comprehensive insights into customer behavior, segmentation, and sales performance. The dashboard is visually appealing and user-friendly, allowing stakeholders to make informed decisions based on data.
- Python (Pandas, NumPy): For data cleaning, manipulation, and processing.
- Jupyter Notebook: For documenting and running the data cleaning process.
- Power BI: For data visualization and creating the interactive dashboard.
- DAX (Data Analysis Expressions): For creating custom columns and measures in Power BI.
- Icons8: For sourcing icons used in the dashboard.
This project demonstrates the entire workflow from raw data to actionable insights, showcasing my ability to clean, process, and analyze large datasets using Python and Power BI. The final dashboard provides valuable insights into customer behavior and sales performance, enabling data-driven decision-making.
For further details, you can explore the provided notebooks and the Power BI dashboard.
This project is licensed under the MIT License.