Predicting Customer Satisfaction in Supermarket Sales Using Random Forest Regression
This project applied supervised machine learning techniques to analyze transactional data from a supermarket sales dataset and attempted to predict customer satisfaction ratings. It combined exploratory data analysis, feature engineering, and modeling using Random Forest regression to evaluate the feasibility of forecasting customer sentiment based on sales and operational data.
Dataset Source:
Kaggle: Supermarket Sales Dataset
1,000 records | 17 features (both categorical and numerical)
Objectives:
Understand key sales and revenue patterns by branch, product line, and payment method
Predict customer satisfaction ratings based on measurable transactional variables
Assess which features, if any, strongly influence ratings
Tools and Technologies:
R Programming (randomForest, caret, tidyverse)
Modeling Technique: Random Forest Regression
Evaluation Metrics: RMSE, MAE, R²
Key Analytical Steps:
Data Preparation:
Selected predictive features: Unit Price, Quantity, Total, COGS, Gross Income
Target variable: Customer Rating (1 to 10 scale)
80/20 train-test split for modeling
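As a rough illustration of the 80/20 split described above: the project itself was done in R, so the following is only a standard-library Python sketch, not the project's code. The function name, the seed, and the placeholder row indices are assumptions made for the example.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = list(rows)
    # A fixed seed (assumed here) makes the split reproducible.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder row indices standing in for the 1,000 records.
rows = list(range(1000))
train, test = train_test_split(rows)
print(len(train), len(test))  # 800 200
```

Shuffling before splitting keeps the test set from reflecting any ordering in the file, such as rows sorted by date or branch.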
Modeling Approach:
Trained a Random Forest Regression model (100 trees)
Evaluated using standard metrics:
RMSE: 1.867
MAE: 1.579
R²: 0.00095 (the model explains less than 0.1% of the variance in ratings)
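To make the three metrics concrete, here is a minimal sketch of how they can be computed. It is written in standard-library Python for illustration only (the project used R), and the ratings in it are made up, not taken from the dataset. Two R² definitions are shown because tools differ: some report 1 - SSE/SST, while others report the squared correlation between predictions and observations (caret's `postResample` is one example of the latter). The source does not say which definition produced 0.00095.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r2_traditional(actual, predicted):
    """R² as 1 - SSE/SST; 0 means no better than predicting the mean."""
    mean_a = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - mean_a) ** 2 for a in actual)
    return 1 - sse / sst


def r2_correlation(actual, predicted):
    """R² as squared Pearson correlation; NaN if either input is constant."""
    mean_a = sum(actual) / len(actual)
    mean_p = sum(predicted) / len(predicted)
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_a == 0 or var_p == 0:
        return float("nan")
    return cov ** 2 / (var_a * var_p)


# Made-up ratings, for illustration only.
actual = [7.0, 5.0, 9.0, 6.0, 8.0]
# Baseline "model": predict the mean rating for every row.
baseline = [sum(actual) / len(actual)] * len(actual)

print(rmse(actual, baseline), mae(actual, baseline), r2_traditional(actual, baseline))
```

Comparing a model's RMSE and MAE against this mean-only baseline is a quick way to judge whether the predictors add anything: under the 1 - SSE/SST definition the baseline scores exactly 0.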
Visualization & Insights:
Explored payment method distributions (Cash, E-wallet, Credit Card)
Analyzed sales performance by product line using boxplots
Tracked total sales trends by date and time of day
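For readers unfamiliar with boxplots, the sketch below computes the five numbers a boxplot typically draws for each group. It is a standard-library Python illustration rather than the project's R code, and the product-line labels and sales totals are hypothetical placeholders, not values from the dataset.

```python
from collections import defaultdict
from statistics import quantiles


def boxplot_stats(sales):
    """Return min, quartiles, and max of sale totals per product line."""
    by_line = defaultdict(list)
    for line, total in sales:
        by_line[line].append(total)

    summary = {}
    for line, totals in by_line.items():
        # "inclusive" treats the data as the full population of interest.
        q1, q2, q3 = quantiles(totals, n=4, method="inclusive")
        summary[line] = {
            "min": min(totals),
            "q1": q1,
            "median": q2,
            "q3": q3,
            "max": max(totals),
        }
    return summary


# Hypothetical (product_line, total) pairs.
sales = [
    ("Line A", 10.0), ("Line A", 20.0), ("Line A", 30.0),
    ("Line A", 40.0), ("Line A", 50.0),
    ("Line B", 5.0), ("Line B", 15.0), ("Line B", 25.0),
]
summary = boxplot_stats(sales)
print(summary["Line A"])
```

Plotting libraries may use a slightly different quartile convention, so the numbers here can differ marginally from what a given boxplot shows.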
Key Takeaways:
In this dataset, the five selected sales metrics alone were insufficient to predict customer ratings
Ratings are likely influenced by non-quantitative factors such as customer experience, staff interaction, or brand trust
The model struggled to generalize, suggesting a mismatch between available predictors and the complexity of human satisfaction
Recommendations:
Enrich the dataset with qualitative data such as customer reviews or service feedback
Leverage Natural Language Processing (NLP) to analyze textual customer sentiment
Explore alternative machine learning approaches, such as:
Sentiment classification
Customer segmentation
Hybrid models combining quantitative and qualitative inputs
This project demonstrates the ability to:
Apply end-to-end predictive analytics workflows
Use Random Forest modeling effectively and interpret its limitations
Translate modeling performance into strategic business recommendations
Communicate technical findings in an accessible, data storytelling format