Predicting Customer Satisfaction in Supermarket Sales Using Random Forest Regression

Project type

Random Forest Regression

Date

2024

Role

Data Scientist

This project applied supervised machine learning to transactional data from a supermarket sales dataset in an attempt to predict customer satisfaction ratings. It combined exploratory data analysis, feature engineering, and Random Forest regression to test whether customer sentiment can be forecast from sales and operational data.

Dataset Source:

Kaggle: Supermarket Sales Dataset

1,000 records | 17 features (both categorical and numerical)

Objectives:
Understand key sales and revenue patterns by branch, product line, and payment method

Predict customer satisfaction ratings based on measurable transactional variables

Assess which features, if any, strongly influence ratings

Tools and Technologies:
R Programming (randomForest, caret, tidyverse)

Modeling Technique: Random Forest Regression

Evaluation Metrics: RMSE, MAE, R²

Key Analytical Steps:
Data Preparation:

Selected predictive features: Unit Price, Quantity, Total, COGS, Gross Income

Target variable: Customer Rating (1 to 10 scale)

80/20 train-test split for modeling
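The preparation steps above can be sketched as follows. This is an illustration only: the project itself was done in R, while the sketch uses plain Python, and the column names, the seed, the helper functions, and the randomly generated rows are all assumptions rather than the project's actual code or data.

```python
import random

# Hypothetical column names mirroring the features and target listed above.
FEATURES = ["unit_price", "quantity", "total", "cogs", "gross_income"]
TARGET = "rating"


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def to_xy(rows):
    """Separate the predictor columns from the target column."""
    X = [[row[name] for name in FEATURES] for row in rows]
    y = [row[TARGET] for row in rows]
    return X, y


# Placeholder rows with random values; these are NOT the Kaggle records.
rng = random.Random(0)
rows = [
    {
        "unit_price": rng.uniform(10, 100),
        "quantity": rng.randint(1, 10),
        "total": rng.uniform(10, 1000),
        "cogs": rng.uniform(10, 1000),
        "gross_income": rng.uniform(1, 50),
        "rating": rng.uniform(1, 10),
    }
    for _ in range(1000)
]

train_rows, test_rows = train_test_split(rows)
X_train, y_train = to_xy(train_rows)
X_test, y_test = to_xy(test_rows)
print(len(train_rows), len(test_rows))
```

Fixing the seed keeps the split reproducible, so metrics from repeated runs can be compared directly.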

Modeling Approach:

Trained a Random Forest Regression model (100 trees)

Evaluated using standard metrics:

RMSE: 1.867

MAE: 1.579

R²: 0.00095 (less than 0.1% of the variance in ratings explained, indicating extremely low explanatory power)
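For readers less familiar with these metrics, here is a minimal Python sketch of how each is commonly defined. It is not the project's R code, the sample values are made up, and R packages may define R² differently (for example as a squared correlation), so the reported figure may not follow this exact formula.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: penalizes large misses more heavily."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(actual))


def mae(actual, predicted):
    """Mean absolute error: the average size of a miss."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Made-up ratings, used only to illustrate the baseline comparison.
actual = [4.0, 6.0, 8.0, 10.0]
mean_rating = sum(actual) / len(actual)
baseline = [mean_rating] * len(actual)  # always predict the mean rating

print(rmse(actual, baseline), mae(actual, baseline), r_squared(actual, baseline))
```

Under this definition, a model that always predicts the mean rating scores an R² of exactly 0, which is why a value near zero means the predictors add almost nothing beyond that baseline.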

Visualization & Insights:

Explored payment method distributions (Cash, E-wallet, Credit Card)

Analyzed sales performance by product line using boxplots

Tracked total sales trends over time
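The summaries behind charts like these can be sketched in a few lines. The Python below is illustrative only (the project used R): the transaction rows, the field names, and the "Line A" and "Line B" product lines are hypothetical placeholders, not values from the dataset.

```python
from collections import Counter, defaultdict
from statistics import median

# Hypothetical sample transactions; not taken from the Kaggle dataset.
transactions = [
    {"payment": "Cash", "product_line": "Line A", "total": 120.0},
    {"payment": "E-wallet", "product_line": "Line A", "total": 80.0},
    {"payment": "Credit Card", "product_line": "Line B", "total": 200.0},
    {"payment": "Cash", "product_line": "Line B", "total": 50.0},
    {"payment": "Cash", "product_line": "Line A", "total": 100.0},
]

# Payment method distribution: how many transactions used each method.
payment_counts = Counter(t["payment"] for t in transactions)

# Group sale totals by product line, as a boxplot would.
totals_by_line = defaultdict(list)
for t in transactions:
    totals_by_line[t["product_line"]].append(t["total"])

# The median is the center line a boxplot draws for each group.
median_by_line = {line: median(values) for line, values in totals_by_line.items()}

print(payment_counts)
print(median_by_line)
```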

Key Takeaways:
Sales metrics alone are insufficient to predict customer ratings

Ratings are likely influenced by non-quantitative factors such as customer experience, staff interaction, or brand trust

The model explained almost none of the variance in ratings (R² of about 0.001), suggesting a mismatch between the available predictors and the complexity of human satisfaction

Recommendations:
Enrich the dataset with qualitative data such as customer reviews or service feedback

Leverage Natural Language Processing (NLP) to analyze textual customer sentiment

Explore alternative machine learning approaches, such as:

Sentiment classification

Customer segmentation

Hybrid models combining quantitative and qualitative inputs

This project demonstrates the ability to:

Apply end-to-end predictive analytics workflows

Use Random Forest modeling effectively and interpret its limitations

Translate modeling performance into strategic business recommendations

Communicate technical findings in an accessible, data storytelling format
