Sending Out The Most Appealing Starbucks Offer!
Problem Introduction
This is the Starbucks Capstone Project for the Udacity Data Scientist Nanodegree. The data set comes from the Nanodegree program and simulates offers sent to customers and how customers responded to them. The task is to figure out how to predict and maximize revenue and increase customer engagement with the offers we send out, which is why we build a recommendation engine.
Business Understanding
Starbucks is always analyzing its customers and looking to earn the most revenue from them, sending out offers that stir your cravings, whether it is buy one get one free or a major discount. What are some ways we can optimize these marketing strategies to maximize revenue?
Strategy to Solve the Problem
Leveraging Starbucks’ first-party data to target and retarget its existing customers is a smart way to retain loyal customers and keep them interested in new offerings and sales.
The methodology is to process and clean the raw data sets into usable data: each categorical attribute is spanned out into multiple binary (0/1) columns so we can keep track of which touchpoint each user interacted with.
After the data is processed, we duplicate the data frames so we have an extra copy on hand and begin exploratory analysis to see what the audience looks like. We then move on to matrix factorization, implementing Singular Value Decomposition, and build a prediction model that estimates whether a user will place an order with the offer he or she viewed. The qualifier and constraint is that the user must follow the specific sequence of receiving, viewing, and then completing an order to count as a “full customer journey”, the correct path for tracking whether an offer was successful. Once we have these data, we create a recommendation engine: if a new user comes on board, which offer should he or she receive? It would likely be the offer with the most engagement overall, since we have no previous data on a “new customer”. If it is an existing customer, we can recommend based on historic purchases and the past behavior of other customers who interacted with a certain offer.
Data Set Info
portfolio.json
- id (string) — offer id
- offer_type (string) — type of offer, i.e. BOGO, discount, informational
- difficulty (int) — minimum required spend to complete an offer
- reward (int) — reward given for completing an offer
- duration (int) — time for offer to be open, in days
- channels (list of strings)
profile.json
- age (int) — age of the customer
- became_member_on (int) — date when customer created an app account
- gender (str) — gender of the customer (note some entries contain ‘O’ for other rather than M or F)
- id (str) — customer id
- income (float) — customer’s income
transcript.json
- event (str) — record description (i.e. transaction, offer received, offer viewed, etc.)
- person (str) — customer id
- time (int) — time in hours since start of test. The data begins at time t=0
- value — (dict of strings) — either an offer id or transaction amount depending on the record
Each file contains a different type of data; they are joined through the offer id and the customer id (called person in transcript.json).
Data Cleaning and Exploration (EDA)
For the portfolio, I applied one-hot encoding to the categorical columns so that each value is spanned out into its own column. This lets us know which channel and which offer type got interactions, for whichever customer and whichever offer ID, as sketched below.
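A rough sketch of that step in pandas (assuming the channel values in the data are web, email, mobile, and social, and adjusting the file path as needed):

```python
import pandas as pd

# Load the offer metadata.
portfolio = pd.read_json('portfolio.json', orient='records', lines=True)

# Expand the channels list into one binary column per channel.
for channel in ['web', 'email', 'mobile', 'social']:
    portfolio[channel] = portfolio['channels'].apply(lambda chs: int(channel in chs))

# One-hot encode the offer type (bogo, discount, informational).
offer_dummies = pd.get_dummies(portfolio['offer_type'], prefix='offer')
portfolio = pd.concat([portfolio.drop(columns=['channels']), offer_dummies], axis=1)
```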
For the profile, I changed anyone with age 118 to NaN, because the missing values hide behind the value 118. Then I dropped all missing values and calculated the number of days each person has been a Starbucks member. This gives a clean profile dataset for analyzing age, gender, income, and how many days each person has been a member of this community.
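A minimal sketch of that cleaning (column names follow the data dictionary; measuring membership length against the newest member in the data is my own choice of reference date):

```python
import numpy as np
import pandas as pd

profile = pd.read_json('profile.json', orient='records', lines=True)

# Age 118 is the placeholder for missing demographics, so mark it as NaN
# and drop rows with missing age, gender, or income.
profile['age'] = profile['age'].replace(118, np.nan)
profile = profile.dropna(subset=['age', 'gender', 'income'])

# Convert the integer join date (e.g. 20170823) to a datetime and compute
# membership length in days.
profile['became_member_on'] = pd.to_datetime(profile['became_member_on'].astype(str),
                                             format='%Y%m%d')
profile['member_days'] = (profile['became_member_on'].max()
                          - profile['became_member_on']).dt.days
```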
Exploration
I plotted histograms of the customers' age, membership duration, and income. We see that high-income individuals aren't necessarily the major audience of Starbucks. Audience-wise, Starbucks skews toward middle and lower incomes, many of whom are very likely students working part-time jobs.
Most members are in the 40–60 age range and happen to drink a lot of Starbucks; they are likely in the workforce, working in an office I would presume. Most Starbucks customers are actually newer members, so it seems Starbucks is losing the loyal customers who have been members for more than 2000 days. There is a sudden drop between 1250 and 2000 days; perhaps membership tends to decline every several years.
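The plots themselves were simple histograms, along these lines (using the member_days column from the cleaning sketch above):

```python
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 3, figsize=(15, 4))
profile['age'].hist(ax=axes[0], bins=20)
axes[0].set_title('Age')
profile['income'].hist(ax=axes[1], bins=20)
axes[1].set_title('Income')
profile['member_days'].hist(ax=axes[2], bins=20)
axes[2].set_title('Membership length (days)')
plt.tight_layout()
plt.show()
```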
Data Modelling
The way to look at this is that a customer can only be influenced by an offer if he or she has viewed AND activated it. Anyone who activated an offer without “viewing” it will not be counted towards the correlation. So we really have to look for the pattern “offer received, offer viewed, offer completed”.
We can iterate through the data to check for this matching pattern: if it matches, we encode 1; if not, we encode 0. After creating the matrix, as sketched below, we can start using our modeling technique.
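A simplified sketch of how that matrix could be built (the helper name is mine; it assumes the transcript has been flattened so each row has person, event, time, and an offer_id pulled out of the value dictionary):

```python
import pandas as pd

def build_response_matrix(transcript):
    """Encode 1 for user-offer pairs with the full received -> viewed -> completed
    journey, 0 for pairs that break the sequence; unseen pairs stay NaN."""
    rows = []
    for (person, offer_id), events in transcript.groupby(['person', 'offer_id']):
        received = events.loc[events['event'] == 'offer received', 'time']
        viewed = events.loc[events['event'] == 'offer viewed', 'time']
        completed = events.loc[events['event'] == 'offer completed', 'time']
        # Success only if the offer was received, then viewed, then completed.
        success = int(len(received) > 0 and len(viewed) > 0 and len(completed) > 0
                      and received.min() <= viewed.min() <= completed.max())
        rows.append({'person': person, 'offer_id': offer_id, 'response': success})
    # Pivot into a user x offer matrix; unobserved pairs remain NaN.
    return pd.DataFrame(rows).pivot(index='person', columns='offer_id', values='response')
```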
FunkSVD
Using FunkSVD, we split the matrix into a user matrix, a latent feature matrix, and an offer matrix. The reason we have to use FunkSVD is that ordinary SVD cannot handle missing values in the matrix.
Like any other machine learning algorithm, we have to split the data into training and test sets. Since this is almost a time series, we use the “earlier time” to train the model and the later half to test whether it runs smoothly and accurately. In this case, I used the first 60% to train and the last 40% to test, hoping it will not overfit.
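A minimal version of the factorization and the time-based split might look like this (parameter names are my own; it reuses the build_response_matrix sketch above):

```python
import numpy as np

def funk_svd(response_matrix, latent_features=10, learning_rate=0.005, iters=100):
    """Factor the user-offer matrix into user and offer latent-feature matrices,
    skipping missing entries (the reason plain SVD cannot be used here)."""
    ratings = response_matrix.to_numpy(dtype=float)
    n_users, n_offers = ratings.shape
    user_mat = np.random.rand(n_users, latent_features)
    offer_mat = np.random.rand(latent_features, n_offers)

    for _ in range(iters):
        for i in range(n_users):
            for j in range(n_offers):
                if np.isnan(ratings[i, j]):
                    continue
                # Error = actual value minus the dot product of the latent vectors.
                error = ratings[i, j] - user_mat[i, :] @ offer_mat[:, j]
                # Gradient-descent updates on both factor matrices.
                user_mat[i, :] += learning_rate * 2 * error * offer_mat[:, j]
                offer_mat[:, j] += learning_rate * 2 * error * user_mat[i, :]
    return user_mat, offer_mat

# Time-based split: the earliest 60% of events train the model, the rest test it.
transcript = transcript.sort_values('time')
cutoff = int(len(transcript) * 0.6)
train_matrix = build_response_matrix(transcript.iloc[:cutoff])
test_matrix = build_response_matrix(transcript.iloc[cutoff:])
user_mat, offer_mat = funk_svd(train_matrix, latent_features=10, iters=100)
```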
Metrics
Here we used mean squared error and ran 100 iterations for each number of latent features (30, 20, 10). For each user-offer pair, the method computes the error as the actual value minus the dot product of the user and offer latent features. As the final output, we sum up the squared errors, which gives us a metric for judging whether the model is accurate or whether it underfit/overfit. The reason I think this is the best way to evaluate the model is that the training dataset is used to shape the model; if the model does not predict the test data accurately, the error rate will be higher, which indicates poor performance. The sum of squared errors is the final metric we use to judge whether the model is performing well, and the lower, the better. The cost of each error is huge because time is very valuable, and constructing, running, and validating the model all take time. If the model is not accurate, the predictions will be incorrect, which makes forecasting difficult and probably leads to sending out an ineffective offer that ultimately does not generate sales.
Hyperparameter Tuning and Evaluation
To evaluate how accurate the model is, I used mean squared error and kept track of it across FunkSVD iterations. The algorithm iterates over the data for each user-offer pair, computes the error as the actual value minus the dot product of the user and offer latent features, and then sums up the squared errors. For parameter tuning, we tested 30, 20, and 10 latent features to see which suits the model best. It turns out the model with 30 latent features performed the worst and the one with 10 latent features performed the best.
As we can see, the matrix with 30 latent features had a mean squared error of 0.31, versus 0.259 for 10 latent features. Hence we will use the 10-latent-feature matrix, since it is more accurate with fewer errors. Again, these were run for 100 iterations; a different iteration count or learning rate could affect the results a little.
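The sweep itself is just a loop over the candidate latent-feature counts, reusing the sketches above (the 0.31 and 0.259 figures come from the notebook, not from this snippet):

```python
import numpy as np

actual = train_matrix.to_numpy(dtype=float)
mask = ~np.isnan(actual)

for k in (30, 20, 10):
    user_mat, offer_mat = funk_svd(train_matrix, latent_features=k, iters=100)
    preds = user_mat @ offer_mat
    mse = np.mean((actual[mask] - preds[mask]) ** 2)
    print(f'{k} latent features -> MSE {mse:.3f}')
```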
Recommendation Engine and Results
When it comes to suggesting the best offer, we already have data on how well each offer engaged each user. In other words, we have computed a score for every offer for every customer, and the offer with the highest score is the one recommended to that customer.
First, we can search for a specific user and run our recommendation engine to identify which offer this specific user reacted to the most.
Second, if we cannot find the user, the customer ID does not exist, we don't have enough data to provide a recommendation, or it is a new user, we recommend the top-performing offer by default. This should be the safest bet, as it averages out over the overall population; a sketch of this fallback logic follows.
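A sketch of that logic, assuming the response matrix and the FunkSVD factors from the earlier steps (function name is mine):

```python
import numpy as np

def recommend_offer(user_id, response_matrix, user_mat, offer_mat):
    """Recommend the highest-scoring offer for a known user; fall back to the
    top-performing offer overall for new or unknown users."""
    if user_id in response_matrix.index:
        # Predicted engagement scores for this user across all offers.
        idx = response_matrix.index.get_loc(user_id)
        scores = user_mat[idx, :] @ offer_mat
        return response_matrix.columns[int(np.argmax(scores))]
    # No history for this user: default to the offer with the highest
    # average completion rate across the whole population.
    return response_matrix.mean(axis=0).idxmax()
```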
Split by gender, which group reacted the most to BOGO versus discount offers?
Lastly, we can also look at which channel performed best for each gender.
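Those gender and channel breakdowns can be read off with simple group-bys, for example (a sketch reusing the cleaned portfolio and profile frames; offer_bogo and offer_discount are the dummy columns created earlier):

```python
# Long-form view of the user-offer matrix: one row per observed pair.
response_long = train_matrix.stack().rename('response').reset_index()

# Join responses with customer gender and offer attributes.
gender_df = profile.rename(columns={'id': 'person'})[['person', 'gender']]
offer_df = portfolio.rename(columns={'id': 'offer_id'})[
    ['offer_id', 'offer_bogo', 'offer_discount', 'web', 'email', 'mobile', 'social']]
merged = response_long.merge(gender_df, on='person').merge(offer_df, on='offer_id')

# Completion rate by gender for BOGO versus discount offers.
print(merged[merged['offer_bogo'] == 1].groupby('gender')['response'].mean())
print(merged[merged['offer_discount'] == 1].groupby('gender')['response'].mean())

# Completion rate by gender for each delivery channel.
for channel in ['web', 'email', 'mobile', 'social']:
    rates = merged[merged[channel] == 1].groupby('gender')['response'].mean()
    print(channel, rates.to_dict())
```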
Improvements
I think there are other machine learning techniques I could experiment with, and they may provide different results. Logistic regression, random forests, and other classification or clustering methods may be able to identify different types of audiences and produce more accurate predictions of how an offer will perform.
I also believe we can run more iterations to improve results. I used 100 iterations to keep things a little faster, but further work could be done to see how to optimize the algorithm.
I am also thinking that if we had more data on customers, such as richer demographics and other behaviours, we would be able to identify customers who are more likely to click on informational offers (which is missing here), as well as which income levels interact with certain offers.
Conclusions
From these findings, we see that women have higher engagement than men, probably due to the amount of time they spend on social and mobile devices compared to men, as seen in the last chart.
Social is also the more effective channel for engagement, compared with sending offers on the web or by email. Email direct marketing used to be the main source of traffic, but not anymore; social is now the leading 1:1 advertising channel and it even outperforms normal mobile ads. I think it will continue to grow.
There can be a lot of factors affecting engagement, of course, from timing to ad creative to time of year. Covid, for example, disrupted most advertising and in-person events, and it certainly had an effect on people's coffee behaviour.
However, going purely by the data, it seems to me that men are more into a direct discount rather than getting an extra item for free, more minimalistic and probably lone wolves. Women engage with both types of offers, buy one get one free and discounts; they are more likely to be prompted by offers and go on a shopping spree.
For more information, you can find the project on GitHub: https://github.com/adriantse27/AdrianStarbucksV2