Retail notes

Description

Organize notes based on PPT and make them as concise as possible


BU.450.740
Retail Analytics
Week 5
Machine Learning in Retailing:
Tree-Based Methods in R
Prof. Mitsukuni Nishida
Johns Hopkins Carey Business School
Today: Tree-Based Methods
• Introduction to Artificial Intelligence
• Tree-based methods in ML
– Classification Trees
– Random Forests
– Gradient Boosting
MACHINE LEARNING:
OVERVIEW
What is “AI”?
• Intelligence = Ability to classify
• What is the most basic form of intelligence?
• Yes/No classifications are everywhere
– Is this fish edible?
– Should I choose Carey?
• If we let our brain (i.e., a human) do the classifying = Human Intelligence
• If we let a machine (i.e., a program) do the classifying = Artificial Intelligence (AI)
AI and Machine Learning
e.g., Yes/No judgement
What is Machine Learning? Automation
When computers classify objects, two options are available
• Option 1: Provide the computer a set of rules that a human manually specifies (Note: by this definition, it is still "AI"!)
– E.g., we specify an equation Y = 4 - 2X, where Y is fish quality and X is the number of days since the catch
– If Y >= 0, we consider the fish edible, and not edible otherwise
• Option 2: Let the computer automatically find the set of rules by itself = Machine learning
– E.g., we specify a model Y = a + b X; if Y >= 0, the model predicts edible, and not edible otherwise
– We prepare historical data on Y and X
– We let the computer find the parameters a and b that deliver the highest performance in predicting the outcome (see the sketch after this list)
• Various approaches exist to find the "set of rules"
– E.g., regressions, classification trees, random forests, etc.
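A minimal R sketch of Option 2 (added for illustration; not part of the original slides), using hypothetical data on fish quality and days since catch:

# Hypothetical historical data roughly following Y = 4 - 2X plus noise
days    = c(0, 1, 2, 3, 4, 5)
quality = c(4.1, 2.2, 0.3, -1.8, -3.9, -6.1)

fit = lm(quality ~ days)   # the computer "finds" parameters a and b
coef(fit)                  # close to a = 4, b = -2

# The learned rule: predict "edible" whenever the fitted quality is >= 0
predict(fit, data.frame(days = 1.5)) >= 0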
Machine Learning: Overview
Question: What is the predicted rent of the property?
Input: Inner harbor, high-rise apartment, 2BR, 700 sq ft,..
Output: Rent $2,000 per month
From data, we predict outcome
Xs (regressors) Y (regressand)
Clustering for Segmentation
Clustering allows us to identify groups
• Groups of similar cars in horsepower, price, & units sold
• Groups of similar stores in sales, # of customers, sq ft
• Groups of similar customers in age, gender, income
Only Xs, no Y
From input data, we
(1) Find regularities
(2) Identify groups based on regularities
Remark: There is no right/wrong! (i.e., no "Y")
Econometrics vs. Machine Learning
Econometrics
– Discipline to (1) analyze patterns in past data and (2) explain the factors that generate those patterns
– Interest: causal relationships (e.g., Which advertising lifted sales this year?)
Machine Learning
– Discipline to (1) construct and train a model and (2) use the model to predict future data
– Interest: prediction (e.g., Which model predicts the sales (i.e., outcome) better?)
Common statistical methods (e.g., linear regressions, logit, ...) are shared: econometrics calls them "multivariate regression," while ML calls them "supervised learning (regression)"
HOW TO EVALUATE
PERFORMANCE OF
PREDICTION?
CONFUSION MATRIX
Evaluate ML Model: Accuracy (ACC)
• Consider a Covid-19 PCR test
• The doctor says the test is "99.255% accurate"
• What would this statement mean?
• What accuracy tells us
– Among 100K people, the model predicted the actual status
(either positive or negative) correctly for 99,255 people and not
correctly for 745 people
– Accuracy (“ACC”) = 99,255/100,000 = 0.99255
• Now suppose my result was positive today. How likely would I be
Covid-19 positive?
• ACC does not answer this question
• Why?
Introduce Confusion Matrix to Evaluate
Quality of Prediction
Confusion matrix (counts out of 100,000 people):
                     Predicted positive   Predicted negative
Actually positive    TP = 750             FN = 250
Actually negative    FP = 495             TN = 98,505
• ACC = (TP + TN) / (TP + TN + FP + FN)
– Accuracy = (750 + 98,505) / 100,000 = 99,255/100,000 = 99.255%
• Question: Accuracy of a fake test (i.e., one that always predicts negative)?
– Accuracy = (0 + 99,000) / 100,000 = 99.00%
• What accuracy misses: the False Positives (495) and False Negatives (250)
"Precision"
(Same confusion matrix as above; FP = 495 is the Type I error cell)
• More info about FP (495): among the 1,245 (= 750 + 495) patients diagnosed (i.e., predicted) as "positive," 750 are truly positive (TP)
• Precision captures "Out of those predicted positive, how many are indeed positive?"
• Formula: Precision = TP / (TP + FP). Precision = 1 is best, 0 is worst
– Precision = 750 / (750 + 495) = 0.60241 = 60.241%
– Type I error rate = 1 – Precision = 39.759%
• Marketing application: Precision is a key criterion when we do not want to recommend items that customers actually do not like (FP)
– Amazon's "You may also like …" may offend some people
"Recall"
(Same confusion matrix as above; FN = 250 is the Type II error cell)
• Captures "Out of the true positives, how many are captured by a positive prediction?"
• Formula: Recall = TP / (TP + FN). Recall = 1 is best, 0 is worst
– Recall ("Sensitivity") = 750 / (750 + 250) = 750 / 1,000 = 0.75 = 75%
• Interpretation: "The PCR test detects 75 out of 100 actual Covid patients"
• Note the fake test's recall: 0 / (0 + 1,000) = 0%
– Type II error rate = 1 – Recall = 25%
• Marketing application: Recall is a key criterion when we do not want high False Negatives (FN)
– Example: overbooked seats on flights, where positive = the customer shows up
(A short R sketch computing these metrics follows below.)
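For illustration only (not in the original slides), a minimal R sketch that reproduces the accuracy, precision, and recall figures from the PCR-test confusion matrix above:

# Confusion-matrix counts from the PCR-test example
TP = 750; FN = 250; FP = 495; TN = 98505

Accuracy  = (TP + TN) / (TP + TN + FP + FN)   # 0.99255
Precision = TP / (TP + FP)                    # ~0.602
Recall    = TP / (TP + FN)                    # 0.75
print(c(Accuracy = Accuracy, Precision = Precision, Recall = Recall))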
TREE-BASED METHODS:
CLASSIFICATION TREES (i.e.,
DECISION TREES)
20 Questions Game
• One person has in mind some object & another person tries to
guess it with no more than 20 questions (fewer questions is better)
• Challenge: come up with (1) "good" questions that predict the outcome
and (2) a "good" sequence of questions (example: Akinator)
– Note: 2^20 ≈ 1 million > an adult's vocabulary of 20-40K words
• We introduce decision tree-based methods
Consider POS (Point-of-Sale) Data
• Convenience-store chains in Japan, such as 7-Eleven, introduced POS in 1982
• For every transaction, a staff member enters the customer's demographics: Age and Gender
Which Question to Ask First?
• Imagine we have following POS data from a retailer
Customer id   Purchase (outcome)   Gender       Age
1             Yes (1)              Female (0)   < 20 (0)
2             Yes (1)              Female (0)   < 20 (0)
3             No (0)               Male (1)     >= 20 (1)
4             No (0)               Female (0)   >= 20 (1)
5             No (0)               Male (1)     < 20 (0)
6             Yes (1)              Male (1)     < 20 (0)
7             No (0)               Female (0)   >= 20 (1)
8             No (0)               Male (1)     >= 20 (1)
9             Yes (1)              Male (1)     < 20 (0)
10            Yes (1)              Male (1)     >= 20 (1)
• Imagine we want to predict the purchase probability of a customer
with unknown features (i.e., no info about Gender and Age)
• If you are allowed to ask only one question, which feature do you
ask: Gender or Age to better predict the outcome?
Introduce CART (Classification And
Regression Trees, Breiman et al. 1984)
(Same POS data as above.)
• If we choose Gender,
– Pr (Purchase = 1 | Gender = 0 (i.e., Female)) = 2/4 = 0.5
– Pr (Purchase = 1 | Gender = 1 (i.e., Male)) = 3/6 = 0.5
• Not really helpful to know the customer’s gender to predict outcome!
Choosing Age Provides Sharper Prediction
(Same POS data as above.)
• If we choose Age
– Pr (Purchase = 1 | Age >= 20) = 1/5 = 0.2
– Pr (Purchase = 1 | Age < 20) = 4/5 = 0.8
• Age gives us a much sharper prediction of the outcome (i.e., purchase)!
• We should ask Age first. But is there a statistic that generalizes our finding?
Calculate Gini Index for Gender
(Same POS data as above.)
• Define P1 = Prob. of purchase, P0 = Prob. of non-purchase
– P1 for Female is 0.5, P1 for Male is 0.5
• For each group under feature i (i = gender), we define
– Female: Gi=0 = 1 – (P1)^2 – (P0)^2 = 1 – 0.5^2 – 0.5^2 = 0.5
– Male: Gi=1 = 1 – (P1)^2 – (P0)^2 = 1 – 0.5^2 – 0.5^2 = 0.5
• Define the Gini index for gender "Ggen" as a weighted sum of Gi=1 and Gi=0
– Ggen = Gi=1 * 6/10 + Gi=0 * 4/10 = 0.5*0.6 + 0.5*0.4 = 0.5
– The Gini index takes a value of zero if the feature is a perfect predictor
(An R sketch of this weighted-Gini calculation follows below.)
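For illustration only (not in the original slides), a minimal R sketch that reproduces the weighted Gini index for the Gender and Age splits on the 10-customer POS data:

# 10-customer POS data from the table above (1 = Yes / Male / ">= 20")
purchase = c(1, 1, 0, 0, 0, 1, 0, 0, 1, 1)
gender   = c(0, 0, 1, 0, 1, 1, 0, 1, 1, 1)
age      = c(0, 0, 1, 1, 0, 0, 1, 1, 0, 1)

gini_split = function(y, x) {
  g = sapply(split(y, x), function(grp) {
    p1 = mean(grp)              # purchase probability within the group
    1 - p1^2 - (1 - p1)^2       # Gini impurity of the group
  })
  w = table(x) / length(x)      # group weights
  sum(g * w)                    # weighted Gini index for the split
}

gini_split(purchase, gender)    # 0.50 (Ggen)
gini_split(purchase, age)       # 0.32 (Gage, computed on the next slide)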
Calculate Gini Index for Age
(Same POS data as above.)
• Again, define P1 = Prob. of purchase, P0 = Prob. of non-purchase
– P1 for Child (< 20) = 4/5, P1 for Adult (>= 20) = 1/5
• For each group under feature i (i = age), we define
– Child: Gi=0 = 1 – (P1)^2 – (P0)^2 = 1 – 0.8^2 – 0.2^2 = 0.32
– Adult: Gi=1 = 1 – (P1)^2 – (P0)^2 = 1 – 0.2^2 – 0.8^2 = 0.32
• Define the Gini index for age "Gage" as a weighted sum of Gi=1 and Gi=0
– Gage = Gi=1 * 5/10 + Gi=0 * 5/10 = 0.32*0.5 + 0.32*0.5 = 0.32 < Ggen = 0.5
– Based on Gage < Ggen, we should ask Age first = our root question
Construct Decision Tree
(Same POS data as above.)
[Fitted tree: root split on Age. Age >= 20: Female => Nonpurchase (2/2), Male => Nonpurchase (2/3). Age < 20: Female => Purchase (2/2), Male => Purchase (2/3)]
• After observing Age, Gender gives a sharper prediction of the outcome (i.e., purchase), unlike prior to observing Age
• Namely, Gender provides useful info for prediction once you know the age group
Steps to Construct a Decision Tree When the # of Variables Is Large
• So far we have two variables: Age and Gender
• Imagine we have 100 variables that could potentially explain purchase decisions (= 1 if purchase, = 0 if not)
• To construct a decision tree, how do we proceed?
1. Compute the Gini index for all 100 variables. Pick the one that yields the lowest Gini index for the top node.
2. Excluding the variable picked in step 1, compute the Gini index for the remaining 99 variables and pick the one with the lowest Gini index for the second node.
3. Excluding the variables already picked, compute the Gini index for the remaining 98 variables and pick the one with the lowest Gini index for the third node.
4. (Continue as in step 3, or) stop creating nodes to avoid overfitting. Typically, we pre-specify the maximum # of nodes.
IMPLEMENTATION OF CLASSIFICATION TREES IN R
Read "dec_data.csv"
• Data contain individual-level purchase decisions on an item sold through catalog orders
• 500 observations, 1 ID variable, 2 outcome variables (Buy and Purchase), 5 explanatory variables
• Variables
– ID: Customer ID
– Buy: Yes or No
– Purchase: Purchased amount in US$
– Gender: Male or Female
– Kid: Yes if the customer has child(ren), No otherwise
– Email: Yes if the customer uses email
– Subscribe: Yes if the customer subscribes to the catalog
– ownhome: Yes if the customer owns a home
Validation Set Approach: Decompose Master Data into Training Data and Test Data
• Training data (approx. 80-90%) and testing data (approx. 10-20%)
Overfitting
• If the model follows errors (noise) too closely in the training data, it will not be capable of predicting outcomes in new data => low generalization ability
– For instance, suppose we want to explain Y: sales of soft drinks by X: the outside temperature of the day
Decompose "dec_data.csv" into (1) Training Data and (2) Testing Data
• We construct Training Data ("Data_Tr") and Test Data ("Data_Te") from the master data
• We implement this data split in R as
– dec_data = read.csv("dec_data.csv")   # Read in the master data first (assumed; this step is not shown on the slide)
– set.seed(1)   # Keep the random seed constant to replicate results
– TestD = sample(1:500, 50)   # Pick 50 random IDs from the master data to use for "Data_Te"
– Data_Tr = dec_data[-TestD, ]   # Construct training data
– Data_Te = dec_data[TestD, ]   # Construct testing data
– head(Data_Tr)
Using Training Data, We Fit Classification Trees in R
• We implement the classification-tree fit in R as
– library(rpart)   # Use library for decision trees
– result = rpart(Buy ~ . - ID - Purchase, data = Data_Tr, method = "class", parms = list(split = "gini"))   # Implement CART
– result
• Notes on the call
– Buy is the outcome; ". - ID - Purchase" tells R that we can use any variables except ID and Purchase to explain the outcome
– method = "class" tells R that we use classification trees; split = "gini" tells R to use the Gini index as the splitting criterion (alternatively, we can use entropy)
• We then plot the rpart model in R as
– library(rpart.plot)   # Use library for plotting a tree
– prp(result, type = 3, extra = 102, nn = TRUE, fallen.leaves = TRUE, faclen = 0, varlen = 0, shadow.col = "grey")   # Plot the fitted tree
Fitted Classification Trees for "Buy"
• Finding 1: The fitted tree uses only three variables: Subscribe, Gender, and Email
• Finding 2: Customers who subscribe (= Yes) and are Female have the best purchase rate, 73.9% (= 88/119), so they are the best segment of all
• Finding 3: Customers who subscribe (= Yes), are Male, and use Email have the second-best purchase rate, 56.5% (= 82/145), so they are the 2nd-best segment of all
• Finding 4: The remaining two segments of customers are unlikely to buy the product
We Construct Confusion Matrix in R
– # To measure how the fitted model performs against unknown & new data, we predict the outcome using the Testing Data that we did not use for fitting
– yhat.result = predict(result, Data_Te, type = "class")
– # Ensure both are factors and set the levels to "Yes" then "No". When we explicitly set the levels of a factor with levels = c("Yes", "No"), "Yes" is internally coded as 1 and "No" as 2, because "Yes" is the first level in the list and "No" is the second
– Data_Te$Buy = factor(Data_Te$Buy, levels = c("Yes", "No"))
– yhat.result = factor(yhat.result, levels = c("Yes", "No"))
– # Construct the confusion matrix with the desired order
– yhat.table = table(Data_Te$Buy, yhat.result)
– # Display the confusion matrix
– print(yhat.table)
We Construct Confusion Matrix in R
(Confusion matrix on the 50 test observations, actual in rows and predicted in columns: TP = 17, FN = 3, FP = 14, TN = 16)
– Accuracy = (TP + TN) / (TP + TN + FP + FN) = 33/50 = 66.0%
• Misclassification rate = 1 – Accuracy = 34%
– Precision = TP / (TP + FP) = 17 / (17 + 14) = 54.8%
– Recall = TP / (TP + FN) = 17 / (17 + 3) = 17/20 = 85.0%
R Codes to Compute and Display Performance Criteria of Prediction
# Evaluate the quality of prediction
• Accuracy = (yhat.table[1,1] + yhat.table[2,2]) / nrow(Data_Te)
• Precision = yhat.table[1,1] / (yhat.table[2,1] + yhat.table[1,1])
• Recall = yhat.table[1,1] / (yhat.table[1,2] + yhat.table[1,1])
• print(paste("Accuracy:", Accuracy))
• print(paste("Precision:", Precision))
• print(paste("Recall:", Recall))
ENSEMBLE LEARNING
ALGORITHM 1:
RANDOM FORESTS
Why Ensemble Learning Methods?
• Decision trees are useful because
– Results can be easily visualized, so non-experts can understand them
– Results are not sensitive to the scaling of the data
• Meanwhile, downsides do exist
– Not robust to outliers
– The tree sometimes overfits the training data and provides poor generalization performance
• In practice, two "ensemble" methods (i.e., combinations of ML methods), both based on decision trees, are deployed for a wide range of applications
– Random forests (most popular on Kaggle through roughly 2014)
– Gradient boosting (since 2014)
Why Random Forests (RF)?
• RF addresses the overfitting issue by building many samples ("bootstrapping") and generating many decision trees
• RF introduces randomness in
– (1) Building many samples from the original sample ("bagging")
– (2) Selecting the features (i.e., X variables) used in each split
• Ideas behind RF
– Each tree might predict well, but it will likely overfit the data
– If we build many trees, however, each of which works well but overfits in a different way, we can reduce the overfitting by averaging their results
– That way, we reduce overfitting while retaining the predictive power of the trees
• We won't cover it here, but rigorous math stands behind this advantage
Bootstrap Aggregating ("Bagging")
• Bagging uses repeated samples from the single training data set to generate many different training data sets
• After bootstrapping, we have three training data sets in this example
RF = Bagging + Random Subset of Predictors
• "Bagging": grow one tree per bootstrapped training data set
• Use a random subset of predictors for each node (split test)
• "Voting": combine the n trees' predictions
• Restricting the set of predictors considered at each split makes the predictions across training data sets less correlated
Implement Random Forest in R
• library(randomForest)   # Use library for Random Forest
• set.seed(1)   # Specify the seed for the random-number generator so the results are replicable
• result_RF1 = randomForest(as.factor(Buy) ~ . - ID - Purchase, data = Data_Tr, mtry = 3, importance = TRUE)   # Implement RF
• result_RF1
• Notes on the call
– as.factor(Buy) is the outcome we want to predict; ". - ID - Purchase" tells R that we can use any variables except ID and Purchase to explain the outcome
– mtry is the number of variables randomly sampled as candidates at each split
• Below we construct the confusion matrix
• yhat_RF1.result = predict(result_RF1, Data_Te, type = "response")   # Predicted classes ("response" is the prediction type randomForest expects here)
• yhat_RF1.table = table(Data_Te$Buy, yhat_RF1.result)
• print("==== Class")
• print(yhat_RF1.table)
Random Forest Outcome in Console
Random Forest: Performance Criterion
Using Testing Data
Random Forest: Tuning Parameters
• result_RF3 = randomForest(as.factor(Buy) ~ . - ID - Purchase, data = Data_Tr, mtry = 4, ntree = 2000, importance = TRUE)   # Implement RF
• result_RF3
– ntree = 2000 increases the number of trees grown to 2,000 (previously 500)
– mtry = 4: the number of variables randomly sampled as candidates at each split is now 4 (previously 3)
Random Forest: Performance Criterion
Using Testing Data
Below is the performance of the model with mtry = 4, ntree = 2,000
• Rule of thumb for the tuning parameters (see the sketch below)
– Set mtry to roughly the square root of the number of predictors (Xs)
• In this example, we have 5 predictors => set the default mtry to 2 or 3 because sqrt(5) ≈ 2.24
– Set ntree large (at the cost of computing time)
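For illustration only (not in the original slides), a minimal R sketch that loops over candidate mtry values and compares test-set accuracy, assuming Data_Tr and Data_Te were built as above:

library(randomForest)
set.seed(1)
for (m in 2:5) {
  fit  = randomForest(as.factor(Buy) ~ . - ID - Purchase, data = Data_Tr,
                      mtry = m, ntree = 2000, importance = TRUE)
  yhat = predict(fit, Data_Te, type = "response")        # predicted classes
  acc  = mean(yhat == Data_Te$Buy)                       # test-set accuracy
  print(paste("mtry =", m, "accuracy =", round(acc, 3)))
}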
ENSEMBLE LEARNING
ALGORITHM 2:
GRADIENT BOOSTING
MACHINE
Why Gradient Boosting Machine (GBM) ?
• GBM
– Frequently the winning entries in ML competitions since 2014
– Widely used in industry
• Idea behind GBM
– GBM builds trees in a serial manner, where each tree tries to
correct the mistakes of the previous one
– GBM combines many simple models, like shallow trees
– Each tree can provide good predictions on part of the data, and
so more and more trees are added to iteratively improve
performance
Implement GBM in R
• library(gbm)   # Use library for GBM
• set.seed(1)
• result_gbm = gbm(formula = buy_dum ~ Gender + Kid + Email + Subscribe + ownHome, data = Data_Tr_v2, shrinkage = 0.01, interaction.depth = 3, distribution = "bernoulli", n.trees = 80000, bag.fraction = 0.5)   # Implement GBM
• pred_gbm_raw = predict(object = result_gbm, newdata = Data_Te_v2, n.trees = 80000, type = "response")
• pred_gbm = ifelse(pred_gbm_raw > 0.5, 1, 0)   # Classify as 1 (buy) if the predicted probability exceeds 0.5
• gb.table = table(Data_Te_v2$buy_dum, pred_gbm)
(A note on constructing buy_dum, Data_Tr_v2, and Data_Te_v2 follows below.)
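The slides do not show how buy_dum, Data_Tr_v2, and Data_Te_v2 are created. A plausible minimal sketch (an assumption, since gbm with distribution = "bernoulli" expects a numeric 0/1 outcome) is:

# Assumed preprocessing (not shown in the slides): recode Buy as a 0/1 dummy
dec_data$buy_dum = ifelse(dec_data$Buy == "Yes", 1, 0)
Data_Tr_v2 = dec_data[-TestD, ]   # same train/test split as before
Data_Te_v2 = dec_data[TestD, ]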
Implement GBM in R (results)
• The performance did not change
• Note: GBM (and Random Forest) need careful parameter tuning
(end of slides)
BU.450.740
Retail Analytics
Week 5 Location Choice in Retailing
Prof. Mitsukuni Nishida
Johns Hopkins Carey Business School
Announcement
• Homework 5 (last one) on tree-based methods in R & ArcGIS is
due at the beginning of session 6
• Group project
– Retail track: Market analysis based on industry analysis and competitive advantage
– Analytics track: Data cleaning and descriptive analyses using the data
Today’s Agenda:
“Location, Location, Location”
An expression of the common wisdom that the right location is essential in retail
• Pillar 1: Retailer’s entry and location decisions
• Pillar 2: “Turbo” Microecon for retailers
– Economies of scale
• Application: New outlet decision
– Repeated game
• Application: Tacit collusion
• Pillar 3:
– Empirical analysis in R using retail data: Tree-based methods
– Spatial analysis in ArcGIS: Merchandising strategy
RETAIL NEWS:
Retailers Have Launched
Apps for Apple Vision Pro
Retailers Launched Apps for Apple Vision Pro
• Apple started to sell Apple Vision Pro on February 2
• Timed with the Apple Vision Pro release, retailers have been
unveiling new applications for visionOS
– J.Crew: J.Crew Virtual Closet
• “With the interactive mannequin, users can mix and match a
curated selection of products; through intuitive hand and eye
movements, Vision Pro users can browse a selection of
items and create their own ideal outfit for any occasion”
– Lowe’s: Lowe’s Style Studio
– Alo Yoga: alo Sanctuary
• Alo incorporated wellness into their shopping experience by
offering over 20 complimentary meditation sessions
• These retailers partnered with Obsess, a retail tech company offering a shopping platform, which developed the immersive shopping experiences for visionOS
RETAIL “NEWS”: USE OF
SATELLITE IMAGE
Hedge Funds Use Satellite Images to Beat
Wall Street (Counts, 2019)

“… a (Haas) student brought up the story of how company founder Sam Walton
used to count cars in store parking lots to gauge how sales were going.”

“(Assoc. Prof. Panos Patatoukas) landed 4.8 million images of parking lots at
67,000 individual stores across the U.S. owned by 44 major retailers, including
Walmart… (they) found that the strategy can indeed deliver a significant boost for
investors savvy enough to exploit it…The informational advantage yields 4% to 5%
in the three days around quarterly earnings announcements”

“The researchers also found that although this type of satellite data has been
commercially available since 2011, the information hasn’t spread beyond a select
few large investors, mostly hedge funds.”
PILLAR 1: LOCATION
CHOICES OF RETAILERS
Factors for Entry and Location Decisions
• Population and employment growth levels
– Retailers like La Curacao target their assortment of electronics, appliances, and home goods to Hispanics by locating stores in trade areas where they live
• Competition, which can be within a chain (next slide) and may have positive effects (the slide after next)
• Local and state regulations
Competition within a Retailer: Mapping Trade
Area
[Map legend] Yellow: 3 minutes from a store; Red: 6 minutes; Blue: 9 minutes
• Trade-off of multiple stores: cost savings due to economies of density (i.e., savings in logistics costs) vs. business-stealing effects (i.e., loss of sales due to competition over the same customers)
• Q. Would "cannibalization" (i.e., competition across stores of the same chain) be a concern in this map? Ans. No, because there is little overlap
Competition Across Retailers: Spillovers
Co-location of similar stores may increase appeal of location
Competition Across Retailers:
Spillovers Are Not Always Positive
"A group made up of Forever 21's biggest landlords—Simon and Brookfield Property Partners LP—along with Authentic Brands Group LLC, a brand licensing firm, has offered $81 million for the bankrupt fast-fashion retailer." (Biswas, 2020)
Other Factors for Site Selection
Zoning in Superior, Wisconsin
Zoning regulates land use to prevent interference with existing use by residents or businesses
Location Analysis is Everywhere
Retailers that leverage a platform for geographical analysis
• Case 1: Chick-fil-A (ESRI case posted at Canvas)
– Initially utilized ArcGIS for optimized site selection to minimize
competition and potential cannibalization
– Later integrated ArcGIS with marketing, daily operation services,
and business intelligence
• Case 2: Starbucks (Forbes posted at Canvas)
– Displays distance to other Starbucks locations, demographics,
traffic patterns before recommending a new location
– Predicts effects on other Starbucks locations after opening a store
• Case 3: Johnson and Johnson
– Optimal territory for each sales rep to maximize resources via
ArcGIS
GIS Resources at JHU
• Information available at ArcGIS Desktop
– Current and 5 year prediction on: Gender, Occupation, Income,
Travel time to work, Disposable Income, Transportation mode to
work, Net worth, Household composition, Education, Age,
Household expenditures, Race, Market potential index,
Employment status, Spending Index, …etc
• Business Analyst for ArcGIS
– Physical address, Estimated sales for each store
– https://www.esri.com/en-us/arcgis/products/arcgis-businessanalyst/overview
TWO WAYS TO ESTIMATE
POTENTIAL SALES FOR A
NEW STORE
1. Huff’s Gravity Model
Model sales of a store as a function of two variables
(1) Size of store: Larger store has more pulling power
(2) Travel time to the store: Closer store has more pulling power
Mathematical representation:
P_ij = (S_j / T_ij^b) / Σ_{j=1}^{n} (S_j / T_ij^b)
where
P_ij = probability of a customer at a given point of origin i traveling to a particular shopping center j
S_j = size of shopping center j
T_ij = travel time or distance from the customer's starting point to shopping center j
b = an exponent applied to T_ij that reflects the effect of travel time on different kinds of shopping trips
n = the number of shopping centers considered
Application of the Huff Gravity Model
Two shopping centers, 10,000 ft^2 and 5,000 ft^2, with b = 2. For the 10,000 ft^2 center and two customer origins (RC and OH):
P_RC = (10,000 / 5^2) / (10,000 / 5^2 + 5,000 / 10^2) = 0.889
P_OH = (10,000 / 15^2) / (10,000 / 15^2 + 5,000 / 5^2) = 0.182
Expected sales = 0.889 x $5 million + 0.182 x $3 million ≈ $4.9 million
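For illustration only (not in the original slides), a minimal R sketch of the Huff calculation; the function name huff_prob and the inputs simply mirror the example numbers above:

# Huff gravity-model probabilities for one customer origin
huff_prob = function(size, travel_time, b = 2) {
  pull = size / travel_time^b   # pulling power of each shopping center
  pull / sum(pull)              # probability of choosing each center
}

# Origin "RC": 5 minutes to the 10,000 ft^2 center, 10 minutes to the 5,000 ft^2 center
huff_prob(size = c(10000, 5000), travel_time = c(5, 10))   # 0.889, 0.111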
2. Regression Analysis & Analog Approach
• Multiple Regression Analysis
– Factors affecting the sales of existing stores in a chain will
have the same impact upon the stores located at new sites
being considered
• Analog Approach
– Retailer describes the site and trade area characteristics for
its most successful stores and attempts to find a similar site
• As the # of outlets goes beyond 20, we often utilize multiple
regression analysis
Regression Model to Estimate Store
Sales
Assume we estimated the following equation
Stores sales = 275 x number of households in trade area
(within 15 minute drive time)
+ 1,800,000 x percent of household in trade with children
under 15
+ 2,000,000 x % of households in trade area in Tapestry
segment “aspiring young ”
+ 8 x shopping center square feet
+ 250,000 if visible from street
+ 300,000 if Wal-Mart in center
Application of Regression Model
Store Sales A
= $7,635,000
= 275×11,000 + 1,800,000 x 0.7 + 2,000,000 x 0.6
+ 8 x 200,000 + 250,000 + 300,000
Store Sales B
= $6,685,000
= 275×15,000 + 1,800,000 x 0.2 + 2,000,000 x 0.1
+ 8 x 250,000
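For illustration only (not in the original slides), the estimated store-sales equation written as an R function; it reproduces the Store A and Store B calculations above:

store_sales = function(households, pct_kids_under15, pct_aspiring_young,
                       center_sqft, visible_from_street, walmart_in_center) {
  275 * households +
    1800000 * pct_kids_under15 +
    2000000 * pct_aspiring_young +
    8 * center_sqft +
    250000 * visible_from_street +   # 1 if visible from street, else 0
    300000 * walmart_in_center       # 1 if Wal-Mart in center, else 0
}

store_sales(11000, 0.7, 0.6, 200000, 1, 1)   # Store A: 7,635,000
store_sales(15000, 0.2, 0.1, 250000, 0, 0)   # Store B: 6,685,000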
PILLAR 2: ECONOMIES
OF SCALE & OUTLET
OPENING DECISIONS
When & Where to Open a New Outlet?
San Antonio, TX
AC Decreasing in Q = Economies of Scale
Economies of scale
1. Spreading fixed costs
2. Specialization
3. Managerial economies
Diseconomies of scale
1. Labor costs: Unions and compensating differentials
2. Incentive and coordination effect
AC ($): Average Cost = (Total costs) / Q = TC/Q = FC/Q + VC/Q = AFC + AVC
[Figure: U-shaped average cost curve showing economies of scale (AC falling), constant returns to scale (AC flat), and diseconomies of scale (AC rising); horizontal axis: quantity in 1,000s]
Mini Case: Starbucks’ New Location
[Table on slide: Store ID, Year, # of cups (in 000s), Ave. cost per cup, Total costs (000s, 2002$)]
Cost per Cup (AC) by Year, 1993-2002
Cost per Cup (AC) by Quantity
Which market would need new store openings in 1993 to achieve average cost minimization?
[Table on slide: Store ID, Year, # of cups (in 000s), Ave. cost per cup, Total costs (000s, 2002$)]
a. Markets with ID 1 & ID 5 stores
b. Markets with ID 2 & ID 3 stores
c. Markets with ID 4 & ID 6 stores
d. Do not have enough information to proceed
Cost per Cup (AC) by Quantity
• An outlet experiencing high Q beyond 33,000 cups may need new outlet openings nearby
PILLAR 2: Theory on
Tacit Collusion via
Repeated Interactions
Dynamic Model
• Repeated game
• What can keep members from cheating?
– How can firms sustain prices above competition
levels without explicit collusion?
– Threat of punishment in the future allows you to
cooperate with competitors today
Tradeoff: Long-run benefit vs. short-run benefits
Long-run benefit: accrues in future periods (Tomorrow (T+1), T+2, T+3, …)
Short-run benefit: accrues today (period T)
Dynamic Gas Pricing
Payoff matrix (station A's payoff, station B's payoff):
                  pB = high     pB = low
pA = high         m/2, m/2      0, m
pA = low          m, 0          0, 0
• Static Nash equilibrium: both stations play low
• Suppose both stations adopt the trigger ("grim") strategy: play high; if my competitor plays low, play low forever
• When is cooperation among firms sustained?
Solution Depends on Discount Rate
• Suppose in the first period station A expects station B to
cooperate (=play high)
• If station A cooperates (=play high) as well, profits are
m/2 + δ*(m/2) + δ^2*(m/2) + … = m / (2*(1 - δ))
Note: the sum converges when 0 ≤ δ < 1, where δ is the discount factor
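A worked comparison (added here for completeness; it assumes, as is standard for the grim-trigger argument, that a deviating station earns m today and 0 in every later period once both play low): cooperation is sustained when the value of cooperating is at least the value of deviating, i.e., m / (2*(1 - δ)) >= m, which simplifies to δ >= 1/2. With these payoffs, high prices can be sustained whenever the stations put enough weight on the future.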