Introduction
DoorDash is the market leader in U.S. based online food ordering and the platform had a 56% market share in the United States as of November 2022 according to this article by emergenresearch. With that being said, one could imaging that the Data Science and Analytics team at DoorDash are one of the high-impact departments of the company. I did a quick LinkedIn job search and found a variety of names for different technical roles within the company. Some of these were:
Real Time Analyst, Supervisor (Remote)
Autonomy Engineer
Corporate Security Engineer
Staff Engineer, Machine Learning Science
Software Engineer, Data Platform
Data Engineer
All this to say once again, that DoorDash places a high-priority on their data and use that to target their audience.
Data
The data is a sample dataset provided by bootcamp instructor Avery Smith in his program the Data Analytics Accelerator. Have a look at it through the link provided. First, I've provided a data dictionary with a description of the fields and then I'll parry down the fields of interest for this project.
Data Dictionary
ncome: Customer's Yearly Income
MntTotal: Total Amount Spent at Store by Customer
Kidhome: Number of Young Kids in Home
Teenhome: Number of Teenagers in Home
Recency: Number of Days Since Last Purchase
MntWines: Amount Spent on Purchasing Wine
MntFruits: Amount Spent on Purchasing Fruit
MntMeatProducts: Amount Spent on Purchasing Meat
MntFishProducts: Amount Spent on Purchasing Fish
MntSweetProducts: Amount Spent on Purchasing Sweet
MntGoldProds: Amount Spent on Purchasing Gold
NumDealsPuchased: Number of Purchases With Discount
NumWebPurchases: Number of Purchases Made Through Website
NumCatalogPurchases: Number of Purchases Made Through Catalogue
NumStorePurchases: Number of Purchases Made Through Physical Store
NumWebVisitsMonth: Number of Visits To Website in Last Month
AcceptedCmp1: Did The Customer Accept Offer in 1st Campaign
AccaptedCmp2: Did The Customer Accept Offer in 2nd Campaign
AcceptedCmp3: Did The Customer Accept Offer in 3rd Campaign
AcceptedCmp4: Did The Customer Accept Offer in 4th Campaign
AcceptedCmp5: Did The Customer Accept Offer in 5th Campaign
Complain: Has The Customer Complained In Last 2 Years
Age: Age of Cusotmer
Customer_Days: How Many Days Has Customer Been a Customer
marital_Divorced: Is Customer Divorced?
marital_married: Is Customer Married?
marital_Single: Is Customer Single?
Marital_Together: Is Customer Living With Someone?
Marital_Widow: Is Customer Divorced?
education_Basic: Is the customer's highest education level high school?
education_Graduation: Is the customer's highest education level undergraduate?
education_Master: Is the customer's highest education level Master's?
education_PHD: Is the customer's highest education level a PhD
MntRegularProds: Total Amount Spent on regular products
DateJoined: The date the customer first became a customer
Key Fields:
I've decided to grab the following fields and I've included my ideas for why I need those fields off on the side:
Income: I want to observe the interaction between income and the amount of money spent in the app.
MntTotal: I'll use this for a variety of measures, but mainly for comparison with income.
Kidhome: I want to see if number of children in the home affects the way a customer spends in the app.
MntWines: I want to segment customer spend by amount spent on wine.
MntFruits: I have the same idea as above, but for the fruit category.
MntProducts columns: Segment by different types of products.
NumDealsPurchased: I want to segment by purchases because of a deal.
NumWebPurchased: I want to see how many web purchases were made.
NumCatalogPurchased: I want to see how many orders originated from the catalogue.
NumStorePurchased: I want to see how many purchases were made through a physical store.
AcceptedCmp: Information related to the various marketing campaigns.
Age: Customer Age, will also use to create customer age brackets.
DateJoined: I want to see which months resulted in the most amount of new customers.
Scenario
For this project, I'm acting as a newly hired data analyst for Door Dash and I have been asked for information related to DoorDash marketing campaigns, customer demographics, seasonal spend, levels of spending, income vs. spend analysis, age group analysis, and other forms of analysis. I've opted to do this analysis in Excel as it has been a while since I've worked in Microsoft Excel and I wanted to showcase my ability use the tool.
Task
The data analyst leading my team asked for several levels of information. Of which I've listed the questions below that I'll be trying to answer using Excel.
Summary
How many customers do we have?
What is the age range of our customers?
What is the average spend using the app by customer?
What is the total amount of money spent?
What is the shortest lifetime of a customer?
Customer spending by ID
How can we look up a customer by their ID and figure out the total amount of money they spent?
What interaction does customer income have with their spend?
Do customers typically spend more when they make more money?
What is distribution of customer spend?
Are there brackets that show this?
Customer Membership Information
Which months had a higher number of new customers?
Campaign Information
Which campaign captured the most new customers?
Which month had the highest amount of captures by campaign?
What seasons were most popular overall?
Which campaign and in which month resulted in the most amount of new customers?
Demographic information
From the most successful campaigns which new customers had children?
What was the average spend by age group?
Which months resulted in more spend for each age group?
Which product categories did age groups spend more money on?
Revenue Stream
Which revenue stream resulted in the most amount of purchases?
Approach
We have a few questions to answer for the different pieces of information that we have been tasked to find. My approach to this was to focus on creating tools to help answer the questions above.
Summary
For the summary portion of this analysis, I decided on using a simple table to gather this information into one place.
Insights:
The number of unique customers in the available dataset was 2,205.
The oldest customer that the app served was 80 years old and the youngest was 24 years old. The range of age served was 56 years difference between the oldest and youngest customer.
The average amount of spend was $562.76.
The total amount of money spent in the app was $1,240,896.00. This looks a little suspicious, but I used formatting to round the information in this cell to the nearest dollar.
Customer Spending Lookup:
For this part of the analysis I decided to use the vlookup function to reference the data in the "Main" table tab. This is where we hold the source data for all of the sheets that were made. I decided to utilize a vlookup to gather information from the Main table. The formula used is pictured below:
Insights:
The tool allows me to lookup the customer ID typed into the first column, reference it against a table array which is columns A:C from the main table and grab the value from the third column in that array which was the 'MntTotal' column that represents the total amount of money spent by that customer.
As you can see, the customer with customer ID 1 spent $2,491.00 in the DoorDash app
Customer Income Analysis:
For this part of the analysis, I wanted to see how the amount of total income for a customer interacted with their spend in the store. I had 3 questions I wanted to answer.
Insights:
The first point of analysis was to see how customer income interacted with their total spend. So, to do this I used a scatter plot to analyze the general interaction between the two variables.
There's a clear positive relationship between the two variable. As customers income increase it looks like they spend more in the store.
To be sure, I used an R-squared line which is a line fitted to the chart that explains the variance in total spend by change in income. In this case, our R-squared value is .6774 which means that the more people make, the more they spend in the store and the variance in the total spent value can be explained by 67.75% by the changes in their income.
Essentially, DoorDash's marketing team should target customers with higher income as they tend to spend more in the store.
To add another level of analysis to customer spend, I'm going to look at the distribution of customer spend, but there is too much granularity at the moment, so we will use a histogram to create spending brackets by customers to see if we should really target higher spending customers.
It looks like most customers (1,196 to be exact) spent between $4.00 and $418.50 which would indicate that looking at customer age groups who's spend might fall into these brackets could be a significant way to further target an audience.
Interestingly, it looks like the second most popular spending bracket is $833-$1247.50 which might mean that there is an opportunity to further target customers who spend in this range to capture some more revenue.
New Customer Seasonality:
Insights:
I wanted to expand on our customer profile that we are building to see what kind effect the time of the year had on a new customer sign ups. It looks like the summer months dominated for new membership with July being the month where the most new customers (203) signed up. August, was a close second with 201.
Going forward, I'm going to keep an eye on the summer months to see if we can target the customer by a combination of spend, time of the year, and age group they fall in.
Measuring Campaign Success:
The next part of our customer profile that we can drill down on is to look at how the customer's age and time of the year interact act with the different campaigns that were run. First, we'll have a look at the new customers by month and campaign. To visualize this relationship I used conditional formatting to create a heatmap that showed popular months and popular campaigns. The cooler the color on the scale (green) the more customers we got.
Insight:
From the heat map above we can see right away that a lot of the green or high levels of new customers came from campaign 6. The most successful month for campaign 6 was January. This contradicts my previous statement that the most successful months for new customers was July, but this might be a campaign level thing. I'll explore this further through a line chart to show campaign changes over time.
Insight:
Similar to the heat map I can see that the most successful campaign was campaign 6. Additionally, it looks July was the month on average that had the highest new customers as there are upward movements for almost every campaign except for campaigns 2 and 3.
Overall, campaign 2 had the least success for new customers so it would not be recommended to run a campaign similar to that one.
The most successful month and campaign combination was January, which is something that could be extension on this analysis.
Finally, the chart is quite busy, so I designed a tool that uses named ranges to create cell references that allowed me to create a filter bar for the chart. To accomplish this I used a combination of the functions OFFSET and MATCH to add a named range to filter the chart.
The tool allows for the end-user to look at the various campaigns on a monthly basis without the clutter of the campaign information for a zoomed in view.
We already know that campaign 6 was the most successful campaign and the most successful month was January so I'll continue with the analysis which was a look at the interaction between age groups and campaign success. I created another heat map to visualize the popular campaigns by age group.
But first, I needed to create age brackets based on the age column. To do this I used a combination of an IF function and the AND keyword.
Campaign 6 was a success again. However, it wasn't as resounding a success for a couple of age brackets. The 66+ age bracket wasn't as attracted to campaign 6 as other age groups, but additionally the age group as a whole didn't seem receptive to the marketing campaigns which is no surprise.
Curiously, the 24-35 year old age bracket had lower numbers across the board.
To visualize the spend by each age group I created a clustered bar chart that segments each age group and counts the number of new customers who accepted each campaign. The two age groups that most represented new customers were 36-50 and 51-65 and the. most successful campaign was the campaign 6 (the turquoise one).
I also added some sparklines at the end of each row in the pivot table to make an easy to see distribution of the values in the columns.
Demographic Analysis
Up to this point we've had a good look the effects of income on spend, the distribution of spend by customers, the seasonality of new customers, the most and least successful campaigns, and the age brackets of customers. What I want to do next is to have a closer look at customer spend using pivot tables and heat maps.
Insight:
With campaign 6 being the most successful campaign overall, I wanted to look at the demographics of the customers in that campaign. I thought I'd have a look at what the families generally were like. Did they have children? No children? More than one child? I used a pivot table to visualize this information.
For new customers who came from campaign 6, 221 of them had no children. 110 of them had 1 child, and 2 of them had no children. Targeting customers who didn't have children could be a way of targeting a future campaign.
To provide further analysis, I wanted to observe the spend by age group on the different products that are offered in the DoorDash app. So I want to look at spend by product category.
Looking at the pivot table above, it's clear that a lot of money is spent on wine in the app with the 36-50 and 51-65 age brackets spending the most on that particular product category.
Other areas of moderate spend were on Meat Products from the same age groups mentioned before. Those two age groups were highlighted earlier for their prominence as new customers by campaigns. So it's possible the sheer amount of new customers in those age brackets are responsible for the higher spending.
I decided to examine monthly spend by age group as well and found that the same two age groups spent the most in the early part of the year. January and February, which were colder months so maybe people wanted to drive during that time. However, I did notice that those same age groups accounted for larger monthly spend totals in June and July as well.
Finally, what I found most interesting was that the most spend per age group was from the 66+ age group which indicated that although the larger represented age groups 36-50 and 51-65 had more overall spend because there were more consumers in that age group, there were some opportunities for targeting the 66+ age group because however few of them there are, they are spending more than the other age groups by almost $100. That combined with information from our previous table where customers where customers who were 66+ spent their most money in July indicates a summer campaign might be good for that age group.
Purchases by Revenue Stream
Finally, I wanted to look at the amount of purchases by revenue stream. We found that most people elected to have purchase in-store. Which I believe means that they purchased something through the app and elected to pick-up the item to save on the cost. This stream had 12,841 purchases, nearly 4,000 more than the next stream web purchases.
Conclusion/Results:
From our findings we saw that there is are a large age range between the 2205 customer represented in the app. So we needed to segment this information by age range range and spending habits.
First, I created a tool that allowed me to look up each customer by their ID and observed the total spend by using vlookups.
Then, I observed the relationship between total income and amount spent by visualizing this interaction with a scatterplot and found that there is a positive correlation between the two. This means that as income goes up, so does spend.
I found that most customers spend between $4-$418 compared to other spending brackets.
When it came to new customers it looked the summer months were months where the most new customers joined.
This was confirmed through the pivot table heat maps I created and we also saw that campaign 6 was responsible for the most new customers signing up to the app. This was accomplished through creating a tool by combing offset and match functions with named ranges to create a dynamic lookup.
Campaign 2 had the least success in finding new customers.
Then, I used a combination of conditional formatting heat maps, sparklines, and clustered bar charts to visualize the relationship between age brackets signed up per campaign. We found that the age brackets most represented were 36-50 and 51-65 and they signed up the most during campaign 6.
We also found that the most amount of people who signed up for the app did not have children.
The age groups 36-50 and 51-65 spent the most money in the store and opted to spend it on wine products, but the age group who spent the most on average by person was the 66+ age group who spent a lot of their money on wine and meat products.
Finally, people opted for store pick up when using the app to save on costs.
From all of the information we gathered, I'd recommend that DoorDash create marketing campaigns for mid-summer and mid-winter as these months were the most successful in finding new customers. Additionally, I'd recommend that they examine what they did during campaign 6 as this campaign outperformed all of the others by a large margin. Finally, I'd say that the customer age brackets to target would be ages 36-50 and 51-65 and wine/meat promos would be most effective as this is where most of their money was spent.
I really enjoyed this project and would love any feedback, suggestions, would welcome any questions, so feel free to send me a message and if not, please send a LinkedIn connection request. Finally, I'd recommend Avery Smith's bootcamp as it helped me land my first data job and he creates a fun learning environment which is why I came back when I wanted to practice my excel with a practical, project-based approach.
Comments