Title: | Retail Shopping Data |
---|---|
Description: | Retail shopping transactions for 2,469 households over one year. The data originate from the 84.51° Complete Journey 2.0 source files <https://www.8451.com/area51>, which also include useful metadata on products, coupons, campaigns, and promotions. |
Authors: | Brad Boehmke [aut, cre] |
Maintainer: | Brad Boehmke <[email protected]> |
License: | CC0 |
Version: | 1.1.0.9000 |
Built: | 2025-02-01 03:42:03 UTC |
Source: | https://github.com/bradleyboehmke/completejourney |
See the %<-% operator from the zeallot package for more details.
x %<-% value
x: A name structure.
value: A list of values, vector of values, or R objects to assign.
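As a minimal sketch of how the operator behaves, the following unpacks a two-element list into two variables. It assumes the zeallot package, which provides %<-%, is installed; the variable names first and second are purely illustrative.

```r
# Assumes the zeallot package is installed; it provides the %<-% operator.
library(zeallot)

# The left-hand side is a "name structure": c() wrapped around bare names.
# Each element of the list on the right is assigned to the matching name.
c(first, second) %<-% list(1:3, "two")

first   # the integer vector 1:3
second  # the string "two"
```

The same pattern applies to any function that returns a list of several objects.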
Campaign metadata for all campaigns run during the Complete Journey study. This dataset gives the start and end dates of each campaign; any coupons received as part of a campaign are valid within those dates.
campaign_descriptions
A data frame with 27 rows and 4 variables
campaign_id: Uniquely identifies each campaign; Ranges 1-27
campaign_type: Type of campaign (Type A, Type B, Type C)
start_date: Start date of campaign
end_date: End date of campaign
campaign_descriptions: a tibble
84.51°, Customer Journey study, http://www.8451.com/area51/
# full data set
campaign_descriptions

# Join campaign metadata to the campaigns dataset
require("dplyr")
campaigns %>%
  left_join(campaign_descriptions, "campaign_id")
Data on the campaigns received by each household in the Complete Journey study. Each household received a different set of marketing campaigns.
campaigns
A data frame with 6,589 rows and 2 variables
campaign_id: Uniquely identifies each campaign; Ranges 1-27
household_id: Uniquely identifies each household
campaigns: a tibble
84.51°, Customer Journey study, http://www.8451.com/area51/
# full data set
campaigns

# Join household demographics metadata to campaigns dataset
require("dplyr")
campaigns %>%
  left_join(demographics, "household_id")
completejourney package: Retail shopping transactions for 2,469 households over one year
Learn more on GitHub: https://github.com/bradleyboehmke/completejourney
Maintainer: Brad Boehmke [email protected] (0000-0002-3611-8516)
Authors:
Steven M. Mortimer [email protected]
Useful links:
Report bugs at https://github.com/bradleyboehmke/completejourney/issues
Coupon data identifying the coupons that each household redeemed in the Complete Journey study.
coupon_redemptions
A data frame with 2,102 rows and 4 variables
household_id: Uniquely identifies each household
coupon_upc: Uniquely identifies each coupon (unique to household and campaign)
campaign_id: Uniquely identifies each campaign
redemption_date: Date when the coupon was redeemed
84.51°, Customer Journey study, http://www.8451.com/area51/
# full data set
coupon_redemptions

# Join coupon metadata to coupon_redemptions dataset
require("dplyr")
coupon_redemptions %>%
  left_join(coupons, "coupon_upc")
Coupon metadata for all coupons used in campaigns advertised to households participating in the Complete Journey study.
coupons
A data frame with 116,204 rows and 3 variables
coupon_upc: Uniquely identifies each coupon (unique to household and campaign)
product_id: Uniquely identifies each product
campaign_id: Uniquely identifies each campaign
coupons: a tibble
84.51°, Customer Journey study, http://www.8451.com/area51/
# full data set
coupons

# Join product metadata to coupon dataset
require("dplyr")
coupons %>%
  left_join(products, "product_id")
Household demographic metadata for households participating in the Complete Journey study. Due to the nature of the data, demographic information is not available for all households.
demographics
A data frame with 801 rows and 8 variables
household_id: Uniquely identifies each household
age: Estimated age range
income: Household income range
home_ownership: Homeowner status (Homeowner, Renter, Unknown)
marital_status: Marital status (Married, Single, Unknown)
household_size: Size of household up to 5+
household_comp: Household composition description
kids_count: Number of children present up to 3+
demographics: a tibble
84.51°, Customer Journey study, http://www.8451.com/area51/
# full data set
demographics

# Transaction line items that don't have household metadata
require("dplyr")
transactions_sample %>%
  anti_join(demographics, "household_id")
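Since demographics cover only some households, it can help to check how much of the data is affected before joining. The following is a small sketch of that check in base R, mirroring the anti-join example above; toy_transactions and toy_demographics are made-up stand-ins (including the income labels), not package data.

```r
# Toy stand-ins: three households transact, only two have demographics.
toy_transactions <- data.frame(
  household_id = c("1", "1", "2", "3"),
  sales_value  = c(2.50, 1.00, 4.25, 3.10)
)
toy_demographics <- data.frame(
  household_id = c("1", "3"),
  income       = c("35-49K", "50-74K")  # made-up labels for illustration
)

# Rows whose household_id has no match in the demographics table
# (a base R equivalent of an anti join on household_id).
has_match <- toy_transactions$household_id %in% toy_demographics$household_id
unmatched <- toy_transactions[!has_match, ]

# Share of distinct transacting households that have demographic data.
households <- unique(toy_transactions$household_id)
coverage <- mean(households %in% toy_demographics$household_id)

unmatched
coverage
```

Base R is used here only so the sketch runs without extra packages; the same check can be written with dplyr joins.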
The promotions and transactions data sets are too large to be contained within the package. get_data() is a convenience function to download both full promotions and transactions data sets simultaneously from the source GitHub repository. An internet connection is required.
get_data(which = "both", verbose = TRUE)
which: Character string of one or more data sets to be downloaded. The default is "both", which downloads both the promotions and transactions data sets.
verbose: Logical indicator of whether or not to download silently.
Downloading a single data set returns a tibble, whereas downloading multiple data sets returns a list containing each tibble.
For specific details on a given data set, see that data set's help file (e.g. ?transactions_sample).
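Because the return type depends on how many data sets are requested, calling code may want to normalize the result before iterating over it. The sketch below shows one way to do that; as_dataset_list() is a hypothetical helper, not part of the package, and the toy data frames stand in for downloaded data. A data frame is itself a list in R, so the check uses is.data.frame() rather than is.list().

```r
# Hypothetical helper (not part of completejourney): always return a list
# of data frames, whether `x` is one data frame or already a list of them.
as_dataset_list <- function(x) {
  if (is.data.frame(x)) list(x) else x
}

# Toy stand-ins for what a download might return.
single <- data.frame(product_id = c("1", "2"))
multiple <- list(
  data.frame(product_id = c("1", "2")),
  data.frame(household_id = c("10", "11"))
)

length(as_dataset_list(single))    # one data set
length(as_dataset_list(multiple))  # two data sets
```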
Downloaded from https://github.com/bradleyboehmke/completejourney/tree/master/data. The data originated from 84.51°, Customer Journey study, http://www.8451.com/area51/ and were processed for analysis.
Use %<-% to unpack a list of multiple tibbles into separate tibbles in the global environment. You can also download a single data set with get_promotions or get_transactions.
# download transactions and promotions data sets
# requires internet connection
c(promotions, transactions) %<-% get_data(which = 'both')
The complete promotions data set for the Complete Journey is too large to be contained within the package. get_promotions() provides an efficient method for downloading the full data set from the source GitHub repository.
get_promotions(verbose = FALSE)
verbose: Logical indicator of whether or not to download silently.
A data frame with 20,940,529 rows and 5 variables
Downloaded from https://github.com/bradleyboehmke/completejourney/tree/master/data. The data originated from 84.51°, Customer Journey study, http://www.8451.com/area51/ and were processed for analysis.
See promotions_sample for details regarding the variables.
# requires internet connection
promotions <- get_promotions()
The complete transactions data set for the Complete Journey is too large to be contained within the package. get_transactions() provides an efficient method for downloading the full data set from the source GitHub repository.
get_transactions(verbose = FALSE)
verbose: Logical indicator of whether or not to download silently.
A data frame with 1,469,307 rows and 11 variables
Downloaded from https://github.com/bradleyboehmke/completejourney/tree/master/data. The data originated from 84.51°, Customer Journey study, http://www.8451.com/area51/ and were processed for analysis.
See transactions_sample for details regarding the variables.
# requires internet connection
transactions <- get_transactions()
Product metadata for all products purchased by households participating in the Complete Journey study.
products
A data frame with 92,331 rows and 7 variables
product_id: Uniquely identifies each product
manufacturer_id: Uniquely identifies each manufacturer
department: Groups similar products together
brand: Indicates Private or National label brand
product_category: Groups similar products together at lower level
product_type: Groups similar products together at lowest level
package_size: Indicates package size (not available for all products)
products: a tibble
84.51°, Customer Journey study, http://www.8451.com/area51/
# full data set
products

# Transaction line items that don't have product metadata
require("dplyr")
transactions_sample %>%
  anti_join(products, "product_id")
A sampling of the promotions data from the Complete Journey study signifying whether a given product was featured in the weekly mailer or was part of an in-store display (other than regular product placement).
promotions_sample
A data frame with 360,535 rows and 5 variables
product_id: Uniquely identifies each product
store_id: Uniquely identifies each store
display_location: Display location (see details for range of values)
mailer_location: Mailer location (see details for range of values)
week: Week of the transaction; Ranges 1-53
promotions_sample: a tibble
Display location codes:
0 - Not on Display
1 - Store Front
2 - Store Rear
3 - Front End Cap
4 - Mid-Aisle End Cap
5 - Rear End Cap
6 - Side-Aisle End Cap
7 - In-Aisle
9 - Secondary Location Display
A - In-Shelf
Mailer location codes:
0 - Not on ad
A - Interior page feature
C - Interior page line item
D - Front page feature
F - Back page feature
H - Wrap front feature
J - Wrap interior coupon
L - Wrap back feature
P - Interior page coupon
X - Free on interior page
Z - Free on front page, back page or wrap
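To make the coded values readable in summaries or plots, the codes can be mapped to their descriptions with a named vector. The following is a minimal sketch for the display location codes (0 through A, Not on Display through In-Shelf) using a toy vector of codes; display_labels and codes are illustrative names, and the as.character() call is a precaution in case the column is stored as a factor, which is an assumption about the storage type.

```r
# Lookup table built from the display location codes listed above.
display_labels <- c(
  "0" = "Not on Display",
  "1" = "Store Front",
  "2" = "Store Rear",
  "3" = "Front End Cap",
  "4" = "Mid-Aisle End Cap",
  "5" = "Rear End Cap",
  "6" = "Side-Aisle End Cap",
  "7" = "In-Aisle",
  "9" = "Secondary Location Display",
  "A" = "In-Shelf"
)

# Toy vector of codes; in practice this would be a display_location column.
codes <- c("0", "3", "A", "9")

# Index the named vector by code. as.character() makes sure a factor is
# looked up by its label rather than by its underlying integer code.
decoded <- unname(display_labels[as.character(codes)])
decoded
```

The same approach works for the mailer location codes with a second lookup vector.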
84.51°, Customer Journey study, http://www.8451.com/area51/
Use get_promotions to download the entire promotions data set containing all 20,940,529 rows.
# sampled promotions data set
promotions_sample

# Join promotions to transactions to analyze
# product promotion/location
require("dplyr")
transactions_sample %>%
  left_join(promotions_sample, c("product_id", "store_id", "week"))
A sampling of all products purchased by households within the Complete Journey study. Each line found in this table is essentially the same line that would be found on a store receipt. This is only a subsample of the complete data set to keep package size manageable.
transactions_sample
A data frame with 75,000 rows and 11 variables
household_id: Uniquely identifies each household
store_id: Uniquely identifies each store
basket_id: Uniquely identifies a purchase occasion
product_id: Uniquely identifies each product
quantity: Number of the products purchased during the trip
sales_value: Amount of dollars retailer receives from sale
retail_disc: Discount applied due to retailer's loyalty card program
coupon_disc: Discount applied due to manufacturer coupon
coupon_match_disc: Discount applied due to retailer's match of manufacturer coupon
week: Week of the transaction; Ranges 1-53
transaction_timestamp: Date and time of when the transaction occurred
transactions_sample: a tibble
84.51°, Customer Journey study, http://www.8451.com/area51/
Use get_transactions to download the entire transactions data set containing all 1,469,307 rows.
transactions_sample