8 R Markdown
8.1 Introduction to R Markdown
This page will guide you through creating and editing R Markdown documents. R Markdown is a useful tool for reporting your analyses (e.g., for homework assignments). Of course, there is also a cheat sheet for R Markdown, and this book contains a comprehensive discussion of the format.
The following video contains a short introduction to the R Markdown format.
Creating a new R Markdown document
In addition to the video, the following text contains a short description of the most important formatting options.
Let’s go through the steps of creating an .Rmd file and outputting the content to an HTML file.
If an R Markdown file was provided to you, open it with RStudio and skip to step 4 after adding your answers.
1. Open RStudio
2. Create a new R Markdown document
3. Save it with an appropriate name
    3.1. Add your answers
    3.2. Save again
4. “Knit” to HTML
5. Hand in the appropriate file (ending in `.html`) on Learn@WU
Text and Equations
R-Markdown documents are plain text files that include both text and R-code. Using RStudio, they can be converted (‘knitted’) to HTML or PDF files that include both the text and the results of the R-code. In fact, this website is written using R-Markdown and RStudio. In order for RStudio to be able to interpret the document, you have to use certain characters or combinations of characters when formatting text and including R-code to be evaluated. By default, the document starts with the options for the text part. You can change the title, date, author, and a few more advanced options.
The default is text mode, meaning that lines in an Rmd document will be interpreted as text, unless specified otherwise.
Headings
Usually you want to include some kind of heading to structure your text. A heading is created using `#` signs. A single `#` creates a first-level heading, two (`##`) a second-level heading, and so on.

It is important to note here that the `#` symbol means something different within the code chunks as opposed to outside of them. If you put a `#` in front of all your regular text, it will all be interpreted as first-level headings, making your text very large.
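For example, the following markup creates headings of decreasing size:

```
# First-level heading
## Second-level heading
### Third-level heading
```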
Lists
Bullet point lists are created using `*`, `+` or `-`. Sub-items are created by indenting the item using 4 spaces or 2 tabs.
```
* First Item
* Second Item
    + first sub-item
        - first sub-sub-item
    + second sub-item
```
This renders as:

- First Item
- Second Item
  - first sub-item
    - first sub-sub-item
  - second sub-item
Ordered lists can be created using numbers and letters. If you need sub-sub-items, use `A)` instead of `A.` on the third level.
```
1. First item
    a. first sub-item
        A) first sub-sub-item
    b. second sub-item
2. Second item
```
This renders as:

1. First item
    a. first sub-item
        A) first sub-sub-item
    b. second sub-item
2. Second item
Text formatting
Text can be formatted in *italics* (`*italics*`) or **bold** (`**bold**`). In addition, you can add block quotes with `>`:

```
> Lorem ipsum dolor amet chillwave lomo ramps, four loko green juice messenger bag raclette forage offal shoreditch chartreuse austin. Slow-carb poutine meggings swag blog, pop-up salvia taxidermy bushwick freegan ugh poke.
```

which renders as:

> Lorem ipsum dolor amet chillwave lomo ramps, four loko green juice messenger bag raclette forage offal shoreditch chartreuse austin. Slow-carb poutine meggings swag blog, pop-up salvia taxidermy bushwick freegan ugh poke.
R-Code
R-code is contained in so-called “chunks”. These chunks always start with three backticks and `r` in curly braces (`` ```{r} ``) and end with three backticks (`` ``` ``). Optionally, parameters can be added after the `r` to influence how a chunk behaves. Additionally, you can give each chunk a name. Note that chunk names have to be unique, otherwise R will refuse to knit your document.
Global and chunk options
The first chunk always looks as follows:
```{r setup, include = FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
It is added to the document automatically and sets options for all the following chunks. These options can be overwritten on a per-chunk basis.
Keep `knitr::opts_chunk$set(echo = TRUE)` to print your code to the document you will hand in. Changing it to `knitr::opts_chunk$set(echo = FALSE)` will not print your code by default. This can be changed on a per-chunk basis.
```{r cars, echo = FALSE}
summary(cars)
plot(dist~speed, cars)
```
## speed dist
## Min. : 4.0 Min. : 2.00
## 1st Qu.:12.0 1st Qu.: 26.00
## Median :15.0 Median : 36.00
## Mean :15.4 Mean : 42.98
## 3rd Qu.:19.0 3rd Qu.: 56.00
## Max. :25.0 Max. :120.00
```{r cars2, echo = TRUE}
summary(cars)
plot(dist~speed, cars)
```
## speed dist
## Min. : 4.0 Min. : 2.00
## 1st Qu.:12.0 1st Qu.: 26.00
## Median :15.0 Median : 36.00
## Mean :15.4 Mean : 42.98
## 3rd Qu.:19.0 3rd Qu.: 56.00
## Max. :25.0 Max. :120.00
A good overview of all available global/chunk options can be found here.
LaTeX Math
Writing well-formatted mathematical formulas works the same way as in LaTeX. Math mode is started and ended using `$$`.
$$
f_1(\omega) = \frac{\sigma^2}{2 \pi},\ \omega \in[-\pi, \pi]
$$
\[ f_1(\omega) = \frac{\sigma^2}{2 \pi},\ \omega \in[-\pi, \pi] \]
(for those interested this is the spectral density of white noise)
Including inline mathematical notation is done with a single `$` symbol.
```
${2\over3}$ of my code is inline.
```

which renders as:

\({2\over3}\) of my code is inline.
Take a look at this wikibook on Mathematics in LaTeX and this list of Greek letters and mathematical symbols if you are not familiar with LaTeX.
In order to write multi-line equations in the same math environment, use `\\` after every line. In order to insert a space use a single `\`. To render text inside a math environment use `\text{here is the text}`. In order to align equations, start with `\begin{align}` and place an `&` in each line at the point around which it should be aligned. Finally, end with `\end{align}`.
$$
\begin{align}
\text{First equation: }\ Y &= X \beta + \epsilon_y,\ \forall X \\
\text{Second equation: }\ X &= Z \gamma + \epsilon_x
\end{align}
$$
\[ \begin{align} \text{First equation: }\ Y &= X \beta + \epsilon_y,\ \forall X \\ \text{Second equation: }\ X &= Z \gamma + \epsilon_x \end{align} \]
Important symbols
Symbol | Code
---|---
\(a^{2} + b\) | `a^{2} + b`
\(a^{2+b}\) | `a^{2+b}`
\(a_{1}\) | `a_{1}`
\(a \leq b\) | `a \leq b`
\(a \geq b\) | `a \geq b`
\(a \neq b\) | `a \neq b`
\(a \approx b\) | `a \approx b`
\(a \in (0,1)\) | `a \in (0,1)`
\(a \rightarrow \infty\) | `a \rightarrow \infty`
\(\frac{a}{b}\) | `\frac{a}{b}`
\(\frac{\partial a}{\partial b}\) | `\frac{\partial a}{\partial b}`
\(\sqrt{a}\) | `\sqrt{a}`
\(\sum_{i = 1}^{b} a_i\) | `\sum_{i = 1}^{b} a_i`
\(\int_{a}^b f(c) dc\) | `\int_{a}^b f(c) dc`
\(\prod_{i = 0}^b a_i\) | `\prod_{i = 0}^b a_i`
\(c \left( \sum_{i=1}^b a_i \right)\) | `c \left( \sum_{i=1}^b a_i \right)`
The `{}` after `_` and `^` are not strictly necessary if there is only one character in the sub-/superscript. However, in order to place multiple characters in the sub-/superscript, they are necessary, e.g.:

Symbol | Code
---|---
\(a^b = a^{b}\) | `a^b = a^{b}`
\(a^b+c \neq a^{b+c}\) | `a^b+c \neq a^{b+c}`
\(\sum_i a_i = \sum_{i} a_{i}\) | `\sum_i a_i = \sum_{i} a_{i}`
\(\sum_{i=1}^{b+c} a_i \neq \sum_i=1^b+c a_i\) | `\sum_{i=1}^{b+c} a_i \neq \sum_i=1^b+c a_i`
Greek letters
Greek letters are preceded by a `\` followed by their name (`$\beta$` = \(\beta\)). In order to capitalize them, simply capitalize the first letter of the name (`$\Gamma$` = \(\Gamma\)).
8.2 Assignment 1
8.2.1 Load libraries and data
For your convenience, the following code will load the required `tidyverse` library as well as the data. Make sure to convert each of the variables you use for your analysis to the appropriate data type (e.g., `Date`, `factor`); a sketch of such conversions follows the `str()` output below.
library(tidyverse)
music_data <- read.csv2("https://raw.githubusercontent.com/WU-RDS/MA2024/main/data/music_data_fin.csv")
str(music_data)
## 'data.frame': 66796 obs. of 31 variables:
## $ isrc : chr "BRRGE1603547" "USUM71808193" "ES5701800181" "ITRSE2000050" ...
## $ artist_id : int 3679 5239 776407 433730 526471 1939 210184 212546 4938 119985 ...
## $ streams : num 11944813 8934097 38835 46766 2930573 ...
## $ weeks_in_charts : int 141 51 1 1 7 226 13 1 64 7 ...
## $ n_regions : int 1 21 1 1 4 8 1 1 5 1 ...
## $ danceability : num 50.9 35.3 68.3 70.4 84.2 35.2 73 55.6 71.9 34.6 ...
## $ energy : num 80.3 75.5 67.6 56.8 57.8 91.1 69.6 24.5 85 43.3 ...
## $ speechiness : num 4 73.3 14.7 26.8 13.8 7.47 35.5 3.05 3.17 6.5 ...
## $ instrumentalness : num 0.05 0 0 0.000253 0 0 0 0 0.02 0 ...
## $ liveness : num 46.3 39 7.26 8.91 22.8 9.95 32.1 9.21 11.4 10.1 ...
## $ valence : num 65.1 43.7 43.4 49.5 19 23.6 58.4 27.6 36.7 76.8 ...
## $ tempo : num 166 191.2 99 91 74.5 ...
## $ song_length : num 3.12 3.23 3.02 3.45 3.95 ...
## $ song_age : num 228.3 144.3 112.3 50.7 58.3 ...
## $ explicit : int 0 0 0 0 0 0 0 0 1 0 ...
## $ n_playlists : int 450 768 48 6 475 20591 6 105 547 688 ...
## $ sp_popularity : int 51 54 32 44 52 81 44 8 59 68 ...
## $ youtube_views : num 1.45e+08 1.32e+07 6.12e+06 0.00 0.00 ...
## $ tiktok_counts : int 9740 358700 0 13 515 67300 0 0 653 3807 ...
## $ ins_followers_artist : int 29613108 3693566 623778 81601 11962358 1169284 1948850 39381 9751080 343 ...
## $ monthly_listeners_artist : int 4133393 18367363 888273 143761 15551876 16224250 2683086 1318874 4828847 3088232 ...
## $ playlist_total_reach_artist: int 24286416 143384531 4846378 156521 90841884 80408253 7332603 24302331 8914977 8885252 ...
## $ sp_fans_artist : int 3308630 465412 23846 1294 380204 1651866 214001 10742 435457 1897685 ...
## $ shazam_counts : int 73100 588550 0 0 55482 5281161 0 0 39055 0 ...
## $ artistName : chr "Luan Santana" "Alessia Cara" "Ana Guerra" "Claver Gold feat. Murubutu" ...
## $ trackName : chr "Eu, Você, O Mar e Ela" "Growing Pains" "El Remedio" "Ulisse" ...
## $ release_date : chr "2016-06-20" "2018-06-14" "2018-04-26" "2020-03-31" ...
## $ genre : chr "other" "Pop" "Pop" "HipHop/Rap" ...
## $ label : chr "Independent" "Universal Music" "Universal Music" "Independent" ...
## $ top10 : int 1 0 0 0 0 1 0 0 0 0 ...
## $ expert_rating : chr "excellent" "good" "good" "poor" ...
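Based on the `str()` output above, the following sketch shows what such conversions could look like (adapt it to the variables you actually use in your analysis):

```r
# convert variables to appropriate data types
music_data$release_date <- as.Date(music_data$release_date) # character -> Date
music_data$genre <- as.factor(music_data$genre)             # character -> factor
music_data$label <- as.factor(music_data$label)             # character -> factor
music_data$explicit <- factor(music_data$explicit,
                              levels = 0:1,
                              labels = c("not explicit", "explicit"))
```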
8.2.2 Task 1
- Determine the most popular song by the artist “BTS”.
- Create a new `data.frame` that only contains songs by “BTS” (Bonus: also include songs that feature both BTS and other artists, e.g., “BTS feat. Charli XCX”).
- Save the `data.frame` sorted by success (number of streams) with the most popular songs occurring first.
# provide your code here
## 1.
music_data |>
filter(artistName == "BTS") |>
slice_max(order_by = streams, n = 1) |>
select(artistName, trackName)
## 2.
bts_data <- music_data |>
filter(str_detect(artistName, "BTS")) |>
arrange(desc(streams))
distinct(bts_data, artistName)
8.2.3 Task 2
Create a new `data.frame` containing the 100 most streamed songs.
# provide your code here
top100 <- slice_max(music_data, order_by = streams, n = 100)
range(top100$streams)
## [1] 490037377 2165692479
8.2.4 Task 3
- Determine the most popular genres.
- Group the data by genre and calculate the total number of streams within each genre.
- Sort the result to show the most popular genre first.
- Create a bar plot in which the heights of the bars correspond to the total number of streams within a genre (Bonus: order the bars by their height)
# provide your code here
genre_data <- music_data |>
summarize(.by = genre, total_streams = sum(streams)) |>
arrange(desc(total_streams)) |>
mutate(genre = fct_reorder(as.factor(genre), total_streams))
head(genre_data)
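The bar plot itself is not shown above; a minimal sketch using `geom_col()` could look as follows (the `fct_reorder()` call above already orders the bars by their height):

```r
ggplot(genre_data, aes(x = genre, y = total_streams)) +
  geom_col() +
  coord_flip() + # flip axes so the genre labels remain readable
  labs(x = "Genre", y = "Total streams") +
  theme_minimal()
```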
8.2.5 Task 4
- Rank the music labels by their success (total number of streams of all their songs)
- Show the total number of streams as well as the average and the median of all songs by label. (Bonus: Also add the artist and track names and the number of streams of each label’s top song to the result)
# provide your code here
label_data <- music_data |>
summarize(
.by = label,
total_streams = sum(streams),
avg_streams = mean(streams),
med_streams = median(streams),
top_song_artist = artistName[which.max(streams)],
top_song_title = trackName[which.max(streams)],
top_song_streams = max(streams)
) |>
mutate(label_rank = dense_rank(-total_streams)) |>
arrange(desc(total_streams))
label_data
8.2.6 Task 5
- How do genres differ in terms of song features (audio features + song length + explicitness + song age)?
- Select appropriate summary statistics for each of the variables and highlight the differences between genres using the summary statistics.
- Create an appropriate plot showing the differences of “energy” across genres.
# provide your code here
music_data |>
summarize(.by = genre, across(danceability:explicit,
list("avg" = mean,
"std.dev" = sd,
"median" = median,
"pct_10" = \(x) quantile(x, 0.1),
"pct_90" = \(x) quantile(x, 0.9))
)
)
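The code above covers the summary statistics; the plot of “energy” across genres is not shown. A boxplot would be an appropriate choice, sketched here:

```r
ggplot(music_data, aes(x = genre, y = energy)) +
  geom_boxplot() +
  coord_flip() + # flip axes so the genre labels remain readable
  labs(x = "Genre", y = "Energy") +
  theme_minimal()
```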
8.2.9 Task 8
Create a histogram of the variable “valence”.
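The code chunk is missing here; a minimal sketch that produces such a histogram (and triggers the message below, since no `binwidth` is specified) would be:

```r
ggplot(music_data, aes(x = valence)) +
  geom_histogram() + # defaults to bins = 30
  labs(x = "Valence", y = "Frequency") +
  theme_minimal()
```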
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
8.2.10 Task 9
Create a scatter plot showing `youtube_views` and `shazam_counts` (Bonus: add a linear regression line). Interpret the plot briefly.
# provide your code here
ggplot(music_data, aes(x = youtube_views, y = shazam_counts)) +
geom_point() +
geom_smooth(method = "lm")
## `geom_smooth()` using formula = 'y ~ x'
On average, YouTube views and Shazam counts show a positive relationship, as indicated by the positive slope of the linear regression line. However, the relationship does not appear to be linear.
8.3 Assignment 2
This assignment consists of four parts. When answering the questions, please remember to address the following points, where appropriate:
- Formulate the corresponding hypotheses and choose an appropriate statistical test
- Provide the reason for your choice and discuss if the assumptions of the test are met
- Convert the variables to the appropriate type (e.g., factor variables)
- Create appropriate graphs to explore the data (e.g., plot of means incl. confidence intervals, histogram, boxplot)
- Provide appropriate descriptive statistics for the variables (e.g., mean, median, standard deviation, etc.)
- Report and interpret the test results accurately (including confidence intervals)
- Finally, don’t forget to report your research conclusion
When you are done with your analysis, click on the “Knit to HTML” button above the code editor. This will create an HTML document of your results in the folder where the “assignment2.Rmd” file is stored. Open this file in your Internet browser to see if the output is correct. If the output is correct, submit the HTML file via Learn@WU. The file name should be “assignment2_studentID_lastname.html”.
8.3.1 Assignment 2a
As a mobile app product manager, you are looking for ways to improve user engagement and in-app purchases. Your team has launched an A/B test to analyze the effect of a new user interface (UI) feature. You have data that contains information about user behavior within your app.
The data file contains the following variables:
- userID: Unique user ID.
- exp_group: Experimental group (indicator variable w/ 2 levels: 0 = control, 1 = treatment).
- in_app_purchases: Total amount spent by a user in the app in the past month (in USD).
- time_in_app: Average time a user spends in your app per session (in minutes).
Use R and appropriate methods to answer the following questions:
- The finance department asks you to provide an estimate of the average amount spent by users through in-app purchases. Compute the 95% confidence interval for the mean amount spent and provide an interpretation of the interval.
- You run an A/B test to analyze the effect of a new UI feature on in-app purchases and time spent in the app. The information regarding which group a user has been assigned to is stored in the variable “exp_group”. Is there a significant difference regarding in-app purchases and time spent between users from the control and treatment groups? Please include the effect size (Cohen’s d) and confidence intervals in your report.
- Assume that you plan to run an experiment to test two different notification strategies. You randomly assign app users to the control and experimental conditions. How many users would you need to include in each group if you assume the effect size to be 0.1 for a significance level of 0.05 and power of 0.8?
8.3.3 Load data
app_user_data <- read.table("https://raw.githubusercontent.com/WU-RDS/MA2024/main/user_data_q1.csv",
sep = ",",
header = TRUE) #read in data
str(app_user_data)
## 'data.frame': 1600 obs. of 4 variables:
## $ userID : int 1 2 3 4 5 6 7 8 9 10 ...
## $ exp_group : int 0 0 0 0 0 0 0 0 0 0 ...
## $ in_app_purchases: num 7.2 8.85 17.79 10.35 10.65 ...
## $ time_in_app : num 18.6 23.3 17.7 17 11.4 ...
8.3.4 Question 1
First, we take a look at the descriptive statistics for the amount our customers spend on in-app purchases and at a graph of the distribution of in-app purchases per customer.
To compute the confidence interval for the average customer, we need the mean, the standard error, and the critical value of the t-distribution (because we do not know the exact variance in the population).
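Recall that this interval is computed as

$$
CI = \bar{x} \pm t_{1-\alpha/2,\ n-1} \cdot \frac{s}{\sqrt{n}}
$$

where \(\bar{x}\) is the sample mean, \(s\) the sample standard deviation, \(n\) the sample size, and \(t_{1-\alpha/2,\ n-1}\) the critical value of the t-distribution with \(n-1\) degrees of freedom.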
suppressPackageStartupMessages(library(ggplot2))
suppressPackageStartupMessages(library(psych))
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(ggstatsplot))
#First let's have a look at the purchases in the app in the data
psych::describe(app_user_data$in_app_purchases)
ggplot(app_user_data, aes(in_app_purchases)) +
geom_histogram(col = "white", fill = "lavenderblush3", bins = 50) +
geom_vline(data = app_user_data %>% dplyr::summarise(mean = mean(in_app_purchases)),aes(xintercept = mean),
linewidth = 0.7, color = "gray19") +
labs(x = "Purchases", y = "Frequency") +
ggtitle("Distribution of purchases per customer") +
theme_minimal()
# Compute mean, standard error, and confidence interval for in-app purchases
mean_purchases <- mean(app_user_data$in_app_purchases)
sd_purchases <- sd(app_user_data$in_app_purchases)
n <- nrow(app_user_data)
se_purchases <- sd_purchases / sqrt(n)
df <- n - 1
t_crit <- qt(0.975, df)
# Confidence Interval
ci_lower <- mean_purchases - t_crit * se_purchases
ci_upper <- mean_purchases + t_crit * se_purchases
print(ci_lower)
## [1] 10.87386
print(ci_upper)
## [1] 11.36392
# Alternatively: get confidence interval from t.test
t.test(app_user_data$in_app_purchases)$conf.int
## [1] 10.87386 11.36392
## attr(,"conf.level")
## [1] 0.95
The confidence interval for in-app purchases is CI = [10.87; 11.36]. Interpretation: if we took 100 samples and calculated the mean and confidence interval for each of them, the true population mean would be included in 95% of these intervals. In the sample we have, this interval spans from 10.87 to 11.36.
8.3.5 Question 2
We need to analyze if a new UI feature has an effect on in app purchases. We need to formulate the null hypothesis as the first step. In our case the null hypothesis is that the new UI feature has no effect on the mean in-app purchases, that there is no difference in the mean in-app purchases between two populations. The alternative hypothesis states that the new UI feature has an effect on the mean in-app purchases, meaning that there is a difference in the mean in-app purchases between the two populations.
We first transform the variable exp_group into a factor and inspect the data with descriptive statistics. It can already be seen that the mean of in-app purchases is higher in the treatment group. Next, we visualize the data; for this we can use a boxplot.
To test whether this difference is significant, we need to use an independent-means t-test, since we have different customers in each group, meaning that we have collected the data using a between-subjects design (the customers in one condition are independent of the customers in the other condition). The requirements are met: 1) the DV (in-app purchases) is measured on a ratio scale; 2) there are more than 30 observations in each group, so the sampling distribution of the mean is approximately normal according to the Central Limit Theorem; 3) the feature was assigned randomly, so the groups are independent; 4) Welch’s t-test corrects for unequal variances.
We then calculate the effect size (Cohen’s d) and visualize the results of the test.
# Load necessary libraries
library(ggplot2)
library(data.table)
library(lsr)
library(pwr)
library(psych)
#making IV a factor
app_user_data$exp_group <- as.factor(app_user_data$exp_group)
#looking at descriptive statistics
describeBy(app_user_data$in_app_purchases, app_user_data$exp_group) #description of control and treatment groups
##
## Descriptive statistics by group
## group: 0
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 800 10.08 4.84 10.05 10.05 4.75 0 26.21 26.21 0.09 -0.22 0.17
## ------------------------------------------------------------
## group: 1
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 800 12.15 4.94 12.27 12.12 4.98 0 28.95 28.95 0.1 -0.03 0.17
# Boxplot for In-App Purchases by Experimental Group
ggplot(app_user_data, aes(x = factor(exp_group), y = in_app_purchases, fill = factor(exp_group))) +
geom_boxplot() +
labs(title = "Boxplot of In-App Purchases by Experimental Group",
x = "Group (0 = Control, 1 = Treatment)",
y = "In-App Purchases (USD)") +
scale_fill_manual(values = c("lightblue", "lightgreen")) +
theme_minimal()
# t-test for differences in in-app purchases
t_test_purchases <- t.test(in_app_purchases ~ exp_group, data = app_user_data)
print(t_test_purchases)
##
## Welch Two Sample t-test
##
## data: in_app_purchases by exp_group
## t = -8.4689, df = 1597.3, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## -2.550210 -1.591065
## sample estimates:
## mean in group 0 mean in group 1
## 10.08357 12.15421
# Compute Cohen's d for in app purchases
cohen_d_purchases <- cohensD(in_app_purchases ~ exp_group, data = app_user_data)
print(cohen_d_purchases)
## [1] 0.4234454
#Visualization of test results
ggbetweenstats(
data = app_user_data,
plot.type = "box",
x = exp_group, #2 groups
y = in_app_purchases ,
type = "p", #default
effsize.type = "d", #display effect size (Cohen's d in output)
messages = FALSE,
bf.message = FALSE,
mean.ci = TRUE,
title = "Average revenue per customer by group"
)
Interpretation: As we can see from the descriptive statistics and the plot for the control and treatment groups, in-app purchases are higher in the group that was exposed to the new UI feature. The t-test shows a significant result because the p-value is smaller than 0.05, meaning that we can reject the null hypothesis that there is no difference in mean in-app purchases. The p-value gives the probability of finding a difference of the observed magnitude or larger if the null hypothesis were in fact true (i.e., if there were no difference between the populations). For us this means that the new UI feature indeed has an effect on average in-app purchases. Also: since 0 (the hypothetical difference in means under H0) is not included in the confidence interval, this confirms that we can reject the null hypothesis. Cohen’s d of 0.4234 suggests that the effect of the new UI feature is small to medium.
The plot shows us that in-app purchases are higher in the treatment group (Mean = 12.15) compared to the control group (Mean = 10.08). This means that, on average, in-app purchases were 2.07 higher in the treatment group than in the control group. An independent-means t-test showed that this difference is significant: t(1597) = 8.47, p < .05 (95% CI = [1.59, 2.55]); the effect size is small to medium (d = 0.42).
Now we can look at the influence of the new UI feature on the time spent in app.
First, we need to formulate the null hypothesis. In this case the null hypothesis is that the new UI feature has no effect on the mean time spent in app, that there is no difference in the mean time spent in app between two populations. The alternative hypothesis states that the new UI feature has an effect on the mean time spent in app, meaning that there is a difference in the mean time in app between the two populations.
First, we inspect the data with descriptive statistics. It can already be seen that the mean time spent in the app is slightly higher in the treatment group. Next, we visualize the data; for this we can use a boxplot.
We can use the independent-means t-test because again the necessary assumptions are met: 1) the dependent variable (time in app) is measured on a ratio scale; 2) we have more than 30 observations per group; 3) the groups are independent.
We then calculate the effect size (Cohen’s d) and visualize the results of the test.
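The chunk producing the following descriptive statistics is not shown; it corresponds to, e.g.:

```r
describeBy(app_user_data$time_in_app, app_user_data$exp_group)
```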
##
## Descriptive statistics by group
## group: 0
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 800 19.99 4.98 19.83 20.02 5.03 4.76 36.45 31.69 -0.04 -0.04 0.18
## ------------------------------------------------------------
## group: 1
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 800 21.92 5.02 21.7 21.88 4.81 7.41 39.11 31.7 0.1 -0.05 0.18
# Boxplot for Time Spent in the App
ggplot(app_user_data, aes(x = factor(exp_group), y = time_in_app, fill = factor(exp_group))) +
geom_boxplot() +
labs(title = "Boxplot of Time Spent in App by Experimental Group",
x = "Group (0 = Control, 1 = Treatment)",
y = "Time Spent in App (minutes)") +
scale_fill_manual(values = c("lightblue", "lightgreen")) +
theme_minimal()
# t-test for differences in time spent in app
t_test_time_in_app <- t.test(time_in_app ~ exp_group, data = app_user_data)
print(t_test_time_in_app)
##
## Welch Two Sample t-test
##
## data: time_in_app by exp_group
## t = -7.7392, df = 1597.9, p-value = 1.762e-14
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## -2.426037 -1.444958
## sample estimates:
## mean in group 0 mean in group 1
## 19.98501 21.92051
# Compute Cohen's d for time spent in app
cohen_d_time_in_app <- cohensD(time_in_app ~ exp_group, data = app_user_data)
print(cohen_d_time_in_app)
## [1] 0.3869598
#Visualization of test results
ggbetweenstats(
data = app_user_data,
plot.type = "box",
x = exp_group, #2 groups
y = time_in_app ,
type = "p", #default
effsize.type = "d", #display effect size (Cohen's d in output)
messages = FALSE,
bf.message = FALSE,
mean.ci = TRUE,
title = "Average revenue per customer by group"
)
Interpretation: From the descriptive statistics and the boxplot, we can already see that the mean time spent in the app is higher in the treatment group. However, we need to conduct the test to see if this result is significant. The t-test shows a significant result because the p-value is smaller than 0.05, meaning that we can reject the null hypothesis that there is no difference in mean time spent in the app. The p-value gives the probability of finding a difference of the observed magnitude or larger if the null hypothesis were in fact true (i.e., if there were no difference between the populations). For us this means that the new UI feature indeed has an effect on the average time spent in the app. Also: since 0 (the hypothetical difference in means under H0) is not included in the confidence interval, this confirms that we can reject the null hypothesis. Cohen’s d of 0.387 suggests that the effect of the new UI feature is small to medium.
The plot shows us that time spent in the app is higher in the treatment group (Mean = 21.92) compared to the control group (Mean = 19.99). This means that, on average, the time spent in the app was 1.93 minutes higher in the treatment group than in the control group. An independent-means t-test showed that this difference is significant: t(1598) = 7.74, p < .05 (95% CI = [1.45, 2.43]); the effect size is small to medium (d = 0.39).
8.3.6 Question 3
To determine the number of users that should be assigned to the two conditions, the pwr.t.test() function can be used. If the goal of the experiment is simply to detect a significant difference between the groups, the sample size calculation should be based on a two-sided test.
Given the effect size = 0.1, significance level = 0.05, and power = 0.8, sample size for each group will be:
# provide your code here (you can use multiple code chunks per question if you like)
# Power analysis for sample size calculation
sample_size <- pwr.t.test(d = 0.1, sig.level = 0.05, power = 0.8, type = "two.sample")
print(sample_size)
##
## Two-sample t test power calculation
##
## n = 1570.733
## d = 0.1
## sig.level = 0.05
## power = 0.8
## alternative = two.sided
##
## NOTE: n is number in *each* group
To detect our assumed effect size of 0.1 at a significance level of 0.05 with a power of 0.8, we would need to include at least 1571 customers per group in the experiment.
8.3.7 Assignment 2b
After conducting the experiment described above, you would like to find out whether push notifications can further improve user engagement with your mobile app. You expose a set of users, who were already exposed to the new UI feature, to push notifications and record the time they spend in the app before and after implementing the notifications.
You obtain a new data set with the following variables:
- userID: Unique user ID.
- time_in_app_1: Average time (in minutes) a user spends in your app per session before receiving push notifications.
- time_in_app_2: Average time (in minutes) a user spends in your app per session after receiving push notifications.
Use R and appropriate methods to answer the following question:
- Did the push notifications lead to a significant increase in the time that users spend in the app compared to before the notifications were implemented? Conduct an appropriate statistical test to determine if the difference is statistically significant. Please include the effect size (Cohen’s d) and the confidence intervals in your report.
8.3.9 Load data
app_user_data_time <- read.table("https://raw.githubusercontent.com/WU-RDS/MA2024/main/user_data_q2.csv",
sep = ",",
header = TRUE) #read in data
str(app_user_data_time)
## 'data.frame': 417 obs. of 3 variables:
## $ userID : int 1 2 3 4 5 6 7 8 9 10 ...
## $ time_in_app_1: num 21.2 38.4 22.7 25.3 28.2 ...
## $ time_in_app_2: num 30 26.4 27.4 30.7 30.8 ...
8.3.10 Question 4
We want to examine if push notifications have an effect on average time a user spends in the app. The null hypothesis here is that there is no difference in the mean time spent in the app for the same customers between the presence and absence of push notifications. Because the observations come from the same population of customers (a within-subject design), we refer to the difference in the means for the same population when stating our hypotheses. The alternative hypothesis states that there is a difference between the time in app for the same customers.
We start our analysis with looking at the descriptive statistics and at the plot. Then we conduct a dependent means t-test to see if the difference is significant.
# provide your code here (you can use multiple code chunks per question if you like)
suppressPackageStartupMessages(library(Rmisc))
library(tidyr)
#Descriptive statistics
psych::describe(app_user_data_time[!is.na(app_user_data_time$time_in_app_2), c("time_in_app_1","time_in_app_2")])
#Boxplot
time_data <- app_user_data_time |>
drop_na(time_in_app_2) |>
select(time_in_app_1, time_in_app_2) |>
pivot_longer(
cols = c(time_in_app_1, time_in_app_2),
names_to = "push_notifications", values_to = "time_in_app")
ggplot(time_data, aes(x = push_notifications, y = time_in_app)) + geom_boxplot() +
geom_jitter(alpha = 0.2, color = "lavenderblush4") +
labs(x = "", y = "Time spent in app", title = "Boxplot of time in app by group") +
theme_minimal()
# Paired t-test for time spent in app before and after push notifications
t_test_result <- t.test(app_user_data_time$time_in_app_2, app_user_data_time$time_in_app_1, paired = TRUE)
print(t_test_result)
##
## Paired t-test
##
## data: app_user_data_time$time_in_app_2 and app_user_data_time$time_in_app_1
## t = 5.8341, df = 416, p-value = 1.088e-08
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## 1.346398 2.714706
## sample estimates:
## mean difference
## 2.030552
# Compute Cohen's d for paired samples
cohen_d_result <- cohensD(app_user_data_time$time_in_app_2, app_user_data_time$time_in_app_1, method = "paired")
print(cohen_d_result)
## [1] 0.2856969
#Visualization of the test
ggwithinstats(
data = time_data,
x = push_notifications,
y = time_in_app,
path.point = FALSE,
path.mean = TRUE,
title = "Time on site with or without push notifications",
messages = FALSE,
bf.message = FALSE,
mean.ci = TRUE,
effsize.type = "d" # display effect size (Cohen's d in output)
)
Interpretation: It appears that there is a difference in the means from the descriptive statistics and the plots. To test whether it is significant, we need to run a t-test. This time we need a different version of the t-test because the same customers are observed for the app with and without push notifications (the same customers are shown both versions of the app). This means that we need a dependent means t-test, or paired samples t-test. The other assumptions are identical to the independent-means t-test.
The p-value is lower than the significance level of 5% (p < .05), which means that we can reject the null hypothesis that there is no difference in the mean time in app between absence and presence of push notifications. The confidence interval confirms the conclusion to reject the null hypothesis since 0 is not contained in the range of plausible values.
The Cohen’s d effect size of 0.2857 shows us that the effect is rather small.
The results of the experiment show that the same users spent, on average, 2.03 minutes longer in the app when it included push notifications (Mean = 27.24) compared to the app without push notifications (Mean = 25.21). This difference is significant: t(416) = 5.83, p < .05 (95% CI = [1.35, 2.72]); the effect size is small (d = 0.29).
This means that it makes sense to add push notifications to the app as standard practice.
8.3.11 Assignment 2c
As a marketing analyst for an online retailer, you’re tasked with evaluating how different levels of GDPR-compliant behavioral targeting affect purchase behavior. Given the restrictions imposed by GDPR on using personal data, the retailer conducts an experiment with three levels of targeting:
- No targeting: Users receive no targeted ads (default ad experience).
- Segment-based targeting: Ads are tailored using aggregate-level data (e.g., based on product categories users browse, not their personal data).
- Individual personalized behavioral targeting: Ads are personalized based on the specific behavior of individual users (using compliant first-party data).
You obtain a data set with the following variables:
- customerID: Unique customer ID.
- revenue: Total revenue generated by the customer during the experiment (in USD).
- satisfaction: Customer satisfaction score from a post-purchase survey (measured in 11 categories from 0 [very dissatisfied] to 10 [very satisfied]).
- targeting: Type of targeting the customer was exposed to (1 = no targeting, 2 = segment-based targeting, 3 = personalized behavioral targeting).
Use R and appropriate methods to answer the following question:
- Are there significant differences in revenue between the three targeting strategies?
- Did the targeting strategy significantly influence customer satisfaction?
8.3.13 Load data
targeting_data <- read.table("https://raw.githubusercontent.com/WU-RDS/MA2024/main/user_targeting_data.csv",
sep = ",",
header = TRUE) #read in data
str(targeting_data)
## 'data.frame': 300 obs. of 4 variables:
## $ customerID : int 1 2 3 4 5 6 7 8 9 10 ...
## $ targeting : chr "Personalized Targeting" "Personalized Targeting" "Personalized Targeting" "Segment-Based Targeting" ...
## $ revenue : num 217 181 182 181 143 ...
## $ satisfaction: int 6 7 1 10 5 0 10 9 5 4 ...
8.3.14 Question 5
To answer the question of whether the type of targeting has an effect on revenue, we need to formulate the null hypothesis first. In this case, the null hypothesis is that the average level of sales is equal for all three targeting types. The alternative hypothesis states that mean revenue is not equal among three targeting types.
The appropriate test for such a hypothesis is a one-way ANOVA, since we have a metric-scaled dependent variable and a categorical independent variable with more than two levels. First, we need to recode the independent variable into a factor. Next, we take a look at descriptive statistics for the data and create appropriate plots.
Before we move to the formal test, we need to see if a series of assumptions are met, namely: 1) Independence of observations; 2) Distributional assumptions; 3) Homogeneity of variances.
Due to the fact that there are more than 30 observations in each group, we can rely on the Central Limit Theorem to satisfy the distributional assumptions. We can additionally check this assumption using the Shapiro-Wilk normality test and Q-Q plots. Homogeneity of variances can be checked with Levene’s test.
After checking that the assumptions are met, we can proceed with ANOVA and show also the plot for the test. Next we will briefly inspect the residuals of the ANOVA to see if the assumptions of the test really are justified.
The ANOVA result only tells us that the means of the three groups are not equal, but it does not tell us anything about which pairs of means are unequal. To find this out we need to conduct a post-hoc test.
# provide your code here (you can use multiple code chunks per question if you like)
targeting_data$targeting <- factor(targeting_data$targeting,
levels = c("Personalized Targeting",
"Segment-Based Targeting",
"No Targeting"))
#Descriptive statistics
describeBy(targeting_data$revenue, targeting_data$targeting)
##
## Descriptive statistics by group
## group: Personalized Targeting
## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 90 158.33 27.45 159.56 159.58 28.32 79.61 217.03 137.42 -0.38 -0.33
## se
## X1 2.89
## ------------------------------------------------------------
## group: Segment-Based Targeting
## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 98 135.43 34.72 135.43 135.17 40.54 40.68 221.25 180.57 0.03 -0.29
## se
## X1 3.51
## ------------------------------------------------------------
## group: No Targeting
## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 112 117.97 30.62 121.21 117.98 29.01 28.87 193.63 164.76 -0.1 -0.05
## se
## X1 2.89
#Visual inspection of data
ggplot(targeting_data,aes(x = targeting, y = revenue)) +
geom_boxplot() +
geom_jitter(colour="red", alpha = 0.1) +
theme_bw() +
labs(x = "Targeting", y = "Revenue")+
theme_bw() +
theme(plot.title = element_text(hjust = 0.5,color = "#666666"))
#Distributional assumptions - checking for normal distributions
#test for normal distribution of variables - Shapiro-Wilk test
by(targeting_data$revenue, targeting_data$targeting, shapiro.test)
## targeting_data$targeting: Personalized Targeting
##
## Shapiro-Wilk normality test
##
## data: dd[x, ]
## W = 0.98123, p-value = 0.2199
##
## ------------------------------------------------------------
## targeting_data$targeting: Segment-Based Targeting
##
## Shapiro-Wilk normality test
##
## data: dd[x, ]
## W = 0.98853, p-value = 0.5629
##
## ------------------------------------------------------------
## targeting_data$targeting: No Targeting
##
## Shapiro-Wilk normality test
##
## data: dd[x, ]
## W = 0.99486, p-value = 0.9555
qqnorm(targeting_data[targeting_data$targeting == "Personalized Targeting", ]$revenue)
qqline(targeting_data[targeting_data$targeting == "Personalized Targeting", ]$revenue)
qqnorm(targeting_data[targeting_data$targeting == "Segment-Based Targeting", ]$revenue)
qqline(targeting_data[targeting_data$targeting == "Segment-Based Targeting", ]$revenue)
qqnorm(targeting_data[targeting_data$targeting == "No Targeting", ]$revenue)
qqline(targeting_data[targeting_data$targeting == "No Targeting", ]$revenue)
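The code for Levene’s test is not shown above; a sketch using `leveneTest()` from the `car` package (loading `car` also triggers the `carData` message below) would be:

```r
library(car)
# H0: equal variances across targeting groups
leveneTest(revenue ~ targeting, data = targeting_data)
```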
## Loading required package: carData
# Perform ANOVA for revenue across targeting strategies
anova_result <- aov(revenue ~ targeting, data = targeting_data)
summary(anova_result)
## Df Sum Sq Mean Sq F value Pr(>F)
## targeting 2 81290 40645 41.9 <2e-16 ***
## Residuals 297 288110 970
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#Visualize the test
library(ggstatsplot)
ggbetweenstats(
data = targeting_data,
x = targeting,
y = revenue,
plot.type = "box",
pairwise.comparisons = TRUE,
pairwise.annotation = "p.value",
p.adjust.method = "bonferroni",
effsize.type = "eta", #if var.equal = FALSE, returns partial eta^2
var.equal = TRUE,
mean.plotting = TRUE,
mean.ci = TRUE,
mean.label.size = 2.5,
type = "parametric",
k = 3,
outlier.label.color = "darkgreen",
title = "Comparison of revenues between groups",
xlab = "Experimental group",
ylab = "Revenue",
messages = FALSE,
bf.message = FALSE,
)
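The chunk producing the following residual check is not shown; judging from the output, it corresponds to:

```r
shapiro.test(resid(anova_result))
```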
##
## Shapiro-Wilk normality test
##
## data: resid(anova_result)
## W = 0.99499, p-value = 0.4388
#Effect size
summary(anova_result)[[1]]$"Sum Sq"[1]/(summary(anova_result)[[1]]$"Sum Sq"[1] +
summary(anova_result)[[1]]$"Sum Sq"[2])
## [1] 0.2200593
# Tukey's post-hoc test for pairwise comparisons
tukey_result <- TukeyHSD(anova_result)
print(tukey_result)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = revenue ~ targeting, data = targeting_data)
##
## $targeting
## diff lwr upr
## Segment-Based Targeting-Personalized Targeting -22.89817 -33.60926 -12.18709
## No Targeting-Personalized Targeting -40.35672 -50.74238 -29.97106
## No Targeting-Segment-Based Targeting -17.45855 -27.60645 -7.31064
## p adj
## Segment-Based Targeting-Personalized Targeting 0.0000025
## No Targeting-Personalized Targeting 0.0000000
## No Targeting-Segment-Based Targeting 0.0001909
# Boxplot for Revenue by Targeting Strategy
ggplot(targeting_data, aes(x = targeting, y = revenue, fill = targeting)) +
geom_boxplot() +
labs(title = "Boxplot of Revenue by Targeting Strategy",
x = "Targeting Strategy",
y = "Revenue (USD)") +
scale_fill_manual(values = c("lightblue", "lightgreen", "lightcoral")) +
theme_minimal()
Both the summary statistics and the plot show that the means are not equal among the three groups. Especially the difference between personalized targeting and no targeting seems to be quite large.
First, we check if the assumptions of ANOVA are met. The insignificant results of the Shapiro-Wilk tests show that we cannot reject the null hypothesis that the data in each group are normally distributed. The Q-Q plots lead to the same conclusion.
The null hypothesis of Levene’s test is that the variances are equal, with the alternative hypothesis being that the variances are not all equal. The insignificant result of this test indicates that the variances are equal, so this assumption is also met.
The ANOVA shows a p-value lower than 0.05, which means we can reject the null hypothesis that the mean revenue is the same for all three groups with different types of targeting.
The insignificant result of the Shapiro-Wilk test on the residuals (null hypothesis: the residuals are normally distributed) confirms that the residual assumption is justified as well.
According to the test, different types of targeting have an effect on revenue: F = 41.9, p < 0.05, with an effect size of η² = 0.22 (which is conventionally considered a large effect).
Tukey’s HSD test compares all three groups pairwise, and the results show that we can reject the null hypothesis in all three cases, which means that the mean revenues are all significantly different from each other. None of the CIs cross 0, which further indicates that all differences in means are statistically significantly different from 0.
From a reporting standpoint we can say that revenue is higher when using personalized targeting. This means that personalized targeting helps us to increase sales and should thus be the preferred choice.
8.3.15 Question 6
For this question we want to examine whether the customer satisfaction is significantly different for the groups with three different types of targeting. Because we are dealing with data on an ordinal scale, we cannot use ANOVA for this analysis. The non-parametric counterpart is the Kruskal-Wallis test, which tests for differences in medians between more than two groups. Hence, the null hypothesis is that the medians are equal in each group, and the alternative hypothesis is that there is a difference in medians.
First, we inspect the descriptive statistics and the plot. The only assumption for the Kruskal-Wallis test is that the DV is measured on at least an ordinal scale, and this assumption is met.
# provide your code here (you can use multiple code chunks per question if you like)
#Descriptive statistics by for customer satisfaction by group
describeBy(targeting_data$satisfaction, targeting_data$targeting)
##
## Descriptive statistics by group
## group: Personalized Targeting
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 90 5.23 3.01 6 5.32 2.97 0 10 10 -0.19 -1.08 0.32
## ------------------------------------------------------------
## group: Segment-Based Targeting
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 98 5 3.37 5 5 4.45 0 10 10 -0.02 -1.47 0.34
## ------------------------------------------------------------
## group: No Targeting
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 112 4.63 2.89 5 4.52 4.45 0 10 10 0.15 -1.14 0.27
# Boxplot for Satisfaction by Targeting Strategy
ggplot(targeting_data, aes(x = targeting, y = satisfaction, fill = targeting)) +
geom_boxplot() +
labs(title = "Boxplot of Customer Satisfaction by Targeting Strategy",
x = "Targeting Strategy",
y = "Satisfaction Score") +
scale_fill_manual(values = c("lightblue", "lightgreen", "lightcoral")) +
theme_minimal()
# Kruskal-Wallis test for satisfaction across targeting strategies
kruskal_result <- kruskal.test(satisfaction ~ targeting, data = targeting_data)
print(kruskal_result)
##
## Kruskal-Wallis rank sum test
##
## data: satisfaction by targeting
## Kruskal-Wallis chi-squared = 2.0659, df = 2, p-value = 0.356
ggbetweenstats(
data = targeting_data,
plot.type = "box",
x = targeting, #3 groups
y = satisfaction,
type = "nonparametric",
pairwise.comparisons = TRUE,
pairwise.annotation = "p.value",
p.adjust.method = "bonferroni",
messages = FALSE,
title = "Median satisfaction for different targeting groups"
)
We can see from the descriptive statistics and from the boxplot that the median customer satisfaction for customers with personalized targeting is slightly higher (= 6) than for the other two targeting types (= 5).
The p-value of the Kruskal-Wallis test is higher than 0.05, which indicates that we cannot reject the null hypothesis. This means that the medians of customer satisfaction do not differ significantly among the targeting groups.
8.3.16 Assignment 2d
As a digital marketing manager, you want to evaluate the effectiveness of a new email subscription pop-up feature designed to increase newsletter signups. You run an A/B test where some visitors to your website see the new subscription pop-up, while others experience the regular sign-up option without a pop-up. Your goal is to compare the conversion rate (whether visitors signed up for the newsletter) between the control group (no pop-up) and the treatment group (pop-up).
You obtain a new data set with the following variables:
- customerID: Unique customer ID.
- conversion: Indicator variable for whether a visitor signed up for the newsletter (0 = no, 1 = yes).
- exp_group: Experimental group (0 = control, no pop-up; 1 = treatment, pop-up).
- Did the new email subscription pop-up have a significant effect on the conversion rate?
8.3.18 Load data
conversion_data <- read.table("https://raw.githubusercontent.com/WU-RDS/MA2024/main/conversion_data.csv",
sep = ",",
header = TRUE) #read in data
str(conversion_data)
## 'data.frame': 487 obs. of 3 variables:
## $ customerID: int 1 2 3 4 5 6 7 8 9 10 ...
## $ exp_group : chr "Control" "Treatment" "Treatment" "Control" ...
## $ conversion: int 0 0 0 0 0 0 1 1 1 1 ...
8.3.19 Question 7
To find out if a new email subscription pop-up feature has an effect on the conversion rate, we can use a test for proportions. To test for the equality of proportions (and therefore no difference between them) we can use a Chi-squared test.
Our null hypothesis in this case states that the proportions of conversion are the same for groups with and without the subscription pop-up feature. Our alternative hypothesis states that these proportions are unequal. First, we have to recode the relevant variables into factors. Then we create a contingency table and a plot to take a look at the proportions of conversion rates in the control and treatment groups. We can then conduct the formal Chi-squared test to see if the difference in conversion rates is significant.
# provide your code here (you can use multiple code chunks per question if you like)
#Recoding variables into factors
conversion_data$exp_group <- factor(conversion_data$exp_group,
levels = c("Control",
"Treatment"))
conversion_data$conversion <- factor(conversion_data$conversion, levels = c(0,1), labels = c("no", "yes"))
# Create a contingency table for conversions
conversion_table <- table(conversion_data$exp_group, conversion_data$conversion)
print(conversion_table)
##
## no yes
## Control 206 47
## Treatment 165 69
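The row-wise proportions printed below are not produced by the code shown above; they can be obtained with, e.g.:

```r
prop.table(conversion_table, 1) # conversion shares within each experimental group
```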
##
## no yes
## Control 0.8142292 0.1857708
## Treatment 0.7051282 0.2948718
#Visualization
rel_freq_table <- as.data.frame(prop.table(table(conversion_data$exp_group, conversion_data$conversion), 1))
names(rel_freq_table) <- c("exp_group", "conversion","freq") # changing names of the columns
rel_freq_table
ggplot(rel_freq_table, aes(x = exp_group, y = freq, fill = conversion)) + #plot data
geom_col(width = .7) + #position
geom_text(aes(label = paste0(round(freq*100,0),"%")), position = position_stack(vjust = 0.5), size = 4) + #add percentages
ylab("Proportion of conversions") + xlab("Exp group") + # specify axis labels
theme_bw()
# Proportion test to compare conversion rates between groups
prop_test_result <- prop.test(conversion_table)
print(prop_test_result)
##
## 2-sample test for equality of proportions with continuity correction
##
## data: conversion_table
## X-squared = 7.3844, df = 1, p-value = 0.006579
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## 0.02942325 0.18877884
## sample estimates:
## prop 1 prop 2
## 0.8142292 0.7051282
table <- table(conversion_data$conversion,conversion_data$exp_group)
chisq.test(table, correct = TRUE)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: table
## X-squared = 7.3844, df = 1, p-value = 0.006579
#effect size
test_stat <- chisq.test(conversion_table, correct = FALSE)$statistic
n <- nrow(conversion_data)
phi1 <- sqrt(test_stat/n)
phi1
## X-squared
## 0.127962
We can see in the contingency table and in the plot that the conversion rate of 29% in the treatment group is higher than the conversion rate of 19% in the control group. To see if this difference is significant, we conducted a formal chi-squared test. The test clearly shows that the p-value is lower than 0.05, so the effect of the treatment on the conversion rate is statistically significant. We also calculated the effect size (Phi = 0.128), which is rather small.
From a managerial perspective, it makes sense to include the new email subscription pop-up feature since it significantly increases the conversion rate, although the effect size is rather small.