import matplotlib as mpl
import math
import pandas as pd
import numpy as np
import scipy
from pandasql import sqldf
import wmfdata
from wmfdata import hive, mariadb, spark
import matplotlib.pyplot as plt
import seaborn as sns
Wiki Highlights Experiment
Summary
A Wiki Highlight is a concise overview of text generated from the lead and other sections of a Wikipedia article, combined with a relevant image. Its purpose is to highlight the relevant facts from lengthy paragraphs.
The experiment ran from January 4th to January 16th in six countries: Brazil, Germany, India, Indonesia, Nigeria, and the United States. Participants in each country were randomly assigned to one of two versions of content hosted on microsites: the highlight version or the article version. The content is sourced from English Wikipedia and Commons, as featured in this list. Participants could read their assigned version and choose whether to continue reading more or exit the microsite.
Purpose
We measure the following metrics to understand whether Wiki Highlights is a viable reading experience for global youth audiences on third-party platforms.
Primary metric
- Time on site (session length)
  - Total time = Time on homepage + Time on content page
  - Time on homepage
  - Time on content page
Secondary metrics
- Summaries completion rate
- Number of summaries consumed per session
- Popular topics
Data Preparation
spark_session = wmfdata.spark.create_session(app_name='pyspark regular; wiki-highlights',
                                             type='yarn-regular', # local, yarn-regular, yarn-large
                                             )
country_list = ('Brazil', 'Germany', 'India', 'Indonesia', 'Nigeria', 'United States')
## Adding function for percentile
def percentile(n):
    def percentile_(x):
        return x.quantile(n)
    # Name the aggregate after its percentile, e.g. percentile_50
    percentile_.__name__ = 'percentile_{:02.0f}'.format(n*100)
    return percentile_
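To illustrate what the helper above produces, here is a minimal sketch in plain Python that runs without pandas. `quantile_linear` is a hypothetical stand-in for `Series.quantile`, assuming the linear interpolation that pandas uses by default; the naming logic is copied from `percentile`, and the sample values are made up.

```python
def quantile_linear(values, q):
    """Quantile with linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac


def percentile(n):
    def percentile_(values):
        return quantile_linear(values, n)
    # Same naming rule as the notebook helper
    percentile_.__name__ = 'percentile_{:02.0f}'.format(n * 100)
    return percentile_


p50 = percentile(0.5)
p95 = percentile(0.95)
sample = [2.0, 4.0, 6.0, 8.0, 10.0]  # made-up session lengths in seconds

print(p50.__name__, p50(sample))
print(p95.__name__, p95(sample))
```

The `__name__` assignment is what gives the aggregated columns their `percentile_50`, `percentile_75`, and similar labels in the tables below.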
Wiki highlights Event Data
Collect event data from the wiki_highlights_experiment schema for the test period, January 4th to January 16th.
event_data_query = """
SELECT
meta.dt as server_dt,
experiment_group,
geocoded_data['country'] as user_country,
md5(concat(http.client_ip, '+{salt}')) as ip_hash,
session_id, event_type,
page_name,
CASE WHEN page_name IN ('categories_highlights', 'categories_articles') THEN 'homepage' ELSE topic END AS topic, -- hard code homepage
CASE WHEN page_name IN ('categories_highlights', 'categories_articles') THEN 'homepage' ELSE category_name END AS category_name, -- hard code homepage
page_bottom_was_visible, time_length_ms
FROM event.inuka_wiki_highlights_experiment e
LEFT JOIN cchen.wiki_highlights_article_list l ON e.page_name = l.article_title
WHERE
(year = 2024 AND month = 1 AND day >=4 AND day <= 16)
"""
event_data = spark.run(event_data_query)
# store data in GlobalTempView
event_sdf = spark_session.createDataFrame(event_data)
event_sdf.createGlobalTempView("event_data_view")
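The `ip_hash` column in the query is a salted MD5 digest. As a rough plain-Python sketch of the same expression, assuming the salt is a secret string supplied at run time (the `hash_ip` helper and the sample values are hypothetical):

```python
import hashlib


def hash_ip(client_ip: str, salt: str) -> str:
    """Mirror md5(concat(client_ip, '+<salt>')): hex digest of ip + '+' + salt."""
    return hashlib.md5(f"{client_ip}+{salt}".encode("utf-8")).hexdigest()


# 192.0.2.1 is a reserved documentation address; the salts are made up.
digest_a = hash_ip("192.0.2.1", "example-salt")
digest_b = hash_ip("192.0.2.1", "other-salt")

print(digest_a)
print(digest_b)
```

The same address always maps to the same digest under one salt, while a different salt yields an unrelated digest.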
Metrics
Time on Site (Session Length)
The metric indicates users’ willingness to consume articles and highlights. All the times we calculate are in seconds.
time_on_site_query = """
SELECT
experiment_group,
session_id,
SUM(time_length_ms)/1000 AS total_length,
SUM(CASE WHEN topic = 'homepage' THEN time_length_ms END)/1000 AS home_length,
SUM(CASE WHEN topic != 'homepage' THEN time_length_ms END)/1000 AS content_length
FROM global_temp.event_data_view
WHERE event_type = 'pageUnloaded'
GROUP BY experiment_group, session_id
"""
time_on_site = spark.run(time_on_site_query)
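The aggregation in the query above can be sketched in plain Python, without a Spark session. The events below are hypothetical (field names follow the query, values and the `otherEvent` type are made up); `None` plays the role of SQL `NULL`, which is what `SUM(CASE WHEN ... END)` returns when no row matches.

```python
from collections import defaultdict

# Hypothetical events; field names follow the query, values are made up.
events = [
    {"experiment_group": "control", "session_id": "s1", "event_type": "pageUnloaded",
     "topic": "homepage", "time_length_ms": 12000},
    {"experiment_group": "control", "session_id": "s1", "event_type": "pageUnloaded",
     "topic": "Sport", "time_length_ms": 30500},
    {"experiment_group": "experiment", "session_id": "s2", "event_type": "pageUnloaded",
     "topic": "homepage", "time_length_ms": 8000},
    # Not a pageUnloaded event, so the WHERE clause drops it.
    {"experiment_group": "experiment", "session_id": "s2", "event_type": "otherEvent",
     "topic": "Sport", "time_length_ms": 99000},
]


def add_ms(current, ms):
    """SUM semantics: stays None (NULL) until the first matching row."""
    return ms if current is None else current + ms


sums = defaultdict(lambda: {"total": None, "home": None, "content": None})
for e in events:
    if e["event_type"] != "pageUnloaded":
        continue
    row = sums[(e["experiment_group"], e["session_id"])]
    row["total"] = add_ms(row["total"], e["time_length_ms"])
    if e["topic"] == "homepage":
        row["home"] = add_ms(row["home"], e["time_length_ms"])
    elif e["topic"] is not None:  # a NULL topic matches neither CASE branch
        row["content"] = add_ms(row["content"], e["time_length_ms"])


def to_seconds(ms):
    return None if ms is None else ms / 1000


time_on_site_rows = {
    session: {name + "_length": to_seconds(ms) for name, ms in row.items()}
    for session, row in sums.items()
}

for session, row in time_on_site_rows.items():
    print(session, row)
```

A session with no content-page events keeps `content_length` as `None`, which is the condition the homepage-only check relies on.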
## Check % of sessions with only homepage visits, no content page visits
sqldf("""
SELECT
experiment_group,
SUM(CASE WHEN content_length IS NULL THEN 1 END)*100 / COUNT(1) AS hp_only_pct
FROM time_on_site
GROUP BY experiment_group
""")
index | experiment_group | hp_only_pct |
---|---|---|
0 | control | 51 |
1 | experiment | 52 |
Sessions with only homepage visits made up 51% of the control group and 52% of the experiment group.
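The whole-number percentages above are consistent with integer division, which is what SQLite (the default pandasql backend) does when both operands are integers. A small sketch with made-up session counts shows the difference between the truncated and the exact share; the counts are hypothetical, not the experiment's.

```python
# Hypothetical counts for one group: sessions with no content-page time, and all sessions.
homepage_only_sessions = 517
all_sessions = 1000

truncated_pct = homepage_only_sessions * 100 // all_sessions  # integer division, as in the query
exact_pct = homepage_only_sessions * 100 / all_sessions       # floating-point share

print(truncated_pct, exact_pct)
```

Multiplying by `100.0` instead of `100` in the query would be one way to keep the decimals.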
Total Time
time_grouped = time_on_site.groupby('experiment_group')
total_time_column = time_grouped['total_length']
total_time_column.agg([percentile(0.5), percentile(0.75), percentile(0.90), percentile(0.95)])
experiment_group | percentile_50 | percentile_75 | percentile_90 | percentile_95 |
---|---|---|---|---|
control | 18.942 | 41.227 | 95.2176 | 188.6705 |
experiment | 20.524 | 46.332 | 109.9998 | 197.8194 |
sns.set_style("white")
fig, ax = plt.subplots(figsize=(10,5))

# fill=True replaces the deprecated shade=True
fig = sns.kdeplot(data=time_on_site, x="total_length", hue="experiment_group", fill=True, log_scale=True, clip=(-1,3.5))
fig.set(yticklabels=[])
fig.set(ylabel=None)
fig.set(xticklabels=[0,0,0,1,10,100,1000])

plt.xlabel("Seconds")
plt.title('Total Time Spent')
sns.set(rc={'figure.figsize':(15,5)})
sns.set_style("white")

sns.boxplot(data=time_on_site, x="total_length", hue="experiment_group", showfliers=False, gap=.5)

plt.xlabel("Seconds")
plt.title('Total Time Spent')
In the control group, 50% of sessions had a total time of 19 seconds or less; in the experiment group, 50% of sessions had a total time of 21 seconds or less.
In the control group, 95% of sessions had a total time of 189 seconds or less; in the experiment group, 95% of sessions had a total time of 198 seconds or less.
At every reported percentile, sessions in the experiment group spent more total time on the site (homepage plus content pages) than sessions in the control group.
Time on Homepage
home_time_column = time_grouped['home_length']
home_time_column.agg([percentile(0.5), percentile(0.75), percentile(0.90), percentile(0.95)])
experiment_group | percentile_50 | percentile_75 | percentile_90 | percentile_95 |
---|---|---|---|---|
control | 14.348 | 27.75075 | 51.5783 | 76.13905 |
experiment | 14.877 | 27.59550 | 52.5068 | 78.76800 |
sns.set_style("white")
fig, ax = plt.subplots(figsize=(10,5))

# fill=True replaces the deprecated shade=True
fig = sns.kdeplot(data=time_on_site, x="home_length", hue="experiment_group", fill=True, log_scale=True, clip=(-1,3.5))
fig.set(yticklabels=[])
fig.set(ylabel=None)
fig.set(xticklabels=[0,0,0,1,10,100,1000])

plt.xlabel("Seconds")
plt.title("Homepage Time Spent")
# Keep the Axes returned by boxplot so move_legend acts on this plot
ax = sns.boxplot(data=time_on_site, x="home_length", hue="experiment_group", showfliers=False, gap=.5)

plt.xlabel("Seconds")
plt.title('Time Spent on Homepage')
sns.move_legend(ax, "upper left", bbox_to_anchor=(1, 1))
In the control group, 50% of sessions spent 14 seconds or less on the homepage; in the experiment group, 50% of sessions spent 15 seconds or less.
In the control group, 95% of sessions spent 76 seconds or less on the homepage; in the experiment group, 95% of sessions spent 79 seconds or less.
Users in the experiment group spent about as much time on the homepage as users in the control group.
Time on Content Page
content_time_column = time_grouped['content_length']
content_time_column.agg([percentile(0.5), percentile(0.75), percentile(0.90), percentile(0.95)])
experiment_group | percentile_50 | percentile_75 | percentile_90 | percentile_95 |
---|---|---|---|---|
control | 9.698 | 29.08675 | 116.2183 | 214.5830 |
experiment | 11.725 | 46.27050 | 129.4230 | 260.2247 |
sns.set_style("white")
fig, ax = plt.subplots(figsize=(10,5))

# fill=True replaces the deprecated shade=True
fig = sns.kdeplot(data=time_on_site, x="content_length", hue="experiment_group", fill=True, log_scale=True, clip=(-1,3.5))
fig.set(yticklabels=[])
fig.set(ylabel=None)
fig.set(xticklabels=[0,0,0,1,10,100,1000])

plt.xlabel("Seconds")
plt.title("Content Time Spent")
# Keep the Axes returned by boxplot so move_legend acts on this plot
ax = sns.boxplot(data=time_on_site, x="content_length", hue="experiment_group", showfliers=False, gap=.5)

plt.xlabel("Seconds")
plt.title('Time Spent on Content pages')
sns.move_legend(ax, "upper left", bbox_to_anchor=(1, 1))
In the control group, 50% of sessions spent 10 seconds or less on content pages; in the experiment group, 50% of sessions spent 12 seconds or less.
In the control group, 95% of sessions spent 215 seconds or less on content pages; in the experiment group, 95% of sessions spent 260 seconds or less.
Among sessions that viewed content pages, the experiment group spent more time on content pages than the control group at every reported percentile.
Note: in the control group, the articles are collapsed. Some users may not have expanded each section to read through the entire article, which could have affected the reading time in the control group.
Content Read Completion Rate
The metric indicates users’ willingness to complete reading the content. Content is considered complete when users reach the bottom of an article or the last page of a highlight.
When calculating the completion rate, we are excluding homepage visits.
content_completion_query = """
SELECT
experiment_group,
COUNT(1) AS pageview,
SUM(CASE WHEN page_bottom_was_visible THEN 1 END)/ COUNT(1) AS completion_rate
FROM global_temp.event_data_view
WHERE event_type = 'pageUnloaded'
AND topic != 'homepage'
GROUP BY experiment_group
"""
content_completion = spark.run(content_completion_query)
content_completion
index | experiment_group | pageview | completion_rate |
---|---|---|---|
0 | control | 1112 | 0.781475 |
1 | experiment | 1658 | 0.721954 |
The control group opened 1,112 articles, with a 78.1% completion rate.
The experiment group opened more content but completed less of it: 1,658 highlights were opened, with a 72.2% completion rate.
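As a quick arithmetic check on the figures above, the sketch below derives the number of completed views implied by each rate and the gap between the groups in percentage points. The inputs are copied from the `content_completion` output; the variable names are illustrative only.

```python
# Values copied from the content_completion output above.
content_completion_rows = {
    "control": {"pageview": 1112, "completion_rate": 0.781475},
    "experiment": {"pageview": 1658, "completion_rate": 0.721954},
}

# Completed views implied by each rate (rounded, because the printed rates are rounded).
completed_views = {
    group: round(row["pageview"] * row["completion_rate"])
    for group, row in content_completion_rows.items()
}

# Gap between the groups in percentage points (control minus experiment).
gap_pp = 100 * (content_completion_rows["control"]["completion_rate"]
                - content_completion_rows["experiment"]["completion_rate"])

print(completed_views)
print(round(gap_pp, 2))
```

Although its completion rate is lower, the experiment group still has more completed views in absolute terms because it opened more content.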
Number of Content Viewed per Session
The metric reflects users’ willingness to view subsequent highlights and articles.
We also exclude homepage views here. If a session had only homepage views, we count it as 0 content views.
content_per_session_query = """
SELECT
experiment_group,
session_id,
SUM(CASE WHEN topic = 'homepage' THEN 0 ELSE 1 END) AS num_pages
FROM global_temp.event_data_view
WHERE event_type = 'pageUnloaded'
GROUP BY experiment_group,session_id
"""
content_per_session = spark.run(content_per_session_query)
content_per_session_grouped = content_per_session.groupby('experiment_group')
content_per_session_column = content_per_session_grouped['num_pages']
content_per_session_column.agg([percentile(0.5), percentile(0.75), percentile(0.90), percentile(0.95)])
experiment_group | percentile_50 | percentile_75 | percentile_90 | percentile_95 |
---|---|---|---|---|
control | 0.0 | 1.0 | 2.0 | 3.0 |
experiment | 0.0 | 1.0 | 3.0 | 4.0 |
sns.set_style("white")
fig, ax = plt.subplots(figsize=(10,5))

# fill=True replaces the deprecated shade=True
fig = sns.kdeplot(data=content_per_session, x="num_pages", hue="experiment_group", fill=True, cut=0, clip=(0,15),
                  palette={'control':'b', 'experiment':'r'})
fig.set(yticklabels=[])
fig.set(ylabel=None)

plt.title("Number of Content per Session")
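# Keep the Axes returned by boxplot so move_legend acts on this plot
ax = sns.boxplot(data=content_per_session, x="num_pages", hue="experiment_group", showfliers=False, gap=.5,
                 palette={'control':'b', 'experiment':'r'})

# The x-axis is a count of content pages, not a duration
plt.xlabel("Content pages per session")
plt.title("Content Read per Session")
sns.move_legend(ax, "upper left", bbox_to_anchor=(1, 1))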
In both the control and experiment groups, 50% of sessions consumed no summaries or articles.
In both groups, 75% of sessions consumed 0 or 1 summaries or articles.
In the control group, 95% of sessions viewed 0 to 3 articles. In the experiment group, 95% of sessions viewed 0 to 4 summaries.
At the upper percentiles, sessions in the experiment group viewed slightly more summaries than sessions in the control group viewed articles.
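The counting rule (homepage views contribute 0, so homepage-only sessions are kept with a count of 0 rather than dropped) can be sketched in plain Python. The sessions and topics below are hypothetical.

```python
from collections import Counter

# Hypothetical pageUnloaded events as (session_id, topic); values are made up.
page_unloaded = [
    ("s1", "homepage"),
    ("s1", "Sport"),
    ("s1", "Nature"),
    ("s2", "homepage"),
    ("s3", "homepage"),
    ("s3", "History"),
]

num_pages = Counter()
for session_id, topic in page_unloaded:
    # Adding 0 still creates the key, so homepage-only sessions stay in the result.
    num_pages[session_id] += 0 if topic == "homepage" else 1

print(dict(num_pages))
```

Keeping the zero-count sessions is what makes the median 0 when at least half of the sessions never open a content page.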
Top Viewed Content
This section shows which topics and categories received the most reads, measured by pageviews (articles) or highlight views. The content completion rate is included for reference.
For the list of featured articles and their topics, please refer to this sheet.
top_page_query = """
SELECT
experiment_group,
page_name,
COUNT(1) AS pv,
SUM(CASE WHEN page_bottom_was_visible THEN 1 ELSE 0 END)/ COUNT(1) AS completion_rate
FROM global_temp.event_data_view
WHERE event_type = 'pageUnloaded'
AND topic != 'homepage'
GROUP BY experiment_group, page_name
"""
top_page = spark.run(top_page_query)
Top 10 viewed articles in the control group are:
top_page.loc[(top_page['experiment_group'] == 'control')].sort_values(by=['pv'], ascending=False).head(10)
index | experiment_group | page_name | pv | completion_rate |
---|---|---|---|---|
49 | control | Lionel Messi | 76 | 0.881579 |
7 | control | Friends | 62 | 0.806452 |
16 | control | Japan | 60 | 0.783333 |
34 | control | Ancient Egypt | 53 | 0.735849 |
55 | control | Body piercing | 47 | 0.808511 |
40 | control | Baseball | 46 | 0.695652 |
21 | control | Comics | 46 | 0.847826 |
17 | control | Feminism | 43 | 0.813953 |
31 | control | Obesity | 42 | 0.833333 |
37 | control | Statue of Liberty | 41 | 0.804878 |
Top 10 viewed highlights in the experiment group are:
top_page.loc[(top_page['experiment_group'] == 'experiment')].sort_values(by=['pv'], ascending=False).head(10)
index | experiment_group | page_name | pv | completion_rate |
---|---|---|---|---|
14 | experiment | Lionel Messi | 99 | 0.595960 |
45 | experiment | Climate change | 86 | 0.767442 |
52 | experiment | Elephant | 80 | 0.700000 |
11 | experiment | Japan | 79 | 0.721519 |
26 | experiment | Friends | 74 | 0.756757 |
38 | experiment | Obesity | 71 | 0.760563 |
39 | experiment | Comics | 69 | 0.710145 |
28 | experiment | Sustainable energy | 69 | 0.695652 |
29 | experiment | Statue of Liberty | 64 | 0.593750 |
44 | experiment | Yoga | 62 | 0.693548 |
Top Viewed Topics
top_topic_query = """
SELECT
experiment_group,
category_name AS topic,
COUNT(1) AS pv,
SUM(CASE WHEN page_bottom_was_visible THEN 1 ELSE 0 END)/ COUNT(1) AS completion_rate
FROM global_temp.event_data_view
WHERE event_type = 'pageUnloaded'
AND topic != 'homepage'
GROUP BY experiment_group, category_name
"""
top_topic = spark.run(top_topic_query)
Top viewed topics in the control group are:
top_topic.loc[(top_topic['experiment_group'] == 'control')].sort_values(by=['pv'], ascending=False).head(10)
index | experiment_group | topic | pv | completion_rate |
---|---|---|---|---|
8 | control | LIFESTYLE | 190 | 0.800000 |
0 | control | PERSONALITIES | 184 | 0.836957 |
6 | control | HISTORY | 160 | 0.787500 |
4 | control | TOPICAL | 150 | 0.820000 |
5 | control | SPORT | 146 | 0.705479 |
10 | control | NATURE | 145 | 0.710345 |
2 | control | PLACES | 137 | 0.788321 |
Top viewed topics in the experiment group are:
top_topic.loc[(top_topic['experiment_group'] == 'experiment')].sort_values(by=['pv'], ascending=False).head(10)
index | experiment_group | topic | pv | completion_rate |
---|---|---|---|---|
12 | experiment | TOPICAL | 286 | 0.762238 |
3 | experiment | PERSONALITIES | 257 | 0.680934 |
11 | experiment | NATURE | 254 | 0.728346 |
1 | experiment | LIFESTYLE | 245 | 0.767347 |
7 | experiment | PLACES | 216 | 0.726852 |
13 | experiment | SPORT | 207 | 0.719807 |
9 | experiment | HISTORY | 193 | 0.647668 |
Note: Nature didn’t show up at the top for any country contrary to user feedback from the survey.
Metrics Breakdown by Countries
This section breaks down each metric by country to facilitate comparisons.
Time on Site (Session Length)
time_on_site_c_query = """
SELECT
user_country,
experiment_group,
session_id,
SUM(time_length_ms)/1000 AS total_length,
SUM(CASE WHEN topic = 'homepage' THEN time_length_ms END)/1000 AS home_length,
SUM(CASE WHEN topic != 'homepage' THEN time_length_ms END)/1000 AS content_length
FROM global_temp.event_data_view
WHERE event_type = 'pageUnloaded'
AND user_country IN {country_list}
GROUP BY user_country,experiment_group, session_id
"""
time_on_site_c = spark.run(
    time_on_site_c_query.format(
        country_list = country_list
    ))
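The `{country_list}` placeholder works because `str.format` inserts the tuple's printed form, which happens to look like a SQL list. A short sketch of that behavior follows, including the one-element case where it breaks; `sql_in_list` is a hypothetical helper, not part of wmfdata.

```python
country_list = ('Brazil', 'Germany')
clause_template = "AND user_country IN {country_list}"

# str.format inserts the tuple's repr, which happens to look like a SQL list.
rendered = clause_template.format(country_list=country_list)

# A one-element tuple keeps its trailing comma, which is not valid SQL.
single = clause_template.format(country_list=('Brazil',))


def sql_in_list(values):
    """Hypothetical helper: build the list explicitly and escape single quotes."""
    return "(" + ", ".join("'" + v.replace("'", "''") + "'" for v in values) + ")"


safe_single = clause_template.format(country_list=sql_in_list(['Brazil']))

print(rendered)
print(single)
print(safe_single)
```

With the six-country tuple used in this analysis the shortcut is fine; the helper only matters if the list is ever reduced to a single country or contains a name with an apostrophe.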
Total Time
totla_time_c_query = """
WITH total_time AS (
SELECT
user_country,
experiment_group,
session_id,
SUM(time_length_ms)/1000 AS total_length
FROM global_temp.event_data_view
WHERE event_type = 'pageUnloaded'
AND user_country IN {country_list}
GROUP BY user_country,experiment_group, session_id
)
SELECT
user_country,
experiment_group,
PERCENTILE_APPROX(total_length,0.50) AS 50_percentile,
PERCENTILE_APPROX(total_length,0.75) AS 75_percentile,
PERCENTILE_APPROX(total_length,0.90) AS 90_percentile,
PERCENTILE_APPROX(total_length,0.95) AS 95_percentile
FROM total_time
GROUP BY user_country,experiment_group
ORDER BY user_country,experiment_group
"""
spark.run(
    totla_time_c_query.format(
        country_list = country_list
    )
)
index | user_country | experiment_group | 50_percentile | 75_percentile | 90_percentile | 95_percentile |
---|---|---|---|---|---|---|
0 | Brazil | control | 20.637 | 39.948 | 96.972 | 164.477 |
1 | Brazil | experiment | 25.193 | 52.807 | 123.295 | 200.400 |
2 | Germany | control | 15.257 | 26.127 | 49.417 | 73.394 |
3 | Germany | experiment | 15.560 | 24.174 | 48.615 | 68.727 |
4 | India | control | 18.850 | 36.875 | 66.654 | 106.661 |
5 | India | experiment | 22.300 | 45.418 | 92.987 | 129.510 |
6 | Indonesia | control | 17.296 | 28.746 | 58.731 | 76.430 |
7 | Indonesia | experiment | 15.608 | 25.449 | 48.345 | 102.256 |
8 | Nigeria | control | 42.690 | 114.206 | 297.555 | 521.008 |
9 | Nigeria | experiment | 60.662 | 122.915 | 319.056 | 606.709 |
10 | United States | control | 17.650 | 39.494 | 78.332 | 146.681 |
11 | United States | experiment | 19.837 | 42.082 | 88.832 | 123.535 |
#sns.set_theme(style="white")
#g = sns.FacetGrid(time_on_site_c, row="user_country",aspect=7, height=3.5)
#g.map_dataframe(sns.kdeplot, x="total_length",hue="experiment_group",shade=True, log_scale=True, clip =(-1,3.5))
#fig.set(yticklabels=[])
#fig.set(ylabel=None)
#fig.set(xticklabels=[0,0,0,1,10,100,1000])
#plt.xlabel("Seconds")
#plt.title('Total Time Spent')
sns.set(rc={'figure.figsize':(15,8)})
sns.set_style("white")

# Keep the Axes returned by boxplot so move_legend acts on this plot
ax = sns.boxplot(data=time_on_site_c, x="total_length", y="user_country", hue="experiment_group", showfliers=False, gap=.3)

plt.xlabel("Seconds")
plt.title('Total Time Spent')
plt.ylabel("")  # hide the y-axis label
sns.move_legend(ax, "upper left", bbox_to_anchor=(1, 1))
From the data above, in Brazil, India, the United States, and Nigeria, sessions in the experiment group generally spent more total time on the site (homepage plus content pages).
In Indonesia and Germany, sessions in the control group generally spent more total time on the site.
Time on Homepage
homepage_c_query = """
WITH total_time AS (
SELECT
user_country,
experiment_group,
session_id,
SUM(CASE WHEN topic = 'homepage' THEN time_length_ms END)/1000 AS home_length
FROM global_temp.event_data_view
WHERE event_type = 'pageUnloaded'
AND user_country IN {country_list}
GROUP BY user_country,experiment_group, session_id
)
SELECT
user_country,
experiment_group,
PERCENTILE_APPROX(home_length,0.50) AS 50_percentile,
PERCENTILE_APPROX(home_length,0.75) AS 75_percentile,
PERCENTILE_APPROX(home_length,0.90) AS 90_percentile,
PERCENTILE_APPROX(home_length,0.95) AS 95_percentile
FROM total_time
GROUP BY user_country,experiment_group
ORDER BY user_country,experiment_group
"""
spark.run(
    homepage_c_query.format(
        country_list = country_list
    )
)
index | user_country | experiment_group | 50_percentile | 75_percentile | 90_percentile | 95_percentile |
---|---|---|---|---|---|---|
0 | Brazil | control | 16.276 | 32.965 | 62.115 | 89.921 |
1 | Brazil | experiment | 17.559 | 37.829 | 77.639 | 109.329 |
2 | Germany | control | 12.864 | 18.631 | 35.523 | 42.511 |
3 | Germany | experiment | 12.830 | 18.216 | 31.342 | 41.777 |
4 | India | control | 14.632 | 26.862 | 43.337 | 61.269 |
5 | India | experiment | 15.706 | 30.874 | 48.493 | 81.931 |
6 | Indonesia | control | 13.864 | 22.128 | 39.994 | 57.322 |
7 | Indonesia | experiment | 13.366 | 19.215 | 37.140 | 46.788 |
8 | Nigeria | control | 20.399 | 50.511 | 96.485 | 178.650 |
9 | Nigeria | experiment | 21.995 | 53.847 | 76.100 | 109.041 |
10 | United States | control | 13.311 | 25.069 | 40.706 | 62.318 |
11 | United States | experiment | 13.295 | 22.200 | 35.865 | 48.751 |
sns.set_style("white")

# Keep the Axes returned by boxplot so move_legend acts on this plot
ax = sns.boxplot(data=time_on_site_c, x="home_length", y="user_country", hue="experiment_group", showfliers=False, gap=.3)

plt.xlabel("Seconds")
plt.title('Time Spent on Homepage')
plt.ylabel("")  # hide the y-axis label
sns.move_legend(ax, "upper left", bbox_to_anchor=(1, 1))
For time spent on the homepage, in Brazil, India, and Nigeria, sessions in the experiment group tended to spend more time (in Nigeria, only up to the 75th percentile).
In Germany, sessions in the two groups spent a similar amount of time on the homepage.
In Indonesia and the United States, sessions in the control group spent more time on the homepage.
Time on Content Page
content_time_c_query = """
WITH total_time AS (
SELECT
user_country,
experiment_group,
session_id,
SUM(CASE WHEN topic != 'homepage' THEN time_length_ms END)/1000 AS content_length
FROM global_temp.event_data_view
WHERE event_type = 'pageUnloaded'
AND user_country IN {country_list}
GROUP BY user_country,experiment_group, session_id
)
SELECT
user_country,
experiment_group,
PERCENTILE_APPROX(content_length,0.50) AS 50_percentile,
PERCENTILE_APPROX(content_length,0.75) AS 75_percentile,
PERCENTILE_APPROX(content_length,0.90) AS 90_percentile,
PERCENTILE_APPROX(content_length,0.95) AS 95_percentile
FROM total_time
GROUP BY user_country,experiment_group
ORDER BY user_country,experiment_group
"""
spark.run(
    content_time_c_query.format(
        country_list = country_list
    )
)
index | user_country | experiment_group | 50_percentile | 75_percentile | 90_percentile | 95_percentile |
---|---|---|---|---|---|---|
0 | Brazil | control | 8.350 | 21.339 | 84.189 | 203.409 |
1 | Brazil | experiment | 14.708 | 41.528 | 138.720 | 176.157 |
2 | Germany | control | 6.515 | 13.817 | 37.952 | 50.856 |
3 | Germany | experiment | 7.157 | 11.233 | 26.987 | 57.811 |
4 | India | control | 9.637 | 20.231 | 59.604 | 121.616 |
5 | India | experiment | 11.766 | 35.500 | 92.876 | 127.866 |
6 | Indonesia | control | 8.790 | 17.016 | 38.026 | 70.835 |
7 | Indonesia | experiment | 8.918 | 19.751 | 104.614 | 187.773 |
8 | Nigeria | control | 43.949 | 176.100 | 369.269 | 526.803 |
9 | Nigeria | experiment | 79.387 | 225.900 | 572.007 | 969.253 |
10 | United States | control | 8.436 | 22.734 | 85.557 | 136.388 |
11 | United States | experiment | 12.308 | 41.031 | 85.045 | 117.422 |
sns.set_style("white")

# Keep the Axes returned by boxplot so move_legend acts on this plot
ax = sns.boxplot(data=time_on_site_c, x="content_length", y="user_country", hue="experiment_group", showfliers=False, gap=.3)

plt.xlabel("Seconds")
plt.title('Time Spent on Content pages')
plt.ylabel("")  # hide the y-axis label
sns.move_legend(ax, "upper left", bbox_to_anchor=(1, 1))
For time spent on content pages, in Brazil, India, Indonesia, Nigeria, and the United States, sessions in the experiment group generally spent more time on content pages.
In Germany, sessions in the control group spent more time on content pages at the 75th and 90th percentiles.
Nigeria had much longer time spent on content pages than the other countries.
Number of Content Viewed per Session
content_per_session_query_c = """
WITH content_view AS (
SELECT
experiment_group,
user_country,
session_id,
SUM(CASE WHEN topic = 'homepage' THEN 0 ELSE 1 END) AS num_pages
FROM global_temp.event_data_view
WHERE event_type = 'pageUnloaded'
AND user_country IN {country_list}
GROUP BY experiment_group,user_country,session_id
)
SELECT
user_country,
experiment_group,
PERCENTILE_APPROX(num_pages,0.50) AS 50_percentile,
PERCENTILE_APPROX(num_pages,0.75) AS 75_percentile,
PERCENTILE_APPROX(num_pages,0.90) AS 90_percentile,
PERCENTILE_APPROX(num_pages,0.95) AS 95_percentile
FROM content_view
GROUP BY user_country,experiment_group
ORDER BY user_country,experiment_group
"""
content_per_session_c = spark.run(
    content_per_session_query_c.format(
        country_list = country_list
    )
)
content_per_session_c
index | user_country | experiment_group | 50_percentile | 75_percentile | 90_percentile | 95_percentile |
---|---|---|---|---|---|---|
0 | Brazil | control | 0 | 1 | 1 | 2 |
1 | Brazil | experiment | 0 | 1 | 2 | 4 |
2 | Germany | control | 1 | 1 | 2 | 3 |
3 | Germany | experiment | 1 | 1 | 2 | 2 |
4 | India | control | 0 | 1 | 2 | 2 |
5 | India | experiment | 0 | 1 | 2 | 4 |
6 | Indonesia | control | 0 | 1 | 1 | 2 |
7 | Indonesia | experiment | 0 | 1 | 1 | 2 |
8 | Nigeria | control | 1 | 1 | 4 | 5 |
9 | Nigeria | experiment | 1 | 2 | 5 | 8 |
10 | United States | control | 1 | 1 | 2 | 3 |
11 | United States | experiment | 1 | 1 | 3 | 4 |
From the data above, in Brazil, India, Nigeria, and the United States, sessions in the experiment group viewed more content than sessions in the control group.
In Indonesia, sessions in both groups viewed a similar amount of content.
In Germany, sessions in the experiment group viewed less content than sessions in the control group.
Content Read Completion Rate
content_completion_query_c = """
SELECT
experiment_group,
user_country,
COUNT(1) AS pageview,
SUM(CASE WHEN page_bottom_was_visible THEN 1 END)/ COUNT(1)*100 AS completion_rate
FROM global_temp.event_data_view
WHERE event_type = 'pageUnloaded'
AND topic != 'homepage'
AND user_country IN {country_list}
GROUP BY experiment_group, user_country
ORDER BY user_country,experiment_group
"""
content_completion_c = spark.run(
    content_completion_query_c.format(
        country_list = country_list
    )
)
content_completion_c
index | experiment_group | user_country | pageview | completion_rate |
---|---|---|---|---|
0 | control | Brazil | 117 | 83.760684 |
1 | experiment | Brazil | 252 | 80.555556 |
2 | control | Germany | 201 | 75.124378 |
3 | experiment | Germany | 225 | 59.111111 |
4 | control | India | 154 | 69.480519 |
5 | experiment | India | 320 | 70.000000 |
6 | control | Indonesia | 115 | 77.391304 |
7 | experiment | Indonesia | 138 | 77.536232 |
8 | control | Nigeria | 279 | 81.362007 |
9 | experiment | Nigeria | 409 | 73.838631 |
10 | control | United States | 241 | 80.082988 |
11 | experiment | United States | 303 | 71.947195 |
sns.set(rc={'figure.figsize':(15,8)})
sns.set_style("white")

ax = sns.barplot(content_completion_c, x="completion_rate", y="user_country", hue="experiment_group", orient="y")
# Label each bar with its completion rate
for container in ax.containers:
    ax.bar_label(container, fmt='%.1f%%')

plt.xlabel("Completion Rate %")
plt.ylabel("")  # hide the y-axis label
plt.title('Content Completion Rate by Country')
sns.move_legend(ax, "upper left", bbox_to_anchor=(1, 1))
The content completion rate is lower in the experiment group than in the control group in every country except India and Indonesia.
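The statement above can be checked directly from the table. The sketch below copies the `completion_rate` values (in percent) and computes the experiment-minus-control difference per country; the variable names are illustrative only.

```python
# completion_rate values (percent) copied from the table above: (control, experiment).
completion_by_country = {
    "Brazil": (83.760684, 80.555556),
    "Germany": (75.124378, 59.111111),
    "India": (69.480519, 70.000000),
    "Indonesia": (77.391304, 77.536232),
    "Nigeria": (81.362007, 73.838631),
    "United States": (80.082988, 71.947195),
}

# Experiment minus control, in percentage points.
diff_pp = {
    country: round(experiment - control, 2)
    for country, (control, experiment) in completion_by_country.items()
}

# Countries where the experiment group did not complete less than the control group.
not_lower = sorted(c for c, d in diff_pp.items() if d >= 0)

for country, d in diff_pp.items():
    print(f"{country}: {d:+.2f} pp")
print(not_lower)
```

The positive gaps in India and Indonesia are both under one percentage point, while the largest negative gap is in Germany.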
Top Viewed Content
top_page_query_c = """
SELECT
experiment_group,
user_country,
page_name,
COUNT(1) AS pv,
SUM(CASE WHEN page_bottom_was_visible THEN 1 ELSE 0 END)/ COUNT(1) AS completion_rate
FROM global_temp.event_data_view
WHERE event_type = 'pageUnloaded'
AND topic != 'homepage'
GROUP BY experiment_group, user_country, page_name
"""
top_page_c = spark.run(top_page_query_c)
In control group
Top 10 viewed articles in Brazil are:
top_page_c.loc[(top_page_c['experiment_group'] == 'control')&(top_page_c['user_country'] == 'Brazil')].sort_values(by=['pv'], ascending=False).head(10)
index | experiment_group | user_country | page_name | pv | completion_rate |
---|---|---|---|---|---|
30 | control | Brazil | Amazon parrot | 9 | 1.000000 |
135 | control | Brazil | Baseball | 7 | 0.857143 |
59 | control | Brazil | Ancient Egypt | 7 | 0.714286 |
79 | control | Brazil | Friends | 7 | 1.000000 |
337 | control | Brazil | Japan | 7 | 0.714286 |
1 | control | Brazil | Lionel Messi | 6 | 1.000000 |
114 | control | Brazil | Obesity | 6 | 1.000000 |
293 | control | Brazil | Australian Magpie | 5 | 0.400000 |
10 | control | Brazil | Statue of Liberty | 5 | 0.800000 |
151 | control | Brazil | Comics | 5 | 0.800000 |
Top 10 viewed articles in Germany are:
top_page_c.loc[(top_page_c['experiment_group'] == 'control')&(top_page_c['user_country'] == 'Germany')].sort_values(by=['pv'], ascending=False).head(10)
index | experiment_group | user_country | page_name | pv | completion_rate |
---|---|---|---|---|---|
300 | control | Germany | Statue of Liberty | 14 | 0.714286 |
308 | control | Germany | Lionel Messi | 13 | 0.923077 |
31 | control | Germany | Body piercing | 12 | 0.750000 |
44 | control | Germany | Japan | 12 | 0.750000 |
315 | control | Germany | Michael Jackson | 10 | 0.700000 |
157 | control | Germany | Friends | 9 | 0.777778 |
89 | control | Germany | Elephant | 9 | 0.666667 |
224 | control | Germany | Ice dance | 9 | 0.666667 |
298 | control | Germany | Masrur Temples | 9 | 1.000000 |
13 | control | Germany | Ancient Egypt | 8 | 0.625000 |
Top 10 viewed articles in India are:
top_page_c.loc[(top_page_c['experiment_group'] == 'control')&(top_page_c['user_country'] == 'India')].sort_values(by=['pv'], ascending=False).head(10)
index | experiment_group | user_country | page_name | pv | completion_rate |
---|---|---|---|---|---|
343 | control | India | Friends | 12 | 0.833333 |
232 | control | India | Climate change | 10 | 0.700000 |
14 | control | India | Lionel Messi | 10 | 0.900000 |
216 | control | India | Japan | 9 | 0.666667 |
144 | control | India | Yoga | 9 | 0.555556 |
15 | control | India | Hyderabad | 9 | 0.777778 |
290 | control | India | Ancient Egypt | 8 | 0.750000 |
112 | control | India | Maya civilization | 7 | 1.000000 |
240 | control | India | Baseball | 7 | 0.428571 |
225 | control | India | Comics | 7 | 0.857143 |
Top 10 viewed articles in Indonesia are:
top_page_c.loc[(top_page_c['experiment_group'] == 'control')&(top_page_c['user_country'] == 'Indonesia')].sort_values(by=['pv'], ascending=False).head(10)
index | experiment_group | user_country | page_name | pv | completion_rate |
---|---|---|---|---|---|
289 | control | Indonesia | Japan | 12 | 0.916667 |
120 | control | Indonesia | Lionel Messi | 10 | 1.000000 |
296 | control | Indonesia | Comics | 8 | 0.625000 |
126 | control | Indonesia | Baseball | 7 | 0.428571 |
86 | control | Indonesia | Elephant | 6 | 0.833333 |
323 | control | Indonesia | Friends | 5 | 1.000000 |
91 | control | Indonesia | Maya civilization | 5 | 0.600000 |
318 | control | Indonesia | Statue of Liberty | 5 | 0.800000 |
177 | control | Indonesia | Feminism | 5 | 0.800000 |
291 | control | Indonesia | Winnipeg | 5 | 0.600000 |
Top 10 viewed articles in Nigeria are:
top_page_c.loc[(top_page_c['experiment_group'] == 'control') & (top_page_c['user_country'] == 'Nigeria')].sort_values(by=['pv'], ascending=False).head(10)
| | experiment_group | user_country | page_name | pv | completion_rate |
---|---|---|---|---|---|
178 | control | Nigeria | Lionel Messi | 22 | 0.863636 |
2 | control | Nigeria | Friends | 18 | 0.722222 |
110 | control | Nigeria | Feminism | 16 | 0.812500 |
338 | control | Nigeria | Nelson Mandela | 14 | 0.785714 |
329 | control | Nigeria | Body piercing | 14 | 0.928571 |
204 | control | Nigeria | Maraba Coffee | 13 | 0.846154 |
39 | control | Nigeria | Michael Jackson | 11 | 0.909091 |
312 | control | Nigeria | Obesity | 11 | 0.818182 |
310 | control | Nigeria | Maya Angelou | 11 | 0.818182 |
249 | control | Nigeria | Youth Olympic Games | 11 | 0.818182 |
Top 10 viewed articles in the United States are:
top_page_c.loc[(top_page_c['experiment_group'] == 'control') & (top_page_c['user_country'] == 'United States')].sort_values(by=['pv'], ascending=False).head(10)
| | experiment_group | user_country | page_name | pv | completion_rate |
---|---|---|---|---|---|
194 | control | United States | Ancient Egypt | 15 | 0.666667 |
34 | control | United States | Australian Magpie | 14 | 0.571429 |
124 | control | United States | Lionel Messi | 14 | 0.714286 |
317 | control | United States | Baseball | 13 | 0.846154 |
275 | control | United States | Body piercing | 12 | 0.833333 |
175 | control | United States | Nelson Mandela | 12 | 0.833333 |
53 | control | United States | Elephant | 11 | 0.636364 |
257 | control | United States | Friends | 11 | 0.727273 |
217 | control | United States | Maraba Coffee | 10 | 0.700000 |
321 | control | United States | Maya Angelou | 10 | 0.900000 |
In experiment group:
Top 10 viewed highlights in Brazil are:
top_page_c.loc[(top_page_c['experiment_group'] == 'experiment') & (top_page_c['user_country'] == 'Brazil')].sort_values(by=['pv'], ascending=False).head(10)
| | experiment_group | user_country | page_name | pv | completion_rate |
---|---|---|---|---|---|
294 | experiment | Brazil | Lionel Messi | 16 | 0.562500 |
334 | experiment | Brazil | Hyderabad | 15 | 0.933333 |
3 | experiment | Brazil | Japan | 13 | 0.769231 |
191 | experiment | Brazil | Friends | 12 | 0.916667 |
101 | experiment | Brazil | Yoga | 11 | 0.909091 |
119 | experiment | Brazil | Maya Angelou | 11 | 0.727273 |
115 | experiment | Brazil | Comics | 10 | 0.900000 |
52 | experiment | Brazil | Ice dance | 9 | 0.777778 |
72 | experiment | Brazil | Sustainable energy | 9 | 0.555556 |
80 | experiment | Brazil | Statue of Liberty | 9 | 0.666667 |
Top 10 viewed highlights in Germany are:
top_page_c.loc[(top_page_c['experiment_group'] == 'experiment') & (top_page_c['user_country'] == 'Germany')].sort_values(by=['pv'], ascending=False).head(10)
| | experiment_group | user_country | page_name | pv | completion_rate |
---|---|---|---|---|---|
279 | experiment | Germany | Obesity | 17 | 0.588235 |
121 | experiment | Germany | Elephant | 16 | 0.437500 |
180 | experiment | Germany | Lionel Messi | 15 | 0.333333 |
158 | experiment | Germany | Giraffe | 14 | 0.642857 |
313 | experiment | Germany | Climate change | 14 | 0.857143 |
190 | experiment | Germany | Japan | 12 | 0.500000 |
319 | experiment | Germany | Feminism | 11 | 0.636364 |
297 | experiment | Germany | Ancient Egypt | 11 | 0.545455 |
278 | experiment | Germany | Statue of Liberty | 11 | 0.545455 |
324 | experiment | Germany | Body piercing | 10 | 0.800000 |
Top 10 viewed highlights in India are:
top_page_c.loc[(top_page_c['experiment_group'] == 'experiment') & (top_page_c['user_country'] == 'India')].sort_values(by=['pv'], ascending=False).head(10)
| | experiment_group | user_country | page_name | pv | completion_rate |
---|---|---|---|---|---|
295 | experiment | India | Elephant | 25 | 0.600000 |
130 | experiment | India | Sustainable energy | 17 | 0.647059 |
234 | experiment | India | Climate change | 17 | 0.823529 |
5 | experiment | India | Hyderabad | 16 | 0.750000 |
286 | experiment | India | Amazon parrot | 15 | 0.800000 |
11 | experiment | India | Japan | 15 | 0.733333 |
133 | experiment | India | Australian Magpie | 15 | 0.533333 |
192 | experiment | India | Yoga | 15 | 0.733333 |
116 | experiment | India | Comics | 15 | 0.533333 |
134 | experiment | India | Michael Jackson | 13 | 0.692308 |
Top 10 viewed highlights in Indonesia are:
top_page_c.loc[(top_page_c['experiment_group'] == 'experiment') & (top_page_c['user_country'] == 'Indonesia')].sort_values(by=['pv'], ascending=False).head(10)
| | experiment_group | user_country | page_name | pv | completion_rate |
---|---|---|---|---|---|
299 | experiment | Indonesia | Japan | 15 | 0.733333 |
75 | experiment | Indonesia | Lionel Messi | 10 | 0.600000 |
163 | experiment | Indonesia | Climate change | 7 | 0.857143 |
62 | experiment | Indonesia | Rwanda | 7 | 0.571429 |
212 | experiment | Indonesia | Statue of Liberty | 7 | 0.428571 |
241 | experiment | Indonesia | Hyderabad | 7 | 1.000000 |
273 | experiment | Indonesia | Comics | 6 | 0.666667 |
82 | experiment | Indonesia | Winnipeg | 6 | 0.500000 |
106 | experiment | Indonesia | Amazon parrot | 6 | 1.000000 |
42 | experiment | Indonesia | Giraffe | 6 | 0.833333 |
Top 10 viewed highlights in Nigeria are:
top_page_c.loc[(top_page_c['experiment_group'] == 'experiment') & (top_page_c['user_country'] == 'Nigeria')].sort_values(by=['pv'], ascending=False).head(10)
| | experiment_group | user_country | page_name | pv | completion_rate |
---|---|---|---|---|---|
223 | experiment | Nigeria | Lionel Messi | 28 | 0.714286 |
188 | experiment | Nigeria | Climate change | 25 | 0.680000 |
162 | experiment | Nigeria | Friends | 21 | 0.714286 |
282 | experiment | Nigeria | Yoga | 20 | 0.650000 |
264 | experiment | Nigeria | Sustainable energy | 20 | 0.700000 |
85 | experiment | Nigeria | Statue of Liberty | 19 | 0.684211 |
263 | experiment | Nigeria | Youth Olympic Games | 18 | 0.722222 |
246 | experiment | Nigeria | Obesity | 18 | 0.833333 |
12 | experiment | Nigeria | Comics | 17 | 0.764706 |
29 | experiment | Nigeria | Ancient Egypt | 17 | 0.529412 |
Top 10 viewed highlights in the United States are:
top_page_c.loc[(top_page_c['experiment_group'] == 'experiment') & (top_page_c['user_country'] == 'United States')].sort_values(by=['pv'], ascending=False).head(10)
| | experiment_group | user_country | page_name | pv | completion_rate |
---|---|---|---|---|---|
262 | experiment | United States | Lionel Messi | 18 | 0.666667 |
25 | experiment | United States | Body piercing | 17 | 0.764706 |
102 | experiment | United States | Elephant | 17 | 0.823529 |
267 | experiment | United States | Friends | 16 | 0.687500 |
108 | experiment | United States | Michael Jackson | 15 | 0.800000 |
269 | experiment | United States | Climate change | 15 | 0.666667 |
202 | experiment | United States | Obesity | 14 | 0.928571 |
179 | experiment | United States | Feminism | 13 | 0.769231 |
196 | experiment | United States | Japan | 13 | 0.846154 |
322 | experiment | United States | Comics | 12 | 0.750000 |
From the list, we can see that some content, such as Lionel Messi and Climate change, appears in the top-viewed lists of most countries. The rest of the top-viewed content differs from country to country.
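The overlap described above can also be counted rather than read off the tables. The following is a minimal sketch, assuming a DataFrame with the same columns as `top_page_c`; the helper name `top_n_overlap` is hypothetical, and the sample rows are only the top three experiment-group rows copied from three of the tables above, not the full result set.

```python
import pandas as pd


def top_n_overlap(df, group, n=10):
    """Count in how many countries' top-n lists (by pv) each page appears."""
    subset = df[df["experiment_group"] == group]
    # Keep the n most-viewed pages within each country.
    top_n = (
        subset.sort_values(["user_country", "pv"], ascending=[True, False])
        .groupby("user_country")
        .head(n)
    )
    # Number of distinct countries in which each page made the top-n list.
    return (
        top_n.groupby("page_name")["user_country"]
        .nunique()
        .sort_values(ascending=False)
        .rename("n_countries")
    )


# Illustrative subset only: top three experiment-group rows for three countries.
sample = pd.DataFrame(
    [
        ("experiment", "Germany", "Obesity", 17),
        ("experiment", "Germany", "Elephant", 16),
        ("experiment", "Germany", "Lionel Messi", 15),
        ("experiment", "Nigeria", "Lionel Messi", 28),
        ("experiment", "Nigeria", "Climate change", 25),
        ("experiment", "Nigeria", "Friends", 21),
        ("experiment", "United States", "Lionel Messi", 18),
        ("experiment", "United States", "Body piercing", 17),
        ("experiment", "United States", "Elephant", 17),
    ],
    columns=["experiment_group", "user_country", "page_name", "pv"],
)

overlap = top_n_overlap(sample, "experiment", n=3)
print(overlap)
```

Applied to the full `top_page_c` with `n=10`, the same call would give one count per page for each experiment group, which makes the "appears in most countries" statement checkable.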
Top View Topics
top_topic_query_c = """
SELECT
experiment_group,
user_country,
category_name AS topic,
COUNT(1) AS pv,
SUM(CASE WHEN page_bottom_was_visible THEN 1 ELSE 0 END)/ COUNT(1) AS completion_rate
FROM global_temp.event_data_view
WHERE event_type = 'pageUnloaded'
AND topic != 'homepage'
GROUP BY experiment_group,user_country, category_name
"""
top_topic_c = spark.run(top_topic_query_c)
In control group:
Top viewed topics in Brazil are:
top_topic_c.loc[(top_topic_c['experiment_group'] == 'control') & (top_topic_c['user_country'] == 'Brazil')].sort_values(by=['pv'], ascending=False).head(10)
| | experiment_group | user_country | topic | pv | completion_rate |
---|---|---|---|---|---|
88 | control | Brazil | NATURE | 21 | 0.857143 |
13 | control | Brazil | HISTORY | 20 | 0.800000 |
18 | control | Brazil | SPORT | 18 | 0.777778 |
93 | control | Brazil | LIFESTYLE | 16 | 0.812500 |
17 | control | Brazil | PLACES | 15 | 0.800000 |
76 | control | Brazil | PERSONALITIES | 14 | 1.000000 |
87 | control | Brazil | TOPICAL | 13 | 0.846154 |
Top viewed topics in Germany are:
top_topic_c.loc[(top_topic_c['experiment_group'] == 'control') & (top_topic_c['user_country'] == 'Germany')].sort_values(by=['pv'], ascending=False).head(10)
| | experiment_group | user_country | topic | pv | completion_rate |
---|---|---|---|---|---|
20 | control | Germany | HISTORY | 39 | 0.769231 |
26 | control | Germany | LIFESTYLE | 31 | 0.806452 |
7 | control | Germany | PERSONALITIES | 29 | 0.793103 |
40 | control | Germany | SPORT | 28 | 0.678571 |
84 | control | Germany | PLACES | 27 | 0.814815 |
67 | control | Germany | NATURE | 24 | 0.583333 |
29 | control | Germany | TOPICAL | 23 | 0.782609 |
Top viewed topics in India are:
top_topic_c.loc[(top_topic_c['experiment_group'] == 'control') & (top_topic_c['user_country'] == 'India')].sort_values(by=['pv'], ascending=False).head(10)
| | experiment_group | user_country | topic | pv | completion_rate |
---|---|---|---|---|---|
23 | control | India | LIFESTYLE | 29 | 0.758621 |
55 | control | India | TOPICAL | 25 | 0.680000 |
37 | control | India | HISTORY | 24 | 0.791667 |
57 | control | India | SPORT | 23 | 0.434783 |
21 | control | India | PLACES | 18 | 0.722222 |
54 | control | India | PERSONALITIES | 18 | 0.833333 |
77 | control | India | NATURE | 17 | 0.647059 |
Top viewed topics in Indonesia are:
top_topic_c.loc[(top_topic_c['experiment_group'] == 'control') & (top_topic_c['user_country'] == 'Indonesia')].sort_values(by=['pv'], ascending=False).head(10)
| | experiment_group | user_country | topic | pv | completion_rate |
---|---|---|---|---|---|
80 | control | Indonesia | PLACES | 20 | 0.800000 |
65 | control | Indonesia | PERSONALITIES | 19 | 0.789474 |
60 | control | Indonesia | LIFESTYLE | 18 | 0.777778 |
2 | control | Indonesia | NATURE | 16 | 0.812500 |
15 | control | Indonesia | HISTORY | 15 | 0.733333 |
85 | control | Indonesia | SPORT | 14 | 0.571429 |
36 | control | Indonesia | TOPICAL | 13 | 0.923077 |
Top viewed topics in Nigeria are:
top_topic_c.loc[(top_topic_c['experiment_group'] == 'control') & (top_topic_c['user_country'] == 'Nigeria')].sort_values(by=['pv'], ascending=False).head(10)
| | experiment_group | user_country | topic | pv | completion_rate |
---|---|---|---|---|---|
47 | control | Nigeria | PERSONALITIES | 58 | 0.844828 |
64 | control | Nigeria | LIFESTYLE | 53 | 0.849057 |
90 | control | Nigeria | TOPICAL | 46 | 0.782609 |
89 | control | Nigeria | SPORT | 34 | 0.794118 |
73 | control | Nigeria | PLACES | 32 | 0.812500 |
11 | control | Nigeria | HISTORY | 29 | 0.827586 |
92 | control | Nigeria | NATURE | 27 | 0.740741 |
Top viewed topics in the United States are:
top_topic_c.loc[(top_topic_c['experiment_group'] == 'control') & (top_topic_c['user_country'] == 'United States')].sort_values(by=['pv'], ascending=False).head(10)
| | experiment_group | user_country | topic | pv | completion_rate |
---|---|---|---|---|---|
66 | control | United States | LIFESTYLE | 43 | 0.767442 |
69 | control | United States | PERSONALITIES | 43 | 0.813953 |
72 | control | United States | NATURE | 40 | 0.675000 |
42 | control | United States | HISTORY | 33 | 0.787879 |
32 | control | United States | TOPICAL | 29 | 0.965517 |
82 | control | United States | SPORT | 28 | 0.892857 |
61 | control | United States | PLACES | 25 | 0.760000 |
In experiment group:
Top viewed topics in Brazil are:
top_topic_c.loc[(top_topic_c['experiment_group'] == 'experiment') & (top_topic_c['user_country'] == 'Brazil')].sort_values(by=['pv'], ascending=False).head(10)
| | experiment_group | user_country | topic | pv | completion_rate |
---|---|---|---|---|---|
71 | experiment | Brazil | PLACES | 43 | 0.860465 |
25 | experiment | Brazil | PERSONALITIES | 40 | 0.700000 |
52 | experiment | Brazil | LIFESTYLE | 39 | 0.923077 |
30 | experiment | Brazil | TOPICAL | 34 | 0.735294 |
19 | experiment | Brazil | SPORT | 33 | 0.878788 |
27 | experiment | Brazil | NATURE | 32 | 0.750000 |
48 | experiment | Brazil | HISTORY | 31 | 0.774194 |
Top viewed topics in Germany are:
top_topic_c.loc[(top_topic_c['experiment_group'] == 'experiment') & (top_topic_c['user_country'] == 'Germany')].sort_values(by=['pv'], ascending=False).head(10)
| | experiment_group | user_country | topic | pv | completion_rate |
---|---|---|---|---|---|
45 | experiment | Germany | TOPICAL | 50 | 0.660000 |
81 | experiment | Germany | NATURE | 45 | 0.600000 |
95 | experiment | Germany | HISTORY | 33 | 0.606061 |
68 | experiment | Germany | PERSONALITIES | 30 | 0.466667 |
74 | experiment | Germany | LIFESTYLE | 28 | 0.714286 |
12 | experiment | Germany | PLACES | 23 | 0.478261 |
75 | experiment | Germany | SPORT | 16 | 0.500000 |
Top viewed topics in India are:
top_topic_c.loc[(top_topic_c['experiment_group'] == 'experiment') & (top_topic_c['user_country'] == 'India')].sort_values(by=['pv'], ascending=False).head(10)
| | experiment_group | user_country | topic | pv | completion_rate |
---|---|---|---|---|---|
16 | experiment | India | NATURE | 67 | 0.656716 |
1 | experiment | India | TOPICAL | 53 | 0.754717 |
41 | experiment | India | PLACES | 46 | 0.782609 |
86 | experiment | India | PERSONALITIES | 44 | 0.681818 |
34 | experiment | India | LIFESTYLE | 43 | 0.674419 |
70 | experiment | India | SPORT | 42 | 0.690476 |
53 | experiment | India | HISTORY | 25 | 0.640000 |
Top viewed topics in Indonesia are:
top_topic_c.loc[(top_topic_c['experiment_group'] == 'experiment') & (top_topic_c['user_country'] == 'Indonesia')].sort_values(by=['pv'], ascending=False).head(10)
| | experiment_group | user_country | topic | pv | completion_rate |
---|---|---|---|---|---|
22 | experiment | Indonesia | PLACES | 35 | 0.714286 |
44 | experiment | Indonesia | PERSONALITIES | 19 | 0.684211 |
58 | experiment | Indonesia | LIFESTYLE | 19 | 0.736842 |
59 | experiment | Indonesia | NATURE | 19 | 0.842105 |
6 | experiment | Indonesia | TOPICAL | 17 | 0.941176 |
51 | experiment | Indonesia | SPORT | 15 | 0.933333 |
33 | experiment | Indonesia | HISTORY | 14 | 0.642857 |
Top viewed topics in Nigeria are:
top_topic_c.loc[(top_topic_c['experiment_group'] == 'experiment') & (top_topic_c['user_country'] == 'Nigeria')].sort_values(by=['pv'], ascending=False).head(10)
| | experiment_group | user_country | topic | pv | completion_rate |
---|---|---|---|---|---|
35 | experiment | Nigeria | TOPICAL | 78 | 0.769231 |
43 | experiment | Nigeria | PERSONALITIES | 71 | 0.690141 |
79 | experiment | Nigeria | SPORT | 65 | 0.723077 |
63 | experiment | Nigeria | LIFESTYLE | 58 | 0.775862 |
39 | experiment | Nigeria | HISTORY | 53 | 0.641509 |
91 | experiment | Nigeria | NATURE | 45 | 0.888889 |
3 | experiment | Nigeria | PLACES | 39 | 0.692308 |
Top viewed topics in the United States are:
top_topic_c.loc[(top_topic_c['experiment_group'] == 'experiment') & (top_topic_c['user_country'] == 'United States')].sort_values(by=['pv'], ascending=False).head(10)
| | experiment_group | user_country | topic | pv | completion_rate |
---|---|---|---|---|---|
4 | experiment | United States | LIFESTYLE | 54 | 0.740741 |
24 | experiment | United States | TOPICAL | 54 | 0.814815 |
31 | experiment | United States | PERSONALITIES | 52 | 0.769231 |
28 | experiment | United States | NATURE | 45 | 0.733333 |
94 | experiment | United States | SPORT | 35 | 0.600000 |
50 | experiment | United States | HISTORY | 33 | 0.575758 |
78 | experiment | United States | PLACES | 30 | 0.700000 |
From the list, we can see that the most popular topics differ from country to country.
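One way to compare topic popularity across countries side by side is to rank topics by page views within each country and pivot the result. This is a minimal sketch, assuming a DataFrame with the same columns as `top_topic_c`; the helper name `topic_rank_table` is hypothetical. The sample holds only three control-group topics for Brazil and Germany copied from the tables above, so the ranks it prints are ranks within that sample, not within the full seven-topic lists.

```python
import pandas as pd


def topic_rank_table(df, group):
    """Rank topics by pv within each country (1 = most viewed) for one group."""
    subset = df[df["experiment_group"] == group].copy()
    subset["rank"] = (
        subset.groupby("user_country")["pv"]
        .rank(method="min", ascending=False)
        .astype(int)
    )
    # One row per topic, one column per country, cell = rank in that country.
    return subset.pivot(index="topic", columns="user_country", values="rank")


# Illustrative subset only: three control-group topics for two countries.
sample = pd.DataFrame(
    [
        ("control", "Brazil", "NATURE", 21),
        ("control", "Brazil", "HISTORY", 20),
        ("control", "Brazil", "SPORT", 18),
        ("control", "Germany", "HISTORY", 39),
        ("control", "Germany", "SPORT", 28),
        ("control", "Germany", "NATURE", 24),
    ],
    columns=["experiment_group", "user_country", "topic", "pv"],
)

ranks = topic_rank_table(sample, "control")
print(ranks)
```

Running the same helper on the full `top_topic_c` for each experiment group would give a single topics-by-country table in place of the twelve separate listings.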
We see that some content, such as Lionel Messi and Climate change, appears in the top-viewed lists of most countries. The rest of the top-viewed content differs from country to country.