Randomized Controlled Experiments#

from datascience import *
from cs104 import *
import numpy as np
%matplotlib inline

1. Warm-up Permutation Test#

survey = Table().read_table('data/prelab01-survey-fall2025.csv')
survey = survey.relabeled('Left or right handed? ', 'Left or Right Handed')  # clean up label
survey = survey.where('Left or Right Handed', are.not_equal_to('Ambidextrous'))
survey

Favorite icecream flavor	Favorite planet	Height (in inches)	Distance Home (in miles)	Birthday month	Left or Right Handed
Mint chocolate chip	Neptune	67	49	September	Right handed
Vanilla	Neptune	72	178	December	Right handed
Purple Cow	Earth	72	3058.6	April	Right handed
Strawberry	Earth	62	66	July	Right handed
Strawberry	Earth	66.5	185	June	Left handed
Mint chocolate chip	Saturn	65	2851.8	January	Right handed
Purple Cow	Mars	64	162	February	Right handed
Mint chocolate chip	Earth	68.5	120	April	Right handed
Mint chocolate chip	Earth	66	18.4	January	Right handed
Chocolate	Earth	68	146	January	Right handed

... (48 rows omitted)

survey.hist('Height (in inches)')

../_images/23-randomized-controlled-experiments_5_0.png

survey.group('Left or Right Handed')

Left or Right Handed	count
Left handed	8
Right handed	50

observed = abs_difference_of_means(survey, 'Left or Right Handed', 'Height (in inches)')
observed

0.20049999999999102

Is the height difference significant?

results = simulate_permutation_statistic(survey, 'Left or Right Handed', 'Height (in inches)', 5000)

plot = Table().with_columns('abs_difference_of_means', results).hist(left_end=observed)
plot.set_title('Null hypothesis empirical distibution')
plot.dot(observed)

../_images/23-randomized-controlled-experiments_10_0.png

p_value = empirical_pvalue(results, observed)
p_value

0.9064

2. Randomized Controlled Experiment with BTA#

rct = Table.read_table('data/bta.csv')
rct.sample(10)

Group	Result
Treatment	1
Treatment	0
Control	1
Control	0
Treatment	0
Control	0
Control	0
Control	0
Control	1
Control	0

rct.group('Group')

Group	count
Control	16
Treatment	15

rct.pivot('Result', 'Group')

Group	0.0	1.0
Control	14	2
Treatment	6	9

rct.group('Group', np.mean)

Group	Result mean
Control	0.125
Treatment	0.6

Permutation Testing#

observed_statistic = abs_difference_of_means(rct, 'Group', 'Result')
observed_statistic

0.475

type(observed_statistic)

float

results = simulate_permutation_statistic(rct, 'Group', 'Result', 2000)

plot = Table().with_columns('Abs Difference in Relief Proportions', results).hist(bins=np.arange(0,0.9,1/16))
plot.set_title('Null hypothesis empirical distibution')
plot.dot(observed_statistic)

../_images/23-randomized-controlled-experiments_21_0.png

p_value = empirical_pvalue(results, observed_statistic)
p_value

0.008

3. Sample Size, Effect Size, and P-values#

What’s the relationship between effect size, sample size, and p-value?

What we had before.

../_images/23-randomized-controlled-experiments_27_0.png

What if the effect size was slightly smaller? What if the sample size was bigger?

../_images/23-randomized-controlled-experiments_29_0.png

Let’s look at all these relationships at once!

interact(back_pain_exploration, 
         observed_sample_size=Slider(10, 128, 1), 
         treatment_prop_effective=Slider(0.05, 0.95, 0.01),
         control_prop_effective=Slider(0.05, 0.95, 0.01))

observed_sample_size

treatment_prop_effective

control_prop_effective

Here’s an animation showing the effects above.

CSCI 104: Data Science and Computing for All

Randomized Controlled Experiments

Contents

Randomized Controlled Experiments#

1. Warm-up Permutation Test#

2. Randomized Controlled Experiment with BTA#

Permutation Testing#

3. Sample Size, Effect Size, and P-values#