Randomized Controlled Experiments

from datascience import *
from cs104 import *
import numpy as np
%matplotlib inline

1. Warm-up Permutation Test

survey = Table.read_table('data/prelab01-survey-fall2024.csv')
survey = survey.where('Left or Right Handed', are.not_equal_to('Ambidextrous'))
survey
Year at Williams | Favorite icecream flavor | Favorite planet | Height (in inches) | Distance Home (in miles) | Birthday month | Left or Right Handed
Year 4 | Purple Cow | Pluto* because liking Pluto is a protest move against Pl ... | 66 | 175 | October | Right handed
Year 4 | Purple Cow | Neptune | 72 | 213 | January | Right handed
Year 2 | Chocolate | Venus | 66 | 152 | April | Left handed
Year 4 | Mint chocolate chip | Earth | 71 | 218 | September | Right handed
Year 1 | Vanilla | Venus | 60 | 1729 | February | Right handed
Year 1 | Vanilla | Neptune | 70 | 402 | December | Right handed
Year 2 | Chocolate | Earth | 67 | 2570 | February | Right handed
Year 2 | Purple Cow | Earth | 74 | 132.6 | June | Right handed
Year 2 | Vanilla | Earth | 68.4 | 1275 | May | Right handed
Year 1 | Chocolate | Mercury | 62 | 140 | June | Right handed

... (48 rows omitted)

survey.group('Left or Right Handed')
Left or Right Handed count
Left handed 9
Right handed 49
observed = abs_difference_of_means(survey, 'Left or Right Handed', 'Height (in inches)')
observed
1.4226757369614376
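
The statistic above comes from the course helper `abs_difference_of_means`, whose implementation is not shown here. As a rough sketch of what such a statistic computes, the NumPy version below takes the mean of each group and returns the absolute difference. The function name `abs_difference_of_means_sketch` is hypothetical, and the data are only the ten rows displayed above, so the result will not match the value computed on the full table.

```python
import numpy as np

# Heights and handedness from the ten rows displayed above (not the full survey).
heights = np.array([66, 72, 66, 71, 60, 70, 67, 74, 68.4, 62])
handed = np.array(['Right handed', 'Right handed', 'Left handed', 'Right handed',
                   'Right handed', 'Right handed', 'Right handed', 'Right handed',
                   'Right handed', 'Right handed'])

def abs_difference_of_means_sketch(values, labels):
    """Hypothetical stand-in: |mean of group A - mean of group B| for two groups."""
    groups = np.unique(labels)
    if len(groups) != 2:
        raise ValueError('expected exactly two groups')
    mean_a = values[labels == groups[0]].mean()
    mean_b = values[labels == groups[1]].mean()
    return abs(mean_a - mean_b)

# Statistic for the ten displayed rows only.
displayed_rows_statistic = abs_difference_of_means_sketch(heights, handed)
print(displayed_rows_statistic)
```

Taking the absolute value makes the statistic insensitive to which group is taller, which matches a two-sided question such as "do the groups differ in height?".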

Is the height difference significant?

results = simulate_permutation_statistic(survey, 'Left or Right Handed', 'Height (in inches)', 5000)
plot = Table().with_columns('abs_difference_of_means', results).hist(left_end=observed)
plot.set_title('Null hypothesis empirical distribution')
plot.dot(observed)
../_images/23-randomized-controlled-experiments_9_0.png
p_value = empirical_pvalue(results, observed)
p_value
0.314

2. Randomized Controlled Experiment with BTA

rct = Table.read_table('data/bta.csv')
rct.sample(10)
Group Result
Treatment 0
Control 0
Control 0
Treatment 0
Treatment 0
Control 0
Control 0
Treatment 1
Treatment 0
Treatment 1
rct.group('Group')
Group count
Control 16
Treatment 15
rct.pivot('Result', 'Group')
Group 0.0 1.0
Control 14 2
Treatment 6 9
rct.group('Group', np.mean)
Group Result mean
Control 0.125
Treatment 0.6
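
Because `Result` is coded as 0 or 1, the mean of each group is the proportion of patients in that group with `Result` equal to 1. The short check below rebuilds the two groups from the pivot counts above (Control: 14 zeros and 2 ones; Treatment: 6 zeros and 9 ones) and recovers the same group means. The array names are illustrative, not part of the notebook.

```python
import numpy as np

# Rebuild each group's 0/1 results from the pivot table counts.
control = np.array([1] * 2 + [0] * 14)    # 16 patients, 2 with Result == 1
treatment = np.array([1] * 9 + [0] * 6)   # 15 patients, 9 with Result == 1

control_prop = control.mean()      # 2 / 16
treatment_prop = treatment.mean()  # 9 / 15
abs_diff = abs(treatment_prop - control_prop)

print(control_prop, treatment_prop, abs_diff)
```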

Permutation Testing

observed_statistic = abs_difference_of_means(rct, 'Group', 'Result')
observed_statistic
0.475
type(observed_statistic)
float
results = simulate_permutation_statistic(rct, 'Group', 'Result', 2000)
plot = Table().with_columns('Abs Difference in Relief Proportions', results).hist(bins=np.arange(0,0.9,1/16))
plot.set_title('Null hypothesis empirical distribution')
plot.dot(observed_statistic)
../_images/23-randomized-controlled-experiments_20_0.png
p_value = empirical_pvalue(results, observed_statistic)
p_value
0.0085
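
The helpers `simulate_permutation_statistic` and `empirical_pvalue` come from the course library and their internals are not shown here. The sketch below is one plausible plain-NumPy reading of the same procedure, under two assumptions: the simulation shuffles the group labels and recomputes the statistic each time, and the p-value is the fraction of simulated statistics at least as large as the observed one. The data are rebuilt from the pivot counts, and the seed and the helper name `abs_diff` are arbitrary choices for illustration.

```python
import numpy as np

# Rebuild the experiment from the pivot counts: 16 control, 15 treatment.
results = np.array([1] * 2 + [0] * 14 + [1] * 9 + [0] * 6)
is_treatment = np.array([False] * 16 + [True] * 15)

def abs_diff(results, is_treatment):
    """Absolute difference between the treatment and control proportions."""
    return abs(results[is_treatment].mean() - results[~is_treatment].mean())

observed = abs_diff(results, is_treatment)

# Under the null hypothesis the labels are exchangeable, so shuffle them
# and recompute the statistic many times.
rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
simulated = np.array([
    abs_diff(results, rng.permutation(is_treatment))
    for _ in range(2000)
])

# Assumed definition: share of simulated statistics >= the observed one.
p_value = np.mean(simulated >= observed)
print(observed, p_value)
```

The exact value will vary with the seed and the number of repetitions, so it should only be expected to land near the p-value reported above, not to equal it.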

3. Sample Size, Effect Size, and P-values

What’s the relationship between effect size, sample size, and p-value?

Here is what we had before.

../_images/23-randomized-controlled-experiments_26_0.png

What if the effect size were slightly smaller? What if the sample size were bigger?

../_images/23-randomized-controlled-experiments_28_0.png
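
One way to probe these questions without simulation noise is to compute the permutation p-value exactly. The sketch below assumes the p-value counts label shuffles whose absolute difference in proportions is at least the observed one. Under that assumption, the number of successes that land in the treatment group follows a hypergeometric distribution, so the p-value is a finite sum. The function name `exact_permutation_pvalue` is hypothetical. The first call uses the counts from the experiment above; the other two use made-up counts chosen only to illustrate a smaller effect and a doubled sample.

```python
from fractions import Fraction
from math import comb

def exact_permutation_pvalue(control_n, control_hits, treatment_n, treatment_hits):
    """Exact two-sided permutation p-value for |difference in proportions|."""
    total_n = control_n + treatment_n
    total_hits = control_hits + treatment_hits
    observed = abs(Fraction(treatment_hits, treatment_n)
                   - Fraction(control_hits, control_n))
    extreme = 0
    # k = number of hits assigned to the treatment group by a shuffle.
    for k in range(max(0, total_hits - control_n),
                   min(total_hits, treatment_n) + 1):
        diff = abs(Fraction(k, treatment_n)
                   - Fraction(total_hits - k, control_n))
        if diff >= observed:
            # Number of label shuffles that put exactly k hits in treatment.
            extreme += comb(total_hits, k) * comb(total_n - total_hits, treatment_n - k)
    return extreme / comb(total_n, treatment_n)

# Counts from the experiment above: control 2 of 16, treatment 9 of 15.
p_original = exact_permutation_pvalue(16, 2, 15, 9)
# Hypothetical: same sample size, smaller effect (treatment 6 of 15).
p_smaller_effect = exact_permutation_pvalue(16, 2, 15, 6)
# Hypothetical: same proportions, every count doubled.
p_bigger_sample = exact_permutation_pvalue(32, 4, 30, 18)

print(p_original, p_smaller_effect, p_bigger_sample)
```

In this sketch, shrinking the effect at a fixed sample size raises the p-value, while doubling the sample at a fixed effect lowers it.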

Let’s look at all these relationships at once!

interact(back_pain_exploration, 
         observed_sample_size=Slider(10, 120, 1), 
         treatment_prop_effective=Slider(0.05, 0.95, 0.01),
         control_prop_effective=Slider(0.05, 0.95, 0.01))

Here’s an animation showing the effects above.