Inference Library Reference


Library Sections

Sampling and Simulation

Name Description Parameters Output

sample_proportions(sample_size,        
                   model_proportions)

sample_size should be an integer and model_proportions an array of probabilities that sum to 1. The function draws sample_size items at random from the distribution specified by model_proportions and returns an array of the same length as model_proportions. Each item in the returned array is the proportion of the sample_size draws that fell in the corresponding category.

  1. int : sample size

  2. array : an array of proportions that should sum to 1

array : each item is the proportion of the sample_size draws in which the corresponding item of model_proportions was sampled; the items sum to 1

Examples
model_proportions = make_array(0.9, 0.1)
sample_proportions(100, model_proportions)
array([0.92, 0.08])
model_proportions = make_array(0.7, 0.2, 0.1)
sample_proportions(100, model_proportions)
array([0.76, 0.18, 0.06])
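
The behavior described for sample_proportions can be approximated with plain NumPy. The following is a minimal sketch, not the library's implementation: the name sample_proportions_sketch and the optional rng parameter are assumptions introduced for illustration.

```python
import numpy as np

def sample_proportions_sketch(sample_size, model_proportions, rng=None):
    """Hypothetical stand-in for sample_proportions, built on NumPy."""
    rng = np.random.default_rng() if rng is None else rng
    # Count how many of the sample_size draws land in each category.
    counts = rng.multinomial(sample_size, model_proportions)
    # Convert the counts to proportions of the whole sample.
    return counts / sample_size

model_proportions = np.array([0.7, 0.2, 0.1])
props = sample_proportions_sketch(100, model_proportions)
print(props)        # one proportion per category; values vary from run to run
print(props.sum())  # the proportions sum to 1 (up to floating-point rounding)
```

Because the draws are random, repeated calls return different arrays, which is why the two examples above do not reproduce model_proportions exactly.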

simulate(make_one_outcome, num_outcomes)

Simulates the outcomes of num_outcomes events. The outcome of each event is computed by calling the make_one_outcome function passed to simulate.

  • make_one_outcome: a function that returns the outcome of an event.

  • num_outcomes: the number of events to simulate.

An array of the simulated outcomes.

Examples
dice = np.arange(1,7)

def roll_two_dice():
  return np.random.choice(dice) + np.random.choice(dice)
simulate(roll_two_dice, 10)
array([10.,  7.,  9.,  5., 10.,  8., 10.,  8.,  6.,  8.])
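
One way to picture what simulate does is a loop that calls make_one_outcome once per event and collects the results. This is a sketch under assumptions: simulate_sketch is a hypothetical name, and the float result array is chosen only to match the float output shown above.

```python
import numpy as np

def simulate_sketch(make_one_outcome, num_outcomes):
    """Hypothetical stand-in for simulate: call the function once per event."""
    outcomes = np.empty(num_outcomes)  # float array (an assumption)
    for i in range(num_outcomes):
        outcomes[i] = make_one_outcome()
    return outcomes

dice = np.arange(1, 7)

def roll_two_dice():
    # Sum of two independent rolls of a six-sided die.
    return np.random.choice(dice) + np.random.choice(dice)

rolls = simulate_sketch(roll_two_dice, 10)
print(rolls)  # ten sums, each between 2 and 12; values vary from run to run
```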

simulate_sample_statistic(make_one_sample,
   sample_size,
   compute_statistic,
   num_trials)

Simulates the process of computing a statistic for random samples.

  • make_one_sample: a function that takes an int \(n\) and returns a sample as an array of \(n\) elements.

  • sample_size: the size of the samples to use in the simulation.

  • compute_statistic: a function that takes a sample as an array and returns the statistic for that sample.

  • num_trials: the number of simulation steps to perform.

An array of the simulated statistics.

Examples
coin = make_array('heads', 'tails')

def flip_coins(n):
  return np.random.choice(coin, n)

def count_heads(sample):
  return np.count_nonzero(sample == 'heads')

simulate_sample_statistic(flip_coins, 100, 
                          count_heads, 10)
array([40., 42., 52., 52., 47., 45., 45., 47., 50., 49.])
coin = make_array('heads', 'tails')

def flip_coins(n):
  return np.random.choice(coin, n)

def diff_from_half_heads(sample):
  return abs(np.count_nonzero(sample == 'heads') - len(sample)/2)

simulate_sample_statistic(flip_coins, 100, 
                          diff_from_half_heads, 10)
array([2., 5., 0., 0., 5., 3., 5., 5., 3., 5.])
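
The description of simulate_sample_statistic amounts to repeating two steps num_trials times: draw a sample, then compute its statistic. The sketch below illustrates that loop; simulate_sample_statistic_sketch is a hypothetical name and the code is not taken from the library.

```python
import numpy as np

def simulate_sample_statistic_sketch(make_one_sample, sample_size,
                                     compute_statistic, num_trials):
    """Hypothetical stand-in: one sample and one statistic per trial."""
    statistics = np.empty(num_trials)
    for i in range(num_trials):
        sample = make_one_sample(sample_size)      # draw a fresh sample
        statistics[i] = compute_statistic(sample)  # reduce it to one number
    return statistics

coin = np.array(['heads', 'tails'])

def flip_coins(n):
    return np.random.choice(coin, n)

def count_heads(sample):
    return np.count_nonzero(sample == 'heads')

heads = simulate_sample_statistic_sketch(flip_coins, 100, count_heads, 10)
print(heads)  # ten head counts out of 100 flips; values vary from run to run
```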

Hypothesis Testing

Name Description Parameters Output

empirical_pvalue(simulated_statistics,
    observed_statistic)

Computes the proportion of values in simulated_statistics that are at least as large as observed_statistic.

  • simulated_statistics: an array of int or float.

  • observed_statistic: an int or float.

A proportion.

Examples
sample_statistics = make_array(2,2,3,4,5,2,2,6)
empirical_pvalue(sample_statistics, 5)    
0.25
sample_statistics = make_array(2,2,3,4,5,2,2,6)
empirical_pvalue(sample_statistics, 6)    
0.125
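
The computation described for empirical_pvalue is a count divided by a length. A minimal sketch, assuming NumPy and using the hypothetical name empirical_pvalue_sketch:

```python
import numpy as np

def empirical_pvalue_sketch(simulated_statistics, observed_statistic):
    """Hypothetical stand-in: fraction of simulated values >= the observed one."""
    simulated = np.asarray(simulated_statistics)
    return np.count_nonzero(simulated >= observed_statistic) / len(simulated)

sample_statistics = np.array([2, 2, 3, 4, 5, 2, 2, 6])
p_at_5 = empirical_pvalue_sketch(sample_statistics, 5)
p_at_6 = empirical_pvalue_sketch(sample_statistics, 6)
print(p_at_5)  # 0.25  (the values 5 and 6 qualify, 2 of 8)
print(p_at_6)  # 0.125 (only the value 6 qualifies, 1 of 8)
```

The comparison is "at least as large as", so a simulated value equal to the observed statistic counts toward the p-value.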

Permutation Tests

Name Description Parameters Output

permutation_sample(table,
    group_label)

Returns the given table augmented with a new column, Shuffled Label, that contains a random permutation of the values in the column group_label.

  • table: a Table.

  • group_label: the label of the column to permute.

A new Table.

Examples
trial
Group Outcome
Control 0
Control 0
Control 1
Control 0
Treatment 1
Treatment 1
Treatment 0
Treatment 1
permutation_sample(trial, 'Group')
Group Outcome Shuffled Label
Control 0 Control
Control 0 Treatment
Control 1 Treatment
Control 0 Control
Treatment 1 Control
Treatment 1 Treatment
Treatment 0 Control
Treatment 1 Treatment
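
The shuffling step can be illustrated without the Table class. In the sketch below a plain dict of NumPy arrays stands in for a Table; that substitution, the name permutation_sample_sketch, and the rng parameter are all assumptions made for illustration.

```python
import numpy as np

def permutation_sample_sketch(columns, group_label, rng=None):
    """Hypothetical stand-in: add a shuffled copy of one column."""
    rng = np.random.default_rng() if rng is None else rng
    augmented = dict(columns)  # shallow copy, so the input is left unchanged
    # rng.permutation returns a shuffled copy of the label column.
    augmented['Shuffled Label'] = rng.permutation(columns[group_label])
    return augmented

trial = {
    'Group': np.array(['Control'] * 4 + ['Treatment'] * 4),
    'Outcome': np.array([0, 0, 1, 0, 1, 1, 0, 1]),
}
shuffled = permutation_sample_sketch(trial, 'Group')
print(shuffled['Shuffled Label'])  # same labels as Group, in a random order
```

The shuffled column always contains the same number of each label as the original; only the pairing of labels with outcomes changes.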

abs_difference_of_means(table,
   group_label,
   value_label)

Takes a table, the label of the column used to divide rows into two groups, and the label of the column storing the value for each row. Returns the absolute difference between the means of the two groups.

Note: If the values are all 0 or 1, then the result can be interpreted as the difference in the proportion of 1 for the two groups.

  • table: a Table.

  • group_label: the label of the column used to divide the rows into two groups.

  • value_label: the column holding numerical values.

A number: the absolute difference of the two group means.

Examples
sizes
Color Size
Blue 2
Blue 4
Red 3
Blue 6
Red 2
Red 1
abs_difference_of_means(sizes, 'Color', 'Size')
2.0
trial
Group Outcome
Control 0
Control 0
Control 1
Control 0
Treatment 1
Treatment 1
Treatment 0
Treatment 1
abs_difference_of_means(trial, 'Group', 'Outcome')
0.5
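
The statistic itself is simple to compute by hand. The sketch below works on two parallel arrays rather than a Table; the function name abs_difference_of_means_sketch and the array-based signature are assumptions for illustration.

```python
import numpy as np

def abs_difference_of_means_sketch(groups, values):
    """Hypothetical stand-in: |mean of group A - mean of group B|."""
    groups = np.asarray(groups)
    values = np.asarray(values, dtype=float)
    labels = np.unique(groups)
    if len(labels) != 2:
        raise ValueError('expected exactly two groups')
    mean_a = values[groups == labels[0]].mean()
    mean_b = values[groups == labels[1]].mean()
    return abs(mean_a - mean_b)

colors = ['Blue', 'Blue', 'Red', 'Blue', 'Red', 'Red']
size_values = [2, 4, 3, 6, 2, 1]
size_diff = abs_difference_of_means_sketch(colors, size_values)
print(size_diff)  # 2.0  (Blue mean 4.0, Red mean 2.0)

group = ['Control'] * 4 + ['Treatment'] * 4
outcome = [0, 0, 1, 0, 1, 1, 0, 1]
outcome_diff = abs_difference_of_means_sketch(group, outcome)
print(outcome_diff)  # 0.5  (proportions of 1s: 0.25 versus 0.75)
```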

simulate_permutation_statistic(table,
   group_label,
   value_label,
   num_trials)

Simulates num_trials permutation sampling steps and returns an array of abs_difference_of_means statistics for those samples.

  • table: a Table.

  • group_label: the label of the column used to divide the rows into two groups.

  • value_label: the column holding the values of interest.

  • num_trials: the number of simulation steps to perform.

An array of the simulated statistics.

Examples
big_trial.sample(10)
Group Outcome
Treatment 1
Treatment 0
Treatment 0
Treatment 1
Control 0
Treatment 1
Treatment 1
Treatment 0
Control 1
Treatment 0
simulate_permutation_statistic(big_trial, 'Group', 
                               'Outcome', 5)
array([0.076, 0.072, 0.02 , 0.02 , 0.004])
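
Putting the pieces together, a permutation simulation shuffles the group labels, recomputes the absolute difference of means, and repeats. The sketch below uses the eight-row trial data shown earlier instead of big_trial, and works on arrays rather than a Table; all function names and the rng parameter are hypothetical.

```python
import numpy as np

def abs_diff_of_means(groups, values):
    # Absolute difference between the means of the two groups.
    labels = np.unique(groups)
    return abs(values[groups == labels[0]].mean()
               - values[groups == labels[1]].mean())

def simulate_permutation_statistic_sketch(groups, values, num_trials, rng=None):
    """Hypothetical stand-in: one shuffled statistic per trial."""
    rng = np.random.default_rng() if rng is None else rng
    statistics = np.empty(num_trials)
    for i in range(num_trials):
        shuffled = rng.permutation(groups)  # break any label/value link
        statistics[i] = abs_diff_of_means(shuffled, values)
    return statistics

group = np.array(['Control'] * 4 + ['Treatment'] * 4)
outcome = np.array([0, 0, 1, 0, 1, 1, 0, 1], dtype=float)

observed = abs_diff_of_means(group, outcome)
simulated = simulate_permutation_statistic_sketch(group, outcome, 1000)
# Share of shuffles that look at least as extreme as the observed data.
p_value = np.count_nonzero(simulated >= observed) / len(simulated)
print(observed)  # 0.5
print(p_value)   # varies from run to run
```

The final two lines show how the simulated statistics would typically feed a p-value calculation of the kind described under Hypothesis Testing.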