import numpy as np


def bootstrapped_conf_interval(data, metric, num_runs=1000, conf=.95):
    """Calculate a bootstrapped confidence interval for model performance (metric).

    Parameters
    ----------
    data: list of data points in a format that is acceptable to get_metric
    metric: options include 'accuracy', 'precision', 'recall' and 'f1-score'
    num_runs: int designating how many bootstrapped samples of the data you would like
    conf: how certain you want to be about the range of possible values of your metric

    Returns
    -------
    Confidence interval tuple with floats as entries, e.g. (.2, .3)
    """
    results = []
    # compute the metric on num_runs bootstrapped resamples of the data
    for i in range(num_runs):
        # sample len(data) points from data, with replacement
        bootstrapped_data = np.random.choice(data, len(data))
        # get_metric is assumed to be defined elsewhere and to return a float
        results.append(get_metric(bootstrapped_data, metric))
    results.sort()
    bootstrapped_mean = sum(results) / float(len(results))
    # metric measured on the actual sample, not a bootstrapped one
    x_bar = get_metric(data, metric)
    # how much of the sorted metrics to cut off on either end
    left_index = int(num_runs * (1 - conf) / 2)
    # deviations from the bootstrapped mean at the interval endpoints
    delta_interval = [results[left_index] - bootstrapped_mean,
                      results[-left_index] - bootstrapped_mean]
    # re-center the deviations on the metric from the actual sample
    interval = (delta_interval[0] + x_bar, delta_interval[1] + x_bar)
    return interval
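
As a quick sanity check, here is one way the function might be called. The get_metric helper isn't shown above, so the version below is a hypothetical stand-in that only handles accuracy, treating each data point as a 1 (correct prediction) or 0 (incorrect prediction); a real helper would also need to cover precision, recall, and f1-score.

def get_metric(data, metric):
    # hypothetical stand-in: each data point is 1 if the model was right, 0 if it was wrong
    if metric == 'accuracy':
        return sum(data) / float(len(data))
    raise NotImplementedError("only 'accuracy' is sketched here")


# 100 simulated correctness indicators: the model was right 80% of the time
sample = [1] * 80 + [0] * 20
print(bootstrapped_conf_interval(sample, 'accuracy'))
# prints something in the neighborhood of (0.72, 0.88)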