Here’s a way to find a 95% confidence interval for any parameter θ. With probability 0.95, return the real line. With probability 0.05, return the empty set. Clearly 95% of the time this procedure will return an interval that contains θ. This example shows the difference between a confidence interval …

Suppose you do N trials of something that can succeed or fail. After your experiment you want to present a point estimate and a confidence interval. Or if you’re a Bayesian, you want to present a posterior mean and a credible interval. The numerical results hardly differ, though the two …

I recently ran across a new probability distribution called the “Scaled Beta2” or “SBeta2” for short in [1]. For positive argument x and for positive parameters p, q, and b, its density is This is a heavy-tailed distribution. For large x, the probability density is O(x–q-1), the same as a …

March 3, 2021, 11:23 a.m.

Suppose you want to estimate the number of patients who respond to some treatment in a large population of size N and what you have is data on a small sample of size n. The usual way of going about this calculates the proportion of responses in the small sample, …

"Bayesian Statistics the Fun Way: Understanding Statistics and Probability with Star Wars, Lego, and Rubber Ducks," by Will Kurt (2019 No Starch Press) is an excellent introduction to subjects critical to all data scientists. Will Kurt, in fact, is a data scientist! I always advise my data science classes at …

Suppose you are designing an autonomous system that will gather data and adapt its behavior to that data. At first you face the so-called cold-start problem. You don’t have any data when you first turn the system on, and yet the system needs to do something before it has accumulated …

March 4, 2020, 11:10 a.m.

This article covers how to perform hyperparameter optimization using a sequential model-based optimization (SMBO) technique implemented in the HyperOpt Python package. There is a complementary Domino project available. Introduction Feature engineering and hyperparameter optimization are two important model building steps. Over the years, I have debated with many colleagues as …

This post will look at adaptively randomized trial designs. In particular, we want to focus on multi-arm trials, i.e. trials of more than two treatments. The aim is to drop the less effective treatments quickly so the trial can focus on determining which of the better treatments is best. We’ll …

This post will look at adaptively randomized trial designs. In particular, we want to focus on multi-arm trials, i.e. trials of more than two treatments. The aim is to drop the less effective treatments quickly so the trial can focus on determining which of the better treatments is best. We’ll …

How do you operate a data-driven application before you have any data? This is known as the cold start problem. We faced this problem all the time when I designed clinical trials at MD Anderson Cancer Center. We uses Bayesian methods to design adaptive clinical trial designs, such as clinical …