In 2018, three researchers from the US Census Bureau published a paper entitled “Understanding Database Reconstruction Attacks on Public Data.” [1] The article showed that private data on many individuals could be reverse engineered from public data. As I wrote about a few days ago, census blocks are at the …

Let’s work our way up to differentially private stochastic gradient descent (DP-SGD) a little at a time. We’ll first look at gradient descent, then stochastic gradient descent, then finally differentially private stochastic gradient descent.
Gradient descent
We’ll start with gradient descent. Suppose you have a function of several variables f(x) …
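As a rough illustration of the first step in that progression, here is a minimal sketch of plain gradient descent in Python. The quadratic test function, the starting point, the step size, and the iteration count are all assumptions chosen for illustration; none of them come from the post.

```python
def f(x):
    # Hypothetical test function with its minimum at (1, -3).
    return (x[0] - 1) ** 2 + 2 * (x[1] + 3) ** 2

def grad_f(x):
    # Gradient of f, worked out by hand.
    return [2 * (x[0] - 1), 4 * (x[1] + 3)]

def gradient_descent(grad, x0, step=0.1, iterations=200):
    # Repeatedly move against the gradient with a fixed step size.
    x = list(x0)
    for _ in range(iterations):
        g = grad(x)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x

x_min = gradient_descent(grad_f, [0.0, 0.0])
print(x_min, f(x_min))
```

A fixed step size works here only because the function is a well-conditioned quadratic; other functions may need a smaller step or a line search.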

On June 29 this year I said on Twitter that companies would start avoiding AI to avoid regulation. I followed that up with an article, “Three advantages of non-AI models.” The third advantage I listed was: Statistical models are not subject to legislation hastily written in response to recent improvements …

Can you tell who someone is from their telephone number? That’s kinda the point of telephone numbers, to let you contact someone. And indeed telephone number is one of the 18 identifiers under HIPAA Safe Harbor. But whether any piece of information allows you to identify someone depends on context. If …

A few days ago I wrote about U-statistics, statistics which can be expressed as the average of a symmetric function over all combinations of elements of a set. V-statistics can be written as an average over all products of elements of a set. Let S be a statistical sample …
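To make the distinction concrete, here is a small Python sketch. It assumes a kernel of order 2, h(x, y) = (x − y)²/2, which is a standard textbook choice and not necessarily the one used in the post; the sample values are made up. With this kernel, averaging over combinations gives the unbiased sample variance, while averaging over the full Cartesian product gives the biased (population-style) variance.

```python
from itertools import combinations, product
from statistics import variance, pvariance

def kernel(x, y):
    # Symmetric kernel of order 2 (an assumption for this example).
    return (x - y) ** 2 / 2

def u_statistic(sample, h):
    # Average of h over all 2-element combinations (no repeated elements).
    pairs = list(combinations(sample, 2))
    return sum(h(x, y) for x, y in pairs) / len(pairs)

def v_statistic(sample, h):
    # Average of h over the Cartesian product sample x sample.
    pairs = list(product(sample, repeat=2))
    return sum(h(x, y) for x, y in pairs) / len(pairs)

sample = [2.0, 3.0, 5.0, 7.0, 11.0]  # made-up data
u = u_statistic(sample, kernel)
v = v_statistic(sample, kernel)

print(u, variance(sample))   # U-statistic vs. unbiased sample variance
print(v, pvariance(sample))  # V-statistic vs. biased variance
```

The only difference between the two functions is the index set: the V-statistic includes pairs such as (x, x), which is where its bias comes from in this example.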

John Tukey developed his so-called g-and-h distribution to be very flexible, having a wide variety of possible values of skewness and kurtosis. Although the reason for the distribution’s existence is its range of possible skewness and kurtosis values, calculating the skewness and kurtosis of the distribution is not simple.
Definition
Let …

A symmetric function is a function whose value is unchanged under every permutation of its arguments. The previous post showed how three symmetric functions of the sides of a triangle, a + b + c, ab + bc + ac, and abc, are related to the perimeter, inner radius, and outer …

Suppose you have a bivariate normal distribution with correlation ρ where Then the level sets of the density function are ellipses, and the eccentricity e of the ellipses is related to the correlation ρ by
Plots
For example, suppose ρ = 0.8. Here’s a plot of the density function f(x, …
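The relationship between correlation and eccentricity can be checked numerically. The sketch below assumes both marginal variances equal 1, so the covariance matrix is [[1, ρ], [ρ, 1]]; the excerpt does not show the post's actual condition, so that is an assumption. Under it, the squared semi-axes of a level set are proportional to the eigenvalues 1 + |ρ| and 1 − |ρ|, which gives e = √(2|ρ| / (1 + |ρ|)). This closed form is derived from that assumption and may not match the form the post uses.

```python
import math

def symmetric_2x2_eigenvalues(a, b, d):
    # Eigenvalues of the symmetric matrix [[a, b], [b, d]], smallest first.
    mean = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    return mean - radius, mean + radius

def level_set_eccentricity(rho):
    # Assumes unit variances, so the covariance matrix is [[1, rho], [rho, 1]].
    minor_sq, major_sq = symmetric_2x2_eigenvalues(1.0, rho, 1.0)
    return math.sqrt(1 - minor_sq / major_sq)

def closed_form(rho):
    # Closed form derived under the same unit-variance assumption.
    return math.sqrt(2 * abs(rho) / (1 + abs(rho)))

e_numeric = level_set_eccentricity(0.8)
e_formula = closed_form(0.8)
print(e_numeric, e_formula)
```

For ρ = 0.8 both routes give about 0.943, so the level sets are noticeably elongated even though the correlation is well short of 1.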

About half of children are boys and half are girls, but that doesn’t mean that every couple is equally likely to have a boy or a girl each time they conceive a child. And evidence suggests that indeed the probability of conceiving a girl varies per couple. I will simplify …

