Tag: Statistics

Database reconstruction attacks
In 2018, three researchers from the US Census Bureau published a paper entitled “Understanding Database Reconstruction Attacks on Public Data.” [1] The article showed that private data on many individuals could be reverse engineered from public data. As I wrote about a few days ago, census blocks are at the …
Building fair algorithms
Emma Pierson and Kowe Kadoma, for Fred Hutchinson Cancer Center, have a short…
Tags: algorithm, bias, Coursera, Fred Hutchinson Cancer Center
Differentially private stochastic gradient descent
Let’s work our way up to differentially private stochastic gradient descent (DP-SGD) a little at a time. We’ll first look at gradient descent, then stochastic gradient descent, then finally differentially private stochastic gradient descent. Gradient descent We’ll start with gradient descent. Suppose you have a function of several variables f(x) …
Using classical statistics to avoid regulatory burden
On June 29 this year I said on Twitter that companies would start avoiding AI to avoid regulation. I followed that up with an article, “Three advantages of non-AI models.” The third advantage I listed was that statistical models are not subject to legislation hastily written in response to recent improvements …
Identifiers depend on context
Can you tell who someone is from their telephone number? That’s kinda the point of telephone numbers, to let you contact someone. And indeed telephone number is one of the 18 identifiers under HIPAA Safe Harbor. But whether any piece of information allows you to identify someone depends on context. If …
Nobel Prize for research in global labor markets, using historical data
Claudia Goldin, an economist at Harvard, has won the Nobel Prize in Economics.…
Tags: Claudia Goldin, economics, gender, Nobel Prize, work
Video Highlights: Make Better Decisions with Data — with Dr. Allen Downey
In this video presentation, our good friend Jon Krohn, Co-Founder and Chief Data Scientist at the machine learning company Nebula, is joined by Dr. Allen Downey, renowned author and professor, who shares insights from his upcoming book 'Probably Overthinking It,' breaking down underused techniques like Survival Analysis, explaining common paradoxes, …
Crows might understand probabilities
Researchers at the University of Tübingen are studying crows’ abilities to understand statistical…
Tags: Ars Technica, birds, probability
A few days ago I wrote about U-statistics, statistics which can be expressed as the average of a symmetric function over all combinations of elements of a set. V-statistics can be written as an average over all products of elements of a set. Let S be a statistical sample …
Manual data labeling behind the AI
One of the things that makes AI seem neat is that it sometimes…
Tags: AI, Bloomberg, ethics, Google
Moments of Tukey’s g-and-h distribution
John Tukey developed his so-called g-and-h distribution to be very flexible, having a wide variety of possible values of skewness and kurtosis. Although the reason for the distribution’s existence is its range of possible skewness and kurtosis values, calculating the skewness and kurtosis of the distribution is not simple. Definition Let …
Symmetric functions and U-statistics
A symmetric function is a function whose value is unchanged under every permutation of its arguments. The previous post showed how three symmetric functions of the sides of a triangle, a + b + c, ab + bc + ac, and abc, are related to the perimeter, inner radius, and outer …
Power to the Data Report Podcast: The Math Behind the Models
Hello, and welcome to the “Power-to-the-Data Report” podcast where we cover timely topics of the day from throughout the Big Data ecosystem. I am your host Daniel Gutierrez from insideBIGDATA where I serve as Editor-in-Chief & Resident Data Scientist. Today’s topic is “The Math Behind the Models,” one of my …
Introduction to statistical learning, with Python examples
An Introduction to Statistical Learning, with Applications in R by Gareth James, Daniela…
Tags: book, learning, Python
Eccentricity of bivariate normal level sets
Suppose you have a bivariate normal distribution with correlation ρ where Then the level sets of the density function are ellipses, and the eccentricity e of the ellipses is related to the correlation ρ by Plots For example, suppose ρ = 0.8. Here’s a plot of the density function f(x, …
Astericking NBA champions
It seems to have grown more common for basketball fans to complain that…
Tags: basketball, Pudding, Russell Samora
Babies and the beta-binomial distribution
About half of children are boys and half are girls, but that doesn’t mean that every couple is equally likely to have a boy or a girl each time they conceive a child. And evidence suggests that indeed the probability of conceiving a girl varies per couple. I will simplify …
Changes to Blackjack payouts so that gamblers lose more to casinos
Katherine Sayre, for The Wall Street Journal, on Las Vegas casinos squeezing out…
Tags: gambling, Las Vegas, probability