All the data you need.

Tag: Statistics

What’s “differential” about differential privacy?
Interest in differential privacy is growing rapidly. As evidence of this, here’s the result of a Google Ngram search [1] on “differential privacy.” When I first mentioned differential privacy to consulting leads a few years ago, not many had heard of it. Now most are familiar with the term, though …
50 sigma events for t distributions
I had a recent discussion with someone concerning a 50 sigma event, and that conversation prompted this post. When people count “sigmas” they usually have normal distributions in mind. And six-sigma events are so rare for normal random variables that it’s more likely the phenomenon under consideration doesn’t have a normal …
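As a rough illustration of how rare a six-sigma event is under a normal model, here is a minimal sketch using only the Python standard library. It assumes a two-sided event (a deviation of at least k standard deviations in either direction), which is an assumption and not something the excerpt specifies, and the function name is made up for this example. It covers only the normal benchmark and does not attempt the t distribution case.

```python
from math import erfc, sqrt

def normal_two_sided_tail(k):
    """P(|Z| > k) for a standard normal Z, via the complementary error function."""
    # For standard normal Z, P(|Z| > k) = erfc(k / sqrt(2)).
    return erfc(k / sqrt(2))

if __name__ == "__main__":
    for k in (2, 3, 6):
        print(f"P(|Z| > {k}) = {normal_two_sided_tail(k):.3e}")
```

For very large k the result underflows to zero in floating point, so a sketch like this cannot distinguish a 50 sigma normal event from an impossible one.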
Power law principles are not about power laws
Much of what you’ll read about power laws in popular literature is not mathematically accurate, but still useful. A lot of probability distributions besides power laws look approximately linear on a log-log plot, particularly over part of their range. The usual conclusion from this observation is that much of the …
Your location for sale
Companies collect and aggregate location data from millions of people’s phones. Then that … Tags: location, privacy, The Markup
Americans are dying too much
Derek Thompson for The Atlantic highlights recent research comparing mortality in America against … Tags: Atlantic, Derek Thompson, mortality
How Humans Judge Machines
How Humans Judge Machines is an academic publication covering the results of experiments … Tags: book, interaction, machines
Machine learning explained at five difficulty levels
For their 5 Levels series, Wired brought in Hilary Mason to explain machine … Tags: Hilary Mason, machine learning, Wired
Missing data
Missing data throws a monkey wrench into otherwise elegant plans. Yesterday’s post on genetic sequence data illustrates this point. DNA sequences consist of four bases, but we need to make provision for storing a fifth value for unknowns. If you know there’s a base in a particular position, but you …
Initial letter frequency
I needed to know the frequencies of letters at the beginning of words for a project. The overall frequency of letters, wherever they appear in a word, is well known. Initial frequencies are not so common, so I did a little experiment. I downloaded the Canterbury Corpus and looked at …
Testing the TikTok algorithm
The Wall Street Journal tested out the TikTok algorithm with bots to see … Tags: algorithm, TikTok, Wall Street Journal
Navigating the Return to In-Person Dining in NYC: Analysis of Data Scraped from OpenTable
Background: The COVID-19 pandemic has had huge impacts on the economy of the U.S., and the restaurant industry has been among the hardest hit. To adapt to the pandemic, restaurants turned to technology. 2020 brought about contactless ordering on tablets, QR code menus, and an explosion in the usage of …
An AI chatbot to talk to the dead
Joshua Barbeau fed an AI chatbot with old texts from his fiancee who … Tags: AI, chatbot, death, fiancee
Introduction to Deep Learning
Sebastian Raschka made 170 videos on deep learning, and you can watch all … Tags: deep learning, Python, Sebastian Raschka
Introduction to Modern Statistics
Introduction to Modern Statistics by Mine Cetinkaya-Rundel and Johanna Hardin is a free-to-download … Tags: book, introduction
Random drug screening
Suppose in a company of N employees, m are chosen randomly for drug screening. In two independent screenings, what is the probability that someone will be picked both times? It may be unlikely that any given individual will be picked twice, while being very likely that someone will be picked …
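The contrast between "a given individual" and "someone" can be made concrete with a short calculation. The sketch below assumes each screening draws m distinct employees uniformly at random, independently of the other screening. The function names are illustrative, not from the original post. Under that assumption, the chance the second sample misses the first entirely is C(N-m, m) / C(N, m).

```python
from math import comb

def prob_given_person_twice(N, m):
    """Probability that one particular employee is picked in both screenings."""
    # Each screening picks that person with probability m/N, independently.
    return (m / N) ** 2

def prob_someone_twice(N, m):
    """Probability that at least one employee is picked in both screenings."""
    # The second sample avoids the first in comb(N - m, m) of comb(N, m) ways.
    # comb returns 0 when m > N - m, so overlap is then certain.
    return 1 - comb(N - m, m) / comb(N, m)

if __name__ == "__main__":
    N, m = 100, 10
    print(f"given person twice: {prob_given_person_twice(N, m):.4f}")
    print(f"someone twice:      {prob_someone_twice(N, m):.4f}")
```

With N = 100 and m = 10, a given person has a 1% chance of being picked twice, while the chance that someone is picked twice is roughly two in three.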
Billionaire tax rates
ProPublica anonymously obtained billionaires’ tax returns. Combining the data with Forbes’ billionaire wealth … Tags: billionaires, money, ProPublica, taxes
AI Embraces and Extends Statistics
In the sixty years since Arthur Samuel first published his seminal machine learning work, artificial intelligence has advanced from being not as smart as a flatworm to having less common sense than a house cat.
Universal confidence interval
Here’s a way to find a 95% confidence interval for any parameter θ. With probability 0.95, return the real line. With probability 0.05, return the empty set. Clearly 95% of the time this procedure will return an interval that contains θ. This example shows the difference between a confidence interval …
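The procedure described above is easy to simulate. The following is a minimal sketch, with the empty set represented as `None` and the real line as `(-inf, inf)`. These representations, the function names, and the seed are choices made for this example. The simulated coverage comes out near 0.95 regardless of the value of θ, even though no individual output says anything about θ.

```python
import math
import random

def universal_interval(rng):
    """Return the real line with probability 0.95, else the empty set (None)."""
    if rng.random() < 0.95:
        return (-math.inf, math.inf)
    return None

def covers(interval, theta):
    """True if theta lies in the interval; the empty set covers nothing."""
    return interval is not None and interval[0] < theta < interval[1]

def coverage(theta, trials=100_000, seed=0):
    """Fraction of simulated intervals that contain theta."""
    rng = random.Random(seed)
    hits = sum(covers(universal_interval(rng), theta) for _ in range(trials))
    return hits / trials

if __name__ == "__main__":
    for theta in (-3.0, 0.0, 1e6):
        print(f"theta = {theta}: coverage = {coverage(theta):.3f}")
```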