Tag: Computing

Three composition theorems for differential privacy
This is a brief post, bringing together three composition theorems for differential privacy. The composition of an ε1-differentially private algorithm and an ε2-differentially private algorithm is an (ε1+ε2)-differentially private algorithm. The composition of an (ε1, δ1)-differentially private algorithm and an (ε2, δ2)-differentially private algorithm is an (ε1+ε2, δ1+δ2)-differentially private algorithm. …
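As a rough illustration of the two composition rules quoted above, the privacy parameters simply add. The sketch below is a minimal bookkeeping helper, not code from the post; the name `compose` and the example budgets are hypothetical, and a pure ε-differentially private algorithm is represented as (ε, 0).

```python
def compose(*budgets):
    """Basic composition: add the epsilons and add the deltas.

    Each budget is an (epsilon, delta) pair; use delta = 0 for pure
    epsilon-differential privacy. `compose` is a hypothetical helper name.
    """
    eps = sum(e for e, _ in budgets)
    delta = sum(d for _, d in budgets)
    return eps, delta

# Two pure-DP algorithms: the result is (eps1 + eps2, 0).
print(compose((0.5, 0.0), (0.25, 0.0)))

# Two approximate-DP algorithms: the result is (eps1 + eps2, delta1 + delta2).
print(compose((0.5, 0.125), (0.25, 0.125)))
```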
How to Set Num Lock on permanently
When I use my Windows laptop, I’m always accidentally brushing against the Num Lock key. I suppose it’s because the keys are so flat; I never have this problem on a desktop. I thought there must be some way to set it so that it’s always on, so I searched …
Extended floating point precision in R and C
The GNU MPFR library is a C library for extended precision floating point calculations. The name stands for Multiple Precision Floating-point Reliable. The library has an R wrapper Rmpfr that is more convenient for interactive use. There are also wrappers for other languages. It takes a long time to install …
When is round-trip floating point radix conversion exact?
Suppose you store a floating point number in memory, print it out in human-readable base 10, and read it back in. When can the original number be recovered exactly? D. W. Matula answered this question more generally in 1968 [1]. Suppose we start with base β with p places of …
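The round trip described above can be tried directly for IEEE 754 double precision. The sketch below is only an experiment, not Matula's general result: the helper name `round_trips` is hypothetical, and the fact that 17 significant decimal digits suffice for doubles is standard background rather than something stated in the excerpt.

```python
def round_trips(x, digits):
    """True if x survives printing with `digits` significant decimal
    digits and being read back in as a double."""
    text = f"{x:.{digits}g}"   # binary -> decimal
    return float(text) == x    # decimal -> binary

x = 0.1 + 0.2
print(round_trips(x, 15))  # 15 digits loses the last bit of this value
print(round_trips(x, 17))  # 17 digits recovers it exactly
```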
MDS codes
A maximum distance separable code, or MDS code, is a way of encoding data so that the distance between code words is as large as possible for a given data capacity. This post will explain what that means and give examples of MDS codes. Notation A linear block code takes …
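To make "as large as possible" concrete: by the Singleton bound, a code with block length n and k data symbols has minimum distance at most n − k + 1, and an MDS code meets that bound. The sketch below checks this for a small single-parity-check code, which is chosen here purely for illustration and may not be one of the post's examples; `encode` and `min_distance` are hypothetical names.

```python
from itertools import combinations, product

def encode(msg):
    """Single parity check code: append one bit so the total parity is even."""
    return msg + (sum(msg) % 2,)

def min_distance(codewords):
    """Smallest Hamming distance between any two distinct code words."""
    return min(
        sum(x != y for x, y in zip(u, v))
        for u, v in combinations(codewords, 2)
    )

k = 2                                   # data bits per word
code = [encode(m) for m in product((0, 1), repeat=k)]
n = len(code[0])                        # block length, here k + 1
d = min_distance(code)

# The code is MDS if it meets the Singleton bound d = n - k + 1.
print(n, k, d, d == n - k + 1)
```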
Computing the area of a thin triangle
Heron’s formula computes the area of a triangle given the length of each side: A = √(s(s − a)(s − b)(s − c)), where s = (a + b + c)/2 is the semiperimeter. If you have a very thin triangle, one where two of the sides approximately equal s and the third side is much shorter, a direct implementation of Heron’s formula may not be accurate. The cardinal rule …
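One way to see the issue is to compare a direct implementation with Kahan's rearrangement of Heron's formula, which sorts the sides and groups the terms to avoid cancellation. The excerpt is cut off before naming a remedy, so treating Kahan's version as the fix is an assumption here; the function names are hypothetical.

```python
import math

def heron_naive(a, b, c):
    """Direct implementation: s - a and s - b can lose most of their digits."""
    s = (a + b + c) / 2
    return math.sqrt(s * (s - a) * (s - b) * (s - c))

def heron_kahan(a, b, c):
    """Kahan's rearrangement; the parentheses must be kept as written."""
    a, b, c = sorted((a, b, c), reverse=True)   # now a >= b >= c
    return 0.25 * math.sqrt(
        (a + (b + c)) * (c - (a - b)) * (c + (a - b)) * (a + (b - c))
    )

# A thin isosceles triangle: two sides of length 1 and a very short third side.
thin = (1.0, 1.0, 1e-10)
print(heron_naive(*thin))
print(heron_kahan(*thin))
```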
Computing parity of a binary word
The previous post mentioned adding a parity bit to a string of bits as a way of detecting errors. The parity of a binary word is 1 if the word contains an odd number of 1s and 0 if it contains an even number of ones. Codes like the Hamming …
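The definition above translates directly into a loop that XORs the bits of the word together. This is a minimal sketch for non-negative integers, with a hypothetical function name; it is not necessarily the method the post goes on to describe.

```python
def parity(word: int) -> int:
    """Return 1 if `word` has an odd number of 1 bits, otherwise 0.

    Assumes word >= 0.
    """
    p = 0
    while word:
        p ^= word & 1   # fold the lowest bit into the running parity
        word >>= 1      # move on to the next bit
    return p

print(parity(0b1011))  # three 1s -> 1
print(parity(0b1001))  # two 1s  -> 0
```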
Popcount: counting 1’s in a bit stream
Sometimes you need to count the number of 1’s in a stream of bits. The most direct application would be summarizing yes/no data packed into bits. It’s also useful in writing efficient, low-level bit twiddling code. But there are less direct applications as well. For example, three weeks ago this …
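For a word held in an integer, one common way to count the 1s is Kernighan's trick of repeatedly clearing the lowest set bit. The sketch below assumes a non-negative integer and uses a hypothetical function name; the post may well use a different approach.

```python
def popcount(x: int) -> int:
    """Count the 1 bits in a non-negative integer."""
    count = 0
    while x:
        x &= x - 1   # clears the lowest set bit
        count += 1   # one iteration per 1 bit
    return count

print(popcount(0b101101))  # 4
```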
Runge-Kutta methods and Butcher tableau
If you know one numerical method for solving ordinary differential equations, it’s probably Euler’s method. If you know two methods, the second is probably 4th order Runge-Kutta. It’s standard in classes on differential equations or numerical analysis to present Euler’s method as a conceptually simple but inefficient introduction, then to present …
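For reference, here is a minimal sketch of a single step of each of the two methods named above, applied to a scalar equation y′ = f(t, y). The function names and the test problem y′ = y are illustrative choices, not taken from the post.

```python
import math

def euler_step(f, t, y, h):
    """One step of Euler's method."""
    return y + h * f(t, y)

def rk4_step(f, t, y, h):
    """One step of the classical 4th order Runge-Kutta method."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h * k1 / 2)
    k3 = f(t + h / 2, y + h * k2 / 2)
    k4 = f(t + h, y + h * k3)
    return y + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6

def integrate(step, f, t0, y0, h, n):
    """Apply `step` n times starting from (t0, y0)."""
    t, y = t0, y0
    for _ in range(n):
        y = step(f, t, y, h)
        t += h
    return y

# y' = y with y(0) = 1, so y(1) = e. Ten steps of size 0.1.
f = lambda t, y: y
print(abs(integrate(euler_step, f, 0.0, 1.0, 0.1, 10) - math.e))
print(abs(integrate(rk4_step, f, 0.0, 1.0, 0.1, 10) - math.e))
```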
TestU01 small crush test suite
In recent posts I’ve written about using RNG test suites on the output of the μRNG entropy extractor. This is probably the last post in the series. I’ve looked at NIST STS, PractRand, and DIEHARDER before. In this post I’ll be looking at TestU01. TestU01 includes three batteries of tests: …
Testing entropy extractor with NIST STS
Around this time last year I wrote about the entropy extractor used in μRNG. It takes three biased random bit streams and returns an unbiased bit stream, provided each stream has at least 1/3 of a bit of min-entropy. I’ve had in the back of my mind that I should …
Stiff differential equations
There is no precise definition of what it means for a differential equation to be stiff, but essentially it means that implicit methods will work much better than explicit methods. The first use of the term [1] defined stiff equations as equations where certain implicit methods, in particular BDF, perform …
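A small experiment can make the explicit/implicit contrast concrete. The sketch below compares forward (explicit) Euler with backward (implicit) Euler, which is the first-order BDF method, on the linear test equation y′ = λy. The equation, the value λ = −50, and the step size are assumptions chosen for illustration, not taken from the post.

```python
def explicit_euler(lam, y0, h, n):
    """Forward Euler for y' = lam * y: y_{k+1} = (1 + h*lam) * y_k."""
    y = y0
    for _ in range(n):
        y = (1 + h * lam) * y
    return y

def implicit_euler(lam, y0, h, n):
    """Backward Euler (BDF1) for y' = lam * y: y_{k+1} = y_k / (1 - h*lam).

    For this linear equation the implicit step can be solved in closed form.
    """
    y = y0
    for _ in range(n):
        y = y / (1 - h * lam)
    return y

lam, y0, h, n = -50.0, 1.0, 0.1, 10     # true solution decays like exp(-50 t)
print(explicit_euler(lam, y0, h, n))    # grows in magnitude: unstable at this h
print(implicit_euler(lam, y0, h, n))    # decays toward 0, like the true solution
```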
Stable and unstable recurrence relations
The previous post looked at computing recurrence relations. That post ends with a warning that recursive evaluations may or may not be numerically stable. This post will give examples that illustrate stability and instability. There are two kinds of Bessel functions, denoted J and Y. These are called Bessel functions …
Finding large pseudoprimes
Fermat’s little theorem says that if p is a prime number, then for any integer b not divisible by p, b^(p-1) = 1 (mod p). This gives a necessary but not sufficient test for a number to be prime. A number that satisfies the equation above but is not prime is called a pseudoprime …
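The test above is a one-liner with modular exponentiation. The sketch below uses base 2 and a simple trial-division check to find the smallest composite number that passes; the function names are hypothetical, and the search for large pseudoprimes that the post's title refers to is not attempted here.

```python
def passes_fermat(n, b=2):
    """True if b^(n-1) = 1 (mod n), the necessary condition for n to be prime."""
    return pow(b, n - 1, n) == 1

def is_prime(n):
    """Trial division; adequate for the small numbers used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

# Smallest composite number that passes the base-2 test.
smallest = next(n for n in range(2, 1000) if passes_fermat(n) and not is_prime(n))
print(smallest)  # 341 = 11 * 31
```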
Doing a database join with CSV files
It’s easy to manipulate CSV files with basic command line tools until you need to do a join. When your data is spread over two different files, like two tables in a normalized database, joining the files is more difficult unless the two files have the same keys in the …
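For comparison with the command line approach the post goes on to describe, here is a rough sketch of an inner join done in Python with the standard `csv` module. The file contents, the column names, and the helper `inner_join` are all hypothetical; note that the two inputs do not need to list their keys in the same order.

```python
import csv
import io

def inner_join(left_rows, right_rows, key):
    """Yield merged rows for every pair of rows that share the same key value."""
    index = {}
    for row in right_rows:                  # index the right-hand file by key
        index.setdefault(row[key], []).append(row)
    for row in left_rows:                   # stream the left-hand file
        for match in index.get(row[key], []):
            yield {**row, **match}

# In-memory stand-ins for two CSV files (hypothetical data).
left = io.StringIO("id,name\n1,Ann\n2,Bob\n3,Cy\n")
right = io.StringIO("id,dept\n2,Ops\n1,R&D\n")

joined = list(inner_join(csv.DictReader(left), csv.DictReader(right), "id"))
for row in joined:
    print(row)
```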
Exporting Excel files to CSV with in2csv
This post shows how to export an Excel file to a CSV file using in2csv from the csvkit package. You could always use Excel itself to export an Excel file to CSV but there are several reasons you might not want to. First and foremost, you might not have Excel. …
Minimizing context switching between shell and Python
Sometimes you’re in the flow using the command line and you’d like to briefly switch over to Python without too much interruption. Or it could be the other way around: you’re in the Python REPL and need to issue a quick shell command. One solution would be to run your …
Top command line posts of 2019
Top blog posts this year about command line tools: The hard part in becoming a command line wizard; Computational survivalist; Computing π with bc; Set theory at the command line; Working with wide text files; Random sampling from a file.