Saturday, January 16, 2016

The First-Digit Law

City            Population
A Coruña           243,870
Ávila               58,358
Barcelona        1,604,555
Bilbao             345,141
Cáceres             95,617
Ciudad Real         74,427
Córdoba            327,362
Cuenca              55,428
Donostia           186,095
Girona              97,586
Granada            235,800
Guadalajara         83,391
Huelva             146,318
Huesca              52,239
Lugo                98,134
Madrid           3,141,991
Málaga             569,130
Murcia             439,889
Ourense            106,231
Oviedo             221,870

This post is about long lists of real-life numerical data. Doesn't sound like we're off to an interesting start, does it? But stare at such a list long enough, and you'll notice something strange.

For example, take a list of population counts for each of the more than 8,000 cities in Spain, like the excerpt above. Without actually seeing the whole list, approximately what percentage of those numbers do you expect to start with the digit one?

You could reason that since the first digit can be any digit from 1 to 9, a number will start with a specific digit about one time in nine, or roughly 11%. You know, on average.

That seems a reasonable and logical guess, and yet it turns out that almost one third of the numbers on this list start with the digit ‘1’!

Check out that link. That site shows something else too. It doesn't just apply to this particular list of numbers. Nor does it only apply to lists of population counts. There are many, many numerical data sets out there where this Law — also called Benford's Law — applies.
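To make that concrete, here is a minimal sketch that tallies first digits and sets them next to both guesses. It assumes the usual statement of Benford's Law, in which the digit d leads with probability log10(1 + 1/d). The names `leading_digit` and `benford` are made up for this example, and the data is only the 20-row excerpt from this post, not the full list.

```python
import math
from collections import Counter

# The 20 population counts from the excerpt in this post (not the full list).
populations = [
    243870, 58358, 1604555, 345141, 95617, 74427, 327362, 55428,
    186095, 97586, 235800, 83391, 146318, 52239, 98134, 3141991,
    569130, 439889, 106231, 221870,
]

def leading_digit(n: int) -> int:
    """Return the first decimal digit of a non-zero integer."""
    return int(str(abs(n))[0])

def benford(d: int) -> float:
    """Share of numbers expected to start with digit d under Benford's Law."""
    return math.log10(1 + 1 / d)

counts = Counter(leading_digit(n) for n in populations)

for d in range(1, 10):
    observed = counts[d] / len(populations)
    print(f"{d}: observed {observed:6.1%}   Benford {benford(d):6.1%}   naive {1 / 9:6.1%}")
```

Don't expect a clean match here: only 4 of these 20 numbers start with a 1, and 20 rows is far too small a sample. Feed the same loop a list with thousands of entries and the comparison becomes meaningful.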

Socio-economic data. Stock prices. The lengths of rivers in miles. The lengths of rivers in kilometers! Street addresses. Constants in physics. Birth rates. Death rates. The sizes of the files on your computer. It doesn't apply to everything, but it sure applies to a lot.
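The miles-versus-kilometers bit is the surprising part, so here is a rough sketch of it. Everything in it is an assumption for illustration: the "river lengths" are synthetic numbers spread evenly on a logarithmic scale between 1 and 10,000, not real measurements, and `leading_digit` and `share_of_ones` are made-up helper names. The only hard fact is the conversion factor of 1.609344 kilometers per mile.

```python
import math
import random

def leading_digit(x: float) -> int:
    """Return the first significant digit of a positive number."""
    while x >= 10:
        x /= 10
    while x < 1:
        x *= 10
    return int(x)

def share_of_ones(values) -> float:
    """Fraction of values whose first digit is 1."""
    return sum(leading_digit(v) == 1 for v in values) / len(values)

rng = random.Random(0)  # fixed seed so repeated runs give the same numbers

# Synthetic, made-up "lengths in miles", log-uniform over four orders of magnitude.
lengths_miles = [10 ** rng.uniform(0, 4) for _ in range(100_000)]

# The same lengths expressed in kilometers.
lengths_km = [miles * 1.609344 for miles in lengths_miles]

print(f"miles:      {share_of_ones(lengths_miles):.1%} start with 1")
print(f"kilometers: {share_of_ones(lengths_km):.1%} start with 1")
print(f"Benford:    {math.log10(2):.1%}")
```

Changing the unit multiplies every number by the same constant, yet for data like this the share of leading ones should barely move. Real data sets are messier than this toy one, so treat it as an illustration rather than a proof.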