Saturday, January 16, 2016

The First-Digit Law

A Coruña       243,870
Ávila          58,358
Barcelona      1,604,555
Bilbao         345,141
Cáceres        95,617
Ciudad Real    74,427
Córdoba        327,362
Cuenca         55,428
Donostia       186,095
Girona         97,586
Granada        235,800
Guadalajara    83,391
Huelva         146,318
Huesca         52,239
Lugo           98,134
Madrid         3,141,991
Málaga         569,130
Murcia         439,889
Ourense        106,231
Oviedo         221,870

This post is about long lists of real-life numerical data. Doesn't sound like we're off to an interesting start, does it? But keep staring at such a list long enough, and you'll notice something strange.

For example, take a list of population counts for each of the more than 8,000 municipalities in Spain, like the excerpt above. Without actually seeing the whole list, roughly what percentage of those numbers would you expect to start with the digit 1?

You could reason that since the first digit can be anything from 1 to 9, a number will start with any specific digit about one time in nine, or roughly 11%. You know, on average.

That seems a reasonable and logical guess, and yet it turns out that almost one third of the numbers on this list start with the digit ‘1’!

Check out that link. The site shows something else too: the pattern doesn't just apply to this particular list of numbers, nor only to lists of population counts. There are many, many numerical data sets out there where this Law, also called Benford's Law, applies.
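If you want to try the counting yourself, here is a minimal sketch in Python. It only uses the twenty population figures from the excerpt at the top of this post, which is far too small a sample to show the pattern reliably; the function names are made up for this illustration.

```python
from collections import Counter

# Population figures copied from the excerpt at the top of this post.
# Twenty values are far too few to expect a clean match with the Law.
populations = [
    243870, 58358, 1604555, 345141, 95617, 74427, 327362, 55428,
    186095, 97586, 235800, 83391, 146318, 52239, 98134, 3141991,
    569130, 439889, 106231, 221870,
]


def leading_digit(n: int) -> int:
    """Return the first decimal digit of a positive integer."""
    return int(str(n)[0])


def leading_digit_shares(values):
    """Map each digit 1-9 to the fraction of values that start with it."""
    counts = Counter(leading_digit(v) for v in values)
    return {d: counts[d] / len(values) for d in range(1, 10)}


shares = leading_digit_shares(populations)
for digit, share in shares.items():
    print(f"{digit}: {share:.0%}")
```

Feeding the full list of municipalities into `leading_digit_shares` instead of the excerpt is where the effect should become visible.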

Socio-economic data. Stock prices. The lengths of rivers in miles. The lengths of rivers in kilometers! Street addresses. Constants in physics. Birth rates. Death rates. The sizes of the files on your computer. It doesn't apply to everything, but it sure applies to a lot.
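The miles-versus-kilometers point can be illustrated with a small Python sketch. Loud caveat: the "river lengths" here are synthetic numbers, drawn evenly on a logarithmic scale between 1 and 10,000, which is an assumption chosen because such data follows the Law by construction. So this only shows that a change of units does not disturb the pattern, not why real rivers follow it.

```python
import random


def leading_digit(x: float) -> int:
    """Return the first significant digit of a positive number."""
    # Scientific notation always starts with the leading digit, e.g. "1.609...e+03".
    return int(f"{x:.15e}"[0])


def share_of_ones(values):
    """Fraction of values whose first significant digit is 1."""
    return sum(1 for v in values if leading_digit(v) == 1) / len(values)


rng = random.Random(42)  # fixed seed so the run is repeatable

# Synthetic "lengths in miles" (NOT real data), log-uniform over four decades.
miles = [10 ** rng.uniform(0, 4) for _ in range(100_000)]
# The same lengths converted to kilometers.
kilometers = [m * 1.609344 for m in miles]

print(f"miles:      {share_of_ones(miles):.1%} start with 1")
print(f"kilometers: {share_of_ones(kilometers):.1%} start with 1")
```

Both shares should land close to 30%, even though nearly every individual number changes its leading digit in the conversion.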

Benford's Law actually says something about the occurrence of all digits in such data sets, not just the 1:
Distribution of leading digits in data sets
Digit   Share
1       30.1%
2       17.6%
3       12.5%
4       9.7%
5       7.9%
6       6.7%
7       5.8%
8       5.1%
9       4.6%
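These percentages are not arbitrary. The usual statement of Benford's Law, which this post does not spell out, is that a leading digit d shows up with probability log10(1 + 1/d). A short Python sketch to recompute the percentages (the function name is just for illustration):

```python
import math


def benford_share(d: int) -> float:
    """Expected share of numbers with leading digit d under Benford's Law."""
    return math.log10(1 + 1 / d)


for d in range(1, 10):
    print(f"{d}: {benford_share(d):.1%}")
```

The nine shares add up to 100%, because the sum telescopes to log10(10) = 1.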

Realizing all this is actually good for something too.

Like, if you sell a lot of house numbers, I guess you might want to stock up on the lower digits? Ehm...

A more interesting application is fraud detection, for example in accounting figures. People tampering with numbers tend to give them a nice, even distribution. That looks the most innocuous, right? Well, not when put beside Benford's Law. So it is used in forensic accountancy too, where it is admissible as evidence in court. The math must really check out!
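To give a feel for how such a check might look, here is a rough Python sketch. It uses a plain chi-squared comparison, which is one common textbook way to compare counts against an expected distribution; it is an assumption that this resembles what forensic accountants actually do. Both data sets are made up: powers of 2 stand in for "natural" figures, and an evenly spread list mimics tampered ones.

```python
import math
from collections import Counter


def chi_squared_vs_benford(values):
    """Pearson chi-squared statistic of leading-digit counts against Benford's Law."""
    n = len(values)
    counts = Counter(int(str(v)[0]) for v in values)
    statistic = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        statistic += (counts[d] - expected) ** 2 / expected
    return statistic


# 5% critical value of the chi-squared distribution with 8 degrees of freedom
# (nine digits minus one).
CRITICAL_5_PERCENT = 15.507

# Hypothetical stand-in for honest figures: the first 198 powers of 2.
natural = [2 ** k for k in range(198)]
# Hypothetical "tampered" figures: 22 numbers for each leading digit 1-9.
evenly_spread = [d * 1000 + i for d in range(1, 10) for i in range(22)]

for name, data in [("natural", natural), ("evenly spread", evenly_spread)]:
    stat = chi_squared_vs_benford(data)
    verdict = "suspicious" if stat > CRITICAL_5_PERCENT else "looks fine"
    print(f"{name}: chi-squared = {stat:.1f} ({verdict})")
```

A large statistic only says the digits look unusual; it is a reason to look closer, not proof of tampering.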

Stepping it up: Can the Law be used for predicting a financial crisis?
Well... one can surely speculate:
“Greece's public accounts deviated significantly from the distribution of values indicated by Benford's Law just before joining the Euro. It has been suggested that Greece modified their numbers in order to remain compliant with the Maastricht Treaty.”   source

Huh.

Bonus: Do you know the following sequence of numbers?
1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765
Each number is the sum of the previous two numbers. It's called the Fibonacci sequence and goes on forever.

Another sequence is the factorials. The factorial of a number is the product of that number and all smaller positive whole numbers. It is denoted with an exclamation mark. So 4! = 4 × 3 × 2 × 1 = 24.
The list of factorials is as follows, and it too goes on forever:
1, 2, 6, 24, 120, 720, 5040, 40320, 362880, 3628800, 39916800, 479001600, 6227020800

And then there are the powers of 2:
1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768, 65536

The sequences mentioned above have something in common.
Their leading digits follow the distribution given by Benford's Law, and the further you follow each sequence, the closer the match gets.

So once again... Huh.
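For the curious, the claim about these three sequences can be checked with a few lines of Python. This is only a sketch: it looks at the first 1,000 terms of each sequence, an arbitrary cut-off picked for this example, and only at how often the digit 1 leads.

```python
import math
from itertools import islice


def fibonacci():
    """Yield 1, 1, 2, 3, 5, 8, ..."""
    a, b = 1, 1
    while True:
        yield a
        a, b = b, a + b


def factorials():
    """Yield 1!, 2!, 3!, ... = 1, 2, 6, 24, ..."""
    value, n = 1, 1
    while True:
        yield value
        n += 1
        value *= n


def powers_of_two():
    """Yield 1, 2, 4, 8, ..."""
    value = 1
    while True:
        yield value
        value *= 2


def share_of_ones(sequence, count=1000):
    """Fraction of the first `count` terms whose leading digit is 1."""
    # Python integers are exact, so str() gives the true leading digit.
    digits = [int(str(x)[0]) for x in islice(sequence, count)]
    return digits.count(1) / count


print(f"Benford's Law: {math.log10(2):.1%}")
for name, seq in [("Fibonacci", fibonacci()),
                  ("factorials", factorials()),
                  ("powers of 2", powers_of_two())]:
    print(f"{name}: {share_of_ones(seq):.1%}")
```

The factorials tend to wander a bit further from 30.1% than the other two at this cut-off, since their leading digits settle into the pattern more slowly.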
