10. Indeed, only very particular (and peculiar) distributional assumptions will consistently

produce last digits that are not uniformly distributed.

Proposition 3 relaxes the assumption that the support of the distribution of electoral
returns has to be a multiple of numeral base b, but in turn it requires that the density has

to approach 0 for very large or very small returns. In reality, this is an issue only for small

returns, because we cannot reasonably extend the support of f below its natural lower bound

of 0. If there is a non-trivial probability of observing fewer than eighteen votes for a unit of
interest (the lower threshold x ≤ s_1 + 2b − 3 equals 17 for b = 10 and s_1 = 0), then
proposition 3 does not hold (although proposition 1 still may).
For convenience, proposition 3 explicitly imposes a restriction on the linear approximation
error, which is equivalent to the restriction discussed in corollary 1.²

Proposition 3. Consider a discrete, non-negative random variable X with probability density
function f and domain $\{s_1, \ldots, s_2\}$. Suppose f can be approximated by an arithmetic
progression for any sequence containing 2b − 1 elements, where b is the base of the positional
numeral system, and the approximation error follows function $f_e$, where $E[f_e(z + d)] = 0$
over $z \in \{s_1, \ldots, s_2 - 2(b-1)\}$ for any $d \in \{0, \ldots, b-1\}$. Then the occurrence of numerals in
the last digit of X approaches a uniform distribution as f(x) approaches 0 for $x \le s_1 + 2b - 3$
and $x \ge s_2 - 2b + 3$.

The intuition behind the proof of proposition 3 is similar to the one for proposition 1.
Here we show that the total density for different last digits in sequences of size 2(b − 1) is

proportional to a constant if we can linearly approximate the density function within each

sequence. In the proof of proposition 1, we broke density function g into consecutive pieces
of size b. Here the pieces are overlapping, with a sequence starting at each integer, and in

turn the density function’s support no longer has to be divisible by b.
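As a rough numerical illustration of the role played by the lower tail (not part of the formal argument), the sketch below sums a probability mass function by last digit and reports the largest deviation from 1/b. The Poisson distribution and the means of 300 and 5 are assumptions chosen only for illustration, and the helper names are hypothetical. With a mean of 300 the density is negligible below eighteen and the last-digit shares are essentially uniform; with a mean of 5 most of the mass sits near zero and they are not.

```python
import math


def poisson_pmf(mean, upper):
    """Poisson probabilities for counts 0..upper, computed in log space.

    The Poisson form is an illustrative assumption, not the paper's model.
    """
    log_mean = math.log(mean)
    return [math.exp(x * log_mean - mean - math.lgamma(x + 1))
            for x in range(upper + 1)]


def last_digit_shares(pmf, base=10):
    """Total probability attached to each last digit 0..base-1."""
    shares = [0.0] * base
    for count, prob in enumerate(pmf):
        shares[count % base] += prob
    return shares


def max_deviation(shares):
    """Largest absolute gap between a last-digit share and the uniform 1/b."""
    base = len(shares)
    return max(abs(share - 1.0 / base) for share in shares)


# Counts in the hundreds: almost no mass below eighteen.
large_counts = last_digit_shares(poisson_pmf(300.0, 1000))
# Counts near zero: substantial mass below eighteen.
small_counts = last_digit_shares(poisson_pmf(5.0, 1000))

print("max deviation, mean 300:", max_deviation(large_counts))
print("max deviation, mean 5:  ", max_deviation(small_counts))
```

The same two helper functions can be applied to any other mass function supplied as a list indexed by the count.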

Finally, no formal proof is needed to see that if last digits are independently and uniformly
distributed, then (a) in expectation no last digit will be repeated more frequently
than any other in a series of N random draws, and (b) the expected number of repetitions
(i.e. consecutive draws of the same last digit) is $\frac{N-1}{b}$. We argue that the type of empirical

data we consider lends itself to the assumption that last digits are independently distributed.

It is certainly possible that the last digit of the total number of votes cast at a polling station

is correlated with the last digit of the vote count at the next polling station. But if turnout

is in the several hundreds, as it is in our data, it would take a spatial correlation of unlikely

magnitude to carry through to the last digit.
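The expected number of repetitions can be checked exactly for small cases. The sketch below is a minimal illustration that assumes independent, uniformly distributed last digits: it enumerates every possible sequence of N digits in base b, counts consecutive repeats, and averages. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def count_repetitions(digits):
    """Number of draws whose digit equals the digit drawn immediately before."""
    return sum(1 for prev, cur in zip(digits, digits[1:]) if prev == cur)


def expected_repetitions(n_draws, base):
    """Exact expectation under independent, uniform digits.

    Every one of the base**n_draws sequences is equally likely, so the
    expectation is the average repetition count over all of them.
    """
    total = sum(count_repetitions(seq)
                for seq in product(range(base), repeat=n_draws))
    return Fraction(total, base ** n_draws)


print(expected_repetitions(4, 10))  # 3/10, i.e. (N - 1) / b with N = 4, b = 10
```

Full enumeration is feasible only for small N and b; for realistic sample sizes the closed form or a simulation is the practical route.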

Also note that if last and penultimate digits are independently distributed, and last
digits are distributed uniformly, then the expected number of pairs with digit repetition is
again $\frac{N-1}{b}$, regardless of how the penultimate digit is distributed. Even if the second-to-last
digit was always the same, it would not change the fact that the last digit is a match
with probability $\frac{1}{b}$. If we think about the minimum distance between penultimate and last
digits more generally (for convenience, we like to visualize numerals in a circle, in which case
it is easy to see that the minimum distance between 7 and 1, for example, is 4), we can
say that this distance is 0 with probability $\frac{1}{b}$, it is 1 with probability $\frac{2}{b}$, and it is greater
than 1 with probability $\frac{b-3}{b}$ (for b > 2). We later use simulations to construct confidence

² Note that in this case f is linearly approximated over sequences of size 2(b − 1) rather than b, which means
that proposition 3 actually places a somewhat stricter restriction on the approximation error than corollary 1.