The probability of coprimality

A famous result: Two positive integers chosen at random will be coprime with probability {6/\pi^2}.

One has to say what one means by this, since there is no uniform distribution on the positive integers. What we mean is that if we choose {x} and {y} uniformly and independently at random from the integers {\{1,\ldots,N\}}, then {x} and {y} will be coprime with probability tending to {6/\pi^2} as {N\rightarrow\infty}.
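
Here is a quick empirical sanity check, a minimal Python sketch using only the standard library (the helper name is mine):

```python
from math import gcd, pi

# Brute-force check of the claim: the proportion of coprime pairs in
# {1, ..., N}^2 should approach 6/pi^2 ~ 0.6079 as N grows.
def coprime_proportion(N):
    return sum(1 for x in range(1, N + 1)
                 for y in range(1, N + 1)
                 if gcd(x, y) == 1) / N**2

for N in (10, 100, 1000):
    print(N, coprime_proportion(N), 6 / pi**2)
```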

Here is one classic proof: Consider the points {(x,y)\in\{1,\ldots,N\}^2} such that {\gcd(x,y)=1}. If we scale this picture by {M}, we will have exactly the points {(X,Y)\in\{1,\ldots,MN\}^2} such that {\gcd(X,Y)=M}. Therefore, whatever the limiting probability {P} of being coprime is, the probability of having {\gcd = M} is exactly {P/M^2}. The law of total probability then implies

\displaystyle  1 = \lim_{N\rightarrow\infty}\mathbf{P}(\gcd(x,y)\in\mathbf{N}) = P \sum_{M=1}^{\infty} \frac{1}{M^2} = P\,\zeta(2).
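
The scaling step can also be checked numerically: the sketch below (again just the standard library, with small toy values of {N} and {M}) confirms that the pairs in {\{1,\ldots,MN\}^2} with gcd exactly {M} are in bijection with the coprime pairs in {\{1,\ldots,N\}^2}.

```python
from math import gcd

# The pairs (X, Y) in {1, ..., M*N}^2 with gcd(X, Y) = M are exactly
# M times the coprime pairs (x, y) in {1, ..., N}^2, so the counts agree.
def count_pairs_with_gcd(bound, g):
    return sum(1 for x in range(1, bound + 1)
                 for y in range(1, bound + 1)
                 if gcd(x, y) == g)

N, M = 50, 3
print(count_pairs_with_gcd(M * N, M))  # equals ...
print(count_pairs_with_gcd(N, 1))      # ... this count
```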

Another way of imagining a random element of a noncompact group {G} is to look at the profinite completion {\hat{G}}, which is compact and therefore carries a uniform (Haar) probability measure. Now by the Chinese Remainder Theorem,

\displaystyle  \hat{\mathbf{Z}} = \prod_p \mathbf{Z}_p,

so choosing an element at random from {\hat{\mathbf{Z}}} is the same as choosing an element at random from {\mathbf{Z}_p} independently for each {p}. The probability that a random element of {\mathbf{Z}_p} is divisible by {p} is precisely {1/p}, so the probability that two random elements of {\hat{\mathbf{Z}}} have no common prime factor is precisely

\displaystyle  \prod_p (1-1/p^2) = 1/\zeta(2).

This does not imply the asymptotic result in {\mathbf{Z}} mentioned above, but it is a nice way of thinking about it.
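
Numerically, the truncated Euler product converges to {6/\pi^2} quite quickly; here is a small sketch (the sieve is a throwaway helper of mine, nothing canonical):

```python
from math import pi

# Truncated Euler product prod_{p <= P} (1 - 1/p^2), compared with
# 1/zeta(2) = 6/pi^2.
def primes_up_to(P):
    sieve = [True] * (P + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(P**0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = [False] * len(sieve[p * p :: p])
    return [p for p, flag in enumerate(sieve) if flag]

for P in (10, 100, 10000):
    product = 1.0
    for p in primes_up_to(P):
        product *= 1 - 1 / p**2
    print(P, product, 6 / pi**2)
```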

EDIT (28/4/12): As pointed out to me by Freddie Manners, the “classic proof” suggested above suffers from the fatal defect of not proving that the limit exists. Here is an alternative proof which does prove that the limit exists. More generally, let {d>1} and consider positive integers {x_1,\ldots,x_d\leq X} chosen uniformly at random. Let {G(X)} be the number of {(x_1,\ldots,x_d)\in[1,X]^d} such that {\gcd(x_1,\ldots,x_d)=1}, and observe that the number of {(x_1,\ldots,x_d)\in[1,X]^d} such that {\gcd(x_1,\ldots,x_d) = n} is precisely {G(X/n)}. Then

\displaystyle  \lfloor X\rfloor^d = \sum_{n\geq1} G(X/n)

for all {X}. Inverting this relationship,

\displaystyle  G(X) = \sum_{n\geq1} \mu(n) \lfloor X/n\rfloor^d.

Thus, the probability that {d} integers {x_1,\ldots,x_d\leq X} chosen uniformly and independently at random satisfy {\gcd(x_1,\ldots,x_d)=1} is exactly

\displaystyle P(X) = \sum_{n\geq1} \mu(n) \left(\frac{\lfloor X/n\rfloor}{\lfloor X\rfloor}\right)^d.

The {n}th term here is bounded in absolute value by {(2/n)^d} (for {X\geq 1}, say, since {\lfloor X/n\rfloor\leq X/n} and {\lfloor X\rfloor\geq X/2}), so by the dominated convergence theorem {P(X)\rightarrow\sum_{n\geq1} \mu(n) n^{-d} = \zeta(d)^{-1}} as {X\rightarrow\infty}.
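
One can check both the inversion formula and the limit numerically; the following sketch (with a naive Möbius function written just for illustration) compares the Möbius-inverted count with a brute-force count, and {P(X)} with {1/\zeta(3)\approx 0.8319}:

```python
from math import gcd
from functools import reduce
from itertools import product

# Naive Mobius function by trial division (for illustration only).
def mobius(n):
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0      # square factor: mu(n) = 0
            result = -result
        p += 1
    if n > 1:
        result = -result
    return result

# G(X) via Mobius inversion: sum_n mu(n) * floor(X/n)^d.
def G_mobius(X, d):
    return sum(mobius(n) * (X // n)**d for n in range(1, X + 1))

# G(X) by direct enumeration of d-tuples.
def G_brute(X, d):
    return sum(1 for t in product(range(1, X + 1), repeat=d)
                 if reduce(gcd, t) == 1)

X, d = 30, 3
print(G_mobius(X, d), G_brute(X, d))   # the two counts agree
print(G_mobius(X, d) / X**d)           # roughly 1/zeta(3) ~ 0.8319
```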

Deriving the normal distribution

When I was in high school, I asked several people, “Why the normal distribution?”. After all, the function {e^{-x^2/2}/\sqrt{2\pi}} looks like a pretty bizarre function to guess, when there are other functions like {1/(1+x^2)} which produce perfectly fine bell-shaped curves. One answer I received was that the normal distribution is in some sense the limit of the binomial distribution. While this answer seems fair enough, I tried my hand at the mathematics and did not succeed, so I was still confounded. Nonetheless I believed the answer, and it satisfied me for the time.

The real answer that I was looking for but did not appreciate until university was the central limit theorem. For me, the central limit theorem is the explanation of the normal distribution. In any case, the calculation that I attempted was basically a verification of the central limit theorem in a simple case, and it is a testament to the force of the central limit theorem that even this simple case is difficult to work out by hand.

In this post, I carry out the calculation that I should have accomplished in high school (with the benefit of hindsight supplying the correct scaling factor of {\sqrt{n}} rather than {n}). Along the way, I will also check both laws of large numbers in this simple case.

Consider a “random walk” on {\bf Z}. Specifically, let {X} be a random variable taking the values {1} and {-1} each with probability {1/2}, and let

\displaystyle  S_n = X_1+X_2+\ldots+X_n,

where each {X_i} is an independent copy of {X}. Note that {(X+1)/2} is Bernoulli with parameter {1/2}, so {S_n/2 + n/2} has the binomial distribution with parameters {n} and {1/2}. Hence

\displaystyle  {\bf P}(S_n = k) = {\bf P}(S_n/2 + n/2 = k/2 + n/2) = \binom{n}{k/2+n/2} 2^{-n},

where we make the convention that the binomial coefficient is zero when it doesn’t make sense. Hence, if {x>0},

\displaystyle  {\bf P}(S_n/n \geq x) = \sum_{k\geq xn} {\bf P}(S_n = k) \leq \frac{n}{2} \binom{n}{\lceil (1+x)n/2\rceil} 2^{-n}.
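
Before letting Stirling loose, the exact formula is easy to sanity-check in code; here is a minimal sketch of the distribution of {S_n}:

```python
from math import comb

# Exact walk distribution: P(S_n = k) = C(n, (k+n)/2) / 2^n when
# k + n is even and |k| <= n, and 0 otherwise (the convention above).
def walk_prob(n, k):
    if abs(k) > n or (k + n) % 2 != 0:
        return 0.0
    return comb(n, (k + n) // 2) / 2**n

n = 10
probs = [walk_prob(n, k) for k in range(-n, n + 1, 2)]
print(probs)
print(sum(probs))  # total mass should be 1.0
```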

By Stirling’s formula, {m! \sim \sqrt{2\pi m} (m/e)^m}, so

\displaystyle  \binom{m}{k} \sim \sqrt{\frac{m}{2\pi k(m-k)}} \frac{m^m}{k^k (m-k)^{m-k}}

as {k,m-k\rightarrow\infty}. Hence, for constant {0<x<1},

\displaystyle  \mathbf{P}(S_n/n \geq x) \leq \frac{n}{2} \binom{n}{\lceil (1+x)n/2\rceil} 2^{-n} \sim \frac{\sqrt{n}}{\sqrt{2\pi(1-x^2)}} \left((1+x)^{1+x} (1-x)^{1-x}\right)^{-n/2}.

Let {f(x) = (1+x)^{1+x} (1-x)^{1-x}}. Note that {f(x)>0} for {0<x<1}, that {f(0)=1}, and that

\displaystyle  \frac{f^\prime(x)}{f(x)} = (\log f(x))^\prime = \log(1+x) + 1 - \log(1-x) - 1 = \log\left(\frac{1+x}{1-x}\right) > 0,

so {f(x)>1} for all {0<x<1}. Hence

\displaystyle  {\bf P}(S_n/n\geq x) \rightarrow 0,

and by symmetry,

\displaystyle  {\bf P}(S_n/n\leq -x) \rightarrow 0,

as {n\rightarrow\infty}. Thus we have verified the weak law.
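
The decay is visible numerically: the exact tail probability below (computed with the same binomial formula) shrinks rapidly even for a small threshold like {x=0.1}.

```python
from math import ceil, comb

# Exact tail probability P(S_n / n >= x) for the simple random walk.
def tail(n, x):
    k_min = ceil(x * n)
    return sum(comb(n, (k + n) // 2)
               for k in range(k_min, n + 1)
               if (k + n) % 2 == 0) / 2**n

x = 0.1
for n in (10, 100, 1000):
    print(n, tail(n, x))
```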

In fact, because the convergence above is geometric,

\displaystyle  \sum_{n=1}^\infty {\bf P}(S_n/n\geq x) < \infty,

whence

\displaystyle  {\bf P}(S_n/n\geq x\text{ infinitely often}) = \lim_{m\rightarrow\infty} {\bf P}(S_n/n\geq x\text{ for some }n\geq m) \leq \lim_{m\rightarrow\infty} \sum_{n=m}^\infty {\bf P}(S_n/n\geq x) =0.

(This is the Borel–Cantelli lemma.) Thus {S_n/n\rightarrow 0} almost surely. Thus we have verified the strong law.
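
A simulation makes the strong law vivid: along typical sample paths, {S_n/n} visibly settles down to {0}.

```python
import random

# Watch S_n / n drift toward 0 along a few simulated sample paths.
random.seed(0)
for path in range(3):
    s = 0
    for n in range(1, 100_001):
        s += random.choice((-1, 1))
        if n in (100, 10_000, 100_000):
            print(path, n, s / n)
```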

It remains to check the central limit theorem, i.e., to investigate the limiting distribution of {S_n/\sqrt{n}}. Now, for constant {x<y},

\displaystyle  {\bf P}(x\leq S_n/\sqrt{n} \leq y) = \int_x^y\frac{\sqrt{n}}{2} \binom{n}{(1+t/\sqrt{n})n/2} 2^{-n} \,d\mu_n(t),

where {\mu_n} is the atomic measure assigning a mass {2/\sqrt{n}} to each point of {(n+2{\bf Z})/\sqrt{n}}. Now Stirling’s formula strikes again, giving

\displaystyle  \frac{\sqrt{n}}{2}\binom{n}{(1+t/\sqrt{n})n/2} 2^{-n} \sim \frac{1}{\sqrt{2\pi(1-t^2/n)}} \left((1+t/\sqrt{n})^{1+ t/\sqrt{n}} (1- t/\sqrt{n})^{1- t/\sqrt{n}}\right)^{-n/2} \longrightarrow \frac{1}{\sqrt{2\pi}} e^{-t^2/2},

since, writing {m=\sqrt{n}} (so that the exponent {-n/2} equals {-\frac{t^2}{2}\cdot\frac{m^2}{t^2}}),

\displaystyle  \lim_{m\rightarrow\infty} \left((1+t/m)^{1+t/m} (1-t/m)^{1-t/m}\right)^{m^2/t^2} = \lim_{z\rightarrow\pm\infty} (1-1/z^2)^{z^2} (1+1/z)^z (1-1/z)^{-z} = e.
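
(As a numerical aside before continuing the argument: this pointwise convergence is already visible for moderate {n}; the sketch below compares the rescaled point masses with the Gaussian density.)

```python
from math import comb, exp, pi, sqrt

# Rescaled point mass (sqrt(n)/2) * P(S_n = k) versus the Gaussian
# density at t = k / sqrt(n); k must have the same parity as n.
n = 10_000
for k in (0, 50, 100, 200):
    t = k / sqrt(n)
    lhs = (sqrt(n) / 2) * comb(n, (k + n) // 2) / 2**n
    rhs = exp(-t**2 / 2) / sqrt(2 * pi)
    print(t, lhs, rhs)
```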

Hence

\displaystyle  {\bf P}(x\leq S_n/\sqrt{n} \leq y) = \frac{1}{\sqrt{2\pi}} \int_x^y e^{-t^2/2}\,d\mu_n(t) + \int_x^y R_n(t)\,d\mu_n(t),

where {R_n(t)\rightarrow 0} as {n\rightarrow\infty}. Moreover this convergence is uniform in {t\in[x,y]}, so

\displaystyle  \left|\int_x^y R_n(t)\,d\mu_n(t)\right| \leq \mu_n([x,y]) \max_{x\leq t\leq y} |R_n(t)| \leq (y-x + 2/\sqrt{n})\max_{x\leq t\leq y} |R_n(t)| \rightarrow 0.

Hence

\displaystyle  {\bf P}(x\leq S_n/\sqrt{n}\leq y) \rightarrow \frac{1}{\sqrt{2\pi}} \lim_{n\rightarrow\infty}\int_x^y e^{-t^2/2}\,d\mu_n(t) = \frac{1}{\sqrt{2\pi}}\int_x^y e^{-t^2/2}\,dt,

where the last equality follows from the theorem that continuous functions are Riemann integrable. Thus we have verified the central limit theorem.
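
Finally, here is the same statement checked end to end: the exact probability that {S_n/\sqrt{n}} lands in {[-1,1]}, against the normal integral (the latter written in terms of the error function):

```python
from math import comb, erf, sqrt

# Exact P(x <= S_n / sqrt(n) <= y) versus the Gaussian integral.
def interval_prob(n, x, y):
    lo, hi = x * sqrt(n), y * sqrt(n)
    return sum(comb(n, (k + n) // 2)
               for k in range(-n, n + 1, 2)
               if lo <= k <= hi) / 2**n

def normal_prob(x, y):
    Phi = lambda t: (1 + erf(t / sqrt(2))) / 2
    return Phi(y) - Phi(x)

for n in (100, 1_600, 10_000):
    print(n, interval_prob(n, -1, 1), normal_prob(-1, 1))
```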