Deriving the normal distribution

When I was in high school, I asked several a person, “Why the normal distribution?”. After all, the function {e^{-x^2/2}/\sqrt{2\pi}} looks like a pretty bizarre function to guess, when there are other functions like {1/(1+x^2)} which produce perfectly fine bell-shaped curves. One answer I received was that the normal distribution is in some sense the limit of the binomial distribution. While this answer seems fair enough, I tried my hand at the mathematics and did not succeed, so I was still confounded. None the less I believed the answer, and it satisfied me for the time.

The real answer that I was looking for but did not appreciate until university was the central limit theorem. For me, the central limit theorem is the explanation of the normal distribution. In any case, the calculation that I attempted was basically a verification of the central limit theorem in a simple case, and it is a testement to the force of the central limit theorem that that simple case is difficult to work out by hand.

In this post, I rectify that calculation that I should have accomplished in high school (with the benefit of hind-sight being the correct factor of {\sqrt{n}} rather than {n}). On the way, I will also check both laws of large numbers in this simple case.

Consider a “random walk” on {\bf Z}. Specifically, let {X} be a random variable taking the values {1} and {-1} each with probability {1/2}, and let

\displaystyle  S_n = X_1+X_2+\ldots+X_n,

where each {X_i} is an independent copy of {X}. Note that {(X+1)/2} is Bernoulli, so {S_n/2 + n/2} is binomial, so

\displaystyle  {\bf P}(S_n = k) = {\bf P}(S_n/2 + n/2 = k/2 + n/2) = \left(\begin{array}{c}n\\ k/2+n/2\end{array}\right) 2^{-n},

where we make the convention that the binomial coefficient is zero when it doesn’t make sense. Hence, if {x>0},

\displaystyle  {\bf P}(S_n/n \geq x) = \sum_{k\geq x} {\bf P}(S_n/n = k) \leq \frac{n}{2} \left(\begin{array}{c} n\\\lceil (1+x)n/2\rceil \end{array}\right) 2^{-n}.

By Stirling’s formula, {m! \sim \sqrt{2\pi m} (m/e)^m}, so

\displaystyle  \left(\begin{array}{c}m\\k\end{array}\right) \sim \sqrt{\frac{m}{2\pi k(m-k)}} \frac{m^m}{k^k (m-k)^{m-k}}

as {m,k\rightarrow\infty}. Hence, for constant {x>0},

\displaystyle  \mathbf{P}(S_n/n \geq x) \leq \frac{n}{2} \binom{n}{\lceil (1+x)n/2\rceil} 2^{-n} \sim \frac{\sqrt{n}}{\sqrt{2\pi(1-x^2)}} \left((1+x)^{1+x} (1-x)^{1-x}\right)^{-n/2}.

Let {f(x) = (1+x)^{1+x} (1-x)^{1-x}}. Note that {f(x)>0} for {0<x<1}, that {f(0)=1}, and that

\displaystyle  \frac{f^\prime(x)}{f(x)} = (\log f(x))^\prime = \log(1+x) + 1 - \log(1-x) - 1 = \log\left(\frac{1+x}{1-x}\right) > 0,

so {f(x)>1} for all {0<x<1}. Hence

\displaystyle  {\bf P}(S_n/n\geq x) \rightarrow 0,

and by symmetry,

\displaystyle  {\bf P}(S_n/n\leq -x) \rightarrow 0,

as {n\rightarrow\infty}. Thus we have verified the weak law.

In fact, because the convergence above is geometric,

\displaystyle  \sum_{n=1}^\infty {\bf P}(S_n/n\geq x) < \infty,

whence

\displaystyle  {\bf P}(S_n/n\geq x\text{ infinitely often}) = \lim_{m\rightarrow\infty} {\bf P}(S_n/n\geq x\text{ for some }n\geq m) \leq \lim_{m\rightarrow\infty} \sum_{n=m}^\infty {\bf P}(S_n/n\geq x) =0.

(This is the Borel–Cantelli lemma.) Thus {S_n/n\rightarrow 0} almost surely. Thus we have verified the strong law.

It remains to check the central limit theorem, i.e., to investigate the limiting distribution of {S_n/\sqrt{n}}. Now, for constant {x<y},

\displaystyle  {\bf P}(x\leq S_n/\sqrt{n} \leq y) = \int_x^y\frac{\sqrt{n}}{2} \left(\begin{array}{c} n\\ (1+t/\sqrt{n})n/2\end{array}\right) 2^{-n} \,d\mu_n(t),

where {\mu_n} is the atomic measure assigning a mass {2/\sqrt{n}} to each point of {(n+2{\bf Z})/\sqrt{n}}. Now Stirling’s formula strikes again, giving

\displaystyle  \frac{\sqrt{n}}{2}\left(\begin{array}{c} n\\ (1+t/\sqrt{n})n/2\end{array}\right) 2^{-n} \sim \frac{1}{\sqrt{2\pi(1-t^2/n)}} \left((1+t/\sqrt{n})^{1+ t/\sqrt{n}} (1- t/\sqrt{n})^{1- t/\sqrt{n}}\right)^{-n/2} \longrightarrow \frac{1}{\sqrt{2\pi}} e^{-t^2/2},

as

\displaystyle  \lim_{m\rightarrow\infty} \left((1+t/m)^{1+t/m} (1-t/m)^{1-t/m}\right)^{m^2/t^2} = \lim_{z\rightarrow\pm\infty} (1-1/z^2)^{z^2} (1+1/z)^z (1-1/z)^{-z} = e.

Hence

\displaystyle  {\bf P}(x\leq S_n/\sqrt{n} \leq y) = \frac{1}{\sqrt{2\pi}} \int_x^y e^{-t^2/2}\,d\mu_n(t) + \int_x^y R_n(t)\,d\mu_n(t),

where {R_n(t)\rightarrow 0} as {n\rightarrow\infty}. Moreover this convergence is uniform in {t\in[x,y]}, so

\displaystyle  \left|\int_x^y R_n(t)\,d\mu_n(t)\right| \leq \mu_n([x,y]) \max_{x\leq t\leq y} |R_n(t)| \leq (y-x + 2/\sqrt{n})\max_{x\leq t\leq y} |R_n(t)| \rightarrow 0.

Hence

\displaystyle  {\bf P}(x\leq S_n/\sqrt{n}\leq y) \rightarrow \frac{1}{\sqrt{2\pi}} \lim_{n\rightarrow\infty}\int_x^y e^{-t^2/2}\,d\mu_n(t) = \frac{1}{\sqrt{2\pi}}\int_x^y e^{-t^2/2}\,dt,

where the last equality follows from the theorem that continuous functions are Riemann integrable. Thus we have verified the central limit theorem.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s