Mixing time of a pseudorandom number generator

Today I was due to speak at the Isaac Newton Institute’s workshop on “Interactions between group theory, number theory, combinatorics and geometry”, which has obviously been canceled. I was going to speak about my work on the Hall–Paige conjecture: see this earlier post. Meanwhile Péter Varjú was going to speak about some other joint work, which I’d like to share with you now.

Consider (if you can consider anything other than coronaviral apocalypse right now) the random process {(X_n)} in {\mathbf{Z}/p\mathbf{Z}} defined in the following way: {X_0 = 1} (or any initial value), and for {n\geq 1}

\displaystyle   X_n = aX_{n-1} + b_n. \ \ \ \ \ (1)

Here we take {a} to be a fixed integer and {b_1, b_2, \dots} a sequence of independent draws from some finitely supported measure {\mu} on {\mathbf{Z}}. As a representative case, take {a = 2} and {\mu = u_{\{-1, 0, 1\}}} (uniform on {\{-1, 0, 1\}}).

Question 1 What is the mixing time of {(X_n \bmod p)} in {\mathbf{Z}/p\mathbf{Z}}? That is, how large must we take {n} before {X_n} is approximately uniformly distributed in {\mathbf{Z}/p\mathbf{Z}}?

This question was asked by Chung, Diaconis, and Graham (1987) (henceforth CDG), who were motivated by pseudorandom number generation. The best-studied pseudorandom number generators are the linear congruential generators, which repeatedly apply an affine map {x\mapsto ax+b \bmod p}. These go back to Lehmer (1949), whose parameters {a} and {b} were themselves subject to some randomness (from MathSciNet: “The author’s proposal for generating a sequence of ‘random’ 8 decimal numbers {u_n} is to start with some number {u_0\neq 0}, multiply it by 23, and subtract the two digit overflow on the left from the two right hand digits to obtain a new number {u_1}.”). The process {(X_n)} defined above is an idealized model: we assume the increments {b_n} are driven by some external perfect randomness, and we are concerned with the resulting randomness of the iterates.
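
To get a quantitative feel for Question 1, here is a minimal simulation sketch (the parameters {p = 101} and the range of {n} are mine, chosen purely for illustration, and not from the paper): it computes the exact distribution of {X_n \bmod p} by dynamic programming and tracks its total variation distance from uniform.

    import math

    # Exact distribution of X_n mod p for a = 2 and b_i uniform on {-1, 0, 1}.
    p, a = 101, 2
    dist = [0.0] * p
    dist[1] = 1.0  # X_0 = 1

    for n in range(1, 31):
        new = [0.0] * p
        for x, q in enumerate(dist):
            if q:
                for b in (-1, 0, 1):
                    new[(a * x + b) % p] += q / 3
        dist = new
        # Total variation distance from the uniform distribution on Z/pZ.
        tv = 0.5 * sum(abs(q - 1 / p) for q in dist)
        print(n, round(tv, 4))

The distance should start to decay rapidly once {n} exceeds {\log_2 p \approx 6.7}, in line with the bounds discussed below.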

My own interest arises differently. The distribution of {X_n} can be thought of as the image of {a} under the random polynomial

\displaystyle  P_n(X) = X^n + b_1 X^{n-1} + \cdots + b_n.

In particular, {X_n \equiv 0 \pmod p} if and only if {a} is a root of {P_n} mod {p}. Thus the distribution of {X_n \bmod p} is closely related to the roots of a random polynomial (of high degree) mod {p}. There are many basic and difficult questions about random polynomials: are they irreducible, what are their Galois groups, etc. But these are questions for another day (I hope!).
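
Unrolling the recurrence makes the identity {X_n = P_n(a)} transparent, and it is easy to check numerically; here is a small sketch (arbitrary seed and parameters, just for illustration):

    import random

    random.seed(0)  # arbitrary seed
    a, n = 2, 10
    bs = [random.choice((-1, 0, 1)) for _ in range(n)]

    x = 1  # X_0 = 1
    for b in bs:
        x = a * x + b  # the recurrence X_k = a X_{k-1} + b_k

    # P_n(a) = a^n + b_1 a^(n-1) + ... + b_n
    assert x == a**n + sum(b * a**(n - i) for i, b in enumerate(bs, 1))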

Returning to Question 1, in the representative case {a = 2} and {\mu = u_{\{-1, 0, 1\}}}, CDG proved the following results (with a lot of clever Fourier analysis):

  1. The mixing time is at least

    \displaystyle (1+o(1)) \log_2 p.

    This is obvious: {X_n} is supported on a set of size {O(2^n)}, and if {n \leq (1-\varepsilon) \log_2 p} then {2^n \leq p^{1-\varepsilon}}, so at best {X_n} is spread out over a set of size {p^{1-\varepsilon}}.

  2. The mixing time is never worse than

    \displaystyle C \log p \log \log p.

  3. Occasionally the mixing time really is as bad as that: e.g., if {p} is a Mersenne prime {2^k - 1} (so that {2} has order {k \approx \log_2 p} mod {p}), or more generally if {a} has small order mod {p} (or even roughly small order, suitably defined).
  4. For almost all odd {p} the mixing time is less than

    \displaystyle 1.02 \log_2 p.

You would be forgiven for guessing that {1.02} can be replaced with {1+o(1)} with more careful analysis, but you would be wrong! In 2009, Hildebrand proved that for all {p} the mixing time of {X_n \bmod p} is at least

\displaystyle  1.004 \log_2 p.

Therefore the mixing time of {X_n \bmod p} is typically slightly more than {\log_2 p}. What is going on here? What is the answer exactly?

In a word, the answer is entropy. Recall that the entropy of a discrete random variable {X} is defined by

\displaystyle  H(X) = \sum_{x \,:\, \mathbf{P}(X = x) > 0} \mathbf{P}(X = x) \log \mathbf{P}(X = x)^{-1}.

Here are some basic facts about entropy:

  1. If {X} is uniform on a set of size {k} then {H(X) = \log k}.
  2. Entropy is subadditive: the entropy of the joint variable {(X, Y)} is at most {H(X) + H(Y)}.
  3. Entropy cannot be increased by a (deterministic) function: {H(f(X)) \leq H(X)}.

By combining the second and third facts (taking {f} to be the addition map {(x, y) \mapsto x + y}), we have the additive version (in the sense of adding the variables) of subadditivity:

\displaystyle  H(X + Y) \leq H(X) + H(Y).
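
These facts are easy to check numerically; here is a quick sketch of my own (not from the paper) computing the entropy of a finitely supported distribution and illustrating {H(X+Y) \leq H(X) + H(Y)}:

    import math
    from itertools import product

    def entropy(dist):
        """Shannon entropy (in nats) of a dict mapping value -> probability."""
        return -sum(q * math.log(q) for q in dist.values() if q > 0)

    # X, Y independent and uniform on {-1, 0, 1}: H(X) = H(Y) = log 3, while
    # X + Y has the (non-uniform) triangular distribution on {-2, ..., 2}.
    X = {v: 1 / 3 for v in (-1, 0, 1)}
    S = {}
    for (x, px), (y, py) in product(X.items(), X.items()):
        S[x + y] = S.get(x + y, 0.0) + px * py

    # log 3 = 1.0986..., H(X + Y) = 1.5229... < 2 log 3
    print(entropy(X), entropy(S))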

In particular, since iterating (1) shows that {X_{m+n}} has the same distribution as {a^n X_m + (X_n' - a^n)}, where {X_n'} is an independent copy of {X_n}, we have

\displaystyle  H(X_{m+n}) \leq H(X_m) + H(X_n),

and hence by Fekete’s subadditive lemma {H(X_n)/n} converges. (We are thinking now of {X_n} as a random variable in {\mathbf{Z}}, not reduced modulo anything.) Call the limit {H = H(a, \mu)}.

It is easy to see in our model case {a=2, \mu = u_{\{-1, 0, 1\}}} that {H \leq \log 2} (because {X_n} has support size {O(2^n)}). If {H < \log 2}, then it follows that the mixing time of {X_n \bmod p} is strictly greater than {\log_2 p} (as {X_n} cannot approach equidistribution mod {p} before its entropy is at least {(1+o(1))\log p}).

Indeed it turns out that {H < \log 2}. This is “just a computation”: since {H = \inf_{n\geq 1} H(X_n) / n}, we just need to find some {n} such that {H(X_n) / n < \log 2}. Unfortunately, the convergence of {H(X_n) / n} is rather slow, as shown in Figure 1, but we can take advantage of another property of entropy: it satisfies not just subadditivity but also submodularity:

\displaystyle  H(X+Y+Z) + H(X) \leq H(X+Y) + H(X+Z),

and it follows by a short argument that {H(X_n) - H(X_{n-1})} is monotonically decreasing and hence also convergent to {H}; moreover, unlike the quotient {H(X_n)/n}, the difference {H(X_n) - H(X_{n-1})} appears to converge exponentially fast. The result is that

\displaystyle  H / \log 2 = 0.98876\dots,

so the mixing time of {X_n \bmod p} is not less than

\displaystyle  H^{-1} \log p = (1.01136\dots ) \log_2 p.

(We can also deduce, a posteriori, that we will have {H(X_n) / n < \log 2} for {n \geq 72}, though it is out of the question to directly compute {H(X_n)} for such large {n}.)


Figure 1: The entropy difference rapidly converges. Predicted values are dashed.
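
For what it is worth, the numbers behind a figure like this can be reproduced in a few lines: the recursion {X_n = 2X_{n-1} + b_n} gives the exact distribution of {X_n} over {\mathbf{Z}}, and with it both the quotient {H(X_n)/n} and the difference {H(X_n) - H(X_{n-1})}. A minimal sketch (the cutoff {n \leq 20} is mine: the support has size {O(2^n)}, so much larger {n} gets expensive):

    import math

    def entropy(dist):
        return -sum(q * math.log(q) for q in dist.values())

    # Exact distribution of X_n over Z (not reduced mod p) for a = 2 and
    # b_i uniform on {-1, 0, 1}, via the recursion X_n = 2 X_{n-1} + b_n.
    dist, prev = {1: 1.0}, 0.0
    for n in range(1, 21):
        new = {}
        for x, q in dist.items():
            for b in (-1, 0, 1):
                new[2 * x + b] = new.get(2 * x + b, 0.0) + q / 3
        dist = new
        H = entropy(dist)
        # The quotient H(X_n)/(n log 2) converges slowly; the difference
        # (H(X_n) - H(X_{n-1}))/log 2 appears to converge exponentially fast.
        print(n, H / (n * math.log(2)), (H - prev) / math.log(2))
        prev = H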

This observation was the starting point of a paper that Péter and I have just finished writing. The preprint is available at arXiv:2003.08117. What we prove in general is that the mixing time of {X_n \bmod p} is indeed just {(H^{-1} + o(1)) \log p} for almost all {p} (either almost all composite {p} coprime to {a}, or alternatively almost all prime {p}). In other words, entropy really is the whole story: as soon as the entropy of {X_n} is large enough, {X_n \bmod p} should be close to equidistributed (with a caveat: see below). The lower bound is more or less clear, as above. Most of the work in the paper goes into the upper bound, for which we needed several nontrivial tools from probability theory and number theory, as well as a few arguments recycled from the original CDG paper.

However, just as one mystery is solved, another arises. Our argument depends on the large sieve, and it therefore comes with a square-root barrier: we have to assume {H(a, \mu) > \frac12 \log a}. This is certainly satisfied in our representative case {a = 2, \mu = u_{\{-1, 0, 1\}}} (as the entropy is very close to {\log 2}), but in general it need not be, and in that case the problem remains open. The following problem is representative.

Problem 2 Let {a \geq 2} and let {\mu = u_{\{0, 1\}}}. Then {X_n} is uniform on a (Cantor) set of size {2^n}, so {H(a, \mu) = \log 2}. Show that the mixing time of {X_n \bmod p} is {(1+o(1))\log_2 p} for almost all primes {p}.

You might call this the “2–3–4–5 problem”. The case {a=2} is trivial, as {X_n} is uniform on an interval of length {2^n}. The case {a=3} is covered by our main theorem, since {\log 2 > \frac12 \log 3}. The case {a=4} is exactly borderline, as {\log 2 = \frac12 \log 4}; it is not covered by our main theorem, but we sketch how to stretch the proof to include it anyway. For {a \geq 5} we need a new idea.
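
As a sanity check on the entropy claim in Problem 2, here is one last small sketch (parameters mine, for illustration only) verifying that distinct {b}-sequences give distinct values of {X_n}, so that {X_n} really is uniform on a set of size {2^n}:

    from itertools import product

    # For mu = u_{0,1}, X_n = a^n + b_1 a^(n-1) + ... + b_n, and distinct
    # b-sequences are distinct base-a digit strings, hence distinct values;
    # so H(X_n) = n log 2 exactly, for every a >= 2.
    n = 12
    for a in (2, 3, 4, 5):
        values = {a**n + sum(b * a**(n - i) for i, b in enumerate(bs, 1))
                  for bs in product((0, 1), repeat=n)}
        assert len(values) == 2**n
        print(a, len(values))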

Thoughts on coronavirus

This is a research blog, but it is hard to think about much other than coronavirus in this unique time. Here are some thoughts.

Some extreme fears:

Let’s get these out of the way first.

  1. People keep greedily stockpiling while vulnerable people go without necessary resources, or people get injured in the frenzy. More generally, I am worried about panic, mob mentality, riots, looting, etc.
  2. Blame. Blame China for creating the thing, blame the government for mismanaging, blame your neighbour for infecting you, blame the stockpilers (see point 1).
  3. Social distance. Not in the sense of infection control (see Wikipedia), which is practical and good, but in a more general sense of developing selfishness, nationalism, every-man-for-himself-ism, the opposite of solidarity. A unique feature of the current crisis is that our usual mechanisms for showing unity are mostly compromised.
  4. Tip of the iceberg: maybe this is just phase 1, and the virus will keep mutating and mutating, and who knows what will happen.

More concretely, and most obviously, I am worried about the safety of people I know. I will leave “how to think about the risk” for some other time (or some other author).

An extreme hope: virtual connectivity

It sounds frivolous compared to my list of fears, but just now it’s one of my big hopes for humans. In this era of technology we deserve to be much better at this. The only reason I don’t use Skype more is that the technology seems just slightly crap, e.g., dropped or slow connections, or an interaction that is just slightly clumsy, with the natural timing of conversation out of step. The current crisis could drive change here: technologically, much like how war invented the microwave, but also just by forcing us to practice and develop the social protocol.

Why do I consider that important? Because I think it’s a key step for us as a global society to travel less, and traveling less will not only slow the spread of disease but also help to slow climate change, which after all is still a much greater existential threat to the human race, but one that lacks (at least for now) the shock-and-awe necessary to generate a collective response.

In short, I am hoping that a global crisis like the current one compels us to develop tools quickly that we are going to need anyway, and virtual connectivity is my big example.

A message from Queens’ College Acting Senior Tutor

I liked this message to students from the Queens’ College Acting Senior Tutor, Prof Martin Dixon: a much-needed note of responsibility and leadership:

Equally, do not underestimate the positive impact that you can make. Perhaps you may not realise, but being a student at Cambridge causes people to hold you in high regard. Your words and actions will be listened to and watched. You are among the very best students in the world, and you can lead by example. A kind word, a simple act of support, a calming of panic, can have a profound effect on those around you. You may not want this responsibility, but you have it. You also have the opportunity to demonstrate what it means to be a Cambridge student in a time of uncertainty. Please live up to the faith we have in you.

Source: The Tab, Emily Hyde & Claudia Rowan, 2020/03/13

Of course, the message applies much more broadly than to just students at Cambridge. Anybody with any sort of visible status has the responsibility to think calmly and critically in this time of crisis, as their decisions and opinions are more liable to influence and spread.

Isaac Newton and the Black Death

Yes, Isaac Newton invented calculus, optics, gravity, etc., etc., during the Great Plague (1665–1666), the last major outbreak of bubonic plague in England. Therefore, so everybody seems to be saying recently, lots of excellent ground-breaking research is going to happen over the next couple of months. Maybe, but I’m not so confident. Here are some counterpoints.

  1. Most science is done in labs these days.
  2. Science that isn’t done in labs (e.g., mathematics, theoretical physics) is often about collective understanding, and sharing ideas is essential.
  3. Lots of people have, like, 2-year-olds at home.
  4. Isaac Newton didn’t have minute-by-minute updates on the progress of the disease.

More on point 2: mathematics at its heart is often about developing collective understanding, which is why giving talks is so central: they give people the opportunity to interpose questions off the cuff, test their understanding, and bounce ideas off each other. Reading and writing papers is an equally important part of the process, but a much slower one.

Personally

Personally, my usual research is not much affected: I normally work from my office at home and collaborate remotely, mostly by email. In the coming months I hope to work on virtual colloquia, virtual teaching, etc. It will be interesting to see where we get to in a couple of months’ or a year’s time.

Best wishes to anyone reading this. Stay safe, be sensible, don’t panic. Stay in touch.

Source: xkcd 2282