Babai’s conjecture for classical groups with random generators

Yesterday Urban Jezernik and I uploaded to the arXiv our preprint Babai’s conjecture for high-rank classical groups with random generators. I want to use this space to explain what our work is about and where it fits in the literature. To try to cater for a possibly wide range of mathematical backgrounds, I will start friendly and informal and get progressively more precise and technical. Thanks in advance for your attention!

Rubik's cube

The subject matter is diameters of groups. A familiar example is Rubik’s cube, which has {n \approx 4 \times 10^{19}} ({40} billion billion) possible configurations. The set of all configurations of the cube can be viewed as a group {R}. (For the exact group structure, see wikipedia.) The standard generating set {S} consists of rotations of any of the {6} faces, a set of size {19} (including the do-nothing operation). The diameter is the maximum number of moves it takes to solve the cube from any configuration. Now the number of configurations within distance {d} of the origin, the size of {S^d}, is at most {19^d}: at each step, choose one of the {19} possible moves. Therefore the diameter cannot be smaller than

\displaystyle  \left\lceil\frac{\log n}{\log 19}\right\rceil = 16.

The exact answer is not that much larger: 20. The size of {S^d} is plotted below, which exhibits almost perfect exponential growth, until the end when it runs out of space.

Growth of S^d in the Rubik’s cube group
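The counting bound above is a one-line computation. Here is a quick check, using the exact (well-known) configuration count rather than the rounded {4 \times 10^{19}}:

```python
import math

# Exact number of Rubik's cube configurations (the well-known count,
# consistent with the ~4e19 figure quoted above).
n = 43_252_003_274_489_856_000

# With |S| = 19 moves (including the identity), |S^d| <= 19^d,
# so the diameter d must satisfy 19^d >= n.
lower_bound = math.ceil(math.log(n) / math.log(19))
print(lower_bound)  # 16
```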

In general, let {G} be a finite group and {S} a set of generators. Assume {S = S^{-1}} and {1 \in S}. The Cayley graph {\mathrm{Cay}(G, S)} is the graph with vertex set {G} and an edge for every pair {\{g, gs\}} with {g\in G} and {s \in S \setminus \{1\}}. The diameter {\mathrm{diam}\,\,\Gamma} of a graph {\Gamma} is the “longest shortest path”: the distance between two vertices is the length of the shortest path between them, and the diameter is the greatest distance between any two vertices. The diameter of a Cayley graph {\mathrm{Cay}(G, S)} is the same as the smallest {d} such that {S^d = G}.
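For tiny groups one can compute the diameter of a Cayley graph directly by breadth-first search from the identity. Here is a minimal sketch (my own illustration, not code from the paper), with permutations of {\{0, 1, 2\}} stored as tuples:

```python
from collections import deque
from itertools import permutations

def compose(g, h):
    # (g*h)(i) = g(h(i)); a permutation p is the tuple (p(0), p(1), ...)
    return tuple(g[h[i]] for i in range(len(g)))

def cayley_diameter(group, gens):
    """Smallest d with S^d = G, found by BFS from the identity."""
    n = len(next(iter(group)))
    identity = tuple(range(n))
    dist = {identity: 0}
    queue = deque([identity])
    while queue:
        g = queue.popleft()
        for s in gens:
            h = compose(g, s)
            if h not in dist:
                dist[h] = dist[g] + 1
                queue.append(h)
    assert len(dist) == len(group)  # gens really generate the group
    return max(dist.values())

S3 = set(permutations(range(3)))
gens = [(1, 0, 2), (1, 2, 0), (2, 0, 1)]  # (12), (123), (132)
print(cayley_diameter(S3, gens))  # 2
```

Every element of {S_3} is a product of at most two of these generators, so the diameter is {2}.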

One expects similar behaviour as we saw for the Rubik’s cube group {R} for any group {G} which is sufficiently nonabelian (so that most products of generators are different, causing the balls {S^d} to expand at about the rate they do in a tree). The most ideal form of nonabelianness is simplicity, and Babai’s conjecture is a precise form of this intuition for finite simple groups. The conjecture states that

\displaystyle  \mathrm{diam}\, \mathrm{Cay}(G, S) \leq (\log |G|)^{O(1)},

uniformly over finite simple groups {G} and generating sets {S}. The trivial lower bound is {\log_2|G|}, so the conjecture states that the truth is within a power of this trivial lower bound.

By the classification of finite simple groups, Babai’s conjecture breaks naturally into three parts:

  1. finite groups {X(q)} of a fixed Lie type {X} over a field of order {q} with {q \to \infty};
  2. quasisimple classical groups of dimension {n} with {n \to \infty} and {q} arbitrary: namely, the special linear group {\mathrm{SL}_n(q)}, the symplectic group {\mathrm{Sp}_n(q)}, the derived subgroup of the orthogonal group {O_n(q)'}, and the special unitary group {\mathrm{SU}_n(q)};
  3. alternating groups {A_n}.

The first part is the “bounded rank” case, and the latter two parts make up the “unbounded rank” case (and you can loosely consider {A_n} to be {\mathrm{PSL}_n(1)}, if you want).

The bounded rank case of Babai’s conjecture is completely resolved, following breakthrough work of Helfgott, Pyber–Szabó, and Breuillard–Green–Tao. Even stronger conjectures can be made in this case. For instance, it may be that the graphs {\mathrm{Cay}(G, S)} have a uniform spectral gap, and in particular that

\displaystyle  \mathrm{diam}\,\,\mathrm{Cay}(G,S) \leq O(\log|G|),

with the implicit constant depending only on the rank. This is true if the generators are random, by work of Breuillard–Green–Guralnick–Tao.

Personally I am most interested in high-rank groups, such as the alternating group {A_n} (cf. the name of this blog). In the case of {A_n}, Babai’s conjecture (a much older conjecture in this case) asserts simply that

\displaystyle  \mathrm{diam}\, \mathrm{Cay}(A_n, S) = n^{O(1)}.

For example, if the generating set consists just of the cycles {(12)} and {(12\cdots n)^{\pm1}}, then the diameter is comparable to {n^2}. A folklore conjecture asserts that this is the worst case up to a constant factor. The strongest results in this direction are the result of Helfgott–Seress (see also this later paper of Helfgott) which states that

\displaystyle  \mathrm{diam}\, \mathrm{Cay}(A_n, S) \leq \exp O((\log n)^{4 + o(1)}),

which is quasipolynomial rather than polynomial, and the result of Helfgott–Seress–Żuk which states that, if the generators are random,

\displaystyle  \mathrm{diam}\, \mathrm{Cay}(A_n, S) \leq n^2 (\log n)^{O(1)}.

Thus we are “almost there” with {A_n}. On the other hand, we are still a long way off for high-rank classical groups such as {\mathrm{SL}_n(q)}. Such a group has size comparable to {q^{n^2}}, and Babai’s conjecture asserts that

\displaystyle  \mathrm{diam}\, \mathrm{Cay}(G, S) \leq (n \log q)^{O(1)}.

In this case the best bound we have is one of Halasi–Maróti–Pyber–Qiao (building on Biswas–Yang), which states

\displaystyle  \mathrm{diam}\, \mathrm{Cay}(G, S) \leq q^{O(n (\log n)^2)}.

In many ways {A_n} and, say, {\mathrm{SL}_n(2)} behave similarly, but in other ways they are very different. For example, suppose your generating set for {A_n} contains a {3}-cycle. Then because there are at most {n^3} {3}-cycles in all, it is clear that every {3}-cycle has length at most {n^3} in your generators, and hence every element of {A_n} has length at most {n^4}, so the conjecture is trivial in this case. On the other hand, suppose your generating set for {\mathrm{SL}_n(2)} contains the closest analogue to a {3}-cycle, a transvection. Unfortunately, there is not a trivial argument in this case, because the number of transvections is roughly {2^{2n-1}}. Indeed this is a difficult problem that was solved only earlier this year by Halasi.
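The count of transvections is easy to verify by brute force for small {n} (a sanity check of my own, not taken from the paper). A transvection is {x \mapsto x + \phi(x)v} with {\phi(v) = 0}, i.e. {I + N} with {N} rank one and {N^2 = 0}; over {\mathbf{F}_2} the number of such maps is {(2^n-1)(2^{n-1}-1) \approx 2^{2n-1}}:

```python
from itertools import product

n = 3
# Enumerate N = v phi^T over F_2 with v, phi nonzero and phi(v) = 0
# (the condition phi(v) = 0 is what makes N^2 = 0).
transvections = set()
for v in product([0, 1], repeat=n):
    if not any(v):
        continue
    for phi in product([0, 1], repeat=n):
        if not any(phi):
            continue
        if sum(a * b for a, b in zip(phi, v)) % 2 != 0:
            continue
        N = tuple(tuple((v[i] * phi[j]) % 2 for j in range(n)) for i in range(n))
        transvections.add(N)

print(len(transvections))  # (2^3 - 1)(2^2 - 1) = 21
print(2 ** (2 * n - 1))    # compare with 2^(2n-1) = 32
```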

However, if the size of the field {q} is bounded, and you have at least {q^C} random generators, then Urban and I know what to do. Under these conditions we show that

\displaystyle  \mathrm{diam}\, \mathrm{Cay}(G, S) \leq n^{O(1)},

as conjectured by Babai.

Roughly, the recipe is the following. Let {S} be your set of random generators. We want to show that we can reach everywhere in {G} using {n^{O(1)}} steps in {S}.

  1. Start with a random walk of length {n/10}. All of the points you get to will have “large support”, in the sense that {g-1} has no large eigenspace.
  2. Use explicit character bounds and the second moment method to show that, within the first {n/10} steps, you will reach any specified large normal subset {\mathfrak{C}}.
  3. Choose {\mathfrak{C}} so that every element of {\mathfrak{C}} has an {n^{O(1)}} power which has minimal support. Hence we can reach such an element in {n^{O(1)}} steps.
  4. Now act on that element by conjugation. This looks much like the standard action of {G} on the natural module {\mathbf{F}_q^n}. We show that with high probability the Schreier graph of this action actually has a spectral gap, and in particular logarithmic diameter. Hence we can cover the entire conjugacy class.
  5. Every element of {G} can be written as the product of {O(n)} such elements, so every element of {G} has length {n^{O(1)}} in {S}.

Some of these ideas are recycled from my previous paper with Stefan Virchow (discussed in this earlier post).

Probably the most interesting part is step 4, which generalizes a result of Friedman–Joux–Roichman–Stern–Tillich for the symmetric group. The Schreier graph of the usual action of {S_n} on {\{1, \dots, n\}} is one of the standard models for a random bounded-degree regular graph, and the spectral gap of this graph is well-studied. The graph we care about, on the other hand, has vertex set {\mathbf{F}_q^n \setminus \{0\}}, with {v} joined to {gv} for each {g} in our set of random generators. We found experimentally that, spectrally, this graph is virtually indistinguishable from the random regular graph. In particular, there should be a spectral gap for as few as {2} random generators.
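An experiment of this kind is easy to set up (the following is my own sketch, not the experiment from the paper, and it assumes numpy). To guarantee connectivity I take one generator to be the companion matrix of the primitive polynomial {x^5 + x^2 + 1}, which acts as multiplication by {x} on {\mathbf{F}_{32}} and hence transitively on {\mathbf{F}_2^5 \setminus \{0\}}; the second generator is a transvection:

```python
import numpy as np

n = 5
# Companion matrix of x^5 + x^2 + 1 over F_2 (a primitive polynomial):
# columns 0..3 shift e_i -> e_{i+1}; column 4 is x^5 = x^2 + 1.
A = np.zeros((n, n), dtype=int)
A[1:, :-1] = np.eye(n - 1, dtype=int)
A[0, n - 1] = 1
A[2, n - 1] = 1

# A transvection: v -> v + v[1] e_0. (It fixes vectors with v[1] = 0,
# which puts harmless loops in the graph.)
B = np.eye(n, dtype=int)
B[0, 1] = 1

def vec(i):
    return np.array([(i >> k) & 1 for k in range(n)])

def idx(v):
    return sum(int(v[k]) << k for k in range(n))

# Schreier graph on the 2^n - 1 nonzero vectors, edges v ~ gv for g = A, B.
V = 2 ** n - 1
M = np.zeros((V, V))
for i in range(1, 2 ** n):
    v = vec(i)
    for g in (A, B):
        j = idx(g.dot(v) % 2)
        M[i - 1, j - 1] += 1
        M[j - 1, i - 1] += 1

evals = np.sort(np.linalg.eigvalsh(M))
print(evals[-1])      # top eigenvalue = degree = 4 (the graph is 4-regular)
print(4 - evals[-2])  # empirical spectral gap
```

Since the orbit of any nonzero vector under {A} is everything, the graph is connected and the gap is genuinely positive.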

Our method needs {q^C} generators. The reason is that we have great difficulty saying anything about the behaviour of the random walk after about {n} steps, or indeed somewhat fewer depending on the type (linear/symplectic/unitary/orthogonal) of the group. To get around this problem we need to ensure that the random walk covers the whole Schreier graph before this critical time, so we need enough generators so that the random walk spreads quickly enough.

Mixing time of a pseudorandom number generator

Today I was due to speak at the Isaac Newton Institute’s workshop on “Interactions between group theory, number theory, combinatorics and geometry”, which has obviously been canceled. I was going to speak about my work on the Hall–Paige conjecture: see this earlier post. Meanwhile Péter Varjú was going to speak about some other joint work, which I’d like to share with you now.

Consider (if you can consider anything other than coronaviral apocalypse right now) the random process {(X_n)} in {\mathbf{Z}/p\mathbf{Z}} defined in the following way: {X_0 = 1} (or any initial value), and for {n\geq 1}

\displaystyle   X_n = aX_{n-1} + b_n. \ \ \ \ \ (1)

Here we take {a} to be a fixed integer and {b_1, b_2, \dots} a sequence of independent draws from some finitely supported measure {\mu} on {\mathbf{Z}}. As a representative case, take {a = 2} and {\mu = u_{\{-1, 0, 1\}}} (uniform on {\{-1, 0, 1\}}).

Question 1 What is the mixing time of {(X_n \bmod p)} in {\mathbf{Z}/p\mathbf{Z}}? That is, how large must we take {n} before {X_n} is approximately uniformly distributed in {\mathbf{Z}/p\mathbf{Z}}?

This question was asked by Chung, Diaconis, and Graham (1987), who were motivated by pseudorandom number generation. The best-studied pseudorandom number generators are the linear congruential generators, which repeatedly apply an affine map {x\mapsto ax+b \bmod p}. These go back to Lehmer (1949), whose parameters {a} and {b} were also subject to some randomness (from MathSciNet, “The author’s proposal for generating a sequence of `random’ 8 decimal numbers {u_n} is to start with some number {u_0\neq 0}, multiply it by 23, and subtract the two digit overflow on the left from the two right hand digits to obtain a new number {u_1}.”) The process {(X_n)} defined above is an idealized model: we assume the increment {b} develops under some external perfect randomness, and we are concerned with the resulting randomness of the iterates.
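The mixing in Question 1 is easy to watch numerically, since the distribution of {X_n \bmod p} can be computed exactly by dynamic programming (a small sketch of my own, with an arbitrarily chosen prime {p = 101}):

```python
def tv_to_uniform(p, a, steps):
    """Total variation distance of X_n mod p from uniform, computed exactly,
    for X_0 = 1 and b_n uniform on {-1, 0, 1}."""
    dist = [0.0] * p
    dist[1 % p] = 1.0
    for _ in range(steps):
        new = [0.0] * p
        for x, mass in enumerate(dist):
            if mass:
                for b in (-1, 0, 1):
                    new[(a * x + b) % p] += mass / 3
        dist = new
    return 0.5 * sum(abs(q - 1 / p) for q in dist)

p, a = 101, 2
print(tv_to_uniform(p, a, 3))   # far from uniform: support has at most 15 points
print(tv_to_uniform(p, a, 30))  # essentially mixed
```

A few multiples of {\log_2 p \approx 6.7} steps suffice here, in line with the results described below.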

My own interest arises differently. The distribution of {X_n} can be thought of as the image of {a} under the random polynomial

\displaystyle  P_n(X) = X^n + b_1 X^{n-1} + \cdots + b_n.

In particular, {X_n \bmod p = 0} if and only if {a} is a root of {P_n}. Thus the distribution of {X_n \bmod p} is closely related to the roots of a random polynomial (of high degree) mod {p}. There are many basic and difficult questions about random polynomials: are they irreducible, what are their Galois groups, etc. But these are questions for another day (I hope!).

Returning to Question 1, in the representative case {a = 2} and {\mu = u_{\{-1, 0, 1\}}}, CDG proved the following results (with a lot of clever Fourier analysis):

  1. The mixing time is at least

    \displaystyle (1+o(1)) \log_2 p.

    This is obvious: {X_n} is supported on a set of size {O(2^n)}, and if {n \leq (1-\varepsilon) \log_2 p} then {2^n \leq p^{1-\varepsilon}}, so at best {X_n} is spread out over a set of size {p^{1-\varepsilon}}.

  2. The mixing time is never worse than

    \displaystyle C \log p \log \log p.

  3. Occasionally, e.g., if {p} is a Mersenne prime, or more generally if {a} has small order mod {p} (or even roughly small order, suitably defined), the mixing time really is as bad as that.
  4. For almost all odd {p} the mixing time is less than

    \displaystyle 1.02 \log_2 p.

You would be forgiven for guessing that {1.02} can be replaced with {1+o(1)} with more careful analysis, but you would be wrong! In 2009, Hildebrand proved that for all {p} the mixing time of {X_n \bmod p} is at least

\displaystyle  1.004 \log_2 p.

Therefore the mixing time of {X_n \bmod p} is typically slightly more than {\log_2 p}. What is going on here? What is the answer exactly?

In a word, the answer is entropy. Recall that entropy of a discrete random variable {X} is defined by

\displaystyle  H(X) = \sum_{x :\, \mathbf{P}(X = x) > 0} \mathbf{P}(X = x) \log \mathbf{P}(X = x)^{-1}.

Here are some basic facts about entropy:

  1. If {X} is uniform on a set of size {k} then {H(X) = \log k}.
  2. Entropy is subadditive: the entropy of the joint variable {(X, Y)} is at most {H(X) + H(Y)}.
  3. Entropy cannot be increased by a (deterministic) function: {H(f(X)) \leq H(X)}.

By combining the second and third facts, we have the additive version (in the sense of adding the variables) of subadditivity:

\displaystyle  H(X + Y) \leq H(X) + H(Y).
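As a concrete check (my own toy example), take {X} and {Y} independent and uniform on {\{0, 1, 2\}}: the sum is concentrated in the middle of its range, so its entropy falls strictly below {H(X) + H(Y) = 2\log 3}:

```python
from math import log

def entropy(dist):
    return sum(p * log(1 / p) for p in dist.values() if p > 0)

def sum_dist(dx, dy):
    """Distribution of X + Y for independent X, Y given as dicts."""
    out = {}
    for x, px in dx.items():
        for y, py in dy.items():
            out[x + y] = out.get(x + y, 0) + px * py
    return out

X = {0: 1/3, 1: 1/3, 2: 1/3}
Y = {0: 1/3, 1: 1/3, 2: 1/3}
print(entropy(sum_dist(X, Y)))  # strictly less than ...
print(entropy(X) + entropy(Y))  # ... 2 log 3 = 2.197...
```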

In particular, it follows from (1) that

\displaystyle  H(X_{m+n}) \leq H(X_m) + H(X_n),

and hence by the subadditive lemma {H(X_n)/n} converges. (We are thinking now of {X_n} as a random variable in {\mathbf{Z}}, not reduced modulo anything.) Call the limit {H = H(a, \mu)}.

It is easy to see in our model case {a=2, \mu = u_{\{-1, 0, 1\}}} that {H \leq \log 2} (because {X_n} has support size {O(2^n)}). If {H < \log 2}, then it follows that the mixing time of {X_n \bmod p} is strictly greater than {\log_2 p} (as {X_n} cannot approach equidistribution mod {p} before its entropy is at least {(1+o(1))\log p}).

Indeed it turns out that {H < \log 2}. This is “just a computation”: since {H = \inf_{n\geq 1} H(X_n) / n}, we just need to find some {n} such that {H(X_n) / n < \log 2}. Unfortunately, the convergence of {H(X_n) / n} is rather slow, as shown in Figure 1, but we can take advantage of another property of entropy: entropy satisfies not just subadditivity but submodularity

\displaystyle  H(X+Y+Z) + H(X) \leq H(X+Y) + H(X+Z),

and it follows by a short argument that {H(X_n) - H(X_{n-1})} is monotonically decreasing and hence also convergent to {H}; moreover, unlike the quotient {H(X_n)/n}, the difference {H(X_n) - H(X_{n-1})} appears to converge exponentially fast. The result is that

\displaystyle  H / \log 2 = 0.98876\dots,

so the mixing time of {X_n \bmod p} is not less than

\displaystyle  H^{-1} \log p = (1.01136\dots ) \log_2 p.

(We can also deduce, a posteriori, that we will have {H(X_n) / n < \log 2} for {n \geq 72}, though it is out of the question to directly compute {H(X_n)} for such large {n}.)
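For small {n} the entropy differences can be reproduced exactly (a sketch of my own, not the code behind the paper): compute the distribution of {X_n} over {\mathbf{Z}} by dynamic programming and take successive differences of {H(X_n)}.

```python
from math import log
from collections import Counter

def dist_Xn(n, a=2, b_vals=(-1, 0, 1)):
    """Exact distribution of X_n over Z, for X_0 = 1 and b uniform on b_vals."""
    dist = Counter({1: 1.0})
    for _ in range(n):
        new = Counter()
        for x, mass in dist.items():
            for b in b_vals:
                new[a * x + b] += mass / len(b_vals)
        dist = new
    return dist

def entropy(dist):
    return sum(p * log(1 / p) for p in dist.values() if p > 0)

H = [entropy(dist_Xn(n)) for n in range(13)]
diffs = [H[n] - H[n - 1] for n in range(1, 13)]
# The differences decrease monotonically towards H, i.e. towards
# roughly 0.9888 * log 2 in this model case.
print([round(d / log(2), 4) for d in diffs])
```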


Figure 1: The entropy difference rapidly converges. Predicted values are dashed.

This observation was the starting point of a paper that Péter and I have just finished writing. The preprint is available at arxiv:2003.08117. What we prove in general is that the mixing time of {X_n \bmod p} is indeed just {(H^{-1} + o(1)) \log p} for almost all {p} (either almost all composite {p} coprime with {a}, or alternatively almost all prime {p}). In other words, entropy really is the whole story: as soon as the entropy of {X_n} is large enough, {X_n \bmod p} should be close to equidistributed (with a caveat: see below). The lower bound is more or less clear, as above. Most of the work of the paper is involved with the upper bound, for which we needed several nontrivial tools from probability theory and number theory, as well as a few arguments recycled from the original CDG paper.

However, just as one mystery is solved, another arises. Our argument depends on the large sieve, and it therefore comes with a square-root barrier. The result is that we have to assume {H(a, \mu) > \frac12 \log a}. This is certainly satisfied in our representative case {a = 2, \mu = u_{\{-1, 0, 1\}}} (as the entropy is very close to {\log 2}), but in general it need not be, and in that case the problem remains open. The following problem is representative.

Problem 2 Let {a \geq 2} and let {\mu = u_{\{0, 1\}}}. Then {X_n} is uniform on a (Cantor) set of size {2^n}, so {H(a, \mu) = \log 2}. Show that the mixing time of {X_n \bmod p} is {(1+o(1))\log_2 p} for almost all primes {p}.

You might call this the “2–3–4–5 problem”. The case {a=2} is trivial, as {X_n} is uniform on {\{1, \dots, 2^n\}}. The case {a=3} is covered by our main theorem, since {\log 2 > \frac12 \log 3}. The case {a=4} is exactly borderline, as {\log 2 = \frac12 \log 4}, and this case is not covered by our main theorem, but we sketch how to stretch the proof to include this case anyway. For {a \geq 5} we need a new idea.
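The statement that {H(a, \mu) = \log 2} in Problem 2 comes down to the fact that binary strings are distinct as base-{a} digit expansions, so the {2^n} possible increment sequences give {2^n} distinct values of {X_n}. A quick check of my own (taking {X_0 = 0} for simplicity; the initial value only shifts the support):

```python
from itertools import product

def support_Xn(a, n):
    """Support of X_n for X_0 = 0 and b_i in {0, 1}: all sums of b_i a^(n-i)."""
    return {sum(b[i] * a ** (n - 1 - i) for i in range(n))
            for b in product([0, 1], repeat=n)}

# For every a >= 2 the 2^n binary digit strings give 2^n distinct values,
# so X_n is uniform on a set of size 2^n and H(a, mu) = log 2.
for a in (2, 3, 4, 5):
    assert len(support_Xn(a, 8)) == 2 ** 8
print("X_n is uniform on a set of size 2^n for a = 2, 3, 4, 5")
```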