The characteristic polynomial of a random matrix is almost always irreducible

I just uploaded to the arxiv a paper which can be viewed as vindication by anybody who has ever struggled to compute an eigenvalue by hand: the paper gives mathematical proof that eigenvalues are almost always as complicated as they can be. Intuitive as this may be, to prove it I needed two mathematical bazookas: both the extended Riemann hypothesis and the classification of finite simple groups!

An example will illustrate what I mean. Here is a random {10\times 10} matrix with {\pm1} entries:

\displaystyle  M = \left(\begin{array}{rrrrrrrrrr} -1 & 1 & 1 & -1 & -1 & 1 & -1 & 1 & -1 & 1 \\ -1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & -1 & -1 \\ -1 & -1 & 1 & 1 & 1 & -1 & -1 & -1 & 1 & 1 \\ 1 & -1 & -1 & 1 & -1 & -1 & 1 & 1 & 1 & -1 \\ -1 & -1 & -1 & 1 & 1 & 1 & 1 & -1 & -1 & -1 \\ -1 & -1 & -1 & -1 & -1 & 1 & 1 & 1 & 1 & 1 \\ -1 & -1 & -1 & 1 & -1 & 1 & 1 & 1 & -1 & 1 \\ 1 & -1 & -1 & -1 & 1 & 1 & 1 & 1 & -1 & -1 \\ -1 & -1 & 1 & -1 & 1 & -1 & -1 & -1 & -1 & -1 \\ -1 & -1 & -1 & -1 & -1 & 1 & -1 & 1 & -1 & -1 \end{array}\right) .

The eigenvalues of {M} as approximated by scipy.lingalg.eig are shown below.

But what if we want to know the eigenvalues exactly? Well, they are the roots of the characteristic polynomial of {M}, which is

\displaystyle  \varphi(x) = x^{10} - 4x^{9} + 6x^{8} - 80x^{6} + 144x^{5} - 544x^{4} + 1152x^{3} - 1024x^{2} + 1280x - 1024,

so we just have to solve {\varphi(x) = 0}. The Abel–Ruffini theorem states that, in general, this is not possible in the quadratic-formula-type sense involving radicals, etc., so there are certainly evil matrices that defy solution. But could there be something special about random matrices that makes {\varphi} typically more special?

It is not at all obvious that {\varphi} is “generic”. For one thing, if {M} is singular, then {\varphi} has an obvious root: zero! For another, if {M} happens to have a repeated eigenvalue, then {\varphi} and {\varphi'} share a factor, which is certainly a special algebraic property. It turns out that these events are low-probability events, but those are landmark results about random matrices (due originally to Komlós and Tao–Vu (symmetric case) and Ge (nonsymmetric), respectively).

Algebraically, the complexity of solving {\varphi(x) = 0} is measured by the Galois group {G}. In particular, the polynomial is irreducible if and only if {G} is transitive, and the equation is solvable by a formula involving radicals as in the quadratic formula if and only if {G} is solvable in the sense of group theory. Galois groups are not easy to calculate in general, but “typically” they are, by the following well-known trick. If we factorize {\varphi} mod {p} for some prime {p}, then, as long as we do not get a repeated factor, the degree sequence of the factorization is the same as the cycle type of some element of {G}. If we do this for enough small primes {p} we can often infer that {G = S_n}.

Let’s do this for the above example. If we factorize {\varphi} mod {p} for {p < 20}, discarding the “ramified” primes for which there is a repeated factor, we get the following degree sequences:

{p} degree sequence
{5} {\left(9, 1\right)}
{7} {\left(4, 4, 2\right)}
{13} {\left(5, 3, 2\right)}
{17} {\left(5, 3, 2\right)}
{19} {\left(7, 2, 1\right)}

Therefore {G} contains a {9}-cycle, an element of type {(4, 4, 2)}, etc. Just from the {9}-cycle and the {(4,4,2)} element it is clear that {G} must be transitive (in fact 2-transitive), so {\varphi} is irreducible. The are various other rules that help eliminate other possibilities for {G}. A common one is a theorem of Jordan which states that if {G} is primitive and contains a {p}-cycle for some prime {p \leq n - 3} then {G \geq A_n}. The element of cycle type {(5, 3, 2)} is odd and has a power which is a 5-cycle, so Jordan’s theorem implies {G = S_{10}}.

Conclusion: for the matrix {M} above, solving {\varphi(x) = 0} is a certifiable pain. You should just give up and accept {\varphi} itself as the prettiest answer you are going to get.

What my paper shows in general is that, if you choose the entries of an {n\times n} matrix from a fixed distribution in the integers,then, with probability tending to {1} as {n \to \infty}, the characteristic polynomial {\varphi} is irreducible, and moreover its Galois group is at least {A_n}. The result is conditional on the extended Riemann hypothesis, as mentioned, except in some nice special cases, such as when the entries are uniformly distributed in {\{1, \dots, 210\}.}

Note I only said “at least {A_n}”. Presumably the event {G = A_n} also has negligible probability, so {G = S_n} with probability {1 - o(1)}, but this remains open, and seems difficult. To prove it (naively), you have to show that the discriminant of the characteristic polynomial is, with high probability, not a perfect square. That should certainly be a low-probability event, but it’s not clear how to prove it!

Why do I need the extended Riemann hypothesis (ERH) and the classification of finite simple groups (CFSG)?

  • ERH: The example above involving factorizing mod {p} is not just useful in practice but also in theory. The prime ideal theorem tells you the statistics of what happens with your polynomial mod {p} for large {p}. In particular, to show that {G} is transitive it suffices to show that {\varphi} has one root on average mod {p}, for large primes {p}. Exactly how large {p} must be is determined by the error term in the prime ideal theorem, and ERH really helps a lot.
  • CFSG: An elaboration of the above strategy is used to show that {G} is not just transitive but {m}-transitive for any constant {m}. By CFSG, {m = 4} and {n > 24} is enough to imply {G \geq A_n}.

Arguably, matrices are not playing a large role in the statement of the theorem: it is just a statement about a random polynomial {\varphi}, where the model happens to be “take a random matrix with independent entries and take its characteristic polynomial”. The theorem should be true for any good model for a random polynomial of high degree. For example, in the “independent (or restricted) coefficients model”, the coefficients of {\varphi} are independent with some fixed distribution. For this model the corresponding statements were proved recently by Bary-Soroker–Kozma (see also Bary-Soroker’s blog post) and Breuillard–Varju, and I certainly borrow a lot from those papers. However, the common ground is isolated to the local–global principle that reduces the problem to a problem modulo {p}, and there the commonality ends. The independent coefficients case becomes a question about a certain kind of random walk (see the Breuillard–Varju paper) mod {p}, whereas the characteristic polynomial case becomes a problem about counting eigenvalues of a random matrix mod {p}.

There are many beautiful questions and mysteries about random polynomials. See for example the many high-definition pictures on John Baez’s website here showing the roots of random polynomials with independent coefficients. In many ways, taking the characteristic polynomial of a random matrix is actually a more natural model of a random polynomial, and some aspects become simpler. For example, I bet that the {A_n} vs {S_n} question will be settled sooner in the case of a random characteristic polynomial than in the case of independent coefficients.

Babai’s conjecture for classical groups with random generators

Yesterday Urban Jezernik and I uploaded to the arxiv our preprint Babai’s conjecture for high-rank classical groups with random generators. I want to use this space to explain what our work is about and where it fits in the literature. To try to cater for a possibly wide range of mathematical backgrounds, I will start friendly and informal and get progressively more precise and technical. Thanks in advance for your attention!

Rubik's cube

The subject matter is diameters of groups. A familiar example is Rubik’s cube, which has {n \approx 4 \times 10^{19}} ({40} billion billion) possible configurations. The set of all configurations of the cube can be viewed as a group {R}. (For the exact group structure, see wikipedia.) The standard generating set {S} consists of rotations of any of the 6 faces, a set of size {19} (including the do-nothing operation). The diameter is the maximum number of moves it takes to solve the cube from any configuration. Now the number of configurations a distance {d} from the origin, denoted {S^d}, is at most {19^d}: at each step, choose one of the {19} possible moves. Therefore the diameter cannot be smaller than

\displaystyle  \left\lceil\frac{\log n}{\log 19}\right\rceil = 16.

The exact answer is not that much larger: 20. The size of {S^d} is plotted below, which exhibits almost perfect exponential growth, until the end when it runs out of space.

rubik
Growth of S^d in the Rubik’s cube group

In general, let {G} be a finite group and {S} a set of generators. Assume {S = S^{-1}} and {1 \in S}. The Cayley graph {\mathrm{Cay}(G, S)} is the graph with vertex set {G} and an edge for every pair {\{g, gs\}} with {g\in G} and {s \in S \setminus \{1\}}. The diameter {\mathrm{diam}\,\,\Gamma} of a graph {\Gamma} is the “longest shortest path”: the distance between two vertices is the length of the shortest path between them, and the diameter is the greatest distance between any two vertices. The diameter of a Cayley graph {\mathrm{Cay}(G, S)} is the same as the smallest {d} such that {S^d = G}.

One expects similar behaviour as we saw for the Rubik’s cube group {R} for any group {G} which is sufficiently nonabelian (so that most products of generators are different, causing the balls {S^d} to expand at about the rate they do in a tree). The most ideal form of nonabelianness is simplicity, and Babai’s conjecture is a precise form of this intuition for finite simple groups. The conjecture states that

\displaystyle  \mathrm{diam}\, \mathrm{Cay}(G, S) \leq (\log |G|)^{O(1)},

uniformly over finite simple groups {G} and generating sets {S}. The trivial lower bound is {\log_2|G|}, so the conjecture states that the truth is within a power of this trivial lower bound.

By the classification of finite simple groups, Babai’s conjecture breaks naturally into three parts:

  1. finite groups {X(q)} of a fixed Lie type {X} over a field of order {q} with {q \to \infty};
  2. quasisimple classical groups of dimension {n} with {n \to \infty} and {q} arbitrary: namely, the special linear group {\mathrm{SL}_n(q)}, the symplectic group {\mathrm{Sp}_n(q)}, the derived subgroup of the orthogonal group {O_n(q)'}, and the special unitary group {\mathrm{SU}_n(q)};
  3. alternating groups {A_n}.

The first part is the “bounded rank” case, and the latter two parts make up the “unbounded rank” case (and you can loosely consider {A_n} to be {\mathrm{PSL}_n(1)}, if you want).

The bounded rank case of Babai’s conjecture is completely resolved, following breakthrough work of Helfgott, Pyber–Szabo, and Breuillard–Green–Tao. Even stronger conjectures can be made in this case. For instance, it may be that the graphs {\mathrm{Cay}(G, S)} have a uniform spectral gap, and in particular that

\displaystyle  \mathrm{diam}\,\,\mathrm{Cay}(G,S) \leq O(\log|G|),

with the implicit constant depending only on the rank. This is true if the generators are random, by work of Breuillard–Green–Guralnick–Tao.

Personally I am most interested in high-rank groups, such as the alternating group {A_n} (cf. the name of this blog). In the case of {A_n}, Babai’s conjecture (a much older conjecture in this case) asserts simply that

\displaystyle  \mathrm{diam}\, \mathrm{Cay}(A_n, S) = n^{O(1)}.

For example, if the generating set consists just of the cycles {(12)} and {(12\cdots n)^{\pm1}}, then the the diameter is comparable to {n^2}. A folklore conjecture asserts that this is the worst case up to a constant factor. The strongest results in this direction are the result of Helfgott–Seress (see also this later paper of Helfgott) which states that

\displaystyle  \mathrm{diam}\, \mathrm{Cay}(A_n, S) \leq \exp O((\log n)^{4 + o(1)}),

which is quasipolynomial rather than polynomial, and the result of Helfgott–Seress–Zuk which states that, if the generators are random,

\displaystyle  \mathrm{diam}\, \mathrm{Cay}(A_n, S) \leq n^2 (\log n)^{O(1)}.

Thus we are “almost there” with {A_n}. On the other hand, we are still a long way off for high-rank classical groups such as {\mathrm{SL}_n(q)}. Such a group has size comparable to {q^{n^2}}, and Babai’s conjecture asserts that

\displaystyle  \mathrm{diam}\, \mathrm{Cay}(G, S) \leq (n \log q)^{O(1)}.

In this case the best bound we have is one of Halasi–Maroti–Pyber–Qiao (building on Biswas–Yang), which states

\displaystyle  \mathrm{diam}\, \mathrm{Cay}(G, S) \leq q^{O(n (\log n)^2)}.

In many ways {A_n} and, say, {\mathrm{SL}_n(2)} behave similarly, but in other ways they are very different. For example, suppose your generating set for {A_n} contains a {3}-cycle. Then because there are at most {n^3} {3}-cycles in all, it is clear that every {3}-cycle has length at most {n^3} in your generators, and hence every element of {A_n} has length at most {n^4}, so the conjecture is trivial in this case. On other hand, suppose your generating set for {\mathrm{SL}_n(2)} contains the closest analogue to a {3}-cycle, a transvection. Unfortunately, there is not a trivial argument in this case, because the number of transvections is roughly {2^{2n-1}}. Indeed this is a difficult problem that was solved only earlier this year by Halasi.

However, if the size of the field {q} is bounded, and you have at least {q^C} random generators, then Urban and I know what to do. Under these conditions we show that

\displaystyle  \mathrm{diam}\, \mathrm{Cay}(G, S) \leq n^{O(1)},

as conjectured by Babai.

Roughly, the recipe is the following. Let {S} be your set of random generators. We want to show that we can reach everywhere in {G} using {n^{O(1)}} steps in {S}.

  1. Start with a random walk of length {n/10}. All of the points you get to will have “large support”, in the sense that {g-1} has no large eigenspace.
  2. Use explicit character bounds and the second moment method to show that, within the first {n/10} steps, you will reach any specified large normal subset {\mathfrak{C}}.
  3. Choose {\mathfrak{C}} so that every element of {\mathfrak{C}} has a {n^{O(1)}} power which has minimal support. Hence we can reach such an element with {n^{O(1)}} steps.
  4. Now act on that element by conjugation. This looks much like the standard action of {G} on the natural module {\mathbf{F}_q^n}. We show that with high probability the Schreier graph of this action actually has a spectral gap, and in particular logarithmic diameter. Hence we can cover the entire conjugacy class.
  5. Every element of {G} can be written as the product of {O(n)} such elements, so every element of {G} has length {n^{O(1)}} in {S}.

Some of these ideas are recycled from my previous paper with Stefan Virchow (discussed in this earlier post).

Probably the most interesting part is step 4, which generalizes a result of Friedman–Joux–Roichman–Stern–Tillich for the symmetric group. The Schreier graph of the usual action of {S_n} on {\{1, \dots, n\}} is one of the standard models for a random bounded-degree regular graph, and the spectral gap of this graph is well-studied. The graph we care about on the other hand has vertex set {\mathbf{F}_q^n \setminus \{0\}}, with {v} joined to {gv} for each {g} in our set of random generators. We found experimentally that, spectrally, this graph is virtually indistinguishable from the random regular graph. In particular, there should be a spectral gap for at least {2} random generators.

Our method needs {q^C} generators. The reason is that we have great difficulty saying anything about the behaviour of the random walk after about {n} steps, or indeed somewhat fewer depending on the type (linear/symplectic/unitary/orthogonal) of the group. To get around this problem we need to ensure that the random walk covers the whole Schreier graph before this critical time, so we need enough generators so that the random walk spreads quickly enough.