A thought about pancakes

The pancake sorting problem is to determine the diameter of the pancake graph, the Cayley graph of {S_n} with respect to the generating set {P_n = \{p_1, \dots, p_n\}}, where {p_k} is defined (in two-line notation) by

\displaystyle  p_k = \left(\begin{array}{ccccccc} 1 & 2 & \dots & k & k+1 & \cdots & n \\ k & k-1 & \dots & 1 & k+1 & \cdots & n \end{array}\right)

(i.e., {p_k} flips the top {k} pancakes). It is not hard to see that the diameter is between {n} and {2n}. Both the lower and the upper bound have been sharpened by constant factors, and a natural conjecture is that {\mathrm{diam}~\mathrm{Cay}(S_n, P_n) / n} converges to some intermediate constant (to be determined, ideally).

Here is a natural model problem: estimate {|P^m|} for each fixed {m} as {n \to \infty}. Clearly {|P^m| \leq n^m}, and we should be able to detect some fall-off from this trivial upper bound.
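For small {n} and {m} the set {P^m} can be enumerated by brute force. Here is a quick sketch in Python (my own illustration, not from the post), representing a permutation as the tuple of its images of {1, \dots, n}; note that {p_1} is the identity:

```python
from itertools import product

def p(k, n):
    # prefix reversal: flip the top k pancakes (p_1 is the identity)
    return tuple(list(range(k, 0, -1)) + list(range(k + 1, n + 1)))

def compose(s, t):
    # (s ∘ t)(i) = s(t(i)); permutations are tuples of images of 1..n
    return tuple(s[t[i] - 1] for i in range(len(t)))

def Pm_size(n, m):
    # |P^m|: number of distinct products of m prefix reversals
    gens = [p(k, n) for k in range(1, n + 1)]
    words = set()
    for seq in product(gens, repeat=m):
        g = seq[0]
        for h in seq[1:]:
            g = compose(g, h)
        words.add(g)
    return len(words)
```

Already for {m = 2} the count sits strictly below the trivial bound {n^2} (since {p_1} is the identity, many products coincide), though that loss is lower order; the leading-order fall-off first appears at {m = 4}.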

Proposition 1 There is an integer sequence {(a_m)}, beginning

\displaystyle  1, 2, 6, 22, 90, 486, 3014, 22185, 182229, \dots

(not in the OEIS) such that {|P^m| \sim (a_m/m!) n^m} as {n \to \infty} with {m} fixed.

To define {(a_m)} it is useful to introduce the hyperoctahedral group {B_m = \{\pm1\} \wr S_m} (cf. burnt pancake sorting). You can think of {B_m} as the group of “signed” or “antipodal” permutations: permutations {\sigma} of {\{\pm1, \dots, \pm m\}} such that {\sigma(-x) = -\sigma(x)} for all {x}. There is a natural cover {B_m \to S_m} defined by forgetting signs. We lift {P_m} to {B_m} in a particular way: define {Q_m = \{q_1, \dots, q_m\}}, where

\displaystyle  q_k = \left(\begin{array}{ccccccc} 1 & 2 & \dots & k & k+1 & \cdots & m \\ -k & -(k-1) & \dots & -1 & k+1 & \cdots & m \end{array}\right).

Now define

\displaystyle  a_m = \# \{ q_{\sigma(1)} \cdots q_{\sigma(m)} : \sigma \in S_m \}.

I computed the initial terms in the sequence using this simple formula. The first nontrivial relations are

\displaystyle  q_1 q_2 q_4 q_3 = q_2 q_4 q_3 q_1

and its inverse (to verify this, I recommend using four asymmetric cards from a deck), which together account for {a_4 = 22 < 24}.
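Both the initial terms and this relation are quick to check by machine. Here is a sketch in Python (my own illustration), encoding a signed permutation {\sigma \in B_m} as the tuple {(\sigma(1), \dots, \sigma(m))}, with negative entries recording sign flips:

```python
from itertools import permutations

def q(k, m):
    # signed prefix reversal: i ↦ -(k+1-i) for i ≤ k, i ↦ i above k
    return tuple(-(k + 1 - i) if i <= k else i for i in range(1, m + 1))

def compose(s, t):
    # (s ∘ t)(i) = sign(t(i)) · s(|t(i)|), using σ(-x) = -σ(x)
    return tuple(s[j - 1] if j > 0 else -s[-j - 1] for j in t)

def word(ks, m):
    # the product q_{k_1} q_{k_2} ··· as a composition of functions
    g = q(ks[0], m)
    for k in ks[1:]:
        g = compose(g, q(k, m))
    return g

def a(m):
    # number of distinct products q_{σ(1)} ··· q_{σ(m)} over σ ∈ S_m
    return len({word(sigma, m) for sigma in permutations(range(1, m + 1))})
```

With this encoding `word([1, 2, 4, 3], 4) == word([2, 4, 3, 1], 4)` verifies the relation above, and `a(4)` returns 22.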

To sketch the proof of Proposition 1, consider composing {m} random elements of {P_n}. You can think of this as a two-step process: randomly divide the stack {\{1, \dots, n\}} in {m} places, and then rearrange the resulting {m+1} pieces, possibly with some reversals. With high probability the points of division are all distinct, so they can be inferred from the result (this is true for {m} up to about {n^{1/2}}). The only question is whether the order can be inferred, and this is precisely what is defined by {a_m}.

It would be informative for the pancake problem to know more about the sequence {(a_m)}. I don’t know whether {a_m} can be computed more efficiently than the naive way above, e.g., whether there is a recurrence: if so, that would help a lot.

There is at least a submultiplicativity property that follows from the proposition:

Corollary 2 For {m, k\geq 1}, {a_{m+k} / (m+k)! \leq a_m / m! \cdot a_k / k!}. Hence there is a constant {c_1 > 0} such that

\displaystyle  a_m/m! < e^{-c_1m + o(m)}.

Proof: This follows from {|P^{m+k}| \leq |P^m| |P^k|}, dividing by {n^{m+k}}, and taking the limit {n \to \infty}. The second part follows from Fekete’s subadditivity lemma and the data point {a_4 = 22 < 24}. \Box
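The submultiplicativity in Corollary 2 can be sanity-checked against the initial terms listed above (a small check of mine, in Python):

```python
from math import factorial

# initial terms a_1, ..., a_9 from the proposition
a = [1, 2, 6, 22, 90, 486, 3014, 22185, 182229]

def b(m):
    # b_m = a_m / m!, the leading coefficient in |P^m| ~ b_m n^m
    return a[m - 1] / factorial(m)

# submultiplicativity b_{m+k} <= b_m * b_k, wherever we have data
ok = all(b(m + k) <= b(m) * b(k)
         for m in range(1, 9) for k in range(1, 9) if m + k <= 9)
```

The sequence {b_m} computed here is nonincreasing from {b_4 = 22/24} onward, consistent with the exponential decay in the corollary.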

I think a stronger bound should be true: there should be a constant {c_2 > 0} such that

\displaystyle  a_m < m!^{1-c_2 + o(1)}.

(One might then contemplate whether {n/(1 - c_2) + o(n)} is the diameter of the pancake graph, but that seems a little bold at this stage.)

The characteristic polynomial of a random matrix is almost always irreducible

I just uploaded to the arxiv a paper which can be viewed as vindication for anybody who has ever struggled to compute an eigenvalue by hand: the paper gives mathematical proof that eigenvalues are almost always as complicated as they can be. Intuitive as this may be, to prove it I needed two mathematical bazookas: both the extended Riemann hypothesis and the classification of finite simple groups!

An example will illustrate what I mean. Here is a random {10\times 10} matrix with {\pm1} entries:

\displaystyle  M = \left(\begin{array}{rrrrrrrrrr} -1 & 1 & 1 & -1 & -1 & 1 & -1 & 1 & -1 & 1 \\ -1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & -1 & -1 \\ -1 & -1 & 1 & 1 & 1 & -1 & -1 & -1 & 1 & 1 \\ 1 & -1 & -1 & 1 & -1 & -1 & 1 & 1 & 1 & -1 \\ -1 & -1 & -1 & 1 & 1 & 1 & 1 & -1 & -1 & -1 \\ -1 & -1 & -1 & -1 & -1 & 1 & 1 & 1 & 1 & 1 \\ -1 & -1 & -1 & 1 & -1 & 1 & 1 & 1 & -1 & 1 \\ 1 & -1 & -1 & -1 & 1 & 1 & 1 & 1 & -1 & -1 \\ -1 & -1 & 1 & -1 & 1 & -1 & -1 & -1 & -1 & -1 \\ -1 & -1 & -1 & -1 & -1 & 1 & -1 & 1 & -1 & -1 \end{array}\right) .

The eigenvalues of {M} as approximated by scipy.linalg.eig are shown below.

But what if we want to know the eigenvalues exactly? Well, they are the roots of the characteristic polynomial of {M}, which is

\displaystyle  \varphi(x) = x^{10} - 4x^{9} + 6x^{8} - 80x^{6} + 144x^{5} - 544x^{4} + 1152x^{3} - 1024x^{2} + 1280x - 1024,

so we just have to solve {\varphi(x) = 0}. The Abel–Ruffini theorem states that, in general, this is not possible in the quadratic-formula-type sense involving radicals, etc., so there are certainly evil matrices that defy solution. But could there be something special about random matrices that makes {\varphi} typically more special?
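For the record, {\varphi} can be recomputed exactly from {M} (a sanity check of mine using sympy, not something from the paper):

```python
import sympy as sp

# the ±1 matrix M from above
M = sp.Matrix([
    [-1,  1,  1, -1, -1,  1, -1,  1, -1,  1],
    [-1,  1,  1,  1,  1,  1,  1,  1, -1, -1],
    [-1, -1,  1,  1,  1, -1, -1, -1,  1,  1],
    [ 1, -1, -1,  1, -1, -1,  1,  1,  1, -1],
    [-1, -1, -1,  1,  1,  1,  1, -1, -1, -1],
    [-1, -1, -1, -1, -1,  1,  1,  1,  1,  1],
    [-1, -1, -1,  1, -1,  1,  1,  1, -1,  1],
    [ 1, -1, -1, -1,  1,  1,  1,  1, -1, -1],
    [-1, -1,  1, -1,  1, -1, -1, -1, -1, -1],
    [-1, -1, -1, -1, -1,  1, -1,  1, -1, -1],
])

x = sp.symbols('x')
# exact characteristic polynomial det(xI - M), no floating point
phi = M.charpoly(x).as_expr()
```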

It is not at all obvious that {\varphi} is “generic”. For one thing, if {M} is singular, then {\varphi} has an obvious root: zero! For another, if {M} happens to have a repeated eigenvalue, then {\varphi} and {\varphi'} share a factor, which is certainly a special algebraic property. It turns out that both of these are low-probability events, but the proofs are landmark results about random matrices (singularity is due originally to Komlós, and repeated eigenvalues to Tao–Vu in the symmetric case and Ge in the nonsymmetric case).

Algebraically, the complexity of solving {\varphi(x) = 0} is measured by the Galois group {G}. In particular, the polynomial is irreducible if and only if {G} is transitive, and the equation is solvable by a formula involving radicals as in the quadratic formula if and only if {G} is solvable in the sense of group theory. Galois groups are not easy to calculate in general, but “typically” they are, by the following well-known trick. If we factorize {\varphi} mod {p} for some prime {p}, then, as long as we do not get a repeated factor, the degree sequence of the factorization is the same as the cycle type of some element of {G}. If we do this for enough small primes {p} we can often infer that {G = S_n}.

Let’s do this for the above example. If we factorize {\varphi} mod {p} for {p < 20}, discarding the “ramified” primes for which there is a repeated factor, we get the following degree sequences:

{p} degree sequence
{5} {\left(9, 1\right)}
{7} {\left(4, 4, 2\right)}
{13} {\left(5, 3, 2\right)}
{17} {\left(5, 3, 2\right)}
{19} {\left(7, 2, 1\right)}
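This table is easy to reproduce, e.g. with sympy (a sketch of mine; `degree_sequence` is a hypothetical helper name):

```python
import sympy as sp

x = sp.symbols('x')
phi = (x**10 - 4*x**9 + 6*x**8 - 80*x**6 + 144*x**5
       - 544*x**4 + 1152*x**3 - 1024*x**2 + 1280*x - 1024)

def degree_sequence(p):
    # factor phi over GF(p); return None if p is ramified
    # (i.e. the factorization has a repeated factor)
    _, factors = sp.factor_list(phi, modulus=p)
    if any(e > 1 for _, e in factors):
        return None
    return sorted((int(sp.degree(f, x)) for f, _ in factors), reverse=True)
```

For instance {p = 2} is ramified here ({\varphi \equiv x^{10}} mod 2), while the unramified primes below 20 give exactly the degree sequences in the table.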

Therefore {G} contains a {9}-cycle, an element of type {(4, 4, 2)}, etc. Just from the {9}-cycle and the {(4,4,2)} element it is clear that {G} must be transitive (in fact 2-transitive, hence primitive), so {\varphi} is irreducible. There are various other rules that help eliminate other possibilities for {G}. A common one is a theorem of Jordan which states that if {G} is primitive and contains a {p}-cycle for some prime {p \leq n - 3} then {G \geq A_n}. The element of cycle type {(5, 3, 2)} has a power (the sixth) which is a 5-cycle, so Jordan’s theorem gives {G \geq A_{10}}; since that element is odd, in fact {G = S_{10}}.
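The last step is mechanical enough to check with sympy’s permutation tools (again my own sketch):

```python
from sympy.combinatorics import Permutation

# an element of cycle type (5, 3, 2) on 10 points (0-indexed)
g = Permutation([[0, 1, 2, 3, 4], [5, 6, 7], [8, 9]])

odd = g.is_odd  # odd, so a group containing it is not inside A_10

# the sixth power kills both the 3-cycle and the 2-cycle,
# leaving a single 5-cycle (and 5 fixed points)
h = g ** 6
```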

Conclusion: for the matrix {M} above, solving {\varphi(x) = 0} is a certifiable pain. You should just give up and accept {\varphi} itself as the prettiest answer you are going to get.

What my paper shows in general is that, if you choose the entries of an {n\times n} matrix independently from a fixed distribution in the integers, then, with probability tending to {1} as {n \to \infty}, the characteristic polynomial {\varphi} is irreducible, and moreover its Galois group is at least {A_n}. The result is conditional on the extended Riemann hypothesis, as mentioned, except in some nice special cases, such as when the entries are uniformly distributed in {\{1, \dots, 210\}}.

Note I only said “at least {A_n}”. Presumably the event {G = A_n} also has negligible probability, so {G = S_n} with probability {1 - o(1)}, but this remains open, and seems difficult. To prove it (naively), you have to show that the discriminant of the characteristic polynomial is, with high probability, not a perfect square. That should certainly be a low-probability event, but it’s not clear how to prove it!
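For the specific {10 \times 10} example above one can at least check the discriminant directly (my own sketch; here we already know {G = S_{10}} contains odd permutations, so the discriminant cannot be a square):

```python
import sympy as sp

x = sp.symbols('x')
phi = (x**10 - 4*x**9 + 6*x**8 - 80*x**6 + 144*x**5
       - 544*x**4 + 1152*x**3 - 1024*x**2 + 1280*x - 1024)

# nonzero since phi is irreducible (hence squarefree)
disc = sp.discriminant(phi, x)

def is_perfect_square(n):
    # exact integer square test; negatives are never squares
    if n < 0:
        return False
    root, exact = sp.integer_nthroot(int(n), 2)
    return exact
```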

Why do I need the extended Riemann hypothesis (ERH) and the classification of finite simple groups (CFSG)?

  • ERH: The example above involving factorizing mod {p} is not just useful in practice but also in theory. The prime ideal theorem tells you the statistics of what happens with your polynomial mod {p} for large {p}. In particular, to show that {G} is transitive it suffices to show that {\varphi} has one root on average mod {p}, for large primes {p}. Exactly how large {p} must be is determined by the error term in the prime ideal theorem, and ERH really helps a lot.
  • CFSG: An elaboration of the above strategy is used to show that {G} is not just transitive but {m}-transitive for any constant {m}. By CFSG, {m = 4} and {n > 24} is enough to imply {G \geq A_n}.

Arguably, matrices are not playing a large role in the statement of the theorem: it is just a statement about a random polynomial {\varphi}, where the model happens to be “take a random matrix with independent entries and take its characteristic polynomial”. The theorem should be true for any good model for a random polynomial of high degree. For example, in the “independent (or restricted) coefficients model”, the coefficients of {\varphi} are independent with some fixed distribution. For this model the corresponding statements were proved recently by Bary-Soroker–Kozma (see also Bary-Soroker’s blog post) and Breuillard–Varju, and I certainly borrow a lot from those papers. However, the common ground is isolated to the local–global principle that reduces the problem to a problem modulo {p}, and there the commonality ends. The independent coefficients case becomes a question about a certain kind of random walk (see the Breuillard–Varju paper) mod {p}, whereas the characteristic polynomial case becomes a problem about counting eigenvalues of a random matrix mod {p}.

There are many beautiful questions and mysteries about random polynomials. See for example the many high-definition pictures on John Baez’s website here showing the roots of random polynomials with independent coefficients. In many ways, taking the characteristic polynomial of a random matrix is actually a more natural model of a random polynomial, and some aspects become simpler. For example, I bet that the {A_n} vs {S_n} question will be settled sooner in the case of a random characteristic polynomial than in the case of independent coefficients.