A thought about pancakes

The pancake sorting problem is to determine the diameter of the pancake graph, the Cayley graph of {S_n} with respect to the generating set {P_n = \{p_1, \dots, p_n\}}, where {p_k} is defined (in two-line notation) by

\displaystyle  p_k = \left(\begin{array}{ccccccc} 1 & 2 & \dots & k & k+1 & \cdots & n \\ k & k-1 & \dots & 1 & k+1 & \cdots & n \end{array}\right)

(i.e., {p_k} flips the top {k} pancakes). It is not hard to see that the diameter is between {n} and {2n}. Both lower and upper bound have been sharpened by a constant factor, and a natural conjecture is that {\mathrm{diam}~\mathrm{Cay}(S_n, P_n) / n} converges to some intermediate constant (to be determined, ideally).

Here is a natural model problem: estimate {|P^m|} for each fixed {m} as {n \to \infty}. Clearly {|P^m| \leq n^m}, and we should be able to detect some fall-off from this trivial upper bound.

Proposition 1 There is an integer sequence {(a_m)}, beginning

\displaystyle  1, 2, 6, 22, 90, 486, 3014, 22185, 182229, \dots

(not in the OEIS) such that {|P^m| \sim (a_m/m!) n^m} as {n \to \infty} with {m} fixed.

To define {(a_m)} it is useful to introduce the hyperoctahedral group {B_m = \{\pm1\} \wr S_m} (cf. burnt pancake sorting). You can think of {B_m} as the group of “signed” or “antipodal” permutations: permutations {\sigma} of {\{\pm1, \dots, \pm m\}} such that {\sigma(-x) = -\sigma(x)} for all {x}. There is a natural cover {B_m \to S_m} defined by forgetting signs. We lift {P_m} to {B_m} in a particular way: define {Q_m = \{q_1, \dots, q_m\}}, where

\displaystyle  q_k = \left(\begin{array}{ccccccc} 1 & 2 & \dots & k & k+1 & \cdots & n \\ -k & -(k-1) & \dots & -1 & k+1 & \cdots & n \end{array}\right).

Now define

\displaystyle  a_m = \# \{ q_{\sigma(1)} \cdots q_{\sigma(m)} : \sigma \in S_m \}.

I computed the initial terms in the sequence using this simple formula. The first nontrivial relations are

\displaystyle  q_1 q_2 q_4 q_3 = q_2 q_4 q_3 q_1

and its inverse (to verify I recommend using four asymmetric cards from a deck of cards), which together account for {a_4 = 22 < 24}.

To sketch the proof of Proposition 1, consider composing {m} random elements of {P_n}. You can think of this as a two-step process: randomly divide the stack {\{1, \dots, n\}} in {m} places, and then rearrange the resulting {m+1} pieces, possibly with some reversals. With high probability the points of division are all distinct, so they can be inferred from the result (this is true for {m} up to about {n^{1/2}}). The only question is whether the order can be inferred, and this is precisely what is defined by {a_m}.

It would be informative for the pancake problem to know more about the sequence {(a_m)}. I don’t know whether {a_m} can be computed more efficiently than the naive way above, e.g., whether there is a recurrence: if so that would really help a lot.

There is at least a submultiplicativity property that follows from the proposition:

Corollary 2 For {m, k\geq 1}, {a_{m+k} / (m+k)! \leq a_m / m! \cdot a_k / k!}. Hence there is a constant {c_1 > 0} such that

\displaystyle  a_m/m! < e^{-c_1m + o(m)}.

Proof: This follows from {|P^{m+k}| \leq |P^m| |P^k|}, dividing by {n^{m+k}}, and taking the limit. The second part follows from the subadditivity lemma and the data point {a_4 = 22 < 24}. \Box

I think a stronger bound should be true: there should be a constant {c_2 > 0} such that

\displaystyle  a_m < m!^{1-c_2 + o(1)}.

(One might then contemplate whether {n/(1 - c_2) + o(n)} is the diameter of the pancake graph, but that seems a little bold at this stage.)

How to evaluate an ordering quiz

In a virtual pub quiz last week, Lucy and I were responsible for hosting the science round. Here was our question.

Put the following events in chronological order:

1. Discovery of the structure of DNA
2. Invention of the telescope
3. Periodic table developed
4. Invention of spectacles
5. Extinction of the dodo
6. Discovery of Uranus
7. Einstein’s theory of general relativity
8. First use of fingerprints in detection
9. Darwin’s theory of evolution
10. BCG vaccine first used on humans

Go on, have a go, write down your guess. I will tell you the correct answers in a moment, but before I do, how are you planning on evaluating your performance? What if you get 9 out of 10 events in the right order, but it turns out the event you thought might be last was actually first. Then you got 0 out of 10 events in the right place, but surely you deserve more than 0 points.

One reasonable way of doing this is by counting inversions. There are 45 pairs of events, and you deserve a point for every pair of events you get in the right order. An annoyance with this in practice is that it’s a little tedious to calculate quickly (there are 45 pairs to check), so I wrote a little tool for doing it. Try it in your own virtual pub quiz! (You may want to rescale to a 0–10 standard.)

Anyway, here is the correct ordering:

Invention of spectacles (1290)
Invention of the telescope (1608)
Extinction of the dodo (1662)
Discovery of Uranus (1781)
Darwin’s theory of evolution (1859)
Periodic table developed (1869)
First use of fingerprints in solving crime (1892)
Einstein’s theory of general relativity (1915)
BCG vaccine first used on humans (1921)
Discovery of the structure of DNA (1953)

Mathematically, assume the events are represented by the symbols {1, \dots, n} in such a way that the correct ordering is {1, \dots, n}, and the proffered solution is {\pi}, which is some permutation of {1, \dots, n}. The number of inversions in {\pi} is defined by

\displaystyle  \textup{inv}(\pi) = \# \{(i, j) : 1 \leq i < j \leq n, \pi(i) > \pi(j)\}.

This quantity goes by many names, e.g., bubble-sort distance in computer science, Kendall tau distance in statistics, Coxeter length in group theory.

If {\pi} is sorted then {\textup{inv}(\pi) = 0}, while if {\pi} is anti-sorted then {\textup{inv}(\pi) = \binom{n}{2}}, and

\displaystyle \mathbf{E} [\textup{inv}(\pi)] = \frac12 \binom{n}{2}.

Write

\displaystyle \textup{inv}(\pi) = \sum_{1 \leq i < j \leq n} I_{ij}(\pi),

where {I_{ij}} indicates the event that {\pi(i) > \pi(j)}. The pairwise correlations are readily calculated, because they reduce to statistics of 3-symbol permutations. Certainly {I_{ij}} and {I_{i'j'}} are independent if {\{i, j\} \cap \{i', j'\} = \emptyset}, while

\displaystyle \begin{array}{rcl} \mathbf{E} I_{ij} I_{i'j} &=& 1/3 \qquad (i < i' < j),\\ \mathbf{E} I_{ij} I_{ij'} &=& 1/3 \qquad (i < j < j'),\\ \mathbf{E} I_{ij} I_{jk} &=& 1/6 \qquad (i < j < k). \end{array}

It follows that

\displaystyle \begin{array}{rcl} \textup{Var}[\textup{inv}(\pi)] &=& \binom{n}{2} \frac{1}{4} + \binom{n}{3} \left( 2 \times (1/3 - 1/4) + 2 \times (1/3 - 1/4) + 2 \times (1/6 - 1/4) \right)\\ &=& \frac{n(n-1)}{8} + \frac{n(n-1)(n-2)}{36}. \end{array}

(Given a triple {i < j < k}, how many times do we have {\{i, j, k\} = \{i, j\} \cup \{i', j'\}} in each of the three cases above? Twice, twice, twice.) Call this {\sigma^2} (note {\sigma \sim n^{3/2} / 6}). Then, statistically, if our quiz participant is just a random monkey, we expect

\displaystyle \textup{inv}(\pi) \approx \frac12 \binom{n}{2} \pm \sigma.

The script in the link above computes

\displaystyle Z = \frac{\frac12\binom{n}{2} - \textup{inv}(\pi)}{\sigma},

which under the null hypothesis has zero mean and unit variance. Score more than 2 to convince me you’re not a monkey!