Separating big sticks from little sticks, and applications

Start with a stick. Randomly break it into two pieces and set one piece aside. Then randomly break what’s left into two pieces and set one aside again, and so on forever. At the end of this process (or whenever we get bored or run out of stick), collect up all the pieces of stick we set aside and examine them. Here is a theorem.

Theorem 1 Fix {\epsilon > 0}. Almost surely you can divide the stick pieces into two nonempty piles, the big sticks and the little sticks, in such a way that the little sticks, even put back together end-to-end, have total length less than {\epsilon} times the length of the littlest big stick.

To formalize the process, suppose the original stick has length {1}, and we break it into {u_1} and {1-u_1}, setting aside {1-u_1}; then we break {u_1} into {u_1u_2} and {u_1(1-u_2)}, setting aside {u_1(1-u_2)}; and so on. The piece set aside at step {n} has length

\displaystyle L_n = u_1 \cdots u_{n-1} (1-u_n)

for each {n\geq 1}, where {u_1, u_2, \dots} are independent {U[0,1]} random variables.

Warning: {L_n} is almost surely not monotonic! The biggest stick piece is not necessarily the first piece we broke off, although it is with positive probability. The distribution of {L_1 = 1-u_1} is {U[0,1]}, but the distribution of {L_\text{max} = \max_{n\geq 1} L_n} is rather more complicated (more on this later).
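As a quick sanity check, here is a Monte Carlo sketch (mine, not from the original post; the truncation threshold and trial count are arbitrary choices) estimating the probability that the first piece is the biggest. We will compute this probability exactly below (it is the Golomb–Dickman constant).

```python
# Estimate P(the first piece set aside is the biggest). Truncating once the
# remaining stick is below `cutoff` cannot affect the comparison except with
# negligible probability.
import random

def pieces_of_stick(cutoff=1e-12):
    pieces, remaining = [], 1.0
    while remaining > cutoff:
        u = random.random()
        pieces.append(remaining * (1 - u))   # the piece we set aside
        remaining *= u                       # the piece we keep breaking
    return pieces

trials = 10**5
hits = sum(max(p := pieces_of_stick()) == p[0] for _ in range(trials))
print(hits / trials)   # ≈ 0.62
```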

Consider the record process {R_n = \min_{i \leq n} (1-u_i)}. At step {n}, there is a probability {R_{n-1}} that {1-u_n} is a new record, and in this circumstance we have {R_n = 1-u_n \sim U[0, R_{n-1}]}. Records therefore occur infinitely often (given {R_{n-1} = r > 0}, each subsequent step independently has probability {r} of setting a new record), and each new record beats the old one by a factor of {\epsilon} with probability {\epsilon}, so there will be infinitely many {n} such that {R_n < \epsilon R_{n-1}}. But {u_n} and {1-u_n} have the same distribution, so there will also be infinitely many {n} such that {u_n < \epsilon R_n}. For any such {n},

\displaystyle u_1 \cdots u_n < \epsilon u_1 \cdots u_{n-1} \min_{i \leq n} (1-u_i) \leq \epsilon \min_{i \leq n} u_1 \cdots u_{i-1} (1-u_i) = \epsilon \min_{i \leq n} L_i.

But

\displaystyle u_1 \cdots u_n = \sum_{i > n} L_i.

Therefore after step {n} the total of what is left is less than {\epsilon} times our smallest stick so far, so taking the first {n} pieces as the big sticks and everything after as the little sticks proves the theorem.
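Here is a short simulation of the proof (my sketch; {\epsilon = 0.1} and the step cap are arbitrary choices): break the stick until the total of all future pieces drops below {\epsilon} times the smallest piece so far.

```python
# Exhibit the split from the proof: big sticks = everything up to step n,
# little sticks = everything after, once their total is small enough.
import random

def find_split(eps=0.1, max_steps=10_000):
    pieces, remaining = [], 1.0
    for _ in range(max_steps):
        u = random.random()
        pieces.append(remaining * (1 - u))   # L_n
        remaining *= u                       # u_1 ... u_n = sum of L_i for i > n
        if remaining < eps * min(pieces):
            return pieces, remaining
    raise RuntimeError("no split found; this should be astronomically unlikely")

big, little_total = find_split()
print(f"{len(big)} big sticks, smallest {min(big):.3g}; "
      f"little sticks total {little_total:.3g}")
```

Typically a few dozen breaks suffice for {\epsilon = 0.1}; smaller {\epsilon} takes longer, in line with the record-process argument.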

As it turns out, sticks have applications!

Corollary 2 Fix {\epsilon > 0} and let {\pi \in S_n} be a random permutation. With high probability (as {n\to\infty}), we can divide the cycles of {\pi} into two nonempty sets, the big cycles {\sigma_1, \dots, \sigma_k} and the little cycles {\sigma_{k+1}, \dots, \sigma_m}, in such a way that

\displaystyle \sum_{i = k+1}^m |\sigma_i| < \epsilon \min_{1 \leq i \leq k} |\sigma_i|.
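This is easy to test empirically. The following sketch (standard library only; the parameters, and the insistence on a nonempty little pile, without which the split is trivial, are my choices) samples random permutations and reports how often such a split exists.

```python
# Check Corollary 2 on random permutations.
import random

def cycle_lengths(n):
    perm = list(range(n))
    random.shuffle(perm)
    seen, lengths = [False] * n, []
    for i in range(n):
        length, j = 0, i
        while not seen[j]:
            seen[j] = True
            j = perm[j]
            length += 1
        if length:
            lengths.append(length)
    return sorted(lengths, reverse=True)

def has_split(lengths, eps):
    # cut the sorted cycle lengths into big | little at every position,
    # keeping the little pile nonempty
    return any(sum(lengths[k:]) < eps * lengths[k - 1]
               for k in range(1, len(lengths)))

trials, n, eps = 100, 10**5, 0.5
print(sum(has_split(cycle_lengths(n), eps) for _ in range(trials)) / trials)
```

The printed fraction drifts toward {1} as {n} grows, but slowly: the number of cycles is only about {\log n}.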

Corollary 3 Fix {\epsilon > 0} and let {n \in [1, X]} be a random integer. With high probability (as {X \to \infty}), we can factorize {n} as {n_1 n_2} with {n_1, n_2 > 1} in such a way that

\displaystyle n_2 < (\min_{p \mid n_1} p)^\epsilon.
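A similar experiment for integers, assuming sympy is available for factoring (parameters kept small so that it runs quickly):

```python
# Check Corollary 3 on random integers (requires sympy).
import random
from sympy import factorint

def has_split(n, eps):
    # prime factors with multiplicity, smallest first
    primes = sorted(p for p, e in factorint(n).items() for _ in range(e))
    n2 = 1
    for k in range(1, len(primes)):   # keep n2 > 1 and n1 > 1
        n2 *= primes[k - 1]           # n2 = product of the k smallest primes
        if n2 < primes[k] ** eps:     # primes[k] = smallest prime of n1
            return True
    return False

X, trials, eps = 10**10, 200, 0.5
hits = sum(has_split(random.randrange(2, X), eps) for _ in range(trials))
print(hits / trials)
```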

Corollary 4 Fix {\epsilon > 0} and let {f \in \mathbf{F}_q[X]} be a random polynomial of degree {d}. With high probability (as {d \to \infty}), we can factorize {f} as {f = f_1 f_2} with {f_1, f_2} nonconstant in such a way that

\displaystyle \deg f_2 < \epsilon \min_{1 \neq \phi \mid f_1} \deg \phi.
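And the polynomial version, again assuming sympy, whose `factor_list` accepts a prime modulus (so this sketch sticks to prime {q}):

```python
# Check Corollary 4 on random polynomials over F_q, q prime (requires sympy).
import random
from sympy import symbols, factor_list, degree

x = symbols('x')

def factor_degrees(q, d):
    # degrees, with multiplicity, of the irreducible factors of a random
    # monic polynomial of degree d over F_q
    f = x**d + sum(random.randrange(q) * x**i for i in range(d))
    _, factors = factor_list(f, modulus=q)
    return sorted(degree(g, x) for g, mult in factors for _ in range(mult))

def has_split(degs, eps):
    deg2 = 0
    for k in range(1, len(degs)):     # keep f2 (and f1) nonconstant
        deg2 += degs[k - 1]
        if deg2 < eps * degs[k]:      # degs[k] = smallest factor degree of f1
            return True
    return False

q, d, eps, trials = 5, 60, 0.5, 20
print(sum(has_split(factor_degrees(q, d), eps) for _ in range(trials)) / trials)
```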

What about {L_\text{max}}? By the same connections, {L_\text{max}} models the longest cycle in a random permutation, the log of the largest prime factor of a random integer, and the degree of the largest irreducible factor of a random polynomial. The event that {L_\text{max} < x} is the intersection of two events: {1-u_1 < x} and {\max_{n\geq 2} L_n < x}. But it is easy to see that {\max_{n \geq 2} L_n} has the same distribution as {u_1 L'_\text{max}}, where {L'_\text{max}} is an independent copy of {L_\text{max}}. (The largest stick other than the first is like the largest stick if we had started from a stick of length {u_1} instead of {1}.) Therefore

\displaystyle \mathbf{P}(L_\text{max} < x) = \mathbf{P}(1-u_1 < x,\ u_1 L'_\text{max} < x) = \int_{1-x}^1 \mathbf{P}(L_\text{max} < x/u) \, du.

Let {\phi(t) = \mathbf{P}(L_\text{max} < 1/t)}. Then {\phi(t) = 1} for {0 < t \leq 1}, and for {t > 1} we have

\displaystyle \phi(t) = \int_{1-1/t}^1 \phi(tu) \, du = \frac1t \int_{t-1}^t \phi(s) \, ds,

or, differentiating {t\phi(t) = \int_{t-1}^t \phi(s) \, ds}, the delay differential equation

\displaystyle t \phi'(t) + \phi(t-1) = 0.

There is no substantially more explicit expression for this function: it is the Dickman–de Bruijn function (usually denoted {\rho}).

Bonus: What is the probability that {L_\text{max} = L_1}? Answer:

\displaystyle \mathbf{P}(u_1 L'_\text{max} < 1-u_1) = \int_0^1 \phi(u/(1-u)) \, du = \int_0^\infty \frac{\phi(t)}{(t+1)^2}\, dt \approx 0.62

(this is the Golomb–Dickman constant).
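Both numbers are easy to reproduce numerically. The following sketch (my discretization; the step size and truncation point are arbitrary choices) integrates the delay differential equation on a grid, checks the value {\phi(2) = 1 - \log 2} (which follows from the equation, since {\phi(t) = 1 - \log t} on {[1,2]}), and then evaluates the Golomb–Dickman integral.

```python
# Solve t*phi'(t) = -phi(t-1) numerically and evaluate the two constants.
import math

h = 1e-3
m = round(1 / h)          # grid steps per unit length
T = 30                    # truncation point; phi(t) is astronomically small beyond
phi = [1.0] * (m + 1)     # phi = 1 on [0, 1]

for i in range(m + 1, T * m + 1):
    # trapezoidal step for phi'(t) = -phi(t-1)/t; the delayed values are
    # already on the grid, so the scheme is fully explicit
    slope_prev = -phi[i - 1 - m] / ((i - 1) * h)
    slope_here = -phi[i - m] / (i * h)
    phi.append(phi[i - 1] + h * (slope_prev + slope_here) / 2)

print(phi[2 * m], 1 - math.log(2))    # phi(2) = 1 - log 2 = 0.3068...

# Golomb-Dickman constant: trapezoid rule for the integral of phi(t)/(1+t)^2
vals = [p / (1 + i * h) ** 2 for i, p in enumerate(phi)]
lam = h * (sum(vals) - vals[0] / 2 - vals[-1] / 2)
print(lam)                            # ≈ 0.6243
```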

This is some sort of universality phenomenon: we have several natural objects (permutations, integers, polynomials) that break up into irreducible pieces randomly, so it is plausible that some universal fundamental process should underlie each of them. On the other hand, the connections seem to run deeper than just the stick-breaking process. For example, if you look at the proportion of permutations having no cycles smaller than some bound, or the proportion of integers having no prime factors below an equivalent bound, in both cases you run into the Buchstab function, which is similar but different. This connection is not modelled by the stick-breaking process.

Some references:

  1. Tao on the stick-breaking process, or the Poisson–Dirichlet process (which is essentially the same);
  2. the distribution of large prime factors of a random integer was found by Billingsley (1972); the clean stick-breaking formulation is due to Donnelly and Grimmett (1993);
  3. the same result for permutations was established by Kingman (1977) and Vershik and Shmidt (1977); see also Arratia, Barbour, and Tavaré (2006) (and references therein) for convergence rate estimates;
  4. best of all, see the comic book Prime Suspects by Andrew and Jennifer Granville (2019).

Probability that two permutations have the same order

What is the probability {p(n)} that two random permutations {\pi_1, \pi_2 \in S_n} have the same order? That is the subject of this MathOverflow question from 2016, and of this paper of Acan, Burnette, Schmutz, and Thomas from last month. It’s an innocuous-sounding question that turns out to encapsulate some rather subtle analytic combinatorics.

You haven’t got started until you realize {p(n) \geq 1/n^2}: this is the probability that both {\pi_1} and {\pi_2} are {n}-cycles. In fact since {\pi_1} and {\pi_2} have the same order whenever they are conjugate, we can say

\displaystyle  \mathbf{P}(\textup{ord}(\pi_1) = \textup{ord}(\pi_2)) \geq \mathbf{P}(\pi_1 \sim \pi_2).

This latter probability gets a significant contribution from every conjugacy class {\mathfrak{c}} which is nearly an {n}-cycle, specifically which contains an {(n-k)}-cycle for {k=O(1)}. Suppose {\mathfrak{c}} consists of an {(n-k)}-cycle as well as {c_j} cycles of length {j} for each {j\geq 1}, where {\sum_j j c_j = k}. Then the probability that {\pi_1, \pi_2 \in \mathfrak{c}} is

\displaystyle  \frac1{(n-k)^2} \prod_{j\geq 1} \left( \frac1{j^{c_j} c_j!} \right)^2 \sim \frac1{n^2} \prod_{j\geq 1} \left( \frac1{j^{c_j} c_j!}\right)^2.

If we sum this over all choices of {k} and {(c_j)}, we get {\Delta / n^2}, where

\displaystyle  \Delta = \prod_{j \geq 1} \sum_{c \geq 0} \frac1{j^{2c} \, c!^2} = 4.26\dots,

and it’s not too hard to see that all other conjugacy classes contribute negligibly. As far as I know this asymptotic was originally observed by Flajolet, Fusy, Gourdon, Panario, and Pouyanne (see Proposition 4).
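For what it’s worth, {\Delta} is quick to compute from the product formula; here is a sketch (truncation points mine; the inner sum is the modified Bessel value {I_0(2/j)}).

```python
# Delta as a product over j of the series sum_c (1/(j^c c!))^2 = I_0(2/j).
import math

def series(j):
    total, term, c = 0.0, 1.0, 0
    while term > 1e-18:
        total += term
        c += 1
        term /= (j * c) ** 2     # ratio of consecutive terms of the series
    return total

J = 10**5   # the tail j > J contributes only about 1/J to log(Delta)
print(math.exp(sum(math.log(series(j)) for j in range(1, J + 1))))   # ≈ 4.26
```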

This isn’t the full story for {p(n)}, since there are pairs of different conjugacy classes which contribute significantly, such as the cycle types {(n-2, 2)} and {(n-2, 1, 1)} if {n} is even, or {(n-3, 2, 1)} and {(n-3, 1, 1, 1)} if {n} is odd. It’s also clear from this example that the arithmetic of {n} is now relevant. But at least a natural approach is suggested.

Assume that the main contribution comes from conjugacy classes of the shape {(n-k, \lambda)}, where {k=O(1)} and {\lambda} is a partition of {k}. The order is given by

\displaystyle  \textup{lcm}(n-k, \textup{lcm}(\lambda)) = \frac{\textup{lcm}(\lambda)}{\textup{gcd}(\textup{lcm}(\lambda), n-k)} \times (n-k).

Now as long as {k} is bounded, the only way we can get a collision between two of these is if the two factors separated by {\times} collide separately. If we sum over such colliding pairs, we should get a rather complicated arithmetical almost-periodic function of {n}, but at least one we can try to understand with a computer.
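Here is a sketch of that computation (my code, not the notebook linked below; it groups the classes of shape {(n-k, \lambda)} directly by their order, which amounts to the same collision count). It takes a few seconds per value of {n}.

```python
# Lower-bound p(n) * n^2 by colliding pairs of classes (n - k, lambda), k <= K.
from math import lcm, factorial
from collections import defaultdict
from sympy.utilities.iterables import partitions

def lower_bound(n, K):
    weight = defaultdict(float)       # order -> total probability of classes
    weight[n] += 1.0 / n              # k = 0: the n-cycles
    for k in range(1, min(K, (n - 1) // 2) + 1):
        for lam in partitions(k):     # lam = {part: multiplicity}
            w = 1.0 / (n - k)         # Cauchy's formula for the class probability
            for part, mult in lam.items():
                w /= part ** mult * factorial(mult)
            weight[lcm(n - k, *lam)] += w
    return n * n * sum(w * w for w in weight.values())

for n in (720, 721, 1000):
    print(n, lower_bound(n, 50))
```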

But if you do this, a rather strange picture develops. Counting only those conjugacy classes with {k\leq 50}, we get the following picture.

[Figure: the computed lower bound for {p(n) n^2}, plotted against {n}, with a visible drop at {n = 50}.]

Notice the drop at exactly {n=50}, where we arbitrarily decided to limit {k}. What this is saying is that our assumption, that the main contribution comes from {k=O(1)}, was faulty.

After thinking further along these lines I contributed an answer to the MO question which proves that {p(n)} is infinitely often at least roughly {\log^*(n)/n^2}, and in particular {\limsup p(n) n^2 = \infty}. But it really looks like there is more to understand here. Even though the {n} in my MO answer are very particular, and the lower bound very weak, I’m still tempted to conjecture that {\liminf p(n) n^2 = \infty}. And it would be nice to know the true order of magnitude!

In case anybody wants to reproduce or extend my calculations, here is my colab notebook.