An asymptotic for the Hall–Paige conjecture

Problem 1 Let {G} be a group of order {n}. Is there a bijection {f \colon G \to G} such that the map {x \mapsto x f(x)} is also a bijection?

For example, if {G} has odd order, you can just take {f(x) = x}. Then {x f(x) = x^2}, and because every element has odd order this defines a bijection. If {G = \mathbf{F}_2^d}, then it suffices to find a linear map without an eigenvalue. Cyclic groups of even order never have complete mappings (option 1: stop reading and try to prove this as a diverting puzzle; option 2: read on, and it will be explained and become essential).

In general, a solution to Problem 1 is called a complete mapping. The definition of complete mapping is a little strange. Here is some motivation, if you want some. The construction of finite projective planes is a problem going back to Euler. You can think of a finite projective plane as a collection of {n-1} mutually orthogonal {n \times n} Latin squares, where two Latin squares are termed orthogonal if, when superimposed, the {n^2} pairs of symbols are distinct. The most familiar Latin square is the cyclic Latin square, which has entries {i + j \bmod n} for {0 \leq i, j < n}. More generally, for any group {G} of order {n}, the multiplication table of {G} is an {n \times n} Latin square. If you write down what it means for another square to be orthogonal to the {G}-based Latin square, you will find yourself re-inventing the definition of complete mapping.

Another motivation, of possibly broader interest, is simply that counting complete mappings turns out to be interesting and difficult, and requires us to hone some counting skills that, we hope, will be applicable elsewhere.

The concerted effort to answer Problem 1 began with Hall and Paige (1955). First, there is an obstruction, beginning with cyclic groups of even order. More generally, consider the abelianization {G^\textup{ab}} of {G}, and suppose all the elements of {G} have nontrivial sum in {G^\textup{ab}}. Then if {f} is a bijection the elements {x f(x)} will sum up to zero, so {x f(x)} cannot possibly be a bijection. Therefore in order for a complete mapping to exist, we must have {\prod G \in [G, G]}. Hall and Paige proved that this condition is equivalent to the Sylow 2-subgroups of {G} being trivial or noncyclic. Henceforth this condition will be called the Hall–Paige condition. Conversely, Hall and Paige conjectured that Problem 1 has a solution as long as {G} satisfies their condition.

Conjecture 2 (Hall–Paige) Every group satisfying the Hall–Paige condition has a complete mapping.

Progress on the Hall–Paige conjecture was slow. Hall and Paige (1955) proved the conjecture for solvable groups and symmetric and alternating groups, but progress on the full conjecture didn’t really get under way until the classification was established. For a complete history, see the book by Evans. Aschbacher (1990) proved various restrictions on the form of a minimal counterexample. The real breakthrough came in 2009 when Wilcox showed that a minimal counterexample would have to be simple, and furthermore could not be any of the simple groups of Lie type, which leaves only the Tits group and the 26 sporadic groups. At this point the problem is in principle a finite check, but still a mammoth one. Combining Wilcox’s technology with extensive computer algebra, Evans ruled out all of the remaining groups, with one exception: the fourth Janko group {J_4}, a group of order {86775571046077562880 \approx 9 \times 10^{19}}, and finally {J_4} was ruled out by Bray in work that remained unpublished until last year. Thus the Hall–Paige conjecture was proved. Although many people contributed, it is customary to attribute the final result to Wilcox, Evans, and Bray.

Meanwhile, another strand of study was developing. Suppose we take just the cyclic group {G = C_n}. We know there is at least one complete mapping as long as {n} is odd. But how many are there?

Problem 3 How many complete mappings does the cyclic group {C_n} have?

This problem can be compared with the well-known {n} queens puzzle: how many ways can you place {n} queens on an {n \times n} chessboard so that no two are attacking? If you make the chessboard toroidal, so that going off one edge brings you back on the opposite one, and if you only allow the queens to use one of the two diagonals, then you get an equivalent question.

Heuristically, there are {n!} bijections {f}, and for each the function {x\mapsto x f(x)} has a roughly {n!/n^n} chance of being a bijection, so you can guess that the number of complete mappings is about {n!^2 / n^n}; this conjecture is attributed to Vardi (1991) (in a weaker form) and Wanless (2011).

Conjecture 4 (Vardi–Wanless) The number of complete mappings of {C_n} is asymptotically {(e^{-1} + o(1))^n n!}.

This is where additive combinatorics enters. A few years ago Freddie Manners, Rudi Mrazović, and I started thinking about Problem 3, motivated by the observation that it can be thought of as requesting the number of solutions to

\displaystyle  \pi_1 + \pi_2 = \pi_3

(or additive triples) with {\pi_1, \pi_2, \pi_3 \in S}, where {S \subset C_n^n} is the set of bijections. The usual tool for counting additive triples (or solutions to any linear equation) is Fourier analysis; hence the problem reduces to understanding the Fourier transform of {S}. After barking up this tree (for quite a while), we were eventually able to prove the Vardi–Wanless conjecture. In fact we proved something even more precise: the solution to Problem 3 turns out to be asymptotically

\displaystyle  (e^{-1/2} + o(1)) n!^2/n^{n-1},

which differs from the Vardi–Wanless guess (though they never made a guess so precise) in two important respects: there is an extra factor of {n}, and a factor of {e^{-1/2}} (what the hell?).

Since the key to unlocking Problem 3 is Fourier analysis, it seems like we are using the abelianness of {C_n} in an absolutely essential way, but today Freddie and Rudi and I have a new announcement: we can count complete mappings in nonabelian groups too, and the asymptotic is essentially unchanged.

Theorem 5 Let {G} be a group of order {n} satisfying the Hall–Paige condition. Then the number of complete mappings of {G} is asymptotically

\displaystyle  (e^{-1/2} + o(1)) |G^\textup{ab}| n!^2 / n^n.

Here are some other highlights from our paper:

  1. The asymptotic above is begging for a heuristic explanation. We give one! It uses something called the principle of maximum entropy, which I personally can’t wait to use again. Basically, if {f} is a random bijection, {x \mapsto x f(x)} is more prone to collisions than a random function, and, heuristically, has a Gibbs distribution with a particular partition function, and a calculation shows that its probability of being a bijection is therefore smaller by a constant factor.
  2. We evaluate the next term in the asymptotic! We actually show that the number of complete mappings is

    \displaystyle  e^{-1/2} (1 + (1/3 + \textup{inv}(G)/4)/n + O(1/n^2)) |G^\textup{ab}| n!^2/n^n,

    where {\textup{inv}(G)} is the proportion of involutions in {G}. This verifies another conjecture of Wanless: that elementary abelian {2}-groups have the most complete mappings of any group of the same order. We can keep going, in principle, but really we would rather not: there is a combinatorial explosion in the number of “collision types” (in a sense we make precise), and to give the next term in the asymptotic you need to sum over all of these.

  3. Orthogonally, we can give effective estimates instead of asymptotics. For example, we can show that as long as {|G| > 10^5} and all nontrivial complex representations of {G} have degree at least {21}, then {G} has a positive number of complete mappings (in fact, within a constant factor of {n!^2/n^n} such). By combining a few such effective statements, we can cover all the simple sporadic groups, apart from the two smallest {M_{11}} and {M_{12}}. You can view this as a second-generation proof of the Evans–Bray contribution to the Hall–Paige conjecture. In fact, our proof covers all nonabelian finite simple groups except for some alternating groups, some {\textup{PSL}_2(q)'s}, and about ten other groups that we list.

The preprint is available at Any comments appreciated, as always!

The avoidance density of (k,l)-sum-free sets

A set {S \subset\mathbf{N} = \{1, 2, \dots\}} is called sum-free if it contains no solutions to {x+y=z}. For example, the set of odd integers is sum-free, as is the set {\{51, 52, \dots, 100\}}. Generalizing these two examples, for every {\theta \in \mathbf{R}/\mathbf{Z}} (real numbers mod {1}), the “middle third set”

\displaystyle S_\theta = \{x \in N: \theta x \in (1/3, 2/3) \pmod 1\}

is sum-free (the odd integers is precisely {S_{1/2}}, while the interval {\{51, \dots, 100\}} is a subset of {S_\theta} for {\theta = 1/150 - \epsilon}). As observed by Erdős, for any set {A \subset \mathbf{N}}, one of the sets {S_\theta \cap A} is large, because

\displaystyle  \int |S_\theta \cap A| \,d\theta = |A|/3.

Therefore every set of {n} integers contains a sum-free subset of size at least {n/3}.

Suppose more generally you have some type of configuration {\mathfrak{C}} (being deliberately vague about allowed generality), and you are interested in finding {\mathfrak{C}}-free subsets of arbitrary sets. The avoidance density of {\mathfrak{C}} is the largest constant {\delta} such that every set of {n} positive integers has a {\mathfrak{C}}-free subset of size at least {\delta n}. Here are some examples.

  1. The equation {x - y = 1} has avoidance density {1/2}: clearly any subset of a long interval contains at most half-sized subsets without adjacent elements, and conversely I can always take either the odds or the evens to get an adjacency-free subset of at least half the size. The equation {x - y = 2} also has avoidance density {1/2}, as does {x - y = c} for any constant {c}. (Partition into arithmetic progressions of common difference {c} and take every other element to find your large subset. For an example of a set with no large {\mathfrak{C}}-free subsets, take an arithmetic progression of common difference {c}.)
  2. For any rational {a \neq 1}, the equation {ax = y} has avoidance density {1/2} (this is a similar easy exercise).
  3. Suppose {\mathfrak{C}} represents {3}-term arithmetic progressions. Then a {\mathfrak{C}}-free subset is a set without {3}-term arithmetic progressions, and by Roth’s theorem the set {\{1, \dots, n\}} does not have any positive-density {\mathfrak{C}}-free subsets, so the avoidance density of {\mathfrak{C}} is zero. Similarly, the avoidance density of {k}-term progressions is zero, by Szemerédi’s theorem. More generally, any sort of finite configuration which is preserved under affine maps has avoidance density zero, because you can always find it inside a sufficiently long arithmetic progression.

Erdős’s argument shows that {x+y=z} has avoidance density at least {1/3}. A few years ago, Ben Green, Freddie Manners and I showed that the avoidance density of {x+y=z} is equal to {1/3}: that is, for every {\epsilon>0} we constructed a set of {n} integers with no sum-free subset of size larger than {(1/3 + \epsilon)n}.

The next step is to look at {\mathfrak{C}_{k,l} : x_1 + \cdots + x_k = y_1 + \cdots + y_l} for arbitrary {k > l}. A {\mathfrak{C}_{k,l}}-free set is sometimes called {(k,l)}-sum-free. The natural conjecture is that the avoidance density of {\mathfrak{C}_{k,l}} is {1/(k+l)}, as this is what the natural adaptation of Erdős’s argument shows. I was able to prove this in two big cases: in the case {k \leq 3l}, by generalizing the previous method, and in the case {l = 1}, using a new argument based on transference and a beautiful theorem of Łuczak and Schoen.

Today a paper appeared on the arxiv by Yifan Jing which settles this question for all {k, l}. Thus the avoidance density of {\mathfrak{C}_{k,l}} is indeed {1/(k+l)}! To do this Jing extended the Łuczak–Schoen theorem to {l > 1}, proving the following theorem (Lemma 5.4 in Jing’s paper).

Theorem 1 Every {(k,l)}-sum-free subset of {\mathbf{N}} of upper density greater than {1/(k+l)} is contained in a periodic {(k,l)}-sum-free set.

What this means is that {(k,l)}-sum-free sets of density greater than {1/(k+l)} generally have to be structured. On the other hand, if {\delta} is the avoidance density of {\mathfrak{C}_{k,l}}, then we can create random-looking {(k,l)}-sum-free subsets of density nearly {\delta}. In particular, using a transference tool (namely, Loeb measure), we can create random-looking infinite {(k,l)}-sum-free subsets of upper density nearly {\delta}. Therefore we must have {\delta \leq 1/(k+l)}.

Jing also looked the lower bound. Famously Bourgain did some work on this, showing that every set of {n} positive integers has a sum-free subset of size at least {(n+2)/3}, emphasis on the {2}. With a more interesting argument Bourgain showed that there is a {(3,1)}-sum-free set of size at least {n/4 + c \log n / \log \log n}. Jing adapted this argument for many pairs {(k,l)}, though doing so for {(k,l) = (2,1)} remains a stubborn open problem. It is interesting to see how Bourgain’s method adapts, and I look forward to thinking more about this.

It is all-too-easy to pose difficult open questions in this circle of ideas, but I will risk two.

  1. This question is posed by Jing. Let {\square^d} represent {d}-dimensional projective cubes. That is, {\square^2} represents the sum configuration {\{x, y, x+y\}}, {\square^3} represents {\{x, y, z, x+y, x+z, y+z, x+y+z\}}, etc. What is the avoidance density of {\square^d}?
  2. What is the avoidance density of {2x+y = z}? Or for that matter any equation without only {\pm1} coefficients?

One general theorem that comes out of the transference method is the following. Say that {\mathfrak{C}} is dilation-invariant if whenever {C \in \mathfrak{C}} we have also {\lambda C \in \mathfrak{C}} for every {\lambda \in \mathbf{N}}. (Everything we have mentioned is dilation-invariant, apart from {x-y = c}.)

Theorem 2 Let {\mathfrak{C}} be any dilation-invariant configuration. Then the avoidance density of {\mathfrak{C}} is the same as the largest multiplicative upper density of an infinite {\mathfrak{C}}-free subset of {\mathbf{N}}.

Here multiplicative upper density is defined as the asymptotic upper density with respect to your favourite multiplicative Følner sequence, such as

\displaystyle  A_n = \{p_1^{e_1} \cdots p_n^{e_n} : 0 \leq e_i < n\},

where {p_1, p_2, \dots} are the primes in any order. Therefore questions 1 and 2 ask for the largest multiplicative upper density of {\square^d}-free or {2x+y=z}-free subset of {\mathbf{N}}, respectively.

Here is my general conjecture.

Conjecture 3 Let {\mathfrak{C}} be any type of configuration described by a single equation {a_1 x_1 + \cdots + a_k x_k = 0}, with {a_1, \dots, a_k \in \mathbf{Z}\setminus\{0\}}. Then the avoidance density of {\mathfrak{C}} is the same as the measure of the largest {\mathfrak{C}}-free subset of {\mathbf{R}/\mathbf{Z}}.

One might also conjecture this for higher-complexity systems like {\square^3}, but I have no idea whether that is likely to be true! Certainly the lower bound holds: if {S \subset \mathbf{R}/\mathbf{Z}} is {\mathfrak{C}}-free then the avoidance density of {\mathfrak{C}} is at least {\mu(S)}. But even if this is true it’s not very useful! What is the largest {\square^3}-free subset of {\mathbf{R}/\mathbf{Z}}? What is the largest {2x+y=z}-free subset of {\mathbf{R}/\mathbf{Z}}? These questions are simpler, certainly, but still not at all obvious!

Edit: The largest {\mathfrak{C}}-free sets {S \subset \mathbf{R}/\mathbf{Z}} I know in these two cases are the following. I have made very little effort to look for other solutions. Can you spot a better solution? Is there a systematic way of solving this problem?

{\mathfrak{C}} {S} {\mu(S)}
{\square^3} {(1/4, 3/4)} {1/2}
{2x+y=z} {(1/8, 3/8)} {1/4}

Later edit: Here are some variants of the {\square^3} question, essentially due to Cosmin Pohoata:

  1. Suppose {A} is a {\square^3}-free subset of {\{1, \dots, n\}}, e.g., {\{x \equiv 1,2 \pmod 3\}}, or {(n/3, n]} (both of these are {S_\theta} for some {\theta}). Is it true that {|A| \leq (2/3 + o(1))n}?
  2. Suppose {A} is a {\square^3}-free subset of {\mathbf{Z}/2^k\mathbf{Z}}, e.g., {\{x \equiv 1, 4 \pmod 8\}}. Is it true that {|A| \leq (5/8 + o(1))2^k}?

The interesting point here is that shoehorning the set into {\mathbf{Z}_2} appears to come with a density hit (as with sum-free sets), and the extremal set appears not to be of the form {S_\theta}.