The avoidance density of (k,l)-sum-free sets

A set {S \subset\mathbf{N} = \{1, 2, \dots\}} is called sum-free if it contains no solutions to {x+y=z}. For example, the set of odd integers is sum-free, as is the set {\{51, 52, \dots, 100\}}. Generalizing these two examples, for every {\theta \in \mathbf{R}/\mathbf{Z}} (real numbers mod {1}), the “middle third set”

\displaystyle S_\theta = \{x \in N: \theta x \in (1/3, 2/3) \pmod 1\}

is sum-free (the odd integers is precisely {S_{1/2}}, while the interval {\{51, \dots, 100\}} is a subset of {S_\theta} for {\theta = 1/150 - \epsilon}). As observed by Erdős, for any set {A \subset \mathbf{N}}, one of the sets {S_\theta \cap A} is large, because

\displaystyle  \int |S_\theta \cap A| \,d\theta = |A|/3.

Therefore every set of {n} integers contains a sum-free subset of size at least {n/3}.

Suppose more generally you have some type of configuration {\mathfrak{C}} (being deliberately vague about allowed generality), and you are interested in finding {\mathfrak{C}}-free subsets of arbitrary sets. The avoidance density of {\mathfrak{C}} is the largest constant {\delta} such that every set of {n} positive integers has a {\mathfrak{C}}-free subset of size at least {\delta n}. Here are some examples.

  1. The equation {x - y = 1} has avoidance density {1/2}: clearly any subset of a long interval contains at most half-sized subsets without adjacent elements, and conversely I can always take either the odds or the evens to get an adjacency-free subset of at least half the size. The equation {x - y = 2} also has avoidance density {1/2}, as does {x - y = c} for any constant {c}. (Partition into arithmetic progressions of common difference {c} and take every other element to find your large subset. For an example of a set with no large {\mathfrak{C}}-free subsets, take an arithmetic progression of common difference {c}.)
  2. For any rational {a \neq 1}, the equation {ax = y} has avoidance density {1/2} (this is a similar easy exercise).
  3. Suppose {\mathfrak{C}} represents {3}-term arithmetic progressions. Then a {\mathfrak{C}}-free subset is a set without {3}-term arithmetic progressions, and by Roth’s theorem the set {\{1, \dots, n\}} does not have any positive-density {\mathfrak{C}}-free subsets, so the avoidance density of {\mathfrak{C}} is zero. Similarly, the avoidance density of {k}-term progressions is zero, by Szemerédi’s theorem. More generally, any sort of finite configuration which is preserved under affine maps has avoidance density zero, because you can always find it inside a sufficiently long arithmetic progression.

Erdős’s argument shows that {x+y=z} has avoidance density at least {1/3}. A few years ago, Ben Green, Freddie Manners and I showed that the avoidance density of {x+y=z} is equal to {1/3}: that is, for every {\epsilon>0} we constructed a set of {n} integers with no sum-free subset of size larger than {(1/3 + \epsilon)n}.

The next step is to look at {\mathfrak{C}_{k,l} : x_1 + \cdots + x_k = y_1 + \cdots + y_l} for arbitrary {k > l}. A {\mathfrak{C}_{k,l}}-free set is sometimes called {(k,l)}-sum-free. The natural conjecture is that the avoidance density of {\mathfrak{C}_{k,l}} is {1/(k+l)}, as this is what the natural adaptation of Erdős’s argument shows. I was able to prove this in two big cases: in the case {k \leq 3l}, by generalizing the previous method, and in the case {l = 1}, using a new argument based on transference and a beautiful theorem of Łuczak and Schoen.

Today a paper appeared on the arxiv by Yifan Jing which settles this question for all {k, l}. Thus the avoidance density of {\mathfrak{C}_{k,l}} is indeed {1/(k+l)}! To do this Jing extended the Łuczak–Schoen theorem to {l > 1}, proving the following theorem (Lemma 5.4 in Jing’s paper).

Theorem 1 Every {(k,l)}-sum-free subset of {\mathbf{N}} of upper density greater than {1/(k+l)} is contained in a periodic {(k,l)}-sum-free set.

What this means is that {(k,l)}-sum-free sets of density greater than {1/(k+l)} generally have to be structured. On the other hand, if {\delta} is the avoidance density of {\mathfrak{C}_{k,l}}, then we can create random-looking {(k,l)}-sum-free subsets of density nearly {\delta}. In particular, using a transference tool (namely, Loeb measure), we can create random-looking infinite {(k,l)}-sum-free subsets of upper density nearly {\delta}. Therefore we must have {\delta \leq 1/(k+l)}.

Jing also looked the lower bound. Famously Bourgain did some work on this, showing that every set of {n} positive integers has a sum-free subset of size at least {(n+2)/3}, emphasis on the {2}. With a more interesting argument Bourgain showed that there is a {(3,1)}-sum-free set of size at least {n/4 + c \log n / \log \log n}. Jing adapted this argument for many pairs {(k,l)}, though doing so for {(k,l) = (2,1)} remains a stubborn open problem. It is interesting to see how Bourgain’s method adapts, and I look forward to thinking more about this.

It is all-too-easy to pose difficult open questions in this circle of ideas, but I will risk two.

  1. This question is posed by Jing. Let {\square^d} represent {d}-dimensional projective cubes. That is, {\square^2} represents the sum configuration {\{x, y, x+y\}}, {\square^3} represents {\{x, y, z, x+y, x+z, y+z, x+y+z\}}, etc. What is the avoidance density of {\square^d}?
  2. What is the avoidance density of {2x+y = z}? Or for that matter any equation without only {\pm1} coefficients?

One general theorem that comes out of the transference method is the following. Say that {\mathfrak{C}} is dilation-invariant if whenever {C \in \mathfrak{C}} we have also {\lambda C \in \mathfrak{C}} for every {\lambda \in \mathbf{N}}. (Everything we have mentioned is dilation-invariant, apart from {x-y = c}.)

Theorem 2 Let {\mathfrak{C}} be any dilation-invariant configuration. Then the avoidance density of {\mathfrak{C}} is the same as the largest multiplicative upper density of an infinite {\mathfrak{C}}-free subset of {\mathbf{N}}.

Here multiplicative upper density is defined as the asymptotic upper density with respect to your favourite multiplicative Følner sequence, such as

\displaystyle  A_n = \{p_1^{e_1} \cdots p_n^{e_n} : 0 \leq e_i < n\},

where {p_1, p_2, \dots} are the primes in any order. Therefore questions 1 and 2 ask for the largest multiplicative upper density of {\square^d}-free or {2x+y=z}-free subset of {\mathbf{N}}, respectively.

Here is my general conjecture.

Conjecture 3 Let {\mathfrak{C}} be any type of configuration described by a single equation {a_1 x_1 + \cdots + a_k x_k = 0}, with {a_1, \dots, a_k \in \mathbf{Z}\setminus\{0\}}. Then the avoidance density of {\mathfrak{C}} is the same as the measure of the largest {\mathfrak{C}}-free subset of {\mathbf{R}/\mathbf{Z}}.

One might also conjecture this for higher-complexity systems like {\square^3}, but I have no idea whether that is likely to be true! Certainly the lower bound holds: if {S \subset \mathbf{R}/\mathbf{Z}} is {\mathfrak{C}}-free then the avoidance density of {\mathfrak{C}} is at least {\mu(S)}. But even if this is true it’s not very useful! What is the largest {\square^3}-free subset of {\mathbf{R}/\mathbf{Z}}? What is the largest {2x+y=z}-free subset of {\mathbf{R}/\mathbf{Z}}? These questions are simpler, certainly, but still not at all obvious!

Edit: The largest {\mathfrak{C}}-free sets {S \subset \mathbf{R}/\mathbf{Z}} I know in these two cases are the following. I have made very little effort to look for other solutions. Can you spot a better solution? Is there a systematic way of solving this problem?

{\mathfrak{C}} {S} {\mu(S)}
{\square^3} {(1/4, 3/4)} {1/2}
{2x+y=z} {(1/8, 3/8)} {1/4}

The Kronecker factor of an ultrafinite group

A few months ago on his blog Terry Tao explained how one could, by analogy with the theory of graph limits, replace the explicit use of arithmetic regularity with a soft device which he calls an additive limit. Roughly speaking, the additive limit, or Kronecker factor, of a sequence of finite abelian groups {G_i} is a compact quotient of the ultraproduct {\prod G_i} which controls the convolutions. The result is that many theorems from additive combinatorics, such as Roth’s theorem, which are usually proved using quantitative tools like Fourier analysis can instead be proved using soft tools more like the Lebesgue differentiation theorem.

The purpose of this post is to extend Tao’s construction to the nonabelian setting. Tao already stated in his post that this should be possible, so one could say that this is just an exercise in nonabelian Fourier analysis. On the other hand the proof in the nonabelian setting more or less forces a more categorical point of view, so certain points of this exercise are instructive.

1. Measurable Bohr compactification

Given any topological group {G} there is a compact group {\textup{Bohr}(G)}, called the Bohr compactification of {G}, such every continuous homomorphism from {G} to a compact group {K} factors uniquely through {\textup{Bohr}(G)}. One can think of {\textup{Bohr}(G)} as the ‘largest’ compact group in which {G} has dense image. We need a variant of this definition for groups {G} endowed only with a {\sigma}-algebra instead of a topology.

By a measurable group we mean a group {G} together with a {\sigma}-algebra {\Sigma} of subsets of {G}. Note that we do not make any measurability assumptions about the group operation or even the left- or right-shifts, though certainly it would be sensible to do so in other contexts. For us the only role of {\Sigma} is to distinguish among all homomorphisms the measurable homomorphisms. The analogue of Bohr compactification for measurable groups is given by the following theorem.

Theorem 1 (Existence of measurable Bohr compactification) For every measurable group {G} there is a compact group {\textup{Bohr}_m(G)} together with a {(\Sigma,\textup{Baire})}-measurable homomorphism {\pi:G\to \textup{Bohr}_m(G)} such that every {(\Sigma,\textup{Baire})}-measurable homomorphism from {G} to a compact group {K} factors uniquely as the composition of {\pi:G\to \textup{Bohr}_m(G)} and a continuous homomorphism {\textup{Bohr}_m(G)\to K}.

In category-theoretic terms {\textup{Bohr}_m} is a left adjoint to the functor {\textup{Baire}} from compact groups to measurable groups which replaces a group’s topology with its Baire {\sigma}-algebra. By the adjoint functor theorem it suffices to check that the functor {\textup{Baire}} is continuous, which boils down to the following lemma. Incidentally the analogue of this lemma fails for the Borel {\sigma}-algebra, which is why we must consider the Baire {\sigma}-algebra instead.

Lemma 2 For compact Hausdorff spaces {X_i} we have {\textup{Baire}(\prod X_i) = \prod \textup{Baire}(X_i)}. In words, the Baire {\sigma}-algebra of the product is the product of the Baire {\sigma}-algebras.

Proof: The containment {\prod \textup{Baire}(X_i) \subset \textup{Baire}(\prod X_i)} is immediate from the defintions. To prove the opposite containment it suffices to check that every continuous function {f:\prod X_i \to \mathbf{R}} is measurable with respect to {\prod\textup{Baire}(X_i)}. This is certainly true of functions {f} which depend on only finitely many coordinates, and thus for all continuous functions {f} by the Stone–Weierstrass theorem. \Box

Those who, like me, are not used to thinking in terms of the adjoint functor theorem will appreciate a more pedestrian proof of Theorem 1. To this end, let {\mathcal{K}} be the set of all pairs {(f,K)}, where {K} is a compact group and {f:G\to K} is a {(\Sigma,\textup{Baire})}-measurable homomorphism with {f(G)} dense in {K}. Then

\displaystyle  \pi:G\to\prod_{(f,K)\in\mathcal{K}} K

is the diagonal map then {\pi:G\to\overline{\pi(G)}} is the Bohr compactification of {G}. Indeed, the measurability of {\pi} follows from Lemma 2, and the universal property is essentially obvious: given a {(\Sigma,\textup{Baire})}-measurable homomorphism {f:G\to K} with {K} compact, the pair {(f,\overline{f(G)})} appears in {\mathcal{K}}, so we have continuous maps

\displaystyle  \overline{\pi(G)}\to\prod_{(g,L)\in\mathcal{K}} L \to \overline{f(G)}\to K

whose composition {h} satisfies {h\pi = f}; moreover {h} is unique because {\pi(G)} is dense in {\overline{\pi(G)}}.

Alternatively, by the Peter–Weyl theorem, {\textup{Bohr}_m(G)} can be defined as the inverse limit of all measurable finite-dimensional unitary representations of {G}.

It is natural to ask what relation the measurable Bohr compactification {\textup{Bohr}_m} bears to the usual Bohr compactification {\textup{Bohr}}. In particular, is {\textup{Bohr}} just the composition {\textup{Bohr}_m\circ\textup{Baire}}? Clearly {\textup{Bohr}(G) = \textup{Bohr}_m(\textup{Baire}(G))} if and only if every measurable homomorphism {f:G\to K} from {G} to a compact group {K} is continuous. This follows from Steinhaus’s theorem if {G} is locally compact, but it certainly fails in general, for instance for {G=\mathbf{Q}} with the topology inherited from {\mathbf{R}}.

2. The Bohr compactification of an ultrafinite group

Let {G_1,G_2,\dots} be a sequence of finite groups and let {p\in \beta\mathbf{N} \setminus\mathbf{N}} be a nonprincipal ultrafilter. We form the ultraproduct {G=\prod_{n\to p} G_n} and make it into a measurable group by giving it the Loeb {\sigma}-algebra {\mathcal{L}_G}, the {\sigma}-algebra generated by internal sets {\prod_{n\to p} A_n}, where {A_n\subset G_n}. We define the Loeb measure {\mu_G} on internal sets {\prod_{n\to p}A_n} by putting

\displaystyle \mu_G\left(\prod_{n\to p}A_n\right)=\text{st}\lim_{n\to p} |A_n|/|G_n|,

and we define {\mu_G} on {\mathcal{L}_G} by extension.

While the group operation {G\times G\to G} is not generally measurable with respect to the product {\sigma}-algebra {\mathcal{L}_G\times\mathcal{L}_G}, it is measurable with respect to the larger {\sigma}-algebra {\mathcal{L}_{G\times G}}. Moreover this latter {\sigma}-algebra is still ‘product-like’ in the sense that all {\mathcal{L}_{G\times G}}-measurable {f:G\times G\to\mathbf{R}_{\geq0}} obey Fubini’s theorem, so we have a sensibly defined convolution operation {L^1(G)\times L^1(G)\to L^1(G)} given

\displaystyle f*g(x) = \int f(y)g(y^{-1}x)\,d\mu_G(y).

Now consider the Bohr compactification {\textup{Bohr}_m(G)} of {G}. The first thing to notice is that {\pi_*\mu_G} is a {\pi(G)}-invariant Baire probability measure on {\textup{Bohr}_m(G)}. Since {\pi(G)} is dense in {\textup{Bohr}_m(G)} we conclude that {\pi_*\mu_G} is in fact {\textup{Bohr}_m(G)}-invariant, so by the uniqueness of Haar measure we must have

\displaystyle  \pi_*\mu_G=\mu_{\textup{Bohr}_m(G)}.

In the remainder of this section we relate the two convolution algebras {L^2(G)} and {L^2(\textup{Bohr}_m(G))}. Given {f\in L^2(\textup{Bohr}_m(G))} we can form the pullback {\pi^*f = f\circ\pi}. Since

\displaystyle  \|f\circ\pi\|_{L^2(G)}^2 = \int |f\circ\pi|^2\,d\mu_G = \int |f|^2 \,d\mu_{\textup{Bohr}_m(G)} = \|f\|_{L^2(\textup{Bohr}_m(G))}^2,

we see that {\pi^*f} is a well defined element of {L^2(G)}, and in fact {\pi^*} defines an isometric embedding

\displaystyle  \pi^*:L^2(\textup{Bohr}_m(G))\to L^2(G).

In the other direction we have the pushforward

\displaystyle  \pi_*:L^2(G)\to L^2(\textup{Bohr}_m(G)),

defined as the adjoint of {\pi_*}. The identities

\displaystyle  \begin{array}{rcl}  &\pi_*\pi^*f=f\quad\text{for }f\in L^2(\textup{Bohr}_m(G)),\\ &\pi^*(f*g)=\pi^*f*\pi^*f\quad\text{for }f,g\in L^2(\textup{Bohr}_m(G)),\\ &\pi_*(f*g)=\pi_*f*\pi_*g\quad\text{for }f,g\in L^2(G) \end{array}

are readily verified. For instance, the last of these is verified by the following computation, valid for {h\in L^2(\textup{Bohr}_m(G))} by {\mathcal{L}_{G\times G}}-Fubini:

\displaystyle  \begin{array}{rcl}  \langle h, \pi_*(f*g)\rangle = \langle \pi^* h, f*g\rangle = \int h(\pi(xy)) f(x)g(y)\,d\mu_G(x)\,d\mu_G(y)\\ =\int h(x'y') \pi_*f(x')\pi_*g(y')\,d\mu_{\textup{Bohr}_m(G)}(x')\,d\mu_{\textup{Bohr}_m(G)}(y') = \langle h,\pi_*f*\pi_*g\rangle. \end{array}

We can summarise the situation as an isometric Banach algebra isomorphism

\displaystyle  L^2(G) \cong L^2(\textup{Bohr}_m(G))\oplus \ker\pi_*.

The following theorem asserts that in fact {\textup{Bohr}_m(G)} alone determines convolutions in {G}, and thus {\textup{Bohr}_m(G)} will more generally control all ‘first-order configurations’ in {G}.

Theorem 3 We have {(\ker\pi_*)*L^2(G)=0}. Thus all convolutions in {L^2(G)} can be computed in {L^2(\textup{Bohr}_m(G))}, in the sense that

\displaystyle  f*g=\pi^*(\pi_*f*\pi_*g)

for all {f,g\in L^2(G)}.

The theorem follows from the following lemma.

Lemma 4 For all {f,g\in L^2(G)} and {d\in\mathbf{R}} we have

\displaystyle  \|f*g\|^2_{L^2(G)} \leq \left(\frac{1}{d}\|f\|_{L^2(G)}^2 + d\|\pi_*f\|_{L^2(\textup{Bohr}_m(G))}^2\right)\|g\|_{L^2(G)}^2.

In particular by optimising {d} we have

\displaystyle  \|f*g\|^2_{L^2(G)} \leq 2\|\pi_*f\|_{L^2(\textup{Bohr}_m(G))}\|f\|_{L^2(G)}\|g\|_{L^2(G)}^2.

Proof: We borrow nonabelian harmonic analysis notation from Tao. Certainly we may assume that {f} and {g} are internal, say {f=\textup{st}\lim_{n\to p}f_n} and {g=\textup{st}\lim_{n\to p}g_n}. Then by nonabelian Plancherel,

\displaystyle  \begin{array}{rcl}  \|f*g\|_{L^2(G)}^2 &=& \textup{st}\lim_{n\to p} \|f_n*g_n\|_{L^2(G_n)}^2 \\ &=& \textup{st}\lim_{n\to p} \sum_{\xi\in\widehat{G_n}} \dim V_\xi \|\hat{f_n}(\xi)\hat{g_n}(\xi)\|^2_{\text{HS}(V_\xi)} \\ &\leq& \textup{st}\lim_{n\to p} \sup_\xi\|\hat{f_n}(\xi)\|^2_{\text{HS}(V_\xi)} \|g_n\|_{L^2(G_n)}^2 \\ &=& \sup_\xi\textup{st}\lim_{n\to p}\|\hat{f_n}(\xi_n)\|^2_{\text{HS}(V_{\xi_n})} \|g\|_{L^2(G)}^2, \end{array}

where in the last line the supremum is taken over all {\xi=(\xi_n)}. Fixing some such {\xi}, by Plancherel again

\displaystyle  \|\hat{f_n}(\xi_n)\|^2_{\text{HS}(V_{\xi_n})} \leq \frac{1}{\dim V_{\xi_n}} \|f_n\|_{L^2(G_n)}^2,

so we may assume that {\dim V_{\xi_n}\leq d} for {p}-most {n}. But then the representations {\rho_{\xi_n}:G_n\to U(d)} induce a measurable representation {\rho_\xi:G\to U(d)}, which in turn by the universal property of {\textup{Bohr}_m(G)} factors through a continuous representation {\rho'_\xi:\textup{Bohr}_m(G)\to U(d)}. Thus

\displaystyle  \begin{array}{rcl}  \textup{st}\lim_{n\to p}\|\hat{f_n}(\xi_n)\|^2_{\text{HS}(V_{\xi_n})} &=& \textup{st}\lim_{n\to p} \int f_n(x)f_n(y) \text{tr}\rho_{\xi_n}(xy^{-1})\,d\mu_{G_n}(x)\,d\mu_{G_n}(y)\\ &=& \int f(x) f(y) \text{tr}\rho_\xi(xy^{-1})\,d\mu_G(x)\,d\mu_G(y)\\ &=& \int f(x) f(y) \text{tr}\rho'_\xi(\pi(xy^{-1}))\,d\mu_G(x)\,d\mu_G(y)\\ &=& \int \pi_*f(x') \pi_*f(y') \text{tr}\rho'_\xi(x'y'^{-1}) d\mu_{\textup{Bohr}_m(G)}(x)\,d\mu_{\textup{Bohr}_m(G)}(y)\\ &\leq& d \|\pi_*f\|_{L^1(\textup{Bohr}_m(G))}^2\\ &\leq& d\|\pi_*f\|_{L^2(\textup{Bohr}_m(G))}^2. \end{array}

This proves the lemma. \Box

3. Quasirandomness

From an additive combinatorics point of view, nonabelian groups obey a structure versus randomness principle: the asymptotic behaviour with respect to linear configurations can usually be described as some combination of abelian and random-like behaviour. Following Gowers, we call a sequence of finite groups {(G_n)} quasirandom if the least dimension of a nontrivial representation of {G_n} tends to infinity with {n}. For example for {n\geq 7} every nontrivial representation of the alternating group {\textup{Alt}(n)} has dimension at least {n-1}, so the sequence {(\textup{Alt}(n))} is quasirandom.

Theorem 5 The ultrafinite group {G_p=\prod_{n\to p} G_n} has trivial Bohr compactification if and only if the least dimension of a nontrivial representation of {G_n} tends to infinity as {n\to p}. In particular, {(G_n)} is quasirandom if and only if {\textup{Bohr}_m(G_p)} is trivial for every {p\in\beta\mathbf{N}\setminus\mathbf{N}}.

First we need a simple lemma.

Lemma 6 Let {G} and {H} be groups with {G} finite and {f:G\to H} a map which satisfies {f(xy)=f(x)f(y)} for {1-o(1)} of the pairs {(x,y)\in G^2}. Then there is a homomorphism {h:G\to H} such that {f(x)=h(x)} for {1-o(1)} of the points {x\in G}.

Proof: For every {x\in G} and for {1-o(1)} of the pairs {(y,z)\in G^2} we have

\displaystyle  f(xyz)f(yz)^{-1} = f(xy)f(z)(f(y)f(z))^{-1} = f(xy)f(y)^{-1},

so for each {x\in G} there is a unique {h(x)\in H} such that

\displaystyle  h(x) = f(xy)f(y)^{-1}

for {1-o(1)} of the points {y\in G}. Clearly {h(x)=f(x)} for {1-o(1)} of the points {x\in G}, and for {x_1,x_2\in G} we have

\displaystyle  h(x_1)h(x_2)^{-1} = f(x_1y)f(y)^{-1}f(y)f(x_2 y)^{-1} = f(x_1y)f(x_2y)^{-1}=h(x_1x_2^{-1})

for {1-o(1)} of the points {y\in G}, in particular for at least one {y\in G}, so {h} is a homomorphism. \Box

Proof: (of Theorem~5) Suppose we have a sequence of nontrivial homomorphisms {f_n:G_n\to U(d)} for all {n} on some neighbourhood of {p}. Then {(f_n)} induces a measurable homomorphism {f:G_p\to U(d)}, and since {U(d)} has no small subgroups the induced homomorphism {f:G_p\to U(d)} will also be nontrivial, so {\textup{Bohr}_m(G_p)} must be nontrivial.

Conversely suppose the least dimension of a nontrivial representation of {(G_n)} tends to infinity as {n} tends to {p}, and let {f:G_p\to U(d)} be a measurable homomorphism. By a countable saturation argument there is an internal function {g:G_p\to U(d)}, say {g=\textup{st}\lim_{n\to p}g_n}, such that {f=g} almost everywhere. Then {g_n} satisfies {g_n(xy)=g_n(x)g_n(y)} for {1-o(1)} of the pairs {(x,y)\in G_n}, so by the lemma there is a homomorphism {h_n:G_n\to U(d)} such that {g_n(x) = h_n(x)} for {1-o(1)} of the points {x\in G_n}. But by assumption any such homomorphism {h_n} must be trivial for {n} near enough to {p}, so we must have {g=1} almost everywhere, so {f=1} almost everywhere. Moreover since {f(x) = f(xy)f(y)^{-1}} we must in fact have {f=1} identically. Since the Peter–Weyl theorem implies that {\textup{Bohr}_m(G_p)} is an inverse limit of matrix groups this implies that {\textup{Bohr}_m(G_p)} is trivial. \Box

Equations are generally easy to solve in quasirandom groups. We illustrate this point with the following theorem.

Theorem 7 Let {(G_n)} be quasirandom and let {\epsilon>0}. Then there exists {n_0} such that if {n\geq n_0} and {A_n\subset G_n} has density {|A_n|/|G_n|\geq\epsilon>0} then we can find {x,y,z\in A_n} with {xy=z}.

Proof: Let {p\in\beta\mathbf{N}\setminus\mathbf{N}}, let {G=\prod_{n\to p}G_n}, and let {f} be the internal function {\textup{st}\lim_{n\to p} 1_{A_n}}. Then {\int f\,d\mu_G \geq \epsilon}, so if {\pi:G\to\textup{Bohr}_m(G)} is the Bohr compactification then {\int\pi_* f\,d\mu_{\textup{Bohr}_m(G)}\geq\epsilon}. But by the previous theorem {\textup{Bohr}_m(G)} is trivial, so {\pi_* f} is a constant {\geq\epsilon}, so by Theorem 3 we have

\displaystyle \langle f*f,f\rangle_{L^2(G)} = \langle \pi^*(\pi_*f*\pi_*f),f\rangle_{L^2(G)} = \langle \pi_*f*\pi_*f,\pi_*f\rangle_{L^2(\textup{Bohr}_m(G))} \geq \epsilon^3.

In other words the number of pairs {(x,y)\in G_n^2} such that {x,y,xy\in A_n} is at least {(\epsilon^3-o(1))|G_n|} as {n\to p}, but since {p} was arbitrary this must hold as {n\to\infty}. \Box

Here is another nice criterion for quasirandomness, which can be found in Gowers’s original paper: {(G_n)} is not quasirandom if and only if the groups {G_n} have nontrivial abelian quotients or nontrivial small quotients. In our setup we can write this the following way.

Theorem 8 Let {G} be an ultrafinite group. Then one of the following three alternatives hold:

  1. {\textup{Bohr}_m(G)} is trivial.
  2. {\textup{Bohr}_m(G)} has a nontrivial abelian quotient.
  3. {\textup{Bohr}_m(G)} has a nontrivial finite quotient.

Proof: Suppose {G=\prod_{n\to p} G_n}. By the previous theorem if {\textup{Bohr}_m(G)} is nontrivial then the groups {G_n} have bounded-dimensional nontrivial representations {\pi_n:G_n\to U(d)} as {n\to p}. By Jordan’s theorem, {\pi_n(G_n)} has a normal abelian subgroup {A_n} of bounded index. If {A_n=\pi_n(G_n)} as {n\to p} then 2 holds, while if {A_n<\pi_n(G_n)} as {n\to p} then 3 holds. \Box

Using this theorem one can prove a sort of converse to Theorem 7. If {(G_n)} is not quasirandom then there are arbitrarily large {n} and product-free subsets {A_n\subset G_n} of density bounded away from {0}. We leave the details to the reader.

4. Roth’s theorem

As an application of group limits we can prove the following version of Roth’s theorem.

Theorem 9 Let {G} be a finite group on which the squaring map {s:y\mapsto y^2} is {O(1)}-to-{1}. Let {A\subset G} be a subset of density {|A|/|G|\geq\epsilon>0}. Then there are {\gtrsim_\epsilon |G|^2} solutions to {y^2=xz} in {A}. Equivalently, there are {\gtrsim_\epsilon|G|^2} pairs {(a,b)\in G^2} such that {a,ab,bab\in A}.

Proof: If the theorem fails then we have finite groups {G_n}, some {\epsilon>0}, and subsets {A_n\subset G_n} of density {|A_n|/|G_n|\geq\epsilon} for which there are fewer than {|G_n|^2/n} pairs {(a,b)\in G_n^2} such that {a,ab,bab\in A_n}. Let {G=\prod_{n\to p} G_n} and {f = \textup{st}\lim_{n\to p} 1_{A_n}}. Then {\int_G f \,d\mu_G \geq \epsilon > 0} but

\displaystyle  \int_{G^2} f(a)f(ab)f(bab)\,d\mu_G^2 = 0. \ \ \ \ \ (1)

But note

\displaystyle  \int_{G^2} f(a)f(ab)f(bab)\,d\mu_G^2 = \int_{G^2} f(x) f(y) f(x^{-1}y^2)\,d\mu_G^2= \langle f,(f*f)\circ s\rangle_{L^2(G)},

and by Theorem 3 this is the same as

\displaystyle  \begin{array}{rcl}  \langle f,\pi^*(\pi_*f*\pi_*f)\circ s\rangle_{L^2(G)} &=& \langle f, \pi^*((\pi_*f*\pi_*f)\circ s)\rangle_{L^2(G)} \\ &=& \langle \pi_*f,(\pi_*f*\pi_*f)\circ s\rangle_{L^2(B)} \\ &=& \int_{B^2} \pi_*f(a)\,\pi_*f(ab)\,\pi_*f(bab)\,d\mu_B^2, \end{array}

where {B=\textup{Bohr}_m(G)}. By Fubini’s theorem this is the same as {\int g\, d\mu_B}, where

\displaystyle  g(b) = \int_B \pi_*f(a) \pi_*f(ab) \pi_*f(bab) \, d\mu_B(a).

But a standard argument (approximating {\pi_*f} by a continuous function) shows that {g} is a continuous nonnegative function, and

\displaystyle  g(1) = \int_B (\pi_*f)^3 \, d\mu_B \geq \left( \int_B \pi_* f \,d\mu_B \right)^3 > 0.

Hence {g > 0} on a neighbourhood of {1}, so {\int g\, d\mu_B > 0}, in contradiction to (1). \Box

We chose to count configurations of the form {(a,ab,bab)} precisely because they are alternatively described by the rather simple equation {y^2=xz}. If instead we chose to count the more “obvious” nonabelian analgues of three-term arithmetic progressions, namely configurations of the form {(a,ab,ab^2)}, then we would be counting solutions to the more complicated equation {z = yx^{-1}y}. The problem is that the count of these configurations is not obviously controlled by convolutions, so we can’t easily transport the problem to the Bohr compactification. In fact the situation is delicate and not completely understood: see for example this paper of Tao for the case of {\text{SL}_d(F)}. [Later edit: This paper of Sarah Peluse solves this problem for simple groups.]