We obtain subquadratic algorithms for 3SUM on integers and rationals in several models. On a standard word RAM with w-bit words, we obtain a running time of O(n^2 / max{w / lg^2 w, lg^2 n / (lg lg n)^2}). In the circuit RAM with one nonstandard AC^0 operation, we obtain O(n^2 / (w^2 / lg^2 w)). In external memory, we achieve O(n^2 / (MB)), even under the standard assumption of data indivisibility. Cache-obliviously, we obtain a running time of O(n^2 / (MB / lg^2 M)). In all cases, our speedup is almost quadratic in the “parallelism” the model can afford, which may be the best possible. Our algorithms are Las Vegas randomized; time bounds hold in expectation, and in most cases, with high probability.
Ilya Baran, Erik D. Demaine, Mihai Patrascu