• an O(k log n)-time algorithm for the exactly k-sparse case, and
• an O(k log n log(n/k))-time algorithm for the general case.
The key property of both algorithms is their ability to achieve o(n log n) time, and thus improve over the FFT, for any k = o(n). These algorithms are the first known algorithms that satisfy this property. Moreover, if one assumes that the FFT is optimal and hence the DFT cannot be computed in less than O(n log n) time, the algorithm for the exactly k-sparse case is optimal as long as k = n^{Ω(1)}. Under the same assumption, the result for the general case is at most one log log n factor away from the optimal runtime for the case of “large” sparsity k = n^{Ω(1)}.
For the general case, given a signal x, the algorithm computes a k-sparse approximation x̂′ of x̂ satisfying
‖x̂ − x̂′‖₂ ≤ C · min_{k-sparse y} ‖x̂ − y‖₂,
where C is some approximation factor and the minimization is over k-sparse signals.
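To make this guarantee concrete, here is a minimal sketch (not part of the algorithm; the helper names are illustrative): the best k-sparse approximation of x̂ simply keeps its k largest-magnitude coefficients, so the right-hand side of the guarantee is the ℓ2 norm of the remaining tail.

```python
import numpy as np

def best_k_sparse_error(x_hat, k):
    """l2 error of the best k-sparse approximation of x_hat: keeping the k
    largest-magnitude coefficients is optimal, so the error is the tail energy."""
    tail = np.sort(np.abs(x_hat))[:-k] if k > 0 else np.abs(x_hat)
    return np.linalg.norm(tail)

def satisfies_guarantee(x_hat, y_hat, k, C):
    """Check the l2/l2 guarantee: ||x_hat - y_hat||_2 <= C * min_{k-sparse z} ||x_hat - z||_2."""
    return np.linalg.norm(x_hat - y_hat) <= C * best_k_sparse_error(x_hat, k)
```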
Furthermore, our algorithm for the exactly sparse case is quite simple and has low big-Oh constants. In particular, our implementation of a variant of this algorithm, described in Chapter 6, is faster than FFTW, a highly efficient implementation of the FFT, for n = 2²² and k ≤ 2¹⁷ [Hassanieh et al. 2012]. In contrast, for the same signal size, the algorithms in Chapter 3 were faster than FFTW only for k ≤ 2000.
We complement our algorithmic results by showing that any algorithm that works for the general case must use at least Ω(k log(n/k)/ log log n) samples from x. The proof of this lower bound can be found in Appendix C. The lower bound uses techniques from Price and Woodruff [2011], which shows a lower bound of Ω(k log(n/k)) for the number of arbitrary linear measurements needed to compute the k-sparse approximation of an n-dimensional vector x̂. In comparison to Price and Woodruff [2011], our bound is slightly worse but it holds even for adaptive sampling, where the algorithm selects the samples based on the values of the previously sampled coordinates. Note that our algorithms are non-adaptive, and thus limited by the more stringent lower bound of Price and Woodruff [2011].
4.1.2 Techniques
Recall from Chapter 3 that we can use the flat window filters coupled with a random permutation of the spectrum to bin/bucketize the Fourier coefficients into a small number of buckets. We can then use that to estimate the positions and values of the large frequency coefficients that were isolated in their own bucket. Here, we use the same filters introduced in Chapter 3. In this case, the filter G has the property that the value of Ĝ is “large” over a constant fraction of the pass region, referred to as the “super-pass” region. We say that a coefficient is “isolated” if it falls into a filter’s super-pass region and no other coefficient falls into the filter’s pass region. Since the super-pass region of our filters is a constant fraction of the pass region, the probability of isolating a coefficient is constant.
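The sketch below illustrates the bucketization step under simplifying assumptions; the function name, the exact permutation convention, and the folding details are illustrative stand-ins for the Chapter 3 construction.

```python
import numpy as np

def bucketize(x, g, B, sigma, tau):
    """Hash the spectrum of x into B bins: permute, window, fold, then take a
    small B-point FFT.  (Sketch only: the exact permutation and filter
    conventions are those of Chapter 3; the names here are illustrative.)"""
    n, w = len(x), len(g)                  # w = filter support, O(B log n)
    # Step 1+2: pseudo-random spectrum permutation (index map i -> sigma*i + tau
    # in time) combined with multiplication by the flat-window filter g.
    y = g * x[(sigma * np.arange(w) + tau) % n]
    # Step 3: alias (fold) the w windowed samples onto B samples ...
    folded = np.zeros(B, dtype=complex)
    for r in range(0, w, B):
        chunk = y[r:r + B]
        folded[:len(chunk)] += chunk
    # ... so the B-point FFT returns the n-point DFT of the windowed signal at
    # the B bin centers j*n/B, i.e. one (filter-weighted) sample per bucket.
    return np.fft.fft(folded)              # total time O(B log n + B log B)
```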
However, the main difference in this chapter, which allows us to achieve the stated running times, is a fast method for locating and estimating isolated coefficients. Further, our algorithm is iterative, so we also provide a fast method for updating the signal so that identified coefficients are not considered in future iterations. Below, we describe these methods in more detail.
New Techniques: Location and Estimation
Our location and estimation methods depend on whether we handle the exactly sparse case or the general case. In the exactly sparse case, we show how to estimate the position of an isolated Fourier coefficient using only two samples of the filtered signal. Specifically, we show that the phase difference between the two samples is linear in the index of the coefficient, and hence we can recover the index by estimating the phases. This approach is inspired by the frequency offset estimation in orthogonal frequency division multiplexing (OFDM), which is the modulation method used in modern wireless technologies (see Heiskala and Terry [2001, chapter 2]).
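The following minimal sketch shows the two-sample idea in the noiseless, exactly 1-sparse setting; in the actual algorithm the two samples come from the filtered, bucketized signal rather than from x directly, and the function name is illustrative.

```python
import numpy as np

def locate_isolated(sample0, sample1, n):
    """Recover the index of an isolated coefficient from two samples taken one
    time step apart.  If only one coefficient contributes, then
    sample1 / sample0 = exp(2*pi*1j*f/n), so the phase of the ratio is linear
    in f and f can be read off directly (noiseless sketch)."""
    phase = np.angle(sample1 / sample0)    # in (-pi, pi]
    return int(round(phase * n / (2 * np.pi))) % n

# Illustration on an exactly 1-sparse spectrum (no filtering needed here).
n, f, v = 1024, 137, 3.0 - 2.0j
x_hat = np.zeros(n, dtype=complex)
x_hat[f] = v
x = np.fft.ifft(x_hat)                     # x[t] = (v/n) * exp(2*pi*1j*f*t/n)
assert locate_isolated(x[0], x[1], n) == f
```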
In order to design an algorithm for the general case, we employ a different approach. Specifically, we can use two samples to estimate (with constant probability) individual bits of the index of an isolated coefficient. Similar approaches have been employed in prior work. However, in those papers, the index was recovered bit by bit, and one needed Ω(log log n) samples per bit to recover all bits correctly with constant probability. In contrast, we recover the index one block of bits at a time, where each block consists of O(log log n) bits. This approach is inspired by the fast sparse recovery algorithm of Gilbert et al. [2010]. Applying this idea in our context, however, requires new techniques. The reason is that, unlike in Gilbert et al. [2010], we do not have the freedom of using arbitrary “linear measurements” of the vector x̂, and we can only use the measurements induced by the Fourier transform. As a result, the extension from “bit recovery” to “block recovery” is the most technically involved part of the algorithm. Section 4.3.1 contains further intuition on this part.
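For intuition, the sketch below shows the underlying “two samples per bit” primitive in the noiseless case. This is the simpler bit-by-bit variant attributed above to prior work, not the block recovery of this chapter; it assumes an exactly 1-sparse spectrum, n a power of two, and an illustrative function name.

```python
import numpy as np

def recover_index_bit_by_bit(x, n):
    """Noiseless sketch of the bit-by-bit location primitive.  A sample at time
    shift beta is multiplied by exp(2*pi*1j*f*beta/n).  With beta = n/2^(j+1),
    the already-recovered low-order bits of f can be divided out, and the
    remaining sign (+1 or -1) is exactly bit j of f."""
    logn = int(np.log2(n))
    f_low = 0
    for j in range(logn):
        beta = n >> (j + 1)                # time shift n / 2^(j+1)
        ratio = x[beta] / x[0]             # = exp(2*pi*1j*f*beta/n)
        ratio *= np.exp(-2j * np.pi * f_low / (1 << (j + 1)))  # remove known low bits
        bit = 0 if ratio.real > 0 else 1   # leftover factor is (-1)^(bit j of f)
        f_low |= bit << j
    return f_low

n, f = 1024, 613
x_hat = np.zeros(n, dtype=complex)
x_hat[f] = 1.0
x = np.fft.ifft(x_hat)
assert recover_index_bit_by_bit(x, n) == f
```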
New Techniques: Updating the Signal
The aforementioned techniques recover the position and the value of any isolated coefficient. However, during each filtering step, each coefficient becomes isolated only with constant probability. Therefore, the filtering process needs to be repeated to ensure that each coefficient is correctly identified. In Chapter 3, the algorithm simply performs the filtering O(log n) times and uses the median estimator to identify each coefficient with high probability. This, however, would lead to a running time of O(k log² n) in the k-sparse case, since each filtering step takes k log n time.
One could reduce the filtering time by subtracting the identified coefficients from the signal. In this way, the number of non-zero coefficients would be reduced by a constant factor after each iteration, so the cost of the first iteration would dominate the total running time. Unfortunately, subtracting the recovered coefficients from the signal is a computationally costly operation, corresponding to a so-called non-uniform DFT (see Gilbert et al. [2008] for details). Its cost would override any potential savings.
In this chapter, we introduce a different approach: instead of subtracting the identified coefficients from the signal, we subtract them directly from the bins obtained by filtering the signal. The latter operation can be done in time linear in the number of subtracted coefficients, since each of them “falls” into only one bin. Hence, the computational costs of each iteration can be decomposed into two terms, corresponding to filtering the original signal and subtracting the coefficients. For the exactly sparse case these terms are as follows.
• The cost of filtering the original signal is O(B log n), where B is the number of bins. B is set to O(k′), where k′ is the number of yet-unidentified coefficients. Thus, initially B is equal to O(k), but its value decreases by a constant factor after each iteration.
• The cost of subtracting the identified coefficients from the bins is O(k).
Since the number of iterations is O(log k), and the cost of filtering is dominated by the first iteration, the total running time is O(k log n) for the exactly k-sparse case.
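To make the bin-subtraction step concrete, the following sketch removes already-identified coefficients directly from the bins. The hashing and offset conventions below, and the assumption that G_hat is the filter's length-n frequency response, are illustrative stand-ins for the actual Chapter 3 conventions.

```python
def subtract_from_bins(bins, recovered, G_hat, B, n, sigma):
    """Remove already-identified coefficients directly from the bins.  Each
    recovered coefficient (f, v) falls into exactly one bin, so the update is
    O(1) per coefficient and O(k) in total.  (Sketch: the hashing and offset
    conventions below are illustrative stand-ins for those of Chapter 3.)"""
    for f, v in recovered.items():         # recovered: {frequency: value}
        pf = (sigma * f) % n               # frequency after the permutation
        b = int(round(pf * B / n)) % B     # the single bin it falls into
        offset = pf - (b * n) // B         # distance from that bin's center
        bins[b] -= v * G_hat[offset % n]   # its filter-weighted contribution
    return bins
```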