Computational Frameworks For The Fast Fourier Transform (frontiers In Applied Mathematics)

Preparing link to download Please wait... Download

E-Book Overview

The most comprehensive treatment of FFTs to date. Van Loan captures the interplay between mathematics and the design of effective numerical algorithms--a critical connection as more advanced machines become available. A stylized Matlab notation, which is familiar to those engaged in high-performance computing, is used. The Fast Fourier Transform (FFT) family of algorithms has revolutionized many areas of scientific computation. The FFT is one of the most widely used algorithms in science and engineering, with applications in almost every discipline. This volume is essential for professionals interested in linear algebra as well as those working with numerical methods. The FFT is also a great vehicle for teaching key aspects of scientific computing.

E-Book Content

Kronecker Product Properties Kronl If A, B, C, and D are matrices, then (A® B)(C ® D) = (AC)®(BD), assuming that the ordinary multiplications AC and BD are defined. Kron2 If A and B are nonsingular matrices, then A® B is nonsingvlar and Kron3 If A and B are matrices, then (A ® B) Kron4 If P and Q are permutations, then so is P® Q. Kron5 Suppose n = re. If A 2, then Fn is not Hermitian. Indeed, F^ — F% = Fn . However, if n = 1 or 2, then Fn is real and so FI and F? are Hermitian. Two vectors x, y £ n we have If q — p, then npq — n. Otherwise, w 4 ~ p ^ 1, and it follows from the equation that /ip, = 0 whenever We say that Q C n x n is unitary if Q-1 = QH. From Theorem 1.1.2 we see tha Fn/\/n is unitary. Thus, the DFT is a scaled unitary transformation on d, then the time required is approximately given by 34 CHAPTER 1. THE RADix-2 FRAMEWORKS Let us apply this model to (1.4.8). However, before we do this we clarify the exact nature of the underlying vector operations that are carried out by the inner loop by rewriting the algorithm as follows: Notice that there are ten length-!* vector operations to perform each pass through the loop. It follows that the whole process requires time for completion. Notice that the start-up factor a has a greater bearing upon performance when L+ is small. 1.4.11 The Long Weight Vector When we consider the overall Cooley-Tukey procedure, we see that the weights associated with Aq are a subset of the weights associated with Aq+\. In particular, since n /2, the weight vector for At. Unfortunately, this results in a power-of-two stride access into u;n/2- This is because the scaling operation which arises in either Algorithm 1.4.8 or 1.4.9, transforms to where j = n/L. A way around this difficulty suggested by Bailey (1987) is to precompute the long weight vector «4° defined by 1.4. WEIGHT AND BUTTERFLY COMPUTATIONS 35 This (n — 1)-vector is just a stacking of the weight vectors wu. In particular, if L — 2 and L, = 2*-1, then WL. = w(rl°n9\L. - l:L - 1) With Wn°ng available, the above r computation becomes. and unit stride access is achieved. It is clear that Wn°n9) can be constructed in O(n) flops using any of Algorithms 1.4.1-1.4.6. Problems Pi.4.1 Using the identities prove (1.4.2) and (1.4.5). PI.4.2 Show that if j = (6 t _i •••6160)2, then WL.(J) is the consequence of bt-i + • • • + 61 + 60 complex multiplications in Algorithm 1.4.3. Pi.4.3 Develop a complete algorithm for u / n " 9 ' that uses Algorithm 1.4.3 to compute the individual weight vectors. How many flops are required? PI.4.4 Using the concepts of §1.4.10, analyze the performance of Algorithm 1.4.3. Notes and References for Section Interleaved memory systems, vectorization models, and various pipelining issues are discussed in J.J. Dongarra, I.S. Duff, D.C. Sorensen, and H.A. van der Vorst (1991). Solving Linear Systems on Vector and Shared Memory Computers