Architecture of High Performance Computers
Volume I
Uniprocessors and vector processors

R. N. Ibbett and N. P. Topham
Department of Computer Science
University of Edinburgh
Edinburgh
Scotland EH9 3JZ

Springer Science+Business Media, LLC

© Roland N. Ibbett and Nigel P. Topham 1989
Originally published by Springer-Verlag New York Inc. in 1989
All rights reserved. No reproduction, copy or transmission of this publication may be made without written permission.

ISBN 978-1-4899-6714-5
ISBN 978-1-4899-6712-1 (eBook)
DOI 10.1007/978-1-4899-6712-1

Contents

Preface

1 Introduction
  1.1 Historical developments
  1.2 Techniques for improving performance
  1.3 An architectural design example

2 Instructions and Addresses
  2.1 Three-address systems - the CDC 6600 and 7600
  2.2 Two-address systems - the IBM System/360 and /370
  2.3 One-address systems
  2.4 Zero-address systems
  2.5 The MU5 instruction set
  2.6 Comparing instruction formats
  2.7 Reduced instruction sets

3 Storage Hierarchies
  3.1 Store interleaving
  3.2 The Atlas paging system
    3.2.1 Real and virtual addresses
    3.2.2 Page address registers
  3.3 IBM cache stores
    3.3.1 The System/360 Model 85 cache
    3.3.2 The System/370 Model 165 cache
    3.3.3 The 3090 cache
  3.4 The MU5 Name Store
    3.4.1 Normal operation
    3.4.2 Non-equivalence actions
    3.4.3 Actions for ACC orders
    3.4.4 Name Store performance
  3.5 Data transfers in the MU5 storage hierarchy
    3.5.1 The Exchange
    3.5.2 The Exchange priority system

4 Pipelines
  4.1 Principles of pipeline concurrency
  4.2 The MU5 Primary Operand Unit pipeline
    4.2.1 Synchronous and asynchronous timing
    4.2.2 Variations among instruction types
    4.2.3 Control transfers
    4.2.4 Performance measurements
  4.3 Arithmetic pipelines - the TI ASC
  4.4 The IBM System/360 Model 91 Common Data Bus

5 Instruction Buffers
  5.1 The IBM System/360 Model 195 instruction processor
    5.1.1 Sequential instruction fetching
    5.1.2 Conditional mode
    5.1.3 Loop mode
  5.2 Instruction buffering in CDC computers
    5.2.1 The CDC 6600 instruction stack
    5.2.2 The CDC 7600 instruction stack
  5.3 The MU5 Instruction Buffer Unit
    5.3.1 Sequential instruction fetching
    5.3.2 Instruction effects in the Jump Trace
  5.4 The CRAY-1 instruction buffers
  5.5 Position of the Control Point

6 Parallel Functional Units
  6.1 The CDC 6600 central processor
    6.1.1 Functional units in the CDC 6600
    6.1.2 Instruction dependencies
    6.1.3 Data highways
  6.2 The CDC 7600 central processor
    6.2.1 Functional units in the CDC 7600
    6.2.2 Instruction issue
  6.3 Performance

7 The CRAY Series
  7.1 The CRAY-1
    7.1.1 Functional units in the CRAY-1
    7.1.2 Instruction issue
    7.1.3 Chaining
    7.1.4 Performance
  7.2 The CRAY X-MP
    7.2.1 Central memory organisation
    7.2.2 Chaining
    7.2.3 Interprocessor communication
    7.2.4 The CRAY Y-MP
  7.3 The CRAY-2
    7.3.1 Background Processor architecture
    7.3.2 Interprocessor communication
    7.3.3 Technology
  7.4 Beyond the CRAY-2

8 Vector Facilities in MU5
  8.1 Introduct