Revision/Update Information: This is Version 4 of the Alpha
Architecture Handbook.
October 1998
The information in this publication is subject to change without notice.
COMPAQ COMPUTER CORPORATION SHALL NOT BE LIABLE FOR TECHNICAL OR EDITORIAL
ERRORS OR OMISSIONS CONTAINED HEREIN, NOR FOR INCIDENTAL OR CONSEQUENTIAL DAMAGES
RESULTING FROM THE FURNISHING, PERFORMANCE, OR USE OF THIS MATERIAL. THIS
INFORMATION IS PROVIDED "AS IS" AND COMPAQ COMPUTER CORPORATION DISCLAIMS ANY
WARRANTIES, EXPRESS, IMPLIED OR STATUTORY AND EXPRESSLY DISCLAIMS THE IMPLIED WARRANTIES
OF MERCHANTABILITY, FITNESS FOR PARTICULAR PURPOSE, GOOD TITLE AND AGAINST
INFRINGEMENT.
This publication contains information protected by copyright. No part of this publication may be photocopied or
reproduced in any form without prior written consent from Compaq Computer Corporation.
© Compaq Computer Corporation 1998.
All rights reserved. Printed in the U.S.A.
The following are trademarks of Compaq Computer Corporation: Alpha AXP, AXP, DEC, DIGITAL, DIGITAL
UNIX, OpenVMS, PDP–11, VAX, VAX DOCUMENT, and the DIGITAL logo.
Cray is a registered trademark of Cray Research, Inc. IBM is a registered trademark of International Business
Machines Corporation. UNIX is a registered trademark in the United States and other countries licensed exclusively
through X/Open Company Ltd. Windows NT is a trademark of Microsoft Corporation.
All other trademarks and registered trademarks are the property of their respective owners.
Table of Contents
1 Introduction
1.1 The Alpha Approach to RISC Architecture . . . . . 1–1
1.2 Data Format Overview . . . . . 1–3
1.3 Instruction Format Overview . . . . . 1–4
1.4 Instruction Overview . . . . . 1–4
1.5 Instruction Set Characteristics . . . . . 1–6
1.6 Terminology and Conventions . . . . . 1–6
1.6.1 Numbering . . . . . 1–7
1.6.2 Security Holes . . . . . 1–7
1.6.3 UNPREDICTABLE and UNDEFINED . . . . . 1–7
1.6.4 Ranges and Extents . . . . . 1–8
1.6.5 ALIGNED and UNALIGNED . . . . . 1–8
1.6.6 Must Be Zero (MBZ) . . . . . 1–9
1.6.7 Read As Zero (RAZ) . . . . . 1–9
1.6.8 Should Be Zero (SBZ) . . . . . 1–9
1.6.9 Ignore (IGN) . . . . . 1–9
1.6.10 Implementation Dependent (IMP) . . . . . 1–9
1.6.11 Illustration Conventions . . . . . 1–9
1.6.12 Macro Code Example Conventions . . . . . 1–9
2 Basic Architecture
2.1 Addressing . . . . . 2–1
2.2 Data Types . . . . . 2–1
2.2.1 Byte . . . . . 2–1
2.2.2 Word . . . . . 2–1
2.2.3 Longword . . . . . 2–2
2.2.4 Quadword . . . . . 2–2
2.2.5 VAX Floating-Point Formats . . . . . 2–3
2.2.5.1 F_floating . . . . . 2–3
2.2.5.2 G_floating . . . . . 2–4
2.2.5.3 D_floating . . . . . 2–5
2.2.6 IEEE Floating-Point Formats . . . . . 2–6
2.2.6.1 S_Floating . . . . . 2–7
2.2.6.2 T_floating . . . . . 2–8
2.2.6.3 X_Floating . . . . . 2–9
2.2.7 Longword Integer Format in Floating-Point Unit . . . . . 2–11
2.2.8 Quadword Integer Format in Floating-Point Unit . . . . . 2–12
2.2.9 Data Types with No Hardware Support . . . . . 2–12
2.3 Big-Endian Addressing Support . . . . . 2–13
3 Instruction Formats
3.1 Alpha Registers . . . . . 3–1
3.1.1 Program Counter . . . . . 3–1
3.1.2 Integer Registers . . . . . 3–1
3.1.3 Floating-Point Registers . . . . . 3–2
3.1.4 Lock Registers . . . . . 3–2
3.1.5 Processor Cycle Counter (PCC) Register . . . . . 3–3
3.1.6 Optional Registers . . . . . 3–3
3.1.6.1 Memory Prefetch Registers . . . . . 3–3
3.1.6.2 VAX Compatibility Register . . . . . 3–3
3.2 Notation . . . . . 3–3
3.2.1 Operand Notation . . . . . 3–4
3.2.2 Instruction Operand Notation . . . . . 3–5
3.2.2.1 Operand Name Notation . . . . . 3–5
3.2.2.2 Operand Access Type Notation . . . . . 3–5
3.2.2.3 Operand Data Type Notation . . . . . 3–6
3.2.3 Operators . . . . . 3–6
3.2.4 Notation Conventions . . . . . 3–10
3.3 Instruction Formats . . . . . 3–10
3.3.1 Memory Instruction Format . . . . . 3–11
3.3.1.1 Memory Format Instructions with a Function Code . . . . . 3–11
3.3.1.2 Memory Format Jump Instructions . . . . . 3–12
3.3.2 Branch Instruction Format . . . . . 3–12
3.3.3 Operate Instruction Format . . . . . 3–12
3.3.4 Floating-Point Operate Instruction Format . . . . . 3–13
3.3.4.1 Floating-Point Convert Instructions . . . . . 3–14
3.3.4.2 Floating-Point/Integer Register Moves . . . . . 3–14
3.3.5 PALcode Instruction Format . . . . . 3–14
4 Instruction Descriptions
4.1 Instruction Set Overview . . . . . 4–1
4.1.1 Subsetting Rules . . . . . 4–2
4.1.2 Floating-Point Subsets . . . . . 4–2
4.1.3 Software Emulation Rules . . . . . 4–3
4.1.4 Opcode Qualifiers . . . . . 4–3
4.2 Memory Integer Load/Store Instructions . . . . . 4–4
4.2.1 Load Address . . . . . 4–5
4.2.2 Load Memory Data into Integer Register . . . . . 4–6
4.2.3 Load Unaligned Memory Data into Integer Register . . . . . 4–8
4.2.4 Load Memory Data into Integer Register Locked . . . . . 4–9
4.2.5 Store Integer Register Data into Memory Conditional . . . . . 4–12
4.2.6 Store Integer Register Data into Memory . . . . . 4–15
4.2.7 Store Unaligned Integer Register Data into Memory . . . . . 4–17
4.3 Control Instructions . . . . . 4–18
4.3.1 Conditional Branch . . . . . 4–20
4.3.2 Unconditional Branch . . . . . 4–21
4.3.3 Jumps . . . . . 4–22
4.4 Integer Arithmetic Instructions . . . . . 4–24
4.4.1 Longword Add . . . . . 4–25
4.4.2 Scaled Longword Add . . . . . 4–26
4.4.3 Quadword Add . . . . . 4–27
4.4.4 Scaled Quadword Add . . . . . 4–28
4.4.5 Integer Signed Compare . . . . . 4–29
4.4.6 Integer Unsigned Compare . . . . . 4–30
4.4.7 Count Leading Zero . . . . . 4–31
4.4.8 Count Population . . . . . 4–32
4.4.9 Count Trailing Zero . . . . . 4–33
4.4.10 Longword Multiply . . . . . 4–34
4.4.11 Quadword Multiply . . . . . 4–35
4.4.12 Unsigned Quadword Multiply High . . . . . 4–36
4.4.13 Longword Subtract . . . . . 4–37
4.4.14 Scaled Longword Subtract . . . . . 4–38
4.4.15 Quadword Subtract . . . . . 4–39
4.4.16 Scaled Quadword Subtract . . . . . 4–40
4.5 Logical and Shift Instructions . . . . . 4–41
4.5.1 Logical Functions . . . . . 4–42
4.5.2 Conditional Move Integer . . . . . 4–43
4.5.3 Shift Logical . . . . . 4–45
4.5.4 Shift Arithmetic . . . . . 4–46
4.6 Byte Manipulation Instructions . . . . . 4–47
4.6.1 Compare Byte . . . . . 4–49
4.6.2 Extract Byte . . . . . 4–51
4.6.3 Byte Insert . . . . . 4–55
4.6.4 Byte Mask . . . . . 4–57
4.6.5 Sign Extend . . . . . 4–60
4.6.6 Zero Bytes . . . . . 4–61
4.7 Floating-Point Instructions . . . . . 4–62
4.7.1 Single-Precision Operations . . . . . 4–62
4.7.2 Subsets and Faults . . . . . 4–62
4.7.3 Definitions . . . . . 4–63
4.7.4 Encodings . . . . . 4–65
4.7.5 Rounding Modes . . . . . 4–66
4.7.6 Computational Models . . . . . 4–67
4.7.6.1 VAX-Format Arithmetic with Precise Exceptions . . . . . 4–67
4.7.6.2 High-Performance VAX-Format Arithmetic . . . . . 4–68
4.7.6.3 IEEE-Compliant Arithmetic . . . . . 4–68
4.7.6.4 IEEE-Compliant Arithmetic Without Inexact Exception . . . . . 4–68
4.7.6.5 High-Performance IEEE-Format Arithmetic . . . . . 4–69
4.7.7 Trapping Modes . . . . . 4–69
4.7.7.1 VAX Trapping Modes . . . . . 4–69
4.7.7.2 IEEE Trapping Modes . . . . . 4–71
4.7.7.3 Arithmetic Trap Completion . . . . . 4–73
4.7.7.3.1 Trap Shadow Rules . . . . . 4–73
4.7.7.3.2 Trap Shadow Length Rules . . . . . 4–74
4.7.7.4 Invalid Operation (INV) Arithmetic Trap . . . . . 4–76
4.7.7.5 Division by Zero (DZE) Arithmetic Trap . . . . . 4–77
4.7.7.6 Overflow (OVF) Arithmetic Trap . . . . . 4–77
4.7.7.7 Underflow (UNF) Arithmetic Trap . . . . . 4–78
4.7.7.8 Inexact Result (INE) Arithmetic Trap . . . . . 4–78
4.7.7.9 Integer Overflow (IOV) Arithmetic Trap . . . . . 4–78
4.7.7.10 IEEE Floating-Point Trap Disable Bits . . . . . 4–78
4.7.7.11 IEEE Denormal Control Bits . . . . . 4–79
4.7.8 Floating-Point Control Register (FPCR) . . . . . 4–79
4.7.8.1 Accessing the FPCR . . . . . 4–82
4.7.8.2 Default Values of the FPCR . . . . . 4–83
4.7.8.3 Saving and Restoring the FPCR . . . . . 4–83
4.7.9 Floating-Point Instruction Function Field Format . . . . . 4–84
4.7.10 IEEE Standard . . . . . 4–88
4.7.10.1 Conversion of NaN and Infinity Values . . . . . 4–88
4.7.10.2 Copying NaN Values . . . . . 4–89
4.7.10.3 Generating NaN Values . . . . . 4–89
4.7.10.4 Propagating NaN Values . . . . . 4–89
4.8 Memory Format Floating-Point Instructions . . . . . 4–90
4.8.1 Load F_floating . . . . . 4–91
4.8.2 Load G_floating . . . . . 4–92
4.8.3 Load S_floating . . . . . 4–93
4.8.4 Load T_floating . . . . . 4–94
4.8.5 Store F_floating . . . . . 4–95
4.8.6 Store G_floating . . . . . 4–96
4.8.7 Store S_floating . . . . . 4–97
4.8.8 Store T_floating . . . . . 4–98
4.9 Branch Format Floating-Point Instructions . . . . . 4–99
4.9.1 Conditional Branch . . . . . 4–100
4.10 Floating-Point Operate Format Instructions . . . . . 4–102
4.10.1 Copy Sign . . . . . 4–105
4.10.2 Convert Integer to Integer . . . . . 4–106
4.10.3 Floating-Point Conditional Move . . . . . 4–107
4.10.4 Move from/to Floating-Point Control Register . . . . . 4–109
4.10.5 VAX Floating Add . . . . . 4–110
4.10.6 IEEE Floating Add . . . . . 4–111
4.10.7 VAX Floating Compare . . . . . 4–112
4.10.8 IEEE Floating Compare . . . . . 4–113
4.10.9 Convert VAX Floating to Integer . . . . . 4–114
4.10.10 Convert Integer to VAX Floating . . . . . 4–115
4.10.11 Convert VAX Floating to VAX Floating . . . . . 4–116
4.10.12 Convert IEEE Floating to Integer . . . . . 4–117
4.10.13 Convert Integer to IEEE Floating . . . . . 4–118
4.10.14 Convert IEEE S_Floating to IEEE T_Floating . . . . . 4–119
4.10.15 Convert IEEE T_Floating to IEEE S_Floating . . . . . 4–120
4.10.16 VAX Floating Divide . . . . . 4–121
4.10.17 IEEE Floating Divide . . . . . 4–122
4.10.18 Floating-Point Register to Integer Register Move . . . . . 4–123
4.10.19 Integer Register to Floating-Point Register Move . . . . . 4–124
4.10.20 VAX Floating Multiply . . . . . 4–126
4.10.21 IEEE Floating Multiply . . . . . 4–127
4.10.22 VAX Floating Square Root . . . . . 4–128
4.10.23 IEEE Floating Square Root . . . . . 4–129
4.10.24 VAX Floating Subtract . . . . . 4–130
4.10.25 IEEE Floating Subtract . . . . . 4–131
4.11 Miscellaneous Instructions . . . . . 4–132
4.11.1 Architecture Mask . . . . . 4–133
4.11.2 Call Privileged Architecture Library . . . . . 4–135
4.11.3 Evict Data Cache Block . . . . . 4–136
4.11.4 Exception Barrier . . . . . 4–138
4.11.5 Prefetch Data . . . . . 4–139
4.11.6 Implementation Version . . . . . 4–141
4.11.7 Memory Barrier . . . . . 4–142
4.11.8 Read Processor Cycle Counter . . . . . 4–143
4.11.9 Trap Barrier . . . . . 4–144
4.11.10 Write Hint . . . . . 4–145
4.11.11 Write Memory Barrier . . . . . 4–147
4.12 VAX Compatibility Instructions . . . . . 4–149
4.12.1 VAX Compatibility Instructions . . . . . 4–150
4.13 Multimedia (Graphics and Video) Support . . . . . 4–151
4.13.1 Byte and Word Minimum and Maximum . . . . . 4–152
4.13.2 Pixel Error . . . . . 4–154
4.13.3 Pack Bytes . . . . . 4–155
4.13.4 Unpack Bytes . . . . . 4–156
5 System Architecture and Programming Implications
5.1 Introduction . . . . . 5–1
5.2 Physical Address Space Characteristics . . . . . 5–1
5.2.1 Coherency of Memory Access . . . . . 5–1
5.2.2 Granularity of Memory Access . . . . . 5–2
5.2.3 Width of Memory Access . . . . . 5–3
5.2.4 Memory-Like and Non-Memory-Like Behavior . . . . . 5–3
5.3 Translation Buffers and Virtual Caches . . . . . 5–4
5.4 Caches and Write Buffers . . . . . 5–4
5.5 Data Sharing . . . . . 5–6
5.5.1 Atomic Change of a Single Datum . . . . . 5–6
5.5.2 Atomic Update of a Single Datum . . . . . 5–6
5.5.3 Atomic Update of Data Structures . . . . . 5–7
5.5.4 Ordering Considerations for Shared Data Structures . . . . . 5–9
5.6 Read/Write Ordering . . . . . 5–10
5.6.1 Alpha Shared Memory Model . . . . . 5–10
5.6.1.1 Architectural Definition of Processor Issue Sequence . . . . . 5–12
5.6.1.2 Definition of Before and After . . . . . 5–12
5.6.1.3 Definition of Processor Issue Constraints . . . . . 5–12
5.6.1.4 Definition of Location Access Constraints . . . . . 5–14
5.6.1.5 Definition of Visibility . . . . . 5–14
5.6.1.6 Definition of Storage . . . . . 5–14
5.6.1.7 Definition of Dependence Constraint . . . . . 5–15
5.6.1.8 Definition of Load-Locked and Store-Conditional . . . . . 5–16
5.6.1.9 Timeliness . . . . . 5–17
5.6.2 Litmus Tests . . . . . 5–17
5.6.2.1 Litmus Test 1 (Impossible Sequence) . . . . . 5–17
5.6.2.2 Litmus Test 2 (Impossible Sequence) . . . . . 5–18
5.6.2.3 Litmus Test 3 (Impossible Sequence) . . . . . 5–18
5.6.2.4 Litmus Test 4 (Sequence Okay) . . . . . 5–19
5.6.2.5 Litmus Test 5 (Sequence Okay) . . . . . 5–19
5.6.2.6 Litmus Test 6 (Sequence Okay) . . . . . 5–19
5.6.2.7 Litmus Test 7 (Impossible Sequence) . . . . . 5–20
5.6.2.8 Litmus Test 8 (Impossible Sequence) . . . . . 5–20
5.6.2.9 Litmus Test 9 (Impossible Sequence) . . . . . 5–21
5.6.2.10 Litmus Test 10 (Sequence Okay) . . . . . 5–21
5.6.2.11 Litmus Test 11 (Impossible Sequence) . . . . . 5–21
5. 6. 3 Implied Barriers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5– 22
5. 6. 4 Implications for Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5– 22
5. 6. 4. 1 Single Processor Data Stream . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5– 22
5. 6. 4. 2 Single Processor Instruction Stream. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5– 22
5.6.4.3 Multiprocessor Data Stream (Including Single Processor with DMA I/ O) . . . . . . . . 5– 22
5.6.4.4 Multiprocessor Instruction Stream (Including Single Processor with DMA I/ O) . . . 5– 23
5. 6. 4. 5 Multiprocessor Context Switch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5– 24
5. 6. 4. 6 Multiprocessor Send/ Receive Interrupt . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5– 26
5.6.4.7 Implications for Memory Mapped I/ O . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5– 27
5. 6. 4. 8 Multiple Processors Writing to a Single I/ O Device. . . . . . . . . . . . . . . . . . . . . . . . . 5– 28
5. 6. 5 Implications for Hardware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5– 29
5. 7 Arithmetic Traps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5– 30
6 Common PALcode Architecture
6. 1 PALcode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6– 1
6.2 PALcode Instructions and Functions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6– 1
6.3 PALcode Environment. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6– 2
6. 4 Special Functions Required for PALcode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6– 2
6.5 PALcode Effects on System Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6– 3
6.6 PALcode Replacement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6– 3
6. 7 Required PALcode Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6– 4
6. 7. 1 Drain Aborts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6– 6
6. 7. 2 Halt . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6– 7
6. 7. 3 Instruction Memory Barrier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6– 8
7 Console Subsystem Overview
8 Input/ Output Overview
9 OpenVMS Alpha
9.1 Unprivileged OpenVMS Alpha PALcode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9– 1
9.2 Privileged OpenVMS Alpha PALcode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9– 8
10 Digital UNIX
10. 1 Unprivileged Digital UNIX PALcode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10– 1
10. 2 Privileged Digital UNIX PALcode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10– 2
11 Windows NT Alpha
11. 1 Unprivileged Windows NT Alpha PALcode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11– 1
11. 2 Privileged Windows NT Alpha PALcode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11– 2
A Software Considerations
A. 1 Hardware-Software Compact . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A– 1
A. 2 Instruction-Stream Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A– 2
A. 2. 1 Instruction Alignment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A– 2
A. 2.2 Branch Prediction and Minimizing Branch-Taken — Factor of 3 . . . . . . . . . . . . . . . . . . A– 2
A. 2. 3 Improving I-Stream Density — Factor of 3. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A– 4
A. 2.4 Instruction Scheduling — Factor of 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A– 4
A. 3 Data-Stream Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A– 4
A. 3. 1 Data Alignment — Factor of 10. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A– 4
A. 3. 2 Shared Data in Multiple Processors — Factor of 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . A– 5
A. 3. 3 Avoiding Cache/ TB Conflicts — Factor of 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A– 6
A. 3.4 Sequential Read/ Write — Factor of 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A– 8
A. 3. 5 Prefetching — Factor of 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A– 8
A. 4 Code Sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A– 9
A. 4.1 Aligned Byte/ Word (Within Register) Memory Accesses . . . . . . . . . . . . . . . . . . . . . . . . A– 9
A. 4. 2 Division . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A– 10
A. 4. 3 Byte Swap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A– 11
A. 4. 4 Stylized Code Forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A– 11
A. 4. 4. 1 NOP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A– 11
A. 4. 4. 2 Clear a Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A– 12
A. 4.4.3 Load Literal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A– 12
A. 4. 4. 4 Register-to-Register Move . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A– 13
A. 4.4.5 Negate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A– 13
A. 4. 4. 6 NOT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A– 13
A. 4.4.7 Booleans . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A– 13
A. 4.5 Exceptions and Trap Barriers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A– 14
A. 4.6 Pseudo-Operations (Stylized Code Forms) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A– 14
A. 5 Timing Considerations: Atomic Sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A– 16
B IEEE Floating-Point Conformance
B. 1 Alpha Choices for IEEE Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B– 1
B. 2 Alpha Support for OS Completion Handlers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B– 3
B. 2.1 IEEE Floating-Point Control (FP_C) Quadword . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B– 4
B. 3 Mapping to IEEE Standard . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B– 6
C Instruction Summary
C. 1 Common Architecture Instruction Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C– 1
C. 2 IEEE Floating-Point Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C– 6
C. 3 VAX Floating-Point Instructions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C– 7
C. 4 Independent Floating-Point Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C– 8
C. 5 Opcode Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C– 8
C. 6 Common Architecture Opcodes in Numerical Order . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C– 10
C. 7 OpenVMS Alpha PALcode Instruction Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C– 14
C. 8 DIGITAL UNIX PALcode Instruction Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C– 16
C. 9 Windows NT Alpha Instruction Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C– 17
C. 10 PALcode Opcodes in Numerical Order . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C– 18
C. 11 Required PALcode Opcodes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C– 20
C. 12 Opcodes Reserved to PALcode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C– 20
C. 13 Opcodes Reserved to Compaq . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C– 21
C. 14 Unused Function Code Behavior . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C– 21
C. 15 ASCII Character Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C– 22
D Registered System and Processor Identifiers
D. 1 Processor Type Assignments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D– 1
D. 2 PALcode Variation Assignments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D– 2
D. 3 Architecture Mask and Implementation Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D– 3
E Waivers and Implementation-Dependent Functionality
E. 1 Waivers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E– 1
E. 1.1 DECchip 21064, DECchip 21066, and DECchip 21068 IEEE Divide Instruction Violation E– 1
E. 1.2 DECchip 21064, DECchip 21066, and DECchip 21068 Write Buffer Violation . . . . . . . E– 2
E. 1. 3 DECchip 21264 LDx_L/ STx_C with WH64 Violation . . . . . . . . . . . . . . . . . . . . . . . . . . . E– 2
E. 2 Implementation-Specific Functionality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E– 3
E. 2.1 DECchip 21064/ 21066/ 21068 Performance Monitoring . . . . . . . . . . . . . . . . . . . . . . . . E– 3
E. 2.1.1 DECchip 21064/ 21066/ 21068 Performance Monitor Interrupt Mechanism . . . . . . E– 4
E. 2.1.2 Functions and Arguments for the DECchip 21064/ 21066/ 21068 . . . . . . . . . . . . . . E– 5
E. 2.2 DECchip 21164/ 21164PC Performance Monitoring . . . . . . . . . . . . . . . . . . . . . . . . . . . E– 9
E. 2.2.1 Performance Monitor Interrupt Mechanism. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E– 9
E. 2.2.2 Windows NT Alpha Functions and Argument . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E– 10
E. 2.2.3 OpenVMS Alpha and DIGITAL UNIX Functions and Arguments . . . . . . . . . . . . . . E– 12
E. 2.3 21264 Performance Monitoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E– 23
E. 2.3.1 Performance Monitor Interrupt Mechanism. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E– 23
E. 2.3.2 Windows NT Alpha Functions and Argument . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E– 24
E. 2.3.3 OpenVMS Alpha and DIGITAL UNIX Functions and Arguments . . . . . . . . . . . . . . E– 25
Figures
1– 1 Instruction Format Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1– 4
2– 1 Byte Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2– 1
2– 2 Word Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2– 2
2– 3 Longword Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2– 2
2– 4 Quadword Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2– 2
2– 5 F_floating Datum. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2– 3
2– 6 F_floating Register Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2– 3
2– 7 G_floating Datum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2– 4
2– 8 G_floating Register Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2– 5
2– 9 D_floating Datum. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2– 5
2– 10 D_floating Register Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2– 5
2– 11 S_floating Datum. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2– 7
2– 12 S_floating Register Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2– 7
2– 13 T_floating Datum. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2– 8
2– 14 T_floating Register Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2– 9
2– 15 X_floating Datum. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2– 10
2– 16 X_floating Register Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2– 10
2– 17 X_floating Big-Endian Datum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2– 11
2– 18 X_floating Big-Endian Register Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2– 11
2– 19 Longword Integer Datum. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2– 11
2– 20 Longword Integer Floating-Register Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2– 11
2– 21 Quadword Integer Datum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2– 12
2– 22 Quadword Integer Floating-Register Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2– 12
2– 23 Little-Endian Byte Addressing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2– 13
2– 24 Big-Endian Byte Addressing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2– 13
3– 1 Memory Instruction Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3– 11
3– 2 Memory Instruction with Function Code Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3– 11
3– 3 Branch Instruction Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3– 12
3– 4 Operate Instruction Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3– 12
3– 5 Floating-Point Operate Instruction Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3– 13
3– 6 PALcode Instruction Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3– 15
4– 1 Floating-Point Control Register (FPCR) Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4– 80
4– 2 Floating-Point Instruction Function Field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4– 84
8– 1 Alpha System Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8– 1
A– 1 Branch-Format BSR and BR Opcodes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A– 3
A– 2 Memory-Format JSR Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A– 3
A– 3 Bad Allocation in Cache . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A– 7
A– 4 Better Allocation in Cache . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A– 7
A– 5 Best Allocation in Cache . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A– 7
B– 1 IEEE Floating-Point Control (FP_C) Quadword . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B– 4
B– 2 IEEE Trap Handling Behavior . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B– 7
Tables
2– 1 F_floating Load Exponent Mapping (MAP_F) ................................................................ 2– 4
2– 2 S_floating Load Exponent Mapping (MAP_S) ................................................................ 2– 7
3– 1 Operand Notation ........................................................................................................... 3– 4
3– 2 Operand Value Notation ................................................................................................. 3– 4
3– 3 Expression Operand Notation ........................................................................................ 3– 4
3– 4 Operand Name Notation ................................................................................................ 3– 5
3– 5 Operand Access Type Notation .................................................................................... 3– 5
3– 6 Operand Data Type Notation ......................................................................................... 3– 6
3– 7 Operators ....................................................................................................................... 3– 6
4– 1 Opcode Qualifiers ........................................................................................................... 4– 3
4– 2 Memory Integer Load/ Store Instructions ......................................................................... 4– 4
4– 3 Control Instructions Summary ...................................................................................... 4– 18
4– 4 Jump Instructions Branch Prediction ............................................................................ 4– 23
4– 5 Integer Arithmetic Instructions Summary ...................................................................... 4– 24
4– 6 Logical and Shift Instructions Summary ........................................................................ 4– 41
4– 7 Byte-Within-Register Manipulation Instructions Summary ........................................... 4– 47
4– 8 VAX Trapping Modes Summary ................................................................................... 4– 71
4– 9 Summary of IEEE Trapping Modes .............................................................................. 4– 72
4– 10 Trap Shadow Length Rules .......................................................................................... 4– 75
4– 11 Floating-Point Control Register (FPCR) Bit Descriptions ............................................. 4– 80
4– 12 IEEE Floating-Point Function Field Bit Summary ......................................................... 4– 85
4– 13 VAX Floating-Point Function Field Bit Summary .......................................................... 4– 87
4– 14 Memory Format Floating-Point Instructions Summary .................................................. 4– 90
4– 15 Floating-Point Branch Instructions Summary ................................................................ 4– 99
4– 16 Floating-Point Operate Instructions Summary ........................................................... 4– 102
4– 17 Miscellaneous Instructions Summary.......................................................................... 4– 132
4– 18 VAX Compatibility Instructions Summary.................................................................... 4– 149
5– 1 Processor Issue Constraints ....................................................................................... 5– 13
6– 1 PALcode Instructions that Require Recognition.............................................................. 6– 4
6– 2 Required PALcode Instructions ....................................................................................... 6– 5
9– 1 Unprivileged OpenVMS Alpha PALcode Instruction Summary ..................................... 9– 1
9– 2 Privileged OpenVMS Alpha PALcode Instructions Summary ........................................ 9– 8
10– 1 Unprivileged Digital UNIX PALcode Instruction Summary .......................................... 10– 1
10– 2 Privileged Digital UNIX PALcode Instruction Summary ............................................... 10– 2
11– 1 Unprivileged Windows NT Alpha PALcode Instruction Summary ................................ 11– 1
11– 2 Privileged Windows NT Alpha PALcode Instruction Summary ..................................... 11– 2
A– 1 Cache Block Prefetching ................................................................................................ A– 8
A– 2 Decodable Pseudo-Operations (Stylized Code Forms) ............................................... A– 14
B– 1 Floating-Point Control (FP_C) Quadword Bit Summary ................................................ B– 5
B– 2 IEEE Floating-Point Trap Handling ............................................................................... B– 8
B– 3 IEEE Standard Charts ................................................................................................. B– 12
C– 1 Instruction Format and Opcode Notation ....................................................................... C– 1
C– 2 Common Architecture Instructions ................................................................................ C– 2
C– 3 IEEE Floating-Point Instruction Function Codes ........................................................... C– 6
C– 4 VAX Floating-Point Instruction Function Codes ............................................................ C– 7
C– 5 Independent Floating-Point Instruction Function Codes ............................................... C– 8
C– 6 Opcode Summary ......................................................................................................... C– 9
C– 7 Key to Opcode Summary ............................................................................................... C– 9
C– 8 Common Architecture Opcodes in Numerical Order ................................................... C– 10
C– 9 OpenVMS Alpha Unprivileged PALcode Instructions .................................................. C– 14
C– 10 OpenVMS Alpha Privileged PALcode Instructions ...................................................... C– 15
C– 11 DIGITAL UNIX Unprivileged PALcode Instructions ..................................................... C– 16
C– 12 DIGITAL UNIX Privileged PALcode Instructions ......................................................... C– 16
C– 13 Windows NT Alpha Unprivileged PALcode Instructions ............................................. C– 17
C– 14 Windows NT Alpha Privileged PALcode instructions .................................................. C– 17
C– 15 PALcode Opcodes in Numerical Order ....................................................................... C– 18
C– 16 Required PALcode Opcodes........................................................................................ C– 20
C– 17 Opcodes Reserved for PALcode.................................................................................. C– 20
C– 18 Opcodes Reserved for Compaq................................................................................... C– 21
C– 19 ASCII Character Set ..................................................................................................... C– 22
D– 1 Processor Type Assignments ........................................................................................ D– 1
D– 2 PALcode Variation Assignments .................................................................................... D– 2
D– 3 AMASK Bit Assignments ............................................................................................... D– 3
D– 4 IMPLVER Value Assignments ....................................................................................... D– 3
E– 1 DECchip 21064/ 21066/ 21068 Performance Monitoring Functions ............................ E– 5
E– 2 DECchip 21064/ 21066/ 21068 MUX Control Fields in ICCSR Register ......................... E– 7
E– 3 Bit Summary of PMCTR Register for Windows NT Alpha .......................................... E– 11
E– 4 OpenVMS Alpha and DIGITAL UNIX Performance Monitoring Functions .................. E– 12
E– 5 21164/ 21164PC Enable Counters for OpenVMS Alpha and DIGITAL UNIX............... E– 15
E– 6 21164/ 21164PC Disable Counters for OpenVMS Alpha and DIGITAL UNIX ............. E– 15
E– 7 21164 Select Desired Events for OpenVMS Alpha and DIGITAL UNIX ..................... E– 16
E– 8 21164PC Select Desired Events for OpenVMS Alpha and DIGITAL UNIX ............. E– 16
E– 9 21164/ 21164PC Select Special Options for OpenVMS Alpha and DIGITAL UNIX...... E– 17
E– 10 21164/ 21164PC Select Desired Frequencies for OpenVMS Alpha and DIGITAL UNIX E– 18
E– 11 21164/ 21164PC Read Counters for OpenVMS Alpha and DIGITAL UNIX ................. E– 19
E– 12 21164/ 21164PC Write Counters for OpenVMS Alpha and DIGITAL UNIX ................. E– 19
E– 13 21164/ 21164PC Counter 1 (PCSEL1) Event Selection .............................................. E– 19
E– 14 21164/ 21164PC Counter 2 (PCSEL2) Event Selection .............................................. E– 20
E– 15 21164 CBOX1 Event Selection ................................................................................... E– 21
E– 16 21164 CBOX2 Event Selection ................................................................................... E– 21
E– 17 21164PC PM0_MUX Event Selection ......................................................................... E– 22
E– 18 21164PC PM1_MUX Event Selection ......................................................................... E– 22
E– 19 Bit Summary of PCTR_CTL Register for Windows NT Alpha .................................... E– 24
E– 20 OpenVMS Alpha and DIGITAL UNIX Performance Monitoring Functions ................... E– 25
E– 21 21264 Enable Counters for OpenVMS Alpha and DIGITAL UNIX............................... E– 27
E– 22 21264 Disable Counters for OpenVMS Alpha and DIGITAL UNIX ............................. E– 27
E– 23 21264 Select Desired Events for OpenVMS Alpha and DIGITAL UNIX ..................... E– 28
E– 24 21264 Read Counters for OpenVMS Alpha and DIGITAL UNIX ................................. E– 28
E– 25 21264 Write Counters for OpenVMS Alpha and DIGITAL UNIX ................................. E– 28
E– 26 21264 Enable and Write Counters for OpenVMS Alpha and DIGITAL UNIX............... E– 29
Preface
Chapters 1 through 8 and appendixes A through E of this book are derived directly from the Alpha System
Reference Manual, Version 7, and from passed engineering change orders (ECOs) that have been
applied. This handbook is an accurate representation of the described parts of the Alpha architecture.
References in this handbook to the Alpha Architecture Reference Manual are to the Third Edition of
that manual, EY-W938E-DP.
Chapter 1
Introduction
Alpha is a 64-bit load/store RISC architecture that is designed with particular emphasis on the
three elements that most affect performance: clock speed, multiple instruction issue, and multiple
processors.
The Alpha architects examined and analyzed current and theoretical RISC architecture design
elements and developed high-performance alternatives for the Alpha architecture. The architects
adopted only those design elements that appeared valuable for a projected 25-year design
horizon. Thus, Alpha becomes the first 21st century computer architecture.
The Alpha architecture is designed to avoid bias toward any particular operating system or programming
language. Alpha supports the OpenVMS Alpha, DIGITAL UNIX, and Windows NT
Alpha operating systems and supports simple software migration for applications that run on
those operating systems.
This manual describes in detail how Alpha is designed to be the leadership 64-bit architecture
of the computer industry.
1.1 The Alpha Approach to RISC Architecture
Alpha Is a True 64-Bit Architecture
Alpha was designed as a 64-bit architecture. All registers are 64 bits in length and all operations
are performed between 64-bit registers. It is not a 32-bit architecture that was later
expanded to 64 bits.
Alpha Is Designed for Very High-Speed Implementations
The instructions are very simple. All instructions are 32 bits in length. Memory operations are
either loads or stores. All data manipulation is done between registers.
The Alpha architecture facilitates pipelining multiple instances of the same operations because
there are no special registers and no condition codes.
The instructions interact with each other only by one instruction writing a register or memory
and another instruction reading from the same place. That makes it particularly easy to build
implementations that issue multiple instructions every CPU cycle.
Alpha makes it easy to maintain binary compatibility across multiple implementations and easy
to maintain full speed on multiple-issue implementations. For example, there are no
implementation-specific pipeline timing hazards, no load-delay slots, and no branch-delay slots.
The Alpha Approach to Byte Manipulation
The Alpha architecture reads and writes bytes between registers and memory with the LDBU
and STB instructions. (Alpha also supports word reads and writes with the LDWU and STW
instructions.)
Byte shifting and masking are performed with normal 64-bit register-to-register instructions,
crafted to keep instruction sequences short.
The Alpha Approach to Multiprocessor Shared Memory
As viewed from a second processor (including an I/O device), a sequence of reads and writes
issued by one processor may be arbitrarily reordered by an implementation. This allows implementations
to use multibank caches, bypassed write buffers, write merging, pipelined writes
with retry on error, and so forth. If strict ordering between two accesses must be maintained,
explicit memory barrier instructions can be inserted in the program.
The basic multiprocessor interlocking primitive is a RISC-style load_locked, modify,
store_conditional sequence. If the sequence runs without interrupt, exception, or an interfering
write from another processor, then the conditional store succeeds. Otherwise, the store fails and
the program eventually must branch back and retry the sequence. This style of interlocking
scales well with very fast caches and makes Alpha an especially attractive architecture for
building multiple-processor systems.
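The retry loop described above can be illustrated with a toy software model. This is only a sketch under stated assumptions: the class `LockedCell`, the function `atomic_add`, and the `interfere` hook are hypothetical names invented for illustration, and the model tracks a single memory location rather than real cache behavior.

```python
class LockedCell:
    """Toy model of one memory location with a lock flag (hypothetical)."""

    def __init__(self, value=0):
        self.value = value
        self.lock_owner = None  # processor whose lock is still valid, if any

    def load_locked(self, cpu):
        # Read the location and remember which processor holds the lock.
        self.lock_owner = cpu
        return self.value

    def store(self, value):
        # An ordinary write from any processor invalidates an outstanding lock.
        self.value = value
        self.lock_owner = None

    def store_conditional(self, cpu, value):
        # Succeeds only if no interfering write happened since load_locked.
        if self.lock_owner != cpu:
            return False
        self.value = value
        self.lock_owner = None
        return True


def atomic_add(cell, cpu, delta, interfere=None):
    """Load_locked, modify, store_conditional; retry until the store succeeds.

    Returns the number of attempts. `interfere`, if given, is called once
    during the first attempt to simulate a write from another processor.
    """
    attempts = 0
    while True:
        attempts += 1
        old = cell.load_locked(cpu)
        if interfere is not None and attempts == 1:
            interfere(cell)
        if cell.store_conditional(cpu, old + delta):
            return attempts
```

With no interference the sequence completes on the first attempt; when another write lands between the load and the conditional store, the store fails and the loop branches back, as the text describes.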
Alpha Instructions Include Hints for Achieving Higher Speed
A number of Alpha instructions include hints for implementations, all aimed at achieving
higher speed.
° Calculated jump instructions have a target hint that can allow much faster subroutine calls and returns.
° There are prefetching hints for the memory system that can allow much higher cache hit rates.
° There are granularity hints for the virtual-address mapping that can allow much more effective use of translation lookaside buffers for large contiguous structures.
PALcode – Alpha's Very Flexible Privileged Software Library
A Privileged Architecture Library (PALcode) is a set of subroutines that are specific to a particular
Alpha operating system implementation. These subroutines provide operating-system
primitives for context switching, interrupts, exceptions, and memory management. PALcode is
similar to the BIOS libraries that are provided in personal computers.
PALcode subroutines are invoked by implementation hardware or by software CALL_PAL
instructions.
PALcode is written in standard machine code with some implementation-specific extensions to
provide access to low-level hardware.
PALcode lets Alpha implementations run the full OpenVMS Alpha, DIGITAL UNIX, and
Windows NT Alpha operating systems. PALcode can provide this functionality with little
overhead. For example, the OpenVMS Alpha PALcode instructions let Alpha run OpenVMS
with little more hardware than that found on a conventional RISC machine: the PAL mode bit
itself, plus four extra protection bits in each translation buffer entry.
Other versions of PALcode can be developed for real-time, teaching, and other applications.
PALcode makes Alpha an especially attractive architecture for multiple operating systems.
Alpha and Programming Languages
Alpha is an attractive architecture for compiling a large variety of programming languages.
Alpha has been carefully designed to avoid bias toward one or two programming languages.
For example:
° Alpha does not contain a subroutine call instruction that moves a register window by a fixed amount. Thus, Alpha is a good match for programming languages with many
parameters and programming languages with no parameters.
° Alpha does not contain a global integer overflow enable bit. Such a bit would need to be changed at every subroutine boundary when a FORTRAN program calls a C program.
1.2 Data Format Overview
Alpha is a load/store RISC architecture with the following data characteristics:
° All operations are done between 64-bit registers.
° Memory is accessed via 64-bit virtual byte addresses, using the little-endian or, optionally, the big-endian byte numbering convention.
° There are 32 integer registers and 32 floating-point registers.
° Longword (32-bit) and quadword (64-bit) integers are supported.
° Five floating-point data types are supported:
– VAX F_ floating (32-bit)
– VAX G_ floating (64-bit)
– IEEE single (32-bit)
– IEEE double (64-bit)
– IEEE extended (128-bit)
1.3 Instruction Format Overview
As shown in Figure 1–1, Alpha instructions are all 32 bits in length. There are four major
instruction format classes that contain 0, 1, 2, or 3 register fields. All formats have a 6-bit
opcode.
Figure 1–1: Instruction Format Overview
° PALcode instructions specify, in the function code field, one of a few dozen complex operations to be performed.
° Conditional branch instructions test register Ra and specify a signed 21-bit PC-relative longword target displacement. Subroutine calls put the return address in register
Ra.
° Load and store instructions move bytes, words, longwords, or quadwords between register Ra and memory, using Rb plus a signed 16-bit displacement as the memory
address.
° Operate instructions for floating-point and integer operations are both represented in Figure 1–1
by the operate format illustration and are as follows:
– Word and byte sign-extension operators.
– Floating-point operations use Ra and Rb as source registers and write the result in
register Rc. There is an 11-bit extended opcode in the function field.
– Integer operations use Ra and Rb or an 8-bit literal as the source operand, and write
the result in register Rc.
– Integer operate instructions can use the Rb field and part of the function field to
specify an 8-bit literal. There is a 7-bit extended opcode in the function field.
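As an illustration of the four format classes, the following sketch splits a 32-bit instruction word into the fields named in Figure 1–1. The bit boundaries (31, 26, 25, 21, 20, 16, 15, 5, 4, 0) are read from the figure's labels; the function names and the dictionary keys are hypothetical, and the sketch extracts every field regardless of which format the opcode actually selects.

```python
def bits(word, hi, lo):
    """Return the extent <hi:lo> of word as an unsigned integer."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def sign_extend(value, width):
    """Interpret a width-bit field as a two's-complement integer."""
    sign = 1 << (width - 1)
    return (value ^ sign) - sign


def decode(word):
    """Extract every field that any of the four formats may define."""
    return {
        "opcode": bits(word, 31, 26),                        # all formats
        "ra": bits(word, 25, 21),                            # branch, memory, operate
        "rb": bits(word, 20, 16),                            # memory, operate
        "function": bits(word, 15, 5),                       # operate
        "rc": bits(word, 4, 0),                              # operate
        "mem_disp": sign_extend(bits(word, 15, 0), 16),      # memory
        "branch_disp": sign_extend(bits(word, 20, 0), 21),   # branch
        "pal_number": bits(word, 25, 0),                     # PALcode
    }
```

A caller would pick the fields that apply to the format implied by the opcode; the opcode-to-format mapping is not part of this sketch.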
1.4 Instruction Overview
PALcode Instructions
As described in Section 1.1, a Privileged Architecture Library (PALcode) is a set of subroutines
that is specific to a particular Alpha operating-system implementation. These subroutines
can be invoked by hardware or by software CALL_PAL instructions, which use the function
field to vector to the specified subroutine.
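Figure 1–1 field layout (instruction bits <31:0>):
PALcode Format: Opcode <31:26>, Number <25:0>
Branch Format: Opcode <31:26>, RA <25:21>, Disp <20:0>
Memory Format: Opcode <31:26>, RA <25:21>, RB <20:16>, Disp <15:0>
Operate Format: Opcode <31:26>, RA <25:21>, RB <20:16>, Function <15:5>, RC <4:0>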
Branch Instructions
Conditional branch instructions can test a register for positive/negative or for zero/nonzero,
and they can test integer registers for even/odd. Unconditional branch instructions can write a
return address into a register.
There is also a calculated jump instruction that branches to an arbitrary 64-bit address in a
register.
Load/Store Instructions
Load and store instructions move 8-bit, 16-bit, 32-bit, or 64-bit aligned quantities from and to
memory. Memory addresses are flat 64-bit virtual addresses with no segmentation.
The VAX floating-point load/store instructions swap words to give a consistent register format
for floating-point operations.
A 32-bit integer datum is placed in a register in a canonical form that makes 33 copies of the
high bit of the datum. A 32-bit floating-point datum is placed in a register in a canonical form
that extends the exponent by 3 bits and extends the fraction with 29 low-order zeros. The 32-bit
operates preserve these canonical forms.
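The integer canonical form can be sketched as follows. The helper names `canonical_longword` and `add_longword` are hypothetical; the second models only the result of a 32-bit add, not overflow detection.

```python
def canonical_longword(value32):
    """Place a 32-bit integer datum in a 64-bit register image.

    Bits <63:31> of the result are 33 copies of the datum's high bit.
    """
    value32 &= 0xFFFFFFFF
    upper = 0xFFFFFFFF if value32 >> 31 else 0  # 32 further copies of bit 31
    return (upper << 32) | value32


def add_longword(a, b):
    """Hypothetical model of a 32-bit add: truncate the sum, then re-canonicalize."""
    return canonical_longword((a + b) & 0xFFFFFFFF)
```

Because the result of the 32-bit add is itself canonical, it can be fed directly into further 32-bit or 64-bit operations, which is why no 32/64 mode bit is needed.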
Compilers, as directed by user declarations, can generate any mixture of 32-bit and 64-bit operations.
The Alpha architecture has no 32/64 mode bit.
Integer Operate Instructions
The integer operate instructions manipulate full 64-bit values and include the usual assortment
of arithmetic, compare, logical, and shift instructions.
There are just three 32-bit integer operates: add, subtract, and multiply. They differ from their
64-bit counterparts only in overflow detection and in producing 32-bit canonical results.
There is no integer divide instruction.
The Alpha architecture also supports the following additional operations:
° Scaled add/subtract instructions for quick subscript calculation
° 128-bit multiply for division by a constant, and multiprecision arithmetic
° Conditional move instructions for avoiding branch instructions
° An extensive set of in-register byte and word manipulation instructions
° A set of multimedia instructions that support graphics and video
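Two of the items above lend themselves to a short sketch: a scaled add for subscript calculation, and division by a constant using the high half of a 128-bit product. The function names are hypothetical, the scale factor of 8 is chosen to match quadword elements, and the magic constant is the commonly used reciprocal for unsigned division by 10; none of these details are taken from this handbook.

```python
MASK64 = (1 << 64) - 1


def scaled_add8(index, base):
    """Address of element `index` in an array of quadwords starting at `base`."""
    return (8 * index + base) & MASK64


def mul_high(a, b):
    """High 64 bits of the 128-bit unsigned product of two quadwords."""
    return ((a & MASK64) * (b & MASK64)) >> 64


def div10(n):
    """Unsigned 64-bit division by 10 without a divide instruction.

    0xCCCCCCCCCCCCCCCD is ceil(2**67 / 10); taking the high half of the
    product divides by 2**64, and the shift by 3 completes division by 2**67.
    """
    return mul_high(n, 0xCCCCCCCCCCCCCCCD) >> 3
```

The same pattern applies to other constant divisors with a different multiplier and shift.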
Integer overflow trap enable is encoded in the function field of each instruction, rather than
kept in a global state bit. Thus, for example, both ADDQ/V and ADDQ opcodes exist for specifying
64-bit ADD with and without overflow checking. That makes it easier to pipeline
implementations.
Floating-Point Operate Instructions
The floating-point operate instructions include four complete sets of VAX and IEEE arithmetic
instructions, plus instructions for performing conversions between floating-point and
integer quantities.
In addition to the operations found in conventional RISC architectures, Alpha includes conditional
move instructions for avoiding branches and merge sign/exponent instructions for simple
field manipulation.
The arithmetic trap enables and rounding mode are encoded in the function field of each
instruction, rather than kept in global state bits. That makes it easier to pipeline
implementations.
1.5 Instruction Set Characteristics
Alpha instruction set characteristics are as follows:
° All instructions are 32 bits long and have a regular format.
° There are 32 integer registers (R0 through R31), each 64 bits wide. R31 reads as zero, and writes to R31 are ignored.
° All integer data manipulation is between integer registers, with up to two variable register source operands (one may be an 8-bit literal) and one register destination operand.
° There are 32 floating-point registers (F0 through F31), each 64 bits wide. F31 reads as zero, and writes to F31 are ignored.
° All floating-point data manipulation is between floating-point registers, with up to two register source operands and one register destination operand.
° Instructions can move data in an integer register file to a floating-point register file, and data in a floating-point register file to an integer register file. The instructions do not
interpret bits in the register files and do not access memory.
° All memory reference instructions are of the load/store type that moves data between registers and memory.
° There are no branch condition codes. Branch instructions test an integer or floating-point register value, which may be the result of a previous compare.
° Integer and logical instructions operate on quadwords.
° Floating-point instructions operate on G_floating, F_floating, and IEEE extended, double, and single operands. D_floating "format compatibility," in which binary files of
D_floating numbers may be processed, but without the last 3 bits of fraction precision,
is also provided.
° A minimal number of VAX compatibility instructions are included.
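The register conventions in the list above (32 registers of 64 bits, with register 31 reading as zero and ignoring writes) can be modeled in a few lines. The class name `RegisterFile` is hypothetical and the model covers only one register file.

```python
MASK64 = (1 << 64) - 1


class RegisterFile:
    """Toy model of 32 64-bit registers where register 31 is hardwired to zero."""

    def __init__(self):
        self._regs = [0] * 32

    def read(self, n):
        # Register 31 always reads as zero.
        return 0 if n == 31 else self._regs[n]

    def write(self, n, value):
        # Writes to register 31 are ignored; other writes keep 64 bits.
        if n != 31:
            self._regs[n] = value & MASK64
```

A hardwired zero register gives every instruction a free constant-zero source and a discard destination without extra opcodes.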
1.6 Terminology and Conventions
The following sections describe the terminology and conventions used in this book.
1.6.1 Numbering
All numbers are decimal unless otherwise indicated. Where there is ambiguity, numbers other
than decimal are indicated with the name of the base in subscript form, for example, 10₁₆.
1.6.2 Security Holes
A security hole is an error of commission, omission, or oversight in a system that allows protection
mechanisms to be bypassed.
Security holes exist when unprivileged software (software running outside of kernel mode)
can:
° Affect the operation of another process without authorization from the operating system;
° Amplify its privilege without authorization from the operating system; or
° Communicate with another process, either overtly or covertly, without authorization from the operating system.
The Alpha architecture has been designed to contain no architectural security holes. Hardware
(processors, buses, controllers, and so on) and software should likewise be designed to avoid
security holes.
1.6.3 UNPREDICTABLE and UNDEFINED
The terms UNPREDICTABLE and UNDEFINED are used throughout this book. Their meanings
are quite different and must be carefully distinguished.
In particular, only privileged software (software running in kernel mode) can trigger UNDEFINED
operations. Unprivileged software cannot trigger UNDEFINED operations. However,
either privileged or unprivileged software can trigger UNPREDICTABLE results or
occurrences.
UNPREDICTABLE results or occurrences do not disrupt the basic operation of the processor;
it continues to execute instructions in its normal manner. In contrast, UNDEFINED operation
can halt the processor or cause it to lose information.
The terms UNPREDICTABLE and UNDEFINED can be further described as follows:
UNPREDICTABLE
° Results or occurrences specified as UNPREDICTABLE may vary from moment to moment, implementation to implementation, and instruction to instruction within
implementations. Software can never depend on results specified as UNPREDICTABLE.
° An UNPREDICTABLE result may acquire an arbitrary value subject to a few constraints. Such a result may be an arbitrary function of the input operands or of any state
information that is accessible to the process in its current access mode. UNPREDICTABLE
results may be unchanged from their previous values.
Operations that produce UNPREDICTABLE results may also produce exceptions.
° An occurrence specified as UNPREDICTABLE may happen or not based on an arbitrary choice function. The choice function is subject to the same constraints as are
UNPREDICTABLE results and, in particular, must not constitute a security hole.
Specifically, UNPREDICTABLE results must not depend upon, or be a function of,
the contents of memory locations or registers that are inaccessible to the current
process in the current access mode.
Also, operations that may produce UNPREDICTABLE results must not:
– Write or modify the contents of memory locations or registers to which the current
process in the current access mode does not have access, or
– Halt or hang the system or any of its components.
For example, a security hole would exist if some UNPREDICTABLE result depended
on the value of a register in another process, on the contents of processor temporary
registers left behind by some previously running process, or on a sequence of actions
of different processes.
UNDEFINED
° Operations specified as UNDEFINED may vary from moment to moment, implementation to implementation, and instruction to instruction within implementations. The
operation may vary in effect from nothing to stopping system operation.
° UNDEFINED operations may halt the processor or cause it to lose information. However, UNDEFINED operations must not cause the processor to hang, that is, reach an
unhalted state from which there is no transition to a normal state in which the machine
executes instructions.
1.6.4 Ranges and Extents
Ranges are specified by a pair of numbers separated by two periods and are inclusive. For
example, a range of integers 0..4 includes the integers 0, 1, 2, 3, and 4.
Extents are specified by a pair of numbers in angle brackets separated by a colon and are inclusive.
For example, bits <7:3> specify an extent of bits including bits 7, 6, 5, 4, and 3.
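Both notations are inclusive at both ends, which differs from the half-open ranges of many programming languages. A minimal sketch, with hypothetical helper names, of how the two notations translate:

```python
def inclusive_range(lo, hi):
    """Integers in the range lo..hi, both ends included."""
    return list(range(lo, hi + 1))


def extent(value, hi, lo):
    """Value of the bit extent <hi:lo>, both bit positions included."""
    width = hi - lo + 1
    return (value >> lo) & ((1 << width) - 1)
```

For instance, the extent <7:3> is five bits wide, so `extent` masks with five one bits after shifting right by 3.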
1.6.5 ALIGNED and UNALIGNED
In this document the terms ALIGNED and NATURALLY ALIGNED are used interchangeably
to refer to data objects that are powers of two in size. An aligned datum of size 2**N is
stored in memory at a byte address that is a multiple of 2**N, that is, one that has N low-order
zeros. Thus, an aligned 64-byte stack frame has a memory address that is a multiple of 64.
If a datum of size 2**N is stored at a byte address that is not a multiple of 2**N, it is called
UNALIGNED.
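The definition reduces to a test of the N low-order address bits. A minimal sketch, assuming a hypothetical helper `is_aligned` that takes a byte address and a datum size in bytes:

```python
def is_aligned(address, size):
    """True if a datum of `size` bytes (a power of two) at `address` is aligned."""
    if size <= 0 or size & (size - 1):
        raise ValueError("size must be a power of two")
    # For size 2**N, size - 1 is a mask of the N low-order bits.
    return address & (size - 1) == 0
```

For a longword the mask covers two low-order bits, and for a quadword three.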
1.6.6 Must Be Zero (MBZ)
Fields specified as Must be Zero (MBZ) must never be filled by software with a non-zero
value. These fields may be used at some future time. If the processor encounters a non-zero
value in a field specified as MBZ, an Illegal Operand exception occurs.
1.6.7 Read As Zero (RAZ)
Fields specified as Read as Zero (RAZ) return a zero when read.
1.6.8 Should Be Zero (SBZ)
Fields specified as Should be Zero (SBZ) should be filled by software with a zero value. Non-zero
values in SBZ fields produce UNPREDICTABLE results and may produce extraneous
instruction-issue delays.
1.6.9 Ignore (IGN)
Fields specified as Ignore (IGN) are ignored when written.
1.6.10 Implementation Dependent (IMP)
Fields specified as Implementation Dependent (IMP) may be used for implementation-specific
purposes. Each implementation must document fully the behavior of all fields marked as IMP
by the Alpha specification.
1.6.11 Illustration Conventions
Illustrations that depict registers or memory follow the convention that increasing addresses
run right to left and top to bottom.
1.6.12 Macro Code Example Conventions
All instructions in macro code examples are either listed in Chapter 4 or are stylized code
forms found in Section A.4.6.
Chapter 2
Basic Architecture
2.1 Addressing
The basic addressable unit in the Alpha architecture is the 8-bit byte. Virtual addresses are 64
bits long. An implementation may support a smaller virtual address space. The minimum virtual
address size is 43 bits.
Virtual addresses as seen by the program are translated into physical memory addresses by the
memory management mechanism.
Although the data types in Section 2.2 are described in terms of little-endian byte addressing,
implementations may also include big-endian addressing support, as described in Section 2.3.
All current implementations have some big-endian support.
2.2 Data Types
Following are descriptions of the Alpha architecture data types.
2.2.1 Byte
A byte is 8 contiguous bits starting on an addressable byte boundary. The bits are numbered
from right to left, 0 through 7, as shown in Figure 2–1.
Figure 2–1: Byte Format
A byte is specified by its address A. A byte is an 8-bit value. The byte is only supported in
Alpha by the load, store, sign-extend, extract, mask, insert, and zap instructions.
2.2.2 Word
A word is 2 contiguous bytes starting on an arbitrary byte boundary. The bits are numbered
from right to left, 0 through 15, as shown in Figure 2–2.
Figure 2–2: Word Format
A word is specified by its address, the address of the byte containing bit 0.
A word is a 16-bit value. The word is only supported in Alpha by the load, store, sign-extend,
extract, mask, and insert instructions.
2.2.3 Longword
A longword is 4 contiguous bytes starting on an arbitrary byte boundary. The bits are numbered
from right to left, 0 through 31, as shown in Figure 2–3.
Figure 2–3: Longword Format
A longword is specified by its address A, the address of the byte containing bit 0. A longword
is a 32-bit value.
When interpreted arithmetically, a longword is a two's-complement integer with bits of
increasing significance from 0 through 30. Bit 31 is the sign bit. The longword is only supported
in Alpha by sign-extended load and store instructions and by longword arithmetic
instructions.
Note:
Alpha implementations will impose a significant performance penalty when accessing
longword operands that are not naturally aligned. (A naturally aligned longword has zero
as the low-order two bits of its address.)
2.2.4 Quadword
A quadword is 8 contiguous bytes starting on an arbitrary byte boundary. The bits are numbered
from right to left, 0 through 63, as shown in Figure 2–4.
Figure 2–4: Quadword Format
A quadword is specified by its address A, the address of the byte containing bit 0. A quadword
is a 64-bit value. When interpreted arithmetically, a quadword is either a two's-complement
integer with bits of increasing significance from 0 through 62 and bit 63 as the sign bit, or an
unsigned integer with bits of increasing significance from 0 through 63.
Note:
Alpha implementations will impose a significant performance penalty when accessing
quadword operands that are not naturally aligned. (A naturally aligned quadword has zero
as the low-order three bits of its address.)
2.2.5 VAX Floating-Point Formats
VAX floating-point numbers are stored in one set of formats in memory and in a second set of
formats in registers. The floating-point load and store instructions convert between these formats
purely by rearranging bits; no rounding or range-checking is done by the load and store
instructions.
2.2.5.1 F_floating
An F_floating datum is 4 contiguous bytes in memory starting on an arbitrary byte boundary.
The bits are labeled from right to left, 0 through 31, as shown in Figure 2–5.
Figure 2–5: F_floating Datum
An F_floating operand occupies 64 bits in a floating register, left-justified in the 64-bit register,
as shown in Figure 2–6.
Figure 2–6: F_floating Register Format
The F_floating load instruction reorders bits on the way in from memory, expands the exponent
from 8 to 11 bits, and sets the low-order fraction bits to zero. This produces in the register
an equivalent G_floating number suitable for either F_floating or G_floating operations. The
mapping from 8-bit memory-format exponents to 11-bit register-format exponents is shown in
Table 2–1. This mapping preserves both normal values and exceptional values.
Figure 2–5 fields (memory longword at address A): Fraction Lo <31:16>, S <15>, Exp. <14:7>, Frac. Hi <6:0>
Figure 2–6 fields (register Fx): S <63>, Exp. <62:52>, Fraction <51:29>, 0 <28:0>
The F_floating store instruction reorders register bits on the way to memory and does no
checking of the low-order fraction bits. Register bits <61:59> and <28:0> are ignored by the
store instruction.
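The load and store reorderings described above can be sketched with integer bit operations. The function names `map_f`, `load_f`, and `store_f` are hypothetical. The exponent widening follows Table 2–1, and the field positions follow the F_floating memory and register layouts given in this section: sign in memory bit 15 and register bit 63, exponent in memory bits <14:7> and register bits <62:52>, fraction in memory bits <6:0> and <31:16> and register bits <51:29>.

```python
def map_f(exp8):
    """Widen an 8-bit F_floating exponent to 11 bits, per Table 2-1."""
    if exp8 == 0:
        return 0                       # 0 0000000 -> 0 000 0000000
    high = exp8 >> 7
    low7 = exp8 & 0x7F
    middle = 0b000 if high else 0b111  # 1 xxxxxxx -> 1 000 ..., 0 xxxxxxx -> 0 111 ...
    return (high << 10) | (middle << 7) | low7


def load_f(mem):
    """Register image produced from a 32-bit F_floating memory image."""
    sign = (mem >> 15) & 1
    exp = map_f((mem >> 7) & 0xFF)
    frac = ((mem & 0x7F) << 16) | ((mem >> 16) & 0xFFFF)  # high 7 bits, low 16 bits
    return (sign << 63) | (exp << 52) | (frac << 29)       # bits <28:0> are zero


def store_f(reg):
    """Memory image from a register image; bits <61:59> and <28:0> are ignored."""
    sign = (reg >> 63) & 1
    exp8 = (((reg >> 62) & 1) << 7) | ((reg >> 52) & 0x7F)
    frac = (reg >> 29) & 0x7FFFFF
    return ((frac & 0xFFFF) << 16) | (sign << 15) | (exp8 << 7) | (frac >> 16)
```

Since neither direction rounds or checks ranges, a store of a freshly loaded value reproduces the original memory image.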
An F_floating datum is specified by its address A, the address of the byte containing bit 0. The
memory form of an F_floating datum is sign magnitude with bit 15 the sign bit, bits <14:7> an
excess-128 binary exponent, and bits <6:0> and <31:16> a normalized 24-bit fraction with the
redundant most significant fraction bit not represented. Within the fraction, bits of increasing
significance are from 16 through 31 and 0 through 6. The 8-bit exponent field encodes the values
0 through 255. An exponent value of 0, together with a sign bit of 0, is taken to indicate
that the F_floating datum has a value of 0.
If the result of a VAX floating-point format instruction has a value of zero, the instruction
always produces a datum with a sign bit of 0, an exponent of 0, and all fraction bits of 0. Exponent
values of 1..255 indicate true binary exponents of –127..127. An exponent value of 0,
together with a sign bit of 1, is taken as a reserved operand. Floating-point instructions processing
a reserved operand take an arithmetic exception. The value of an F_floating datum is in
the approximate range 0.29*10**–38 through 1.7*10**38. The precision of an F_floating
datum is approximately one part in 2**23, typically 7 decimal digits. See Section 4.7.
Note:
Alpha implementations will impose a significant performance penalty when accessing
F_floating operands that are not naturally aligned. (A naturally aligned F_floating datum
has zero as the low-order two bits of its address.)
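The quoted range can be checked by decoding the memory form directly. This sketch assumes the usual VAX convention that the normalized 24-bit fraction lies in the interval [0.5, 1), with the hidden bit worth 0.5; the function name `f_floating_value` is hypothetical, and a reserved operand is reported with a Python exception rather than an arithmetic trap.

```python
def f_floating_value(mem):
    """Numeric value of a 32-bit F_floating memory image."""
    sign = (mem >> 15) & 1
    exp = (mem >> 7) & 0xFF
    frac = ((mem & 0x7F) << 16) | ((mem >> 16) & 0xFFFF)  # 23 stored fraction bits
    if exp == 0:
        if sign:
            raise ValueError("reserved operand")
        return 0.0
    mantissa = ((1 << 23) | frac) / (1 << 24)  # restore hidden bit: 0.1fff... binary
    value = mantissa * 2.0 ** (exp - 128)      # excess-128 exponent
    return -value if sign else value
```

The smallest positive value (exponent 1, fraction 0) is 2**–128, about 0.29*10**–38, and the largest (exponent 255, fraction all ones) is just under 2**127, about 1.7*10**38, matching the range stated above.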
2.2.5.2 G_floating
A G_floating datum in memory is 8 contiguous bytes starting on an arbitrary byte boundary.
The bits are labeled from right to left, 0 through 63, as shown in Figure 2–7.
Figure 2–7: G_floating Datum
Table 2–1: F_floating Load Exponent Mapping (MAP_F)
Memory <14:7>    Register <62:52>
1 1111111        1 000 1111111
1 xxxxxxx        1 000 xxxxxxx   (xxxxxxx not all 1's)
0 xxxxxxx        0 111 xxxxxxx   (xxxxxxx not all 0's)
0 0000000        0 000 0000000
Figure 2–7 fields (memory, longword at A): Fraction Midh <31:16>, S <15>, Exp. <14:4>, Frac. Hi <3:0>
Figure 2–7 fields (memory, longword at A+4): Fraction Lo <63:48>, Fraction Midl <47:32>
A G_floating operand occupies 64 bits in a floating register, arranged as shown in Figure 2–8.
Figure 2–8: G_floating Register Format
A G_floating datum is specified by its address A, the address of the byte containing bit 0. The
form of a G_floating datum is sign magnitude with bit 15 the sign bit, bits <14:4> an excess-1024
binary exponent, and bits <3:0> and <63:16> a normalized 53-bit fraction with the redundant
most significant fraction bit not represented. Within the fraction, bits of increasing
significance are from 48 through 63, 32 through 47, 16 through 31, and 0 through 3. The 11-bit
exponent field encodes the values 0 through 2047. An exponent value of 0, together with a sign
bit of 0, is taken to indicate that the G_floating datum has a value of 0.
If the result of a floating-point instruction has a value of zero, the instruction always produces
a datum with a sign bit of 0, an exponent of 0, and all fraction bits of 0. Exponent values of
1..2047 indicate true binary exponents of –1023..1023. An exponent value of 0, together with a
sign bit of 1, is taken as a reserved operand. Floating-point instructions processing a reserved
operand take a user-visible arithmetic exception. The value of a G_floating datum is in the
approximate range 0.56*10**–308 through 0.9*10**308. The precision of a G_floating datum
is approximately one part in 2**52, typically 15 decimal digits. See Section 4.7.
Note:
Alpha implementations will impose a significant performance penalty when accessing
G_floating operands that are not naturally aligned. (A naturally aligned G_floating datum
has zero as the low-order three bits of its address.)
2.2.5.3 D_floating
A D_floating datum in memory is 8 contiguous bytes starting on an arbitrary byte boundary.
The bits are labeled from right to left, 0 through 63, as shown in Figure 2–9.
Figure 2–9: D_floating Datum
A D_floating operand occupies 64 bits in a floating register, arranged as shown in Figure 2–10.
Figure 2–10: D_floating Register Format
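Figure 2–8 fields (register Fx): S <63>, Exp. <62:52>, Fraction Hi <51:32>, Fraction Lo <31:0>
Figure 2–9 fields (memory, longword at A): Fraction Midh <31:16>, S <15>, Exp. <14:7>, Frac. Hi <6:0>
Figure 2–9 fields (memory, longword at A+4): Fraction Lo <63:48>, Fraction Midl <47:32>
Figure 2–10 fields (register Fx): S <63>, Exp. <62:55>, Frac. Hi <54:48>, Fraction Midh <47:32>, Fraction Midl <31:16>, Fraction Lo <15:0>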
The reordering of bits required for a D_floating load or store is identical to that required for a
G_floating load or store. The G_floating load and store instructions are therefore used for loading
or storing D_floating data.
A D_floating datum is specified by its address A, the address of the byte containing bit 0. The
memory form of a D_floating datum is identical to an F_floating datum except for 32 additional
low significance fraction bits. Within the fraction, bits of increasing significance are
from 48 through 63, 32 through 47, 16 through 31, and 0 through 6. The exponent conventions
and approximate range of values are the same for D_floating as for F_floating. The precision of a
D_floating datum is approximately one part in 2**55, typically 16 decimal digits.
Notes:
D_floating is not a fully supported data type; no D_floating arithmetic operations are
provided in the architecture. For backward compatibility, exact D_floating arithmetic may
be provided via software emulation. D_floating "format compatibility" in which binary files
of D_floating numbers may be processed, but without the last three bits of fraction
precision, can be obtained via conversions to G_floating, G arithmetic operations, then
conversion back to D_floating.
Alpha implementations will impose a significant performance penalty on access to
D_floating operands that are not naturally aligned. (A naturally aligned D_floating datum
has zero as the low-order three bits of its address.)
2.2.6 IEEE Floating-Point Formats
The IEEE standard for binary floating-point arithmetic, ANSI/IEEE 754-1985, defines four
floating-point formats in two groups, basic and extended, each having two widths, single and
double. The Alpha architecture supports the basic single and double formats, with the basic
double format serving as the extended single format. The values representable within a format
are specified by using three integer parameters:
° P – the number of fraction bits
° Emax – the maximum exponent
° Emin – the minimum exponent
Within each format, only the following entities are permitted:
° Numbers of the form (–1)**S x 2**E x b(0).b(1)b(2)..b(P–1) where:
– S = 0 or 1
– E = any integer between Emin and Emax, inclusive
– b(n) = 0 or 1
° Two infinities – positive and negative
° At least one Signaling NaN
° At least one Quiet NaN
NaN is an acronym for Not-a-Number. A NaN is an IEEE floating-point bit pattern that represents
something other than a number. NaNs come in two forms: Signaling NaNs and Quiet
NaNs. Signaling NaNs are used to provide values for uninitialized variables and for arithmetic
enhancements. Quiet NaNs provide retrospective diagnostic information regarding previous
invalid or unavailable data and results. Signaling NaNs signal an invalid operation when they
are an operand to an arithmetic instruction, and may generate an arithmetic exception. Quiet
NaNs propagate through almost every operation without generating an arithmetic exception.
Arithmetic with the infinities is handled as if the operands were of arbitrarily large magnitude.
Negative infinity is less than every finite number; positive infinity is greater than every finite
number.
2.2.6.1 S_floating
An IEEE single-precision, or S_floating, datum occupies 4 contiguous bytes in memory starting
on an arbitrary byte boundary. The bits are labeled from right to left, 0 through 31, as
shown in Figure 2–11.
Figure 2–11: S_floating Datum
An S_floating operand occupies 64 bits in a floating register, left-justified in the 64-bit register,
as shown in Figure 2–12.
Figure 2–12: S_floating Register Format
The S_floating load instruction reorders bits on the way in from memory, expanding the exponent
from 8 to 11 bits, and sets the low-order fraction bits to zero. This produces in the register
an equivalent T_floating number, suitable for either S_floating or T_floating operations. The
mapping from 8-bit memory-format exponents to 11-bit register-format exponents is shown in
Table 2–2.
Table 2–2: S_floating Load Exponent Mapping (MAP_S)
Memory <30:23>   Register <62:52>
1 1111111        1 111 1111111
1 xxxxxxx        1 000 xxxxxxx   (xxxxxxx not all 1's)
0 xxxxxxx        0 111 xxxxxxx   (xxxxxxx not all 0's)
0 0000000        0 000 0000000
Figure 2–11 fields (memory longword at address A): S <31>, Exp. <30:23>, Fraction <22:0>
Figure 2–12 fields (register Fx): S <63>, Exp. <62:52>, Fraction <51:29>, 0 <28:0>
This mapping preserves both normal values and exceptional values. Note that the mapping for
all 1's differs from that of F_floating load, since for S_floating all 1's is an exceptional value
and for F_floating all 1's is a normal value.
The S_floating store instruction reorders register bits on the way to memory and does no
checking of the low-order fraction bits. Register bits <61:59> and <28:0> are ignored by the
store instruction. The S_floating load instruction does no checking of the input.
The S_floating store instruction does no checking of the data; the preceding operation should
have specified an S_floating result.
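The S_floating load can be sketched in the same way as the F_floating load, using Table 2–2 for the exponent. The names `map_s`, `load_s`, `float_bits32`, and `float_bits64` are hypothetical; the last two use Python's `struct` module only to obtain IEEE bit patterns for comparison.

```python
import struct


def map_s(exp8):
    """Widen an 8-bit S_floating exponent to 11 bits, per Table 2-2."""
    if exp8 == 0:
        return 0          # 0 0000000 -> 0 000 0000000
    if exp8 == 0xFF:
        return 0x7FF      # 1 1111111 -> 1 111 1111111 (exceptional value)
    high = exp8 >> 7
    low7 = exp8 & 0x7F
    middle = 0b000 if high else 0b111
    return (high << 10) | (middle << 7) | low7


def load_s(mem):
    """Register image produced from a 32-bit S_floating memory image."""
    sign = (mem >> 31) & 1
    exp = map_s((mem >> 23) & 0xFF)
    frac = mem & 0x7FFFFF
    return (sign << 63) | (exp << 52) | (frac << 29)  # bits <28:0> are zero


def float_bits32(x):
    """IEEE single-precision bit pattern of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def float_bits64(x):
    """IEEE double-precision bit pattern of x."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]
```

For normal numbers, zero, and infinity, the register image equals the double-precision encoding of the same value, which is what makes the result usable by T_floating operations.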
An S_floating datum is specified by its address A, the address of the byte containing bit 0. The
memory form of an S_floating datum is sign magnitude with bit 31 the sign bit, bits <30:23>
an excess-127 binary exponent, and bits <22:0> a 23-bit fraction.
The value (V) of an S_floating number is inferred from its constituent sign (S), exponent (E),
and fraction (F) fields as follows:
° If E=255 and F<>0, then V is NaN, regardless of S.
° If E=255 and F=0, then V = (–1)**S x Infinity.
° If 0 < E < 255, then V = (–1)**S x 2**(E–127) x (1.F).
° If E=0 and F<>0, then V = (–1)**S x 2**(–126) x (0.F).
° If E=0 and F=0, then V = (–1)**S x 0 (zero).
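The five cases translate directly into code. A minimal sketch, with the hypothetical name `s_floating_value`, that takes the 32-bit memory image as an integer:

```python
import math


def s_floating_value(bits):
    """Value of a 32-bit S_floating memory image, following the five cases."""
    s = (bits >> 31) & 1
    e = (bits >> 23) & 0xFF
    f = bits & 0x7FFFFF
    sign = -1.0 if s else 1.0
    if e == 255:
        return math.nan if f else sign * math.inf   # NaN or infinity
    if e == 0:
        return sign * 2.0 ** -126 * (f / 2 ** 23)   # 0.F: denormal or zero
    return sign * 2.0 ** (e - 127) * (1 + f / 2 ** 23)  # 1.F: normal
```

The E=0 branch covers both the denormal case and zero, since a zero fraction yields a signed zero.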
Floating-point operations on S_floating numbers may take an arithmetic exception for a variety
of reasons, including invalid operations, overflow, underflow, division by zero, and inexact
results.
Note:
Alpha implementations will impose a significant performance penalty when accessing
S_ floating operands that are not naturally aligned. (A naturally aligned S_ floating datum
has zero as the low-order two bits of its address.)
2.2.6.2 T_ floating
An IEEE double-precision, or T_floating, datum occupies 8 contiguous bytes in memory starting
on an arbitrary byte boundary. The bits are labeled from right to left, 0 through 63, as
shown in Figure 2–13.
Figure 2– 13: T_ floating Datum
[Figure layout: :A holds Fraction Lo <31:0>; :A+4 holds S <31>, Exponent <30:20>, Fraction Hi <19:0>]
A T_floating operand occupies 64 bits in a floating register, arranged as shown in Figure 2–14.
Figure 2– 14: T_ floating Register Format
The T_floating load instruction performs no bit reordering on input, nor does it perform checking
of the input data.
The T_ floating store instruction performs no bit reordering on output. This instruction does no
checking of the data; the preceding operation should have specified a T_ floating result.
A T_ floating datum is specified by its address A, the address of the byte containing bit 0. The
form of a T_ floating datum is sign magnitude with bit 63 the sign bit, bits <62: 52> an excess-1023
binary exponent, and bits <51: 0> a 52-bit fraction.
The value (V) of a T_ floating number is inferred from its constituent sign (S), exponent (E),
and fraction (F) fields as follows:
° If E=2047 and F<>0, then V is NaN, regardless of S.
° If E=2047 and F=0, then V = (-1)**S x Infinity.
° If 0 < E < 2047, then V = (-1)**S x 2**(E-1023) x (1.F).
° If E=0 and F<>0, then V = (-1)**S x 2**(-1022) x (0.F).
° If E=0 and F=0, then V = (-1)**S x 0 (zero).
Floating-point operations on T_floating numbers may take an arithmetic exception for a variety
of reasons, including invalid operations, overflow, underflow, division by zero, and inexact
results.
Note:
Alpha implementations will impose a significant performance penalty when accessing
T_ floating operands that are not naturally aligned. (A naturally aligned T_ floating datum
has zero as the low-order three bits of its address.)
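The natural-alignment rule stated in these notes (low-order two bits zero for a 4-byte datum, three bits for an 8-byte datum, four bits for a 16-byte datum) reduces to a single mask test. The helper below is a sketch; the name is_naturally_aligned is hypothetical and the datum size is assumed to be a power of two.

```python
def is_naturally_aligned(address: int, size_bytes: int) -> bool:
    """Return True if the low-order lg(size_bytes) bits of address are zero."""
    if size_bytes <= 0 or size_bytes & (size_bytes - 1):
        raise ValueError("size_bytes must be a power of two")
    return (address & (size_bytes - 1)) == 0
```

For example, address 1004 (hexadecimal) is naturally aligned for an S_floating or longword datum but not for a T_floating or quadword datum.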
2.2.6.3 X_ Floating
Support for 128-bit IEEE extended-precision (X_float) floating-point is initially provided
entirely through software. This section is included to preserve the intended consistency of
implementation with other IEEE floating-point data types, should the X_float data type be supported
in future hardware.
An IEEE extended-precision, or X_ floating, datum occupies 16 contiguous bytes in memory,
starting on an arbitrary byte boundary. The bits are labeled from right to left, 0 through 127, as
shown in Figure 2– 15.
[Figure 2–14 layout (:Fx): S <63>, Exp. <62:52>, Fraction Hi <51:32>, Fraction Lo <31:0>]
Figure 2– 15: X_ floating Datum
An X_ floating datum occupies two consecutive even/ odd floating-point registers (such as
F4/ F5), as shown in Figure 2– 16.
Figure 2– 16: X_ floating Register Format
An X_ floating datum is specified by its address A, the address of the byte containing bit 0. The
form of an X_ floating datum is sign magnitude with bit 127 the sign bit, bits <126: 112> an
excess– 16383 binary exponent, and bits <111: 0> a 112-bit fraction.
The value (V) of an X_ floating number is inferred from its constituent sign (S), exponent (E),
and fraction (F) fields as follows:
° If E=32767 and F<>0, then V is a NaN, regardless of S.
° If E=32767 and F=0, then V = (-1)**S x Infinity.
° If 0 < E < 32767, then V = (-1)**S x 2**(E-16383) x (1.F).
° If E=0 and F<>0, then V = (-1)**S x 2**(-16382) x (0.F).
° If E=0 and F=0, then V = (-1)**S x 0 (zero).
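Because an X_floating value does not fit in a native double, a software model has to use a wider representation. The sketch below is one possible approach, not the software support the architecture refers to: it decodes a 128-bit pattern into an exact rational number using Python's fractions module. The function name x_floating_value is hypothetical, infinities and NaN are returned as Python floats, and a negative zero is returned as plain zero because a rational number cannot carry the sign.

```python
import math
from fractions import Fraction


def x_floating_value(bits: int):
    """Decode a 128-bit X_floating pattern into an exact Fraction (or inf/nan)."""
    s = (bits >> 127) & 1             # sign, bit 127
    e = (bits >> 112) & 0x7FFF        # excess-16383 exponent, bits <126:112>
    f = bits & ((1 << 112) - 1)       # 112-bit fraction, bits <111:0>
    sign = -1 if s else 1
    if e == 32767:
        return math.nan if f != 0 else sign * math.inf
    if e == 0:
        # (0.F) x 2**(-16382)
        return sign * Fraction(f, 1 << 112) * Fraction(1, 2**16382)
    # (1.F) x 2**(E-16383)
    return sign * (1 + Fraction(f, 1 << 112)) * Fraction(2) ** (e - 16383)
```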
Note:
Alpha implementations will impose a significant performance penalty when accessing
X_ floating operands that are not naturally aligned. (A naturally aligned X_ floating datum
has zero as the low-order four bits of its address.)
X_floating Big-Endian Formats
Section 2.3 describes Alpha support for big-endian data types. It is intended that software or
hardware implementation for a big-endian X_float data type comply with that support and have
the following formats.
[Figure 2–15 layout: :A holds Fraction_low <63:0>; :A+8 holds S <63>, Exponent <62:48>, Fraction_high <47:0>]
[Figure 2–16 layout: S <127>, Exponent <126:112>, Fraction_high <111:64>, Fraction_low <63:0>]
Figure 2– 17: X_ floating Big-Endian Datum
Figure 2– 18: X_ floating Big-Endian Register Format
2.2. 7 Longword Integer Format in Floating-Point Unit
A longword integer operand occupies 32 bits in memory, arranged as shown in Figure 2–19.
Figure 2–19: Longword Integer Datum
A longword integer operand occupies 64 bits in a floating register, arranged as shown in Figure 2–20.
Figure 2–20: Longword Integer Floating-Register Format
There is no explicit longword load or store instruction; the S_ floating load/ store instructions
are used to move longword data into or out of the floating registers. The register bits <61: 59>
are set by the S_ floating load exponent mapping. They are ignored by S_ floating store. They
are also ignored in operands of a longword integer operate instruction, and they are set to 000
in the result of a longword operate instruction.
The register format bit <62> "I" in Figure 2–20 is part of the Integer field in Figure 2–19 and
represents the high-order bit of that field.
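The movement of a longword through a floating register can be modeled as follows. This is a sketch under assumptions: the function names are hypothetical, and the bit positions are those of the longword floating-register format (S in bit <63>, I in bit <62>, bits <61:59> produced by the S_floating exponent mapping, Integer in bits <58:29>, zeros in bits <28:0>).

```python
def longword_to_register(mem32: int) -> int:
    """Register image produced when an S_floating load moves a longword."""
    s = (mem32 >> 31) & 1
    i = (mem32 >> 30) & 1            # high-order bit of the Integer field
    low30 = mem32 & 0x3FFFFFFF       # memory bits <29:0>
    low7 = (mem32 >> 23) & 0x7F      # memory bits <29:23>, seen by MAP_S
    if i:
        xxx = 0b111 if low7 == 0x7F else 0b000
    else:
        xxx = 0b000 if low7 == 0 else 0b111
    return (s << 63) | (i << 62) | (xxx << 59) | (low30 << 29)


def register_to_longword(reg64: int) -> int:
    """Longword written by an S_floating store; bits <61:59> and <28:0> are ignored."""
    return ((reg64 >> 32) & 0xC0000000) | ((reg64 >> 29) & 0x3FFFFFFF)
```

Because the store ignores register bits <61:59>, a load followed by a store returns the original longword whatever the exponent mapping placed in those bits.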
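[Figure 2–17 layout: :A holds S, Exponent, Fraction_high; :A+8 holds Fraction_low; bytes are labeled Byte 0 through Byte 15]
[Figure 2–18 layout: S, Exponent, Fraction_high, Fraction_low held in the register pair Fn and Fn OR 1; bytes are labeled Byte 0 through Byte 15]
[Figure 2–19 layout (:A): S <31>, Integer <30:0>]
[Figure 2–20 layout (:Fx): S <63>, I <62>, xxx <61:59>, Integer <58:29>, 0 <28:0>]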
Note:
Alpha implementations will impose a significant performance penalty when accessing
longwords that are not naturally aligned. (A naturally aligned longword datum has zero as
the low-order two bits of its address.)
2.2. 8 Quadword Integer Format in Floating-Point Unit
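A quadword integer operand occupies 64 bits in memory, arranged as shown in Figure 2–21.
Figure 2–21: Quadword Integer Datum
[Figure layout: :A holds Integer Lo <31:0>; :A+4 holds S <31> and Integer Hi <30:0>]
A quadword integer operand occupies 64 bits in a floating register, arranged as shown in Figure 2–22.
Figure 2–22: Quadword Integer Floating-Register Format
[Figure layout (:Fx): S <63>, Integer Hi <62:32>, Integer Lo <31:0>]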
There is no explicit quadword load or store instruction; the T_floating load/store instructions
are used to move quadword data between memory and the floating registers. (The ITOFT and
FTOIT instructions are used to move quadword data between integer and floating registers.)
The T_ floating load instruction performs no bit reordering on input. The T_ floating store
instruction performs no bit reordering on output. This instruction does no checking of the data;
when used to store quadwords, the preceding operation should have specified a quadword
result.
Note:
Alpha implementations will impose a significant performance penalty when accessing
quadwords that are not naturally aligned. (A naturally aligned quadword datum has zero as
the low-order three bits of its address.)
2.2. 9 Data Types with No Hardware Support
The following VAX data types are not directly supported in Alpha hardware:
° Octaword
° H_floating
° D_floating (except load/store and convert to/from G_floating)
° Variable-Length Bit Field
° Character String
° Trailing Numeric String
° Leading Separate Numeric String
° Packed Decimal String
2. 3 Big-Endian Addressing Support
Alpha implementations may include optional big-endian addressing support.
In a little-endian machine, the bytes within a quadword are numbered right to left:
Figure 2–23: Little-Endian Byte Addressing
7 6 5 4 3 2 1 0
In a big-endian machine, they are numbered left to right:
Figure 2–24: Big-Endian Byte Addressing
0 1 2 3 4 5 6 7
Bit numbering within bytes is not affected by the byte numbering convention (big-endian or
little-endian).
The format for the X_floating big-endian data type is shown in Section 2.2.6.3.
The byte numbering convention does not matter when accessing complete aligned quadwords
in memory. However, the numbering convention does matter when accessing smaller or
unaligned quantities, or when manipulating data in registers, as follows:
° A quadword load or store of data at location 0 moves the same eight bytes under both
numbering conventions. However, a longword load or store of data at location 4 must
move the leftmost half of a quadword under the little-endian convention, and the rightmost
half under the big-endian convention. Thus, to support both conventions, the convention
being used must be known and it must affect longword load/store operations.
° A byte extract of byte 5 from a quadword of data into the low byte of a register requires
a right shift of 5 bytes under the little-endian convention, but a right shift of 2 bytes
under the big-endian convention.
° Manipulation of data in a register is almost the same for both conventions. In both,
integer and floating-point data have their sign bits in the leftmost byte and their least
significant bit in the rightmost byte, so the same integer and floating-point instructions are
used unchanged for both conventions. Big-endian character strings have their most
significant character on the left, while little-endian strings have their most significant
character on the right.
° The compare byte (CMPBGE) instruction is neutral about direction, doing eight byte
compares in parallel. However, the code that follows the CMPBGE instruction and examines
the byte mask to determine which string is larger differs, depending on
whether the rightmost or leftmost unequal byte is used. Thus, compilers must be
instructed to generate somewhat different code sequences for the two conventions.
Implementations that include big-endian support must supply all of the following features:
° A means at boot time to choose the byte numbering convention. The implementation is
not required to support dynamically changing the convention during program execution.
The chosen convention applies to all code executed, both operating-system and
user.
° If the big-endian convention is chosen, the longword-length load/store instructions
(LDF, LDL, LDL_L, LDS, STF, STL, STL_C, STS) invert bit va<2> (bit 2 of the virtual
address). This has the effect of accessing the half of a quadword other than the half
that would be accessed under the little-endian convention.
° If the big-endian convention is chosen, the word-length load instruction, LDWU,
inverts bits va<2:1> (bits 1 and 2 of the virtual address). This has the effect of accessing
the mirror-image word within the quadword, relative to the word that would be accessed
under the little-endian convention.
° If the big-endian convention is chosen, the byte-length load instruction, LDBU,
inverts bits va<2:0> (bits 0 through 2 of the virtual address). This has the effect of accessing
the mirror-image byte within the quadword, relative to the byte that would be accessed
under the little-endian convention.
° If the big-endian convention is chosen, the byte manipulation instructions (EXTxx,
INSxx, MSKxx) invert bits Rbv<2:0>. This has the effect of changing a shift of 5 bytes
into a shift of 2 bytes, for example.
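The address-bit inversions listed above amount to an exclusive-OR of the low three address bits with a size-dependent mask. The Python below is an illustrative sketch, not a description of any implementation: the function names big_endian_va and extract_byte are hypothetical, and only the access sizes named in the list are modeled.

```python
# Bits of the address (or of Rbv<2:0>) inverted under the big-endian convention.
INVERT_MASK = {8: 0b000, 4: 0b100, 2: 0b110, 1: 0b111}


def big_endian_va(va: int, size_bytes: int) -> int:
    """Address actually accessed for a big-endian access of the given size."""
    return va ^ INVERT_MASK[size_bytes]


def extract_byte(quadword: int, byte_number: int, big_endian: bool) -> int:
    """Extract one byte; the shift count has its low three bits inverted if big-endian."""
    shift_bytes = (byte_number ^ 0b111) if big_endian else byte_number
    return (quadword >> (8 * shift_bytes)) & 0xFF
```

With these definitions, extracting byte 5 uses a right shift of 5 bytes under the little-endian convention and a right shift of 2 bytes under the big-endian convention, matching the example given earlier in this section.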
The instruction stream is always considered to be little-endian, and is independent of the chosen
byte numbering convention. Compilers, linkers, and debuggers must be aware of this when
accessing an instruction stream using data-stream load/store instructions. Thus, the rightmost
instruction in a quadword is always executed first and always has the instruction-stream
address 0 MOD 8. The same bytes accessed by a longword load/store instruction have data-stream
address 0 MOD 8 under the little-endian convention, and 4 MOD 8 under the big-endian
convention.
Using either byte numbering convention, it is sometimes necessary to access data that originated
on a machine that used the other convention. When this occurs, it is often necessary to
swap the bytes within a datum. See Section A.4.3 for a suggested code sequence.
Chapter 3
Instruction Formats
3.1 Alpha Registers
Each Alpha processor has a set of registers that hold the current processor state. If an Alpha
system contains multiple Alpha processors, there are multiple per-processor sets of these
registers.
3.1. 1 Program Counter
The Program Counter (PC) is a special register that addresses the instruction stream. As each
instruction is decoded, the PC is advanced to the next sequential instruction. This is referred to
as the updated PC. Any instruction that uses the value of the PC will use the updated PC. The
PC includes only bits <63:2> with bits <1:0> treated as RAZ/IGN. This quantity is a
longword-aligned byte address. The PC is an implied operand on conditional branch and subroutine
jump instructions. The PC is not accessible as an integer register.
3.1. 2 Integer Registers
There are 32 integer registers (R0 through R31), each 64 bits wide.
Register R31 is assigned special meaning by the Alpha architecture. When R31 is specified as
a register source operand, a zero-valued operand is supplied.
For all cases except the Unconditional Branch and Jump instructions, results of an instruction
that specifies R31 as a destination operand are discarded. Also, it is UNPREDICTABLE
whether the other destination operands (implicit and explicit) are changed by the instruction. It
is implementation dependent to what extent the instruction is actually executed once it has
been fetched. An exception is never signaled for a load that specifies R31 as a destination operand.
For all other operations, it is UNPREDICTABLE whether exceptions are signaled during
the execution of such an instruction. Note, however, that exceptions associated with the
instruction fetch of such an instruction are always signaled.
Implementation note:
As described in Section A.3.5, certain load instructions to an R31 destination are the
preferred method for performing a cache block prefetch.
There are some interesting cases involving R31 as a destination:
° STx_C R31,disp(Rb)
Although this might seem like a good way to zero out a shared location and reset the
lock_flag, this instruction causes the lock_flag and virtual location {Rbv +
SEXT(disp)} to become UNPREDICTABLE.
° LDx_L R31,disp(Rb)
This instruction produces no useful result since it causes both lock_flag and
locked_physical_address to become UNPREDICTABLE.
Unconditional Branch (BR and BSR) and Jump (JMP, JSR, RET, and JSR_COROUTINE)
instructions, when R31 is specified as the Ra operand, execute normally and update the PC
with the target virtual address. Of course, no PC value can be saved in R31.
3.1. 3 Floating-Point Registers
There are 32 floating-point registers (F0 through F31), each 64 bits wide.
When F31 is specified as a register source operand, a true zero-valued operand is supplied. See
Section 4.7.3 for a definition of true zero.
Results of an instruction that specifies F31 as a destination operand are discarded and it is
UNPREDICTABLE whether the other destination operands (implicit and explicit) are changed
by the instruction. In this case, it is implementation-dependent to what extent the instruction is
actually executed once it has been fetched. An exception is never signaled for a load that specifies
F31 as a destination operand. For all other operations, it is UNPREDICTABLE whether
exceptions are signaled during the execution of such an instruction. Note, however, that exceptions
associated with the instruction fetch of such an instruction are always signaled.
Implementation note:
As described in Section A.3.5, certain load instructions to an F31 destination are the
preferred method for signalling a cache block prefetch.
A floating-point instruction that operates on single-precision data reads all bits <63: 0> of the
source floating-point register. A floating-point instruction that produces a single-precision
result writes all bits <63: 0> of the destination floating-point register.
3.1.4 Lock Registers
There are two per-processor registers associated with the LDx_L and STx_C instructions, the
lock_flag and the locked_physical_address register. The use of these registers is described in
Section 4.2.
3.1. 5 Processor Cycle Counter (PCC) Register
The PCC register consists of two 32-bit fields. The low-order 32 bits (PCC< 31: 0>) are an
unsigned wrapping counter, PCC_ CNT. The high-order 32 bits (PCC< 63: 32>), PCC_ OFF, are
operating system dependent in their implementation.
PCC_ CNT is the base clock register for measuring time intervals and is suitable for timing
intervals on the order of nanoseconds.
PCC_CNT increments once per N CPU cycles, where N is an implementation-specific integer
in the range 1..16. The cycle counter frequency is the number of times the processor cycle
counter gets incremented per second. The integer count wraps to 0 from a count of FFFF FFFF
(hexadecimal). The counter wraps no more frequently than 1.5 times the implementation's interval
clock interrupt period (which is two thirds of the interval clock interrupt frequency), which
guarantees that an interrupt occurs before PCC_CNT overflows twice.
PCC_ OFF need not contain a value related to time and could contain all zeros in a simple
implementation. However, if PCC_ OFF is used to calculate a per-process or per-thread cycle
count, it must contain a value that, when added to PCC_ CNT, returns the total PCC register
count for that process or thread, modulo 2** 32.
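The modulo-2**32 arithmetic described above can be illustrated as follows. This Python is a sketch only: the function names are hypothetical, the 64-bit PCC value is assumed to have been obtained already (for example from RPCC), and the interval calculation is meaningful only if the counter has wrapped at most once between the two readings.

```python
MASK32 = 0xFFFFFFFF


def pcc_fields(pcc: int):
    """Split a 64-bit PCC value into (PCC_CNT, PCC_OFF)."""
    return pcc & MASK32, (pcc >> 32) & MASK32


def process_cycle_count(pcc: int) -> int:
    """Per-process count: PCC_CNT plus PCC_OFF, modulo 2**32."""
    cnt, off = pcc_fields(pcc)
    return (cnt + off) & MASK32


def elapsed_counts(start_pcc: int, end_pcc: int) -> int:
    """Counter increments between two readings, modulo 2**32."""
    return (process_cycle_count(end_pcc) - process_cycle_count(start_pcc)) & MASK32
```

The masking step is what makes the subtraction come out correctly when PCC_CNT wraps from FFFF FFFF to 0 between the two readings.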
Implementation Note:
OpenVMS Alpha and DIGITAL UNIX supply a per-process value in PCC_ OFF.
PCC is required on all implementations. It is required for every processor, and each processor
on a multiprocessor system has its own private, independent PCC.
The PCC is read by the RPCC instruction. See Section 4.11.8.
3.1. 6 Optional Registers
Some Alpha implementations may include optional memory prefetch or VAX compatibility
processor registers.
3.1.6.1 Memory Prefetch Registers
If the prefetch instructions FETCH and FETCH_M are implemented, an implementation will
include two sets of state prefetch registers used by those instructions. The use of these registers
is described in Section 4.11. These registers are not directly accessible by software and are
listed for completeness.
3.1.6.2 VAX Compatibility Register
The VAX compatibility instructions RC and RS include the intr_ flag register, as described in
Section 4. 12.
3.2 Notation
The notation used to describe the operation of each instruction is given as a sequence of control
and assignment statements in an ALGOL-like syntax.
3.2. 1 Operand Notation
Tables 3–1, 3–2, and 3–3 list the notation for the operands, the operand values, and the other
expression operands.
Table 3–1: Operand Notation
Notation      Meaning
Ra            An integer register operand in the Ra field of the instruction
Rb            An integer register operand in the Rb field of the instruction
#b            An integer literal operand in the Rb field of the instruction
Rc            An integer register operand in the Rc field of the instruction
Fa            A floating-point register operand in the Ra field of the instruction
Fb            A floating-point register operand in the Rb field of the instruction
Fc            A floating-point register operand in the Rc field of the instruction
Table 3–2: Operand Value Notation
Notation      Meaning
Rav           The value of the Ra operand. This is the contents of register Ra.
Rbv           The value of the Rb operand. This could be the contents of register Rb, or
              a zero-extended 8-bit literal in the case of an Operate format instruction.
Fav           The value of the floating point Fa operand. This is the contents of register Fa.
Fbv           The value of the floating point Fb operand. This is the contents of register Fb.
Table 3–3: Expression Operand Notation
Notation      Meaning
IPR_x         Contents of Internal Processor Register x
IPR_SP[mode]  Contents of the per-mode stack pointer selected by mode
PC            Updated PC value
Rn            Contents of integer register n
Fn            Contents of floating-point register n
X[m]          Element m of array X
3.2. 2 Instruction Operand Notation
The notation used to describe instruction operands follows from the operand specifier notation
used in the VAX Architecture Standard. Instruction operands are described as follows:
<name>.<access type><data type>
3.2.2.1 Operand Name Notation
Specifies the instruction field (Ra, Rb, Rc, or disp) and register type of the operand (integer or
floating). It can be one of the following:
Table 3–4: Operand Name Notation
Name          Meaning
disp          The displacement field of the instruction
fnc           The PALcode function field of the instruction
Ra            An integer register operand in the Ra field of the instruction
Rb            An integer register operand in the Rb field of the instruction
#b            An integer literal operand in the Rb field of the instruction
Rc            An integer register operand in the Rc field of the instruction
Fa            A floating-point register operand in the Ra field of the instruction
Fb            A floating-point register operand in the Rb field of the instruction
Fc            A floating-point register operand in the Rc field of the instruction
3.2.2.2 Operand Access Type Notation
A letter that denotes the operand access type:
Table 3–5: Operand Access Type Notation
Access Type   Meaning
a             The operand is used in an address calculation to form an effective
              address. The data type code that follows indicates the units of addressability
              (or scale factor) applied to this operand when the instruction is
              decoded.
              For example:
              ".al" means scale by 4 (longwords) to get byte units (used in branch
              displacements); ".ab" means the operand is already in byte units (used in
              load/store instructions).
i             The operand is an immediate literal in the instruction.
r             The operand is read only.
m             The operand is both read and written.
w             The operand is write only.
3.2.2.3 Operand Data Type Notation
A letter that denotes the data type of the operand:
Table 3–6: Operand Data Type Notation
Data Type     Meaning
b             Byte
f             F_floating
g             G_floating
l             Longword
q             Quadword
s             IEEE single floating (S_floating)
t             IEEE double floating (T_floating)
w             Word
x             The data type is specified by the instruction
3.2.3 Operators
Table 3–7 describes the operators:
Table 3–7: Operators
Operator      Meaning
!             Comment delimiter
+             Addition
-             Subtraction
*             Signed multiplication
*U            Unsigned multiplication
**            Exponentiation (left argument raised to right argument)
/             Division
←             Replacement
||                          Bit concatenation
{}                          Indicates explicit operator precedence
(x)                         Contents of memory location whose address is x
x<m:n>                      Contents of bit field of x defined by bits n through m
x<m>                        M'th bit of x
ACCESS(x,y)                 Accessibility of the location whose address is x using the
                            access mode y. Returns a Boolean value TRUE if the
                            address is accessible, else FALSE.
AND                         Logical product
ARITH_RIGHT_SHIFT(x,y)      Arithmetic right shift of first operand by the second operand.
                            Y is an unsigned shift value. Bit 63, the sign bit, is
                            copied into vacated bit positions and shifted out bits are
                            discarded.
BYTE_ZAP(x,y)               X is a quadword, y is an 8-bit vector in which each bit
                            corresponds to a byte of the result. The y bit to x byte
                            correspondence is y<n> ↔ x<8n+7:8n>. This correspondence
                            also exists between y and the result.
                            For each bit of y from n = 0 to 7, if y<n> is 0 then byte
                            <n> of x is copied to byte <n> of result, and if y<n> is 1
                            then byte <n> of result is forced to all zeros.
CASE                        The CASE construct selects one of several actions based
                            on the value of its argument. The form of a case is:
                            CASE argument OF
                                argvalue1: action_1
                                argvalue2: action_2
                                ...
                                argvaluen: action_n
                                [otherwise: default_action]
                            ENDCASE
                            If the value of argument is argvalue1 then action_1 is executed;
                            if argument = argvalue2, then action_2 is executed,
                            and so forth.
                            Once a single action is executed, the code stream breaks
                            to the ENDCASE (there is an implicit break as in Pascal).
                            Each action may nonetheless be a sequence of
                            pseudocode operations, one operation per line.
                            Optionally, the last argvalue may be the atom 'otherwise'.
                            The associated default action will be taken if none of the
                            other argvalues match the argument.
DIV                         Integer division (truncates)
LEFT_SHIFT(x,y)             Logical left shift of first operand by the second operand. Y
                            is an unsigned shift value. Zeros are moved into the
                            vacated bit positions, and shifted out bits are discarded.
LOAD_LOCKED                 The processor records the target physical address in a
                            per-processor locked_physical_address register and sets the
                            per-processor lock_flag.
lg                          Log to the base 2.
MAP_x                       F_float or S_float memory-to-register exponent mapping
                            function.
MAXS(x,y)                   Returns the larger of x and y, with x and y interpreted as
                            signed integers.
MAXU(x,y)                   Returns the larger of x and y, with x and y interpreted as
                            unsigned integers.
MINS(x,y)                   Returns the smaller of x and y, with x and y interpreted as
                            signed integers.
MINU(x,y)                   Returns the smaller of x and y, with x and y interpreted as
                            unsigned integers.
x MOD y                     x modulo y
NOT                         Logical (ones) complement
OR                          Logical sum
PHYSICAL_ADDRESS            Translation of a virtual address
PRIORITY_ENCODE             Returns the bit position of most significant set bit, interpreting
                            its argument as a positive integer (= int(lg(x))). For
                            example:
                            priority_encode(255) = 7
Relational Operators:
                            Operator  Meaning
                            LT        Less than signed
                            LTU       Less than unsigned
                            LE        Less or equal signed
                            LEU       Less or equal unsigned
                            EQ        Equal signed and unsigned
                            NE        Not equal signed and unsigned
                            GE        Greater or equal signed
                            GEU       Greater or equal unsigned
                            GT        Greater signed
                            GTU       Greater unsigned
                            LBC       Low bit clear
                            LBS       Low bit set
RIGHT_SHIFT(x,y)            Logical right shift of first operand by the second operand.
                            Y is an unsigned shift value. Zeros are moved into
                            vacated bit positions, and shifted out bits are discarded.
SEXT(x)                     X is sign-extended to the required size.
STORE_CONDITIONAL           If the lock_flag is set, then do the indicated store and clear
                            the lock_flag.
3.2. 4 Notation Conventions
The following conventions are used:
° Only operands that appear on the left side of a replacement operator are modified.
° No operator precedence is assumed other than that replacement (←) has the lowest
precedence. Explicit precedence is indicated by the use of "{}".
° All arithmetic, logical, and relational operators are defined in the context of their
operands. For example, "+" applied to G_floating operands means a G_floating add,
whereas "+" applied to quadword operands is an integer add. Similarly, "LT" is a
G_floating comparison when applied to G_floating operands and an integer comparison
when applied to quadword operands.
3.3 Instruction Formats
There are five basic Alpha instruction formats:
° Memory
° Branch
° Operate
° Floating-point Operate
° PALcode
All instruction formats are 32 bits long with a 6-bit major opcode field in bits <31: 26> of the
instruction.
Any unused register field (Ra, Rb, Fa, Fb) of an instruction must be set to a value of 31.
Software Note:
There are several instructions, each formatted as a memory instruction, that do not use the
Ra and/or Rb fields. These instructions are: Memory Barrier, Fetch, Fetch_M, Read
Processor Cycle Counter, Read and Clear, Read and Set, and Trap Barrier.
Table 3–7: Operators (Continued)
Operator                    Meaning
TEST(x,cond)                The contents of register x are tested for branch condition
                            (cond) true. TEST returns a Boolean value TRUE if x
                            bears the specified relation to 0, else FALSE is returned.
                            Integer and floating test conditions are drawn from the
                            preceding list of relational operators.
XOR                         Logical difference
ZEXT(x)                     X is zero-extended to the required size.
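Several of the operators defined in Table 3–7 can be modeled directly on 64-bit values. The Python below is a sketch for illustration, not a normative definition: the function names mirror the table entries, values are held as unsigned 64-bit patterns, the from_bits parameter of sext is an assumption added here because the table leaves the source width implicit, and shift counts are assumed to be in the range 0 through 63.

```python
MASK64 = (1 << 64) - 1


def byte_zap(x: int, y: int) -> int:
    """Copy byte n of x to the result where y<n> is 0; force it to zero where y<n> is 1."""
    result = 0
    for n in range(8):
        if ((y >> n) & 1) == 0:
            result |= x & (0xFF << (8 * n))
    return result & MASK64


def priority_encode(x: int) -> int:
    """Bit position of the most significant set bit of a positive integer."""
    if x <= 0:
        raise ValueError("argument must be a positive integer")
    return x.bit_length() - 1


def sext(x: int, from_bits: int) -> int:
    """Sign-extend the low from_bits bits of x to a 64-bit pattern."""
    x &= (1 << from_bits) - 1
    if x >> (from_bits - 1):
        x |= MASK64 & ~((1 << from_bits) - 1)
    return x


def arith_right_shift(x: int, y: int) -> int:
    """Shift right by y, copying bit 63 into the vacated positions."""
    x &= MASK64
    signed = x - (1 << 64) if x >> 63 else x
    return (signed >> y) & MASK64
```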
3.3. 1 Memory Instruction Format
The Memory format is used to transfer data between registers and memory, to load an effective
address, and for subroutine jumps. It has the format shown in Figure 3–1.
Figure 3– 1: Memory Instruction Format
A Memory format instruction contains a 6-bit opcode field, two 5-bit register address fields, Ra
and Rb, and a 16-bit signed displacement field.
The displacement field is a byte offset. It is sign-extended and added to the contents of register
Rb to form a virtual address. Overflow is ignored in this calculation.
The virtual address is used as a memory load/store address or a result value, depending on the
specific instruction. The virtual address (va) is computed as follows for all memory format
instructions except the load address high (LDAH):
va ← {Rbv + SEXT(Memory_disp)}
For LDAH the virtual address (va) is computed as follows:
va ← {Rbv + SEXT(Memory_disp*65536)}
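The two address computations above can be modeled as follows. This is a sketch under assumptions: the function name memory_format_va is hypothetical, the value of Rb is passed in by the caller, and the LDAH opcode value 09 (hexadecimal) is taken from the Alpha opcode summary rather than from this section.

```python
MASK64 = (1 << 64) - 1
LDAH_OPCODE = 0x09  # assumption: opcode value from the opcode summary, not this section


def memory_format_va(inst: int, rbv: int) -> int:
    """Compute va for a 32-bit Memory format instruction word."""
    opcode = (inst >> 26) & 0x3F          # bits <31:26>
    disp = inst & 0xFFFF                  # Memory_disp, bits <15:0>
    if disp & 0x8000:                     # SEXT of the 16-bit displacement
        disp -= 0x10000
    if opcode == LDAH_OPCODE:
        disp *= 65536
    # Overflow is ignored: the sum wraps to 64 bits.
    return (rbv + disp) & MASK64
```

A displacement of FFFC (hexadecimal) is therefore treated as -4, and the same field under LDAH contributes -4 times 65536.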
3.3.1.1 Memory Format Instructions with a Function Code
Memory format instructions with a function code replace the memory displacement field in the
memory instruction format with a function code that designates a set of miscellaneous instructions.
The format is shown in Figure 3–2.
Figure 3– 2: Memory Instruction with Function Code Format
The memory instruction with function code format contains a 6-bit opcode field and a 16-bit
function field. Unused function codes produce UNPREDICTABLE but not UNDEFINED
results; they are not security holes.
There are two fields, Ra and Rb. The usage of those fields depends on the instruction. See
Section 4.11.
[Figure 3–1 layout: Opcode <31:26>, Ra <25:21>, Rb <20:16>, Memory_disp <15:0>]
[Figure 3–2 layout: Opcode <31:26>, Ra <25:21>, Rb <20:16>, Function <15:0>]
3.3.1.2 Memory Format Jump Instructions
For computed branch instructions (JMP, JSR, RET, JSR_COROUTINE) the displacement
field is used to provide branch-prediction hints as described in Section 4.3.
3.3. 2 Branch Instruction Format
The Branch format is used for conditional branch instructions and for PC-relative subroutine
jumps. It has the format shown in Figure 3–3.
Figure 3– 3: Branch Instruction Format
A Branch format instruction contains a 6-bit opcode field, one 5-bit register address field (Ra),
and a 21-bit signed displacement field.
The displacement is treated as a longword offset. This means it is shifted left two bits (to
address a longword boundary), sign-extended to 64 bits, and added to the updated PC to form
the target virtual address. Overflow is ignored in this calculation. The target virtual address
(va) is computed as follows:
va ← PC + {4*SEXT(Branch_disp)}
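The branch target computation can be sketched in the same style. The function name branch_target is hypothetical, and updated_pc is assumed to be the address of the instruction following the branch, as described in Section 3.1.1.

```python
MASK64 = (1 << 64) - 1


def branch_target(updated_pc: int, inst: int) -> int:
    """Compute the target va for a 32-bit Branch format instruction word."""
    disp = inst & 0x1FFFFF                # Branch_disp, bits <20:0>
    if disp & 0x100000:                   # SEXT of the 21-bit displacement
        disp -= 0x200000
    # The displacement is a longword offset; overflow is ignored.
    return (updated_pc + 4 * disp) & MASK64
```

A displacement of zero therefore falls through to the next instruction, and a displacement of -1 branches back to the branch instruction itself.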
3.3. 3 Operate Instruction Format
The Operate format is used for instructions that perform integer register to integer register
operations. The Operate format allows the specification of one destination operand and two
source operands. One of the source operands can be a literal constant. The Operate format in
Figure 3–4 shows the two cases when bit <12> of the instruction is 0 and 1.
Figure 3– 4: Operate Instruction Format
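[Figure 3–3 layout: Opcode <31:26>, Ra <25:21>, Branch_disp <20:0>]
[Figure 3–4 layout, bit <12> = 0: Opcode <31:26>, Ra <25:21>, Rb <20:16>, SBZ <15:13>, 0 <12>, Function <11:5>, Rc <4:0>]
[Figure 3–4 layout, bit <12> = 1: Opcode <31:26>, Ra <25:21>, LIT <20:13>, 1 <12>, Function <11:5>, Rc <4:0>]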
An Operate format instruction contains a 6-bit opcode field and a 7-bit function code field.
Unused function codes for opcodes defined as reserved in the Version 5 Alpha architecture
specification (May 1992) produce an illegal instruction trap. Those opcodes are 01, 02, 03, 04,
05, 06, 07, 0A, 0C, 0D, 0E, 14, 19, 1B, 1D, 1E, and 1F. For other opcodes, unused function
codes produce UNPREDICTABLE but not UNDEFINED results; they are not security holes.
There are three operand fields, Ra, Rb, and Rc.
The Ra field specifies a source operand. Symbolically, the integer Rav operand is formed as
follows:
IF inst<25:21> EQ 31 THEN
    Rav ← 0
ELSE
    Rav ← Ra
END
The Rb field specifies a source operand. Integer operands can specify a literal or an integer
register using bit <12> of the instruction.
If bit <12> of the instruction is 0, the Rb field specifies a source register operand.
If bit <12> of the instruction is 1, an 8-bit zero-extended literal constant is formed by bits
<20: 13> of the instruction. The literal is interpreted as a positive integer between 0 and 255
and is zero-extended to 64 bits. Symbolically, the integer Rbv operand is formed as follows:
IF inst<12> EQ 1 THEN
    Rbv ← ZEXT(inst<20:13>)
ELSE
    IF inst<20:16> EQ 31 THEN
        Rbv ← 0
    ELSE
        Rbv ← Rb
    END
END
The Rc field specifies a destination operand.
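The operand rules above can be sketched in Python. This is a minimal illustration, not part of the architecture definition: the function name operate_sources and the use of a 32-entry list for the integer register file are assumptions made for the example.

```python
def operate_sources(inst, regs):
    """Return (Rav, Rbv) for a 32-bit Operate-format instruction word.

    regs is a hypothetical 32-entry list holding the integer register values.
    """
    ra = (inst >> 21) & 0x1F                 # inst<25:21>
    rav = 0 if ra == 31 else regs[ra]        # R31 always reads as zero

    if (inst >> 12) & 1:                     # inst<12> = 1: literal form
        rbv = (inst >> 13) & 0xFF            # ZEXT(inst<20:13>), range 0..255
    else:                                    # inst<12> = 0: register form
        rb = (inst >> 16) & 0x1F             # inst<20:16>
        rbv = 0 if rb == 31 else regs[rb]
    return rav, rbv
```

The literal form reuses the bits that would otherwise hold Rb and SBZ, which is why a literal is limited to 8 bits and is always non-negative.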
3.3. 4 Floating-Point Operate Instruction Format
The Floating-point Operate format is used for instructions that perform floating-point register
to floating-point register operations. The Floating-point Operate format allows the specification
of one destination operand and two source operands. The Floating-point Operate format is
shown in Figure 3–5.
Figure 3– 5: Floating-Point Operate Instruction Format
Opcode <31:26> | Fa <25:21> | Fb <20:16> | Function <15:5> | Fc <4:0>
A Floating-point Operate format instruction contains a 6-bit opcode field and an 11-bit function
field. Unused function codes for those opcodes defined as reserved in the Version 5 Alpha
architecture specification (May 1992) produce an illegal instruction trap. Those opcodes are
01, 02, 03, 04, 05, 06, 07, 14, 19, 1B, 1D, 1E, and 1F. For other opcodes, unused function
codes produce UNPREDICTABLE but not UNDEFINED results; they are not security holes.
There are three operand fields, Fa, Fb, and Fc. Each operand field specifies either an integer or
floating-point operand as defined by the instruction.
The Fa field specifies a source operand. Symbolically, the Fav operand is formed as follows:
IF inst<25:21> EQ 31 THEN
    Fav ← 0
ELSE
    Fav ← Fa
END
The Fb field specifies a source operand. Symbolically, the Fbv operand is formed as follows:
IF inst<20:16> EQ 31 THEN
    Fbv ← 0
ELSE
    Fbv ← Fb
END
Note:
Neither Fa nor Fb can be a literal in Floating-point Operate instructions.
The Fc field specifies a destination operand.
3.3.4.1 Floating-Point Convert Instructions
Floating-point Convert instructions use a subset of the Floating-point Operate format and perform
register-to-register conversion operations. The Fb operand specifies the source; the Fa
field must be F31.
3.3.4.2 Floating-Point/ Integer Register Moves
Instructions that move data between a floating-point register file and an integer register file are
a subset of the Floating-point Operate format. The unused source field must be 31.
3.3. 5 PALcode Instruction Format
The Privileged Architecture Library (PALcode) format is used to specify extended processor
functions. It has the format shown in Figure 3–6.
Figure 3–6: PALcode Instruction Format
Opcode <31:26> | PALcode Function <25:0>
The 26-bit PALcode function field specifies the operation. The source and destination operands
for PALcode instructions are supplied in fixed registers that are specified in the individual
instruction descriptions.
An opcode of zero and a PALcode function of zero specify the HALT instruction.
Chapter 4
Instruction Descriptions
4.1 Instruction Set Overview
This chapter describes the instructions implemented by the Alpha architecture. The instruction
set is divided into the following sections:
Instruction Type                    Section
Integer load and store              4.2
Integer control                     4.3
Integer arithmetic                  4.4
Logical and shift                   4.5
Byte manipulation                   4.6
Floating-point load and store       4.7
Floating-point control              4.8
Floating-point branch               4.9
Floating-point operate              4.10
Miscellaneous                       4.11
VAX compatibility                   4.12
Multimedia (graphics and video)     4.13
Within each major section, closely related instructions are combined into groups and described
together.
The instruction group description is composed of the following:
° The group name
° The format of each instruction in the group, which includes the name, access type, and data type of each instruction operand
° The operation of the instruction
° Exceptions specific to the instruction
° The instruction mnemonic and name of each instruction in the group
° Qualifiers specific to the instructions in the group
° A description of the instruction operation
° Optional programming examples and optional notes on the instruction
4.1.1 Subsetting Rules
An instruction that is omitted in a subset implementation of the Alpha architecture is not performed
in either hardware or PALcode. System software may provide emulation routines for
subsetted instructions.
4.1. 2 Floating-Point Subsets
Floating-point support is optional on an Alpha processor. An implementation that supports
floating-point must implement the following:
° The 32 floating-point registers
° The Floating-point Control Register (FPCR) and the instructions to access it
° The floating-point branch instructions
° The floating-point copy sign (CPYSx) instructions
° The floating-point convert instructions
° The floating-point conditional move instruction (FCMOV)
° The S_ floating and T_ floating memory operations
Software Note:
A system that will not support floating-point operations is still required to provide the 32
floating-point registers, the Floating-point Control Register (FPCR) and the instructions to
access it, and the T_ floating memory operations if the system intends to support the
OpenVMS Alpha operating system. This requirement facilitates the implementation of a
floating-point emulator and simplifies context-switching.
In addition, floating-point support requires at least one of the following subset groups:
1. VAX Floating-point Operate and Memory instructions (F_ and G_ floating).
2. IEEE Floating-point Operate instructions (S_ and T_ floating). Within this group, an implementation can choose to include or omit separately the ability to perform IEEE
rounding to plus infinity and minus infinity.
Note:
If one instruction in a group is provided, all other instructions in that group must be
provided. An implementation with full floating-point support includes both groups; a
subset floating-point implementation supports only one of these groups. The individual
instruction descriptions indicate whether an instruction can be subsetted.
4.1. 3 Software Emulation Rules
General-purpose layered and application software that executes in User mode may assume that
certain loads (LDL, LDQ, LDF, LDG, LDS, and LDT) and certain stores (STL, STQ, STF,
STG, STS, and STT) of unaligned data are emulated by system software. General-purpose layered
and application software that executes in User mode may assume that subsetted
instructions are emulated by system software. Frequent use of emulation may be significantly
slower than using alternative code sequences.
Emulation of loads and stores of unaligned data and subsetted instructions need not be provided
in privileged access modes. System software that supports special-purpose dedicated
applications need not provide emulation in User mode if emulation is not needed for correct
execution of the special-purpose applications.
4.1. 4 Opcode Qualifiers
Some Operate format and Floating-point Operate format instructions have several variants. For
example, for the VAX formats, Add F_floating (ADDF) is supported with and without floating
underflow enabled and with either chopped or VAX rounding. For IEEE formats, IEEE
unbiased rounding, chopped, round toward plus infinity, and round toward minus infinity can
be selected.
The different variants of such instructions are denoted by opcode qualifiers, which consist of a
slash (/) followed by a string of selected qualifiers. Each qualifier is denoted by a single character
as shown in Table 4–1. The opcodes for each qualifier are listed in Appendix C.
The default values are normal rounding, exception completion disabled, inexact result disabled,
floating underflow disabled, and integer overflow disabled.
Table 4– 1: Opcode Qualifiers
Qualifier Meaning
C Chopped rounding
D Rounding mode dynamic
M Round toward minus infinity
I Inexact result enable
S Exception completion enable
U Floating underflow enable
V Integer overflow enable
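As an illustration of the qualifier notation, the following Python sketch splits a qualified mnemonic into its base name and the meanings listed in Table 4–1. The function name split_qualifiers is hypothetical. The sketch only decodes the notation; it does not check whether a given combination is actually defined for an instruction, which is determined by the opcode tables in Appendix C.

```python
# Qualifier letters and meanings, copied from Table 4-1.
QUALIFIERS = {
    "C": "Chopped rounding",
    "D": "Rounding mode dynamic",
    "M": "Round toward minus infinity",
    "I": "Inexact result enable",
    "S": "Exception completion enable",
    "U": "Floating underflow enable",
    "V": "Integer overflow enable",
}


def split_qualifiers(mnemonic):
    """Split 'NAME/XYZ' into ('NAME', [meaning of X, meaning of Y, ...])."""
    base, _, letters = mnemonic.partition("/")
    unknown = [ch for ch in letters if ch not in QUALIFIERS]
    if unknown:
        raise ValueError("unknown qualifier(s): " + "".join(unknown))
    return base, [QUALIFIERS[ch] for ch in letters]
```

A mnemonic with no slash yields an empty list, which corresponds to the default values described above.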
4. 2 Memory Integer Load/ Store Instructions
The instructions in this section move data between the integer registers and memory.
They use the Memory instruction format. The instructions are summarized in Table 4–2.
Table 4– 2: Memory Integer Load/ Store Instructions
Mnemonic Operation
LDA Load Address
LDAH Load Address High
LDBU Load Zero-Extended Byte from Memory to Register
LDL Load Sign-Extended Longword
LDL_ L Load Sign-Extended Longword Locked
LDQ Load Quadword
LDQ_ L Load Quadword Locked
LDQ_ U Load Quadword Unaligned
LDWU Load Zero-Extended Word from Memory to Register
STB Store Byte
STL Store Longword
STL_ C Store Longword Conditional
STQ Store Quadword
STQ_ C Store Quadword Conditional
STQ_ U Store Quadword Unaligned
STW Store Word
4. 2.1 Load Address
Format:
LDAx Ra.wq,disp.ab(Rb.ab) !Memory format
Operation:
Ra ← Rbv + SEXT(disp) !LDA
Ra ← Rbv + SEXT(disp*65536) !LDAH
Exceptions:
None
Instruction mnemonics:
LDA Load Address
LDAH Load Address High
Qualifiers:
None
Description:
The virtual address is computed by adding register Rb to the sign-extended 16-bit displacement
for LDA, and 65536 times the sign-extended 16-bit displacement for LDAH. The 64-bit
result is written to register Ra.
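The two operations can be modelled in a few lines of Python. The helper split_constant is not described in this section; it is a hypothetical illustration of a common use of the pair, in which a 32-bit constant is built with an LDAH followed by an LDA. All names here are assumptions made for the sketch.

```python
MASK64 = (1 << 64) - 1


def sext16(disp):
    """Sign-extend a 16-bit displacement field."""
    disp &= 0xFFFF
    return disp - 0x10000 if disp & 0x8000 else disp


def lda(rbv, disp):
    """Ra <- Rbv + SEXT(disp)"""
    return (rbv + sext16(disp)) & MASK64


def ldah(rbv, disp):
    """Ra <- Rbv + SEXT(disp*65536)"""
    return (rbv + sext16(disp) * 65536) & MASK64


def split_constant(value):
    """Return (high, low) 16-bit fields so that LDAH then LDA yields value.

    Because LDA sign-extends its displacement, the high part is adjusted
    upward by one whenever the low part has bit 15 set.
    """
    low = sext16(value)
    high = (value - low) >> 16
    if not -0x8000 <= high <= 0x7FFF:
        raise ValueError("constant does not fit an LDAH/LDA pair")
    return high & 0xFFFF, low & 0xFFFF
```

For example, 0x12348765 splits into a high field of 0x1235 and a low field of 0x8765: LDAH produces 0x12350000, and LDA then adds the negative value SEXT(0x8765).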
4.2. 2 Load Memory Data into Integer Register
Format:
LDx Ra.wq,disp.ab(Rb.ab) !Memory format
Operation:
va ← {Rbv + SEXT(disp)}
CASE
    big_endian_data: va' ← va XOR 000₂ !LDQ
    big_endian_data: va' ← va XOR 100₂ !LDL
    big_endian_data: va' ← va XOR 110₂ !LDWU
    big_endian_data: va' ← va XOR 111₂ !LDBU
    little_endian_data: va' ← va
ENDCASE
Ra ← (va')<63:0> !LDQ
Ra ← SEXT((va')<31:0>) !LDL
Ra ← ZEXT((va')<15:0>) !LDWU
Ra ← ZEXT((va')<07:0>) !LDBU
Exceptions:
Access Violation
Alignment
Fault on Read
Translation Not Valid
Instruction mnemonics:
LDBU Load Zero-Extended Byte from Memory to Register
LDL Load Sign-Extended Longword from Memory to Register
LDQ Load Quadword from Memory to Register
LDWU Load Zero-Extended Word from Memory to Register
Qualifiers:
None
Description:
The virtual address is computed by adding register Rb to the sign-extended 16-bit displacement.
For a big-endian access, the indicated bits are inverted, and any memory management
fault is reported for va (not va').
In the case of LDQ and LDL, the source operand is fetched from memory, sign-extended, and
written to register Ra.
In the case of LDWU and LDBU, the source operand is fetched from memory, zero-extended,
and written to register Ra.
In all cases, if the data is not naturally aligned, an alignment exception is generated.
Notes:
° The word or byte that the LDWU or LDBU instruction fetches from memory is placed in the low (rightmost) word or byte of Ra, with the remaining 6 or 7 bytes set to zero.
° Accesses have byte granularity.
° For big-endian access with LDWU or LDBU, the word/byte remains in the rightmost part of Ra, but the va sent to memory has the indicated bits inverted. See Operation section,
above.
° No sparse address space mechanisms are allowed with the LDWU and LDBU instructions.
Implementation Notes:
° The LDWU and LDBU instructions are supported in hardware on Alpha implementations for which the AMASK instruction returns bit 0 set. LDWU and LDBU are supported
with software emulation in Alpha implementations for which AMASK does not
return bit 0 set. Software emulation of LDWU and LDBU is significantly slower than
hardware support.
° Depending on an address space region's caching policy, implementations may read a (partial) cache block in order to do word/byte stores. This may only be done in regions
that have memory-like behavior.
° Implementations are expected to provide sufficient low-order address bits and length-of-access information to devices on I/O buses. But, strictly speaking, this is outside
the scope of architecture.
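The address adjustment and the extension rules for the four load instructions can be sketched in Python as follows. The sketch deliberately stops short of modelling memory contents; the function names access_address and extend are assumptions made for the example, and the alignment exception is represented by a Python ValueError.

```python
MASK64 = (1 << 64) - 1

SIZE_BYTES = {"LDQ": 8, "LDL": 4, "LDWU": 2, "LDBU": 1}
# Low-order address bits inverted for a big-endian access (Operation section).
BIG_ENDIAN_XOR = {"LDQ": 0b000, "LDL": 0b100, "LDWU": 0b110, "LDBU": 0b111}


def access_address(op, va, big_endian=False):
    """Return va', the address sent to memory, after the alignment check."""
    if va % SIZE_BYTES[op] != 0:
        raise ValueError("Alignment")        # data is not naturally aligned
    return va ^ BIG_ENDIAN_XOR[op] if big_endian else va


def extend(op, raw):
    """Form the 64-bit value written to Ra from the datum fetched at va'."""
    bits = 8 * SIZE_BYTES[op]
    raw &= (1 << bits) - 1
    if op == "LDL" and raw & 0x80000000:     # LDL sign-extends from bit 31
        raw -= 1 << 32
    return raw & MASK64                      # LDWU and LDBU zero-extend
```

Note that the alignment check applies to va, and that the XOR pattern for each size leaves a naturally aligned address naturally aligned.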
4.2. 3 Load Unaligned Memory Data into Integer Register
Format:
LDQ_U Ra.wq,disp.ab(Rb.ab) !Memory format
Operation:
va ← {{Rbv + SEXT(disp)} AND NOT 7}
Ra ← (va)<63:0>
Exceptions:
Access Violation
Fault on Read
Translation Not Valid
Instruction mnemonics:
LDQ_U Load Unaligned Quadword from Memory to Register
Qualifiers:
None
Description:
The virtual address is computed by adding register Rb to the sign-extended 16-bit displacement,
then the low-order three bits are cleared. The source operand is fetched from memory
and written to register Ra.
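The address calculation for LDQ_U amounts to rounding the byte address down to a quadword boundary, as the following Python sketch shows. The function name ldq_u_address is an assumption made for the example.

```python
MASK64 = (1 << 64) - 1


def ldq_u_address(rbv, disp):
    """va <- {Rbv + SEXT(disp)} AND NOT 7"""
    disp &= 0xFFFF
    if disp & 0x8000:                # sign-extend the 16-bit displacement
        disp -= 0x10000
    return (rbv + disp) & MASK64 & ~7
```

Because the low three bits are discarded, the instruction has no Alignment exception. A datum of up to eight bytes that starts at an arbitrary byte address x lies entirely within the aligned quadwords at ldq_u_address(x, 0) and ldq_u_address(x, 7).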
4.2. 4 Load Memory Data into Integer Register Locked
Format:
LDx_L Ra.wq,disp.ab(Rb.ab) !Memory format
Operation:
va ← {Rbv + SEXT(disp)}
CASE
    big_endian_data: va' ← va XOR 000₂ ! LDQ_L
    big_endian_data: va' ← va XOR 100₂ ! LDL_L
    little_endian_data: va' ← va ! LDL_L
ENDCASE
lock_flag ← 1
locked_physical_address ← PHYSICAL_ADDRESS(va)
Ra ← SEXT((va')<31:0>) ! LDL_L
Ra ← (va)<63:0> ! LDQ_L
Exceptions:
Access Violation
Alignment
Fault on Read
Translation Not Valid
Instruction mnemonics:
LDL_L Load Sign-Extended Longword from Memory to Register Locked
LDQ_L Load Quadword from Memory to Register Locked
Qualifiers:
None
Description:
The virtual address is computed by adding register Rb to the sign-extended 16-bit displacement.
For a big-endian longword access, va<2> (bit 2 of the virtual address) is inverted, and
any memory management fault is reported for va (not va'). The source operand is fetched
from memory, sign-extended for LDL_L, and written to register Ra.
When a LDx_L instruction is executed without faulting, the processor records the target physical
address in a per-processor locked_physical_address register and sets the per-processor
lock_flag.
If the per-processor lock_flag is (still) set when a STx_C instruction is executed (accessing
within the same 16-byte naturally aligned block as the LDx_L), the store occurs; otherwise, it
does not occur, as described for the STx_C instructions. The behavior of an STx_C instruction
is UNPREDICTABLE, as described in Section 4.2.5, when it does not access the same 16-byte
naturally aligned block as the LDx_L.
Processor A causes the clearing of a set lock_ flag in processor B by doing any of the following
in B's locked range of physical addresses: a successful store, a successful store_ conditional, or
executing a WH64 instruction that modifies data on processor B. A processor's locked range is
the aligned block of 2** N bytes that includes the locked_ physical_ address. The 2** N value is
implementation dependent. It is at least 16 (minimum lock range is an aligned 16-byte block)
and is at most the page size for that implementation (maximum lock range is one physical
page).
A processor's lock_flag is also cleared if that processor encounters a CALL_PAL REI,
CALL_PAL rti, or CALL_PAL rfe instruction. It is UNPREDICTABLE whether or not a processor's
lock_flag is cleared on any other CALL_PAL instruction. It is UNPREDICTABLE
whether a processor's lock_flag is cleared by that processor executing a normal load or store
instruction. It is UNPREDICTABLE whether a processor's lock_flag is cleared by that processor
executing a taken branch (including BR, BSR, and Jumps); conditional branches that fall
through do not clear the lock_flag. It is UNPREDICTABLE whether a processor's lock_flag is
cleared by that processor executing a WH64 or ECB instruction.
The sequence:
LDx_ L
Modify
STx_ C
BEQ xxx
when executed on a given processor, does an atomic read-modify-write of a datum in shared
memory if the branch falls through. If the branch is taken, the store did not modify memory
and the sequence may be repeated until it succeeds.
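The interaction between lock_flag, the lock range, and stores from other processors can be illustrated with a small Python model. This is a simplified sketch under stated assumptions, not a description of any implementation: the System and Cpu classes and the atomic_add helper are hypothetical, addresses are treated as physical, the lock range is fixed at the 16-byte minimum, and the STx_C is assumed to use the same address as the preceding LDx_L.

```python
class System:
    """Shared memory plus the processors that observe stores to it."""

    def __init__(self, lock_range=16):
        self.memory = {}
        self.cpus = []
        self.lock_range = lock_range         # aligned block of 2**N bytes

    def write(self, writer, addr, value):
        self.memory[addr] = value
        for cpu in self.cpus:                # a store into another CPU's lock
            if (cpu is not writer and cpu.lock_flag and
                    cpu.locked_physical_address // self.lock_range
                    == addr // self.lock_range):
                cpu.lock_flag = 0            # range clears that CPU's flag


class Cpu:
    def __init__(self, system):
        self.system = system
        self.lock_flag = 0
        self.locked_physical_address = None
        system.cpus.append(self)

    def ldq_l(self, addr):
        self.lock_flag = 1
        self.locked_physical_address = addr
        return self.system.memory.get(addr, 0)

    def stq_c(self, addr, value):
        succeeded = self.lock_flag           # Ra <- lock_flag
        if succeeded:
            self.system.write(self, addr, value)
        self.lock_flag = 0                   # always cleared at the end
        return succeeded


def atomic_add(cpu, addr, delta):
    """LDQ_L / modify / STQ_C, repeated until the store succeeds."""
    while True:
        old = cpu.ldq_l(addr)
        if cpu.stq_c(addr, old + delta):
            return old
```

In this model, if two processors both execute ldq_l on the same address and then both attempt stq_c, the first store succeeds and clears the other processor's lock_flag, so the second store fails and that processor must retry.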
Notes:
° LDx_ L instructions do not check for write access; hence a matching STx_ C may take an access-violation or fault-on-write exception.
Executing a LDx_ L instruction on one processor does not affect any architecturally
visible state on another processor, and in particular cannot cause an STx_ C on another
processor to fail.
LDx_ L and STx_ C instructions need not be paired. In particular, an LDx_ L may be
followed by a conditional branch: on the fall-through path an STx_ C is executed,
whereas on the taken path no matching STx_ C is executed.
If two LDx_ L instructions execute with no intervening STx_ C, the second one
overwrites the state of the first one. If two STx_ C instructions execute with no
intervening LDx_ L, the second one always fails because the first clears lock_ flag.
° Software will not emulate unaligned LDx_ L instructions.
° If the virtual and physical addresses for a LDx_ L and STx_ C sequence are not within the same naturally aligned 16-byte sections of virtual and physical memory, that
sequence may always fail, or may succeed despite another processor's store to the lock
range; hence, no useful program should do this.
° If any other memory access (ECB, LDx, LDQ_ U, STx, STQ_ U, WH64) is executed on the given processor between the LDx_ L and the STx_ C, the sequence above may
always fail on some implementations; hence, no useful program should do this.
° If a branch is taken between the LDx_ L and the STx_ C, the sequence above may always fail on some implementations; hence, no useful program should do this.
(CMOVxx may be used to avoid branching.)
° If a subsetted instruction (for example, floating-point) is executed between the LDx_ L and the STx_ C, the sequence above may always fail on some implementations because
of the Illegal Instruction Trap; hence, no useful program should do this.
° If an instruction with an unused function code is executed between the LDx_ L and the STx_ C, the sequence above may always fail on some implementations because an
instruction with an unused function code is UNPREDICTABLE.
° If a large number of instructions are executed between the LDx_ L and the STx_ C, the sequence above may always fail on some implementations because of a timer interrupt
always clearing the lock_ flag before the sequence completes; hence, no useful program
should do this.
° Hardware implementations are encouraged to lock no more than 128 bytes. Software implementations are encouraged to separate locked locations by at least 128 bytes from
other locations that could potentially be written by another processor while the first
location is locked.
° Execution of a WH64 instruction on processor A to a region within the lock range of processor B, where the execution of the WH64 changes the contents of memory, causes
the lock_ flag on processor B to be cleared. If the WH64 does not change the contents of
memory on processor B, it need not clear the lock_ flag.
Implementation Notes:
Implementations that impede the mobility of a cache block on LDx_ L, such as that which
may occur in a Read for Ownership cache coherency protocol, may release the cache block
and make the subsequent STx_ C fail if a branch-taken or memory instruction is executed
on that processor.
All implementations should guarantee that at least 40 non-subsetted operate instructions
can be executed between timer interrupts.
4.2. 5 Store Integer Register Data into Memory Conditional
Format:
STx_C Ra.mx,disp.ab(Rb.ab) !Memory format
Operation:
va ← {Rbv + SEXT(disp)}
CASE
    big_endian_data: va' ← va XOR 000₂ ! STQ_C
    big_endian_data: va' ← va XOR 100₂ ! STL_C
    little_endian_data: va' ← va ! STL_C
ENDCASE
IF lock_flag EQ 1 THEN
    (va')<31:0> ← Rav<31:0> ! STL_C
    (va') ← Rav ! STQ_C
Ra ← lock_flag
lock_flag ← 0
Exceptions:
Access Violation
Fault on Write
Alignment
Translation Not Valid
Instruction mnemonics:
STL_C Store Longword from Register to Memory Conditional
STQ_C Store Quadword from Register to Memory Conditional
Qualifiers:
None
Description:
The virtual address is computed by adding register Rb to the sign-extended 16-bit displacement.
For a big-endian longword access, va<2> (bit 2 of the virtual address) is inverted, and
any memory management fault is reported for va (not va').
If the lock_flag is set and the address meets the following constraints relative to the address
specified by the preceding LDx_L instruction, the Ra operand is written to memory at this
address. If the address meets the following constraints but the lock_flag is not set, a zero is
returned in Ra and no write to memory occurs. The constraints are:
° The computed virtual address must specify a location within the naturally aligned 16-byte block in virtual memory accessed by the preceding LDx_ L instruction.
° The resultant physical address must specify a location within the naturally aligned 16-byte block in physical memory accessed by the preceding LDx_ L instruction.
If those addressing constraints are not met, it is UNPREDICTABLE whether the STx_ C
instruction succeeds or fails, regardless of the state of the lock_ flag, unless the lock_ flag is
cleared as described in the next paragraph.
Whether or not the addressing constraints are met, a zero is returned and no write to memory
occurs if the lock_ flag was cleared by execution on a processor of a CALL_ PAL REI,
CALL_ PAL rti, CALL_ PAL rfe, or STx_ C, after the most recent execution on that processor
of a LDx_ L instruction (in processor issue sequence).
In all cases, the lock_ flag is set to zero at the end of the operation.
Notes:
° Software will not emulate unaligned STx_ C instructions.
° Each implementation must do the test and store atomically, as illustrated in the following two examples. (See Section 5.6.1 for complete information.)
– If two processors attempt STx_C instructions to the same lock range and that lock
range was accessed by both processors' preceding LDx_L instructions, exactly one
of the stores succeeds.
– A processor executes a LDx_L/STx_C sequence and includes an MB between the
LDx_L to a particular address and the successful STx_C to a different address (one
that meets the constraints required for predictable behavior). That instruction
sequence establishes an access order under which a store operation by another processor
to that lock range occurs before the LDx_L or after the STx_C.
° If the virtual and physical addresses for a LDx_ L and STx_ C sequence are not within the same naturally aligned 16-byte sections of virtual and physical memory, that
sequence may always fail, or may succeed despite another processor's store to the lock
range; hence, no useful program should do this.
° The following sequence should not be used:
try_ again: LDQ_ L R1, x
<modify R1>
STQ_ C R1, x
BEQ R1, try_ again
That sequence penalizes performance when the STQ_ C succeeds, because the
sequence contains a backward branch, which is predicted to be taken in the Alpha
architecture. In the case where the STQ_ C succeeds and the branch will actually fall
through, that sequence incurs unnecessary delay due to a mispredicted backward
branch. Instead, a forward branch should be used to handle the failure case, as shown
in Section 5.5.2.
Software Note:
If the address specified by a STx_ C instruction does not match the one given in the
preceding LDx_ L instruction, an MB is required to guarantee ordering between the two
instructions.
Hardware/ Software Implementation Note:
STQ_ C is used in the first Alpha implementations to access the MailBox Pointer Register
(MBPR). In this special case, the effect of the STQ_ C is well defined (that is, not
UNPREDICTABLE) even though the preceding LDx_ L did not specify the address of the
MBPR. The effect of STx_ C in this special case may vary from implementation to
implementation.
Implementation Notes:
A STx_ C must propagate to the point of coherency, where it is guaranteed to prevent any
other store from changing the state of the lock bit, before its outcome can be determined.
If an implementation could encounter a TB or cache miss on the data reference of the
STx_ C in the sequence above (as might occur in some shared I-and D-stream
direct-mapped TBs/ caches), it must be able to resolve the miss and complete the store
without always failing.
4.2. 6 Store Integer Register Data into Memory
Format:
STx Ra.rx,disp.ab(Rb.ab) !Memory format
Operation:
va ← {Rbv + SEXT(disp)}
CASE
    big_endian_data: va' ← va XOR 000₂ !STQ
    big_endian_data: va' ← va XOR 100₂ !STL
    big_endian_data: va' ← va XOR 110₂ !STW
    big_endian_data: va' ← va XOR 111₂ !STB
    little_endian_data: va' ← va
ENDCASE
(va') ← Rav !STQ
(va')<31:00> ← Rav<31:0> !STL
(va')<15:00> ← Rav<15:0> !STW
(va')<07:00> ← Rav<07:0> !STB
Exceptions:
Access Violation
Alignment
Fault on Write
Translation Not Valid
Instruction mnemonics:
STB Store Byte from Register to Memory
STL Store Longword from Register to Memory
STQ Store Quadword from Register to Memory
STW Store Word from Register to Memory
Qualifiers:
None
Description:
The virtual address is computed by adding register Rb to the sign-extended 16-bit displacement.
For a big-endian access, the indicated bits are inverted, and any memory management
fault is reported for va (not va').
The Ra operand is written to memory at this address. If the data is not naturally aligned, an
alignment exception is generated.
Notes:
° The word or byte that the STB or STW instruction stores to memory comes from the low (rightmost) byte or word of Ra.
° Accesses have byte granularity.
° For big-endian access with STB or STW, the byte/ word remains in the rightmost part of Ra, but the va sent to memory has the indicated bits inverted. See Operation section,
above.
° No sparse address space mechanisms are allowed with the STB and STW instructions.
Implementation Notes:
° The STB and STW instructions are supported in hardware on Alpha implementations for which the AMASK instruction returns bit 0 set. STB and STW are supported with
software emulation in Alpha implementations for which AMASK does not return bit 0
set. Software emulation of STB and STW is significantly slower than hardware support.
° Depending on an address space region's caching policy, implementations may read a (partial) cache block in order to do byte/ word stores. This may only be done in regions
that have memory-like behavior.
° Implementations are expected to provide sufficient low-order address bits and length-of-access information to devices on I/O buses. But, strictly speaking, this is outside
the scope of architecture.
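The byte-granularity behavior of the store instructions can be illustrated with a short Python sketch. It covers little-endian data only and models memory as a Python bytearray in which the low-order byte of a datum sits at the lowest address; both the memory model and the function name store are assumptions made for the example.

```python
SIZE_BYTES = {"STQ": 8, "STL": 4, "STW": 2, "STB": 1}


def store(mem, op, va, rav):
    """Write the low-order bytes of Rav to mem at va (little-endian data)."""
    size = SIZE_BYTES[op]
    if va % size != 0:
        raise ValueError("Alignment")        # data is not naturally aligned
    low_bits = rav & ((1 << (8 * size)) - 1) # Rav<63:0>, <31:0>, <15:0>, <7:0>
    mem[va:va + size] = low_bits.to_bytes(size, "little")
```

Only the addressed bytes change: storing a byte with STB into the middle of a quadword leaves the other seven bytes as they were.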
4.2. 7 Store Unaligned Integer Register Data into Memory
Format:
STQ_U Ra.rq,disp.ab(Rb.ab) !Memory format
Operation:
va ← {{Rbv + SEXT(disp)} AND NOT 7}
(va)<63:0> ← Rav<63:0>
Exceptions:
Access Violation
Fault on Write
Translation Not Valid
Instruction mnemonics:
STQ_U Store Unaligned Quadword from Register to Memory
Qualifiers:
None
Description:
The virtual address is computed by adding register Rb to the sign-extended 16-bit displacement,
then clearing the low order three bits. The Ra operand is written to memory at this
address.
4. 3 Control Instructions
Alpha provides integer conditional branch, unconditional branch, branch to subroutine, and
jump instructions. The PC used in these instructions is the updated PC, as described in Section
3.1.1.
To allow implementations to achieve high performance, the Alpha architecture includes
explicit hints based on a branch-prediction model:
° For many implementations of computed branches (JSR/RET/JMP), there is a substantial performance gain in forming a good guess of the expected target I-cache address
before register Rb is accessed.
° For many implementations, the first-level (or only) I-cache is no bigger than a page (8 KB to 64 KB).
° Correctly predicting subroutine returns is important for good performance. Some implementations will therefore keep a small stack of predicted subroutine return
I-cache addresses.
The Alpha architecture provides three kinds of branch-prediction hints: likely target address,
return-address stack action, and conditional branch-taken.
For computed branches, the otherwise unused displacement field contains a function code
(JMP/ JSR/ RET/ JSR_ COROUTINE), and, for JSR and JMP, a field that statically specifies the
16 low bits of the most likely target address. The PC-relative calculation using these bits can
be exactly the PC-relative calculation used in unconditional branches. The low 16 bits are
enough to specify an I-cache block within the largest possible Alpha page and hence are
expected to be enough for branch-prediction logic to start an early I-cache access for the most
likely target.
For all branches, hint or opcode bits are used to distinguish simple branches, subroutine calls,
subroutine returns, and coroutine links. These distinctions allow branch-predict logic to maintain
an accurate stack of predicted return addresses.
For conditional branches, the sign of the target displacement is used as a taken/fall-through
hint. The instructions are summarized in Table 4–3.
Table 4–3: Control Instructions Summary
Mnemonic Operation
BEQ Branch if Register Equal to Zero
BGE Branch if Register Greater Than or Equal to Zero
BGT Branch if Register Greater Than Zero
BLBC Branch if Register Low Bit Is Clear
BLBS Branch if Register Low Bit Is Set
BLE Branch if Register Less Than or Equal to Zero
BLT Branch if Register Less Than Zero
BNE Branch if Register Not Equal to Zero
BR Unconditional Branch
BSR Branch to Subroutine
JMP Jump
JSR Jump to Subroutine
RET Return from Subroutine
JSR_COROUTINE Jump to Subroutine Return
4.3. 1 Conditional Branch
Format:
Bxx Ra.rq,disp.al !Branch format
Operation:
{update PC}
va ← PC + {4*SEXT(disp)}
IF TEST(Rav, Condition_based_on_Opcode) THEN
    PC ← va
Exceptions:
None
Instruction mnemonics:
BEQ Branch if Register Equal to Zero
BGE Branch if Register Greater Than or Equal to Zero
BGT Branch if Register Greater Than Zero
BLBC Branch if Register Low Bit Is Clear
BLBS Branch if Register Low Bit Is Set
BLE Branch if Register Less Than or Equal to Zero
BLT Branch if Register Less Than Zero
BNE Branch if Register Not Equal to Zero
Qualifiers:
None
Description:
Register Ra is tested. If the specified relationship is true, the PC is loaded with the target virtual
address; otherwise, execution continues with the next sequential instruction.
The displacement is treated as a signed longword offset. This means it is shifted left two bits
(to address a longword boundary), sign-extended to 64 bits, and added to the updated PC to
form the target virtual address.
The conditional branch instructions are PC-relative only. The 21-bit signed displacement gives
a forward/backward branch distance of +/– 1M instructions.
The test is on the signed quadword integer interpretation of the register contents; all 64 bits are
tested.
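The conditional branch operation, including the target calculation and the displacement-sign hint described in Section 4.3, can be sketched in Python. The function names next_pc and hinted_taken and the TESTS table are assumptions made for the example; the updated PC is taken to be the address of the branch plus 4.

```python
MASK64 = (1 << 64) - 1


def sext21(disp):
    """Sign-extend the 21-bit branch displacement field."""
    disp &= 0x1FFFFF
    return disp - 0x200000 if disp & 0x100000 else disp


def signed64(value):
    """Interpret a 64-bit register value as a signed quadword."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value


TESTS = {
    "BEQ": lambda v: v == 0,
    "BNE": lambda v: v != 0,
    "BLT": lambda v: v < 0,
    "BLE": lambda v: v <= 0,
    "BGT": lambda v: v > 0,
    "BGE": lambda v: v >= 0,
    "BLBC": lambda v: (v & 1) == 0,
    "BLBS": lambda v: (v & 1) == 1,
}


def next_pc(mnemonic, inst_addr, rav, disp):
    """Return the PC after executing the branch located at inst_addr."""
    updated_pc = (inst_addr + 4) & MASK64
    va = (updated_pc + 4 * sext21(disp)) & MASK64
    return va if TESTS[mnemonic](signed64(rav)) else updated_pc


def hinted_taken(disp):
    """Static hint: a backward (negative) displacement is predicted taken."""
    return sext21(disp) < 0
```

A displacement of all ones (minus one longword) makes the branch target itself, because the offset is applied to the updated PC rather than to the address of the branch.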
4.3.2 Unconditional Branch
Format:
    BxR     Ra.wq,disp.al           !Branch format
Operation:
    {update PC}
    Ra ← PC
    PC ← PC + {4*SEXT(disp)}
Exceptions:
    None
Instruction mnemonics:
    BR      Unconditional Branch
    BSR     Branch to Subroutine
Qualifiers:
    None
Description:
The PC of the following instruction (the updated PC) is written to register Ra and then the PC
is loaded with the target address.
The displacement is treated as a signed longword offset. This means it is shifted left two bits
(to address a longword boundary), sign-extended to 64 bits, and added to the updated PC to
form the target virtual address.
The unconditional branch instructions are PC-relative. The 21-bit signed displacement gives a
forward/backward branch distance of +/–1M instructions.
PC-relative addressability can be established by:
        BR      Rx,L1
    L1:
Notes:
• BR and BSR do identical operations. They differ only in hints to possible branch-prediction
  logic. BSR is predicted as a subroutine call (pushes the return address on a
  branch-prediction stack), whereas BR is predicted as a branch (no push).
4.3.3 Jumps
Format:
    mnemonic    Ra.wq,(Rb.ab),hint      !Memory format
Operation:
    {update PC}
    va ← Rbv AND {NOT 3}
    Ra ← PC
    PC ← va
Exceptions:
    None
Instruction mnemonics:
    JMP             Jump
    JSR             Jump to Subroutine
    RET             Return from Subroutine
    JSR_COROUTINE   Jump to Subroutine Return
Qualifiers:
    None
Description:
The PC of the instruction following the Jump instruction (the updated PC) is written to register
Ra and then the PC is loaded with the target virtual address.
The new PC is supplied from register Rb. The low two bits of Rb are ignored. Ra and Rb may
specify the same register; the target calculation using the old value is done before the new
value is assigned.
All Jump instructions do identical operations. They differ only in hints to possible
branch-prediction logic. The displacement field of the instruction is used to pass this
information. The four different "opcodes" set different bit patterns in disp<15:14>, and the
hint operand sets disp<13:0>.
These bits are intended to be used as shown in Table 4–4.

Table 4–4: Jump Instructions Branch Prediction
disp<15:14>   Meaning         Predicted Target<15:0>    Prediction Stack Action
00            JMP             PC + {4*disp<13:0>}       –
01            JSR             PC + {4*disp<13:0>}       Push PC
10            RET             Prediction stack          Pop
11            JSR_COROUTINE   Prediction stack          Pop, push PC

The design in Table 4–4 allows specification of the low 16 bits of a likely longword target
address (enough bits to start a useful I-cache access early), and also allows distinguishing call
from return (and from the other two less frequent operations).
Note that the above information is used only as a hint; correct setting of these bits can improve
performance but is not needed for correct operation. See Section A.2.2 for more information on
branch prediction.
An unconditional long jump can be performed by:
    JMP     R31,(Rb),hint
Coroutine linkage can be performed by specifying the same register in both the Ra and Rb
operands. When disp<15:14> equals '10' (RET) or '11' (JSR_COROUTINE) (that is, the target
address prediction, if any, would come from a predictor implementation stack), then bits
<13:0> are reserved for software and must be ignored by all implementations. All encodings
for bits <13:0> are used by Compaq software or Reserved to Compaq, as follows:

Encoding    Meaning
0000₁₆      Indicates non-procedure return
0001₁₆      Indicates procedure return
            All other encodings are reserved to Compaq.
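As an illustration of how the 16-bit displacement field is split, the following Python sketch decodes the hint bits according to Table 4–4. The function names are hypothetical, and the sketch assumes that "PC" in the table means the updated PC; a real predictor is free to ignore the hint entirely.

```python
MASK64 = (1 << 64) - 1

# disp<15:14> -> (meaning, prediction stack action), following Table 4-4
HINT_KINDS = {
    0b00: ("JMP", None),
    0b01: ("JSR", "push PC"),
    0b10: ("RET", "pop"),
    0b11: ("JSR_COROUTINE", "pop, push PC"),
}

def jump_target(rbv):
    """Architectural target: va = Rbv AND {NOT 3}."""
    return rbv & ~3 & MASK64

def decode_jump_hint(disp16, updated_pc):
    """Return (meaning, predicted target<15:0> or None, stack action)."""
    kind = (disp16 >> 14) & 0b11
    hint = disp16 & 0x3FFF
    meaning, action = HINT_KINDS[kind]
    if kind in (0b00, 0b01):
        predicted_low16 = (updated_pc + 4 * hint) & 0xFFFF
    else:
        predicted_low16 = None      # prediction comes from the stack instead
    return meaning, predicted_low16, action
```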
4.4 Integer Arithmetic Instructions
The integer arithmetic instructions perform add, subtract, multiply, signed and unsigned
compare, and bit count operations.
Count instruction (CIX) extension implementation note:
    The CIX extension to the architecture provides the CTLZ, CTPOP, and CTTZ instructions.
    Alpha processors for which the AMASK instruction returns bit 2 set implement these
    instructions. Those processors for which AMASK does not return bit 2 set can take an
    Illegal Instruction trap, and software can emulate their function, if required. AMASK is
    described in Sections 4.11.1 and D.3.
The integer instructions are summarized in Table 4–5.
There is no integer divide instruction. Division by a constant can be done by using UMULH;
division by a variable can be done by using a subroutine. See Section A.4.2.

Table 4–5: Integer Arithmetic Instructions Summary
Mnemonic    Operation
ADD         Add Quadword/Longword
S4ADD       Scaled Add by 4
S8ADD       Scaled Add by 8
CMPEQ       Compare Signed Quadword Equal
CMPLT       Compare Signed Quadword Less Than
CMPLE       Compare Signed Quadword Less Than or Equal
CTLZ        Count leading zero
CTPOP       Count population
CTTZ        Count trailing zero
CMPULT      Compare Unsigned Quadword Less Than
CMPULE      Compare Unsigned Quadword Less Than or Equal
MUL         Multiply Quadword/Longword
UMULH       Multiply Quadword Unsigned High
SUB         Subtract Quadword/Longword
S4SUB       Scaled Subtract by 4
S8SUB       Scaled Subtract by 8
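To make the remark about dividing by a constant with UMULH concrete, here is a small Python sketch of one common technique: multiply by a precomputed fixed-point reciprocal and keep the high bits. The divisor 10, the constant RECIP_10, and the function names are illustrative choices of this example, not taken from the handbook.

```python
MASK64 = (1 << 64) - 1

def umulh(rav, rbv):
    """High 64 bits of the unsigned 128-bit product, as UMULH returns."""
    return ((rav & MASK64) * (rbv & MASK64)) >> 64

# ceil(2**67 / 10); chosen so that (n * RECIP_10) >> 67 == n // 10
# for every unsigned 64-bit n.
RECIP_10 = 0xCCCCCCCCCCCCCCCD

def div10(n):
    """Unsigned n // 10 using one UMULH and one right shift by 3."""
    return umulh(n, RECIP_10) >> 3
```

On Alpha this corresponds to a UMULH followed by an SRL; other divisors need their own reciprocal constant and shift count.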
4.4.1 Longword Add
Format:
    ADDL    Ra.rl,Rb.rl,Rc.wq       !Operate format
    ADDL    Ra.rl,#b.ib,Rc.wq       !Operate format
Operation:
    Rc ← SEXT((Rav + Rbv)<31:0>)
Exceptions:
    Integer Overflow
Instruction mnemonics:
    ADDL    Add Longword
Qualifiers:
    Integer Overflow Enable (/V)
Description:
Register Ra is added to register Rb or a literal and the sign-extended 32-bit sum is written to
Rc.
The high-order 32 bits of Ra and Rb are ignored. Rc is a proper sign extension of the truncated
32-bit sum. Overflow detection is based on the longword sum Rav<31:0> + Rbv<31:0>.
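The ADDL behavior can be modeled in a few lines of Python. This is a sketch under the assumption that "overflow" means signed overflow of the 32-bit sum; the function names are hypothetical, and the returned flag would only lead to a trap when the /V qualifier is used.

```python
MASK64 = (1 << 64) - 1

def sext(value, bits):
    """Sign-extend the low `bits` bits of value to a signed Python int."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign

def addl(rav, rbv):
    """Return (Rc, overflow) for ADDL; the high 32 bits of inputs are ignored."""
    true_sum = sext(rav, 32) + sext(rbv, 32)
    rc = sext(true_sum, 32) & MASK64        # truncated sum, sign-extended
    overflow = not (-2**31 <= true_sum < 2**31)
    return rc, overflow
```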
4.4.2 Scaled Longword Add
Format:
    SxADDL  Ra.rl,Rb.rq,Rc.wq       !Operate format
    SxADDL  Ra.rl,#b.ib,Rc.wq       !Operate format
Operation:
    CASE
        S4ADDL: Rc ← SEXT(((LEFT_SHIFT(Rav,2)) + Rbv)<31:0>)
        S8ADDL: Rc ← SEXT(((LEFT_SHIFT(Rav,3)) + Rbv)<31:0>)
    ENDCASE
Exceptions:
    None
Instruction mnemonics:
    S4ADDL  Scaled Add Longword by 4
    S8ADDL  Scaled Add Longword by 8
Qualifiers:
    None
Description:
Register Ra is scaled by 4 (for S4ADDL) or 8 (for S8ADDL) and is added to register Rb or a
literal, and the sign-extended 32-bit sum is written to Rc.
The high 32 bits of Ra and Rb are ignored. Rc is a proper sign extension of the truncated 32-bit
sum.
4.4.3 Quadword Add
Format:
    ADDQ    Ra.rq,Rb.rq,Rc.wq       !Operate format
    ADDQ    Ra.rq,#b.ib,Rc.wq       !Operate format
Operation:
    Rc ← Rav + Rbv
Exceptions:
    Integer Overflow
Instruction mnemonics:
    ADDQ    Add Quadword
Qualifiers:
    Integer Overflow Enable (/V)
Description:
Register Ra is added to register Rb or a literal and the 64-bit sum is written to Rc.
On overflow, the least significant 64 bits of the true result are written to the destination
register.
The unsigned compare instructions can be used to generate carry. After adding two values, if
the sum is less unsigned than either one of the inputs, there was a carry out of the most
significant bit.
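The carry rule described above is what makes multi-precision addition possible without a carry flag. The following Python sketch models a 128-bit add built from ADDQ and CMPULT; the function names and the two-register representation are assumptions of this example.

```python
MASK64 = (1 << 64) - 1

def addq(rav, rbv):
    """64-bit wrapping add, as ADDQ without /V."""
    return (rav + rbv) & MASK64

def cmpult(rav, rbv):
    """1 if Rav is less than Rbv as unsigned quadwords, else 0."""
    return 1 if (rav & MASK64) < (rbv & MASK64) else 0

def add128(a_lo, a_hi, b_lo, b_hi):
    """Add two 128-bit values held as (low, high) quadword pairs."""
    lo = addq(a_lo, b_lo)
    carry = cmpult(lo, a_lo)        # sum below an input means a carry out
    hi = addq(addq(a_hi, b_hi), carry)
    return lo, hi
```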
4.4.4 Scaled Quadword Add
Format:
    SxADDQ  Ra.rq,Rb.rq,Rc.wq       !Operate format
    SxADDQ  Ra.rq,#b.ib,Rc.wq       !Operate format
Operation:
    CASE
        S4ADDQ: Rc ← LEFT_SHIFT(Rav,2) + Rbv
        S8ADDQ: Rc ← LEFT_SHIFT(Rav,3) + Rbv
    ENDCASE
Exceptions:
    None
Instruction mnemonics:
    S4ADDQ  Scaled Add Quadword by 4
    S8ADDQ  Scaled Add Quadword by 8
Qualifiers:
    None
Description:
Register Ra is scaled by 4 (for S4ADDQ) or 8 (for S8ADDQ) and is added to register Rb or a
literal, and the 64-bit sum is written to Rc.
On overflow, the least significant 64 bits of the true result are written to the destination
register.
4.4.5 Integer Signed Compare
Format:
    CMPxx   Ra.rq,Rb.rq,Rc.wq       !Operate format
    CMPxx   Ra.rq,#b.ib,Rc.wq       !Operate format
Operation:
    IF Rav SIGNED_RELATION Rbv THEN
        Rc ← 1
    ELSE
        Rc ← 0
Exceptions:
    None
Instruction mnemonics:
    CMPEQ   Compare Signed Quadword Equal
    CMPLE   Compare Signed Quadword Less Than or Equal
    CMPLT   Compare Signed Quadword Less Than
Qualifiers:
    None
Description:
Register Ra is compared to Register Rb or a literal. If the specified relationship is true, the
value one is written to register Rc; otherwise, zero is written to Rc.
Notes:
• Compare Less Than A,B is the same as Compare Greater Than B,A; Compare Less Than or
  Equal A,B is the same as Compare Greater Than or Equal B,A. Therefore, only
  the less-than operations are included.
4.4.6 Integer Unsigned Compare
Format:
    CMPUxx  Ra.rq,Rb.rq,Rc.wq       !Operate format
    CMPUxx  Ra.rq,#b.ib,Rc.wq       !Operate format
Operation:
    IF Rav UNSIGNED_RELATION Rbv THEN
        Rc ← 1
    ELSE
        Rc ← 0
Exceptions:
    None
Instruction mnemonics:
    CMPULE  Compare Unsigned Quadword Less Than or Equal
    CMPULT  Compare Unsigned Quadword Less Than
Qualifiers:
    None
Description:
Register Ra is compared to Register Rb or a literal. If the specified relationship is true, the
value one is written to register Rc; otherwise, zero is written to Rc.
4.4.7 Count Leading Zero
Format:
    CTLZ    Rb.rq,Rc.wq             !Operate format
Operation:
    temp = 0
    FOR i FROM 63 DOWN TO 0
        IF {Rbv<i> EQ 1} THEN BREAK
        temp = temp + 1
    END
    Rc<6:0> ← temp<6:0>
    Rc<63:7> ← 0
Exceptions:
    None
Instruction mnemonics:
    CTLZ    Count Leading Zero
Qualifiers:
    None
Description:
The number of leading zeros in Rb, starting at the most significant bit position, is written to Rc.
Ra must be R31.
4.4.8 Count Population
Format:
    CTPOP   Rb.rq,Rc.wq             !Operate format
Operation:
    temp = 0
    FOR i FROM 0 TO 63
        IF {Rbv<i> EQ 1} THEN temp = temp + 1
    END
    Rc<6:0> ← temp<6:0>
    Rc<63:7> ← 0
Exceptions:
    None
Instruction mnemonics:
    CTPOP   Count Population
Qualifiers:
    None
Description:
The number of ones in Rb is written to Rc. Ra must be R31.
4.4.9 Count Trailing Zero
Format:
    CTTZ    Rb.rq,Rc.wq             !Operate format
Operation:
    temp = 0
    FOR i FROM 0 TO 63
        IF {Rbv<i> EQ 1} THEN BREAK
        temp = temp + 1
    END
    Rc<6:0> ← temp<6:0>
    Rc<63:7> ← 0
Exceptions:
    None
Instruction mnemonics:
    CTTZ    Count Trailing Zero
Qualifiers:
    None
Description:
The number of trailing zeros in Rb, starting at the least significant bit position, is written to Rc.
Ra must be R31.
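The three count operations (CTLZ, CTPOP, CTTZ) translate almost line for line from their Operation pseudocode into Python. The sketch below is one way software might emulate them on a processor without the CIX extension; the function names simply mirror the mnemonics and are not an existing API.

```python
MASK64 = (1 << 64) - 1

def ctlz(rbv):
    """Count leading zeros, scanning from bit 63 down; returns 64 for zero."""
    rbv &= MASK64
    temp = 0
    for i in range(63, -1, -1):
        if (rbv >> i) & 1:
            break
        temp += 1
    return temp

def ctpop(rbv):
    """Count the one bits in the quadword."""
    return bin(rbv & MASK64).count("1")

def cttz(rbv):
    """Count trailing zeros, scanning from bit 0 up; returns 64 for zero."""
    rbv &= MASK64
    temp = 0
    for i in range(64):
        if (rbv >> i) & 1:
            break
        temp += 1
    return temp
```

Because the result can be 64, it needs seven bits, which matches the Rc<6:0> field written by the instructions.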
4.4.10 Longword Multiply
Format:
    MULL    Ra.rl,Rb.rl,Rc.wq       !Operate format
    MULL    Ra.rl,#b.ib,Rc.wq       !Operate format
Operation:
    Rc ← SEXT((Rav * Rbv)<31:0>)
Exceptions:
    Integer Overflow
Instruction mnemonics:
    MULL    Multiply Longword
Qualifiers:
    Integer Overflow Enable (/V)
Description:
Register Ra is multiplied by register Rb or a literal and the sign-extended 32-bit product is
written to Rc.
The high 32 bits of Ra and Rb are ignored. Rc is a proper sign extension of the truncated 32-bit
product. Overflow detection is based on the longword product Rav<31:0> * Rbv<31:0>. On
overflow, the proper sign extension of the least significant 32 bits of the true result is written to
the destination register.
The MULQ instruction can be used to return the full 64-bit product.
4.4.11 Quadword Multiply
Format:
    MULQ    Ra.rq,Rb.rq,Rc.wq       !Operate format
    MULQ    Ra.rq,#b.ib,Rc.wq       !Operate format
Operation:
    Rc ← Rav * Rbv
Exceptions:
    Integer Overflow
Instruction mnemonics:
    MULQ    Multiply Quadword
Qualifiers:
    Integer Overflow Enable (/V)
Description:
Register Ra is multiplied by register Rb or a literal and the 64-bit product is written to register
Rc. Overflow detection is based on considering the operands and the result as signed
quantities. On overflow, the least significant 64 bits of the true result are written to the
destination register.
The UMULH instruction can be used to generate the upper 64 bits of the 128-bit result when
an overflow occurs.
4.4.12 Unsigned Quadword Multiply High
Format:
    UMULH   Ra.rq,Rb.rq,Rc.wq       !Operate format
    UMULH   Ra.rq,#b.ib,Rc.wq       !Operate format
Operation:
    Rc ← {Rav *U Rbv}<127:64>
Exceptions:
    None
Instruction mnemonics:
    UMULH   Unsigned Multiply Quadword High
Qualifiers:
    None
Description:
Register Ra and Rb or a literal are multiplied as unsigned numbers to produce a 128-bit result.
The high-order 64 bits are written to register Rc.
The UMULH instruction can be used to generate the upper 64 bits of a 128-bit result as
follows:
    Ra and Rb are unsigned: result of UMULH
    Ra and Rb are signed:   (result of UMULH) – Ra<63>*Rb – Rb<63>*Ra
The MULQ instruction gives the low 64 bits of the result in either case.
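The signed correction formula above can be checked with a short Python model. The names umulh and smulh are hypothetical; smulh is not an Alpha instruction, only a label for the corrected result.

```python
MASK64 = (1 << 64) - 1

def to_signed(v):
    """Interpret a 64-bit pattern as a signed quadword."""
    v &= MASK64
    return v - (1 << 64) if v >> 63 else v

def umulh(rav, rbv):
    """High 64 bits of the unsigned 128-bit product."""
    return ((rav & MASK64) * (rbv & MASK64)) >> 64

def smulh(rav, rbv):
    """High 64 bits of the signed product: UMULH - Ra<63>*Rb - Rb<63>*Ra."""
    rav &= MASK64
    rbv &= MASK64
    high = umulh(rav, rbv)
    if rav >> 63:
        high -= rbv
    if rbv >> 63:
        high -= rav
    return high & MASK64
```

For example, with Ra = -1 and Rb = 2 the unsigned high half is 1, and subtracting Rb gives all ones, the correct high half of the 128-bit value -2.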
4.4.13 Longword Subtract
Format:
    SUBL    Ra.rl,Rb.rl,Rc.wq       !Operate format
    SUBL    Ra.rl,#b.ib,Rc.wq       !Operate format
Operation:
    Rc ← SEXT((Rav - Rbv)<31:0>)
Exceptions:
    Integer Overflow
Instruction mnemonics:
    SUBL    Subtract Longword
Qualifiers:
    Integer Overflow Enable (/V)
Description:
Register Rb or a literal is subtracted from register Ra and the sign-extended 32-bit difference is
written to Rc.
The high 32 bits of Ra and Rb are ignored. Rc is a proper sign extension of the truncated 32-bit
difference. Overflow detection is based on the longword difference Rav<31:0> – Rbv<31:0>.
4.4.14 Scaled Longword Subtract
Format:
    SxSUBL  Ra.rl,Rb.rl,Rc.wq       !Operate format
    SxSUBL  Ra.rl,#b.ib,Rc.wq       !Operate format
Operation:
    CASE
        S4SUBL: Rc ← SEXT(((LEFT_SHIFT(Rav,2)) - Rbv)<31:0>)
        S8SUBL: Rc ← SEXT(((LEFT_SHIFT(Rav,3)) - Rbv)<31:0>)
    ENDCASE
Exceptions:
    None
Instruction mnemonics:
    S4SUBL  Scaled Subtract Longword by 4
    S8SUBL  Scaled Subtract Longword by 8
Qualifiers:
    None
Description:
Register Rb or a literal is subtracted from the scaled value of register Ra, which is scaled by 4
(for S4SUBL) or 8 (for S8SUBL), and the sign-extended 32-bit difference is written to Rc.
The high 32 bits of Ra and Rb are ignored. Rc is a proper sign extension of the truncated 32-bit
difference.
4.4.15 Quadword Subtract
Format:
    SUBQ    Ra.rq,Rb.rq,Rc.wq       !Operate format
    SUBQ    Ra.rq,#b.ib,Rc.wq       !Operate format
Operation:
    Rc ← Rav - Rbv
Exceptions:
    Integer Overflow
Instruction mnemonics:
    SUBQ    Subtract Quadword
Qualifiers:
    Integer Overflow Enable (/V)
Description:
Register Rb or a literal is subtracted from register Ra and the 64-bit difference is written to
register Rc. On overflow, the least significant 64 bits of the true result are written to the
destination register.
The unsigned compare instructions can be used to generate borrow. If the minuend (Rav) is
less unsigned than the subtrahend (Rbv), a borrow will occur.
4.4.16 Scaled Quadword Subtract
Format:
    SxSUBQ  Ra.rq,Rb.rq,Rc.wq       !Operate format
    SxSUBQ  Ra.rq,#b.ib,Rc.wq       !Operate format
Operation:
    CASE
        S4SUBQ: Rc ← LEFT_SHIFT(Rav,2) - Rbv
        S8SUBQ: Rc ← LEFT_SHIFT(Rav,3) - Rbv
    ENDCASE
Exceptions:
    None
Instruction mnemonics:
    S4SUBQ  Scaled Subtract Quadword by 4
    S8SUBQ  Scaled Subtract Quadword by 8
Qualifiers:
    None
Description:
Register Rb or a literal is subtracted from the scaled value of register Ra, which is scaled by 4
(for S4SUBQ) or 8 (for S8SUBQ), and the 64-bit difference is written to Rc.
4.5 Logical and Shift Instructions
The logical instructions perform quadword Boolean operations. The conditional move integer
instructions perform conditionals without a branch. The shift instructions perform left and right
logical shift and right arithmetic shift. These are summarized in Table 4–6.
Software Note:
    There is no arithmetic left shift instruction. Where an arithmetic left shift would be used, a
    logical shift will do. For multiplying by a small power of two in address computations,
    logical left shift is acceptable.
    Integer multiply should be used to perform an arithmetic left shift with overflow checking.
    Bit field extracts can be done with two logical shifts. Sign extension can be done with a left
    logical shift and a right arithmetic shift.

Table 4–6: Logical and Shift Instructions Summary
Mnemonic    Operation
AND         Logical Product
BIC         Logical Product with Complement
BIS         Logical Sum (OR)
EQV         Logical Equivalence (XORNOT)
ORNOT       Logical Sum with Complement
XOR         Logical Difference
CMOVxx      Conditional Move Integer
SLL         Shift Left Logical
SRA         Shift Right Arithmetic
SRL         Shift Right Logical
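The two shift idioms from the Software Note (field extraction with two logical shifts, sign extension with a left logical and a right arithmetic shift) can be sketched in Python as follows. The helper names and the (lsb, width) parameterization are assumptions of this example; width must be between 1 and 64 and the field must fit inside the quadword.

```python
MASK64 = (1 << 64) - 1

def sll(rav, count):
    """Shift left logical by count<5:0>."""
    return (rav << (count & 63)) & MASK64

def srl(rav, count):
    """Shift right logical by count<5:0>."""
    return (rav & MASK64) >> (count & 63)

def sra(rav, count):
    """Shift right arithmetic by count<5:0>, replicating bit 63."""
    rav &= MASK64
    signed = rav - (1 << 64) if rav >> 63 else rav
    return (signed >> (count & 63)) & MASK64

def extract_field(value, lsb, width):
    """Zero-extended bit field value<lsb+width-1:lsb> via SLL then SRL."""
    return srl(sll(value, 64 - lsb - width), 64 - width)

def sign_extend(value, width):
    """Sign-extend the low `width` bits via SLL then SRA."""
    return sra(sll(value, 64 - width), 64 - width)
```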
4.5.1 Logical Functions
Format:
    mnemonic    Ra.rq,Rb.rq,Rc.wq   !Operate format
    mnemonic    Ra.rq,#b.ib,Rc.wq   !Operate format
Operation:
    Rc ← Rav AND Rbv                !AND
    Rc ← Rav OR Rbv                 !BIS
    Rc ← Rav XOR Rbv                !XOR
    Rc ← Rav AND {NOT Rbv}          !BIC
    Rc ← Rav OR {NOT Rbv}           !ORNOT
    Rc ← Rav XOR {NOT Rbv}          !EQV
Exceptions:
    None
Instruction mnemonics:
    AND     Logical Product
    BIC     Logical Product with Complement
    BIS     Logical Sum (OR)
    EQV     Logical Equivalence (XORNOT)
    ORNOT   Logical Sum with Complement
    XOR     Logical Difference
Qualifiers:
    None
Description:
These instructions perform the designated Boolean function between register Ra and register
Rb or a literal. The result is written to register Rc.
The NOT function can be performed by doing an ORNOT with zero (Ra = R31).
4.5.2 Conditional Move Integer
Format:
    CMOVxx  Ra.rq,Rb.rq,Rc.wq       !Operate format
    CMOVxx  Ra.rq,#b.ib,Rc.wq       !Operate format
Operation:
    IF TEST(Rav, Condition_based_on_Opcode) THEN
        Rc ← Rbv
Exceptions:
    None
Instruction mnemonics:
    CMOVEQ      CMOVE if Register Equal to Zero
    CMOVGE      CMOVE if Register Greater Than or Equal to Zero
    CMOVGT      CMOVE if Register Greater Than Zero
    CMOVLBC     CMOVE if Register Low Bit Clear
    CMOVLBS     CMOVE if Register Low Bit Set
    CMOVLE      CMOVE if Register Less Than or Equal to Zero
    CMOVLT      CMOVE if Register Less Than Zero
    CMOVNE      CMOVE if Register Not Equal to Zero
Qualifiers:
    None
Description:
Register Ra is tested. If the specified relationship is true, the value Rbv is written to register
Rc.
Notes:
Except that it is likely in many implementations to be substantially faster, the instruction:
        CMOVEQ  Ra,Rb,Rc
is exactly equivalent to:
        BNE     Ra,label
        OR      Rb,Rb,Rc
    label: ...
For example, a branchless sequence for:
    R1 = MAX(R1,R2)
is:
    CMPLT   R1,R2,R3        ! R3 = 1 if R1 < R2
    CMOVNE  R3,R2,R1        ! Move R2 to R1 if R1 < R2
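The branchless MAX sequence can be traced with a small Python model of the two instructions involved. The functions below are hypothetical stand-ins for CMPLT and CMOVNE that return the new value of the destination register rather than modifying machine state.

```python
MASK64 = (1 << 64) - 1

def to_signed(v):
    """Interpret a 64-bit pattern as a signed quadword."""
    v &= MASK64
    return v - (1 << 64) if v >> 63 else v

def cmplt(rav, rbv):
    """Rc = 1 if Rav < Rbv as signed quadwords, else 0."""
    return 1 if to_signed(rav) < to_signed(rbv) else 0

def cmovne(rav, rbv, rc_old):
    """New Rc: Rbv if Rav is nonzero, otherwise Rc keeps its old value."""
    return rbv if (rav & MASK64) != 0 else rc_old

def max_branchless(r1, r2):
    """R1 = MAX(R1, R2) using the CMPLT / CMOVNE pair."""
    r3 = cmplt(r1, r2)              # R3 = 1 if R1 < R2
    r1 = cmovne(r3, r2, r1)         # move R2 to R1 if R1 < R2
    return r1
```

Note that the conditional move leaves Rc untouched when the condition is false, which is why the model takes the old destination value as a third argument.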
4.5.3 Shift Logical
Format:
    SxL     Ra.rq,Rb.rq,Rc.wq       !Operate format
    SxL     Ra.rq,#b.ib,Rc.wq       !Operate format
Operation:
    Rc ← LEFT_SHIFT(Rav, Rbv<5:0>)      !SLL
    Rc ← RIGHT_SHIFT(Rav, Rbv<5:0>)     !SRL
Exceptions:
    None
Instruction mnemonics:
    SLL     Shift Left Logical
    SRL     Shift Right Logical
Qualifiers:
    None
Description:
Register Ra is shifted logically left or right 0 to 63 bits by the count in register Rb or a literal.
The result is written to register Rc. Zero bits are propagated into the vacated bit positions.
4.5.4 Shift Arithmetic
Format:
    SRA     Ra.rq,Rb.rq,Rc.wq       !Operate format
    SRA     Ra.rq,#b.ib,Rc.wq       !Operate format
Operation:
    Rc ← ARITH_RIGHT_SHIFT(Rav, Rbv<5:0>)
Exceptions:
    None
Instruction mnemonics:
    SRA     Shift Right Arithmetic
Qualifiers:
    None
Description:
Register Ra is right shifted arithmetically 0 to 63 bits by the count in register Rb or a literal.
The result is written to register Rc. The sign bit (Rav<63>) is propagated into the vacated bit
positions.
4.6 Byte Manipulation Instructions
Alpha implementations that support the BWX extension provide the following instructions for
loading, sign-extending, and storing bytes and words between a register and memory:

Instruction     Meaning                         Described in Section
LDBU/LDWU       Load byte/word unaligned        4.2.2
SEXTB/SEXTW     Sign-extend byte/word           4.6.5
STB/STW         Store byte/word                 4.2.6

The AMASK instruction reports whether a particular Alpha implementation supports the BWX
extension. AMASK is described in Sections 4.11.1 and D.3.
LDBU and STB are the recommended way to perform byte load and store operations on Alpha
implementations that support them; use them rather than the extract, insert, and mask byte
instructions described in this section. In particular, the implementation examples in this
section that illustrate byte operations are not appropriate for Alpha implementations that
support the BWX extension; instead, use the recommendations in Section A.4.1.
In addition to LDBU and STB, Alpha provides the instructions in Table 4–7 for operating on
byte operands within registers.

Table 4–7: Byte-Within-Register Manipulation Instructions Summary
Mnemonic    Operation
CMPBGE      Compare Byte
EXTBL       Extract Byte Low
EXTWL       Extract Word Low
EXTLL       Extract Longword Low
EXTQL       Extract Quadword Low
EXTWH       Extract Word High
EXTLH       Extract Longword High
EXTQH       Extract Quadword High
INSBL       Insert Byte Low
INSWL       Insert Word Low
INSLL       Insert Longword Low
INSQL       Insert Quadword Low
INSWH       Insert Word High
INSLH       Insert Longword High
INSQH       Insert Quadword High
MSKBL       Mask Byte Low
MSKWL       Mask Word Low
MSKLL       Mask Longword Low
MSKQL       Mask Quadword Low
MSKWH       Mask Word High
MSKLH       Mask Longword High
MSKQH       Mask Quadword High
SEXTB       Sign extend byte
SEXTW       Sign extend word
ZAP         Zero Bytes
ZAPNOT      Zero Bytes Not
4.6.1 Compare Byte
Format:
    CMPBGE  Ra.rq,Rb.rq,Rc.wq       !Operate format
    CMPBGE  Ra.rq,#b.ib,Rc.wq       !Operate format
Operation:
    FOR i FROM 0 TO 7
        temp<8:0> ← {0 || Rav<i*8+7:i*8>} + {0 || NOT Rbv<i*8+7:i*8>} + 1
        Rc<i> ← temp<8>
    END
    Rc<63:8> ← 0
Exceptions:
    None
Instruction mnemonics:
    CMPBGE  Compare Byte
Qualifiers:
    None
Description:
CMPBGE does eight parallel unsigned byte comparisons between corresponding bytes of Rav
and Rbv, storing the eight results in the low eight bits of Rc. The high 56 bits of Rc are set to
zero. Bit 0 of Rc corresponds to byte 0, bit 1 of Rc corresponds to byte 1, and so forth. A result
bit is set in Rc if the corresponding byte of Rav is greater than or equal to the corresponding
byte of Rbv (unsigned).
Notes:
The result of CMPBGE can be used as an input to ZAP and ZAPNOT.
To scan for a byte of zeros in a character string:
    <initialize R1 to aligned QW address of string>
    LOOP:
        LDQ     R2, 0(R1)       ; Pick up 8 bytes
        LDA     R1, 8(R1)       ; Increment string pointer
        CMPBGE  R31, R2, R3     ; If NO bytes of zero, R3<7:0> = 0
        BEQ     R3, LOOP        ; Loop if no terminator byte found
        ...                     ; At this point, R3 can be used to
                                ; determine which byte terminated
To compare two character strings for greater/equal/less:
    <initialize R1 to aligned QW address of string1>
    <initialize R2 to aligned QW address of string2>
    LOOP:
        LDQ     R3, 0(R1)       ; Pick up 8 bytes of string1
        LDA     R1, 8(R1)       ; Increment string1 pointer
        LDQ     R4, 0(R2)       ; Pick up 8 bytes of string2
        LDA     R2, 8(R2)       ; Increment string2 pointer
        CMPBGE  R31, R3, R6     ; Test for zeros in string1
        XOR     R3, R4, R5      ; Test for all equal bytes
        BNE     R6, DONE        ; Exit if a zero found
        BEQ     R5, LOOP        ; Loop if all equal
    DONE: CMPBGE R31, R5, R5    ;
        ...
    ; At this point, R5 can be used to determine the first not-equal
    ; byte position (if any), and R6 can be used to determine the
    ; position of the terminating zero in string1 (if any).
To range-check a string of characters in R1 for '0'...'9':
        LDQ     R2, lit0s       ; Pick up 8 bytes of the character
                                ; BELOW '0' '////////'
        LDQ     R3, lit9s       ; Pick up 8 bytes of the character
                                ; ABOVE '9' '::::::::'
        CMPBGE  R2, R1, R4      ; Some R4<i> = 1 if character is LT '0'
        CMPBGE  R1, R3, R5      ; Some R5<i> = 1 if character is GT '9'
        BNE     R4, ERROR       ; Branch if some char too low
        BNE     R5, ERROR       ; Branch if some char too high
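The CMPBGE idioms above are easier to follow with a reference model. The Python sketch below implements the nine-bit add from the Operation section and then uses it for the zero-byte scan and the '0'...'9' range check. The helper names (first_zero_byte, all_digits) are illustrative, and the data is assumed to be little-endian.

```python
def cmpbge(rav, rbv):
    """Eight unsigned byte compares; bit i of the result is set if
    byte i of Rav >= byte i of Rbv."""
    rc = 0
    for i in range(8):
        a = (rav >> (8 * i)) & 0xFF
        b = (rbv >> (8 * i)) & 0xFF
        temp = a + (~b & 0xFF) + 1          # 9-bit sum; bit 8 set iff a >= b
        rc |= ((temp >> 8) & 1) << i
    return rc

def first_zero_byte(quad):
    """Index of the lowest zero byte in quad, or None if there is none."""
    mask = cmpbge(0, quad)                  # 0 >= byte only when byte == 0
    if mask == 0:
        return None
    return (mask & -mask).bit_length() - 1

def all_digits(r1):
    """True if all eight bytes of r1 lie in '0'..'9'."""
    lit0s = int.from_bytes(b"/" * 8, "little")   # character below '0'
    lit9s = int.from_bytes(b":" * 8, "little")   # character above '9'
    return cmpbge(lit0s, r1) == 0 and cmpbge(r1, lit9s) == 0
```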
4.6.2 Extract Byte
Format:
    EXTxx   Ra.rq,Rb.rq,Rc.wq       !Operate format
    EXTxx   Ra.rq,#b.ib,Rc.wq       !Operate format
Operation:
    CASE
        big_endian_data:    Rbv' ← Rbv XOR 111₂
        little_endian_data: Rbv' ← Rbv
    ENDCASE
    CASE
        EXTBL: byte_mask ← 0000 0001₂
        EXTWx: byte_mask ← 0000 0011₂
        EXTLx: byte_mask ← 0000 1111₂
        EXTQx: byte_mask ← 1111 1111₂
    ENDCASE
    CASE
        EXTxL:
            byte_loc ← Rbv'<2:0>*8
            temp ← RIGHT_SHIFT(Rav, byte_loc<5:0>)
            Rc ← BYTE_ZAP(temp, NOT(byte_mask))
        EXTxH:
            byte_loc ← 64 - Rbv'<2:0>*8
            temp ← LEFT_SHIFT(Rav, byte_loc<5:0>)
            Rc ← BYTE_ZAP(temp, NOT(byte_mask))
    ENDCASE
Exceptions:
    None
Instruction mnemonics:
    EXTBL   Extract Byte Low
    EXTWL   Extract Word Low
    EXTLL   Extract Longword Low
    EXTQL   Extract Quadword Low
    EXTWH   Extract Word High
    EXTLH   Extract Longword High
    EXTQH   Extract Quadword High
Qualifiers:
    None
Description:
EXTxL shifts register Ra right by 0 to 7 bytes, inserts zeros into vacated bit positions, and then
extracts 1, 2, 4, or 8 bytes into register Rc. EXTxH shifts register Ra left by 0 to 7 bytes,
inserts zeros into vacated bit positions, and then extracts 2, 4, or 8 bytes into register Rc. The
number of bytes to shift is specified by Rbv'<2:0>. The number of bytes to extract is
specified in the function code. Remaining bytes are filled with zeros.
Notes:
The comments in the examples below assume that the effective address (ea) of X(R11) is such
that (ea mod 8) = 5, the value of the aligned quadword containing X(R11) is CBAx xxxx, and
the value of the aligned quadword containing X+7(R11) is yyyH GFED, and the datum is
little-endian.
The examples below are the most general case unless otherwise noted; if more information is
known about the value or intended alignment of X, shorter sequences can be used.
The intended sequence for loading a quadword from unaligned address X(R11) is:
    LDQ_U   R1, X(R11)      ; Ignores va<2:0>, R1 = CBAx xxxx
    LDQ_U   R2, X+7(R11)    ; Ignores va<2:0>, R2 = yyyH GFED
    LDA     R3, X(R11)      ; R3<2:0> = (X mod 8) = 5
    EXTQL   R1, R3, R1      ; R1 = 0000 0CBA
    EXTQH   R2, R3, R2      ; R2 = HGFE D000
    OR      R2, R1, R1      ; R1 = HGFE DCBA
The intended sequence for loading and zero-extending a longword from unaligned address X
is:
    LDQ_U   R1, X(R11)      ; Ignores va<2:0>, R1 = CBAx xxxx
    LDQ_U   R2, X+3(R11)    ; Ignores va<2:0>, R2 = yyyy yyyD
    LDA     R3, X(R11)      ; R3<2:0> = (X mod 8) = 5
    EXTLL   R1, R3, R1      ; R1 = 0000 0CBA
    EXTLH   R2, R3, R2      ; R2 = 0000 D000
    OR      R2, R1, R1      ; R1 = 0000 DCBA
The intended sequence for loading and sign-extending a longword from unaligned address X
is:
    LDQ_U   R1, X(R11)      ; Ignores va<2:0>, R1 = CBAx xxxx
    LDQ_U   R2, X+3(R11)    ; Ignores va<2:0>, R2 = yyyy yyyD
    LDA     R3, X(R11)      ; R3<2:0> = (X mod 8) = 5
    EXTLL   R1, R3, R1      ; R1 = 0000 0CBA
    EXTLH   R2, R3, R2      ; R2 = 0000 D000
    OR      R2, R1, R1      ; R1 = 0000 DCBA
    ADDL    R31, R1, R1     ; R1 = ssss DCBA
For software that is not designed to use the BWX extension, the intended sequence for loading
and zero-extending a word from unaligned address X is:
    LDQ_U   R1, X(R11)      ; Ignores va<2:0>, R1 = yBAx xxxx
    LDQ_U   R2, X+1(R11)    ; Ignores va<2:0>, R2 = yBAx xxxx
    LDA     R3, X(R11)      ; R3<2:0> = (X mod 8) = 5
    EXTWL   R1, R3, R1      ; R1 = 0000 00BA
    EXTWH   R2, R3, R2      ; R2 = 0000 0000
    OR      R2, R1, R1      ; R1 = 0000 00BA
For software that is not designed to use the BWX extension, the intended sequence for loading
and sign-extending a word from unaligned address X is:
    LDQ_U   R1, X(R11)      ; Ignores va<2:0>, R1 = yBAx xxxx
    LDQ_U   R2, X+1(R11)    ; Ignores va<2:0>, R2 = yBAx xxxx
    LDA     R3, X+1+1(R11)  ; R3<2:0> = 5+1+1 = 7
    EXTQL   R1, R3, R1      ; R1 = 0000 000y
    EXTQH   R2, R3, R2      ; R2 = BAxx xxx0
    OR      R2, R1, R1      ; R1 = BAxx xxxy
    SRA     R1, #48, R1     ; R1 = ssss ssBA
For software that is not designed to use the BWX extension, the intended sequence for loading
and zero-extending a byte from address X is:
    LDQ_U   R1, X(R11)      ; Ignores va<2:0>, R1 = yyAx xxxx
    LDA     R3, X(R11)      ; R3<2:0> = (X mod 8) = 5
    EXTBL   R1, R3, R1      ; R1 = 0000 000A
For software that is not designed to use the BWX extension, the intended sequence for loading
and sign-extending a byte from address X is:
    LDQ_U   R1, X(R11)      ; Ignores va<2:0>, R1 = yyAx xxxx
    LDA     R3, X+1(R11)    ; R3<2:0> = (X + 1) mod 8, i.e.,
                            ; convert byte position within
                            ; quadword to one-origin based
    EXTQH   R1, R3, R1      ; Places the desired byte into byte 7
                            ; of R1.final by left shifting
                            ; R1.initial by (8 - R3<2:0>) byte
                            ; positions
    SRA     R1, #56, R1     ; Arithmetic Shift of byte 7 down
                            ; into byte 0
Optimized examples:
Assume that a word fetch is needed from 10(R3), where R3 is intended to contain a
longword-aligned address. The optimized sequences below take advantage of the known constant
offset, and the longword alignment (hence a single aligned longword contains the entire word).
The sequences generate a Data Alignment Fault if R3 does not contain a longword-aligned
address.
For software that is not designed to use the BWX extension, the intended sequence for loading
and zero-extending an aligned word from 10(R3) is:
    LDL     R1, 8(R3)       ; R1 = ssss BAxx
                            ; Faults if R3 is not longword aligned
    EXTWL   R1, #2, R1      ; R1 = 0000 00BA
For software that is not designed to use the BWX extension, the intended sequence for loading
and sign-extending an aligned word from 10(R3) is:
    LDL     R1, 8(R3)       ; R1 = ssss BAxx
                            ; Faults if R3 is not longword aligned
    SRA     R1, #16, R1     ; R1 = ssss ssBA
Big-endian examples:
For software that is not designed to use the BWX extension, the intended sequence for loading
and zero-extending a byte from address X is:
    LDQ_U   R1, X(R11)      ; Ignores va<2:0>, R1 = xxxx xAyy
    LDA     R3, X(R11)      ; R3<2:0> = 5, shift will be 2 bytes
    EXTBL   R1, R3, R1      ; R1 = 0000 000A
The intended sequence for loading a quadword from unaligned address X(R11) is:
    LDQ_U   R1, X(R11)      ; Ignores va<2:0>, R1 = xxxx xABC
    LDQ_U   R2, X+7(R11)    ; Ignores va<2:0>, R2 = DEFG Hyyy
    LDA     R3, X+7(R11)    ; R3<2:0> = 4, shift will be 3 bytes
    EXTQH   R1, R3, R1      ; R1 = ABC0 0000
    EXTQL   R2, R3, R2      ; R2 = 000D EFGH
    OR      R1, R2, R1      ; R1 = ABCD EFGH
Note that the address in the LDA instruction for big-endian quadwords is X+7, for longwords
is X+3, and for words is X+1; for little-endian, these are all just X. Also note that the EXTQH
and EXTQL instructions are reversed with respect to the little-endian sequence.
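The little-endian unaligned quadword load can be simulated to see how EXTQL and EXTQH cooperate. In the Python sketch below, memory is modeled as a bytes object and ldq_u, extql, extqh, and load_unaligned_quad are hypothetical helpers that follow the Operation pseudocode for little-endian data only.

```python
MASK64 = (1 << 64) - 1

def ldq_u(memory, addr):
    """Load the aligned quadword containing addr (va<2:0> ignored)."""
    base = addr & ~7
    return int.from_bytes(memory[base:base + 8], "little")

def extql(rav, rbv):
    """Shift right by Rbv<2:0> bytes; all eight result bytes are kept."""
    return (rav & MASK64) >> ((rbv & 7) * 8)

def extqh(rav, rbv):
    """Shift left by (64 - Rbv<2:0>*8)<5:0> bits, so a count of 64 becomes 0."""
    shift = (64 - (rbv & 7) * 8) & 63
    return (rav << shift) & MASK64

def load_unaligned_quad(memory, x):
    """Model of the LDQ_U / LDQ_U / EXTQL / EXTQH / OR sequence."""
    r1 = ldq_u(memory, x)           # quadword holding the low bytes
    r2 = ldq_u(memory, x + 7)       # quadword holding the high bytes
    r1 = extql(r1, x)
    r2 = extqh(r2, x)
    return r2 | r1
```

When x is already aligned, both loads return the same quadword and both extracts shift by zero, so the OR yields the quadword unchanged.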
4.6.3 Byte Insert
Format:
    INSxx   Ra.rq,Rb.rq,Rc.wq       !Operate format
    INSxx   Ra.rq,#b.ib,Rc.wq       !Operate format
Operation:
    CASE
        big_endian_data:    Rbv' ← Rbv XOR 111₂
        little_endian_data: Rbv' ← Rbv
    ENDCASE
    CASE
        INSBL: byte_mask ← 0000 0000 0000 0001₂
        INSWx: byte_mask ← 0000 0000 0000 0011₂
        INSLx: byte_mask ← 0000 0000 0000 1111₂
        INSQx: byte_mask ← 0000 0000 1111 1111₂
    ENDCASE
    byte_mask ← LEFT_SHIFT(byte_mask, Rbv'<2:0>)
    CASE
        INSxL:
            byte_loc ← Rbv'<2:0>*8
            temp ← LEFT_SHIFT(Rav, byte_loc<5:0>)
            Rc ← BYTE_ZAP(temp, NOT(byte_mask<7:0>))
        INSxH:
            byte_loc ← 64 - Rbv'<2:0>*8
            temp ← RIGHT_SHIFT(Rav, byte_loc<5:0>)
            Rc ← BYTE_ZAP(temp, NOT(byte_mask<15:8>))
    ENDCASE
Exceptions:
    None
Instruction mnemonics:
    INSBL   Insert Byte Low
    INSWL   Insert Word Low
    INSLL   Insert Longword Low
    INSQL   Insert Quadword Low
    INSWH   Insert Word High
    INSLH   Insert Longword High
    INSQH   Insert Quadword High
Qualifiers:
    None
Description:
INSxL and INSxH shift bytes from register Ra and insert them into a field of zeros, storing the
result in register Rc. Register Rbv'<2:0> selects the shift amount, and the function code
selects the maximum field width: 1, 2, 4, or 8 bytes. The instructions can generate a byte,
word, longword, or quadword datum that is spread across two registers at an arbitrary byte
alignment.
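As a rough check of the insert operation, the Python sketch below follows the Operation pseudocode for little-endian data. The single insert function with width and high parameters is a convenience of this example (the architecture encodes these choices in the mnemonic), and the test value is made up.

```python
MASK64 = (1 << 64) - 1

def byte_zap(value, zap_mask):
    """Zero byte i of value wherever bit i of zap_mask is set."""
    for i in range(8):
        if (zap_mask >> i) & 1:
            value &= ~(0xFF << (8 * i))
    return value & MASK64

def insert(rav, rbv, width, high):
    """Model of INSxL (high=False) or INSxH (high=True); width in bytes."""
    pos = rbv & 7
    byte_mask = ((1 << width) - 1) << pos           # 16-bit byte mask
    if not high:
        temp = (rav << (pos * 8)) & MASK64
        keep = byte_mask & 0xFF                      # byte_mask<7:0>
    else:
        temp = (rav & MASK64) >> ((64 - pos * 8) & 63)
        keep = (byte_mask >> 8) & 0xFF               # byte_mask<15:8>
    return byte_zap(temp, ~keep & 0xFF)
```

With a quadword inserted at byte offset 5, the low form yields the three bytes that land in the first aligned quadword and the high form yields the five bytes that spill into the second; at offset 0 the high form yields zero.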
4.6.4 Byte Mask
Format:
    MSKxx   Ra.rq,Rb.rq,Rc.wq       !Operate format
    MSKxx   Ra.rq,#b.ib,Rc.wq       !Operate format
Operation:
    CASE
        big_endian_data:    Rbv' ← Rbv XOR 111₂
        little_endian_data: Rbv' ← Rbv
    ENDCASE
    CASE
        MSKBL: byte_mask ← 0000 0000 0000 0001₂
        MSKWx: byte_mask ← 0000 0000 0000 0011₂
        MSKLx: byte_mask ← 0000 0000 0000 1111₂
        MSKQx: byte_mask ← 0000 0000 1111 1111₂
    ENDCASE
    byte_mask ← LEFT_SHIFT(byte_mask, Rbv'<2:0>)
    CASE
        MSKxL:
            Rc ← BYTE_ZAP(Rav, byte_mask<7:0>)
        MSKxH:
            Rc ← BYTE_ZAP(Rav, byte_mask<15:8>)
    ENDCASE
Exceptions:
    None
Instruction mnemonics:
    MSKBL   Mask Byte Low
    MSKWL   Mask Word Low
    MSKLL   Mask Longword Low
    MSKQL   Mask Quadword Low
    MSKWH   Mask Word High
    MSKLH   Mask Longword High
    MSKQH   Mask Quadword High
Qualifiers:
    None
Description:
MSKxL and MSKxH set selected bytes of register Ra to zero, storing the result in register Rc.
Register Rbv'<2:0> selects the starting position of the field of zero bytes, and the function
code selects the maximum width: 1, 2, 4, or 8 bytes. The instructions generate a byte, word,
longword, or quadword field of zeros that can spread across two registers at an arbitrary byte
alignment.
Notes:
The comments in the examples below assume that the effective address (ea) of X(R11) is such
that (ea mod 8) = 5, the value of the aligned quadword containing X(R11) is CBAx xxxx, the
value of the aligned quadword containing X+7(R11) is yyyH GFED, the value to be stored
from R5 is HGFE DCBA, and the datum is little-endian. Slight modifications similar to those
in Section 4.6.2 apply to big-endian data.
The examples below are the most general case; if more information is known about the value
or intended alignment of X, shorter sequences can be used.
The intended sequence for storing an unaligned quadword R5 at address X(R11) is:
LDA    R6, X(R11)       ; R6<2:0> = (X mod 8) = 5
LDQ_U  R2, X+7(R11)     ; Ignores va<2:0>, R2 = yyyH GFED
LDQ_U  R1, X(R11)       ; Ignores va<2:0>, R1 = CBAx xxxx
INSQH  R5, R6, R4       ; R4 = 000H GFED
INSQL  R5, R6, R3       ; R3 = CBA0 0000
MSKQH  R2, R6, R2       ; R2 = yyy0 0000
MSKQL  R1, R6, R1       ; R1 = 000x xxxx
OR     R2, R4, R2       ; R2 = yyyH GFED
OR     R1, R3, R1       ; R1 = CBAx xxxx
STQ_U  R2, X+7(R11)     ; Must store high then low for
STQ_U  R1, X(R11)       ; degenerate case of aligned QW
The intended sequence for storing an unaligned longword R5 at X is:
LDA    R6, X(R11)       ; R6<2:0> = (X mod 8) = 5
LDQ_U  R2, X+3(R11)     ; Ignores va<2:0>, R2 = yyyy yyyD
LDQ_U  R1, X(R11)       ; Ignores va<2:0>, R1 = CBAx xxxx
INSLH  R5, R6, R4       ; R4 = 0000 000D
INSLL  R5, R6, R3       ; R3 = CBA0 0000
MSKLH  R2, R6, R2       ; R2 = yyyy yyy0
MSKLL  R1, R6, R1       ; R1 = 000x xxxx
OR     R2, R4, R2       ; R2 = yyyy yyyD
OR     R1, R3, R1       ; R1 = CBAx xxxx
STQ_U  R2, X+3(R11)     ; Must store high then low for
STQ_U  R1, X(R11)       ; degenerate case of aligned
For software that is not designed to use the BWX extension, the intended sequence for storing
an unaligned word R5 at X is:
LDA    R6, X(R11)       ; R6<2:0> = (X mod 8) = 5
LDQ_U  R2, X+1(R11)     ; Ignores va<2:0>, R2 = yBAx xxxx
LDQ_U  R1, X(R11)       ; Ignores va<2:0>, R1 = yBAx xxxx
INSWH  R5, R6, R4       ; R4 = 0000 0000
INSWL  R5, R6, R3       ; R3 = 0BA0 0000
MSKWH  R2, R6, R2       ; R2 = yBAx xxxx
MSKWL  R1, R6, R1       ; R1 = y00x xxxx
OR     R2, R4, R2       ; R2 = yBAx xxxx
OR     R1, R3, R1       ; R1 = yBAx xxxx
STQ_U  R2, X+1(R11)     ; Must store high then low for
STQ_U  R1, X(R11)       ; degenerate case of aligned
For software that is not designed to use the BWX extension, the intended sequence for storing
a byte R5 at X is:
LDA    R6, X(R11)       ; R6<2:0> = (X mod 8) = 5
LDQ_U  R1, X(R11)       ; Ignores va<2:0>, R1 = yyAx xxxx
INSBL  R5, R6, R3       ; R3 = 00A0 0000
MSKBL  R1, R6, R1       ; R1 = yy0x xxxx
OR     R1, R3, R1       ; R1 = yyAx xxxx
STQ_U  R1, X(R11)       ;
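The unaligned quadword store sequence can be traced with a small Python simulation. This is a sketch under stated assumptions: memory is a little-endian `bytearray`, and the helpers `ldq_u`, `stq_u`, `byte_zap`, and `store_unaligned_quad` are hypothetical names that mirror the instructions in the sequence rather than any real API.

```python
MASK64 = (1 << 64) - 1

def byte_zap(value, zap_bits):
    """Clear byte i of a 64-bit value wherever bit i of zap_bits is set."""
    for i in range(8):
        if (zap_bits >> i) & 1:
            value &= ~(0xFF << (8 * i))
    return value & MASK64

def ldq_u(mem, addr):
    base = addr & ~7                         # LDQ_U ignores va<2:0>
    return int.from_bytes(mem[base:base + 8], "little")

def stq_u(mem, addr, value):
    base = addr & ~7                         # STQ_U ignores va<2:0>
    mem[base:base + 8] = value.to_bytes(8, "little")

def store_unaligned_quad(mem, addr, r5):
    r5 &= MASK64
    shift = addr & 7                         # LDA R6: R6<2:0>
    mask = 0xFF << shift                     # quadword byte_mask, shifted
    lo_mask, hi_mask = mask & 0xFF, (mask >> 8) & 0xFF
    r2 = ldq_u(mem, addr + 7)
    r1 = ldq_u(mem, addr)
    r4 = byte_zap(r5 >> ((64 - 8 * shift) & 63), ~hi_mask & 0xFF)    # INSQH
    r3 = byte_zap((r5 << (8 * shift)) & MASK64, ~lo_mask & 0xFF)     # INSQL
    r2 = byte_zap(r2, hi_mask)               # MSKQH
    r1 = byte_zap(r1, lo_mask)               # MSKQL
    stq_u(mem, addr + 7, r2 | r4)            # high first ...
    stq_u(mem, addr, r1 | r3)                # ... then low
```

Storing 0x4847464544434241 at offset 5 of a 16-byte buffer places the bytes "ABCDEFGH" at offsets 5 through 12 and leaves the neighboring bytes unchanged. In the aligned case both stores hit the same quadword, so the low store must come last; reversing the order would write the stale high copy over the new data.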
4.6.5 Sign Extend
Format:
SEXTx Rb.rq,Rc.wq !Operate format
SEXTx #b.ib,Rc.wq !Operate format
Operation:
CASE
  SEXTB: Rc ← SEXT(Rbv<07:0>)
  SEXTW: Rc ← SEXT(Rbv<15:0>)
ENDCASE
Exceptions:
None
Instruction mnemonics:
SEXTB Sign Extend Byte
SEXTW Sign Extend Word
Qualifiers:
None
Description:
The byte or word in register Rb is sign-extended to 64 bits and written to register Rc. Ra must
be R31.
Implementation Note:
The SEXTB and SEXTW instructions are supported in hardware on Alpha
implementations for which the AMASK instruction returns bit 0 set. SEXTB and SEXTW
are supported with software emulation in Alpha implementations for which AMASK does
not return bit 0 set. Software emulation of SEXTB and SEXTW is significantly slower
than hardware support.
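The sign-extension operation is easy to state in Python. The sketch below models only the arithmetic result in a 64-bit register; the function names are illustrative.

```python
MASK64 = (1 << 64) - 1

def sext(value, bits):
    """Sign-extend the low `bits` bits of value to a 64-bit register image."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    # XOR-then-subtract propagates the sign bit into the upper bits.
    return ((value ^ sign) - sign) & MASK64

def sextb(rbv):
    return sext(rbv, 8)      # SEXTB: Rc <- SEXT(Rbv<07:0>)

def sextw(rbv):
    return sext(rbv, 16)     # SEXTW: Rc <- SEXT(Rbv<15:0>)
```

For example, `sextb(0x80)` yields 0xFFFFFFFFFFFFFF80, while `sextb(0x17F)` yields 0x7F because only the low byte of Rbv participates.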
4.6.6 Zero Bytes
Format:
ZAPx Ra.rq,Rb.rq,Rc.wq !Operate format
ZAPx Ra.rq,#b.ib,Rc.wq !Operate format
Operation:
CASE
  ZAP:
    Rc ← BYTE_ZAP(Rav, Rbv<7:0>)
  ZAPNOT:
    Rc ← BYTE_ZAP(Rav, NOT Rbv<7:0>)
ENDCASE
Exceptions:
None
Instruction mnemonics:
ZAP Zero Bytes
ZAPNOT Zero Bytes Not
Qualifiers:
None
Description:
ZAP and ZAPNOT set selected bytes of register Ra to zero and store the result in register Rc.
Register Rb<7:0> selects the bytes to be zeroed. Bit 0 of Rbv corresponds to byte 0, bit 1 of
Rbv corresponds to byte 1, and so on. A result byte is set to zero if the corresponding bit of
Rbv is a one for ZAP and a zero for ZAPNOT.
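A short Python model makes the bit-to-byte correspondence concrete. It is a sketch of the Operation pseudocode only; `byte_zap`, `zap`, and `zapnot` are illustrative names.

```python
MASK64 = (1 << 64) - 1

def byte_zap(value, zap_bits):
    """Clear byte i of a 64-bit value wherever bit i of zap_bits is set."""
    for i in range(8):
        if (zap_bits >> i) & 1:
            value &= ~(0xFF << (8 * i))
    return value & MASK64

def zap(rav, rbv):
    return byte_zap(rav, rbv & 0xFF)         # zero bytes whose bit is 1

def zapnot(rav, rbv):
    return byte_zap(rav, ~rbv & 0xFF)        # zero bytes whose bit is 0
```

With Rav = 0x1122334455667788, `zapnot(rav, 0x0F)` keeps the low longword (0x55667788) and `zap(rav, 0x0F)` clears it (0x1122334400000000).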
4.7 Floating-Point Instructions
Alpha provides instructions for operating on floating-point operands in each of four data
formats:
• F_floating (VAX single)
• G_floating (VAX double, 11-bit exponent)
• S_floating (IEEE single)
• T_floating (IEEE double, 11-bit exponent)
Data conversion instructions are also provided to convert operands between floating-point and
quadword integer formats, between double and single floating, and between quadword and
longword integers.
Note:
D_floating is a partially supported datatype; no D_floating arithmetic operations are
provided in the architecture. For backward compatibility, exact D_floating arithmetic may
be provided via software emulation. D_floating "format compatibility," in which binary
files of D_floating numbers may be processed but without the last 3 bits of fraction
precision, can be obtained via conversions to G_floating, G arithmetic operations, then
conversion back to D_floating.
The choice of data formats is encoded in each instruction. Each instruction also encodes the
choice of rounding mode and the choice of trapping mode.
All floating-point operate instructions (not including loads or stores) that yield an F_floating or
G_floating zero result must materialize a true zero.
4.7.1 Single-Precision Operations
Single-precision values (F_floating or S_floating) are stored in the floating-point registers in
canonical form, as subsets of double-precision values, with 11-bit exponents restricted to the
corresponding single-precision range, and with the 29 low-order fraction bits restricted to be all
zero.
Single-precision operations applied to canonical single-precision values give single-precision
results. Single-precision operations applied to non-canonical operands give UNPREDICTABLE
results.
Longword integer values in floating-point registers are stored in bits <63:62,58:29>, with bits
<61:59> ignored and zeros in bits <28:0>.
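The register layouts described here can be illustrated in Python. This is a sketch under two assumptions that the text does not spell out: that the register image of a normal, finite S_floating value matches the IEEE double encoding of the same value, and that longword bits <31:30> map to register bits <63:62> and bits <29:0> map to <58:29>. All function names are hypothetical.

```python
import struct

LOW29 = (1 << 29) - 1

def s_register_image(x):
    """64-bit register image of a normal, finite S_floating value (assumed)."""
    single = struct.unpack("<f", struct.pack("<f", x))[0]   # round to single
    return struct.unpack("<Q", struct.pack("<d", single))[0]

def low_fraction_bits_clear(image):
    """Check the 29 low-order fraction bits (not the exponent range)."""
    return image & LOW29 == 0

def longword_to_freg(lw):
    """Place a 32-bit longword into register bits <63:62,58:29>."""
    lw &= 0xFFFFFFFF
    return ((lw >> 30) << 62) | ((lw & 0x3FFFFFFF) << 29)

def freg_to_longword(image):
    """Recover the longword; bits <61:59> are ignored."""
    return (((image >> 62) & 0x3) << 30) | ((image >> 29) & 0x3FFFFFFF)
```

Under these assumptions, `s_register_image(1.0)` is 0x3FF0000000000000 and every image produced from a single-precision value has its low 29 fraction bits clear.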
4.7.2 Subsets and Faults
All floating-point operations may take floating disabled faults. Any subsetted floating-point
instruction may take an Illegal Instruction Trap. These faults are not explicitly listed in the
description of each instruction.
All floating-point loads and stores may take memory management faults (access control violation,
translation not valid, fault on read/write, data alignment).
The floating-point enable (FEN) internal processor register (IPR) allows system software to
restrict access to the floating-point registers.
If a floating-point instruction is implemented and FEN = 0, attempts to execute the instruction
cause a floating disabled fault.
If a floating-point instruction is not implemented, attempts to execute the instruction cause an
Illegal Instruction Trap. This rule holds regardless of the value of FEN.
An Alpha implementation may provide both VAX and IEEE floating-point operations, either,
or none.
Some floating-point instructions are common to the VAX and IEEE subsets, some are VAX
only, and some are IEEE only. These are designated in the descriptions that follow. If either
subset is implemented, all the common instructions must be implemented.
An implementation that includes IEEE floating-point may subset the ability to perform rounding
to plus infinity and minus infinity. If not implemented, instructions requesting these
rounding modes take Illegal Instruction Trap.
An implementation that includes IEEE floating-point may implement any subset of the Trap
Disable flags (DNOD, DZED, INED, INVD, OVFD, and UNFD) and Denormal Control flags
(DNZ and UNDZ) in the FPCR:
• If a Trap Disable flag is not implemented, then the corresponding trap occurs as usual.
• If DNZ is not implemented, then any IEEE operation with a denormal input must take an Invalid Operation Trap.
• If UNDZ is not implemented, then any IEEE operation that includes a /S qualifier that underflows must take an Underflow Trap.
• If DZED is implemented, then IEEE division of 0/0 must be treated as an invalid operation instead of a division by zero.
Any unimplemented bits in the FPCR are read as zero and ignored when set.
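The interaction between subsetting and FEN reduces to a two-step decision. The sketch below restates that rule in Python; the function name and the returned strings are illustrative only.

```python
def fp_instruction_outcome(implemented, fen):
    """Outcome of attempting a floating-point instruction.

    implemented: True if the implementation provides the instruction.
    fen:         value of the floating-point enable (FEN) register.
    """
    if not implemented:
        # Holds regardless of the value of FEN.
        return "illegal instruction trap"
    if fen == 0:
        return "floating disabled fault"
    return "execute"
```

Checking `implemented` first reflects the statement that an unimplemented instruction traps as illegal whether or not floating-point access is enabled.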
4.7.3 Definitions
The following definitions apply to Alpha floating-point support.
Alpha finite number
A floating-point number with a definite, in-range value. Specifically, all numbers in the inclusive
ranges –MAX through –MIN, zero, and +MIN through +MAX, where MAX is the largest
non-infinite representable floating-point number and MIN is the smallest non-zero representable
normalized floating-point number.
For VAX floating-point, finites do not include reserved operands or dirty zeros (this differs
from the usual VAX interpretation of dirty zeros as finite). For IEEE floating-point, finites do
not include infinities, NaNs, or denormals, but do include minus zero.
denormal
An IEEE floating-point bit pattern that represents a number whose magnitude lies between
zero and the smallest finite number.
dirty zero
A VAX floating-point bit pattern that represents a zero value, but not in true-zero form.
infinity
An IEEE floating-point bit pattern that represents plus or minus infinity.
LSB
The least significant bit. For a positive finite representable number A, A + 1 LSB is the next
larger representable number, and A + ½ LSB is exactly halfway between A and the next larger
representable number. For a positive representable number A whose fraction field is not all
zeros, A – 1 LSB is the next smaller representable number, and A – ½ LSB is exactly halfway
between A and the next smaller representable number.
non-finite number
An IEEE infinity, NaN, denormal number, or a VAX dirty zero or reserved operand.
Not-a-Number
An IEEE floating-point bit pattern that represents something other than a number. This comes
in two forms: signaling NaNs (for Alpha, those with an initial fraction bit of 0) and quiet NaNs
(for Alpha, those with an initial fraction bit of 1).
representable result
A real number that can be represented exactly as a VAX or IEEE floating-point number, with
finite precision and bounded exponent range.
reserved operand
A VAX floating-point bit pattern that represents an illegal value.
trap shadow
The set of instructions potentially executed after an instruction that signals an arithmetic trap
but before the trap is actually taken.
true result
The mathematically correct result of an operation, assuming that the input operand values are
exact. The true result is typically rounded to the nearest representable result.
true zero
The value +0, represented as exactly 64 zeros in a floating-point register.
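The IEEE terms defined above can be sketched as a classifier over T_floating bit patterns. The example assumes the standard IEEE double layout (1 sign bit, 11 exponent bits, 52 fraction bits); the function names and category strings are illustrative.

```python
def classify_t_floating(bits):
    """Classify a 64-bit T_floating pattern using the definitions above."""
    exponent = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    if exponent == 0x7FF:
        if fraction == 0:
            return "infinity"
        # The initial (most significant) fraction bit separates the two NaN forms.
        return "quiet NaN" if (fraction >> 51) & 1 else "signaling NaN"
    if exponent == 0:
        return "zero" if fraction == 0 else "denormal"
    return "finite"

def is_alpha_finite(bits):
    """Finites include plus and minus zero but not denormals, infinities, or NaNs."""
    return classify_t_floating(bits) in ("zero", "finite")
```

Note that minus zero (only the sign bit set) counts as finite for IEEE formats, whereas a true zero is specifically the all-zeros pattern.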
4.7.4 Encodings
Floating-point numbers are represented with three fields: sign, exponent, and fraction. The sign
is 1 bit; the exponent is 8, 11, or 15 bits; and the fraction is 23, 52, 55, or 112 bits. Some
encodings represent special values:
Sign  Exponent  Fraction  VAX Meaning    VAX Finite  IEEE Meaning  IEEE Finite
x     All-1's   Non-zero  Finite         Yes         +/– NaN       No
x     All-1's   0         Finite         Yes         +/– Infinity  No
0     0         Non-zero  Dirty zero     No          +Denormal     No
1     0         Non-zero  Resv. operand  No          –Denormal     No
0     0         0         True zero      Yes         +0            Yes
1     0         0         Resv. operand  No          –0            Yes
x     Other     x         Finite         Yes         finite        Yes
The values of MIN and MAX for each of the five floating-point data formats are:
Data Format  MIN                          MAX
F_floating   2**–127 * 0.5                2**127 * (1.0 – 2**–24)
             (0.293873588e–38)            (1.7014117e38)
G_floating   2**–1023 * 0.5               2**1023 * (1.0 – 2**–53)
             (0.5562684646268004e–308)    (0.89884656743115785407e308)
S_floating   2**–126 * 1.0                2**127 * (2.0 – 2**–23)
             (1.17549435e–38)             (3.40282347e38)
T_floating   2**–1022 * 1.0               2**1023 * (2.0 – 2**–52)
             (2.2250738585072013e–308)    (1.7976931348623158e308)
X_floating   2**–16382 * 1.0              2**16383 * (2.0 – 2**–112)
             (See below *)                (See below **)
*  (3.36210314311209350626267781732175260e–4932)
** (1.18973149535723176508575932662800702e4932)
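The MIN and MAX formulas for the four formats that fit in 64 bits can be checked with exact rational arithmetic. This is a small sketch, not part of the architecture; the `LIMITS` table name is an assumption, and X_floating is omitted because its values fall outside the range of a Python float.

```python
from fractions import Fraction
import sys

TWO = Fraction(2)

# (MIN, MAX) per format, transcribed from the formulas in the table.
LIMITS = {
    "F_floating": (TWO ** -127 * Fraction(1, 2), TWO ** 127 * (1 - TWO ** -24)),
    "G_floating": (TWO ** -1023 * Fraction(1, 2), TWO ** 1023 * (1 - TWO ** -53)),
    "S_floating": (TWO ** -126, TWO ** 127 * (2 - TWO ** -23)),
    "T_floating": (TWO ** -1022, TWO ** 1023 * (2 - TWO ** -52)),
}

def as_decimal(name):
    """Return (MIN, MAX) for a format as Python floats, for display."""
    lo, hi = LIMITS[name]
    return float(lo), float(hi)
```

On a platform whose Python float is an IEEE double, the T_floating entries coincide with `sys.float_info.min` and `sys.float_info.max`, which is a convenient cross-check of the table.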
4.7.5 Rounding Modes
All rounding modes map a true result that is exactly representable to that representable value.
VAX Rounding Modes
For VAX floating-point operations, two rounding modes are provided and are specified in each
instruction: normal (biased) rounding and chopped rounding.
Normal VAX rounding maps the true result to the nearest of two representable results, with
true results exactly halfway between mapped to the larger in absolute value (sometimes called
biased rounding away from zero); maps true results ≥ MAX + 1/2 LSB in magnitude to an
overflow; maps true results < MIN – 1/4 LSB in magnitude to an underflow.
Chopped VAX rounding maps the true result to the smaller in magnitude of two surrounding
representable results; maps true results ≥ MAX + 1 LSB in magnitude to an overflow; maps
true results < MIN in magnitude to an underflow.
IEEE Rounding Modes
For IEEE floating-point operations, four rounding modes are provided: normal rounding (unbiased
round to nearest), rounding toward minus infinity, rounding toward zero, and rounding
toward plus infinity. The first three can be specified in the instruction. Rounding toward plus
infinity can be obtained by setting the Floating-point Control Register (FPCR) to select it and
then specifying dynamic rounding mode in the instruction (see Section 4.7.8). Alpha IEEE
arithmetic does rounding before detecting overflow/underflow.
Normal IEEE rounding maps the true result to the nearest of two representable results, with
true results exactly halfway between mapped to the one whose fraction ends in 0 (sometimes
called unbiased rounding to even); maps true results ≥ MAX + 1/2 LSB in magnitude to an
overflow; maps true results < MIN – 1/2 LSB in magnitude to an underflow.
Plus infinity IEEE rounding maps the true result to the larger of two surrounding representable
results; maps true results > MAX in magnitude to an overflow; maps positive true results
≤ +MIN – 1 LSB to an underflow; and maps negative true results > –MIN to an underflow.
Minus infinity IEEE rounding maps the true result to the smaller of two surrounding representable
results; maps true results > MAX in magnitude to an overflow; maps positive true results
< +MIN to an underflow; and maps negative true results ≥ –MIN + 1 LSB to an underflow.
Chopped IEEE rounding maps the true result to the smaller in magnitude of two surrounding
representable results; maps true results ≥ MAX + 1 LSB in magnitude to an overflow; and
maps non-zero true results < MIN in magnitude to an underflow.
Dynamic rounding mode uses the IEEE rounding mode selected by the FPCR register and is
described in more detail in Section 4.7.8.
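The difference between the rounding modes is easiest to see on exact halfway cases. The sketch below rounds a true result expressed in units of 1 LSB, so representable values are the integers; it ignores overflow and underflow, and the mode names are illustrative labels rather than instruction qualifiers.

```python
from fractions import Fraction
import math

MODES = {"ieee_normal", "vax_normal", "chopped", "plus_inf", "minus_inf"}

def round_lsb(x, mode):
    """Round a true result x (in units of 1 LSB) to a representable integer."""
    if mode not in MODES:
        raise ValueError(f"unknown rounding mode: {mode}")
    x = Fraction(x)
    lo, hi = math.floor(x), math.ceil(x)
    if lo == hi:
        return lo                      # exactly representable: unchanged
    if mode == "minus_inf":
        return lo
    if mode == "plus_inf":
        return hi
    if mode == "chopped":
        return lo if x > 0 else hi     # smaller in magnitude
    frac = x - lo
    if frac != Fraction(1, 2):
        return lo if frac < Fraction(1, 2) else hi
    if mode == "ieee_normal":
        return lo if lo % 2 == 0 else hi   # tie: result ending in 0 (even)
    return hi if x > 0 else lo             # vax_normal tie: larger magnitude
```

For a true result of 2.5 LSB, normal IEEE rounding gives 2 while normal VAX rounding gives 3; for 3.5 LSB both give 4.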
The following tables summarize the floating-point rounding modes:
VAX Rounding Mode    Instruction Notation
Normal rounding      (No qualifier)
Chopped              /C
IEEE Rounding Mode   Instruction Notation
Normal rounding      (No qualifier)
Dynamic rounding     /D
Plus infinity        /D and ensure that FPCR<DYN> = '11'
Minus infinity       /M
Chopped              /C
4.7.6 Computational Models
The Alpha architecture provides a choice of floating-point computational models.
There are two computational models available on systems that implement the VAX floating-point
subset:
• VAX-format arithmetic with precise exceptions
• High-performance VAX-format arithmetic
There are three computational models available on systems that implement the IEEE floating-point
subset:
• IEEE compliant arithmetic
• IEEE compliant arithmetic without inexact exception
• High-performance IEEE-format arithmetic
4.7.6.1 VAX-Format Arithmetic with Precise Exceptions
This model provides floating-point arithmetic that is fully compatible with the floating-point
arithmetic provided by the VAX architecture. It provides support for VAX non-finites and
gives precise exceptions.
This model is implemented by using VAX floating-point instructions with the /S, /SU, and /SV
trap qualifiers. Each instruction can determine whether it also takes an exception on underflow
or integer overflow. The performance of this model depends on how often computations
involve non-finite operands. Performance also depends on how an Alpha system chooses to
trade off implementation complexity between hardware and operating system completion handlers
(see Section 4.7.7.3).
4.7.6.2 High-Performance VAX-Format Arithmetic
This model provides arithmetic operations on VAX finite numbers. An imprecise arithmetic
trap is generated by any operation that involves non-finite numbers, floating overflow, or
divide-by-zero exceptions.
This model is implemented by using VAX floating-point instructions with a trap qualifier other
than /S, /SU, or /SV. Each instruction can determine whether it also traps on underflow or integer
overflow. This model does not require the overhead of an operating system completion
handler and can be the faster of the two VAX models.
4.7.6.3 IEEE-Compliant Arithmetic
This model provides floating-point arithmetic that fully complies with the IEEE Standard for
Binary Floating-Point Arithmetic. It provides all of the exception status flags that are in the
standard. It provides a default where all traps and faults are disabled and where IEEE
non-finite values are used in lieu of exceptions.
Alpha operating systems provide additional mechanisms that allow the user to specify dynamically
which exception conditions should trap and which should proceed without trapping. The
operating systems also include mechanisms that allow alternative handling of denormal values.
See Appendix B and the appropriate operating system documentation for a description of
these mechanisms.
This model is implemented by using IEEE floating-point instructions with the /SUI
or /SVI trap qualifiers. The performance of this model depends on how often computations
involve inexact results and non-finite operands and results. Performance also depends on how
the Alpha system chooses to trade off implementation complexity between hardware and operating
system completion handlers (see Section 4.7.7.3). This model provides acceptable
performance on Alpha systems that implement the inexact disable (INED) bit in the FPCR.
Performance may be slow if the INED bit is not implemented.
4.7.6.4 IEEE-Compliant Arithmetic Without Inexact Exception
This model is similar to the model in Section 4.7.6.3, except this model does not signal inexact
results either by the inexact status flag or by trapping. Combining routines that are compiled
with this model and routines that are compiled with the model in Section 4.7.6.3 can give an
application better control over testing when an inexact operation will affect computational
accuracy.
This model is implemented by using IEEE floating-point instructions with the /SU or /SV trap
qualifiers. The performance of this model depends on how often computations involve
non-finite operands and results. Performance also depends on how an Alpha system chooses to
trade off implementation complexity between hardware and operating system completion handlers
(see Section 4.7.7.3).
4.7.6.5 High-Performance IEEE-Format Arithmetic
This model provides arithmetic operations on IEEE finite numbers and notifies applications of
all exceptional floating-point operations. An imprecise arithmetic trap is generated by any
operation that involves non-finite numbers, floating overflow, divide-by-zero, or invalid
operations. Underflow results are set to zero. Conversion-to-integer results that overflow are set
to the low-order bits of the integer value.
This model is implemented by using IEEE floating-point instructions with a trap qualifier other
than /SU, /SV, /SUI, or /SVI. Each instruction can determine whether it also traps on underflow
or integer overflow. This model does not require the overhead of an operating system
completion handler and can be the fastest of the three IEEE models.
4.7.7 Trapping Modes
There are six exceptions that can be generated by floating-point operate instructions, all signaled
by an arithmetic exception trap. These exceptions are:
• Invalid operation
• Division by zero
• Overflow
• Underflow
• Inexact result
• Integer overflow (conversion to integer only)
4.7.7.1 VAX Trapping Modes
This section describes the characteristics of the four VAX trapping modes, which are summarized
in Table 4–8.
When no trap mode is specified (the default):
• Arithmetic is performed on VAX finite numbers.
• Operations give imprecise traps whenever the following occur:
  – an operand is a non-finite number
  – a floating overflow
  – a divide-by-zero
• Traps are imprecise and it is not always possible to determine which instruction triggered a trap or the operands of that instruction.
• An underflow produces a zero result without trapping.
• A conversion to integer that overflows uses the low-order bits of the integer as the result without trapping.
• The result of any operation that traps is UNPREDICTABLE.
When /U or /V mode is specified:
• Arithmetic is performed on VAX finite numbers.
• Operations give imprecise traps whenever the following occur:
  – an operand is a non-finite number
  – an underflow
  – an integer overflow
  – a floating overflow
  – a divide-by-zero
• Traps are imprecise and it is not always possible to determine which instruction triggered a trap or the operands of that instruction.
• An underflow trap produces a zero result.
• A conversion to integer trapping with an integer overflow produces the low-order bits of the integer value.
• The result of any other operation that traps is UNPREDICTABLE.
When /S mode is specified:
• Arithmetic is performed on all VAX values, both finite and non-finite.
• A VAX dirty zero is treated as zero.
• Exceptions are signaled for:
  – a VAX reserved operand, which generates an invalid operation exception
  – a floating overflow
  – a divide-by-zero
• Exceptions are precise and an application can locate the instruction that caused the exception, along with its operand values. See Section 4.7.7.3.
• An operation that underflows produces a zero result without taking an exception.
• A conversion to integer that overflows uses the low-order bits of the integer as the result, without taking an exception.
• When an operation takes an exception, the result of the operation is UNPREDICTABLE.
When /SU or /SV mode is specified:
• Arithmetic is performed on all VAX values, both finite and non-finite.
• A VAX dirty zero is treated as zero.
• Exceptions are signaled for:
  – a VAX reserved operand, which generates an invalid operation exception
  – an underflow
  – an integer overflow
  – a floating overflow
  – a divide-by-zero
• Exceptions are precise and an application can locate the instruction that caused the exception, along with its operand values. See Section 4.7.7.3.
• An underflow exception produces a zero.
• A conversion to integer exception with integer overflow produces the low-order bits of the integer value.
• The result of any other operation that takes an exception is UNPREDICTABLE.
A summary of the VAX trapping modes, instruction notation, and their meaning follows in
Table 4–8:
Table 4–8: VAX Trapping Modes Summary
Trap Mode                  Notation      Meaning
Underflow disabled         No qualifier  Imprecise
                           /S            Precise exception completion
Underflow enabled          /U            Imprecise
                           /SU           Precise exception completion
Integer overflow disabled  No qualifier  Imprecise
                           /S            Precise exception completion
Integer overflow enabled   /V            Imprecise
                           /SV           Precise exception completion
4.7.7.2 IEEE Trapping Modes
This section describes the characteristics of the four IEEE trapping modes, which are summarized
in Table 4–9.
When no trap mode is specified (the default):
• Arithmetic is performed on IEEE finite numbers.
• Operations give imprecise traps whenever the following occur:
  – an operand is a non-finite number
  – a floating overflow
  – a divide-by-zero
  – an invalid operation
• Traps are imprecise, and it is not always possible to determine which instruction triggered a trap or the operands of that instruction.
• An underflow produces a zero result without trapping.
• A conversion to integer that overflows uses the low-order bits of the integer as the result without trapping.
• When an operation traps, the result of the operation is UNPREDICTABLE.
When /U or /V mode is specified:
• Arithmetic is performed on IEEE finite numbers.
• Operations give imprecise traps whenever the following occur:
  – an operand is a non-finite number
  – an underflow
  – an integer overflow
  – a floating overflow
  – a divide-by-zero
  – an invalid operation
• Traps are imprecise, and it is not always possible to determine which instruction triggered a trap or the operands of that instruction.
• An underflow trap produces a zero.
• A conversion to integer trap with an integer overflow produces the low-order bits of the integer.
• The result of any other operation that traps is UNPREDICTABLE.
When /SU or /SV mode is specified:
• Arithmetic is performed on all IEEE values, both finite and non-finite.
• Alpha systems support all IEEE features except inexact exception (which requires /SUI or /SVI):
  – The IEEE standard specifies a default where exceptions do not fault or trap. In combination
    with the FPCR, this mode allows disabling exceptions and producing
    IEEE compliant nontrapping results. See Sections 4.7.7.10 and 4.7.7.11.
  – Each Alpha operating system provides a way to optionally signal IEEE floating-point
    exceptions. This mode enables the IEEE status flags that keep a record of
    each exception that is encountered. An Alpha operating system uses the IEEE floating-point
    control (FP_C) quadword, described in Section B.2.1, to maintain the
    IEEE status flags and to enable calls to IEEE user signal handlers.
• Exceptions signaled in this mode are precise and an application can locate the instruction that caused the exception, along with its operand values. See Section 4.7.7.3.
When /SUI or /SVI mode is specified:
• Arithmetic is performed on all IEEE values, both finite and non-finite.
• Inexact exceptions are supported, along with all the other IEEE features supported by the /SU or /SV mode.
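The trap qualifier combinations described for the VAX and IEEE trapping modes can be sketched as a small decoder. This is an illustrative Python model only: it covers the trap letters (S, U, V, I) and not the rounding qualifiers that may accompany them, and the function and field names are assumptions made for the example.

```python
# Qualifier sets taken from the trapping-mode descriptions in this section.
VAX_QUALIFIERS = {"", "U", "V", "S", "SU", "SV"}
IEEE_QUALIFIERS = {"", "U", "V", "SU", "SV", "SUI", "SVI"}

def decode_trap_qualifier(qualifier, ieee=True):
    """Describe a trap qualifier such as '/SU' or '' (no qualifier)."""
    letters = qualifier.strip().lstrip("/").upper()
    allowed = IEEE_QUALIFIERS if ieee else VAX_QUALIFIERS
    if letters not in allowed:
        raise ValueError(f"not a trapping mode described here: {qualifier!r}")
    return {
        "precise": "S" in letters,                 # exception completion
        "underflow_enabled": "U" in letters,
        "integer_overflow_enabled": "V" in letters,
        "inexact_enabled": "I" in letters,
    }
```

For instance, `decode_trap_qualifier("/SUI")` reports precise completion with underflow and inexact enabled, while the same string with `ieee=False` is rejected because the VAX modes have no inexact form.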
A summary of the IEEE trapping modes, instruction notation, and their meaning follows in
Table 4–9:
Table 4–9: Summary of IEEE Trapping Modes
Trap Mode                                        Notation      Meaning
Underflow disabled and inexact disabled          No qualifier  Imprecise
Underflow enabled and inexact disabled           /U            Imprecise
                                                 /SU           Precise exception completion
Underflow enabled and inexact enabled            /SUI          Precise exception completion
Integer overflow disabled and inexact disabled   No qualifier  Imprecise
Integer overflow enabled and inexact disabled    /V            Imprecise
                                                 /SV           Precise exception completion
Integer overflow enabled and inexact enabled     /SVI          Precise exception completion
4.7.7.3 Arithmetic Trap Completion
Because floating-point instructions may be pipelined, the trap PC can be an arbitrary number
of instructions past the one triggering the trap. Those instructions that are executed after the
trigger instruction of an arithmetic trap are collectively referred to as the trap shadow of the
trigger instruction.
Marking floating-point instructions for exception completion with any valid qualifier combination
that includes the /S qualifier enables the completion of the triggering instruction. For any
instruction so marked, the output register for the triggering instruction cannot also be one of
the input registers, so that an input register cannot be overwritten and the input value is available
after a trap occurs. See Section B.2 for more information.
The AMASK instruction reports how the arithmetic trap should be completed:
• If AMASK returns with bit 9 clear, floating-point traps are imprecise. Exception completion requires that generated code must obey the trap shadow rules in Section 4.7.7.3.1, with a trap shadow length as described in Section 4.7.7.3.2.
• If AMASK returns with bit 9 set, the hardware implements precise floating-point traps. If the instruction has any valid qualifier combination that includes /S, the trap PC points to the instruction that immediately follows the instruction that triggered the trap. The trap shadow contains zero instructions; exception completion does not require that the generated code follow the conditions in Section 4.7.7.3.1 and the length rules in Section 4.7.7.3.2.
4.7.7.3.1 Trap Shadow Rules
For an operating system (OS) completion handler to complete non-finite operands and exceptions,
the following conditions must hold.
Conditions 1 and 2, below, allow an OS completion handler to locate the trigger instruction by
doing a linear scan backwards from the trap PC while comparing destination registers in the
trap shadow with the registers that are specified in the register write mask parameter to the
arithmetic trap.
Condition 3 allows an OS completion handler to emulate the trigger instruction with its origi-nal
input operand values.
Condition 4 allows the handler to re-execute instructions in the trap shadow with their original
operand values.
Condition 5 prevents any unusual side effects that would cause problems on repeated execu-tion
of the instructions in the trap shadow.
Conditions:
1. The destination register of the trigger instruction may not be used as the destination reg-ister
of any instruction in the trap shadow.
2. The trap shadow may not include any branch or jump instructions.
3. An instruction in the trap shadow may not modify an input to the trigger instruction.
4. The value in a register or memory location that is used as input to some instruction in the trap shadow may not be modified by a subsequent instruction in the trap shadow
unless that value is produced by an earlier instruction in the trap shadow.
5. The trap shadow may not contain any instructions with side effects that interact with earlier instructions in the trap shadow or with other parts of the system. Examples of
operations with prohibited side effects are:
– Modifications of the stack pointer or frame pointer that can change the accessibility
of stack variables and the exception context that is used by earlier instructions in
the trap shadow.
– Modifications of volatile values and access to I/ O device registers.
– If order of exception reporting is important, taking an arithmetic trap by an integer
instruction or by a floating-point instruction that does not include a /S qualifier,
either of which can report exceptions out of order.
An instruction may be in the trap shadows of multiple instructions that include a /S qualifier.
That instruction must obey all conditions for all those trap shadows. For example, the destination
register of an instruction in multiple trap shadows must be different from the destination
registers of each possible trigger instruction.
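As an illustration, conditions 1 through 3 above can be checked mechanically over a straight-line instruction sequence. The following Python sketch assumes a simplified, hypothetical instruction model: the Insn tuple and the BRANCHES set are inventions for this example, not part of the architecture definition. It does not attempt conditions 4 and 5, which require data-flow and side-effect analysis.

```python
from collections import namedtuple

# Hypothetical, simplified instruction model: mnemonic, destination
# register (or None), and a tuple of source registers.
Insn = namedtuple("Insn", ["op", "dest", "srcs"])

# Illustrative subset of branch/jump mnemonics (assumption, not exhaustive).
BRANCHES = {"BR", "BSR", "BEQ", "BNE", "FBEQ", "FBNE", "JMP", "JSR", "RET"}

def shadow_violations(trigger, shadow):
    """Return (index, condition) pairs for violations of conditions 1-3."""
    found = []
    for i, insn in enumerate(shadow):
        if insn.dest is not None and insn.dest == trigger.dest:
            found.append((i, 1))   # condition 1: reuses the trigger's destination
        if insn.op in BRANCHES:
            found.append((i, 2))   # condition 2: branch or jump in the shadow
        if insn.dest is not None and insn.dest in trigger.srcs:
            found.append((i, 3))   # condition 3: modifies a trigger input
    return found

trigger = Insn("ADDT/SU", "F3", ("F1", "F2"))
shadow = [
    Insn("MULT/SU", "F4", ("F5", "F6")),   # acceptable
    Insn("CPYS", "F1", ("F7", "F7")),      # overwrites trigger input F1
    Insn("BEQ", None, ("R1",)),            # branch inside the shadow
]
violations = shadow_violations(trigger, shadow)
```

When an instruction lies in several trap shadows, the same check would be repeated once per possible trigger instruction.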
4.7.7.3.2 Trap Shadow Length Rules
The trap shadow length rules in Table 4–10 apply only to those floating-point instructions with
any valid qualifier combination that includes a /S trap qualifier. Further, the instruction to
which the trap shadow extends is not part of the trap shadow, and that instruction is not executed
prior to the arithmetic trap that is signaled by the trigger instruction.
Implementation notes:
° On Alpha implementations for which the IMPLVER instruction returns the value 0, the trap shadow of an instruction may extend after the result is consumed by a floating-point STx instruction. On all other implementations, the trap shadow ends when a result is consumed.
° Because Alpha implementations need not execute instructions that have R31 or F31 as the destination operand, instructions with such a destination should not be thought to end a trap shadow.
Table 4–10: Trap Shadow Length Rules
Floating-Point Instruction Group    Trap Shadow Extends Until Any of the Following Occurs:

Floating-point operate (except DIVx and SQRTx)
° Encountering a CALL_PAL, EXCB, or TRAPB instruction.
° The result is consumed by any instruction except floating-point STx.
° The fourth instruction * after the result is consumed by a floating-point STx instruction. Or, following the floating-point STx of the result, the result of a LDx that loads the stored value is consumed by any instruction.
° The result of a subsequent floating-point operate instruction is consumed by any instruction except floating-point STx.
° The second instruction * after the result of a subsequent floating-point operate instruction is consumed by a floating-point STx instruction.
° The result of a subsequent floating-point DIVx or SQRTx instruction is consumed by any instruction.

Floating-point DIVx
° Encountering a CALL_PAL, EXCB, or TRAPB instruction.
° The result is consumed by any instruction except floating-point STx.
° The fourth instruction * after the result is consumed by a floating-point STx instruction. Or, following the floating-point STx of the result, the result of a LDx that loads the stored value is consumed by any instruction.
° The result of a subsequent floating-point DIVx is consumed by any instruction.
4.7.7.4 Invalid Operation (INV) Arithmetic Trap
An invalid operation arithmetic trap is signaled if an operand is a non-finite number or if an
operand is invalid for the operation to be performed. (Note that CMPTxy does not trap on plus
or minus infinity.) Invalid operations are:
° Any operation on a signaling NaN.
° Addition of unlike-signed infinities or subtraction of like-signed infinities, such as (+infinity + –infinity) or (+infinity – +infinity).
° Multiplication of 0*infinity.
° IEEE division of 0/0 or infinity/infinity.
° Conversion of an infinity or NaN to an integer.
° CMPTLE or CMPTLT when either operand is a NaN.
° SQRTx of a negative non-zero number.
The instruction cannot disable the trap and, if the trap occurs, an UNPREDICTABLE value is
stored in the result register. However, under some conditions, the FPCR can dynamically disable
the trap, as described in Section 4.7.7.10, producing a correct IEEE result, as described in
Section 4.7.10.
IEEE-compliant system software must also supply an invalid operation indication to the user
for x REM 0 and for conversions to integer that take an integer overflow trap.
If an implementation does not support the DZED (division by zero disable) bit, it may respond
to the IEEE division of 0/0 by delivering a division by zero trap to the operating system, which
IEEE-compliant software must change to an invalid operation trap for the user.
Table 4–10: Trap Shadow Length Rules (Continued)
Floating-Point Instruction Group    Trap Shadow Extends Until Any of the Following Occurs:

Floating-point SQRTx
° Encountering a CALL_PAL, EXCB, or TRAPB instruction.
° The result is consumed by any instruction.
° The result of a subsequent SQRTx instruction is consumed by any instruction.

* The length of four instructions is a conservative estimate of how far the trap shadow may
extend past a consuming floating-point STx instruction. The length of two instructions is a
conservative estimate of how far the trap shadow may extend after a subsequent floating-point
operate instruction is consumed by a floating-point STx instruction. Compilers can
make a more precise estimate by consulting the DECchip 21064 and DECchip 21064A
Alpha AXP Microprocessors Hardware Reference Manual, EC-QD2RA-TE.
An implementation may choose not to take an INV trap for a valid IEEE operation that
involves denormal operands if:
° The instruction is modified by any valid qualifier combination that includes the /S (exception completion) qualifier.
° The implementation supports the DNZ (denormal operands to zero) bit and DNZ is set.
° The instruction produces the result and exceptions required by Section 4.7.10, as modified by the DNZ bit described in Section 4.7.7.11.
An implementation may choose not to take an INV trap for a valid IEEE operation that
involves denormal operands, and direct hardware implementation of denormal arithmetic is
permitted if:
° The instruction is modified by any valid qualifier combination that includes the /S (exception completion) qualifier.
° The implementation supports both the DNOD (denormal operand exception disable) bit and the DNZ (denormal operands to zero) bit and DNOD is set while DNZ is clear.
° The instruction produces the result and exceptions required by Section 4.7.10, possibly modified by the UNDZ bit described in Section 4.7.7.11.
Regardless of the setting of the INVD (invalid operation disable) bit, the implementation may
choose not to trap on valid operations that involve quiet NaNs and infinities as operands for
IEEE instructions that are modified by any valid qualifier combination that includes the /S
(exception completion) qualifier.
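The list of invalid operations at the start of this section can be summarized as a small classifier. The sketch below is illustrative only: it works on Python floats rather than register images, the operation names are hypothetical labels chosen for the example, and it cannot represent the signaling-NaN case because Python does not expose a distinct signaling NaN value.

```python
import math

def is_invalid_operation(op, a, b=0.0):
    """True if 'op' applied to floats a and b is one of the listed invalid operations."""
    inf_a, inf_b = math.isinf(a), math.isinf(b)
    if op == "add":
        return inf_a and inf_b and a != b      # unlike-signed infinities
    if op == "sub":
        return inf_a and inf_b and a == b      # like-signed infinities
    if op == "mul":
        return (a == 0 and inf_b) or (inf_a and b == 0)
    if op == "div":
        return (a == 0 and b == 0) or (inf_a and inf_b)
    if op == "cvt_to_int":
        return inf_a or math.isnan(a)
    if op in ("cmptle", "cmptlt"):
        return math.isnan(a) or math.isnan(b)
    if op == "sqrt":
        return a < 0                           # negative non-zero; -0.0 is excluded
    raise ValueError("unknown operation: " + op)
```

Note that division of a non-zero finite number by zero is not classified as invalid here; it falls under the division by zero trap described in Section 4.7.7.5.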
4.7.7.5 Division by Zero (DZE) Arithmetic Trap
A division by zero arithmetic trap is taken if the numerator does not cause an invalid operation
trap and the denominator is zero.
The instruction cannot disable the trap and, if the trap occurs, an UNPREDICTABLE value is
stored in the result register. However, under some conditions, the FPCR can dynamically disable
the trap, as described in Section 4.7.7.10, producing a correct IEEE result, as described in
Section 4.7.10.
If an implementation does not support the DZED (division by zero disable) bit, it may respond
to the IEEE division of 0/0 by delivering a division by zero trap to the operating system, which
IEEE-compliant software must change to an invalid operation trap for the user.
4.7.7.6 Overflow (OVF) Arithmetic Trap
An overflow arithmetic trap is signaled if the rounded result exceeds in magnitude the largest
finite number of the destination format.
The instruction cannot disable the trap and, if the trap occurs, an UNPREDICTABLE value is
stored in the result register. However, under some conditions, the FPCR can dynamically disable
the trap, as described in Section 4.7.7.10, producing a correct IEEE result, as described in
Section 4.7.10.
4.7.7.7 Underflow (UNF) Arithmetic Trap
An underflow occurs if the rounded result is smaller in magnitude than the smallest finite number
of the destination format.
If an underflow occurs, a true zero (64 bits of zero) is always stored in the result register. In the
case of an IEEE operation that takes an underflow arithmetic trap, a true zero is stored even if
the result after rounding would have been –0 (underflow below the negative denormal range).
If an underflow occurs and underflow traps are enabled by the instruction, an underflow arithmetic
trap is signaled. However, under some conditions, the FPCR can dynamically disable the
trap, as described in Section 4.7.7.10, producing the result described in Section 4.7.10, as modified
by the UNDZ bit described in Section 4.7.7.11.
4.7.7.8 Inexact Result (INE) Arithmetic Trap
An inexact result occurs if the infinitely precise result differs from the rounded result.
If an inexact result occurs, the normal rounded result is still stored in the result register. If an
inexact result occurs and inexact result traps are enabled by the instruction, an inexact result
arithmetic trap is signaled. However, under some conditions, the FPCR can dynamically disable
the trap; see Section 4.7.7.10 for information.
4.7.7.9 Integer Overflow (IOV) Arithmetic Trap
In conversions from floating to quadword integer, an integer overflow occurs if the rounded
result is outside the range –2**63..2**63–1. In conversions from quadword integer to longword
integer, an integer overflow occurs if the result is outside the range –2**31..2**31–1.
If an integer overflow occurs in CVTxQ or CVTQL, the true result truncated to the low-order
64 or 32 bits respectively is stored in the result register.
If an integer overflow occurs and integer overflow traps are enabled by the instruction, an integer
overflow arithmetic trap is signaled.
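The range checks and truncation described above can be worked through with ordinary integer arithmetic. The following sketch is a model under stated assumptions: the function names are hypothetical, the input to cvt_to_quad is taken to be the already-rounded integer value, and the results are returned as plain unsigned bit patterns rather than in the floating-point register layout.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def cvtql(q):
    """Model of quadword-to-longword conversion: (low 32 bits, overflow flag)."""
    overflow = not (-2**31 <= q <= 2**31 - 1)
    return q & MASK32, overflow          # true result truncated to 32 bits

def cvt_to_quad(rounded):
    """Model of floating-to-quadword conversion applied to a rounded integer value."""
    overflow = not (-2**63 <= rounded <= 2**63 - 1)
    return rounded & MASK64, overflow    # true result truncated to 64 bits
```

For example, converting the quadword 2**31 to a longword leaves the bit pattern 0x80000000 and reports an overflow, while converting –1 leaves 0xFFFFFFFF with no overflow.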
4.7.7.10 IEEE Floating-Point Trap Disable Bits
In the case of IEEE exception completion modes, any of the traps described in Sections 4.7.7.4
through 4.7.7.9 may be disabled by setting the appropriate trap disable bit in the FPCR. The
trap disable bits only affect the IEEE trap modes when the instruction is modified by any valid
qualifier combination that includes the /S (exception completion) qualifier. The trap disable
bits (DNOD, DZED, INED, INVD, OVFD, and UNFD) do not affect any of the VAX trap
modes.
If a trap disable bit is set and the corresponding trap condition occurs, the hardware implementation
sets the result of the operation to the nontrapping result value as specified in the IEEE
standard and Section 4.7.10 and modified by the denormal control bits. If the implementation
is unable to calculate the required result, it ignores the trap disable bit and signals a trap as
usual.
Note that a hardware implementation may choose to support any subset of the trap disable bits,
including the empty subset.
4.7.7.11 IEEE Denormal Control Bits
In the case of IEEE exception completion modes, the handling of denormal operands and
results is controlled by the DNZ and UNDZ bits in the FPCR. These denormal control bits only
affect denormal handling by IEEE instructions that are modified by any valid qualifier combination
that includes the /S (exception completion) qualifier.
The denormal control bits apply only to the IEEE operate instructions – ADD, SUB, MUL,
DIV, SQRT, CMPxx, and CVT with floating-point source operand.
If both the UNFD (underflow disable) bit and the UNDZ (underflow to zero) bit are set in the
FPCR, the implementation sets the result of an underflow operation to a true zero result. The
zeroing of a denormal result by UNDZ must also be treated as an inexact result.
If the DNZ (denormal operands to zero) bit is set in the FPCR, the implementation treats each
denormal operand as if it were a signed zero value. The source operands in the register are not
changed. If DNZ is set, IEEE operations with any valid qualifier combination that includes a /S
qualifier signal arithmetic traps as if any denormal operand were zero; that is, with DNZ set:
° An IEEE operation with a denormal operand never generates an overflow, underflow, or inexact result arithmetic trap.
° Dividing by a denormal operand is a division by zero or invalid operation as appropriate.
° Multiplying a denormal by infinity is an invalid operation.
° A SQRT of a negative denormal produces a –0 instead of an invalid operation.
° A denormal operand, treated as zero, does not take the denormal operand exception trap controlled by the DNOD bit in the FPCR.
Note that a hardware implementation may choose to support any subset of the denormal control
bits, including the empty subset.
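The effect of DNZ on an operand can be pictured at the bit level. The sketch below assumes a T_floating (IEEE double) operand held as a 64-bit integer pattern; the helper names are hypothetical. It shows only the operand substitution (a denormal is treated as a zero of the same sign) and leaves the source register image itself unchanged, as the text requires.

```python
import struct

def t_bits(x):
    """64-bit pattern of a Python float (IEEE double)."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def is_denormal(bits):
    exponent = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    return exponent == 0 and fraction != 0

def operand_with_dnz(bits):
    """Operand value seen by the operation when DNZ is set."""
    if is_denormal(bits):
        return bits & (1 << 63)   # keep the sign bit, zero everything else
    return bits
```

With this substitution, dividing by a denormal becomes dividing by zero, which is why the text classifies it as a division by zero or invalid operation.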
4.7.8 Floating-Point Control Register (FPCR)
When an IEEE floating-point operate instruction specifies dynamic mode (/D) in its function
field (function field bits <12:11> = 11), the rounding mode to be used for the instruction is
derived from the FPCR register. The layout of the rounding mode bits and their assignments
matches exactly the format used in the 11-bit function field of the floating-point operate
instructions. The function field is described in Section 4.7.9.
In addition, the FPCR gives a summary of each exception type for the exception conditions
detected by all IEEE floating-point operates thus far, as well as an overall summary bit that
indicates whether any of these exception conditions has been detected. The individual exception
bits match exactly in purpose and order the exception bits found in the exception summary
quadword that is pushed for arithmetic traps. However, for each instruction, these exception
bits are set independent of the trapping mode specified for the instruction. Therefore, even
though trapping may be disabled for a certain exceptional condition, the fact that the exceptional
condition was encountered by an instruction is still recorded in the FPCR.
Floating-point operates that belong to the IEEE subset and CVTQL, which belongs to both
VAX and IEEE subsets, appropriately set the FPCR exception bits. It is UNPREDICTABLE
whether floating-point operates that belong only to the VAX floating-point subset set the
FPCR exception bits.
Alpha floating-point hardware only transitions these exception bits from zero to one. Once set
to one, these exception bits are only cleared when software writes zero into these bits by writing
a new value into the FPCR.
Section 4.7.2 allows certain of the FPCR bits to be subsetted.
The format of the FPCR is shown in Figure 4–1 and described in Table 4–11.
Figure 4–1: Floating-Point Control Register (FPCR) Format
[Figure: 64-bit register layout, from bit 63 down: SUM <63>, INED <62>, UNFD <61>, UNDZ <60>, DYN_RM <59:58>, IOV <57>, INE <56>, UNF <55>, OVF <54>, DZE <53>, INV <52>, OVFD <51>, DZED <50>, INVD <49>, DNZ <48>, DNOD <47>, RAZ/IGN <46:0>.]

Table 4–11: Floating-Point Control Register (FPCR) Bit Descriptions
Bit     Description (Meaning When Set)
63      Summary Bit (SUM). Records bitwise OR of FPCR exception bits. Equal to
        FPCR<57 | 56 | 55 | 54 | 53 | 52>.
62      Inexact Disable (INED) *. Suppress INE trap and place correct IEEE nontrapping
        result in the destination register.
61      Underflow Disable (UNFD) *. Suppress UNF trap and place correct IEEE nontrapping
        result in the destination register if the implementation is capable of producing
        correct IEEE nontrapping result. The correct result value is determined
        according to the value of the UNDZ bit.
60      Underflow to Zero (UNDZ) *. When set together with UNFD, on underflow, the
        hardware places a true zero (64 bits of zero) in the destination register rather than
        the result specified by the IEEE standard.
59–58   Dynamic Rounding Mode (DYN). Indicates the rounding mode to be used by an
        IEEE floating-point operate instruction when the instruction's function field specifies
        dynamic mode (/D). Assignments are:
        DYN   IEEE Rounding Mode Selected
        00    Chopped rounding mode
        01    Minus infinity
        10    Normal rounding
        11    Plus infinity
The MT_FPCR instruction writes the FPCR from a floating-point register, and the MF_FPCR
instruction reads the FPCR into a floating-point register. Both instructions are described in Section 4.7.8.1.
Table 4–11: Floating-Point Control Register (FPCR) Bit Descriptions (Continued)
Bit     Description (Meaning When Set)
57      Integer Overflow (IOV). An integer arithmetic operation or a conversion from
        floating to integer overflowed the destination precision.
56      Inexact Result (INE). A floating arithmetic or conversion operation gave a result
        that differed from the mathematically exact result.
55      Underflow (UNF). A floating arithmetic or conversion operation underflowed the
        destination exponent.
54      Overflow (OVF). A floating arithmetic or conversion operation overflowed the
        destination exponent.
53      Division by Zero (DZE). An attempt was made to perform a floating divide operation
        with a divisor of zero.
52      Invalid Operation (INV). An attempt was made to perform a floating arithmetic,
        conversion, or comparison operation, and one or more of the operand values were
        illegal.
51      Overflow Disable (OVFD) *. Suppress OVF trap and place correct IEEE nontrapping
        result in the destination register if the implementation is capable of producing
        correct IEEE nontrapping results.
50      Division by Zero Disable (DZED) *. Suppress DZE trap and place correct IEEE
        nontrapping result in the destination register if the implementation is capable of
        producing correct IEEE nontrapping results.
49      Invalid Operation Disable (INVD) *. Suppress INV trap and place correct IEEE
        nontrapping result in the destination register if the implementation is capable of
        producing correct IEEE nontrapping results.
48      Denormal Operands to Zero (DNZ) *. Treat all denormal operands as a signed zero
        value with the same sign as the denormal.
47      Denormal Operand Exception Disable (DNOD) *. Suppress INV trap for valid
        operations that involve denormal operand values and place the correct IEEE nontrapping
        result in the destination register if the implementation is capable of processing
        the denormal operand. If the result of the operation underflows, the
        correct result is determined according to the value of the UNDZ bit. If DNZ is set,
        DNOD has no effect because a denormal operand is treated as having a zero value
        instead of a denormal value.
46–0    Reserved. Read as Zero. Ignored when written.
* Bit only has meaning for IEEE instructions when any valid qualifier combination that
includes exception completion (/S) is specified.
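The bit assignments in Table 4–11 can be exercised with a small decoder. This is a sketch for illustration, not a definitive tool: the function and dictionary names are hypothetical, and the FPCR value is assumed to be available as a 64-bit unsigned integer (for example, after an MF_FPCR and a store).

```python
# Single-bit fields and their positions, taken from Table 4-11.
FPCR_BITS = {
    "SUM": 63, "INED": 62, "UNFD": 61, "UNDZ": 60,
    "IOV": 57, "INE": 56, "UNF": 55, "OVF": 54, "DZE": 53, "INV": 52,
    "OVFD": 51, "DZED": 50, "INVD": 49, "DNZ": 48, "DNOD": 47,
}
# DYN field <59:58> encodings, taken from Table 4-11.
DYN_MODES = {0b00: "chopped", 0b01: "minus infinity",
             0b10: "normal", 0b11: "plus infinity"}

def decode_fpcr(value):
    """Split a 64-bit FPCR image into named fields."""
    fields = {name: (value >> bit) & 1 for name, bit in FPCR_BITS.items()}
    fields["DYN"] = DYN_MODES[(value >> 58) & 0b11]
    return fields

def expected_sum(value):
    """SUM as defined in the table: the OR of exception bits <57:52>."""
    return int(((value >> 52) & 0x3F) != 0)
```

Comparing decode_fpcr(value)["SUM"] with expected_sum(value) is one way to sanity-check a captured FPCR image.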
FPCR and the instructions to access it are required for an implementation that supports floating-point
(see Section 4.7.8). On implementations that do not support floating-point, the
instructions that access FPCR (MF_FPCR and MT_FPCR) take an Illegal Instruction Trap.
Software Note:
Support for FPCR is required on a system that supports the OpenVMS Alpha operating
system even if that system does not support floating-point.
4.7.8.1 Accessing the FPCR
Because Alpha floating-point hardware can overlap the execution of a number of floating-point
instructions, accessing the FPCR must be synchronized with other floating-point
instructions. An EXCB instruction must be issued both prior to and after accessing the FPCR
to ensure that the FPCR access is synchronized with the execution of previous and subsequent
floating-point instructions; otherwise synchronization is not ensured.
Issuing an EXCB followed by an MT_FPCR followed by another EXCB ensures that only
floating-point instructions issued after the second EXCB are affected by and affect the new
value of the FPCR. Issuing an EXCB followed by an MF_FPCR followed by another EXCB
ensures that the value read from the FPCR only records the exception information for floating-point
instructions issued prior to the first EXCB.
Consider the following example:
    ADDT/D
    EXCB                ;1
    MT_FPCR F1,F1,F1
    EXCB                ;2
    SUBT/D
Without the first EXCB, it is possible in an implementation for the ADDT/D to execute in parallel
with the MT_FPCR. Thus, it would be UNPREDICTABLE whether the ADDT/D was
affected by the new rounding mode set by the MT_FPCR and whether fields cleared by the
MT_FPCR in the exception summary were subsequently set by the ADDT/D.
Without the second EXCB, it is possible in an implementation for the MT_FPCR to execute in
parallel with the SUBT/D. Thus, it would be UNPREDICTABLE whether the SUBT/D was
affected by the new rounding mode set by the MT_FPCR and whether fields cleared by the
MT_FPCR in the exception summary field of FPCR were previously set by the SUBT/D.
Specifically, code should issue an EXCB before and after it accesses the FPCR if that code
needs to see valid values in FPCR bits <63> and <57:52>. An EXCB should be issued before
attempting to write the FPCR if the code expects changes to bits <59:52> not to have dependencies
with prior instructions. An EXCB should be issued after attempting to write the FPCR
if the code expects subsequent instructions to have dependencies with changes to bits <59:52>.
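The bracketing rule lends itself to a simple lint pass over generated code. The sketch below is a deliberately conservative illustration: it works on a list of mnemonic strings (a hypothetical representation), and it demands an EXCB immediately before and immediately after each FPCR access, which is stricter than the text requires.

```python
FPCR_ACCESS = {"MT_FPCR", "MF_FPCR"}

def unbracketed_fpcr_accesses(mnemonics):
    """Indices of FPCR accesses lacking an adjacent EXCB before or after."""
    bad = []
    for i, op in enumerate(mnemonics):
        if op not in FPCR_ACCESS:
            continue
        before = i > 0 and mnemonics[i - 1] == "EXCB"
        after = i + 1 < len(mnemonics) and mnemonics[i + 1] == "EXCB"
        if not (before and after):
            bad.append(i)
    return bad
```

Applied to the five-instruction example in this section, the check reports nothing; removing either EXCB causes the MT_FPCR to be flagged.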
4.7.8.2 Default Values of the FPCR
Processor initialization leaves the value of FPCR UNPREDICTABLE.
Software Note:
Compaq software should initialize FPCR<DYN> = 10 during program activation. Using
this default, a program can be coded to use only dynamic rounding without the need to
explicitly set the rounding mode to normal rounding in its start-up code.
Program activation normally clears all other fields in the FPCR. However, this behavior
may depend on the operating system.
4.7.8.3 Saving and Restoring the FPCR
The FPCR must be saved and restored across context switches so that the FPCR value of one
process does not affect the rounding behavior and exception summary of another process.
The dynamic rounding mode put into effect by the programmer (or initialized by image activation)
is valid for the entirety of the program and remains in effect until subsequently changed
by the programmer or until image run-down occurs.
Software Notes:
The following software notes apply to saving and restoring the FPCR:
1. The IEEE standard precludes saving and restoring the FPCR across subroutine calls.
2. The IEEE standard requires that an implementation provide status flags that are set whenever the corresponding conditions occur and are reset only at the user's request.
The exception bits in the FPCR do not satisfy that requirement, because they can be spuriously set by instructions in a trap shadow that should not have been executed had
the trap been taken synchronously.
The IEEE status flags can be provided by software (as software status bits) as follows:
Trap interface software (usually the operating system) keeps a set of software
status bits and a mask of the traps that the user wants to receive. Code is generated
with the /SUI qualifiers. For a particular exception, the software clears the
corresponding trap disable bit if either the corresponding software status bit is 0 or
if the user wants to receive such traps. If a trap occurs, the software locates the
offending instruction in the trap shadow, simulates it and sets any of the software
status bits that are appropriate. Then, the software either delivers the trap to the
user program or disables further delivery of such traps. The user program must
interface to this trap interface software to set or clear any of the software status bits
or to enable or disable floating-point traps. See Section B.2.
When such a scheme is being used, the trap disable bits and denormal control bits
should be modified only by the trap interface software. If the disable bits are
spuriously cleared, unnecessary traps may occur. If they are spuriously set, the
software may fail to set the correct values in the software status bits. Programs should
call routines in the trap interface software to set or clear bits in the FPCR.
Compaq software may choose to initialize the software status bits and the trap disable
bits to all 1's to avoid any initial trapping when an exception condition first occurs. Or,
software may choose to initialize those bits to all 0's in order to provide a summary of
the exception behavior when the program terminates.
In any event, the exception bits in the FPCR are still useful to programs. A program
can clear all of the exception bits in the FPCR, execute a single floating-point
instruction, and then examine the status bits to determine which hardware-defined
exceptions the instruction encountered. For this operation to work in the presence of
various implementation options, the single instruction should be followed by a TRAPB
or EXCB instruction, and exception completion by the system software should save
and restore the FPCR registers without other modifications.
3. Because of the way the LDS and STS instructions manipulate bits <61:59> of floating-point registers, they should not be used to manipulate FPCR values.
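The software status bit scheme in software note 2 reduces to a small rule for each exception: the trap disable bit is clear when the software status bit is 0 or the user wants the trap, and set otherwise. The following sketch models only that bookkeeping; the dictionary representation and function names are assumptions made for the example, and locating and simulating the offending instruction is outside its scope.

```python
EXCEPTIONS = ("INV", "DZE", "OVF", "UNF", "INE")

def trap_disable_bits(status, user_enabled):
    """Disable a trap only when its status bit is already set and the user
    does not want to receive it (the rule stated in software note 2)."""
    return {e: status[e] and not user_enabled[e] for e in EXCEPTIONS}

def on_trap(exc, status, user_enabled):
    """Record the exception in the software status bits.
    Returns True if the trap should be delivered to the user program."""
    status[exc] = True
    return user_enabled[exc]
```

After the first inexact trap in a program that has not asked for inexact traps, the status bit is set and the recomputed disable bits suppress further INE traps, while a requested trap such as INV keeps being delivered.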
4.7.9 Floating-Point Instruction Function Field Format
The function code for IEEE and VAX floating-point instructions, bits <15:5>, contains the
function field. That field is shown in Figure 4–2 and described for IEEE floating-point in Table
4–12 and for VAX floating-point in Table 4–13. Function codes for the independent floating-point
instructions, those with opcode 17₁₆, do not correspond to the function fields below.
The function field contains subfields that specify the trapping and rounding modes that are
enabled for the instruction, the source datatype, and the instruction class.
Figure 4–2: Floating-Point Instruction Function Field
[Figure: 32-bit instruction layout: Opcode <31:26>, Fa <25:21>, Fb <20:16>, TRP <15:13>, RND <12:11>, SRC <10:9>, FNC <8:5>, Fc <4:0>.]
Table 4–12: IEEE Floating-Point Function Field Bit Summary
Bits    Field   Meaning *
15–13   TRP     Trapping modes:
                Contents  Meaning for Opcodes 14₁₆ and 16₁₆
                000       Imprecise (default)
                001       Underflow enable (/U) – floating-point output
                          Integer overflow enable (/V) – integer output
                010       UNPREDICTABLE for opcode 16₁₆ instructions
                          Reserved for opcode 14₁₆ instructions
                011       UNPREDICTABLE for opcode 16₁₆ instructions
                          Reserved for opcode 14₁₆ instructions
                100       UNPREDICTABLE for opcode 16₁₆ instructions
                          Reserved for opcode 14₁₆ instructions
                101       /SU – floating-point output
                          /SV – integer output
                110       UNPREDICTABLE for opcode 16₁₆ instructions
                          Reserved for opcode 14₁₆ instructions
                111       /SUI – floating-point output
                          /SVI – integer output
12–11   RND     Rounding modes:
                Contents  Meaning for Opcodes 16₁₆ and 14₁₆
                00        Chopped (/C)
                01        Minus infinity (/M)
                10        Normal (default)
                11        Dynamic (/D)
10–9    SRC     Source datatype:
                Contents  Meaning for Opcode 16₁₆   Meaning for Opcode 14₁₆
                00        S_floating                S_floating
                01        Reserved                  Reserved
                10        T_floating                T_floating
                11        Q_fixed                   Reserved
Table 4–12: IEEE Floating-Point Function Field Bit Summary (Continued)
Bits    Field   Meaning *
8–5     FNC     Instruction class:
                Contents  Meaning for Opcode 16₁₆   Meaning for Opcode 14₁₆
                0000      ADDx                      Reserved
                0001      SUBx                      Reserved
                0010      MULx                      Reserved
                0011      DIVx                      Reserved
                0100      CMPxUN                    ITOFS/ITOFT
                0101      CMPxEQ                    Reserved
                0110      CMPxLT                    Reserved
                0111      CMPxLE                    Reserved
                1000      Reserved                  Reserved
                1001      Reserved                  Reserved
                1010      Reserved                  Reserved
                1011      Reserved                  SQRTS/SQRTT
                1100      CVTxS                     Reserved
                1101      Reserved                  Reserved
                1110      CVTxT                     Reserved
                1111      CVTxQ                     Reserved
* Encodings for the instructions CVTST and CVTST/S are exceptions to this table; use the
encodings in Section C.1.
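The subfield positions given in Figure 4–2 and Table 4–12 can be tried out with a short encoder and decoder. This sketch assumes the 32-bit instruction is available as an unsigned integer; the function names are hypothetical, and the register numbers in the example are arbitrary.

```python
def encode_fp_operate(opcode, fa, fb, function, fc):
    """Assemble a floating-point operate word; 'function' is the 11-bit field <15:5>."""
    return (opcode << 26) | (fa << 21) | (fb << 16) | (function << 5) | fc

def decode_fp_operate(insn):
    """Split a 32-bit floating-point operate word into its fields."""
    return {
        "opcode": (insn >> 26) & 0x3F,
        "fa": (insn >> 21) & 0x1F,
        "fb": (insn >> 16) & 0x1F,
        "trp": (insn >> 13) & 0x7,   # trapping mode, bits <15:13>
        "rnd": (insn >> 11) & 0x3,   # rounding mode, bits <12:11>
        "src": (insn >> 9) & 0x3,    # source datatype, bits <10:9>
        "fnc": (insn >> 5) & 0xF,    # instruction class, bits <8:5>
        "fc": insn & 0x1F,
    }

# TRP=111, RND=10, SRC=10, FNC=0000 with opcode 16 (hex): per Table 4-12
# these select /SUI, normal rounding, T_floating, and ADDx.
word = encode_fp_operate(0x16, 1, 2, 0b111_10_10_0000, 3)
fields = decode_fp_operate(word)
```

The footnote to Table 4–12 still applies: CVTST and CVTST/S do not follow this subfield scheme, so a decoder of this kind would need to special-case them.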
Table 4–13: VAX Floating-Point Function Field Bit Summary
Bits    Field   Meaning
15–13   TRP     Trapping modes:
                Contents  Meaning for Opcodes 14₁₆ and 15₁₆
                000       Imprecise (default)
                001       Underflow enable (/U) – floating-point output
                          Integer overflow enable (/V) – integer output
                010       UNPREDICTABLE for opcode 15₁₆ instructions
                          Reserved for opcode 14₁₆ instructions
                011       UNPREDICTABLE for opcode 15₁₆ instructions
                          Reserved for opcode 14₁₆ instructions
                100       /S – Exception completion enable
                101       /SU – floating-point output
                          /SV – integer output
                110       UNPREDICTABLE for opcode 15₁₆ instructions
                          Reserved for opcode 14₁₆ instructions
                111       UNPREDICTABLE for opcode 15₁₆ instructions
                          Reserved for opcode 14₁₆ instructions
12–11   RND     Rounding modes:
                Contents  Meaning for Opcodes 15₁₆ and 14₁₆
                00        Chopped (/C)
                01        UNPREDICTABLE
                10        Normal (default)
                11        UNPREDICTABLE
10–9    SRC     Source datatype: *
                Contents  Meaning for Opcode 15₁₆   Meaning for Opcode 14₁₆
                00        F_floating                F_floating
                01        D_floating                F_floating
                10        G_floating                G_floating
                11        Q_fixed                   Reserved
4.7.10 IEEE Standard
The IEEE Standard for Binary Floating-Point Arithmetic (ANSI/IEEE Standard 754-1985) is
included by reference.
This standard leaves certain operations as implementation dependent. The remainder of this
section specifies the behavior of the Alpha architecture in these situations. Note that this
behavior may be supplied by either hardware (if the invalid operation disable, or INVD, bit is
implemented) or by software. See Sections 4.7.7.10, 4.7.7.11, 4.7.8, 4.7.8.3, and Section B.1.
4.7.10.1 Conversion of NaN and Infinity Values
Conversion of a NaN or an Infinity value to an integer gives a result of zero.
Conversion of a NaN value from S_floating to T_floating gives a result identical to the input,
except that the most significant fraction bit (bit 51) is set to indicate a quiet NaN.
Conversion of a NaN value from T_floating to S_floating gives a result identical to the input,
except that the most significant fraction bit (bit 51) is set to indicate a quiet NaN, and bits
<28:0> are cleared to zero.
Table 4–13: VAX Floating-Point Function Field Bit Summary (Continued)
Bits    Field   Meaning
8–5     FNC     Instruction class:
                Contents  Meaning for Opcode 15₁₆   Meaning for Opcode 14₁₆
                0000      ADDx                      Reserved
                0001      SUBx                      Reserved
                0010      MULx                      Reserved
                0011      DIVx                      Reserved
                0100      CMPxUN                    ITOFF
                0101      CMPxEQ                    Reserved
                0110      CMPxLT                    Reserved
                0111      CMPxLE                    Reserved
                1000      Reserved                  Reserved
                1001      Reserved                  Reserved
                1010      Reserved                  SQRTF/SQRTG
                1011      Reserved                  Reserved
                1100      CVTxF                     Reserved
                1101      CVTxD                     Reserved
                1110      CVTxG                     Reserved
                1111      CVTxQ                     Reserved
* In the SRC field, both 00 and 01 specify the F_floating source datatype for opcode 14₁₆.
4.7.10.2 Copying NaN Values
Copying a NaN value without changing its precision does not cause an invalid operation
exception.
4.7.10.3 Generating NaN Values
When an operation is required to produce a NaN and none of its inputs are NaN values, the
result of the operation is the quiet NaN value that has the sign bit set to one, all exponent bits
set to one (to indicate a NaN), the most significant fraction bit set to one (to indicate that the
NaN is quiet), and all other fraction bits cleared to zero. This value is referred to as "the canonical
quiet NaN."
4.7.10.4 Propagating NaN Values
When an operation is required to produce a NaN and one or both of its inputs are NaN values,
the IEEE standard requires that quiet NaN values be propagated when possible. With the Alpha
architecture, the result of such an operation is a NaN generated according to the first of the following
rules that is applicable:
1. If the operand in the Fb register of the operation is a quiet NaN, that value is used as the
result.
2. If the operand in the Fb register of the operation is a signaling NaN, the result is the quiet NaN formed from the Fb value by setting the most significant fraction bit (bit 51)
to a one bit.
3. If the operation uses its Fa operand and the value in the Fa register is a quiet NaN, that value is used as the result.
4. If the operation uses its Fa operand and the value in the Fa register is a signaling NaN, the result is the quiet NaN formed from the Fa value by setting the most significant
fraction bit (bit 51) to a one bit.
5. The result is the canonical quiet NaN.
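The five propagation rules above can be expressed compactly on 64-bit register images. The following is a sketch under stated assumptions: operands are T_floating register-format values held as unsigned integers, the function names are hypothetical, and the caller has already determined that the operation is required to produce a NaN.

```python
EXP_MASK = 0x7FF0000000000000
FRAC_MASK = 0x000FFFFFFFFFFFFF
QUIET_BIT = 1 << 51                    # most significant fraction bit
CANONICAL_QNAN = 0xFFF8000000000000    # sign 1, exponent all ones, quiet bit set

def is_nan(bits):
    return (bits & EXP_MASK) == EXP_MASK and (bits & FRAC_MASK) != 0

def propagate_nan(fa, fb, uses_fa=True):
    """Result NaN chosen by the first applicable rule."""
    if is_nan(fb):
        return fb | QUIET_BIT          # rules 1 and 2: Fb, quieted if signaling
    if uses_fa and is_nan(fa):
        return fa | QUIET_BIT          # rules 3 and 4: Fa, quieted if signaling
    return CANONICAL_QNAN              # rule 5
```

Setting the quiet bit on a NaN that is already quiet leaves it unchanged, which is why rules 1 and 2 (and likewise 3 and 4) collapse into a single expression.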
4.8 Memory Format Floating-Point Instructions
The instructions in this section move data between the floating-point registers and memory.
They use the Memory instruction format. They do not interpret the bits moved in any way; specifically,
they do not trap on non-finite values.
The instructions are summarized in Table 4–14.
Table 4– 14: Memory Format Floating-Point Instructions Summary
Mnemonic Operation Subset
LDF Load F_ floating VAX
LDG Load G_ floating (Load D_ floating) VAX
LDS Load S_ floating (Load Longword Integer) Both
LDT Load T_ floating (Load Quadword Integer) Both
STF Store F_ floating VAX
STG Store G_ floating (Store D_ floating) VAX
STS Store S_ floating (Store Longword Integer) Both
STT Store T_ floating (Store Quadword Integer) Both
4.8.1 Load F_floating
Format:
  LDF Fa.wf,disp.ab(Rb.ab)    !Memory format
Operation:
  va ← {Rbv + SEXT(disp)}
  CASE
    big_endian_data: va' ← va XOR 100₂
    little_endian_data: va' ← va
  ENDCASE
  Fa ← (va')<15> || MAP_F((va')<14:7>) || (va')<6:0> ||
       (va')<31:16> || 0<28:0>
Exceptions:
  Access Violation
  Fault on Read
  Alignment
  Translation Not Valid
Instruction mnemonics:
  LDF    Load F_floating
Qualifiers:
  None
Description:
LDF fetches an F_floating datum from memory and writes it to register Fa. If the data is not
naturally aligned, an alignment exception is generated.
The MAP_F function causes the 8-bit memory-format exponent to be expanded to an 11-bit
register-format exponent according to Table 2–1.
The virtual address is computed by adding register Rb to the sign-extended 16-bit displacement.
For a big-endian longword access, va<2> (bit 2 of the virtual address) is inverted, and
any memory management fault is reported for va (not va'). The source operand is fetched
from memory and the bytes are reordered to conform to the F_floating register format. The
result is then zero-extended in the low-order longword and written to register Fa.
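The LDF field rearrangement can be modeled as below. This is a sketch, not a definitive implementation: `ldf` and `map_f` are hypothetical names, the memory longword is assumed to have already been read as a little-endian 32-bit integer, and `map_f` encodes an assumed form of Table 2–1, which is not reproduced in this section.

```python
def map_f(exp8):
    """Expand an 8-bit F_floating exponent to 11 bits (assumed form of Table 2-1)."""
    hi = (exp8 >> 7) & 1
    low7 = exp8 & 0x7F
    if hi:
        mid = 0b000                       # 1 xxxxxxx -> 1 000 xxxxxxx
    else:
        mid = 0b111 if low7 else 0b000    # 0 xxxxxxx -> 0 111 xxxxxxx, zero stays zero
    return (hi << 10) | (mid << 7) | low7


def ldf(mem32):
    """Build the 64-bit register image Fa from the 32-bit memory longword."""
    sign = (mem32 >> 15) & 1              # (va')<15>
    exp = map_f((mem32 >> 7) & 0xFF)      # MAP_F((va')<14:7>)
    frac_hi = mem32 & 0x7F                # (va')<6:0>
    frac_lo = (mem32 >> 16) & 0xFFFF      # (va')<31:16>
    return (sign << 63) | (exp << 52) | (frac_hi << 45) | (frac_lo << 29)
```

For instance, an F_floating 1.0 is stored in memory as the longword 0x00004080, and under these assumptions `ldf` returns 0x4010000000000000, the register image of G_floating 1.0.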
4.8.2 Load G_floating
Format:
  LDG Fa.wg,disp.ab(Rb.ab)    !Memory format
Operation:
  va ← {Rbv + SEXT(disp)}
  Fa ← (va)<15:0> || (va)<31:16> || (va)<47:32> || (va)<63:48>
Exceptions:
  Access Violation
  Fault on Read
  Alignment
  Translation Not Valid
Instruction mnemonics:
  LDG    Load G_floating (Load D_floating)
Qualifiers:
  None
Description:
LDG fetches a G_floating (or D_floating) datum from memory and writes it to register Fa. If
the data is not naturally aligned, an alignment exception is generated.
The virtual address is computed by adding register Rb to the sign-extended 16-bit displacement.
The source operand is fetched from memory, the bytes are reordered to conform to the
G_floating register format (also conforming to the D_floating register format), and the result is
then written to register Fa.
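The reordering in the Operation above reverses the order of the four 16-bit words of the quadword. A minimal sketch, assuming the memory quadword is available as a 64-bit unsigned integer (the name `swap_words` is hypothetical):

```python
def swap_words(q):
    """Reverse the order of the four 16-bit words in a 64-bit value."""
    w = [(q >> (16 * i)) & 0xFFFF for i in range(4)]  # w[0] is bits <15:0>
    return (w[0] << 48) | (w[1] << 32) | (w[2] << 16) | w[3]
```

The operation is its own inverse, so the same helper models the word reordering on a G_floating store.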
4.8.3 Load S_floating
Format:
  LDS Fa.ws,disp.ab(Rb.ab)    !Memory format
Operation:
  va ← {Rbv + SEXT(disp)}
  CASE
    big_endian_data: va' ← va XOR 100₂
    little_endian_data: va' ← va
  ENDCASE
  Fa ← (va')<31> || MAP_S((va')<30:23>) || (va')<22:0> || 0<28:0>
Exceptions:
  Access Violation
  Fault on Read
  Alignment
  Translation Not Valid
Instruction mnemonics:
  LDS    Load S_floating (Load Longword Integer)
Qualifiers:
  None
Description:
LDS fetches a longword (integer or S_floating) from memory and writes it to register Fa. If the
data is not naturally aligned, an alignment exception is generated. The MAP_S function causes
the 8-bit memory-format exponent to be expanded to an 11-bit register-format exponent
according to Table 2–2.
The virtual address is computed by adding register Rb to the sign-extended 16-bit displacement.
For a big-endian longword access, va<2> (bit 2 of the virtual address) is inverted, and
any memory management fault is reported for va (not va'). The source operand is fetched
from memory, is zero-extended in the low-order longword, and then written to register Fa.
Longword integers in floating registers are stored in bits <63:62,58:29>, with bits <61:59>
ignored and zeros in bits <28:0>.
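The LDS expansion can be modeled in the same way. In this sketch `lds`, `map_s`, `s_bits`, and `t_bits` are hypothetical names, and `map_s` encodes an assumed form of Table 2–2 (not reproduced in this section). For finite, normalized S_floating inputs, zero, and infinity, the resulting register image should equal the IEEE double-precision encoding of the same value.

```python
import struct


def map_s(exp8):
    """Expand an 8-bit S_floating exponent to 11 bits (assumed form of Table 2-2)."""
    hi = (exp8 >> 7) & 1
    low7 = exp8 & 0x7F
    if hi:
        mid = 0b111 if low7 == 0x7F else 0b000   # all ones stays all ones
    else:
        mid = 0b111 if low7 else 0b000           # all zeros stays all zeros
    return (hi << 10) | (mid << 7) | low7


def lds(mem32):
    """Build the 64-bit register image Fa from the 32-bit memory longword."""
    sign = (mem32 >> 31) & 1
    exp = map_s((mem32 >> 23) & 0xFF)
    frac = mem32 & 0x7FFFFF
    return (sign << 63) | (exp << 52) | (frac << 29)


def s_bits(x):
    """IEEE single-precision bit pattern of x (helper for experiments)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def t_bits(x):
    """IEEE double-precision bit pattern of x (helper for experiments)."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]
```

The mapping does not normalize S_floating denormals, so for those inputs the register image is not the double-precision encoding of the same value.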
4.8.4 Load T_floating
Format:
  LDT Fa.wt,disp.ab(Rb.ab)    !Memory format
Operation:
  va ← {Rbv + SEXT(disp)}
  Fa ← (va)<63:0>
Exceptions:
  Access Violation
  Fault on Read
  Alignment
  Translation Not Valid
Instruction mnemonics:
  LDT    Load T_floating (Load Quadword Integer)
Qualifiers:
  None
Description:
LDT fetches a quadword (integer or T_floating) from memory and writes it to register Fa. If
the data is not naturally aligned, an alignment exception is generated.
The virtual address is computed by adding register Rb to the sign-extended 16-bit displacement.
The source operand is fetched from memory and written to register Fa.
4.8.5 Store F_floating
Format:
  STF Fa.rf,disp.ab(Rb.ab)    !Memory format
Operation:
  va ← {Rbv + SEXT(disp)}
  CASE
    big_endian_data: va' ← va XOR 100₂
    little_endian_data: va' ← va
  ENDCASE
  (va')<31:0> ← Fav<44:29> || Fav<63:62> || Fav<58:45>
Exceptions:
  Access Violation
  Fault on Write
  Alignment
  Translation Not Valid
Instruction mnemonics:
  STF    Store F_floating
Qualifiers:
  None
Description:
STF stores an F_floating datum from Fa to memory. If the data is not naturally aligned, an
alignment exception is generated.
The virtual address is computed by adding register Rb to the sign-extended 16-bit displacement.
For a big-endian longword access, va<2> (bit 2 of the virtual address) is inverted, and
any memory management fault is reported for va (not va'). The bits of the source operand are
fetched from register Fa, the bits are reordered to conform to F_floating memory format, and
the result is then written to memory. Bits <61:59> and <28:0> of Fa are ignored. No checking
is done.
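The STF packing is a pure bit selection and can be sketched as follows, assuming the register value is held as a 64-bit unsigned integer (the name `stf` is hypothetical):

```python
def stf(fav):
    """Pack the 64-bit register image into the 32-bit F_floating memory longword."""
    high_word = (fav >> 29) & 0xFFFF     # Fav<44:29> -> memory bits <31:16>
    sign_exp_hi = (fav >> 62) & 0x3      # Fav<63:62> -> memory bits <15:14>
    rest = (fav >> 45) & 0x3FFF          # Fav<58:45> -> memory bits <13:0>
    return (high_word << 16) | (sign_exp_hi << 14) | rest
```

Because bits <61:59> and <28:0> never enter the result, changing them in the register image leaves the stored longword unchanged.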
4.8.6 Store G_floating
Format:
  STG Fa.rg,disp.ab(Rb.ab)    !Memory format
Operation:
  va ← {Rbv + SEXT(disp)}
  (va)<63:0> ← Fav<15:0> || Fav<31:16> || Fav<47:32> || Fav<63:48>
Exceptions:
  Access Violation
  Fault on Write
  Alignment
  Translation Not Valid
Instruction mnemonics:
  STG    Store G_floating (Store D_floating)
Qualifiers:
  None
Description:
STG stores a G_floating (or D_floating) datum from Fa to memory. If the data is not naturally
aligned, an alignment exception is generated.
The virtual address is computed by adding register Rb to the sign-extended 16-bit displacement.
The source operand is fetched from register Fa, the bytes are reordered to conform to the
G_floating memory format (also conforming to the D_floating memory format), and the result
is then written to memory.
4.8.7 Store S_floating
Format:
  STS Fa.rs,disp.ab(Rb.ab)    !Memory format
Operation:
  va ← {Rbv + SEXT(disp)}
  CASE
    big_endian_data: va' ← va XOR 100₂
    little_endian_data: va' ← va
  ENDCASE
  (va')<31:0> ← Fav<63:62> || Fav<58:29>
Exceptions:
  Access Violation
  Fault on Write
  Alignment
  Translation Not Valid
Instruction mnemonics:
  STS    Store S_floating (Store Longword Integer)
Qualifiers:
  None
Description:
STS stores a longword (integer or S_floating) datum from Fa to memory. If the data is not naturally
aligned, an alignment exception is generated.
The virtual address is computed by adding register Rb to the sign-extended 16-bit displacement.
For a big-endian longword access, va<2> (bit 2 of the virtual address) is inverted, and
any memory management fault is reported for va (not va'). The bits of the source operand are
fetched from register Fa, the bits are reordered to conform to S_floating memory format, and
the result is then written to memory. Bits <61:59> and <28:0> of Fa are ignored. No checking
is done.
4.8.8 Store T_floating
Format:
  STT Fa.rt,disp.ab(Rb.ab)    !Memory format
Operation:
  va ← {Rbv + SEXT(disp)}
  (va)<63:0> ← Fav<63:0>
Exceptions:
  Access Violation
  Fault on Write
  Alignment
  Translation Not Valid
Instruction mnemonics:
  STT    Store T_floating (Store Quadword Integer)
Qualifiers:
  None
Description:
STT stores a quadword (integer or T_floating) datum from Fa to memory. If the data is not naturally
aligned, an alignment exception is generated.
The virtual address is computed by adding register Rb to the sign-extended 16-bit displacement.
The source operand is fetched from register Fa and written to memory.
4.9 Branch Format Floating-Point Instructions
Alpha provides six floating conditional branch instructions. These branch-format instructions
test the value of a floating-point register and conditionally change the PC.
They do not interpret the bits tested in any way; specifically, they do not trap on non-finite
values.
The test is based on the sign bit and whether the rest of the register is all zero bits. All 64 bits
of the register are tested. The test is independent of the format of the operand in the register.
Both plus and minus zero are equal to zero. A non-zero value with a sign of zero is greater than
zero. A non-zero value with a sign of one is less than zero. No reserved operand or non-finite
checking is done.
The floating-point branch operations are summarized in Table 4–15:
Table 4–15: Floating-Point Branch Instructions Summary
Mnemonic  Operation                                Subset
FBEQ      Floating Branch Equal                    Both
FBGE      Floating Branch Greater Than or Equal    Both
FBGT      Floating Branch Greater Than             Both
FBLE      Floating Branch Less Than or Equal       Both
FBLT      Floating Branch Less Than                Both
FBNE      Floating Branch Not Equal                Both
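The test described in this section depends only on the sign bit and on whether the remaining 63 bits are all zero. A minimal sketch, assuming the register value is held as a 64-bit unsigned integer (the function name and the condition strings are hypothetical):

```python
MASK64 = (1 << 64) - 1
SIGN = 1 << 63


def fp_branch_taken(fav, cond):
    """Return True if a floating branch with condition `cond` would be taken."""
    negative = bool(fav & SIGN)
    is_zero = (fav & (MASK64 ^ SIGN)) == 0   # plus zero and minus zero both count
    return {
        "EQ": is_zero,
        "NE": not is_zero,
        "LT": negative and not is_zero,
        "LE": negative or is_zero,
        "GT": not negative and not is_zero,
        "GE": not negative or is_zero,
    }[cond]
```

Because no format is interpreted, a NaN with a clear sign bit simply tests as greater than zero in this model, and the pattern 8000 0000 0000 0000₁₆ tests as equal to zero.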
4.9.1 Conditional Branch
Format:
  FBxx Fa.rq,disp.al    !Branch format
Operation:
  {update PC}
  va ← PC + {4*SEXT(disp)}
  IF TEST(Fav, Condition_based_on_Opcode) THEN
    PC ← va
Exceptions:
  None
Instruction mnemonics:
  FBEQ    Floating Branch Equal
  FBGE    Floating Branch Greater Than or Equal
  FBGT    Floating Branch Greater Than
  FBLE    Floating Branch Less Than or Equal
  FBLT    Floating Branch Less Than
  FBNE    Floating Branch Not Equal
Qualifiers:
  None
Description:
Register Fa is tested. If the specified relationship is true, the PC is loaded with the target virtual
address; otherwise, execution continues with the next sequential instruction.
The displacement is treated as a signed longword offset. This means it is shifted left two bits
(to address a longword boundary), sign-extended to 64 bits, and added to the updated PC to
form the target virtual address.
The conditional branch instructions are PC-relative only. The 21-bit signed displacement gives
a forward/backward branch distance of +/–1M instructions.
Notes:
• To branch properly on non-finite operands, compare to F31, then branch on the result of the compare.
• The largest negative integer (8000 0000 0000 0000₁₆) is the same bit pattern as floating minus
  zero, so it is treated as equal to zero by the branch instructions. To branch properly
  on the largest negative integer, convert it to floating or move it to an integer register
  and do an integer branch.
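The target address computation for the conditional branch can be sketched as follows. This is an illustrative model that assumes the updated PC is the address of the branch instruction plus 4; the name `branch_target` is hypothetical.

```python
MASK64 = (1 << 64) - 1


def branch_target(pc, disp21):
    """Compute the branch target from the branch's own address and its 21-bit field."""
    updated_pc = pc + 4                 # assumption: PC already points past the branch
    disp = disp21 & 0x1FFFFF
    if disp & 0x100000:                 # sign-extend the 21-bit displacement
        disp -= 0x200000
    return (updated_pc + 4 * disp) & MASK64
```

With a displacement field of all ones (that is, –1), the target in this model is the branch instruction itself; the largest positive field, 0x0FFFFF, reaches 2**20 – 1 instructions past the updated PC.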
4.10 Floating-Point Operate Format Instructions
The floating-point bit-operate instructions perform copy and integer convert operations on
64-bit register values. The bit-operate instructions do not interpret the bits moved in any way;
specifically, they do not trap on non-finite values.
The floating-point arithmetic-operate instructions perform add, subtract, multiply, divide, compare,
register move, square root, and floating convert operations on 64-bit register values in one
of the four specified floating formats.
Each instruction specifies the source and destination formats of the values, as well as the
rounding mode and trapping mode to be used. These instructions use the Floating-point Operate
format.
Floating-point convert and square-root (FIX) extension implementation note:
  The FIX extension to the architecture provides the FTOIx, ITOFx, and SQRTx
  instructions. Alpha processors for which the AMASK instruction returns bit 1 set
  implement these instructions. Those processors for which AMASK does not return bit 1 set
  can take an Illegal Instruction trap, and software can emulate their function, if required.
  AMASK is described in Sections 4.11.1 and D.3.
The floating-point operate instructions are summarized in Table 4–16.
Table 4–16: Floating-Point Operate Instructions Summary
Mnemonic  Operation                                               Subset
Bit and FPCR Operations:
CPYS      Copy Sign                                               Both
CPYSE     Copy Sign and Exponent                                  Both
CPYSN     Copy Sign Negate                                        Both
CVTLQ     Convert Longword to Quadword                            Both
CVTQL     Convert Quadword to Longword                            Both
FCMOVxx   Floating Conditional Move                               Both
MF_FPCR   Move from Floating-point Control Register               Both
MT_FPCR   Move to Floating-point Control Register                 Both
Arithmetic Operations:
ADDF      Add F_floating                                          VAX
ADDG      Add G_floating                                          VAX
ADDS      Add S_floating                                          IEEE
ADDT      Add T_floating                                          IEEE
CMPGxx    Compare G_floating                                      VAX
CMPTxx    Compare T_floating                                      IEEE
CVTDG     Convert D_floating to G_floating                        VAX
CVTGD     Convert G_floating to D_floating                        VAX
CVTGF     Convert G_floating to F_floating                        VAX
CVTGQ     Convert G_floating to Quadword                          VAX
CVTQF     Convert Quadword to F_floating                          VAX
CVTQG     Convert Quadword to G_floating                          VAX
CVTQS     Convert Quadword to S_floating                          IEEE
CVTQT     Convert Quadword to T_floating                          IEEE
CVTST     Convert S_floating to T_floating                        IEEE
CVTTQ     Convert T_floating to Quadword                          IEEE
CVTTS     Convert T_floating to S_floating                        IEEE
DIVF      Divide F_floating                                       VAX
DIVG      Divide G_floating                                       VAX
DIVS      Divide S_floating                                       IEEE
DIVT      Divide T_floating                                       IEEE
FTOIS     Floating-point to integer register move, S_floating     IEEE
FTOIT     Floating-point to integer register move, T_floating     IEEE
ITOFF     Integer to floating-point register move, F_floating     VAX
ITOFS     Integer to floating-point register move, S_floating     IEEE
ITOFT     Integer to floating-point register move, T_floating     IEEE
MULF      Multiply F_floating                                     VAX
MULG      Multiply G_floating                                     VAX
MULS      Multiply S_floating                                     IEEE
MULT      Multiply T_floating                                     IEEE
SQRTF     Square root F_floating                                  VAX
SQRTG     Square root G_floating                                  VAX
SQRTS     Square root S_floating                                  IEEE
SQRTT     Square root T_floating                                  IEEE
SUBF      Subtract F_floating                                     VAX
SUBG      Subtract G_floating                                     VAX
SUBS      Subtract S_floating                                     IEEE
SUBT      Subtract T_floating                                     IEEE
4.10.1 Copy Sign
Format:
  CPYSy Fa.rq,Fb.rq,Fc.wq    !Floating-point Operate format
Operation:
  CASE
    CPYS:  Fc ← Fav<63> || Fbv<62:0>
    CPYSN: Fc ← NOT(Fav<63>) || Fbv<62:0>
    CPYSE: Fc ← Fav<63:52> || Fbv<51:0>
  ENDCASE
Exceptions:
  None
Instruction mnemonics:
  CPYS     Copy Sign
  CPYSE    Copy Sign and Exponent
  CPYSN    Copy Sign Negate
Qualifiers:
  None
Description:
For CPYS and CPYSN, the sign bit of Fa is fetched (and complemented in the case of CPYSN)
and concatenated with the exponent and fraction bits from Fb; the result is stored in Fc.
For CPYSE, the sign and exponent bits from Fa are fetched and concatenated with the fraction
bits from Fb; the result is stored in Fc.
No checking of the operands is performed.
Notes:
• Register moves can be performed using CPYS Fx,Fx,Fy. Floating-point absolute value can
  be done using CPYS F31,Fx,Fy. Floating-point negation can be done using
  CPYSN Fx,Fx,Fy. Floating values can be scaled to a known range by using CPYSE.
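The three copy-sign operations are simple mask-and-merge steps. The sketch below models them on 64-bit unsigned integers and uses T_floating (IEEE double) encodings to show the absolute-value, negation, and scaling idioms from the notes. The function names and the `t_bits` helper are hypothetical, and F31 is modeled as the constant zero.

```python
import struct

SIGN = 0x8000_0000_0000_0000
SIGN_EXP = 0xFFF0_0000_0000_0000
MASK64 = (1 << 64) - 1
F31 = 0  # assumption: F31 always reads as zero


def cpys(fa, fb):
    return (fa & SIGN) | (fb & (MASK64 ^ SIGN))


def cpysn(fa, fb):
    return ((fa ^ SIGN) & SIGN) | (fb & (MASK64 ^ SIGN))


def cpyse(fa, fb):
    return (fa & SIGN_EXP) | (fb & (MASK64 ^ SIGN_EXP))


def t_bits(x):
    """IEEE double-precision bit pattern of x (helper for experiments)."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]
```

For example, `cpys(F31, t_bits(-3.5))` yields the encoding of 3.5, and `cpyse(t_bits(1.0), t_bits(12.0))` yields the encoding of 1.5, that is, the fraction of 12.0 scaled into the range [1.0, 2.0).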
4.10.2 Convert Integer to Integer
Format:
  CVTxy Fb.rq,Fc.wx    !Floating-point Operate format
Operation:
  CASE
    CVTQL: Fc ← Fbv<31:30> || 0<2:0> || Fbv<29:0> || 0<28:0>
    CVTLQ: Fc ← SEXT(Fbv<63:62> || Fbv<58:29>)
  ENDCASE
Exceptions:
  Integer Overflow, CVTQL only
Instruction mnemonics:
  CVTLQ    Convert Longword to Quadword
  CVTQL    Convert Quadword to Longword
Qualifiers:
  Trapping:  Exception Completion (/S) (CVTQL only)
             Integer Overflow Enable (/V) (CVTQL only)
Description:
The two's-complement operand in register Fb is converted to a two's-complement result and
written to register Fc. Register Fa must be F31.
The conversion from quadword to longword is a repositioning of the low 32 bits of the operand,
with zero fill and optional integer overflow checking. Integer overflow occurs if Fb is
outside the range –2**31..2**31–1. If integer overflow occurs, the truncated result is stored in
Fc, and an arithmetic trap is taken if enabled.
The conversion from longword to quadword is a repositioning of 32 bits of the operand, with
sign extension.
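The two repositionings can be sketched as follows, assuming register values are held as 64-bit unsigned integers. The names `cvtql`, `cvtlq`, and `cvtql_overflows` are hypothetical, and trap delivery is not modeled.

```python
def cvtql(fbv):
    """Move the low 32 bits of a quadword into longword register format."""
    return (((fbv >> 30) & 0x3) << 62) | ((fbv & 0x3FFFFFFF) << 29)


def cvtlq(fbv):
    """Gather the longword from bits <63:62,58:29> and sign-extend it to 64 bits."""
    lw = (((fbv >> 62) & 0x3) << 30) | ((fbv >> 29) & 0x3FFFFFFF)
    if lw & 0x80000000:
        lw |= 0xFFFFFFFF00000000
    return lw


def cvtql_overflows(fbv):
    """True if the signed quadword lies outside -2**31 .. 2**31-1."""
    signed = fbv - (1 << 64) if fbv >> 63 else fbv
    return not (-2**31 <= signed <= 2**31 - 1)
```

For any quadword that does not overflow, `cvtlq(cvtql(q))` returns `q` in this model.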
4.10.3 Floating-Point Conditional Move
Format:
  FCMOVxx Fa.rq,Fb.rq,Fc.wq    !Floating-point Operate format
Operation:
  IF TEST(Fav, Condition_based_on_Opcode) THEN
    Fc ← Fbv
Exceptions:
  None
Instruction mnemonics:
  FCMOVEQ    FCMOVE if Register Equal to Zero
  FCMOVGE    FCMOVE if Register Greater Than or Equal to Zero
  FCMOVGT    FCMOVE if Register Greater Than Zero
  FCMOVLE    FCMOVE if Register Less Than or Equal to Zero
  FCMOVLT    FCMOVE if Register Less Than Zero
  FCMOVNE    FCMOVE if Register Not Equal to Zero
Qualifiers:
  None
Description:
Register Fa is tested. If the specified relationship is true, register Fb is written to register Fc;
otherwise, the move is suppressed and register Fc is unchanged. The test is based on the sign
bit and whether the rest of the register is all zero bits, as described for floating branches in
Section 4.9.
Notes:
Except that it is likely in many implementations to be substantially faster, the instruction:
  FCMOVxx Fa,Fb,Fc
is exactly equivalent to:
  FByy Fa,label        ! yy = NOT xx
  CPYS Fb,Fb,Fc
  label: ...
For example, a branchless sequence for:
  F1=MAX(F1,F2)
is:
  CMPxLT F1,F2,F3      ! F3=one if F1<F2; x=F/G/S/T
  FCMOVNE F3,F2,F1     ! Move F2 to F1 if F1<F2
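The branchless MAX sequence can be traced with a small model. This sketch assumes T_floating operands and ordinary (non-NaN) values, and does not model traps; `cmptlt`, `fcmovne`, `fmax`, and the helper functions are hypothetical names.

```python
import struct

TRUE_BITS = 0x4000_0000_0000_0000   # value a compare writes when the relation holds
NOT_SIGN = 0x7FFF_FFFF_FFFF_FFFF


def t_bits(x):
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def t_value(bits):
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def cmptlt(fa, fb):
    """Model of a less-than compare: non-zero pattern if Fa < Fb, else true zero."""
    return TRUE_BITS if t_value(fa) < t_value(fb) else 0


def fcmovne(fa, fb, fc):
    """Return the new Fc: Fb if Fa is not (plus or minus) zero, else the old Fc."""
    return fb if fa & NOT_SIGN else fc


def fmax(f1, f2):
    f3 = cmptlt(f1, f2)          # CMPTLT  F1,F2,F3
    return fcmovne(f3, f2, f1)   # FCMOVNE F3,F2,F1
```

The second step works because the compare result is either a true zero or a non-zero pattern, which is exactly what the floating zero test distinguishes.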
4.10.4 Move from/to Floating-Point Control Register
Format:
  Mx_FPCR Fa.rq,Fa.rq,Fa.wq    !Floating-point Operate format
Operation:
  CASE
    MF_FPCR: Fa ← FPCR
    MT_FPCR: FPCR ← Fav
  ENDCASE
Exceptions:
  None
Instruction mnemonics:
  MF_FPCR    Move from Floating-point Control Register
  MT_FPCR    Move to Floating-point Control Register
Qualifiers:
  None
Description:
The Floating-point Control Register (FPCR) is read from (MF_FPCR) or written to
(MT_FPCR), a floating-point register. The floating-point register to be used is specified by the
Fa, Fb, and Fc fields all pointing to the same floating-point register. If the Fa, Fb, and Fc fields
do not all point to the same floating-point register, then it is UNPREDICTABLE which register
is used. If the Fa, Fb, and Fc fields do not all point to the same floating-point register, the
resulting values in the Fc register and in FPCR are UNPREDICTABLE.
If the Fc field is F31 in the case of MT_FPCR, the resulting value in FPCR is
UNPREDICTABLE.
The use of these instructions and the FPCR are described in Section 4.7.8.
4.10.5 VAX Floating Add
Format:
  ADDx Fa.rx,Fb.rx,Fc.wx    !Floating-point Operate format
Operation:
  Fc ← Fav + Fbv
Exceptions:
  Invalid Operation
  Overflow
  Underflow
Instruction mnemonics:
  ADDF    Add F_floating
  ADDG    Add G_floating
Qualifiers:
  Rounding:  Chopped (/C)
  Trapping:  Exception Completion (/S)
             Underflow Enable (/U)
Description:
Register Fa is added to register Fb, and the sum is written to register Fc.
The sum is rounded or chopped to the specified precision, and then the corresponding range is
checked for overflow/underflow. The single-precision operation on canonical single-precision
values produces a canonical single-precision result.
An invalid operation trap is signaled if either operand has exp=0 and is not a true zero (that is,
VAX reserved operands and dirty zeros trap). The contents of Fc are UNPREDICTABLE if
this occurs. See Section 4.7.7 for details of the stored result on overflow or underflow.
4.10.6 IEEE Floating Add
Format:
  ADDx Fa.rx,Fb.rx,Fc.wx    !Floating-point Operate format
Operation:
  Fc ← Fav + Fbv
Exceptions:
  Invalid Operation
  Overflow
  Underflow
  Inexact Result
Instruction mnemonics:
  ADDS    Add S_floating
  ADDT    Add T_floating
Qualifiers:
  Rounding:  Dynamic (/D)
             Minus infinity (/M)
             Chopped (/C)
  Trapping:  Exception Completion (/S)
             Underflow Enable (/U)
             Inexact Enable (/I)
Description:
Register Fa is added to register Fb, and the sum is written to register Fc.
The sum is rounded to the specified precision and then the corresponding range is checked for
overflow/underflow. The single-precision operation on canonical single-precision values produces
a canonical single-precision result.
See Section 4.7.7 for details of the stored result on overflow, underflow, or inexact result.
4.10.7 VAX Floating Compare
Format:
  CMPGyy Fa.rg,Fb.rg,Fc.wq    !Floating-point Operate format
Operation:
  IF Fav SIGNED_RELATION Fbv THEN
    Fc ← 4000 0000 0000 0000₁₆
  ELSE
    Fc ← 0000 0000 0000 0000₁₆
Exceptions:
  Invalid Operation
Instruction mnemonics:
  CMPGEQ    Compare G_floating Equal
  CMPGLE    Compare G_floating Less Than or Equal
  CMPGLT    Compare G_floating Less Than
Qualifiers:
  Trapping:  Exception Completion (/S)
Description:
The two operands in Fa and Fb are compared. If the relationship specified by the qualifier is
true, a non-zero floating value (0.5) is written to register Fc; otherwise, a true zero is written to
Fc.
Comparisons are exact and never overflow or underflow. Three mutually exclusive relations
are possible: less than, equal, and greater than.
An invalid operation trap is signaled if either operand has exp=0 and is not a true zero (that is,
VAX reserved operands and dirty zeros trap). The contents of Fc are UNPREDICTABLE if
this occurs.
Notes:
• Compare Less Than A,B is the same as Compare Greater Than B,A; Compare Less
  Than or Equal A,B is the same as Compare Greater Than or Equal B,A. Therefore, only
  the less-than operations are included.
4.10.8 IEEE Floating Compare
Format:
  CMPTyy Fa.rx,Fb.rx,Fc.wq    !Floating-point Operate format
Operation:
  IF Fav SIGNED_RELATION Fbv THEN
    Fc ← 4000 0000 0000 0000₁₆
  ELSE
    Fc ← 0000 0000 0000 0000₁₆
Exceptions:
  Invalid Operation
Instruction mnemonics:
  CMPTEQ    Compare T_floating Equal
  CMPTLE    Compare T_floating Less Than or Equal
  CMPTLT    Compare T_floating Less Than
  CMPTUN    Compare T_floating Unordered
Qualifiers:
  Trapping:  Exception Completion (/SU)
Description:
The two operands in Fa and Fb are compared. If the relationship specified by the qualifier is
true, a non-zero floating value (2.0) is written to register Fc; otherwise, a true zero is written to
Fc.
Comparisons are exact and never overflow or underflow. Four mutually exclusive relations are
possible: less than, equal, greater than, and unordered. The unordered relation is true if one or
both operands are NaN. (This behavior must be provided by an operating system (OS) completion
handler, since NaNs trap.) Comparisons ignore the sign of zero, so +0 = –0.
Comparisons with plus and minus infinity execute normally and do not take an invalid operation
trap.
Notes:
• In order to use CMPTxx with exception completion handling, it is necessary to specify
  the /SU IEEE trap mode, even though an underflow trap is not possible.
• Compare Less Than A,B is the same as Compare Greater Than B,A; Compare Less
  Than or Equal A,B is the same as Compare Greater Than or Equal B,A. Therefore, only
  the less-than operations are included.
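The VAX and IEEE compares write the same "true" bit pattern, which reads as 0.5 in G_floating and as 2.0 in T_floating. The sketch below decodes the pattern both ways. It assumes the register-format G_floating layout of one sign bit, an 11-bit exponent with bias 1024, and a 52-bit fraction with a hidden leading 0.1 (binary); the function names are hypothetical.

```python
import math
import struct

CMP_TRUE = 0x4000_0000_0000_0000


def as_t_floating(bits):
    """Interpret a 64-bit register image as IEEE T_floating."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def as_g_floating(bits):
    """Interpret a 64-bit register image as VAX G_floating (assumed layout)."""
    sign = -1.0 if bits >> 63 else 1.0
    exp = (bits >> 52) & 0x7FF
    frac = bits & ((1 << 52) - 1)
    if exp == 0:
        if sign < 0 or frac:
            raise ValueError("reserved operand or dirty zero")
        return 0.0
    # Significand is 0.1f in binary, i.e. 0.5 plus the fraction scaled by 2**-53.
    return sign * math.ldexp(0.5 + frac / 2**53, exp - 1024)
```

Either way the pattern is non-zero with a clear sign bit, so a following FBNE or FCMOVNE treats it as true regardless of format.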
4.10.9 Convert VAX Floating to Integer
Format:
  CVTGQ Fb.rx,Fc.wq    !Floating-point Operate format
Operation:
  Fc ← {conversion of Fbv}
Exceptions:
  Invalid Operation
  Integer Overflow
Instruction mnemonics:
  CVTGQ    Convert G_floating to Quadword
Qualifiers:
  Rounding:  Chopped (/C)
  Trapping:  Exception Completion (/S)
             Integer Overflow Enable (/V)
Description:
The floating operand in register Fb is converted to a two's-complement quadword number and
written to register Fc. The conversion aligns the operand fraction with the binary point just to
the right of bit zero, rounds as specified, and complements the result if negative. Register Fa
must be F31.
An invalid operation trap is signaled if the operand has exp=0 and is not a true zero (that is,
VAX reserved operands and dirty zeros trap). The contents of Fc are UNPREDICTABLE if
this occurs.
See Section 4.7.7 for details of the stored result on integer overflow.
4.10.10 Convert Integer to VAX Floating
Format:
  CVTQy Fb.rq,Fc.wx    !Floating-point Operate format
Operation:
  Fc ← {conversion of Fbv<63:0>}
Exceptions:
  None
Instruction mnemonics:
  CVTQF    Convert Quadword to F_floating
  CVTQG    Convert Quadword to G_floating
Qualifiers:
  Rounding:  Chopped (/C)
Description:
The two's-complement quadword operand in register Fb is converted to a single- or double-precision
floating result and written to register Fc. The conversion complements a number
if negative, normalizes it, rounds to the target precision, and packs the result with an appropriate
sign and exponent field. Register Fa must be F31.
4.10.11 Convert VAX Floating to VAX Floating
Format:
  CVTxy Fb.rx,Fc.wx    !Floating-point Operate format
Operation:
  Fc ← {conversion of Fbv}
Exceptions:
  Invalid Operation
  Overflow
  Underflow
Instruction mnemonics:
  CVTDG    Convert D_floating to G_floating
  CVTGD    Convert G_floating to D_floating
  CVTGF    Convert G_floating to F_floating
Qualifiers:
  Rounding:  Chopped (/C)
  Trapping:  Exception Completion (/S)
             Underflow Enable (/U)
Description:
The floating operand in register Fb is converted to the specified alternate floating format and
written to register Fc. Register Fa must be F31.
An invalid operation trap is signaled if the operand has exp=0 and is not a true zero (that is,
VAX reserved operands and dirty zeros trap). The contents of Fc are UNPREDICTABLE if
this occurs.
See Section 4.7.7 for details of the stored result on overflow or underflow.
Notes:
• The only arithmetic operations on D_floating values are conversions to and from
  G_floating. The conversion to G_floating rounds or chops as specified, removing three
  fraction bits. The conversion from G_floating to D_floating adds three low-order zeros
  as fraction bits, then the 8-bit exponent range is checked for overflow/underflow.
• The conversion from G_floating to F_floating rounds or chops to single precision, then
  the 8-bit exponent range is checked for overflow/underflow.
• No conversion from F_floating to G_floating is required, since F_floating values are
  always stored in registers as equivalent G_floating values.
4.10.12 Convert IEEE Floating to Integer
Format:
  CVTTQ Fb.rx,Fc.wq    !Floating-point Operate format
Operation:
  Fc ← {conversion of Fbv}
Exceptions:
  Invalid Operation
  Inexact Result
  Integer Overflow
Instruction mnemonics:
  CVTTQ    Convert T_floating to Quadword
Qualifiers:
  Rounding:  Dynamic (/D)
             Minus infinity (/M)
             Chopped (/C)
  Trapping:  Exception Completion (/S)
             Integer Overflow Enable (/V)
             Inexact Enable (/I)
Description:
The floating operand in register Fb is converted to a two's-complement number and written to
register Fc. The conversion aligns the operand fraction with the binary point just to the right of
bit zero, rounds as specified, and complements the result if negative. Register Fa must be F31.
See Section 4.7.7 for details of the stored result on integer overflow and inexact result.
4.10.13 Convert Integer to IEEE Floating
Format:
  CVTQy Fb.rq,Fc.wx    !Floating-point Operate format
Operation:
  Fc ← {conversion of Fbv<63:0>}
Exceptions:
  Inexact Result
Instruction mnemonics:
  CVTQS    Convert Quadword to S_floating
  CVTQT    Convert Quadword to T_floating
Qualifiers:
  Rounding:  Dynamic (/D)
             Minus infinity (/M)
             Chopped (/C)
  Trapping:  Exception Completion (/S)
             Inexact Enable (/I)
Description:
The two's-complement operand in register Fb is converted to a single- or double-precision
floating result and written to register Fc. The conversion complements a number if negative,
normalizes it, rounds to the target precision, and packs the result with an appropriate sign and
exponent field. Register Fa must be F31.
See Section 4.7.7 for details of the stored result on inexact result.
Notes:
• In order to use CVTQS or CVTQT with exception completion handling, it is necessary
  to specify the /SUI IEEE trap mode, even though an underflow trap is not possible.
4.10.14 Convert IEEE S_Floating to IEEE T_Floating
Format:
  CVTST Fb.rx,Fc.wx    !Floating-point Operate format
Operation:
  Fc ← {conversion of Fbv}
Exceptions:
  Invalid Operation
Instruction mnemonics:
  CVTST    Convert S_floating to T_floating
Qualifiers:
  Trapping:  Exception Completion (/S)
Description:
The S_floating operand in register Fb is converted to T_floating format and written to register
Fc. Register Fa must be F31.
Notes:
• The conversion from S_floating to T_floating is exact. No rounding occurs. No underflow,
  overflow, or inexact result can occur. In fact, the conversion for finite values is the
  identity transformation.
• A trap handler can convert an S_floating denormal value into the corresponding
  T_floating finite value by adding 896 to the exponent and normalizing.
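The denormal fix-up mentioned in the last note can be sketched as follows. The sketch works from the 32-bit IEEE single-precision memory encoding rather than from a register image, and treats the denormal as having an effective exponent field of 1 before rebiasing, which is the standard IEEE interpretation. `cvtst_denormal` and the helpers are hypothetical names, and nothing here models the trap handler interface itself.

```python
import struct


def cvtst_denormal(s_bits):
    """Convert an S_floating denormal (32-bit encoding) to a T_floating encoding."""
    sign = (s_bits >> 31) & 1
    frac = s_bits & 0x7FFFFF
    if (s_bits >> 23) & 0xFF != 0 or frac == 0:
        raise ValueError("not an S_floating denormal")
    exp = 1 + 896                   # effective exponent 1, rebiased from 127 to 1023
    while not frac & 0x800000:      # normalize until the hidden bit position is set
        frac <<= 1
        exp -= 1
    frac &= 0x7FFFFF                # drop the now-implicit leading one
    return (sign << 63) | (exp << 52) | (frac << 29)


def s_value(bits):
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def t_bits(x):
    return struct.unpack("<Q", struct.pack("<d", x))[0]
```

The result can be checked against the host's own single-to-double conversion, for example by comparing `cvtst_denormal(1)` with `t_bits(s_value(1))`.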
4.10.15 Convert IEEE T_Floating to IEEE S_Floating
Format:
  CVTTS Fb.rx,Fc.wx    !Floating-point Operate format
Operation:
  Fc ← {conversion of Fbv}
Exceptions:
  Invalid Operation
  Overflow
  Underflow
  Inexact Result
Instruction mnemonics:
  CVTTS    Convert T_floating to S_floating
Qualifiers:
  Rounding:  Dynamic (/D)
             Minus infinity (/M)
             Chopped (/C)
  Trapping:  Exception Completion (/S)
             Underflow Enable (/U)
             Inexact Enable (/I)
Description:
The T_floating operand in register Fb is converted to S_floating format and written to register
Fc. Register Fa must be F31.
See Section 4.7.7 for details of the stored result on overflow, underflow, or inexact result.
4.10.16 VAX Floating Divide
Format:
  DIVx Fa.rx,Fb.rx,Fc.wx    !Floating-point Operate format
Operation:
  Fc ← Fav / Fbv
Exceptions:
  Invalid Operation
  Division by Zero
  Overflow
  Underflow
Instruction mnemonics:
  DIVF    Divide F_floating
  DIVG    Divide G_floating
Qualifiers:
  Rounding:  Chopped (/C)
  Trapping:  Exception Completion (/S)
             Underflow Enable (/U)
Description:
The dividend operand in register Fa is divided by the divisor operand in register Fb and the
quotient is written to register Fc.
The quotient is rounded or chopped to the specified precision and then the corresponding range
is checked for overflow/underflow. The single-precision operation on canonical single-precision
values produces a canonical single-precision result.
An invalid operation trap is signaled if either operand has exp=0 and is not a true zero (that is,
VAX reserved operands and dirty zeros trap). The contents of Fc are UNPREDICTABLE if
this occurs.
A division by zero trap is signaled if Fbv is zero. The contents of Fc are UNPREDICTABLE if
this occurs.
See Section 4.7.7 for details of the stored result on overflow or underflow.
4.10.17 IEEE Floating Divide
Format:
  DIVx Fa.rx,Fb.rx,Fc.wx    !Floating-point Operate format
Operation:
  Fc ← Fav / Fbv
Exceptions:
  Invalid Operation
  Division by Zero
  Overflow
  Underflow
  Inexact Result
Instruction mnemonics:
  DIVS    Divide S_floating
  DIVT    Divide T_floating
Qualifiers:
  Rounding:  Dynamic (/D)
             Minus infinity (/M)
             Chopped (/C)
  Trapping:  Exception Completion (/S)
             Underflow Enable (/U)
             Inexact Enable (/I)
Description:
The dividend operand in register Fa is divided by the divisor operand in register Fb and the
quotient is written to register Fc.
The quotient is rounded to the specified precision and then the corresponding range is checked
for overflow/underflow. The single-precision operation on canonical single-precision values
produces a canonical single-precision result.
See Section 4.7.7 for details of the stored result on overflow, underflow, or inexact result.
4.10.18 Floating-Point Register to Integer Register Move
Format:
FTOIx Fa.rq, Rc.wq !Floating-point Operate format
Operation:
CASE:
  FTOIS:
    Rc<63:32> ← SEXT(Fav<63>)
    Rc<31:0> ← Fav<63:62> || Fav<58:29>
  FTOIT:
    Rc ← Fav
ENDCASE
Exceptions:
None
Instruction mnemonics:
FTOIS Floating-point to Integer Register Move, S_floating
FTOIT Floating-point to Integer Register Move, T_floating
Qualifiers:
None
Description:
Data in a floating-point register file is moved to an integer register file.
The Fb field must be F31.
The instructions do not interpret bits in the register files; specifically, the instructions do not
trap on non-finite values. Also, the instructions do not access memory.
FTOIS is exactly equivalent to the sequence:
  STS
  LDL
FTOIT is exactly equivalent to the sequence:
  STT
  LDQ
Software Note:
FTOIS and FTOIT are no slower than the corresponding store/load sequence and can be
significantly faster.
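The FTOIS bit selection can be checked with a small software model. The Python sketch below is illustrative only: the names ftois, double_bits, and single_bits are assumptions made for this example, and Python integers stand in for 64-bit registers. It also assumes that Fa holds a finite, normal S_floating value in register format, whose bit pattern then matches the IEEE double-precision encoding of the same number.

```python
import struct

def ftois(fav: int) -> int:
    """Model FTOIS: pack register-format S_floating bits into a sign-extended longword."""
    sign = (fav >> 63) & 1
    upper = 0xFFFFFFFF if sign else 0        # Rc<63:32> <- SEXT(Fav<63>)
    top2 = (fav >> 62) & 0x3                 # Fav<63:62>
    low30 = (fav >> 29) & 0x3FFFFFFF         # Fav<58:29>
    return (upper << 32) | (top2 << 30) | low30

def double_bits(x: float) -> int:
    """Bit pattern of x as an IEEE double (stand-in for the register image)."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def single_bits(x: float) -> int:
    """Bit pattern of x as an IEEE single (the longword STS would store)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]
```

For a value such as 1.0, the low 32 bits of ftois(double_bits(1.0)) equal single_bits(1.0), which is the longword that the STS/LDL sequence would produce.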
4.10.19 Integer Register to Floating-Point Register Move
Format:
ITOFx Ra.rq, Fc.wq !Floating-point Operate format
Operation:
CASE:
  ITOFF:
    Fc ← Rav<31> || MAP_F(Rav<30:23>) || Rav<22:0> || 0<28:0>
  ITOFS:
    Fc ← Rav<31> || MAP_S(Rav<30:23>) || Rav<22:0> || 0<28:0>
  ITOFT:
    Fc ← Rav
ENDCASE
Exceptions:
None
Instruction mnemonics:
ITOFF Integer to Floating-point Register Move, F_floating
ITOFS Integer to Floating-point Register Move, S_floating
ITOFT Integer to Floating-point Register Move, T_floating
Qualifiers:
None
Description:
Data in an integer register file is moved to a floating-point register file.
The Rb field must be R31.
The instructions do not interpret bits in the register files; specifically, the instructions do not
trap on non-finite values. Also, the instructions do not access memory.
ITOFF is equivalent to the following sequence, except that the word swapping that LDF normally
performs is not performed by ITOFF:
  STL
  LDF
ITOFS is exactly equivalent to the sequence:
  STL
  LDS
ITOFT is exactly equivalent to the sequence:
  STQ
  LDT
Software Note:
ITOFF, ITOFS, and ITOFT are no slower than the corresponding store/load sequence and
can be significantly faster.
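The ITOFS expansion can be modeled in a few lines of Python. This is a sketch under stated assumptions, not a definitive implementation: MAP_S is not defined in this section, so map_s below assumes the S_floating load exponent mapping from the data format chapter (an 8-bit exponent widened to 11 bits), and the names itofs, map_s, double_bits, and single_bits are hypothetical.

```python
import struct

def map_s(exp8: int) -> int:
    """Assumed MAP_S: widen an 8-bit S_floating exponent to 11 bits."""
    high = (exp8 >> 7) & 1
    low7 = exp8 & 0x7F
    if high:
        return 0x7FF if low7 == 0x7F else (0b1000 << 7) | low7
    return 0 if low7 == 0 else (0b0111 << 7) | low7

def itofs(rav: int) -> int:
    """Model ITOFS: expand the longword in Rav<31:0> to register format."""
    sign = (rav >> 31) & 1                   # Rav<31>
    exp = map_s((rav >> 23) & 0xFF)          # MAP_S(Rav<30:23>)
    frac = rav & 0x7FFFFF                    # Rav<22:0>
    return (sign << 63) | (exp << 52) | (frac << 29)   # low 29 bits are zero

def double_bits(x: float) -> int:
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def single_bits(x: float) -> int:
    return struct.unpack("<I", struct.pack("<f", x))[0]
```

Under these assumptions, itofs(single_bits(x)) equals double_bits(x) for values such as 1.0, -2.0, and 1.5, which is the register image that the STL/LDS sequence would leave in Fc.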
4.10.20 VAX Floating Multiply
Format:
MULx Fa.rx, Fb.rx, Fc.wx !Floating-point Operate format
Operation:
Fc ← Fav * Fbv
Exceptions:
Invalid Operation
Overflow
Underflow
Instruction mnemonics:
MULF Multiply F_floating
MULG Multiply G_floating
Qualifiers:
Rounding: Chopped (/C)
Trapping: Exception Completion (/S)
          Underflow Enable (/U)
Description:
The multiplicand operand in register Fb is multiplied by the multiplier operand in register Fa
and the product is written to register Fc.
The product is rounded or chopped to the specified precision and then the corresponding range
is checked for overflow/underflow. The single-precision operation on canonical single-precision
values produces a canonical single-precision result.
An invalid operation trap is signaled if either operand has exp=0 and is not a true zero (that is,
VAX reserved operands and dirty zeros trap). The contents of Fc are UNPREDICTABLE if
this occurs.
See Section 4.7.7 for details of the stored result on overflow or underflow.
4.10.21 IEEE Floating Multiply
Format:
MULx Fa.rx, Fb.rx, Fc.wx !Floating-point Operate format
Operation:
Fc ← Fav * Fbv
Exceptions:
Invalid Operation
Overflow
Underflow
Inexact Result
Instruction mnemonics:
MULS Multiply S_floating
MULT Multiply T_floating
Qualifiers:
Rounding: Dynamic (/D)
          Minus infinity (/M)
          Chopped (/C)
Trapping: Exception Completion (/S)
          Underflow Enable (/U)
          Inexact Enable (/I)
Description:
The multiplicand operand in register Fb is multiplied by the multiplier operand in register Fa
and the product is written to register Fc.
The product is rounded to the specified precision and then the corresponding range is checked
for overflow/underflow. The single-precision operation on canonical single-precision values
produces a canonical single-precision result.
See Section 4.7.7 for details of the stored result on overflow, underflow, or inexact result.
4.10.22 VAX Floating Square Root
Format:
SQRTx Fb.rx, Fc.wx !Floating-point Operate format
Operation:
Fc ← Fb ** (1/2)
Exceptions:
Invalid operation
Instruction mnemonics:
SQRTF Square root F_floating
SQRTG Square root G_floating
Qualifiers:
Rounding: Chopped (/C)
Trapping: Exception Completion (/S)
          Underflow Enable (/U) (see Notes below)
Description:
The square root of the floating-point operand in register Fb is written to register Fc. (The Fa
field of this instruction must be set to a value of F31.)
The result is rounded or chopped to the specified precision. The single-precision operation on a
canonical single-precision value produces a canonical single-precision result.
An invalid operation is signaled if the operand has exp=0 and is not a true zero (that is, VAX
reserved operands and dirty zeros trap). An invalid operation is also signaled if the sign of the
operand is negative.
The contents of Fc are UNPREDICTABLE if an invalid operation is signaled.
Notes:
• Floating-point overflow and underflow are not possible for the square root operation. The
  underflow enable qualifier is ignored.
4.10.23 IEEE Floating Square Root
Format:
SQRTx Fb.rx, Fc.wx !Floating-point Operate format
Operation:
Fc ← Fb ** (1/2)
Exceptions:
Inexact result
Invalid operation
Instruction mnemonics:
SQRTS Square root S_floating
SQRTT Square root T_floating
Qualifiers:
Rounding: Chopped (/C)
          Dynamic (/D)
          Minus infinity (/M)
Trapping: Inexact Enable (/I)
          Exception Completion (/S)
          Underflow Enable (/U) (see Notes below)
Description:
The square root of the floating-point operand in register Fb is written to register Fc. (The Fa
field of this instruction must be set to a value of F31.)
The result is rounded to the specified precision. The single-precision operation on a canonical
single-precision value produces a canonical single-precision result.
An invalid operation is signaled if the sign of the operand is negative. However, SQRT(–0)
produces a result of –0.
Notes:
• Floating-point overflow and underflow are not possible for the square root operation. The
  underflow enable qualifier is ignored.
4.10.24 VAX Floating Subtract
Format:
SUBx Fa.rx, Fb.rx, Fc.wx !Floating-point Operate format
Operation:
Fc ← Fav - Fbv
Exceptions:
Invalid Operation
Overflow
Underflow
Instruction mnemonics:
SUBF Subtract F_floating
SUBG Subtract G_floating
Qualifiers:
Rounding: Chopped (/C)
Trapping: Exception Completion (/S)
          Underflow Enable (/U)
Description:
The subtrahend operand in register Fb is subtracted from the minuend operand in register Fa
and the difference is written to register Fc.
The difference is rounded or chopped to the specified precision and then the corresponding
range is checked for overflow/underflow. The single-precision operation on canonical
single-precision values produces a canonical single-precision result.
An invalid operation trap is signaled if either operand has exp=0 and is not a true zero (that is,
VAX reserved operands and dirty zeros trap). The contents of Fc are UNPREDICTABLE if
this occurs.
See Section 4.7.7 for details of the stored result on overflow or underflow.
4.10.25 IEEE Floating Subtract
Format:
SUBx Fa.rx, Fb.rx, Fc.wx !Floating-point Operate format
Operation:
Fc ← Fav - Fbv
Exceptions:
Invalid Operation
Overflow
Underflow
Inexact Result
Instruction mnemonics:
SUBS Subtract S_floating
SUBT Subtract T_floating
Qualifiers:
Rounding: Dynamic (/D)
          Minus infinity (/M)
          Chopped (/C)
Trapping: Exception Completion (/S)
          Underflow Enable (/U)
          Inexact Enable (/I)
Description:
The subtrahend operand in register Fb is subtracted from the minuend operand in register Fa
and the difference is written to register Fc.
The difference is rounded to the specified precision and then the corresponding range is
checked for overflow/underflow. The single-precision operation on canonical single-precision
values produces a canonical single-precision result.
See Section 4.7.7 for details of the stored result on overflow, underflow, or inexact result.
4.11 Miscellaneous Instructions
Alpha provides the miscellaneous instructions shown in Table 4–17.
Table 4–17: Miscellaneous Instructions Summary
Mnemonic   Operation
AMASK      Architecture Mask
CALL_PAL   Call Privileged Architecture Library Routine
ECB        Evict Cache Block
EXCB       Exception Barrier
FETCH      Prefetch Data
FETCH_M    Prefetch Data, Modify Intent
IMPLVER    Implementation Version
MB         Memory Barrier
RPCC       Read Processor Cycle Counter
TRAPB      Trap Barrier
WH64       Write Hint - 64 Bytes
WMB        Write Memory Barrier
4.11.1 Architecture Mask
Format:
AMASK Rb.rq, Rc.wq !Operate format
AMASK #b.ib, Rc.wq !Operate format
Operation:
Rc ← Rbv AND {NOT CPU_feature_mask}
Exceptions:
None
Instruction mnemonics:
AMASK Architecture Mask
Qualifiers:
None
Description:
Rbv represents a mask of the requested architectural extensions. Bits are cleared that correspond
to architectural extensions that are present. Reserved bits and bits that correspond to
absent extensions are copied unchanged. In either case, the result is placed in Rc. If the result
is zero, all requested features are present.
Software may specify an Rbv of all 1's to determine the complete set of architectural extensions
implemented by a processor. Assigned bit definitions are located in Section D.3.
Ra must be R31 or the result in Rc is UNPREDICTABLE and it is UNPREDICTABLE
whether an exception is signaled.
Software Note:
Use this instruction to make instruction-set decisions; use IMPLVER to make code-tuning
decisions.
Implementation Note:
Instruction encoding is implemented as follows:
• On 21064/21064A/21066/21068/21066A (EV4/EV45/LCA/LCA45 chips), AMASK copies
  Rbv to Rc.
• On 21164 (EV5), AMASK copies Rbv to Rc.
• On 21164A (EV56), 21164PC (PCA56), and 21264 (EV6), AMASK correctly indicates
  support for architecture extensions by copying Rbv to Rc and clearing appropriate bits.
Bits are assigned and placed in Appendix D for architecture extensions as ECOs for those
extensions are passed. The low 8 bits are reserved for standard architecture extensions so
they can be tested with a literal; application-specific extensions are assigned from bit 8
upward.
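The "cleared bit means present" convention is easy to misread, so a small model may help. The Python sketch below is illustrative only: the function amask and the constant CPU_FEATURES are hypothetical, and the feature mask describes an imaginary processor that implements only the extension assigned to bit 8. Real bit assignments are those listed in Appendix D.

```python
MASK64 = (1 << 64) - 1

def amask(rbv: int, cpu_feature_mask: int) -> int:
    """Model AMASK: Rc <- Rbv AND NOT CPU_feature_mask."""
    return rbv & ~cpu_feature_mask & MASK64

# Hypothetical processor: only the extension assigned to bit 8 is present.
CPU_FEATURES = 1 << 8

# Ask about bit 8 and bit 0. Bits that remain set name absent extensions.
requested = (1 << 8) | (1 << 0)
missing = amask(requested, CPU_FEATURES)
all_present = (missing == 0)

# An all-1's request returns every bit except those of present extensions,
# so complementing the result yields the implemented set.
implemented = MASK64 & ~amask(MASK64, CPU_FEATURES)
```

In this model, missing keeps only bit 0, all_present is false, and implemented equals the feature mask.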
4.11.2 Call Privileged Architecture Library
Format:
CALL_PAL fnc.ir !PAL format
Operation:
{Stall instruction issuing until all
prior instructions are guaranteed to
complete without incurring exceptions.}
{Trap to PALcode.}
Exceptions:
None
Instruction mnemonics:
CALL_PAL Call Privileged Architecture Library
Qualifiers:
None
Description:
The CALL_PAL instruction is not issued until all previous instructions are guaranteed to complete
without exceptions. If an exception occurs, the continuation PC in the exception stack
frame points to the CALL_PAL instruction. The CALL_PAL instruction causes a trap to
PALcode.
4.11.3 Evict Data Cache Block
Format:
ECB (Rb.ab) !Memory format
Operation:
va ← Rbv
IF { va maps to memory space } THEN
  Prepare to reuse cache resources that are occupied by
  the addressed byte.
END
Exceptions:
None
Instruction mnemonics:
ECB Evict Cache Block
Qualifiers:
None
Description:
The ECB instruction provides a hint that the addressed location will not be referenced again in
the near future, so any cache space it occupies should be made available to cache other memory
locations. If the cache copy of the location is dirty, the processor may start writing it back;
if the cache has multiple sets, the processor may arrange for the set containing the addressed
byte to be the next set allocated.
The ECB instruction does not generate exceptions; if it encounters data address translation
errors (access violation, translation not valid, and so forth) during execution, it is treated as a
NOP.
If the address maps to non-memory-like (I/O) space, ECB is treated as a NOP.
Software Note:
• ECB makes a particular cache location available for reuse by evicting and invalidating its
  contents. The intent is to give software more control over cache allocation policy in
  set-associative caches so that "useful" blocks can be retained in the cache.
• ECB is a performance hint; it does not serialize the eviction of the addressed cache block
  with any preceding or following memory operation.
• ECB is not intended for flushing caches prior to power failure or low power operation;
  CFLUSH is intended for that purpose.
Implementation Note:
Implementations with set-associative caches are encouraged to update their allocation
pointer so that the next D-stream reference that misses the cache and maps to this line is
allocated into the vacated set.
4.11.4 Exception Barrier
Format:
EXCB !Memory format
Operation:
{EXCB does not appear to issue until completion of all
exceptions and dependencies on the Floating-point Control
Register (FPCR) from prior instructions.}
Exceptions:
None
Instruction mnemonics:
EXCB Exception Barrier
Qualifiers:
None
Description:
The EXCB instruction allows software to guarantee that in a pipelined implementation, all previous
instructions have completed any behavior related to exceptions or rounding modes before
any instructions after the EXCB are issued.
In particular, all changes to the Floating-point Control Register (FPCR) are guaranteed to have
been made, whether or not there is an associated exception. Also, all potential floating-point
exceptions and integer overflow exceptions are guaranteed to have been taken. EXCB is thus a
superset of TRAPB.
If a floating-point exception occurs for which trapping is enabled, the EXCB instruction acts
like a fault. In this case, the value of the Program Counter reported to the program may be the
address of the EXCB instruction (or earlier) but is never the address of an instruction following
the EXCB.
The relationship between EXCB and the FPCR is described in Section 4.7.8.1.
4.11.5 Prefetch Data
Format:
FETCHx 0(Rb.ab) !Memory format
Operation:
va ← {Rbv}
{Optionally prefetch aligned 512-byte block surrounding va.}
Exceptions:
None
Instruction mnemonics:
FETCH Prefetch Data
FETCH_M Prefetch Data, Modify Intent
Qualifiers:
None
Description:
The virtual address is given by Rbv. This address is used to designate an aligned 512-byte
block of data. An implementation may optionally attempt to move all or part of this block (or a
larger surrounding block) of data to a part of the memory hierarchy that has faster access, in
anticipation of subsequent Load or Store instructions that access that data.
Implementation Note:
FETCHx is intended to help software overlap memory latencies when such latencies are on
the order of at least 100 cycles. FETCHx is unlikely to help (or be implemented) for
significantly shorter memory latencies. Code scheduling and cache-line prefetching (see
Section A.3.5) should be used to overlap such shorter latencies.
Existing Alpha implementations (through the 21264) have memory latencies that are too
short to profitably implement FETCHx. Therefore, FETCHx does not improve memory
performance in existing Alpha implementations.
The FETCH instruction is a hint to the implementation that may allow faster execution. An
implementation is free to ignore the hint. If prefetching is done in an implementation, the order
of fetch within the designated block is UNPREDICTABLE.
The FETCH_M instruction gives the additional hint that modifications (stores) to some or all
of the data block are anticipated.
No exceptions are generated by FETCHx. If a Load (or Store in the case of FETCH_M) that
uses the same address would fault, the prefetch request is ignored. It is UNPREDICTABLE
whether a TB-miss fault is ever taken by FETCHx.
Implementation Note:
Implementations are encouraged to take the TB-miss fault, then continue the prefetch.
4.11.6 Implementation Version
Format:
IMPLVER Rc !Operate format
Operation:
Rc ← value, which is defined in Appendix D
Exceptions:
None
Instruction mnemonics:
IMPLVER Implementation Version
Description:
A small integer is placed in Rc that specifies the major implementation version of the processor
on which it is executed. This information can be used to make code-scheduling or tuning
decisions, or the information can be used to branch to different pieces of code optimized for
different implementations.
Notes:
• The value returned by IMPLVER does not identify the particular processor type. Rather,
  it identifies a group of processors that can be treated similarly for performance
  characteristics such as scheduling.
• Ra must be R31 and Rb must be the literal #1 or the result in Rc is UNPREDICTABLE
  and it is UNPREDICTABLE whether an exception is signaled.
Software Note:
Use this instruction to make code-tuning decisions; use AMASK to make instruction-set
decisions.
4.11.7 Memory Barrier
Format:
MB !Memory format
Operation:
{Guarantee that all subsequent loads or stores
will not access memory until after all previous
loads and stores have accessed memory, as
observed by other processors.}
Exceptions:
None
Instruction mnemonics:
MB Memory Barrier
Qualifiers:
None
Description:
The use of the Memory Barrier (MB) instruction is required only in multiprocessor systems.
In the absence of an MB instruction, loads and stores to different physical locations are
allowed to complete out of order on the issuing processor as observed by other processors. The
MB instruction allows memory accesses to be serialized on the issuing processor as observed
by other processors. See Chapter 5 for details on using the MB instruction to serialize these
accesses. Chapter 5 also details coordinating memory accesses across processors.
Note that MB ensures serialization only; it does not necessarily accelerate the progress of
memory operations.
4.11.8 Read Processor Cycle Counter
Format:
RPCC Ra.wq !Memory format
Operation:
Ra ← {cycle counter}
Exceptions:
None
Instruction mnemonics:
RPCC Read Processor Cycle Counter
Qualifiers:
None
Description:
Register Ra is written with the processor cycle counter (PCC). The PCC register consists of
two 32-bit fields. The low-order 32 bits (PCC<31:0>) are an unsigned, wrapping counter,
PCC_CNT. The high-order 32 bits (PCC<63:32>), PCC_OFF, are operating-system dependent
in their implementation.
See Section 3.1.5 for a description of the PCC.
If an operating system uses PCC_OFF to calculate the per-process or per-thread cycle count,
that count must be derived from the 32-bit sum of PCC_OFF and PCC_CNT. The following
example computes that cycle count, modulo 2**32, and returns the count value in R0. Notice
the care taken not to cause an unwanted sign extension.
  RPCC R0          ; Read the process cycle counter
  SLL  R0, #32, R1 ; Line up the offset and count fields
  ADDQ R0, R1, R0  ; Do add
  SRL  R0, #32, R0 ; Zero extend the count to 64 bits
The following example code returns the value of PCC_CNT in R0<31:0> and all zeros in
R0<63:32>.
  RPCC   R0
  ZAPNOT R0, #15, R0
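The arithmetic in the two RPCC sequences can be traced with a short model. The Python sketch below is illustrative only: the function names are hypothetical, Python integers masked to 64 bits stand in for registers, and the pcc argument stands in for whatever value RPCC would return.

```python
MASK64 = (1 << 64) - 1

def process_cycle_count(pcc: int) -> int:
    """Trace the four-instruction sequence: (PCC_OFF + PCC_CNT) modulo 2**32."""
    r0 = pcc & MASK64                 # RPCC R0
    r1 = (r0 << 32) & MASK64          # SLL  R0, #32, R1  (PCC_CNT moves to the high half)
    r0 = (r0 + r1) & MASK64           # ADDQ R0, R1, R0   (high half now holds the sum)
    return r0 >> 32                   # SRL  R0, #32, R0  (logical shift zero-extends)

def pcc_cnt(pcc: int) -> int:
    """Trace the two-instruction sequence: keep bytes 0..3, zero bytes 4..7."""
    return pcc & 0xFFFFFFFF           # ZAPNOT R0, #15, R0
```

Because the final shift is logical rather than arithmetic, a sum with bit 31 set is still returned as a positive 64-bit value, which is the sign-extension hazard the text warns about.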
4.11.9 Trap Barrier
Format:
TRAPB !Memory format
Operation:
{TRAPB does not appear to issue until all prior instructions
are guaranteed to complete without causing any arithmetic traps.}
Exceptions:
None
Instruction mnemonics:
TRAPB Trap Barrier
Qualifiers:
None
Description:
The TRAPB instruction allows software to guarantee that in a pipelined implementation, all
previous arithmetic instructions will complete without incurring any arithmetic traps before the
TRAPB or any instructions after it are issued.
If an arithmetic exception occurs for which trapping is enabled, the TRAPB instruction acts
like a fault. In this case, the value of the Program Counter reported to the program may be the
address of the TRAPB instruction (or earlier) but is never the address of the instruction following
the TRAPB.
This fault behavior by TRAPB allows software, using one TRAPB instruction for each exception
domain, to isolate the address range in which an exception occurs. If the address of the
instruction following the TRAPB were allowed, there would be no way to distinguish an
exception in the address range preceding a label from an exception in the range that includes
the label along with the faulting instruction and a branch back to the label. This case arises
when the code is not following exception completion rules but is inserting TRAPB instructions
to isolate exceptions to the proper scope.
Use of TRAPB should be compared with use of the EXCB instruction; see Section 4.11.4.
4.11.10 Write Hint
Format:
WH64 (Rb.ab) !Memory format
Operation:
va ← Rbv
IF { va maps to memory space } THEN
  Write UNPREDICTABLE data to the aligned 64-byte region
  containing the addressed byte.
END
Exceptions:
None
Instruction mnemonics:
WH64 Write Hint - 64 Bytes
Qualifiers:
None
Description:
The WH64 instruction provides a hint that the current contents of the aligned 64-byte block
containing the addressed byte will never be read again but will be overwritten in the near
future.
The processor may allocate cache resources to hold the block without reading its previous contents
from memory; the contents of the block may be set to any value that does not introduce a
security hole, as described in Section 1.6.3.
The WH64 instruction does not generate exceptions; if it encounters data address translation
errors (access violation, translation not valid, and so forth), it is treated as a NOP.
If the address maps to non-memory-like (I/O) space, WH64 is treated as a NOP.
Software Note:
This instruction is a performance hint that should be used when writing a large continuous
region of memory. The intended code sequence consists of one WH64 instruction followed
by eight quadword stores for each aligned 64-byte region to be written.
Sometimes, the UNPREDICTABLE data will exactly match some or all of the previous
contents of the addressed block of memory.
Implementation Note:
If the 64-byte region containing the addressed byte is not in the data cache,
implementations are encouraged to allocate the region in the data cache without first
reading it from memory. However, if any of the addressed bytes exist in the caches of
other processors, they must be kept coherent with respect to those processors.
Processors with cache blocks smaller than 64 bytes are encouraged to implement WH64 as
defined. However, they may instead implement the instruction by allocating a smaller
aligned cache block for write access or by treating WH64 as a NOP.
Processors with cache blocks larger than 64 bytes are also encouraged to implement WH64
as defined. However, they may instead treat WH64 as a NOP.
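Because WH64 may leave UNPREDICTABLE data in the whole aligned 64-byte region, a fill routine would normally issue it only for regions that it is about to overwrite completely. The Python sketch below shows one way to split a byte range accordingly. It is an illustration under that assumption, not a prescribed algorithm; plan_fill and BLOCK are hypothetical names.

```python
BLOCK = 64  # WH64 operates on aligned 64-byte regions

def plan_fill(start: int, length: int):
    """Split [start, start + length) into a head, whole aligned blocks, and a tail.

    Only the whole blocks are candidates for WH64 followed by eight quadword
    stores; the head and tail are partial blocks and must be written normally.
    """
    end = start + length
    first = min(end, (start + BLOCK - 1) // BLOCK * BLOCK)   # round start up
    last = max(first, end // BLOCK * BLOCK)                   # round end down
    blocks = list(range(first, last, BLOCK))
    return (start, first), blocks, (last, end)
```

For example, a 0x100-byte fill starting at 0x1010 has a 0x30-byte head, three whole blocks at 0x1040, 0x1080, and 0x10C0, and a 0x10-byte tail.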
4.11.11 Write Memory Barrier
Format:
WMB !Memory format
Operation:
{Guarantee that:
   All preceding stores that access memory-like
   regions are ordered before any subsequent stores
   that access memory-like regions and
   All preceding stores that access non-memory-like
   regions are ordered before any subsequent stores
   that access non-memory-like regions.}
Exceptions:
None
Instruction mnemonics:
WMB Write Memory Barrier
Qualifiers:
None
Description:
The WMB instruction provides a way for software to control write buffers. It guarantees that
writes preceding the WMB are not aggregated with writes that follow the WMB.
WMB guarantees that writes to memory-like regions that precede the WMB are ordered before
writes to memory-like regions that follow the WMB. Similarly, WMB guarantees that writes to
non-memory-like regions that precede the WMB are ordered before writes to non-memory-like
regions that follow the WMB. It does not order writes to memory-like regions relative
to writes to non-memory-like regions.
WMB causes writes that are contained in buffers to be completed without unnecessary delay. It
is particularly suited for batching writes to high-performance I/O devices.
WMB prevents writes that precede the WMB from being merged with writes that follow the
WMB. In particular, two writes that access the same location and are separated by a WMB
cause two distinct and ordered write events.
In the absence of a WMB (or IMB or MB) instruction, stores to memory-like or non-memory-like
regions can be aggregated and/or buffered and completed in any order.
The WMB instruction is the preferred method for providing high-bandwidth write streams
where order must be preserved between writes in that stream.
Notes:
WMB is useful for ordering streams of writes to a non-memory-like region, such as to
memory-mapped control registers or to a graphics frame buffer. While both MB and WMB can
ensure that writes to a non-memory-like region occur in order, without being aggregated or
reordered, the WMB is usually faster and is never slower than MB.
WMB can correctly order streams of writes in programs that operate on shared sections of data
if the data in those sections are protected by a classic semaphore protocol. The following
example illustrates such a protocol:
  Processor i                    Processor j
  <Acquire lock>
  MB
  <Read and write data
   in shared section>
  WMB
  <Release lock>          ⇒      <Acquire lock>
                                 MB
                                 <Read and write data
                                  in shared section>
                                 WMB
The example above is similar to that in Section 5.5.4, except a WMB is substituted for the second
MB in the lock-update-release sequence. It is correct to substitute WMB for the second
MB only if:
1. All data locations that are read or written in the critical section are accessed only after
   acquiring a software lock by using lock_variable (and before releasing the software
   lock).
2. For each read u of shared data in the critical section, there is a write v such that:
   a. v is BEFORE the WMB
   b. v follows u in processor issue sequence (see Section 5.6.1.1)
   c. v either depends on u (see Section 5.6.1.7) or overlaps u (see Section 5.6.1),
      or both.
3. Both lock_variable and all the shared data are in memory-like regions (or lock_variable
   and all the shared data are in non-memory-like regions). If the lock_variable is in a
   non-memory-like region, the atomic lock protocol must use some implementation-specific
   hardware support.
The substitution of a WMB for the second MB is usually faster and never slower.
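The statement that two writes to the same location separated by a WMB cause two distinct and ordered write events can be illustrated with a toy write buffer. The Python sketch below is a deliberately simplified model and an assumption throughout: ToyWriteBuffer and its methods are hypothetical, real write buffers may drain in any permitted order, and the model ignores the distinction between memory-like and non-memory-like regions.

```python
class ToyWriteBuffer:
    """Toy model: pending writes to one address may merge until a WMB."""

    def __init__(self):
        self.pending = {}   # address -> latest value, in insertion order
        self.events = []    # write events that have left the buffer, in order

    def store(self, addr, value):
        # Merge: an older pending write to the same address is absorbed.
        self.pending.pop(addr, None)
        self.pending[addr] = value

    def wmb(self):
        # Writes issued before the barrier leave the buffer before any later
        # write, so they can no longer merge with writes that follow.
        self.events.extend(self.pending.items())
        self.pending.clear()


merged = ToyWriteBuffer()
merged.store(0x10, 1)
merged.store(0x10, 2)       # absorbed into a single pending write
merged.wmb()

separated = ToyWriteBuffer()
separated.store(0x10, 1)
separated.wmb()             # forces the first write out as its own event
separated.store(0x10, 2)
separated.wmb()
```

In this model the first buffer produces one write event carrying the value 2, while the second produces two ordered events carrying 1 and then 2, which is the behavior a device register protocol would typically rely on.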
4.12 VAX Compatibility Instructions
Alpha provides the instructions shown in Table 4–18 for use in translated VAX code. These
instructions are not a permanent part of the architecture and will not be available in some
future implementations. They are intended to preserve customer assumptions about VAX
instruction atomicity in porting code from VAX to Alpha.
These instructions should be generated only by the VAX-to-Alpha software translator; they
should never be used in native Alpha code. Any native code that uses them may cease to work.
Table 4–18: VAX Compatibility Instructions Summary
Mnemonic   Operation
RC         Read and Clear
RS         Read and Set
4.12.1 VAX Compatibility Instructions
Format:
Rx Ra.wq !Memory format
Operation:
Ra ← intr_flag
intr_flag ← 0 !RC
intr_flag ← 1 !RS
Exceptions:
None
Instruction mnemonics:
RC Read and Clear
RS Read and Set
Qualifiers:
None
Description:
The intr_flag is returned in Ra and then cleared to zero (RC) or set to one (RS).
These instructions may be used to determine whether the sequence of Alpha instructions
between RS and RC (corresponding to a single VAX instruction) was executed without
interruption or exception.
Intr_flag is a per-processor state bit. The intr_flag is cleared if that processor encounters a
CALL_PAL REI instruction.
It is UNPREDICTABLE whether a processor's intr_flag is affected when that processor executes
an LDx_L or STx_C instruction. A processor's intr_flag is not affected when that
processor executes a normal load or store instruction.
A processor's intr_flag is not affected when that processor executes a taken branch.
Notes:
• These instructions are intended only for use by the VAX-to-Alpha software translator;
  they should never be used by native code.
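The RS/RC bracketing idea can be shown with a toy model of the per-processor flag. The Python sketch below is an illustration only: ToyProcessor, call_pal_rei, and run_sequence are hypothetical names, and the model reduces an interrupt or exception to the single effect described above, namely that CALL_PAL REI clears intr_flag.

```python
class ToyProcessor:
    """Toy model of the per-processor intr_flag state bit."""

    def __init__(self):
        self.intr_flag = 0

    def rs(self) -> int:
        """RS: return the old flag, then set it to one."""
        old, self.intr_flag = self.intr_flag, 1
        return old

    def rc(self) -> int:
        """RC: return the old flag, then clear it to zero."""
        old, self.intr_flag = self.intr_flag, 0
        return old

    def call_pal_rei(self):
        """Return from an interrupt or exception clears the flag."""
        self.intr_flag = 0


def run_sequence(cpu: ToyProcessor, interrupted: bool) -> bool:
    """Return True if the bracketed sequence ran without interruption."""
    cpu.rs()                    # start of the translated VAX instruction
    if interrupted:
        cpu.call_pal_rei()      # an interrupt was taken and returned from
    return cpu.rc() == 1        # flag still set means no REI intervened
```

A translator-generated sequence would use the value returned by RC in the same way: a result of one indicates that the instructions between RS and RC ran as a unit, and a result of zero indicates that they may need to be retried.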
4.13 Multimedia (Graphics and Video) Support
Alpha provides the following instructions that enhance support for graphics and video
algorithms:
Mnemonic   Operation
MINUB8     Vector Unsigned Byte Minimum
MINSB8     Vector Signed Byte Minimum
MINUW4     Vector Unsigned Word Minimum
MINSW4     Vector Signed Word Minimum
MAXUB8     Vector Unsigned Byte Maximum
MAXSB8     Vector Signed Byte Maximum
MAXUW4     Vector Unsigned Word Maximum
MAXSW4     Vector Signed Word Maximum
PERR       Pixel Error
PKLB       Pack Longwords to Bytes
PKWB       Pack Words to Bytes
UNPKBL     Unpack Bytes to Longwords
UNPKBW     Unpack Bytes to Words
The MIN and MAX instructions allow the clamping of pixel values to maximum values that
are allowed in different standards and stages of the CODECs.
The PERR instruction accelerates the macroblock search in motion estimation.
The pack and unpack (PKxB and UNPKBx) instructions accelerate the blocking of interleaved
YUV coordinates for processing by the CODEC.
Implementation Note:
Alpha processors for which the AMASK instruction reports the extension assigned to bit 8
as present (see Section 4.11.1) implement these instructions. Processors for which AMASK
does not report that extension can take an Illegal Instruction trap, and software can
emulate the function of these instructions, if required.
4.13.1 Byte and Word Minimum and Maximum
Format:
MINxxx Ra.rq,Rb.rq,Rc.wq
       Ra.rq,#b.ib,Rc.wq ! Operate Format
MAXxxx Ra.rq,Rb.rq,Rc.wq
       Ra.rq,#b.ib,Rc.wq ! Operate Format
Operation:
CASE
  MINUB8:
    FOR i FROM 0 TO 7
      Rcv<i*8+7:i*8> = MINU(Rav<i*8+7:i*8>, Rbv<i*8+7:i*8>)
    END
  MINSB8:
    FOR i FROM 0 TO 7
      Rcv<i*8+7:i*8> = MINS(Rav<i*8+7:i*8>, Rbv<i*8+7:i*8>)
    END
  MINUW4:
    FOR i FROM 0 TO 3
      Rcv<i*16+15:i*16> = MINU(Rav<i*16+15:i*16>, Rbv<i*16+15:i*16>)
    END
  MINSW4:
    FOR i FROM 0 TO 3
      Rcv<i*16+15:i*16> = MINS(Rav<i*16+15:i*16>, Rbv<i*16+15:i*16>)
    END
  MAXUB8:
    FOR i FROM 0 TO 7
      Rcv<i*8+7:i*8> = MAXU(Rav<i*8+7:i*8>, Rbv<i*8+7:i*8>)
    END
  MAXSB8:
    FOR i FROM 0 TO 7
      Rcv<i*8+7:i*8> = MAXS(Rav<i*8+7:i*8>, Rbv<i*8+7:i*8>)
    END
  MAXUW4:
    FOR i FROM 0 TO 3
      Rcv<i*16+15:i*16> = MAXU(Rav<i*16+15:i*16>, Rbv<i*16+15:i*16>)
    END
  MAXSW4:
    FOR i FROM 0 TO 3
      Rcv<i*16+15:i*16> = MAXS(Rav<i*16+15:i*16>, Rbv<i*16+15:i*16>)
    END
ENDCASE
Exceptions:
None
Instruction mnemonics:
MINUB8 Vector Unsigned Byte Minimum
MINSB8 Vector Signed Byte Minimum
MINUW4 Vector Unsigned Word Minimum
MINSW4 Vector Signed Word Minimum
MAXUB8 Vector Unsigned Byte Maximum
MAXSB8 Vector Signed Byte Maximum
MAXUW4 Vector Unsigned Word Maximum
MAXSW4 Vector Signed Word Maximum
Qualifiers:
None
Description:
For MINxB8, each byte of Rc is written with the smaller of the corresponding bytes of Ra or
Rb. The bytes may be interpreted as signed or unsigned values.
For MINxW4, each word of Rc is written with the smaller of the corresponding words of Ra or
Rb. The words may be interpreted as signed or unsigned values.
For MAXxB8, each byte of Rc is written with the larger of the corresponding bytes of Ra or
Rb. The bytes may be interpreted as signed or unsigned values.
For MAXxW4, each word of Rc is written with the larger of the corresponding words of Ra or
Rb. The words may be interpreted as signed or unsigned values.
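The lane-by-lane behavior of the MINxxx and MAXxxx instructions can be sketched in software. The following Python model is illustrative only and is not part of the architecture definition: registers are modeled as 64-bit unsigned integers, and the helper names (lanes, as_signed, vector_select) are assumptions made for this sketch.

```python
def lanes(value, width):
    """Split a 64-bit value into lanes of `width` bits, lane 0 being the low-order lane."""
    mask = (1 << width) - 1
    return [(value >> shift) & mask for shift in range(0, 64, width)]


def as_signed(lane, width):
    """Reinterpret an unsigned lane as a two's-complement signed value."""
    return lane - (1 << width) if lane & (1 << (width - 1)) else lane


def vector_select(ra, rb, width, signed, pick):
    """Apply `pick` (min or max) to each pair of lanes and reassemble the result."""
    key = (lambda lane: as_signed(lane, width)) if signed else (lambda lane: lane)
    result = 0
    for i, (a, b) in enumerate(zip(lanes(ra, width), lanes(rb, width))):
        # The comparison uses the signed or unsigned interpretation,
        # but the stored lane is always the original bit pattern.
        result |= pick(a, b, key=key) << (i * width)
    return result


def minub8(ra, rb):
    return vector_select(ra, rb, 8, False, min)

def minsb8(ra, rb):
    return vector_select(ra, rb, 8, True, min)

def minuw4(ra, rb):
    return vector_select(ra, rb, 16, False, min)

def minsw4(ra, rb):
    return vector_select(ra, rb, 16, True, min)

def maxub8(ra, rb):
    return vector_select(ra, rb, 8, False, max)

def maxsb8(ra, rb):
    return vector_select(ra, rb, 8, True, max)

def maxuw4(ra, rb):
    return vector_select(ra, rb, 16, False, max)

def maxsw4(ra, rb):
    return vector_select(ra, rb, 16, True, max)
```

For example, clamping eight unsigned pixel bytes to an upper limit of 0xEB (a value chosen only for illustration) corresponds to minub8(pixels, 0xEBEBEBEBEBEBEBEB).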
4.13.2 Pixel Error
Format:
PERR Ra.rq,Rb.rq,Rc.wq ! Operate Format
Operation:
temp = 0
FOR i FROM 0 TO 7
  IF {Rav<i*8+7:i*8> GEU Rbv<i*8+7:i*8>} THEN
    temp ← temp + (Rav<i*8+7:i*8> - Rbv<i*8+7:i*8>)
  ELSE
    temp ← temp + (Rbv<i*8+7:i*8> - Rav<i*8+7:i*8>)
END
Rc ← temp
Exceptions:
None
Instruction mnemonics:
PERR Pixel Error
Qualifiers:
None
Description:
The absolute value of the difference between each of the bytes in Ra and Rb is calculated. The
sum of the resulting bytes is written to Rc.
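As a hedged illustration of the PERR operation, the following Python sketch models the registers as 64-bit unsigned integers and sums the absolute differences of the eight byte pairs. The function name perr is chosen to mirror the mnemonic; the code is a model for reasoning about results, not an emulation routine taken from this handbook.

```python
def perr(ra, rb):
    """Sum of absolute differences of the eight unsigned bytes of ra and rb."""
    total = 0
    for i in range(8):
        a = (ra >> (8 * i)) & 0xFF
        b = (rb >> (8 * i)) & 0xFF
        # Unsigned comparison (GEU) selects the order of subtraction,
        # so each term is non-negative.
        total += a - b if a >= b else b - a
    return total
```

The largest possible result is 8 * 255 = 2040, so the sum always fits comfortably in the quadword written to Rc.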
4.13.3 Pack Bytes
Format:
PKxB Rb.rq,Rc.wq ! Operate Format
Operation:
CASE
  PKLB:
    BEGIN
      Rc<07:00> ← Rbv<07:00>
      Rc<15:08> ← Rbv<39:32>
      Rc<63:16> ← 0
    END
  PKWB:
    BEGIN
      Rc<07:00> ← Rbv<07:00>
      Rc<15:08> ← Rbv<23:16>
      Rc<23:16> ← Rbv<39:32>
      Rc<31:24> ← Rbv<55:48>
      Rc<63:32> ← 0
    END
ENDCASE
Exceptions:
None
Instruction mnemonics:
PKLB Pack Longwords to Bytes
PKWB Pack Words to Bytes
Qualifiers:
None
Description:
For PKLB, the component longwords of Rb are truncated to bytes and written to the lower two
byte positions of Rc. The upper six bytes of Rc are written with zero.
For PKWB, the component words of Rb are truncated to bytes and written to the lower four
byte positions of Rc. The upper four bytes of Rc are written with zero.
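The bit-field moves in the Operation section can be checked with a small software model. The Python sketch below is illustrative only; it treats Rb as a 64-bit unsigned integer and returns the value that would be written to Rc. The function names simply mirror the mnemonics.

```python
def pklb(rb):
    """Truncate the two longwords of rb to bytes; result occupies bits <15:0>."""
    low = rb & 0xFF              # Rbv<07:00>
    high = (rb >> 32) & 0xFF     # Rbv<39:32>
    return low | (high << 8)     # bits <63:16> are zero


def pkwb(rb):
    """Truncate the four words of rb to bytes; result occupies bits <31:0>."""
    result = 0
    for i in range(4):
        byte = (rb >> (16 * i)) & 0xFF   # low byte of word i
        result |= byte << (8 * i)
    return result                # bits <63:32> are zero
```

With Rb = 0x0123456789ABCDEF, the model gives 0x67EF for PKLB and 0x2367ABEF for PKWB.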
4.13.4 Unpack Bytes
Format:
UNPKBx Rb.rq,Rc.wq ! Operate Format
Operation:
temp = 0
CASE
  UNPKBL:
    BEGIN
      temp<07:00> = Rbv<07:00>
      temp<39:32> = Rbv<15:08>
    END
  UNPKBW:
    BEGIN
      temp<07:00> = Rbv<07:00>
      temp<23:16> = Rbv<15:08>
      temp<39:32> = Rbv<23:16>
      temp<55:48> = Rbv<31:24>
    END
ENDCASE
Rc ← temp
Exceptions:
None
Instruction mnemonics:
UNPKBL Unpack Bytes to Longwords
UNPKBW Unpack Bytes to Words
Qualifiers:
None
Description:
For UNPKBL, the lower two component bytes of Rb are zero-extended to longwords. The
resulting longwords are written to Rc.
For UNPKBW, the lower four component bytes of Rb are zero-extended to words. The resulting
words are written to Rc.
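A corresponding software model of the unpack operations may help when checking expected results. As with any such sketch, this Python code is an illustration under the assumption that registers are 64-bit unsigned integers; it is not an emulation routine from this handbook.

```python
def unpkbl(rb):
    """Zero-extend the low two bytes of rb into the two longwords of the result."""
    byte0 = rb & 0xFF            # Rbv<07:00> -> temp<07:00>
    byte1 = (rb >> 8) & 0xFF     # Rbv<15:08> -> temp<39:32>
    return byte0 | (byte1 << 32)


def unpkbw(rb):
    """Zero-extend the low four bytes of rb into the four words of the result."""
    result = 0
    for i in range(4):
        byte = (rb >> (8 * i)) & 0xFF    # byte i of rb
        result |= byte << (16 * i)       # low byte of word i
    return result
```

Bytes of Rb above those named in the Operation section do not affect the result, because temp starts at zero and only the listed fields are copied.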
Chapter 5
System Architecture and Programming Implications
5.1 Introduction
Portions of the Alpha architecture have implications for programming and for the system
structure of both uniprocessor and multiprocessor implementations. Architectural implications
considered in the following sections are:
° Physical address space behavior
° Caches and write buffers
° Translation buffers and virtual caches
° Data sharing
° Read/ write ordering
° Arithmetic traps
To meet the requirements of the Alpha architecture, software and hardware implementors need
to take these issues into consideration.
5.2 Physical Address Space Characteristics
Alpha physical address space is divided into four equal-size regions. The regions are delineated
by the two most significant, implemented, physical address bits. Each region's
characteristics are distinguished by the coherency, granularity, and width of memory accesses,
and whether the region exhibits memory-like behavior or non-memory-like behavior.
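As a small illustration of how the two most significant implemented physical address bits select a region, the following Python sketch computes a region number from 0 to 3. The function name region_of and the parameter implemented_bits are assumptions for this example; the number of implemented physical address bits is implementation-specific and is not fixed by this section.

```python
def region_of(physical_address, implemented_bits):
    """Return the region (0..3) selected by the top two implemented address bits."""
    if implemented_bits < 2:
        raise ValueError("at least two physical address bits are needed")
    if physical_address < 0 or physical_address >> implemented_bits:
        raise ValueError("address uses unimplemented bits")
    # Shifting out all but the top two implemented bits leaves the region number.
    return physical_address >> (implemented_bits - 2)
```

For a hypothetical implementation with 34 physical address bits, each region would span 2**32 bytes, and region_of(0x1_0000_0000, 34) would return 1.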
5.2.1 Coherency of Memory Access
Alpha implementations must provide a coherent view of memory, in which each write by a
processor or I/O device (hereafter called "processor") becomes visible to all other processors.
No distinction is made between coherency of "memory space" and "I/O space."
Memory coherency may be provided in different ways for each of the four physical address
regions.
Possible per-region policies include, but are not restricted to:
° No caching
No copies are kept of data in a region; all reads and writes access the actual data
location (memory or I/O register), but a processor may elide multiple accesses to the
same data (see Section 5.2.3).
° Write-through caching
Copies are kept of any data in the region; reads may use the copies, but writes update
the actual data location and either update or invalidate all copies.
° Write-back caching
Copies are kept of any data in the region; reads and writes may use the copies, and
writes use additional state to determine whether there are other copies to invalidate or
update.
Software/Hardware Note:
To produce separate and distinct accesses to a specific location, the location must be in a
region with no caching and a memory barrier instruction must be inserted between
accesses. See Section 5.2.3.
Part of the coherency policy implemented for a given physical address region may include
restrictions on excess data transfers (performing more accesses to a location than is necessary
to acquire or change the location's value) or may specify data transfer widths (the granularity
used to access a location).
Independent of coherency policy, a processor may use different hardware or different hardware
resource policies for caching or buffering different physical address regions.
5.2.2 Granularity of Memory Access
For each region, an implementation must support aligned quadword access and may optionally
support aligned longword access or byte access. If byte access is supported in a region, aligned
word access and aligned longword access are also supported.
For a quadword access region, accesses to physical memory must be implemented such that
independent accesses to adjacent aligned quadwords produce the same results regardless of the
order of execution. Further, an access to an aligned quadword must be done in a single atomic
operation.
For a longword access region, accesses to physical memory must be implemented such that
independent accesses to adjacent aligned longwords produce the same results regardless of the
order of execution. Further, an access to an aligned longword must be done in a single atomic
operation, and an access to an aligned quadword must also be done in a single atomic
operation.
For a byte access region, accesses to physical memory must be implemented such that independent
accesses to adjacent bytes or adjacent aligned words produce the same results, regardless
of the order of execution. Further, an access to a byte, an aligned word, an aligned longword,
or an aligned quadword must be done in a single atomic operation.
In this context, "atomic" means that the following is true if different processors do simultaneous
reads and writes of the same data:
° The result of any set of writes must be the same as if the writes had occurred sequentially
in some order, and
° Any read that observes the effect of a write on some part of memory must observe the effect
of that write (or of a later write or writes) on the entire part of memory that is accessed by
both the read and the write.
When a write accesses only part of a given word, longword, or quadword, a read of the entire
structure may observe the effect of that partial write without observing the effect of an earlier
write of another byte or bytes to the same structure. See Sections 5.6.1.5 and 5.6.1.6.
5.2.3 Width of Memory Access
Subject to the granularity, ordering, and coherency constraints given in Sections 5.2.1, 5.2.2,
and 5.6, accesses to physical memory may be freely cached, buffered, and prefetched.
A processor may read more physical memory data (such as a full cache block) than is actually
accessed, writes may trigger reads, and writes may write back more data than is actually
updated. A processor may elide multiple reads and/or writes to the same data.
5.2.4 Memory-Like and Non-Memory-Like Behavior
Memory-like regions obey the following rules:
° Each page frame in the region either exists in its entirety or does not exist in its entirety; there are no holes within a page frame.
° All locations that exist are read/write.
° A write to a location followed by a read from that location returns precisely the bits written; all bits act as memory.
° A write to one location does not change any other location.
° Reads have no side effects.
° Longword access granularity is provided, and if the byte/word extension is implemented, byte access granularity is provided.
° Instruction-fetch is supported.
° Load-locked and store-conditional are supported.
Non-memory-like regions may have much more arbitrary behavior:
° Unimplemented locations or bits may exist anywhere.
° Some locations or bits may be read-only and others write-only.
° Address ranges may overlap, such that a write to one location changes the bits read from a different location.
° Reads may have side effects, although this is strongly discouraged.
° Longword granularity need not be supported and, even if the byte/word extension is implemented, byte access granularity need not be implemented.
° Instruction-fetch need not be supported.
° Load-locked and store-conditional need not be supported.
Hardware/Software Coordination Note:
The details of such behavior are outside the scope of the Alpha architecture. Specific
processor and I/O device implementations may choose and document whatever behavior
they need. It is the responsibility of system designers to impose enough consistency to
allow processors to access matching non-memory devices successfully and in a coherent way.
5.3 Translation Buffers and Virtual Caches
A system may choose to include a virtual instruction cache (virtual I-cache) or a virtual data
cache (virtual D-cache). A system may also choose to include either a combined data and
instruction translation buffer (TB) or separate data and instruction TBs (DTB and ITB). The
contents of these caches and/or translation buffers may become invalid, depending on what
operating system activity is being performed.
Whenever a non-software field of a valid page table entry (PTE) is modified, copies of that
PTE must be made coherent. PALcode mechanisms are available to clear all TBs, both DTB
and ITB entries for a given VA, either DTB or ITB entries for a given VA, or all entries with
the address space match (ASM) bit clear. Virtual D-cache entries are made coherent whenever
the corresponding DTB entry is requested to be cleared by any of the appropriate PALcode
mechanisms. Virtual I-cache entries can be made coherent via the IMB instruction.
If a processor implements address space numbers (ASNs), and the old PTE has the Address
Space Match (ASM) bit clear (ASNs in use) and the Valid bit set, then entries can also effectively
be made coherent by assigning a new, unused ASN to the currently running process and
not reusing the previous ASN before calling the appropriate PALcode routine to invalidate the
translation buffer (TB).
In a multiprocessor environment, making the TBs and/or caches coherent on only one processor
is not always sufficient. An operating system must arrange to perform the above actions on
each processor that could possibly have copies of the PTE or data for any affected page.
5.4 Caches and Write Buffers
A hardware implementation may include mechanisms to reduce memory access time by making
local copies of recently used memory contents (or those expected to be used) or by
buffering writes to complete at a later time. Caches and write buffers are examples of these
mechanisms. They must be implemented so that their existence is transparent to software
(except for timing, error reporting/control/recovery, and modification to the I-stream).
The following requirements must be met by all cache/write-buffer implementations. All processors
must provide a coherent view of memory.
° Write buffers may be used to delay and aggregate writes. From the viewpoint of another
processor, buffered writes appear not to have happened yet. (Write buffers must not
delay writes indefinitely. See Section 5.6.1.9.)
° Write-back caches must be able to detect a later write from another processor and invalidate
or update the cache contents.
° A processor must guarantee that a data store to a location followed by a data load from the
same location reads the updated value.
° Cache prefetching is allowed, but virtual caches must not prefetch from invalid pages. See
Sections 5.6.1.3, 5.6.4.3, and 5.6.4.4.
° A processor must guarantee that all of its previous writes are visible to all other processors
before a HALT instruction completes. A processor must guarantee that its caches
are coherent with the rest of the system before continuing from a HALT.
° If battery backup is supplied, a processor must guarantee that the memory system remains
coherent across a powerfail/recovery sequence. Data that was written by the
processor before the powerfail must not be lost, and any caches must be in a valid state
before (and if) normal instruction processing is continued after power is restored.
° Virtual instruction caches are not required to notice modifications of the virtual I-stream
(they need not be coherent with the rest of memory). Software that creates or
modifies the instruction stream must execute a CALL_PAL IMB before trying to execute
the new instructions.
In this context, to "modify the virtual I-stream" means either:
– any Store to the same physical address that is subsequently fetched as an instruction
by some corresponding (virtual address, ASN) pair, or
– any change to the virtual-to-physical address mapping so that different values are
fetched.
For example, if two different virtual addresses, VA1 and VA2, map to the same page
frame, a store to VA1 modifies the virtual I-stream fetched by VA2.
However, the following sequence does not modify the virtual I-stream (this might
happen in soft page faults).
1. Change the mapping of an I-stream page from valid to invalid.
2. Copy the corresponding page frame to a new page frame.
3. Change the original mapping to be valid and point to the new page frame.
° Physical instruction caches are not required to notice modifications of the physical I-stream
(they need not be coherent with the rest of memory), except for certain paging
activity. (See Section 5.6.4.4.) Software that creates or modifies the instruction stream
must execute a CALL_PAL IMB before trying to execute the new instructions.
In this context, to "modify the physical I-stream" means any Store to the same physical
address that is subsequently fetched as an instruction.
5.5 Data Sharing
In a multiprocessor environment, writes to shared data must be synchronized by the
programmer.
5.5.1 Atomic Change of a Single Datum
The ordinary STL and STQ instructions can be used to perform an atomic change of a shared
aligned longword or quadword. ("Change" means that the new value is not a function of the old
value.) In particular, an ordinary STL or STQ instruction can be used to change a variable that
could be simultaneously accessed via an LDx_L/STx_C sequence.
5.5.2 Atomic Update of a Single Datum
The load-locked/store-conditional instructions may be used to perform an atomic update of a
shared aligned longword or quadword. ("Update" means that the new value is a function of the
old value.)
The following sequence performs a read-modify-write operation on location x. Only
register-to-register operate instructions and branch fall-throughs may occur in the sequence:
try_again:
        LDQ_L   R1, x
        <modify R1>
        STQ_C   R1, x
        BEQ     R1, no_store
        :
no_store:
        <code to check for excessive iterations>
        BR      try_again
If this sequence runs with no exceptions or interrupts, and no other processor writes to location
x (more precisely, the locked range including x) between the LDQ_L and STQ_C
instructions, then the STQ_C shown in the example stores the modified value in x and sets R1
to 1. If, however, the sequence encounters exceptions or interrupts that eventually continue the
sequence, or another processor writes to x, then the STQ_C does not store and sets R1 to 0. In
this case, the sequence is repeated by the branches to no_store and try_again. This repetition
continues until the reasons for exceptions or interrupts are removed and no interfering store is
encountered.
To be useful, the sequence must be constructed so that it can be replayed an arbitrary number
of times, giving the same result values each time. A sufficient (but not necessary) condition is
that, within the sequence, the set of operand destinations and the set of operand sources are
disjoint.
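The retry behavior described above can be illustrated with a toy software model. The Python sketch below is not a description of any Alpha implementation: the class ToyLocation, its lock_flag field, and the interfere callback are hypothetical names introduced for this example, and the model covers a single location observed by one processor.

```python
class ToyLocation:
    """Toy model of one memory location with load-locked/store-conditional."""

    def __init__(self, value=0):
        self.value = value
        self.lock_flag = False

    def ldq_l(self):
        """Load-locked: read the value and set the lock flag."""
        self.lock_flag = True
        return self.value

    def stq_c(self, new_value):
        """Store-conditional: store only if the lock flag is still set.

        Returns 1 on success and 0 on failure, like the value left in R1.
        """
        if not self.lock_flag:
            return 0
        self.value = new_value
        self.lock_flag = False
        return 1

    def write_from_other_processor(self, new_value):
        """An interfering write clears the lock flag in this model."""
        self.value = new_value
        self.lock_flag = False


def atomic_update(location, modify, interfere=None, max_tries=10):
    """Replay the read-modify-write until the store-conditional succeeds."""
    for attempt in range(1, max_tries + 1):
        r1 = location.ldq_l()
        r1 = modify(r1)                  # register-to-register work only
        if interfere is not None:
            interfere(location, attempt)
        if location.stq_c(r1):
            return attempt               # number of tries that were needed
    raise RuntimeError("too many iterations")
```

Because modify is recomputed from a fresh load on every attempt, the sequence can be replayed any number of times and still produce a result based on the latest value, which is the property the preceding paragraph asks for.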
Note:
A sufficiently long instruction sequence between LDx_L and STx_C will never complete,
because periodic timer interrupts will always occur before the sequence completes. The
rules in Section A.5 describe sequences that will eventually complete in all Alpha
implementations.
This load-locked/store-conditional paradigm may be used whenever an atomic update of a
shared aligned quadword is desired, including getting the effect of atomic byte writes.
5.5.3 Atomic Update of Data Structures
Before accessing shared writable data structures (those that are not a single aligned longword
or quadword), the programmer can acquire control of the data structure by using an atomic
update to set a software lock variable. Such a software lock can be cleared with an ordinary
store instruction.
A software-critical section, therefore, may look like the sequence:
stq_c_loop:
spin_loop:
        LDQ     R1, lock_variable   ; This optional spin-loop code
        BLBS    R1, already_set     ; should be used unless the
                                    ; lock is known to be low-contention.
        LDQ_L   R1, lock_variable   ; \
        BLBS    R1, already_set     ;  \
        OR      R1, #1, R2          ;   > Set lock bit
        STQ_C   R2, lock_variable   ;  /
        BEQ     R2, stq_c_fail      ; /
        MB
        <critical section: updates various data structures>
        MB                          ; Second MB
        STQ     R31, lock_variable  ; Clear lock bit
        :
        :
already_set:
        <code to block or reschedule or test for too many iterations>
        BR      spin_loop
stq_c_fail:
        <code to test for too many iterations>
        BR      stq_c_loop
This code has a number of subtleties:
° If the lock_variable is already set, the spin loop is done without doing any stores. This
avoidance of stores improves memory subsystem performance and avoids the deadlock
described below. The loop uses an ordinary load. This code sequence is preferred unless
the lock is known to be low-contention, because the sequence increases the probability
that the LDQ_L hits in the cache and the LDQ_L/STQ_C sequence completes quickly
and successfully.
° If the lock_variable is actually being changed from 0 to 1, and the STQ_C fails (due to an
interrupt, or because another processor simultaneously changed lock_variable), the
entire process starts over by reading the lock_variable again.
° Only the fall-through path of the BLBS instructions does a STx_C; some implementations
may not allow a successful STx_C after a branch-taken.
° Only register-to-register operate instructions are used to do the modify.
° Both conditional branches are forward branches, so they are properly predicted not to be
taken (to match the common case of no contention for the lock).
° The OR writes its result to a second register; this allows the OR and the BLBS to be
interchanged if that would give a faster instruction schedule.
° Other operate instructions (from the critical section) may be scheduled into the
LDQ_L..STQ_C sequence, so long as they do not fault or trap and they give correct
results if repeated; other memory or operate instructions may be scheduled between the
STQ_C and BEQ.
° The memory barrier instructions are discussed in Section 5.5.4.
It is correct to substitute WMB for the second MB only if:
– All data locations that are read or written in the critical section are accessed only
after acquiring a software lock by using lock_variable (and before releasing the
software lock).
– For each read u of shared data in the critical section, there is a write v such that:
1. v is BEFORE the WMB
2. v follows u in processor issue sequence (see Section 5.6.1.1)
3. v either depends on u (see Section 5.6.1.7) or overlaps u (see Section 5.6.1),
or both.
– Both lock_variable and all the shared data are in memory-like regions (or
lock_variable and all the shared data are in non-memory-like regions). If the
lock_variable is in a non-memory-like region, the atomic lock protocol must use
some implementation-specific hardware support.
Generally, the substitution of a WMB for the second MB increases performance.
° An ordinary STQ instruction is used to clear the lock_variable.
It would be a performance mistake to spin-wait by repeating the full LDQ_L..STQ_C sequence
(to move the BLBS after the BEQ) because that sequence may repeatedly change the software
lock_variable from "locked" to "locked," with each write causing extra access delays in all
other caches that contain the lock_variable. In the extreme, spin-waits that contain writes may
deadlock as follows:
If, when one processor spins with writes, another processor is modifying (not changing)
the lock_variable, then the writes on the first processor may cause the STx_C of the
modify on the second processor always to fail.
This deadlock situation is avoided by:
° Having only one processor execute a store (no STx_C), or
° Having no write in the spin loop, or
° Doing a write only if the shared variable actually changes state (1 → 1 does not change state).
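The structure of the locking sequence, spinning with ordinary reads and writing the lock variable only on a 0 to 1 transition, can be sketched with threads in a high-level language. The Python model below is an illustration only. It assumes that a threading.Lock may stand in for the atomicity that LDQ_L/STQ_C provides in hardware, it does not model memory barriers, and the names LockVariable, acquire, and run are hypothetical.

```python
import threading
import time


class LockVariable:
    """Toy stand-in for lock_variable; bit 0 is the lock bit."""

    def __init__(self):
        self.value = 0
        self.stores = 0                    # successful 0 -> 1 writes
        self._atomic = threading.Lock()    # assumption: models LDQ_L..STQ_C atomicity

    def load(self):
        """Ordinary LDQ: a read that performs no store."""
        return self.value

    def try_set(self):
        """LDQ_L / BLBS / OR / STQ_C collapsed into one atomic step."""
        with self._atomic:
            if self.value & 1:             # already set: leave without storing
                return False
            self.value |= 1                # the only write: 0 -> 1
            self.stores += 1
            return True

    def clear(self):
        """Ordinary store of zero, as done by STQ R31."""
        self.value = 0


def acquire(lock):
    while True:
        while lock.load() & 1:             # spin loop contains reads only
            time.sleep(0)                  # yield, in the role of "block or reschedule"
        if lock.try_set():
            return


def run(workers=4, iterations=50):
    """Increment a shared counter under the lock; return (counter, stores)."""
    lock = LockVariable()
    shared = {"counter": 0}

    def worker():
        for _ in range(iterations):
            acquire(lock)
            shared["counter"] += 1         # critical section
            lock.clear()

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return shared["counter"], lock.stores
```

In this model the number of writes that set the lock equals the number of successful acquisitions, which reflects the rule that a waiting processor never rewrites "locked" with "locked."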
5.5.4 Ordering Considerations for Shared Data Structures
A critical section sequence, such as shown in Section 5.5.3, is conceptually only three steps:
1. Acquire software lock
2. Critical section: read/write shared data
3. Clear software lock
In the absence of explicit instructions to the contrary, the Alpha architecture allows reads and
writes to be reordered. While this may allow more implementation speed and overlap, it can
also create undesired side effects on shared data structures. Normally, the critical section just
described would have two instructions added to it:
        <acquire software lock>
        MB (memory barrier #1)
        <critical section: read/write shared data>
        MB (memory barrier #2)
        <clear software lock>
The first memory barrier prevents any reads (from within the critical section) from being
prefetched before the software lock is acquired; such prefetched reads would potentially contain
stale data.
The second memory barrier prevents any writes and reads in the critical section from being
delayed past the clearing of the software lock. Such delayed accesses could interact with the
next user of the shared data, defeating the purpose of the software lock entirely. It is correct
to substitute WMB for the second MB only if:
1. All data locations that are read or written in the critical section are accessed only after
acquiring a software lock by using lock_variable (and before releasing the software
lock).
2. For each read u of shared data in the critical section, there is a write v such that:
a. v is BEFORE the WMB
b. v follows u in processor issue sequence (see Section 5.6.1.1)
c. v either depends on u (see Section 5.6.1.7) or overlaps u (see Section 5.6.1),
or both.
3. Both lock_variable and all the shared data are in memory-like regions (or lock_variable
and all the shared data are in non-memory-like regions). If the lock_variable is in a
non-memory-like region, the atomic lock protocol must use some implementation-specific
hardware support.
Generally, the substitution of a WMB for the second MB increases performance.
Software Note:
In the VAX architecture, many instructions provide noninterruptible read-modify-write
sequences to memory variables. Most programmers never regard data sharing as an issue.
In the Alpha architecture, programmers must pay more attention to synchronizing access to
shared data; for example, to AST routines. In the VAX architecture, a programmer can use
an ADDL2 to update a variable that is shared between a "MAIN" routine and an AST
routine, if running on a single processor. In the Alpha architecture, a programmer must
deal with AST shared data by using multiprocessor shared data sequences.
5.6 Read/Write Ordering
This section applies to programs that run on multiple processors or on one or more processors
that are interacting with DMA I/O devices. To a program running on a single processor and not
interacting with DMA I/O devices, all memory accesses appear to happen in the order specified
by the programmer. This section deals with predictable read/write ordering across multiple
processors and/or DMA I/O devices.
The order of reads and writes done in an Alpha implementation may differ from that specified
by the programmer.
For any two memory accesses A and B, either A must occur before B in all Alpha implementations,
B must occur before A, or they are UNORDERED. In the last case, software cannot
depend upon one occurring first: the order may vary from implementation to implementation,
and even from run to run or moment to moment on a single implementation.
If two accesses cannot be shown to be ordered by the rules given, they are UNORDERED and
implementations are free to do them in any order that is convenient. Implementations may take
advantage of this freedom to deliver substantially higher performance.
The discussion that follows first defines the architectural issue sequence of memory accesses
on a single processor, then defines the (partial) ordering on this issue sequence that all Alpha
implementations are required to maintain.
The individual issue sequences on multiple processors are merged into access sequences at
each shared memory location. The discussion defines the (partial) ordering on the individual
access sequences that all Alpha implementations are required to maintain.
The net result is that for any code that executes on multiple processors, one can determine
which memory accesses are required to occur before others on all Alpha implementations and
hence can write useful shared-variable software.
Software writers can force one access to occur before another by inserting a memory barrier
instruction (MB, WMB, or CALL_PAL IMB) between the accesses.
5.6.1 Alpha Shared Memory Model
An Alpha system consists of a collection of processors, I/O devices (and possibly a bridge to
connect remote I/O devices), and shared memories that are accessible by all processors.
Note:
An example of an unshared location is a physical address in I/O space that refers to a CSR
that is local to a processor and not accessible by other processors.
A processor is an Alpha CPU.
In most systems, DMA I/O devices or other agents can read or write shared memory locations.
The order of accesses by those agents is not completely specified in this document. It is possible
in some systems for read accesses by I/O devices or other agents to give results indicating
some reordering of accesses. However, there are guarantees that apply in all systems. See
Section 5.6.4.7.
A shared memory is the primary storage place for one or more locations.
A location is a byte, specified by its physical address. Multiple virtual addresses may map to
the same physical address. Ordering considerations are based only on the physical address.
This definition of location specifically includes locations and registers in memory-mapped I/O
devices and bridges to remote I/O (for example, Mailbox Pointer Registers, or MBPRs).
Implementation Note:
An implementation may allow a location to have multiple physical addresses, but the rules
for accesses via mixtures of the addresses are implementation-specific and outside the
scope of this section. Accesses via exactly one of the physical addresses follow the rules
described next.
Each processor may generate accesses to shared memory locations. There are six types of
accesses:
1. Instruction fetch by processor i to location x, returning value a, denoted Pi:I<4>(x,a).
2. Data read (including load-locked) by processor i to location x, returning value a,
denoted Pi:R<size>(x,a).
3. Data write (including successful store-conditional) by processor i to location x, storing
value a, denoted Pi:W<size>(x,a).
4. Memory barrier issued by processor i, denoted Pi:MB.
5. Write memory barrier issued by processor i, denoted Pi:WMB.
6. I-stream memory barrier issued by processor i, denoted Pi:IMB.
The first access type is also called an I-stream access or I-fetch. The next two are also called
D-stream accesses. The first three types are collectively called read/write accesses, denoted
Pi:Op<m>(x,a), where m is the size of the access in bytes, x is the (physical) address of the
access, and a is a value representable in m bytes; for any k in the range 0..m-1, byte k of value
a (where byte 0 is the low-order byte) is the value written to or read from location x+k by the
access. This relationship reflects little-endian addressing; big-endian addressing representation
is as described in Chapter 2.
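The little-endian relationship between the value a and the individual byte locations can be made concrete with a short sketch. The Python function below is illustrative only; the name bytes_accessed is an assumption for this example, and the function covers only the little-endian case described in this section.

```python
def bytes_accessed(x, m, a):
    """Map each location x+k (0 <= k < m) to byte k of the m-byte value a."""
    if not 0 <= a < (1 << (8 * m)):
        raise ValueError("value is not representable in m bytes")
    # Byte 0 is the low-order byte of a and corresponds to location x.
    return {x + k: (a >> (8 * k)) & 0xFF for k in range(m)}
```

For a longword access W<4>(0x1000, 0x11223344), the model places 0x44 at location 0x1000 and 0x11 at location 0x1003.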
The last three types collectively are called barriers or memory barriers.
The size of a read/write access is 8 for a quadword access, 4 for a longword access (including
all instruction fetches), 2 for a word access, or 1 for a byte access. All read/write accesses in
this chapter are naturally aligned. That is, they have the form Pi:Op<m>(x,a), where the
address x is divisible by size m.
The word "access" is also used as a verb; a read/write access Pi:Op<m>(x,a) accesses byte z if
x ≤ z < x+m. Two read/write accesses Op1<m>(x,a) and Op2<n>(y,b) are defined to overlap if
there is at least one byte that is accessed by both, that is, if max(x,y) < min(x+m,y+n).
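The alignment and overlap conditions translate directly into code. The following Python sketch is provided only as an illustration; the function names are assumptions for this example, and addresses and sizes are plain integers in bytes.

```python
def is_naturally_aligned(x, m):
    """True if m is a valid access size and address x is divisible by m."""
    return m in (1, 2, 4, 8) and x % m == 0


def overlaps(x, m, y, n):
    """True if an m-byte access at x and an n-byte access at y share a byte.

    The access at x covers bytes x..x+m-1 and the access at y covers
    y..y+n-1, so they share a byte exactly when max(x, y) < min(x+m, y+n).
    """
    return max(x, y) < min(x + m, y + n)
```

A quadword access at 0x100 overlaps a longword access at 0x104, while two longword accesses at 0x100 and 0x104 are adjacent and do not overlap.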
5.6.1.1 Architectural Definition of Processor Issue Sequence
The issue sequence for a processor is architecturally defined with respect to a hypothetical simple
implementation that contains one processor and a single shared memory, with no caches or
buffers. This is the instruction execution model:
1. I-fetch: An Alpha instruction is fetched from memory.
2. Read/Write: That instruction is executed and runs to completion, including a single data read
   from memory for a Load instruction or a single data write to memory for a Store instruction.
3. Update: The PC for the processor is updated.
4. Loop: Repeat the above sequence indefinitely.
If the instruction fetch step gets a memory management fault, the I-fetch is not done and the
PC is updated to point to a PALcode fault handler. If the read/write step gets a memory management
fault, the read/write is not done and the PC is updated to point to a PALcode fault
handler.
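The execution model above can be sketched as a toy interpreter that records accesses in processor issue order. Everything here is hypothetical: the two-instruction encoding, the separate program and memory dictionaries, and the step limit are assumptions made to keep the sketch short, and memory management faults are not modeled.

```python
def issue_sequence(program, memory, start_pc=0, max_steps=16):
    """Return the accesses, in processor issue order, made by the simple model.

    program -- dict: instruction address -> ("LOAD" | "STORE", register, data address)
    memory  -- dict: data address -> value (updated in place by stores)
    """
    pc, registers, accesses = start_pc, {}, []
    for _ in range(max_steps):          # the architectural model loops indefinitely
        if pc not in program:           # hypothetical stop condition for this sketch
            break
        op, reg, addr = program[pc]
        accesses.append(("I", pc))      # 1. I-fetch
        if op == "LOAD":                # 2. Read/Write: runs to completion
            registers[reg] = memory[addr]
            accesses.append(("R", addr))
        elif op == "STORE":
            memory[addr] = registers[reg]
            accesses.append(("W", addr))
        pc += 4                         # 3. Update the PC (instructions are 4 bytes)
    return accesses


program = {0: ("LOAD", "R1", 0x100), 4: ("STORE", "R1", 0x108)}
memory = {0x100: 7, 0x108: 0}
trace = issue_sequence(program, memory)
```

The resulting list interleaves each I-fetch with the data access of the same instruction, which is the order the processor issue constraints refer to.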
5.6.1.2 Definition of Before and After
The ordering relation BEFORE (⇐) is a partial order on memory accesses. It is further defined
in Sections 5.6.1.3 through 5.6.1.9.
The ordering relation BEFORE (⇐), being a partial order, is acyclic.
The BEFORE order cannot be observed directly, nor fully predicted before an actual execution,
nor reproduced exactly from one execution to another. Nonetheless, some useful ordering
properties must hold in all Alpha implementations.
If u ⇐ v, then v is said to be AFTER u.
5.6.1.3 Definition of Processor Issue Constraints
Processor issue constraints are imposed on the processor issue sequence defined in Section
5.6.1.1, as shown in Table 5–1, where "overlap" denotes the condition
max(x,y) < min(x+m, y+n).
For two accesses u and v issued by processor Pi, if u precedes v by processor issue constraint,
then u precedes v in BEFORE order. u and v on Pi are ordered by processor issue constraint if
any of the following applies:
1. The entry in Table 5–1 indicated by the access type of u (1st) and v (2nd) indicates the
   accesses are ordered.
2. u and v are both writes to memory-like regions and there is a WMB between u and v in
   processor issue sequence.
3. u and v are both writes to non-memory-like regions and there is a WMB between u and v in
   processor issue sequence.
4. u is a TB fill that updates a PTE, for example, a PTE read in order to satisfy a TB miss, and
   v is an I- or D-stream access using that PTE (see Sections 5.6.4.3 and 5.6.4.4).
In Table 5–1, 1st and 2nd refer to the ordering of accesses in the processor issue sequence.
Note that Table 5–1 imposes no direct constraint on the ordering relationship between
nonoverlapping read/write accesses, though there may be indirect constraints due to the
transitivity of BEFORE (⇐). Conditions 2 through 4, above, impose ordering constraints on some
pairs of nonoverlapping read/write accesses.
Table 5–1 permits a read access Pi:R<n>(y,b) to be ordered BEFORE an overlapping write
access Pi:W<m>(x,a) that precedes the read access in processor issue order. This asymmetry
for reads allows reads to be satisfied by using data from an earlier write in processor issue
sequence by the same processor (for example, by hitting in a write buffer) before the write
completes. The write access remains "visible" to the read access; "visibility" is described in
Sections 5.6.1.5 and 5.6.1.6 and illustrated in Litmus Test 11 in Section 5.6.2.11.
An I-fetch Pi:I<4>(y,b) may also be ordered BEFORE an overlapping write Pi:W<m>(x,a) that
precedes it in processor issue sequence. In that case, the write may, but need not, be visible to
the I-fetch. This asymmetry in Table 5–1 allows writes to the I-stream to be incoherent until a
CALL_PAL IMB is executed.
Implementations are free to perform memory accesses from a single processor in any sequence
that is consistent with processor issue constraints.
Table 5–1: Processor Issue Constraints
1st↓  2nd→        Pi:I<n=4>(y,b)   Pi:R<n>(y,b)    Pi:W<n>(y,b)    Pi:MB   Pi:IMB
Pi:I<m=4>(x,a)    ⇐ if overlap     -               ⇐ if overlap    ⇐       ⇐
Pi:R<m>(x,a)      -                ⇐ if overlap    ⇐ if overlap    ⇐       ⇐
Pi:W<m>(x,a)      -                -               ⇐ if overlap    ⇐       ⇐
Pi:MB             -                ⇐               ⇐               ⇐       ⇐
Pi:IMB            ⇐                ⇐               ⇐               ⇐       ⇐
(A dash marks a pair on which the table imposes no direct ordering constraint.)
5.6.1.4 Definition of Location Access Constraints
Location access constraints are imposed on overlapping read/write accesses. If u and v are
overlapping read/write accesses, at least one of which is a write, then u and v must be comparable
in the BEFORE (⇐) ordering, that is, either u ⇐ v or v ⇐ u.
There is no direct requirement that nonoverlapping accesses be comparable in the BEFORE
(⇐) ordering.
All writes accessing any given byte are totally ordered, and any read or I-fetch accessing a
given byte is ordered with respect to all writes accessing that byte.
5.6.1.5 Definition of Visibility
If u is a write access Pi:W<m>(x,a) and v is an overlapping read access Pj:R<n>(y,b), u is visible
to v only if:
   u ⇐ v, or
   u precedes v in processor issue sequence (possible only if Pi = Pj).
If u is a write access Pi:W<m>(x,a) and v is an overlapping instruction fetch Pj:I<4>(y,b),
there are the following rules for visibility:
1. If u ⇐ v, then u is visible to v.
2. If u precedes v in processor issue sequence, then:
   a. If there is a write w such that:
         u overlaps w and precedes w in processor issue sequence, and
         w is visible to v,
      then u is visible to v.
   b. If there is an instruction fetch w such that:
         u is visible to w, and
         w overlaps v and precedes v in processor issue sequence,
      then u is visible to v.
3. If u does not precede v in either processor issue sequence or BEFORE order, then u is not
   visible to v.
Note that the rules of visibility for reads and instruction fetches are slightly different. If a write
u precedes an overlapping instruction fetch v in processor issue sequence, but u is not
BEFORE v, then u may or may not be visible to v.
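The rule for read accesses (the first of the two cases above) can be sketched as a predicate. The representation is an assumption of this sketch: accesses are plain labels, `before` is a set of (a, b) pairs meaning a ⇐ b that the caller has already closed under transitivity, and `issue_order` lists each processor's accesses in issue sequence. The I-fetch rules are not modeled.

```python
def may_be_visible_to_read(u, v, before, issue_order):
    """Necessary condition for write u to be visible to an overlapping read v.

    before      -- set of (a, b) pairs meaning a is BEFORE b (transitively closed)
    issue_order -- dict: processor name -> list of access labels in issue sequence
    """
    if (u, v) in before:
        return True
    for accesses in issue_order.values():
        if u in accesses and v in accesses:      # same processor, Pi = Pj
            return accesses.index(u) < accesses.index(v)
    return False


# Hypothetical labels: U1 is a write on Pi; V1 and V2 are reads on Pj.
issue_order = {"Pi": ["U1"], "Pj": ["V1", "V2"]}
before = {("U1", "V1"), ("V1", "V2"), ("U1", "V2")}
cross_processor = may_be_visible_to_read("U1", "V2", before, issue_order)
```

Because U1 and V2 are issued by different processors, only the BEFORE condition can make U1 visible to V2 in this example.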
5.6.1.6 Definition of Storage
The property of storage applies only to memory-like regions.
The value read from any byte by a read access or instruction fetch v, is the value written by the
latest (in BEFORE order) write u to that byte that is visible to v. More formally:
   If u is Pi:W<m>(x,a), and v is either Pj:I<4>(y,b) or Pj:R<n>(y,b), and z is a byte accessed
   by both u and v, and u is visible to v; and there is no write that is AFTER u, is visible to v,
   and accesses byte z; then the value of byte z read by v is exactly the value written by u. In
   this situation, u is a source of v.
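A minimal sketch of the storage rule for a single byte follows. It assumes the caller already knows the total BEFORE order of the writes to that byte (Section 5.6.1.4 guarantees such an order exists) and which of them are visible to the reader; the function name and the label/value tuples are illustrative only.

```python
def source_of(writes_in_before_order, visible):
    """Return (label, value) of the latest visible write to one byte, or None.

    writes_in_before_order -- list of (label, value), earliest write first
    visible                -- set of labels of the writes visible to the reader
    """
    for label, value in reversed(writes_in_before_order):
        if label in visible:
            return label, value
    return None


# Byte of x: initialized to 1, then written with 2 by a hypothetical access U1.
writes = [("init", 1), ("U1", 2)]
sees_new = source_of(writes, {"init", "U1"})   # U1 is visible, so it is the source
sees_old = source_of(writes, {"init"})         # U1 not visible; initialization is the source
```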
The only way to communicate information between different processors is for one to write a
shared location and the other to read the shared location and receive the newly written value.
(In this context, the sending of an interrupt from processor Pi to Pj is modeled as Pi writing to a
location INTij, and Pj reading from INTij.)
5.6.1.7 Definition of Dependence Constraint
The depends relation (DP) is defined as follows. Given u and v issued by processor Pi, where u
is a read or an instruction fetch and v is a write, u precedes v in DP order (written u DP v, that
is, v depends on u) in either of the following situations:
• u determines the execution of v, the location accessed by v, or the value written by v.
• u determines the execution or address or value of another memory access z that precedes v or
  might precede v (that is, would precede v in some execution path depending on the value read
  by u) by processor issue constraint (see Section 5.6.1.3).
Note that the DP relation does not directly impose a BEFORE (⇐) ordering between accesses
u and v.
The dependence constraint requires that the union of the DP relation and the "is a source of"
relation (see Section 5.6.1.6) be acyclic. That is, there must not exist reads and/or I-fetches
R1, …, Rn, and writes W1, …, Wn, such that:
1. n ≥ 1,
2. For each i, 1 ≤ i ≤ n, Ri DP Wi,
3. For each i, 1 ≤ i < n, Wi is a source of Ri+1, and
4. Wn is a source of R1.
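The acyclicity requirement can be checked mechanically once the two relations are written down as edge sets. The sketch below uses a depth-first search; the function name and the access labels (two reads U1, V1 and two writes U2, V2 on two processors) are assumptions chosen for illustration.

```python
def has_cycle(edges):
    """Return True if the directed graph given as (from, to) pairs contains a cycle."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, [])
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on the current path, finished
    color = {node: WHITE for node in graph}

    def visit(node):
        color[node] = GREY
        for nxt in graph[node]:
            if color[nxt] == GREY:        # back edge: a cycle exists
                return True
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[node] == WHITE and visit(node) for node in graph)


dp = {("U1", "U2"), ("V1", "V2")}                 # each write depends on the preceding read
source = {("U2", "V1"), ("V2", "U1")}             # each write is the source of the other read
violates_constraint = has_cycle(dp | source)      # union of DP and "is a source of"
```

Removing either "is a source of" edge breaks the loop, in which case the constraint is satisfied.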
That constraint eliminates the possibility of "causal loops." A simple example of a "causal
loop" is when the execution of a write on Pi depends on the execution of a write on Pj and vice
versa, creating a circular dependence chain. The following simple example of a "causal loop"
is written in the style of the litmus tests in Section 5.6.2, where initially x and y are 1:
Processor Pi executes:
LDQ R1, x
STQ R1, y
Processor Pj executes:
LDQ R1, y
STQ R1, x
Representing those code sequences in the style of the litmus tests in Section 5.6.2, it is
impossible for the following sequence to result:
        Pi                        Pj
[U1] Pi:R<8>(x,0)         [V1] Pj:R<8>(y,0)
[U2] Pi:W<8>(y,0)         [V2] Pj:W<8>(x,0)
Analysis:
<1> By the definitions of storage and visibility, U2 is the source of V1, and V2 is the
    source of U1.
<2> By the definition of DP and examination of the code, U1 DP U2, and V1 DP V2.
<3> Thus, U1 DP U2, U2 is the source of V1, V1 DP V2, and V2 is the source of U1.
    This circular chain is forbidden by the dependence constraint.
Given the initial condition x, y = 1, the access sequence above would also be impossible if the
code were:
Processor Pi's program:
LDQ R1, x
BNE R1, done
STQ R31, y
done:
Processor Pj's program:
LDQ R1, y
BNE R1, done
STQ R31, x
done:
5.6.1.8 Definition of Load-Locked and Store-Conditional
The property of load-locked and store-conditional applies only to memory-like regions.
For each successful store-conditional v, there exists a load-locked u such that the following are
true:
1. u precedes v in the processor issue sequence.
2. There is no load-locked or store-conditional between u and v in the processor issue sequence.
3. If u and v access within the same naturally aligned 16-byte physical and virtual block in
   memory, then for every write w by a different processor that accesses within u's lock
   range (where w is either a store or a successful store conditional), it must be true that
   w ⇐ u or v ⇐ w.
u's lock range contains the region of physical memory that u accesses. See Sections 4.2.4 and
4.2.5, which define the lock range and conditions for success or failure of a store conditional.
5.6.1.9 Timeliness
Even in the absence of a barrier after the write, no write by a processor may be delayed
indefinitely in the BEFORE ordering.
5.6.2 Litmus Tests
Many issues about writing and reading shared data can be cast into questions about whether a
write is before or after a read. These questions can be answered by rigorously checking
whether any ordering satisfies the rules in Sections 5.6.1.3 through 5.6.1.8.
In litmus tests 1–9 below, all initial quadword memory locations contain 1. In all these litmus
tests, it is assumed that initializations are performed by a write or writes that are BEFORE all
the explicitly listed accesses, that all relevant writes other than the initializations are explicitly
shown, and that all accesses shown are to memory-like regions (so the definition of storage
applies).
5.6.2.1 Litmus Test 1 (Impossible Sequence)
Initially, location x contains 1:
        Pi                        Pj
[U1] Pi:W<8>(x,2)         [V1] Pj:R<8>(x,2)
                          [V2] Pj:R<8>(x,1)
Analysis:
<1> By the definition of storage (Section 5.6.1.6), V1 reading 2 implies that U1 is visible
    to V1.
<2> By the rules for visibility (Section 5.6.1.5), U1 being visible to V1, but being issued
    by a different processor, implies that U1 ⇐ V1.
<3> By the processor issue constraints (Section 5.6.1.3), V1 ⇐ V2.
<4> By the transitivity of the partial order ⇐, it follows from <2> and <3> that U1 ⇐ V2.
<5> By the rules for visibility, it follows from U1 ⇐ V2 that U1 is visible to V2.
<6> Since U1 is AFTER the initialization of x, U1 is the latest (in the ⇐ ordering) write
    to x that is visible to V2.
<7> By the definition of storage, it follows that V2 should read the value written by U1,
    in contradiction to the stated result.
Thus, once a processor reads a new value from a location, it must never see an old value; time
must not go backward. V2 must read 2.
5.6.2.2 Litmus Test 2 (Impossible Sequence)
Initially, location x contains 1:
        Pi                        Pj
[U1] Pi:W<8>(x,2)         [V1] Pj:W<8>(x,3)
                          [V2] Pj:R<8>(x,2)
                          [V3] Pj:R<8>(x,3)
Analysis:
<1> Since V1 precedes V2 in processor issue sequence, V1 is visible to V2.
<2> V2 reading 2 implies U1 is the latest (in ⇐ order) write to x visible to V2.
<3> From <1> and <2>, V1 ⇐ U1.
<4> Since U1 is visible to V2, and they are issued by different processors, U1 ⇐ V2.
<5> By the processor issue constraints, V2 ⇐ V3.
<6> From <4> and <5>, U1 ⇐ V3.
<7> From <6> and the visibility rules, U1 is visible to V3.
<8> Since both V1 and the initialization of x are BEFORE U1, U1 is the latest write to x
    that is visible to V3.
<9> By the definition of storage, it follows that V3 should read the value written by U1,
    in contradiction to the stated result.
Thus, once processor Pj reads a new value written by U1, any other writes that must precede
the read must also precede U1. V3 must read 2.
5.6.2.3 Litmus Test 3 (Impossible Sequence)
Initially, location x contains 1:
        Pi                        Pj                        Pk
[U1] Pi:W<8>(x,2)         [V1] Pj:W<8>(x,3)         [W1] Pk:R<8>(x,3)
[U2] Pi:R<8>(x,3)                                   [W2] Pk:R<8>(x,2)
Analysis:
<1> U2 reading 3 implies V1 is the latest write to x visible to U2, therefore U1 ⇐ V1.
<2> W1 reading 3 implies V1 is visible to W1, so V1 ⇐ W1 ⇐ W2, therefore V1 is also
    visible to W2.
<3> W2 reading 2 implies U1 is the latest write to x visible to W2, therefore V1 ⇐ U1.
<4> From <1> and <3>, U1 ⇐ V1 ⇐ U1.
Again, time cannot go backwards. If U1 is ordered before V1, then processor Pk cannot read
first the later value 3 and then the earlier value 2. Alternatively, if V1 is ordered before U1, U2
must read 2.
5.6.2.4 Litmus Test 4 (Sequence Okay)
Initially, locations x and y contain 1:
        Pi                        Pj
[U1] Pi:W<8>(x,2)         [V1] Pj:R<8>(y,2)
[U2] Pi:W<8>(y,2)         [V2] Pj:R<8>(x,1)
Analysis:
<1> V1 reading 2 implies U2 ⇐ V1, by storage and visibility.
<2> Since V2 does not read 2, there cannot be U1 ⇐ V2.
<3> By the access order constraints, it follows from <2> that V2 ⇐ U1.
There are no conflicts in the sequence. There are no violations of the definition of BEFORE.
5.6.2.5 Litmus Test 5 (Sequence Okay)
Initially, locations x and y contain 1:
        Pi                        Pj
[U1] Pi:W<8>(x,2)         [V1] Pj:R<8>(y,2)
                          [V2] Pj:MB
[U2] Pi:W<8>(y,2)         [V3] Pj:R<8>(x,1)
Analysis:
<1> V1 reading 2 implies U2 ⇐ V1, by storage and visibility.
<2> V1 ⇐ V2 ⇐ V3, by processor issue constraints.
<3> V3 reading 1 implies V3 ⇐ U1, by storage and visibility.
There is U2 ⇐ V1 ⇐ V2 ⇐ V3 ⇐ U1. There are no conflicts in this sequence. There are no
violations of the definition of BEFORE.
5.6.2.6 Litmus Test 6 (Sequence Okay)
Initially, locations x and y contain 1:
        Pi                        Pj
[U1] Pi:W<8>(x,2)         [V1] Pj:R<8>(y,2)
[U2] Pi:MB
[U3] Pi:W<8>(y,2)         [V2] Pj:R<8>(x,1)
Analysis:
<1> U1 ⇐ U2 ⇐ U3, by processor issue constraints.
<2> V1 reading 2 implies U3 ⇐ V1, by storage and visibility.
<3> V2 reading 1 implies V2 ⇐ U1, by storage and visibility.
There is V2 ⇐ U1 ⇐ U2 ⇐ U3 ⇐ V1. There are no conflicts in this sequence. There are no
violations of the definition of BEFORE.
In litmus tests 4, 5, and 6, writes to two different locations x and y are observed (by another
processor) to occur in the opposite order from that in which they were performed. An update to
y propagates quickly to Pj, but the update to x is delayed, and Pi and Pj do not both have MBs.
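The effect of the missing barriers can be explored with a small outcome enumerator. This is a deliberately simplified operational stand-in, not the architectural definition: it assumes a single shared memory, lets each processor reorder its own accesses unless an MB or a shared address separates them, and then tries every interleaving. Keeping same-address accesses in order is stricter than Table 5–1, which does not matter here because each processor touches x and y once. All names are hypothetical.

```python
from itertools import permutations


def must_keep_order(a, b):
    """Simplified processor issue constraint: MB orders everything; so does a shared address."""
    if a[0] == "MB" or b[0] == "MB":
        return True
    return a[1] == b[1]


def local_orders(prog):
    """All orders of one processor's operations that respect must_keep_order."""
    n, orders = len(prog), []
    for perm in permutations(range(n)):
        pos = {idx: p for p, idx in enumerate(perm)}
        if all(pos[i] < pos[j]
               for i in range(n) for j in range(i + 1, n)
               if must_keep_order(prog[i], prog[j])):
            orders.append([prog[k] for k in perm])
    return orders


def interleavings(a, b):
    """Yield every merge of lists a and b that preserves the order within each list."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def outcomes(prog_i, prog_j, init):
    """Set of final register assignments reachable in this simplified model."""
    seen = set()
    for oi in local_orders(prog_i):
        for oj in local_orders(prog_j):
            for trace in interleavings(oi, oj):
                mem, regs = dict(init), {}
                for op in trace:
                    if op[0] == "W":              # ("W", address, value)
                        mem[op[1]] = op[2]
                    elif op[0] == "R":            # ("R", address, register)
                        regs[op[2]] = mem[op[1]]
                seen.add(tuple(sorted(regs.items())))
    return seen


init = {"x": 1, "y": 1}
stale = (("ry", 2), ("rx", 1))                    # new y observed, old x observed
no_mb = outcomes([("W", "x", 2), ("W", "y", 2)],
                 [("R", "y", "ry"), ("R", "x", "rx")], init)
both_mb = outcomes([("W", "x", 2), ("MB",), ("W", "y", 2)],
                   [("R", "y", "ry"), ("MB",), ("R", "x", "rx")], init)
```

In this model the stale outcome is reachable without barriers and unreachable once both processors execute an MB, which matches the contrast between litmus test 4 and litmus test 7.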
5.6.2.7 Litmus Test 7 (Impossible Sequence)
Initially, locations x and y contain 1:
        Pi                        Pj
[U1] Pi:W<8>(x,2)         [V1] Pj:R<8>(y,2)
[U2] Pi:MB                [V2] Pj:MB
[U3] Pi:W<8>(y,2)         [V3] Pj:R<8>(x,1)
Analysis:
<1> V3 reading 1 implies V3 ⇐ U1, by storage and visibility.
<2> V1 reading 2 implies U3 ⇐ V1, by storage and visibility.
<3> U1 ⇐ U2 ⇐ U3, by processor issue constraints.
<4> V1 ⇐ V2 ⇐ V3, by processor issue constraints.
<5> By <2>, <3>, and <4>, U1 ⇐ U2 ⇐ U3 ⇐ V1 ⇐ V2 ⇐ V3.
<1> and <5> cannot both be true, so if V1 reads 2, then V3 must also read 2.
If both x and y are in memory-like regions, the sequence remains impossible if U2 is changed
to a WMB. Similarly, if both x and y are in non-memory-like regions, the sequence remains
impossible if U2 is changed to a WMB.
5.6.2.8 Litmus Test 8 (Impossible Sequence)
Initially, locations x and y contain 1:
        Pi                        Pj
[U1] Pi:W<8>(x,2)         [V1] Pj:W<8>(y,2)
[U2] Pi:MB                [V2] Pj:MB
[U3] Pi:R<8>(y,1)         [V3] Pj:R<8>(x,1)
Analysis:
<1> V3 reading 1 implies V3 ⇐ U1, by storage and visibility.
<2> U3 reading 1 implies U3 ⇐ V1, by storage and visibility.
<3> U1 ⇐ U2 ⇐ U3, by processor issue constraints.
<4> V1 ⇐ V2 ⇐ V3, by processor issue constraints.
<5> By <2>, <3>, and <4>, U1 ⇐ U2 ⇐ U3 ⇐ V1 ⇐ V2 ⇐ V3.
<1> and <5> cannot both be true, so if U3 reads 1, then V3 must read 2, and vice versa.
5.6.2.9 Litmus Test 9 (Impossible Sequence)
Initially, location x contains 1:
        Pi                        Pj
[U1] Pi:W<8>(x,2)         [V1] Pj:W<8>(x,3)
[U2] Pi:R<8>(x,2)         [V2] Pj:R<8>(x,3)
[U3] Pi:R<8>(x,3)         [V3] Pj:R<8>(x,2)
Analysis:
<1> V3 reading 2 implies U1 is the latest write to x visible to V3, therefore V1 ⇐ U1.
<2> U3 reading 3 implies V1 is the latest write to x visible to U3, therefore U1 ⇐ V1.
<1> and <2> cannot both be true. Time cannot go backwards. If V3 reads 2, then U3 must read
2. Alternatively, if U3 reads 3, then V3 must read 3.
5.6.2.10 Litmus Test 10 (Sequence Okay)
For an aligned quadword location, x, initially 100000001₁₆:
        Pi                                Pj
[U1] Pi:W<4>(x,2)                 [V1] Pj:W<4>(x+4,2)
[U2] Pi:R<8>(x,100000002₁₆)       [V2] Pj:R<8>(x,200000001₁₆)
Analysis:
<1> Since U2 reads 1 from x+4, V1 is not visible to U2. Thus U2 ⇐ V1.
<2> Similarly, V2 ⇐ U1.
<3> U1 is visible to U2, but since they are issued by the same processor, it is not
    necessarily the case that U1 ⇐ U2.
<4> Similarly, it is not necessarily the case that V1 ⇐ V2.
There is no ordering cycle, so the sequence is permitted.
5.6.2.11 Litmus Test 11 (Impossible Sequence)
For an aligned quadword location, x, initially 100000001₁₆:
        Pi                        Pj
[U1] Pi:W<4>(x,2)         [V1] Pj:R<8>(x,200000001₁₆)
[U2] Pi:MB or WMB
[U3] Pi:W<4>(x+4,2)
Analysis:
<1> V1 reading 200000001₁₆ implies U3 ⇐ V1 ⇐ U1 by storage and visibility.
<2> U1 ⇐ U2 ⇐ U3, by processor issue constraints.
<1> and <2> cannot both be true.
5.6.3 Implied Barriers
There are no implied barriers in Alpha. If an implied barrier is needed for functionally correct
access to shared data, it must be written as an explicit instruction. (Software must explicitly
include any needed MB, WMB, or CALL_PAL IMB instructions.)
Alpha transitions such as the following have no built-in implied memory barriers:
• Entry to PALcode
• Sending and receiving interrupts
• Returning from exceptions, interrupts, or machine checks
• Swapping context
• Invalidating the Translation Buffer (TB)
Depending on implementation choices for maintaining cache coherency, some PALcode/cache
implementations may have an implied CALL_PAL IMB in the I-stream TB fill routine, but
this is transparent to the non-PALcode programmer.
5.6.4 Implications for Software
Software must explicitly include MB, WMB, or CALL_PAL IMB instructions according to the
following circumstances.
5.6.4.1 Single Processor Data Stream
No barriers are ever needed. A read to physical address x will always return the value written
by the immediately preceding write to x in the processor issue sequence.
5.6.4.2 Single Processor Instruction Stream
An I-fetch from virtual or physical address x does not necessarily return the value written by
the immediately preceding write to x in the issue sequence. To make the I-fetch reliably get the
newly written instruction, a CALL_PAL IMB is needed between the write and the I-fetch.
5.6.4.3 Multiprocessor Data Stream (Including Single Processor with DMA I/O)
Generally, the only way to reliably communicate shared data is to write the shared data on one
processor or DMA I/O device, execute an MB (or the logical equivalent¹ if it is a DMA I/O
device), then write a flag (equivalently, send an interrupt) signaling the other processor that the
shared data is ready. Each receiving processor must read the new flag (equivalently, receive the
interrupt), execute an MB, then read or update the shared data. In the special case in which data
is communicated through just one location in memory, memory barriers are not necessary.
¹ In this context, the logical equivalent of an MB for a DMA device is whatever is necessary under the
applicable I/O subsystem architecture to ensure that preceding writes will be BEFORE (see Section
5.6.1.2) the subsequent write of a flag or transmission of an interrupt. Not all I/O devices behave
exactly as required by the Alpha architecture. To interoperate properly with those devices, some special
action might be required by the program executing on the CPU. For example, PCI bus devices
require that after the CPU has received an interrupt, the CPU must read a CSR location on the PCI
device, execute an MB, then read or update the shared data. From the perspective of the Alpha
architecture, this CSR read can be regarded as a necessary assist to help the DMA I/O device complete
its logical equivalent of an MB.
Software Note:
Note that this section does not describe how to reliably communicate data from a processor
to a DMA device. See Section 5.6.4.7.
Leaving out the first MB removes the assurance that the shared data is written before the flag is
written.
Leaving out the second MB removes the assurance that the shared data is read or updated only
after the flag is seen to change; in this case, an early read could see an old value, and an early
update could be overwritten.
This implies that after a DMA I/O device has written some data to memory (such as paging in
a page from disk), the DMA device must logically execute an MB¹ before posting a completion
interrupt, and the interrupt handler software must execute an MB before the data is
guaranteed to be visible to the interrupted processor. Other processors must also execute MBs
before they are guaranteed to see the new data.
An important special case occurs when a write is done (perhaps by an I/O device) to some
physical page frame, then an MB is executed, and then a previously invalid PTE is changed to
be a valid mapping of the physical page frame that was just written. In this case, all processors
that access virtual memory by using the newly valid PTE must guarantee to deliver the newly
written data after the TB miss, for both I-stream and D-stream accesses.
5.6.4.4 Multiprocessor Instruction Stream (Including Single Processor with DMA I/O)
The only way to update the I-stream reliably is to write the shared I-stream on one processor or
DMA I/O device, then execute a CALL_PAL IMB (or an MB if the processor is not going to
execute the new I-stream, or the logical equivalent of an MB if it is a DMA I/O device), then
write a flag (equivalently, send an interrupt) signaling the other processor that the shared
I-stream is ready. Each receiving processor must read the new flag (equivalently, receive the
interrupt), execute a CALL_PAL IMB, then fetch the shared I-stream.
Software Note:
Note that this section does not describe how to reliably communicate I-stream from a
processor to a DMA device. See Section 5.6.4.7.
Leaving out the first CALL_PAL IMB (or MB) removes the assurance that the shared I-stream
is written before the flag.
Leaving out the second CALL_PAL IMB removes the assurance that the shared I-stream is
read only after the flag is seen to change; in this case, an early read could see an old value.
¹ See footnote 1 in Section 5.6.4.3.
This implies that after a DMA I/O device has written some I-stream to memory (such as paging
in a page from disk), the DMA device must logically execute an MB¹ before posting a
completion interrupt, and the interrupt handler software must execute a CALL_PAL IMB
before the I-stream is guaranteed to be visible to the interrupted processor. Other processors
must also execute CALL_PAL IMB instructions before they are guaranteed to see the new
I-stream.
An important special case occurs under the following circumstances:
1. A write (perhaps by an I/O device) is done to some physical page frame.
2. A CALL_PAL IMB (or MB) is executed.
3. A previously invalid PTE is changed to be a valid mapping of the physical page frame that
   was written in step 1.
In this case, all processors that access virtual memory by using the newly valid PTE must
guarantee to deliver the newly written I-stream after the TB miss.
5.6.4.5 Multiprocessor Context Switch
If a process migrates from executing on one processor to executing on another, the context
switch operating system code must include a number of barriers.
A process migrates by having its context stored into memory, then eventually having that context
reloaded on another processor. In between, some shared mechanism must be used to
communicate that the context saved in memory by the first processor is available to the second
processor. This could be done by using an interrupt, by using a flag bit associated with the
saved context, or by using a shared-memory multiprocessor data structure, as follows:
First Processor                          Second Processor
 :
Save state of current process.
MB [1]
Pass ownership of process context   ⇒    Pick up ownership of process context data
data structure memory.                   structure memory.
                                         MB [2]
                                         Restore state of new process context data
                                         structure memory.
                                         Make I-stream coherent [3].
                                         Make TB coherent [4].
                                          :
                                         Execute code for new process that accesses
                                         memory that is not common to all processes.
MB [1] ensures that the writes done to save the state of the current process happen before
the ownership is passed.
MB [2] ensures that the reads done to load the state of the new process happen after the
ownership is picked up and hence are reliably the values written by the processor saving
the old state. Leaving this MB out makes the code fail if an old value of the context
remains in the second processor's cache and invalidates from the writes done on the first
processor are not delivered soon enough.
The TB on the second processor must be made coherent with any write to the page tables
that may have occurred on the first processor just before the save of the process state. This
must be done with a series of TB invalidate instructions to remove any nonglobal page
mapping for this process, or by assigning an ASN that is unused on the second processor to
the process. One of these actions must occur sometime before starting execution of the
code for the new process that accesses memory (instruction or data) that is not common to
all processes. A common method is to assign a new ASN after gaining ownership of the
new process and before loading its context, which includes its ASN.
The D-cache on the second processor must be made coherent with any write to the
D-stream that may have occurred on the first processor just before the save of process
state. This is ensured by MB [2] and does not require any additional instructions.
The I-cache on the second processor must be made coherent with any write to the I-stream
that may have occurred on the first processor just before the save of process state. This can
be done with a CALL_PAL IMB sometime before the execution of any code that is not
common to all processes. More commonly, this can be done by forcing a TB miss (via the
new ASN or via TB invalidate instructions) and using the TB-fill rule (see Section 5.6.4.3).
This latter approach does not require any additional instruction.
Combining all these considerations gives the following, where, on a single processor, there is
no need for the barriers:
First Processor                          Second Processor
 :                                        :
Pick up ownership of process context
data structure memory.
MB
Assign new ASN or invalidate TBs.
Save state of current process.
Restore state of new process.
MB
Pass ownership of process context   ⇒    Pick up ownership of new process context data
data structure memory.                   structure memory.
 :                                       MB
                                         Assign new ASN or invalidate TBs.
                                         Save state of current process.
                                         Restore state of new process.
                                         MB
                                         Pass ownership of old process context data
                                         structure memory.
                                          :
                                         Execute code for new process that accesses
                                         memory that is not common to all processes.
5.6.4.6 Multiprocessor Send/Receive Interrupt
If one processor writes some shared data, then sends an interrupt to a second processor, and
that processor receives the interrupt, then accesses the shared data, the sequence from Section
5.6.4.3 must be used:
First Processor                          Second Processor
 :
Write data
MB
Send interrupt                      ⇒    Receive interrupt
                                         MB
                                         Access data
                                          :
Leaving out the MB at the beginning of the interrupt-receipt routine causes the code to fail if
an old value of the context remains in the second processor's cache, and invalidates from the
writes done on the first processor are not delivered soon enough.
5.6.4.7 Implications for Memory Mapped I/O
Sections 5.6.4.3 and 5.6.4.4 describe methods for communicating data from a processor or
DMA I/O device to another processor that work reliably in all Alpha systems. Special
considerations apply to the communication of data or I-stream from a processor to a DMA I/O
device. These considerations arise from the use of bridges to connect to I/O buses with devices
that are accessible by memory accesses to non-memory-like regions of physical memory.
The following communication method works in all Alpha systems.
To reliably communicate shared data from a processor to an I/O device:
1. Write the shared data to a memory-like physical memory region on the processor.
2. Execute an MB instruction.
3. Write a flag (equivalently, send an interrupt or write a register location implemented in
   the I/O device).
The receiving I/O device must:
1. Read the flag (equivalently, detect the interrupt or detect the write to the register location
   implemented in the I/O device).
2. Execute the equivalent of an MB.¹
3. Read the shared data.
As shown in Section 5.6.4.3, leaving out the memory barrier removes the assurance that the
shared data is written before the flag is. Unlike the case in Section 5.6.4.3, writing the shared
data to a non-memory-like physical memory region removes the assurance that the I/O device
will detect the writes of the shared data before detecting the flag write, interrupt, or device
register write.
¹ In this context, the logical equivalent of an MB for a DMA device is whatever is necessary under the
applicable I/O subsystem architecture to ensure that preceding writes will be BEFORE (see Section
5.6.1.2) the subsequent reads of shared data. Typically, this action is defined to be present between
every read and write access done by the I/O device, according to the applicable I/O subsystem
architecture.
This implies that after a processor has prepared a data buffer to be read from memory by a
DMA I/ O device (such as writing a buffer to disk), the processor must execute an MB before
starting the I/ O. The I/ O device, after receiving the start signal, must logically execute an MB
before reading the data buffer, and the buffer must be located in a memory-like physical mem-ory
region.
There are methods of communicating data that may work in some systems but are not guaran-teed
in all systems. Two notable examples are:
1. If an Alpha processor writes a location implemented in a component located on an I/O
bus in the system, then executes a memory barrier, then writes a flag in some memory
location (in a memory-like or non-memory-like region), a device on the I/O bus may be
able to detect (via read access) the result of the flag write in memory and the write of
the location on the I/O bus out of order (that is, in a different order than the order in
which the Alpha processor wrote those locations).
2. If an Alpha processor writes a location that is a control register within an I/O device,
then executes a memory barrier, then writes a location in memory (in a memory-like or
non-memory-like region), the I/O device may be able to detect (via read access) the
result of the memory write before receiving and responding to the write of its own
control register.
In almost every case, a mechanism that ensures the completion of writes to control register
locations within I/O devices is provided. The normal and strongly recommended mechanism is
to read a location after writing it, which guarantees that the write is complete. In any case, all
systems that use a particular I/O device should provide the same mechanism for that device.
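The read-after-write mechanism can be illustrated with a toy model. The sketch below is not a description of any real I/O subsystem: the class PostedWriteDevice, its methods, and the rule that a read is ordered behind every posted write are all assumptions made for illustration.

```python
from collections import deque


class PostedWriteDevice:
    """Toy device whose register writes are posted (queued), not applied at once."""

    def __init__(self):
        self.registers = {}       # state the device has actually accepted
        self._posted = deque()    # writes still in flight toward the device

    def write(self, reg, value):
        # The write has left the processor but has not reached the device yet.
        self._posted.append((reg, value))

    def read(self, reg):
        # Assumption of this model: a read is ordered behind every posted
        # write, so all queued writes are applied before the read is answered.
        while self._posted:
            r, v = self._posted.popleft()
            self.registers[r] = v
        return self.registers.get(reg)


def write_and_confirm(dev, reg, value):
    """Write a register, then read it back to know the write has completed."""
    dev.write(reg, value)
    return dev.read(reg)
```

In this model a plain write leaves the device state unchanged until something forces the queue to drain, whereas write_and_confirm returns only after the device holds the new value.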
5.6.4.8 Multiple Processors Writing to a Single I/O Device
Generally, for multiple processors to cooperate in writing to a single I/O device, the first
processor must write to the device, execute an MB, then notify other processors. Another
processor that intends to write the same I/O device after the first processor must receive the
notification, execute an MB, and then write to the I/O device. For example:
First Processor             Second Processor
:
Write CSR_A
MB
Write flag (in memory)  ⇒   Read flag (in memory)
                            MB
                            Write CSR_B
                            :
The MB on the first processor guarantees that the write to CSR_A precedes the write to flag in
memory, as perceived on other processors. (The MB does not guarantee that the write to
CSR_A has completed. See Section 5.6.4.7 for a discussion of how a processor can guarantee
that a write to an I/O device has completed at that device.) The MB on the second processor
guarantees that the write to CSR_B will reach the I/O device after the write to CSR_A.
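The ordering role of the MB on the first processor can be checked with a small enumeration. This is a deliberately simplified model, not the architecture's formal ordering rules: it assumes that writes to different locations issued between two barriers may become visible in any order and that no write crosses an MB. The function names are hypothetical.

```python
from itertools import permutations


def visible_orders(program):
    """List every order in which a processor's writes may become visible.

    program is a list of write names, with the string "MB" marking a barrier.
    Toy rule: writes between barriers reorder freely; none crosses an MB.
    """
    groups, current = [], []
    for op in program:
        if op == "MB":
            groups.append(current)
            current = []
        else:
            current.append(op)
    groups.append(current)

    orders = [[]]
    for group in groups:
        orders = [done + list(p) for done in orders for p in permutations(group)]
    return orders


def always_before(program, first, second):
    """True if 'first' becomes visible before 'second' in every allowed order."""
    return all(o.index(first) < o.index(second) for o in visible_orders(program))
```

Under these assumptions, always_before(["CSR_A", "MB", "flag"], "CSR_A", "flag") holds, while the same program without the MB admits an order in which the flag is seen first. The model covers only the first processor's writes; the second processor's MB plays the matching role between its flag read and its write to CSR_B.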
5.6.5 Implications for Hardware
The coherency point for physical address x is the place in the memory subsystem at which
accesses to x are ordered. It may be at a main memory board, or at a cache containing x
exclusively, or at the point of winning a common bus arbitration.
The coherency point for x may move with time, as exclusive access to x migrates between
main memory and various caches.
MB and CALL_PAL IMB force all preceding writes to at least reach their respective coherency
points. This does not mean that main-memory writes have been done, just that the order
of the eventual writes is committed. For example, on the XMI with retry, this means getting the
writes acknowledged as received with good parity at the inputs to memory board queues; the
actual RAM write happens later.
MB and CALL_PAL IMB also force all queued cache invalidates to be delivered to the local
caches before starting any subsequent reads (that may otherwise cache hit on stale data) or
writes (that may otherwise write the cache, only to have the write effectively overwritten by a
late-delivered invalidate).
WMB ensures that the final order of writes to memory-like regions is committed and that the
final order of writes to non-memory-like regions is committed. This does not imply that the
final order of writes to memory-like regions relative to writes to non-memory-like regions is
committed. It also prevents writes that precede the WMB from merging with writes that follow
the WMB. For example, an implementation with a write buffer might implement WMB by
closing all valid write buffer entries from further merging and then draining the write buffer
entries in order.
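The write-buffer example can be sketched as a toy model. Everything here is an illustrative assumption rather than a description of a real implementation: the class name, the 32-byte entry size, and the choice to keep drain as a separate step so that the effect of closing entries is observable.

```python
class WriteBuffer:
    """Toy write buffer: one entry per aligned block; open entries may merge."""

    BLOCK = 32  # bytes covered by one entry (illustrative, not architectural)

    def __init__(self):
        self.entries = []      # oldest first: {"block", "data", "open"}
        self.memory_log = []   # one dict per write actually sent to memory

    def write(self, addr, value):
        block = addr // self.BLOCK
        for entry in self.entries:
            if entry["open"] and entry["block"] == block:
                entry["data"][addr] = value   # merge into an open entry
                return
        self.entries.append({"block": block, "data": {addr: value}, "open": True})

    def wmb(self):
        # Close every valid entry: later writes must allocate new entries.
        for entry in self.entries:
            entry["open"] = False

    def drain(self):
        # Send entries to memory in order, oldest first.
        for entry in self.entries:
            self.memory_log.append(dict(entry["data"]))
        self.entries.clear()
```

In this model, two writes to the same block with no WMB between them drain as one merged memory write, while the same two writes separated by wmb() drain as two writes in program order.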
Implementations may allow reads of x to hit (by physical address) on pending writes in a write
buffer, even before the writes to x reach the coherency point for x. If this is done, it is still true
that no earlier value of x may subsequently be delivered to the processor that took the hit on the
write buffer value.
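A minimal sketch of a read hitting on a pending write, assuming a simple list-based buffer in front of a dictionary that stands in for memory (both are hypothetical modeling choices):

```python
class ForwardingBuffer:
    """Toy write buffer that lets loads hit on pending stores."""

    def __init__(self, memory):
        self.memory = memory   # dict standing in for the coherency point
        self.pending = []      # (addr, value) pairs, oldest first

    def store(self, addr, value):
        self.pending.append((addr, value))

    def load(self, addr):
        # The youngest pending write to addr wins over the value in memory.
        for pending_addr, value in reversed(self.pending):
            if pending_addr == addr:
                return value
        return self.memory[addr]

    def drain_one(self):
        # Retire the oldest pending write to memory.
        addr, value = self.pending.pop(0)
        self.memory[addr] = value
```

Once a load has returned the buffered value, later loads in this model return that value or a newer one, both before and after the write drains; the older value in memory is never delivered again.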
Virtual data caches are allowed to deliver data before doing address translation, but only if
there cannot be a pending write under a synonym virtual address. Lack of a write-buffer match
on untranslated address bits is sufficient to guarantee this.
Virtual data caches must invalidate or otherwise become coherent with the new value whenever
a PALcode routine is executed that affects the validity, fault behavior, protection
behavior, or virtual-to-physical mapping specified for one or more pages. Becoming coherent
can be delayed until the next subsequent MB instruction or TB fill (using the new mapping) if
the implementation of the PALcode routine always forces a subsequent TB fill.
5.7 Arithmetic Traps
Alpha implementations are allowed to execute multiple instructions concurrently and to forward
results from one instruction to another. Thus, when an arithmetic trap is detected, the PC
may have advanced an arbitrarily large number of instructions past the instruction T (calculating
result R) whose execution triggered the trap.
When the trap is detected, any or all of these subsequent instructions may run to completion
before the trap is actually taken. The set of instructions subsequent to T that complete before
the trap is taken are collectively called the trap shadow of T. The PC pushed on the stack when
the trap is taken is the PC of the first instruction past the trap shadow.
The instructions in the trap shadow of T may use the UNPREDICTABLE result R of T, they
may generate additional traps, and they may completely change the PC (branches, JSR).
Thus, by the time a trap is taken, the PC pushed on the stack may bear no useful relationship to
the PC of the trigger instruction T, and the state visible to the programmer may have been
updated using the UNPREDICTABLE result R. If an instruction in the trap shadow of T uses
R to calculate a subsequent register value, that register value is UNPREDICTABLE, even
though there may be no trap associated with the subsequent calculation. Similarly:
• If an instruction in the trap shadow of T stores R or any subsequent UNPREDICTABLE
result, the stored value is UNPREDICTABLE.
• If an instruction in the trap shadow of T uses R or any subsequent UNPREDICTABLE
result as the basis of a conditional or calculated branch, the branch target is
UNPREDICTABLE.
• If an instruction in the trap shadow of T uses R or any subsequent UNPREDICTABLE
result as the basis of an address calculation, the memory address actually accessed is
UNPREDICTABLE.
Software that follows the rules in Section 4.7.7.3 can reliably bound how far the PC may
advance before a trap is taken and how far an UNPREDICTABLE result may propagate, or can
continue from a trap by supplying a well-defined result R within an arithmetic trap handler.
For arithmetic instructions that do not use the /S exception completion qualifier, software can
reliably produce that behavior by inserting TRAPB instructions at appropriate points.
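The propagation rules and the bounding effect of TRAPB can be traced with a small worst-case model. The instruction encoding below (the Instr tuple and its op names) is invented for illustration. The model assumes that every instruction after T, up to the next TRAPB, falls in the trap shadow, and that an instruction reading no UNPREDICTABLE source produces a well-defined result.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    op: str                  # "alu", "store", "branch", or "trapb"
    dest: Optional[str]      # register written, if any
    srcs: Tuple[str, ...]    # registers read (value and address registers)


def shadow_effects(program, t_index):
    """Worst-case effects of the trap shadow of program[t_index].

    Returns (registers left UNPREDICTABLE, indices of stores with an
    UNPREDICTABLE value or address, indices of UNPREDICTABLE branches).
    """
    tainted = {program[t_index].dest}          # R itself is UNPREDICTABLE
    bad_stores, bad_branches = [], []
    for i in range(t_index + 1, len(program)):
        ins = program[i]
        if ins.op == "trapb":
            break                              # nothing past TRAPB is in the shadow
        uses_bad = any(src in tainted for src in ins.srcs)
        if ins.op == "store" and uses_bad:
            bad_stores.append(i)
        elif ins.op == "branch" and uses_bad:
            bad_branches.append(i)
        elif ins.op == "alu":
            if uses_bad:
                tainted.add(ins.dest)          # result derived from R
            else:
                tainted.discard(ins.dest)      # overwritten with a defined value
    return tainted, bad_stores, bad_branches
```

Placing a TRAPB directly after a group of dependent arithmetic instructions keeps later consumers of their results out of the shadow in this model; removing it lets the UNPREDICTABLE value spread to every dependent register, store, and branch that follows.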
Chapter 6
Common PALcode Architecture
6.1 PALcode
In a family of machines, both users and operating system developers require functions to be
implemented consistently. When functions conform to a common interface, the code that uses
those functions can be used on several different implementations without modification.
These functions range from the binary encoding of the instruction and data to the exception
mechanisms and synchronization primitives. Some of these functions can be implemented cost
effectively in hardware, but others are impractical to implement directly in hardware. These
functions include low-level hardware support functions such as Translation Buffer miss fill
routines, interrupt acknowledge, and vector dispatch. They also include support for privileged
and atomic operations that require long instruction sequences.
In the VAX, these functions are generally provided by microcode. This is not seen as a problem
because the VAX architecture lends itself to a microcoded implementation.
One of the goals of Alpha architecture is to implement functions consistently without microcode.
However, it is still desirable to provide an architected interface to these functions that
will be consistent across the entire family of machines. The Privileged Architecture Library
(PALcode) provides a mechanism to implement these functions without microcode.
6.2 PALcode Instructions and Functions
PALcode is used to implement the following functions:
• Instructions that require complex sequencing as an atomic operation
• Instructions that require VAX style interlocked memory access
• Privileged instructions
• Memory management control, including translation buffer (TB) management
• Context swapping
• Interrupt and exception dispatching
• Power-up initialization and booting
• Console functions
• Emulation of instructions with no hardware support
The Alpha architecture lets these functions be implemented in standard machine code that is
resident in main memory. PALcode is written in standard machine code with some
implementation-specific extensions to provide access to low-level hardware. This lets an Alpha
implementation make various design trade-offs based on the hardware technology being used
to implement the machine. The PALcode can abstract these differences and make them invisible
to system software.
For example, in a MOS VLSI implementation, a small (32-entry) fully associative TB can be
the right match to the media, given that chip area is a costly resource. In an ECL version, a
large (1024-entry) direct-mapped TB can be used because it will use RAM chips and does not
have fast associative memories available. This difference would be handled by
implementation-specific versions of the PALcode on the two systems, both versions providing transparent
TB miss service routines. The operating system code would not need to know there were any
differences.
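The TB example can be sketched as two interchangeable structures behind one miss-service routine. The class names, the FIFO replacement policy, and the use of a dictionary as the page table are assumptions made for illustration; real TB organizations and fill routines are implementation specific.

```python
class FullyAssociativeTB:
    """Small TB: any slot can hold any page; the oldest mapping is replaced."""

    def __init__(self, entries=32):
        self.entries = entries
        self.slots = {}                      # vpn -> pfn, in insertion order

    def lookup(self, vpn):
        return self.slots.get(vpn)

    def fill(self, vpn, pfn):
        if len(self.slots) >= self.entries:
            self.slots.pop(next(iter(self.slots)))   # evict the oldest mapping
        self.slots[vpn] = pfn


class DirectMappedTB:
    """Large TB: each page can live in exactly one slot, chosen by vpn."""

    def __init__(self, entries=1024):
        self.entries = entries
        self.slots = [None] * entries        # each slot: (vpn, pfn) or None

    def lookup(self, vpn):
        slot = self.slots[vpn % self.entries]
        return slot[1] if slot and slot[0] == vpn else None

    def fill(self, vpn, pfn):
        self.slots[vpn % self.entries] = (vpn, pfn)


def translate(tb, page_table, vpn):
    """What system software observes: a translation, whatever the TB design."""
    pfn = tb.lookup(vpn)
    if pfn is None:                          # TB miss: the fill routine runs
        pfn = page_table[vpn]                # page faults are not modeled
        tb.fill(vpn, pfn)
    return pfn
```

Calling translate with either TB object yields the same physical page numbers for the same page table; only the hit and miss pattern differs, which is the difference the text says PALcode hides from the operating system.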
An Alpha Privileged Architecture Library (PALcode) of routines and environments is supplied
by Compaq. Other systems may use a library supplied by Compaq or architect and implement a
different library of routines. Alpha systems are required to support the replacement of PALcode
defined by Compaq with an operating system-specific version.
6.3 PALcode Environment
The PALcode environment differs from the normal environment in the following ways:
• PALcode has complete control of the machine state.
• Interrupts are disabled.
• Implementation-specific hardware functions are enabled, as described below.
• I-stream memory management traps are prevented (by disabling I-stream mapping,
mapping PALcode with a permanent TB entry, or by other mechanisms).
Complete control of the machine state allows all functions of the machine to be controlled.
Disabling interrupts allows the system to provide multi-instruction sequences as atomic
operations. Enabling implementation-specific hardware functions allows access to low-level system
hardware. Preventing I-stream memory management traps allows PALcode to implement
memory management functions such as translation buffer fill.
6.4 Special Functions Required for PALcode
PALcode uses the Alpha instruction set for most of its operations. A small number of additional
functions are needed to implement the PALcode. Five opcodes are reserved to
implement PALcode functions: PAL19, PAL1B, PAL1D, PAL1E, and PAL1F. These instructions
produce a trap if executed outside the PALcode environment.
• PALcode needs a mechanism to save the current state of the machine and dispatch into
PALcode.
• PALcode needs a set of instructions to access hardware control registers.
• PALcode needs a hardware mechanism to transition the machine from the PALcode
environment to the non-PALcode environment. This mechanism loads the PC, enables
interrupts, enables mapping, and disables PALcode privileges.
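The entry and exit transitions listed above can be summarized as a small state model. The Machine fields and function names are hypothetical, and disabling I-stream mapping is only one of the ways the text mentions for preventing I-stream memory management traps.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    pc: int = 0
    pal_mode: bool = False            # PALcode privileges and hardware functions
    interrupts_enabled: bool = True
    istream_mapping: bool = True
    saved_pc: int = 0


def enter_palcode(m, entry_pc):
    """Save enough state to resume, then dispatch into PALcode."""
    m.saved_pc = m.pc
    m.pc = entry_pc
    m.pal_mode = True
    m.interrupts_enabled = False      # multi-instruction sequences become atomic
    m.istream_mapping = False         # no I-stream memory management traps


def exit_palcode(m, return_pc):
    """Return to the non-PALcode environment."""
    m.pc = return_pc
    m.interrupts_enabled = True
    m.istream_mapping = True
    m.pal_mode = False
```

A call such as enter_palcode(m, 0x8000) followed by exit_palcode(m, m.saved_pc) leaves the modeled machine back at its original PC with interrupts and mapping re-enabled; the addresses are arbitrary.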
An Alpha implementation may also choose to provide additional functions to simplify or
improve performance of some PALcode functions. The following are some examples:
• An Alpha implementation may include a read/write virtual function that allows PALcode
to perform mapped memory accesses using the mapping hardware rather than providing
the virtual-to-physical translation in PALcode routines. PALcode may provide a
special function to do physical reads and writes and have the Alpha loads and stores
continue to operate on virtual addresses in the PALcode environment.
• An Alpha implementation may include hardware assists for various functions, such as
saving the virtual address of a reference on a memory management error rather than
having to generate it by simulating the effective address calculation in PALcode.
• An Alpha implementation may include private registers so it can function without
having to save and restore the native general registers.
6.5 PALcode Effects on System Code
PALcode will have one effect on system code. Because PALcode may reside in main memory
and maintain privileged data structures in main memory, the operating system code that
allocates physical memory cannot use all of physical memory.
The amount of memory PALcode requires is small, so the loss to the system is negligible.
6.6 PALcode Replacement
Alpha systems are required to support the replacement of PALcode supplied by Compaq with
an operating system-specific version. The following functions must be implemented in PALcode,
not directly in hardware, to facilitate replacement with different versions.
• Translation Buffer fill. Different operating systems will want to replace the Translation
Buffer (TB) fill routines. The replacement routines will use different data structures.
Page tables will not be present in these systems. Therefore, no portion of the TB fill
flow that would change with a change in page tables may be placed in hardware, unless
it is placed in a manner that can be overridden by PALcode.
• Process structure. Different operating systems might want to replace the process
context switch routines. The replacement routines will use different data structures. The
HWPCB or PCB will not be present in these systems. Therefore, no portion of the context
switching flows that would change with a change in process structure may be
placed in hardware.
PALcode can be viewed as consisting of the following somewhat intertwined components:
• Chip/architecture component
• Hardware platform component
• Operating system component
PALcode should be written modularly to facilitate the easy replacement or conditional building
of each component. Such a practice simplifies the integration of CPU hardware, system
platform hardware, console firmware, operating system software, and compilers.
PALcode subsections that are commonly subject to modification include:
• Translation Buffer fill
• Process structure and context switch
• Interrupt and exception frame format and routine dispatch
• Privileged PALcode instructions
• Transitions to and from console I/O mode
• Power-up reset
6.7 Required PALcode Instructions
The PALcode instructions listed in Table 6–1 and Section C.11 must be recognized by
mnemonic and opcode in all operating system implementations, but the effect of each instruction is
dependent on the implementation. Compaq defines the operation of these PALcode instructions
for operating system implementations supplied by Compaq.
Table 6–1: PALcode Instructions that Require Recognition
Mnemonic    Name
BPT         Breakpoint trap
BUGCHK      Bugcheck trap
CSERVE      Console service
GENTRAP     Generate trap
RDUNIQUE    Read unique value
SWPPAL      Swap PALcode
WRUNIQUE    Write unique value
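As an illustration of the recognition requirement, a replacement PALcode build could be checked against the mnemonics in Table 6–1. The sketch below is hypothetical tooling, not part of any Compaq-supplied software. It covers only the mnemonics in the table (not those in Section C.11) and does not check opcodes.

```python
# Mnemonics taken from Table 6-1; the effect of each is implementation dependent.
REQUIRED_PAL_MNEMONICS = frozenset(
    {"BPT", "BUGCHK", "CSERVE", "GENTRAP", "RDUNIQUE", "SWPPAL", "WRUNIQUE"}
)


def missing_required(handlers):
    """Return the Table 6-1 mnemonics that a PALcode image does not recognize.

    handlers maps each mnemonic the image recognizes to its
    operating-system-specific routine (any object; it is not called here).
    """
    return sorted(REQUIRED_PAL_MNEMONICS - set(handlers))
```

An empty result means every mnemonic in the table has some handler; what each handler does remains up to the operating system implementation.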