|
| 1 | +\C{64bit} Writing 64-bit Code (Unix, Win64) |
| 2 | + |
| 3 | +This chapter attempts to cover some of the common issues involved when |
| 4 | +writing 64-bit code, to run under \i{Win64} or Unix. It covers how to |
| 5 | +write assembly code to interface with 64-bit C routines, and how to |
| 6 | +write position-independent code for shared libraries. |
| 7 | + |
| 8 | +All 64-bit code uses a flat memory model, since segmentation is not |
| 9 | +available in 64-bit mode. The one exception is the \c{FS} and \c{GS} |
| 10 | +registers, which still add their bases. |
| 11 | + |
| 12 | +Position independence in 64-bit mode is significantly simpler, since |
| 13 | +the processor supports \c{RIP}-relative addressing directly; see the |
| 14 | +\c{REL} keyword (\k{effaddr}). On most 64-bit platforms, it is |
| 15 | +probably desirable to make that the default, using the directive |
| 16 | +\c{DEFAULT REL} (\k{default}). |
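| | + |
| | +As a minimal sketch of what this directive changes (the label |
| | +\c{counter} is hypothetical and exists only for illustration), the |
| | +following fragment shows the default and the two explicit overrides: |
| | + |
| | +\c         default rel |
| | +\c |
| | +\c         section .data |
| | +\c counter: dq 0 |
| | +\c |
| | +\c         section .text |
| | +\c         mov rcx,[counter]       ; RIP-relative, from DEFAULT REL |
| | +\c         mov rcx,[rel counter]   ; RIP-relative, stated explicitly |
| | +\c         mov rcx,[abs counter]   ; absolute, overriding the default |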
| 17 | + |
| 18 | +64-bit programming is relatively similar to 32-bit programming, but |
| 19 | +of course pointers are 64 bits long; additionally, all existing |
| 20 | +platforms pass arguments in registers rather than on the stack. |
| 21 | +Furthermore, 64-bit platforms use SSE2 by default for floating point. |
| 22 | +Please see the ABI documentation for your platform. |
| 23 | + |
| 24 | +64-bit platforms differ in the sizes of the C/C++ fundamental |
| 25 | +datatypes, not just from 32-bit platforms but from each other. If a |
| 26 | +specific size data type is desired, it is probably best to use the |
| 27 | +types defined in the standard C header \c{<inttypes.h>}. |
| 28 | + |
| 29 | +All known 64-bit platforms except some embedded platforms require that |
| 30 | +the stack pointer (\c{RSP}) is 16-byte aligned immediately before a |
| 31 | +\c{CALL} instruction. Since \c{CALL} pushes an 8-byte return address, |
| 32 | +\c{RSP} is an \e{odd} multiple of 8 bytes at the entry to a function. |
| 33 | + |
| 34 | +In 64-bit mode, the default operand size is still 32 bits. When |
| 35 | +loading a value into a 32-bit register (but not an 8- or 16-bit |
| 36 | +register), the upper 32 bits of the corresponding 64-bit register are |
| 37 | +set to zero. |
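| | + |
| | +The following sketch illustrates the rule; the register contents |
| | +given in the comments are what the rule implies after each |
| | +instruction: |
| | + |
| | +\c         mov rax,-1              ; RAX = 0xFFFFFFFFFFFFFFFF |
| | +\c         mov eax,1               ; RAX = 0x0000000000000001 |
| | +\c         mov rax,-1              ; RAX = 0xFFFFFFFFFFFFFFFF |
| | +\c         mov ax,1                ; RAX = 0xFFFFFFFFFFFF0001 |
| | +\c         mov al,2                ; RAX = 0xFFFFFFFFFFFF0002 |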
| 38 | + |
| 39 | +\H{reg64} Register Names in 64-bit Mode |
| 40 | + |
| 41 | +NASM uses the following names for general-purpose registers in 64-bit |
| 42 | +mode, for 8-, 16-, 32- and 64-bit references, respectively: |
| 43 | + |
| 44 | +\c AL/AH, CL/CH, DL/DH, BL/BH, SPL, BPL, SIL, DIL, R8B-R15B |
| 45 | +\c AX, CX, DX, BX, SP, BP, SI, DI, R8W-R15W |
| 46 | +\c EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI, R8D-R15D |
| 47 | +\c RAX, RCX, RDX, RBX, RSP, RBP, RSI, RDI, R8-R15 |
| 48 | + |
| 49 | +This is consistent with the AMD documentation and most other |
| 50 | +assemblers. The Intel documentation, however, uses the names |
| 51 | +\c{R8L-R15L} for 8-bit references to the higher registers. It is |
| 52 | +possible to use those names by defining them as macros; similarly, |
| 53 | +if one wants to use numeric names for the low 8 registers, define them |
| 54 | +as macros. The standard macro package \c{altreg} (see \k{pkg_altreg}) |
| 55 | +can be used for this purpose. |
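| | + |
| | +As a brief sketch, assuming the \c{altreg} package is loaded with |
| | +\c{%use}, the alternate names can then be written directly; the |
| | +comments give the equivalent instructions using the standard names: |
| | + |
| | +\c %use altreg |
| | +\c |
| | +\c         mov r0l,r3h             ; mov al,bh |
| | +\c         mov r8l,r0l             ; mov r8b,al |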
| 56 | + |
| 57 | +\H{id64} Immediates and Displacements in 64-bit Mode |
| 58 | + |
| 59 | +In 64-bit mode, immediates and displacements are generally only 32 |
| 60 | +bits wide. NASM will therefore truncate most displacements and |
| 61 | +immediates to 32 bits. |
| 62 | + |
| 63 | +The only instruction which takes a full \i{64-bit immediate} is: |
| 64 | + |
| 65 | +\c MOV reg64,imm64 |
| 66 | + |
| 67 | +NASM will produce this instruction whenever the programmer uses |
| 68 | +\c{MOV} with an immediate into a 64-bit register. If this is not |
| 69 | +desirable, simply specify the equivalent 32-bit register, which will |
| 70 | +be automatically zero-extended by the processor, or specify the |
| 71 | +immediate as \c{DWORD}: |
| 72 | + |
| 73 | +\c mov rax,foo ; 64-bit immediate |
| 74 | +\c mov rax,qword foo ; (identical) |
| 75 | +\c mov eax,foo ; 32-bit immediate, zero-extended |
| 76 | +\c mov rax,dword foo ; 32-bit immediate, sign-extended |
| 77 | + |
| 78 | +The lengths of these instructions are 10, 10, 5 and 7 bytes, respectively. |
| 79 | + |
| 80 | +If optimization is enabled and NASM can determine at assembly time |
| 81 | +that a shorter instruction will suffice, the shorter instruction will |
| 82 | +be emitted unless of course \c{STRICT QWORD} or \c{STRICT DWORD} is |
| 83 | +specified (see \k{strict}): |
| 84 | + |
| 85 | +\c mov rax,1 ; Assembles as "mov eax,1" (5 bytes) |
| 86 | +\c mov rax,strict qword 1 ; Full 10-byte instruction |
| 87 | +\c mov rax,strict dword 1 ; 7-byte instruction |
| 88 | +\c mov rax,symbol ; 10 bytes, not known at assembly time |
| 89 | +\c lea rax,[rel symbol] ; 7 bytes, usually preferred by the ABI |
| 90 | + |
| 91 | +Note that \c{lea rax,[rel symbol]} is position-independent, whereas |
| 92 | +\c{mov rax,symbol} is not. Most ABIs prefer or even require |
| 93 | +position-independent code in 64-bit mode. However, the \c{MOV} |
| 94 | +instruction is able to reference a symbol anywhere in the 64-bit |
| 95 | +address space, whereas \c{LEA} is only able to access a symbol within |
| 96 | +2 GB of the instruction itself (see below). |
| 97 | + |
| 98 | +The only instructions which take a full \I{64-bit displacement}64-bit |
| 99 | +\e{displacement} are \c{MOV} loads and stores of \c{AL}, \c{AX}, |
| 100 | +\c{EAX} or \c{RAX} (but no other registers) from or to an absolute 64-bit address. |
| 101 | +Since these are rarely used instructions (64-bit code generally uses |
| 102 | +relative addressing), the programmer has to explicitly declare the |
| 103 | +displacement size as \c{ABS QWORD}: |
| 104 | + |
| 105 | +\c default abs |
| 106 | +\c |
| 107 | +\c mov eax,[foo] ; 32-bit absolute disp, sign-extended |
| 108 | +\c mov eax,[a32 foo] ; 32-bit absolute disp, zero-extended |
| 109 | +\c mov eax,[qword foo] ; 64-bit absolute disp |
| 110 | +\c |
| 111 | +\c default rel |
| 112 | +\c |
| 113 | +\c mov eax,[foo] ; 32-bit relative disp |
| 114 | +\c mov eax,[a32 foo] ; ditto, address truncated to 32 bits(!) |
| 115 | +\c mov eax,[qword foo] ; error |
| 116 | +\c mov eax,[abs qword foo] ; 64-bit absolute disp |
| 117 | + |
| 118 | +A sign-extended absolute displacement can access from -2 GB to +2 GB; |
| 119 | +a zero-extended absolute displacement can access from 0 to 4 GB. |
| 120 | + |
| 121 | +\H{unix64} Interfacing to 64-bit C Programs (Unix) |
| 122 | + |
| 123 | +On Unix, the 64-bit ABI as well as the x32 ABI (32-bit ABI with the |
| 124 | +CPU in 64-bit mode) are defined by the documents at: |
| 125 | + |
| 126 | +\W{https://www.nasm.us/abi/unix64}\c{https://www.nasm.us/abi/unix64} |
| 127 | + |
| 128 | +Although written for AT&T-syntax assembly, the concepts apply equally |
| 129 | +well for NASM-style assembly. What follows is a simplified summary. |
| 130 | + |
| 131 | +The first six integer arguments (from the left) are passed in \c{RDI}, |
| 132 | +\c{RSI}, \c{RDX}, \c{RCX}, \c{R8}, and \c{R9}, in that order. |
| 133 | +Additional integer arguments are passed on the stack. These |
| 134 | +registers, plus \c{RAX}, \c{R10} and \c{R11}, are destroyed by function |
| 135 | +calls, and thus are available for use by the function without saving. |
| 136 | + |
| 137 | +Integer return values are passed in \c{RAX} and \c{RDX}, in that order. |
| 138 | + |
| 139 | +Floating point is done using SSE registers, except for \c{long |
| 140 | +double}, which is 80 bits (\c{TWORD}) on most platforms (Android is |
| 141 | +one exception; there \c{long double} is 64 bits and treated the same |
| 142 | +as \c{double}). Floating-point arguments are passed in \c{XMM0} to |
| 143 | +\c{XMM7}; return values are in \c{XMM0} and \c{XMM1}. \c{long double} |
| 144 | +values are passed on the stack, and returned in \c{ST0} and \c{ST1}. |
| 145 | + |
| 146 | +All SSE and x87 registers are destroyed by function calls. |
| 147 | + |
| 148 | +On 64-bit Unix, \c{long} is 64 bits. |
| 149 | + |
| 150 | +Integer and SSE register arguments are counted separately, so for the case of |
| 151 | + |
| 152 | +\c void foo(long a, double b, int c) |
| 153 | + |
| 154 | +\c{a} is passed in \c{RDI}, \c{b} in \c{XMM0}, and \c{c} in \c{ESI}. |
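| | + |
| | +As an illustrative sketch of these rules (the function name |
| | +\c{scale} is hypothetical), a leaf function with the C prototype |
| | +\c{double scale(long n, double x)} receives \c{n} in \c{RDI} and |
| | +\c{x} in \c{XMM0}, and could be written along these lines: |
| | + |
| | +\c         global scale |
| | +\c |
| | +\c         section .text |
| | +\c scale:  cvtsi2sd xmm1,rdi       ; convert n to double |
| | +\c         mulsd xmm0,xmm1         ; x * n, result in XMM0 |
| | +\c         ret |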
| 155 | + |
| 156 | +\H{win64} Interfacing to 64-bit C Programs (Win64) |
| 157 | + |
| 158 | +The Win64 ABI is described by the document at: |
| 159 | + |
| 160 | +\W{https://www.nasm.us/abi/win64}\c{https://www.nasm.us/abi/win64} |
| 161 | + |
| 162 | +What follows is a simplified summary. |
| 163 | + |
| 164 | +The first four integer arguments are passed in \c{RCX}, \c{RDX}, |
| 165 | +\c{R8} and \c{R9}, in that order. Additional integer arguments are |
| 166 | +passed on the stack. These registers, plus \c{RAX}, \c{R10} and |
| 167 | +\c{R11}, are destroyed by function calls, and thus are available for |
| 168 | +use by the function without saving. |
| 169 | + |
| 170 | +Integer return values are passed in \c{RAX} only. |
| 171 | + |
| 172 | +Floating point is done using SSE registers, except for \c{long |
| 173 | +double}. Floating-point arguments are passed in \c{XMM0} to \c{XMM3}; |
| 174 | +return is \c{XMM0} only. |
| 175 | + |
| 176 | +On Win64, \c{long} is 32 bits; \c{long long} or \c{__int64} is 64 bits. |
| 177 | + |
| 178 | +Integer and SSE register arguments are counted together, so for the case of |
| 179 | + |
| 180 | +\c void foo(long long a, double b, int c) |
| 181 | + |
| 182 | +\c{a} is passed in \c{RCX}, \c{b} in \c{XMM1}, and \c{c} in \c{R8D}. |
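| | + |
| | +As an illustrative sketch of these rules (the function name |
| | +\c{scale} is hypothetical), a leaf function with the C prototype |
| | +\c{double scale(long long n, double x)} receives \c{n} in \c{RCX} |
| | +and \c{x} in \c{XMM1}, since \c{x} is the second argument, and could |
| | +be written along these lines: |
| | + |
| | +\c         global scale |
| | +\c |
| | +\c         section .text |
| | +\c scale:  cvtsi2sd xmm0,rcx       ; convert n to double |
| | +\c         mulsd xmm0,xmm1         ; n * x, result in XMM0 |
| | +\c         ret |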
| 183 | + |