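The Intel x86-64 has many registers and named sub-registers.
The sub-registers keep their older 16-bit and 32-bit names, which is
why instructions written for earlier Intel processors still assemble.
Here are some that are used in assembly language programming
and debugging (the "dash number" gives the number of bits).
Register names are typically typed in lower case; NASM accepts either case.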
+---------------------------------+ A register
|RAX-64 |
| +---------------------------+| RAX really extended accumulator
| | EAX-32 +-----------------+|| EAX extended accumulator
| | | AX-16 ||| (lower part of dividend)
| | |+--------+------+||| (quotient after division)
| | || AH-8 | AL-8 |||| (lower part of product)
| | |+--------+------+||| (H for high, L for low)
| | +-----------------+||
| +---------------------------+|
+---------------------------------+
+---------------------------------+ B register
|RBX-64 |
| +---------------------------+| RBX 64-bit base register
| | EBX-32 +-----------------+|| (EBX is the double word part)
| | | BX-16 ||| (BX is the word part)
| | |+--------+------+|||
| | || BH-8 | BL-8 ||||
| | |+--------+------+|||
| | +-----------------+||
| +---------------------------+|
+---------------------------------+
+---------------------------------+ C register
|RCX-64 |
| +---------------------------+| RCX 64-bit counter
| | ECX-32 +-----------------+|| (string and loop operations)
| | | CX-16 ||| (ECX is a 32 bit counter)
| | |+--------+------+||| (CX is a 16 bit counter)
| | || CH-8 | CL-8 ||||
| | |+--------+------+|||
| | +-----------------+||
| +---------------------------+|
+---------------------------------+
+---------------------------------+ D register
|RDX-64 |
| +---------------------------+| RDX 64-bit data register
| | EDX-32 +-----------------+|| (DX holds the port number for IN and OUT)
| | | DX-16 ||| (remainder after divide)
| | |+--------+------+||| (upper part of dividend)
| | || DH-8 | DL-8 |||| (upper part of product)
| | |+--------+------+|||
| | +-----------------+||
| +---------------------------+|
+---------------------------------+
+---------------------------------+ Stack Pointer
|RSP-64 |
| +---------------------------+| RSP 64-bit stack pointer
| | ESP-32 +-------------+|| ESP extended stack pointer
| | | SP-16 ||| SP stack pointer
| | +-------------+|| (used by PUSH and POP)
| +---------------------------+|
+---------------------------------+
+---------------------------------+ Base Pointer
|RBP-64 |
| +---------------------------+| RBP 64-bit base pointer
| | EBP-32 +-------------+|| EBP extended base pointer
| | | BP-16 ||| (by convention, the frame pointer)
| | +-------------+|| (BP in SS segment)
| +---------------------------+| We save it, push then pop
+---------------------------------+
+---------------------------------+ Source Index
|RSI-64 |
| +---------------------------+| RSI 64-bit source index
| | ESI-32 +-------------+|| ESI extended source index
| | | SI-16 ||| SI source index
| | +-------------+|| (SI in DS segment)
| +---------------------------+|
+---------------------------------+
+---------------------------------+ Destination Index
|RDI-64 |
| +---------------------------+| RDI 64-bit destination index
| | EDI-32 +-------------+|| EDI extended destination index
| | | DI-16 ||| DI destination index
| | +-------------+|| (DI in ES segment)
| +---------------------------+|
+---------------------------------+
+---------------------------------+ Instruction Pointer
|RIP-64 |
| +---------------------------+| RIP 64-bit instruction pointer
| | EIP-32 +-------------+|| EIP extended instruction pointer
| | | IP-16 ||| IP instruction pointer
| | +-------------+|| set by jump and call
| +---------------------------+|
+---------------------------------+
+---------------------------------+ Flags (carry, zero, sign, overflow, ...)
|RFLAGS-64 |
| +---------------------------+| RFLAGS 64-bit flags
| | EFLAGS-32 +-------------+|| EFLAGS extended flags
| | | FLAGS-16 ||| FLAGS
| | +-------------+|| (not a register name!)
| +---------------------------+| (must use PUSHF and POPF)
+---------------------------------+
Additional 64-bit registers are R8, R9, R10, R11, R12, R13, R14, R15
128-bit registers for SSE instructions are xmm0, ..., xmm15; floating point
arguments to functions such as printf are passed in them.
Use of registers and little endian
see testreg_64.asm for register syntax
see testreg_64.lst for binary encoding
Just a snippet of testreg_64.asm:
section .data ; preset constants, writeable
aa8: db 8 ; 8-bit
aa16: dw 16 ; 16-bit
aa32: dd 32 ; 32-bit
aa64: dq 64 ; 64-bit
section .text ; instructions, code segment
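mov rax,[aa64] ; all 64 bits: RAX, EAX, AX, AH, AL
mov eax,[aa32] ; 32 bits, upper 32 bits of RAX set to zero
mov ax,[aa16] ; 16 bits, rest of RAX unchanged
mov ah,[aa8] ; bits 8..15 only
mov al,[aa8] ; bits 0..7 only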
Just a snippet of testreg_64.lst
(line number, hex address in segment, hex data, assembly language)
((note byte 10 hex is 16 decimal, 20 hex is 32 decimal, etc))
8 00000000 08 aa8: db 8
9 00000001 1000 aa16: dw 16
10 00000003 20000000 aa32: dd 32
11 00000007 4000000000000000 aa64: dq 64
24 00000001 488B0425[07000000] mov rax,[aa64]
25 00000009 8B0425[03000000] mov eax,[aa32]
26 00000010 668B0425[01000000] mov ax,[aa16]
27 00000018 8A2425[00000000] mov ah,[aa8]
28 0000001F 8A0425[00000000] mov al,[aa8]
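OH! Did I forget to mention that Intel is a "little endian" machine?
The bytes are stored in the reverse of the order we write digits in English.
The little end, the least significant byte, is first, at the smallest address.
For example, dd 32 is stored as the bytes 20 00 00 00.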
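The segment registers were not extended; they are still 16 bits.
In 64-bit mode the CS, DS, ES and SS base addresses are treated as zero
(a flat address space), while FS and GS can still have a nonzero base: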
+-------------+ CS code segment
| CS-16 |
+-------------+
+-------------+ SS stack segment
| SS-16 |
+-------------+
+-------------+ DS data segment
| DS-16 | (current module)
+-------------+
+-------------+ ES extra segment
| ES-16 | (destination string)
+-------------+
+-------------+ FS extra segment
| FS-16 | (OS defined, e.g. thread-local data)
+-------------+
+-------------+ GS extra segment
| GS-16 | (OS defined)
+-------------+
There are also eight 80-bit floating point registers ST0, ..., ST7
(These are used as a stack: FST stores st0, FSTP stores st0 and then pops.)
There are also control registers CR0, CR2, CR3, CR4 (CR1 is reserved), plus CR8 in 64-bit mode
There are also debug registers DR0, DR1, DR2, DR3, DR6, DR7
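Older processors also had test registers: TR6 and TR7 on the 80386, and TR3, ..., TR7
on the 80486. They were removed starting with the Pentium.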
Basic NASM syntax
The basic syntax for a line in NASM is:
label: opcode operand(s) ; comment
The "label" is a case sensitive user name, followed by a colon.
The label is optional and when not present, indent the opcode.
The label should start in column one of the line.
The label may be on a line by itself, or with only a comment.
In assembly language the "label" is an address; [label] is the value
stored at that address. In a compiled language such as C, a variable
name means its value.
The "opcode" is not case sensitive and may be a machine instruction
or an assembler directive (pseudo operation) or a macro call.
Typically, all "opcode" fields are neatly lined up starting in the
same column. Use of "tab" is OK.
Machine instructions may be preceded by a "prefix" such as:
a16, a32, o16, o32, and others.
"operand(s)" depend on the choice of "opcode".
There may be no operands, or several separated by commas.
Each operand may be a register name, a constant, or a memory
reference in brackets [ ], which may combine registers and
constants, as in [dd1+rdi].
Comments are optional, yet encouraged.
Everything from the semicolon to the end of the line is
a comment, ignored by the assembler.
The semicolon may be in column one, making the entire line
a comment. Some editors insert two semicolons; it makes no difference.
Sections or segments:
One specific assembler directive is the "section" or "SECTION"
directive. Four types of section are predefined for ELF format:
section .data ; initialized data
; writeable, not executable
; default alignment 8 bytes
section .bss ; uninitialized space for data
; writeable, not executable
; default alignment 8 bytes
section .rodata ; initialized data
; read only, not executable
; default alignment 8 bytes
section .text ; instructions (code)
; not writeable, executable
; default alignment 16 bytes
section other ; any name other than .data, .bss,
; .rodata, .text
; your stuff
; not executable, not writeable
; default alignment 1 byte
Efficiency and samples
A few comments on efficiency:
My experience is that a good assembly language programmer
can make a small (about 100 lines) "C" program more
efficient than the gcc compiler. But, for larger
programs, the compiler will be more efficient.
An exception, for example, is the SGI IRIX cc compiler,
which has super optimization for that specific machine.
For the Intel x86-64 here are some samples in nasm and from gcc
(different syntax, but you should be able to recognize the instructions).
Focus on the loop. Some prologue and epilogue code that should
be included was omitted. Note the test has "check" values
at each end of the array. There is no array bounds checking in
either "C" or assembly language.
A simple loop loopint_64.asm
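; loopint_64.asm code loopint_64.c for nasm
; nasm -f elf64 -l loopint_64.lst loopint_64.asm
; gcc -no-pie -o loopint_64 loopint_64.o (absolute addresses need -no-pie)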
; /* loopint_64.c a very simple loop that will be coded for nasm */
; #include <stdio.h>
; int main()
; {
; long int dd1[100]; // 100 could be 3 gigabytes
; long int i; // must be long for more than 2 gigabytes
; dd1[0]=5; /* be sure loop stays 1..98 */
; dd1[99]=9;
; for(i=1; i<99; i++) dd1[i]=7;
; printf("dd1[0]=%ld, dd1[1]=%ld, dd1[98]=%ld, dd1[99]=%ld\n",
; dd1[0], dd1[1], dd1[98],dd1[99]);
; return 0;
;}
; execution output is dd1[0]=5, dd1[1]=7, dd1[98]=7, dd1[99]=9
section .bss
dd1: resq 100 ; reserve 100 long int
i: resq 1 ; actually unused, kept in register
section .data ; Data section, initialized variables
fmt: db "dd1[0]=%ld, dd1[1]=%ld, dd1[98]=%ld, dd1[99]=%ld",10,0
extern printf ; the C function, to be called
section .text
global main
main: push rbp ; set up stack
mov qword [dd1],5 ; dd1[0]=5; immediate to memory
mov qword [dd1+99*8],9 ; dd1[99]=9; indexed 99 qword
mov rdi, 1*8 ; i=1; index, will move by 8 bytes
loop1: mov qword [dd1+rdi],7 ; dd1[i]=7;
add rdi, 8 ; i++; 8 bytes
cmp rdi, 8*99 ; i<99
jne loop1 ; loop until incremented i=99
mov rdi, fmt ; pass address of format
mov rsi, qword [dd1] ; dd1[0] first list parameter
mov rdx, qword [dd1+1*8] ; dd1[1] second list parameter
mov rcx, qword [dd1+98*8] ; dd1[98] third list parameter
mov r8, qword [dd1+99*8] ; dd1[99] fourth list parameter
mov rax, 0 ; no xmm used
call printf ; Call C function
pop rbp ; restore stack
mov rax,0 ; normal, no error, return value
ret ; return 0;
Speed considerations must take into account cache and virtual memory
performance, the number of bytes transferred from RAM, and clock cycles.
On modern computer architectures, predicting all of this by reading the
code is almost impossible. For example, the Pentium 4 translates the 80x86
code into RISC-like operations for its pipeline, so it is actually
executing instructions that are different from the assembly language.
Carefully benchmarking complete applications is about the only
conclusive measure of efficiency.
"C" and other programming languages may call subroutines, functions,
procedures written in assembly language. Here is a small sample
using floating point just to show use of ST registers, mentioned in comments.
Main C program test_callf1_64.c
// test_callf1_64.c test callf1_64.asm
// nasm -f elf64 -l callf1_64.lst callf1_64.asm
// gcc -m64 -no-pie -o test_callf1_64 test_callf1_64.c callf1_64.o
// ./test_callf1_64 > test_callf1_64.out
#include "callf1_64.h"
#include <stdio.h>
int main()
{
double L[2];
printf("test_callf1_64.c using callf1_64.asm\n");
L[0]=1.0;
L[1]=2.0;
callf1_64(L); // add 3.0 to L[0], add 4.0 to L[1]
printf("L[0]=%e, L[1]=%e \n", L[0], L[1]);
return 0;
}
Full with debug callf1_64.asm
Stripped down callf1_64.asm with no demo, no debug:
; callf1_64.asm a basic structure for a subroutine to be called from "C"
; Parameter: double *L
; Result: L[0]=L[0]+3.0 L[1]=L[1]+4.0
global callf1_64 ; linker must know name of subroutine
SECTION .data ; Data section, initialized variables
a3: dq 3.0 ; 64-bit variable a3 initialized to 3.0
a4: dq 4.0 ; 64-bit variable a4 initialized to 4.0
SECTION .text ; Code section.
callf1_64: ; name must appear as a nasm label
push rbp ; save rbp
mov rax,rdi ; first, only, in parameter, address
; add 3.0 to L[0]
fld qword [rax] ; load L[0] (pushed on flt pt stack, st0)
fadd qword [a3] ; floating add 3.0 (to st0)
fstp qword [rax] ; store into L[0] (pop flt pt stack)
fld qword [rax+8] ; load L[1] (pushed on flt pt stack, st0)
fadd qword [a4] ; floating add 4.0 (to st0)
fstp qword [rax+8] ; store into L[1] (pop flt pt stack)
pop rbp ; restore caller's stack frame
ret ; return
We did not need to save the floating point stack; we left it unchanged.
We could have used dt and tword for 80-bit floating point.
Calling printf passes floating point arguments in xmm registers, not on the ST stack.
Over the years I have kept snippets of computer related news.
If time: Each stage of a processor can work on a different
instruction every clock. The easiest way to understand
this is a pipeline. Think of water coming into a pipe, flowing
through, and finally out the end.
Simple computer pipeline:
___________________________________________________________________
address -> instruction -> decode -> arithmetic -> memory -> finish
___________________________________________________________________
The stages are separated by registers that all share the system clock.
On each clock, an instruction moves to the next register (stage of the
pipeline), as shown in the following pipeline that takes 5 clocks per instruction:
IF instruction fetch, IP holds the memory address of the instruction being fetched
ID instruction decode, and two values read out of the registers
EX execute instruction or compute data memory address
M data memory access to store or fetch a data word
WB write back value into general register
IF ID EX M WB
+--+ +--+ +--+ +--+ +--+
| | | | | A|-|\ | | | |
| | | | /---| | \ \_| | | |
|IP|-(I)-|IR|-(R) = | | / / | |-(D)-| |--+
| | | | ^ \---| B|-|/ | | | | |
+--+ +--+ | +--+ +--+ +--+ |
^ ^ | ^ ALU ^ ^ |
| | | | | | |
clk-+--------+-----------+--------+--------+ |
| |
+-----------------------------+