CMPE 310 Lecture 4,


Lecture 4 Arithmetic and shifting

Both integer and floating point arithmetic are demonstrated.
In order to make the source code smaller, a macro is defined
to print out results. The equivalent "C" program is given as
comments.

First, see how to call the "C" library function, printf, to make
it easier to print values:
Look at the file printf1_64.asm

; printf1_64.asm   print an integer from storage and from a register
; Assemble:	nasm -f elf64 -l printf1_64.lst  printf1_64.asm
; Link:		gcc -m64 -o printf1_64  printf1_64.o
; Run:		./printf1_64 > printf1_64.out
; Output:	a=5, rax=7

; Equivalent C code
; /* printf1.c  print a long int, 64-bit, and an expression */
; #include <stdio.h>
; int main()
; {
;   long int a=5;
;   printf("a=%ld, rax=%ld\n", a, a+2);
;   return 0;
; }

; Declare external function
        extern	printf		; the C function, to be called

        SECTION .data		; Data section, initialized variables

	a:	dq	5	; long int a=5;
fmt:    db "a=%ld, rax=%ld", 10, 0	; The printf format, 10 is "\n", 0 ends the string


        SECTION .text           ; Code section.

        global main		; the standard gcc entry point
main:				; the program label for the entry point
        push    rbp		; set up stack frame
	
	mov	rax,[a]		; put "a" from store into register
	add	rax,2		; a+2  add constant 2
	mov	rdi,fmt		; format for printf
	mov	rsi,[a]         ; first parameter for printf
	mov	rdx,rax         ; second parameter for printf
	mov	rax,0		; no xmm registers
        call    printf		; Call C function

	pop	rbp		; restore stack

	mov	rax,0		; normal, no error, return value
	ret			; return
	
Printing floating point
To print "float" and "double" values, the call to printf gets more
complicated, yet it is still easier than doing your own conversion.
Look at the file printf2_64.asm
Output is printf2_64.out

; printf2_64.asm  use "C" printf on char, string, int, long int, float, double
; 
; Assemble:	nasm -f elf64 -l printf2_64.lst  printf2_64.asm
; Link:		gcc -m64 -o printf2_64  printf2_64.o
; Run:		./printf2_64 > printf2_64.out
; Output:	cat printf2_64.out
; 
; A similar "C" program   printf2_64.c 
; #include <stdio.h>
; int main()
; {
;   char      char1='a';            /* sample character */
;   char      str1[]="mystring";    /* sample string */
;   int       len=9;                /* string length, counting the 0 */
;   int       inta1=12345678;       /* sample integer 32-bit */
;   long int  inta2=12345678900;    /* sample long integer 64-bit */
;   long int  hex1=0x123456789ABCD; /* sample hexadecimal 64-bit*/
;   float     flt1=5.327e-30;       /* sample float 32-bit */
;   double    flt2=-123.4e300;      /* sample double 64-bit*/
; 
;   printf("printf2_64: flt2=%e\n", flt2);
;   printf("char1=%c, str1=%s, len=%d\n", char1, str1, len);
;   printf("char1=%c, str1=%s, len=%d, inta1=%d, inta2=%ld\n",
;          char1, str1, len, inta1, inta2);
;   printf("hex1=%lX, flt1=%e, flt2=%e\n", hex1, flt1, flt2);
;   return 0;
; }
        extern printf                   ; the C function to be called

        SECTION .data                   ; Data section
					; format strings for printf
fmt2:   db "printf2: flt2=%e", 10, 0
fmt3:	db "char1=%c, str1=%s, len=%d", 10, 0
fmt4:	db "char1=%c, str1=%s, len=%d, inta1=%d, inta2=%ld", 10, 0
fmt5:	db "hex1=%lX, flt1=%e, flt2=%e", 10, 0
	
char1:	db	'a'			; a character 
str1:	db	"mystring",0	        ; a C string, "string" needs 0
len:	equ	$-str1			; len has value, not an address
inta1:	dd	12345678		; integer 12345678, note dd
inta2:	dq	12345678900		; long integer 12345678900, note dq
hex1:	dq	0x123456789ABCD	        ; long hex constant, note dq
flt1:	dd	5.327e-30		; 32-bit floating point, note dd
flt2:	dq	-123.456789e300	        ; 64-bit floating point, note dq

	SECTION .bss
		
flttmp:	resq 1			        ; 64-bit temporary for printing flt1
	
        SECTION .text                   ; Code section.

        global	main		        ; "C" main program 
main:				        ; label, start of main program
	push    rbp			; set up stack frame 
	fld	dword [flt1]	        ; need to convert 32-bit to 64-bit
	fstp	qword [flttmp]          ; floating load makes 80-bit,
	                                ; store as 64-bit
	mov	rdi,fmt2
	movq	xmm0, qword [flt2]
	mov	rax, 1			; 1 xmm register
	call	printf

	mov	rdi, fmt3		; first arg, format
	movzx	rsi, byte [char1]	; second arg, char (zero-extend one byte)
	mov	rdx, str1		; third arg, string
	mov	rcx, len		; fourth arg, int
	mov	rax, 0			; no xmm used
	call	printf

	mov	rdi, fmt4		; first arg, format
	movzx	rsi, byte [char1]	; second arg, char (zero-extend one byte)
	mov	rdx, str1		; third arg, string
	mov	rcx, len		; fourth arg, int
	movsxd	r8, dword [inta1]	; fifth arg, inta1 sign-extended 32->64
	mov	r9, [inta2]		; sixth arg, inta2
	mov	rax, 0			; no xmm used
	call	printf

	mov	rdi, fmt5		; first arg, format
	mov	rsi, [hex1]		; second arg, 64-bit hex value
	movq	xmm0, qword [flttmp]    ; first double
	movq	xmm1, qword [flt2]	; second double
	mov	rax, 2			; 2 xmm used
	call	printf
	
	pop	rbp			; restore stack	
        mov     rax, 0			; exit code, 0=normal
        ret				; main returns to operating system


Integer arithmetic	
Now, for integer arithmetic, look at the file intarith_64.asm
Output is intarith_64.out
C version is intarith_64.c
Since all the lines use the same format, a macro was created
to do the call on printf.

; intarith_64.asm    show some simple C code and corresponding nasm code
;                    the nasm code is one sample, not unique
;
; compile:	nasm -f elf64 -l intarith_64.lst  intarith_64.asm
; link:		gcc -m64 -o intarith_64  intarith_64.o
; run:		./intarith_64 > intarith_64.out
;
; the output from running intarith_64.asm and intarith.c is:	
; c=5  , a=3, b=4, c=5
; c=a+b, a=3, b=4, c=7
; c=a-b, a=3, b=4, c=-1
; c=a*b, a=3, b=4, c=12
; c=c/a, a=3, b=4, c=4
;
;The file  intarith.c  is:
;  /* intarith.c */
;  #include <stdio.h>
;  int main()
;  { 
;    long int a=3, b=4, c;
;    c=5;
;    printf("%s, a=%ld, b=%ld, c=%ld\n","c=5  ", a, b, c);
;    c=a+b;
;    printf("%s, a=%ld, b=%ld, c=%ld\n","c=a+b", a, b, c);
;    c=a-b;
;    printf("%s, a=%ld, b=%ld, c=%ld\n","c=a-b", a, b, c);
;    c=a*b;
;    printf("%s, a=%ld, b=%ld, c=%ld\n","c=a*b", a, b, c);
;    c=c/a;
;    printf("%s, a=%ld, b=%ld, c=%ld\n","c=c/a", a, b, c);
;    return 0;
; }
        extern printf		; the C function to be called

%macro	pabc 1			; a "simple" print macro
	section .data
.str	db	%1,0		; %1 is first actual in macro call
	section .text
        mov     rdi, fmt4	; first arg, format
	mov	rsi, .str	; second arg
	mov     rdx, [a]        ; third arg
	mov     rcx, [b]        ; fourth arg
	mov     r8, [c]         ; fifth arg
	mov     rax, 0	        ; no xmm used
	call    printf		; Call C function
%endmacro
	
	section .data  		; preset constants, writeable
a:	dq	3		; 64-bit variable a initialized to 3
b:	dq	4		; 64-bit variable b initialized to 4
fmt4:	db "%s, a=%ld, b=%ld, c=%ld",10,0	; format string for printf
	
	section .bss 		; uninitialized space
c:	resq	1		; reserve a 64-bit word

	section .text		; instructions, code segment
	global	 main		; for gcc standard linking
main:				; label
	push 	rbp		; set up stack
lit5:				; c=5;
	mov	rax,5	 	; 5 is a literal constant
	mov	[c],rax		; store into c
	pabc	"c=5  "		; invoke the print macro
	
addb:				; c=a+b;
	mov	rax,[a]	 	; load a
	add	rax,[b]		; add b
	mov	[c],rax		; store into c
	pabc	"c=a+b"		; invoke the print macro
	
subb:				; c=a-b;
	mov	rax,[a]	 	; load a
	sub	rax,[b]		; subtract b
	mov	[c],rax		; store into c
	pabc	"c=a-b"		; invoke the print macro
	
mulb:				; c=a*b;
	mov	rax,[a]	 	; load a (must be rax for multiply)
	imul	qword [b]	; signed integer multiply by b
	mov	[c],rax		; store bottom half of product into c
	pabc	"c=a*b"		; invoke the print macro
	
diva:				; c=c/a;
	mov	rax,[c]	 	; load c
	cqo			; sign-extend rax into rdx, upper half of dividend
	idiv	qword [a]	; divide double register rdx:rax by a
	mov	[c],rax		; store quotient into c
	pabc	"c=c/a"		; invoke the print macro

	pop	rbp		; pop stack
        mov     rax,0           ; exit code, 0=normal
	ret			; main returns to operating system


Note that two registers are used for general multiply and divide.

        bbbb  [mem] a product of 64-bits times 64-bits is 128-bits
 imul   bbbb  rax
   ---------
rdx bbbbbbbb  rax   the upper part of the product is in rdx
                    the lower part of the product is in rax


rdx bbbbbbbb  rax  before divide, the upper part of dividend is in rdx
                                  the lower part of dividend is in rax
 idiv   bbbb  [mem] the divisor
    --------
                   after divide,  the quotient is in rax
                                  the remainder is in rdx

Floating point arithmetic
Now, for floating point arithmetic, look at the file fltarith_64.asm
Output is fltarith_64.out
C version is fltarith_64.c
Since all the lines use the same format, a macro was created
to do the call on printf.

Note the many similarities to integer arithmetic, yet some basic differences.

; fltarith_64.asm   show some simple C code and corresponding nasm code
;                   the nasm code is one sample, not unique
;
; compile  nasm -f elf64 -l fltarith_64.lst  fltarith_64.asm
; link     gcc -m64 -o fltarith_64  fltarith_64.o
; run      ./fltarith_64 > fltarith_64.out
;
; the output from running fltarith and fltarithc is:	
; c=5.0, a=3.000000e+00, b=4.000000e+00, c=5.000000e+00
; c=a+b, a=3.000000e+00, b=4.000000e+00, c=7.000000e+00
; c=a-b, a=3.000000e+00, b=4.000000e+00, c=-1.000000e+00
; c=a*b, a=3.000000e+00, b=4.000000e+00, c=1.200000e+01
; c=c/a, a=3.000000e+00, b=4.000000e+00, c=4.000000e+00
; a=i  , a=8.000000e+00, b=1.600000e+01, c=1.600000e+01
; a<=b , a=8.000000e+00, b=1.600000e+01, c=1.600000e+01
; b==c , a=8.000000e+00, b=1.600000e+01, c=1.600000e+01
;The file  fltarith.c  is:
;  #include <stdio.h>
;  int main()
;  { 
;    double a=3.0, b=4.0, c;
;    long int i=8;
;
;    c=5.0;
;    printf("%s, a=%e, b=%e, c=%e\n","c=5.0", a, b, c);
;    c=a+b;
;    printf("%s, a=%e, b=%e, c=%e\n","c=a+b", a, b, c);
;    c=a-b;
;    printf("%s, a=%e, b=%e, c=%e\n","c=a-b", a, b, c);
;    c=a*b;
;    printf("%s, a=%e, b=%e, c=%e\n","c=a*b", a, b, c);
;    c=c/a;
;    printf("%s, a=%e, b=%e, c=%e\n","c=c/a", a, b, c);
;    a=i;
;    b=a+i;
;    i=b;
;    c=i;
;    printf("%s, a=%e, b=%e, c=%e\n","a=i  ", a, b, c);
;    if(a<=b)printf("%s, a=%e, b=%e, c=%e\n","a<=b ", a, b, c);
;    else    printf("%s, a=%e, b=%e, c=%e\n","a>b  ", a, b, c);
;    if(b==c)printf("%s, a=%e, b=%e, c=%e\n","b==c ", a, b, c);
;    else    printf("%s, a=%e, b=%e, c=%e\n","b!=c ", a, b, c);
;    return 0;
; }

        extern printf		; the C function to be called

%macro	pabc 1			; a "simple" print macro
	section	.data
.str	db	%1,0		; %1 is macro call first actual parameter
	section .text
				; integer args in rdi, rsi; doubles in xmm0..xmm2
        mov	rdi, fmt	; address of format string
	mov	rsi, .str	; string passed to macro
	movq	xmm0, qword [a]	; first floating point in fmt
	movq	xmm1, qword [b]	; second floating point
	movq	xmm2, qword [c]	; third floating point
	mov	rax, 3		; 3 floating point arguments to printf
        call    printf          ; Call C function
%endmacro
	
	section	.data  		; preset constants, writeable
a:	dq	3.0		; 64-bit variable a initialized to 3.0
b:	dq	4.0		; 64-bit variable b initialized to 4.0
i:	dq	8		; a 64 bit integer
five:	dq	5.0		; constant 5.0
fmt:    db "%s, a=%e, b=%e, c=%e",10,0	; format string for printf
	
	section .bss 		; uninitialized space
c:	resq	1		; reserve a 64-bit word

	section .text		; instructions, code segment
	global	main		; for gcc standard linking
main:				; label

	push	rbp		; set up stack
lit5:				; c=5.0;
	fld	qword [five]	; 5.0 constant
	fstp	qword [c]	; store into c
	pabc	"c=5.0"		; invoke the print macro
	
addb:				; c=a+b;
	fld	qword [a] 	; load a (pushed on flt pt stack, st0)
	fadd	qword [b]	; floating add b (to st0)
	fstp	qword [c]	; store into c (pop flt pt stack)
	pabc	"c=a+b"		; invoke the print macro
	
subb:				; c=a-b;
	fld	qword [a] 	; load a (pushed on flt pt stack, st0)
	fsub	qword [b]	; floating subtract b (to st0)
	fstp	qword [c]	; store into c (pop flt pt stack)
	pabc	"c=a-b"		; invoke the print macro
	
mulb:				; c=a*b;
	fld	qword [a]	; load a (pushed on flt pt stack, st0)
	fmul	qword [b]	; floating multiply by b (to st0)
	fstp	qword [c]	; store product into c (pop flt pt stack)
	pabc	"c=a*b"		; invoke the print macro
	
diva:				; c=c/a;
	fld	qword [c] 	; load c (pushed on flt pt stack, st0)
	fdiv	qword [a]	; floating divide by a (to st0)
	fstp	qword [c]	; store quotient into c (pop flt pt stack)
	pabc	"c=c/a"		; invoke the print macro

intflt:				; a=i;
	fild	qword [i]	; load integer as floating point
	fst	qword [a]	; store the floating point (no pop)
	fadd	st0		; b=a+i; 'a' as 'i'  already on flt stack
	fst	qword [b]	; store sum (no pop) 'b' still on stack
	fistp	qword [i]	; i=b; store floating point as integer
	fild	qword [i]	; c=i; load again from ram (redundant)
	fstp	qword [c]
	pabc	"a=i  "		; invoke the print macro

cmpflt:	fld	qword [b]	; into st0, then pushed to st1
	fld	qword [a]	; in st0
	fcomip	st0,st1		; a compare b, pop a
	ja	cmpfl2		; a>b  (fcomip sets CF,ZF like an unsigned compare)
	pabc	"a<=b "
	jmp	cmpfl3
cmpfl2:	
	pabc	"a>b  "
cmpfl3:
	fld	qword [c]	; should equal [b]
	fcomip  st0,st1
	jne	cmpfl4
	pabc	"b==c "
	jmp	cmpfl5
cmpfl4:
	pabc	"b!=c "
cmpfl5:

	pop	rbp		; pop stack
        mov     rax,0           ; exit code, 0=normal
	ret			; main returns to operating system

Shift data in a register
Refer to nasmdoc.txt for details.
A brief summary is provided here.
"reg" is an 8-bit, 16-bit, 32-bit or 64-bit register
"count" is the number of bits to shift
"right" moves the contents of the register to the right, making the value
        smaller; each bit position halves the value
"left" moves the contents of the register to the left, making the value
       bigger; each bit position doubles the value, until bits fall off the top

  SAL   reg,count   shift arithmetic left
  SAR   reg,count   shift arithmetic right (sign extension)
  SHL   reg,count   shift left (logical, zero fill)
  SHR   reg,count   shift right (logical, zero fill)
  ROL   reg,count   rotate left
  ROR   reg,count   rotate right
  SHLD  reg1,reg2,count  shift left double-register 
  SHRD  reg1,reg2,count  shift right double-register

An example of using the various shifts is in: shift_64.asm
Output is shift_64.out
Just to make it easy to check, we keep all shift amounts a multiple
of 4, 4 bits per hex digit in output.

; shift_64.asm    the nasm code is one sample, not unique
;
; compile:	nasm -f elf64 -l shift_64.lst  shift_64.asm
; link:		gcc -m64 -o shift_64  shift_64.o
; run:		./shift_64 > shift_64.out
;
; the output from running shift_64.asm (zero filled) is:	
; shl rax,4, old rax=ABCDEF0987654321, new rax=BCDEF09876543210, 
; shl rax,8, old rax=ABCDEF0987654321, new rax=CDEF098765432100, 
; shr rax,4, old rax=ABCDEF0987654321, new rax= ABCDEF098765432, 
; sal rax,8, old rax=ABCDEF0987654321, new rax=CDEF098765432100, 
; sar rax,4, old rax=ABCDEF0987654321, new rax=FABCDEF098765432, 
; rol rax,4, old rax=ABCDEF0987654321, new rax=BCDEF0987654321A, 
; ror rax,4, old rax=ABCDEF0987654321, new rax=1ABCDEF098765432, 
; shld rdx,rax,8, old rdx:rax=0,ABCDEF0987654321,
;                 new rax=ABCDEF0987654321 rdx=              AB, 
; shl rax,8     , old rdx:rax=0,ABCDEF0987654321,
;                 new rax=CDEF098765432100 rdx=              AB, 
; shrd rdx,rax,8, old rdx:rax=0,ABCDEF0987654321,
;                 new rax=ABCDEF0987654321 rdx=2100000000000000, 
; shr rax,8     , old rdx:rax=0,ABCDEF0987654321,
;                 new rax=  ABCDEF09876543 rdx=2100000000000000, 

        extern printf		; the C function to be called

%macro	prt	1		; old and new rax
	section .data
.str	db	%1,0		; %1 is which shift string
	section .text
        mov	rdi, fmt	; address of format string
	mov	rsi, .str 	; caller's string
	mov	rdx,rax		; new value
	mov	rax, 0		; no floating point
        call    printf          ; Call C function
%endmacro

%macro	prt2	1		; old and new rax,rdx
	section .data
.str	db	%1,0		; %1 is which shift
	section .text
        mov	rdi, fmt2	; address of format string
	mov	rsi, .str 	; caller's string
	mov	rcx, rdx	; new rdx, moved before next line overwrites rdx
	mov	rdx, rax	; new rax
	mov	rax, 0		; no floating point
        call    printf          ; Call C function
%endmacro

	 section .bss
raxsave: resq	1		; save rax while calling a function 
rdxsave: resq	1		; save rdx while calling a function 
	
	section .data  		; preset constants, writeable
b64:	dq	0xABCDEF0987654321	; data to shift
fmt:    db "%s, old rax=ABCDEF0987654321, new rax=%16lX, ",10,0	; format string
fmt2:   db "%s, old rdx:rax=0,ABCDEF0987654321,",10,"                new rax=%16lX rdx=%16lX, ",10,0
	
	section .text		; instructions, code segment
	global	 main		; for gcc standard linking
main:	push	rbp		; set up stack
	
shl1:	mov	rax, [b64]	; data to shift
	shl	rax, 4		; shift rax 4 bits, one hex position left
	prt	"shl rax,4 "	; invoke the print macro

shl4:	mov	rax, [b64]	; data to shift
	shl	rax,8		; shift rax 8 bits, two hex positions left
	prt	"shl rax,8 "	; invoke the print macro

shr4:	mov	rax, [b64]	; data to shift
	shr	rax,4		; shift
	prt	"shr rax,4 "	; invoke the print macro

sal4:	mov	rax, [b64]	; data to shift
	sal	rax,8		; shift
	prt	"sal rax,8 "	; invoke the print macro

sar4:	mov	rax, [b64]	; data to shift
	sar	rax,4		; shift
	prt	"sar rax,4 "	; invoke the print macro

rol4:	mov	rax, [b64]	; data to shift
	rol	rax,4		; shift
	prt	"rol rax,4 "	; invoke the print macro

ror4:	mov	rax, [b64]	; data to shift
	ror	rax,4		; shift
	prt	"ror rax,4 "	; invoke the print macro

shld4:	mov	rax, [b64]	; data to shift
	mov	rdx,0		; register receiving bits
	shld	rdx,rax,8	; shift
	mov	[raxsave],rax	; save, destroyed by function
	mov	[rdxsave],rdx	; save, destroyed by function
	prt2	"shld rdx,rax,8"; invoke the print macro

shla:	mov	rax,[raxsave]	; restore, destroyed by function
	mov	rdx,[rdxsave]	; restore, destroyed by function
	shl	rax,8		; finish double shift, both registers
	prt2	"shl rax,8     "; invoke the print macro

shrd4:	mov	rax, [b64]	; data to shift
	mov	rdx,0		; register receiving bits
	shrd	rdx,rax,8	; shift
	mov	[raxsave],rax	; save, destroyed by function
	mov	[rdxsave],rdx	; save, destroyed by function
	prt2	"shrd rdx,rax,8"; invoke the print macro

shra:	mov	rax,[raxsave]	; restore, destroyed by function
	mov	rdx,[rdxsave]	; restore, destroyed by function
	shr	rax,8		; finish double shift, both registers
	prt2	"shr rax,8     "; invoke the print macro

	pop	rbp		; restore stack
	mov     rax,0           ; exit code, 0=normal
	ret			; main returns to operating system

First project is assigned.
You may want to do this in Lab this Friday.
www.cs.umbc.edu/~squire/cmpe310_proj.shtml


Instructions and data come from the cache

The "cache" is very high speed memory on the CPU chip.
Typical CPUs can get a word out of the cache every clock.
In order to be as fast as the logic on the CPU, the cache
cannot be as large as the main memory. Typical cache sizes
are hundreds of kilobytes to a few megabytes.

There is typically a level 1 instruction cache and a level 1
data cache. These would be in the blocks on our project
schematic labeled instruction memory and data memory.

Then, there is typically a level 2 unified cache that is
larger and may be slower than the level 1 caches. Unified
means it is used for both instructions and data.

Some computers have a level 3 cache that is larger and
slower than the level 2 cache. Multicore computers
have at least an L1 instruction cache and an L1 data cache
for every core. Some have an L3 unified cache that is
available to all cores. Thus data can go from one core
to another without going through RAM.


     +-----------+   +-----------+
     | L1 Icache |   | L1 Dcache |
     +-----------+   +-----------+
           |               |
     +---------------------------+
     | L2 unified cache          |
     +---------------------------+
              |
           +------+
           | RAM  |
           +------+
              |
           +------+
           | Disk |  or Solid State Drive, SSD
           +------+

The goal of the computer system is to use the cache for instructions
and data in order to execute instructions as fast as possible.
Typical RAM requires 5 to 10 clocks to get an instruction or
data word. A typical CPU does prefetching and branch prediction
to bring instructions into the cache in order to minimize
stalls waiting for instructions. You will simulate a cache and
the associated stalls in part 3 of your project.

Intel IA-64 cache structure, page 3
IA-64 Itanium


An approximate hierarchy is:

                size    response
     CPU                  0.5 ns  2 GHz clock
     L1 cache  .032MB     0.5 ns  one for instructions, another for data
     L2 cache     4MB     1.0 ns
     RAM       4000MB     4.0 ns
     disk    500000MB     4.0 ms = 4,000,000 ns

A program is loaded from disk, into RAM, then as needed
into L2 cache, then as needed into L1 cache, then as needed
into the CPU pipelines.
1)  The CPU initiates the request by sending the L1 cache an address.
    If the L1 cache has the value at that address, the value is quickly
    sent to the CPU.
2)  If the L1 cache does not have the value, the address is passed to
    the L2 cache. If the L2 cache has the value, the value is quickly
    passed to the L1 cache. The L1 cache passes the value to the CPU.
3)  If the L2 cache does not have the value at the address, the
    address is passed to a memory controller that must access RAM
    in order to get the value. The value passes from RAM, through
    the memory controller to the L2 cache then to the L1 cache then
    to the CPU.

This may seem tedious yet each level is optimized to provide good
performance for the total system. One reason the system is fast is
because of wide data paths. The RAM data path may be 128-bits or
256-bits wide. This wide data path may continue through the
L2 cache and L1 cache. The cache is organized in blocks
(the words "lines" or "entries" may be used in place of "blocks")
that provide for many bytes of data to be accessed in parallel.
Reading from a cache is like combinational logic: it
is not clocked. Writing into a cache must happen on
a clock edge.

A cache receives an address, a computer address, a binary number.
The parts of the cache are all powers of two. The basic unit of
an address is a byte. For our study, four bytes, one word, will
always be fetched from the cache. When working the homework
problems be sure to read the problem carefully to determine if
the addresses given are byte addresses or word addresses.
It will be easier and less error-prone if all addresses are
converted to binary for working the homework.

The basic elements of a cache are:
  A valid bit: This is a 1 if values are in the cache block
  A tag field: This is the upper part of the address for
               the values in the cache block.
  Cache block: The values that may be instructions or data

Here is the absolutely simplest cache with one word blocks

