These are not intended to be complete lecture notes. Complicated figures or tables or formulas are included here in case they were not clear or not copied correctly in class. Source code may be included inline or by a link. Lecture numbers correspond to the syllabus numbering.
Numbers are represented as the coefficients of powers of a base.
(in plain text, we use "^" to mean, raise to power or exponentiation)
With no extra base indication, expect decimal numbers:
12.34 is a representation of
1*10^1 + 2*10^0 + 3*10^-1 + 4*10^-2 or
   10
    2
     .3
  +  .04
  ------
   12.34
Binary numbers, in NASM assembly language, have a trailing B or b.
101.11B is a representation of
1*2^2 + 0*2^1 + 1*2^0 + 1*2^-1 + 1*2^-2 or
    4
    0
    1
     .5     (you may compute 2^-n or look up in table below)
  +  .25
  ------
    5.75
Converting a decimal number to binary may be accomplished:
Convert 12.34 from decimal to binary
Integer part                 Fraction part
quotient      remainder      integer fraction
12/2 = 6      0              .34*2 = 0.68
 6/2 = 3      0              .68*2 = 1.36
 3/2 = 1      1              .36*2 = 0.72
 1/2 = 0      1              .72*2 = 1.44
done                         .44*2 = 0.88
read up 1100                 .88*2 = 1.76
                             .76*2 = 1.52
                             .52*2 = 1.04
                             quit
                             read down .01010111
answer is 1100.01010111
Powers of 2
Decimal
    2^n    n   2^-n
      1    0   1.0
      2    1   0.5
      4    2   0.25
      8    3   0.125
     16    4   0.0625
     32    5   0.03125
     64    6   0.015625
    128    7   0.0078125
    256    8   0.00390625
    512    9   0.001953125
   1024   10   0.0009765625
   2048   11   0.00048828125
   4096   12   0.000244140625
   8192   13   0.0001220703125
  16384   14   0.00006103515625
  32768   15   0.000030517578125
  65536   16   0.0000152587890625
Binary
                  2^n    n   2^-n
                    1    0   1.0
                   10    1   0.1
                  100    2   0.01
                 1000    3   0.001
                10000    4   0.0001
               100000    5   0.00001
              1000000    6   0.000001
             10000000    7   0.0000001
            100000000    8   0.00000001
           1000000000    9   0.000000001
          10000000000   10   0.0000000001
         100000000000   11   0.00000000001
        1000000000000   12   0.000000000001
       10000000000000   13   0.0000000000001
      100000000000000   14   0.00000000000001
     1000000000000000   15   0.000000000000001
    10000000000000000   16   0.0000000000000001
Hexadecimal
    2^n    n   2^-n
      1    0   1.0
      2    1   0.8
      4    2   0.4
      8    3   0.2
     10    4   0.1
     20    5   0.08
     40    6   0.04
     80    7   0.02
    100    8   0.01
    200    9   0.008
    400   10   0.004
    800   11   0.002
   1000   12   0.001
   2000   13   0.0008
   4000   14   0.0004
   8000   15   0.0002
  10000   16   0.0001
 n   2^n hexadecimal   2^n decimal          approx   notation
10   400               1,024                10^3     K kilo
20   100000            1,048,576            10^6     M mega
30   40000000          1,073,741,824        10^9     G giga
40   10000000000       1,099,511,627,776    10^12    T tera
The three representations of negative numbers that have been
used in computers are twos complement, ones complement and
sign magnitude. In order to represent negative numbers, it must
be known where the "sign" bit is placed. All modern binary
computers use the leftmost bit of the computer word as a sign bit.
The examples below use a 4-bit register to show all possible
values for the three representations.
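decimal   twos complement   ones complement   sign magnitude
    0          0000              0000              0000
    1          0001              0001              0001
    2          0010              0010              0010
    3          0011              0011              0011
    4          0100              0100              0100
    5          0101              0101              0101
    6          0110              0110              0110
    7          0111              0111              0111
   -7          1001              1000              1111
   -6          1010              1001              1110
   -5          1011              1010              1101
   -4          1100              1011              1100
   -3          1101              1100              1011
   -2          1110              1101              1010
   -1          1111              1110              1001
   -8          1000          -0  1111          -0  1000
(twos complement = ones complement + 1; in sign magnitude the leftmost
 bit is the sign and the other three bits are the magnitude. In the
 last row only twos complement can represent -8; the remaining
 patterns, 1111 and 1000, are -0 in the other two representations.)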
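To get the sign magnitude, convert the magnitude (absolute value)
to binary and place a zero in the sign bit for positive, place a
one in the sign bit for negative.
To get the ones complement of a negative number, convert its
magnitude to binary, including leading zeros, then invert every
bit. 1->0, 0->1.
To get the twos complement of a negative number, get the ones
complement and add 1.
(Throw away any bits that are outside of the register)
Ones complement and sign magnitude have a negative zero, which may
seem silly, but twos complement has its own oddity: there is one
more negative value than positive values, so in a 4-bit register
-(-8) = -8 (invert 1000 to get 0111, add 1 to get 1000 again),
which is mathematically incorrect.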
NASM is installed on linux.gl.umbc.edu and can be used there.
From anywhere that you can reach the internet, log onto your
UMBC account using:
ssh your-user-id@linux.gl.umbc.edu
your-password
You should set up a directory for CMSC 313 and keep all your
course work in one directory.
e.g. mkdir cs313 # only once
cd cs313
Copy over a sample program to your directory using:
cp /afs/umbc.edu/users/s/q/squire/pub/download/hello.asm .
Assemble hello.asm using:
nasm -f elf hello.asm
Link to create an executable using:
gcc -o hello hello.o
Execute the program using:
./hello
Now look at the file hello.asm
; hello.asm a first program for nasm for Linux, Intel, gcc
;
; assemble: nasm -f elf -l hello.lst hello.asm
; link: gcc -o hello hello.o
; run: hello
; output is: Hello World
SECTION .data ; data section
msg: db "Hello World",10 ; the string to print, 10=lf or '\n'
len: equ $-msg ; "$" means "here"
; len is a value, not an address
SECTION .text ; code section
global main ; make label available to linker
main: ; standard gcc entry point
mov edx,len ; arg3, length of string to print
mov ecx,msg ; arg2, pointer to string
mov ebx,1 ; arg1, where to write, screen
mov eax,4 ; write command to int 80 hex
int 0x80 ; interrupt 80 hex, call kernel
mov ebx,0 ; exit code, 0=normal
mov eax,1 ; exit command to kernel
int 0x80 ; interrupt 80 hex, call kernel
There can be many types of data in the ".data" section:
Look at the file testdata.asm
and see the results in testdata.lst
; testdata.asm a program to demonstrate data types and values
;
; assemble: nasm -f elf -l testdata.lst testdata.asm
; link: gcc -o testdata testdata.o
; run: testdata
; Look at the list file, testdata.lst
; Note! nasm ignores the type of data and type of reserved
; space when used as memory addresses.
; You may have to use qualifiers BYTE, WORD or DWORD [dd01]
section .data ; data section
; initialized, writeable
; db for data byte, 8-bit
db01: db 255,1,17 ; decimal values for bytes
db02: db 0xff,0ABh ; hexadecimal values for bytes
db03: db 'a','b','c' ; character values for bytes
db04: db "abc" ; string value as bytes 'a','b','c'
db05: db 'abc' ; same as "abc" three bytes
db06: db "hello",13,10,0 ; "C" string including cr and lf
; dw for data word, 16-bit
dw01: dw 12345,-17,32 ; decimal values for words
dw02: dw 0xFFFF,0abcdH ; hexadecimal values for words
dw03: dw 'a','ab','abc' ; character values for words
dw04: dw "hello" ; three words, 6-bytes allocated
; dd for data double word, 32-bit
dd01: dd 123456789,-7 ; decimal values for double words
dd02: dd 0xFFFFFFFF ; hexadecimal value for double words
dd03: dd 'a' ; character value in double word
dd04: dd "hello" ; string in two double words
dd05: dd 13.27E30 ; floating point value 32-bit IEEE
; dq for data quad word, 64-bit
dq01: dq 13.27E300 ; floating point value 64-bit IEEE
; dt for data ten bytes, 80-bit floating point
dt01: dt 13.270E3000 ; floating point value 80-bit in register
section .bss ; reserve storage space
; uninitialized, writeable
s01: resb 10 ; 10 8-bit bytes reserved
s02: resw 20 ; 20 16-bit words reserved
s03: resd 30 ; 30 32-bit double words reserved
s04: resq 40 ; 40 64-bit quad words reserved
s05: resb 1 ; one more byte
SECTION .text ; code section
global main ; make label available to linker
main: ; standard gcc entry point
mov al,[db01] ; correct to load a byte
mov ah,[db01] ; correct to load a byte
mov ax,[dw01] ; correct to load a word
mov eax,[dd01] ; correct to load a double word
mov al,BYTE [db01] ; redundant, yet allowed
mov ax,[db01] ; no warning, loads two bytes
mov eax,[dw01] ; no warning, loads two words
; mov ax,BYTE [db01] ; error, size mismatch
; mov eax,WORD [dw01] ; error, size mismatch
; push BYTE [db01] ; error, cannot push a byte
push WORD [dw01] ; "push" needs to know size 2-byte
push DWORD [dd01] ; "push" needs to know size 4-byte
; push QWORD [dq01] ; error, cannot push a quad word
push DWORD [dq01+4] ; push a floating point, half of it
push DWORD [dq01] ; push other half of floating point
fld DWORD [dd05] ; floating load 32-bit
fld QWORD [dq01] ; floating load 64-bit
mov ebx,0 ; exit code, 0=normal
mov eax,1 ; exit command to kernel
int 0x80 ; interrupt 80 hex, call kernel
; end testdata.asm
Now, see the values in testdata.lst (widen your window)
1 ; testdata.asm program to demonstrate data types and values
2 ;
3 ; assemble: nasm -f elf -l testdata.lst testdata.asm
4 ; link: gcc -o testdata testdata.o
5 ; run: testdata
6 ; Look at the list file, testdata.lst
7
8 ; Note! nasm ignores the type of data and type of reserved
9 ; space when used as memory addresses.
10 ; You may have to use qualifiers BYTE, WORD or DWORD [dd01]
11
12
13 section .data ; data section
14 ; initialized, writeable
15
16 ; db for data byte, 8-bit
17 00000000 FF0111 db01: db 255,1,17 ; decimal values for bytes
18 00000003 FFAB db02: db 0xff,0ABh ; hexadecimal values for bytes
19 00000005 616263 db03: db 'a','b','c' ; character values for bytes
20 00000008 616263 db04: db "abc" ; string value as bytes 'a','b','c'
21 0000000B 616263 db05: db 'abc' ; same as "abc" three bytes
22 0000000E 68656C6C6F0D0A00 db06: db "hello",13,10,0 ; "C" string including cr and lf
23
24 ; dw for data word, 16-bit
25 00000016 3930EFFF2000 dw01: dw 12345,-17,32 ; decimal values for words
26 0000001C FFFFCDAB dw02: dw 0xFFFF,0abcdH ; hexadecimal values for words
27 00000020 6100616261626300 dw03: dw 'a','ab','abc' ; character values for words
28 00000028 68656C6C6F00 dw04: dw "hello" ; three words, 6-bytes allocated
29
30 ; dd for data double word, 32-bit
31 0000002E 15CD5B07F9FFFFFF dd01: dd 123456789,-7 ; decimal values for double words
32 00000036 FFFFFFFF dd02: dd 0xFFFFFFFF ; hexadecimal value for double words
33 0000003A 61000000 dd03: dd 'a' ; character value in double word
34 0000003E 68656C6C6F000000 dd04: dd "hello" ; string in two double words
35 00000046 AF7D2773 dd05: dd 13.27E30 ; floating point value 32-bit IEEE
36
37 ; dq for data quad word, 64-bit
38 0000004A C86BB752A7D0737E dq01: dq 13.27E300 ; floating point value 64-bit IEEE
39
40 ; dt for data ten bytes, 80-bit floating point
41 00000052 4011E5A59932D5B6F0- dt01: dt 13.270E3000 ; floating point value 80-bit in register
42 0000005B 66
43
44
45
46 section .bss ; reserve storage space
47 ; uninitialized, writeable
48
49 00000000 s01: resb 10 ; 10 8-bit bytes reserved
50 0000000A s02: resw 20 ; 20 16-bit words reserved
51 00000032 s03: resd 30 ; 30 32-bit double words reserved
52 000000AA s04: resq 40 ; 40 64-bit quad words reserved
53 000001EA s05: resb 1 ; one more byte
54
55 SECTION .text ; code section
56 global main ; make label available to linker
57 main: ; standard gcc entry point
58
59
60 00000000 A0[00000000] mov al,[db01] ; correct to load a byte
61 00000005 8A25[00000000] mov ah,[db01] ; correct to load a byte
62 0000000B 66A1[16000000] mov ax,[dw01] ; correct to load a word
63 00000011 A1[2E000000] mov eax,[dd01] ; correct to load a double word
64
65 00000016 A0[00000000] mov al,BYTE [db01] ; redundant, yet allowed
66
67 0000001B 66A1[00000000] mov ax,[db01] ; no warning, loads two bytes
68 00000021 A1[16000000] mov eax,[dw01] ; no warning, loads two words
69
70 ; mov ax,BYTE [db01] ; error, size mismatch
71 ; mov eax,WORD [dw01] ; error, size mismatch
72
73 ; push BYTE [db01] ; error, cannot push a byte
74 00000026 66FF35[16000000] push WORD [dw01] ; "push" needs to know size 2-byte
75 0000002D FF35[2E000000] push DWORD [dd01] ; "push" needs to know size 4-byte
76 ; push QWORD [dq01] ; error, cannot push a quad word
77 00000033 FF35[4E000000] push DWORD [dq01+4] ; push a floating point, half of it
78 00000039 FF35[4A000000] push DWORD [dq01] ; push other half of floating point
79 0000003F D905[46000000] fld DWORD [dd05] ; floating load 32-bit
80 00000045 DD05[4A000000] fld QWORD [dq01] ; floating load 64-bit
81
82 0000004B BB00000000 mov ebx,0 ; exit code, 0=normal
83 00000050 B801000000 mov eax,1 ; exit command to kernel
84 00000055 CD80 int 0x80 ; interrupt 80 hex, call kernel
85
86 ; end testdata.asm
The Intel 80x86 has many registers and named sub-registers.
Here are some that are used in assembly language programming
and debugging (the "dash number" gives the number of bits):
+---------------------------+ EAX extended accumulator
| EAX-32 +-----------------+| (lower part of dividend)
| | AX-16 || (quotient after division)
| |+--------+------+|| (lower part of product)
| || AH-8 | AL-8 |||
| |+--------+------+||
| +-----------------+|
+---------------------------+
+---------------------------+ EBX extended base register
| EBX-32 +-----------------+| (BX in DS segment)
| | BX-16 ||
| |+--------+------+||
| || BH-8 | BL-8 |||
| |+--------+------+||
| +-----------------+|
+---------------------------+
+---------------------------+ ECX extended counter
| ECX-32 +-----------------+| (string and loop operations)
| | CX-16 || (CX is a 16 bit counter)
| |+--------+------+||
| || CH-8 | CL-8 |||
| |+--------+------+||
| +-----------------+|
+---------------------------+
+---------------------------+ EDX extended data register
| EDX-32 +-----------------+| (DX holds the I/O port address for IN and OUT)
| | DX-16 || (remainder after divide)
| |+--------+------+|| (upper part of dividend)
| || DH-8 | DL-8 ||| (upper part of product)
| |+--------+------+||
| +-----------------+|
+---------------------------+
+---------------------------+ ESP extended stack pointer
| ESP-32 +-------------+| SP stack pointer
| | SP-16 || (used by PUSH and POP)
| +-------------+|
+---------------------------+
+---------------------------+ EBP extended base pointer
| EBP-32 +-------------+| (by convention, the stack frame pointer)
| | BP-16 || (BP in SS segment)
| +-------------+|
+---------------------------+
+---------------------------+ ESI extended source index
| ESI-32 +-------------+| SI source index
| | SI-16 || (in DS segment)
| +-------------+|
+---------------------------+
+---------------------------+ EDI extended destination index
| EDI-32 +-------------+|
| | DI-16 || (DI in ES segment)
| +-------------+|
+---------------------------+
+---------------------------+ EIP extended instruction pointer
| EIP-32 +-------------+| IP instruction pointer
| | IP-16 ||
| +-------------+|
+---------------------------+
+---------------------------+ EFLAGS extended flags
| EFLAGS-32 +-------------+| or just flags
| | FLAGS-16 || (not usable as an operand name!)
| +-------------+| (copy it with PUSHF and POPF)
+---------------------------+
For 32-bit "C" compatible programming, stop here.
+-------------+ CS code segment
| CS-16 |
+-------------+
+-------------+ SS stack segment
| SS-16 |
+-------------+
+-------------+ DS data segment
| DS-16 | (current module)
+-------------+
+-------------+ ES data segment
| ES-16 | (calling module, destination string)
+-------------+
+-------------+ FS heap segment
| FS-16 |
+-------------+
+-------------+ GS global segment
| GS-16 | (shared)
+-------------+
There are also 80-bit floating point registers ST0 .. ST7
There are also 64-bit MMX registers MM0 .. MM7
There are also control registers CR0 .. CR4
There are also debug registers DR0 .. DR3, DR6, DR7
There are also test registers TR3 .. TR7 (TR6, TR7 on the 386, TR3 .. TR7 on the 486; removed in the Pentium)
A dumb program to test register names is testreg.asm
Another dumb program, to test al, ah, ax and eax, is regeax.asm
The basic syntax for a line in NASM is:
label: opcode operand(s) ; comment
The "label" is a case sensitive user name, followed by a colon.
The label is optional and when not present, indent the opcode.
The label should start in column one of the line.
The label may be on a line by itself or followed only by a comment.
The "opcode" is not case sensitive and may be a machine instruction
or an assembler directive (pseudo operation) or a macro call.
Typically, all "opcode" fields are neatly lined up starting in the
same column. Use of "tab" is OK.
Machine instructions may be preceded by a "prefix" such as:
a16, a32, o16, o32, and others.
"operand(s)" depend on the choice of "opcode".
There may be several operands, separated by commas, or none.
Each operand may be a register name, a constant, or a memory
reference in brackets [ ], which may combine registers and
constants, e.g. [dd1+edi].
Comments are optional, yet encouraged.
Everything from the semicolon to the end of the line is
a comment, ignored by the assembler.
The semicolon may be in column one, making the entire line
a comment.
Sections or segments:
One specific assembler directive is the "section" or "SECTION"
directive. Four types of section are predefined for ELF format:
section .data ; initialized data
; writeable, not executable
; default alignment 4 bytes
section .bss ; uninitialized space for data
; writeable, not executable
; default alignment 4 bytes
section .rodata ; initialized data
; read only, not executable
; default alignment 4 bytes
section .text ; instructions (code)
; not writeable, executable
; default alignment 16 bytes
section other ; any name other than .data, .bss,
; .rodata, .text
; your stuff
; not executable, not writeable
; default alignment 1 byte
A few comments on efficiency:
My experience is that a good assembly language programmer
can make a small (about 100 lines) "C" program more
efficient than the gcc compiler. But, for larger
programs, the compiler will be more efficient.
Exceptions are, for example, the SGI IRIX cc compiler
that has super optimization for that specific machine.
For the Intel 80x86 here are some samples in nasm and from gcc
(different syntax but you should be able to recognize the instructions)
Focus on the loop; prologue and epilogue code that should
be included was omitted. Note the test has "check" values
at each end of the array. There is no range testing in
either "C" or assembly language.
A simple loop loopint.asm
Same code from gcc loopint.s
Hex machine code generated by nasm loopint.lst
Most efficient loop loopint2.asm
Same code from gcc loopint2.s
Hex machine code generated by nasm loopint2.lst
Speed consideration must take into account cache and virtual memory
performance, number of bytes transferred from RAM and clock cycles.
On modern computer architectures, predicting speed from the code alone
is almost impossible. For example, the Pentium 4 translates the 80x86
instructions into RISC-like micro-operations, so it is actually
executing instructions that are different from the assembly language.
Carefully benchmarking complete applications is about the only
conclusive measure of efficiency.
Both integer and floating point arithmetic are demonstrated.
In order to make the source code smaller, a macro is defined
to print out results. The equivalent "C" program is given as
comments.
First, see how to call the "C" library function, printf, to make
it easier to print values:
Look at the file printf1.asm
; printf1.asm print an integer from storage and from a register
; Assemble: nasm -f elf -l printf1.lst printf1.asm
; Link: gcc -o printf1 printf1.o
; Run: printf1
; Output: a=5, eax=7
; Equivalent C code
; /* printf1.c print an int and an expression */
; #include <stdio.h>
; int main()
; {
; int a=5;
; printf("a=%d, eax=%d\n", a, a+2);
; return 0;
; }
; Declare some external functions
;
extern printf ; the C function, to be called
section .data ; Data section, initialized variables
a: dd 5 ; int a=5;
fmt: db "a=%d, eax=%d", 10, 0 ; The printf format, "\n",'0'
section .text ; Code section.
global main ; the standard gcc entry point
main: ; the program label for the entry point
push ebp ; set up stack frame
mov ebp,esp
mov eax, [a] ; put "a" from store into register
add eax, 2 ; a+2
push eax ; value of a+2
push dword [a] ; value of variable a
push dword fmt ; address of format string
call printf ; Call C function
add esp, 12 ; pop stack 3 push times 4 bytes
mov esp, ebp ; take down stack frame
pop ebp ; same as "leave" op
mov eax,0 ; normal, no error, return value
ret ; return
Now, for integer arithmetic, look at the file intarith.asm
; intarith.asm show some simple C code and corresponding nasm code
; the nasm code is one sample, not unique
;
; compile: nasm -f elf -l intarith.lst intarith.asm
; link: gcc -o intarith intarith.o
; run: intarith
;
; the output from running intarith.asm and intarith.c is:
; c=5 , a=3, b=4, c=5
; c=a+b, a=3, b=4, c=7
; c=a-b, a=3, b=4, c=-1
; c=a*b, a=3, b=4, c=12
; c=c/a, a=3, b=4, c=4
;
;The file intarith.c is:
; /* intarith.c */
; #include <stdio.h>
; int main()
; {
; int a=3, b=4, c;
;
; c=5;
; printf("%s, a=%d, b=%d, c=%d\n","c=5 ", a, b, c);
; c=a+b;
; printf("%s, a=%d, b=%d, c=%d\n","c=a+b", a, b, c);
; c=a-b;
; printf("%s, a=%d, b=%d, c=%d\n","c=a-b", a, b, c);
; c=a*b;
; printf("%s, a=%d, b=%d, c=%d\n","c=a*b", a, b, c);
; c=c/a;
; printf("%s, a=%d, b=%d, c=%d\n","c=c/a", a, b, c);
; return 0;
; }
extern printf ; the C function to be called
%macro pabc 1 ; a "simple" print macro
section .data
.str db %1,0 ; %1 is first actual in macro call
section .text
; push onto stack backward
push dword [c] ; int c
push dword [b] ; int b
push dword [a] ; int a
push dword .str ; users string
push dword fmt ; address of format string
call printf ; Call C function
add esp,20 ; pop stack 5*4 bytes
%endmacro
section .data ; preset constants, writeable
a: dd 3 ; 32-bit variable a initialized to 3
b: dd 4 ; 32-bit variable b initialized to 4
fmt: db "%s, a=%d, b=%d, c=%d",10,0 ; format string for printf
section .bss ; uninitialized space
c: resd 1 ; reserve a 32-bit word
section .text ; instructions, code segment
global main ; for gcc standard linking
main: ; label
lit5: ; c=5;
mov eax,5 ; 5 is a literal constant
mov [c],eax ; store into c
pabc "c=5 " ; invoke the print macro
addb: ; c=a+b;
mov eax,[a] ; load a
add eax,[b] ; add b
mov [c],eax ; store into c
pabc "c=a+b" ; invoke the print macro
subb: ; c=a-b;
mov eax,[a] ; load a
sub eax,[b] ; subtract b
mov [c],eax ; store into c
pabc "c=a-b" ; invoke the print macro
mulb: ; c=a*b;
mov eax,[a] ; load a (must be eax for multiply)
imul dword [b] ; signed integer multiply by b
mov [c],eax ; store bottom half of product into c
pabc "c=a*b" ; invoke the print macro
diva: ; c=c/a;
mov eax,[c] ; load c
cdq ; sign-extend eax into edx, the upper half of dividend
idiv dword [a] ; divide double register edx eax by a
mov [c],eax ; store quotient into c
pabc "c=c/a" ; invoke the print macro
mov eax,0 ; exit code, 0=normal
ret ; main returned to operating system
              bbbb [mem]    a product of 32-bits times 32-bits is 64-bits
     imul     bbbb eax
         ---------
     edx bbbbbbbb eax       the upper part of the product is in edx
                            the lower part of the product is in eax

     edx bbbbbbbb eax       before divide, the upper part of dividend is in edx
                            the lower part of dividend is in eax
     idiv     bbbb [mem]    the divisor
         --------
                            after divide, the quotient is in eax
                            the remainder is in edx
Now, for floating point arithmetic, look at the file fltarith.asm
Note the many similarities to integer arithmetic, yet some basic differences.
; fltarith.asm show some simple C code and corresponding nasm code
; the nasm code is one sample, not unique
;
; compile nasm -f elf -l fltarith.lst fltarith.asm
; link gcc -o fltarith fltarith.o
; run fltarith
;
; the output from running fltarith and fltarithc is:
; c=5.0, a=3.000000e+00, b=4.000000e+00, c=5.000000e+00
; c=a+b, a=3.000000e+00, b=4.000000e+00, c=7.000000e+00
; c=a-b, a=3.000000e+00, b=4.000000e+00, c=-1.000000e+00
; c=a*b, a=3.000000e+00, b=4.000000e+00, c=1.200000e+01
; c=c/a, a=3.000000e+00, b=4.000000e+00, c=4.000000e+00
; a=i , a=8.000000e+00, b=1.600000e+01, c=1.600000e+01
;The file fltarith.c is:
; #include <stdio.h>
; int main()
; {
; double a=3.0, b=4.0, c;
; int i=8;
;
; c=5.0;
; printf("%s, a=%e, b=%e, c=%e\n","c=5.0", a, b, c);
; c=a+b;
; printf("%s, a=%e, b=%e, c=%e\n","c=a+b", a, b, c);
; c=a-b;
; printf("%s, a=%e, b=%e, c=%e\n","c=a-b", a, b, c);
; c=a*b;
; printf("%s, a=%e, b=%e, c=%e\n","c=a*b", a, b, c);
; c=c/a;
; printf("%s, a=%e, b=%e, c=%e\n","c=c/a", a, b, c);
; a=i;
; b=a+i;
; i=b;
; c=i;
; printf("%s, a=%e, b=%e, c=%e\n","a=i ", a, b, c);
; return 0;
; }
extern printf ; the C function to be called
%macro pabc 1 ; a "simple" print macro
section .data
.str db %1,0 ; %1 is macro call first actual parameter
section .text
; push onto stack backwards
push dword [c+4] ; double c (upper 32 bits)
push dword [c] ; double c (lower 32 bits)
push dword [b+4] ; double b (upper 32 bits)
push dword [b] ; double b (lower 32 bits)
push dword [a+4] ; double a (upper 32 bits)
push dword [a] ; double a (lower 32 bits)
push dword .str ; users string
push dword fmt ; address of format string
call printf ; Call C function
add esp,32 ; pop stack 8*4 bytes
%endmacro
section .data ; preset constants, writeable
a: dq 3.0 ; 64-bit variable a initialized to 3.0
b: dq 4.0 ; 64-bit variable b initialized to 4.0
i: dd 8 ; a 32-bit integer
five: dq 5.0 ; constant 5.0
fmt: db "%s, a=%e, b=%e, c=%e",10,0 ; format string for printf
section .bss ; uninitialized space
c: resq 1 ; reserve a 64-bit word
section .text ; instructions, code segment
global main ; for gcc standard linking
main: ; label
lit5: ; c=5.0;
fld qword [five] ; 5.0 constant
fstp qword [c] ; store into c
pabc "c=5.0" ; invoke the print macro
addb: ; c=a+b;
fld qword [a] ; load a (pushed on flt pt stack, st0)
fadd qword [b] ; floating add b (to st0)
fstp qword [c] ; store into c (pop flt pt stack)
pabc "c=a+b" ; invoke the print macro
subb: ; c=a-b;
fld qword [a] ; load a (pushed on flt pt stack, st0)
fsub qword [b] ; floating subtract b (to st0)
fstp qword [c] ; store into c (pop flt pt stack)
pabc "c=a-b" ; invoke the print macro
mulb: ; c=a*b;
fld qword [a] ; load a (pushed on flt pt stack, st0)
fmul qword [b] ; floating multiply by b (to st0)
fstp qword [c] ; store product into c (pop flt pt stack)
pabc "c=a*b" ; invoke the print macro
diva: ; c=c/a;
fld qword [c] ; load c (pushed on flt pt stack, st0)
fdiv qword [a] ; floating divide by a (to st0)
fstp qword [c] ; store quotient into c (pop flt pt stack)
pabc "c=c/a" ; invoke the print macro
intflt: ; a=i;
fild dword [i] ; load integer as floating point
fst qword [a] ; store the floating point (no pop)
fadd st0 ; b=a+i; 'a' as 'i' already on flt stack
fst qword [b] ; store sum (no pop) 'b' still on stack
fistp dword [i] ; i=b; store floating point as integer
fild dword [i] ; c=i; load again from ram (redundant)
fstp qword [c]
pabc "a=i " ; invoke the print macro
mov eax,0 ; exit code, 0=normal
ret ; main returns to operating system
Refer to nasmdoc.txt or textbook 10.4 for details.
A brief summary is provided here.
"reg" is an 8-bit, 16-bit or 32-bit register (a memory operand is also allowed)
"count" is the number of bits to shift: a constant or the CL register
"right" moves contents of the register to the right; for SHR (and SAR on a
non-negative value) each bit shifted divides by 2
"left" moves contents of the register to the left; each bit shifted
multiplies by 2, until bits are lost off the left end
SAL reg,count shift arithmetic left (same as SHL)
SAR reg,count shift arithmetic right (sign extension)
SHL reg,count shift left (logical, zero fill)
SHR reg,count shift right (logical, zero fill)
ROL reg,count rotate left
ROR reg,count rotate right
SHLD reg1,reg2,count shift left double-register (bits shifted in come from reg2)
SHRD reg1,reg2,count shift right double-register (bits shifted in come from reg2)
An example of using the various shifts is in: shift.asm
See www.csee.umbc.edu/help/nasm/nasm.shtml for notes on using the debugger.
A program that prints where its sections are allocated
(in virtual memory) is where.asm
; where.asm print addresses of sections
; Assemble: nasm -f elf -l where.lst where.asm
; Link: gcc -o where where.o
; Run: where
; Output: you need to run it, on my computer
; data a: at 8049424
; bss b: at 804956C
; rodata c: at 804840C
; code main: at 8048330
extern printf ; the C function, to be called
section .data ; Data section, initialized variables
a: db 0,1,2,3,4,5,6,7
fmt: db "data a: at %X",10
db "bss b: at %X",10
db "rodata c: at %X",10
db "code main: at %X",10,0
section .bss ; reserved storage, uninitialized
b: resb 8
section .rodata ; read only initialized storage
c: db 7,6,5,4,3,2,1,0
section .text ; Code section.
global main ; the standard gcc entry point
main: ; the program label for the entry point
push ebp
mov ebp,esp
push ebx
; push the printf arguments last to first
lea edx,[main] ; load effective address of main
push edx
lea ecx,[c]
push ecx
lea ebx,[b]
push ebx
lea eax,[a] ; load effective address of [a]
push eax
push dword fmt ; address of format string
call printf ; Call C function
add esp, 20 ; pop stack 5 push times 4 bytes
pop ebx
mov esp,ebp
pop ebp
mov eax,0 ; normal, no error, return value
ret ; return
The basic integer compare instruction is "cmp"
Following this instruction is typically one of:
JL label ; jump on less than "<"
JLE label ; jump on less than or equal "<="
JG label ; jump on greater than ">"
JGE label ; jump on greater than or equal ">="
JE label ; jump on equal "=="
JNE label ; jump on not equal "!="
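After many integer arithmetic instructions:
JZ label ; jump on zero
JNZ label ; jump on non zero
JS label ; jump on sign (result is negative)
JNS label ; jump on not sign (result is zero or positive)
Note: 'cmp' sets the flags exactly as 'sub' would, but does not
change its first operand. After a compare, use the signed jumps
JL, JLE, JG, JGE, which account for overflow, rather than testing
the sign with JS or JNS: overflow on subtraction can invert the sign.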
Convert a "C" 'if' statement to nasm assembly ifint.asm
The significant features are:
1) use a compare instruction for the test
2) put a label on the start of the false branch (e.g. false1:)
3) put a label after the end of the 'if' statement (e.g. exit1:)
4) choose a conditional jump that goes to the false part
5) put an unconditional jump to (e.g. exit1:) at the end of the true part
; ifint.asm code ifint.c for nasm
; /* ifint.c an 'if' statement that will be coded for nasm */
; #include <stdio.h>
; int main()
; {
; int a=1;
; int b=2;
; int c=3;
; if(a<b)
; printf("true a < b \n");
; else
; printf("wrong on a < b \n");
; if(b>c)
; printf("wrong on b > c \n");
; else
; printf("false b > c \n");
; return 0;
;}
; result of executing both "C" and assembly is:
; true a < b
; false b > c
global main ; define for linker
extern printf ; tell linker we need this C function
section .data ; Data section, initialized variables
a: dd 1
b: dd 2
c: dd 3
fmt1: db "true a < b ",10,0
fmt2: db "wrong on a < b ",10,0
fmt3: db "wrong on b > c ",10,0
fmt4: db "false b > c ",10,0
section .text
main: mov eax,[a]
cmp eax,[b]
jge false1 ; choose jump to false part
; a < b sign is set
push dword fmt1 ; printf("true a < b \n");
call printf
add esp,4
jmp exit1 ; jump over false part
false1: ; a < b is false
push dword fmt2 ; printf("wrong on a < b \n");
call printf
add esp,4
exit1: ; finished 'if' statement
mov eax,[b]
cmp eax,[c]
jle false2 ; choose jump to false part
; b > c sign is not set
push dword fmt3 ; printf("wrong on b > c \n");
call printf
add esp,4
jmp exit2 ; jump over false part
false2: ; b > c is false
push dword fmt4 ; printf("false b > c \n");
call printf
add esp,4
exit2: ; finished 'if' statement
mov eax,0 ; normal, no error, return value
ret ; return 0;
Convert a "C" loop to nasm assembly loopint.asm
The significant features are:
1) "C" int is 4-bytes, thus dd1[1] becomes dword [dd1+4]
dd1[99] becomes dword [dd1+4*99]
2) "C" int is 4-bytes, thus dd1[i]; i++; becomes add edi,4
since "i" is never stored, the register edi holds "i"
3) the 'cmp' instruction sets flags that control the jump instruction.
cmp edi,4*99 is like i<99
jne loop1 (same as jnz) jumps if register edi is not 4*99
; loopint.asm code loopint.c for nasm
; /* loopint.c a very simple loop that will be coded for nasm */
; #include <stdio.h>
; int main()
; {
; int dd1[100];
; int i;
; dd1[0]=5; /* be sure loop stays 1..98 */
; dd1[99]=9;
; for(i=1; i<99; i++) dd1[i]=7;
; printf("dd1[0]=%d, dd1[1]=%d, dd1[98]=%d, dd1[99]=%d\n",
; dd1[0], dd1[1], dd1[98],dd1[99]);
; return 0;
;}
section .bss
dd1: resd 100
i: resd 1 ; actually unused, kept in register
section .text
global main
main: mov dword [dd1],5 ; dd1[0]=5;
mov dword [dd1+99*4],9 ; dd1[99]=9;
mov edi,4 ; i=1; /* 4 bytes */
loop1: mov dword [dd1+edi],7 ; dd1[i]=7;
add edi,4 ; i++; /* 4 bytes */
cmp edi,4*99 ; i<99
jne loop1 ; loop until i=99
extern printf ; the C function, to be called
section .data ; Data section, initialized variables
fmt: db "dd1[0]=%d, dd1[1]=%d, dd1[98]=%d, dd1[99]=%d",10,0
section .text ; Code section, continued
push dword [dd1+99*4] ; dd1[99]
push dword [dd1+98*4] ; dd1[98]
push dword [dd1+4] ; dd1[1]
push dword [dd1] ; dd1[0]
push dword fmt ; address of format string
call printf ; Call C function
add esp, 20 ; pop stack 5 push times 4 bytes
mov eax,0 ; normal, no error, return value
ret ; return 0;
; note: edi is callee-saved in the i386 ABI, so a complete
; program would save and restore it (prologue/epilogue omitted)
Previously, integer arithmetic in "C" was converted to
NASM assembly language. The following is very similar
(cut and paste) of intarith.asm to intlogic.asm that
shows the "C" operators "&" and, "|" or, "^" xor, "~" not.
intlogic.asm
One significant use of loops is to evaluate polynomials and
convert numbers from one base to another.
(Yes, this is related to project 1 for CMSC 313)
The following program has eight loops.
Loop1 (h1loop) uses Horner's method to convert ASCII decimal digits
to binary, using a sentinel, '.', with 'cmp' and 'je'
to exit the loop.
Loop2 (h2loop) uses Horner's method to convert ASCII decimal digits
to binary, using 'edi' as an index, 'ecx' and 'loop' to do the loop.
Loop3 (h3loop) uses Horner's method to evaluate a polynomial,
using 'edi' as an index, 'ecx' and 'loop' to do the loop.
Loop4 (h4loop) uses Horner's method, with data order optimized,
using 'ecx' as both index and loop counter, to get a
three instruction loop.
Loop5 (h5loop) uses Horner's method to evaluate a polynomial
using double precision floating point. Note the 8-byte
element size (index scaled by 8) and the quad word passed to printf.
Loop6 (h6loop) uses Horner's method to evaluate the fractional
part of a double precision floating point polynomial.
Note that divide is used in place of multiply and the
least significant coefficient is used first.
Loop7 (h7loop) uses Horner's method to convert an ASCII decimal
fraction to binary. Note that shifting is needed
because the binary point cannot be at the right end
of the word.
Loop8 (h8loop) just prints 16 bits from the result of Loop7
as ASCII characters.
Study horner.asm to understand
the NASM coding of the loops.
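Before reading the NASM, it may help to see the integer loops (h1loop and h3loop) in "C". This is a sketch under stated assumptions: decimal_to_int and horner are hypothetical names, int is 32 bits, and no intermediate value overflows (the assembly likewise ignores edx after imul):

```c
/* h1loop in "C": value = value*10 + digit until the '.' sentinel.
   Assumes the string is '0'..'9' digits followed by '.'. */
int decimal_to_int(const char *digits)
{
    int value = digits[0] - '0';              /* first ASCII digit    */
    for (int i = 1; digits[i] != '.'; i++)    /* stop at the sentinel */
        value = value * 10 + (digits[i] - '0');
    return value;
}

/* h3loop in "C": a[0] is a_n (most significant), a[n] is a_0.
   Computes y_i = y_i-1 * x + a_n-i for i = 1..n. */
int horner(const int *a, int n, int x)
{
    int y = a[0];
    for (int i = 1; i <= n; i++)
        y = y * x + a[i];
    return y;
}
```

decimal_to_int("5280.") gives 5280, and horner on {2,5,-7,22,-9} with n=4 and x=7 gives 6319, matching the first and third lines of the program output.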
; horner.asm Horners method of evaluating polynomials
;
; given a polynomial Y = a_n X^n + a_n-1 X^n-1 + ... a_1 X + a_0
; a_n is the coefficient 'a' with subscript n. X^n is X to nth power
; compute y_1 = a_n * X + a_n-1
; compute y_2 = y_1 * X + a_n-2
; compute y_i = y_i-1 * X + a_n-i i=3..n
; thus y_n = Y = value of polynomial
;
; in assembly language:
; load some register with a_n, multiply by X
; add a_n-1, multiply by X, add a_n-2, multiply by X, ...
; finishing with the add a_0
;
; for conversion of decimal to binary, X=10
;
extern printf
section .data
decdig: db '5','2','8','0','.' ; decimal integer 5280
fmt: db "%d",10,0
global main
section .text
main: push ebp ; save ebp
mov ebp,esp ; ebp is callers stack
push ebx
push edi ; save registers
; method 1, using a "sentinel" e.g. '.'
mov eax,0 ; accumulate value here
mov al,[decdig] ; get first ASCII digit
sub al,48 ; convert ASCII digit to binary
mov edi,1 ; subscript initialization
h1loop: mov ebx,0 ; clear register (upper part)
mov bl,[decdig+edi] ; get next ASCII digit
cmp bl,'.' ; compare to decimal point
je h1fin ; exit loop on decimal point
sub bl,48 ; convert ASCII digit to binary
imul eax,10 ; * X (ignore edx)
add eax,ebx ; + a_n-i
inc edi ; increment subscript
jmp h1loop
h1fin:
push dword eax ; print eax
push dword fmt ; format %d
call printf
add esp,8 ; restore stack
; method 2, using a count
mov eax,0 ; accumulate value here
mov al,[decdig] ; get first ASCII digit
sub al,48 ; convert ASCII digit to binary
mov edi,1 ; subscript initialization
mov ecx,3 ; loop iteration count initialization
h2loop: mov ebx,0 ; clear register (upper part)
mov bl,[decdig+edi] ; get next ASCII digit
sub bl,48 ; convert ASCII digit to binary
imul eax,10 ; * X (ignore edx)
add eax,ebx ; + a_n-i
inc edi ; increment subscript
loop h2loop ; decrement ecx, jump on non zero
push dword eax ; print eax
push dword fmt ; format %d
call printf
add esp,8 ; restore stack
; evaluate a polynomial, X=7, using a count
section .data
a: dd 2,5,-7,22,-9 ; coefficients of polynomial, a_n first
X: dd 7
section .text
mov eax,[a] ; accumulate value here, get coefficient a_n
mov edi,1 ; subscript initialization
mov ecx,4 ; loop iteration count initialization, n
h3loop: imul eax,[X] ; * X (ignore edx)
add eax,[a+4*edi] ; + a_n-i
inc edi ; increment subscript
loop h3loop ; decrement ecx, jump on non zero
push dword eax ; print eax
push dword fmt ; format %d
call printf
add esp,8 ; restore stack
; evaluate a polynomial, X=7, using a count as index
; optimal organization of data allows a three instruction loop
section .data
aa: dd -9,22,-7,5,2 ; coefficients of polynomial, a_0 first
section .text
mov eax,[aa+4*4] ; accumulate value here, get coefficient a_n
mov ecx,4 ; loop iteration count initialization, n
h4loop: imul eax,[X] ; * X (ignore edx)
add eax,[aa+4*ecx-4]; + aa_n-i
loop h4loop ; decrement ecx, jump on non zero
push dword eax ; print eax
push dword fmt ; format %d
call printf
add esp,8 ; restore stack
; evaluate a double floating polynomial, X=7.0, using a count as index
; optimal organization of data allows a three instruction loop
section .data
af: dq -9.0,22.0,-7.0,5.0,2.0 ; coefficients of polynomial, a_0 first
XF: dq 7.0
Y: dq 0.0
N: dd 4
fmtflt: db "%e",10,0
section .text
mov ecx,[N] ; loop iteration count initialization, n
fld qword [af+8*ecx]; accumulate value here, get coefficient a_n
h5loop: fmul qword [XF] ; * XF
fadd qword [af+8*ecx-8] ; + aa_n-i
loop h5loop ; decrement ecx, jump on non zero
fstp qword [Y] ; store Y in order to print Y
push dword [Y+4] ; print Y (must be two parts of quadword)
push dword [Y] ; print Y
push dword fmtflt ; format %e
call printf
add esp,12 ; restore stack
; Convert the fractional polynomial, Y = a_-1 X^-1 + a_-2 X^-2 + ... + a_-n X^-n
; This must be performed using divide, starting with the least
; significant coefficient a_-n:
; compute y_1 = a_-n / X + a_-(n-1)
; compute y_i = y_i-1 / X + a_-(n-i) i=2..n-1
; thus Y = y_n-1 / X = value of polynomial (one extra divide)
; Using the coefficients above, a_-1 = -9.0 (first),
; a_-2 = 22.0, a_-3 = -7.0, a_-4 = 5.0, a_-5 = 2.0, so n=5
; N=4 (not 5) because the first term is loaded outside the loop
mov ecx,[N] ; loop iteration count initialization, n-1
fld qword [af+8*ecx]; accumulate value here, get a_-5 = 2.0
h6loop: fdiv qword [XF] ; / XF
fadd qword [af+8*ecx-8] ; + a_-(n-i)
loop h6loop ; decrement ecx, jump on non zero
fdiv qword [XF] ; extra divide for fractional terms
fstp qword [Y] ; store Y in order to print Y
push dword [Y+4] ; print Y (must be two parts of quadword)
push dword [Y] ; print Y
push dword fmtflt ; format %e
call printf
add esp,12 ; restore stack
; Convert the fractional part, a_-1 X^-1 + a_-2 X^-2 + ...
; This must be performed using "fixed point" arithmetic.
; The implied binary point is 16-bits from LSB.
section .data
fracdig:db '.','1','2','3','4' ; decimal fraction .1234
fmth: db "%X",10,0
ten: dd 10
eaxsave:dd 0
global h7loop ; for debugging loop "break h7loop"
section .text
mov eax,0 ; accumulate value here
mov al,[fracdig+4] ; get last ASCII digit
sub al,48 ; convert ASCII digit to binary
shl eax,16 ; move binary point
mov ecx,3 ; loop iteration count initialization
h7loop: mov edx,0 ; must clear upper dividend
idiv dword [ten] ; quotient in eax
mov ebx,0 ; clear register (upper part)
mov bl,[fracdig+ecx]; get the previous ASCII digit
sub bl,48 ; convert ASCII digit to binary
shl ebx,16 ; move binary point 16-bits
add eax,ebx ; + a_n-i
loop h7loop ; decrement ecx, jump on non zero
mov edx,0 ; must clear upper dividend
idiv dword [ten] ; final divide
mov [eaxsave],eax ; save eax, printf destroys it
push dword eax ; print eax
push dword fmth ; format %X (look at low 16-bits)
call printf
add esp,8 ; restore stack
; print the bits in eaxsave:
section .bss
abits: resb 17 ; 16 characters plus zero terminator
section .data
fmts: db "%s",10,0
section .text
mov eax,[eaxsave] ; restore eax
ror eax,1 ; get bottom bit in top of eax
mov ecx,16 ; for printing 16 bits
h8loop: mov edx,0 ; clear edx ready for a bit
shld edx,eax,1 ; top bit of eax into edx
add edx,48 ; make it ASCII
mov [abits+ecx-1],dl ; store character
ror eax,1 ; next bit into top of eax
loop h8loop ; decrement ecx, jump non zero
mov byte [abits+16],0 ; end of "C" string
push dword abits ; string to print
push dword fmts ; "%s"
call printf
add esp,8
pop edi
pop ebx
mov esp,ebp ; restore callers stack frame
pop ebp
ret ; return
; output from execution:
; 5280
; 5280
; 6319
; 6319
; 6.319000e+03
; -8.549414e-01
; 1F97
; 0001111110010111
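The fractional loops can be cross-checked in "C". This is a sketch with hypothetical names: horner_fraction follows h6loop, fraction_to_fixed16 follows h7loop, and low16_bits follows h8loop. fraction_to_fixed16 uses unsigned division where horner.asm uses idiv; the result is the same here because every value is positive and edx is cleared before each divide:

```c
#include <stdint.h>

/* h6loop: Y = a_-1/x + a_-2/x^2 + ... + a_-n/x^n with af[0] = a_-1,
   ..., af[n-1] = a_-n. Start from the least significant coefficient. */
double horner_fraction(const double *af, int n, double x)
{
    double y = af[n - 1];
    for (int i = n - 2; i >= 0; i--)
        y = y / x + af[i];
    return y / x;                       /* the extra final divide */
}

/* h7loop: ASCII fraction ".dddd" to fixed point with the binary point
   16 bits from the LSB. fracdig[0] is the '.', ndig digits follow. */
uint32_t fraction_to_fixed16(const char *fracdig, int ndig)
{
    uint32_t v = (uint32_t)(fracdig[ndig] - '0') << 16;  /* last digit */
    for (int i = ndig - 1; i >= 1; i--)
        v = v / 10 + ((uint32_t)(fracdig[i] - '0') << 16);
    return v / 10;                      /* final divide */
}

/* h8loop: the low 16 bits as '0'/'1' characters, most significant first. */
void low16_bits(uint32_t v, char out[17])
{
    for (int i = 0; i < 16; i++)
        out[15 - i] = (char)('0' + ((v >> i) & 1));
    out[16] = '\0';
}
```

With the data of horner.asm these give about -0.8549414, 0x1F97 and "0001111110010111", the last three lines of the output above. As a sanity check, .1234 * 65536 is about 8087.1, and 8087 is 1F97 hex.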
Here is a basic subroutine (function, procedure, etc)
Note the use of the stack pointer for passing parameters.
Note saving and restoring the callers registers.
(Yes, this is needed for CMSC 313 project 2)
"call1" below, is called by the "C" program test_call1.c
/* test_call1.c test call1.asm */
#include <stdio.h>
void call1(int L[]); /* defined in call1.asm */
int main()
{
int L[2];
L[0]=1;
L[1]=2;
call1(L);
printf("L[0]=%d, L[1]=%d \n", L[0], L[1]);
return 0;
}
The result is L[0]=0, L[1]=0, from the following:
; call1.asm a basic structure for a subroutine to be called from "C"
;
; This saves more registers than used here
; Parameters: int L[] or int *L
; Result: L[0]=0 L[1]=0
global call1 ; linker must know name of subroutine
call1: ; name must appear as a nasm label
push ebp ; save ebp
mov ebp, esp ; ebp is callers stack
push ebx ; save registers
push edi
push esi
mov edi,[ebp+8] ; get address of L into edi
mov eax,0 ; get a 32-bit zero
mov [edi],eax ; L[0]=0
add edi,4 ; add one dword=32-bit int
mov [edi],eax ; L[1]=0
pop esi ; restore registers
pop edi ; in reverse order
pop ebx
mov esp,ebp ; restore callers stack frame
pop ebp
ret ; return
;
; Notes about the caller's stack, ebp in our code:
; ebp+8 holds the last value pushed by the caller,
; which is our first argument ("C" pushes arguments right to left)
; ebp+12 would be our second argument, etc. +4 each
; the arguments can be values or addresses,
; as defined by the "C" function prototypes
; ebp+4 is the return address in the caller, used by 'ret'
; ebp itself (our esp just after 'push ebp') holds the caller's saved ebp;
; the registers we save are below it, at ebp-4, ebp-8, ...
Study call1.asm
Now, to pass more arguments, call2.c
can be implemented as call2.asm.
Note: arrays, including strings, are passed by address;
scalar values are passed by value.
; call2.asm code loopint.c as subroutine (void function)
; /* call2.c a very simple loop that will be coded for nasm */
; #include <stdio.h>
; void call2(int *A, int start, int end, int value);
; int main()
; {
; int dd1[100];
; int i;
; dd1[0]=5; /* be sure loop stays 1..98 */
; dd1[1]=6;
; dd1[98]=8;
; dd1[99]=9;
; call2(dd1,1,98,7); /* fill dd1[1] thru dd1[98] with 7 */
; printf("dd1[0]=%d, dd1[1]=%d, dd1[98]=%d, dd1[99]=%d\n",
; dd1[0], dd1[1], dd1[98],dd1[99]);
; return 0;
; }
; void call2(int *A, int start, int end, int value)
; {
; int i;
;
; for(i=start; i<=end; i++) A[i]=value;
; }
; execution output is dd1[0]=5, dd1[1]=7, dd1[98]=7, dd1[99]=9
section .bss
i: resd 1 ; actually unused, kept in register
section .text
global call2 ; linker must know name of subroutine
call2: ; name must appear as a nasm label
push ebp ; save ebp
mov ebp, esp ; ebp is callers stack
push ebx ; save registers
push edi
mov edi,[ebp+8] ; get address of A into edi
mov eax,[ebp+12] ; get value of start
mov ebx,[ebp+16] ; get value of end
mov edx,[ebp+20] ; get value of value
loop1: mov [4*eax+edi],edx ; A[i]=value;
add eax,1 ; i++;
cmp eax,ebx ; i<=end
jle loop1 ; loop while i<=end, fall through when false
pop edi ; in reverse order
pop ebx
mov esp,ebp ; restore callers stack frame
pop ebp
ret ; return
;
; Notes about the caller's stack, ebp in our code:
; ebp+8 holds the last value pushed by the caller,
; this is our first argument, the address of A.
; ebp+12 is our second argument, 'start', a value.
; ebp+16 is our third argument, 'end', a value.
; ebp+20 is our fourth argument, 'value', a value.
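The argument offsets follow from the order of the pushes. The sketch below is a model in "C", not real machine code: it assumes a 32-bit stack where every push stores 4 bytes at a lower address, and Stack, push, at and model_call are hypothetical names chosen for illustration:

```c
#include <stdint.h>

/* Memory is modeled as 32-bit words indexed by byte address / 4. */
enum { WORDS = 64 };

typedef struct {
    uint32_t mem[WORDS];
    uint32_t esp;                 /* byte address, stack grows down */
    uint32_t ebp;
} Stack;

static void push(Stack *s, uint32_t v)
{
    s->esp -= 4;                  /* move down first, then store */
    s->mem[s->esp / 4] = v;
}

static uint32_t at(const Stack *s, uint32_t addr)   /* like [addr] */
{
    return s->mem[addr / 4];
}

/* Caller pushes the arguments right to left, 'call' pushes the return
   address, then the callee does: push ebp / mov ebp,esp */
void model_call(Stack *s, const uint32_t *args, int nargs, uint32_t ret_addr)
{
    for (int i = nargs - 1; i >= 0; i--)
        push(s, args[i]);
    push(s, ret_addr);            /* call call2    */
    push(s, s->ebp);              /* push ebp      */
    s->ebp = s->esp;              /* mov ebp, esp  */
}
```

For call2(dd1,1,98,7), with a stand-in number for the address of dd1, the model gives: ebp+8 holds the address, ebp+12 holds 1, ebp+16 holds 98, ebp+20 holds 7, ebp+4 holds the return address and ebp holds the caller's saved ebp.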
A simple function, called and written in the same .asm file
intfunc.asm
; intfunc.asm call integer function int sum(int x, int y)
;
; compile: nasm -f elf intfunc.asm
; link: gcc -o intfunc intfunc.o
; run: intfunc
; result: 5 = sum(2,3)
extern printf
section .data
x: dd 2
y: dd 3
z: dd 1
fmt: db "%d = sum(%d,%d)",10,0
section .text
global main
main: push ebp
mov ebp,esp
push ebx
push dword [y] ; push arguments for sum
push dword [x]
call sum ; coded below
add esp,8
mov [z],eax ; save result from sum
push dword [y] ; print
push dword [x]
push dword [z]
push dword fmt
call printf
add esp,16
pop ebx
mov esp,ebp
pop ebp
mov eax,0
ret
; end main
sum: push ebp ; function int sum(int x, int y)
mov ebp,esp
push ebx
mov eax,[ebp+8] ; get argument x
mov ebx,[ebp+12] ; get argument y
add eax,ebx ; x+y with result in eax
pop ebx
mov esp,ebp
pop ebp
ret ; return value in eax
; end of function int sum(int x, int y)
A simple demonstration of using a double sin(double x) function
from the "C" math.h fltfunc.asm
; fltfunc.asm call math routine double sin(double x)
;
; compile: nasm -f elf fltfunc.asm
; link: gcc -o fltfunc fltfunc.o -lm # needs math library
; run: fltfunc
;
extern sin ; be sure to 'extern' functions
extern printf
section .data
x: dq 0.7853975 ; Pi/4 = 45 degrees
y: dq 1.0 ; should be about 7.07E-1
fmt: db "%e = sin(%e)",10,0
section .text
global main
main: push ebp
mov ebp,esp
push ebx
push dword [x+4] ; push quad word (double) for sin
push dword [x]
call sin ; call the library sin function
add esp,8
fstp qword [y] ; save the return value
push dword [x+4] ; print
push dword [x]
push dword [y+4]
push dword [y]
push dword fmt
call printf
add esp,20
pop ebx
mov esp,ebp
pop ebp
mov eax,0
ret
; result: 7.071063e-01 = sin(7.853975e-01)
; end fltfunc.asm
And a final example of a simple recursive function, factorial,
written in unoptimized assembly language following the "C" code.
test_factorial.asm
test_factorial.c
; test_factorial.asm based on test_factorial.c
; /* test_factorial.c the simplest example of a recursive function */
; /* a recursive function is a function that calls itself */
; static int factorial(int n) /* n! is n factorial = 1*2*3*...*(n-1)*n */
; {
; if( n <= 1 ) return 1; /* must have a way to stop recursion */
; return n * factorial(n-1); /* factorial calls factorial with n-1 */
; } /* n * (n-1) * (n-2) * ... * (1) */
; #include <stdio.h>
; int main()
; {
; printf(" 0!=%d \n", factorial(0)); /* Yes, 0! is one */
; printf(" 1!=%d \n", factorial(1));
; ...
; printf("18!=%d \n", factorial(18)); /* wrong, uncaught in C */
; return 0;
; }
; /* output of execution is:
; 0!=1
; 1!=1
; ...
; 12!=479001600
; 13!=1932053504 wrong! 13! = 12! * 13, must end in two zeros
; 14!=1278945280 wrong! and no indication!
; 15!=2004310016 wrong!
; 16!=2004189184 wrong!
; 17!=-288522240 wrong and obvious if you check your results
; 18!=-898433024 Only sometimes does integer overflow go negative
; */
;
; compile: nasm -f elf test_factorial.asm
; link: gcc -o test_factorial test_factorial.o
; run: test_factorial
section .bss
tmp: resd 1 ; over written each call
section .text
factorial: ; not global is 'static' in "C"
push ebp ; function int factorial(int n)
mov ebp,esp
push ebx
mov eax,[ebp+8] ; get argument n
cmp eax,1 ; compare for exit
jle exitf ; go return a 1
sub eax,1 ; n-1
push dword eax ; compute factorial(n-1)
call factorial
pop edx ; get back our "n-1"
add edx,1 ; have our "n"
mov [tmp],edx
imul eax,[tmp] ; n * factorial(n-1) in eax
jmp returnf
exitf: mov eax,1
returnf:
pop ebx
mov esp,ebp
pop ebp
ret ; return value in eax
; end of function static int factorial(int n)
extern printf
section .data
n: dd 0 ; initial, will count to 18
fmt: db "%d! = %d",10,0 ; simple format
section .bss
nfact: resd 1 ; just a place to hold result
section .text
global main
main: push ebp
mov ebp,esp
push ebx
loop1:
push dword [n] ; push arguments for factorial
call factorial ; coded above
add esp,4
mov [nfact],eax ; save result from factorial
push dword [nfact] ; print
push dword [n]
push dword fmt
call printf
add esp,12
mov eax,[n]
inc eax
mov [n],eax
cmp eax,18
jle loop1 ; print factorial 0..18
pop ebx
mov esp,ebp
pop ebp
mov eax,0
ret
; end main
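The wrong values listed above are the true factorials reduced modulo 2^32, because imul keeps only the low 32 bits. A sketch to reproduce them in "C" (factorial32 and factorial64 are hypothetical names); it multiplies in unsigned arithmetic since signed overflow is undefined behavior in "C", then reinterprets the bits as a signed int, which gcc does as two's complement:

```c
#include <stdint.h>

/* What factorial() in test_factorial.asm computes: each product is
   truncated to 32 bits, like imul, then read as a signed int. */
int32_t factorial32(int n)
{
    uint32_t f = 1;
    for (int i = 2; i <= n; i++)
        f *= (uint32_t)i;            /* wraps modulo 2^32 */
    return (int32_t)f;               /* two's complement reinterpretation */
}

/* The exact value for comparison; fits in 64 bits up to 20!. */
uint64_t factorial64(int n)
{
    uint64_t f = 1;
    for (int i = 2; i <= n; i++)
        f *= (uint64_t)i;
    return f;
}
```

For example 13! = 6227020800, and 6227020800 - 4294967296 = 1932053504, the value printed for 13!.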
A sample of a basic stand alone bootable program is boot1.asm
; boot1.asm stand alone program for floppy boot sector
; Compiled using nasm -f bin boot1.asm
; Written to floppy with dd if=boot1 of=/dev/fd0
; Boot record is loaded at 0000:7C00,
ORG 7C00h
; load message address into SI register:
LEA SI,[msg]
; screen function:
MOV AH,0Eh
print: MOV AL,[SI]
CMP AL,0
JZ done ; zero byte at end of string
INT 10h ; write character to screen.
INC SI
JMP print
; wait for 'any key':
done: MOV AH,0
INT 16h ; waits for key press
; AL is ASCII code or zero
; AH is keyboard code
; store magic value at 0040h:0072h to reboot:
; 0000h - cold boot.
; 1234h - warm boot.
MOV AX,0040h
MOV DS,AX
MOV word[0072h],0000h ; cold boot.
JMP 0FFFFh:0000h ; reboot!
msg DB 'Welcome, I have control of the computer.',13,10
DB 'Press any key to reboot.',13,10
DB '(after removing the floppy)',13,10,0
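; pad to 510 bytes, then the boot signature 55h AAh the BIOS checks for:
TIMES 510-($-$$) DB 0
DW 0AA55h
; end boot1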
This program could be extended to find or verify the keycodes
that are input (not all keys have ASCII codes).
One keyboard has the following ASCII and keycodes ascii.txt
American Standard Code for Information Interchange, ASCII
(with keycodes for a particular 104 key keyboard)
dec is decimal value
hex is 8-bit hexadecimal value
key is 104-key PC keyboard keycode in hexadecimal
type means how to type character (shift not shown) C- for hold control down
def is control character definition, e.g. LF line feed, FF form feed,
CR carriage return, BS back space,
dec hex key type def dec hex key type dec hex key type dec hex key type
0 00 13 C-@ NULL 32 20 5E space 64 40 13 @ 96 60 11 `
1 01 3C C-A SOH 33 21 12 ! 65 41 3C A 97 61 3C a
2 02 50 C-B STX 34 22 46 " 66 42 50 B 98 62 50 b
3 03 4E C-C ETX 35 23 14 # 67 43 4E C 99 63 4E c
4 04 3E C-D EOT 36 24 15 $ 68 44 3E D 100 64 3E d
5 05 29 C-E ENQ 37 25 16 % 69 45 29 E 101 65 29 e
6 06 3F C-F ACK 38 26 18 & 70 46 3F F 102 66 3F f
7 07 40 C-G BEL 39 27 46 ' 71 47 40 G 103 67 40 g
8 08 41 C-H BS 40 28 1A ( 72 48 41 H 104 68 41 h
9 09 2E C-I HT 41 29 1B ) 73 49 2E I 105 69 2E i
10 0A 42 C-J LF 42 2A 19 * 74 4A 42 J 106 6A 42 j
11 0B 43 C-K VT 43 2B 1D + 75 4B 43 K 107 6B 43 k
12 0C 44 C-L FF 44 2C 53 , 76 4C 44 L 108 6C 44 l
13 0D 52 C-M CR 45 2D 1C - 77 4D 52 M 109 6D 52 m
14 0E 51 C-N SO 46 2E 54 . 78 4E 51 N 110 6E 51 n
15 0F 2F C-O SI 47 2F 55 / 79 4F 2F O 111 6F 2F o
16 10 30 C-P DLE 48 30 1B 0 80 50 30 P 112 70 30 p
17 11 27 C-Q DC1 49 31 12 1 81 51 27 Q 113 71 27 q
18 12 2A C-R DC2 50 32 13 2 82 52 2A R 114 72 2A r
19 13 3D C-S DC3 51 33 14 3 83 53 3D S 115 73 3D s
20 14 2B C-T DC4 52 34 15 4 84 54 2B T 116 74 2B t
21 15 2D C-U NAK 53 35 16 5 85 55 2D U 117 75 2D u
22 16 4F C-V SYN 54 36 17 6 86 56 4F V 118 76 4F v
23 17 28 C-W ETB 55 37 18 7 87 57 28 W 119 77 28 w
24 18 4D C-X CAN 56 38 19 8 88 58 4D X 120 78 4D x
25 19 2C C-Y EM 57 39 1A 9 89 59 2C Y 121 79 2C y
26 1A 4C C-Z SUB 58 3A 45 : 90 5A 4C Z 122 7A 4C z
27 1B 31 C-[ ESC 59 3B 45 ; 91 5B 31 [ 123 7B 31 {
28 1C 33 C-\ FS 60 3C 53 < 92 5C 33 \ 124 7C 33 |
29 1D 32 C-] GS 61 3D 1D = 93 5D 32 ] 125 7D 32 }
30 1E 17 C-^ RS 62 3E 54 > 94 5E 17 ^ 126 7E 11 ~
31 1F 1C C-_ US 63 3F 55 ? 95 5F 1C _ 127 7F 34 delete
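Two bit patterns are visible in the table: each control character's code is the code of the character typed with Control, keeping only its low 5 bits (C-A is 1, C-[ is 27 ESC, C-@ is 0), and upper and lower case letters differ only in the bit worth 32 (20 hex). A small sketch assuming plain ASCII; control_code and toggle_case are hypothetical names:

```c
/* Code produced with Control held: keep the low 5 bits (mask 1F hex). */
int control_code(char c) { return c & 0x1F; }

/* Upper and lower case letters differ only in bit 5 (20 hex = 32). */
char toggle_case(char letter) { return (char)(letter ^ 0x20); }
```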
Additional key codes (most have no ASCII)[must track shift-up, shift-down etc.]
key type key type key type key type
01 ESCAPE 10 PAUSE 39 keypad 9 PAGE UP 5D LEFT ALT
02 F1 1E BACKSPACE 3A keypad + 5E SPACE
03 F2 1F INSERT 3B CAPS LOCK 5F RIGHT ALT
04 F3 20 HOME 47 ENTER 60 RIGHT CTRL
05 F4 21 PAGE UP 48 keypad 4 LEFT 61 LEFT ARROW
06 F5 22 NUM LOCK 49 keypad 5 62 DOWN ARROW
07 F6 23 keypad / 4A keypad 6 RIGHT 63 RIGHT ARROW
08 F7 24 keypad * 4B LEFT SHIFT 64 keypad 0 INS
09 F8 25 keypad - 56 RIGHT SHIFT 65 keypad . DEL
0A F9 26 TAB 57 UP ARROW 66 LEFT WINDOWS
0B F10 34 DELETE 58 keypad 1 END 67 RIGHT WINDOWS
0C F11 35 END 59 keypad 2 DOWN 68 APPLICATION
0D F12 36 PAGE DOWN 5A keypad 3 PAGE DN 7E SYS REQ
0E PRT SCRN 37 keypad 7 HOME 5B keypad ENTER 7F BREAK
0F SCROLL LOCK 38 keypad 8 UP 5C LEFT CTRL
Now you may wish to download another self-booting program,
memtest.bin, a binary program.
If you can get this file, undamaged, onto a computer running
Linux, you can write it to a floppy disk:
dd if=memtest.bin of=/dev/fd0
Then do a safe shutdown.
Reboot your computer from the power-off state.
You should see information about your computer,
e.g. clock speed, type of CPU, cache sizes, RAM size,
and it will run a very thorough memory test on your RAM.
You will not be able to run a bootable floppy on a UMBC
Intel PC because the BIOS should be set to not boot from
a floppy and the BIOS should be password protected, so you
can not change the BIOS. The machine is probably secured
so you can not get in and change the BIOS chip.
More on bootable floppies is at nasm boot info
See Project 3.
A few basic BIOS calls: see BIOS ref.
A more complete, and harder to read, reference: BIOS Interrupt Services.
But do not use INT 21h or above: you do not have DOS running.
A sample bootable program is boot1.asm.
A more complete bootable program, with subroutines, that uses a printer on lpt 0 is: bootreg.asm
White board lecture eip->instruction->decode->registers->alu->ear->data RAM etc.
The Intel 80x86 has privilege levels. Some instructions can only be
executed at the highest privilege level, CPL = 0. This level is reserved
for the operating system in order to prevent the average user from
causing chaos. e.g. The average user could issue a HLT instruction to
halt the machine and thus every process would be dead.
Other CPL=0 only instructions include:
  CLTS    Clear Task-Switched flag in CR0
  INVD    Invalidate cache
  INVLPG  Invalidate the translation lookaside buffer (TLB) entry for a page
  WBINVD  Write Back and Invalidate cache
It should be obvious that when running a multiprocessing operating system
there are many instructions that only the operating system should use.
The operating system controls the resources of the computer, including
RAM, I/O and user processes.
Some sample protections are tested by the following sample programs.
A few simple tests to be sure protections are working;
these three programs intentionally result in a segfault:
  safe.asm   store into read only section
  safe1.asm  store into code section
  safe2.asm  jump (execute) data
A few simple tests to be sure privileged instructions cannot execute:
  priv.asm   hlt instruction to halt computer
  priv1.asm  other privileged instructions
In order to allow the user some controlled access to system resources,
an interface to the operating system, or kernel, is provided.
You will see in the next lecture that some BIOS functions are also
provided as Linux kernel calls.
To understand Linux System Calls, learn from the UMBC Expert: Gary Burt.
CMSC 313 -- System Calls
System Call Table
When making Linux kernel calls from a "C" program, you will need
#include <unistd.h>
A sample, syscall1.asm, demonstrates file open (unistd.h failed),
file read (in chunks of 8192 bytes) and file write (the whole file!).
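The same read/write pattern looks like this from "C". This is a sketch, not a transcription of syscall1.asm: copy_fd and cat_file are hypothetical names, open() and O_RDONLY come from <fcntl.h> while read, write and close come from <unistd.h>, and the loop around write handles partial writes:

```c
#include <fcntl.h>
#include <unistd.h>

/* Copy everything from in_fd to out_fd in chunks of 8192 bytes.
   Returns the number of bytes copied, or -1 on an error. */
long copy_fd(int in_fd, int out_fd)
{
    char buf[8192];
    long total = 0;
    ssize_t n;
    while ((n = read(in_fd, buf, sizeof buf)) > 0) {
        ssize_t done = 0;
        while (done < n) {                     /* write may be partial */
            ssize_t w = write(out_fd, buf + done, (size_t)(n - done));
            if (w < 0)
                return -1;
            done += w;
        }
        total += n;
    }
    return n < 0 ? -1 : total;                 /* read returns 0 at end of file */
}

/* Hypothetical example: copy the file at 'path' to standard output. */
long cat_file(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    long r = copy_fd(fd, STDOUT_FILENO);
    close(fd);
    return r;
}
```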
Go over lectures 1 through 11. See typical questions here
33 questions: some true-false, some multiple choice, some short answer,
e.g. convert decimal to binary or binary to decimal.
A few one-line to three-line assembly language questions.
For these notes:
 1 = true  = high = value of a digital signal on a wire
 0 = false = low  = value of a digital signal on a wire
 X = unknown or indeterminate to people, not on a wire
A digital logic gate can be represented at least three ways; we will
interchangeably use: schematic symbol, truth table or equation.
The equations may be from languages such as mathematics, VHDL or Verilog.
Digital logic gates are connected by wires. A wire or a group of wires
can be given a name, called a signal name. From an electronic view the
digital logic wire has a high or a low (voltage), but we will always
consider the wire to have a one (1) or a zero (0).
The basic logic gates are shown below.
The basic "and" gate:
truth table    equation
 a b | c
 ----+--       c <= a and b;
 0 0 | 0       c = a & b;
 0 1 | 0       c = and(a,b)
 1 0 | 0
 1 1 | 1
Easy way to remember: The output is 1 when all inputs are 1, 0 otherwise.
In theory, an "and" gate can have any number of inputs.
A three-input "and" gate:
truth table    equation
 a b c | d     d = and(a, b, c)
 ------+--
 0 0 0 | 0
 0 0 1 | 0
 0 1 0 | 0
 0 1 1 | 0
 1 0 0 | 0
 1 0 1 | 0
 1 1 0 | 0
 1 1 1 | 1
Notice how a truth table has the inputs counting 0, 1, 2, ... in binary.
The output (may be more than one bit) is after the vertical line, on the right.
The basic "or" gate:
truth table    equation
 a b | c
 ----+--       c <= a or b;
 0 0 | 0       c = a | b;
 0 1 | 1       c = or(a,b)
 1 0 | 1
 1 1 | 1
Easy way to remember: The output is 0 when all inputs are 0, 1 otherwise.
In theory, an "or" gate can have any number of inputs.
A three-input "or" gate:
truth table    equation
 a b c | d     d = or(a, b, c)
 ------+--
 0 0 0 | 0
 0 0 1 | 1
 0 1 0 | 1
 0 1 1 | 1
 1 0 0 | 1
 1 0 1 | 1
 1 1 0 | 1
 1 1 1 | 1
Notice how a truth table has the inputs counting 0, 1, 2, ... in binary.
The output (may be more than one bit) is after the vertical line, on the right.
The basic "nand" gate:
truth table    equation
 a b | c
 ----+--       c <= a nand b;
 0 0 | 1       c = ~ (a & b);
 0 1 | 1       c = nand(a,b)
 1 0 | 1
 1 1 | 0
Easy way to remember: "nand" reads "not and", the complement of "and".
The basic "nor" gate:
truth table    equation
 a b | c
 ----+--       c <= a nor b;
 0 0 | 1       c = ~ (a | b);
 0 1 | 0       c = nor(a,b)
 1 0 | 0
 1 1 | 0
Easy way to remember: "nor" reads "not or", the complement of "or".
The basic "xor" gate:
truth table    equation
 a b | c
 ----+--       c <= a xor b;
 0 0 | 0       c = a ^ b;
 0 1 | 1       c = xor(a,b)
 1 0 | 1
 1 1 | 0
Easy way to remember: "eXclusive or", 1 for exactly one input 1 (not 11);
equivalently, an odd number of ones.
A three-input "xor" gate:
truth table    equation
 a b c | d     d <= a xor b xor c;
 ------+--     d = a ^ b ^ c;
 0 0 0 | 0     d = xor(a,b,c)
 0 0 1 | 1
 0 1 0 | 1
 0 1 1 | 0
 1 0 0 | 1
 1 0 1 | 0
 1 1 0 | 0
 1 1 1 | 1
Easy way to remember: odd parity, odd number of ones.
The basic "xnor" gate:
truth table    equation
 a b | c
 ----+--       c <= a xnor b;
 0 0 | 1       c = ~ (a ^ b);
 0 1 | 0       c = xnor(a,b)
 1 0 | 0
 1 1 | 1
Easy way to remember: "xnor" reads "not xor", equality or even parity.
The basic "not" gate:
truth table    equation
 a | b
 --+--         b <= not a;
 0 | 1         b = ~ a;
 1 | 0         b = not(a)
Easy way to remember: invert or "not", the complement.
A specialized gate:
truth table    equation
 a b c | d     d <= not( not a and b and not c);
 ------+--     d = ~( ~a & b & ~c);
 0 0 0 | 1     d = not(and(not(a), b, not(c)))
 0 0 1 | 1     $d = \overline{\overline{a}\,b\,\overline{c}}$
 0 1 0 | 0
 0 1 1 | 1
 1 0 0 | 1
 1 0 1 | 1
 1 1 0 | 1
 1 1 1 | 1
Easy way to remember: none, just work it out.
Bubbles on the inputs mean the same as a bubble on the output: invert the signal value.
Often an inverted variable is written with a line above it: $\overline{a}\,b\,\overline{c}$
is read "a bar and b and c bar". The word "bar", for the line above a variable,
means invert the variable; the long bar above the whole term inverts the "and".
It is known that there are 16 Boolean functions with two inputs.
In fact, for any number of inputs, n, there are 2^(2^n) Boolean functions
(two to the power of two to the nth):
 n=2   16 functions                 2^4
 n=3   256 functions                2^8
 n=4   65,536 functions             2^16
 n=5   over four billion functions  2^32
The truth table for all Boolean functions of two inputs is:
 a b | 0    nor  2    ~a   4    ~b   xor  nand and  xnor b    11   a    13   or   1
 ----+--------------------------------------------------------------------------------
 0 0 | 0    1    0    1    0    1    0    1    0    1    0    1    0    1    0    1
 0 1 | 0    0    1    1    0    0    1    1    0    0    1    1    0    0    1    1
 1 0 | 0    0    0    0    1    1    1    1    0    0    0    0    1    1    1    1
 1 1 | 0    0    0    0    0    0    0    0    1    1    1    1    1    1    1    1
Notice that for two input variables, a b, there are 2^2 = 4 rows.
Notice that for four rows there are 2^4 = 16 columns.
Column k, read from the bottom row up, is k in binary. The numbered columns
have no common name: 2 is (not a) and b, 4 is a and (not b),
11 is (not a) or b, 13 is a or (not b).
A question is: Which are "universal" functions, from which all other functions
can be obtained? The answer is that either "nand" or "nor" can be used to create
all other functions (when having 0 and 1 available). It turns out that electric
circuits rather naturally create "nand" or "nor" gates. No more than five "nand"
gates or five "nor" gates are needed in creating any of the 16 Boolean functions
of two inputs. Here are the circuits using only "nand" to get all 16 functions.
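As a check on the 16-function table and on "nand" being universal, the sketch below numbers the functions as in the table and builds a few of them from a "nand" function alone. All names are hypothetical; signals are the ints 0 and 1:

```c
/* Function k (0..15) of two inputs, numbered as in the table:
   the output for inputs a,b is bit (2*a + b) of k. */
int bool2(int k, int a, int b) { return (k >> (2 * a + b)) & 1; }

int nand(int a, int b) { return !(a && b); }

/* Other functions built only from "nand" gates. */
int not_n(int a)        { return nand(a, a); }                       /* 1 gate  */
int and_n(int a, int b) { return nand(nand(a, b), nand(a, b)); }     /* 2 gates */
int or_n(int a, int b)  { return nand(nand(a, a), nand(b, b)); }     /* 3 gates */
int xor_n(int a, int b)                                              /* 4 gates */
{
    int t = nand(a, b);
    return nand(nand(a, t), nand(b, t));
}
```

bool2(8, a, b) is "and", bool2(14, a, b) is "or" and bool2(6, a, b) is "xor", matching the column headings.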
Combinational digital logic uses Boolean Algebra.
The basic relations are well known, yet several notations are used.
 Notation A: use words "and" "or" "not" etc.
 Notation B: use characters & for "and", | for "or", ~ for "not"
 Notation C: use characters * for "and", + for "or", - for "not"
 Notation D: use symbols "dot" for "and", + for "or", bar for "not"
 Notation E: use symbols "blank" for "and", + for "or", bar for "not"
Generally, the symbols for "and" are like the symbols for multiply and the
symbols for "or" are like the symbols for addition. In mathematics,
multiplication always has precedence over addition; do not expect "and"
to always have precedence over "or".
Here are 19 basic identities that can be used to simplify or convert one
Boolean equation to another.
 1. $X + 0 = X$                "or" anything with zero gives anything
 2. $X * 1 = X$                "and" anything with one gives anything
 3. $X + 1 = 1$                "or" anything with one gives one
 4. $X * 0 = 0$                "and" anything with zero gives zero
 5. $X + X = X$                "or" with self gives self
 6. $X * X = X$                "and" with self gives self
 7. $X + \overline{X} = 1$     "or" with complement gives one
 8. $X * \overline{X} = 0$     "and" with complement gives zero
 9. $\overline{\overline{X}} = X$   any even number of complements cancel
 10. $X + Y = Y + X$           "or" is commutative
 11. $X * Y = Y * X$           "and" is commutative
 12. $X + (Y + Z) = (X + Y) + Z$   "or" is associative
 13. $X * (Y * Z) = (X * Y) * Z$   "and" is associative
 14. $X * (Y + Z) = X * Y + X * Z$   distributive law
 15. $X + Y * Z = (X + Y) * (X + Z)$   distributive law
 16. $X + Y = \overline{\overline{X} * \overline{Y}}$   DeMorgan's theorem
 17. $\overline{X + Y} = \overline{X} * \overline{Y}$   DeMorgan's theorem
 18. $\overline{X * Y} = \overline{X} + \overline{Y}$   DeMorgan's theorem
 19. $X * Y = \overline{\overline{X} + \overline{Y}}$   DeMorgan's theorem
Basically, DeMorgan's theorem says:
 Convert "and" to "or", negate the variables and negate the entire expression.
 Convert "or" to "and", negate the variables and negate the entire expression.
Any truth table can be converted to an equation or schematic.
Given any truth table, there is a simple procedure for generating a Boolean
equation that uses "and", "or" and "not" (any representation).
First an example:
Given truth table
 a b | c
 ----+--
 0 0 | 1
 0 1 | 0
 1 0 | 1
 1 1 | 1
For each row where 'c' is 1, create an "and"
 with 'a' if 'a' is 1, or $\overline{a}$ if 'a' is 0,
 with 'b' if 'b' is 1, or $\overline{b}$ if 'b' is 0.
Thus:
 first row   $\overline{a} * \overline{b}$
 third row   $a * \overline{b}$
 fourth row  $a * b$
Now "or" the "and"s to form the final equation:
 $c = (\overline{a} * \overline{b}) + (a * \overline{b}) + (a * b)$
 c <= (not a and not b) or (a and not b) or (a and b);
The schematic can be drawn directly: one "and" gate for each row where 'c'
is 1, with a bubble for each variable that is 0.
The general process to convert a truth table (or partial truth table) to a
Boolean equation using "and" "or" "not" is:
 For each output
   For each row where the output is 1
     create a minterm that is the "and" of the input variables,
     with the input variable complemented when the input variable is 0.
   The output is the "or" of the above minterms.
Another example with three input variables and two outputs:
 a b c | s co
 ------+-----
 0 0 0 | 0 0
 0 0 1 | 1 0
 0 1 0 | 1 0
 0 1 1 | 0 1
 1 0 0 | 1 0
 1 0 1 | 0 1
 1 1 0 | 0 1
 1 1 1 | 1 1
 $s = (\overline{a} * \overline{b} * c) + (\overline{a} * b * \overline{c}) + (a * \overline{b} * \overline{c}) + (a * b * c)$
 $co = (\overline{a} * b * c) + (a * \overline{b} * c) + (a * b * \overline{c}) + (a * b * c)$
The exact same information is presented by the schematic:
Note that this is not a minimum representation, we will talk about minimizing digital logic in a few lectures.
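The minterm procedure is mechanical enough to automate. The sketch below prints the sum-of-minterms equation in Notation B; minterms is a hypothetical name, inputs are named 'a', 'b', ... with the first input as the most significant bit of the row number, and out[r] is the output for row r:

```c
#include <stdio.h>
#include <string.h>

/* Build the sum-of-minterms equation for one output column. */
void minterms(int nin, const int *out, char *eq, size_t size)
{
    eq[0] = '\0';
    for (int r = 0; r < (1 << nin); r++) {
        if (!out[r])
            continue;                          /* only rows where output is 1 */
        if (eq[0])
            strncat(eq, " | ", size - strlen(eq) - 1);   /* "or" the minterms */
        strncat(eq, "(", size - strlen(eq) - 1);
        for (int i = 0; i < nin; i++) {
            char term[4];
            int bit = (r >> (nin - 1 - i)) & 1;          /* value of input i */
            snprintf(term, sizeof term, "%s%c", bit ? "" : "~", 'a' + i);
            if (i)
                strncat(eq, " & ", size - strlen(eq) - 1);
            strncat(eq, term, size - strlen(eq) - 1);
        }
        strncat(eq, ")", size - strlen(eq) - 1);
    }
}
```

For the first example, out = {1,0,1,1} gives "(~a & ~b) | (a & ~b) | (a & b)", the same three minterms derived by hand. As the note says, the result is correct but not minimal.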
Any equation can be converted to a truth table.
Example: convert
 c <= (a and b) or (not a and b);
 $c = (a * b) + (\overline{a} * b)$
to a truth table.
We can immediately construct the truth table structure.
We see the input variables are 'a' and 'b' and the output is 'c'.
We generate all possible values for input by counting in binary.
 a b | c
 ----+--
 0 0 |
 0 1 |
 1 0 |
 1 1 |
The only step that remains is to fill in the 'c' column.
For the first row, substitute 0 for 'a' and 0 for 'b' in the equation and evaluate to find 'c':
 $c = (0 * 0) + (\overline{0} * 0) = (0) + (1 * 0) = 0$ (using identities above)
For the second row, substitute 0 for 'a' and 1 for 'b' in the equation and evaluate to find 'c':
 $c = (0 * 1) + (\overline{0} * 1) = (0) + (1 * 1) = 1$ (using identities above)
For the third row, substitute 1 for 'a' and 0 for 'b' in the equation and evaluate to find 'c':
 $c = (1 * 0) + (\overline{1} * 0) = (0) + (0 * 0) = 0$ (using identities above)
For the fourth row, substitute 1 for 'a' and 1 for 'b' in the equation and evaluate to find 'c':
 $c = (1 * 1) + (\overline{1} * 1) = (1) + (0 * 1) = 1$ (using identities above)
Filling in the values for 'c' gives the completed truth table:
 a b | c
 ----+--
 0 0 | 0
 0 1 | 1
 1 0 | 0
 1 1 | 1
Any digital logic schematic can be converted to a truth table.
Any equation can be converted to a schematic and any schematic can be
converted to an equation. When converting a schematic to a truth table
directly, you are simulating the actual behavior of the digital logic.
From the schematic below we can immediately construct the truth table structure.
We see the input variables are 'a' and 'b' and the output is 'c'.
We generate all possible values for input by counting in binary.
 a b | c
 ----+--
 0 0 |
 0 1 |
 1 0 |
 1 1 |
Now, start by placing the values of the input signals on the input wires,
a=0, b=0. Note that signals other than inputs are labeled X for unknown.
Then, as shown on the sequence of figures, propagate the signals.
For each gate, use the gate inputs to compute the gate output.
This is actually how the hardware works. Each gate is continually using its
inputs to produce its output, with a small delay. All gates operate in
parallel. All gates operate all the time.
Working a little faster, apply truth table inputs a=0, b=1, then a=1, b=0,
and finally a=1, b=1.
Filling in the values for 'c' gives the completed truth table:
a b | c
----+--
0 0 | 0
0 1 | 1
1 0 | 0
1 1 | 1
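The propagation with X for unknown can be imitated in software. The schematic itself is not reproduced in these notes, so this sketch assumes a hypothetical netlist for the earlier equation: one inverter, two "and" gates and one "or" gate. Every gate is re-evaluated from the current wire values on each pass, as if all gates ran in parallel. The loop stops when nothing changes. The pass count is roughly the number of gate delays on the longest path.

```python
X = "X"   # unknown signal value, as on the figures

def t_not(x):
    return X if x == X else 1 - x

def t_and(x, y):
    if x == 0 or y == 0:
        return 0            # any 0 input forces an "and" output to 0
    if x == 1 and y == 1:
        return 1
    return X

def t_or(x, y):
    if x == 1 or y == 1:
        return 1            # any 1 input forces an "or" output to 1
    if x == 0 and y == 0:
        return 0
    return X

# Hypothetical netlist for c <= (a and b) or (not a and b);
# each wire name maps to (gate function, names of its input wires)
GATES = {
    "na": (t_not, ("a",)),
    "g1": (t_and, ("a", "b")),
    "g2": (t_and, ("na", "b")),
    "c":  (t_or, ("g1", "g2")),
}

def simulate(a, b):
    wires = {"a": a, "b": b}
    wires.update({w: X for w in GATES})      # every gate output starts unknown
    steps = 0
    while True:
        # all gates evaluate in parallel from the current wire values
        new = {w: f(*(wires[i] for i in ins)) for w, (f, ins) in GATES.items()}
        if all(wires[w] == v for w, v in new.items()):
            return wires["c"], steps           # nothing changed: signals are stable
        wires.update(new)
        steps += 1

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "|", simulate(a, b)[0])
```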
"Combinational logic" means gates connected together without feedback.
There is no storage of information. Inputs are applied and outputs
are produced. By convention, we draw combinational logic from
inputs on the left to outputs on the right. For large schematic
diagrams this convention is often violated.
When no constraints are given, any of the gates previously
defined can be connected to design a circuit that performs
the stated function.
Example: Design a circuit that has:
an input for tail lights both on
an input for right turn that lets the signal "osc" control right tail light.
an input for left turn that lets the signal "osc" control left tail light.
("osc" will make the light flash on and off as a turn indicator.)
Constraint: use "and" and "or" gates with inversion bubbles allowed
Solution: There are four inputs "tail" "right" "left" and "osc"
There are two outputs "right_light" and "left_light"
The general strategy in design is to work backward from an output.
Yet, as usual, some work from input toward output is also used.
"right_light" must select between "tail" and "osc". Selection
can typically be implemented by "and" gates feeding an "or" gate
with a control signal into one "and" gate and its complement into
the other "and" gate.
Analyzing this circuit, if "right" is off, "tail" controls
the "right_light". If "right" is on, "osc" controls the "right_light".
A common symbol for this circuit is a multiplexor, mux for short.
The same circuit as above is usually drawn as the schematic diagram:
Now we can use the first schematic with new labeling for
the "left_light", combining the circuits yields:
Now a new requirement is added: the flashers must override all
other signals and make "osc" drive both the right and left tail lights.
A typical design technique is to build on existing designs,
thus note that "flash" only needs to be able to turn on both
the old "right" and old "left". This is two "or" functions
that are easily added to the previous circuit.
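The behavior of the finished tail light circuit can be checked with a small Python model. Each light is a 2-input selector built from "and" gates feeding an "or" gate. The "flash" input is "or"ed into each select line. The function names mux2 and tail_lights are hypothetical, and the exact gate arrangement on the schematic may differ.

```python
def mux2(ctl, in0, in1):
    # "and" gates feeding an "or": ctl gates in1, its complement gates in0
    return (ctl & in1) | ((1 - ctl) & in0)

def tail_lights(tail, right, left, osc, flash=0):
    # flash is "or"ed into each select line, so it forces osc onto both lights
    right_sel = right | flash
    left_sel = left | flash
    right_light = mux2(right_sel, tail, osc)
    left_light = mux2(left_sel, tail, osc)
    return right_light, left_light

print(tail_lights(tail=1, right=1, left=0, osc=0))
```

With "right" on and "osc" currently off, the right light is dark while the left light follows "tail".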
In general a multiplexor can have any number of inputs.
Typically the number of inputs is a power of two, and the
control signal, ctl, has as many bits as that power
(4 = 2^2 inputs need a 2-bit ctl).
ctl | out
----+----
0 0 | a
0 1 | b
1 0 | c
1 1 | d
Note that "ctl" is a two bit signal, shown by the "2".
The truth table does not have to expand a, b, c and d because the mux
just passes the values through to "out" based on the value of "ctl".
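A multiplexor with a k-bit control simply reads "ctl" as an unsigned binary number and passes the selected input through. A minimal Python sketch follows. The function name mux and the high-order-first tuple convention are assumptions.

```python
def mux(ctl_bits, inputs):
    # ctl_bits is high-order first, e.g. (1, 0) selects input number 2
    assert len(inputs) == 2 ** len(ctl_bits), "need 2^k inputs for a k-bit ctl"
    index = 0
    for bit in ctl_bits:
        index = 2 * index + bit      # read ctl as an unsigned binary number
    return inputs[index]

print([mux(ctl, "abcd") for ctl in [(0, 0), (0, 1), (1, 0), (1, 1)]])
```

This prints ['a', 'b', 'c', 'd'], matching the table.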
For a general circuit that has some type of description, we use
a rectangle with some notation indicating the function of the
circuit. The inputs and outputs are given signal names.
There are many simulation and design tools available for digital logic.
Some sections of CMSC 313 use B2Logic. This is a graphical interface
program for use on Microsoft Windows. This program has many building
blocks from the digital logic of the 1970 era. In this era dual in line
packages had many 4-bit circuits. B2Logic allows a maximum of a 16 bit
bus as a primitive. This is a practical learning tool for simple logic
circuits, yet it cannot handle today's designs, 32-bit and 64-bit
computer architectures.
Some sections of CMSC 313 use DigSim, a Java applet that can be run from
any WEB browser. DigSim is interactive and dynamic yet seems limited in
circuit complexity and timing accuracy. Learn more at Richard Chang's
WEB page.
There are major commercial Electronic Design Automation, EDA, systems for
today's digital logic. Cadence is one of today's major suppliers and UMBC
has Cadence software available on GL computers. Mentor Graphics, Synopsys
and others provide large tool sets.
Altera and Xilinx are major providers of software for making custom
integrated circuits using Field Programmable Gate Arrays, FPGA.
www.altera.com Altera has a downloadable student version.
www.xilinx.com The student version of Xilinx came with your textbook.
Be sure to install CD-Rom 2 of 2 first, if you wish to try this software.
The best WEB site to find free EDA tools is www.geda.seul.org
For projects for this section of CMSC 313 we will use Cadence VHDL
that is available on linux.gl.umbc.edu.
First: You must have an account on a GL machine. Every student
and faculty should have this.
Either log in directly to linux.gl.umbc.edu or
Use ssh linux.gl.umbc.edu
You can copy many sample files to your working directory using:
cp /afs/umbc.edu/users/s/q/squire/pub/download/cs411.tar .
Do not forget the final space dot. There are many files available.
Next: Follow the instructions exactly, or work out your own variation.
1) Get the tar file cs411.tar into your home directory (on /afs, i.e.
available on all GL machines) and then type the commands:
cp /afs/umbc.edu/users/s/q/squire/pub/download/cs411.tar .
tar -xvf cs411.tar
cd vhdl
mv Makefile.cadence Makefile
source vhdl_cshrc
make
more add32_test.out
make clean # saves a lot of disk quota
Then do your own thing with Makefile for other VHDL files
2) The manual, step by step method (same results as above)
Be in your home directory.
mkdir vhdl # for your source code .vhdl files
cd vhdl
mkdir vhdl_lib # your WORK library, keep hands off
You now need to get the following 6 files into your vhdl directory:
vhdl_cshrc
cds.lib change $HOME to your path if needed
hdl.var
Makefile.cadence for first test
add32_test.vhdl for first test
add32_test.run for first test
mv Makefile.cadence Makefile
# Run the test run:
source vhdl_cshrc
make # should be no error messages
more add32_test.out # it should have VHDL simulation output
make clean # saves on your quota
You are on your own to write VHDL and modify the Makefile.
Remember each time you log on:
cd vhdl
source vhdl_cshrc
make # or do your own thing.
The above uses the latest generation Cadence "ldv" tools: "ncvhdl, ncelab, ncsim"
You can get working chips from VHDL using synthesis tools. One of the
quickest ways to get chips is to use FPGAs, Field Programmable Gate Arrays.
The two companies listed below provide the software and the foundry for
you to design your own integrated circuit chips:
www.altera.com
www.xilinx.com
Complete Computer Aided Design, CAD, packages are available from
companies such as Cadence, Mentor Graphics and Synopsys.
Basic decimal addition (with carry digit shown)
101 <- carry (note that three numbers are added after first digit)
567
+ 526
-----
1093
Binary addition (with carry bit shown)
1011 <- carry (note that three bits are added after first bit)
for future reference c(3)=1, c(2)=0, c(1)=1, c(0)=1
1011 bits are numbered from zero, right to left
+ 1001
-----
10100 for future reference s(3)=0, s(2)=1, s(1)=0, s(0)=0
the leftmost '1' is cout
Since three bits must be added, a truth table for a full adder
needs three inputs and thus eight entries.
a b c | s co
------+-----
0 0 0 | 0 0
0 0 1 | 1 0
0 1 0 | 1 0
0 1 1 | 0 1
1 0 0 | 1 0
1 0 1 | 0 1
1 1 0 | 0 1
1 1 1 | 1 1
     _ _       _   _       _ _
s = (a*b*c) + (a*b*c) + (a*b*c) + (a*b*c)
simplifies to
s = a xor b xor c
s <= a xor b xor c;
      _           _           _
co = (a*b*c) + (a*b*c) + (a*b*c) + (a*b*c)
simplifies to
co = (a*b)+(a*c)+(b*c)
co <= (a and b) or (a and c) or (b and c);
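The "simplifies to" steps can be confirmed exhaustively, because there are only eight input rows. The Python sketch below writes each sum of minterms directly from the table, with a complemented variable written as 1 - x. It then checks each sum against the simplified xor and majority forms. It also checks that co and s together form the 2-bit count of ones. The function names are illustrative.

```python
from itertools import product

def s_sop(a, b, c):
    na, nb, nc = 1 - a, 1 - b, 1 - c
    return (na & nb & c) | (na & b & nc) | (a & nb & nc) | (a & b & c)

def co_sop(a, b, c):
    na, nb, nc = 1 - a, 1 - b, 1 - c
    return (na & b & c) | (a & nb & c) | (a & b & nc) | (a & b & c)

def check_full_adder():
    for a, b, c in product((0, 1), repeat=3):
        assert s_sop(a, b, c) == a ^ b ^ c
        assert co_sop(a, b, c) == (a & b) | (a & c) | (b & c)
        # co and s together are the 2-bit count of ones among a, b, c
        assert 2 * co_sop(a, b, c) + s_sop(a, b, c) == a + b + c
    return True

print(check_full_adder())
```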
This can be drawn as a box for use on larger schematics
+-------+
| a b c | The inputs are shown at the top (or left)
| |
| fadd |
| |
| co s | The outputs are shown at the bottom (or right)
+-------+
The full adder can be written as an entity in VHDL
library ieee;
use ieee.std_logic_1164.all;     -- needed for std_logic
entity fadd is -- full stage adder, interface
port(a : in std_logic;
b : in std_logic;
c : in std_logic;
s : out std_logic;
co : out std_logic);
end entity fadd;
architecture circuits of fadd is -- full adder stage, body
begin -- circuits of fadd
s <= a xor b xor c after 1 ns;
co <= (a and b) or (a and c) or (b and c) after 1 ns;
end architecture circuits; -- of fadd
Connecting four full adders, four fadd's, to make a 4-bit adder
The connections are written for VHDL as
a0: entity WORK.fadd port map(a(0), b(0), cin, s(0), c(0));
a1: entity WORK.fadd port map(a(1), b(1), c(0), s(1), c(1));
a2: entity WORK.fadd port map(a(2), b(2), c(1), s(2), c(2));
a3: entity WORK.fadd port map(a(3), b(3), c(2), s(3), c(3));
Note that the carry out of the previous stage is wired into
the carry input of the next higher stage. In a computer,
four bits are added to four bits and this produces four bits of sum.
The last carry bit, c(3) here, is usually called 'cout' and is
not called a 'sum' bit.
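The ripple connection, where each stage's carry out becomes the next stage's carry in, can be modeled in a few lines of Python. The sketch below reproduces the 1011 + 1001 example and the simulated case. Bits are stored low-order first to match the VHDL indexing a(0)..a(3). The function names are illustrative, and the model has no gate delays.

```python
def fadd(a, b, c):
    # full adder stage, same equations as the VHDL architecture of fadd
    s = a ^ b ^ c
    co = (a & b) | (a & c) | (b & c)
    return s, co

def add4(a, b, cin=0):
    # a and b are bit lists indexed like a(0)..a(3): index 0 is the low-order bit
    s, c = [0] * 4, [0] * 4
    carry = cin
    for i in range(4):
        s[i], c[i] = fadd(a[i], b[i], carry)
        carry = c[i]                  # carry out of stage i feeds stage i+1
    return s, c

# 1011 + 1001, written low-order bit first
s, c = add4([1, 1, 0, 1], [1, 0, 0, 1])
print("s =", s, "c =", c)
```

This prints s = [0, 0, 1, 0] c = [1, 1, 0, 1], which is s(3..0) = 0100 with cout = c(3) = 1.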
The circuit was simulated with
a(3)=0, a(2)=0, a(1)=0, a(0)=1 cin=0
b(3)=1, b(2)=1, b(1)=1, b(0)=1
There is a small delay time from the input to the output.
When a circuit is simulated, the initial values of signals
are shown as 'U' for uninitialized. As the circuit simulation
proceeds, the 'U' are computed and become '0' or '1'.
Partial output from the VHDL simulation shows this propagation.
(the upper line is logic '1', the lower line is logic '0')
s(0) UU_____________________________
s(1) UUUUUU_________________________
s(2) UUUUUUUUUU_____________________
s(3) UUUUUUUUUUUUUU_________________
       ____________________________
c(0) UU
           ________________________
c(1) UUUUUU
               ____________________
c(2) UUUUUUUUUU
                   ________________
c(3) UUUUUUUUUUUUUU
At the end of the simulation the values are:
s(0)=0, s(1)=0, s(2)=0, s(3)=0, c(0)=1, c(1)=1, c(2)=1, c(3)=1
The full VHDL code is add_trace.vhdl
The run file is add_trace.run
The full output file is add_trace.out
A fragment of the Makefile is Makefile.add_trace
Given that the computer can "add" it now has to be able to "subtract."
Thus, a representation has to be chosen for negative numbers.
All computers have chosen the left most bit (also called the
high-order bit) to be the sign bit. The convention is that a '1'
in the sign bit means negative, a '0' in the sign bit means positive.
Within these conventions, three representations have been used
in computers: two's complement, one's complement and sign magnitude.
All bits are shown for 4-bit words in the table below.
decimal twos complement ones complement sign magnitude
0 0000 0000 0000
1 0001 0001 0001
2 0010 0010 0010
3 0011 0011 0011
4 0100 0100 0100
5 0101 0101 0101
6 0110 0110 0110
7 0111 0111 0111
-8 1000 - -
-7 1001 1000 1111
-6 1010 1001 1110
-5 1011 1010 1101
-4 1100 1011 1100
-3 1101 1100 1011
-2 1110 1101 1010
-1 1111 1110 1001
-0 - 1111 1000
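Any row of the table can be regenerated from the definitions. One's complement inverts every bit of the magnitude. Two's complement inverts the bits and adds one. Sign magnitude sets the sign bit and keeps the magnitude. Only -7 to 7 exists in all three codes, so the Python sketch below rejects -8, which only two's complement can represent. The function name encodings is illustrative.

```python
def encodings(v, n=4):
    # returns (two's complement, one's complement, sign magnitude) as n-bit strings
    assert -(2 ** (n - 1)) < v < 2 ** (n - 1), "outside the range all three share"
    mask = (1 << n) - 1
    mag = abs(v)
    if v >= 0:
        codes = (mag, mag, mag)           # positive numbers look the same in all three
    else:
        ones = mag ^ mask                 # invert every bit
        twos = (ones + 1) & mask          # invert the bits and add one
        sign_mag = mag | (1 << (n - 1))   # set the sign bit, keep the magnitude
        codes = (twos, ones, sign_mag)
    return tuple(format(x, f"0{n}b") for x in codes)

for v in (5, -3, -7):
    print(v, encodings(v))
```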
We could choose to build a subtractor that uses a borrow, yet
this would require as many gates as were needed for the adder.
By choosing the two's complement representation of negative
numbers, an adder plus an inverter and a relatively low gate count
multiplexor can become a subtractor. The implementation follows
the definition of a negative number in two's complement
representation: invert the bits and add one. The multiplexor selects
the inverted bits of 'b', and the "add one" comes from feeding the
"subtract" signal into the carry input of the adder.
Given a new symbol for an adder, the complete circuit for
doing 4-bit add and subtract becomes:
When the signal "subtract" is '1' the circuit subtracts 'b' from 'a'.
When the signal "subtract" is '0' the circuit adds 'a' to 'b'.
The basic circuit is written for VHDL as:
a4: entity work.add4 port map(a, b_mux, subtract, sum, cout);
i4: b_bar <= not b;
m4: entity work.mux4 port map(b, b_bar, subtract, b_mux);
The general rule is that each circuit component symbol on
a schematic diagram will become one VHDL statement.
There are many other VHDL statements needed to run a complete
simulation.
The annotated output of the simulation is:
subtract=0, a=0100, b=0010, sum=0110 4+2=6
subtract=1, a=0100, b=0010, sum=0010 4-2=2
subtract=0, a=1100, b=0010, sum=1110 (-4)+2=(-2)
subtract=1, a=1100, b=0010, sum=1010 (-4)-2=(-6)
subtract=0, a=1100, b=1110, sum=1010 (-4)+(-2)=(-6)
subtract=1, a=1100, b=1110, sum=1110 (-4)-(-2)=(-2)
subtract=0, a=0011, b=1110, sum=0001 3+(-2)=1
subtract=1, a=0011, b=1110, sum=0101 3-(-2)=5
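The add/subtract circuit can be modeled in Python to reproduce the annotated simulation lines. In this model, an inverter forms b_bar and a selector chooses b or b_bar based on "subtract". "subtract" also drives the adder's carry in, which supplies the "+1" of two's complement. Strings are written high-order first as in the output above, and cout is discarded. The function name add_sub4 is hypothetical.

```python
def add_sub4(a: str, b: str, subtract: int) -> str:
    # a and b are 4-bit strings written high-order first, e.g. "0100"
    a_bits = [int(ch) for ch in reversed(a)]          # index 0 = low-order bit
    b_bits = [int(ch) for ch in reversed(b)]
    b_bar = [1 - bit for bit in b_bits]               # the inverter, b_bar <= not b
    b_mux = b_bar if subtract else b_bits             # the mux, selected by "subtract"
    carry = subtract                                  # cin = subtract supplies the "+1"
    s = []
    for ai, bi in zip(a_bits, b_mux):
        s.append(ai ^ bi ^ carry)                     # full adder sum
        carry = (ai & bi) | (ai & carry) | (bi & carry)
    return "".join(str(bit) for bit in reversed(s))   # cout is discarded

print(add_sub4("1100", "0010", 1))   # (-4)-2
```

This prints 1010, which is (-6) in 4-bit two's complement.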
The full VHDL code is sub4.vhdl
The run file is sub4.run
The full output file is sub4.out
A fragment of the Makefile is Makefile.sub4
Multiplication and division are taught in elementary school, yet
they are still being worked on for computer applications.
The earliest computers just provided add and subtract with
conditional branch, leaving the programmer to write multiply
and divide subroutines.
Early computers used bit-serial methods that required about
N squared clock times for multiplying or dividing N-bit numbers.
With a parallel adder, the time for multiply was reduced to
N/2 clock times (Booth algorithm) and division N clock times.
Today's computers use parallel, combinational, circuits for
multiply and divide. These circuits still take too long for
signals to propagate in one clock time. The combinational
circuits are "pipelined" so that a multiply or divide can be
completed every clock time.
Consider multiplying unsigned numbers 1010 * 1100 (10 times 12)
Using a hand method would produce:
      1010
    * 1100
  ---------
      0000    <- think of the multiplier bit being "anded" with
     0000        the multiplicand. A 1-bit "and" in digital logic
    1010         is like a 1-bit "multiply".
   1010
  ---------
  01111000    4-bits times 4-bits produces an 8-bit product
When adding by hand, we can add all four bits of a middle column at
once and produce a sum bit and possibly a carry. In hardware the number
of input bits is fixed. From the previous lecture, we could use
four 4-bit adders with additional "and" gates to do the multiply.
A better design incorporates the "and" gate to do a 1-bit multiply
inside the previous lecture's full adder. With this single building
block, which is easy to replicate many times, we get the following
parallel multiplier design.
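The idea of an "and" gate feeding a full adder, replicated many times, can be sketched in Python. Each inner step below does what the notes describe a madd cell doing. It forms a partial-product bit with an "and" and adds it to the running sum and a carry with a full adder. The row-by-row arrangement is an assumption for illustration, not a transcription of pmul4.vhdl.

```python
def fadd(a, b, c):
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)

def array_multiply(x: int, y: int, n: int = 4) -> int:
    # unsigned n-bit times n-bit, giving a 2n-bit product
    x_bits = [(x >> i) & 1 for i in range(n)]    # multiplicand, low-order first
    acc = [0] * (2 * n)                          # running sum, low-order first
    for j in range(n):                           # one row per multiplier bit
        y_j = (y >> j) & 1
        carry = 0
        for i in range(n):
            pp = x_bits[i] & y_j                 # 1-bit "multiply" is an "and"
            acc[i + j], carry = fadd(acc[i + j], pp, carry)   # madd-like cell
        acc[j + n] = carry    # bit j+n is still 0 here, so the carry just lands there
    return sum(bit << k for k, bit in enumerate(acc))

print(format(array_multiply(0b1010, 0b1100), "08b"))
```

This prints 01111000, the product from the hand method.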
The 4-bit by 4-bit multiply to produce an 8-bit unsigned product is
The component madd circuit is
The VHDL source code is pmul4.vhdl
The VHDL test driver is pmul4_test.vhdl
The VHDL output is pmul4_test.out
The Cadence run file is pmul4_test.run
The partial Makefile is Makefile.pmul4_test
Notice that the only component used to build the multiplier
is "madd" and some uses of "madd" have constants as inputs.
It is technology dependent whether the same circuit is used
or specialized, minimized, circuits are substituted.
Division is performed by using subtraction. A sample unsigned binary
division of an 8-bit dividend by a 4-bit divisor that produces
a 4-bit quotient and 4-bit remainder is:
          1010    <- quotient
    /---------
1100/ 01111011    <- dividend
      -1100
      -----
        0110
       -0000
       -----
         1101
        -1100
        -----
          0011
         -0000
         -----
          0011    <- remainder
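The hand method is a trial subtraction at each quotient bit. If the difference is negative, the old partial remainder is kept, which is the "restore". The Python sketch below uses this restoring form. It does not reproduce the non-restoring 'cas' array in divcas4.vhdl. The overflow test, (dividend >> n) >= divisor, is the usual condition for a 2N-bit by N-bit divide. It is an assumption about what the separate overflow logic checks.

```python
def restoring_divide(dividend: int, divisor: int, n: int = 4):
    # 2n-bit dividend / n-bit divisor -> (n-bit quotient, n-bit remainder)
    if divisor == 0 or (dividend >> n) >= divisor:
        return None                      # overflow: quotient would need more than n bits
    rem = dividend >> n                  # high half of the dividend
    quotient = 0
    for i in range(n - 1, -1, -1):
        rem = (rem << 1) | ((dividend >> i) & 1)   # bring down the next dividend bit
        trial = rem - divisor                      # trial subtraction
        if trial >= 0:
            rem = trial                            # keep the difference, quotient bit 1
            quotient = (quotient << 1) | 1
        else:
            quotient = quotient << 1               # restore (keep old rem), bit 0
    return quotient, rem

print(restoring_divide(0b01111011, 0b1100))   # 123 / 12
```

This prints (10, 3), that is, quotient 1010 and remainder 0011.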
With a parallel adder and a double length register, serial division
can be performed. Conventional division requires a trial subtraction
and possibly a restore of the partial remainder. A non restoring
serial division requires N clock times for an N-bit divisor.
The schematic for a parallel 8-bit dividend divided by 4-bit divisor
to produce a 4-bit quotient and 4-bit remainder is:
Notice that the building block is similar to the 'madd' component
in the parallel multiplier. The 'cas' component is the same full
adder with an additional xor gate.
The VHDL test driver is divcas4_test.vhdl
The VHDL output is divcas4_test.out
The Cadence run file is divcas4_test.run
The partial Makefile is Makefile.divcas4_test
Divide can create an overflow condition. This is typically handled by
separate logic in order to keep the main circuit neat. There is a
one bit preshift of the dividend in the manual, serial and parallel
division. Thus, no dividend bit number seven appears on the parallel
schematic.
A Karnaugh map, K-map, is a visual representation of a Boolean function.
The plan is to recognize patterns in the visual representation and
thus find a minimized circuit for the Boolean function.
There is a specific labeling for a Karnaugh map for each number
of circuit input variables. A Karnaugh map consists of squares where
each square represents a minterm. Notice that only one variable can
change in any adjacent horizontal or vertical square. Remember that
a minterm is the input pattern where there is a '1' in the output
of a truth table.
After the map is drawn and labeled, a '1' is placed in each square
corresponding to a minterm of the function. Later an 'X' will be
allowed for "don't care" minterms. By convention, no zeros are
written into the map.
Having a filled in map, visual skills and intuition are used to
find the minimum number of rectangles that enclose all the ones.
The rectangles must have sides that are powers of two. No
rectangle is allowed to contain a blank square. The map is a toroid
such that the top row is logically adjacent to the bottom row and
the right column is logically adjacent to the left column. Thus
rectangles may wrap around the edges of the two dimensional map.
The resulting minimized boolean function is written as a sum of
products. Each rectangle represents a product, an "and" gate, and
the products are summed, an "or" gate, to produce the result. When a
rectangle contains both a variable and its complement, that variable
does not appear in the product term; omit the variable as an input
to the "and" gate.
Basic labeling
      B=0 B=1
     +---+---+
 A=0 |   |   |
     +---+---+
 A=1 |   |   |
     +---+---+
Minterm numbers
      B=0 B=1
     +---+---+
 A=0 |m0 |m1 |
     +---+---+
 A=1 |m2 |m3 |
     +---+---+
Minterms
      B=0 B=1
     +---+---+
 A=0 |__ |_  |
     |AB |AB |
     +---+---+
 A=1 | _ |   |
     |AB |AB |
     +---+---+
Truth table
A B | F
----+--
0 0 | 0
0 1 | 1    m1
1 0 | 1    m2
1 1 | 0
Karnaugh map
      B=0 B=1
     +---+---+
 A=0 |   | 1 |
     +---+---+
 A=1 | 1 |   |
     +---+---+
Covering with rectangles
       B=0   B=1
     +-----+-----+
     |     |+---+|
 A=0 |     || 1 ||
     |     |+---+|
     +-----+-----+
     |+---+|     |
 A=1 || 1 ||     |
     |+---+|     |
     +-----+-----+
                       _     _
Minimized function F = AB + AB
Note: For each covering rectangle, there will be exactly one
product term in the final equation for the function.
Find the variable(s) that are both 1 and 0 in the rectangle.
Such variables will not appear in the product term. Take any
minterm from the covering rectangle, replace 1 with the variable,
replace 0 with the complement of the variable. Cross out the
variables that do not appear. The result is exactly one product
term needed by the final equation of the function.
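The rule for reading one product term off a covering rectangle is mechanical. A variable that is both 1 and 0 inside the rectangle is crossed out. Otherwise it appears plain, or complemented when it is 0. A Python sketch follows. Terms are written in VHDL style with "not", and the function name product_term is hypothetical.

```python
def product_term(minterms, names):
    # minterms: minterm numbers inside ONE covering rectangle
    # names: variable names, most significant bit first, e.g. ["A", "B"]
    n = len(names)
    literals = []
    for pos, name in enumerate(names):
        bit = n - 1 - pos                              # bit position of this variable
        values = {(m >> bit) & 1 for m in minterms}
        if values == {0, 1}:
            continue                                   # both 1 and 0: cross it out
        literals.append(name if values == {1} else "not " + name)
    return " and ".join(literals) if literals else "1"   # empty product is constant 1

print(product_term([1], ["A", "B"]), "|", product_term([2], ["A", "B"]))
```

For the example map, the two rectangles give "not A and B" and "A and not B". Summing them reproduces the minimized F.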
It is possible to have minterms that are don't care. For these
minterms, place an "X" or "-" in the Karnaugh map rather than
a one. The covering follows the obvious extended rule.
Covering rectangles may include any don't care squares.
Covering rectangles do not have to include don't care squares.
No rectangle can enclose only don't care squares.
A tabular algorithm for producing the minimum two level sum of products
is known as the Quine-McCluskey method.
You may download and build the software that performs this minimization.
qm.tgz or link to a Linux executable
ln -s /afs/umbc.edu/users/s/q/squire/pub/linux/qm qm
The man page, qm.1 , is in the same directory.
The algorithm may be performed manually using the following steps:
1) Have available the minterms of the function to be minimized.
There may be X's for don't care cases.
2) Create groups of minterms, starting with the minterms with the
fewest number of ones.
All minterms in a group must have the same number of ones and
if any X's, the X's must be in the same position. There may be
some groups with only one minterm.
3) Create new minterms by combining minterms from groups that
differ by a count of one. Two minterms are combined if they
differ in exactly one position. Place an X in that position
of the newly created minterm. Mark the minterms that are
used in combining (they will be deleted at the end of this step).
Basically, take the first minterm from the first group. Compare
this minterm to all minterms in the next group(s) that have
one additional one. Repeat until the last group is reached.
4) Delete the marked minterms.
5) Repeat steps 2) 3) and 4) until no more minterms are combined.
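6) The minimized function is the remaining minterms, deleting any
X's (a variable whose position holds an X does not appear in that
product term). The remaining terms are the prime implicants; when
more of them remain than are needed to cover every original minterm,
a prime implicant chart is used to choose the fewest that do.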
Example:
1) Given the minterms
A B C D | F
--------+--
0 0 0 0 | 1 m0
0 0 1 0 | 1 m2
1 0 0 0 | 1 m8
1 0 1 0 | 1 m10
2) Create groups
m0 0 0 0 0 count of 1's is 0
-------
m2 0 0 1 0 count of 1's is 1
m8 1 0 0 0
-------
m10 1 0 1 0 count of 1's is 2
3) Create new minterms by combining
Compare all in first group to all in second group
m0 to m2 0 0 0 0
0 0 1 0
======= they differ in one position
0 0 X 0 combine and put an X in that position
m0 to m8 0 0 0 0
1 0 0 0
======= they differ in one position
X 0 0 0 combine and put an X in that position
Compare all in second group to all in third group
m2 to m10 0 0 1 0
1 0 1 0
======= they differ in one position
X 0 1 0 combine and put an X in that position
m8 to m10 1 0 0 0
1 0 1 0
======= they differ in one position
1 0 X 0 combine and put an X in that position
no more candidates to compare.
4) Delete marked minterms (those used in any combining)
(do not keep duplicates) Thus the minterms are now:
0 0 X 0
X 0 0 0
X 0 1 0
1 0 X 0
2) Repeat grouping (technically there are four groups, although
the number of ones is either zero or one).
0 0 X 0
-------
X 0 0 0
-------
X 0 1 0
-------
1 0 X 0
3) Create new minterms by combining
0 0 X 0
1 0 X 0 any X's must be the same in both
======= they differ in one position
X 0 X 0 combine and put an X in that position
X 0 0 0
X 0 1 0
======= they differ in one position
X 0 X 0 combine and put an X in that position
4) Delete marked minterms (those used in any combining)
(do not keep duplicates) Thus the minterms are now:
X 0 X 0
5) No more combining is possible.
6) The minimized function is the remaining minterms, deleting any
X's. All remaining minterms are prime implicants
A B C D            __
X 0 X 0   thus F = BD
In essence, the Quine-McCluskey algorithm is doing the same
operations as the Karnaugh map. The difference is that no guessing
is used in the Quine-McCluskey algorithm and "qm" as it is called,
can be (and has been) implemented as a computer program.
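The combining steps 2) through 5) can be sketched in Python. This is a minimal sketch of prime implicant generation only. It is not the "qm" program. It does not do the final prime implicant chart selection. Don't care minterms would simply be included in the input list. Terms are strings such as "00X0", with the leftmost character for the first variable.

```python
from itertools import combinations

def combine(t1: str, t2: str):
    # combine only if X's line up and exactly one other position differs
    diff = [i for i, (p, q) in enumerate(zip(t1, t2)) if p != q]
    if len(diff) != 1 or "X" in (t1[diff[0]], t2[diff[0]]):
        return None
    i = diff[0]
    return t1[:i] + "X" + t1[i + 1:]

def prime_implicants(minterms, n_vars):
    terms = {format(m, f"0{n_vars}b") for m in minterms}
    primes = set()
    while terms:
        marked, new_terms = set(), set()
        for t1, t2 in combinations(sorted(terms), 2):
            if abs(t1.count("1") - t2.count("1")) != 1:
                continue                 # only groups whose count of ones differs by one
            merged = combine(t1, t2)
            if merged is not None:
                marked.update((t1, t2))  # step 3: mark terms used in combining
                new_terms.add(merged)    # a set keeps no duplicates
        primes |= terms - marked         # unmarked terms cannot be combined further
        terms = new_terms                # step 5: repeat on the new terms
    return primes

print(prime_implicants([0, 2, 8, 10], 4))
```

For the worked example this prints {'X0X0'}, the same single term found by hand.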
A final note on labeling:
It does not matter what names are used for variables.
It does not matter in what order variables are used.
It does not matter if "-" or "X" is used for don't care.
It is important to keep a consistent relation between the bit
positions in minterms and the order of variables.
More information is at Simulators and parsers
We now focus on sequential logic. Logic with storage and state.
The previous lectures were on combinational logic, gates.
In order to build very predictable large digital logic systems,
synchronous design is used. A synchronous system has a special
signal called a master clock. The clock signal continuously
has values 0101010101010 ... . This is usually just a square
wave generator at some frequency. A clock with frequency 1 GHz
has a period of 1 ns. Half of the period the clock is a logical 1
and the other half of the clock period the clock is a logical 0.
        ___     ___     ___
clk ___|   |___|   |___|   |
       |< 1 ns>|
The VHDL code fragment to generate the clk signal is:
signal clk : std_logic := '0';   -- in the architecture declarative part
begin
clk <= not clk after 500 ps;     -- toggle every half period: 1 ns period, 1 GHz
A synchronous system is designed with registers that input a
value on a rising clock edge and hold the value until the next
rising clock edge. The designer must know the timing of
combinational logic because the signals must propagate through
the combinational logic in less than a clock time.
Combinational logic can not have loops or feedback.
Sequential logic is specifically designed to allow loops and
feedback. The design rule is that any loop or feedback must
include a storage element (register) that is clocked.
+------------------------------------+
|                                    |
|  +---------------+   +----------+  |
+->| combinational |-->| register |--+
   |     logic     |   |          |
   +---------------+   +----------+
                            ^
                            | clock signal
A register may be many bits and each bit is built from a flip flop.
A flip flop is ideally either in a '1' state or a '0' state.
The most primitive flip flop is called a latch. A latch can be made
from two cross coupled nand gates. The latch is not easy to work
with in large circuits, thus JK flip flops and D flip flops are
typically used. In modern large scale integrated circuits, the
flip flops and thus the registers are designed at the device level.
A classical model of a JK flip flop is
On the rising edge of the clock signal,
if J='1' and K='0' the Q output is set to '1'
if J='0' and K='1' the Q output is set to '0'
if both J and K are '1', the Q signal is inverted
if both J and K are '0', Q holds its value.
Note that Q_BAR is the complement of Q in the steady state.
There is a transient time when both could be '1' or both could be '0'.
The SET signal is normally '1' yet can be set to '0' for a short
time in order to force Q='1' (set the flip flop).
The RESET signal is normally '1' yet can be set to '0' for a short
time in order to force Q='0' (reset the flip flop or register to zero).
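The four clocked cases can be written as a next-state function. This Python sketch assumes SET and RESET are inactive (both '1'). The loop also checks the single equation Q+ = J*(not Q) + (not K)*Q against the case list. The function name jk_next is illustrative.

```python
from itertools import product

def jk_next(q, j, k):
    # Q after a rising clock edge, with SET and RESET inactive (both '1')
    if j and k:
        return 1 - q      # both '1': Q is inverted
    if j:
        return 1          # J='1', K='0': set
    if k:
        return 0          # J='0', K='1': reset
    return q              # both '0': hold

# The same behavior as one equation, Q+ = J*(not Q) + (not K)*Q
for q, j, k in product((0, 1), repeat=3):
    assert jk_next(q, j, k) == (j & (1 - q)) | ((1 - k) & q)
```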
A slow counter, called a ripple counter, can be made from JK flip
flops using the following circuit:
The VHDL source code for the entity JKFF, the JK flip flop,
and the four bit ripple counter is jkff_cntr.vhdl
The Cadence run file is jkff_cntr.run
The Cadence output file is jkff_cntr.out
ncsim: 04.10-s017: (c) Copyright 1995-2003 Cadence Design Systems, Inc.
ncsim> run 340 ns
q3, q2, q1, q0 q3_ q2_ q1_ q0_ clk
0 0 0 0 1 1 1 1 1 at 10 NS
0 0 0 1 1 1 1 0 1 at 30 NS
0 0 1 0 1 1 0 1 1 at 50 NS
0 0 1 1 1 1 0 0 1 at 70 NS
0 1 0 0 1 0 1 1 1 at 90 NS
0 1 0 1 1 0 1 0 1 at 110 NS
0 1 1 0 1 0 0 1 1 at 130 NS
0 1 1 1 1 0 0 0 1 at 150 NS
1 0 0 0 0 1 1 1 1 at 170 NS
1 0 0 1 0 1 1 0 1 at 190 NS
1 0 1 0 0 1 0 1 1 at 210 NS
1 0 1 1 0 1 0 0 1 at 230 NS
1 1 0 0 0 0 1 1 1 at 250 NS
1 1 0 1 0 0 1 0 1 at 270 NS
1 1 1 0 0 0 0 1 1 at 290 NS
1 1 1 1 0 0 0 0 1 at 310 NS
0 0 0 0 1 1 1 1 1 at 330 NS
      ________________________________________________________________
reset
       _   _   _   _   _   _   _   _   _   _   _   _   _   _   _   _
clk   | |_| |_| |_| |_| |_| |_| |_| |_| |_| |_| |_| |_| |_| |_| |_| |_
          ___     ___     ___     ___     ___     ___     ___     ___
q0    ___|   |___|   |___|   |___|   |___|   |___|   |___|   |___|   |
              _______         _______         _______         _______
q1    _______|       |_______|       |_______|       |_______|       |
                      _______________                 _______________
q2    _______________|               |_______________|               |
                                      _______________________________
q3    _______________________________|                               |
Ran until 340 NS + 0
ncsim> exit
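The counting order in the output above can be reproduced with a small Python model. Every stage has J = K = '1', so each stage toggles whenever it is clocked. This sketch assumes stage q0 is clocked by clk. Each later stage is assumed to toggle when the previous stage falls from 1 to 0, which matches the counting-up sequence in jkff_cntr.out. The actual wiring is in jkff_cntr.vhdl. Ripple delays are not modeled.

```python
def ripple_count(n_stages=4, n_clocks=16):
    q = [0] * n_stages                 # q[0] is q0; reset starts every stage at '0'
    history = [tuple(reversed(q))]     # rows recorded as q3, q2, q1, q0
    for _ in range(n_clocks):
        for i in range(n_stages):
            q[i] ^= 1                  # J = K = '1', so each clocked stage toggles
            if q[i] == 1:
                break                  # rose 0->1: no falling edge, ripple stops here
        history.append(tuple(reversed(q)))
    return history

for row in ripple_count():
    print(*row)
```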
In many designs, only one input is needed and the resulting flip flop
is a D flip flop. A D flip flop needs 6 nand gates rather than the
9 nand gates needed by the JK flip flop. There is a proportional
reduction in devices when the flip flop is designed from basic
transistors.
The VHDL source code for the entity DFF, the D flip flop,
and the four bit counter is dff_cntr.vhdl
The Cadence run file is dff_cntr.run
The Cadence output file is dff_cntr.out
The VHDL source code for the D flip flops in the text book
is dff.vhdl
Entity dff1 is five nand model from page 248, Fig. 6-8.
Entity dff2 is six nand model from page 300, Fig. 6-36
The Cadence run file is dff.run
The Cadence output file is dff.out
The D flip flop ripple counter from the book
Sequential logic can be represented as three equivalent forms:
State Transition Table, State Transition Diagram and logic circuit.
A State is given a name, we use A, B, C for this discussion, yet
meaningful names are better. A machine or sequential logic circuit
can be in only one state at a time. We are assuming synchronous
logic where all flip flops are clocked by the same clock signal.
The input signal is assumed to be available just before each clock
transition. Optionally, the arrival of an input can also cause the
clock to have one pulse.
Two possible forms of State Transition Table are:
          Input
        | 0 | 1
     ---+---+---
state A | B | C
      B | C | B
      C | C | A
and
state Input state
  A     0     B
  A     1     C
  B     0     C
  B     1     B
  C     0     C
  C     1     A
The meaning of both tables is:
When in state A with input 0 transition to state B
When in state A with input 1 transition to state C
When in state B with input 0 transition to state C
When in state B with input 1 stay in state B
etc.
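Either table is simply a lookup from (state, input) to the next state. This minimal Python sketch steps through the machine one input per clock. It assumes the machine starts in state A. The dictionary name and the run helper are illustrative.

```python
TRANSITIONS = {
    ("A", 0): "B", ("A", 1): "C",
    ("B", 0): "C", ("B", 1): "B",
    ("C", 0): "C", ("C", 1): "A",
}

def run(inputs, state="A"):
    # start in A and take one transition per clock
    trace = [state]
    for bit in inputs:
        state = TRANSITIONS[(state, bit)]
        trace.append(state)
    return trace

print(run([0, 0, 1, 1, 0]))
```

This prints ['A', 'B', 'C', 'A', 'C', 'C'].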
The exact same information can be presented as a
State Transition Diagram.
The meaning is the same:
When in state A with input 0 transition to state B
When in state A with input 1 transition to state C
When in state B with input 0 transition to state C
When in state B with input 1 stay in state B
etc.
To convert either a State Transition Table or Diagram to
a circuit, assign a D flip flop to each state. The "q" output
of the flip flop is assigned the signal name of the state.
The "d" input of the flip flop is assigned a signal name
of the state concatenated with "in".
Write the combinational logic equations for each state
input from observing the "to" state in the transition table
or diagram.
For this sequential machine, using I as the input
Ain <= (C and I); -- C transitions to A when I='1'
Bin <= (A and not I) or -- A transitions to B when I='0'
(B and I); -- B transitions to B when I='1'
Cin <= (A and I) or -- A transitions to C when I='1'
(B and not I) or -- B transitions to C when I='0'
(C and not I); -- C transitions to C when I='0'
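The d-input equations can be checked against the transition table. Put each state's one-hot flip flop pattern on (A, B, C), apply the input, and confirm that the equations produce the one-hot pattern of the "to" state. The Python sketch below does this for all six transitions. Names other than Ain, Bin, Cin, A, B, C and I are illustrative.

```python
TRANSITIONS = {("A", 0): "B", ("A", 1): "C",
               ("B", 0): "C", ("B", 1): "B",
               ("C", 0): "C", ("C", 1): "A"}
ONE_HOT = {"A": (1, 0, 0), "B": (0, 1, 0), "C": (0, 0, 1)}   # q outputs (A, B, C)

def next_one_hot(A, B, C, I):
    # the d inputs Ain, Bin, Cin from the equations above
    Ain = C & I
    Bin = (A & (1 - I)) | (B & I)
    Cin = (A & I) | (B & (1 - I)) | (C & (1 - I))
    return Ain, Bin, Cin

def equations_match_table():
    return all(next_one_hot(*ONE_HOT[s], i) == ONE_HOT[t]
               for (s, i), t in TRANSITIONS.items())

print(equations_match_table())
```

This prints True.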
The partial circuit is shown below.
Implied is a set signal to A and reset signals to
B and C for the initial or start condition.
Implied is a common clock signal to all flip flops.
Not shown are the outputs, which may be any combinational
function of the input and states.
e.g. out <= (A and I) or (B and not I);
There is an algorithm and corresponding computer program for
minimizing the State Transition Table,
see Myhill-Nerode minimization.
There is an algorithm and corresponding computer program for
minimizing the combinational logic,
see Quine-McCluskey minimization.
One application of sequential logic is for garage door openers
or car door locks. The basic sequential logic is a spin lock.
This circuit has the property of eventually detecting the
specific sequence it is designed to accept. The transmitter may
start anywhere in the sequence and continue to repeat the sequence
until the receiver detects the specific sequence.
This "spin lock" is designed to accept the sequence 101101.
A transmitter could be designed to send the specific sequence
followed by an equal number of zero bits then repeat the
specific sequence. (Assuming the first bit of the sequence is a '1')
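The "eventually detects" property comes from a state machine whose state counts how many leading bits of 101101 have just been received. On a mismatch, it falls back to the longest partial match instead of starting over. This Python sketch builds that state table automatically, in the style of the Knuth-Morris-Pratt string matcher. It is a behavioral model, not the course schematic. The function names are hypothetical.

```python
def build_detector(pattern: str):
    # state k means: the last k bits received equal the first k bits of pattern
    m = len(pattern)
    table = []
    for k in range(m + 1):
        row = {}
        for bit in "01":
            seen = pattern[:k] + bit
            # next state = longest suffix of seen that is also a prefix of pattern
            j = min(len(seen), m)
            while j > 0 and seen[len(seen) - j:] != pattern[:j]:
                j -= 1
            row[bit] = j
        table.append(row)
    return table

def detect(stream: str, pattern: str = "101101"):
    table = build_detector(pattern)
    state, hits = 0, []
    for t, bit in enumerate(stream):
        state = table[state][bit]          # one transition per received bit
        if state == len(pattern):
            hits.append(t)                 # bit number t completed the sequence
    return hits

# transmitter started in the middle of the sequence, then zeros, then the sequence
print(detect("1101" + "000000" + "101101"))
```

This prints [15]: the receiver ignores the partial sequence and accepts the first complete one.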
More sophisticated spin locks will change the sequence that is
detected each time a sequence is accepted. The transmitter must then
send a family of sequences because, in general, the transmitter
will not know what the receiver's sequence setting is. A method
of handling this unknown is to have the receiver change to a
pseudo random setting of some of the bit positions. The
transmitter then generates and transmits all of the pseudo random
patterns in the correct bit positions. Sample pseudo random
sequence generators are shown below.
A maximal length pseudo random sequence generator can generate
2^n -1 unique patterns with an n-bit shift register. For each
number of shift register stages there are one or more feedback
circuits using just exclusive-or to compute the next input bit.
The output may be n-bit patterns, a2, a1, a0 in the circuit below.
The output may also be a bit stream taken from just a0 or a2.
The basic shift register is clocked at some frequency. Bits shift
left to right one position per clock. The top bit is inserted
based on the feedback into the exclusive-or gate(s).
A sequence, starting with the "seed" 0 0 1 is shown below:
0 0 1
1 0 0
0 1 0
1 0 1
1 1 0
1 1 1
0 1 1
0 0 1
Notice the 2^3 -1 = 7 unique patterns and then the repeat.
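The shift-and-feedback step can be simulated to confirm the period. The Python sketch below treats a row of the feedback table as h[n] ... h[0]. The new top bit is the exclusive-or of every a_i with i < n and h[i] = 1. Row 3, "1 0 1 1", gives taps a1 and a0. Row 5, "1 1 0 1 1 1", gives a4, a2, a1 and a0. This reading of the table is an assumption, checked here only against the 3-bit and 5-bit sequences listed in these notes. The loop ends because a0 is always a tap, so every state lies on a cycle.

```python
def lfsr_cycle(taps, seed):
    # state is a tuple written top (leftmost) bit first, (a[n-1], ..., a1, a0)
    # taps: indices i of the a_i bits that are exclusive-or'ed into the new top bit
    n = len(seed)
    state = tuple(seed)
    cycle = [state]
    while True:
        top = 0
        for i in taps:
            top ^= state[n - 1 - i]        # state[n-1-i] holds a_i
        state = (top,) + state[:-1]        # shift right one place, insert the top bit
        if state == cycle[0]:
            return cycle                   # back to the seed: one full period
        cycle.append(state)

for row in lfsr_cycle([1, 0], (0, 0, 1)):
    print(*row)
```

For the 3-bit register this prints the seven patterns listed above. The 5-bit taps give a period of 2^5 - 1 = 31.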
A maximal length pseudo random shift register for 5-bit patterns is
shown in typical abbreviated schematic form.
With a seed of 0 0 0 0 1 the next few values are
1 0 0 0 0
1 1 0 0 0
1 1 1 0 0
0 1 1 1 0
The full output sequence with bits reversed
Maximal length pseudo random sequences may be generated for
any length. Below, in shorthand notation, are the feedback paths
for many lengths up to 32.
/* length bits(high order first) of h[], right is h[0]
* 2 1 1 1 top bit always one, the input to msb stage
* 3 1 0 1 1 bottom bit always one, output of lsb stage
* 4 1 0 0 1 1 x(4)= x^1+x^0 x^0=1 initially
* 5 1 1 0 1 1 1 x(5)=x^4+x^2+x^1+x^0 x^0=1 initially
* 6 1 0 0 0 0 1 1 + is exclusive or
* 7 1 1 1 1 0 1 1 1
* 8 1 1 1 1 0 0 1 1 1
* 9 1 1 1 0 0 0 1 1 1 1
* 10 1 1 0 0 1 1 1 1 1 1 1
* 11 1 1 0 1 1 0 0 1 1 1 1 1
* 12 1 1 0 0 0 1 0 0 1 0 1 1 1
* 13 1 1 0 0 0 1 1 1 1 1 1 1 1 1
* 14 1 1 0 1 0 0 0 1 1 1 1 1 1 1 1
* 15 1 1 0 1 0 0 0 1 1 0 1 1 1 1 1 1
* 16 1 1 0 1 1 0 0 0 0 0 1 1 1 1 1 1 1
* 18 1 1 1 1 0 0 0 0 1 1 0 0 0 1 1 0 0 0 1
* 20 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1
* 24 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 1 1
* 30 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1
* 31 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1 1 1 0 1 1 1 0 1 1 1 1
* 32 1 0 0 0 0 0 1 0 0 1 1 0 0 0 0 0 1 0 0 0 1 1 1 0 1 1 0 1 1 0 1 1 1
*/
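A C sketch of a general register driven by one row of the table above. It assumes the row is passed as only its bit columns (for example "1 1 0 1 1 1" for length 5, without the leading length number), that the last digit is h[0], and that the leading 1 is the input to the msb stage and therefore not a tap. The seed follows the table note "x^0=1 initially". `taps_from_row`, `lfsr_next` and `lfsr_period` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Parse the bit columns of one table row, high order first.
 * Drops the leading 1 and stores h[n-1]..h[0] in *taps. Returns n. */
int taps_from_row(const char *row, uint64_t *taps)
{
    int ndigits = 0;
    uint64_t h = 0;
    for (; *row != '\0'; row++) {
        if (*row == '0' || *row == '1') {
            h = (h << 1) | (uint64_t)(*row - '0');
            ndigits++;
        }
    }
    assert(ndigits >= 3 && ndigits <= 33);   /* lengths 2 through 32 */
    *taps = h & ((UINT64_C(1) << (ndigits - 1)) - 1);
    return ndigits - 1;
}

/* One clock: the new top bit is the exclusive-or of all tapped bits. */
uint64_t lfsr_next(uint64_t state, uint64_t taps, int n)
{
    uint64_t t = state & taps;
    unsigned parity = 0;
    while (t != 0) {                  /* xor the tapped bits together */
        parity ^= (unsigned)(t & 1u);
        t >>= 1;
    }
    return (state >> 1) | ((uint64_t)parity << (n - 1));
}

/* Clocks until the state returns to seed; 2^n - 1 when maximal length. */
uint64_t lfsr_period(uint64_t taps, int n, uint64_t seed)
{
    uint64_t s = seed, count = 0;
    do {
        s = lfsr_next(s, taps, n);
        count++;
    } while (s != seed);
    return count;
}
```

For the rows of length 2 through 6 the measured periods should be 3, 7, 15, 31 and 63, i.e. 2^n - 1. Because h[0] is always one, each state has exactly one predecessor, so the loop in `lfsr_period` always comes back to the seed. Longer rows work the same way, but counting the whole period of the 32-bit row takes about 4.3 billion clocks.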
Below is a schematic of a one clock per instruction computer. The operation for each instruction is:
The Instruction Pointer Register contains the address of the next instruction to be executed. The instruction address goes into the Instruction Memory or Instruction Cache, and the instruction comes out ("inst" on the diagram).
The Instruction Decode has all the bytes of the instruction. The instruction has bits for the operation code, e.g. there is a different bit pattern for add, sub, etc. Most instructions reference one register; the register number has enough bits to select one of the general registers. Many instructions have a second register. (Not shown here: on some computers there can be three registers.) The second (or third) register may be the register number that receives the result of the operation. Many instructions have a memory address for an operand, a memory offset from a register, or immediate data for use by the operation. This data is passed into the ALU, either for computing a result or for computing an address.
The general registers receive two register numbers and very quickly output the data from those two registers.
The ALU receives two data values and control from the Operation Code part of the instruction. The ALU computes the value and outputs it on the line labeled "addr". This line goes three places:
  to the mux and possibly into the Instruction Pointer, if the operation is a jump or a branch,
  to the Data Memory or Data Cache, if the value is a computed memory address,
  to the mux that may return the value to a register.
The Data Memory or Data Cache receives an address and write data. Depending on the control signals "write" and "read", the Data Memory either reads the memory value and sends it to the mux, or writes the "write data" into memory at the location "addr".
The final mux may take the value just read from the Data Memory or Data Cache, or the computed value from the ALU, and return that value to a register. While the above signals are propagating, the Instruction Pointer is updated either by incrementing it by the number of bytes in the instruction or by loading the jump or branch address. This completes one instruction; the clock transitions and the next instruction is started.
The timing consideration that limits the speed of this design is the long propagation path from the new Instruction Pointer value until the register is written. Notice that both the register and the Data Cache are written on clock_bar. Any real computer using this design must use instruction and data caches, because RAM access is slower than logic on the CPU chip.
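To make the data flow concrete, here is a C sketch that executes one instruction per call, in the order described above. The instruction set, field layout and names (`struct inst`, `struct cpu`, `step`, `OP_ADD` and so on) are hypothetical and are not the machine in the schematic. Instructions are counted in words rather than bytes, so the Instruction Pointer increments by 1, and as a simplification the ALU passes the branch target through on "addr" while the equality test is done beside it.

```c
#include <assert.h>

/* Hypothetical instruction set, for illustration only. */
enum op { OP_ADD, OP_SUB, OP_ADDI, OP_LOAD, OP_STORE, OP_BEQ, OP_HALT };

struct inst {      /* fields produced by the Instruction Decode */
    enum op op;    /* operation code */
    int ra, rb;    /* two register numbers sent to the general registers */
    int rd;        /* register that receives the result */
    int imm;       /* immediate data, memory offset, or branch target */
};

struct cpu {
    int ip;        /* Instruction Pointer Register */
    int reg[8];    /* general registers */
    int dmem[64];  /* Data Memory (stands in for the Data Cache) */
};

/* One clock: execute the instruction at ip. Returns 0 on HALT, else 1. */
int step(struct cpu *c, const struct inst imem[])
{
    struct inst in = imem[c->ip];              /* fetch: "inst" */
    int a = c->reg[in.ra], b = c->reg[in.rb];  /* register read */
    int addr;                                  /* ALU output line "addr" */
    int wdata;                                 /* final mux output */
    int next_ip = c->ip + 1;                   /* default: increment ip */

    switch (in.op) {                           /* ALU, controlled by opcode */
    case OP_ADD:  addr = a + b;      break;
    case OP_SUB:  addr = a - b;      break;
    case OP_BEQ:  addr = in.imm;     break;    /* branch target */
    case OP_HALT: return 0;
    default:      addr = a + in.imm; break;    /* ADDI, LOAD, STORE */
    }

    if (in.op == OP_LOAD || in.op == OP_STORE)
        assert(addr >= 0 && addr < 64);        /* stay inside dmem[] */
    if (in.op == OP_STORE)
        c->dmem[addr] = b;                     /* Data Memory write */
    wdata = (in.op == OP_LOAD) ? c->dmem[addr] : addr;   /* final mux */
    if (in.op == OP_ADD || in.op == OP_SUB || in.op == OP_ADDI ||
        in.op == OP_LOAD)
        c->reg[in.rd] = wdata;                 /* register write */
    if (in.op == OP_BEQ && a == b)
        next_ip = addr;                        /* "addr" into the ip */
    c->ip = next_ip;                           /* clock: next instruction */
    return 1;
}
```

A short program that loads 5 and 7 with ADDI, adds them, stores the sum at data address 10, loads it back into another register and then takes a BEQ over one instruction to a HALT runs in six clocks, leaving 12 in both the sum register and the loaded register.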
This lecture uses Intel documentation on the IA-32 Architecture. In principle this covers all Intel 80x86 machines up to and including the Pentium 4. The documentation is stored locally in order to minimize network traffic.
First look over Appendix B. (This is a .pdf file that your browser should open with acroread. Look on the left for the table of contents and ultimately click on Appendix B.) Intel IA-32 Instructions(pdf)
Note the "One Byte" opcodes. There are two tables with up to 128 instruction operation codes in each table. Then move on to the "Two Byte" opcodes: the first opcode byte, 0FH, tells the CPU to look at the next byte to determine the operation code for this instruction.
Now move back to Appendix A and see the various formats that an instruction may have. Consider the choices that would have to be made by a programmer writing a disassembler for this architecture.
The IA-32 is a CISC, Complex Instruction Set Computer. This is in contrast to computer architectures such as the Alpha, MIPS, PowerPC (Power4, Mac G5), etc. that are RISC, Reduced Instruction Set Computer. "Reduced" does not necessarily mean fewer instructions; it means lower complexity and more regularity. Typically all instructions are the same number of bytes, and four bytes (32 bits) is the most popular size. Regular also means that all registers are general purpose, unlike the IA-32, which uses EAX and EDX for multiply and divide.
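A C sketch of the first decision a disassembler for the IA-32 has to make: skip any legacy prefix bytes, then check for the 0FH escape that selects the two-byte opcode table. It deliberately stops there; decoding the ModR/M, SIB, displacement and immediate bytes is where most of the remaining choices lie. For contrast, `mips_rtype` pulls the fields out of a 32-bit MIPS R-type instruction, where every field sits at a fixed bit position. The function and struct names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Is b one of the IA-32 legacy prefix bytes? */
static int is_prefix(uint8_t b)
{
    switch (b) {
    case 0xF0: case 0xF2: case 0xF3:             /* LOCK, REPNE, REP */
    case 0x2E: case 0x36: case 0x3E: case 0x26:  /* CS, SS, DS, ES override */
    case 0x64: case 0x65:                        /* FS, GS override */
    case 0x66: case 0x67:                        /* operand, address size */
        return 1;
    default:
        return 0;
    }
}

/* Find the opcode after any prefixes. Sets *opcode (0x0Fxx for a two-byte
 * opcode) and returns the offset just past the opcode, or -1 if the
 * buffer ends too early. */
int ia32_opcode(const uint8_t *code, size_t len, unsigned *opcode)
{
    size_t i = 0;
    while (i < len && is_prefix(code[i]))
        i++;
    if (i >= len)
        return -1;
    if (code[i] == 0x0F) {                 /* escape to two-byte table */
        if (i + 1 >= len)
            return -1;
        *opcode = 0x0F00u | code[i + 1];
        return (int)(i + 2);
    }
    *opcode = code[i];                     /* one-byte opcode */
    return (int)(i + 1);
}

/* RISC contrast: MIPS R-type fields are always in the same bits. */
struct rtype { unsigned op, rs, rt, rd, shamt, funct; };

struct rtype mips_rtype(uint32_t w)
{
    struct rtype r;
    r.op    = (w >> 26) & 0x3F;
    r.rs    = (w >> 21) & 0x1F;
    r.rt    = (w >> 16) & 0x1F;
    r.rd    = (w >> 11) & 0x1F;
    r.shamt = (w >> 6) & 0x1F;
    r.funct = w & 0x3F;
    return r;
}
```

For example, the bytes 66 01 D8 (add ax, bx) give opcode 01H after one prefix byte, while 0F A2 (cpuid) is found in the two-byte table. The MIPS word 0x012A4020 (add $t0, $t1, $t2) decodes to rs = 9, rt = 10, rd = 8, funct = 0x20 without any scanning.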
The CPU can be described by control paths and data paths. This lecture will follow a few instructions through the data paths indicated in the architecture schematic in lecture 24. We then consider instruction timing and the possible improvement from higher clock speeds with a pipelined computer architecture.
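The timing argument can be sketched with a little arithmetic in C. In the one clock per instruction design the clock period must cover the whole path, so it is the sum of the stage delays; in a pipeline the period is set by the slowest stage plus the delay of the pipeline register placed between stages. The function names are hypothetical, and the delays used below are made-up numbers for illustration, not measurements of any real CPU.

```c
#include <assert.h>

/* One clock per instruction: the clock must cover every stage. */
int single_cycle_period(const int delay_ps[], int nstages)
{
    int i, total = 0;
    for (i = 0; i < nstages; i++)
        total += delay_ps[i];
    return total;
}

/* Pipelined: the slowest stage plus the pipeline register delay. */
int pipelined_period(const int delay_ps[], int nstages, int latch_ps)
{
    int i, worst = 0;
    for (i = 0; i < nstages; i++)
        if (delay_ps[i] > worst)
            worst = delay_ps[i];
    return worst + latch_ps;
}

/* Clock frequency in MHz for a period in picoseconds (10^6 ps = 1 us). */
double mhz(int period_ps)
{
    return 1.0e6 / period_ps;
}
```

With hypothetical stage delays of 200, 100, 150, 200 and 100 ps and a 20 ps pipeline register, `single_cycle_period` gives 750 ps (about 1333 MHz) and `pipelined_period` gives 220 ps (about 4545 MHz). Each instruction now takes five clocks, 1100 ps, to finish; the gain is that a new instruction can start every clock, and in practice hazards such as branches and loads reduce this ideal speedup.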
Review previous lectures. Sample questions will be presented in class.
The midterm was considered the end of the Assembly Language
part of this course. Thus, the final exam will cover
lectures 15 through 29 on digital logic and computer organization.
There will be questions of these types:
true-false
multiple choice
short answer (words, numbers, logic equations)
know the symbols and truth tables for
"and" "nand" "or" "nor" "not" "xor" "mux" "dff"
know how to recognize the corresponding State Diagram,
State Transition Table and schematic and VHDL statements
for sequential logic (e.g. the project).
know how to construct a Karnaugh map from minterms.
know how to get a VHDL equation from a Karnaugh map.
recognize adders, subtractors and simple logic circuits.
understand data flow through a computer architecture.
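For reviewing the truth tables, here is a small C sketch of the listed gates, a 2-to-1 mux, a full adder built from the gates, and a D flip-flop. The dff is assumed here to be positive edge triggered; a negative edge (clock_bar) version would only change the edge test. All function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Gates: inputs and outputs are 0 or 1. */
int and_g(int a, int b)  { return a & b; }
int nand_g(int a, int b) { return !(a & b); }
int or_g(int a, int b)   { return a | b; }
int nor_g(int a, int b)  { return !(a | b); }
int xor_g(int a, int b)  { return a ^ b; }
int not_g(int a)         { return !a; }

/* 2-to-1 mux: output is a when sel is 0, b when sel is 1. */
int mux2(int sel, int a, int b) { return sel ? b : a; }

/* D flip-flop (assumed positive edge): q takes d only on a 0 -> 1 clock. */
struct dff { int q; int last_clk; };

void dff_clock(struct dff *f, int clk, int d)
{
    if (clk && !f->last_clk)
        f->q = d;
    f->last_clk = clk;
}

/* Full adder from the gates above. */
void full_adder(int a, int b, int cin, int *sum, int *cout)
{
    *sum  = xor_g(xor_g(a, b), cin);
    *cout = or_g(and_g(a, b), and_g(cin, xor_g(a, b)));
}

/* Print the truth tables of the two-input gates. */
void print_gate_tables(void)
{
    int a, b;
    printf(" a b | and nand or nor xor\n");
    for (a = 0; a <= 1; a++)
        for (b = 0; b <= 1; b++)
            printf(" %d %d |  %d    %d   %d   %d   %d\n", a, b,
                   and_g(a, b), nand_g(a, b), or_g(a, b),
                   nor_g(a, b), xor_g(a, b));
}
```

`print_gate_tables()` prints one row per input combination, which can be checked against the tables given in class.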
Last updated 5/9/04