CS313 Selected Lecture Notes

This is one big WEB page, used for printing

 These are not intended to be complete lecture notes.
 Complicated figures or tables or formulas are included here
 in case they were not clear or not copied correctly in class.
 Source code may be included in line or by a link.

 Lecture numbers correspond to the syllabus numbering.

Contents

  • Lecture 1 Number Systems
  • Lecture 2 NASM
  • Lecture 3 Registers, Syntax and sections
  • Lecture 4 Arithmetic and shifting
  • Lecture 5 Using Debugger
  • Lecture 6 Branching and loops
  • Lecture 7 Subroutines and stacks
  • Lecture 8 Boot programs and 16-bit
  • Lecture 9 BIOS calls
  • Lecture 10 Hardware interface
  • Lecture 11 Privileged instructions
  • Lecture 12 Linux kernel calls
  • Lecture 13 Review
  • Lecture 14 Mid term exam
  • Lecture 15 Logic Gates
  • Lecture 16 Combinational logic
  • Lecture 17 Combinational logic design
  • Lecture 18 Simulation tools
  • Lecture 19 Arithmetic circuits
  • Lecture 20 Multiply and divide
  • Lecture 21 Karnaugh maps, Quine-McCluskey
  • Lecture 22 Flip-flops, latches, registers
  • Lecture 23 Sequential logic
  • Lecture 24 Computer organization
  • Lecture 25 Instruction set
  • Lecture 26 Data Paths
  • Lecture 27 Arithmetic Logic Unit
  • Lecture 28 Architecture
  • Lecture 29 Review
  • Lecture 30 Final Exam
  • Other Links

    Lecture 1 Number systems

    Numbers are represented as the coefficients of powers of a base.
    (in plain text, we use "^" to mean, raise to power or exponentiation)
    
    With no extra base indication, expect decimal numbers:
    
             12.34   is a representation of
    
      1*10^1 + 2*10^0 + 3*10^-1 + 4*10^-2  or
    
         10
          2
           .3
        +  .04
        ------
         12.34
    
    
    Binary numbers, in NASM assembly language, have a trailing B or b.
    
         101.11B  is a representation of
    
      1*2^2 + 0*2^1 + 1*2^0 + 1*2^-1 + 1*2^-2   or
    
         4
         0
         1
          .5        (you may compute 2^-n or look up in table below)
       +  .25
       ------
         5.75
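    The same evaluation can be sketched in C. This is an illustrative
    helper, not one of the course files: binary_value is a hypothetical
    name, and it assumes the string holds only 0, 1 and at most one '.',
    with no sign and no NASM B suffix.

```c
/* Evaluate a binary numeral such as "101.11": each digit is a
   coefficient of a power of 2.  Assumes only '0', '1' and at most
   one '.', with no sign and no NASM 'B' suffix. */
double binary_value(const char *s)
{
    double value = 0.0;

    /* integer part: each new digit doubles the value so far,
       which is the same as multiplying each digit by 2^position */
    while (*s == '0' || *s == '1') {
        value = value * 2.0 + (*s - '0');
        s++;
    }

    /* fraction part: digits weigh 2^-1, 2^-2, 2^-3, ... */
    if (*s == '.') {
        double weight = 0.5;
        for (s++; *s == '0' || *s == '1'; s++) {
            if (*s == '1')
                value += weight;
            weight /= 2.0;
        }
    }
    return value;
}
```

    For example, binary_value("101.11") returns 5.75, and
    binary_value("1100.01010111") returns 12.33984375, which shows how
    close the 8-bit binary fraction in the conversion of 12.34 comes.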
    
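    Converting a decimal number to binary may be accomplished as follows:
    repeatedly divide the integer part by 2 and read the remainders from
    the bottom up; repeatedly multiply the fraction part by 2 and read
    the integer parts from the top down, stopping when enough bits have
    been produced.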
    
       Convert  12.34  from decimal to binary
    
       Integer part                      Fraction part
            quotient remainder                integer fraction
       12/2 =   6       0              .34*2 =      0.68
        6/2 =   3       0              .68*2 =      1.36
        3/2 =   1       1              .36*2 =      0.72
        1/2 =   0       1              .72*2 =      1.44
        done                           .44*2 =      0.88
        read up  1100                  .88*2 =      1.76
                                       .76*2 =      1.52
                                       .52*2 =      1.04
                                       quit
                                       read down   .01010111
        answer is approximately  1100.01010111
        (.34 has no finite binary expansion, so the fraction was cut off at 8 bits)
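    A C sketch of both halves of the procedure, assuming a non-negative
    input whose integer part fits in an unsigned long long. The name
    to_binary and the frac_bits limit are illustrative choices, not from
    the course files.

```c
/* Convert x >= 0 to a binary string in buf.  Integer part: divide by
   2 repeatedly, remainders read from the bottom up.  Fraction part:
   multiply by 2 repeatedly, integer parts read from the top down,
   keeping frac_bits bits.  buf needs room for 64 + 1 + frac_bits + 1
   characters.  Assumes the integer part fits in unsigned long long. */
void to_binary(double x, int frac_bits, char *buf)
{
    unsigned long long ipart = (unsigned long long)x;
    double fpart = x - (double)ipart;
    char rev[64];                 /* remainders, in the order produced */
    int n = 0, k = 0;

    do {
        rev[n++] = (char)('0' + (int)(ipart % 2));   /* remainder */
        ipart /= 2;                                   /* quotient  */
    } while (ipart != 0);
    while (n > 0)                 /* "read up": last remainder first */
        buf[k++] = rev[--n];

    buf[k++] = '.';
    for (int i = 0; i < frac_bits; i++) {
        fpart *= 2.0;
        int bit = fpart >= 1.0;   /* integer part of fraction*2 */
        buf[k++] = (char)('0' + bit);
        fpart -= bit;             /* keep only the new fraction */
    }
    buf[k] = '\0';
}
```

    to_binary(12.34, 8, buf) leaves "1100.01010111" in buf, matching the
    hand conversion. Multiplying a double by 2 and subtracting 1 are both
    exact, so each fraction bit produced is a bit of the stored double;
    for these first 8 bits it agrees with the exact expansion of .34.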
    
    
      Powers of 2
                       Decimal
                   2^n   n   2^-n
                     1   0   1.0 
                     2   1   0.5 
                     4   2   0.25 
                     8   3   0.125 
                    16   4   0.0625 
                    32   5   0.03125 
                    64   6   0.015625 
                   128   7   0.0078125 
                   256   8   0.00390625
                   512   9   0.001953125
                  1024  10   0.0009765625 
                  2048  11   0.00048828125 
                  4096  12   0.000244140625 
                  8192  13   0.0001220703125 
                 16384  14   0.00006103515625 
                 32768  15   0.000030517578125 
                 65536  16   0.0000152587890625 
    
                       Binary
                   2^n   n   2^-n
                     1   0   1.0 
                    10   1   0.1
                   100   2   0.01 
                  1000   3   0.001 
                 10000   4   0.0001 
                100000   5   0.00001 
               1000000   6   0.000001 
              10000000   7   0.0000001 
             100000000   8   0.00000001
            1000000000   9   0.000000001
           10000000000  10   0.0000000001 
          100000000000  11   0.00000000001 
         1000000000000  12   0.000000000001 
        10000000000000  13   0.0000000000001 
       100000000000000  14   0.00000000000001 
      1000000000000000  15   0.000000000000001 
     10000000000000000  16   0.0000000000000001 
    
    
                      Hexadecimal
                   2^n   n   2^-n
                     1   0   1.0 
                     2   1   0.8
                     4   2   0.4 
                     8   3   0.2 
                    10   4   0.1 
                    20   5   0.08 
                    40   6   0.04 
                    80   7   0.02 
                   100   8   0.01
                   200   9   0.008
                   400  10   0.004 
                   800  11   0.002 
                  1000  12   0.001 
                  2000  13   0.0008 
                  4000  14   0.0004 
                  8000  15   0.0002 
                 10000  16   0.0001 
                 
                 
        n       2^n (hex)       2^n (decimal)   approx notation
       10             400               1,024   10^3   K kilo
       20          100000           1,048,576   10^6   M mega
       30        40000000       1,073,741,824   10^9   G giga
       40     10000000000   1,099,511,627,776   10^12  T tera
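    Since 2^n is 1 shifted left n places, the rows above can be
    regenerated in C. A small sketch, assuming a compiler with a 64-bit
    unsigned long long; pow2 and print_pow2_rows are hypothetical helper
    names.

```c
#include <stdio.h>

/* 2^n as 1 shifted left n places; valid for 0 <= n <= 63. */
unsigned long long pow2(int n)
{
    return 1ULL << n;
}

/* Print n, 2^n in hexadecimal and 2^n in decimal for n = 10, 20, 30, 40. */
void print_pow2_rows(void)
{
    for (int n = 10; n <= 40; n += 10)
        printf("%3d %14llX %18llu\n", n, pow2(n), pow2(n));
}
```

    The approx column is only an approximation: 2^10 = 1,024 is 2.4
    percent larger than 10^3, and the gap grows with each step.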
    
    The three representations of negative numbers that have been
    used in computers are  twos complement,  ones complement  and
    sign magnitude. In order to represent negative numbers, it must
    be known where the "sign" bit is placed. All modern binary
    computers use the leftmost bit of the computer word as a sign bit.
    
    The examples below use a 4-bit register to show all possible
    values for the three representations.
    
     decimal   twos complement  ones complement  sign magnitude
           0      0000            0000             0000
           1      0001            0001             0001
           2      0010            0010             0010
           3      0011            0011             0011
           4      0100            0100             0100
           5      0101            0101             0101
           6      0110            0110             0110
           7      0111            0111             0111
          -7      1001            1000             1111
          -6      1010            1001             1110
          -5      1011            1010             1101
          -4      1100            1011             1100
          -3      1101            1100             1011
          -2      1110            1101             1010
          -1      1111            1110             1001
              -8  1000        -0  1111         -0  1000
                      ^           /                ^||| 
                       \_ add 1 _/          sign__/ --- magnitude
    
    To get the sign magnitude form, convert the magnitude (absolute
    value) of the decimal number to binary, then place a zero in the
    sign bit for positive or a one in the sign bit for negative.
    
    To get the ones complement of a negative number, convert its
    magnitude to binary, including leading zeros, then invert every
    bit: 1->0, 0->1.
    
    To get the twos complement, get the ones complement and add 1.
    (Throw away any bits that are outside of the register)
    
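    It may seem silly to have a negative zero, but twos complement pays
    for its single zero with an extra negative value. In a 4-bit
    register, negating -8 gives -8 again (invert 1000 to get 0111, add 1
    to get 1000), and -(-8) = -8 is mathematically incorrect.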
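    The three encodings and the -(-8) problem can be checked with a few
    C helpers. This sketch models a 4-bit register by keeping only the
    low 4 bits of an unsigned value; the function names are illustrative,
    and sign_magnitude4 and ones_complement4 accept only -7..7.

```c
/* Model a 4-bit register: every result is kept in the low 4 bits. */

/* sign magnitude, v in -7..7: sign in bit 3, magnitude in bits 2..0 */
unsigned sign_magnitude4(int v)
{
    unsigned mag = (unsigned)(v < 0 ? -v : v);
    return (v < 0 ? 0x8u : 0x0u) | mag;
}

/* ones complement, v in -7..7: negative values invert every bit */
unsigned ones_complement4(int v)
{
    unsigned mag = (unsigned)(v < 0 ? -v : v);
    return v < 0 ? (~mag & 0xFu) : mag;
}

/* twos complement, v in -8..7: ones complement plus 1,
   throwing away any carry out of the 4 bits */
unsigned twos_complement4(int v)
{
    unsigned mag = (unsigned)(v < 0 ? -v : v);
    return v < 0 ? ((~mag + 1u) & 0xFu) : mag;
}

/* read a 4-bit twos complement pattern back as a signed value */
int from_twos4(unsigned bits)
{
    bits &= 0xFu;
    return (bits & 0x8u) ? (int)bits - 16 : (int)bits;
}

/* negate inside the 4-bit register: invert, add 1, keep 4 bits */
int negate4(int v)
{
    return from_twos4(~twos_complement4(v) + 1u);
}
```

    For example, twos_complement4(-7) is 0x9 (1001), ones_complement4(-7)
    is 0x8 (1000), sign_magnitude4(-7) is 0xF (1111), and negate4(-8)
    returns -8.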
    
    

    Lecture 2 Getting and using NASM

    NASM is installed on  linux.gl.umbc.edu  and can be used there.
    
    From anywhere that you can reach the internet, log onto your
    UMBC account using:
    
        ssh  your-user-id@linux.gl.umbc.edu
        your-password
    
    
    You should set up a directory for CMSC 313 and keep all your
    course work in one directory.
    
      e.g.    mkdir  cs313  # only once
              cd  cs313
    
    Copy over a sample program to your directory using:
    
     cp /afs/umbc.edu/users/s/q/squire/pub/download/hello.asm  .
    
    
    Assemble hello.asm using:
    
      nasm -f elf hello.asm
    
    Link to create an executable using:
    
      gcc -o hello  hello.o
    
    Execute the program using:
    
      ./hello
    
    Now look at the file hello.asm
    
    ;  hello.asm  a first program for nasm for Linux, Intel, gcc
    ;
    ; assemble:	nasm -f elf -l hello.lst  hello.asm
    ; link:		gcc -o hello  hello.o
    ; run:	        hello 
    ; output is:	Hello World 
    
    	SECTION .data		; data section
    msg:	db "Hello World",10	; the string to print, 10=lf or '\n'
    len:	equ $-msg		; "$" means "here"
    				; len is a value, not an address
    
    	SECTION .text		; code section
            global main		; make label available to linker 
    main:				; standard  gcc  entry point
    	
    	mov	edx,len		; arg3, length of string to print
    	mov	ecx,msg		; arg2, pointer to string
    	mov	ebx,1		; arg1, where to write, screen
    	mov	eax,4		; write command to int 80 hex
    	int	0x80		; interrupt 80 hex, call kernel
    	
    	mov	ebx,0		; exit code, 0=normal
    	mov	eax,1		; exit command to kernel
    	int	0x80		; interrupt 80 hex, call kernel
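    For comparison, roughly the same work in C, as a sketch assuming a
    POSIX system: the write function takes the same three values that
    hello.asm loads into ebx, ecx and edx before int 0x80 (file
    descriptor 1, pointer to the string, length). The helper name
    say_hello is hypothetical. The kernel exit call with ebx=0
    corresponds roughly to returning 0 from main.

```c
#include <unistd.h>

/* The string, as in hello.asm: 11 characters plus line feed (10).
   C adds a terminating '\0' that is not written. */
static const char msg[] = "Hello World\n";

/* write(fd, buffer, count) takes the same three values hello.asm
   puts in ebx, ecx and edx before int 0x80 with eax = 4.
   Returns the number of bytes written, or -1 on error. */
long say_hello(void)
{
    return (long)write(STDOUT_FILENO, msg, sizeof msg - 1);
}
```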
    
    
    There can be many types of data in the  ".data" section:
    Look at the file testdata.asm
    and see the results in testdata.lst
    
    ;  testdata.asm  a program to demonstrate data types and values
    ;
    ; assemble:	nasm -f elf -l testdata.lst  testdata.asm
    ; link:		gcc -o testdata  testdata.o
    ; run:	        testdata 
    ; Look at the list file, testdata.lst
    
    ; Note! nasm ignores the type of data and type of reserved
    ; space when used as memory addresses.
    ; You may have to use qualifiers BYTE, WORD or DWORD [dd01]
    
    	
    	section .data		; data section
    				; initialized, writeable
    
    				; db for data byte, 8-bit 
    db01:	db	255,1,17	; decimal values for bytes
    db02:	db	0xff,0ABh	; hexadecimal values for bytes
    db03:	db	'a','b','c'	; character values for bytes
    db04:	db	"abc"		; string value as bytes 'a','b','c'
    db05:	db	'abc'		; same as "abc" three bytes
    db06:	db	"hello",13,10,0 ; "C" string including cr and lf
    
    				; dw for data word, 16-bit
    dw01:	dw	12345,-17,32	; decimal values for words
    dw02:	dw	0xFFFF,0abcdH	; hexadecimal values for words
    dw03:	dw	'a','ab','abc'	; character values for words
    dw04:	dw	"hello"		; three words, 6-bytes allocated
    
    				; dd for data double word, 32-bit
    dd01:	dd	123456789,-7	; decimal values for double words
    dd02:	dd	0xFFFFFFFF	; hexadecimal value for double words
    dd03:	dd	'a'		; character value in double word
    dd04:	dd	"hello"		; string in two double words
    dd05:	dd	13.27E30	; floating point value 32-bit IEEE
    
    				; dq for data quad word, 64-bit
    dq01:	dq	13.27E300	; floating point value 64-bit IEEE
    
    				; dt for data ten bytes, 80-bit floating point
    dt01:	dt	13.270E3000	; floating point value 80-bit in register
    
    
    	
    	section .bss		; reserve storage space
    				; uninitialized, writeable
    	
    s01:	resb	10		; 10 8-bit bytes reserved
    s02:	resw	20		; 20 16-bit words reserved
    s03:	resd	30		; 30 32-bit double words reserved
    s04:	resq	40		; 40 64-bit quad words reserved
    s05:	resb	1		; one more byte
    	
    	SECTION .text		; code section
            global main		; make label available to linker 
    main:				; standard  gcc  entry point
    
    	
    	mov	al,[db01]	; correct to load a byte
    	mov	ah,[db01]	; correct to load a byte
    	mov	ax,[dw01]	; correct to load a word
    	mov	eax,[dd01]	; correct to load a double word
    
    	mov	al,BYTE [db01]	; redundant, yet allowed
    
    	mov	ax,[db01]	; no warning, loads two bytes
    	mov	eax,[dw01]	; no warning, loads two words
    
    ;	mov	ax,BYTE [db01]	; error, size mismatch
    ;	mov	eax,WORD [dw01]	; error, size mismatch
    
    ;	push	BYTE [db01]	; error, can not push a byte
    	push	WORD [dw01]	; "push" needs to know size 2-byte
    	push	DWORD [dd01]	; "push" needs to know size 4-byte
    ;	push	QWORD [dq01]	; error, can not push a quad word
    	push	DWORD [dq01+4]	; push a floating point, half of it
    	push	DWORD [dq01]	; push other half of floating point
    	fld	DWORD [dd05]	; floating load 32-bit
    	fld	QWORD [dq01]	; floating load 64-bit
    		
    	mov	ebx,0		; exit code, 0=normal
    	mov	eax,1		; exit command to kernel
    	int	0x80		; interrupt 80 hex, call kernel
    
    ; end testdata.asm
    
    Now, see the values in testdata.lst (widen your window)
    
         1                                  ;  testdata.asm  program to demonstrate data types and values
         2                                  ;
         3                                  ; assemble:	nasm -f elf -l testdata.lst  testdata.asm
         4                                  ; link:		gcc -o testdata  testdata.o
         5                                  ; run:	        testdata 
         6                                  ; Look at the list file, testdata.lst
         7                                  
         8                                  ; Note! nasm ignores the type of data and type of reserved
         9                                  ; space when used as memory addresses.
        10                                  ; You may have to use qualifiers BYTE, WORD or DWORD [dd01]
        11                                  
        12                                  	
        13                                  	section .data		; data section
        14                                  				; initialized, writeable
        15                                  
        16                                  				; db for data byte, 8-bit 
        17 00000000 FF0111                  db01:	db	255,1,17	; decimal values for bytes
        18 00000003 FFAB                    db02:	db	0xff,0ABh	; hexadecimal values for bytes
        19 00000005 616263                  db03:	db	'a','b','c'	; character values for bytes
        20 00000008 616263                  db04:	db	"abc"		; string value as bytes 'a','b','c'
        21 0000000B 616263                  db05:	db	'abc'		; same as "abc" three bytes
        22 0000000E 68656C6C6F0D0A00        db06:	db	"hello",13,10,0 ; "C" string including cr and lf
        23                                  
        24                                  				; dw for data word, 16-bit
        25 00000016 3930EFFF2000            dw01:	dw	12345,-17,32	; decimal values for words
        26 0000001C FFFFCDAB                dw02:	dw	0xFFFF,0abcdH	; hexadecimal values for words
        27 00000020 6100616261626300        dw03:	dw	'a','ab','abc'	; character values for words
        28 00000028 68656C6C6F00            dw04:	dw	"hello"		; three words, 6-bytes allocated
        29                                  
        30                                  				; dd for data double word, 32-bit
        31 0000002E 15CD5B07F9FFFFFF        dd01:	dd	123456789,-7	; decimal values for double words
        32 00000036 FFFFFFFF                dd02:	dd	0xFFFFFFFF	; hexadecimal value for double words
        33 0000003A 61000000                dd03:	dd	'a'		; character value in double word
        34 0000003E 68656C6C6F000000        dd04:	dd	"hello"		; string in two double words
        35 00000046 AF7D2773                dd05:	dd	13.27E30	; floating point value 32-bit IEEE
        36                                  
        37                                  				; dq for data quad word, 64-bit
        38 0000004A C86BB752A7D0737E        dq01:	dq	13.27E300	; floating point value 64-bit IEEE
        39                                  
        40                                  				; dt for data ten bytes, 80-bit floating point
        41 00000052 4011E5A59932D5B6F0-     dt01:	dt	13.270E3000	; floating point value 80-bit in register
        42 0000005B 66                 
        43                                  
        44                                  
        45                                  	
        46                                  	section .bss		; reserve storage space
        47                                  				; uninitialized, writeable
        48                                  	
        49 00000000           s01:	resb	10		; 10 8-bit bytes reserved
        50 0000000A           s02:	resw	20		; 20 16-bit words reserved
        51 00000032           s03:	resd	30		; 30 32-bit double words reserved
        52 000000AA           s04:	resq	40		; 40 64-bit quad words reserved
        53 000001EA           s05:	resb	1		; one more byte
        54                                  	
        55                                  	SECTION .text		; code section
        56                                          global main		; make label available to linker 
        57                                  main:				; standard  gcc  entry point
        58                                  
        59                                  	
        60 00000000 A0[00000000]            	mov	al,[db01]	; correct to load a byte
        61 00000005 8A25[00000000]          	mov	ah,[db01]	; correct to load a byte
        62 0000000B 66A1[16000000]          	mov	ax,[dw01]	; correct to load a word
        63 00000011 A1[2E000000]            	mov	eax,[dd01]	; correct to load a double word
        64                                  
        65 00000016 A0[00000000]            	mov	al,BYTE [db01]	; redundant, yet allowed
        66                                  
        67 0000001B 66A1[00000000]          	mov	ax,[db01]	; no warning, loads two bytes
        68 00000021 A1[16000000]            	mov	eax,[dw01]	; no warning, loads two words
        69                                  
        70                                  ;	mov	ax,BYTE [db01]	; error, size mismatch
        71                                  ;	mov	eax,WORD [dw01]	; error, size mismatch
        72                                  
        73                                  ;	push	BYTE [db01]	; error, can not push a byte
        74 00000026 66FF35[16000000]        	push	WORD [dw01]	; "push" needs to know size 2-byte
        75 0000002D FF35[2E000000]          	push	DWORD [dd01]	; "push" needs to know size 4-byte
        76                                  ;	push	QWORD [dq01]	; error, can not push a quad word
        77 00000033 FF35[4E000000]          	push	DWORD [dq01+4]	; push a floating point, half of it
        78 00000039 FF35[4A000000]          	push	DWORD [dq01]	; push other half of floating point
        79 0000003F D905[46000000]          	fld	DWORD [dd05]	; floating load 32-bit
        80 00000045 DD05[4A000000]          	fld	QWORD [dq01]	; floating load 64-bit
        81                                  		
        82 0000004B BB00000000              	mov	ebx,0		; exit code, 0=normal
        83 00000050 B801000000              	mov	eax,1		; exit command to kernel
        84 00000055 CD80                    	int	0x80		; interrupt 80 hex, call kernel
        85                                  
        86                                  ; end testdata.asm
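    The listing shows that the 80x86 stores multi-byte values least
    significant byte first (little-endian): dw 12345 appears as 39 30
    because 12345 is 3039 hex, and dd -7 appears as F9 FF FF FF. A C
    sketch that produces the same byte order on any host by shifting,
    rather than relying on the host's own memory layout; store_le16 and
    store_le32 are hypothetical names.

```c
#include <stdint.h>

/* Store v least significant byte first (little-endian), the byte
   order the 80x86 uses and the listing shows. */
void store_le16(uint16_t v, unsigned char *buf)
{
    buf[0] = (unsigned char)(v & 0xFFu);   /* low byte first */
    buf[1] = (unsigned char)(v >> 8);      /* then high byte */
}

void store_le32(uint32_t v, unsigned char *buf)
{
    for (int i = 0; i < 4; i++)
        buf[i] = (unsigned char)((v >> (8 * i)) & 0xFFu);  /* byte i = bits 8i..8i+7 */
}
```

    store_le32(123456789u, buf) gives 15 CD 5B 07, the first four bytes
    of dd01 in the listing.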
    
    
    

    Lecture 3 Registers, syntax, sections

    The Intel 80x86 has many registers and named sub-registers.
    Here are some that are used in assembly language programming
    and debugging (the "dash number" gives the number of bits):
    
     +---------------------------+  EAX extended accumulator
     | EAX-32 +-----------------+|  (lower part of dividend)
     |        |       AX-16     ||  (quotient after division)
     |        |+--------+------+||  (lower part of product)
     |        ||  AH-8  | AL-8 |||
     |        |+--------+------+||
     |        +-----------------+|
     +---------------------------+
    
     +---------------------------+  EBX extended base register
     | EBX-32 +-----------------+|  (BX in DS segment)
     |        |       BX-16     ||  
     |        |+--------+------+||
     |        ||  BH-8  | BL-8 |||
     |        |+--------+------+||
     |        +-----------------+|
     +---------------------------+
    
     +---------------------------+  ECX extended counter
     | ECX-32 +-----------------+|  (string and loop operations)
     |        |       CX-16     ||  (CX is a 16 bit counter)
     |        |+--------+------+||
     |        ||  CH-8  | CL-8 |||
     |        |+--------+------+||
     |        +-----------------+|
     +---------------------------+
    
     +---------------------------+  EDX extended data register
     | EDX-32 +-----------------+|  (I/O port number for IN and OUT)
     |        |       DX-16     ||  (remainder after divide)
     |        |+--------+------+||  (upper part of dividend)
     |        ||  DH-8  | DL-8 |||  (upper part of product)
     |        |+--------+------+||
     |        +-----------------+|
     +---------------------------+
    
     +---------------------------+  ESP extended stack pointer
     | ESP-32     +-------------+|  SP  stack pointer
     |            | SP-16       ||  (used by PUSH and POP)
     |            +-------------+|
     +---------------------------+
    
     +---------------------------+  EBP extended base pointer
     | EBP-32     +-------------+|  (by convention, stack frame base)
     |            | BP-16       ||  (BP in SS segment)
     |            +-------------+|
     +---------------------------+
    
     +---------------------------+  ESI extended source index
     | ESI-32     +-------------+|  SI  source index
     |            | SI-16       ||  (in DS segment)
     |            +-------------+|
     +---------------------------+
    
     +---------------------------+  EDI extended destination index
     | EDI-32     +-------------+|  
     |            | DI-16       ||  (DI in ES segment)
     |            +-------------+|
     +---------------------------+
    
     +---------------------------+  EIP extended instruction pointer
     | EIP-32     +-------------+|  IP  instruction pointer
     |            | IP-16       ||  
     |            +-------------+|
     +---------------------------+
    
     +---------------------------+   EFLAGS extended flags
     | EFLAGS-32  +-------------+|   or just  flags
     |            | FLAGS-16    ||   (not usable as an operand;
     |            +-------------+|   use PUSHF/POPF or PUSHFD/POPFD)
     +---------------------------+
    
     For 32-bit "C" compatible programming, stop here.
    
                  +-------------+   CS code segment
                  | CS-16       |
                  +-------------+
    
                  +-------------+   SS stack segment
                  | SS-16       |
                  +-------------+
    
                  +-------------+   DS data segment
                  | DS-16       |   (current module)
                  +-------------+
    
                  +-------------+   ES data segment
                  | ES-16       |   (calling module, destination string)
                  +-------------+
    
                  +-------------+   FS extra segment
                  | FS-16       |   (no predefined hardware use)
                  +-------------+
    
                  +-------------+   GS extra segment
                  | GS-16       |   (no predefined hardware use)
                  +-------------+
    
    There are also 80-bit floating point registers ST0 .. ST7
    There are also 64-bit MMX registers MM0 .. MM7
    There are also control registers CR0 .. CR4
    There are also debug registers DR0 .. DR3, DR6, DR7
    There are also test registers TR3 .. TR7
    
    A dumb program to test register names is testreg.asm
    
    Another dumb program to test al,ah,ax,eax regeax.asm
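    How AL, AH and AX overlay EAX can be modeled in C with masks and
    shifts. A sketch with hypothetical helper names; it describes 32-bit
    mode, where writing AL, AH or AX leaves the other bits of EAX
    unchanged.

```c
#include <stdint.h>

/* Read the sub-registers of a 32-bit EAX value. */
uint8_t  get_al(uint32_t eax) { return (uint8_t)(eax & 0xFFu); }
uint8_t  get_ah(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xFFu); }
uint16_t get_ax(uint32_t eax) { return (uint16_t)(eax & 0xFFFFu); }

/* Write a sub-register: only its own bits of EAX change. */
uint32_t set_al(uint32_t eax, uint8_t al)
{
    return (eax & 0xFFFFFF00u) | al;
}
uint32_t set_ah(uint32_t eax, uint8_t ah)
{
    return (eax & 0xFFFF00FFu) | ((uint32_t)ah << 8);
}
uint32_t set_ax(uint32_t eax, uint16_t ax)
{
    return (eax & 0xFFFF0000u) | ax;
}
```

    With eax = 0x12345678, get_ah returns 0x56, get_ax returns 0x5678,
    and set_al(eax, 0xAB) returns 0x123456AB.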
    
    
    The basic syntax for a line in NASM is:
    
    label:  opcode  operand(s) ; comment
    
    The "label" is a case sensitive user name, followed by a colon.
    The label is optional and when not present, indent the opcode.
    The label should start in column one of the line.
    The label may be on a line with nothing else or a comment.
    
    The "opcode" is not case sensitive and may be a machine instruction
    or an assembler directive (pseudo operation) or a macro call.
    Typically, all "opcode" fields are neatly lined up starting in the
    same column. Use of "tab" is OK.
    Machine instructions may be preceded by a "prefix" such as:
    a16, a32, o16, o32, and others.
    
    The "operand(s)" depend on the choice of "opcode".
    Multiple operands are separated by commas.
    An operand may be a register name, a constant, or a memory
    reference in brackets [ ] (which may combine registers and
    constants, e.g. [ebx+4]); some opcodes take no operands.
    
    Comments are optional, yet encouraged.
    Everything from the semicolon to the end of the line is
    a comment, ignored by the assembler.
    The semicolon may be in column one, making the entire line
    a comment.
    
    Sections or segments:
    One specific assembler directive is the "section" or "SECTION"
    directive. Four types of section are predefined for ELF format:
    
            section  .data    ; initialized data
                              ; writeable, not executable
                              ; default alignment 4 bytes
    
            section  .bss     ; uninitialized space for data
                              ; writeable, not executable
                              ; default alignment 4 bytes
    
            section  .rodata  ; initialized data
                              ; read only, not executable
                              ; default alignment 4 bytes
    
            section  .text    ; instructions (code)
                              ; not writeable, executable
                              ; default alignment 16 bytes
    
            section  other    ; any name other than .data, .bss,
                              ; .rodata, .text
                              ; your stuff
                              ; not executable, not writeable
                              ; default alignment 1 byte
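    A C compiler makes the same choices for you. The sketch below marks
    where gcc typically places each object on an ELF target; placement
    can vary with compiler version and options (older gcc versions put an
    uninitialized global such as c in a COMMON block rather than .bss),
    so check a real object file with objdump -h or nm. The names a, c,
    fmt and add_a are only illustrative.

```c
/* Typical gcc placement on an ELF target; verify with objdump -h or nm. */
int a = 5;                    /* initialized, writeable        -> .data   */
int c;                        /* uninitialized                 -> .bss    */
const char fmt[] = "a=%d\n";  /* initialized, read only        -> .rodata */

int add_a(int x)              /* machine instructions          -> .text   */
{
    c = a + x;                /* reads .data, writes .bss */
    return c;
}
```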
    
    A few comments on efficiency:
    My experience is that a good assembly language programmer
    can make a small (about 100 lines) "C" program more
    efficient than the  gcc  compiler. But, for larger
    programs, the compiler will be more efficient.
    
    One exception, for example, is the SGI IRIX  cc  compiler,
    which optimizes very aggressively for that specific machine.
    
    For the Intel 80x86, here are some samples in nasm and from gcc
    (different syntax, but you should be able to recognize the instructions).
    Focus on the loop. Prologue and epilogue code, which should be
    included, was omitted. Note that the test has "check" values
    at each end of the array. There is no range testing in
    either "C" or assembly language.
    
    A simple loop loopint.asm
    Same code from gcc  loopint.s
    Hex machine code generated by nasm loopint.lst
    
    Most efficient loop loopint2.asm
    Same code from gcc  loopint2.s
    Hex machine code generated by nasm loopint2.lst
    
    Speed considerations must take into account cache and virtual memory
    performance, the number of bytes transferred from RAM, and clock cycles.
    On modern computer architectures, predicting speed from the assembly
    code alone is almost impossible. For example, the Pentium 4 translates
    80x86 instructions into RISC-like micro-operations, so it is actually
    executing instructions that are different from the assembly language.
    Carefully benchmarking complete applications is about the only
    conclusive measure of efficiency.
    
    
    

    Lecture 4 Arithmetic and shifting

    Both integer and floating point arithmetic are demonstrated.
    In order to make the source code smaller, a macro is defined
    to print out results. The equivalent "C" program is given as
    comments.
    
    First, see how to call the "C" library function, printf, to make
    it easier to print values:
    Look at the file printf1.asm
    
    ; printf1.asm   print an integer from storage and from a register
    ; Assemble:	nasm -f elf -l printf1.lst  printf1.asm
    ; Link:		gcc -o printf1  printf1.o
    ; Run:		printf1
    ; Output:	a=5, eax=7
    
    ; Equivalent C code
    ; /* printf1.c  print an int and an expression */
    ; #include <stdio.h>
    ; int main()
    ; {
    ;   int a=5;
    ;   printf("a=%d, eax=%d\n", a, a+2);
    ;   return 0;
    ; }
    
    ; Declare some external functions
    ;
            extern	printf		; the C function, to be called
    
            section .data		; Data section, initialized variables
    a:	dd	5		; int a=5;
    fmt:    db "a=%d, eax=%d", 10, 0 ; The printf format, "\n", '\0'
    
            section .text           ; Code section.
            global main		; the standard gcc entry point
    main:				; the program label for the entry point
            push    ebp		; set up stack frame
            mov     ebp,esp
    
    	mov	eax, [a]	; put "a" from store into register
    	add	eax, 2		; a+2
    	push	eax		; value of a+2
            push    dword [a]	; value of variable a
            push    dword fmt	; address of format string
            call    printf		; Call C function
            add     esp, 12		; pop stack, 3 pushes times 4 bytes
    
            mov     esp, ebp	; take down stack frame
            pop     ebp		; same as "leave" op
    
    	mov	eax,0		;  normal, no error, return value
    	ret			; return
    	
    
    
    Now, for integer arithmetic, look at the file intarith.asm
    
    ; intarith.asm    show some simple C code and corresponding nasm code
    ;                 the nasm code is one sample, not unique
    ;
    ; compile:	nasm -f elf -l intarith.lst  intarith.asm
    ; link:		gcc -o intarith  intarith.o
    ; run:		intarith
    ;
    ; the output from running intarith.asm and intarith.c is:	
    ; c=5  , a=3, b=4, c=5
    ; c=a+b, a=3, b=4, c=7
    ; c=a-b, a=3, b=4, c=-1
    ; c=a*b, a=3, b=4, c=12
    ; c=c/a, a=3, b=4, c=4
    ;
    ;The file  intarith.c  is:
    ;  /* intarith.c */
    ;  #include <stdio.h>
    ;  int main()
    ;  { 
    ;    int a=3, b=4, c;
    ;
    ;    c=5;
    ;    printf("%s, a=%d, b=%d, c=%d\n","c=5  ", a, b, c);
    ;    c=a+b;
    ;    printf("%s, a=%d, b=%d, c=%d\n","c=a+b", a, b, c);
    ;    c=a-b;
    ;    printf("%s, a=%d, b=%d, c=%d\n","c=a-b", a, b, c);
    ;    c=a*b;
    ;    printf("%s, a=%d, b=%d, c=%d\n","c=a*b", a, b, c);
    ;    c=c/a;
    ;    printf("%s, a=%d, b=%d, c=%d\n","c=c/a", a, b, c);
    ;    return 0;
    ; }
    
            extern printf		; the C function to be called
    
    %macro	pabc 1			; a "simple" print macro
    	section .data
    .str	db	%1,0		; %1 is first actual in macro call
    	section .text
    				; push onto stack backward 
    	push	dword [c]	; int c
    	push	dword [b]	; int b 
    	push	dword [a]	; int a
    	push	dword .str 	; users string
            push    dword fmt       ; address of format string
            call    printf          ; Call C function
            add     esp,20          ; pop stack 5*4 bytes
    %endmacro
    	
    	section .data  		; preset constants, writeable
    a:	dd	3		; 32-bit variable a initialized to 3
    b:	dd	4		; 32-bit variable b initialized to 4
    fmt:    db "%s, a=%d, b=%d, c=%d",10,0	; format string for printf
    	
    	section .bss 		; uninitialized space
    c:	resd	1		; reserve a 32-bit double word
    
    	section .text		; instructions, code segment
    	global	 main		; for gcc standard linking
    main:				; label
    	
    lit5:				; c=5;
    	mov	eax,5	 	; 5 is a literal constant
    	mov	[c],eax		; store into c
    	pabc	"c=5  "		; invoke the print macro
    	
    addb:				; c=a+b;
    	mov	eax,[a]	 	; load a
    	add	eax,[b]		; add b
    	mov	[c],eax		; store into c
    	pabc	"c=a+b"		; invoke the print macro
    	
    subb:				; c=a-b;
    	mov	eax,[a]	 	; load a
    	sub	eax,[b]		; subtract b
    	mov	[c],eax		; store into c
    	pabc	"c=a-b"		; invoke the print macro
    	
    mulb:				; c=a*b;
    	mov	eax,[a]	 	; load a (must be eax for multiply)
    	imul	dword [b]	; signed integer multiply by b
    	mov	[c],eax		; store bottom half of product into c
    	pabc	"c=a*b"		; invoke the print macro
    	
    diva:				; c=c/a;
    	mov	eax,[c]	 	; load c
    	cdq			; sign-extend eax into edx, the upper half of the dividend
    	idiv	dword [a]	; divide double register edx eax by a
    	mov	[c],eax		; store quotient into c
    	pabc	"c=c/a"		; invoke the print macro
    
            mov     eax,0           ; exit code, 0=normal
    	ret			; main returns to operating system
    
    
    
            bbbb  [mem] a product of 32-bits times 32-bits is 64-bits
     imul   bbbb  eax
       ---------
    edx bbbbbbbb  eax   the upper part of the product is in edx
                        the lower part of the product is in eax
    
    
    edx bbbbbbbb  eax  before divide, the upper part of dividend is in edx
                                      the lower part of dividend is in eax
     idiv   bbbb  [mem] the divisor
        --------
                       after divide,  the quotient is in eax
                                      the remainder is in edx
    
    
    Now, for floating point arithmetic, look at the file fltarith.asm
    Note the many similarities to integer arithmetic, yet some basic differences.
    
    ; fltarith.asm   show some simple C code and corresponding nasm code
    ;                the nasm code is one sample, not unique
    ;
    ; compile  nasm -f elf -l fltarith.lst  fltarith.asm
    ; link     gcc -o fltarith  fltarith.o
    ; run      fltarith
    ;
    ; the output from running fltarith.asm and fltarith.c is:
    ; c=5.0, a=3.000000e+00, b=4.000000e+00, c=5.000000e+00
    ; c=a+b, a=3.000000e+00, b=4.000000e+00, c=7.000000e+00
    ; c=a-b, a=3.000000e+00, b=4.000000e+00, c=-1.000000e+00
    ; c=a*b, a=3.000000e+00, b=4.000000e+00, c=1.200000e+01
    ; c=c/a, a=3.000000e+00, b=4.000000e+00, c=4.000000e+00
    ; a=i  , a=8.000000e+00, b=1.600000e+01, c=1.600000e+01
    ;The file  fltarith.c  is:
    ;  #include <stdio.h>
    ;  int main()
    ;  { 
    ;    double a=3.0, b=4.0, c;
    ;    int i=8;
    ;
    ;    c=5.0;
    ;    printf("%s, a=%e, b=%e, c=%e\n","c=5.0", a, b, c);
    ;    c=a+b;
    ;    printf("%s, a=%e, b=%e, c=%e\n","c=a+b", a, b, c);
    ;    c=a-b;
    ;    printf("%s, a=%e, b=%e, c=%e\n","c=a-b", a, b, c);
    ;    c=a*b;
    ;    printf("%s, a=%e, b=%e, c=%e\n","c=a*b", a, b, c);
    ;    c=c/a;
    ;    printf("%s, a=%e, b=%e, c=%e\n","c=c/a", a, b, c);
    ;    a=i;
    ;    b=a+i;
    ;    i=b;
    ;    c=i;
    ;    printf("%s, a=%e, b=%e, c=%e\n","a=i  ", a, b, c);
    ;    return 0;
    ; }
    
            extern printf		; the C function to be called
    
    %macro	pabc 1			; a "simple" print macro
    	section	.data
    .str	db	%1,0		; %1 is macro call first actual parameter
    	section .text
    				; push onto stack backwards 
    	push	dword [c+4]	; double c (high 32 bits)
    	push	dword [c]	; double c (low 32 bits)
    	push	dword [b+4]	; double b (high 32 bits)
    	push	dword [b]	; double b (low 32 bits)
    	push	dword [a+4]	; double a (high 32 bits)
    	push	dword [a]	; double a (low 32 bits)
    	push	dword .str 	; user's string
            push    dword fmt       ; address of format string
            call    printf          ; Call C function
            add     esp,32          ; pop stack 8*4 bytes
    %endmacro
    	
    	section	.data  		; preset constants, writeable
    a:	dq	3.0		; 64-bit variable a initialized to 3.0
    b:	dq	4.0		; 64-bit variable b initialized to 4.0
    i:	dd	8		; a 32-bit integer (dd, since fild/fistp use dword)
    five:	dq	5.0		; constant 5.0
    fmt:    db "%s, a=%e, b=%e, c=%e",10,0	; format string for printf
    	
    	section .bss 		; uninitialized space
    c:	resq	1		; reserve a 64-bit word
    
    	section .text		; instructions, code segment
    	global	main		; for gcc standard linking
    main:				; label
    	
    lit5:				; c=5.0;
    	fld	qword [five]	; 5.0 constant
    	fstp	qword [c]	; store into c
    	pabc	"c=5.0"		; invoke the print macro
    	
    addb:				; c=a+b;
    	fld	qword [a] 	; load a (pushed on flt pt stack, st0)
    	fadd	qword [b]	; floating add b (to st0)
    	fstp	qword [c]	; store into c (pop flt pt stack)
    	pabc	"c=a+b"		; invoke the print macro
    	
    subb:				; c=a-b;
    	fld	qword [a] 	; load a (pushed on flt pt stack, st0)
    	fsub	qword [b]	; floating subtract b (to st0)
    	fstp	qword [c]	; store into c (pop flt pt stack)
    	pabc	"c=a-b"		; invoke the print macro
    	
    mulb:				; c=a*b;
    	fld	qword [a]	; load a (pushed on flt pt stack, st0)
    	fmul	qword [b]	; floating multiply by b (to st0)
    	fstp	qword [c]	; store product into c (pop flt pt stack)
    	pabc	"c=a*b"		; invoke the print macro
    	
    diva:				; c=c/a;
    	fld	qword [c] 	; load c (pushed on flt pt stack, st0)
    	fdiv	qword [a]	; floating divide by a (to st0)
    	fstp	qword [c]	; store quotient into c (pop flt pt stack)
    	pabc	"c=c/a"		; invoke the print macro
    
    intflt:				; a=i;
    	fild	dword [i]	; load integer as floating point
    	fst	qword [a]	; store the floating point (no pop)
    	fadd	st0		; b=a+i; 'a' as 'i'  already on flt stack
    	fst	qword [b]	; store sum (no pop) 'b' still on stack
    	fistp	dword [i]	; i=b; store floating point as integer
    	fild	dword [i]	; c=i; load again from ram (redundant)
    	fstp	qword [c]
    	pabc	"a=i  "		; invoke the print macro
    
            mov     eax,0           ; exit code, 0=normal
    	ret			; main returns to operating system
    
    
    
    Refer to nasmdoc.txt or textbook 10.4 for details.
    A brief summary is provided here.
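    "reg" is an 8-bit, 16-bit or 32-bit register (or a memory operand)
    "count" is the number of bits to shift: an immediate value or the CL register
    "right" moves bits toward the least significant end (SHR by 1 halves an unsigned value)
    "left" moves bits toward the most significant end (SHL by 1 doubles it, if no bits are lost)
    
      SAL   reg,count   shift arithmetic left (same operation as SHL)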
      SAR   reg,count   shift arithmetic right (sign extension)
      SHL   reg,count   shift left (logical, zero fill)
      SHR   reg,count   shift right (logical, zero fill)
      ROL   reg,count   rotate left
      ROR   reg,count   rotate right
      SHLD  reg1,reg2,count  shift reg1 left, filling from the top bits of reg2
      SHRD  reg1,reg2,count  shift reg1 right, filling from the bottom bits of reg2
    
    An example of using the various shifts is in: shift.asm
    
    

    Lecture 5 Using debugger

    See www.csee.umbc.edu/help/nasm/nasm.shtml for notes on using the debugger.
    
    A program that prints where its sections are allocated
    (in virtual memory) is where.asm
    
    ; where.asm   print addresses of sections
    ; Assemble:	nasm -f elf -l where.lst  where.asm
    ; Link:		gcc -o where  where.o
    ; Run:		where
    ; Output:	addresses vary by system, you need to run it. On my computer:
    ; data    a: at 8048330
    ; bss     b: at 804840C
    ; rodata  c: at 804956C
    ; code main: at 8049424
    
            extern	printf		; the C function, to be called
            section .data		; Data section, initialized variables
    a:	db	0,1,2,3,4,5,6,7
    fmt:    db "data    a: at %X",10
    	db "bss     b: at %X",10
    	db "rodata  c: at %X",10
    	db "code main: at %X",10,0 
    
    	section .bss		; reserved storage, uninitialized
    b:	resb	8
    
    	section	.rodata		; read only initialized storage
    c:	db	7,6,5,4,3,2,1,0
    	
            section .text           ; Code section.
            global main		; the standard gcc entry point
    main:				; the program label for the entry point
            push	ebp
    	mov     ebp,esp
    	push	ebx
    
    	lea	eax,[a]		; load effective address of [a]
    	push	eax
    	lea	ebx,[b]
            push    ebx
    	lea	ecx,[c]
    	push	ecx
    	lea	edx,[main]
    	push	edx
            push    dword fmt	; address of format string
            call    printf		; Call C function
            add     esp, 20		; pop stack: 5 pushes times 4 bytes
    
    	pop	ebx
    	mov	esp,ebp
    	pop	ebp
    	mov	eax,0		; normal, no error, return value
    	ret			; return
    	
    
    

    Lecture 6 Branching and loops

    The basic integer compare instruction is  "cmp"
    Following this instruction is typically one of these signed jumps:
      JL  label  ; jump on less than  "<"
      JLE label  ; jump on less than or equal "<="
      JG  label  ; jump on greater than ">"
      JGE label  ; jump on greater than or equal ">="
      JE  label  ; jump on equal "=="
      JNE label  ; jump on not equal "!="
      (for unsigned values use JB, JBE, JA, JAE in place of JL, JLE, JG, JGE)
    
    After most integer arithmetic instructions, which set the flags:
      JZ  label  ; jump on zero
      JNZ label  ; jump on non zero
      JS  label  ; jump on sign (result negative)
      JNS label  ; jump on not sign (result zero or positive)
    
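    Note: use 'cmp' rather than 'sub' for comparison. 'cmp' does the
    subtraction without storing the result, and JL, JLE, JG, JGE test both
    the sign and overflow flags. Testing only the sign (JS) after 'sub' can
    give the wrong answer, because overflow can invert the sign of the result.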
    
    Convert a "C" 'if' statement to nasm assembly ifint.asm
    The significant features are:
    1) use a compare instruction for the test
    2) put a label on the start of the false branch (e.g. false1:)
    3) put a label after the end of the 'if' statement (e.g. exit1:)
    4) choose a conditional jump that goes to the false part
    5) put an unconditional jump to (e.g. exit1:) at the end of the true part
    
    ; ifint.asm  code ifint.c for nasm 
    ; /* ifint.c an 'if' statement that will be coded for nasm */
    ; #include <stdio.h>
    ; int main()
    ; {
    ;   int a=1;
    ;   int b=2;
    ;   int c=3;
    ;   if(a<b)
    ;     printf("true a < b \n");
    ;   else
    ;     printf("wrong on a < b \n");
    ;   if(b>c)
    ;     printf("wrong on b > c \n");
    ;   else
    ;     printf("false b > c \n");
    ;   return 0;
    ;}
    ; result of executing both "C" and assembly is:
    ; true a < b
    ; false b > c 
    	
    	global	main		; define for linker
            extern	printf		; tell linker we need this C function
            section .data		; Data section, initialized variables
    a:	dd 1
    b:	dd 2
    c:	dd 3
    fmt1:   db "true a < b ",10,0
    fmt2:   db "wrong on a < b ",10,0
    fmt3:   db "wrong on b > c ",10,0
    fmt4:   db "false b > c ",10,0
    
    	section .text
    main:	mov	eax,[a]
    	cmp	eax,[b]
    	jge	false1		; choose jump to false part
    	; here a < b is true
            push    dword fmt1	; printf("true a < b \n"); 
            call    printf	
            add     esp,4
            jmp	exit1		; jump over false part
    false1:	;  a < b is false 
            push    dword fmt2	; printf("wrong on a < b \n");
            call    printf
            add     esp,4
    exit1:				; finished 'if' statement
    
    	mov	eax,[b]
    	cmp	eax,[c]
    	jle	false2		; choose jump to false part
    	; here b > c is true
            push    dword fmt3	; printf("wrong on b > c \n");
            call    printf	
            add     esp,4
            jmp	exit2		; jump over false part
    false2:	;  b > c is false 
            push    dword fmt4	; printf("false b > c \n");
            call    printf
            add     esp,4
    exit2:				; finished 'if' statement
    
    	mov	eax,0		; normal, no error, return value
    	ret			; return 0;
    
    
    
    Convert a "C" loop to nasm assembly  loopint.asm
    The significant features are:
    1) "C" int  is 4-bytes, thus  dd1[1] becomes  dword [dd1+4]
                                  dd1[99] becomes  dword [dd1+4*99]
    
    2) "C" int  is 4-bytes, thus  dd1[i]; i++; becomes  add edi,4
       since "i" is never stored, the register  edi  holds "i"
    
    3) the 'cmp' instruction sets flags that control the jump instruction.
       cmp  edi,4*99   is like  i<99
       jne  loop1      jumps if register  edi  is not  4*99  (jne is the same instruction as jnz)
    
    ; loopint.asm  code loopint.c for nasm 
    ; /* loopint.c a very simple loop that will be coded for nasm */
    ; #include <stdio.h>
    ; int main()
    ; {
    ;   int dd1[100];
    ;   int i;
    ;   dd1[0]=5; /* be sure loop stays 1..98 */
    ;   dd1[99]=9;
    ;   for(i=1; i<99; i++) dd1[i]=7;
    ;   printf("dd1[0]=%d, dd1[1]=%d, dd1[98]=%d, dd1[99]=%d\n",
    ;           dd1[0], dd1[1], dd1[98],dd1[99]);
    ;   return 0;
    ;}
    	section	.bss
    dd1:	resd	100
    i:	resd	1		; actually unused, kept in register
    
    	section .text
    	global main
    main:	mov dword [dd1],5	; dd1[0]=5;
    	mov dword [dd1+99*4],9	; dd1[99]=9;
    
    	mov edi,4		; i=1; /* 4 bytes */
    loop1:	mov dword [dd1+edi],7	; dd1[i]=7;
    	add edi,4		; i++; /* 4 bytes */
    	cmp edi,4*99		; i<99
    	jne	loop1		; loop until i=99
    	
            extern	printf		; the C function, to be called
            section .data		; Data section, initialized variables
    fmt:    db "dd1[0]=%d, dd1[1]=%d, dd1[98]=%d, dd1[99]=%d",10,0
    
            section .text           ; Code section, continued
    	push	dword [dd1+99*4]	; dd1[99]
            push    dword [dd1+98*4]	; dd1[98]
    	push	dword [dd1+4]	; dd1[1]
            push    dword [dd1]	; dd1[0]
            push    dword fmt	; address of format string
            call    printf		; Call C function
            add     esp, 20		; pop stack: 5 pushes times 4 bytes
    
    	mov	eax,0		; normal, no error, return value
    	ret			; return 0;
                                    ; edi is callee-saved in the C convention, so a careful main would also push/pop edi
    	
    	
    Previously, integer arithmetic in "C" was converted to
    NASM assembly language. intlogic.asm is a very similar program,
    cut and pasted from intarith.asm, that shows the "C" operators
    "&" and, "|" or, "^" xor, "~" not.
    
    intlogic.asm
    
    One significant use of loops is to evaluate polynomials and
    convert numbers from one base to another.
    (Yes, this is related to project 1 for CMSC 313)
    
    The following program has eight loops.
    Loop1 (h1loop) uses Horner's method to convert ASCII decimal digits
          to binary, using a sentinel, '.', with 'cmp' and 'je'
          to exit the loop.
    
    Loop2 (h2loop) uses Horner's method to convert ASCII decimal digits
           using 'edi' as an index, 'ecx' and 'loop' to do the loop.
    
    Loop3 (h3loop) uses Horner's method to evaluate a polynomial,
           using 'edi' as an index, 'ecx' and 'loop' to do the loop.
    
    Loop4 (h4loop) uses Horner's method, with data order optimized,
          using 'ecx' as both index and loop counter, to get a
          three-instruction loop.
    
    Loop5 (h5loop) uses Horner's method to evaluate a polynomial
          using double precision floating point. Note the 8-byte
          element size and the quadword passed to printf.
    
    Loop6 (h6loop) uses Horner's method to evaluate the fractional
          part of a double precision floating point polynomial.
          Note that divide is used in place of multiply and the
          least significant coefficient is used first.
    
    Loop7 (h7loop) uses Horner's method to convert an ASCII decimal
          fraction to binary fixed point. Note that shifting is needed
          because the binary point cannot be at the right end
          of the word.
    
    Loop8 (h8loop) just prints the low 16 bits of the result of Loop7
          as ASCII characters.
    
    Study horner.asm to understand
    the NASM coding of the loops.
    
    ; horner.asm  Horners method of evaluating polynomials
    ;
    ; given a polynomial  Y = a_n X^n + a_n-1 X^n-1 + ... a_1 X + a_0
    ; a_n is the coefficient 'a' with subscript n. X^n is X to nth power
    ; compute y_1 = a_n * X + a_n-1
    ; compute y_2 = y_1 * X + a_n-2
    ; compute y_i = y_i-1 * X + a_n-i   i=3..n
    ; thus    y_n = Y = value of polynomial 
    ;
    ; in assembly language:
    ;   load some register with a_n, multiply by X
    ;   add a_n-1, multiply by X, add a_n-2, multiply by X, ...
    ;   finishing with the add  a_0
    ;
    ; for conversion of decimal to binary, X=10
    ; 
    	extern	printf
    	section	.data
    decdig:	db	'5','2','8','0','.' ; decimal integer 5280
    fmt:	db	"%d",10,0
    	
    	global	main
    	section	.text
    main:	push	ebp		; save ebp
            mov	ebp,esp		; ebp is callers stack
    	push	ebx
    	push	edi		; save registers
    	
    
    ; method 1, using a "sentinel" e.g. '.'
    	
    	mov	eax,0		; accumulate value here
    	mov	al,[decdig]	; get first ASCII digit
    	sub	al,48		; convert ASCII digit to binary
    	mov	edi,1		; subscript initialization
    h1loop:	mov	ebx,0		; clear register (upper part)
    	mov	bl,[decdig+edi]	; get next ASCII digit
    	cmp	bl,'.'		; compare to decimal point
    	je	h1fin		; exit loop on decimal point
    	sub	bl,48		; convert ASCII digit to binary
    	imul	eax,10		; * X     (ignore edx)
    	add	eax,ebx		; + a_n-i
    	inc	edi		; increment subscript
    	jmp	h1loop
    h1fin:
    	push	dword eax	; print eax
    	push	dword fmt	; format %d
    	call	printf
    	add	esp,8		; restore stack
    
    ; method 2, using a count
    	
    	mov	eax,0		; accumulate value here
    	mov	al,[decdig]	; get first ASCII digit
    	sub	al,48		; convert ASCII digit to binary
    	mov	edi,1		; subscript initialization
    	mov	ecx,3		; loop iteration count initialization
    h2loop:	mov	ebx,0		; clear register (upper part)
    	mov	bl,[decdig+edi]	; get next ASCII digit
    	sub	bl,48		; convert ASCII digit to binary
    	imul	eax,10		; * X     (ignore edx)
    	add	eax,ebx		; + a_n-i
    	inc	edi		; increment subscript
    	loop	h2loop		; decrement ecx, jump on non zero
    
    	push	dword eax	; print eax
    	push	dword fmt	; format %d
    	call	printf
    	add	esp,8		; restore stack
    
    ; evaluate a polynomial, X=7, using a count
    
    	section	.data
    a:	dd	2,5,-7,22,-9	; coefficients of polynomial, a_n first
    X:	dd	7
    	section	.text
    	mov	eax,[a]		; accumulate value here, get coefficient a_n
    	mov	edi,1		; subscript initialization
    	mov	ecx,4		; loop iteration count initialization, n
    h3loop:	imul	eax,[X]		; * X     (ignore edx)
    	add	eax,[a+4*edi]	; + a_n-i
    	inc	edi		; increment subscript
    	loop	h3loop		; decrement ecx, jump on non zero
    
    	push	dword eax	; print eax
    	push	dword fmt	; format %d
    	call	printf
    	add	esp,8		; restore stack
    
    ; evaluate a polynomial, X=7, using a count as index
    ; optimal organization of data allows a three instruction loop
    	
    	section	.data
    aa:	dd	-9,22,-7,5,2	; coefficients of polynomial, a_0 first
    	section	.text
    	mov	eax,[aa+4*4]	; accumulate value here, get coefficient a_n
    	mov	ecx,4		; loop iteration count initialization, n
    h4loop:	imul	eax,[X]		; * X     (ignore edx)
    	add	eax,[aa+4*ecx-4]; + aa_n-i
    	loop	h4loop		; decrement ecx, jump on non zero
    
    	push	dword eax	; print eax
    	push	dword fmt	; format %d
    	call	printf
    	add	esp,8		; restore stack
    
    ; evaluate a double floating polynomial, X=7.0, using a count as index
    ; optimal organization of data allows a three instruction loop
    	
    	section	.data
    af:	dq	-9.0,22.0,-7.0,5.0,2.0	; coefficients of polynomial, a_0 first
    XF:	dq	7.0
    Y:	dq	0.0
    N:	dd	4
    fmtflt:	db	"%e",10,0
    
    	section	.text
    	mov	ecx,[N]		; loop iteration count initialization, n
    	fld	qword [af+8*ecx]; accumulate value here, get coefficient a_n
    h5loop:	fmul	qword [XF]	; * XF
    	fadd	qword [af+8*ecx-8] ; + a_n-i
    	loop	h5loop		; decrement ecx, jump on non zero
    
    	fstp	qword [Y]	; store Y in order to print Y
    	push	dword [Y+4]	; print Y (must be two parts of quadword)
    	push	dword [Y]	; print Y
    	push	dword fmtflt	; format %e
    	call	printf
    	add	esp,12		; restore stack
    
    
    ; Evaluate the fractional polynomial, Y = a_-1 X^-1 + a_-2 X^-2 + ... + a_-m X^-m
    ; This is performed using divide, starting from the least significant coefficient:
    ; compute y_1 = a_-m / X + a_-(m-1)
    ; compute y_2 = y_1 / X + a_-(m-2)
    ; compute y_i = y_i-1 / X + a_-(m-i)   i=3..m-1
    ; thus    Y   = y_m-1 / X  = value of polynomial
    ; Using the coefficients above, a_-1 = -9.0 (first in memory),
    ; a_-2 = 22.0, a_-3 = -7.0, a_-4 = 5.0, a_-5 = 2.0, so m=5
    ; N=4 (not 5) because the first term is loaded outside the loop
    
    	mov	ecx,[N]		; loop iteration count initialization, m-1
    	fld	qword [af+8*ecx]; accumulate value here, get a_-m = a_-5 = 2.0
    h6loop:	fdiv	qword [XF]	; / XF
    	fadd	qword [af+8*ecx-8] ; + a_-(m-i)
    	loop	h6loop		; decrement ecx, jump on non zero
    	fdiv	qword [XF]	; extra divide for fractional terms
    	
    	fstp	qword [Y]	; store Y in order to print Y
    	push	dword [Y+4]	; print Y (must be two parts of quadword)
    	push	dword [Y]	; print Y
    	push	dword fmtflt	; format %e
    	call	printf
    	add	esp,12		; restore stack
    
    
    ; Convert the fractional part, a_-1 X^-1 + a_-2 X^-2 + ...
    ; This must be performed using "fixed point" arithmetic.
    ; The implied binary point is 16-bits from LSB.
    	
    	section	.data
    fracdig:db	'.','1','2','3','4' ; decimal fraction .1234
    fmth:	db	"%X",10,0
    ten:	dd	10
    eaxsave:dd	0
    	
    	global h7loop		; for debugging loop  "break h7loop"
    	section	.text
    	mov	eax,0		; accumulate value here
    	mov	al,[fracdig+4]	; get last ASCII digit
    	sub	al,48		; convert ASCII digit to binary
    	shl	eax,16		; move binary point
    	mov	ecx,3		; loop iteration count initialization
    h7loop:	mov	edx,0		; must clear upper dividend
    	idiv	dword [ten]	; quotient in eax
    	mov	ebx,0		; clear register (upper part)
    	mov	bl,[fracdig+ecx]; get the previous ASCII digit (moving left)
    	sub	bl,48		; convert ASCII digit to binary
    	shl	ebx,16		; move binary point 16-bits
    	add	eax,ebx		; + a_n-i
    	loop	h7loop		; decrement ecx, jump on non zero
    	mov	edx,0		; must clear upper dividend
    	idiv	dword [ten]	; final divide
    	
    	mov	[eaxsave],eax	; save eax, printf destroys it
    	push	dword eax	; print eax
    	push	dword fmth	; format %X (look at low 16-bits)
    	call	printf
    	add	esp,8		; restore stack
    
    ; print the bits in eaxsave:
    	section	.bss
    abits:	resb	17		; 16 characters plus zero terminator
    	section	.data
    fmts:	db	"%s",10,0
    	section	.text
    	mov	eax,[eaxsave]	; restore eax
    	ror	eax,1		; get bottom bit in top of eax
    	mov	ecx,16		; for printing 16 bits
    h8loop:	mov	edx,0		; clear edx ready for a bit
    	shld	edx,eax,1	; top bit of eax into edx
    	add	edx,48		; make it ASCII
    	mov	[abits+ecx-1],dl ; store character
    	ror	eax,1		; next bit into top of eax
    	loop	h8loop		; decrement ecx, jump non zero
    	
    	mov	byte [abits+16],0 ; end of "C" string
    	push	dword abits	; string to print
    	push	dword fmts	; "%s"
    	call	printf
    	add	esp,8
    	
    
    	pop	edi
    	pop	ebx
    	mov	esp,ebp		; restore callers stack frame
    	pop	ebp
    	ret			; return
    
    ; output from execution:
    ; 5280
    ; 5280
    ; 6319
    ; 6319
    ; 6.319000e+03
    ; -8.549414e-01
    ; 1F97
    ; 0001111110010111
    
    
    

    Lecture 7 Subroutines

    Here is a basic subroutine (function, procedure, etc).
    Note that the parameters are passed on the stack and are accessed
    relative to ebp. Note saving and restoring the caller's registers.
    (Yes, this is needed for CMSC 313 project 2)
    
    "call1" below, is called by the "C" program test_call1.c
    
    /* test_call1.c   test  call1.asm */
    #include <stdio.h>
    void call1(int *L);   /* defined in call1.asm */
    int main()
    {
      int L[2];
      L[0]=1;
      L[1]=2;
      call1(L);
      printf("L[0]=%d, L[1]=%d \n", L[0], L[1]);
      return 0;
    }
    
    The output is L[0]=0, L[1]=0, from the following:
    
    ; call1.asm  a basic structure for a subroutine to be called from "C"
    ;
    ; This saves more registers than used here
    ; Parameters: int L[] or  int *L
    ; Result: L[0]=0  L[1]=0
    
            global call1		; linker must know name of subroutine
    call1:				; name must appear as a nasm label
            push	ebp		; save ebp
            mov	ebp, esp	; ebp is callers stack
            push	ebx		; save registers
            push	edi
    	push	esi
    
            mov	edi,[ebp+8]	; get address of L into edi
            mov	eax,0		; get a 32-bit zero 
            mov	[edi],eax	; L[0]=0
            add	edi,4		; add one dword=32-bit int
            mov	[edi],eax	; L[1]=0
    
            pop	esi		; restore registers
    	pop	edi		; in reverse order
            pop	ebx
            mov	esp,ebp		; restore callers stack frame
            pop	ebp
            ret			; return
    ;
    ; Notes about the caller's stack, ebp in our code:
    ; ebp+8  is the last argument pushed by the caller,
    ;        this is our first argument
    ; ebp+12 would be our second argument, etc. +4 each
    ;        the arguments can be values or addresses,
    ;        as defined by the "C" function prototypes
    ; ebp+4  is the return address in the caller, used by 'ret'
    ; ebp    holds the caller's saved ebp (from 'push ebp'); our saved
    ;        registers ebx, edi, esi are below it, at ebp-4, ebp-8, ebp-12
    
    Study call1.asm 
    
    Now, to pass more arguments, call2.c
    can be implemented as call2.asm.
    Note: arrays, including strings, are passed by address;
         scalar values are passed by value.
    
    ; call2.asm  code loopint.c as subroutine (void function)
    ; /* call2.c a very simple loop that will be coded for nasm */
    ; #include <stdio.h>
    ; void call2(int *A, int start, int end, int value);
    ; int main()
    ; {
    ;   int dd1[100];
    ;   int i;
    ;   dd1[0]=5; /* be sure loop stays 1..98 */
    ;   dd1[1]=6;
    ;   dd1[98]=8;
    ;   dd1[99]=9;
    ;   call2(dd1,1,98,7); /* fill dd1[1] thru dd1[98] with 7 */
    ;   printf("dd1[0]=%d, dd1[1]=%d, dd1[98]=%d, dd1[99]=%d\n",
    ;           dd1[0], dd1[1], dd1[98],dd1[99]);
    ;   return 0;
    ; }
    ; void call2(int *A, int start, int end, int value)
    ; {
    ;   int i;
    ;
    ;   for(i=start; i<=end; i++) A[i]=value;
    ; }
    
    ; execution output is dd1[0]=5, dd1[1]=7, dd1[98]=7, dd1[99]=9
     
    	section	.bss
    i:	resd	1		; actually unused, kept in register
    
    	section .text
    	global	call2		; linker must know name of subroutine
    call2:				; name must appear as a nasm label
            push	ebp		; save ebp
            mov	ebp, esp	; ebp is callers stack
            push	ebx		; save registers
            push	edi
    
            mov	edi,[ebp+8]	; get address of A into edi
            mov	eax,[ebp+12]	; get value of start
            mov	ebx,[ebp+16]	; get value of end
            mov	edx,[ebp+20]	; get value of value
    	
    	cmp	eax,ebx		; if start > end, do nothing (as the C for loop)
    	jg	done1
    loop1:	mov	[4*eax+edi],edx	; A[i]=value;
    	add	eax,1		; i++;
    	cmp	eax,ebx		; i<=end
    	jle	loop1		; loop while i<=end
    done1:
    
    	pop	edi		; in reverse order
            pop	ebx
            mov	esp,ebp		; restore callers stack frame
            pop	ebp
            ret			; return
    ;
    ; Notes about the caller's stack, ebp in our code:
    ; ebp+8  is the last argument pushed by the caller,
    ;        this is our first argument, the address of A.
    ; ebp+12 is our second argument, 'start' a value.
    ; ebp+16 is 'end' and ebp+20 is 'value', also values.
    
    A simple function, called and written in the same .asm file
    intfunc.asm
    
    ; intfunc.asm  call integer function  int sum(int x, int y)
    ; 
    ; compile:	nasm -f elf intfunc.asm 
    ; link:		gcc -o intfunc  intfunc.o
    ; run:		intfunc
    ; result:	5 = sum(2,3)
    
    	extern	printf
    	section .data
    x:	dd	2
    y:	dd	3
    z:	dd	1
    fmt:	db	"%d = sum(%d,%d)",10,0
    	section .text		; code must be in .text, not .data
    	global	main
    main:	push	ebp
    	mov	ebp,esp
    	push	ebx
    
    	push	dword [y]	; push arguments for sum
    	push	dword [x]
    	call	sum		; coded below
    	add	esp,8
    	mov	[z],eax		; save result from sum
    
    	push	dword [y]	; print
    	push	dword [x]
    	push	dword [z]
    	push	dword fmt
    	call	printf
    	add	esp,16
    
    	pop	ebx
    	mov	esp,ebp
    	pop	ebp
    	mov	eax,0
    	ret
    ; end main
    
    sum:	push	ebp		; function int sum(int x, int y)
    	mov	ebp,esp
    	push	ebx
    
    	mov	eax,[ebp+8]	; get argument x
    	mov	ebx,[ebp+12]	; get argument y
    	add	eax,ebx		; x+y with result in eax
    
    	pop	ebx
    	mov	esp,ebp
    	pop	ebp
    	ret			; return value in eax
    ; end of function  int sum(int x, int y)
    
    A simple demonstration of using a double sin(double x) function
    from the "C" math.h  fltfunc.asm
    
    ; fltfunc.asm  call math routine  double sin(double x)
    ; 
    ; compile:	nasm -f elf fltfunc.asm 
    ; link:		gcc -o fltfunc  fltfunc.o -lm    # needs math library
    ; run:		fltfunc
    ; 
    	extern	sin             ; be sure to 'extern' functions
    	extern	printf
    	section .data
    x:	dq	0.7853975	; Pi/4 = 45 degrees
    y:	dq	1.0		; result, will be about 7.07E-1
    fmt:	db	"%e = sin(%e)",10,0
    	section .text		; code must be in .text, not .data
    	global	main
    main:	push	ebp
    	mov	ebp,esp
    	push	ebx
    
    	push	dword [x+4]     ; push quad word (double) for sin
    	push	dword [x]
    	call	sin             ; call the library sin function
    	add	esp,8
    	fstp	qword [y]       ; sin returns its double in st0, store it
    
    	push	dword [x+4]     ; print
    	push	dword [x]
    	push	dword [y+4]
    	push	dword [y]
    	push	dword fmt
    	call	printf
    	add	esp,20
    
    	pop	ebx
    	mov	esp,ebp
    	pop	ebp
    	mov	eax,0
    	ret
    
    ; result:	7.071063e-01 = sin(7.853975e-01)
    ; end fltfunc.asm
    
    And a final example of a simple recursive function, factorial,
    written in unoptimized assembly language following the "C" code.
    test_factorial.asm
    test_factorial.c
    
    ; test_factorial.asm   based on test_factorial.c
    ; /* test_factorial.c  the simplest example of a recursive function         */
    ; /*                   a recursive function is a function that calls itself */
    ; static int factorial(int n)  /* n! is n factorial = 1*2*3*...*(n-1)*n  */
    ; {
    ;   if( n <= 1 ) return 1;     /* must have a way to stop recursion      */
    ;   return n * factorial(n-1); /* factorial calls factorial with n-1     */
    ; }                            /* n * (n-1) * (n-2) * ... * (1)          */
    ; #include <stdio.h>
    ; int main()
    ; {
    ;   printf(" 0!=%d \n", factorial(0)); /* Yes, 0! is one */
    ;   printf(" 1!=%d \n", factorial(1));
    ;   ...
    ;   printf("18!=%d \n", factorial(18)); /* wrong, uncaught in C */
    ;   return 0;
    ; }
    ; /* output of execution is:
    ;   0!=1
    ;   1!=1
    ;   ...
    ;  12!=479001600
    ;  13!=1932053504    wrong! 13! = 12! * 13, must end in two zeros
    ;  14!=1278945280    wrong! and no indication!
    ;  15!=2004310016    wrong!
    ;  16!=2004189184    wrong!
    ;  17!=-288522240    wrong and obvious if you check your results
    ;  18!=-898433024    Only sometimes does integer overflow go negative
    ; */
    ; 
    ; compile:	nasm -f elf test_factorial.asm 
    ; link:		gcc -o test_factorial  test_factorial.o
    ; run:		test_factorial
    
    	section	.bss
    tmp:	resd	1		; overwritten each call
    	section	.text
    factorial:                      ; not global is 'static' in "C"
    	push	ebp		; function int factorial(int n)
    	mov	ebp,esp
    	push	ebx
    
    	mov	eax,[ebp+8]	; get argument n
    	cmp	eax,1		; compare for exit
    	jle	exitf		; go return a 1
    	sub	eax,1		; n-1
    	push	dword eax	; compute factorial(n-1)
    	call	factorial
    	pop	edx		; get back our "n-1"
    	add	edx,1		; have our "n"
    	mov	[tmp],edx
    	imul	eax,[tmp]	; n * factorial(n-1) in eax
    	jmp	returnf
    exitf:	mov	eax,1
    returnf:	
    	pop	ebx
    	mov	esp,ebp
    	pop	ebp
    	ret			; return value in eax
    ; end of function  static int factorial(int n)
    
    	extern	printf
    	section .data
    n:	dd	0		; initial, will count to 18
    fmt:	db	"%d! = %d",10,0 ; simple format
    	section	.bss
    nfact:	resd	1		; just a place to hold result
    	section	.text
    	global	main
    main:	push	ebp
    	mov	ebp,esp
    	push	ebx
    
    loop1:	
    	push	dword [n]	; push arguments for factorial
    	call	factorial	; coded above
    	add	esp,4
    	mov	[nfact],eax	; save result from factorial
    
    	push	dword [nfact]	; print
    	push	dword [n]
    	push	dword fmt
    	call	printf
    	add	esp,12
    	mov	eax,[n]
    	inc	eax
    	mov	[n],eax
    	cmp	eax,18
    	jle	loop1		; print factorial 0..18
    	
    	pop	ebx
    	mov	esp,ebp
    	pop	ebp
    	mov	eax,0
    	ret
    ; end main
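    As the comments above point out, C gives wrong results from 13! on
    with no indication, and so does this assembly version. A minimal C
    sketch, not part of the original program, that detects the overflow:
    before each multiply it checks n * (n-1)! against INT_MAX from
    <limits.h>. The names checked_factorial and print_factorials are
    made up for this illustration; with a 32-bit int the first overflow
    reported is 13!.

```c
#include <limits.h>
#include <stdio.h>

/* Same recursion as factorial() above, but reports overflow.
   Returns 1 and stores n! in *result if it fits in an int,
   returns 0 (leaving *result unchanged) if it would overflow. */
int checked_factorial(int n, int *result)
{
    if (n <= 1) {              /* base case, 0! = 1! = 1 */
        *result = 1;
        return 1;
    }
    int prev;
    if (!checked_factorial(n - 1, &prev))
        return 0;              /* (n-1)! already overflowed */
    if (prev > INT_MAX / n)    /* n * prev would exceed INT_MAX */
        return 0;
    *result = n * prev;
    return 1;
}

/* Print 0! .. 18!, flagging values a plain int cannot hold. */
void print_factorials(void)
{
    for (int n = 0; n <= 18; n++) {
        int f;
        if (checked_factorial(n, &f))
            printf("%2d! = %d\n", n, f);
        else
            printf("%2d! overflows an int\n", n);
    }
}
```

    In the assembly version the same check could be a JO right after the
    IMUL, since IMUL sets the overflow flag when the signed product does
    not fit in 32 bits.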
    
    
    
    

    Lecture 8 Boot programs and 16-bit

    A sample of a basic stand-alone bootable program is boot1.asm
    
    ; boot1.asm   stand-alone program for floppy boot sector
    ; Assembled using           nasm -f bin boot1.asm
    ; Written to floppy with    dd if=boot1 of=/dev/fd0

    ; Boot record is loaded at 0000:7C00.
    	ORG 7C00h
    ; the BIOS may leave DS nonzero; set DS=0 so [msg] and [SI] are correct
    	XOR AX,AX
    	MOV DS,AX
    ; load message address into SI register:
    	LEA SI,[msg]
    ; screen function, INT 10h AH=0Eh teletype output, BH = video page 0:
    	XOR BX,BX
    	MOV AH,0Eh
    print:  MOV AL,[SI]
    	CMP AL,0
    	JZ done		; zero byte at end of string
    	INT 10h		; write character to screen.
    	INC SI
    	JMP print

    ; wait for 'any key':
    done:   MOV AH,0
    	INT 16h		; waits for key press
    			; AL is ASCII code or zero
    			; AH is keyboard code

    ; store magic value at 0040h:0072h to reboot:
    ;		0000h - cold boot.
    ;		1234h - warm boot.
    	MOV  AX,0040h
    	MOV  DS,AX
    	MOV  word[0072h],0000h   ; cold boot.
    	JMP  0FFFFh:0000h	 ; reboot!

    msg 	DB  'Welcome, I have control of the computer.',13,10
    	DB  'Press any key to reboot.',13,10
    	DB  '(after removing the floppy)',13,10,0

    ; pad to 510 bytes, then the boot signature 55h AAh that many
    ; BIOSes require in the last two bytes of a boot sector
    	TIMES 510-($-$$) DB 0
    	DW  0AA55h
    ; end boot1
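    The BIOS loads the first 512-byte sector of the floppy at 0000:7C00,
    and many BIOSes only jump to it if its last two bytes are 55h and AAh
    (the boot signature, written by NASM as DW 0AA55h because the word is
    stored little-endian). A hedged C sketch that checks a disk image,
    such as the file boot1 made by nasm -f bin, for this layout;
    looks_like_boot_sector is an illustrative name, not a standard tool.

```c
#include <stddef.h>

/* Sketch: does this image look like a BIOS boot sector?
   Assumes the classic layout: exactly 512 bytes, with the
   signature bytes 0x55, 0xAA at offsets 510 and 511. */
int looks_like_boot_sector(const unsigned char *image, size_t len)
{
    if (len != 512)
        return 0;                 /* a boot sector is one 512-byte sector */
    return image[510] == 0x55 && image[511] == 0xAA;
}
```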
    
    This program could be extended to find or verify the keycodes
    that are input (not all keys have ASCII codes).
    
    The ASCII codes and keycodes of one keyboard are listed in ascii.txt:
    
    American Standard Code for Information Interchange, ASCII
    (with keycodes for a particular 104 key keyboard)
      dec  is the decimal value
      hex  is the 8-bit hexadecimal value
      key  is the 104-key PC keyboard keycode in hexadecimal
      type is how to type the character (shift not shown); C- means hold Ctrl down
      def  is the control character definition, e.g. LF line feed, FF form feed,
           CR carriage return, BS back space
                                              
    dec hex key type def   dec hex key type   dec hex key type   dec hex key type
      0  00  13 C-@  NULL   32  20  5E space   64  40  13 @       96  60  11 `
      1  01  3C C-A  SOH    33  21  12 !       65  41  3C A       97  61  3C a
      2  02  50 C-B  STX    34  22  46 "       66  42  50 B       98  62  50 b
      3  03  4E C-C  ETX    35  23  14 #       67  43  4E C       99  63  4E c
      4  04  3E C-D  EOT    36  24  15 $       68  44  3E D      100  64  3E d
      5  05  29 C-E  ENQ    37  25  16 %       69  45  29 E      101  65  29 e
      6  06  3F C-F  ACK    38  26  18 &       70  46  3F F      102  66  3F f
      7  07  40 C-G  BEL    39  27  46 '       71  47  40 G      103  67  40 g
      8  08  41 C-H  BS     40  28  1A (       72  48  41 H      104  68  41 h
      9  09  2E C-I  HT     41  29  1B )       73  49  2E I      105  69  2E i
     10  0A  42 C-J  LF     42  2A  19 *       74  4A  42 J      106  6A  42 j
     11  0B  43 C-K  VT     43  2B  1D +       75  4B  43 K      107  6B  43 k
     12  0C  44 C-L  FF     44  2C  53 ,       76  4C  44 L      108  6C  44 l
     13  0D  52 C-M  CR     45  2D  1C -       77  4D  52 M      109  6D  52 m
     14  0E  51 C-N  SO     46  2E  54 .       78  4E  51 N      110  6E  51 n
     15  0F  2F C-O  SI     47  2F  55 /       79  4F  2F O      111  6F  2F o
     16  10  30 C-P  DLE    48  30  1B 0       80  50  30 P      112  70  30 p
     17  11  27 C-Q  DC1    49  31  12 1       81  51  27 Q      113  71  27 q
     18  12  2A C-R  DC2    50  32  13 2       82  52  2A R      114  72  2A r
     19  13  3D C-S  DC3    51  33  14 3       83  53  3D S      115  73  3D s
     20  14  2B C-T  DC4    52  34  15 4       84  54  2B T      116  74  2B t
     21  15  2D C-U  NAK    53  35  16 5       85  55  2D U      117  75  2D u
     22  16  4F C-V  SYN    54  36  17 6       86  56  4F V      118  76  4F v
      23  17  28 C-W  ETB    55  37  18 7       87  57  28 W      119  77  28 w
     24  18  4D C-X  CAN    56  38  19 8       88  58  4D X      120  78  4D x
     25  19  2C C-Y  EM     57  39  1A 9       89  59  2C Y      121  79  2C y
     26  1A  4C C-Z  SUB    58  3A  45 :       90  5A  4C Z      122  7A  4C z
     27  1B  31 C-[  ESC    59  3B  45 ;       91  5B  31 [      123  7B  31 {
     28  1C  33 C-\  FS     60  3C  53 <       92  5C  33 \      124  7C  33 |
      29  1D  32 C-]  GS     61  3D  1D =       93  5D  32 ]      125  7D  32 }
     30  1E  17 C-^  RS     62  3E  54 >       94  5E  17 ^      126  7E  11 ~
     31  1F  1C C-_  US     63  3F  55 ?       95  5F  1C _      127  7F  34 delete
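    Two patterns in the table can be checked with a few lines of C. This
    is a sketch; ctrl_code and to_lower_ascii are hypothetical helper
    names. Each C- entry is the character's code with only the low 5 bits
    kept (C-A is 1, C-[ is 27, ESC), and upper and lower case letters
    differ only in bit 5 (20h).

```c
/* Code produced by holding Ctrl: keep only the low 5 bits.
   'A' (41h) -> 01h SOH, '[' (5Bh) -> 1Bh ESC, '@' (40h) -> 00h NULL. */
int ctrl_code(int ch)
{
    return ch & 0x1F;
}

/* Upper to lower case: set bit 5, e.g. 'A' (41h) -> 'a' (61h).
   Characters other than A..Z are returned unchanged. */
int to_lower_ascii(int ch)
{
    if (ch >= 'A' && ch <= 'Z')
        return ch | 0x20;
    return ch;
}
```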
    
    Additional key codes (most have no ASCII code; a program must track shift-down, shift-up, etc.)
      key type        key type          key type             key type
      01  ESCAPE      10  PAUSE          39 keypad 9 PAGE UP  5D LEFT ALT
      02  F1          1E  BACKSPACE      3A keypad +          5E SPACE
      03  F2          1F  INSERT         3B CAPS LOCK         5F RIGHT ALT
      04  F3          20  HOME           47 ENTER             60 RIGHT CTRL
      05  F4          21  PAGE UP        48 keypad 4 LEFT     61 LEFT ARROW
      06  F5          22  NUM LOCK       49 keypad 5          62 DOWN ARROW
      07  F6          23  keypad /       4A keypad 6 RIGHT    63 RIGHT ARROW
      08  F7          24  keypad *       4B LEFT SHIFT        64 keypad 0 INS
      09  F8          25  keypad -       56 RIGHT SHIFT       65 keypad . DEL
      0A  F9          26  TAB            57 UP ARROW          66 LEFT WINDOWS
      0B  F10         34  DELETE         58 keypad 1 END      67 RIGHT WINDOWS
      0C  F11         35  END            59 keypad 2 DOWN     68 APPLICATION
      0D  F12         36  PAGE DOWN      5A keypad 3 PAGE DN  7E SYS REQ
      0E  PRT SCRN    37  keypad 7 HOME  5B keypad ENTER      7F BREAK
      0F  SCROLL LOCK 38  keypad 8 UP    5C LEFT CTRL
    
    Now you may wish to download another self-booting program,
    memtest.bin, a binary program.
    If you can get this file, undamaged, onto a computer running
    Linux, you can write it to a floppy disk:
    
      dd if=memtest.bin of=/dev/fd0
    
    Then do a safe shutdown.
    Reboot your computer from the power off state.
    You should see information about your computer,
    e.g. clock speed, type of CPU, cache sizes and RAM size,
    and it will run a very thorough memory test on your RAM.
    
    You will not be able to run a bootable floppy on a UMBC
    Intel PC because the BIOS should be set to not boot from
    a floppy and the BIOS should be password protected, so you
    cannot change the BIOS settings. The machine is probably secured
    so you cannot get in and change the BIOS chip.
    
    More on bootable floppies is at nasm boot info
    
    
    

    Lecture 9 BIOS calls

    See Project 3
    
    A few basic BIOS calls:
    See BIOS ref
    
    A more complete, and harder to read, list is BIOS
    Interrupt Services. But do not use INT 21h or above:
    INT 21h is a DOS service, and you do not have DOS running.
    
    A sample bootable program is boot1.asm
    A more complete bootable program, which uses subroutines and
    a printer on lpt 0, is bootreg.asm
    
    

    Lecture 10 Hardware Interface

    White board lecture
    eip->instruction->decode->registers->alu->ear->data RAM etc.
    

    Lecture 11 Privileged instructions

    The Intel 80x86 processors have privilege levels, 0 (most privileged)
    through 3. There are instructions that can only be executed at the
    highest privilege level, CPL = 0. This level is reserved for the
    operating system in order to prevent the average user from
    causing chaos, e.g. the average user could issue a HLT instruction
    to halt the machine and thus every process would be dead.
    Other CPL=0 only instructions include:
      CLTS   Clear Task-Switched flag in CR0
      INVD   Invalidate cache (without writing modified data back)
      INVLPG Invalidate the translation lookaside buffer, TLB, entry for a page
      WBINVD Write Back and Invalidate cache
    
    When running a multiprocessing operating system, there are many
    instructions that only the operating system should use.
    
    The operating system controls the resources of the computer,
    including RAM, I/O and user processes. Some sample protections
    are tested by the following sample programs:
    
    A few simple tests to be sure protections are working.
    These three programs intentionally cause a segmentation fault (segfault).
    safe.asm store into read only section
    safe1.asm store into code section
    safe2.asm jump (execute) data
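    The source of safe.asm is not reproduced here. As a hedged C sketch of
    the same idea, assuming a POSIX system such as Linux, the function
    below maps one page with read-only permission (mmap with PROT_READ),
    stores into it, and catches the resulting signal with
    sigsetjmp/siglongjmp instead of letting the program die.
    store_into_read_only_page is a hypothetical name. Linux reports the
    fault as SIGSEGV; the sketch also catches SIGBUS because some systems
    use that signal instead.

```c
#define _DEFAULT_SOURCE          /* MAP_ANONYMOUS under strict -std= flags */
#include <setjmp.h>
#include <signal.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf recover;       /* where the fault handler jumps back to */

static void on_fault(int sig)
{
    (void)sig;
    siglongjmp(recover, 1);      /* abandon the faulting store */
}

/* Store into a page the OS has marked read only.
   Returns 1 if the store was stopped by a fault (protection works),
   0 if the store succeeded, -1 if the page could not be mapped. */
int store_into_read_only_page(void)
{
    long page = sysconf(_SC_PAGESIZE);
    char *mem = mmap(NULL, (size_t)page, PROT_READ,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;

    struct sigaction sa, old_segv, old_bus;
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGBUS, &sa, &old_bus);

    volatile int faulted = 0;
    if (sigsetjmp(recover, 1) == 0)
        *(volatile char *)mem = 1;   /* the forbidden store */
    else
        faulted = 1;                 /* arrived here from on_fault */

    sigaction(SIGSEGV, &old_segv, NULL);   /* restore previous handlers */
    sigaction(SIGBUS, &old_bus, NULL);
    munmap(mem, (size_t)page);
    return faulted;
}
```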
    
    A few simple tests to be sure privileged instructions can not execute.
    priv.asm hlt instruction to halt computer
    priv1.asm other privileged instructions
    
    To give the user some controlled access to system resources,
    an interface to the operating system, or kernel, is provided.
    You will see in the next lecture that some BIOS functions are
    also provided as Linux kernel calls.
    
    

    Lecture 12 Linux kernel calls

    To understand Linux System Calls, learn from the UMBC expert, Gary Burt:
    CMSC 313 -- System Calls
    System Call Table
    
    When making Linux kernel calls from a "C" program, you will need
    #include <unistd.h>
    
    A sample, syscall1.asm, demonstrates file open (unistd.h failed),
    file read (in chunks of 8192 bytes) and file write (the whole file!).
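    The source of syscall1.asm is not shown here. A C sketch of a read and
    write loop, assuming a POSIX system: read() and write() are declared
    in <unistd.h>, while open() and its O_ flags come from <fcntl.h>.
    copy_fd and CHUNK are names chosen for this illustration, and unlike
    syscall1.asm the sketch writes each chunk as soon as it is read rather
    than the whole file at once.

```c
#include <fcntl.h>      /* open(), O_RDONLY, O_WRONLY */
#include <sys/types.h>  /* ssize_t */
#include <unistd.h>     /* read(), write(), close() */

#define CHUNK 8192      /* same chunk size the notes give for syscall1.asm */

/* Copy everything readable from in_fd to out_fd with the read and
   write system calls. Returns bytes copied, or -1 on an error. */
long copy_fd(int in_fd, int out_fd)
{
    char buf[CHUNK];
    long total = 0;
    for (;;) {
        ssize_t got = read(in_fd, buf, sizeof buf);
        if (got == 0)
            return total;          /* end of file */
        if (got < 0)
            return -1;             /* read error */
        ssize_t done = 0;
        while (done < got) {       /* write() may accept fewer bytes */
            ssize_t put = write(out_fd, buf + done, (size_t)(got - done));
            if (put < 0)
                return -1;
            done += put;
        }
        total += got;
    }
}
```

    Typical use, with a hypothetical file name: open("input.txt", O_RDONLY)
    and pass the result with file descriptor 1 (standard output) to copy_fd,
    then close the input descriptor.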
    
    

    Lecture 13 Review

    Go over lectures 1 through 11.
    
    See typical questions here
    
    

    Lecture 14 mid-term exam

    33 questions: some true-false, some multiple choice,
    some short answer, e.g. convert decimal to binary or binary to decimal
    A few one line to three line assembly language questions.
    
    

    Lecture 15 Logic gates

    For these notes:
      1 = true = high = value of a digital signal on a wire
      0 = false = low = value of a digital signal on a wire
      X = unknown or indeterminate to people, not on a wire
    
    A digital logic gate can be represented in at least three ways;
    we will use them interchangeably: schematic symbol, truth table or equation.
    The equations may be from languages such as mathematics, VHDL or Verilog.
    
    Digital logic gates are connected by wires. A wire or a group of
    wires can be given a name, called a signal name. From an electronic
    view the digital logic wire has a high or a low (voltage) but we
    will always consider the wire to have a one (1) or a zero (0).
    
    The basic logic gates are shown below.
    
    

    The basic "and" gate:

    truth table      equation

      a b | c
      ----+--        c <= a and b;
      0 0 | 0
      0 1 | 0        c = a & b;
      1 0 | 0
      1 1 | 1        c = and(a,b)

    Easy way to remember: The output is 1 when all inputs are 1, 0 otherwise.
    In theory, an "and" gate can have any number of inputs.

    A three-input "and" gate:

    truth table      equation

      a b c | d      d = and(a, b, c)
      ------+--
      0 0 0 | 0      notice how a truth table has the inputs
      0 0 1 | 0      counting 0, 1, 2, ... in binary.
      0 1 0 | 0
      0 1 1 | 0      the output (may be more than one bit) is
      1 0 0 | 0      after the vertical line, on the right.
      1 0 1 | 0
      1 1 0 | 0
      1 1 1 | 1

    The basic "or" gate:

    truth table      equation

      a b | c
      ----+--        c <= a or b;
      0 0 | 0
      0 1 | 1        c = a | b;
      1 0 | 1
      1 1 | 1        c = or(a,b)

    Easy way to remember: The output is 0 when all inputs are 0, 1 otherwise.
    In theory, an "or" gate can have any number of inputs.

    A three-input "or" gate:

    truth table      equation

      a b c | d      d = or(a, b, c)
      ------+--
      0 0 0 | 0      notice how a truth table has the inputs
      0 0 1 | 1      counting 0, 1, 2, ... in binary.
      0 1 0 | 1
      0 1 1 | 1      the output (may be more than one bit) is
      1 0 0 | 1      after the vertical line, on the right.
      1 0 1 | 1
      1 1 0 | 1
      1 1 1 | 1

    The basic "nand" gate:

    truth table      equation

      a b | c
      ----+--        c <= a nand b;
      0 0 | 1
      0 1 | 1        c = ~ (a & b);
      1 0 | 1
      1 1 | 0        c = nand(a,b)

    Easy way to remember: "nand" reads "not and", the complement of "and".

    The basic "nor" gate:

    truth table      equation

      a b | c
      ----+--        c <= a nor b;
      0 0 | 1
      0 1 | 0        c = ~ (a | b);
      1 0 | 0
      1 1 | 0        c = nor(a,b)

    Easy way to remember: "nor" reads "not or", the complement of "or".

    The basic "xor" gate:

    truth table      equation

      a b | c
      ----+--        c <= a xor b;
      0 0 | 0
      0 1 | 1        c = a ^ b;
      1 0 | 1
      1 1 | 0        c = xor(a,b)

    Easy way to remember: "eXclusive or" is 1 when exactly one input is 1,
    not when both are (11); in general, an odd number of ones gives 1.

    A three-input "xor" gate:

    truth table      equation

      a b c | d
      ------+--      d <= a xor b xor c;
      0 0 0 | 0
      0 0 1 | 1
      0 1 0 | 1      d = a ^ b ^ c;
      0 1 1 | 0
      1 0 0 | 1      d = xor(a,b,c)
      1 0 1 | 0
      1 1 0 | 0
      1 1 1 | 1

    Easy way to remember: odd parity, the output is 1 for an odd number of ones.

    The basic "xnor" gate:

    truth table      equation

      a b | c
      ----+--        c <= a xnor b;
      0 0 | 1
      0 1 | 0        c = ~ (a ^ b);
      1 0 | 0
      1 1 | 1        c = xnor(a,b)

    Easy way to remember: "xnor" reads "not xor", equality or even parity.

    The basic "not" gate:

    truth table      equation

      a | b
      --+--          b <= not a;
      0 | 1          b = ~ a;
      1 | 0          b = not(a)

    Easy way to remember: invert or "not", the complement.
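    The Notation B equations above map directly onto C's bitwise
    operators. A small sketch with made-up function names, assuming every
    signal holds only 0 or 1. In C, ~ complements every bit of an int, so
    ~1 is -2, not 0; each inverted result is therefore masked with & 1 to
    keep a single bit.

```c
#include <stdio.h>

int and_gate(int a, int b)  { return a & b; }
int or_gate(int a, int b)   { return a | b; }
int xor_gate(int a, int b)  { return a ^ b; }
int nand_gate(int a, int b) { return ~(a & b) & 1; }   /* mask to one bit */
int nor_gate(int a, int b)  { return ~(a | b) & 1; }
int xnor_gate(int a, int b) { return ~(a ^ b) & 1; }
int not_gate(int a)         { return ~a & 1; }

/* Print a two-input truth table in the layout used in these notes,
   with the inputs counting 0, 1, 2, 3 in binary. */
void print_table(const char *name, int (*gate)(int, int))
{
    printf("%s\n a b | c\n ----+--\n", name);
    for (int row = 0; row < 4; row++) {
        int a = (row >> 1) & 1, b = row & 1;
        printf(" %d %d | %d\n", a, b, gate(a, b));
    }
}
```

    For example, print_table("nand", nand_gate) prints the "nand" truth
    table shown above.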

    A specialized gate:

    truth table        equation

      a b c | d        d <= not( not a and b and not c);
      ------+--
      0 0 0 | 1
      0 0 1 | 1
      0 1 0 | 0        d = ~( ~a & b & ~c);
      0 1 1 | 1
      1 0 0 | 1        d = not(and(not(a),b,not(c)))
      1 0 1 | 1            _______
      1 1 0 | 1             _   _
      1 1 1 | 1        d = (a b c)

    Easy way to remember: none, just work it out. Bubbles on the inputs
    mean the same as a bubble on the output: invert the signal value.
    An inverted variable is often written with a line above it, so the
    "and" of the three gate inputs is
        _   _
        a b c
    which is read: a bar and b and c bar. The word "bar" names the line
    above the variable, meaning invert the variable. The output d is the
    complement of this "and".

    It is known that there are 16 Boolean functions with two inputs.
    In fact, for any number of inputs, n, there are 2^(2^n) Boolean
    functions (two to the power of two to the nth).

      n=2      16 functions                 2^4
      n=3     256 functions                 2^8
      n=4  65,536 functions                 2^16
      n=5  over four billion functions      2^32

    The truth table for all Boolean functions of two inputs is
    (column labels read from top to bottom):

                           n   x
               n         x a a n
               o   _   _ o n n o   1   1 o
       a b | 0 r 2 a 4 b r d d r b 1 a 3 r 1
       ----+--------------------------------
       0 0 | 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
       0 1 | 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
       1 0 | 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1
       1 1 | 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1

    Reading across, the columns are 0, nor, 2, a bar, 4, b bar, xor, nand,
    and, xnor, b, 11, a, 13, or, 1. Columns 2, 4, 11 and 13 are labeled
    by their number.

    Notice that for two input variables, a b, there are 2^2 = 4 rows.
    Notice that for four rows there are 2^4 = 16 columns.

    A question is: Which are "universal" functions from which all other
    functions can be obtained? The answer is that either "nand" or "nor"
    can be used to create all other functions (when having 0 and 1 available).
    It turns out that electric circuits rather naturally create "nand" or
    "nor" gates. No more than five "nand" gates or five "nor" gates are
    needed in creating any of the 16 Boolean functions of two inputs.

    Here are the circuits using only "nand" to get all 16 functions.
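    Two facts from this section can be checked in a few lines of C. This
    is a sketch with illustrative names: boolean_fn(k, a, b) reads column k
    of the 16-function table (the row for inputs a, b is number 2*a + b
    when counting in binary, and that row holds bit 2*a + b of k), and the
    *_n functions build not, and, or and xor from a nand function alone.

```c
/* Column k of the 16-function table, evaluated at inputs a, b. */
int boolean_fn(int k, int a, int b)
{
    return (k >> (2 * a + b)) & 1;
}

/* Universality: other gates built only from two-input nand. */
int nand2(int a, int b) { return !(a && b); }
int not_n(int a)        { return nand2(a, a); }                /* 1 nand  */
int and_n(int a, int b) { return not_n(nand2(a, b)); }         /* 2 nands */
int or_n(int a, int b)  { return nand2(not_n(a), not_n(b)); }  /* 3 nands */

int xor_n(int a, int b)                                        /* 4 nands */
{
    int t = nand2(a, b);
    return nand2(nand2(a, t), nand2(b, t));
}
```

    For example, column 8 is "and", column 14 is "or", column 6 is "xor"
    and column 7 is "nand", matching the labels in the table.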

    Lecture 16 Combinational Logic

    
    

    Combinational digital logic uses Boolean Algebra.

    The basic relations are well known, yet several notations are used.
      Notation A: use words "and" "or" "not" etc.
      Notation B: use characters & for "and", | for "or", ~ for "not"
      Notation C: use characters * for "and", + for "or", - for "not"
      Notation D: use symbols "dot" for "and", + for "or", bar for "not"
      Notation E: use symbols "blank" for "and", + for "or", bar for "not"

    Generally, the symbols for "and" are like the symbols for multiply
    and the symbols for "or" are like the symbols for addition.
    In mathematics, multiplication always has precedence over addition;
    do not expect "and" to always have precedence over "or".

    Here are 19 basic identities that can be used to simplify or convert
    one Boolean equation to another.

     1. X + 0 = X                       "or" anything with zero gives anything
     2. X * 1 = X                       "and" anything with one gives anything
     3. X + 1 = 1                       "or" anything with one gives one
     4. X * 0 = 0                       "and" anything with zero gives zero
     5. X + X = X                       "or" with self gives self
     6. X * X = X                       "and" with self gives self
                _
     7. X + X = 1                       "or" with complement gives one
                _
     8. X * X = 0                       "and" with complement gives zero
     9. not(not(X)) = X                 any even number of complements cancel
    10. X + Y = Y + X                   "or" is commutative
    11. X * Y = Y * X                   "and" is commutative
    12. X + (Y + Z) = (X + Y) + Z       "or" is associative
    13. X * (Y * Z) = (X * Y) * Z       "and" is associative
    14. X * (Y + Z) = X * Y + X * Z     distributive law
    15. X + Y * Z = (X + Y) * (X + Z)   distributive law
                        _______
                         _   _
    16. X + Y     = (X * Y)             DeMorgan's theorem
        _______     _   _
    17. (X + Y)   = X * Y               DeMorgan's theorem
        _______     _   _
    18. (X * Y)   = X + Y               DeMorgan's theorem
                        _______
                         _   _
    19. X * Y     = (X + Y)             DeMorgan's theorem

    Basically, DeMorgan's theorem says:
      Convert "and" to "or", negate the variables and negate the entire expression.
      Convert "or" to "and", negate the variables and negate the entire expression.
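    Because each variable is only 0 or 1, any identity can be proved by
    trying every combination. A C sketch (check_identities is an
    illustrative name) that does this for identities 7, 8, 14, 15 and
    DeMorgan's theorem 16 to 19, writing * as &, + as | and bar as !.
    It returns the number of the first identity that fails, or 0.

```c
/* Exhaustively check several identities for all 8 values of X, Y, Z. */
int check_identities(void)
{
    for (int v = 0; v < 8; v++) {
        int X = (v >> 2) & 1, Y = (v >> 1) & 1, Z = v & 1;
        if ((X | !X) != 1) return 7;
        if ((X & !X) != 0) return 8;
        if ((X & (Y | Z)) != ((X & Y) | (X & Z))) return 14;
        if ((X | (Y & Z)) != ((X | Y) & (X | Z))) return 15;
        if ((X | Y) != !(!X & !Y)) return 16;
        if (!(X | Y) != (!X & !Y)) return 17;
        if (!(X & Y) != (!X | !Y)) return 18;
        if ((X & Y) != !(!X | !Y)) return 19;
    }
    return 0;   /* every checked identity held in all 8 cases */
}
```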

    Any truth table can be converted to an equation or schematic.

    Given any truth table, there is a simple procedure for generating
    a Boolean equation that uses "and", "or" and "not" (any representation).
    First an example. Given the truth table

      a b | c
      ----+--
      0 0 | 1
      0 1 | 0
      1 0 | 1
      1 1 | 1

    for each row where 'c' is 1, create an "and" with 'a' if 'a' is 1,
    or a bar if 'a' is 0, and with 'b' if 'b' is 1, or b bar if 'b' is 0.
                          _   _
      first row       a * b
                              _
      third row       a * b

      fourth row      a * b

    Now "or" the "and"s to form the final equation:
               _   _         _
      c = (a * b) + (a * b) + (a * b)
      c <= (not a and not b) or (a and not b) or (a and b);

    The schematic can be drawn directly, one "and" gate for each row
    where 'c' is 1, with a bubble for each variable that is 0.

    The general process to convert a truth table (or partial truth table)
    to a Boolean equation using "and" "or" "not" is:
      For each output
        For each row where the output is 1
          create a minterm that is the "and" of the input variables,
          with the input variable complemented when it is 0.
        The output is the "or" of the above minterms.

    Another example with three input variables and two outputs:

      a b c | s co
      ------+-----
      0 0 0 | 0 0
      0 0 1 | 1 0
      0 1 0 | 1 0
      0 1 1 | 0 1
      1 0 0 | 1 0
      1 0 1 | 0 1
      1 1 0 | 0 1
      1 1 1 | 1 1
                _ _       _   _       _ _
      s  = (a*b*c) + (a*b*c) + (a*b*c) + (a*b*c)
                _           _           _
      co = (a*b*c) + (a*b*c) + (a*b*c) + (a*b*c)

    The exact same information is presented by the schematic.
    Note that this is not a minimum representation; we will talk about
    minimizing digital logic in a few lectures.
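    The minterm procedure can be automated. This C sketch (minterms is an
    illustrative name, not an existing library function) takes the output
    column of a truth table, rows counted in binary, and writes the
    equation in Notation B, with ~ marking a complemented input. For the
    first example above it produces (~a & ~b) | (a & ~b) | (a & b).

```c
#include <stdio.h>
#include <string.h>

/* out[] is the output column (1 << nvars rows, counted in binary);
   names[] holds one-letter input names, most significant first.
   The "or" of all minterms is written into buf (size bytes). */
void minterms(const int *out, int nvars, const char *names,
              char *buf, size_t size)
{
    buf[0] = '\0';
    int rows = 1 << nvars;
    for (int row = 0; row < rows; row++) {
        if (!out[row])
            continue;                        /* only rows where output is 1 */
        size_t len = strlen(buf);
        snprintf(buf + len, size - len, "%s(", len ? " | " : "");
        for (int i = 0; i < nvars; i++) {
            int bit = (row >> (nvars - 1 - i)) & 1;   /* value of input i */
            len = strlen(buf);
            snprintf(buf + len, size - len, "%s%s%c",
                     i ? " & " : "", bit ? "" : "~", names[i]);
        }
        len = strlen(buf);
        snprintf(buf + len, size - len, ")");
    }
}
```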

    Any equation can be converted to a truth table.

    Example: convert
      c <= (a and b) or (not a and b);
                         _
      c = (a * b) + (a * b)
    to a truth table.

    We can immediately construct the truth table structure.
    We see the input variables are 'a' and 'b' and the output is 'c'.
    We generate all possible values for input by counting in binary.

      a b | c
      ----+--
      0 0 |
      0 1 |
      1 0 |
      1 1 |

    The only step that remains is to fill in the 'c' column.
    For the first row, substitute 0 for 'a' and 0 for 'b' in the
    equation and evaluate to find 'c':
                         _
      c = (0 * 0) + (0 * 0) = (0) + (1 * 0) = 0   (using identities above)

    For the second row, substitute 0 for 'a' and 1 for 'b':
                         _
      c = (0 * 1) + (0 * 1) = (0) + (1 * 1) = 1   (using identities above)

    For the third row, substitute 1 for 'a' and 0 for 'b':
                         _
      c = (1 * 0) + (1 * 0) = (0) + (0 * 0) = 0   (using identities above)

    For the fourth row, substitute 1 for 'a' and 1 for 'b':
                         _
      c = (1 * 1) + (1 * 1) = (1) + (0 * 1) = 1   (using identities above)

    Filling in the values for 'c' gives the completed truth table:

      a b | c
      ----+--
      0 0 | 0
      0 1 | 1
      1 0 | 0
      1 1 | 1
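    The row-by-row substitution is a loop over all input combinations.
    A C sketch for the example equation (example_c and fill_column are
    illustrative names). The column it produces, 0 1 0 1, equals b, which
    the identities confirm: by identities 11 and 14, (a*b) + (a bar*b) is
    b*(a + a bar), by identity 7 this is b*1, and by identity 2 it is b.

```c
/* c <= (a and b) or (not a and b);  inputs are 0 or 1 */
int example_c(int a, int b)
{
    return (a && b) || (!a && b);
}

/* Evaluate the equation on rows 00, 01, 10, 11 (counting in binary)
   and store the 'c' column in col[0..3]. */
void fill_column(int col[4])
{
    for (int row = 0; row < 4; row++) {
        int a = (row >> 1) & 1;
        int b = row & 1;
        col[row] = example_c(a, b);
    }
}
```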

    Any digital logic schematic can be converted to a truth table.

    Any equation can be converted to a schematic and any schematic can be
    converted to an equation. When converting a schematic to a truth table
    directly, you are simulating the actual behavior of the digital logic.

    From the schematic below we can immediately construct the truth table
    structure. We see the input variables are 'a' and 'b' and the output
    is 'c'. We generate all possible values for input by counting in binary.

      a b | c
      ----+--
      0 0 |
      0 1 |
      1 0 |
      1 1 |

    Now, start by placing the values of the input signals on the input
    wires, a=0, b=0. Note that signals other than inputs are labeled X for
    unknown. Then, as shown in the sequence of figures, propagate the
    signals: for each gate, use the gate inputs to compute the gate output.

    This is actually how the hardware works. Each gate is continually using
    its inputs to produce its output, with a small delay. All gates operate
    in parallel. All gates operate all the time.

    Working a little faster, apply truth table inputs a=0, b=1, then
    a=1, b=0, and finally a=1, b=1. Filling in the values for 'c' gives
    the completed truth table:

      a b | c
      ----+--
      0 0 | 0
      0 1 | 1
      1 0 | 0
      1 1 | 1
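    The schematic itself is in the figures, which are not reproduced in
    these notes. As an assumption, the C sketch below simulates the circuit
    of the previous example, c <= (a and b) or (not a and b), with three
    signal values 0, 1 and X. Every non-input signal starts at X, and each
    step recomputes every gate output from the previous step's values (a
    simple unit-delay model, not real timing) until nothing changes. The
    names are illustrative. It produces the same c column as the table
    above.

```c
enum { L0 = 0, L1 = 1, LX = 2 };   /* logic 0, logic 1, unknown X */

static int not3(int a) { return a == LX ? LX : !a; }

static int and3(int a, int b)
{
    if (a == L0 || b == L0) return L0;   /* a 0 input forces "and" to 0 */
    if (a == L1 && b == L1) return L1;
    return LX;
}

static int or3(int a, int b)
{
    if (a == L1 || b == L1) return L1;   /* a 1 input forces "or" to 1 */
    if (a == L0 && b == L0) return L0;
    return LX;
}

/* All gates evaluated "in parallel" from the previous step's values.
   Returns c once stable and stores the number of steps that changed
   some signal in *steps. */
int simulate(int a, int b, int *steps)
{
    int na = LX, t1 = LX, t2 = LX, c = LX;    /* non-inputs start as X */
    for (*steps = 0; ; (*steps)++) {
        int new_na = not3(a);
        int new_t1 = and3(a, b);
        int new_t2 = and3(na, b);
        int new_c  = or3(t1, t2);
        if (new_na == na && new_t1 == t1 && new_t2 == t2 && new_c == c)
            return c;                          /* nothing changed: stable */
        na = new_na; t1 = new_t1; t2 = new_t2; c = new_c;
    }
}
```

    With a=0, b=1 the output needs three steps: "not a" settles first,
    then the second "and", then the "or", much like the sequence of
    figures.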

    Lecture 17 Combinational logic design

    "Combinational logic" means gates connected together without feedback.
    There is no storage of information. Inputs are applied and outputs
    are produced. By convention, we draw combinational logic from
    inputs on the left to outputs on the right. For large schematic
    diagrams this convention is often violated.
    
    When no constraints are given, any of the gates previously
    defined can be connected to design a circuit that performs
    the stated function.
    
    Example: Design a circuit that has:
      an input for tail lights both on
      an input for right turn that lets the signal "osc" control right tail light.
      an input for left turn that lets the signal "osc" control left tail light.
      ("osc" will make the light flash on and off as a turn indicator.)
    
      Constraint: use "and" and "or" gates with inversion bubbles allowed
    
    Solution: There are four inputs "tail" "right" "left" and "osc"
              There are two outputs "right_light" and "left_light"
    
      The general strategy in design is to work backward from an output.
      Yet, as usual, some work from input toward output is also used.
    
      "right_light" must select between "tail" and "osc". Selection
      can typically be implemented by "and" gates feeding an "or" gate
      with a control signal into one "and" gate and its complement into
      the other "and" gate.
      
      Analyzing this circuit, if "right" is off, "tail" controls
      the "right_light". If "right" is on, "osc" controls the "right_light".
    
    A common symbol for this circuit is a multiplexor, mux for short.
    The same circuit as above is usually drawn as the schematic diagram:
    
    
    
      Now we can use the first schematic with new labeling for
      the "left_light", combining the circuits yields:
      
    
    Now a new requirement is added: the flashers must override all
    other signals and make "osc" drive both right and left tail lights.
    
    A typical design technique is to build on existing designs,
    thus note that "flash" only needs to be able to turn on both
    the old "right" and old "left". This is two "or" functions
    that are easily added to the previous circuit.
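    The finished tail light design can be written as C expressions, one
    bit per signal. A sketch with illustrative names: mux2 is the two
    "and" gates feeding an "or" gate, with the select signal into one
    "and" and its complement into the other, and "flash" is "or"-ed into
    both select signals.

```c
/* Two-input selector: sel = 0 passes in0, sel = 1 passes in1. */
int mux2(int sel, int in0, int in1)
{
    return (!sel && in0) || (sel && in1);
}

/* "right" or "flash" hands the right light over to "osc". */
int right_light(int tail, int right, int flash, int osc)
{
    return mux2(right || flash, tail, osc);
}

/* "left" or "flash" hands the left light over to "osc". */
int left_light(int tail, int left, int flash, int osc)
{
    return mux2(left || flash, tail, osc);
}
```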
      
    
    In general a multiplexor can have any number of inputs.
    Typically the number of inputs is a power of two, 2^n, and the
    control signal, ctl, has n bits (four inputs need a two-bit ctl).
      
    
     ctl | out  Note that "ctl" is a two bit signal, shown by the "2"
     ----+----
     0 0 |  a   The truth table does not have to expand
     0 1 |  b   a, b, c and d  because the mux just passes
     1 0 |  c   the values through to "out" based on the
     1 1 |  d   value of "ctl"
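    The 4-input mux can be built the same way as the 2-input one: one
    "and" gate per data input, enabled by one value of ctl, all feeding an
    "or" gate. A C sketch (mux4 is an illustrative name) with ctl split
    into its two bits, ctl1 (high) and ctl0 (low).

```c
/* out = a, b, c or d for ctl = 00, 01, 10, 11 */
int mux4(int ctl1, int ctl0, int a, int b, int c, int d)
{
    return (!ctl1 && !ctl0 && a)    /* ctl = 0 0 selects a */
        || (!ctl1 &&  ctl0 && b)    /* ctl = 0 1 selects b */
        || ( ctl1 && !ctl0 && c)    /* ctl = 1 0 selects c */
        || ( ctl1 &&  ctl0 && d);   /* ctl = 1 1 selects d */
}
```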
    
    
    For a general circuit that has some type of description, we use
    a rectangle with some notation indicating the function of the
    circuit. The inputs and outputs are given signal names.
      
    
    

    Lecture 18 Simulation tools

    There are many simulation and design tools available for digital logic.
    
    Some sections of CMSC 313 use B2Logic.
    This is a graphical interface program for use on Microsoft Windows.
    This program has many building blocks from the digital logic of
    the 1970s, when dual in-line packages held many 4-bit circuits.
    B2Logic allows a maximum of a 16-bit bus as a primitive.
    This is a practical learning tool for simple logic circuits, yet it
    cannot handle today's designs, 32-bit and 64-bit computer architectures.
    
    Some sections of CMSC 313 use DigSim a Java applet that can be run
    from any WEB browser. DigSim is interactive and dynamic yet seems
    limited in circuit complexity and timing accuracy. Learn more at
    Richard Chang's WEB page
    
    There are major commercial Electronic Design Automation, EDA, systems
    for today's digital logic. Cadence is one of today's major suppliers and
    UMBC has Cadence software available on GL computers.
    Mentor Graphics, Synopsys and others provide large tool sets.
    
    Altera and Xilinx are major providers of software for making custom
    integrated circuits using Field Programmable Gate Arrays, FPGA.
     www.altera.com 
    Altera has a downloadable student version.
     www.xilinx.com 
    The student version of Xilinx came with your textbook. Be sure
    to install CD-Rom 2 of 2 first, if you wish to try this software.
    
    The best WEB site to find free EDA tools is www.geda.seul.org
    
    For projects for this section of CMSC 313 we will use Cadence VHDL
    that is available on linux.gl.umbc.edu.
    
    

    Using Cadence VHDL on Linux.GL machines

      First: You must have an account on a GL machine. Every student
             and faculty member should have one.
             Either log in directly to linux.gl.umbc.edu or
             Use   ssh  linux.gl.umbc.edu
    
             You can copy many sample files to your working directory using:
             cp /afs/umbc.edu/users/s/q/squire/pub/download/cs411.tar  .
             Do not forget the final space dot. There are many files available.
    
      Next:  Follow these instructions exactly, or work out your own variation.
      1)     Get this tar file into your home directory (on /afs  i.e.
             available on all GL machines.)
             cs411.tar   and then type commands:
             cp /afs/umbc.edu/users/s/q/squire/pub/download/cs411.tar  .
             tar -xvf cs411.tar
             cd vhdl
             mv Makefile.cadence Makefile
             source vhdl_cshrc
             make
             more add32_test.out
             make clean              # saves a lot of disk quota
    
             Then do your own thing with Makefile for other VHDL files
    
      2)     The manual, step by step method (same results as above)
             Be in your home directory.
             mkdir vhdl                 # for your source code  .vhdl files
             cd vhdl
             mkdir vhdl_lib             # your WORK library, keep hands off
             
             You now need to get the following 6 files into your  vhdl  directory:
             vhdl_cshrc
             cds.lib      change $HOME to your path if needed
             hdl.var
             Makefile.cadence    for first test
             add32_test.vhdl     for first test
             add32_test.run       for first test
    
             mv Makefile.cadence  Makefile
             # Run the test run:
             source  vhdl_cshrc
             make                        # should be no error messages
             more  add32_test.out        # it should have VHDL simulation output
             make clean                  # saves on your quota
    
             You are on your own to write VHDL and modify the Makefile.
             Remember each time you log on:
             cd vhdl
             source vhdl_cshrc
             make                        # or do your own thing.
    
      The above is the latest generation Cadence "ldv" "ncvhdl, nceval, ncsim"
    
    

    FPGA and other CAD information

    You can get working chips from VHDL using synthesis tools.
    
    One of the quickest ways to get chips is to use FPGAs,
    Field Programmable Gate Arrays.
    The two companies listed below provide the software and the
    foundry for you to design your own integrated circuit chips:
    
     www.altera.com 
    
     www.xilinx.com 
    
    Complete Computer Aided Design, CAD, packages are available from
    companies such as Cadence, Mentor Graphics and Synopsys.
    
    

    Other Digital Logic Tool Links

    Lecture 19 Arithmetic circuits

    Basic decimal addition (with carry digit shown)
      101  <- carry (note that three numbers are added after first digit)
    
       567
     + 526
     -----
      1093
    
    Binary addition (with carry bit shown)
      1011  <- carry (note that three bits are added after first bit)
               for future reference c(3)=1, c(2)=0, c(1)=1, c(0)=1
       1011    bits are numbered from zero, right to left
     + 1001
      -----
      10100    for future reference s(3)=0, s(2)=1, s(1)=0, s(0)=0
               the leftmost '1' is cout
    
    Since three bits must be added, a truth table for a full adder
    needs three inputs and thus eight entries.
    
     a b c | s co
     ------+-----        _ _       _   _       _ _
     0 0 0 | 0 0    s = (a*b*c) + (a*b*c) + (a*b*c) + (a*b*c) 
     0 0 1 | 1 0        simplifies to
     0 1 0 | 1 0    s = a xor b xor c
     0 1 1 | 0 1    s <= a xor b xor c;
     1 0 0 | 1 0          _           _           _
     1 0 1 | 0 1    co = (a*b*c) + (a*b*c) + (a*b*c) + (a*b*c)
     1 1 0 | 0 1         simplifies to
     1 1 1 | 1 1    co = (a*b)+(a*c)+(b*c)
                    co <= (a and b) or (a and c) or (b and c);
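    The simplified equations can be checked against the sum-of-minterms
    forms on all eight rows. A C sketch (full_adder_equations_agree is an
    illustrative name, ! is "bar") that also confirms the defining property
    of a full adder: the three input bits add up to 2*co + s.

```c
/* Returns 1 if the minterm and simplified forms agree on every row. */
int full_adder_equations_agree(void)
{
    for (int row = 0; row < 8; row++) {
        int a = (row >> 2) & 1, b = (row >> 1) & 1, c = row & 1;
        int s_minterms  = (!a & !b & c) | (!a & b & !c)
                        | (a & !b & !c) | (a & b & c);
        int co_minterms = (!a & b & c) | (a & !b & c)
                        | (a & b & !c) | (a & b & c);
        int s_simple  = a ^ b ^ c;
        int co_simple = (a & b) | (a & c) | (b & c);
        if (s_minterms != s_simple || co_minterms != co_simple)
            return 0;
        if (a + b + c != 2 * co_simple + s_simple)   /* arithmetic check */
            return 0;
    }
    return 1;
}
```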
    
    This can be drawn as a box for use on larger schematics
    
          +-------+
          | a b c |  The inputs are shown at the top (or left)
          |       |
          | fadd  |
          |       |
          | co  s |  The outputs are shown at the bottom (or right)
          +-------+
    
    The full adder can be written as an entity in VHDL
    
    library IEEE;
    use IEEE.std_logic_1164.all;  -- defines std_logic
    
    entity fadd is               -- full stage adder, interface
      port(a  : in  std_logic;
           b  : in  std_logic;
           c  : in  std_logic;
           s  : out std_logic;
           co : out std_logic);
    end entity fadd;
    
    architecture circuits of fadd is  -- full adder stage, body
    begin  -- circuits of fadd
      s <= a xor b xor c after 1 ns;
      co <= (a and b) or (a and c) or (b and c) after 1 ns;
    end architecture circuits; -- of fadd
    
    
    Connecting four full adders, four fadd's, to make a 4-bit adder
    
     
    
    The connections are written for VHDL as 
    
      a0: entity WORK.fadd port map(a(0), b(0),  cin, s(0), c(0));
      a1: entity WORK.fadd port map(a(1), b(1), c(0), s(1), c(1));
      a2: entity WORK.fadd port map(a(2), b(2), c(1), s(2), c(2));
      a3: entity WORK.fadd port map(a(3), b(3), c(2), s(3), c(3));
    
    Note that the carry out of the previous stage is wired into
    the carry input of the next higher stage. In a computer,
    four bits are added to four bits and this produces four bits of sum.
    The last carry bit, c(3) here, is usually called 'cout' and is
    not called a 'sum' bit.
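    The same ripple-carry structure can be sketched in C (fadd and add4
    are illustrative names; fadd mirrors the VHDL equations). Each stage's
    carry out becomes the next stage's carry in, exactly as in the
    port map lines a0 to a3.

```c
/* One full adder stage, same equations as the VHDL fadd. */
static void fadd(int a, int b, int c, int *s, int *co)
{
    *s  = a ^ b ^ c;
    *co = (a & b) | (a & c) | (b & c);
}

/* Add the low 4 bits of a and b plus cin; bit i is (x >> i) & 1.
   Returns the 4-bit sum and stores the last carry, c(3), in *cout. */
unsigned add4(unsigned a, unsigned b, int cin, int *cout)
{
    unsigned sum = 0;
    int carry = cin;
    for (int i = 0; i < 4; i++) {            /* stages a0, a1, a2, a3 */
        int s, co;
        fadd((a >> i) & 1, (b >> i) & 1, carry, &s, &co);
        sum |= (unsigned)s << i;             /* s(i) */
        carry = co;                          /* c(i) feeds stage i+1 */
    }
    *cout = carry;
    return sum;
}
```

    For the simulated inputs, add4(0x1, 0xF, 0, &cout) returns 0 with
    cout = 1, and 1011 + 1001 gives 0100 with cout = 1, matching the
    binary addition example.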
    
    
    The circuit was simulated with
    a(3)=0, a(2)=0, a(1)=0, a(0)=1   cin=0
    b(3)=1, b(2)=1, b(1)=1, b(0)=1
    
    There is a small delay time from the input to the output.
    When a circuit is simulated, the initial values of signals
    are shown as 'U' for uninitialized. As the circuit simulation
    proceeds, the 'U' are computed and become '0' or '1'.
    Partial output from the VHDL simulation shows this propagation.
    (the upper line is logic '1', the lower line is logic '0')
    
    s(0)  UU_____________________________
                                         
    s(1)  UUUUUU_________________________
                                         
    s(2)  UUUUUUUUUU_____________________
                                         
    s(3)  UUUUUUUUUUUUUU_________________
             ____________________________
    c(0)  UU                             
                 ________________________
    c(1)  UUUUUU                         
                     ____________________
    c(2)  UUUUUUUUUU                     
                         ________________
    c(3)  UUUUUUUUUUUUUU                 
    
    At the end of the simulation the values are:
    s(0)=0, s(1)=0, s(2)=0, s(3)=0, c(0)=1, c(1)=1, c(2)=1, c(3)=1
     
    
    The full VHDL code is   add_trace.vhdl
    
    The run file is         add_trace.run
    
    The full output file is add_trace.out
    
    A fragment of the Makefile is Makefile.add_trace
    
    
    Given that the computer can "add," it now has to be able to "subtract."
    Thus, a representation has to be chosen for negative numbers.
    Computers conventionally use the leftmost bit (also called the
    high-order bit) as the sign bit. The convention is that a '1'
    in the sign bit means negative, a '0' in the sign bit means positive.
    Within these conventions, three representations have been used
    in computers: two's complement, one's complement and sign magnitude.
    All bits are shown for 4-bit words in the table below.
    
     decimal   twos complement  ones complement  sign magnitude
           0      0000            0000             0000
           1      0001            0001             0001
           2      0010            0010             0010
           3      0011            0011             0011
           4      0100            0100             0100
           5      0101            0101             0101
           6      0110            0110             0110
           7      0111            0111             0111
          -8      1000             -                -
          -7      1001            1000             1111
          -6      1010            1001             1110
          -5      1011            1010             1101
          -4      1100            1011             1100
          -3      1101            1100             1011
          -2      1110            1101             1010
          -1      1111            1110             1001
          -0       -              1111             1000
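    The three columns of the table can be regenerated from their
    definitions. In this Python sketch (the function names are
    hypothetical), two's complement keeps the low 4 bits of the value,
    one's complement inverts every bit of the magnitude, and sign magnitude
    puts the sign bit in front of the magnitude. A Python integer has no
    separate -0, so the -0 row (1111 and 1000) is not produced;
    unrepresentable entries print as '-'.

```python
def twos(v, n=4):
    """Two's complement pattern of v, or None if it does not fit in n bits."""
    if not -(1 << (n - 1)) <= v < (1 << (n - 1)):
        return None
    return format(v & ((1 << n) - 1), f"0{n}b")        # keep the low n bits

def ones(v, n=4):
    """One's complement: a negative value is the inverse of its magnitude."""
    if abs(v) >= 1 << (n - 1):
        return None
    pattern = v if v >= 0 else ~(-v) & ((1 << n) - 1)  # invert all n bits
    return format(pattern, f"0{n}b")

def sign_mag(v, n=4):
    """Sign magnitude: a sign bit followed by the magnitude."""
    if abs(v) >= 1 << (n - 1):
        return None
    sign = 1 << (n - 1) if v < 0 else 0
    return format(sign | abs(v), f"0{n}b")

for v in list(range(8)) + list(range(-8, 0)):
    print(f"{v:4d}   {twos(v) or '-'}   {ones(v) or '-'}   {sign_mag(v) or '-'}")
```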
    
    We could build a subtractor that uses a borrow, yet this would
    require as many gates as were needed for the adder. By choosing the
    two's complement representation of negative numbers, an adder plus an
    inverter and a multiplexor, both with a relatively low gate count,
    can become a subtractor. The implementation follows the definition
    of a negative number in two's complement representation: invert the
    bits and add one. The inverter and multiplexor supply the inverted
    bits, and the "add one" comes from driving the adder's carry input
    with '1'.
    
    
    Given a new symbol for an adder, the complete circuit for
    doing 4-bit add and subtract becomes:
    
    
    
    When the signal "subtract" is '1' the circuit subtracts 'b' from 'a'.
    When the signal "subtract" is '0' the circuit adds 'a' to 'b'.
    
    The basic circuit is written for VHDL as:
    
      a4: entity work.add4 port map(a, b_mux, subtract, sum, cout);
      i4: b_bar <= not b;
      m4: entity work.mux4 port map(b, b_bar, subtract, b_mux);
    
    The general rule is that each circuit component symbol on
    a schematic diagram will become one VHDL statement.
    There are many other VHDL statements needed to run a complete
    simulation.
    
    The annotated output of the simulation is:
    
    subtract=0, a=0100, b=0010, sum=0110  4+2=6
    subtract=1, a=0100, b=0010, sum=0010  4-2=2
    subtract=0, a=1100, b=0010, sum=1110  (-4)+2=(-2)
    subtract=1, a=1100, b=0010, sum=1010  (-4)-2=(-6)
    subtract=0, a=1100, b=1110, sum=1010  (-4)+(-2)=(-6)
    subtract=1, a=1100, b=1110, sum=1110  (-4)-(-2)=(-2)
    subtract=0, a=0011, b=1110, sum=0001, 3+(-2)=1
    subtract=1, a=0011, b=1110, sum=0101, 3-(-2)=5
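    A behavioral Python sketch of the add/subtract circuit, assuming 4-bit
    words held in plain integers and assuming the third port of add4 is
    its carry input. The names add4, b_bar, b_mux and subtract follow the
    VHDL fragment, but add4 here is only integer addition with a carry out,
    not the gate-level entity in sub4.vhdl. The same subtract signal drives
    the multiplexor select and the carry input, which realizes "invert and
    add one". Running it prints the same eight sums as the annotated output.

```python
MASK = 0b1111  # 4-bit words

def add4(a, b, cin):
    """4-bit adder stand-in: returns (sum, cout)."""
    total = a + b + cin
    return total & MASK, total >> 4

def addsub4(a, b, subtract):
    """a + b when subtract = 0, a - b when subtract = 1."""
    b_bar = ~b & MASK                       # i4: the inverter
    b_mux = b_bar if subtract else b        # m4: the multiplexor
    return add4(a, b_mux, subtract)         # a4: subtract is also the carry in

cases = [(0, 0b0100, 0b0010), (1, 0b0100, 0b0010),
         (0, 0b1100, 0b0010), (1, 0b1100, 0b0010),
         (0, 0b1100, 0b1110), (1, 0b1100, 0b1110),
         (0, 0b0011, 0b1110), (1, 0b0011, 0b1110)]
for sub, a, b in cases:
    s, cout = addsub4(a, b, sub)
    print(f"subtract={sub}, a={a:04b}, b={b:04b}, sum={s:04b}")
```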
    
    
    The full VHDL code is   sub4.vhdl
    
    The run file is         sub4.run
    
    The full output file is sub4.out
    
    A fragment of the Makefile is Makefile.sub4
    
    

    Lecture 20 Multiply and divide

    Multiplication and division are taught in elementary school, yet
    they are still being worked on for computer applications.
    
    The earliest computers just provided add and subtract with
    conditional branch, leaving the programmer to write multiply
    and divide subroutines.
    
    Early computers used bit-serial methods that required about
    N squared clock times for multiplying or dividing N-bit numbers.
    
    With a parallel adder, the time for multiply was reduced to
    N/2 clock times (Booth algorithm) and division N clock times.
    
    Today's computers use parallel combinational circuits for
    multiply and divide. These circuits still take too long for
    signals to propagate in one clock time. The combinational
    circuits are therefore "pipelined", so that a new multiply or
    divide can be completed every clock time.
    
    Consider multiplying unsigned numbers  1010 * 1100  (10 times 12)
    Using a hand method would produce:
          1010
        * 1100
     ---------
          0000  <- think of the multiplier bit being "anded" with
         0000      the multiplicand. A 1-bit "and" in digital logic
        1010       is like a 1-bit "multiply". 
       1010
     ---------
      01111000  4-bits times 4-bits produces an 8-bit product
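    The hand method can be written out directly: each multiplier bit is
    "anded" with each multiplicand bit to form a partial product, and the
    shifted partial products are summed. The Python sketch below (the
    function name is hypothetical) models the arithmetic only, not the
    madd cell array or its timing.

```python
def multiply4(multiplicand, multiplier, n=4):
    """Unsigned n-bit by n-bit multiply by the hand method; 2n-bit result."""
    product = 0
    for i in range(n):                          # one row per multiplier bit
        m_bit = (multiplier >> i) & 1
        partial = 0
        for j in range(n):
            d_bit = (multiplicand >> j) & 1
            partial |= (d_bit & m_bit) << j     # 1-bit "and" = 1-bit multiply
        product += partial << i                 # shift the row left i places
    return product

print(format(multiply4(0b1010, 0b1100), "08b"))   # 10 * 12
```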
    
    When adding by hand, we can add the four bits in a middle column and
    produce a sum bit and possibly carries. In hardware the number
    of input bits is fixed. From the previous lecture, we could use
    four 4-bit adders with additional "and" gates to do the multiply.
    A better design puts the "and" gate that does a 1-bit multiply
    inside the previous lecture's full adder. With this single building
    block, which is easy to replicate many times, we get the following
    parallel multiplier design.
     
      The 4-bit by 4-bit multiply to produce an 8-bit unsigned product is
    
      
      
    
      The component  madd  circuit is
    
       
    
       The VHDL source code is pmul4.vhdl
    
       The VHDL test driver is pmul4_test.vhdl
    
       The VHDL output is pmul4_test.out
    
       The Cadence run file is pmul4_test.run
    
       The partial Makefile is Makefile.pmul4_test
    
    
      Notice that the only component used to build the multiplier
      is "madd" and some uses of "madd" have constants as inputs.
      It is technology dependent whether the same circuit is used
      or specialized, minimized, circuits are substituted.
    
    Division is performed by using subtraction. A sample unsigned binary
    division of an 8-bit dividend by a 4-bit divisor that produces
    a 4-bit quotient and 4-bit remainder is:
    
                    1010  <- quotient
              /---------
         1100/  01111011  <- dividend
                -1100
                -----
                  0110
                 -0000
                 ------
                   1101
                  -1100
                 ------
                    0011  
                   -0000
                   -----
                    0011 <- remainder
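    The hand division above is the restoring method: bring down one
    dividend bit, try subtracting the divisor, and keep the difference only
    if it is not negative. The Python sketch below (hypothetical function
    name) follows those steps for an 8-bit dividend and a 4-bit divisor.
    Raising an exception on overflow is an assumption of the sketch; in
    hardware, overflow is typically handled by separate logic.

```python
def restoring_divide(dividend, divisor, n=4):
    """Divide a 2n-bit unsigned dividend by an n-bit divisor.

    Returns (quotient, remainder), each n bits.
    """
    high = dividend >> n                 # upper n bits of the dividend
    if divisor == 0 or high >= divisor:  # quotient would not fit in n bits
        raise OverflowError("divide overflow")
    rem, quotient = high, 0
    for i in range(n - 1, -1, -1):
        rem = 2 * rem + ((dividend >> i) & 1)   # bring down the next bit
        trial = rem - divisor                   # trial subtraction
        if trial >= 0:
            rem, quotient = trial, 2 * quotient + 1
        else:
            quotient = 2 * quotient             # restore: keep rem as it was
    return quotient, rem

q, r = restoring_divide(0b01111011, 0b1100)
print(f"quotient={q:04b} remainder={r:04b}")
```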
     
    With a parallel adder and a double length register, serial division
    can be performed. Conventional division requires a trial subtraction
    and possibly a restore of the partial remainder. A non-restoring
    serial division requires N clock times for an N-bit divisor.
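    A non-restoring version avoids the restore: when the partial remainder
    goes negative, the next step adds the divisor instead of subtracting
    it, so every step is a single add or subtract. Below is a Python sketch
    for an 8-bit dividend and 4-bit divisor (the function name and the
    exception on overflow are assumptions); only the final remainder may
    need one correction.

```python
def nonrestoring_divide(dividend, divisor, n=4):
    """Non-restoring division of a 2n-bit dividend by an n-bit divisor."""
    high = dividend >> n
    if divisor == 0 or high >= divisor:
        raise OverflowError("divide overflow")
    rem, quotient = high, 0
    for i in range(n - 1, -1, -1):
        bit = (dividend >> i) & 1
        if rem >= 0:
            rem = 2 * rem + bit - divisor    # subtract while non-negative
        else:
            rem = 2 * rem + bit + divisor    # negative: add instead of restoring
        quotient = 2 * quotient + (1 if rem >= 0 else 0)
    if rem < 0:
        rem += divisor                       # final remainder correction
    return quotient, rem

print(nonrestoring_divide(0b01111011, 0b1100))   # 123 / 12
```

    A negative partial remainder R - D, doubled and corrected by adding D,
    gives 2R - D, exactly what the restoring method would subtract next,
    so both methods produce the same quotient bits.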
    
      The schematic for a parallel 8-bit dividend divided by a 4-bit divisor
      to produce a 4-bit quotient and 4-bit remainder is:
    
      
    
    
    Notice that the building block is similar to the 'madd' component
    in the parallel multiplier. The 'cas' component is the same full
    adder with an additional xor gate.
    
       The VHDL test driver is divcas4_test.vhdl
    
       The VHDL output is divcas4_test.out
    
       The Cadence run file is divcas4_test.run
    
       The partial Makefile is Makefile.divcas4_test
    
    Division can create an overflow condition, when the quotient does
    not fit in 4 bits. This is typically handled by separate logic in
    order to keep the main circuit neat. There is a one bit preshift of
    the dividend in the manual, serial and parallel division. Thus, no
    dividend bit number seven appears on the parallel schematic.
    
    

    Lecture 21 Karnaugh maps, Quine McClusky

    A Karnaugh map, K-map, is a visual representation of a Boolean function.
    The plan is to recognize patterns in the visual representation and
    thus find a minimized circuit for the Boolean function.
    
    There is a specific labeling for a Karnaugh map for each number
    of circuit input variables. A Karnaugh map consists of squares where
    each square represents a minterm. Notice that only one variable
    changes between any two horizontally or vertically adjacent squares.
    Remember that a minterm is the input pattern where there is a '1' in
    the output of a truth table.
    
    After the map is drawn and labeled, a '1' is placed in each square
    corresponding to a minterm of the function. Later an 'X' will be
    allowed for "don't care" minterms. By convention, no zeros are
    written into the map.
    
    With the map filled in, visual skills and intuition are used to
    find the minimum number of rectangles that enclose all the ones.
    The rectangles must have sides that are powers of two. No
    rectangle is allowed to contain a blank square. The map is a toroid:
    the top row is logically adjacent to the bottom row and the right
    column is logically adjacent to the left column. Thus rectangles
    may wrap around the edges of the two dimensional map.
    
    The resulting minimized boolean function is written as a sum of
    products. Each rectangle represents a product ("and" gate), and
    the products are summed ("or" gate) to produce the result. If a
    rectangle contains squares where a variable is 1 and squares where
    it is 0, that variable does not appear in the product term; omit
    it as an input to the "and" gate.
    
    
       Basic labeling    Minterm numbers     Minterms 
    
            B=0 B=1           B=0 B=1           B=0 B=1
           +---+---+         +---+---+         +---+---+
       A=0 |   |   |     A=0 |m0 |m1 |     A=0 |__ |_  |
           +---+---+         +---+---+         |AB |AB |
       A=1 |   |   |     A=1 |m2 |m3 |         +---+---+
           +---+---+         +---+---+     A=1 | _ |   |
                                               |AB |AB |
                                               +---+---+
    
     Truth table        Karnaugh map    Covering with rectangles
    
       A B | F             B=0 B=1            B=0   B=1
       ----+--            +---+---+         +-----+-----+
       0 0 | 0        A=0 |   | 1 |         |     |+---+|
       0 1 | 1  m1        +---+---+     A=0 |     || 1 ||
       1 0 | 1  m2    A=1 | 1 |   |         |     |+---+|
       1 1 | 0            +---+---+         +-----+-----+
                                            |+---+|     |
                                        A=1 || 1 ||     |
                                            |+---+|     |
                                            +-----+-----+
                                _     _
       Minimized function   F = AB + AB
    
       Note: For each covering rectangle, there will be exactly one
       product term in the final equation for the function.
       Find the variable(s) that are both 1 and 0 in the rectangle.
       Such variables will not appear in the product term. Take any
       minterm from the covering rectangle, replace 1 with the variable,
       replace 0 with the complement of the variable. Cross out the
       variables that do not appear. The result is exactly one product
       term needed by the final equation of the function.
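    The rule in the note can be applied mechanically. In the Python sketch
    below (a hypothetical helper, not part of the course software), a
    rectangle is given as the list of minterm bit patterns it covers.
    Because plain text cannot draw an overbar, a complemented variable is
    written here with a trailing apostrophe, so A'B is the product drawn
    above with a bar over A.

```python
def product_term(rectangle, names):
    """Product term for one covering rectangle.

    rectangle: minterm bit strings inside the rectangle, e.g. ["01"].
    names:     variable names in the same bit order, e.g. "AB".
    """
    term = []
    for k, name in enumerate(names):
        column = {m[k] for m in rectangle}
        if column == {"1"}:
            term.append(name)            # 1 everywhere: the variable
        elif column == {"0"}:
            term.append(name + "'")      # 0 everywhere: its complement
        # both 1 and 0 in the rectangle: the variable is crossed out
    return "".join(term) or "1"          # whole-map rectangle is constant 1

print(" + ".join(product_term(r, "AB") for r in (["01"], ["10"])))
```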
    
    
    
    
    
    It is possible to have minterms that are don't care. For these
    minterms, place an "X" or "-" in the Karnaugh map rather than
    a one. The covering rules extend as follows:
    Covering rectangles may include any don't care squares.
    Covering rectangles do not have to include don't care squares.
    No rectangle can enclose only don't care squares.
    
    

    Quine McClusky minimization

    A tabular algorithm for producing the minimum two level sum of products
    is known as the Quine McClusky method.
    
    You may download and build the software that performs this minimization.
    qm.tgz or link to a Linux executable
    ln -s /afs/umbc.edu/users/s/q/squire/pub/linux/qm qm
    
    The man page, qm.1 , is in the same directory.
    
    The algorithm may be performed manually using the following steps:
    1) Have available the minterms of the function to be minimized.
       There may be X's for don't care cases.
    
    2) Create groups of minterms, starting with the minterms with the
       fewest number of ones.
       All minterms in a group must have the same number of ones and
       if any X's, the X's must be in the same position. There may be
       some groups with only one minterm.
    
    3) Create new minterms by combining minterms from groups that
       differ by a count of one. Two minterms are combined if they
       differ in exactly one position. Place an X in that position
       of the newly created minterm. Mark the minterms that are
       used in combining (they will be deleted at the end of this step).
       Basically, take the first minterm from the first group and compare
       it to all minterms in the next group(s), those with one additional
       '1'. Repeat for each minterm and each group in turn until the last
       group is reached.
    
    4) Delete the marked minterms.
    
    5) Repeat steps 2) 3) and 4) until no more minterms are combined.
    
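    6) The remaining minterms are the prime implicants. The minimized
       function is their sum, deleting any X's (an X means that variable
       does not appear in the product term). When some prime implicants
       are redundant, a final selection step, often done with a prime
       implicant chart, picks a minimum subset that still covers every
       original minterm.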
    
    Example:
    1) Given the minterms
      A B C D | F
      --------+--
      0 0 0 0 | 1  m0
      0 0 1 0 | 1  m2
      1 0 0 0 | 1  m8
      1 0 1 0 | 1  m10
    
    2) Create groups
       m0  0 0 0 0   count of 1's is 0 
           -------
       m2  0 0 1 0   count of 1's is 1
       m8  1 0 0 0
           -------
       m10 1 0 1 0   count of 1's is 2
    
    3) Create new minterms by combining
       Compare all in first group to all in second group
       m0 to m2  0 0 0 0
                 0 0 1 0
                 =======  they differ in one position
                 0 0 X 0  combine and put an X in that position
    
       m0 to m8  0 0 0 0
                 1 0 0 0
                 =======  they differ in one position
                 X 0 0 0  combine and put an X in that position
    
      Compare all in second group to all in third group
      m2 to m10  0 0 1 0
                 1 0 1 0
                 =======  they differ in one position
                 X 0 1 0  combine and put an X in that position
    
      m8 to m10  1 0 0 0
                 1 0 1 0
                 =======  they differ in one position
                 1 0 X 0  combine and put an X in that position
    
      no more candidates to compare.
    
    4) Delete marked minterms (those used in any combining)
       (do not keep duplicates) Thus the minterms are now:
       0 0 X 0
       X 0 0 0
       X 0 1 0
       1 0 X 0
    
    2) Repeat grouping (technically there are four groups, although
       the number of ones is either zero or one).
       0 0 X 0
       -------
       X 0 0 0
       -------
       X 0 1 0
       -------
       1 0 X 0
    
    3) Create new minterms by combining
       0 0 X 0
       1 0 X 0  any X's must be the same in both
       =======  they differ in one position
       X 0 X 0  combine and put an X in that position
    
       X 0 0 0
       X 0 1 0
       =======  they differ in one position
       X 0 X 0  combine and put an X in that position
    
    4) Delete marked minterms (those used in any combining)
       (do not keep duplicates) Thus the minterms are now:
       X 0 X 0
    
    5) No more combining is possible.
    
    6) The minimized function is the remaining minterms, deleting any
       X's. All remaining minterms are prime implicants.
    
       A B C D            __
       X 0 X 0   thus F = BD
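    The combining steps of the example can be automated. The Python sketch
    below (hypothetical function names) compares every pair of terms rather
    than only neighboring groups; this gives the same result, because two
    terms that differ in exactly one position always differ by one in their
    count of ones. It stops at the prime implicants and does not perform a
    final selection of a minimum cover.

```python
from itertools import combinations

def combine(t1, t2):
    """Merge two terms that differ in exactly one 0/1 position, else None."""
    merged, differences = [], 0
    for x, y in zip(t1, t2):
        if x == y:
            merged.append(x)
        elif "X" in (x, y):
            return None                  # X's must be in the same positions
        else:
            merged.append("X")           # the one position that differs
            differences += 1
    return "".join(merged) if differences == 1 else None

def prime_implicants(minterms):
    """Repeat the combine and delete steps until nothing more combines."""
    terms, primes = set(minterms), set()
    while terms:
        combined, used = set(), set()
        for t1, t2 in combinations(sorted(terms), 2):
            merged = combine(t1, t2)
            if merged is not None:
                combined.add(merged)     # a set keeps no duplicates
                used.update((t1, t2))    # mark both terms as used
        primes |= terms - used           # unmarked terms are prime implicants
        terms = combined
    return primes

print(prime_implicants(["0000", "0010", "1000", "1010"]))
```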
    
    In essence, the Quine McClusky algorithm is doing the same
    operations as the Karnaugh map. The difference is that no guessing
    is used in the Quine McClusky algorithm and "qm" as it is called,
    can be (and has been) implemented as a computer program.
    
    A final note on labeling:
    It does not matter what names are used for variables.
    It does not matter in what order variables are used.
    It does not matter if "-" or "X" is used for don't care.
    It is important to keep a consistent relation between the bit
    positions in minterms and the order of variables.
    
    More information is at Simulators and parsers
    
    

    Lecture 22 Flip flops, latches, registers

    We now focus on sequential logic: logic with storage and state.
    The previous lectures were on combinational logic built from gates.
    
    In order to build very predictable large digital logic systems,
    synchronous design is used. A synchronous system has a special
    signal called a master clock. The clock signal continuously
    has values 0101010101010 ... . This is usually just a square
    wave generator at some frequency. A clock with frequency 1 GHz
    has a period of 1 ns. Half of the period the clock is a logical 1
    and the other half of the clock period the clock is a logical 0. 
    
               ___     ___     ___
     clk   ___|   |___|   |___|
    
          |< 1 ns>|
    
       The VHDL code fragment to generate the  clk  signal is:
            signal clk : std_logic := '0';
          begin
            clk <= not clk after 500 ps;
    
    
    A synchronous system is designed with registers that input a
    value on a rising clock edge and hold it until the next rising
    clock edge. The designer must know the timing of the
    combinational logic because the signals must propagate through
    the combinational logic in less than one clock period.
    
    Combinational logic cannot have loops or feedback.
    Sequential logic is specifically designed to allow loops and
    feedback. The design rule is that any loop or feedback must
    include a storage element (register) that is clocked.
    
          +------------------------------------+
          |                                    |
          |  +---------------+   +----------+  |
          +->| combinational |-->| register |--+
             | logic         |   |          |
             +---------------+   +----------+
                                      ^
                                      | clock signal
    
    
    A register may be many bits and each bit is built from a flip flop.
    A flip flop is ideally either in a '1' state or a '0' state.
    The most primitive flip flop is called a latch. A latch can be made
    from two cross coupled nand gates. The latch is not easy to work
    with in large circuits, thus JK flip flops and D flip flops are
    typically used. In modern large scale integrated circuits, the 
    flip flops and thus the registers are designed at the device level.
    
    A classical model of a JK flip flop is
    
    
    
    On the rising edge of the clock signal,
       if J='1' and K='0', the Q output is set to '1'
       if J='0' and K='1', the Q output is set to '0'
       if both J and K are '1', the Q signal is inverted
       if both J and K are '0', Q keeps its value.
    
    Note that Q_BAR is the complement of Q in the steady state.
    There is a transient time when both could be '1' or both could be '0'.
    The SET signal is normally '1' yet can be set to '0' for a short
    time in order to force Q='1' (set the flip flop). 
    The RESET signal is normally '1' yet can be set to '0' for a short
    time in order to force Q='0' (reset the flip flop or register to zero).
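    The JK behavior can be summarized as a next-state function. This
    Python sketch (the name jk_next is hypothetical) ignores the
    asynchronous SET and RESET inputs and the transient when Q and Q_BAR
    briefly agree; it only gives the value of Q after a rising clock edge.

```python
def jk_next(q, j, k):
    """Value of Q after a rising clock edge (SET and RESET inactive)."""
    if j and k:
        return 1 - q      # both '1': invert
    if j:
        return 1          # J='1', K='0': set
    if k:
        return 0          # J='0', K='1': reset
    return q              # both '0': hold

for j in (0, 1):
    for k in (0, 1):
        print(f"J={j} K={k}:  Q=0 -> {jk_next(0, j, k)}   Q=1 -> {jk_next(1, j, k)}")
```

    The same table is often written as the characteristic equation
    Q_next = (J and not Q) or (not K and Q).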
    
    A slow counter, called a ripple counter, can be made from JK flip
    flops using the following circuit:
    
    
    
    The VHDL source code for the entity JKFF, the JK flip flop,
    and the four bit ripple counter is jkff_cntr.vhdl
    
    The Cadence run file is jkff_cntr.run
    
    The Cadence output file is jkff_cntr.out
    
    ncsim: 04.10-s017: (c) Copyright 1995-2003 Cadence Design Systems, Inc.
    ncsim> run 340 ns
    q3, q2, q1, q0  q3_ q2_ q1_ q0_ clk
    0   0   0   0   1   1   1   1   1  at  10 NS
    0   0   0   1   1   1   1   0   1  at  30 NS
    0   0   1   0   1   1   0   1   1  at  50 NS
    0   0   1   1   1   1   0   0   1  at  70 NS
    0   1   0   0   1   0   1   1   1  at  90 NS
    0   1   0   1   1   0   1   0   1  at  110 NS
    0   1   1   0   1   0   0   1   1  at  130 NS
    0   1   1   1   1   0   0   0   1  at  150 NS
    1   0   0   0   0   1   1   1   1  at  170 NS
    1   0   0   1   0   1   1   0   1  at  190 NS
    1   0   1   0   0   1   0   1   1  at  210 NS
    1   0   1   1   0   1   0   0   1  at  230 NS
    1   1   0   0   0   0   1   1   1  at  250 NS
    1   1   0   1   0   0   1   0   1  at  270 NS
    1   1   1   0   0   0   0   1   1  at  290 NS
    1   1   1   1   0   0   0   0   1  at  310 NS
    0   0   0   0   1   1   1   1   1  at  330 NS
          ________________________________________________________________
    reset                                                                 
           _   _   _   _   _   _   _   _   _   _   _   _   _   _   _   _  
    clk   | |_| |_| |_| |_| |_| |_| |_| |_| |_| |_| |_| |_| |_| |_| |_| |_
              ___     ___     ___     ___     ___     ___     ___     ___ 
    q0    ___|   |___|   |___|   |___|   |___|   |___|   |___|   |___|   |
                  _______         _______         _______         _______ 
    q1    _______|       |_______|       |_______|       |_______|       |
                          _______________                 _______________ 
    q2    _______________|               |_______________|               |
                                          _______________________________ 
    q3    _______________________________|                               |
    
    Ran until 340 NS + 0
    ncsim> exit
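    The counting sequence in the output can be reproduced with a zero-delay
    Python model. It assumes, as the printed values suggest, that every
    stage has J = K = '1' and that stage i changes only when stage i-1
    falls from 1 to 0; the actual wiring is in jkff_cntr.vhdl and may differ
    in detail (for example, which clock edge or Q_BAR output is used).
    Because the model has no delays, it does not show the ripple time
    between stages.

```python
def jk_next(q, j, k):
    """JK flip flop next state: hold, reset, set or toggle."""
    return (j & (1 - q)) | ((1 - k) & q)

def ripple_count(steps, n=4):
    """Return the q3..q0 patterns of an n-bit ripple counter, one per clock."""
    q = [0] * n                          # q[0] is the least significant bit
    patterns = []
    for _ in range(steps):
        patterns.append("".join(str(b) for b in reversed(q)))
        i, clocked = 0, True             # stage 0 is clocked by clk
        while clocked and i < n:
            old = q[i]
            q[i] = jk_next(q[i], 1, 1)   # J = K = '1': toggle
            clocked = old == 1 and q[i] == 0   # a 1 -> 0 change clocks the next stage
            i += 1
    return patterns

for p in ripple_count(17):
    print(" ".join(p))
```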
    
    In many designs, only one input is needed and the resulting flip flop
    is a D flip flop. A D flip flop needs 6 nand gates rather than the
    9 nand gates needed by the JK flip flop. There is a proportional
    reduction in devices when the flip flop is designed from basic
    transistors.
    
    
    
    The VHDL source code for the entity DFF, the D flip flop,
    and the four bit counter is dff_cntr.vhdl
    
    The Cadence run file is dff_cntr.run
    
    The Cadence output file is dff_cntr.out
    
    
    The VHDL source code for the D flip flops in the text book
    is dff.vhdl
    Entity dff1 is the five nand model from page 248, Fig. 6-8.
    Entity dff2 is the six nand model from page 300, Fig. 6-36.
    
    The Cadence run file is dff.run
    
    The Cadence output file is dff.out
    
    The D flip flop ripple counter from the book
    
    
    
    

    Lecture 23 Sequential Logic

    Sequential logic can be represented in three equivalent forms:
    State Transition Table, State Transition Diagram and logic circuit.
    Each state is given a name; we use A, B, C for this discussion, yet
    meaningful names are better. A machine or sequential logic circuit
    can be in only one state at a time. We are assuming synchronous
    logic where all flip flops are clocked by the same clock signal.
    The input signal is assumed to be available just before each clock
    transition. Optionally, the arrival of an input can also cause the
    clock to have one pulse.
    
    Two possible forms of State Transition Table are:
    
              Input       state    state
             | 0 | 1          Input
           --+---+---        A  0  B
    state  A | B | C         A  1  C
           B | C | B         B  0  C
           C | C | A         B  1  B
                             C  0  C
                             C  1  A
    
    The meaning of both tables is:
      When in state A with input 0 transition to state B
      When in state A with input 1 transition to state C
      When in state B with input 0 transition to state C
      When in state B with input 1 stay in state B
      etc.
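    A state transition table maps directly onto a lookup table in
    software. This Python sketch (names hypothetical) stores the table as
    a dictionary keyed by (state, input) and applies one input per clock,
    starting in state A, which is the start condition assumed here.

```python
# State transition table: (present state, input) -> next state.
NEXT = {
    ("A", 0): "B", ("A", 1): "C",
    ("B", 0): "C", ("B", 1): "B",
    ("C", 0): "C", ("C", 1): "A",
}

def run_fsm(inputs, state="A"):
    """Apply one input per clock and return the list of states visited."""
    visited = [state]
    for bit in inputs:
        state = NEXT[(state, bit)]
        visited.append(state)
    return visited

print(run_fsm([0, 1, 0, 0, 1]))   # A -> B -> B -> C -> C -> A
```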
    
    The exact same information can be presented as a
    State Transition Diagram.
    
    
    
    The meaning is the same:
      When in state A with input 0 transition to state B
      When in state A with input 1 transition to state C
      When in state B with input 0 transition to state C
      When in state B with input 1 stay in state B
      etc.
    
    To convert either a State Transition Table or Diagram to
    a circuit, assign a D flip flop to each state. The "q" output
    of the flip flop is assigned the signal name of the state.
    The "d" input of the flip flop is assigned a signal name
    of the state concatenated with "in".
    
    Write the combinational logic equations for each state
    input from observing the "to" state in the transition table
    or diagram.
    
    For this sequential machine, using I as the input
    
     Ain <= (C and I);        -- C transitions to A when I='1'
    
     Bin <= (A and not I) or  -- A transitions to B when I='0'
            (B and I);        -- B transitions to B when I='1'
    
     Cin <= (A and I)     or  -- A transitions to C when I='1'
            (B and not I) or  -- B transitions to C when I='0'
            (C and not I);    -- C transitions to C when I='0'
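    The three equations can be checked against the table for every state
    and input. In this Python sketch (names hypothetical), exactly one of
    A, B, C is 1 at a time, as with one D flip flop per state, and the
    signals are 0/1 integers with 1 - I standing for "not I".

```python
from itertools import product

def next_one_hot(A, B, C, I):
    """The three flip flop input equations, one D flip flop per state."""
    Ain = C & I
    Bin = (A & (1 - I)) | (B & I)
    Cin = (A & I) | (B & (1 - I)) | (C & (1 - I))
    return Ain, Bin, Cin

# Check the equations against the transition table for every state and input.
TABLE = {("A", 0): "B", ("A", 1): "C", ("B", 0): "C",
         ("B", 1): "B", ("C", 0): "C", ("C", 1): "A"}
ONE_HOT = {"A": (1, 0, 0), "B": (0, 1, 0), "C": (0, 0, 1)}
for state, I in product("ABC", (0, 1)):
    assert next_one_hot(*ONE_HOT[state], I) == ONE_HOT[TABLE[(state, I)]]
print("equations agree with the table")
```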
    
    The partial circuit is shown below.
    Implied is a set signal to A and reset signals to
    B and C for the initial or start condition.
    Implied is a common clock signal to all flip flops.
    
    
    
    Not shown are the output(s), which may be any combinational
    function of the input and the states,
    e.g. out <= (A and I) or (B and not I);
    
    There is an algorithm and corresponding computer program for
    minimizing the State Transition Table,
    see Myhill Nerode minimization.
    
    There is an algorithm and corresponding computer program for
    minimizing the combinational logic,
    see Quine McClusky minimization.
    
    One application of sequential logic is for garage door openers
    or car door locks. The basic sequential logic is a spin lock.
    This circuit has the property of eventually detecting the
    specific sequence it is designed to accept. The transmitter may
    start anywhere in the sequence and continue to repeat the sequence
    until the receiver detects the specific sequence.
    
    
    
    This "spin lock" is designed to accept the sequence  101101.
    A transmitter could be designed to send the specific sequence
    followed by an equal number of zero bits then repeat the
    specific sequence. (Assuming the first bit of the sequence is a '1')
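    The spin lock's state diagram is not reproduced here, so the Python
    sketch below uses one common way to build such a detector, assumed
    rather than taken from the lecture: the state is the number of bits of
    101101 matched so far (0 to 6). On each new bit the next state is the
    longest prefix of 101101 that ends the bits received, the same idea as
    the Knuth-Morris-Pratt failure function, so the detector locks on no
    matter where the transmitter starts. Names are hypothetical.

```python
PATTERN = "101101"

def next_state(matched, bit):
    """Pattern bits matched after receiving one more bit ("0" or "1")."""
    seen = PATTERN[:matched] + bit
    for k in range(min(len(seen), len(PATTERN)), -1, -1):
        if seen.endswith(PATTERN[:k]):   # k = 0 always matches
            return k

def detect(stream):
    """Return the indices in stream at which the full pattern has arrived."""
    state, hits = 0, []
    for i, bit in enumerate(stream):
        state = next_state(state, bit)
        if state == len(PATTERN):
            hits.append(i)
    return hits

# Transmitter starts mid-sequence (tail "1101"), sends zeros, then repeats.
print(detect("1101" + "000000" + "101101"))
```

    Because the state only depends on the last few bits, a partial
    sequence at the start does no harm; the detector accepts once the
    complete sequence has been received.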
    
    More sophisticated spin locks will change the sequence that is
    detected each time a sequence is accepted. The transmitter must then
    send a family of sequences because, in general, the transmitter
    will not know what the receiver's sequence setting is. A method
    of handling this unknown is to have the receiver change to a
    pseudo random setting of some of the bit positions. The
    transmitter then generates and transmits all of the pseudo random
    patterns in the correct bit positions. Sample pseudo random
    sequence generators are shown below.
    
    
    A maximal length pseudo random sequence generator can generate
    2^n -1 unique patterns with an n-bit shift register. For each
    number of shift register stages there are one or more feedback
    circuits using just exclusive-or to compute the next input bit.
    
    The output may be n-bit patterns, a2, a1, a0 in the circuit below.
    The output may also be a bit stream taken from just a0 or a2.
    
    The basic shift register is clocked at some frequency. Bits shift
    left to right one position per clock. The new top bit is the
    output of the exclusive-or feedback gate(s).
    
    A sequence, starting with the "seed" 0 0 1 is shown below:
    
    
    
                   0           0           1
                   1           0           0
                   0           1           0
                   1           0           1
                   1           1           0
                   1           1           1
                   0           1           1
                   0           0           1
    
    Notice the 2^3 -1 = 7 unique patterns and then the repeat.
    
    A maximal length pseudo random shift register for 5-bit patterns is
    shown in typical abbreviated schematic form.
    
    
    
    With a seed of  0 0 0 0 1 the next few values are
                    1 0 0 0 0
                    1 1 0 0 0
                    1 1 1 0 0
                    0 1 1 1 0
    
    The full output sequence with bits reversed
    
    Maximal length pseudo random sequences may be generated for
    any length. Below, in shorthand notation, are the feedback paths
    for many lengths up to 32.
    
    /* length  bits(high order first) of h[], right is h[0]
     *  2      1 1 1          top bit always one, the input to msb stage
     *  3      1 0 1 1        bottom bit always one, output of lsb stage
     *  4      1 0 0 1 1        x(4)= x^1+x^0             x^0=1 initially
     *  5      1 1 0 1 1 1       x(5)=x^4+x^2+x^1+x^0     x^0=1 initially
     *  6      1 0 0 0 0 1 1                            + is exclusive or
     *  7      1 1 1 1 0 1 1 1
     *  8      1 1 1 1 0 0 1 1 1
     *  9      1 1 1 0 0 0 1 1 1 1
     * 10      1 1 0 0 1 1 1 1 1 1 1
     * 11      1 1 0 1 1 0 0 1 1 1 1 1
     * 12      1 1 0 0 0 1 0 0 1 0 1 1 1 
     * 13      1 1 0 0 0 1 1 1 1 1 1 1 1 1
     * 14      1 1 0 1 0 0 0 1 1 1 1 1 1 1 1
     * 15      1 1 0 1 0 0 0 1 1 0 1 1 1 1 1 1
     * 16      1 1 0 1 1 0 0 0 0 0 1 1 1 1 1 1 1
     * 18      1 1 1 1 0 0 0 0 1 1 0 0 0 1 1 0 0 0 1
     * 20      1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1
     * 24      1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 1 1
     * 30      1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1
     * 31      1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1 1 1 0 1 1 1 0 1 1 1 1
     * 32      1 0 0 0 0 0 1 0 0 1 1 0 0 0 0 0 1 0 0 0 1 1 1 0 1 1 0 1 1 0 1 1 1
     */
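    One row of the table can drive a software shift register. In this
    Python sketch (function name hypothetical), the first bit of a row is
    the input to the msb stage and every other 1 at position h[i] means
    stage a[i] feeds the exclusive-or. That reading of the table is
    inferred from its comments and checked against the 3-bit and 5-bit
    sequences printed above; treat it as an assumption for the other rows.

```python
def lfsr(row, seed, steps):
    """Shift register states, msb first, driven by one row of the table.

    row:  "bits(high order first)" from the table, e.g. "1 0 1 1".
    seed: starting stage values a[n-1] ... a[0], e.g. "001".
    """
    h = [int(b) for b in row.split()][::-1]     # h[0] is the rightmost bit
    n = len(h) - 1
    state = [int(b) for b in seed]              # state[0] is a[n-1]
    states = [seed]
    for _ in range(steps):
        a = state[::-1]                         # a[i] in the table's numbering
        feedback = 0
        for i in range(n):
            if h[i]:
                feedback ^= a[i]                # + is exclusive or
        state = [feedback] + state[:-1]         # shift left to right
        states.append("".join(map(str, state)))
    return states

print(lfsr("1 0 1 1", "001", 7))         # the 3-bit sequence shown above
print(lfsr("1 1 0 1 1 1", "00001", 4))   # first values of the 5-bit case
```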
    
    

    Lecture 24 Computer organization

    Below is a schematic of a one clock per instruction computer.
    
    
    
    The operation for each instruction is:
    
      The Instruction Pointer Register contains the address of the
      next instruction to be executed.
    
      The instruction address goes into the Instruction Memory or
      Instruction Cache and the instruction comes out, shown as
      "inst" on the diagram.
    
      The Instruction Decode has all the bytes of the instruction:
    
        The instruction has bits for the operation code.
        e.g. there is a different bit pattern for add, sub, etc.
    
        Most instructions will reference one register. The register
        number has enough bits to select one of the general registers.
    
        Many instructions have a second register. (Not shown here,
        on some computers there can be three registers.) The second
        (or third) register may be the register number that receives
        the result of the operation.
    
        Many instructions have either a memory address for an operand,
        a memory offset from a register, or immediate data for use by
        the operation. This data is passed into the ALU, either for
        computing a result or for computing an address.
    
      The general registers receive two register numbers and very
      quickly output the data from those two registers.
    
      The ALU receives two data values and control from the
      Operation Code part of the instruction. The ALU computes
      the value and outputs it on the line labeled "addr".
      This line goes three places:
        to the mux feeding the Instruction Pointer, which loads it
        if the operation is a jump or a branch;
        to the Data Memory or Data Cache, if the value is a computed
        memory address;
        to the mux that may return the value to a register.
    
      The Data Memory or Data Cache receives an address and write data.
      The control signals "read" and "write" select the operation:
      on "read", the Data Memory reads the value at "addr" and sends
      it to the mux; on "write", the Data Memory writes the "write data"
      into memory at location "addr".
    
      The final mux may take a value just read from the Data Memory
      or Data Cache and return that value to a register or
      take the computed value from the ALU and return that value
      to a register.
    
      While the above signals are propagating, the Instruction Pointer
      is updated by either incrementing by the number of bytes in the
      instruction or from the jump or branch address.
    
    That completes one instruction; the clock transitions and the next
    instruction is started.
    
    The timing consideration that limits the speed of this design is
    the long propagation delay from a new Instruction Pointer value
    until the register is written. Notice that both the register and
    the Data Cache are written on clock_bar. Any real computer using
    this design must have instruction and data caches, because main
    memory (RAM) access is much slower than logic on the CPU chip.
    
    

    Lecture 25 Instruction set

    
    This lecture uses Intel documentation on the IA-32 Architecture.
    In principle this covers all Intel 80x86 machines up to and including
    the Pentium 4. The documentation is stored locally to minimize
    network traffic.
    
    First look over Appendix B. (This is a .pdf file; your browser
    should launch acroread to display it.) Look on the left for the
    table of contents and click on Appendix B.
    
    Intel IA-32 Instructions(pdf) 
    
    Note the "One Byte" opcodes. There are two tables with up to 128
    instruction operation codes in each table.
    
    Then move on to the "Two Byte" opcodes. The first opcode byte, 0FH,
    tells the CPU to look at the next byte to determine the operation
    code for this instruction.
    
    Now, move back to Appendix A and see the various formats that
    an instruction may have. Consider the choices that would have to be
    made by a programmer writing a disassembler for this architecture.
    
    Intel IA-32 Instructions(pdf) 
    
    The IA-32 is a CISC, Complex Instruction Set Computer.
    
    This is in contrast to computer architectures such as the
    Alpha, MIPS, PowerPC (Power4, Mac G5), etc. that are
    RISC, Reduced Instruction Set Computers. "Reduced" does not
    necessarily mean fewer instructions. "Reduced" means
    lower complexity and more regularity. Typically all instructions
    are the same number of bytes; four bytes (32 bits) is the most
    popular. Regular also means that all registers are general
    purpose, unlike the IA-32, which uses EAX and EDX implicitly
    for multiply and divide.
    
    

    Lecture 26 Data paths

    
    The CPU can be described by control paths and data paths.
    
    This lecture will follow a few instructions through the
    data paths indicated in the architecture schematic in
    lecture 24.
    
    We then consider instruction timing and the possible
    improvement of having higher clock speeds with a
    pipelined computer architecture.
    
    

    Lecture 27 Arithmetic Logic Unit

    
    
    
    

    Lecture 28 Architecture

    
    
    

    Lecture 29 Review

      Review previous lectures
    
      Sample questions will be presented in class.
    
    

    Lecture 30 Final Exam

    The midterm was considered the end of the Assembly Language
    part of this course. Thus, the final exam will cover 
    lectures 15 through 29 on digital logic and computer organization.
    
    There will be questions of these types:
      true-false
      multiple choice
      short answer (words, numbers, logic equations)
    
         know the symbols and truth tables for
         "and"  "nand" "or"  "nor" "not"  "xor" "mux"  "dff"
    
         know how to recognize the corresponding State Diagram,
         State Transition Table, schematic and VHDL statements
         for sequential logic (e.g. the project).
    
         know how to construct a Karnaugh map from minterms.
    
         know how to get a VHDL equation from a Karnaugh map.
    
         recognize adders, subtractors and simple logic circuits.
    
         understand data flow through a computer architecture.
    
    

    Other links


    Last updated 5/9/04