
CMSC 411 Computer Architecture Project Fall 2020

corrected

ask questions on webex, https://umbc.webex.com/meet/squire

Tuesdays and Thursdays 1:00pm to 2:15pm

additional information and questions

Cadence setup, 2020: follow the instructions exactly. It is best to use
Makefiles to save on typing.
Log into linux.gl.umbc.edu and type commands: (skip any done for HW4,HW6)
  mkdir cs411
  cd cs411
  mkdir vhdl2
  cd vhdl2
  cp  /afs/umbc.edu/users/s/q/squire/pub/download/cadence_setup.tar . 
  tar -xvf cadence_setup.tar
  cp  /afs/umbc.edu/users/s/q/squire/pub/download/Makefile_411 .
  cp  /afs/umbc.edu/users/s/q/squire/pub/download/Makefile_ghdl .
  cp  /afs/umbc.edu/users/s/q/squire/pub/download/t_table.run  .
  cp  /afs/umbc.edu/users/s/q/squire/pub/download/t_table.vhdl  .
  cp  /afs/umbc.edu/users/s/q/squire/pub/download/inverter.run  .
  cp  /afs/umbc.edu/users/s/q/squire/pub/download/inverter.vhdl  .
  cp  /afs/umbc.edu/users/s/q/squire/pub/download/inverter_test.vhdl  .
  cp  /afs/umbc.edu/users/s/q/squire/pub/download/add32_test.run  .
  cp  /afs/umbc.edu/users/s/q/squire/pub/download/add32_test.vhdl  .
  # if you typed everything correctly, run cadence logic simulation
  make -f Makefile_411 t_table.out
  make -f Makefile_ghdl t_table.out
  make -f Makefile_411 inverter_out.txt
  make -f Makefile_ghdl inverter_out.txt
  make -f Makefile_ghdl clean # saves on quota 

Look over the newly created Cadence files:   dir -ltr *

You may use either or both of the Cadence and GHDL simulators.
  make -f Makefile_ghdl t_table.gout
  make -f Makefile_ghdl add32_test.gout

Look over the newly created GHDL files:    dir -ltr *
  Oh, junk files also. Remove them with clean:
  make -f Makefile_ghdl clean
  dir -ltr *


The goal of the semester project is to design and simulate a pipelined RISC CPU. Major components will be the pipelined ALU data path, the instruction decoder, hazard detection and associated forwarding/stall and cache memory controller.

Do not copy a previous semester's project

It will not work, and you will lose points.

You will get a  -0  or worse, on any project part that is
a copy of a previous semester's project. DO NOT COPY !


Submitting your Project

 The project is to be submitted on GL as five transactions for five files:
   submit cs411 part1 part1.vhdl
   submit cs411 part2 part2a.vhdl
   submit cs411 part2 part2b.vhdl
   submit cs411 part3 part3a.vhdl
   submit cs411 part3 part3b.vhdl

 The files you submit are not the starter files but the starter files
 with your additions to make it work. Do not submit extra files.
 I use makefiles and  .chk .chkg files to grade projects.
 
 Note: DO NOT use "Blackboard" for turning in project or homework.

Five Part Project

  • part1
  • part2a
  • part2b
  • part3a
  • part3b
  • Other Links
  • Getting Started

    Each time you log on, using Cadence VHDL:
      cd cs411/vhdl2
      then work on your .vhdl files
      make, then fix errors and check "diff"
      if no errors
      make clean   # just before you log off, saves disk quota

    If there is a Cadence license problem, use Makefile_ghdl
    You may use either Cadence or GHDL or both on GL.

    Mac OS X users do the following:
      mkdir ghdl                         # do all your work here
      brew install Caskroom/cask/ghdl    # this installs ghdl, like vhdl

    Start the project by getting files

     Starter files may be copied to your vhdl2 subdirectory on
     linux.gl.umbc.edu  using commands such as the following
     (set up for both Cadence and GHDL):
    
     cp  /afs/umbc.edu/users/s/q/squire/pub/download/part1_start.vhdl .
     cp  /afs/umbc.edu/users/s/q/squire/pub/download/cs411_opcodes.txt .
     cp  /afs/umbc.edu/users/s/q/squire/pub/download/bshift.vhdl .
     cp  /afs/umbc.edu/users/s/q/squire/pub/download/part1.abs .
     cp  /afs/umbc.edu/users/s/q/squire/pub/download/part1.run .
     cp  /afs/umbc.edu/users/s/q/squire/pub/download/part1.chk .
     cp  /afs/umbc.edu/users/s/q/squire/pub/download/part1.chkg .
     cp  /afs/umbc.edu/users/s/q/squire/pub/download/divcas16.vhdl .
    
    
     
     you should have      add32.vhdl   file from your HW4
                          pmul16.vhdl  file from your HW6
    

    Part1

     PART1: Handle lw, sw, add, sub, and, or, addi, lwim, sll, srl, mul, div, cmpl
                   and nop with no hazards.
            (nop's are inserted in the part1.abs file to prevent hazards.)
            See cs411_opcodes.txt for detailed instruction formats and definitions.
            See reglist.txt for register use conventions.
            You should use part1_start.vhdl as a start for coding your circuit.
            You can do your own shift circuit or use the bshift.vhdl component.
            The instruction definitions and bit patterns for this semester are in
            cs411_opcodes.txt
    
       Quick start steps:
         1)  copy part1_start.vhdl to part1.vhdl
    	 then work on project in part1.vhdl  
         2)  replace  "_start" with "", i.e. delete _start everywhere.
         3)  fill in the VHDL for the ALU_32 architecture to implement
             sub, and, or, sll, srl, cmpl, mul, div . See diagram.
             All other instructions must do a plain add.
             Note that EX_IR coming into ALU_32 has the instruction in "inst".
             See also a possible schematic and some code for the ALU.
         4)  compute the signal  WB_write_enb (needs 'or' of more opcodes)
             search for ??? to find where some more work is needed;
              the code there serves as an example of setting a mux
              control based on opcode.
              In each stage **_IR is the instruction currently in that stage.
              **_IR(31 downto 26) is the six bit major op code. "100011" for lw
              **_IR(5 downto 0) is the six bit minor op code. "100000" for add.
                  when major op code is "000000"
    
         5) Compile, analyze, run using commands in your Makefile
            make -f Makefile_411 part1.out
    	make -f Makefile_ghdl part1.gout
    	
         6) then do difference:
            diff -iw part1.out part1.chk | more
            diff -iw part1.gout part1.chkg | more
    
    	(ignore "squire" just from my run, ignore extra
    	cadence or GHDL stuff. Use either or both to find errors.)
    	
    	Look at difference  <  lines are yours
    	                    >  lines are check, required
    	see line number, look at  part1.out on that line and
    	see what instruction is being executed in which stage and
    	fix that instruction.
    	Or, if difference in ALUSrc or RegDst etc, fix that signal.
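
The opcode fields described in step 4 can be picked apart as in the
following minimal sketch. It is not taken from part1_start.vhdl: the
entity name opcode_decode_sketch and the ports ir, is_lw and is_add are
hypothetical names chosen for illustration. Only the two bit patterns
given above ("100011" for lw, "100000" for add under major op code
"000000") are used; check cs411_opcodes.txt for everything else.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical entity, for illustration only; not part of the starter file.
entity opcode_decode_sketch is
  port (ir     : in  std_logic_vector(31 downto 0);  -- stands in for any **_IR
        is_lw  : out std_logic;
        is_add : out std_logic);
end entity opcode_decode_sketch;

architecture behavior of opcode_decode_sketch is
  signal major : std_logic_vector(5 downto 0);  -- six bit major op code
  signal minor : std_logic_vector(5 downto 0);  -- six bit minor op code
begin
  major <= ir(31 downto 26);
  minor <= ir(5 downto 0);

  -- lw is identified by the major op code alone
  is_lw  <= '1' when major = "100011" else '0';

  -- add needs major op code "000000" and minor op code "100000"
  is_add <= '1' when (major = "000000") and (minor = "100000") else '0';
end architecture behavior;
```

A control signal that must be active for several instructions would
then be the 'or' of several such per-instruction conditions. Which
instructions belong in each 'or' is left to the project.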
    	
       Add to Makefile, if you did not download Makefile_411	
          all:  ... part1.out  # add part1.out to the list
    
          part1.out: part1.vhdl add32.vhdl bshift.vhdl pmul16.vhdl \
                     divcas16.vhdl part1.run part1.abs
             run_ncvhdl.bash -v93 add32.vhdl ...
             run_ncvhdl.bash -v93 bshift.vhdl ...
             run_ncvhdl.bash -v93 pmul16.vhdl ...
             run_ncvhdl.bash -v93 divcas16.vhdl ...
             run_ncvhdl.bash -v93 part1.vhdl  # copied from part1_start.vhdl
             run_ncelab.bash -v93 part1:schematic
    	 run_ncsim.bash  -batch -logfile part1.out -input part1.run part1
    
             diff -iw part1.out part1.chk     should be no differences
                                              no stalls, timing should be exact
            in Makefile_ghdl:
    	
            ghdl -a --ieee=synopsys add32.vhdl
            ghdl -a --ieee=synopsys bshift.vhdl
            ghdl -a --ieee=synopsys pmul16.vhdl
            ghdl -a --ieee=synopsys divcas16.vhdl
            ghdl -a --ieee=synopsys part1.vhdl
            ghdl -e --ieee=synopsys part1
            ghdl -r --ieee=synopsys part1 --stop-time=250ns > part1.gout
            diff -iw part1.gout part1.chkg     should be no differences
                                                no stalls, timing should be exact
    
    
    
    
            The CS411 Project Part 1 uses a schematic as shown in Lecture 18
            and part1.ps
            Check that your opcodes match the latest cs411_opcodes.txt
    
            For grading reasons, keep the signal names that
            are pipeline registers and the entity/memory names.
    
    
            The resulting output should be as shown in
             part1.chk  file based on part1.abs and  part1.run 
    
            Check the results in part1.out to be sure the instructions
            worked. You can follow each instruction through the pipeline
            by following the instruction register, *_IR and check the
            *_*  signals for correct values at each stage.
    
            It is possible that your part1.out does not agree with
            part1.chk but you should
            be able to explain why. (Probably different don't care choices.)
    
            You may want to copy part1.vhdl to another file and add more
            'write' statements to print out more internal signal names in order
            to help debug your circuit. See debug.txt
    
            Submit all components and your main circuit as one plain text
            file using submit. DO NOT include add32.vhdl, bshift.vhdl,
            pmul16.vhdl, divcas16.vhdl, etc.
            They are provided by the instructor for testing. The file
            must be named  "part1.vhdl". DO NOT email except for questions.
    
            You submit on GL using:  submit cs411 part1 part1.vhdl
    
            No makefiles or run files or output is to be
            submitted. Partial credit will be given based on number of
            instructions simulated correctly. The starter file part1_start.vhdl
            only simulates lw and a few other instructions correctly.
    
            This code, part1.vhdl, gets copied to part2a.vhdl for the next part.
    
    

    Part2a: Copy your part1.vhdl to part2a.vhdl

    Substitute string "part2a" for every "part1"

     cp  /afs/umbc.edu/users/s/q/squire/pub/download/part2a.abs .
     cp  /afs/umbc.edu/users/s/q/squire/pub/download/part2a.run .
     cp  /afs/umbc.edu/users/s/q/squire/pub/download/part2a.chk .
     cp  /afs/umbc.edu/users/s/q/squire/pub/download/part2a.chkg .

     Implement data forwarding and jump and branch.
     CS411 does the branch and jump in the ID stage.
     CS411 goes beyond the book by forwarding for beq.

     submit cs411 part2 part2a.vhdl   # before working on part2b

     You are upgrading part1.vhdl to the circuit in part2a.jpg or part2a.ps

     Data forwarding paths must cover at least those cases covered in
     class (see the class handout for details). Additional insight may
     be gained from a comparison of the pipeline stages with and without
     data forwarding in forward.txt
     A possible implementation of forwarding is forward_mem.jpg
     The EX stage forwarding may use entity mux_32_3, a multiplexor with
     three 32-bit inputs.

     Note: jump and beq are followed by a delayed branch slot that
     contains an instruction that is always executed.
     jump can not cause a stall.
     If beq does not get data forwarding, then it can stall, and stall,
     and stall. Add data forwarding for beq by adding two mux's in the
     ID STAGE that get inputs from the MEM stage as shown in
     part2a.jpg or part2a.ps

     Implement your circuit assuming that software has correctly filled
     the delayed branch slot and implement the branch in the ID stage
     as modified for this class project. You may use the mux32_3

     For grading reasons, keep the signal names that are pipeline
     registers and the component/memory names.

     Run the following commands to check your work.
       make -f Makefile_411 part2a.out
       make -f Makefile_ghdl part2a.gout
       diff -iw part2a.out part2a.chk
       diff -iw part2a.gout part2a.chkg

     Implement the green logic in the two diagrams in Lecture 19.

     For additional debugging, download and insert debug_forward.vhdl
     fix signal names if yours are different, and
       diff -iw part2a.out part2a_print.chk
     Ignore difference in PC_next in clock 6.

     My bug in my part2a.vhdl made an error in part2a.chk and part2a.chkg.
     MEM_data_reg should be EX_BB, MEM_addr): I had EX_B.
     Fixed now, part2a.chk, part2a.chkg OK.
     (The part2a_bug.chk and part2a_bug.chkg are still accepted.)
     OK to submit either way. I will ignore it when grading.
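
For readers who have not yet looked at a three-input multiplexor, here
is a minimal sketch of one. It is not the course's mux_32_3: the entity
name mux3_sketch, the port names and the 2-bit control encoding are all
assumptions made for illustration, so check the provided component for
its real interface before wiring it into the EX stage.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical three-input, 32-bit multiplexor; names and encoding assumed.
entity mux3_sketch is
  port (in0, in1, in2 : in  std_logic_vector(31 downto 0);
        ctl           : in  std_logic_vector(1 downto 0);
        result        : out std_logic_vector(31 downto 0));
end entity mux3_sketch;

architecture behavior of mux3_sketch is
begin
  -- "00" selects in0, "01" selects in1, anything else selects in2
  with ctl select
    result <= in0 when "00",
              in1 when "01",
              in2 when others;
end architecture behavior;
```

One plausible use, again an assumption, is to feed in0 from the normal
register value and in1 and in2 from the two later pipeline stages, with
ctl computed by the forwarding logic. The class handout and forward.txt
define which cases must actually be covered.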

    Part2b: Copy your part2a.vhdl to part2b.vhdl

    Substitute string "part2b" for every "part2a"

     cp  /afs/umbc.edu/users/s/q/squire/pub/download/Makefile_ghdl .
     cp  /afs/umbc.edu/users/s/q/squire/pub/download/part2b.abs .
     cp  /afs/umbc.edu/users/s/q/squire/pub/download/part2b.run .
     cp  /afs/umbc.edu/users/s/q/squire/pub/download/part2b.chk .
     cp  /afs/umbc.edu/users/s/q/squire/pub/download/part2b.chkg .

     Implement hazard detection and stall the minimum possible.

     Handle hazards. Detect hazards, prevent wrong results by stalling
     when necessary. A stall is implemented by holding the instruction
     in the ID stage and letting the EX, MEM and WB stages proceed.
     The stall signal prevents the IF and ID stages from getting a
     clock signal.
     A terse summary of the hazard detection is in hazard.txt
     A possible implementation of hazards is stall_lw.jpg

     The CS411 Project Part 2b uses a modified schematic handed out.
     See web for schematic part2b.jpg and part2b.ps

     Run the following commands to check your work.
       make -f Makefile_411 part2b.out
       make -f Makefile_ghdl part2b.gout
       diff -iw part2b.out part2b.chk
       diff -iw part2b.gout part2b.chkg
     OK if different PCnext on clock 6

     Part2b needs both data forwarding and hazards (stalls).

     Submit all components and your main circuit as one plain text
     file using 'submit'. No makefiles or run files or output is to be
     submitted. Partial credit will be given based on number of
     data forwards, jump, beq, and hazard stalls handled correctly.

     Your circuit will not be tested with jump or branch or data
     addresses greater than 10 bits, in other words your instruction
     and data memories do not need to be bigger than 1024 words.

     You may not get exactly the .chk results.
     (With any PCnext on clock 6, and possibly a few more PC_next
     differences, you still get 100.)
     Timing and stalls will be graded. Points will be deducted for
     memory or register differences or improper stalls.

     For additional debugging, download and insert debug_stall.vhdl
     fix signal names if yours are different, and
       diff -iw part2b.out part2b_print.chk

    Part3a: Copy your part2b.vhdl to part3a.vhdl

    Substitute "part3a" for every "part2b"

     cp  /afs/umbc.edu/users/s/q/squire/pub/download/part3a.abs .
     cp  /afs/umbc.edu/users/s/q/squire/pub/download/part3a.run .
     cp  /afs/umbc.edu/users/s/q/squire/pub/download/part3a.chk .
     cp  /afs/umbc.edu/users/s/q/squire/pub/download/part3a.chkg .
     cp  /afs/umbc.edu/users/s/q/squire/pub/download/part3a_print.chk .

     Implement a cache in the instruction memory (read only)

     submit cs411 part3 part3a.vhdl

     Put the cache inside the instruction memory component
     (entity and architecture).
     (you will need to pass a few extra signals in and out)
     One output is the miss signal, set with  <= '1', '0'  etc.
     Then in architecture part3a "or" the new signal into "stall".
     part3a.ps

     Use the existing shared memory data as the main memory.
     Make a miss on the instruction cache cause a three cycle stall.
     A cycle is 10 ns, thus a three cycle stall is 30 ns.
     Previous stalls from part2b must still work.

     added:
       add to end of entity instruction_memory
           miss: out std_logic);
       add code after  local_miss <=
           miss <= '1', '0' after 30 ns; -- to hold stall
       add to inst_mem: WORK.instruction_memory
           miss => IF_miss);  -- instruction fetch miss
       add to stall <=
           ... or IF_miss;  -- define above
           signal IF_miss: std_logic := '0';

     The instruction cache holds 16 words organized as four blocks
     of four words. Remember vhdl memory is addressed by word address,
     the MIPS/SGI memory is addressed by byte address and a cache is
     addressed by block number.

     The cache schematic for the instruction cache was handed out in
     class and is shown in icache.jpg or cache.png

     The cache may be implemented using behavioral VHDL, basically
     writing sequential code in VHDL, or by connecting hardware.

     Possible behavioral, not required, VHDL to set up the start of a
     cache: (no partial credit for just putting this in your cache.)

     add in or out signals to entity instruction_memory as needed
     for example, 'clk' 'clear' 'miss'
     make corresponding changes at inst_mem:
     also add code below with additions

     architecture behavior of instruction_memory is
       subtype block_type is std_logic_vector(154 downto 0);
       type cache_type is array (0 to 3) of block_type;
       signal cache : cache_type := (others=>(others=>'0'));
       signal local_miss : std_logic := '0'; -- needed between process calls
       -- now we have a cache memory initialized to zero
     begin -- behavior
       inst_mem: process ... add local_miss)  -- compute same as miss
         variable quad_word_address : natural; -- for memory fetch
         variable cblock : block_type; -- the shaded block in the cache
         variable index  : natural;    -- index into cache to get a block
         variable word   : natural;    -- select a word
         variable my_line : line;      -- for debug printout
         alias tag : std_logic_vector(25 downto 0) is cblock(153 downto 128);
         alias w0  : std_logic_vector(31 downto 0) is cblock(127 downto 96);
         alias valid : std_logic is cblock(154);
         -- other alias allowed ...
       begin
         if clear = '1' then
           miss <= '0';
           inst <= x"00000000";
         end if;
         if clear = '0' then
           index  := to_integer(addr(5 downto 4));
           word   := to_integer(addr(3 downto 2));
           cblock := cache(index); -- has valid (154), tag (153 downto 128)
                                   -- W0 (127 downto 96), W1 (95 downto 64)
                                   -- W2 (63 downto 32),  W3 (31 downto 0)
                                   -- cblock is the shaded block in handout
           if (valid = '1') and (tag = addr(31 downto 6)) then -- hit
             -- ... do hit
           else -- miss
             -- ... do miss, get 4 words from memory, set tag and valid ...
             quad_word_address := to_integer(addr(13 downto 4));
             w0 := memory(quad_word_address*4+0);
             w1 := memory(quad_word_address*4+1);
             -- ...
             -- fill in cblock with new words, then
             cache(index) <= cblock after 30 ns; -- 3 clock delay
             miss <= '1', '0' after 30 ns;       -- miss is '1' for 30 ns
             local_miss <= '1', '0' after 30 ns; -- to get process to run
             -- this "miss" signal gets ored into part2b "stall" signal
             ...
             -- the part3a.chk file has 'inst' set to zero while 'miss' is 1
             -- not required but cleans up the "diff"
           end if;
         end if; -- clear = '0'

     Remember to "or" the cache miss signal into the stall signal;
     that takes care of the sclk signal.
     I was a bit extreme in computing the miss signal in the cache.
     I did not use a hit signal yet had a local_miss signal.
       signal local_miss : std_logic := '0'; -- saved between calls
     My process had
       process(addr, clear, local_miss)

     More information, including debug print, is in Lecture 24
     and debug.txt

     For debugging your cache, you might find it convenient to add
     this 'debug' print process right after  end process inst_mem;

       debug: process -- used to print contents of I cache, diff part3a_print.chk
         variable my_line : LINE; -- not part of working circuit
       begin
         wait for 9.5 ns; -- just before rising clock
         for I in 0 to 3 loop
           write(my_line, string'("line="));
           write(my_line, I);
           write(my_line, string'(" V="));
           write(my_line, cache(I)(154));
           write(my_line, string'(" tag="));
           hwrite(my_line, cache(I)(151 downto 128)); -- ignore top bits
           write(my_line, string'(" w0="));
           hwrite(my_line, cache(I)(127 downto 96));
           write(my_line, string'(" w1="));
           hwrite(my_line, cache(I)(95 downto 64));
           write(my_line, string'(" w2="));
           hwrite(my_line, cache(I)(63 downto 32));
           write(my_line, string'(" w3="));
           hwrite(my_line, cache(I)(31 downto 0));
           writeline(output, my_line);
         end loop;
         wait for 0.5 ns; -- rest of clock
       end process debug;

     And, add in front of instruction_memory architecture:
       use STD.textio.all;
       use IEEE.std_logic_textio.all;

     Then  diff -iw part3a.out part3a_print.chk
     see part3a_print.chk with debug

     You may print out signals such as 'miss' using prtmiss from debug.txt

     For grading reasons, keep the signal names that are pipeline
     registers and the component/memory names.

       make -f Makefile_411 part3a.out
       make -f Makefile_ghdl part3a.gout
       diff -iw part3a.out part3a.chk
       diff -iw part3a.gout part3a.chkg
       diff -iw part3a.out part3a_print.chk

     You submit on GL using:  submit cs411 part3 part3a.vhdl

     Ignore difference in PC_next in clock 27.
     Ignore PC not zero if results in registers and memory are the same.
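
To make the byte address, word address and block number relationship
from Part3a concrete, here is a small self-checking sketch that splits
one example byte address the same way the cache code does. The entity
name addr_split_sketch and the example address x"00000058" are made up
for illustration. It also assumes IEEE.numeric_std for the conversion
to integer, whereas the project code calls to_integer directly on the
address, presumably through its own conversion function.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;  -- assumption: standard conversion functions

-- Hypothetical test bench, not part of the project files.
entity addr_split_sketch is
end entity addr_split_sketch;

architecture test of addr_split_sketch is
begin
  check: process
    -- byte address 88 decimal, chosen only as an example
    constant addr : std_logic_vector(31 downto 0) := x"00000058";
    variable index             : natural;  -- which of the four blocks
    variable word              : natural;  -- which word in the block
    variable quad_word_address : natural;  -- block number in main memory
    variable tag               : std_logic_vector(25 downto 0);
  begin
    index := to_integer(unsigned(addr(5 downto 4)));
    word  := to_integer(unsigned(addr(3 downto 2)));
    tag   := addr(31 downto 6);
    quad_word_address := to_integer(unsigned(addr(13 downto 4)));

    assert index = 1 report "unexpected index" severity failure;
    assert word = 2 report "unexpected word" severity failure;
    assert to_integer(unsigned(tag)) = 1
      report "unexpected tag" severity failure;
    -- block number times four plus word gives the word address, 88/4 = 22
    assert quad_word_address*4 + word = 22
      report "unexpected word address" severity failure;
    wait;  -- done
  end process check;
end architecture test;
```

Bits 1 downto 0 are the byte within the word and are not used, bits
3 downto 2 pick the word, bits 5 downto 4 pick the cache block, and
the remaining 26 bits form the tag. Together with the valid bit and
four 32-bit words that accounts for the 155 bits of block_type.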

    Part3b: Copy your part3a.vhdl to part3b.vhdl

    Substitute "part3b" for every "part3a"

     cp  /afs/umbc.edu/users/s/q/squire/pub/download/part3b.abs .
     cp  /afs/umbc.edu/users/s/q/squire/pub/download/part3b.run .
     cp  /afs/umbc.edu/users/s/q/squire/pub/download/part3b.chk .
     cp  /afs/umbc.edu/users/s/q/squire/pub/download/part3b.chkg .
     cp  /afs/umbc.edu/users/s/q/squire/pub/download/part3b_print.chk .

     Implement a cache in the data memory (read/write)

     submit cs411 part3 part3b.vhdl

     Put the cache inside the data memory entity and process.
     Almost all the code from the instruction cache, part3a, can be
     copied and used inside the data memory for the data cache.
     (you will need to pass a few extra signals in and out)
     part3b.ps

     Use the existing shared memory data as the main memory.
     Make a miss on the data cache cause a three cycle stall of all
     pipeline stages.
     (you will need another signal similar to sclk in order to stall
     the EX, MEM and WB stages, e.g. dsclk
     dsclk replaces clk for all registers in EX, MEM and WB stages.)
     A cycle is 10 ns, thus a three cycle stall is 30 ns.
     Previous stalls from part2b and part3a must still work.

     Change  MEMread : std_logic := '1';
     to      MEMread : std_logic := '0';   for part3b.

     Do a write through cache for the data memory.
     (It must work to the point that results in main memory are
     correct at the end of the run and the timing is correct,
     partial credit for partial functionality with correct timing
     for the stalls.)

     Then test part3b.vhdl with the data cache.
       make -f Makefile_411 part3b.out
       make -f Makefile_ghdl part3b.gout
       diff -iw part3b.out part3b.chk
       diff -iw part3b.gout part3b.chkg

     submit cs411 part3 part3b.vhdl

     Submit all components and your main circuit as one plain text
     file by using 'submit'. No makefiles or run files or output is
     to be submitted. Partial credit will be given based on correct
     timing and number of instructions simulated correctly, number of
     hazards handled correctly and proper operation of the data cache.
     Of course, the instruction cache must work before the data cache
     is graded.

     I use dsclk in EX, MEM, WB stages in place of clk
       clk200 <= clk after 200 ps; -- slight delay
       dsclk  <= clk200 or dmiss;  -- dmiss out of data_memory
     stall gets added  or dmiss
     dmiss out of data_memory is similar to miss out of instruction_memory
     my code in data_memory is very much like my code in instruction_memory
     same cache structure in another cache, L1 Dcache

     Add two signals to entity data_memory
       clear
       miss
     miss will be called dmiss outside the data_memory
     data_memory does nothing unless either read_enable or
     write_enable is '1'
     use code from instruction_memory cache: very similar when
     read_enable is '1', reading from memory
     add code when write_enable is '1', writing into memory one word
     Test read_enable and write_enable for both "hit" and "miss" cases.

     Typical start of data cache process
         ...
       begin
         if clear='1' then
           miss <= '0';
         end if;
         if clear='0' and miss='0' and (read_enable='1' or
            (write_enable='1' and write_clk'event and write_clk='1')) then
           index  := to_integer(address(5 downto 4));
           word   := to_integer(address(3 downto 2));
           cblock := cache(index);
           ...

     -- for debug, After:  end process data_mem;  insert for part3b_print.chk

       debug: process -- used to print contents of D cache, use part3b_print.chk
         variable my_line : LINE; -- not part of working circuit
       begin
         wait for 9.5 ns; -- just before rising clock
         for I in 0 to 3 loop
           write(my_line, string'("line="));
           write(my_line, I);
           write(my_line, string'(" V="));
           write(my_line, cache(I)(154));
           write(my_line, string'(" tag="));
           hwrite(my_line, cache(I)(151 downto 128)); -- ignore top bits
           write(my_line, string'(" w0="));
           hwrite(my_line, cache(I)(127 downto 96));
           write(my_line, string'(" w1="));
           hwrite(my_line, cache(I)(95 downto 64));
           write(my_line, string'(" w2="));
           hwrite(my_line, cache(I)(63 downto 32));
           write(my_line, string'(" w3="));
           hwrite(my_line, cache(I)(31 downto 0));
           writeline(output, my_line);
         end loop;
         wait for 0.5 ns; -- rest of clock
       end process debug;
       -- end architecture behavior; -- of data_memory

     And, add in front of data_memory architecture:
       use STD.textio.all;
       use IEEE.std_logic_textio.all;
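
The effect of the two dsclk assignments quoted in Part3b can be seen in
isolation with a small test bench. This is only a sketch: the entity
name dsclk_sketch, the one-bit register d/q and the stimulus times are
hypothetical, and dmiss is deliberately raised and lowered while the
delayed clock is high, which is an assumption about when a real miss
signal changes.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical test bench, not part of the project files.
entity dsclk_sketch is
end entity dsclk_sketch;

architecture test of dsclk_sketch is
  signal clk    : std_logic := '0';
  signal clk200 : std_logic := '0';
  signal dsclk  : std_logic := '0';
  signal dmiss  : std_logic := '0';
  signal d      : std_logic := '0';
  signal q      : std_logic := '0';
begin
  clock: process  -- 10 ns cycle, rising edges at 5, 15, 25, ... ns
  begin
    for i in 1 to 7 loop
      clk <= '0'; wait for 5 ns;
      clk <= '1'; wait for 5 ns;
    end loop;
    wait;  -- stop the clock so the simulation ends
  end process clock;

  clk200 <= clk after 200 ps;  -- slight delay, as in the text
  dsclk  <= clk200 or dmiss;   -- held high while dmiss is '1'

  reg: process (dsclk)         -- stands in for an EX/MEM/WB register
  begin
    if rising_edge(dsclk) then
      q <= d;
    end if;
  end process reg;

  stim: process
  begin
    d <= '1';
    wait for 16 ns;            -- t = 16 ns, clk200 is high
    dmiss <= '1';              -- start of a 30 ns stall
    wait for 4 ns;             -- t = 20 ns
    d <= '0';                  -- new value arrives during the stall
    wait for 26 ns;            -- t = 46 ns, clk200 is high again
    dmiss <= '0';              -- end of the stall
    wait for 4 ns;             -- t = 50 ns
    assert q = '1' report "register changed during stall" severity failure;
    wait for 10 ns;            -- t = 60 ns, after the dsclk edge at 55.2 ns
    assert q = '0' report "register did not resume" severity failure;
    wait;
  end process stim;
end architecture test;
```

While dmiss is '1' the 'or' keeps dsclk at '1', so the three rising
edges of clk200 at 25.2, 35.2 and 45.2 ns never reach the register and
q keeps its old value. If dmiss changed while clk200 was low, the 'or'
would itself produce an edge, which is why the timing of the real miss
signal relative to the clock matters.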

    Files to download and other links

    Last updated 12/3/2020