Cadence setup, 2020. Follow the instructions exactly. It is best to use Makefiles to save on your typing.

Log into linux.gl.umbc.edu and type these commands (skip any done for HW4, HW6):

  mkdir cs411
  cd cs411
  mkdir vhdl2
  cd vhdl2
  cp /afs/umbc.edu/users/s/q/squire/pub/download/cadence_setup.tar .
  tar -xvf cadence_setup.tar
  cp /afs/umbc.edu/users/s/q/squire/pub/download/Makefile_411 .
  cp /afs/umbc.edu/users/s/q/squire/pub/download/Makefile_ghdl .
  cp /afs/umbc.edu/users/s/q/squire/pub/download/t_table.run .
  cp /afs/umbc.edu/users/s/q/squire/pub/download/t_table.vhdl .
  cp /afs/umbc.edu/users/s/q/squire/pub/download/inverter.run .
  cp /afs/umbc.edu/users/s/q/squire/pub/download/inverter.vhdl .
  cp /afs/umbc.edu/users/s/q/squire/pub/download/inverter_test.vhdl .
  cp /afs/umbc.edu/users/s/q/squire/pub/download/add32_test.run .
  cp /afs/umbc.edu/users/s/q/squire/pub/download/add32_test.vhdl .

  # if you typed everything correctly, run cadence logic simulation
  make -f Makefile_411 t_table.out
  make -f Makefile_ghdl t_table.out
  make -f Makefile_411 inverter_out.txt
  make -f Makefile_ghdl inverter_out.txt
  make -f Makefile_ghdl clean    # saves on quota

Look over the newly created Cadence files:

  dir -ltr *

You may use either or both of the Cadence and GHDL simulators.

  make -f Makefile_ghdl t_table.gout
  make -f Makefile_ghdl add32_test.gout

Look over the newly created GHDL files:

  dir -ltr *

There are junk files also. Remove them with clean:

  make -f Makefile_ghdl clean
  dir -ltr *
The goal of the semester project is to design and simulate a pipelined RISC CPU. Major components will be the pipelined ALU data path, the instruction decoder, hazard detection and associated forwarding/stall and cache memory controller.
You will get a 0, or worse, on any project part that is a copy of a previous semester's project. DO NOT COPY!
The project is to be submitted on GL as five transactions for five files:

  submit cs411 part1 part1.vhdl
  submit cs411 part2 part2a.vhdl
  submit cs411 part2 part2b.vhdl
  submit cs411 part3 part3a.vhdl
  submit cs411 part3 part3b.vhdl

The files you submit are not the starter files but the starter files with your additions to make them work. Do not submit extra files. I use makefiles and .chk and .chkg files to grade projects.

Note: DO NOT use "Blackboard" for turning in the project or homework.
Starter files may be copied to your vhdl2 subdirectory on linux.gl.umbc.edu (set up for Cadence and GHDL) using commands such as:

  cp /afs/umbc.edu/users/s/q/squire/pub/download/part1_start.vhdl .
  cp /afs/umbc.edu/users/s/q/squire/pub/download/cs411_opcodes.txt .
  cp /afs/umbc.edu/users/s/q/squire/pub/download/bshift.vhdl .
  cp /afs/umbc.edu/users/s/q/squire/pub/download/part1.abs .
  cp /afs/umbc.edu/users/s/q/squire/pub/download/part1.run .
  cp /afs/umbc.edu/users/s/q/squire/pub/download/part1.chk .
  cp /afs/umbc.edu/users/s/q/squire/pub/download/part1.chkg .
  cp /afs/umbc.edu/users/s/q/squire/pub/download/divcas16.vhdl .

You should already have:

  add32.vhdl    file from your HW4
  pmul16.vhdl   file from your HW6
PART1: Handle lw, sw, add, sub, and, or, addi, lwim, sll, srl, mul, div, cmpl and nop with no hazards. (nops are inserted in the part1.abs file to prevent hazards.)

See cs411_opcodes.txt for detailed instruction formats and definitions, including the bit patterns for this semester. See reglist.txt for register use conventions. You should use part1_start.vhdl as a start for coding your circuit. You can do your own shift circuit or use the bshift.vhdl component.

Quick start steps:

1) Copy part1_start.vhdl to part1.vhdl, then work on the project in part1.vhdl.

2) Replace "_start" with "", i.e. delete _start everywhere.

3) Fill in the VHDL for the ALU_32 architecture to implement sub, and, or, sll, srl, cmpl, mul, div. See diagram. All other instructions must do a plain add. Note that EX_IR coming into ALU_32 has the instruction in "inst". See also a possible schematic and some code ALU.

4) Compute the signal WB_write_enb (needs 'or' of more opcodes). Search for ??? (some more work needed) as an example for setting a mux control based on opcode. In each stage, **_IR is the instruction currently in that stage.
     **_IR(31 downto 26) is the six bit major op code, "100011" for lw.
     **_IR(5 downto 0) is the six bit minor op code, "100000" for add, when the major op code is "000000".

5) Compile, analyze, and run using the commands in your Makefile:
     make -f Makefile_411 part1.out
     make -f Makefile_ghdl part1.gout

6) Then do a difference:
     diff -iw part1.out part1.chk | more
     diff -iw part1.gout part1.chkg | more
   (Ignore "squire", which is just from my run, and ignore extra Cadence or GHDL stuff. Use either or both to find errors.)
   In the difference, < lines are yours and > lines are the check, which is required. Note the line number, look at part1.out on that line, see what instruction is being executed in which stage, and fix that instruction. Or, if the difference is in ALUSrc or RegDst etc., fix that signal.
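The opcode fields described in step 4 can be sketched as a small stand-alone decoder. This is only an illustration: the entity name decode_sketch and its ports are hypothetical and do not appear in part1_start.vhdl, and only the two bit patterns quoted above ("100011" for lw, "100000" for add) are used. All other opcodes must come from cs411_opcodes.txt. The second entity is a self-checking test bench.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical helper, not part of part1_start.vhdl.
entity decode_sketch is
  port (IR     : in  std_logic_vector(31 downto 0);  -- any stage's **_IR
        is_lw  : out std_logic;
        is_add : out std_logic);
end entity decode_sketch;

architecture behavior of decode_sketch is
  signal major : std_logic_vector(5 downto 0);  -- six bit major op code
  signal minor : std_logic_vector(5 downto 0);  -- six bit minor op code
begin
  major  <= IR(31 downto 26);
  minor  <= IR(5 downto 0);
  is_lw  <= '1' when major = "100011" else '0';
  -- the minor op code only counts when the major op code is "000000"
  is_add <= '1' when (major = "000000") and (minor = "100000") else '0';
end architecture behavior;

library IEEE;
use IEEE.std_logic_1164.all;

entity decode_sketch_test is
end entity decode_sketch_test;

architecture test of decode_sketch_test is
  signal IR     : std_logic_vector(31 downto 0) := (others => '0');
  signal is_lw  : std_logic;
  signal is_add : std_logic;
begin
  dut: entity WORK.decode_sketch
    port map (IR => IR, is_lw => is_lw, is_add => is_add);

  check: process
  begin
    IR <= x"8C000000";  -- major "100011" (lw), all other bits zero
    wait for 1 ns;
    assert is_lw = '1' and is_add = '0'
      report "lw not decoded" severity failure;
    IR <= x"00000020";  -- major "000000", minor "100000" (add)
    wait for 1 ns;
    assert is_lw = '0' and is_add = '1'
      report "add not decoded" severity failure;
    wait;               -- done, stop this process
  end process check;
end architecture test;
```

A signal such as WB_write_enb would then be the 'or' of one such decode term per instruction that writes a register, using the corresponding stage's **_IR.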
Add to your Makefile, if you did not download Makefile_411:

  all: ... part1.out   # add part1.out to the list

  part1.out: part1.vhdl add32.vhdl bshift.vhdl pmul16.vhdl \
             divcas16.vhdl part1.run part1.abs
          run_ncvhdl.bash -v93 add32.vhdl ...
          run_ncvhdl.bash -v93 bshift.vhdl ...
          run_ncvhdl.bash -v93 pmul16.vhdl ...
          run_ncvhdl.bash -v93 divcas16.vhdl ...
          run_ncvhdl.bash -v93 part1.vhdl   # copied from part1_start.vhdl
          run_ncelab.bash -v93 part1:schematic
          run_ncsim.bash -batch -logfile part1.out -input part1.run part1

  diff -iw part1.out part1.chk

There should be no differences. There are no stalls, so timing should be exact.

In Makefile_ghdl:

  ghdl -a --ieee=synopsys add32.vhdl
  ghdl -a --ieee=synopsys bshift.vhdl
  ghdl -a --ieee=synopsys pmul16.vhdl
  ghdl -a --ieee=synopsys divcas16.vhdl
  ghdl -a --ieee=synopsys part1.vhdl
  ghdl -e --ieee=synopsys part1
  ghdl -r --ieee=synopsys part1 --stop-time=250ns > part1.gout

  diff -iw part1.gout part1.chkg

There should be no differences. There are no stalls, so timing should be exact.

The CS411 Project Part 1 uses a schematic as shown in Lecture 18 and part1.ps. Check that your opcodes match the latest cs411_opcodes.txt.

For grading reasons, keep the signal names that are pipeline registers and the entity/memory names.

The resulting output should be as shown in the part1.chk file, based on part1.abs and part1.run. Check the results in part1.out to be sure the instructions worked. You can follow each instruction through the pipeline by following the instruction register, *_IR, and checking the *_* signals for correct values at each stage. It is possible that your part1.out does not agree with part1.chk, but you should be able to explain why. (Probably different don't care choices.)

You may want to copy part1.vhdl to another file and add more 'write' statements to print out more internal signal names in order to help debug your circuit. See debug.txt.

Submit all components and your main circuit as one plain text file using submit. DO NOT include add32.vhdl, bshift.vhdl, pmul16.vhdl, divcas16.vhdl, etc.; they are provided by the instructor for testing. The file must be named "part1.vhdl". DO NOT EMail except for questions. You submit on GL using:

  submit cs411 part1 part1.vhdl

No makefiles or run files or output is to be submitted. Partial credit will be given based on the number of instructions simulated correctly. The starter file part1_start.vhdl only simulates lw and a few other instructions correctly.

This code, part1.vhdl, gets copied to part2a.vhdl for the next project part.
Part2a: Copy your part1.vhdl to part2a.vhdl
Substitute the string "part2a" for every "part1".

  cp /afs/umbc.edu/users/s/q/squire/pub/download/part2a.abs .
  cp /afs/umbc.edu/users/s/q/squire/pub/download/part2a.run .
  cp /afs/umbc.edu/users/s/q/squire/pub/download/part2a.chk .
  cp /afs/umbc.edu/users/s/q/squire/pub/download/part2a.chkg .

Implement data forwarding, jump, and branch.
CS411 does the branch and jump in the ID stage.
CS411 goes beyond the book by forwarding for beq.

  submit cs411 part2 part2a.vhdl   # before working on part2b

You are upgrading part1.vhdl to part2a.jpg or part2a.ps

Data forwarding paths must cover at least those cases covered in class (see the class handout for details). Additional insight may be gained from a comparison of the pipeline stages with and without data forwarding in forward.txt. A possible implementation of forwarding is forward_mem.jpg. The EX stage forwarding may use entity mux_32_3, a multiplexor with three 32-bit inputs.

Note: jump and beq are followed by a delayed branch slot that contains an instruction that is always executed. jump cannot cause a stall. If beq does not get data forwarding, then it can stall, and stall, and stall. Add data forwarding for beq by adding two muxes in the ID stage that get inputs from the MEM stage, as shown in part2a.jpg or part2a.ps

Implement your circuit assuming that software has correctly filled the delayed branch slot, and implement the branch in the ID stage as modified for this class project. You may use the mux32_3.

For grading reasons, keep the signal names that are pipeline registers and the component/memory names.

Run the following commands to check your work:

  make -f Makefile_411 part2a.out
  make -f Makefile_ghdl part2a.gout
  diff -iw part2a.out part2a.chk
  diff -iw part2a.gout part2a.chkg

Implement the green logic in the two diagrams in Lecture 19.

For additional debugging, download and insert debug_forward.vhdl, fix signal names if yours are different, and

  diff -iw part2a.out part2a_print.chk

Ignore the difference in PC_next in clock 6.
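The EX stage forwarding choice among three 32-bit inputs can be sketched as follows. This is a minimal illustration of the usual textbook priority (the result in MEM is newer than the result in WB, so it wins), not the class handout's circuit. The entity forward_sketch and every port name are hypothetical, it does not use the provided mux_32_3, and it assumes that register 0 is never forwarded; check reglist.txt and the handout before relying on that. The second entity is a self-checking test bench.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical entity and port names, for illustration only.
entity forward_sketch is
  port (EX_rs         : in  std_logic_vector(4 downto 0);   -- register read in EX
        MEM_rd        : in  std_logic_vector(4 downto 0);   -- destination in MEM
        MEM_write_enb : in  std_logic;                      -- MEM instruction writes a register
        WB_rd         : in  std_logic_vector(4 downto 0);   -- destination in WB
        WB_write_enb  : in  std_logic;                      -- WB instruction writes a register
        reg_value     : in  std_logic_vector(31 downto 0);  -- from the register file
        MEM_value     : in  std_logic_vector(31 downto 0);  -- result sitting in MEM
        WB_value      : in  std_logic_vector(31 downto 0);  -- result sitting in WB
        operand       : out std_logic_vector(31 downto 0));
end entity forward_sketch;

architecture behavior of forward_sketch is
begin
  -- Three-way mux: the newest matching result wins (MEM before WB).
  -- Assumption: register 0 is never forwarded.
  operand <= MEM_value when MEM_write_enb = '1' and MEM_rd /= "00000"
                            and MEM_rd = EX_rs else
             WB_value  when WB_write_enb = '1' and WB_rd /= "00000"
                            and WB_rd = EX_rs else
             reg_value;
end architecture behavior;

library IEEE;
use IEEE.std_logic_1164.all;

entity forward_sketch_test is
end entity forward_sketch_test;

architecture test of forward_sketch_test is
  signal EX_rs         : std_logic_vector(4 downto 0)  := "00010";
  signal MEM_rd, WB_rd : std_logic_vector(4 downto 0)  := "00000";
  signal MEM_write_enb : std_logic := '0';
  signal WB_write_enb  : std_logic := '0';
  signal reg_value     : std_logic_vector(31 downto 0) := x"00000011";
  signal MEM_value     : std_logic_vector(31 downto 0) := x"00000022";
  signal WB_value      : std_logic_vector(31 downto 0) := x"00000033";
  signal operand       : std_logic_vector(31 downto 0);
begin
  dut: entity WORK.forward_sketch
    port map (EX_rs => EX_rs, MEM_rd => MEM_rd, MEM_write_enb => MEM_write_enb,
              WB_rd => WB_rd, WB_write_enb => WB_write_enb,
              reg_value => reg_value, MEM_value => MEM_value,
              WB_value => WB_value, operand => operand);

  check: process
  begin
    wait for 1 ns;   -- nothing in flight writes register 2
    assert operand = x"00000011" report "expected register file value" severity failure;
    WB_rd <= "00010"; WB_write_enb <= '1';
    wait for 1 ns;   -- WB holds a newer value of register 2
    assert operand = x"00000033" report "expected WB forward" severity failure;
    MEM_rd <= "00010"; MEM_write_enb <= '1';
    wait for 1 ns;   -- MEM holds an even newer value
    assert operand = x"00000022" report "expected MEM forward" severity failure;
    wait;
  end process check;
end architecture test;
```

One such selection is needed for each ALU operand. The beq forwarding in the ID stage follows the same idea, with its mux inputs taken from the MEM stage.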
A bug in my part2a.vhdl caused an error in part2a.chk and part2a.chkg. MEM_data_reg should use EX_BB (... EX_BB, MEM_addr); I had EX_B. This is fixed now, and part2a.chk and part2a.chkg are OK. (The part2a_bug.chk and part2a_bug.chkg are still accepted.) It is OK to submit either way. I will ignore it when grading.

Part2b: Copy your part2a.vhdl to part2b.vhdl
Substitute the string "part2b" for every "part2a".

  cp /afs/umbc.edu/users/s/q/squire/pub/download/Makefile_ghdl .
  cp /afs/umbc.edu/users/s/q/squire/pub/download/part2b.abs .
  cp /afs/umbc.edu/users/s/q/squire/pub/download/part2b.run .
  cp /afs/umbc.edu/users/s/q/squire/pub/download/part2b.chk .
  cp /afs/umbc.edu/users/s/q/squire/pub/download/part2b.chkg .

Implement hazard detection and stall the minimum possible.

Handle hazards: detect hazards and prevent wrong results by stalling when necessary. A stall is implemented by holding the instruction in the ID stage and letting the EX, MEM and WB stages proceed. The stall signal prevents the IF and ID stages from getting a clock signal. A terse summary of the hazard detection is in hazard.txt. A possible implementation of hazards is stall_lw.jpg.

The CS411 Project Part 2b uses a modified schematic that was handed out. See the web for the schematic, part2b.jpg and part2b.ps

Run the following commands to check your work:

  make -f Makefile_411 part2b.out
  make -f Makefile_ghdl part2b.gout
  diff -iw part2b.out part2b.chk
  diff -iw part2b.gout part2b.chkg

It is OK if PCnext is different on clock 6. Part2b needs both data forwarding and hazards (stalls).

Submit all components and your main circuit as one plain text file using 'submit'. No makefiles or run files or output is to be submitted. Partial credit will be given based on the number of data forwards, jump, beq, and hazard stalls handled correctly.

Your circuit will not be tested with jump or branch or data addresses greater than 10 bits. In other words, your instruction and data memories do not need to be bigger than 1024 words.

You may not get exactly the .chk results. (With a different PCnext on clock 6, and possibly a few more PC_next differences, you still get 100.) Timing and stalls will be graded. Points will be deducted for memory or register differences or improper stalls.

For additional debugging, download and insert debug_stall.vhdl, fix signal names if yours are different, and

  diff -iw part2b.out part2b_print.chk
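The stall mechanism described above (hold ID, let EX, MEM and WB proceed, and keep the clock from reaching IF and ID) can be sketched for one case, an lw in EX followed by an instruction in ID that reads the loaded register. This is only a sketch under stated assumptions: the entity stall_sketch and its ports are hypothetical, the register field positions (25 downto 21, 20 downto 16) assume a MIPS-like layout that must be checked against cs411_opcodes.txt, and sclk is formed as clk or stall by analogy with the way dsclk is formed later on this page. It also ignores which fields the ID instruction really reads, so it can stall more than the minimum; hazard.txt is the reference for the real conditions.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical entity, for illustration only.
entity stall_sketch is
  port (clk   : in  std_logic;
        ID_IR : in  std_logic_vector(31 downto 0);  -- instruction in ID
        EX_IR : in  std_logic_vector(31 downto 0);  -- instruction in EX
        stall : out std_logic;
        sclk  : out std_logic);                     -- clock for IF and ID
end entity stall_sketch;

architecture behavior of stall_sketch is
  signal stall_i : std_logic;
begin
  -- lw in EX ("100011") whose destination field matches a source
  -- field of the instruction waiting in ID. Field positions assumed.
  stall_i <= '1' when EX_IR(31 downto 26) = "100011"
                      and EX_IR(20 downto 16) /= "00000"
                      and (EX_IR(20 downto 16) = ID_IR(25 downto 21) or
                           EX_IR(20 downto 16) = ID_IR(20 downto 16))
             else '0';
  stall <= stall_i;
  sclk  <= clk or stall_i;  -- held high during a stall: no rising edge
end architecture behavior;

library IEEE;
use IEEE.std_logic_1164.all;

entity stall_sketch_test is
end entity stall_sketch_test;

architecture test of stall_sketch_test is
  signal clk   : std_logic := '0';
  signal ID_IR : std_logic_vector(31 downto 0) := (others => '0');
  signal EX_IR : std_logic_vector(31 downto 0) := (others => '0');
  signal stall : std_logic;
  signal sclk  : std_logic;
begin
  dut: entity WORK.stall_sketch
    port map (clk => clk, ID_IR => ID_IR, EX_IR => EX_IR,
              stall => stall, sclk => sclk);

  check: process
  begin
    EX_IR <= x"8C220000";  -- "100011", fields 00001 and 00010 (loads register 2)
    ID_IR <= x"00441820";  -- major "000000", source fields 00010 and 00100
    wait for 1 ns;
    assert stall = '1' and sclk = '1'
      report "expected a stall" severity failure;
    ID_IR <= x"00A41820";  -- source fields 00101 and 00100, no match
    wait for 1 ns;
    assert stall = '0' and sclk = '0'
      report "unexpected stall" severity failure;
    wait;
  end process check;
end architecture test;
```

With data forwarding in place, a stall of this kind is needed only when the value cannot be forwarded in time, which is why Part2b needs both forwarding and stalls.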
Part3a: Copy your part2b.vhdl to part3a.vhdl
Substitute "part3a" for every "part2b".

  cp /afs/umbc.edu/users/s/q/squire/pub/download/part3a.abs .
  cp /afs/umbc.edu/users/s/q/squire/pub/download/part3a.run .
  cp /afs/umbc.edu/users/s/q/squire/pub/download/part3a.chk .
  cp /afs/umbc.edu/users/s/q/squire/pub/download/part3a.chkg .
  cp /afs/umbc.edu/users/s/q/squire/pub/download/part3a_print.chk .

Implement a cache in the instruction memory (read only).

  submit cs411 part3 part3a.vhdl

Put the cache inside the instruction memory component (entity and architecture). (You will need to pass a few extra signals in and out.) One output is the miss signal, set <= '1', '0' etc. Then in architecture part3a "or" the new signal into "stall". See part3a.ps

Use the existing shared memory data as the main memory. Make a miss on the instruction cache cause a three cycle stall. A cycle is 10 ns, thus a three cycle stall is 30 ns. Previous stalls from part2b must still work.

Added:

  add to end of entity instruction_memory
      miss : out std_logic);
  add code after local_miss <=
      miss <= '1', '0' after 30 ns;   -- to hold stall
  add to inst_mem: WORK.instruction_memory
      miss => IF_miss);               -- instruction fetch miss
  add to stall <= ...
      or IF_miss;
  -- define above
      signal IF_miss : std_logic := '0';

The instruction cache holds 16 words organized as four blocks of four words. Remember that VHDL memory is addressed by word address, the MIPS/SGI memory is addressed by byte address, and a cache is addressed by block number.

The cache schematic for the instruction cache was handed out in class and is shown in icache.jpg or cache.png

The cache may be implemented using behavioral VHDL, basically writing sequential code in VHDL, or by connecting hardware.

Possible behavioral VHDL, not required, to set up the start of a cache: (no partial credit for just putting this in your cache.)
Add in or out signals to entity instruction_memory as needed, for example 'clk', 'clear', 'miss', and make the corresponding changes at inst_mem:. Also add the code below, with additions.

  architecture behavior of instruction_memory is
    subtype block_type is std_logic_vector(154 downto 0);
    type cache_type is array (0 to 3) of block_type;
    signal cache : cache_type := (others=>(others=>'0'));
    signal local_miss : std_logic := '0';  -- needed between process calls
    -- now we have a cache memory initialized to zero
  begin -- behavior
    inst_mem: process(... add local_miss)  -- compute same as miss
      variable quad_word_address : natural; -- for memory fetch
      variable cblock : block_type;         -- the shaded block in the cache
      variable index : natural;             -- index into cache to get a block
      variable word : natural;              -- select a word
      variable my_line : line;              -- for debug printout
      alias tag : std_logic_vector(25 downto 0) is cblock(153 downto 128);
      alias w0 : std_logic_vector(31 downto 0) is cblock(127 downto 96);
      alias valid : std_logic is cblock(154);
      -- other alias allowed ...
    begin
      if clear = '1' then
        miss <= '0';
        inst <= x"00000000";
      end if;
      if clear = '0' then
        index := to_integer(addr(5 downto 4));
        word := to_integer(addr(3 downto 2));
        cblock := cache(index); -- has valid (154), tag (153 downto 128)
                                -- W0 (127 downto 96), W1 (95 downto 64)
                                -- W2 (63 downto 32), W3 (31 downto 0)
                                -- cblock is the shaded block in handout
        if (valid = '1') and (tag = addr(31 downto 6)) then -- hit
          -- ... do hit
        else -- miss
          -- ... do miss, get 4 words from memory, set tag and valid ...
          quad_word_address := to_integer(addr(13 downto 4));
          w0 := memory(quad_word_address*4+0);
          w1 := memory(quad_word_address*4+1);
          -- ...
          -- fill in cblock with new words, then
          cache(index) <= cblock after 30 ns;  -- 3 clock delay
          miss <= '1', '0' after 30 ns;        -- miss is '1' for 30 ns
          local_miss <= '1', '0' after 30 ns;  -- to get process to run
          -- this "miss" signal gets ored into part2b "stall" signal
          ...
          -- the part3a.chk file has 'inst' set to zero while 'miss' is 1
          -- not required but cleans up the "diff"
        end if;
      end if; -- clear = '0'

Remember to or the cache miss signal into the stall signal; that takes care of the sclk signal.

I was a bit extreme in computing the miss signal in the cache. I did not use a hit signal, yet I had a local_miss signal:

  signal local_miss : std_logic := '0'; -- saved between calls

My process had

  process(addr, clear, local_miss)

More information, including debug print, is in Lecture 24 and debug.txt

For debugging your cache, you might find it convenient to add this 'debug' print process right after end process inst_mem;

  debug: process -- used to print contents of I cache, diff part3a_print.chk
    variable my_line : LINE; -- not part of working circuit
  begin
    wait for 9.5 ns; -- just before rising clock
    for I in 0 to 3 loop
      write(my_line, string'("line="));
      write(my_line, I);
      write(my_line, string'(" V="));
      write(my_line, cache(I)(154));
      write(my_line, string'(" tag="));
      hwrite(my_line, cache(I)(151 downto 128)); -- ignore top bits
      write(my_line, string'(" w0="));
      hwrite(my_line, cache(I)(127 downto 96));
      write(my_line, string'(" w1="));
      hwrite(my_line, cache(I)(95 downto 64));
      write(my_line, string'(" w2="));
      hwrite(my_line, cache(I)(63 downto 32));
      write(my_line, string'(" w3="));
      hwrite(my_line, cache(I)(31 downto 0));
      writeline(output, my_line);
    end loop;
    wait for 0.5 ns; -- rest of clock
  end process debug;

And add in front of the instruction_memory architecture:

  use STD.textio.all;
  use IEEE.std_logic_textio.all;

Then

  diff -iw part3a.out part3a_print.chk

See part3a_print.chk with debug. You may print out signals such as 'miss' using prtmiss from debug.txt

For grading reasons, keep the signal names that are pipeline registers and the component/memory names.
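The relationship among byte address, word address, block number and tag used in the cache code above can be checked with a small worked example. The sketch below is a stand-alone, self-checking test bench; the entity name cache_addr_sketch is hypothetical. It uses numeric_std and casts through unsigned, which is an assumption on my part: the project code calls to_integer directly on the address slices, so use whatever conversion the starter files provide.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;

-- Hypothetical stand-alone check, not part of the project files.
entity cache_addr_sketch is
end entity cache_addr_sketch;

architecture test of cache_addr_sketch is
begin
  check: process
    variable addr  : std_logic_vector(31 downto 0);  -- byte address
    variable index : natural;  -- which of the four cache blocks
    variable word  : natural;  -- which of the four words in the block
    variable quad_word_address : natural;             -- block number in memory
    variable tag   : std_logic_vector(25 downto 0);
  begin
    addr  := x"00000058";      -- byte address 88 = word address 22
    index := to_integer(unsigned(addr(5 downto 4)));
    word  := to_integer(unsigned(addr(3 downto 2)));
    tag   := addr(31 downto 6);
    quad_word_address := to_integer(unsigned(addr(13 downto 4)));

    assert index = 1 report "index should be 1" severity failure;
    assert word = 2  report "word should be 2"  severity failure;
    assert tag = std_logic_vector(to_unsigned(1, 26))
      report "tag should be 1" severity failure;
    -- the block holds word addresses 20, 21, 22, 23
    assert quad_word_address*4 = 20
      report "block should start at word 20" severity failure;
    -- the requested word is the block start plus the word select
    assert quad_word_address*4 + word = 22
      report "word address should be 22" severity failure;
    wait;
  end process check;
end architecture test;
```

Bits 1 downto 0 of the byte address are not used because instructions are word aligned, and bits 5 downto 2 together select one of the 16 cached words.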
  make -f Makefile_411 part3a.out
  make -f Makefile_ghdl part3a.gout
  diff -iw part3a.out part3a.chk
  diff -iw part3a.gout part3a.chkg
  diff -iw part3a.out part3a_print.chk

You submit on GL using:

  submit cs411 part3 part3a.vhdl

Ignore the difference in PC_next in clock 27. Ignore PC not zero if the results in registers and memory are the same.

Part3b: Copy your part3a.vhdl to part3b.vhdl
Substitute "part3b" for every "part3a".

  cp /afs/umbc.edu/users/s/q/squire/pub/download/part3b.abs .
  cp /afs/umbc.edu/users/s/q/squire/pub/download/part3b.run .
  cp /afs/umbc.edu/users/s/q/squire/pub/download/part3b.chk .
  cp /afs/umbc.edu/users/s/q/squire/pub/download/part3b.chkg .
  cp /afs/umbc.edu/users/s/q/squire/pub/download/part3b_print.chk .

Implement a cache in the data memory (read/write).

  submit cs411 part3 part3b.vhdl

Put the cache inside the data memory entity and process. Almost all the code from the instruction cache, part3a, can be copied and used inside the data memory for the data cache. (You will need to pass a few extra signals in and out.) See part3b.ps

Use the existing shared memory data as the main memory. Make a miss on the data cache cause a three cycle stall of all pipeline stages. (You will need another signal similar to sclk in order to stall the EX, MEM and WB stages, e.g. dsclk. dsclk replaces clk for all registers in the EX, MEM and WB stages.) A cycle is 10 ns, thus a three cycle stall is 30 ns. Previous stalls from part2b and part3a must still work.

Change MEMread : std_logic := '1'; to MEMread : std_logic := '0'; for part3b.

Do a write through cache for the data memory. (It must work to the point that results in main memory are correct at the end of the run and the timing is correct. There is partial credit for partial functionality with correct timing for the stalls.)

Then test part3b.vhdl with the data cache:

  make -f Makefile_411 part3b.out
  make -f Makefile_ghdl part3b.gout
  diff -iw part3b.out part3b.chk
  diff -iw part3b.gout part3b.chkg

  submit cs411 part3 part3b.vhdl

Submit all components and your main circuit as one plain text file by using 'submit'. No makefiles or run files or output is to be submitted. Partial credit will be given based on correct timing, the number of instructions simulated correctly, the number of hazards handled correctly, and proper operation of the data cache. Of course, the instruction cache must work before the data cache is graded.
I use dsclk in the EX, MEM, and WB stages in place of clk:

  clk200 <= clk after 200 ps;  -- slight delay
  dsclk <= clk200 or dmiss;    -- dmiss out of data_memory
  stall gets added  or dmiss

dmiss out of data_memory is similar to miss out of instruction_memory. My code in data_memory is very much like my code in instruction_memory: the same cache structure in another cache, the L1 Dcache.

Add two signals to entity data_memory:

  clear
  miss     (miss will be called dmiss outside the data_memory)

data_memory does nothing unless either read_enable or write_enable is '1'. Use the code from the instruction_memory cache:

  very similar when read_enable is '1', reading from memory
  add code when write_enable is '1', writing into memory one word

Test read_enable and write_enable for both "hit" and "miss" cases.

Typical start of the data cache process:

  ...
  begin
    if clear='1' then
      miss <= '0';
    end if;
    if clear='0' and miss='0' and
       (read_enable='1' or
        (write_enable='1' and write_clk'event and write_clk='1')) then
      index := to_integer(address(5 downto 4));
      word := to_integer(address(3 downto 2));
      cblock := cache(index);
      ...
For debug, after end process data_mem; insert the following for part3b_print.chk:

  debug: process -- used to print contents of D cache, use part3b_print.chk
    variable my_line : LINE; -- not part of working circuit
  begin
    wait for 9.5 ns; -- just before rising clock
    for I in 0 to 3 loop
      write(my_line, string'("line="));
      write(my_line, I);
      write(my_line, string'(" V="));
      write(my_line, cache(I)(154));
      write(my_line, string'(" tag="));
      hwrite(my_line, cache(I)(151 downto 128)); -- ignore top bits
      write(my_line, string'(" w0="));
      hwrite(my_line, cache(I)(127 downto 96));
      write(my_line, string'(" w1="));
      hwrite(my_line, cache(I)(95 downto 64));
      write(my_line, string'(" w2="));
      hwrite(my_line, cache(I)(63 downto 32));
      write(my_line, string'(" w3="));
      hwrite(my_line, cache(I)(31 downto 0));
      writeline(output, my_line);
    end loop;
    wait for 0.5 ns; -- rest of clock
  end process debug;
  -- end architecture behavior; -- of data_memory

And add in front of the data_memory architecture:

  use STD.textio.all;
  use IEEE.std_logic_textio.all;
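The write through requirement means that every store updates main memory, and a store that hits also updates the cached copy so the two never disagree. The sketch below shows only that data movement, using the same 155-bit block layout as the instruction cache (valid 154, tag 153 downto 128, w0 to w3 below that). It is a minimal sketch under stated assumptions, not the data_memory process: the entity write_through_sketch and the procedure write_word are hypothetical, the miss signal and 30 ns stall timing are left out, a store that misses is simply not cached here (what your cache does on a store miss is your design decision), and numeric_std conversions are assumed.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;

-- Hypothetical stand-alone model, not part of the project files.
entity write_through_sketch is
end entity write_through_sketch;

architecture test of write_through_sketch is
  subtype word_type  is std_logic_vector(31 downto 0);
  subtype block_type is std_logic_vector(154 downto 0);
  type memory_type is array (0 to 1023) of word_type;  -- main memory, by word
  type cache_type  is array (0 to 3) of block_type;
begin
  check: process
    variable memory : memory_type := (others => (others => '0'));
    variable cache  : cache_type  := (others => (others => '0'));

    -- Store one word at a byte address: memory always, cache on a hit.
    procedure write_word(addr : in std_logic_vector(31 downto 0);
                         data : in word_type) is
      variable index : natural;
      variable word  : natural;
      variable hi    : natural;  -- top bit of the selected word in the block
    begin
      index := to_integer(unsigned(addr(5 downto 4)));
      word  := to_integer(unsigned(addr(3 downto 2)));
      hi    := 127 - 32*word;    -- w0 at 127, w1 at 95, w2 at 63, w3 at 31
      -- write through: main memory is updated on every store
      memory(to_integer(unsigned(addr(11 downto 2)))) := data;
      -- hit: keep the cached word identical to main memory
      if cache(index)(154) = '1' and
         cache(index)(153 downto 128) = addr(31 downto 6) then
        cache(index)(hi downto hi-31) := data;
      end if;
    end procedure write_word;
  begin
    -- pretend block 1 already holds the block with tag 1
    cache(1)(154) := '1';
    cache(1)(153 downto 128) := std_logic_vector(to_unsigned(1, 26));

    write_word(x"00000058", x"DEADBEEF");  -- index 1, tag 1, word 2: hit
    assert memory(22) = x"DEADBEEF" report "memory not written" severity failure;
    assert cache(1)(63 downto 32) = x"DEADBEEF"
      report "cached word not updated on hit" severity failure;

    write_word(x"00000018", x"12345678");  -- index 1, tag 0, word 2: miss
    assert memory(6) = x"12345678" report "memory not written" severity failure;
    assert cache(1)(63 downto 32) = x"DEADBEEF"
      report "cache changed on a store miss" severity failure;
    wait;
  end process check;
end architecture test;
```

Because main memory is always current, the results in main memory are correct at the end of the run whether or not a given store hit in the cache.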
Last updated 12/3/2020