CMSC 611 Fall 2001     Common Project Description

 

Implement a DLX Simulator using VHDL

The goal is to implement the DLX processor in the VHDL hardware description language. The implementation should contain all five pipeline stages as described in the text.

The EX stage should support both integer and floating-point operations. The instructions that you have to handle are listed in the opcode table along with the necessary information. You should have one integer unit, one floating-point adder, one floating-point multiplier and one floating-point divider. The integer unit can handle one operation per clock cycle. The latency is 2 clock cycles for the FP adder, 5 for the FP multiplier and 19 for the FP divider. All the floating-point units should be pipelined internally, i.e. if there are no hazards, the FP adder should be able to work on two FP adds at the same time (a new add can start before the previous one completes). The number and contents of both the integer and floating-point registers should be according to the text. Floating-point numbers will follow the IEEE 754 standard. You should have a floating-point status register (FPSR). Register writes should be done on the rising edge of the clock (so that they happen before "reads"). The pipeline registers are updated on the falling edge of the clock.
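To make the "pipelined internally" requirement concrete, here is a minimal behavioral sketch in Python (not VHDL). It assumes that "latency" means the number of cycles between an operation entering a unit and its result becoming available; the names `FP_LATENCY` and `completion_cycles` are illustrative and do not come from the text.

```python
# Latencies from the project description, in clock cycles.
FP_LATENCY = {"add": 2, "mul": 5, "div": 19}

def completion_cycles(issue_cycles, latency, pipelined=True):
    """Return the cycle in which each operation's result is available.

    issue_cycles: increasing cycle numbers at which operations reach the unit.
    pipelined:    if True, a new operation may enter every cycle;
                  if False, the unit is busy until the previous one finishes.
    """
    done = []
    unit_free_at = 0
    for issue in issue_cycles:
        start = issue if pipelined else max(issue, unit_free_at)
        finish = start + latency
        unit_free_at = finish
        done.append(finish)
    return done

# Two back-to-back FP adds with no hazards between them.
print(completion_cycles([0, 1], FP_LATENCY["add"]))                   # pipelined
print(completion_cycles([0, 1], FP_LATENCY["add"], pipelined=False))  # not pipelined
```

With a pipelined adder the second add finishes one cycle after the first; without internal pipelining it would have to wait for the unit to become free.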

You should check for and avoid all possible hazards. Forwarding should be implemented to take care of hazards. The forwarding paths required for the integer unit are tabulated in Table 3.19 on page 160 of the text. You will have to implement forwarding for the FP units in a similar fashion. You should also be able to forward between the FP and integer units if required. If forwarding cannot prevent a hazard, the pipeline should be stalled. Some or all stages might have to be stalled, depending on the type of hazard encountered. You should be able to flush the pipeline in case of control hazards. Exceptions should be flagged, but you don't have to include code to handle them when they arise. Branches should be resolved in the ID stage, i.e. the test and the branch target calculation are to be performed in this stage.
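The forwarding and stall decisions for the integer unit can be sketched as two small functions. This is a simplified Python model, not the full set of paths in Table 3.19: it assumes a single ALU operand, uses `None` for "no destination register", and the function names are hypothetical. It relies on R0 always reading as zero, as in DLX.

```python
def forward_source(src_reg, ex_mem_dest, mem_wb_dest):
    """Decide where an ALU operand in the EX stage should come from.

    The most recent producer (EX/MEM) takes priority over the older
    one (MEM/WB). R0 is never forwarded because it always reads zero.
    """
    if src_reg == 0:
        return "regfile"
    if ex_mem_dest == src_reg:
        return "EX/MEM"
    if mem_wb_dest == src_reg:
        return "MEM/WB"
    return "regfile"

def needs_load_use_stall(id_src_regs, ex_is_load, ex_dest):
    """True if the instruction in ID needs a value that a load in EX
    has not fetched yet; forwarding alone cannot remove this hazard."""
    return bool(ex_is_load and ex_dest not in (None, 0) and ex_dest in id_src_regs)

print(forward_source(3, ex_mem_dest=3, mem_wb_dest=3))
print(needs_load_use_stall([1, 2], ex_is_load=True, ex_dest=2))
```

Because branches are resolved in ID in this project, a branch that depends on a result still in EX needs its own stall or forwarding case, which this sketch does not cover.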

There should be separate data and instruction caches, each 64 KB in size. You can use any block size less than or equal to 4 KB, and the caches can be direct mapped, set associative or fully associative. You can use any replacement policy you want. Assume reasonable penalties for cache hits and misses. Stall the whole pipeline in case of a cache miss. The size of the main memory should be 256K. You should be able to load the memory and the PC from a file containing hex values, and should be able to start executing once the memory is loaded. Flush the caches to memory when execution is over.
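Whichever organization you choose, the cache has to split each address into tag, index and block offset. The following Python sketch shows the arithmetic for a 64 KB cache. The 16-byte block size and the function name `split_address` are assumptions chosen for illustration; the block size and the number of ways must be powers of two for the bit masks to work.

```python
def split_address(addr, cache_bytes=64 * 1024, block_bytes=16, ways=1):
    """Split a byte address into (tag, index, offset).

    ways=1 models a direct-mapped cache; setting ways to the total
    number of blocks models a fully associative cache (index is always 0).
    """
    num_sets = cache_bytes // (block_bytes * ways)
    offset_bits = block_bytes.bit_length() - 1   # log2(block_bytes)
    index_bits = num_sets.bit_length() - 1       # log2(num_sets)
    offset = addr & (block_bytes - 1)
    index = (addr >> offset_bits) & (num_sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset

tag, index, offset = split_address(0x12345678)
print(hex(tag), hex(index), hex(offset))
```

For the direct-mapped case above there are 4096 sets, so the low 4 bits are the offset, the next 12 bits are the index, and the remaining high bits form the tag.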

Assume a reasonable clock frequency for the processor. Make sure that you can handle all the instructions in the opcode table in any order. A sample input file will be provided which can be used to test your code. The output from your run should be a file that dumps the memory locations along with their addresses. The output format should be the same as in the sample input file. Any other relevant information will be posted under this link.

VHDL help is available here (maintained by Jon Squire) or
here (in the sections titled "Cadence" and "Cadence Tutorials", maintained by Chintan Patel).
Until you get comfortable with VHDL, stick to the instructions on one site or the other, but not both (there are slight differences in environment setup). This should avoid most problems running VHDL using Cadence.

For further info, you can always do a (web) search for VHDL and related topics.
Jon Squire and Chintan Patel (among local VHDL experts) have kindly agreed to help with VHDL-related questions. Jon Squire has office hours from 3 to 5 pm, Monday through Thursday. Chintan is accessible via email (cpatel2@csee.umbc.edu).


The test file that you need to run in assembly code format (.s) is here

The first line in the .s file (with the start: label) gives the value of the PC where execution should start. The instructions and data are to be stored in memory starting at location 00000000. The address of each consecutive instruction is obtained by incrementing the current address by 4. Make sure you can load instructions from a similarly formatted .s file into the memory of your implementation.
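The addressing rule above (start at 00000000, add 4 per word) can be modeled as follows. This Python sketch does not parse the actual .s file, whose layout is defined by the provided test file; it assumes the 32-bit words have already been extracted as hex strings, and the dump layout (address, then word) is hypothetical.

```python
def load_words(hex_words, base=0x00000000):
    """Place 32-bit words at consecutive addresses, 4 bytes apart."""
    memory = {}
    for i, text in enumerate(hex_words):
        memory[base + 4 * i] = int(text, 16) & 0xFFFFFFFF
    return memory

def dump(memory):
    """Return one 'ADDRESS WORD' line per stored word, in address order."""
    return [f"{addr:08X} {word:08X}" for addr, word in sorted(memory.items())]

# Arbitrary example words, not taken from the test file.
mem = load_words(["20010005", "00000000", "DEADBEEF"])
for line in dump(mem):
    print(line)
```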

A brief report (no more than 3 to 4 pages) should explain your design parameters (e.g. memory and cache miss stall cycles). DO NOT REPEAT MATERIAL FROM THE PROJECT DESCRIPTION AND/OR TEXTBOOK.

Analyze the test file, list the stalls encountered and report the total time required by your implementation to run the code. During the demo, we will look for conformance with the timing you report in the write-up.
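One way to cross-check the reported total time is the usual pipeline estimate: the first instruction needs one cycle per stage, every later instruction adds one cycle, and each stall adds one more. The Python sketch below assumes this simple model; the instruction count, stall count and clock period in the example are made-up numbers, and extra cycles spent by a long FP operation at the end of the program would have to be counted as stalls.

```python
def total_cycles(num_instructions, stall_cycles, stages=5):
    """Cycles to run a program on an in-order pipeline:
    (stages - 1) cycles to fill, one per instruction, plus stalls."""
    return (stages - 1) + num_instructions + stall_cycles

def total_time_ns(cycles, clock_period_ns):
    """Convert a cycle count to time for a given clock period."""
    return cycles * clock_period_ns

cycles = total_cycles(num_instructions=10, stall_cycles=3)
print(cycles, "cycles,", total_time_ns(cycles, clock_period_ns=10), "ns")
```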

For the demo: put write statements in the write-back stage to print to a file the simulation time, the clock cycle number, the instruction in the WB stage, the ID of the register that the instruction writes and the data value. Output a memory dump at the end of execution. Reduce the memory and cache sizes to the minimum necessary to speed up the simulations during the demo.

The demo dates are the 20th and 21st, from noon to 4 pm. Estimate: half an hour per group. Your code must be compiled, elaborated and ready for simulation before your slot starts (otherwise you may not finish the demo and may lose most of the points).
If ALL the participants in a group cannot make it between 12 and 4, we'll work out alternate time slots.

Time slot sign up sheets will be outside my door.

Further queries should be directed to (who else?) Chintan.