CMSC 611 Homework

CS611 Details of homework assignments

    The most important item on all homework is YOUR NAME!
    No name, no credit. ALSO, put last 4 digits of SS#.
    Staple or clip pages together.

Homework must be submitted when due. You lose 10% (one grade) the first day homework is late, then 10% each week thereafter, to a maximum of 50% off. A zero really hurts your average! Paper or EMail to squire@cs.umbc.edu is acceptable. If I cannot read or understand your homework, you do not get credit. Type or print if your handwriting is bad. Homework is always due on a scheduled class day, within 15 minutes after the start of the class. If class is canceled, homework is due the next time the class meets.

  EMail only plain text! No word processor formats.
       You may use a word processor or other software tools and
       print the results and turn in paper.
       Put CS611 and HW number in subject line.

Some homework must be "submitted"

 The "submit" facility only works on the "irix.gl.umbc.edu"
 and "linux.gl.umbc.edu" machines.

 The student commands are:
    submit   cs611 HW6 file   puts your "file" into cs611 HW6
    submitrm cs611 HW6 file   removes your "file" from cs611 HW6
    submitls cs611 HW6        lists your files in cs611 HW6

    Note: "HW" is upper case
       a) you must have your userid registered for "submit"
          send mail from a gl machine to squire if your submit fails
       b) you have to be logged onto a gl machine, SSH or telnet are OK
       c) everything is case sensitive, remember the uppercase HW.

Do your own homework!

You can discuss homework with other class members but DO NOT COPY!

HW1 Amdahl's Law 25 points

  You must show your work, not just the answer.
  Book Page 60, Exercise 1.2
  Book Page 62, Exercise 1.7

HW2 CPI 25 points

  You must show your work, not just the answer.
  Book Page 121, Exercise 2.11
  Book Page 122, Exercise 2.12

HW3 Pipelines 25 points

  You must show your work, not just the answer.
  Book Page 214, Exercise 3.1
  Book Page 219, Exercise 3.12

HW4 SuperScalar DLX 25 points

  1. Does a DLX sequence of instructions exist that must
     stall in a "scoreboard" machine (Figure 4.3, Page 244),
     yet will not stall in a Tomasulo machine
     (Figure 4.8, Page 253)? (yes or no)

  2. Does a DLX sequence of instructions exist that must
     stall in a Tomasulo machine (Figure 4.8, Page 253),
     yet will not stall in a "scoreboard" machine
     (Figure 4.3, Page 244)? (yes or no)

  3. Using the paper "Combining Branch Predictors" by Scott
     McFarling, use the instruction trace given below and
     update a Bimodal Predictor, Figure 1, for the case:
     two PC bits, four entries in the count vector, two
     bit counts as described in the paper.
     Initialize all counts to 10 base 2 (different from paper)

     a) show the four count values at the end of the trace.
     b) keep a count of correct predictions as you update
        the counts in order to give the percent predicted
        correctly.

     The following trace, as decimal integers, represents the
     sequence of PC values (low order bits) of conditional branches.
     The letter T following a PC value indicates the branch was
     taken; a PC value with no letter means it was not taken.
     1T, 1T, 1T, 1, 3T, 2, 1T, 1T, 1T, 1, 3, 2, 1T, 1T, 1T, 1,
     0T, 0T, 0T, 0, 3T, 2, 1T, 1T, 1T, 1, 2, 3

     (Sample output with dummy answers should look like:
           0  00    <-- initially  10  in all cases
           1  11
           2  01
           3  11
              50%)
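
     The update rule can be sketched in Python. This is a minimal
     sketch, assuming the usual reading of the paper's bimodal
     predictor: each two-bit count saturates at 0 and 3, goes up on a
     taken branch and down otherwise, and predicts taken when the
     count is 2 or 3. The function name run_bimodal, the (pc, taken)
     trace encoding and the short demo trace are illustrative
     assumptions; the demo trace is not the homework trace.

```python
def run_bimodal(trace, entries=4, initial=0b10):
    """Simulate a bimodal predictor; return (final counts, correct predictions)."""
    counts = [initial] * entries
    correct = 0
    for pc, taken in trace:
        i = pc % entries                   # low-order PC bits select a counter
        predicted_taken = counts[i] >= 2   # high bit of the two-bit count
        if predicted_taken == taken:
            correct += 1
        # Saturating update: up on taken, down on not taken.
        if taken:
            counts[i] = min(counts[i] + 1, 3)
        else:
            counts[i] = max(counts[i] - 1, 0)
    return counts, correct


# Hypothetical five-branch trace, used only to show the mechanics.
demo = [(1, True), (1, False), (2, False), (2, False), (2, True)]
counts, correct = run_bimodal(demo)
for i, c in enumerate(counts):
    print(i, format(c, "02b"))
print(f"{100 * correct / len(demo):.0f}%")
```

     Printing each count as two binary digits followed by the percent
     correct mirrors the sample output layout.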

HW5 Cache 25 points

 
Draw the diagram and compute values for the cache system
described below. The diagram can be drawn freehand but
must be neat enough to be read. Use a level of detail similar
to the class handout on caches. Show at least the
tag comparators and the "and" gate with the valid bit. Show all
four rows of the L1 cache, and about 8 rows of the 65,536 rows
of the L2 cache. Use this diagram to hand simulate the caches'
actions when running the address sequence below.

  L1 instruction cache for a DLX machine,
  2-way associative, block size is 4 words (16 bytes),
  index field in PC is two bits (i.e. 4 rows of two blocks each)
  LRU (Least recently used) replacement policy.

  Thus PC bits are  +--------------+-------+---------+----------+
                    |  tag         | index | word    | byte     |
                    |              |       | select  | select   |
                    +--------------+-------+---------+----------+
    bit number       31          6  5    4  3       2 1        0
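
  The field boundaries in the diagram can be checked with a few shifts
  and masks. This is a minimal sketch; the function name
  split_l1_address and the example address are illustrative
  assumptions, and the address is not taken from the homework sequence.

```python
def split_l1_address(pc):
    """Split a 32-bit PC into the L1 fields shown in the diagram."""
    byte_select = pc & 0x3          # bits 1..0
    word_select = (pc >> 2) & 0x3   # bits 3..2
    index = (pc >> 4) & 0x3         # bits 5..4
    tag = pc >> 6                   # bits 31..6
    return tag, index, word_select, byte_select


# Illustrative address only.
print(split_l1_address(0x0000012C))   # (tag, index, word select, byte select)
```

  Two addresses can share an L1 row only if their index fields match,
  and they hit the same block only if their tags also match.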

  Timing: the instruction is delivered in 1 ns for a hit,
  a miss requires that a block be filled from the L2 cache.
  (The 1 ns is still used by the L1 cache, even on a miss.)

  L2 general cache, direct mapped, block size is 4 words (16 bytes)
  index field is 16 bits (i.e. 65,536 blocks long)

  Timing: four words are delivered to L1 in 8 ns for a hit,
  a miss requires that a block be filled from RAM.
  Assume the 8 ns includes time to get the address,
  put the four words on the bus into the L1 cache and
  raise the "L2 hit" signal. (The data from RAM flows
  through the L2 cache on the way to the L1 cache,
  thus the 8 ns is used by the L2 cache, even on a miss.) 


  RAM, 128 bit bus, (16 bytes) (4 words) delivered to L2 in 20 ns
  Assume the 20 ns includes time to get the address, fetch
  the data, put the data on the bus and raise the "data_ready"
  signal for the L2 cache.

  All "valid" bits are initially zero.
  From the above, the first instruction takes 1 + 8 + 20 = 29 ns.
  Other facts: The memory to L2 cache bus is 128 bits wide.
               The L2 to L1 bus is 128 bits wide, thus no word
               select multiplexer on the L2 cache.
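
  The timing rules above can be written as a small function. This is a
  sketch, assuming the three latencies simply add as described (the L1
  time is always spent, the L2 time is spent on any L1 miss, and the
  RAM time is spent only when both caches miss). The names are
  illustrative.

```python
L1_NS, L2_NS, RAM_NS = 1, 8, 20   # latencies stated in the assignment


def access_time_ns(l1_hit, l2_hit=False):
    """Time to deliver one instruction given the hit/miss outcome at each level."""
    if l1_hit:
        return L1_NS
    if l2_hit:
        return L1_NS + L2_NS
    return L1_NS + L2_NS + RAM_NS


print(access_time_ns(False, False))   # both caches miss, as for the first instruction
```

  Summing this value over a hit/miss table gives the total delivery
  time for a sequence.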

  Given the sequence of PC addresses below,
  1) What is the total time to deliver all instructions? (ns)
  2) What is the average time to deliver an instruction? (ns)
  3) What is the L1 cache miss rate? (fraction)
  4) What is the L2 cache miss rate, looking at only the L2 cache?
     (fraction)
  5) Assume one clock per nanosecond (ns). What is the average CPI?
     (xx.xx clocks per instruction)

  6) Show hit or miss on each cache for each PC.
     (Do this first, of course!)

                    L1   L2 
  PC:  00000000                <--- show H for hit, M for miss
       00000010                     blank for unused.
       00000020
       00000030                     for each address for L1 and L2
       00000040
       00000004
       00000008                         * change, was 6
       00000080
       00000044
       00000008
       440001F4
       110003B8                         * change, 8 was 6
       00000038

HW6 VHDL 25 points

 
Write the VHDL code to perform an IEEE 754 Floating point add.
You are given two 32-bit floating point numbers that are to
be added to produce a third floating point number.

Simplifications you may use include:
  Input numbers are normalized.
  No overflow or underflow or denormalization will occur.
  No rounding is necessary (either use truncation or round toward zero.)
  Use VHDL add, subtract, shift and other operators as needed.
  You do not have to go to the gate level.
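
  Before the add itself, each operand has to be separated into its
  sign, exponent and fraction fields. The Python sketch below shows
  only that field layout for a 32-bit IEEE 754 number; it is not VHDL
  and not a solution, and the function name unpack_float32 is an
  illustrative assumption.

```python
import struct


def unpack_float32(x):
    """Return (sign, biased exponent, 23-bit fraction) of x as an IEEE 754 single."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31                 # bit 31
    exponent = (bits >> 23) & 0xFF    # bits 30..23, biased by 127
    fraction = bits & 0x7FFFFF        # bits 22..0, implied leading 1 not stored
    return sign, exponent, fraction


print(unpack_float32(1.0))
print(unpack_float32(-2.5))
```

  For normalized inputs the significand is the stored fraction with an
  implied leading 1, which has to be restored before aligning and
  adding.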

  Use fp_add_test.vhdl as a start.
  Fill in the  architecture behavior of fp_add  to do the
  IEEE floating point add.

  Choose some reasonable test data for "a" and "b" in the test bench.

  The handout in class shows the commands needed on sunserver1.cs.umbc.edu

  A previous handout shows commands for linux.gl.umbc.edu, but you
  will have to delete a few words to make it VHDL-87 rather than VHDL-93.

  Look at VHDL help for more information.

  Compile and run. When reasonably correct, do a submit on gl.umbc.edu:

  submit cs611 HW6 fp_add_test.vhdl

Midterm exam. 15% of course grade

  Closed book. Short answer, numeric problems and some multiple choice.
  Numerical problems will be on CPI, Amdahl's Law, Pipelining,
  Branch Prediction, Cache and IEEE Floating Point.

  Exam covers book:     1.5, 1.6,
                        2.3, 2.8,
                        3.1-3.5, 3.7, 3.9,
                        4.2, 4.4
                        5.1-5.5
              lectures: 1 through 14 excluding Introduction and VHDL
              homework: 1 through 5 
              papers:   McFarling "Combining Branch Predictors"

HW7 I/O Timing 25 points

 
Based on the textbook and lectures, answer the following.
Show your work.

Q1. Given a PCI bus running at 66MHz, 64 bits wide,
    what is the maximum bandwidth in MHz?
    (I could have asked for Mb/sec, same number, yet MB/sec is wrong!)

Q2. Given an Ultra SCSI 160MB/s controller and a disk drive that
    spins at 10,000 rpm, with an average seek of 6ms and a 160MB/s
    transfer rate, defragmented.

    How long does it take, in seconds, to transfer 3.2MB where
    each disk transfer is a 32KB block?

a)  assuming 1/3 average seek time for first block and average
    rotational delay for all blocks.

b)  like a) but assuming a 4MB internal disk buffer so that only
    the first block has a rotational delay penalty.

    All numbers in decimal, K=1,000, M=1,000,000
    Defragmented disks will typically only pay the seek penalty
    on the first block. The internal disk buffer can prevent
    any rotational delay penalty, assuming there is room for
    read-ahead. Assume ideal timing.
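
    The per-block time components can be sketched as two helper
    functions. This is a sketch under the decimal unit convention
    above, assuming the average rotational delay is half of one
    revolution. The function names and the drive parameters in the
    example are hypothetical and deliberately differ from Q2.

```python
K, M = 1_000, 1_000_000   # decimal units, as the assignment specifies


def avg_rotational_delay_s(rpm):
    """Average rotational delay: half of one revolution, in seconds."""
    return 0.5 * 60.0 / rpm


def transfer_time_s(num_bytes, bytes_per_second):
    """Time to move num_bytes at a sustained rate, in seconds."""
    return num_bytes / bytes_per_second


# Hypothetical drive: 7,200 rpm, 100 MB/s, one 64 KB block.
print(avg_rotational_delay_s(7200))
print(transfer_time_s(64 * K, 100 * M))
```

    A total is then the seek time charged once, plus the rotational
    delay for each block that pays it, plus the transfer time for
    every block.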

Q3. How many raw bytes must be stored to have one hour of
    music played at 44.1 kHz with 16 bits coming from each of
    two channels?

Q4. What bandwidth in MHz (one bit per clock) is needed to continuously
    read 4.7GB of digital data from a 12X DVD in 10 minutes?
    (ignore seek and rotational delay, G=1,000,000,000)

HW8 Little's Law 25 points

 
Given a queue/server model M/M/4
Given an average-arrival-rate  20 tasks per second

Q1.   Given a single-server-utilization of 80%
  a)  What is the average-single-server-rate in tasks per second ?
  b)  What is the average-time-in-queue for a task ?
  c)  What is the average-time-in-system for a task ?
  d)  What is the average-tasks-in-queue ?

Q2.  What is the maximum-tasks-in-queue ?
     Possible short answers:
       average-tasks-in-queue / single-server-utilization
       about ten times the average-tasks-in-queue
       unbounded

Q3.  Given that we want the average-tasks-in-queue to be 10 tasks,
     (Still M/M/4 and average-arrival-rate of 20 tasks per second)
     What single-server-utilization is needed?
     (Answer as a percentage within 2% gets full credit.)

Comments: None should be needed and these may be redundant,
          yet, in order to prevent long lines asking questions:
          M/M/4 technically stands for
          First M means memory-less random distribution of arrivals
          Second M means memory-less random distribution of service times
          4 means the single queue is feeding four servers

          For a server problem it is reasonable to assume an
          exponential probability distribution for each M.
          It is reasonable to assume the four servers equally
          divide the workload of one server that is four times as fast.
          It is reasonable to assume all servers have the same utilization.

          The equations in the textbook on pages 509 and 510 apply.
          The equations given in class represent the same equations
          that are in the textbook.

          As always, do not plug numbers into randomly selected equations.
          Try to understand what equations apply to the problem
          and check if your answers are intuitively reasonable.
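
          Two of the basic relations can be sketched in Python as a
          sanity check. This is a sketch, assuming the load is split
          evenly across the servers as stated above. The function
          names and the numbers are hypothetical, and the example
          uses values that differ from the homework.

```python
def utilization(arrival_rate, servers, service_rate_per_server):
    """Fraction of time each server is busy, with the load split evenly."""
    return arrival_rate / (servers * service_rate_per_server)


def mean_tasks_in_system(arrival_rate, mean_time_in_system):
    """Little's Law: mean tasks in system = arrival rate * mean time in system."""
    return arrival_rate * mean_time_in_system


# Hypothetical numbers: 30 tasks/s arriving, 3 servers at 20 tasks/s each,
# and a mean time in system of 0.1 s.
print(utilization(30, 3, 20))
print(mean_tasks_in_system(30, 0.1))
```

          A utilization at or above 1 means the queue grows without
          bound, which is a quick reasonableness check on any answer.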

HW9 25 points

  

  Not assigned in Fall 2000

Final Exam 50 points

 
  Comprehensive, about 1/3 pre midterm, 2/3 post midterm
  True/False, multiple choice, short answer
  In range 25 to 50 questions.

  The exam covers:
    Lectures 3 - 13, 16 - 28
    Homework 1 - 8
    McFarling Paper through bimodal
    IEEE 754 paper, floating point add, sub, mul, div
    Textbook  1.5
              3.2 - 3.7
              4.1 , 4.2 , 4.4 , 4.8
              5.2 - 5.7
              6.2 - 6.5
              7.2 , 7.5
              8.2 - 8.6
              A.3 - A.5
              B.3
              E.1


 Last updated 12/12/00