Cache "miss rate" is used as a measure of cache performance.
Given 10 accesses to a cache, 9 hits and 1 miss,
the miss rate = 1/10 = 10%.
Because there must always be compulsory misses, the miss rate can
never be zero. On some plots below, the miss rate is 1%,
meaning a 99% hit rate.

The importance of the plots is not the numbers but the trends.
Note that this was based on SPEC92, over 20 years ago. Programs were
much smaller back then, yet the trend for performance is the same today.
Caches are scaled up today: 1MB and 2MB caches are common and
8MB caches are available.

Cache performance depends on two factors:
  1) Cache size (bigger is better)
  2) Cache associativity (more is better)

A 4-way associative cache. Count the tag-equal comparators.

Cache performance depends on two factors:
  1) Cache size (bigger is better)
  2) Block size (more is usually better, but not for small caches!)

Caches hold a small part of memory in the CPU for fast access.
The following two sets of memory usage are from my computers and
show the size of some programs on Windows and Linux.

Memory usage on Windows XP: 37 processes
  Windows Explorer    18,104 KB   18 MB, too big for cache
  Firefox             21,216 KB
  Photoshop           29,496 KB
  etc.
  total              163,000 KB   163MB of 512MB used

You would want good performance by keeping most of a program in cache.
Thus, the need for caches in the megabytes.

Memory usage on RedHat Linux: 83 processes, 3 running
  X                   38,119 KB   way too big for cache
  Firefox             20,083 KB
  Gimp                 5,402 KB   with extras running
  etc.

Running top reports:
  306 MB memory used
  195 MB memory free
   14 MB memory buff

From: ps -Al     ## memory size in KB
F S   UID   PID  PPID  C PRI  NI ADR    SZ WCHAN  TTY          TIME CMD
4 S     0     1     0  1  75   0 -     345 schedu ?        00:00:04 init
1 S     0     2     1  0  75   0 -       0 contex ?        00:00:00 keventd
1 S     0     3     1  0  75   0 -       0 schedu ?        00:00:00 kapmd
1 S     0     4     1  0  94  19 -       0 ksofti ?        00:00:00 ksoftirqd_C
1 S     0     9     1  0  85   0 -       0 bdflus ?        00:00:00 bdflush
1 S     0     5     1  0  75   0 -       0 schedu ?        00:00:00 kswapd
1 S     0     6     1  0  75   0 -       0 schedu ?        00:00:00 kscand/DMA
1 S     0     7     1  0  75   0 -       0 schedu ?        00:00:00 kscand/Norm
1 S     0     8     1  0  75   0 -       0 schedu ?        00:00:00 kscand/High
1 S     0    10     1  0  75   0 -       0 schedu ?        00:00:00 kupdated
1 S     0    11     1  0  85   0 -       0 md_thr ?        00:00:00 mdrecoveryd
1 S     0    15     1  0  75   0 -       0 end    ?        00:00:00 kjournald
1 S     0    73     1  0  85   0 -       0 end    ?        00:00:00 khubd
1 S     0  1012     1  0  75   0 -       0 end    ?        00:00:00 kjournald
1 S     0  1137     1  0  85   0 -       0 end    ?        00:00:00 kjournald
1 S     0  3676     1  0  84   0 -     524 schedu ?        00:00:00 dhclient
5 S     0  3727     1  0  75   0 -     369 schedu ?        00:00:00 syslogd
5 S     0  3731     1  0  75   0 -     344 do_sys ?        00:00:00 klogd
5 S    32  3749     1  0  75   0 -     388 schedu ?        00:00:00 portmap
5 S    29  3768     1  0  75   0 -     391 schedu ?        00:00:00 rpc.statd
1 S     0  3812     1  0  75   0 -       0 end    ?        00:00:00 rpciod
1 S     0  3813     1  0  85   0 -       0 schedu ?        00:00:00 lockd
5 S     0  3825     1  0  84   0 -     343 schedu ?        00:00:00 apmd
5 S     0  3841     1  0  85   0 -    5014 schedu ?        00:00:00 ypbind
1 S     0  3945     1  0  75   0 -     372 pipe_w ?        00:00:00 automount
1 S     0  3947     1  0  75   0 -     372 pipe_w ?        00:00:00 automount
1 S     0  3949     1  0  75   0 -     372 pipe_w ?        00:00:00 automount
5 S     0  3968     1  0  85   0 -     879 schedu ?        00:00:00 sshd
5 S    38  3989     1  0  75   0 -     601 schedu ?        00:00:00 ntpd
1 S     0  4013     1  0  75   0 -       0 schedu ?        00:00:00 afs_rxliste
1 S     0  4015     1  0  75   0 -       0 end    ?        00:00:00 afs_callbac
1 S     0  4017     1  0  75   0 -       0 schedu ?        00:00:00 afs_rxevent
1 S     0  4019     1  0  75   0 -       0 schedu ?        00:00:00 afsd
1 S     0  4021     1  0  75   0 -       0 schedu ?        00:00:00 afs_checkse
1 S     0  4023     1  0  75   0 -       0 end    ?        00:00:00 afs_backgro
1 S     0  4025     1  0  75   0 -       0 end    ?        00:00:00 afs_backgro
1 S     0  4027     1  0  75   0 -       0 end    ?        00:00:00 afs_backgro
1 S     0  4029     1  0  75   0 -       0 end    ?        00:00:00 afs_cachetr
5 S     0  4037     1  0  75   0 -     354 schedu ?        00:00:00 gpm
1 S     0  4046     1  0  75   0 -     358 schedu ?        00:00:00 crond
5 S    43  4078     1  0  76   0 -    1226 schedu ?        00:00:00 xfs
1 S     2  4087     1  0  85   0 -     355 schedu ?        00:00:00 atd
4 S     0  4306     1  0  82   0 -     340 schedu tty1     00:00:00 mingetty
4 S     0  4307     1  0  82   0 -     340 schedu tty2     00:00:00 mingetty
4 S     0  4308     1  0  82   0 -     340 schedu tty3     00:00:00 mingetty
4 S     0  4309     1  0  82   0 -     340 schedu tty4     00:00:00 mingetty
4 S     0  4310     1  0  82   0 -     340 schedu tty5     00:00:00 mingetty
4 S     0  4311     1  0  82   0 -     340 schedu tty6     00:00:00 mingetty
4 S     0  4312     1  0  75   0 -     616 schedu ?        00:00:00 kdm
4 S     0  4325  4312  1  75   0 -   38119 schedu ?        00:00:02 X
5 S     0  4326  4312  0  77   0 -     877 wait4  ?        00:00:00 kdm
4 S 12339  4352  4326  0  85   0 -    1143 rt_sig ?        00:00:00 csh
0 S 12339  4393  4352  0  79   0 -    1034 wait4  ?        00:00:00 startkde
1 S 12339  4394  4393  0  75   0 -     785 schedu ?        00:00:00 ssh-agent
1 S 12339  4436     1  0  75   0 -    5012 schedu ?        00:00:00 kdeinit
1 S 12339  4439     1  0  75   0 -    5440 schedu ?        00:00:00 kdeinit
1 S 12339  4442     1  0  75   0 -    5742 schedu ?        00:00:00 kdeinit
1 S 12339  4444     1  0  75   0 -    9615 schedu ?        00:00:00 kdeinit
0 S 12339  4454  4436  0  75   0 -    2149 schedu ?        00:00:00 artsd
1 S 12339  4474     1  0  75   0 -   10689 schedu ?        00:00:00 kdeinit
0 S 12339  4481  4393  0  75   0 -     341 schedu ?        00:00:00 kwrapper
1 S 12339  4483     1  0  75   0 -    9466 schedu ?        00:00:00 kdeinit
1 S 12339  4484  4436  0  75   0 -    9772 schedu ?        00:00:00 kdeinit
1 S 12339  4486     1  0  75   0 -    9908 schedu ?        00:00:00 kdeinit
1 S 12339  4488     1  0  75   0 -   10299 schedu ?        00:00:00 kdeinit
1 S 12339  4489  4436  0  75   0 -    5085 schedu ?        00:00:00 kdeinit
1 S 12339  4493     1  0  75   0 -    9698 schedu ?        00:00:00 kdeinit
0 S 12339  4494  4436  0  75   0 -    2942 schedu ?        00:00:00 pam-panel-i
4 S     0  4495  4494  0  75   0 -     389 schedu ?        00:00:00 pam_timesta
1 S 12339  4496  4436  0  75   0 -    9994 schedu ?        00:00:00 kdeinit
1 S 12339  4497  4436  0  75   0 -   10010 schedu ?        00:00:00 kdeinit
1 S 12339  4500     1  0  75   0 -    9503 schedu ?        00:00:00 kalarmd
0 S 12339  4501  4496  0  75   0 -    1165 rt_sig pts/2    00:00:00 csh
0 S 12339  4502  4497  0  75   0 -    1159 rt_sig pts/1    00:00:00 csh
0 S 12339  4546  4501  0  85   0 -    1039 wait4  pts/2    00:00:00 firefox
0 S 12339  4563  4546  0  85   0 -    1048 wait4  pts/2    00:00:00 run-mozilla
0 S 12339  4568  4563  1  75   0 -   20083 schedu pts/2    00:00:01 firefox-bin
0 S 12339  4573     1  0  75   0 -    1682 schedu pts/2    00:00:00 gconfd-2
0 S 12339  4583  4502  0  75   0 -    5402 schedu pts/1    00:00:00 gimp
0 S 12339  4776  4583  0  85   0 -    2140 schedu pts/1    00:00:00 script-fu
1 S 12339  4779  4436  1  75   0 -    9971 schedu ?        00:00:00 kdeinit
0 S 12339  4780  4779  0  75   0 -    1155 rt_sig pts/3    00:00:00 csh
0 R 12339  4803  4780  0  80   0 -     856 -      pts/3    00:00:00 ps

A benchmark designed to show the discontinuities in run time as the
data size grows past the L1 cache and then the L2 cache.
It would take hours if the program exceeded RAM and went to
virtual memory on disk!

The basic code, a simple matrix times matrix multiply:

/* matmul.c  100*100 matrix multiply */
#include <stdio.h>
#define N 100

int main()
{
  double a[N][N]; /* input matrix */
  double b[N][N]; /* input matrix */
  double c[N][N]; /* result matrix */
  int i,j,k;

  /* initialize */
  for(i=0; i<N; i++){   /* FYI in debugger, this is line 13 */
    for(j=0; j<N; j++){
      a[i][j] = (double)(i+j);
      b[i][j] = (double)(i-j);
    }
  }
  printf("starting multiply \n");
  for(i=0; i<N; i++){
    for(j=0; j<N; j++){
      c[i][j] = 0.0;
      for(k=0; k<N; k++){  /* how many instructions are in this loop? */
        c[i][j] = c[i][j] + a[i][k]*b[k][j]; /* most time spent here! */
        /* this statement is executed one million times */
      }
    }
  }
  printf("a result %g \n", c[7][8]); /* prevent dead code elimination */
  return 0;
}

The actual code: time_matmul.c
and results: time_matmul_1ghz.out
             time_matmul_p4_25.out
             time_matmul_2100.out

Test results on two computers using the same executable:

A fact you should know about memory usage:
If your program gets more memory while running, e.g. using malloc,
then tries to release that memory when not needed, e.g. free,
the memory usually still belongs to your process. The memory is
typically not given back to the operating system for use by another
program. Thus, some programs keep growing in size as they run,
hopefully reusing internally any memory they previously freed.

On Linux you can use   cat /proc/cpuinfo   to see a brief cache size.
  CS machine cpuinfo
  source code time_mp8.c
  measured time_mp8.out

We have seen the Intel P4 architecture, and here is a view
of the AMD Athlon architecture circa 2001.
9 pipelines, possibly 9 instructions issued per clock; 3 is typical.

You can find out your computer's cache sizes and speeds:
www.memtest86.com
  Get the .bin file to make a bootable floppy
  Get the .iso file to make a bootable CD
As part of the output (you do not have to run the memory test)
you will see cache sizes and bandwidth values.
(Shown on plot above.)

part3a is assigned