CMSC 345 Lecture 5

Lecture 5, Software Metrics

Management class says:
"If you cannot measure it, you cannot manage it."

KSLOC and NBNC
Software management needs many metrics in order to do
its job. A primary use of metrics is to determine a
profitable bid price for a new project. Managers and
their companies tend to keep records of previous
projects, including labor hours and cost. For software,
the most primitive metric is size, measured in KSLOC,
thousands of source lines of code.

Most software products that are sold by a company
are kept in a software configuration management, CM,
repository. This is both for the potential reuse of
the software and for use in predicting cost and
quality. The software quality assurance organization
is responsible for:
   a) measuring software quality
   b) improving software quality
   c) predicting software quality

It happens that measuring total lines of code can
be very easy. You as a student can easily count the
number of lines of code you have in your GL account.

For example, suppose you have Java code.
In a command window, in your login directory, type:

wc -l `find . -name \*.java`   # watch back ticks, gives number of lines
wc -l `find . -name \*.py`     # number of lines of Python, etc.
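
Where the shell commands above are not available (for example,
on Windows), a rough Python equivalent might look like the sketch
below. The function name count_lines and its two arguments are
assumptions made for illustration. Unlike wc -l, this sketch also
counts a final line that has no trailing newline.

```python
from pathlib import Path


def count_lines(root, suffix):
    """Return the total number of lines in all files under root ending in suffix."""
    total = 0
    for path in Path(root).rglob("*" + suffix):
        if path.is_file():
            # errors="replace" keeps the count going on oddly encoded files
            with path.open(encoding="utf-8", errors="replace") as f:
                total += sum(1 for _ in f)
    return total


# Example use, run from your login directory:
#   print(count_lines(".", ".java"), "lines of Java")
#   print(count_lines(".", ".py"), "lines of Python")
```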

The number of lines can vary widely from one individual to
another. When I was estimating, I used NBNC lines, meaning
non-blank, non-comment lines, which a computer program can
easily count. I had counting programs for the several languages
that my group produced. What may be a surprising data point:
for embedded software for the Department of Defense, I estimated
one labor hour per NBNC line of code. I estimated the NBNC lines
of code based on previously completed contracts. This estimate
included documents, meetings, writing and testing code, and
presentations. When it came to just writing code, I had people
who could do 20 lines per hour. Keep this a secret between us!
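
A minimal sketch of an NBNC counter, combined with the one labor
hour per NBNC line figure quoted above, might look like this in
Python. The function names and the sample text are assumptions for
illustration. Only whole-line comments are recognized, so a real
tool for C or Java would also need to handle /* */ blocks.

```python
def count_nbnc(source, comment_prefix="#"):
    """Count non-blank, non-comment lines in source text.

    Only lines that begin with comment_prefix are treated as comments;
    block comments are not handled in this sketch.
    """
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith(comment_prefix):
            count += 1
    return count


def estimate_labor_hours(nbnc_lines, hours_per_line=1.0):
    """Apply an hours-per-NBNC-line figure to a line count."""
    return nbnc_lines * hours_per_line


sample = """\
# compute a sum
total = 0

for i in range(10):
    total += i  # a trailing comment still counts as a code line
print(total)
"""

lines = count_nbnc(sample)
print(lines, "NBNC lines")
print(estimate_labor_hours(lines), "estimated labor hours")
```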

Types of software complexity measures
Over the years many metrics have been proposed and used
in various ways. McCabe Cyclomatic Complexity and
Halstead Software Metrics are covered below. Function Points
and other metrics may be used by the company where you work.

One data point on software quality vs. development
methodology, based on function points, follows. It comes
from an article that did not completely define the details
of the measurement, yet it shows a trend.
The Carnegie Mellon University, CMU, Software Engineering
Institute, SEI, designed a Capability Maturity Model, CMM,
that a software organization may use to determine its
level of maturity, from lowest 1, to highest 5.
 CMU SEI CMM Level 1: 0.75 defect rate per function point
 CMU SEI CMM Level 2: 0.44
 CMU SEI CMM Level 3: 0.27
 CMU SEI CMM Level 4: 0.14
 CMU SEI CMM Level 5: 0.05
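
To see what these rates imply, the sketch below multiplies each
rate by a project size in function points. The project size of
1000 function points is a hypothetical value chosen only for
illustration, and the function name is an assumption.

```python
# Defects per function point by CMM level, copied from the table above.
DEFECTS_PER_FP = {1: 0.75, 2: 0.44, 3: 0.27, 4: 0.14, 5: 0.05}


def expected_defects(function_points, cmm_level):
    """Return the expected defect count for a project of the given size."""
    return function_points * DEFECTS_PER_FP[cmm_level]


# Hypothetical project size, not taken from the article.
project_fp = 1000
for level in sorted(DEFECTS_PER_FP):
    defects = round(expected_defects(project_fp, level))
    print("CMM Level", level, "about", defects, "defects")
```

By these rates, a Level 1 organization would expect 15 times as
many defects as a Level 5 organization on the same size project.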

Halstead Complexity Measure
First, define four quantities to be counted in source code:
 n1 - the number of distinct operator types e.g. =  +  -  [  (  ++
 n2 - the number of distinct operands, variable names, function names, ...
 N1 - the total number of operators
 N2 - the total number of operands

Note that coding style in the C language can give different counts:
  i++;         N1 = 1  N2 = 1  typically do not count ";" in C language
  i += 1;      N1 = 1  N2 = 2  each unique constant is an operand
  i = i + 1;   N1 = 2  N2 = 3  which is more error prone?

  might count [ and ] or just count [, same for ( ) and { }
  typically do not count data declaration type statements or /*  */
  counting can vary with language, may count 2 for each : in Python
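
The counts for the three C statements above can be reproduced with
a small Python sketch. It recognizes only the operators =, +, +=
and ++, treats identifiers and integer constants as operands, and
ignores ";" as described above. The regular expressions and the
function name are assumptions for illustration, not a full C
tokenizer.

```python
import re

# Longest operators come first so "++" and "+=" are not split into "+".
OPERATOR = r"\+\+|\+=|=|\+"
# Identifiers and integer constants count as operands.
OPERAND = r"[A-Za-z_]\w*|\d+"
TOKEN = re.compile("(?P<operator>" + OPERATOR + ")|(?P<operand>" + OPERAND + ")")


def count_tokens(statement):
    """Return (N1, N2): total operators and total operands in statement."""
    n_operators = 0
    n_operands = 0
    for match in TOKEN.finditer(statement):
        if match.lastgroup == "operator":
            n_operators += 1
        else:
            n_operands += 1
    return n_operators, n_operands


for stmt in ("i++;", "i += 1;", "i = i + 1;"):
    print(stmt, count_tokens(stmt))
```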
 
Then the Halstead measures are:
  n = n1 + n2                    program vocabulary
  N = N1 + N2                    program length, indication of size
  L = n1*log2(n1) + n2*log2(n2)  calculated length
  V = N * log2(n)                program volume
  D = (n1/2) * (N2/n2)           difficulty
  E = D * V                      effort

Some empirical estimates:
  T = E/18 seconds                  coding time (not total project time)
  B = E^(2/3)/3000  or  B = V/3000  bugs, which we call defects

If you would base your project estimate on the above,
I have a money-making bridge to sell you. :)
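
For illustration only, the Halstead formulas can be evaluated with
a short Python sketch. The function name and the returned
dictionary are assumptions. The bug estimate is read here as E to
the power 2/3, divided by 3000. The example input uses the counts
for the C statement "i = i + 1;", taking = and + as the two
distinct operators and i and 1 as the two distinct operands.

```python
import math


def halstead(n1, n2, N1, N2):
    """Compute the Halstead measures from the four basic counts.

    n1 and n2 must be at least 1, since log2(0) is undefined.
    """
    n = n1 + n2                                   # program vocabulary
    N = N1 + N2                                   # program length
    L = n1 * math.log2(n1) + n2 * math.log2(n2)   # calculated length
    V = N * math.log2(n)                          # program volume
    D = (n1 / 2) * (N2 / n2)                      # difficulty
    E = D * V                                     # effort
    T = E / 18                                    # coding time in seconds
    B = E ** (2 / 3) / 3000                       # estimated bugs (defects)
    return {"n": n, "N": N, "L": L, "V": V, "D": D, "E": E, "T": T, "B": B}


# Counts for "i = i + 1;": n1 = 2, n2 = 2, N1 = 2, N2 = 3
measures = halstead(2, 2, 2, 3)
for name, value in measures.items():
    print(name, "=", value)
```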


McCabe Cyclomatic Complexity

In simple terms, count the number of linearly independent
paths through the control flow graph.
In actual terms: Define a block of code as having one 
entrance and one exit. This would be a path in a 
control flow graph.

An "if" statement, in most languages, starts
two paths.
A "for" statement, in most languages, starts
two paths, one through the body and one around
the body to the end.
A "break" or "continue" that is not part of an
"if" statement starts two paths.
A "case" statement starts as many paths as cases,
plus 1 if there is no "other" or "default" part.
A "try" statement starts two paths, with the
exception handler being one of the two paths.
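
A common simplification of these rules is to count decision
points and add one. The sketch below does this for Python source
using the standard ast module. Which node types count as decision
points is an assumption of this sketch: it counts "if" (including
"elif"), "for", "while" and exception handlers, and it ignores
"break", "continue" and boolean operators.

```python
import ast

# Node types treated as starting a second path in this sketch.
DECISION_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler)


def cyclomatic_complexity(source):
    """Return 1 plus the number of decision points found in source."""
    tree = ast.parse(source)
    decisions = sum(isinstance(node, DECISION_NODES) for node in ast.walk(tree))
    return 1 + decisions


sample = '''
def classify(values):
    result = []
    for v in values:
        if v < 0:
            result.append("negative")
        elif v == 0:
            result.append("zero")
        else:
            result.append("positive")
    return result
'''

# One "for", one "if" and one "elif" give three decision points.
print(cyclomatic_complexity(sample))
```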

Organizations typically require "structured programming"
and thus do not allow "go to" or statement labels.
My SQA organization used a rule: add 10 to
the complexity measure for each "go to".

The cyclomatic complexity directly relates to
software testing. One part of a quality test is
that every path, every block, must be executed
at least once. This is, of course, not the
only test requirement. The program STEST automated
the detection that this test requirement was
satisfied.
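
How STEST worked internally is not described here. As a rough
illustration of the idea only, the Python sketch below records
which lines of a function execute during a test, using the
standard sys.settrace hook. The function names are assumptions
for illustration.

```python
import sys


def executed_lines(func, *args):
    """Run func(*args) and return the line offsets within func that executed."""
    code = func.__code__
    seen = set()

    def tracer(frame, event, arg):
        # Record only "line" events that belong to the function under test.
        if frame.f_code is code and event == "line":
            seen.add(frame.f_lineno - code.co_firstlineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(previous)
    return seen


def sign(x):
    if x < 0:
        return "negative"
    return "non-negative"


one_test = executed_lines(sign, 5)
two_tests = one_test | executed_lines(sign, -5)
print(sorted(one_test), sorted(two_tests))
```

A single test with a positive argument never reaches the
"negative" branch, so a second test is needed before every block
has been executed at least once.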

Thus, you can see that a slightly different tool (program)
would be needed for each programming language in which
a software organization produces code.

Big questions remain: do we include the metrics
from libraries we use? What if our organization
wrote the library? This could be very language
dependent. Do you trust the C math.h and the
C++ STL libraries?

Other things to count

From history: "Job Control Language", JCL, was the name for
commands to the operating system to compile, link, attach
libraries, etc., needed to make a complete program. This was
the name used on IBM mainframe computers. The equivalent on
VAX VMS is a .com file, on Windows a .bat file, and on Unix or
Linux a Makefile, or commands to Eclipse or another
development environment.

Some organizations may count JCL lines in the total
line count. Other options are to count only executable
lines or count only executable statements. Another option
is to count data definition statements. A simple example
is to just count semicolons in the C language. This, of
course, will not work in Fortran or Python. Thus, a tool is
needed for every language an organization uses.
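
The semicolon count for C can be sketched in a few lines of
Python. The function name and the sample C text are assumptions
for illustration. The count is crude: the two semicolons inside a
"for" header are included, and so would be any semicolons inside
strings or comments.

```python
def count_semicolons(c_source):
    """Crude C statement count: the number of ';' characters in the text."""
    return c_source.count(";")


c_sample = """\
int main(void) {
    int total = 0;
    for (int i = 0; i < 10; i++) {
        total += i;
    }
    return total;
}
"""

print(count_semicolons(c_sample), "semicolons")
```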

Now, you need to be working on the first draft of
the Software Requirements Specification, SRS, document.

srsTemplate.doc
srsTemplate.docx
srsA.docx

You may copy the file on linux.gl.umbc.edu using the command

cp /afs/umbc.edu/users/s/q/squire/pub/download/srsTemplate.doc .

(Note: ".", spoken as "dot", means "here", the current directory.)
