Volume 5, Number 1: September 12, 2005
A Review of the September 2005 IEEE Spectrum Issue on Software Failures

The September
2005 issue of IEEE
Spectrum just arrived
in my office, and I was
intrigued to see that
it’s a special report
on software failures.
The magazine’s blurb
for the cover story says,
“As organizations
waste tens of billions
of dollars annually on
failed software projects,
the key to consistently
creating large, reliable,
and efficient IT systems
remains elusive. Can adding
engineering rigor to the
black art of programming
resolve the software crisis?”
There are three main articles
in this issue, and I was
pleasantly surprised to
see that all three are
directly accessible on
the IEEE Web site. The
online version lacks the
sidebars and photographs
that you’ll find
in the hard-copy version,
but the substantive content
is still there. I recommend
that you get both the
hard-copy and the online
version; but if you’re
impatient, lazy, or miserly,
then you can survive with
just the online version.
For what it’s worth,
there’s a “members
only” section of
the IEEE Spectrum
Web site, which may or
may not provide a more
richly illustrated version
of the articles. Even
though I’m an IEEE
member, I could not manage
to get the Web site to
accept my user-id and
password. It would be
ironic if this turned
out to be a bug that I hit
while trying to access electronic
information about bugs!

The first article in the
issue is “Who
Killed the Virtual Case
File?”, by Harry
Goldstein, a detailed
12-page report on the
$170 million failure of
the FBI’s case-management
system known as VCF. And
if you’d like even
more detail, the IEEE
article notes that a December
2002 audit by the U.S.
Department of Justice’s
inspector general provides
an 81-page
assessment of the
project’s failures.
The audit attributes the
project failure to a number
of factors that will sound
quite familiar to veteran
project managers and consultants
— including poorly
defined requirements,
overly ambitious schedules,
and the lack of a plan
to guide hardware purchases,
network deployments, and
software development for
the bureau.
But reading through the
IEEE Spectrum
article, I found additional
details that made me wonder
how anyone ever thought it would
be possible to develop
a complex, advanced system.
As late as 2000, for example,
“agents couldn’t
e-mail U.S. Attorney offices,
federal agencies, local
law enforcement, or each
other; instead, they typically
faxed case-related information.”
And the environment surrounding
the project was complicated
further by the intense
pressure caused by the
9/11 attack in 2001.
After a series of problems,
management changes, and
delays, the VCF project
finally collapsed in 2004
— with the FBI and
its software contractor,
SAIC, blaming each other
for the outcome. But as
is common with such failures,
a replacement project
is already underway: the
contract for the new system,
known as “Sentinel,”
is supposed to be awarded
by the end of 2005, with
a delivery of “phase
one” scheduled for
the end of 2006. My colleague
Ken
Orr, who reviewed
the plans for both VCF
and Sentinel as one of the
“greybeards”
for FBI Director Robert
Mueller (see the greybeards’
National
Research Council report),
commented in the IEEE
Spectrum article, “The
sheer fact that they made
that kind of announcement
about Sentinel shows that
they really haven’t
learned anything. To say
that you’re going
to go out and buy something
and have it installed
within a year, based on
their track record,
isn’t credible.”

Next in the IEEE Spectrum
issue is an article entitled
“The
Exterminators,”
by Philip E. Ross. It
describes the strategy
and practices of a small
British software firm
called Praxis
High Integrity Systems,
which uses “formal
methods” of mathematical
logic to dramatically
reduce the number of bugs
in software development.
One statistic explains
why people are impressed
by the company: “With
an average of less than
one error in every 10,000
lines of delivered code
… Praxis claims
a bug rate that is at
least 50 — and possibly
as much as 1,000 —
times better than the
industry standard.”
Many of the details of
formal methods have been
known and practiced for
over 20 years (see, for
example, “Formal
Methods: State of the
Art and Future Directions”);
and indeed, Praxis itself
was formed in 1983. But
Praxis remains a tiny
company of only 100 people,
and you’re not likely
to see its ideas and techniques
used in the standard,
off-the-shelf products
from such behemoths as
Microsoft; on the other
hand, it’s encouraging
to note that even Microsoft
has begun using formal
methods in recent years,
“applying them to
develop small applications,
such as a bug-finding
tool used in-house and
also a theorem-proving
‘driver verifier,’
which makes sure device
drivers run properly under
Windows.” But as Mr. Ross
acknowledges in his article:
“…
although formal
methods have been
used to great effect
in small and medium-sized
projects, no one
has yet managed
to apply them to
large ones. There’s
some reason to think
no one ever will,
except perhaps in
a limited fashion
…
“The largest
system Praxis has
ever built had 200,000
lines of code. For
comparison, Microsoft
Windows XP has around
40 million, and
some Linux versions
have more than 200
million.”

Finally,
my friend and colleague,
Rob
Charette, has written
an article entitled “Why
Software Fails,”
which — as the title
obviously implies —
catalogs the dozen key
reasons that large, complex
software projects fail
so often. The article
ought to be required reading
for every senior corporate
executive, for as Charette
says, “Software
is everywhere. It’s
what lets us get cash
from an ATM, make a phone
call, and drive our car
… The average company
spends about 4 to 5 percent
of revenue on information
technology … In
other words, IT is now
one of the largest corporate
expenses outside employee
costs.”
Not only do the ubiquity
and pervasiveness of software
continue to surprise us,
but the size of our software
systems is startling even
to long-time professionals
in the field. Charette
says that “a typical
cellphone now contains
2 million lines of software
code; by 2010 it will
likely have 10 times as
much. General Motors Corp.
estimates that by then
its cars will each have
100 million lines of code.”
Compare those numbers,
by the way, with the statistics
from the Praxis article:
ultra-high-reliability
software methods have
only been used, thus far,
in modest systems of up
to 200,000 lines of code
— but we’re
using cellphones controlled
by 10 times as much software,
and hurtling down the
highway in cars that will
soon be controlled by
500 times as much software!
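
For readers who want to check the arithmetic behind that comparison, here is a minimal sketch in Python. The line counts are the ones quoted from the two articles; the constant and function names are my own, chosen only for illustration.

```python
# Line counts as quoted in this review from the two IEEE Spectrum articles.
# The names below are illustrative, not taken from either article.
PRAXIS_LARGEST_SYSTEM = 200_000   # largest system Praxis has built
CELLPHONE_2005 = 2_000_000        # typical cellphone, per Charette
GM_CAR_2010 = 100_000_000         # GM's estimate for each car by 2010


def scale_factor(system_loc: int, baseline_loc: int = PRAXIS_LARGEST_SYSTEM) -> float:
    """Return how many times larger a system is than the baseline, in lines of code."""
    return system_loc / baseline_loc


if __name__ == "__main__":
    print(scale_factor(CELLPHONE_2005))  # 10.0
    print(scale_factor(GM_CAR_2010))     # 500.0
```

The ratios say nothing about defect rates in those larger systems; they only show how far beyond the largest formally developed Praxis system the quoted figures reach.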
The Web-based version
of Charette’s article
unfortunately does not
contain a full-page sidebar
titled “Software
Hall of Shame,”
which lists some 31 examples
of massive software failures;
that alone should justify
an investment in the hard-copy
version. It was reassuring,
in a humbling sort of
way, to see that the list
of horrendous failures
includes examples from
Canada, England, and Australia
in addition to the United
States.
The factors listed by
Charette as primary causes
of software failures will
be familiar to most veterans
in the industry; other
software gurus, such as
Capers Jones and Howard
Rubin, have chronicled
and quantified these failures
in numerous articles and
textbooks for at least
20 years. Thankfully,
Charette doesn’t
try to mislead us by concluding
his article with some
“silver-bullet”
solutions that will somehow
make all of our software
problems disappear. There
really is no need to do
so; as Charette observes
in the final paragraph
of this excellent article,
“We already know
how to do software well.
It may finally be time
to act on what we know.”