Volume 5, Number 1: September 12, 2005

 

A Review of the September 2005 IEEE Spectrum issue on Software Failures

 

The September 2005 issue of IEEE Spectrum just arrived in my office, and I was intrigued to see that it’s a special report on software failures. The magazine’s blurb for the cover story says, “As organizations waste tens of billions of dollars annually on failed software projects, the key to consistently creating large, reliable, and efficient IT systems remains elusive. Can adding engineering rigor to the black art of programming resolve the software crisis?”

There are three main articles in this issue, and I was pleasantly surprised to see that all three are directly accessible on the IEEE Web site. The online version lacks the sidebars and photographs that you’ll find in the hard-copy version, but the substantive content is still there. I recommend that you get both the hard-copy and the online version; but if you’re impatient, lazy, or miserly, then you can survive with just the online version. For what it’s worth, there’s a “members only” section of the IEEE Spectrum Web site, which may or may not provide a more richly illustrated version of the articles. Even though I’m an IEEE member, I could not manage to get the Web site to accept my user-id and password. It would be ironic if this turned out to be a bug, encountered while attempting to access electronic information about bugs!

The first article in the issue is “Who Killed the Virtual Case File?”, by Harry Goldstein, a detailed 12-page report on the $170 million failure of the FBI’s case-management system known as VCF. And if you’d like even more detail, the IEEE article notes that a December 2002 audit by the U.S. Department of Justice’s inspector general provides an 81-page assessment of the project’s failures. The audit attributes the project failure to a number of factors that will sound quite familiar to veteran project managers and consultants — including poorly defined requirements, overly ambitious schedules, and the lack of a plan to guide hardware purchases, network deployments, and software development for the bureau.

But reading through the IEEE Spectrum article, I found additional details that made me wonder how anyone ever thought it would be possible to develop a complex, advanced system in such an environment. As late as 2000, for example, “agents couldn’t e-mail U.S. Attorney offices, federal agencies, local law enforcement, or each other; instead, they typically faxed case-related information.” And the environment surrounding the project was complicated further by the intense pressure caused by the 9/11 attack in 2001.

After a series of problems, management changes, and delays, the VCF project finally collapsed in 2004, with the FBI and its software contractor, SAIC, blaming each other for the outcome. But as is common with such failures, a replacement project is already underway: the contract for the new system, known as “Sentinel,” is supposed to be awarded by the end of 2005, with a delivery of “phase one” scheduled for the end of 2006. My colleague Ken Orr, who reviewed the plans for both VCF and Sentinel as one of the “greybeards” for FBI Director Robert Mueller (see the greybeards’ National Research Council report), commented in the IEEE Spectrum article, “The sheer fact that they made that kind of announcement about Sentinel shows that they really haven’t learned anything. To say that you’re going to go out and buy something and have it installed within a year, based on their track record,” isn’t credible.

Next in the IEEE Spectrum issue is an article entitled “The Exterminators,” by Philip E. Ross. It describes the strategy and practices of a small British software firm called Praxis High Integrity Systems, which uses “formal methods” of mathematical logic to dramatically reduce the number of bugs in software development. One statistic explains why people are impressed by the company: “With an average of less than one error in every 10,000 lines of delivered code … Praxis claims a bug rate that is at least 50 — and possibly as much as 1,000 — times better than the industry standard.”

Many of the details of formal methods have been known and practiced for over 20 years (see, for example, “Formal Methods: State of the Art and Future Directions”); and indeed, Praxis itself was formed in 1983. But Praxis remains a tiny company of only 100 people, and you’re not likely to see its ideas and techniques used in the standard, off-the-shelf products from such behemoths as Microsoft. On the other hand, it’s encouraging to note that even Microsoft has begun using formal methods in recent years, “applying them to develop small applications, such as a bug-finding tool used in-house and also a theorem-proving ‘driver verifier,’ which makes sure device drivers run properly under Windows.” But as Mr. Ross acknowledges in his article:


“… although formal methods have been used to great effect in small and medium-sized projects, no one has yet managed to apply them to large ones. There’s some reason to think no one ever will, except perhaps in a limited fashion …

“The largest system Praxis has ever built had 200,000 lines of code. For comparison, Microsoft Windows XP has around 40 million, and some Linux versions have more than 200 million.”


Finally, my friend and colleague, Rob Charette, has written an article entitled “Why Software Fails,” which — as the title obviously implies — catalogs the dozen key reasons that large, complex software projects fail so often. The article ought to be required reading for every senior corporate executive, for as Charette says, “Software is everywhere. It’s what lets us get cash from an ATM, make a phone call, and drive our car … The average company spends about 4 to 5 percent of revenue on information technology … In other words, IT is now one of the largest corporate expenses outside employee costs.”

Not only do the ubiquity and pervasiveness of software continue to surprise us, but the size of our software systems is startling even to long-time professionals in the field. Charette says that “a typical cellphone now contains 2 million lines of software code; by 2010 it will likely have 10 times as much. General Motors Corp. estimates that by then its cars will each have 100 million lines of code.” Compare those numbers, by the way, with the statistics from the Praxis article: ultra-high-reliability software methods have only been used, thus far, in modest systems of up to 200,000 lines of code, but we’re using cellphones controlled by 10 times as much software, and hurtling down the highway in cars that will soon be controlled by 500 times as much software!
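For readers who want to check that comparison, the arithmetic can be sketched in a few lines of Python. The function and constant names are my own, not anything from the articles. The "error ceiling" assumes, purely for illustration, that Praxis's quoted rate of less than one error per 10,000 lines would hold at sizes far beyond anything Praxis has built, which neither article claims.

```python
# Figures quoted in the IEEE Spectrum articles reviewed here.
PRAXIS_LARGEST_LOC = 200_000       # largest system Praxis has built
CELLPHONE_LOC = 2_000_000          # typical cellphone today, per Charette
GM_CAR_LOC = 100_000_000           # GM's estimate for its cars by 2010
PRAXIS_LINES_PER_ERROR = 10_000    # "less than one error in every 10,000 lines"


def size_ratio(system_loc: int, baseline_loc: int = PRAXIS_LARGEST_LOC) -> float:
    """Return how many times larger a system is than the baseline."""
    return system_loc / baseline_loc


def error_ceiling(system_loc: int, lines_per_error: int = PRAXIS_LINES_PER_ERROR) -> float:
    """Upper bound on delivered errors IF the quoted rate held at this size.

    That premise is an illustrative assumption, not a claim made by the articles.
    """
    return system_loc / lines_per_error


if __name__ == "__main__":
    for name, loc in [("cellphone", CELLPHONE_LOC), ("GM car", GM_CAR_LOC)]:
        print(f"{name}: {size_ratio(loc):.0f}x Praxis's largest system; "
              f"fewer than {error_ceiling(loc):,.0f} errors at Praxis's quoted rate")
```

Even under that generous assumption, a 100-million-line system could still ship with thousands of errors, which only underscores how far today's best practices are from the systems we are about to depend on.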

The Web-based version of Charette’s article unfortunately does not contain a full-page sidebar titled “Software Hall of Shame,” which lists some 31 examples of massive software failures; that alone should justify an investment in the hard-copy version. It was reassuring, in a humbling sort of way, to see that the list of horrendous failures includes examples from Canada, England, and Australia in addition to the United States.

The factors listed by Charette as primary causes of software failures will be familiar to most veterans in the industry; other software gurus, such as Capers Jones and Howard Rubin, have chronicled and quantified these failures in numerous articles and textbooks for at least 20 years. Thankfully, Charette doesn’t try to mislead us by concluding his article with some “silver-bullet” solutions that will somehow make all of our software problems disappear. There really is no need to do so; as Charette observes in the final paragraph of this excellent article, “We already know how to do software well. It may finally be time to act on what we know.”

 

 
 

www.yourdon.com

©2006 Ed Yourdon