This article is part of the Reliability Society 2010 Annual Technical Report
Advanced Combinatorial Test Methods for System Reliability

D. Richard Kuhn*, Raghu N. Kacker*, Yu Lei**
*National Institute of Standards & Technology, Gaithersburg, MD 20899
**University of Texas at Arlington, Arlington, TX

Every computer user is familiar with software bugs. Many seem to appear almost randomly, suggesting that the conditions triggering them must be complex, and some famous software bugs have been traced to highly unusual combinations of conditions. For example, the 1997 Mars Pathfinder mission began experiencing system resets at seemingly unpredictable times soon after it landed and began collecting data. Fortunately, engineers were able to deduce and correct the problem, which occurred only when (1) a particular type of data was being collected and (2) intermediate-priority tasks exceeded a certain load, allowing a blocking condition that eventually triggered a reset.

At 155,000 lines of code (not including the operating system), the Pathfinder program is small compared with commercial software: a Boeing 777 airliner flies on 6.5 million lines of code, the Microsoft Windows XP operating system is estimated at 40 million, and within the next two years the average new car may have more than 100 million lines of code in various subsystems. Ensuring correct operation of complex software is so difficult that more than half of a software development budget – frequently tens of millions of dollars – is normally devoted to testing, and even then errors often escape detection. A 2002 NIST-funded study by the Research Triangle Institute estimated the annual cost of an inadequate software testing infrastructure at $22.2 to $59.5 billion for the US economy [1].

As with any engineered system, cost is a critical issue for quality software. Any improvement in software testing efficiency can have a huge impact when testing consumes over half of the development budget. It is clearly possible to build ultra-dependable software (we bet our lives on this proposition each time we board a commercial aircraft), but the process is extremely expensive. Much of the cost results from the human effort involved in attempting to ensure that the software functions correctly in every situation.

Even before the Pathfinder incident, NASA researchers had shown that the fault density (number of faults per line of code) can be over 100 times greater in rarely executed code than in frequently executed portions of a program [2]. In a 1999 study that considered faults arising from rare conditions, NIST reviewed 15 years of medical device recall data in an effort to determine what types of testing could have detected the reported faults [3]. For example, one recall report indicated that the "upper limit CO2 alarm can be manually set above upper limit without alarm sounding." In this case, a single parameter – the CO2 alarm value – caused the problem, and a test that exceeded the upper limit value could have detected it. Another report gave an example of a problem triggered only when two conditions were met simultaneously: "the ventilator could fail when the altitude adjustment feature was set on 0 meters and the total flow volume was set at a delivery rate of less than 2.2 liters per minute". In this case, a test in which the pair of conditions was true – altitude is 0 and rate is less than 2.2 lpm – could have detected the flaw.
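To make the distinction concrete, the following minimal Python sketch (ours, not from the recall data) models the ventilator example; the function and its internal logic are hypothetical, reconstructed only from the recall description above. It shows why a 2-way interaction fault escapes any test suite that never combines the two triggering values in the same test:

    # Hypothetical model of the recalled ventilator; the failure values
    # (altitude 0 m, flow < 2.2 lpm) come from the recall report above.
    def ventilator_ok(altitude_m: float, flow_lpm: float) -> bool:
        # The latent 2-way interaction fault: the device fails only when
        # both rare conditions hold at the same time.
        if altitude_m == 0 and flow_lpm < 2.2:
            return False
        return True

    # Tests that vary each parameter separately all pass, missing the bug...
    assert ventilator_ok(altitude_m=0, flow_lpm=5.0)
    assert ventilator_ok(altitude_m=1500, flow_lpm=1.0)

    # ...but any test covering the pair (altitude = 0, flow < 2.2) exposes it.
    assert not ventilator_ok(altitude_m=0, flow_lpm=1.0)

A single-parameter fault like the CO2 alarm example, by contrast, is caught by any test that sets that one value out of range, regardless of the other settings.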
Recognizing that system failures can result from the interaction of conditions that might be innocuous individually, software developers have long used "pairwise testing", in which all possible pairs of parameter values are covered by at least one test. For example, suppose we wanted to show that a new software application works correctly on PCs that use the Windows or Linux operating systems, Intel or AMD processors, and the IPv4 or IPv6 protocols. That is 2 × 2 × 2 = 8 possible combinations, yet only four tests are needed to cover every pair of parameter values.
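The following short Python sketch illustrates this (the four-test suite below is one valid pairwise covering array for these three two-valued parameters, constructed by hand for this example) and verifies that every value pair is indeed covered:

    # A minimal sketch of pairwise (2-way) coverage for the example above,
    # assuming three two-valued parameters: OS, CPU, and IP protocol.
    from itertools import combinations, product

    parameters = {
        "os":  ["Windows", "Linux"],
        "cpu": ["Intel", "AMD"],
        "ip":  ["IPv4", "IPv6"],
    }

    # Four tests cover every pair of parameter values; exhaustive testing
    # of all combinations would need 2 x 2 x 2 = 8 tests.
    tests = [
        ("Windows", "Intel", "IPv4"),
        ("Windows", "AMD",   "IPv6"),
        ("Linux",   "Intel", "IPv6"),
        ("Linux",   "AMD",   "IPv4"),
    ]

    # Check that every value pair, for every pair of parameters,
    # appears in at least one test.
    names = list(parameters)
    for (i, a), (j, b) in combinations(enumerate(names), 2):
        for v1, v2 in product(parameters[a], parameters[b]):
            assert any(t[i] == v1 and t[j] == v2 for t in tests), (v1, v2)
    print(f"all pairs covered by {len(tests)} tests")

For larger parameter spaces such arrays are not built by hand; tools such as NIST's ACTS generate covering arrays automatically.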