Although numerous empirical studies have measured the fault detection capability of software analysis methods, few have used programs of similar size and characteristics, so it is difficult to draw meaningful conclusions about the relative detection ability and cost-effectiveness of different fault detection methods. To compare fault detection capability objectively, an experiment must evaluate all methods on the same set of programs and involve participants with comparable levels of technical expertise. One such experiment was ‘Conflict1’, which compared voting, a testing method, self-checks, code reading by stepwise abstraction and data-flow analysis on eight versions of a battle simulation program. Since an inspection method was not included in that comparison, the authors conducted a follow-up experiment, ‘Conflict2’, in which five of the eight versions from Conflict1 were subjected to Fagan inspection. Conflict2 examined not only the number and types of faults detected by each method but also each method's cost-effectiveness, measured by comparing the average effort expended in detecting faults. The primary findings of the Conflict2 experiment are as follows. First, voting detected the largest number of faults, followed by the testing method, Fagan inspection, self-checks, code reading and data-flow analysis. Second, the voting, testing and inspection methods were largely complementary in the types of faults they detected. Third, inspection was far more cost-effective than the testing method studied. Copyright © 2002 John Wiley & Sons, Ltd.