This thesis presents a pre-processing stage for optimising software reliability by quantitatively identifying the most error-prone regions of a software program. These regions are identified by applying Genetic Algorithms to a graph representation of the source code, weighted with potential Sources of Error. Identifying these potentially error-dense regions can make software quality activities more efficient, and the quantitative measure of error proneness supports more accurate effort and cost estimates for quality assurance. Although various methods have been applied to detecting and reducing errors in software, little research has been done into partitioning a system into smaller, error-prone domains to allow more targeted Software Quality Assurance. Identifying error-prone regions matters because these regions can then be given priority in code inspections and testing. Quality activities are expensive, typically consuming more than half of the project resources needed to produce a working program, and a working program is not necessarily a defect-free one. Exhaustive software testing is rarely possible because it becomes intractable even for medium-sized software, and inspections require experts, can be subjective, and are costly. Because of budget constraints, typically only parts of a program can be tested or inspected, and these parts are not necessarily the most error prone. A more effective approach is to focus inspection and testing effort on the regions most likely to contain faults, that is, the most error-prone regions. The approach presented in this thesis parses the source code and attributes weights to its paths using a method for quantitatively assessing the error proneness of software modules.
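The abstract does not spell out how path weights are computed, so the following is only a minimal sketch under stated assumptions: the parsed program is held as a 0/1 connectivity matrix over nodes (statements or blocks), each node carries a hypothetical Sources-of-Error weight, and a path's weight is simply the sum of its node weights. The names `enumerate_paths`, `path_weight` and `rank_paths` are hypothetical illustrations, not the thesis's implementation.

```python
from typing import List, Tuple


def enumerate_paths(adj: List[List[int]], entry: int, exit_: int) -> List[List[int]]:
    """Return all simple entry-to-exit paths in a graph given as a 0/1 connectivity matrix.

    Assumption: loops are handled by never revisiting a node already on the
    current path, so back edges do not cause infinite enumeration.
    """
    paths: List[List[int]] = []
    stack = [(entry, [entry])]  # depth-first search with an explicit stack
    while stack:
        node, path = stack.pop()
        if node == exit_:
            paths.append(path)
            continue
        for succ, connected in enumerate(adj[node]):
            if connected and succ not in path:
                stack.append((succ, path + [succ]))
    return paths


def path_weight(path: List[int], node_weights: List[float]) -> float:
    """Hypothetical weighting: sum of the Sources-of-Error weights on the path."""
    return sum(node_weights[n] for n in path)


def rank_paths(adj: List[List[int]], node_weights: List[float],
               entry: int, exit_: int) -> List[Tuple[float, List[int]]]:
    """List (weight, path) pairs, heaviest (most error-prone) first."""
    paths = enumerate_paths(adj, entry, exit_)
    return sorted(((path_weight(p, node_weights), p) for p in paths), reverse=True)
```

For a small diamond-shaped graph (0→1, 0→2, 1→3, 2→3, 3→4) with node weights `[1.0, 4.0, 2.0, 3.0, 0.5]`, the path through node 1 ranks first with weight 8.5. Exhaustive enumeration like this grows exponentially with branching, which is one reason a search heuristic such as a Genetic Algorithm becomes attractive for real programs.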
These paths are represented as a weighted connectivity matrix, and a Genetic Algorithm is applied to them with the aim of finding the selection of paths with maximum weight, which are treated as potential error carriers. These maximally error-prone paths can then be given priority in testing and inspection. The approach does not deal with the actual inspection, testing or test cases per se; instead, it makes an informed choice about where to focus the main effort. This in turn aids project management by removing the guesswork about where to focus the effort and budget for quality assurance activities. The technique presented in this thesis is supported by a set of experiments: (i) an empirical analysis of Genetic Algorithm variables and their effect on performance; (ii) a Pareto analysis using error seeding with best-fit, random and clustered approaches; (iii) segmentation of path strata and identification of error-prone regions within paths; and (iv) a comparison with traditional software inspection. The experimental results support the proposed technique, with error identification rates above 85% from only the 20% of code ranked most error prone. This is a strong result, as it is consistent with Pareto analysis (the 80/20 rule), a standard analysis technique.
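To make the path-selection step concrete, here is a minimal Genetic Algorithm sketch under several assumptions that are not taken from the thesis: a chromosome is a bitstring over candidate paths (1 = selected); fitness is the total Sources-of-Error weight of the distinct nodes covered by the selected paths, so overlapping paths are not double-counted; and a hypothetical cap `max_paths` stands in for the inspection budget. Tournament selection, one-point crossover, bit-flip mutation and elitism are generic GA operators chosen for illustration, not necessarily those studied in the thesis, and `ga_select_paths`, `coverage_weight` and `repair` are hypothetical names.

```python
import random
from typing import List, Sequence, Tuple


def coverage_weight(chrom: Sequence[int], paths: List[List[int]],
                    node_weights: List[float]) -> float:
    """Sum of the weights of the distinct nodes covered by the selected paths."""
    covered = set()
    for selected, path in zip(chrom, paths):
        if selected:
            covered.update(path)
    return sum(node_weights[n] for n in covered)


def repair(chrom: List[int], max_paths: int, rng: random.Random) -> List[int]:
    """Randomly deselect paths until at most max_paths remain selected."""
    chosen = [i for i, bit in enumerate(chrom) if bit]
    rng.shuffle(chosen)
    for i in chosen[max_paths:]:
        chrom[i] = 0
    return chrom


def ga_select_paths(paths: List[List[int]], node_weights: List[float],
                    max_paths: int, pop_size: int = 30, generations: int = 100,
                    crossover_rate: float = 0.8, seed: int = 0
                    ) -> Tuple[List[int], float]:
    """Return (chromosome, fitness) of the best path selection found."""
    if not paths or pop_size < 3 or max_paths < 1:
        raise ValueError("need paths, a population of 3+, and max_paths >= 1")
    rng = random.Random(seed)
    n = len(paths)
    mutation_rate = 1.0 / n  # about one bit flip per child on average

    def tournament(pop: List[List[int]], fits: List[float]) -> List[int]:
        # Pick the fittest of three randomly chosen individuals.
        contenders = rng.sample(range(len(pop)), 3)
        return pop[max(contenders, key=lambda i: fits[i])]

    pop = [repair([rng.randint(0, 1) for _ in range(n)], max_paths, rng)
           for _ in range(pop_size)]
    best, best_fit = pop[0][:], float("-inf")
    for gen in range(generations + 1):
        fits = [coverage_weight(c, paths, node_weights) for c in pop]
        for c, f in zip(pop, fits):
            if f > best_fit:
                best, best_fit = c[:], f
        if gen == generations:
            break  # final population has been evaluated; stop breeding
        next_pop = [best[:]]  # elitism: carry the best selection forward
        while len(next_pop) < pop_size:
            a, b = tournament(pop, fits), tournament(pop, fits)
            if n > 1 and rng.random() < crossover_rate:
                cut = rng.randint(1, n - 1)  # one-point crossover
                child = a[:cut] + b[cut:]
            else:
                child = a[:]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_pop.append(repair(child, max_paths, rng))
        pop = next_pop
    return best, best_fit
```

Counting each node once is the design choice that makes this a genuine search problem: if path weights were simply added, picking the `max_paths` heaviest paths by sorting would already be optimal, whereas overlapping paths make the best combination non-obvious.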
Doctor of Philosophy
- Sitte, Renate, Principal Supervisor, External person
18 May 2006
Published: 2006