Predicting risky modules in open-source software for high-performance computing

Title: Predicting risky modules in open-source software for high-performance computing
Publication Type: Conference Paper
Year of Publication: 2005
Authors: Phadke, AA, Allen, EB
Secondary Title: Proceedings of the second international workshop on Software engineering for high performance computing system applications
Place Published: New York, NY, USA
ISBN Number: 1-59593-117-1
Keywords: C4.5, decision trees, empirical case study, high performance computing, logistic regression, Open-source software, PETSc, software metrics, software quality model, software reliability

This paper presents the position that software-quality modeling of open-source software for high-performance computing can identify modules that have a high risk of bugs.

Given the source code for a recent release, a model can predict which modules are likely to have bugs, based on data from past releases. If a user knows which software modules correspond to functionality of interest, then risks to operations become apparent. If the risks are too great, the user may prefer not to upgrade to the most recent release.

Of course, such predictions are never perfect. After release, bugs are discovered. Some bugs are missed by the model, and some predicted errors do not occur. A successful model will be accurate enough for informed management action at the time of the predictions.

As evidence for this position, this paper summarizes a case study of the Portable, Extensible Toolkit for Scientific Computation (PETSc), which is a mathematical library for high-performance computing. Data were drawn from source code and configuration-management logs. The accuracy of logistic-regression and decision-tree models indicated that the methodology is promising. The case study also illustrated several modeling issues.
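
The two kinds of error named in the abstract (bugs missed by the model, and predicted errors that do not occur) can be illustrated with a minimal sketch. The function name, the variable names, and the six-module toy data below are hypothetical and chosen only for illustration; they are not taken from the paper or from PETSc.

```python
def misclassification_rates(predicted_risky, actually_buggy):
    """Return (missed_rate, false_alarm_rate) for paired lists of booleans.

    predicted_risky[i] is the model's prediction for module i;
    actually_buggy[i] records whether a bug was found after release.
    """
    if len(predicted_risky) != len(actually_buggy):
        raise ValueError("inputs must have the same length")
    pairs = list(zip(predicted_risky, actually_buggy))
    # Predictions for modules that did / did not turn out to have bugs.
    buggy = [p for p, a in pairs if a]
    clean = [p for p, a in pairs if not a]
    # Missed bugs: modules that had bugs but were predicted low-risk.
    missed_rate = sum(1 for p in buggy if not p) / len(buggy) if buggy else 0.0
    # False alarms: modules predicted risky in which no bug was found.
    false_alarm_rate = sum(1 for p in clean if p) / len(clean) if clean else 0.0
    return missed_rate, false_alarm_rate


# Hypothetical toy data for six modules (assumption, not from the case study).
predicted = [True, True, False, False, True, False]
actual = [True, False, True, False, True, False]
missed, false_alarm = misclassification_rates(predicted, actual)
print(f"missed: {missed:.2f}, false alarms: {false_alarm:.2f}")
```

Reporting the two rates separately, rather than a single accuracy figure, reflects that the two errors have different costs for a user deciding whether to upgrade to a release.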
