Proposed Application of Data Mining Techniques for Clustering Software Projects

Title: Proposed Application of Data Mining Techniques for Clustering Software Projects
Publication Type: Journal Article
Year of Publication: 2010
Authors: Rezende, Henrique Ribiero, and Esmin Ahmed Ali Abdalla
Secondary Title: INFOCOMP Special Edition
Pagination: 43-48
Keywords: flossmole
Abstract

Software projects always generate a lot of data, ranging from informal documentation to a database with thousands of lines of code. The volume of information extracted from software projects grows even larger when it comes to OSS (Open Source Software). Such data may include the source code base, the history of changes to the software, bug reports, and mailing lists, among others. Using data mining techniques, we can extract valuable knowledge from this set of information, thus providing improvements throughout the software development process. The results can be used to improve the quality of software, or even to manage the project in order to obtain maximum efficiency. This article proposes the application of data mining techniques to cluster software projects, cites the advantages that can be obtained with these techniques, and illustrates the application of data mining in an Open Source Software database.

Notes

"Using data available on the web, mainly in software repositories, a collaborative project called FLOSSmole was created to collect, share, and store comparable data and analysis of the FLOSS development for academic research. The project is based on continuous data collection and analysis efforts of many research groups, reducing duplication and promoting compatibility both in the data sources of FLOSS software, as well as in research and analysis [6]. In the FLOSSmole project, data was collected in different software repositories. This data was stored in relational databases (SQL), and available on FLOSSmole Project website [2]. For this study, we used the database collected from SourceForge repository, as they are the largest repository of projects today, and it is well known among developers."

Attachment: art06.pdf
Size: 658.6 KB