An Adaptive Filter-Framework for the Quality Improvement of Open-Source Software Analysis.

Title: An Adaptive Filter-Framework for the Quality Improvement of Open-Source Software Analysis.
Publication Type: Conference Paper
Year of Publication: 2013
Authors: Hannemann, Anna; Hackstein, Michael; Klamma, Ralf; Jarke, Matthias
Secondary Title: Software Engineering
Pagination: 143–156
Publisher: Citeseer
Abstract

Knowledge mining in Open-Source Software (OSS) is of great benefit
to software engineering (SE). Researchers discover, investigate, and even simulate
the organization of development processes within open-source communities in
order to understand community-oriented organization and to transfer its advantages
to conventional SE projects. Despite the large number of studies on
OSS data, little attention has been paid to the data filtering step so far. Noise
in uncleaned data can lead to inaccurate conclusions for SE. The variety of
communication and development infrastructures used by OSS projects presents a
special challenge for data cleaning. This paper presents an adaptive filter-framework
supporting data cleaning and other preprocessing steps. The framework allows filters
to be combined in arbitrary order, defining which preprocessing steps should be performed. The
filter portfolio can be extended easily. Schema matching is available for cross-project
analysis. Three filters (spam detection, quotation elimination, and core-periphery
distinction) were implemented within the filter-framework. In the analysis
of three large-scale OSS projects (BioJava, Biopython, BioPerl), the filtering led to
significant data modification and reduction. The results of text mining (sentiment
analysis) and social network analysis on uncleaned and cleaned data differ significantly,
confirming the importance of the data preprocessing step in empirical OSS
studies.
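
The idea of combining filters in arbitrary order can be illustrated with a minimal Python sketch. It is not the authors' implementation: the message format, the function names (`drop_spam`, `strip_quotations`, `run_pipeline`), and the simple heuristics inside each filter are assumptions made purely for illustration.

```python
from typing import Callable, Dict, List

# Assumption: a mailing-list message is a plain dict with "author" and "body" keys.
Message = Dict[str, str]
# Assumption: a filter is any function mapping a list of messages to a list of messages.
Filter = Callable[[List[Message]], List[Message]]


def drop_spam(messages: List[Message]) -> List[Message]:
    """Hypothetical spam filter: drop messages containing a blocklisted phrase."""
    blocklist = ("buy now", "free money")
    return [
        m for m in messages
        if not any(phrase in m["body"].lower() for phrase in blocklist)
    ]


def strip_quotations(messages: List[Message]) -> List[Message]:
    """Hypothetical quotation filter: remove lines that start with '>'."""
    cleaned = []
    for m in messages:
        kept = [
            line for line in m["body"].splitlines()
            if not line.lstrip().startswith(">")
        ]
        cleaned.append({**m, "body": "\n".join(kept)})
    return cleaned


def run_pipeline(messages: List[Message], filters: List[Filter]) -> List[Message]:
    """Apply the given filters one after another, in the order supplied."""
    for apply_filter in filters:
        messages = apply_filter(messages)
    return messages
```

Because every filter shares the same signature, extending the portfolio in this sketch only requires writing another function of that shape and adding it to the list passed to `run_pipeline`; reordering the list changes the order of the preprocessing steps.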

URL: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.437.5602&rep=rep1&type=pdf#page=143