A Secondary Data Archive for Code-Level Debian Metrics

Title: A Secondary Data Archive for Code-Level Debian Metrics
Publication Type: Conference Paper
Year of Publication: 2011
Authors: Kozak, C, Squire, M
Refereed Designation: Refereed
Secondary Title: 2011 Second International Workshop on Replication in Empirical Software Engineering Research (RESER)
Pagination: 43-51
Date Published: 09/2011
Place Published: Banff, Alberta, Canada
ISBN Number: 978-1-4673-0972-1

In this paper, we describe a new process to collect, calculate, archive, and distribute code-level metrics for all the packages in the standard Debian GNU/Linux installation. Our method replicates and extends previous work by other groups studying free and open source software (FLOSS) systems in three important ways. First, although previous studies have attempted to collect a large set of code-level metrics for a small set of projects, and others have generated a small set of metrics for the large Debian codebase, our project does both: we generate a larger set of metrics for the entire set of Debian packages. Second, we integrate new Debian metadata and code-level metrics not gathered before, which adds several layers for exploration. Finally, and most importantly, because we integrate our collection and analysis process into the automated FLOSSmole data store, other groups can compare, replicate, and analyze the data in a timely, repeatable, and straightforward way. Thus, our collection activity will continue in an automated fashion after this paper is published, providing a foundation for later studies, with all data freely accessible to any interested research group. After outlining our process, we discuss a few observations about the data, outline some implications for the research community, and present opportunities for further research.
