Trends That Affect Temporal Analysis Using SourceForge Data

Title: Trends That Affect Temporal Analysis Using SourceForge Data
Publication Type: Conference Paper
Year of Publication: 2010
Authors: MacLean, Alexander C., Pratt, Landon J., Krein, Jonathan L., and Knutson, Charles D.
Secondary Title: 5th Workshop on Public Data about Software Development (WoPDaSD 2010)
Keywords: cliff walls, committers, cvs, evolution, growth, source code, sourceforge, time, time series

SourceForge is a valuable source of software artifact data for researchers who study project evolution and developer behavior. However, the data exhibit patterns that may bias temporal analyses. Most notable are cliff walls in project source code repository timelines, which indicate large commits that are out of character for the given project. These cliff walls often hide significant periods of development and developer collaboration—a threat to studies that rely on SourceForge repository data. We demonstrate how to identify these cliff walls, discuss reasons for their appearance, and propose preliminary measures for mitigating their effects in evolution-oriented studies.
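
As a rough illustration of what identifying a cliff wall could look like, the sketch below flags commits whose size is far larger than is typical for the same project. It is not the method used in the paper: the function name `find_cliff_walls`, the input format (lines added per commit, in chronological order), and the thresholds `factor` and `min_lines` are all assumptions chosen for illustration.

```python
from statistics import median


def find_cliff_walls(commit_sizes, factor=10.0, min_lines=1000):
    """Return indices of commits that look out of character for a project.

    commit_sizes: lines added per commit, in chronological order.
    factor, min_lines: hypothetical thresholds, not values from the paper.
    """
    if not commit_sizes:
        return []
    # Use the median as the "typical" commit size because it is robust
    # to the very outliers we are trying to detect.
    baseline = max(median(commit_sizes), 1)
    return [
        i
        for i, size in enumerate(commit_sizes)
        if size >= min_lines and size >= factor * baseline
    ]


# Example: one commit dwarfs the rest of the project's history.
sizes = [12, 40, 8, 25000, 30, 15, 22]
print(find_cliff_walls(sizes))  # [3]
```

A study could then inspect or exclude the flagged commits before computing growth curves. Suitable thresholds would need to be calibrated against the data set being analyzed.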


"In this paper we examine some of the limitations of artifact data by specifically addressing the applicability of SourceForge data to the study of project evolution."

"For our analysis we examine 9,997 Production/Stable or Maintenance phase projects stored in CVS on SourceForge and extracted in October of 2006 [5]"

wopdasd001.pdf (1.31 MB)