Experiences Mining Open Source Release Histories

Title: Experiences Mining Open Source Release Histories
Publication Type: Conference Paper
Year of Publication: 2011
Authors: Tsay, J, Wright, H, Perry, D
Secondary Title: International Conference on Software and Systems Process (ICSSP 2011)
Date Published: 05/2011
Keywords: doap, flossmole cited, life cycle, release engineering, release history, release management, releases

Software releases form a critical part of the life cycle of a software project. Typically, each project produces releases in its own way, using various methods of versioning, archiving, announcing and publishing the release. Understanding the release history of a software project can shed light on the project history, as well as the release process used by that project, and how those processes change. However, many factors make automating the retrieval of release history information difficult, such as the many sources of data, a lack of relevant standards and a disparity of tools used to create releases.

In spite of the large amount of raw data available, no attempt has been made to create a release history database of a large number of projects in the open source ecosystem. This paper presents our experiences, including the tools, techniques and pitfalls, in our early work to create a software release history database which will be of use to future researchers who want to study and model the release engineering process in greater depth.


"First, we selected the projects to initially target, using several criteria to get a broad picture of the open source landscape. Second, we collected the actual data, using a framework of parsers and some manual inspection. Third, we standardized and inserted the data into a database for later use."

"but we plan to eventually cross reference our list of projects with existing open source project information (such as FLOSSmole) to take advantage of the work already done by other researchers."

"For each release, we collected the following data: the project it belonged to, the date the release was published, the type of release, the release label (version number) and the source of the data"
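As an illustration of the per-release record quoted above, the following is a minimal sketch of how those five fields might be standardized and stored. It is not the authors' schema: the `Release` class, the `releases` table, the `store_releases` helper, the example release-type values and the uniqueness rule on (project, label) are all assumptions made for this example.

```python
import sqlite3
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Release:
    """One release record with the five fields listed in the paper."""
    project: str       # project the release belongs to
    published: date    # date the release was published
    release_type: str  # assumed vocabulary, e.g. "major", "minor", "beta"
    label: str         # release label (version number)
    source: str        # where the data came from


# Hypothetical table layout; treating (project, label) as the identity of a
# release is an assumption, not something the paper states.
SCHEMA = """
CREATE TABLE IF NOT EXISTS releases (
    project      TEXT NOT NULL,
    published    TEXT NOT NULL,
    release_type TEXT NOT NULL,
    label        TEXT NOT NULL,
    source       TEXT NOT NULL,
    UNIQUE (project, label)
)
"""


def store_releases(conn, releases):
    """Insert releases, skipping duplicates; return the number of rows added."""
    conn.execute(SCHEMA)
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO releases VALUES (?, ?, ?, ?, ?)",
        [
            # Dates are stored as ISO 8601 strings so they sort correctly.
            (r.project, r.published.isoformat(), r.release_type, r.label, r.source)
            for r in releases
        ],
    )
    conn.commit()
    return conn.total_changes - before
```

A caller would open a connection with `sqlite3.connect(...)`, build `Release` objects from whatever the parsers produce, and pass them to `store_releases`; records seen twice from different sources are stored only once under the assumed uniqueness rule.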

The paper then discusses the difficulties the authors encountered.

"We conclude that programmatically creating a release history database from existing open source data is not trivial,"

"We have currently collected 1579 distinct releases from 22 different open source projects"

Full Text: icssp11short-p034-tsay.pdf (181.45 KB)