Open-Source Change Logs

Title: Open-Source Change Logs
Publication Type: Journal Article
Year of Publication: 2004
Authors: Chen, K, Schach, SR, Yu, L, Offutt, J, Heller, GZ
Secondary Title: Empirical Softw. Engg.
Date Published: September
Publisher: Kluwer Academic Publishers
Place Published: Hingham, MA, USA
ISSN Number: 1382-3256
Keywords: change log, gcc, GCC-g++, GNUJSP, Jikes, log files, Open-source software, source code

A recent editorial in Empirical Software Engineering suggested that open-source software projects offer a great deal of data that can be used for experimentation. These data not only include source code, but also artifacts such as defect reports and update logs. A common type of update log that experimenters may wish to investigate is the ChangeLog, which lists changes and the reasons for which they were made. ChangeLog files are created to support the development of software rather than for the needs of researchers, so questions need to be asked about the limitations of using them to support research. This paper presents evidence that the ChangeLog files provided at three open-source web sites were incomplete. We examined at least three ChangeLog files for each of three different open-source software products, namely, GNUJSP, GCC-g++, and Jikes. We developed a method for counting changes that ensures that, as far as possible, each individual ChangeLog entry is treated as a single change. For each ChangeLog file, we compared the actual changes in the source code to the entries in the ChangeLog file and discovered significant omissions. For example, using our change-counting method, only 35 of the 93 changes in version 1.11 of Jikes appear in the ChangeLog file—that is, over 62% of the changes were not recorded there. The percentage of omissions we found ranged from 3.7 to 78.6%. These are significant omissions that should be taken into account when using ChangeLog files for research. Before using ChangeLog files as a basis for research into the development and maintenance of open-source software, experimenters should carefully check for omissions and inaccuracies.
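The Jikes figure quoted in the abstract can be checked with a few lines of arithmetic. The following is a minimal sketch; the function name `omission_rate` is an illustrative assumption and does not come from the paper.

```python
def omission_rate(total_changes: int, recorded_changes: int) -> float:
    """Return the percentage of changes that are missing from the ChangeLog.

    Hypothetical helper for illustration only; not part of the paper.
    """
    if total_changes <= 0:
        raise ValueError("total_changes must be positive")
    if not 0 <= recorded_changes <= total_changes:
        raise ValueError("recorded_changes must lie between 0 and total_changes")
    omitted = total_changes - recorded_changes
    return 100.0 * omitted / total_changes


# Jikes 1.11 figures from the abstract: 35 of the 93 changes were recorded.
rate = omission_rate(total_changes=93, recorded_changes=35)
print(f"{rate:.1f}% of changes omitted")  # 58 of 93 omitted, i.e. over 62%
```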


"We decided to compare actual differences in the source code with entries in the ChangeLog file. We used lxr, the Linux cross-referencing tool..., to determine the precise differences between two successive software versions. We then compared these differences with the records in the ChangeLog file to check the completeness of the ChangeLog file."
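The comparison described in the excerpt can be approximated as a set difference between what actually changed and what the ChangeLog mentions. The sketch below is a simplification under several assumptions: it works at file granularity rather than with the paper's own change-counting method, it represents each version as an in-memory mapping of file name to content instead of using lxr, and it assumes GNU-style ChangeLog item lines of the form `* file (function): text`. All function names are hypothetical.

```python
import re

# Assumed GNU-style item line, e.g. "\t* parse.c (yylex): Fix overflow."
# Captures the first file name after the leading asterisk.
ENTRY_RE = re.compile(r"^\s*\*\s+([^\s:(,]+)", re.MULTILINE)


def changed_files(old: dict[str, str], new: dict[str, str]) -> set[str]:
    """Files that were added, removed, or modified between two versions."""
    return {name for name in old.keys() | new.keys()
            if old.get(name) != new.get(name)}


def logged_files(changelog_text: str) -> set[str]:
    """File names mentioned at the start of ChangeLog item lines."""
    return set(ENTRY_RE.findall(changelog_text))


def unrecorded(old: dict[str, str], new: dict[str, str],
               changelog_text: str) -> set[str]:
    """Changed files that have no matching ChangeLog item."""
    return changed_files(old, new) - logged_files(changelog_text)


# Made-up data for illustration only; not taken from the paper.
old_version = {"parse.c": "int a;", "lex.c": "int b;", "util.c": "int c;"}
new_version = {"parse.c": "int a = 1;", "lex.c": "int b = 2;", "util.c": "int c;"}
changelog = (
    "2004-01-01  A. Developer  <dev@example.org>\n"
    "\n"
    "\t* parse.c (init): Initialize a.\n"
)

print(sorted(unrecorded(old_version, new_version, changelog)))
```

In this toy input, `lex.c` changed but has no ChangeLog item, so it is reported as unrecorded. A faithful replication would need the authors' counting rules for what constitutes a single change, which this sketch does not attempt to reproduce.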
