Benchmarking Lightweight Techniques to Link E-Mails and Source Code

Title: Benchmarking Lightweight Techniques to Link E-Mails and Source Code
Publication Type: Conference Paper
Year of Publication: 2009
Authors: Bacchelli, A, D'Ambros, M, Lanza, M, Robbes, R
Secondary Title: 2009 16th Working Conference on Reverse Engineering
Pagination: 205-214
Place Published: Lille, France
ISBN Number: 978-0-7695-3867-9
Keywords: argouml, email, mailing lists

During the evolution of a software system, a large amount of information, which is not always directly related to the source code, is produced. Several researchers have provided evidence that the contents of mailing lists represent a valuable source of information: through e-mails, developers discuss design decisions, ideas, known problems, bugs, and other topics that are otherwise not to be found in the system.

A technical challenge in this context is how to establish the missing link between free-form e-mails and the system artifacts they refer to. Although the range of approaches is vast, establishing their accuracy remains a problem, as there is no benchmark against which to compare their performance.

To overcome this issue, we manually inspected a statistically significant number of e-mails pertaining to the ArgoUML system. Based on this benchmark, we present a variety of lightweight techniques to assign e-mails to software artifacts and measure their effectiveness in terms of precision and recall.
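As an illustration of how such effectiveness could be measured, the sketch below computes precision and recall for a set of proposed links against a manually built benchmark. It is a minimal example, not the authors' tooling: the function name `precision_recall`, the representation of a link as an `(email_id, class_name)` pair, and the sample data are all assumptions made for this note.

```python
def precision_recall(proposed, benchmark):
    """Return (precision, recall) of proposed links against benchmark links.

    Both arguments are iterables of (email_id, class_name) pairs.
    The pair representation is an assumption made for this sketch.
    """
    proposed, benchmark = set(proposed), set(benchmark)
    true_positives = len(proposed & benchmark)
    # Precision: share of proposed links that the benchmark confirms.
    precision = true_positives / len(proposed) if proposed else 0.0
    # Recall: share of benchmark links that the technique found.
    recall = true_positives / len(benchmark) if benchmark else 0.0
    return precision, recall


# Hypothetical sample data, not taken from the ArgoUML benchmark.
benchmark_links = {("mail-1", "Project"), ("mail-1", "Facade"), ("mail-2", "Facade")}
proposed_links = {("mail-1", "Project"), ("mail-2", "Facade"), ("mail-3", "Project")}

precision, recall = precision_recall(proposed_links, benchmark_links)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this made-up case two of the three proposed links are confirmed and two of the three benchmark links are found, so both measures come out at roughly 0.67.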


"We present different lightweight approaches that, exploiting the specific characteristics of e-mails and the ones of the source code, are capable of establishing a bi-directional link between source code entities and e-mails"

"We analyzed ArgoUML1, a UML modelling tool written in Java, developed over the course of approximately 9 years, and made available under the BSD Open Source License. We consider the release 0.28 (March 2009) that comprehends 2,197 classes. We employed the lightweight approaches to map such classes to the related e-mails in ArgoUML mailing lists.
ArgoUML e-mails are stored in six mailing lists (see Table I), for a total amount of 79,175 messages"
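To make the idea of a lightweight, bi-directional mapping concrete, here is a minimal sketch that links an e-mail to a class whenever the class's simple name appears in the message body as a whole word (case-sensitive). This matching rule is an assumption chosen for illustration and is not necessarily one of the techniques the paper evaluates; the function names and the sample e-mails are likewise hypothetical.

```python
import re
from collections import defaultdict


def link_emails_to_classes(emails, class_names):
    """Return a set of (email_id, class_name) links.

    emails: dict mapping an e-mail id to its body text.
    class_names: iterable of simple class names.
    A link is proposed when the name occurs as a whole word, case-sensitively
    (an illustrative rule, assumed for this sketch).
    """
    patterns = {
        name: re.compile(r"\b" + re.escape(name) + r"\b") for name in class_names
    }
    links = set()
    for email_id, body in emails.items():
        for name, pattern in patterns.items():
            if pattern.search(body):
                links.add((email_id, name))
    return links


def build_indexes(links):
    """Build both lookup directions: class -> e-mails and e-mail -> classes."""
    emails_by_class = defaultdict(set)
    classes_by_email = defaultdict(set)
    for email_id, name in links:
        emails_by_class[name].add(email_id)
        classes_by_email[email_id].add(name)
    return emails_by_class, classes_by_email


# Hypothetical sample e-mails and class names.
emails = {
    "mail-1": "The Facade class should delegate to ProjectManager.",
    "mail-2": "I fixed a bug in Project yesterday.",
}
class_names = ["Facade", "Project", "ProjectManager"]

links = link_emails_to_classes(emails, class_names)
emails_by_class, classes_by_email = build_indexes(links)
```

The whole-word boundary keeps `Project` from matching inside `ProjectManager`, but a class named after a common English word would still match ordinary prose, which is the kind of false positive that a benchmark with precision and recall figures makes visible.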

Figure 4 in the paper is helpful for understanding how the approach works.

Full Text
wcre2009.pdf (720.23 KB)