Linking e-mails and source code artifacts

Title: Linking e-mails and source code artifacts
Publication Type: Conference Paper
Year of Publication: 2010
Authors: Bacchelli, A, Lanza, M, Robbes, R
Tertiary Authors: Kramer, J, Bishop, J, Devanbu, P, Uchitel, S
Secondary Title: Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering - ICSE '10
Date Published: 05/2010
Publisher: ACM Press
Place Published: Cape Town, South Africa
ISBN Number: 9781605587196

E-mails concerning the development issues of a system constitute an important source of information about high-level design decisions, low-level implementation concerns, and the social structure of developers.

Establishing links between e-mails and the software artifacts they discuss is a non-trivial problem, due to the inherently informal nature of human communication. Different approaches can be brought into play to tackle this traceability issue, but the question of how they can be evaluated remains unaddressed, as there is no recognized benchmark against which they can be compared.

In this article we present such a benchmark, which we created through the manual inspection of a statistically significant number of e-mails pertaining to six unrelated software systems. We then use our benchmark to measure the effectiveness of a number of approaches, ranging from lightweight approaches based on regular expressions to full-fledged information retrieval approaches.


"we devised a set of lightweight methods, based on regular expressions, to establish the link between e-mails and software artifacts. We evaluated them in terms of precision and recall considering one single Java system. In this paper we overcome a number of limitations of our previous work, resulting in the following contributions:
• An extensive and publicly available benchmark and toolset for recovering traceability links between e-mails and source code artifacts. We created our benchmark by analyzing the mailing lists of six different software systems written in four different programming languages. For each system we manually annotated a statistically significant number of e-mails.
• A comprehensive evaluation of linking techniques. We evaluated and compared, in terms of precision and recall, different linking methods, ranging from lightweight grep-style approaches to more complex approaches from the information retrieval (IR) field."
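
As a rough illustration of what a lightweight, regular-expression-based linking method and its evaluation might look like, the sketch below matches known class names as whole words in an e-mail body and scores the result against a manually annotated set. It is not the authors' implementation: the function names, the whole-word matching rule, the sample e-mail, and the sample annotation are all assumptions made for this example.

```python
import re


def link_email(body, class_names):
    """Return the class names that occur as whole words in the e-mail body.

    Hypothetical helper: a grep-style matcher, not the method from the paper.
    """
    found = set()
    for name in class_names:
        # \b prevents 'Parser' from matching inside 'XmlParser'.
        if re.search(r"\b" + re.escape(name) + r"\b", body):
            found.add(name)
    return found


def precision_recall(predicted, expected):
    """Compute precision and recall of predicted links against annotated ones.

    Assumed convention: an empty set on either side yields 0.0 for that measure.
    """
    true_positives = len(predicted & expected)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    return precision, recall


# Hypothetical data: class names of a system, one e-mail, one manual annotation.
classes = {"Parser", "XmlParser", "Token"}
email = "The XmlParser crashes when a Token is empty."
annotated = {"XmlParser"}

links = link_email(email, classes)
precision, recall = precision_recall(links, annotated)
```

In this made-up case the matcher links the e-mail to both XmlParser and Token, while the annotation accepts only XmlParser, so recall is perfect but precision is halved. This is the kind of trade-off a benchmark of manually annotated e-mails makes measurable.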
