<?xml version="1.0" encoding="UTF-8"?><xml><records><record><source-app name="Biblio" version="6.x">Drupal-Biblio</source-app><ref-type>47</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Bacchelli, Alberto</style></author><author><style face="normal" font="default" size="100%">D'Ambros, Marco</style></author><author><style face="normal" font="default" size="100%">Lanza, Michele</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">Extracting source code from e-mails</style></title><secondary-title><style face="normal" font="default" size="100%">Proceedings of ICPC 2010 (18th IEEE International Conference on Program Comprehension)</style></secondary-title></titles><keywords><keyword><style  face="normal" font="default" size="100%">argouml</style></keyword><keyword><style  face="normal" font="default" size="100%">email</style></keyword><keyword><style  face="normal" font="default" size="100%">freenet</style></keyword><keyword><style  face="normal" font="default" size="100%">jmeter</style></keyword><keyword><style  face="normal" font="default" size="100%">mailing lists</style></keyword><keyword><style  face="normal" font="default" size="100%">mina</style></keyword><keyword><style  face="normal" font="default" size="100%">natural language</style></keyword><keyword><style  face="normal" font="default" size="100%">openjpa</style></keyword><keyword><style  face="normal" font="default" size="100%">source code</style></keyword></keywords><dates><year><style  face="normal" font="default" size="100%">2010</style></year></dates><urls><web-urls><url><style face="normal" font="default" size="100%">http://www.inf.usi.ch/phd/bacchelli/publications.php</style></url></web-urls><related-urls><url><style face="normal" font="default" size="100%">http://flosshub.org/sites/flosshub.org/files/icpc2010.pdf</style></url></related-urls></urls><pages><style face="normal" font="default" 
size="100%">24-33</style></pages><abstract><style face="normal" font="default" size="100%">E-mails, used by developers and system users to communicate over a broad range of topics, offer a valuable source of information. If archived, e-mails can be mined to support program comprehension activities and to provide views of a software system that are alternative and complementary to those offered by the source code.

However, e-mails are written in natural language, and therefore contain noise that makes it difficult to retrieve the important data. Thus, before conducting an effective system analysis and extracting data for program comprehension, it is necessary to select the relevant messages, and to expose only the meaningful information.

In this work we focus both on classifying e-mails that hold fragments of the source code of a system, and on extracting the source code pieces inside the e-mail. We devised and analyzed a number of lightweight techniques to accomplish these tasks. To assess the validity of our techniques, we manually inspected and annotated a statistically significant number of e-mails from five unrelated open source software systems written in Java. With such a benchmark in place, we measured the effectiveness of each technique in terms of precision and recall.</style></abstract><notes><style face="normal" font="default" size="100%">&quot;We want to extract source code fragments from e-mail messages. To do this, we first need to select e-mails that contain source code fragments, and then we extract such fragments from the content in which they are enclosed.&quot;

&quot;we manually build a statistically significant benchmark taking sample e-mails from five unrelated open source Java software systems.&quot;

</style></notes></record></records></xml>