Developer identification methods for integrated data from various sources

Title: Developer identification methods for integrated data from various sources
Publication Type: Conference Paper
Year of Publication: 2005
Authors: Robles, G, Gonzalez-Barahona, JM
Secondary Title: Proceedings of the 2005 international workshop on Mining software repositories
Place Published: New York, NY, USA
ISBN Number: 1-59593-123-6
Keywords: anonymization, bug tracker, developers, email, email address, gnome, identity, mailing list, privacy, source code, version control

Studying a software project by mining data from a single repository has been a very active research field in software engineering in recent years. However, little effort has been devoted to studies that integrate data from several repositories holding different kinds of information, which would make it possible, for instance, to track the different activities of developers. One of the main problems of such multi-repository studies is that developers use different identities when they interact with different tools in different contexts. As a result, the same developer appears as several distinct entities when data is mined from different repositories (and, in some cases, even from a single one). In this paper we propose an approach, based on the application of heuristics, to identify the many identities of developers in such cases, together with a data structure that allows both the anonymized distribution of information and the tracking of identities for verification purposes. The methodology is presented in general terms and then applied to the GNOME project as a case study. Privacy issues and the partial merging of new data sources are also discussed.
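To make the idea of heuristic identity merging more concrete, the following is a minimal sketch in Python. It is not the method of the paper: the abstract does not list the heuristics or describe the data structure, so the two matching rules used here (same email address, same normalized name), the function names `merge_identities` and `anonymize`, the salted SHA-256 pseudonym, and the sample records are all assumptions chosen for illustration only.

```python
import hashlib
from collections import defaultdict


def normalize(name):
    """Lowercase a name and keep only letters, e.g. 'Jane Doe' -> 'janedoe'."""
    return "".join(ch for ch in name.lower() if ch.isalpha())


def merge_identities(records):
    """Group (source, name, email) records that plausibly belong to one person.

    Assumed heuristics (illustrative, not taken from the paper):
      1. identical email address, compared case-insensitively;
      2. identical normalized full name.
    Groups are built with a small union-find structure.
    """
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the group representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)

    first_seen = {}  # matching key -> index of the first record carrying it
    for idx, (_source, name, email) in enumerate(records):
        keys = []
        if email:
            keys.append(("email", email.lower()))
        if normalize(name):
            keys.append(("name", normalize(name)))
        for key in keys:
            if key in first_seen:
                union(idx, first_seen[key])
            else:
                first_seen[key] = idx

    groups = defaultdict(list)
    for idx, record in enumerate(records):
        groups[find(idx)].append(record)
    return list(groups.values())


def anonymize(group, salt):
    """Return a short pseudonym for a group of identities.

    Assumption: a salted SHA-256 digest stands in for the anonymized
    identifier. Whoever holds the salt and the original records can
    recompute the pseudonym, which allows later verification.
    """
    canonical = "|".join(sorted(f"{name}<{email}>" for _src, name, email in group))
    digest = hashlib.sha256((salt + canonical).encode("utf-8")).hexdigest()
    return digest[:12]


# Hypothetical records from three kinds of repositories.
records = [
    ("version control", "Jane Doe", "jdoe@example.org"),
    ("mailing list", "jane doe", "jane@example.com"),
    ("bug tracker", "", "JDoe@example.org"),
    ("version control", "John Roe", "roe@example.net"),
]

groups = merge_identities(records)
pseudonyms = [anonymize(group, salt="project-secret") for group in groups]
```

With these sample records, the first three entries end up in one group (linked by email address and by normalized name) and the fourth stays on its own. Real data would need more careful rules, since name-only matching can merge distinct people who share a name.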

Full Text: 106DeveloperIdentification.pdf (219.7 KB)