The Eclipse and Mozilla Defect Tracking Dataset: A Genuine Dataset for Mining Bug Information

Title: The Eclipse and Mozilla Defect Tracking Dataset: A Genuine Dataset for Mining Bug Information
Publication Type: Conference Proceedings
Year of Publication: 2013
Authors: Lamkanfi, A, Pérez, J, Demeyer, S
Refereed Designation: Refereed
Secondary Title: 10th Working Conference on Mining Software Repositories
Date Published: 05/2013
Abstract

The analysis of bug reports is an important subfield within the mining software repositories community. It explores the rich data available in defect tracking systems to uncover interesting and actionable information about the bug triaging process. While bug data is readily accessible from systems like Bugzilla and JIRA, a common database schema and a curated dataset could significantly enhance future research because it allows for easier replication. Consequently, in this paper we propose the Eclipse and Mozilla Defect Tracking Dataset, a representative database of bug data, filtered to contain only genuine defects (i.e., no feature requests) and designed to cover the whole bug-triage life cycle (i.e., store all intermediate actions). We have used this dataset ourselves for predicting bug severity, for studying bug-fixing time and for identifying erroneously assigned components. Dataset: github.com/ansymo/msr2013-bug_dataset

Notes

Intended to serve as an "open bug database", a baseline for multiple studies in the community. Each bug has 14 attributes (id, product, summary, status, etc.); some of them change over time and some do not. Dataset: github.com/ansymo/msr2013-bug_dataset

URL: http://github.com/ansymo/msr2013-bug_dataset