On mining data across software repositories

Title: On mining data across software repositories
Publication Type: Conference Paper
Year of Publication: 2009
Authors: Anbalagan, P.; Vouk, M.
Secondary Title: 2009 6th IEEE International Working Conference on Mining Software Repositories (MSR)
Pagination: 171-174
Publisher: IEEE
Place Published: Vancouver, BC, Canada
ISBN Number: 978-1-4244-3493-0
Keywords: bug reports, bugzilla, Fedora, Firefox, htmlscraper, integration, launchpad, national vulnerability database, RedHat, Suse, tracker, Ubuntu
Abstract

Software repositories provide an abundance of valuable information about open source projects. As the volume of data maintained by these repositories grows, automated extraction of data from individual repositories, as well as of linked information across repositories, has become a necessity. In this paper we describe a framework that uses web scraping to automatically mine repositories and link information across them. We discuss two implementations of the framework. In the first, we automatically identify and collect security problem reports from project repositories that deploy the Bugzilla bug tracker, using related vulnerability information from the National Vulnerability Database. In the second, we collect security problem reports for projects that deploy the Launchpad bug tracker, along with related vulnerability information from the National Vulnerability Database. We have evaluated our tool on various releases of the Fedora, Ubuntu, Suse, RedHat, and Firefox projects. The percentage of security bugs identified using our tool is consistent with that reported by other researchers.
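The abstract says bug reports are linked to National Vulnerability Database (NVD) entries but does not state the matching rule. The following is a minimal sketch, not the authors' tool. It assumes that a bug is linked to an NVD entry when its scraped text quotes that entry's CVE identifier, and that the web-scraping step has already produced plain-text reports. The names `BugReport`, `link_reports_to_nvd` and `security_bug_percentage`, as well as the sample data, are hypothetical.

```python
import re
from dataclasses import dataclass


# Hypothetical record: the fields a scraper might pull from one bug page
# (Bugzilla or Launchpad); the real tool's schema may differ.
@dataclass
class BugReport:
    bug_id: str
    summary: str
    text: str


# CVE identifiers: "CVE-", a 4-digit year, then 4 or more digits.
CVE_PATTERN = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)


def extract_cve_ids(report: BugReport) -> set[str]:
    """Return the CVE identifiers mentioned in a report's summary or text."""
    found = CVE_PATTERN.findall(report.summary + "\n" + report.text)
    return {cve.upper() for cve in found}


def link_reports_to_nvd(reports: list[BugReport], nvd_ids: set[str]) -> dict[str, set[str]]:
    """Map each bug id to the CVE ids it mentions that also appear in the NVD set.

    Assumption: a shared CVE identifier is the linking key between repositories.
    """
    links: dict[str, set[str]] = {}
    for report in reports:
        matched = extract_cve_ids(report) & nvd_ids
        if matched:
            links[report.bug_id] = matched
    return links


def security_bug_percentage(reports: list[BugReport], links: dict[str, set[str]]) -> float:
    """Percentage of reports that were linked to at least one NVD entry."""
    if not reports:
        return 0.0
    return 100.0 * len(links) / len(reports)


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    reports = [
        BugReport("101", "Heap overflow in parser", "Fixed upstream, see cve-2008-1234."),
        BugReport("102", "Crash on startup", "Mentions CVE-2008-9999, not in our NVD set."),
        BugReport("103", "Typo in menu label", "Cosmetic issue."),
    ]
    nvd_ids = {"CVE-2008-1234", "CVE-2008-5678"}
    links = link_reports_to_nvd(reports, nvd_ids)
    print(links)
    print(f"{security_bug_percentage(reports, links):.1f}% of reports linked to NVD")
```

Matching on CVE identifiers keeps the join exact and cheap, but it only finds reports whose text cites a CVE; reports describing a vulnerability without an identifier would need a different heuristic.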

DOI: 10.1109/MSR.2009.5069498