Boa: A Language and Infrastructure for Analyzing Ultra-Large-Scale Software Repositories

Title: Boa: A Language and Infrastructure for Analyzing Ultra-Large-Scale Software Repositories
Publication Type: Conference Proceedings
Year of Publication: 2013
Authors: Robert Dyer, Hoan Anh Nguyen, Hridesh Rajan, and Tien N. Nguyen
Refereed Designation: Refereed
Secondary Title: 35th Int'l Conference on Software Engineering (ICSE 2013)
Pagination: 422-431
Date Published: 05/2013
Keywords: ease of use, forge, github, google code, lower barrier to entry, mining, repository, reproducible, scalable, Software, sourceforge
Abstract

In today’s software-centric world, ultra-large-scale software repositories, e.g., SourceForge (350,000+ projects), GitHub (250,000+ projects), and Google Code (250,000+ projects), are the new Library of Alexandria. They contain an enormous corpus of software and information about software. Scientists and engineers alike are interested in analyzing this wealth of information, both out of curiosity and to test important hypotheses. However, systematically extracting relevant data from these repositories and analyzing it to test hypotheses is hard, and best left to mining software repositories (MSR) experts! The goal of Boa, a domain-specific language and infrastructure described here, is to ease the testing of MSR-related hypotheses. We have implemented Boa and provide a web-based interface to Boa’s infrastructure. Our evaluation demonstrates that Boa substantially reduces programming effort, thus lowering the barrier to entry. We also see drastic improvements in scalability. Last but not least, reproducing an experiment conducted using Boa is just a matter of re-running small Boa programs provided by previous researchers.