%0 Conference Paper
%B Proceedings of the 11th Working Conference on Mining Software Repositories
%D 2014
%T The Bug Catalog of the Maven Ecosystem
%A Mitropoulos, Dimitris
%A Karakoidas, Vassilios
%A Louridas, Panos
%A Gousios, Georgios
%A Spinellis, Diomidis
%K findbugs
%K Maven Repository
%K msr data showcase
%K Software Bugs
%X Examining software ecosystems can provide the research community with data regarding artifacts, processes, and communities. We present a dataset obtained from the Maven central repository ecosystem (approximately 265GB of data) by statically analyzing the repository to detect potential software bugs. For our analysis we used FindBugs, a tool that examines Java bytecode to detect numerous types of bugs. The dataset contains the metrics results that FindBugs reports for every project version (a JAR) included in the ecosystem. For every version we also stored specific metadata such as the JAR's size, its dependencies and others. Our dataset can be used to produce interesting research results, as we show in specific examples.
%S MSR 2014
%I ACM
%C New York, NY, USA
%P 372–375
%@ 978-1-4503-2863-0
%U http://doi.acm.org/10.1145/2597073.2597123
%R 10.1145/2597073.2597123
%> https://flosshub.org/sites/flosshub.org/files/mitro.pdf

%0 Conference Paper
%B Proceedings of the 11th Working Conference on Mining Software Repositories
%D 2014
%T A Dataset for Pull-based Development Research
%A Gousios, Georgios
%A Zaidman, Andy
%K Distributed software development
%K Empirical software engineering
%K msr data showcase
%K pull request
%K pull-based development
%X Pull requests form a new method for collaborating in distributed software development. To study the pull request distributed development model, we constructed a dataset of almost 900 projects and 350,000 pull requests, including some of the largest users of pull requests on GitHub. In this paper, we describe how the project selection was done, we analyze the selected features and present a machine learning tool set for the R statistics environment.
%S MSR 2014
%I ACM
%C New York, NY, USA
%P 368–371
%@ 978-1-4503-2863-0
%U http://doi.acm.org/10.1145/2597073.2597122
%R 10.1145/2597073.2597122
%> https://flosshub.org/sites/flosshub.org/files/pullreqs-dataset.pdf

%0 Conference Paper
%B Proceedings of the 11th Working Conference on Mining Software Repositories
%D 2014
%T Lean GHTorrent: GitHub Data on Demand
%A Gousios, Georgios
%A Vasilescu, Bogdan
%A Serebrenik, Alexander
%A Zaidman, Andy
%K data on demand
%K dataset
%K github
%K msr data showcase
%X In recent years, GitHub has become the largest code host in the world, with more than 5M developers collaborating across 10M repositories. Numerous popular open source projects (such as Ruby on Rails, Homebrew, Bootstrap, Django or jQuery) have chosen GitHub as their host and have migrated their code base to it. GitHub offers a tremendous research potential. For instance, it is a flagship for current open source development, a place for developers to showcase their expertise to peers or potential recruiters, and the platform where social coding features or pull requests emerged. However, GitHub data is, to date, largely underexplored. To facilitate studies of GitHub, we have created GHTorrent, a scalable, queriable, offline mirror of the data offered through the GitHub REST API. In this paper we present a novel feature of GHTorrent designed to offer customisable data dumps on demand. The new GHTorrent data-on-demand service offers users the possibility to request via a web form up-to-date GHTorrent data dumps for any collection of GitHub repositories. We hope that by offering customisable GHTorrent data dumps we will not only lower the "barrier for entry" even further for researchers interested in mining GitHub data (thus encourage researchers to intensify their mining efforts), but also enhance the replicability of GitHub studies (since a snapshot of the data on which the results were obtained can now easily accompany each study).
%S MSR 2014
%I ACM
%C New York, NY, USA
%P 384–387
%@ 978-1-4503-2863-0
%U http://doi.acm.org/10.1145/2597073.2597126
%R 10.1145/2597073.2597126
%> https://flosshub.org/sites/flosshub.org/files/lean-ghtorrent_0.pdf