%0 Conference Proceedings %B 2017 IEEE/ACM 39th International Conference on Software Engineering %D 2017 %T Machine Learning-Based Detection of Open Source License Exceptions %A Vendome, Christopher %A Mario Linares-Vasquez %A Bavota, Gabriele %A Di Penta, Massimiliano %A Daniel M. German %A Poshyvanyk, Denys %K classifier %K empirical studies %K license %K machine learning %X From a legal perspective, software licenses govern the redistribution, reuse, and modification of software as both source and binary code. Free and Open Source Software (FOSS) licenses vary in the degree to which they are permissive or restrictive in allowing redistribution or modification under licenses different from the original one(s). In certain cases, developers may modify the license by appending to it an exception to specifically allow reuse or modification under a particular condition. These exceptions are an important factor to consider for license compliance analysis, since they modify the standard (and widely understood) terms of the original license. In this work, we first performed a large-scale empirical study on the change history of over 51k FOSS systems, aimed at quantitatively investigating the prevalence of known license exceptions and identifying new ones. Subsequently, we performed a study on detecting license exceptions using machine learning. We evaluated license exception classification with four different supervised learners and a sensitivity analysis. Finally, we present a categorization of license exceptions and explain their implications. %B 2017 IEEE/ACM 39th International Conference on Software Engineering %P 118-129 %8 05/2017 %R 10.1109/ICSE.2017.19 %0 Journal Article %J Empirical Software Engineering %D 2016 %T An empirical study of integration activities in distributions of open source software %A Adams, Bram %A Kavanagh, Ryan %A Hassan, Ahmed E. %A Daniel M.
German %X Reuse of software components, either closed or open source, is considered to be one of the most important best practices in software engineering, since it reduces development cost and improves software quality. However, since reused components are (by definition) generic, they need to be customized and integrated into a specific system before they can be useful. Since this integration is system-specific, the integration effort is non-negligible and increases maintenance costs, especially if more than one component needs to be integrated. This paper performs an empirical study of multi-component integration in the context of three successful open source distributions (Debian, Ubuntu and FreeBSD). Such distributions integrate thousands of open source components with an operating system kernel to deliver a coherent software product to millions of users worldwide. We empirically identified seven major integration activities performed by the maintainers of these distributions, documented how these activities are being performed by the maintainers, then evaluated and refined the identified activities with input from six maintainers of the three studied distributions. The documented activities provide a common vocabulary for component integration in open source distributions and outline a roadmap for future research on software integration. %B Empirical Software Engineering %I Springer %V 21 %P 960–1001 %U http://mcis.soccerlab.polymtl.ca/publications/2016/integration_oss_distribution.pdf %> https://flosshub.org/sites/flosshub.org/files/integration_oss_distribution.pdf %0 Book Section %B Open Source Systems: Integrating Communities: 12th IFIP WG 2.13 International Conference, OSS 2016, Gothenburg, Sweden, May 30 - June 2, 2016, Proceedings %D 2016 %T Herding Cats: A Case Study of Release Management in an Open Collaboration Ecosystem %A Poo-Caamaño, Germán %A Singer, Leif %A Knauss, Eric %A Daniel M. 
German %E Kevin Crowston %E Hammouda, Imed %E Lundell, Björn %E Gregorio Robles %E Gamalielsson, Jonas %E Juho Lindman %X Release management in large-scale software development projects requires significant communication and coordination. It is particularly challenging in Free and Open Source Software (FOSS) ecosystems, in which hundreds of loosely connected developers and their projects need to be coordinated to release software on a schedule. To better understand this process and its challenges, we analyzed over two and a half years of communication in the GNOME ecosystem and studied developers’ interactions. We cataloged communication channels, categorized high-level communication and coordination activities in one of them, and triangulated our results by interviewing developers. We found that a release schedule, influence instead of direct control, and diversity are factors that positively impact the release process in the GNOME ecosystem. Our results can help organizations build better large-scale teams and show that research focused on individual projects might miss important parts of the picture. %B Open Source Systems: Integrating Communities: 12th IFIP WG 2.13 International Conference, OSS 2016, Gothenburg, Sweden, May 30 - June 2, 2016, Proceedings %I Springer International Publishing %C Cham %P 147–162 %@ 978-3-319-39225-7 %U http://dx.doi.org/10.1007/978-3-319-39225-7_12 %& Herding Cats: A Case Study of Release Management in an Open Collaboration Ecosystem %R 10.1007/978-3-319-39225-7_12 %0 Conference Proceedings %B 12th Working Conference on Mining Software Repositories (MSR 2015) %D 2015 %T A Dataset of the Activity of the git Super-repository of Linux in 2012 %A Daniel M. German %A Adams, Bram %A Hassan, Ahmed E. %X This dataset documents the activity in the public portion of the git Super-repository of the Linux kernel during 2012.
In a distributed version control system, such as git, the Super-repository is the collection of all the repositories (repos) used for development. In such a Super-repository, some repos are accessible only to their owners (they are private, and are located in places that are unreachable to other users), while others are available to other members of the team. The latter, public repositories are the avenues through which commits flow from one developer to another. During the last six weeks of 2011, we automatically discovered the public portion of the Super-repository of Linux. Then, throughout 2012, every 3 hours, each of these public repositories was queried, using a process we call continuous mining, to see which new commits it had and which commits had disappeared from it. This resulted in the identification of 533,513 different commits across 451 different public repositories, and of how they propagated through the Linux Super-repository, including the repository of Linus Torvalds (i.e., the main repository of the Linux kernel). This information could help us understand how kernel contributors use git, how they collaborate, and how commits are integrated into the Linux kernel and into the repositories of organizations that distribute the kernel. This dataset is available at http://turingmachine.org/2015/linuxGit %B 12th Working Conference on Mining Software Repositories (MSR 2015) %I IEEE %8 05/2015 %U http://turingmachine.org/2015/linuxGit/msr-data-git-linux.pdf %> https://flosshub.org/sites/flosshub.org/files/msr-data-git-linux.pdf %0 Journal Article %J Empirical Software Engineering %D 2015 %T An in-depth study of the promises and perils of mining GitHub %A Kalliamvakou, Eirini %A Gousios, Georgios %A Blincoe, Kelly %A Singer, Leif %A Daniel M. German %A Damian, Daniela %K github %X With over 10 million git repositories, GitHub is becoming one of the most important sources of software artifacts on the Internet.
Researchers mine the information stored in GitHub’s event logs to understand how its users employ the site to collaborate on software, but so far there have been no studies describing the quality and properties of the available GitHub data. We document the results of an empirical study aimed at understanding the characteristics of the repositories and users in GitHub; we see how users take advantage of GitHub’s main features and how their activity is tracked on GitHub and related datasets to point out misalignment between the real and mined data. Our results indicate that while GitHub is a rich source of data on software development, mining GitHub for research purposes should take various potential perils into consideration. For example, we show that the majority of the projects are personal and inactive, and that almost 40% of all pull requests do not appear as merged even though they were. Also, approximately half of GitHub’s registered users do not have public activity, while the activity of GitHub users in repositories is not always easy to pinpoint. We use our identified perils to see if they can pose validity threats; we review selected papers from the MSR 2014 Mining Challenge and see if there are potential impacts to consider. We provide a set of recommendations for software engineering researchers on how to approach the data in GitHub. %B Empirical Software Engineering %I Springer %U http://www.gousios.gr/pub/promises-perils-github-extended.pdf %! Empir Software Eng %R 10.1007/s10664-015-9393-5 %> https://flosshub.org/sites/flosshub.org/files/promises-perils-github-extended.pdf %0 Conference Proceedings %B 12th Working Conference on Mining Software Repositories (MSR 2015) %D 2015 %T A Method to Detect License Inconsistencies in Large-Scale Open Source Projects %A Yuhao Wu %A Manabe, Yuki %A Tetsuya Kanda %A Daniel M. German %A Inoue, Katsuro %X The reuse of free and open source software (FOSS) components is becoming more and more popular. 
They usually contain one or more software licenses describing the requirements and conditions that must be followed when they are reused. Licenses are usually written in the header of source code files as program comments. Removing or modifying the license header by redistributors results in a license that is inconsistent with that of its ancestor, and may potentially cause license infringement. To the best of our knowledge, however, no research has been devoted to investigating such license infringements or license inconsistencies. In this paper, we describe and categorize different types of license inconsistencies and propose a feasible method to detect them. We then apply this method to Debian 7.5 and present the license inconsistencies found in it. Through a manual analysis, we summarized various reasons behind these license inconsistencies, some of which imply license infringement and require the attention of the developers. This analysis also exposes the difficulty of discovering license infringements, highlighting the usefulness of finding and maintaining source code provenance. %B 12th Working Conference on Mining Software Repositories (MSR 2015) %I IEEE %8 05/2015 %U http://sel.ist.osaka-u.ac.jp/lab-db/betuzuri/archive/992/992.pdf %> https://flosshub.org/sites/flosshub.org/files/992.pdf %0 Conference Proceedings %B OpenSym 2015, the 11th International Symposium on Open Collaboration %D 2015 %T Software Patents: A Replication Study %A Poo-Caamaño, Germán %A Daniel M. German %X Previous research has documented the legal and economic aspects of software patents. To study the evolution in the granting of software patents, we reproduced and extended part of the empirical study on software patents conducted by Bessen and Hunt. The original study established criteria to identify software patents, and provided a look at the evolution of patents granted until 2002.
We present a simple approach to retrieving patents from the full-text database provided by the United States Patent and Trademark Office (USPTO), which is freely accessible. We also present the evolution of software patents since the original study, broken down by major technology firms. Our research shows a continuous increase in the number of software patents granted, both in absolute numbers and as a proportion of all patents granted. The relevance of studying the evolution of software patents lies in the challenge of finding prior art, both for practitioners seeking patents and for examiners deciding whether to grant a new patent. %B OpenSym 2015, the 11th International Symposium on Open Collaboration %8 08/2015 %U http://www.opensym.org/os2015/proceedings-files/p104-poo-caamano.pdf %> https://flosshub.org/sites/flosshub.org/files/p104-poo-caamano.pdf %0 Conference Paper %B Proceedings of the 11th Working Conference on Mining Software Repositories %D 2014 %T The Promises and Perils of Mining GitHub %A Kalliamvakou, Eirini %A Gousios, Georgios %A Blincoe, Kelly %A Singer, Leif %A Daniel M. German %A Damian, Daniela %K bias %K code reviews %K git %K github %K mining software repositories %X With over 10 million git repositories, GitHub is becoming one of the most important sources of software artifacts on the Internet. Researchers are starting to mine the information stored in GitHub's event logs, trying to understand how its users employ the site to collaborate on software. However, so far there have been no studies describing the quality and properties of the data available from GitHub. We document the results of an empirical study aimed at understanding the characteristics of the repositories in GitHub and how users take advantage of GitHub's main features: commits, pull requests, and issues.
Our results indicate that, while GitHub is a rich source of data on software development, mining GitHub for research purposes should take various potential perils into consideration. We show, for example, that the majority of the projects are personal and inactive; that GitHub is also being used for free storage and as a Web hosting service; and that almost 40% of all pull requests do not appear as merged, even though they were. We provide a set of recommendations for software engineering researchers on how to approach the data in GitHub. %B Proceedings of the 11th Working Conference on Mining Software Repositories %S MSR 2014 %I ACM %C New York, NY, USA %P 92–101 %@ 978-1-4503-2863-0 %U http://doi.acm.org/10.1145/2597073.2597074 %R 10.1145/2597073.2597074 %> https://flosshub.org/sites/flosshub.org/files/perils.pdf %0 Journal Article %J Empirical Software Engineering %D 2013 %T Management of community contributions %A Bettenburg, Nicolas %A Hassan, Ahmed E. %A Adams, Bram %A Daniel M. German %K android %K contribution %K linux %K management %X In recent years, many companies have realized that collaboration with a thriving user or developer community is a major factor in creating innovative technology driven by market demand. As a result, businesses have sought ways to stimulate contributions from developers outside their corporate walls, and integrate external developers into their development process. To support software companies in this process, this paper presents an empirical study on the contribution management processes of two major, successful, open source software ecosystems. We contrast a for-profit (ANDROID) system having a hybrid contribution style, with a not-for-profit (LINUX kernel) system having an open contribution style. To guide our comparisons, we base our analysis on a conceptual model of contribution management that we derived from a total of seven major open-source software systems. 
A quantitative comparison based on data mined from the ANDROID code review system and the LINUX kernel code review mailing lists shows that both projects have significantly different contribution management styles, suited to their respective market goals, but with individual advantages and disadvantages that are important for practitioners. Contribution management is a real-world problem that has received very little attention from the research community so far. Both studied systems (LINUX and ANDROID) employ different strategies and techniques for managing contributions, and both approaches are valuable examples for practitioners. Each approach has specific advantages and disadvantages that need to be carefully evaluated by practitioners when adopting a contribution management process in practice. %B Empirical Software Engineering %I Springer %P 1–38 %U http://link.springer.com/article/10.1007/s10664-013-9284-6 %0 Conference Proceedings %B 10th Working Conference on Mining Software Repositories %D 2013 %T Will My Patch Make It? And How Fast?: Case Study on the Linux Kernel %A Yujuan Jiang %A Adams, Bram %A Daniel M. German %X The Linux kernel follows an extremely distributed reviewing and integration process supported by 130 developer mailing lists and a hierarchy of dozens of Git repositories for version control. Since not every patch makes it, and of those that do, some require far more reviewing and integration effort than others, developers, reviewers, and integrators need support for estimating which patches are worth spending effort on and which ones do not stand a chance. This paper cross-links and analyzes eight years of patch reviews from the kernel mailing lists and committed patches from the Git repository to understand which patches are accepted and how long it takes those patches to reach the end user. We found that 33% of the patches make it into a Linux release, and that most of them need 3 to 6 months to do so.
Furthermore, we found that patches developed by more experienced developers are more easily accepted, and are reviewed and integrated faster. Additionally, reviewing time is affected by the submission time, the number of subsystems affected by the patch, and the number of requested reviewers. %B 10th Working Conference on Mining Software Repositories %8 05/2013 %0 Conference Paper %B 2012 3rd International Workshop on Emerging Trends in Software Metrics (WETSoM) %D 2012 %T Modification and developer metrics at the function level: Metrics for the study of the evolution of a software project %A Gregorio Robles %A Herraiz, Israel %A Daniel M. German %A Izquierdo-Cortazar, Daniel %X Software evolution, and particularly its growth, has mainly been studied at the file (sometimes also referred to as module) level. In this paper, we propose to move from the physical level towards a level that includes semantic information, by using functions or methods to measure the evolution of a software system. We point out that the use of function-based metrics has many advantages over the use of files or lines of code. We demonstrate our approach with an empirical study of two Free/Open Source projects: a community-driven project, Apache, and a company-led project, Novell Evolution. We discovered that most functions never change; that when they do, their number of modifications is correlated with their size; and that each is modified by very few authors. Finally, we show that the departure of a developer from a software project slows the evolution of the functions that she authored. %B 2012 3rd International Workshop on Emerging Trends in Software Metrics (WETSoM) %I IEEE %C Zurich, Switzerland %P 49 - 55 %@ 978-1-4673-1763-4 %R 10.1109/WETSoM.2012.6226993 %0 Conference Paper %B Proceedings of the 8th working conference on Mining software repositories - MSR '11 %D 2011 %T Apples vs. oranges? %A Davies, Julius %A Daniel M.
German %Y van Deursen, Arie %Y Xie, Tao %Y Zimmermann, Thomas %K eclipse %K netbeans %K source code %X We attempt to compare the source code of two Java IDE systems: Netbeans and Eclipse. The result of this experiment shows that many factors, if ignored, could risk a bias in the results, and we posit various observations that should be taken into consideration to minimize such risk. %B Proceedings of the 8th working conference on Mining software repositories - MSR '11 %I ACM Press %C New York, New York, USA %P 246-249 %8 05/2011 %@ 9781450305747 %! MSR '11 %R 10.1145/1985441.1985483 %0 Conference Paper %B Proceedings of the 2nd International Workshop on Web 2.0 for Software Engineering %D 2011 %T Towards understanding twitter use in software engineering: preliminary findings, ongoing challenges and future questions %A Bougie, Gargi %A Starke, Jamie %A Storey, Margaret-Anne %A Daniel M. German %K eclipse %K linux %K mxunit %K social media %K software development %K twitter %K web 2.0 %X There has been some research conducted around the motivation for the use of Twitter and the value brought by micro-blogging tools to individuals and business environments. This paper builds on our understanding of how the phenomenon affects the population which birthed the technology: Software Engineers. We find that the Software Engineering community extensively leverages Twitter's capabilities for conversation and information sharing and that use of the tool is notably different between distinct Software Engineering groups. Our work exposes topics for future research and outlines some of the challenges in exploring this type of data. 
%B Proceedings of the 2nd International Workshop on Web 2.0 for Software Engineering %S Web2SE '11 %I ACM %C New York, NY, USA %P 31–36 %@ 978-1-4503-0595-2 %U http://doi.acm.org/10.1145/1984701.1984707 %R 10.1145/1984701.1984707 %> https://flosshub.org/sites/flosshub.org/files/WEB2SE2011.pdf %0 Conference Paper %B 1st Workshop on Replication in Empirical Software Engineering Research %D 2010 %T Beyond replication: An example of the potential benefits of replicability in the mining of software repositories community %A Gregorio Robles %A Daniel M.
German %K literature review %K msr %K replication %B 1st Workshop on Replication in Empirical Software Engineering Research %8 05/2010 %0 Conference Paper %B 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010) %D 2010 %T A comparative exploration of FreeBSD bug lifetimes %A Bougie, Gargi %A Treude, Christoph %A Daniel M. German %A Storey, Margaret-Anne %K bug reports %K bug tracking %K classification %K eclipse %K msr challenge %K prediction %X In this paper, we explore the viability of mining the basic data provided in bug repositories to predict bug lifetimes. We follow the method of Lucas D. Panjer as described in his paper, Predicting Eclipse Bug Lifetimes. However, in place of Eclipse data, the FreeBSD bug repository is used. We compare the predictive accuracy of five different classification algorithms applied to the two data sets. In addition, we propose future work on whether there is a more informative way of classifying bugs than is considered by current bug tracking systems. %B 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010) %I IEEE %C Cape Town, South Africa %P 106 - 109 %@ 978-1-4244-6802-7 %R 10.1109/MSR.2010.5463291 %> https://flosshub.org/sites/flosshub.org/files/106ChallengeGargi.pdf %0 Conference Paper %B 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010) %D 2010 %T Identifying licensing of jar archives using a code-search approach %A Di Penta, Massimiliano %A Daniel M. German %A Antoniol, Giuliano %K apache %K bytecode %K classification %K eclipse %K google code %K jar %K java %K licenses %K source code %X Free and open source software strongly promotes the reuse of source code.
Some open source Java components/libraries are distributed as jar archives containing only the bytecode and some additional information. For anyone who wants to integrate such a jar into their own project, it is important to determine the license(s) of the code from which the jar archive was produced, as this affects the way such a component can be used. This paper proposes an automatic approach to determine the license of jar archives, combining the use of a code-search engine with the automatic classification of licenses contained in textual files enclosed in the jar. Results of an empirical study performed on 37 jars, from 17 different systems, indicate that this approach is able to successfully infer the jar licenses in over 95% of the cases, but that in many cases the license in textual files may differ from that of the classes contained in the jar. %B 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010) %I IEEE %C Cape Town, South Africa %P 151 - 160 %@ 978-1-4244-6802-7 %R 10.1109/MSR.2010.5463282 %> https://flosshub.org/sites/flosshub.org/files/151msr2010.pdf %0 Conference Paper %B 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010) %D 2010 %T Perspectives on bugs in the Debian bug tracking system %A Davies, Julius %A Hanyu Zhang %A Nussbaum, Lucas %A Daniel M. German %K bug reports %K debian %K msr challenge %K popularity %X Bugs in Debian differ from regular software bugs. They are usually associated with packages, instead of software modules. They are caused and fixed by source package uploads instead of code commits. The majority are reported by individuals who appear in the bug database once, and only once. There also exists a small group of bug reporters with over 1,000 bug reports each to their name.
We also explore our idea that a high bug frequency for an individual package might be an indicator of popularity instead of poor quality. %B 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010) %I IEEE %C Cape Town, South Africa %P 86 - 89 %@ 978-1-4244-6802-7 %R 10.1109/MSR.2010.5463288 %> https://flosshub.org/sites/flosshub.org/files/86bugs-debian.pdf %0 Conference Paper %B 2009 6th IEEE International Working Conference on Mining Software Repositories (MSR) %D 2009 %T Code siblings: Technical and legal implications of copying code between applications %A Daniel M. German %A Di Penta, Massimiliano %A Gueheneuc, Yann-Gael %A Antoniol, Giuliano %K bsd %K fossology %K freebsd %K linux %K openbsd %K source code %X Source code cloning does not happen within a single system only. It can also occur between one system and another. We use the term code sibling to refer to a code clone that evolves in a different system than the code from which it originates. Code siblings can only occur when the source code copyright owner allows it and when the conditions imposed by its license are not incompatible with the license of the destination system. In some situations, copying source code fragments is legally allowed in one direction but not in the other. In this paper, we use clone detection, license mining and classification, and change history techniques to understand how code siblings under different licenses flow in one direction or the other between Linux and two BSD Unixes, FreeBSD and OpenBSD. Our results show that, in most cases, this migration appears to happen according to the terms of the license of the original code being copied, always favoring copying from less restrictive licenses towards more restrictive ones.
We also discovered that code is sometimes inserted into the kernels from an outside source. %B 2009 6th IEEE International Working Conference on Mining Software Repositories (MSR) %I IEEE %C Vancouver, BC, Canada %P 81 - 90 %@ 978-1-4244-3493-0 %R 10.1109/MSR.2009.5069483 %> https://flosshub.org/sites/flosshub.org/files/81CodeSiblings.pdf %0 Conference Paper %B Proceedings of the 30th International Conference on Software Engineering (ICSE 2008) %D 2008 %T Open source software peer review practices: a case study of the apache server %A Peter C. Rigby %A Daniel M. German %A Storey, Margaret-Anne %K apache %K cvs %K email %K inspection %K mining software repositories (email) %K open source software %K peer review %K version control %X Peer review is seen as an important quality assurance mechanism in both industrial development and the open source software (OSS) community. The techniques for performing inspections have been well studied in industry; in OSS development, peer reviews are less well understood. We examine the two peer review techniques used by the successful, mature Apache server project: review-then-commit and commit-then-review. Using archival records of email discussion and version control repositories, we construct a series of metrics that produce measures similar to those used in traditional inspection experiments. Specifically, we measure the frequency of review, the level of participation in reviews, the size of the artifact under review, the calendar time to perform a review, and the number of reviews that find defects. We provide a comparison of the two Apache review techniques as well as a comparison of Apache review to inspection in an industrial project.
We conclude that Apache reviews can be described as (1) early, frequent reviews (2) of small, independent, complete contributions (3) conducted asynchronously by a potentially large, but actually small, group of self-selected experts (4) leading to an efficient and effective peer review technique. %B Proceedings of the 30th International Conference on Software Engineering (ICSE 2008) %S ICSE '08 %I ACM %C New York, NY, USA %P 541–550 %@ 978-1-60558-079-1 %U http://doi.acm.org/10.1145/1368088.1368162 %R 10.1145/1368088.1368162 %> https://flosshub.org/sites/flosshub.org/files/p541-rigby.pdf %0 Conference Paper %B Proceedings of the 2008 international working conference on Mining software repositories %D 2008 %T Towards a simplification of the bug report form in eclipse %A Herraiz, Israel %A Daniel M. German %A Jesus M. Gonzalez-Barahona %A Gregorio Robles %K bug fixing %K bug report %K bug tracking system %K classification %K eclipse %K msr challenge %K severity %X We believe that the bug report form of Eclipse contains too many fields, and that for some fields, there are too many options. In this MSR challenge report, we focus on the case of the severity field. That field contains seven different levels of severity. Some of them seem very similar, and it is hard to distinguish among them. Users assign severity, and developers give priority to the reports depending on their severity. However, if users cannot distinguish well among the various severity options, they will probably assign different priorities to bugs that require the same priority. We study the mean time to close bugs reported in Eclipse, and how the severity assigned by users affects this time. The results show that, when classifying by time to close, there are fewer clusters of bugs than levels of severity. We therefore conclude that there is a need for a simpler bug report form.
%B Proceedings of the 2008 international working conference on Mining software repositories %S MSR '08 %I ACM %C New York, NY, USA %P 145–148 %@ 978-1-60558-024-1 %U http://doi.acm.org/10.1145/1370750.1370786 %R 10.1145/1370750.1370786 %0 Conference Paper %B Proceedings of the 2008 international working conference on Mining software repositories %D 2008 %T What do large commits tell us?: a taxonomical study of large commits %A Hindle, Abram %A Daniel M. German %A Holt, Ric %K boost %K bug fixing %K egroupware %K enlightenment %K evolution %K firebird %K large commits %K maintenance %K mysql %K postgresql %K samba %K software evolution %K source control system %K spring %X Research in the mining of software repositories has frequently ignored commits that include a large number of files (we call these large commits). The main goal of this paper is to understand the rationale behind large commits, and whether there is anything we can learn from them. To address this goal we performed a case study that included the manual classification of large commits of nine open source projects. The contributions include a taxonomy of large commits, which are grouped according to their intention. We contrast large commits against small commits and show that large commits are more perfective while small commits are more corrective. These large commits provide us with a window on the development practices of maintenance teams. 
%B Proceedings of the 2008 international working conference on Mining software repositories %S MSR '08 %I ACM %C New York, NY, USA %P 99–108 %8 05/2008 %@ 978-1-60558-024-1 %U http://doi.acm.org/10.1145/1370750.1370773 %R 10.1145/1370750.1370773 %> https://flosshub.org/sites/flosshub.org/files/p99-hindle.pdf %0 Conference Paper %B Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007) %D 2007 %T Using Software Distributions to Understand the Relationship among Free and Open Source Software Projects %A Daniel M. German %K dependencies %K evolution %K fink %K metrics %X Success in the open source software world has been measured in terms of metrics such as number of downloads, number of commits, number of lines of code, number of participants, etc. These metrics tend to discriminate against applications that are small and tend to evolve slowly. The problem, however, is how to identify the important applications in these latter categories. Software distributions specify the dependencies needed to build and to run a given software application. We use this information to create a dependency graph of the applications contained in such a distribution. We explore the characteristics of this graph, and use it to define some metrics to quantify the dependencies (and dependents) of a given software application. We demonstrate that some applications that are invisible to the final user (such as libraries) are widely used by end-user applications. This graph can be used as a proxy to measure success of small, slowly evolving free and open source software. %B Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007) %I IEEE %C Minneapolis, MN, USA %P 24 - 24 %@ 0-7695-2950-X %R 10.1109/MSR.2007.32 %> https://flosshub.org/sites/flosshub.org/files/28300024.pdf %0 Journal Article %D 2006 %T A preliminary examination of code review processes in open source projects %A Peter C. 
Rigby %A Daniel M. German %X In this paper, we provide preliminary answers to the following questions regarding OSS peer review or inspection. What are the patch process and review process used by the projects? What types of review does the project use? Why are patches rejected? What percentage of patches are rejected? Who performs the review? Are the top developers also the top reviewers? When are reviews performed? What is the frequency of review? How long do reviews take to perform? How does the patch size affect the review? How does merit-based trust among actors affect the review? Are more trusted individuals reviewed less often? How much feedback is provided in the review? What kinds of non-source code patches are reviewed? How does the kind of patch affect the review? What effect does reviewing have on other elements of the patch process? What is the relationship between reviewing and testing? The first two questions are answered in a qualitative manner for GCC, Linux, Mozilla, and Apache. The remaining questions are answered for the Apache project. The most striking similarities among the projects are their use of pre-commit review and their requests for small, complete, independent patches. The Apache project also uses post-commit review for trusted members. Reviews in the Apache project occur very frequently and usually have a review interval of hours. A small core group of reviewers conducts over 80% of reviews for Apache; however, the number and identity of these individuals fluctuate over the 9 years of data we examine. %8 January %G eng %> https://flosshub.org/sites/flosshub.org/files/Rigby2006TR.pdf %0 Conference Paper %B Proceedings of the 2006 international workshop on Mining software repositories %D 2006 %T A study of the contributors of PostgreSQL %A Daniel M. 
German %K contributions %K contributors %K cvs %K developers %K mining challenge %K mining software repositories %K msr challenge %K patches %K postgresql %K revision history %K roles %K software evolution %K source code %K team %X This report describes some characteristics of the development team of PostgreSQL that were uncovered by analyzing the history of its software artifacts as recorded by the project's CVS repository. %B Proceedings of the 2006 international workshop on Mining software repositories %S MSR '06 %I ACM %C New York, NY, USA %P 163–164 %@ 1-59593-397-2 %U http://doi.acm.org/10.1145/1137983.1138022 %R 10.1145/1137983.1138022 %> https://flosshub.org/sites/flosshub.org/files/163AStudyOf.pdf %0 Conference Paper %B Proceedings of the 2006 international workshop on Mining software repositories %D 2006 %T Using evolutionary annotations from change logs to enhance program comprehension %A Daniel M. German %A Peter C. Rigby %A Storey, Margaret-Anne %K annotations %K apache %K bug tracking %K change history %K eclipse %K evolutionary %K log files %K mailing lists %K mining software repositories %K software evolution %K version control %X Evolutionary annotations are descriptions of how source code evolves over time. Typical source comments, given their static nature, are usually inadequate for describing how a program has evolved over time; instead, source code comments are typically a description of what a program currently does. We propose the use of evolutionary annotations as a way of describing the rationale behind changes applied to a given program (for example "These lines were added to ..."). Evolutionary annotations can help a software developer understand how a given portion of source code works by showing how the source has evolved into its current form. In this paper we describe a method to automatically create evolutionary annotations from change logs, defect tracking systems and mailing lists. 
We describe the design of a prototype for Eclipse that can filter and present these annotations alongside their corresponding source code and in workbench views. We use Apache as a test case to demonstrate the feasibility of this approach. %B Proceedings of the 2006 international workshop on Mining software repositories %S MSR '06 %I ACM %C New York, NY, USA %P 159–162 %@ 1-59593-397-2 %U http://doi.acm.org/10.1145/1137983.1138020 %R 10.1145/1137983.1138020 %> https://flosshub.org/sites/flosshub.org/files/159UsingEvolutionary.pdf %0 Conference Paper %B OSS2005: Open Source Systems %D 2005 %T The challenges of creating open source education software: the Gild experience %A Daniel M. German %A Rigby, Peter %A Cubranic, Davor %A Storey, Margaret-Anne %A Thomson, Suzanne %K COMMUNITY %K eclipse %K learning environment %K novice programmers %K open source %K programming environment %B OSS2005: Open Source Systems %P 338-340 %U http://pascal.case.unibz.it/handle/2038/1539 %0 Conference Paper %B OSS2005: Open Source Systems %D 2005 %T Experiences teaching a graduate course in Open Source Software Engineering %A Daniel M. German %K course %K FOSS %K MOTIVATION %K open source %K open source software engineering %X This paper describes the early experiences of a graduate course in open source software engineering at the Department of Computer Science at the University of Victoria. It includes a description of the motivation for the course, its structure and evaluation methods. It concludes with a discussion of the lessons learned and the future of the course. %B OSS2005: Open Source Systems %P 326-328 %U http://pascal.case.unibz.it/handle/2038/971 %0 Conference Paper %B Proceedings of the 2005 international workshop on Mining software repositories %D 2005 %T SCQL: a formal model and a query language for source control repositories %A Hindle, Abram %A Daniel M. 
German %K evolution %K file %K gnumeric %K modperl %K openssl %K revision %K samba %K scm %K source code %X Source control repositories are used in most software projects to store revisions to source code files. These repositories operate at the file level and support multiple users. A generalized formal model of source control repositories is described herein. The model is a graph in which the different entities stored in the repository become vertices and their relationships become edges. We then define SCQL, a first-order, temporal-logic-based query language for source control repositories. We demonstrate how SCQL can be used to specify some questions and then evaluate them using the source control repositories of five different large software projects. %B Proceedings of the 2005 international workshop on Mining software repositories %S MSR '05 %I ACM %C New York, NY, USA %P 100-104 %@ 1-59593-123-6 %U http://doi.acm.org/10.1145/1082983.1083161 %R 10.1145/1082983.1083161 %> https://flosshub.org/sites/flosshub.org/files/100scql.pdf %0 Conference Paper %B Proceedings of the 2nd ICSE Workshop on Open Source %D 2002 %T The evolution of the GNOME Project %A Daniel M. German %K commercial software %K gnome %K organizational sponsorship %X The GNOME Project is an attempt to create a GUI desktop for Unix systems. Originally started by a handful of volunteers in 1996, GNOME has become the desktop of choice for Solaris, HP-UX, and Red Hat Linux, and it is currently developed by a team of approximately five hundred people around the world. The importance of GNOME to the Unix world has attracted the attention of several software companies that are actively participating in its development. At the same time, some of its volunteer developers have created enterprises that expect to sell services and products around GNOME. 
This extended abstract describes, first, the development model of GNOME, then the influence that private companies have had on the project: on the one hand, they are contributing a large amount of resources to the project, accelerating its development, and increasing its reliability and documentation; on the other hand, the GNOME Foundation has been created to preserve the project's goal of providing a free (as in freedom) software desktop for Unix, and to prevent the commercial interests of these partners from jeopardizing the interests of the community. %B Proceedings of the 2nd ICSE Workshop on Open Source %> https://flosshub.org/sites/flosshub.org/files/German.pdf