%0 Conference Paper %B Proceedings of the 21st International Conference on Evaluation and Assessment in Software Engineering %D 2017 %T Using Metrics to Track Code Review Performance %A Izquierdo-Cortazar, Daniel %A Sekitoleko, Nelson %A Jesus M. Gonzalez-Barahona %A Kurth, Lars %K code review %K data mining %K Software development analytics %X During 2015, some members of the Xen Project Advisory Board became worried about the performance of their code review process. The Xen Project is a free, open source software project developing one of the most popular virtualization platforms in the industry. They use a pre-commit peer review process similar to that in the Linux kernel, based on email messages. They had observed a large increase over time in the number of messages related to code review, and were worried about how this could be a signal of problems with their code review process. To address these concerns, we designed and conducted, with their continuous feedback, a detailed analysis focused on finding these problems, if any. During the study, we dealt with the methodological problems of Linux-like code review, and with the deeper issue of finding metrics that could uncover the problems they were worried about. For having a benchmark, we run the same analysis on a similar project, which uses very similar code review practices: the Linux Netdev (Netdev) project. As a result, we learned how in fact the Xen Project had some problems, but at the moment of the analysis those were already under control. We found as well how different the Xen and Netdev projects were behaving with respect to code review performance, despite being so similar from many points of view. In this paper we show the results of both analyses, and propose a comprehensive methodology, fully automated, to study Linux-style code review. We discuss also the problems of getting significant metrics to track improvements or detect problems in this kind of code review. 
%B Proceedings of the 21st International Conference on Evaluation and Assessment in Software Engineering %S EASE'17 %I ACM %C New York, NY, USA %P 214–223 %@ 978-1-4503-4804-1 %U http://doi.acm.org/10.1145/3084226.3084247 %R 10.1145/3084226.3084247 %0 Generic %D 2015 %T Lessons Learned from Applying Social Network Analysis on an Industrial Free/Libre/Open Source Software Ecosystem %A Teixeira, Jose %A Gregorio Robles %A Jesus M. Gonzalez-Barahona %K business models %K cloud computing %K homophily %K open source %K Open-Coopetition %K openstack %K social network analysis %K Software ecosystems %X Many software projects are no longer done in-house by a single organization. Instead, we are in a new age where software is developed by a networked community of individuals and organizations, which base their relations to each other on mutual interest. Paradoxically, recent research suggests that software development can actually be jointly-developed by rival firms. For instance, it is known that the mobile-device makers Apple and Samsung kept collaborating in open source projects while running expensive patent wars in the court. Taking a case study approach, we explore how rival firms collaborate in the open source arena by employing a multi-method approach that combines qualitative analysis of archival data (QA) with mining software repositories (MSR) and Social Network Analysis (SNA). While exploring collaborative processes within the OpenStack ecosystem, our research contributes to Software Engineering research by exploring the role of groups, sub-communities and business models within a high-networked open source ecosystem. Surprising results point out that competition for the same revenue model (i.e., operating conflicting business models) does not necessarily affect collaboration within the ecosystem. 
Moreover, while detecting the different sub-communities of the OpenStack community, we found out that the expected social tendency of developers to work with developers from the same firm (i.e., homophily) did not hold within the OpenStack ecosystem. Furthermore, while addressing a novel, complex and unexplored open source case, this research also contributes to the management literature in coopetition strategy and high-tech entrepreneurship with a rich description of how heterogeneous actors within a high-networked ecosystem (involving individuals, startups, established firms and public organizations) jointly develop a complex infrastructure for big-data in the open source arena. %U http://arxiv.org/abs/1507.04587 %0 Conference Paper %B 2009 6th IEEE International Working Conference on Mining Software Repositories (MSR) %D 2009 %T Evolution of the core team of developers in libre software projects %A Gregorio Robles %A Jesus M. Gonzalez-Barahona %A Herraiz, Israel %K core %K cvs %K cvsanaly %K developers %K evolution %K gimp %K scm %X In many libre (free, open source) software projects, most of the development is performed by a relatively small number of persons, the "core team". The stability and permanence of this group of most active developers is of great importance for the evolution and sustainability of the project. In this position paper we propose a quantitative methodology to study the evolution of core teams by analyzing information from source code management repositories. The most active developers in different periods are identified, and their activity is calculated over time, looking for core team evolution patterns. 
%B 2009 6th IEEE International Working Conference on Mining Software Repositories (MSR) %I IEEE %C Vancouver, BC, Canada %P 167 - 170 %@ 978-1-4244-3493-0 %R 10.1109/MSR.2009.5069497 %> https://flosshub.org/sites/flosshub.org/files/167core-evolution.pdf %0 Journal Article %J 2009 42nd Hawaii International Conference on System Sciences (HICSS 2009) %D 2009 %T Using Software Archaeology to Measure Knowledge Loss in Software Projects Due to Developer Turnover %A Izquierdo-Cortazar, Daniel %A Gregorio Robles %A Ortega, Felipe %A Jesus M. Gonzalez-Barahona %K attrition %K case study %K developers %K evince %K evolution %K gimp %K growth %K knowledge collaboration %K lines of code %K nautilus %K quality %K sloc %K turnover %X Developer turnover can result in a major problem when developing software. When senior developers abandon a software project, they leave a knowledge gap that has to be managed. In addition, new (junior) developers require some time in order to achieve the desired level of productivity. In this paper, we present a methodology to measure the effect of knowledge loss due to developer turnover in software projects. For a given software project, we measure the quantity of code that has been authored by developers that do not belong to the current development team, which we define as orphaned code. Besides, we study how orphaned code is managed by the project. Our methodology is based on the concept of software archaeology, a derivation of software evolution. As case studies we have selected four FLOSS (free, libre, open source software) projects, ranging from purely volunteer-driven to company-supported. The application of our methodology to these case studies gives insight into the turnover that these projects suffer and how they have managed it, and shows that this methodology is worth extending in future research. 
%B 2009 42nd Hawaii International Conference on System Sciences (HICSS 2009) %I IEEE Computer Society %C Los Alamitos, CA, USA %P 1-10 %@ 978-0-7695-3450-3 %R http://doi.ieeecomputersociety.org/10.1109/HICSS.2009.1014 %> https://flosshub.org/sites/flosshub.org/files/07-07-08.pdf %0 Conference Paper %B Proceedings of the 2008 international working conference on Mining software repositories %D 2008 %T Towards a simplification of the bug report form in eclipse %A Herraiz, Israel %A Daniel M. German %A Jesus M. Gonzalez-Barahona %A Gregorio Robles %K bug fixing %K bug report %K bug tracking system %K classification %K eclipse %K msr challenge %K severity %X We believe that the bug report form of Eclipse contains too many fields, and that for some fields, there are too many options. In this MSR challenge report, we focus on the case of the severity field. That field contains seven different levels of severity. Some of them seem very similar, and it is hard to distinguish among them. Users assign severity, and developers give priority to the reports depending on their severity. However, if users cannot distinguish well among the various severity options, they will probably assign different priorities to bugs that require the same priority. We study the mean time to close bugs reported in Eclipse, and how the severity assigned by users affects this time. The results show that, classifying by time to close, there are fewer clusters of bugs than levels of severity. We therefore conclude that there is a need to make a simpler bug report form. 
%B Proceedings of the 2008 international working conference on Mining software repositories %S MSR '08 %I ACM %C New York, NY, USA %P 145–148 %@ 978-1-60558-024-1 %U http://doi.acm.org/10.1145/1370750.1370786 %R http://doi.acm.org/10.1145/1370750.1370786 %0 Conference Paper %B Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007) %D 2007 %T Forecasting the Number of Changes in Eclipse Using Time Series Analysis %A Herraiz, Israel %A Jesus M. Gonzalez-Barahona %A Gregorio Robles %K change management %K cvs %K cvsanaly %K eclipse %K prediction %X In order to predict the number of changes in the following months for the project Eclipse, we have applied a statistical (non-explanatory) model based on time series analysis. We have obtained the monthly number of changes in the CVS repository of Eclipse, using the CVSAnalY tool. The input to our model was the filtered series of the number of changes per month, and the output was the number of changes per month for the next three months. Then we aggregated the results of the three months to obtain the total number of changes in the given period in the challenge. %B Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007) %I IEEE %C Minneapolis, MN, USA %P 32 - 32 %@ 0-7695-2950-X %R 10.1109/MSR.2007.10 %> https://flosshub.org/sites/flosshub.org/files/28300032.pdf %0 Conference Paper %B Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007) %D 2007 %T Impact of the Creation of the Mozilla Foundation in the Activity of Developers %A Jesus M. Gonzalez-Barahona %A Gregorio Robles %A Herraiz, Israel %K cvs %K cvsanaly %K developers %K mining challenge %K mozilla %K msr challenge %K revision history %X During 2003, the Mozilla project transitioned from company-promoted (sponsored by AOL) to community-promoted (sponsored by the Mozilla Foundation). What happened to the group of developers during this transition? 
Was there any significant impact on its activity or composition? To answer these questions, we have performed an analysis of the CVS repository of Mozilla, using the CVSAnalY tool, finding little impact on activity, but dramatic changes in the composition of the development team. %B Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007) %I IEEE %C Minneapolis, MN, USA %P 28 - 28 %@ 0-7695-2950-X %R 10.1109/MSR.2007.15 %> https://flosshub.org/sites/flosshub.org/files/28300028.pdf %0 Conference Paper %B Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007) %D 2007 %T Towards a Theoretical Model for Software Growth %A Herraiz, Israel %A Jesus M. Gonzalez-Barahona %A Gregorio Robles %K C %K complexity %K evolution %K freebsd %K growth %K halstead %K lines of code %K loc %K mccabe %K metrics %K scm %K size %K sloc %K sloccount %K source code %X Software growth (and more broadly, software evolution) is usually considered in terms of size or complexity of source code. However in different studies, usually different metrics are used, which makes it difficult to compare approaches and results. In addition, not all metrics are equally easy to calculate for a given source code, which leads to the question of which one is the easiest to calculate without losing too much information. To address both issues, in this paper we present a comprehensive study, based on the analysis of about 700,000 C source code files, calculating several size and complexity metrics for all of them. For this sample, we have found double Pareto statistical distributions for all metrics considered, and a high correlation between any two of them. This would imply that any model addressing software growth should produce these Pareto distributions, and that analysis based on any of the considered metrics should show a similar pattern, provided the sample of files considered is large enough. 
%B Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007) %I IEEE %C Minneapolis, MN, USA %P 21 - 21 %@ 0-7695-2950-X %R 10.1109/MSR.2007.31 %> https://flosshub.org/sites/flosshub.org/files/28300021.pdf %0 Journal Article %J International Journal of Information Technology and Web Engineering %D 2006 %T Applying Social Network Analysis Techniques to Community-Driven Libre Software Projects %A López-Fernández, L. %A Gregorio Robles %A Jesus M. Gonzalez-Barahona %A Herraiz, I. %K apache %K conway's law %K cvs %K gnome %K kde %K scm %K social network analysis %K source code %X Source code management repositories of large, long-lived libre (free, open source) software projects can be a source of valuable data about the organizational structure, evolution, and knowledge exchange in the corresponding development communities. Unfortunately, the sheer volume of the available information renders it almost unusable without applying methodologies which highlight the relevant information for a given aspect of the project. Such a methodology is proposed in this article, based on well known concepts from the social network analysis field, which can be used to study the relationships among developers and how they collaborate in different parts of a project. It is also applied to data mined from some well known projects (Apache, GNOME, and KDE), focusing on the characterization of their collaboration network architecture. These cases help to understand the potential of the methodology and how it is applied, but also show some relevant results which open new paths in the understanding of the informal organization of libre software development communities. 
%B International Journal of Information Technology and Web Engineering %V 1 %G eng %> https://flosshub.org/sites/flosshub.org/files/06_Lopez_ijitwe_sna.pdf %0 Conference Paper %B OSS2005: Open Source Systems %D 2005 %T Evolution of Volunteer Participation in Libre Software Projects: Evidence from Debian %A Gregorio Robles %A Jesus M. Gonzalez-Barahona %A Martin Michlmayr %K contributors %K debian %K maintainers %K PopCon %K popularity %K Volunteers %X Most libre software projects rely on the work of volunteers. Therefore, attracting people who contribute their time and technical skills is of paramount importance, both in technical and economic terms. This reliance on volunteers leads to some fundamental management challenges: volunteer contributions are inherently difficult to predict, plan and manage, especially in the case of large projects. In this paper we analyze the evolution in time of the human resources of one of the largest and most complex libre software projects composed primarily of volunteers, the Debian project. Debian currently has around 1300 volunteers working on several tasks: much activity is focused on packaging software applications and libraries, but there is also major work related to the maintenance of the infrastructure needed to sustain the development. We have performed a quantitative investigation of data from almost seven years, studying how volunteer involvement has affected the software... %B OSS2005: Open Source Systems %P 100-107 %U http://pascal.case.unibz.it/handle/2038/857 %> https://flosshub.org/sites/flosshub.org/files/robles_barahona_michlmayr-evolution_participation.pdf %0 Generic %D 2004 %T Applying Social Network Analysis to the Information in CVS Repositories %A López-Fernández, L. %A Gregorio Robles %A Jesus M. 
Gonzalez-Barahona %K apache %K complex networks %K cvs %K gnome %K kde %K libre software engineering %K source code %K source code repositories %K visualization techniques %K vizualization %X The huge quantities of data available in the CVS repositories of large, long-lived libre (free, open source) software projects, and the many interrelationships among those data offer opportunities for extracting large amounts of valuable information about their structure, evolution and internal processes. Unfortunately, the sheer volume of that information renders it almost unusable without applying methodologies which highlight the relevant information for a given aspect of the project. In this paper, we propose the use of a well known set of methodologies (social network analysis) for characterizing libre software projects, their evolution over time and their internal structure. In addition, we show how we have applied such methodologies to real cases, and extract some preliminary conclusions from that experience. %B International Workshop on Mining Software Repositories (MSR 2004) %P 101-105 %> https://flosshub.org/sites/flosshub.org/files/101ApplyingSocial.pdf %0 Conference Proceedings %B Proceedings of the 4th ICSE Workshop on Open Source %D 2004 %T Community structure of modules in the Apache project %A Jesus M. Gonzalez-Barahona %A Luis Lopez %A Gregorio Robles %K apache %K cvs %K source code %X The relationships among modules in a software project of a certain size can give us much information about its internal organization and a way to control and monitor development activities and evolution of large libre software projects. In this paper, we show how information available in CVS repositories can be used to study the structure of the modules in a project when they are related by the people working in them, and how techniques taken from the social networks fields can be used to highlight the characteristics of that structure. 
As a case example, we also show some results of applying this methodology to the Apache project at several points in time. Among other facts, it is shown how the project evolves and is self-structuring, with developer communities of modules corresponding to semantically related families of modules. %B Proceedings of the 4th ICSE Workshop on Open Source %P 44-48 %> https://flosshub.org/sites/flosshub.org/files/gonzalezBarahona44-48.pdf