%0 Conference Proceedings %B 2017 IEEE/ACM 39th IEEE International Conference on Software Engineering Companion %D 2017 %T Charting the market disruptive nature of Open Source: Experiences from Sony Mobile %A Mols, CE %A Wnuk, K %K ecosystem %K poster %K software business %X Open Source Software (OSS) has substantial impact on how software-intensive firms develop products and deliver value to the customers. These companies need both strategic and operational support on how to adapt OSS as a part of their products and how to adjust processes and organizations to increase the benefits from OSS participation. This work presents the key insights from the journey that Sony Mobile has made from a company developing proprietary software to a respected member of OSS communities. We framed the experiences into an Open Source Maturity Model that includes two scenarios: engineering-driven and business-driven open source. We outline the most important decisions, roles, processes and implications. %B 2017 IEEE/ACM 39th IEEE International Conference on Software Engineering Companion %P 175-176 %8 05/2017 %0 Conference Proceedings %B 2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR) %D 2017 %T Classifying code comments in Java open-source software systems %A Luca Pascarella %A Bacchelli, Alberto %K java %K Survey %X Code comments are a key software component containing information about the underlying implementation. Several studies have shown that code comments enhance the readability of the code. Nevertheless, not all the comments have the same goal and target audience. In this paper, we investigate how six diverse Java OSS projects use code comments, with the aim of understanding their purpose. Through our analysis, we produce a taxonomy of source code comments; subsequently, we investigate how often each category occur by manually classifying more than 2,000 code comments from the aforementioned projects. 
In addition, we conduct an initial evaluation on how to automatically classify code comments at line level into our taxonomy using machine learning; initial results are promising and suggest that an accurate classification is within reach. %B 2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR) %P 227-237 %8 05/2017 %0 Conference Proceedings %B Open Source Systems: Towards Robust Practices 13th International Conference on Open Source Systems %D 2017 %T Considering the use of walled gardens for FLOSS project communication %A Squire, Megan %K apache %K chat %K communication %K email %K free software %K irc %K mailing list %K open source %K Slack %K Stack Overflow %K teams %K Wordpress %X At its core, free, libre, and open source software (FLOSS) is defined by its adherence to a set of licenses that give various freedoms to the users of the software, for example the ability to use the software, to read or modify its source code, and to distribute the software to others. In addition, many FLOSS projects and developers also champion other values related to "freedom" and "openness", such as transparency, for example in communication and decision-making, or community-orientedness, for example in broadening access, collaboration, and participation. This paper explores how one increasingly common software development practice - communicating inside non-archived, third-party "walled gardens" - puts these FLOSS values into conflict. If communities choose to use non-archived walled gardens for communication, they may be prioritizing one type of openness (broad participation) over another (transparency). We use 18 FLOSS projects as a sample to describe how walled gardens are currently being used for intra-project communication, as well as to determine whether or not these projects provide archives of these communications. 
Findings will be useful to the FLOSS community as a whole as it seeks to understand the evolution and impact of its communication choices. %B Open Source Systems: Towards Robust Practices 13th International Conference on Open Source Systems %S IFIP Advances in Information and Communication Technology %8 05/2017 %U https://link.springer.com/content/pdf/10.1007%2F978-3-319-57735-7_1.pdf %R 10.1007/978-3-319-57735-7_1 %> https://flosshub.org/sites/flosshub.org/files/preprint_0.pdf %0 Conference Proceedings %B 2017 IEEE 12th International Conference on Global Software Engineering (ICGSE) %D 2017 %T Developer Turnover in Global, Industrial Open Source Projects: Insights from Applying Survival Analysis %A Bin Lin %A Gregorio Robles %A Serebrenik, Alexander %K survival analysis %X Large open source software projects often have a globally distributed development team. Studies have shown developer turnover has a significant impact on the project success. Frequent developer turnover may lead to loss of productivity due to lacking relevant knowledge and spending extra time learning how projects work. Thus, lots of attention has been paid to which factors are related to developer retention; however, few of them focus on the impact of activities of individual developers. In this paper, we study five open source projects from different organizations and examine whether developer turnover is affected by when they start contributing and what types of contributions they are making. Our study reveals that developers have higher chances to survive in software projects when they 1) start contributing to the project earlier; 2) mainly modify instead of creating files; 3) mainly code instead of dealing with documentation. Our results also shed light on the potential approaches to improving developer retention. 
%B 2017 IEEE 12th International Conference on Global Software Engineering (ICGSE) %P 66-75 %8 05/2017 %0 Conference Proceedings %B 2017 IEEE 25th International Conference on Program Comprehension (ICPC) %D 2017 %T Do Software Developers Understand Open Source Licenses? %A Almeida, Daniel A. %A Murphy, Gail C. %A Wilson, Greg %A Hoye, Mike %K license %K Survey %X Software provided under open source licenses is widely used, from forming high-profile stand-alone applications (e.g., Mozilla Firefox) to being embedded in commercial offerings (e.g., network routers). Despite the high frequency of use of open source licenses, there has been little work about whether software developers understand the open source licenses they use. To our knowledge, only one survey has been conducted, which focused on which licenses developers choose and when they encounter problems with licensing open source software. To help fill the gap of whether or not developers understand the open source licenses they use, we conducted a survey that posed development scenarios involving three popular open source licenses (GNU GPL 3.0, GNU LGPL 3.0 and MPL 2.0) both alone and in combination. The 375 respondents to the survey, who were largely developers, gave answers consistent with those of a legal expert’s opinion in 62% of 42 cases. Although developers clearly understood cases involving one license, they struggled when multiple licenses were involved. An analysis of the quantitative and qualitative results of the study indicates a need for tool support to help guide developers in understanding this critical information attached to software components. %B 2017 IEEE 25th International Conference on Program Comprehension (ICPC) %P 1-11 %8 05/2017 %R 10.1109/ICPC.2017.7 %0 Conference Proceedings %B Open Source Systems: Towards Robust Practices 13th International Conference on Open Source Systems %D 2017 %T How are Developers Treating License Inconsistency Issues? 
A Case Study on License Inconsistency Evolution in FOSS Projects %A Y. Wu %A Manabe, Yuki %A Daniel M. Germán %A Inoue, K. %K Code clone %K debian %K License inconsistency %K licenses %K Software license %X A license inconsistency is the presence of two or more source files that evolved from the same original file containing different licenses. In our previous study, we have shown that license inconsistencies do exist in open source projects and may lead to potential license violation problems. In this study, we try to find out whether the issues of license inconsistencies are properly solved by analyzing two versions of a FOSS distribution—Debian—and investigate the evolution patterns of license inconsistencies. Findings are: license inconsistencies occur mostly because the original copyright owner updated the license while the reusers were still using the old version of the source files with the old license; most license inconsistencies would disappear when the reusers synchronize their project from the upstream, while some would exist permanently if reusers decide not to synchronize anymore. Legally suspicious cases have not been found yet in those Debian distributions. 
%B Open Source Systems: Towards Robust Practices 13th International Conference on Open Source Systems %S IFIP Advances in Information and Communication Technology %I Springer %V 496 %P 69-79 %8 05/2017 %U https://link.springer.com/chapter/10.1007/978-3-319-57735-7_8 %R 10.1007/978-3-319-57735-7_8 %0 Conference Proceedings %B 2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR) %D 2017 %T How Open Source Projects use Static Code Analysis Tools in Continuous Integration Pipelines %A Zampetti, Fiorella %A Scalabrino, Simone %A Oliveto, Rocco %A Canfora, Gerardo %A Di Penta, Massimiliano %K continuous integration %K empirical study %K static analysis %X Static analysis tools are often used by software developers to entail early detection of potential faults, vulnerabilities, code smells, or to assess the source code adherence to coding standards and guidelines. Also, their adoption within Continuous Integration (CI) pipelines has been advocated by researchers and practitioners. This paper studies the usage of static analysis tools in 20 Java open source projects hosted on GitHub and using Travis CI as continuous integration infrastructure. Specifically, we investigate (i) which tools are being used and how they are configured for the CI, (ii) what types of issues make the build fail or raise warnings, and (iii) whether, how, and after how long are broken builds and warnings resolved. Results indicate that in the analyzed projects build breakages due to static analysis tools are mainly related to adherence to coding standards, and there is also some attention to missing licenses. Build failures related to tools identifying potential bugs or vulnerabilities occur less frequently, and in some cases such tools are activated in a “softer” mode, without making the build fail. 
Also, the study reveals that build breakages due to static analysis tools are quickly fixed by actually solving the problem, rather than by disabling the warning, and are often properly documented. %B 2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR) %P 334-344 %8 05/2017 %R 10.1109/MSR.2017.2 %0 Book Section %B Advances in Ubiquitous Networking 2: Proceedings of the UNet'16 %D 2017 %T Knowledge Flows Within Open Source Software Projects: A Social Network Perspective %A Kerzazi, Noureddine %A El Asri, Ikram %E El-Azouzi, Rachid %E Menasche, Daniel Sadoc %E Sabir, Essaïd %E De Pellegrini, Francesco %E Benjillali, Mustapha %K expertise %K Knowledge flows %K open source %K SNA %X Developing software is knowledge-intensive activity, requiring extensive technical knowledge and awareness. The abstract part of development is the social interactions that drive knowledge flows between contributors, especially for Open Source Software (OSS). This study investigated knowledge sharing and propagation from social perspective using social network analysis (SNA). We mined and analyzed the issue and review histories of three OSS from GitHub. Particular attention has been paid to the socio-interactions through comments from contributors on reviews. We aim at explaining the propagation and density of knowledge flows within contributor networks. The results show that review requests flow from the core contributors toward peripheral contributors and comments on reviews are in a continuous loop from the core teams to the peripherals and back; and the core contributors leverage on their awareness and technical knowledge to increase their notoriety by playing the role of communication brokers supported by comments on work items. 
%B Advances in Ubiquitous Networking 2: Proceedings of the UNet'16 %I Springer Singapore %C Singapore %P 247–258 %@ 978-981-10-1627-1 %U http://dx.doi.org/10.1007/978-981-10-1627-1_19 %R 10.1007/978-981-10-1627-1_19 %0 Conference Proceedings %B Open Source Systems: Towards Robust Practices 13th International Conference on Open Source Systems %D 2017 %T OSSpal: Finding and Evaluating Open Source Software %A Wasserman, A. %A Guo, X. %A McMillian, B. %A Qian K. %A Wei M.Y. %A Xu, Q. %K Open source forges %K software evaluation %K software metrics %K Software taxonomy %X This paper describes the OSSpal project, which is aimed at helping companies, government agencies, and other organizations find high quality free and open source software (FOSS) that meets their needs. OSSpal is a successor to the Business Readiness Rating (BRR), combining quantitative and qualitative evaluation measures for software in various categories. Instead of a purely numeric calculated score OSSpal adds curation of high-quality FOSS projects and individual user reviews of these criteria. Unlike the BRR project, for which there was no automated support, OSSpal has an operational, publicly available website where users may search by project name or category, and enter ratings and reviews for projects. %B Open Source Systems: Towards Robust Practices 13th International Conference on Open Source Systems %S IFIP Advances in Information and Communication Technology %I Springer %V 496 %P 193-203 %8 05/2017 %U https://link.springer.com/chapter/10.1007/978-3-319-57735-7_18 %R 10.1007/978-3-319-57735-7_18 %0 Journal Article %J IEEE Transactions on Software Engineering %D 2017 %T Process Aspects and Social Dynamics of Contemporary Code Review: Insights from Open Source Development and Industrial Practice at Microsoft %A Bosu, Amiangshu %A Carver, Jeffrey C. 
%A Christian Bird %A Orbeck, Jonathan %A Chockley, Christopher %K code review %K commercial projects %K peer impressions %K Survey %X Many open source and commercial developers practice contemporary code review, a lightweight, informal, tool-based code review process. To better understand this process and its benefits, we gathered information about code review practices via surveys of open source software developers and developers from Microsoft. The results of our analysis suggest that developers spend approximately 10-15 percent of their time in code reviews, with the amount of effort increasing with experience. Developers consider code review important, stating that in addition to finding defects, code reviews offer other benefits, including knowledge sharing, community building, and maintaining code quality. The quality of the code submitted for review helps reviewers form impressions about their teammates, which can influence future collaborations. We found a large amount of similarity between the Microsoft and OSS respondents. One interesting difference is that while OSS respondents view code review as an important method of impression formation, Microsoft respondents found knowledge dissemination to be more important. Finally, we found little difference between distributed and co-located Microsoft teams. Our findings identify the following key areas that warrant focused research: 1) exploring the non-technical benefits of code reviews, 2) helping developers in articulating review comments, and 3) assisting reviewers’ program comprehension during code reviews. %B IEEE Transactions on Software Engineering %V 43 %P 56 - 75 %8 1/2017 %U https://amiangshu.com/papers/CodeReview-TSE-2016.pdf %N 1 %! IEEE Trans. Software Eng. 
%R 10.1109/TSE.2016.2576451 %> https://flosshub.org/sites/flosshub.org/files/CodeReview-TSE-2016.pdf %0 Conference Proceedings %B Open Source Systems: Towards Robust Practices 13th International Conference on Open Source Systems %D 2017 %T Understanding the Effects of Practices on KDE Ecosystem Health %A Simone da Silva Amorim %A John D. McGregor %A Eduardo Santana de Almeida %A Christina von Flach Garcia Chavez %K Ethnographic studies %K Open source software ecosystems %K Software ecosystem health %K Software practices %X Open source software ecosystems have adjusted and evolved a set of practices over the years to support the delivery of sustainable software. However, few studies have investigated the impacts of such practices on the health of these ecosystems. In this paper, we present the results of an ethnographic-based study conducted during the Latin-American KDE users and contributors meeting (LaKademy 2015) with the goal of collecting practices used within the KDE ecosystem and understanding how they affect ecosystem health. The analysis was based on softgoal interdependency graphs adapted to represent practices and relate them to non-functional requirements and goals. Our results provide a preliminary insight to understand how KDE ecosystem community interacts, which working practices have been adopted and how they affect ecosystem health. %B Open Source Systems: Towards Robust Practices 13th International Conference on Open Source Systems %S IFIP Advances in Information and Communication Technology %I Springer %V 496 %P 89-100 %8 05/2017 %U https://link.springer.com/chapter/10.1007/978-3-319-57735-7_10 %R 10.1007/978-3-319-57735-7_10 %0 Conference Proceedings %B 2017 IEEE/ACM 39th International Conference on Software Engineering %D 2017 %T Understanding the Impressions, Motivations, and Barriers of One Time Code Contributors to FLOSS Projects: A Survey %A Amanda Lee %A Carver, Jeffrey C. 
%A Bosu, Amiangshu %K newcomers %K One Time Contributors %K Qualitative Research %K Survey %X Successful Free/Libre Open Source Software (FLOSS) projects must attract and retain high-quality talent. Researchers have invested considerable effort in the study of core and peripheral FLOSS developers. To this point, one critical subset of developers that have not been studied are One-Time code Contributors (OTC) – those that have had exactly one patch accepted. To understand why OTCs have not contributed another patch and provide guidance to FLOSS projects on retaining OTCs, this study seeks to understand the impressions, motivations, and barriers experienced by OTCs. We conducted an online survey of OTCs from 23 popular FLOSS projects. Based on the 184 responses received, we observed that OTCs generally have positive impressions of their FLOSS project and are driven by a variety of motivations. Most OTCs primarily made contributions to fix bugs that impeded their work and did not plan on becoming long term contributors. Furthermore, OTCs encounter a number of barriers that prevent them from continuing to contribute to the project. Based on our findings, there are some concrete actions FLOSS projects can take to increase the chances of converting OTCs into long-term contributors. %B 2017 IEEE/ACM 39th International Conference on Software Engineering %P 187-197 %8 05/2017 %0 Conference Proceedings %B 2017 IEEE/ACM 10th International Workshop on Cooperative and Human Aspects of Software Engineering (CHASE) %D 2017 %T Using Gamification to Orient and Motivate Students to Contribute to OSS Projects %A Guilherme C. Diniz %A Marco A. Graciotto Silva %A Marco Gerosa %A Steinmacher, Igor %K engagement %K gamification %K MOTIVATION %K newcomers %K students %X Students can benefit from contributing to Open Source Software (OSS), since they can enrich their portfolio and learn with real world projects. 
However, sometimes students are demotivated to contribute due to entrance barriers. On the other hand, gamification is widely used to engage and motivate people to accomplish tasks and improve their performance. The goal of this work is to analyze the use of gamification to orient and motivate undergraduate students to overcome onboarding barriers and engage to OSS projects. To achieve this goal, we implemented four gaming elements (Quests, Points, Ranking, and Levels) in GitLab and assessed the environment by means of a study conducted with 17 students within a real OSS project (JabRef). At the end of the study, the students evaluated their experience through a questionnaire. We found that the Quest element helped to guide participants and keep them motivated and points helped by providing feedback on students' performed tasks. We conclude that the gamified environment oriented the students in an attempt to make a contribution and that gamification can motivate and orient newcomers to engage to OSS projects. %B 2017 IEEE/ACM 10th International Workshop on Cooperative and Human Aspects of Software Engineering (CHASE) %P 36-42 %8 05/2017 %0 Conference Paper %B Proceedings of the 21st International Conference on Evaluation and Assessment in Software Engineering %D 2017 %T Using Metrics to Track Code Review Performance %A Izquierdo-Cortazar, Daniel %A Sekitoleko, Nelson %A Jesus M. Gonzalez-Barahona %A Kurth, Lars %K code review %K data mining %K Software development analytics %X During 2015, some members of the Xen Project Advisory Board became worried about the performance of their code review process. The Xen Project is a free, open source software project developing one of the most popular virtualization platforms in the industry. They use a pre-commit peer review process similar to that in the Linux kernel, based on email messages. 
They had observed a large increase over time in the number of messages related to code review, and were worried about how this could be a signal of problems with their code review process. To address these concerns, we designed and conducted, with their continuous feedback, a detailed analysis focused on finding these problems, if any. During the study, we dealt with the methodological problems of Linux-like code review, and with the deeper issue of finding metrics that could uncover the problems they were worried about. For having a benchmark, we run the same analysis on a similar project, which uses very similar code review practices: the Linux Netdev (Netdev) project. As a result, we learned how in fact the Xen Project had some problems, but at the moment of the analysis those were already under control. We found as well how different the Xen and Netdev projects were behaving with respect to code review performance, despite being so similar from many points of view. In this paper we show the results of both analyses, and propose a comprehensive methodology, fully automated, to study Linux-style code review. We discuss also the problems of getting significant metrics to track improvements or detect problems in this kind of code review. %B Proceedings of the 21st International Conference on Evaluation and Assessment in Software Engineering %S EASE'17 %I ACM %C New York, NY, USA %P 214–223 %@ 978-1-4503-4804-1 %U http://doi.acm.org/10.1145/3084226.3084247 %R 10.1145/3084226.3084247 %0 Conference Proceedings %B 38th International Conference on Software Engineering (ICSE 2016) %D 2016 %T How Do Free/Open Source Developers Pick Their Tools? A Delphi Study of the Debian Project %A Martin Krafft %A Stol, Klaas-Jan %A Fitzgerald, Brian %K Delphi %K Free/Open Source Software %K Qualitative Study %K study %K tools %X Free and Open Source Software (FOSS) has come to play a critical role in the global software industry. 
Organizations are widely adopting FOSS and interacting with open source communities, and hence organizations have a considerable interest in seeing these communities flourishing. Very little research has focused on the tools used to develop that software. Given the absence of organizational policies and mandate that would occur in a traditional environment, an open question is how FOSS developers decide what tools to use. In this paper we report on a policy delphi study conducted in the Debian Project, one of the largest FOSS projects. Drawing from data collected in three phases from a panel of 21 experts, we identified 15 factors that affect their decision to adopt tools. This in turn can help FOSS communities to define a suitable policy of actions, in order to improve their processes. %B 38th International Conference on Software Engineering (ICSE 2016) %U https://www.researchgate.net/publication/291312269_How_Do_FreeOpen_Source_Developers_Pick_Their_Tools_A_Delphi_Study_of_the_Debian_Project %0 Journal Article %J ACM Trans. Manage. Inf. Syst. %D 2016 %T Peripheral Developer Participation in Open Source Projects: An Empirical Analysis %A Krishnamurthy, Rajiv %A Jacob, Varghese %A Radhakrishnan, Suresh %A Kutsal Dogan %K Code ownership %K open source software %K project management %K software metrics %X The success of the Open Source model of software development depends on the voluntary participation of external developers (the peripheral developers), a group that can have distinct motivations from that of project founders (the core developers). In this study, we examine peripheral developer participation by empirically examining approximately 2,600 open source projects. In particular, we hypothesize that peripheral developer participation is higher when the potential for building reputation by gaining recognition from project stakeholders is higher. 
We consider recognition by internal stakeholders (such as core developers) and external stakeholders (such as end-users and peers). We find a positive association between peripheral developer participation and the potential of stakeholder recognition after controlling for bug reports, feature requests, and other key factors. Our findings provide important insights for OSS founders and corporate managers for open sourcing or OSS adoption decisions. %B ACM Trans. Manage. Inf. Syst. %I ACM %C New York, NY, USA %V 6 %P 14:1–14:31 %U http://doi.acm.org/10.1145/2820618 %R 10.1145/2820618 %0 Journal Article %J Applied Economics %D 2016 %T Is there a wage premium for volunteer OSS engagement? – signalling, learning and noise %A Bitzer, Jürgen %A Geishecker, Ingo %A Schröder, Philipp J. H. %K open source software %K peer production %K signalling %K voluntary work %K wage formation %X Volunteer-based open-source production has become a significant new model for the organization of software development. Economics often pictures this phenomenon as a case of signaling: Individuals engage in the volunteer programming of open-source software (OSS) as a labor-market signal resulting in a wage premium. Yet, this explanation could so far not be empirically tested. The present paper fills this gap by estimating an upper-bound composite wage premium of voluntary OSS contributions and by separating the potential signaling effect of OSS engagement from other effects. Although some 70% of OSS contributors believe that OSS involvement benefits their careers, we find no actual labor market premium for OSS engagement. The presence of other motives such as fun of play or altruism render OSS contributions too noisy to function as a signal. %B Applied Economics %I Routledge %P 1 - 16 %8 09/2016 %! Applied Economics %R 10.1080/00036846.2016.1218427 %0 Report %D 2015 %T Candoia: A Platform and an Ecosystem for Building and Deploying Versatile Mining Software Repositories Tools %A Nitin M. 
Tiwari %A Dalton D. Mills %A Ganesha Upadhyaya %A Eric Lin %A Rajan, Hridesh %K Analysis of software and its evolution %K Application specific development environments %K flossmole cited %K msr %K research to practice %K software evolution %K software repositories %X Research on mining software repositories (MSR) has shown great promise during the last decade in solving many challenging software engineering problems. There exists, however, a ‘valley of death’ between these significant innovations in the MSR research and their deployment in practice. The significant cost of converting a prototype to software; need to provide support for a wide variety of tools and technologies e.g. CVS, SVN, Git, Bugzilla, Jira, Issues, etc, to improve applicability; and the high cost of customizing tools to practitioner-specific settings are some key hurdles in transition to practice. We describe Candoia, a platform and an ecosystem that is aimed at bridging this valley of death between innovations in MSR research and their deployment in practice. We have implemented Candoia and provide facilities to build and publish MSR ideas as Candoia apps. Our evaluation demonstrates that Candoia drastically reduces the cost of converting an idea to an app, thus reducing the barrier to transitioning research findings into practice. We also see versatility, in Candoia app’s ability to work with a variety of tools and technologies that the platform supports. Finally, we find that customizing Candoia app to fit project-specific needs is often well within the grasp of developers. 
%B Iowa State University Computer Science Technical Reports %I Iowa State University %8 11/2015 %U http://lib.dr.iastate.edu/cgi/viewcontent.cgi?article=1378&context=cs_techreports %> https://flosshub.org/sites/flosshub.org/files/Candoia-%20A%20Platform%20and%20an%20Ecosystem%20for%20Building%20and%20Deploying%20V.pdf %0 Conference Proceedings %B 12th Working Conference on Mining Software Repositories (MSR 2015) %D 2015 %T An Empirical Study of Architectural Change in Open-Source Software Systems %A Duc Minh Le %A Pooyan Behnamghader %A Joshua Garcia %A Daniel Link %A Arman Shahbazian %A Nenad Medvidovic %K architectural change %K architecture recovery %K open-source systems %K software architecture %K software evolution %X From its very inception, the study of software architecture has recognized architectural decay as a regularly occurring phenomenon in long-lived systems. Architectural decay is caused by repeated changes to a system during its lifespan. Despite decay’s prevalence, there is a relative dearth of empirical data regarding the nature of architectural changes that may lead to decay, and of developers’ understanding of those changes. In this paper, we take a step toward addressing that scarcity by conducting an empirical study of changes found in software architectures spanning several hundred versions of 14 open-source systems. Our study reveals several new findings regarding the frequency of architectural changes in software systems, the common points of departure in a system’s architecture during maintenance and evolution, the difference between system-level and component-level architectural change, and the suitability of a system’s implementation-level structure as a proxy for its architecture. 
%B 12th Working Conference on Mining Software Repositories (MSR 2015) %I IEEE %8 05/2015 %U http://softarch.usc.edu/~pooyan/publications/emparch_msr15.pdf %> https://flosshub.org/sites/flosshub.org/files/emparch_msr15.pdf %0 Journal Article %J Journal of Information Technology Research %D 2015 %T Evaluation of FLOSS by Analyzing Its Software Evolution: %A Macho, Héctor J. %A Gregorio Robles %A González-Barahona, Jesus M %K free software %K LMS %K moodle %K open source %K software engineering %K software evaluation %K software evolution %X In today’s world, management often rely on FLOSS (Free/Libre/Open Source Software) systems to run their organizations. However, the nature of FLOSS is different from the software they have been using in the last decades. Its development model is distributed, and its authors are diverse as many volunteers and companies may collaborate in the project. In this paper, we want to shed some light on how to evaluate a FLOSS system by looking at the Moodle platform, which is currently the most used learning management system among educational institutions worldwide. In contrast with other evaluation models that have been proposed so far, the one we present is based on retrieving historical information that can be obtained publicly from the Internet, allowing us to study its evolution. As a result, we will show how by using our methodology management can take informed decisions that lower the risk that organizations face when investing in a FLOSS system. 
%B Journal of Information Technology Research %V 8 %P 62 - 81 %8 01/2015 %N 1 %R 10.4018/JITR.2015010105 %> https://flosshub.org/sites/flosshub.org/files/Evaluation%20of%20FLOSS%20by%20Analyzing%20its%20Software%20Evolution%20-%20An%20Example%20Using%20the%20Moodle%20Platform.pdf %0 Book Section %B Open Source Systems: Adoption and Impact %D 2015 %T How Developers Acquire FLOSS Skills %A Barcomb, Ann %A Grottke, Michael %A Stauffert, Jan-Philipp %A Dirk Riehle %A Jahn, Sabrina %E Damiani, Ernesto %E Frati, Fulvio %E Dirk Riehle %E Wasserman, Anthony I. %K competencies %K Informal learning %K Non-formal learning %K open source %K Skills %K Software developer %X With the increasing prominence of open collaboration as found in free/libre/open source software projects and other joint production communities, potential participants need to acquire skills. How these skills are learned has received little research attention. This article presents a large-scale survey (5,309 valid responses) in which users and developers of the beta release of a popular file download application were asked which learning styles were used to acquire technical and social skills. We find that the extent to which a person acquired the relevant skills through informal methods tends to be higher if the person is a free/libre/open source code contributor, while being a professional software developer does not have this effect. Additionally, younger participants proved more likely to make use of formal methods of learning. These insights will help individuals, commercial companies, educational institutions, governments and open collaborative projects decide how they promote learning. 
%B Open Source Systems: Adoption and Impact %S IFIP Advances in Information and Communication Technology %I Springer International Publishing %V 451 %P 23-32 %@ 978-3-319-17836-3 %U http://dx.doi.org/10.1007/978-3-319-17837-0_3 %R 10.1007/978-3-319-17837-0_3 %> https://flosshub.org/sites/flosshub.org/files/oss-2015.pdf %0 Conference Proceedings %B 12th Working Conference on Mining Software Repositories (MSR 2015) %D 2015 %T Investigating Code Review Practices in Defective Files: An Empirical Study of the Qt System %A Patanamon Thongtanunam %A McIntosh, Shane %A Hassan, Ahmed E. %A Hajimu Iida %K code review %K software quality %X Software code review is a well-established software quality practice. Recently, Modern Code Review (MCR) has been widely adopted in both open source and proprietary projects. To evaluate the impact that characteristics of MCR practices have on software quality, this paper comparatively studies MCR practices in defective and clean source code files. We investigate defective files along two perspectives: 1) files that will eventually have defects (i.e., future-defective files) and 2) files that have historically been defective (i.e., risky files). Through an empirical study of 11,736 reviews of changes to 24,486 files from the Qt open source project, we find that both future-defective files and risky files tend to be reviewed less rigorously than their clean counterparts. We also find that the concerns addressed during the code reviews of both defective and clean files tend to enhance evolvability, i.e., ease future maintenance (like documentation), rather than focus on functional issues (like incorrect program logic). Our findings suggest that although functionality concerns are rarely addressed during code review, the rigor of the reviewing process that is applied to a source code file throughout a development cycle shares a link with its defect proneness. 
%B 12th Working Conference on Mining Software Repositories (MSR 2015) %I IEEE %8 05/2015 %U http://sail.cs.queensu.ca/publications/pubs/msr2015-thongtanunam.pdf %> https://flosshub.org/sites/flosshub.org/files/msr2015-thongtanunam.pdf %0 Generic %D 2015 %T Lessons Learned from Applying Social Network Analysis on an Industrial Free/Libre/Open Source Software Ecosystem %A Teixeira, Jose %A Gregorio Robles %A Jesus M. Gonzalez-Barahona %K business models %K cloud computing %K homophily %K open source %K Open-Coopetition %K openstack %K social network analysis %K Software ecosystems %X Many software projects are no longer done in-house by a single organization. Instead, we are in a new age where software is developed by a networked community of individuals and organizations, which base their relations to each other on mutual interest. Paradoxically, recent research suggests that software development can actually be jointly-developed by rival firms. For instance, it is known that the mobile-device makers Apple and Samsung kept collaborating in open source projects while running expensive patent wars in the court. Taking a case study approach, we explore how rival firms collaborate in the open source arena by employing a multi-method approach that combines qualitative analysis of archival data (QA) with mining software repositories (MSR) and Social Network Analysis (SNA). While exploring collaborative processes within the OpenStack ecosystem, our research contributes to Software Engineering research by exploring the role of groups, sub-communities and business models within a high-networked open source ecosystem. Surprising results point out that competition for the same revenue model (i.e., operating conflicting business models) does not necessary affect collaboration within the ecosystem. 
Moreover, while detecting the different sub-communities of the OpenStack community, we found out that the expected social tendency of developers to work with developers from same firm (i.e., homophily) did not hold within the OpenStack ecosystem. Furthermore, while addressing a novel, complex and unexplored open source case, this research also contributes to the management literature in coopetition strategy and high-tech entrepreneurship with a rich description on how heterogeneous actors within a high-networked ecosystem (involving individuals, startups, established firms and public organizations) joint-develop a complex infrastructure for big-data in the open source arena. %U http://arxiv.org/abs/1507.04587 %0 Conference Proceedings %B 12th Working Conference on Mining Software Repositories (MSR 2015) %D 2015 %T Mining StackOverflow to Filter out Off-topic IRC Discussion %A Shaiful Alam Chowdhury %A Hindle, Abram %K irc %K Stack Overflow %K youtube %X Internet Relay Chat (IRC) is a commonly used tool by open source developers. Developers use IRC channels to discuss programming related problems, but much of the discussion is irrelevant and off-topic. Essentially if we treat IRC discussions like email messages, and apply spam filtering, we can try to filter out the spam (the off-topic discussions) from the ham (the programming discussions). Yet we need labelled data that unfortunately takes time to curate. To avoid costly curation in order to filter out off-topic discussions, we need positive and negative data-sources. Online discussion forums, such as StackOverflow, are very effective for solving programming problems. By engaging in open-data, StackOverflow data becomes a powerful source of labelled text regarding programming. This work shows that we can train classifiers using StackOverflow posts as positive examples of on-topic programming discussion. YouTube video comments, notorious for their lack of quality, serve as a training set of off-topic discussion. 
By exploiting these datasets, accurate classifiers can be built, tested and evaluated that require very little effort for end-users to deploy and exploit. %B 12th Working Conference on Mining Software Repositories (MSR 2015) %8 05/2015 %> https://flosshub.org/sites/flosshub.org/files/shaiful-mining_so_0.pdf %0 Conference Paper %B Proceedings of the 11th International Symposium on Open Collaboration (OpenSym 2015) %D 2015 %T A multiple case study of small free software businesses as social entrepreneurships %A Barcomb, Ann %K free software %K open source software %K public good %K small business %K social entrepreneurship %K social ventures %X Free/libre and open source software are frequently described as a single community or movement. The difference between free software and open source ideology may influence founders, resulting in different types of companies being created. Specifically, the relationship between free/libre software ideology and social entrepreneurships is investigated. This paper presents seven case studies of businesses, five of which were founded by people who identify with the free/libre software movement. The result is a theory that small businesses founded by free/libre software advocates have three characteristics of social entrepreneurships. First, social benefit is prioritized over wealth creation. Second, the business’s social mission is not incidental but is furthered through its for-profit activities, rather than supported by the company’s profits. Third, the company’s success is defined in part by the success of its social mission. Free/libre software entrepreneurs who recognize their activities as social entrepreneurships can benefit from the existing literature on the unique challenges faced by socially-oriented businesses. 
%B Proceedings of the 11th International Symposium on Open Collaboration (OpenSym 2015) %U https://opus4.kobv.de/opus4-fau/frontdoor/index/index/docId/6334 %> https://flosshub.org/sites/flosshub.org/files/p100-barcomb.pdf %0 Book Section %B Open Source Systems: Adoption and Impact %D 2015 %T The RISCOSS Platform for Risk Management in Open Source Software Adoption %A Franch, X. %A Kenett, R. %A Mancinelli, F. %A Susi, A. %A Ameller, D. %A Annosi, M.C. %A Ben-Jacob, R. %A Blumenfeld, Y. %A Franco, O.H. %A Gross, D. %A Lopez, L. %A Morandini, M. %A Oriol, M. %A Siena, A. %E Damiani, Ernesto %E Frati, Fulvio %E Dirk Riehle %E Wasserman, Anthony I. %K Open source adoption %K Open Source Projects %K open source software %K OSS %K Risk Management %K Software platform %X Managing risks related to OSS adoption is a must for organizations that need to smoothly integrate OSS-related practices in their development processes. Adequate tool support may pave the road to effective risk management and ensure the sustainability of such activity. In this paper, we present the RISCOSS platform for managing risks in OSS adoption. RISCOSS builds upon a highly configurable data model that allows customization to several types of scopes. It implements two different working modes: exploration, where the impact of decisions may be assessed before making them; and continuous assessment, where risk variables (and their possible consequences on business goals) are continuously monitored and reported to decision-makers. The blackboard-oriented architecture of the platform defines several interfaces for the identified techniques, allowing new techniques to be plugged in. 
%B Open Source Systems: Adoption and Impact %S IFIP Advances in Information and Communication Technology %I Springer International Publishing %V 451 %P 124-133 %@ 978-3-319-17836-3 %U http://dx.doi.org/10.1007/978-3-319-17837-0_12 %R 10.1007/978-3-319-17837-0_12 %0 Book Section %B Open Source Systems: Adoption and Impact %D 2015 %T Scaling and Internationalizing an Agile FOSS Project: Lessons Learned %A Fellhofer, Stephan %A Harzl, Annemarie %A Slany, Wolfgang %E Damiani, Ernesto %E Frati, Fulvio %E Dirk Riehle %E Wasserman, Anthony I. %K Agile development %K communication %K Distributed software development %K Documentation management %K Internationalization %K kanban %K Scaling %X This paper describes problems that arose with the scaling and internationalization of the open source project Catrobat. The problems we faced were the lack of a centralized user management, insufficient scaling of our communication channels, and the necessity to adapt agile development techniques to remote collaboration. To solve the problems we decided to use a mix of open source tools (Git, IRC, LDAP) and commercial solutions (Jira, Confluence, GitHub) because we believe that this mix best fits our needs. Other projects can benefit from the lessons we learned during the reorganization of our knowledge base and communication tools, as infrastructure changes can be very labor-intensive and time-consuming. %B Open Source Systems: Adoption and Impact %S IFIP Advances in Information and Communication Technology %I Springer International Publishing %V 451 %P 13-22 %@ 978-3-319-17836-3 %U http://dx.doi.org/10.1007/978-3-319-17837-0_2 %R 10.1007/978-3-319-17837-0_2 %0 Conference Proceedings %B 37th International Conference on Software Engineering %D 2015 %T "Should we move to Stack Overflow?" 
Measuring the utility of social media for developer support %A Squire, Megan %K developer support %K forums %K mailing list %K metrics %K quality %K social media %K Stack Overflow %K technical support %X Stack Overflow is an enormously popular question-and-answer web site intended for software developers to help each other with programming issues. Some software projects aimed at developers (for example, application programming interfaces, application engines, cloud services, development frameworks, and the like) are closing their self-supported developer discussion forums and mailing lists and instead directing developers to use special-purpose tags on Stack Overflow. The goals of this paper are to document the main reasons given for moving developer support to Stack Overflow, and then to collect and analyze data from a group of software projects that have done this, in order to show whether the expected quality of support was actually achieved. The analysis shows that for all four software projects in this study, two of the desired quality indicators, developer participation and response time, did show improvements on Stack Overflow as compared to mailing lists and forums. However, we also found several projects that moved back from Stack Overflow, despite achieving these desired improvements. The results of this study are applicable to a wide variety of software projects that provide developer support using social media. %B 37th International Conference on Software Engineering %I IEEE %P 10pp %8 05/2015 %> https://flosshub.org/sites/flosshub.org/files/SEIP2015stackv2.pdf %0 Book Section %B Open Source Systems: Adoption and Impact %D 2015 %T Smart Route Planning Using Open Data and Participatory Sensing %A Nallur, Vivek %A Elgammal, Amal %A Clarke, Siobhán %E Damiani, Ernesto %E Frati, Fulvio %E Dirk Riehle %E Wasserman, Anthony I. 
%K Open-data %K open-source %K Participatory sensing %K Smart-city-routing %X Smart cities are not merely the infusion of technology into a city’s infrastructure, but also require citizens interacting with their urban environment in a smart and informed manner. Transportation is key aspect of smart cities. In this paper, we present a smart route planning open-source system; SMART-GH utilizes open data and participatory sensing, where citizens actively participate in collecting data about the city in their daily environment, e.g., noise, air pollution, etc. SMART-GH then augments the routing logic with sensor data to answer queries such as ‘return the least noisy route’. SMART-GH enables citizens to make smarter decisions about their daily commute, and subsequently improve their quality of life. %B Open Source Systems: Adoption and Impact %S IFIP Advances in Information and Communication Technology %I Springer International Publishing %V 451 %P 91-100 %@ 978-3-319-17836-3 %U http://dx.doi.org/10.1007/978-3-319-17837-0_9 %R 10.1007/978-3-319-17837-0_9 %> https://flosshub.org/sites/flosshub.org/files/nallur15.pdf %0 Journal Article %J Cognitive Systems Research %D 2015 %T Stigmergic coordination in FLOSS development teams: Integrating explicit and implicit mechanisms %A Bolici, Francesco %A Howison, James %A Kevin Crowston %K Coordination mechanisms %K distributed teams %K FLOSS teams %K Stigmergic coordination %X The vast majority of literature on coordination in team-based projects has drawn on a conceptual separation between explicit (e.g. plans, feedbacks) and implicit coordination mechanisms (e.g. mental maps, shared knowledge). This analytical distinction presents some limitations in explaining how coordination is reached in organizations characterized by distributed teams, scarce face to face meetings and fuzzy and changing lines of authority, as in free/libre open source software (FLOSS) development. 
Analyzing empirical illustrations from two FLOSS projects, we highlight the existence of a peculiar model, stigmergic coordination, which includes aspects of both implicit and explicit mechanisms. The work product itself (implicit) and the characteristics under which it is shared (explicit) play an under-appreciated role in helping software developers manage dependencies as they arise. We develop this argument beyond the existing literature by working with an existing coordination framework, considering the role that the codebase itself might play at each step. We also discuss the features and the practices to support stigmergic coordination in distributed teams, as well as recommendations for future research. “Not everything that implicitly exists needs to be rendered explicit” (Sloterdijk, 2009, p. 3). %B Cognitive Systems Research %8 12/2015 %U http://www.sciencedirect.com/science/article/pii/S1389041715000339 %! Cognitive Systems Research %R 10.1016/j.cogsys.2015.12.003 %> https://flosshub.org/sites/flosshub.org/files/COGSYS-RS-%28HHS%29-%282015%29-%283%29.pdf %0 Book Section %B Open Source Systems: Adoption and Impact %D 2015 %T Surveying the Adoption of FLOSS by Public Administration Local Organizations %A Tosi, Davide %A Lavazza, Luigi %A Morasca, Sandro %A Chiappa, Marco %E Damiani, Ernesto %E Frati, Fulvio %E Dirk Riehle %E Wasserman, Anthony I. %K FLOSS adoption %K Italy %K Public administrations %K Survey %X Background. The introduction of Open Source Software technologies in the Public Administration plays a key role in the spread of Open Source Software. The state of the art in the adoption of Open Source Software solutions in the Public Administration is not very well known even in areas like Lombardy, which is Italy’s largest and most developed region. Goal. 
The goal of the investigation documented in this paper is to obtain a clear picture about the introduction of Open Source Software technologies in the Public Administration, the obstacles to their adoption, and the willingness of stakeholders to proceed with their introduction. Method. We carried out a qualitative and quantitative survey that was submitted to a representative part of the Public Administrations in Lombardy. Results. The analysis of the qualitative and quantitative information shows that several Public Administrations are already using Open Source Software technologies, though not in all application areas. The savings are one frequently cited incentive to the adoption of Open Source Software. However, one obstacle is the fact that a comprehensive law on software in the Public Administration has not yet been approved. Conclusions. Our analysis provides results that indicate a common understanding of incentives, obstacles, and opportunities for Open Source Software technologies in Public Administrations. %B Open Source Systems: Adoption and Impact %S IFIP Advances in Information and Communication Technology %I Springer International Publishing %V 451 %P 114-123 %@ 978-3-319-17836-3 %U http://dx.doi.org/10.1007/978-3-319-17837-0_11 %R 10.1007/978-3-319-17837-0_11 %0 Book Section %B Open Source Systems: Adoption and Impact %D 2015 %T A Systematic Approach for Evaluating BPM Systems: Case Studies on Open Source and Proprietary Tools %A Delgado, Andrea %A Calegari, Daniel %A Milanese, Pablo %A Falcon, Renatta %A García, Esteban %E Damiani, Ernesto %E Frati, Fulvio %E Dirk Riehle %E Wasserman, Anthony I. %K Business Process Management Systems (BPMS) %K Evaluation methodology %K Open source and proprietary BPMS %K Systematic approach %X Business Process Management Systems (BPMS) provide support for modeling, developing, deploying, executing and evaluating business processes in an organization. 
Selecting a BPMS is not a trivial task, not only due to the many existing alternatives, both in the open source and proprietary realms, but also because it requires a thorough evaluation of its capabilities, contextualizing them in the organizational environment in which they will be used. In this paper we present a methodology to guide the systematic evaluation of BPMS that takes into account the specific needs of each organization. It provides a list of key characteristics of BPMS which are ranked by the organization and evaluated using test cases and quantitative criteria. We also present case studies of open source and proprietary BPMS evaluations following our proposal. %B Open Source Systems: Adoption and Impact %S IFIP Advances in Information and Communication Technology %I Springer International Publishing %V 451 %P 81-90 %@ 978-3-319-17836-3 %U http://dx.doi.org/10.1007/978-3-319-17837-0_8 %R 10.1007/978-3-319-17837-0_8 %0 Book Section %B Open Source Software: Mobile Open Source Technologies %D 2014 %T The Agile Management of Development Projects of Software Combining Scrum, Kanban and Expert Consultation %A Febles Parker, Michel Evaristo %A Monte, Yusleydi Fernández %E Corral, Luis %E Sillitti, Alberto %E Succi, Giancarlo %E Vlasenko, Jelena %E Wasserman, Anthony I. %K Agile management of projects %K kanban %K scrum %X At the University of Informatics Sciences (UCI), Havana, Cuba, it is found The Center of Free Solutions of Software (CESOL) who has an informatic project named “Auditing of Source Code” (ACF). This project has as objective to develop an open source software solution to auditing the source code of several software solutions with an agile projects management. 
In the present investigation have been showed the experiences obtained in the mixed application of two methods of agile projects management; Kanban and Scrum, together with the method Judgment of Expert, during the stage of construction of the lifecycle of ACF, when a quality audit was performed by specialists of the CALISOFT company. In the auditing were detected several errors and to resolve them was necessary to estimate efforts, time and to revalue the lifecycle of the project. Moreover, the investigation shows how this method can be used as a guide for young project managers for a correct planification and how can be used as a personal organizational method. %B Open Source Software: Mobile Open Source Technologies %S IFIP Advances in Information and Communication Technology %I Springer Berlin Heidelberg %V 427 %P 176-180 %@ 978-3-642-55127-7 %U http://dx.doi.org/10.1007/978-3-642-55128-4_25 %R 10.1007/978-3-642-55128-4_25 %0 Conference Paper %B Proceedings of the 11th Working Conference on Mining Software Repositories %D 2014 %T The Bug Catalog of the Maven Ecosystem %A Mitropoulos, Dimitris %A Vassilios Karakoidas %A Louridas, Panos %A Gousios, Georgios %A Diomidis Spinellis %K findbugs %K Maven Repository %K msr data showcase %K Software Bugs %X Examining software ecosystems can provide the research community with data regarding artifacts, processes, and communities. We present a dataset obtained from the Maven central repository ecosystem (approximately 265GB of data) by statically analyzing the repository to detect potential software bugs. For our analysis we used FindBugs, a tool that examines Java bytecode to detect numerous types of bugs. The dataset contains the metrics results that FindBugs reports for every project version (a JAR) included in the ecosystem. For every version we also stored specific metadata such as the JAR's size, its dependencies and others. 
Our dataset can be used to produce interesting research results, as we show in specific examples. %B Proceedings of the 11th Working Conference on Mining Software Repositories %S MSR 2014 %I ACM %C New York, NY, USA %P 372–375 %@ 978-1-4503-2863-0 %U http://doi.acm.org/10.1145/2597073.2597123 %R 10.1145/2597073.2597123 %> https://flosshub.org/sites/flosshub.org/files/mitro.pdf %0 Conference Paper %B Proceedings of the 11th Working Conference on Mining Software Repositories %D 2014 %T A Code Clone Oracle %A Krutz, Daniel E. %A Le, Wei %K clone %K Clone Oracle %K Code Clone Detection %K msr data showcase %K software engineering %X Code clones are functionally equivalent code segments. Detecting code clones is important for determining bugs, fixes and software reuse. Code clone detection is also essential for developing fast and precise code search algorithms. However, the challenge of such research is to evaluate that the clones detected are indeed functionally equivalent, considering the majority of clones are not textual or even syntactically identical. The goal of this work is to generate a set of method level code clones with a high confidence to help to evaluate future code clone detection and code search tools to evaluate their techniques. We selected three open source programs, Apache, Python and PostgreSQL, and randomly sampled a total of 1536 function pairs. To confirm whether or not these function pairs indicate a clone and what types of clones they belong to, we recruited three experts who have experience in code clone research and four students who have experience in programming for manual inspection. For confidence of the data, the experts consulted multiple code clone detection tools to make the consensus. To assist manual inspection, we built a tool to automatically load function pairs of interest and record the manual inspection results. We found that none of the 66 pairs are textual identical type-1 clones, and 9 pairs are type-4 clones. 
Our data is available at: http://phd.gccis.rit.edu/weile/data/cloneoracle/. %B Proceedings of the 11th Working Conference on Mining Software Repositories %S MSR 2014 %I ACM %C New York, NY, USA %P 388–391 %@ 978-1-4503-2863-0 %U http://doi.acm.org/10.1145/2597073.2597127 %R 10.1145/2597073.2597127 %> https://flosshub.org/sites/flosshub.org/files/clone_oracle.pdf %0 Conference Paper %B Proceedings of the 11th Working Conference on Mining Software Repositories %D 2014 %T Collaboration in Open-source Projects: Myth or Reality? %A Tymchuk, Yuriy %A Mocci, Andrea %A Lanza, Michele %K COLLABORATION %K Software ecosystems %X One of the fundamental principles of open-source projects is that they foster collaboration among developers, disregarding their geographical location or personal background. When it comes to software repositories collaboration is a rather ephemeral phenomenon which lacks a clear definition, and it must therefore be mined and modeled. This throws up the question whether what is mined actually maps to reality. In this paper we investigate collaboration by modeling it using a number of diverse approaches that we then compare to a ground truth obtained by surveying a substantial set of developers of the Pharo open-source community. Our findings indicate that the notion of collaboration must be revisited, as it is undermined by a number of factors that are often tackled in imprecise ways or not taken into account at all. %B Proceedings of the 11th Working Conference on Mining Software Repositories %S MSR 2014 %I ACM %C New York, NY, USA %P 304–307 %@ 978-1-4503-2863-0 %U http://doi.acm.org/10.1145/2597073.2597093 %R 10.1145/2597073.2597093 %0 Journal Article %J MIS Quarterly %D 2014 %T COLLABORATION THROUGH OPEN SUPERPOSITION: A THEORY OF THE OPEN SOURCE WAY. 
%A Howison, James %A Kevin Crowston %K COLLABORATION %K COMPUTER programmers %K COMPUTER programming %K COMPUTER software %K coordination %K FREEWARE (Computer software) %K INFORMATION storage & retrieval systems %K open source software %K research %K socio-technical system %X This paper develops and illustrates the theory of collaboration through open superposition: the process of depositing motivationally independent layers of work on top of each other over time. The theory is developed in a study of community-based free and open source software (FLOSS) development, through a research arc of discovery (participant observation), replication (two archival case studies), and theorization. The theory explains two key findings: (1) the overwhelming majority of work is accomplished with only a single programmer working on any one task, and (2) tasks that appear too large for any one individual are more likely to be deferred until they are easier rather than being undertaken through structured team work. Moreover, the theory explains how working through open superposition can lead to the discovery of a work breakdown that results in complex, functionally interdependent, work being accomplished without crippling search costs. We identify a set of socio-technical %B MIS Quarterly %V 38 %P 29 - A9 %0 Conference Paper %B Proceedings of the 11th Working Conference on Mining Software Repositories %D 2014 %T A Dataset for Maven Artifacts and Bug Patterns Found in Them %A Saini, Vaibhav %A Sajnani, Hitesh %A Ossher, Joel %A Lopes, Cristina V. %K Empirical Research %K Empirical software engineering %K findbugs %K maven %K software quality %X In this paper, we present data downloaded from Maven, one of the most popular component repositories. The data includes the binaries of 186,392 components, along with source code for 161,025. We identify and organize these components into groups where each group contains all the versions of a library. 
In order to assess the quality of these components, we make available report generated by the FindBugs tool on 64,574 components. The information is also made available in the form of a database which stores total number, type, and priority of bug patterns found in each component, along with its defect density. We also describe how this dataset can be useful in software engineering research. %B Proceedings of the 11th Working Conference on Mining Software Repositories %S MSR 2014 %I ACM %C New York, NY, USA %P 416–419 %@ 978-1-4503-2863-0 %U http://doi.acm.org/10.1145/2597073.2597134 %R 10.1145/2597073.2597134 %0 Book Section %B Open Source Software: Mobile Open Source Technologies %D 2014 %T An Exploration of Code Quality in FOSS Projects %A Ahmed, Iftekhar %A Ghorashi, Soroush %A Jensen, Carlos %E Corral, Luis %E Sillitti, Alberto %E Succi, Giancarlo %E Vlasenko, Jelena %E Wasserman, Anthony I. %K Code Quality %K FOSS %K open source software %K success metrics %X It is a widely held belief that Free/Open Source Software (FOSS) development leads to the creation of software with the same, if not higher quality compared to that created using proprietary software development models. However there is little research on evaluating the quality of FOSS code, and the impact of project characteristics such as age, number of core developers, code-base size, etc. In this exploratory study, we examined 110 FOSS projects, measuring the quality of the code and architectural design using code smells. We found that, contrary to our expectations, the overall quality of the code is not affected by the size of the code base, but that it was negatively impacted by the growth of the number of code contributors. Our results also show that projects with more core developers don’t necessarily have better code quality. 
%B Open Source Software: Mobile Open Source Technologies %S IFIP Advances in Information and Communication Technology %I Springer Berlin Heidelberg %V 427 %P 181-190 %@ 978-3-642-55127-7 %U http://dx.doi.org/10.1007/978-3-642-55128-4_26 %R 10.1007/978-3-642-55128-4_26 %0 Conference Paper %B Proceedings of the Companion Publication of the 17th ACM Conference on Computer Supported Cooperative Work & Social Computing %D 2014 %T Exploring the Ecosystem of Software Developers on GitHub and Other Platforms %A Wu, Yu %A Kropczynski, Jessica %A Shih, Patrick C. %A Carroll, John M. %K ecosystem %K follow %K github %K social connection %X GitHub provides various social features for developers to collaborate with others. Those features are important for developers to coordinate their work (Dabbish et al., 2012; Marlow et al., 2013). We hypothesized that the social system of GitHub users was bound by system interactions such that contributing to similar code repositories would lead to users following one another on GitHub or vice versa. Using a quadratic assignment procedure (QAP) correlation, however, only a weak correlation among followship and production activities (code, issue, and wiki contributions) was found. Survey with GitHub users revealed an ecosystem on the Internet for software developers, which includes many platforms, such as Forrst, Twitter, and Hacker News, among others. Developers make social introductions and other interactions on these platforms and engage with one another on GitHub. Due to these preliminary findings, we describe GitHub as a part of a larger ecosystem of developer interactions. 
%B Proceedings of the Companion Publication of the 17th ACM Conference on Computer Supported Cooperative Work & Social Computing %S CSCW Companion '14 %I ACM %C New York, NY, USA %P 265–268 %@ 978-1-4503-2541-7 %U http://doi.acm.org/10.1145/2556420.2556483 %R 10.1145/2556420.2556483 %0 Conference Paper %B Proceedings of the 11th Working Conference on Mining Software Repositories %D 2014 %T FLOSS 2013: A Survey Dataset About Free Software Contributors: Challenges for Curating, Sharing, and Combining %A Gregorio Robles %A Reina, Laura Arjona %A Serebrenik, Alexander %A Vasilescu, Bogdan %A González-Barahona, Jesús M. %K anonymization %K data combining %K data sharing %K ethics %K free software %K microdata %K msr data showcase %K open data %K open source %K privacy %K Survey %X In this data paper we describe a data set obtained by means of performing an on-line survey to over 2,000 Free Libre Open Source Software (FLOSS) contributors. The survey includes questions related to personal characteristics (gender, age, civil status, nationality, etc.), education and level of English, professional status, dedication to FLOSS projects, reasons and motivations, involvement and goals. We describe as well the possibilities and challenges of using private information from the survey when linked with other, publicly available data sources. In this regard, an example of data sharing will be presented and legal, ethical and technical issues will be discussed. 
%B Proceedings of the 11th Working Conference on Mining Software Repositories %S MSR 2014 %I ACM %C New York, NY, USA %P 396–399 %@ 978-1-4503-2863-0 %U http://doi.acm.org/10.1145/2597073.2597129 %R 10.1145/2597073.2597129 %> https://flosshub.org/sites/flosshub.org/files/msr14gregorio.pdf %0 Book Section %B Open Source Software: Mobile Open Source Technologies %D 2014 %T Flow Research SXP Agile Methodology for FOSS Projects %A Peñalver Romero, GladysMarsi %A Leyva Samada, LisandraIsabel %A Abad, AbelMeneses %E Corral, Luis %E Sillitti, Alberto %E Succi, Giancarlo %E Vlasenko, Jelena %E Wasserman, AnthonyI. %K methodology SXP %K open-source %K production %K research %K Software %X This paper aims to explain a procedure that takes into account the different research processes carried out in developing an open-source, allowing control and management. A study in which the SXP methodology was applied to this type of project was carried out, supporting the validity of the basis of this research. %B Open Source Software: Mobile Open Source Technologies %S IFIP Advances in Information and Communication Technology %I Springer Berlin Heidelberg %V 427 %P 195-198 %@ 978-3-642-55127-7 %U http://dx.doi.org/10.1007/978-3-642-55128-4_28 %R 10.1007/978-3-642-55128-4_28 %0 Book Section %B Open Source Software: Mobile Open Source Technologies %D 2014 %T FOSS Service Management and Incidences %A Ortiz, SusanaSánchez %A Pérez Benitez, Alfredo %E Corral, Luis %E Sillitti, Alberto %E Succi, Giancarlo %E Vlasenko, Jelena %E Wasserman, AnthonyI. %K FOSS %K service management and incidences %X Free Open Source Software (FOSS) solutions have been reaching high demand, usage and global recognition, not only in the development of applications for companies and institutions but also in the management of services and incidents. With the upswing of Information Technology (IT), the development of tools that enable the reporting of problems and incidents in any organization or company is necessary. 
Every day, more applications, and software in general, are needed to make users’ actions easier. This paper describes the need to use these tools and recounts the development of a web application that allows the management of reports and incidents from users of Nova, the GNU/Linux Cuban distribution. %B Open Source Software: Mobile Open Source Technologies %S IFIP Advances in Information and Communication Technology %I Springer Berlin Heidelberg %V 427 %P 76-79 %@ 978-3-642-55127-7 %U http://dx.doi.org/10.1007/978-3-642-55128-4_9 %R 10.1007/978-3-642-55128-4_9 %0 Book Section %B Open Source Software: Mobile Open Source Technologies %D 2014 %T How Do Social Interaction Networks Influence Peer Impressions Formation? A Case Study %A Bosu, Amiangshu %A Carver, JeffreyC. %E Corral, Luis %E Sillitti, Alberto %E Succi, Giancarlo %E Vlasenko, Jelena %E Wasserman, AnthonyI. %K COLLABORATION %K FOSS %K open source %K OSS %K social network analysis %X Due to their lack of physical interaction, Free and Open Source Software (FOSS) participants form impressions of their teammates largely based on sociotechnical mechanisms including: code commits, code reviews, mailing-lists, and bug comments. These mechanisms may have different effects on peer impression formation. This paper describes a social network analysis of the WikiMedia project to determine which type of interaction has the most favorable characteristics for impressions formation. The results suggest that due to lower centralization, high interactivity, and high degree of interactions between participants, the code review interactions have the most favorable characteristics to support impression formation among FOSS participants. 
%B Open Source Software: Mobile Open Source Technologies %S IFIP Advances in Information and Communication Technology %I Springer Berlin Heidelberg %V 427 %P 31-40 %@ 978-3-642-55127-7 %U http://dx.doi.org/10.1007/978-3-642-55128-4_4 %R 10.1007/978-3-642-55128-4_4 %0 Conference Paper %B Proceedings of the 17th ACM Conference on Computer Supported Cooperative Work & Social Computing %D 2014 %T How Social Q&A Sites Are Changing Knowledge Sharing in Open Source Software Communities %A Vasilescu, Bogdan %A Serebrenik, Alexander %A Devanbu, Prem %A Filkov, Vladimir %K crowdsourced knowledge %K gamification %K mailing lists %K open source %K social Q&A %X Historically, mailing lists have been the preferred means for coordinating development and user support activities. With the emergence and popularity growth of social Q&A sites such as the StackExchange network (e.g., StackOverflow), this is beginning to change. Such sites offer different socio-technical incentives to their participants than mailing lists do, e.g., rich web environments to store and manage content collaboratively, or a place to showcase their knowledge and expertise more vividly to peers or potential recruiters. A key difference between StackExchange and mailing lists is gamification, i.e., StackExchange participants compete to obtain reputation points and badges. In this paper, we use a case study of R (a widely-used tool for data analysis) to investigate how mailing list participation has evolved since the launch of StackExchange. Our main contribution is the assembly of a joint data set from the two sources, in which participants in both the r-help mailing list and StackExchange are identifiable. This permits their activities to be linked across the two resources and also over time. With this data set we found that user support activities show a strong shift away from r-help. 
In particular, mailing list experts are migrating to StackExchange, where their behaviour is different. First, participants active both on r-help and on StackExchange are more active than those who focus exclusively on only one of the two. Second, they provide faster answers on StackExchange than on r-help, suggesting they are motivated by the gamified environment. To our knowledge, our study is the first to directly chart the changes in behaviour of specific contributors as they migrate into gamified environments, and has important implications for knowledge management in software engineering. %B Proceedings of the 17th ACM Conference on Computer Supported Cooperative Work & Social Computing %S CSCW '14 %I ACM %C New York, NY, USA %P 342–354 %@ 978-1-4503-2540-0 %U http://doi.acm.org/10.1145/2531602.2531659 %R 10.1145/2531602.2531659 %> https://flosshub.org/sites/flosshub.org/files/cscw14.pdf %0 Conference Paper %B Proceedings of the 1st International Workshop on Crowd-based Software Development Methods and Technologies %D 2014 %T Investigating Social Media in GitHub's Pull-requests: A Case Study on Ruby on Rails %A Zhang, Yang %A Yin, Gang %A Yu, Yue %A Wang, Huaimin %K @-mention %K github %K pull-request %K social media %X In GitHub, the pull-request mechanism is an outstanding social development method that integrates with many social media. Many studies have shown that social media has an important effect on software development. The @-mention, as a typical social media feature, is a useful tool in social platforms. In this paper, we made a quantitative analysis of @-mention in pull-requests of the project Ruby on Rails. First, we present convincing statistics on the popularity of the pull-request mechanism in GitHub. Then we investigate the current situation of @-mention in Ruby on Rails. Our empirical analysis yields some insights into @-mention. 
%B Proceedings of the 1st International Workshop on Crowd-based Software Development Methods and Technologies %S CrowdSoft 2014 %I ACM %C New York, NY, USA %P 37–41 %@ 978-1-4503-3224-8 %U http://doi.acm.org/10.1145/2666539.2666572 %R 10.1145/2666539.2666572 %0 Conference Paper %B Proceedings of the 11th Working Conference on Mining Software Repositories %D 2014 %T Magnet or Sticky? An OSS Project-by-project Typology %A Yamashita, Kazuhiro %A McIntosh, Shane %A Kamei, Yasutaka %A Ubayashi, Naoyasu %K Developer migration %K Magnet %K mining challenge %K msr challenge %K open source %K Sticky %X For Open Source Software (OSS) projects, retaining existing contributors and attracting new ones is a major concern. In this paper, we expand and adapt a pair of population migration metrics to analyze migration trends in a collection of open source projects. Namely, we study: (1) project stickiness, i.e., its tendency to retain existing contributors and (2) project magnetism, i.e., its tendency to attract new contributors. Using quadrant plots, we classify projects as attractive (highly magnetic and sticky), stagnant (highly sticky, weakly magnetic), fluctuating (highly magnetic, weakly sticky), or terminal (weakly magnetic and sticky). Through analysis of the MSR challenge dataset, we find that: (1) quadrant plots can effectively identify at-risk projects, (2) stickiness is often motivated by professional activity and (3) transitions among quadrants as a project ages often coincides with interesting events in the evolution history of a project. 
%B Proceedings of the 11th Working Conference on Mining Software Repositories %S MSR 2014 %I ACM %C New York, NY, USA %P 344–347 %@ 978-1-4503-2863-0 %U http://doi.acm.org/10.1145/2597073.2597116 %R 10.1145/2597073.2597116 %> https://flosshub.org/sites/flosshub.org/files/yamashita.pdf %0 Unpublished Work %D 2014 %T Measuring the Health of Open Source Software Ecosystems: Moving Beyond the Scope of Project Health %A Slinger Jansen %K open source ecosystems %K Software ecosystem health %K Software repository mining %X Background. The livelihood of an open source ecosystem is important to different ecosystem participants: software developers, end-users, investors, and participants want to know whether their ecosystem is healthy and performing well. Currently, there exists no working operationalization available that can be used to determine the health of open source ecosystems. Health is typically looked at from a project scope, not from an ecosystem scope. Objectives. With such an operationalization, stakeholders can make better decisions on whether to invest in an ecosystem: developers can select the healthiest ecosystem to join, keystone organizers can establish which governance techniques are effective, and end-users can select ecosystems that are robust, will live long, and prosper. Method. Design research is used to create the health operationalization. The evaluation step is done using four ecosystem health projects from literature. Results. The Open Source Ecosystem Health Operationalization is provided, which establishes the health of a complete software ecosystem, using the data from collections of open source projects that belong to the ecosystem. Conclusion. The groundwork is done, by providing a summary of research challenges, for more research in ecosystem health. With the operationalization in hand, researchers no longer need to start from scratch when researching open source ecosystems’ health. 
%U https://www.dropbox.com/s/borc730uw32kkzp/SECOhealth.pdf?dl=0 %> https://flosshub.org/sites/flosshub.org/files/SECOhealth%20%281%29.pdf %0 Conference Paper %B Proceedings of the 2014 European Conference on Software Architecture Workshops %D 2014 %T The Merits of a Meritocracy in Open Source Software Ecosystems %A Eckhardt, Evert %A Kaats, Erwin %A Slinger Jansen %A Alves, Carina %K Ecosystem Health %K Meritocracy %K open source %K Software ecosystems %X The Eclipse open source ecosystem has grown from a small internal IBM project to one of the biggest Integrated Development Environments in the market. Open source communities and ecosystems do not follow the standard governance strategies typically used in large organizations. A meritocracy is a frequently occurring form of governance on different levels in open ecosystems. In this paper we investigate how this form of governance influences the health of projects within the Eclipse ecosystem in terms of the amount of commits within each month. We analyzed the hierarchy of Eclipse, how merits are conceptualized within the ecosystem and the effect of the appointments of mentors and project leads on the amount of commits. From our research, we can conclude that this system is not always as fair as it seems; merits are only a benefit in some cases. %B Proceedings of the 2014 European Conference on Software Architecture Workshops %S ECSAW '14 %I ACM %C New York, NY, USA %P 7:1–7:6 %@ 978-1-4503-2778-7 %U http://doi.acm.org/10.1145/2642803.2642810 %R 10.1145/2642803.2642810 %0 Conference Paper %B Proceedings of the 11th Working Conference on Mining Software Repositories %D 2014 %T Oops! Where Did That Code Snippet Come from? %A Guo, Lisong %A Lawall, Julia %A Muller, Gilles %K debugging %K linux kernel %K oops %K sequence alignment %X A kernel oops is an error report that logs the status of the Linux kernel at the time of a crash. 
Such a report can provide valuable first-hand information for a Linux kernel maintainer to conduct postmortem debugging. Recently, a repository has been created that systematically collects kernel oopses from Linux users. However, debugging based on only the information in a kernel oops is difficult. We consider the initial problem of finding the offending line, i.e., the line of source code that incurs the crash. For this, we propose a novel algorithm based on approximate sequence matching, as used in bioinformatics, to automatically pinpoint the offending line based on information about nearby machine-code instructions, as found in a kernel oops. Our algorithm achieves 92% accuracy compared to 26% for the traditional approach of using only the oops instruction pointer. %B Proceedings of the 11th Working Conference on Mining Software Repositories %S MSR 2014 %I ACM %C New York, NY, USA %P 52–61 %@ 978-1-4503-2863-0 %U http://doi.acm.org/10.1145/2597073.2597094 %R 10.1145/2597073.2597094 %> https://flosshub.org/sites/flosshub.org/files/guo.pdf %0 Conference Paper %B Proceedings of the 11th Working Conference on Mining Software Repositories %D 2014 %T Security and Emotion: Sentiment Analysis of Security Discussions on GitHub %A Pletea, Daniel %A Vasilescu, Bogdan %A Serebrenik, Alexander %K github %K mining challenge %K msr challenge %K security %K sentiment analysis %X Application security is becoming increasingly prevalent during software and especially web application development. Consequently, countermeasures are continuously being discussed and built into applications, with the goal of reducing the risk that unauthorized code will be able to access, steal, modify, or delete sensitive data. In this paper we gauged the presence and atmosphere surrounding security-related discussions on GitHub, as mined from discussions around commits and pull requests. First, we found that security related discussions account for approximately 10% of all discussions on GitHub. 
Second, we found that more negative emotions are expressed in security-related discussions than in other discussions. These findings confirm the importance of properly training developers to address security concerns in their applications as well as the need to test applications thoroughly for security vulnerabilities in order to reduce frustration and improve overall project atmosphere. %B Proceedings of the 11th Working Conference on Mining Software Repositories %S MSR 2014 %I ACM %C New York, NY, USA %P 348–351 %@ 978-1-4503-2863-0 %U http://doi.acm.org/10.1145/2597073.2597117 %R 10.1145/2597073.2597117 %> https://flosshub.org/sites/flosshub.org/files/pletea.pdf %0 Conference Paper %B Proceedings of the 11th Working Conference on Mining Software Repositories %D 2014 %T Sentiment Analysis of Commit Comments in GitHub: An Empirical Study %A Guzman, Emitza %A Azócar, David %A Li, Yang %K Human Factors in Software Engineering %K mining challenge %K msr challenge %K sentiment analysis %X Emotions have a high impact on productivity, task quality, creativity, group rapport and job satisfaction. In this work we use lexical sentiment analysis to study emotions expressed in commit comments of different open source projects and analyze their relationship with different factors such as used programming language, time and day of the week in which the commit was made, team distribution and project approval. Our results show that projects developed in Java tend to have more negative commit comments, and that projects that have more distributed teams tend to have a higher positive polarity in their emotional content. Additionally, we found that commit comments written on Mondays tend to express more negative emotion. While our results need to be confirmed by a more representative sample, they are an initial step into the study of emotions and related factors in open source projects. 
%B Proceedings of the 11th Working Conference on Mining Software Repositories %S MSR 2014 %I ACM %C New York, NY, USA %P 352–355 %@ 978-1-4503-2863-0 %U http://doi.acm.org/10.1145/2597073.2597118 %R 10.1145/2597073.2597118 %0 Journal Article %J Revista Eletrônica de Sistemas de Informação %D 2014 %T SENTIMENT ANALYSIS OF FREE/OPEN SOURCE DEVELOPERS: PRELIMINARY FINDINGS FROM A CASE STUDY %A Rousinopoulos, Athanasios-Ilias %A Gregorio Robles %A González-Barahona, Jesús M. %K developer productivity %K FLOSS %K mailing lists %K natural language processing %K openSUSE %K sentiment analysis %K software development; software repository mining %X Software development is a human intensive activity. And as such, how developers face their tasks is of major importance. In an environment such as the one that is common in FOSS (free/open source software) projects where professionals (i.e., paid developers) share the development effort with volunteers, the morale of the development and user community is of major importance. In this paper, we present a preliminary analysis applying sentiment analysis techniques to a FOSS project. We therefore mine the mailing list of a project and apply these techniques to the most relevant participants. Although the application is at this time limited, we hope that this experience can be of benefit in the future to determine situations that may affect the developers or the project, such as low productivity, developer abandonment, project forking, etc. %B Revista Eletrônica de Sistemas de Informação %V 13 %8 08/2014 %U http://189.16.45.2/ojs/index.php/reinfo/article/view/1677 %N 2 %! RESI %R 10.5329/RESI.2014.1302006 %> https://flosshub.org/sites/flosshub.org/files/1677-6732-1-PB.pdf %0 Conference Paper %B Proceedings of the Companion Publication of the 17th ACM Conference on Computer Supported Cooperative Work & Social Computing %D 2014 %T Software Developers Are Humans, Too! 
%A Vasilescu, Bogdan %K human aspects %K open-source %K software developers %X Open-source communities can be seen as knowledge-sharing ecosystems: participants learn from the community and from one another, and share their knowledge through contributions to the source code repositories or by offering support to users. With the emergence and growing popularity of social media sites targeting software developers (e.g., StackOverflow, GitHub), the paths through which knowledge flows within open-source software knowledge-sharing ecosystems are also beginning to change. My dissertation research seeks to raise our understanding of these changes. %B Proceedings of the Companion Publication of the 17th ACM Conference on Computer Supported Cooperative Work & Social Computing %S CSCW Companion '14 %I ACM %C New York, NY, USA %P 97–100 %@ 978-1-4503-2541-7 %U http://doi.acm.org/10.1145/2556420.2556833 %R 10.1145/2556420.2556833 %0 Journal Article %J Science of Computer Programming %D 2014 %T Studying software evolution using topic models %A Stephen W. Thomas %A Adams, Bram %A Hassan, Ahmed E. %A Blostein, Dorothea %K Latent Dirichlet allocation %K mining software repositories %K software evolution %K topic model %X Topic models are generative probabilistic models which have been applied to information retrieval to automatically organize and provide structure to a text corpus. Topic models discover topics in the corpus, which represent real world concepts by frequently co-occurring words. Recently, researchers found topics to be effective tools for structuring various software artifacts, such as source code, requirements documents, and bug reports. This research also hypothesized that using topics to describe the evolution of software repositories could be useful for maintenance and understanding tasks. 
However, research has yet to determine whether these automatically discovered topic evolutions describe the evolution of source code in a way that is relevant or meaningful to project stakeholders, and thus it is not clear whether topic models are a suitable tool for this task. In this paper, we take a first step towards evaluating topic models in the analysis of software evolution by performing a detailed manual analysis on the source code histories of two well-known and well-documented systems, JHotDraw and jEdit. We define and compute various metrics on the discovered topic evolutions and manually investigate how and why the metrics evolve over time. We find that the large majority (87%–89%) of topic evolutions correspond well with actual code change activities by developers. We are thus encouraged to use topic models as tools for studying the evolution of a software system. %B Science of Computer Programming %I Elsevier %V 80 %P 457–479 %U http://sail.cs.queensu.ca/publications/pubs/Thomas-2012-SCP.pdf %0 Conference Paper %B Proceedings of The International Symposium on Open Collaboration %D 2014 %T Understanding Coopetition in the Open-Source Arena: The Cases of WebKit and OpenStack %A Teixeira, Jose %K COLLABORATION %K Competition %K Coopetition %K Ecosystems %K FLOSS %K Open-Coopetition %K open-source %K OSS %K Strategic Alliances %X In an era of software crisis, the move of firms towards distributed software development teams is being challenged by emerging collaboration issues. On this matter, the open-source phenomenon may shed some light, as successful cases on distributed collaboration in the open-source community have been recurrently reported. In our research we explore collaboration networks in the WebKit and OpenStack high-networked open-source projects, by mining their source-code version-control-systems data with Social Network Analysis (SNA). 
Our approach allows us to observe how key events in the industry affect open-source collaboration networks over time. With our findings, we highlight the explanatory power from network visualizations capturing the collaborative dynamics of high-networked software projects over time. Moreover, we argue that competing companies that sell similar products in the same market, can collaborate in the open-source community while publicly manifesting intense rivalry (e.g. Apple vs Samsung patent-wars). After integrating our findings with the current body of theoretical knowledge in management strategy, economics, strategic alliances and coopetition, we propose the novel notion of open-coopetition, where rival firms collaborate with competitors in the open-source community. We argue that classical coopetition management theories do not fully explain the competitive and collaborative issues that are simultaneously present and interconnected in the WebKit and OpenStack open-source communities. We propose the development of the novel open-coopetition theory for a better understanding on how rival-firms collaborate with competitors by open-source manners. %B Proceedings of The International Symposium on Open Collaboration %S OpenSym '14 %I ACM %C New York, NY, USA %P 39:1–39:5 %@ 978-1-4503-3016-9 %U http://doi.acm.org/10.1145/2641580.2641627 %R 10.1145/2641580.2641627 %0 Conference Paper %B Proceedings of the 11th Working Conference on Mining Software Repositories %D 2014 %T Understanding "Watchers" on GitHub %A Sheoran, Jyoti %A Blincoe, Kelly %A Kalliamvakou, Eirini %A Damian, Daniela %A Ell, Jordan %K github %K mining challenge %K msr challenge %K repositories %K Software Teams %K Watchers %X Users on GitHub can watch repositories to receive notifications about project activity. This introduces a new type of passive project membership. In this paper, we investigate the behavior of watchers and their contribution to the projects they watch. 
We find that a subset of project watchers begin contributing to the project and those contributors account for a significant percentage of contributors on the project. As contributors, watchers are more confident and contribute over a longer period of time in a more varied way than other contributors. This is likely attributable to the knowledge gained through project notifications. %B Proceedings of the 11th Working Conference on Mining Software Repositories %S MSR 2014 %I ACM %C New York, NY, USA %P 336–339 %@ 978-1-4503-2863-0 %U http://doi.acm.org/10.1145/2597073.2597114 %R 10.1145/2597073.2597114 %0 Book Section %B Open Source Software: Mobile Open Source Technologies %D 2014 %T Use of Open Software Tools for Data Offloading Techniques Analysis on Mobile Networks %A Koo, JoséM. %A Espino, JuanP. %A Armuelles, Iván %A Villarreal, Rubén %E Corral, Luis %E Sillitti, Alberto %E Succi, Giancarlo %E Vlasenko, Jelena %E Wasserman, AnthonyI. %K Data Offloading %K LTE %K ns3 %K OSS for research and education %K small cells %K WiFi %X This research aims to highlight the benefits of using free software based tools for studying an LTE mobile network with realistic parameters. We will overload this LTE network and offload it through data offloading techniques such as small cells and WiFi offload. For this research, the discrete-event open software network simulator ns3 will be implemented. Ns3 is a network simulator based on the programming language C++, and has all the necessary libraries to simulate an LTE and WiFi network. 
%B Open Source Software: Mobile Open Source Technologies %S IFIP Advances in Information and Communication Technology %I Springer Berlin Heidelberg %V 427 %P 111-112 %@ 978-3-642-55127-7 %U http://dx.doi.org/10.1007/978-3-642-55128-4_15 %R 10.1007/978-3-642-55128-4_15 %0 Conference Paper %B Proceedings of The International Symposium on Open Collaboration %D 2014 %T Volunteer Attraction and Retention in Open Source Communities %A Barcomb, Ann %K Community Management %K FLOSS %K open source %K Recruitment %K Service Duration %K Volunteer Management %K Volunteer Retention %K Volunteers %X The importance of volunteers in open source has led to the position of community manager becoming more common in foundations and projects. Yet the advice for volunteer management and retention is fragmented, incomplete, contradictory, and has not been empirically examined. Our aim is to fill this gap by creating a comprehensive guidebook of best practices drawing from open source practitioner guides and general literature on volunteering, and to subject a subset of practices to empirical study. A method for evaluating volunteer attrition in terms of value to the organization will also be developed. %B Proceedings of The International Symposium on Open Collaboration %S OpenSym '14 %I ACM %C New York, NY, USA %P 40:1–40:2 %@ 978-1-4503-3016-9 %U http://doi.acm.org/10.1145/2641580.2641628 %R 10.1145/2641580.2641628 %0 Book Section %B Open Source Software: Mobile Open Source Technologies %D 2014 %T When Are OSS Developers More Likely to Introduce Vulnerable Code Changes? A Case Study %A Bosu, Amiangshu %A Carver, JeffreyC. %A Hafiz, Munawar %A Hilley, Patrick %A Janni, Derek %E Corral, Luis %E Sillitti, Alberto %E Succi, Giancarlo %E Vlasenko, Jelena %E Wasserman, AnthonyI. 
%K FOSS %K open source %K OSS %K security %K vulnerability %X We analyzed peer code review data of the Android Open Source Project (AOSP) to understand whether code changes that introduce security vulnerabilities, referred to as vulnerable code changes (VCC), occur at certain intervals. Using a systematic manual analysis process, we identified 60 VCCs. Our results suggest that AOSP developers were more likely to write VCCs prior to AOSP releases, while during the post-release period they wrote fewer VCCs. %B Open Source Software: Mobile Open Source Technologies %S IFIP Advances in Information and Communication Technology %I Springer Berlin Heidelberg %V 427 %P 234-236 %@ 978-3-642-55127-7 %U http://dx.doi.org/10.1007/978-3-642-55128-4_37 %R 10.1007/978-3-642-55128-4_37 %0 Conference Proceedings %B 25th International Conference on Software Engineering and Knowledge Engineering (SEKE) %D 2013 %T Analyzing Social Behavior of Software Developers Across Different Communication Channels %A Iqbal, Aftab %A M Karnstedt %A M Hausenblas %K communication %K developer %K social media %X Software developers use different project repositories (i.e., mailing list, bug tracking repositories, discussion forums etc.) to interact with each other or to solve software related problems. The growing interest in the usage of social media channels (i.e., Twitter, Facebook, LinkedIn) has also attracted the open source software community and software developers to adopt an identity in order to disseminate project-related information to a wider audience. Much research has been carried out to analyze the social behavior of software developers in different project repositories but so far no one has tried to study the social communication patterns of developers in other social media channels. 
In this paper, we present a new dimension to the social aspects of software developers and study whether the social communication patterns of software developers differ between project repositories and social media channels (i.e., Twitter). %B 25th International Conference on Software Engineering and Knowledge Engineering (SEKE) %U http://index.ksi.edu/conf/seke/2013/cr/296.pdf %> https://flosshub.org/sites/flosshub.org/files/iqbal_a_et_al_june_2013.pdf %0 Conference Paper %B Proceedings of the 6th Balkan Conference in Informatics %D 2013 %T An application of data envelopment analysis to software quality assessment %A Paschalidou, Georgia %A Stiakakis, Emmanouil %A Chatzigeorgiou, Alexander %K dea %K design metrics %K software evolution %K software quality %X Data Envelopment Analysis (DEA) is a non-parametric technique which involves the use of linear programming methods to measure the efficiency of a homogeneous set of units. These units are known as Decision Making Units (DMUs) and defined by multiple input and output data. Efficiencies are measured relative to a piece-wise surface (efficient frontier) which envelops the data, thus justifying the name of the technique. Although DEA has been mostly used in production economics, its application in the context of software quality evaluation seems to be a promising approach. This study provides an application of DEA to assess the evolution of two open-source software projects in terms of selected metric values for successive versions of each project. What is really interesting in DEA is that a single efficiency score is calculated for each version despite the often convoluted overall picture of the metric values. According to a simplified view of DEA, there are two categories of units, the efficient (on the efficient frontier) and the inefficient ones. Each inefficient unit is characterized by a reference set of peers which involves all the efficient units "operating" closer to that unit. 
Through the consideration of the reference set of the inefficient versions of each project, the metrics that require improvement, as well as the extent of improvement, could be estimated. These results could assist software developers in identifying design issues that require further improvement. Notwithstanding the fact that there are a number of issues to be further investigated, the applicability of DEA and other operations research tools in the context of software quality might yield interesting results. %B Proceedings of the 6th Balkan Conference in Informatics %S BCI '13 %I ACM %C New York, NY, USA %P 228–235 %@ 978-1-4503-1851-8 %U http://doi.acm.org/10.1145/2490257.2490264 %R 10.1145/2490257.2490264 %0 Conference Proceedings %B 35th Int'l Conference on Software Engineering (ICSE 2013) %D 2013 %T Boa: A Language and Infrastructure for Analyzing Ultra-Large-Scale Software Repositories %A Dyer, Robert %A Nguyen, Hoan Anh %A Rajan, Hridesh %A Nguyen, Tien N. %K ease of use %K forge %K github %K google code %K lower barrier to entry %K mining %K repository %K reproducible %K scalable %K Software %K sourceforge %X In today’s software-centric world, ultra-large-scale software repositories, e.g. SourceForge (350,000+ projects), GitHub (250,000+ projects), and Google Code (250,000+ projects) are the new library of Alexandria. They contain an enormous corpus of software and information about software. Scientists and engineers alike are interested in analyzing this wealth of information both for curiosity as well as for testing important hypotheses. However, systematic extraction of relevant data from these repositories and analysis of such data for testing hypotheses is hard, and best left for mining software repository (MSR) experts! The goal of Boa, a domain-specific language and infrastructure described here, is to ease testing MSR-related hypotheses. We have implemented Boa and provide a web-based interface to Boa’s infrastructure. 
Our evaluation demonstrates that Boa substantially reduces programming efforts, thus lowering the barrier to entry. We also see drastic improvements in scalability. Last but not least, reproducing an experiment conducted using Boa is just a matter of re-running small Boa programs provided by previous researchers. %B 35th Int'l Conference on Software Engineering (ICSE 2013) %P 422-431 %8 05/2013 %0 Conference Paper %B Proceedings of the 22nd International Conference on World Wide Web Companion %D 2013 %T Discovery of Technical Expertise from Open Source Code Repositories %A Venkataramani, Rahul %A Gupta, Atul %A Asadullah, Allahbaksh %A Muddu, Basavaraju %A Bhat, Vasudev %K github %K knowledge discovery %K recommendations %K source code repository %K stackoverflow %K technical expertise %X Online Question and Answer websites for developers have emerged as the main forums for interaction during the software development process. The veracity of an answer in such websites is typically verified by the number of 'upvotes' that the answer garners from peer programmers using the same forum. Although this mechanism has proved to be extremely successful in rating the usefulness of the answers, it does not lend itself very elegantly to modeling the expertise of a user in a particular domain. In this paper, we propose a model to rank the expertise of the developers in a target domain by mining their activity in different open source projects. To demonstrate the validity of the model, we built a recommendation system for StackOverflow which uses the data mined from GitHub. 
%B Proceedings of the 22nd International Conference on World Wide Web Companion %S WWW '13 Companion %I International World Wide Web Conferences Steering Committee %C Republic and Canton of Geneva, Switzerland %P 97–98 %@ 978-1-4503-2038-2 %U http://dl.acm.org/citation.cfm?id=2487788.2487832 %0 Conference Paper %B Proceedings of the 29th IEEE International Conference on Software Maintainability %D 2013 %T How does Context affect the Distribution of Software Maintainability Metrics? %A Zhang, Feng %A Audris Mockus %A Ying Zou %A Foutse Khomh %A Hassan, Ahmed E. %K benchmark %K context %K contextual factor %K flossmole %K large scale %K metrics %K mining software repositories %K sampling %K software maintainability %K sourceforge %K static metrics %X Software metrics have many uses, e.g., defect prediction, effort estimation, and benchmarking an organization against peers and industry standards. In all these cases, metrics may depend on the context, such as the programming language. Here we aim to investigate if the distributions of commonly used metrics do, in fact, vary with six context factors: application domain, programming language, age, lifespan, the number of changes, and the number of downloads. For this preliminary study we select 320 nontrivial software systems from SourceForge. These software systems are randomly sampled from nine popular application domains of SourceForge. We calculate 39 metrics commonly used to assess software maintainability for each software system and use the Kruskal-Wallis test and the Mann-Whitney U test to determine if there are significant differences among the distributions with respect to each of the six context factors. We use Cliff’s delta to measure the magnitude of the differences and find that all six context factors affect the distribution of 20 metrics and the programming language factor affects 35 metrics. 
We also briefly discuss how each context factor may affect the distribution of metric values. We expect our results to help software benchmarking and other software engineering methods that rely on these commonly used metrics to be tailored to a particular context. %B Proceedings of the 29th IEEE International Conference on Software Maintainability %S ICSM '13 %> https://flosshub.org/sites/flosshub.org/files/icsm2013_contextstudy.pdf %0 Book %B IFIP Advances in Information and Communication Technology Open Source Software: Quality Verification %D 2013 %T How Healthy Is My Project? Open Source Project Attributes as Indicators of Success %A Piggot, James %A Amrit, Chintan %E Petrinja, Etiel %E Succi, Giancarlo %E Ioini, Nabil %E Sillitti, Alberto %K flossmole %K sourceforge %X Determining what factors can influence the successful outcome of a software project has been labeled by many scholars and software engineers as a difficult problem. In this paper we use machine learning to create a model that can determine the stage a software project has obtained with some accuracy. Our model uses 8 Open Source project metrics to determine the stage a project is in. We validate our model using two performance measures: the exact success rate of classifying an Open Source Software project and the success rate over an interval of one stage of its actual performance using different scales of our dependent variable. In all cases we obtain an accuracy of above 70% with one-away classification (a classification which is away by one) and about 40% accuracy with an exact classification. We also determine the factors (according to one classifier that uses only eight of the variables available in SourceForge) that determine the health of an OSS project. 
%B IFIP Advances in Information and Communication Technology Open Source Software: Quality Verification %I Springer Berlin Heidelberg %C Berlin, Heidelberg %V 404 %P 30 - 44 %@ 978-3-642-38928-3 %U http://link.springer.com/chapter/10.1007/978-3-642-38928-3_3 %R 10.1007/978-3-642-38928-3_3 %> https://flosshub.org/sites/flosshub.org/files/OSSHealth_1.0.pdf %0 Book %B IFIP Advances in Information and Communication Technology Open Source Software: Quality Verification %D 2013 %T Is It All Lost? A Study of Inactive Open Source Projects %A Khondu, Jymit %A Capiluppi, Andrea %A Stol, Klaas %E Petrinja, Etiel %E Succi, Giancarlo %E Ioini, Nabil %E Sillitti, Alberto %K sourceforge %X Open Source Software (OSS) proponents suggest that when developers lose interest in their project, their last duty is to “hand it off to a competent successor.” However, the mechanisms of such a hand-off are not clear, or widely known among OSS developers. As a result, many OSS projects, after a certain long period of evolution, stop evolving, in fact becoming “inactive” or “abandoned” projects. This paper presents an analysis of the population of projects contained within one of the largest OSS repositories available (SourceForge.net), in order to describe how projects abandoned by their developers can be identified, and to discuss the attributes and characteristics of these inactive projects. In particular, the paper attempts to differentiate projects that experienced maintainability issues from those that are inactive for other reasons, in order to be able to correlate common characteristics to the “failure” of these projects. 
%B IFIP Advances in Information and Communication Technology Open Source Software: Quality Verification %I Springer Berlin Heidelberg %C Berlin, Heidelberg %V 404 %P 61 - 79 %@ 978-3-642-38928-3 %U http://staff.lero.ie/stol/files/2013/03/2013-Is-It-All-Lost-A-Study-of-Inactive-Open-Source-Projects.pdf %R 10.1007/978-3-642-38928-3_5 %> https://flosshub.org/sites/flosshub.org/files/2013-Is-It-All-Lost-A-Study-of-Inactive-Open-Source-Projects.pdf %0 Conference Paper %B The 25th International Conference on Software Engineering and Knowledge Engineering (SEKE 2013) %D 2013 %T Managing Corrective Actions to Closure in Open Source Software Test Process %A Abdou, Tamer %A Grogono, Peter %A Kamthan, Pankaj %K open source software %K software engineering %K software quality %K Software testing %K Test Process Improvement. %X In assessing test process maturity, one of the goals is to manage disciplinary issues. Managing corrective actions to closure is known to aid software quality assurance, in general, and testing process activities, in particular. In this paper, a framework for software testing assessment, namely OSS-TPA, that aims to evaluate corrective actions in OSS test process, is proposed. The OSS-TPA framework is based on earlier studies and relies on a conceptual model for test process activities in OSS development. Using success factors in OSS development, the relationship between the maturity of managing corrective actions and the adoption of OSS is investigated. 
%B The 25th International Conference on Software Engineering and Knowledge Engineering (SEKE 2013) %C Boston, USA %P 306–311 %U http://index.ksi.edu/conf/seke/2013/cr/282.pdf %> https://flosshub.org/sites/flosshub.org/files/282.pdf %0 Journal Article %J International Journal of the Commons %D 2013 %T Preliminary steps toward a general theory of internet-based collective-action in digital information commons: Findings from a study of open source software projects %A Charles Schweik %A Robert English %K collaborative success and abandonment %K common property regime %K digital information commons %K flossmole %K Free/libre software %K open source software %K sourceforge %K srda %X This paper presents some of the findings from a 5-year empirical study of FOSS (free/libre and open source software) commons, completed in 2011. FOSS projects are Internet-based common property regimes where the project source code is developed over the Internet. The resulting software is generally distributed with a license that provides users with the freedoms to access, use, read, modify and redistribute the software. In this study we used three different and very large datasets (approximately 107,000; 174,000 and 1400 cases respectively) with information on FOSS projects residing in Sourceforge.net, one of the largest, if not the largest, FOSS repository in the world. We employ various quantitative methods to uncover factors that lead some FOSS projects to ongoing collaborative success, while others become abandoned. After presenting some of our study’s results, we articulate the collaborative “story” of FOSS that emerged. We close the paper by discussing some key findings that can contribute to a general theory of Internet-based collective-action and FOSS-like forms of digital online commons. 
%B International Journal of the Commons %V 7 %U http://www.thecommonsjournal.org/index.php/ijc/article/view/URN%3ANBN%3ANL%3AUI%3A10-1-114926 %0 Conference Paper %B Proceedings of the 2013 International Conference on Software and System Process %D 2013 %T Processes in Securing Open Architecture Software Systems %A Walt Scacchi %A Alspaugh, Thomas A. %K configuration %K continuous software development %K Open architecture %K process integration %K process modeling %K security %X Our goal is to identify and understand issues that arise in the development and evolution processes for securing open architecture (OA) software systems. OA software systems are those developed with a mix of closed source and open source software components that are configured via an explicit system architectural specification. Such a specification may serve as a reference model or product line model for a family of concurrently sustained OA system versions/variants. We employ a case study focusing on an OA software system whose security must be continually sustained throughout its ongoing development and evolution. We limit our focus to software processes surrounding the architectural design, continuous integration, release deployment, and evolution found in the OA system case study. We also focus on the role automated tools, software development support mechanisms, and development practices play in facilitating or constraining these processes through the case study. Our purpose is to identify issues that impinge on modeling (specification) and integration of these processes, and how automated tools mediate these processes, as emerging research problem areas for the software process research community. Finally, our study is informed by related research found in the prescriptive versus descriptive practice of these processes and tool usage in studies of conventional and open source software development projects. 
%B Proceedings of the 2013 International Conference on Software and System Process %S ICSSP 2013 %I ACM %C New York, NY, USA %P 126–135 %@ 978-1-4503-2062-7 %U http://doi.acm.org/10.1145/2486046.2486068 %R 10.1145/2486046.2486068 %> https://flosshub.org/sites/flosshub.org/files/Scacchi-Alspaugh-ICSSP13.pdf %0 Conference Proceedings %B 3rd International Workshop on Replication in Empirical Software Engineering Research (RESER2013) %D 2013 %T A Replicable Infrastructure for Empirical Studies of Email Archives %A Squire, Megan %K apache %K cleaning %K collection %K couchdb %K database %K document-oriented database %K email %K lucene %K mailing lists %K nosql %K replication %K storage %X This paper describes a replicable infrastructure solution for conducting empirical software engineering studies based on email mailing list archives. Mailing list emails, such as those affiliated with free, libre, and open source software (FLOSS) projects, are currently archived in several places online, but each research team that wishes to study these email artifacts closely must design their own solution for collection, storage and cleaning of the data. Consequently, research results will be difficult to replicate, especially as the email archive for any living project will still be continually growing. This paper describes a simple, replicable infrastructure for the collection, storage, and cleaning of project email data and analyses. %B 3rd International Workshop on Replication in Empirical Software Engineering Research (RESER2013) %I IEEE %C Baltimore, MD, USA %P 43-50 %8 10/2013 %@ 978-0-7695-5121-0 %> https://flosshub.org/sites/flosshub.org/files/RESERv2.pdf %0 Conference Proceedings %B 10th Working Conference on Mining Software Repositories %D 2013 %T Using Citation Influence to Predict Software Defects %A Wei Hu %A Kenny Wong %K eclipse %K netbeans %K social network %X The software dependency network reflects structure and the developer contribution network reflects process. 
Previous studies have used social network properties over these networks to predict whether a software component is defect-prone. However, these studies do not consider the strengths of the dependencies in the networks. In our approach, we use a citation influence topic model to determine dependency strengths among components and developers, analyze weak and strong dependencies separately, and apply social network properties to predict defect-prone components. In experiments on Eclipse and NetBeans, our approach has higher accuracy than prior work. %B 10th Working Conference on Mining Software Repositories %8 05/2013 %0 Conference Proceedings %B 10th Working Conference on Mining Software Repositories %D 2013 %T Who Does What during a Code Review? Datasets of OSS Peer Review Repositories %A Kazuki Hamasaki %A Raula Gaikovina Kula %A Norihiro Yoshida %A A. E. Camargo Cruz %A Kenji Fujiwara %A Hajimu Iida %K android %K case study %K code review %K data set %K peer review %K roles %K source code %X We present four datasets that are focused on the general roles of OSS peer review members. With data mined from both an integrated peer review system and code source repositories, our rich datasets comprise peer review data that was automatically recorded. Using the Android project as a case study, we describe our extraction methodology, the datasets and their application used for three separate studies. Our datasets are available online at http://sdlab.naist.jp/reviewmining/ %B 10th Working Conference on Mining Software Repositories %8 05/2013 %0 Journal Article %J Empirical Software Engineering %D 2012 %T Analyzing and mining a code search engine usage log %A Bajracharya, Sushil Krishna %A Lopes, Cristina Videira %K code search %K koders %K search %K search engine %K topics %X This paper presents an analysis of a year-long usage log of Koders, the first commercially available Internet-Scale code search engine (http://www.koders.com). 
The usage log comprises about ten million activities from more than three million users. Analysis of the usage data shows that despite attracting a large number of visitors, Koders has a very sparse usage and that it lacks regular usage from many of its users. When compared to Web search, search behavior in Koders showed many similar patterns. A topic modeling analysis of the usage data shows what topics users of Koders are looking for. Observations on the prevalence of these topics among the users, and observations on how search and download activities vary across topics, lead to the conclusion that users who find code search engines usable are those who already know to a high level of specificity what to look for. This paper also presents a general categorization of these topics that provides insights on the different ways code search engine users express their queries. It identifies various forms of queries in Koders’s log and the kinds of results addressed by the queries. It also provides several suggestions for improvements in code search engines based on the analysis of usage, topics, and query forms. The work presented in this paper is the first of its kind that reveals several insights on the usage of an Internet-Scale code search engine. %B Empirical Software Engineering %V 17 %P 424 - 466 %8 8/2012 %N 4-5 %! Empir Software Eng %R 10.1007/s10664-010-9144-6 %0 Conference Proceedings %B IFIP Advances in Information and Communication Technology (OSS 2012) %D 2012 %T A Comprehensive Study of Software Forks: Dates, Reasons and Outcomes %A Gregorio Robles %A González-Barahona, Jesús M. %K forking %K forks %K free software %K Legal %K open source %K social %K software evolution %K sustainability %X In general, it is assumed that a software product evolves within the authoring company or group of developers that develop the project. 
However, in some cases different groups of developers make the software evolve in different directions, a situation which is commonly known as a fork. In the case of free software, although forking is a practice that is considered as a last resort, it is inherent to the four freedoms. This paper tries to shed some light on the practice of forking. Therefore, we have identified significant forks, several hundreds in total, and have studied them in depth. Among the issues that have been analyzed for each fork are the date when the forking occurred, the reason for the fork, and the outcome of the fork, i.e., whether the original or the forking project is still developed. Our investigation shows, among other results, that forks occur in every software domain, that they have become more frequent in recent years, and that very few forks merge with the original project. %B IFIP Advances in Information and Communication Technology (OSS 2012) %I IFIP AICT %V 378 %P 1-14 %> https://flosshub.org/sites/flosshub.org/files/paper_0.pdf %0 Conference Proceedings %B IFIP Advances in Information and Communication Technology 378 (OSS 2012) %D 2012 %T Do More Experienced Developers Introduce Fewer Bugs? %A Izquierdo-Cortázar, Daniel %A Gregorio Robles %A González-Barahona, Jesús M. %K mercurial %K mozilla %K scm %K source code analysis %X Developer experience is a common matter of study in the software maintenance and evolution research literature. However, it is still not well understood if less experienced developers are more prone to introduce errors in the source code than their more experienced colleagues. This paper aims to study the relationships between experience and the bug introduction ratio using the Mozilla community as a case study. As a result, statistical differences among developers with different levels of experience have not been observed, when the expected result would have been the opposite. 
%B IFIP Advances in Information and Communication Technology 378 (OSS 2012) %I IFIP AICT, Springer %V 378 %P 268-273 %8 09/2012 %0 Conference Proceedings %B IFIP Advances in Information and Communication Technology 378 (OSS 2012) %D 2012 %T Emerging Hackerspaces – Peer-Production Generation %A Moilanen, Jarkko %K COMMUNITY %K fabbing %K fablab %K hackerspace %K makerspace %K MOTIVATION %K movement %K open source %K peer-production %K Survey %K sustainability %X This paper describes a peer-production movement, the hackerspace movement, its members and values. The emergence of hackerspaces, fablabs and makerspaces is changing how hacker communities and other like-minded communities function. Thus, an understanding of the nature of hackerspaces helps in detailing the features of contemporary peer-production. Building on previous work on 'fabbing', two different sets of results are presented: (1) empirical observations from a longitudinal study of hackerspace participants; and (2) a theoretical description of hacker generations as a larger context in which peer-production can be located. With regard to (1), research data has been collected through prolonged observation of hackerspace communities and two surveys. %B IFIP Advances in Information and Communication Technology 378 (OSS 2012) %I IFIP AICT %V 378 %P 94-111 %8 09/2012 %0 Conference Paper %B 45th Hawai'i International Conference on System Sciences %D 2012 %T An Empirical Study of Volunteer Members' Perceived Turnover in Open Source Software Projects %A Yu, Yiqing %A Benlian, Alexander %A Hess, Thomas %K developers %K launchpad %K sourceforge %K Survey %X Turnover of volunteer members and the ensuing instability bring about severe problems to open source software (OSS) projects. To better understand it, we based our study on Herzberg's two-factor theory to investigate the influence of hygiene factors on volunteer members' dissatisfaction and perceived turnover. 
After empirically testing the research model, we found that shortcomings in project regulation and administration are the key reason for volunteer members' dissatisfaction, followed by future rewards and personal needs for software functionalities. By contrast, a possible lack of supportive working relationship among OSS developers was not found to be a trigger for developer dissatisfaction. Dissatisfaction was confirmed to be a significant predictor of perceived turnover. The results demonstrate that generalized hygiene factors cannot unreflectively be transferred into the OSS context because volunteer members' personal expectation has a weaker influence on perceived turnover than objective attributes of the OSS project. Our study further makes suggestions for project administrators. %B 45th Hawai'i International Conference on System Sciences %P 3396-3405 %8 01/2012 %0 Journal Article %J Empirical Software Engineering %D 2012 %T The evolution of Java build systems %A McIntosh, Shane %A Adams, Bram %A Hassan, Ahmed E. %K ant %K build %K maven %K scm %K source code analysis %X Build systems are responsible for transforming static source code artifacts into executable software. While build systems play such a crucial role in software development and maintenance, they have been largely ignored by software evolution researchers. However, a firm understanding of build system aging processes is needed in order to allow project managers to allocate personnel and resources to build system maintenance tasks effectively, and reduce the build maintenance overhead on regular development activities. 
In this paper, we study the evolution of build systems based on two popular Java build languages (i.e., ANT and Maven) from two perspectives: (1) a static perspective, where we examine the complexity of build system specifications using software metrics adopted from the source code domain; and (2) a dynamic perspective, where the complexity and coverage of representative build runs are measured. Case studies of the build systems of six open source build projects with a combined history of 172 releases show that build system and source code size are highly correlated, with source code restructurings often requiring build system restructurings. Furthermore, we find that Java build systems evolve dynamically in terms of duration and recursive depth of the directory hierarchy. %B Empirical Software Engineering %V 17 %P 578 - 608 %8 8/2012 %N 4-5 %! Empir Software Eng %R 10.1007/s10664-011-9169-5 %0 Conference Proceedings %B IFIP Advances in Information and Communication Technology 378 (OSS 2012) %D 2012 %T Examining Turnover in Open Source Software Projects Using a Logistic Hierarchical Linear Modeling Approach %A Sharma, P.N. %A Hulland, J. %A Daniel, S. %K Logistic Hierarchical Linear Modeling %K sourceforge %K turnover %X Developer turnover in open source software projects is a critical and insufficiently researched problem. Previous research has focused on understanding the developer motivations to contribute using either the individual developer perspective or the project perspective. In this exploratory study we argue that because the developers are embedded in projects it is imperative to include both perspectives. We analyze turnover in open source software projects by including both individual developer level factors, as well as project specific factors. Using the Logistic Hierarchical Linear Modeling approach allows us to empirically examine the factors influencing developer turnover and also how these factors differ among developers and projects. 
%B IFIP Advances in Information and Communication Technology 378 (OSS 2012) %C Eighth International Conference on Open Source Systems %V 378 %8 09/2012 %0 Conference Proceedings %B IFIP Advances in Information and Communication Technology 378 (OSS 2012) %D 2012 %T Forking the Commons: Developmental Tensions and Evolutionary Patterns in Open Source Software %A Gençer, Mehmet %A Özel, Bülent %K divergence %K forking %K software evolution %K specialization %X Open source software (OSS) presents opportunities and challenges for developers to exploit its commons based licensing regime by creating specializations of a software technology to address plurality of goals and priorities. By ‘forking’ a new branch of development separate from the main project, development diverges into a path in order to relieve tensions related to specialization, which later encounters new tensions. In this study, we first classify forces and patterns within this divergence process. Such tensions may stem from a variety of sources including internal power conflicts, emergence of new environmental niches such as demand for specialized uses of same software, or differences along stability vs. development speed trade-off. We then present an evolutionary model which combines divergence options available to resolve tensions, and how further tensions emerge. In developing this model we attempt to define open software evolution at the level of systems of software, rather than at individual software project level. 
%B IFIP Advances in Information and Communication Technology 378 (OSS 2012) %I IFIP AICT, Springer %V 378 %P 310-315 %8 09/2012 %0 Journal Article %J International Journal of Open Source Software and Processes %D 2012 %T How the FLOSS Research Community Uses Email Archives %A Squire, Megan %K email %K email archives %K literature %K mailing lists %K review %K Survey %X Artifacts of the software development process, such as source code or emails between developers, are a frequent object of study in empirical software engineering literature. One of the hallmarks of free, libre, and open source software (FLOSS) projects is that the artifacts of the development process are publicly-accessible and therefore easily collected and studied. Thus, there is a long history in the FLOSS research community of using these artifacts to gain understanding about the phenomenon of open source software, which could then be compared to studies of software engineering more generally. This paper looks specifically at how the FLOSS research community has used email artifacts from free and open source projects. It provides a classification of the relevant literature using a publicly-available online repository of papers about FLOSS development using email. 
The outcome of this paper is to provide a broad overview for the software engineering and FLOSS research communities of how other researchers have used FLOSS email message artifacts in their work. %B International Journal of Open Source Software and Processes %V 4 %P 37 - 59 %8 12/2012 %N 1 %R 10.4018/jossp.2012010103 %> https://flosshub.org/sites/flosshub.org/files/ijossp_v3_PREPRINT.pdf %0 Conference Proceedings %B IFIP Advances in Information and Communication Technology 378 (OSS 2012) %D 2012 %T The Impact of Formal QA Practices on FLOSS Communities – The Case of Mozilla %A Barham, Adina %K email %K information flow %K mailing lists %K mozilla %K quality assurance %K social network analysis %K test %X The number of FLOSS projects that include a QA step in the development model is increasing, which suggests that a new layer may be emerging in the classic “onion model”. This change might affect the information flow within projects and implicitly their sustainability. Communities, the essential resource of FLOSS projects, have been extensively studied but questions concerning QA remain. This paper takes a step towards answering such questions by analyzing QA mailing lists and issue tracker data for the Mozilla group of projects. Because the Bugzilla data set contains over half a million bugs, data processing and analysis is a considerable challenge for this research. The provisional conclusions are that QA activity may not be increasing steadily over time but is dependent on other factors and that the QA team and other groups of contributors form a highly connected network that doesn’t contain isolates. 
%B IFIP Advances in Information and Communication Technology 378 (OSS 2012) %I IFIP AICT, Springer %V 378 %P 262-267 %8 09/2012 %0 Report %D 2012 %T Integrating FLOSS repositories on the Web %A Iqbal, Aftab %A Cyganiak, Richard %A Hausenblas, Michael %K flossmole %K google code %K sourceforge %X This paper provides a novel approach to the problem of integrating data from multiple code forges of FLOSS. We review the current problems in integrating the data from multiple forges and argue that Semantic Web technologies are suitable for representing knowledge contained in code forges. Further, we show the advantage of linking the metadata of projects to other data sources on the Web which will enable querying extra information from the Web. The paper briefly describes how the modeling is achieved and what benefits can be obtained by enabling linking to other relevant data sources already available on the Web. %B DERI Technical Report 2012-12-10 %8 2012 %U https://www.researchgate.net/publication/259757473_Integrating_FLOSS_repositories_on_the_Web %> https://flosshub.org/sites/flosshub.org/files/DERI-TR-AFTAB-2012-12-10_0.pdf %0 Book %D 2012 %T Internet Success: A Study of Open Source Software Commons %A Schweik, C. M. %A English, R. %K flossmole %K srda %I MIT Press %C Cambridge, MA, USA %U http://tinyurl.com/d3e4545 %9 Book %0 Conference Paper %B 45th Hawai'i International Conference on System Sciences %D 2012 %T Network-Based Analysis of the Structure and Evolution of an Open Source Software Product %A Le, Qize %A Panchal, Jitesh H. %K drupal %K source code %X In this paper, an analysis of product structures in open source software (OSS) at both product level and module level is presented. At the product level, the product structures are modeled as complex networks, and the evolutionary characteristics of product structures are analyzed by using network analysis metrics. 
At the module level, linking mechanisms, which describe how a module is attached to other modules, are proposed. The linking mechanisms are modeled as probability functions dependent on the degrees of linking modules. A case study from an open source software project, Drupal, is presented. The evolutionary trends of Drupal product structures are analyzed and discussed. Finally, a model is presented to illustrate the effects of linking mechanisms at the module level on the product structures at the system level. The results indicate that the model built using the proposed linking mechanisms generates networks whose evolutionary characteristics are close to those of the original network. %B 45th Hawai'i International Conference on System Sciences %P 3436-3445 %8 01/2012 %0 Conference Proceedings %B IFIP Advances in Information and Communication Technology 378 (OSS 2012) %D 2012 %T Open Source Prediction Methods: A Systematic Literature Review %A Syeed, M.M. Mahbubul %A Kilamo, Terhi %A Hammouda, Imed %A Systä, Tarja %K Prediction Success %K Systematic literature review %X For the adoption of Open Source Software (OSS) components, knowledge of the project development and of the risks associated with their use is needed. That, in turn, calls for reliable prediction models to support preventive maintenance and building quality software. In this paper, we perform a systematic literature review on the state of the art in predicting OSS projects, considering both code and community dimensions. We also distill future directions for research in this field.
%B IFIP Advances in Information and Communication Technology 378 (OSS 2012) %I IFIP AICT, Springer %V 378 %P 280-285 %8 09/2012 %0 Conference Proceedings %B IFIP Advances in Information and Communication Technology 378 (OSS 2012) %D 2012 %T Open-Source Technologies Realizing Social Networks: A Multiple Descriptive Case-Study %A Teixeira, Jose %K entrepreneurship %K facebook %K netlog %K social networks %K spotify %X This article aims at describing the role of the open-source software phenomenon within high-tech corporations providing social networks and applications. By taking a multiple case study approach, we address which open-source software technological components are embedded by leading social networking players, and provide a rich description of how those players collaborate with the open-source community. Our findings, based on a population of three commercial providers of social networks, suggest that open-source plays an important role in the technological development of their social networking platforms. An open-source technological stack for realizing social networks is proposed and several managerial issues dealing with collaboration with open-source communities are explored. %B IFIP Advances in Information and Communication Technology 378 (OSS 2012) %I IFIP AICT, Springer %V 378 %P 250-255 %8 09/2012 %0 Thesis %D 2012 %T Software Libre y abierto: comunidades y redes de producción digital de bienes comunes %A Tania E. Turner Sen %K bienes comunes %K commons %K comunidades virtuales %K FLOSS %K flossmole %K hackers %K redes virtuales %K repositories %K repositorios %K Software libre y abierto %K virtual communities %K virtual networks %X This thesis is about a collective form of production that has expanded and strengthened in the global high technology market: FLOSS production. The study takes into account that technologies are not neutral; they emerge as strategies and mechanisms of political and economic interests.
Although FLOSS production is embedded in the capitalist context, the collective work of the communities and networks that produce it is based on ideas about freedom and solidarity. The types of rules and organization of labour inside these communities have developed a kind of product that is well categorized as part of the new commons. The conclusions at the end of this work aim to offer a clear approach to the dynamics of FLOSS production networks inside the virtual infrastructure. Specifically, it offers an approach to the interaction and forms of cooperation, as well as to the individual and collective schemas that motivate the cooperative action of individuals. %I Universidad Nacional Autónoma de México %C Ciudad de México, México %P 269 pages %U http://132.248.9.195/ptd2012/agosto/406008604/Index.html %> https://flosshub.org/sites/flosshub.org/files/Tesis.pdf %0 Journal Article %J Empirical Software Engineering %D 2012 %T Studying the impact of social interactions on software quality %A Bettenburg, Nicolas %A Hassan, Ahmed E. %K bug tracker %K eclipse %K Firefox %K Human Factors %K measurement %K metrics %K software evolution %K Software quality assurance %X Correcting software defects accounts for a significant amount of resources in a software project. To make best use of testing efforts, researchers have studied statistical models to predict in which parts of a software system future defects are likely to occur. By studying the mathematical relations between predictor variables used in these models, researchers can form an increased understanding of the important connections between development activities and software quality. Predictor variables used in past top-performing models are largely based on source code-oriented metrics, such as lines of code or number of changes. However, source code is the end product of numerous interlaced and collaborative activities carried out by developers.
Traces of such activities can be found in the various repositories used to manage development efforts. In this paper, we develop statistical models to study the impact of social interactions in a software project on software quality. These models use predictor variables based on social information mined from the issue tracking and version control repositories of two large open-source software projects. The results of our case studies demonstrate the impact of metrics from four different dimensions of social interaction on post-release defects. Our findings show that statistical models based on social information have a similar degree of explanatory power as traditional models. Furthermore, our results demonstrate that social information does not substitute, but rather augments traditional source code-based metrics used in defect prediction models. %B Empirical Software Engineering %! Empir Software Eng %R 10.1007/s10664-012-9205-0 %0 Conference Proceedings %B IFIP Advances in Information and Communication Technology 378 (OSS 2012) %D 2012 %T Two Evolution Indicators for FOSS Projects %A Petrinja, Etiel %A Succi, Giancarlo %K metrics %K sourceforge %X In this paper we introduce two project evolution indicators. One shows an increase in downloads of the project and therefore a growing interest of users in the results of the project. The second indicator predicts the future evolution of the project through the submission of new revisions to the concurrent versioning system. Both indicators can provide evidence of the sustainability of a software project. We used the General Linear Model method to statistically formulate the two linear equations that can be used to predict the two indicators. The predicting equations were built by using two stratified data samples, one of 760 projects and the second of 880 projects, extracted from the SourceForge repository.
The six metrics included in the final version of the two models were extracted from a set of thirty project and product metrics such as the number of downloads, the number of developers, etc. We have validated the discriminant and the concurrent validity of the two models by using different statistical tests, such as goodness-of-fit, and we have used the two models to predict the indicators on two hold-out validation samples. The model predicting the increment of downloads was correct in 75 percent of the cases; the model predicting the submission of new revisions was correct in 93 percent of the cases. %B IFIP Advances in Information and Communication Technology 378 (OSS 2012) %I IFIP AICT %V 378 %P 216-232 %8 09/2012 %0 Conference Paper %B 45th Hawai'i International Conference on System Sciences %D 2012 %T Who Will Remain? An Evaluation of Actual Person-Job and Person-Team Fit to Predict Developer Retention in FLOSS Projects %A Schilling, A. %A Laumer, S. %A Weitzel, T. %K email %K email archives %K google summer of code %K kde %K mailing list %K students %X Many businesses and private households rely on Free Libre Open Source Software (FLOSS). Due to a lack of sustained contributors, however, most FLOSS projects do not survive. The early identification of developers who are likely to remain is thus an eminent challenge for the management of FLOSS initiatives. Previous research has shown that individuals' subjective assessment is often inaccurate, emphasizing the need to objectively evaluate retention behavior. Consistent with the concepts Person-Job (P-J) and Person-Team (P-T) fit from the traditional recruitment literature, we derive objective measures to predict developer retention in FLOSS projects. In an analysis of the contribution behavior of former Google Summer of Code (GSoC) students we reveal that the level of development experience and conversational knowledge is strongly associated with retention.
Surprisingly, our analysis reveals that students with abilities that are underrepresented in the project and students with a higher academic education do not remain considerably longer. %B 45th Hawai'i International Conference on System Sciences %P 3446-3455 %8 01/2012 %R http://doi.ieeecomputersociety.org/10.1109/HICSS.2012.644 %> https://flosshub.org/sites/flosshub.org/files/45.pdf %0 Conference Proceedings %B Open Source Systems: Grounding Research (OSS 2011) %D 2011 %T Adoption of OSS Development Practices by the Software Industry: A Survey %A Petrinja, Etiel %A Sillitti, Alberto %A Succi, Giancarlo %K qualipso %K Survey %X The paper presents a survey of aspects related to the adoption of Open Source Software by the software industry. The aim of this study was to collect data related to practices and elements in the development process of companies that influence the trust in the quality of the product by potential adopters. The work is part of the research done inside the QualiPSo project and was carried out using a qualitative study based on a structured questionnaire focused on perceptions of experts and development practices used by companies involved in the Open Source Software industry. The results of the survey confirm intuitive concerns related to the adoption of Open Source Software as: the selection of the license, the quality issues addressed, and the development process tasks inside Open Source Software projects. The study uncovered specific aspects related to trust and trustworthiness of the Open Source Software development process that we did not find in previous studies as: the standards implemented by the OSS project, the project's roadmap is respected, and the communication channels that are available. %B Open Source Systems: Grounding Research (OSS 2011) %I Springer %P 233-243 %8 10/2011 %0 Conference Paper %B Proceedings of the 8th working conference on Mining software repositories - MSR '11 %D 2011 %T Apples vs. oranges? 
%A Davies, Julius %A Daniel M. German %Y van Deursen, Arie %Y Xie, Tao %Y Zimmermann, Thomas %K eclipse %K netbeans %K source code %X We attempt to compare the source code of two Java IDE systems: Netbeans and Eclipse. The result of this experiment shows that many factors, if ignored, could risk a bias in the results, and we posit various observations that should be taken into consideration to minimize such risk. %B Proceedings of the 8th working conference on Mining software repositories - MSR '11 %I ACM Press %C New York, New York, USA %P 246-249 %8 05/2011 %@ 9781450305747 %! MSR '11 %R 10.1145/1985441.1985483 %0 Journal Article %J International Journal of Open Source Software and Processes %D 2011 %T Are Developers Fixing Their Own Bugs? %A Izquierdo-Cortazar, Daniel %A Capiluppi, Andrea %A Jesus M. Gonzalez-Barahona %K bug fixing %K developers %K loc %K scm %X The process of fixing software bugs plays a key role in the maintenance activities of a software project. Ideally, code ownership and responsibility should be enforced among developers working on the same artifacts, so that those introducing buggy code could also contribute to its fix. However, especially in FLOSS projects, this mechanism is not clearly understood: in particular, it is not known whether those contributors fixing a bug are the same introducing and seeding it in the first place. This paper analyzes the comm-central FLOSS project, which hosts part of the Thunderbird, SeaMonkey, Lightning extensions and Sunbird projects from the Mozilla community. The analysis is focused at the level of lines of code and it uses the information stored in the source code management system. The results of this study show that in 80% of the cases, the bug-fixing activity involves source code modified by at most two developers. 
It also emerges that the developers fixing the bug are only responsible for 3.5% of the previous modifications to the lines affected; this implies that the other developers making changes to those lines could have made that fix. In most of the cases the bug fixing process in comm-central is not carried out by the same developers as those who seeded the buggy code. %B International Journal of Open Source Software and Processes %V 3 %P 23 - 42 %N 2 %R 10.4018/jossp.2011040102 %0 Conference Proceedings %B Open Source Systems: Grounding Research (OSS 2011) %D 2011 %T Aspects of an Open Source Software Sustainable Life Cycle %A Arantes, Flavia Linhalis %A Freire, Fernanda Maria Pereira %K Financial Resources %K OSS Communities %K OSS Sustainability %K software maintenance %X In this paper we present a literature overview about OSS sustainability, considering not only financial resources, but also community growth, source code and tools management. Based on these aspects, we define an OSS life cycle that may contribute to OSS projects sustainability. %B Open Source Systems: Grounding Research (OSS 2011) %I Springer %P 325-329 %8 10/2011 %0 Conference Proceedings %B Open Source Systems: Grounding Research (OSS 2011) %D 2011 %T Building Knowledge in Open Source Software Research in Six Years of Conferences %A Mulazzini, Fabio %A Rossi, Bruno %A Russo, Barbara %A Steff, Maximilian %K Cross-citations %K flossmole cited %K graph %K literature review %K network %K research %K Systematic Mapping Study %X Since its origins, the diffusion of the OSS phenomenon and the information about it has been entrusted to the Internet and its virtual communities of developers. This public mass of data has attracted the interest of researchers and practitioners aiming at formalizing it into a body of knowledge. To this aim, in 2005, a new series of conferences on OSS started to collect and convey OSS knowledge to the research and industrial community.
Our work mines articles of the OSS conference series to understand the process of knowledge grounding and the community surrounding it. As such, we propose a semi-automated approach for a systematic mapping study on these articles. We automatically build a map of cross-citations among all the papers of the conferences and then we manually inspect the resulting clusters to identify knowledge building blocks and their mutual relationships. We found that industry-related, quality assurance, and empirical studies often originate or maintain new streams of research. %B Open Source Systems: Grounding Research (OSS 2011) %I Springer %P 123-141 %8 10/2011 %0 Conference Proceedings %B Open Source Systems: Grounding Research (OSS 2011) %D 2011 %T Cliff Walls: An Analysis of Monolithic Commits Using Latent Dirichlet Allocation %A Pratt, Landon J. %A MacLean, Alexander C. %A Knutson, Charles D. %A Ringger, Eric K. %K artifacts %K commit %K cvs %K LDA %K lines of code %K log files %K scm %K sloc %K sourceforge %K version control %X Artifact-based research provides a mechanism whereby researchers may study the creation of software yet avoid many of the difficulties of direct observation and experimentation. However, there are still many challenges that can affect the quality of artifact-based studies, especially those studies examining software evolution. Large commits, which we refer to as “Cliff Walls,” are one significant threat to studies of software evolution because they do not appear to represent incremental development. We used Latent Dirichlet Allocation to extract topics from over 2 million commit log messages, taken from 10,000 SourceForge projects. The topics generated through this method were then analyzed to determine the causes of over 9,000 of the largest commits. We found that branch merges, code imports, and auto-generated documentation were significant causes of large commits. 
We also found that corrective maintenance tasks, such as bug fixes, did not play a significant role in the creation of large commits. %B Open Source Systems: Grounding Research (OSS 2011) %I Springer %P 282-298 %8 10/2011 %0 Journal Article %J International Journal of Open Source Software and Processes %D 2011 %T An Empirical Study of Open Source Software Usability %A Raza, Arif %A Capretz, Luiz Fernando %A Ahmed, Faheem %K Survey %X Recent years have seen a sharp increase in the use of open source projects by common novice users; Open Source Software (OSS) is thus no longer a reserved arena for software developers and computer gurus. Although user-centered designs are gaining popularity in OSS, usability is still not considered one of the prime objectives in many design scenarios. This paper analyzes industry users’ perception of usability factors, including understandability, learnability, operability, and attractiveness on OSS usability. The research model of this empirical study establishes the relationship between the key usability factors and OSS usability from industrial perspective. In order to conduct the study, a data set of 105 industry users is included. The results of the empirical investigation indicate the significance of the key factors for OSS usability. %B International Journal of Open Source Software and Processes %V 3 %P 1 - 16 %N 1 %R 10.4018/jossp.2011010101 %0 Conference Proceedings %B Open Source Systems: Grounding Research (OSS 2011) %D 2011 %T Framing the Conundrum of Total Cost of Ownership of Open Source Software %A Maha Shaikh %A Cornford, Tony %K benefits %K exit costs %K open source software %K software adoption %K Survey %K tco %K total cost of ownership %X This paper reflects the results of phase I of our study on the total cost of ownership (TCO) of open source software adoption. 
Not only have we found TCO to be an intriguing issue but it is contentious, baffling and each company approaches it in a distinctive manner (and sometimes not at all). In effect it is a conundrum that needs unpacking before it can be explained and understood. Our paper discusses the components of TCO as total cost of ownership and total cost of acquisition (and besides). Using this broad dichotomy and its various components we then analyze our data to make sense of procurement decisions in relation to open source software in the public sector and private companies. %B Open Source Systems: Grounding Research (OSS 2011) %I Springer %P 208-219 %8 10/2011 %0 Conference Proceedings %B Open Source Systems: Grounding Research (OSS 2011) %D 2011 %T Impact of Stakeholder Type and Collaboration on Issue Resolution Time in OSS Projects %A Duc, Ach Nguyen %A Cruzes, Daniela S. %A Ayala, Claudia %A Conradi, Reidar %K COLLABORATION %K companies %K coordination %K defects %K feature requests %K geronimo %K jira %K qpid %K qt %K social network analysis %K volunteer %X Initialized by a collective contribution of volunteer developers, Open source software (OSS) attracts an increasing involvement of commercial firms. Many OSS projects are composed of a mixed group of firm-paid and volunteer developers, with different motivations, collaboration practices and working styles. As OSS development is collaborative in nature, it is important to know whether these differences have an impact on collaboration between different types of stakeholders, which in turn influences the project outcomes. In this paper, we empirically investigate the firm-paid participation in resolving OSS evolution issues, the stakeholder collaboration and its impact on OSS issue resolution time. The results suggest that though a firm-paid assigned developer resolves many more issues than a volunteer developer does, there is no difference in issue resolution time between them.
Besides, the more important factor that influences the issue resolution time comes from the collaboration among stakeholders rather than from individual characteristics. %B Open Source Systems: Grounding Research (OSS 2011) %I Springer %P 1-16 %8 10/2011 %0 Conference Proceedings %B Open Source Systems: Grounding Research (OSS 2011) %D 2011 %T The Importance of Architectural Knowledge in Integrating Open Source Software %A Stol, Klaas-Jan %A Ali Babar, Muhammad %A Avgeriou, Paris %K architectural knowledge %K component-based development %K Open Source Software integration %K OSS Integrator %K software architecture %K Survey %X Open Source Software (OSS) is increasingly used in Component-Based Software Development (CBSD) of large software systems. An important issue in CBSD is selection of suitable components. Various OSS selection methods have been proposed, but most of them do not consider the software architecture aspects of OSS products. The Software Architecture (SA) research community refers to a product’s architectural information, such as design decisions and underlying rationale, and used architecture patterns, as Architecture Knowledge (AK). In order to investigate the importance of AK of OSS components in integration, we conducted an exploratory empirical study. Based on in-depth interviews with 12 IT professionals, this paper presents insights into the following questions: 1) what AK of OSS is needed? 2) Why is AK of OSS needed? 3) Is AK of OSS generally available? And 4) what is the relative importance of AK? Based on these new insights, we provide a research agenda to further the research field of software architecture in OSS. 
%B Open Source Systems: Grounding Research (OSS 2011) %I Springer %P 142-158 %8 10/2011 %0 Conference Paper %B Proceedings of the 8th working conference on Mining software repositories - MSR '11 %D 2011 %T Java generics adoption %A Christian Bird %A Murphy-Hill, Emerson %A Parnin, Chris %Y van Deursen, Arie %Y Xie, Tao %Y Zimmermann, Thomas %K commits %K generics %K java %K source code %K version history %X Support for generic programming was added to the Java language in 2004, representing perhaps the most significant change to one of the most widely used programming languages today. Researchers and language designers anticipated this addition would relieve many long-standing problems plaguing developers, but surprisingly, no one has yet measured whether generics actually provide such relief. In this paper, we report on the first empirical investigation into how Java generics have been integrated into open source software by automatically mining the history of 20 popular open source Java programs, traversing more than 500 million lines of code in the process. We evaluate five hypotheses, each based on assertions made by prior researchers, about how Java developers use generics. For example, our results suggest that generics do not significantly reduce the number of type casts and that generics are usually adopted by a single champion in a project, rather than all committers. %B Proceedings of the 8th working conference on Mining software repositories - MSR '11 %I ACM Press %C New York, New York, USA %P 3-12 %8 05/2011 %@ 9781450305747 %! 
MSR '11 %R 10.1145/1985441.1985446 %0 Conference Paper %B 2011 44th Hawaii International Conference on System Sciences (HICSS 2011) %D 2011 %T Joining Free/Open Source Software Communities: An Analysis of Newbies' First Interactions on Project Mailing Lists %A Jensen, Carlos %A King, Scott %A Kuechler, Victor %K email %K email archive %K gimp %K mailing list %K mediawiki %K postgresql %K subversion %X Free/Open source software (FOSS) is an important part of the IT ecosystem. Due to the voluntary nature of participation, continual recruitment is key to the growth and sustainability of these communities. It is therefore important to understand how and why potential contributors fail in the process of transitioning from user to contributor. Most newcomers, or "newbies", have their first interaction with a community through a mailing list. To understand how this first contact influences future interactions, we studied eight mailing lists across four FOSS projects: MediaWiki, GIMP, PostgreSQL, and Subversion. We analyzed discussions initiated by newbies to determine the effect of gender, nationality, politeness, helpfulness and timeliness of response. We found that nearly 80% of newbie posts received replies, and that receiving timely responses, especially within 48 hours, was positively correlated with future participation. We also found that while the majority of interactions were positive, 1.5% of responses were rude or hostile. %B 2011 44th Hawaii International Conference on System Sciences (HICSS 2011) %I IEEE %C Kauai, HI %P 1 - 10 %@ 978-1-4244-9618-1 %R 10.1109/HICSS.2011.264 %0 Conference Proceedings %B Open Source Systems: Grounding Research (OSS 2011) %D 2011 %T Knowledge Homogeneity and Specialization in the Apache HTTP Server Project %A MacLean, Alexander C. %A Pratt, Landon J. %A Knutson, Charles D. %A Ringger, Eric K. 
%K apache %K commits %K developer %K email %K email archive %K LDA %K mailing list %K revision control %K revision history %K scm %K social network analysis %K specialization %K subversion %K svn %X We present an analysis of developer communication in the Apache HTTP Server project. Using topic modeling techniques we expose latent conceptual sub-communities arising from developer specialization within the greater developer population. However, we found that among the major contributors to the project, very little specialization exists. We present theories to explain this phenomenon, and suggest further research. %B Open Source Systems: Grounding Research (OSS 2011) %I Springer %P 106-122 %8 10/2011 %U http://sequoia.cs.byu.edu/lab/files/pubs/MacLean2011a.pdf %> https://flosshub.org/sites/flosshub.org/files/MacLean2011a.pdf %0 Conference Proceedings %B Open Source Systems: Grounding Research (OSS 2011) %D 2011 %T Libre Software as an Innovation Enabler in India: Experiences of a Bangalorian Software SME %A Henttonen, Katja %K free software %K India %K INNOVATION %K open source %K software business %X Free/Libre and open source software (FLOSS) has been advocated for its presumed capacity to support native software industries in developing countries. It is said to create new spaces for exploration and to lower entry barriers to mature software markets, for example. However, little empirical research has been conducted concerning FLOSS business in a developing country setting and, thus, there is not much evidence to support or refute these claims. This paper presents a business case study conducted in India, a country branded as a 'software powerhouse' of the developing world. The findings show how FLOSS has opened up significant opportunities for the case company, especially in terms of improving its innovative capability and upgrading in the software value chain. 
On the other hand, they also highlight some challenges to FLOSS involvement that rise specifically from the Indian context. %B Open Source Systems: Grounding Research (OSS 2011) %I Springer %P 220-232 %8 10/2011 %0 Conference Proceedings %B Open Source Systems: Grounding Research (OSS 2011) %D 2011 %T Preparing FLOSS for Future Network Paradigms: A Survey on Linux Network Management %A Matos, Alfredo %A Thomson, John %A Paulo Trezentos %K linux %K networking %K Survey %X Operating system tools must fulfill the requirements generated by the advances in networking paradigms. To understand the current state of the Free, Libre and Open Source Software (FLOSS) ecosystem, we present a survey on the main tools used to manage and interact with the network, and how they are organized in Linux-based operating systems. Based on the survey results, we present a reference Linux network stack that can serve as the basis for future heterogeneous network environments, contributing towards a standardized approach in Linux. Using this stack, and focusing on dynamic and spontaneous network interactions, we present an evolution path for network related technologies, contributing to Linux as a network research operating system and to FLOSS as a whole. %B Open Source Systems: Grounding Research (OSS 2011) %I Springer %P 75-89 %8 10/2011 %0 Conference Paper %B 15th European Conference on Software Maintenance and Reengineering (CSMR 2011) %D 2011 %T Process Mining Software Repositories %A Poncin, Wouter %A Serebrenik, Alexander %A Brand, Mark van den %K amsn %K email %K email archives %K gcc %K mailing list %K Process mining %K software repositories %X Software developers’ activities are in general recorded in software repositories such as version control systems, bug trackers and mail archives. 
While abundant information is usually present in such repositories, successful information extraction is often challenged by the necessity to simultaneously analyze different repositories and to combine the information obtained. We propose to apply process mining techniques, originally developed for business process analysis, to address this challenge. However, in order for process mining to become applicable, different software repositories should be combined, and “related” software development events should be matched: e.g., mails sent about a file, modifications of the file and bug reports that can be traced back to it. The combination and matching of events has been implemented in FRASR (FRamework for Analyzing Software Repositories), augmenting the process mining framework ProM. FRASR has been successfully applied in a series of case studies addressing such aspects of the development process as roles of different developers and the way bug reports are handled. %B 15th European Conference on Software Maintenance and Reengineering (CSMR 2011) %I IEEE %C Oldenburg, Germany %P 5 - 14 %@ 978-1-61284-259-2 %R 10.1109/CSMR.2011.5 %> https://flosshub.org/sites/flosshub.org/files/2011-03_CSMR.pdf %0 Journal Article %J Information and Software Technology %D 2011 %T Sociomaterial bricolage: The creation of location-spanning work practices by global software developers %A Johri, Aditya %K Global software development %K Interpretive analysis %K interviews %K Qualitative field study %K Sociomaterial bricolage %K Virtual teams %K Work practices %X Context Studies on global software development have documented severe coordination and communication problems among coworkers due to geographic dispersion and consequent dependency on technology. These problems are exacerbated by increase in the complexity of work undertaken by global teams. 
However, despite these problems, global software development is on the rise and firms are adopting global practices across the board, raising the question: What does successful global software development look like and what can we learn from its practitioners? Objective This study draws on practice-based studies of work to examine successful work practices of global software developers. The primary aim of this study was to understand how workers develop practices that allow them to function effectively across geographically dispersed locations. Method An ethnographically-informed field study was conducted with data collection at two international locations of a firm. Interview, observation and archival data were collected. A total of 42 interviews and 3 weeks of observations were conducted. Results Teams spread across different locations around the world developed work practices through sociomaterial bricolage. Two facets of technology use were necessary for the creation of these practices: multiplicity of media and relational personalization at dyadic and team levels. New practices were triggered by the need to achieve a work-life balance, which was disturbed by global development. Reflecting on my role as a researcher, I underscore the importance of understanding researchers’ own frames of reference and using research practices that mirror informants’ work practices. Conclusion Software developers on global teams face unique challenges which necessitate a shift in their work practices. Successful teams are able to create practices that span locations while still being tied to location based practices. Inventive use of material and social resources is central to the creation of these practices. %B Information and Software Technology %V 53 %P 955 - 968 %8 9/2011 %U http://www.sciencedirect.com/science/article/pii/S0950584911000437 %N 9 %! 
Information and Software Technology %R 10.1016/j.infsof.2011.01.014 %0 Conference Paper %B Proceedings of the 33rd International Conference on Software Engineering %D 2011 %T Socio-technical developer networks: should we trust our measurements? %A Meneely, Andrew %A Williams, Laurie %K developer network %K developers %K linux %K linux kernel %K PHP %K social network analysis %K Survey %K wireshark %X Software development teams must be properly structured to provide effective collaboration to produce quality software. Over the last several years, social network analysis (SNA) has emerged as a popular method for studying the collaboration and organization of people working in large software development teams. Researchers have been modeling networks of developers based on socio-technical connections found in software development artifacts. Using these developer networks, researchers have proposed several SNA metrics that can predict software quality factors and describe the team structure. But do SNA metrics measure what they purport to measure? The objective of this research is to investigate if SNA metrics represent socio-technical relationships by examining if developer networks can be corroborated with developer perceptions. To measure developer perceptions, we developed an online survey that is personalized to each developer of a development team based on that developer's SNA metrics. Developers answered questions about other members of the team, such as identifying their collaborators and the project experts. A total of 124 developers responded to our survey from three popular open source projects: the Linux kernel, the PHP programming language, and the Wireshark network protocol analyzer. Our results indicate that connections in the developer network are statistically associated with the collaborators whom the developers named. 
Our results substantiate that SNA metrics represent socio-technical relationships in open source development projects, while also clarifying how the developer network can be interpreted by researchers and practitioners. %B Proceedings of the 33rd International Conference on Software Engineering %S ICSE '11 %I ACM %C New York, NY, USA %P 281–290 %@ 978-1-4503-0445-0 %U http://doi.acm.org/10.1145/1985793.1985832 %R 10.1145/1985793.1985832 %0 Conference Proceedings %B Open Source Systems: Grounding Research (OSS 2011) %D 2011 %T Successful Reuse of Software Components: A Report from the Open Source Perspective %A Capiluppi, Andrea %A Boldyreff, Cornelia %A Stol, Klaas-Jan %K component-based software development %K OSS components %K Software reuse %X A promising way of software reuse is Component-Based Software Development (CBSD). There is an increasing number of OSS products available that can be freely used in product development. However, OSS communities themselves have not yet taken full advantage of the “reuse mechanism”. Many OSS projects duplicate effort and code, even when sharing the same application domain and topic. One successful counter-example is the FFMpeg multimedia project, since several of its components are widely and consistently reused into other OSS projects. This paper documents the history of the libavcodec library of components from the FFMpeg project, which at present is reused in more than 140 OSS projects. Most of the recipients use it as a black-box component, although a number of OSS projects keep a copy of it in their repositories, and modify it as such. In both cases, we argue that libavcodec is a successful example of reusable OSS library of components. 
%B Open Source Systems: Grounding Research (OSS 2011) %I Springer %P 159-176 %8 10/2011 %0 Conference Paper %B Proceedings of the 2011 Community Building Workshop on Collaborative Teaching of Globally Distributed Software Development %D 2011 %T Teaching distributed software engineering with UCOSP: the undergraduate capstone open-source project %A Stroulia, Eleni %A Bauer, Ken %A Craig, Michelle %A Reid, Karen %A Wilson, Greg %K distributed %K education %K pedagogical %K project-based courses %K software engineering education %X Software engineering courses in computer-science departments are meant to prepare students for the practice of designing, developing, understanding and maintaining software in the real world. The effectiveness of these courses has potentially a tremendous impact on the software industry, since it is through these courses that students must learn the state-of-the-art process and the tools of their eventual "trade", so that they can bring this knowledge to their job and thus advance the actual state of practice. The value of "learning software engineering" through project-based courses has long been recognized by educators and practitioners alike. In this paper, we discuss our experience with a distributed project-based course, which infuses the students' learning experience with an increased degree of realism, which, we believe, further improves the quality of their learning and advances their readiness to join the profession. 
%B Proceedings of the 2011 Community Building Workshop on Collaborative Teaching of Globally Distributed Software Development %S CTGDSD '11 %I ACM %C New York, NY, USA %P 20–25 %@ 978-1-4503-0590-7 %U http://doi.acm.org/10.1145/1984665.1984670 %R 10.1145/1984665.1984670 %0 Conference Proceedings %B Open Source Systems: Grounding Research (OSS 2011) %D 2011 %T To Fork or Not to Fork: Fork Motivations in SourceForge Projects %A Nyman, Linus %A Mikkonen, Tommi %K fork rate %K sourceforge %X A project fork occurs when software developers take a copy of source code from one software package and use it to begin an independent development work that is maintained separately from its origin. Although forking in open source software does not require the permission of the original authors, the new version, nevertheless, competes for the attention of the same developers that have worked on the original version. The motivations developers have for performing forks are many, but in general they have received little attention. In this paper, we present the results of a study of forks performed in SourceForge (http://sourceforge.net/) and list the developers’ motivations for their actions. The main motivation, seen in close to half of the cases of forking, was content modification; either adding content to the original program or focusing the content to the needs of a specific segment of users. In a quarter of the cases the motivation was technical modification; either porting the program to new hardware or software, or improving the original. 
%B Open Source Systems: Grounding Research (OSS 2011) %I Springer %P 259-268 %8 10/2011 %0 Conference Proceedings %B Open Source Systems: Grounding Research (OSS 2011) %D 2011 %T Towards a Unified Definition of Open Source Quality %A Ruiz, Claudia %A Robinson, William %K literature review %K measurement %K open source %K quality %K Software %X Software quality needs to be specified and evaluated in order to determine the success of a development project, but this is a challenge with Free/Libre Open Source Software (FLOSS) because of its permanently emergent state. This has not deterred the growth of the assumption that FLOSS is higher quality than traditionally developed software, despite mixed research results. With this literature review, we found the reason for these mixed results is that quality is being defined, measured, and evaluated differently. We report the most popular definitions, such as software structure measures, process measures, such as defect fixing, and maturity assessment models. The way researchers have built their samples has also contributed to the mixed results with different project properties being considered and ignored. Because FLOSS projects are evolving, their quality is too, and it must be measured using metrics that take into account its community’s commitment to quality rather than just its software structure. Challenges exist in defining what constitutes a defect or bug, and the role of modularity in affecting FLOSS quality. %B Open Source Systems: Grounding Research (OSS 2011) %I Springer %P 17-33 %8 10/2011 %0 Conference Proceedings %B Open Source Systems: Grounding Research (OSS 2011) %D 2011 %T Towards Improving OSS Products Selection – Matching Selectors and OSS Communities Perspectives %A Ayala, Claudia %A Cruzes, Daniela S. 
%A Franch, Xavier %A Conradi, Reidar %K empirical study %K information rendering strategy %K open source software %K selection %X Adopting third-party software is becoming an economical and strategic need for today's organizations. A fundamental part of its successful adoption is the informed selection of products that best fit the organization's needs. One of the main current problems hampering selection, especially of OSS products, is the vast amount of unstructured, incomplete, evolvable and widespread information about products that highly increases the risks of taking a wrong decision. In this paper, we aim to inform and provide evidence to OSS communities that help them to envisage improvements on their information rendering strategies to satisfy industrial OSS selectors’ needs. Our results are from the matching between the informational needs of 23 OSS selectors from diverse software-intensive organizations, and the in-depth study of 9 OSS communities of different sizes and domains. The results evidenced specific areas of improvement that might help to enhance the industrial OSS selection practice. %B Open Source Systems: Grounding Research (OSS 2011) %I Springer %P 244-258 %8 10/2011 %0 Conference Paper %B Proceedings of the 2nd International Workshop on Web 2.0 for Software Engineering %D 2011 %T Towards understanding twitter use in software engineering: preliminary findings, ongoing challenges and future questions %A Bougie, Gargi %A Starke, Jamie %A Storey, Margaret-Anne %A Daniel M. German %K eclipse %K linux %K mxunit %K social media %K software development %K twitter %K web 2.0 %X There has been some research conducted around the motivation for the use of Twitter and the value brought by micro-blogging tools to individuals and business environments. This paper builds on our understanding of how the phenomenon affects the population which birthed the technology: Software Engineers. 
We find that the Software Engineering community extensively leverages Twitter's capabilities for conversation and information sharing and that use of the tool is notably different between distinct Software Engineering groups. Our work exposes topics for future research and outlines some of the challenges in exploring this type of data. %B Proceedings of the 2nd International Workshop on Web 2.0 for Software Engineering %S Web2SE '11 %I ACM %C New York, NY, USA %P 31–36 %@ 978-1-4503-0595-2 %U http://doi.acm.org/10.1145/1984701.1984707 %R 10.1145/1984701.1984707 %> https://flosshub.org/sites/flosshub.org/files/WEB2SE2011.pdf %> https://flosshub.org/sites/flosshub.org/files/WEB2SE2011_0.pdf
%0 Conference Paper %B Proceedings of the 33rd International Conference on Software Engineering %D 2011 %T Understanding broadcast based peer review on open source software projects %A Peter C. Rigby %A Storey, Margaret-Anne %K apache %K case studies %K email %K freebsd %K grounded theory %K kde %K linux %K linux kernel %K open source software %K peer review %K subversion %X Software peer review has proven to be a successful technique in open source software (OSS) development. In contrast to industry, where reviews are typically assigned to specific individuals, changes are broadcast to hundreds of potentially interested stakeholders. Despite concerns that reviews may be ignored, or that discussions will deadlock because too many uninformed stakeholders are involved, we find that this approach works well in practice. In this paper, we describe an empirical study to investigate the mechanisms and behaviours that developers use to find code changes they are competent to review. We also explore how stakeholders interact with one another during the review process. We manually examine hundreds of reviews across five high profile OSS projects. Our findings provide insights into the simple, community-wide techniques that developers use to effectively manage large quantities of reviews. The themes that emerge from our study are enriched and validated by interviewing long-serving core developers. 
%B Proceedings of the 33rd International Conference on Software Engineering %S ICSE '11 %I ACM %C New York, NY, USA %P 541–550 %@ 978-1-4503-0445-0 %R 10.1145/1985793.1985867 %> https://flosshub.org/sites/flosshub.org/files/Rigby2011ICSE.pdf %0 Journal Article %J Journal of the Association for Information Systems %D 2011 %T Validity Issues in the Use of Social Network Analysis with Digital Trace data %A Howison, James %A Andrea Wiggins %A Kevin Crowston %K information system %K Online Communities %K social network analysis %K Virtuality %X There is an exciting natural match between social network analysis methods and the growth of data sources produced by social interactions via information technologies, from online communities to corporate information systems. Information Systems researchers have not been slow to embrace this combination of method and data. Such systems increasingly provide "digital trace data" that provide new research opportunities. Yet digital trace data are substantively different from the survey and interview data for which network analysis measures and interpretations were originally developed. This paper examines ten validity issues associated with the combination of digital trace data and social network analysis methods, with examples from the IS literature, to provide recommendations for improving the validity of research using this combination. 
%B Journal of the Association for Information Systems %V 12 %N 12 %& Article 2 %> https://flosshub.org/sites/flosshub.org/files/HowisonSNADigitalTraceData-WorkingPaper.pdf %0 Conference Paper %B Proceedings of the 8th working conference on Mining software repositories - MSR '11 %D 2011 %T Visualizing collaboration and influence in the open-source software community %A Marschner, Eli %A Rosenfeld, Evan %A Heer, Jeffrey %A Heller, Brandon %Y van Deursen, Arie %Y Xie, Tao %Y Zimmermann, Thomas %K COLLABORATION %K data exploration %K geography %K geoscatter %K github %K graph %K mapping %K metadata %K open source %K social graph %K user profiles %K visualization %X We apply visualization techniques to user profiles and repository metadata from the GitHub source code hosting service. Our motivation is to identify patterns within this development community that might otherwise remain obscured. Such patterns include the effect of geographic distance on developer relationships, social connectivity and influence among cities, and variation in project-specific contribution styles (e.g., centralized vs. distributed). Our analysis examines directed graphs in which nodes represent users' geographic locations and edges represent (a) follower relationships, (b) successive commits, or (c) contributions to the same project. We inspect this data using a set of visualization techniques: geo-scatter maps, small multiple displays, and matrix diagrams. Using these representations, and tools based on them, we develop hypotheses about the larger GitHub community that would be difficult to discern using traditional lists, tables, or descriptive statistics. These methods are not intended to provide conclusive answers; instead, they provide a way for researchers to explore the question space and communicate initial insights. 
%B Proceedings of the 8th working conference on Mining software repositories - MSR '11 %I ACM Press %C New York, New York, USA %P 223-226 %8 05/2011 %@ 9781450305747 %U http://vis.stanford.edu/files/2011-GotHub-MSR.pdf %! MSR '11 %R 10.1145/1985441.1985476 %0 Journal Article %J Information and Software Technology %D 2010 %T Adoption of open source software in software-intensive organizations – A systematic literature review %A Hauge, Øyvind %A Ayala, Claudia %A Conradi, Reidar %K open source software %K organizations %K software development %K Systematic literature review %X Context: Open source software (OSS) is changing the way organizations develop, acquire, use, and commercialize software. Objective: This paper seeks to identify how organizations adopt OSS, classify the literature according to these ways of adopting OSS, and with a focus on software development evaluate the research on adoption of OSS in organizations. Method: Based on the systematic literature review method we reviewed publications from 24 journals and seven conference and workshop proceedings, published between 1998 and 2008. From a population of 24,289 papers, we identified 112 papers that provide empirical evidence on how organizations actually adopt OSS. Results: We show that adopting OSS involves more than simply using OSS products. We moreover provide a classification framework consisting of six distinctly different ways in which organizations adopt OSS. This framework is used to illustrate some of the opportunities and challenges organizations meet when approaching OSS, to show that OSS can be adopted successfully in different ways, and to organize and review existing research. We find that existing research on OSS adoption does not sufficiently describe the context of the organizations studied, and it fails to benefit fully from related research fields. While existing research covers a large number of topics, it contains very few closely related studies. 
To aid this situation, we offer directions for future research. Conclusion: The implications of our findings are twofold. On the one hand, practitioners should embrace the many opportunities OSS offers, but consciously evaluate the consequences of adopting it in their own context. They may use our framework and the success stories provided by the literature in their own evaluations. On the other hand, researchers should align their work, and perform more empirical research on topics that are important to organizations. Our framework may be used to position this research and to describe the context of the organization they are studying. %B Information and Software Technology %V 52 %P 1133 - 1154 %8 11/2010 %U http://www.sciencedirect.com/science/article/pii/S0950584910000972 %N 11 %! Information and Software Technology %R 10.1016/j.infsof.2010.05.008 %0 Journal Article %J Information and Software Technology %D 2010 %T Analysis of virtual communities supporting OSS projects using social network analysis %A Toral, S.L. %A Martínez-Torres, M.R. %A Barrero, F. %K arm %K email %K Knowledge brokers %K linux %K mailing list %K open source software %K social network analysis %K virtual communities %X This paper analyses the behaviour of virtual communities for Open Source Software (OSS) projects. The development of OSS projects relies on virtual communities, which are built on relationships among members and whose ultimate objective is sharing knowledge and improving the underlying project. This study addresses the interactive collaboration in these kinds of communities applying social network analysis (SNA). In particular, SNA techniques will be used to identify those members playing a middle-man role among other community members. Results will illustrate the importance of this role to achieve successful virtual communities. %B Information and Software Technology %V 52 %P 296 - 303 %8 3/2010 %U http://www.sciencedirect.com/science/article/pii/S0950584909001888 %N 3 %! 
Information and Software Technology %R 10.1016/j.infsof.2009.10.007 %0 Conference Paper %B 2010 43rd Hawaii International Conference on System Sciences (HICSS 2010) %D 2010 %T Analyzing Leadership Dynamics in Distributed Group Communication %A Kevin Crowston %A Andrea Wiggins %A Howison, James %K core %K DYNAMICS %K email %K email archives %K fire %K flossmole %K gaim %K leadership %K mailing list %K project success %K social network analysis %K srda %X We apply social network analysis (SNA) to examine the dynamics of leadership in distributed groups, specifically Free/Libre Open Source Software development projects, and its relation to group performance. Based on prior work on leadership in distributed groups, we identify leaders with those who make the highest level of contribution to the group and assess the degree of leadership by measuring centralization of communications. We compare the dynamics of leadership in two FLOSS projects, one more and one less effective. We find that in both projects, centralization was higher in developer-oriented communications venues than in user-oriented venues, suggesting higher degrees of leadership in developer venues. However, we do not find a consistent relation between centralization and effectiveness. We suggest that SNA can instead be useful for identifying interesting periods in the history of the project, e.g., periods where the leadership of the project is in transition. 
%B 2010 43rd Hawaii International Conference on System Sciences (HICSS 2010) %I IEEE %C Honolulu, Hawaii, USA %P 1 - 10 %@ 978-1-4244-5509-6 %R 10.1109/HICSS.2010.62 %> https://flosshub.org/sites/flosshub.org/files/07-06-02.pdf %0 Conference Paper %B 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010) %D 2010 %T Automated dependency resolution for open source software %A Ossher, Joel %A Bajracharya, Sushil %A Lopes, Cristina %K dependencies %K java %K source code %K sourcerer %X Opportunities for software reuse are plentiful, thanks in large part to the widespread adoption of open source processes and the availability of search engines for locating relevant artifacts. One challenge presented by open source software reuse is simply getting a newly downloaded artifact to build/run in the first place. The artifact itself likely reuses other artifacts, and so depends on their being located to function properly. While merely tedious in the individual case, this can cause serious difficulties for those seeking to study open source software. It is simply not feasible to manually resolve dependencies for thousands of projects, and many forms of analysis require declarative completeness. In this paper we present a method for automatically resolving dependencies for open source software. It works by cross-referencing a project's missing type information with a repository of candidate artifacts. We have implemented this method on top of the Sourcerer, an infrastructure for the large-scale indexing and analysis of open source code. The performance of our resolution algorithm was evaluated in two parts. First, for a small number of popular open source projects, we manually examined the artifacts suggested by our system to determine if they were appropriate. 
Second, we applied the algorithm to the 13,241 projects in the Sourcerer managed repository to evaluate the rate of resolution success. The results demonstrate the feasibility of this approach, as the algorithm located all of the required artifacts needed by 3,904 additional projects, increasing the percentage of declaratively complete projects in Sourcerer from 39% to 69%. %B 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010) %I IEEE %C Cape Town, South Africa %P 130 - 140 %@ 978-1-4244-6802-7 %R 10.1109/MSR.2010.5463346 %0 Conference Paper %B 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010) %D 2010 %T Clones: What is that smell? %A Rahman, Foyzur %A Christian Bird %A Devanbu, Premkumar %K apache %K bug fix revisions %K bugs %K clone %K evolution %K gimp %K nautilus %K scm %K source code %X Clones are generally considered bad programming practice in software engineering folklore. They are identified as a bad smell and a major contributor to project maintenance difficulties. Clones inherently cause code bloat, thus increasing project size and maintenance costs. In this work, we try to validate the conventional wisdom empirically to see whether cloning makes code more defect prone. This paper analyses the relationship between cloning and defect proneness. We find that, first, the great majority of bugs are not significantly associated with clones. Second, we find that clones may be less defect prone than non-cloned code. Finally, we find little evidence that clones with more copies are actually more error prone. Our findings do not support the claim that clones are really a "bad smell". Perhaps we can clone, and breathe easy, at the same time. 
%B 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010) %I IEEE %C Cape Town, South Africa %P 72 - 81 %@ 978-1-4244-6802-7 %R 10.1109/MSR.2010.5463343 %> https://flosshub.org/sites/flosshub.org/files/72rahman2010cws.pdf %0 Conference Paper %B 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010) %D 2010 %T Cloning and copying between GNOME projects %A Krinke, Jens %A Gold, Nicolas %A Jia, Yue %A Binkley, David %K clone %K gnome %K msr challenge %K source code %X This paper presents an approach to automatically distinguish the copied clone from the original in a pair of clones. It matches the line-by-line version information of a clone to the pair's other clone. A case study on the GNOME Desktop Suite revealed a complex flow of reused code between the different subprojects. In particular, it showed that the majority of larger clones (with a minimal size of 28 lines or higher) exist between the subprojects and more than 60% of the clone pairs can be automatically separated into original and copy. %B 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010) %I IEEE %C Cape Town, South Africa %P 98 - 101 %@ 978-1-4244-6802-7 %R 10.1109/MSR.2010.5463290 %> https://flosshub.org/sites/flosshub.org/files/98Coning.pdf %0 Journal Article %J International Journal of Open Source Software and Processes %D 2010 %T Data Mining User Activity in Free and Open Source Software (FOSS)/ Open Learning Management Systems %A McGrath, Owen %K data mining %K education %K student %X Free and Open Source Software (FOSS)/Open Educational Systems development projects abound in higher education today. 
Many universities worldwide have adopted open source software like ATutor and Moodle as an alternative to commercial or homegrown systems. The move to open source learning management systems entails many special considerations, including usage analysis facilities. The tracking of users and their activities poses major technical and analytical challenges within web-based systems. This paper examines how user activity tracking challenges are met with data mining techniques, particularly web usage mining methods, in four different open learning management systems: ATutor, LON-CAPA, Moodle, and Sakai. As examples of data mining technologies adapted within widely used systems, they represent important first steps for moving educational data mining outside the research laboratory. Moreover, as examples of different open source development contexts, they exemplify the potential for programmatic integration of data mining technology processes in the future. As open systems mature in the use of educational data mining, they move closer to the long-sought goal of achieving more interactive, personalized, adaptive learning environments online on a broad scale. %B International Journal of Open Source Software and Processes %V 2 %P 65 - 75 %N 1 %R 10.4018/jossp.2010010105 %0 Journal Article %J International Journal of Open Source Software and Processes %D 2010 %T Developing a Dynamic and Responsive Online Learning Environment %A Buchan, Janet %K education %K learning %K sakai %X Charles Sturt University adopted the open source software, Sakai, as the foundation for the university’s new, integrated Online Learning Environment. This study explores whether a pedagogical advantage exists in adopting such an open source learning management system. 
Research suggests that the community source approach to development of open source software has many inherent pedagogical advantages, but this paper examines whether this is due to the choice of open source software or simply having access to appropriate technology for learning and teaching in the 21st century. The author also addresses the challenges of the project management methodology and processes in the large-scale implementation of an open-source courseware management solution at the institutional level. Consequently, this study outlines strategies that an institution can use to harness the potential of a community source approach to software development to meet the institutional and individual user needs into the future. %B International Journal of Open Source Software and Processes %V 2 %P 32 - 48 %N 1 %R 10.4018/jossp.2010010103 %0 Conference Paper %B 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010) %D 2010 %T Do stack traces help developers fix bugs? %A Schröter, Adrian %A Bettenburg, Nicolas %A Premraj, Rahul %K bug fixing %K bug report %K debugging %K eclipse %K stack trace %X A widely shared belief in the software engineering community is that stack traces are much sought after by developers to support them in debugging. But limited empirical evidence is available to confirm the value of stack traces to developers. In this paper, we seek to provide such evidence by conducting an empirical study on the usage of stack traces by developers from the ECLIPSE project. Our results provide strong evidence to this effect and also throw light on some of the patterns in bug fixing using stack traces. 
We expect the findings of our study to further emphasize the importance of adding stack traces to bug reports and that in the future, software vendors will provide more support in their products to help general users make such information available when filing bug reports. %B 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010) %I IEEE %C Cape Town, South Africa %P 118 - 121 %@ 978-1-4244-6802-7 %R 10.1109/MSR.2010.5463280 %> https://flosshub.org/sites/flosshub.org/files/118-10-msr.pdf %0 Book Section %B Open Source Software: New Horizons %D 2010 %T Download Patterns and Releases in Open Source Software Projects: A Perfect Symbiosis? %A Rossi, Bruno %A Russo, Barbara %A Succi, Giancarlo %E Ågerfalk, Pär %E Boldyreff, Cornelia %E González-Barahona, Jesús %E Madey, Gregory %E Noll, John %K flossmole %K oss2010 %K sourceforge %X Software usage by end-users is one of the factors used to evaluate the success of software projects. In the context of open source software, there is no single and non-controversial measure of usage, though. Still, one of the most used and readily available measures is data about project downloads. Nevertheless, download counts and averages do not convey as much information as the patterns in the original downloads time series. In this research, we propose a method to increase the expressiveness of mere download rates by considering download patterns against software releases. We experimentally apply our method to the most downloaded projects of SourceForge's history crawled through the FLOSSMole repository. Findings show that projects with similar usage can indeed have different levels of sensitivity to releases, revealing different behaviors of users. Future research will develop further the pattern recognition approach to automatically categorize open source projects according to their download patterns. 
%B Open Source Software: New Horizons %S IFIP Advances in Information and Communication Technology %I Springer Boston %V 319 %P 252-267 %U http://dx.doi.org/10.1007/978-3-642-13244-5_20 %0 Conference Paper %B 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010) %D 2010 %T The evolution of ANT build systems %A McIntosh, Shane %A Adams, Bram %A Hassan, Ahmed E. %K ant %K argouml %K build %K eclipse %K jboss %K maintenance %K metrics %K source code %K tomcat %X Build systems are responsible for transforming static source code artifacts into executable software. While build systems play such a crucial role in software development and maintenance, they have been largely ignored by software evolution researchers. With a firm understanding of build system aging processes, project managers could allocate personnel and resources to build system maintenance tasks more effectively, reducing the build maintenance overhead on regular development activities. In this paper, we study the evolution of ANT build systems from two perspectives: (1) a static perspective, where we examine the build system specifications using software metrics adopted from the source code domain; and (2) a dynamic perspective where representative sample build runs are conducted and their output logs are analyzed. Case studies of four open source ANT build systems with a combined history of 152 releases show that not only do ANT build systems evolve, but also that they need to react in an agile manner to changes in the source code. 
%B 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010) %I IEEE %C Cape Town, South Africa %P 42 - 51 %@ 978-1-4244-6802-7 %R 10.1109/MSR.2010.5463341 %> https://flosshub.org/sites/flosshub.org/files/42msr2010_mcintosh.pdf %0 Conference Paper %B 2010 43rd Hawaii International Conference on System Sciences (HICSS 2010) %D 2010 %T Exploring Complexity in Open Source Software: Evolutionary Patterns, Antecedents, and Outcomes %A Darcy, David P. %A Daniel, Sherae L. %A Stewart, Katherine J. %K complexity %K evolution %K fda %K life cycle %K sourceforge %K srda %X Software complexity is important to researchers and managers, yet much is unknown about how complexity evolves over the life of a software application and whether different dimensions of software complexity may exhibit similar or different evolutionary patterns. Using cross-sectional and longitudinal data on a sample of 108 open source projects, this research investigated how the complexity of open source project releases varied throughout the life of the project. Functional data analysis was applied to the release histories of the projects and recurring evolutionary patterns were derived. There were projects that saw little evolution, according to their measures of size and structural complexity. However, projects that displayed some evolution often differed on the pattern of evolution depending on whether size or structural complexity was examined. Factors that contribute to and result from the patterns of complexity were evaluated, and implications for research and practice are presented. 
%B 2010 43rd Hawaii International Conference on System Sciences (HICSS 2010) %I IEEE %C Honolulu, Hawaii, USA %P 1 - 11 %@ 978-1-4244-5509-6 %R 10.1109/HICSS.2010.198 %> https://flosshub.org/sites/flosshub.org/files/10-07-02.pdf %0 Conference Paper %B 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010) %D 2010 %T An extensive comparison of bug prediction approaches %A D'Ambros, Marco %A Lanza, Michele %A Robbes, Romain %K apache %K bug reports %K eclipse %K famix %K lucene %K mylyn %K prediction %K scm %X Reliably predicting software defects is one of software engineering's holy grails. Researchers have devised and implemented a plethora of bug prediction approaches varying in terms of accuracy, complexity and the input data they require. However, the absence of an established benchmark makes it hard, if not impossible, to compare approaches. We present a benchmark for defect prediction, in the form of a publicly available data set consisting of several software systems, and provide an extensive comparison of the explanative and predictive power of well-known bug prediction approaches, together with novel approaches we devised. Based on the results, we discuss the performance and stability of the approaches with respect to our benchmark and deduce a number of insights on bug prediction models. 
%B 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010) %I IEEE %C Cape Town, South Africa %P 31 - 41 %@ 978-1-4244-6802-7 %R 10.1109/MSR.2010.5463279 %> https://flosshub.org/sites/flosshub.org/files/31dambrosLanzaRobbes31.pdf %0 Conference Paper %B Proceedings of ICPC 2010 (18th IEEE International Conference on Program Comprehension) %D 2010 %T Extracting source code from e-mails %A Bacchelli, Alberto %A D'Ambros, Marco %A Lanza, Michele %K argouml %K email %K freenet %K jmeter %K mailing lists %K mina %K natural language %K openjpa %K source code %X E-mails, used by developers and system users to communicate over a broad range of topics, offer a valuable source of information. If archived, e-mails can be mined to support program comprehension activities and to provide views of a software system that are alternative and complementary to those offered by the source code. However, e-mails are written in natural language, and therefore contain noise that makes it difficult to retrieve the important data. Thus, before conducting an effective system analysis and extracting data for program comprehension, it is necessary to select the relevant messages, and to expose only the meaningful information. In this work we focus both on classifying e-mails that hold fragments of the source code of a system, and on extracting the source code pieces inside the e-mail. We devised and analyzed a number of lightweight techniques to accomplish these tasks. To assess the validity of our techniques, we manually inspected and annotated a statistically significant number of e-mails from five unrelated open source software systems written in Java. With such a benchmark in place, we measured the effectiveness of each technique in terms of precision and recall. 
%B Proceedings of ICPC 2010 (18th IEEE International Conference on Program Comprehension) %P 24-33 %U http://www.inf.usi.ch/phd/bacchelli/publications.php %> https://flosshub.org/sites/flosshub.org/files/icpc2010.pdf %0 Conference Paper %B 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010) %D 2010 %T Finding file clones in FreeBSD Ports Collection %A Sasaki, Yusuke %A Yamamoto, Tetsuo %A Hayase, Yasuhiro %A Inoue, Katsuro %K clone %K freebsd %K msr challenge %K source code %X In Open Source System (OSS) development, software components are often imported and reused; for this reason we might expect that files are copied in multiple projects (file clones). In this paper, we propose a file clone detection tool called FCFinder and show the analysis performed with it on the FreeBSD Ports Collection, a large OSS project collection. We found many file clones among similar or related projects, which are systematically introduced from base projects. %B 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010) %I IEEE %C Cape Town, South Africa %P 102 - 105 %@ 978-1-4244-6802-7 %R 10.1109/MSR.2010.5463293 %> https://flosshub.org/sites/flosshub.org/files/102FreeBSDClones.pdf %0 Conference Paper %B 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010) %D 2010 %T Identifying licensing of jar archives using a code-search approach %A Di Penta, Massimiliano %A Daniel M. German %A Antoniol, Giuliano %K apache %K bytecode %K classification %K eclipse %K google code %K jar %K java %K licenses %K source code %X Free and open source software strongly promotes the reuse of source code. 
Some open source Java components/libraries are distributed as jar archives only containing the bytecode and some additional information. For anyone wanting to integrate such a jar into their own project, it is important to determine the license(s) of the code from which the jar archive was produced, as this affects the way such a component can be used. This paper proposes an automatic approach to determine the license of jar archives, combining the use of a code-search engine with the automatic classification of licenses contained in textual files enclosed in the jar. Results of an empirical study performed on 37 jars - from 17 different systems - indicate that this approach is able to successfully infer the jar licenses in over 95% of the cases, but that in many cases the license in textual files may differ from the one of the classes contained in the jar. %B 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010) %I IEEE %C Cape Town, South Africa %P 151 - 160 %@ 978-1-4244-6802-7 %R 10.1109/MSR.2010.5463282 %> https://flosshub.org/sites/flosshub.org/files/151msr2010.pdf %0 Journal Article %J International Journal of Open Source Software and Processes %D 2010 %T Impact of Programming Language Fragmentation on Developer Productivity %A Krein, Jonathan L. %A MacLean, Alexander C. %A Knutson, Charles D. %A Delorey, Daniel P. %A Eggett, Dennis L. %K commits %K entropy %K language entropy %K programming languages %K sourceforge %K srda %X Programmers often develop software in multiple languages. In an effort to study the effects of programming language fragmentation on productivity, and ultimately on a developer’s problem-solving abilities, the authors present a metric, language entropy, for characterizing the distribution of a developer’s programming efforts across multiple programming languages. 
This paper presents an observational study examining the project contributions of a random sample of 500 SourceForge developers. Using a random coefficients model, the authors find a statistically (alpha level of 0.001) and practically significant correlation between language entropy and the size of monthly project contributions. Results indicate that programming language fragmentation is negatively related to the total amount of code contributed by developers within SourceForge, an open source software (OSS) community. %B International Journal of Open Source Software and Processes %V 2 %P 41 - 61 %8 32/2010 %N 2 %R 10.4018/jossp.2010040104 %0 Conference Paper %B 2010 43rd Hawaii International Conference on System Sciences %D 2010 %T The Importance of Social Network Structure in the Open Source Software Developer Community %A Matthew Van Antwerp %A Madey, Greg %K developers %K popularity %K project success %K social network analysis %K sourceforge %K srda %X This paper outlines the motivations and methods for analyzing the developer network of open source software (OSS) projects. Previous work done by Hinds [5] suggested social network structure was instrumental towards the success of an OSS project, as measured by activity and output. The follow-up paper by Hinds [4] discovered that his hypotheses, based on social network theory and previous research on the importance of subgroup connectedness, were vastly different than the results of his study of over 100 successful OSS projects. He concluded that the social network structure had no significant effect on project success. We outline how his approach disregarded potentially important factors and through a new study evaluate the role of the OSS developer network as it pertains to long-term project popularity. We also present an initial investigation into the adequacy of using the SourceForge activity percentile as a long-term success metric. 
In contrast with Hinds, we show that previously existing developer-developer ties are an indicator of past and future project popularity. %B 2010 43rd Hawaii International Conference on System Sciences %I IEEE %C Honolulu, Hawaii, USA %P 1 - 10 %@ 978-1-4244-5509-6 %R 10.1109/HICSS.2010.385 %> https://flosshub.org/sites/flosshub.org/files/07-06-07.pdf %0 Conference Proceedings %B International Conference on Information Systems %D 2010 %T Learning in Open Source Software (OSS) Development: How Developer Interactions in Culturally Diverse Projects Impact the Acquisition of Collaboration and Learning Skills %A Diamant, I. %A Daniel, S.L. %K sourceforge %X Participants in an OSS/FLOSS development project often span national and organizational boundaries, as developers from different countries and corporations join the project. A project team’s national and organizational culture creates opportunities for learning from others, but may also lead to conflict and inhibit learning. This research examines developers’ learning in an OSS project, and the cultural context in which learning takes place. We focus on single- and double-loop learning and examine the impact of the team’s national and organizational culture on a developer’s learning. Archival and survey data are collected from two large-scale Sourceforge projects. This research can contribute to the OSS literature by examining the impact of team interactions on developers’ learning. Practically, administrators and managers stand to gain insight into the learning benefits of participation in OSS projects, and thus better assess their value as a training ground for global software development. 
%B International Conference on Information Systems %U http://aisel.aisnet.org/icis2010_submissions/66/ %0 Conference Paper %B Demonstration Track, Proceedings of the 17th SIGSOFT Symposium on Foundations of Software Engineering %D 2010 %T Linkster: Enabling Efficient Manual Mining %A Christian Bird %A Adrian Bachman %A Rahman, Foyzur %A Bernstein, Abraham %K artifacts %K bug %K bug tracking %K data mining %K email %K mailing lists %K open source %K source code %X While many uses of mined software engineering data are automatic in nature, some techniques and studies either require, or can be improved, by manual methods. Unfortunately, manually inspecting, analyzing, and annotating mined data can be difficult and tedious, especially when information from multiple sources must be integrated. Oddly, while there are numerous tools and frameworks for automatically mining and analyzing data, there is a dearth of tools which facilitate manual methods. To fill this void, we have developed LINKSTER, a tool which integrates data from bug databases, source code repositories, and mailing list archives to allow manual inspection and annotation. LINKSTER has already been used successfully by an OSS project lead to obtain data for one empirical study. 
%B Demonstration Track, Proceedings of the 17th SIGSOFT Symposium on Foundations of Software Engineering %I ACM %> https://flosshub.org/sites/flosshub.org/files/bird2010lee.pdf %0 Conference Paper %B 5th Workshop on Public Data about Software Development (WoPDaSD 2010) %D 2010 %T A Longitudinal Study on Collaboration Networks and Decision to Participate in a FLOSS Community %A Guido Conaldi %A Tonellato, Marco %K bicho %K bug fixing %K bug reports %K bugzilla %K COLLABORATION %K developers %K epiphany %K flossmetrics %K gnome %K social network analysis %X In this paper we conjecture that individual decisions of FLOSS (Free/Libre Open Source Software) developers to take on a task are influenced by network relations generated by collaboration among project members. In order to explore our conjecture we collected data on a FLOSS project team consisting of 227 developers committed since 2002 to the development of a web browser. We reconstructed 2-mode co-collaboration networks (software developer by bug) in which a tie represents an action taken by a developer in order to solve a specific bug. Co-collaboration networks were collected at five points in time during a six-month development cycle of the software. We report and discuss results of longitudinal actor-based modeling that we specify to test for the influence of local network structures on developer’s decision to take action on a specific bug. The study controls for bug-specific and developer-specific characteristics that may also affect developers’ decisions exogenously. We also control for priority and severity levels assigned by the team to bugs in an attempt to manage voluntary contribution. 
%B 5th Workshop on Public Data about Software Development (WoPDaSD 2010) %> https://flosshub.org/sites/flosshub.org/files/wopdasd002.pdf %0 Conference Paper %B 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010) %D 2010 %T Mining security changes in FreeBSD %A Mauczka, Andreas %A Schanes, Christian %A Fankhauser, Florian %A Bernhart, Mario %A Grechenig, Thomas %K freebsd %K msr challenge %K security %X Current research on historical project data is rarely touching on the subject of security related information. Learning how security is treated in projects and which parts of a software are historically security relevant or prone to security changes can enhance the security strategy of a software project. We present a mining methodology for security related changes by modifying an existing method of software repository analysis. We use the gathered security changes to find out more about the nature of security in the FreeBSD project and we try to establish a link between the identified security changes and a tracker for security issues (security advisories). We give insights how security is presented in the FreeBSD project and show how the mined data and known security problems are connected. 
%B 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010) %I IEEE %C Cape Town, South Africa %P 90 - 93 %@ 978-1-4244-6802-7 %R 10.1109/MSR.2010.5463289 %0 Conference Paper %B 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010) %D 2010 %T Mining subclassing directives to improve framework reuse %A Bruch, Marcel %A Mezini, Mira %A Monperrus, Martin %K api %K documentation %K eclipse %K frameworks %K jface %K source code %X To help developers in using frameworks, good documentation is crucial. However, it is a challenge to create high quality documentation especially of hotspots in white-box frameworks. This paper presents an approach to documentation of object-oriented white-box frameworks which mines from client code four different kinds of documentation items, which we call subclassing directives. A case study on the Eclipse JFace user-interface framework shows that the approach can improve the state of API documentation w.r.t. subclassing directives. 
%B 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010) %I IEEE %C Cape Town, South Africa %P 141 - 150 %@ 978-1-4244-6802-7 %R 10.1109/MSR.2010.5463347 %> https://flosshub.org/sites/flosshub.org/files/141Mining-Subclassing-Directives-to-Improve-Framework-Reuse.pdf %0 Conference Paper %B Proceedings of the 3rd International Workshop on Emerging Trends in Free/Libre/Open Source Software Research and Development (FLOSS '10) %D 2010 %T The onion has cancer: some social network analysis visualizations of open source project communication %A Oezbek, Christopher %A Prechelt, Lutz %A Thiel, Florian %K argouml %K Bochs %K bugzilla %K communication structure %K Flyspray %K gEDA %K Grub %K MonetDB %K open source process %K request tracker %K Rox %K social network analysis %K Xfce %X Background: People contribute to OSS projects in wildly different degrees, from reporting a single defect once and never coming back to spending many hours each workday on the project over several years - or anything in between. It is a common conception that these degrees of participation sort the participants into a number of similar groups which are layered like the peels of an onion: The onion model. Objective: We check whether this model of gradually different degrees of participation is valid with respect to the participation in OSS project mailing-list traffic. Methods: We perform social network analysis based on replies to mailing-list messages and use visualization to check the nature of three different groups of participants. Results: There appears to be a discontinuity with respect to core members: The degree to which very active core members (as opposed to less active co-developers) react to e-mails of senders from the project's periphery is significantly higher than would be expected from their level of activity in general. 
Limitations: The effect might be an artifact of the assumption that each mailing-list message can be treated the same. Conclusions: We conclude that core member status may be qualitatively (rather than just quantitatively) different and the transition of individual mailing-list participants towards ever higher participation is qualitatively discontinuous. %B Proceedings of the 3rd International Workshop on Emerging Trends in Free/Libre/Open Source Software Research and Development (FLOSS '10) %S FLOSS '10 %I ACM %C New York, NY, USA %P 5–10 %@ 978-1-60558-978-7 %U http://doi.acm.org/10.1145/1833272.1833274 %R 10.1145/1833272.1833274 %> https://flosshub.org/sites/flosshub.org/files/OezThiPre10-SNA.pdf %0 Conference Paper %B Seventh Annual Acquisition Research Symposium, {NPS} Proceedings - %D 2010 %T On Open and Collaborative Software Development in the DoD %A Hissam, S. A. %A Weinstock, C. %A Bass, L. %K collaborative development %K open source software %K reuse %K software engineering %X The US Department of Defense (specifically, but not limited to, the DoD CIO's Clarifying Guidance Regarding Open Source Software, DISA's launch of Forge.mil and OSD's Open Technology Development Roadmap Plan) has called for increased use of open source software and the adoption of best practices from the free/open source software (F/OSS) community to foster greater reuse and innovation between programs in the DoD. In our paper, we examine some key aspects of open and collaborative software development inspired by the success of the F/OSS movement as it might manifest itself within the US DoD. This examination is made from two perspectives: the reuse potential among DoD programs sharing software and the incentives, strategies and policies that will be required to foster a culture of collaboration needed to achieve the benefits indicative of F/OSS. 
Our conclusion is that to achieve predictable and expected reuse, not only are technical infrastructures needed, but also a shift to the business practices in the software development and delivery pattern seen in the traditional acquisition lifecycle is needed. Thus, there is potential to overcome the challenges discussed within this paper to engender a culture of openness and community collaboration to support the DoD mission. %B Seventh Annual Acquisition Research Symposium, {NPS} Proceedings - %I Naval Postgraduate School %C Monterey, California %V 1 %P 219–235 %8 04/2010 %U http://www.acquisitionresearch.net/cms/_files/FY2010/NPS-AM-10-037.pdf %0 Book %B Open Source Software: New Horizons %D 2010 %T Open Source Software Developer and Project Networks %A Madey, G. %A van Antwerp, M. %E Ågerfalk, Pär %E Boldyreff, Cornelia %E González-Barahona, Jesús M. %E Madey, Gregory R. %E Noll, John %K berlios %K savannah %K sourceforge %X This paper outlines complex network concepts and how social networks are built from Open Source Software (OSS) data. We present an initial study of the social networks of three different OSS forges, BerliOS Developer, GNU Savannah, and SourceForge. Much research has been done on snapshot or conflated views of these networks, especially SourceForge, due to the size of the SourceForge community. The degree distribution, connectedness, centrality, and scale-free nature of SourceForge have been presented for the network at particular points in time. However, very little research has been done on how the network grows, how connections were made, especially during its infancy, and how these metrics evolve over time. 
%B Open Source Software: New Horizons %S IFIP Advances in Information and Communication Technology %I Springer Berlin Heidelberg %C Berlin, Heidelberg %V 319 %P 407 - 412 %@ 978-3-642-13244-5 %R 10.1007/978-3-642-13244-5_39 %0 Conference Paper %B 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010) %D 2010 %T Predicting the severity of a reported bug %A Lamkanfi, Ahmed %A Demeyer, Serge %A Giger, Emanuel %A Goethals, Bart %K bug reports %K eclipse %K gnome %K mozilla %K severity %K text mining %X The severity of a reported bug is a critical factor in deciding how soon it needs to be fixed. Unfortunately, while clear guidelines exist on how to assign the severity of a bug, it remains an inherently manual process left to the person reporting the bug. In this paper we investigate whether we can accurately predict the severity of a reported bug by analyzing its textual description using text mining algorithms. Based on three cases drawn from the open-source community (Mozilla, Eclipse and GNOME), we conclude that given a training set of sufficient size (approximately 500 reports per severity), it is possible to predict the severity with a reasonable accuracy (both precision and recall vary between 0.65-0.75 with Mozilla and Eclipse; 0.70-0.85 in the case of GNOME). 
%B 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010) %I IEEE %C Cape Town, South Africa %P 1 - 10 %@ 978-1-4244-6802-7 %R 10.1109/MSR.2010.5463284 %> https://flosshub.org/sites/flosshub.org/files/1lamkanfiDemeyer1.pdf %0 Conference Paper %B 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010) %D 2010 %T Replaying IDE interactions to evaluate and improve change prediction approaches %A Robbes, Romain %A Pollet, Damien %A Lanza, Michele %K cbse %K change based software evolution %K change prediction %K changes %K commit %K cvs %K development history %K eclipseeye %K ide %K mylyn %K spyware %K svn %X Change prediction helps developers by recommending program entities that will have to be changed alongside the entities currently being changed. To evaluate their accuracy, current change prediction approaches use data from versioning systems such as CVS or SVN. These data sources provide a coarse-grained view of the development history that flattens the sequence of changes in a single commit. They are thus not a valid basis for evaluation in the case of development-style prediction, where the order of the predictions has to match the order of the changes a developer makes. We propose a benchmark for the evaluation of change prediction approaches based on fine-grained change data recorded from IDE usage. Moreover, the change prediction approaches themselves can use the more accurate data to fine-tune their prediction. We present an evaluation procedure and use it on several change prediction approaches, both novel and from the literature, and report on the results. 
%B 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010) %I IEEE %C Cape Town, South Africa %P 161 - 170 %@ 978-1-4244-6802-7 %R 10.1109/MSR.2010.5463278 %> https://flosshub.org/sites/flosshub.org/files/161Robbes2010changePrediction.pdf %0 Conference Paper %B Second International Workshop on Building Sustainable Open Source Communities (OSCOMM 2010) %D 2010 %T Responsiveness as a measure for assessing the health of OSS ecosystems %A Gamalielsson, Jonas %A Lundell, Björn %A Lings, Brian %K email %K email archives %K gmane %K mailing lists %K nagios %K response time %K sourceforge %X The health of an Open Source ecosystem is an important decision factor when considering the adoption of Open Source software or when monitoring a seeded Open Source project. In this paper we introduce responsiveness as a qualitative measure of the quality of replies within mailing lists, which can be used for assessing ecosystem health. We consider one specific metric of responsiveness in this paper, and that is the response time of follow-up messages in mailing lists. We also describe a way for characterising the nature of communication in messages with short and long response times. The approach is tested in the context of the Nagios project, and we particularly focus on the responsiveness for contributors acting in their professional roles as core developers. Our contribution is a step towards a deeper understanding of voluntary support provided in mailing lists of OSS projects. %B Second International Workshop on Building Sustainable Open Source Communities (OSCOMM 2010) %8 05/2010 %> https://flosshub.org/sites/flosshub.org/files/osscomm002.pdf %0 Journal Article %J ACM Trans. Softw. Eng. Methodol. 
%D 2010 %T The small-world effect: The influence of macro-level properties of developer collaboration networks on open-source project success %A Param Vir Singh %K collaborative software development %K online community %K open source software development %K productivity %K small world networks %K social networks %K sourceforge %K team formation %X In this study we investigate the impact of community-level networks—relationships that exist among developers in an OSS community—on the productivity of member developers. Specifically, we argue that OSS community networks characterized by small-world properties would positively influence the productivity of the member developers by providing them with speedy and reliable access to more quantity and variety of information and knowledge resources. Specific hypotheses are developed and tested using longitudinal data on a large panel of 4,279 projects from 15 different OSS communities hosted at Sourceforge. Our results suggest that significant variation exists in small-world properties of OSS communities at Sourceforge. After accounting for project, foundry, and time-specific observed and unobserved effects, we found a statistically significant relationship between small-world properties of a community and the technical and commercial success of the software produced by its members. In contrast to the findings of prior research, we also found the lack of a significant relationship between closeness and betweenness centralities of the project teams and their success. These results were robust to a number of controls and model specifications. %B ACM Trans. Softw. Eng. Methodol. 
%I ACM %C New York, NY, USA %V 20 %P 6:1–6:27 %U http://doi.acm.org/10.1145/1824760.1824763 %R 10.1145/1824760.1824763 %0 Thesis %D 2010 %T The Sociability of Free Software: A GNU Look at Free Software Identified Businesses as Social Entrepreneurships %A Barcomb, Ann %K free software %K open source software %K public good %K small business %K social entrepreneurship %K social ventures %X This research strives to address the gap in the literature surrounding companies which identify with the philosophical values associated with the Free Software movement, which have historically been associated with Open Source businesses. It investigates whether ethically-motivated Free Software identified companies resemble social entrepreneurships. This work also examines whether there are significant differences between the business practices of Free Software identified companies, Free Software, and Open Source enterprises in order to assess if it is appropriate to address them as a group. The study is based on seven case studies, and includes one company which is a Free Software business, but does not identify with the Free Software philosophy, as well as one company which is ethically-motivated but identifies with Open Source rather than Free Software. The results indicate that there is good reason to believe that adherence to Free Software philosophy creates socially-aware businesses, which may be social entrepreneurships. No problems were discovered with the practice of grouping together Free Software and Open Source companies in the study of business practices, provided that a broad definition of success is used. 
%I Maastricht University %U http://barcomb.org/cgi/paper.cgi?paper=barcomb:2010:sociability %9 masters %> https://flosshub.org/sites/flosshub.org/files/barcomb-2010-sociability.pdf %0 Conference Paper %B Proceedings of the 2010 Brazilian Symposium on Software Engineering %D 2010 %T A Study of the Relationships between Source Code Metrics and Attractiveness in Free Software Projects %A Meirelles, Paulo %A Santos Jr., Carlos %A Miranda, Joao %A Kon, Fabio %A Terceiro, Antonio %A Chavez, Christina %K source code %K source code analysis %K sourceforge %K user satisfaction %K users %X A significant number of Free Software projects have been widely used and considered successful. However, there is an even larger number of them that cannot overcome the initial steps towards building an active community of users and developers. In this study, we investigated whether there are relationships between source code metrics and attractiveness, i.e., the ability of a project to attract users and developers. To verify these relationships, we analyzed 6,773 Free Software projects from the SourceForge.net repository. The results indicated that attractiveness is indeed correlated to some source code metrics. This suggests that measurable attributes of the project source code somehow affect the decision to contribute to and adopt a Free Software project. The findings described in this paper show that it is relevant for project leaders to monitor source code quality, particularly a few objective metrics, since these can have a positive influence on projects' chances of forming a community of contributors and users around their software, enabling further enhancement in quality.
%B Proceedings of the 2010 Brazilian Symposium on Software Engineering %S SBES '10 %I IEEE Computer Society %C Washington, DC, USA %P 11–20 %@ 978-0-7695-4273-7 %U http://dx.doi.org/10.1109/SBES.2010.27 %R 10.1109/SBES.2010.27 %> https://flosshub.org/sites/flosshub.org/files/sourcecode_attractiveness.pdf %0 Conference Paper %B Second International Workshop on Building Sustainable Open Source Communities (OSCOMM 2010) %D 2010 %T Success and Abandonment in Open Source Commons: Selected Findings from an Empirical Study of Sourceforge.net Projects %A Schweik, C. M. %A English, R. %A Paienjton, Q. %A Haire, S. %K abandonment %K flossmole %K metadata %K project failure %K project success %K sourceforge %K time %X Some open source software collaborations are sustained over long periods of time and across several versions of a software product, while others become abandoned even before the first version of the product has been developed. In this study, we identify factors that might be responsible for one or the other of these collaborative trajectories. We examine 107,747 open source software projects hosted on Sourceforge.net in August 2006 using data available through the FLOSSmole Project. We employ Classification and Regression Tree modeling and Random Forests statistical approaches to begin to establish an understanding of how various project attributes, especially physical and community ones, contribute to project success or abandonment. We find that factors associated with success and abandonment differ for projects in the early stage of development (pre-first release) compared to projects that have had a first release, and that product utility, project vision, leadership, and group-size are associated with success in open source collaborations. 
We also find that successful open source projects exist across all types of software and not simply in areas associated with the open source “movement.” Other evidence suggests that Sourceforge.net may play an important role in “intellectual match-making.” %B Second International Workshop on Building Sustainable Open Source Communities (OSCOMM 2010) %8 05/2010 %> https://flosshub.org/sites/flosshub.org/files/osscomm003.pdf %0 Journal Article %J Information and Software Technology %D 2010 %T Survival analysis on the duration of open source projects %A Samoladas, Ioannis %A Lefteris Angelis %A Ioannis Stamelos %K flossmetrics %K prediction %K source code %K survival analysis %X Context Open source (FLOSS) project survivability is an important piece of information for many open source stakeholders. Coordinators of open source projects would like to know the chances for the survival of the projects they coordinate. Companies are also interested in knowing how viable a project is in order to either participate or invest in it, and volunteers want to contribute to vivid projects. Objective The purpose of this article is the application of survival analysis techniques for estimating the future development of a FLOSS project. Method In order to apply such approach, duration data regarding FLOSS projects from the FLOSSMETRICS (This work was partially supported by the European Community’s Sixth Framework Program under the Contract FP6-033982) database were collected. Such database contains metadata for thousands of FLOSS projects, derived from various forges. Subsequently, survival analysis methods were employed to predict the survivability of the projects, i.e. their probability of continuation in the future, by examining their duration, combined with other project characteristics such as their application domain and number of committers. 
Results It was shown how probability of termination or continuation may be calculated and how a prediction model may be built to upraise project future. In addition, the benefit of adding more committers to FLOSS projects was quantified. Conclusion Analysis results demonstrate the usefulness of the proposed framework for assessing the survival probability of a FLOSS project. %B Information and Software Technology %V 52 %P 902 - 922 %8 9/2010 %N 9 %! Information and Software Technology %R 10.1016/j.infsof.2010.05.001 %0 Journal Article %J Journal of the Association for Information Systems %D 2010 %T Sustainability of Open-Source Projects: A Longitudinal Study %A Chengalur-Smith, I. %A Sidorova, Anna %A Daniel, Sherae L. %K contribution %K developers %K sourceforge %K sustainability %X This paper examines the factors that influence the long-term sustainability of FLOSS projects. A model of project sustainability based on organizational ecology is developed and tested empirically. Data about activity and contribution patterns over the course of five years for 2,772 projects registered with SourceForge is analyzed. Our results suggest that the size of the project’s development base, project age and the size of niche occupied by the project are positively related to the project’s ability to attract user and/or developer resources. The ability to attract resources is an indicator of the perceived project legitimacy, which in turn is a strong predictor of the project’s future sustainability. Thus a project’s ability to attract developer and user resources is shown to play a mediating role between the demographic (size and age) and ecological (niche) characteristics of the project and its future sustainability. Our results support the applicability of tenets of organizational ecology related to the liability of smallness, the liability of newness, and population characteristics (niche size) to the FLOSS development environment.
The implications of the results for future research and practice are discussed. %B Journal of the Association for Information Systems %V 11 %U http://aisel.aisnet.org/jais/vol11/iss11/5/ %N 11 %0 Conference Paper %B Software Reliability Engineering (ISSRE), 2010 IEEE 21st International Symposium on %D 2010 %T Towards a Bayesian approach in modeling the disclosure of unique security faults in open source projects %A Anbalagan, Prasanth %A Vouk, Mladen %K security %X Software security has both an objective and a subjective component. A lot of the information available about that today is focused on security vulnerabilities and their disclosure. It is less frequent that security breaches and failure rates are reported, even in open source projects. Disclosure of security problems can take several forms. A disclosure can be accompanied by a release of the fix for the problem, or not. The latter category can be further divided into “voluntary” and “involuntary” security issues. In widely used software there is also considerable variability in the operational profile under which the software is used. This profile is further modified by attacks on the software that may be triggered by security disclosures. Therefore a comprehensive model of software security qualities of a product needs to incorporate both objective measures, such as security problem disclosure, repair, and failure rates, as well as less objective metrics such as implied variability in the operational profile, influence of attacks, and subjective impressions of exposure and severity of the problems, etc. We show how a classical Bayesian model can be adapted for use in the security context. The model is discussed and assessed using data from three open source software project releases.
Our results show that the model is suitable for use with a certain subset of disclosed security faults, but that additional work will be needed to identify appropriate shape and scaling functions that would accurately reflect end-user perceptions associated with security problems. %B Software Reliability Engineering (ISSRE), 2010 IEEE 21st International Symposium on %I IEEE %P 101–110 %U http://ai2-s2-pdfs.s3.amazonaws.com/edcf/0b13ae1e6317c7e31f6b8783f669b978ffb3.pdf %> https://flosshub.org/sites/flosshub.org/files/0b13ae1e6317c7e31f6b8783f669b978ffb3.pdf %0 Conference Paper %B 5th Workshop on Public Data about Software Development (WoPDaSD 2010) %D 2010 %T Trends That Affect Temporal Analysis Using SourceForge Data %A MacLean, Alexander C. %A Pratt, Landon J. %A Krein, Jonathan L. %A Knutson, Charles D. %K cliff walls %K committers %K cvs %K evolution %K growth %K source code %K sourceforge %K time %K time series %X SourceForge is a valuable source of software artifact data for researchers who study project evolution and developer behavior. However, the data exhibit patterns that may bias temporal analyses. Most notable are cliff walls in project source code repository timelines, which indicate large commits that are out of character for the given project. These cliff walls often hide significant periods of development and developer collaboration—a threat to studies that rely on SourceForge repository data. We demonstrate how to identify these cliff walls, discuss reasons for their appearance, and propose preliminary measures for mitigating their effects in evolution-oriented studies. 
%B 5th Workshop on Public Data about Software Development (WoPDaSD 2010) %> https://flosshub.org/sites/flosshub.org/files/wopdasd001.pdf %0 Journal Article %J Information and Software Technology %D 2010 %T Using the DEMO methodology for modeling open source software development processes %A Huysmans, Philip %A Ven, Kris %A Verelst, Jan %K DEMO %K Enterprise ontology %K open source software %K Software process modeling %X Context Open source software development (OSSD) process modeling has received increasing interest in recent years. These efforts aim to identify common elements in the development process between multiple open source software (OSS) projects. However, the complexity inherent to OSSD process modeling puts significant demands on the modeling language. Objective In this paper, we propose that the Design and Engineering Methodology for Organizations (DEMO) may provide an interesting alternative to develop OSSD process models. DEMO exhibits two unique features within the context of OSSD process modeling. First, DEMO analyzes processes at the ontological level and provides high-level process descriptions, instead of focusing on the implementation level. Second, DEMO studies the communication patterns between human actors, instead of the sequences in which activities are performed. Method We investigate the feasibility of using DEMO to construct OSSD process models by means of a case study. DEMO models were constructed to describe the NetBeans Requirements and Release process. In addition, the quality of these DEMO models was evaluated using a quality framework for conceptual modeling. Results Our results showed that our DEMO models exhibited a high level of abstraction, thereby reducing the complexity of the OSSD process models. In addition, the evaluation of the models developed in this paper by using the quality framework for conceptual modeling showed that the models were of high quality. 
Conclusions We have shown that the DEMO methodology can be successfully used to model OSSD processes and to obtain abstract and high-quality OSSD process models. However, given some potential drawbacks with respect to understandability and implementability, we primarily propose the use of DEMO within OSSD process modeling as an analysis tool that should be complemented with other techniques and models for communication and reenactment purposes. %B Information and Software Technology %V 52 %P 656 - 671 %8 6/2010 %U http://www.sciencedirect.com/science/article/pii/S0950584910000157 %N 6 %! Information and Software Technology %R 10.1016/j.infsof.2010.02.002 %0 Conference Paper %B 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010) %D 2010 %T Validity of network analyses in Open Source Projects %A Nia, Roozbeh %A Christian Bird %A Devanbu, Premkumar %A Filkov, Vladimir %K apache %K email archives %K mailing lists %K missing data %K mysql %K perl %K social networks %X Social network methods are frequently used to analyze networks derived from Open Source Project communication and collaboration data. Such studies typically discover patterns in the information flow between contributors or contributions in these projects. Social network metrics have also been used to predict defect occurrence. However, such studies often ignore or side-step the issue of whether (and in what way) the metrics and networks of study are influenced by inadequate or missing data. In previous studies email archives of OSS projects have provided a useful trace of the communication and co-ordination activities of the participants. These traces have been used to construct social networks that are then subject to various types of analysis. However, during the construction of these networks, some assumptions are made, that may not always hold; this leads to incomplete, and sometimes incorrect networks.
The question then becomes, do these errors affect the validity of the ensuing analysis? In this paper we specifically examine the stability of network metrics in the presence of inadequate and missing data. The issues that we study are: 1) the effect of paths with broken information flow (i.e. consecutive edges which are out of temporal order) on measures of centrality of nodes in the network, and 2) the effect of missing links on such measures. We demonstrate on three different OSS projects that while these issues do change network topology, the metrics used in the analysis are stable with respect to such changes. %B 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010) %I IEEE %C Cape Town, South Africa %P 201 - 209 %@ 978-1-4244-6802-7 %R 10.1109/MSR.2010.5463342 %> https://flosshub.org/sites/flosshub.org/files/201NetworkAnalysis.pdf %0 Book %B IFIP Advances in Information and Communication Technology Open Source Software: New Horizons (OSS 2010) %D 2010 %T Warehousing and Studying Open Source Versioning Metadata %A van Antwerp, M. %A Madey, G. %E Ågerfalk, Pär %E Boldyreff, Cornelia %E González-Barahona, Jesús M. %E Madey, Gregory R. %E Noll, John %K berlios %K cvs %K savannah %K scm %K sourceforge %K srda %K subversion %K svn %X In this paper, we describe the downloading and warehousing of Open Source Software (OSS) versioning metadata from SourceForge, BerliOS Developer, and GNU Savannah. This data enables and supports research in areas such as software engineering, open source phenomena, social network analysis, data mining, and project management. This newly-formed database containing Concurrent Versions System (CVS) and Subversion (SVN) metadata offers new research opportunities for large-scale OSS development analysis.
The CVS and SVN data is juxtaposed with the SourceForge.net Research Data Archive [5] for the purpose of performing more powerful and interesting queries. We also present an initial statistical analysis of some of the most active projects. %B IFIP Advances in Information and Communication Technology Open Source Software: New Horizons (OSS 2010) %I Springer Berlin Heidelberg %C Berlin, Heidelberg %V 319 %P 413 - 418 %@ 978-3-642-13244-5 %R 10.1007/978-3-642-13244-5_40 %0 Journal Article %J International Journal of Open Source Software and Processes %D 2010 %T Weaving a Semantic Web Across OSS Repositories %A Olivier Berger %A Valentin Vlasceanu %A Christian Bac %A Quang Vu Dang %A Lauriere, Stéphane %K archive %K bug %K bugtracker %K database %K debian %K forge %K interoperability %K ontology %K OSLC-CM %K RDF %K repository of repositories %K semantic %K semantic Web %X Several public repositories and archives of “facts” about libre software projects, maintained either by open source communities or by research communities, have been flourishing over the Web in recent years. These have enabled new analysis and support for new quality assurance tasks. This paper presents some complementary existing tools, projects and models proposed both by OSS actors or research initiatives that are likely to lead to useful future developments in terms of study of the FLOSS phenomenon, and also to the very practitioners in the FLOSS development projects. A goal of the research conducted within the HELIOS project is to address bugs traceability issues. In this regard, the authors investigate the potential of using Semantic Web technologies in navigating between many different bugtracker systems scattered all over the open source ecosystem. 
By using Semantic Web techniques, it is possible to interconnect the databases containing data about open-source software projects development, which enables OSS partakers to identify resources, annotate them, and further interlink those using dedicated properties and collectively designing a distributed semantic graph. %B International Journal of Open Source Software and Processes %V 2 %P 29 - 40 %8 32/2010 %N 2 %R 10.4018/jossp.2010040103 %> https://flosshub.org/sites/flosshub.org/files/wopdasd2009-olivier-berger.pdf %0 Journal Article %J IEEE Transactions on Software Engineering %D 2010 %T What Makes a Good Bug Report? %A Zimmermann, Thomas %A Premraj, Rahul %A Bettenburg, Nicolas %A Sascha Just %A Schroter, Adrian %A Weiss, Cathrin %K bug report %K Survey %X In software development, bug reports provide crucial information to developers. However, these reports widely differ in their quality. We conducted a survey among developers and users of APACHE, ECLIPSE, and MOZILLA to find out what makes a good bug report. The analysis of the 466 responses revealed an information mismatch between what developers need and what users supply. Most developers consider steps to reproduce, stack traces, and test cases as helpful, which are at the same time most difficult to provide for users. Such insight is helpful to design new bug tracking tools that guide users at collecting and providing more helpful information. Our CUEZILLA prototype is such a tool and measures the quality of new bug reports; it also recommends which elements should be added to improve the quality. We trained CUEZILLA on a sample of 289 bug reports, rated by developers as part of the survey. In our experiments, CUEZILLA was able to predict the quality of 31–48% of bug reports accurately.
%B IEEE Transactions on Software Engineering %I IEEE Computer Society %C Los Alamitos, CA, USA %V 36 %P 618-643 %U http://dl.acm.org/citation.cfm?id=1453146 %R http://doi.ieeecomputersociety.org/10.1109/TSE.2010.63 %> https://flosshub.org/sites/flosshub.org/files/bettenburg-fse-2008.pdf %0 Conference Paper %B 6th IEEE Working Conference on Mining Software Repositories %D 2009 %T Amassing and indexing a large sample of version control systems: towards the census of public source code history %A Audris Mockus %K bazaar %K cvs %K flossmole %K git %K mercurial %K source code %K sourceforge %K subversion %K version control %X The source code and its history represent the output and process of software development activities and are an invaluable resource for study and improvement of software development practice. While individual projects and groups of projects have been extensively analyzed, some fundamental questions, such as the spread of innovation or genealogy of the source code, can be answered only by considering the entire universe of publicly available source code and its history. We describe methods we developed over the last six years to gather, index, and update an approximation of such a universal repository for publicly accessible version control systems and for the source code inside a large corporation. While challenging, the task is achievable with limited resources. The bottlenecks in network bandwidth, processing, and disk access can be dealt with using inherent parallelism of the tasks and suitable tradeoffs between the amount of storage and computations, but a completely automated discovery of public version control systems may require enticing participation of the sampled projects. Such universal repository would allow studies of global properties and origins of the source code that are not possible through other means.
%B 6th IEEE Working Conference on Mining Software Repositories %8 May 16–17 %G eng %> https://flosshub.org/sites/flosshub.org/files/11amassing.pdf %0 Conference Paper %B 2009 6th IEEE International Working Conference on Mining Software Repositories (MSR) %D 2009 %T Assigning bug reports using a vocabulary-based expertise model of developers %A Matter, Dominique %A Kuhn, Adrian %A Nierstrasz, Oscar %K bug reports %K bugzilla %K develect %K developers %K eclipse %K expertise %K scm %X For popular software systems, the number of daily submitted bug reports is high. Triaging these incoming reports is a time consuming task. Part of the bug triage is the assignment of a report to a developer with the appropriate expertise. In this paper, we present an approach to automatically suggest developers who have the appropriate expertise for handling a bug report. We model developer expertise using the vocabulary found in their source code contributions and compare this vocabulary to the vocabulary of bug reports. We evaluate our approach by comparing the suggested experts to the persons who eventually worked on the bug. Using eight years of Eclipse development as a case study, we achieve 33.6% top-1 precision and 71.0% top-10 recall.
%B 2009 6th IEEE International Working Conference on Mining Software Repositories (MSR) %I IEEE %C Vancouver, BC, Canada %P 131 - 140 %@ 978-1-4244-3493-0 %R 10.1109/MSR.2009.5069491 %> https://flosshub.org/sites/flosshub.org/files/131AssigningBugReports.pdf %0 Conference Paper %B 2009 6th IEEE International Working Conference on Mining Software Repositories (MSR) %D 2009 %T Automatic labeling of software components and their evolution using log-likelihood ratio of word frequencies in source code %A Kuhn, Adrian %K frequency %K hapax %K information retrieval %K java %K junit %K keywords %K labeling %K source code %X As more and more open-source software components become available on the Internet we need automatic ways to label and compare them. For example, a developer who searches for reusable software must be able to quickly gain an understanding of retrieved components. This understanding cannot be gained at the level of source code due to the semantic gap between source code and the domain model. In this paper we present a lexical approach that uses the log-likelihood ratios of word frequencies to automatically provide labels for software components. We present a prototype implementation of our labeling/comparison algorithm and provide examples of its application. In particular, we apply the approach to detect trends in the evolution of a software system.
%B 2009 6th IEEE International Working Conference on Mining Software Repositories (MSR) %I IEEE %C Vancouver, BC, Canada %P 175 - 178 %@ 978-1-4244-3493-0 %R 10.1109/MSR.2009.5069499 %> https://flosshub.org/sites/flosshub.org/files/175AutomaticLabeling.pdf %0 Journal Article %J International Journal of Open Source Software and Processes %D 2009 %T Bridging the Gap between Agile and Free Software Approaches %A Paul J. Adams %A Capiluppi, Andrea %K agile %K kde %K lines of code %K loc %K plone %K productivity %K scm %K scrum %K sprints %K subversion %X Agile sprints are short events where a small team collocates in order to work on particular aspects of the overall project for a short period of time. Sprinting is a process that has been observed also in Free Software projects: these two paradigms, sharing common principles and values have shown several commonalities of practice. This article evaluates the impact of sprinting on a Free Software project through the analysis of code repository logs: sprints from two Free Software projects (Plone and KDE PIM) are assessed and two hypotheses are formulated: do sprints increase productivity? Are Free Software projects more productive after sprints compared with before? The primary contribution of this article is to show how sprinting creates a large increase in productivity both during the event, and immediately after the event itself: this argues for more in-depth studies focussing on the nature of sprinting. %B International Journal of Open Source Software and Processes %V 1 %P 58 - 71 %8 31/2009 %N 1 %R 10.4018/jossp.2009010104 %0 Conference Paper %B 2009 6th IEEE International Working Conference on Mining Software Repositories (MSR) %D 2009 %T Code siblings: Technical and legal implications of copying code between applications %A Daniel M.
German %A Di Penta, Massimiliano %A Gueheneuc, Yann-Gael %A Antoniol, Giuliano %K bsd %K fossology %K freebsd %K linux %K openbsd %K source code %X Source code cloning does not happen within a single system only. It can also occur between one system and another. We use the term code sibling to refer to a code clone that evolves in a different system than the code from which it originates. Code siblings can only occur when the source code copyright owner allows it and when the conditions imposed by such license are not incompatible with the license of the destination system. In some situations copying of source code fragments is allowed - legally - in one direction, but not in the other. In this paper, we use clone detection, license mining and classification, and change history techniques to understand how code siblings - under different licenses - flow in one direction or the other between Linux and two BSD Unixes, FreeBSD and OpenBSD. Our results show that, in most cases, this migration appears to happen according to the terms of the license of the original code being copied, favoring always copying from less restrictive licenses towards more restrictive ones. We also discovered that sometimes code is inserted to the kernels from an outside source. %B 2009 6th IEEE International Working Conference on Mining Software Repositories (MSR) %I IEEE %C Vancouver, BC, Canada %P 81 - 90 %@ 978-1-4244-3493-0 %R 10.1109/MSR.2009.5069483 %> https://flosshub.org/sites/flosshub.org/files/81CodeSiblings.pdf %0 Journal Article %J The R Journal %D 2009 %T Collaborative Software Development Using R-Forge %A Stefan Theußl %A Achim Zeileis %K forge %K R %K scm %K source code repositories %K statistics %X Open source software (OSS) is typically created in a decentralized self-organizing process by a community of developers having the same or similar interests (see the famous essay by Raymond, 1999).
A key factor for the success of OSS over the last two decades is the Internet: Developers who rarely meet face-to-face can employ new means of communication, both for rapidly writing and deploying software (in the spirit of Linus Torvalds’ “release early, release often” paradigm). Therefore, many tools emerged that assist a collaborative software development process, including in particular tools for source code management (SCM) and version control. In the R world, SCM is not a new idea; in fact, the R Development Core Team has always been using SCM tools for the R sources, first by means of Concurrent Versions System (CVS, see Cederqvist et al., 2006), and then via Subversion (SVN, see Pilato et al., 2004). A central repository is hosted by ETH Zürich mainly for managing the development of the base R system. Mailing lists like R-help, R-devel and many others are currently the main communication channels in the R community. First, we present the core features that R-Forge offers to the R community. Second, we give a hands-on tutorial on how users and developers can get started with R-Forge. In particular, we illustrate how people can register, set up new projects, use R-Forge’s SCM facilities, provide their packages on R-Forge, host a project-specific website, and how package maintainers submit a package to the Comprehensive R Archive Network (CRAN, http://CRAN.R-project.org/). Finally, we summarize recent developments and give a brief outlook to future work. %B The R Journal %V 1 %P 9-14 %8 05/2009 %> https://flosshub.org/sites/flosshub.org/files/rjournal.pdf %0 Conference Paper %B 2009 42nd Hawaii International Conference on System Sciences (HICSS 2009) %D 2009 %T The Commit Size Distribution of Open Source Software %A Arafat, O.
%A Dirk Riehle %K commits %K configuration management %K history %K lines of code %K sloc %K source code %X With the growing economic importance of open source, we need to improve our understanding of how open source software development processes work. The analysis of code contributions to open source projects is an important part of such research. In this paper we analyze the size of code contributions to more than 9,000 open source projects. We review the total distribution and distinguish three categories of code contributions using a size-based heuristic: single focused commits, aggregate team contributions, and repository refactorings. We find that both the overall distribution and the individual categories follow a power law. We also suggest that distinguishing these commit categories by size will benefit future analyses. %B 2009 42nd Hawaii International Conference on System Sciences (HICSS 2009) %I IEEE %C Waikoloa, Hawaii, USA %P 1 - 8 %@ 978-0-7695-3450-3 %R 10.1109/HICSS.2009.421 %> https://flosshub.org/sites/flosshub.org/files/07-07-07.pdf %0 Journal Article %J International Journal of Intelligent Control and Systems %D 2009 %T Competition and production of digital public goods %A Radtke, Nicholas P. %A Janssen, Marco A. %K digital public goods %K FLOSS %K open source software %K sourceforge %K success metrics %K wikipedia %X With the Internet has come the phenomenon of people volunteering to work on digital public goods such as open source software and online encyclopedia articles. Presumably, the success of individual public goods has an effect on attracting volunteers. However, the definition of success is ill-defined. This paper explores the impact of different success metrics on a simple public goods model. The findings show that the different success metrics considered do have an impact on the behavior of the model, with the largest differences being between consumer-oriented and producer-oriented metrics. 
This indicates that many proposed success metrics may be mapped into one of these two categories and within a category, all success metrics measure the same phenomenon. We argue that the characteristics of producer-oriented metrics more closely match real world phenomena, indicating that public goods are driven by producer, and not consumer, interests. %B International Journal of Intelligent Control and Systems %V 14 %P 77-86 %U http://www.public.asu.edu/~majansse/pubs/ijics2009.pdf %& 77 %> https://flosshub.org/sites/flosshub.org/files/ijics2009.pdf %0 Journal Article %J Decis. Support Syst. %D 2009 %T Determinants of open source software project success: A longitudinal study %A Subramaniam, Chandrasekar %A Sen, Ravi %A Nelson, Matthew L. %K contributors %K developers %K licenses %K longitudinal study %K Open source project %K OSS %K project success %K restrictive %K Software project success %X In this paper, we investigate open source software (OSS) success using longitudinal data on OSS projects. We find that restrictive OSS licenses have an adverse impact on OSS success. On further analysis, restrictive OSS license is found to be negatively associated with developer interest, but is positively associated with the interest of non-developer users and project administrators. We also show that developer and non-developer interest in the OSS project and the project activity levels in any time period significantly affect the project success measures in subsequent time period. The implications of our findings for OSS research and practice are discussed. %B Decis. Support Syst. %I Elsevier Science Publishers B. V. 
%C Amsterdam, The Netherlands %V 46 %P 576–585 %8 January %U http://portal.acm.org/citation.cfm?id=1480545.1480824 %R 10.1016/j.dss.2008.10.005 %0 Journal Article %J 2009 42nd Hawaii International Conference on System Sciences (HICSS 2009) %D 2009 %T Easier Said than Done: An Empirical Investigation of Software Design and Quality in Open Source Software Development %A Conley, Caryn A. %A Lee Sproull %K modularity %K quality %K source code %K sourceforge %X We empirically examine the relationship between software design modularity and software quality in open source software (OSS) development projects. Conventional wisdom suggests that degree of software modularity affects software quality. An analysis of 203 software releases in 46 OSS projects hosted on SourceForge.net lends support for a more complex relationship between software modularity and software quality than conventional wisdom suggests. We find that software modularity is associated with reduced software complexity, an increased number of static software bugs, and a mixed relationship with the percentage of bugs closed. We do not find empirical evidence supporting any relationship between modularity and other measures of customer satisfaction. In addition to empirically testing the relationship between modularity and quality, we introduce new measures of software modularity and software quality. Implications are developed for the theory of modularity and the practice of software development.
%B 2009 42nd Hawaii International Conference on System Sciences (HICSS 2009) %I IEEE Computer Society %C Los Alamitos, CA, USA %P 1-10 %@ 978-0-7695-3450-3 %R http://doi.ieeecomputersociety.org/10.1109/HICSS.2009.687 %> https://flosshub.org/sites/flosshub.org/files/09-14-05.pdf %0 Journal Article %J 2009 42nd Hawaii International Conference on System Sciences (HICSS 2009) %D 2009 %T Evaluating Longitudinal Success of Open Source Software Projects: A Social Network Perspective %A Jing Wu %A Khim Yong Goh %K bug tracking system %K communication %K project success %K social network analysis %K sourceforge %X To date, numerous open source projects are hosted on many online repositories. While some of these projects are active and thriving, some projects are either languishing or showing no development activities at all. This phenomenon thus begs the important question of what are the influential factors that affect the success of open source projects. In a quest to deepen our understanding of the evolution of open source projects, this research aims to analyze the success of open source projects by using the theoretical lens of social network analysis. Based on extensive analyses of data collected from online repositories, we study the impact of the communication patterns of software development teams on the demand and supply outcomes of these projects, while accounting for project-specific characteristics. Using panel data analysis of data over 13 months, we find significant impacts of communication patterns on project outcomes over the long term. 
%B 2009 42nd Hawaii International Conference on System Sciences (HICSS 2009) %I IEEE Computer Society %C Los Alamitos, CA, USA %P 1-10 %@ 978-0-7695-3450-3 %R http://doi.ieeecomputersociety.org/10.1109/HICSS.2009.713 %> https://flosshub.org/sites/flosshub.org/files/09-02-12.pdf %0 Conference Paper %B 2009 6th IEEE International Working Conference on Mining Software Repositories (MSR) %D 2009 %T Evolution of the core team of developers in libre software projects %A Gregorio Robles %A Jesus M. Gonzalez-Barahona %A Herraiz, Israel %K core %K cvs %K cvsanaly %K developers %K evolution %K gimp %K scm %X In many libre (free, open source) software projects, most of the development is performed by a relatively small number of persons, the "core team". The stability and permanence of this group of most active developers is of great importance for the evolution and sustainability of the project. In this position paper we propose a quantitative methodology to study the evolution of core teams by analyzing information from source code management repositories. The most active developers in different periods are identified, and their activity is calculated over time, looking for core team evolution patterns. %B 2009 6th IEEE International Working Conference on Mining Software Repositories (MSR) %I IEEE %C Vancouver, BC, Canada %P 167 - 170 %@ 978-1-4244-3493-0 %R 10.1109/MSR.2009.5069497 %> https://flosshub.org/sites/flosshub.org/files/167core-evolution.pdf %0 Conference Paper %B 2009 42nd Hawaii International Conference on System Sciences (HICSS 2009) %D 2009 %T An Exploratory Study on the Two New Trends in Open Source Software: End-Users and Service %A Namjoo Choi %A Chengalur-Smith, I.
%K developers %K intended audiences %K sourceforge %X Many have been envisaging the emergence of Open Source Software (OSS) for general end-users and the enhancements in providing services and support, as the most critical factors for OSS success, and at the same time, the most critical issues which are holding back the OSS movement. While these two distinct waves in OSS evolution have become more observable, researchers have not yet explored the characteristics of these two distinct new waves. The current study found evidence for these two waves and further explored the two waves by empirically examining two hundred projects hosted in Sourceforge.net. We compared the characteristics of OSS projects that are intended for two disparate audiences: developers and end-users and found that projects for end-users supported more languages but also had more restrictive licenses as compared to projects for developers. %B 2009 42nd Hawaii International Conference on System Sciences (HICSS 2009) %I IEEE %C Waikoloa, Hawaii, USA %P 1 - 10 %@ 978-0-7695-3450-3 %R 10.1109/HICSS.2009.63 %> https://flosshub.org/sites/flosshub.org/files/07-07-05.pdf %0 Conference Paper %B 4th Workshop on Public Data about Software Development (WoPDaSD 2009) %D 2009 %T Flat for the few, steep for the many: Structural cohesion as a measure of hierarchy in FLOSS communities %A Guido Conaldi %K case study %K email %K email archives %K epiphany %K gnome %K mailing list %K social network analysis %X A discrepancy exists between the emphasis posed by practitioners on decentralized and non-hierarchical communication in Free/Libre Open Source Software (FLOSS) communities and empirical evidence of their hierarchical structure. In order to explain this apparent paradox it is here hypothesized that in FLOSS communities local sub-groups exist and are less hierarchical, more decentralized than the whole social network to which they belong. 
A measure of structural cohesion based on network node connectivity is proposed as an effective method to test whether FLOSS communication networks can be decomposed in nested hierarchies of progressively less centralized sub-groups. Preliminary results from a case study that are consistent with the hypothesis are presented and discussed. %B 4th Workshop on Public Data about Software Development (WoPDaSD 2009) %8 2009 %> https://flosshub.org/sites/flosshub.org/files/guido-conaldi-flat-for-the-few.pdf %0 Conference Paper %B 2009 6th IEEE International Working Conference on Mining Software Repositories (MSR) %D 2009 %T From work to word: How do software developers describe their work? %A Maalej, Walid %A Happel, Hans-Jorg %K apache %K developers %K diaries %K eureka %K mycomp %K scm %K work management system %X Developers take notes about their work sessions, either to remember the work status and share it with collaborators, or because employers explicitly require this for project management matters. We report on an exploratory study which aims at understanding how software developers describe their work. We analyzed more than 750,000 work descriptions of about 2,000 professionals taken over 8 years in three settings. We observed several similarities in the content and time meta-data of work descriptions. Most frequent terms, such as top-30 performed activities, are used consistently. Particular templates such as “ACTION concerning ARTIFACT because of CAUSE” occur frequently. Developers described sessions that last 30-120 min. 4-16 times a day. Maintaining diaries seems to consume between 3-6% of the total work time, and in 10% of the sessions, developers did not describe their work in sufficient detail. We argue that our results make the first step towards automatically generating work diaries for software developers.
%B 2009 6th IEEE International Working Conference on Mining Software Repositories (MSR) %I IEEE %C Vancouver, BC, Canada %P 121 - 130 %@ 978-1-4244-3493-0 %R 10.1109/MSR.2009.5069490 %0 Journal Article %J Journal of Systems and Software %D 2009 %T Identifying exogenous drivers and evolutionary stages in FLOSS projects %A Karl Beecher %A Capiluppi, Andrea %A Boldyreff, Cornelia %K developers %K forge %K forges %K repositories %K repository %K scm %K software repositories %K sourceforge %K success %K users %X The success of a Free/Libre/Open Source Software (FLOSS) project has been evaluated in the past through the number of commits made to its configuration management system, number of developers and number of users. Most studies, based on a popular FLOSS repository (SourceForge), have concluded that the vast majority of projects are failures. This study's empirical results confirm and expand conclusions from an earlier and more limited work. Not only do projects from different repositories display different process and product characteristics, but a more general pattern can be observed. Projects may be considered as early inceptors in highly visible repositories, or as established projects within desktop-wide projects, or finally as structured parts of FLOSS distributions. These three possibilities are formalized into a framework of transitions between repositories. The framework developed here provides a wider context in which results from FLOSS repository mining can be more effectively presented. Researchers can draw different conclusions based on the overall characteristics studied about an Open Source software project's potential for success, depending on the repository that they mine. These results also provide guidance to OSS developers when choosing where to host their project and how to distribute it to maximize its evolutionary success.
%B Journal of Systems and Software %V 82 %P 739 - 750 %U http://www.sciencedirect.com/science/article/B6V0N-4TVTJFS-1/2/e32ecee1bcb54bd4a5dff6d5e3daca8d %R DOI: 10.1016/j.jss.2008.10.026 %0 Conference Paper %B 4th Workshop on Public Data about Software Development (WoPDaSD 2009) %D 2009 %T Language entropy: A metric for characterization of author programming language distribution %A Krein, Jonathan L. %A MacLean, Alexander C. %A Delorey, Daniel P. %A Knutson, Charles D. %A Eggett, Dennis L. %K contributions %K developers %K language entropy %K lines of code %K loc %K multiple languages %K programming languages %K sourceforge %X Programmers are often required to develop in multiple languages. In an effort to study the effects of programming language fragmentation on productivity—and ultimately on a programmer’s problem solving abilities—we propose a metric, language entropy, for characterizing the distribution of an individual’s development efforts across multiple programming languages. To evaluate this metric, we present an observational study examining all project contributions (through August 2006) of a random sample of 500 SourceForge developers. Using a random coefficients model, we found a statistically significant correlation (alpha level of 0.05) between language entropy and the size of monthly project contributions (measured in lines of code added). Our results indicate that language entropy is a good candidate for characterizing author programming language distribution.
%B 4th Workshop on Public Data about Software Development (WoPDaSD 2009) %8 2009 %> https://flosshub.org/sites/flosshub.org/files/LanguageEntropy-JonathanKrein.pdf %0 Conference Paper %B 2009 6th IEEE International Working Conference on Mining Software Repositories (MSR) %D 2009 %T Learning from defect removals %A Ayewah, Nathaniel %A Pugh, William %K bug fixing %K bugzilla %K change management %K cherry %K cvs %K eclipse %K groovy %K launching %K source code %K svn %K text editor %X Recent research has tried to identify changes in source code repositories that fix bugs by linking these changes to reports in issue tracking systems. These changes have been traced back to the point in time when they were previously modified as a way of identifying bug introducing changes. But we observe that not all changes linked to bug tracking systems are fixing bugs; some are enhancing the code. Furthermore, not all fixes are applied at the point in the code where the bug was originally introduced. We flesh out these observations with a manual review of several software projects, and use this opportunity to see how many defects are in the scope of static analysis tools.
%B 2009 6th IEEE International Working Conference on Mining Software Repositories (MSR) %I IEEE %C Vancouver, BC, Canada %P 179 - 182 %@ 978-1-4244-3493-0 %R 10.1109/MSR.2009.5069500 %> https://flosshub.org/sites/flosshub.org/files/179LearnFromDefects-MSR09.pdf %0 Conference Paper %B 2009 6th IEEE International Working Conference on Mining Software Repositories (MSR) %D 2009 %T On mining data across software repositories %A Anbalagan, Prasanth %A Vouk, Mladen %K bug reports %K bugzilla %K Fedora %K Firefox %K htmlscraper %K integration %K launchpad %K national vulnerability database %K RedHat %K Suse %K tracker %K Ubuntu %X Software repositories provide abundance of valuable information about open source projects. With the increase in the size of the data maintained by the repositories, automated extraction of such data from individual repositories, as well as of linked information across repositories, has become a necessity. In this paper we describe a framework that uses web scraping to automatically mine repositories and link information across repositories. We discuss two implementations of the framework. In the first implementation, we automatically identify and collect security problem reports from project repositories that deploy the Bugzilla bug tracker using related vulnerability information from the National Vulnerability Database. In the second, we collect security problem reports for projects that deploy the Launchpad bug tracker along with related vulnerability information from the National Vulnerability Database. We have evaluated our tool on various releases of Fedora, Ubuntu, Suse, RedHat, and Firefox projects. The percentage of security bugs identified using our tool is consistent with that reported by other researchers.
%B 2009 6th IEEE International Working Conference on Mining Software Repositories (MSR) %I IEEE %C Vancouver, BC, Canada %P 171 - 174 %@ 978-1-4244-3493-0 %R 10.1109/MSR.2009.5069498 %> https://flosshub.org/sites/flosshub.org/files/171MiningAcrossmsr09.pdf %0 Conference Paper %B 2009 6th IEEE International Working Conference on Mining Software Repositories (MSR) %D 2009 %T Mining search topics from a code search engine usage log %A Bajracharya, Sushil %A Lopes, Cristina %K analysis %K black duck %K koders %K log %K logfile %K search %K source code %X We present a topic modeling analysis of a year long usage log of Koders, one of the major commercial code search engines. This analysis contributes to the understanding of what users of code search engines are looking for. Observations on the prevalence of these topics among the users, and on how search and download activities vary across topics, lead to the conclusion that users who find code search engines usable are those who already know to a high level of specificity what to look for. This paper presents a general categorization of these topics that provides insights on the different ways code search engine users express their queries. The findings support the conclusion that existing code search engines provide only a subset of the various information needs of the users when compared to the categories of queries they look at.
%B 2009 6th IEEE International Working Conference on Mining Software Repositories (MSR) %I IEEE %C Vancouver, BC, Canada %P 111 - 120 %@ 978-1-4244-3493-0 %R 10.1109/MSR.2009.5069489 %0 Conference Paper %B 2009 6th IEEE International Working Conference on Mining Software Repositories (MSR) %D 2009 %T Mining source code to automatically split identifiers for software analysis %A Enslen, Eric %A Hill, Emily %A Pollock, Lori %A Vijay-Shanker, K. %K java %K samurai %K sourceforge %X Automated software engineering tools (e.g., program search, concern location, code reuse, quality assessment, etc.) increasingly rely on natural language information from comments and identifiers in code. The first step in analyzing words from identifiers requires splitting identifiers into their constituent words. Unlike natural languages, where space and punctuation are used to delineate words, identifiers cannot contain spaces. One common way to split identifiers is to follow programming language naming conventions. For example, Java programmers often use camel case, where words are delineated by uppercase letters or non-alphabetic characters. However, programmers also create identifiers by concatenating sequences of words together with no discernible delineation, which poses challenges to automatic identifier splitting. In this paper, we present an algorithm to automatically split identifiers into sequences of words by mining word frequencies in source code. With these word frequencies, our identifier splitter uses a scoring technique to automatically select the most appropriate partitioning for an identifier. In an evaluation of over 8000 identifiers from open source Java programs, our Samurai approach outperforms the existing state of the art techniques.
%B 2009 6th IEEE International Working Conference on Mining Software Repositories (MSR) %I IEEE %C Vancouver, BC, Canada %P 71 - 80 %@ 978-1-4244-3493-0 %R 10.1109/MSR.2009.5069482 %> https://flosshub.org/sites/flosshub.org/files/71EnslenandHillandPollockandVijayShanker.pdf %0 Conference Paper %B 2009 6th IEEE International Working Conference on Mining Software Repositories (MSR) %D 2009 %T Mining the coherence of GNOME bug reports with statistical topic models %A Linstead, Erik %A Baldi, Pierre %K bug reports %K bugzilla %K gnome %K msr challenge %K quality %K sourcerer %X We adapt latent Dirichlet allocation to the problem of mining bug reports in order to define a new information-theoretic measure of coherence. We then apply our technique to a snapshot of the GNOME Bugzilla database consisting of 431,863 bug reports for multiple software projects. In addition to providing an unsupervised means for modeling report content, our results indicate substantial promise in applying statistical text mining algorithms for estimating bug report quality. Complete results are available from our supplementary materials Web site at http://sourcerer.ics.uci.edu/msr2009/gnome_coherence.html. %B 2009 6th IEEE International Working Conference on Mining Software Repositories (MSR) %I IEEE %C Vancouver, BC, Canada %P 99 - 102 %@ 978-1-4244-3493-0 %R 10.1109/MSR.2009.5069486 %0 Journal Article %J Research Policy %D 2009 %T Monetary donations to an open source software platform %A Sandeep Krishnamurthy %A Tripathi, Arvind K.
%K Collective action %K Donation %K Identification %K incentives %K metadata %K MOTIVATION %K Open source software platform %K projects %K Reciprocity %K Relational commitment %K sourceforge %X Online open source software platforms, such as Sourceforge.net, play a vital role in creating an ecosystem that enables the creation and growth of open source projects. However, there is little research exploring the interactions between open source stakeholders and the platform. We believe that the sustainability of the platform crucially depends on financial incentives. While platforms can obtain these incentives through multiple means, in this paper we focus on one form of financial incentives—voluntary monetary donations by open source community members. We report findings from two empirical studies that examine factors that impact donations. Study 1 investigates the factors that cause some community members to donate and not others. We find that the decision to donate is impacted by relational commitment with open source software platform, donation to projects and accepting donations from others. Study 2 examines what drives the level of donation. We find that the length of association with the platform and relational commitment affects donation levels. %B Research Policy %V 38 %P 404 - 414 %8 03/2009 %N 2 %! Research Policy %R 10.1016/j.respol.2008.11.004 %0 Conference Paper %B 2009 42nd Hawaii International Conference on System Sciences (HICSS 2009) %D 2009 %T Multiple Social Networks Analysis of FLOSS Projects using Sargas %A de Sousa, S.F. %A Balieiro, M.A. %A dos R. Costa, J.M. %A de Souza, C.R.B. %K case study %K multiple social networks %K ossnetwork %K pmd %K social network analysis %K transflow %X Due to their characteristics and claimed advantages, several researchers have been investigating free and open-source projects. 
Different aspects are being studied: for instance, what motivates developers to join FLOSS projects, the tools, processes and practices used in FLOSS projects, the evolution of FLOSS communities among other things. Researchers have studied collaboration and coordination of open source software developers using an approach known as social network analysis and have gained important insights about these projects. Most researchers, however, have not focused on the integrated study of these networks and, accordingly, in their interrelationships. This paper describes an approach and tool to combine multiple social networks to study the evolution of open-source projects. Our tool, named Sargas, allows comparison and visualization of different social networks at the same time. Initial results of our analysis can be used to extend the "onion-model" of open source participation. %B 2009 42nd Hawaii International Conference on System Sciences (HICSS 2009) %I IEEE %C Waikoloa, Hawaii, USA %P 1 - 10 %@ 978-0-7695-3450-3 %R 10.1109/HICSS.2009.316 %> https://flosshub.org/sites/flosshub.org/files/07-07-06.pdf %0 Conference Paper %B Proceedings of the 6th International Working Conference on Mining Software Repositories, MSR 2009 %D 2009 %T The promises and perils of mining git %A Christian Bird %A Peter C. Rigby %A Earl T. Barr %A David J. Hamilton %A Daniel M. Germán %A Premkumar T. Devanbu %K dscm %K git %K mining %K scm %K source code %X We are now witnessing the rapid growth of decentralized source code management (DSCM) systems, in which every developer has her own repository. DSCMs facilitate a style of collaboration in which work output can flow sideways (and privately) between collaborators, rather than always up and down (and publicly) via a central repository. Decentralization comes with both the promise of new data and the peril of its misinterpretation. We focus on git, a very popular DSCM used in high-profile projects. 
Decentralization, and other features of git, such as automatically recorded contributor attribution, lead to richer content histories, giving rise to new questions such as "How do contributions flow between developers to the official project repository?" However, there are pitfalls. Commits may be reordered, deleted, or edited as they move between repositories. The semantics of terms common to SCMs and DSCMs sometimes differ markedly, potentially creating confusion. For example, a commit is immediately visible to all developers in centralized SCMs, but not in DSCMs. Our goal is to help researchers interested in DSCMs avoid these and other perils when mining and analyzing git data. %B Proceedings of the 6th International Working Conference on Mining Software Repositories, MSR 2009 %P 1-10 %> https://flosshub.org/sites/flosshub.org/files/1promisePeril.pdf %0 Conference Paper %B Proceedings of the 17th International Symposium on Software Reliability Engineering %D 2009 %T Putting it All Together: Using Socio-Technical Networks to Predict Failures %A Christian Bird %A Nachiappan Nagappan %A Devanbu, Premkumar %A Gall, Harald %A Brendan Murphy %K eclipse %K microsoft %K social network %K vista %K windows %X Studies have shown that social factors in development organizations have a dramatic effect on software quality. Separately, program dependency information has also been used successfully to predict which software components are more fault prone. Interestingly, the influence of these two phenomena has only been studied separately. Intuition and practical experience suggest, however, that task assignment (i.e. who worked on which components and how much) and dependency structure (which components have dependencies on others) together interact to influence the quality of the resulting software. We study the influence of combined socio-technical software networks on the fault-proneness of individual software components within a system.
The network properties of a software component in this combined network are able to predict if an entity is failure prone with greater accuracy than prior methods which use dependency or contribution information in isolation. We evaluate our approach in different settings by using it on Windows Vista and across six releases of the Eclipse development environment including using models built from one release to predict failure prone components in the next release. We compare this to previous work. In every case, our method performs as well or better and is able to more accurately identify those software components that have more post-release failures, with precision and recall rates as high as 85%. %B Proceedings of the 17th International Symposium on Software Reliability Engineering %> https://flosshub.org/sites/flosshub.org/files/bird2009pat.pdf %0 Journal Article %J Electronic Notes in Theoretical Computer Science %D 2009 %T Quality Factors and Coding Standards - a Comparison Between Open Source Forges %A Capiluppi, Andrea %A Boldyreff, Cornelia %A Karl Beecher %A Paul J. Adams %K artefacts %K artifacts %K coding standards %K coding style %K complexity %K forge %K forges %K kde %K metrics %K quality %K source code %K sourceforge %X Enforcing adherence to standards in software development in order to produce high quality software artefacts has long been recognised as best practice in traditional software engineering. In a distributed heterogeneous development environment such as those found within the Open Source paradigm, coding standards are informally shared and adhered to by communities of loosely coupled developers. Following these standards could potentially lead to higher quality software. This paper reports on the empirical analysis of two major forges where OSS projects are hosted.
The first one, the KDE forge, provides a set of guidelines and coding standards in the form of a coding style that developers may conform to when producing the code source artefacts. The second studied forge, SourceForge, imposes no formal coding standards on developers. A sample of projects from these two forges has been analysed to detect whether the SourceForge sample, where no coding standards are reinforced, has a lower quality than the sample from KDE. Results from this analysis form a complex picture; visually, all the selected metrics show a clear divide between the two forges, but from the statistical standpoint, clear distinctions cannot be drawn amongst these quality related measures in the two forge samples. %B Electronic Notes in Theoretical Computer Science %V 233 %P 89 - 103 %U http://www.sciencedirect.com/science/article/B75H1-4VXDKRV-7/2/abcc2be2c4c3998e4bc9b53473ca2d81 %R DOI: 10.1016/j.entcs.2009.02.063 %0 Journal Article %J Journal of Evolutionary Economics %D 2009 %T Returns from social capital in open source software networks %A Méndez-Durón, Rebeca %A García, Clara E. %K contributors %K developers %K games %K gpl %K project success %K roles %K social capital %K social network analysis %K social networks %K sourceforge %K srda %K teams %X Open Source Software projects base their operation on a collaborative structure for knowledge exchange in the form of provision or reception of information, expertise, and feedback on the creation of source code. Here, we address the direction of these knowledge flows among projects throughout social networks and their impact on project success. We identify the roles of membership or contribution that individuals play within projects. We found that connections through contributors who bring their knowledge to the project, improve project success, and that connection through members, who transfer their knowledge towards other projects, enhance project success. 
Finally, we found that ties through shared membership and contributions hamper project success. The analysis of knowledge flows and their impact on project success imply a translation of returns from investment in social capital, where investment takes the shape of knowledge flows and the returns mean the projects' diffusion over the network. %B Journal of Evolutionary Economics %V 19 %P 277 - 295 %8 4/2009 %N 2 %! J Evol Econ %R 10.1007/s00191-008-0125-5 %> https://flosshub.org/sites/flosshub.org/files/Mendez-DuronGarcia.pdf %0 Journal Article %J AMCIS 2009 Proceedings %D 2009 %T Security of Open Source and Closed Source Software: An Empirical Comparison of Published Vulnerabilities %A Schryen, Guido %K closed source software %K empirical comparison %K open source software %K security %K Vulnerabilities %X Reviewing literature on open source and closed source security reveals that the discussion is often determined by biased attitudes toward one of these development styles. The discussion specifically lacks appropriate metrics, methodology and hard data. This paper contributes to solving this problem by analyzing and comparing published vulnerabilities of eight open source software and nine closed source software packages, all of which are widely deployed. Thereby, it provides an extensive empirical analysis of vulnerabilities in terms of mean time between vulnerability disclosures, the development of disclosure over time, and the severity of vulnerabilities, and allows for validating models provided in the literature. 
The investigation reveals that (a) the mean time between vulnerability disclosures was lower for open source software in half of the cases, while the other cases show no differences, (b) in contrast to literature assumption, 14 out of 17 software packages showed a significant linear or piecewise linear correlation between time and the number of published vulnerabilities, and (c) regarding the severity of vulnerabilities, no significant differences were found between open source and closed source. %B AMCIS 2009 Proceedings %P 387 %U http://epub.uni-regensburg.de/21296/1/Schryen_-_AMCIS_09_-_Security_of_open_source_and_closed_source_software_-_Web_version.pdf %> https://flosshub.org/sites/flosshub.org/files/Schryen_-_AMCIS_09_-_Security_of_open_source_and_closed_source_software_-_Web_version.pdf %0 Conference Paper %B 2009 6th IEEE International Working Conference on Mining Software Repositories (MSR) %D 2009 %T SourcererDB: An aggregated repository of statically analyzed and cross-linked open source Java projects %A Ossher, Joel %A Bajracharya, Sushil %A Linstead, Erik %A Baldi, Pierre %A Lopes, Cristina %K apache %K java %K java.net %K source code %K sourceforge %K sourcerer %X The open source movement has made vast quantities of source code available online for free, providing an extremely large dataset for empirical study and potential re-use. A major difficulty in exploiting this potential fully is that the data are currently scattered between competing source code repositories, none of which are structured for empirical analysis and cross-project comparison. As a result, software researchers and developers are left to compile their own datasets, resulting in duplicated effort and limited results. To address this challenge, we built SourcererDB, an aggregated repository of statically analyzed and cross-linked open source Java projects. SourcererDB contains local snapshots of 2,852 Java projects taken from Sourceforge, Apache and Java.net. 
These projects are statically analyzed to extract rich structural information, which is then stored in a relational database. References to entities in the 16,058 external jars are resolved and grouped, allowing for cross-project usage information to be accessed easily. This paper describes: (a) the mechanism for resolving and grouping these cross-project references, (b) the structure of and the metamodel for the SourcererDB repository, and (c) end-user dataset access mechanisms. Our goal in building SourcererDB is to provide a rich dataset of source code to facilitate the sharing of extracted data and to encourage reuse and repeatability of experiments. %B 2009 6th IEEE International Working Conference on Mining Software Repositories (MSR) %I IEEE %C Vancouver, BC, Canada %P 183 - 186 %@ 978-1-4244-3493-0 %R 10.1109/MSR.2009.5069501 %0 Conference Paper %B 2009 6th IEEE International Working Conference on Mining Software Repositories (MSR) %D 2009 %T SourcererDB: An aggregated repository of statically analyzed and cross-linked open source Java projects %A Ossher, Joel %A Bajracharya, Sushil %A Linstead, Erik %A Baldi, Pierre %A Lopes, Cristina %K apache %K integration %K java %K java.net %K project %K repository %K sourceforge %K SourcererDB %X The open source movement has made vast quantities of source code available online for free, providing an extremely large dataset for empirical study and potential reuse. A major difficulty in exploiting this potential fully is that the data are currently scattered between competing source code repositories, none of which are structured for empirical analysis and cross-project comparison. As a result, software researchers and developers are left to compile their own datasets, resulting in duplicated effort and limited results. 
To address this challenge, we built SourcererDB, an aggregated repository of statically analyzed and cross-linked open source Java projects. SourcererDB contains local snapshots of 2,852 Java projects taken from Sourceforge, Apache and Java.net. These projects are statically analyzed to extract rich structural information, which is then stored in a relational database. References to entities in the 16,058 external jars are resolved and grouped, allowing for cross-project usage information to be accessed easily. This paper describes: (a) the mechanism for resolving and grouping these cross-project references, (b) the structure of and the metamodel for the SourcererDB repository, and (c) end-user dataset access mechanisms. Our goal in building SourcererDB is to provide a rich dataset of source code to facilitate the sharing of extracted data and to encourage reuse and repeatability of experiments. %B 2009 6th IEEE International Working Conference on Mining Software Repositories (MSR) %I IEEE %C Vancouver, BC, Canada %P 183 - 186 %@ 978-1-4244-3493-0 %R 10.1109/MSR.2009.5069501 %0 Conference Paper %B International Conference on Intelligent User Interfaces %D 2009 %T Tagsplanations: Explaining Recommendations using Tags %A Vig, J. %A Sen, S. %A Riedl, J. %K recommender %K SYSTEMS %K tagging %B International Conference on Intelligent User Interfaces %C Sanibel Island, FL %8 02/08/2009 %G eng %0 Journal Article %J International Journal of Open Source Software and Processes %D 2009 %T Tools for the Study of the Usual Data Sources found in Libre Software Projects %A Gregorio Robles %A González-Barahona, Jesús M. %A Izquierdo-Cortazar, Daniel %A Herraiz, Israel %K bug tracking systems %K data sources %K mailing lists %K scm %K tools %X Due to the open nature of Free/Libre/Open Source software projects, researchers have gained access to a rich set of development-related information. 
Although this information is publicly available on the Internet, obtaining and analyzing it in a convenient way is not an easy task and many considerations have to be taken into account. In this paper we present the most important data sources that can be found in libre software projects and that are studied by the research community: source code, source code management systems, mailing lists and bug tracking systems. We will give advice for the problems that can be found when retrieving and preparing the data sources for a posterior analysis, as well as provide information about the tools that support these tasks. %B International Journal of Open Source Software and Processes %V 1 %P 24 - 45 %8 31/2009 %N 1 %R 10.4018/jossp.2009010102 %> https://flosshub.org/sites/flosshub.org/files/robles.pdf %0 Conference Paper %B Proceedings of the 27th international conference on Human factors in computing systems %D 2009 %T Understanding how and why open source contributors use diagrams in the development of Ubuntu %A Yatani, Koji %A Chung, Eunyoung %A Jensen, Carlos %A Truong, Khai N. %K developers %K diagramming %K interviews %K open source software (oss) %K software development %K Ubuntu %K visual representation %X Some of the most interesting differences between Open Source Software (OSS) development and commercial co-located software development lie in the communication and collaboration practices of these two groups of developers. One interesting practice is that of diagramming. Though well studied and important in many aspects of co-located software development (including communication and collaboration among developers), its role in OSS development has not been thoroughly studied. In this paper, we report our investigation on how and why Ubuntu contributors use diagrams in their work. Our study shows that diagrams are not actively used in many scenarios where they commonly would in co-located software development efforts. 
We describe differences in the use and practices of diagramming, their possible reasons, and present design considerations for potential systems aimed at better supporting diagram use in OSS development. %B Proceedings of the 27th international conference on Human factors in computing systems %S CHI '09 %I ACM %C New York, NY, USA %P 995–1004 %@ 978-1-60558-246-7 %U http://doi.acm.org/10.1145/1518701.1518853 %R http://doi.acm.org/10.1145/1518701.1518853 %0 Conference Paper %B 2009 6th IEEE International Working Conference on Mining Software Repositories (MSR) %D 2009 %T Using association rules to study the co-evolution of production & test code %A Lubsen, Zeeger %A Zaidman, Andy %A Pinzger, Martin %K association rules %K checkstyle %K source code %K unit test %X Unit tests are generally acknowledged as an important aid to produce high quality code, as they provide quick feedback to developers on the correctness of their code. In order to achieve high quality, well-maintained tests are needed. Ideally, tests co-evolve with the production code to test changes as soon as possible. In this paper, we explore an approach based on association rule mining to determine whether production and test code co-evolve synchronously. Through two case studies, one with an open source and another one with an industrial software system, we show that our association rule mining approach allows one to assess the co-evolution of product and test code in a software project and, moreover, to uncover the distribution of programmer effort over pure coding, pure testing, or a more test-driven-like practice. 
%B 2009 6th IEEE International Working Conference on Mining Software Repositories (MSR) %I IEEE %C Vancouver, BC, Canada %P 151 - 154 %@ 978-1-4244-3493-0 %R 10.1109/MSR.2009.5069493 %> https://flosshub.org/sites/flosshub.org/files/151UsingAssociation.pdf %0 Journal Article %J 2009 42nd Hawaii International Conference on System Sciences (HICSS 2009) %D 2009 %T Using Software Archaeology to Measure Knowledge Loss in Software Projects Due to Developer Turnover %A Izquierdo-Cortazar, Daniel %A Gregorio Robles %A Ortega, Felipe %A Jesus M. Gonzalez-Barahona %K attrition %K case study %K developers %K evince %K evolution %K gimp %K growth %K knowledge collaboration %K lines of code %K nautilus %K quality %K sloc %K turnover %X Developer turnover can result in a major problem when developing software. When senior developers abandon a software project, they leave a knowledge gap that has to be managed. In addition, new (junior) developers require some time in order to achieve the desired level of productivity. In this paper, we present a methodology to measure the effect of knowledge loss due to developer turnover in software projects. For a given software project, we measure the quantity of code that has been authored by developers that do not belong to the current development team, which we define as orphaned code. Besides, we study how orphaned code is managed by the project. Our methodology is based on the concept of software archaeology, a derivation of software evolution. As case studies we have selected four FLOSS (free, libre, open source software) projects, from purely driven by volunteers to company-supported. The application of our methodology to these case studies will give insight into the turnover that these projects suffer and how they have managed it and shows that this methodology is worth being augmented in future research. 
%B 2009 42nd Hawaii International Conference on System Sciences (HICSS 2009) %I IEEE Computer Society %C Los Alamitos, CA, USA %P 1-10 %@ 978-0-7695-3450-3 %R http://doi.ieeecomputersociety.org/10.1109/HICSS.2009.1014 %> https://flosshub.org/sites/flosshub.org/files/07-07-08.pdf %0 Journal Article %J Information & Management %D 2009 %T Virtual organizational learning in open source software development projects %A Yoris A. Au %A Darrell Carpenter %A Xiaogang Chen %A Jan G. Clark %K bug fixing %K bugs %K learning %K Project performance %K sourceforge %K team size %K teams %K virtual organization %X We studied virtual organizational learning in open source software (OSS) development projects. Specifically, our research focused on learning effects of OSS projects and the factors that affect the learning process. The number and percentage of resolved bugs and bug resolution time of 118 SourceForge.net OSS projects were used to measure the learning effects. Projects were characterized by project type, number and experience of developers, number of bugs, and bug resolution time. Our results provided evidence of virtual organizational learning in OSS development projects and support for several factors as determinants of performance. Team size was a significant predictor, with mid-sized project teams functioning best. Teams of three to seven developers exhibited the highest efficiency over time and teams of eight to 15 produced the lowest mean time for bug resolution. Increasing the percentage of bugs assigned to specific developers or boosting developer participation in other OSS projects also improved performance. Furthermore, project type introduced variability in project team performance. 
%B Information & Management %V 46 %P 9 - 15 %U http://www.sciencedirect.com/science/article/B6VD0-4V1D7NT-1/2/a3bbf7652c674f753398160b8f05f6e9 %R DOI: 10.1016/j.im.2008.09.004 %0 Conference Paper %B 2009 6th IEEE International Working Conference on Mining Software Repositories (MSR) %D 2009 %T Visualizing Gnome with the Small Project Observatory %A Lungu, Mircea %A Malnati, Jacopo %A Lanza, Michele %K bugzilla %K contributions %K gnome %K msr challenge %K spo %K visualization %X We analyzed the gnome family of systems with the small project observatory, our online ecosystem visualization platform. We begin by briefly introducing the model of SPO. We then observe and discuss several phases in the activity of the gnome ecosystem. We follow and look at how the contributors are distributed between writing source code and doing other activities such as internationalization. We end with a visual overview of the activity of more than 900 contributors in the 10 years of existence of gnome. %B 2009 6th IEEE International Working Conference on Mining Software Repositories (MSR) %I IEEE %C Vancouver, BC, Canada %P 103 - 106 %@ 978-1-4244-3493-0 %R 10.1109/MSR.2009.5069487 %> https://flosshub.org/sites/flosshub.org/files/103Lung2009a.pdf %0 Journal Article %J Information & Management %D 2009 %T Volunteers' involvement in online community based software development %A Bo Xu %A Donald R. Jones %A Bingjia Shao %K age %K developers %K effectiveness %K function points %K ideology %K leadership %K MOTIVATION %K scm %K sourceforge %K status %K Survey %K team size %K Volunteers %X We sought to gain understanding of voluntary developers' involvement in open source software (OSS) projects. Data were collected from voluntary developers working on open source projects. 
Our findings indicated that a voluntary developer's involvement was very important to his or her performance and that involvement was dependent on individual motivations (personal software needs, reputation and skills gaining expectation, enjoyment in open source coding) and project community factors (leadership effectiveness, interpersonal relationship, community ideology). Our work contributes theoretically and empirically to the body of OSS research and has practical implications for OSS project management. %B Information & Management %V 46 %P 151 - 158 %U http://www.sciencedirect.com/science/article/B6VD0-4VP1CN0-1/2/8e1c7be4fcedd1419209c5c843ffa923 %R DOI: 10.1016/j.im.2008.12.005 %0 Conference Paper %B 4th Workshop on Public Data about Software Development (WoPDaSD 2009) %D 2009 %T Weaving a Semantic Web across OSS repositories: a spotlight on bts-link, UDD, SWIM %A Olivier Berger %A Valentin Vlasceanu %A Christian Bac %A Laurière, Stéphane %K bts-link %K bug tracker %K bugzilla %K debian %K ecosystem %K helios %K mandriva %K semantic Web %K swim %K udd %X Several public repositories and archives of facts about libre software projects, developed either by open source communities or by research communities, have been flourishing over the Web in the recent years. These enable new analysis and support new quality assurance tasks. By using Semantic Web techniques, the databases containing data about open-source software projects development can be interconnected, hence letting OSS partakers identify resources, annotate them and further interlink them using dedicated properties, collectively designing a distributed semantic graph. Such links expressed with standard Semantic techniques are paving the way to new applications (including ones meant for “end-users”). For instance this may have an impact on the way research efforts are conducted (less fragmented), and could also be used by development communities to improve Quality Assurance tasks. 
A goal of the research conducted within the HELIOS project, is to address bugtracker synchronization issues. For that, the potential of using Semantic Web technologies in navigating between many different bugtracker systems scattered all over the open source ecosystem is being investigated. This position paper presents some existing tools, projects and models proposed by OSS actors that are complementary to research initiatives, and that are likely to lead to useful future developments: UDD (Ultimate Debian Database) and bts-link, developed by the Debian community, and SWIM (Semantic Web enabled Issue Manager) developed by Mandriva. The HELIOS team welcomes comments on the future paths that can be considered in using the Semantic Web approach for improving these projects. %B 4th Workshop on Public Data about Software Development (WoPDaSD 2009) %> https://flosshub.org/sites/flosshub.org/files/HELIOS-WOPDASD-improved-Olivier.pdf %0 Journal Article %J International Journal of Open Source Software and Processes %D 2009 %T What Makes Free/Libre Open Source Software (FLOSS) Projects Successful? An Agent-Based Model of FLOSS Projects %A Radtke, Nicholas P. %A Janssen, Marco A. %A Collofello, James S. %K Agent-Based Model %K Emergent Properties %K FLOSS %K open source %K Prediction Success %K Simulation %X The last few years have seen a rapid increase in the number of Free/Libre Open Source Software (FLOSS) projects. Some of these projects, such as Linux and the Apache web server, have become phenomenally successful. However, for every successful FLOSS project there are dozens of FLOSS projects which never succeed. These projects fail to attract developers and/or consumers and, as a result, never get off the ground. The aim of this research is to better understand why some FLOSS projects flourish while others wither and die. 
This article presents a simple agent-based model that is calibrated on key patterns of data from SourceForge, the largest online site hosting open source projects. The calibrated model provides insight into the conditions necessary for FLOSS success and might be used for scenario analysis of future developments of FLOSS. %B International Journal of Open Source Software and Processes %V 1 %P 1 - 13 %U http://www.public.asu.edu/~majansse/pubs/ijossp09.pdf %N 2 %R 10.4018/jossp.2009040101 %> https://flosshub.org/sites/flosshub.org/files/ijossp09.pdf %0 Conference Paper %B 3rd Workshop on Public Data about Software Development (WoPDaSD 2008) %D 2008 %T Advances in the Sourceforge Research Data Archive %A Matthew Van Antwerp %A Madey, Greg %K forge %K forges %K repositories %K repository %K sourceforge %K srda %X The SourceForge Research Data Archive (SRDA), located at http://zerlot.cse.nd.edu, is a collection of Open Source Software (OSS) data and resources [6]. Over 100 researchers worldwide use the archive for research in many fields. In this paper, we describe the recent changes, the work in progress, and future plans for making the archive easier to use and for allowing more advanced research to be done with the data available. 
%B 3rd Workshop on Public Data about Software Development (WoPDaSD 2008) %P 25-29 %8 2009 %> https://flosshub.org/sites/flosshub.org/files/srda2008.pdf %0 Journal Article %J Information Economics and Policy %D 2008 %T The allocation of collaborative efforts in open-source software %A den Besten, Matthijs %A Jean-Michel Dalle %A Galia, Fabrice %K age %K apache %K complexity %K cvs %K division of labor %K functions %K gaim %K gcc %K ghostscript %K lines of code %K loc %K log files %K mozilla %K netbsd %K openssh %K postgresql %K python %K revision control %K scm %K size %K source code %K Stigmergy %K version control %X The article investigates the allocation of collaborative efforts among core developers (maintainers) of open-source software by analyzing on-line development traces (logs) for a set of 10 large projects. Specifically, we investigate whether the division of labor within open-source projects is influenced by characteristics of software code. We suggest that the collaboration among maintainers tends to be influenced by different measures of code complexity. We interpret these findings by providing preliminary evidence that the organization of open-source software development would self-adapt to characteristics of the code base, in a 'stigmergic' manner. %B Information Economics and Policy %V 20 %P 316 - 322 %U http://www.sciencedirect.com/science/article/B6V8J-4SSG4PN-1/2/88b3824c30a31c18929d8a5ca6d64f62 %R DOI: 10.1016/j.infoecopol.2008.06.003 %0 Conference Paper %B Proceedings of the 2008 international working conference on Mining software repositories %D 2008 %T AMAP: automatically mining abbreviation expansions in programs to enhance software maintenance tools %A Hill, Emily %A Fry, Zachary P. %A Boyd, Haley %A Sridhara, Giriprasad %A Novikova, Yana %A Pollock, Lori %A Vijay-Shanker, K. 
%K automatic abbreviation expansion %K azureus %K itext.net %K liferay %K maintenance %K natural language %K openoffice.org %K program comprehension %K source code %K tiger envelopes %K tools %X When writing software, developers often employ abbreviations in identifier names. In fact, some abbreviations may never occur with the expanded word, or occur more often in the code. However, most existing program comprehension and search tools do little to address the problem of abbreviations, and therefore may miss meaningful pieces of code or relationships between software artifacts. In this paper, we present an automated approach to mining abbreviation expansions from source code to enhance software maintenance tools that utilize natural language information. Our scoped approach uses contextual information at the method, program, and general software level to automatically select the most appropriate expansion for a given abbreviation. We evaluated our approach on a set of 250 potential abbreviations and found that our scoped approach provides a 57% improvement in accuracy over the current state of the art. %B Proceedings of the 2008 international working conference on Mining software repositories %S MSR '08 %I ACM %C New York, NY, USA %P 79–88 %8 05/2008 %@ 978-1-60558-024-1 %U http://doi.acm.org/10.1145/1370750.1370771 %R http://doi.acm.org/10.1145/1370750.1370771 %> https://flosshub.org/sites/flosshub.org/files/p79-hill.pdf %0 Conference Paper %B Proceedings of the 2008 international working conference on Mining software repositories %D 2008 %T Analyzing the evolution of eclipse plugins %A Wermelinger, Michel %A Yu, Yijun %K architectural evolution %K cvs %K eclipse %K metadata %K msr challenge %K releases %K source code %X Eclipse is a good example of a modern component-based complex system that is designed for long-term evolution, due to its architecture of reusable and extensible components. 
This paper presents our preliminary results about the evolution of Eclipse's architecture, based on a lightweight and scalable analysis of the metadata in Eclipse's sources. We find that the development of Eclipse follows a systematic process: most architectural changes take place in milestones, and maintenance releases only make exceptional changes to component dependencies. We also found a stable architectural core that remains since the first release. %B Proceedings of the 2008 international working conference on Mining software repositories %S MSR '08 %I ACM %C New York, NY, USA %P 133–136 %@ 978-1-60558-024-1 %U http://doi.acm.org/10.1145/1370750.1370783 %R http://doi.acm.org/10.1145/1370750.1370783 %0 Conference Paper %B 3rd Workshop on Public Data about Software Development (WoPDaSD 2008) %D 2008 %T Are FLOSS developers committing to CVS/SVN as much as they are talking in mailing lists? Challenges for Integrating data from Multiple Repositories %A Sowe, Sulayman K. %A Samoladas, Ioannis %A Ioannis Stamelos %A Lefteris Angelis %K cvs %K cvsanaly %K developers %K email %K email archives %K flossmetrics %K mailing list %K mlstats %K source code %X This paper puts forward a framework for investigating Free and Open Source Software (F/OSS) developers' activities in both source code and mailing lists repositories. We used data dumps of fourteen projects from the FLOSSMetrics (FM) retrieval system. Our intentions are (i) to present a possible methodology, its advantages and disadvantages which can benefit future researchers using some aspects of the FM retrieval system’s data dumps, and (ii) discuss our initial research results on the contributions developers make to both coding and lists activities. 
%B 3rd Workshop on Public Data about Software Development (WoPDaSD 2008) %P 49-54 %8 09/2008 %> https://flosshub.org/sites/flosshub.org/files/49-542008.pdf %0 Conference Paper %B 3rd Workshop on Public Data about Software Development (WoPDaSD 2008) %D 2008 %T Author Entropy: A Metric for Characterization of Software Authorship Patterns %A Taylor, Quinn C. %A Stevenson, James E. %A Delorey, Daniel P. %A Knutson, Charles D. %K developers %K entropy %K flossmole %K sourceforge %X We propose the concept of author entropy and describe how file-level entropy measures may be used to understand and characterize authorship patterns within individual files, as well as across an entire project. As a proof of concept, we compute author entropy for 28,955 files from 33 open-source projects. We explore patterns of author entropy, identify techniques for visualizing author entropy, and propose avenues for further study. %B 3rd Workshop on Public Data about Software Development (WoPDaSD 2008) %P 42-47 %8 2008 %> https://flosshub.org/sites/flosshub.org/files/entropy2008.pdf %0 Conference Paper %B Proceedings of the 2008 international workshop on Mining software repositories - MSR '08 %D 2008 %T Branching and merging in the repository %A Spacco, Jamie %A Williams, Chadd C. %Y Hassan, Ahmed E. %Y Lanza, Michele %Y Godfrey, Michael W. %K argouml %K changes %K cvs2svn %K diffj %K revision %K scm %K source code %K version control %X Two of the most complex operations version control software allows a user to perform are branching and merging. Branching provides the user the ability to create a copy of the source code to allow changes to be stored in version control but outside of the trunk. Merging provides the user the ability to copy changes from a branch to the trunk. Performing a merge can be a tedious operation and one that may be error prone. 
In this paper, we compare file revisions found on branches with those found on the trunk to determine when a change that is applied to a branch is moved to the trunk. This will allow us to study how developers use merges and to determine if merges are in fact more error prone than other commits. %B Proceedings of the 2008 international workshop on Mining software repositories - MSR '08 %I ACM Press %C New York, New York, USA %P 19-22 %8 05/2008 %@ 9781605580241 %! MSR '08 %R 10.1145/1370750.1370754 %> https://flosshub.org/sites/flosshub.org/files/p19-williams.pdf %0 Journal Article %J Journal of Database Management %D 2008 %T Bug Fixing Practices within Free/Libre Open Source Software Development Teams %A Kevin Crowston %A Barbara Scozzi %K activity %K bug tracker %K bug tracking system %K coordination %K downloads %K dynapi %K effectiveness %K FLOSS %K gaim %K kicq %K phpmyadmin %K project success %K size %K status %X Free/libre open source software (FLOSS, e.g., Linux or Apache) is primarily developed by distributed teams. Developers contribute from around the world and coordinate their activity almost exclusively by means of email and bulletin boards, yet somehow profit from the advantages and evade the challenges of distributed software development. In this article we investigate the structure and the coordination practices adopted by development teams during the bug-fixing process, which is considered one of the main areas of FLOSS project success. In particular, based on a codification of the messages recorded in the bug tracking system of four projects, we identify the accomplished tasks, the adopted coordination mechanisms, and the role undertaken by both the FLOSS development team and the FLOSS community. We conclude with suggestions for further research. 
%B Journal of Database Management %V 19 %P 1–30 %> https://flosshub.org/sites/flosshub.org/files/CrowstonScozziJDBM2008.pdf %0 Conference Paper %B 3rd Workshop on Public Data about Software Development (WoPDaSD 2008) %D 2008 %T Collecting data from distributed FOSS projects %A Fagerholm, Fabian %A Taina, Juha %K bitkeeper %K bug tracking system %K cvs %K distributed %K email archive %K fork rate %K git %K life cycle %K linux %K linux kernel %K mailing list %K merge rate %K subversion %K svn %K version control %X A key trait of Free and Open Source Software (foss) development is its distributed nature. Nevertheless, two project-level operations, the fork and the merge of program code, are among the least well understood events in the lifespan of a foss project. Some projects have explicitly adopted these operations as the primary means of concurrent development. In this study, we examine the effect of highly distributed software development, as found in the Linux kernel project, on collection and modelling of software development data. We find that distributed development calls for sophisticated temporal modelling techniques where several versions of the source code tree can exist at once. Attention must be turned towards the methods of quality assurance and peer review that projects employ to manage these parallel source trees. Our analysis indicates that two new metrics, fork rate and merge rate, could be useful for determining the role of distributed version control systems in foss projects. The study presents a preliminary data set consisting of version control and mailing list data. 
%B 3rd Workshop on Public Data about Software Development (WoPDaSD 2008) %P 8-13 %8 2009 %> https://flosshub.org/sites/flosshub.org/files/fagerholm.pdf %0 Conference Paper %B 3rd Workshop on Public Data about Software Development (WoPDaSD 2008) %D 2008 %T Cross-repository data linking with RDF and OWL %A Howison, James %K data integration %K flossmole %K forges %K integration %K owl %K RDF %K repositories %K semantic %K semantic Web %K sparql %K srda %X This paper provides an approach to the problem of integrating data from multiple research repositories for FLOSS data. It introduces semantic web technologies (RDF, OWL, OWL-DL reasoners and SPARQL) to argue that these are useful for building shared research infrastructure. The paper illustrates its point by describing parts of an ontology developed for the integration and analysis of project communications drawn from FLOSSmole, the Notre Dame archive and direct collection of data. RDF vocabularies provide a way to agree on things we agree about as well as a way to be clearer about ways in which we disagree. %B 3rd Workshop on Public Data about Software Development (WoPDaSD 2008) %P 15-22 %8 2009 %> https://flosshub.org/sites/flosshub.org/files/howison2008.pdf %0 Conference Paper %B Proceedings of the 2008 international workshop on Mining software repositories - MSR '08 %D 2008 %T Determinism and evolution %A González-Barahona, Jesús M. %A Gregorio Robles %A Herraiz, Israel %Y Hassan, Ahmed E. %Y Lanza, Michele %Y Godfrey, Michael W. %K changes %K evolution %K source code %K sourceforge %X It has been proposed that software evolution follows a Self-Organized Criticality (SOC) dynamics. This fact is supported by the presence of long range correlations in the time series of the number of changes made to the source code over time. Those long range correlations imply that the current state of the project was determined time ago. In other words, the evolution of the software project is governed by a sort of determinism. 
But this idea seems to contradict intuition. To explore this apparent contradiction, we have performed an empirical study on a sample of 3,821 libre (free, open source) software projects, finding that their evolution is short range correlated. This suggests that the dynamics of software evolution may not be SOC, and therefore that the past of a project does not determine its future except for relatively short periods of time, at least for libre software. %B Proceedings of the 2008 international workshop on Mining software repositories - MSR '08 %I ACM Press %C New York, New York, USA %P 1-9 %8 05/2008 %@ 9781605580241 %! MSR '08 %R 10.1145/1370750.1370752 %> https://flosshub.org/sites/flosshub.org/files/p1-herraiz.pdf %0 Journal Article %J Industrial and Corporate Change %D 2008 %T Dynamics of innovation in an "open source" collaboration environment: lurking, laboring, and launching FLOSS projects on SourceForge %A David, P. A. %A Rullani, F. %K contributors %K core %K developers %K roles %K SFnetDataset %K sourceforge %K users %K virtual communities %K virtual organization %K virtual organizations %X A systems analysis perspective is adopted to examine the critical properties of the Free/Libre/Open Source Software (FLOSS) mode of innovation, as reflected on the SourceForge platform (SF.net). This approach re-scales March's (1991) framework and applies it to characterize the “innovation system” of a “distributed organization” of interacting agents in a virtual collaboration environment, rather than to innovation within a firm. March (1991) views the process of innovation at the organizational level as the coupling of sub-processes of exploration and exploitation. 
Correspondingly, the innovation system of the virtual collaboration environment represented by SF.net is an emergent property of two “coupled” processes: one involves the interactions among agents searching the locale for information and knowledge resources to use in designing novel software products (i.e., exploration), and the other involves the mobilization of individuals’ capabilities for application in the software development projects that become established on the platform (i.e., exploitation). The micro-dynamics of this system are studied empirically by constructing transition probability matrices representing the movements of 222,835 SF.net users among seven different activity states, which range from “lurking” (not contributing or contributing to projects without becoming a member) to “laboring” (joining one or more projects as members), and to “launching” (founding one or more projects) within each successive 6-month interval. The estimated probabilities are found to form first-order Markov chains describing ergodic processes. This makes possible the computation of the equilibrium distribution of agents among the states, thereby suppressing transient effects and revealing persisting patterns of project joining and project launching. The latter show the FLOSS innovation process on SF.net to be highly dissipative: a very large proportion of the registered “developers” fail to become even minimally active on the platform. There is nevertheless an active core of mobile project joiners, and a (still smaller) core of project founders who persist in creating new projects. The structure of these groups’ interactions (as displayed within the 3-year period examined) is investigated in detail, and it is shown that it would be sufficient to sustain both the exploration and exploitation phases of the platform's global dynamics. %B Industrial and Corporate Change %V 17 %P 647 - 710 %8 07/2008 %N 4 %! 
Industrial and Corporate Change %R 10.1093/icc/dtn026 %0 Journal Article %J Information Economics and Policy (Empirical Issues in Open Source Software) %D 2008 %T Effort modeling and programmer participation in open source software projects %A Koch, Stefan %K cvs %K developers %K email %K email archives %K gnome %K lines of code %K scm %K Software repository mining %K source code %K sourceforge %X This paper develops models for programmer participation and effort estimation in open source software projects and employs the results to assess the efficiency of open source software creation. Successful development of such models will be important for decision makers of various kinds. We propose hypotheses based on a prior case study on manpower function and effort modeling. A large data set retrieved from a project repository is used to test these hypotheses. The main results are that if Norden-Rayleigh-based approaches are used, they need to be complemented in order to account for the addition of new features during a product life cycle, and that programmer-participation based effort models result in distinctly lower estimations of effort than those based on output metrics, such as lines of code. %B Information Economics and Policy (Empirical Issues in Open Source Software) %V 20 %P 345 - 355 %8 12/2008 %U http://www.sciencedirect.com/science/article/B6V8J-4SSND1J-1/2/c857fa1493e19aa7fe4297dedb077b3a %R DOI: 10.1016/j.infoecopol.2008.06.004 %> https://flosshub.org/sites/flosshub.org/files/KochEffortModeling.pdf %0 Journal Article %J IEEE Transactions on Software Engineering %D 2008 %T An Empirical Study on the Relationship Between Software Design Quality, Development Effort and Governance in Open Source Projects %A Capra, E. %A Francalanci, C. %A Merlo, F. %K effort estimation %K governance %K quality %K source code %X The relationship among software design quality, development effort, and governance practices is a traditional research problem. 
However, the extent to which consolidated results on this relationship remain valid for open source (OS) projects is an open research problem. An emerging body of literature contrasts the view of open source as an alternative to proprietary software and explains that there exists a continuum between closed and open source projects. This paper hypothesizes that as projects approach the OS end of the continuum, governance becomes less formal. In turn, a less formal governance is hypothesized to require a higher-quality code as a means to facilitate coordination among developers by making the structure of code explicit and facilitate quality by removing the pressure of deadlines from contributors. However, a less formal governance is also hypothesized to increase development effort due to a more cumbersome coordination overhead. The verification of research hypotheses is based on empirical data from a sample of 75 major OS projects. Empirical evidence supports our hypotheses and suggests that software quality, mainly measured as coupling and inheritance, does not increase development effort, but represents an important managerial variable to implement the more open governance approach that characterizes OS projects which, in turn, increases development effort. %B IEEE Transactions on Software Engineering %V 34 %P 765 - 782 %8 11/2008 %N 6 %! IEEE Trans. Software Eng. %R 10.1109/TSE.2008.68 %0 Journal Article %J Information Management & Computer Security %D 2008 %T Evaluating the performance of open source software projects using data envelopment analysis %A Wray, Barry %A Mathieu, Richard %A Teets, J. %K dea %K efficiency %K Project performance %K sourceforge %X Purpose – The purpose of this paper is to develop and test a model of the relative performance of open source software (OSS) projects. 
Design/methodology/approach – This paper evaluates the relative performance of OSS projects by evaluating multiple project inputs and multiple project outputs by using a data envelopment analysis (DEA) model. The DEA model produces an efficiency score for each project based on project inputs and outputs. The method of producing an efficiency score is based on the convex envelopment technology structure. The efficiency measure quantifies a “distance” to an efficient frontier. Findings – The DEA model produced an index of corresponding intensities linking an inefficient project to its benchmark efficient project(s). The inefficiency measures produced an ordering of inefficient projects. Eight projects were found to be “efficient” and used as benchmarking projects. Research limitations/implications – This research is limited to only security-based OSS projects. Future research on other areas of OSS projects is warranted. Practical implications – The result of this research is a practical model that can be used by OSS project developers to evaluate the relative performance of their projects and make resource decisions. Originality/value – This research extends the work of previous studies that have examined the relative performance of software development projects in a traditional development environment. As a result of this research, OSS projects can now be adequately benchmarked and evaluated according to project performance. An OSS project manager can effectively use these results to critically evaluate resources for their project and judge the relative efficiency of the resources. %B Information Management & Computer Security %V 16 %P 449 - 462 %8 2008 %N 5 %! 
Information Management & Computer Security %R 10.1108/09685220810920530 %0 Conference Paper %B Electronic Notes in Theoretical Computer Science %D 2008 %T Evaluating the Quality of Open Source Software %A Diomidis Spinellis %A Gousios, Georgios %A Vassilios Karakoidas %A Panagiotis Louridas %A Paul J. 
Adams %A Samoladas, Ioannis %A Ioannis Stamelos %K bug tracking system %K email %K email archives %K mailing list %K metrics %K open source %K process quality attributes %K product quality attributes %K source code %K SQO-OSS %K wiki %X Traditionally, research on quality attributes was either kept under wraps within the organization that performed it, or carried out by outsiders using narrow, black-box techniques. The emergence of open source software has changed this picture allowing us to evaluate both software products and the processes that yield them. Thus, the software source code and the associated data stored in the version control system, the bug tracking databases, the mailing lists, and the wikis allow us to evaluate quality in a transparent way. Even better, the large number of (often competing) open source projects makes it possible to contrast the quality of comparable systems serving the same domain. Furthermore, by combining historical source code snapshots with significant events, such as bug discoveries and fixes, we can further dig into the causes and effects of problems. Here we present motivating examples, tools, and techniques that can be used to evaluate the quality of open source (and by extension also proprietary) software. %B Electronic Notes in Theoretical Computer Science %I The Reengineering Forum %V 233 %P 5–28 %8 03/2009 %U http://www.dmst.aueb.gr/dds/pubs/conf/2008-SQM-SQOOSS/html/SGKL09.html %R 10.1016/j.entcs.2009.02.058 %> https://flosshub.org/sites/flosshub.org/files/entcs-sqooss.pdf %0 Conference Paper %B Proceedings of the 2008 international working conference on Mining software repositories %D 2008 %T Expertise identification and visualization from CVS %A Alonso, Omar %A Premkumar T. 
Devanbu %A Gertz, Michael %K apache %K classification %K committers %K components %K contributors %K expertise %K expertise identification %K repository %K scm %K source code %X As software evolves over time, the identification of expertise becomes an important problem. Component ownership and team awareness of such ownership are signals of a solid project. Ownership and ownership awareness are also issues in open-source software (OSS) projects. Indeed, the membership in OSS projects is dynamic with team members arriving and leaving. In large open source projects, specialists who know the system very well are considered experts. How can one identify the experts in a project by mining a particular repository like the source code? Have they gotten help from other people? We provide an approach using classification of the source code tree as a path to derive the expertise of the committers. Because committers may get help from other people, we also retrieve their contributors. We also provide a visualization that helps to further explore the repository via committers and categories. We present a prototype implementation that describes our research using the Apache HTTP Web server project as a case study. %B Proceedings of the 2008 international working conference on Mining software repositories %S MSR '08 %I ACM %C New York, NY, USA %P 125–128 %8 05/2008 %@ 978-1-60558-024-1 %U http://doi.acm.org/10.1145/1370750.1370780 %R http://doi.acm.org/10.1145/1370750.1370780 %> https://flosshub.org/sites/flosshub.org/files/p125-alonso.pdf %0 Journal Article %J Information Economics and Policy %D 2008 %T Explaining leadership in virtual teams: The case of open source software %A Paola Giuri %A Francesco Rullani %A Salvatore Torrisi %K contributors %K Human capital %K leadership %K roles %K sourceforge %K team %X This paper contributes to the open source software (OSS) literature by investigating the likelihood that a participant becomes a project leader. 
Project leaders are key actors in a virtual community and are crucial to the success of the OSS model. Knowledge of the forces that lead to the emergence of project managers among the multitude of participants is still limited. We aim to fill this gap in the literature by analyzing the association between the roles played by an individual who is registered with a project, and a set of individual-level and project-level characteristics. In line with the theory of occupational choice elaborated by (Lazear, E.P., 2002. Entrepreneurship. NBER Working Paper No. 9109, Cambridge, Mass; Lazear, E.P., 2004. Balanced skills and entrepreneurship, American Economic Review 94, pp. 208-211), we find that OSS project leaders possess diversified skill sets which are needed to select the inputs provided by various participants, motivate contributors, and coordinate their efforts. Specialists, like pure developers, are endowed with more focused skill sets. Moreover, we find that the degree of modularity of the development process is positively associated with the presence of project leaders. That result is consistent with the modern theory of modular production (Baldwin, C.Y., Clark, K.B., 1997. Managing in an age of modularity. Harvard Business Review September-October. pp. 84-93; Mateos-Garcia, J., Steinmueller, W.E., 2003. The Open Source Way of Working: A New Paradigm for the Division of Labour in Software Development? SPRU - Science and Technology Policy Studies. Open Source Movement Research INK Working Paper, No. 1; Aoki, M., 2004. An organizational architecture of T-form: Silicon Valley clustering and its institutional coherence. Industrial and Corporate Change 13, pp. 967-981). 
%B Information Economics and Policy %V 20 %P 305 - 315 %U http://www.sciencedirect.com/science/article/B6V8J-4SRW10C-1/2/5ce36096ba3947338962268b54a5a7a9 %R DOI: 10.1016/j.infoecopol.2008.06.002 %0 Conference Paper %B Proceedings of the 2008 international working conference on Mining software repositories %D 2008 %T An extension of fault-prone filtering using precise training and a dynamic threshold %A Hata, Hideaki %A Mizuno, Osamu %A Kikuno, Tohru %K eclipse %K fault-prone modules %K spam filter %K text mining %X Fault-prone module detection in source code is important for assurance of software quality. Most previous fault-prone detection approaches have been based on software metrics. Such approaches, however, have difficulties in collecting the metrics and in constructing mathematical models based on the metrics. To mitigate such difficulties, we have proposed a novel approach for detecting fault-prone modules using a spam-filtering technique, named Fault-Prone Filtering. In our approach, fault-prone modules are detected in such a way that the source code modules are considered as text files and are applied to the spam filter directly. In practice, we use the training only errors procedure and apply this procedure to fault-prone. Since no pre-training is required, this procedure can be applied to an actual development field immediately. This paper describes an extension of the training only errors procedures. We introduce a precise unit of training, "modified lines of code," instead of methods. In addition, we introduce the dynamic threshold for classification. The result of the experiment shows that our extension leads to twice the precision with about the same recall, and improves 15% on the best F1 measurement. 
%B Proceedings of the 2008 international working conference on Mining software repositories %S MSR '08 %I ACM %C New York, NY, USA %P 89–98 %@ 978-1-60558-024-1 %U http://doi.acm.org/10.1145/1370750.1370772 %R http://doi.acm.org/10.1145/1370750.1370772 %> https://flosshub.org/sites/flosshub.org/files/p89-hata.pdf %0 Conference Paper %B Proceedings of the 2008 international workshop on Mining software repositories - MSR '08 %D 2008 %T Extracting structural information from bug reports %A Premraj, Rahul %A Zimmermann, Thomas %A Kim, Sunghun %A Bettenburg, Nicolas %Y Hassan, Ahmed E. %Y Lanza, Michele %Y Godfrey, Michael W. %K bug reports %K eclipse %K enumerations %K infozilla %K natural language %K patches %K source code %K stack trace %X In software engineering experiments, the description of bug reports is typically treated as natural language text, although it often contains stack traces, source code, and patches. Neglecting such structural elements is a loss of valuable information; structure usually leads to a better performance of machine learning approaches. In this paper, we present a tool called infoZilla that detects structural elements from bug reports with near perfect accuracy and allows us to extract them. We anticipate that infoZilla can be used to leverage data from bug reports at a different granularity level that can facilitate interesting research in the future. %B Proceedings of the 2008 international workshop on Mining software repositories - MSR '08 %I ACM Press %C New York, New York, USA %P 27-30 %8 05/2008 %@ 9781605580241 %! MSR '08 %R 10.1145/1370750.1370757 %> https://flosshub.org/sites/flosshub.org/files/p27-bettenburg.pdf %0 Journal Article %J Information Economics and Policy %D 2008 %T Geographic origin of libre software developers %A Jesus M. 
Gonzalez-Barahona %A Gregorio Robles %A Roberto Andradas-Izquierdo %A Rishab Aiyer Ghosh %K developers %K email %K email address %K email archives %K geography %K mailing list %K open source software %K sourceforge %K timezone %K users %X This paper examines the claim that libre (free, open source) software involves global development. The anecdotal evidence is that developers usually work in teams including individuals residing in many different geographical areas, time zones and even continents and that, as a whole, the libre software community is also diverse in terms of national origin. However, its exact composition is difficult to capture, since there are few records of the geographical location of developers. Past studies have been based on surveying a limited (and sometimes biased) sample and extrapolating that sample to the global distribution of developers. In this paper we present an alternate approach in which databases are analyzed to create traces of information from which the geographical origin of developers can be inferred. Applying this technique to the SourceForge users database and the mailing lists archives from several large projects, we have estimated the geographical origin of more than one million individuals who are closely related to the libre software development process. The paper concludes that the result is a good proxy for the actual distribution of libre software developers working on global projects. %B Information Economics and Policy %V 20 %P 356 - 363 %U http://www.sciencedirect.com/science/article/B6V8J-4T3DCPK-1/2/3981dfbc523eae1d1ce65fb1f0c0edb7 %R DOI: 10.1016/j.infoecopol.2008.07.001 %0 Generic %D 2008 %T How Do Firms Make Use of Open Source Communities? 
%A Linus Dahlander %A M Magnusson %K case study %K cendio %K email %K mailing list %K mysql %K roxen %K secondary data %K sot %X Relying on four in-depth case studies of firms involved with open source software, we investigate how firms make use of open source communities, and how that use is associated with their business models. Three themes - accessing, aligning and assimilating -are inductively developed for how the firms relate to the external knowledge created in the communities. For each theme, we make an argument about the tactics associated with each theme and their positive and negative consequences. The findings are related to the literature on the open and distributed nature of innovation, and various theoretical and managerial implications are discussed. %B Long Range Planning %V 41 %P 629-649 %8 Dec %G eng %U http://www.acm.jhu.edu/~paulproteus/tmp/sdarticle.pdf %> https://flosshub.org/sites/flosshub.org/files/dahlandermagnusson2008.pdf %0 Conference Paper %B Proceedings of the 2008 international working conference on Mining software repositories %D 2008 %T Improving change descriptions with change contexts %A Parnin, Chris %A Görg, Carsten %K bytecode analysis %K cecil %K change management %K change pairs %K semantic diff %K zedgraph %X Software archives are one of the best sources available to researchers for understanding the software development process. However, much detective work is still necessary in order to unravel the software development story. During this process, researchers must isolate changes and follow their trails over time. In support of this analysis, several research tools have provided different representations for connecting the many changes extracted from software archives. Most of these tools are based on textual analysis of source code and use line-based differencing between software versions. This approach limits the ability to process changes structurally resulting in less concise and comparable items. 
Adoption of structure-based approaches has been hampered by complex implementations and overly verbose change descriptions. We present a technique for expressing changes that is fine-grained but preserves some structural aspects. The structural information itself may not have changed, but instead provides a context for interpreting the change. This, in turn, enables more relevant and concise descriptions in terms of software types and programming activities. We apply our technique to common challenges that researchers face, and then we discuss and compare our results with other techniques. %B Proceedings of the 2008 international working conference on Mining software repositories %S MSR '08 %I ACM %C New York, NY, USA %P 51–60 %8 05/2008 %@ 978-1-60558-024-1 %U http://doi.acm.org/10.1145/1370750.1370765 %R http://doi.acm.org/10.1145/1370750.1370765 %> https://flosshub.org/sites/flosshub.org/files/p51-parnin.pdf %0 Conference Paper %B 3rd International Workshop on Public Data about Software Development (WoPDaSD 2008), Milano, Italy, September 2008 %D 2008 %T Improving community awareness in software forges by semantical aggregation of tools feeds %A Quang Vu Dang %A Christian Bac %A Olivier Berger %A Xuan Sang Dao %K community of practice %K DOAF. %K FOAF %K free and open source software development %K public data %K RDF %K semantic Web %K social filtering %K social network analysis %X It is rather difficult to monitor or visualize what can be the contribution of a member in a project, especially when the project uses multiple tools to produce its results. This is the case for collaborative development of FLOSS software, which uses Wiki, bug tracker, mailing lists and source code management tools. This paper presents an approach to data collection by using aggregation of feeds published by the different tools of a software forge. To allow this aggregation, collected data is semantically reformatted into Semantic Web standards: RDF, DC, DOAP, and FOAF. 
Resulting data can then be processed, republished or displayed to project members. We implemented this approach in a supervision module that has been integrated into the PicoForge platform. This module is able to draw a live graph of the social community out of the different sources of data, and in turn export semantic feeds for other uses. %B 3rd International Workshop on Public Data about Software Development (WoPDaSD 2008), Milano, Italy, September 2008 %G eng %> https://flosshub.org/sites/flosshub.org/files/Paper4.pdf %0 Journal Article %J Science Studies %D 2008 %T The Material and Social Dynamics of Motivation: Contributions to Open Source Language Technology Development %A Stephanie Freeman %K contributions %K developers %K email %K email archives %K mailing list %K MOTIVATION %K openoffice %K openoffice.org %K secondary data %K Volunteers %X Volunteer motivation has been a central theme in Free/Libre/Open Source Software (FLOSS) literature. This research has been largely dominated by economists who rely in their surveys on the distinction between intrinsic and extrinsic motivations and the "hacker ethic" for profit juxtaposition. The paper argues that survey-based analytical frameworks and research designs have led to a focus on some motivational attributions at the expense of others. It then presents a case study that explores dynamic, non individualistic and content-sensitive aspects of motivations. The approach is based on socio-cultural psychology and the author's observations of a hybrid firm-community FLOSS project, OpenOffice.org. Instead of separating intrinsic motivations from extrinsic ones, it is argued that complex and changing patterns of motivations are tied to changing objects and personal histories prior to and during participation. The boundary between work and hobby in an individual's participation path is blurred and shifting. 
%B Science Studies %G eng %> https://flosshub.org/sites/flosshub.org/files/Freeman.pdf %0 Conference Paper %B Proceedings of the 2008 international working conference on Mining software repositories %D 2008 %T Mining usage expertise from version archives %A Schuler, David %A Zimmermann, Thomas %K api %K computer-supported cooperative work %K eclipse %K expertise %K recommendation %K scm %K software repository %K source code %X In software development, there is an increasing need to find and connect developers with relevant expertise. Existing expertise recommendation systems are mostly based on variations of the Line 10 Rule: developers who changed a file most often have the most implementation expertise. In this paper, we introduce the concept of usage expertise, which manifests itself whenever developers are using functionality, e.g., by calling API methods. We present preliminary results for the ECLIPSE project that demonstrate that our technique allows to recommend experts for files with no or little history, identify developers with similar expertise, and measure the usage of API methods. %B Proceedings of the 2008 international working conference on Mining software repositories %S MSR '08 %I ACM %C New York, NY, USA %P 121–124 %8 05/2008 %@ 978-1-60558-024-1 %U http://doi.acm.org/10.1145/1370750.1370779 %R http://doi.acm.org/10.1145/1370750.1370779 %> https://flosshub.org/sites/flosshub.org/files/p121-schuler.pdf %0 Conference Paper %B Proceedings of the 30th international conference on Software engineering %D 2008 %T Power through brokering: open source community participation in software engineering student projects %A Krogstie, Birgit R. %K Communities Of Practice %K computer science education %K FLOSS %K open source %K software engineering %K software engineering education %X Many software engineering projects use open source software tools or components. 
The project team's active participation in the open source community may be necessary for the team to use the technology. Based on an in-depth field study of industry software engineering project students interacting with an open source community, we find that participation in the community may affect the team's work and learning by strengthening the power of the broker between the team and the community. We outline pitfalls and benefits of having student teams acquire development-related knowledge from open source communities. The findings are relevant to the organization and supervision of software engineering student projects interacting with open source communities. %B Proceedings of the 30th international conference on Software engineering %S ICSE '08 %I ACM %C New York, NY, USA %P 791–800 %@ 978-1-60558-079-1 %U http://doi.acm.org/10.1145/1368088.1368201 %R 10.1145/1368088.1368201 %0 Conference Paper %B Proceedings of the 2008 international workshop on Mining software repositories - MSR '08 %D 2008 %T On the relation of refactorings and software defect prediction %A Sigmund, Thomas %A Gall, Harald C. %A Ratzinger, Jacek %Y Hassan, Ahmed E. %Y Lanza, Michele %Y Godfrey, Michael W. %K argouml %K bug fixing %K bug reports %K defects %K evolution %K jboss %K liferay %K prediction %K refactoring %K spring %K weka %K xdoclet %X This paper analyzes the influence of evolution activities such as refactoring on software defects. In a case study of five open source projects we used attributes of software evolution to predict defects in time periods of six months. We use versioning and issue tracking systems to extract 110 data mining features, which are separated into refactoring and non-refactoring related features. These features are used as input into classification algorithms that create prediction models for software defects. We found out that refactoring related features as well as non-refactoring related features lead to high quality prediction models. 
Additionally, we discovered that refactorings and defects have an inverse correlation: The number of software defects decreases if the number of refactorings increased in the preceding time period. As a result, refactoring should be a significant part of both bug fixes and other evolutionary changes to reduce software defects. %B Proceedings of the 2008 international workshop on Mining software repositories - MSR '08 %I ACM Press %C New York, New York, USA %P 35-38 %8 05/2008 %@ 9781605580241 %! MSR '08 %R 10.1145/1370750.1370759 %> https://flosshub.org/sites/flosshub.org/files/p35-ratzinger.pdf %0 Journal Article %J Information and Software Technology %D 2008 %T Self-organization process in open-source software: An empirical study %A Yu, Liguo %K Empirical study %K evolution %K linux %K requirements %K Self-organization %K software evolution %X Software systems must continually evolve to adapt to new functional requirements or quality requirements to remain competitive in the marketplace. However, different software systems follow different strategies to evolve, affecting both the release plan and the quality of these systems. In this paper, software evolution is considered as a self-organization process and the difference between closed-source software and open-source software is discussed in terms of self-organization. In particular, an empirical study of the evolution of Linux from version 2.4.0 to version 2.6.13 is reported. The study shows how open-source software systems self-organize to adapt to functional requirements and quality requirements. %B Information and Software Technology %V 50 %P 361 - 374 %8 4/2008 %U http://www.sciencedirect.com/science/article/pii/S0950584907000225 %N 5 %! Information and Software Technology %R 10.1016/j.infsof.2007.02.018 %0 Conference Paper %B Proceedings of the 2008 international working conference on Mining software repositories %D 2008 %T Small patches get in! 
%A Weißgerber, Peter %A Neu, Daniel %A Diehl, Stephan %K case study %K cvs %K email %K email archives %K flac %K mailing list %K openafs %K patch acceptance %K patches %K revision control %K scm %X While there is a considerable amount of research on analyzing the change information stored in software repositories, only a few researchers have looked at software changes contained in email archives in the form of patches. In this paper we look at the email archives of two open source projects and answer questions like the following: How many emails contain patches? How long does it take for a patch to be accepted? Does the size of the patch influence its chances to be accepted or the duration until it gets accepted? Obviously, the answers to these questions can be helpful for the authors of patches, in particular because some of the answers are surprising. %B Proceedings of the 2008 international working conference on Mining software repositories %S MSR '08 %I ACM %C New York, NY, USA %P 67–76 %8 05/2008 %@ 978-1-60558-024-1 %U http://doi.acm.org/10.1145/1370750.1370767 %R http://doi.acm.org/10.1145/1370750.1370767 %> https://flosshub.org/sites/flosshub.org/files/p67-weissgerber.pdf %0 Conference Paper %B Proceedings of the 2008 international working conference on Mining software repositories %D 2008 %T Talk and work: a preliminary report %A Pattison, David S. %A Bird, Christian A. %A Premkumar T. Devanbu %K ant %K apache %K email %K mailing lists %K postgresql %K python %K scm %K source code %X Developers in Open Source Software (OSS) projects communicate using mailing lists. By convention, the mailing lists are used only for task-related discussions, so they are primarily concerned with the software under development, and software process issues (releases, etc.). We focus on the discussions concerning the software, and study the frequency with which software entities (functions, methods, classes, etc.) are mentioned in the mail. 
We find a strong, striking, cumulative relationship between this mention count in the email, and the number of times these entities are included in changes to the software. When we study the same phenomena over a series of time-intervals, the relationship is much less strong. This suggests some interesting avenues for future research. %B Proceedings of the 2008 international working conference on Mining software repositories %S MSR '08 %I ACM %C New York, NY, USA %P 113–116 %8 05/2008 %@ 978-1-60558-024-1 %U http://doi.acm.org/10.1145/1370750.1370776 %R http://doi.acm.org/10.1145/1370750.1370776 %> https://flosshub.org/sites/flosshub.org/files/p113-pattison.pdf %0 Conference Paper %B Proceedings of the 2008 international working conference on Mining software repositories %D 2008 %T Towards a simplification of the bug report form in eclipse %A Herraiz, Israel %A Daniel M. German %A Jesus M. Gonzalez-Barahona %A Gregorio Robles %K bug fixing %K bug report %K bug tracking system %K classification %K eclipse %K msr challenge %K severity %X We believe that the bug report form of Eclipse contains too many fields, and that for some fields, there are too many options. In this MSR challenge report, we focus on the case of the severity field. That field contains seven different levels of severity. Some of them seem very similar, and it is hard to distinguish among them. Users assign severity, and developers give priority to the reports depending on their severity. However, if users cannot distinguish well among the various severity options, they will probably assign different priorities to bugs that require the same priority. We study the mean time to close bugs reported in Eclipse, and how the severity assigned by users affects this time. The results show that, classifying by time to close, there are fewer clusters of bugs than levels of severity. We therefore conclude that there is a need to make a simpler bug report form. 
%B Proceedings of the 2008 international working conference on Mining software repositories %S MSR '08 %I ACM %C New York, NY, USA %P 145–148 %@ 978-1-60558-024-1 %U http://doi.acm.org/10.1145/1370750.1370786 %R http://doi.acm.org/10.1145/1370750.1370786 %0 Conference Paper %B Proceedings of the 2008 international working conference on Mining software repositories %D 2008 %T What do large commits tell us?: a taxonomical study of large commits %A Hindle, Abram %A Daniel M. German %A Holt, Ric %K boost %K bug fixing %K egroupware %K enlightenment %K evolution %K firebird %K large commits %K maintenance %K mysql %K postgresql %K samba %K software evolution %K source control system %K spring %X Research in the mining of software repositories has frequently ignored commits that include a large number of files (we call these large commits). The main goal of this paper is to understand the rationale behind large commits, and if there is anything we can learn from them. To address this goal we performed a case study that included the manual classification of large commits of nine open source projects. The contributions include a taxonomy of large commits, which are grouped according to their intention. We contrast large commits against small commits and show that large commits are more perfective while small commits are more corrective. These large commits provide us with a window on the development practices of maintenance teams. 
%B Proceedings of the 2008 international working conference on Mining software repositories %S MSR '08 %I ACM %C New York, NY, USA %P 99–108 %8 05/2008 %@ 978-1-60558-024-1 %U http://doi.acm.org/10.1145/1370750.1370773 %R http://doi.acm.org/10.1145/1370750.1370773 %> https://flosshub.org/sites/flosshub.org/files/p99-hindle.pdf %0 Conference Paper %B Proceedings of the 2008 international workshop on Cooperative and human aspects of software engineering (CHASE '08) %D 2008 %T What dynamic network metrics can tell us about developer roles %A Pohl, Mathias %A Diehl, Stephan %K identifying roles %K social network analysis %X Software development is heavily dependent on the participants of the process and their roles within the process. Each developer has his specific skills and interests and hence contributes to the project in a different way. While some programmers work on separate modules, other developers integrate these modules towards the final product. To identify such different groups of people one approach is to work with methods taken from social network analysis. To this end, a social network has to be defined in a suitable way, and appropriate analysis strategies have to be chosen. This paper shows how a network of software developers could be defined based on information in a software repository, and what it can possibly tell about roles of developers (and what not) in the process of the application server Tomcat. 
%B Proceedings of the 2008 international workshop on Cooperative and human aspects of software engineering (CHASE '08) %S CHASE '08 %I ACM %C New York, NY, USA %P 81–84 %@ 978-1-60558-039-5 %U http://doi.acm.org/10.1145/1370114.1370135 %R 10.1145/1370114.1370135 %> https://flosshub.org/sites/flosshub.org/files/10.1.1.217.4765.pdf %0 Conference Paper %B Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007) %D 2007 %T Analysis of the Linux Kernel Evolution Using Code Clone Coverage %A Livieri, Simone %A Higo, Yoshiki %A Matsushita, Makoto %A Inoue, Katsuro %K ccfinder %K clone %K cloning %K kernel %K linux %K metrics %K source code %X Most studies of the evolution of software systems are based on the comparison of simple software metrics. In this paper, we present our preliminary investigation of the evolution of the Linux kernel using code-clone analysis and the code-clone coverage metrics. We examined 136 versions of the stable Linux kernel using a distributed extension of the code clone detection tool CCFinder. The result is shown as a heat map. %B Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007) %I IEEE %C Minneapolis, MN, USA %P 22 - 22 %@ 0-7695-2950-X %R 10.1109/MSR.2007.1 %> https://flosshub.org/sites/flosshub.org/files/28300022.pdf %0 Conference Paper %B Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007) %D 2007 %T Comparing Approaches to Mining Source Code for Call-Usage Patterns %A Kagdi, Huzefa %A Collard, Michael L. %A Maletic, Jonathan I. %K function calls %K functions %K kernel %K linux %K sequence %K sequencing %K sequential-pattern mining %X Two approaches for mining function-call usage patterns from source code are compared. The first approach, itemset mining, has recently been applied to this problem. The other approach, sequential-pattern mining, has not been previously applied to this problem. 
Here, a call-usage pattern is a composition of function calls that occur in a function definition. Both approaches look for frequently occurring patterns that represent standard usage of functions and identify possible errors. Itemset mining produces unordered patterns, i.e., sets of function calls, whereas, sequential-pattern mining produces partially ordered patterns, i.e., sequences of function calls. The trade-off between the additional ordering context given by sequential-pattern mining and the efficiency of itemset mining is investigated. The two approaches are applied to the Linux kernel v2.6.14 and results show that mining ordered patterns is worth the additional cost. %B Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007) %I IEEE %C Minneapolis, MN, USA %P 20 - 20 %@ 0-7695-2950-X %R 10.1109/MSR.2007.3 %> https://flosshub.org/sites/flosshub.org/files/28300020.pdf %0 Conference Paper %B Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007) %D 2007 %T Correlating Social Interactions to Release History during Software Evolution %A Baysal, Olga %A Malton, Andrew J. %K ant %K apache %K change management %K developers %K discussion %K effort estimation %K lsedit %K mailing lists %K scm %K source code %X In this paper, we propose a method to reason about the nature of software changes by mining and correlating discussion archives. We employ an information retrieval approach to find correlation between source code change history and history of social interactions surrounding these changes. We apply our correlation method on two software systems, LSEdit and Apache Ant. The results of these exploratory case studies demonstrate the evidence of similarity between the content of free-form text emails among developers and the actual modifications in the code. 
We identify a set of correlation patterns between discussion and changed code vocabularies and discover that some releases referred to as minor should instead fall under the major category. These patterns can be used to give estimations about the type of a change and time needed to implement it. %B Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007) %I IEEE %C Minneapolis, MN, USA %P 7 - 7 %@ 0-7695-2950-X %R 10.1109/MSR.2007.4 %> https://flosshub.org/sites/flosshub.org/files/28300007.pdf %0 Conference Paper %B Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007) %D 2007 %T Detecting Patch Submission and Acceptance in OSS Projects %A Christian Bird %A Gourley, Alex %A Devanbu, Prem %K apache %K contributions %K mysql %K patches %K postgresql %K python %K scm %K source code %X The success of open source software (OSS) is completely dependent on the work of volunteers who contribute their time and talents. The submission of patches is the major way that participants outside of the core group of developers make contributions. We argue that the process of patch submission and acceptance into the codebase is an important piece of the open source puzzle and that the use of patch-related data can be helpful in understanding how OSS projects work. We present our methods in identifying the submission and acceptance of patches and give results and evaluation in applying these methods to the Apache webserver, Python interpreter, Postgres SQL database, and (with limitations) MySQL database projects. In addition, we present valuable ways in which this data has been and can be used. 
%B Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007) %I IEEE %C Minneapolis, MN, USA %P 26 - 26 %@ 0-7695-2950-X %R 10.1109/MSR.2007.6 %> https://flosshub.org/sites/flosshub.org/files/28300026.pdf %0 Conference Paper %B Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007) %D 2007 %T Determining Implementation Expertise from Bug Reports %A Anvik, John %A Murphy, Gail C. %K bug reports %K developers %K eclipse %K expertise %K repository %K scm %K source code %X As developers work on a software product they accumulate expertise, including expertise about the code base of the software product. We call this type of expertise "implementation expertise". Knowing the set of developers who have implementation expertise for a software product has many important uses. This paper presents an empirical evaluation of two approaches to determining implementation expertise from the data in source and bug repositories. The expertise sets created by the approaches are compared to those provided by experts and evaluated using the measures of precision and recall. We found that both approaches are good at finding all of the appropriate developers, although they vary in how many false positives are returned. 
%B Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007) %I IEEE %C Minneapolis, MN, USA %P 2 - 2 %@ 0-7695-2950-X %R 10.1109/MSR.2007.7 %> https://flosshub.org/sites/flosshub.org/files/28300002.pdf %0 Conference Paper %B Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007) %D 2007 %T Evaluating the Harmfulness of Cloning: A Change Based Experiment %A Lozano, Angela %A Wermelinger, Michel %A Nuseibeh, Bashar %K ccfinder %K clone %K clones %K clonetracker %K cloning %K ctags %K cvs %K dnsjava %K maintenance %K scm %K source code %X Cloning is considered a harmful practice for software maintenance because it requires consistent changes of the entities that share a cloned fragment. However this claim has not been refuted or confirmed empirically. Therefore, we have developed a prototype tool, CloneTracker, in order to study the rate of change of applications containing clones. This paper describes CloneTracker and illustrates its preliminary application on a case study. %B Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007) %I IEEE %C Minneapolis, MN, USA %P 18 - 18 %@ 0-7695-2950-X %R 10.1109/MSR.2007.8 %0 Conference Paper %B OSS2007: Open Source Development, Adoption and Innovation (IFIP 2.13) %D 2007 %T Exploring the Effects of Coordination and Communication Tools on the Efficiency of Open Source Projects using Data Envelopment Analysis %A Koch, Stefan %K metadata %K sourceforge %X In this paper, we propose to explore possible benefits of communication and coordination tools in open source projects using data envelopment analysis (DEA), a general method for efficiency comparisons. 
DEA offers several advantages: It is a non-parametric optimization method without any need for the user to define any relations between different factors or a production function, can account for economies or diseconomies of scale, and is able to deal with multi-input, multi-output systems in which the factors have different scales. Using a data set of 30 open source projects retrieved from SourceForge.net, we demonstrate the application of DEA, showing that the efficiency of the projects is in general relatively high. Regarding the effects of tool employment on the efficiency of projects, the results were surprising: Most of the possible tools, and overall usage, showed a negative relationship to efficiency. %B OSS2007: Open Source Development, Adoption and Innovation (IFIP 2.13) %S IFIP International Federation for Information Processing %I Springer %V 234/2007 %P 97 - 108 %8 2007/// %G eng %& 8 %R http://dx.doi.org/10.1007/978-0-387-72486-7_8 %> https://flosshub.org/sites/flosshub.org/files/Exploring%20the%20Effects%20Coodination.pdf %0 Conference Paper %B Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007) %D 2007 %T Identifying Changed Source Code Lines from Version Repositories %A Canfora, Gerardo %A Cerulo, Luigi %A Di Penta, Massimiliano %K argouml %K cvs %K levenshtein %K scm %K source code %X Observing the evolution of software systems at different levels of granularity has been a key issue for a number of studies, aiming at predicting defects or at studying certain phenomena, such as the presence of clones or of crosscutting concerns. Versioning systems such as CVS and SVN, however, only provide information about lines added or deleted by a contributor: any change is shown as a sequence of additions and deletions. This provides an erroneous estimate of the amount of code changed. 
This paper shows how the evolution of changes at source code line level can be inferred from CVS repositories, by combining information retrieval techniques and the Levenshtein edit distance. The application of the proposed approach to the ArgoUML case study indicates a high precision and recall. %B Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007) %I IEEE %C Minneapolis, MN, USA %P 14 - 14 %@ 0-7695-2950-X %R 10.1109/MSR.2007.14 %> https://flosshub.org/sites/flosshub.org/files/28300014.pdf %0 Journal Article %J Upgrade: The European Journal for the Informatics Professional %D 2007 %T Identifying Success and Abandonment of Free/Libre and Open Source (FLOSS) Commons: A Preliminary Classification of Sourceforge.net projects %A English, R. %A Schweik, C. M. %K flossmole %K sourceforge %K success %X Free/Libre and Open Source Software (FLOSS) projects are a form of commons where individuals work collectively to produce software that is a public, rather than a private, good. The famous phrase “Tragedy of the Commons” describes a situation where a natural resource commons, such as a pasture, or a water supply, gets depleted because of overuse. The tragedy in FLOSS commons is distinctly different: It occurs when collective action ceases before a software product is produced or reaches its full potential. This paper builds on previous work about defining success in FLOSS projects by taking a collective action perspective. We first report the results of interviews with FLOSS developers regarding our ideas about success and failure in FLOSS projects. Building on those interviews and previous work, we then describe our criteria for defining success/tragedy in FLOSS commons. Finally, we discuss the results of a preliminary classification of nearly all projects hosted on Sourceforge.net as of August 2006. 
%B Upgrade: The European Journal for the Informatics Professional %V VIII %P 54-59 %U http://www.cepis.org/upgrade/files/full-VI-07.pdf %N 6 (December) %0 Journal Article %J Communications of the ACM %D 2007 %T Increased security through open source %A Hoepman, Jaap-Henk %A Jacobs, Bart %K security %X The last few years have shown a worldwide rise in the attention for, and actual use of, open source software (OSS), most notably of the operating system Linux and various applications running on top of it. Various major companies and governments are adopting OSS. As a result, there are many publications concerning its advantages and disadvantages. The ongoing discussions cover a wide range of topics, such as Windows versus Linux, cost issues, intellectual property rights, development methods, etc. Here we wish to focus on security issues surrounding OSS. It has become a reasonably well-established conviction within the computer security community that publishing designs and protocols contributes to the security of systems built on them. But should one go all the way and publish source code as well? That is the fundamental question that we wish to address in this paper. %B Communications of the ACM %I ACM %V 50 %P 79–83 %U https://arxiv.org/pdf/0801.3924.pdf %> https://flosshub.org/sites/flosshub.org/files/0801.3924.pdf %0 Journal Article %J Information & Management %D 2007 %T Investigating recognition-based performance in an open content community: A social capital perspective %A Okoli, C. %A Oh, Wonseok %K open content %K recognition-based performance %K social capital %K social networks %K social status %K virtual communities %X As the open source movement grows, it becomes important to understand the dynamics that affect the motivation of participants who contribute their time freely to such projects. One important motivation that has been identified is the desire for formal recognition in the open source community. 
We investigated the impact of social capital in participants' social networks on their recognition-based performance; i.e., the formal status they are accorded in the community. We used a sample of 465 active participants in the Wikipedia open content encyclopedia community to investigate the effects of two types of social capital and found that network closure, measured by direct and indirect ties, had a significant positive effect on increasing participants' recognition-based performance. Structural holes had mixed effects on participants' status, but were generally a source of social capital. (C) 2007 Elsevier B.V. All rights reserved. %B Information & Management %V 44 %P 240-252 %8 Apr %@ 0378-7206 %G eng %M ISI:000247156800002 %1 management %2 SNA %0 Conference Paper %B Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007) %D 2007 %T Mining CVS Repositories to Understand Open-Source Project Developer Roles %A Yu, Liguo %A Ramaswamy, Srini %K cvs %K developer interaction %K developers %K mediawiki %K orac-dr %K roles %K scm %K source code %X This paper presents a model to represent the interactions of distributed open-source software developers and utilizes data mining techniques to derive developer roles. The model is then applied on case studies of two open-source projects, ORAC-DR and Mediawiki with encouraging results. 
%B Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007) %I IEEE %C Minneapolis, MN, USA %P 8 - 8 %@ 0-7695-2950-X %R 10.1109/MSR.2007.19 %> https://flosshub.org/sites/flosshub.org/files/28300008.pdf %0 Conference Paper %B Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007) %D 2007 %T Mining Eclipse Developer Contributions via Author-Topic Models %A Linstead, Erik %A Rigor, Paul %A Bajracharya, Sushil %A Lopes, Cristina %A Baldi, Pierre %K contributions %K developers %K eclipse %K expertise %K mining challenge %K msr challenge %K source code %K topics %X We present the results of applying statistical author-topic models to a subset of the Eclipse 3.0 source code consisting of 2,119 source files and 700,000 lines of code from 59 developers. This technique provides an intuitive and automated framework with which to mine developer contributions and competencies from a given code base while simultaneously extracting software function in the form of topics. In addition to serving as a convenient summary for program function and developer activities, our study shows that topic models provide a meaningful, effective, and statistical basis for developer similarity analysis. 
%B Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007) %I IEEE %C Minneapolis, MN, USA %P 30 - 30 %@ 0-7695-2950-X %R 10.1109/MSR.2007.20 %> https://flosshub.org/sites/flosshub.org/files/28300030.pdf %0 Conference Paper %B Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007) %D 2007 %T Mining Software Repositories with iSPARQL and a Software Evolution Ontology %A Kiefer, Christoph %A Bernstein, Abraham %A Tappolet, Jonas %K database %K eclipse %K evoont %K java %K owl %K semantic %K sparql %X One of the most important decisions researchers face when analyzing the evolution of software systems is the choice of a proper data analysis/exchange format. Most existing formats have to be processed with special programs written specifically for that purpose and are not easily extendible. Most scientists, therefore, use their own database(s) requiring each of them to repeat the work of writing the import/export programs to their format. We present EvoOnt, a software repository data exchange format based on the Web Ontology Language (OWL). EvoOnt includes software, release, and bug-related information. Since OWL describes the semantics of the data, EvoOnt is (1) easily extendible, (2) comes with many existing tools, and (3) allows assertions to be derived through its inherent Description Logic reasoning capabilities. The paper also shows iSPARQL -- our SPARQL-based Semantic Web query engine containing similarity joins. Together with EvoOnt, iSPARQL can accomplish a sizable number of tasks sought in software repository mining projects, such as an assessment of the amount of change between versions or the detection of bad code smells. To illustrate the usefulness of EvoOnt (and iSPARQL), we perform a series of experiments with a real-world Java project. 
These show that a number of software analyses can be reduced to simple iSPARQL queries on an EvoOnt dataset. %B Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007) %I IEEE %C Minneapolis, MN, USA %P 10 - 10 %@ 0-7695-2950-X %R 10.1109/MSR.2007.21 %> https://flosshub.org/sites/flosshub.org/files/28300010.pdf %0 Journal Article %J IEEE Software %D 2007 %T An Open Source Approach to Developing Software in a Small Organization %A Martin, Ken %A Hoffman, Bill %K Cross-platform %K Small-business %K software engineering %K Software testing %X This article is part of a special issue on Software Engineering Challenges in Small Software Organizations. The software development approach that developers at Kitware use borrows techniques from agile development and Extreme Programming and emphasizes long-term, ongoing projects. The company has used this approach on open source and closed-source projects in a wide range of sizes. %B IEEE Software %V 24 %P 46 - 53 %8 01/2007 %N 1 %! IEEE Softw. %R 10.1109/MS.2007.5 %0 Journal Article %J International Economics and Economic Policy %D 2007 %T Open source software: Motivation and restrictive licensing %A Fershtman, Chaim %A Gandal, Neil %K contributions %K contributors %K developers %K incentives %K license analysis %K licenses %K lines of code %K loc %K MOTIVATION %K restrictive %K scm %K size %K status %K version history %X Open source software (OSS) is an economic paradox. Development of open source software is often done by unpaid volunteers and the source code is typically freely available. Surveys suggest that status, signaling, and intrinsic motivations play an important role in inducing developers to invest effort. Contribution to an OSS project is rewarded by adding one’s name to the list of contributors which is publicly observable. 
Such incentives imply that programmers may have little incentive to contribute beyond the threshold level required for being listed as a contributor. Using a unique data set we empirically examine this hypothesis. We find that the output per contributor in open source projects is much higher when licenses are less restrictive and more commercially oriented. These results indeed suggest a status, signaling, or intrinsic motivation for participation in OSS projects with restrictive licenses. %B International Economics and Economic Policy %I Springer Berlin / Heidelberg %V 4 %P 209-225 %U http://dx.doi.org/10.1007/s10368-007-0086-4 %0 Conference Paper %B 2nd Workshop on Public Data about Software Development (WoPDaSD 2007) %D 2007 %T A Preliminary Analysis of Publicly Available FLOSS Measurements: Towards Discovering Maintainability Trends %A Samoladas, Ioannis %A Bibi, Stamatia %A Ioannis Stamelos %A Sowe, Sulayman K. %A Deligiannis, Ignatios %K decision tree %K flossmole %K java %K machine learning %K metrics %K sourcekibitzer %X The spread of free/libre/open source software (FLOSS) and the openness of its development model offer researchers a valuable source of information regarding software data. The creation of large portals, which host a vast number of FLOSS projects, makes it easy to create large datasets with valuable information regarding the FLOSS development process. In addition initiatives such as FLOSSMole provide researchers with a single point and continuing access to those data. Up to now the majority of datasets from FLOSSMole offered data regarding the development process and not the code itself. From February 2007 FLOSSMole offers data donated by SourceKibitzer, which contain source code metrics for FLOSS projects written in Java. In this paper we provide a preliminary analysis on those data using machine learning techniques, such as classification rules and decision trees. 
Using the first available data from February 2007, we tried to build rules that can be used in order to estimate the future values of metrics offered for March. Here we present some preliminary results that are encouraging and deserve to be further analyzed in future releases of SourceKibitzer datasets. %B 2nd Workshop on Public Data about Software Development (WoPDaSD 2007) %8 2007 %> https://flosshub.org/sites/flosshub.org/files/Samolades2007.pdf %0 Conference Paper %B 2nd Workshop on Public Data about Software Development (WoPDaSD 2007) %D 2007 %T Programming Language Trends in Open Source Development: An Evaluation Using Data from All Production Phase SourceForge Projects %A Delorey, Daniel P. %A Knutson, Charles D. %A Giraud-Carrier, C. %K cvs %K cvs2mysql %K programming languages %K sfra %K sourceforge %K srda %X In this work, we analyze data collected from the CVS repositories of 9,997 Open Source projects hosted on SourceForge in an effort to understand trends in programming language usage in the Open Source community between 2000 and 2005. The trends we consider include: 1) the relative popularity of the ten most popular programming languages over time, 2) the use of multiple programming languages by individual programmers and by individual projects, and 3) the programming languages most often used in combination. %B 2nd Workshop on Public Data about Software Development (WoPDaSD 2007) %> https://flosshub.org/sites/flosshub.org/files/Delorey2007b.pdf %0 Conference Paper %B Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007) %D 2007 %T Release Pattern Discovery via Partitioning: Methodology and Case Study %A Hindle, Abram %A Godfrey, Michael W. %A Holt, Richard C. 
%K bitkeeper %K bt2csv %K cvs %K evolution %K mysql %K releases %K revision history %K scm %K softchange %K version control %X The development of Open Source systems produces a variety of software artifacts such as source code, version control records, bug reports, and email discussions. Since the development is distributed across different tool environments and developer practices, any analysis of project behavior must be inferred from whatever common artifacts happen to be available. In this paper, we propose an approach to characterizing a project's behavior around the time of major and minor releases; we do this by partitioning the observed activities, such as artifact check-ins, around the dates of major and minor releases, and then look for recognizable patterns. We validate this approach by means of a case study on the MySQL database system; in this case study, we found patterns which suggested MySQL was behaving consistently within itself. These patterns included testing and documenting that took place more before a release than after and that the rate of source code changes dipped around release time. %B Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007) %I IEEE %C Minneapolis, MN, USA %P 19 - 19 %@ 0-7695-2950-X %R 10.1109/MSR.2007.28 %> https://flosshub.org/sites/flosshub.org/files/28300019.pdf %0 Journal Article %J Information and Software Technology Journal %D 2007 %T Self-organization of teams for free/libre open source software development %A Kevin Crowston %A Li, Qing %A Kangning Wei %A Eseryel, U. 
Yeliz %A Howison, James %K case study %K compiere %K coordination %K DESIGN %K distributed teams %K egroupware %K email %K email archives %K forum %K free/libre open source software development %K gaim %K INTERNET %K mailing list %K metadata %K qualitative research methods %K self-organizing teams %K sourceforge %K SYSTEMS %K task assignment %K WORK %X This paper provides empirical evidence about how free/libre open source software development teams self-organize their work. Following a case study methodology, we examined developer interaction data from three active and successful FLOSS projects using qualitative research methods, specifically inductive content analysis, to identify the task-assignment mechanisms used by the participants. We found that "self-assignment" was the most common mechanism across three FLOSS projects. This mechanism is consistent with expectations for distributed and largely volunteer teams. We conclude by discussing whether these emergent practices can be usefully transferred to mainstream practice and indicating directions for future research. %B Information and Software Technology Journal %V 49 %G eng %> https://flosshub.org/sites/flosshub.org/files/task_assignment_final.pdf %0 Journal Article %J Journal of Database Management %D 2007 %T Social network structures in open source software development teams %A Long, Y. %A Siau, K. %K bug tracking %K bugs %K COMMUNITY %K INNOVATION %K longitudinal study %K MODEL %K open source %K social %K social network analysis %K social networks %K sourceforge %K structure %X Drawing on social network theories and previous studies, this research examines the dynamics of social network structures in open source software (OSS) teams. Three projects were selected from SourceForge.net in terms of their similarities as well as their differences. Monthly data were extracted from the bug tracking systems in order to achieve a longitudinal view of the interaction pattern of each project. 
Social network analysis was used to generate the indices of social structure. The finding suggests that the interaction pattern of OSS projects evolves from a single hub at the beginning to a core/periphery model as the projects move forward. %B Journal of Database Management %V 18 %P 25-40 %8 Apr-Jun %@ 1063-8016 %G eng %M ISI:000244332400003 %1 information systems %2 SNA %0 Conference Paper %B Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007) %D 2007 %T Spam Filter Based Approach for Finding Fault-Prone Software Modules %A Mizuno, Osamu %A Ikami, Shiro %A Nakaichi, Shuya %A Kikuno, Tohru %K argouml %K bug reports %K classification %K eclipse %K java %K modules %K scm %K spam %K text mining %X Because of the increase of needs for spam e-mail detection, the spam filtering technique has been improved as a convenient and effective technique for text mining. We propose a novel approach to detect fault-prone modules in a way that the source code modules are considered as text files and are applied to the spam filter directly. In order to show the applicability of our approach, we conducted experimental applications using source code repositories of Java based open source developments. The result of experiments shows that our approach can classify more than 75% of software modules correctly. %B Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007) %I IEEE %C Minneapolis, MN, USA %P 4 - 4 %@ 0-7695-2950-X %R 10.1109/MSR.2007.29 %> https://flosshub.org/sites/flosshub.org/files/28300004.pdf %0 Conference Paper %B 2nd Workshop on Public Data about Software Development (WoPDaSD 2007) %D 2007 %T Studying Production Phase SourceForge Projects: An Exploratory Analysis Using cvs2mysql and SFRA %A Delorey, Daniel P. %A Knutson, Charles D. %A MacLean, Alexander C. 
%K Data Collection %K forge %K repositories %K sourceforge %X A wealth of data can be extracted from the natural by-products of software development processes and used in empirical studies of software engineering. However, the size and accuracy of such studies depend in large part on the availability of tools that facilitate the collection of data from individual projects and the combination of data from multiple projects. To demonstrate this point, we present our experience gathering and analyzing data from nearly 10,000 open source projects hosted on SourceForge. We describe the tools we developed to collect the data and the ways in which these tools and data may be used by other researchers. We also provide examples of statistics that we have calculated from these data to describe interesting author- and project-level behaviors of the SourceForge community. %B 2nd Workshop on Public Data about Software Development (WoPDaSD 2007) %8 2007 %> https://flosshub.org/sites/flosshub.org/files/Delorey2007c.pdf %0 Conference Paper %B Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007) %D 2007 %T Towards a Theoretical Model for Software Growth %A Herraiz, Israel %A Jesus M. Gonzalez-Barahona %A Gregorio Robles %K C %K complexity %K evolution %K freebsd %K growth %K halstead %K lines of code %K loc %K mccabe %K metrics %K scm %K size %K sloc %K sloccount %K source code %X Software growth (and more broadly, software evolution) is usually considered in terms of size or complexity of source code. However in different studies, usually different metrics are used, which make it difficult to compare approaches and results. In addition, not all metrics are equally easy to calculate for a given source code, which leads to the question of which one is the easiest to calculate without losing too much information. 
To address both issues, in this paper we present a comprehensive study, based on the analysis of about 700,000 C source code files, calculating several size and complexity metrics for all of them. For this sample, we have found double Pareto statistical distributions for all metrics considered, and a high correlation between any two of them. This would imply that any model addressing software growth should produce these Pareto distributions, and that analysis based on any of the considered metrics should show a similar pattern, provided the sample of files considered is large enough. %B Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007) %I IEEE %C Minneapolis, MN, USA %P 21 - 21 %@ 0-7695-2950-X %R 10.1109/MSR.2007.31 %> https://flosshub.org/sites/flosshub.org/files/28300021.pdf %0 Conference Paper %B 2nd Workshop on Public Data about Software Development (WoPDaSD 2007) %D 2007 %T Understanding the KDE Social Structure through Mining of Email Archive %A Studer, Matthias %A Müller, Benoît %A Ritschard, Gilbert %K bug tracking system %K bugzilla %K commit %K email %K email archive %K kde %K mailing list %K participation %K revision control %K social network analysis %X In order to achieve a better understanding of FLOSS social structure, we need a definition of social position. From a theoretical perspective, we propose to think of participation as a trajectory. Empirically, we use optimal matching to build a typology of participation trajectories based on KDE email archives. We show how these trajectories structure the community as a whole by combining these results with a social network analysis. 
%B 2nd Workshop on Public Data about Software Development (WoPDaSD 2007) %> https://flosshub.org/sites/flosshub.org/files/wopdasd_studer_et_all_full.pdf %0 Conference Paper %B Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007) %D 2007 %T Using Software Repositories to Investigate Socio-technical Congruence in Development Projects %A Valetto, Giuseppe %A Helander, Mary %A Ehrlich, Kate %A Chulani, Sunita %A Wegman, Mark %A Williams, Clay %K developers %K graph %K scm %K social networks %K source code %X We propose a quantitative measure of socio-technical congruence as an indicator of the performance of an organization in carrying out a software development project. We show how the information necessary to implement that measure can be mined from commonly used software repositories, and we describe how socio-technical congruence can be computed based on that information. %B Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007) %I IEEE %C Minneapolis, MN, USA %P 25 - 25 %@ 0-7695-2950-X %R 10.1109/MSR.2007.33 %> https://flosshub.org/sites/flosshub.org/files/28300025.pdf %0 Conference Paper %B Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007) %D 2007 %T Visual Data Mining in Software Archives to Detect How Developers Work Together %A Weissgerber, Peter %A Pohl, Mathias %A Burch, Michael %K change %K coordination %K cvs %K developers %K junit %K modules %K scm %K source code %K svn %K teams %K tomcat %K visualization %X Analyzing the check-in information of open source software projects which use a version control system such as CVS or SUBVERSION can yield interesting and important insights into the programming behavior of developers. As in every major project tasks are assigned to many developers, the development must be coordinated between these programmers. 
This paper describes three visualization techniques that help to examine how programmers work together, e.g. if they work as a team or if they develop their part of the software separate from each other. Furthermore, phases of stagnation in the lifetime of a project can be uncovered and thus, possible problems are revealed. To demonstrate the usefulness of these visualization techniques we performed case studies on two open source projects. In these studies interesting patterns of developers' behavior, e.g. the specialization on a certain module, can be observed. Moreover, modules that have been changed by many developers can be identified as well as such ones that have been altered by only one programmer. %B Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007) %I IEEE %C Minneapolis, MN, USA %P 9 - 9 %@ 0-7695-2950-X %R 10.1109/MSR.2007.34 %> https://flosshub.org/sites/flosshub.org/files/28300009.pdf %0 Journal Article %J International Journal of Information Technology and Web Engineering %D 2006 %T Applying Social Network Analysis Techniques to Community-Driven Libre Software Projects %A López-Fernández, L. %A Gregorio Robles %A Jesus M. Gonzalez-Barahona %A Herraiz, I. %K apache %K conway's law %K cvs %K gnome %K kde %K scm %K social network analysis %K source code %X Source code management repositories of large, long-lived libre (free, open source) software projects can be a source of valuable data about the organizational structure, evolution, and knowledge exchange in the corresponding development communities. Unfortunately, the sheer volume of the available information renders it almost unusable without applying methodologies which highlight the relevant information for a given aspect of the project. 
Such methodology is proposed in this article, based on well known concepts from the social networks analysis field, which can be used to study the relationships among developers and how they collaborate in different parts of a project. It is also applied to data mined from some well known projects (Apache, GNOME, and KDE), focusing on the characterization of their collaboration network architecture. These cases help to understand the potentials of the methodology and how it is applied, but also shows some relevant results which open new paths in the understanding of the informal organization of libre software development communities. %B International Journal of Information Technology and Web Engineering %V 1 %G eng %> https://flosshub.org/sites/flosshub.org/files/06_Lopez_ijitwe_sna.pdf %0 Conference Paper %B Proceedings of the 2006 international workshop on Mining software repositories %D 2006 %T Applying the evolution radar to PostgreSQL %A D'Ambros, Marco %A Lanza, Michele %K cvs %K documentation %K evolution %K evolution radar %K logical coupling %K makefile %K mining challenge %K msr challenge %K postgresql %K re-engineering %K refactoring %K release history %K rhdb %K source code %K version control %K visualization %B Proceedings of the 2006 international workshop on Mining software repositories %S MSR '06 %I ACM %C New York, NY, USA %P 177–178 %@ 1-59593-397-2 %U http://doi.acm.org/10.1145/1137983.1138029 %R http://doi.acm.org/10.1145/1137983.1138029 %> https://flosshub.org/sites/flosshub.org/files/177ApplyingEvolution.pdf %0 Conference Paper %B Proceedings of the 2006 international workshop on Mining software repositories %D 2006 %T Are refactorings less error-prone than other changes? 
%A Weißgerber, Peter %A Diehl, Stephan %K argouml %K bug reports %K bugs %K change history %K jedit %K junit %K re-engineering %K refactoring %K reverse engineering %K software evolution %K version control %X Refactorings are program transformations which should preserve the program behavior. Consequently, we expect that during phases when there are mostly refactorings in the change history of a system, only few new bugs are introduced. For our case study we analyzed the version histories of several open source systems and reconstructed the refactorings performed. Furthermore, we obtained bug reports from various sources depending on the system. Based on this data we identify phases when the above hypothesis holds and those when it doesn't. %B Proceedings of the 2006 international workshop on Mining software repositories %S MSR '06 %I ACM %C New York, NY, USA %P 112–118 %@ 1-59593-397-2 %U http://doi.acm.org/10.1145/1137983.1138011 %R http://doi.acm.org/10.1145/1137983.1138011 %> https://flosshub.org/sites/flosshub.org/files/112AreRefactorings.pdf %0 Conference Paper %B Proceedings of the 28th international conference on Software engineering %D 2006 %T A case study of a corporate open source development model %A Gurbani, Vijay K. %A Garvert, Anita %A Herbsleb, James D. %K architecture %K case study %K open source %K session initiation protocol %K software development %K vkg %X Open source practices and tools have proven to be highly effective for overcoming the many problems of geographically distributed software development. We know relatively little, however, about the range of settings in which they work. In particular, can corporations use the open source development model effectively for software projects inside the corporate domain? Or are these tools and practices incompatible with development environments, management practices, and market-driven schedule and feature decisions typical of a commercial software house? 
We present a case study of open source software development methodology adopted by a significant commercial software project in the telecommunications domain. We extract a number of lessons learned from the experience, and identify open research questions. %B Proceedings of the 28th international conference on Software engineering %S ICSE '06 %I ACM %C New York, NY, USA %P 472–481 %@ 1-59593-375-1 %U http://doi.acm.org/10.1145/1134285.1134352 %R 10.1145/1134285.1134352 %0 Conference Paper %B Proceedings of the 2006 international workshop on Mining software repositories %D 2006 %T Co-change visualization applied to PostgreSQL and ArgoUML: (MSR challenge report) %A Beyer, Dirk %K argouml %K ccvisu %K cvs %K force-directed graph layout %K graph %K mining challenge %K msr challenge %K postgresql %K software clustering %K software structure analysis %K software visualization %K version control %K visualization %X Co-change visualization is a method to recover the subsystem structure of a software system from the version history, based on common changes and visual clustering. This paper presents the results of applying the tool CCVisu, which implements co-change visualization, to the two open-source software systems PostgreSQL and ArgoUML. The input of the method is the co-change graph, which can be easily extracted by CCVisu from a CVS version repository. The output is a graph layout that places software artifacts that were often commonly changed at close positions, and artifacts that were rarely co-changed at distant positions. This property of the layout is due to the clustering property of the underlying energy model, which evaluates the quality of a produced layout. The layout can be displayed on the screen, or saved to a file in SVG or VRML format. 
%B Proceedings of the 2006 international workshop on Mining software repositories %S MSR '06 %I ACM %C New York, NY, USA %P 165–166 %@ 1-59593-397-2 %U http://doi.acm.org/10.1145/1137983.1138023 %R http://doi.acm.org/10.1145/1137983.1138023 %> https://flosshub.org/sites/flosshub.org/files/165Co-Change.pdf %0 Conference Paper %B OSS2006: Open Source Systems (IFIP 2.13) %D 2006 %T Collaborative Maintenance in Large Open-Source Projects %A den Besten, Matthijs %A Jean-Michel Dalle %A Galia, Fabrice %K apache %K COLLABORATION %K complexity %K cvs %K gaim %K gcc %K ghostscript %K halstead %K lines of code %K loc %K mccabe %K mozilla %K netbsd %K openssh %K postgresql %K python %K sloc %X The paper investigates collaborative work among maintainers of open source software by analyzing the logs of a set of 10 large projects. We inquire whether teamwork can be influenced by several characteristics of code. Preliminary results suggest that collaboration among maintainers in most large open-source projects seems to be positively influenced by file vintage and by Halstead volume of files, and negatively by McCabe complexity and size measured in SLOCs. These results could be consistent with an increased attractivity of files created early in the history of a project, and with maintainers being less attracted by more verbose code and by more complex code, although in this last case it might also reflect the fact that more complex files would be de facto more exclusive in terms of maintenance. 
%B OSS2006: Open Source Systems (IFIP 2.13) %S IFIP International Federation for Information Processing %I Springer %P 233 - 244 %G eng %R http://dx.doi.org/10.1007/0-387-34226-5_23 %> https://flosshub.org/sites/flosshub.org/files/Collaborative%20Maintenance.pdf %0 Conference Paper %B OSS2006: Open Source Systems (IFIP 2.13) %D 2006 %T Communication Networks in an Open Source Software Project %A Roberts, Jeffrey %A Il-Horn Hann %A Sandra Slaughter %K apache %K core %K developers %K email %K email archive %K mailing list %K participation %K social network analysis %X This study explores the nature of the social network and the patterns of communication that exist in an open source software development project, the Apache HTTP (WEB) server project. Our analysis of archival data on email communications between developers in the Apache HTTP server project suggests an interesting pattern of communication. We find that the core developers self-organize into three sub-groups that communicate intensely in completing the project. Our analysis also reveals that a few prominent developers who are centrally located in the network are driving communications within the project. We identify the implications of our findings and suggest areas for further research. %B OSS2006: Open Source Systems (IFIP 2.13) %S IFIP International Federation for Information Processing %I Springer %V 203/2006 %P 297 - 306 %8 2006/// %G eng %R http://dx.doi.org/10.1007/0-387-34226-5_30 %> https://flosshub.org/sites/flosshub.org/files/Communication%20Networks%20in%20an%20Open%20Source.pdf %0 Conference Paper %B Proceedings of the Conference on Software Maintenance and Reengineering %D 2006 %T Comparison Between SLOCs and Number of Files As Size Metrics for Software Evolution Analysis %A Herraiz, Israel %A Gregorio Robles %A Gonzalez-Barahona, Jesus M. 
%K empirical studies %K libre software %K metrics %K software evolution %B Proceedings of the Conference on Software Maintenance and Reengineering %S CSMR '06 %I IEEE Computer Society %C Washington, DC, USA %P 206–213 %@ 0-7695-2536-9 %U http://dl.acm.org/citation.cfm?id=1116163.1116405 %0 Journal Article %J Proceedings of the 39th Annual Hawaii International Conference on System Sciences-Volume 06 %D 2006 %T Core and periphery in Free/Libre and Open Source software team communications %A Kevin Crowston %A Kangning Wei %A Li, Qing %A Howison, James %K bug fixing %K contributions %K contributors %K core %K developers %K social network analysis %K sourceforge %K team %X The concept of the core group of developers is important and often discussed in empirical studies of FLOSS projects. This paper examines the question, "how does one empirically distinguish the core?" Being able to identify the core members of a FLOSS development project is important because many of the processes necessary for successful projects likely involve core members differently than peripheral members, so analyses that mix the two groups will likely yield invalid results. We compare 3 analysis approaches to identify the core: the named list of developers, a Bradford's law analysis that takes as the core the most frequent contributors and a social network analysis of the interaction pattern that identifies the core in a core-and-periphery structure. We apply these measures to the interactions around bug fixing for 116 SourceForge projects. The 3 techniques identify different individuals as core members; examination of which individuals are identified leads to suggestions for refining the measures. All 3 measures though suggest that the core of FLOSS projects is a small fraction of the total number of contributors. 
%B Proceedings of the 39th Annual Hawaii International Conference on System Sciences-Volume 06 %G eng %1 information systems %2 computational %0 Conference Paper %B Proceedings of the 2006 international workshop on Mining software repositories %D 2006 %T Detecting similar Java classes using tree algorithms %A Sager, Tobias %A Bernstein, Abraham %A Pinzger, Martin %A Kiefer, Christoph %K change analysis %K clones %K coogle %K eclipse %K famix %K java %K similarity %K software evolution %K software repositories %K source code %K tree similarity measures %X Similarity analysis of source code is helpful during development to provide, for instance, better support for code reuse. Consider a development environment that analyzes code while typing and that suggests similar code examples or existing implementations from a source code repository. Mining software repositories by means of similarity measures enables and enforces reusing existing code and reduces the developing effort needed by creating a shared knowledge base of code fragments. In information retrieval similarity measures are often used to find documents similar to a given query document. This paper extends this idea to source code repositories. It introduces our approach to detect similar Java classes in software projects using tree similarity algorithms. We show how our approach allows to find similar Java classes based on an evaluation of three tree-based similarity measures in the context of five user-defined test cases as well as a preliminary software evolution analysis of a medium-sized Java project. Initial results of our technique indicate that it (1) is indeed useful to identify similar Java classes, (2) successfully identifies the ex ante and ex post versions of refactored classes, and (3) provides some interesting insights into within-version and between-version dependencies of classes within a Java project. 
%B Proceedings of the 2006 international workshop on Mining software repositories %S MSR '06 %I ACM %C New York, NY, USA %P 65–71 %@ 1-59593-397-2 %U http://doi.acm.org/10.1145/1137983.1138000 %R http://doi.acm.org/10.1145/1137983.1138000 %> https://flosshub.org/sites/flosshub.org/files/65Detecting.pdf %0 Conference Paper %B Proceedings of the 2006 International Workshop on Economics Driven Software Engineering Research %D 2006 %T Effort Estimation by Characterizing Developer Activity %A Amor, Juan Jose %A Gregorio Robles %A Jesus M. Gonzalez-Barahona %K developer characterization %K effort estimation %K mining software repositories %K open source software %K software economics %X During the latest years libre (free, open source) software has gained a lot of attention from the industry. Following this interest, the research community is also studying it. For instance, many teams are performing quantitative analysis on the large quantity of data which is publicly available from the development repositories maintained by libre software projects. However, not much of this research is focused on cost or effort estimations, despite its importance (for instance, for companies developing libre software or collaborating with libre software projects), and the availability of some data which could be useful for this purpose. Our position is that classical effort estimation models can be improved from the study of these data, at least when applied to libre software. In this paper, we focus on the characterization of developer activity, which we argue can improve effort estimation. This activity can be traced with a lot of detail, and the resulting data can also be used for validation of any effort estimation model. 
%B Proceedings of the 2006 International Workshop on Economics Driven Software Engineering Research %S EDSER '06 %I ACM %C New York, NY, USA %P 3–6 %@ 1-59593-396-4 %U http://doi.acm.org/10.1145/1139113.1139116 %R 10.1145/1139113.1139116 %0 Conference Paper %B Proceedings of the 2006 international workshop on Mining software repositories %D 2006 %T The evolution radar: visualizing integrated logical coupling information %A D'Ambros, Marco %A Lanza, Michele %A Lungu, Mircea %K change management %K cvs %K evolution %K logical coupling %K mozilla %K scm %K source code %K thunderbird %K tinderbox %K visualization %X In software evolution research logical coupling has extensively been used to recover the hidden dependencies between source code artifacts. They would otherwise go lost because of the file-based nature of current versioning systems. Previous research has dealt with low-level couplings between files, leading to an explosion of data to be analyzed, or has abstracted the logical couplings to module level, leading to a loss of detailed information. In this paper we propose a visualization-based approach which integrates both file-level and module-level logical coupling information. This not only facilitates an in-depth analysis of the logical couplings at all granularity levels, it also leads to a precise characterization of the system modules in terms of their logical coupling dependencies. %B Proceedings of the 2006 international workshop on Mining software repositories %S MSR '06 %I ACM %C New York, NY, USA %P 26–32 %@ 1-59593-397-2 %U http://doi.acm.org/10.1145/1137983.1137992 %R http://doi.acm.org/10.1145/1137983.1137992 %> https://flosshub.org/sites/flosshub.org/files/26TheEvolutionRadar.pdf %0 Conference Paper %B Proceedings of the 2006 international workshop on Mining software repositories %D 2006 %T Examining the evolution of code comments in PostgreSQL %A Zhen Ming Jiang %A Hassan, Ahmed E. 
%K code comments %K comments %K cvs %K evolution %K functions %K maintenance %K mining challenge %K msr challenge %K postgresql %K software evolution %K software maintenance %K source code %X It is common, especially in large software systems, for developers to change code without updating its associated comments due to their unfamiliarity with the code or due to time constraints. This is a potential problem since outdated comments may confuse or mislead developers who perform future development. Using data recovered from CVS, we study the evolution of code comments in the PostgreSQL project. Our study reveals that over time the percentage of commented functions remains constant except for early fluctuation due to the commenting style of a particular active developer. %B Proceedings of the 2006 international workshop on Mining software repositories %S MSR '06 %I ACM %C New York, NY, USA %P 179–180 %@ 1-59593-397-2 %U http://doi.acm.org/10.1145/1137983.1138030 %R http://doi.acm.org/10.1145/1137983.1138030 %> https://flosshub.org/sites/flosshub.org/files/179ExaminingTheEvolution.pdf %0 Conference Paper %B Proceedings of the 2006 international workshop on Mining software repositories %D 2006 %T Fine grained indexing of software repositories to support impact analysis %A Canfora, Gerardo %A Cerulo, Luigi %K argouml %K change analysis %K Firefox %K gedit %K impact analysis %K mining software repositories %K scm %K source code %K version control %X Versioned and bug-tracked software systems provide a huge amount of historical data regarding source code changes and issues management. In this paper we deal with impact analysis of a change request and show that data stored in software repositories are a good descriptor on how past change requests have been resolved. A fine grained analysis method of software repositories is used to index code at different levels of granularity, such as lines of code and source files, with free text contained in software repositories. 
The method exploits information retrieval algorithms to link the change request description and code entities impacted by similar past change requests. We evaluate such approach on a set of three open-source projects. %B Proceedings of the 2006 international workshop on Mining software repositories %S MSR '06 %I ACM %C New York, NY, USA %P 105–111 %@ 1-59593-397-2 %U http://doi.acm.org/10.1145/1137983.1138009 %R http://doi.acm.org/10.1145/1137983.1138009 %> https://flosshub.org/sites/flosshub.org/files/105FineGrained.pdf %0 Conference Paper %B Proceedings of the 2006 international workshop on Mining software repositories %D 2006 %T Geographic location of developers at SourceForge %A Gregorio Robles %A Jesus M. Gonzalez-Barahona %K distributed %K email %K email address %K free software %K geographical location %K geography %K libre software %K mining software repositories %K open source software %K sourceforge %K timezone %X The development of libre (free/open source) software is usually performed by geographically distributed teams. Participation in most cases is voluntary, sometimes sporadic, and often not framed by a pre-defined management structure. This means that anybody can contribute, and in principle no national origin has advantages over others, except for the differences in availability and quality of Internet connections and language. However, differences in participation across regions do exist, although there are little studies about them. In this paper we present some data which can be the basis for some of those studies. We have taken the database of users registered at SourceForge, the largest libre software development web-based platform, and have inferred their geographical locations. For this, we have applied several techniques and heuristics on the available data (mainly e-mail addresses and time zones), which are presented and discussed in detail. 
The results show a snapshot of the regional distribution of SourceForge users, which may be a good proxy of the actual distribution of libre software developers. In addition, the methodology may be of interest for similar studies in other domains, when the available data is similar (as is the case of mailing lists related to software projects). %B Proceedings of the 2006 international workshop on Mining software repositories %S MSR '06 %I ACM %C New York, NY, USA %P 144–150 %@ 1-59593-397-2 %U http://doi.acm.org/10.1145/1137983.1138017 %R http://doi.acm.org/10.1145/1137983.1138017 %> https://flosshub.org/sites/flosshub.org/files/144GeographicLocation.pdf %0 Journal Article %J Knowledge, Technology & Policy %D 2006 %T Hierarchy and centralization in Free and Open Source Software team communications %A Kevin Crowston %A Howison, James %K apache %K bug fixing %K bug tracking %K FLOSS %K project success %K savannah %K social network analysis %K sourceforge %K team size %K teams %X Free/Libre Open Source Software (FLOSS) development teams provide an interesting and convenient setting for studying distributed work. We begin by answering perhaps the most basic question: what is the social structure of these teams? Based on a social network analysis of interactions represented in 62,110 bug reports from 122 large and active projects, we find that some OSS teams are highly centralized, but contrary to expectation, others are not. Projects are mostly quite hierarchical on four measures of hierarchy, consistent with past research but contrary to the popular image of these projects. Furthermore, we find that the level of centralization is negatively correlated with project size, suggesting that larger projects become more modular. The paper makes a further methodological contribution by identifying appropriate analysis approaches for interaction data. We conclude by sketching directions for future research. 
%B Knowledge, Technology & Policy %V 18 %P 65–85 %> https://flosshub.org/sites/flosshub.org/files/CrowstonHierarchyAndCentralization.pdf %0 Journal Article %J Information and Software Technology %D 2006 %T Identifying Knowledge Brokers that Yield Software Engineering Knowledge in OSS Projects %A Sowe, Sulayman K. %A Ioannis Stamelos %A Lefteris Angelis %K debian %K email %K email archives %K expertise %K knowledge sharing %K mailing list %K project success %K social network analysis %X Much research on open source software development concentrates on developer lists and other software repositories to investigate what motivates professional software developers to participate in open source software projects. Little attention has been paid to individuals who spend valuable time in lists helping participants on some mundane yet vital project activities. Using three Debian lists as a case study we investigate the impact of knowledge brokers and their associated activities in open source projects. Social network analysis was used to visualize how participants are affiliated with the lists. The network topology reveals substantial community participation. The consequence of collaborating in mundane activities for the success of open source software projects is discussed. The direct beneficiaries of this research are in the identification of knowledge experts in open source software projects. %B Information and Software Technology %V 48 %P 1025-1033 %8 11/2006 %G eng %R 10.1016/j.infsof.2005.12.019 %> https://flosshub.org/sites/flosshub.org/files/IST-Vol-48-11-2006.pdf %0 Journal Article %J MIS Quarterly %D 2006 %T The Impact of Ideology on Effectiveness in Open Source Software Development Teams %A Stewart, K. %A Gosain, S. 
%K bug fixing %K bug reports %K bug tracking %K communication %K COMMUNITY %K effectiveness %K feature requests %K ideology %K metadata %K sourceforge %K Survey %K team effort %K team size %K trust %X The emerging work on understanding open source software has argued for the importance of understanding what leads to effectiveness in OSS development teams and has pointed to the importance of ideology. This paper develops a framework of the OSS ideology (including specific norms, beliefs, and values) and a theoretical model to show how adherence to components of the ideology impact effectiveness in OSS teams. The model is based on the idea that ideology provides clan control, which is important in OSS development settings because OSS teams generally lack formal behavioral and outcome controls. The paper hypothesizes both direct effects of ideology on OSS team effectiveness and indirect effects via influences on affective trust, cognitive trust, and communication quality. Hypotheses are tested using survey and objective data on OSS projects. Four effectiveness measures are used to capture unique aspects of effectiveness in OSS including both the extent to which a team attracts input from the community and the team's success in accomplishing project outcomes. Results support the main thesis that OSS team members' adherence to the tenets of the OSS community ideology enhances OSS team effectiveness. The study uncovers several differences in the importance of OSS norms, beliefs, and values to different kinds of OSS team effectiveness and discusses implications for theory and practice. 
%B MIS Quarterly %V 30 %P 291-314 %8 2006 %G eng %> https://flosshub.org/sites/flosshub.org/files/stewartgosain2.pdf %0 Conference Paper %B OSS2006: Open Source Systems (IFIP 2.13) %D 2006 %T Impact of Social Ties on Open Source Project Team Formation %A Hahn, Jungpil %A Moon, Jae %A Zhang, Chen %K developers %K metadata %K social network analysis %K sourceforge %X In this paper, we empirically examined the role of social ties in OSSD team formation and developer joining behavior. We find that the existence and the amount of prior social relations in the network do increase the probability of an OSS project to attract more developers. Interestingly, for projects without preexisting social ties, developers tend to join the project initiated by people with less OSSD experience. This research fills a gap in the open source literature by conducting an empirical investigation of the role of social relations on project team formation behavior. Furthermore, the adoption of social network analysis, which has received little attention in the OSS literature, can yield some interesting results on the interactions among OSS developers. 
%B OSS2006: Open Source Systems (IFIP 2.13) %S IFIP International Federation for Information Processing %I Springer %V 203/2006 %P 307 - 317 %8 2006/// %G eng %R http://dx.doi.org/10.1007/0-387-34226-5_31 %> https://flosshub.org/sites/flosshub.org/files/Impact%20of%20Social%20Ties%20on%20Open%20Source%20Project.pdf %0 Journal Article %J Software Process–Improvement and Practice %D 2006 %T Information systems success in Free and Open Source Software development: Theory and measures %A Kevin Crowston %A Howison, James %A Hala Annabi %K bug fixing %K developers %K downloads %K FLOSS %K flossmole %K page views %K popularity %K project success %K size %K sourceforge %K success %K team size %X Information systems success is one of the most widely used dependent variables in information systems (IS) research, but research on Free/Libre and Open Source software (FLOSS) often fails to appropriately conceptualize this important concept. In this paper, we reconsider what success means within a FLOSS context. We first review existing models of IS success and success variables used in FLOSS research and assess them for their usefulness, practicality and fit to the FLOSS context. Then, drawing on a theoretical model of group effectiveness in the FLOSS development process, as well as an online discussion group with developers, we present additional concepts that are central to an appropriate understanding of success for FLOSS. In order to examine the practicality and validity of this conceptual scheme, the second half of our paper presents an empirical study that demonstrates its operationalization of the chosen measures and assesses their internal validity. We use data from SourceForge to measure the project’s effectiveness in team building, the speed of the project at responding to bug reports and the project’s popularity. 
We conclude by discussing the implications of this study for our proposed extension of IS success in the context of FLOSS development and highlight future directions for research. %B Software Process–Improvement and Practice %V 11 %P 123–148 %R 10.1002/spip.259 %> https://flosshub.org/sites/flosshub.org/files/CrowstonHowisonAnnabi2006.pdf %0 Conference Paper %B Proceedings of the 2006 international workshop on Mining software repositories %D 2006 %T Information theoretic evaluation of change prediction models for large-scale software %A Askari, Mina %A Holt, Ric %K bugs %K change analysis %K cvs %K evaluation approach %K file %K freebsd %K information theory %K kde %K koffice %K log files %K netbsd %K openbsd %K postgresql %K prediction %K prediction models %K scm %K source code %X In this paper, we analyze the data extracted from several open source software repositories. We observe that the change data follows a Zipf distribution. Based on the extracted data, we then develop three probabilistic models to predict which files will have changes or bugs. The first model is Maximum Likelihood Estimation (MLE), which simply counts the number of events, i.e., changes or bugs, that happen to each file and normalizes the counts to compute a probability distribution. The second model is Reflexive Exponential Decay (RED) in which we postulate that the predictive rate of modification in a file is incremented by any modification to that file and decays exponentially. The third model is called RED-Co-Change. With each modification to a given file, the RED-Co-Change model not only increments its predictive rate, but also increments the rate for other files that are related to the given file through previous co-changes. We then present an information-theoretic approach to evaluate the performance of different prediction models. In this approach, the closeness of model distribution to the actual unknown probability distribution of the system is measured using cross entropy. 
We evaluate our prediction models empirically using the proposed information-theoretic approach for six large open source systems. Based on this evaluation, we observe that of our three prediction models, the RED-Co-Change model predicts the distribution that is closest to the actual distribution for all the studied systems. %B Proceedings of the 2006 international workshop on Mining software repositories %S MSR '06 %I ACM %C New York, NY, USA %P 126–132 %@ 1-59593-397-2 %U http://doi.acm.org/10.1145/1137983.1138013 %R http://doi.acm.org/10.1145/1137983.1138013 %> https://flosshub.org/sites/flosshub.org/files/126InformationTheoretic.pdf %0 Journal Article %J International Journal of Information Technology and Web Engineering %D 2006 %T Integration of libre software applications to create a collaborative work platform for researchers at GET %A Olivier Berger %A Christian Bac %A Benoit Hamet %K collaborative work environment %K contribution %K free software %K groupware %K in-house applications %K libre software %K open source software %K OpenLDAP %K phpGroupware %K PicoLibre %K ProGET %K Sympa %K TWiki %K WebDAV %K wiki %X Libre software provides powerful applications ready to be integrated for the build-up of platforms for internal use in organizations. We describe the architecture of the collaborative work platform which we have integrated, designed for researchers at GET. We present the elements we have learned during this project in particular with respect to contribution to external libre projects, in order to better ensure the maintainability of the internal applications, and to phpGroupware as a framework for specific applications development. 
%B International Journal of Information Technology and Web Engineering %I IGI Global %V 1 %P 1-16 %8 07/2006 %0 Conference Paper %B Proceedings of the 38th Annual Hawaii International Conference on System Sciences %D 2006 %T Knowledge Reuse in Open Source Software: An Exploratory Study of 15 Open Source Projects %A von Krogh, G. %A Spaeth, S. %A Haefliger, S. %K cvs %K email %K knowledge reuse %K lines of code %K loc %K source code %K Survey %X To date, there is no investigation of knowledge reuse in open source software projects. This paper focuses on the forms of knowledge reuse and the factors impacting on them. It develops a theory drawn from data of 15 open source software projects and finds that the effort to search, integrate and maintain external knowledge influences the form of knowledge to be reused. Implications for firms and innovation research are discussed. %B Proceedings of the 38th Annual Hawaii International Conference on System Sciences %I IEEE %C Big Island, HI, USA %P 1-10 %8 2006 %U http://www.computer.org/csdl/proceedings/hicss/2005/2268/07/22680198b-abs.html %R 10.1109/HICSS.2005.378 %0 Journal Article %J Management Science %D 2006 %T Location, Location, Location: How Network Embeddedness Affects Project Success in Open Source Systems %A Grewal, Rajdeep %A Lilien, Gary L. %A Mallapragada, Girish %K affiliation network %K age %K developers %K latent class analysis %K network embeddedness %K open source software %K page views %K perl %K project success %K registration %K sourceforge %X The community-based model for software development in open source environments is becoming a viable alternative to traditional firm-based models. To better understand the workings of open source environments, we examine the effects of network embeddedness---or the nature of the relationship among projects and developers---on the success of open source projects. 
We find that considerable heterogeneity exists in the network embeddedness of open source projects and project managers. We use a visual representation of the affiliation network of projects and developers as well as a formal statistical analysis to demonstrate this heterogeneity and to investigate how these structures differ across projects and project managers. Our main results surround the effect of this differential network embeddedness on project success. We find that network embeddedness has strong and significant effects on both technical and commercial success, but that those effects are quite complex. We use latent class regression analysis to show that multiple regimes exist and that some of the effects of network embeddedness are positive under some regimes and negative under others. We use project age and number of page views to provide insights into the direction of the effect of network embeddedness on project success. Our findings show that different aspects of network embeddedness have powerful but subtle effects on project success and suggest that this is a rich environment for further study. %B Management Science %I INFORMS %C Institute for Operations Research and the Management Sciences (INFORMS), Linthicum, Maryland, USA %V 52 %P 1043–1056 %8 July %U http://portal.acm.org/citation.cfm?id=1246148.1246155 %R 10.1287/mnsc.1060.0550 %0 Journal Article %J J. Syst. Softw. %D 2006 %T Maintainability of the kernels of open-source operating systems: A comparison of Linux with FreeBSD, NetBSD, and OpenBSD %A Yu, Liguo %A Schach, Stephen R. %A Chen, Kai %A Heller, Gillian Z. %A Offutt, Jeff %K abiword %K Common coupling %K coupling %K Definition-use analysis %K freebsd %K kernel %K lines of code %K linux %K linux kernel %K loc %K Maintainability %K modules %K netbsd %K Open-source software %K openbsd %K source code %X We compared and contrasted the maintainability of four open-source operating systems: Linux, FreeBSD, NetBSD, and OpenBSD. 
We used our categorization of common coupling in kernel-based software to highlight future maintenance problems. An unsafe definition is a definition of a global variable that can affect a kernel module if that definition is changed. For each operating system we determined a number of measures, including the number of global variables, the number of instances of global variables in the kernel and overall, as well as the number of unsafe definitions in the kernel and overall. We also computed the value of each of our measures per kernel KLOC and per KLOC overall. For every measure and every ratio, Linux compared unfavorably with FreeBSD, NetBSD, and OpenBSD. Accordingly, we are concerned about the future maintainability of Linux. %B J. Syst. Softw. %I Elsevier Science Inc. %C New York, NY, USA %V 79 %P 807–815 %8 June %U http://dx.doi.org/10.1016/j.jss.2005.08.014 %R http://dx.doi.org/10.1016/j.jss.2005.08.014 %> https://flosshub.org/sites/flosshub.org/files/YuSchachChen.pdf %0 Conference Paper %B Proceedings of the 2006 international workshop on Mining software repositories %D 2006 %T MAPO: mining API usages from open source repositories %A Xie, Tao %A Pei, Jian %K api %K application programming interfaces %K documentation %K mining software repositories %K pmd %K program comprehension %K search engine %K sequences %K source code %K source code search engine %X To improve software productivity, when constructing new software systems, developers often reuse existing class libraries or frameworks by invoking their APIs. Those APIs, however, are often complex and not well documented, posing barriers for developers to use them in new client code. To get familiar with how those APIs are used, developers may search the Web using a general search engine to find relevant documents or code examples. Developers can also use a source code search engine to search open source repositories for source files that use the same APIs. 
Nevertheless, the number of returned source files is often large. It is difficult for developers to learn API usages from a large number of returned results. In order to help developers understand API usages and write API client code more effectively, we have developed an API usage mining framework and its supporting tool called MAPO (for Mining API usages from Open source repositories). Given a query that describes a method, class, or package for an API, MAPO leverages the existing source code search engines to gather relevant source files and conducts data mining. The mining leads to a short list of frequent API usages for developers to inspect. MAPO currently consists of five components: a code search engine, a source code analyzer, a sequence preprocessor, a frequent sequence miner, and a frequent sequence post processor. We have examined the effectiveness of MAPO using a set of various queries. The preliminary results show that the framework is practical for providing informative and succinct API usage patterns. %B Proceedings of the 2006 international workshop on Mining software repositories %S MSR '06 %I ACM %C New York, NY, USA %P 54–57 %@ 1-59593-397-2 %U http://doi.acm.org/10.1145/1137983.1137997 %R http://doi.acm.org/10.1145/1137983.1137997 %> https://flosshub.org/sites/flosshub.org/files/54MAPO.pdf %0 Conference Paper %B Proceedings of the 2006 international workshop on Mining software repositories %D 2006 %T Micro pattern evolution %A Kim, Sunghun %A Pan, Kai %A Whitehead,Jr., E. James %K argouml %K bugs %K columba %K design patterns %K evolution %K extraction %K java %K jedit %K source code %X When analyzing the evolution history of a software project, we wish to develop results that generalize across projects. One approach is to analyze design patterns, permitting characteristics of the evolution to be associated with patterns, instead of source code. 
Traditional design patterns are generally not amenable to reliable automatic extraction from source code, yet automation is crucial for scalable evolution analysis. Instead, we analyze “micro pattern” evolution; patterns whose abstraction level is closer to source code, and designed to be automatically extractable from Java source code or bytecode. We perform micro-pattern evolution analysis on three open source projects, ArgoUML, Columba, and jEdit to identify micro pattern frequencies, common kinds of pattern evolution, and bug-prone patterns. In all analyzed projects, we found that the micro patterns of Java classes do not change often. Common bug-prone pattern evolution kinds are ‘Pool → Pool’, ‘Implementor → NONE’, and ‘Sampler → Sampler’. Among all pattern evolution kinds, ‘Box’, ‘CompoundBox’, ‘Pool’, ‘CommonState’, and ‘Outline’ micro patterns have high bug rates, but they have low frequencies and a small number of changes. The pattern evolution kinds that are bug-prone are somewhat similar across projects. The bug-prone pattern evolution kinds of two different periods of the same project are almost identical. %B Proceedings of the 2006 international workshop on Mining software repositories %S MSR '06 %I ACM %C New York, NY, USA %P 40–46 %@ 1-59593-397-2 %U http://doi.acm.org/10.1145/1137983.1137995 %R http://doi.acm.org/10.1145/1137983.1137995 %> https://flosshub.org/sites/flosshub.org/files/40MicroPattern.pdf %0 Conference Paper %B Proceedings of the 2006 international workshop on Mining software repositories %D 2006 %T Mining additions of method calls in ArgoUML %A Zimmermann, Thomas %A Breu, Silvia %A Lindig, Christian %A Livshits, Benjamin %K argouml %K change analysis %K eclipse %K function calls %K mining challenge %K msr challenge %K pattern %K source code %K xelopes %X In this paper we refine the classical co-change to the addition of method calls. We use this concept to find usage patterns and to identify cross-cutting concerns for ArgoUML. 
%B Proceedings of the 2006 international workshop on Mining software repositories %S MSR '06 %I ACM %C New York, NY, USA %P 169–170 %@ 1-59593-397-2 %U http://doi.acm.org/10.1145/1137983.1138025 %R http://doi.acm.org/10.1145/1137983.1138025 %> https://flosshub.org/sites/flosshub.org/files/169MiningAdditions.pdf %0 Conference Paper %B Proceedings of the 2006 international workshop on Mining software repositories %D 2006 %T Mining eclipse for cross-cutting concerns %A Breu, Silvia %A Zimmermann, Thomas %A Lindig, Christian %K aspects %K concept analysis %K cvs %K eclipse %K source code %X Software may contain functionality that does not align with its architecture. Such cross-cutting concerns do not exist from the beginning but emerge over time. By analysing where developers add code to a program, our history-based mining identifies cross-cutting concerns in a two-step process. First, we mine CVS archives for sets of methods where a call to a specific single method was added. In a second step, such simple cross-cutting concerns are combined to complex cross-cutting concerns. To compute these efficiently, we apply formal concept analysis—an algebraic theory. History-based mining scales well: we are the first to report aspects mined from an industrial-sized project like Eclipse. For example, we identified a locking concern that crosscuts 1284 methods. 
%B Proceedings of the 2006 international workshop on Mining software repositories %S MSR '06 %I ACM %C New York, NY, USA %P 94–97 %@ 1-59593-397-2 %U http://doi.acm.org/10.1145/1137983.1138006 %R http://doi.acm.org/10.1145/1137983.1138006 %> https://flosshub.org/sites/flosshub.org/files/94MiningEclipse.pdf %0 Conference Paper %B Proceedings of the 2006 international workshop on Mining software repositories %D 2006 %T Mining email social networks %A Christian Bird %A Gourley, Alex %A Devanbu, Prem %A Gertz, Michael %A Swaminathan, Anand %K communication %K contributions %K developers %K email %K email archives %K mailing lists %K open source %K social networks %X Communication & Co-ordination activities are central to large software projects, but are difficult to observe and study in traditional (closed-source, commercial) settings because of the prevalence of informal, direct communication modes. OSS projects, on the other hand, use the internet as the communication medium, and typically conduct discussions in an open, public manner. As a result, the email archives of OSS projects provide a useful trace of the communication and co-ordination activities of the participants. However, there are various challenges that must be addressed before this data can be effectively mined. Once this is done, we can construct social networks of email correspondents, and begin to address some interesting questions. These include questions relating to participation in the email; the social status of different types of OSS participants; the relationship of email activity and commit activity (in the CVS repositories) and the relationship of social status with commit activity. In this paper, we begin with a discussion of our infrastructure (including a novel use of Scientific Workflow software) and then discuss our approach to mining the email archives; and finally we present some preliminary results from our data analysis. 
%B Proceedings of the 2006 international workshop on Mining software repositories %S MSR '06 %I ACM %C New York, NY, USA %P 137–143 %@ 1-59593-397-2 %U http://doi.acm.org/10.1145/1137983.1138016 %R http://doi.acm.org/10.1145/1137983.1138016 %> https://flosshub.org/sites/flosshub.org/files/137MiningEmail.pdf %0 Conference Paper %B Proceedings of the 2006 international workshop on Mining software repositories %D 2006 %T Mining email social networks in Postgres %A Christian Bird %A Gourley, Alex %A Devanbu, Prem %A Gertz, Michael %A Swaminathan, Anand %K developers %K email %K email archives %K open source %K postgresql %K scm %K social network analysis %K social networks %K source code %K status %X Open Source Software (OSS) projects provide a unique opportunity to gather and analyze publicly available historical data. The Postgres SQL server, for example, has over seven years of recorded development and communication activity. We mined data from both the source code repository and the mailing list archives to examine the relationship between communication and development in Postgres. Along the way, we had to deal with the difficult challenge of resolving email aliases. We used a number of social network analysis measures and statistical techniques to analyze this data. We present our findings in this paper. %B Proceedings of the 2006 international workshop on Mining software repositories %S MSR '06 %I ACM %C New York, NY, USA %P 185–186 %@ 1-59593-397-2 %U http://doi.acm.org/10.1145/1137983.1138033 %R http://doi.acm.org/10.1145/1137983.1138033 %> https://flosshub.org/sites/flosshub.org/files/185MiningEmail.pdf %0 Conference Paper %B Proceedings of the 2006 international workshop on Mining software repositories %D 2006 %T Mining large software compilations over time: another perspective of software evolution %A Gregorio Robles %A Jesus M. 
Gonzalez-Barahona %A Martin Michlmayr %A Amor, Juan Jose %K debian %K distributions %K evolution %K large software collections %K lines of code %K loc %K metrics %K mining software repositories %K size %K sloc %K sloccount %K software evolution %K software integrators %X With the success of libre (free, open source) software, a new type of software compilation has become increasingly common. Such compilations, often referred to as 'distributions', group hundreds, if not thousands, of software applications and libraries written by independent parties into an integrated system. Software compilations raise a number of questions that have not been targeted so far by software evolution, which usually focuses on the evolution of single applications. Undoubtedly, the challenges that software compilations face differ from those found in single software applications. Nevertheless, it can be assumed that both, the evolution of applications and that of software compilations, have similarities and dependencies. In this sense, we identify a dichotomy, common to that in economics, of software evolution in the small (micro-evolution) and in the large (macro-evolution). The goal of this paper is to study the evolution of a large software compilation, mining the publicly available repository of a well-known Linux distribution, Debian. We will therefore investigate changes related to hundreds of millions of lines of code over seven years. The aspects that will be covered in this paper are size (in terms of number of packages and of number of lines of code), use of programming languages, maintenance of packages and file sizes. 
%B Proceedings of the 2006 international workshop on Mining software repositories %S MSR '06 %I ACM %C New York, NY, USA %P 3–9 %@ 1-59593-397-2 %U http://doi.acm.org/10.1145/1137983.1137986 %R http://doi.acm.org/10.1145/1137983.1137986 %> https://flosshub.org/sites/flosshub.org/files/3miningLarge.pdf %0 Conference Paper %B Proceedings of the 2006 international workshop on Mining software repositories %D 2006 %T Mining sequences of changed-files from version histories %A Kagdi, Huzefa %A Yusuf, Shehnaaz %A Maletic, Jonathan I. %K change %K change history %K change management %K change sequences %K heuristics %K kde %K mining software repositories %K scm %K sequences %K source code %X Modern source-control systems, such as Subversion, preserve change-sets of files as atomic commits. However, the specific ordering information in which files were changed is typically not found in these source-code repositories. In this paper, a set of heuristics for grouping change-sets (i.e., log-entries) found in source-code repositories is presented. Given such groups of change-sets, sequences of files that frequently change together are uncovered. This approach not only gives the (unordered) sets of files but supplements them with (partial temporal) ordering information. The technique is demonstrated on a subset of KDE source-code repository. The results show that the approach is able to find sequences of changed-files. 
%B Proceedings of the 2006 international workshop on Mining software repositories %S MSR '06 %I ACM %C New York, NY, USA %P 47–53 %@ 1-59593-397-2 %U http://doi.acm.org/10.1145/1137983.1137996 %R http://doi.acm.org/10.1145/1137983.1137996 %> https://flosshub.org/sites/flosshub.org/files/47MiningSequences.pdf %0 Conference Paper %B Proceedings of the 2006 international workshop on Mining software repositories %D 2006 %T Mining software repositories with CVSgrab %A Voinea, Lucian %A Telea, Alexandru %K argouml %K cvs %K cvsgrab %K evolution %K mining challenge %K msr challenge %K postgresql %K software visualization %K source code %K team %K visualization %B Proceedings of the 2006 international workshop on Mining software repositories %S MSR '06 %I ACM %C New York, NY, USA %P 167–168 %@ 1-59593-397-2 %U http://doi.acm.org/10.1145/1137983.1138024 %R http://doi.acm.org/10.1145/1137983.1138024 %> https://flosshub.org/sites/flosshub.org/files/167MiningSoftware.pdf %0 Conference Paper %B Proceedings of the 2006 international workshop on Mining software repositories %D 2006 %T Mining version archives for co-changed lines %A Zimmermann, Thomas %A Kim, Sunghun %A Zeller, Andreas %A Whitehead,Jr., E. James %K change %K change analysis %K change management %K graph %K lines of code %K source code %X Files, classes, or methods have frequently been investigated in recent research on co-change. In this paper, we present a first study at the level of lines. To identify line changes across several versions, we define the annotation graph which captures how lines evolve over time. The annotation graph provides more fine-grained software evolution information such as life cycles of each line and related changes: "Whenever a developer changed line 1 of version.txt she also changed line 25 of Library.java." 
%B Proceedings of the 2006 international workshop on Mining software repositories %S MSR '06 %I ACM %C New York, NY, USA %P 72–75 %@ 1-59593-397-2 %U http://doi.acm.org/10.1145/1137983.1138001 %R http://doi.acm.org/10.1145/1137983.1138001 %> https://flosshub.org/sites/flosshub.org/files/72MiningVersionArchives.pdf %0 Journal Article %J Intern. J. Internet Technology and Web Engineering %D 2006 %T Multi-Modal Modeling, Analysis and Validation of Open Source Software Development Processes %A Walt Scacchi %A Chris Jensen %A Noll, J. %A Elliott, M. %K empirical studies of software engineering %K open source software development %K process modeling %K requirements processes %K software process %X Understanding the context, structure, activities, and content of software development processes found in practice has been and remains a challenging problem. In the world of free/open source software development, discovering and understanding what processes are used in particular projects is important in determining how they are similar to or different from those advocated by the software engineering community. Prior studies have revealed that development processes in F/OSSD projects are different in a number of ways. In this paper, we describe how a variety of modeling perspectives and techniques are used to elicit, analyze, and validate software development processes found in F/OSSD projects, with examples drawn from studies of the software requirements process found in the NetBeans.org project. %B Intern. J. 
Internet Technology and Web Engineering %V 1 %P 49-63 %G eng %> https://flosshub.org/sites/flosshub.org/files/Scacchi-Jensen-Noll-Elliott-OSSC05.pdf %0 Conference Paper %B Proceedings of the 2006 international workshop on Mining software repositories %D 2006 %T An open framework for CVS repository querying, analysis and visualization %A Voinea, Lucian %A Telea, Alexandru %K argouml %K cvs %K cvsgrab %K evolution visualization %K postgresql %K software visualization %X We present an open framework for visual mining of CVS software repositories. We address three aspects: data extraction, analysis and visualization. We first discuss the challenges of CVS data extraction and storage, and propose a flexible way to deal with CVS implementation inconsistencies. We next present a new technique to enrich the raw data with information about artifacts showing similar evolution. Finally, we propose a visualization backend and show its applicability on industry-size repositories. %B Proceedings of the 2006 international workshop on Mining software repositories %S MSR '06 %I ACM %C New York, NY, USA %P 33–39 %@ 1-59593-397-2 %U http://doi.acm.org/10.1145/1137983.1137993 %R http://doi.acm.org/10.1145/1137983.1137993 %0 Journal Article %J Statistical Science %D 2006 %T Opportunities and Challenges Applying Functional Data Analysis to the Study of Open Source Software Evolution %A Stewart, Katherine J. %A Darcy, David P. %A Daniel, Sherae L. %K complexity %K evolution %K fda %K java %K lines of code %K loc %K release history %K scm %K size %K sourceforge %X This paper explores the application of functional data analysis (FDA) as a means to study the dynamics of software evolution in the open source context. Several challenges in analyzing the data from software projects are discussed, an approach to overcoming those challenges is described, and preliminary results from the analysis of a sample of open source software (OSS) projects are provided. 
The results demonstrate the utility of FDA for uncovering and categorizing multiple distinct patterns of evolution in the complexity of OSS projects. These results are promising in that they demonstrate some patterns in which the complexity of software decreased as the software grew in size, a particularly novel result. The paper reports preliminary explorations of factors that may be associated with decreasing complexity patterns in these projects. The paper concludes by describing several next steps for this research project as well as some questions for which more sophisticated analytical techniques may be needed. %B Statistical Science %I Institute of Mathematical Statistics %V 21 %P 167-178 %U http://www.jstor.org/stable/27645747 %0 Conference Paper %B OSS2006: Open Source Systems (IFIP 2.13) %D 2006 %T Participation in Free and Open Source Communities: An Empirical Study of Community Members’ Perceptions %A Schofield, Andrew %A Cooper, Grahame %K Survey %X Although the defining factors of Free and Open Source Software (FOSS) are generally seen as the availability and accessibility of the source code, it is what these facilitate that is perhaps of more significance. Source code availability allows the sharing of code, skills, knowledge, and effort, focused on a particular piece of software under development. The result of this is the FOSS community, which although often perceived as a single group, is actually many small groups, each bound by a common interest in a particular piece of software and using the Internet as a communication medium. Although there have been studies focusing on the motivation of FOSS developers to contribute to software, there has been little investigation into the motives, attitudes, and the culture within the communities as a whole. There is much more to most of these communities than software development. Many also have extensive support networks for the use of software, portals for research, and social facilities. 
This paper describes the results of an investigation into how FOSS community members perceive the communities that they belong to, their reasons for being in the community, and the manner in which they participate. %B OSS2006: Open Source Systems (IFIP 2.13) %S IFIP International Federation for Information Processing %I Springer %P 221 - 231 %G eng %R http://dx.doi.org/10.1007/0-387-34226-5_22 %> https://flosshub.org/sites/flosshub.org/files/Participation%20in%20free%20and%20open%20source.pdf %0 Conference Paper %B Proceedings of the 2006 international workshop on Mining software repositories %D 2006 %T Predicting defect densities in source code files with decision tree learners %A Knab, Patrick %A Pinzger, Martin %A Bernstein, Abraham %K change analysis %K data mining %K decision tree learner %K defect density %K defect prediction %K mozilla %K prediction %K release history %K scm %K source code %K version control %X With the advent of open source software repositories the data available for defect prediction in source files increased tremendously. Although traditional statistics turned out to derive reasonable results, the sheer amount of data and the problem context of defect prediction demand sophisticated analysis such as provided by current data mining and machine learning techniques. In this work we focus on defect density prediction and present an approach that applies a decision tree learner on evolution data extracted from the Mozilla open source web browser project. The evolution data includes different source code, modification, and defect measures computed from seven recent Mozilla releases. Among the modification measures we also take into account the change coupling, a measure for the number of change-dependencies between source files. The main reason for choosing decision tree learners, instead of for example neural nets, was the goal of finding underlying rules which can be easily interpreted by humans. 
To find these rules, we set up a number of experiments to test common hypotheses regarding defects in software entities. Our experiments showed that a simple tree learner can produce good results with various sets of input data. %B Proceedings of the 2006 international workshop on Mining software repositories %S MSR '06 %I ACM %C New York, NY, USA %P 119–125 %@ 1-59593-397-2 %U http://doi.acm.org/10.1145/1137983.1138012 %R http://doi.acm.org/10.1145/1137983.1138012 %> https://flosshub.org/sites/flosshub.org/files/119Predicting.pdf %0 Conference Paper %B 1st Workshop on Public Data about Software Development (WoPDaSD 2006) %D 2006 %T Regurgitate: Using GIT For F/LOSS Data Collection %A Bart Massey %A Keith Packard %K cvs %K cvsanaly %K git %K history %K promise %K regurgitate %K scm %X We have created a new tool, regurgitate, for importing CVS repositories into the GIT source code management system. Important features of GIT include great expressiveness in capturing relationships between revisions and across files as well as extremely high-speed processing. These features make GIT an ideal platform for gathering detailed longitudinal metrics for open source projects. The availability of regurgitate facilitates using GIT as an analysis tool for the majority of open source projects that keep their repositories in CVS. In particular, GIT is fast enough that it is practical to replay the entire development history of a project commit-at-a-time, collecting metrics at each step. We demonstrate this process for a simple metric and a collection of benchmark F/LOSS repositories. %B 1st Workshop on Public Data about Software Development (WoPDaSD 2006) %> https://flosshub.org/sites/flosshub.org/files/massey.pdf %0 Journal Article %J IEEE Intelligent Systems %D 2006 %T Self-Organization Patterns in Wasp and Open Source Communities %A Valverde, S. %A Theraulaz, G. %A Gautrais, J. %A Fourcassie, V. %A Sole, R.V. 
%K agents %K decentralization %K developers %K email %K email archives %K flossmole %K hierarchy %K labor division %K organization %K self-organizing teams %K social network analysis %K social networks %K sourceforge %K teams %K wasps %X In this paper, we conducted a comparative study of how social organization takes place in a wasp colony and OSS developer communities. Both these systems display similar global organization patterns, such as hierarchies and clear labor divisions. As our analysis shows, both systems also define interacting agent networks with similar common features that reflect limited information sharing among agents. As far as we know, this is the first research study analyzing the patterns and functional significance of these systems' weighted-interaction networks. By illuminating the extent to which self-organization is responsible for patterns such as hierarchical structure, we can gain insight into the origins of organization in OSS communities. %B IEEE Intelligent Systems %V 21 %P 36 - 40 %8 03/2006 %U http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.95.5574&rep=rep1&type=pdf %N 2 %! IEEE Intell. Syst. %R 10.1109/MIS.2006.34 %> https://flosshub.org/sites/flosshub.org/files/valverde.pdf %0 Conference Paper %B OSS2006: Open Source Systems (IFIP 2.13) %D 2006 %T Social dynamics of free and open source team communications %A Howison, James %A Inoue, Keisuke %A Kevin Crowston %K bug fixing %K bug reports %K bug tracker %K bug tracking %K bugs %K communications %K Dynamic social networks %K FLOSS teams %K Human Factors %K social network analysis %K software development %K sourceforge %X This paper furthers inquiry into the social structure of free and open source software (FLOSS) teams by undertaking social network analysis across time. Contrary to expectations, we confirmed earlier findings of a wide distribution of centralizations even when examining the networks over time. 
The paper also provides empirical evidence that while change at the center of FLOSS projects is relatively uncommon, participation across the project communities is highly skewed, with many participants appearing for only one period. Surprisingly, large project teams are not more likely to undergo change at their centers. %B OSS2006: Open Source Systems (IFIP 2.13) %S IFIP International Federation for Information Processing %I Springer %V 203/2006 %P 319 - 330 %8 06/2006 %G eng %R http://dx.doi.org/10.1007/0-387-34226-5_32 %> https://flosshub.org/sites/flosshub.org/files/Social%20dynamics%20of%20free%20and%20open%20source%20team.pdf %0 Conference Paper %B Proceedings of the 2006 international workshop on Mining software repositories %D 2006 %T A study of the contributors of PostgreSQL %A Daniel M. German %K contributions %K contributors %K cvs %K developers %K mining challenge %K mining software repositories %K msr challenge %K patches %K postgresql %K revision history %K roles %K software evolution %K source code %K team %X This report describes some characteristics of the development team of PostgreSQL that were uncovered by analyzing the history of its software artifacts as recorded by the project's CVS repository. %B Proceedings of the 2006 international workshop on Mining software repositories %S MSR '06 %I ACM %C New York, NY, USA %P 163–164 %@ 1-59593-397-2 %U http://doi.acm.org/10.1145/1137983.1138022 %R http://doi.acm.org/10.1145/1137983.1138022 %> https://flosshub.org/sites/flosshub.org/files/163AStudyOf.pdf %0 Conference Paper %B Proceedings of the 38th Annual Hawaii International Conference on System Sciences %D 2006 %T A Topological Analysis of the Open Souce Software Development Community %A Jin Xu %A Gao, Yongqin %A Christley, S. %A Madey, G. 
%K contributors %K developers %K roles %K social network analysis %K social networks %K sourceforge %K srda %K users %X The fast growth of OSS has increased the interest in studying the composition of the OSS community and its collaboration mechanisms. Moreover, the success of a project may be related to the underlying social structure of the OSS development community. In this paper, we perform a quantitative analysis of Open Source Software developers by studying the entire development community at SourceForge [26]. Statistics and social network properties are explored to find collaborations and the effects of different members in the OSS development community. Small world phenomenon and scale free behaviors are found in the SourceForge development network. These topological properties may potentially explain the success and efficiency of OSS development practices. We also infer from our analysis that weakly associated but contributing co-developers and active users may be an important factor in OSS development. %B Proceedings of the 38th Annual Hawaii International Conference on System Sciences %I IEEE %C Big Island, HI, USA %P 1-10 %U http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.132.6830&rep=rep1&type=pdf %R 10.1109/HICSS.2005.57 %> https://flosshub.org/sites/flosshub.org/files/xuGao.pdf %0 Journal Article %J Management Science %D 2006 %T Understanding the Motivations, Participation, and Performance of Open Source Software Developers: A Longitudinal Study of the Apache Projects %A Roberts, Jeffrey A. %A Il-Horn Hann %A Slaughter, Sandra A. %K apache %K change logs %K contributions %K email %K email archives %K extrinsic motivation %K intrinsic motivation %K mailing lists %K MOTIVATION %K open source software %K participation %K software development performance %K source code %K status %K Survey %X Understanding what motivates participation is a central theme in the research on open source software (OSS) development. 
Our study contributes by revealing how the different motivations of OSS developers are interrelated, how these motivations influence participation leading to performance, and how past performance influences subsequent motivations. Drawing on theories of intrinsic and extrinsic motivation, we develop a theoretical model relating the motivations, participation, and performance of OSS developers. We evaluate our model using survey and archival data collected from a longitudinal field study of software developers in the Apache projects. Our results reveal several important findings. First, we find that developers’ motivations are not independent but rather are related in complex ways. Being paid to contribute to Apache projects is positively related to developers’ status motivations but negatively related to their use-value motivations. Perhaps surprisingly, we find no evidence of diminished intrinsic motivation in the presence of extrinsic motivations; rather, status motivations enhance intrinsic motivations. Second, we find that different motivations have an impact on participation in different ways. Developers’ paid participation and status motivations lead to above-average contribution levels, but use-value motivations lead to below-average contribution levels, and intrinsic motivations do not significantly impact average contribution levels. Third, we find that developers’ contribution levels positively impact their performance rankings. Finally, our results suggest that past-performance rankings enhance developers’ subsequent status motivations. %B Management Science %V 52 %P 984 - 999 %8 07/2006 %N 7 %! Management Science %R 10.1287/mnsc.1060.0554 %0 Conference Paper %B Proceedings of the 2006 international workshop on Mining software repositories %D 2006 %T Using evolutionary annotations from change logs to enhance program comprehension %A Daniel M. German %A Peter C. 
Rigby %A Storey, Margaret-Anne %K annotations %K apache %K bug tracking %K change history %K eclipse %K evolutionary %K log files %K mailing lists %K mining software repositories %K software evolution %K version control %X Evolutionary annotations are descriptions of how source code evolves over time. Typical source comments, given their static nature, are usually inadequate for describing how a program has evolved over time; instead, source code comments are typically a description of what a program currently does. We propose the use of evolutionary annotations as a way of describing the rationale behind changes applied to a given program (for example "These lines were added to ..."). Evolutionary annotations can assist a software developer in the understanding of how a given portion of source code works by showing him how the source has evolved into its current form. In this paper we describe a method to automatically create evolutionary annotations from change logs, defect tracking systems and mailing lists. We describe the design of a prototype for Eclipse that can filter and present these annotations alongside their corresponding source code and in workbench views. We use Apache as a test case to demonstrate the feasibility of this approach. 
%B Proceedings of the 2006 international workshop on Mining software repositories %S MSR '06 %I ACM %C New York, NY, USA %P 159–162 %@ 1-59593-397-2 %U http://doi.acm.org/10.1145/1137983.1138020 %R http://doi.acm.org/10.1145/1137983.1138020 %> https://flosshub.org/sites/flosshub.org/files/159UsingEvolutionary.pdf %0 Conference Paper %B Proceedings of the 2006 international workshop on Mining software repositories %D 2006 %T Using software birthmarks to identify similar classes and major functionalities %A Kakimoto, Takeshi %A Monden, Akito %A Kamei, Yasutaka %A Tamada, Haruaki %A Tsunoda, Masateru %A Matsumoto, Ken-ichi %K argouml %K class %K file %K mining challenge %K msr challenge %K multi-dimensional scaling %K similarity %K software birthmark %K source code %B Proceedings of the 2006 international workshop on Mining software repositories %S MSR '06 %I ACM %C New York, NY, USA %P 171–172 %@ 1-59593-397-2 %U http://doi.acm.org/10.1145/1137983.1138026 %R http://doi.acm.org/10.1145/1137983.1138026 %> https://flosshub.org/sites/flosshub.org/files/171UsingSoftware.pdf %0 Conference Paper %B Proceedings of the 2006 international workshop on Mining software repositories %D 2006 %T Where is bug resolution knowledge stored? %A Canfora, Gerardo %A Cerulo, Luigi %K argouml %K bugs %K bugzilla %K cvs %K impact analysis %K mining challenge %K mining software repositories %K msr challenge %K source code %X ArgoUML has used both CVS and Bugzilla to keep track of bug-fixing activities since 1998. A common practice is to reference source code changes resolving a bug stored in Bugzilla by inserting the id number of the bug in the CVS commit notes. This relationship proves useful for predicting the code entities impacted by a new bug report. In this paper we analyze ArgoUML software repositories with a tool we have implemented, showing which Bugzilla fields better predict this impact relationship, that is, where knowledge about bug resolution is stored. 
%B Proceedings of the 2006 international workshop on Mining software repositories %S MSR '06 %I ACM %C New York, NY, USA %P 183–184 %@ 1-59593-397-2 %U http://doi.acm.org/10.1145/1137983.1138032 %R http://doi.acm.org/10.1145/1137983.1138032 %> https://flosshub.org/sites/flosshub.org/files/183WhereIsBug.pdf %0 Conference Paper %B Proceedings of the 2005 international workshop on Mining software repositories %D 2005 %T Accelerating cross-project knowledge collaboration using collaborative filtering and social networks %A Ohira, Masao %A Ohsugi, Naoki %A Ohoka, Tetsuya %A Matsumoto, Ken-ichi %K collaborative filtering %K developers %K knowledge collaboration %K projects %K social networks %K sourceforge %K visualization tool %X Vast numbers of free/open source software (F/OSS) development projects use hosting sites such as Java.net and SourceForge.net. These sites provide each project with a variety of software repositories (e.g. repositories for source code sharing, bug tracking, discussions, etc.) as a media for communication and collaboration. They tend to focus on supporting rich collaboration among members in each project. However, a majority of hosted projects are relatively small projects consisting of few developers and often need more resources for solving problems. In order to support cross-project knowledge collaboration in F/OSS development, we have been developing tools to collect data of projects and developers at SourceForge, and to visualize the relationship among them using the techniques of collaborative filtering and social networks. The tools help a developer identify “who should I ask?” and “what can I ask?” and so on. In this paper, we report a case study of applying the tools to F/OSS projects data collected from SourceForge and how effective the tools can be used for helping cross-project knowledge collaboration. 
%B Proceedings of the 2005 international workshop on Mining software repositories %S MSR '05 %I ACM %C New York, NY, USA %P 111-115 %@ 1-59593-123-6 %U http://doi.acm.org/10.1145/1082983.1083163 %R http://doi.acm.org/10.1145/1082983.1083163 %> https://flosshub.org/sites/flosshub.org/files/111Accelerating.pdf %0 Conference Paper %B Proceedings of the 2005 international workshop on Mining software repositories %D 2005 %T Analysis of signature change patterns %A Kim, Sunghun %A Whitehead, Jr., E. James %A Bevan, Jennifer %K apache %K gcc %K kernel %K linux %K signature change %K signature change patterns %K software evolution %K software evolution path %K source code %X Software continually changes due to performance improvements, new requirements, bug fixes, and adaptation to a changing operational environment. Common changes include modifications to data definitions, control flow, method/function signatures, and class/file relationships. Signature changes are notable because they require changes at all sites calling the modified function, and hence as a class they have more impact than other change kinds. We performed signature change analysis over software project histories to reveal multiple properties of signature changes, including their kind, frequency, and evolution patterns. These signature properties can be used to alleviate the impact of signature changes. In this paper we introduce a taxonomy of signature change kinds to categorize observed changes. We report multiple properties of signature changes based on an analysis of eight prominent open source projects including the Apache HTTP server, GCC, and Linux 2.5 kernel. 
%B Proceedings of the 2005 international workshop on Mining software repositories %S MSR '05 %I ACM %C New York, NY, USA %P 1–5 %@ 1-59593-123-6 %U http://doi.acm.org/10.1145/1082983.1083154 %R http://doi.acm.org/10.1145/1082983.1083154 %> https://flosshub.org/sites/flosshub.org/files/64AnalysisOfSignature.pdf %0 Journal Article %J AMCIS 2005 Proceedings %D 2005 %T Are All Open Source Projects Created Equal? Understanding the Sustainability of Open Source Software Development Model %A Long, J. %A Yuan, M.J. %K contributors %K core %K developers %K downloads %K metadata %K project success %K sourceforge %X A very intriguing question in Open Source software (OSS) development is why only a few open source projects succeed while the majority never do. In this research, we examine the factors that may influence the performance of OSS projects. We particularly focus on the OSS’s core developers’ role in the project’s success. Extant research has yet to distinguish core developers and non-core developers from the community at large. The different roles of the core developers and non-core developers in OSS projects’ success still remain unclear. Our research contributes to the literature by separating the core developers from the development forces in general and empirically examining the core developers’ importance. Drawing evidence from our extensive dataset of 300 open source projects, we demonstrate that core developers’ leadership and project advocacy are crucial in determining the fate of the OSS projects. Our research could provide a better understanding of OSS sustainability. It could also give practical advice to the OSS community on how to make the project successful. 
%B AMCIS 2005 Proceedings %> https://flosshub.org/sites/flosshub.org/files/LongYuan.pdf %0 Conference Paper %B Proceedings of the 2005 international workshop on Mining software repositories %D 2005 %T Developer identification methods for integrated data from various sources %A Gregorio Robles %A Jesus M. Gonzalez-Barahona %K anonymization %K bug tracker %K developers %K email %K email address %K gnome %K identity %K mailing list %K privacy %K source code %K version control %X Studying a software project by mining data from a single repository has been a very active research field in software engineering during the last years. However, few efforts have been devoted to perform studies by integrating data from various repositories, with different kinds of information, which would, for instance, track the different activities of developers. One of the main problems of these multi-repository studies is the different identities that developers use when they interact with different tools in different contexts. This makes them appear as different entities when data is mined from different repositories (and in some cases, even from a single one). In this paper we propose an approach, based on the application of heuristics, to identify the many identities of developers in such cases, and a data structure for allowing both the anonymized distribution of information, and the tracking of identities for verification purposes. The methodology will be presented in general, and applied to the GNOME project as a case example. Privacy issues and partial merging with new data sources will also be considered and discussed. 
%B Proceedings of the 2005 international workshop on Mining software repositories %S MSR '05 %I ACM %C New York, NY, USA %P 106-110 %@ 1-59593-123-6 %U http://doi.acm.org/10.1145/1082983.1083162 %R http://doi.acm.org/10.1145/1082983.1083162 %> https://flosshub.org/sites/flosshub.org/files/106DeveloperIdentification.pdf %0 Conference Proceedings %B Americas Conference on Information Systems (AMCIS 2005) %D 2005 %T Development Success in Open Source Software Projects: Exploring the Impact of Copylefted Licenses %A Colazo, Jorge A. %A Fang, Yulin %A Neufeld, Derrick J. %K contributions %K copyleft %K developer %K developers %K membership %K productivity %K project success %K success %X Copyleft prevents the source code of open source software (OSS) from being privately appropriated. The ethos of the OSS movement suggests that volunteer developers may particularly value and contribute to copylefted projects. Based on social movement theory, we hypothesized that copylefted OSS projects are more likely than non-copylefted OSS projects to succeed in the development process, in terms of two key indicators: developer membership and developer productivity. We performed an exploratory study using data from 62 relevant OSS projects spanning an average of three years of development time. We found that copylefted projects were associated with higher developer membership and productivity. This is the first study to empirically test the relationship between copylefted licenses and OSS project success. Implications for OSS project initiators as well as future research directions are discussed. 
%B Americas Conference on Information Systems (AMCIS 2005) %U http://aisel.isworld.org/password.asp?Vpath=AMCIS/2005&PDFpath=OSSDAU01-1167.pdf %0 Conference Proceedings %B Proceedings of the 43rd Annual Meeting of the ACL %D 2005 %T Digesting Virtual “Geek” Culture: The Summarization of Technical Internet Relay Chats %A Liang Zhou %A Edouard Hovy %K computational linguistics %K irc %K linux %K summarizing %X This paper describes a summarization system for technical chats and emails on the Linux kernel. To reflect the complexity and sophistication of the discussions, they are clustered according to subtopic structure on the sub-message level, and immediate responding pairs are identified through machine learning methods. A resulting summary consists of one or more mini-summaries, each on a subtopic from the discussion. %B Proceedings of the 43rd Annual Meeting of the ACL %C Ann Arbor, MI, USA %P 298-305 %8 06/2005 %U http://acl.ldc.upenn.edu/P/P05/P05-1037.pdf %> https://flosshub.org/sites/flosshub.org/files/P05-1037.pdf %0 Journal Article %J IEEE Transactions on Software Engineering %D 2005 %T Empirical validation of object-oriented metrics on open source software for fault prediction %A Gyimothy, T. %A Ferenc, R. %A Siket, I. %K bugs %K bugzilla %K cbo %K defects %K dit %K fault-prone modules %K faults %K lcom %K lcomn %K loc %K metrics %K mozilla %K noc %K object-oriented %K rfc %K source code %K wmc %X Open source software systems are becoming increasingly important these days. Many companies are investing in open source projects and lots of them are also using such software in their own work. But, because open source software is often developed with a different management style than the industrial ones, the quality and reliability of the code needs to be studied. Hence, the characteristics of the source code of these projects need to be measured to obtain more information about it. 
This paper describes how we calculated the object-oriented metrics given by Chidamber and Kemerer to illustrate how fault-proneness detection of the source code of the open source Web and e-mail suite called Mozilla can be carried out. We checked the values obtained against the number of bugs found in its bug database - called Bugzilla - using regression and machine learning methods to validate the usefulness of these metrics for fault-proneness prediction. We also compared the metrics of several versions of Mozilla to see how the predicted fault-proneness of the software system changed during its development cycle. %B IEEE Transactions on Software Engineering %V 31 %P 897-910 %G eng %U http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.115.8372&rep=rep1&type=pdf %M WOS:000233015300008 %1 software engineering %2 case study %> https://flosshub.org/sites/flosshub.org/files/Gyimothy.pdf %0 Conference Paper %B Proceedings of the 27th international conference on Software engineering %D 2005 %T Enriching software engineering courses with service-learning projects and the open-source approach %A Liu, Chang %K education %K pedagogical %K service learning %K software engineering education %X Real-world software engineers deal with complex problems. Yet many software engineering courses do not involve projects of enough complexity to give students such experience. We sense that service-learning projects, while difficult to manage and sustain, can serve a crucial role in this regard. Through trials in a senior-level software engineering course, we discovered that the open-source approach works well to enable students to work on large, multiple-term service-learning projects. We developed GROw, a cross-term, cross-team educational software process to meet the challenges of adopting complex, real-world projects in one-term courses, and to sustain service learning. 
%B Proceedings of the 27th international conference on Software engineering %S ICSE '05 %I ACM %C New York, NY, USA %P 613–614 %@ 1-58113-963-2 %U http://doi.acm.org/10.1145/1062455.1062566 %R 10.1145/1062455.1062566 %0 Journal Article %D 2005 %T Exploring the Structure of Complex Software Designs: An Empirical Study of Open Source and Proprietary Code (updated) %A Alan MacCormack %A John Rusnak %A Carliss Baldwin %K complexity %K cost %K dependencies %K functions %K lines of code %K linux %K loc %K mozilla %K source code %X This paper reports data from a study that seeks to characterize the differences in design structure between complex software products. In particular, we use Design Structure Matrices (DSMs) to map the dependencies between the elements of a design and define metrics that allow us to compare the structures of different designs. We first use these metrics to compare the architectures of two software products - the Linux operating system and the Mozilla web browser - that were developed via contrasting modes of organization: specifically, open source versus proprietary development. We then track the evolution of Mozilla, paying particular attention to a purposeful "re-design" effort that was undertaken with the intention of making the product more "modular." We find significant differences in structure between Linux and the first version of Mozilla, suggesting that Linux had a more modular architecture. We also find that the redesign of Mozilla resulted in an architecture that was significantly more modular than that of its predecessor, and indeed, than that of Linux. Our results, while exploratory, are consistent with a view that different modes of organization are associated with designs that possess different structures. However, we also illustrate that purposeful managerial actions can have a large impact on structure. This latter result is important given recent moves to release proprietary software into the public domain. 
These moves are likely to fail unless the product possesses an architecture that facilitates participation. Our paper provides evidence that a tightly-coupled design can be adapted to meet this objective. %8 June %G eng %> https://flosshub.org/sites/flosshub.org/files/maccormackrusnakbaldwin2.pdf %0 Journal Article %J IEEE Trans. Software Eng. %D 2005 %T The FreeBSD Project: A Replication Case Study of Open Source Development %A Trung T. Dinh-Trong %A James M. Bieman %K apache %K bug reports %K contributors %K core %K cvs %K defect density %K developers %K email %K email archive %K freebsd %K mailing list %K scm %K source code %K users %X Case studies can help to validate claims that open source software development produces higher quality software at lower cost than traditional commercial development. One problem inherent in case studies is external validity—we do not know whether or not results from one case study apply to another development project. We gain or lose confidence in case study results when similar case studies are conducted on other projects. This case study of the FreeBSD project, a long-lived open source project, provides further understanding of open source development. The paper details a method for mining repositories and querying project participants to retrieve key process information. The FreeBSD development process is fairly well-defined with proscribed methods for determining developer responsibilities, dealing with enhancements and defects, and managing releases. Compared to the Apache project, FreeBSD uses 1) a smaller set of core developers—developers who control the code base—that implement a smaller percentage of the system, 2) a larger set of top developers to implement 80 percent of the system, and 3) a more well-defined testing process. FreeBSD and Apache have a similar ratio of core developers to people involved in adapting and debugging the system and people who report problems. 
Both systems have similar defect densities and the developers are also users in both systems. %B IEEE Trans. Software Eng. %V 31 %P 481-494 %R 10.1109/TSE.2005.73 %> https://flosshub.org/sites/flosshub.org/files/DinhTrungBieman.pdf %0 Conference Paper %B OSS2005: Open Source Systems %D 2005 %T Future Development in the European Software Industry: Patentability of Computer Programs or Open Source Software? %A Rentocchini, Francesco %K european %K market segment %K open source %K os %K patent %K patent literature %K patentability %K software industry %K software sector %X Economic literature has treated the patent system as an indispensable tool to incentive inventive activity and to foster diffusion of technological improvements, but recent developments have brought at the center of the stage the Open Source phenomenon which is based on completely different mechanisms among which the free disclosure of the inventive steps. This work analyzes changes that are taking place into patent literature in order to give account of the desirability of patents in software sector. In addition some ideas on empirical analysis are put forward: the possibility of measuring the relationship between patents and input of innovation process in the software sector and the influence that private firms will have on Open Source developers motivations. %B OSS2005: Open Source Systems %P 311-313 %U http://hdl.handle.net/2038/966 %0 Conference Proceedings %B International Conference on Information Systems %D 2005 %T A human capital perspective of organizational intention to adopt open source software %A Yan Li %A Chuan-Hoo Tan %A Hock-Hai Teo %A Alex Siow %K Survey %B International Conference on Information Systems %G eng %0 Conference Paper %B OSS2005: Open Source Systems %D 2005 %T Idealism and Commercialism – Developing Free/Libre and Open Source Software in Private Businesses %A Lundestad, Christian V. 
%K dominance %K FLOSS %K FLOSS community %K free/libre %K legitimacy %K linux %K open source %K Private Businesses %K social organisation %K theories of power %X This paper presents a PhD research project undertaken as part of a larger project aimed at paying sociological attention to different forms of distribution of knowledge, including program code. We want to investigate empirically how the commons known as free/open source software is actually made. In my PhD project I study the use and development of FLOSS in private businesses, focusing on professional developers working in private businesses and at the same time participating in the FLOSS community. The theoretical starting point is theories of power, dominance and legitimacy by Max Weber and Pierre Bourdieu. %B OSS2005: Open Source Systems %P 301-302 %U http://pascal.case.unibz.it/handle/2038/970 %0 Conference Paper %B OSS2005: Open Source Systems %D 2005 %T An International Master Programme in Free Software in the European Higher Education Space %A Megías, David %A Serra, Jordi %A Macau, Rafael %K education %K free software %K FS community %K GNU/Linux %K learning %K master programme %K software development %K university %X The Universitat Oberta de Catalunya (Open University of Catalonia, UOC) offers an International Master programme in Free Software. The first edition of this master programme began in November 2003 and there are about 240 students currently enrolled in the different specialities offered by the programme. In this paper, the design, the methodology and the first few conclusions drawn from this higher education experience are discussed and summarized. Subsequently, the master programme was changed to comply with the European Higher Education Space (EHES). 
%B OSS2005: Open Source Systems %P 349-352 %U http://pascal.case.unibz.it/handle/2038/713 %0 Conference Paper %B Proceedings of the 2005 international workshop on Mining software repositories %D 2005 %T Linear predictive coding and cepstrum coefficients for mining time variant information from software repositories %A Antoniol, Giuliano %A Rollo, Vincenzo Fabio %A Venturi, Gabriele %K change history %K data mining %K evolution %K files %K kernel %K linear predictive coding %K linux %K lpc %K size %K software evolution %K source code %X This paper presents an approach to recover time variant information from software repositories. It is widely accepted that software evolves due to factors such as defect removal, market opportunity or adding new features. Software evolution details are stored in software repositories, which often contain the change history. On the other hand, there is a lack of approaches, technologies and methods to efficiently extract and represent time dependent information. Disciplines such as signal and image processing or speech recognition adopt frequency domain representations to mitigate differences of signals evolving in time. Inspired by time-frequency duality, this paper proposes the use of Linear Predictive Coding (LPC) and Cepstrum coefficients to model time varying software artifact histories. LPC and Cepstrum make it possible to obtain very compact representations with linear complexity. These representations can be used to highlight components and artifacts that evolved in the same way or with very similar evolution patterns. To assess the proposed approach we applied LPC and Cepstral analysis to 211 Linux kernel releases (i.e., from 1.0 to 1.3.100), to identify files with very similar size histories. The approach, the preliminary results and the lessons learned are presented in this paper. 
%B Proceedings of the 2005 international workshop on Mining software repositories %S MSR '05 %I ACM %C New York, NY, USA %P 74-78 %@ 1-59593-123-6 %U http://doi.acm.org/10.1145/1082983.1083156 %R http://doi.acm.org/10.1145/1082983.1083156 %> https://flosshub.org/sites/flosshub.org/files/74LinearPredictive.pdf %0 Conference Paper %B OSS2005: Open Source Systems %D 2005 %T Looking at Free and Open Source Software: A Study about F/OSS Developers' Culture %A Teli, Maurizio %K cultural analysis %K cultural study %K cyber %K F/OSS developers %K open source %K software development %X My work will be a cultural study of a F/OSS development project, mixing a symmetric approach with the interaction analysis by Erving Goffman. Methodologically I will approach cyber-ethnography. %B OSS2005: Open Source Systems %P 324-325 %U http://pascal.case.unibz.it/handle/2038/968 %0 Conference Paper %B OSS2005: Open Source Systems %D 2005 %T Migrazione di un Sistema Informativo da UNIX-AIX a UNIX-Linux %A Colasanti, Cecilia %A Patruno, Vincenzo %A Vaccari, Carlo %K hardware architecture %K linux %K migration %K open source %K server %K information system %K proprietary system %K unix aix %X This document describes the policy adopted by the Istituto Nazionale di Statistica regarding the use of Open Source software. In particular, it describes the systems currently running on the Linux platform, those being migrated, and the choices made where “open” and “proprietary” systems coexist. It also illustrates the hardware architecture chosen for the migration of a complex system from a fully proprietary platform (UNIX AIX) to a platform running the open operating system Red Hat Linux. 
%B OSS2005: Open Source Systems %P 287-288 %U http://pascal.case.unibz.it/handle/2038/978 %0 Conference Paper %B Proceedings of the 2005 international workshop on Mining software repositories %D 2005 %T Mining evolution data of a product family %A Fischer, Michael %A Oberleitner, Johann %A Ratzinger, Jacek %A Gall, Harald %K bsd %K change analysis %K change history %K cvs %K evolution %K freebsd %K netbsd %K openbsd %K release history %K source code %K text mining %X Diversification of software assets through changing requirements imposes a constant challenge on the developers and maintainers of large software systems. Recent research has addressed the mining for data in software repositories of single products ranging from fine- to coarse-grained analyses. But so far, little attention has been paid to mining data about the evolution of product families. In this work, we study the evolution and commonalities of three variants of the BSD (Berkeley Software Distribution), a large open source operating system. The research questions we tackle are concerned with how to generate high level views of the system discovering and indicating evolutionary highlights. To process the large amount of data, we extended our previously developed approach for storing release history information to support the analysis of product families. In a case study we apply our approach on data from three different code repositories representing about 8.5GB of data and 10 years of active development. 
%B Proceedings of the 2005 international workshop on Mining software repositories %S MSR '05 %I ACM %C New York, NY, USA %P 12-16 %@ 1-59593-123-6 %U http://doi.acm.org/10.1145/1082983.1083145 %R http://doi.acm.org/10.1145/1082983.1083145 %> https://flosshub.org/sites/flosshub.org/files/12MiningEvolution.pdf %0 Conference Paper %B Proceedings of the 2005 international workshop on Mining software repositories %D 2005 %T Mining version histories to verify the learning process of Legitimate Peripheral Participants %A Huang, Shih-Kun %A Liu, Kang-min %K awstats %K bzflag %K cvs %K filezilla %K gallery %K Legitimate Peripheral Participants (LPP) %K moodle %K open boundary %K open source software development process %K phpmyadmin %K social networks %K sourceforge %X Since code revisions reflect the extent of human involvement in the software development process, revision histories reveal the interactions and interfaces between developers and modules. We therefore divide developers and modules into groups according to the revision histories of the open source software repository, for example, sourceforge.net. To describe the interactions in the open source development process, we use a representative model, Legitimate Peripheral Participation (LPP) [6], to divide developers into groups such as core and peripheral teams, based on the evolutionary process of learning behavior. With the conventional module relationship, we divide modules into kernel and non-kernel types (such as UI). In the past, groups of developers and modules have been partitioned naturally with informal criteria. In this work, however, we propose a developer-module relationship model to analyze the grouping structures between developers and modules. Our results show some process cases of relative importance on the constructed graph of project development. 
The graph reveals certain subtle relationships in the interactions between core and non-core team developers, and the interfaces between kernel and non-kernel modules. %B Proceedings of the 2005 international workshop on Mining software repositories %S MSR '05 %I ACM %C New York, NY, USA %P 84-88 %@ 1-59593-123-6 %U http://doi.acm.org/10.1145/1082983.1083158 %R http://doi.acm.org/10.1145/1082983.1083158 %> https://flosshub.org/sites/flosshub.org/files/84MiningVersion.pdf %0 Book Section %B Perspectives on free and open source software %D 2005 %T Open and Closed Systems are Equivalent (that is, in an ideal world) %A Anderson, Ross %K security %X ...In May 2002, I proved a controversial theorem [8]: that, under the standard assumptions of reliability growth theory, it does not matter whether the system is open or closed. Opening a system enables the attacker to discover vulnerabilities more quickly, but it helps the defenders exactly as much. This caused consternation in some circles, as it was interpreted as a general claim that open systems are no better than closed ones. But that is not what the theorem implies. Most real systems will deviate in important ways from the assumptions of the standard reliability growth model, and it will often be the case that open systems (or closed systems) will be better in some particular application. My theorem lets people concentrate on the differences between open and closed systems that matter in a particular case. %B Perspectives on free and open source software %I MIT Press %P 127-142 %@ 9780262062466 %U http://www.cl.cam.ac.uk/~rja14/Papers/toulousebook.pdf %> https://flosshub.org/sites/flosshub.org/files/toulousebook.pdf %0 Conference Paper %B Proceedings of the second international workshop on Software engineering for high performance computing system applications %D 2005 %T Predicting risky modules in open-source software for high-performance computing %A Phadke, Amit A. %A Allen, Edward B. 
%K C4.5 %K decision trees %K empirical case study %K high performance computing %K logistic regression %K Open-source software %K PETSc %K software metrics %K software quality model %K software reliability %X This paper presents the position that software-quality modeling of open-source software for high-performance computing can identify modules that have a high risk of bugs. Given the source code for a recent release, a model can predict which modules are likely to have bugs, based on data from past releases. If a user knows which software modules correspond to functionality of interest, then risks to operations become apparent. If the risks are too great, the user may prefer not to upgrade to the most recent release. Of course, such predictions are never perfect. After release, bugs are discovered. Some bugs are missed by the model, and some predicted errors do not occur. A successful model will be accurate enough for informed management action at the time of the predictions. As evidence for this position, this paper summarizes a case study of the Portable Extensible Toolkit for Scientific Computation (PETSc), which is a mathematical library for high-performance computing. Data was drawn from source-code and configuration management logs. The accuracy of logistic-regression and decision-tree models indicated that the methodology is promising. The case study also illustrated several modeling issues. %B Proceedings of the second international workshop on Software engineering for high performance computing system applications %S SE-HPCS '05 %I ACM %C New York, NY, USA %P 60–64 %@ 1-59593-117-1 %U http://doi.acm.org/10.1145/1145319.1145337 %R 10.1145/1145319.1145337 %0 Conference Paper %B Proceedings of the 2005 international workshop on Mining software repositories %D 2005 %T Recovering system specific rules from software repositories %A Williams, Chadd C. %A Hollingsworth, Jeffrey K. 
%K function usage patterns %K functions %K source code %K wine %X One of the most successful applications of static analysis based bug finding tools is to search the source code for violations of system-specific rules. These rules may describe how functions interact in the code, how data is to be validated or how an API is to be used. To apply these tools, the developer must encode a rule that must be followed in the source code. The difficulty is that many of these system-specific rules are undocumented and "grow" over time as the source code changes. Most research in this area relies on expert programmers to document these little-known rules. In this paper we discuss a method to automatically recover a subset of these rules, function usage patterns, by mining the software repository. We present a preliminary study that applies our work to a large open source software project. %B Proceedings of the 2005 international workshop on Mining software repositories %S MSR '05 %I ACM %C New York, NY, USA %P 7-11 %@ 1-59593-123-6 %U http://doi.acm.org/10.1145/1082983.1083144 %R http://doi.acm.org/10.1145/1082983.1083144 %> https://flosshub.org/sites/flosshub.org/files/7Recovering.pdf %0 Conference Paper %B OSS2005: Open Source Systems %D 2005 %T SchoolTool: Defining Our Niche in the Open Source Architecture of Schools %A Hoffman, Tom %K information system %K open source %K school %K student %B OSS2005: Open Source Systems %P 334-337 %U http://pascal.case.unibz.it/handle/2038/1436 %0 Conference Paper %B Proceedings of the 2005 international workshop on Mining software repositories %D 2005 %T SCQL: a formal model and a query language for source control repositories %A Hindle, Abram %A Daniel M. German %K evolution %K file %K gnumeric %K modperl %K openssl %K revision %K samba %K scm %K source code %X Source Control Repositories are used in most software projects to store revisions to source code files. These repositories operate at the file level and support multiple users. 
A generalized formal model of source control repositories is described herein. The model is a graph in which the different entities stored in the repository become vertices and their relationships become edges. We then define SCQL, a first order, and temporal logic based query language for source control repositories. We demonstrate how SCQL can be used to specify some questions and then evaluate them using the source control repositories of five different large software projects. %B Proceedings of the 2005 international workshop on Mining software repositories %S MSR '05 %I ACM %C New York, NY, USA %P 100-104 %@ 1-59593-123-6 %U http://doi.acm.org/10.1145/1082983.1083161 %R http://doi.acm.org/10.1145/1082983.1083161 %> https://flosshub.org/sites/flosshub.org/files/100scql.pdf %0 Conference Paper %B Proceedings of the 27th international conference on Software engineering %D 2005 %T Silver bullet or fool's gold: supporting usability in open source software development %A Twidale, Michael %K course project %K education %K lifecycle model %K pedagogical %K software engineering education %K software process %X At first glance it can look like Open Source Software development violates many, if not all, of the precepts of decades of careful research and teaching in Software Engineering. One could take a classic SE textbook and compare the activities elaborated and advocated in the various chapters with what is actually done in plain sight in the public logs of an OSS project in, say, SourceForge. For a Professor of Software Engineering this might make for rather depressing reading. Are the principles of SE being rendered obsolete? Has OSS really discovered Brooks' Silver Bullet? Or is it just a flash in the pan, or Fool's Gold? In this talk I will mainly look at one aspect of Open Source Development, the 'problem' of creating usable interfaces, particularly for non-technical end-users. 
Any approach involves the challenge of how to coordinate distributed collaborative interface analysis and design, given that in conventional software development this is usually done in small teams and almost always face to face. Indeed all the methods in any HCI text just assume same-time same-place work and don't map to distributed work, let alone the looser mechanisms of OSS development. Instead what is needed is a form of participatory usability involving the coordination of end users and developers in a constantly evolving redesign process. %B Proceedings of the 27th international conference on Software engineering %S ICSE '05 %I ACM %C New York, NY, USA %P 35–35 %@ 1-58113-963-2 %U http://doi.acm.org/10.1145/1062455.1062468 %R 10.1145/1062455.1062468 %0 Conference Paper %B OSS2005: Open Source Systems %D 2005 %T A Social Network Approach To Free/Open Source Software Simulation %A Wagstrom, Patrick Adam %A Herbsleb, James %A Carley, Kathleen %K email %K mailing list %K social network analysis %X Free and Open Source Software (F/OSS) development is a complex process that is just beginning to be understood. The actual development process is frequently characterized as disparate volunteer developers collaborating to create a piece of software. The developers of F/OSS, like most software engineers, spend a significant portion of their time fostering collaboration through various channels of social communication. We have analyzed several methods of communication (a social networking site, project mailing lists, and developer weblogs) to gain an understanding of the social network structure behind F/OSS projects. This social network data was used to create a model of F/OSS development that allows for multiple projects, users, and developers with varying goals and socialization methods. Using this model we have been able to replicate some of the known phenomena observed in F/OSS and provide a first step in the creation of a robust model of F/OSS. 
%B OSS2005: Open Source Systems %P 16-23 %U http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.178.4984 %R 10.1.1.178.4984 %0 Journal Article %J Computer Supported Cooperative Work (CSCW) %D 2005 %T Socialization in an Open Source Software Community: A Socio-Technical Analysis %A Ducheneaut, Nicolas %K cvs %K developers %K email %K email archive %K mailing list %K open source project browser %K participation %K python %K scm %K source code %K team %K tools %X Open Source Software (OSS) development is often characterized as a fundamentally new way to develop software. Past analyses and discussions, however, have treated OSS projects and their organization mostly as a static phenomenon. Consequently, we do not know how these communities of software developers are sustained and reproduced over time through the progressive integration of new members. To shed light on this issue I report on my analyses of socialization in a particular OSS community. In particular, I document the relationships OSS newcomers develop over time with both the social and material aspects of a project. To do so, I combine two mutually informing activities: ethnography and the use of software specially designed to visualize and explore the interacting networks of human and material resources incorporated in the email and code databases of OSS. Socialization in this community is analyzed from two perspectives: as an individual learning process and as a political process. From these analyses it appears that successful participants progressively construct identities as software craftsmen, and that this process is punctuated by specific rites of passage. Successful participants also understand the political nature of software development and progressively enroll a network of human and material allies to support their efforts. 
I conclude by discussing how these results could inform the design of software to support socialization in OSS projects, as well as practical implications for the future of these projects. %B Computer Supported Cooperative Work (CSCW) %I Springer Netherlands %V 14 %P 323-368 %U http://dx.doi.org/10.1007/s10606-005-9000-1 %0 Conference Paper %B OSS2005: Open Source Systems %D 2005 %T Socialization practices in FLOSS development teams %A Chengetai Masango %K development team %K FLOSS %K member %K open source %K socialization %X Socialization of new members into Free/Libre Open Source Software (FLOSS) development teams is an important but little studied process in producing effective teams of this type. This is a dissertation proposal for a virtual ethnographic study that looks at the mechanisms and processes used to socialize new members into the team in order to help maintain a common group identity and focus. %B OSS2005: Open Source Systems %P 322-323 %U http://pascal.case.unibz.it/handle/2038/1438 %0 Conference Paper %B Proceedings of the 27th international conference on Software engineering %D 2005 %T Software architecture in an open source world %A Roy T. Fielding %K apache %K collaborative open source development %K eclipse %K extensibility %K Firefox %K linux %K linux kernel %K loose coupling %K modularity %K mozilla %K open source %K software architecture %X In spite of the hype and hysteria surrounding open source software development, there is very little that can be said of open source in general. Open source projects range in scope from the minuscule, such as the thousands of non-maintained code dumps left behind at the end of class projects, dissertations, and failed commercial ventures, to the truly international, with thousands of developers collaborating, directly or indirectly, on a common platform. 
One characteristic that is shared by the largest and most successful open source projects, however, is a software architecture designed to promote anarchic collaboration through extensions while at the same time preserving centralized control over the interfaces. This talk features a survey of the state-of-the-practice in open source development with regard to software architecture, with particular emphasis on the modular extensibility interfaces within several of the most successful projects, including Apache httpd, Eclipse, Mozilla Firefox, Linux kernel, and the World Wide Web (which few people recognize as an open source project in itself). These projects fall under the general category of collaborative open source software development, which emphasizes community aspects of software engineering in order to compensate for the often-volunteer nature of core developers and take advantage of the scalability obtainable through Internet-based virtual organizations. %B Proceedings of the 27th international conference on Software engineering %S ICSE '05 %I ACM %C New York, NY, USA %P 43–43 %@ 1-58113-963-2 %U http://doi.acm.org/10.1145/1062455.1062474 %R 10.1145/1062455.1062474 %0 Conference Paper %B Proceedings of the 27th international conference on Software engineering %D 2005 %T Software engineering education in the era of outsourcing, distributed development, and open source software: challenges and opportunities %A Hawthorne, Matthew J. %A Perry, Dewayne E. %K computer science education %K contextual learning %K education %K informatics %K software engineering education %X As software development becomes increasingly globally distributed, and more software functions are delegated to common open source software (OSS) and commercial off-the-shelf (COTS) components, practicing software engineers face significant challenges for which current software engineering curricula may leave them inadequately prepared. 
A new multi-faceted distributed development model is emerging that effectively commoditizes many development activities once considered integral to software engineering, while simultaneously requiring practitioners to apply engineering principles in new and often unfamiliar contexts. We discuss the challenges that software engineers face as a direct result of outsourcing and other distributed development approaches that are increasingly being utilized by industry, and some of the key ways we need to evolve software engineering curricula to address these challenges. %B Proceedings of the 27th international conference on Software engineering %S ICSE '05 %I ACM %C New York, NY, USA %P 643–644 %@ 1-58113-963-2 %U http://doi.acm.org/10.1145/1062455.1062581 %R 10.1145/1062455.1062581 %0 Conference Paper %B 2005 Symposium on Usable Privacy and Security %D 2005 %T Stopping spyware at the gate: a user study of privacy, notice and spyware %A N. Good %A Dhamija, R. %A J. Grossklags %A D. Thaw %A Aronowitz, S. %A D. Mulligan %A J. Konstan %K Design %K end user license agreement %K EULA %K Experimentation %K Human Factors %K Legal Aspects %K notice %K privacy %K security and usability %K spyware %K terms of service %K ToS %B 2005 Symposium on Usable Privacy and Security %I Association for Computing Machinery %C Pittsburgh, PA %P 43-52 %8 07/2005 %@ 1-59593-178-3 %G eng %0 Conference Paper %B OSS2005: Open Source Systems %D 2005 %T Structure, Cohesion, and Open Source Software Success %A Sherae Daniel %K interest community %K network externalities %K open source software %K software quality %X This paper proposes a dissertation designed to understand how the open source software (OSS) development group and its associated interest community jointly and independently impact OSS success for a single OSS project. %B OSS2005: Open Source Systems %P 317-319 %U http://pascal.case.unibz.it/handle/2038/1536 %0 Journal Article %J Hous. L. Rev. 
%D 2005 %T A Theory of Disclosure for Security and Competitive Reasons: Open Source, Proprietary Software, and Government Systems %A Swire, Peter P %K security %X A previous article, “A Model for When Disclosure Helps Security: What is Different about Computer and Network Security?” proposed a model for when disclosure helps or hurts security and provided reasons why computer security is often different in this respect than physical security. This chapter provides a general approach for describing the incentives of actors to disclose information about their software or systems. A chief point of this chapter is that the incentives of disclosure depend on two largely independent assessments: (i) the degree to which disclosure helps or hurts security %B Hous. L. Rev. %I HeinOnline %V 42 %P 1333 %> https://flosshub.org/sites/flosshub.org/files/KP21%2003%20Swire.pdf %0 Conference Paper %B OSS2005: Open Source Systems %D 2005 %T Transfering Libre Software Development Practices to the Production of Educational Resources: the Edukalibre Project %A González-Barahona, Jesús M. %A Chris Tebb %A Vania Dimitrova %A Chaparro, Diego %A Romera, Teo %K educational resources %K information systems %K open source %K software development practices %X The transfer of methodologies common in libre (free, open source) software development to the domain of educational resources can radically change the way educational content is developed and used, enabling both educational practitioners and students to become actively involved in its creation and distribution. New software architectures and tools are needed to effectively support this process. This paper describes a platform aimed at supporting the creation of free, collaboratively constructed educational content on the web, which has been developed within the Edukalibre project. It provides easy access to core technologies: a version control system combined with conversion tools to produce several convenient formats for each document. 
Its modular architecture offers many different interfaces to the users. The Edukalibre platform is distributed as libre software. %B OSS2005: Open Source Systems %P 341-348 %U http://pascal.case.unibz.it/handle/2038/1548 %0 Conference Paper %B Proceedings of the 2005 international workshop on Mining software repositories %D 2005 %T Understanding source code evolution using abstract syntax tree matching %A Neamtiu, Iulian %A Foster, Jeffrey S. %A Hicks, Michael %K abstract syntax trees %K apache %K bind %K evolution %K linux %K openssh %K software evolution %K source code %K source code analysis %K vsftpd %X Mining software repositories at the source code level can provide a greater understanding of how software evolves. We present a tool for quickly comparing the source code of different versions of a C program. The approach is based on partial abstract syntax tree matching, and can track simple changes to global variables, types and functions. These changes can characterize aspects of software evolution useful for answering higher level questions. In particular, we consider how they could be used to inform the design of a dynamic software updating system. We report results based on measurements of various versions of popular open source programs, including BIND, OpenSSH, Apache, Vsftpd and the Linux kernel. 
%B Proceedings of the 2005 international workshop on Mining software repositories %S MSR '05 %I ACM %C New York, NY, USA %P 2-6 %@ 1-59593-123-6 %U http://doi.acm.org/10.1145/1082983.1083143 %R http://doi.acm.org/10.1145/1082983.1083143 %> https://flosshub.org/sites/flosshub.org/files/2Understanding.pdf %0 Conference Paper %B Proceedings of the 2005 international workshop on Mining software repositories %D 2005 %T Using a clone genealogy extractor for understanding and supporting evolution of code clones %A Kim, Miryung %A Notkin, David %K clone %K clone detection %K cvs %K developers %K evolution %K maintenance %K refactoring %K source code %X Programmers often create similar code snippets or reuse existing code snippets by copying and pasting. Code clones (syntactically and semantically similar code snippets) can cause problems during software maintenance because programmers may need to locate code clones and change them consistently. In this work, we investigate (1) how code clones evolve, (2) how many code clones impose maintenance challenges, and (3) what kind of tool or engineering process would be useful for maintaining code clones. Based on a formal definition of clone evolution, we built a clone genealogy tool that automatically extracts the history of code clones from a source code repository (CVS). Our clone genealogy tool enables several analyses that reveal evolutionary characteristics of code clones. Our initial results suggest that aggressive refactoring may not be the best solution for all code clones; thus, we propose alternative tool solutions that assist in maintaining code clones using clone genealogy information. 
%B Proceedings of the 2005 international workshop on Mining software repositories %S MSR '05 %I ACM %C New York, NY, USA %P 17-23 %@ 1-59593-123-6 %U http://doi.acm.org/10.1145/1082983.1083146 %R http://doi.acm.org/10.1145/1082983.1083146 %> https://flosshub.org/sites/flosshub.org/files/17Using.pdf %0 Generic %D 2004 %T Applying Social Network Analysis to the Information in CVS Repositories %A López-Fernández, L. %A Gregorio Robles %A Jesus M. Gonzalez-Barahona %K apache %K complex networks %K cvs %K gnome %K kde %K libre software engineering %K source code %K source code repositories %K visualization techniques %K visualization %X The huge quantities of data available in the CVS repositories of large, long-lived libre (free, open source) software projects, and the many interrelationships among those data offer opportunities for extracting large amounts of valuable information about their structure, evolution and internal processes. Unfortunately, the sheer volume of that information renders it almost unusable without applying methodologies which highlight the relevant information for a given aspect of the project. In this paper, we propose the use of a well known set of methodologies (social network analysis) for characterizing libre software projects, their evolution over time and their internal structure. In addition, we show how we have applied such methodologies to real cases, and extract some preliminary conclusions from that experience. %B International Workshop on Mining Software Repositories (MSR 2004) %P 101-105 %> https://flosshub.org/sites/flosshub.org/files/101ApplyingSocial.pdf %0 Conference Proceedings %B Proceedings of the 4th ICSE Workshop on Open Source %D 2004 %T Community structure of modules in the Apache project %A Jesus M. 
Gonzalez-Barahona %A Luis Lopez %A Gregorio Robles %K apache %K cvs %K source code %X The relationships among modules in a software project of a certain size can give us much information about its internal organization and a way to control and monitor development activities and evolution of large libre software projects. In this paper, we show how information available in CVS repositories can be used to study the structure of the modules in a project when they are related by the people working in them, and how techniques taken from the social networks fields can be used to highlight the characteristics of that structure. As a case example, we also show some results of applying this methodology to the Apache project at several points in time. Among other facts, it is shown how the project evolves and is self-structuring, with developer communities of modules corresponding to semantically related families of modules. %B Proceedings of the 4th ICSE Workshop on Open Source %P 44-48 %> https://flosshub.org/sites/flosshub.org/files/gonzalezBarahona44-48.pdf %0 Conference Proceedings %B Proceedings of the 4th ICSE Workshop on Open Source %D 2004 %T Contributing to OS Projects. A Comparison between Individual and Firms %A Andrea Bonaccorsi %A Cristina Rossi %K Survey %X This paper studies the contributions software firms make to Open Source (OS) projects. Our goal is to ascertain whether they follow the same regularity of pattern seen for individual programmers. An exhaustive empirical analysis was carried out using data on project membership, project coordination and the contributions made by 146 Italian firms that do business with OS software. We compare our findings with the results of the surveys taken on OS programmers. The availability of the data gathered by Hertel et al. ([10]) on 141 developers of the Linux kernel allowed a direct comparison to be carried out between the two sets.
%B Proceedings of the 4th ICSE Workshop on Open Source %P 18-22 %> https://flosshub.org/sites/flosshub.org/files/19-23.pdf %0 Conference Paper %B 1st International Workshop on Computer Supported Activity Coordination, 6th International Conference on Enterprise Information Systems %D 2004 %T Coordination practices for bug fixing within FLOSS development teams %A Kevin Crowston %A Barbara Scozzi %K activity %K bug fixing %K bug reports %K bug tracker %K coordination %K downloads %K dynapi %K FLOSS %K gaim %K kicq %K phpmyadmin %K status %X Free/Libre Open Source Software (FLOSS) is primarily developed by distributed teams. Developers contribute from around the world and coordinate their activity almost exclusively by means of email and bulletin boards. FLOSS development teams somehow profit from the advantages and evade the challenges of distributed software development. Despite the relevance of FLOSS both for research and practice, few studies have investigated the work practices adopted by these development teams. In this paper we investigate the structure and the coordination practices adopted by development teams during the bug-fixing process, which is considered one of the main areas of FLOSS project success. In particular, based on a codification of the messages recorded in the bug tracking system of four projects, we identify the accomplished tasks, the adopted coordination mechanisms, and the role undertaken by both the FLOSS development team and the FLOSS community. We conclude with suggestions for further research. %B 1st International Workshop on Computer Supported Activity Coordination, 6th International Conference on Enterprise Information Systems %C Porto, Portugal %> https://flosshub.org/sites/flosshub.org/files/CrowstonScozzi04coordination.pdf %0 Conference Paper %B 2004 Open Source Conference (OSCON) %D 2004 %T Do the Rich Get Richer?
The Impact of Power Laws on Open Source Development Projects %A Conklin, Megan %K open source %K power law %K social network analysis %K sourceforge %B 2004 Open Source Conference (OSCON) %C Portland, OR, USA %G eng %9 conference presentation %0 Conference Paper %B Proceedings of the 2004 international workshop on Mining software repositories - MSR '04 %D 2004 %T Four Interesting Ways in Which History Can Teach Us About Software %A Michael Godfrey %A Xinyi Dong %A Cory Kapser %A Lijie Zou %K ant %K apache %K change analysis %K clone %K clone detection %K cvs %K evolution %K gcc %K growth %K kepler %K linux %K midworld %K mycore %K postgresql %K source code %K version control %X In this position paper, we outline four kinds of studies that we have undertaken in trying to understand various aspects of a software system’s evolutionary history. In each instance, the studies have involved detailed examination of real software systems based on “facts” extracted from various kinds of source artifact repositories, as well as the development of accompanying tools to aid in the extraction, abstraction, and comprehension processes. We briefly discuss the goals, results, and methodology of each approach. %B Proceedings of the 2004 international workshop on Mining software repositories - MSR '04 %P 58-62 %8 05/2004 %> https://flosshub.org/sites/flosshub.org/files/58FourInterestingWays.pdf %0 Conference Paper %B Third EPIP Workshop %D 2004 %T Free & Open Source Software Creation and ‘the Economy of Regard’ %A Jean-Michel Dalle %A Paul A. David %A Rishab Ayer Ghosh %A Frank A. Wolak %K linux %K linux kernel %K scm %K source code %B Third EPIP Workshop %8 04/2004 %> https://flosshub.org/sites/flosshub.org/files/DalleDavidGhosh%20Wolak.pdf %0 Journal Article %J IEE Seminar Digests %D 2004 %T Improving comprehension and cooperation through code structure %A A. 
Capiluppi %K arla %K code structure %K contributors %K developers %K open source system %K scm %K software development %K software engineering %K software process %K software product %K software system architecture %K source code %K source components %K tree evolution %K tree structure %X Defining a relationship between a software system's architecture and the process' efforts is one of the most fascinating questions of software engineering. Apparently, when a system's architecture is complex, the process to improve and evolve it will be more difficult. We try to tackle this question from a different point of view: given an open source system, in all the phases of its evolution, we focus both on the software developers and on the obtained software product. Moreover, we observe one of the possible architectures of this system, based on the tree structure derived from source components. First conclusions show that some patterns of tree evolution are recognizable: some branches may appear more promising than others, and are extensively evolved, while others remain in the same status for the whole life cycle. Moreover, when the tree structure reaches some status, the process of joining as a core developer seems to forestall. %B IEE Seminar Digests %I IEE %V 2004 %P 23-28 %U http://link.aip.org/link/abstract/IEESEM/v2004/i908/p23/s1 %R 10.1049/ic:20040260 %> https://flosshub.org/sites/flosshub.org/files/capiluppi2004.pdf %0 Conference Proceedings %B Proceedings of the 4th ICSE Workshop on Open Source %D 2004 %T Inside an Open Source Software Community: Empirical Analysis on Individual and Group Level %A Maass, W. %K apache %K Survey %X An established Open Source Software community (Apache Cocoon) was explored using an online questionnaire about demographic data and individual and group-related factors. Individual factors encompassed forms of contributions, motivation, expertise and knowledge.
Role structures, expectations towards other members, trust and collaboration issues were analysed at group level. More than 60% of the developer community completed this questionnaire. Results provide a valuable basis for deeper understanding of knowledge sharing, collaboration and innovation processes in distributed work groups. %B Proceedings of the 4th ICSE Workshop on Open Source %P 65-70 %> https://flosshub.org/sites/flosshub.org/files/maass66-71.pdf %0 Conference Paper %B International Workshop on Mining Software Repositories (MSR 2004) %D 2004 %T LASER: a lexical approach to analogy in software reuse %A Amin, R. %A Mel O Cinneide %A Veale, Tony %K class %K developers %K functions %K jrefactory %K method %K naming %K natural language %K reuse %K source code %K wordnet %X Software reuse is the process of creating a software system from existing software components, rather than creating it from scratch. With the increase in size and complexity of existing software repositories, the need to provide intelligent support to the programmer becomes more pressing. An analogy is a comparison of certain similarities between things which are otherwise unlike. This concept has been shown to be valuable in developing UML-level reuse techniques. In the LASER project we apply lexically-driven Analogy at the code level, rather than at the UML-level, in order to retrieve matching components from a repository of existing components. Using the lexical ontology WordNet, we have conducted a case study to assess if class and method names in open source applications are used in a semantically meaningful way. Our results demonstrate that both hierarchical reuse and parallel reuse can be enhanced through the use of lexically-driven Analogy.
%B International Workshop on Mining Software Repositories (MSR 2004) %I IEE %C Edinburgh, Scotland, UK %V 2004 %P 112 - 116 %R 10.1049/ic:20040487 %> https://flosshub.org/sites/flosshub.org/files/112LASER.pdf %0 Conference Proceedings %B International Conference on Information Systems 2004 %D 2004 %T Membership dynamics and network stability in the open-source community: the ising perspective %A Oh, Wonseok %A Jeon, Sangyong %K email %K email archive %K hypermail %K linux %K mailing list %K membership %K membership herding %K newsgroup %K open source %K participants %K social network analysis %K stakeholders %K team size %X In this paper, we address the following two questions: (1) How does a participant’s membership decision affect the others (neighbors) with whom he has collaborated over an extended period of time in an open source software (OSS) network? (2) To what extent do network characteristics (i.e., size and connectivity) mediate the impact of external factors on the OSS participants’ dynamic membership decisions and hence the stability of the network? From the Ising perspective, we present fresh theoretical insight into the dynamic and reciprocal membership relations between OSS participants. We also performed simulations based on empirical data that were collected from two actual OSS communities. Some of the key findings include that (1) membership herding is highly present when the external force is weak, but decreases significantly when the force increases, (2) the propensity for membership herding is most likely to be seen in a large network with a random connectivity, and (3) for large networks, at low external force a random connectivity will perform better than a scale-free counterpart in terms of the network strength. However, as the temperature (external force) increases, the reverse phenomenon is observed.
In addition, the scale-free connectivity appears to be less volatile than the random connectivity in response to an increase in the temperature. We conclude with several implications that may be of significance to OSS stakeholders. %B International Conference on Information Systems 2004 %G eng %> https://flosshub.org/sites/flosshub.org/files/OhJeon.pdf %0 Conference Paper %B Proc. Int'l Workshop on Mining Software Repositories ({MSR}) %D 2004 %T Mining CVS repositories, the softChange experience %A German, Daniel %K bugzilla %K cvs %K email archives %K log files %K logs %K softchange %X CVS logs are a rich source of software trails (information left behind by the contributors to the development process, usually in the forms of logs). This paper describes how softChange extracts these trails, and enhances them. This paper also addresses some challenges that CVS fact extraction poses to researchers. %B Proc. Int'l Workshop on Mining Software Repositories ({MSR}) %P 17–21 %> https://flosshub.org/sites/flosshub.org/files/17MiningCVS.pdf %0 Journal Article %J Empirical Softw. Engg. %D 2004 %T Open-Source Change Logs %A Chen, Kai %A Schach, Stephen R. %A Yu, Liguo %A Offutt, Jeff %A Heller, Gillian Z. %K change log %K gcc %K GCC-g %K GNUJSP %K Jikes %K log files %K Open-source software %K source code %X A recent editorial in Empirical Software Engineering suggested that open-source software projects offer a great deal of data that can be used for experimentation. These data not only include source code, but also artifacts such as defect reports and update logs. A common type of update log that experimenters may wish to investigate is the ChangeLog, which lists changes and the reasons for which they were made. ChangeLog files are created to support the development of software rather than for the needs of researchers, so questions need to be asked about the limitations of using them to support research.
This paper presents evidence that the ChangeLog files provided at three open-source web sites were incomplete. We examined at least three ChangeLog files for each of three different open-source software products, namely, GNUJSP, GCC-g++, and Jikes. We developed a method for counting changes that ensures that, as far as possible, each individual ChangeLog entry is treated as a single change. For each ChangeLog file, we compared the actual changes in the source code to the entries in the ChangeLog file and discovered significant omissions. For example, using our change-counting method, only 35 of the 93 changes in version 1.11 of Jikes appear in the ChangeLog file—that is, over 62% of the changes were not recorded there. The percentage of omissions we found ranged from 3.7 to 78.6%. These are significant omissions that should be taken into account when using ChangeLog files for research. Before using ChangeLog files as a basis for research into the development and maintenance of open-source software, experimenters should carefully check for omissions and inaccuracies. %B Empirical Softw. Engg. %I Kluwer Academic Publishers %C Hingham, MA, USA %V 9 %P 197–210 %8 September %U http://portal.acm.org/citation.cfm?id=990374.990391 %R 10.1023/B:EMSE.0000027779.70556.d0 %> https://flosshub.org/sites/flosshub.org/files/chen.pdf %0 Conference Proceedings %B Proc. of Workshop on Mining Software Repositories at the International Conference on Software Engineering %D 2004 %T The perils and pitfalls of mining SourceForge %A Howison, James %A Kevin Crowston %K Data Collection %K sourceforge %X SourceForge provides abundant accessible data from Open Source Software development projects, making it an attractive data source for software engineering research. However it is not without theoretical peril and practical pitfalls. In this paper, we outline practical lessons gained from our spidering, parsing and analysis of SourceForge data. 
SourceForge can be practically difficult: projects are defunct, data from earlier systems has been dumped in and crucial data is hosted outside SourceForge, dirtying the retrieved data. These practical issues play directly into analysis: decisions made in screening projects can reduce the range of variables, skewing data and biasing correlations. SourceForge is theoretically perilous because it provides easily accessible data items for each project, tempting researchers to fit their theories to these limited data. Worse, few are plausible dependent variables. Studies are thus likely to test the same hypotheses even if they start from different theoretical bases. To avoid these problems, analyses of SourceForge projects should go beyond project level variables and carefully consider which variables are used for screening projects and which for testing hypotheses. %B Proc. of Workshop on Mining Software Repositories at the International Conference on Software Engineering %P 7-11 %8 05/2004 %G eng %> https://flosshub.org/sites/flosshub.org/files/howison04msr.pdf %0 Journal Article %J Electronic Markets %D 2004 %T Profiling an Open Source Project Ecology and Its Programmers %A Koch, Stefan %K affiliation network %K brooks law %K cocomo %K effort estimation %K evolution %K productivity %K project success %K scm %K size %K time %K version control %X While many successful and well-known open source projects produce output of high quality, a general assessment of this development paradigm is still missing. In this paper, an online community of both small and large, successful and failed projects and their programmers is analysed mainly using the version-control data of each project, also according to their productivity and estimation of expended effort. As the results show, there are indeed significant differences between this cooperative development model and the commercial organization of work in the areas explored.
Both open source software projects in their size and their programmers' effort differ significantly, and the evolution of projects' size over time seems in part to contradict the laws of software evolution proposed for commercial systems. Both the inequality of effort distribution between programmers and an increasing number of developers in a project do not lead to a decrease in productivity, opposing Brooks's Law. Effort estimation based on the COCOMO model for commercial organizations shows a large amount of effort expended for the projects, while a more general Norden-Rayleigh modeling shows a distinctly smaller expenditure. This suggests that either a highly efficient development is achieved by this self-organizing cooperative and highly decentralized form of work, or that the participation of users besides programming tasks is enormous and constitutes an economic factor of large proportions. %B Electronic Markets %V 14 %P 77 - 88 %8 6/2004 %N 2 %! Electronic Markets %R 10.1080/10196780410001675031 %0 Conference Paper %B Workshop on Open Source Software Engineering, International Conference on Software Engineering %D 2004 %T Towards a Portfolio of FLOSS project Success Measures %A Kevin Crowston %A Hala Annabi %A Howison, James %A Chengetai Masango %K bug fixing %K developers %K downloads %K project success %K sourceforge %K team %K team size %X Project success is one of the most widely used dependent variables in information systems research. However, conventional measures of project success are difficult to apply to Free/Libre Open Source Software projects. In this paper, we present an analysis of four measures of success applied to SourceForge projects: number of members of the extended development community, project activity, bug fixing time and number of downloads. We argue that these four measures provide different insights into the collaboration and control mechanisms of the projects.
%B Workshop on Open Source Software Engineering, International Conference on Software Engineering %8 May %G eng %> https://flosshub.org/sites/flosshub.org/files/crowston04towards.pdf %0 Conference Proceedings %B International Conference on Information Systems 2004 %D 2004 %T Why developers participate in open source software projects: an empirical investigation %A Il-Horn Hann %A Jeff Roberts %A Sandra Slaughter %K Survey %B International Conference on Information Systems 2004 %G eng %0 Journal Article %J Electronic Markets %D 2004 %T Will the Open Source Movement Survive a Litigious Society? %A Vijay K. Vemuri %A Vince Bertone %K courts %K INNOVATION %K lawsuit %K litigation %K patents %K software patents %X Since no one is willing to undertake costly research and development to create innovation, incentives in the form of patents were instituted to motivate R&D. In software development, contrary to economic intuition, open source software has emerged as a viable alternative source of innovation. The patenting system has performed reasonably well in enhancing many other technologies. Since the mid-1990s patenting of software and business methods is increasingly accepted in the United States. The legitimacy of many of these new patents is subject to controversy and debate. In this paper we examine the trend, rate of litigation and disposition of US patents in the US Federal Courts. We find that litigation rates of software and business method patents is four times that of all other patents and is increasing. A majority of patent litigations are not won by the perpetrator of the lawsuits. The open source software community is not immune to heightened patent litigations. Since software development is incremental, the paths of OSS and commercial development are entwined. The spillover of patent litigation into OSS may have disastrous consequences: It may increase the 'cost' of OSS, dissuade volunteer developers and make OSS less attractive to users. 
%B Electronic Markets %V 14 %P 114-123 %0 Conference Paper %B Proceedings of the 3rd Workshop on Open Source Software Engineering %D 2003 %T Automating the measurement of open source projects %A German, Daniel %A Audris Mockus %K bug reports %K bug tracking %K changelog %K cvs %K defects %K evolution %K log files %K logs %K mailing list %K scm %K softchange %K source code %K ximian %K ximian evolution %X The proliferation of open source projects raises a number of vital economic, social, and software engineering questions that are the subject of intense research. Based on experience analyzing numerous open source and commercial projects we propose a set of tools to support extraction and validation of software project data. Such tools would streamline empirical investigation of open source projects and make it possible to test existing and new theories about the nature of open source projects. Our software includes tools to extract and summarize information from mailing lists, CVS logs, ChangeLog files, and defect tracking databases. More importantly, it cross-links records from various data sources and identifies all contributors for a software change. We illustrate some of the capabilities by analyzing data from the Ximian Evolution project. %B Proceedings of the 3rd Workshop on Open Source Software Engineering %P 63–67 %> https://flosshub.org/sites/flosshub.org/files/germanMockus2003.pdf %0 Journal Article %J Computers & Security %D 2003 %T The availability of source code in relation to timely response to security vulnerabilities %A John Reinke %A Hossein Saiedian %K bugtraq %K cert %K email %K email archives %K mailing list %K security %K vulnerability %X Once a vulnerability has been found in an application or service that runs on a computer connected to the Internet, fixing that exploit in a timely fashion is of the utmost importance.
There are two parts to fixing a vulnerability: a party acting on behalf of the application's vendor gives instructions to fix it or makes a patch available that can be downloaded; then someone using that information fixes the computer or application in question. This paper considers the effects of proprietary software versus non-proprietary software in determining the speed with which a security fix is made available, since this can minimize the amount of time that the computer system remains vulnerable. %B Computers & Security %V 22 %P 707 - 724 %U http://www.sciencedirect.com/science/article/B6V8G-4B9CV31-C/2/a218fccfaef185af5c122f118b252703 %R DOI: 10.1016/S0167-4048(03)00011-7 %0 Journal Article %D 2003 %T Clustering and Dependencies in Free/Open Source Software Development: Methodology and Tools %A Rishab Ayer Ghosh %K scm %K source code %K source code analysis %X This paper addresses the problem of measurement of non-monetary economic activity, specifically in the area of free/open source software communities. It describes the problems associated with research on these communities in the absence of measurable monetary transactions, and suggests possible alternatives. A class of techniques using software source code as factual documentation of economic activity is described and a methodology for the extraction, interpretation and analysis of empirical data from software source code is detailed, with the outline of algorithms for identifying collaborative authorship and determining the identity of coherent economic actors in developer communities. Finally, conclusions are drawn from the application of these techniques to a base of software. %8 April %G eng %U http://dxm.org/papers/toulouse2/cluster-final.pdf %> https://flosshub.org/sites/flosshub.org/files/cluster-final.pdf %0 Journal Article %J Research Policy %D 2003 %T Community, joining, and specialization in open source software innovation: a case study %A Georg von Krogh %A Spaeth, S.
%A Karim R Lakhani %K cvs %K email %K email archives %K freenet %K INNOVATION %K mailing lists %K roles %K source code %X This paper develops an inductive theory of the open source software innovation process by focussing on the creation of Freenet, a project aimed at developing a decentralized and anonymous peer-to-peer electronic file sharing network. We are particularly interested in the strategies and processes by which new people join the existing community of software developers, and how they initially contribute code. Analyzing data from multiple sources on the Freenet software development process, we generate the constructs of "joining script", We are grateful for helpful comments from two anonymous reviewers. We also thank Chris Argyris, John Seely Brown, Eric von Hippel, Stefan Haefliger, Petra Kugler, Heike Bruch, Simon Gächter, Simon Peck, and Hari Tsoukas for helpful comments and suggestions. Ben Ho and Craig Lebowitz provided technical assistance with data importation and parsing. We would like to thank Ian Clarke and the Freenet developers for their willingness to participate in our study and providing key insights into the open source development process. Karim R. Lakhani would like to acknowledge the generous support of The Boston Consulting Group and Canada's Social Science and Humanities Research Council doctoral fellowship. Georg von Krogh and Sebastian Spaeth acknowledge the generous support from the Research Foundation at the University of St. Gallen. %B Research Policy %V 32 %P 1217-1241 %G eng %1 policy %2 case study %R http://dx.doi.org/10.1016/S0048-7333(03)00050-7 %> https://flosshub.org/sites/flosshub.org/files/krogh03.pdf %0 Journal Article %D 2003 %T Contributing to the common pool resources in Open Source software. A comparison between individuals and firms %A Andrea Bonaccorsi %K developers %K linux %K linux kernel %K Survey %X This paper studies the contributions to Open Source projects of software firms.
Our goal is to analyse whether they follow the same regularities that characterize the behaviour of individual programmers. An exhaustive empirical analysis is carried out using data on project membership, project coordination and contribution efforts of 146 Italian firms that do business with Open Source software. We follow a meta-analytic approach comparing our findings with the results of the surveys conducted on Free Software programmers. Moreover, the availability of the data gathered by Hertel et al. (2003) on 141 developers of the Linux kernel will allow direct comparisons between the two sets. %8 August %G eng %> https://flosshub.org/sites/flosshub.org/files/bnaccorsirossidevelopers.pdf %0 Conference Paper %B Conference on Cooperation, Innovation & Technology (CITE 2003) %D 2003 %T Distributed Collective Practices and F/OSS Problem Management: Perspective and Methods %A Gasser, Les %A Gabriel Ripoche %K Automated process extraction %K bug fixing %K bug reports %K bugzilla %K Collective knowledge management %K Information extraction from natural language texts %K mozilla %K Software problem management %X This paper presents the state of our research on Distributed Collective Practices (DCPs) in Free/Open-Source Software (F/OSS) projects, focusing on sensemaking and resolution of software problems. We are exploring the hypothesis that variations in the content and in the articulation of these socio-technical processes have an impact on the outcome of the activity of F/OSS collectives, and more specifically on problem resolution. Our preliminary techniques for combining qualitative data analysis with automated process extraction result in a scalable analysis method called Computational Amplification (CA). We are applying CA to 128,000 problem reports from the Mozilla F/OSS project. The paper illustrates how CA is used to create multidimensional process models and shows types of conclusions we can reach. 
%B Conference on Cooperation, Innovation & Technology (CITE 2003) %0 Journal Article %J Proceedings of the 3rd ICSE Workshop on Open Source %D 2003 %T Evidences in the evolution of OS projects through Changelog Analyses %A Capiluppi, Andrea %K classification %K freshmeat %K loc %K modularity %K repository %K size %K sloc %K source code %X Most empirical studies about Open Source (OS) projects or products are vertical and usually deal with the flagship, successful projects. There is a substantial lack of horizontal studies to shed light on the whole population of projects, including failures. This paper presents a horizontal study aimed at characterizing OS projects. We analyze a sample of around 400 projects from a popular OS project repository. Each project is characterized by a number of attributes. We analyze these attributes statically and over time. The main results show that few projects are capable of attracting a meaningful community of developers. The majority of projects are developed by few people (in many cases one) with a very slow pace of evolution. We then try to observe how many projects count on a substantial number of developers, and analyze those projects more deeply. The goal is to achieve a better insight into the dynamics of open source development. The initial results of this analysis, especially growth in code size and tendency to stability in modularity, seem to be in line with traditional closed source development. %B Proceedings of the 3rd ICSE Workshop on Open Source %P 19-24 %U http://hdl.handle.net/10552/1037 %> https://flosshub.org/sites/flosshub.org/files/capiluppi2003.pdf %0 Journal Article %J Organization Science %D 2003 %T From a Firm-Based to a Community-Based Model of Knowledge Creation: The Case of the Linux Kernel Development %A Lee, Gwendolyn K. %A Cole, Robert E.
%K credits %K developers %K email %K email archives %K knowledge creation %K linux kernel %K mailing list %K maintainers %K scm %K source code %K Survey %K Volunteers %X We propose a new model of knowledge creation in purposeful, loosely coordinated, distributed systems, as an alternative to a firm-based one. Specifically, using the case of the Linux kernel development project, we build a model of community-based, evolutionary knowledge creation to study how thousands of talented volunteers, dispersed across organizational and geographical boundaries, collaborate via the Internet to produce a knowledge-intensive, innovative product of high quality. By comparing and contrasting the Linux model with the traditional/commercial model of software development and firm-based knowledge creation efforts, we show how the proposed model of knowledge creation expands beyond the boundary of the firm. Our model suggests that the product development process can be effectively organized as an evolutionary process of learning driven by criticism and error correction. We conclude by offering some theoretical implications of our community-based model of knowledge creation for the literature of organizational learning, community life, and the uses of knowledge in society. %B Organization Science %I INFORMS %V 14 %P pp. 633-649 %U http://www.jstor.org/stable/4135125 %0 Journal Article %J Research Policy %D 2003 %T Guarding the Commons: How Community Managed Software Projects Protect Their Work %A Siobhan O'Mahony %K Common Pool Resources %K email %K email archives %K intellectual property %K mailing list %K open source %K Public Goods %K Software %K Survey %X Theorists often speculate why open source and free software project contributors give their work away. Although contributors make their work publicly available, they do not forfeit their rights to it. 
Community managed software projects protect their work by using several legal and normative tactics, which should not be conflated with a disregard for or neglect of intellectual property rights. These tactics allow a project's intellectual property to be publicly and freely available and yet, governable. Exploration of this seemingly contradictory state may provide new insight into governance models for the management of digital intellectual property. %B Research Policy %7 7 %V 32 %P 1179-1198 %8 February %G eng %> https://flosshub.org/sites/flosshub.org/files/rp-omahony.pdf %0 Journal Article %J Research Policy %D 2003 %T How open source software works: "free" user-to-user assistance %A Karim R Lakhani %A von Hippel, Eric %K apache %K help %K logs %K MOTIVATION %K participants %K Survey %K usenet %X Research into free and open source software development projects has so far largely focused on how the major tasks of software development are organized and motivated. But a complete project requires the execution of "mundane but necessary" tasks as well. In this paper, we explore how the mundane but necessary task of field support is organized in the case of Apache web server software, and why some project participants are motivated to provide this service gratis to others. We find that the Apache field support system functions effectively. We also find that, when we partition the help system into its component tasks, 98% of the effort expended by information providers in fact returns direct learning benefits to those providers. This finding considerably reduces the puzzle of why information providers are willing to perform this task "for free." Implications are discussed. 
%B Research Policy %V 32 %P 923-943 %G eng %U http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.110.8172&rep=rep1&type=pdf %M WOS:000183049000004 %1 policy %2 case study %R http://dx.doi.org/10.1016/S0048-7333(02)00095-1 %> https://flosshub.org/sites/flosshub.org/files/lakhani2003.pdf %0 Report %D 2003 %T An Introduction to Open Source Communities %A Eugene Eric Kim %K free software foundation directory %K freshmeat %K fsf %K sourceforge %K squirrelmail %K touchgraph %X This report describes what open source communities are and how they work. It cites relevant research and presents original case studies of two open source projects: TouchGraph and SquirrelMail. It then identifies patterns of collaboration shared by these projects, and describes how these patterns might apply to other types of communities. Finally, it reviews what is still not well understood about open source communities, and proposes several paths for further research. %I Blue Oxen Associates %8 04/2003 %G eng %> https://flosshub.org/sites/flosshub.org/files/blueoxen.pdf %0 Journal Article %J Proceedings of the 2nd Workshop on Open Source Software Engineering ICSE2002 %D 2003 %T Maintainability of the Linux Kernel %A Schach, Stephen R. %A Jin, B. %A Wright, D.R. %K coupling %K kernel %K linux %K linux kernel %K modules %K source code %X We have examined 365 versions of Linux. For every version, we counted the number of instances of common (global) coupling between each of the 17 kernel modules and all the other modules in that version of Linux. We found that the number of instances of common coupling grows exponentially with version number. This result is significant at the 99.99% level, and no additional variables are needed to explain this increase. We conclude that, unless Linux is restructured with a bare minimum of common coupling, the dependencies induced by common coupling will, at some future date, make Linux exceedingly hard to maintain without inducing regression faults. 
%B Proceedings of the 2nd Workshop on Open Source Software Engineering ICSE2002 %8 October %G eng %> https://flosshub.org/sites/flosshub.org/files/linux-maint_0.pdf %0 Journal Article %D 2003 %T Managing the Boundary of an 'Open' Project %A Siobhan O'Mahony %K debian %K membership %K social network analysis %X In the past ten years, the boundaries between public and open science and commercial research efforts have become more porous. Scholars have thus more critically examined ways in which these two institutional regimes intersect. Large open source software projects have also attracted commercial collaborators and now struggle to develop code in an open public environment that still protects their communal boundaries. This research applies a dynamic social network approach to understand how one community managed software project, Debian, develops a membership process. We examine the project's face-to-face social network during a five-year period (1997-2001) to see how changes in the social structure affect the evolution of membership mechanisms and the determination of gatekeepers. While the amount and importance of a contributor's work increases the probability that a contributor will become a gatekeeper, those more central in the social network are more likely to become gatekeepers and influence the membership process. A greater understanding of the mechanisms open projects use to manage their boundaries has critical implications for research and knowledge producing communities operating in pluralistic, open and distributed environments. %8 October %G eng %> https://flosshub.org/sites/flosshub.org/files/omahonyferraro.pdf %0 Conference Paper %B Lecture Notes in Computer Science %D 2003 %T Mining Open Source Software (OSS) Data Using Association Rules Network %A Sanjay Chawla %A Bavani Arunasalam %A Joseph G. 
Davis %K arn %K association rules %K factor analysis %K project success %K sourceforge %K svd %X The Open Source Software(OSS) movement has attracted considerable attention in the last few years. In this paper we report our results of mining data acquired from SourceForge.net, the largest open source software hosting website. In the process we introduce Association Rules Network(ARN), a (hyper)graphical model to represent a special class of association rules. Using ARNs we discover important relationships between the attributes of successful OSS projects. We verify and validate these relationships using Factor Analysis, a classical statistical technique related to Singular Value Decomposition(SVD). %B Lecture Notes in Computer Science %V 2637 %P 461-466 %0 Journal Article %J Journal of the American Society for Information Science and Technology %D 2003 %T Open source software development and Lotka's Law: Bibliometric patterns in programming %A Newby, G. B. %A Greenberg, J. %A Jones, P. %K developers %K linux %K linux software map %K lsm %K sourceforge %K team size %X This research applies Lotka's Law to metadata on open source software development. Lotka's Law predicts the proportion of authors at different levels of productivity. Open source software development harnesses the creativity of thousands of programmers worldwide, is important to the progress of the Internet and many other computing environments, and yet has not been widely researched. We examine metadata from the Linux Software Map (LSM), which documents many open source projects, and Sourceforge, one of the largest resources for open source developers. Authoring patterns found are comparable to prior studies of Lotka's Law for scientific and scholarly publishing. Lotka's Law was found to be effective in understanding software development productivity patterns, and offer promise in predicting aggregate behavior of open source developers. 
%B Journal of the American Society for Information Science and Technology %V 54 %P 169-178 %G eng %M WOS:000180175400008 %1 information science %2 computational %R 10.1002/asi.10177 %0 Conference Paper %B Proceedings of 7th Annual Conference of the Southern Association for Information Systems %D 2003 %T Organizational Structure of Open Source Projects: A Life Cycle Approach %A Donald E. Wynn %K division of labor %K downloads %K growth %K interview %K leadership %K life cycle %K lifecycle %K project success %K roles %K sourceforge %K Survey %X The structure of open source project communities is discussed in relation to the organizational life cycle. In lieu of sales figures, the download counts for each project are used to identify the life cycle stage of a random sample of open source projects. A research model is proposed that attempts to measure the fit between the life cycle stage and the specific organizational characteristics of these projects (focus, division of labor, role of the leader, level of commitment, and coordination/control) as an indicator of the success of a project as measured by the satisfaction and involvement of both developers and users. %B Proceedings of 7th Annual Conference of the Southern Association for Information Systems %> https://flosshub.org/sites/flosshub.org/files/wynn2004.pdf %0 Journal Article %J IEEE Security & Privacy Magazine %D 2003 %T Software security for open-source systems %A Cowan, C. %K security %X Debate over whether open-source software development leads to more or less secure software has raged for years. Neither is intrinsically correct: open-source software gives both attackers and defenders greater power over system security. Fortunately, several security-enhancing technologies for open-source systems can help defenders improve their security. %B IEEE Security & Privacy Magazine %V 1 %P 38 - 45 %8 01/2003 %N 1 %! IEEE Secur. Privacy Mag. 
%R 10.1109/MSECP.2003.1176994 %0 Conference Paper %B 1st Workshop on Open Source in an Industrial Context %D 2003 %T Supporting Distributed and Decentralized Projects: Drawing Lessons from the Open Source Community %A Erenkrantz, J. %A Taylor, R.N. %K abiword %K apache %K debian %K freebsd %K kde %K linux %K mozilla %K mysql %K perl %K PHP %K postgresql %K python %K subversion %K tomcat %K tools %X Open source projects are typically organized in a distributed and decentralized manner. These factors strongly determine the processes followed and constrain the types of tools that can be utilized. This paper explores how distribution and decentralization have affected processes and tools in existing open source projects with the goals of summarizing the lessons learned and identifying opportunities for improving both. Issues considered include decision-making, accountability, communication, awareness, rationale, managing source code, testing, and release management. %B 1st Workshop on Open Source in an Industrial Context %8 10/2003 %> https://flosshub.org/sites/flosshub.org/files/erenkrantz2003.pdf %0 Conference Paper %B Proceedings of the 25th International Conference on Software Engineering %D 2003 %T Toward an understanding of the motivation of Open Source Software developers %A Ye, Yunwen %A Kishida, Kouichi %K change log %K COMMUNITY %K contributions %K contributors %K developers %K email %K email archives %K evolution %K gimp %K log files %K mailing list %K roles %K source code %X An Open Source Software (OSS) project is unlikely to be successful unless there is an accompanied community that provides the platform for developers and users to collaborate. Members of such communities are volunteers whose motivation to participate and contribute is of essential importance to the success of OSS projects. In this paper, we aim to create an understanding of what motivates people to participate in OSS communities. We theorize that learning is one of the motivational forces. 
Our theory is grounded in the learning theory of Legitimate Peripheral Participation, and is supported by analyzing the social structure of OSS communities and the co-evolution between OSS systems and communities. We also discuss practical implications of our theory for creating and maintaining sustainable OSS communities as well as for software engineering research and education. %B Proceedings of the 25th International Conference on Software Engineering %S ICSE '03 %I IEEE Computer Society %C Washington, DC, USA %P 419–429 %@ 0-7695-1877-X %U http://portal.acm.org/citation.cfm?id=776816.776867 %> https://flosshub.org/sites/flosshub.org/files/YeKishida.pdf %0 Conference Paper %B Proceedings of the 2nd ICSE Workshop on Open Source %D 2002 %T Adopting OSS Methods by Adopting OSS Tools %A Robbins, Jason E. %K ant %K argouml %K bugzilla %K cactus %K cvs %K developers %K eclipse %K emacs %K email %K faq %K junit %K mailing lists %K make %K netbeans %K package management %K rpm %K scarab %K subversion %K teams %K tools %K torque %K WORK %X The open source movement has created and used a set of software engineering tools with features that fit the characteristics of open source development processes. To a large extent, the open source culture and methodology are conveyed to new developers via the toolset itself, and through the demonstrated usage of these tools on existing projects. The rapid and wide adoption of open source tools stands in stark contrast to the difficulties encountered in adopting traditional CASE tools. This paper explores the characteristics that make these tools adoptable and how adopting them may influence software development processes. %B Proceedings of the 2nd ICSE Workshop on Open Source %> https://flosshub.org/sites/flosshub.org/files/Robbins.pdf %0 Journal Article %J Information and Software Technology %D 2002 %T Analyzing cloning evolution in the Linux kernel %A Antoniol, G. %A Villano, U. %A Merlo, E. %A Di Penta, M. 
%K cvs %K kernel %K lines of code %K linux %K loc %K project success %K source code %X Identifying code duplication in large multi-platform software systems is a challenging problem. This is due to a variety of reasons including the presence of high-level programming languages and structures interleaved with hardware-dependent low-level resources and assembler code, the use of GUI-based configuration scripts generating commands to compile the system, and the extremely high number of possible different configurations. This paper studies the extent and the evolution of code duplications in the Linux kernel. Linux is a large, multi-platform software system; it is based on the Open Source concept, and so there are no obstacles in discussing its implementation. In addition, it is decidedly too large to be examined manually: the current Linux kernel release (2.4.18) is about three million LOCs. Nineteen releases, from 2.4.0 to 2.4.18, were processed and analyzed, identifying code duplication among Linux subsystems by means of a metric-based approach. The obtained results support the hypothesis that the Linux system does not contain a relevant fraction of code duplication. Furthermore, code duplication tends to remain stable across releases, thus suggesting a fairly stable structure, evolving smoothly without any evidence of degradation. (C) 2002 Elsevier Science B.V. All rights reserved. %B Information and Software Technology %V 44 %P 755-765 %G eng %U web.soccerlab.polymtl.ca/~antoniol/publications/.../infsoft2002.pdf %M WOS:000178367900005 %> https://flosshub.org/sites/flosshub.org/files/infsoft2002.pdf %0 Journal Article %J First Monday %D 2002 %T Cave or Community? 
An Empirical Examination of 100 Mature Open Source Projects %A Sandeep Krishnamurthy %K age %K contributors %K developers %K project success %K registration %K sourceforge %X Starting with Eric Raymond's groundbreaking work, "The Cathedral and the Bazaar", open-source software (OSS) has commonly been regarded as work produced by a community of developers. Yet, given the nature of software programs, one also hears of developers with no lives that work very hard to achieve great product results. In this paper, I sought empirical evidence that would help us understand which is more common - the cave (i.e., lone producer) or the community. Based on a study of the top 100 mature products on Sourceforge, I find a few surprising things. First, most OSS programs are developed by individuals, rather than communities. The median number of developers in the 100 projects I looked at was 4 and the mode was 1 - numbers much lower than previous numbers reported for highly successful projects! Second, most OSS programs do not generate a lot of discussion. Third, products with more developers tend to be viewed and downloaded more often. Fourth, the number of developers associated with a project was positively correlated to the age of the project. Fifth, the larger the project, the smaller the percent of project administrators. %B First Monday %V 7 %8 06/2002 %G eng %> https://flosshub.org/sites/flosshub.org/files/krishnamurthy.pdf %0 Conference Paper %B Proceedings of the 2nd ICSE Workshop on Open Source %D 2002 %T Characterizing the OSS process %A Capiluppi, Andrea %A Patricia Lago %A Maurizio Morisio %K bugs %K change log %K classification %K cvs %K downloads %K freshmeat %K metadata %K patches %K popularity %K project success %K release history %K sourceforge %K vitality %X The Open Source model of software development has gained the attention of both the business, the practitioners’ and the research communities. 
The Open Source process has been described by the seminal paper by Eric Raymond [4] and [5]. However, sound empirical studies are still very limited [3], [6]. Our goal is to investigate the OS process by empirical means, to analyze, characterize it, and possibly model it with quantitative models. It should be noted that the Open Source process provides open process and product data, and therefore is a rare opportunity for empirical research. Our initial research focus is on the characterization of the process, starting from the evolution of OS projects. In traditional projects, a significant number of releases in a short time is usually considered an instability factor [7] and [8], while in the OSS community, it is an evidence of vitality, shows the commitment of the authors and the power of attraction of other programmers [9]. Is it possible to characterize the vitality of projects? And, can vitality be traced to some other characteristics of a project? %B Proceedings of the 2nd ICSE Workshop on Open Source %> https://flosshub.org/sites/flosshub.org/files/CapiluppiLagoMorisio.pdf %0 Journal Article %J Information Systems Journal %D 2002 %T Code quality analysis in open source software development %A Ioannis Stamelos %A Lefteris Angelis %A Apostolos Oikonomou %A Georgios L. Bleris %K C %K Code quality characteristics %K functions %K linux %K metrics %K open source development %K software measurement %K structural code analysis %K Suse %K user satisfaction %X Proponents of open source style software development claim that better software is produced using this model compared with the traditional closed model. However, there is little empirical evidence in support of these claims. In this paper, we present the results of a pilot case study aiming: (a) to understand the implications of structural quality; and (b) to figure out the benefits of structural quality analysis of the code delivered by open source style development. 
To this end, we have measured quality characteristics of 100 applications written for Linux, using a software measurement tool, and compared the results with the industrial standard that is proposed by the tool. Another target of this case study was to investigate the issue of modularity in open source as this characteristic is being considered crucial by the proponents of open source for this type of software development. We have empirically assessed the relationship between the size of the application components and the delivered quality measured through user satisfaction. We have determined that, up to a certain extent, the average component size of an application is negatively related to the user satisfaction for this application. %B Information Systems Journal %V 12 %P 43–60 %0 Conference Proceedings %B The Twenty-Third International Conference on Information Systems %D 2002 %T Economic incentives for participating in open source software projects %A Il-Horn Hann %A Jeff Roberts %A Sandra Slaughter %A Roy Fielding %K apache %K contributions %K email %K email archives %K mailing list %K organizational sponsorship %K participation %K patch %K scm %K source code %K Survey %K version control %X Using the Internet as a basis for communication, collaboration, and storage of artifacts, the open source community is producing software of a quality that was previously thought to be achievable only by professional engineers following strict software development paradigms. This accomplishment is even more astounding as developers contribute to the source code without any remuneration. Open source leaders as well as academics have proposed theories about the motivation of open source developers that are rooted in diverse fields such as social psychology and anthropology. However, Lerner and Tirole (2000) argue that developer participation in open source projects may, in part, be explained by existing economic theory regarding career concerns. 
This research seeks to confirm or disconfirm the existence of economic returns to participation in open source development. Our findings suggest that greater open source participation per se, as measured in contributions made, is not associated with wage increases. However, a higher status in a merit-based ranking within the Apache Project is associated with significantly higher wages. This suggests that employers do not reward the gain in experience through open source participation as an increase in human capital. The results are also consistent with the notion that a high rank within the Apache Software Foundation is a credible signal of the productive capacity of a programmer. %B The Twenty-Third International Conference on Information Systems %P 365–372 %G eng %> https://flosshub.org/sites/flosshub.org/files/42.pdf %0 Conference Paper %B Proceedings of the International Workshop on Principles of Software Evolution %D 2002 %T Evolution patterns of open-source software systems and communities %A Nakakoji, Kumiyo %A Yamamoto, Yasuhiro %A Nishinaka, Yoshiyuki %A Kishida, Kouichi %A Ye, Yunwen %K case study %K open-source software (OSS) %K open-source software community %K software evolution %X Open-Source Software (OSS) development is regarded as a successful model of encouraging "natural product evolution". To understand how this "natural product evolution" happens, we have conducted a case study of four typical OSS projects. 
Unlike most previous studies on software evolution that focus on the evolution of the system per se, our study takes a broader perspective: It examines not only the evolution of OSS systems, but also the evolution of the associated OSS communities, as well as the relationship between the two types of evolution. Through the case study, we have found that while collaborative development within a community is the essential characteristic of OSS, different collaboration models exist, and that the difference in collaboration model results in different evolution patterns of OSS systems and communities. To treat such differences systematically, we propose to classify OSS into three types: Exploration-Oriented, Utility-Oriented, and Service-Oriented. Such a classification can provide guidance on the creation and maintenance of sustainable OSS development and communities. %B Proceedings of the International Workshop on Principles of Software Evolution %S IWPSE '02 %I ACM %C New York, NY, USA %P 76–85 %@ 1-58113-545-9 %U http://doi.acm.org/10.1145/512035.512055 %R 10.1145/512035.512055 %0 Conference Paper %B ICIS 2002. Proceedings of International Conference on Information Systems 2002 %D 2002 %T An Exploratory Study of Factors Influencing the Level of Vitality and Popularity of Open Source Projects %A Stewart, Katherine J. %A Ammeter, Tony %K activity %K audience %K developers %K freshmeat %K license analysis %K licenses %K organizational sponsorship %K project success %K roles %K status %K target audience %K users %X In this research, we ask the question: What differentiates successful from unsuccessful open source software projects? Using a sample of 240 open source projects, we examine how organizational sponsorship, target audience (developer versus end user), license choice, and development status interact over time to influence the extent to which open source software projects attract user attention and developer activity. %B ICIS 2002. 
Proceedings of International Conference on Information Systems 2002 %P 1-5 %8 2002 %0 Journal Article %J Proceedings of the 2nd ICSE Workshop on Open Source %D 2002 %T High Quality and Open Source Software Practices %A T. Halloran %A W. Scherlis %K apache %K bug report %K bug tracker %K bug tracking system %K feature requests %K gcc %K gnome %K kde %K lines of code %K linux %K loc %K mozilla %K netbeans %K perl %K position paper %K python %K sloc %K source code %K Survey %K tomcat %K xfree86 %X Surveys suggest that, according to various metrics, the quality and dependability of today’s open source software is roughly on par with commercial and government developed software. What are the prospects for advancing to much higher levels of quality in open source software? More specifically, what attributes must be possessed by quality-related interventions for them to be feasibly adoptable in open source practice? In order to identify some of these attributes, we conducted a preliminary survey of the quality practices of a number of successful open source projects. We focus, in particular, on attributes related to adoptability by the open source practitioner community. %B Proceedings of the 2nd ICSE Workshop on Open Source %8 2002 %> https://flosshub.org/sites/flosshub.org/files/HalloranScherlis.pdf %0 Journal Article %J Computers & Security %D 2002 %T The Open Source approach—opportunities and limitations with respect to security and privacy %A Hansen, Marit %A Köhntopp, Kristian %A Pfitzmann, Andreas %K security %X Today’s software often does not even fulfil basic security or privacy requirements. Some people regard the open source paradigm as the solution to this problem. First, we carefully explain the security and privacy aspects of open source, which in particular offer the possibility for a dramatic increase in trustworthiness for and autonomy of the user. We show which expectations for an improvement of the software trustworthiness dilemma are realistic. 
Finally, we describe measures necessary for developing secure and trustworthy open source systems. %B Computers & Security %I Elsevier %V 21 %P 461–471 %U https://dud.inf.tu-dresden.de/literatur/HaKP_02OpenSource_0214.doc %> https://flosshub.org/sites/flosshub.org/files/HaKP_02OpenSource_0214.doc %0 Book Section %B Proceedings of the Eighth Americas Conference on Information Systems %D 2002 %T The open source software development phenomenon: An analysis based on social network theory %A Madey, G. %A Freeh, V %A Tynan, R %K developers %K social network analysis %K social networks %K sourceforge %X The OSS movement is a phenomenon that challenges many traditional theories in economics, software engineering, business strategy, and IT management. Thousands of software programmers are spending tremendous amounts of time and effort writing and debugging software, most often with no direct monetary compensation. The programs, some of which are extremely large and complex, are written without the benefit of traditional project management, change tracking, or error checking techniques. Since the programmers are working outside of a traditional organizational reward structure, accountability is an issue as well. A significant portion of internet e-commerce runs on OSS, and thus many firms have little choice but to trust mission-critical e-commerce systems to run on such software, requiring IT management to deal with new types of socio-technical problems. A better understanding of how the OSS community functions may help IT planners make more informed decisions and develop more effective strategies for using OSS software. We hypothesize that open source software development can be modeled as self-organizing, collaboration, social networks. We analyze structural data on over 39,000 open source projects hosted at SourceForge.net involving over 33,000 developers. 
We define two software developers to be connected part of a collaboration social network if they are members of the same project, or are connected by a chain of connected developers. Project sizes, developer project participation, and clusters of connected developers are analyzed. We find evidence to support our hypothesis, primarily in the presence of power-law relationships on project sizes (number of developers per project), project membership (number of projects joined by a developer), and cluster sizes. Potential implications for IT researchers, IT managers, and governmental policy makers are discussed. %B Proceedings of the Eighth Americas Conference on Information Systems %P 1806–1813 %U http://ais.bepress.com/cgi/viewcontent.cgi?article=1606&context=amcis2002 %> https://flosshub.org/sites/flosshub.org/files/MadeyFreehAmcis2002.pdf %0 Journal Article %J IEE Proceedings Software %D 2002 %T Open Source Software Projects as Virtual Organizations: Competency Rallying for Software Development %A Kevin Crowston %A Barbara Scozzi %K competencies %K competency rallying %K coordination %K project success %K sourceforge %K virtual organizations %X The contribution of this paper is the identification and testing of factors important for the success of Open Source Software (OSS) projects. We present an analysis of OSS communities as virtual organizations and apply Katzy and Crowston's (2000) competency rallying (CR) theory to the case of OSS development projects. CR theory suggests that project participants must develop necessary competencies, identify and understand market opportunities, marshal competencies to meet the opportunity and manage a short-term cooperative process. Using data collected from 7477 OSS projects hosted by the SourceForge system (http://sourceforge.net/), we formulate and test a set of specific hypotheses derived from CR theory. 
%B IEE Proceedings Software %V 149 %P 3–17 %G eng %> https://flosshub.org/sites/flosshub.org/files/crowston.pdf %0 Journal Article %J Journal of Law, Economics and Organization %D 2002 %T The Scope of Open Source Licensing %A Josh Lerner %A Jean Tirole %K developers %K license %K licenses %K permissive %K restrictive %K sourceforge %X This paper is an initial exploration of the determinants of open source license choice. It first enumerates the various considerations that should figure into the licensor's choice of contractual terms, in particular highlighting how the decision is shaped not just by the preferences of the licensor itself, but also by that of the community of developers. The paper then presents an empirical analysis of the determinants of license choice using the SourceForge database, a compilation of nearly 40,000 open source projects. Projects geared toward end-users tend to have restrictive licenses, while those oriented toward developers are less likely to do so. Projects that are designed to run on commercial operating systems and those geared towards the Internet are less likely to have restrictive licenses. Finally, projects that are likely to be attractive to consumers such as games are more likely to have restrictive licenses. %B Journal of Law, Economics and Organization %V 21 %P 20-56 %8 2005 %G eng %> https://flosshub.org/sites/flosshub.org/files/lernertirole2.pdf %0 Generic %D 2002 %T Security in open versus closed systems—the dance of Boltzmann, Coase and Moore %A Anderson, Ross %K security %X Some members of the open-source and free software community argue that their code is more secure, because vulnerabilities are easier for users to find and fix. Meanwhile the proprietary vendor community maintains that access to source code rather makes things easier for the attackers. In this paper, I argue that this is the wrong way to approach the interaction between security and the openness of design. 
I show first that under quite reasonable assumptions the security assurance problem scales in such a way that making it either easier, or harder, to find attacks, will help attackers and defendants equally. This model may help us focus on and understand those cases where some asymmetry is introduced. However, there are more pressing security problems for the open source community. The interaction between security and openness is entangled with attempts to use security mechanisms for commercial advantage – to entrench monopolies, to control copyright, and above all to control interoperability. As an example, I will discuss TCPA, a recent initiative by Intel and others to build DRM technology into the PC platform. Although advertised as providing increased information security for users, it appears to have more to do with providing commercial advantage for vendors, and may pose an existential threat to open systems. %I Technical report, Cambridge University, England %U http://www.cl.cam.ac.uk/~rja14/Papers/toulouse.pdf %> https://flosshub.org/sites/flosshub.org/files/toulouse.pdf %0 Journal Article %J Information systems journal %D 2002 %T On the security of open source software %A Payne, Christian %K security %X With the rising popularity of so-called ‘open source’ software there has been increasing interest in both its various benefits and disadvantages. In particular, despite its prominent use in providing many aspects of the Internet’s basic infrastructure, many still question the suitability of such software for the commerce-oriented Internet of the future. This paper evaluates the suitability of open source software with respect to one of the key attributes that tomorrow’s Internet will require, namely security. It seeks to present a variety of arguments that have been made, both for and against open source security and analyses in relation to empirical evidence of system security from a previous study.
The results represent preliminary quantitative evidence concerning the security issues surrounding the use and development of open source software, in particular relative to traditional proprietary software. %B Information systems journal %I Wiley Online Library %V 12 %P 61–78 %> https://flosshub.org/sites/flosshub.org/files/Payne2002_ISJ12_SecurityOSS.pdf %0 Journal Article %J Software, IEE Proceedings - %D 2002 %T Trust and vulnerability in open source software %A Hissam, S. A. %A Plakosh, D. %A Weinstock, C. %K closed source software %K community of software developers %K critical infrastructures %K cyber criminal %K open source software %K PITAC %K predictably reliable systems %K predictably secure systems %K software components %K trust %K users %K vulnerability %X Software plays an ever increasing role in the critical infrastructures that run our cities, manage our economies, and defend our nations. In 1999, the President's Information Technology Advisory Committee (PITAC) reported to the United States President the need for software components that are reliable, tested, modelled and secure supporting the development of predictably reliable and secure systems that underscore our critical infrastructures. Open source software (OSS) constitutes a viable source for software components. Some believe that OSS is more reliable and more secure than closed source software (CSS), due to a phenomenon dubbed 'many eyeballs', but is this truly the case? Or does OSS give the cyber criminal an edge that he would likewise not have?
We explore OSS from the perspective of the cyber criminal and discuss what the community of software developers and users alike can do to increase their trust in both open source software and closed source software. %B Software, IEE Proceedings - %V 149 %P 47–51 %8 02/2002 %N 1 %& 47 %R 10.1049/ip-sen:20020208 %0 Journal Article %J ACM Transactions on Software Engineering and Methodology %D 2002 %T Two case studies of open source software development: Apache and Mozilla %A Audris Mockus %A Roy Fielding %A Herbsleb, J. D. %K apache %K bug fixing %K bug reports %K bugzilla %K change history %K core %K defect density %K email %K email archives %K mailing list %K mozilla %K ownership %K participation %K productivity %K scm %K source code %X According to its proponents, open source style software development has the capacity to compete successfully, and perhaps in many cases displace, traditional commercial development methods. In order to begin investigating such claims, we examine data from two major open source projects, the Apache web server and the Mozilla browser. By using email archives of source code change history and problem reports we quantify aspects of developer participation, core team size, code ownership, productivity, defect density, and problem resolution intervals for these OSS projects. We develop several hypotheses by comparing the Apache project with several commercial projects. We then test and refine several of these hypotheses, based on an analysis of Mozilla data. We conclude with thoughts about the prospects for high-performance commercial/open source process hybrids.
%B ACM Transactions on Software Engineering and Methodology %V 11 %P 309-346 %G eng %M WOS:000177759000002 %1 software engineering %2 case study %> https://flosshub.org/sites/flosshub.org/files/mockusFieldingHerbsleb2002.pdf %0 Conference Paper %B The 2nd Workshop on Open Source Software Engineering at the 24th International Conference on Software Engineering (ICSE2002) %D 2002 %T Understanding OSS as a self-organizing process %A Madey, G. %A Freeh, V %A Tynan, R %K developers %K size %K social network analysis %K social networks %K sourceforge %X We hypothesize that open source software development can be modeled as self-organizing, collaboration, social networks. We analyze structural data on over 39,000 open source projects hosted at SourceForge.net. We define two software developers to be connected (part of a collaboration social network) if they are members of the same project, or are connected by a chain of connected developers. Project sizes, developer project participation, and clusters of connected developers are analyzed. We find evidence to support our hypothesis, primarily in the presence of power-law relationships on project sizes (number of developers per project), project membership (number of projects joined by a developer), and cluster sizes. %B The 2nd Workshop on Open Source Software Engineering at the 24th International Conference on Software Engineering (ICSE2002) %> https://flosshub.org/sites/flosshub.org/files/MadeyFreehTynan.pdf %0 Conference Paper %B Proceedings of the 2nd ICSE Workshop on Open Source %D 2002 %T Why Do Developers Contribute to Open Source Projects?
First Evidence of Economic Incentives %A Il-Horn Hann %A Jeff Roberts %A Sandra Slaughter %A Roy Fielding %K apache %K contributions %K cvs %K developers %K ECONOMICS %K email %K email archives %K financial %K Human capital %K mailing list %K MOTIVATION %K participation %K source code %K version control %X The availability of commercial quality, free software products such as the Apache HTTP (web) server or the Linux operating system has focused significant attention on the open source development process by which these products were created. One of the more perplexing aspects of open source software projects is why developers freely devote their time and energy to these projects. While many open source participants cite idealistic motives for participation, Lerner and Tirole (2000) argue that developer participation in open source projects may, in part, be explained by existing economic theory regarding career concerns. This research seeks to confirm or disconfirm the existence of economic returns to participation in open source development. Preliminary results of our empirical investigation suggest that greater open source participation per se, as measured in contributions made, does not lead to wage increases. However, a higher status in a merit-based ranking within the Apache Project does lead to significantly higher wages. This suggests that employers do not reward the gain in experience through open source participation as an increase in human capital. The results are also consistent with the notion that a high rank within the Apache Software Foundation is a credible signal of the productive capacity of a programmer. %B Proceedings of the 2nd ICSE Workshop on Open Source %> https://flosshub.org/sites/flosshub.org/files/HannRobertsSlaughterFielding.pdf %0 Journal Article %J IEEE Software %D 2001 %T Does open source improve system security? 
%A Witten, Brian %A Landwehr, Carl %A Caloyannides, Michael %K security %X An attacker could examine public source code to find flaws in a system. So, is source code access a net gain or loss for security? The authors consider this question from several perspectives and tentatively conclude that having source code available should work in favor of system security. %B IEEE Software %I IEEE %V 18 %P 57–61 %U https://pdfs.semanticscholar.org/71f6/01579ad1c373ed59a19eba0396f7f0cb7a0e.pdf %> https://flosshub.org/sites/flosshub.org/files/01579ad1c373ed59a19eba0396f7f0cb7a0e.pdf %0 Conference Proceedings %B International Conference on Information Systems 2001 %D 2001 %T An exploratory study of ideology and trust in open source development groups %A Katherine Stewart %A Gosain, S. %K contributors %K groups %K ideology %K license analysis %K licenses %K metadata %K open source %K sourceforge %K Survey %K team %K team size %K teams %K trust %K types %X Open source (OS) software development has been the subject of heightened interest among organizational scholars because of the novel social coordination practices that signal a departure from traditional proprietary software development. We propose that trust among group members in open source development groups (OSDGs) plays a key role in facilitating their success. Trust is important in this context because of the risk of opportunistic behavior by other members who volunteers may not have met and may never expect to meet, as well as a lack of explicit market contracts or common organizational affiliation. The open source community is differentiated by a coherent ideology that emphasizes a distinct set of interrelated norms, beliefs, and values. These serve to create incentives for open source practices that eschew conventional transactional norms in favor of a gift culture and a focus on reputations. 
In this study, we primarily examine the role of the shared ideology in enabling the development of affective and cognitive trust in OSDGs. We further examine how this trust leads to desired outcomes - group efficacy and effectiveness. The study is based on exploratory interviews, examination of archival records and a preliminary survey to understand the specific conditions of open source efforts on which this work-in-progress report is based. This is being followed-up by empirical testing of our research model through a survey of a broad variety of OSDGs. This study would contribute to a clarification of the role of trust in enabling software groups to work effectively and help to understand the bases of trust in ideology-permeated groups. %B International Conference on Information Systems 2001 %G eng %U http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.104.638&rep=rep1&type=pdf %R 10.1.1.104.638 %> https://flosshub.org/sites/flosshub.org/files/stewartGosain2001.pdf %0 Conference Paper %B Proceedings of the 4th International Workshop on Principles of Software Evolution (IWPSE 2001) %D 2001 %T Growth, evolution, and structural change in open source software %A Michael Godfrey %A Tu, Qiang %K agile methods %K beagle %K cloning %K evolution %K fetchmail %K gcc %K growth %K kernel %K lehman's laws %K lines of code %K linux %K linux kernel %K loc %K open source software %K software architecture %K software evolution %K source code %K structural change %K supporting environments %K vim %X Our recent work has addressed how and why software systems evolve over time, with a particular emphasis on software architecture and open source software systems [2, 3, 6]. In this position paper, we present a short summary of two recent projects. First, we have performed a case study on the evolution of the Linux kernel [3], as well as some other open source software (OSS) systems. 
We have found that several OSS systems appear not to obey some of "Lehman's laws" of software evolution [5, 7], and that Linux in particular is continuing to grow at a geometric rate. Currently, we are working on a detailed study of the evolution of one of the subsystems of the Linux kernel: the SCSI drivers subsystem. We have found that cloning, which is usually considered to be an indicator of lazy development and poor process, is quite common and is even considered to be a useful practice. Second, we are developing a tool called Beagle to aid software maintainers in understanding how large systems have changed over time. Beagle integrates data from various static analysis and metrics tools and provides a query engine as well as navigable visualizations. Of particular note, Beagle aims to provide help in modelling long term evolution of systems that have undergone architectural and structural change. %B Proceedings of the 4th International Workshop on Principles of Software Evolution (IWPSE 2001) %S IWPSE '01 %I ACM %C New York, NY, USA %P 103–106 %@ 1-58113-508-4 %U http://doi.acm.org/10.1145/602461.602482 %R http://doi.acm.org/10.1145/602461.602482 %> https://flosshub.org/sites/flosshub.org/files/tu2001.pdf %0 Conference Paper %B 1st Workshop on Open Source Software Engineering at ICSE 2001 %D 2001 %T Software Architectures and Open Source Software – Where can Research Leverage the Most? %A Arief, B. %A Gacek, C. %A Lawrie, T. %K architecture %K software architecture %X Software architectures have been playing a central role in software engineering research for some years now. They are considered of pivotal importance in the success of complex software systems development. However, with the emergence of Open Source Software (OSS) development, a new opportunity for studying architectural issues arises. 
In this paper, we introduce accepted notions of software architectures (Section 2), discuss some of the known issues in OSS (Section 3), resulting in a set of aspects we consider to be relevant for future research (Section 4). %B 1st Workshop on Open Source Software Engineering at ICSE 2001 %8 05/2001 %> https://flosshub.org/sites/flosshub.org/files/ariefgaceklawrie.pdf %0 Conference Paper %B 1st Workshop on Open Source Software Engineering at ICSE 2001 %D 2001 %T Software Development Practices in Open Software Development Communities: A Comparative Case Study %A Walt Scacchi %K apache %K argouml %K astronomy %K chandra %K games %K infrastructure %K internet news %K mozilla %K systems design %X This study presents an initial set of findings from an empirical study of social processes, technical system configurations, organizational contexts, and interrelationships that give rise to open software. "Open software", or more narrowly, open source software, represents an approach for communities of like-minded participants to develop software system representations that are intended to be shared freely, rather than offered as closed commercial products. While there is a growing popular literature attesting to open software [DiBona, Ockman, Stone 1999, Fogel 1999], there are very few systematic studies [e.g., Feller and Fitzgerald 2000, Mockus, Fielding, Herbsleb 2000] that inform how these communities produce software. Similarly, little is known about how people in these communities coordinate software development across different settings, or about what software processes, work practices, and organizational contexts are necessary to their success. To the extent that academic research communities and commercial enterprises seek the supposed efficacy of open software [Smarr and Graham 2000], they will need grounded models of the processes and practices of open software development to allow effective investment of their resources.
This study investigates four communities engaged in open software development. Case study methods are used to compare practices across communities. %B 1st Workshop on Open Source Software Engineering at ICSE 2001 %> https://flosshub.org/sites/flosshub.org/files/scacchi.pdf %0 Conference Paper %B 1st Workshop on Open Source Software Engineering at ICSE 2001 %D 2001 %T Software Engineering Research in the Bazaar %A Hassan, Ahmed E. %A Godfrey, Michael W. %A Holt, Richard C. %K apache %K architecture %K gcc %K kernel %K linux %K linux kernel %K mozilla %K open source software %K software architecture %K Software Engineering Research %K source code %K vim %X During the last five years, our research group has studied the architecture and evolution of several large open source systems — including Linux, GCC, VIM, Mozilla, and Apache — and we have found that open source software systems often exhibit interesting differences when compared to similar commercially-developed systems. Our investigations of these systems have involved the creation of software architecture models, software architecture repair, the creation of a reference architecture for web servers, the study of evolution and growth of open source systems, and the modelling of architectural properties of systems that are apparent only at build time. 
%B 1st Workshop on Open Source Software Engineering at ICSE 2001 %> https://flosshub.org/sites/flosshub.org/files/hassangodfreyholt.pdf %0 Journal Article %J Proceedings of the International Conference on Software Engineering (ICSE 2000) %D 2000 %T A Case Study of Open Source Software Development: The Apache Server %A Audris Mockus %A Roy Fielding %A Herbsleb, James %K apache %K bug fix revisions %K bugs %K core %K cvs %K defect density %K developers %K email archives %K participation %K productivity %K revision control %K revision history %K roles %K scm %K source code %K team size %X According to its proponents, open source style software development has the capacity to compete successfully, and perhaps in many cases displace, traditional commercial development methods. We examine the development process of a major open source application, the Apache web server. By using email archives of source code change history and problem reports we quantify aspects of developer participation, core team size, code ownership, productivity, defect density, and problem resolution interval for this OSS project. This analysis reveals a unique process, which performs well on important measures. %B Proceedings of the International Conference on Software Engineering (ICSE 2000) %8 June %G eng %> https://flosshub.org/sites/flosshub.org/files/mockusapache.pdf %0 Conference Paper %B Proceedings of the 2000 ACM conference on Computer supported cooperative work (CSCW) %D 2000 %T Collaboration with Lean Media: how open-source software succeeds %A Yamauchi, Yutaka %A Yokozawa, Makoto %A Shinohara, Takeshi %A Ishida, Toru %K cooperative work %K cvs %K distributed work %K electronic media %K INNOVATION %K open-source %K software engineering %X Open-source software, usually created by volunteer programmers dispersed worldwide, now competes with that developed by software firms. This achievement is particularly impressive as open-source programmers rarely meet. 
They rely heavily on electronic media, which preclude the benefits of face-to-face contact that programmers enjoy within firms. In this paper, we describe findings that address this paradox based on observation, interviews and quantitative analyses of two open-source projects. The findings suggest that spontaneous work coordinated afterward is effective, rational organizational culture helps achieve agreement among members and communications media moderately support spontaneous work. These findings can imply a new model of dispersed collaboration. %B Proceedings of the 2000 ACM conference on Computer supported cooperative work (CSCW) %S CSCW '00 %I ACM %C New York, NY, USA %P 329–338 %@ 1-58113-222-0 %U http://doi.acm.org/10.1145/358916.359004 %R 10.1145/358916.359004 %0 Conference Paper %B Proceedings of the International Conference on Software Maintenance (ICSM'00) %D 2000 %T Evolution in Open Source Software: A Case Study %A Godfrey, Michael W. %A Tu, Qiang %K evolution %K functions %K growth %K lines of code %K linux %K linux kernel %K loc %K source code %X Most studies of software evolution have been performed on systems developed within a single company using traditional management techniques. With the widespread availability of several large software systems that have been developed using an 'open source' development approach, we now have a chance to examine these systems in detail, and see if their evolutionary narratives are significantly different from commercially developed systems. This paper summarizes our preliminary investigations into the evolution of the best known open source system: the Linux operating system kernel. Because Linux is large (over two million lines of code in the most recent version) and because its development model is not as tightly planned and managed as most industrial software processes, we had expected to find that Linux was growing more slowly as it got bigger and more complex. 
Instead, we have found that Linux has been growing at a super-linear rate for several years. In this paper, we explore the evolution of the Linux kernel both at the system level and within the major subsystems, and we discuss why we think Linux continues to exhibit such strong growth. %B Proceedings of the International Conference on Software Maintenance (ICSM'00) %S ICSM '00 %I IEEE Computer Society %C Washington, DC, USA %P 131– %@ 0-7695-0753-0 %U http://portal.acm.org/citation.cfm?id=850948.853411 %> https://flosshub.org/sites/flosshub.org/files/godfrey00.pdf