%0 Conference Proceedings %B 2017 IEEE/ACM 12th International Workshop on Software Engineering for Science (SE4Science) %D 2017 %T Advancing Open Science with Version Control and Blockchains %A Jonathan Bell %A Thomas D. LaToza %A Foteini Baldimtsi %A Angelos Stavrou %K blockchain %K replication %K reproducible %X The scientific community is facing a crisis of reproducibility: confidence in scientific results is damaged by concerns regarding the integrity of experimental data and the analyses applied to that data. Experimental integrity can be compromised inadvertently when researchers overlook some important component of their experimental procedure, or intentionally by researchers or malicious third-parties who are biased towards ensuring a specific outcome of an experiment. The scientific community has pushed for “open science” to add transparency to the experimental process, asking researchers to publicly register their data sets and experimental procedures. We argue that the software engineering community can leverage its expertise in tracking traceability and provenance of source code and its related artifacts to simplify data management for scientists. Moreover, by leveraging smart contract and blockchain technologies, we believe that it is possible for such a system to guarantee end-to-end integrity of scientific data and results while supporting collaborative research. %B 2017 IEEE/ACM 12th International Workshop on Software Engineering for Science (SE4Science) %P 13-14 %8 05/2017 %0 Journal Article %J Journal of Systems and Software %D 2017 %T Predicting bug-fixing time: A replication study using an open source software project %A Akbarinasaji, Shirin %A Caglayan, Bora %A Bener, Ayse %K Replication study; Bug fixing time; Effort estimation; Software maintainability; Deferred bugs %X Background: On projects with tight schedules and limited budgets, it may not be possible to resolve all known bugs before the next release. 
Estimates of the time required to fix known bugs (the “bug fixing time”) would assist managers in allocating bug fixing resources when faced with a high volume of bug reports. Aim: In this work, we aim to replicate a model for predicting bug fixing time with open source data from Bugzilla Firefox. Method: To perform the replication study, we follow the replication guidelines put forth by Carver [J. C. Carver, Towards reporting guidelines for experimental replications: a proposal, in: 1st International Workshop on Replication in Empirical Software Engineering, 2010.]. Similar to the original study, we apply a Markov-based model to predict the number of bugs that can be fixed monthly. In addition, we employ Monte-Carlo simulation to predict the total fixing time for a given number of bugs. We then use the k-nearest neighbors algorithm to classify fixing times into slow and fast. Result: The results of the replicated study on Firefox are consistent with those of the original study. The results show that there are similarities in the bug handling behaviour of both systems. Conclusion: We conclude that the model that estimates the bug fixing time is robust enough to be generalized, and we can rely on this model for our future research. %B Journal of Systems and Software %8 2/2017 %! Journal of Systems and Software %R 10.1016/j.jss.2017.02.021 %0 Conference Proceedings %B Open Source Systems: Towards Robust Practices 13th International Conference on Open Source Systems %D 2017 %T Release Early, Release Often and Release on Time. An Empirical Case Study of Release Management %A Teixeira, Jose %K openstack %K release management %X The dictum of “Release early, release often.” by Eric Raymond as the Linux modus operandi highlights the importance of release management in open source software development. Nevertheless, there are very few empirical studies addressing release management in open source software development. 
It is already known that most open source software communities adopt either feature-based or time-based release strategies. Each of these has its advantages and disadvantages that are context-specific. Recent research reported that many prominent open source software projects have moved from feature-based to time-based releases. In this longitudinal case study, we narrate how OpenStack shifted towards a liberal six-month release cycle. Whereas prior research discussed why projects should adopt time-based releases and how they can adopt such a strategy, we discuss how OpenStack adapted its software development processes, its organizational design and its tools toward a hybrid release management strategy: an effort to balance the benefits and drawbacks of feature-based and time-based release strategies. %B Open Source Systems: Towards Robust Practices 13th International Conference on Open Source Systems %S IFIP Advances in Information and Communication Technology %I Springer %V 496 %P 167-181 %8 05/2017 %U https://link.springer.com/chapter/10.1007/978-3-319-57735-7_16 %R 10.1007/978-3-319-57735-7_16 %0 Journal Article %J Journal of Information Processing %D 2016 %T Magnet or Sticky? Measuring Project Characteristics from the Perspective of Developer Attraction and Retention %A Yamashita, Kazuhiro %A Kamei, Yasutaka %A McIntosh, Shane %A Hassan, Ahmed E. %A Ubayashi, Naoyasu %K github %K retention %X Open Source Software (OSS) is vital to both end users and enterprises. As OSS systems are becoming a type of infrastructure, long-term OSS projects are desired. For the survival of OSS projects, the projects need to not only retain existing developers, but also attract new developers to grow. To better understand how projects retain and attract contributors, our preliminary study aimed to measure the personnel attraction and retention of OSS projects using a pair of population migration metrics, called Magnet (personnel attraction) and Sticky (retention) metrics. 
Because the preliminary study analyzed only 90 projects and the 90 projects are not representative of GitHub, this paper extends the preliminary study to better understand the generalizability of the results by analyzing 16,552 projects of GitHub. Furthermore, we also add a pilot study to investigate the typical duration between releases to find a more appropriate release duration. The study results show that (1) approximately 23% of developers remain in the same projects that the developers contribute to, (2) the larger projects are likely to attract and retain more developers, (3) 53% of terminal projects eventually decay to a state of fewer than ten developers and (4) 55% of attractive projects remain in an attractive category. %B Journal of Information Processing %V 24 %P 339-348 %U https://www.jstage.jst.go.jp/article/ipsjjip/24/2/24_339/_article %R 10.2197/ipsjjip.24.339 %0 Report %D 2015 %T Candoia: A Platform and an Ecosystem for Building and Deploying Versatile Mining Software Repositories Tools %A Nitin M. Tiwari %A Dalton D. Mills %A Ganesha Upadhyaya %A Eric Lin %A Rajan, Hridesh %K Analysis of software and its evolution %K Application specific development environments %K flossmole cited %K msr %K research to practice %K software evolution %K software repositories %X Research on mining software repositories (MSR) has shown great promise during the last decade in solving many challenging software engineering problems. There exists, however, a ‘valley of death’ between these significant innovations in the MSR research and their deployment in practice. The significant cost of converting a prototype to software; the need to provide support for a wide variety of tools and technologies, e.g. CVS, SVN, Git, Bugzilla, Jira, Issues, etc., to improve applicability; and the high cost of customizing tools to practitioner-specific settings are some key hurdles in transition to practice. 
We describe Candoia, a platform and an ecosystem that is aimed at bridging this valley of death between innovations in MSR research and their deployment in practice. We have implemented Candoia and provide facilities to build and publish MSR ideas as Candoia apps. Our evaluation demonstrates that Candoia drastically reduces the cost of converting an idea to an app, thus reducing the barrier to transitioning research findings into practice. We also see versatility in a Candoia app’s ability to work with a variety of tools and technologies that the platform supports. Finally, we find that customizing a Candoia app to fit project-specific needs is often well within the grasp of developers. %B Iowa State University Computer Science Technical Reports %I Iowa State University %8 11/2015 %U http://lib.dr.iastate.edu/cgi/viewcontent.cgi?article=1378&context=cs_techreports %> https://flosshub.org/sites/flosshub.org/files/Candoia-%20A%20Platform%20and%20an%20Ecosystem%20for%20Building%20and%20Deploying%20V.pdf %0 Book Section %B Open Source Systems: Adoption and Impact %D 2015 %T The RISCOSS Platform for Risk Management in Open Source Software Adoption %A Franch, X. %A Kenett, R. %A Mancinelli, F. %A Susi, A. %A Ameller, D. %A Annosi, M.C. %A Ben-Jacob, R. %A Blumenfeld, Y. %A Franco, O.H. %A Gross, D. %A Lopez, L. %A Morandini, M. %A Oriol, M. %A Siena, A. %E Damiani, Ernesto %E Frati, Fulvio %E Dirk Riehle %E Wasserman, Anthony I. %K Open source adoption %K Open Source Projects %K open source software %K OSS %K Risk Management %K Software platform %X Managing risks related to OSS adoption is a must for organizations that need to smoothly integrate OSS-related practices in their development processes. Adequate tool support may pave the road to effective risk management and ensure the sustainability of such activity. In this paper, we present the RISCOSS platform for managing risks in OSS adoption. 
RISCOSS builds upon a highly configurable data model that allows customization to several types of scopes. It implements two different working modes: exploration, where the impact of decisions may be assessed before making them; and continuous assessment, where risk variables (and their possible consequences on business goals) are continuously monitored and reported to decision-makers. The blackboard-oriented architecture of the platform defines several interfaces for the identified techniques, allowing new techniques to be plugged in. %B Open Source Systems: Adoption and Impact %S IFIP Advances in Information and Communication Technology %I Springer International Publishing %V 451 %P 124-133 %@ 978-3-319-17836-3 %U http://dx.doi.org/10.1007/978-3-319-17837-0_12 %R 10.1007/978-3-319-17837-0_12 %0 Journal Article %J MIS Quarterly %D 2014 %T COLLABORATION THROUGH OPEN SUPERPOSITION: A THEORY OF THE OPEN SOURCE WAY. %A Howison, James %A Kevin Crowston %K COLLABORATION %K COMPUTER programmers %K COMPUTER programming %K COMPUTER software %K coordination %K FREEWARE (Computer software) %K INFORMATION storage & retrieval systems %K open source software %K research %K socio-technical system %X This paper develops and illustrates the theory of collaboration through open superposition: the process of depositing motivationally independent layers of work on top of each other over time. The theory is developed in a study of community-based free and open source software (FLOSS) development, through a research arc of discovery (participant observation), replication (two archival case studies), and theorization. The theory explains two key findings: (1) the overwhelming majority of work is accomplished with only a single programmer working on any one task, and (2) tasks that appear too large for any one individual are more likely to be deferred until they are easier rather than being undertaken through structured team work. 
Moreover, the theory explains how working through open superposition can lead to the discovery of a work breakdown that results in complex, functionally interdependent, work being accomplished without crippling search costs. We identify a set of socio-technical %B MIS Quarterly %V 38 %P 29 - A9 %0 Book Section %B Open Source Software: Mobile Open Source Technologies %D 2014 %T Crafting a Systematic Literature Review on Open-Source Platforms %A Teixeira, Jose %A Baiyere, Abayomi %E Corral, Luis %E Sillitti, Alberto %E Succi, Giancarlo %E Vlasenko, Jelena %E Wasserman, Anthony I. %K Ecosystems %K FLOSS %K open-source %K Platforms %K R&D Management %X This working paper unveils the crafting of a systematic literature review on open-source platforms. The highly competitive mobile devices market, where several players such as Apple, Google, Nokia and Microsoft run a platforms-war with constant shifts in their technological strategies, is gaining increasing attention from scholars. It matters, then, to review previous literature on past platforms-wars, such as the ones from the PC and game-console industries, and assess its implications to the current mobile devices platforms-war. The paper starts by justifying the purpose and rationale behind this literature review on open-source platforms. The concepts of open-source software and computer-based platforms are then discussed both individually and in unison, in order to clarify the core-concept of “open-source platform” that guides this literature review. The detailed design of the employed methodological strategy is then presented as the central part of this paper. The paper concludes with preliminary findings organizing previous literature on open-source platforms for the purpose of guiding future research in this area. 
%B Open Source Software: Mobile Open Source Technologies %S IFIP Advances in Information and Communication Technology %I Springer Berlin Heidelberg %V 427 %P 113-122 %@ 978-3-642-55127-7 %U http://dx.doi.org/10.1007/978-3-642-55128-4_16 %R 10.1007/978-3-642-55128-4_16 %0 Book Section %B Open Source Software: Mobile Open Source Technologies %D 2014 %T Flow Research SXP Agile Methodology for FOSS Projects %A Peñalver Romero, Gladys Marsi %A Leyva Samada, Lisandra Isabel %A Abad, Abel Meneses %E Corral, Luis %E Sillitti, Alberto %E Succi, Giancarlo %E Vlasenko, Jelena %E Wasserman, Anthony I. %K methodology SXP %K open-source %K production %K research %K Software %X This paper aims to explain a procedure that takes into account the different research processes carried out in developing an open-source project, allowing their control and management. A study applying the SXP methodology to this type of project was carried out, supporting the validity of the basis of this research. %B Open Source Software: Mobile Open Source Technologies %S IFIP Advances in Information and Communication Technology %I Springer Berlin Heidelberg %V 427 %P 195-198 %@ 978-3-642-55127-7 %U http://dx.doi.org/10.1007/978-3-642-55128-4_28 %R 10.1007/978-3-642-55128-4_28 %0 Book Section %B Open Source Software: Mobile Open Source Technologies %D 2014 %T A Layered Approach to Managing Risks in OSS Projects %A Franch, Xavier %A Kenett, Ron %A Mancinelli, Fabio %A Susi, Angelo %A Ameller, David %A Ben-Jacob, Ron %A Siena, Alberto %E Corral, Luis %E Sillitti, Alberto %E Succi, Giancarlo %E Vlasenko, Jelena %E Wasserman, Anthony I. %K Layered Model %K open source %K OSS %K Risk Management %X In this paper, we propose a layered approach to managing risks in OSS projects. 
We define three layers: the first one for defining risk drivers by collecting and summarising available data from different data sources, including human-provided contextual information; the second layer, for converting these risk drivers into risk indicators; the third layer for assessing how these indicators impact the business of the adopting organisation. The contributions are: 1) the complexity of gathering data is isolated in one layer using appropriate techniques, 2) the context needed to interpret this data is provided by expert involvement evaluating risk scenarios and answering questionnaires in a second layer, 3) a pattern-based approach and risk reasoning techniques to link risks to business goals is proposed in the third layer. %B Open Source Software: Mobile Open Source Technologies %S IFIP Advances in Information and Communication Technology %I Springer Berlin Heidelberg %V 427 %P 168-171 %@ 978-3-642-55127-7 %U http://dx.doi.org/10.1007/978-3-642-55128-4_23 %R 10.1007/978-3-642-55128-4_23 %0 Conference Paper %B Proceedings of the 11th Working Conference on Mining Software Repositories %D 2014 %T Understanding "Watchers" on GitHub %A Sheoran, Jyoti %A Blincoe, Kelly %A Kalliamvakou, Eirini %A Damian, Daniela %A Ell, Jordan %K github %K mining challenge %K msr challenge %K repositories %K Software Teams %K Watchers %X Users on GitHub can watch repositories to receive notifications about project activity. This introduces a new type of passive project membership. In this paper, we investigate the behavior of watchers and their contribution to the projects they watch. We find that a subset of project watchers begin contributing to the project and those contributors account for a significant percentage of contributors on the project. As contributors, watchers are more confident and contribute over a longer period of time in a more varied way than other contributors. This is likely attributable to the knowledge gained through project notifications. 
%B Proceedings of the 11th Working Conference on Mining Software Repositories %S MSR 2014 %I ACM %C New York, NY, USA %P 336–339 %@ 978-1-4503-2863-0 %U http://doi.acm.org/10.1145/2597073.2597114 %R 10.1145/2597073.2597114 %0 Conference Paper %B Proceedings of The International Symposium on Open Collaboration %D 2014 %T Volunteer Attraction and Retention in Open Source Communities %A Barcomb, Ann %K Community Management %K FLOSS %K open source %K Recruitment %K Service Duration %K Volunteer Management %K Volunteer Retention %K Volunteers %X The importance of volunteers in open source has led to the position of community manager becoming more common in foundations and projects. Yet the advice for volunteer management and retention is fragmented, incomplete, contradictory, and has not been empirically examined. Our aim is to fill this gap by creating a comprehensive guidebook of best practices drawing from open source practitioner guides and general literature on volunteering, and to subject a subset of practices to empirical study. A method for evaluating volunteer attrition in terms of value to the organization will also be developed. %B Proceedings of The International Symposium on Open Collaboration %S OpenSym '14 %I ACM %C New York, NY, USA %P 40:1–40:2 %@ 978-1-4503-3016-9 %U http://doi.acm.org/10.1145/2641580.2641628 %R 10.1145/2641580.2641628 %0 Conference Proceedings %B 35th Int'l Conference on Software Engineering (ICSE 2013) %D 2013 %T Boa: A Language and Infrastructure for Analyzing Ultra-Large-Scale Software Repositories %A Dyer, Robert %A Nguyen, Hoan Anh %A Rajan, Hridesh %A Nguyen, Tien N. %K ease of use %K forge %K github %K google code %K lower barrier to entry %K mining %K repository %K reproducible %K scalable %K Software %K sourceforge %X In today’s software-centric world, ultra-large-scale software repositories, e.g. 
SourceForge (350,000+ projects), GitHub (250,000+ projects), and Google Code (250,000+ projects) are the new library of Alexandria. They contain an enormous corpus of software and information about software. Scientists and engineers alike are interested in analyzing this wealth of information both for curiosity as well as for testing important hypotheses. However, systematic extraction of relevant data from these repositories and analysis of such data for testing hypotheses is hard, and best left for mining software repository (MSR) experts! The goal of Boa, a domain-specific language and infrastructure described here, is to ease testing MSR-related hypotheses. We have implemented Boa and provide a web-based interface to Boa’s infrastructure. Our evaluation demonstrates that Boa substantially reduces programming efforts, thus lowering the barrier to entry. We also see drastic improvements in scalability. Last but not least, reproducing an experiment conducted using Boa is just a matter of re-running small Boa programs provided by previous researchers. %B 35th Int'l Conference on Software Engineering (ICSE 2013) %P 422-431 %8 05/2013 %0 Journal Article %J Entertainment Computing %D 2013 %T Building and mining a repository of design pattern instances: Practical and research benefits %A Apostolos Ampatzoglou %A Olia Michou %A Ioannis Stamelos %K flossmole cited %K repository %X Design patterns are well-known design solutions that are reported to produce substantial benefits with respect to software quality. However, to our knowledge there are no scientific efforts on gathering information on software projects that use design patterns. This paper introduces a web repository of design patterns instances that have been used in open source projects. The usefulness of such a repository lies in the provision of a base of knowledge, where developers can identify reusable components and researchers can find a mined data set. 
Currently, 141 open source projects have been considered and more than 4500 pattern instances have been found and recorded in the database of the repository. The evaluation of the repository has been performed from an academic and a practical point of view. The results suggest that the repository can be useful for both experienced and inexperienced users. However, the benefits of using the repository are more significant for inexperienced users. %B Entertainment Computing %V 4 %P 131 - 142 %U http://www.sciencedirect.com/science/article/pii/S1875952112000195 %R 10.1016/j.entcom.2012.10.002 %0 Conference Paper %B Proceedings of the 22nd International Conference on World Wide Web Companion %D 2013 %T Discovery of Technical Expertise from Open Source Code Repositories %A Venkataramani, Rahul %A Gupta, Atul %A Asadullah, Allahbaksh %A Muddu, Basavaraju %A Bhat, Vasudev %K github %K knowledge discovery %K recommendations %K source code repository %K stackoverflow %K technical expertise %X Online Question and Answer websites for developers have emerged as the main forums for interaction during the software development process. The veracity of an answer in such websites is typically verified by the number of 'upvotes' that the answer garners from peer programmers using the same forum. Although this mechanism has proved to be extremely successful in rating the usefulness of the answers, it does not lend itself very elegantly to modeling the expertise of a user in a particular domain. In this paper, we propose a model to rank the expertise of the developers in a target domain by mining their activity in different open source projects. To demonstrate the validity of the model, we built a recommendation system for StackOverflow which uses the data mined from GitHub. 
%B Proceedings of the 22nd International Conference on World Wide Web Companion %S WWW '13 Companion %I International World Wide Web Conferences Steering Committee %C Republic and Canton of Geneva, Switzerland %P 97–98 %@ 978-1-4503-2038-2 %U http://dl.acm.org/citation.cfm?id=2487788.2487832 %0 Conference Proceedings %D 2013 %T Project Roles in the Apache Software Foundation: A Dataset %A Squire, Megan %K apache %K dataset %K roles %X This paper outlines the steps in the creation and maintenance of a new dataset listing leaders of the various projects of the Apache Software Foundation (ASF). Included in this dataset are different levels of committers to the various ASF project code bases, as well as regular and emeritus members of the ASF, and directors and officers of the ASF. The dataset has been donated to the FLOSSmole project under an open source license, and is available for download (https://code.google.com/p/flossmole/downloads/detail?name=apachePeople2013-Jan.zip), or for direct querying via a database client. %8 05/2013 %> https://flosshub.org/sites/flosshub.org/files/apacheRolesPREPRINT.pdf %> https://flosshub.org/sites/flosshub.org/files/MSR%20presentation_0.pdf %0 Conference Proceedings %B 3rd International Workshop on Replication in Empirical Software Engineering Research (RESER2013) %D 2013 %T A Replicable Infrastructure for Empirical Studies of Email Archives %A Squire, Megan %K apache %K cleaning %K collection %K couchdb %K database %K document-oriented database %K email %K lucene %K mailing lists %K nosql %K replication %K storage %X This paper describes a replicable infrastructure solution for conducting empirical software engineering studies based on email mailing list archives. 
Mailing list emails, such as those affiliated with free, libre, and open source software (FLOSS) projects, are currently archived in several places online, but each research team that wishes to study these email artifacts closely must design their own solution for collection, storage and cleaning of the data. Consequently, research results will be difficult to replicate, especially as the email archive for any living project will still be continually growing. This paper describes a simple, replicable infrastructure for the collection, storage, and cleaning of project email data and analyses. %B 3rd International Workshop on Replication in Empirical Software Engineering Research (RESER2013) %I IEEE %C Baltimore, MD, USA %P 43-50 %8 10/2013 %@ 978-0-7695-5121-0 %> https://flosshub.org/sites/flosshub.org/files/RESERv2.pdf %0 Conference Proceedings %B 10th Working Conference on Mining Software Repositories %D 2013 %T Who Does What during a Code Review? Datasets of OSS Peer Review Repositories %A Kazuki Hamasaki %A Raula Gaikovina Kula %A Norihiro Yoshida %A A. E. Camargo Cruz %A Kenji Fujiwara %A Hajimu Iida %K android %K case study %K code review %K data set %K peer review %K roles %K source code %X We present four datasets that are focused on the general roles of OSS peer review members. With data mined from both an integrated peer review system and source code repositories, our rich datasets comprise peer review data that was automatically recorded. Using the Android project as a case study, we describe our extraction methodology, the datasets and their application in three separate studies. 
Our datasets are available online at http://sdlab.naist.jp/reviewmining/ %B 10th Working Conference on Mining Software Repositories %8 05/2013 %0 Journal Article %J International Journal of Open Source Software and Processes %D 2012 %T How the FLOSS Research Community Uses Email Archives %A Squire, Megan %K email %K email archives %K literature %K mailing lists %K review %K Survey %X Artifacts of the software development process, such as source code or emails between developers, are a frequent object of study in empirical software engineering literature. One of the hallmarks of free, libre, and open source software (FLOSS) projects is that the artifacts of the development process are publicly accessible and therefore easily collected and studied. Thus, there is a long history in the FLOSS research community of using these artifacts to gain understanding about the phenomenon of open source software, which could then be compared to studies of software engineering more generally. This paper looks specifically at how the FLOSS research community has used email artifacts from free and open source projects. It provides a classification of the relevant literature using a publicly available online repository of papers about FLOSS development using email. The outcome of this paper is to provide a broad overview for the software engineering and FLOSS research communities of how other researchers have used FLOSS email message artifacts in their work. %B International Journal of Open Source Software and Processes %V 4 %P 37 - 59 %8 12/2012 %N 1 %R 10.4018/jossp.2012010103 %> https://flosshub.org/sites/flosshub.org/files/ijossp_v3_PREPRINT.pdf %0 Thesis %D 2012 %T Software Libre y abierto: comunidades y redes de producción digital de bienes comunes %A Tania E. 
Turner Sen %K bienes comunes %K commons %K comunidades virtuales %K FLOSS %K flossmole %K hackers %K redes virtuales %K repositories %K repositorios %K Software libre y abierto %K virtual communities %K virtual networks %X This thesis is about a collective form of production that has expanded and strengthened in the global high-technology market: FLOSS production. The study takes into account that technologies are not neutral; they emerge as strategies and mechanisms of political and economic interests. Although FLOSS production is embedded in the capitalist context, the collective work of the communities and networks that produce it is based on ideas about freedom and solidarity. The types of rules and organization of labour inside these communities have developed a kind of product that is well categorized as part of the new commons. The conclusions at the end of this work aim to offer a clear account of the dynamics of FLOSS production networks inside the virtual infrastructure. Specifically, they offer an account of the interactions and forms of cooperation, as well as of the individual and collective schemas that motivate individuals to cooperate. %I Universidad Nacional Autónoma de México %C Ciudad de México, México %P 269 pages %U http://132.248.9.195/ptd2012/agosto/406008604/Index.html %> https://flosshub.org/sites/flosshub.org/files/Tesis.pdf %0 Conference Proceedings %B Open Source Systems: Grounding Research (OSS 2011) %D 2011 %T Building Knowledge in Open Source Software Research in Six Years of Conferences %A Mulazzini, Fabio %A Rossi, Bruno %A Russo, Barbara %A Steff, Maximilian %K Cross-citations %K flossmole cited %K graph %K literature review %K network %K research %K Systematic Mapping Study %X Since its origins, the diffusion of the OSS phenomenon and the information about it has been entrusted to the Internet and its virtual communities of developers. 
This public mass of data has attracted the interest of researchers and practitioners aiming at formalizing it into a body of knowledge. To this aim, in 2005, a new series of conferences on OSS started to collect and convey OSS knowledge to the research and industrial community. Our work mines articles of the OSS conference series to understand the process of knowledge grounding and the community surrounding it. As such, we propose a semi-automated approach for a systematic mapping study on these articles. We automatically build a map of cross-citations among all the papers of the conferences and then we manually inspect the resulting clusters to identify knowledge building blocks and their mutual relationships. We found that industry-related, quality assurance, and empirical studies often originate or maintain new streams of research. %B Open Source Systems: Grounding Research (OSS 2011) %I Springer %P 123-141 %8 10/2011 %0 Conference Paper %B International Conference on Software and Systems Process (ICSSP 2011) %D 2011 %T Experiences Mining Open Source Release Histories %A Jason Tsay %A Wright, Hyrum %A Perry, Dewayne %K doap %K flossmole cited %K life cycle %K release engineering %K release history %K release management %K releases %X Software releases form a critical part of the life cycle of a software project. Typically, each project produces releases in its own way, using various methods of versioning, archiving, announcing and publishing the release. Understanding the release history of a software project can shed light on the project history, as well as the release process used by that project, and how those processes change. However, many factors make automating the retrieval of release history information difficult, such as the many sources of data, a lack of relevant standards and a disparity of tools used to create releases. 
In spite of the large amount of raw data available, no attempt has been made to create a release history database of a large number of projects in the open source ecosystem. This paper presents our experiences, including the tools, techniques and pitfalls, in our early work to create a software release history database which will be of use to future researchers who want to study and model the release engineering process in greater depth. %B International Conference on Software and Systems Process (ICSSP 2011) %8 05/2011 %> https://flosshub.org/sites/flosshub.org/files/icssp11short-p034-tsay.pdf %0 Conference Proceedings %B Open Source Systems: Grounding Research (OSS 2011) %D 2011 %T Knowledge Homogeneity and Specialization in the Apache HTTP Server Project %A MacLean, Alexander C. %A Pratt, Landon J. %A Knutson, Charles D. %A Ringger, Eric K. %K apache %K commits %K developer %K email %K email archive %K LDA %K mailing list %K revision control %K revision history %K scm %K social network analysis %K specialization %K subversion %K svn %X We present an analysis of developer communication in the Apache HTTP Server project. Using topic modeling techniques we expose latent conceptual sub-communities arising from developer specialization within the greater developer population. However, we found that among the major contributors to the project, very little specialization exists. We present theories to explain this phenomenon, and suggest further research. 
%B Open Source Systems: Grounding Research (OSS 2011) %I Springer %P 106-122 %8 10/2011 %U http://sequoia.cs.byu.edu/lab/files/pubs/MacLean2011a.pdf %> https://flosshub.org/sites/flosshub.org/files/MacLean2011a.pdf %0 Conference Proceedings %B Open Source Systems: Grounding Research (OSS 2011) %D 2011 %T Package Upgrade Robustness: An Analysis for GNU/Linux Package Management Systems %A Thomson, John %A Guerreiro, Andre %A Paulo Trezentos %A Johnson, Jeff %K linux %K package management %K rpm %X GNU/Linux systems are today used in servers, desktops, mobile and embedded devices. One of the critical operations is the installation and maintenance of software packages in the system. Currently there are no frameworks or tools for evaluating Package Management Systems (PMSs), such as RPM, in Linux and for measuring their reliability. The authors perform an analysis of the robustness of the RPM engine and discuss some of the current limitations. This article contributes to the enhancement of Software Reliability in Linux by providing a framework and testing tools under an open source license. These tools can easily be extended to other PMSs such as DEB packages or Gentoo Portage. %B Open Source Systems: Grounding Research (OSS 2011) %I Springer %P 299-306 %8 10/2011 %0 Journal Article %J Empirical Software Engineering %D 2011 %T The search for a research method for studying OSS process innovation %A Prechelt, Lutz %A Oezbek, Christopher %K argouml %K Bochs %K bugzilla %K Flyspray %K FreeDOS %K gEDA %K grounded theory %K Grub %K Innovation introduction %K KVM %K mailing list %K Methodology %K MonetDB %K open source %K Request Tracker %K Rox %K U-Boot %K Xfce %X Medium-sized, open-participation Open Source Software (OSS) projects do not usually perform explicit software process improvement on any routine basis. It would be useful to understand how to get such a project to accept a process improvement proposal and hence to perform process innovation. 
We want to determine an effective and feasible qualitative research method for studying the above question. We present (narratively) a case study of how we worked towards and eventually found such a research method. The case involves four attempts at collecting suitable data about innovation episodes (direct participation (twice), polling developers for episodes, manually finding episodes in mailing list archives) and the adaptation of the Grounded Theory data analysis methodology. Direct participation allows gathering rather rich data, but does not allow for observing a sufficiently large number of innovation episodes. Polling developers for episodes did not prove to be useful. Using mailing list archives to find data to be analyzed is both feasible and effective. We also describe how the data thus found can be analyzed based on the Grounded Theory Method with suitable adjustments. By-and-large, our findings ought to apply to studying various phenomena in OSS development processes that are similarly heavyweight and infrequent. However, specific details may block this possibility and we cannot predict which details that might be. The amount of effort involved in direct participation approaches to qualitative research can easily be underestimated. Also, survey approaches are not well-suited for many process issues in OSS, because too few developers are sufficiently process-conscious. An approach based on passive observation is a viable alternative in the OSS context due to the availability of large amounts of fairly complete archival data. %B Empirical Software Engineering %V 16 %P 514 - 537 %8 8/2011 %N 4 %! Empir Software Eng %R 10.1007/s10664-011-9160-1 %0 Conference Paper %B 1st Workshop on Replication in Empirical Software Engineering Research %D 2010 %T Beyond replication: An example of the potential benefits of replicability in the mining of software repositories community %A Gregorio Robles %A Daniel M. 
German %K literature review %K msr %K replication %B 1st Workshop on Replication in Empirical Software Engineering Research %8 05/2010 %0 Conference Paper %B Proceedings of the 3rd International Workshop on Emerging Trends in Free/Libre/Open Source Software Research and Development (FLOSS '10) %D 2010 %T The onion has cancer: some social network analysis visualizations of open source project communication %A Oezbek, Christopher %A Prechelt, Lutz %A Thiel, Florian %K argouml %K Bochs %K bugzilla %K communication structure %K Flyspray %K gEDA %K Grub %K MonetDB %K open source process %K request tracker %K Rox %K social network analysis %K Xfce %X Background: People contribute to OSS projects in wildly different degrees, from reporting a single defect once and never coming back to spending many hours each workday on the project over several years - or anything in between. It is a common conception that these degrees of participation sort the participants into a number of similar groups which are layered like the peels of an onion: The onion model. Objective: We check whether this model of gradually different degrees of participation is valid with respect to the participation in OSS project mailing-list traffic. Methods: We perform social network analysis based on replies to mailing-list messages and use visualization to check the nature of three different groups of participants. Results: There appears to be a discontinuity with respect to core members: The degree to which very active core members (as opposed to less active co-developers) react to e-mails of senders from the project's periphery is significantly higher than would be expected from their level of activity in general. Limitations: The effect might be an artifact of the assumption that each mailing-list message can be treated the same. 
Conclusions: We conclude that core member status may be qualitatively (rather than just quantitatively) different and the transition of individual mailing-list participants towards ever higher participation is qualitatively discontinuous. %B Proceedings of the 3rd International Workshop on Emerging Trends in Free/Libre/Open Source Software Research and Development (FLOSS '10) %S FLOSS '10 %I ACM %C New York, NY, USA %P 5–10 %@ 978-1-60558-978-7 %U http://doi.acm.org/10.1145/1833272.1833274 %R 10.1145/1833272.1833274 %> https://flosshub.org/sites/flosshub.org/files/OezThiPre10-SNA.pdf %0 Conference Paper %B Seventh Annual Acquisition Research Symposium, NPS Proceedings - %D 2010 %T On Open and Collaborative Software Development in the DoD %A Hissam, S. A. %A Weinstock, C. %A Bass, L. %K collaborative development %K open source software %K reuse %K software engineering %X The US Department of Defense (specifically, but not limited to, the DoD CIO's Clarifying Guidance Regarding Open Source Software, DISA's launch of Forge.mil and OSD's Open Technology Development Roadmap Plan) has called for increased use of open source software and the adoption of best practices from the free/open source software (F/OSS) community to foster greater reuse and innovation between programs in the DoD. In our paper, we examine some key aspects of open and collaborative software development inspired by the success of the F/OSS movement as it might manifest itself within the US DoD. This examination is made from two perspectives: the reuse potential among DoD programs sharing software and the incentives, strategies and policies that will be required to foster a culture of collaboration needed to achieve the benefits indicative of F/OSS. 
Our conclusion is that to achieve predictable and expected reuse, not only are technical infrastructures needed, but also a shift to the business practices in the software development and delivery pattern seen in the traditional acquisition lifecycle is needed. Thus, there is potential to overcome the challenges discussed within this paper to engender a culture of openness and community collaboration to support the DoD mission. %B Seventh Annual Acquisition Research Symposium, NPS Proceedings - %I Naval Postgraduate School %C Monterey, California %V 1 %P 219–235 %8 04/2010 %U http://www.acquisitionresearch.net/cms/_files/FY2010/NPS-AM-10-037.pdf %0 Conference Paper %B 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010) %D 2010 %T Replicating MSR: A study of the potential replicability of papers published in the Mining Software Repositories proceedings %A Gregorio Robles %K data %K literature review %K msr %K replication %X This paper is the result of reviewing all papers published in the proceedings of the former International Workshop on Mining Software Repositories (MSR) (2004-2006) and now Working Conference on MSR (2007-2009). We have analyzed the papers that contained any experimental analysis of software projects for their potentiality of being replicated. In this regard, three main issues have been addressed: i) the public availability of the data used as case study, ii) the public availability of the processed dataset used by researchers and iii) the public availability of the tools and scripts. A total number of 171 papers have been analyzed from the six workshops/working conferences up to date. Results show that MSR authors use in general publicly available data sources, mainly from free software repositories, but that the amount of publicly available processed datasets is very low. 
Regarding tools and scripts, for a majority of papers we have not been able to find any tool, even for papers where the authors explicitly state that they have built one. Lessons learned from the experience of reviewing the whole MSR literature and some potential solutions to lower the barriers of replicability are finally presented and discussed. %B 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010) %I IEEE %C Cape Town, South Africa %P 171 - 180 %@ 978-1-4244-6802-7 %U http://gsyc.urjc.es/~grex/msr2010 %R 10.1109/MSR.2010.5463348 %> https://flosshub.org/sites/flosshub.org/files/171MSR_2010_69.final_.pdf %0 Conference Paper %B Second International Workshop on Building Sustainable Open Source Communities (OSCOMM 2010) %D 2010 %T Responsiveness as a measure for assessing the health of OSS ecosystems %A Gamalielsson, Jonas %A Lundell, Björn %A Lings, Brian %K email %K email archives %K gmane %K mailing lists %K nagios %K response time %K sourceforge %X The health of an Open Source ecosystem is an important decision factor when considering the adoption of Open Source software or when monitoring a seeded Open Source project. In this paper we introduce responsiveness as a qualitative measure of the quality of replies within mailing lists, which can be used for assessing ecosystem health. We consider one specific metric of responsiveness in this paper, and that is the response time of follow-up messages in mailing lists. We also describe a way for characterising the nature of communication in messages with short and long response times. The approach is tested in the context of the Nagios project, and we particularly focus on the responsiveness for contributors acting in their professional roles as core developers. Our contribution is a step towards a deeper understanding of voluntary support provided in mailing lists of OSS projects. 
%B Second International Workshop on Building Sustainable Open Source Communities (OSCOMM 2010) %8 05/2010 %> https://flosshub.org/sites/flosshub.org/files/osscomm002.pdf %0 Conference Paper %B 2010 43rd Hawaii International Conference on System Sciences (HICSS 2010) %D 2010 %T Towards an Openness Rating System for Open Source Software %A Bein, Wolfgang %A Jeffery, Clinton %K alice %K case study %K contribution %K documentation %K freespire %K galib %K latex %K license %K linux %K linux kernel %K mediaportal %K openness %K openoffice %K opensolaris %K rating %K unicon %X Many open source software projects are not very open to third party developers. The point of open source is to enable anyone to fix bugs or add desired capabilities without holding them hostage to the original developers. This principle is important because an open source project's developers may be unresponsive or unable to meet third party needs, even if funding support for requested improvements is offered. This paper presents a simple rating system for evaluating the openness of software distributions. The rating system considers factors such as platform portability, documentation, licensing, and contribution policy. Several popular open source products are rated in order to illustrate the efficacy of the rating system. 
%B 2010 43rd Hawaii International Conference on System Sciences (HICSS 2010) %I IEEE %C Honolulu, Hawaii, USA %P 1 - 8 %@ 978-1-4244-5509-6 %R 10.1109/HICSS.2010.405 %> https://flosshub.org/sites/flosshub.org/files/10-07-04.pdf %0 Journal Article %J International Journal of Open Source Software and Processes %D 2010 %T Weaving a Semantic Web Across OSS Repositories %A Olivier Berger %A Valentin Vlasceanu %A Christian Bac %A Quang Vu Dang %A Lauriere, Stéphane %K archive %K bug %K bugtracker %K database %K debian %K forge %K interoperability %K ontology %K OSLC-CM %K RDF %K repository of repositories %K semantic %K semantic Web %X Several public repositories and archives of “facts” about libre software projects, maintained either by open source communities or by research communities, have been flourishing over the Web in recent years. These have enabled new analysis and support for new quality assurance tasks. This paper presents some complementary existing tools, projects and models proposed both by OSS actors or research initiatives that are likely to lead to useful future developments in terms of study of the FLOSS phenomenon, and also to the very practitioners in the FLOSS development projects. A goal of the research conducted within the HELIOS project is to address bugs traceability issues. In this regard, the authors investigate the potential of using Semantic Web technologies in navigating between many different bugtracker systems scattered all over the open source ecosystem. By using Semantic Web techniques, it is possible to interconnect the databases containing data about open-source software projects development, which enables OSS partakers to identify resources, annotate them, and further interlink those using dedicated properties and collectively designing a distributed semantic graph. 
%B International Journal of Open Source Software and Processes %V 2 %P 29 - 40 %8 32/2010 %N 2 %R 10.4018/jossp.2010040103 %> https://flosshub.org/sites/flosshub.org/files/wopdasd2009-olivier-berger.pdf %0 Journal Article %J The R Journal %D 2009 %T Collaborative Software Development Using R-Forge %A Stefan Theußl %A Achim Zeileis %K forge %K R %K scm %K source code repositories %K statistics %X Open source software (OSS) is typically created in a decentralized self-organizing process by a community of developers having the same or similar interests (see the famous essay by Raymond, 1999). A key factor for the success of OSS over the last two decades is the Internet: Developers who rarely meet face-to-face can employ new means of communication, both for rapidly writing and deploying software (in the spirit of Linus Torvalds’ “release early, release often” paradigm). Therefore, many tools emerged that assist a collaborative software development process, including in particular tools for source code management (SCM) and version control. In the R world, SCM is not a new idea; in fact, the R Development Core Team has always been using SCM tools for the R sources, first by means of Concurrent Versions System (CVS, see Cederqvist et al., 2006), and then via Subversion (SVN, see Pilato et al., 2004). A central repository is hosted by ETH Zürich mainly for managing the development of the base R system. Mailing lists like R-help, R-devel and many others are currently the main communication channels in the R community. First, we present the core features that R-Forge offers to the R community. Second, we give a hands-on tutorial on how users and developers can get started with R-Forge. In particular, we illustrate how people can register, set up new projects, use R-Forge’s SCM facilities, provide their packages on R-Forge, host a project-specific website, and how package maintainers submit a package to the Comprehensive R Archive Network (CRAN, http://CRAN.R-project.org/).
Finally, we summarize recent developments and give a brief outlook to future work. %B The R Journal %V 1 %P 9-14 %8 05/2009 %> https://flosshub.org/sites/flosshub.org/files/rjournal.pdf %0 Journal Article %J Decis. Support Syst. %D 2009 %T Determinants of open source software project success: A longitudinal study %A Subramaniam, Chandrasekar %A Sen, Ravi %A Nelson, Matthew L. %K contributors %K developers %K licenses %K longitudinal study %K Open source project %K OSS %K project success %K restrictive %K Software project success %X In this paper, we investigate open source software (OSS) success using longitudinal data on OSS projects. We find that restrictive OSS licenses have an adverse impact on OSS success. On further analysis, restrictive OSS license is found to be negatively associated with developer interest, but is positively associated with the interest of non-developer users and project administrators. We also show that developer and non-developer interest in the OSS project and the project activity levels in any time period significantly affect the project success measures in subsequent time period. The implications of our findings for OSS research and practice are discussed. %B Decis. Support Syst. %I Elsevier Science Publishers B. V. %C Amsterdam, The Netherlands %V 46 %P 576–585 %8 January %U http://portal.acm.org/citation.cfm?id=1480545.1480824 %R 10.1016/j.dss.2008.10.005 %0 Journal Article %J Journal of Systems and Software %D 2009 %T Identifying exogenous drivers and evolutionary stages in FLOSS projects %A Karl Beecher %A Capiluppi, Andrea %A Boldyreff, Cornelia %K developers %K forge %K forges %K repositories %K repository %K scm %K software repositories %K sourceforge %K success %K users %X The success of a Free/Libre/Open Source Software (FLOSS) project has been evaluated in the past through the number of commits made to its configuration management system, number of developers and number of users. 
Most studies, based on a popular FLOSS repository (SourceForge), have concluded that the vast majority of projects are failures. This study's empirical results confirm and expand conclusions from an earlier and more limited work. Not only do projects from different repositories display different process and product characteristics, but a more general pattern can be observed. Projects may be considered as early inceptors in highly visible repositories, or as established projects within desktop-wide projects, or finally as structured parts of FLOSS distributions. These three possibilities are formalized into a framework of transitions between repositories. The framework developed here provides a wider context in which results from FLOSS repository mining can be more effectively presented. Researchers can draw different conclusions based on the overall characteristics studied about an Open Source software project's potential for success, depending on the repository that they mine. These results also provide guidance to OSS developers when choosing where to host their project and how to distribute it to maximize its evolutionary success. %B Journal of Systems and Software %V 82 %P 739 - 750 %U http://www.sciencedirect.com/science/article/B6V0N-4TVTJFS-1/2/e32ecee1bcb54bd4a5dff6d5e3daca8d %R 10.1016/j.jss.2008.10.026 %0 Conference Paper %B The International Symposium on Wikis and Open Collaboration %D 2009 %T A Jury of Your Peers: Quality, Experience and Ownership in Wikipedia %A Halfaker, A. %A Kittur, N. %A Kraut, R. %A Riedl, J. 
%K experience %K ownership %K peer %K quality %K review %K wikipedia %K wikiwork %B The International Symposium on Wikis and Open Collaboration %C Orlando, FL %8 10/2009 %G eng %0 Conference Paper %B 2009 6th IEEE International Working Conference on Mining Software Repositories (MSR) %D 2009 %T On mining data across software repositories %A Anbalagan, Prasanth %A Vouk, Mladen %K bug reports %K bugzilla %K Fedora %K Firefox %K htmlscraper %K integration %K launchpad %K national vulnerability database %K RedHat %K Suse %K tracker %K Ubuntu %X Software repositories provide abundance of valuable information about open source projects. With the increase in the size of the data maintained by the repositories, automated extraction of such data from individual repositories, as well as of linked information across repositories, has become a necessity. In this paper we describe a framework that uses web scraping to automatically mine repositories and link information across repositories. We discuss two implementations of the framework. In the first implementation, we automatically identify and collect security problem reports from project repositories that deploy the Bugzilla bug tracker using related vulnerability information from the National Vulnerability Database. In the second, we collect security problem reports for projects that deploy the Launchpad bug tracker along with related vulnerability information from the National Vulnerability Database. We have evaluated our tool on various releases of Fedora, Ubuntu, Suse, RedHat, and Firefox projects. The percentage of security bugs identified using our tool is consistent with that reported by other researchers. 
%B 2009 6th IEEE International Working Conference on Mining Software Repositories (MSR) %I IEEE %C Vancouver, BC, Canada %P 171 - 174 %@ 978-1-4244-3493-0 %R 10.1109/MSR.2009.5069498 %> https://flosshub.org/sites/flosshub.org/files/171MiningAcrossmsr09.pdf %0 Journal Article %J Research Policy %D 2009 %T Monetary donations to an open source software platform %A Sandeep Krishnamurthy %A Tripathi, Arvind K. %K Collective action %K Donation %K Identification %K incentives %K metadata %K MOTIVATION %K Open source software platform %K projects %K Reciprocity %K Relational commitment %K sourceforge %X Online open source software platforms, such as Sourceforge.net, play a vital role in creating an ecosystem that enables the creation and growth of open source projects. However, there is little research exploring the interactions between open source stakeholders and the platform. We believe that the sustainability of the platform crucially depends on financial incentives. While platforms can obtain these incentives through multiple means, in this paper we focus on one form of financial incentives—voluntary monetary donations by open source community members. We report findings from two empirical studies that examine factors that impact donations. Study 1 investigates the factors that cause some community members to donate and not others. We find that the decision to donate is impacted by relational commitment with open source software platform, donation to projects and accepting donations from others. Study 2 examines what drives the level of donation. We find that the length of association with the platform and relational commitment affects donation levels. %B Research Policy %V 38 %P 404 - 414 %8 03/2009 %N 2 %! 
Research Policy %R 10.1016/j.respol.2008.11.004 %0 Journal Article %J Journal of Evolutionary Economics %D 2009 %T Returns from social capital in open source software networks %A Méndez-Durón, Rebeca %A García, Clara E. %K contributors %K developers %K games %K gpl %K project success %K roles %K social capital %K social network analysis %K social networks %K sourceforge %K srda %K teams %X Open Source Software projects base their operation on a collaborative structure for knowledge exchange in the form of provision or reception of information, expertise, and feedback on the creation of source code. Here, we address the direction of these knowledge flows among projects throughout social networks and their impact on project success. We identify the roles of membership or contribution that individuals play within projects. We found that connections through contributors who bring their knowledge to the project, improve project success, and that connection through members, who transfer their knowledge towards other projects, enhance project success. Finally, we found that ties through shared membership and contributions hamper project success. The analysis of knowledge flows and their impact on project success imply a translation of returns from investment in social capital, where investment takes the shape of knowledge flows and the returns mean the projects' diffusion over the network. %B Journal of Evolutionary Economics %V 19 %P 277 - 295 %8 4/2009 %N 2 %! 
J Evol Econ %R 10.1007/s00191-008-0125-5 %> https://flosshub.org/sites/flosshub.org/files/Mendez-DuronGarcia.pdf %0 Conference Paper %B 2009 6th IEEE International Working Conference on Mining Software Repositories (MSR) %D 2009 %T SourcererDB: An aggregated repository of statically analyzed and cross-linked open source Java projects %A Ossher, Joel %A Bajracharya, Sushil %A Linstead, Erik %A Baldi, Pierre %A Lopes, Cristina %K apache %K integration %K java %K java.net %K project %K repository %K sourceforge %K SourcererDB %X The open source movement has made vast quantities of source code available online for free, providing an extremely large dataset for empirical study and potential reuse. A major difficulty in exploiting this potential fully is that the data are currently scattered between competing source code repositories, none of which are structured for empirical analysis and cross-project comparison. As a result, software researchers and developers are left to compile their own datasets, resulting in duplicated effort and limited results. To address this challenge, we built SourcererDB, an aggregated repository of statically analyzed and cross-linked open source Java projects. SourcererDB contains local snapshots of 2,852 Java projects taken from Sourceforge, Apache and Java.net. These projects are statically analyzed to extract rich structural information, which is then stored in a relational database. References to entities in the 16,058 external jars are resolved and grouped, allowing for cross-project usage information to be accessed easily. This paper describes: (a) the mechanism for resolving and grouping these cross-project references, (b) the structure of and the metamodel for the SourcererDB repository, and (c) end-user dataset access mechanisms. 
Our goal in building SourcererDB is to provide a rich dataset of source code to facilitate the sharing of extracted data and to encourage reuse and repeatability of experiments. %B 2009 6th IEEE International Working Conference on Mining Software Repositories (MSR) %I IEEE %C Vancouver, BC, Canada %P 183 - 186 %@ 978-1-4244-3493-0 %R 10.1109/MSR.2009.5069501 %0 Conference Paper %B International Conference on Intelligent User Interfaces %D 2009 %T Tagsplanations: Explaining Recommendations using Tags %A Vig, J. %A Sen, S. %A Riedl, J. %K recommender %K SYSTEMS %K tagging %B International Conference on Intelligent User Interfaces %C Sanibel Island, FL %8 02/08/2009 %G eng %0 Conference Paper %B 2009 6th IEEE International Working Conference on Mining Software Repositories (MSR) %D 2009 %T Using Latent Dirichlet Allocation for automatic categorization of software %A Tian, Kai %A Revelle, Meghan %A Poshyvanyk, Denys %K categorization %K category mining %K lact %K mudablue %K multiple languages %K repository %X In this paper, we propose a technique called LACT for automatically categorizing software systems in open-source repositories. LACT is based on Latent Dirichlet Allocation, an information retrieval method which is used to index and analyze source code documents as mixtures of probabilistic topics. For an initial evaluation, we performed two studies. In the first study, LACT was compared against an existing tool, MUDABlue, for classifying 41 software systems written in C into problem domain categories. The results indicate that LACT can automatically produce meaningful category names and yield classification results comparable to MUDABlue. In the second study, we applied LACT to 43 software systems written in different programming languages such as C/C++, Java, C#, PHP, and Perl. 
The results indicate that LACT can be used effectively for the automatic categorization of software systems regardless of the underlying programming language or paradigm. Moreover, both studies indicate that LACT can identify several new categories that are based on libraries, architectures, or programming languages, which is a promising improvement as compared to manual categorization and existing techniques. %B 2009 6th IEEE International Working Conference on Mining Software Repositories (MSR) %I IEEE %C Vancouver, BC, Canada %P 163 - 166 %@ 978-1-4244-3493-0 %R 10.1109/MSR.2009.5069496 %> https://flosshub.org/sites/flosshub.org/files/163MSR2009_TianPos.pdf %0 Conference Paper %B 3rd Workshop on Public Data about Software Development (WoPDaSD 2008) %D 2008 %T Advances in the Sourceforge Research Data Archive %A Matthew Van Antwerp %A Madey, Greg %K forge %K forges %K repositories %K repository %K sourceforge %K srda %X The SourceForge Research Data Archive (SRDA), located at http://zerlot.cse.nd.edu, is a collection of Open Source Software (OSS) data and resources [6]. Over 100 researchers worldwide use the archive for research in many fields. In this paper, we describe the recent changes, the work in progress, and future plans for making the archive easier to use and for allowing more advanced research to be done with the data available. 
%B 3rd Workshop on Public Data about Software Development (WoPDaSD 2008) %P 25-29 %8 2009 %> https://flosshub.org/sites/flosshub.org/files/srda2008.pdf %0 Journal Article %J Information Economics and Policy %D 2008 %T The allocation of collaborative efforts in open-source software %A den Besten, Matthijs %A Jean-Michel Dalle %A Galia, Fabrice %K age %K apache %K complexity %K cvs %K division of labor %K functions %K gaim %K gcc %K ghostscript %K lines of code %K loc %K log files %K mozilla %K netbsd %K openssh %K postgresql %K python %K revision control %K scm %K size %K source code %K Stigmergy %K version control %X The article investigates the allocation of collaborative efforts among core developers (maintainers) of open-source software by analyzing on-line development traces (logs) for a set of 10 large projects. Specifically, we investigate whether the division of labor within open-source projects is influenced by characteristics of software code. We suggest that the collaboration among maintainers tends to be influenced by different measures of code complexity. We interpret these findings by providing preliminary evidence that the organization of open-source software development would self-adapt to characteristics of the code base, in a 'stigmergic' manner. %B Information Economics and Policy %V 20 %P 316 - 322 %U http://www.sciencedirect.com/science/article/B6V8J-4SSG4PN-1/2/88b3824c30a31c18929d8a5ca6d64f62 %R 10.1016/j.infoecopol.2008.06.003 %0 Conference Paper %B Proceedings of the 2008 international working conference on Mining software repositories %D 2008 %T Analyzing the evolution of eclipse plugins %A Wermelinger, Michel %A Yu, Yijun %K architectural evolution %K cvs %K eclipse %K metadata %K msr challenge %K releases %K source code %X Eclipse is a good example of a modern component-based complex system that is designed for long-term evolution, due to its architecture of reusable and extensible components. 
This paper presents our preliminary results about the evolution of Eclipse's architecture, based on a lightweight and scalable analysis of the metadata in Eclipse's sources. We find that the development of Eclipse follows a systematic process: most architectural changes take place in milestones, and maintenance releases only make exceptional changes to component dependencies. We also found a stable architectural core that remains since the first release. %B Proceedings of the 2008 international working conference on Mining software repositories %S MSR '08 %I ACM %C New York, NY, USA %P 133–136 %@ 978-1-60558-024-1 %U http://doi.acm.org/10.1145/1370750.1370783 %R http://doi.acm.org/10.1145/1370750.1370783 %0 Conference Paper %B Proceedings of the 2008 international workshop on Mining software repositories - MSR '08 %D 2008 %T Branching and merging in the repository %A Spacco, Jamie %A Williams, Chadd C. %Y Hassan, Ahmed E. %Y Lanza, Michele %Y Godfrey, Michael W. %K argouml %K changes %K cvs2svn %K diffj %K revision %K scm %K source code %K version control %X Two of the most complex operations version control software allows a user to perform are branching and merging. Branching provides the user the ability to create a copy of the source code to allow changes to be stored in version control but outside of the trunk. Merging provides the user the ability to copy changes from a branch to the trunk. Performing a merge can be a tedious operation and one that may be error prone. In this paper, we compare file revisions found on branches with those found on the trunk to determine when a change that is applied to a branch is moved to the trunk. This will allow us to study how developers use merges and to determine if merges are in fact more error prone than other commits. 
%B Proceedings of the 2008 international workshop on Mining software repositories - MSR '08 %I ACM Press %C New York, New York, USA %P 19-22 %8 05/2008 %@ 9781605580241 %! MSR '08 %R 10.1145/1370750.1370754 %> https://flosshub.org/sites/flosshub.org/files/p19-williams.pdf %0 Conference Paper %B 3rd Workshop on Public Data about Software Development (WoPDaSD 2008) %D 2008 %T Cross-repository data linking with RDF and OWL %A Howison, James %K data integration %K flossmole %K forges %K integration %K owl %K RDF %K repositories %K semantic %K semantic Web %K sparql %K srda %X This paper provides an approach to the problem of integrating data from multiple research repositories for FLOSS data. It introduces semantic web technologies (RDF, OWL, OWL-DL reasoners and SPARQL) to argue that these are useful for building shared research infrastructure. The paper illustrates its point by describing parts of an ontology developed for the integration and analysis of project communications drawn from FLOSSmole, the Notre Dame archive and direct collection of data. RDF vocabularies provide a way to agree on things we agree about as well as a way to be clearer about ways in which we disagree. %B 3rd Workshop on Public Data about Software Development (WoPDaSD 2008) %P 15-22 %8 2009 %> https://flosshub.org/sites/flosshub.org/files/howison2008.pdf %0 Journal Article %J Industrial and Corporate Change %D 2008 %T Dynamics of innovation in an "open source" collaboration environment: lurking, laboring, and launching FLOSS projects on SourceForge %A David, P. A. %A Rullani, F. %K contributors %K core %K developers %K roles %K SFnetDataset %K sourceforge %K users %K virtual communities %K virtual organization %K virtual organizations %X A systems analysis perspective is adopted to examine the critical properties of the Free/Libre/Open Source Software (FLOSS) mode of innovation, as reflected on the SourceForge platform (SF.net). 
This approach re-scales March's (1991) framework and applies it to characterize the “innovation system” of a “distributed organization” of interacting agents in a virtual collaboration environment, rather than to innovation within a firm. March (1991) views the process of innovation at the organizational level as the coupling of sub-processes of exploration and exploitation. Correspondingly, the innovation system of the virtual collaboration environment represented by SF.net is an emergent property of two “coupled” processes: one involves the interactions among agents searching the locale for information and knowledge resources to use in designing novel software products (i.e., exploration), and the other involves the mobilization of individuals’ capabilities for application in the software development projects that become established on the platform (i.e., exploitation). The micro-dynamics of this system are studied empirically by constructing transition probability matrices representing the movements of 222,835 SF.net users among seven different activity states, which range from “lurking” (not contributing or contributing to projects without becoming a member) to “laboring” (joining one or more projects as members), and to “launching” (founding one or more projects) within each successive 6-month interval. The estimated probabilities are found to form first-order Markov chains describing ergodic processes. This makes possible the computation of the equilibrium distribution of agents among the states, thereby suppressing transient effects and revealing persisting patterns of project joining and project launching. The latter show the FLOSS innovation process on SF.net to be highly dissipative: a very large proportion of the registered “developers” fail to become even minimally active on the platform. There is nevertheless an active core of mobile project joiners, and a (still smaller) core of project founders who persist in creating new projects. 
The structure of these groups’ interactions (as displayed within the 3-year period examined) is investigated in detail, and it is shown that it would be sufficient to sustain both the exploration and exploitation phases of the platform's global dynamics. %B Industrial and Corporate Change %V 17 %P 647 - 710 %8 07/2008 %N 4 %! Industrial and Corporate Change %R 10.1093/icc/dtn026 %0 Conference Paper %B Proceedings of the 2008 international working conference on Mining software repositories %D 2008 %T Expertise identification and visualization from CVS %A Alonso, Omar %A Premkumar T. Devanbu %A Gertz, Michael %K apache %K classification %K committers %K components %K contributors %K expertise %K expertise identification %K repository %K scm %K source code %X As software evolves over time, the identification of expertise becomes an important problem. Component ownership and team awareness of such ownership are signals of a solid project. Ownership and ownership awareness are also issues in open-source software (OSS) projects. Indeed, the membership in OSS projects is dynamic with team members arriving and leaving. In large open source projects, specialists who know the system very well are considered experts. How can one identify the experts in a project by mining a particular repository like the source code? Have they gotten help from other people? We provide an approach using classification of the source code tree as a path to derive the expertise of the committers. Because committers may get help from other people, we also retrieve their contributors. We also provide a visualization that helps to further explore the repository via committers and categories. We present a prototype implementation that describes our research using the Apache HTTP Web server project as a case study. 
%B Proceedings of the 2008 international working conference on Mining software repositories %S MSR '08 %I ACM %C New York, NY, USA %P 125–128 %8 05/2008 %@ 978-1-60558-024-1 %U http://doi.acm.org/10.1145/1370750.1370780 %R http://doi.acm.org/10.1145/1370750.1370780 %> https://flosshub.org/sites/flosshub.org/files/p125-alonso.pdf %0 Journal Article %J Information Economics and Policy %D 2008 %T Explaining leadership in virtual teams: The case of open source software %A Paola Giuri %A Francesco Rullani %A Salvatore Torrisi %K contributors %K Human capital %K leadership %K roles %K sourceforge %K team %X This paper contributes to the open source software (OSS) literature by investigating the likelihood that a participant becomes a project leader. Project leaders are key actors in a virtual community and are crucial to the success of the OSS model. Knowledge of the forces that lead to the emergence of project managers among the multitude of participants is still limited. We aim to fill this gap in the literature by analyzing the association between the roles played by an individual who is registered with a project, and a set of individual-level and project-level characteristics. In line with the theory of occupational choice elaborated by (Lazear, E.P., 2002. Entrepreneurship. NBER Working Paper No. 9109, Cambridge, Mass; Lazear, E.P., 2004. Balanced skills and entrepreneurship, American Economic Review 94, pp. 208-211), we find that OSS project leaders possess diversified skill sets which are needed to select the inputs provided by various participants, motivate contributors, and coordinate their efforts. Specialists, like pure developers, are endowed with more focused skill sets. Moreover, we find that the degree of modularity of the development process is positively associated with the presence of project leaders. That result is consistent with the modern theory of modular production (Baldwin, C.Y., Clark, K.B., 1997. Managing in an age of modularity. 
Harvard Business Review September-October. pp. 84-93; Mateos-Garcia, J., Steinmueller, W.E., 2003. The Open Source Way of Working: A New Paradigm for the Division of Labour in Software Development? SPRU - Science and Technology Policy Studies. Open Source Movement Research INK Working Paper, No. 1; Aoki, M., 2004. An organizational architecture of T-form: Silicon Valley clustering and its institutional coherence. Industrial and Corporate Change 13, pp. 967-981). %B Information Economics and Policy %V 20 %P 305 - 315 %U http://www.sciencedirect.com/science/article/B6V8J-4SRW10C-1/2/5ce36096ba3947338962268b54a5a7a9 %R DOI: 10.1016/j.infoecopol.2008.06.002 %0 Generic %D 2008 %T How Do Firms Make Use of Open Source Communities? %A Linus Dahlander %A M Magnusson %K case study %K cendio %K email %K mailing list %K mysql %K roxen %K secondary data %K sot %X Relying on four in-depth case studies of firms involved with open source software, we investigate how firms make use of open source communities, and how that use is associated with their business models. Three themes - accessing, aligning and assimilating -are inductively developed for how the firms relate to the external knowledge created in the communities. For each theme, we make an argument about the tactics associated with each theme and their positive and negative consequences. The findings are related to the literature on the open and distributed nature of innovation, and various theoretical and managerial implications are discussed. 
%B Long Range Planning %V 41 %P 629-649 %8 Dec %G eng %U http://www.acm.jhu.edu/~paulproteus/tmp/sdarticle.pdf %> https://flosshub.org/sites/flosshub.org/files/dahlandermagnusson2008.pdf %0 Conference Paper %B 3rd International Workshop on Public Data about Software Development (WoPDaSD 2008), Milano, Italy, September 2008 %D 2008 %T Improving community awareness in software forges by semantical aggregation of tools feeds %A Quang Vu Dang %A Christian Bac %A Olivier Berger %A Xuan Sang Dao %K community of practice %K DOAF. %K FOAF %K free and open source software development %K public data %K RDF %K semantic Web %K social filtering %K social network analysis %X It is rather difficult to monitor or visualize what can be the contribution of a member in a project, especially when the project uses multiple tools to produce its results. This is the case for collaborative development of FLOSS software, which uses wikis, bug trackers, mailing lists and source code management tools. This paper presents an approach to data collection by using aggregation of feeds published by the different tools of a software forge. To allow this aggregation, collected data is semantically reformatted into Semantic Web standards: RDF, DC, DOAP, and FOAF. Resulting data can then be processed, republished or displayed to project members. We implemented this approach in a supervision module that has been integrated into the PicoForge platform. This module is able to draw a live graph of the social community out of the different sources of data, and in turn export semantic feeds for other uses. 
%B 3rd International Workshop on Public Data about Software Development (WoPDaSD 2008), Milano, Italy, September 2008 %G eng %> https://flosshub.org/sites/flosshub.org/files/Paper4.pdf %0 Conference Paper %B Proceedings of the 2008 international working conference on Mining software repositories %D 2008 %T Mining usage expertise from version archives %A Schuler, David %A Zimmermann, Thomas %K api %K computer-supported cooperative work %K eclipse %K expertise %K recommendation %K scm %K software repository %K source code %X In software development, there is an increasing need to find and connect developers with relevant expertise. Existing expertise recommendation systems are mostly based on variations of the Line 10 Rule: developers who changed a file most often have the most implementation expertise. In this paper, we introduce the concept of usage expertise, which manifests itself whenever developers are using functionality, e.g., by calling API methods. We present preliminary results for the ECLIPSE project that demonstrate that our technique allows us to recommend experts for files with no or little history, identify developers with similar expertise, and measure the usage of API methods. %B Proceedings of the 2008 international working conference on Mining software repositories %S MSR '08 %I ACM %C New York, NY, USA %P 121–124 %8 05/2008 %@ 978-1-60558-024-1 %U http://doi.acm.org/10.1145/1370750.1370779 %R http://doi.acm.org/10.1145/1370750.1370779 %> https://flosshub.org/sites/flosshub.org/files/p121-schuler.pdf %0 Conference Paper %B Proceedings of the 2008 international workshop on Mining software repositories - MSR '08 %D 2008 %T On the relation of refactorings and software defect prediction %A Sigmund, Thomas %A Gall, Harald C. %A Ratzinger, Jacek %Y Hassan, Ahmed E. %Y Lanza, Michele %Y Godfrey, Michael W. 
%K argouml %K bug fixing %K bug reports %K defects %K evolution %K jboss %K liferay %K prediction %K refactoring %K spring %K weka %K xdoclet %X This paper analyzes the influence of evolution activities such as refactoring on software defects. In a case study of five open source projects we used attributes of software evolution to predict defects in time periods of six months. We use versioning and issue tracking systems to extract 110 data mining features, which are separated into refactoring and non-refactoring related features. These features are used as input into classification algorithms that create prediction models for software defects. We found out that refactoring related features as well as non-refactoring related features lead to high quality prediction models. Additionally, we discovered that refactorings and defects have an inverse correlation: The number of software defects decreases, if the number of refactorings increased in the preceding time period. As a result, refactoring should be a significant part of both bug fixes and other evolutionary changes to reduce software defects. %B Proceedings of the 2008 international workshop on Mining software repositories - MSR '08 %I ACM Press %C New York, New York, USA %P 35-38 %8 05/2008 %@ 9781605580241 %! MSR '08 %R 10.1145/1370750.1370759 %> https://flosshub.org/sites/flosshub.org/files/p35-ratzinger.pdf %0 Journal Article %J Information and Software Technology %D 2008 %T Self-organization process in open-source software: An empirical study %A Yu, Liguo %K Empirical study; %K evolution %K linux %K requirements %K Self-organization %K software evolution %X Software systems must continually evolve to adapt to new functional requirements or quality requirements to remain competitive in the marketplace. However, different software systems follow different strategies to evolve, affecting both the release plan and the quality of these systems. 
In this paper, software evolution is considered as a self-organization process and the difference between closed-source software and open-source software is discussed in terms of self-organization. In particular, an empirical study of the evolution of Linux from version 2.4.0 to version 2.6.13 is reported. The study shows how open-source software systems self-organize to adapt to functional requirements and quality requirements. %B Information and Software Technology %V 50 %P 361 - 374 %8 4/2008 %U http://www.sciencedirect.com/science/article/pii/S0950584907000225 %N 5 %! Information and Software Technology %R 10.1016/j.infsof.2007.02.018 %0 Conference Paper %B Proceedings of the 2008 international working conference on Mining software repositories %D 2008 %T Small patches get in! %A Weißgerber, Peter %A Neu, Daniel %A Diehl, Stephan %K case study %K cvs %K email %K email archives %K flac %K mailing list %K openafs %K patch acceptance %K patches %K revision control %K scm %X While there is a considerable amount of research on analyzing the change information stored in software repositories, only a few researchers have looked at software changes contained in email archives in the form of patches. In this paper we look at the email archives of two open source projects and answer questions like the following: How many emails contain patches? How long does it take for a patch to be accepted? Does the size of the patch influence its chances to be accepted or the duration until it gets accepted? Obviously, the answers to these questions can be helpful for the authors of patches, in particular because some of the answers are surprising. 
%B Proceedings of the 2008 international working conference on Mining software repositories %S MSR '08 %I ACM %C New York, NY, USA %P 67–76 %8 05/2008 %@ 978-1-60558-024-1 %U http://doi.acm.org/10.1145/1370750.1370767 %R http://doi.acm.org/10.1145/1370750.1370767 %> https://flosshub.org/sites/flosshub.org/files/p67-weissgerber.pdf %0 Journal Article %J Interacting with Computers %D 2008 %T A socio-cognitive analysis of online design discussions in an Open Source Software community %A Barcellini, Flore %A Détienne, Françoise %A Burkhardt, Jean-Marie %A Warren Sack %K Role %X This paper is an analysis of online discussions in an Open Source Software (OSS) design community, the Python project. Developers of Python are geographically distributed and work online asynchronously. The objective of our study is to understand and to model the dynamics of the OSS design process that takes place in mailing list exchanges. We develop a method to study distant and asynchronous collaborative design activity based on an analysis of quoting practices. We analyze and visualize three aspects of the online dynamics: social, thematic temporal, and design. We show that roles emerge during discussions according to the involvement and the position of the participants in the discussions and how they influence participation in the design discussions. In our analysis of the thematic temporal dynamics of discussion, we examine how themes of discussion emerge, diverge, and are refined over time. To understand the design dynamics, we perform a content analysis of messages exchanged between developers to reveal how the online discussions reflect the “work flow” of the project: it provides us with a picture of the collaborative design process in the OSS community. These combined results clarify how knowledge and artefacts are elaborated in this epistemic, exploration-oriented, OSS community. Finally, we outline the need to automate our method to extend our results. 
The proposed automation could have implications for both researchers and participants in OSS communities. %B Interacting with Computers %V 20 %P 141 - 165 %U http://www.sciencedirect.com/science/article/pii/S0953543807000793 %R 10.1016/j.intcom.2007.10.004 %0 Conference Paper %B Proceedings of the 2008 international working conference on Mining software repositories %D 2008 %T SpotWeb: detecting framework hotspots via mining open source repositories on the web %A Thummalapenta, Suresh %A Xie, Tao %K code reuse %K code search engine %K frameworks %K hotspots %K junit %K log4j %K repositories %X The essentials of modern software development (such as low cost and high efficiency) demand software developers to make intensive reuse of existing open source frameworks or libraries (generally referred to as frameworks) available on the web. However, developers often face challenges in reusing these frameworks due to several factors such as the complexity and lack of proper documentation. In this paper, we propose a code-search-engine-based approach that tries to detect hotspots in a given framework by mining code examples gathered from open source repositories available on the web; these hotspots are the APIs that are frequently reused. Hotspots can serve as starting points for developers in understanding and reusing the given framework. We developed a tool, called SpotWeb, for frameworks or libraries written in Java and conducted two case studies with two open source frameworks JUnit and Log4j. We also show that the detected hotspots of Log4j and JUnit are consistent with their respective documentation. %B Proceedings of the 2008 international working conference on Mining software repositories %S MSR '08 %I ACM %C New York, NY, USA %P 109–112 %8 05/2008 %@ 978-1-60558-024-1 %U http://doi.acm.org/10.1145/1370750.1370775 %R http://doi.acm.org/10.1145/1370750.1370775 %> https://flosshub.org/sites/flosshub.org/files/p109-thummalapenta.pdf %0 Journal Article %J Int. J. 
Hum.-Comput. Stud. %D 2008 %T User and developer mediation in an Open Source Software community: Boundary spanning through cross participation in online discussions %A Barcellini, Flore %A Détienne, Françoise %A Burkhardt, Jean-Marie %K Boundary spanners %K Cross-participants %K Distributed design %K Open Source Software Community %K Role emerging design %X The aim of this research is to analyse how design and use are mediated in Open Source Software (OSS) design. Focusing on the Python community, our study examines a ''pushed-by-users'' design proposal through the discussions occurring in two mailing-lists: one, user-oriented and the other, developer-oriented. To characterize the links between users and developers, we investigate the activities and references (knowledge sharing) performed by the contributors to these two mailing-lists. We found that the participation of users remains local to their community. However, several key participants act as boundary spanners between the user and the developer communities. This emerging role is characterized by cross-participation in parallel same-topic discussions in both mailing-lists, cohesion between cross-participants, the occupation of a central position in the social network linking users and developers, as well as active, distinctive and adapted contributions. The user championing the proposal acts as a key boundary spanner coordinating the process and using explicit linking strategies. We argue that OSS design may be considered as a form of ''role emerging design'', i.e. design organized and pushed through emerging roles and through a balance between these roles. The OSS communities seem to provide a suitable socio-technical environment to enable such role emergence. %B Int. J. Hum.-Comput. Stud. %I Academic Press, Inc. 
%C Duluth, MN, USA %V 66 %P 558–570 %U http://dx.doi.org/10.1016/j.ijhcs.2007.10.008 %R 10.1016/j.ijhcs.2007.10.008 %0 Conference Paper %B Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007) %D 2007 %T Determining Implementation Expertise from Bug Reports %A Anvik, John %A Murphy, Gail C. %K bug reports %K developers %K eclipse %K expertise %K repository %K scm %K source code %X As developers work on a software product they accumulate expertise, including expertise about the code base of the software product. We call this type of expertise "implementation expertise". Knowing the set of developers who have implementation expertise for a software product has many important uses. This paper presents an empirical evaluation of two approaches to determining implementation expertise from the data in source and bug repositories. The expertise sets created by the approaches are compared to those provided by experts and evaluated using the measures of precision and recall. We found that both approaches are good at finding all of the appropriate developers, although they vary in how many false positives are returned. %B Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007) %I IEEE %C Minneapolis, MN, USA %P 2 - 2 %@ 0-7695-2950-X %R 10.1109/MSR.2007.7 %> https://flosshub.org/sites/flosshub.org/files/28300002.pdf %0 Conference Paper %B Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007) %D 2007 %T Impact of the Creation of the Mozilla Foundation in the Activity of Developers %A Jesus M. Gonzalez-Barahona %A Gregorio Robles %A Herraiz, Israel %K cvs %K cvsanaly %K developers %K mining challenge %K mozilla %K msr challenge %K revision history %X During 2003, the Mozilla project transitioned from company-promoted (sponsored by AOL) to community-promoted (sponsored by the Mozilla Foundation). What happened to the group of developers during this transition? 
Was there any significant impact on its activity or composition? To answer these questions, we have performed an analysis of the CVS repository of Mozilla, using the CVSAnalY tool, finding little impact on activity, but dramatic changes in the composition of the development team. %B Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007) %I IEEE %C Minneapolis, MN, USA %P 28 - 28 %@ 0-7695-2950-X %R 10.1109/MSR.2007.15 %> https://flosshub.org/sites/flosshub.org/files/28300028.pdf %0 Journal Article %J Information & Management %D 2007 %T Investigating recognition-based performance in an open content community: A social capital perspective %A Okoli, C. %A Oh, Wonseok %K open content %K recognition-based performance %K social capital %K social networks %K social status %K virtual communities %X As the open source movement grows, it becomes important to understand the dynamics that affect the motivation of participants who contribute their time freely to such projects. One important motivation that has been identified is the desire for formal recognition in the open source community. We investigated the impact of social capital in participants' social networks on their recognition-based performance; i.e., the formal status they are accorded in the community. We used a sample of 465 active participants in the Wikipedia open content encyclopedia community to investigate the effects of two types of social capital and found that network closure, measured by direct and indirect ties, had a significant positive effect on increasing participants' recognition-based performance. Structural holes had mixed effects on participants' status, but were generally a source of social capital. (C) 2007 Elsevier B.V. All rights reserved. 
%B Information & Management %V 44 %P 240-252 %8 Apr %@ 0378-7206 %G eng %M ISI:000247156800002 %1 management %2 SNA %0 Conference Paper %B Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007) %D 2007 %T Mining CVS Repositories to Understand Open-Source Project Developer Roles %A Yu, Liguo %A Ramaswamy, Srini %K cvs %K developer interaction %K developers %K mediawiki %K orac-dr %K roles %K scm %K source code %X This paper presents a model to represent the interactions of distributed open-source software developers and utilizes data mining techniques to derive developer roles. The model is then applied on case studies of two open-source projects, ORAC-DR and Mediawiki with encouraging results. %B Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007) %I IEEE %C Minneapolis, MN, USA %P 8 - 8 %@ 0-7695-2950-X %R 10.1109/MSR.2007.19 %> https://flosshub.org/sites/flosshub.org/files/28300008.pdf %0 Journal Article %J International Economics and Economic Policy %D 2007 %T Open source software: Motivation and restrictive licensing %A Fershtman, Chaim %A Gandal, Neil %K contributions %K contributors %K developers %K incentives %K license analysis %K licenses %K lines of code %K loc %K MOTIVATION %K restrictive %K scm %K size %K status %K version history %X Open source software (OSS) is an economic paradox. Development of open source software is often done by unpaid volunteers and the source code is typically freely available. Surveys suggest that status, signaling, and intrinsic motivations play an important role in inducing developers to invest effort. Contribution to an OSS project is rewarded by adding one’s name to the list of contributors which is publicly observable. Such incentives imply that programmers may have little incentive to contribute beyond the threshold level required for being listed as a contributor. Using a unique data set we empirically examine this hypothesis. 
We find that the output per contributor in open source projects is much higher when licenses are less restrictive and more commercially oriented. These results indeed suggest a status, signaling, or intrinsic motivation for participation in OSS projects with restrictive licenses. %B International Economics and Economic Policy %I Springer Berlin / Heidelberg %V 4 %P 209-225 %U http://dx.doi.org/10.1007/s10368-007-0086-4 %0 Conference Paper %B Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007) %D 2007 %T Release Pattern Discovery via Partitioning: Methodology and Case Study %A Hindle, Abram %A Godfrey, Michael W. %A Holt, Richard C. %K bitkeeper %K bt2csv %K cvs %K evolution %K mysql %K releases %K revision history %K scm %K softchange %K version control %X The development of Open Source systems produces a variety of software artifacts such as source code, version control records, bug reports, and email discussions. Since the development is distributed across different tool environments and developer practices, any analysis of project behavior must be inferred from whatever common artifacts happen to be available. In this paper, we propose an approach to characterizing a project's behavior around the time of major and minor releases; we do this by partitioning the observed activities, such as artifact check-ins, around the dates of major and minor releases, and then look for recognizable patterns. We validate this approach by means of a case study on the MySQL database system; in this case study, we found patterns which suggested MySQL was behaving consistently within itself. These patterns included testing and documenting that took place more before a release than after and that the rate of source code changes dipped around release time. 
%B Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007) %I IEEE %C Minneapolis, MN, USA %P 19 - 19 %@ 0-7695-2950-X %R 10.1109/MSR.2007.28 %> https://flosshub.org/sites/flosshub.org/files/28300019.pdf %0 Conference Paper %B 2nd Workshop on Public Data about Software Development (WoPDaSD 2007) %D 2007 %T Studying Production Phase SourceForge Projects: An Exploratory Analysis Using cvs2mysql and SFRA %A Delorey, Daniel P. %A Knutson, Charles D. %A MacLean, Alexander C. %K Data Collection %K forge %K repositories %K sourceforge %X A wealth of data can be extracted from the natural by-products of software development processes and used in empirical studies of software engineering. However, the size and accuracy of such studies depend in large part on the availability of tools that facilitate the collection of data from individual projects and the combination of data from multiple projects. To demonstrate this point, we present our experience gathering and analyzing data from nearly 10,000 open source projects hosted on SourceForge. We describe the tools we developed to collect the data and the ways in which these tools and data may be used by other researchers. We also provide examples of statistics that we have calculated from these data to describe interesting author- and project-level behaviors of the SourceForge community. 
%B 2nd Workshop on Public Data about Software Development (WoPDaSD 2007) %8 2007 %> https://flosshub.org/sites/flosshub.org/files/Delorey2007c.pdf %0 Conference Paper %B 2nd Workshop on Public Data about Software Development (WoPDaSD 2007) %D 2007 %T Understanding the KDE Social Structure through Mining of Email Archive %A Studer, Matthias %A Müller, Benoît %A Ritschard, Gilbert %K bug tracking system %K bugzilla %K commit %K email %K email archive %K kde %K mailing list %K participation %K revision control %K social network analysis %X In order to achieve a better understanding of FLOSS social structure, we need a definition of social position. From a theoretical perspective, we propose to think of participation as a trajectory. Empirically, we use optimal matching to build a typology of participation trajectories based on KDE email archives. We show how these trajectories structure the community as a whole by combining these results with a social network analysis. %B 2nd Workshop on Public Data about Software Development (WoPDaSD 2007) %> https://flosshub.org/sites/flosshub.org/files/wopdasd_studer_et_all_full.pdf %0 Conference Paper %B Proceedings of the 2006 international workshop on Mining software repositories %D 2006 %T Applying the evolution radar to PostgreSQL %A D'Ambros, Marco %A Lanza, Michele %K cvs %K documentation %K evolution %K evolution radar %K logical coupling %K makefile %K mining challenge %K msr challenge %K postgresql %K re-engineering %K refactoring %K release history %K rhdb %K source code %K version control %K visualization %B Proceedings of the 2006 international workshop on Mining software repositories %S MSR '06 %I ACM %C New York, NY, USA %P 177–178 %@ 1-59593-397-2 %U http://doi.acm.org/10.1145/1137983.1138029 %R http://doi.acm.org/10.1145/1137983.1138029 %> https://flosshub.org/sites/flosshub.org/files/177ApplyingEvolution.pdf %0 Conference Paper %B Proceedings of the 2006 international workshop on Mining software repositories %D 
2006 %T Are refactorings less error-prone than other changes? %A Weißgerber, Peter %A Diehl, Stephan %K argouml %K bug reports %K bugs %K change history %K jedit %K junit %K re-engineering %K refactoring %K reverse engineering %K software evolution %K version control %X Refactorings are program transformations which should preserve the program behavior. Consequently, we expect that during phases when there are mostly refactorings in the change history of a system, only few new bugs are introduced. For our case study we analyzed the version histories of several open source systems and reconstructed the refactorings performed. Furthermore, we obtained bug reports from various sources depending on the system. Based on this data we identify phases when the above hypothesis holds and those when it doesn't. %B Proceedings of the 2006 international workshop on Mining software repositories %S MSR '06 %I ACM %C New York, NY, USA %P 112–118 %@ 1-59593-397-2 %U http://doi.acm.org/10.1145/1137983.1138011 %R http://doi.acm.org/10.1145/1137983.1138011 %> https://flosshub.org/sites/flosshub.org/files/112AreRefactorings.pdf %0 Journal Article %J Management Science %D 2006 %T Location, Location, Location: How Network Embeddedness Affects Project Success in Open Source Systems %A Grewal, Rajdeep %A Lilien, Gary L. %A Mallapragada, Girish %K affiliation network %K age %K developers %K latent class analysis %K network embeddedness %K open source software %K page views %K perl %K project success %K registration %K sourceforge %X The community-based model for software development in open source environments is becoming a viable alternative to traditional firm-based models. To better understand the workings of open source environments, we examine the effects of network embeddedness---or the nature of the relationship among projects and developers---on the success of open source projects. 
We find that considerable heterogeneity exists in the network embeddedness of open source projects and project managers. We use a visual representation of the affiliation network of projects and developers as well as a formal statistical analysis to demonstrate this heterogeneity and to investigate how these structures differ across projects and project managers. Our main results surround the effect of this differential network embeddedness on project success. We find that network embeddedness has strong and significant effects on both technical and commercial success, but that those effects are quite complex. We use latent class regression analysis to show that multiple regimes exist and that some of the effects of network embeddedness are positive under some regimes and negative under others. We use project age and number of page views to provide insights into the direction of the effect of network embeddedness on project success. Our findings show that different aspects of network embeddedness have powerful but subtle effects on project success and suggest that this is a rich environment for further study. %B Management Science %I INFORMS %C Institute for Operations Research and the Management Sciences (INFORMS), Linthicum, Maryland, USA %V 52 %P 1043–1056 %8 July %U http://portal.acm.org/citation.cfm?id=1246148.1246155 %R 10.1287/mnsc.1060.0550 %0 Conference Paper %B Proceedings of the 2006 international workshop on Mining software repositories %D 2006 %T Mining refactorings in ARGOUML %A Weißgerber, Peter %A Diehl, Stephan %A Görg, Carsten %K argouml %K bug tracking %K bugs %K cvs %K email %K evolution %K mining challenge %K msr challenge %K re-engineering %K refactoring %K release history %X In this paper we combine the results of our refactoring reconstruction technique with bug, mail and release information to perform process and bug analyses of the ARGOUML CVS archive. 
%B Proceedings of the 2006 international workshop on Mining software repositories %S MSR '06 %I ACM %C New York, NY, USA %P 175–176 %@ 1-59593-397-2 %U http://doi.acm.org/10.1145/1137983.1138028 %R http://doi.acm.org/10.1145/1137983.1138028 %> https://flosshub.org/sites/flosshub.org/files/175MiningRefactorings.pdf %0 Journal Article %J Intern. J. Internet Technology and Web Engineering %D 2006 %T Multi-Modal Modeling, Analysis and Validation of Open Source Software Development Processes %A Walt Scacchi %A Chris Jensen %A Noll, J. %A Elliott, M. %K empirical studies of software engineering %K open source software development %K process modeling %K requirements processes %K software process %X Understanding the context, structure, activities, and content of software development processes found in practice has been and remains a challenging problem. In the world of free/open source software development, discovering and understanding what processes are used in particular projects is important in determining how they are similar to or different from those advocated by the software engineering community. Prior studies have revealed that development processes in F/OSSD projects are different in a number of ways. In this paper, we describe how a variety of modeling perspectives and techniques are used to elicit, analyze, and validate software development processes found in F/OSSD projects, with examples drawn from studies of the software requirements process found in the NetBeans.org project. %B Intern. J. Internet Technology and Web Engineering %V 1 %P 49-63 %G eng %> https://flosshub.org/sites/flosshub.org/files/Scacchi-Jensen-Noll-Elliott-OSSC05.pdf %0 Journal Article %J Statistical Science %D 2006 %T Opportunities and Challenges Applying Functional Data Analysis to the Study of Open Source Software Evolution %A Stewart, Katherine J. %A Darcy, David P. %A Daniel, Sherae L. 
%K complexity %K evolution %K fda %K java %K lines of code %K loc %K release history %K scm %K size %K sourceforge %X This paper explores the application of functional data analysis (FDA) as a means to study the dynamics of software evolution in the open source context. Several challenges in analyzing the data from software projects are discussed, an approach to overcoming those challenges is described, and preliminary results from the analysis of a sample of open source software (OSS) projects are provided. The results demonstrate the utility of FDA for uncovering and categorizing multiple distinct patterns of evolution in the complexity of OSS projects. These results are promising in that they demonstrate some patterns in which the complexity of software decreased as the software grew in size, a particularly novel result. The paper reports preliminary explorations of factors that may be associated with decreasing complexity patterns in these projects. The paper concludes by describing several next steps for this research project as well as some questions for which more sophisticated analytical techniques may be needed. %B Statistical Science %I Institute of Mathematical Statistics %V 21 %P 167-178 %U http://www.jstor.org/stable/27645747 %0 Conference Paper %B Proceedings of the 2006 international workshop on Mining software repositories %D 2006 %T Predicting defect densities in source code files with decision tree learners %A Knab, Patrick %A Pinzger, Martin %A Bernstein, Abraham %K change analysis %K data mining %K decision tree learner %K defect density %K defect prediction %K mozilla %K prediction %K release history %K scm %K source code %K version control %X With the advent of open source software repositories the data available for defect prediction in source files increased tremendously. 
Although traditional statistics turned out to derive reasonable results, the sheer amount of data and the problem context of defect prediction demand sophisticated analysis such as provided by current data mining and machine learning techniques. In this work we focus on defect density prediction and present an approach that applies a decision tree learner on evolution data extracted from the Mozilla open source web browser project. The evolution data includes different source code, modification, and defect measures computed from seven recent Mozilla releases. Among the modification measures we also take into account the change coupling, a measure for the number of change-dependencies between source files. The main reason for choosing decision tree learners, instead of, for example, neural nets, was the goal of finding underlying rules which can be easily interpreted by humans. To find these rules, we set up a number of experiments to test common hypotheses regarding defects in software entities. Our experiments showed that a simple tree learner can produce good results with various sets of input data. %B Proceedings of the 2006 international workshop on Mining software repositories %S MSR '06 %I ACM %C New York, NY, USA %P 119–125 %@ 1-59593-397-2 %U http://doi.acm.org/10.1145/1137983.1138012 %R http://doi.acm.org/10.1145/1137983.1138012 %> https://flosshub.org/sites/flosshub.org/files/119Predicting.pdf %0 Conference Paper %B 1st Workshop on Public Data about Software Development (WoPDaSD 2006) %D 2006 %T Regurgitate: Using GIT For F/LOSS Data Collection %A Bart Massey %A Keith Packard %K cvs %K cvsanaly %K git %K history %K promise %K regurgitate %K scm %X We have created a new tool, regurgitate, for importing CVS repositories into the GIT source code management system. Important features of GIT include great expressiveness in capturing relationships between revisions and across files as well as extremely high-speed processing. 
These features make GIT an ideal platform for gathering detailed longitudinal metrics for open source projects. The availability of regurgitate facilitates using GIT as an analysis tool for that majority of open source projects that keep their repositories in CVS. In particular, GIT is fast enough that it is practical to replay the entire development history of a project commit-at-a-time, collecting metrics at each step. We demonstrate this process for a simple metric and a collection of benchmark F/LOSS repositories. %B 1st Workshop on Public Data about Software Development (WoPDaSD 2006) %> https://flosshub.org/sites/flosshub.org/files/massey.pdf %0 Conference Paper %B OSS2006: Open Source Systems (IFIP 2.13) %D 2006 %T Retrieving Open Source Software Licenses %A Tuunanen, Timo %A Koskinen, Jussi %A Kärkkäinen, Tommi %K gaim %K license %K license analysis %K maintenance %K mozilla %K reuse %X Open Source Software maintenance and reuse require identifying and comprehending the applied software licenses. This paper first characterizes software maintenance, and open source software (OSS) reuse which are particularly relevant in this context. The information needs of maintainers and reusers can be supported by reverse engineering tools at different information retrieval levels. The paper presents an automated license retrieval approach called ASLA. User needs, system architecture, tool features, and tool evaluation are presented. The implemented tool features support identifying source file dependencies and licenses in source files, and adding new license templates for identifying licenses. The tool is evaluated against another tool for license information extraction. ASLA requires the source code as available input but is otherwise not limited to OSS. It supports the same programming languages as GCC. License identification coverage is good and the tool is extendable. 
%B OSS2006: Open Source Systems (IFIP 2.13) %S IFIP International Federation for Information Processing %I Springer %P 35 - 46 %G eng %R http://dx.doi.org/10.1007/0-387-34226-5_4 %> https://flosshub.org/sites/flosshub.org/files/Retrieving%20Open%20Source%20Software%20Licenses.pdf %0 Conference Paper %B Proceedings of the 2006 international workshop on Mining software repositories %D 2006 %T A study of the contributors of PostgreSQL %A Daniel M. German %K contributions %K contributors %K cvs %K developers %K mining challenge %K mining software repositories %K msr challenge %K patches %K postgresql %K revision history %K roles %K software evolution %K source code %K team %X This report describes some characteristics of the development team of PostgreSQL that were uncovered by analyzing the history of its software artifacts as recorded by the project's CVS repository. %B Proceedings of the 2006 international workshop on Mining software repositories %S MSR '06 %I ACM %C New York, NY, USA %P 163–164 %@ 1-59593-397-2 %U http://doi.acm.org/10.1145/1137983.1138022 %R http://doi.acm.org/10.1145/1137983.1138022 %> https://flosshub.org/sites/flosshub.org/files/163AStudyOf.pdf %0 Conference Paper %B Proceedings of the 38th Annual Hawaii International Conference on System Sciences %D 2006 %T A Topological Analysis of the Open Souce Software Development Community %A Jin Xu %A Gao, Yongqin %A Christley, S. %A Madey, G. %K contributors %K developers %K roles %K social network analysis %K social networks %K sourceforge %K srda %K users %X The fast growth of OSS has increased the interest in studying the composition of the OSS community and its collaboration mechanisms. Moreover, the success of a project may be related to the underlying social structure of the OSS development community. In this paper, we perform a quantitative analysis of Open Source Software developers by studying the entire development community at SourceForge [26]. 
Statistics and social network properties are explored to find collaborations and the effects of different members in the OSS development community. Small world phenomenon and scale free behaviors are found in the SourceForge development network. These topological properties may potentially explain the success and efficiency of OSS development practices. We also infer from our analysis that weakly associated but contributing co-developers and active users may be an important factor in OSS development. %B Proceedings of the 38th Annual Hawaii International Conference on System Sciences %I IEEE %C Big Island, HI, USA %P 1-10 %U http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.132.6830&rep=rep1&type=pdf %R 10.1109/HICSS.2005.57 %> https://flosshub.org/sites/flosshub.org/files/xuGao.pdf %0 Conference Paper %B OSS2005: Open Source Systems %D 2005 %T Carrot2 Clustering Framework %A Weiss, Dawid %A Osiński, Stanisław %K BSD license %K cluster %K clustering framework %K open source %K research %K result %X Carrot2 is an Open Source framework for research experiments with querying various textual data sources, processing and presentation of the results. Its main goal is to promote component reuse in order to reduce the effort involved in the development of Information Retrieval software. So far, the most successful and popular application of Carrot2 has been organizing results of Internet searches into easy to browse thematic groups called clusters. In this area, the project successfully competes with commercial counterparts like Vivisimo or iBoogie. %B OSS2005: Open Source Systems %P 298-299 %U http://pascal.case.unibz.it/handle/2038/788 %0 Journal Article %J IEEE Transactions on Software Engineering %D 2005 %T Empirical validation of object-oriented metrics on open source software for fault prediction %A Gyimothy, T. %A Ferenc, R. %A Siket, I. 
%K bugs %K bugzilla %K cbo %K defects %K dit %K fault-prone modules %K faults %K lcom %K lcomn %K loc %K metrics %K mozilla %K noc %K object-oriented %K rfc %K source code %K wmc %X Open source software systems are becoming increasingly important these days. Many companies are investing in open source projects and lots of them are also using such software in their own work. But, because open source software is often developed with a different management style than the industrial ones, the quality and reliability of the code needs to be studied. Hence, the characteristics of the source code of these projects need to be measured to obtain more information about it. This paper describes how we calculated the object-oriented metrics given by Chidamber and Kemerer to illustrate how fault-proneness detection of the source code of the open source Web and e-mail suite called Mozilla can be carried out. We checked the values obtained against the number of bugs found in its bug database - called Bugzilla - using regression and machine learning methods to validate the usefulness of these metrics for fault-proneness prediction. We also compared the metrics of several versions of Mozilla to see how the predicted fault-proneness of the software system changed during its development cycle. %B IEEE Transactions on Software Engineering %V 31 %P 897-910 %G eng %U http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.115.8372&rep=rep1&type=pdf %M WOS:000233015300008 %1 software engineering %2 case study %> https://flosshub.org/sites/flosshub.org/files/Gyimothy.pdf %0 Conference Paper %B Proceedings of the 2005 international workshop on Mining software repositories %D 2005 %T Error detection by refactoring reconstruction %A Görg, Carsten %A Weißgerber, Peter %K bugs %K class %K inheritance %K jedit %K refactoring %K tomcat %X In many cases it is not sufficient to perform a refactoring only at one location of a software project. 
For example, refactorings may have to be performed consistently to several classes in the inheritance hierarchy, e.g. subclasses or implementing classes, to preserve equal behavior. In this paper we show how to detect incomplete refactorings - which can cause long standing bugs because some of them do not cause compiler errors - by analyzing software archives. To this end we reconstruct the class inheritance hierarchies, as well as refactorings on the level of methods. Then, we relate these refactorings to the corresponding hierarchy in order to find missing refactorings and thus, errors and inconsistencies that have been introduced in a software project at some point of the history. Finally, we demonstrate our approach by case studies on two open source projects. %B Proceedings of the 2005 international workshop on Mining software repositories %S MSR '05 %I ACM %C New York, NY, USA %P 29-33 %@ 1-59593-123-6 %U http://doi.acm.org/10.1145/1082983.1083148 %R http://doi.acm.org/10.1145/1082983.1083148 %> https://flosshub.org/sites/flosshub.org/files/29ErrorDetection.pdf %0 Conference Paper %B Proceedings of the 2005 international workshop on Mining software repositories %D 2005 %T Mining evolution data of a product family %A Fischer, Michael %A Oberleitner, Johann %A Ratzinger, Jacek %A Gall, Harald %K bsd %K change analysis %K change history %K cvs %K evolution %K freebsd %K netbsd %K openbsd %K release history %K source code %K text mining %X Diversification of software assets through changing requirements imposes a constant challenge on the developers and maintainers of large software systems. Recent research has addressed the mining for data in software repositories of single products ranging from fine- to coarse-grained analyses. But so far, little attention has been paid to mining data about the evolution of product families. 
In this work, we study the evolution and commonalities of three variants of the BSD (Berkeley Software Distribution), a large open source operating system. The research questions we tackle are concerned with how to generate high level views of the system discovering and indicating evolutionary highlights. To process the large amount of data, we extended our previously developed approach for storing release history information to support the analysis of product families. In a case study we apply our approach on data from three different code repositories representing about 8.5GB of data and 10 years of active development. %B Proceedings of the 2005 international workshop on Mining software repositories %S MSR '05 %I ACM %C New York, NY, USA %P 12-16 %@ 1-59593-123-6 %U http://doi.acm.org/10.1145/1082983.1083145 %R http://doi.acm.org/10.1145/1082983.1083145 %> https://flosshub.org/sites/flosshub.org/files/12MiningEvolution.pdf %0 Conference Paper %B Proceedings of the Proceedings of the 38th Annual Hawaii International Conference on System Sciences - Volume 07 %D 2005 %T A Preliminary Analysis of the Influences of Licensing and Organizational Sponsorship on Success in Open Source Projects %A Stewart, Katherine J. %A Ammeter, Anthony P. %A Maruping, Likoebe M. %K contributors %K developers %K freshmeat %K license analysis %K licensing %K metadata %K popularity %K restrictive %K users %X This paper develops and tests a model of the impact of licensing restrictiveness and organizational sponsorship on the popularity and vitality of open source software (OSS) development projects. Using data gathered from Freshmeat.net and OSS project home pages the main conclusions derived from the analysis are that organizational sponsorship has a positive effect on project popularity by easing user concerns about cost and quality and that license restrictiveness may have a negative effect on popularity by reducing the perceived utility of open source software. 
Theoretical and practical implications are discussed, and the paper outlines several avenues for future research. %B Proceedings of the Proceedings of the 38th Annual Hawaii International Conference on System Sciences - Volume 07 %S HICSS '05 %I IEEE Computer Society %C Washington, DC, USA %P 1-10 %8 2005 %@ 0-7695-2268-8-7 %U http://dx.doi.org/10.1109/HICSS.2005.38 %R http://dx.doi.org/10.1109/HICSS.2005.38 %0 Conference Paper %B OSS2005: Open Source Systems %D 2005 %T Quality Improvement in Volunteer Free Software Projects: Exploring the Impact of Release Management %A Martin Michlmayr %K free software %K open source %K process improvement %K quality assurance %K release management %K volunteer projects %X Even though free software has achieved great popularity and success in recent years, there are a number of product quality challenges facing the open source development model. There is significant room for further quality improvement and one area that deserves special attention is release management. This research will identify problems with current release practices, verify possible advantages of an increasingly popular release model, and develop interventions to improve release management in free software projects. The research also aims to answer the fundamental question as to how volunteer projects can deliver predictable and high quality software. %B OSS2005: Open Source Systems %P 309-310 %U http://pascal.case.unibz.it/handle/2038/1429 %0 Conference Paper %B Proceedings of the 2005 international workshop on Mining software repositories %D 2005 %T SCQL: a formal model and a query language for source control repositories %A Hindle, Abram %A Daniel M. German %K evolution %K file %K gnumeric %K modperl %K openssl %K revision %K samba %K scm %K source code %X Source Control Repositories are used in most software projects to store revisions to source code files. These repositories operate at the file level and support multiple users. 
A generalized formal model of source control repositories is described herein. The model is a graph in which the different entities stored in the repository become vertices and their relationships become edges. We then define SCQL, a first order, and temporal logic based query language for source control repositories. We demonstrate how SCQL can be used to specify some questions and then evaluate them using the source control repositories of five different large software projects. %B Proceedings of the 2005 international workshop on Mining software repositories %S MSR '05 %I ACM %C New York, NY, USA %P 100-104 %@ 1-59593-123-6 %U http://doi.acm.org/10.1145/1082983.1083161 %R http://doi.acm.org/10.1145/1082983.1083161 %> https://flosshub.org/sites/flosshub.org/files/100scql.pdf %0 Conference Paper %B Proceedings of the 2005 international workshop on Mining software repositories %D 2005 %T Using a clone genealogy extractor for understanding and supporting evolution of code clones %A Kim, Miryung %A Notkin, David %K clone %K clone detection %K cvs %K developers %K evolution %K maintenance %K refactoring %K source code %X Programmers often create similar code snippets or reuse existing code snippets by copying and pasting. Code clones —syntactically and semantically similar code snippets—can cause problems during software maintenance because programmers may need to locate code clones and change them consistently. In this work, we investigate (1) how code clones evolve, (2) how many code clones impose maintenance challenges, and (3) what kind of tool or engineering process would be useful for maintaining code clones. Based on a formal definition of clone evolution, we built a clone genealogy tool that automatically extracts the history of code clones from a source code repository (CVS). Our clone genealogy tool enables several analyses that reveal evolutionary characteristics of code clones. 
Our initial results suggest that aggressive refactoring may not be the best solution for all code clones; thus, we propose alternative tool solutions that assist in maintaining code clones using clone genealogy information. %B Proceedings of the 2005 international workshop on Mining software repositories %S MSR '05 %I ACM %C New York, NY, USA %P 17-23 %@ 1-59593-123-6 %U http://doi.acm.org/10.1145/1082983.1083146 %R http://doi.acm.org/10.1145/1082983.1083146 %> https://flosshub.org/sites/flosshub.org/files/17Using.pdf %0 Conference Paper %B International Workshop on Mining Software Repositories (MSR 2004) %D 2004 %T LASER: a lexical approach to analogy in software reuse %A Amin, R. %A Mel O Cinneide %A Veale, Tony %K class %K developers %K functions %K jrefactory %K method %K naming %K natural language %K reuse %K source code %K wordnet %X Software reuse is the process of creating a software system from existing software components, rather than creating it from scratch. With the increase in size and complexity of existing software repositories, the need to provide intelligent support to the programmer becomes more pressing. An analogy is a comparison of certain similarities between things which are otherwise unlike. This concept has shown to be valuable in developing UML-level reuse techniques. In the LASER project we apply lexically-driven Analogy at the code level, rather than at the UML-level, in order to retrieve matching components from a repository of existing components. Using the lexical ontology Word-Net, we have conducted a case study to assess if class and method names in open source applications are used in a semantically meaningful way. Our results demonstrate that both hierarchical reuse and parallel reuse can be enhanced through the use of lexically-driven Analogy. 
%B International Workshop on Mining Software Repositories (MSR 2004) %I IEE %C Edinburgh, Scotland, UK %V 2004 %P 112 - 116 %R 10.1049/ic:20040487 %> https://flosshub.org/sites/flosshub.org/files/112LASER.pdf %0 Journal Article %J First Monday %D 2004 %T Release criteria for the Linux kernel %A Glance, D.G. %K bugs %K change log %K linux %K linux kernel %K log files %K mailing list %K patches %K quality %K release history %X Before software is released to its users, software developers will ensure that the software has met specified functional and technical requirements and that it is as free from bugs as possible. Users should be able to have a high degree of confidence that the software will perform as specified and without fault. With open source development practices such as those employed on the Linux kernel project, there are no detailed specifications and little formal testing processes. The questions, then, are what criteria, if any, are used in determining the suitability for release of a particular version of this software, and do users have any degree of confidence in the quality of that release of software? These questions were examined in this study using information from the Linux Kernel Mailing List (LKML), the primary forum for discussion of development issues of the Linux kernel, and change logs submitted with version releases of the Linux kernel. It was determined that very little planning is employed in determining the release of a particular version of the software and that a version of the software is essentially a collection of source patches released at regular intervals with some stabilisation of the code base before each release. Very little attempt is made to verify that the code is bug free, and consequently, the code released is of a largely unknown level of quality. End users are left to decide for themselves the suitability and robustness of a particular version of the software. 
%B First Monday %V 9 %8 2004 %U http://firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/1136/1056 %> https://flosshub.org/sites/flosshub.org/files/Glance.pdf %0 Conference Paper %B 7th European Conference on Software Maintenance and Reengineering (CSMR'03) %D 2003 %T Characteristics of Open Source Projects %A Capiluppi, Andrea %A Patricia Lago %A Maurizio Morisio %K evolution %K project success %K repository %X Most empirical studies about Open Source (OS) projects or products are vertical and usually deal with the flagship, successful projects. There is a substantial lack of horizontal studies to shed light on the whole population of projects, including failures. This paper presents a horizontal study aimed at characterizing OS projects. We analyze a sample of around 400 projects from a popular OS project repository. Each project is characterized by a number of attributes. We analyze these attributes statically and over time. The main results show that few projects are capable of attracting a meaningful community of developers. The majority of projects is made by few (in many cases one) person with a very slow pace of evolution. %B 7th European Conference on Software Maintenance and Reengineering (CSMR'03) %P 317- %0 Journal Article %J Research Policy %D 2003 %T Community, joining, and specialization in open source software innovation: a case study %A Georg von Krogh %A Spaeth, S. %A Karim R Lakhani %K cvs %K email %K email archives %K freenet %K INNOVATION %K mailing lists %K roles %K source code %X This paper develops an inductive theory of the open source software innovation process by focussing on the creation of Freenet, a project aimed at developing a decentralized and anonymous peer-to-peer electronic file sharing network. We are particularly interested in the strategies and processes by which new people join the existing community of software developers, and how they initially contribute code. 
Analyzing data from multiple sources on the Freenet software development process, we generate the constructs of "joining script", We are grateful to helpful comments from two anonymous reviewers. We also thank Chris Argyris, John Seely Brown, Eric von Hippel, Stefan Haefliger, Petra Kugler, Heike Bruch, Simon Gächter, Simon Peck, and Hari Tsoukas for helpful comments and suggestions. Ben Ho and Craig Lebowitz provided technical assistance with data importation and parsing. We would like to thank Ian Clarke and the Freenet developers for their willingness to participate in our study and providing key insights into the open source development process. Karim R. Lakhani would like to acknowledge the generous support of The Boston Consulting Group and Canada's Social Science and Humanities Research Council doctoral fellowship. Georg von Krogh and Sebastian Spaeth acknowledge the generous support from the Research Foundation at the University of St. Gallen. %B Research Policy %V 32 %P 1217-1241 %G eng %1 policy %2 case study %R http://dx.doi.org/10.1016/S0048-7333(03)00050-7 %> https://flosshub.org/sites/flosshub.org/files/krogh03.pdf %0 Journal Article %J Proceedings of the 3rd ICSE Workshop on Open Source %D 2003 %T Evidences in the evolution of OS projects through Changelog Analyses %A Capiluppi, Andrea %K classification %K freshmeat %K loc %K modularity %K repository %K size %K sloc %K source code %X Most empirical studies about Open Source (OS) projects or products are vertical and usually deal with the flagship, successful projects. There is a substantial lack of horizontal studies to shed light on the whole population of projects, including failures. This paper presents a horizontal study aimed at characterizing OS projects. We analyze a sample of around 400 projects from a popular OS project repository. Each project is characterized by a number of attributes. We analyze these attributes statically and over time. 
The main results show that few projects are capable of attracting a meaningful community of developers. The majority of projects is made by few (in many cases one) person with a very slow pace of evolution. We then try to observe how many projects count on a substantial number of developers, and analyze those projects more deeply. The goal is to achieve a better insight into the dynamics of open source development. The initial results of this analysis, especially growth in code size and tendency to stability in modularity, seem to be in line with traditional closed source development. %B Proceedings of the 3rd ICSE Workshop on Open Source %P 19-24 %U http://hdl.handle.net/10552/1037 %> https://flosshub.org/sites/flosshub.org/files/capiluppi2003.pdf %0 Conference Paper %B Proceedings of 7th Annual Conference of the Southern Association for Information Systems %D 2003 %T Organizational Structure of Open Source Projects: A Life Cycle Approach %A Donald E. Wynn %K division of labor %K downloads %K growth %K interview %K leadership %K life cycle %K lifecycle %K project success %K roles %K sourceforge %K Survey %X The structure of open source project communities is discussed in relation to the organizational life cycle. In lieu of sales figures, the download counts for each project are used to identify the life cycle stage of a random sample of open source projects. A research model is proposed that attempts to measure the fit between the life cycle stage and the specific organizational characteristics of these projects (focus, division of labor, role of the leader, level of commitment, and coordination/control) as an indicator of the success of a project as measured by the satisfaction and involvement of both developers and users. 
%B Proceedings of 7th Annual Conference of the Southern Association for Information Systems %> https://flosshub.org/sites/flosshub.org/files/wynn2004.pdf %0 Conference Paper %B Proceedings of the 25th International Conference on Software Engineering %D 2003 %T Toward an understanding of the motivation of Open Source Software developers %A Ye, Yunwen %A Kishida, Kouichi %K change log %K COMMUNITY %K contributions %K contributors %K developers %K email %K email archives %K evolution %K gimp %K log files %K mailing list %K roles %K source code %X An Open Source Software (OSS) project is unlikely to be successful unless there is an accompanied community that provides the platform for developers and users to collaborate. Members of such communities are volunteers whose motivation to participate and contribute is of essential importance to the success of OSS projects. In this paper, we aim to create an understanding of what motivates people to participate in OSS communities. We theorize that learning is one of the motivational forces. Our theory is grounded in the learning theory of Legitimate Peripheral Participation, and is supported by analyzing the social structure of OSS communities and the co-evolution between OSS systems and communities. We also discuss practical implications of our theory for creating and maintaining sustainable OSS communities as well as for software engineering research and education. %B Proceedings of the 25th International Conference on Software Engineering %S ICSE '03 %I IEEE Computer Society %C Washington, DC, USA %P 419–429 %@ 0-7695-1877-X %U http://portal.acm.org/citation.cfm?id=776816.776867 %> https://flosshub.org/sites/flosshub.org/files/YeKishida.pdf %0 Conference Paper %B Proceedings of the 2nd ICSE Workshop on Open Source %D 2002 %T Adopting OSS Methods by Adopting OSS Tools %A Robbins, Jason E. 
%K ant %K argouml %K bugzilla %K cactus %K cvs %K developers %K eclipse %K emacs %K email %K faq %K junit %K mailing lists %K make %K netbeans %K package management %K rpm %K scarab %K subversion %K teams %K tools %K torque %K WORK %X The open source movement has created and used a set of software engineering tools with features that fit the characteristics of open source development processes. To a large extent, the open source culture and methodology are conveyed to new developers via the toolset itself, and through the demonstrated usage of these tools on existing projects. The rapid and wide adoption of open source tools stands in stark contrast to the difficulties encountered in adopting traditional CASE tools. This paper explores the characteristics that make these tools adoptable and how adopting them may influence software development processes. %B Proceedings of the 2nd ICSE Workshop on Open Source %> https://flosshub.org/sites/flosshub.org/files/Robbins.pdf %0 Journal Article %J First Monday %D 2002 %T Cave or Community? An Empirical Examination of 100 Mature Open Source Projects %A Sandeep Krishnamurthy %K age %K contributors %K developers %K project success %K registration %K sourceforge %X Starting with Eric Raymond's groundbreaking work, "The Cathedral and the Bazaar", open-source software (OSS) has commonly been regarded as work produced by a community of developers. Yet, given the nature of software programs, one also hears of developers with no lives that work very hard to achieve great product results. In this paper, I sought empirical evidence that would help us understand which is more common - the cave (i.e., lone producer) or the community. Based on a study of the top 100 mature products on Sourceforge, I find a few surprising things. First, most OSS programs are developed by individuals, rather than communities. 
The median number of developers in the 100 projects I looked at was 4 and the mode was 1 - numbers much lower than previous numbers reported for highly successful projects! Second, most OSS programs do not generate a lot of discussion. Third, products with more developers tend to be viewed and downloaded more often. Fourth, the number of developers associated with a project was positively correlated to the age of the project. Fifth, the larger the project, the smaller the percent of project administrators. %B First Monday %V 7 %8 06/2002 %G eng %> https://flosshub.org/sites/flosshub.org/files/krishnamurthy.pdf %0 Conference Paper %B Proceedings of the 2nd ICSE Workshop on Open Source %D 2002 %T Characterizing the OSS process %A Capiluppi, Andrea %A Patricia Lago %A Maurizio Morisio %K bugs %K change log %K classification %K cvs %K downloads %K freshmeat %K metadata %K patches %K popularity %K project success %K release history %K sourceforge %K vitality %X The Open Source model of software development has gained the attention of both the business, the practitioners’ and the research communities. The Open Source process has been described by the seminal paper by Eric Raymond [4] and [5]. However, sound empirical studies are still very limited [3], [6]. Our goal is to investigate the OS process by empirical means, to analyze, characterize it, and possibly model it with quantitative models. It should be noted that the Open Source process provides open process and product data, and therefore is a rare opportunity for empirical research. Our initial research focus is on the characterization of the process, starting from the evolution of OS projects. In traditional projects, a significant number of releases in a short time is usually considered an instability factor [7] and [8], while in the OSS community, it is an evidence of vitality, shows the commitment of the authors and the power of attraction of other programmers [9]. 
Is it possible to characterize the vitality of projects? And, can vitality be traced to some other characteristics of a project? %B Proceedings of the 2nd ICSE Workshop on Open Source %> https://flosshub.org/sites/flosshub.org/files/CapiluppiLagoMorisio.pdf %0 Conference Paper %B ICIS 2002. Proceedings of International Conference on Information Systems 2002 %D 2002 %T An Exploratory Study of Factors Influencing the Level of Vitality and Popularity of Open Source Projects %A Stewart, Katherine J. %A Ammeter, Tony %K activity %K audience %K developers %K freshmeat %K license analysis %K licenses %K organizational sponsorship %K project success %K roles %K status %K target audience %K users %X In this research, we ask the question: What differentiates successful from unsuccessful open source software projects? Using a sample of 240 open source projects, we examine how organizational sponsorship, target audience (developer versus end user), license choice, and development status interact over time to influence the extent to which open source software projects attract user attention and developer activity. %B ICIS 2002. Proceedings of International Conference on Information Systems 2002 %P 1-5 %8 2002 %0 Conference Paper %B Proceedings of the 2nd ICSE Workshop on Open Source %D 2002 %T Exploring the Strengths and Limits of Open Source Software Engineering Processes: A Research Agenda %A Kevin Crowston %A Barbara Scozzi %K replicability %K requirements %K research %K research agenda %X Many researchers have investigated the nature and characteristics of open source software (OSS) projects and their developer communities. In this position paper, after examining some success factors, we discuss potential limits on the replicability and portability of OSS engineering processes. Based on this analysis, we propose a research agenda to better understand the current nature of the processes and thus the strengths and the limitations. 
%B Proceedings of the 2nd ICSE Workshop on Open Source %> https://flosshub.org/sites/flosshub.org/files/CrowstonScozzi.pdf %0 Journal Article %J Journal of Law, Economics and Organization %D 2002 %T The Scope of Open Source Licensing %A Josh Lerner %A Jean Tirole %K developers %K license %K licenses %K permissive %K restrictive %K sourceforge %X This paper is an initial exploration of the determinants of open source license choice. It first enumerates the various considerations that should figure into the licensor's choice of contractual terms, in particular highlighting how the decision is shaped not just by the preferences of the licensor itself, but also by that of the community of developers. The paper then presents an empirical analysis of the determinants of license choice using the SourceForge database, a compilation of nearly 40,000 open source projects. Projects geared toward end-users tend to have restrictive licenses, while those oriented toward developers are less likely to do so. Projects that are designed to run on commercial operating systems and those geared towards the Internet are less likely to have restrictive licenses. Finally, projects that are likely to be attractive to consumers such as games are more likely to have restrictive licenses. %B Journal of Law, Economics and Organization %V 21 %P 20-56 %8 2005 %G eng %> https://flosshub.org/sites/flosshub.org/files/lernertirole2.pdf %0 Conference Paper %B Proceedings of the 2nd ICSE Workshop on Open Source %D 2002 %T Where Do Open Source Requirements Come From (And What Should We Do About It)? %A Bart Massey %K requirements %X The collection and specification of software requirements is one of the most intense areas of software engineering research. This makes it a natural area to explore when considering open-source software. In this paper, I argue that the sources of open-source software requirements differ in some important respects from the sources of commercial software project requirements. 
This has some interesting implications for both open-source and commercial development. %B Proceedings of the 2nd ICSE Workshop on Open Source %> https://flosshub.org/sites/flosshub.org/files/massey.pdf %0 Conference Paper %B 1st Workshop on Open Source Software Engineering at ICSE 2001 %D 2001 %T Reputation Layers for Open-Source Development %A Hasan Masum %K currency %K developers %K MOTIVATION %K reputation %B 1st Workshop on Open Source Software Engineering at ICSE 2001 %> https://flosshub.org/sites/flosshub.org/files/masum.pdf %0 Journal Article %J Information Systems Journal %D 2001 %T Striking a balance between trust and control in a virtual organization: a content analysis of open source software case studies %A Gallivan, M. J. %K apache %K case studies %K Control %K fetchmail %K jun %K linux %K linux kernel %K McDonaldization %K mozilla %K networked organization %K perl %K rationalization %K trust %K virtual organization %X Many organization theorists have predicted the emergence of the networked or virtual firm as a model for the design of future organizations. Researchers have also emphasized the importance of trust as a necessary condition for ensuring the success of virtual organizations. This paper examines the open source software (OSS) 'movement' as an example of a virtual organization and proposes a model that runs contrary to the belief that trust is critical for virtual organizations. Instead, I argue that various control mechanisms can ensure the effective performance of autonomous agents who participate in virtual organizations. Borrowing from the theory of the 'McDonaldization' of society, I argue that, given a set of practices to ensure the control, efficiency, predictability and calculability of processes and outcomes in virtual organizations, effective performance may occur in the absence of trust. As support for my argument, I employ content analysis to examine a set of published case studies of OSS projects. 
My results show that, although trust is rarely mentioned, ensuring control is an important criterion for effective performance within OSS projects. The case studies feature few references to other dimensions of 'McDonaldization' (efficiency, predictability and calculability), however, and I conclude that the OSS movement relies on many other forms of social control and self-control, which are often unacknowledged in OSS projects. Through these implicit forms of control, OSS projects are able to secure the cooperation of the autonomous agents that participate in project teams. I conclude by extrapolating from these case studies to other virtual organizations. %B Information Systems Journal %V 11 %P 277-304 %G eng %M WOS:000172198800003 %1 information systems %2 case study %0 Journal Article %J Proceedings of the International Conference on Software Engineering (ICSE 2000) %D 2000 %T A Case Study of Open Source Software Development: The Apache Server %A Audris Mockus %A Roy Fielding %A Herbsleb, James %K apache %K bug fix revisions %K bugs %K core %K cvs %K defect density %K developers %K email archives %K participation %K productivity %K revision control %K revision history %K roles %K scm %K source code %K team size %X According to its proponents, open source style software development has the capacity to compete successfully, and perhaps in many cases displace, traditional commercial development methods. We examine the development process of a major open source application, the Apache web server. By using email archives of source code change history and problem reports we quantify aspects of developer participation, core team size, code ownership, productivity, defect density, and problem resolution interval for this OSS project. This analysis reveals a unique process, which performs well on important measures. 
%B Proceedings of the International Conference on Software Engineering (ICSE 2000) %8 June %G eng %> https://flosshub.org/sites/flosshub.org/files/mockusapache.pdf