%0 Conference Proceedings %B Open Source Systems: Grounding Research (OSS 2011) %D 2011 %T Knowledge Homogeneity and Specialization in the Apache HTTP Server Project %A MacLean, Alexander C. %A Pratt, Landon J. %A Knutson, Charles D. %A Ringger, Eric K. %K apache %K commits %K developer %K email %K email archive %K LDA %K mailing list %K revision control %K revision history %K scm %K social network analysis %K specialization %K subversion %K svn %X We present an analysis of developer communication in the Apache HTTP Server project. Using topic modeling techniques we expose latent conceptual sub-communities arising from developer specialization within the greater developer population. However, we found that among the major contributors to the project, very little specialization exists. We present theories to explain this phenomenon, and suggest further research. %B Open Source Systems: Grounding Research (OSS 2011) %I Springer %P 106-122 %8 10/2011 %U http://sequoia.cs.byu.edu/lab/files/pubs/MacLean2011a.pdf %> https://flosshub.org/sites/flosshub.org/files/MacLean2011a.pdf %0 Conference Paper %B 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010) %D 2010 %T Replaying IDE interactions to evaluate and improve change prediction approaches %A Robbes, Romain %A Pollet, Damien %A Lanza, Michele %K cbse %K change based software evolution %K change prediction %K changes %K commit %K cvs %K development history %K eclipseeye %K ide %K mylyn %K spyware %K svn %X Change prediction helps developers by recommending program entities that will have to be changed alongside the entities currently being changed. To evaluate their accuracy, current change prediction approaches use data from versioning systems such as CVS or SVN. These data sources provide a coarse-grained view of the development history that flattens the sequence of changes in a single commit.
They are thus not a valid basis for evaluation in the case of development-style prediction, where the order of the predictions has to match the order of the changes a developer makes. We propose a benchmark for the evaluation of change prediction approaches based on fine-grained change data recorded from IDE usage. Moreover, the change prediction approaches themselves can use the more accurate data to fine-tune their prediction. We present an evaluation procedure and use it on several change prediction approaches, both novel and from the literature, and report on the results. %B 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010) %I IEEE %C Cape Town, South Africa %P 161 - 170 %@ 978-1-4244-6802-7 %R 10.1109/MSR.2010.5463278 %> https://flosshub.org/sites/flosshub.org/files/161Robbes2010changePrediction.pdf %0 Book %B IFIP Advances in Information and Communication Technology Open Source Software: New Horizons (OSS 2010) %D 2010 %T Warehousing and Studying Open Source Versioning Metadata %A van Antwerp, M. %A Madey, G. %E Ågerfalk, Pär %E Boldyreff, Cornelia %E González-Barahona, Jesús M. %E Madey, Gregory R. %E Noll, John %K berlios %K cvs %K savannah %K scm %K sourceforge %K srda %K subversion %K svn %X In this paper, we describe the downloading and warehousing of Open Source Software (OSS) versioning metadata from SourceForge, BerliOS Developer, and GNU Savannah. This data enables and supports research in areas such as software engineering, open source phenomena, social network analysis, data mining, and project management. This newly-formed database containing Concurrent Versions System (CVS) and Subversion (SVN) metadata offers new research opportunities for large-scale OSS development analysis. The CVS and SVN data is juxtaposed with the SourceForge.net Research Data Archive [5] for the purpose of performing more powerful and interesting queries.
We also present an initial statistical analysis of some of the most active projects. %B IFIP Advances in Information and Communication Technology Open Source Software: New Horizons (OSS 2010) %I Springer Berlin Heidelberg %C Berlin, Heidelberg %V 319 %P 413 - 418 %@ 978-3-642-13244-5 %R 10.1007/978-3-642-13244-5_40 %0 Conference Paper %B 2009 6th IEEE International Working Conference on Mining Software Repositories (MSR) %D 2009 %T Learning from defect removals %A Ayewah, Nathaniel %A Pugh, William %K bug fixing %K bugzilla %K change management %K cherry %K cvs %K eclipse %K groovy %K launching %K source code %K svn %K text editor %X Recent research has tried to identify changes in source code repositories that fix bugs by linking these changes to reports in issue tracking systems. These changes have been traced back to the point in time when they were previously modified as a way of identifying bug introducing changes. But we observe that not all changes linked to bug tracking systems are fixing bugs; some are enhancing the code. Furthermore, not all fixes are applied at the point in the code where the bug was originally introduced. We flesh out these observations with a manual review of several software projects, and use this opportunity to see how many defects are in the scope of static analysis tools.
%B 2009 6th IEEE International Working Conference on Mining Software Repositories (MSR) %I IEEE %C Vancouver, BC, Canada %P 179 - 182 %@ 978-1-4244-3493-0 %R 10.1109/MSR.2009.5069500 %> https://flosshub.org/sites/flosshub.org/files/179LearnFromDefects-MSR09.pdf %0 Conference Paper %B 3rd Workshop on Public Data about Software Development (WoPDaSD 2008) %D 2008 %T Collecting data from distributed FOSS projects %A Fagerholm, Fabian %A Taina, Juha %K bitkeeper %K bug tracking system %K cvs %K distributed %K email archive %K fork rate %K git %K life cycle %K linux %K linux kernel %K mailing list %K merge rate %K subversion %K svn %K version control %X A key trait of Free and Open Source Software (foss) development is its distributed nature. Nevertheless, two project-level operations, the fork and the merge of program code, are among the least well understood events in the lifespan of a foss project. Some projects have explicitly adopted these operations as the primary means of concurrent development. In this study, we examine the effect of highly distributed software development, as found in the Linux kernel project, on collection and modelling of software development data. We find that distributed development calls for sophisticated temporal modelling techniques where several versions of the source code tree can exist at once. Attention must be turned towards the methods of quality assurance and peer review that projects employ to manage these parallel source trees. Our analysis indicates that two new metrics, fork rate and merge rate, could be useful for determining the role of distributed version control systems in foss projects. The study presents a preliminary data set consisting of version control and mailing list data.
%B 3rd Workshop on Public Data about Software Development (WoPDaSD 2008) %P 8-13 %8 2009 %> https://flosshub.org/sites/flosshub.org/files/fagerholm.pdf %0 Conference Paper %B Fourth International Workshop on Mining Software Repositories (MSR'07: ICSE Workshops 2007) %D 2007 %T Visual Data Mining in Software Archives to Detect How Developers Work Together %A Weissgerber, Peter %A Pohl, Mathias %A Burch, Michael %K change %K coordination %K cvs %K developers %K junit %K modules %K scm %K source code %K svn %K teams %K tomcat %K visualization %X Analyzing the check-in information of open source software projects which use a version control system such as CVS or Subversion can yield interesting and important insights into the programming behavior of developers. As in every major project, tasks are assigned to many developers, so their development work must be coordinated. This paper describes three visualization techniques that help to examine how programmers work together, e.g., whether they work as a team or develop their parts of the software separately from each other. Furthermore, phases of stagnation in the lifetime of a project can be uncovered, revealing possible problems. To demonstrate the usefulness of these visualization techniques, we performed case studies on two open source projects. In these studies, interesting patterns of developers' behavior, e.g., specialization on a certain module, can be observed. Moreover, modules that have been changed by many developers can be identified, as well as those that have been altered by only one programmer. %B Fourth International Workshop on Mining Software Repositories (MSR'07: ICSE Workshops 2007) %I IEEE %C Minneapolis, MN, USA %P 9 - 9 %@ 0-7695-2950-X %R 10.1109/MSR.2007.34 %> https://flosshub.org/sites/flosshub.org/files/28300009.pdf