%0 Conference Paper %B Proceedings of the 11th Working Conference on Mining Software Repositories %D 2014 %T Analysing the 'Biodiversity' of Open Source Ecosystems: The GitHub Case %A Matragkas, Nicholas %A Williams, James R. %A Kolovos, Dimitris S. %A Paige, Richard F. %K Data and knowledge visualization %K data mining %K mining challenge %K msr challenge %X In nature the diversity of species and genes in ecological communities affects the functioning of these communities. Biologists have found out that more diverse communities appear to be more productive than less diverse communities. Moreover such communities appear to be more stable in the face of perturbations. In this paper, we draw the analogy between ecological communities and Open Source Software (OSS) ecosystems, and we investigate the diversity and structure of OSS communities. To address this question we use the MSR 2014 challenge dataset, which includes data from the top-10 software projects for the top programming languages on GitHub. Our findings show that OSS communities on GitHub consist of 3 types of users (core developers, active users, passive users). Moreover, we show that the percentage of core developers and active users does not change as the project grows and that the majority of members of large projects are passive users. 
%B Proceedings of the 11th Working Conference on Mining Software Repositories %S MSR 2014 %I ACM %C New York, NY, USA %P 356–359 %@ 978-1-4503-2863-0 %U http://doi.acm.org/10.1145/2597073.2597119 %R 10.1145/2597073.2597119 %0 Conference Paper %B Proceedings of the 11th Working Conference on Mining Software Repositories %D 2014 %T Co-evolution of Project Documentation and Popularity Within Github %A Aggarwal, Karan %A Hindle, Abram %A Stroulia, Eleni %K Cross Correlation %K Documentation Change %K mining challenge %K msr challenge %K popularity %X Github is a very popular collaborative software-development platform that provides typical source-code management and issue tracking features augmented by strong social-networking features such as following developers and watching projects. These features help "spread the word" about individuals and projects, building the reputation of the former and increasing the popularity of the latter. In this paper, we investigate the relation between project popularity and regular, consistent documentation updates. We found strong indicators that consistently popular projects exhibited consistent documentation effort and that this effort tended to attract more documentation collaborators. We also found that frameworks required more documentation effort than libraries to achieve similar adoption success, especially in the initial phase. %B Proceedings of the 11th Working Conference on Mining Software Repositories %S MSR 2014 %I ACM %C New York, NY, USA %P 360–363 %@ 978-1-4503-2863-0 %U http://doi.acm.org/10.1145/2597073.2597120 %R 10.1145/2597073.2597120 %0 Conference Paper %B Proceedings of the 11th Working Conference on Mining Software Repositories %D 2014 %T Do Developers Discuss Design? %A Brunet, João %A Murphy, Gail C. 
%A Terra, Ricardo %A Figueiredo, Jorge %A Serey, Dalton %K Design Discussions %K empirical study %K machine learning %K mining challenge %K msr challenge %X Design is often raised in the literature as important to attaining various properties and characteristics in a software system. At least for open-source projects, it can be hard to find evidence of ongoing design work in the technical artifacts produced as part of the development. Although developers usually do not produce specific design documents, they do communicate about design in different ways. In this paper, we provide quantitative evidence that developers address design through discussions in commits, issues, and pull requests. To achieve this, we built a discussions' classifier and automatically labeled 102,122 discussions from 77 projects. Based on this data, we make four observations about the projects: i) on average, 25% of the discussions in a project are about design; ii) on average, 26% of developers contribute to at least one design discussion; iii) only 1% of the developers contribute to more than 15% of the discussions in a project; and iv) these few developers who contribute to a broad range of design discussions are also the top committers in a project. %B Proceedings of the 11th Working Conference on Mining Software Repositories %S MSR 2014 %I ACM %C New York, NY, USA %P 340–343 %@ 978-1-4503-2863-0 %U http://doi.acm.org/10.1145/2597073.2597115 %R 10.1145/2597073.2597115 %> https://flosshub.org/sites/flosshub.org/files/brunet.pdf %0 Conference Paper %B Proceedings of the 11th Working Conference on Mining Software Repositories %D 2014 %T An Insight into the Pull Requests of GitHub %A Rahman, Mohammad Masudur %A Roy, Chanchal K. %K Commit comments %K mining challenge %K msr challenge %K pull request %K topic model %X Given the increasing number of unsuccessful pull requests in GitHub projects, insights into the success and failure of these requests are essential for the developers. 
In this paper, we provide a comparative study between successful and unsuccessful pull requests made to 78 GitHub base projects by 20,142 developers from 103,192 forked projects. In the study, we analyze pull request discussion texts, project specific information (e.g., domain, maturity), and developer specific information (e.g., experience) in order to report useful insights, and use them to contrast between successful and unsuccessful pull requests. We believe our study will help developers overcome the issues with pull requests in GitHub, and project administrators with informed decision making. %B Proceedings of the 11th Working Conference on Mining Software Repositories %S MSR 2014 %I ACM %C New York, NY, USA %P 364–367 %@ 978-1-4503-2863-0 %U http://doi.acm.org/10.1145/2597073.2597121 %R 10.1145/2597073.2597121 %> https://flosshub.org/sites/flosshub.org/files/rahman.pdf %0 Conference Paper %B Proceedings of the 11th Working Conference on Mining Software Repositories %D 2014 %T Magnet or Sticky? An OSS Project-by-project Typology %A Yamashita, Kazuhiro %A McIntosh, Shane %A Kamei, Yasutaka %A Ubayashi, Naoyasu %K Developer migration %K Magnet %K mining challenge %K msr challenge %K open source %K Sticky %X For Open Source Software (OSS) projects, retaining existing contributors and attracting new ones is a major concern. In this paper, we expand and adapt a pair of population migration metrics to analyze migration trends in a collection of open source projects. Namely, we study: (1) project stickiness, i.e., its tendency to retain existing contributors and (2) project magnetism, i.e., its tendency to attract new contributors. Using quadrant plots, we classify projects as attractive (highly magnetic and sticky), stagnant (highly sticky, weakly magnetic), fluctuating (highly magnetic, weakly sticky), or terminal (weakly magnetic and sticky). 
Through analysis of the MSR challenge dataset, we find that: (1) quadrant plots can effectively identify at-risk projects, (2) stickiness is often motivated by professional activity and (3) transitions among quadrants as a project ages often coincides with interesting events in the evolution history of a project. %B Proceedings of the 11th Working Conference on Mining Software Repositories %S MSR 2014 %I ACM %C New York, NY, USA %P 344–347 %@ 978-1-4503-2863-0 %U http://doi.acm.org/10.1145/2597073.2597116 %R 10.1145/2597073.2597116 %> https://flosshub.org/sites/flosshub.org/files/yamashita.pdf %0 Conference Paper %B Proceedings of the 11th Working Conference on Mining Software Repositories %D 2014 %T Security and Emotion: Sentiment Analysis of Security Discussions on GitHub %A Pletea, Daniel %A Vasilescu, Bogdan %A Serebrenik, Alexander %K github %K mining challenge %K msr challenge %K security %K sentiment analysis %X Application security is becoming increasingly prevalent during software and especially web application development. Consequently, countermeasures are continuously being discussed and built into applications, with the goal of reducing the risk that unauthorized code will be able to access, steal, modify, or delete sensitive data. In this paper we gauged the presence and atmosphere surrounding security-related discussions on GitHub, as mined from discussions around commits and pull requests. First, we found that security related discussions account for approximately 10% of all discussions on GitHub. Second, we found that more negative emotions are expressed in security-related discussions than in other discussions. These findings confirm the importance of properly training developers to address security concerns in their applications as well as the need to test applications thoroughly for security vulnerabilities in order to reduce frustration and improve overall project atmosphere. 
%B Proceedings of the 11th Working Conference on Mining Software Repositories %S MSR 2014 %I ACM %C New York, NY, USA %P 348–351 %@ 978-1-4503-2863-0 %U http://doi.acm.org/10.1145/2597073.2597117 %R 10.1145/2597073.2597117 %> https://flosshub.org/sites/flosshub.org/files/pletea.pdf %0 Conference Paper %B Proceedings of the 11th Working Conference on Mining Software Repositories %D 2014 %T Sentiment Analysis of Commit Comments in GitHub: An Empirical Study %A Guzman, Emitza %A Azócar, David %A Li, Yang %K Human Factors in Software Engineering %K mining challenge %K msr challenge %K sentiment analysis %X Emotions have a high impact in productivity, task quality, creativity, group rapport and job satisfaction. In this work we use lexical sentiment analysis to study emotions expressed in commit comments of different open source projects and analyze their relationship with different factors such as used programming language, time and day of the week in which the commit was made, team distribution and project approval. Our results show that projects developed in Java tend to have more negative commit comments, and that projects that have more distributed teams tend to have a higher positive polarity in their emotional content. Additionally, we found that commit comments written on Mondays tend to a more negative emotion. While our results need to be confirmed by a more representative sample they are an initial step into the study of emotions and related factors in open source projects. 
%B Proceedings of the 11th Working Conference on Mining Software Repositories %S MSR 2014 %I ACM %C New York, NY, USA %P 352–355 %@ 978-1-4503-2863-0 %U http://doi.acm.org/10.1145/2597073.2597118 %R 10.1145/2597073.2597118 %0 Conference Paper %B Proceedings of the 11th Working Conference on Mining Software Repositories %D 2014 %T A Study of External Community Contribution to Open-source Projects on GitHub %A Padhye, Rohan %A Mani, Senthil %A Sinha, Vibha Singhal %K community participation %K core committers %K external contribution %K mining challenge %K mining software repositories %K msr challenge %K Open-source software %K pull requests %X Open-source software projects are primarily driven by community contribution. However, commit access to such projects' software repositories is often strictly controlled. These projects prefer to solicit external participation in the form of patches or pull requests. In this paper, we analyze a set of 89 top-starred GitHub projects and their forks in order to explore the nature and distribution of such community contribution. We first classify commits (and developers) into three categories: core, external and mutant, and study the relative sizes of each of these classes through a ring-based visualization. We observe that projects written in mainstream scripting languages such as JavaScript and Python tend to include more external participation than projects written in upcoming languages such as Scala. We also visualize the geographic spread of these communities via geocoding. Finally, we classify the types of pull requests submitted based on their labels and observe that bug fixes are more likely to be merged into the main projects as compared to feature enhancements. 
%B Proceedings of the 11th Working Conference on Mining Software Repositories %S MSR 2014 %I ACM %C New York, NY, USA %P 332–335 %@ 978-1-4503-2863-0 %U http://doi.acm.org/10.1145/2597073.2597113 %R 10.1145/2597073.2597113 %0 Conference Paper %B Proceedings of the 11th Working Conference on Mining Software Repositories %D 2014 %T Understanding "Watchers" on GitHub %A Sheoran, Jyoti %A Blincoe, Kelly %A Kalliamvakou, Eirini %A Damian, Daniela %A Ell, Jordan %K github %K mining challenge %K msr challenge %K repositories %K Software Teams %K Watchers %X Users on GitHub can watch repositories to receive notifications about project activity. This introduces a new type of passive project membership. In this paper, we investigate the behavior of watchers and their contribution to the projects they watch. We find that a subset of project watchers begin contributing to the project and those contributors account for a significant percentage of contributors on the project. As contributors, watchers are more confident and contribute over a longer period of time in a more varied way than other contributors. This is likely attributable to the knowledge gained through project notifications. %B Proceedings of the 11th Working Conference on Mining Software Repositories %S MSR 2014 %I ACM %C New York, NY, USA %P 336–339 %@ 978-1-4503-2863-0 %U http://doi.acm.org/10.1145/2597073.2597114 %R 10.1145/2597073.2597114 %0 Conference Paper %B Proceedings of the 8th working conference on Mining software repositories - MSR '11 %D 2011 %T Do comments explain codes adequately? %A Mizuno, Osamu %A Hirata, Yukinao %Y van Deursen, Arie %Y Xie, Tao %Y Zimmermann, Thomas %K comments %K eclipse %K msr challenge %K netbeans %K prediction %X Comment lines in the software source code include descriptions of codes, usage of codes, copyrights, unused codes, comments, and so on. 
It is required for comments to explain the content of written code adequately, since the wrong description in the comment may causes further bug and confusion in maintenance. In this paper, we try to clarify a research question: "In which projects do comments describe the code adequately?" To answer this question, we selected the group 1 of mining challenge and used data obtained from Eclipse and Netbeans. Since it is difficult to answer the above question directly, we define the distance between codes and comments. By utilizing the fault-prone module prediction technique, we can answer the alternative question from the data of two projects. The result shows that Eclipse project has relatively adequate comments. %B Proceedings of the 8th working conference on Mining software repositories - MSR '11 %I ACM Press %C New York, New York, USA %P 242-245 %8 05/2011 %@ 9781450305747 %! MSR '11 %R 10.1145/1985441.1985482 %0 Conference Paper %B Proceedings of the 8th working conference on Mining software repositories - MSR '11 %D 2011 %T A tale of two browsers %A Davis, Ian %A Godfrey, Michael W. %A Baysal, Olga %Y van Deursen, Arie %Y Xie, Tao %Y Zimmermann, Thomas %K chrome %K development history %K Firefox %K msr challenge %X We explore the space of open source systems and their user communities by examining the development artifact histories of two popular web browsers -- Firefox and Chrome -- as well as usage data. By examining the data and addressing a number of research questions, two very different profiles emerge: Firefox, as the older and established system, with long product version cycles but short bug fix cycles, and a user base that is slow to adopt newer versions; and Chrome, as the new and fast evolving system, with short version cycles, longer bug fix cycles, and a user base that very quickly adopts new versions as they become available (due largely to Chrome's mandatory automatic updates). 
%B Proceedings of the 8th working conference on Mining software repositories - MSR '11 %I ACM Press %C New York, New York, USA %P 238-241 %8 05/2011 %@ 9781450305747 %! MSR '11 %R 10.1145/1985441.1985481 %0 Conference Paper %B Proceedings of the 8th working conference on Mining software repositories - MSR '11 %D 2011 %T What topics do Firefox and Chrome contributors discuss? %A Zagarese, Quirino %A Distante, Damiano %A Di Penta, Massimiliano %A Bernardi, Mario Luca %A Sementa, Carmine %Y van Deursen, Arie %Y Xie, Tao %Y Zimmermann, Thomas %K bug reports %K chrome %K Firefox %K LDA %K msr challenge %X Firefox and Chrome are two very popular open source Web browsers, implemented in C/C++. This paper analyzes what topics were discussed in Firefox and Chrome bug reports over time. To this aim, we indexed the text contained in bug reports submitted each semester of the project history, and identified topics using Latent Dirichlet Allocation (LDA). Then, we investigated to what extent Firefox and Chrome developers/contributors discussed similar topics, either in different periods, or over the same period. Results indicate a non-negligible overlap of topics, mainly on issues related to page layouting, user interaction, and multimedia contents. %B Proceedings of the 8th working conference on Mining software repositories - MSR '11 %I ACM Press %C New York, New York, USA %P 234-237 %8 05/2011 %@ 9781450305747 %! MSR '11 %R 10.1145/1985441.1985480 %0 Conference Paper %B 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010) %D 2010 %T Assessment of issue handling efficiency %A Luijten, Bart %A Visser, Joost %A Zaidman, Andy %K bug reports %K bug tracking %K classification %K gnome %K msr challenge %K visualization %X We mined the issue database of GNOME to assess how issues are handled. How many issues are submitted and resolved? Does the backlog grow or decrease? 
How fast are issues resolved? Does issue resolution speed increase or decrease over time? In which subproject are issues handled most efficiently? To answer such questions, we apply several visualization and quantification instruments to the raw issue data. In particular, we aggregate issues into four risk categories, based on their resolution time. These categories are the basis both for visualizing and ranking, which are used in concert for issue database exploration. %B 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010) %I IEEE %C Cape Town, South Africa %P 94 - 97 %@ 978-1-4244-6802-7 %R 10.1109/MSR.2010.5463292 %> https://flosshub.org/sites/flosshub.org/files/94bluijtenMSR2010.pdf %0 Conference Paper %B 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010) %D 2010 %T Cloning and copying between GNOME projects %A Krinke, Jens %A Gold, Nicolas %A Jia, Yue %A Binkley, David %K clone %K gnome %K msr challenge %K source code %X This paper presents an approach to automatically distinguish the copied clone from the original in a pair of clones. It matches the line-by-line version information of a clone to the pair's other clone. A case study on the GNOME Desktop Suite revealed a complex flow of reused code between the different subprojects. In particular, it showed that the majority of larger clones (with a minimal size of 28 lines or higher) exist between the subprojects and more than 60% of the clone pairs can be automatically separated into original and copy. 
%B 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010) %I IEEE %C Cape Town, South Africa %P 98 - 101 %@ 978-1-4244-6802-7 %R 10.1109/MSR.2010.5463290 %> https://flosshub.org/sites/flosshub.org/files/98Coning.pdf %0 Conference Paper %B 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010) %D 2010 %T A comparative exploration of FreeBSD bug lifetimes %A Bougie, Gargi %A Treude, Christoph %A German, Daniel M. %A Storey, Margaret-Anne %K bug reports %K bug tracking %K classification %K eclipse %K msr challenge %K prediction %X In this paper, we explore the viability of mining the basic data provided in bug repositories to predict bug lifetimes. We follow the method of Lucas D. Panjer as described in his paper, Predicting Eclipse Bug Lifetimes. However, in place of Eclipse data, the FreeBSD bug repository is used. We compare the predictive accuracy of five different classification algorithms applied to the two data sets. In addition, we propose future work on whether there is a more informative way of classifying bugs than is considered by current bug tracking systems. 
%B 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010) %I IEEE %C Cape Town, South Africa %P 106 - 109 %@ 978-1-4244-6802-7 %R 10.1109/MSR.2010.5463291 %> https://flosshub.org/sites/flosshub.org/files/106ChallengeGargi.pdf %0 Conference Paper %B 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010) %D 2010 %T Finding file clones in FreeBSD Ports Collection %A Sasaki, Yusuke %A Yamamoto, Tetsuo %A Hayase, Yasuhiro %A Inoue, Katsuro %K clone %K freebsd %K msr challenge %K source code %X In Open Source System (OSS) development, software components are often imported and reused; for this reason we might expect that files are copied in multiple projects (file clones). In this paper, we propose a file clone detection tool called FCFinder and show the analysis performed with it on the FreeBSD Ports Collection, a large OSS project collection. We found many file clones among similar or related projects, which are systematically introduced from base projects. %B 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010) %I IEEE %C Cape Town, South Africa %P 102 - 105 %@ 978-1-4244-6802-7 %R 10.1109/MSR.2010.5463293 %> https://flosshub.org/sites/flosshub.org/files/102FreeBSDClones.pdf %0 Conference Paper %B 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010) %D 2010 %T Mining security changes in FreeBSD %A Mauczka, Andreas %A Schanes, Christian %A Fankhauser, Florian %A Bernhart, Mario %A Grechenig, Thomas %K freebsd %K msr challenge %K security %X Current research on historical project data is rarely touching on the subject of security related information. 
Learning how security is treated in projects and which parts of a software are historically security relevant or prone to security changes can enhance the security strategy of a software project. We present a mining methodology for security related changes by modifying an existing method of software repository analysis. We use the gathered security changes to find out more about the nature of security in the FreeBSD project and we try to establish a link between the identified security changes and a tracker for security issues (security advisories). We give insights how security is presented in the FreeBSD project and show how the mined data and known security problems are connected. %B 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010) %I IEEE %C Cape Town, South Africa %P 90 - 93 %@ 978-1-4244-6802-7 %R 10.1109/MSR.2010.5463289 %0 Conference Paper %B 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010) %D 2010 %T Perspectives on bugs in the Debian bug tracking system %A Davies, Julius %A Zhang, Hanyu %A Nussbaum, Lucas %A German, Daniel M. %K bug reports %K debian %K msr challenge %K popularity %X Bugs in Debian differ from regular software bugs. They are usually associated with packages, instead of software modules. They are caused and fixed by source package uploads instead of code commits. The majority are reported by individuals who appear in the bug database once, and only once. There also exists a small group of bug reporters with over 1,000 bug reports each to their name. We also explore our idea that a high bug-frequency for an individual package might be an indicator of popularity instead of poor quality. 
%B 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010) %I IEEE %C Cape Town, South Africa %P 86 - 89 %@ 978-1-4244-6802-7 %R 10.1109/MSR.2010.5463288 %> https://flosshub.org/sites/flosshub.org/files/86bugs-debian.pdf %0 Conference Paper %B 2009 6th IEEE International Working Conference on Mining Software Repositories (MSR) %D 2009 %T Author entropy vs. file size in the GNOME suite of applications %A Casebolt, Jason R. %A Krein, Jonathan L. %A MacLean, Alexander C. %A Knutson, Charles D. %A Delorey, Daniel P. %K author entropy %K contributions %K gnome %K msr challenge %X We present the results of a study in which author entropy was used to characterize author contributions per file. Our analysis reveals three patterns: banding in the data, uneven distribution of data across bands, and file size dependent distributions within bands. Our results suggest that when two authors contribute to a file, large files are more likely to have a dominant author than smaller files. 
%B 2009 6th IEEE International Working Conference on Mining Software Repositories (MSR) %I IEEE %C Vancouver, BC, Canada %P 91 - 94 %@ 978-1-4244-3493-0 %R 10.1109/MSR.2009.5069484 %0 Conference Paper %B 2009 6th IEEE International Working Conference on Mining Software Repositories (MSR) %D 2009 %T Evaluating process quality in GNOME based on change request data %A Schackmann, Holger %A Lichter, Horst %K bugzilla %K bugzillametrics.org %K change analysis %K change history %K gnome %K msr challenge %K qmetric %X The lifecycle of defects reports and enhancement requests collected in the Bugzilla database of the GNOME project provides valuable information on the evolution of the change request process and for the assessment of process quality in the GNOME sub projects. We present a quality model for the analysis of quality characteristics that is based on evaluating metrics on the Bugzilla database, and illustrate it with a comparative evaluation for 25 of the largest products within GNOME. 
%B 2009 6th IEEE International Working Conference on Mining Software Repositories (MSR) %I IEEE %C Vancouver, BC, Canada %P 95 - 98 %@ 978-1-4244-3493-0 %R 10.1109/MSR.2009.5069485 %> https://flosshub.org/sites/flosshub.org/files/95ProcessQualityInGNOME.pdf %0 Conference Paper %B 2009 6th IEEE International Working Conference on Mining Software Repositories (MSR) %D 2009 %T Mining the coherence of GNOME bug reports with statistical topic models %A Linstead, Erik %A Baldi, Pierre %K bug reports %K bugzilla %K gnome %K msr challenge %K quality %K sourcerer %X We adapt latent Dirichlet allocation to the problem of mining bug reports in order to define a new information-theoretic measure of coherence. We then apply our technique to a snapshot of the GNOME Bugzilla database consisting of 431,863 bug reports for multiple software projects. In addition to providing an unsupervised means for modeling report content, our results indicate substantial promise in applying statistical text mining algorithms for estimating bug report quality. Complete results are available from our supplementary materials Web site at http://sourcerer.ics.uci.edu/msr2009/gnome_coherence.html. %B 2009 6th IEEE International Working Conference on Mining Software Repositories (MSR) %I IEEE %C Vancouver, BC, Canada %P 99 - 102 %@ 978-1-4244-3493-0 %R 10.1109/MSR.2009.5069486 %0 Conference Paper %B 2009 6th IEEE International Working Conference on Mining Software Repositories (MSR) %D 2009 %T On the use of Internet Relay Chat (IRC) meetings by developers of the GNOME GTK+ project %A Shihab, Emad %A Jiang, Zhen Ming %A Hassan, Ahmed E. 
%K gnome %K gtk %K irc %K msr challenge %X Developers of open source projects are distributed across the world. They rely on email, mailing lists, instant messaging, IRC channels and more recently IRC meetings to communicate. Most of the studies thus far focus on the use of mailing lists by OSS developers, however, an increasing number of open source projects are using IRC meetings to hold developer meetings. In this paper, we mine the #gtk-devel IRC meeting channel and study the usage of the IRC meetings held by the GNOME GTK+ core developers and maintainers. We look at three different dimensions: the discussion volume of the meetings, the number of participants attending the meetings and the activity of these participants. Our findings show that IRC meetings are gaining popularity among open source developers and maintainers: the IRC meeting discussions are increasing in volume, have increasing attendance levels, and the participants actively contribute to the meetings. To the best of our knowledge, this is the first study on the use of developer IRC meetings by OSS developers. %B 2009 6th IEEE International Working Conference on Mining Software Repositories (MSR) %I IEEE %C Vancouver, BC, Canada %P 107 - 110 %@ 978-1-4244-3493-0 %R 10.1109/MSR.2009.5069488 %> https://flosshub.org/sites/flosshub.org/files/107MSR2009-MSR-0130-Shihab-Emad.pdf %0 Conference Paper %B 2009 6th IEEE International Working Conference on Mining Software Repositories (MSR) %D 2009 %T Visualizing Gnome with the Small Project Observatory %A Lungu, Mircea %A Malnati, Jacopo %A Lanza, Michele %K bugzilla %K contributions %K gnome %K msr challenge %K spo %K visualization %X We analyzed the gnome family of systems with the small project observatory, our online ecosystem visualization platform. We begin by briefly introducing the model of SPO. 
We then observe and discuss several phases in the activity of the gnome ecosystem. We follow and look at how the contributors are distributed between writing source code and doing other activities such as internationalization. We end with a visual overview of the activity of more than 900 contributors in the 10 years of existence of gnome. %B 2009 6th IEEE International Working Conference on Mining Software Repositories (MSR) %I IEEE %C Vancouver, BC, Canada %P 103 - 106 %@ 978-1-4244-3493-0 %R 10.1109/MSR.2009.5069487 %> https://flosshub.org/sites/flosshub.org/files/103Lung2009a.pdf %0 Conference Paper %B Proceedings of the 2008 international working conference on Mining software repositories %D 2008 %T Analyzing the evolution of eclipse plugins %A Wermelinger, Michel %A Yu, Yijun %K architectural evolution %K cvs %K eclipse %K metadata %K msr challenge %K releases %K source code %X Eclipse is a good example of a modern component-based complex system that is designed for long-term evolution, due to its architecture of reusable and extensible components. This paper presents our preliminary results about the evolution of Eclipse's architecture, based on a lightweight and scalable analysis of the metadata in Eclipse's sources. We find that the development of Eclipse follows a systematic process: most architectural changes take place in milestones, and maintenance releases only make exceptional changes to component dependencies. We also found a stable architectural core that remains since the first release. 
%B Proceedings of the 2008 international working conference on Mining software repositories %S MSR '08 %I ACM %C New York, NY, USA %P 133–136 %@ 978-1-60558-024-1 %U http://doi.acm.org/10.1145/1370750.1370783 %R http://doi.acm.org/10.1145/1370750.1370783 %0 Conference Paper %B Proceedings of the 2008 international working conference on Mining software repositories %D 2008 %T An initial study of the growth of eclipse defects %A Zhang, Hongyu %K bug reports %K defect growth model %K defect prediction %K eclipse %K msr challenge %K polynomial regression %X We analyze the Eclipse defect data from June 2004 to November 2007, and find that the growth of the number of defects can be well modeled by polynomial functions. Furthermore, we can predict the number of future Eclipse defects based on the nature of defect growth. %B Proceedings of the 2008 international working conference on Mining software repositories %S MSR '08 %I ACM %C New York, NY, USA %P 141–144 %@ 978-1-60558-024-1 %U http://doi.acm.org/10.1145/1370750.1370785 %R http://doi.acm.org/10.1145/1370750.1370785 %0 Conference Paper %B Proceedings of the 2008 international working conference on Mining software repositories %D 2008 %T A newbie's guide to eclipse APIs %A Holmes, Reid %A Walker, Robert J. %K API popularity %K documentation %K eclipse %K mining software repositories %K module %K msr challenge %K PopCon %K popularity %X Eclipse has evolved from a fledgling Java IDE into a mature software ecosystem. One of the greatest benefits Eclipse provides developers is flexibility; however, this is not without cost. New Eclipse developers often find the framework to be large and confusing. Determining which parts of the framework they should be using can be a difficult task as Eclipse documentation tends to be either very high-level, focusing on the design of the framework, or low-level, focusing on specific APIs. 
We have developed a tool called PopCon that provides a bridge between high-level design documentation and low-level API documentation by statically analyzing a framework and several of its clients and providing a ranked list of the relative popularity of its APIs. We have applied PopCon to the Eclipse framework for this challenge to help newbie Eclipse developers identify some of the most relevant APIs for their tasks. %B Proceedings of the 2008 international working conference on Mining software repositories %S MSR '08 %I ACM %C New York, NY, USA %P 149–152 %@ 978-1-60558-024-1 %U http://doi.acm.org/10.1145/1370750.1370787 %R http://doi.acm.org/10.1145/1370750.1370787 %0 Conference Paper %B Proceedings of the 2008 international working conference on Mining software repositories %D 2008 %T Summarizing developer work history using time series segmentation: challenge report %A Siy, Harvey %A Chundi, Parvathi %A Subramaniam, Mahadevan %K contributions %K cvs %K developers %K eclipse %K msr challenge %K temporal segmentation %K time series %K work history %X Temporal segmentation partitions time series data with the intent of producing more homogeneous segments. It is a technique used to preprocess data so that subsequent time series analysis on individual segments can detect trends that may not be evident when performing time series analysis on the entire dataset. This technique allows data miners to partition a large dataset without making any assumption of periodicity or any other a priori knowledge of the dataset's features. We investigate the insights that can be gained from the application of time series segmentation to software version repositories. Software version repositories from large projects contain on the order of hundreds of thousands of timestamped entries or more. It is a continuing challenge to aggregate such data so that noise is reduced and important characteristics are brought out. 
In this paper, we present a way to summarize developer work history in terms of the files they have modified over time by segmenting the CVS change data of individual Eclipse developers. We show that the files they modify tend to change significantly over time though most of them tend to work within the same directories. %B Proceedings of the 2008 international working conference on Mining software repositories %S MSR '08 %I ACM %C New York, NY, USA %P 137–140 %8 05/2008 %@ 978-1-60558-024-1 %U http://doi.acm.org/10.1145/1370750.1370784 %R http://doi.acm.org/10.1145/1370750.1370784 %0 Conference Paper %B Proceedings of the 2008 international working conference on Mining software repositories %D 2008 %T Towards a simplification of the bug report form in eclipse %A Herraiz, Israel %A Daniel M. German %A Jesus M. Gonzalez-Barahona %A Gregorio Robles %K bug fixing %K bug report %K bug tracking system %K classification %K eclipse %K msr challenge %K severity %X We believe that the bug report form of Eclipse contains too many fields, and that for some fields, there are too many options. In this MSR challenge report, we focus on the case of the severity field. That field contains seven different levels of severity. Some of them seem very similar, and it is hard to distinguish among them. Users assign severity, and developers give priority to the reports depending on their severity. However, if users cannot distinguish well among the various severity options, they will probably assign different priorities to bugs that require the same priority. We study the mean time to close bugs reported in Eclipse, and how the severity assigned by users affects this time. The results show that, classifying by time to close, there are fewer clusters of bugs than levels of severity. We therefore conclude that there is a need to make a simpler bug report form. 
%B Proceedings of the 2008 international working conference on Mining software repositories %S MSR '08 %I ACM %C New York, NY, USA %P 145–148 %@ 978-1-60558-024-1 %U http://doi.acm.org/10.1145/1370750.1370786 %R http://doi.acm.org/10.1145/1370750.1370786 %0 Conference Paper %B Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007) %D 2007 %T Impact of the Creation of the Mozilla Foundation in the Activity of Developers %A Jesus M. Gonzalez-Barahona %A Gregorio Robles %A Herraiz, Israel %K cvs %K cvsanaly %K developers %K mining challenge %K mozilla %K msr challenge %K revision history %X During 2003, the Mozilla project transitioned from company-promoted (sponsored by AOL) to community-promoted (sponsored by the Mozilla Foundation). What happened to the group of developers during this transition? There was any significant impact on its activity or composition? To answer these questions, we have performed an analysis of the CVS repository of Mozilla, using the CVSAnalY tool, finding little on activity, but dramatic changes in the composition of the development team. %B Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007) %I IEEE %C Minneapolis, MN, USA %P 28 - 28 %@ 0-7695-2950-X %R 10.1109/MSR.2007.15 %> https://flosshub.org/sites/flosshub.org/files/28300028.pdf %0 Conference Paper %B Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007) %D 2007 %T Mining Eclipse Developer Contributions via Author-Topic Models %A Linstead, Erik %A Rigor, Paul %A Bajracharya, Sushil %A Lopes, Cristina %A Baldi, Pierre %K contributions %K developers %K eclipse %K expertise %K mining challenge %K msr challenge %K source code %K topics %X We present the results of applying statistical author-topic models to a subset of the Eclipse 3.0 source code consisting of 2,119 source files and 700,000 lines of code from 59 developers. 
This technique provides an intuitive and automated framework with which to mine developer contributions and competencies from a given code base while simultaneously extracting software function in the form of topics. In addition to serving as a convenient summary for program function and developer activities, our study shows that topic models provide a meaningful, effective, and statistical basis for developer similarity analysis. %B Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007) %I IEEE %C Minneapolis, MN, USA %P 30 - 30 %@ 0-7695-2950-X %R 10.1109/MSR.2007.20 %> https://flosshub.org/sites/flosshub.org/files/28300030.pdf %0 Conference Paper %B Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007) %D 2007 %T Predicting Defects and Changes with Import Relations %A Schroter, Adrian %K defects %K eclipse %K effort estimation %K mining challenge %K msr challenge %K prediction %X Lowering the number of defects and estimating the development time of a software project are two important goals of software engineering. To predict the number of defects and changes we train models with import relations. This enables us to decrease the number of defects by more efficient testing and to assess the effort needed in respect to the number of changes. %B Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007) %I IEEE %C Minneapolis, MN, USA %P 31 - 31 %@ 0-7695-2950-X %R 10.1109/MSR.2007.24 %> https://flosshub.org/sites/flosshub.org/files/28300031.pdf %0 Conference Paper %B Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007) %D 2007 %T Predicting Eclipse Bug Lifetimes %A Panjer, Lucas D. 
%K bug fixing %K bugzilla %K classification %K eclipse %K effort estimation %K mining challenge %K msr challenge %K prediction %K weka %X In non-trivial software development projects planning and allocation of resources is an important and difficult task. Estimation of work time to fix a bug is commonly used to support this process. This research explores the viability of using data mining tools to predict the time to fix a bug given only the basic information known at the beginning of a bug's lifetime. To address this question, a historical portion of the Eclipse Bugzilla database is used for modeling and predicting bug lifetimes. A bug history transformation process is described and several data mining models are built and tested. Interesting behaviours derived from the models are documented. The models can correctly predict up to 34.9% of the bugs into a discretized log scaled lifetime class. %B Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007) %I IEEE %C Minneapolis, MN, USA %P 29 - 29 %@ 0-7695-2950-X %R 10.1109/MSR.2007.25 %> https://flosshub.org/sites/flosshub.org/files/28300029.pdf %0 Conference Paper %B Proceedings of the 2006 international workshop on Mining software repositories %D 2006 %T Analyzing OSS developers' working time using mailing lists archives %A Tsunoda, Masateru %A Monden, Akito %A Kakimoto, Takeshi %A Kamei, Yasutaka %A Matsumoto, Ken-ichi %K developers %K email %K email archives %K mailing lists %K mining challenge %K msr challenge %K overtime work %K postgresql %K workload %X Our mining question is “when OSS developers work?” OSS developers' working time may be a good indicator to understand the development style of a project. (For example, if many developers work in office hour, these might be daily works in a company.) 
%B Proceedings of the 2006 international workshop on Mining software repositories %S MSR '06 %I ACM %C New York, NY, USA %P 181–182 %@ 1-59593-397-2 %U http://doi.acm.org/10.1145/1137983.1138031 %R http://doi.acm.org/10.1145/1137983.1138031 %> https://flosshub.org/sites/flosshub.org/files/181AnalyzingOSS.pdf %0 Conference Paper %B Proceedings of the 2006 international workshop on Mining software repositories %D 2006 %T Applying the evolution radar to PostgreSQL %A D'Ambros, Marco %A Lanza, Michele %K cvs %K documentation %K evolution %K evolution radar %K logical coupling %K makefile %K mining challenge %K msr challenge %K postgresql %K re-engineering %K refactoring %K release history %K rhdb %K source code %K version control %K visualization %B Proceedings of the 2006 international workshop on Mining software repositories %S MSR '06 %I ACM %C New York, NY, USA %P 177–178 %@ 1-59593-397-2 %U http://doi.acm.org/10.1145/1137983.1138029 %R http://doi.acm.org/10.1145/1137983.1138029 %> https://flosshub.org/sites/flosshub.org/files/177ApplyingEvolution.pdf %0 Conference Paper %B Proceedings of the 2006 international workshop on Mining software repositories %D 2006 %T Co-change visualization applied to PostgreSQL and ArgoUML: (MSR challenge report) %A Beyer, Dirk %K argouml %K ccvisu %K cvs %K force-directed graph layout %K graph %K mining challenge %K msr challenge %K postgresql %K software clustering %K software structure analysis %K software visualization %K version control %K visualization %X Co-change visualization is a method to recover the subsystem structure of a software system from the version history, based on common changes and visual clustering. This paper presents the results of applying the tool CCVisu, which implements co-change visualization, to the two open-source software systems PostgreSQL and ArgoUML. The input of the method is the co-change graph, which can be easily extracted by CCVisu from a CVS version repository. 
The output is a graph layout that places software artifacts that were often commonly changed at close positions, and artifacts that were rarely co-changed at distant positions. This property of the layout is due to the clustering property of the underlying energy model, which evaluates the quality of a produced layout. The layout can be displayed on the screen, or saved to a file in SVG or VRML format. %B Proceedings of the 2006 international workshop on Mining software repositories %S MSR '06 %I ACM %C New York, NY, USA %P 165–166 %@ 1-59593-397-2 %U http://doi.acm.org/10.1145/1137983.1138023 %R http://doi.acm.org/10.1145/1137983.1138023 %> https://flosshub.org/sites/flosshub.org/files/165Co-Change.pdf %0 Conference Paper %B Proceedings of the 2006 international workshop on Mining software repositories %D 2006 %T Examining the evolution of code comments in PostgreSQL %A Zhen Ming Jiang %A Hassan, Ahmed E. %K code comments %K comments %K cvs %K evolution %K functions %K maintenance %K mining challenge %K msr challenge %K postgresql %K software evolution %K software maintenance %K source code %X It is common, especially in large software systems, for developers to change code without updating its associated comments due to their unfamiliarity with the code or due to time constraints. This is a potential problem since outdated comments may confuse or mislead developers who perform future development. Using data recovered from CVS, we study the evolution of code comments in the PostgreSQL project. Our study reveals that over time the percentage of commented functions remains constant except for early fluctuation due to the commenting style of a particular active developer. 
%B Proceedings of the 2006 international workshop on Mining software repositories %S MSR '06 %I ACM %C New York, NY, USA %P 179–180 %@ 1-59593-397-2 %U http://doi.acm.org/10.1145/1137983.1138030 %R http://doi.acm.org/10.1145/1137983.1138030 %> https://flosshub.org/sites/flosshub.org/files/179ExaminingTheEvolution.pdf %0 Conference Paper %B Proceedings of the 2006 international workshop on Mining software repositories %D 2006 %T How long did it take to fix bugs? %A Kim, Sunghun %A Whitehead,Jr., E. James %K argouml %K bug fixing %K bugs %K mining challenge %K msr challenge %K postgresql %K quality %K time %X The number of bugs (or fixes) is a common factor used to measure the quality of software and assist bug related analysis. For example, if software files have many bugs, they may be unstable. In comparison, the bug-fix time--the time to fix a bug after the bug was introduced--is neglected. We believe that the bug-fix time is an important factor for bug related analysis, such as measuring software quality. For example, if bugs in a file take a relatively long time to be fixed, the file may have some structural problems that make it difficult to make changes. In this report, we compute the bug-fix time of files in ArgoUML and PostgreSQL by identifying when bugs are introduced and when the bugs are fixed. This report includes bug-fix time statistics such as average bug-fix time, and distributions of bug-fix time. We also list the top 20 bug-fix time files of two projects. 
%B Proceedings of the 2006 international workshop on Mining software repositories %S MSR '06 %I ACM %C New York, NY, USA %P 173–174 %@ 1-59593-397-2 %U http://doi.acm.org/10.1145/1137983.1138027 %R http://doi.acm.org/10.1145/1137983.1138027 %> https://flosshub.org/sites/flosshub.org/files/173HowLong.pdf %0 Conference Paper %B Proceedings of the 2006 international workshop on Mining software repositories %D 2006 %T Mining additions of method calls in ArgoUML %A Zimmermann, Thomas %A Breu, Silvia %A Lindig, Christian %A Livshits, Benjamin %K argouml %K change analysis %K eclipse %K function calls %K mining challenge %K msr challenge %K pattern %K source code %K xelopes %X In this paper we refine the classical co-change to the addition of method calls. We use this concept to find usage patterns and to identify cross-cutting concerns for ArgoUML. %B Proceedings of the 2006 international workshop on Mining software repositories %S MSR '06 %I ACM %C New York, NY, USA %P 169–170 %@ 1-59593-397-2 %U http://doi.acm.org/10.1145/1137983.1138025 %R http://doi.acm.org/10.1145/1137983.1138025 %> https://flosshub.org/sites/flosshub.org/files/169MiningAdditions.pdf %0 Conference Paper %B Proceedings of the 2006 international workshop on Mining software repositories %D 2006 %T Mining refactorings in ARGOUML %A Weißgerber, Peter %A Diehl, Stephan %A Görg, Carsten %K argouml %K bug tracking %K bugs %K cvs %K email %K evolution %K mining challenge %K msr challenge %K re-engineering %K refactoring %K release history %X In this paper we combine the results of our refactoring reconstruction technique with bug, mail and release information to perform process and bug analyses of the ARGOUML CVS archive. 
%B Proceedings of the 2006 international workshop on Mining software repositories %S MSR '06 %I ACM %C New York, NY, USA %P 175–176 %@ 1-59593-397-2 %U http://doi.acm.org/10.1145/1137983.1138028 %R http://doi.acm.org/10.1145/1137983.1138028 %> https://flosshub.org/sites/flosshub.org/files/175MiningRefactorings.pdf %0 Conference Paper %B Proceedings of the 2006 international workshop on Mining software repositories %D 2006 %T Mining software repositories with CVSgrab %A Voinea, Lucian %A Telea, Alexandru %K argouml %K cvs %K cvsgrab %K evolution %K mining challenge %K msr challenge %K postgresql %K software visualization %K source code %K team %K visualization %B Proceedings of the 2006 international workshop on Mining software repositories %S MSR '06 %I ACM %C New York, NY, USA %P 167–168 %@ 1-59593-397-2 %U http://doi.acm.org/10.1145/1137983.1138024 %R http://doi.acm.org/10.1145/1137983.1138024 %> https://flosshub.org/sites/flosshub.org/files/167MiningSoftware.pdf %0 Conference Paper %B Proceedings of the 2006 international workshop on Mining software repositories %D 2006 %T A study of the contributors of PostgreSQL %A Daniel M. German %K contributions %K contributors %K cvs %K developers %K mining challenge %K mining software repositories %K msr challenge %K patches %K postgresql %K revision history %K roles %K software evolution %K source code %K team %X This report describes some characteristics of the development team of PostgreSQL that were uncovered by analyzing the history of its software artifacts as recorded by the project's CVS repository. 
%B Proceedings of the 2006 international workshop on Mining software repositories %S MSR '06 %I ACM %C New York, NY, USA %P 163–164 %@ 1-59593-397-2 %U http://doi.acm.org/10.1145/1137983.1138022 %R http://doi.acm.org/10.1145/1137983.1138022 %> https://flosshub.org/sites/flosshub.org/files/163AStudyOf.pdf %0 Conference Paper %B Proceedings of the 2006 international workshop on Mining software repositories %D 2006 %T Using software birthmarks to identify similar classes and major functionalities %A Kakimoto, Takeshi %A Monden, Akito %A Kamei, Yasutaka %A Tamada, Haruaki %A Tsunoda, Masateru %A Matsumoto, Ken-ichi %K argouml %K class %K file %K mining challenge %K msr challenge %K multi-dimensional scaling %K similarity %K software birthmark %K source code %B Proceedings of the 2006 international workshop on Mining software repositories %S MSR '06 %I ACM %C New York, NY, USA %P 171–172 %@ 1-59593-397-2 %U http://doi.acm.org/10.1145/1137983.1138026 %R http://doi.acm.org/10.1145/1137983.1138026 %> https://flosshub.org/sites/flosshub.org/files/171UsingSoftware.pdf %0 Conference Paper %B Proceedings of the 2006 international workshop on Mining software repositories %D 2006 %T Where is bug resolution knowledge stored? %A Canfora, Gerardo %A Cerulo, Luigi %K argouml %K bugs %K bugzilla %K cvs %K impact analysis %K mining challenge %K mining software repositories %K msr challenge %K source code %X ArgoUML uses both CVS and Bugzilla to keep track of bug-fixing activities since 1998. A common practice is to reference source code changes resolving a bug stored in Bugzilla by inserting the id number of the bug in the CVS commit notes. This relationship reveals useful to predict code entities impacted by a new bug report. In this paper we analyze ArgoUML software repositories with a tool, we have implemented, showing what are Bugzilla fields that better predict such impact relationship, that is where knowledge about bug resolution is stored. 
%B Proceedings of the 2006 international workshop on Mining software repositories %S MSR '06 %I ACM %C New York, NY, USA %P 183–184 %@ 1-59593-397-2 %U http://doi.acm.org/10.1145/1137983.1138032 %R http://doi.acm.org/10.1145/1137983.1138032 %> https://flosshub.org/sites/flosshub.org/files/183WhereIsBug.pdf