%0 Journal Article %J Empirical Software Engineering %D 2016 %T An empirical study of integration activities in distributions of open source software %A Adams, Bram %A Kavanagh, Ryan %A Hassan, Ahmed E. %A Daniel M. German %X Reuse of software components, either closed or open source, is considered to be one of the most important best practices in software engineering, since it reduces development cost and improves software quality. However, since reused components are (by definition) generic, they need to be customized and integrated into a specific system before they can be useful. Since this integration is system-specific, the integration effort is non-negligible and increases maintenance costs, especially if more than one component needs to be integrated. This paper performs an empirical study of multi-component integration in the context of three successful open source distributions (Debian, Ubuntu and FreeBSD). Such distributions integrate thousands of open source components with an operating system kernel to deliver a coherent software product to millions of users worldwide. We empirically identified seven major integration activities performed by the maintainers of these distributions, documented how these activities are being performed by the maintainers, then evaluated and refined the identified activities with input from six maintainers of the three studied distributions. The documented activities provide a common vocabulary for component integration in open source distributions and outline a roadmap for future research on software integration. %B Empirical Software Engineering %I Springer %V 21 %P 960–1001 %U http://mcis.soccerlab.polymtl.ca/publications/2016/integration_oss_distribution.pdf %> https://flosshub.org/sites/flosshub.org/files/integration_oss_distribution.pdf %0 Journal Article %J Journal of Information Processing %D 2016 %T Magnet or Sticky? 
Measuring Project Characteristics from the Perspective of Developer Attraction and Retention %A Yamashita, Kazuhiro %A Kamei, Yasutaka %A McIntosh, Shane %A Hassan, Ahmed E. %A Ubayashi, Naoyasu %K github %K retention %X Open Source Software (OSS) is vital to both end users and enterprises. As OSS systems are becoming a type of infrastructure, long-term OSS projects are desired. For the survival of OSS projects, the projects need to not only retain existing developers, but also attract new developers to grow. To better understand how projects retain and attract contributors, our preliminary study aimed to measure the personnel attraction and retention of OSS projects using a pair of population migration metrics, called Magnet (personnel attraction) and Sticky (retention) metrics. Because the preliminary study analyzed only 90 projects and those 90 projects are not representative of GitHub, this paper extends the preliminary study to better understand the generalizability of the results by analyzing 16,552 GitHub projects. Furthermore, we also add a pilot study to investigate the typical duration between releases to find a more appropriate release duration. The study results show that (1) approximately 23% of developers remain in the same projects that the developers contribute to, (2) the larger projects are likely to attract and retain more developers, (3) 53% of terminal projects eventually decay to a state of fewer than ten developers and (4) 55% of attractive projects remain in an attractive category. %B Journal of Information Processing %V 24 %P 339-348 %U https://www.jstage.jst.go.jp/article/ipsjjip/24/2/24_339/_article %R 10.2197/ipsjjip.24.339 %0 Conference Proceedings %B 12th Working Conference on Mining Software Repositories (MSR 2015) %D 2015 %T A Dataset of the Activity of the git Super-repository of Linux in 2012 %A Daniel M. German %A Adams, Bram %A Hassan, Ahmed E. 
%X This dataset documents the activity in the public portion of the git Super-repository of the Linux kernel during 2012. In a distributed version control system, such as git, the Super-repository is the collection of all the repositories (repos) used for development. In such a Super-repository, some repos will be accessible only by their owners (they are private, and are located in places that are unreachable to other users) while others are available to other members of the team. The latter public repositories are used as avenues through which commits flow from one developer to another. During the last six weeks of 2011, we proceeded to automatically discover the public portion of the Super-repository of Linux. Then, in 2012, every 3 hrs, each of these public repositories was queried to see what new commits it had and what commits had disappeared from it using a process we call continuous mining. This resulted in the identification of 533,513 different commits across 451 different public repositories and how they propagated through the Linux Super-repository, including the repository of Linus Torvalds (i.e., the main repository of the Linux kernel). This information could help us understand how kernel contributors use git, how they collaborate and how commits are integrated into the Linux kernel and into the repositories of organizations that distribute the kernel. This dataset is at http://turingmachine.org/2015/linuxGit %B 12th Working Conference on Mining Software Repositories (MSR 2015) %I IEEE %8 05/2015 %U http://turingmachine.org/2015/linuxGit/msr-data-git-linux.pdf %> https://flosshub.org/sites/flosshub.org/files/msr-data-git-linux.pdf %0 Conference Proceedings %B 12th Working Conference on Mining Software Repositories (MSR 2015) %D 2015 %T Investigating Code Review Practices in Defective Files: An Empirical Study of the Qt System %A Patanamon Thongtanunam %A McIntosh, Shane %A Hassan, Ahmed E. 
%A Hajimu Iida %K code review %K software quality %X Software code review is a well-established software quality practice. Recently, Modern Code Review (MCR) has been widely adopted in both open source and proprietary projects. To evaluate the impact that characteristics of MCR practices have on software quality, this paper comparatively studies MCR practices in defective and clean source code files. We investigate defective files along two perspectives: 1) files that will eventually have defects (i.e., future-defective files) and 2) files that have historically been defective (i.e., risky files). Through an empirical study of 11,736 reviews of changes to 24,486 files from the Qt open source project, we find that both future-defective files and risky files tend to be reviewed less rigorously than their clean counterparts. We also find that the concerns addressed during the code reviews of both defective and clean files tend to enhance evolvability, i.e., ease future maintenance (like documentation), rather than focus on functional issues (like incorrect program logic). Our findings suggest that although functionality concerns are rarely addressed during code review, the rigor of the reviewing process that is applied to a source code file throughout a development cycle shares a link with its defect proneness. %B 12th Working Conference on Mining Software Repositories (MSR 2015) %I IEEE %8 05/2015 %U http://sail.cs.queensu.ca/publications/pubs/msr2015-thongtanunam.pdf %> https://flosshub.org/sites/flosshub.org/files/msr2015-thongtanunam.pdf %0 Journal Article %J Science of Computer Programming %D 2014 %T Studying software evolution using topic models %A Stephen W. Thomas %A Adams, Bram %A Hassan, Ahmed E. 
%A Blostein, Dorothea %K Latent Dirichlet allocation %K mining software repositories %K software evolution %K topic model %X Topic models are generative probabilistic models which have been applied to information retrieval to automatically organize and provide structure to a text corpus. Topic models discover topics in the corpus, which represent real world concepts by frequently cooccurring words. Recently, researchers found topics to be effective tools for structuring various software artifacts, such as source code, requirements documents, and bug reports. This research also hypothesized that using topics to describe the evolution of software repositories could be useful for maintenance and understanding tasks. However, research has yet to determine whether these automatically discovered topic evolutions describe the evolution of source code in a way that is relevant or meaningful to project stakeholders, and thus it is not clear whether topic models are a suitable tool for this task. In this paper, we take a first step towards evaluating topic models in the analysis of software evolution by performing a detailed manual analysis on the source code histories of two well-known and well-documented systems, JHotDraw and jEdit. We define and compute various metrics on the discovered topic evolutions and manually investigate how and why the metrics evolve over time. We find that the large majority (87%–89%) of topic evolutions correspond well with actual code change activities by developers. We are thus encouraged to use topic models as tools for studying the evolution of a software system. %B Science of Computer Programming %I Elsevier %V 80 %P 457–479 %U http://sail.cs.queensu.ca/publications/pubs/Thomas-2012-SCP.pdf %0 Conference Paper %B Proceedings of the 29th IEEE International Conference on Software Maintainability %D 2013 %T How does Context affect the Distribution of Software Maintainability Metrics? 
%A Zhang, Feng %A Audris Mockus %A Ying Zou %A Foutse Khomh %A Hassan, Ahmed E. %K benchmark %K context %K contextual factor %K flossmole %K large scale %K metrics %K mining software repositories %K sampling %K software maintainability %K sourceforge %K static metrics %X Software metrics have many uses, e.g., defect prediction, effort estimation, and benchmarking an organization against peers and industry standards. In all these cases, metrics may depend on the context, such as the programming language. Here we aim to investigate if the distributions of commonly used metrics do, in fact, vary with six context factors: application domain, programming language, age, lifespan, the number of changes, and the number of downloads. For this preliminary study we select 320 nontrivial software systems from SourceForge. These software systems are randomly sampled from nine popular application domains of SourceForge. We calculate 39 metrics commonly used to assess software maintainability for each software system and use the Kruskal-Wallis test and the Mann-Whitney U test to determine if there are significant differences among the distributions with respect to each of the six context factors. We use Cliff’s delta to measure the magnitude of the differences and find that all six context factors affect the distribution of 20 metrics and the programming language factor affects 35 metrics. We also briefly discuss how each context factor may affect the distribution of metric values. We expect our results to help software benchmarking and other software engineering methods that rely on these commonly used metrics to be tailored to a particular context. %B Proceedings of the 29th IEEE International Conference on Software Maintainability %S ICSM '13 %> https://flosshub.org/sites/flosshub.org/files/icsm2013_contextstudy.pdf %0 Journal Article %J Empirical Software Engineering %D 2013 %T Management of community contributions %A Bettenburg, Nicolas %A Hassan, Ahmed E. %A Adams, Bram %A Daniel M. 
German %K android %K contribution %K linux %K management %X In recent years, many companies have realized that collaboration with a thriving user or developer community is a major factor in creating innovative technology driven by market demand. As a result, businesses have sought ways to stimulate contributions from developers outside their corporate walls, and integrate external developers into their development process. To support software companies in this process, this paper presents an empirical study on the contribution management processes of two major, successful, open source software ecosystems. We contrast a for-profit (ANDROID) system having a hybrid contribution style, with a not-for-profit (LINUX kernel) system having an open contribution style. To guide our comparisons, we base our analysis on a conceptual model of contribution management that we derived from a total of seven major open-source software systems. A quantitative comparison based on data mined from the ANDROID code review system and the LINUX kernel code review mailing lists shows that both projects have significantly different contribution management styles, suited to their respective market goals, but with individual advantages and disadvantages that are important for practitioners. Contribution management is a real-world problem that has received very little attention from the research community so far. Both studied systems (LINUX and ANDROID) employ different strategies and techniques for managing contributions, and both approaches are valuable examples for practitioners. Each approach has specific advantages and disadvantages that need to be carefully evaluated by practitioners when adopting a contribution management process in practice. 
%B Empirical Software Engineering %I Springer %P 1–38 %U http://link.springer.com/article/10.1007/s10664-013-9284-6 %0 Journal Article %J Empirical Software Engineering %D 2012 %T The evolution of Java build systems %A McIntosh, Shane %A Adams, Bram %A Hassan, Ahmed E. %K ant %K build %K maven %K scm %K source code analysis %X Build systems are responsible for transforming static source code artifacts into executable software. While build systems play such a crucial role in software development and maintenance, they have been largely ignored by software evolution researchers. However, a firm understanding of build system aging processes is needed in order to allow project managers to allocate personnel and resources to build system maintenance tasks effectively, and reduce the build maintenance overhead on regular development activities. In this paper, we study the evolution of build systems based on two popular Java build languages (i.e., ANT and Maven) from two perspectives: (1) a static perspective, where we examine the complexity of build system specifications using software metrics adopted from the source code domain; and (2) a dynamic perspective, where the complexity and coverage of representative build runs are measured. Case studies of the build systems of six open source build projects with a combined history of 172 releases show that build system and source code size are highly correlated, with source code restructurings often requiring build system restructurings. Furthermore, we find that Java build systems evolve dynamically in terms of duration and recursive depth of the directory hierarchy. %B Empirical Software Engineering %V 17 %P 578 - 608 %8 8/2012 %N 4-5 %! Empir Software Eng %R 10.1007/s10664-011-9169-5 %0 Journal Article %J Empirical Software Engineering %D 2012 %T Studying the impact of social interactions on software quality %A Bettenburg, Nicolas %A Hassan, Ahmed E. 
%K bug tracker %K eclipse %K Firefox %K Human Factors %K measurement %K metrics %K software evolution %K Software quality assurance %X Correcting software defects accounts for a significant amount of resources in a software project. To make best use of testing efforts, researchers have studied statistical models to predict in which parts of a software system future defects are likely to occur. By studying the mathematical relations between predictor variables used in these models, researchers can form an increased understanding of the important connections between development activities and software quality. Predictor variables used in past top-performing models are largely based on source code-oriented metrics, such as lines of code or number of changes. However, source code is the end product of numerous interlaced and collaborative activities carried out by developers. Traces of such activities can be found in the various repositories used to manage development efforts. In this paper, we develop statistical models to study the impact of social interactions in a software project on software quality. These models use predictor variables based on social information mined from the issue tracking and version control repositories of two large open-source software projects. The results of our case studies demonstrate the impact of metrics from four different dimensions of social interaction on post-release defects. Our findings show that statistical models based on social information have a similar degree of explanatory power as traditional models. Furthermore, our results demonstrate that social information does not substitute, but rather augments traditional source code-based metrics used in defect prediction models. %B Empirical Software Engineering %! 
Empir Software Eng %R 10.1007/s10664-012-9205-0 %0 Conference Paper %B CSMR '12: Proceedings of the 16th European Conference on Software Maintenance and Reengineering %D 2012 %T Using Code Search to Link Code Fragments in Discussions and Source Code %A Bettenburg, Nicolas %A Stephen W. Thomas %A Hassan, Ahmed E. %X When discussing software, practitioners often reference parts of the project’s source code. Such references have different motivations, such as mentoring and guiding less experienced developers, pointing out code that needs changes, or proposing possible strategies for the implementation of future changes. The fact that particular parts of a source code are being discussed makes these parts of the software special. Knowing which code is being talked about the most can not only help practitioners to guide important software engineering and maintenance activities, but also act as a high-level documentation of development activities for managers. In this paper, we use clone detection as a specific instance of a code search based approach for establishing links between code fragments that are discussed by developers and the actual source code of a project. Through a case study on the Eclipse project we explore the traceability links established through this approach, both quantitatively and qualitatively, and compare fuzzy code search based traceability linking to classical approaches, in particular change log analysis and information retrieval. We demonstrate a sample application of code search based traceability links by visualizing those parts of the project that are most discussed in issue reports with a Treemap visualization. The results of our case study show that the traceability links established through fuzzy code search based traceability linking are conceptually different from those of classical approaches based on change log analysis or information retrieval. 
%B CSMR '12: Proceedings of the 16th European Conference on Software Maintenance and Reengineering %I IEEE %P 319-329 %> https://flosshub.org/sites/flosshub.org/files/Bettenburg_2012_CSMR.pdf %0 Journal Article %J Journal of Systems and Software %D 2012 %T Using Pig as a data preparation language for large-scale mining software repositories studies: An experience report %A Weiyi Shang %A Adams, Bram %A Hassan, Ahmed E. %K flossmole cited %X The Mining Software Repositories (MSR) field analyzes software repository data to uncover knowledge and assist development of ever growing, complex systems. However, existing approaches and platforms for MSR analysis face many challenges when performing large-scale MSR studies. Such approaches and platforms rarely scale easily out of the box. Instead, they often require custom scaling tricks and designs that are costly to maintain and that are not reusable for other types of analysis. We believe that the web community has faced many of these software engineering scaling challenges before, as web analyses have to cope with the enormous growth of web data. In this paper, we report on our experience in using a web-scale platform (i.e., Pig) as a data preparation language to aid large-scale MSR studies. Through three case studies, we carefully validate the use of this web platform to prepare (i.e., Extract, Transform, and Load, ETL) data for further analysis. Despite several limitations, we still encourage MSR researchers to leverage Pig in their large-scale studies because of Pig's scalability and flexibility. Our experience report will help other researchers who want to scale their analyses. %B Journal of Systems and Software %V 85 %P 2195 - 2204 %8 10/2012 %U http://www.sciencedirect.com/science/article/pii/S0164121211002007 %N 10 %! 
Journal of Systems and Software %R 10.1016/j.jss.2011.07.034 %0 Conference Paper %B 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010) %D 2010 %T The evolution of ANT build systems %A McIntosh, Shane %A Adams, Bram %A Hassan, Ahmed E. %K ant %K argouml %K build %K eclipse %K jboss %K maintenance %K metrics %K source code %K tomcat %X Build systems are responsible for transforming static source code artifacts into executable software. While build systems play such a crucial role in software development and maintenance, they have been largely ignored by software evolution researchers. With a firm understanding of build system aging processes, project managers could allocate personnel and resources to build system maintenance tasks more effectively, reducing the build maintenance overhead on regular development activities. In this paper, we study the evolution of ANT build systems from two perspectives: (1) a static perspective, where we examine the build system specifications using software metrics adopted from the source code domain; and (2) a dynamic perspective where representative sample build runs are conducted and their output logs are analyzed. Case studies of four open source ANT build systems with a combined history of 152 releases show that not only do ANT build systems evolve, but also that they need to react in an agile manner to changes in the source code. %B 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010) %I IEEE %C Cape Town, South Africa %P 42 - 51 %@ 978-1-4244-6802-7 %R 10.1109/MSR.2010.5463341 %> https://flosshub.org/sites/flosshub.org/files/42msr2010_mcintosh.pdf %0 Conference Paper %B 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010) %D 2010 %T Should I contribute to this discussion? 
%A Ibrahim, Walid M %A Bettenburg, Nicolas %A Shihab, Emad %A Adams, Bram %A Hassan, Ahmed E. %K apache %K contributions %K developers %K email %K email archives %K mailing lists %K postgresql %K python %X Development mailing lists play a central role in facilitating communication in open source projects. Since these lists frequently host design and project discussions, knowledgeable contribution to these discussion threads is essential to avoid miscommunication that might slow down the progress of a project. However, given the sheer volume of emails on these lists, it is easy to miss important discussions. To find out how developers are able to deal with mailing list discussions, we study the main factors that encourage developers to contribute to the development mailing lists. We develop personalized models to automatically identify discussion threads that a developer would contribute to based on his previous contribution behavior. Case studies on development mailing lists of three open source projects (Apache, PostgreSQL and Python) show that the average accuracy of our models is 89-85% and that the models vary significantly between different developers. %B 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010) %I IEEE %C Cape Town %P 181 - 190 %@ 978-1-4244-6802-7 %R 10.1109/MSR.2010.5463345 %> https://flosshub.org/sites/flosshub.org/files/181ibrahim-msr2010.pdf %0 Conference Paper %B 2009 6th IEEE International Working Conference on Mining Software Repositories (MSR) %D 2009 %T MapReduce as a general framework to support research in Mining Software Repositories (MSR) %A Weiyi Shang %A Zhen Ming Jiang %A Adams, Bram %A Hassan, Ahmed E. %K hadoop %K mapreduce %X Researchers continue to demonstrate the benefits of Mining Software Repositories (MSR) for supporting software development and research activities. 
However, as the mining process is time and resource intensive, they often create their own distributed platforms and use various optimizations to speed up and scale up their analysis. These platforms are project-specific, hard to reuse, and offer minimal debugging and deployment support. In this paper, we propose the use of MapReduce, a distributed computing platform, to support research in MSR. As a proof-of-concept, we migrate J-REX, an optimized evolutionary code extractor, to run on Hadoop, an open source implementation of MapReduce. Through a case study on the source control repositories of the Eclipse, BIRT and Datatools projects, we demonstrate that the migration effort to MapReduce is minimal and that the benefits are significant, as running time of the migrated J-REX is only 30% to 50% of the original J-REX's. This paper documents our experience with the migration, and highlights the benefits and challenges of the MapReduce framework in the MSR community. %B 2009 6th IEEE International Working Conference on Mining Software Repositories (MSR) %I IEEE %C Vancouver, BC, Canada %P 21 - 30 %@ 978-1-4244-3493-0 %R 10.1109/MSR.2009.5069477 %> https://flosshub.org/sites/flosshub.org/files/21MSR2009-MSR-0114-Shang-Weiyi.pdf %0 Conference Paper %B 2009 IEEE International Conference on Software Maintenance (ICSM) %D 2009 %T Studying the use of developer IRC meetings in open source projects %A Shihab, Emad %A Zhen Ming Jiang %A Hassan, Ahmed E. %K evolution %K gtk %K irc %X Open source developers communicate with each other via various online outlets. Thus far, mailing lists have been the main coordination mechanism. However, our previous study shows that the use of developer IRC meetings is increasing in recent years. In this paper, we perform a study on the IRC meetings of two large open source projects: the GTK+ and Evolution projects. 
We explore three dimensions: who participates in the meetings, what they discuss and how they run the meetings. We find (1) that a small and stable number of the participants contribute the majority of messages in meetings, (2) that there are commonly discussed topics as well as project specific topics, and (3) that meeting styles vary across different projects. %B 2009 IEEE International Conference on Software Maintenance (ICSM) %I IEEE %C Edmonton, AB, Canada %P 147 - 156 %@ 978-1-4244-4897-5 %U http://sail.cs.queensu.ca/publications/pubs/icsm2009_shihab.pdf %R 10.1109/ICSM.2009.5306333 %0 Conference Paper %B 2009 6th IEEE International Working Conference on Mining Software Repositories (MSR) %D 2009 %T On the use of Internet Relay Chat (IRC) meetings by developers of the GNOME GTK+ project %A Shihab, Emad %A Zhen Ming Jiang %A Hassan, Ahmed E. %K gnome %K gtk %K irc %K msr challenge %X Developers of open source projects are distributed across the world. They rely on email, mailing lists, instant messaging, IRC channels and more recently IRC meetings to communicate. Most of the studies thus far focus on the use of mailing lists by OSS developers; however, an increasing number of open source projects are using IRC meetings to hold developer meetings. In this paper, we mine the #gtk-devel IRC meeting channel and study the usage of the IRC meetings held by the GNOME GTK+ core developers and maintainers. We look at three different dimensions: the discussion volume of the meetings, the number of participants attending the meetings and the activity of these participants. Our findings show that IRC meetings are gaining popularity among open source developers and maintainers: the IRC meeting discussions are increasing in volume, have increasing attendance levels, and the participants actively contribute to the meetings. 
To the best of our knowledge, this is the first study on the use of developer IRC meetings by OSS developers. %B 2009 6th IEEE International Working Conference on Mining Software Repositories (MSR) %I IEEE %C Vancouver, BC, Canada %P 107 - 110 %@ 978-1-4244-3493-0 %R 10.1109/MSR.2009.5069488 %> https://flosshub.org/sites/flosshub.org/files/107MSR2009-MSR-0130-Shihab-Emad.pdf %0 Conference Paper %B Proceedings of the 2008 international workshop on Mining software repositories - MSR '08 %D 2008 %T Branching and merging in the repository %A Spacco, Jamie %A Williams, Chadd C. %Y Hassan, Ahmed E. %Y Lanza, Michele %Y Godfrey, Michael W. %K argouml %K changes %K cvs2svn %K diffj %K revision %K scm %K source code %K version control %X Two of the most complex operations version control software allows a user to perform are branching and merging. Branching provides the user the ability to create a copy of the source code to allow changes to be stored in version control but outside of the trunk. Merging provides the user the ability to copy changes from a branch to the trunk. Performing a merge can be a tedious operation and one that may be error prone. In this paper, we compare file revisions found on branches with those found on the trunk to determine when a change that is applied to a branch is moved to the trunk. This will allow us to study how developers use merges and to determine if merges are in fact more error prone than other commits. %B Proceedings of the 2008 international workshop on Mining software repositories - MSR '08 %I ACM Press %C New York, New York, USA %P 19-22 %8 05/2008 %@ 9781605580241 %! 
MSR '08 %R 10.1145/1370750.1370754 %> https://flosshub.org/sites/flosshub.org/files/p19-williams.pdf %0 Conference Paper %B Proceedings of the 2008 international workshop on Mining software repositories - MSR '08 %D 2008 %T Determinism and evolution %A González-Barahona, Jesús M. %A Gregorio Robles %A Herraiz, Israel %Y Hassan, Ahmed E. %Y Lanza, Michele %Y Godfrey, Michael W. %K changes %K evolution %K source code %K sourceforge %X It has been proposed that software evolution follows a Self-Organized Criticality (SOC) dynamics. This fact is supported by the presence of long range correlations in the time series of the number of changes made to the source code over time. Those long range correlations imply that the current state of the project was determined some time ago. In other words, the evolution of the software project is governed by a sort of determinism. But this idea seems to contradict intuition. To explore this apparent contradiction, we have performed an empirical study on a sample of 3,821 libre (free, open source) software projects, finding that their evolution is short range correlated. This suggests that the dynamics of software evolution may not be SOC, and therefore that the past of a project does not determine its future except for relatively short periods of time, at least for libre software. %B Proceedings of the 2008 international workshop on Mining software repositories - MSR '08 %I ACM Press %C New York, New York, USA %P 1-9 %8 05/2008 %@ 9781605580241 %! MSR '08 %R 10.1145/1370750.1370752 %> https://flosshub.org/sites/flosshub.org/files/p1-herraiz.pdf %0 Conference Paper %B Proceedings of the 2008 international workshop on Mining software repositories - MSR '08 %D 2008 %T Extracting structural information from bug reports %A Premraj, Rahul %A Zimmermann, Thomas %A Kim, Sunghun %A Bettenburg, Nicolas %Y Hassan, Ahmed E. %Y Lanza, Michele %Y Godfrey, Michael W. 
%K bug reports %K eclipse %K enumerations %K infozilla %K natural language %K patches %K source code %K stack trace %X In software engineering experiments, the description of bug reports is typically treated as natural language text, although it often contains stack traces, source code, and patches. Neglecting such structural elements is a loss of valuable information; structure usually leads to a better performance of machine learning approaches. In this paper, we present a tool called infoZilla that detects structural elements from bug reports with near perfect accuracy and allows us to extract them. We anticipate that infoZilla can be used to leverage data from bug reports at a different granularity level that can facilitate interesting research in the future. %B Proceedings of the 2008 international workshop on Mining software repositories - MSR '08 %I ACM Press %C New York, New York, USA %P 27-30 %8 05/2008 %@ 9781605580241 %! MSR '08 %R 10.1145/1370750.1370757 %> https://flosshub.org/sites/flosshub.org/files/p27-bettenburg.pdf %0 Conference Paper %B Proceedings of the 2008 international workshop on Mining software repositories - MSR '08 %D 2008 %T On the relation of refactorings and software defect prediction %A Sigmund, Thomas %A Gall, Harald C. %A Ratzinger, Jacek %Y Hassan, Ahmed E. %Y Lanza, Michele %Y Godfrey, Michael W. %K argouml %K bug fixing %K bug reports %K defects %K evolution %K jboss %K liferay %K prediction %K refactoring %K spring %K weka %K xdoclet %X This paper analyzes the influence of evolution activities such as refactoring on software defects. In a case study of five open source projects we used attributes of software evolution to predict defects in time periods of six months. We use versioning and issue tracking systems to extract 110 data mining features, which are separated into refactoring and non-refactoring related features. 
These features are used as input into classification algorithms that create prediction models for software defects. We found that refactoring related features as well as non-refactoring related features lead to high quality prediction models. Additionally, we discovered that refactorings and defects have an inverse correlation: the number of software defects decreases if the number of refactorings increased in the preceding time period. As a result, refactoring should be a significant part of both bug fixes and other evolutionary changes to reduce software defects. %B Proceedings of the 2008 international workshop on Mining software repositories - MSR '08 %I ACM Press %C New York, New York, USA %P 35-38 %8 05/2008 %@ 9781605580241 %! MSR '08 %R 10.1145/1370750.1370759 %> https://flosshub.org/sites/flosshub.org/files/p35-ratzinger.pdf %0 Conference Paper %B Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007) %D 2007 %T What Can OSS Mailing Lists Tell Us? A Preliminary Psychometric Text Analysis of the Apache Developer Mailing List %A Peter C. Rigby %A Hassan, Ahmed E. %K apache %K developers %K email %K joining %K liwc %K mailing lists %K personality %X Developer mailing lists are a rich source of information about Open Source Software (OSS) development. The unstructured nature of email makes extracting information difficult. We use a psychometrically-based linguistic analysis tool, the LIWC, to examine the Apache httpd server developer mailing list. We conduct three preliminary experiments to assess the appropriateness of this tool for information extraction from mailing lists. First, using LIWC dimensions that are correlated with the big five personality traits, we assess the personality of four top developers against a baseline for the entire mailing list. The two developers who were responsible for the major Apache releases had similar personalities. 
Their personalities were different from the baseline and the other developers. Second, the first and last 50 emails for two top developers who have left the project are examined. The analysis shows promise in understanding why developers join and leave a project. Third, we examine word usage on the mailing list for two major Apache releases. The differences may reflect the relative success of each release. %B Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007) %I IEEE %C Minneapolis, MN, USA %P 23 - 23 %@ 0-7695-2950-X %R 10.1109/MSR.2007.35 %> https://flosshub.org/sites/flosshub.org/files/28300023.pdf %0 Conference Paper %B Proceedings of the 2006 international workshop on Mining software repositories %D 2006 %T Examining the evolution of code comments in PostgreSQL %A Zhen Ming Jiang %A Hassan, Ahmed E. %K code comments %K comments %K cvs %K evolution %K functions %K maintenance %K mining challenge %K msr challenge %K postgresql %K software evolution %K software maintenance %K source code %X It is common, especially in large software systems, for developers to change code without updating its associated comments due to their unfamiliarity with the code or due to time constraints. This is a potential problem since outdated comments may confuse or mislead developers who perform future development. Using data recovered from CVS, we study the evolution of code comments in the PostgreSQL project. Our study reveals that over time the percentage of commented functions remains constant except for early fluctuation due to the commenting style of a particular active developer. 
%B Proceedings of the 2006 international workshop on Mining software repositories %S MSR '06 %I ACM %C New York, NY, USA %P 179–180 %@ 1-59593-397-2 %U http://doi.acm.org/10.1145/1137983.1138030 %R 10.1145/1137983.1138030 %> https://flosshub.org/sites/flosshub.org/files/179ExaminingTheEvolution.pdf %0 Conference Paper %B 1st Workshop on Open Source Software Engineering at ICSE 2001 %D 2001 %T Software Engineering Research in the Bazaar %A Hassan, Ahmed E. %A Godfrey, Michael W. %A Holt, Richard C. %K apache %K architecture %K gcc %K kernel %K linux %K linux kernel %K mozilla %K open source software %K software architecture %K Software Engineering Research %K source code %K vim %X During the last five years, our research group has studied the architecture and evolution of several large open source systems (including Linux, GCC, VIM, Mozilla, and Apache) and we have found that open source software systems often exhibit interesting differences when compared to similar commercially-developed systems. Our investigations of these systems have involved the creation of software architecture models, software architecture repair, the creation of a reference architecture for web servers, the study of evolution and growth of open source systems, and the modelling of architectural properties of systems that are apparent only at build time. %B 1st Workshop on Open Source Software Engineering at ICSE 2001 %> https://flosshub.org/sites/flosshub.org/files/hassangodfreyholt.pdf