%0 Conference Paper
%B Proceedings of the 6th International Working Conference on Mining Software Repositories, MSR 2009
%D 2009
%T The promises and perils of mining git
%A Christian Bird
%A Peter C. Rigby
%A Earl T. Barr
%A David J. Hamilton
%A Daniel M. Germán
%A Premkumar T. Devanbu
%K dscm
%K git
%K mining
%K scm
%K source code
%X We are now witnessing the rapid growth of decentralized source code management (DSCM) systems, in which every developer has her own repository. DSCMs facilitate a style of collaboration in which work output can flow sideways (and privately) between collaborators, rather than always up and down (and publicly) via a central repository. Decentralization comes with both the promise of new data and the peril of its misinterpretation. We focus on git, a very popular DSCM used in high-profile projects. Decentralization, and other features of git, such as automatically recorded contributor attribution, lead to richer content histories, giving rise to new questions such as "How do contributions flow between developers to the official project repository?" However, there are pitfalls. Commits may be reordered, deleted, or edited as they move between repositories. The semantics of terms common to SCMs and DSCMs sometimes differ markedly, potentially creating confusion. For example, a commit is immediately visible to all developers in centralized SCMs, but not in DSCMs. Our goal is to help researchers interested in DSCMs avoid these and other perils when mining and analyzing git data.
%P 1–10
%> https://flosshub.org/sites/flosshub.org/files/1promisePeril.pdf

%0 Conference Paper
%B Proceedings of the 2008 international working conference on Mining software repositories
%D 2008
%T Expertise identification and visualization from CVS
%A Alonso, Omar
%A Premkumar T. Devanbu
%A Gertz, Michael
%K apache
%K classification
%K committers
%K components
%K contributors
%K expertise
%K expertise identification
%K repository
%K scm
%K source code
%X As software evolves over time, the identification of expertise becomes an important problem. Component ownership and team awareness of such ownership are signals of a solid project. Ownership and ownership awareness are also issues in open-source software (OSS) projects. Indeed, the membership in OSS projects is dynamic, with team members arriving and leaving. In large open source projects, specialists who know the system very well are considered experts. How can one identify the experts in a project by mining a particular repository like the source code? Have they gotten help from other people? We provide an approach using classification of the source code tree as a path to derive the expertise of the committers. Because committers may get help from other people, we also retrieve their contributors. We also provide a visualization that helps to further explore the repository via committers and categories. We present a prototype implementation that describes our research using the Apache HTTP Web server project as a case study.
%S MSR '08
%I ACM
%C New York, NY, USA
%P 125–128
%8 05/2008
%@ 978-1-60558-024-1
%U http://doi.acm.org/10.1145/1370750.1370780
%R http://doi.acm.org/10.1145/1370750.1370780
%> https://flosshub.org/sites/flosshub.org/files/p125-alonso.pdf

%0 Conference Paper
%B Proceedings of the 2008 international working conference on Mining software repositories
%D 2008
%T Talk and work: a preliminary report
%A Pattison, David S.
%A Bird, Christian A.
%A Premkumar T. Devanbu
%K ant
%K apache
%K email
%K mailing lists
%K postgresql
%K python
%K scm
%K source code
%X Developers in Open Source Software (OSS) projects communicate using mailing lists. By convention, the mailing lists are used only for task-related discussions, so they are primarily concerned with the software under development and with software process issues (releases, etc.). We focus on the discussions concerning the software, and study the frequency with which software entities (functions, methods, classes, etc.) are mentioned in the mail. We find a strong, striking, cumulative relationship between this mention count in the email and the number of times these entities are included in changes to the software. When we study the same phenomena over a series of time intervals, the relationship is much less strong. This suggests some interesting avenues for future research.
%S MSR '08
%I ACM
%C New York, NY, USA
%P 113–116
%8 05/2008
%@ 978-1-60558-024-1
%U http://doi.acm.org/10.1145/1370750.1370776
%R http://doi.acm.org/10.1145/1370750.1370776
%> https://flosshub.org/sites/flosshub.org/files/p113-pattison.pdf