%0 Conference Proceedings
%B Open Source Systems: Towards Robust Practices 13th International Conference on Open Source Systems
%D 2017
%T How are Developers Treating License Inconsistency Issues? A Case Study on License Inconsistency Evolution in FOSS Projects
%A Y. Wu
%A Manabe, Yuki
%A Daniel M. Germán
%A Inoue, K.
%K Code clone
%K debian
%K License inconsistency
%K licenses
%K Software license
%X A license inconsistency is the presence of two or more source files that evolved from the same original file containing different licenses. In our previous study, we have shown that license inconsistencies do exist in open source projects and may lead to potential license violation problems. In this study, we try to find out whether the issues of license inconsistencies are properly solved by analyzing two versions of a FOSS distribution, Debian, and investigate the evolution patterns of license inconsistencies. Findings are: license inconsistencies occur mostly because the original copyright owner updated the license while the reusers were still using the old version of the source files with the old license; most license inconsistencies would disappear when the reusers synchronize their project from the upstream, while some would exist permanently if reusers decide not to synchronize anymore. Legally suspicious cases have not been found yet in those Debian distributions.
%S IFIP Advances in Information and Communication Technology
%I Springer
%V 496
%P 69-79
%8 05/2017
%U https://link.springer.com/chapter/10.1007/978-3-319-57735-7_8
%R 10.1007/978-3-319-57735-7_8

%0 Journal Article
%J Empirical Software Engineering
%D 2016
%T The Debsources Dataset: two decades of free and open source software
%A Caneill, Matthieu
%A Daniel M. Germán
%A Zacchiroli, Stefano
%K debian
%K metadata
%K postgresql
%X We present the Debsources Dataset: distribution metadata and source code metrics spanning two decades of Free and Open Source Software (FOSS) history, seen through the lens of the Debian distribution. Debsources is a software platform used to gather, search, and publish on the Web the full source code of the Debian operating system, as well as measures about it. A notable public instance of Debsources is available at http://sources.debian.net; it includes both current and historical releases of Debian. Plugins to compute popular source code metrics (lines of code, defined symbols, disk usage) and other derived data (e.g., checksums) have been written, integrated, and run on all the source code available on sources.debian.net. The Debsources Dataset is a PostgreSQL database dump of sources.debian.net metadata, as of February 10th, 2015. The dataset contains both Debian-specific metadata -- e.g., which software packages are available in which release, which source code files belong to which package, release dates, etc. -- and source code information gathered by running Debsources plugins. The Debsources Dataset offers a very long-term historical view of the macro-level evolution and constitution of FOSS through the lens of popular, representative FOSS projects of their times.
%B Empirical Software Engineering
%I IEEE
%8 05/2015
%U https://matthieu.io/dl/papers/debsources-ese-2016.pdf
%! Empir Software Eng
%R 10.1007/s10664-016-9461-5

%0 Conference Paper
%B Proceedings of the 6th International Working Conference on Mining Software Repositories, MSR 2009
%D 2009
%T The promises and perils of mining git
%A Christian Bird
%A Peter C. Rigby
%A Earl T. Barr
%A David J. Hamilton
%A Daniel M. Germán
%A Premkumar T. Devanbu
%K dscm
%K git
%K mining
%K scm
%K source code
%X We are now witnessing the rapid growth of decentralized source code management (DSCM) systems, in which every developer has her own repository. DSCMs facilitate a style of collaboration in which work output can flow sideways (and privately) between collaborators, rather than always up and down (and publicly) via a central repository. Decentralization comes with both the promise of new data and the peril of its misinterpretation. We focus on git, a very popular DSCM used in high-profile projects. Decentralization, and other features of git, such as automatically recorded contributor attribution, lead to richer content histories, giving rise to new questions such as "How do contributions flow between developers to the official project repository?" However, there are pitfalls. Commits may be reordered, deleted, or edited as they move between repositories. The semantics of terms common to SCMs and DSCMs sometimes differ markedly, potentially creating confusion. For example, a commit is immediately visible to all developers in centralized SCMs, but not in DSCMs. Our goal is to help researchers interested in DSCMs avoid these and other perils when mining and analyzing git data.
%P 1-10
%> https://flosshub.org/sites/flosshub.org/files/1promisePeril.pdf