%0 Journal Article
%J Empirical Software Engineering
%D 2012
%T Software Bertillonage
%A Davies, Julius
%A German, Daniel
%A Godfrey, Michael
%A Hindle, Abram
%X Deployed software systems are typically composed of many pieces, not all of which may have been created by the main development team. Often, the provenance of included components—such as external libraries or cloned source code—is not clearly stated, and this uncertainty can introduce technical and ethical concerns that make it difficult for system owners and other stakeholders to manage their software assets. In this work, we motivate the need for the recovery of the provenance of software entities by a broad set of techniques that could include signature matching, source code fact extraction, software clone detection, call flow graph matching, string matching, historical analyses, and other techniques. We liken our provenance goals to that of Bertillonage, a simple and approximate forensic analysis technique based on bio-metrics that was developed in 19th century France before the advent of fingerprints. As an example, we have developed a fast, simple, and approximate technique called anchored signature matching for identifying the source origin of binary libraries within a given Java application. This technique involves a type of structured signature matching performed against a database of candidates drawn from the Maven2 repository, a 275 GB collection of open source Java libraries. To show the approach is both valid and effective, we conducted an empirical study on 945 jars from the Debian GNU/Linux distribution, as well as an industrial case study on 81 jars from an e-commerce application.
%B Empirical Software Engineering
%I Springer Netherlands
%P 1-43
%U http://dx.doi.org/10.1007/s10664-012-9199-7
%R 10.1007/s10664-012-9199-7

%0 Conference Paper
%B Proceedings of the 8th working conference on Mining software repositories - MSR '11
%D 2011
%T Apples vs. oranges?
%A Davies, Julius
%A German, Daniel M.
%Y van Deursen, Arie
%Y Xie, Tao
%Y Zimmermann, Thomas
%K eclipse
%K netbeans
%K source code
%X We attempt to compare the source code of two Java IDE systems: Netbeans and Eclipse. The result of this experiment shows that many factors, if ignored, could risk a bias in the results, and we posit various observations that should be taken into consideration to minimize such risk.
%I ACM Press
%C New York, New York, USA
%P 246-249
%8 05/2011
%@ 9781450305747
%! MSR '11
%R 10.1145/1985441.1985483

%0 Conference Paper
%B 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010)
%D 2010
%T Perspectives on bugs in the Debian bug tracking system
%A Davies, Julius
%A Zhang, Hanyu
%A Nussbaum, Lucas
%A German, Daniel M.
%K bug reports
%K debian
%K msr challenge
%K popularity
%X Bugs in Debian differ from regular software bugs. They are usually associated with packages, instead of software modules. They are caused and fixed by source package uploads instead of code commits. The majority are reported by individuals who appear in the bug database once, and only once. There also exists a small group of bug reporters with over 1,000 bug reports each to their name. We also explore our idea that a high bug-frequency for an individual package might be an indicator of popularity instead of poor quality.
%I IEEE
%C Cape Town, South Africa
%P 86-89
%@ 978-1-4244-6802-7
%R 10.1109/MSR.2010.5463288
%> https://flosshub.org/sites/flosshub.org/files/86bugs-debian.pdf