Evaluation of source code copy detection methods on FreeBSD

Submitted by msquire on Wed, 2011-04-13 14:09

Title	Evaluation of source code copy detection methods on FreeBSD
Publication Type	Conference Paper
Year of Publication	2008
Authors	Chang, H-F, Mockus, A
Secondary Title	Proceedings of the 2008 International Working Conference on Mining Software Repositories
Pagination	61–66
Date Published	05/2008
Publisher	ACM
Place Published	New York, NY, USA
ISBN Number	978-1-60558-024-1
Keywords	clone, cloning, code copying, freebsd, version control
Abstract	Studies have shown that substantial code reuse is common in open source and in commercial projects. However, the precise extent of reuse and its impact on productivity and quality have not been well investigated in the open source context. Previously, we introduced a simple-to-use method that needs only a set of file pathnames to identify directories that share filenames, and we partially validated its performance on a set of closed-source projects. To evaluate this method and to improve reuse detection at the file level, we apply it, along with four additional file copy detection methods that use the content of multiple versions of the source code, to the FreeBSD project. The evaluation quantified the unique advantages of each method and showed that the filename method detected roughly half of all reuse cases. We still face the challenge of scaling the content-based methods to large repositories containing all versions of open source files.
URL	http://doi.acm.org/10.1145/1370750.1370766
DOI	10.1145/1370750.1370766
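The filename-based method summarized in the abstract needs only file pathnames. The Python sketch below illustrates the general idea under stated assumptions. It groups basenames by parent directory, uses an inverted index so that only directories with at least one common filename are compared, and reports directory pairs that share enough names. The function names, the thresholds min_shared and min_ratio, and the sample paths are hypothetical. This is not the authors' implementation, and the paper's actual matching criteria may differ.

```python
from collections import defaultdict
from itertools import combinations
import posixpath


def files_by_directory(pathnames):
    """Group file basenames by their parent directory."""
    dirs = defaultdict(set)
    for path in pathnames:
        directory, name = posixpath.split(path)
        dirs[directory].add(name)
    return dirs


def shared_filename_directories(pathnames, min_shared=2, min_ratio=0.5):
    """Return (dir_a, dir_b, shared_names) for directory pairs whose
    filename sets overlap enough to suggest copying.

    min_shared and min_ratio are hypothetical thresholds chosen for
    illustration; they are not values taken from the paper.
    """
    dirs = files_by_directory(pathnames)

    # Inverted index: filename -> set of directories containing it, so
    # only directories with at least one common name become candidates.
    index = defaultdict(set)
    for directory, names in dirs.items():
        for name in names:
            index[name].add(directory)

    candidates = set()
    for directories in index.values():
        for a, b in combinations(sorted(directories), 2):
            candidates.add((a, b))

    results = []
    for a, b in sorted(candidates):
        shared = dirs[a] & dirs[b]
        # Measure overlap relative to the smaller directory, so a small
        # directory copied into a larger one still counts as a match.
        smaller = min(len(dirs[a]), len(dirs[b]))
        if len(shared) >= min_shared and len(shared) / smaller >= min_ratio:
            results.append((a, b, sorted(shared)))
    return results


if __name__ == "__main__":
    # Made-up pathnames for illustration only.
    paths = [
        "usr.bin/grep/grep.c", "usr.bin/grep/util.c", "usr.bin/grep/file.c",
        "contrib/grep/grep.c", "contrib/grep/util.c", "contrib/grep/search.c",
        "usr.bin/cat/cat.c",
    ]
    for a, b, names in shared_filename_directories(paths):
        print(a, b, names)
```

With the sample paths, only the two grep directories are reported, because they share grep.c and util.c. Very common names such as Makefile would link many unrelated directories, so a practical tool would probably need to discount them. This sketch does not attempt that.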