Evaluation of source code copy detection methods on FreeBSD

Title: Evaluation of source code copy detection methods on FreeBSD
Publication Type: Conference Paper
Year of Publication: 2008
Authors: Chang, H-F, Mockus, A
Secondary Title: Proceedings of the 2008 international working conference on Mining software repositories
Date Published: 05/2008
Place Published: New York, NY, USA
ISBN Number: 978-1-60558-024-1
Keywords: clone, cloning, code copying, freebsd, version control

Studies have shown that substantial code reuse is common in open source and in commercial projects. However, the precise extent of reuse and its impact on productivity and quality are not well investigated in the open source context. Previously, we introduced a simple-to-use method that needs only a set of file pathnames to identify directories that share filenames, and we partially validated its performance on a set of closed-source projects. To evaluate this method and to improve reuse detection at the file level, we apply it, along with four additional file copy detection methods that use the underlying content of multiple versions of the source code, to the FreeBSD project. The evaluation quantified the unique advantages of each method and showed that the filename method detected roughly half of all reuse cases. We still face the challenge of scaling the content-based methods to large repositories containing all versions of open source files.
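The abstract gives no algorithmic detail for the filename method, so the following is only a minimal sketch of the general idea it describes: taking nothing but a list of pathnames and reporting pairs of directories that share filenames. The function name, the two thresholds (min_shared, min_fraction), and the example paths are all assumptions made for illustration; they are not taken from the paper and the paths are not real FreeBSD paths.

```python
from collections import defaultdict
from itertools import combinations
import posixpath


def directories_sharing_filenames(pathnames, min_shared=2, min_fraction=0.5):
    """Return (dir_a, dir_b, shared_names) for directory pairs that overlap.

    min_shared and min_fraction are illustrative thresholds (assumptions),
    not values reported in the paper.
    """
    # Map each directory to the set of filenames found directly in it.
    names_by_dir = defaultdict(set)
    for path in pathnames:
        directory, filename = posixpath.split(path)
        names_by_dir[directory].add(filename)

    pairs = []
    # Compare every pair of directories; this is quadratic in the number
    # of directories and would need indexing to scale to large repositories.
    for dir_a, dir_b in combinations(sorted(names_by_dir), 2):
        shared = names_by_dir[dir_a] & names_by_dir[dir_b]
        smaller = min(len(names_by_dir[dir_a]), len(names_by_dir[dir_b]))
        if len(shared) >= min_shared and len(shared) / smaller >= min_fraction:
            pairs.append((dir_a, dir_b, sorted(shared)))
    return pairs


# Hypothetical pathnames, invented for this example.
example_paths = [
    "sys/dev/foo/a.c",
    "sys/dev/foo/b.c",
    "sys/dev/foo/c.h",
    "contrib/foo/a.c",
    "contrib/foo/b.c",
    "bin/ls/ls.c",
]

if __name__ == "__main__":
    for dir_a, dir_b, shared in directories_sharing_filenames(example_paths):
        print(dir_a, dir_b, shared)
```

A sketch like this never opens a file, which is what makes a pathname-only approach cheap to run. It also suggests why such a method could miss copies that were renamed, a gap that content-based methods are better placed to cover.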

Full Text
p61-chang.pdf (225.53 KB)