Detecting similar Java classes using tree algorithms

Title: Detecting similar Java classes using tree algorithms
Publication Type: Conference Paper
Year of Publication: 2006
Authors: Sager, T., Bernstein, A., Pinzger, M., Kiefer, C.
Secondary Title: Proceedings of the 2006 International Workshop on Mining Software Repositories
Place Published: New York, NY, USA
ISBN Number: 1-59593-397-2
Keywords: change analysis, clones, coogle, eclipse, famix, java, similarity, software evolution, software repositories, source code, tree similarity measures

Similarity analysis of source code is helpful during development, for instance to provide better support for code reuse. Consider a development environment that analyzes code as it is typed and suggests similar code examples or existing implementations from a source code repository. Mining software repositories by means of similarity measures enables and encourages the reuse of existing code, and it reduces development effort by creating a shared knowledge base of code fragments. In information retrieval, similarity measures are often used to find documents similar to a given query document. This paper extends this idea to source code repositories. It introduces our approach to detecting similar Java classes in software projects using tree similarity algorithms. We show how our approach finds similar Java classes, based on an evaluation of three tree-based similarity measures on five user-defined test cases and on a preliminary software evolution analysis of a medium-sized Java project. Initial results indicate that our technique (1) is indeed useful for identifying similar Java classes, (2) successfully identifies the ex ante and ex post versions of refactored classes, and (3) provides some interesting insights into within-version and between-version dependencies of classes within a Java project.
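To make the core idea concrete, the following is a minimal sketch of one classic tree similarity measure, a top-down maximum common subtree, applied to class trees and used to rank repository classes against a query class in the information-retrieval style described above. It is an illustration under stated assumptions, not the authors' implementation: the `Node` type, labels such as `field:int`, the ordered alignment of class members, and the Dice-style normalization 2 * common / (|A| + |B|) are all hypothetical choices made here. The paper's FAMIX-based trees and its three measures may differ.

```python
from __future__ import annotations

from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class Node:
    """A labeled, ordered tree node (hypothetical stand-in for a class tree)."""
    label: str
    children: tuple[Node, ...] = ()


def n(label: str, *kids: Node) -> Node:
    """Hypothetical convenience constructor for small example trees."""
    return Node(label, tuple(kids))


def size(t: Node) -> int:
    """Number of nodes in the tree rooted at t."""
    return 1 + sum(size(c) for c in t.children)


@lru_cache(maxsize=None)
def common(a: Node, b: Node) -> int:
    """Size of the largest top-down common subtree of a and b.

    The roots must carry the same label. Children are aligned in order
    (an assumption) with an LCS-style dynamic program in which each aligned
    pair contributes its own recursive common-subtree size.
    """
    if a.label != b.label:
        return 0
    m, k = len(a.children), len(b.children)
    dp = [[0] * (k + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, k + 1):
            dp[i][j] = max(
                dp[i - 1][j],  # skip a child of a
                dp[i][j - 1],  # skip a child of b
                dp[i - 1][j - 1] + common(a.children[i - 1], b.children[j - 1]),
            )
    return 1 + dp[m][k]


def similarity(a: Node, b: Node) -> float:
    """Dice-style normalization of the common subtree size into [0, 1]."""
    return 2 * common(a, b) / (size(a) + size(b))


def rank(query: Node, repository: dict[str, Node]) -> list[tuple[str, float]]:
    """Rank repository classes by similarity to the query, best first."""
    scores = [(name, similarity(query, tree)) for name, tree in repository.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical example: a query class and a tiny repository of class trees.
query = n("class", n("field:int"), n("method:get"), n("method:set"))
repo = {
    "Counter": n("class", n("field:int"), n("method:get"), n("method:set")),
    "Account": n("class", n("field:int"), n("method:get")),
    "Logger": n("class", n("field:String"), n("method:log")),
}

if __name__ == "__main__":
    for name, score in rank(query, repo):
        print(f"{name}: {score:.3f}")
```

Because the common subtree can never be larger than either tree, the score lies in [0, 1] and reaches 1.0 only for identical trees. Treating class members as ordered keeps the alignment a simple dynamic program; an unordered variant, arguably closer to how Java members behave, would need a bipartite matching over children instead.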
