Title | Linear predictive coding and cepstrum coefficients for mining time variant information from software repositories |
Publication Type | Conference Paper |
Year of Publication | 2005 |
Authors | Antoniol, G, Rollo, VF, Venturi, G |
Secondary Title | Proceedings of the 2005 international workshop on Mining software repositories |
Pagination | 74-78 |
Publisher | ACM |
Place Published | New York, NY, USA |
ISBN Number | 1-59593-123-6 |
Keywords | change history, data mining, evolution, files, kernel, linear predictive coding, linux, lpc, size, software evolution, source code |
Abstract | This paper presents an approach to recovering time-variant information from software repositories. It is widely accepted that software evolves due to factors such as defect removal, market opportunity, or the addition of new features. Software evolution details are stored in software repositories, which often contain the change history. However, there is a lack of approaches, technologies, and methods to efficiently extract and represent time-dependent information. Disciplines such as signal and image processing or speech recognition adopt frequency-domain representations to mitigate differences between signals evolving in time. Inspired by time-frequency duality, this paper proposes the use of Linear Predictive Coding (LPC) and cepstrum coefficients to model time-varying software artifact histories. LPC and cepstrum coefficients yield very compact representations with linear complexity. These representations can be used to highlight components and artifacts that evolved in the same way or with very similar evolution patterns. To assess the proposed approach, we applied LPC and cepstral analysis to 211 Linux kernel releases (i.e., from 1.0 to 1.3.100) to identify files with very similar size histories. The approach, the preliminary results, and the lessons learned are presented in this paper.
|
URL | http://doi.acm.org/10.1145/1082983.1083156 |
DOI | 10.1145/1082983.1083156 |
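The abstract describes modelling file size histories with LPC and cepstrum coefficients and then comparing files by those compact representations. This record does not include the paper's implementation, so the following is only a minimal sketch under stated assumptions: LPC is estimated with the autocorrelation method and the Levinson-Durbin recursion, cepstrum coefficients are derived from the LPC coefficients with the standard recursion, and similarity is measured as Euclidean distance between cepstral vectors. The function names, the model order, and the sample size history are all hypothetical.

```python
import math


def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of a finite sequence (no normalisation)."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def lpc(x, order):
    """Levinson-Durbin recursion (autocorrelation method).

    Returns (a, err) where x[n] is approximated by sum(a[k-1] * x[n-k]) for
    k = 1..order, and err is the final prediction error energy.
    """
    r = autocorrelation(x, order)
    a = [0.0] * (order + 1)  # a[0] is unused; indices 1..order hold coefficients
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # degenerate input (e.g. all zeros): stop early
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


def lpc_to_cepstrum(a, n_ceps):
    """Cepstrum c[1..n_ceps] of the all-pole model 1 / (1 - sum a_k z^-k)."""
    p = len(a)
    c = [0.0] * (n_ceps + 1)  # c[0] (gain term) is ignored in this sketch
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]


def cepstral_signature(history, order=3, n_ceps=6):
    """Compact representation of one artifact's history (assumed parameters)."""
    a, _ = lpc(history, order)
    return lpc_to_cepstrum(a, n_ceps)


def distance(sig_a, sig_b):
    """Euclidean distance between two cepstral signatures."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(sig_a, sig_b)))


# Hypothetical size history of one file across ten releases (arbitrary units).
sizes = [120, 125, 131, 131, 140, 152, 150, 160, 171, 175]
signature = cepstral_signature(sizes)
```

In this sketch, two histories that differ only by a constant scale factor produce the same signature, because the LPC coefficients depend only on ratios of autocorrelation values. Whether the paper normalises histories this way or another way is not stated in this record.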