Machine Learning-Based Detection of Open Source License Exceptions

Title: Machine Learning-Based Detection of Open Source License Exceptions
Publication Type: Conference Proceedings
Year of Publication: 2017
Authors: Christopher Vendome, Mario Linares-Vasquez, Gabriele Bavota, Massimiliano Di Penta, Daniel M. German, and Denys Poshyvanyk
Secondary Title: 2017 IEEE/ACM 39th International Conference on Software Engineering
Date Published: 05/2017
Keywords: classifier, empirical studies, license, machine learning

From a legal perspective, software licenses govern the redistribution, reuse, and modification of software as both source and binary code. Free and Open Source Software (FOSS) licenses vary in the degree to which they are permissive or restrictive in allowing redistribution or modification under licenses different from the original one(s). In certain cases, developers may modify the license by appending to it an exception to specifically allow reuse or modification under a particular condition. These exceptions are an important factor to consider for license compliance analysis since they modify the standard (and widely understood) terms of the original license. In this work, we first performed a large-scale empirical study on the change history of over 51k FOSS systems aimed at quantitatively investigating the prevalence of known license exceptions and identifying new ones. Subsequently, we performed a study on the detection of license exceptions by relying on machine learning. We evaluated the license exception classification with four different supervised learners and a sensitivity analysis. Finally, we present a categorization of license exceptions and explain their implications.


"We address these questions by first performing a large scale mining-based study... [W]e analyzed the source code of 51,754 projects written in six different programming languages (Ruby, Javascript, Python, C, C++, and C#) hosted on GitHub.