"May the fork be with you": novel metrics to analyze collaboration on GitHub

Multi-repository software projects are becoming increasingly
popular, thanks to web-based facilities such as GitHub.
Code and process metrics generally assume that a single repository
is analyzed in order to measure the characteristics
of a codebase. They are therefore ill-suited to measuring how
much relevant information is hosted in multiple repositories
contributing to the same codebase, and they cannot capture
the characteristics of such a distributed development process.
We present a set of novel metrics, based on an original
classification of commits, conceived to capture some interesting
aspects of a multi-repository development process. We
also describe an efficient way to build a data structure that
allows these metrics to be computed on a set of Git repositories.
Results obtained by applying our metrics
to a large sample of projects hosted on GitHub demonstrate the
usefulness of our contribution.


"According to FLOSSmole [8] (Free Libre OpenSource Software)
statistics, GitHub had 191765 repositories publicly
available as of May 2012."