The Debsources Dataset: Two Decades of Debian Source Code Metadata

Publication Type: Conference Proceedings
Year of Publication: 2015
Authors: Zacchiroli, S
Refereed Designation: Non-Refereed
Secondary Title: 12th Working Conference on Mining Software Repositories (MSR 2015)
Date Published: 05/2015

We present the Debsources Dataset: distribution
metadata and source code metrics spanning two decades of Free
and Open Source Software (FOSS) history, seen through the lens
of the Debian distribution.
Debsources is a software platform used to gather, search, and
publish on the Web the full source code of the Debian operating
system, as well as measures about it. A notable public instance
of Debsources is available online; it includes
both current and historical releases of Debian. Plugins to compute
popular source code metrics (lines of code, defined symbols,
disk usage) and other derived data (e.g., checksums) have been
written, integrated, and run on all the source code available on
that instance. The Debsources Dataset is a PostgreSQL database dump
of its metadata, as of February 10th, 2015.
The dataset contains both Debian-specific metadata (e.g., which
software packages are available in which release, which source
code files belong to which package, release dates, etc.) and source
code information gathered by running Debsources plugins.
The Debsources Dataset offers a very long-term historical view
of the macro-level evolution and constitution of FOSS through
the lens of popular, representative FOSS projects of their time.
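
To give a concrete feel for the kind of per-file data the abstract mentions (lines of code, disk usage, checksums), here is a minimal Python sketch. It is not Debsources' plugin code: the function names `file_metrics`, `tree_metrics`, and `demo` are hypothetical, the line count is a plain physical line count rather than what a dedicated code-counting tool would report, the size is the file length in bytes rather than blocks used on disk, and defined symbols are left out entirely.

```python
import hashlib
import os
import tempfile


def file_metrics(path):
    """Return byte size, physical line count, and SHA-256 digest of one file."""
    with open(path, "rb") as handle:
        data = handle.read()
    return {
        "size_bytes": len(data),
        "lines": len(data.splitlines()),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def tree_metrics(root):
    """Map each file path (relative to root) to its metrics."""
    results = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            results[os.path.relpath(full, root)] = file_metrics(full)
    return results


def demo():
    """Build a throwaway one-file source tree and measure it."""
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "hello.c"), "wb") as handle:
            handle.write(b"int main(void)\n{\n    return 0;\n}\n")
        return tree_metrics(root)


if __name__ == "__main__":
    for relpath, metrics in demo().items():
        print(relpath, metrics)
```

Keying the results by path relative to the tree root mirrors the idea of recording which file belongs to which package; how the actual dataset stores this is not specified here.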

