A Data Set for Social Diversity Studies of GitHub Teams

Title: A Data Set for Social Diversity Studies of GitHub Teams
Publication Type: Conference Proceedings
Year of Publication: 2015
Authors: Vasilescu, B.; Serebrenik, A.; Filkov, V.
Secondary Title: 12th Working Conference on Mining Software Repositories (MSR 2015)
Date Published: 05/2015
Keywords: ghtorrent, github

Like any other team-oriented activity, the software
development process is affected by social diversity in programmer
teams. The effect of team diversity can be significant
but also complex, especially in decentralized teams. Discerning
the precise contribution of diversity to teams’ effectiveness
requires quantitative studies of large data sets.
Here we present, for the first time, a large data set of social
diversity attributes of programmers in GitHub teams. Using
alias resolution, location data, and gender inference techniques,
we collected a team social diversity data set covering 23,493 GitHub
projects. We illustrate how the data set can be used in practice
with a series of case studies, and we hope its availability will foster
more interest in studying diversity issues in software teams.

Full Text: cr-msr-data-15.pdf (238.34 KB)