Visualizing collaboration and influence in the open-source software community

Title: Visualizing collaboration and influence in the open-source software community
Publication Type: Conference Paper
Year of Publication: 2011
Authors: Marschner, E, Rosenfeld, E, Heer, J, Heller, B
Tertiary Authors: van Deursen, A, Xie, T, Zimmermann, T
Secondary Title: Proceedings of the 8th working conference on Mining software repositories - MSR '11
Date Published: 05/2011
Publisher: ACM Press
Place Published: New York, New York, USA
ISBN Number: 9781450305747
Keywords: collaboration, data exploration, geography, geoscatter, github, graph, mapping, metadata, open source, social graph, user profiles, visualization

We apply visualization techniques to user profiles and repository metadata from the GitHub source code hosting service. Our motivation is to identify patterns within this development community that might otherwise remain obscured. Such patterns include the effect of geographic distance on developer relationships, social connectivity and influence among cities, and variation in project-specific contribution styles (e.g., centralized vs. distributed). Our analysis examines directed graphs in which nodes represent users' geographic locations and edges represent (a) follower relationships, (b) successive commits, or (c) contributions to the same project. We inspect this data using a set of visualization techniques: geo-scatter maps, small multiple displays, and matrix diagrams. Using these representations, and tools based on them, we develop hypotheses about the larger GitHub community that would be difficult to discern using traditional lists, tables, or descriptive statistics. These methods are not intended to provide conclusive answers; instead, they provide a way for researchers to explore the question space and communicate initial insights.
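As a rough illustration of the graph construction described in the abstract, the sketch below collapses user-level follow links into weighted directed edges between locations. It is not the authors' code: the function name `city_follow_graph`, the toy usernames, and the assumption that each user has already been mapped to a single city label are all hypothetical.

```python
from collections import Counter

def city_follow_graph(follows, user_city):
    """Aggregate user-level follow links into weighted city-level directed edges.

    follows   -- iterable of (follower, followed) username pairs
    user_city -- dict mapping username to a city label (hypothetical,
                 standing in for the result of geocoding a profile location)
    """
    edges = Counter()
    for follower, followed in follows:
        src = user_city.get(follower)
        dst = user_city.get(followed)
        if src is None or dst is None:
            continue  # skip links where either user has no known location
        edges[(src, dst)] += 1
    return edges

# Toy data, invented for illustration only.
follows = [("ann", "bob"), ("cy", "bob"), ("bob", "ann"), ("dee", "ann")]
user_city = {"ann": "Berlin", "bob": "Tokyo", "cy": "Berlin"}
graph = city_follow_graph(follows, user_city)
```

The same aggregation would apply to the other two edge types named in the abstract (successive commits and contributions to the same project) by swapping in a different list of user pairs.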


"This data set includes the complete social graph of 500,000 follow links as well as over 1,000,000 commits and 50,000 users."

"...a large fraction of [GitHub] users provide a location in their profile, which we can turn into geographic coordinates using a geocoding API like PlaceFinder..."

"For each repository, we extract the owner, collaborator, and contributor usernames, plus branch names. New usernames help to find new repositories, while branch names are used to fetch commit metadata. Using this method, the crawler uncovered 40,860 code repositories, representing 33,388 unique project names and 1,219,872 individual commits."
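The crawl described in this excerpt can be read as a breadth-first expansion over usernames. The following is a minimal sketch under that reading, not the paper's crawler: the callbacks `repos_of`, `people_in`, and `branches_of` are hypothetical placeholders for whatever requests the real crawler made, and the data is an invented in-memory example.

```python
from collections import deque

def crawl(seed_users, repos_of, people_in, branches_of):
    """Breadth-first crawl: usernames lead to repositories, repositories to new usernames.

    repos_of(user)    -- hypothetical callback returning repository names for a user
    people_in(repo)   -- hypothetical callback returning owner, collaborator,
                         and contributor usernames for a repository
    branches_of(repo) -- hypothetical callback returning branch names
    """
    seen_users = set(seed_users)
    seen_repos = set()
    branches = {}
    queue = deque(seed_users)
    while queue:
        user = queue.popleft()
        for repo in repos_of(user):
            if repo in seen_repos:
                continue  # each repository is processed once
            seen_repos.add(repo)
            # Branch names are kept so commit metadata could be fetched later.
            branches[repo] = list(branches_of(repo))
            for name in people_in(repo):
                if name not in seen_users:
                    seen_users.add(name)
                    queue.append(name)  # a new username may reveal new repositories
    return seen_users, seen_repos, branches

# Invented toy data standing in for the hosting service.
REPOS = {"ann": ["ann/viz"], "bob": ["bob/geo", "ann/viz"], "cy": []}
PEOPLE = {"ann/viz": ["ann", "bob"], "bob/geo": ["bob", "cy"]}
BRANCHES = {"ann/viz": ["master"], "bob/geo": ["master", "dev"]}

users, repos, branches = crawl(
    ["ann"],
    lambda u: REPOS.get(u, []),
    lambda r: PEOPLE[r],
    lambda r: BRANCHES[r],
)
```

Passing the lookups in as callbacks keeps the traversal logic separate from any particular API, which is why no real endpoint appears here.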

"In addition to crawled data, we use the complete GitHub user follower graph from Jan 19, 2011. This graph includes 452,248 links connecting 106,247 unique users, 47% (49,500) of which could be geocoded with the PlaceFinder API"
