Knowledge Homogeneity and Specialization in the Apache HTTP Server Project
|Publication Type||Conference Proceedings|
|Year of Publication||2011|
|Authors||MacLean, Alexander C., Pratt, Landon J., Knutson, Charles D., and Ringger, Eric K.|
|Secondary Title||Open Source Systems: Grounding Research (OSS 2011)|
|Keywords||apache, commits, developer, email, email archive, LDA, mailing list, revision control, revision history, scm, social network analysis, specialization, subversion, svn|
We present an analysis of developer communication in the Apache HTTP Server project. Using topic modeling techniques, we expose latent conceptual sub-communities arising from developer specialization within the greater developer population. However, we found that very little specialization exists among the major contributors to the project. We present theories to explain this phenomenon and suggest further research.
"Our data set consists of the commit history and email archives for the Apache HTTP Server Project, spanning sixteen years (2/27/1995 - 1/31/2011)"
"we 1) mapped the committers to email records, 2) cleaned the email records to remove extraneous information, 3) identified topics of discussion in the resulting messages, and 4) constructed a social network model from committers and topics."
"If specialization exists within the httpd community, we should see distinct communities develop around topics. In addition, unique groups of developers should congregate around specialized subtopics. We examined the data from both angles: topical affinity and topic communities."