A Qualitative Method for Mining Open Source Software Repositories

Publication Type: Conference Proceedings
Year of Publication: 2012
Authors: Noll, J, Seichter, D, Beecham, S
Refereed Designation: Refereed
Secondary Title: IFIP Advances in Information and Communication Technology 378 (OSS 2012)
Date Published: 09/2012
Publisher: IFIP AICT, Springer
Keywords: content analysis, Electronic Medical Record, Qualitative Research

The volume of data archived in open source software project repositories makes automated, quantitative techniques attractive for extracting and analyzing information from these archives. However, many kinds of archival data include blocks of natural language text that are difficult to analyze automatically.

This paper introduces a qualitative analysis method that is transparent and repeatable, leads to objective findings when dealing with qualitative data, and is efficient enough to be applied to large archives.

The method was applied in a case study of developer and user forum discussions of an open source electronic medical record project. The study demonstrates that the qualitative repository mining method can be employed to derive useful results quickly yet accurately. These results would not be possible using a strictly automated approach.


The method proposed by this study employs content analysis (Krippendorff [10]), a classification technique that is frequently applied to interview and focus group data. The objective of content analysis is to ask quantitative questions about qualitative data. The approach resembles the grounded theory method but differs in that its results are quantitative rather than qualitative: content analysis produces results such as, “49% of messages submitted to project mailing lists were sent by core developers.”
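The quantitative step of content analysis can be illustrated with a minimal sketch: once a human coder has assigned each message to a category, the result is a tally of category shares. The message list, the role labels, and the function name `category_shares` below are hypothetical and chosen only for illustration; they are not the study's data or coding scheme.

```python
from collections import Counter

# Hypothetical hand-coded messages: (message id, sender role assigned by a human coder).
# These values are invented for illustration and are not taken from the study.
coded_messages = [
    (1, "core developer"),
    (2, "user"),
    (3, "core developer"),
    (4, "peripheral developer"),
]


def category_shares(coded):
    """Return the percentage of messages assigned to each category."""
    counts = Counter(role for _, role in coded)
    total = sum(counts.values())
    # An empty input yields an empty dict, so no division by zero occurs.
    return {role: 100.0 * n / total for role, n in counts.items()}


shares = category_shares(coded_messages)
for role, pct in sorted(shares.items()):
    print(f"{pct:.0f}% of messages were sent by a {role}")
```

The classification itself remains a manual, qualitative judgment; only the counting of coded items is automated in this sketch.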

