%0 Journal Article %J International Journal of Open Source Software and Processes %D 2010 %T Impact of Programming Language Fragmentation on Developer Productivity %A Krein, Jonathan L. %A MacLean, Alexander C. %A Knutson, Charles D. %A Delorey, Daniel P. %A Eggett, Dennis L. %K commits %K entropy %K language entropy %K programming languages %K sourceforge %K srda %X Programmers often develop software in multiple languages. In an effort to study the effects of programming language fragmentation on productivity—and ultimately on a developer’s problem-solving abilities—the authors present a metric, language entropy, for characterizing the distribution of a developer’s programming efforts across multiple programming languages. This paper presents an observational study examining the project contributions of a random sample of 500 SourceForge developers. Using a random coefficients model, the authors find a statistically (alpha level of 0.001) and practically significant correlation between language entropy and the size of monthly project contributions. Results indicate that programming language fragmentation is negatively related to the total amount of code contributed by developers within SourceForge, an open source software (OSS) community. %B International Journal of Open Source Software and Processes %V 2 %P 41 - 61 %8 32/2010 %N 2 %R 10.4018/jossp.2010040104 %0 Conference Paper %B 2009 6th IEEE International Working Conference on Mining Software Repositories (MSR) %D 2009 %T Author entropy vs. file size in the GNOME suite of applications %A Casebolt, Jason R. %A Krein, Jonathan L. %A MacLean, Alexander C. %A Knutson, Charles D. %A Delorey, Daniel P. %K author entropy %K contributions %K gnome %K msr challenge %X We present the results of a study in which author entropy was used to characterize author contributions per file. 
Our analysis reveals three patterns: banding in the data, uneven distribution of data across bands, and file size dependent distributions within bands. Our results suggest that when two authors contribute to a file, large files are more likely to have a dominant author than smaller files. %B 2009 6th IEEE International Working Conference on Mining Software Repositories (MSR) %I IEEE %C Vancouver, BC, Canada %P 91 - 94 %@ 978-1-4244-3493-0 %R 10.1109/MSR.2009.5069484 %0 Conference Paper %B 4th Workshop on Public Data about Software Development (WoPDaSD 2009) %D 2009 %T Language entropy: A metric for characterization of author programming language distribution %A Krein, Jonathan L. %A MacLean, Alexander C. %A Delorey, Daniel P. %A Knutson, Charles D. %A Eggett, Dennis L. %K contributions %K developers %K language entropy %K lines of code %K loc %K multiple languages %K programming languages %K sourceforge %X Programmers are often required to develop in multiple languages. In an effort to study the effects of programming language fragmentation on productivity—and ultimately on a programmer’s problem solving abilities—we propose a metric, language entropy, for characterizing the distribution of an individual’s development efforts across multiple programming languages. To evaluate this metric, we present an observational study examining all project contributions (through August 2006) of a random sample of 500 SourceForge developers. Using a random coefficients model, we found a statistically significant correlation (alpha level of 0.05) between language entropy and the size of monthly project contributions (measured in lines of code added). Our results indicate that language entropy is a good candidate for characterizing author programming language distribution. 
%B 4th Workshop on Public Data about Software Development (WoPDaSD 2009) %8 2009 %> https://flosshub.org/sites/flosshub.org/files/LanguageEntropy-JonathanKrein.pdf %0 Conference Proceedings %B 21st Annual Psychology of Programming Interest Group Conference %D 2009 %T Mining Programming Language Vocabularies from Source Code %A Delorey, Daniel P. %A Knutson, Charles D. %A Davies, Mark %X We can learn much from the artifacts produced as the by-products of software development and stored in software repositories. Of all such potential data sources, one of the most important from the perspective of program comprehension is the source code itself. While other data sources give insight into what developers intend a program to do, the source code is the most accurate human-accessible description of what it will do. However, the ability of an individual developer to comprehend a particular source file depends directly on his or her familiarity with the specific features of the programming language being used in the file. This is not unlike the difficulties second-language learners may encounter when attempting to read a text written in a new language. We propose that by applying the techniques used by corpus linguists in the study of natural language texts to a corpus of programming language texts (i.e., source code repositories), we can gain new insights into the communication medium that is programming language. In this paper we lay the foundation for applying corpus linguistic methods to programming language by 1) defining the term “word” for programming language, 2) developing data collection tools and a data storage schema for the Java programming language, and 3) presenting an initial analysis of an example linguistic corpus based on version 1.5 of the Java Developers Kit. 
%B 21st Annual Psychology of Programming Interest Group Conference %P 12 pp %> https://flosshub.org/sites/flosshub.org/files/21st-delorey.pdf %0 Conference Paper %B 3rd Workshop on Public Data about Software Development (WoPDaSD 2008) %D 2008 %T Author Entropy: A Metric for Characterization of Software Authorship Patterns %A Taylor, Quinn C. %A Stevenson, James E. %A Delorey, Daniel P. %A Knutson, Charles D. %K developers %K entropy %K flossmole %K sourceforge %X We propose the concept of author entropy and describe how file-level entropy measures may be used to understand and characterize authorship patterns within individual files, as well as across an entire project. As a proof of concept, we compute author entropy for 28,955 files from 33 open-source projects. We explore patterns of author entropy, identify techniques for visualizing author entropy, and propose avenues for further study. %B 3rd Workshop on Public Data about Software Development (WoPDaSD 2008) %P 42-47 %8 2008 %> https://flosshub.org/sites/flosshub.org/files/entropy2008.pdf %0 Conference Paper %B 2nd Workshop on Public Data about Software Development (WoPDaSD 2007) %D 2007 %T Programming Language Trends in Open Source Development: An Evaluation Using Data from All Production Phase SourceForge Projects %A Delorey, Daniel P. %A Knutson, Charles D. %A Giraud-Carrier, C. %K cvs %K cvs2mysql %K programming languages %K sfra %K sourceforge %K srda %X In this work, we analyze data collected from the CVS repositories of 9,997 Open Source projects hosted on SourceForge in an effort to understand trends in programming language usage in the Open Source community between 2000 and 2005. The trends we consider include: 1) the relative popularity of the ten most popular programming languages over time, 2) the use of multiple programming languages by individual programmers and by individual projects, and 3) the programming languages most often used in combination. 
%B 2nd Workshop on Public Data about Software Development (WoPDaSD 2007) %> https://flosshub.org/sites/flosshub.org/files/Delorey2007b.pdf %0 Conference Paper %B 2nd Workshop on Public Data about Software Development (WoPDaSD 2007) %D 2007 %T Studying Production Phase SourceForge Projects: An Exploratory Analysis Using cvs2mysql and SFRA %A Delorey, Daniel P. %A Knutson, Charles D. %A MacLean, Alexander C. %K Data Collection %K forge %K repositories %K sourceforge %X A wealth of data can be extracted from the natural by-products of software development processes and used in empirical studies of software engineering. However, the size and accuracy of such studies depend in large part on the availability of tools that facilitate the collection of data from individual projects and the combination of data from multiple projects. To demonstrate this point, we present our experience gathering and analyzing data from nearly 10,000 open source projects hosted on SourceForge. We describe the tools we developed to collect the data and the ways in which these tools and data may be used by other researchers. We also provide examples of statistics that we have calculated from these data to describe interesting author- and project-level behaviors of the SourceForge community. %B 2nd Workshop on Public Data about Software Development (WoPDaSD 2007) %8 2007 %> https://flosshub.org/sites/flosshub.org/files/Delorey2007c.pdf