%0 Conference Proceedings
%B 10th Working Conference on Mining Software Repositories
%D 2013
%T The Impact of Tangled Code Changes
%A Herzig, Kim
%A Zeller, Andreas
%K bias
%K data quality
%K history
%K java
%K mining software repositories
%K noise
%K tangled code changes
%K version control
%X When interacting with version control systems, developers often commit unrelated or loosely related code changes in a single transaction. When analyzing the version history, such tangled changes will make all changes to all modules appear related, possibly compromising the resulting analyses through noise and bias. In an investigation of five open-source JAVA projects, we found up to 15% of all bug fixes to consist of multiple tangled changes. Using a multi-predictor approach to untangle changes, we show that on average at least 16.6% of all source files are incorrectly associated with bug reports. We recommend better change organization to limit the impact of tangled changes.
%8 05/2013
%U http://www.kim-herzig.de/wp-content/uploads/2013/03/msr2013-untangling.pdf
%> https://flosshub.org/sites/flosshub.org/files/msr2013-untangling.pdf

%0 Conference Proceedings
%B 35th Int'l Conference on Software Engineering (ICSE 2013)
%D 2013
%T It’s Not a Bug, It’s a Feature: How Misclassification Impacts Bug Prediction
%A Herzig, Kim
%A Just, Sascha
%A Zeller, Andreas
%K bias
%K bug reports
%K data quality
%K mining software repositories
%K noise
%X In a manual examination of more than 7,000 issue reports from the bug databases of five open-source projects, we found 33.8% of all bug reports to be misclassified; that is, rather than referring to a code fix, they resulted in a new feature, an update to documentation, or an internal refactoring. This misclassification introduces bias in bug prediction models, confusing bugs and features: On average, 39% of files marked as defective actually never had a bug. We discuss the impact of this misclassification on earlier studies and recommend manual data validation for future studies.
%P 392-401
%8 05/2013