Author: "Steffen Herbold" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Steffen Herbold"' showing total 93 results

1. A large-scale comparison of human-written versus ChatGPT-generated essays

Author: Steffen Herbold, Annette Hautli-Janisz, Ute Heuer, Zlata Kikteva, and Alexander Trautsch
Subjects: Medicine, Science
Abstract: ChatGPT and similar generative AI models have attracted hundreds of millions of users and have become part of the public discourse. Many believe that such models will disrupt society and lead to significant changes in the education system and information generation. So far, this belief is based on either colloquial evidence or benchmarks from the owners of the models; both lack scientific rigor. We systematically assess the quality of AI-generated content through a large-scale study comparing human-written versus ChatGPT-generated argumentative student essays. We use essays that were rated by a large number of human experts (teachers). We augment the analysis by considering a set of linguistic characteristics of the generated essays. Our results demonstrate that ChatGPT generates essays that are rated higher regarding quality than human-written essays. The writing style of the AI models exhibits linguistic characteristics that are different from those of the human-written essays. Since the technology is readily available, we believe that educators must act immediately. We must re-invent homework and develop teaching concepts that utilize these AI models in the same way as math utilizes the calculator: teach the general concepts first and then use AI tools to free up time for other learning objectives.
Published: 2023
Full Text: View/download PDF

2. Galba: genome annotation with miniprot and AUGUSTUS

Author: Tomáš Brůna, Heng Li, Joseph Guhlin, Daniel Honsel, Steffen Herbold, Mario Stanke, Natalia Nenasheva, Matthis Ebel, Lars Gabriel, and Katharina J. Hoff
Subjects: Gene prediction, Protein coding gene, Miniprot, AUGUSTUS, Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
Abstract: Background: The Earth Biogenome Project has rapidly increased the number of available eukaryotic genomes, but most released genomes continue to lack annotation of protein-coding genes. In addition, no transcriptome data is available for some genomes. Results: Various gene annotation tools have been developed but each has its limitations. Here, we introduce GALBA, a fully automated pipeline that utilizes miniprot, a rapid protein-to-genome aligner, in combination with AUGUSTUS to predict genes with high accuracy. Accuracy results indicate that GALBA is particularly strong in the annotation of large vertebrate genomes. We also present use cases in insects, vertebrates, and a land plant. GALBA is fully open source and available as a docker image for easy execution with Singularity in high-performance computing environments. Conclusions: Our pipeline addresses the critical need for accurate gene annotation in newly sequenced genomes, and we believe that GALBA will greatly facilitate genome annotation for diverse organisms.
Published: 2023
Full Text: View/download PDF

3. Question Type Prediction in Natural Debate.

Author: Zlata Kikteva, Alexander Trautsch, Steffen Herbold, and Annette Hautli-Janisz
Published: 2024

4. On Using Information Retrieval to Recommend Machine Learning Good Practices for Software Engineers.

Author: Laura Cabra-Acela, Anamaria Mojica-Hanke, Mario Linares-Vásquez, and Steffen Herbold
Published: 2023
Full Text: View/download PDF

5. On the Impact of Reconstruction and Context for Argument Prediction in Natural Debate.

Author: Zlata Kikteva, Alexander Trautsch, Patrick Katzer, Mirko Oest, Steffen Herbold, and Annette Hautli-Janisz
Published: 2023
Full Text: View/download PDF

6. Predicting Issue Types with seBERT.

Author: Alexander Trautsch and Steffen Herbold
Published: 2022
Full Text: View/download PDF

7. Smoke testing for machine learning: simple tests to discover severe bugs.

Author: Steffen Herbold and Tobias Haar
Published: 2023

8. On the validity of pre-trained transformers for natural language processing in the software engineering domain.

Author: Julian von der Mosel, Alexander Trautsch, and Steffen Herbold
Published: 2023

9. Exploring the relationship between performance metrics and cost saving potential of defect prediction models.

Author: Steffen Tunkel and Steffen Herbold
Published: 2023

10. Problems with SZZ and Features: An empirical assessment of the state of practice of defect prediction data collection.

Author: Steffen Herbold, Alexander Trautsch, Fabian Trautsch, and Benjamin Ledel
Published: 2023

11. With registered reports towards large scale data curation.

Author: Steffen Herbold
Published: 2020
Full Text: View/download PDF

12. The SmartSHARK ecosystem for software repository mining.

Author: Alexander Trautsch, Fabian Trautsch, Steffen Herbold, Benjamin Ledel, and Jens Grabowski
Published: 2020
Full Text: View/download PDF

13. Large-Scale Manual Validation of Bugfixing Changes.

Author: Steffen Herbold, Alexander Trautsch, and Benjamin Ledel
Published: 2020
Full Text: View/download PDF

14. Static source code metrics and static analysis warnings for fine-grained just-in-time defect prediction.

Author: Alexander Trautsch, Steffen Herbold, and Jens Grabowski
Published: 2020
Full Text: View/download PDF

15. Data-Science-Crashkurs: Eine interaktive und praktische Einführung

Author: Steffen Herbold
Published: 2022

16. A Systematic Mapping Study of Developer Social Network Research.

Author: Steffen Herbold, Aynur Amirfallah, Fabian Trautsch, and Jens Grabowski
Published: 2021
Full Text: View/download PDF

17. On the Feasibility of Automated Prediction of Bug and Non-Bug Issues.

Author: Steffen Herbold, Alexander Trautsch, and Fabian Trautsch
Published: 2021
Full Text: View/download PDF

18. A Longitudinal Study of Static Analysis Warning Evolution and the Effects of PMD on Software Quality in Apache Open Source Projects.

Author: Alexander Trautsch, Steffen Herbold, and Jens Grabowski
Published: 2021
Full Text: View/download PDF

19. Are Unit and Integration Test Definitions Still Valid for Modern Java Projects? An Empirical Study on Open-Source Projects.

Author: Fabian Trautsch, Steffen Herbold, and Jens Grabowski
Published: 2021
Full Text: View/download PDF

20. On the Relatively Small Impact of Deep Dependencies on Cloud Application Reliability.

Author: Xiaowei Wang, Fabian Glaser, Steffen Herbold, and Jens Grabowski
Published: 2017
Full Text: View/download PDF

21. Performance tuning for automotive Software Fault Prediction.

Author: Harald Altinger, Steffen Herbold, Friederike Schneemann, Jens Grabowski, and Franz Wotawa
Published: 2017
Full Text: View/download PDF

22. On the Cost and Profit of Software Defect Prediction.

Author: Steffen Herbold
Published: 2021
Full Text: View/download PDF

23. Adressing problems with external validity of repository mining studies through a smart data platform.

Author: Fabian Trautsch, Steffen Herbold, Philip Makedonski, and Jens Grabowski
Published: 2016
Full Text: View/download PDF

24. Learning from Software Project Histories - Predictive Studies Based on Mining Software Repositories.

Author: Verena Honsel, Steffen Herbold, and Jens Grabowski
Published: 2016
Full Text: View/download PDF

25. Hidden Markov Models for the Prediction of Developer Involvement Dynamics and Workload.

Author: Verena Honsel, Steffen Herbold, and Jens Grabowski
Published: 2016
Full Text: View/download PDF

26. Automated Deployment and Parallel Execution of Legacy Applications in Cloud Environments (Short Paper).

Author: Michael Göttsche, Fabian Glaser, Steffen Herbold, and Jens Grabowski
Published: 2015
Full Text: View/download PDF

27. Improving Security Testing with Usage-Based Fuzz Testing.

Author: Martin A. Schneider, Steffen Herbold, Marc-Florian Wendland, and Jens Grabowski
Published: 2015
Full Text: View/download PDF

28. Intuition vs. Truth: Evaluation of Common Myths about StackOverflow Posts.

Author: Verena Honsel, Steffen Herbold, and Jens Grabowski
Published: 2015
Full Text: View/download PDF

29. Mining Software Dependency Networks for Agent-Based Simulation of Software Evolution.

Author: Verena Honsel, Daniel Honsel, Steffen Herbold, Jens Grabowski, and Stephan Waack
Published: 2015
Full Text: View/download PDF

30. CrossPare: A Tool for Benchmarking Cross-Project Defect Predictions.

Author: Steffen Herbold
Published: 2015
Full Text: View/download PDF

31. The MIDAS Cloud Platform for Testing SOA Applications.

Author: Steffen Herbold, Alberto De Francesco, Jens Grabowski, Patrick Harms, Lom-Messan Hillah, Fabrice Kordon, Ariele-Paolo Maesano, Libero Maesano, Claudia Di Napoli, Fabio De Rosa, Martin A. Schneider, Nicola Tonellotto, Marc-Florian Wendland, and Pierre-Henri Wuillemin
Published: 2015
Full Text: View/download PDF

32. Novel Insights on Cross Project Fault Prediction Applied to Automotive Software.

Author: Harald Altinger, Steffen Herbold, Jens Grabowski, and Franz Wotawa
Published: 2015
Full Text: View/download PDF

33. Are automated static analysis tools worth it? An investigation into relative warning density and external software quality on the example of Apache open source projects

Author: Alexander Trautsch, Steffen Herbold, and Jens Grabowski
Subjects: Software
Abstract: Automated Static Analysis Tools (ASATs) are part of software development best practices. ASATs are able to warn developers about potential problems in the code. On the one hand, ASATs are based on best practices so there should be a noticeable effect on software quality. On the other hand, ASATs suffer from false positive warnings, which developers have to inspect and then ignore or mark as invalid. In this article, we ask whether ASATs have a measurable impact on external software quality, using the example of PMD for Java. We investigate the relationship between ASAT warnings emitted by PMD on defects per change and per file. Our case study includes data for the history of each file as well as the differences between changed files and the project in which they are contained. We investigate whether files that induce a defect have more static analysis warnings than the rest of the project. Moreover, we investigate the impact of two different sets of ASAT rules. We find that bug inducing files contain fewer static analysis warnings than other files of the project at that point in time. However, this can be explained by the overall decreasing warning density. When compared with all other changes, we find a statistically significant difference in one metric for all rules and two metrics for a subset of rules. However, the effect size is negligible in all cases, showing that the actual difference in warning density between bug inducing changes and other changes is small at best.
Published: 2023

34. What really changes when developers intend to improve their source code: a commit-level study of static metric value and static analysis warning changes

Author: Alexander Trautsch, Johannes Erbel, Steffen Herbold, and Jens Grabowski
Subjects: Software Engineering (cs.SE), FOS: Computer and information sciences, Computer Science - Software Engineering, Software
Abstract: Many software metrics are designed to measure aspects that are believed to be related to software quality. Static software metrics, e.g., size, complexity and coupling are used in defect prediction research as well as software quality models to evaluate software quality. Static analysis tools also include boundary values for complexity and size that generate warnings for developers. While this indicates a relationship between quality and software metrics, the extent of it is not well understood. Moreover, recent studies found that complexity metrics may be unreliable indicators for understandability of the source code. To explore this relationship, we leverage the intent of developers about what constitutes a quality improvement in their own code base. We manually classify a randomized sample of 2,533 commits from 54 Java open source projects as quality improving depending on the intent of the developer by inspecting the commit message. We distinguish between perfective and corrective maintenance via predefined guidelines and use this data as ground truth for the fine-tuning of a state-of-the-art deep learning model for natural language processing. The benchmark we provide with our ground truth indicates that the deep learning model can be confidently used for commit intent classification. We use the model to increase our data set to 125,482 commits. Based on the resulting data set, we investigate the differences in size and 14 static source code metrics between changes that increase quality, as indicated by the developer, and changes unrelated to quality. In addition, we investigate which files are targets of quality improvements. We find that quality improving commits are smaller than non-quality improving commits. Perfective changes have a positive impact on static source code metrics while corrective changes do tend to add complexity. Furthermore, we find that files which are the target of perfective maintenance already have a lower median complexity than files which are the target of non-perfective changes. Our study results provide empirical evidence for which static source code metrics capture quality improvement from the developers' point of view. This has implications for program understanding as well as code smell detection and recommender systems.
Published: 2023

35. AutoQUEST - Automated Quality Engineering of Event-Driven Software.

Author: Steffen Herbold and Patrick Harms
Published: 2013
Full Text: View/download PDF

36. Training data selection for cross-project defect prediction.

Author: Steffen Herbold
Published: 2013
Full Text: View/download PDF

37. A Model for Usage-Based Testing of Event-Driven Software.

Author: Steffen Herbold, Jens Grabowski, and Stephan Waack
Published: 2011
Full Text: View/download PDF

38. Improved Bug Reporting and Reproduction through Non-intrusive GUI Usage Monitoring and Automated Replaying.

Author: Steffen Herbold, Jens Grabowski, Stephan Waack, and Uwe Bünting
Published: 2011
Full Text: View/download PDF

39. A comparative study to benchmark cross-project defect prediction approaches.

Author: Steffen Herbold, Alexander Trautsch, and Jens Grabowski
Published: 2018
Full Text: View/download PDF

40. Large-Scale Manual Validation of Bugfixing Changes

Author: Steffen Herbold, Alexander Trautsch, and Benjamin Ledel
Subjects: Scale (ratio), Computer science, Information systems, Electrical engineering, electronic engineering, information engineering, Software engineering, Context (language use), Engineering and technology, Commit, Java Programming Language
Abstract: Context: Accurate data about bug fixes is important for different venues of research, e.g., program repair. While automated procedures are able to identify bug fixing activities, they cannot distinguish between the bug fix and other activities that are happening in parallel, e.g., refactorings or the addition of features. Objective: The creation of a large corpus of manually validated bug fixes and to gain insights into the limitations of manual validation. Method: We use a crowd working approach to manually validate bug fixing commits and analyze the limitations. Limitations: Insights limited to the Java programming language and possibly by the participants in the crowd working.
Published: 2022
Full Text: View/download PDF

41. Differential testing for machine learning: an analysis for classification algorithms beyond deep learning

Author: Steffen Herbold and Steffen Tunkel
Subjects: Software Engineering (cs.SE), FOS: Computer and information sciences, Computer Science - Software Engineering, Computer Science - Machine Learning, Software, Machine Learning (cs.LG)
Abstract: Context: Differential testing is a useful approach that uses different implementations of the same algorithms and compares the results for software testing. In recent years, this approach was successfully used for test campaigns of deep learning frameworks. Objective: There is little knowledge on the application of differential testing beyond deep learning. Within this article, we want to close this gap for classification algorithms. Method: We conduct a case study using Scikit-learn, Weka, Spark MLlib, and Caret in which we identify the potential of differential testing by considering which algorithms are available in multiple frameworks, the feasibility by identifying pairs of algorithms that should exhibit the same behavior, and the effectiveness by executing tests for the identified pairs and analyzing the deviations. Results: While we found a large potential for popular algorithms, the feasibility seems limited because often it is not possible to determine configurations that are the same in other frameworks. The execution of the feasible tests revealed that there is a large amount of deviations for the scores and classes. Only a lenient approach based on statistical significance of classes does not lead to a huge amount of test failures. Conclusions: The potential of differential testing beyond deep learning seems limited for research into the quality of machine learning libraries. Practitioners may still use the approach if they have deep knowledge about implementations, especially if a coarse oracle that only considers significant differences of classes is sufficient. Under review
Published: 2022

42. Problems with SZZ and features: an empirical study of the state of practice of defect prediction data collection

Author: Steffen Herbold, Alexander Trautsch, Fabian Trautsch, and Benjamin Ledel
Subjects: Software Engineering (cs.SE), FOS: Computer and information sciences, SZZ -- Bug fix labeling -- Bug inducing changes -- Defect prediction data -- Data set, Computer Science - Software Engineering, article, ddc:004, Software
Abstract: Context: The SZZ algorithm is the de facto standard for labeling bug fixing commits and finding inducing changes for defect prediction data. Recent research uncovered potential problems in different parts of the SZZ algorithm. Most defect prediction data sets provide only static code metrics as features, while research indicates that other features are also important. Objective: We provide an empirical analysis of the defect labels created with the SZZ algorithm and the impact of commonly used features on results. Method: We used a combination of manual validation and adopted or improved heuristics for the collection of defect data. We conducted an empirical study on 398 releases of 38 Apache projects. Results: We found that only half of the bug fixing commits determined by SZZ are actually bug fixing. If a six-month time frame is used in combination with SZZ to determine which bugs affect a release, one file is incorrectly labeled as defective for every file that is correctly labeled as defective. In addition, two defective files are missed. We also explored the impact of the relatively small set of features that are available in most defect prediction data sets, as there are multiple publications that indicate that, e.g., churn related features are important for defect prediction. We found that the difference of using more features is not significant. Conclusion: Problems with inaccurate defect labels are a severe threat to the validity of the state of the art of defect prediction. Small feature sets seem to be a less severe threat. Comment: Accepted at Empirical Software Engineering, Springer. First three authors are equally contributing
Published: 2022

43. A fine-grained data set and analysis of tangling in bug fixing commits

Author: Steffen Herbold, Alexander Trautsch, Benjamin Ledel, Alireza Aghamohammadi, Taher A. Ghaleb, Kuljit Kaur Chahal, Tim Bossenmaier, Bhaveet Nagaria, Philip Makedonski, Matin Nili Ahmadabadi, Kristof Szabados, Helge Spieker, Matej Madeja, Nathaniel Hoy, Valentina Lenarduzzi, Shangwen Wang, Gema Rodríguez-Pérez, Ricardo Colomo-Palacios, Roberto Verdecchia, Paramvir Singh, Yihao Qin, Debasish Chakroborti, Willard Davis, Vijay Walunj, Hongjun Wu, Diego Marcilio, Omar Alam, Abdullah Aldaeej, Idan Amit, Burak Turhan, Simon Eismann, Anna-Katharina Wickert, Ivano Malavolta, Matúš Sulír, Fatemeh Fard, Austin Z. Henley, Stratos Kourtzanidis, Eray Tuzun, Christoph Treude, Simin Maleki Shamasbi, Ivan Pashchenko, Marvin Wyrich, James Davis, Alexander Serebrenik, Ella Albrecht, Ethem Utku Aktas, Daniel Strüber, Johannes Erbel, Software and Sustainability (S2), Network Institute, Information Management & Software Engineering, and Software Engineering and Technology
Subjects: Software Engineering (cs.SE), FOS: Computer and information sciences, Computer Science - Software Engineering, Tangled commits, Economics, ddc:330, Research turk, Software Science, Software, Registered report, Bug fix, Manual validation, Tangled changes
Abstract: Context: Tangled commits are changes to software that address multiple concerns at once. For researchers interested in bugs, tangled commits mean that they actually study not only bugs, but also other concerns irrelevant for the study of bugs. Objective: We want to improve our understanding of the prevalence of tangling and the types of changes that are tangled within bug fixing commits. Methods: We use a crowd sourcing approach for manual labeling to validate which changes contribute to bug fixes for each line in bug fixing commits. Each line is labeled by four participants. If at least three participants agree on the same label, we have consensus. Results: We estimate that between 17% and 32% of all changes in bug fixing commits modify the source code to fix the underlying problem. However, when we only consider changes to the production code files this ratio increases to 66% to 87%. We find that about 11% of lines are hard to label leading to active disagreements between participants. Due to confirmed tangling and the uncertainty in our data, we estimate that 3% to 47% of data is noisy without manual untangling, depending on the use case. Conclusion: Tangled commits have a high prevalence in bug fixes and can lead to a large amount of noise in the data. Prior research indicates that this noise may alter results. As researchers, we should be skeptics and assume that unvalidated data is likely very noisy, until proven otherwise. Status: Accepted at Empirical Software Engineering
Published: 2022

44. Automatic source localization and spectra generation from sparse beamforming maps

Author: Carsten Spehr, Steffen Herbold, and Armin Goudarzi
Subjects: Beamforming, FOS: Computer and information sciences, Sound (cs.SD), Computer Science - Machine Learning, Acoustics and Ultrasonics, Computer science, Statistical noise, Natural sciences, Computer Science - Sound, Fluids & plasmas, Machine Learning (cs.LG), Normal distribution, Machine Learning, CLEAN-SC, Arts and Humanities (miscellaneous), Audio and Speech Processing (eess.AS), Physical sciences, Airframe, Broadband, FOS: Electrical engineering, electronic engineering, information engineering, Acoustics, Wind tunnel, Pattern recognition, Hierarchical clustering, Identification (information), Artificial intelligence, Clustering, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Beamforming is an imaging tool for the investigation of aeroacoustic phenomena and results in high dimensional data that is broken down to spectra by integrating spatial Regions Of Interest. This paper presents two methods that enable the automated identification of aeroacoustic sources in sparse beamforming maps and the extraction of their corresponding spectra to overcome the manual definition of Regions Of Interest. The methods are evaluated on two scaled airframe half-model wind-tunnel measurements and on a generic monopole source. The first relies on the spatial normal distribution of aeroacoustic broadband sources in sparse beamforming maps. The second uses hierarchical clustering methods. Both methods are robust to statistical noise and predict the existence, location, and spatial probability estimation for sources based on which Regions Of Interest are automatically determined. Preprint for JASA special issue on machine learning in acoustics, Revision 2
Published: 2021

45. On the validity of pre-trained transformers for natural language processing in the software engineering domain

Author: Julian von der Mosel, Alexander Trautsch, and Steffen Herbold
Subjects: Software Engineering (cs.SE), FOS: Computer and information sciences, Computer Science - Software Engineering, Computer Science - Machine Learning, Software, Machine Learning (cs.LG)
Abstract: Transformers are the current state-of-the-art of natural language processing in many domains and are gaining traction within software engineering research as well. Such models are pre-trained on large amounts of data, usually from the general domain. However, we only have a limited understanding regarding the validity of transformers within the software engineering domain, i.e., how good such models are at understanding words and sentences within a software engineering context and how this improves the state-of-the-art. Within this article, we shed light on this complex, but crucial issue. We compare BERT transformer models trained with software engineering data with transformers based on general domain data in multiple dimensions: their vocabulary, their ability to understand which words are missing, and their performance in classification tasks. Our results show that for tasks that require understanding of the software engineering context, pre-training with software engineering data is valuable, while general domain models are sufficient for general language understanding, also within the software engineering domain. Comment: Review status: submitted
Published: 2021
Full Text: View/download PDF

46. Repayment under Flexible Loan Contracts: Evidence based on High Frequency Data

Author: Antonia Grohmann, Steffen Herbold, and Friederike Lenel
Subjects: Flexibility (engineering), Finance, History, Evidence-based practice, Polymers and Plastics, Social sciences, 1. No poverty, Frequency data, Context (language use), Payment, Industrial and Manufacturing Engineering, Product (business), Loan, Economics and business, Cash flow, Economics, Business and International Management
Abstract: We study repayment and delinquency in the context of an alternative financial product that enables the purchase of a large asset (a solar panel home system) while offering complete repayment flexibility. Using a large administrative data set on daily repayment of 38,400 borrowers in Tanzania over 5.5 years, we perform unsupervised pattern analysis to classify repayment behavior. We show that borrowers with fluctuating incomes use the loan's flexibility more and that farmers in particular adjust their repayment to cash flow. We further find that use of flexibility is linked to repayment difficulties, yet does not automatically lead to default. Our results indicate that low-income households can finance large assets through innovative financial approaches that allow aligning payments to financial circumstances.
Published: 2021

47. Exploring the relationship between performance metrics and cost saving potential of defect prediction models

Author: Steffen Tunkel and Steffen Herbold
Subjects: Software Engineering (cs.SE), FOS: Computer and information sciences, Computer Science - Software Engineering, Software
Abstract: Context: Performance metrics are a core component of the evaluation of any machine learning model and used to compare models and estimate their usefulness. Recent work started to question the validity of many performance metrics for this purpose in the context of software defect prediction. Objective: Within this study, we explore the relationship between performance metrics and the cost saving potential of defect prediction models. We study whether performance metrics are suitable proxies to evaluate the cost saving capabilities and derive a theory for the relationship between performance metrics and cost saving potential. Methods: We measure performance metrics and cost saving potential in defect prediction experiments. We use a multinomial logit model, decision tree, and random forest to model the relationship between the metrics and the cost savings. Results: We could not find a stable relationship between cost savings and performance metrics. We attribute the lack of the relationship to the inability of performance metrics to account for the property that a small proportion of very large software artifacts are the main driver of the costs. Conclusion: Any defect prediction study interested in finding the best prediction model must consider cost savings directly, because no reasonable claims regarding the economic benefits of defect prediction can be made otherwise. Comment: Under review
Published: 2021
Full Text: View/download PDF

48. Static source code metrics and static analysis warnings for fine-grained just-in-time defect prediction

Author: Steffen Herbold, Jens Grabowski, and Alexander Trautsch
Subjects: Data collection, Source code, Java, Computer science, Software engineering, Engineering and technology, Static analysis, Software quality, Software metric, Software quality assurance, Information systems, Electrical engineering, electronic engineering, information engineering, Quality (business), Data mining
Abstract: Software quality evolution and predictive models to support decisions about resource distribution in software quality assurance tasks are an important part of software engineering research. Recently, a fine-grained just-in-time defect prediction approach was proposed which has the ability to find bug-inducing files within changes instead of only complete changes. In this work, we utilize this approach and improve it in multiple places: data collection, labeling and features. We include manually validated issue types, an improved SZZ algorithm which discards comments, whitespaces and refactorings. Additionally, we include static source code metrics as well as static analysis warnings and warning density derived metrics as features. To assess whether we can save cost we incorporate a specialized defect prediction cost model. To evaluate our proposed improvements of the fine-grained just-in-time defect prediction approach we conduct a case study that encompasses 38 Java projects, 492,241 file changes in 73,598 commits and spans 15 years. We find that static source code metrics and static analysis warnings are correlated with bugs and that they can improve the quality and cost saving potential of just-in-time defect prediction models.
Published: 2020

49. The SmartSHARK ecosystem for software repository mining

Author: Fabian Trautsch, Alexander Trautsch, Jens Grabowski, Benjamin Ledel, and Steffen Herbold
Subjects: FOS: Computer and information sciences, Computer science, Empirical process (process control model), Foundation (engineering), Software engineering, Engineering and technology, Detailed data, Data science, Software metric, Software Engineering (cs.SE), Computer Science - Software Engineering, Software, Information systems, Research based, Electrical engineering, electronic engineering, information engineering, Software repository
Abstract: Software repository mining is the foundation for many empirical software engineering studies. The collection and analysis of detailed data can be challenging, especially if data shall be shared to enable replicable research and open science practices. SmartSHARK is an ecosystem that supports replicable and reproducible research based on software repository mining. Submitted to ICSE 2020 Demo Track
Published: 2020

50. Expert Decision Support System for Aeroacoustic Classification from Deconvolved Beamforming Maps

Author: Armin Goudarzi, Steffen Herbold, and Carsten Spehr
Subjects: Beamforming, Aerospace & aeronautics, Decision support system, KI, Computer science, Engineering and technology, Natural sciences, Statistics & probability, Machine learning, Mechanical engineering, AI, Data mining, Mathematics, Clustering
Published: 2020
