8,169 results for "Software testing"
Search Results
2. Generation of algebraic data type values using evolutionary algorithms
- Author
Ballesteros, Ignacio, Benac-Earle, Clara, Mariño, Julio, Fredlund, Lars-Åke, and Herranz, Ángel
- Published
- 2025
- Full Text
- View/download PDF
3. ENZZ: Effective N-gram coverage assisted fuzzing with nearest neighboring branch estimation
- Author
Peng, Xi, Jia, Peng, Fan, Ximing, and Liu, Jiayong
- Published
- 2025
- Full Text
- View/download PDF
4. Testing infrastructures to support mobile application testing: A systematic mapping study
- Author
Kuroishi, Pedro Henrique, Paiva, Ana Cristina Ramada, Maldonado, José Carlos, and Vincenzi, Auri Marcelo Rizzo
- Published
- 2025
- Full Text
- View/download PDF
5. Enhancing logic-based testing with EvoDomain: A search-based domain-oriented test suite generation approach
- Author
Kalaee, Akram, Parsa, Saeed, and Mansouri, Zahra
- Published
- 2025
- Full Text
- View/download PDF
6. Predicting test failures induced by software defects: A lightweight alternative to software defect prediction and its industrial application
- Author
Madeyski, Lech and Stradowski, Szymon
- Published
- 2025
- Full Text
- View/download PDF
7. Performance regression testing initiatives: a systematic mapping
- Author
dos Santos, Luciana Brasil Rebelo, de Souza, Érica Ferreira, Endo, André Takeshi, Trubiani, Catia, Pinciroli, Riccardo, and Vijaykumar, Nandamudi Lankalapalli
- Published
- 2025
- Full Text
- View/download PDF
8. Improving seed quality with historical fuzzing results
- Author
Li, Yang, Zeng, Yingpei, Song, Xiangpu, and Guo, Shanqing
- Published
- 2025
- Full Text
- View/download PDF
9. A Comprehensive Review on Deep Learning System Testing
- Author
Li, Ying, Shan, Chun, Liu, Zhen, and Liao, Shuyan; in a volume edited by Zhu, Tianqing, Li, Jin, and Castiglione, Aniello
- Published
- 2025
- Full Text
- View/download PDF
10. Software System Testing Assisted by Large Language Models: An Exploratory Study
- Author
Augusto, Cristian, Morán, Jesús, Bertolino, Antonia, de la Riva, Claudio, and Tuya, Javier; in a volume edited by Menéndez, Héctor D., Bello-Orgaz, Gema, Barnard, Pepita, Bautista, John Robert, Farahi, Arya, Dash, Santanu, Han, DongGyun, Fortz, Sophie, and Rodriguez-Fernandez, Victor
- Published
- 2025
- Full Text
- View/download PDF
11. Checking Test Suite Efficacy Through Dual-Channel Techniques
- Author
Cezar Petrescu, Constantin, Smith, Sam, Butler, Alexis, and Kumar Dash, Santanu; in a volume edited by Menéndez, Héctor D., Bello-Orgaz, Gema, Barnard, Pepita, Bautista, John Robert, Farahi, Arya, Dash, Santanu, Han, DongGyun, Fortz, Sophie, and Rodriguez-Fernandez, Victor
- Published
- 2025
- Full Text
- View/download PDF
12. Verifying Components of Arm® Confidential Computing Architecture with ESBMC
- Author
Wu, Tong, Xiong, Shale, Manino, Edoardo, Stockwell, Gareth, and Cordeiro, Lucas C.; in a volume edited by Giacobazzi, Roberto, and Gorla, Alessandra
- Published
- 2025
- Full Text
- View/download PDF
13. Virtual Prototyping of Processor-Based Platforms
- Author
Kogel, Tim; edited by Chattopadhyay, Anupam
- Published
- 2025
- Full Text
- View/download PDF
14. Hybrid GWO-PSO for Path Coverage Testing
- Author
Ahsan, Fatma, and Anwer, Faisal; in a volume edited by Gonçalves, Paulo J. Sequeira, Singh, Pradeep Kumar, Tanwar, Sudeep, and Epiphaniou, Gregory
- Published
- 2025
- Full Text
- View/download PDF
15. Effective Model-Based Testing
- Author
Ruys, Theo C., and van der Bijl, Machiel; in a volume edited by Jansen, Nils, Junges, Sebastian, Kaminski, Benjamin Lucien, Matheja, Christoph, Noll, Thomas, Quatmann, Tim, Stoelinga, Mariëlle, and Volk, Matthias
- Published
- 2025
- Full Text
- View/download PDF
16. Study on Automatic Software Test Case Generation
- Author
Mulla, Nilofar, Jayakumar, Naveenkumar, Joshi, Shashank, and Godse, Deepali; in a volume edited by Kumar, Amit, Gunjan, Vinit Kumar, Senatore, Sabrina, and Hu, Yu-Chen
- Published
- 2025
- Full Text
- View/download PDF
17. Software reliability modeling under an uncertain testing environment.
- Author
Haque, Md. Asraful and Ahmad, Nesar
- Subjects
SOFTWARE reliability, COMPUTER software testing, SYSTEMS software, TIME management, PREDICTION models
- Abstract
The increased demand for high-quality software makes software reliability modeling an important research area. The number of failures that occur during the testing phase over a specified period of time is usually used to determine how reliable a software system is. Numerous reliability models have been suggested, addressing various aspects of the testing and debugging processes. The issue with those models is that they assume testing operations are conducted under controlled circumstances. The software testing environment, however, is not always consistent or ideal, and its efficacy can be influenced by many uncertain factors such as testing effort, testing coverage, testing skills, hardware and software configuration differences, time and budget constraints, and incomplete or inaccurate requirements. The proposed model incorporates a special parameter that represents the effectiveness of the testing environment. The model was validated on two distinct datasets and compared to six well-known models using four different goodness-of-fit criteria. Both datasets revealed that the suggested model outperforms the other models in prediction accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
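The model in entry 17 builds on standard software reliability growth modelling. As a reference point only, the sketch below fits a plain Goel-Okumoto mean value function to hypothetical weekly failure counts with SciPy and derives a conditional reliability estimate; the paper's extra testing-environment parameter is not reproduced, and all data and names here are illustrative.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical cumulative failure counts at the end of each testing week.
weeks = np.arange(1, 11, dtype=float)
failures = np.array([12, 21, 28, 34, 38, 41, 43, 45, 46, 47], dtype=float)

def go_mean_value(t, a, b):
    """Goel-Okumoto mean value function: expected cumulative failures by time t."""
    return a * (1.0 - np.exp(-b * t))

(a, b), _ = curve_fit(go_mean_value, weeks, failures, p0=[60.0, 0.2])

# Conditional reliability: probability of no failure in the next x time units.
t, x = 10.0, 1.0
reliability = np.exp(-(go_mean_value(t + x, a, b) - go_mean_value(t, a, b)))
print(f"a={a:.1f} expected total faults, b={b:.3f}, R({x}|{t})={reliability:.3f}")
```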
18. Impact in Software Testing Using Artificial Intelligence: A Literature Review.
- Author
Hamid, Mervat Mohamed and Waleed Ali, A. P. Aseel
- Subjects
ARTIFICIAL intelligence, TECHNOLOGICAL innovations, DEEP learning, INTERNET security, MACHINE learning
- Abstract
Software (SW) testing is increasingly critical in the software development life cycle, given increasing software complexity and society's growing reliance on software across various sectors. Software testing is a critical step; however, it often requires substantial time and financial resources, making it challenging to achieve comprehensive coverage. Artificial intelligence (AI), particularly through machine learning and reinforcement learning techniques, offers transformative solutions to these challenges, especially in white-box testing. This paper focuses on leveraging AI techniques - specifically Ensemble Learning and Swarm Intelligence algorithms - to optimize software testing. Swarm Intelligence, which imitates the collective behavior of natural organisms, is effective in identifying efficient paths within the source code. This paper reviews various methods such as bee algorithms, ant colony optimization, and particle swarm optimization, all of which enhance error detection speed and accuracy while minimizing resource consumption. When combined with Ensemble Learning, which aggregates results from multiple models, these AI techniques foster robust decision-making and comprehensive test coverage. This integrated approach not only addresses issues related to control and data flow but also significantly contributes to achieving a more efficient and reliable software testing process, ultimately reducing both the time and costs associated with testing. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
19. Solving the t-Wise Coverage Maximum Problem via Effective and Efficient Local Search-Based Sampling.
- Author
Luo, Chuan, Song, Jianping, Zhao, Qiyuan, Sun, Binqi, Chen, Junjie, Zhang, Hongyu, Lin, Jinkun, and Hu, Chunming
- Subjects
RELIABILITY in engineering, COMPUTER software testing, SYSTEMS software, SCALABILITY, METRIC system
- Abstract
To meet the increasing demand for customized software, highly configurable systems become essential in practice. Such systems offer many options to configure, and ensuring the reliability of these systems is critical. A widely used evaluation metric for testing these systems is t-wise coverage, where t represents testing strength, and its value typically ranges from 2 to 6. It is crucial to design effective and efficient methods for generating test suites that achieve high t-wise coverage. However, current state-of-the-art methods need to generate large test suites for achieving high t-wise coverage. In this work, we propose a novel method called LS-Sampling-Plus that can efficiently generate test suites with high t-wise coverage for 2 ≤ t ≤ 6 while being smaller in size compared to existing state-of-the-art methods. LS-Sampling-Plus incorporates many core algorithmic techniques, including two novel scoring functions, a dynamic mechanism for updating sampling probabilities, and a validity-guaranteed systematic search method. Our experiments on various practical benchmarks show that LS-Sampling-Plus can achieve higher t-wise coverage than current state-of-the-art methods, through building a test suite of the same size. Moreover, our evaluations indicate the effectiveness of all core algorithmic techniques of LS-Sampling-Plus. Furthermore, LS-Sampling-Plus exhibits better scalability and fault detection capability than existing state-of-the-art methods. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
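To make the metric in entry 19 concrete, here is a minimal sketch of how t-wise coverage is computed for t = 2 over boolean configuration options. It illustrates only the coverage measure itself, not the LS-Sampling-Plus search; the suite and option count are made up.

```python
from itertools import combinations

def pairwise_coverage(suite, n_options):
    """Fraction of all (option pair, value pair) combinations covered (t = 2)."""
    covered = {(i, j, test[i], test[j])
               for test in suite
               for i, j in combinations(range(n_options), 2)}
    # 4 boolean value pairs exist for each pair of options.
    total = len(list(combinations(range(n_options), 2))) * 4
    return len(covered) / total

suite = [(0, 0, 0, 0), (1, 1, 1, 1), (0, 1, 0, 1), (1, 0, 1, 0)]
print(f"2-wise coverage: {pairwise_coverage(suite, 4):.2%}")
```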
20. An Empirical Study of Testing Machine Learning in the Wild.
- Author
Openja, Moses, Khomh, Foutse, Foundjem, Armstrong, Jiang, Zhen Ming, Abidi, Mouna, and Hassan, Ahmed E.
- Subjects
COMPUTER vision, WORKFLOW software, MACHINE learning, CONFORMANCE testing, SYSTEMS software, DEEP learning
- Abstract
Background: Recently, machine and deep learning (ML/DL) algorithms have been increasingly adopted in many software systems. Due to their inductive nature, ensuring the quality of these systems remains a significant challenge for the research community. Traditionally, software systems were constructed deductively, by writing explicit rules that govern the behavior of the system as program code. However, ML/DL systems infer rules from training data, i.e., they are generated inductively. Recent research in ML/DL quality assurance has adapted concepts from traditional software testing, such as mutation testing, to improve reliability. However, it is unclear if these proposed testing techniques are adopted in practice, or if new testing strategies have emerged from real-world ML deployments. There is little empirical evidence about the testing strategies. Aims: To fill this gap, we perform the first fine-grained empirical study on ML testing in the wild to identify the ML properties being tested, the testing strategies, and their implementation throughout the ML workflow. Method: We conducted a mixed-methods study to understand ML software testing practices. We analyzed test files and cases from 11 open-source ML/DL projects on GitHub. Using open coding, we manually examined the testing strategies, tested ML properties, and implemented testing methods to understand their practical application in building and releasing ML/DL software systems. Results: Our findings reveal several key insights: (1) The most common testing strategies, accounting for less than 40%, are Grey-box and White-box methods, such as Negative Testing, Oracle Approximation, and Statistical Testing. (2) A wide range of 17 ML properties are tested, out of which only 20% to 30% are frequently tested, including Consistency, Correctness, and Efficiency. (3) Bias and Fairness is more tested in Recommendation (6%) and Computer Vision (CV) (3.9%) systems, while Security and Privacy is tested in CV (2%), Application Platforms (0.9%), and NLP (0.5%). (4) We identified 13 types of testing methods, such as Unit Testing, Input Testing, and Model Testing. Conclusions: This study sheds light on the current adoption of software testing techniques and highlights gaps and limitations in existing ML testing practices. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
21. Understanding Test Convention Consistency as a Dimension of Test Quality.
- Author
Robillard, Martin P., Nassif, Mathieu, and Sohail, Muhammad
- Subjects
COMPUTER software testing, TEST validity, DESCRIPTIVE statistics, TEST design, CATALOGS
- Abstract
Unit tests must be readable to help developers understand and evolve production code. Most existing test quality metrics assess test code's ability to detect bugs. Few metrics focus on test code's readability. One standard approach to improve readability is the consistent application of conventions. We investigated test convention consistency as a dimension of test quality. We formalized test suite consistency as the extent to which alternatives are used within a code base and introduce two complementary metrics to capture this extent. We elaborated a catalog of over 30 test conventions for the Java language organized in 10 convention classes that group mutual alternatives. We developed tool support to detect occurrences of conventions, compute consistency metrics over a test suite, and view occurrences of conventions in the corresponding code. We applied our tools to study the consistency of the test suites of 20 large open source Java projects. The study validates the design of the test convention classes, provides descriptive statistics on the range of consistency values for 10 different convention classes, and enables us to link observed changes in consistency values to specific events in the change history of our target systems, thus providing evidence of the construct validity of the metrics. We conclude that analyzing test suite consistency via static analysis shows promise as a practical approach to help improve test suite quality. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
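Entry 21 measures the extent to which mutually exclusive alternatives are used within a code base. Below is a minimal sketch of one plausible consistency measure, the share of the dominant alternative among all detected occurrences; the paper's two actual metrics are not specified here, and the occurrence data are hypothetical.

```python
from collections import Counter

def consistency(occurrences):
    """Consistency of one convention class: share of the dominant alternative
    among all occurrences of mutually exclusive alternatives (1.0 = fully consistent)."""
    counts = Counter(occurrences)
    return max(counts.values()) / sum(counts.values())

# Hypothetical detections of assertion-style alternatives across a test suite.
styles = ["assertEquals"] * 120 + ["assertThat"] * 30
print(f"assertion-style consistency: {consistency(styles):.2f}")
```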
22. Estimating software reliability using size-biased modelling.
- Author
Dey, Soumen and Chakraborty, Ashis Kumar
- Subjects
SOFTWARE reliability, COMPUTER software development, BAYESIAN analysis, COMPUTER software management, LATENT variables
- Abstract
Software testing is an important step in software development where inputs are administered repeatedly to detect bugs present in the software. In this paper, we have considered the estimation of the total number of bugs and software reliability as a size-biased sampling problem by introducing the concept of eventual bug size as a latent variable. We have developed a Bayesian generalised linear mixed model (GLMM) using software testing detection data to estimate software reliability and stopping phase. The model uses a size-biased approach in which the probability of detecting a bug is an increasing function of the eventual size of the bug, which serves as an index for the potential number of inputs that may eventually pass through the bug. We have tested the sensitivity of the reliability estimates by varying the number of inputs and detection probability via a simulation study and have found that the key parameters could be accurately estimated. Further, we have applied our model to two empirical data sets – one from a commercial software system and the other from an ISRO launch mission software testing data set. The hierarchical modelling approach provides a unified modelling framework that may find applications in other fields (e.g., hydrocarbon exploration) apart from software management. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
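A small simulation makes the size-biased sampling idea of entry 22 concrete: if the probability of detecting a bug grows with its eventual size, the detected bugs are larger on average than the underlying population. The functional form 1 - (1 - theta)^s and all numbers below are assumptions for illustration, not the paper's fitted model.

```python
import random

random.seed(1)

# Hypothetical eventual sizes of latent bugs: the number of inputs that
# could eventually pass through each bug.
sizes = [random.randint(1, 50) for _ in range(200)]
theta = 0.02  # assumed per-input detection probability

# Size-biased detection: a bug of eventual size s is found with
# probability 1 - (1 - theta)**s, an increasing function of s.
detected = [s for s in sizes if random.random() < 1 - (1 - theta) ** s]
print(f"{len(detected)} of {len(sizes)} bugs detected; "
      f"mean detected size={sum(detected)/len(detected):.1f} "
      f"vs population mean={sum(sizes)/len(sizes):.1f}")
```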
23. Applying the Enhanced Snow Ablation Optimizer to Generate Data Flow-Based Test Data.
- Author
Jiao, Chongyang, Zhou, Qinglei, Zhang, Wenning, and Zhang, Chunyan
- Subjects
METAHEURISTIC algorithms, LATIN hypercube sampling, COMPUTER software quality control, SEARCH algorithms, AUTOMATION software, COMPUTER software testing
- Abstract
Software quality can be effectively ensured by software testing. The creation of test data is a key component of software testing automation. One unresolved issue is how to automatically create test data sets for the data flow coverage criterion. Search-based software testing (SBST) is a technique that employs meta-heuristic search algorithms to generate test data. In this paper, a method of automatic test data generation for data flow coverage criterion based on the enhanced snow ablation optimizer (ESAO) is proposed. First, the snow ablation optimizer (SAO) is enhanced to improve the efficiency of the algorithm through the Latin hypercube sampling (LHS) initialization strategy and warming strategy. Second, the construction of the fitness function is considered in terms of both definition node and use node. Third, the data flow-based test cases are automatically generated based on the ESAO. This method of generating test cases that cover all definition-use pairs (DUPs) improves the efficiency and coverage of test case generation, and thus improves the efficiency of software testing. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
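Entry 23 initializes its search population with Latin hypercube sampling. The sketch below shows a plain LHS initializer, with exactly one sample per stratum in each dimension; the ESAO's warming strategy and fitness construction are not reproduced, and the bounds are arbitrary.

```python
import random

def latin_hypercube(n, dim, low, high):
    """n points in [low, high]^dim with one point per stratum in each dimension."""
    points = [[0.0] * dim for _ in range(n)]
    for d in range(dim):
        strata = list(range(n))
        random.shuffle(strata)
        for i in range(n):
            u = (strata[i] + random.random()) / n  # uniform draw inside the stratum
            points[i][d] = low + u * (high - low)
    return points

# Initial population for a search over three input variables.
population = latin_hypercube(n=10, dim=3, low=-100.0, high=100.0)
print(population[0])
```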
24. Risk Prioritizing with Weighted Failure Mode and Effects Analysis and Fuzzy Step-Wise Weight Assessment Ratio Analysis: An Application Software Service Provider Company in the Defense Industry.
- Author
Korkusuz Polat, Tulay and Pamuk Candan, Işılay
- Subjects
FAILURE mode & effects analysis, COMPUTER software industry, COMPUTER software quality control, APPLICATION software, SOFTWARE reliability
- Abstract
With the development of technology, the need for software and software products to manage, control, and develop activities in many sectors is increasing daily. In order to create suitable software that will meet the needs of businesses and customers, the software application must be tested in detail before reaching the end user. For this reason, software testing processes are gaining importance in software development activities. This article discusses which errors are critical to resolve, in complex situations, for the reliability and quality of the software product, and how error types relate to one another. In this study, the classical FMEA method was first used to identify and prioritize errors in an ongoing project of a company that provides software services in the defense industry. Later, to eliminate the shortcomings of the classical FMEA method, a new model, the weighted FMEA method (which calculates the risk priority score with five sub-severity components), was developed and applied. In the newly developed weighted FMEA method, the weights were determined by the fuzzy SWARA (Step-Wise Weight Assessment Ratio Analysis) method, since the weights of the severity subcomponents are not all equal. The risk priority number (RPN) of each error type was calculated using both classical FMEA and weighted FMEA. Since the RPNs calculated with weighted FMEA are based on more subcomponents, the chance that different error types receive the same RPN is much lower than with classical FMEA. This indicates that the RPNs calculated with weighted FMEA result from a deeper analysis. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
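The weighted RPN computation in entry 24 is straightforward to illustrate. In this sketch the five severity subcomponent weights stand in for fuzzy-SWARA output and are purely hypothetical, as are the scores.

```python
# Hypothetical fuzzy-SWARA weights for five severity subcomponents (sum to 1).
SEVERITY_WEIGHTS = [0.30, 0.25, 0.20, 0.15, 0.10]

def weighted_rpn(severity_scores, occurrence, detection):
    """RPN = weighted severity x occurrence x detection; the five severity
    subcomponents are aggregated with the SWARA-derived weights."""
    severity = sum(w * s for w, s in zip(SEVERITY_WEIGHTS, severity_scores))
    return severity * occurrence * detection

# One error type with subcomponent severities on a 1-10 scale.
print(f"{weighted_rpn([8, 6, 7, 5, 9], occurrence=4, detection=3):.1f}")
```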
25. Enhancing IOT based software defect prediction in analytical data management using war strategy optimization and Kernel ELM.
- Author
Zada, Islam, Alshammari, Abdullah, Mazhar, Ahmad A., Aldaeej, Abdullah, Qasem, Sultan Noman, Amjad, Kashif, and Alkhateeb, Jawad H.
- Subjects
COMPUTER software quality control, OPTIMIZATION algorithms, MACHINE learning, ARTIFICIAL intelligence, SOURCE code
- Abstract
The existence of software problems in IoT applications caused by insufficient source code, poor design, mistakes, and insufficient testing poses a serious risk to functioning and user expectations. Prior to software deployment, thorough testing and quality assurance methods are crucial to reducing these risks. This study advances the field of IoT-based software quality assessment while also showcasing the viability and benefits of incorporating AI methods into Software Defect Prediction (SDP), particularly the Kernel-based Extreme Learning Machine (KELM) and the War Strategy Optimisation (WSO) algorithm. These efforts are essential to maintaining the dependability and performance of IoT applications, given the IoT's rising significance in our linked world. The chosen keywords, such as software defect prediction, IoT, KELM, and WSO, capture the multidimensional nature of this novel technique and serve as an important source of information for upcoming studies in this area. One of the main issues to be addressed in overcoming the difficulties of developing IoT-based software is how time- and resource-consuming it is to test the programme to ensure its effectiveness. SDP assumes a crucial function in this context by locating flaws in software components. Manual defect analysis grows more inefficient and time-consuming as software projects become more complicated. This research introduces a fresh approach to SDP that utilises artificial intelligence (AI) to address these issues. The suggested methodology includes the WSO algorithm, which is used to optimise classifier hyperparameters, together with a KELM classifier for SDP; the main objective is to improve software defect prediction accuracy. This innovative combination, grounded in previous studies [1, 2], promises superior capabilities in predicting software defects. Notably, it represents the inaugural endeavour to integrate the WSO algorithm with KELM for SDP, introducing a unique and advanced approach to software quality assessment. The proposed methodology undergoes rigorous evaluation using a diverse set of real-world software project datasets, including the renowned PROMISE dataset and various open-source datasets coded in Java. Performance assessment is conducted through multiple metrics, including Efficiency, Accuracy, Reliability, Sensitivity, and F1-score, collectively illuminating the effectiveness of this approach. The outcome of our experiments underscores the potency of KELM coupled with WSO in enhancing the accuracy of SDP and consequently elevating defect detection efficiency within software components. Remarkably, our methodology consistently outperforms existing techniques, registering an average increase of over 90% in accuracy across the parameters examined. This promising result underscores the potential of our approach to effectively tackle the challenges associated with IoT-based software development and software defect prediction. In conclusion, this study significantly contributes to the field of IoT-based software quality assessment, introducing an innovative methodology that substantially bolsters accuracy and reliability in SDP. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
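For reference, kernel ELM itself has a simple closed form: the output weights solve (K + I/C) beta = y for a kernel matrix K. The sketch below implements that with an RBF kernel on made-up defect data; the WSO hyperparameter search described in entry 25 is omitted, and all values are illustrative.

```python
import numpy as np

def rbf(A, B, gamma=1e-3):
    """RBF kernel matrix between the row vectors of A and B."""
    d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d)

def kelm_train(X, y, C=10.0, gamma=1e-3):
    """Kernel ELM output weights in closed form: (K + I/C) beta = y."""
    K = rbf(X, X, gamma)
    return np.linalg.solve(K + np.eye(len(X)) / C, y)

def kelm_predict(X_train, beta, X_new, gamma=1e-3):
    return rbf(X_new, X_train, gamma) @ beta

# Hypothetical module metrics (e.g., LOC, complexity); y = 1 defective, 0 clean.
X = np.array([[10.0, 2.0], [50.0, 8.0], [12.0, 3.0], [60.0, 9.0]])
y = np.array([0.0, 1.0, 0.0, 1.0])
beta = kelm_train(X, y)
print(kelm_predict(X, beta, np.array([[55.0, 7.0]])))  # high score = likely defective
```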
26. AbstractTrace: The Use of Execution Traces to Cluster, Classify, Prioritize, and Optimize a Bloated Test Suite.
- Author
Al-Sharif, Ziad A. and Jeffery, Clinton L.
- Subjects
COMPUTER software quality control, SOFTWARE engineering, MACHINE learning, CLUSTER analysis (Statistics)
- Abstract
Due to the incremental and iterative nature of the software testing process, a test suite may become bloated with redundant, overlapping, and similar test cases. This paper aims to optimize a bloated test suite by employing an execution trace that encodes runtime events into a sequence of characters forming a string. A dataset of strings, each of which represents the code coverage and execution behavior of a test case, is analyzed to identify similarities between test cases. This facilitates the de-bloating process by providing a formal mechanism to identify, remove, and reduce extra test cases without compromising software quality. This form of analysis allows for the clustering and classification of test cases based on their code coverage and similarity score. This paper explores three levels of execution traces and evaluates different techniques to measure their similarities. Test cases with the same code coverage should generate the exact string representation of runtime events. Various string similarity metrics are assessed to find the similarity score, which is used to classify, detect, and rank test cases accordingly. Additionally, this paper demonstrates the validity of the approach with two case studies. The first shows how to classify the execution behavior of various test cases, which can provide insight into each test case's internal behavior. The second shows how to identify similar test cases based on their code coverage. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
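Entry 26 compares execution traces encoded as strings. A minimal sketch using difflib's similarity ratio to drop near-duplicate test cases follows; the event encodings and the 0.8 threshold are illustrative assumptions, not the paper's settings.

```python
from difflib import SequenceMatcher

# Hypothetical event encodings: E=enter, C=call, L=loop iteration, R=return, X=exception.
traces = {
    "test_add_small": "ECCLLR",
    "test_add_large": "ECCLLLLR",
    "test_div_zero":  "ECXR",
}

def similarity(a, b):
    return SequenceMatcher(None, a, b).ratio()

# Greedy de-bloating: keep a test only if no already-kept trace is near-identical.
kept = []
for name, trace in traces.items():
    if all(similarity(trace, traces[k]) < 0.8 for k in kept):
        kept.append(name)
print("kept:", kept)
```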
27. A Fuzzing Tool Based on Automated Grammar Detection.
- Author
Song, Jia and Alves-Foss, Jim
- Subjects
COMPUTER security vulnerabilities, COMPUTER software security, COMPUTER software development, SOURCE code, COMPUTER software quality control
- Abstract
Software testing is an important step in the software development life cycle to ensure the quality and security of software. Fuzzing is a security testing technique that finds vulnerabilities automatically without accessing the source code. We built a fuzzer, called JIMA-Fuzzing, which is an effective fuzzing tool that utilizes grammar detected from sample input. Based on the detected grammar, JIMA-Fuzzing selects a portion of the valid user input and fuzzes that portion. For example, the tool may greatly increase the size of the input, truncate the input, replace numeric values with new values, replace words with numbers, etc. This paper discusses how JIMA-Fuzzing works and shows the evaluation results after testing against the DARPA Cyber Grand Challenge (CGC) dataset. JIMA-Fuzzing is capable of extracting grammar from sample input files, meaning that it does not require access to the source code to generate effective fuzzing files. This feature allows it to work with proprietary or non-open-source programs and significantly reduces the effort needed from human testers. In addition, compared to fuzzing tools guided with symbolic execution or taint analysis, JIMA-Fuzzing takes much less computing power and time to analyze sample input and generate fuzzing files. However, the limitation is that JIMA-Fuzzing relies on good sample inputs and works primarily on programs that require user interaction/input. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
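The mutation step described in entry 27 (enlarge the input, truncate it, replace numeric values, swap words for numbers) can be sketched directly; the grammar-detection stage that decides which portion of a valid input to fuzz is not reproduced here, and the sample input is made up.

```python
import random
import re

def mutate(sample: str) -> str:
    """Apply one of the input mutations described above to a valid sample."""
    choice = random.choice(["enlarge", "truncate", "numbers", "words_to_numbers"])
    if choice == "enlarge":
        return sample * random.randint(10, 100)          # greatly increase the size
    if choice == "truncate":
        return sample[: random.randint(0, len(sample))]  # cut the input short
    if choice == "numbers":
        return re.sub(r"\d+", lambda m: str(random.randint(-2**31, 2**31)), sample)
    return re.sub(r"[A-Za-z]+", lambda m: str(random.randint(0, 9999)), sample)

print(mutate("SET width 640 height 480"))
```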
28. THE IMPORTANCE OF SOFTWARE TESTING FOR QUALITY ASSURANCE IN INFORMATION TECHNOLOGY (IT) PROJECTS - A COMPARATIVE ANALYSIS OF TESTING METHODOLOGIES AT A COMPANY.
- Author
Nayara Bettoni, Fernanda, Florian, Fabiana, and Mirella Farina, Renata
- Subjects
INFORMATION technology, COMPUTER software testing, COMPUTER software quality control, PRODUCT quality, COMPARATIVE studies
- Published
- 2024
- Full Text
- View/download PDF
29. Research on evaluation methods for static-analysis-oriented software testing tools.
- Author
曾福萍, 王泽宇, 李宇佳, and 王杰凯
- Published
- 2024
- Full Text
- View/download PDF
30. How to test AI: A case study of a machine learning-based trading system.
- Author
Lewandowska, Olga and Mai, Edgar
- Subjects
MACHINE learning, INFORMATION technology, ALGORITHMIC bias, CREDIT scoring systems, INTEREST rates, BOND prices, STOCK price forecasting
- Abstract
The article discusses the challenges of testing AI and machine learning-based trading systems. It emphasizes the importance of testing in the development and deployment of ML models, especially in the financial sector. The text highlights various testing techniques, such as A/B testing, metamorphic testing, and parallel testing, to ensure the reliability and accuracy of AI systems. Additionally, it explores the role of MLOps in managing and monitoring AI models in production. The article also discusses the future role of testers in AI testing and the significance of explainability in ML models. [Extracted from the article]
- Published
- 2024
- Full Text
- View/download PDF
31. PSTR: A Test Case Reuse Method Based on Path Similarity
- Author
Xinyue Xu, Sinong Chen, Zhonghao Guo, and Xiangxian Chen
- Subjects
Software testing, test case reuse, path similarity, control flow analysis, unit testing, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Software testing plays a critical role throughout the software development lifecycle. In modern development practices, frequent and incremental updates to code are common, and each update typically necessitates the generation of new test cases. While the generation of test cases can be automated, the creation and adjustment of test oracles (expected outputs or behaviors) still require significant manual effort. This process is time-consuming and labor-intensive, particularly when code changes are minor, making repeated oracle creation inefficient. Meanwhile, test cases and their associated oracles from previous versions, having been refined and validated through multiple iterations, are highly reliable and valuable. However, due to the changes in execution paths introduced by code updates, it is unclear which old test cases remain applicable to the new version, leaving these valuable resources underutilized. Consequently, the ability to effectively identify and reuse applicable test cases and their oracles from previous versions is of paramount importance. Currently, research on test case reuse in unit testing is relatively sparse. Existing approaches often focus on software requirements and pay limited attention to code-level changes. Even methods that consider code changes fail to ensure precision in selecting test cases or maintain high coverage in the reused test set. To address these challenges, this paper proposes a novel approach, Path Similarity-based Test case Reuse (PSTR), tailored for the characteristics of unit testing. PSTR significantly enhances testing efficiency and reduces costs by accurately identifying reusable test cases through path similarity analysis. The proposed PSTR method consists of three core modules: static analysis, dynamic analysis, and reuse. First, static analysis is performed on both the old and new code versions to generate their respective sets of static paths. Next, the test cases from the old version are executed on its code, establishing a mapping between each test case and its covered execution paths. Finally, the path similarity algorithm compares paths between the old and new versions, allowing test cases associated with the most similar paths to be reused for the new version. For cases where direct reuse is not possible, the method provides clear guidance to assist testers in completing subsequent tasks more efficiently. For evaluation, a custom dataset derived from LeetCode problem solutions was used, and the PSTR method was compared with the classical ATR (All Test case Reuse) method. The results demonstrated the superiority of PSTR, achieving a precision rate of 95.64%, which is 14.87% higher than ATR. Additionally, while PSTR reused an average of 77.81% of test cases, it maintained a path coverage rate comparable to ATR, which reused 100% of test cases. Metrics such as F1-score, recall, and misuse rate further highlighted PSTR’s advantages in accurate and effective test case reuse. Specifically, PSTR achieved comparable coverage with fewer reused test cases while significantly reducing the misuse rate. In summary, as an innovative test case reuse method, PSTR has demonstrated remarkable effectiveness and practicality. It reduces resource consumption, improves testing efficiency, and lowers testing costs while maintaining high coverage standards. The findings underscore PSTR’s potential to make software testing more efficient and cost-effective.
- Published
- 2025
- Full Text
- View/download PDF
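Entry 31's core step, matching each new-version path to the most similar old-version path, can be sketched with a generic sequence similarity. The basic-block paths, the 0.7 reuse threshold, and the similarity measure below are stand-ins, not PSTR's actual algorithm.

```python
from difflib import SequenceMatcher

def path_similarity(old_path, new_path):
    """Similarity of two static paths, each a sequence of basic-block ids."""
    return SequenceMatcher(None, old_path, new_path).ratio()

old_tests = {"t1": ["entry", "B1", "B2", "exit"], "t2": ["entry", "B1", "B3", "exit"]}
new_paths = [["entry", "B1", "B2", "B2a", "exit"], ["entry", "B4", "exit"]]

for path in new_paths:
    best = max(old_tests, key=lambda t: path_similarity(old_tests[t], path))
    score = path_similarity(old_tests[best], path)
    print(path, "->", (best, round(score, 2)) if score >= 0.7 else "no reusable test")
```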
32. Cypress Copilot: Development of an AI Assistant for Boosting Productivity and Transforming Web Application Testing
- Author
Suresh Babu Nettur, Shanthi Karpurapu, Unnati Nettur, and Likhit Sagar Gajja
- Subjects
Agile software development, behavior driven development, large language model, machine learning, prompt engineering, software testing, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
In today’s fast-paced software development environment, Agile methodologies demand rapid delivery and continuous improvement, making automated testing essential for maintaining quality and accelerating feedback loops. Our study addresses the challenges of developing and maintaining automation code for web-based application testing. In this paper, we propose a novel approach that leverages large language models (LLMs) and a novel prompt technique, few-shot chain, to automate code generation for web application testing. We chose the Behavior-Driven Development (BDD) methodology owing to its advantages and selected the Cypress tool for automating web application testing, as it is one of the most popular and rapidly growing frameworks in this domain. We comprehensively evaluated various OpenAI models, including GPT-4-Turbo, GPT-4o, and GitHub Copilot, using zero-shot and few-shot chain prompt techniques. Furthermore, we extensively validated with a vast set of test cases to identify the optimal approach. Our results indicate that the Cypress automation code generated by GPT-4o using a few-shot chained prompt approach excels in generating complete code for each test case, with fewer empty methods and improved syntactical accuracy and maintainability. Based on these findings, we developed a novel open-source Visual Studio Code (IDE) extension, “Cypress Copilot” utilizing GPT-4o and a few-shot chain prompt technique, which has shown promising results. Finally, we validate the Cypress Copilot tool by generating automation code for end-to-end web tests, demonstrating its effectiveness in testing various web applications and its ability to streamline development processes. More importantly, we are releasing this tool to the open-source community, as it has the potential to be a promising partner in enhancing productivity in web application automation testing.
- Published
- 2025
- Full Text
- View/download PDF
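A few-shot chained prompt, as used in entry 32, is essentially an instruction followed by worked scenario-to-code examples and then the new scenario. The sketch below only assembles such a prompt string; the example pair and wording are hypothetical, and no model call is shown.

```python
# Hypothetical BDD-scenario -> Cypress-code example pairs used as few-shot context.
FEW_SHOT_EXAMPLES = [
    ("Given I am on the login page\n"
     "When I submit valid credentials\n"
     "Then I should see the dashboard",
     "describe('login', () => { it('shows dashboard', () => { /* cy.visit('/login'); ... */ }); });"),
]

def few_shot_chain_prompt(new_scenario: str) -> str:
    """Chain the instruction, worked examples, and the new scenario into one prompt."""
    parts = ["You translate BDD scenarios into Cypress test code."]
    for scenario, code in FEW_SHOT_EXAMPLES:
        parts.append(f"Scenario:\n{scenario}\nCypress:\n{code}")
    parts.append(f"Scenario:\n{new_scenario}\nCypress:")
    return "\n\n".join(parts)

print(few_shot_chain_prompt("Given the cart has one item\nWhen I check out\nThen I see a confirmation"))
```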
34. Code change and smell techniques for regression test selection.
- Author
Mori, Allan, Paiva, Ana C. R., and Souza, Simone R. S.
- Abstract
Regression testing is a selective retesting of a system or component to verify that modifications have not induced unintended effects and that the system or component maintains compliance with the specified requirements. However, it can be time-consuming and resource-intensive, especially for large systems. Regression testing selection techniques can help address this issue by selecting a subset of test cases to run. The Change Based technique selects a subset of the existing test cases and executes modified classes. Besides effectively reducing the test suite, this technique may reduce the capability of revealing faults. From this perspective, code smells are known to identify poor design and software quality issues. Some works have explored the association between smells and faults with some promising results. Inspired by these results, we propose combining code change and smell to select regression tests and present eight techniques. Additionally, we developed the Regression Testing Selection Tool (RTST) to automate the selection process using these techniques. We empirically evaluated the approach in Defects4J projects by comparing the techniques’ effectiveness with the Change Based and Class Firewall as a baseline. The results show that the Change and Smell Intersection Based technique achieves the highest reduction rate in the test suite size but with less class coverage. On the other hand, Change and Smell Firewall technique achieves the lowest test suite size reduction with the highest fault detection effectiveness test cases, suggesting the combination of smells and changed classes can potentially find more bugs. The Smell Based technique provides a comparable class coverage to the code change and smell approach. Our findings indicate opportunities for improving the efficiency and effectiveness of regression testing and highlight that software quality should be a concern throughout the software evolution. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
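The intersection-based selection in entry 34 reduces to set operations once each test is mapped to the classes it exercises. A minimal sketch under that assumption follows; the mapping, smell detection, and the paper's other techniques are not reproduced.

```python
def select_regression_tests(tests_to_classes, changed, smelly, technique="intersection"):
    """Select tests touching classes that are both changed and smelly
    (intersection-based); a union variant would target either set."""
    targets = changed & smelly if technique == "intersection" else changed | smelly
    return {t for t, classes in tests_to_classes.items() if classes & targets}

tests_to_classes = {"TestOrder": {"Order", "Cart"}, "TestUser": {"User"}}
print(select_regression_tests(tests_to_classes,
                              changed={"Order"},
                              smelly={"Order", "User"}))
```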
35. Quality attributes of test cases and test suites – importance & challenges from practitioners’ perspectives.
- Author
Tran, Huynh Khanh Vi, Ali, Nauman bin, Unterkalmsteiner, Michael, Börstler, Jürgen, and Chatzipetrou, Panagiota
- Abstract
The quality of the test suites and the constituent test cases significantly impacts confidence in software testing. While research has identified several quality attributes of test cases and test suites, there is a need for a better understanding of their relative importance in practice. We investigate practitioners’ perceptions regarding the relative importance of quality attributes of test cases and test suites and the challenges that they face in ensuring the perceived important quality attributes. To capture the practitioners’ perceptions, we conducted an industrial survey using a questionnaire based on the quality attributes identified in an extensive literature review. We used a sampling strategy that leverages LinkedIn to draw a large and heterogeneous sample of professionals with experience in software testing. We collected 354 responses from practitioners with a wide range of experience (from less than one year to 42 years of experience). We found that the majority of practitioners rated Fault Detection, Usability, Maintainability, Reliability, and Coverage to be the most important quality attributes. Resource Efficiency, Reusability, and Simplicity received the most divergent opinions, which, according to our analysis, depend on the software-testing contexts. Also, we identified common challenges that apply to the important attributes, namely inadequate definition, lack of useful metrics, lack of an established review process, and lack of external support. The findings point out where practitioners actually need further support with respect to achieving high-quality test cases and test suites under different software testing contexts. Hence, the findings can serve as a guideline for academic researchers when looking for research directions on the topic. Furthermore, the findings can be used to encourage companies to provide more support to practitioners to achieve high-quality test cases and test suites. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
36. GAMFLEW: serious game to teach white-box testing.
- Author
Silva, Mateus, Paiva, Ana C. R., and Mendes, Alexandra
- Abstract
Software testing plays a fundamental role in software engineering, involving the systematic evaluation of software to identify defects, errors, and vulnerabilities from the early stages of the development process. Education in software testing is essential for students and professionals, as it promotes quality and favours the construction of reliable software solutions. However, motivating students to learn software testing may be a challenge. To overcome this, educators may incorporate some strategies into the teaching and learning process, such as real-world examples, interactive learning, and gamification. Gamification aims to make learning software testing more engaging for students by creating a more enjoyable experience. One approach that has proven effective is to use serious games. This paper presents a novel serious game to teach white-box testing test case design techniques, named GAMFLEW (GAMe For LEarning White-box testing). It describes the design, game mechanics, and its implementation. It also presents a preliminary evaluation experiment with students to assess the usability, learnability, and perceived problems, among other aspects. The results obtained are encouraging. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
37. Chat GPT Impact Analysis on API Testing: A Controlled Experiment
- Author
Yehezkiel David Setiawan, Laurentius Gusti Ontoseno Panata Yudha, Yovie Adhisti Mulyono, Veronica Marcella Angela Simalango, and Oscar Karnalim
- Subjects
api development, api platform, chatgpt, controlled experiment, software testing, Electronic computers. Computer science, QA75.5-76.95
- Abstract
This research examines the impact of ChatGPT as a learning aid for students in API testing. A controlled experiment compared two groups: one utilizing ChatGPT and the other relying on traditional documentation. The findings indicate that participants using ChatGPT scored significantly higher on both exam tests than the documentation group, despite taking longer to complete tasks. Statistical analysis using t-tests confirmed these differences as significant. Post-test surveys revealed an increase in participants' confidence and effectiveness in understanding and using APIs after interacting with ChatGPT. However, potential downsides, such as over-reliance on ChatGPT and insufficient deep conceptual understanding, were also observed. The results suggest that while ChatGPT can greatly enhance the quality of learning and productivity in API-related tasks, users must balance AI assistance with independent problem-solving skills. This study underscores the potential of ChatGPT as a valuable educational tool, provided it is integrated thoughtfully into the learning process.
- Published
- 2024
- Full Text
- View/download PDF
38. MET-MAPF: A Metamorphic Testing Approach for Multi-Agent Path Finding Algorithms.
- Author
Zhang, Xiao-Yi, Liu, Yang, Arcaini, Paolo, Jiang, Mingyue, and Zheng, Zheng
- Subjects
COMPUTER software testing, TEST systems, MULTIAGENT systems, SYSTEMS software, ALGORITHMS
- Abstract
The Multi-Agent Path Finding (MAPF) problem, i.e., the scheduling of multiple agents to reach their destinations, has been widely investigated. Testing MAPF systems is challenging, due to the complexity and variety of scenarios and the agents' distribution and interaction. Moreover, MAPF testing suffers from the oracle problem, i.e., it is not always clear whether a test shows a failure or not. Indeed, only considering whether the agents reach their destinations without collision is not sufficient. Other properties related to the 'quality' of the generated paths should be assessed, e.g., an agent should not follow an unnecessarily long path. To tackle this issue, this article proposes MET-MAPF, a Metamorphic Testing approach for MAPF systems. We identified 10 Metamorphic Relations (MRs) that a MAPF system should guarantee, designed over the environment in which agents operate, the behaviour of the single agents and the interactions among agents. Starting from the different MRs, MET-MAPF automatically generates test cases addressing them, so possibly exposing different types of failures. Experimental results show that MET-MAPF can indeed find MR violations not exposed by approaches that only consider the completion of the mission as test oracle. Moreover, experiments show that different MRs expose different types of violations. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
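One simple metamorphic relation for MAPF, in the spirit of entry 38 (though not necessarily one of its ten MRs), is that permuting the agent list should not change the total plan cost. A sketch with a toy stand-in solver:

```python
import random

def toy_solver(grid, agents):
    """Stand-in solver returning straight two-step paths; a real MAPF planner would go here."""
    return {name: [start, goal] for name, (start, goal) in sorted(agents)}

def check_permutation_mr(solver, grid, agents):
    """MR sketch: shuffling the agent list must not change the total plan cost."""
    base = solver(grid, agents)
    shuffled = agents[:]
    random.shuffle(shuffled)
    follow_up = solver(grid, shuffled)
    cost = lambda plan: sum(len(path) for path in plan.values())
    assert cost(base) == cost(follow_up), "MR violated: agent ordering changed plan cost"

agents = [("a1", ((0, 0), (2, 2))), ("a2", ((1, 0), (0, 3)))]
check_permutation_mr(toy_solver, grid=None, agents=agents)
print("permutation MR holds for the toy solver")
```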
39. AAT4IRS: automated acceptance testing for industrial robotic systems.
- Author
dos Santos, Marcela G., Hallé, Sylvain, Petrillo, Fabio, Guéhéneuc, Yann-Gaël, Malavolta, Ivano, Aitken, Jonathan M., Rebelo, Luciana, and Campusano, Miguel
- Subjects
INDUSTRIAL robots, COMPUTER software testing, MANUFACTURING processes, SYSTEM failures, ROBOTICS software
- Abstract
Industrial robotic systems (IRS) consist of industrial robots that automate industrial processes. They accurately perform repetitive tasks, replacing or assisting with dangerous jobs like assembly in the automotive and chemical industries. Failures in these systems can be catastrophic, so it is important to ensure their quality and safety before using them. One way to do this is by applying a software testing process to find faults before they become failures. However, software testing in industrial robotic systems has some challenges. These include differences in perspectives on software testing from people with diverse backgrounds, coordinating and collaborating with diverse teams, and performing software testing within the complex integration inherent in industrial environments. In traditional systems, a well-known development process uses simple, structured sentences in English to facilitate communication between project team members and business stakeholders. This process is called behavior-driven development (BDD), and one of its pillars is the use of templates to write user stories, scenarios, and automated acceptance tests. We propose a software testing (ST) approach called automated acceptance testing for industrial robotic systems (AAT4IRS) that uses natural language to write the features and scenarios to be tested. We evaluated our ST approach through a proof-of-concept, performing a pick-and-place process and applying mutation testing to measure its effectiveness. The results show that the test suites implemented using AAT4IRS were highly effective, with 79% of the generated mutants detected, thus instilling confidence in the robustness of our approach. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
40. Hybrid mutation driven testing for natural language inference.
- Author
Meng, Linghan, Li, Yanhui, Chen, Lin, Ma, Mingliang, Zhou, Yuming, and Xu, Baowen
- Subjects
COMPUTER software testing, READING comprehension, NATURAL languages, LANGUAGE ability testing, TEST methods
- Abstract
Summary: Natural language inference (NLI) is a task to infer the relationship between premise and hypothesis sentences, whose models have essential applications in many natural language processing (NLP) fields, for example, machine reading comprehension and recognizing textual entailment. Due to the data-driven programming paradigm, bugs inevitably occur in NLI models during the application process, which calls for novel automatic testing techniques to deal with NLI testing challenges. The main difficulty in achieving automatic testing for NLI models is the oracle problem; that is, it may be too expensive to label NLI model inputs manually and hence too challenging to verify the correctness of model outputs. To tackle the oracle problem, this study proposes a novel automatic testing method, hybrid mutation driven testing (HMT), which extends the mutation idea applied successfully in other NLP domains. Specifically, as there are two sets of sentences, that is, premise and hypothesis, to be mutated, we propose four mutation operators to achieve the hybrid mutation strategy, which mutate the premise and the hypothesis sentences jointly or individually. We assume that the mutation should not affect the outputs; that is, if the original and mutated outputs are inconsistent, inconsistency bugs can be detected without knowing the true labels. To evaluate our method HMT, we conduct experiments on two widely used datasets with two advanced models and generate more than 520,000 mutations by applying our mutation operators. Our experimental results show that (a) our method, HMT, can effectively generate mutated testing samples, (b) our method can effectively trigger the inconsistency bugs of the NLI models, and (c) all four mutation operators can independently trigger inconsistency bugs. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
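The hybrid mutation idea in entry 40 - mutate the premise, the hypothesis, or both, and flag prediction flips - can be sketched as follows. The mutation operator and the toy model below are deliberately simplistic stand-ins for the paper's four operators and real NLI models.

```python
def synonym_swap(sentence: str) -> str:
    # Hypothetical label-preserving mutation; a real operator would consult a synonym lexicon.
    return sentence.replace("movie", "film")

def toy_nli(premise: str, hypothesis: str) -> str:
    # Stand-in for a trained NLI model; brittle on purpose so a bug is exposed.
    return "entailment" if "movie" in premise and "movie" in hypothesis else "neutral"

def find_inconsistencies(model, premise, hypothesis):
    """Flag mutants whose predicted relation differs from the original prediction:
    the mutation is applied to the premise, the hypothesis, or both (hybrid)."""
    original = model(premise, hypothesis)
    mutants = [(synonym_swap(premise), hypothesis),
               (premise, synonym_swap(hypothesis)),
               (synonym_swap(premise), synonym_swap(hypothesis))]
    return [(p, h) for p, h in mutants if model(p, h) != original]

bugs = find_inconsistencies(toy_nli, "I saw a movie.", "A movie was watched.")
print(f"{len(bugs)} inconsistency-revealing mutants")
```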
41. Program Segment Testing for Human–Machine Pair Programming.
- Author
Rao, Lei, Liu, Shaoying, and Liu, Ai
- Subjects
COMPUTER software development, WORKFLOW, COMPUTERS, COMPUTER software, COMPUTER software testing
- Abstract
Human–Machine Pair Programming (HMPP) is a promising technique in the software development process, in which humans are responsible for developing the program while the computer monitors the program in real time and reports errors. The Java runtime exceptions in the current version of the software under construction can only be effectively detected by means of its execution. Traditional software testing techniques are suitable for testing completed programs but face a challenge in building a suitable testing environment for the partial programs produced during HMPP. In this paper, we put forward a novel technique, called Program Segment Testing (PST), for automatically identifying errors caused by runtime exceptions to support HMPP. We first introduce the relevant concepts involved in this technique to detect index-out-of-bounds exceptions, a representative runtime exception. Then we discuss the methodology of the technique in detail and illustrate its workflow with a simple case study. Finally, we carry out an experiment to evaluate the technique and compare it with three existing fault detection techniques on several programs to demonstrate its effectiveness. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
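A Python stand-in for the idea in entry 41 (the paper targets Java runtime exceptions): execute a partial program segment on sampled inputs and report index-out-of-bounds errors. PST builds and runs such harnesses automatically during HMPP; everything below is an illustrative mock-up.

```python
import random

def segment(values):
    # Partial program under construction: the loop bound is off by one.
    total = 0
    for i in range(len(values) + 1):
        total += values[i]  # raises IndexError when i == len(values)
    return total

def test_segment_for_index_errors(trials=100):
    """Run the segment on random inputs and report index-out-of-bounds errors."""
    for _ in range(trials):
        data = [random.randint(0, 9) for _ in range(random.randint(1, 5))]
        try:
            segment(data)
        except IndexError as exc:
            return f"index out of bounds detected on input {data}: {exc}"
    return "no index errors observed"

print(test_segment_for_index_errors())
```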
42. TPSQLi: Test Prioritization for SQL Injection Vulnerability Detection in Web Applications.
- Author
Yang, Guan-Yan, Wang, Farn, Gu, You-Zong, Teng, Ya-Wen, Yeh, Kuo-Hui, Ho, Ping-Hsueh, and Wen, Wei-Ling
- Subjects
INFORMATION technology security, COMPUTER security vulnerabilities, SQL, TEST methods, WORKFLOW, COMPUTER software testing
- Abstract
The rapid proliferation of network applications has led to a significant increase in network attacks. According to the OWASP Top 10 Projects report released in 2021, injection attacks rank among the top three vulnerabilities in software projects. This growing threat landscape has increased the complexity and workload of software testing, necessitating advanced tools to support agile development cycles. This paper introduces a novel test prioritization method for SQL injection vulnerabilities to enhance testing efficiency. By leveraging previous test outcomes, our method adjusts defense strength vectors for subsequent tests, optimizing the testing workflow and tailoring defense mechanisms to specific software needs. This approach aims to improve the effectiveness and efficiency of vulnerability detection and mitigation through a flexible framework that incorporates dynamic adjustments and considers the temporal aspects of vulnerability exposure. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
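Entry 42 reorders tests using previous outcomes. The sketch below shows only that outcome-driven reordering pattern, with made-up payload families and counts; the paper's defense strength vectors are not specified in the abstract and are not modeled here.

```python
# Hypothetical success counts per SQLi payload family from earlier test rounds.
history = {"union_based": 3, "boolean_blind": 1, "time_based": 0, "stacked": 2}

def prioritize(tests):
    """Order follow-up tests so families that exposed vulnerabilities before run first."""
    return sorted(tests, key=lambda t: history.get(t["family"], 0), reverse=True)

tests = [{"id": 1, "family": "time_based"},
         {"id": 2, "family": "union_based"},
         {"id": 3, "family": "stacked"}]
print([t["id"] for t in prioritize(tests)])  # -> [2, 3, 1]
```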
43. A systematic mapping on software testing for blockchains.
- Author
Elakaş, Anıl, Sözer, Hasan, Şafak, Ilgın, and Kalkan, Kübra
- Subjects
COMPUTER software testing, CARTOGRAPHY software, TEST methods, SCALABILITY, BLOCKCHAINS
- Abstract
The purpose of this study is to identify and classify studies published on software testing techniques applied to blockchain systems. Previously published reviews in related areas have a narrow focus and/or do not follow a systematic review protocol. We conducted a systematic mapping based on an initial selection of 1025 studies. A rigorous selection process resulted in a final pool of 17 primary studies. These studies are categorized with respect to the employed testing methods, considered quality attributes, and functionality. We observe that most of the publications focus on testing functional correctness or security, whereas the testing of runtime performance attracts less attention. Existing approaches mostly employ fuzz testing or mutation testing. Search-based testing is usually combined with these techniques. The application of model-based testing is rare. The adaptability of fuzz testing and model-based testing techniques to changing blockchain platforms and languages remains a concern. On the other hand, performance and scalability issues are noted for search-based techniques and mutation testing. The use and integration of multiple testing techniques also stand out as a viable research direction. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
44. Path test data generation using adaptive simulated annealing particle swarm optimization.
- Author
-
Jiao, Chongyang and Zhou, Qinglei
- Subjects
METAHEURISTIC algorithms ,SIMULATED annealing ,COMPUTER software development ,NP-hard problems ,COMPUTER software quality control ,PARTICLE swarm optimization ,COMPUTER software testing - Abstract
Software testing is an effective means of ensuring software quality, and its cost is a major component of the total cost of software development. Test data generation is central to testing, because testing efficiency depends on the test data used, and it is a significant part of test automation; how to generate test datasets automatically is still an open problem. Search-based software testing (SBST) applies meta-heuristic methods to this task, which is NP-hard. This paper focuses on the automatic generation of test data for path coverage based on control flow criteria in structural testing. Several meta-heuristic algorithms have been used to search for software test data. To improve the efficiency and effectiveness of path test data generation, an adaptive simulated annealing particle swarm optimization (ASAPSO) algorithm is proposed. The probabilistic jumping property of the simulated annealing (SA) algorithm is introduced into particle swarm optimization (PSO) so that particles accept worse solutions with a certain probability during the search, boosting the algorithm's ability to escape local optima. The fitness function is constructed by superposing branch functions to guide the search properly. In addition, to improve convergence speed, an adaptive inertia-weight adjustment scheme and a learning-factor adjustment scheme are proposed. The experimental results show that the proposed method effectively avoids premature convergence and improves the efficiency of automatic test data generation; it is also competitive on other complicated optimization problems. [ABSTRACT FROM AUTHOR] (A compact sketch of the SA-accepting PSO loop follows this entry.)
- Published
- 2024
- Full Text
- View/download PDF
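The essential twist in ASAPSO, as the abstract of entry 44 describes it, is letting a particle accept a worse solution with a temperature-dependent probability while the inertia weight decays over time. The Python sketch below implements just that skeleton on a toy objective; the paper's branch-function fitness, learning-factor schedule, and parameter values are not reproduced here, so treat every constant as an assumption.

    import math
    import random

    def asapso_sketch(fitness, dim=2, swarm=20, iters=200,
                      w_max=0.9, w_min=0.4, c1=2.0, c2=2.0, t0=1.0, alpha=0.95):
        # PSO with an SA-style acceptance rule: a worse candidate may still
        # replace a particle's personal best with probability exp(-delta/T).
        pos = [[random.uniform(-10, 10) for _ in range(dim)] for _ in range(swarm)]
        vel = [[0.0] * dim for _ in range(swarm)]
        pbest = [p[:] for p in pos]
        pbest_f = [fitness(p) for p in pos]
        g = min(range(swarm), key=lambda i: pbest_f[i])
        gbest, gbest_f = pbest[g][:], pbest_f[g]
        temp = t0
        for t in range(iters):
            w = w_max - (w_max - w_min) * t / iters  # adaptive inertia weight
            for i in range(swarm):
                for d in range(dim):
                    vel[i][d] = (w * vel[i][d]
                                 + c1 * random.random() * (pbest[i][d] - pos[i][d])
                                 + c2 * random.random() * (gbest[d] - pos[i][d]))
                    pos[i][d] += vel[i][d]
                f = fitness(pos[i])
                delta = f - pbest_f[i]
                # SA-style acceptance keeps occasional worse solutions,
                # helping particles escape local optima.
                if delta < 0 or random.random() < math.exp(-delta / max(temp, 1e-9)):
                    pbest[i], pbest_f[i] = pos[i][:], f
                    if f < gbest_f:
                        gbest, gbest_f = pos[i][:], f
            temp *= alpha  # cool down
        return gbest, gbest_f

    # Toy sphere objective standing in for a branch-distance fitness.
    print(asapso_sketch(lambda x: sum(v * v for v in x)))

In the paper the fitness would instead be a superposition of branch functions measuring how close an execution comes to covering the target path.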
45. Improving Early Fault Detection in Machine Learning Systems Using Data Diversity-Driven Metamorphic Relation Prioritization.
- Author
-
Srinivasan, Madhusudan and Kanewala, Upulee
- Subjects
MACHINE learning ,COMPUTER software testing ,INSTRUCTIONAL systems ,NEURONS ,DEEP learning - Abstract
Metamorphic testing is a valuable approach to verifying machine learning programs where traditional oracles are unavailable or difficult to apply. This paper proposes a technique to prioritize metamorphic relations (MRs) in metamorphic testing for machine learning and deep learning systems, aiming to enhance early fault detection. We introduce five metrics based on diversity in source and follow-up test cases to prioritize MRs. The effectiveness of our proposed prioritization methods is evaluated on implementations of three machine learning algorithms and one deep learning algorithm. We compare our approach against random-based, fault-based, and neuron-activation-coverage-based MR ordering. The results show that our data diversity-based prioritization performs comparably to fault-based prioritization, reducing fault detection time by up to 62% compared to random MR execution, and outperforms neuron-activation-coverage-based prioritization with 5–550% higher fault detection effectiveness. Overall, prioritizing metamorphic relations this way increases fault detection effectiveness and reduces average fault detection time, which can yield significant time and cost savings when applying metamorphic testing to machine learning and deep learning systems. [ABSTRACT FROM AUTHOR] (A small diversity-scoring sketch follows this entry.)
- Published
- 2024
- Full Text
- View/download PDF
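Entry 45 defines five diversity metrics over source and follow-up test cases; the sketch below shows only the general shape of such a ranking, using plain Euclidean distance as a stand-in metric. The MRs and test pairs are invented for illustration.

    import math

    def euclidean(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def diversity_score(pairs):
        # Average distance between each source test case and its follow-up;
        # one plausible diversity metric among several.
        return sum(euclidean(s, f) for s, f in pairs) / len(pairs)

    # Hypothetical MRs for a numeric model: source input -> follow-up input.
    mrs = {
        "permute_features": [([1, 2, 3], [3, 2, 1])],
        "add_constant":     [([1, 2, 3], [11, 12, 13])],
        "duplicate_rows":   [([1, 2, 3], [1, 2, 3])],
    }
    ranked = sorted(mrs, key=lambda m: diversity_score(mrs[m]), reverse=True)
    print(ranked)  # most diverse MRs are executed first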
46. Towards Improving the Quality of Requirement and Testing Process in Agile Software Development: An Empirical Study.
- Author
-
Ilays, Irum, Hafeez, Yaser, Almashfi, Nabil, Ali, Sadia, Humayun, Mamoona, Aqib, Muhammad, and Alwakid, Ghadah
- Subjects
CASE-based reasoning ,APRIORI algorithm ,REQUIREMENTS engineering ,SATISFACTION ,RESEARCH personnel ,AGILE software development ,COMPUTER software testing - Abstract
Software testing is a critical phase, and ambiguities in the requirements specification are easily misunderstood, which affects the testing process and makes it difficult to identify all faults in software. As requirements change continuously, irrelevancy and redundancy during testing increase. These challenges lower fault detection capability, so the testing process needs to be improved to track changes in the requirements specification. In this research, we developed a model that addresses these testing challenges through requirement prioritization and prediction in an agile environment. The objective is to identify the most relevant and meaningful requirements through semantic analysis for correct change analysis, then compute requirement similarity through case-based reasoning, which predicts requirements for reuse and restricts attention to error-prone requirements. Afterward, the Apriori algorithm maps requirement frequency to select relevant test cases, based on how often test cases are reused, to increase the fault detection rate. The proposed model was evaluated in experiments. The results showed that semantic analysis reduced requirement redundancy and irrelevancy and correctly predicted requirements, increasing the fault detection rate and user satisfaction. The predicted requirements are mapped to test cases, increasing the fault detection rate after changes. The model reduces requirement redundancy and irrelevancy by more than 90% compared to other clustering methods and the analytic hierarchy process, achieving an 80% fault detection rate at an earlier stage, and thus provides guidelines for practitioners and researchers. In the future, we will provide a working prototype of this model as a proof of concept. [ABSTRACT FROM AUTHOR] (An Apriori-style co-change sketch follows this entry.)
- Published
- 2024
- Full Text
- View/download PDF
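Entry 46 mentions using the Apriori algorithm to map requirement frequency onto test case selection. As a hedged illustration of the first Apriori pass, the sketch below mines requirement pairs that frequently change together in a fabricated change history; shared test cases for such pairs would then be selected first.

    from collections import Counter
    from itertools import combinations

    # Hypothetical history: each change set touches a group of requirements.
    change_sets = [
        {"login", "session"},
        {"login", "payment"},
        {"login", "session", "payment"},
        {"report"},
    ]

    def frequent_pairs(transactions, min_support=0.5):
        # Apriori-style pass: keep requirement pairs whose support
        # (fraction of change sets containing both) meets the threshold.
        counts = Counter()
        for t in transactions:
            counts.update(combinations(sorted(t), 2))
        n = len(transactions)
        return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

    print(frequent_pairs(change_sets))
    # ('login', 'payment') and ('login', 'session') co-change often, so test
    # cases covering both would be re-run first after a change to either.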
47. Automated Testing Linguistic Capabilities of NLP Models.
- Author
-
Lee, Jaeseong, Chen, Simin, Mordahl, Austin, Liu, Cong, Yang, Wei, and Wei, Shiyi
- Subjects
SENTIMENT analysis ,COMPUTER software testing ,HATE speech ,TRUST ,VERBAL behavior testing ,NATURAL language processing - Abstract
Natural language processing (NLP) has gained widespread adoption in the development of real-world applications. However, the black-box nature of neural networks in NLP applications poses a challenge when evaluating their performance, let alone ensuring it. Recent research has proposed testing techniques to enhance the trustworthiness of NLP-based applications, but most existing works use a single aggregated metric (i.e., accuracy), which makes it difficult for users to assess NLP model performance on fine-grained aspects such as linguistic capabilities (LCs). To address this limitation, we present ALiCT, an automated testing technique for validating NLP applications based on their LCs. ALiCT takes user-specified LCs as input and produces a diverse test suite with test oracles for each given LC. We evaluate ALiCT on two widely adopted NLP tasks, sentiment analysis and hate speech detection, in terms of diversity, effectiveness, and consistency. Using Self-BLEU and syntactic diversity metrics, our findings reveal that ALiCT generates test cases that are 190% and 2213% more diverse in semantics and syntax, respectively, than those produced by state-of-the-art techniques. In addition, ALiCT produces a larger number of NLP model failures in 22 out of 25 LCs across the two NLP applications. [ABSTRACT FROM AUTHOR] (A Self-BLEU sketch follows this entry.)
- Published
- 2024
- Full Text
- View/download PDF
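Self-BLEU, one of the diversity metrics the evaluation in entry 47 relies on, scores each generated test case against all the others; lower averages mean a more diverse suite. A minimal computation using NLTK (assumed installed; ALiCT itself may compute it differently) looks like this:

    from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu

    def self_bleu(sentences):
        # Each sentence is scored against all others as references;
        # smoothing avoids zero n-gram counts on short sentences.
        tokenized = [s.split() for s in sentences]
        smooth = SmoothingFunction().method1
        scores = []
        for i, hyp in enumerate(tokenized):
            refs = tokenized[:i] + tokenized[i + 1:]
            scores.append(sentence_bleu(refs, hyp, smoothing_function=smooth))
        return sum(scores) / len(scores)

    generated = [
        "the movie was surprisingly good",
        "this film is absolutely terrible",
        "i would not recommend the restaurant",
    ]
    print(self_bleu(generated))  # near 0: the cases share few n-grams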
48. Effective, Platform-Independent GUI Testing via Image Embedding and Reinforcement Learning.
- Author
-
Yu, Shengcheng, Fang, Chunrong, Li, Xin, Ling, Yuchen, Chen, Zhenyu, and Su, Zhendong
- Subjects
COMPUTER vision ,WEB-based user interfaces ,APPLICATION software ,COMPUTER software testing ,MOBILE apps ,REINFORCEMENT learning ,MOBILE learning - Abstract
Software applications (apps) play an increasingly important role in many aspects of society. Mobile apps and web apps in particular are the most prevalent and are widely used across industries and in people's daily lives. To help ensure mobile and web app quality, many approaches have been introduced to improve GUI testing via automated exploration, including random testing, model-based testing, and learning-based testing. Despite this extensive effort, existing approaches are still limited in reaching high code coverage, constructing high-quality models, and being generally applicable. Reinforcement learning-based approaches, a representative and advanced family of automated GUI exploration testing, face difficult challenges, including effective app state abstraction and reward function design. Moreover, they depend heavily on the specific execution platform (i.e., Android or Web), leading to poor generalizability and an inability to adapt across platforms. This work tackles these challenges based on the high-level observation that apps from distinct platforms share commonalities in GUI design. We propose PIRLTest, an effective platform-independent approach for app testing that uses computer vision and reinforcement learning techniques in a novel, synergistic manner. It extracts GUI widgets from GUI pages, characterizes the corresponding GUI layouts, and embeds the GUI pages as states. The app GUI state combines the macroscopic perspective (GUI layout) and the microscopic perspective (GUI widgets) and attaches critical semantic information from GUI images, which makes the approach platform-independent and generally applicable. PIRLTest explores apps under the guidance of a curiosity-driven strategy, which uses a Q-network to estimate the values of specific state-action pairs and encourage exploration of uncovered pages without platform dependency. All actions are assigned rewards, designed with both the app GUI states and the concrete widgets in mind, to help the framework reach more uncovered pages. We conduct an empirical study on 20 mobile apps and 5 web apps; the results show that PIRLTest is zero-cost to adapt to different platforms and outperforms the baselines, covering 6.3–41.4% more code on mobile apps and 1.5–51.1% more code on web apps. PIRLTest detects 128 unique bugs on mobile and web apps, including 100 bugs that the baselines cannot detect. [ABSTRACT FROM AUTHOR] (A tabular Q-learning sketch of the curiosity reward follows this entry.)
- Published
- 2024
- Full Text
- View/download PDF
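PIRLTest's Q-network and image embeddings (entry 48) are beyond the scope of a catalog note, but the curiosity-driven reward it describes, paying more for rarely visited states, has a compact tabular analogue. The sketch below is an assumption-laden simplification: the string states stand in for embedded GUI pages and the actions for widget interactions.

    import random
    from collections import defaultdict

    class CuriosityDrivenExplorer:
        # Tabular Q-learning stand-in for a Q-network: the reward for
        # reaching a state decays with how often it has been visited.
        def __init__(self, actions, alpha=0.5, gamma=0.9, epsilon=0.2):
            self.q = defaultdict(float)
            self.visits = defaultdict(int)
            self.actions = actions
            self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

        def curiosity_reward(self, state):
            self.visits[state] += 1
            return 1.0 / self.visits[state]  # rarely seen pages pay more

        def choose(self, state):
            if random.random() < self.epsilon:
                return random.choice(self.actions)  # explore
            return max(self.actions, key=lambda a: self.q[(state, a)])

        def update(self, state, action, next_state):
            reward = self.curiosity_reward(next_state)
            best_next = max(self.q[(next_state, a)] for a in self.actions)
            target = reward + self.gamma * best_next
            self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])

    ex = CuriosityDrivenExplorer(actions=["tap", "scroll", "back"])
    a = ex.choose("page_home")
    ex.update("page_home", a, "page_settings")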
49. LLMEffiChecker: Understanding and Testing Efficiency Degradation of Large Language Models.
- Author
-
Feng, Xiaoning, Han, Xiaohong, Chen, Simin, and Yang, Wei
- Subjects
LANGUAGE models ,COMPUTER software testing ,MACHINE learning ,ENERGY consumption ,TRANSLATORS - Abstract
Large Language Models (LLMs) have received much recent attention due to their human-level accuracy. While existing works mostly focus on improving accuracy or testing accuracy robustness, the computational efficiency of LLMs, which is of paramount importance given vast generation demands and real-time requirements, has received surprisingly little attention. In this article, we make the first attempt to understand and test potential computational efficiency robustness in state-of-the-art LLMs. By analyzing the working mechanism and implementation of 20,543 publicly accessible LLMs, we observe a fundamental property of LLMs that can be manipulated adversarially to reduce computational efficiency significantly. Our key observation is that the output length, not the input, determines the computational efficiency of LLMs, and the output length depends on two factors: an often sufficiently large yet pessimistic pre-configured threshold controlling the maximum number of iterations, and a runtime-generated end-of-sentence (EOS) token. Our motivation is therefore to generate test inputs that sufficiently delay the generation of EOS, forcing LLMs to iterate until the pre-configured threshold. We present LLMEffiChecker, which works in both white-box and black-box settings. In the white-box scenario, LLMEffiChecker develops a gradient-guided technique that searches for a minimal and unnoticeable perturbation at the character, token, and structure levels. In the black-box scenario, LLMEffiChecker employs a causal-inference-based approach to find critical tokens and similarly applies three levels of imperceptible perturbation to them. Both settings effectively delay the appearance of EOS, compelling inputs to reach the otherwise unreachable threshold. To demonstrate the effectiveness of LLMEffiChecker, we conduct a systematic evaluation on nine publicly available LLMs: Google T5, AllenAI WMT14, Helsinki-NLP translator, Facebook FairSeq, UNICAMP-DL translator, MarianMT, Google FLAN-T5, MBZUAI LaMini-GPT, and Salesforce CodeGen. Experimental results show that LLMEffiChecker can increase LLMs' response latency and energy consumption by, on average, 325% to 3,244% and 344% to 3,616%, respectively, by perturbing just one character or token in the input sentence. Our case study shows that inputs generated by LLMEffiChecker significantly affect battery power on real-world mobile devices (i.e., draining more than 30 times the battery power of normal inputs). [ABSTRACT FROM AUTHOR] (A toy perturbation-search sketch follows this entry.)
- Published
- 2024
- Full Text
- View/download PDF
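The attack surface entry 49 identifies, output length controlled by a delayed EOS token, can be probed with nothing more than random single-character edits, although the paper's gradient-guided and causal-inference searches are far more effective. In this sketch the generate callable is a hypothetical wrapper returning the model's output tokens; no real LLM API is assumed.

    import random
    import string

    def output_length(generate, text, max_new_tokens=512):
        # Tokens emitted before EOS or the iteration cap; `generate`
        # is a placeholder returning a list of output tokens.
        return len(generate(text, max_new_tokens=max_new_tokens))

    def most_inflating_edit(generate, text, trials=50):
        # Random search for the one-character edit that most inflates
        # output length, i.e. most delays the EOS token.
        base = output_length(generate, text)
        best_text, best_len = text, base
        for _ in range(trials):
            i = random.randrange(len(text))
            mutated = text[:i] + random.choice(string.ascii_lowercase) + text[i + 1:]
            n = output_length(generate, mutated)
            if n > best_len:
                best_text, best_len = mutated, n
        return best_text, base, best_len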
50. Keeper: Automated Testing and Fixing of Machine Learning Software.
- Author
-
Wan, Chengcheng, Liu, Shicheng, Xie, Sophie, Liu, Yuhan, Hoffmann, Henry, Maire, Michael, and Lu, Shan
- Subjects
COMPUTER software correctness ,MACHINE learning ,ENGINE testing ,APPLICATION software ,JUDGMENT (Psychology) - Abstract
The increasing number of software applications incorporating machine learning (ML) solutions has created a need for corresponding testing techniques. However, testing ML software requires tremendous human effort to design realistic and relevant test inputs and to judge output correctness according to human common sense. Even when misbehavior is exposed, it is often unclear whether the defect lies inside the ML API or the surrounding code, and how to fix the implementation. This article tackles these challenges by proposing Keeper, an automated testing and fixing tool for ML software. The core idea of Keeper is to design pseudo-inverse functions that semantically reverse the corresponding ML task in an empirical way and serve as a proxy for common human judgment of real-world data. Keeper incorporates these functions into a symbolic execution engine to generate tests, and also detects code smells that degrade software performance. Once misbehavior is exposed, Keeper attempts to change how ML APIs are used in order to alleviate it. Our evaluation on a variety of applications shows that Keeper greatly improves branch coverage while identifying 74 previously unknown failures and 19 code smells in 56 out of 104 applications. Our user studies show that 78% of end users and 95% of developers agree with Keeper's detection and fixing results. [ABSTRACT FROM AUTHOR] (A tiny pseudo-inverse check sketch follows this entry.)
- Published
- 2024
- Full Text
- View/download PDF
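Keeper's pseudo-inverse idea (entry 50) can be miniaturized: for a task such as sentiment analysis, synthesize an input whose label any human would agree on, then check the ML API's answer against it. The classifier, labels, and sentences below are illustrative assumptions, not Keeper's actual pipeline, which couples such functions with symbolic execution.

    def pseudo_inverse_sentiment(label):
        # Hypothetical pseudo-inverse: given a target label, return an
        # input that common human judgment would assign that label.
        return {"positive": "I absolutely loved every minute of it.",
                "negative": "This was a dreadful waste of time."}[label]

    def test_sentiment_api(classify):
        # `classify` is a placeholder for the ML API under test.
        failures = []
        for label in ("positive", "negative"):
            text = pseudo_inverse_sentiment(label)
            if classify(text) != label:
                failures.append((text, label))
        return failures

    # A deliberately broken stub classifier shows the check firing.
    print(test_sentiment_api(lambda text: "negative"))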