1. On the Effectiveness of Distillation in Mitigating Backdoors in Pre-trained Encoder
- Author
Tingxu Han, Shenghan Huang, Ziqi Ding, Weisong Sun, Yebo Feng, Chunrong Fang, Jun Li, Hanwei Qian, Cong Wu, Quanjun Zhang, Yang Liu, and Zhenyu Chen
- Abstract
In this paper, we study distillation as a defense against poisoned encoders in self-supervised learning (SSL); distillation was originally used as a defense in supervised learning. Distillation aims to extract knowledge from a given model (the teacher net) and transfer it to another (the student net). Here, we use it to distill benign knowledge from a poisoned pre-trained encoder and transfer it to a new encoder, resulting in a clean pre-trained encoder. In particular, we conduct an empirical study of the effectiveness and performance of distillation against poisoned encoders. Using two state-of-the-art backdoor attacks against pre-trained image encoders and four commonly used image classification datasets, our experimental results show that distillation reduces the attack success rate from 80.87% to 27.51%, at the cost of a 6.35% drop in accuracy. Moreover, we investigate the impact of the three core components of distillation on its performance: the teacher net, the student net, and the distillation loss. By comparing 4 teacher nets, 3 student nets, and 6 distillation losses, we find that fine-tuned teacher nets, warm-up-training-based student nets, and attention-based distillation loss perform best, respectively. (A minimal sketch of an attention-based distillation loss follows this entry.)
- Published
2024
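
The abstract reports that an attention-based distillation loss performed best. As a rough, non-authoritative illustration of what such a loss typically looks like, below is a minimal PyTorch sketch in the style of attention transfer; the paper does not specify its exact formulation here, and the function names, tensor shapes, and layer pairing are all assumptions.

```python
import torch
import torch.nn.functional as F

def attention_map(feat: torch.Tensor) -> torch.Tensor:
    # Collapse the channel dimension of a (B, C, H, W) feature map into
    # a spatial attention map, then L2-normalize it per sample.
    att = feat.pow(2).mean(dim=1)              # (B, H, W)
    return F.normalize(att.flatten(1), dim=1)  # (B, H*W), unit L2 norm

def attention_distill_loss(student_feats, teacher_feats):
    # Sum, over the distilled layers, of the mean squared distance
    # between paired student/teacher attention maps.
    return sum(
        (attention_map(s) - attention_map(t)).pow(2).sum(dim=1).mean()
        for s, t in zip(student_feats, teacher_feats)
    )

# Toy usage: random intermediate features stand in for the (possibly
# poisoned) teacher encoder and the warm-up-trained student encoder.
student_feats = [torch.randn(4, 64, 16, 16, requires_grad=True)]
teacher_feats = [torch.randn(4, 64, 16, 16)]
loss = attention_distill_loss(student_feats, teacher_feats)
loss.backward()
```

The intuition suggested by the abstract is that matching the teacher's attention on clean data transfers its benign feature knowledge to the student, while trigger-specific backdoor behavior, which clean inputs do not activate, is left behind.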