193 results for "Jiaguang Sun"
Search Results
2. RNN-Test: Towards Adversarial Testing for Recurrent Neural Network Systems
- Author
-
Yu Jiang, Jiaguang Sun, Heyuan Shi, Quan Zhang, Jianmin Guo, and Yue Zhao
- Subjects
Optimization problem, Perplexity, Computer science, Machine learning, Convolutional neural network, Recurrent neural network, Task analysis, Language model, Artificial intelligence, State (computer science), Software, MNIST database
- Abstract
While massive efforts have been devoted to adversarial testing of convolutional neural networks (CNN), testing for recurrent neural networks (RNN) is still limited, leaving threats open in vast sequential application domains. In this paper, we propose an adversarial testing framework RNN-Test for RNN systems, focusing on the widely deployed sequence-to-sequence (seq2seq) tasks, not only classification domains. First, we design a novel search methodology customized for RNN models by maximizing the inconsistency of RNN states against their inner dependencies to produce adversarial inputs. Next, we introduce two state-based coverage metrics according to the distinctive structure of RNNs to exercise more system behaviors. Finally, RNN-Test solves the joint optimization problem to maximize state inconsistency and state coverage, and crafts adversarial inputs for various tasks with different kinds of inputs. For evaluation, we apply RNN-Test to four RNN models of common structures. On the tested models, the RNN-Test approach is demonstrated to be competitive in generating adversarial inputs, outperforming FGSM-based and DLFuzz-based methods by reducing the model performance more sharply, with a 2.78% to 37.94% higher success (or generation) rate. RNN-Test also achieves a 52.65% to 66.45% higher adversary rate than testRNN on the MNIST LSTM model, as well as 53.76% to 58.02% more perplexity with a 16% higher generation rate than DeepStellar on the PTB language model. Compared with the traditional neuron coverage, the proposed state coverage metrics as guidance excel with a 4.17% to 97.22% higher success (or generation) rate.
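A rough sketch of the state-inconsistency idea (our reading of the abstract, not the authors' code): for an LSTM the hidden state is tied to the cell state, so one way to realize "inconsistency against inner dependencies" is to perturb the input in the gradient direction that pulls the two apart, FGSM-style. The architecture and objective below are illustrative assumptions.

```python
import torch

torch.manual_seed(0)
lstm = torch.nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
x = torch.randn(1, 5, 8, requires_grad=True)  # one toy input sequence

_, (h_n, c_n) = lstm(x)
# For an LSTM, h_t = o_t * tanh(c_t); pushing h_n away from tanh(c_n) is one
# plausible "state inconsistency" objective (an assumption for illustration).
inconsistency = ((h_n - torch.tanh(c_n)) ** 2).mean()
inconsistency.backward()

eps = 0.05
x_adv = (x + eps * x.grad.sign()).detach()  # FGSM-style adversarial input
```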
- Published
- 2022
3. Code Synthesis for Dataflow-Based Embedded Software Design
- Author
-
Wanli Chang, Yixiao Yang, Jiaguang Sun, Yu Jiang, Zhuo Su, Wen Li, Liming Fang, and Dongyan Wang
- Subjects
Schedule, Source lines of code, Generator (computer programming), Java, Dataflow, Programming language, Computer science, Computer Graphics and Computer-Aided Design, Code (cryptography), Code generation, Executable, Electrical and Electronic Engineering, Software
- Abstract
Model-driven methodology has been widely adopted in embedded software design, and Dataflow is a widely used computation model with strong modeling and simulation support in tools such as Ptolemy. However, its code synthesis support is quite limited, which restricts its applications in real industrial practice. In this paper, we focus on the automatic code synthesis of Dataflow and implement a code generator that supports most of the widely used modeling features, such as expression types and boolean switches, more efficiently. First, we disassemble the Dataflow model into actors embedded in if-else or switch-case statements based on schedule analysis, which bridges the semantic gap between the code and the original Dataflow model. Then, we design dedicated templates for each actor and synthesize well-structured, executable C and Java code with sequential code assembly. Compared to the existing C and Java code generators for the Dataflow model in Ptolemy-II and the C code generator in Simulink, both the lines of code and the execution time of the code synthesized by our generator are decreased on average.
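A minimal sketch of the template-plus-schedule idea described above (hypothetical actor names and templates, not the paper's generator): each actor gets a code template, and the schedule analysis decides whether a firing is wrapped in an if-else guard.

```python
# Hypothetical actor templates emitting C statements.
ACTOR_TEMPLATES = {
    "Ramp":  "out = step * i;",
    "Scale": "out = in * factor;",
}

def synthesize(schedule):
    """Assemble sequential C code: fire actors in schedule order and wrap
    conditionally fired actors (cf. boolean switch) in if-else guards."""
    lines = []
    for actor, guard in schedule:
        body = ACTOR_TEMPLATES[actor]
        lines.append(f"if ({guard}) {{ {body} }}" if guard else body)
    return "\n".join(lines)

print(synthesize([("Ramp", None), ("Scale", "sel == 1")]))
```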
- Published
- 2022
4. Semantic Learning and Emulation Based Cross-platform Binary Vulnerability Seeker
- Author
-
Jiaguang Sun, Yu Jiang, Xin Yang, Cong Wang, Xun Jiao, Zijiang Yang, Jian Gao, and Zhe Liu
- Subjects
Emulation, Source code, Computer Science - Cryptography and Security (cs.CR), Computer Science - Software Engineering (cs.SE), Artificial neural network, Computer science, Ranking, Control flow graph, Overhead (computing), Compiler, Data mining, Software, Vulnerability (computing)
- Abstract
Clone detection is widely exploited for software vulnerability search. Approaches based on source code analysis cannot be applied to binary clone detection because the same source code can produce significantly different binaries. In this paper, we present BinSeeker, a cross-platform binary seeker that integrates semantic learning and emulation. With the help of the labeled semantic flow graph, BinSeeker can quickly identify the M candidate functions that are most similar to the vulnerability in the target binary. The value of M is relatively large, so this semantic learning procedure essentially eliminates those functions that are very unlikely to contain the vulnerability. Then, semantic emulation is conducted on these M candidates to obtain their dynamic signature sequences. By comparing signature sequences, BinSeeker produces the top-N functions that exhibit the behavior most similar to that of the vulnerability. With the fast filtering of semantic learning and the accurate comparison of semantic emulation, BinSeeker seeks vulnerabilities precisely with little overhead. Experiments on six widely used programs with fifteen known CVE vulnerabilities demonstrate that BinSeeker outperforms three state-of-the-art tools, Genius, Gemini and CACompare. Regarding search accuracy, BinSeeker achieves an MRR value of 0.65 on the target programs, whereas the MRR values of Genius, Gemini and CACompare are 0.17, 0.07 and 0.42, respectively. If we consider ranking a function with the targeted vulnerability in the top-5 as accurate, BinSeeker achieves an accuracy of 93.33 percent, while the accuracy of the other three tools is merely 33.33, 13.33 and 53.33 percent, respectively. This accuracy is achieved with 0.27s on average to determine whether the target binary function contains a known vulnerability, while the times for the other three tools are 1.57s, 0.15s and 0.98s, respectively. (This paper appeared in IEEE Transactions on Software Engineering.)
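The two-phase pipeline can be pictured with a small sketch (our simplification; the signature comparison is replaced here with a set-overlap stand-in):

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def seek(vuln_emb, vuln_sig, funcs, M=50, N=5):
    # Phase 1: semantic learning keeps the M most similar graph embeddings.
    cands = sorted(funcs, key=lambda f: cosine(vuln_emb, f["emb"]),
                   reverse=True)[:M]
    # Phase 2: re-rank survivors by emulated signature-sequence similarity.
    def sig_sim(f):
        a, b = set(vuln_sig), set(f["sig"])
        return len(a & b) / max(len(a | b), 1)
    return sorted(cands, key=sig_sim, reverse=True)[:N]
```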
- Published
- 2022
5. Automatic Integer Error Repair by Proper-Type Inference
- Author
-
Ming Gu, Min Zhou, Xiaoyu Song, Xi Cheng, and Jiaguang Sun
- Subjects
Computer science, Semantics (computer science), Type inference, Expression (computer science), Embedded software, Integer, Computer engineering, Robustness (computer science), Benchmark (computing), Overhead (computing), Electrical and Electronic Engineering
- Abstract
The C language plays a key role in system programming and applications. Integer errors are a common yet important kind of C program defect because arithmetic operations may produce values unrepresentable in certain integer types. Integer errors are one of the major sources of software failures and vulnerabilities. Due to the complex semantics of C integers, manually repairing integer errors is prone to introducing additional errors, even for experienced programmers. This paper presents an approach to automatically generate fixes for integer errors. Our approach infers, for each expression, a type that is capable of representing its possible values, and utilizes the inferred types as program fixes based on common fix patterns codified from real-world repairs. We have developed our system IntPTI, which is evaluated on the largest public benchmark of integer errors and 7 widely used open-source projects. The evaluation results demonstrate the superior performance of IntPTI in terms of accuracy, scalability, runtime overhead and robustness of fixes. In addition, IntPTI is applied to the embedded software of a realistic train control system. It succeeds in detecting 67 new integer errors and generating 101 fixes confirmed by developers. The study substantiates the feasibility and effectiveness of the proposed methodology.
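As a hedged illustration of "proper-type" selection (type widths assumed; the real tool works on C code with richer fix patterns), picking a type amounts to finding the narrowest range that covers an expression's inferred value interval:

```python
C_TYPES = [  # (name, min, max); LP64-style widths are an assumption
    ("int",                -2**31, 2**31 - 1),
    ("unsigned int",        0,     2**32 - 1),
    ("long long",          -2**63, 2**63 - 1),
    ("unsigned long long",  0,     2**64 - 1),
]

def proper_type(lo, hi):
    for name, tmin, tmax in C_TYPES:
        if tmin <= lo and hi <= tmax:
            return name
    return None  # nothing fits: fall back to another fix pattern

print(proper_type(0, 2**40))  # a * b with a, b in [0, 2**20] -> 'long long'
```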
- Published
- 2021
6. Semantic Learning Based Cross-Platform Binary Vulnerability Search For IoT Devices
- Author
-
Houbing Song, Jiaguang Sun, Kim-Kwang Raymond Choo, Xin Yang, Yu Jiang, and Jian Gao
- Subjects
Source code, Artificial neural network, Computer science, Vulnerability, Construct (python library), Semantics, Computer Science Applications, Control and Systems Engineering, Graph (abstract data type), Semantic learning, Data mining, Electrical and Electronic Engineering, Information Systems
- Abstract
The rapid development of the Internet of Things (IoT) has triggered more security requirements than ever, especially in detecting vulnerabilities in various IoT devices. The widely used clone-based vulnerability search methods are effective on source code; however, their performance is limited in IoT binary search. In this article, we present IoTSeeker, a function semantic learning based vulnerability search approach for cross-platform IoT binaries. First, we construct the function semantic graph to capture both the data flow and control flow information, and encode lightweight semantic features of each basic block within the semantic graph as numerical vectors. Then, the embedding vector of the whole binary function is generated by feeding the numerical vectors of basic blocks to our customized semantics-aware neural network model. Finally, the cosine distance of two embedding vectors is calculated to determine whether a binary function contains a known vulnerability. The experiments show that IoTSeeker outperforms the state-of-the-art approaches in identifying cross-platform IoT binary vulnerabilities. For example, compared to Gemini, IoTSeeker finds 12.68% more vulnerabilities in the top-50 candidates, and improves the value of AUC by 8.23%.
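A toy stand-in for the embedding step (our assumption; the paper's customized network may differ): propagate basic-block feature vectors along semantic-graph edges for a few rounds, then pool them into one function embedding whose cosine distance drives the verdict.

```python
import numpy as np

def embed_function(X, A, W, rounds=3):
    """X: (n, d) block features; A: (n, n) adjacency; W: (d, d) weights."""
    H = np.zeros_like(X)
    for _ in range(rounds):
        H = np.tanh(X + A @ H @ W)  # mix each block with its neighbours
    return H.sum(axis=0)            # pool into a function-level embedding

rng = np.random.default_rng(0)
X, A, W = rng.normal(size=(4, 8)), np.eye(4), 0.1 * rng.normal(size=(8, 8))
print(embed_function(X, A, W).shape)  # (8,)
```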
- Published
- 2021
7. EM-Fuzz: Augmented Firmware Fuzzing via Memory Checking
- Author
-
Wanli Chang, Xun Jiao, Jiaguang Sun, Yu Jiang, Yiwen Xu, Zhe Liu, and Jian Gao
- Subjects
Correctness, Source code, Computer science, Firmware, National Vulnerability Database, Code coverage, Fuzz testing, Computer Graphics and Computer-Aided Design, Memory management, Embedded system, Electronic design automation, Instrumentation (computer programming), Electrical and Electronic Engineering, Software
- Abstract
Embedded systems are increasingly interconnected in emerging application scenarios. Many of these applications are safety-critical, making it a high priority to ensure that the systems are free from malicious attacks. This work aims to detect vulnerabilities in embedded firmware that could be exploited by adversaries to compromise functional correctness, which is challenging especially due to the absence of source code. In particular, we propose EM-Fuzz, a firmware vulnerability detection technique that tightly integrates fuzzing with real-time memory checking. Based on the memory instrumentation, the firmware fuzzing is guided not only by traditional branch coverage to generate high-quality seeds that explore hard-to-reach regions, but also by the recorded memory-sensitive operations to continuously exercise sensitive regions that are prone to being attacked. More importantly, the instrumentation integrates real-time memory checkers to expose memory vulnerabilities, which is not well supported by existing fuzzers without source code. Experiments on several real-world embedded firmware projects such as OpenSSL demonstrate that EM-Fuzz significantly improves the performance of state-of-the-art fuzzing tools such as AFL and AFLFast, with coverage improvements of 93.98% and 46.89%, respectively. Furthermore, EM-Fuzz exposes a total of 23 vulnerabilities, at an average of about 7 hours per vulnerability. AFL and AFLFast together find 10 vulnerabilities, costing about 13 hours and 10 hours per vulnerability on average, respectively. Of these 23 vulnerabilities, 16 are previously unknown and have been reported to the upstream product vendors, 7 of which have been assigned unique CVE identifiers in the U.S. National Vulnerability Database.
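One way to picture the dual guidance (a sketch under our assumptions; field names and the weighting are hypothetical): seeds are prioritized by both new branch coverage and the number of memory-sensitive operations they exercise.

```python
def seed_score(seed, seen_branches, w_mem=0.5):
    new_branches = len(seed["branches"] - seen_branches)
    return new_branches + w_mem * seed["mem_ops"]  # weight is an assumption

def pick_next(queue, seen_branches):
    return max(queue, key=lambda s: seed_score(s, seen_branches))

queue = [{"id": 1, "branches": {1, 2}, "mem_ops": 0},
         {"id": 2, "branches": {3}, "mem_ops": 4}]
print(pick_next(queue, seen_branches={1, 2})["id"])  # 2
```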
- Published
- 2020
8. Vulnerable Code Clone Detection for Operating System Through Correlation-Induced Learning
- Author
-
Ying Fu, Heyuan Shi, Yu Jiang, Kun Tang, Jian Dong, Jiaguang Sun, and Runzhe Wang
- Subjects
Structure (mathematical logic), Cloning (programming), Computer science, Function (mathematics), Computer Science Applications, Correlation, Control and Systems Engineering, Code (cryptography), Operating system, Graph (abstract data type), Electrical and Electronic Engineering, Information Systems, Codebase
- Abstract
Vulnerable code clones in the operating system (OS) threaten the safety of smart industrial environments, and most vulnerable OS code clone detection approaches neglect correlations between functions, which limits their detection effectiveness. In this article, we propose a two-phase framework to find vulnerable OS code clones by learning correlations between functions. In the training phase, functions in the training set are extracted from the latest code repository and function features are derived from their AST structure. Then, external and internal correlations are explored by graph modeling of the functions. Finally, the graph convolutional network for code clone detection (GCN-CC) is trained using the function features and correlations. In the detection phase, functions in the to-be-detected OS code repository are extracted and vulnerable OS code clones are detected by the trained GCN-CC. We conduct experiments on five real OS code repositories, and the experimental results show that our framework outperforms the state-of-the-art approaches.
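For reference, a single graph-convolution layer in the standard Kipf-and-Welling form, which is the kind of propagation a GCN-based detector builds on (a generic sketch, not the paper's exact architecture):

```python
import numpy as np

def gcn_layer(A, H, W):
    A_hat = A + np.eye(A.shape[0])            # adjacency with self-loops
    D_inv_sqrt = np.diag(A_hat.sum(axis=1) ** -0.5)
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)  # ReLU

A = np.array([[0., 1.], [1., 0.]])            # two correlated functions
H = np.array([[1., 0.], [0., 1.]])            # their AST-derived features
print(gcn_layer(A, H, np.eye(2)))
```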
- Published
- 2019
9. Improve Language Modeling for Code Completion Through Learning General Token Repetition of Source Code with Optimized Memory
- Author
-
Xiang Chen, Yixiao Yang, and Jiaguang Sun
- Subjects
Thesaurus (information retrieval), Source code, Repetition (rhetorical device), Computer Networks and Communications, Graph neural networks, Computer science, Programming language, Security token, Computer Graphics and Computer-Aided Design, Artificial Intelligence, Code (cryptography), Language model, Software, Natural language
- Abstract
In the last few years, applying language models to source code has been the state-of-the-art method for solving the code completion problem. However, compared with natural language, code has more pronounced repetition characteristics. For example, a variable can be used many times in the following code, so variables in source code have a high chance of being repeated, and cloned code and templates also have the property of token repetition. Capturing the token repetition of source code is therefore important. In different projects, variables or types are usually named differently, which means that a model trained on a finite data set will encounter many unseen variables or types in another data set. How to model the semantics of unseen data and how to predict unseen data based on the patterns of token repetition are two challenges in code completion. Hence, in this paper, token repetition is modelled as a graph, and we propose a novel REP model based on a deep graph neural network to learn the token repetition of code. The REP model identifies the edge connections of a graph to recognize token repetition. For predicting the repetition of a given token, the information of all the previous tokens needs to be considered. We use a memory neural network (MNN) to model the semantics of each distinct token to make the framework of the REP model more targeted. The experiments indicate that the REP model performs better than the LSTM model. Compared with the Attention-Pointer network, we also discover that the attention mechanism does not work in all situations. The proposed REP model achieves similar or slightly better prediction accuracy than the Attention-Pointer network while consuming less training time. We also find another attention mechanism that could further improve the prediction accuracy.
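A crude stand-in for the repetition-prediction step (our reading; the actual REP/MNN model is far richer): keep a memory vector per distinct token seen so far and softmax their match with the current query to get repetition probabilities.

```python
import numpy as np

def repeat_scores(memory, query):
    """memory: {token: vector}; query: current-context vector."""
    logits = np.array([v @ query for v in memory.values()])
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return dict(zip(memory.keys(), p))  # P(next token repeats each candidate)

rng = np.random.default_rng(1)
mem = {"buf": rng.normal(size=4), "len": rng.normal(size=4)}
print(repeat_scores(mem, mem["buf"]))  # 'buf' should score highest
```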
- Published
- 2019
10. Polar
- Author
-
Xun Jiao, Yu Jiang, Jiaguang Sun, Feilong Zuo, Zhengxiong Luo, and Jian Gao
- Subjects
Protocol (science), Computer science, National Vulnerability Database, Code coverage, Fuzz testing, Function Code, Taint checking, Hardware and Architecture, Embedded system, Internet Protocol, Modbus, Software
- Abstract
Industrial Control System (ICS) protocols are widely used to build communications among system components. Compared with common internet protocols, ICS protocols exert more control over remote devices by carrying a specific field called the "function code", which assigns what the receiving end should do. Therefore, it is of vital importance to ensure their correctness. However, traditional vulnerability detection techniques such as fuzz testing are challenged by the increasing complexity of these diverse ICS protocols. In this paper, we present a function code aware fuzzing framework, Polar, which automatically extracts semantic information from the ICS protocol and utilizes this information to accelerate security vulnerability detection. Based on static analysis and dynamic taint analysis, Polar initiates the values of the function code field and identifies vulnerable operations. Then, novel semantics-aware mutation and selection strategies are designed to optimize the fuzzing procedure. For evaluation, we implement Polar on top of two popular fuzzers, AFL and AFLFast, and conduct experiments on several widely used ICS protocols such as Modbus, IEC104, and IEC 61850. Results show that, compared with AFL and AFLFast, Polar achieves the same code coverage and bug detection numbers at 1.5X to 12X the speed. It also covers 0% to 91% more paths within 24 hours. Furthermore, Polar has exposed 10 previously unknown vulnerabilities in those protocols, 6 of which have been assigned unique CVE identifiers in the US National Vulnerability Database.
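The function-code awareness can be sketched for a Modbus/TCP frame (offsets and the code list are illustrative assumptions, not Polar's implementation): keep the function-code field within meaningful values while mutating the rest of the payload.

```python
import random

INTERESTING_FUNC_CODES = [0x01, 0x03, 0x06, 0x10]  # e.g. common Modbus codes

def mutate(frame: bytes) -> bytes:
    out = bytearray(frame)
    out[7] = random.choice(INTERESTING_FUNC_CODES)  # function code sits right
                                                    # after the 7-byte MBAP header
    i = random.randrange(8, len(out))               # bit-flip a payload byte
    out[i] ^= 1 << random.randrange(8)
    return bytes(out)

print(mutate(bytes(12)).hex())
```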
- Published
- 2019
11. Hypergraph-Induced Convolutional Networks for Visual Classification
- Author
-
Zizhao Zhang, Jiaguang Sun, Heyuan Shi, Xibin Zhao, Yubo Zhang, Yue Gao, and Nan Ma
- Subjects
Hypergraph, Computer Networks and Communications, Computer science, Convolutional neural network, Pattern Recognition, Automated, Data modeling, Text mining, Artificial Intelligence, Humans, Artificial neural network, Pattern recognition, Graph, Computer Science Applications, Visualization, Data set, Graph (abstract data type), Neural Networks, Computer, Artificial intelligence, Algorithms, Photic Stimulation, Software
- Abstract
At present, convolutional neural networks (CNNs) have become popular in visual classification tasks because of their superior performance. However, CNN-based methods do not consider the correlation of the visual data to be classified. Recently, graph convolutional networks (GCNs) have mitigated this problem by modeling the pairwise relationships in visual data. Real-world visual classification tasks, however, typically must address numerous complex relationships in the data that do not fit the pairwise graph structure modeled by GCNs. It is therefore vital to explore the underlying correlation of visual data. Regarding this issue, we propose a framework called the hypergraph-induced convolutional network to explore the high-order correlation in visual data within deep neural networks. First, a hypergraph structure is constructed to formulate the relationships in the visual data. Then, the high-order correlation is optimized by a learning process based on the constructed hypergraph. The classification tasks are performed by considering the high-order correlation in the data. Thus, the convolution of the hypergraph-induced convolutional network is based on the corresponding high-order relationships, and the optimization of the network uses each data sample while considering the high-order correlation of the data. To evaluate the proposed hypergraph-induced convolutional network framework, we have conducted experiments on three visual data sets: the National Taiwan University 3-D model data set, the Princeton Shape Benchmark, and the multiview RGB-depth object data set. The experimental results and comparisons on all data sets demonstrate the effectiveness of our proposed hypergraph-induced convolutional network compared with the state-of-the-art methods.
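Hypergraph convolution is usually written as X' = sigma(Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2} X Theta); a small numpy rendering of that standard form follows (it may differ in detail from the paper's layer):

```python
import numpy as np

def hgnn_layer(X, H, w, Theta):
    """X: (n, d) node features; H: (n, m) incidence; w: (m,) edge weights."""
    Dv_inv_sqrt = np.diag((H * w).sum(axis=1) ** -0.5)  # node degrees
    De_inv = np.diag(1.0 / H.sum(axis=0))               # hyperedge degrees
    G = Dv_inv_sqrt @ H @ np.diag(w) @ De_inv @ H.T @ Dv_inv_sqrt
    return np.maximum(G @ X @ Theta, 0.0)

H = np.array([[1., 0.], [1., 1.], [0., 1.]])  # 3 samples, 2 hyperedges
X, w, Theta = np.eye(3), np.ones(2), np.eye(3)
print(hgnn_layer(X, H, w, Theta).shape)       # (3, 3)
```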
- Published
- 2019
12. Tolerating C Integer Error via Precision Elevation
- Author
-
Xiaoyu Song, Min Zhou, Jiaguang Sun, Ming Gu, and Xi Cheng
- Subjects
Computer science, Spec#, Theoretical Computer Science, Set (abstract data type), Computational Theory and Mathematics, Hardware and Architecture, Code (cryptography), Test suite, NIST, Arithmetic, Software, Range (computer programming), Integer (computer science)
- Abstract
In C programs, integer errors are a common yet important kind of defect, caused by arithmetic operations that produce values unrepresentable in certain types. Integer errors are harbored in a wide range of applications and can lead to serious software failures and exploitable vulnerabilities. Due to the complicated semantics of C, manually preventing integer errors is challenging even for experienced developers. In this paper we propose a novel approach to automate C integer error repair by elevating the precision of arithmetic operations according to a set of code transformation rules. A large portion of integer errors can be repaired by recovering the expected results (i.e., tolerance) instead of removing program functionality. Our approach is fully automatic and requires no code specifications. Furthermore, the transformed code is guaranteed to be well-typed and has the conservativeness property with respect to the original code. Our approach is implemented in a prototype, CIntFix, which succeeds in repairing all the integer errors from 7 categories in NIST's Juliet Test Suite. Furthermore, CIntFix is evaluated on large code bases in SPEC CINT2000, scaling to 366 KLOC within 126 seconds while the transformed code has a 10.5 percent slowdown on average. The evaluation results substantiate the potential of our approach in real-world scenarios.
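A toy rendering of one precision-elevation rule (Python integers stand in for a wider C type; the real tool rewrites the C source itself): evaluate in higher precision, then check the result against the original type's range.

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1  # assumed 32-bit int

def elevated_mul(a, b):
    wide = a * b  # unbounded here, like computing in a wider type
    if not (INT_MIN <= wide <= INT_MAX):
        raise OverflowError("int multiplication would overflow")
    return wide

print(elevated_mul(70000, 30000))   # 2100000000: fits, returned as expected
# elevated_mul(70000, 40000)        # would raise instead of silently wrapping
```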
- Published
- 2019
13. A Multi-Player Minimax Game for Generative Adversarial Networks
- Author
-
Mingsheng Long, Philip S. Yu, Ying Jin, Yunbo Wang, Jiaguang Sun, and Jianmin Wang
- Subjects
Discriminator, Computer science, Minimax, Adversarial system, Artificial intelligence, Game theory, Generative grammar, Diversity (business), Generator (mathematics)
- Abstract
While multiple discriminators have recently been exploited to enhance the discriminability and diversity of Generative Adversarial Networks (GANs), such independent discriminators may not collaborate harmoniously to learn diverse and complementary decision boundaries. This paper extends the original two-player adversarial game of GANs by introducing a new multi-player objective named the Discriminator Discrepancy Loss (DDL) for diversifying the multiple discriminators. Besides the competition between the generator and each discriminator, there are also competitions among the discriminators: 1) When training the multiple discriminators, we simultaneously minimize the original GAN loss and maximize the DDL, seeking a good trade-off between accuracy and diversity. This yields diversified discriminators that fit the generated data distribution to the real data distribution from more comprehensive perspectives. 2) When training the generator, we minimize the DDL to encourage the generator to confuse all discriminators. This enhances the diversity of the generated data distribution. Further, we propose a layer-sharing network architecture for the multiple discriminators, which allows them to learn about the shared low-level features from distinct perspectives through better collaboration, and also makes our model more lightweight than existing multi-discriminator approaches. Our DDL-GAN remarkably outperforms other GANs on five standard datasets for image generation tasks.
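A hedged sketch of the DDL term (our formulation of "pairwise discrepancy"; the paper's exact loss may differ): average pairwise disagreement of discriminator outputs, maximized when updating the discriminators and minimized when updating the generator.

```python
import torch

def ddl(scores):
    """scores: list of per-discriminator output tensors of shape (batch,)."""
    terms = [(scores[i] - scores[j]).abs().mean()
             for i in range(len(scores)) for j in range(i + 1, len(scores))]
    return torch.stack(terms).mean()

s = [torch.rand(8) for _ in range(3)]
d_objective = -ddl(s)   # discriminators: minimize GAN loss, maximize DDL
g_objective = ddl(s)    # generator: minimize DDL to confuse all of them
print(float(d_objective), float(g_objective))
```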
- Published
- 2020
14. Necessity and Capability of Flow, Context, Field and Quasi Path Sensitive Points-to Analysis
- Author
-
Ming Gu, Yuexing Wang, Min Zhou, and Jiaguang Sun
- Subjects
Spectrum analyzer, Control flow, Computer science, Context sensitivity, Pointer (computer programming), CPAchecker, Pointer analysis, Algorithm, Automaton
- Abstract
Precise pointer analysis is desirable, since many program analyses benefit from it in both precision and performance. There are several dimensions of pointer analysis precision: flow sensitivity, context sensitivity, field sensitivity and path sensitivity. The more dimensions a pointer analysis considers, the more accurate its results will be. However, considering all dimensions is difficult because the trade-off between precision and efficiency must be balanced. This paper presents a flow, context, field and quasi path sensitive pointer analysis algorithm for C programs. Our algorithm runs on a control flow automaton, a key structure that makes our analysis flow sensitive. During the analysis, we use function summaries to obtain context information. Elements of aggregate structures are handled to improve precision. We collect path conditions to filter out unreachable paths and make all points-to relations gated. For efficiency, we propose a multi-entry mechanism. The algorithm is implemented in TsmartGP, an extension of CPAchecker. Our algorithm is compared with several state-of-the-art algorithms, and TsmartGP is compared with Cppcheck and the Clang Static Analyzer by detecting uninitialized pointer errors in 13 real-world applications. The experimental results show that our algorithm is more accurate and that TsmartGP can find more errors than the other tools.
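A drastically simplified flow-sensitive points-to kernel (context, field and path sensitivity from the paper are omitted): process statements in control-flow order and update points-to sets as assignments are seen.

```python
def points_to(stmts):
    pts = {}
    for lhs, op, rhs in stmts:          # ('p', '=&', 'x') or ('q', '=', 'p')
        pts[lhs] = {rhs} if op == "=&" else set(pts.get(rhs, ()))
    return pts

# p = &x; q = p;  ->  both p and q may point to x at this program point
print(points_to([("p", "=&", "x"), ("q", "=", "p")]))
```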
- Published
- 2019
15. Dependable Model-driven Development of CPS
- Author
-
Houbing Song, Yu Jiang, Lui Sha, Jiaguang Sun, Yixiao Yang, Yong Guan, Han Liu, and Ming Gu
- Subjects
Control and Optimization, Computer Networks and Communications, Computer science, Runtime verification, Timed automaton, Stateflow, Human-Computer Interaction, Software, Artificial Intelligence, Hardware and Architecture, Embedded system, VHDL, Dependability, Code generation, Formal verification
- Abstract
Simulink is widely used for model-driven development (MDD) of cyber-physical systems. Typically, Simulink-based development starts with Stateflow modeling, followed by simulation, validation, and code generation mapped to physical execution platforms. However, recent trends have raised the demand for rigorous verification of safety-critical applications to prevent intrinsic development faults and improve system dependability, which is unfortunately challenging. Even if the constructed Stateflow model and the generated code pass the validation of Simulink Design Verifier and Simulink Polyspace, respectively, the system may still fail due to implicit defects contained in the design model (design defects) and the generated code (implementation defects). In this article, we bridge Stateflow-based MDD and well-defined rigorous verification to reduce development faults. First, we develop a self-contained toolkit to translate a Stateflow model into timed automata, supporting the major advanced modeling features of Stateflow. Taking advantage of the strong verification capability of Uppaal, we can not only find bugs in Stateflow models that are missed by Simulink Design Verifier but also check more important temporal properties. Next, we customize a non-intrusive runtime verifier for the generated VHDL and C code of a Stateflow model for monitoring. The major strength of this customization is the flexibility to collect and analyze runtime properties with a pure software monitor, which offers more opportunities for engineers to achieve high reliability of the target system compared with the traditional practice that relies only on Simulink Polyspace. In this way, safety-critical properties are verified both at the model level and at the consistent system implementation level, with the physical execution environment taken into consideration. We apply our approach to the development of a typical cyber-physical system, a train communication controller based on the IEC 61375 standard. Experiments show that more ambiguities in the standard are detected and confirmed, and more development faults, along with the corresponding errors that would lead to system failure, are removed. Furthermore, the verified implementation has been deployed on real trains.
- Published
- 2018
16. Beyond Pairwise Matching: Person Reidentification via High-Order Relevance Learning
- Author
-
Shaoyi Du, Jiaguang Sun, Yubo Zhang, Yue Gao, Nan Wang, and Xibin Zhao
- Subjects
Matching (statistics), Computer Networks and Communications, Computer science, Feature extraction, Pattern recognition, Machine learning, Computer Science Applications, Artificial Intelligence, Feature (computer vision), Relevance (information retrieval), Pairwise comparison, Artificial intelligence, Focus (optics), Representation (mathematics), Software
- Abstract
Person reidentification has attracted extensive research efforts in recent years. It is challenging due to the varied visual appearance caused by illumination, view angle, background, and possible occlusions, which leads to difficulties when measuring the relevance, i.e., similarities, between probe and gallery images. Existing methods mainly focus on pairwise distance metric learning for person reidentification. In practice, pairwise image matching may limit the data available for comparison (just the probe and one gallery subject) and thus lead to suboptimal results. The correlation among gallery data can also be helpful for the person reidentification task. In this paper, we propose to investigate the high-order correlation among the probe and gallery data, rather than pairwise matching, to jointly learn the relevance of gallery data to the probe. Recalling recent progress on feature representation in person reidentification, it is difficult to select the best feature, and each type of feature can benefit person description in different ways. Under such circumstances, we propose a multi-hypergraph joint learning algorithm to learn the relevance in cooperation with multiple features of the imaging data. More specifically, one hypergraph is constructed using each type of feature, so multiple hypergraphs can be generated accordingly. Then, the learning process is conducted on the multi-hypergraph structure, and the identity of a probe is determined by its relevance to each gallery datum. The merit of the proposed scheme is twofold. First, different from pairwise image matching, the proposed method jointly explores the relationships among different images. Second, multimodal data, i.e., different features, can be formulated in the multi-hypergraph structure, which conveys more information in the learning process and can be easily extended. We note that the proposed method is a general framework that can incorporate any combination of features, and thus is flexible in practice. Experimental results and comparisons with the state-of-the-art methods on three public benchmarking data sets demonstrate the superiority of the proposed method.
- Published
- 2018
17. Parallelizing SMT solving: Lazy decomposition and conciliation
- Author
-
Xiaoyu Song, Ming Gu, Jiaguang Sun, Xi Cheng, and Min Zhou
- Subjects
Linguistics and Language, Computer science, Conciliation, Parallel computing, Solver, Language and Linguistics, Satisfiability, Artificial Intelligence, Satisfiability modulo theories, Leverage (statistics), Boolean satisfiability problem, Random problems
- Abstract
Satisfiability Modulo Theories (SMT) is the satisfiability problem for first-order formulae with respect to background theories. SMT extends propositional satisfiability by introducing various underlying theories. To improve the efficiency of SMT solving, many efforts have been made on low-level algorithms, but they generally cannot leverage the capability of parallel hardware. We propose a high-level and flexible framework, namely lazy decomposition and conciliation (LDC), to parallelize solving for quantifier-free SMT problems. Overall, an SMT problem is first decomposed into subproblems, then local reasoning inside each subproblem is conciliated with global reasoning over the symbols shared across subproblems in parallel. LDC can be built on any existing solver without tuning its internal implementation, and is flexible as it is applicable to various underlying theories. We instantiate LDC in the theory of equality with uninterpreted functions, and implement a parallel solver PZ3 based on Z3. Experimental results on the QF_UF benchmarks from SMT-LIB as well as random problems show the potential of LDC: (1) PZ3 generally outperforms Z3 in 4 out of 8 problem subcategories under various core configurations; (2) PZ3 usually achieves super-linear speed-up over Z3 on problems with sparse structures, which makes it possible to choose an appropriate solver from Z3 and PZ3 in advance according to the structure of the input problem; (3) compared to PCVC4, a state-of-the-art portfolio-based parallel SMT solver, PZ3 achieves speed-up on a larger portion of problems and has a better overall speed-up ratio.
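A deliberately tiny illustration of the decompose-then-conciliate idea using the Z3 Python bindings (pip install z3-solver); real LDC runs the parts in parallel and iterates conciliation, which this sketch collapses into one exchange over the shared symbol:

```python
from z3 import Int, Solver, sat

x, y = Int("x"), Int("y")
part1 = [x > 3, y == x + 1]   # subproblem 1
part2 = [x < 10]              # subproblem 2, sharing symbol x

s1, s2 = Solver(), Solver()
s1.add(part1)
s2.add(part2)
assert s1.check() == sat
s2.add(x == s1.model()[x])    # conciliate the shared symbol's local value
print(s2.check())             # sat: the local assignments are compatible
```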
- Published
- 2018
18. Temporal Coverage Analysis for Dynamic Verification
- Author
-
Ming Gu, Xiaoyu Song, Min Zhou, Jiaguang Sun, and William N. N. Hung
- Subjects
Correctness, Sequential logic, Computer science, Runtime verification, Probabilistic logic, Software bug, Systems design, Probabilistic analysis of algorithms, Algorithm design, Data mining, Electrical and Electronic Engineering
- Abstract
Dynamic verification is widely used to ensure the logical correctness of system designs. Verification progress is usually gauged by coverage metrics, most of which measure which sub-structures of the design under verification are exercised. More importantly, the probability of a bug being detected can be approximated by probabilistic coverage analysis. However, existing analysis methods do not consider the temporal nature of digital systems; that is, they apply only to combinational circuits and not to sequential circuits. In this brief, we propose a probabilistic analysis framework that takes into account the temporal behavior of system designs, together with an effective analysis algorithm that can estimate the probability of a bug being detected in a sequential circuit. Experimental results on 17489 random instances show that our method is both efficient and accurate. The analysis has time complexity quadratic in the number of coverage bins and linear in the number of simulation cycles, and its results have an average relative error of about 7.38%. In practice, our analysis results can be used to measure the completeness of verification.
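As a worked toy model of the kind of estimate involved (our simplification, not the paper's estimator): if the bug hides in coverage bin i with prior q_i and bin i is exercised with probability p_i per cycle, the detection probability over n cycles can be computed as below.

```python
def detection_prob(p, q, n):
    """P(detect) = sum_i q_i * (1 - (1 - p_i)**n)."""
    return sum(qi * (1 - (1 - pi) ** n) for pi, qi in zip(p, q))

print(detection_prob(p=[0.01, 0.05], q=[0.5, 0.5], n=100))  # ~0.81
```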
- Published
- 2018
19. Data-Centered Runtime Verification of Wireless Medical Cyber-Physical System
- Author
-
Ming Gu, Jiaguang Sun, Lui Sha, Yu Jiang, Rui Wang, and Houbing Song
- Subjects
Decision support system, Domain-specific language, Event (computing), Computer science, Runtime verification, Cyber-physical system, Computer Science Applications, Data acquisition, Control and Systems Engineering, Data mining, Electrical and Electronic Engineering, Software engineering, Formal verification, Information Systems
- Abstract
Wireless medical cyber-physical systems are widely adopted in the daily practice of medicine, where huge amounts of data are sampled by wireless medical devices and sensors and passed to decision support systems (DSSs). Many text-based guidelines have been encoded for DSS work-flow simulation to automate health care based on the collected data. But for some complex and life-critical diseases, it is highly desirable to rigorously and automatically verify complex temporal properties encoded in those data, which brings new challenges to current simulation-based DSSs with limited support for automatic formal verification and real-time data analysis. In this paper, we conduct the first study on applying runtime verification to cooperate with current DSSs based on real-time data. Within the proposed technique, a user-friendly domain-specific language, named DRTV, is designed to specify the vital real-time data sampled by medical devices and the temporal properties originating from clinical guidelines. Interfaces are developed for data acquisition and communication. Then, for medical practice scenarios described in a DRTV model, we automatically generate event sequences and runtime property verifier automata. If a temporal property is violated, a real-time warning is produced by the formal verifier and passed to the medical DSS. We have used DRTV to specify different kinds of medical care scenarios and have applied the proposed technique to assist an existing wireless medical cyber-physical system. As the experimental results show, in terms of warning detection it outperforms the use of a DSS alone or human inspection, and it improves the quality of clinical health care in hospitals.
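The flavor of a data-centered monitor can be sketched as follows (property, threshold and names are illustrative; DRTV's actual syntax is not reproduced here):

```python
def monitor(samples, threshold=120, window=5):
    """Warn when heart rate stays above threshold for `window` samples."""
    run = 0
    for t, hr in samples:               # (timestamp, heart-rate) stream
        run = run + 1 if hr > threshold else 0
        if run >= window:
            yield f"warning at t={t}: HR > {threshold} for {window} samples"

for w in monitor([(i, 130) for i in range(6)]):
    print(w)
```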
- Published
- 2017
20. Formal Modeling and Verification of a Rate-Monotonic Scheduling Implementation with Real-Time Maude
- Author
-
Jiaguang Sun, Jiaxiang Liu, Min Zhou, Ming Gu, and Xiaoyu Song
- Subjects
Rate-monotonic scheduling, Schedule, Correctness, Job shop scheduling, Modeling language, Programming language, Computer science, Dynamic priority scheduling, Flow shop scheduling, Fair-share scheduling, Scheduling (computing), Computer engineering, Control and Systems Engineering, Two-level scheduling, Electrical and Electronic Engineering, Formal verification
- Abstract
Rate-monotonic scheduling (RMS) is one of the most important real-time scheduling algorithms used in industry. There are a large number of results about RMS, especially on its schedulability. However, the theoretical results do not contain enough detail to be used directly for an industrial RMS implementation. On the other hand, the correctness of such an implementation is of crucial importance. In this paper, we analyze a realistic RMS implementation using Real-Time Maude, a formal modeling language and analysis tool based on rewriting logic. Overhead and some details of the hardware are taken into account in the model. We validate the schedulability and the correctness of the implementation within key scenarios. The soundness and the completeness of our approach are substantiated.
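For orientation, the classical sufficient schedulability bound for RMS (Liu and Layland) that such implementations build on; the verified model in the paper additionally accounts for overhead and hardware detail that this one-liner ignores:

```python
def rms_schedulable(tasks):
    """tasks: [(C_i, T_i), ...]; sufficient (not necessary) utilization test."""
    n = len(tasks)
    u = sum(c / t for c, t in tasks)
    return u <= n * (2 ** (1 / n) - 1)

print(rms_schedulable([(1, 4), (2, 8)]))  # True: U = 0.5 <= ~0.828
```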
- Published
- 2017
21. Verification of Implementations of Cryptographic Hash Functions
- Author
-
Jiaguang Sun, Houbing Song, Dexi Wang, Yu Jiang, Fei He, and Ming Gu
- Subjects
Theoretical computer science, General Computer Science, Computer science, Hash function, Cryptography, SHA-2, Cryptographic hash function, General Materials Science, Security of cryptographic hash functions, Key management, Secure Hash Algorithm, Cryptographic primitive, General Engineering, Authorization, MDC-2, Cryptographic protocol, Hash-based message authentication code, Merkle–Damgård construction, Hash chain, Cryptographic nonce
- Abstract
Cryptographic hash functions have become the basis of modern network computing for identity authorization and secure computing; protocol consistency of cryptographic hash functions is one of the most important properties affecting the security and correctness of cryptographic implementations, and it should be well proven before being applied in practice. Software verification has seen substantial application in safety-critical areas and has shown the ability to deliver better quality assurance for modern software; thus, applying software verification to prove the protocol consistency of cryptographic hash functions is a reasonable approach to establish their correctness. Verifying the protocol consistency of cryptographic hash functions involves modeling the cryptographic protocol and analyzing the cryptographic implementation; this requires a dedicated cryptographic implementation model that preserves the semantics of the code, efficient analysis of cryptographic operations on arrays and bits, and the ability to verify large-scale implementations. In this paper, we propose a fully automatic software verification framework, VeriHash, that brings software verification to protocol consistency proofs for cryptographic hash function implementations. It addresses the above challenges by introducing a novel cryptographic model design for modeling the semantics of cryptographic hash function implementations, extended array theories for the analysis of operations, and compositional verification for scalability. We evaluated our verification framework on two SHA-3 cryptographic hash function implementations: the winner of the NIST SHA-3 competition, Keccak, and an open-source hash program, RHash. We successfully verified the core parts of the two implementations and reproduced a bug in the published edition of RHash.
- Published
- 2017
22. Train Driving Data Learning with S-CNN Model for Gear Prediction and Optimal Driving
- Author
-
Zhihua Zhong, Jiaguang Sun, Xia Yanan, Yao Liu, and Jin Huang
- Subjects
Scheme (programming language), Computer science, Real-time computing, Training (meteorology), Context (language use), Energy consumption, Feature (machine learning), Train, Energy (signal processing), Efficient energy use
- Abstract
With the fast growth of fleet sizes and railway mileage, the energy consumption of trains is becoming a serious concern globally. The nature of railways offers a unique opportunity to optimize the energy efficiency of engines with the aid of the various landforms along railway lines. However, because of its non-linearity, high dimensionality, complicated constraints and time-variant behavior, it is difficult to obtain the optimal train-driving scheme that saves the most energy. This paper proposes a dexterously augmented CNN model called sequential CNN (S-CNN) to generate automatic train-driving schemes. Forward and backward features are defined to depict the context of the running train, and they are processed separately in the input layer. Operations of experienced train drivers are used for training the CNN model. The S-CNN model takes both time series and space series into consideration and pays attention to their ordinal attributes. The proposed techniques are validated on a hardware-in-the-loop platform using realistic data. Experimental results show that the driving operations obtained by the S-CNN model are very similar to those of experienced human drivers and save more than 10% energy compared with the average performance of human drivers.
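The forward/backward feature split can be pictured simply (an illustrative sketch; the paper's actual feature definitions are richer): partition the landform profile around the train's position into a segment already covered and a segment ahead, feeding each to its own input branch.

```python
def context(profile, pos, k=3):
    backward = profile[max(0, pos - k):pos]  # landform already passed
    forward = profile[pos:pos + k]           # landform ahead of the train
    return backward, forward

print(context([0, 1, 2, 5, 3, 1], pos=3))    # ([0, 1, 2], [5, 3, 1])
```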
- Published
- 2019
23. TsmartGP: A Tool for Finding Memory Defects with Pointer Analysis
- Author
-
Jiaguang Sun, Yuexing Wang, Guang Chen, Min Zhou, and Ming Gu
- Subjects
Memory defects, Control flow, Computer science, Pointer (computer programming), Static program analysis, Static analysis, Algorithm, Pointer analysis
- Abstract
Precise pointer analysis is desirable, since it is a core technique for finding memory defects. There are several dimensions of pointer analysis precision: flow sensitivity, context sensitivity, field sensitivity and path sensitivity. For static analysis tools utilizing pointer analysis, considering all dimensions is difficult because the trade-off between precision and efficiency must be balanced. This paper presents TsmartGP, a static analysis tool for finding memory defects in C programs with a precise and efficient pointer analysis. The pointer analysis algorithm is flow, context, field, and quasi path sensitive. Control flow automatons are the key structures that make our analysis flow sensitive. Function summaries are applied to obtain context information, and elements of aggregate structures are handled to improve precision. Path conditions are used to filter out unreachable paths. For efficiency, a multi-entry mechanism is proposed. Utilizing the pointer analysis algorithm, we implement a checker in TsmartGP to find uninitialized pointer errors in 13 real-world applications. Cppcheck and the Clang Static Analyzer are chosen for comparison. The experimental results show that TsmartGP can find more errors, with higher accuracy, than Cppcheck and the Clang Static Analyzer. The demo video is available at https://youtu.be/IQlshemk6OA.
- Published
- 2019
24. A Sensor Attack Detection Method in Intelligent Vehicle with Multiple Sensors
- Author
-
Yu Jing, Houbing Song, Yong Guan, Jiaguang Sun, Rui Wang, and Kang Yang
- Subjects
Variable (computer science), Identification (information), Computer science, Real-time computing, Pairwise comparison, System dynamics model, Transient (computer programming), Fault model, Detection rate, Multiple sensors
- Abstract
With the rapid development of intelligent vehicles, more and more researchers are paying attention to their security issues. This paper considers a vehicle system with multiple sensors measuring the same physical variable. Some of these sensors may be maliciously attacked, rendering the system unable to work properly. This paper mainly addresses the detection and identification of malicious attacks on sensors in the presence of transient faults. Although some solutions exist, the existing methods can hardly capture attacks in which a professional attacker manipulates the sensor output very slightly or infrequently. To address this problem, we design a resilient sensor attack detection algorithm. The algorithm adds a virtual sensor to the system, builds a fault model for each sensor, and uses the pairwise inconsistencies between the sensors to detect attacks. To improve the detection rate in the presence of transient faults, we take the system dynamics model into account and incorporate historical measurements into the detection algorithm. In addition, we propose a method to select transient fault model parameters to obtain a more accurate sensor fault model in a dynamic environment. Finally, we obtain real experimental data from the EV3 platform to verify the algorithm. Experimental results show that the proposed method is more robust and can detect and identify more attacks. In particular, for stealthy attacks, which are extremely difficult to detect, the detection and recognition rates are increased by about 90%.
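The pairwise-inconsistency test can be sketched with interval fault models (a toy under our assumptions): each sensor reading expands to an interval, and two sensors are flagged as inconsistent when their intervals do not overlap.

```python
def inconsistent(a, b):
    (lo1, hi1), (lo2, hi2) = a, b
    return hi1 < lo2 or hi2 < lo1          # disjoint intervals

readings = {"s1": (9.8, 10.2), "s2": (9.9, 10.3), "s3": (12.0, 12.4)}
suspects = [(i, j) for i in readings for j in readings
            if i < j and inconsistent(readings[i], readings[j])]
print(suspects)  # s3 disagrees with both s1 and s2 -> s3 is the outlier
```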
- Published
- 2019
25. Engineering a Better Fuzzer with Synergically Integrated Optimizations
- Author
-
Jie Liang, Zijiang Yang, Jiaguang Sun, Chengnian Sun, Xun Jiao, Yu Jiang, Mingzhe Wang, and Y. B. Chen
- Subjects
Computer science, Fuzzy set, Fuzz testing, Root cause, Machine learning, Scheduling (computing), Granularity, Artificial intelligence
- Abstract
State-of-the-art fuzzers implement various optimizations to enhance their performance. As the optimizations reside in different stages, such as input seed selection and mutation, it is tempting to combine optimizations across stages. However, our initial attempts demonstrate that naive combination actually worsens performance, which explains why most optimizations are still isolated by stage and metric. In this paper, we present InteFuzz, the first framework that synergically integrates multiple fuzzing optimizations. We analyze the root cause of the performance degradation in naive combination, and discover that optimizations conflict in coverage criteria and optimization granularity. To resolve the conflicts, we propose a novel priority-based scheduling mechanism. The dynamic integration considers both the branch-based and block-based coverage feedback used by most fuzzing optimizations. In our evaluation, we extract four optimizations from popular fuzzers such as AFLFast and FairFuzz and compare InteFuzz against naive combinations. The evaluation results show that InteFuzz outperforms the naive combination by 29% and 26% in path and branch coverage, respectively. Additionally, InteFuzz triggers 222 more unique crashes, and discovers 33 zero-day vulnerabilities in real-world projects, 12 of which have been registered as CVEs.
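A minimal sketch of priority-based scheduling over optimizations with mixed coverage feedback (names, metrics and scoring are hypothetical, not InteFuzz's internals):

```python
import heapq

def schedule(optimizations, feedback):
    heap = [(-feedback.get(o["name"], 0), o["name"]) for o in optimizations]
    heapq.heapify(heap)                 # highest recent gain runs first
    while heap:
        _, name = heapq.heappop(heap)
        yield name

opts = [{"name": "fairfuzz-mask", "metric": "branch"},
        {"name": "aflfast-power", "metric": "block"}]
fb = {"fairfuzz-mask": 7, "aflfast-power": 3}
print(list(schedule(opts, fb)))  # ['fairfuzz-mask', 'aflfast-power']
```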
- Published
- 2019
26. Industry practice of coverage-guided enterprise Linux kernel fuzzing
- Author
-
Jiaguang Sun, Xun Jiao, Ying Fu, Houbing Song, Runzhe Wang, Heyuan Shi, Yu Jiang, Xiaohai Shi, and Mingzhe Wang
- Subjects
Computer science, National Vulnerability Database, Vulnerability detection, Linux kernel, Fuzz testing, Deadlock, General protection fault, High complexity, Kernel (statistics), Operating system
- Abstract
Coverage-guided kernel fuzzing is a widely used technique that has helped kernel developers and testers discover numerous vulnerabilities. However, due to the high complexity of the application and hardware environment, there is little study on deploying fuzzing to enterprise-level Linux kernels. In this paper, collaborating with enterprise developers, we present the industry practice of deploying kernel fuzzing on four different enterprise Linux distributions that are responsible for internal business and external services of the company. We have addressed the following outstanding challenges in deploying a popular kernel fuzzer, syzkaller, to these enterprise Linux distributions: absent coverage support, kernel configuration inconsistency, bugs in shallow paths, and continuous fuzzing complexity. This led to the detection of 41 previously unknown reproducible bugs in these enterprise Linux kernels, and 6 bugs with CVE IDs in the U.S. National Vulnerability Database, including flaws that cause general protection faults, deadlocks, and use-after-frees.
- Published
- 2019
27. VFQL: combinational static analysis as query language
- Author
-
Jiaguang Sun, Guang Chen, Min Zhou, and Yuexing Wang
- Subjects
Domain-specific language, Modeling language, Computer science, Programming language, Static analysis, Query language, Problem domain, Path (graph theory), Control flow graph, Pointer analysis
- Abstract
Value flows are widely used in static analysis to detect bugs. Existing techniques usually employ a pointer analysis and generate source-sink summaries defined by the problem domain, after which a solver is invoked to determine whether each path is feasible. However, most tools do not provide an easy way for users to find user-defined bugs within the same architecture used for finding pre-defined bugs. This paper presents VFQL, an expressive query language over the value flow graph, together with a framework that executes queries to find user-defined defects. Moreover, VFQL provides a GUI to visualize the value flow graph and a modeling language to define system or user libraries without code, which further enhances its usability. The experimental results on open benchmarks show that VFQL achieves competitive performance against other state-of-the-art tools. A case study conducted on an open-source program shows that the flexible query and modeling language provide great support for finding user-specified defects.
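Whatever the surface syntax, a VFQL-style source-to-sink query ultimately reduces to reachability over the value flow graph; a small sketch (node names hypothetical):

```python
from collections import deque

def flows(vfg, sources, sinks):
    seen, q = set(sources), deque(sources)
    while q:
        n = q.popleft()
        if n in sinks:
            return True
        for m in vfg.get(n, ()):
            if m not in seen:
                seen.add(m)
                q.append(m)
    return False

vfg = {"malloc@3": ["p@4"], "p@4": ["q@9"], "q@9": ["free@12"]}
print(flows(vfg, {"malloc@3"}, {"free@12"}))  # True: the value reaches free
```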
- Published
- 2019
28. Go-clone: graph-embedding based clone detector for Golang
- Author
-
Yu Jiang, Jian Gao, Jiaguang Sun, Zhenchang Xing, Ming Gu, Cong Wang, Huafeng Zhang, and Weiliang Yin
- Subjects
Source code, Parsing, Graph embedding, Programming language, Computer science, Construct (python library), Clone (algebra), Concurrent computing, Control flow graph, Compiled language
- Abstract
Golang (short for the Go programming language) is a fast, compiled language that has been increasingly used in industry due to its excellent performance in concurrent programming. Golang redefines the grammar of concurrent programming, making it a challenge for traditional clone detection tools and techniques, yet few tools exist for detecting duplicates or copy-paste related bugs in Golang. An effective and efficient code clone detector for Golang is therefore especially needed. In this paper, we present Go-Clone, a learning-based clone detector for Golang. Go-Clone contains two modules: the training module and the user interaction module. In the training module, we first parse Golang source code into LLVM IR (Intermediate Representation), then calculate the LSFG (labeled semantic flow graph) for each program function automatically. Go-Clone trains a deep neural network model to encode LSFGs for similarity classification. In the user interaction module, users can choose one or more Golang projects. Go-Clone identifies and presents a list of function pairs that are most likely clones for user inspection. To evaluate Go-Clone's performance, we collect 6,110 commit versions from 48 GitHub projects to construct a Golang clone detection data set. Go-Clone reaches an AUC (Area Under Curve) of 89.61% and an ACC (accuracy) of 83.80% in clone detection. By testing several groups of unfamiliar data, we also demonstrate the generality of Go-Clone. The address of the demo video: https://youtu.be/o5DogtYGbeo
- Published
- 2019
29. Uncertainty Theory Based Reliability-Centric Cyber-Physical System Design
- Author
- Rui Wang, Jian Wang, Xun Jiao, Yongxin Liu, Yu Jiang, Jiaguang Sun, Mingzhe Wang, Hui Kong, and Houbing Song
- Subjects
Reliability theory ,Heuristic ,Computer science ,020208 electrical & electronic engineering ,Cyber-physical system ,020206 networking & telecommunications ,Uncertainty theory ,02 engineering and technology ,Software quality ,Reliability engineering ,0202 electrical engineering, electronic engineering, information engineering ,Engineering design process ,Design paradigm ,Reliability (statistics) - Abstract
Cyber-physical systems (CPSs) are built from, and depend upon, the seamless integration of software and hardware components. The most important challenge in CPS design and verification is to make the design reliable under a variety of uncertainties, i.e., unanticipated and rapidly evolving environments and disturbances. The cost, delay and reliability of the designed CPS are highly dependent on the software-hardware partitioning in the design. The key challenge in partitioning CPSs is that it is difficult to formalize the reliability characterization in the same way as the uncertain cost and time delay. In this paper, we propose a new CPS design paradigm for reliability assurance while coping with uncertainty. Specifically, we develop an uncertain programming model for partitioning based on uncertainty theory, to support assured reliability. The uncertain effects of the cost and delay time of the components to be implemented are modeled by uncertain variables with uncertainty distributions, and the reliability characterization is derived recursively. We convert the uncertain programming model into an equivalent deterministic form and customize an improved heuristic to solve the converted model. Experimental results on several benchmarks and random graphs show that the uncertain method produces designs with higher reliability. Moreover, to demonstrate the effectiveness of our model in coping with uncertainty at the design stage, we apply this uncertain framework and existing deterministic models to the design process of a sub-system used in real-world subway control. The system implemented based on the uncertain model works better than the results of the deterministic models. The proposed design paradigm has the potential to be generalized to the design of CPSs for greater assurance of safety and security under a variety of uncertainties.
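For reference, in Liu's uncertainty theory, on which such models are based, an uncertain variable ξ is described by an uncertainty distribution Φ(x) = M{ξ ≤ x}, and its expected value (the quantity a deterministic equivalent typically optimizes) follows a standard identity, not specific to this paper:

```latex
% Expected value of an uncertain variable \xi with uncertainty
% distribution \Phi; the second equality holds for regular \Phi:
\mathbb{E}[\xi]
  = \int_0^{+\infty} \mathcal{M}\{\xi \ge x\}\,\mathrm{d}x
  - \int_{-\infty}^{0} \mathcal{M}\{\xi \le x\}\,\mathrm{d}x
  = \int_0^1 \Phi^{-1}(\alpha)\,\mathrm{d}\alpha .
```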
- Published
- 2019
30. Enabling Clone Detection For Ethereum Via Smart Contract Birthmarks
- Author
- Yu Jiang, Han Liu, Zhiqiang Yang, Jiaguang Sun, and Wenqi Zhao
- Subjects
Smart contract ,Computer science ,Distributed computing ,020208 electrical & electronic engineering ,ComputingMilieux_LEGALASPECTSOFCOMPUTING ,020207 software engineering ,02 engineering and technology ,Symbolic execution ,Security token ,Bytecode ,Data access ,0202 electrical engineering, electronic engineering, information engineering ,Business logic ,Clone (computing) ,Vulnerability (computing) - Abstract
The Ethereum ecosystem has introduced a pervasive blockchain platform with programmable transactions. Everyone is allowed to develop and deploy smart contracts. Such flexibility can lead to a large collection of similar contracts, i.e., clones, especially since Ethereum applications are highly domain-specific and may share similar functionalities within the same domain; e.g., token contracts often provide interfaces for money transfer and balance inquiry. While smart contract clones have a wide range of impacts across different applications, e.g., on security, they are relatively little studied. Although clone detection has been a long-standing research topic, blockchain smart contracts introduce new challenges, e.g., syntactic diversity due to the trade-off between storage and execution, and the need to understand high-level business logic. In this paper, we present the first attempt at clone detection for Ethereum smart contracts. To overcome the new challenges, we introduce the concept of the smart contract birthmark, i.e., a semantic-preserving and computable representation of smart contract bytecode. The birthmark captures high-level semantics by effectively sketching symbolic execution traces (e.g., data access dependencies, path conditions) while maintaining syntactic regularities (e.g., the type and number of instructions). The clone detection problem is then reduced to computing the statistical similarity between two contract birthmarks. We have implemented a clone detector called EClone and evaluated it on Ethereum. The empirical results demonstrate the potential of EClone in accurately identifying clones. We have also extended EClone for vulnerability search and managed to detect instances of CVE-2018-10376.
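As a toy illustration of the birthmark idea (the paper's exact feature set is not given in the abstract), the sketch below builds a normalized vector from an instruction-type histogram plus a path-condition count, then compares two birthmarks:

```python
from collections import Counter
import math

OPCODES = ["PUSH", "SSTORE", "SLOAD", "CALL", "JUMPI", "ADD"]

def birthmark(trace_opcodes, path_condition_count):
    """Hypothetical birthmark: instruction-type histogram taken from
    symbolic traces plus a coarse path-condition feature (invented
    features, for illustration only)."""
    counts = Counter(trace_opcodes)
    vec = [counts[op] for op in OPCODES] + [path_condition_count]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def similarity(b1, b2):
    # Cosine similarity of the already-normalized vectors.
    return sum(x * y for x, y in zip(b1, b2))

a = birthmark(["PUSH", "SLOAD", "JUMPI", "SSTORE"], 2)
b = birthmark(["PUSH", "SLOAD", "JUMPI", "SSTORE", "ADD"], 2)
print(similarity(a, b))  # close to 1.0 -> likely clones
```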
- Published
- 2019
31. Vetting API Usages in C Programs with IMChecker
- Author
- Jiaguang Sun, Ming Gu, Jiecheng Wu, Min Zhou, Zuxing Gu, Yu Jiang, and Chi Li
- Subjects
Domain-specific language ,Application programming interface ,Computer science ,business.industry ,Programming language ,Semantics (computer science) ,020207 software engineering ,Linux kernel ,02 engineering and technology ,Static analysis ,computer.software_genre ,Software bug ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,business ,computer ,Graphical user interface - Abstract
Libraries offer reusable functionality through application programming interfaces (APIs) with usage constraints such as call conditions and orders. Constraint violations, i.e., API misuses, commonly lead to bugs and even security issues. In this paper, we introduce IMChecker, a constraint-directed static analysis toolkit that vets API usages in C programs, powered by a domain-specific language (DSL) for specifying API usage constraints. First, we propose a DSL, designed by studying real-world API-misuse bug patches, which covers most API usage constraint types and enables straightforward but precise specification. Then, we design and implement a static analysis engine that automatically parses specifications into checking targets, identifies potential API misuses, and prunes false positives using rich semantics. We have instantiated IMChecker for C programs with a user-friendly graphical interface and evaluated it on widely used benchmarks and real-world projects. The results show that IMChecker outperforms state-of-the-art toolkits, by 4.78--36.25% in precision and by 40.25--55.21%. We also found 75 previously unknown bugs in the Linux kernel, OpenSSL and applications of Ubuntu, 61 of which have been confirmed by the corresponding development communities. Video: https://youtu.be/YGDxeyOEVIM Repository: https://github.com/tomgu1991/IMChecker
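The DSL's concrete syntax is not shown in the abstract; the sketch below illustrates just one constraint type such a specification might compile to, a call-order check along a program path, with invented API names:

```python
def check_order(calls, must_follow):
    """Check a call-order constraint of the kind an API-usage DSL could
    express, e.g. every 'fopen' must eventually be followed by 'fclose'.

    calls: sequence of API call names along one program path.
    must_follow: dict {opener: closer}; names here are illustrative.
    """
    open_calls = []
    for c in calls:
        if c in must_follow:
            open_calls.append(c)
        elif open_calls and c == must_follow[open_calls[-1]]:
            open_calls.pop()
    return [f"missing {must_follow[c]} after {c}" for c in open_calls]

print(check_order(["fopen", "fread", "fopen", "fclose"], {"fopen": "fclose"}))
# -> ['missing fclose after fopen']
```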
- Published
- 2019
32. Scalable and Extensible Static Memory Safety Analysis with Summary over Access Path
- Author
- Xiaoyu Song, Jiaguang Sun, Guang Chen, and Min Zhou
- Subjects
Software ,Computer engineering ,Alias ,Computer science ,business.industry ,Pointer (computer programming) ,Scalability ,Test suite ,Static analysis ,business ,Pointer analysis ,Memory safety - Abstract
Static analysis is an effective way of checking memory safety issues in programs. Usually, multiple analysis algorithms run together to achieve a precise analysis result. In this paper, a novel analysis framework over access paths is presented for incorporating such analysis algorithms. A pointer analysis based on access paths serves as the base layer, in which alias and pointer information is handled automatically. A summary-based checking algorithm is designed for checking real-world projects. Moreover, the framework is fully extensible, and various analyses can be added as plugins. Experimental results show that our method has good precision on the Juliet Test Suite and scales to large software.
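A minimal sketch of the access-path representation such a framework builds on (our illustration, not the paper's code):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AccessPath:
    """An access path: a base variable plus a sequence of selectors,
    e.g. p->next.data becomes AccessPath('p', ('*', 'next', 'data'))."""
    base: str
    selectors: tuple

    def extend(self, sel):
        return AccessPath(self.base, self.selectors + (sel,))

    def is_prefix_of(self, other):
        # A write through a prefix path may affect all its extensions.
        return (self.base == other.base and
                other.selectors[:len(self.selectors)] == self.selectors)

p_next = AccessPath("p", ("*", "next"))
p_next_data = p_next.extend("data")
print(p_next.is_prefix_of(p_next_data))  # True
```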
- Published
- 2018
33. Scalable Verification Framework for C Program
- Author
- Guang Chen, Li Tianchi, Ming Gu, Dexi Wang, Chao Zhang, and Jiaguang Sun
- Subjects
Model checking ,Source lines of code ,Computer science ,business.industry ,020207 software engineering ,02 engineering and technology ,Software ,Computer engineering ,Scalability ,0202 electrical engineering, electronic engineering, information engineering ,Program slicing ,Software system ,business ,Software verification ,Abstraction (linguistics) - Abstract
Software verification has been well applied in safety-critical areas and has shown the ability to provide better quality assurance for modern software. However, as the lines of code and complexity of software systems increase, the scalability of verification becomes a challenge. In this paper, we present an automatic software verification framework, TSV, to address the scalability issues: (i) extended structural abstraction and property-guided program slicing to solve large-scale program verification problems, saving time and memory without losing accuracy; (ii) automatic selection among different verification methods according to the program and property context to improve verification efficiency. For evaluation, we compare TSV's different configurations with existing C program verifiers on open benchmarks. We found that TSV with auto-selection performs better than with bounded model checking only or with extended structural abstraction only. Moreover, TSV with auto-selection achieves a better balance of accuracy, time and memory consumption.
- Published
- 2018
34. Efficient Recovery of Missing Events
- Author
- Jiaguang Sun, Xiaochen Zhu, Jianmin Wang, Shaoxu Song, and Xuemin Lin
- Subjects
business.industry ,Business process ,Computer science ,Search engine indexing ,Complex event processing ,02 engineering and technology ,Petri net ,computer.software_genre ,Machine learning ,Computer Science Applications ,Computational Theory and Mathematics ,Transmission (telecommunications) ,020204 information systems ,Data quality ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Data mining ,Artificial intelligence ,Pruning (decision trees) ,Recovery approach ,business ,computer ,Information Systems - Abstract
Owing to various data-entry and transmission issues caused by humans or systems, missing events often occur in event data, which record the execution logs of business processes. Without recovering these missing events, applications built on event data, such as provenance analysis or complex event processing, are unreliable. Following the minimum change discipline in improving data quality, it is rational to find a recovery that minimally differs from the original data. Existing recovery approaches fall short in efficiency because they enumerate and search over all possible sequences of events. In this paper, we study efficient techniques for recovering missing events. According to our theoretical results, the recovery problem is shown to be NP-hard. Nevertheless, advanced indexing and pruning techniques are developed to improve the recovery efficiency. The experimental results demonstrate that our minimum recovery approach achieves high accuracy and significantly outperforms the state-of-the-art technique, with up to five orders of magnitude improvement in time performance.
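As a toy illustration of minimum-change recovery (the paper's process models are richer, and it adds indexing and pruning on top), the sketch below inserts the fewest events between consecutive observed events that a direct-successor relation allows:

```python
from collections import deque

def min_recovery(observed, succ):
    """Insert the fewest events between consecutive observed events so
    the sequence is consistent with a direct-successor relation 'succ'
    (a toy stand-in for the paper's process model)."""
    def shortest(a, b):
        queue, seen = deque([(a, [])]), {a}
        while queue:
            cur, path = queue.popleft()
            if cur == b:
                return path
            for nxt in succ.get(cur, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, path + [nxt]))
        return None  # no consistent recovery for this gap

    recovered = [observed[0]]
    for a, b in zip(observed, observed[1:]):
        gap = shortest(a, b)
        recovered += gap if gap is not None else [b]
    return recovered

succ = {"receive": ["check"], "check": ["approve", "reject"], "approve": ["ship"]}
print(min_recovery(["receive", "ship"], succ))
# -> ['receive', 'check', 'approve', 'ship']
```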
- Published
- 2016
35. Deep Learning of Transferable Representation for Scalable Domain Adaptation
- Author
- Philip S. Yu, Mingsheng Long, Yue Cao, Jiaguang Sun, and Jianmin Wang
- Subjects
Multiple kernel learning ,Artificial neural network ,business.industry ,Computer science ,Deep learning ,Cognitive neuroscience of visual object recognition ,Pattern recognition ,02 engineering and technology ,Machine learning ,computer.software_genre ,Computer Science Applications ,Deep belief network ,Computational Theory and Mathematics ,Discriminative model ,Categorization ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,Embedding ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,Classifier (UML) ,computer ,Information Systems - Abstract
Domain adaptation generalizes a learning model across a source domain and a target domain that are sampled from different distributions. It is widely applied to cross-domain data mining for reusing labeled information and mitigating labeling consumption. Recent studies reveal that deep neural networks can learn abstract feature representations, which can reduce, but not remove, the cross-domain discrepancy. To enhance the invariance of deep representations and make them more transferable across domains, we propose a unified deep adaptation framework for jointly learning transferable representations and classifiers to enable scalable domain adaptation, taking advantage of both deep learning and optimal two-sample matching. The framework constitutes two inter-dependent paradigms: unsupervised pre-training for effective training of deep models using deep denoising autoencoders, and supervised fine-tuning for effective exploitation of discriminative information using deep neural networks, both learned by embedding the deep representations into reproducing kernel Hilbert spaces (RKHSs) and optimally matching different domain distributions. To enable scalable learning, we develop a linear-time algorithm using an unbiased estimate that scales linearly to large samples. Extensive empirical results show that the proposed framework significantly outperforms state-of-the-art methods on diverse adaptation tasks: sentiment polarity prediction, email spam filtering, newsgroup content categorization, and visual object recognition.
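Optimal two-sample matching of this kind is commonly instantiated with the maximum mean discrepancy (MMD) in an RKHS; assuming the standard linear-time unbiased estimator of Gretton et al. is the kind of estimate meant, it has the form:

```latex
% Linear-time unbiased MMD^2 estimate over paired samples z_i = (x_i, y_i)
% with kernel k; it touches each sample once, so cost is O(m):
\widehat{\mathrm{MMD}}_\ell^2 = \frac{2}{m} \sum_{i=1}^{m/2} h\big(z_{2i-1}, z_{2i}\big),
\qquad
h\big((x,y),(x',y')\big) = k(x,x') + k(y,y') - k(x,y') - k(x',y).
```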
- Published
- 2016
36. An Energy-Efficient Train Control Framework for Smart Railway Transportation
- Author
- Jiaguang Sun, Jin Huang, Yangdong Deng, and Qinwen Yang
- Subjects
Optimization problem ,Computer science ,business.industry ,020209 energy ,Rail freight transport ,Decision tree ,Training (meteorology) ,02 engineering and technology ,Energy consumption ,Industrial engineering ,Theoretical Computer Science ,Identification (information) ,Computational Theory and Mathematics ,Hardware and Architecture ,Rail transportation ,Offline learning ,0202 electrical engineering, electronic engineering, information engineering ,business ,Software ,Simulation ,Efficient energy use - Abstract
Railway transportation systems are the backbone of smart cities. With the rapid increase of railway mileage, the energy consumption of trains becomes a major concern. A unique property of train operations is that the geographic characteristics of each route are known a priori. On the other hand, the parameters (e.g., loads) of a train vary from trip to trip. Consequently, an energy-optimal driving profile for each train operation has to be pursued by considering both the geographic information and the inherent train conditions. Solving the optimization problem, however, is hard due to its high dimension, nonlinearity, complex constraints and the time-varying characteristics of the control sequence. As a result, an energy-saving solution to the train control optimization problem has to address the dilemma of optimization quality versus computing time. This work proposes an energy-efficient train control framework that integrates both offline and onboard optimization techniques. The offline processing builds a decision-tree based sketchy solution through a complete flow of sequence mining, optimization and machine learning. The onboard system feeds the train parameters into the decision tree to derive an optimized control sequence. A key innovation of this work is the identification of optimal control-sequence patterns by data mining the driving behaviors of experienced train drivers, and the application of these patterns to online trip planning. The proposed framework efficiently finds an optimized driving solution by leveraging training results derived from a compute-intensive offline learning flow. The framework was validated in a smart freight train system, where an average energy saving of 9.84 percent was demonstrated.
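A toy stand-in for the onboard lookup step; the thresholds, features and control sequences below are invented for illustration and are not from the paper:

```python
def onboard_profile(trip):
    """Map trip parameters to a pre-optimized control sequence via a
    hand-written stand-in for the offline-learned decision tree."""
    if trip["load_tons"] > 4000:
        branch = "heavy_downhill" if trip["net_gradient"] < 0 else "heavy_flat"
    else:
        branch = "light"
    profiles = {  # invented control sequences
        "heavy_downhill": ["traction", "coast", "brake", "coast"],
        "heavy_flat": ["traction", "cruise", "coast", "brake"],
        "light": ["traction", "coast", "cruise", "brake"],
    }
    return profiles[branch]

print(onboard_profile({"load_tons": 4500, "net_gradient": -0.002}))
```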
- Published
- 2016
37. Multi-Task Learning of Generalizable Representations for Video Action Recognition
- Author
- Zhiyu Yao, Jiaguang Sun, Philip S. Yu, Yunbo Wang, Jianmin Wang, and Mingsheng Long
- Subjects
FOS: Computer and information sciences ,Computer science ,Generalization ,business.industry ,Computer Vision and Pattern Recognition (cs.CV) ,Frame (networking) ,Supervised learning ,Computer Science - Computer Vision and Pattern Recognition ,Multi-task learning ,02 engineering and technology ,010501 environmental sciences ,Machine learning ,computer.software_genre ,01 natural sciences ,Data modeling ,Task (project management) ,ComputingMethodologies_PATTERNRECOGNITION ,0202 electrical engineering, electronic engineering, information engineering ,Task analysis ,RGB color model ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,computer ,0105 earth and related environmental sciences - Abstract
In classic video action recognition, labels may not contain enough information about the diverse video appearance and dynamics; thus, existing models trained under the standard supervised learning paradigm may extract less generalizable features. We evaluate these models under a cross-dataset experiment setting, as the label bias problem in video analysis is even more prominent across different data sources. We find that using optical flows as model inputs harms the generalization ability of most video recognition models. Based on these findings, we present a multi-task learning paradigm for video classification. Our key idea is to avoid label bias and improve the generalization ability by taking the data as its own supervision or as supervising constraints on the data. First, we use the optical flows and the RGB frames as auxiliary supervisions, and thus name our model Reversed Two-Stream Networks (Rev2Net). Further, we coordinate the auxiliary flow prediction task and the frame reconstruction task by introducing a new training objective to Rev2Net, named the Decoding Discrepancy Penalty (DDP), which constrains the discrepancy of the multi-task features in a self-supervised manner. Rev2Net is shown to be effective on the classic action recognition task, and in particular shows strong generalization ability in the cross-dataset experiments. (ICME 2020)
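Schematically, and only as our reading of the abstract (the actual weighting is not given there), the training objective combines the classification loss with the two auxiliary losses and the DDP term:

```latex
% Schematic multi-task objective; \alpha, \beta, \gamma are assumed
% trade-off weights, not values from the paper:
\mathcal{L} = \mathcal{L}_{\mathrm{cls}}
            + \alpha\, \mathcal{L}_{\mathrm{flow}}
            + \beta\,  \mathcal{L}_{\mathrm{rec}}
            + \gamma\, \mathcal{L}_{\mathrm{DDP}} .
```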
- Published
- 2018
38. PAFL: extend fuzzing optimizations of single mode to industrial parallel mode
- Author
- Jiaguang Sun, Jie Liang, Y. B. Chen, Chijin Zhou, Mingzhe Wang, and Yu Jiang
- Subjects
Computer science ,National Vulnerability Database ,020207 software engineering ,02 engineering and technology ,computer.file_format ,Fuzz testing ,Data structure ,020202 computer hardware & architecture ,Task (computing) ,Mode (computer interface) ,Computer engineering ,Synchronization (computer science) ,0202 electrical engineering, electronic engineering, information engineering ,Bitmap ,Software system ,computer - Abstract
Researchers have proposed many optimizations to improve the efficiency of fuzzing, and most optimized strategies work very well on their targets when running in single mode, i.e., with one fuzzer instance instantiated. However, in real industrial practice, most fuzzers run in parallel mode with multiple fuzzer instances instantiated, and those optimizations unfortunately fail to maintain their efficiency improvements. In this paper, we present PAFL, a framework that utilizes efficient guiding-information synchronization and task division to extend existing single-mode fuzzing optimizations to industrial parallel mode. With an additional data structure to store the guiding information, the synchronization ensures that the information is shared and updated among different fuzzer instances in a timely manner. The task division then promotes the diversity of fuzzer instances by splitting the fuzzing task into several sub-tasks based on the branch bitmap. We first evaluate PAFL using 12 different real-world programs from the Google fuzzer-test-suite. Results show that in parallel mode, two AFL improvers, AFLFast and FairFuzz, do not outperform AFL, which differs from the single-mode case. However, when augmented with PAFL, the performance of AFLFast and FairFuzz in parallel mode improves: they cover 8% and 17% more branches, and trigger 79% and 52% more unique crashes. In a further evaluation on more widely used software systems from GitHub, optimized fuzzers augmented with PAFL find more real bugs, 25 of which are security-critical vulnerabilities registered as CVEs in the US National Vulnerability Database.
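A minimal sketch of the task-division idea, under our simplifying assumption (not the paper's algorithm) that branches of the coverage bitmap are partitioned across instances by hashing, so each instance prioritizes its own branch subset:

```python
import hashlib

def assign_branches(branch_ids, n_instances):
    """Split the fuzzing task by partitioning coverage-bitmap branches
    across fuzzer instances, promoting diversity between them."""
    assignment = {i: set() for i in range(n_instances)}
    for b in branch_ids:
        h = int(hashlib.md5(str(b).encode()).hexdigest(), 16)
        assignment[h % n_instances].add(b)
    return assignment

print(assign_branches(range(16), 4))
```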
- Published
- 2018
39. EClone: detect semantic clones in Ethereum via symbolic transaction sketch
- Author
- Yu Jiang, Wenqi Zhao, Jiaguang Sun, Chao Liu, Zhiqiang Yang, and Han Liu
- Subjects
Smart contract ,Database ,Computer science ,ComputingMilieux_LEGALASPECTSOFCOMPUTING ,020207 software engineering ,02 engineering and technology ,Semantic property ,computer.software_genre ,Sketch ,Set (abstract data type) ,Resource (project management) ,0202 electrical engineering, electronic engineering, information engineering ,Key (cryptography) ,020201 artificial intelligence & image processing ,Clone (computing) ,Database transaction ,computer - Abstract
The Ethereum ecosystem has created a prosperous collection of smart contract applications in public blockchains, with transparent, traceable and programmable transactions. However, the flexibility that everybody can write and deploy smart contracts on Ethereum leads to a large collection of similar contracts, i.e., clones. In practice, smart contract clones may amplify severe threats such as security attacks, resource waste, etc. In this paper, we have developed EClone, a semantic clone detector for Ethereum. The key insight of our clone detection is the Symbolic Transaction Sketch, i.e., a set of critical semantic properties generated from symbolic transactions. Sketches of two smart contracts are normalized into numeric vectors of the same length. Then, the clone detection problem is modeled as a similarity computation process in which sketches and other syntactic information are combined. We have applied EClone to identifying semantic clones of deployed Ethereum smart contracts and achieved an accuracy of 93.27%. A demo video of EClone is at https://youtu.be/IRasOVv6vyc.
- Published
- 2018
40. VulSeeker-pro: enhanced semantic learning based binary vulnerability seeker with emulation
- Author
- Heyuan Shi, Jiaguang Sun, Yu Jiang, Xin Yang, Jian Gao, and Ying Fu
- Subjects
Emulation ,Binary function ,Computer science ,business.industry ,media_common.quotation_subject ,020208 electrical & electronic engineering ,Vulnerability ,Binary number ,020207 software engineering ,02 engineering and technology ,Machine learning ,computer.software_genre ,Identification (information) ,0202 electrical engineering, electronic engineering, information engineering ,Clone (computing) ,Artificial intelligence ,Function (engineering) ,business ,computer ,media_common ,TRACE (psycholinguistics) - Abstract
Learning-based clone detection is widely exploited for binary vulnerability search. Although such approaches mitigate, to some extent, the high time overhead of traditional dynamic and static search approaches, their accuracy is limited, and engineers need to manually identify the true positives among the top-M search results in industrial practice. This paper presents VulSeeker-Pro, an enhanced binary vulnerability seeker that integrates function semantic emulation at the back end of semantic learning, to free engineers from this manual identification work. It first uses the semantic learning based predictor to quickly predict the top-M candidate functions of the target binary that are most similar to the vulnerability. The top-M candidates are then fed to the emulation engine for re-ranking, yielding more accurate top-N candidate functions. With the fast filtering of semantic learning and the dynamic trace generation of function semantic emulation, VulSeeker-Pro achieves higher search accuracy with little time overhead. The experimental results on 15 known CVE vulnerabilities involving 6 programs widely used in industry show that VulSeeker-Pro significantly outperforms the state-of-the-art approaches in terms of accuracy. In a total of 45 searches, VulSeeker-Pro finds 40 and 43 real vulnerabilities in the top-1 and top-5 candidate functions, which are 12.33× and 2.58× more than the most recent and related work Gemini. In terms of efficiency, it takes 0.22 seconds on average to determine whether a target binary function contains a known vulnerability or not.
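A minimal sketch of the two-stage pipeline, treating the DNN predictor and the emulation engine as black boxes; both are mocked here, and M, N and all names are placeholders:

```python
def two_stage_search(target_funcs, vuln_vec, embed, emulate_score, m=50, n=5):
    """Stage 1: a fast learned predictor keeps the top-M candidates most
    similar to the vulnerability. Stage 2: a slower emulation-based
    scorer re-ranks them into the final top-N."""
    def cos(u, v):
        num = sum(a * b for a, b in zip(u, v))
        den = (sum(a * a for a in u) ** 0.5) * (sum(b * b for b in v) ** 0.5)
        return num / den if den else 0.0

    top_m = sorted(target_funcs, key=lambda f: cos(embed(f), vuln_vec),
                   reverse=True)[:m]
    return sorted(top_m, key=emulate_score, reverse=True)[:n]

# Toy stand-ins: 1-D "embeddings" and a fake emulation score.
funcs = ["f1", "f2", "f3"]
print(two_stage_search(funcs, [1.0], lambda f: [float(len(f))],
                       lambda f: {"f1": 0.2, "f2": 0.9, "f3": 0.5}[f],
                       m=2, n=1))  # -> ['f2']
```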
- Published
- 2018
41. S-gram: towards semantic-aware security auditing for Ethereum smart contracts
- Author
- Yu Jiang, Jiaguang Sun, Chao Liu, Wenqi Zhao, and Han Liu
- Subjects
Blockchain ,Computer science ,Vulnerability ,020207 software engineering ,02 engineering and technology ,Audit ,Symbolic execution ,Computer security ,computer.software_genre ,Security token ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,Solidity ,Key (cryptography) ,Database transaction ,computer - Abstract
Smart contracts, as a promising and powerful application on the Ethereum blockchain, have been growing rapidly in the past few years. Since they are highly vulnerable to different forms of attacks, their security has become a top priority. However, existing security auditing techniques are either limited in finding vulnerabilities (relying on pre-defined bug patterns) or very expensive (relying on program analysis), and are thus insufficient for Ethereum. To mitigate these limitations, we propose a novel semantic-aware security auditing technique called S-gram for Ethereum. The key insight is a combination of N-gram language modeling and lightweight static semantic labeling, which can learn the statistical regularities of contract tokens and also capture high-level semantics (e.g., the flow sensitivity of a transaction). S-gram can be used to predict potential vulnerabilities by identifying irregular token sequences, and to optimize existing in-depth analyzers (e.g., symbolic execution engines, fuzzers, etc.). We have implemented S-gram for Solidity smart contracts on Ethereum. The evaluation demonstrates the potential of S-gram in identifying possible security issues.
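As an illustration of the N-gram component (the paper's model order and token-labeling scheme are not given in the abstract), a smoothed bigram model can score token streams so that irregular sequences receive low probability:

```python
import math
from collections import Counter

def train_bigram(sequences, vocab_size):
    """Train an add-one-smoothed bigram model over token streams;
    sequences with unusually low log-probability are audit candidates."""
    unigrams, bigrams = Counter(), Counter()
    for seq in sequences:
        unigrams.update(seq[:-1])
        bigrams.update(zip(seq, seq[1:]))

    def logprob(seq):
        return sum(math.log((bigrams[a, b] + 1) / (unigrams[a] + vocab_size))
                   for a, b in zip(seq, seq[1:]))
    return logprob

# Toy token streams; real input would be statically labeled contract tokens.
corpus = [["require", "call", "transfer"]] * 20
score = train_bigram(corpus, vocab_size=3)
print(score(["require", "call", "transfer"]))  # high (regular)
print(score(["call", "require", "transfer"]))  # low -> flag for audit
```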
- Published
- 2018
42. VulSeeker: a semantic learning based vulnerability seeker for cross-platform binary
- Author
- Jiaguang Sun, Yu Jiang, Xin Yang, Jian Gao, and Ying Fu
- Subjects
Binary search algorithm ,Binary function ,Computer science ,business.industry ,Code reuse ,Software development ,Binary number ,020207 software engineering ,02 engineering and technology ,Function (mathematics) ,Semantics ,computer.software_genre ,020204 information systems ,Basic block ,0202 electrical engineering, electronic engineering, information engineering ,Code (cryptography) ,Data mining ,business ,computer - Abstract
Code reuse improves software development efficiency; however, vulnerabilities can be introduced inadvertently along with it. Many existing works compute code similarity based on CFGs to determine whether a binary function contains a known vulnerability. Unfortunately, their performance in cross-platform binary search is limited. This paper presents VulSeeker, a semantic learning based vulnerability seeker for cross-platform binaries. Given a target function and a vulnerable function, VulSeeker first constructs the labeled semantic flow graphs and extracts basic-block features as numerical vectors for both of them. Then the embedding vector of the whole binary function is generated by feeding the numerical vectors of the basic blocks to a customized semantics-aware DNN model. Finally, the similarity of the two binary functions is measured by cosine distance. The experimental results show that VulSeeker outperforms the state-of-the-art approaches in terms of accuracy. For example, compared to the most recent and related work Gemini, VulSeeker finds 50.00% more vulnerabilities in the top-10 candidates and 13.89% more in the top-50 candidates, and improves the AUC and ACC values by 8.23% and 12.14%, respectively. The video is presented at https://youtu.be/Mw0mr84gpI8.
- Published
- 2018
43. SAFL
- Author
- Han Liu, Jiaguang Sun, Mingzhe Wang, Xi Bin Zhao, Xun Jiao, Y. B. Chen, Yu Jiang, and Jie Liang
- Subjects
Computer science ,media_common.quotation_subject ,Process (computing) ,020207 software engineering ,02 engineering and technology ,Fuzz testing ,Symbolic execution ,020202 computer hardware & architecture ,Computer engineering ,Software testing ,Mutation (genetic algorithm) ,0202 electrical engineering, electronic engineering, information engineering ,State space ,Quality (business) ,media_common - Abstract
Mutation-based fuzzing is a widely used software testing technique for bug and vulnerability detection, and its testing performance is greatly affected by the quality of the initial seeds and the effectiveness of the mutation strategy. In this paper, we present SAFL, an efficient fuzzing tool augmented with qualified seed generation and efficient coverage-directed mutation. First, symbolic execution is used in a lightweight approach to generate qualified initial seeds. Valuable exploration directions are learned from these seeds, so the later fuzzing process can reach deep paths in the program state space earlier and more easily. Moreover, we implement a fair and fast coverage-directed mutation algorithm, which helps the fuzzing process exercise rare and deep paths with higher probability. We implement SAFL on top of KLEE and AFL and conduct thorough, repeated evaluations on real-world program benchmarks against state-of-the-art versions of AFL. After 24 hours, compared to AFL and AFLFast, it discovers 214% and 133% more unique crashes, covers 109% and 63% more paths and achieves 279% and 180% more covered branches. Video link: https://youtu.be/LkiFLNMBhVE
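A minimal sketch of one way to realize coverage-directed seed scheduling, weighting seeds by the rarity of the branches they cover; this is our simplified reading, not the paper's algorithm:

```python
import random
from collections import Counter

def pick_seed(seeds, seed_branches, hit_counts):
    """Pick the next seed to mutate, biased toward seeds that cover
    rarely-hit branches, so rare and deep paths are fuzzed more often."""
    def rarity(s):
        return sum(1.0 / hit_counts[b] for b in seed_branches[s])
    weights = [rarity(s) for s in seeds]
    return random.choices(seeds, weights=weights, k=1)[0]

# Toy data: branch b2 is rarely hit, so seed_b is strongly favored.
hit_counts = Counter({"b1": 1000, "b2": 3, "b3": 120})
seed_branches = {"seed_a": {"b1"}, "seed_b": {"b1", "b2"}, "seed_c": {"b3"}}
print(pick_seed(["seed_a", "seed_b", "seed_c"], seed_branches, hit_counts))
```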
- Published
- 2018
44. Weak-assert
- Author
- Xibin Zhao, Cong Wang, Xiaoyu Song, Jiaguang Sun, Ming Gu, and Yu Jiang
- Subjects
Source code ,business.industry ,Computer science ,media_common.quotation_subject ,Assertion ,020207 software engineering ,02 engineering and technology ,020202 computer hardware & architecture ,Set (abstract data type) ,Program analysis ,Test case ,Software ,TheoryofComputation_LOGICSANDMEANINGSOFPROGRAMS ,Abstract syntax ,0202 electrical engineering, electronic engineering, information engineering ,Test suite ,Software engineering ,business ,media_common - Abstract
Assertions are helpful in program analysis, such as software testing and verification. The most challenging parts of automatically recommending assertions are designing the assertion patterns and inserting assertions at proper locations. In this paper, we develop Weak-Assert, a weakness-oriented assertion recommendation toolkit for program analysis of C code. A weakness-oriented assertion is an assertion that can help find potential program weaknesses. Weak-Assert uses well-designed patterns to automatically match the abstract syntax trees of source code. It collects significant information from the trees and inserts assertions at proper locations in the programs. These assertions can then be checked using program analysis techniques. The experiments are set up on the Juliet test suite and several real projects on GitHub. Experimental results show that Weak-Assert helps find 125 program weaknesses in 26 real projects; these weaknesses are manually confirmed to be triggered by some test cases. The address of the abstract demo video is: https://youtu.be/_RWC4GJvRWc
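Weak-Assert matches patterns over C ASTs; purely as a language-neutral illustration of pattern-driven assertion recommendation, the sketch below uses Python's own ast module (3.9+ for ast.unparse) to suggest a guard before division expressions. The pattern and message are invented:

```python
import ast

CODE = """
def mean(xs):
    return sum(xs) / len(xs)
"""

class DivisionVisitor(ast.NodeVisitor):
    """Recommend a weakness-oriented assertion for each division whose
    denominator might be zero (a deliberately naive pattern)."""
    def visit_BinOp(self, node):
        if isinstance(node.op, ast.Div):
            denom = ast.unparse(node.right)
            print(f"line {node.lineno}: recommend 'assert {denom} != 0'")
        self.generic_visit(node)

DivisionVisitor().visit(ast.parse(CODE))
# -> line 3: recommend 'assert len(xs) != 0'
```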
- Published
- 2018
45. DLFuzz: Differential Fuzzing Testing of Deep Learning Systems
- Author
- Yu Jiang, Jiaguang Sun, Yue Zhao, Quan Chen, and Jianmin Guo
- Subjects
FOS: Computer and information sciences ,business.industry ,Computer science ,White-box testing ,Deep learning ,020206 networking & telecommunications ,020207 software engineering ,02 engineering and technology ,Fuzz testing ,Machine learning ,computer.software_genre ,Software Engineering (cs.SE) ,Computer Science - Software Engineering ,Robustness (computer science) ,0202 electrical engineering, electronic engineering, information engineering ,Artificial intelligence ,business ,computer - Abstract
Deep learning (DL) systems are increasingly applied to safety-critical domains such as autonomous driving cars. It is of significant importance to ensure the reliability and robustness of DL systems. Existing testing methodologies often fail to include rare inputs in the testing dataset and exhibit low neuron coverage. In this paper, we propose DLFuzz, the first differential fuzzing testing framework to guide DL systems to expose incorrect behaviors. DLFuzz keeps minutely mutating the input to maximize the neuron coverage and the prediction difference between the original input and the mutated input, without manual labeling effort or cross-referencing oracles from other DL systems with the same functionality. We present empirical evaluations on two well-known datasets to demonstrate its efficiency. Compared with DeepXplore, the state-of-the-art DL white-box testing framework, DLFuzz does not require extra effort to find similar functional DL systems for cross-referencing checks, yet generates 338.59% more adversarial inputs with 89.82% smaller perturbations, obtains 2.86% higher neuron coverage on average, and saves 20.11% of the time consumption. (Comment: to appear in ESEC/FSE'2018, NIER track.)
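Schematically, the mutation maximizes a joint objective of the following shape; this is our paraphrase of the abstract, and the exact formulation and weights are not quoted from the paper:

```latex
% Assumed shape of the joint objective over the mutated input x':
% prediction difference from the original input x plus weighted
% activations a_n of selected uncovered neurons, under a small
% perturbation bound:
\max_{x'} \; d\big(f(x'), f(x)\big)
         + \lambda \sum_{n \in N_{\mathrm{unc}}} a_n(x')
\quad \text{s.t.} \quad \lVert x' - x \rVert \le \epsilon .
```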
- Published
- 2018
46. First, Debug the Test Oracle
- Author
- Xiaoyu Song, Jiaguang Sun, Ming Gu, Xinrui Guo, and Min Zhou
- Subjects
Source lines of code ,business.industry ,Programming language ,Computer science ,media_common.quotation_subject ,computer.software_genre ,Oracle ,Test (assessment) ,Software ,Test case ,Debugging ,Test suite ,Algorithm design ,business ,computer ,media_common - Abstract
Contrary to the oracle assumption, a trustworthy test oracle is not always available in real practice. Since manually written oracles and human judgements are still widely used, testers and programmers in fact face a high risk of erroneous test oracles. Test oracle errors can cause much confusion and thus extra time consumption in the debugging process. As substantiated by our experiment on the Siemens Test Suite, automatic fault localization algorithms suffer severely from erroneous test oracles, which impedes them from reducing debugging time to the full extent. This paper proposes a simple but effective approach to debugging the test oracle. Based on the observation that test cases covering similar lines of code usually generate similar results, we are able to identify suspicious test cases that the test oracle judges differently from their neighbors. To validate the effectiveness of our approach, experiments are conducted on both the Siemens Test Suite and grep. The results show that, on average, over 75 percent of the highlighted test cases are actual test oracle errors. Moreover, the performance of fault localization algorithms recovered remarkably with the debugged oracles.
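A minimal sketch of the neighborhood idea: flag a test case when its verdict disagrees with the majority verdict of its nearest neighbors in coverage space. The similarity metric and k below are our choices, not the paper's:

```python
def suspicious_verdicts(coverage, verdicts, k=3):
    """Flag tests whose pass/fail verdict disagrees with the majority
    verdict of their k nearest neighbors in coverage space."""
    def jaccard(a, b):
        return len(a & b) / len(a | b) if a | b else 1.0

    flagged = []
    for t, cov in coverage.items():
        neighbors = sorted((o for o in coverage if o != t),
                           key=lambda o: jaccard(cov, coverage[o]),
                           reverse=True)[:k]
        majority = sum(verdicts[o] for o in neighbors) > k / 2
        if verdicts[t] != majority:
            flagged.append(t)
    return flagged

# Toy data: t3 covers the same lines as t1/t2 but was judged differently.
coverage = {"t1": {1, 2, 3}, "t2": {1, 2, 3, 4}, "t3": {1, 2, 3}, "t4": {9}}
verdicts = {"t1": True, "t2": True, "t3": False, "t4": True}  # True = pass
print(suspicious_verdicts(coverage, verdicts))  # -> ['t3']
```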
- Published
- 2015
47. Design of Mixed Synchronous/Asynchronous Systems with Multiple Clocks
- Author
- Ming Gu, Jiaguang Sun, Han Liu, Hehua Zhang, Yu Jiang, Huafeng Zhang, and Xiaoyu Song
- Subjects
Synchronous circuit ,Asynchronous system ,Computer science ,Distributed computing ,Parallel computing ,Operational semantics ,Atomic clock ,Computational Theory and Mathematics ,Synchronizer ,Hardware and Architecture ,Asynchronous communication ,Component (UML) ,Signal Processing ,VHDL ,Code generation ,Field-programmable gate array ,computer ,Block (data storage) ,computer.programming_language - Abstract
Today's distributed systems are commonly equipped with both synchronous and asynchronous components controlled by multiple clocks. The key challenges in designing such systems are (1) how to model multi-clocked local synchronous components, local asynchronous components, and asynchronous communication among components in a single framework; and (2) how to ensure the correctness of the model and keep consistency between the model and the implementation of the real system. In this paper, we propose a novel computation model named GalsBlock for the design of multi-clocked embedded systems with both synchronous and asynchronous components. The computation model consists of several hierarchical compound and atom blocks communicating through data port connections. Each atom block can be refined as parallel Mealy automata. A synchronous component can be captured in an atom block with the corresponding local control clock, an asynchronous component in an atom block without a clock, and the asynchronous communications in the data port connections among blocks. Unified operational and formal semantics are defined, which can be used for simulation and verification, respectively. Then, we can generate efficient VHDL code from the validated model, which can be synthesized onto an FPGA for direct execution. We have developed a graphical modeling, simulation, verification, and code generation toolkit to support the computation model, and applied it in the design of a sub-system used in real train communication control.
- Published
- 2015
48. Domain Invariant Transfer Kernel Learning
- Author
- Mingsheng Long, Jiaguang Sun, Philip S. Yu, and Jianmin Wang
- Subjects
Graph kernel ,Computer science ,Feature vector ,Kernel principal component analysis ,Kernel (linear algebra) ,Polynomial kernel ,String kernel ,Approximation error ,Invariant (mathematics) ,Training set ,Contextual image classification ,Representer theorem ,business.industry ,Pattern recognition ,Generalization error ,Computer Science Applications ,Kernel method ,Computational Theory and Mathematics ,Kernel embedding of distributions ,Variable kernel density estimation ,Kernel (statistics) ,Radial basis function kernel ,Kernel smoother ,Principal component regression ,Artificial intelligence ,Tree kernel ,business ,Transfer of learning ,Information Systems ,Reproducing kernel Hilbert space - Abstract
Domain transfer learning generalizes a learning model across training data and testing data that follow different distributions. A general principle for tackling this problem is to reduce the distribution difference between training and testing data such that the generalization error can be bounded. Current methods typically model the sample distributions in the input feature space, which depends on a nonlinear feature mapping to embody the distribution discrepancy. However, this nonlinear feature space may not be optimal for kernel-based learning machines. To this end, we propose a transfer kernel learning (TKL) approach that learns a domain-invariant kernel by directly matching the source and target distributions in the reproducing kernel Hilbert space (RKHS). Specifically, we design a family of spectral kernels by extrapolating the target eigensystem onto the source samples via Mercer's theorem. The spectral kernel minimizing the approximation error to the ground-truth kernel is selected to construct domain-invariant kernel machines. Comprehensive experimental evidence on a large number of text categorization, image classification, and video event recognition datasets verifies the effectiveness and efficiency of the proposed TKL approach over several state-of-the-art methods.
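Extrapolating an eigensystem onto new samples is typically done Nystrom-style; assuming that style of extrapolation (and up to normalization conventions), a target eigenpair computed on target samples extends to a source point x, and the extrapolated eigensystem induces a spectral kernel:

```latex
% Nystrom-style extrapolation of a target eigenpair (\lambda_i, \phi_i),
% computed on target samples z_1, ..., z_n, to a new point x; the
% extrapolated eigensystem then induces a spectral kernel \bar{k}
% with (possibly re-weighted) eigenvalues \tilde{\lambda}_i:
\phi_i(x) \approx \frac{1}{n\,\lambda_i} \sum_{j=1}^{n} k(x, z_j)\,\phi_i(z_j),
\qquad
\bar{k}(x, x') = \sum_i \tilde{\lambda}_i\, \phi_i(x)\, \phi_i(x') .
```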
- Published
- 2015
49. Identifying and constructing elemental parts of shafts based on conditional random fields model
- Author
- Fangtao Li, Yamei Wen, Jiaguang Sun, and Hui Zhang
- Subjects
Conditional random field ,Computer science ,Heuristic ,3D reconstruction ,3d model ,computer.software_genre ,Computer Graphics and Computer-Aided Design ,Industrial and Manufacturing Engineering ,Computer Science Applications ,Task (project management) ,Identification (information) ,Data mining ,Semantic information ,CRFS ,computer - Abstract
Semantic information is very important for understanding 2D engineering drawings. However, this kind of information is implicit, making it hard for computers to extract and understand. In this paper, we aim to identify the semantic information of shafts from their 2D drawings and then reconstruct the 3D models. The 2D representations of shafts are diverse. By analyzing the characteristics of 2D drawings of shafts, we find that there is always a view that represents the projected outline of the shaft, and each loop in this view corresponds to an elemental part. The conditional random field (CRF) model is a classification technique that can automatically integrate various features, rather than relying on manually organized heuristic rules. We first use a CRF model to identify elemental parts together with their semantic information. The 3D elemental parts are then constructed by a parameter template method. Compared with existing 3D reconstruction methods, our approach can obtain both the geometric and the semantic information of each part of a shaft from 2D drawings. Several examples demonstrate that our algorithm can accurately handle diverse 2D drawings of shafts. Our work improves the level of semantic understanding of 2D projections in 3D solid reconstruction. It is the first attempt to formulate the part identification task as a classification problem, employing an advanced classification model, CRFs, to identify the elemental parts.
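For reference, the linear-chain CRF used for such sequence labeling defines the conditional probability of a label sequence y given observations x as follows (the standard form of the model class, not this paper's specific features):

```latex
% Linear-chain CRF with feature functions f_k, weights \lambda_k, and
% partition function Z(x) normalizing over all label sequences:
P(\mathbf{y} \mid \mathbf{x})
  = \frac{1}{Z(\mathbf{x})}
    \exp\Big( \sum_{t} \sum_{k} \lambda_k\, f_k(y_{t-1}, y_t, \mathbf{x}, t) \Big).
```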
- Published
- 2015
50. Deadlock detection in FPGA design: A practical approach
- Author
- Chao Su, Dexi Wang, Fei He, Jiaguang Sun, Yangdong Deng, and Ming Gu
- Subjects
Multidisciplinary ,Computer science ,business.industry ,Vehicle bus ,Software ,Gate array ,Embedded system ,VHDL ,Verilog ,Field-programmable gate array ,business ,Hardware_REGISTER-TRANSFER-LEVELIMPLEMENTATION ,Deadlock prevention algorithms ,Formal verification ,computer ,Hardware_LOGICDESIGN ,computer.programming_language - Abstract
Formal verification of the VHSIC Hardware Description Language (VHDL) in Field-Programmable Gate Array (FPGA) design has been discussed for many years. In this paper we provide a practical approach to it: a semi-automatic way to detect deadlocks in FPGA VHDL designs, especially those that reside in automata. A domain is defined to represent the VHDL modules to be verified; these modules are transformed into Verilog models and verified by SMV tools. By analyzing the verification results of SMV, deadlocks can be found; by tracing back to the VHDL code, the deadlocking code is located and the problem is solved. VHDL verification is particularly important in safety-critical software. As an example, our solution is applied to a Multifunction Vehicle Bus Controller (MVBC) system for a train. The safety properties tested well in the development stage, but the system experienced a breakdown during the long-term software testing stage, mainly caused by deadlocks in the VHDL software. In this case, we located the VHDL deadlocks and solved the problem with the FPGA deadlock detection approach provided in this paper, demonstrating that our solution works well.
- Published
- 2015