36 results for "ZhiMao Lu"
Search Results
2. BERT-based coupling evaluation of biological strategies in bio-inspired design
- Author
-
Feng Sun, He Xu, Yihan Meng, Zhimao Lu, and Chengju Gong
- Subjects
Artificial Intelligence, General Engineering, Computer Science Applications
- Published
- 2023
- Full Text
- View/download PDF
3. A Complex Encryption System Design Implemented by AES
- Author
-
Houmed Mohamed and Zhimao Lu
- Subjects
Computer engineering, Symmetric-key algorithm, Computer science, Advanced Encryption Standard, Key (cryptography), Data security, Cryptography, Information security, Cryptographic protocol, Encryption
- Abstract
With the rapid development of internet technology and the increasing popularity of e-commerce, data encryption technology plays a very important role in data security. Information security has two aspects: security protocols and cryptographic algorithms; the latter is the foundation and core technology of information security. The Advanced Encryption Standard (AES) is one of the most commonly used symmetric encryption algorithms. Such algorithms face issues when used for key management and security functions. This paper systematically analyzes these issues and summarizes the implementation of the AES algorithm, its comprehensive application, and its comparison with other existing methods. To analyze the performance of the proposed algorithm and to make full use of the advantages of AES, one needs to reduce the round keys and improve the key schedule, as well as integrate AES organically with the RSA algorithm. The algorithm is implemented in Java because of its large standard library; to show the efficiency of the proposed method, we then compare different parameters (such as encryption/decryption speed, entropy, and memory consumption) with a classic algorithm. Based on the comparison between AES and the hybrid AES algorithm, the proposed algorithm shows good performance and high security. It can therefore be used for key management and security functions, particularly for sharing sensitive files through an insecure channel. This analysis provides a useful reference for selecting encryption algorithms according to different business needs.
- Published
- 2021
- Full Text
- View/download PDF
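The hybrid AES+RSA integration the abstract above describes follows the standard key-wrapping pattern: encrypt the data with a fresh symmetric session key, then encrypt (wrap) that key with the recipient's public key. A minimal sketch of the flow, with an insecure XOR toy standing in for AES and identity functions standing in for RSA (both purely illustrative, not the paper's implementation):

```python
import os

# Toy stand-ins for illustration only: a real system would use AES-GCM and
# RSA-OAEP from a vetted crypto library; XOR here just shows the flow.
def toy_symmetric(key: bytes, data: bytes) -> bytes:
    # XOR "cipher": the same call both encrypts and decrypts (NOT secure).
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def hybrid_encrypt(recipient_wrap, plaintext: bytes):
    session_key = os.urandom(16)               # fresh per-message symmetric key
    ciphertext = toy_symmetric(session_key, plaintext)
    wrapped_key = recipient_wrap(session_key)  # in AES+RSA: RSA-encrypt the key
    return wrapped_key, ciphertext

def hybrid_decrypt(unwrap, wrapped_key, ciphertext):
    session_key = unwrap(wrapped_key)
    return toy_symmetric(session_key, ciphertext)

# Stand-in "RSA" key wrap: identity functions, purely to exercise the flow.
wrapped, ct = hybrid_encrypt(lambda k: k, b"sensitive file")
print(hybrid_decrypt(lambda k: k, wrapped, ct))  # b'sensitive file'
```

The point of the pattern is that only the short session key pays the cost of slow public-key encryption, while the bulk data is handled by the fast symmetric cipher.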
4. BERT and Pareto dominance applied to biological strategy decision for bio-inspired design
- Author
-
Feng Sun, He Xu, Yihan Meng, Zhimao Lu, Siqing Chen, Qiandiao Wei, and Chengying Bai
- Subjects
Artificial Intelligence, Building and Construction, Information Systems
- Published
- 2023
- Full Text
- View/download PDF
5. A Ranking-Based Text Matching Approach for Plagiarism Detection
- Author
-
Zhimao Lu, Zhongyuan Han, Haoliang Qi, and Leilei Kong
- Subjects
Information retrieval, Computer science, 020204 information systems, Applied Mathematics, Text matching, Signal Processing, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Plagiarism detection, 02 engineering and technology, Electrical and Electronic Engineering, Computer Graphics and Computer-Aided Design, Ranking (information retrieval)
- Published
- 2018
- Full Text
- View/download PDF
6. A machine learning approach to query generation in plagiarism source retrieval
- Author
-
Haoliang Qi, Zhongyuan Han, Zhimao Lu, and Leilei Kong
- Subjects
Training set, Exploit, Computer Networks and Communications, Computer science, Heuristic, InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL, 02 engineering and technology, Machine learning, Ranking (information retrieval), Task (computing), Hardware and Architecture, 020204 information systems, Signal Processing, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Learning to rank, Plagiarism detection, Artificial intelligence, Electrical and Electronic Engineering, Software
- Abstract
Plagiarism source retrieval is the core task of plagiarism detection. Using queries extracted from suspicious documents to retrieve the plagiarism sources has become the standard approach, so generating queries from a suspicious document is one of the most important steps in plagiarism source retrieval. Heuristic query generation methods are widely used in current research. Each heuristic method has its own advantages, and none statistically outperforms the others on all suspicious document segments when generating queries for source retrieval. Further improvements to heuristic methods rely mainly on expert experience, which makes it difficult to put forward new heuristics that overcome the shortcomings of existing ones. This paper paves the way for a statistical machine learning approach that selects the best queries from the candidates. Query generation for source retrieval is formulated as a ranking framework that aims to achieve optimal source retrieval performance for each suspicious document segment; the proposed method exploits learning to rank to generate queries from the candidates. To our knowledge, this is the first work to apply machine learning methods to the problem of query generation for source retrieval. To solve the essential problem of an absence of training data for learning to rank, we also construct training samples for source retrieval. We rigorously evaluate various aspects of the proposed method on the publicly available PAN source retrieval corpus. With respect to the established baselines, the experimental results show that the proposed machine learning-based query generation method yields statistically significant improvements in source retrieval effectiveness.
- Published
- 2017
- Full Text
- View/download PDF
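The selection step the entry above describes, scoring candidate queries and keeping the best, can be sketched with a linear scorer over simple query features. The features, weights, and data below are hypothetical stand-ins; in the paper the scoring function would be fit by a learning-to-rank method on labelled retrieval outcomes:

```python
# Hypothetical sketch: rank candidate queries for a suspicious-document
# segment with a linear model. Feature choices and weights are illustrative.
def extract_features(query, segment):
    terms = query.split()
    return [
        len(terms),                                         # query length
        sum(segment.count(t) for t in terms) / len(terms),  # avg term frequency
        sum(len(t) for t in terms) / len(terms),            # avg term length
    ]

def rank_queries(candidates, segment, weights):
    scored = [(sum(w * f for w, f in zip(weights, extract_features(q, segment))), q)
              for q in candidates]
    return [q for _, q in sorted(scored, reverse=True)]

segment = "plagiarism source retrieval uses queries from suspicious documents"
candidates = ["plagiarism source retrieval", "uses queries", "suspicious documents"]
weights = [0.2, 1.0, 0.5]   # would be learned by a learning-to-rank method
print(rank_queries(candidates, segment, weights))
```

Submitting only the top-ranked queries keeps the number of search-engine calls per segment small while preserving retrieval effectiveness.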
7. A Method of Plagiarism Source Retrieval and Text Alignment Based on Relevance Ranking Model
- Author
-
Haoliang Qi, Leilei Kong, Feng Zhao, Zicheng Zhao, and Zhimao Lu
- Subjects
Information retrieval, General Computer Science, Process (engineering), Text alignment, Computer science, InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL, 05 social sciences, 050301 education, 050801 communication & media studies, Ranking (information retrieval), Task (project management), 0508 media and communications, Vector space model, Digital resources, Relevance (information retrieval), Plagiarism detection, 0503 education
- Abstract
The problem of text plagiarism has increased because of the digital resources available on the World Wide Web. Source retrieval and text alignment are two core tasks of plagiarism detection. A plagiarism source retrieval and text alignment system based on a relevance ranking model is described in this paper. Both the source retrieval task and the text alignment task are regarded as information retrieval processes, and relevance ranking is used to search for the plagiarism sources and obtain the candidate plagiarism seeds. For source retrieval, the BM25 model is used, while for text alignment, the Vector Space Model is exploited. Furthermore, a plagiarism detection system named HawkEyes is developed based on the proposed methods, and some demonstrations of HawkEyes are given.
- Published
- 2016
- Full Text
- View/download PDF
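The entry above uses BM25 as its relevance ranking model for source retrieval. A minimal BM25 scorer over tokenized documents can be sketched as follows; the parameter values k1 = 1.2 and b = 0.75 are the common defaults, and the toy corpus is illustrative rather than from the paper:

```python
import math

# Minimal BM25: score a document against query terms, given the whole corpus
# (needed for document frequencies and average document length).
def bm25_score(query_terms, doc, corpus, k1=1.2, b=0.75):
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N
    score = 0.0
    for t in query_terms:
        df = sum(1 for d in corpus if t in d)        # document frequency
        if df == 0:
            continue
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1)
        tf = doc.count(t)                            # term frequency in doc
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avgdl))
    return score

corpus = [["source", "retrieval"],
          ["text", "alignment", "retrieval"],
          ["vector", "space"]]
print(bm25_score(["retrieval"], corpus[0], corpus))
```

Documents containing the query terms score positively and can be ranked to pick candidate plagiarism sources; documents without any query term score zero.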
8. Crystal Morphology Monitoring based on In-situ Image Analysis of L-glutamic Acid Crystallization
- Author
-
Zhimao Lu, Guijuan Zhang, Lin Zhang, Yongming Jiang, Mengzhu Liu, and Chi Zhang
- Subjects
In situ, Crystal, Crystallography, Materials science, Visual descriptors, Glutamic acid, Crystallization, Crystal morphology
- Published
- 2019
- Full Text
- View/download PDF
9. A Ranking Approach to Source Retrieval of Plagiarism Detection
- Author
-
Zhongyuan Han, Zhimao Lu, Haoliang Qi, and Leilei Kong
- Subjects
Information retrieval, Artificial Intelligence, Hardware and Architecture, Computer science, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Plagiarism detection, 02 engineering and technology, Computer Vision and Pattern Recognition, Electrical and Electronic Engineering, Software, Ranking (information retrieval)
- Published
- 2017
- Full Text
- View/download PDF
10. Detecting High Obfuscation Plagiarism: Exploring Multi-Features Fusion via Machine Learning
- Author
-
Zhongyuan Han, Leilei Kong, Zhimao Lu, and Haoliang Qi
- Subjects
Fusion, ComputingMilieux_THECOMPUTINGPROFESSION, Exploit, Computer Networks and Communications, Computer science, Single type, ComputingMilieux_LEGALASPECTSOFCOMPUTING, Lexicon, Machine learning, Plagiarism detection, Artificial intelligence, Classifier (UML), Software
- Abstract
Providing effective methods for identifying high-obfuscation plagiarism seeds is a significant research problem in the field of plagiarism detection. Conventional plagiarism detection methods rely on a single type of feature to capture plagiarism seeds. For high-obfuscation plagiarism detection, however, single-type features are not sufficient to identify plagiarism seeds effectively, because of the varied plagiarism methods used in high-obfuscation plagiarism. This paper presents a multi-feature fusion method for identifying high-obfuscation plagiarism seeds. The method uses a Logistic Regression model to integrate lexical, syntactic, semantic, and structural features extracted from the suspicious document and the source document. A multi-feature fusion classifier based on the Logistic Regression model decides whether a text fragment pair should be regarded as a plagiarism seed. Experimental results on the PAN@CLEF2013 summary-obfuscation corpus show that fusing different types of features produces more accurate results.
- Published
- 2014
- Full Text
- View/download PDF
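The fusion step the entry above describes, combining lexical, syntactic, semantic, and structural similarity scores with logistic regression, reduces to a sigmoid over a weighted sum. A minimal sketch, with illustrative (not fitted) weights and a hypothetical four-feature input per candidate fragment pair:

```python
import math

# Sketch of score-level fusion with logistic regression: each candidate seed
# pair is described by several feature-group similarities in [0, 1].
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def is_plagiarism_seed(features, weights, bias, threshold=0.5):
    # features: e.g. [lexical_sim, syntactic_sim, semantic_sim, structural_sim]
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z) >= threshold

# Illustrative parameters; in practice they would be fit on labelled pairs.
weights, bias = [2.0, 1.5, 3.0, 0.5], -3.0
print(is_plagiarism_seed([0.9, 0.8, 0.7, 0.6], weights, bias))  # True
```

The attraction of this design is that each feature group can be engineered independently, while the classifier learns how much each group should count toward the final seed decision.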
11. Visual analytics for the clustering capability of data
- Author
-
Chen Liu, Dongmei Fan, Qi Zhang, Chun-xiang Zhang, Peng Yang, and Zhimao Lu
- Subjects
Clustering high-dimensional data, Fuzzy clustering, General Computer Science, Computer science, Single-linkage clustering, Correlation clustering, Constrained clustering, Machine learning, Hierarchical clustering, ComputingMethodologies_PATTERNRECOGNITION, CURE data clustering algorithm, Data mining, Artificial intelligence, Cluster analysis
- Abstract
Clustering analysis is an unsupervised method for finding hidden structures in datasets and has been widely used in various fields. However, it is always difficult for users to understand, evaluate, and explain clustering results in spaces with more than three dimensions. Although high-dimensional visualization of clustering can express clustering results well, it still has significant limitations. In this paper, a visualization cluster analysis method based on the minimum distance spectrum (MinDS) is proposed to ease the problems of clustering multidimensional datasets. First, the concept of MinDS is defined based on the distances between high-dimensional data points. MinDS can map any dataset from a high-dimensional space to a lower dimension to determine whether the dataset is separable. Next, a clustering method that automatically determines the number of categories is designed based on MinDS. This method can cluster datasets with clear boundaries, and can also cluster datasets with fuzzy boundaries through an edge corrosion strategy based on the energy of each data point. In addition, strategies for removing noise and identifying outliers are designed to clean datasets according to the characteristics of MinDS. The experimental results validate the feasibility and effectiveness of the proposed schemes and show that the approach is simple, stable, and efficient, and can achieve multidimensional visual cluster analysis of complex datasets.
- Published
- 2013
- Full Text
- View/download PDF
12. A Clustering Segmentation Algorithm for High-Resolution Images Based on Data Competition
- Author
-
DongMei Fan, Qi Zhang, ZhiMao Lu, XiaoLi Xu, and BingCai Chen
- Subjects
General Computer Science, Machine vision, Computer science, Segmentation-based object categorization, Correlation clustering, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Scale-space segmentation, Image segmentation, CURE data clustering algorithm, Canopy clustering algorithm, Computer vision, Artificial intelligence, Cluster analysis, Engineering (miscellaneous)
- Abstract
In recent years, robot technology has developed rapidly and been applied in many fields. With its promotion and popularization, the requirements on robots are rising, and the demand for intelligent robots is particularly urgent. Machine vision is an important research direction in intelligent robotics. In a robot vision system, the core problem is target extraction, and image segmentation is the key technique for extracting targets accurately, rapidly, and in real time. Since the environment is complex and the targets are diverse, the amount of image data perceived by robots is large and the images are unpredictable, so accurate target extraction and segmentation is very important. Aimed at the segmentation of high-resolution images, a novel clustering algorithm is proposed in this paper. According to the energy of the data points and their sizes, it recognizes cluster representatives and members, and identifies the most probable members through competition among data points. We then apply the algorithm to the color image segmentation problem by combining it with the Mean Shift clustering algorithm. The algorithm can quickly and efficiently segment targets in high-resolution images with good segmentation results. Experiments show that the proposed approach achieves better clustering quality and is faster than traditional clustering algorithms.
- Published
- 2012
- Full Text
- View/download PDF
13. Clustering by data competition
- Author
-
ZhiMao Lu and Qi Zhang
- Subjects
Fuzzy clustering, General Computer Science, Computer science, Single-linkage clustering, Correlation clustering, Constrained clustering, Pattern recognition, ComputingMethodologies_PATTERNRECOGNITION, Data stream clustering, CURE data clustering algorithm, Canopy clustering algorithm, Artificial intelligence, Data mining, Cluster analysis
- Abstract
Clustering analysis is an unsupervised method for finding hidden structures in datasets. Most partitional clustering algorithms are sensitive to the selection of initial exemplars, to outliers, and to noise. In this paper, a novel technique called the data competition algorithm is proposed to solve these problems. First, the concept of an aggregation field model is defined to describe the partitional clustering problem. Next, exemplars are identified through data competition. Then, the members are assigned to suitable clusters. The data competition algorithm avoids poor solutions caused by unlucky initializations, outliers, and noise, and can be used to detect co-expressed genes, cluster images, diagnose diseases, distinguish varieties, etc. The experimental results validate the feasibility and effectiveness of the proposed schemes and show that the data competition algorithm is simple, stable, and efficient. They also show that data competition clustering outperforms three of the most well-known clustering algorithms: K-means clustering, affinity propagation clustering, and hierarchical clustering.
- Published
- 2012
- Full Text
- View/download PDF
14. Extracting Knowledge from On-Line Forums for Non-Obstructive Psychological Counseling Q&A System
- Author
-
Ming Liu, Zhimao Lu, Yuanchao Liu, and Mingkai Song
- Subjects
Information retrieval, Knowledge management, Computer science, Keyword extraction, Popularity, Knowledge base, Knowledge extraction, Component (UML), Construct (philosophy), Word (computer architecture), XML
- Abstract
Psychological counseling Q&A systems have enjoyed remarkable and increasing popularity in recent years. The knowledge base is an important component of such systems, but it is difficult and time-consuming to construct manually. Fortunately, a large number of Q&A pairs have emerged on many psychological counseling websites, which provide a good source for enriching the knowledge base. This paper presents a method of knowledge extraction from the counseling Q&A pairs of online psychological counseling websites, covering keywords, semantic extensions, and word sequences. P-XML, a knowledge template based on XML, is also designed to store the knowledge. The extracted knowledge has been successfully used in our non-obstructive psychological counseling system, called P.A.L., and the experimental results demonstrate the feasibility and effectiveness of our approach.
- Published
- 2012
- Full Text
- View/download PDF
15. Chinese Word Sense Disambiguation Based on Bayesian Model Improved by Information Gain
- Author
-
Shu-shen Pan, Rubo Zhang, Zhimao Lu, and Dongmei Fan
- Subjects
Computer science, Speech recognition, Sense (electronics), Bayesian inference, SemEval, Artificial intelligence, Electrical and Electronic Engineering, Chinese word, Information gain, Natural language processing
- Published
- 2011
- Full Text
- View/download PDF
16. Word Sense Disambiguation based on improved Bayesian classifiers
- Author
-
Ting Liu, Sheng Li, and Zhimao Lu
- Subjects
Computer science, Feature extraction, Context (language use), Bayesian inference, SemEval, Naive Bayes classifier, Dependency grammar, Classifier (linguistics), Unsupervised learning, Artificial intelligence, Electrical and Electronic Engineering, Natural language processing
- Abstract
Word Sense Disambiguation (WSD) is the task of deciding the sense of an ambiguous word in a particular context. Most current studies on WSD use only a few ambiguous words as test samples, which limits their practical application. In this paper, we study WSD on a large-scale real-world corpus using two unsupervised learning algorithms: a ±n-improved Bayesian model and a Dependency Grammar (DG)-improved Bayesian model. The ±n-improved classifiers reduce the context window around ambiguous words with a close-distance feature extraction method and decrease the interference of useless features, which clearly improves accuracy, reaching 83.18% in an open test. The DG-improved classifier more effectively overcomes the noise present in the Naive Bayesian classifier. Experimental results show that this approach performs well on Chinese WSD, with the open test achieving an accuracy of 86.27%.
- Published
- 2006
- Full Text
- View/download PDF
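The ±n windowing idea from the entry above, classifying a target word's sense from the words within n positions on either side, can be sketched with a plain Naive Bayes classifier. The toy senses, contexts, and add-one smoothing below are illustrative stand-ins for the paper's corpus and improved models:

```python
import math
from collections import Counter, defaultdict

# Minimal Naive Bayes word-sense classifier over a ±n context window.
def train(samples):  # samples: list of (sense, context_words)
    priors, cond = Counter(), defaultdict(Counter)
    for sense, ctx in samples:
        priors[sense] += 1
        cond[sense].update(ctx)
    return priors, cond

def disambiguate(context, n, target_idx, priors, cond):
    # Keep only the n words on each side of the ambiguous word.
    window = (context[max(0, target_idx - n):target_idx]
              + context[target_idx + 1:target_idx + 1 + n])
    vocab = {w for c in cond.values() for w in c}
    def log_post(sense):
        total = sum(cond[sense].values())
        return math.log(priors[sense]) + sum(
            math.log((cond[sense][w] + 1) / (total + len(vocab)))  # add-one smoothing
            for w in window)
    return max(priors, key=log_post)

samples = [("finance", ["money", "loan"]), ("river", ["water", "shore"])]
priors, cond = train(samples)
print(disambiguate(["loan", "bank", "money"], 1, 1, priors, cond))  # finance
```

Shrinking n discards distant, mostly irrelevant context words, which is the intuition behind the close-distance feature extraction the abstract credits for its accuracy gain.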
17. HawkEyes Plagiarism Detection System
- Author
-
Zhimao Lu, Feng Zhao, Zhongyuan Han, Haoliang Qi, Leilei Kong, Jie Li, and Yong Han
- Subjects
Information retrieval, Training set, ComputingMilieux_THECOMPUTINGPROFESSION, Computer science, Text alignment, Big data, ComputingMilieux_LEGALASPECTSOFCOMPUTING, Clef, ComputingMilieux_COMPUTERSANDEDUCATION, Plagiarism detection, Algorism
- Abstract
High-obfuscation plagiarism in a big-data environment, such as paraphrasing and cross-language plagiarism, is often difficult for anti-plagiarism systems to detect because plagiarism techniques are becoming more and more complex. This paper proposes HawkEyes, a plagiarism detection system built on the source retrieval and text alignment algorithms developed for the international competition on plagiarism detection organized by CLEF. The text alignment algorithm in HawkEyes won first place at PAN@CLEF2012. In the demonstration, we present our system running on the PAN@CLEF2014 training data corpus.
- Published
- 2015
- Full Text
- View/download PDF
18. S-box: L-L Cascade Chaotic Map and Line Map
- Author
-
Zhimao Lu and Ye Tian
- Subjects
S-box, Nonlinear system, Computer science, Cascade, Bijection, Integer sequence, Cryptography, Algorithm, Independence (probability theory), Computer Science::Cryptography and Security, Block cipher
- Abstract
As an important nonlinear component of block ciphers, the Substitution box (S-box) directly affects the security of a cryptographic system. It is important and difficult to design cryptographically strong S-boxes that simultaneously meet multiple cryptographic criteria such as bijection, nonlinearity, the strict avalanche criterion (SAC), the bit independence criterion (BIC), differential probability (DP), and linear probability (LP). To address this issue, an S-box generation approach based on an L-L cascade chaotic map and a line map (LLCMLM) is proposed in this paper. The L-L cascade chaotic map is used to generate an integer sequence over the range 0–255, and the line map is applied to scramble the positions of the integer sequence. A series of experiments compare LLCMLM with other algorithms on multiple cryptographic criteria. Simulation results indicate that LLCMLM meets the design criteria of the S-box well.
- Published
- 2015
- Full Text
- View/download PDF
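The general recipe the entry above follows, turning a chaotic orbit into a bijective 8-bit substitution table, can be illustrated with a simpler chaotic system. The sketch below substitutes a single logistic map for the paper's L-L cascade map and line map (parameters x0, r, and the skip length are illustrative); ranking the orbit values assigns each of the 256 positions a unique rank, so the result is always a permutation:

```python
# Hedged sketch: build a bijective 8-bit S-box by ranking a chaotic
# logistic-map orbit. This is NOT the paper's LLCMLM construction.
def chaotic_sbox(x0=0.31, r=3.99, skip=100):
    x, orbit = x0, []
    for i in range(skip + 256):
        x = r * x * (1 - x)          # logistic map iteration
        if i >= skip:                # discard the transient
            orbit.append(x)
    # Rank the 256 orbit values: each index gets a unique rank in 0..255,
    # so the table is a permutation, i.e. a bijective S-box.
    sbox = [0] * 256
    for rank, idx in enumerate(sorted(range(256), key=lambda i: orbit[i])):
        sbox[idx] = rank
    return sbox

sbox = chaotic_sbox()
print(sorted(sbox) == list(range(256)))  # True: bijective
```

Bijectivity comes for free from the ranking step; the other criteria the abstract lists (nonlinearity, SAC, BIC, DP, LP) must then be checked empirically for a given parameter choice.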
19. A K-means clustering algorithm based on the maximum triangle rule
- Author
-
Jinmei Feng, Peng Yang, Xiaoli Xu, and Zhimao Lu
- Subjects
Determining the number of clusters in a data set, Combinatorics, Fuzzy clustering, CURE data clustering algorithm, Correlation clustering, Single-linkage clustering, Canopy clustering algorithm, Cluster analysis, k-medians clustering, Mathematics
- Abstract
As a measurable criterion of clustering quality for the classical K-means algorithm, the objective function often has many local minima. The objective function may converge to one of them when the initial cluster centers are dropped near a local minimum, or when two data objects in the same cluster are chosen as the initial centers of two different clusters; the problem of locally optimal solutions then arises. To address this, a K-means clustering algorithm based on the maximum triangle rule (KMTR) is proposed in this paper. KMTR uses the maximum triangle rule to select appropriate initial cluster centers for the classical K-means algorithm. Experimental results on several UCI datasets show the validity of applying the maximum triangle rule to the K-means algorithm.
- Published
- 2012
- Full Text
- View/download PDF
20. Color image segmentation based on watershed and Ncut of improved weight matrix
- Author
-
Xiaoli Xu, Zhimao Lu, and Haiyan Li
- Subjects
Pixel, Computational complexity theory, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Pattern recognition, Graph theory, Image segmentation, Image (mathematics), Graph (abstract data type), Computer vision, Artificial intelligence, Mean-shift, Cluster analysis, Mathematics
- Abstract
A new color image segmentation method combining a twice-applied watershed algorithm with the Ncut algorithm using an improved weight matrix is presented in this paper. The image is preprocessed with the twice-applied watershed algorithm to form segmented regions that preserve the desirable discontinuity characteristics of the image. The segmented regions, instead of the image pixels, are then represented using a graph structure and taken as the input of the Ncut algorithm. In addition, a new weight matrix is designed according to the color and spatial information of the image. The Ncut method is then applied to perform globally optimized clustering. Because clustering operates on the segmented regions rather than on image pixels, the new method effectively reduces the computational complexity of the traditional Ncut method. The new weight matrix also has a certain self-adaptability. Experiments on a large number of natural color scene images show that the proposed method achieves superior performance at lower computational cost compared to the traditional Ncut algorithm and to the method combining mean shift (MS) and Ncut.
- Published
- 2011
- Full Text
- View/download PDF
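Weight matrices of the kind the entry above builds for Ncut typically make the edge weight between two regions decay with both their color distance and their spatial distance. A minimal sketch of one such pairwise weight; the Gaussian form and the sigma values are illustrative, not the paper's adaptive matrix:

```python
import math

# Region-level graph weight: similarity decays with color distance between
# mean region colors and with spatial distance between region centroids.
def region_weight(c1, c2, p1, p2, sigma_color=10.0, sigma_space=50.0):
    d_color = math.dist(c1, c2)   # distance between mean RGB colors
    d_space = math.dist(p1, p2)   # distance between region centroids
    return (math.exp(-(d_color / sigma_color) ** 2)
            * math.exp(-(d_space / sigma_space) ** 2))

# Nearby regions with similar colors get weight near 1; distant, dissimilar
# regions get weight near 0.
w_close = region_weight((120, 60, 30), (122, 62, 31), (10, 10), (12, 11))
w_far = region_weight((120, 60, 30), (20, 200, 240), (10, 10), (400, 300))
print(w_close > w_far)  # True
```

Because the graph nodes are watershed regions rather than pixels, this matrix is tiny compared to a pixel-level affinity matrix, which is where the claimed complexity reduction comes from.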
21. The opportunistic scheduling wireless body area model
- Author
-
Zhimao Lu and Yi Sui
- Subjects
Key distribution in wireless sensor networks, Computer science, Real-time computing, Body area, Wireless, Wireless WAN, Transmission time, Computer network, Scheduling (computing)
- Abstract
The authors use opportunistic scheduling to improve the performance of a wireless body area model, a setting with little prior research. The algorithm selects the best channel to transmit the data, so the throughput reaches its maximum, increasing the data rate and saving transmission time. It produces encouraging results when applied to wireless body area networks in a simulated system setup, performing better than earlier approaches. The algorithm can also be used in other areas of wireless communication.
- Published
- 2010
- Full Text
- View/download PDF
22. Spam Filtering Based on Improved CHI Feature Selection Method
- Author
-
Hongxia Yu, Dongmei Fan, Zhimao Lu, and Chaoyue Yuan
- Subjects
Computer Science::Information Retrieval, Feature extraction, Pattern recognition, Feature selection, Filter (signal processing), Support vector machine, Cross entropy, F-test, Entropy (information theory), Data mining, Artificial intelligence, Mathematics, Statistical hypothesis testing
- Abstract
In this paper, feature selection methods used in spam filtering are studied, including CHI square (CHI), Expected Cross Entropy (ECE), the Weight of Evidence for Text (WET), and Information Gain (IG), and a novel modified CHI feature selection method for spam filtering is proposed. A spam filter based on a Support Vector Machine (SVM) is used to evaluate CHI, ECE, WET, IG, and the modified CHI. The experiments show that the modified CHI improves the precision, recall, and F-measure of the spam filter, and that the modified CHI feature selection method is effective.
- Published
- 2009
- Full Text
- View/download PDF
23. Word Sense Discrimination Based on Word-Sense Category Extending
- Author
-
Rubo Zhang, Dongmei Fan, Guobin Cheng, and Zhimao Lu
- Subjects
Computer science, Word sense, Electronic mail, SemEval, Word lists by frequency, ComputingMethodologies_PATTERNRECOGNITION, Word sense discrimination, Unsupervised learning, Artificial intelligence, Computational linguistics, Natural language, Natural language processing
- Abstract
Ambiguous word senses pose many difficulties for automatic language understanding, and word sense discrimination research addresses this problem. Statistical approaches dominate research in this area, but due to the limited scale of training corpora, statistical word sense discrimination has not yet attained satisfying results. Therefore, given only a limited-scale corpus, how to improve the efficiency and effectiveness of statistical learning methods is a hot topic in supervised word sense recognition research. Based on the concept of word-sense categories, a new word sense discrimination method using word-sense category extension is proposed. Experimental results show that the proposed method can effectively improve the accuracy of word sense discrimination without enlarging the training corpus.
- Published
- 2009
- Full Text
- View/download PDF
24. Automatic Chinese text categorization system based on mutual information
- Author
-
Qi Zhang, Hong Shi, Chaoyue Yuan, and Zhimao Lu
- Subjects
Computer science, Feature extraction, Pattern recognition, Feature selection, Mutual information, Support vector machine, Statistical classification, Text categorization, Text mining, Artificial intelligence, Data mining, Classifier (UML)
- Abstract
Feature selection is a key step in an automatic text categorization system and has a significant impact on classification results. In this paper we study mutual information (MI), a basic feature selection method. First, by analyzing the MI formula theoretically and systematically, we identify three main problems: MI loss, the information difference among categories, and the excessive emphasis on low-frequency terms. Then, to solve these three problems, we propose an improved feature selection method that uses the absolute values of MI and the differences between the maximum and average MI. Finally, we test our method with a K-Nearest Neighbor (KNN) classifier and a Support Vector Machine (SVM) classifier, and compare it with the original method on a Chinese corpus. The results demonstrate the effectiveness and feasibility of the proposed method.
- Published
- 2009
- Full Text
- View/download PDF
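The baseline MI criterion the entry above analyzes is the pointwise form MI(t, c) = log( P(t|c) / P(t) ): how much more likely term t is inside category c than in the corpus overall. A minimal implementation from document counts (the counts themselves are illustrative):

```python
import math

# Pointwise mutual information of a term t and a category c.
def mutual_information(n_tc, n_c, n_t, n_total):
    # n_tc: docs in c containing t, n_c: docs in c,
    # n_t: docs containing t,       n_total: all docs
    if n_tc == 0:
        return float("-inf")       # t never occurs in c
    p_t_given_c = n_tc / n_c       # P(t | c)
    p_t = n_t / n_total            # P(t)
    return math.log(p_t_given_c / p_t)

# A term concentrated in one category scores higher than one spread uniformly:
print(mutual_information(30, 100, 40, 1000))   # positive: concentrated in c
print(mutual_information(10, 100, 100, 1000))  # 0.0: independent of c
```

The low-frequency bias the paper criticizes is visible in the formula: a rare term with a tiny P(t) can get a large score from very few occurrences, which is one of the three problems the proposed modification addresses.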
25. Speech endpoint detection in strong noisy environment based on the Hilbert-Huang Transform
- Author
-
Liran Shen, Baisen Liu, and Zhimao Lu
- Subjects
Voice activity detection, Noise (signal processing), Speech recognition, Speech coding, Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing), Pattern recognition, Speech processing, Linear predictive coding, Signal, Hilbert–Huang transform, Signal-to-noise ratio, Computer Science::Sound, Artificial intelligence, Mathematics
- Abstract
Speech endpoint detection in strong noise environments plays an important role in speech signal processing. The Hilbert-Huang Transform (HHT) is an adaptive and efficient transformation method based on the local characteristics of signals, which makes it particularly suitable for analyzing non-linear and non-stationary signals such as speech. In this paper, we consider noisy speech signals with negative signal-to-noise ratios. After analyzing such signals, a novel algorithm for speech endpoint detection based on the Hilbert-Huang Transform is presented. The signal is first decomposed by Empirical Mode Decomposition (EMD), and part of the decomposition is processed by the Hilbert transform. The noise threshold is estimated by analyzing the front portion of the signal's Hilbert amplitude spectrum, and speech segments are distinguished from non-speech segments using this threshold together with the whole signal's Hilbert amplitude spectrum. Simulation results show that speech can be effectively detected by this algorithm at low signal-to-noise ratios.
- Published
- 2009
- Full Text
- View/download PDF
26. An Efficient Spectral Method for Document Cluster Ensemble
- Author
-
Zhimao Lu, Guochang Gu, and Sen Xu
- Subjects
Clustering high-dimensional data, Computer science, Correlation clustering, Pattern recognition, Document clustering, Ensemble learning, Spectral clustering, ComputingMethodologies_PATTERNRECOGNITION, Singular value decomposition, Artificial intelligence, Cluster analysis, k-medians clustering
- Abstract
Cluster ensemble techniques have recently been shown to be effective in improving the accuracy and stability of single clustering algorithms. A critical problem in cluster ensembles is how to combine multiple clusterers to yield a final, superior clustering result. In this paper, we present an efficient ensemble clustering method based on spectral graph theory that is feasible for large-scale applications such as document clustering. Since the EigenValue Decomposition (EVD) of the Laplacian is formidable for large document sets, we first transform it into a Singular Value Decomposition (SVD) problem, and then an equivalent EVD is performed. Experiments show that our spectral algorithm yields better clustering results than other cluster ensemble techniques without high computational cost.
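The abstract does not spell out the ensemble construction; a common sketch of the general idea, under the assumption that the base clusterings are stacked into a binary cluster-incidence matrix whose SVD supplies the spectral embedding, might look like this (matrix layout and normalization are illustrative choices, not the paper's):

```python
import numpy as np

def ensemble_embedding(labelings, k):
    """Top-k left singular vectors of the binary cluster-incidence matrix
    built by stacking several base clusterings column-wise."""
    cols = []
    for lab in labelings:
        for c in sorted(set(lab)):
            cols.append([1.0 if l == c else 0.0 for l in lab])
    H = np.array(cols).T                  # n objects x total clusters
    H /= H.sum(axis=1, keepdims=True)     # each object's memberships sum to 1
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    return U[:, :k]                       # spectral embedding of the objects

# Two noisy base clusterings of six documents; the true split is
# {0, 1, 2} versus {3, 4, 5}.
base = [[0, 0, 0, 1, 1, 1],
        [0, 0, 1, 1, 1, 1]]
emb = ensemble_embedding(base, k=2)
```

Objects with identical base-cluster memberships land on identical embedding points, and a standard clustering algorithm (e.g. k-means) run on the embedding rows produces the consensus partition.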
- Published
- 2008
- Full Text
- View/download PDF
27. A Fast Spectral Method to Solve Document Cluster Ensemble Problem
- Author
-
Sen Xu, Guochang Gu, and Zhimao Lu
- Subjects
Computer science ,Computer Science::Information Retrieval ,Approximation algorithm ,Document clustering ,computer.software_genre ,Spectral clustering ,Matrix decomposition ,ComputingMethodologies_PATTERNRECOGNITION ,Algorithm design ,Data mining ,Spectral method ,Cluster analysis ,computer ,Eigenvalues and eigenvectors - Abstract
The critical problem in cluster ensembles is how to combine clusterers to yield a final, superior clustering result. In this paper, we introduce a spectral method to solve the document cluster ensemble problem. Since spectral clustering inevitably needs to compute the eigenvalues and eigenvectors of a matrix, it is computationally intractable for large-scale document datasets. By applying an algebraic transformation to the similarity matrix, we obtain a feasible algorithm. Experiments on the TREC and Reuters document sets show that our spectral algorithm yields better clustering results than other typical cluster ensemble techniques without high computational cost.
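The abstract does not state the algebraic transformation explicitly; one standard identity that turns a large eigenproblem into a small one (and underlies EVD-to-SVD reductions of this kind) is that B·Bᵀ and Bᵀ·B share their nonzero eigenvalues, with each small eigenvector lifting via B. A small numpy check, with B an assumed stand-in for the tall matrix involved:

```python
import numpy as np

# For an n x m matrix B with n >> m, the big matrix B @ B.T shares its
# nonzero eigenvalues with the small matrix B.T @ B, and a small
# eigenvector v lifts to a big eigenvector B @ v with the same eigenvalue.
rng = np.random.default_rng(1)
B = rng.normal(size=(200, 5))

small = np.sort(np.linalg.eigvalsh(B.T @ B))[::-1]
big = np.sort(np.linalg.eigvalsh(B @ B.T))[::-1][:5]

vals, vecs = np.linalg.eigh(B.T @ B)
lam, v = vals[-1], vecs[:, -1]    # top eigenpair of the small problem
u = B @ v                         # lifted eigenvector of B @ B.T
residual = np.linalg.norm((B @ B.T) @ u - lam * u)
```

Only the 5x5 eigenproblem is ever solved directly; the 200x200 decomposition above exists solely to verify the identity.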
- Published
- 2008
- Full Text
- View/download PDF
28. Combining Neural Networks and Statistics for Chinese Word Sense Discrimination
- Author
-
Dongmei Fan, Rubo Zhang, and Zhimao Lu
- Subjects
Context model ,Statistical classification ,Training set ,Artificial neural network ,Computer science ,Time delay neural network ,Speech recognition ,Feature extraction ,Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing) ,Mutual information ,Backpropagation ,Word (computer architecture) - Abstract
The input representation is the key problem for Chinese word sense discrimination using neural networks. This paper presents an input model for a neural network that computes the mutual information between contextual words and the ambiguous word using a statistical method, taking a fixed number of contextual words around the ambiguous word according to a (-M, +N) window. The experiment adopts a three-layer BP (backpropagation) neural network model and shows how the size of the training set and the values of M and N affect the model's performance. The experimental objects are six pseudowords, each with three word senses, constructed according to certain principles. The tested accuracy of our approach reaches 90.31% on a closed corpus and 89.62% on an open corpus. The experiment shows that the neural network model performs well on word sense discrimination.
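The mutual-information input computation can be sketched as follows. This is an illustrative toy, not the paper's implementation: the corpus, the window sizes, and the count-based PMI estimate are all assumptions.

```python
import math
from collections import Counter

def window_pmi(sentences, target, M=2, N=2):
    """Pointwise mutual information between the target word and every word
    occurring in a (-M, +N) window around it, from a toy corpus."""
    word_count = Counter()
    co_count = Counter()
    total = 0
    for sent in sentences:
        total += len(sent)
        word_count.update(sent)
        for i, w in enumerate(sent):
            if w == target:
                for j in range(max(0, i - M), min(len(sent), i + N + 1)):
                    if j != i:
                        co_count[sent[j]] += 1
    n_t = word_count[target]
    # PMI = log P(w, target) / (P(w) P(target)), with counts as proxies.
    return {w: math.log((c * total) / (word_count[w] * n_t))
            for w, c in co_count.items()}

corpus = [
    "the bank loan was approved".split(),
    "the river bank was muddy".split(),
    "a loan from the bank".split(),
]
scores = window_pmi(corpus, "bank")
```

In the paper's setting, the PMI scores of the window words would then be assembled into the fixed-length input vector of the BP network.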
- Published
- 2008
- Full Text
- View/download PDF
29. A Vicarious Words Method for Word Sense Discrimination
- Author
-
Rubo Zhang, Zhimao Lu, and Dongmei Fan
- Subjects
business.industry ,Computer science ,Speech recognition ,computer.software_genre ,Word sense ,Test (assessment) ,Naive Bayes classifier ,Word sense discrimination ,Chinese language ,Artificial intelligence ,business ,computer ,Value (mathematics) ,Natural language processing - Abstract
This paper presents a new approach based on Vicarious Words (VWs) to resolve Word Sense Discrimination (WSD) in the Chinese language. VWs are particular artificial ambiguous words that can be used to realize unsupervised WSD. A Bayesian classifier is implemented to test the efficacy of the VW solution on the Senseval-3 Chinese test suite. The performance is better than state-of-the-art results, with an average F-measure of 0.80. The experiment verifies the value of VWs for unsupervised WSD.
- Published
- 2008
- Full Text
- View/download PDF
30. Word Sense Disambiguation Based on Vicarious Words
- Author
-
Zhimao Lu, Rubo Zhang, and Dongmei Fan
- Subjects
Collocation ,Word-sense disambiguation ,Computer science ,business.industry ,Speech recognition ,Context (language use) ,Mutual information ,computer.software_genre ,Measure (mathematics) ,Unsupervised learning ,Artificial intelligence ,Computational linguistics ,business ,computer ,Natural language ,Natural language processing - Abstract
This paper presents the concept of vicarious words and develops a new unsupervised Chinese word sense disambiguation method. After statistical learning from the vicarious words, the method performs unsupervised word sense disambiguation by calculating mutual information to measure the degree of collocation between ambiguous words and their context. In our experiment, we test ten real ambiguous words and achieve a highest accuracy of 97%, with a mean accuracy of 88.52%. The experimental results demonstrate the feasibility of this method.
- Published
- 2008
- Full Text
- View/download PDF
31. A New Decision Rule for Statistical Word Sense Disambiguation
- Author
-
Rubo Zhang, Dongmei Fan, and Zhimao Lu
- Subjects
Admissible decision rule ,Reflection (computer programming) ,business.industry ,Computer science ,Bayesian probability ,Decision rule ,Bayesian inference ,Machine learning ,computer.software_genre ,Statistical learning theory ,Credibility ,Artificial intelligence ,business ,computer ,Natural language processing ,Word (computer architecture) - Abstract
Word Sense Disambiguation (WSD) is usually treated as a pattern classification problem, and it has long been a key and difficult problem in natural language processing. Statistical learning theory is a mainstream research method for WSD. The distribution of the senses of an ambiguous word is rarely symmetrical, and the difference between sense frequencies is sometimes large, so classification results tend to be biased toward the most probable sense. This phenomenon is especially evident in the Bayesian model. While studying the Bayesian model, we found a new word-sense decision rule with better precision than the Bayesian model for WSD. To validate the credibility and stability of this method, we repeated the experiment many times and collected extensive experimental data. The results indicate that the new decision rule outperforms the Bayesian decision rule. Furthermore, this paper provides a theoretical foundation for the new rule.
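The abstract does not give the new rule itself. Purely as an illustration of the skew problem it describes, the toy numbers below show the MAP rule argmax P(s)·P(f|s) siding with the frequent sense, while a likelihood-only rule (one conceivable alternative, not necessarily the paper's) picks the rare sense supported by the evidence:

```python
# Toy illustration (not the paper's actual rule): with a heavily skewed
# sense prior, the MAP rule can override strong feature evidence for the
# rare sense, while a likelihood-only rule does not.
priors = {"sense_A": 0.9, "sense_B": 0.1}
# Likelihood of an observed context feature under each sense.
likelihood = {"sense_A": 0.05, "sense_B": 0.4}

# MAP: argmax over senses of P(s) * P(f | s).
map_choice = max(priors, key=lambda s: priors[s] * likelihood[s])
# Likelihood-only: argmax over senses of P(f | s), ignoring the prior.
ml_choice = max(likelihood, key=lambda s: likelihood[s])
```

Here P(A)·P(f|A) = 0.045 beats P(B)·P(f|B) = 0.040, so MAP chooses the majority sense even though the feature is eight times likelier under the rare sense.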
- Published
- 2008
- Full Text
- View/download PDF
32. Word Sense Disambiguation method based on probability model improved by information gain
- Author
-
Zhimao Lu, Rubo Zhang, Dongmei Fan, and Xueyao Li
- Subjects
Context model ,business.industry ,Computer science ,Bayesian probability ,Feature extraction ,Context (language use) ,Feature selection ,Mutual information ,Machine learning ,computer.software_genre ,Bayesian inference ,Support vector machine ,Naive Bayes classifier ,Text mining ,Artificial intelligence ,business ,computer ,Word (computer architecture) ,Natural language processing - Abstract
Word sense disambiguation (WSD) has long been a key and difficult problem in natural language processing, and it is usually treated as a pattern classification problem. Feature selection is an important stage of the WSD process. We carefully review the Naive Bayes Model (NBM); the feature selection method adopted in this paper targets the Bayesian independence assumption in order to improve the NBM. Positional information hidden in the context of an ambiguous word is mined via information gain calculation, increasing the knowledge acquisition efficiency of the Bayesian model and improving word-sense classification. Eight ambiguous words are tested in our experiment; the results of the improved Bayesian model are 3.5 percentage points higher than those of the NBM. This substantial improvement shows that the method proposed in this paper is effective.
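The information-gain calculation for a binary context feature can be sketched as below; the toy senses and features are illustrative assumptions, not the paper's data.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of sense labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, feature_present):
    """IG of a binary context feature with respect to the sense label:
    H(sense) - H(sense | feature)."""
    n = len(labels)
    groups = {True: [], False: []}
    for lab, f in zip(labels, feature_present):
        groups[f].append(lab)
    cond = sum(len(g) / n * entropy(g) for g in groups.values() if g)
    return entropy(labels) - cond

# Toy data: the first feature predicts the sense perfectly, the second is noise.
senses = ["s1", "s1", "s2", "s2"]
f_good = [True, True, False, False]
f_noise = [True, False, True, False]
ig_good = information_gain(senses, f_good)
ig_noise = information_gain(senses, f_noise)
```

Features with high information gain would be kept (or weighted up) in the improved Bayesian model; zero-gain features carry no sense information.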
- Published
- 2008
- Full Text
- View/download PDF
33. 3-D Kinematics Modeling for Mobile Robot with Steering Castered-and-Cambered Wheels
- Author
-
Kai Xue, Zhimao Lu, Qiuyun Ouyang, He Xu, Fangliang Peng, Xiao-Zhi Gao, Qing Chang, and Shuanghe Yu
- Subjects
Robot kinematics ,Engineering ,Caster ,Inverse kinematics ,Control theory ,Camber (aerodynamics) ,business.industry ,Robot ,Mobile robot ,Kinematics ,Propulsion ,business ,Simulation - Abstract
In this paper, a universal kinematics model is presented for a robot with four castered-and-cambered wheels, and the effect of caster and camber on the robot's kinematic performance is analyzed. A closed-chain spatial formulation and an Instantaneous Superposition Frame (ISR) are used to build a 3-D kinematics model of a differential-suspension mobile robot with castered-and-cambered wheels under independent propulsion and individual steering. An algorithm to estimate wheel slippage is given, and the Jacobian of a wheel with caster and camber is presented. The results extend the research results of P.F. Muir and Rajagoplan.
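As a much-reduced planar illustration of the rolling constraint behind such wheel Jacobians (the paper's full 3-D model with caster and camber is beyond a short sketch), the contact-point velocity and the no-slip wheel command for a given body twist can be written as:

```python
import math

def wheel_command(vx, vy, omega, wx, wy):
    """Planar sketch: velocity of a wheel contact point at (wx, wy) in the
    body frame for body twist (vx, vy, omega), and the steering angle and
    rolling speed that satisfy the no-slip rolling constraint."""
    # v_wheel = v_body + omega x r  (2-D cross product with the z-axis)
    vwx = vx - omega * wy
    vwy = vy + omega * wx
    steer = math.atan2(vwy, vwx)   # wheel heading aligned with its velocity
    speed = math.hypot(vwx, vwy)   # rolling speed along that heading
    return steer, speed

# Pure rotation about the body centre: a wheel at (0.3, -0.2) must roll at
# omega times its distance from the centre.
steer, speed = wheel_command(0.0, 0.0, 1.0, 0.3, -0.2)
```

The full model additionally propagates these velocities through the caster offset and camber angle, which is where the wheel Jacobian of the paper comes in.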
- Published
- 2007
- Full Text
- View/download PDF
34. Maneuver Control of Mobile Robot Based on Equivalent Instantaneous Center of Rotation in Rough Terrain
- Author
-
Kai Xue, Fangliang Peng, Shuanghe Yu, Xiao-Zhi Gao, He Xu, Qing Chang, Qiuyun Ouyang, Wanyi Huang, and Zhimao Lu
- Subjects
Engineering ,business.industry ,Mobile robot ,Terrain ,Computer Science::Robotics ,Azimuth ,Inertial measurement unit ,Control theory ,Orientation (geometry) ,Robot ,business ,Instant centre of rotation ,Simulation ,Reference frame - Abstract
The maneuver control strategy of a mobile robot with four independently steered and driven wheels (4WS4WD) in rough terrain is rarely discussed. In this paper, a projection approach is adopted to obtain the equivalent maneuver radii and equivalent steering angles of a 4WS4WD mobile robot based on an arbitrary equivalent Instantaneous Center of Rotation (EICR), where the EICR is a vertical line on the plane of the reference frame. A generic maneuver algorithm for rough terrain with constraints is then investigated explicitly to limit the orientation range of the steering wheels, using the pitch, roll, azimuth, rocker angle, and relative EICR position of the robot fed back from an Inertial Measurement Unit (IMU), angle sensors, latitudinal sensors, and other sensors. A typical planar maneuver algorithm is derived from the generic one. Factors that may affect maneuvering errors are then analyzed. Finally, simulation and test results validate the feasibility of the proposed algorithms.
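In the simplest planar reading, the "typical maneuver algorithm on the plane" reduces to steering each wheel tangent to its circle about the instantaneous center of rotation and rolling it at the yaw rate times its radius. A hedged planar sketch, with the wheel positions and the ICR chosen arbitrarily:

```python
import math

def icr_commands(cx, cy, omega, wheels):
    """Planar sketch: for a positive yaw rate omega about the instantaneous
    center of rotation (cx, cy), each wheel steers tangent to its circle
    around the ICR and rolls at omega times its radius."""
    cmds = []
    for wx, wy in wheels:
        rx, ry = wx - cx, wy - cy
        # Tangent direction for counter-clockwise rotation: the radius
        # vector rotated by +90 degrees.
        steer = math.atan2(rx, -ry)
        cmds.append((steer, omega * math.hypot(rx, ry)))
    return cmds

# Four wheels at the corners of a 0.6 x 0.4 chassis, turning about an ICR
# one metre to the left of the body centre.
wheels = [(0.3, 0.2), (0.3, -0.2), (-0.3, 0.2), (-0.3, -0.2)]
cmds = icr_commands(0.0, 1.0, 0.5, wheels)
```

Wheels nearer the ICR roll more slowly, which is exactly the differential speed profile the 4WS4WD controller must impose; the paper's rough-terrain version additionally projects these quantities through the measured pitch and roll.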
- Published
- 2007
- Full Text
- View/download PDF
35. Research on Association Rules Mining with Temporal Restraint
- Author
-
Hui Ning, Haifeng Yuan, Jianghong Guo, and Zhimao Lu
- Subjects
Association rule learning ,Computer science ,Interval (graph theory) ,Algorithm design ,Data mining ,computer.software_genre ,computer ,Temporal database - Abstract
First, an interval extending and merging technique is introduced; then, an improved method for mining association rules with temporal restraints is proposed. The association analysis, using the interval extending and merging technique, is performed on data with temporal restraints. Finally, the corresponding algorithm is presented.
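The interval extending-and-merging step can be sketched as a sort-and-merge with a gap tolerance; the gap parameter and the data below are illustrative assumptions, not the paper's algorithm.

```python
def merge_intervals(intervals, gap=0):
    """Sketch of interval extending and merging: sort time intervals and
    merge any pair whose gap is at most `gap` time units."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start - merged[-1][1] <= gap:
            merged[-1][1] = max(merged[-1][1], end)   # extend previous interval
        else:
            merged.append([start, end])               # start a new interval
    return [tuple(iv) for iv in merged]

# Transactions time-stamped in three bursts; a gap tolerance of 2 merges
# the first two bursts into one temporal window for rule mining.
windows = merge_intervals([(1, 4), (5, 8), (20, 25)], gap=2)
```

Association rules are then mined within each merged temporal window rather than over the whole transaction history.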
- Published
- 2007
- Full Text
- View/download PDF
36. An equivalent pseudoword solution to Chinese word sense disambiguation
- Author
-
Jianmin Yao, Ting Liu, Sheng Li, Zhimao Lu, and Haifeng Wang
- Subjects
Word-sense disambiguation ,Computer science ,business.industry ,Speech recognition ,Value (computer science) ,computer.software_genre ,Pseudoword ,Naive Bayes classifier ,Test set ,Chinese language ,Artificial intelligence ,Chinese word ,business ,computer ,Natural language processing - Abstract
This paper presents a new approach based on Equivalent Pseudowords (EPs) to tackle Word Sense Disambiguation (WSD) in the Chinese language. EPs are particular artificial ambiguous words that can be used to realize unsupervised WSD. A Bayesian classifier is implemented to test the efficacy of the EP solution on the Senseval-3 Chinese test set. The performance is better than state-of-the-art results, with an average F-measure of 0.80. The experiment verifies the value of EPs for unsupervised WSD.
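Pseudoword construction, the general technique that EPs build on, can be sketched in a few lines: occurrences of two unambiguous words are conflated into one artificial ambiguous token, and the replaced word serves as a free sense label for training. The corpus, word pair, and token name below are illustrative:

```python
def make_pseudoword(sentences, w1, w2, token="W1_W2"):
    """Sketch of pseudoword construction: replace occurrences of two
    unambiguous words with one artificial ambiguous token; the original
    word becomes the (free) sense label for each training example."""
    examples = []
    for sent in sentences:
        for i, w in enumerate(sent):
            if w in (w1, w2):
                context = sent[:i] + [token] + sent[i + 1:]
                examples.append((context, w))  # (ambiguous context, true sense)
    return examples

corpus = [
    "the banana was ripe".split(),
    "the door was open".split(),
]
data = make_pseudoword(corpus, "banana", "door")
```

A classifier trained to recover the hidden label from the context can then be evaluated without any hand-annotated sense data, which is what makes pseudoword-style approaches attractive for unsupervised WSD.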
- Published
- 2006
- Full Text
- View/download PDF