15 results on '"Xu, Shuo"'
Search Results
2. Performance evaluation of seven multi-label classification methods on real-world patent and publication datasets
- Author
Xu Shuo, Zhang Yuefu, An Xin, and Pi Sainan
- Subjects
multi-label classification; real-world datasets; hierarchical structure; classification system; label correlation; machine learning; Information technology; T58.5-58.64; Electronic computers. Computer science; QA75.5-76.95
- Abstract
Many science, technology and innovation (STI) resources are attached with several different labels. To assign automatically the resulting labels to an interested instance, many approaches with good performance on the benchmark datasets have been proposed for multilabel classification task in the literature. Furthermore, several open-source tools implementing these approaches have also been developed. However, the characteristics of real-world multilabel patent and publication datasets are not completely in line with those of benchmark ones. Therefore, the main purpose of this paper is to evaluate comprehensively seven multi-label classification methods on real-world datasets.
- Published
- 2024
3. A Deep Learning Based Anomaly Detection Model for IoT Networks
- Author
Dai, Li E., Wang, Xiao, Xu, Shuo Bo, Angrisani, Leopoldo, Series Editor, Arteaga, Marco, Series Editor, Chakraborty, Samarjit, Series Editor, Chen, Jiming, Series Editor, Chen, Shanben, Series Editor, Chen, Tan Kay, Series Editor, Dillmann, Rüdiger, Series Editor, Duan, Haibin, Series Editor, Ferrari, Gianluigi, Series Editor, Ferre, Manuel, Series Editor, Jabbari, Faryar, Series Editor, Jia, Limin, Series Editor, Kacprzyk, Janusz, Series Editor, Khamis, Alaa, Series Editor, Kroeger, Torsten, Series Editor, Li, Yong, Series Editor, Liang, Qilian, Series Editor, Martín, Ferran, Series Editor, Ming, Tan Cher, Series Editor, Minker, Wolfgang, Series Editor, Misra, Pradeep, Series Editor, Mukhopadhyay, Subhas, Series Editor, Ning, Cun-Zheng, Series Editor, Nishida, Toyoaki, Series Editor, Oneto, Luca, Series Editor, Panigrahi, Bijaya Ketan, Series Editor, Pascucci, Federica, Series Editor, Qin, Yong, Series Editor, Seng, Gan Woon, Series Editor, Speidel, Joachim, Series Editor, Veiga, Germano, Series Editor, Wu, Haitao, Series Editor, Zamboni, Walter, Series Editor, Tan, Kay Chen, Series Editor, Dong, Jian, editor, Zhang, Long, editor, and Cheng, Deqiang, editor
- Published
- 2024
4. Estimating Spatiotemporal Fishing Effort of Trawlers with Vessel-Monitoring System Data: A Case Study of the Sea Area of the Bohai Sea and the Yellow Sea, China.
- Author
Li, Dan, Lu, Feng, Xu, Shuo, Liu, Huiyuan, Xue, Muhan, Cui, Guohui, Ma, Zhenhua, Fang, Hui, and Wang, Yu
- Subjects
MACHINE learning; FEATURE extraction; FISHERY resources; FISHING; FISHERIES; BOOSTING algorithms
- Abstract
Measuring the distribution of the fishing effort of trawlers is of great significance for describing marine fishery activities, quantifying fishing systems in terms of marine ecological pressure, and revising the regulations of fishing. The purpose of this paper is to develop an efficient learning algorithm to detect the fishing behavior of trawlers to analyze the distribution of fishing effort. The vessel-monitoring system data of more than 4600 trawlers from September 2019 to April 2023 were used for feature extraction. According to the spatiotemporal information provided by the vessel position data, 11-dimensional features were extracted to form the feature vectors. A Slime Mould Algorithm-optimized Light Gradient-Boosting Machine (SMA-LightGBM) algorithm was proposed to classify the feature vectors to recognize fishing behavior. The presented method showed a remarkable generalization ability and high accuracy, sensitivity, specificity, and Matthews correlation coefficient in the test results, with scores of 98.23%, 98.75%, 97.75%, and 0.9646, respectively. Subsequently, the trained model was used to identify the fishing behavior of trawlers belonging to the coastal provinces of the Bohai Sea and the Yellow Sea in the sea area of 117°E–132°E, 26°N–41°N. The fishing effort was calculated and evaluated according to the fishing behavior recognition results. The mean absolute error was 0.3031 kW·h, and the coefficient of determination score was 0.9772. The thermal map of the fishing effort of the trawler was mapped, and the spatiotemporal characteristics were estimated in the region of interest from 2019 to 2023 with a spatial resolution of 1/8 degree × 1/8 degree. This method is an efficient way of analyzing the spatiotemporal characteristics of the fishing effort of trawlers. It provides a quantitative basis for the assessment of fishery resources and can inform fishing policies. [ABSTRACT FROM AUTHOR]
- Published
- 2024
5. K-Base: Platform to Build the Knowledge Base for an Intelligent Service
- Author
Shin, Sungho, Um, Jung-Ho, Choi, Sung-Pil, Jung, Hanmin, Xu, Shuo, Zhu, Lijun, Park, James J. (Jong Hyuk), editor, Adeli, Hojjat, editor, Park, Namje, editor, and Woungang, Isaac, editor
- Published
- 2014
6. Important citations identification by exploiting generative model into discriminative model.
- Author
An, Xin, Sun, Xin, Xu, Shuo, Hao, Liyuan, and Li, Jinghong
- Subjects
SCIENTIFIC knowledge; MACHINE learning; CONVOLUTIONAL neural networks; DEEP learning; SUPPORT vector machines; SUCCESS; CITATION indexes
- Abstract
Although the citations between scientific documents are deemed as a vehicle for dissemination, inheritance and development of scientific knowledge, not all citations are well-positioned to be equal. A plethora of taxonomies and machine-learning models have been implemented to tackle the task of citation function and importance classification from qualitative aspect. Inspired by the success of kernel functions from resulting general models to promote the performance of the support vector machine (SVM) model, this work exploits the potential of combining generative and discriminative models for the task of citation importance classification. In more detail, generative features are generated from a topic model, citation influence model (CIM) and then fed to two discriminative traditional machine-learning models, SVM and RF (random forest), and a deep learning model, convolutional neural network (CNN), with other 13 traditional features to identify important citations. The extensive experiments are performed on two data sets with different characteristics. These three models perform better on the data set from one discipline. It is very possible that the patterns for important citations may vary by the fields, which disable machine-learning models to learn effectively the discriminative patterns from publications from multiple domains. The RF classifier outperforms the SVM classifier, which accords with many prior studies. However, the CNN model does not achieve the desired performance due to small-scaled data set. Furthermore, our CIM model–based features improve further the performance for identifying important citations. [ABSTRACT FROM AUTHOR]
- Published
- 2023
7. The CHEMDNER corpus of chemicals and drugs and its annotation principles
- Author
Krallinger, Martin, Rabal, Obdulia, Leitner, Florian, Vazquez, Miguel, Salgado, David, Lu, Zhiyong, Leaman, Robert, Lu, Yanan, Ji, Donghong, Lowe, Daniel M, Sayle, Roger A, Batista-Navarro, Riza Theresa, Rak, Rafal, Huber, Torsten, Rocktäschel, Tim, Matos, Sérgio, Campos, David, Tang, Buzhou, Xu, Hua, Munkhdalai, Tsendsuren, Ryu, Keun Ho, Ramanan, SV, Nathan, Senthil, Žitnik, Slavko, Bajec, Marko, Weber, Lutz, Irmer, Matthias, Akhondi, Saber A, Kors, Jan A, Xu, Shuo, An, Xin, Sikdar, Utpal Kumar, Ekbal, Asif, Yoshioka, Masaharu, Dieb, Thaer M, Choi, Miji, Verspoor, Karin, Khabsa, Madian, Giles, C Lee, Liu, Hongfang, Ravikumar, Komandur Elayavilli, Lamurias, Andre, Couto, Francisco M, Dai, Hong-Jie, Tsai, Richard Tzong-Han, Ata, Caglar, Can, Tolga, Usié, Anabel, Alves, Rui, Segura-Bedmar, Isabel, Martínez, Paloma, Oyarzabal, Julen, and Valencia, Alfonso
- Published
- 2015
8. Multisource domain factorization network for cross-domain fault diagnosis of rotating machinery: An unsupervised multisource domain adaptation method
- Author
Ding Xue, Shun Zhang, Xu Shuo, Shi Yaowei, Jing Li, and Aidong Deng
- Subjects
Generalization; Computer science; business.industry; Mechanical Engineering; Aerospace Engineering; Negative transfer; Machine learning; computer.software_genre; Computer Science Applications; Domain (software engineering); Control and Systems Engineering; Signal Processing; Feature (machine learning); Artificial intelligence; Entropy (energy dispersal); Representation (mathematics); Transfer of learning; business; Focus (optics); computer; Civil and Structural Engineering
- Abstract
Unsupervised domain adaptation (DA) provides a promising approach for tackling fault diagnosis tasks of target datasets without labeled data and has been actively studied in recent years. Most of them focus only on single-source DA, compared to multisource DA (MDA), which has remarkable advantages in generalized knowledge learning and generalization performance. Nevertheless, there are very few fault diagnosis studies based on MDA, and it remains challenging to reduce multiple domain shifts to improve diagnostic performance and mitigate negative transfer during learning. To this end, a novel unsupervised MDA-based transfer learning approach called multisource domain factorization network (MDFN) is proposed in this paper, where the generalized diagnosis knowledge is learned from multiple sources and then used for diagnosing the target task. The highlights of MDFN are that the shared-space component analysis and transferability-based entropy penalty strategy are employed to significantly mitigate negative transfer from the two levels of feature representation and instance transferability and effectively learn shared feature representation. Therefore, the MDFN can extract shared features that combine domain-invariance and discriminability, thereby performing better. The results of two experimental cases on six datasets, including cross-operating-condition and cross-component diagnosis tasks, validate the effectiveness and superiority of the proposed method.
- Published
- 2022
9. Learning Efficient Hash Codes for Fast Graph-Based Data Similarity Retrieval.
- Author
Wang, Jinbao, Xu, Shuo, Zheng, Feng, Lu, Ke, Song, Jingkuan, and Shao, Ling
- Subjects
INFORMATION retrieval; REPRESENTATIONS of graphs; MACHINE learning; GRAPH algorithms; COMPUTER vision; VISUAL fields
- Abstract
Traditional operations, e.g. graph edit distance (GED), are no longer suitable for processing the massive quantities of graph-structured data now available, due to their irregular structures and high computational complexities. With the advent of graph neural networks (GNNs), the problems of graph representation and graph similarity search have drawn particular attention in the field of computer vision. However, GNNs have been less studied for efficient and fast retrieval after graph representation. To represent graph-based data, and maintain fast retrieval while doing so, we introduce an efficient hash model with graph neural networks (HGNN) for a newly designed task (i.e. fast graph-based data retrieval). Due to its flexibility, HGNN can be implemented in both an unsupervised and supervised manner. Specifically, by adopting a graph neural network and hash learning algorithms, HGNN can effectively learn a similarity-preserving graph representation and compute pair-wise similarity or provide classification via low-dimensional compact hash codes. To the best of our knowledge, our model is the first to address graph hashing representation in the Hamming space. Our experimental results reach comparable prediction accuracy to full-precision methods and can even outperform traditional models in some cases. In real-world applications, using hash codes can greatly benefit systems with smaller memory capacities and accelerate the retrieval speed of graph-structured data. Hence, we believe the proposed HGNN has great potential in further research. [ABSTRACT FROM AUTHOR]
- Published
- 2021
10. Simple Interrogative Sentence Analysis Based on CRF
- Author
Wang Zheng, Yan Yingying, Xu Shuo, Zhang Ning, Zhu Li-Jun, and Li Weifeng
- Subjects
Conditional random field; Sequence; business.industry; Process (engineering); Computer science; 020207 software engineering; 02 engineering and technology; Interrogative; Machine learning; computer.software_genre; Knowledge-based systems; Knowledge extraction; 0202 electrical engineering, electronic engineering, information engineering; Question answering; 020201 artificial intelligence & image processing; Artificial intelligence; business; computer; Natural language processing; Simple (philosophy)
- Abstract
This paper intends to enhance simple interrogative sentence analysis, which helps a question answering system understand "What is this question asking?" [Methods] When simple interrogative sentence analysis is treated as a sequence labelling problem, a Conditional Random Field (CRF) model can process it well. [Results] A small number of manual labels can lead to improved results. [Limitations] Processing non-factual questions requires more than the defined label system supports. [Conclusions] Using a Conditional Random Field model for question analysis, treated as a sequence labelling problem, can improve handling capacity at relatively little cost.
- Published
- 2016
11. Which type of dynamic indicators should be preferred to predict patent commercial potential?
- Author
Yang, Guancan, Lu, Guoxuan, Xu, Shuo, Chen, Liang, and Wen, Yuxin
- Subjects
PATENTS; MACHINE learning; ARTIFICIAL intelligence; DIGITAL technology; TECHNOLOGICAL forecasting; COVID-19 pandemic
- Abstract
The current patent value evaluations increasingly focus on serving realistic predictive scenarios, emphasizing the commercial potential of patents at the early stage from the ex-ante perspective. This requirement poses a serious challenge: those classical dynamic indicators that have been proved to be effective in the literature may not be valid for commercial patent potential prediction from the ex-ante perspective. Thereupon, this study groups the dynamic indicators into cross-sectional indicators and longitudinal indicators. Then, a patent commercial potential prediction framework is proposed from the ex-ante perspective, in which the impact of the chronological order on predictive models is investigated comprehensively. More specifically, this study collects the USPTO cancer-related dataset from 2003 to 2013 as the training set, and combines three dynamic indicators (cross-sectional, longitudinal, and mixed) with classical static indicators to test the prediction performance for the following five years (2014–2018). The biased results caused by the ex-post perspective are indeed observed, and the longitudinal indicators are more sensitive to commercial patent potential, especially in the early stage. The effect of the ex-post perspective will gradually weaken over time, and the cross-sectional indicators provide stable prediction performance three years later. These findings will be helpful for subsequent improvements of commercial patent potential prediction models.
• To exploit impact of various indicators on predicting patent commercial potential with a supervised machine learning method.
• A framework embedding out-of-time & over sampling methods is raised to predict patent commercial potential from ex-ante view.
• Dynamic indicators encode essential information on the commercial potential of a focal patent.
• Longitudinal indicators are more sensitive to the commercial patent potential, especially at the early stage.
[ABSTRACT FROM AUTHOR]
- Published
- 2023
12. Patent representation learning with a novel design of patent ontology: Case study on PEM patents.
- Author
Zhai, Dongsheng, Zhai, Liang, Li, Mengyang, He, Xijun, Xu, Shuo, and Wang, Feifei
- Subjects
PATENTS; TECHNOLOGICAL innovations; MINERAL industries; MACHINE learning; DATA analysis
- Abstract
Under the background of innovation-driven knowledge economy globalization, comprehensive and insightful patent technology information mining can help enterprises win the first-mover advantage in the increasingly fierce technology competition. However, existing machine learning-based methods do not entirely incorporate the characteristics of patent technology of technology composition and technology association at the micro-level and macro-level, making it difficult to mine detailed and comprehensive patent information. To fill this research gap, firstly, we conduct a comprehensive analysis from the micro-level technology composition perspective of patent documents and the macro-level technology association perspective of patent data involved in the technology field, and then we design a novel patent ontology that includes the entity of patent, function, solution and application field. Secondly, we create a patent heterogeneous network with the help of the proposed patent ontology and the technology association. Finally, to fully use the patent technology characteristics, we develop a heterogeneous graph embedding algorithm to embed this information into the patent representation, and the experiments done on non-perfluorinated proton exchange membrane patent data show that our method produces better patent representation than the comparable models. Furthermore, we utilize the patent representation to perform case study to confirm the method's reliability and practicability.
• Propose an overall methodology to learn the patent representation.
• Design a patent ontology that considers the patent technology's characteristics.
• Combine patent ontology and heterogeneous network embedding algorithm.
• Conduct various experiments to validate the merits of our method.
• Generate patent representation to serve multiple patent analysis tasks.
[ABSTRACT FROM AUTHOR]
- Published
- 2022
13. Prediction of core cancer genes using multi-task classification framework
- Author
Gao, Shan, Xu, Shuo, Fang, Yaping, and Fang, Jianwen
- Subjects
CANCER genes; CARCINOGENESIS; GENE expression; MACHINE learning; SYSTEMS biology; PREDICTION theory; COMPUTER multitasking
- Abstract
Cancer is deemed as a highly heterogeneous disease specific to cell type and tissue origin. All cancers, however, share a common pathogenesis. Therefore, it is widely believed that cancers may share common mechanisms. In this study, we introduce a novel strategy based on multi-tasking learning methods to predict core cancer genes shared by multiple cancers in the hope of elucidating common cancer mechanisms. Our strategy uses two multi-tasking learning algorithms, one for feature selection and the other for validation of selected features. The combined use of two methods results in more robust classifiers and reliable selected features. The top 73 significant features, mapped to 72 genes, are selected as core cancer genes. The effectiveness of the 73 features is further demonstrated in a blind test conducted on an independent test data. The biological significance of these genes is evaluated using systems biology analyses. Extensive functional, pathway and network analysis confirms findings in previous studies and brings new insights into common cancer mechanisms. Our strategy can be used as a general method to find important genes from large gene expression datasets on the genomic level. The selected genes can be used to predict cancers. [Copyright Elsevier]
- Published
- 2013
14. Emerging research topics detection with multiple machine learning models.
- Author
Xu, Shuo, Hao, Liyuan, An, Xin, Yang, Guancan, and Wang, Feifei
- Subjects
MACHINE learning; GIBBS sampling; GENOME editing
- Abstract
• Several machine learning models are together used to detect and foresight the emerging research topics.
• The following indicators are operationalized: radical novelty, relatively fast growth, coherence and scientific impact.
• As for the CIM model, the collapsed Gibbs sampling is done separately for the cited and citing publication parts.
• Experimental results on gene editing dataset show that it is feasible to identify emerging research topics with our framework.
Emerging research topic detection can benefit the research foundations and policy-makers. With the long-term and recent interest in detecting emerging research topics, various approaches are proposed in the literature. Though, there is still a lack of well-established linkages between the clear conceptual definition of emerging research topics and the proposed indicators for operationalization. This work follows the definition by Wang (2018), and several machine learning models are together used to detect and foresight the emerging research topics. Finally, experimental results on gene editing dataset discover three emerging research topics, which make clear that it is feasible to identify emerging research topics with our framework. [ABSTRACT FROM AUTHOR]
- Published
- 2019
15. Multisource domain factorization network for cross-domain fault diagnosis of rotating machinery: An unsupervised multisource domain adaptation method.
- Author
Shi, Yaowei, Deng, Aidong, Ding, Xue, Zhang, Shun, Xu, Shuo, and Li, Jing
- Subjects
FAULT diagnosis; ROTATING machinery; FACTORIZATION; MACHINE learning; DIAGNOSIS; ENTROPY (Information theory)
- Abstract
• A novel MDFN is proposed for cross-domain fault diagnosis of rotating machinery.
• The domain factorization strategy is elaborated to learn domain-invariant features.
• The IET loss term is designed to avoid the interference of "bad samples".
• Significant mitigation of negative transfer.
Unsupervised domain adaptation (DA) provides a promising approach for tackling fault diagnosis tasks of target datasets without labeled data and has been actively studied in recent years. Most of them focus only on single-source DA, compared to multisource DA (MDA), which has remarkable advantages in generalized knowledge learning and generalization performance. Nevertheless, there are very few fault diagnosis studies based on MDA, and it remains challenging to reduce multiple domain shifts to improve diagnostic performance and mitigate negative transfer during learning. To this end, a novel unsupervised MDA-based transfer learning approach called multisource domain factorization network (MDFN) is proposed in this paper, where the generalized diagnosis knowledge is learned from multiple sources and then used for diagnosing the target task. The highlights of MDFN are that the shared-space component analysis and transferability-based entropy penalty strategy are employed to significantly mitigate negative transfer from the two levels of feature representation and instance transferability and effectively learn shared feature representation. Therefore, the MDFN can extract shared features that combine domain-invariance and discriminability, thereby performing better. The results of two experimental cases on six datasets, including cross-operating-condition and cross-component diagnosis tasks, validate the effectiveness and superiority of the proposed method. [ABSTRACT FROM AUTHOR]
- Published
- 2022
Discovery Service for Jio Institute Digital Library