21 results for "Huang, Heyan"
Search Results
2. TCM-SD: A Benchmark for Probing Syndrome Differentiation via Natural Language Processing
- Author
-
Ren, Mucheng, Huang, Heyan, Zhou, Yuxiang, Cao, Qianwen, Bu, Yuan, Gao, Yang, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Sun, Maosong, editor, Liu, Yang, editor, Che, Wanxiang, editor, Feng, Yang, editor, Qiu, Xipeng, editor, Rao, Gaoqi, editor, and Chen, Yubo, editor
- Published
- 2022
- Full Text
- View/download PDF
3. Case-Sensitive Neural Machine Translation
- Author
-
Shi, Xuewen, Huang, Heyan, Jian, Ping, Tang, Yi-Kun, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Lauw, Hady W., editor, Wong, Raymond Chi-Wing, editor, Ntoulas, Alexandros, editor, Lim, Ee-Peng, editor, Ng, See-Kiong, editor, and Pan, Sinno Jialin, editor
- Published
- 2020
- Full Text
- View/download PDF
4. Multi-granularity semantic representation model for relation extraction
- Author
-
Lei, Ming, Huang, Heyan, and Feng, Chong
- Published
- 2021
- Full Text
- View/download PDF
5. An input information enhanced model for relation extraction
- Author
-
Lei, Ming, Huang, Heyan, Feng, Chong, Gao, Yang, and Su, Chao
- Published
- 2019
- Full Text
- View/download PDF
6. Neural Chinese Word Segmentation as Sequence to Sequence Translation
- Author
-
Shi, Xuewen, Huang, Heyan, Jian, Ping, Guo, Yuhang, Wei, Xiaochi, Tang, Yi-Kun, Barbosa, Simone Diniz Junqueira, Series editor, Chen, Phoebe, Series editor, Filipe, Joaquim, Series editor, Kotenko, Igor, Series editor, Sivalingam, Krishna M., Series editor, Washio, Takashi, Series editor, Yuan, Junsong, Series editor, Zhou, Lizhu, Series editor, Cheng, Xueqi, editor, Ma, Weiying, editor, Liu, Huan, editor, Shen, Huawei, editor, Feng, Shizheng, editor, and Xie, Xing, editor
- Published
- 2017
- Full Text
- View/download PDF
7. Extending Embedding Representation by Incorporating Latent Relations
- Author
-
Gao, Yang, Wang, Wenbo, Liu, Qian, Huang, Heyan, and Li, Yuefeng
- Subjects
Word embedding, text mining, natural language processing, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
The semantic representation of words is a fundamental task in natural language processing and text mining. Learning word embedding has shown its power on various tasks. Most studies are aimed at generating embedding representation of a word based on encoding its context information. However, many latent relations, such as co-occurring associative patterns and semantic conceptual relations, are not well considered. In this paper, we propose an extensible model to incorporate these kinds of valuable latent relations to increase the semantic relatedness of word pairs by learning word embeddings. To assess the effectiveness of our model, we conduct experiments on both information retrieval and text classification tasks. The results indicate the effectiveness of our model as well as its flexibility on different tasks.
- Published
- 2018
- Full Text
- View/download PDF
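The abstract above describes pulling word vectors with latent relations (co-occurring associative patterns, conceptual relations) closer together. As a rough illustration of that general idea, not the paper's actual model, here is a retrofitting-style update (after Faruqui et al.'s retrofitting) that nudges each word's pre-trained vector toward its relation neighbours; the `beta` weight and iteration count are arbitrary choices for the sketch:

```python
import numpy as np

def retrofit(vecs, relations, iters=10, beta=1.0):
    """Nudge pre-trained vectors toward their latent-relation neighbours.

    vecs:      {word: np.ndarray} pre-trained embeddings
    relations: {word: [neighbour words]} latent-relation graph
    """
    new = {w: v.copy() for w, v in vecs.items()}
    for _ in range(iters):
        for w, nbrs in relations.items():
            nbrs = [n for n in nbrs if n in new]
            if not nbrs:
                continue
            # Closed-form update: balance fidelity to the original vector
            # against agreement with relation neighbours.
            agg = sum(new[n] for n in nbrs)
            new[w] = (vecs[w] + beta * agg) / (1 + beta * len(nbrs))
    return new
```

After the update, related word pairs have higher cosine similarity than in the original space, which is the effect the abstract targets.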
8. Named Entity Recognition Based on Bilingual Co-training
- Author
-
Li, Yegang, Huang, Heyan, Zhao, Xingjian, Shi, Shumin, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Goebel, Randy, editor, Siekmann, Jörg, editor, Wahlster, Wolfgang, editor, Liu, Pengyuan, editor, and Su, Qi, editor
- Published
- 2013
- Full Text
- View/download PDF
9. Data Augmentation Under Scarce Condition for Neural Machine Translation
- Author
-
Huang, Heyan, Luo, Dan, Shi, Shumin, and Su, Rihai
- Subjects
Machine translation, Computer science, Process (engineering), Sample (statistics), Task (project management), Task analysis, Artificial intelligence, Natural language processing, BLEU
- Abstract
Neural Machine Translation (NMT) achieves state-of-the-art performance when copious parallel corpora are available. For low-resource NMT tasks, however, the scarcity of training data inevitably leads to poor translation performance. To relieve the dependence on large bilingual corpora and to cut down training time, we propose a novel data augmentation method for scarce-data conditions, named SMC, which Samples from the Monolingual Corpus only those sentences containing difficult words during the back-translation process, for Mongolian-Chinese (Mn-Ch) and English-Chinese (En-Ch) NMT. Inspired by work on curriculum learning, our approach accounts for the varying difficulty of samples and the corresponding model capabilities. Experimental results show that our method improves translation quality by up to 2.4 and 1.72 BLEU points over the baselines on the En-Ch and Mn-Ch datasets, respectively, while greatly reducing training time.
- Published
- 2019
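The SMC abstract above hinges on sampling monolingual sentences that contain "difficult" words for back-translation. The paper does not spell out its difficulty criterion here, so the sketch below assumes a simple one: words that are rare in (or absent from) the bilingual training side count as difficult, and sentences are ranked by their share of such words. The function name and parameters are illustrative, not from the paper:

```python
from collections import Counter

def select_for_backtranslation(mono_sents, bilingual_src_sents, k=1000, rare_quantile=0.2):
    """Pick monolingual sentences rich in 'difficult' (rare or unseen) words.

    Assumed difficulty proxy: membership in the lowest-frequency quantile of
    the bilingual vocabulary, or absence from it entirely.
    """
    freq = Counter(w for s in bilingual_src_sents for w in s.split())
    sorted_words = sorted(freq, key=freq.get)            # rarest first
    cutoff = max(1, int(len(sorted_words) * rare_quantile))
    difficult = set(sorted_words[:cutoff])

    def score(sent):
        toks = sent.split()
        hard = sum(t in difficult or t not in freq for t in toks)
        return hard / max(len(toks), 1)                  # share of hard tokens

    return sorted(mono_sents, key=score, reverse=True)[:k]
```

The selected sentences would then be back-translated to synthesize parallel pairs, which is where the training-time savings the abstract claims would come from (fewer, better-targeted synthetic sentences).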
10. Extending Embedding Representation by Incorporating Latent Relations
- Author
-
Liu, Qian, Gao, Yang, Huang, Heyan, Wang, Wenbo, and Li, Yuefeng
- Subjects
Vocabulary, Word embedding, General Computer Science, Computer science, Context (language use), Text mining, Semantics, Semantic similarity, General Materials Science, Natural language processing, Representation (mathematics), Context model, General Engineering, Embedding, Artificial intelligence, Word (computer architecture), lcsh:Electrical engineering. Electronics. Nuclear engineering, lcsh:TK1-9971
- Abstract
The semantic representation of words is a fundamental task in natural language processing and text mining. Learning word embedding has shown its power on various tasks. Most studies are aimed at generating embedding representation of a word based on encoding its context information. However, many latent relations, such as co-occurring associative patterns and semantic conceptual relations, are not well considered. In this paper, we propose an extensible model to incorporate these kinds of valuable latent relations to increase the semantic relatedness of word pairs by learning word embeddings. To assess the effectiveness of our model, we conduct experiments on both information retrieval and text classification tasks. The results indicate the effectiveness of our model as well as its flexibility on different tasks.
- Published
- 2018
11. Multi-Graph Cooperative Learning Towards Distant Supervised Relation Extraction.
- Author
-
Yuan, Changsen, Huang, Heyan, and Feng, Chong
- Subjects
- GROUP work in education, MULTIGRAPH, NATURAL language processing
- Abstract
The Graph Convolutional Network (GCN) is a widely applicable relation extraction method that predicts the relations of entity pairs by capturing sentences' syntactic features. However, existing GCN methods often use dependency parsing to generate graph matrices and learn syntactic features, and the quality of the dependency parsing directly affects the accuracy of the graph matrix and hence the whole GCN's performance. Because distant-supervised datasets contain noisy words and long sentences, dependency parsing of their sentences introduces errors and unreliable information, making it difficult to obtain credible graph matrices and relational features for some sentences. In this article, we present a Multi-Graph Cooperative Learning model (MGCL), which extracts reliable syntactic features of relations from different graphs and harnesses them to improve the representations of sentences. We conduct experiments on a widely used real-world dataset, and the results show that our model achieves state-of-the-art relation extraction performance. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
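The MGCL abstract above builds on graph convolution over a dependency-parse adjacency matrix. For readers unfamiliar with the building block, here is a minimal single GCN layer in the standard Kipf-Welling form, H' = ReLU(D^-1/2 (A+I) D^-1/2 H W); this is the generic layer the abstract's critique applies to, not the paper's multi-graph model itself:

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph-convolution layer over an adjacency matrix A.

    A: (n, n) adjacency, e.g. derived from a (possibly noisy) dependency parse
    H: (n, d_in) node features (word representations)
    W: (d_in, d_out) learnable weights
    """
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))    # symmetric normalization
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)  # ReLU
```

Since A comes from a parser, parse errors propagate directly into A and thus into every layer's output, which is exactly the failure mode the abstract motivates its multi-graph approach with.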
12. A Fuzzy Word Similarity Measure for Selecting Top-k Similar Words in Query Expansion.
- Author
-
Liu, Qian, Huang, Heyan, Xuan, Junyu, Zhang, Guangquan, Gao, Yang, and Lu, Jie
- Subjects
FUZZY measure theory, FUZZY sets, NATURAL language processing
- Abstract
Top-k word selection is a technique used to detect and return the k words most similar to a given word from a candidate set. It is a crucial and widely used tool in various tasks. The key issue in top-k word selection is how to measure the similarity between words. One popular and effective solution is a word embedding-based similarity measure, which represents words as low-dimensional vectors and measures their similarity by applying a metric to the vectors. However, most word embedding methods only consider the local proximity properties of two words in a corpus. To mitigate this issue, in this article we propose using association rules to measure word similarity at a global level, together with a fuzzy similarity measure for top-k word selection that jointly encodes the local and global similarities. Experiments on a real-world query task with three benchmark datasets, i.e., TREC-disk 4&5, WT10G, and RCV1, demonstrate the efficiency of the proposed method compared to several state-of-the-art baselines. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
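The abstract above combines a local, embedding-based similarity with a global, association-rule similarity. A minimal sketch of that combination, assuming cosine similarity for the local part and rule confidence P(w2 | w1) mined from document co-occurrence for the global part (the fuzzy aggregation in the paper is richer than the convex mix used here; `alpha` is an illustrative mixing weight):

```python
import numpy as np
from collections import Counter
from itertools import combinations

def topk_similar(word, vocab_vecs, docs, k=5, alpha=0.5):
    """Rank candidates by alpha * cosine (local) + (1-alpha) * confidence (global)."""
    pair_count, word_count = Counter(), Counter()
    for doc in docs:
        toks = set(doc.split())
        word_count.update(toks)
        pair_count.update(frozenset(p) for p in combinations(sorted(toks), 2))

    def confidence(w1, w2):   # association-rule confidence: P(w2 | w1)
        return pair_count[frozenset((w1, w2))] / word_count[w1] if word_count[w1] else 0.0

    def cosine(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    v = vocab_vecs[word]
    scores = {w: alpha * cosine(v, u) + (1 - alpha) * confidence(word, w)
              for w, u in vocab_vecs.items() if w != word}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

In a query-expansion setting, the returned top-k words would be appended to the query, which is the downstream task the paper evaluates on.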
13. Document-level relation extraction with Entity-Selection Attention.
- Author
-
Yuan, Changsen, Huang, Heyan, Feng, Chong, Shi, Ge, and Wei, Xiaochi
- Subjects
- NATURAL language processing, BASE pairs, SEMANTICS
- Abstract
Document-level relation extraction is a complex natural language processing task that predicts the relations of entity pairs by capturing critical semantic features about them from the document. However, current methods usually assume that the entity pairs contain the vast majority of the information representing relational facts, and thus focus on modeling the entity pair, ignoring features of the whole document and its sentences. In document-level relation extraction, the distance between entity pairs is relatively long, and judging the relation between entities usually requires reading many sentences or the whole document; sentences and documents are therefore particularly crucial. To make full use of multi-level information from sentences and documents, this paper proposes a document-level relation extraction framework with two advantages. First, we use an encoder to obtain semantic features of the document, and use inter-sentence attention based on entity pairs to dynamically capture the features of multiple vital sentences. Second, we design a document gating mechanism that combines sentence-level features with document-level features to predict relations. Extensive experiments on a benchmark dataset validate the effectiveness of the proposed method. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
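The abstract above mentions a gate that combines sentence-level with document-level features. The exact parameterization is not given in the abstract, so the sketch below assumes the common sigmoid-gate form g = sigmoid(W[s; d] + b), out = g * s + (1 - g) * d; the shapes of `W` and `b` are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def document_gate(sent_feat, doc_feat, W, b):
    """Mix sentence-level and document-level features with a learned gate.

    sent_feat, doc_feat: (d,) feature vectors
    W: (d, 2d) weights; b: (d,) bias  -- learned in practice, fixed here
    """
    g = sigmoid(W @ np.concatenate([sent_feat, doc_feat]) + b)
    return g * sent_feat + (1.0 - g) * doc_feat   # elementwise interpolation
```

When the gate saturates near 1 the sentence-level signal dominates, and near 0 the document-level signal dominates, which lets the model choose per dimension how much document context to inject.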
14. Hypergraph network model for nested entity mention recognition.
- Author
-
Huang, Heyan, Lei, Ming, and Feng, Chong
- Subjects
- OBJECT recognition (Computer vision), NATURAL language processing, NATURAL languages, SOURCE code
- Abstract
• We present a hypergraph network model for nested entity mention recognition.
• We recognize nested entities by tagging hyperedges instead of nodes.
• We propose a theorem that makes hyperedges easy to denote in the program.
• We solve the data imbalance problem and reduce the computing cost.
We propose a hypergraph network (HGN) model to recognize nested entity mentions in texts. The model can learn representations for the sequence structures of natural language and for the hypergraph structures of nested entity mentions. Mainstream methods recognize an entity mention by separately tagging the words or the gaps between words, which may complicate the problem and is not favorable for capturing the overall features of the mention. To solve these issues, the HGN model treats each entity mention as a whole and tags it with one label. We represent each sentence as a hypergraph, in which nodes represent words and hyperedges represent entity mentions; entity mention recognition (EMR) is thus transformed into a task of classifying the hyperedges. The HGN model first uses encoders to extract features and learn a hypergraph representation, then recognizes entity mentions by tagging every hyperedge. Experiments on three standard datasets demonstrate that our model outperforms previous models for nested EMR. We openly release the source code at https://github.com/nlplab-ie/HGN. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
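The key move in the HGN abstract above is tagging whole spans (hyperedges) rather than individual words, which is what lets nested mentions coexist: overlapping spans simply receive independent labels. A minimal sketch of the candidate-generation step, assuming hyperedges are contiguous token spans up to a length cap (the cap and function name are illustrative; the paper's hyperedge construction may differ):

```python
def enumerate_hyperedges(tokens, max_len=6):
    """Enumerate contiguous token spans (i, j), i inclusive, j exclusive,
    as candidate hyperedges; a classifier then tags each span with an
    entity type or 'O', so nested spans can overlap freely."""
    return [(i, j) for i in range(len(tokens))
                   for j in range(i + 1, min(i + max_len, len(tokens)) + 1)]
```

For "New York City", both (0, 3) ("New York City") and (0, 2) ("New York") are candidates, so a nested location inside an organization can be labeled without the conflict that per-token BIO tagging would create.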
15. Graph-based reasoning model for multiple relation extraction.
- Author
-
Huang, Heyan, Lei, Ming, and Feng, Chong
- Subjects
- KNOWLEDGE graphs, CLASSIFICATION, NATURAL language processing, VALUATION, APPROXIMATE reasoning
- Abstract
Linguistic knowledge is useful for various NLP tasks, but the difficulty lies in its representation and application. We consider that linguistic knowledge is implied in a large-scale corpus, while classification knowledge, the knowledge related to the definitions of entity and relation types, is implied in the labeled training data. Therefore, a corpus subgraph is proposed to mine more linguistic knowledge from easily accessible unlabeled data, and sentence subgraphs are used to acquire classification knowledge. In this paper, they jointly constitute a relation knowledge graph (RKG) used to extract relations from sentences. On the RKG, entity recognition can be regarded as a property-value filling problem and relation classification as a link prediction problem; multiple relation extraction can thus be treated as a reasoning process for knowledge completion. We combine statistical reasoning and neural network reasoning to segment sentences into entity chunks and non-entity chunks, then propose a novel Chunk Graph LSTM network to learn the representations of entity chunks and infer the relations among them. Experiments on two standard datasets demonstrate that our model outperforms previous models for multiple relation extraction. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
16. Concept Representation by Learning Explicit and Implicit Concept Couplings.
- Author
-
Lu, Wenpeng, Zhang, Yuteng, Wang, Shoujin, Huang, Heyan, Liu, Qian, and Luo, Sheng
- Subjects
CONCEPT learning, IMPLICIT learning, IMAGE representation, CONCEPTS, NATURAL language processing
- Abstract
Generating the precise semantic representation of a word or concept is a fundamental task in natural language processing. Recent studies which incorporate semantic knowledge into word embedding have shown their potential in improving the semantic representation of a concept. However, existing approaches only achieved limited performance improvement as they usually 1) model a word's semantics from some explicit aspects while ignoring the intrinsic aspects of the word, 2) treat semantic knowledge as a supplement of word embeddings, and 3) consider partial relations between concepts while ignoring rich coupling relations between them, such as explicit concept co-occurrences in descriptive texts in a corpus as well as concept hyperlink relations in a knowledge network, and implicit couplings between concept co-occurrences and hyperlinks. In human consciousness, a concept is always associated with various couplings that exist within/between descriptive texts and knowledge networks, which inspires us to capture as many concept couplings as possible for building a more informative concept representation. We thus propose a neural coupled concept representation (CoupledCR) framework and its instantiation: a coupled concept embedding (CCE) model. CCE first learns two types of explicit couplings that are based on concept co-occurrences and hyperlink relations, respectively, and then learns a type of high-level implicit couplings between these two types of explicit couplings for better concept representation. Extensive experimental results on six real-world datasets show that CCE significantly outperforms eight state-of-the-art word embeddings and semantic representation methods. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
17. Extracting Chinese multi-word terms from small corpus
- Author
-
Huang, Heyan, Zhou, Lang, Zhang, Liang, and Feng, Chong
- Subjects
Computer science, Terminology extraction, Knowledge engineering, Intelligent decision support system, Pattern recognition, Filter (signal processing), Terminology, Entropy (information theory), Artificial intelligence, Natural language processing
- Abstract
In this paper, we present an automatic terminology-extraction approach for Chinese multi-word terms. Besides five linguistic rules acquired from an available term list by machine learning methods, the extraction system involves two statistical strategies: a termhood measure based on term distribution variation, and a unithood measure that adopts the left-and-right-entropy method to estimate the degree of collocation variation. Candidates are ranked by the termhood measure, while the unithood measure filters out preposition phrases and some verb-object phrases that rarely appear as terms. In validation on a small-scale corpus in the computer domain, precision reaches 91.5% on the top 2000 outputs.
- Published
- 2008
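The unithood measure in the abstract above relies on left and right (branching) entropy: a candidate that appears in many distinct left and right contexts is likely a complete term, while a fragment of a longer phrase has low entropy on one side. A minimal sketch over a tokenized corpus, taking the minimum of the two entropies as the unithood score (one common convention; the paper's exact combination is not stated in the abstract):

```python
import math
from collections import Counter

def boundary_entropy(candidate, corpus_tokens):
    """min(left entropy, right entropy) of a multi-word candidate.

    candidate:     tuple of tokens, e.g. ("machine", "translation")
    corpus_tokens: flat list of corpus tokens
    """
    n = len(candidate)
    left, right = Counter(), Counter()
    for i in range(len(corpus_tokens) - n + 1):
        if corpus_tokens[i:i + n] == list(candidate):
            if i > 0:
                left[corpus_tokens[i - 1]] += 1       # token before the match
            if i + n < len(corpus_tokens):
                right[corpus_tokens[i + n]] += 1      # token after the match

    def entropy(c):
        total = sum(c.values())
        return -sum(v / total * math.log(v / total) for v in c.values()) if total else 0.0

    return min(entropy(left), entropy(right))
```

A low score flags candidates like a preposition-phrase fragment that is almost always followed by the same word, which is what the paper's unithood filter removes.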
18. I Know What You Want to Express: Sentence Element Inference by Incorporating External Knowledge Base.
- Author
-
Wei, Xiaochi, Huang, Heyan, Nie, Liqiang, Zhang, Hanwang, Mao, Xian-Ling, and Chua, Tat-Seng
- Subjects
- ELECTRONIC data processing, PREDICTIVE text entry software, SEMANTIC computing, NATURAL language processing, DATA mining
- Abstract
Sentence auto-completion is an important feature that saves users many keystrokes in typing the entire sentence by providing suggestions as they type. Despite its value, the existing sentence auto-completion methods, such as query completion models, can hardly be applied to solving the object completion problem in sentences with the form of (subject, verb, object), due to the complex natural language description and the data deficiency problem. Towards this goal, we treat an SVO sentence as a three-element triple (subject, sentence pattern, object), and cast the sentence object completion problem as an element inference problem. These elements in all triples are encoded into a unified low-dimensional embedding space by our proposed TRANSFER model, which leverages the external knowledge base to strengthen the representation learning performance. With such representations, we can provide reliable candidates for the desired missing element by a linear model. Extensive experiments on a real-world dataset have well-validated our model. Meanwhile, we have successfully applied our proposed model to factoid question answering systems for answer candidate selection, which further demonstrates the applicability of the TRANSFER model. [ABSTRACT FROM PUBLISHER]
- Published
- 2017
- Full Text
- View/download PDF
19. Research on the standardization processing of Chinese sentences in Mandarin-to-English speech translation system
- Author
-
Huang, Heyan, Zong, Chengqing, and Chen, Zhaoxiong
- Subjects
Machine translation, Computer science, Mandarin Chinese, Rule-based machine translation, Speech translation, Computer-assisted translation, Artificial intelligence, Language translation, Natural language processing, Sentence, Spoken language
- Abstract
Informal sentences are one of the most important factors affecting the translation precision of a machine translation (MT) system. In particular, in a speech translation system that translates Chinese spoken language into a foreign language, processing informal Chinese sentences becomes a key preprocessing step before translation. This paper summarizes the characteristics of Chinese spoken language and presents strategies for the Standardization Processing of Chinese Sentences (SPCS). The strategies combine automatic system processing with human-computer interactive checking of the results. The paper discusses in detail the related problems in Chinese sentence analysis.
- Published
- 2002
20. Three birds, one stone: A novel translation based framework for joint entity and relation extraction.
- Author
-
Huang, Heyan, Shang, Yu-Ming, Sun, Xin, Wei, Wei, and Mao, Xianling
- Subjects
- KNOWLEDGE graphs, GRAPH algorithms, NATURAL language processing
- Abstract
Joint entity and relation extraction is an important task in natural language processing and knowledge graph construction. Existing studies mainly focus on three issues: redundant predictions, overlapping triples, and relation connections. However, as far as we know, none of them solves all three problems simultaneously in a unified architecture. To address this, in this paper we propose a novel translation-based unified framework. Specifically, the proposed framework contains two components: an entity tagger, which recognizes all candidate head and tail entities, and a relation extractor, which predicts relations for every entity pair dynamically through ranking with a translation mechanism. To show the superiority of the proposed framework, we instantiate it with the simplest binary entity tagger and the TransE algorithm. Extensive experiments over two widely used datasets demonstrate that, even with the simplest components, the proposed framework still achieves competitive performance against most previous baselines. Moreover, the framework is flexible: it enjoys a further performance boost when employing a more powerful entity tagger and knowledge graph embedding algorithm. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
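The abstract above instantiates its relation extractor with TransE, whose core idea is that a relation r holds between head h and tail t when the embeddings satisfy h + r ≈ t. A minimal sketch of the ranking step, scoring relations by ascending ||h + r - t|| (the relation names and vectors below are illustrative, not from the paper's datasets):

```python
import numpy as np

def rank_relations(head_vec, tail_vec, rel_vecs):
    """TransE-style ranking: smaller ||h + r - t|| means a better-fitting relation.

    head_vec, tail_vec: entity embeddings
    rel_vecs: {relation name: relation embedding}
    """
    scores = {r: float(np.linalg.norm(head_vec + v - tail_vec))
              for r, v in rel_vecs.items()}
    return sorted(scores, key=scores.get)   # best-fitting relation first
```

In the paper's framework this ranking is what lets one entity pair receive several relations dynamically, rather than forcing a single classification decision per pair.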
21. Domain-specific meta-embedding with latent semantic structures.
- Author
-
Liu, Qian, Lu, Jie, Zhang, Guangquan, Shen, Tao, Zhang, Zhihan, and Huang, Heyan
- Subjects
- *
NATURAL language processing - Abstract
Meta-embedding aims at assembling pre-trained embeddings from various sources and producing more expressively powerful word representations. Many natural language processing (NLP) tasks in a specific domain benefit from meta-embedding, especially when the task suffers from low resources. This paper proposes an unsupervised meta-embedding method that jointly models background knowledge from the source embeddings and domain-specific knowledge from the task domain. Specifically, embeddings from multiple sources for a word are dynamically aggregated into a single meta-embedding by a differentiable attention module. The embeddings derived from pre-training on a large-scale corpus provide complete background knowledge of word usage. The meta-embedding is then further enriched by exploring domain-specific knowledge from each task domain in two ways. First, contextual information in the raw corpus is considered to capture the semantics of words. Second, a graph representing domain-specific semantic structures is extracted from the raw corpus to highlight the relationships between salient words; the graph is then modeled by a powerful graph convolution network to effectively capture rich semantic structures among words in the task domain. Experiments conducted on two tasks, i.e., text classification and relation extraction, show that our model outputs more accurate word meta-embeddings for the task domain compared to other state-of-the-art competitors. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
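The abstract above aggregates one word's embeddings from several sources into a single meta-embedding via a differentiable attention module. A minimal sketch of that aggregation step, assuming simple dot-product attention against a query vector with softmax weights (the paper's module is learned end to end and is more elaborate; `query` here is an illustrative stand-in for the learned attention parameters):

```python
import numpy as np

def meta_embed(source_vecs, query):
    """Softmax-attention aggregation of one word's embeddings from N sources.

    source_vecs: list of (dim,) vectors, one per embedding source
    query: (dim,) vector determining each source's relevance
    """
    V = np.stack(source_vecs)                 # (num_sources, dim)
    logits = V @ query                        # relevance score per source
    weights = np.exp(logits - logits.max())   # numerically stable softmax
    weights /= weights.sum()
    return weights @ V                        # weighted sum -> meta-embedding
```

Because the weights are a softmax over per-source scores, the whole aggregation stays differentiable, which is what lets the attention module be trained jointly with the rest of the model as the abstract describes.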
Discovery Service for Jio Institute Digital Library