21 results for "Huang, Heyan"
Search Results
2. TCM-SD: A Benchmark for Probing Syndrome Differentiation via Natural Language Processing
- Author
-
Ren, Mucheng, Huang, Heyan, Zhou, Yuxiang, Cao, Qianwen, Bu, Yuan, Gao, Yang, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Sun, Maosong, editor, Liu, Yang, editor, Che, Wanxiang, editor, Feng, Yang, editor, Qiu, Xipeng, editor, Rao, Gaoqi, editor, and Chen, Yubo, editor
- Published
- 2022
- Full Text
- View/download PDF
3. Case-Sensitive Neural Machine Translation
- Author
-
Shi, Xuewen, Huang, Heyan, Jian, Ping, Tang, Yi-Kun, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Lauw, Hady W., editor, Wong, Raymond Chi-Wing, editor, Ntoulas, Alexandros, editor, Lim, Ee-Peng, editor, Ng, See-Kiong, editor, and Pan, Sinno Jialin, editor
- Published
- 2020
- Full Text
- View/download PDF
4. Multi-granularity semantic representation model for relation extraction
- Author
-
Lei, Ming, Huang, Heyan, and Feng, Chong
- Published
- 2021
- Full Text
- View/download PDF
5. An input information enhanced model for relation extraction
- Author
-
Lei, Ming, Huang, Heyan, Feng, Chong, Gao, Yang, and Su, Chao
- Published
- 2019
- Full Text
- View/download PDF
6. Neural Chinese Word Segmentation as Sequence to Sequence Translation
- Author
-
Shi, Xuewen, Huang, Heyan, Jian, Ping, Guo, Yuhang, Wei, Xiaochi, Tang, Yi-Kun, Barbosa, Simone Diniz Junqueira, Series editor, Chen, Phoebe, Series editor, Filipe, Joaquim, Series editor, Kotenko, Igor, Series editor, Sivalingam, Krishna M., Series editor, Washio, Takashi, Series editor, Yuan, Junsong, Series editor, Zhou, Lizhu, Series editor, Cheng, Xueqi, editor, Ma, Weiying, editor, Liu, Huan, editor, Shen, Huawei, editor, Feng, Shizheng, editor, and Xie, Xing, editor
- Published
- 2017
- Full Text
- View/download PDF
7. Extending Embedding Representation by Incorporating Latent Relations
- Author
-
Gao, Yang, Wang, Wenbo, Liu, Qian, Huang, Heyan, and Li, Yuefeng
- Subjects
Word embedding, text mining, natural language processing, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
The semantic representation of words is a fundamental task in natural language processing and text mining. Learning word embedding has shown its power on various tasks. Most studies are aimed at generating embedding representation of a word based on encoding its context information. However, many latent relations, such as co-occurring associative patterns and semantic conceptual relations, are not well considered. In this paper, we propose an extensible model to incorporate these kinds of valuable latent relations to increase the semantic relatedness of word pairs by learning word embeddings. To assess the effectiveness of our model, we conduct experiments on both information retrieval and text classification tasks. The results indicate the effectiveness of our model as well as its flexibility on different tasks.
- Published
- 2018
- Full Text
- View/download PDF
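The abstract above describes pulling word vectors with latent relations (co-occurring associative patterns, conceptual relations) closer together. As a rough illustration of that general idea, not the paper's actual model, here is a retrofitting-style update (after Faruqui et al.'s retrofitting) that nudges each word's pre-trained vector toward its relation neighbours; the `beta` weight and iteration count are arbitrary choices for the sketch:

```python
import numpy as np

def retrofit(vecs, relations, iters=10, beta=1.0):
    """Nudge pre-trained vectors toward their latent-relation neighbours.

    vecs:      {word: np.ndarray} pre-trained embeddings
    relations: {word: [neighbour words]} latent-relation graph
    """
    new = {w: v.copy() for w, v in vecs.items()}
    for _ in range(iters):
        for w, nbrs in relations.items():
            nbrs = [n for n in nbrs if n in new]
            if not nbrs:
                continue
            # Closed-form update: balance fidelity to the original vector
            # against agreement with relation neighbours.
            agg = sum(new[n] for n in nbrs)
            new[w] = (vecs[w] + beta * agg) / (1 + beta * len(nbrs))
    return new
```

After the update, related word pairs have higher cosine similarity than in the original space, which is the effect the abstract targets.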
8. Named Entity Recognition Based on Bilingual Co-training
- Author
-
Li, Yegang, Huang, Heyan, Zhao, Xingjian, Shi, Shumin, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Goebel, Randy, editor, Siekmann, Jörg, editor, Wahlster, Wolfgang, editor, Liu, Pengyuan, editor, and Su, Qi, editor
- Published
- 2013
- Full Text
- View/download PDF
9. Data Augmentation Under Scarce Condition for Neural Machine Translation
- Author
-
Huang, Heyan, Luo, Dan, Shi, Shumin, and Su, Rihai
- Subjects
Machine translation, Computer science, Process (engineering), Sample (statistics), Task (project management), Task analysis, Artificial intelligence, Natural language processing, BLEU
- Abstract
Neural Machine Translation (NMT) achieves state-of-the-art performance when copious parallel corpora are available. For low-resource NMT tasks, however, the scarcity of training data inevitably leads to poor translation performance. To relieve the dependence on large bilingual corpora and to cut down training time, we propose a novel data augmentation method for scarce-data conditions, named SMC, which Samples from the Monolingual Corpus only those sentences containing difficult words during the back-translation process, for Mongolian-Chinese (Mn-Ch) and English-Chinese (En-Ch) NMT. Inspired by work on curriculum learning, our approach accounts for the varying difficulty of samples and the corresponding model capabilities. Experimental results show that our method improves translation quality by up to 2.4 and 1.72 BLEU points over the baselines on the En-Ch and Mn-Ch datasets, respectively, while greatly reducing training time.
- Published
- 2019
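The SMC abstract above hinges on sampling monolingual sentences that contain "difficult" words for back-translation. The paper does not spell out its difficulty criterion here, so the sketch below assumes a simple one: words that are rare in (or absent from) the bilingual training side count as difficult, and sentences are ranked by their share of such words. The function name and parameters are illustrative, not from the paper:

```python
from collections import Counter

def select_for_backtranslation(mono_sents, bilingual_src_sents, k=1000, rare_quantile=0.2):
    """Pick monolingual sentences rich in 'difficult' (rare or unseen) words.

    Assumed difficulty proxy: membership in the lowest-frequency quantile of
    the bilingual vocabulary, or absence from it entirely.
    """
    freq = Counter(w for s in bilingual_src_sents for w in s.split())
    sorted_words = sorted(freq, key=freq.get)            # rarest first
    cutoff = max(1, int(len(sorted_words) * rare_quantile))
    difficult = set(sorted_words[:cutoff])

    def score(sent):
        toks = sent.split()
        hard = sum(t in difficult or t not in freq for t in toks)
        return hard / max(len(toks), 1)                  # share of hard tokens

    return sorted(mono_sents, key=score, reverse=True)[:k]
```

The selected sentences would then be back-translated to synthesize parallel pairs, which is where the training-time savings the abstract claims would come from (fewer, better-targeted synthetic sentences).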
10. Extending Embedding Representation by Incorporating Latent Relations
- Author
-
Liu, Qian, Gao, Yang, Huang, Heyan, Wang, Wenbo, and Li, Yuefeng
- Subjects
Vocabulary, Word embedding, General Computer Science, Computer science, Context (language use), Text mining, Semantics, Semantic similarity, General Materials Science, Natural language processing, Representation (mathematics), Context model, General Engineering, Embedding, Artificial intelligence, Word (computer architecture), lcsh:Electrical engineering. Electronics. Nuclear engineering, lcsh:TK1-9971
- Abstract
The semantic representation of words is a fundamental task in natural language processing and text mining. Learning word embedding has shown its power on various tasks. Most studies are aimed at generating embedding representation of a word based on encoding its context information. However, many latent relations, such as co-occurring associative patterns and semantic conceptual relations, are not well considered. In this paper, we propose an extensible model to incorporate these kinds of valuable latent relations to increase the semantic relatedness of word pairs by learning word embeddings. To assess the effectiveness of our model, we conduct experiments on both information retrieval and text classification tasks. The results indicate the effectiveness of our model as well as its flexibility on different tasks.
- Published
- 2018
11. Multi-Graph Cooperative Learning Towards Distant Supervised Relation Extraction.
- Author
-
Yuan, Changsen, Huang, Heyan, and Feng, Chong
- Subjects
- GROUP work in education, MULTIGRAPH, NATURAL language processing
- Abstract
The Graph Convolutional Network (GCN) is a widely applicable relation extraction method that predicts the relations of entity pairs by capturing sentences' syntactic features. However, existing GCN methods often use dependency parsing to generate graph matrices and learn syntactic features, and the quality of the dependency parsing directly affects the accuracy of the graph matrix and hence the whole GCN's performance. Because distant-supervised datasets contain noisy words and long sentences, dependency parsing of their sentences introduces errors and unreliable information, making it difficult to obtain credible graph matrices and relational features for some sentences. In this article, we present a Multi-Graph Cooperative Learning model (MGCL), which extracts reliable syntactic features of relations from different graphs and harnesses them to improve the representations of sentences. We conduct experiments on a widely used real-world dataset, and the results show that our model achieves state-of-the-art relation extraction performance. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
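The MGCL abstract above builds on graph convolution over a dependency-parse adjacency matrix. For readers unfamiliar with the building block, here is a minimal single GCN layer in the standard Kipf-Welling form, H' = ReLU(D^-1/2 (A+I) D^-1/2 H W); this is the generic layer the abstract's critique applies to, not the paper's multi-graph model itself:

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph-convolution layer over an adjacency matrix A.

    A: (n, n) adjacency, e.g. derived from a (possibly noisy) dependency parse
    H: (n, d_in) node features (word representations)
    W: (d_in, d_out) learnable weights
    """
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))    # symmetric normalization
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)  # ReLU
```

Since A comes from a parser, parse errors propagate directly into A and thus into every layer's output, which is exactly the failure mode the abstract motivates its multi-graph approach with.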
12. A Fuzzy Word Similarity Measure for Selecting Top-k Similar Words in Query Expansion.
- Author
-
Liu, Qian, Huang, Heyan, Xuan, Junyu, Zhang, Guangquan, Gao, Yang, and Lu, Jie
- Subjects
FUZZY measure theory, FUZZY sets, NATURAL language processing
- Abstract
Top-k word selection is a technique used to detect and return the k words most similar to a given word from a candidate set. It is a crucial and widely used tool in various tasks. The key issue in top-k word selection is how to measure the similarity between words. One popular and effective solution is a word embedding-based similarity measure, which represents words as low-dimensional vectors and measures their similarity by applying a metric to the vectors. However, most word embedding methods only consider the local proximity properties of two words in a corpus. To mitigate this issue, in this article we propose using association rules to measure word similarity at a global level, together with a fuzzy similarity measure for top-k word selection that jointly encodes the local and global similarities. Experiments on a real-world query task with three benchmark datasets, i.e., TREC-disk 4&5, WT10G, and RCV1, demonstrate the efficiency of the proposed method compared to several state-of-the-art baselines. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
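The abstract above combines a local, embedding-based similarity with a global, association-rule similarity. A minimal sketch of that combination, assuming cosine similarity for the local part and rule confidence P(w2 | w1) mined from document co-occurrence for the global part (the fuzzy aggregation in the paper is richer than the convex mix used here; `alpha` is an illustrative mixing weight):

```python
import numpy as np
from collections import Counter
from itertools import combinations

def topk_similar(word, vocab_vecs, docs, k=5, alpha=0.5):
    """Rank candidates by alpha * cosine (local) + (1-alpha) * confidence (global)."""
    pair_count, word_count = Counter(), Counter()
    for doc in docs:
        toks = set(doc.split())
        word_count.update(toks)
        pair_count.update(frozenset(p) for p in combinations(sorted(toks), 2))

    def confidence(w1, w2):   # association-rule confidence: P(w2 | w1)
        return pair_count[frozenset((w1, w2))] / word_count[w1] if word_count[w1] else 0.0

    def cosine(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    v = vocab_vecs[word]
    scores = {w: alpha * cosine(v, u) + (1 - alpha) * confidence(word, w)
              for w, u in vocab_vecs.items() if w != word}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

In a query-expansion setting, the returned top-k words would be appended to the query, which is the downstream task the paper evaluates on.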
13. Document-level relation extraction with Entity-Selection Attention.
- Author
-
Yuan, Changsen, Huang, Heyan, Feng, Chong, Shi, Ge, and Wei, Xiaochi
- Subjects
- NATURAL language processing, BASE pairs, SEMANTICS
- Abstract
Document-level relation extraction is a complex natural language processing task that predicts the relations of entity pairs by capturing critical semantic features about them from the document. However, current methods usually assume that the entity pairs contain the vast majority of the information representing relational facts, and thus focus on modeling the entity pair, ignoring features of the whole document and its sentences. In document-level relation extraction, the distance between entity pairs is relatively long, and judging the relation between entities usually requires reading many sentences or the whole document; sentences and documents are therefore particularly crucial. To make full use of multi-level information from sentences and documents, this paper proposes a document-level relation extraction framework with two advantages. First, we use an encoder to obtain semantic features of the document, and use inter-sentence attention based on entity pairs to dynamically capture the features of multiple vital sentences. Second, we design a document gating mechanism that combines sentence-level features with document-level features to predict relations. Extensive experiments on a benchmark dataset validate the effectiveness of the proposed method. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
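The abstract above mentions a gate that combines sentence-level with document-level features. The exact parameterization is not given in the abstract, so the sketch below assumes the common sigmoid-gate form g = sigmoid(W[s; d] + b), out = g * s + (1 - g) * d; the shapes of `W` and `b` are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def document_gate(sent_feat, doc_feat, W, b):
    """Mix sentence-level and document-level features with a learned gate.

    sent_feat, doc_feat: (d,) feature vectors
    W: (d, 2d) weights; b: (d,) bias  -- learned in practice, fixed here
    """
    g = sigmoid(W @ np.concatenate([sent_feat, doc_feat]) + b)
    return g * sent_feat + (1.0 - g) * doc_feat   # elementwise interpolation
```

When the gate saturates near 1 the sentence-level signal dominates, and near 0 the document-level signal dominates, which lets the model choose per dimension how much document context to inject.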
14. Hypergraph network model for nested entity mention recognition.
- Author
-
Huang, Heyan, Lei, Ming, and Feng, Chong
- Subjects
- OBJECT recognition (Computer vision), NATURAL language processing, NATURAL languages, SOURCE code
- Abstract
• We present a hypergraph network model for nested entity mention recognition.
• We recognize nested entities by tagging hyperedges instead of nodes.
• We propose a theorem that makes hyperedges easy to denote in the program.
• We solve the data imbalance problem and reduce the computing cost.
We propose a hypergraph network (HGN) model to recognize nested entity mentions in texts. The model can learn representations for the sequence structures of natural language and for the hypergraph structures of nested entity mentions. Mainstream methods recognize an entity mention by separately tagging the words or the gaps between words, which may complicate the problem and is not favorable for capturing the overall features of the mention. To solve these issues, the HGN model treats each entity mention as a whole and tags it with one label. We represent each sentence as a hypergraph, in which nodes represent words and hyperedges represent entity mentions; entity mention recognition (EMR) is thus transformed into a task of classifying the hyperedges. The HGN model first uses encoders to extract features and learn a hypergraph representation, then recognizes entity mentions by tagging every hyperedge. Experiments on three standard datasets demonstrate that our model outperforms previous models for nested EMR. We openly release the source code at https://github.com/nlplab-ie/HGN. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
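The key move in the HGN abstract above is tagging whole spans (hyperedges) rather than individual words, which is what lets nested mentions coexist: overlapping spans simply receive independent labels. A minimal sketch of the candidate-generation step, assuming hyperedges are contiguous token spans up to a length cap (the cap and function name are illustrative; the paper's hyperedge construction may differ):

```python
def enumerate_hyperedges(tokens, max_len=6):
    """Enumerate contiguous token spans (i, j), i inclusive, j exclusive,
    as candidate hyperedges; a classifier then tags each span with an
    entity type or 'O', so nested spans can overlap freely."""
    return [(i, j) for i in range(len(tokens))
                   for j in range(i + 1, min(i + max_len, len(tokens)) + 1)]
```

For "New York City", both (0, 3) ("New York City") and (0, 2) ("New York") are candidates, so a nested location inside an organization can be labeled without the conflict that per-token BIO tagging would create.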
15. Graph-based reasoning model for multiple relation extraction.
- Author
-
Huang, Heyan, Lei, Ming, and Feng, Chong
- Subjects
- KNOWLEDGE graphs, CLASSIFICATION, NATURAL language processing, VALUATION, APPROXIMATE reasoning
- Abstract
Linguistic knowledge is useful for various NLP tasks, but the difficulty lies in its representation and application. We consider that linguistic knowledge is implied in a large-scale corpus, while classification knowledge, the knowledge related to the definitions of entity and relation types, is implied in the labeled training data. Therefore, a corpus subgraph is proposed to mine more linguistic knowledge from easily accessible unlabeled data, and sentence subgraphs are used to acquire classification knowledge. In this paper, they jointly constitute a relation knowledge graph (RKG) used to extract relations from sentences. On the RKG, entity recognition can be regarded as a property-value filling problem and relation classification as a link prediction problem; multiple relation extraction can thus be treated as a reasoning process for knowledge completion. We combine statistical reasoning and neural network reasoning to segment sentences into entity chunks and non-entity chunks, then propose a novel Chunk Graph LSTM network to learn the representations of entity chunks and infer the relations among them. Experiments on two standard datasets demonstrate that our model outperforms previous models for multiple relation extraction. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
16. Concept Representation by Learning Explicit and Implicit Concept Couplings.
- Author
-
Lu, Wenpeng, Zhang, Yuteng, Wang, Shoujin, Huang, Heyan, Liu, Qian, and Luo, Sheng
- Subjects
CONCEPT learning, IMPLICIT learning, IMAGE representation, CONCEPTS, NATURAL language processing
- Abstract
Generating the precise semantic representation of a word or concept is a fundamental task in natural language processing. Recent studies which incorporate semantic knowledge into word embedding have shown their potential in improving the semantic representation of a concept. However, existing approaches only achieved limited performance improvement as they usually 1) model a word's semantics from some explicit aspects while ignoring the intrinsic aspects of the word, 2) treat semantic knowledge as a supplement of word embeddings, and 3) consider partial relations between concepts while ignoring rich coupling relations between them, such as explicit concept co-occurrences in descriptive texts in a corpus as well as concept hyperlink relations in a knowledge network, and implicit couplings between concept co-occurrences and hyperlinks. In human consciousness, a concept is always associated with various couplings that exist within/between descriptive texts and knowledge networks, which inspires us to capture as many concept couplings as possible for building a more informative concept representation. We thus propose a neural coupled concept representation (CoupledCR) framework and its instantiation: a coupled concept embedding (CCE) model. CCE first learns two types of explicit couplings that are based on concept co-occurrences and hyperlink relations, respectively, and then learns a type of high-level implicit couplings between these two types of explicit couplings for better concept representation. Extensive experimental results on six real-world datasets show that CCE significantly outperforms eight state-of-the-art word embeddings and semantic representation methods. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
17. Extracting Chinese multi-word terms from small corpus
- Author
-
Huang, Heyan, Zhou, Lang, Zhang, Liang, and Feng, Chong
- Subjects
Computer science, Terminology extraction, Knowledge engineering, Intelligent decision support system, Pattern recognition, Filter (signal processing), Terminology, Entropy (information theory), Artificial intelligence, Natural language processing
- Abstract
In this paper, we present an automatic terminology-extraction approach for Chinese multi-word terms. Besides five linguistic rules acquired from an available term list by machine learning methods, the extraction system involves two statistical strategies: a termhood measure based on term distribution variation, and a unithood measure that adopts the left-and-right-entropy method to estimate the degree of collocation variation. Candidates are ranked by the termhood measure, while the unithood measure filters out preposition phrases and some verb-object phrases that rarely appear as terms. In validation on a small-scale corpus in the computer domain, precision reaches 91.5% on the top 2000 outputs.
- Published
- 2008
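The unithood measure in the abstract above relies on left and right (branching) entropy: a candidate that appears in many distinct left and right contexts is likely a complete term, while a fragment of a longer phrase has low entropy on one side. A minimal sketch over a tokenized corpus, taking the minimum of the two entropies as the unithood score (one common convention; the paper's exact combination is not stated in the abstract):

```python
import math
from collections import Counter

def boundary_entropy(candidate, corpus_tokens):
    """min(left entropy, right entropy) of a multi-word candidate.

    candidate:     tuple of tokens, e.g. ("machine", "translation")
    corpus_tokens: flat list of corpus tokens
    """
    n = len(candidate)
    left, right = Counter(), Counter()
    for i in range(len(corpus_tokens) - n + 1):
        if corpus_tokens[i:i + n] == list(candidate):
            if i > 0:
                left[corpus_tokens[i - 1]] += 1       # token before the match
            if i + n < len(corpus_tokens):
                right[corpus_tokens[i + n]] += 1      # token after the match

    def entropy(c):
        total = sum(c.values())
        return -sum(v / total * math.log(v / total) for v in c.values()) if total else 0.0

    return min(entropy(left), entropy(right))
```

A low score flags candidates like a preposition-phrase fragment that is almost always followed by the same word, which is what the paper's unithood filter removes.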
18. I Know What You Want to Express: Sentence Element Inference by Incorporating External Knowledge Base.
- Author
-
Wei, Xiaochi, Huang, Heyan, Nie, Liqiang, Zhang, Hanwang, Mao, Xian-Ling, and Chua, Tat-Seng
- Subjects
- ELECTRONIC data processing, PREDICTIVE text entry software, SEMANTIC computing, NATURAL language processing, DATA mining
- Abstract
Sentence auto-completion is an important feature that saves users many keystrokes in typing the entire sentence by providing suggestions as they type. Despite its value, the existing sentence auto-completion methods, such as query completion models, can hardly be applied to solving the object completion problem in sentences with the form of (subject, verb, object), due to the complex natural language description and the data deficiency problem. Towards this goal, we treat an SVO sentence as a three-element triple (subject, sentence pattern, object), and cast the sentence object completion problem as an element inference problem. These elements in all triples are encoded into a unified low-dimensional embedding space by our proposed TRANSFER model, which leverages the external knowledge base to strengthen the representation learning performance. With such representations, we can provide reliable candidates for the desired missing element by a linear model. Extensive experiments on a real-world dataset have well-validated our model. Meanwhile, we have successfully applied our proposed model to factoid question answering systems for answer candidate selection, which further demonstrates the applicability of the TRANSFER model. [ABSTRACT FROM PUBLISHER]
- Published
- 2017
- Full Text
- View/download PDF
19. Research on the standardization processing of Chinese sentences in Mandarin-to-English speech translation system
- Author
-
Huang, Heyan, Zong, Chengqing, and Chen, Zhaoxiong
- Subjects
Machine translation, Computer science, Mandarin Chinese, Rule-based machine translation, Speech translation, Computer-assisted translation, Artificial intelligence, Language translation, Natural language processing, Sentence, Spoken language
- Abstract
Informal sentences are one of the most important factors affecting the translation precision of a machine translation (MT) system. In particular, in a speech translation system that translates Chinese spoken language into a foreign language, processing informal Chinese sentences becomes a key preprocessing step before translation. This paper summarizes the characteristics of Chinese spoken language and presents strategies for the Standardization Processing of Chinese Sentences (SPCS). The strategies combine automatic system processing with human-computer interactive checking of the results. The paper discusses in detail the related problems in Chinese sentence analysis.
- Published
- 2002
20. Three birds, one stone: A novel translation based framework for joint entity and relation extraction.
- Author
-
Huang, Heyan, Shang, Yu-Ming, Sun, Xin, Wei, Wei, and Mao, Xianling
- Subjects
- KNOWLEDGE graphs, GRAPH algorithms, NATURAL language processing
- Abstract
Joint entity and relation extraction is an important task in natural language processing and knowledge graph construction. Existing studies mainly focus on three issues: redundant predictions, overlapping triples, and relation connections. However, as far as we know, none of them solves all three problems simultaneously in a unified architecture. To address this, in this paper we propose a novel translation-based unified framework. Specifically, the proposed framework contains two components: an entity tagger, which recognizes all candidate head and tail entities, and a relation extractor, which predicts relations for every entity pair dynamically through ranking with a translation mechanism. To show the superiority of the proposed framework, we instantiate it with the simplest binary entity tagger and the TransE algorithm. Extensive experiments over two widely used datasets demonstrate that, even with the simplest components, the proposed framework still achieves competitive performance against most previous baselines. Moreover, the framework is flexible: it enjoys a further performance boost when employing a more powerful entity tagger and knowledge graph embedding algorithm. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
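The abstract above instantiates its relation extractor with TransE, whose core idea is that a relation r holds between head h and tail t when the embeddings satisfy h + r ≈ t. A minimal sketch of the ranking step, scoring relations by ascending ||h + r - t|| (the relation names and vectors below are illustrative, not from the paper's datasets):

```python
import numpy as np

def rank_relations(head_vec, tail_vec, rel_vecs):
    """TransE-style ranking: smaller ||h + r - t|| means a better-fitting relation.

    head_vec, tail_vec: entity embeddings
    rel_vecs: {relation name: relation embedding}
    """
    scores = {r: float(np.linalg.norm(head_vec + v - tail_vec))
              for r, v in rel_vecs.items()}
    return sorted(scores, key=scores.get)   # best-fitting relation first
```

In the paper's framework this ranking is what lets one entity pair receive several relations dynamically, rather than forcing a single classification decision per pair.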
21. Domain-specific meta-embedding with latent semantic structures.
- Author
-
Liu, Qian, Lu, Jie, Zhang, Guangquan, Shen, Tao, Zhang, Zhihan, and Huang, Heyan
- Subjects
- *
NATURAL language processing - Abstract
Meta-embedding aims at assembling pre-trained embeddings from various sources and producing more expressively powerful word representations. Many natural language processing (NLP) tasks in a specific domain benefit from meta-embedding, especially when the task suffers from low resources. This paper proposes an unsupervised meta-embedding method that jointly models background knowledge from the source embeddings and domain-specific knowledge from the task domain. Specifically, embeddings from multiple sources for a word are dynamically aggregated into a single meta-embedding by a differentiable attention module. The embeddings derived from pre-training on a large-scale corpus provide complete background knowledge of word usage. The meta-embedding is then further enriched by exploring domain-specific knowledge from each task domain in two ways. First, contextual information in the raw corpus is considered to capture the semantics of words. Second, a graph representing domain-specific semantic structures is extracted from the raw corpus to highlight the relationships between salient words; the graph is then modeled by a powerful graph convolution network to effectively capture rich semantic structures among words in the task domain. Experiments conducted on two tasks, i.e., text classification and relation extraction, show that our model outputs more accurate word meta-embeddings for the task domain compared to other state-of-the-art competitors. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
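The abstract above aggregates one word's embeddings from several sources into a single meta-embedding via a differentiable attention module. A minimal sketch of that aggregation step, assuming simple dot-product attention against a query vector with softmax weights (the paper's module is learned end to end and is more elaborate; `query` here is an illustrative stand-in for the learned attention parameters):

```python
import numpy as np

def meta_embed(source_vecs, query):
    """Softmax-attention aggregation of one word's embeddings from N sources.

    source_vecs: list of (dim,) vectors, one per embedding source
    query: (dim,) vector determining each source's relevance
    """
    V = np.stack(source_vecs)                 # (num_sources, dim)
    logits = V @ query                        # relevance score per source
    weights = np.exp(logits - logits.max())   # numerically stable softmax
    weights /= weights.sum()
    return weights @ V                        # weighted sum -> meta-embedding
```

Because the weights are a softmax over per-source scores, the whole aggregation stays differentiable, which is what lets the attention module be trained jointly with the rest of the model as the abstract describes.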
Discovery Service for Jio Institute Digital Library