Publisher: acm / Topic: computation and language (cs.cl) - Searchworks@Jio Institute Digital Library Search Results

Showing total 205 results

1. Specialized document embeddings for aspect-based similarity of research papers

Author: Ostendorff, Malte, Blume, Till, Ruas, Terry, Gipp, Bela, and Rehm, Georg
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Computation and Language (cs.CL), Information Retrieval (cs.IR), Computer Science - Information Retrieval
Abstract: Document embeddings and similarity measures underpin content-based recommender systems, whereby a document is commonly represented as a single generic embedding. However, similarity computed on single vector representations provides only one perspective on document similarity that ignores which aspects make two documents alike. To address this limitation, aspect-based similarity measures have been developed using document segmentation or pairwise multi-class document classification. While segmentation harms the document coherence, the pairwise classification approach scales poorly to large-scale corpora. In this paper, we treat aspect-based similarity as a classical vector similarity problem in aspect-specific embedding spaces. We represent a document not as a single generic embedding but as multiple specialized embeddings. Our approach avoids document segmentation and scales linearly w.r.t. the corpus size. In an empirical study, we use the Papers with Code corpus containing 157,606 research papers and consider the task, method, and dataset of the respective research papers as their aspects. We compare and analyze three generic document embeddings, six specialized document embeddings and a pairwise classification baseline in the context of research paper recommendations. As generic document embeddings, we consider FastText, SciBERT, and SPECTER. To compute the specialized document embeddings, we compare three alternative methods inspired by retrofitting, fine-tuning, and Siamese networks. In our experiments, Siamese SciBERT achieved the highest scores. Additional analyses indicate an implicit bias of the generic document embeddings towards the dataset aspect and against the method aspect of each research paper. Our approach of aspect-based document embeddings mitigates potential risks arising from implicit biases by making them explicit.
Comment: Accepted for publication at JCDL 2022
Published: 2022
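
Illustration (not part of the catalog record): the abstract of result 1 frames aspect-based similarity as ordinary vector similarity computed separately in each aspect-specific embedding space. A minimal sketch of that idea follows, assuming each paper already has one embedding per aspect (task, method, dataset). The toy two-dimensional vectors and the helper names `cosine` and `aspect_similarities` are hypothetical and do not come from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def aspect_similarities(doc_a, doc_b):
    """Compare two documents aspect by aspect instead of with one generic vector."""
    shared = doc_a.keys() & doc_b.keys()
    return {aspect: cosine(doc_a[aspect], doc_b[aspect]) for aspect in shared}


# Toy embeddings (assumed): same task and dataset, different method.
paper_a = {"task": [1.0, 0.0], "method": [0.0, 1.0], "dataset": [1.0, 1.0]}
paper_b = {"task": [1.0, 0.0], "method": [1.0, 0.0], "dataset": [1.0, 1.0]}

scores = aspect_similarities(paper_a, paper_b)
for aspect in sorted(scores):
    print(f"{aspect}: {scores[aspect]:.2f}")
```

In this toy case the two papers look identical on the task and dataset aspects and unrelated on the method aspect, a distinction that a single generic embedding would blur into one score.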

2. Augmenting Scientific Papers with Just-in-Time, Position-Sensitive Definitions of Terms and Symbols

Author: Daniel S. Weld, Marti A. Hearst, Andrew Head, Kyle Lo, Raymond Fok, Sam Skjonsberg, and Dongyeop Kang
Subjects: FOS: Computer and information sciences, Glossary, Computer Science - Artificial Intelligence, Interface (Java), Computer science, Computer Science - Human-Computer Interaction, 02 engineering and technology, Filter (software), Human-Computer Interaction (cs.HC), H.5.2, Reading (process), 0202 electrical engineering, electronic engineering, information engineering, 0501 psychology and cognitive sciences, 050107 human factors, Computer Science - Computation and Language, Information retrieval, Scientific progress, 05 social sciences, 020207 software engineering, Usability, Comprehension, Symbol, Artificial Intelligence (cs.AI), Computation and Language (cs.CL)
Abstract: Despite the central importance of research papers to scientific progress, they can be difficult to read. Comprehension is often stymied when the information needed to understand a passage resides somewhere else: in another section, or in another paper. In this work, we envision how interfaces can bring definitions of technical terms and symbols to readers when and where they need them most. We introduce ScholarPhi, an augmented reading interface with four novel features: (1) tooltips that surface position-sensitive definitions from elsewhere in a paper, (2) a filter over the paper that "declutters" it to reveal how the term or symbol is used across the paper, (3) automatic equation diagrams that expose multiple definitions in parallel, and (4) an automatically generated glossary of important terms and symbols. A usability study showed that the tool helps researchers of all experience levels read papers. Furthermore, researchers were eager to have ScholarPhi's definitions available to support their everyday reading.
Comment: 18 pages, 17 figures, 2 tables. To appear at the 2021 ACM CHI Conference on Human Factors in Computing Systems. For associated video, see https://youtu.be/yYcQf-Yq8B0. v2 changes: expanded discussion of design process and implementation; improved figure design. v3 changes: fixed typo in cell of Table 2; updated HEDDEx and Schwarz-Hearst accuracy in Section 5.3
Published: 2021

3. Mutually-paced Knowledge Distillation for Cross-lingual Temporal Knowledge Graph Reasoning

Author: Ruijie Wang, Zheng Li, Jingfeng Yang, Tianyu Cao, Chao Zhang, Bing Yin, and Tarek Abdelzaher
Subjects: Social and Information Networks (cs.SI), FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Social and Information Networks, Computation and Language (cs.CL), Machine Learning (cs.LG)
Abstract: This paper investigates the cross-lingual temporal knowledge graph reasoning problem, which aims to facilitate reasoning on Temporal Knowledge Graphs (TKGs) in low-resource languages by transferring knowledge from TKGs in high-resource ones. The cross-lingual distillation ability across TKGs becomes increasingly crucial, in light of the unsatisfactory performance of existing reasoning methods on those severely incomplete TKGs, especially in low-resource languages. However, it poses tremendous challenges in two aspects. First, the cross-lingual alignments, which serve as bridges for knowledge transfer, are usually too scarce to transfer sufficient knowledge between two TKGs. Second, temporal knowledge discrepancy of the aligned entities, especially when alignments are unreliable, can mislead the knowledge distillation process. We correspondingly propose a mutually-paced knowledge distillation model MP-KD, where a teacher network trained on a source TKG can guide the training of a student network on target TKGs with an alignment module. Concretely, to deal with the scarcity issue, MP-KD generates pseudo alignments between TKGs based on the temporal information extracted by our representation module. To maximize the efficacy of knowledge transfer and control the noise caused by the temporal knowledge discrepancy, we enhance MP-KD with a temporal cross-lingual attention mechanism to dynamically estimate the alignment strength. The two procedures are mutually paced along with model training. Extensive experiments on twelve cross-lingual TKG transfer tasks in the EventKG benchmark demonstrate the effectiveness of the proposed MP-KD method.
Comment: This paper is accepted by The Web Conference 2023
Published: 2023

4. KG-BERTScore: Incorporating Knowledge Graph into BERTScore for Reference-Free Machine Translation Evaluation

Author: Zhanglin Wu, Min Zhang, Ming Zhu, Yinglu Li, Ting Zhu, Hao Yang, Song Peng, and Ying Qin
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Computation and Language (cs.CL)
Abstract: BERTScore is an effective and robust automatic metric for reference-based machine translation evaluation. In this paper, we incorporate a multilingual knowledge graph into BERTScore and propose a metric named KG-BERTScore, which linearly combines the results of BERTScore and bilingual named entity matching for reference-free machine translation evaluation. In experiments on the WMT19 "QE as a metric" (without references) shared tasks, KG-BERTScore achieves a higher overall correlation with human judgements than the current state-of-the-art metrics for reference-free machine translation evaluation. Moreover, the pre-trained multilingual model used by KG-BERTScore and the parameter for linear combination are also studied in this paper.
Comment: 5 pages
Published: 2022
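
Illustration (not part of the catalog record): the abstract of result 4 says the metric linearly combines a BERTScore result with a bilingual named entity matching result. The sketch below shows one plausible shape of such a combination. The convex-combination form, the default weight, and both function names are assumptions made for illustration; the abstract does not give the exact formula, and the entity matching here is a deliberately simple dictionary lookup rather than the paper's knowledge-graph-based procedure.

```python
def entity_match_ratio(source_entities, target_entities, bilingual_pairs):
    """Fraction of source entities whose listed translation appears in the MT output.

    bilingual_pairs is a hypothetical source-to-target entity dictionary.
    """
    if not source_entities:
        return 1.0  # nothing to match, so nothing is counted as missing
    target = set(target_entities)
    matched = sum(1 for e in source_entities if bilingual_pairs.get(e) in target)
    return matched / len(source_entities)


def combine_scores(bert_score, entity_score, weight=0.5):
    """Assumed convex combination: weight * BERTScore + (1 - weight) * entity score."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * bert_score + (1.0 - weight) * entity_score


# Toy example: one of two source entities is rendered correctly in the translation.
pairs = {"München": "Munich", "Köln": "Cologne"}
entity_score = entity_match_ratio(["München", "Köln"], ["Munich"], pairs)
print(round(combine_scores(0.8, entity_score, weight=0.75), 3))
```

The `weight` argument stands in for the linear-combination parameter that the abstract says is studied in the paper; its value here is arbitrary.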

5. Hybrid Multimodal Feature Extraction, Mining and Fusion for Sentiment Analysis

Author: Li, Jia, Zhang, Ziyang, Lang, Junjie, Jiang, Yueqi, An, Liuwei, Zou, Peng, Xu, Yangyang, Gao, Sheng, Lin, Jie, Fan, Chunxiao, Sun, Xiao, and Wang, Meng
Subjects: FOS: Computer and information sciences, Sound (cs.SD), Computer Science - Computation and Language, Audio and Speech Processing (eess.AS), Computer Vision and Pattern Recognition (cs.CV), Image and Video Processing (eess.IV), FOS: Electrical engineering, electronic engineering, information engineering, Computer Science - Computer Vision and Pattern Recognition, Electrical Engineering and Systems Science - Image and Video Processing, Computation and Language (cs.CL), Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: In this paper, we present our solutions for the Multimodal Sentiment Analysis Challenge (MuSe) 2022, which includes the MuSe-Humor, MuSe-Reaction and MuSe-Stress sub-challenges. MuSe 2022 focuses on humor detection, emotional reactions and multimodal emotional stress, utilizing different modalities and data sets. In our work, different kinds of multimodal features are extracted, including acoustic, visual, text and biological features. These features are fused by TEMMA and GRU with self-attention mechanism frameworks. Specifically, (1) several new audio features, facial expression features and paragraph-level text embeddings are extracted to improve accuracy; (2) we substantially improve the accuracy and reliability of multimodal sentiment prediction by mining and blending the multimodal features; and (3) effective data augmentation strategies are applied in model training to alleviate the problem of sample imbalance and prevent the model from learning biased subject characters. For the MuSe-Humor sub-challenge, our model obtains an AUC score of 0.8932. For the MuSe-Reaction sub-challenge, the Pearson correlation coefficient of our approach on the test set is 0.3879, which outperforms all other participants. For the MuSe-Stress sub-challenge, our approach outperforms the baseline in both arousal and valence on the test dataset, reaching a final combined result of 0.5151.
Comment: 8 pages, 2 figures, to appear in MuSe 2022 (ACM MM2022 co-located workshop)
Published: 2022

6. Target-aware Abstractive Related Work Generation with Contrastive Learning

Author: Xiuying Chen, Hind Alamro, Mingzhe Li, Shen Gao, Rui Yan, Xin Gao, and Xiangliang Zhang
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Computation and Language (cs.CL)
Abstract: The related work section is an important component of a scientific paper, which highlights the contribution of the target paper in the context of the reference papers. Authors can save time and effort by using the automatically generated related work section as a draft to complete the final related work. Most of the existing related work section generation methods rely on extracting off-the-shelf sentences to make a comparative discussion about the target work and the reference papers. However, such sentences need to be written in advance and are hard to obtain in practice. Hence, in this paper, we propose an abstractive target-aware related work generator (TAG), which can generate related work sections consisting of new sentences. Concretely, we first propose a target-aware graph encoder, which models the relationships between reference papers and the target paper with target-centered attention mechanisms. In the decoding process, we propose a hierarchical decoder that attends to the nodes of different levels in the graph with keyphrases as semantic indicators. Finally, to generate a more informative related work, we propose multi-level contrastive optimization objectives, which aim to maximize the mutual information between the generated related work and the references and minimize that with non-references. Extensive experiments on two public scholar datasets show that the proposed model brings substantial improvements over several strong baselines in terms of automatic and tailored human evaluations.
Comment: 11 pages, 7 figures, SIGIR 2022
Published: 2022

7. CharacterBERT and Self-Teaching for Improving the Robustness of Dense Retrievers on Queries with Typos

Author: Zhuang, Shengyao and Zuccon, Guido
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL, Computation and Language (cs.CL), Information Retrieval (cs.IR), Computer Science - Information Retrieval
Abstract: Current dense retrievers are not robust to out-of-domain and outlier queries, i.e., their effectiveness on these queries is much poorer than what one would expect. In this paper, we consider a specific instance of such queries: queries that contain typos. We show that a small character-level perturbation in queries (as caused by typos) highly impacts the effectiveness of dense retrievers. We then demonstrate that the root cause of this resides in the input tokenization strategy employed by BERT. In BERT, tokenization is performed using BERT's WordPiece tokenizer, and we show that a token with a typo will significantly change the token distributions obtained after tokenization. This distribution change translates to changes in the input embeddings passed to the BERT-based query encoder of dense retrievers. We then turn our attention to devising dense retriever methods that are robust to such queries with typos, while still being as performant as previous methods on queries without typos. For this, we use CharacterBERT as the backbone encoder and an efficient yet effective training method, called Self-Teaching (ST), that distills knowledge from queries without typos into the queries with typos. Experimental results show that CharacterBERT in combination with ST achieves significantly higher effectiveness on queries with typos compared to previous methods. Along with these results and the open-sourced implementation of the methods, we also provide a new passage retrieval dataset consisting of real-world queries with typos and associated relevance assessments on the MS MARCO corpus, thus supporting the research community in the investigation of effective and robust dense retrievers. Code, experimental results and dataset are made available at https://github.com/ielab/CharacterBERT-DR.
Comment: 9 pages full paper, accepted at SIGIR2022
Published: 2022

8. Conversational Question Answering on Heterogeneous Sources

Author: Christmann, Philipp, Roy, Rishiraj Saha, and Weikum, Gerhard
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Computation and Language (cs.CL), Information Retrieval (cs.IR), Computer Science - Information Retrieval
Abstract: Conversational question answering (ConvQA) tackles sequential information needs where contexts in follow-up questions are left implicit. Current ConvQA systems operate over homogeneous sources of information: either a knowledge base (KB), or a text corpus, or a collection of tables. This paper addresses the novel issue of jointly tapping into all of these together, thereby boosting answer coverage and confidence. We present CONVINSE, an end-to-end pipeline for ConvQA over heterogeneous sources, operating in three stages: i) learning an explicit structured representation of an incoming question and its conversational context, ii) harnessing this frame-like representation to uniformly capture relevant evidence from KB, text, and tables, and iii) running a fusion-in-decoder model to generate the answer. We construct and release the first benchmark, ConvMix, for ConvQA over heterogeneous sources, comprising 3000 real-user conversations with 16000 questions, along with entity annotations, completed question utterances, and question paraphrases. Experiments demonstrate the viability and advantages of our method, compared to state-of-the-art baselines.
Comment: SIGIR 2022 Research Track Long Paper
Published: 2022

9. Re-thinking Knowledge Graph Completion Evaluation from an Information Retrieval Perspective

Author: Ying Zhou, Xuanang Chen, Ben He, Zheng Ye, and Le Sun
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Computation and Language (cs.CL), Information Retrieval (cs.IR), Computer Science - Information Retrieval
Abstract: Knowledge graph completion (KGC) aims to infer missing knowledge triples based on known facts in a knowledge graph. Current KGC research mostly follows an entity ranking protocol, wherein the effectiveness is measured by the predicted rank of a masked entity in a test triple. The overall performance is then given by a micro(-average) metric over all individual answer entities. Due to the incomplete nature of the large-scale knowledge bases, such an entity ranking setting is likely affected by unlabelled top-ranked positive examples, raising questions on whether the current evaluation protocol is sufficient to guarantee a fair comparison of KGC systems. To this end, this paper presents a systematic study on whether and how the label sparsity affects the current KGC evaluation with the popular micro metrics. Specifically, inspired by the TREC paradigm for large-scale information retrieval (IR) experimentation, we create a relatively "complete" judgment set based on a sample from the popular FB15k-237 dataset following the TREC pooling method. According to our analysis, it comes as a surprise that switching from the original labels to our "complete" labels results in a drastic change of system ranking of a variety of 13 popular KGC models in terms of micro metrics. Further investigation indicates that the IR-like macro(-average) metrics are more stable and discriminative under different settings, and less affected by label sparsity. Thus, for KGC evaluation, we recommend conducting TREC-style pooling to balance between human efforts and label completeness, and also reporting the IR-like macro metrics to reflect the ranking nature of the KGC task.
Comment: Accepted by SIGIR 2022, full paper
Published: 2022
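
Illustration (not part of the catalog record): the abstract of result 9 contrasts micro(-average) metrics, computed over all individual answer entities, with IR-like macro(-average) metrics. The sketch below shows the general difference between the two averaging schemes using mean reciprocal rank. Grouping ranks by query, the choice of MRR, and the function names are assumptions made for illustration; the paper's exact metric definitions may differ.

```python
def micro_mrr(ranks_by_query):
    """Average reciprocal rank over every answer entity, pooled across queries."""
    all_ranks = [r for ranks in ranks_by_query.values() for r in ranks]
    return sum(1.0 / r for r in all_ranks) / len(all_ranks)


def macro_mrr(ranks_by_query):
    """Average reciprocal rank within each query first, then average across queries."""
    per_query = [
        sum(1.0 / r for r in ranks) / len(ranks)
        for ranks in ranks_by_query.values()
    ]
    return sum(per_query) / len(per_query)


# Toy data (assumed): query q1 has four labelled answers, q2 has only one.
ranks = {"q1": [1, 2, 4, 4], "q2": [1]}
print("micro MRR:", micro_mrr(ranks))
print("macro MRR:", macro_mrr(ranks))
```

Under micro averaging, a query with many labelled answers contributes more terms to the score, so adding or removing labels for that query shifts the result more than under macro averaging, where every query counts once.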

10. MotifClass: Weakly Supervised Text Classification with Higher-order Metadata Information

Author: Zhang, Yu, Garg, Shweta, Meng, Yu, Chen, Xiusi, and Han, Jiawei
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, ComputingMethodologies_PATTERNRECOGNITION, Computation and Language (cs.CL)
Abstract: We study the problem of weakly supervised text classification, which aims to classify text documents into a set of pre-defined categories with category surface names only and without any annotated training document provided. Most existing classifiers leverage textual information in each document. However, in many domains, documents are accompanied by various types of metadata (e.g., authors, venue, and year of a research paper). These metadata and their combinations may serve as strong category indicators in addition to textual contents. In this paper, we explore the potential of using metadata to help weakly supervised text classification. To be specific, we model the relationships between documents and metadata via a heterogeneous information network. To effectively capture higher-order structures in the network, we use motifs to describe metadata combinations. We propose a novel framework, named MotifClass, which (1) selects category-indicative motif instances, (2) retrieves and generates pseudo-labeled training samples based on category names and indicative motif instances, and (3) trains a text classifier using the pseudo training data. Extensive experiments on real-world datasets demonstrate the superior performance of MotifClass to existing weakly supervised text classification approaches. Further analysis shows the benefit of considering higher-order metadata information in our framework.
Comment: 11 pages; Accepted to WSDM 2022
Published: 2022

11. Integrating Pattern- and Fact-based Fake News Detection via Model Preference Learning

Author: Juan Cao, Xueyao Zhang, Lei Zhong, and Qiang Sheng
Subjects: Social and Information Networks (cs.SI), FOS: Computer and information sciences, Focus (computing), Computer Science - Computation and Language, Information retrieval, Preference learning, Computer science, Graph neural networks, Fact checking, Computer Science - Social and Information Networks, Preference, Graph (abstract data type), Fake news, Computation and Language (cs.CL)
Abstract: To defend against fake news, researchers have developed various methods based on texts. These methods can be grouped as 1) pattern-based methods, which focus on shared patterns among fake news posts rather than the claim itself; and 2) fact-based methods, which retrieve from external sources to verify the claim's veracity without considering patterns. The two groups of methods, which have different preferences of textual clues, actually play complementary roles in detecting fake news. However, few works consider their integration. In this paper, we study the problem of integrating pattern- and fact-based models into one framework via modeling their preference differences, i.e., making the pattern- and fact-based models focus on their respective preferred parts in a post and mitigating interference from non-preferred parts as much as possible. To this end, we build a Preference-aware Fake News Detection Framework (Pref-FEND), which learns the respective preferences of pattern- and fact-based models for joint detection. We first design a heterogeneous dynamic graph convolutional network to generate the respective preference maps, and then use these maps to guide the joint learning of pattern- and fact-based models for final prediction. Experiments on two real-world datasets show that Pref-FEND effectively captures model preferences and improves the performance of models based on patterns, facts, or both.
Comment: ACM CIKM 2021 Full Paper
Published: 2021

12. Disentangling Hate in Online Memes

Author: Rui Cao, Jing Jiang, Wen-Haw Chong, Roy Ka-Wei Lee, and Ziqing Fan
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Information retrieval, Modality (human–computer interaction), Computer science, Offensive, Computer Science - Information Retrieval, Multimedia (cs.MM), Task (project management), Social media mining, Computation and Language (cs.CL), Computer Science - Multimedia, Information Retrieval (cs.IR), ComputingMilieux_MISCELLANEOUS
Abstract: Hateful and offensive content detection has been extensively explored in a single modality such as text. However, such toxic information could also be communicated via multimodal content such as online memes. Therefore, detecting multimodal hateful content has recently garnered much attention in academic and industry research communities. This paper aims to contribute to this emerging research topic by proposing DisMultiHate, a novel framework that performs the classification of multimodal hateful content. Specifically, DisMultiHate is designed to disentangle target entities in multimodal memes to improve hateful content classification and explainability. We conduct extensive experiments on two publicly available hateful and offensive memes datasets. Our experiment results show that DisMultiHate is able to outperform state-of-the-art unimodal and multimodal baselines in the hateful meme classification task. Empirical case studies were also conducted to demonstrate DisMultiHate's ability to disentangle target entities in memes and ultimately showcase DisMultiHate's explainability of the multimodal hateful content classification task.
Comment: Paper accepted in ACM Multimedia 2021
Published: 2021

13. Low resource recognition and linking of biomedical concepts from a large ontology

Author: Nicholas Monath, Andrew McCallum, Sunil Mohan, and Rico Angell
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Information retrieval, Computer science, Low resource, Deep learning, Unified Medical Language System, Search engine indexing, Scientific literature, Ontology (information science), Segmentation, Artificial intelligence, Computation and Language (cs.CL), Biomedicine
Abstract: Tools to explore scientific literature are essential for scientists, especially in biomedicine, where about a million new papers are published every year. Many such tools provide users the ability to search for specific entities (e.g., proteins, diseases) by tracking their mentions in papers. PubMed, the best-known database of biomedical papers, relies on human curators to add these annotations. This can take several weeks for new papers, and not all papers get tagged. Machine learning models have been developed to facilitate the semantic indexing of scientific papers. However, their performance on the more comprehensive ontologies of biomedical concepts does not reach the levels of typical entity recognition problems studied in NLP. In large part this is due to their low resources, where the ontologies are large, there is a lack of descriptive text defining most entities, and labeled data can only cover a small portion of the ontology. In this paper, we develop a new model that overcomes these challenges by (1) generalizing to entities unseen at training time, and (2) incorporating linking predictions into the mention segmentation decisions. Our approach achieves new state-of-the-art results for the UMLS ontology in both traditional recognition/linking (+8 F1 pts) and semantic indexing-based evaluation (+10 F1 pts).
Published: 2021

14. One Chatbot Per Person: Creating Personalized Chatbots based on Implicit User Profiles

Author: Zhengyi Ma, Zhicheng Dou, Hanxun Zhong, Ji-Rong Wen, and Yutao Zhu
Subjects: FOS: Computer and information sciences, Focus (computing), Vocabulary, Computer Science - Computation and Language, Copying, User profile, Computer Science - Artificial Intelligence, Computer science, Construct (python library), Chatbot, Artificial Intelligence (cs.AI), Human–computer interaction, Language model, Computation and Language (cs.CL), Word (computer architecture)
Abstract: Personalized chatbots focus on endowing chatbots with a consistent personality to behave like real users, give more informative responses, and further act as personal assistants. Existing personalized approaches tried to incorporate several text descriptions as explicit user profiles. However, the acquisition of such explicit profiles is expensive and time-consuming, thus being impractical for large-scale real-world applications. Moreover, the restricted predefined profile neglects the language behavior of a real user and cannot be automatically updated together with the change of user interests. In this paper, we propose to learn implicit user profiles automatically from large-scale user dialogue history for building personalized chatbots. Specifically, leveraging the benefits of Transformer on language understanding, we train a personalized language model to construct a general user profile from the user's historical responses. To highlight the relevant historical responses to the input post, we further establish a key-value memory network of historical post-response pairs, and build a dynamic post-aware user profile. The dynamic profile mainly describes what and how the user has responded to similar posts in history. To explicitly utilize users' frequently used words, we design a personalized decoder to fuse two decoding strategies, including generating a word from the generic vocabulary and copying one word from the user's personalized vocabulary. Experiments on two real-world datasets show the significant improvement of our model compared with existing methods. Our code is available at https://github.com/zhengyima/DHAP.
Comment: Accepted by SIGIR 2021, Full Papers
Published: 2021

15. Knowledge-Aware Procedural Text Understanding with Multi-Stage Training

Author: Tao Qin, Daxin Jiang, Zhihan Zhang, Yunfang Wu, and Xiubo Geng
Subjects: FOS: Computer and information sciences, Focus (computing), Computer Science - Computation and Language, Information retrieval, Computer Science - Artificial Intelligence, Computer science, Process (engineering), Heuristic, Commonsense reasoning, Semantics, Task (project management), Artificial Intelligence (cs.AI), Web mining, Schema (psychology), Computation and Language (cs.CL)
Abstract: Procedural text describes dynamic state changes during a step-by-step natural process (e.g., photosynthesis). In this work, we focus on the task of procedural text understanding, which aims to comprehend such documents and track entities' states and locations during a process. Although recent approaches have achieved substantial progress, their results are far behind human performance. Two challenges, the difficulty of commonsense reasoning and data insufficiency, still remain unsolved, which require the incorporation of external knowledge bases. Previous works on external knowledge injection usually rely on noisy web mining tools and heuristic rules with limited applicable scenarios. In this paper, we propose a novel KnOwledge-Aware proceduraL text understAnding (KOALA) model, which effectively leverages multiple forms of external knowledge in this task. Specifically, we retrieve informative knowledge triples from ConceptNet and perform knowledge-aware reasoning while tracking the entities. Besides, we employ a multi-stage training schema which fine-tunes the BERT model over unlabeled data collected from Wikipedia before further fine-tuning it on the final model. Experimental results on two procedural text datasets, ProPara and Recipes, verify the effectiveness of the proposed methods, in which our model achieves state-of-the-art performance in comparison to various baselines.
Comment: Published as full paper in Proceedings of the Web Conference 2021 (WWW'21)
Published: 2021

16. Transfer learning approach for detecting psychological distress in Brexit tweets

Author: Shereen Fouad, Sean-Kelly Palicki, Zahraa Said Abdallah, and Mariam Adedoyin-Olowe
Subjects: Social and Information Networks (cs.SI), FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Computation and Language, Computer science, Sentiment analysis, Applied psychology, Negative transfer, Computer Science - Social and Information Networks, 020207 software engineering, 02 engineering and technology, Social media analytics, Machine Learning (cs.LG), Distress, Brexit, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, Social media, European union, Transfer of learning, Computation and Language (cs.CL)
Abstract: In 2016, United Kingdom (UK) citizens voted to leave the European Union (EU), which was officially implemented in 2020. During this period, UK residents experienced a great deal of uncertainty around the UK's continued relationship with the EU. Many people have used social media platforms to express their emotions about this critical event. Sentiment analysis has recently been considered an important tool for detecting mental well-being in Twitter content. However, detecting the psychological distress status in politically related tweets is a challenging task due to the lack of explicit sentences describing the depressive or anxiety status. To address this problem, this paper leverages a transfer learning approach for sentiment analysis to measure the non-clinical psychological distress status in Brexit tweets. The framework transfers the knowledge learnt from self-reported psychological distress tweets (source domain) to detect the distress status in Brexit tweets (target domain). The framework applies a domain adaptation technique to decrease the impact of negative transfer between source and target domains. The paper also introduces a Brexit distress index that can be used to detect levels of psychological distress of individuals in Brexit tweets. We design an experiment that includes data from both domains. The proposed model is able to detect the non-clinical psychological distress status in Brexit tweets with an accuracy of 66% and 62% on the source and target domains, respectively.
Comment: SAC 2021, MLA - Machine Learning and its Applications
Published: 2021

17. How Many Pages?

Author: Erion Çano and Ondřej Bojar
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Computation and Language, Information retrieval, Artificial neural network, Computer science, business.industry, 05 social sciences, 050905 science studies, computer.software_genre, Computer Science - Information Retrieval, Machine Learning (cs.LG), Domain (software engineering), Task (project management), Metadata, Language model, Artificial intelligence, 0509 other social sciences, 050904 information & library sciences, business, Computation and Language (cs.CL), Regression problems, computer, Information Retrieval (cs.IR), Natural language processing
Abstract: Being able to predict the length of a scientific paper may be helpful in numerous situations. This work defines the paper length prediction task as a regression problem and reports several experimental results using popular machine learning models. We also create a huge dataset of publication metadata and the respective lengths in number of pages. The dataset will be freely available and is intended to foster research in this domain. As future work, we would like to explore more advanced regressors based on neural networks and big pretrained language models., Comment: 5 pages, 6 tables. Published in proceedings of NLPIR 2020, the 4th International Conference on Natural Language Processing and Information Retrieval, Seoul, Korea
Published: 2020

18. Query Understanding via Intent Description Generation

Author: Xueqi Cheng, Yanyan Lan, Yixing Fan, Jiafeng Guo, and Ruqing Zhang
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Web search query, Information retrieval, Computer science, 02 engineering and technology, SemEval, Ranking (information retrieval), Task (computing), Web query classification, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Relevance (information retrieval), Cluster analysis, Computation and Language (cs.CL), Natural language
Abstract: Query understanding is a fundamental problem in information retrieval (IR), which has attracted continuous attention through the past decades. Many different tasks have been proposed for understanding users' search queries, e.g., query classification or query clustering. However, understanding a search query at the intent class/cluster level is not very precise, because much detailed information is lost. As we may find in many benchmark datasets, e.g., TREC and SemEval, queries are often associated with a detailed description provided by human annotators which clearly describes its intent to help evaluate the relevance of the documents. If a system could automatically generate a detailed and precise intent description for a search query, like human annotators, that would indicate much better query understanding has been achieved. In this paper, therefore, we propose a novel Query-to-Intent-Description (Q2ID) task for query understanding. Unlike those existing ranking tasks which leverage the query and its description to compute the relevance of documents, Q2ID is a reverse task which aims to generate a natural language intent description based on both relevant and irrelevant documents of a given query. To address this new task, we propose a novel Contrastive Generation model, namely CtrsGen for short, to generate the intent description by contrasting the relevant documents with the irrelevant documents given a query. We demonstrate the effectiveness of our model by comparing with several state-of-the-art generation models on the Q2ID task. We discuss the potential usage of such Q2ID technique through an example application., Accepted as Long Research Paper in CIKM 2020
Published: 2020

19. Segmenting Scientific Abstracts into Discourse Categories: A Deep Learning-Based Approach for Sparse Labeled Data

Author: Debarshi Kumar Sanyal, Partha Pratim Das, Soumya Banerjee, Samiran Chattopadhyay, and Plaban Kumar Bhowmick
Subjects: FOS: Computer and information sciences, Structure (mathematical logic), Computer Science - Computation and Language, I.5.1, Artificial neural network, H.3.7, Computer science, business.industry, Deep learning, computer.software_genre, Market segmentation, Segmentation, Artificial intelligence, Paragraph, Construct (philosophy), Transfer of learning, business, Computation and Language (cs.CL), computer, Natural language processing
Abstract: The abstract of a scientific paper distills the contents of the paper into a short paragraph. In the biomedical literature, it is customary to structure an abstract into discourse categories like BACKGROUND, OBJECTIVE, METHOD, RESULT, and CONCLUSION, but this segmentation is uncommon in other fields like computer science. Explicit categories could be helpful for more granular, that is, discourse-level search and recommendation. The sparsity of labeled data makes it challenging to construct supervised machine learning solutions for automatic discourse-level segmentation of abstracts in non-bio domains. In this paper, we address this problem using transfer learning. In particular, we define three discourse categories (BACKGROUND, TECHNIQUE, OBSERVATION) for an abstract because these three categories are the most common. We train a deep neural network on structured abstracts from PubMed, then fine-tune it on a small hand-labeled corpus of computer science papers. We observe an accuracy of 75% on the test corpus. We perform an ablation study to highlight the roles of the different parts of the model. Our method appears to be a promising solution to the automatic segmentation of abstracts, where the labeled data is sparse., Comment: to appear in the proceedings of JCDL'2020
Published: 2020

20. Mining Implicit Entity Preference from User-Item Interaction Data for Knowledge Graph Completion via Adversarial Learning

Author: Ji-Rong Wen, Junyi Li, Wayne Xin Zhao, Gaole He, and Peiju Liu
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer science, business.industry, Perspective (graphical), Collaborative learning, 02 engineering and technology, Machine learning, computer.software_genre, Preference, Task (project management), Artificial Intelligence (cs.AI), Order (business), 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, Leverage (statistics), 020201 artificial intelligence & image processing, Artificial intelligence, business, Computation and Language (cs.CL), computer, Generator (mathematics)
Abstract: The task of Knowledge Graph Completion (KGC) aims to automatically infer the missing fact information in a Knowledge Graph (KG). In this paper, we take a new perspective that aims to leverage rich user-item interaction data (user interaction data for short) for improving the KGC task. Our work is inspired by the observation that many KG entities correspond to online items in application systems. However, the two kinds of data sources have very different intrinsic characteristics, and a simple fusion strategy is likely to hurt the original performance. To address this challenge, we propose a novel adversarial learning approach by leveraging user interaction data for the KGC task. Our generator is isolated from user interaction data, and serves to improve the performance of the discriminator. The discriminator takes the learned useful information from user interaction data as input, and gradually enhances the evaluation capacity in order to identify the fake samples generated by the generator. To discover implicit entity preference of users, we design an elaborate collaborative learning algorithm based on graph neural networks, which will be jointly optimized with the discriminator. Such an approach is effective in alleviating the issues of data heterogeneity and semantic complexity for the KGC task. Extensive experiments on three real-world datasets have demonstrated the effectiveness of our approach on the KGC task., 11 pages, 4 figures, 6 tables. Accepted as WWW 2020 paper
Published: 2020

21. Garbage in, garbage out?

Author: Kevin Yu, Rebekah Tang, Mindy Dai, Yanlai Yang, Jenny Huang, Jie Qiu, and R. Stuart Geiger
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Computation and Language, Social computing, Structured content, Computer science, business.industry, Best practice, Computer Science - Digital Libraries, Sample (statistics), Machine learning, computer.software_genre, Machine Learning (cs.LG), Task (project management), Computer Science - Computers and Society, Garbage in, garbage out, Content analysis, Computers and Society (cs.CY), Digital Libraries (cs.DL), Artificial intelligence, business, Computation and Language (cs.CL), computer, Reliability (statistics)
Abstract: Many machine learning projects for new application areas involve teams of humans who label data for a particular purpose, from hiring crowdworkers to the paper's authors labeling the data themselves. Such a task is quite similar to (or a form of) structured content analysis, which is a longstanding methodology in the social sciences and humanities, with many established best practices. In this paper, we investigate to what extent a sample of machine learning application papers in social computing --- specifically papers from ArXiv and traditional publications performing an ML classification task on Twitter data --- give specific details about whether such best practices were followed. Our team conducted multiple rounds of structured content analysis of each paper, making determinations such as: Does the paper report who the labelers were, what their qualifications were, whether they independently labeled the same items, whether inter-rater reliability metrics were disclosed, what level of training and/or instructions were given to labelers, whether compensation for crowdworkers is disclosed, and if the training data is publicly available. We find a wide divergence in whether such practices were followed and documented. Much of machine learning research and education focuses on what is done once a "gold standard" of training data is available, but we discuss issues around the equally-important aspect of whether such data is reliable in the first place., 18 pages, includes appendix
Published: 2020

22. A Hybrid Retrieval-Generation Neural Conversation Model

Author: Jingjing Liu, Jianfeng Gao, Chen Qu, Xiaodong Liu, Minghui Qiu, Yelong Shen, Liu Yang, W. Bruce Croft, and Junjie Hu
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Information retrieval, business.industry, Computer science, Deep learning, media_common.quotation_subject, 02 engineering and technology, 010501 environmental sciences, 01 natural sciences, Computer Science - Information Retrieval, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, Conversation, Artificial intelligence, business, Computation and Language (cs.CL), Information Retrieval (cs.IR), Text retrieval, 0105 earth and related environmental sciences, media_common
Abstract: Intelligent personal assistant systems that are able to have multi-turn conversations with human users are becoming increasingly popular. Most previous research has been focused on using either retrieval-based or generation-based methods to develop such systems. Retrieval-based methods have the advantage of returning fluent and informative responses with great diversity. However, the performance of the methods is limited by the size of the response repository. On the other hand, generation-based methods can produce highly coherent responses on any topics. But the generated responses are often generic and not informative due to the lack of grounding knowledge. In this paper, we propose a hybrid neural conversation model that combines the merits of both response retrieval and generation methods. Experimental results on Twitter and Foursquare data show that the proposed model outperforms both retrieval-based methods and generation-based methods (including a recently proposed knowledge-grounded neural conversation model) under both automatic evaluation metrics and human evaluation. We hope that the findings in this study provide new insights on how to integrate text retrieval and text generation models for building conversation systems., Accepted as a Full Paper in CIKM 2019. 10 pages
Published: 2019

23. Conversion Prediction Using Multi-task Conditional Attention Networks to Support the Creation of Effective Ad Creatives

Author: Hitoshi Iyatomi, Shunsuke Kitada, and Yoshifumi Seki
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Computation and Language, Human–computer interaction, Computer science, Key (cryptography), Computation and Language (cs.CL), Machine Learning (cs.LG), Task (project management)
Abstract: Accurately predicting conversions in advertisements is generally a challenging task, because such conversions do not occur frequently. In this paper, we propose a new framework to support creating high-performing ad creatives, including the accurate prediction of ad creative text conversions before delivering to the consumer. The proposed framework includes three key ideas: multi-task learning, conditional attention, and attention highlighting. Multi-task learning is an idea for improving the prediction accuracy of conversion, which predicts clicks and conversions simultaneously, to solve the difficulty of data imbalance. Furthermore, conditional attention focuses attention of each ad creative with the consideration of its genre and target gender, thus improving conversion prediction accuracy. Attention highlighting visualizes important words and/or phrases based on conditional attention. We evaluated the proposed framework with actual delivery history data (14,000 creatives displayed more than a certain number of times from Gunosy Inc.), and confirmed that these ideas improve the prediction performance of conversions, and visualize noteworthy words according to the creatives' attributes., Comment: 9 pages, 6 figures. Accepted at The 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2019) as an applied data science paper
Published: 2019

24. Outline Generation

Author: Jiafeng Guo, Yixing Fan, Ruqing Zhang, Xueqi Cheng, and Yanyan Lan
Subjects: FOS: Computer and information sciences, Structure (mathematical logic), Computer Science - Computation and Language, Computer science, business.industry, Context (language use), 02 engineering and technology, Coherence (statistics), computer.software_genre, Automatic summarization, Task (project management), Identification (information), Section (archaeology), 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, Artificial intelligence, Paragraph, Structured prediction, business, Computation and Language (cs.CL), computer, Natural language processing
Abstract: In this paper, we introduce and tackle the Outline Generation (OG) task, which aims to unveil the inherent content structure of a multi-paragraph document by identifying its potential sections and generating the corresponding section headings. Without loss of generality, the OG task can be viewed as a novel structured summarization task. To generate a sound outline, an ideal OG model should be able to capture three levels of coherence, namely the coherence between context paragraphs, that between a section and its heading, and that between context headings. The first one is the foundation for section identification, while the latter two are critical for consistent heading generation. In this work, we formulate the OG task as a hierarchical structured prediction problem, i.e., to first predict a sequence of section boundaries and then a sequence of section headings accordingly. We propose a novel hierarchical structured neural generation model, named HiStGen, for the task. Our model attempts to capture the three-level coherence via the following ways. First, we introduce a Markov paragraph dependency mechanism between context paragraphs for section identification. Second, we employ a section-aware attention mechanism to ensure the semantic coherence between a section and its heading. Finally, we leverage a Markov heading dependency mechanism and a review mechanism between context headings to improve the consistency and eliminate duplication between section headings. Besides, we build a novel WIKIOG dataset, a public collection which consists of over 1.75 million document-outline pairs for research on the OG task. Experimental results on our benchmark dataset demonstrate that our model can significantly outperform several state-of-the-art sequential generation models for the OG task., 10 pages, Long paper accepted by SIGIR 2019
Published: 2019

25. Weakly-Supervised Neural Text Classification

Author: Chao Zhang, Yu Meng, Jiawei Han, and Jiaming Shen
Subjects: FOS: Computer and information sciences, Feature engineering, Computer Science - Machine Learning, Computer science, Machine Learning (stat.ML), 02 engineering and technology, Machine learning, computer.software_genre, Computer Science - Information Retrieval, Machine Learning (cs.LG), Task (project management), Statistics - Machine Learning, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, Flexibility (engineering), Computer Science - Computation and Language, Training set, business.industry, ComputingMethodologies_PATTERNRECOGNITION, 020201 artificial intelligence & image processing, Artificial intelligence, business, Computation and Language (cs.CL), computer, Information Retrieval (cs.IR), Generator (mathematics)
Abstract: Deep neural networks are gaining increasing popularity for the classic text classification task, due to their strong expressive power and less requirement for feature engineering. Despite such attractiveness, neural text classification models suffer from the lack of training data in many real-world applications. Although many semi-supervised and weakly-supervised text classification models exist, they cannot be easily applied to deep neural models and meanwhile support limited supervision types. In this paper, we propose a weakly-supervised method that addresses the lack of training data in neural text classification. Our method consists of two modules: (1) a pseudo-document generator that leverages seed information to generate pseudo-labeled documents for model pre-training, and (2) a self-training module that bootstraps on real unlabeled data for model refinement. Our method has the flexibility to handle different types of weak supervision and can be easily integrated into existing deep neural models for text classification. We have performed extensive experiments on three real-world datasets from different domains. The results demonstrate that our proposed method achieves inspiring performance without requiring excessive training data and outperforms baseline methods significantly., Comment: CIKM 2018 Full Paper
Published: 2018

26. COTA

Author: Huaixiu Zheng, Piero Molino, and Yi-Chia Wang
Subjects: FOS: Computer and information sciences, Feature engineering, Computer Science - Machine Learning, Computer science, Machine Learning (stat.ML), 02 engineering and technology, Machine learning, computer.software_genre, Machine Learning (cs.LG), Statistics - Machine Learning, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, Feature (machine learning), Network architecture, Computer Science - Computation and Language, business.industry, End user, Deep learning, Ranking, Ticket, 020201 artificial intelligence & image processing, Customer satisfaction, Artificial intelligence, business, Computation and Language (cs.CL), computer
Abstract: For a company looking to provide delightful user experiences, it is of paramount importance to take care of any customer issues. This paper proposes COTA, a system to improve speed and reliability of customer support for end users through automated ticket classification and answers selection for support representatives. Two machine learning and natural language processing techniques are demonstrated: one relying on feature engineering (COTA v1) and the other exploiting raw signals through deep learning architectures (COTA v2). COTA v1 employs a new approach that converts the multi-classification task into a ranking problem, demonstrating significantly better performance in the case of thousands of classes. For COTA v2, we propose an Encoder-Combiner-Decoder, a novel deep learning architecture that allows for heterogeneous input and output feature types and injection of prior knowledge through network architecture choices. This paper compares these models and their variants on the task of ticket classification and answer selection, showing model COTA v2 outperforms COTA v1, and analyzes their inner workings and shortcomings. Finally, an A/B test is conducted in a production setting validating the real-world impact of COTA in reducing issue resolution time by 10 percent without reducing customer satisfaction.
Published: 2018

27. Enhancing Person-Job Fit for Talent Recruitment

Author: Liang Jiang, Tong Xu, Chen Zhu, Chuan Qin, Hui Xiong, Hengshu Zhu, and Enhong Chen
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Matching (statistics), Computer Science - Computation and Language, ComputingMilieux_THECOMPUTINGPROFESSION, Computer Science - Artificial Intelligence, Computer science, business.industry, 02 engineering and technology, Data science, Manual labour, Machine Learning (cs.LG), Artificial Intelligence (cs.AI), 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Set (psychology), Human resources, business, Computation and Language (cs.CL)
Abstract: The widespread use of online recruitment services has led to an information explosion in the job market. As a result, recruiters have to seek intelligent ways to achieve Person-Job Fit, which is the bridge for matching the right job seekers to the right positions. Existing studies on Person-Job Fit focus on measuring the matching degree between the talent qualification and the job requirements mainly based on the manual inspection of human resource experts, despite the subjective, incomplete, and inefficient nature of human judgement. To this end, in this paper, we propose a novel end-to-end Ability-aware Person-Job Fit Neural Network (APJFNN) model, which has the goal of reducing the dependence on manual labour and can provide better interpretation of the fitting results. The key idea is to exploit the rich information available in abundant historical job application data. Specifically, we propose a word-level semantic representation for both job requirements and job seekers' experiences based on a Recurrent Neural Network. Along this line, four hierarchical ability-aware attention strategies are designed to measure the different importance of job requirements for semantic representation, as well as the different contribution of each job experience to a specific ability requirement. Finally, extensive experiments on a large-scale real-world data set clearly validate the effectiveness and interpretability of the APJFNN framework compared with several baselines., Comment: This is an extended version of our SIGIR18 paper
Published: 2018

28. A semantics-based measure of emoji similarity

Author: Lakshika Balasuriya, Sanjaya Wijeratne, Derek Doran, and Amit P. Sheth
Subjects: FOS: Computer and information sciences, Emoji, Computer science, 02 engineering and technology, Semantics, computer.software_genre, Set (abstract data type), Text processing, Semantic similarity, Similarity (psychology), 0202 electrical engineering, electronic engineering, information engineering, 0501 psychology and cognitive sciences, 050107 human factors, Social and Information Networks (cs.SI), Computer Science - Computation and Language, business.industry, 05 social sciences, Sentiment analysis, Computer Science - Social and Information Networks, Knowledge base, 020201 artificial intelligence & image processing, Artificial intelligence, business, Computation and Language (cs.CL), computer, Natural language processing
Abstract: Emoji have grown to become one of the most important forms of communication on the web. With its widespread use, measuring the similarity of emoji has become an important problem for contemporary text processing since it lies at the heart of sentiment analysis, search, and interface design tasks. This paper presents a comprehensive analysis of the semantic similarity of emoji through embedding models that are learned over machine-readable emoji meanings in the EmojiNet knowledge base. Using emoji descriptions, emoji sense labels and emoji sense definitions, and with different training corpora obtained from Twitter and Google News, we develop and test multiple embedding models to measure emoji similarity. To evaluate our work, we create a new dataset called EmoSim508, which assigns human-annotated semantic similarity scores to a set of 508 carefully selected emoji pairs. After validation with EmoSim508, we present a real-world use-case of our emoji embedding models using a sentiment analysis task and show that our models outperform the previous best-performing emoji embedding model on this task. The EmoSim508 dataset and our emoji embedding models are publicly released with this paper and can be downloaded from http://emojinet.knoesis.org/., Comment: This paper is accepted at Web Intelligence 2017 as a full paper, In 2017 IEEE/WIC/ACM International Conference on Web Intelligence (WI). Leipzig, Germany: ACM, 2017
Published: 2017

29. Jointly Learning Word Embeddings and Latent Topics

Author: Bei Shi, Wai Lam, Shoaib Jameel, Kwun Ping Lai, and Steven Schockaert
Subjects: FOS: Computer and information sciences, Text corpus, Topic model, Word embedding, Computer science, 02 engineering and technology, Meaning (non-linguistic), computer.software_genre, Computer Science - Information Retrieval, Machine Learning (cs.LG), 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, Polysemy, Computer Science - Computation and Language, Collocation, business.industry, Term (time), Computer Science - Learning, 020201 artificial intelligence & image processing, Artificial intelligence, business, Computation and Language (cs.CL), computer, Information Retrieval (cs.IR), Word (computer architecture), Natural language processing
Abstract: Word embedding models such as Skip-gram learn a vector-space representation for each word, based on the local word collocation patterns that are observed in a text corpus. Latent topic models, on the other hand, take a more global view, looking at the word distributions across the corpus to assign a topic to each word occurrence. These two paradigms are complementary in how they represent the meaning of word occurrences. While some previous works have already looked at using word embeddings for improving the quality of latent topics, and conversely, at using latent topics for improving word embeddings, such "two-step" methods cannot capture the mutual interaction between the two paradigms. In this paper, we propose STE, a framework which can learn word embeddings and latent topics in a unified manner. STE naturally obtains topic-specific word embeddings, and thus addresses the issue of polysemy. At the same time, it also learns the term distributions of the topics, and the topic distributions of the documents. Our experimental results demonstrate that the STE model can indeed generate useful topic-specific word embeddings and coherent latent topics in an effective and efficient way., Comment: 10 pages, 2 figures, full paper. To appear in the proceedings of The 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '17)
Published: 2017

30. ARAACOM

Author: Hichem Rahab, Mahieddine Djoudi, Zitouni Abdelhafid, and Abdelhafid Zitouni
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Arabic, Computer science, Sentiment analysis, 020206 networking & telecommunications, 02 engineering and technology, language.human_language, Computer Science - Information Retrieval, Task (project management), World Wide Web, Politics, Mood, 0202 electrical engineering, electronic engineering, information engineering, language, 020201 artificial intelligence & image processing, Computation and Language (cs.CL), Information Retrieval (cs.IR)
Abstract: Nowadays, it is no longer necessary to expend enormous effort distributing forms to thousands of people, collecting them, and then converting them into electronic format in order to track people's opinions on some subject. Many web sites can today reach a large audience with less effort. The majority of web sites invite their visitors to leave feedback about their impressions of the site or of events. This yields a large amount of data that needs powerful means to exploit. Opinion mining on the web is becoming an increasingly attractive task, due to the growing need for individuals and societies to track people's mood on several subjects of daily life (sports, politics, television, ...). Much work in opinion mining has been developed for western languages, especially English, while such work for the Arabic language is still very scarce. In this paper, we propose our approach for opinion mining in Arabic Algerian newspapers. CCS Concepts: Information systems → Sentiment analysis; Computing methodologies → Natural language processing
Published: 2017

31. Distilling Word Embeddings

Author: Lu Zhang, Ge Li, Lili Mou, Ran Jia, Yan Xu, and Zhi Jin
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Theoretical computer science, Artificial neural network, Computer science, business.industry, 02 engineering and technology, 010501 environmental sciences, Machine learning, computer.software_genre, 01 natural sciences, Machine Learning (cs.LG), Set (abstract data type), Computer Science - Learning, Margin (machine learning), 020204 information systems, Encoding (memory), 0202 electrical engineering, electronic engineering, information engineering, Artificial intelligence, business, Computation and Language (cs.CL), computer, Word (computer architecture), 0105 earth and related environmental sciences
Abstract: Distilling knowledge from a well-trained cumbersome network to a small one has recently become a new research topic, as lightweight neural networks with high performance are particularly in need in various resource-restricted systems. This paper addresses the problem of distilling word embeddings for NLP tasks. We propose an encoding approach to distill task-specific knowledge from a set of high-dimensional embeddings, which can reduce model complexity by a large margin as well as retain high accuracy, showing a good compromise between efficiency and performance. Experiments in two tasks reveal the phenomenon that distilling knowledge from cumbersome embeddings is better than directly training neural networks with small embeddings., Accepted by CIKM-16 as a short paper, and by the Representation Learning for Natural Language Processing (RL4NLP) Workshop @ACL-16 for presentation
Published: 2016

32. aNMM

Author: W. Bruce Croft, Qingyao Ai, Liu Yang, and Jiafeng Guo
Subjects: FOS: Computer and information sciences, Feature engineering, Matching (statistics), Computer science, 02 engineering and technology, Machine learning, Convolutional neural network, Computer Science - Information Retrieval, Ranking (information retrieval), 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, Question answering, Semantic matching, Computer Science - Computation and Language, Artificial neural network, Deep learning, Weighting, Ranking, 020201 artificial intelligence & image processing, Artificial intelligence, Computation and Language (cs.CL), Information Retrieval (cs.IR)
Abstract: As an alternative to question answering methods based on feature engineering, deep learning approaches such as convolutional neural networks (CNNs) and Long Short-Term Memory Models (LSTMs) have recently been proposed for semantic matching of questions and answers. To achieve good results, however, these models have been combined with additional features such as word overlap or BM25 scores. Without this combination, these models perform significantly worse than methods based on linguistic feature engineering. In this paper, we propose an attention based neural matching model for ranking short answer text. We adopt value-shared weighting scheme instead of position-shared weighting scheme for combining different matching signals and incorporate question term importance learning using question attention network. Using the popular benchmark TREC QA data, we show that the relatively simple aNMM model can significantly outperform other neural network models that have been used for the question answering task, and is competitive with models that are combined with additional features. When aNMM is combined with additional features, it outperforms all baselines., Comment: Accepted as a full paper by CIKM'16
Published: 2016

33. Attribute-Consistent Knowledge Graph Representation Learning for Multi-Modal Entity Alignment

Author: Qian Li, Shu Guo, Yangyifei Luo, Cheng Ji, Lihong Wang, Jiawei Sheng, and Jianxin Li
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Computation and Language (cs.CL)
Abstract: The multi-modal entity alignment (MMEA) aims to find all equivalent entity pairs between multi-modal knowledge graphs (MMKGs). Rich attributes and neighboring entities are valuable for the alignment task, but existing works ignore contextual gap problems that the aligned entities have different numbers of attributes on specific modality when learning entity representations. In this paper, we propose a novel attribute-consistent knowledge graph representation learning framework for MMEA (ACK-MMEA) to compensate the contextual gaps through incorporating consistent alignment knowledge. Attribute-consistent KGs (ACKGs) are first constructed via multi-modal attribute uniformization with merge and generate operators so that each entity has one and only one uniform feature in each modality. The ACKGs are then fed into a relation-aware graph neural network with random dropouts, to obtain aggregated relation representations and robust entity representations. In order to evaluate the ACK-MMEA facilitated for entity alignment, we specially design a joint alignment loss for both entity and attribute evaluation. Extensive experiments conducted on two benchmark datasets show that our approach achieves excellent performance compared to its competitors.
Published: 2023

34. Response-act Guided Reinforced Dialogue Generation for Mental Health Counseling

Author: Aseem Srivastava, Ishan Pandey, Md Shad Akhtar, and Tanmoy Chakraborty
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Computation and Language (cs.CL)
Abstract: Virtual Mental Health Assistants (VMHAs) have become a prevalent method for receiving mental health counseling in the digital healthcare space. An assistive counseling conversation commences with natural open-ended topics to familiarize the client with the environment and later converges into more fine-grained domain-specific topics. Unlike other conversational systems, which are categorized as open-domain or task-oriented systems, VMHAs possess a hybrid conversational flow. These counseling bots need to comprehend various aspects of the conversation, such as dialogue-acts, intents, etc., to engage the client in an effective conversation. Although the surge in digital health research highlights applications of many general-purpose response generation systems, they are barely suitable in the mental health domain -- the prime reason is the lack of understanding in mental health counseling. Moreover, in general, dialogue-act guided response generators are either limited to a template-based paradigm or lack appropriate semantics. To this end, we propose READER -- a REsponse-Act guided reinforced Dialogue genERation model for the mental health counseling conversations. READER is built on transformer to jointly predict a potential dialogue-act d(t+1) for the next utterance (aka response-act) and to generate an appropriate response u(t+1). Through the transformer-reinforcement-learning (TRL) with Proximal Policy Optimization (PPO), we guide the response generator to abide by d(t+1) and ensure the semantic richness of the responses via BERTScore in our reward computation. We evaluate READER on HOPE, a benchmark counseling conversation dataset and observe that it outperforms several baselines across several evaluation metrics -- METEOR, ROUGE, and BERTScore. We also furnish extensive qualitative and quantitative analyses on results, including error analysis, human evaluation, etc., This paper has been accepted by The Web Conference (WWW) 2023
Published: 2023

35. Extracting Cultural Commonsense Knowledge at Scale

Author: Tuan-Phong Nguyen, Simon Razniewski, Aparna Varde, and Gerhard Weikum
Subjects: FOS: Computer and information sciences, Artificial Intelligence (cs.AI), Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computation and Language (cs.CL)
Abstract: Structured knowledge is important for many AI applications. Commonsense knowledge, which is crucial for robust human-centric AI, is covered by a small number of structured knowledge projects. However, they lack knowledge about human traits and behaviors conditioned on socio-cultural contexts, which is crucial for situative AI. This paper presents CANDLE, an end-to-end methodology for extracting high-quality cultural commonsense knowledge (CCSK) at scale. CANDLE extracts CCSK assertions from a huge web corpus and organizes them into coherent clusters, for 3 domains of subjects (geography, religion, occupation) and several cultural facets (food, drinks, clothing, traditions, rituals, behaviors). CANDLE includes judicious techniques for classification-based filtering and scoring of interestingness. Experimental evaluations show the superiority of the CANDLE CCSK collection over prior works, and an extrinsic use case demonstrates the benefits of CCSK for the GPT-3 language model. Code and data can be accessed at https://candle.mpi-inf.mpg.de/., 11 pages, 6 figures, 10 tables
Published: 2023

36. HateProof: Are Hateful Meme Detection Systems really Robust?

Author: Piush Aggarwal, Pranit Chawla, Mithun Das, Punyajoy Saha, Binny Mathew, Torsten Zesch, and Animesh Mukherjee
Subjects: FOS: Computer and information sciences, Artificial Intelligence (cs.AI), Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computation and Language (cs.CL)
Abstract: Exploiting social media to spread hate has tremendously increased over the years. Lately, multi-modal hateful content such as memes has drawn relatively more traction than uni-modal content. Moreover, the availability of implicit content payloads makes them fairly challenging to be detected by existing hateful meme detection systems. In this paper, we present a use case study to analyze such systems' vulnerabilities against external adversarial attacks. We find that even very simple perturbations in uni-modal and multi-modal settings performed by humans with little knowledge about the model can make the existing detection models highly vulnerable. Empirically, we find a noticeable performance drop of as high as 10% in the macro-F1 score for certain attacks. As a remedy, we attempt to boost the model's robustness using contrastive learning as well as an adversarial training-based method - VILLA. Using an ensemble of the above two approaches, in two of our high resolution datasets, we are able to (re)gain back the performance to a large extent for certain attacks. We believe that ours is a first step toward addressing this crucial problem in an adversarial setting and would inspire more such investigations in the future., Accepted at TheWebConf'2023 (WWW'2023)
Published: 2023

37. Heterogeneous Federated Knowledge Graph Embedding Learning and Unlearning

Author: Xiangrong Zhu, Guangyao Li, and Wei Hu
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Computation and Language, Computation and Language (cs.CL), Machine Learning (cs.LG)
Abstract: Federated Learning (FL) has recently emerged as a paradigm to train a global machine learning model across distributed clients without sharing raw data. Knowledge Graph (KG) embedding represents KGs in a continuous vector space, serving as the backbone of many knowledge-driven applications. As a promising combination, federated KG embedding can take full advantage of knowledge learned from different clients while preserving the privacy of local data. However, realistic problems such as data heterogeneity and knowledge forgetting remain to be addressed. In this paper, we propose FedLU, a novel FL framework for heterogeneous KG embedding learning and unlearning. To cope with the drift between local optimization and global convergence caused by data heterogeneity, we propose mutual knowledge distillation to transfer local knowledge to the global model and absorb global knowledge back. Moreover, we present an unlearning method based on cognitive neuroscience, which combines retroactive interference and passive decay to erase specific knowledge from local clients and propagate the erasure to the global model by reusing knowledge distillation. We construct new datasets for assessing the realistic performance of state-of-the-art methods. Extensive experiments show that FedLU achieves superior results in both link prediction and knowledge forgetting., Accepted in the ACM Web Conference (WWW 2023)
Published: 2023

38. CTRLStruct: Dialogue Structure Learning for Open-Domain Response Generation

Author: Congchi Yin, Piji Li, and Zhaochun Ren
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Computation and Language (cs.CL)
Abstract: Dialogue structure discovery is essential in dialogue generation. Well-structured topic flow can leverage background information and predict future topics to help generate controllable and explainable responses. However, most previous work focused on dialogue structure learning in task-oriented dialogue rather than open-domain dialogue, which is more complicated and challenging. In this paper, we present a new framework, CTRLStruct, for dialogue structure learning to effectively explore topic-level dialogue clusters as well as their transitions with unlabelled information. Specifically, dialogue utterances encoded by a bi-directional Transformer are further trained through a specially designed contrastive learning task to improve representation. Then we perform clustering on utterance-level representations and form topic-level clusters that can be considered as vertices in the dialogue structure graph. The edges in the graph, indicating transition probability between vertices, are calculated by mimicking expert behavior in datasets. Finally, the dialogue structure graph is integrated into the dialogue model to perform controlled response generation. Experiments on two popular open-domain dialogue datasets show our model can generate more coherent responses compared to some excellent dialogue models, as well as outperform some typical sentence embedding methods in dialogue utterance representation. Code is available on GitHub., 12 pages, to be published in The Web Conference 2023
Published: 2023

39. A Dual Prompt Learning Framework for Few-Shot Dialogue State Tracking

Author: Yuting Yang, Wenqiang Lei, Pei Huang, Juan Cao, Jintao Li, and Tat-Seng Chua
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Computation and Language (cs.CL)
Abstract: Dialogue state tracking (DST) module is an important component for task-oriented dialog systems to understand users' goals and needs. Collecting dialogue state labels including slots and values can be costly, especially with the wide application of dialogue systems in more and more new-rising domains. In this paper, we focus on how to utilize the language understanding and generation ability of pre-trained language models for DST. We design a dual prompt learning framework for few-shot DST. Specifically, we consider the learning of slot generation and value generation as dual tasks, and two prompts are designed based on such a dual structure to incorporate task-related knowledge of these two tasks respectively. In this way, the DST task can be formulated as a language modeling task efficiently under few-shot settings. Experimental results on two task-oriented dialogue datasets show that the proposed method not only outperforms existing state-of-the-art few-shot methods, but also can generate unseen slots. It indicates that DST-related knowledge can be probed from PLM and utilized to address low-resource DST efficiently with the help of prompt learning.
Published: 2023

40. Mental Health Coping Stories on Social Media: A Causal-Inference Study of Papageno Effect

Author: Yunhao Yuan, Koustuv Saha, Barbara Keller, Erkki Tapio Isometsä, Talayeh Aledavood, Ding, Ying, Tang, Jie, Sequeda, Juan, Aroyo, Lora, Castillo, Carlos, Houben, Geert-Jan, Department of Psychiatry, Faculty of Medicine, Department of Computer Science, Microsoft Research, Lecturer Keller Barbara group, University of Helsinki, Aalto-yliopisto, and Aalto University
Subjects: Social and Information Networks (cs.SI), FOS: Computer and information sciences, Computer Science - Computation and Language, Papageno effect, social media, 518 Media and communications, Computer Science - Social and Information Networks, 3142 Public health care science, environmental and occupational health, suicidal ideation, Computer Science - Computers and Society, Computers and Society (cs.CY), causal inference, Computation and Language (cs.CL), mental health, natural language
Abstract: The Papageno effect concerns how media can play a positive role in preventing and mitigating suicidal ideation and behaviors. With the increasing ubiquity and widespread use of social media, individuals often express and share lived experiences and struggles with mental health. However, there is a gap in our understanding about the existence and effectiveness of the Papageno effect in social media, which we study in this paper. In particular, we adopt a causal-inference framework to examine the impact of exposure to mental health coping stories on individuals on Twitter. We obtain a Twitter dataset with ∼2M posts by ∼10K individuals. We consider engaging with coping stories as the Treatment intervention, and adopt a stratified propensity score approach to find matched cohorts of Treatment and Control individuals. We measure the psychosocial shifts in affective, behavioral, and cognitive outcomes in longitudinal Twitter data before and after engaging with the coping stories. Our findings reveal that, engaging with coping stories leads to decreased stress and depression, and improved expressive writing, diversity, and interactivity. Our work discusses the practical and platform design implications in supporting mental wellbeing.
Published: 2023

41. Structure Pretraining and Prompt Tuning for Knowledge Graph Transfer

Author: Wen Zhang, Yushan Zhu, Mingyang Chen, Yuxia Geng, Yufeng Huang, Yajing Xu, Wenting Song, and Huajun Chen
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Computation and Language, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Computation and Language (cs.CL), Machine Learning (cs.LG)
Abstract: Knowledge graphs (KGs) are essential background knowledge providers in many tasks. When designing models for KG-related tasks, one of the key tasks is to devise the Knowledge Representation and Fusion (KRF) module that learns the representation of elements from KGs and fuses them with task representations. However, because KGs and the perspectives to be considered during fusion differ across tasks, duplicate and ad hoc KRF modules are designed for each task. In this paper, we propose a novel knowledge graph pretraining model, KGTransformer, that could serve as a uniform KRF module in diverse KG-related tasks. We pretrain KGTransformer with three self-supervised tasks with sampled sub-graphs as input. For utilization, we propose a general prompt-tuning mechanism regarding task data as a triple prompt to allow flexible interactions between task KGs and task data. We evaluate pretrained KGTransformer on three tasks: triple classification, zero-shot image classification, and question answering. KGTransformer consistently achieves better results than specifically designed task models. Through experiments, we justify that the pretrained KGTransformer could be used off the shelf as a general and effective KRF module across KG-related tasks. The code and datasets are available at https://github.com/zjukg/KGTransformer., Comment: Work accepted by WWW2023
Published: 2023

42. Which Factors Predict the Chat Experience of a Natural Language Generation Dialogue Service?

Author: Eason Chen
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Computer Science - Human-Computer Interaction, Computation and Language (cs.CL), Human-Computer Interaction (cs.HC)
Abstract: In this paper, we propose a conceptual model to predict the chat experience in a natural language generation dialog system. We evaluated the model with 120 participants using Partial Least Squares Structural Equation Modeling (PLS-SEM) and obtained an R-squared (R²) of 0.541. The model considers various factors, including the prompts used for generation; coherence, sentiment, and similarity in the conversation; and users' perceived favorability of the dialog agents. We then further explore the effectiveness of subsets of our proposed model. The results showed that users' favorability and coherence, sentiment, and similarity in the dialogue are positive predictors of users' chat experience. Moreover, we found users may prefer dialog agents with characteristics of Extroversion, Openness, Conscientiousness, Agreeableness, and Non-Neuroticism. Through our research, an adaptive dialog system might use collected data to infer factors in our model, predict the chat experience for users through these factors, and optimize it by adjusting prompts., CHI EA'23, April 23-28, 2023, Hamburg, Germany
Published: 2023

43. What Types of Questions Require Conversation to Answer? A Case Study of AskReddit Questions

Author: Shih-Hong Huang, Chieh-Yang Huang, Ya-Fang Lin, and Ting-Hao Kenneth Huang
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Computer Science - Human-Computer Interaction, Computation and Language (cs.CL), Human-Computer Interaction (cs.HC)
Abstract: The proliferation of automated conversational systems such as chatbots, spoken-dialogue systems, and smart speakers, has significantly impacted modern digital life. However, these systems are primarily designed to provide answers to well-defined questions rather than to support users in exploring complex, ill-defined questions. In this paper, we aim to push the boundaries of conversational systems by examining the types of nebulous, open-ended questions that can best be answered through conversation. We first sampled 500 questions from one million open-ended requests posted on AskReddit, and then recruited online crowd workers to answer eight inquiries about these questions. We also performed open coding to categorize the questions into 27 different domains. We found that the issues people believe require conversation to resolve satisfactorily are highly social and personal. Our work provides insights into how future research could be geared to align with users' needs., Comment: To appear in CHI 2023 Late-Breaking Work
Published: 2023

44. Twitter Opinion Topic Model

Author: Kar Wai Lim and Wray Buntine
Subjects: FOS: Computer and information sciences, Topic model, Computer Science - Computation and Language, Information retrieval, Computer science, Sentiment analysis, Lexicon, Latent Dirichlet allocation, Machine Learning (cs.LG), Computer Science - Information Retrieval, Computer Science - Learning, Social media, Product (category theory), Computation and Language (cs.CL), Information Retrieval (cs.IR), Natural language
Abstract: Aspect-based opinion mining is widely applied to review data to aggregate or summarize opinions of a product, and the current state-of-the-art is achieved with Latent Dirichlet Allocation (LDA)-based model. Although social media data like tweets are laden with opinions, their "dirty" nature (as natural language) has discouraged researchers from applying LDA-based opinion model for product review mining. Tweets are often informal, unstructured and lacking labeled data such as categories and ratings, making it challenging for product opinion mining. In this paper, we propose an LDA-based opinion model named Twitter Opinion Topic Model (TOTM) for opinion mining and sentiment analysis. TOTM leverages hashtags, mentions, emoticons and strong sentiment words that are present in tweets in its discovery process. It improves opinion prediction by modeling the target-opinion interaction directly, thus discovering target specific opinion words, neglected in existing approaches. Moreover, we propose a new formulation of incorporating sentiment prior information into a topic model, by utilizing an existing public sentiment lexicon. This is novel in that it learns and updates with the data. We conduct experiments on 9 million tweets on electronic products, and demonstrate the improved performance of TOTM in both quantitative evaluations and qualitative analysis. We show that aspect-based opinion analysis on massive volume of tweets provides useful opinions on products., CIKM paper
Published: 2014

45. Evaluating Impact of Social Media Posts by Executives on Stock Prices

Author: Anubhav Sarkar, Swagata Chakraborty, Sohom Ghosh, and Sudip Kumar Naskar
Subjects: Social and Information Networks (cs.SI), FOS: Economics and business, FOS: Computer and information sciences, Statistical Finance (q-fin.ST), Computer Science - Computation and Language, Quantitative Finance - Statistical Finance, Computer Science - Social and Information Networks, Computation and Language (cs.CL), I.7, Information Retrieval (cs.IR), Computer Science - Information Retrieval
Abstract: Predicting stock market movements has always been of great interest to investors and an active area of research. Research has proven that popularity of products is highly influenced by what people talk about. Social media like Twitter, Reddit have become hotspots of such influences. This paper investigates the impact of social media posts on close price prediction of stocks using Twitter and Reddit posts. Our objective is to integrate sentiment of social media data with historical stock data and study its effect on closing prices using time series models. We carried out rigorous experiments and deep analysis using multiple deep learning based models on different datasets to study the influence of posts by executives and general people on the close price. Experimental results on multiple stocks (Apple and Tesla) and decentralised currencies (Bitcoin and Ethereum) consistently show improvements in prediction on including social media data and greater improvements on including executive posts., Accepted at the 14th meeting of Forum for Information Retrieval Evaluation (FIRE-2022)
Published: 2022

46. Exploring and evaluating personalized models for code generation

Author: Zlotchevski, Andrei, Drain, Dawn, Svyatkovskiy, Alexey, Clement, Colin, Sundaresan, Neel, and Tufano, Michele
Subjects: Software Engineering (cs.SE), FOS: Computer and information sciences, Computer Science - Software Engineering, Computer Science - Machine Learning, Computer Science - Computation and Language, Computation and Language (cs.CL), Machine Learning (cs.LG)
Abstract: Large Transformer models achieved the state-of-the-art status for Natural Language Understanding tasks and are increasingly becoming the baseline model architecture for modeling source code. Transformers are usually pre-trained on large unsupervised corpora, learning token representations and transformations relevant to modeling generally available text, and are then fine-tuned on a particular downstream task of interest. While fine-tuning is a tried-and-true method for adapting a model to a new domain -- for example, question-answering on a given topic -- generalization remains an on-going challenge. In this paper, we explore and evaluate transformer model fine-tuning for personalization. In the context of generating unit tests for Java methods, we evaluate learning to personalize to a specific software project using several personalization techniques. We consider three key approaches: (i) custom fine-tuning, which allows all the model parameters to be tuned; (ii) lightweight fine-tuning, which freezes most of the model's parameters, allowing tuning of the token embeddings and softmax layer only or the final layer alone; (iii) prefix tuning, which keeps model parameters frozen, but optimizes a small project-specific prefix vector. Each of these techniques offers a trade-off in total compute cost and predictive performance, which we evaluate by code and task-specific metrics, training time, and total computational operations. We compare these fine-tuning strategies for code generation and discuss the potential generalization and cost benefits of each in various deployment scenarios., Comment: Accepted to the ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2022), Industry Track - Singapore, November 14-18, 2022, to appear 9 pages
Published: 2022

47. Introducing Neural Bag of Whole-Words with ColBERTer

Author: Hofstätter, Sebastian, Khattab, Omar, Althammer, Sophia, Sertkan, Mete, and Hanbury, Allan
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Computation and Language, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Computation and Language (cs.CL), Information Retrieval (cs.IR), Computer Science - Information Retrieval, Machine Learning (cs.LG)
Abstract: Recent progress in neural information retrieval has demonstrated large gains in effectiveness, while often sacrificing the efficiency and interpretability of the neural model compared to classical approaches. This paper proposes ColBERTer, a neural retrieval model using contextualized late interaction (ColBERT) with enhanced reduction. Along the effectiveness Pareto frontier, ColBERTer's reductions dramatically lower ColBERT's storage requirements while simultaneously improving the interpretability of its token-matching scores. To this end, ColBERTer fuses single-vector retrieval, multi-vector refinement, and optional lexical matching components into one model. For its multi-vector component, ColBERTer reduces the number of stored vectors per document by learning unique whole-word representations for the terms in each document and learning to identify and remove word representations that are not essential to effective scoring. We employ an explicit multi-task, multi-stage training to facilitate using very small vector dimensions. Results on the MS MARCO and TREC-DL collection show that ColBERTer can reduce the storage footprint by up to 2.5x, while maintaining effectiveness. With just one dimension per token in its smallest setting, ColBERTer achieves index storage parity with the plaintext size, with very strong effectiveness results. Finally, we demonstrate ColBERTer's robustness on seven high-quality out-of-domain collections, yielding statistically significant gains over traditional retrieval baselines.
Published: 2022

48. MCSCSet: A Specialist-annotated Dataset for Medical-domain Chinese Spelling Correction

Author: Wangjie Jiang, Zhihao Ye, Zijing Ou, Ruihui Zhao, Jianguang Zheng, Yi Liu, Bang Liu, Siheng Li, Yujiu Yang, and Yefeng Zheng
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Computation and Language (cs.CL)
Abstract: Chinese Spelling Correction (CSC) is gaining increasing attention due to its promise of automatically detecting and correcting spelling errors in Chinese texts. Despite its extensive use in many applications, like search engines and optical character recognition systems, little has been explored in medical scenarios in which complex and uncommon medical entities are easily misspelled. Correcting the misspellings of medical entities is arguably more difficult than those in the open domain due to its requirements of specific domain knowledge. In this work, we define the task of Medical-domain Chinese Spelling Correction and propose MCSCSet, a large scale specialist-annotated dataset that contains about 200k samples. In contrast to the existing open-domain CSC datasets, MCSCSet involves: i) extensive real-world medical queries collected from Tencent Yidian, ii) corresponding misspelled sentences manually annotated by medical specialists. To ensure automated dataset curation, MCSCSet further offers a medical confusion set consisting of the commonly misspelled characters of given Chinese medical terms. This enables one to create the medical misspelling dataset automatically. Extensive empirical studies have shown significant performance gaps between the open-domain and medical-domain spelling correction, highlighting the need to develop high-quality datasets that allow for Chinese spelling correction in specific domains. Moreover, our work benchmarks several representative Chinese spelling correction models, establishing baselines for future work., The full version of CIKM 2022 accepted resource paper "MCSCSet: A Specialist-annotated Dataset for Medical-domain Chinese Spelling Correction". (https://dl.acm.org/doi/10.1145/3511808.3557636)
Published: 2022

49. Unified Multimodal Model with Unlikelihood Training for Visual Dialog

Author: Wang, Zihao, Wang, Junli, and Jiang, Changjun
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Computation and Language (cs.CL)
Abstract: The task of visual dialog requires a multimodal chatbot to answer sequential questions from humans about image content. Prior work performs the standard likelihood training for answer generation on the positive instances (involving correct answers). However, the likelihood objective often leads to frequent and dull outputs and fails to exploit the useful knowledge from negative instances (involving incorrect answers). In this paper, we propose a Unified Multimodal Model with UnLikelihood Training, named UniMM-UL, to tackle this problem. First, to improve visual dialog understanding and generation by multi-task learning, our model extends ViLBERT from only supporting answer discrimination to holding both answer discrimination and answer generation seamlessly by different attention masks. Specifically, in order to make the original discriminative model compatible with answer generation, we design novel generative attention masks to implement the autoregressive Masked Language Modeling (autoregressive MLM) task. And to attenuate the adverse effects of the likelihood objective, we exploit unlikelihood training on negative instances to make the model less likely to generate incorrect answers. Then, to utilize dense annotations, we adopt different fine-tuning methods for both generating and discriminating answers, rather than just for discriminating answers as in the prior work. Finally, on the VisDial dataset, our model achieves the best generative results (69.23 NDCG score). And our model also yields comparable discriminative results with the state-of-the-art in both single-model and ensemble settings (75.92 and 76.17 NDCG scores)., Comment: Accepted by the 30th ACM International Conference on Multimedia (ACM MM 2022)
Published: 2022

50. DeepPerform: An Efficient Approach for Performance Testing of Resource-Constrained Neural Networks

Author: Simin Chen, Mirazul Haque, Cong Liu, and Wei Yang
Subjects: Software Engineering (cs.SE), FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Software Engineering, Computer Science - Computation and Language, Computation and Language (cs.CL), Machine Learning (cs.LG)
Abstract: Today, an increasing number of Adaptive Deep Neural Networks (AdNNs) are being used on resource-constrained embedded devices. We observe that, similar to traditional software, redundant computation exists in AdNNs, resulting in considerable performance degradation. The performance degradation is dependent on the input and is referred to as input-dependent performance bottlenecks (IDPBs). To ensure an AdNN satisfies the performance requirements of resource-constrained applications, it is essential to conduct performance testing to detect IDPBs in the AdNN. Existing neural network testing methods are primarily concerned with correctness testing, which does not involve performance testing. To fill this gap, we propose DeepPerform, a scalable approach to generate test samples to detect the IDPBs in AdNNs. We first demonstrate how the problem of generating performance test samples detecting IDPBs can be formulated as an optimization problem. Following that, we demonstrate how DeepPerform efficiently handles the optimization problem by learning and estimating the distribution of AdNNs' computational consumption. We evaluate DeepPerform on three widely used datasets against five popular AdNN models. The results show that DeepPerform generates test samples that cause more severe performance degradation (FLOPs: increase up to 552%). Furthermore, DeepPerform is substantially more efficient than the baseline methods in generating test inputs (runtime overhead: only 6-10 milliseconds). This paper is accepted to IEEE/ACM International Conference on Automated Software Engineering 2022
Published: 2022
