Author: "Timothy Baldwin" / Topic: computer - Searchworks@Jio Institute Digital Library Search Results

1. Evaluating Document Coherence Modeling

Author: Aili Shen, Meladel Mistica, Jianzhong Qi, Hang Li, Bahar Salehi, and Timothy Baldwin
Subjects: Linguistics and Language, business.industry, Computer science, Communication, 02 engineering and technology, Coherence (statistics), Intrusion detection system, computer.software_genre, Computer Science Applications, Task (project management), Human-Computer Interaction, 03 medical and health sciences, Range (mathematics), 0302 clinical medicine, Artificial Intelligence, 030221 ophthalmology & optometry, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Language model, Artificial intelligence, business, computer, Natural language processing, Sentence
Abstract: While pretrained language models (LMs) have driven impressive gains over morpho-syntactic and semantic tasks, their ability to model discourse and pragmatic phenomena is less clear. As a step towards a better understanding of their discourse modeling capabilities, we propose a sentence intrusion detection task. We examine the performance of a broad range of pretrained LMs on this detection task for English. Lacking a dataset for the task, we introduce INSteD, a novel intruder sentence detection dataset, containing 170,000+ documents constructed from English Wikipedia and CNN news articles. Our experiments show that pretrained LMs perform impressively in in-domain evaluation, but experience a substantial drop in the cross-domain setting, indicating limited generalization capacity. Further results over a novel linguistic probe dataset show that there is substantial room for improvement, especially in the cross-domain setting.
Published: 2021

2. A General Approach to Multimodal Document Quality Assessment

Author: Timothy Baldwin, Bahar Salehi, Jianzhong Qi, and Aili Shen
Subjects: Computer science, business.industry, media_common.quotation_subject, Context (language use), 02 engineering and technology, computer.software_genre, Readability, Rendering (computer graphics), Task (project management), Artificial Intelligence, Font, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Grammaticality, Quality (business), Artificial intelligence, business, Feature learning, computer, Natural language processing, media_common
Abstract: The perceived quality of a document is affected by various factors, including grammaticality, readability, stylistics, and expertise depth, making the task of document quality assessment a complex one. In this paper, we explore this task in the context of assessing the quality of Wikipedia articles and academic papers. Observing that the visual rendering of a document can capture implicit quality indicators that are not present in the document text — such as images, font choices, and visual layout — we propose a joint model that combines the text content with a visual rendering of the document for document quality assessment. Our joint model achieves state-of-the-art results over five datasets in two domains (Wikipedia and academic papers), which demonstrates the complementarity of textual and visual features, and the general applicability of our model. To examine what kinds of features our model has learned, we further train our model in a multi-task learning setting, where document quality assessment is the primary task and feature learning is an auxiliary task. Experimental results show that visual embeddings are better at learning structural features while textual embeddings are better at learning readability scores, which further verifies the complementarity of visual and textual features.
Published: 2020

3. Evaluating the Efficacy of Summarization Evaluation across Languages

Author: Jey Han Lau, Fajri Koto, and Timothy Baldwin
Subjects: FOS: Computer and information sciences, Focus (computing), Computer Science - Computation and Language, Recall, business.industry, Computer science, computer.software_genre, Automatic summarization, Evaluation methods, Artificial intelligence, business, computer, Computation and Language (cs.CL), Natural language processing
Abstract: While automatic summarization evaluation methods developed for English are routinely applied to other languages, this is the first attempt to systematically quantify their panlinguistic efficacy. We take a summarization corpus for eight different languages, and manually annotate generated summaries for focus (precision) and coverage (recall). Based on this, we evaluate 19 summarization evaluation metrics, and find that using multilingual BERT within BERTScore performs well across all languages, at a level above that for English. Comment: Findings of ACL 2021
Published: 2021

4. Discourse Probing of Pretrained Language Models

Author: Timothy Baldwin, Fajri Koto, and Jey Han Lau
Subjects: FOS: Computer and information sciences, 050101 languages & linguistics, Computer Science - Computation and Language, Computer science, business.industry, 05 social sciences, Baseline model, 02 engineering and technology, computer.software_genre, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, 0501 psychology and cognitive sciences, Artificial intelligence, Language model, business, computer, Encoder, Computation and Language (cs.CL), Natural language processing
Abstract: Existing work on probing of pretrained language models (LMs) has predominantly focused on sentence-level syntactic tasks. In this paper, we introduce document-level discourse probing to evaluate the ability of pretrained LMs to capture document-level relations. We experiment with 7 pretrained LMs, 4 languages, and 7 discourse probing tasks, and find BART to be overall the best model at capturing discourse -- but only in its encoder, with BERT performing surprisingly well as the baseline model. Across the different models, there are substantial differences in which layers best capture discourse information, and large disparities between models. Comment: Accepted at NAACL 2021
Published: 2021

5. ChEMU 2020: Natural Language Processing Methods Are Effective for Information Extraction From Chemical Patents

Author: Jiayuan He, Dat Quoc Nguyen, Saber A. Akhondi, Christian Druckenbrodt, Camilo Thorne, Ralph Hoessel, Zubair Afzal, Zenan Zhai, Biaoyan Fang, Hiyori Yoshikawa, Ameer Albahem, Lawrence Cavedon, Trevor Cohn, Timothy Baldwin, and Karin Verspoor
Subjects: 0301 basic medicine, event extraction, Computer science, named entity recognition, computer.software_genre, 01 natural sciences, Bibliography. Library science. Information resources, Task (project management), 03 medical and health sciences, Named-entity recognition, Research Metrics and Analytics, information extraction, patent text mining, Original Research, Information retrieval, Event (computing), chemical reactions, cheminformatics, 0104 chemical sciences, 010404 medicinal & biomolecular chemistry, Information extraction, Identification (information), 030104 developmental biology, Cheminformatics, Key (cryptography), computer, Test data
Abstract: Chemical patents represent a valuable source of information about new chemical compounds, which is critical to the drug discovery process. Automated information extraction over chemical patents is, however, a challenging task due to the large volume of existing patents and the complex linguistic properties of chemical patents. The Cheminformatics Elsevier Melbourne University (ChEMU) evaluation lab 2020, part of the Conference and Labs of the Evaluation Forum 2020 (CLEF2020), was introduced to support the development of advanced text mining techniques for chemical patents. The ChEMU 2020 lab proposed two fundamental information extraction tasks focusing on chemical reaction processes described in chemical patents: (1) chemical named entity recognition, requiring identification of essential chemical entities and their roles in chemical reactions, as well as reaction conditions; and (2) event extraction, which aims at identification of event steps relating the entities involved in chemical reactions. The ChEMU 2020 lab received 37 team registrations and 46 runs. Overall, the performance of submissions for these tasks exceeded our expectations, with the top systems outperforming strong baselines. We further show the methods to be robust to variations in sampling of the test data. We provide a detailed overview of the ChEMU 2020 corpus and its annotation, showing that inter-annotator agreement is very strong. We also present the methods adopted by participants, provide a detailed analysis of their performance, and carefully consider the potential impact of data leakage on interpretation of the results. The ChEMU 2020 Lab has shown the viability of automated methods to support information extraction of key information in chemical patents.
Published: 2021

6. ChEMU 2021: Reaction Reference Resolution and Anaphora Resolution in Chemical Patents

Author: Zubair Afzal, Saber A. Akhondi, Hiyori Yoshikawa, Karin Verspoor, Camilo Thorne, Yuan Li, Lawrence Cavedon, Zenan Zhai, Jiayuan He, Trevor Cohn, Christian Druckenbrodt, Timothy Baldwin, and Biaoyan Fang
Subjects: Information extraction, Information retrieval, Computer science, Cheminformatics, Anaphora (linguistics), Resolution (logic), computer.software_genre, computer, Task (project management)
Abstract: Chemical patents serve as an indispensable source of information about new discoveries of chemical compounds. The ChEMU (Cheminformatics Elsevier Melbourne University) lab addresses information extraction over chemical patents, and aims to advance the state of the art on this topic. ChEMU lab 2021, as part of the 12th Conference and Labs of the Evaluation Forum (CLEF-2021), will be the second ChEMU lab. ChEMU 2021 will provide two distinct tasks related to reference resolution in chemical patents. Task 1—Chemical Reaction Reference Resolution—focuses on paragraph-level references and aims to identify the chemical reactions or general conditions specified in one reaction description referred to by another. Task 2—Anaphora Resolution—focuses on expression-level references and aims to identify the reference relationships between expressions in chemical reaction descriptions. In this paper, we introduce ChEMU 2021, including its motivation, goals, tasks, resources, and evaluation framework.
Published: 2021

7. Overview of ChEMU 2021: Reaction Reference Resolution and Anaphora Resolution in Chemical Patents

Author: Hiyori Yoshikawa, Christian Druckenbrodt, Saber A. Akhondi, Yuan Li, Zenan Zhai, Biaoyan Fang, Zubair Afzal, Timothy Baldwin, Jiayuan He, Karin Verspoor, and Camilo Thorne
Subjects: Information extraction, Information retrieval, Computer science, Cheminformatics, Anaphora (linguistics), Resolution (logic), computer.software_genre, computer, Clef, Task (project management)
Abstract: In this paper, we provide an overview of the Cheminformatics Elsevier Melbourne University (ChEMU) evaluation lab 2021, part of the Conference and Labs of the Evaluation Forum 2021 (CLEF 2021). The ChEMU evaluation lab focuses on information extraction over chemical reactions from patent texts. As the second instance of our ChEMU lab series, we build upon the ChEMU corpus developed for ChEMU 2020, extending it for two distinct tasks related to reference resolution in chemical patents. Task 1—Chemical Reaction Reference Resolution—focuses on paragraph-level references and aims to identify the chemical reactions or general conditions specified in one reaction description referred to by another. Task 2—Anaphora Resolution—focuses on expression-level references and aims to identify the reference relationships between expressions in chemical reaction descriptions. Herein, we describe the resources created for these tasks and the evaluation methodology adopted. We also provide a brief summary of the results obtained in this lab, finding that one submission achieves substantially better results than our baseline models.
Published: 2021

8. ChEMU-Ref: A Corpus for Modeling Anaphora Resolution in the Chemical Domain

Author: Saber A. Akhondi, Timothy Baldwin, Karin Verspoor, Biaoyan Fang, Jiayuan He, and Christian Druckenbrodt
Subjects: Scheme (programming language), Coreference, Bridging (networking), business.industry, Computer science, Resolution (logic), computer.software_genre, Domain (software engineering), Annotation, Artificial intelligence, business, computer, Natural language processing, computer.programming_language, Anaphora (rhetoric)
Abstract: Chemical patents contain rich coreference and bridging links, which are the target of this research. Specifically, we introduce a novel annotation scheme, based on which we create the ChEMU-Ref dataset from reaction description snippets in English-language chemical patents. We propose a neural approach to anaphora resolution, which we show to achieve strong results, especially when jointly trained over coreference and bridging links.
Published: 2021

9. Top-down Discourse Parsing via Sequence Labelling

Author: Timothy Baldwin, Jey Han Lau, and Fajri Koto
Subjects: FOS: Computer and information sciences, Sequence, Computer Science - Computation and Language, Parsing, Computer science, business.industry, Framing (World Wide Web), Top-down and bottom-up design, computer.software_genre, Oracle, Task (project management), Metric (mathematics), Artificial intelligence, business, Computation and Language (cs.CL), computer, Natural language processing, Transformer (machine learning model)
Abstract: We introduce a top-down approach to discourse parsing that is conceptually simpler than its predecessors (Kobayashi et al., 2020; Zhang et al., 2020). By framing the task as a sequence labelling problem where the goal is to iteratively segment a document into individual discourse units, we are able to eliminate the decoder and reduce the search space for splitting points. We explore both traditional recurrent models and modern pre-trained transformer models for the task, and additionally introduce a novel dynamic oracle for top-down parsing. Based on the Full metric, our proposed LSTM model sets a new state-of-the-art for RST parsing. Comment: Accepted at EACL 2021
Published: 2021

10. On the (In)Effectiveness of Images for Text Classification

Author: Aili Shen, Hiyori Yoshikawa, Chunpeng Ma, Tomoya Iwakura, Timothy Baldwin, and Daniel Beck
Subjects: Focus (computing), Computer science, business.industry, Core component, Context (language use), computer.software_genre, Confounding effect, Named entity, External data, Artificial intelligence, business, computer, Natural language processing, Range (computer programming), Complement (set theory)
Abstract: Images are core components of multi-modal learning in natural language processing (NLP), and results have varied substantially as to whether images improve NLP tasks or not. One confounding effect has been that previous NLP research has generally focused on sophisticated tasks (in varying settings), generally applied to English only. We focus on text classification, in the context of assigning named entity classes to a given Wikipedia page, where images generally complement the text and the Wikipedia page can be in one of a number of different languages. Our experiments across a range of languages show that images complement NLP models (including BERT) trained without external pre-training, but when combined with BERT models pre-trained on large-scale external data, images contribute nothing.
Published: 2021

11. Diverse Adversaries for Mitigating Bias in Training

Author: Trevor Cohn, Timothy Baldwin, and Xudong Han
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Computer science, business.industry, Computer Science - Artificial Intelligence, Stability (learning theory), Machine learning, computer.software_genre, Training (civil), Machine Learning (cs.LG), Adversarial system, Artificial Intelligence (cs.AI), Learning based, Artificial intelligence, business, computer, Model bias
Abstract: Adversarial learning can learn fairer and less biased models of language than standard methods. However, current adversarial techniques only partially mitigate model bias, added to which their training procedures are often unstable. In this paper, we propose a novel approach to adversarial learning based on the use of multiple diverse discriminators, whereby discriminators are encouraged to learn orthogonal hidden representations from one another. Experimental results show that our method substantially improves over standard adversarial removal methods, in terms of reducing bias and the stability of training. Comment: EACL 2021 (5 pages + 1 references)
Published: 2021

12. IndoLEM and IndoBERT: A Benchmark Dataset and Pre-trained Language Model for Indonesian NLP

Author: Timothy Baldwin, Fajri Koto, Jey Han Lau, and Afshin Rahimi
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, business.industry, Computer science, 010501 environmental sciences, computer.software_genre, Semantics, 01 natural sciences, language.human_language, Indonesian, 030507 speech-language pathology & audiology, 03 medical and health sciences, Resource (project management), language, Benchmark (computing), Artificial intelligence, Language model, 0305 other medical science, business, computer, Computation and Language (cs.CL), Natural language processing, 0105 earth and related environmental sciences, Spoken language
Abstract: Although the Indonesian language is spoken by almost 200 million people and is the 10th most spoken language in the world, it is under-represented in NLP research. Previous work on Indonesian has been hampered by a lack of annotated datasets, a sparsity of language resources, and a lack of resource standardization. In this work, we release the IndoLEM dataset comprising seven tasks for the Indonesian language, spanning morpho-syntax, semantics, and discourse. We additionally release IndoBERT, a new pre-trained language model for Indonesian, and evaluate it over IndoLEM, in addition to benchmarking it against existing resources. Our experiments show that IndoBERT achieves state-of-the-art performance over most of the tasks in IndoLEM. Comment: Accepted at COLING 2020, the 28th International Conference on Computational Linguistics
Published: 2020

13. Target Word Masking for Location Metonymy Resolution

Author: Timothy Baldwin, Maria Vasardani, Haonan Li, and Martin Tomko
Subjects: FOS: Computer and information sciences, Metonymy, Parsing, Computer Science - Computation and Language, Computer science, business.industry, 68T50, I.2.7, 02 engineering and technology, Toponymy, 010502 geochemistry & geophysics, computer.software_genre, 01 natural sciences, Masking (Electronic Health Record), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, business, Computation and Language (cs.CL), computer, Natural language processing, 0105 earth and related environmental sciences
Abstract: Existing metonymy resolution approaches rely on features extracted from external resources like dictionaries and hand-crafted lexical resources. In this paper, we propose an end-to-end word-level classification approach based only on BERT, without dependencies on taggers, parsers, curated dictionaries of place names, or other external resources. We show that our approach achieves the state-of-the-art on 5 datasets, surpassing conventional BERT models and benchmarks by a large margin. We also show that our approach generalises well to unseen data. Comment: 12 pages
Published: 2020

14. Give Me Convenience and Give Her Death: Who Should Decide What Uses of NLP are Appropriate, and on What Basis?

Author: Jey Han Lau, Kobi Leins, and Timothy Baldwin
Subjects: FOS: Computer and information sciences, 0303 health sciences, 030505 public health, Computer Science - Computation and Language, Basis (linear algebra), Computer science, business.industry, computer.software_genre, Focus (linguistics), Dual (category theory), 03 medical and health sciences, Racial bias, Artificial intelligence, 0305 other medical science, business, Computation and Language (cs.CL), computer, Scientific disciplines, Natural language processing, 030304 developmental biology
Abstract: As part of growing NLP capabilities, coupled with an awareness of the ethical dimensions of research, questions have been raised about whether particular datasets and tasks should be deemed off-limits for NLP research. We examine this question with respect to a paper on automatic legal sentencing from EMNLP 2019 which was a source of some debate, in asking whether the paper should have been allowed to be published, who should have been charged with making such a decision, and on what basis. We focus in particular on the role of data statements in ethically assessing research, but also discuss the topic of dual use, and examine the outcomes of similar debates in other scientific disciplines. Comment: 6 pages; accepted for ACL 2020
Published: 2020

15. ChEMU: Named Entity Recognition and Event Extraction of Chemical Reactions from Patents

Author: Hiyori Yoshikawa, Ralph Hoessel, Biaoyan Fang, Saber A. Akhondi, Zenan Zhai, Dat Quoc Nguyen, Timothy Baldwin, Trevor Cohn, Christian Druckenbrodt, Karin Verspoor, and Camilo Thorne
Subjects: Event trigger, Information extraction, Information retrieval, Named-entity recognition, Computer science, Event (computing), Cheminformatics, Key (cryptography), Context (language use), computer.software_genre, computer, Task (project management)
Abstract: We introduce a new evaluation lab named ChEMU (Cheminformatics Elsevier Melbourne University), part of the 11th Conference and Labs of the Evaluation Forum (CLEF-2020). ChEMU involves two key information extraction tasks over chemical reactions from patents. Task 1—Named entity recognition—involves identifying chemical compounds as well as their types in context, i.e., to assign the label of a chemical compound according to the role which the compound plays within a chemical reaction. Task 2—Event extraction over chemical reactions—involves event trigger detection and argument recognition. We briefly present the motivations and goals of the ChEMU tasks, as well as resources and evaluation methodology.
Published: 2020

16. Domain Adaptation and Instance Selection for Disease Syndrome Classification over Veterinary Clinical Notes

Author: Karin Verspoor, James R. Gilkerson, LY Hardefeldt, Timothy Baldwin, and Brian Hur
Subjects: Domain adaptation, Veterinary medicine, Antibiotic resistance, Computer science, Document classification, Antimicrobial stewardship, Instance selection, Disease, computer.software_genre, computer
Abstract: Identifying the reasons for antibiotic administration in veterinary records is a critical component of understanding antimicrobial usage patterns. This informs antimicrobial stewardship programs designed to fight antimicrobial resistance, a major health crisis affecting both humans and animals in which veterinarians have an important role to play. We propose a document classification approach to determine the reason for administration of a given drug, with particular focus on domain adaptation from one drug to another, and instance selection to minimize annotation effort.
Published: 2020

17. Overview of ChEMU 2020: Named Entity Recognition and Event Extraction of Chemical Reactions from Patents

Author: Zubair Afzal, Saber A. Akhondi, Biaoyan Fang, Camilo Thorne, Hiyori Yoshikawa, Ameer Albahem, Dat Quoc Nguyen, Zenan Zhai, Jiayuan He, Timothy Baldwin, Karin Verspoor, Christian Druckenbrodt, Lawrence Cavedon, Trevor Cohn, and Ralph Hoessel
Subjects: 0301 basic medicine, Information retrieval, Event (computing), Computer science, computer.software_genre, 01 natural sciences, 0104 chemical sciences, Task (project management), 010404 medicinal & biomolecular chemistry, 03 medical and health sciences, Information extraction, Identification (information), 030104 developmental biology, Named-entity recognition, Cheminformatics, Key (cryptography), computer
Abstract: In this paper, we provide an overview of the Cheminformatics Elsevier Melbourne University (ChEMU) evaluation lab 2020, part of the Conference and Labs of the Evaluation Forum 2020 (CLEF2020). The ChEMU evaluation lab focuses on information extraction over chemical reactions from patent texts. Using the ChEMU corpus of 1500 “snippets” (text segments) sampled from 170 patent documents and annotated by chemical experts, we defined two key information extraction tasks. Task 1 addresses chemical named entity recognition, the identification of chemical compounds and their specific roles in chemical reactions. Task 2 focuses on event extraction, the identification of reaction steps, relating the chemical compounds involved in a chemical reaction. Herein, we describe the resources created for these tasks and the evaluation methodology adopted. We also provide a brief summary of the participants of this lab and the results obtained across 46 runs from 11 teams, finding that several submissions achieve substantially better results than our baseline methods.
Published: 2020

18. Evaluating the Utility of Model Configurations and Data Augmentation on Clinical Semantic Textual Similarity

Author: Karin Verspoor, Yuxia Wang, Fei Liu, and Timothy Baldwin
Subjects: Computer science, business.industry, Pooling, Overfitting, Machine learning, computer.software_genre, Task (project management), Domain (software engineering), Similarity (psychology), Language model, Artificial intelligence, Focus (optics), business, computer
Abstract: In this paper, we apply pre-trained language models to the Semantic Textual Similarity (STS) task, with a specific focus on the clinical domain. In the low-resource setting of clinical STS, these large models tend to be impractical and prone to overfitting. Building on BERT, we study the impact of a number of model design choices, namely different fine-tuning and pooling strategies. We observe that the impact of domain-specific fine-tuning on clinical STS is much less than that in the general domain, likely due to the concept richness of the domain. Based on this, we propose two data augmentation techniques. Experimental results on N2C2-STS demonstrate substantial improvements, validating the utility of the proposed methods.
Published: 2020

19. Learning from Unlabelled Data for Clinical Semantic Textual Similarity

Author: Timothy Baldwin, Yuxia Wang, and Karin Verspoor
Subjects: Training set, Computer science, business.industry, Unlabelled data, 02 engineering and technology, computer.software_genre, Domain (software engineering), Task (project management), 020204 information systems, Similarity (psychology), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, business, computer, Sentence, Natural language processing, Test data
Abstract: Domain pretraining followed by task fine-tuning has become the standard paradigm for NLP tasks, but requires in-domain labelled data for task fine-tuning. To overcome this, we propose to utilise domain unlabelled data by assigning pseudo labels from a general model. We evaluate the approach on two clinical STS datasets, and achieve r = 0.80 on N2C2-STS. Further investigation reveals that if the data distribution of unlabelled sentence pairs is closer to the test data, we can obtain better performance. By leveraging a large general-purpose STS dataset and small-scale in-domain training data, we obtain further improvements to r = 0.90, a new SOTA.
Published: 2020

20. Using natural language processing and VetCompass to understand antimicrobial usage patterns in Australia

Author: James R. Gilkerson, Timothy Baldwin, Brian Hur, Karin Verspoor, and LY Hardefeldt
Subjects: Antiinfective agent, General Veterinary, 040301 veterinary sciences, business.industry, Computer science, Veterinary clinics, Companion animal, Medical record, 0402 animal and dairy science, 04 agricultural and veterinary sciences, General Medicine, computer.software_genre, Antimicrobial, 040201 dairy & animal science, 0403 veterinary science, Text messaging, Antimicrobial stewardship, Artificial intelligence, business, computer, Clinical record, Natural language processing
Abstract: Background: Currently there is an incomplete understanding of antimicrobial usage patterns in veterinary clinics in Australia, but such knowledge is critical for the successful implementation and monitoring of antimicrobial stewardship programs. Methods: VetCompass Australia collects medical records from 181 clinics in Australia (as of May 2018). These records contain detailed information from individual consultations regarding the medications dispensed. One unique aspect of VetCompass Australia is its focus on applying natural language processing (NLP) and machine learning techniques to analyse the records, similar to efforts conducted in other medical studies. Results: The free text fields of 4,394,493 veterinary consultation records of dogs and cats between 2013 and 2018 were collated by VetCompass Australia and NLP techniques applied to enable the querying of the antimicrobial usage within these consultations. Conclusion: The NLP algorithms developed matched antimicrobials in clinical records with 96.7% accuracy and an F1 Score of 0.85, as evaluated relative to expert annotations. This dataset can be readily queried to demonstrate the antimicrobial usage patterns of companion animal practices throughout Australia.
Published: 2019

21. Unsupervised Acquisition of Comprehensive Multiword Lexicons using Competition in an n-gram Lattice

Author: Jan Šnajder, Julian Brooke, and Timothy Baldwin
Subjects: Linguistics and Language, Computer science, Association (object-oriented programming), 02 engineering and technology, computer.software_genre, Measure (mathematics), Lexical item, Ranking (information retrieval), 03 medical and health sciences, 0302 clinical medicine, Artificial Intelligence, Simple (abstract algebra), 0202 electrical engineering, electronic engineering, information engineering, business.industry, Communication, Contrast (statistics), Computer Science Applications, Human-Computer Interaction, Lattice (module), n-gram, 030221 ophthalmology & optometry, 020201 artificial intelligence & image processing, Artificial intelligence, business, computer, Natural language processing
Abstract: We present a new model for acquiring comprehensive multiword lexicons from large corpora based on competition among n-gram candidates. In contrast to the standard approach of simple ranking by association measure, in our model n-grams are arranged in a lattice structure based on subsumption and overlap relationships, with nodes inhibiting other nodes in their vicinity when they are selected as a lexical item. We show how the configuration of such a lattice can be optimized tractably, and demonstrate using annotations of sampled n-grams that our method consistently outperforms alternatives by at least 0.05 F-score across several corpora and languages.
Published: 2017

22. SemEval-2017 Task 3: Community Question Answering

Author: Karin Verspoor, Alessandro Moschitti, Preslav Nakov, Timothy Baldwin, Doris Hoogeveen, Lluís Màrquez, and Hamdy Mubarak
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Arabic, Computer science, 68T50, 02 engineering and technology, computer.software_genre, Computer Science - Information Retrieval, Task (project management), Machine Learning (cs.LG), Similarity (network science), 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, Question answering, Computer Science - Computation and Language, business.industry, I.2.7, language.human_language, SemEval, Artificial Intelligence (cs.AI), language, 020201 artificial intelligence & image processing, Artificial intelligence, business, computer, Computation and Language (cs.CL), Natural language processing, Information Retrieval (cs.IR)
Abstract: We describe SemEval-2017 Task 3 on Community Question Answering. This year, we reran the four subtasks from SemEval-2016: (A) Question-Comment Similarity, (B) Question-Question Similarity, (C) Question-External Comment Similarity, and (D) Rerank the correct answers for a new question in Arabic, providing all the data from 2015 and 2016 for training, and fresh data for testing. Additionally, we added a new subtask E in order to enable experimentation with Multi-domain Question Duplicate Detection in a larger-scale scenario, using StackExchange subforums. A total of 23 teams participated in the task, and submitted a total of 85 runs (36 primary and 49 contrastive) for subtasks A-D. Unfortunately, no teams participated in subtask E. A variety of approaches and features were used by the participating systems to address the different subtasks. The best systems achieved an official score (MAP) of 88.43, 47.22, 15.46, and 61.16 in subtasks A, B, C, and D, respectively. These scores are better than the baselines, especially for subtasks A-C. Comment: community question answering, question-question similarity, question-comment similarity, answer reranking, Multi-domain Question Duplicate Detection, StackExchange, English, Arabic
Published: 2019
Full Text: View/download PDF

23. How Well Do Embedding Models Capture Non-compositionality? A View from Multiword Expressions

Author: Bahar Salehi, Timothy Baldwin, and Navnita Nandakumar
Subjects: business.industry, Principle of compositionality, Computer science, 02 engineering and technology, 010501 environmental sciences, computer.software_genre, 01 natural sciences, Character (mathematics), 0202 electrical engineering, electronic engineering, information engineering, Embedding, 020201 artificial intelligence & image processing, Word2vec, Artificial intelligence, business, computer, Natural language processing, Word (computer architecture), 0105 earth and related environmental sciences
Abstract: In this paper, we apply various embedding methods on multiword expressions to study how well they capture the nuances of non-compositional data. Our results from a pool of word-, character-, and document-level embeddings suggest that Word2vec performs the best, followed by FastText and Infersent. Moreover, we find that recently-proposed contextualised embedding models such as BERT and ELMo are not adept at handling non-compositionality in multiword expressions.
Published: 2019

24. Reevaluating Argument Component Extraction in Low Resource Settings

Author: Timothy Baldwin, Cecile Paris, Anirudh Joshi, and Richard O. Sinnott
Subjects: 0301 basic medicine, Low resource, business.industry, Computer science, Deep learning, 030106 microbiology, computer.software_genre, Task (project management), 03 medical and health sciences, 030104 developmental biology, Named-entity recognition, Argument, Component (UML), Artificial intelligence, business, computer, Natural language processing, Meaning (linguistics)
Abstract: Argument component extraction is a challenging and complex high-level semantic extraction task. As such, it is both expensive to annotate (meaning training data is limited and low-resource by nature), and hard for current-generation deep learning methods to model. In this paper, we reevaluate the performance of state-of-the-art approaches in both single- and multi-task learning settings using combinations of character-level, GloVe, ELMo, and BERT encodings using standard BiLSTM-CRF encoders. We use evaluation metrics that are more consistent with evaluation practice in named entity recognition to understand how well current baselines address this challenge and compare their performance to lower-level semantic tasks such as CoNLL named entity recognition. We find that performance utilizing various pre-trained representations and training methodologies often leaves a lot to be desired as it currently stands, and suggest future pathways for improvement.
Published: 2019

25. Putting Evaluation in Context: Contextual Embeddings Improve Machine Translation Evaluation

Author: Timothy Baldwin, Nitika Mathur, and Trevor Cohn
Subjects: Machine translation, Computer science, business.industry, Context (language use), 02 engineering and technology, 010501 environmental sciences, computer.software_genre, 01 natural sciences, Field (computer science), Metric (mathematics), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Evaluation of machine translation, Artificial intelligence, business, computer, Word (computer architecture), Sentence, Natural language processing, 0105 earth and related environmental sciences
Abstract: Accurate, automatic evaluation of machine translation is critical for system tuning, and evaluating progress in the field. We propose a simple unsupervised metric, and additional supervised metrics which rely on contextual word embeddings to encode the translation and reference sentences. We find that these models rival or surpass all existing metrics in the WMT 2017 sentence-level and system-level tracks, and our trained model has a substantially higher correlation with human judgements than all existing metrics on the WMT 2017 to-English sentence level dataset.
Published: 2019

26. Contextualization of Morphological Inflection

Author: Ekaterina Vylomova, Ryan Cotterell, Jason Eisner, Timothy Baldwin, and Trevor Cohn
Subjects: FOS: Computer and information sciences, Contextualization, Computer Science - Computation and Language, Computer science, business.industry, Realization (linguistics), Natural language generation, Context (language use), 02 engineering and technology, computer.software_genre, 03 medical and health sciences, 0302 clinical medicine, Inflection, 030221 ophthalmology & optometry, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, business, computer, Computation and Language (cs.CL), Sentence, Word (computer architecture), Natural language processing
Abstract: Critical to natural language generation is the production of correctly inflected text. In this paper, we isolate the task of predicting a fully inflected sentence from its partially lemmatized version. Unlike traditional morphological inflection or surface realization, our task input does not provide "gold" tags that specify what morphological features to realize on each lemmatized word; rather, such features must be inferred from sentential context. We develop a neural hybrid graphical model that explicitly reconstructs morphological features before predicting the inflected forms, and compare this to a system that directly predicts the inflected forms without relying on any morphological annotation. We experiment on several typologically diverse languages from the Universal Dependencies treebanks, showing the utility of incorporating linguistically-motivated latent variables into NLP models. Comment: NAACL 2019
Published: 2019
Full Text: View/download PDF

27. Semi-supervised Stochastic Multi-Domain Learning using Variational Inference

Author: Trevor Cohn, Yitong Li, and Timothy Baldwin
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Matching (statistics), Computer science, Inference, 02 engineering and technology, Latent variable, 010501 environmental sciences, Machine learning, computer.software_genre, 01 natural sciences, Machine Learning (cs.LG), Domain (software engineering), 0202 electrical engineering, electronic engineering, information engineering, Latent variable model, 0105 earth and related environmental sciences, Computer Science - Computation and Language, business.industry, SIGNAL (programming language), Multi domain, Benchmark (computing), 020201 artificial intelligence & image processing, Artificial intelligence, business, Computation and Language (cs.CL), computer
Abstract: Supervised models of NLP rely on large collections of text which closely resemble the intended testing setting. Unfortunately, matching text is often not available in sufficient quantity, and moreover, within any domain of text, data is often highly heterogeneous. In this paper we propose a method to distill the important domain signal as part of a multi-domain learning system, using a latent variable model in which parts of a neural model are stochastically gated based on the inferred domain. We compare the use of discrete versus continuous latent variables, operating in a domain-supervised or a domain semi-supervised setting, where the domain is known only for a subset of training inputs. We show that our model leads to substantial performance improvements over competitive benchmark domain adaptation methods, including methods using adversarial learning. Comment: ACL 2019 (9 pages + 2 references + 1 appendices)
Published: 2019

28. Multitask Learning for Query Segmentation in Job Search

Author: Fei Liu, Timothy Baldwin, Wilson Wong, and Bahar Salehi
Subjects: business.industry, Computer science, Multi-task learning, 02 engineering and technology, Machine learning, computer.software_genre, Semantics, Task (project management), Term (time), Search engine, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, Benchmark (computing), 020201 artificial intelligence & image processing, Segmentation, Artificial intelligence, business, computer
Abstract: In this paper, we present the first attempt to use multitask learning for query segmentation. We use the semantic category of the words as an auxiliary task and show that segmentation improves when the model is also trained to predict the semantic category of the query terms, outperforming benchmark methods over a novel dataset from a popular job search engine. Our further experiments show that the task of modeling the query term semantics performs better as a standalone task, without adding segmentation as an auxiliary task.
Published: 2018

29. Recurrent Entity Networks with Delayed Memory Update for Targeted Aspect-based Sentiment Analysis

Author: Fei Liu, Trevor Cohn, and Timothy Baldwin
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Artificial neural network, Computer science, business.industry, Polarity (physics), Mechanism (biology), Sentiment analysis, 02 engineering and technology, 010501 environmental sciences, Machine learning, computer.software_genre, 01 natural sciences, Task (project management), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Delayed Memory, Artificial intelligence, business, Set (psychology), Computation and Language (cs.CL), computer, 0105 earth and related environmental sciences
Abstract: While neural networks have been shown to achieve impressive results for sentence-level sentiment analysis, targeted aspect-based sentiment analysis (TABSA), the extraction of fine-grained opinion polarity w.r.t. a pre-defined set of aspects, remains a difficult task. Motivated by recent advances in memory-augmented models for machine reading, we propose a novel architecture, utilising external "memory chains" with a delayed memory update mechanism to track entities. On a TABSA task, the proposed model demonstrates substantial improvements over state-of-the-art approaches, including those using external knowledge bases. Comment: Accepted to NAACL 2018 (camera-ready)
Published: 2018

30. Semi-supervised User Geolocation via Graph Convolutional Networks

Author: Afshin Rahimi, Timothy Baldwin, and Trevor Cohn
Subjects: FOS: Computer and information sciences, Geolocation, Computer Science - Computation and Language, Computer science, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, Graph (abstract data type), 020201 artificial intelligence & image processing, 02 engineering and technology, Data mining, computer.software_genre, Computation and Language (cs.CL), computer
Abstract: Social media user geolocation is vital to many applications such as event detection. In this paper, we propose GCN, a multiview geolocation model based on Graph Convolutional Networks, that uses both text and network context. We compare GCN to the state-of-the-art, and to two baselines we propose, and show that our model achieves or is competitive with the state-of-the-art over three benchmark geolocation datasets when sufficient supervision is available. We also evaluate GCN under a minimal supervision scenario, and show it outperforms baselines. We find that highway network gates are essential for controlling the amount of useful neighbourhood expansion in GCN. Comment: ACL2018
Published: 2018

31. What’s in a Domain? Learning Domain-Robust Text Representations using Adversarial Training

Author: Timothy Baldwin, Yitong Li, and Trevor Cohn
Subjects: Language identification, Computer science, business.industry, Sentiment analysis, A domain, 02 engineering and technology, 010501 environmental sciences, Machine learning, computer.software_genre, 01 natural sciences, Training (civil), Raising (linguistics), Domain (software engineering), Task (project management), Adversarial system, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, business, computer, 0105 earth and related environmental sciences
Abstract: Most real-world language problems require learning from heterogeneous corpora, raising the problem of learning robust models which generalise well to both similar (in domain) and dissimilar (out of domain) instances to those seen in training. This requires learning an underlying task, while not learning irrelevant signals and biases specific to individual domains. We propose a novel method to optimise both in- and out-of-domain accuracy based on joint learning of a structured neural model with domain-specific and domain-general components, coupled with adversarial training for domain. Evaluating on multi-domain language identification and multi-domain sentiment analysis, we show substantial improvements over standard domain adaptation techniques, and domain-adversarial training.
Published: 2018

32. UniMelb at SemEval-2018 Task 12: Generative Implication using LSTMs, Siamese Networks and Semantic Representations with Synonym Fuzzing

Author: Cecile Paris, Timothy Baldwin, Anirudh Joshi, and Richard O. Sinnott
Subjects: Computer science, business.industry, Synonym, Feature vector, WordNet, 020206 networking & telecommunications, 02 engineering and technology, Fuzz testing, computer.software_genre, SemEval, Task (project management), Semantic similarity, Synonym (database), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, business, computer, Natural language processing, Generative grammar
Abstract: This paper describes a warrant classification system for SemEval 2018 Task 12, that attempts to learn semantic representations of reasons, claims and warrants. The system consists of 3 stacked LSTMs: one for the reason, one for the claim, and one shared Siamese Network for the 2 candidate warrants. Our main contribution is to force the embeddings into a shared feature space using vector operations, semantic similarity classification, Siamese networks, and multi-task learning. In doing so, we learn a form of generative implication, in encoding implication interrelationships between reasons, claims, and the associated correct and incorrect warrants. We augment the limited data in the task further by utilizing WordNet synonym “fuzzing”. When applied to SemEval 2018 Task 12, our system performs well on the development data, and officially ranked 8th among 21 teams.
Published: 2018

33. Narrative Modeling with Memory Chains and Semantic Supervision

Author: Timothy Baldwin, Trevor Cohn, and Fei Liu
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Computer science, business.industry, 02 engineering and technology, 010501 environmental sciences, computer.software_genre, 01 natural sciences, Task (project management), Focus (linguistics), 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, Narrative, Artificial intelligence, State (computer science), business, computer, Computation and Language (cs.CL), Natural language processing, 0105 earth and related environmental sciences
Abstract: Story comprehension requires a deep semantic understanding of the narrative, making it a challenging task. Inspired by previous studies on ROC Story Cloze Test, we propose a novel method, tracking various semantic aspects with external neural memory chains while encouraging each to focus on a particular semantic aspect. Evaluated on the task of story ending prediction, our model demonstrates superior performance to a collection of competitive baselines, setting a new state of the art. Comment: Accepted to ACL 2018 (camera-ready)
Published: 2018
Full Text: View/download PDF

34. Can machine translation systems be evaluated by the crowd alone

Author: Alistair Moffat, Timothy Baldwin, Justin Zobel, and Yvette Graham
Subjects: Linguistics and Language, Machine translation, Computer science, business.industry, Scale (chemistry), media_common.quotation_subject, 02 engineering and technology, Translation (geometry), computer.software_genre, Language and Linguistics, Task (project management), Artificial Intelligence, 020204 information systems, Component (UML), Replication (statistics), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Quality (business), Isolation (database systems), Artificial intelligence, business, computer, Software, Natural language processing, media_common
Abstract: Crowd-sourced assessments of machine translation quality allow evaluations to be carried out cheaply and on a large scale. It is essential, however, that the crowd's work be filtered to avoid contamination of results through the inclusion of false assessments. One method is to filter via agreement with experts, but even amongst experts agreement levels may not be high. In this paper, we present a new methodology for crowd-sourcing human assessments of translation quality, which allows individual workers to develop their own individual assessment strategy. Agreement with experts is no longer required, and a worker is deemed reliable if they are consistent relative to their own previous work. Individual translations are assessed in isolation from all others in the form of direct estimates of translation quality. This allows more meaningful statistics to be computed for systems and enables significance to be determined on smaller sets of assessments. We demonstrate the methodology's feasibility in large-scale human evaluation through replication of the human evaluation component of the Workshop on Statistical Machine Translation shared translation task for two language pairs, Spanish-to-English and English-to-Spanish. Results for measurement based solely on crowd-sourced assessments show system rankings in line with those of the original evaluation. Comparison of results produced by the relative preference approach and the direct estimate method described here demonstrates that the direct estimate method has a substantially increased ability to identify significant differences between translation systems.
Published: 2015

35. gDelta: a missing link in the grammar engineering toolchain

Author: Timothy Baldwin, Ned Letcher, and Rebecca Dridan
Subjects: ID/LP grammar, Linguistics and Language, Head-driven phrase structure grammar, Computer science, media_common.quotation_subject, Attribute grammar, Emergent grammar, Operator-precedence grammar, Mildly context-sensitive grammar formalism, Library and Information Sciences, computer.software_genre, Grammar systems theory, Language and Linguistics, Education, Adaptive grammar, Rule-based machine translation, Grammar-based code, Stochastic grammar, Regular tree grammar, Relational grammar, media_common, Parsing, Grammar, Programming language, business.industry, Phrase structure rules, Link grammar, TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES, Extended Affix Grammar, Affix grammar, Synchronous context-free grammar, Artificial intelligence, Regular grammar, Computational linguistics, business, computer, Generative grammar, Natural language processing
Abstract: The development of precision grammars is an inherently resource-intensive process; their complexity means that changes made to one area of a grammar often introduce unexpected flow-on effects elsewhere in the grammar which may only be discovered after some time has been invested in updating numerous test suite items. In this paper, we present the browser-based gDelta tool, which aims to provide grammar engineers with more immediate feedback on the impact of changes made to a grammar by comparing parser output from two different grammar versions. We describe an attribute weighting algorithm for highlighting components of the grammar that have been strongly impacted by a modification to the grammar, as well as a technique for clustering test suite items whose parsability has changed, in order to locate related groups of effects. These two techniques are used to present the grammar engineer with different views on the grammar to inform them of different aspects of change in a data-driven manner.
Published: 2015

36. Automatic Detection and Language Identification of Multilingual Documents

Author: Timothy Baldwin, Jey Han Lau, and Marco Lui
Subjects: Linguistics and Language, Information retrieval, Language identification, Computer science, business.industry, Communication, computer.software_genre, Synthetic data, Computer Science Applications, Task (project management), Human-Computer Interaction, Artificial Intelligence, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, Artificial intelligence, business, computer, Natural language processing
Abstract: Language identification is the task of automatically detecting the language(s) present in a document based on the content of the document. In this work, we address the problem of detecting documents that contain text from more than one language (multilingual documents). We introduce a method that is able to detect that a document is multilingual, identify the languages present, and estimate their relative proportions. We demonstrate the effectiveness of our method over synthetic data, as well as real-world multilingual documents collected from the web.
Published: 2014

37. Pairwise Webpage Coreference Classification Using Distant Supervision

Author: Julian Brooke, Trevor Cohn, Shivashankar Subramanian, and Timothy Baldwin
Subjects: Coreference, business.industry, Computer science, 02 engineering and technology, Semi-supervised learning, Machine learning, computer.software_genre, Task (project management), 020204 information systems, Web page, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Pairwise comparison, Artificial intelligence, business, computer, PU learning, Natural language processing
Abstract: A person or other entity is often associated with multiple URL endpoints on the web, motivating the task of determining whether a given pair of webpages is coreferent to a given entity. To strike a balance between unsupervised and supervised methods that require annotated data, we build a positive and unlabelled (PU) learning model, where we obtain positive examples using web search-based distant supervision. We evaluate our proposed approach using the SemEval-2007 WePS and ALTA-2016 shared task datasets.
Published: 2017

38. Decoupling Encoder and Decoder Networks for Abstractive Document Summarization

Author: Ying Xu, Trevor Cohn, Jey Han Lau, and Timothy Baldwin
Subjects: Document summarization, Recurrent neural network, Computer science, Real-time computing, Training time, Decoupling (probability), Decoupled architecture, Data_CODINGANDINFORMATIONTHEORY, Data mining, ENCODE, computer.software_genre, Encoder, computer
Abstract: Abstractive document summarization seeks to automatically generate a summary for a document, based on some abstract “understanding” of the original document. State-of-the-art techniques traditionally use attentive encoder–decoder architectures. However, due to the large number of parameters in these models, they require large training datasets and long training times. In this paper, we propose decoupling the encoder and decoder networks, and training them separately. We encode documents using an unsupervised document encoder, and then feed the document vector to a recurrent neural network decoder. With this decoupled architecture, we decrease the number of parameters in the decoder substantially, and shorten its training time. Experiments show that the decoupled model achieves comparable performance with state-of-the-art models for in-domain documents, but performs less well for out-of-domain documents.
Published: 2017

39. Semi-Automated Resolution of Inconsistency for a Harmonized Multiword Expression and Dependency Parse Annotation

Author: Timothy Baldwin, Julian Brooke, and King Chan
Subjects: Dependency (UML), Parsing, Computer science, business.industry, Treebank, Context (language use), Resolution (logic), computer.software_genre, Multiword expression, Annotation, Artificial intelligence, Heuristics, business, computer, Natural language processing
Abstract: This paper presents a methodology for identifying and resolving various kinds of inconsistency in the context of merging dependency and multiword expression (MWE) annotations, to generate a dependency treebank with comprehensive MWE annotations. Candidates for correction are identified using a variety of heuristics, including an entirely novel one which identifies violations of MWE constituency in the dependency tree, and resolved by arbitration with minimal human intervention. Using this technique, we identified and corrected several hundred errors across both parse and MWE annotations, representing changes to a significant percentage (well over 10%) of the MWE instances in the joint corpus.
Published: 2017

40. Sub-character Neural Language Modelling in Japanese

Author: Julian Brooke, Timothy Baldwin, and Viet Anh Nguyen
Subjects: East Asian languages, business.industry, Computer science, 02 engineering and technology, Semantics, computer.software_genre, 030507 speech-language pathology & audiology, 03 medical and health sciences, Range (mathematics), Character (mathematics), 0202 electrical engineering, electronic engineering, information engineering, Decomposition (computer science), 020201 artificial intelligence & image processing, Language modelling, Language model, Artificial intelligence, 0305 other medical science, business, computer, Natural language processing
Abstract: In East Asian languages such as Japanese and Chinese, the semantics of a character are (somewhat) reflected in its sub-character elements. This paper examines the effect of using sub-characters for language modeling in Japanese. This is achieved by decomposing characters according to a range of character decomposition datasets, and training a neural language model over variously decomposed character representations. Our results indicate that language modelling can be improved through the inclusion of sub-characters, though this result depends on a good choice of decomposition dataset and the appropriate granularity of decomposition.
Published: 2017

41. Further Investigation into Reference Bias in Monolingual Evaluation of Machine Translation

Author: Timothy Baldwin, Yvette Graham, Qingsong Ma, and Qun Liu
Subjects: business.industry, Computer science, Evaluation of machine translation, Artificial intelligence, computer.software_genre, business, computer, Natural language processing
Abstract: Monolingual evaluation of Machine Translation (MT) aims to simplify human assessment by requiring assessors to compare the meaning of the MT output with a reference translation, opening up the task to a much larger pool of genuinely qualified evaluators. Monolingual evaluation runs the risk, however, of bias in favour of MT systems that happen to produce translations superficially similar to the reference and, consistent with this intuition, previous investigations have concluded monolingual assessment to be strongly biased in this respect. On re-examination of past analyses, we identify a series of potential analytical errors that force some important questions to be raised about the reliability of past conclusions, however. We subsequently carry out further investigation into reference bias via direct human assessment of MT adequacy via quality controlled crowd-sourcing. Contrary to both intuition and past conclusions, results show no significant evidence of reference bias in monolingual evaluation of MT.
Published: 2017

42. Robust Training under Linguistic Adversity

Author: Timothy Baldwin, Yitong Li, and Trevor Cohn
Subjects: 020205 medical informatics, Language change, Syntactic methods, business.industry, Computer science, Sentiment analysis, 02 engineering and technology, 010501 environmental sciences, Overfitting, Machine learning, computer.software_genre, 01 natural sciences, Range (mathematics), 0202 electrical engineering, electronic engineering, information engineering, Noise (video), Artificial intelligence, Baseline (configuration management), business, computer, Dropout (neural networks), 0105 earth and related environmental sciences
Abstract: Deep neural networks have achieved remarkable results across many language processing tasks; however, they have been shown to be susceptible to overfitting and highly sensitive to noise, including adversarial attacks. In this work, we propose a linguistically-motivated approach for training robust models based on exposing the model to corrupted text examples at training time. We consider several flavours of linguistically plausible corruption, including lexical semantic and syntactic methods. Empirically, we evaluate our method with a convolutional neural model across a range of sentiment analysis datasets. Compared with a baseline and the dropout method, our method achieves better overall performance.
Published: 2017

43. Sequence Effects in Crowdsourced Annotations

Author: Timothy Baldwin, Trevor Cohn, and Nitika Mathur
Subjects: Sequence, Computer science, business.industry, Interface (Java), media_common.quotation_subject, 05 social sciences, 02 engineering and technology, computer.software_genre, 050105 experimental psychology, Annotation, Component (UML), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, 0501 psychology and cognitive sciences, Quality (business), Artificial intelligence, business, computer, Natural language processing, media_common
Abstract: Manual data annotation is a vital component of NLP research. When designing annotation tasks, properties of the annotation interface can unintentionally lead to artefacts in the resulting dataset, biasing the evaluation. In this paper, we explore sequence effects where annotations of an item are affected by the preceding items. Having assigned one label to an instance, the annotator may be less (or more) likely to assign the same label to the next. During rating tasks, seeing a low quality item may affect the score given to the next item either positively or negatively. We see clear evidence of both types of effects using auto-correlation studies over three different crowdsourced datasets. We then recommend a simple way to minimise sequence effects.
Published: 2017

44. Context-Aware Prediction of Derivational Word-forms

Author: Timothy Baldwin, Ekaterina Vylomova, Ryan Cotterell, and Trevor Cohn
Subjects: FOS: Computer and information sciences, Lemma (mathematics), Computer Science - Computation and Language, Artificial neural network, business.industry, Computer science, Context (language use), 02 engineering and technology, Lexicon, computer.software_genre, Base (topology), 03 medical and health sciences, 0302 clinical medicine, 030221 ophthalmology & optometry, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, Representation (mathematics), business, computer, Encoder, Computation and Language (cs.CL), Natural language processing, Word (computer architecture)
Abstract: Derivational morphology is a fundamental and complex characteristic of language. In this paper we propose the new task of predicting the derivational form of a given base-form lemma that is appropriate for a given context. We present an encoder-decoder style neural network to produce a derived form character-by-character, based on its corresponding character-level representation of the base form and the context. We demonstrate that our model is able to generate valid context-sensitive derivations from known base forms, but is less accurate under a lexicon-agnostic setting.
Published: 2017
Full Text: View/download PDF

45. Improving Evaluation of Document-level Machine Translation Quality Estimation

Author: Carla Parra, Timothy Baldwin, Qingsong Ma, Yvette Graham, Qun Liu, and Carolina Scarton
Subjects: Estimation, Machine translation, business.industry, Computer science, media_common.quotation_subject, 02 engineering and technology, Gold standard (test), Crowdsourcing, Machine learning, computer.software_genre, Cost reduction, Document level, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Quality (business), Artificial intelligence, Computational linguistics, business, computer, media_common
Abstract: © 2017 Association for Computational Linguistics. Meaningful conclusions about the relative performance of NLP systems are only possible if the gold standard employed in a given evaluation is both valid and reliable. In this paper, we explore the validity of human annotations currently employed in the evaluation of document-level quality estimation for machine translation (MT). We demonstrate the degree to which MT system rankings are dependent on weights employed in the construction of the gold standard, before proposing direct human assessment as a valid alternative. Experiments show direct assessment (DA) scores for documents to be highly reliable, achieving a correlation of above 0.9 in a self-replication experiment, in addition to a substantial estimated cost reduction through quality-controlled crowdsourcing. The original gold standard based on post-edits incurs a 10-20 times greater cost than DA.
Published: 2017

46. A Neural Model for User Geolocation and Lexical Dialectology

Author: Afshin Rahimi, Trevor Cohn, and Timothy Baldwin
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, business.industry, Computer science, Dialectology, 02 engineering and technology, computer.software_genre, Term (time), Geolocation, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, Benchmark (computing), 020201 artificial intelligence & image processing, Artificial intelligence, State (computer science), business, computer, Computation and Language (cs.CL), Word (computer architecture), Natural language processing
Abstract: We propose a simple yet effective text-based user geolocation model based on a neural network with one hidden layer, which achieves state-of-the-art performance over three Twitter benchmark geolocation datasets, in addition to producing word and phrase embeddings in the hidden layer that we show to be useful for detecting dialectal terms. As part of our analysis of dialectal terms, we release DAREDS, a dataset for evaluating dialect term detection methods.
Published: 2017
Full Text: View/download PDF

47. Topically Driven Neural Language Model

Author: Jey Han Lau, Timothy Baldwin, and Trevor Cohn
Subjects: FOS: Computer and information sciences, Topic model, Computer Science - Computation and Language, Perplexity, Computer science, business.industry, Context (language use), 02 engineering and technology, 010501 environmental sciences, computer.software_genre, 01 natural sciences, Range (mathematics), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Language model, Artificial intelligence, business, Representation (mathematics), Computation and Language (cs.CL), computer, Natural language processing, Sentence, 0105 earth and related environmental sciences
Abstract: Language models are typically applied at the sentence level, without access to the broader document context. We present a neural language model that incorporates document context in the form of a topic model-like architecture, thus providing a succinct representation of the broader document context outside of the current sentence. Experiments over a range of datasets demonstrate that our model outperforms a pure sentence-based model in terms of language model perplexity, and leads to topics that are potentially more coherent than those produced by a standard LDA topic model. Our model also has the ability to generate related sentences for a topic, providing another way to interpret topics. Comment: 11 pages, Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL 2017) (to appear)
Published: 2017

48. Text-Based Twitter User Geolocation Prediction

Author: Bo Han, Paul Cook, and Timothy Baldwin
Subjects: Metadata, Geolocation, Information retrieval, Geospatial analysis, Exploit, Artificial Intelligence, Computer science, Event (computing), Feature selection, Variance (accounting), computer.software_genre, computer, Task (project management)
Abstract: Geographical location is vital to geospatial applications like local search and event detection. In this paper, we investigate and improve on the task of text-based geolocation prediction of Twitter users. Previous studies on this topic have typically assumed that geographical references (e.g., gazetteer terms, dialectal words) in a text are indicative of its author's location. However, these references are often buried in informal, ungrammatical, and multilingual data, and are therefore non-trivial to identify and exploit. We present an integrated geolocation prediction framework and investigate what factors impact on prediction accuracy. First, we evaluate a range of feature selection methods to obtain location indicative words. We then evaluate the impact of non-geotagged tweets, language, and user-declared metadata on geolocation prediction. In addition, we evaluate the impact of temporal variance on model generalisation, and discuss how users differ in terms of their geolocatability. We achieve state-of-the-art results for the text-based Twitter user geolocation task, and also provide the most extensive exploration of the task to date. Our findings provide valuable insights into the design of robust, practical text-based geolocation prediction systems.
Published: 2014

49. Statistical Methods for Identifying Local Dialectal Terms from GPS-Tagged Documents

Author: Timothy Baldwin, Bo Han, and Paul Cook
Subjects: Linguistics and Language, business.industry, Computer science, Communication, computer.software_genre, Language and Linguistics, Linguistics, Lexicography, Metadata, Set (abstract data type), Corpus linguistics, Global Positioning System, Social media, Artificial intelligence, Geographic coordinate system, business, computer, Natural language processing, Word (computer architecture)
Abstract: Corpora of documents whose metadata includes GPS coordinates have recently become widely available through online social media such as Twitter. This has created opportunities for statistical corpus methods that describe the geographical spread of words, but such techniques do not appear to be widely used in corpus linguistics and lexicography. This paper presents several methods for describing the spread of a set of points corresponding to documents containing a given word, and applies the methods to a corpus of GPS-tagged tweets from Twitter. In experiments on known regionalisms, we show that these methods could be used to help identify such expressions. We analyze the words in the corpus identified as having the most geographically restricted usage and identify some expressions that appear to be previously undocumented regionalisms with highly localized usage.
Published: 2014

50. Word sense and semantic relations in noun compounds

Author: Timothy Baldwin and Su Nam Kim
Subjects: Noun compounds, Collocation, Word-sense disambiguation, business.industry, Computer science, Substitution (logic), computer.software_genre, Word sense, SemEval, Computational Mathematics, Component (UML), Computer Science (miscellaneous), Benchmark (computing), Artificial intelligence, business, computer, Natural language processing
Abstract: In this article, we investigate word sense distributions in noun compounds (NCs). Our primary goal is to disambiguate the word sense of component words in NCs, based on investigation of “semantic collocation” between them. We use sense collocation and lexical substitution to build supervised and unsupervised word sense disambiguation (WSD) classifiers, and show our unsupervised learner to be superior to a benchmark WSD system. Further, we develop a word sense-based approach to interpreting the semantic relations in NCs.
Published: 2013

138 results on '"Timothy Baldwin"'
