53 results on '"Discourse connectives"'
Search Results
2. Compensating for processing difficulty in discourse: Effect of parallelism in contrastive relations
- Author
-
Ludivine Crible and Martin J. Pickering
- Subjects
Linguistics and Language ,Computer science ,media_common.quotation_subject ,Discourse analysis ,computer.software_genre ,050105 experimental psychology ,Languages and Literatures ,Language and Linguistics ,contrastive relations ,0501 psychology and cognitive sciences ,media_common ,parallelism ,business.industry ,Communication ,05 social sciences ,Phrase structure rules ,050301 education ,Contrast (statistics) ,Coherence (statistics) ,Ambiguity ,self-paced reading ,discourse connectives ,Feature (computer vision) ,ambiguity ,Parallelism (grammar) ,Task analysis ,Artificial intelligence ,business ,0503 education ,computer ,Natural language processing - Abstract
This study aims to establish whether the processing of different connectives (e.g., "and", "but") and different coherence relations (addition, contrast) can be modulated by a structural feature of the connected segments, namely parallelism. While "but" is mainly used to contrast two expressions, "and" occurs in many different relations and has been shown to come with a processing cost. We report three self-paced reading experiments in which we manipulate whether the connected segments share a common verb phrase. Such parallel constructions frequently occur in contrastive relations, although they are typically treated as additive in comprehension research. We expect that parallelism will compensate for the cognitive complexity of contrast and for the ambiguity of "and" by further signaling the coherence relation. Our results indicate that parallelism speeds up processing and provide further evidence for priming in comprehension. However, parallelism interacted with connective ambiguity in an overt disambiguation task (Experiment 3) but not in a more natural reading task (Experiment 2). We argue that the processing of contrast remains shallow unless disambiguation is explicitly required.
- Published
- 2020
3. Labeling Explicit Discourse Relations Using Pre-trained Language Models
- Author
-
Murathan Kurfalı
- Subjects
Parsing ,business.industry ,Computer science ,02 engineering and technology ,Full Relation ,Discourse connectives ,computer.software_genre ,Task (project management) ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Language model ,Artificial intelligence ,business ,computer ,Natural language processing - Abstract
Labeling explicit discourse relations is one of the most challenging sub-tasks of shallow discourse parsing, where the goal is to identify the discourse connectives and the boundaries of their arguments. The state-of-the-art models achieve an F-score slightly above 45% by using hand-crafted features. The current paper investigates the efficacy of pre-trained language models in this task. We find that pre-trained language models, when fine-tuned, are powerful enough to replace the linguistic features. We evaluate our model on PDTB 2.0 and report state-of-the-art results in the extraction of the full relation. This is the first time a model has outperformed the knowledge-intensive models without employing any linguistic features.
- Published
- 2020
4. Dependency profiles in the large-scale analysis of discourse connectives
- Author
-
Aki-Juhani Kyröläinen, Veronika Laippala, Filip Ginter, and Jenna Kanerva
- Subjects
Linguistics and Language ,Dependency (UML) ,Computer science ,business.industry ,Scale analysis (mathematics) ,Artificial intelligence ,Discourse connectives ,computer.software_genre ,business ,computer ,Language and Linguistics ,Natural language processing ,Universal dependencies - Abstract
This article presents dependency profiles (DPs) as an empirical method to investigate linguistic elements and their application to the study of 24 discourse connectives in the 3.7-billion token Finnish Internet Parsebank (http://bionlp-www.utu.fi/dep_search/). DPs are based on co-occurrence patterns of the discourse connectives with dependency syntax relations. They follow the assumption of usage-based models, according to which the semantic and functional properties of linguistic expressions arise based on their distributional characteristics. We focus on the typical usage patterns reflected by the DPs and the (dis)similarities among discourse connectives that these patterns reveal. We demonstrate that 1) DPs can be analyzed with clustering to obtain linguistically meaningful groupings among the connectives and that 2) the clustering can be combined with support vector machines to obtain generic and stable linguistic characteristics of the discourse connectives. We show that this data-driven method offers support for previous results and reveals novel tendencies outside the scope of studies on smaller corpora. As the method is based on automatic syntactic analysis following the cross-linguistic universal dependencies, it does not require manual annotation and can be applied to a number of languages and in contrastive studies.
- Published
- 2018
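Note on result 4: the abstract describes dependency profiles as co-occurrence patterns between a connective and dependency relations, which are then compared across connectives. The sketch below is only one possible reading of that idea, not the authors' implementation. The function names, the relative-frequency normalization, and the cosine comparison are assumptions made for illustration, and the toy relation labels are placeholders rather than data from the Finnish Internet Parsebank.

```python
import math
from collections import Counter


def dependency_profile(relations):
    """Turn a list of observed dependency relation labels into relative frequencies.

    `relations` is a hypothetical input: one label per corpus occurrence
    of a given connective.
    """
    counts = Counter(relations)
    total = sum(counts.values())
    return {rel: n / total for rel, n in counts.items()}


def cosine_similarity(p, q):
    """Cosine similarity between two sparse profiles stored as dicts."""
    dot = sum(value * q.get(rel, 0.0) for rel, value in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0
    return dot / (norm_p * norm_q)


# Toy profiles for two placeholder connectives (labels are illustrative).
profile_a = dependency_profile(["cc", "cc", "mark", "advmod"])
profile_b = dependency_profile(["cc", "mark", "mark", "advmod"])
similarity = cosine_similarity(profile_a, profile_b)
```

A pairwise similarity of this kind could feed a clustering step such as the one the abstract mentions, although the abstract does not say which similarity measure or clustering algorithm was used.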
5. Unifying dimensions in coherence relations: How various annotation frameworks are related
- Author
-
Jet Hoek, Ted Sanders, Vera Demberg, Jacqueline Evers-Vermeul, Merel Scholman, Sandrine Zufferey, and Fatemeh Torabi Asr
- Subjects
060201 languages & linguistics ,Linguistics and Language ,business.industry ,05 social sciences ,410 Linguistics ,Applied linguistics ,06 humanities and the arts ,Discourse connectives ,computer.software_genre ,050105 experimental psychology ,Language and Linguistics ,Annotation ,0602 languages and literature ,440 French & related languages ,0501 psychology and cognitive sciences ,Artificial intelligence ,business ,Psychology ,computer ,Natural language processing ,Coherence (linguistics) - Abstract
In this paper, we show how three often used and seemingly different discourse annotation frameworks – Penn Discourse Treebank (PDTB), Rhetorical Structure Theory (RST), and Segmented Discourse Representation Theory – can be related by using a set of unifying dimensions. These dimensions are taken from the Cognitive approach to Coherence Relations and combined with more fine-grained additional features from the frameworks themselves to yield a posited set of dimensions that can successfully map three frameworks. The resulting interface will allow researchers to find identical or at least closely related relations within sets of annotated corpora, even if they are annotated within different frameworks. Furthermore, we tested our unified dimension (UniDim) approach by comparing PDTB and RST annotations of identical newspaper texts and converting their original end label annotations of relations into the accompanying values per dimension. Subsequently, rates of overlap in the attributed values per dimension were analyzed. Results indicate that the proposed dimensions indeed create an interface that makes existing annotation systems “talk to each other.”
- Published
- 2018
6. Annotating the meaning of discourse connectives in multilingual corpora
- Author
-
Liesbeth Degand and Sandrine Zufferey
- Subjects
Linguistics and Language ,Relation (database) ,Computer science ,410 Linguistics ,computer.software_genre ,Language and Linguistics ,Lexical item ,Annotation ,410 Linguistik ,060201 languages & linguistics ,business.industry ,05 social sciences ,050301 education ,06 humanities and the arts ,Discourse connectives ,Linguistics ,440 Französisch & verwandte Sprachen ,Tree (data structure) ,0602 languages and literature ,Artificial intelligence ,business ,0503 education ,computer ,Natural language processing ,Coherence (linguistics) ,440 French & related languages ,Meaning (linguistics) - Abstract
Discourse connectives are lexical items indicating coherence relations between discourse segments. Even though many languages possess a whole range of connectives, important divergences exist cross-linguistically in the number of connectives that are used to express a given relation. For this reason, connectives are not easily paired with a univocal translation equivalent across languages. This paper is a first attempt to design a reliable method to annotate the meaning of discourse connectives cross-linguistically using corpus data. We present the methodological choices made to reach this aim and report three annotation experiments using the framework of the Penn Discourse Tree Bank.
- Published
- 2017
7. A Systematic Investigation of Neural Models for Chinese Implicit Discourse Relationship Recognition
- Author
-
Yuanbin Wu, Dejian Li, and Man Lan
- Subjects
Parsing ,Word embedding ,business.industry ,Computer science ,Deep learning ,Component (UML) ,Artificial intelligence ,computer.software_genre ,business ,Discourse connectives ,computer ,Natural language processing - Abstract
Chinese implicit discourse relationship recognition is more challenging than its English counterpart because implicit relationships lack discourse connectives and occur with high frequency in Chinese text. So far, there has been no systematic investigation into the neural components for Chinese implicit discourse relationship recognition. To fill this gap, we present a component-based neural framework to study the task systematically. Experimental results show that our proposed neural Chinese implicit discourse parser achieves state-of-the-art (SOTA) performance on the CoNLL-2016 corpus.
- Published
- 2019
8. Connective-Lex: A Web-Based Multilingual Lexical Resource for Connectives
- Author
-
Manfred Stede, Amália Mendes, and Tatjana Scheffler
- Subjects
050101 languages & linguistics ,Computer science ,02 engineering and technology ,computer.software_genre ,Resource (project management) ,lcsh:P1-1091 ,Lexical resource ,0202 electrical engineering, electronic engineering, information engineering ,Added value ,ressources multilingues ,Web application ,0501 psychology and cognitive sciences ,linking ,Class (computer programming) ,business.industry ,05 social sciences ,Online database ,lcsh:P98-98.5 ,Lexicographical order ,Discourse connectives ,lcsh:Philology. Linguistics ,discourse connectives ,connecteurs discursifs ,crosslinguistic links ,lexicon ,multilingual resources ,lexique ,020201 artificial intelligence & image processing ,Artificial intelligence ,lcsh:Computational linguistics. Natural language processing ,business ,computer ,Natural language processing - Abstract
In this paper, we present a tangible outcome of the TextLink network: a joint online database project displaying and linking existing and newly-created lexicons of discourse connectives in multiple languages. We discuss the definition and demarcation of the class of connectives that should be included in such a resource, and present the syntactic, semantic/pragmatic, and lexicographic information we collected. Further, the technical implementation of the database and the search functionality are presented. We discuss how the multilingual integration of several connective lexicons provides added value for linguistic researchers and other users interested in connectives, by allowing crosslinguistic comparison and a direct linking between discourse relational devices in different languages. Finally, we provide pointers for possible future extensions both in breadth (i.e., by adding lexicons for additional languages) and depth (by extending the information provided for each connective item and by strengthening the crosslinguistic links).
- Published
- 2019
9. Anaphoric Connectives and Long-Distance Discourse Relations in Czech
- Author
-
Lucie Poláková and Jiří Mírovský
- Subjects
Czech ,050101 languages & linguistics ,Coreference ,Parsing ,General Computer Science ,Discourse analysis ,05 social sciences ,Treebank ,02 engineering and technology ,Discourse connectives ,computer.software_genre ,Linguistics ,language.human_language ,Empirical research ,Manual annotation ,0202 electrical engineering, electronic engineering, information engineering ,language ,020201 artificial intelligence & image processing ,0501 psychology and cognitive sciences ,Sociology ,computer - Abstract
This paper is a linguistic as well as technical survey for the development of a shallow discourse parser for Czech. It focuses on long-distance discourse relations signalled by (mostly) anaphoric discourse connectives. Proceeding from the division of connectives into “structural” and “anaphoric” according to their (in)ability to accept distant (non-adjacent) text segments as their left-sided arguments, and taking into account results of related analyses on English data in the framework of the Penn Discourse Treebank, we analyze a large amount of language data in Czech. We benefit from the multilayer manual annotation of various language aspects from morphology to discourse, coreference and bridging relations in the Prague Dependency Treebank 3.0. We describe the linguistic parameters of long-distance discourse relations in Czech in connection with their anchoring connective, and suggest possible ways of detecting them. Our empirical research also outlines some theoretical consequences for the underlying assumptions in discourse analysis and parsing, e.g. the risk of relying too much on different (language-specific?) part-of-speech categorizations of connectives or the different perspectives in shallow and global discourse analyses (the minimality principle vs. higher text structure).
- Published
- 2019
10. Towards the Data-driven System for Rhetorical Parsing of
- Author
-
Svetlana Toldova, Dina Pisarevskaya, Artem Shelmanov, Elena Chistova, Ivan V. Smirnov, and Maria Kobozeva
- Subjects
Parsing ,business.industry ,Computer science ,Linear svm ,Lexicon ,computer.software_genre ,Discourse connectives ,Data-driven ,Relation classification ,Rhetorical question ,Artificial intelligence ,Macro ,business ,computer ,Natural language processing - Abstract
Results of the first experimental evaluation of machine learning models trained on Ru-RSTreebank, the first Russian corpus annotated within the RST framework, are presented. Various lexical, quantitative, morphological, and semantic features were used. In rhetorical relation classification, an ensemble of a CatBoost model with selected features and a linear SVM model provides the best score (macro F1 = 54.67 ± 0.38). We find that most of the important features for rhetorical relation classification are related to discourse connectives derived from the connectives lexicon for Russian and from other sources.
- Published
- 2019
11. Annotations of Connectives and Arguments in Malayalam Language
- Author
-
Sobha Lalitha Devi and S. Kumari Sheeja
- Subjects
explicit ,02 engineering and technology ,computer.software_genre ,Annotation ,0202 electrical engineering, electronic engineering, information engineering ,Malayalam discourse ,Relation (history of concept) ,General Environmental Science ,Mathematics ,060201 languages & linguistics ,Discourse structure ,business.industry ,06 humanities and the arts ,Discourse connectives ,language.human_language ,Linguistics ,Focus (linguistics) ,discourse connectives ,annotation ,implicit connective ,0602 languages and literature ,Text structure ,Malayalam ,language ,General Earth and Planetary Sciences ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,computer ,Natural language processing ,Natural language - Abstract
Discourse relations in natural languages link clauses in text and compose the overall text structure. Discourse connectives are an important part of modeling Malayalam discourse structure. We followed the annotation procedure of the Penn Discourse Tree Bank, tagged discourse connectives and their arguments in Malayalam text, and report the senses of the relations. This annotation work helps to reveal more about the discourse connectives and how they appear with respect to semantic rules in Malayalam discourse. Discourse connectives may or may not be explicitly present in a relation. We focus on the annotation of both explicit and implicit connectives and arguments in Malayalam text and report encouraging results.
- Published
- 2016
12. Constructing a Lexicon of English Discourse Connectives
- Author
-
Debopam Das, Manfred Stede, Tatjana Scheffler, and Peter Bourgonje
- Subjects
060201 languages & linguistics ,Relation (database) ,Computer science ,business.industry ,Discourse structure ,06 humanities and the arts ,02 engineering and technology ,computer.software_genre ,Discourse connectives ,Semantics ,Lexicon ,language.human_language ,German ,Resource (project management) ,0602 languages and literature ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,0202 electrical engineering, electronic engineering, information engineering ,language ,020201 artificial intelligence & image processing ,XML schema ,Artificial intelligence ,business ,computer ,Natural language processing ,computer.programming_language - Abstract
We present a new lexicon of English discourse connectives called DiMLex-Eng, built by merging information from two annotated corpora and an additional list of relation signals from the literature. The format follows the German connective lexicon DiMLex, which provides a cross-linguistically applicable XML schema. DiMLex-Eng contains 149 English connectives, and gives information on syntactic categories, discourse semantics and non-connective uses (if any). We report on the development steps and discuss design decisions encountered in the lexicon expansion phase. The resource is freely available for use in studies of discourse structure and computational applications.
- Published
- 2018
13. Automatic Mining of Discourse Connectives for Russian
- Author
-
Svetlana Toldova, Maria Kobozeva, and Dina Pisarevskaya
- Subjects
business.industry ,Computer science ,02 engineering and technology ,Type (model theory) ,Discourse connectives ,computer.software_genre ,Identification (information) ,Rule-based machine translation ,Rhetorical Structure Theory ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Artificial intelligence ,Distributional semantics ,business ,Heuristics ,computer ,Sentence ,Natural language processing - Abstract
The identification of discourse connectives plays an important role in many discourse processing approaches. Connectives include function words usually enumerated in grammars (iz-za ‘due to’, blagodarya ‘thanks to’) and non-grammaticalized expressions (X vedet k Y ‘X leads to Y’, prichina etogo ‘the cause is’). Both types of connectives signal certain relations between discourse units. However, there are no ready-made lists of the second type of connectives. We suggest a method for expanding a seed list of connectives with candidates for non-grammaticalized connectives for Russian, based on their vector representations. First, we compile a list of patterns for this type of connective. These patterns are based on the following heuristics: the connectives are often used with anaphoric expressions substituting discourse units (thus, some patterns include special anaphoric elements), and the connectives more frequently occur at the beginning of a sentence or after a comma. Second, we build multi-word tokens that are based on these patterns. Third, we build vector representations for the multi-word tokens that match these patterns. Our experiments based on distributional semantics give a quite reasonable list of candidates for connectives.
- Published
- 2018
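Note on result 13: the abstract states a positional heuristic (connectives tend to occur at the beginning of a sentence or after a comma) and mentions building multi-word tokens from matching patterns. The following is a minimal sketch of that positional step only, under assumptions of our own: the function name, the maximum n-gram length, the underscore-joined token format, and the English toy sentences are all hypothetical. The authors' actual patterns, anaphoric elements, and vector-based ranking are not reproduced here.

```python
import re
from collections import Counter


def positional_candidates(sentences, max_len=3):
    """Count word n-grams that open a sentence or directly follow a comma.

    Each counted n-gram is joined with underscores so it can be treated
    as a single multi-word token in later processing.
    """
    counts = Counter()
    for sentence in sentences:
        # Splitting on commas yields chunks whose first words sit either
        # at the sentence start or immediately after a comma.
        for chunk in sentence.split(","):
            tokens = re.findall(r"\w+", chunk.lower())
            for n in range(1, max_len + 1):
                if len(tokens) >= n:
                    counts["_".join(tokens[:n])] += 1
    return counts


# Toy English input standing in for a real corpus.
toy_sentences = [
    "As a result, the plan failed.",
    "The plan failed, as a result of rain.",
]
candidate_counts = positional_candidates(toy_sentences)
```

Frequent candidates from a step like this would still need filtering, for example by comparing their vectors to those of seed connectives as the abstract describes.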
14. Primary and secondary discourse connectives: definitions and lexicons
- Author
-
Laurence Danlos, Katerina Rysova, Manfred Stede, and Magdaléna Rysová (affiliations: Laboratoire de Linguistique Formelle (LLF UMR7110), Centre National de la Recherche Scientifique (CNRS)-Université Paris Diderot - Paris 7 (UPD7); University of Potsdam)
- Subjects
060201 languages & linguistics ,Structure (mathematical logic) ,Linguistics and Language ,Parsing ,Discourse structure ,Computer science ,Communication ,Perspective (graphical) ,discourse connective ,06 humanities and the arts ,02 engineering and technology ,Discourse connectives ,computer.software_genre ,Structuring ,Language and Linguistics ,Linguistics ,Focus (linguistics) ,discourse structure ,0602 languages and literature ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,[SHS.LANGUE]Humanities and Social Sciences/Linguistics ,computer ,discourse semantics ,Coherence (linguistics) - Abstract
Starting from the perspective that discourse structure arises from the presence of coherence relations, we provide a map of linguistic discourse structuring devices (DRDs), and then focus on those found in written text: connectives. To subdivide this class further, we follow the recent idea of structuring the set of connectives by differentiating between primary and secondary connectives, on the one hand, and free connecting phrases, on the other. Considering examples from Czech, English, French and German, we develop definitions of these groups, with attention to certain cross-linguistic differences. For primary and secondary connectives, we propose that their behavior can be described to a large extent by declarative lexicons, and we demonstrate a concrete proposal which has been applied to five languages, with others currently being added in ongoing work. The lexical representations can be useful both for humans (theoretical investigations, transfer to other languages) and for machines (automatic discourse parsing and generation).
- Published
- 2018
15. Disambiguating discourse connectives for statistical machine translation
- Author
-
Najeh Hajlaoui, Thomas Meyer, and Andrei Popescu-Belis
- Subjects
Phrase ,Acoustics and Ultrasonics ,Machine translation ,Computer science ,business.industry ,Speech recognition ,media_common.quotation_subject ,SIGNAL (programming language) ,Ambiguity ,Transfer-based machine translation ,Discourse connectives ,Translation (geometry) ,computer.software_genre ,Computational Mathematics ,Metric (mathematics) ,Computer Science (miscellaneous) ,Artificial intelligence ,Electrical and Electronic Engineering ,business ,computer ,Natural language processing ,media_common - Abstract
This paper shows that the automatic labeling of discourse connectives with the relations they signal, prior to machine translation (MT), can be used by phrase-based statistical MT systems to improve their translations. This improvement is demonstrated here when translating from English to four target languages - French, German, Italian and Arabic - using several test sets from recent MT evaluation campaigns. Using automatically labeled data for training, tuning and testing MT systems is beneficial on condition that labels are sufficiently accurate, typically above 70%. To reach such an accuracy, a large array of features for discourse connective labeling (morpho-syntactic, semantic and discursive) are extracted using state-of-the-art tools and exploited in factored MT models. The translation of connectives is improved significantly, between 0.7% and 10% as measured with the dedicated ACT metric. The improvements depend mainly on the level of ambiguity of the connectives in the test sets.
- Published
- 2015
16. Reflections on the Penn Discourse TreeBank, Comparable Corpora, and Complementary Annotation
- Author
-
Bonnie Webber, Aravind K. Joshi, and Rashmi Prasad
- Subjects
Linguistics and Language ,Computer science ,business.industry ,Treebank ,computer.software_genre ,Discourse connectives ,Language and Linguistics ,Psycholinguistics ,Linguistics ,Computer Science Applications ,Focus (linguistics) ,Annotation ,Artificial Intelligence ,Language technology ,Adjacency list ,Artificial intelligence ,business ,computer ,Natural language processing - Abstract
The Penn Discourse Treebank (PDTB) was released to the public in 2008. It remains the largest manually annotated corpus of discourse relations to date. Its focus on discourse relations that are either lexically-grounded in explicit discourse connectives or associated with sentential adjacency has not only facilitated its use in language technology and psycholinguistics but also has spawned the annotation of comparable corpora in other languages and genres. Given this situation, this paper has four aims: (1) to provide a comprehensive introduction to the PDTB for those who are unfamiliar with it; (2) to correct some wrong (or perhaps inadvertent) assumptions about the PDTB and its annotation that may have weakened previous results or the performance of decision procedures induced from the data; (3) to explain variations seen in the annotation of comparable resources in other languages and genres, which should allow developers of future comparable resources to recognize whether the variations are relevant to them; and (4) to enumerate and explain relationships between PDTB annotation and complementary annotation of other linguistic phenomena. The paper draws on work done by ourselves and others since the corpus was released.
- Published
- 2014
17. Improving Discourse Relation Projection to Build Discourse Annotated Corpora
- Author
-
Majid Laali and Leila Kosseim
- Subjects
FOS: Computer and information sciences ,060201 languages & linguistics ,Discourse relation ,Computer Science - Computation and Language ,Intersection (set theory) ,business.industry ,Computer science ,06 humanities and the arts ,02 engineering and technology ,computer.software_genre ,Discourse connectives ,Annotation ,0602 languages and literature ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Artificial intelligence ,Projection (set theory) ,business ,Computation and Language (cs.CL) ,computer ,Classifier (UML) ,Natural language processing - Abstract
The naive approach to annotation projection is not effective to project discourse annotations from one language to another because implicit discourse relations are often changed to explicit ones and vice-versa in the translation. In this paper, we propose a novel approach based on the intersection between statistical word-alignment models to identify unsupported discourse annotations. This approach identified 65% of the unsupported annotations in the English-French parallel sentences from Europarl. By filtering out these unsupported annotations, we induced the first PDTB-style discourse annotated corpus for French from Europarl. We then used this corpus to train a classifier to identify the discourse-usage of French discourse connectives and show a 15% improvement of F1-score compared to the classifier trained on the non-filtered annotations.
- Published
- 2017
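Note on result 17: the abstract says unsupported annotations are identified using the intersection between statistical word-alignment models. The sketch below illustrates one plausible reading: a connective annotation counts as supported only if its source position appears in an alignment link that both models agree on. The data layout (sets of `(source_index, target_index)` pairs), the function names, and the toy alignments are assumptions for illustration and do not come from the paper.

```python
def intersect_alignments(forward, backward):
    """Keep only the alignment links proposed by both models.

    Both inputs are iterables of (source_index, target_index) pairs,
    already expressed in the same orientation (an assumption here).
    """
    return set(forward) & set(backward)


def unsupported_connectives(connective_positions, forward, backward):
    """Return source positions of connectives with no agreed alignment link."""
    agreed = intersect_alignments(forward, backward)
    aligned_sources = {src for src, _ in agreed}
    return [pos for pos in connective_positions if pos not in aligned_sources]


# Toy alignments for one sentence pair (hypothetical values).
forward_links = {(0, 0), (1, 2), (3, 3)}
backward_links = {(0, 0), (1, 1), (3, 3)}
flagged = unsupported_connectives([1, 3], forward_links, backward_links)
```

In this toy case the connective at source position 1 is flagged because the two models disagree on its target, while the connective at position 3 is kept. Projected annotations for flagged positions would then be filtered out before training.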
18. Translating Implicit Discourse Connectives Based on Cross-lingual Annotation and Alignment
- Author
-
Philippe Langlais, Yaohong Jin, and Hongzheng Li
- Subjects
060201 languages & linguistics ,Cross lingual ,business.industry ,Computer science ,06 humanities and the arts ,computer.software_genre ,Discourse connectives ,Linguistics ,Annotation ,0602 languages and literature ,Artificial intelligence ,business ,computer ,Natural language processing - Abstract
Implicit discourse connectives and relations are distributed more widely in Chinese texts; when translating into English, such connectives are usually translated explicitly. With Chinese-English MT in mind, we describe cross-lingual annotation and alignment of discourse connectives in a parallel corpus, together with related surveys and findings. We then conduct evaluation experiments to test the translation of implicit connectives and whether representing implicit connectives explicitly in the source language can significantly improve the final translation performance. Preliminary results show little improvement from merely inserting explicit connectives for implicit relations.
- Published
- 2017
19. Automatic Mapping of French Discourse Connectives to PDTB Discourse Relations
- Author
-
Majid Laali and Leila Kosseim
- Subjects
060201 languages & linguistics ,FOS: Computer and information sciences ,Phrase ,Computer Science - Computation and Language ,Recall ,Exploit ,Machine translation ,business.industry ,Computer science ,06 humanities and the arts ,computer.software_genre ,Lexicon ,Discourse connectives ,Order (business) ,0602 languages and literature ,Artificial intelligence ,business ,computer ,Computation and Language (cs.CL) ,Natural language processing - Abstract
In this paper, we present an approach to exploit phrase tables generated by statistical machine translation in order to map French discourse connectives to discourse relations. Using this approach, we created ConcoLeDisCo, a lexicon of French discourse connectives and their PDTB relations. When evaluated against LEXCONN, ConcoLeDisCo achieves a recall of 0.81 and an Average Precision of 0.68 for the Concession and Condition relations.
- Published
- 2017
20. The Chinese Discourse TreeBank: a Chinese corpus annotated with discourse relations
- Author
-
Yuping Zhou and Nianwen Xue
- Subjects
Linguistics and Language ,Computer science ,business.industry ,Treebank ,Library and Information Sciences ,Discourse connectives ,computer.software_genre ,Syntax ,Language and Linguistics ,Predicate (grammar) ,Linguistics ,Education ,Annotation ,Artificial intelligence ,Computational linguistics ,business ,computer ,Natural language processing - Abstract
The paper presents the Chinese Discourse TreeBank, a corpus annotated with Penn Discourse TreeBank style discourse relations that take the form of a predicate taking two arguments. We first characterize the syntactic and statistical distributions of Chinese discourse connectives as well as the role of Chinese punctuation marks in discourse annotation, and then describe how we design our annotation strategy procedure based on this characterization. The Chinese-specific features of our annotation strategy include annotating explicit and implicit discourse relations in one single pass, defining the argument labels on semantic, rather than syntactic, grounds, as well as annotating the semantic type of implicit discourse relations directly. We also introduce a flat, 11-valued semantic type classification scheme for discourse relations. We finally demonstrate the feasibility of our approach with evaluation results.
- Published
- 2014
- Full Text
- View/download PDF
21. Learning Connective-based Word Representations for Implicit Discourse Relation Identification
- Author
-
Chloé Braud, Pascal Denis, Department of Computer Science [Copenhagen] (DIKU), Faculty of Science [Copenhagen], University of Copenhagen = Københavns Universitet (KU)-University of Copenhagen = Københavns Universitet (KU), Machine Learning in Information Networks (MAGNET), Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 (CRIStAL), Centrale Lille-Université de Lille-Centre National de la Recherche Scientifique (CNRS)-Centrale Lille-Université de Lille-Centre National de la Recherche Scientifique (CNRS)-Inria Lille - Nord Europe, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), University of Copenhagen = Københavns Universitet (UCPH)-University of Copenhagen = Københavns Universitet (UCPH), Inria Lille - Nord Europe, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 (CRIStAL), and Centrale Lille-Université de Lille-Centre National de la Recherche Scientifique (CNRS)-Centrale Lille-Université de Lille-Centre National de la Recherche Scientifique (CNRS)
- Subjects
Computer science ,media_common.quotation_subject ,Treebank ,02 engineering and technology ,Discourse ,computer.software_genre ,[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL] ,[INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI] ,Simple (abstract algebra) ,0202 electrical engineering, electronic engineering, information engineering ,Rhetorical question ,Simplicity ,[SHS.LANGUE]Humanities and Social Sciences/Linguistics ,PDTB ,media_common ,060201 languages & linguistics ,Discourse relation ,business.industry ,Word Embeddings ,06 humanities and the arts ,Discourse connectives ,Linguistics ,Identification (information) ,0602 languages and literature ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,Construct (philosophy) ,computer ,Implicit Discourse Relations ,Natural language processing ,Word (computer architecture) - Abstract
We introduce a simple semi-supervised approach to improve implicit discourse relation identification. This approach harnesses large amounts of automatically extracted discourse connectives along with their arguments to construct new distributional word representations. Specifically, we represent words in the space of discourse connectives as a way to directly encode their rhetorical function. Experiments on the Penn Discourse Treebank demonstrate the effectiveness of these task-tailored representations in predicting implicit discourse relations. Our results indeed show that, despite their simplicity, these connective-based representations outperform various off-the-shelf word embeddings, and achieve state-of-the-art performance on this problem.
- Published
- 2016
22. Modeling the interpretation of discourse connectives by Bayesian pragmatics
- Author
-
Frances Yung, Yuji Matsumoto, Kevin Duh, and Taku Komura
- Subjects
Linguistics and Language ,Computer science ,Bayesian probability ,Treebank ,computer.software_genre ,050105 experimental psychology ,Language and Linguistics ,03 medical and health sciences ,Annotation ,0302 clinical medicine ,Artificial Intelligence ,0501 psychology and cognitive sciences ,Interpretation (logic) ,Parsing ,Literal (mathematical logic) ,business.industry ,05 social sciences ,Pragmatics ,Discourse connectives ,Linguistics ,Comprehension ,Artificial intelligence ,business ,computer ,030217 neurology & neurosurgery ,Natural language processing ,Software - Abstract
We propose a framework to model human comprehension of discourse connectives. Following the Bayesian pragmatic paradigm, we advocate that discourse connectives are interpreted based on a simulation of the production process by the speaker, who, in turn, considers the ease of interpretation for the listener when choosing connectives. Evaluation against the sense annotation of the Penn Discourse Treebank confirms the superiority of the model over literal comprehension. A further experiment demonstrates that the proposed model also improves automatic discourse parsing.
- Published
- 2016
- Full Text
- View/download PDF
23. Discourse connective detection in spoken conversations
- Author
-
Shammur Absar Chowdhury, Evgeny A. Stepanov, and Giuseppe Riccardi
- Subjects
060201 languages & linguistics ,Parsing ,Computer science ,business.industry ,Discourse analysis ,Treebank ,06 humanities and the arts ,02 engineering and technology ,computer.software_genre ,Discourse connectives ,Speech processing ,Syntax ,Predicate (grammar) ,Annotation ,0602 languages and literature ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Artificial intelligence ,Dialog box ,business ,computer ,Natural language processing - Abstract
Discourse parsing is an important task in Language Understanding with applications to human-human and human-machine communication modeling. However, most of the research has focused on written text, and parsers heavily rely on syntactic parsers that themselves have low performance on dialog data. In our work, we address the problem of analyzing the semantic relations between discourse units in human-human spoken conversations. In particular, in this paper we focus on the detection of discourse connectives, which are the predicates of such relations. The discourse relations are drawn from the Penn Discourse Treebank annotation model and adapted to domain-specific Italian human-human spoken conversations. We study the relevance of lexical and acoustic context in predicting discourse connectives. We observe that both lexical and acoustic context have a mixed effect on the prediction of specific connectives. While the oracle of using lexical and acoustic contextual feature combinations is F1 = 68.53, the lexical context alone significantly outperforms the baseline by more than 10 points with F1 = 64.93.
- Published
- 2016
- Full Text
- View/download PDF
24. Modelling the Usage of Discourse Connectives as Rational Speech Acts
- Author
-
Frances Yung, Kevin Duh, Yuji Matsumoto, and Taku Komura
- Subjects
060201 languages & linguistics ,Discourse relation ,business.industry ,Computer science ,Treebank ,06 humanities and the arts ,02 engineering and technology ,computer.software_genre ,Discourse connectives ,Measure (mathematics) ,Information density ,Linguistics ,0602 languages and literature ,0202 electrical engineering, electronic engineering, information engineering ,Production (economics) ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,computer ,Natural language processing ,Utterance - Abstract
Discourse relations can either be implicit or explicitly expressed by markers, such as ’therefore’ and ’but’. How a speaker makes this choice is a question that is not well understood. We propose a psycholinguistic model that predicts whether a speaker will produce an explicit marker given the discourse relation s/he wishes to express. Based on the framework of the Rational Speech Acts model, we quantify the utility of producing a marker based on the information-theoretic measure of surprisal, the cost of production, and a bias to maintain uniform information density throughout the utterance. Experiments based on the Penn Discourse Treebank show that our approach outperforms state-of-the-art approaches, while giving an explanatory account of the speaker’s choice.
- Published
- 2016
- Full Text
- View/download PDF
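The speaker's choice described in the abstract above can be sketched with the generic Rational Speech Acts recipe: a literal listener infers the relation from the marker, and a speaker picks markers in proportion to exp(utility), where utility is negative surprisal minus production cost. This is only an illustration and not the authors' model: the uniform-information-density bias is omitted, and the marker likelihoods, prior, costs and function names are all hypothetical.

```python
import math


def literal_listener(marker, relations, prior, lexicon):
    """L0(r | m), proportional to P(m | r) * P(r). `lexicon[r][m]` is a hypothetical P(m | r)."""
    scores = {r: lexicon[r].get(marker, 0.0) * prior[r] for r in relations}
    z = sum(scores.values())
    return {r: s / z for r, s in scores.items()} if z > 0 else scores


def pragmatic_speaker(relation, markers, relations, prior, lexicon, cost, alpha=1.0):
    """S1(m | r), proportional to exp(alpha * (log L0(r | m) - cost(m)))."""
    utilities = {}
    for m in markers:
        p = literal_listener(m, relations, prior, lexicon).get(relation, 0.0)
        if p > 0:  # markers that can never convey the relation are excluded
            utilities[m] = alpha * (math.log(p) - cost[m])  # -surprisal - cost
    z = sum(math.exp(u) for u in utilities.values())
    return {m: math.exp(u) / z for m, u in utilities.items()}


# Hypothetical toy setting: two relations, two explicit markers, one implicit option.
relations = ["contrast", "cause"]
markers = ["but", "therefore", "<implicit>"]
prior = {"contrast": 0.5, "cause": 0.5}
lexicon = {
    "contrast": {"but": 0.7, "<implicit>": 0.3},
    "cause": {"therefore": 0.4, "<implicit>": 0.6},
}
cost = {"but": 1.0, "therefore": 1.0, "<implicit>": 0.0}

print(pragmatic_speaker("contrast", markers, relations, prior, lexicon, cost))
```

In this toy setting the explicit marker wins narrowly for contrast because leaving the relation implicit is ambiguous, even though the explicit marker costs more to produce.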
25. Assessing the discourse factors that influence the quality of machine translation
- Author
-
Junyi Jessy Li, Marine Carpuat, and Ani Nenkova
- Subjects
Machine translation ,Arabic ,Computer science ,business.industry ,media_common.quotation_subject ,Discourse structure ,Computational linguistics ,Discourse connectives ,computer.software_genre ,language.human_language ,Linguistics ,Example-based machine translation ,Machine translations ,Rule-based machine translation ,Translation quality ,language ,Quality (business) ,Artificial intelligence ,business ,computer ,Sentence ,Natural language processing ,Dynamic and formal equivalence ,Computer aided language translation ,media_common - Abstract
We present a study of aspects of discourse structure, specifically discourse devices used to organize information in a sentence, that significantly impact the quality of machine translation. Our analysis is based on manual evaluations of translations of news from Chinese and Arabic to English. We find that there is a particularly strong mismatch in the notion of what constitutes a sentence in Chinese and English, which occurs often and is associated with significant degradation in translation quality. Also related to lower translation quality is the need to employ multiple explicit discourse connectives (because, but, etc.), as well as the presence of ambiguous discourse connectives in the English translation. Furthermore, the mismatches between discourse expressions across languages significantly impact translation quality. (52nd Annual Meeting of the Association for Computational Linguistics, ACL 2014, June 22-27, 2014, Baltimore, Maryland)
- Published
- 2015
26. A Refined End-to-End Discourse Parser
- Author
-
Jianxiang Wang and Man Lan
- Subjects
Parsing ,Computer science ,business.industry ,Section (typography) ,Construct (python library) ,Discourse connectives ,computer.software_genre ,Linguistics ,Style (sociolinguistics) ,Task (project management) ,TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES ,End-to-end principle ,Overall performance ,Artificial intelligence ,business ,computer ,Natural language processing - Abstract
The CoNLL-2015 shared task focuses on shallow discourse parsing, which takes a piece of newswire text as input and returns the discourse relations in a PDTB style. In this paper, we describe our discourse parser that participated in the shared task. We use 9 components to construct the whole parser to identify discourse connectives, label arguments and classify the sense of Explicit or Non-Explicit relations in free texts. Compared to previous discourse parsers, new components and features are added to our system, which further improve its overall performance. Our parser ranks first on two test datasets, i.e., PDTB Section 23 and a blind test dataset.
- Published
- 2015
- Full Text
- View/download PDF
27. Improving the Inference of Implicit Discourse Relations via Classifying Explicit Discourse Connectives
- Author
-
Nianwen Xue and Attapol Rutherford
- Subjects
Discourse relation ,Parsing ,Interpretation (logic) ,Computer science ,business.industry ,Natural language understanding ,Inference ,Context (language use) ,computer.software_genre ,Discourse connectives ,Linguistics ,Component (UML) ,Artificial intelligence ,business ,computer ,Natural language processing - Abstract
Discourse relation classification is an important component for automatic discourse parsing and natural language understanding. The performance bottleneck of a discourse parser comes from implicit discourse relations, whose discourse connectives are not overtly present. Explicit discourse connectives can potentially be exploited to collect more training data and boost the performance. However, using them indiscriminately has been shown to hurt the performance because not all discourse connectives can be dropped arbitrarily. Based on this insight, we investigate the interaction between discourse connectives and the discourse relations and propose the criteria for selecting the discourse connectives that can be dropped independently of the context without changing the interpretation of the discourse. Extra training data collected only by the freely omissible connectives improve the performance of the system without additional features.
- Published
- 2015
- Full Text
- View/download PDF
28. The Role of Expectedness in the Implicitation and Explicitation of Discourse Relations
- Author
-
Jet Hoek, Ted Sanders, Jacqueline Evers-Vermeul, ILS L&C, LS taalbeheersing van het Nederlands, and Dep Talen, Literatuur en Communicatie
- Subjects
060201 languages & linguistics ,Computer science ,business.industry ,05 social sciences ,Inference ,06 humanities and the arts ,16. Peace & justice ,computer.software_genre ,Discourse connectives ,Degree (music) ,050105 experimental psychology ,Linguistics ,0602 languages and literature ,0501 psychology and cognitive sciences ,Source text ,Artificial intelligence ,business ,Relation (history of concept) ,computer ,Natural language processing - Abstract
Translation of discourse connectives varies more in human translations than in machine translations. Building on Murray’s (1997) continuity hypothesis and Sanders’ (2005) causality-by-default hypothesis we investigate whether expectedness influences the degree of implicitation and explicitation of discourse relations. We manually analyze how source text connectives are translated, and where connectives in target texts come from. We establish whether relations are explicitly signaled in the other language as well, or whether they have to be reconstructed by inference. We demonstrate that the amount of implicitation and explicitation of connectives in translation is influenced by the expectedness of the relation a connective signals. In addition, we show that the types of connectives most often added in translation are also the ones most often deleted.
- Published
- 2015
- Full Text
- View/download PDF
29. Closing the Gap: Domain Adaptation from Explicit to Implicit Discourse Relations
- Author
-
Yangfeng Ji, Jacob Eisenstein, and Gongbo Zhang
- Subjects
Surface (mathematics) ,Discourse relation ,business.industry ,Computer science ,computer.software_genre ,Discourse connectives ,Linguistics ,Artificial intelligence ,Limit (mathematics) ,business ,Set (psychology) ,Closing (morphology) ,Representation (mathematics) ,computer ,Classifier (UML) ,Natural language processing - Abstract
Many discourse relations are explicitly marked with discourse connectives, and these examples could potentially serve as a plentiful source of training data for recognizing implicit discourse relations. However, there are important linguistic differences between explicit and implicit discourse relations, which limit the accuracy of such an approach. We account for these differences by applying techniques from domain adaptation, treating implicitly and explicitly-marked discourse relations as separate domains. The distribution of surface features varies across these two domains, so we apply a marginalized denoising autoencoder to induce a dense, domain-general representation. The label distribution is also domain-specific, so we apply a resampling technique that is similar to instance weighting. In combination with a set of automatically-labeled data, these improvements eliminate more than 80% of the transfer loss incurred by training an implicit discourse relation classifier on explicitly-marked discourse relations.
- Published
- 2015
- Full Text
- View/download PDF
30. Discovering Implicit Discourse Relations Through Brown Cluster Pair Representation and Coreference Patterns
- Author
-
Nianwen Xue and Attapol Rutherford
- Subjects
Discourse relation ,Coreference ,Computer science ,business.industry ,Representation (arts) ,Discourse connectives ,computer.software_genre ,Linguistics ,Feature (linguistics) ,Sentence pair ,Cluster (physics) ,Artificial intelligence ,business ,computer ,Natural language processing - Abstract
Sentences form coherent relations in a discourse without discourse connectives more frequently than with connectives. Senses of these implicit discourse relations that hold between a sentence pair, however, are challenging to infer. Here, we employ Brown cluster pairs to represent discourse relations and incorporate coreference patterns to identify senses of implicit discourse relations in naturally occurring text. Our system improves the baseline performance by as much as 25%. Feature analyses suggest that Brown cluster pairs and coreference patterns can reveal many key linguistic characteristics of each type of discourse relation.
- Published
- 2014
- Full Text
- View/download PDF
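One plausible reading of the Brown cluster pair representation mentioned in the abstract above is sketched below: each word is replaced by its Brown cluster ID, and every (Arg1 cluster, Arg2 cluster) combination becomes a binary feature. The cluster table, the `UNK` fallback and the function name are hypothetical, and a real system would load clusters induced from a large corpus.

```python
from itertools import product


def brown_cluster_pairs(arg1_tokens, arg2_tokens, clusters, unknown="UNK"):
    """Return the set of 'c1|c2' features over the Cartesian product of cluster IDs."""
    c1 = {clusters.get(tok.lower(), unknown) for tok in arg1_tokens}
    c2 = {clusters.get(tok.lower(), unknown) for tok in arg2_tokens}
    return {f"{a}|{b}" for a, b in product(c1, c2)}


# Hypothetical word-to-cluster table (bit-string IDs as produced by Brown clustering).
clusters = {"rain": "0110", "fell": "1011", "game": "0101", "cancelled": "1110"}

features = brown_cluster_pairs(["Rain", "fell"], ["game", "cancelled"], clusters)
print(sorted(features))
```

Because many words share a cluster, these pairs are far less sparse than raw word pairs, which is the usual motivation for the representation.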
31. Uncovering Discourse Relations to Insert Connectives between the Sentences of an Automatic Summary
- Author
-
António Branco and Sara Botelho Silveira
- Subjects
Discourse relation ,Computer science ,business.industry ,Discourse connectives ,computer.software_genre ,Automatic summarization ,language.human_language ,Linguistics ,Cohesion (linguistics) ,Classifier (linguistics) ,language ,Artificial intelligence ,Portuguese ,business ,computer ,Natural language processing - Abstract
This paper presents a machine learning approach to find and classify discourse relations between two unseen sentences. It describes the process of training a classifier that aims to determine (i) whether there is any discourse relation between two sentences, and, if a relation is found, (ii) which relation it is. The final goal of this task is to insert discourse connectives between sentences, seeking to enhance the text cohesion of a summary produced by an extractive summarization system for the Portuguese language.
- Published
- 2014
- Full Text
- View/download PDF
32. A Constituent-Based Approach to Argument Labeling with Joint Inference in Discourse Parsing
- Author
-
Hwee Tou Ng, Guodong Zhou, and Fang Kong
- Subjects
Parsing ,Computer science ,business.industry ,Discourse analysis ,Treebank ,Inference ,computer.software_genre ,Discourse connectives ,Focus (linguistics) ,Artificial intelligence ,Argument (linguistics) ,business ,computer ,Sentence ,Natural language processing - Abstract
Discourse parsing is a challenging task and plays a critical role in discourse analysis. In this paper, we focus on labeling full argument spans of discourse connectives in the Penn Discourse Treebank (PDTB). Previous studies cast this task as a linear tagging or subtree extraction problem. In this paper, we propose a novel constituent-based approach to argument labeling, which integrates the advantages of both linear tagging and subtree extraction. In particular, the proposed approach unifies intra- and intersentence cases by treating the immediately preceding sentence as a special constituent. Besides, a joint inference mechanism is introduced to incorporate global information across arguments into our constituent-based approach via integer linear programming. Evaluation on PDTB shows significant performance improvements of our constituent-based approach over the best state-of-the-art system. It also shows the effectiveness of our joint inference mechanism in modeling global information across arguments.
- Published
- 2014
- Full Text
- View/download PDF
33. Towards a discourse relation-aware approach for Chinese-English machine translation
- Author
-
Frances Yung
- Subjects
Example-based machine translation ,Annotation ,Discourse relation ,Machine translation ,Computer science ,computer.software_genre ,Discourse connectives ,computer ,Linguistics ,Sentence ,Focus (linguistics) - Abstract
Translation of discourse relations is one of the recent efforts of incorporating discourse information into statistical machine translation (SMT). While existing works focus on disambiguation of ambiguous discourse connectives, or transformation of discourse trees, only explicit discourse relations are tackled. A greater challenge exists in machine translation of Chinese, since implicit discourse relations are abundant and occur both inside and outside a sentence. This thesis proposal describes ongoing work on bilingual discourse annotation and plans towards incorporating discourse relation knowledge into a Chinese-English SMT system with consideration of implicit discourse relations. The final goal is a discourse-unit-based translation model unbounded by the traditional assumption of sentence-to-sentence translation.
- Published
- 2014
- Full Text
- View/download PDF
34. Discourse Tagging for Indian Languages
- Author
-
Sindhuja Gopalan, S. Lakshmi, and Sobha Lalitha Devi
- Subjects
Hindi ,Grammatical gender ,Indo aryan ,business.industry ,Computer science ,media_common.quotation_subject ,Agglutination ,computer.software_genre ,Discourse connectives ,language.human_language ,Agreement ,Linguistics ,Variation (linguistics) ,Tamil ,language ,Malayalam ,Indian language ,Artificial intelligence ,business ,computer ,Natural language processing ,media_common - Abstract
The Indian Language Discourse Project aims to develop a large corpus annotated with various types of discourse relations, both explicit and implicit. As an initial step towards it, we have annotated corpora in three languages, Hindi, Tamil and Malayalam, belonging to the two major language families in India: Indo-Aryan and Dravidian. In this paper we describe our initial experiments in annotating the corpora of all three languages; the corpora belong to the health domain. The initial experiment brought out various types of discourse connectives in the three languages and how they vary amongst the languages. The preliminary study itself revealed that there is cross-linguistic variation among the three languages. We report the inter-annotator agreement for all three languages.
- Published
- 2014
- Full Text
- View/download PDF
35. Annotating Discourse Connectives in Spoken Turkish
- Author
-
Deniz Zeyrek and Isin Demirsahin
- Subjects
Turkish ,Computer science ,business.industry ,computer.software_genre ,Discourse connectives ,language.human_language ,Linguistics ,Style (sociolinguistics) ,Annotation ,language ,Artificial intelligence ,business ,computer ,Natural language processing ,Spoken language - Abstract
In an attempt to extend Penn Discourse Tree Bank (PDTB) / Turkish Discourse Bank (TDB) style annotations to spoken Turkish, this paper presents the first attempt at annotating the explicit discourse connectives in the Spoken Turkish Corpus (STC) demo version. We present the data and the method for the annotation. Then we reflect on the issues and challenges of transitioning from written to spoken language. We present the preliminary findings suggesting that the distribution of the search tokens and their use as discourse connectives are similar in the TDB and the STC demo.
- Published
- 2014
- Full Text
- View/download PDF
36. Assessing the Accuracy of Discourse Connective Translations: Validation of an Automatic Metric
- Author
-
Andrei Popescu-Belis and Najeh Hajlaoui
- Subjects
Machine translation ,business.industry ,Arabic ,Computer science ,Sample (statistics) ,computer.software_genre ,Class (biology) ,language.human_language ,Fluency ,Machine Translation ,Metric (mathematics) ,language ,MT evaluation ,Evaluation of machine translation ,Artificial intelligence ,business ,discourse connectives ,computer ,Sentence ,Natural language processing - Abstract
Automatic metrics for the evaluation of machine translation (MT) compute scores that characterize globally certain aspects of MT quality such as adequacy and fluency. This paper introduces a reference-based metric that is focused on a particular class of function words, namely discourse connectives, of particular importance for text structuring, and rather challenging for MT. To measure the accuracy of connective translation (ACT), the metric relies on automatic word-level alignment between a source sentence and respectively the reference and candidate translations, along with other heuristics for comparing translations of discourse connectives. Using a dictionary of equivalents, the translations are scored automatically, or, for better precision, semi-automatically. The precision of the ACT metric is assessed by human judges on sample data for English/French and English/Arabic translations: the ACT scores are on average within 2% of human scores. The ACT metric is then applied to several commercial and research MT systems, providing an assessment of their performance on discourse connectives.
- Published
- 2013
- Full Text
- View/download PDF
37. LEXCONN: A French Lexicon of Discourse Connectives
- Author
-
Philippe Muller, Laurence Danlos, Charlotte Roze, Analyse Linguistique Profonde à Grande Echelle, Large-scale deep linguistic processing (ALPAGE), Inria Paris-Rocquencourt, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université Paris Diderot - Paris 7 (UPD7), MEthodes et ingénierie des Langues, des Ontologies et du DIscours (IRIT-MELODI), Institut de recherche en informatique de Toulouse (IRIT), Université Toulouse 1 Capitole (UT1), Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse - Jean Jaurès (UT2J)-Université Toulouse III - Paul Sabatier (UT3), Université Fédérale Toulouse Midi-Pyrénées-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP), Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse 1 Capitole (UT1), Université Fédérale Toulouse Midi-Pyrénées, Université Paris Diderot - Paris 7 (UPD7)-Inria Paris-Rocquencourt, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), Université Toulouse Capitole (UT Capitole), Université de Toulouse (UT)-Université de Toulouse (UT)-Université Toulouse - Jean Jaurès (UT2J), Université de Toulouse (UT)-Université Toulouse III - Paul Sabatier (UT3), Université de Toulouse (UT)-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP), Université de Toulouse (UT)-Toulouse Mind & Brain Institut (TMBI), Université Toulouse - Jean Jaurès (UT2J), Université de Toulouse (UT)-Université de Toulouse (UT)-Université Toulouse III - Paul Sabatier (UT3), Université de Toulouse (UT)-Université Toulouse Capitole (UT Capitole), Université de Toulouse (UT), Lydia-Mai Ho-Dac, Julie Lemarié, Marie-Paule Péry-Woodley, Marianne Vergez-Couret, ANNODIS, and Muller, Philippe
- Subjects
Discourse representation theory ,discourse relations ,Relation (database) ,Computer science ,media_common.quotation_subject ,02 engineering and technology ,computer.software_genre ,Lexicon ,[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL] ,Resource (project management) ,Empirical research ,lcsh:P1-1091 ,0202 electrical engineering, electronic engineering, information engineering ,[SHS.LANGUE]Humanities and Social Sciences/Linguistics ,ComputingMilieux_MISCELLANEOUS ,media_common ,060201 languages & linguistics ,business.industry ,lcsh:P98-98.5 ,06 humanities and the arts ,Ambiguity ,[SCCO.LING]Cognitive science/Linguistics ,Discourse connectives ,Linguistics ,lcsh:Philology. Linguistics ,discourse connectives ,identification of connectives ,[INFO.INFO-CL] Computer Science [cs]/Computation and Language [cs.CL] ,0602 languages and literature ,ambiguity ,lexicon ,020201 artificial intelligence & image processing ,Artificial intelligence ,[SCCO.LING] Cognitive science/Linguistics ,lcsh:Computational linguistics. Natural language processing ,business ,computer ,Natural language processing ,Discourse marker - Abstract
With respect to discourse organization, the most basic way of signaling the speaker’s or writer’s intentions is to use explicit lexical markers: so-called discourse markers or discourse connectives. While a lexicon of discourse connectives associated with the relations they express can be very useful for researchers, especially in Natural Language Processing, few projects aim at collecting them exhaustively, and only in a small number of languages. We present LEXCONN, a French lexicon of 328 discourse connectives, collected with their syntactic categories and the discourse relations they convey, and the methodology followed to build this resource. The lexicon has been constructed manually, applying systematic connective and relation identification criteria, using the FRANTEXT corpus as empirical support. Each connective has been associated with a relation within the framework of Segmented Discourse Representation Theory. We make a case for a few refinements in the theory, based on cases where no existing relation seemed to match a connective’s usage.
- Published
- 2012
38. Using syntax to disambiguate explicit discourse connectives in text
- Author
-
Emily Pitler and Ani Nenkova
- Subjects
Discourse relation ,Relation (database) ,Computer science ,business.industry ,media_common.quotation_subject ,Ambiguity ,Discourse connectives ,computer.software_genre ,Syntax ,Linguistics ,Artificial intelligence ,business ,computer ,Discourse marker ,Word (computer architecture) ,Natural language processing ,media_common - Abstract
Discourse connectives are words or phrases such as once, since, and on the contrary that explicitly signal the presence of a discourse relation. There are two types of ambiguity that need to be resolved during discourse processing. First, a word can be ambiguous between discourse and non-discourse usage. For example, once can be either a temporal discourse connective or simply a word meaning "formerly". Secondly, some connectives are ambiguous in terms of the relation they mark. For example, since can serve as either a temporal or causal connective. We demonstrate that syntactic features improve performance in both disambiguation tasks. We report state-of-the-art results for identifying discourse vs. non-discourse usage and human-level performance on sense disambiguation.
- Published
- 2009
- Full Text
- View/download PDF
39. Genre distinctions for discourse in the Penn TreeBank
- Author
-
Bonnie Webber
- Subjects
business.industry ,Computer science ,Treebank ,Artificial intelligence ,Discourse connectives ,computer.software_genre ,business ,computer ,Linguistics ,Natural language processing - Abstract
Articles in the Penn TreeBank were identified as being reviews, summaries, letters to the editor, news reportage, corrections, wit and short verse, or quarterly profit reports. All but the latter three were then characterised in terms of features manually annotated in the Penn Discourse TreeBank --- discourse connectives and their senses. Summaries turned out to display very different discourse features than the other three genres. Letters also appeared to have some different features. The two main findings involve (1) differences between genres in the senses associated with intra-sentential discourse connectives, inter-sentential discourse connectives and inter-sentential discourse relations that are not lexically marked; and (2) differences within all four genres between the senses of discourse relations not lexically marked and those that are marked. The first finding means that genre should be made a factor in automated sense labelling of non-lexically marked discourse relations. The second means that lexically marked relations provide a poor model for automated sense labelling of relations that are not lexically marked.
- Published
- 2009
- Full Text
- View/download PDF
40. D-STAG: a Formalism for Discourse Analysis based on SDRT and using Synchronous TAG
- Author
-
Laurence Danlos, Analyse Linguistique Profonde à Grande Echelle, Large-scale deep linguistic processing (ALPAGE), Inria Paris-Rocquencourt, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université Paris Diderot - Paris 7 (UPD7), Philippe de Groote, Université Paris Diderot - Paris 7 (UPD7)-Inria Paris-Rocquencourt, and Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)
- Subjects
Formalism (philosophy) ,Discourse analysis ,media_common.quotation_subject ,02 engineering and technology ,computer.software_genre ,0202 electrical engineering, electronic engineering, information engineering ,Domain of discourse ,Mathematics ,media_common ,Semantic tree ,Discourse relation ,Functor ,Grammar ,business.industry ,06 humanities and the arts ,[SCCO.LING]Cognitive science/Linguistics ,060202 literary studies ,16. Peace & justice ,Discourse connectives ,Linguistics ,0602 languages and literature ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,computer ,Natural language processing - Abstract
We propose D-STAG, a new formalism for the automatic analysis of discourse. The analyses computed by D-STAG are hierarchical discourse structures annotated with discourse relations, which are compatible with discourse structures computed in SDRT. A discursive STAG grammar pairs up trees anchored by discourse connectives with trees anchored by (functors associated with) discourse relations.
- Published
- 2009
41. The Hindi Discourse Relation Bank
- Author
-
Aravind K. Joshi, Rashmi Prasad, Dipti Misra Sharma, Umangi Oza, and Sudheer Kolachina
- Subjects
Hindi ,Discourse relation ,business.industry ,Computer science ,Treebank ,Discourse connectives ,computer.software_genre ,language.human_language ,Linguistics ,Annotation ,language ,Artificial intelligence ,business ,computer ,Natural language processing - Abstract
We describe the Hindi Discourse Relation Bank project, aimed at developing a large corpus annotated with discourse relations. We adopt the lexically grounded approach of the Penn Discourse Treebank, and describe our classification of Hindi discourse connectives, our modifications to the sense classification of discourse relations, and some cross-linguistic comparisons based on some initial annotations carried out so far.
- Published
- 2009
- Full Text
- View/download PDF
42. Recognizing implicit discourse relations in the Penn Discourse Treebank
- Author
-
Min-Yen Kan, Hwee Tou Ng, and Ziheng Lin
- Subjects
Discourse relation ,Dependency (UML) ,business.industry ,Computer science ,Treebank ,Context (language use) ,computer.software_genre ,Discourse connectives ,Linguistics ,Artificial intelligence ,business ,Classifier (UML) ,computer ,Natural language processing - Abstract
We present an implicit discourse relation classifier in the Penn Discourse Treebank (PDTB). Our classifier considers the context of the two arguments, word pair information, as well as the arguments' internal constituent and dependency parses. Our results on the PDTB yield a significant 14.1% improvement over the baseline. In our error analysis, we discuss four challenges in recognizing implicit relations in the PDTB.
- Published
- 2009
- Full Text
- View/download PDF
43. Discourse Connective Argument Identification with Connective Specific Rankers
- Author
-
R. Elwell and Jason Baldridge
- Subjects
Distance measurement ,Parsing ,Discriminative model ,Discourse structure ,Computer science ,business.industry ,Entropy (information theory) ,Artificial intelligence ,Discourse connectives ,computer.software_genre ,business ,computer ,Natural language processing - Abstract
Automatically identifying the arguments of discourse connectives (e.g., and, because, however) is an important part of modeling discourse structure. Previous work used a single, general classifier for different connectives; however, connectives differ in their distribution and behavior, so conflating them this way loses discriminative power. Here, we show that using models for specific connectives and types of connectives and interpolating them with a general model improves performance. We also describe additional features that provide greater sensitivity to morphological, syntactic, and discourse patterns, and less sensitivity to parse quality. Our best model achieves a 3.6% absolute improvement over the state-of-the-art on identifying both arguments of discourse connectives when using features from gold-standard parses, and a 9.0% improvement when using automatically produced parses.
- Published
- 2008
- Full Text
- View/download PDF
44. Sense Annotation in the Penn Discourse Treebank
- Author
-
Alan Lee, Livio Robaldo, Eleni Miltsakaki, and Aravind K. Joshi
- Subjects
Structure (mathematical logic) ,Discourse relation ,business.industry ,Computer science ,Treebank ,Discourse connectives ,computer.software_genre ,Semantics ,Lexical item ,Linguistics ,Annotation ,Artificial intelligence ,Argument (linguistics) ,Attribution ,business ,computer ,Natural language processing - Abstract
An important aspect of discourse understanding and generation involves the recognition and processing of discourse relations. These are conveyed by discourse connectives, i.e., lexical items like because and as a result, or implicit connectives expressing an inferred discourse relation. The Penn Discourse TreeBank (PDTB) provides annotations of the argument structure, attribution and semantics of discourse connectives. In this paper, we provide the rationale of the tagset, detailed descriptions of the senses with corpus examples, simple semantic definitions of each type of sense tag, as well as informal descriptions of the inferences allowed at each level.
- Published
- 2008
- Full Text
- View/download PDF
45. A pilot annotation to investigate discourse connectivity in biomedical text
- Author
-
Susan McRoy, Nadya Frid, Hong Yu, Aravind K. Joshi, Rashmi Prasad, and Alan Lee
- Subjects
Computer science ,business.industry ,media_common.quotation_subject ,Treebank ,computer.software_genre ,Discourse connectives ,Linguistics ,Agreement ,Annotation ,Biomedical text ,Artificial intelligence ,business ,computer ,Natural language processing ,Coherence (linguistics) ,media_common - Abstract
The goal of the Penn Discourse Treebank (PDTB) project is to develop a large-scale corpus, annotated with coherence relations marked by discourse connectives. Currently, the primary application of the PDTB annotation has been to news articles. In this study, we tested whether the PDTB guidelines can be adapted to a different genre. We annotated discourse connectives and their arguments in one 4,937-token full-text biomedical article. Two linguist annotators showed an agreement of 85% after simple conventions were added. For the remaining 15% of cases, we found that biomedical domain-specific knowledge is needed to capture the linguistic cues that can be used to resolve inter-annotator disagreement. We found that the two annotators were able to reach an agreement after discussion. Thus our experiments suggest that the PDTB annotation can be adapted to new domains by minimally adjusting the guidelines and by adding some further domain-specific linguistic cues.
- Published
- 2008
- Full Text
- View/download PDF
46. Attribution and the (non-)alignment of syntactic and discourse arguments of connectives
- Author
-
Nikhil Dinesh, Rashmi Prasad, Aravind K. Joshi, Alan Lee, Bonnie Webber, and Eleni Miltsakaki
- Subjects
Non alignment ,Relation (database) ,business.industry ,Computer science ,Treebank ,computer.software_genre ,Discourse connectives ,Syntax ,Linguistics ,Annotation ,Syntactic structure ,Artificial intelligence ,Argument (linguistics) ,business ,Attribution ,computer ,Natural language processing - Abstract
The annotations of the Penn Discourse Treebank (PDTB) include (1) discourse connectives and their arguments, and (2) attribution of each argument of each connective and of the relation it denotes. Because the PDTB covers the same text as the Penn TreeBank WSJ corpus, syntactic and discourse annotation can be compared. This has revealed significant differences between syntactic structure and discourse structure, in terms of the arguments of connectives, due in large part to attribution. We describe these differences, an algorithm for detecting them, and finally some experimental results. These results have implications for automating discourse annotation based on syntactic annotation.
- Published
- 2005
- Full Text
- View/download PDF
47. A parallel Proposition Bank II for Chinese and English
- Author
-
Olga Babko-Malaya, Benjamin Snyder, Martha Palmer, Jinying Chen, and Nianwen Xue
- Subjects
PropBank ,Annotation ,business.industry ,Computer science ,Treebank ,Proposition ,Artificial intelligence ,computer.software_genre ,business ,Discourse connectives ,computer ,Natural language processing ,Preliminary analysis - Abstract
The Proposition Bank (PropBank) project is aimed at creating a corpus of text annotated with information about semantic propositions. The second phase of the project, PropBank II, adds additional levels of semantic annotation, which include eventuality variables, co-reference, coarse-grained sense tags, and discourse connectives. This paper presents the results of the parallel PropBank II project, which adds these richer layers of semantic annotation to the first 100K of the Chinese Treebank and its English translation. Our preliminary analysis supports the hypothesis that this additional annotation reconciles many of the surface differences between the two languages.
- Published
- 2005
- Full Text
- View/download PDF
48. Modelling the substitutability of discourse connectives
- Author
-
Ben Hutchinson
- Subjects
Parsing ,Computer science ,business.industry ,media_common.quotation_subject ,SIGNAL (programming language) ,Variance (accounting) ,Coherence (statistics) ,computer.software_genre ,Discourse connectives ,Artificial intelligence ,Function (engineering) ,business ,computer ,Natural language processing ,media_common - Abstract
Processing discourse connectives is important for tasks such as discourse parsing and generation. For these tasks, it is useful to know which connectives can signal the same coherence relations. This paper presents experiments into modelling the substitutability of discourse connectives. It shows that substitutability affects distributional similarity. A novel variance-based function for comparing probability distributions is found to assist in predicting substitutability.
- Published
- 2005
- Full Text
- View/download PDF
49. Annotation and data mining of the Penn Discourse TreeBank
- Author
-
Eleni Miltsakaki, Bonnie Webber, Rashmi Prasad, and Aravind K. Joshi
- Subjects
PropBank ,Annotation ,Computer science ,business.industry ,Treebank ,Syntactic structure ,Artificial intelligence ,computer.software_genre ,business ,Discourse connectives ,computer ,Natural language processing ,Linguistics - Abstract
The Penn Discourse TreeBank (PDTB) is a new resource built on top of the Penn Wall Street Journal corpus, in which discourse connectives are annotated along with their arguments. Its use of stand-off annotation allows integration with a stand-off version of the Penn TreeBank (syntactic structure) and PropBank (verbs and their arguments), which adds value for both linguistic discovery and discourse modeling. Here we describe the PDTB and some experiments in linguistic discovery based on the PDTB alone, as well as on the linked PTB and PDTB corpora.
- Published
- 2004
- Full Text
- View/download PDF
50. A semantic account of adverbials as discourse connectives
- Author
-
Bonnie Webber and Kate Forbes
- Subjects
Computer science ,business.industry ,Principle of compositionality ,Adverb ,Discourse connectives ,computer.software_genre ,Linguistics ,Annotation ,Matrix (mathematics) ,Artificial intelligence ,business ,computer ,Natural language processing ,Adverbial - Abstract
We address the question of why certain adverb and preposition phrases are only interpretable with respect to the discourse, and not just their own matrix clause. We show that, in many cases, an adverbial's compositional semantics explains why. We close by reporting on an annotation study aimed at providing specific evidence for how adverbials are interpreted with respect to the discourse.
- Published
- 2002
- Full Text
- View/download PDF