106 results on '"Bottom-up parsing"'
Search Results
2. An Abstract, Reusable, and Extensible Programming Language Design Architecture
- Author
-
Aït-Kaci, Hassan, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Tannen, Val, editor, Wong, Limsoon, editor, Libkin, Leonid, editor, Fan, Wenfei, editor, Tan, Wang-Chiew, editor, and Fourman, Michael, editor
- Published
- 2013
- Full Text
- View/download PDF
3. Eliminating Stack Symbols in Push-Down Automata and Linear Indexed Grammars
- Author
-
Nakamura, Katsuhiko, Imada, Keita, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Dediu, Adrian-Horia, editor, Martín-Vide, Carlos, editor, and Truthe, Bianca, editor
- Published
- 2013
- Full Text
- View/download PDF
4. Incremental Learning of Context Free Grammars by Bridging Rule Generation and Search for Semi-optimum Rule Sets
- Author
-
Nakamura, Katsuhiko, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Dough, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Carbonell, Jaime G., editor, Siekmann, Jörg, editor, Sakakibara, Yasubumi, editor, Kobayashi, Satoshi, editor, Sato, Kengo, editor, Nishino, Tetsuro, editor, and Tomita, Etsuji, editor
- Published
- 2006
- Full Text
- View/download PDF
5. On Implicit Meanings
- Author
-
Dahl, Veronica, Goos, G., editor, Hartmanis, J., editor, van Leeuwen, J., editor, Carbonell, Jaime G., editor, Siekmann, Jörg, editor, Kakas, Antonis C., editor, and Sadri, Fariba, editor
- Published
- 2002
- Full Text
- View/download PDF
6. Advantages of Dependency Parsing for Free Word Order Natural Languages
- Author
-
Gholamreza Ghassem-Sani and Seyed Amin Mirlohi Falavarjani
- Subjects
Parsing ,Computer science ,business.industry ,Speech recognition ,computer.software_genre ,Top-down parsing ,TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES ,Parser combinator ,Dependency grammar ,S-attributed grammar ,Top-down parsing language ,Artificial intelligence ,business ,computer ,Natural language processing ,Word order ,Bottom-up parsing - Abstract
An important reason to prefer dependency parsing over classical phrased based methods, especially for languages such as Persian, with the property of being "free word order", is that this particular property has a negative impact on the accuracy of conventional parsing methods. In Persian, some words such as adverbs can freely be moved within a sentence without affecting its correctness or meaning. In this paper, we illustrate the robustness of dependency parsing against this particular problem by training two well-known dependency parsers, namely MST Parser and Malt Parser, using a Persian dependency corpus called Dadegan. We divided the corpus into two separate parts including only projective sentences and only non-projective sentences, which are corelated with the free word order property. As our results show, MST Parsing is not only more tolerant than Malt Parsing against the free word order problem, but it is also in general a more accurate technique.
- Published
- 2015
- Full Text
- View/download PDF
7. Hybrid Dependency Parser with Segmented Treebanks and Reparsing
- Author
-
Fuxiang Wu and Fugen Zhou
- Subjects
Parsing ,Computer science ,business.industry ,Speech recognition ,Parsing expression grammar ,computer.software_genre ,Top-down parsing ,Front and back ends ,TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES ,Parser combinator ,Dependency grammar ,Graph (abstract data type) ,Artificial intelligence ,business ,computer ,Natural language processing ,Bottom-up parsing - Abstract
We propose a hybrid dependency parsing pipeline which combines transition-based parser and graph-based parser, and use segmented treebanks to train transition-based parsers as subparsers in front end, and then propose a constrained Eisner’s algorithm to reparse their outputs. We build the pipeline to investigate the influence on parsing accuracy when training with different segmentations of training data and find a convenient method to obtain parsing reliability score while achieving state-of-the-art parsing accuracy. Our results show that the pipeline with segmented training dataset could improve accuracy through reparsing while providing parsing reliability score.
- Published
- 2015
- Full Text
- View/download PDF
8. Basic Notions of Parsing
- Author
-
Roland Hausser
- Subjects
Parsing ,Grammar ,Categorial grammar ,Computer science ,Programming language ,business.industry ,media_common.quotation_subject ,Top-down parsing ,computer.software_genre ,Formal grammar ,TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES ,Top-down parsing language ,Artificial intelligence ,business ,computer ,Generative grammar ,Natural language processing ,media_common ,Bottom-up parsing - Abstract
This chapter investigates which formal properties make a formal grammar suitable for automatic language analysis and which do not. For this purpose, context-free PS Grammar and its parsers will be used as our main example.
- Published
- 2014
- Full Text
- View/download PDF
9. Language Design and Implementation via the Combination of Embedding and Parsing
- Author
-
Máté Tejfel, Dániel Leskó, and Gergely Dévai
- Subjects
Constructed language ,Domain-specific language ,Parsing ,Computer science ,Programming language ,Design process ,Embedding ,Compiler ,computer.software_genre ,Top-down parsing ,computer ,Bottom-up parsing - Abstract
Language embedding is a method to implement a new language within the framework of an existing programming language. This method is known to speed up the development process compared to standalone languages using classical compiler technology. On the other hand, embedded languages may not be that convenient for the end-users as standalone ones with own concrete syntax. This paper describes a method that uses the flexibility of language embedding in the experimental phase of the language design process, then, once the language features are mature enough, adds concrete syntax and turns the language to a standalone one. Lessons learnt from a project, run in industry-university cooperation and using the presented method, are discussed. Based on these results, a cost model is established that can be used to estimate the potential benefits of this method in case of future language design projects.
- Published
- 2014
- Full Text
- View/download PDF
10. Dependency-Based Semantic Parsing for Concept-Level Text Analysis
- Author
-
Basant Agarwal, Alexander Gelbukh, Amir Hussain, Newton Howard, and Soujanya Poria
- Subjects
Parsing ,Information retrieval ,Noisy text analytics ,Dependency relation ,business.industry ,Computer science ,Sentiment analysis ,Text graph ,computer.software_genre ,Semantics ,Text mining ,Artificial intelligence ,business ,computer ,Natural language processing ,Natural language ,Bottom-up parsing - Abstract
Concept-level text analysis is superior to word-level analysis as it preserves the semantics associated with multi-word expressions. It offers a better understanding of text and helps to significantly increase the accuracy of many text mining tasks. Concept extraction from text is a key step in concept-level text analysis. In this paper, we propose a ConceptNet-based semantic parser that deconstructs natural language text into concepts based on the dependency relation between clauses. Our approach is domain-independent and is able to extract concepts from heterogeneous text. Through this parsing technique, 92.21% accuracy was obtained on a dataset of 3,204 concepts. We also show experimental results on three different text analysis tasks, on which the proposed framework outperformed state-of-the-art parsing techniques.
- Published
- 2014
- Full Text
- View/download PDF
11. Amharic Sentence Parsing Using Base Phrase Chunking
- Author
-
Abeba Ibrahim and Yaregal Assabie
- Subjects
Phrase ,Parsing ,Computer science ,business.industry ,Speech recognition ,Top-down parsing ,computer.software_genre ,language.human_language ,TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES ,Amharic ,Parser combinator ,language ,S-attributed grammar ,Top-down parsing language ,Artificial intelligence ,Phrase chunking ,business ,computer ,Natural language processing ,Sentence ,Bottom-up parsing - Abstract
Parsing plays a significant role in many natural language processing NLP applications as their efficiency relies on having an effective parser. This paper presents Amharic sentence parser developed using base phrase chunker that groups syntactically correlated words at different levels. We use HMM to chunk base phrases where incorrectly chunked phrases are pruned with rules. The task of parsing is then performed by taking chunk results as inputs. Bottom-up approach with transformation algorithm is used to transform the chunker to the parser. Corpus from Amharic news outlets and books was collected for training and testing. The training and testing datasets were prepared using the 10-fold cross validation technique. Test results on the test data showed an average parsing accuracy of 93.75%.
- Published
- 2014
- Full Text
- View/download PDF
12. Committee-Based Active Learning for Dependency Parsing
- Author
-
Saeed Majidi and Gregory Crane
- Subjects
Parsing ,Dependency (UML) ,business.industry ,Computer science ,Active learning (machine learning) ,computer.software_genre ,Top-down parsing ,Annotation ,Dependency grammar ,S-attributed grammar ,Artificial intelligence ,business ,GeneralLiterature_REFERENCE(e.g.,dictionaries,encyclopedias,glossaries) ,computer ,Natural language processing ,Bottom-up parsing - Abstract
Annotations on structured corpora provide a foundational instrument for emerging linguistic research. To generate annotations automatically, data-driven dependency parsers need a large annotated corpus to learn from. But these annotations are expensive to collect and require a labor intensive task. In order to reduce the costs of annotation, we provide a novel framework in which a committee of dependency parsers collaborate to improve their efficiency using active learning.
- Published
- 2013
- Full Text
- View/download PDF
13. Online Service for Polish Dependency Parsing and Results Visualisation
- Author
-
Alina Wróblewska and Piotr Sikora
- Subjects
Service (systems architecture) ,Parsing ,Dependency (UML) ,GeneralLiterature_INTRODUCTORYANDSURVEY ,Computer science ,business.industry ,computer.software_genre ,Visualization ,Dependency grammar ,Artificial intelligence ,business ,computer ,Natural language processing ,Bottom-up parsing - Abstract
The paper presents a new online service for the dependency parsing of Polish. Given raw text as input, the service processes it and visualises output dependency trees. The service applies the parsing system – MaltParser – with a parsing model for Polish trained on the Polish Dependency Bank, and some additional publicly available tools.
- Published
- 2013
- Full Text
- View/download PDF
14. On LR Parsing with Selective Delays
- Author
-
Mark-Jan Nederhof, Sylvain Schmitz, Eberhard Bertsch, Ruhr-Universität Bochum [Bochum], University of St Andrews [Scotland], Laboratoire Spécification et Vérification [Cachan] (LSV), École normale supérieure - Cachan (ENS Cachan)-Centre National de la Recherche Scientifique (CNRS), and Koen De Bosschere and Ranjit Jhala
- Subjects
Computer science ,0102 computer and information sciences ,02 engineering and technology ,computer.software_genre ,Top-down parsing ,01 natural sciences ,Canonical LR parser ,[INFO.INFO-FL]Computer Science [cs]/Formal Languages and Automata Theory [cs.FL] ,Parser combinator ,0202 electrical engineering, electronic engineering, information engineering ,grammar transformations ,Parsing ,grammar covers ,Programming language ,business.industry ,020207 software engineering ,Parsing expression grammar ,TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES ,010201 computation theory & mathematics ,F.4.2 ,S-attributed grammar ,Top-down parsing language ,Artificial intelligence ,business ,computer ,Natural language processing ,Bottom-up parsing - Abstract
International audience; The paper investigates an extension of LR parsing that allows the delay of parsing decisions until a sufficient amount of context has been processed. We provide two characterizations for the resulting class of grammars, one based on grammar transformations, the other on the direct construction of a parser. We also report on experiments with a grammar collection.
- Published
- 2013
- Full Text
- View/download PDF
15. Flexible Combinatory Categorial Grammar Parsing Using the CYK Algorithm and Answer Set Programming
- Author
-
Peter Schüller
- Subjects
Parsing ,business.industry ,Computer science ,Programming language ,Parse tree ,Combinatory categorial grammar ,computer.software_genre ,Top-down parsing ,TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES ,Parser combinator ,CYK algorithm ,S-attributed grammar ,Artificial intelligence ,business ,computer ,Natural language processing ,Bottom-up parsing - Abstract
Combinatory Categorial Grammar CCG is a grammar formalism used for natural language parsing. CCG assigns structured lexical categories to words and uses a small set of combinatory rules to combine these categories in order to parse sentences. In this work we describe and implement a new approach to CCG parsing that relies on Answer Set Programming ASP -- a declarative programming paradigm.Different from previous work, we present an encoding that is inspired by the algorithm due to Cocke, Younger, and Kasami CYK. We also show encoding extensions for parse tree normalization and best-effort parsing and outline possible future extensions which are possible due to the usage of ASP as computational mechanism. We analyze performance of our approach on a part of the Brown corpus and discuss lessons learned during experiments with the ASP tools dlv, gringo, and clasp. The new approach is available in the open source CCG parsing toolkit AspCcgTk which uses the C&C supertagger as a preprocessor to achieve wide-coverage natural language parsing.
- Published
- 2013
- Full Text
- View/download PDF
16. Layout-Sensitive Generalized Parsing
- Author
-
Tillmann Rendel, Klaus Ostermann, Christian Kästner, and Sebastian Erdweg
- Subjects
Parsing ,Parser combinator ,Computer science ,Programming language ,Whitespace ,Memoization ,Haskell ,Top-down parsing language ,computer.software_genre ,Top-down parsing ,computer ,computer.programming_language ,Bottom-up parsing - Abstract
The theory of context-free languages is well-understood and context-free parsers can be used as off-the-shelf tools in practice. In particular, to use a context-free parser framework, a user does not need to understand its internals but can specify a language declaratively as a grammar. However, many languages in practice are not context-free. One particularly important class of such languages is layout-sensitive languages, in which the structure of code depends on indentation and whitespace. For example, Python, Haskell, F#, and Markdown use indentation instead of curly braces to determine the block structure of code. Their parsers (and lexers) are not declaratively specified but hand-tuned to account for layout-sensitivity.
- Published
- 2013
- Full Text
- View/download PDF
17. Eliminating Stack Symbols in Push-Down Automata and Linear Indexed Grammars
- Author
-
Keita Imada and Katsuhiko Nakamura
- Subjects
Tree-adjoining grammar ,Indexed language ,Nested stack automaton ,Computer science ,Context-sensitive grammar ,Indexed grammar ,L-attributed grammar ,Arithmetic ,Embedded pushdown automaton ,Bottom-up parsing - Abstract
This paper investigates two subjects in push-down automata (PDAs) and linear indexed grammars (LIGs), which are extended PDAs, focusing on eliminating the stack symbols. One of the subjects is concerned with PI- (push-input-) PDA and PI-LIG without e-transition rule, in which only input symbols are pushed down to the stack. It is shown that the class of languages of PI-LIGs is incomparable with that of PDAs, which is the class of context-free languages (CFLs). The other subject is a simple bottom-up parsing method for LIGs, in which the stack symbols are eliminated at the first step of the parsing. The paper shows several PI-LIGs, including PI-PDAs for fundamental context-free and context-sensitive languages, which are synthesized by a grammatical inference system LIG Learner.
- Published
- 2013
- Full Text
- View/download PDF
18. Bare-Bones Dependency Parsing
- Author
-
Joakim Nivre
- Subjects
Parsing ,Computer science ,business.industry ,Parsing expression grammar ,Top-down parsing ,computer.software_genre ,TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES ,Parser combinator ,Dependency grammar ,S-attributed grammar ,Top-down parsing language ,Artificial intelligence ,business ,computer ,Natural language processing ,Bottom-up parsing - Abstract
If all we want from a syntactic parser is a dependency tree, what do we gain by first computing a different representation such as a phrase structure tree? The principle of parsimony suggests that a simpler model should be preferred over a more complex model, all other things being equal, and the simplest model is arguably one that maps a sentence directly to a dependency tree --- a bare-bones dependency parser. In this paper, I characterize the parsing problem faced by such a system, survey the major parsing techniques currently in use, and begin to examine whether the simpler model can in fact rival the performance of more complex systems. Although the empirical evidence is still limited, I conclude that bare-bones dependency parsers can achieve state-of-the-art parsing accuracy and often excel in terms of efficiency.
- Published
- 2012
- Full Text
- View/download PDF
19. Dependency Parsing with Efficient Feature Extraction
- Author
-
Günter Neumann and Alexander Volokh
- Subjects
Parsing ,LR parser ,Computer science ,business.industry ,Speech recognition ,Feature extraction ,Top-down parsing ,computer.software_genre ,TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES ,Parser combinator ,Dependency grammar ,Artificial intelligence ,business ,computer ,Natural language processing ,Sentence ,Bottom-up parsing - Abstract
The fastest parsers currently can parse an average sentence in up to 2.5ms, a considerable improvement, since most of the older accuracy-oriented parsers parse only few sentences per second. It is generally accepted that the complexity of a parsing algorithm is decisive for the performance of a parser. However, we show that the most time consuming part of processing is feature extraction and therefore an algorithm which allows efficient feature extraction can outperform a less complex algorithm which does not. Our system based on quadratic Covington's parsing strategy with efficient feature extraction is able to parse an average English sentence in only 0.8ms without any parallelisation.
- Published
- 2012
- Full Text
- View/download PDF
20. Parsing Combinatory Categorial Grammar via Planning in Answer Set Programming
- Author
-
Peter Schüller and Yuliya Lierler
- Subjects
Parsing ,Computer science ,business.industry ,Programming language ,Parse tree ,Combinatory categorial grammar ,Top-down parsing ,computer.software_genre ,TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES ,Parser combinator ,S-attributed grammar ,Top-down parsing language ,Artificial intelligence ,business ,computer ,Natural language processing ,Bottom-up parsing - Abstract
Combinatory categorial grammar (CCG) is a grammar formalism used for natural language parsing. CCG assigns structured lexical categories to words and uses combinatory rules to combine these categories to parse a sentence. In this work we propose and implement a new approach to CCG parsing that relies on a prominent knowledge representation formalism, answer set programming (ASP) - a declarative programming paradigm. We formulate the task of CCG parsing as a planning problem and use an ASP computational tool to compute solutions that correspond to valid parses. Compared to other approaches, there is no need to implement a specific parsing algorithm using such a declarative method. Our approach aims at producing all semantically distinct parse trees for a given sentence. From this goal, normalization and efficiency issues arise, and we deal with them by combining and extending existing strategies.We have implemented a CCG parsing tool kit-AspCcgTk-that uses ASP as its main computational means. The C&C supertagger can be used as a preprocessor within AspCcgTk, which allows us to achieve wide-coverage natural language parsing.
- Published
- 2012
- Full Text
- View/download PDF
21. A Semi Supervised Learning Model for Mapping Sentences to Logical form with Ambiguous Supervision
- Author
-
Akira Shimazu and Le Minh Nguyen
- Subjects
Parsing ,business.industry ,Computer science ,Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing) ,Semi-supervised learning ,Top-down parsing ,computer.software_genre ,Machine learning ,Semantic similarity ,Semantic equivalence ,Semantic computing ,S-attributed grammar ,Artificial intelligence ,business ,computer ,Natural language processing ,Bottom-up parsing - Abstract
Semantic parsing is the task of mapping a natural sentence to a meaning representation. The limitation of semantic parsing is that it is very difficult to obtain annotated training data in which a sentence is paired with a semantic representation. To deal with this problem, we introduce a semi supervised learning model for semantic parsing with ambiguous supervision. The main idea of our method is to utilize a large amount of data, to enrich feature space with the maximum entropy model using our semantic learner. We evaluate the proposed models on standard corpora to show that our methods are suitable for semantic parsing problem. Experimental results show that the proposed methods work efficiently and well on ambiguous data and it is comparable to the state of the art method.
- Published
- 2012
- Full Text
- View/download PDF
22. COGPARSE: Brain-Inspired Knowledge-Driven Full Semantics Parsing
- Author
-
Daniel Olsher
- Subjects
Parsing ,business.industry ,Computer science ,computer.software_genre ,Top-down parsing ,Operational semantics ,TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES ,Parser combinator ,Computational semantics ,Top-down parsing language ,S-attributed grammar ,Artificial intelligence ,business ,computer ,Natural language processing ,Bottom-up parsing - Abstract
Humans use semantics during parsing; so should computers. In contrast to phrase structure-based parsers, COGPARSE seeks to determine which meaning-bearing components are present in a text, using world knowledge and lexical semantics for construction grammar form selection, syntactic overlap processing, disambiguation, and confidence calculation. In a brain-inspired way, COGPARSE aligns parsing with the structure of the lexicon, providing a linguistic representation, parsing algorithm, associated linguistic theory, and preliminary metrics for evaluating parse quality. Given sufficient information on nuanced word and construction semantics, COGPARSE can also assemble detailed full-semantics meaning representations of input texts. Beyond the ability to determine which parses are most likely to be intended and to use knowledge in disambiguation, full-semantics parsing enables nuanced meaning representation, learning, summarization, natural language user interfaces, and the taking of action based on natural language input.
- Published
- 2012
- Full Text
- View/download PDF
23. A New Method for Dependent Parsing
- Author
-
Trevor Jim and Yitzhak Mandelbaum
- Subjects
Empty string ,Computer science ,Memoization ,Attribute grammar ,media_common.quotation_subject ,Context-sensitive grammar ,computer.software_genre ,Top-down parsing ,Semantics ,Rule-based machine translation ,Parser combinator ,Phrase structure grammar ,media_common ,Parsing ,Grammar ,business.industry ,Programming language ,Parsing expression grammar ,Context-free grammar ,Tree-adjoining grammar ,TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES ,Extended Affix Grammar ,S-attributed grammar ,Top-down parsing language ,Artificial intelligence ,L-attributed grammar ,business ,computer ,Natural language processing ,Bottom-up parsing - Abstract
Dependent grammars extend context-free grammars by allowing semantic values to be bound to variables and used to constrain parsing. Dependent grammars can cleanly specify common features that cannot be handled by context-free grammars, such as length fields in data formats and significant indentation in programming languages. Few parser generators support dependent parsing, however. To address this shortcoming, we have developed a new method for implementing dependent parsers by extending existing parsing algorithms. Our method proposes a point-free language of dependent grammars, which we believe closely corresponds to existing context-free parsing algorithms, and gives a novel transformation from conventional dependent grammars to point-free ones. To validate our technique, we have specified the semantics of both source and target dependent grammar languages, and proven our transformation sound and complete with respect to those semantics. Furthermore, we have empirically validated the suitability of our point-free language by adapting four parsing engines to support it: an Earley parsing engine; a GLR parsing engine; memoizing, arrow-style parser combinators; and PEG parser combinators.
- Published
- 2011
- Full Text
- View/download PDF
24. Processing Text-Technological Resources in Discourse Parsing
- Author
-
Harald Lüngen, Maja Bärenfänger, Mirco Hilbert, and Henning Lobin
- Subjects
Parsing ,business.industry ,Computer science ,Top-down parsing ,computer.software_genre ,TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES ,Rhetorical Structure Theory ,Top-down parsing language ,S-attributed grammar ,Artificial intelligence ,Computational linguistics ,business ,computer ,Natural language processing ,Discourse marker ,Bottom-up parsing - Abstract
Discourse parsing of complex text types such as scientific research articles requires the analysis of an input document on linguistic and structural levels that go beyond traditionally employed lexical discourse markers. This chapter describes a text-technological approach to discourse parsing. Discourse parsing with the aim of providing a discourse structure is seen as the addition of a new annotation layer for input documents marked up on several linguistic annotation levels. The discourse parser generates discourse structures according to the Rhetorical Structure Theory. An overview of the knowledge sources and components for parsing scientific journal articles is given. The parser’s core consists of cascaded applications of the GAP, a Generic Annotation Parser. Details of the chart parsing algorithm are provided, as well as a short evaluation in terms of comparisons with reference annotations from our corpus and with recently developed systems with a similar task.
- Published
- 2011
- Full Text
- View/download PDF
25. Multi-task Learning for Word Alignment and Dependency Parsing
- Author
-
Shujie Liu
- Subjects
Parsing ,Machine translation ,Computer science ,business.industry ,Speech recognition ,Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing) ,computer.software_genre ,Top-down parsing ,TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES ,Parser combinator ,Dependency grammar ,S-attributed grammar ,Artificial intelligence ,business ,computer ,Computer Science::Formal Languages and Automata Theory ,Natural language processing ,Word (computer architecture) ,Bottom-up parsing - Abstract
Word alignment and parsing are two important components for syntax based machine translation. The inconsistent models for alignment and parsing caused problems during translation pair extraction. In this paper, we do word alignment and dependency parsing in a multi-task learning framework, in which word alignment and dependency parsing are consistent and assisted with each other. Our experiments show significant improvement not only for both word alignment and dependency parsing, but also the final translation performance.
- Published
- 2011
- Full Text
- View/download PDF
26. Parsing Range Concatenation Grammars
- Author
-
Laura Kallmeyer
- Subjects
Parsing ,business.industry ,Computer science ,Parsing expression grammar ,computer.software_genre ,Top-down parsing ,Parser combinator ,S-attributed grammar ,Top-down parsing language ,Artificial intelligence ,L-attributed grammar ,business ,computer ,Natural language processing ,Bottom-up parsing - Abstract
In the following, we will see different parsing algorithms for RCGs. Let us consider a variant of the RCG for the MIX language from Fig. 8.4 as a running example. This variant is given in Fig. 9.1.
- Published
- 2010
- Full Text
- View/download PDF
27. Generative Capacity and Parsing Complexity
- Author
-
Marco Kuhlmann
- Subjects
Parsing ,Dependency (UML) ,Computer science ,business.industry ,Phrase structure rules ,Link grammar ,Top-down parsing ,computer.software_genre ,Transformational grammar ,Parser combinator ,S-attributed grammar ,Top-down parsing language ,Artificial intelligence ,business ,computer ,Generative grammar ,Natural language processing ,Bottom-up parsing - Abstract
In this chapter, we complete our study of regular dependency languages by investigating their string-generative capacity and parsing complexity. Specifically, we study the connection between these two measures and the structural constraints discussed in the first part of this book.
- Published
- 2010
- Full Text
- View/download PDF
28. Towards an N-Version Dependency Parser
- Author
-
Pablo Gervás, Jesús Herrera, Virginia Francisco, and Miguel Ballesteros
- Subjects
Parsing ,business.industry ,LR parser ,Computer science ,Parsing expression grammar ,computer.software_genre ,Top-down parsing ,TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES ,Parser combinator ,Dependency grammar ,S-attributed grammar ,Artificial intelligence ,business ,computer ,Natural language processing ,Bottom-up parsing - Abstract
Maltparser is a contemporary dependency parsing machine learning-based system that shows great accuracy. However 90% for Labelled Attachment Score (LAS) seems to be a de facto limit for such kinds of parsers. Since generally such systems can not be modified, previous works have been developed to study what can be done with the training corpora in order to improve parsing accuracy. High level techniques, such as controlling sentences' length or corpora's size, seem useless for these purposes. But low level techniques, based on an in-depth study of the errors produced by the parser at the word level, seem promising. Prospective low level studies suggested the development of n-version parsers. Each one of these n versions should be able to tackle a specific kind of dependency parsing at the word level and the combined action of all them should reach more accurate parsings. In this paper we present an extensive study on the usefulness and the expected limits for n-version parser to improve parsing accuracy. This work has been developed specifically for Spanish using Maltparser.
- Published
- 2010
- Full Text
- View/download PDF
29. Verifiable Parse Table Composition for Deterministic Parsing
- Author
-
August Schwerdfeger and Eric Van Wyk
- Subjects
Source code ,Theoretical computer science ,Parsing ,Programming language ,Computer science ,media_common.quotation_subject ,computer.software_genre ,Top-down parsing ,TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES ,Parser combinator ,Table (database) ,Deterministic parsing ,LALR parser ,computer ,media_common ,Bottom-up parsing - Abstract
One obstacle to the implementation of modular extensions to programming languages lies in the problem of parsing extended languages. Specifically, the parse tables at the heart of traditional LALR(1) parsers are so monolithic and tightly constructed that, in the general case, it is impossible to extend them without regenerating them from the source grammar. Current extensible frameworks employ a variety of solutions, ranging from a full regeneration to using pluggable binary modules for each different extension. But recompilation is time-consuming, while the pluggable modules in many cases cannot support the addition of more than one extension, or use backtracking or non-deterministic parsing techniques. We present here a middle-ground approach that allows an extension, if it meets certain restrictions, to be compiled into a parse table fragment. The host language parse table and fragments from multiple extensions can then always be efficiently composed to produce a conflict-free parse table for the extended language. This allows for the distribution of deterministic parsers for extensible languages in a pre-compiled format, eliminating the need for the “source code” grammar to be distributed. In practice, we have found these restrictions to be reasonable and admit many useful language extensions.
- Published
- 2010
- Full Text
- View/download PDF
30. A Fast General Parser for Automatic Code Generation
- Author
-
Wuu Yang
- Subjects
Simple LR parser ,TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES ,Parser combinator ,Computer science ,LR parser ,Programming language ,GLR parser ,Top-down parsing ,LALR parser ,computer.software_genre ,computer ,Canonical LR parser ,Bottom-up parsing - Abstract
The code generator in a compiler attempts to match a subject tree against a collection of tree-shaped patterns for generating instructions. Tree-pattern matching may be considered as a generalization of string parsing. We propose a new generalized LR (GLR) parser, which extends the LR parser stack with a parser cactus. GLR explores all plausible parsing steps to find the least-cost matching. GLR is fast due to two properties: (1) duplicate parsing steps are eliminated and (2) partial parse trees that will not lead to a least-cost matching are discarded as early as possible.
- Published
- 2010
- Full Text
- View/download PDF
31. A Machine Learning Parser Using an Unlexicalized Distituent Model
- Author
-
Lawrence Y. L. Cheung, Samuel W. K. Chan, and Mickey W. C. Chong
- Subjects
Parsing ,Phrase ,Computer science ,business.industry ,Speech recognition ,Treebank ,computer.software_genre ,Part of speech ,Top-down parsing ,Machine learning ,TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES ,Parser combinator ,Chunking (psychology) ,S-attributed grammar ,Artificial intelligence ,business ,computer ,Natural language processing ,Sentence ,Bottom-up parsing - Abstract
Despite the popularity of lexicalized parsing models, practical concerns such as data sparseness and applicability to domains of different vocabularies make unlexicalized models that do not refer to word tokens themselves deserve more attention. A classifier-based parser using an unlexicalized parsing model has been developed. Most importantly, to enhance the accuracy of these tasks, we investigated the notion of distituency (the possibility that two parts of speech cannot remain in the same constituent or phrase) and incorporated it as attributes using various statistic measures. A machine learning method integrates linguistic attributes and information-theoretic attributes in two tasks, namely sentence chunking and phrase recognition. The parser was applied to parsing English and Chinese sentences in the Penn Treebank and the Tsinghua Chinese Treebank. It achieved a parsing performance of F-Score 80.3% in English and 82.4% in Chinese.
- Published
- 2010
- Full Text
- View/download PDF
32. Out-of-the-Box Robust Parsing of Portuguese
- Author
-
João Ricardo Silva, Sergio Castro, António Branco, and Ruben Reis
- Subjects
Parsing ,Computer science ,Programming language ,business.industry ,Probabilistic logic ,Top-down parsing ,computer.software_genre ,ComputingMethodologies_ARTIFICIALINTELLIGENCE ,language.human_language ,TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES ,Parser combinator ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,language ,Coming out ,Artificial intelligence ,Portuguese ,business ,computer ,Natural language processing ,Bottom-up parsing - Abstract
In this paper we assess to what extent the available Portuguese treebanks and available probabilistic parsers are suitable for out-of-the-box robust parsing of Portuguese. We also announce the release of the best parser coming out of this exercise, which is, to the best of our knowledge, the first robust parser widely available for Portuguese.
- Published
- 2010
- Full Text
- View/download PDF
33. Parsing as Classification
- Author
-
Lidia Khmylko and Wolfgang Menzel
- Subjects
Parsing ,Computer science ,business.industry ,Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing) ,computer.software_genre ,Top-down parsing ,Parser combinator ,Dependency grammar ,S-attributed grammar ,Artificial intelligence ,Set (psychology) ,business ,computer ,Natural language processing ,Sentence ,Bottom-up parsing - Abstract
Dependency parsing can be cast as a classification problem over strings of observations. Compared to shallow processing tasks like tagging, parsing is a dynamic classification problem as no statically predefined set of classes exists and any class to be distinguished is composed of pairs from a given label set (syntactic function) and the available attachment points in the sentence, so that even the number of “classes” varies with the length of the input sentence. A number of fundamentally different approaches have been pursued to solve this classification task. They differ in the way they consider the context, in whether they apply machine learning approaches or not, and in the means they use to enforce the tree property of the resulting sentence structure. These differences eventually result in a different behavior on the same data making the paradigm an ideal testbed to apply different information fusion schemes for combined decision making.
- Published
- 2010
- Full Text
- View/download PDF
34. A Hyprolog Parsing Methodology for Property Grammars
- Author
-
Baohua Gu, Veronica Dahl, and Erez Maharshak
- Subjects
Parsing ,business.industry ,Programming language ,Computer science ,Parsing expression grammar ,computer.software_genre ,Top-down parsing ,TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES ,Parser combinator ,Top-down parsing language ,S-attributed grammar ,Artificial intelligence ,L-attributed grammar ,business ,computer ,Natural language processing ,Bottom-up parsing - Abstract
Property Grammars , or PGs, belong to a new family of linguistic formalisms which view a grammar as a set of linguistic constraints, and parsing as a constraint satisfaction problem. Rigid hierarchical parsing gives way to flexible mechanisms which can handle incomplete, ambiguous or erroneous text, and are thus more adequate for new applications such as speech recognition, internet mining, controlled languages and biomedical information. The present work contributes a) a new parsing methodology for PGs in terms of Hyprolog --- an extension of Prolog with linear and intuitionistic logic and with abduction; and b) a customisable extension of PGs that lets us model also concepts and relations to some degree. We exemplify within the domain of extracting concepts from biomedical text.
- Published
- 2009
- Full Text
- View/download PDF
35. Parsing with Agreement
- Author
-
Adam Radziszewski
- Subjects
Shallow parsing ,Parsing ,business.industry ,Computer science ,computer.software_genre ,Top-down parsing ,Noun phrase ,TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES ,Parser combinator ,S-attributed grammar ,Top-down parsing language ,Artificial intelligence ,business ,computer ,Natural language processing ,Bottom-up parsing - Abstract
Shallow parsing has been proposed as a means of arriving at practically useful structures while avoiding the difficulties of full syntactic analysis. According to Abney's principles, it is preferred to leave an ambiguity pending than to make a likely wrong decision. We show that continuous phrase chunking as well as shallow constituency parsing display evident drawbacks when faced with freer word order languages. Those drawbacks may lead to unnecessary data loss as a result of decisions forced by the formalism and therefore diminish practical value of shallow parsers for Slavic languages. We present an alternate approach to shallow parsing of noun phrases for Slavic languages which follows the original Abney's principles. The proposed approach to parsing is decomposed into several stages, some of which allow for marking discontinuous phrases.
- Published
- 2009
- Full Text
- View/download PDF
36. Faster Scannerless GLR Parsing
- Author
-
Economopoulos, Giorgos Robert, Klint, Paul, Vinju, Jurgen, Moor, O., Schwartzbach, M.I., Software Engineering, and Software Analysis and Transformation
- Subjects
Parsing ,business.industry ,Programming language ,Computer science ,Parsing expression grammar ,computer.software_genre ,Top-down parsing ,TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES ,Parser combinator ,S-attributed grammar ,Artificial intelligence ,L-attributed grammar ,Scannerless parsing ,business ,computer ,Natural language processing ,Bottom-up parsing - Abstract
Analysis and renovation of large software portfolios requires syntax analysis of multiple, usually embedded, languages and this is beyond the capabilities of many standard parsing techniques. The traditional separation between lexer and parser falls short due to the limitations of tokenization based on regular expressions when handling multiple lexical grammars. In such cases scannerless parsing provides a viable solution. It uses the power of context-free grammars to be able to deal with a wide variety of issues in parsing lexical syntax. However, it comes at the price of less efficiency. The structure of tokens is obtained using a more powerful but more time and memory intensive parsing algorithm. Scannerless grammars are also more non-deterministic than their tokenized counterparts, increasing the burden on the parsing algorithm even further. In this paper we investigate the application of the Right-Nulled Generalized LR parsing algorithm (RNGLR) to scannerless parsing. We adapt the Scannerless Generalized LR parsing and filtering algorithm (SGLR) to implement the optimizations of RNGLR. We present an updated parsing and filtering algorithm, called SRNGLR, and analyze its performance in comparison to SGLR on ambiguous grammars for the programming languages C, Java, Python, SASL, and C++. Measurements show that SRNGLR is on average 33% faster than SGLR, but is 95% faster on the highly ambiguous SASL grammar. For the mainstream languages C, C++, Java and Python the average speedup is 16%.
- Published
- 2009
- Full Text
- View/download PDF
37. Intraclausal Coordination and Clause Detection as a Preprocessing Step to Dependency Parsing
- Author
-
Tomaž Šef, Matjaž Gams, and Domen Marinči
- Subjects
Parsing ,Dependency (UML) ,Computer science ,business.industry ,Speech recognition ,Treebank ,computer.software_genre ,Top-down parsing ,TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES ,Parser combinator ,Dependency grammar ,S-attributed grammar ,Artificial intelligence ,business ,computer ,Natural language processing ,Bottom-up parsing - Abstract
The impact of clause and intraclausal coordination detection to dependency parsing of Slovene is examined. New methods based on machine learning and heuristic rules are proposed for clause and intraclausal coordination detection. They were included in a new dependency parsing algorithm, PACID. For evaluation, Slovene dependency treebank was used. At parsing, 6.4% and 9.2 % relative error reduction was achieved, compared to the dependency parsers MSTP and Malt, respectively.
- Published
- 2009
- Full Text
- View/download PDF
38. Combinator Parsing: A Short Tutorial
- Author
-
S. Doaitse Swierstra
- Subjects
Parsing ,Programming language ,business.industry ,Computer science ,computer.software_genre ,Top-down parsing ,Parser combinator ,Combinator library ,Top-down parsing language ,S-attributed grammar ,Artificial intelligence ,business ,Combinatory logic ,computer ,Natural language processing ,Bottom-up parsing - Abstract
There are numerous ways to implement a parser for a given syntax; using parser combinators is a powerful approach to parsing which derives much of its power and expressiveness from the type system and semantics of the host programming language. This tutorial begins with the construction of a small library of parsing combinators. This library introduces the basics of combinator parsing and, more generally, demonstrates how domain specific embedded languages are able to leverage the facilities of the host language. After having constructed our small combinator library, we investigate some shortcomings of the naive implementation introduced in the first part, and incrementally develop an implementation without these problems. Finally we discuss some further extensions of the presented library and compare our approach with similar libraries.
- Published
- 2009
- Full Text
- View/download PDF
39. Correlating Natural Language Parser Performance with Statistical Measures of the Text
- Author
-
Yi Zhang and Rui Wang
- Subjects
Parsing ,Language identification ,Computer science ,business.industry ,computer.software_genre ,Top-down parsing ,TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES ,Parser combinator ,Cache language model ,GLR parser ,Artificial intelligence ,Language model ,business ,computer ,Natural language processing ,Bottom-up parsing - Abstract
Natural language parsing, as one of the central tasks in natural language processing, is widely used in many AI fields. In this paper, we address an issue of parser performance evaluation, particularly its variation across datasets. We propose three simple statistical measures to characterize the datasets and also evaluate their correlation to the parser performance. The results clearly show that different parsers have different performance variation and sensitivity against these measures. The method can be used to guide the choice of natural language parsers for new domain applications, as well as systematic combination for better parsing accuracy.
- Published
- 2009
- Full Text
- View/download PDF
40. Spejd: A Shallow Processing and Morphological Disambiguation Tool
- Author
-
Adam Przepiórkowski and Aleksander Buczyński
- Subjects
Shallow parsing ,Grammar ,Computer science ,business.industry ,Programming language ,media_common.quotation_subject ,Top-down parsing ,computer.software_genre ,Rotation formalisms in three dimensions ,TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES ,Parser combinator ,S-attributed grammar ,Top-down parsing language ,Artificial intelligence ,business ,computer ,Natural language processing ,Bottom-up parsing ,media_common - Abstract
This article presents a formalism and a beta version of a new tool for simultaneous morphosyntactic disambiguation and shallow parsing. Unlike in the case of other shallow parsing formalisms, the rules of the grammar allow for explicit morphosyntactic disambiguation statements, independently of structure-building statements, which facilitates the task of the shallow parsing of morphosyntactically ambiguous or erroneously disambiguated input.
- Published
- 2009
- Full Text
- View/download PDF
41. Efficient Parsing of Romanian Language for Text-to-Speech Purposes
- Author
-
Mihai Alexandru Ordean, Mihaela Ordean, Gheorghe Cosmin Silaghi, Lucian Radu Teodorescu, Razvan Boldizsar, and Andrei Şaupe
- Subjects
Document Structure Description ,Source code ,Parsing ,Computer science ,business.industry ,Speech recognition ,media_common.quotation_subject ,Speech synthesis ,computer.software_genre ,Top-down parsing ,Parser combinator ,Text normalization ,Artificial intelligence ,business ,computer ,Natural language processing ,Bottom-up parsing ,media_common - Abstract
This paper presents the design of the text analysis component of a TTS system for the Romanian language. Our text analysis is performed in two steps: document structure detection and text normalization. The output is a tree-based representation of the processed data. Parsing is made efficient with the help of the Boost Spirit LL parser [1], the usage of this tool allowing for a greater flexibility in the source code and in the output representation.
- Published
- 2009
- Full Text
- View/download PDF
42. Practical Scope Recovery Using Bridge Parsing
- Author
-
Torbjörn Ekman, Görel Hedin, and Emma Nilsson-Nyman
- Subjects
TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES ,Parsing ,Parser combinator ,Computer science ,LR parser ,Programming language ,Parsing expression grammar ,Compiler ,Top-down parsing ,computer.software_genre ,LALR parser ,computer ,Bottom-up parsing - Abstract
Interactive development environments (IDEs) increase programmer productivity, but unfortunately also the burden on language implementors since sophisticated tool support is expected even for small domain-specific languages. Our goal is to alleviate that burden, by generating IDEs from high-level language specifications using the JastAdd meta-compiler system. This puts increased tension on scope recovery in parsers, since at least a partial AST is required by the system to perform static analysis, such as name completion and context sensitive search. In this paper we present a novel recovery algorithm called bridge parsing, which provides a light-weight recovery mechanism that complements existing parsing recovery techniques. An initial phase recovers nesting structure in source files making them easier to process by existing parsers. This enables batch parser generators with existing grammars to be used in an interactive setting with minor or no modifications. We have implemented bridge parsing in a generic extensible IDE for JastAdd based compilers. It is independent of parsing technology, which we validate by showing how it improves recovery in a set of typical interactive editing scenarios for three parser generators: ANTLR (LL(variable lookahead) parsers), LPG (LALR(k) parsers), and Beaver (LALR(1) parsers). ANTLR and LPG both contain sophisticated support for error recovery, while Beaver requires manual error productions. Bridge parsing complements these techniques and yields better recovery for all these tools with only minimal changes to existing grammars.
- Published
- 2009
- Full Text
- View/download PDF
43. Statistical Parsing with Context-Free Filtering Grammar
- Author
-
Michael Demko and Gerald Penn
- Subjects
Parsing ,Computer science ,business.industry ,Parsing expression grammar ,computer.software_genre ,Top-down parsing ,TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES ,Parser combinator ,S-attributed grammar ,Top-down parsing language ,Statistical parsing ,Artificial intelligence ,business ,computer ,Natural language processing ,Bottom-up parsing - Abstract
Statistical parsers that simultaneously generate both phrase-structure and lexical dependency trees have been limited to date in two important ways: detecting non-projective dependencies has not been integrated with other parsing decisions, and/or the constraints between phrase-structure and dependency structure have been overly strict. We introduce context-free filtering grammar as a generalization of a lexicalized factored parsing model, and develop a scoring model to resolve parsing ambiguities for this new grammar formalism. We demonstrate the new model's flexibility by implementing a statistical parser for German, a freer-word-order language exhibiting a mixture of projective and non-projective syntax, using the TuBa-D/Z treebank [1].
- Published
- 2009
- Full Text
- View/download PDF
44. Where Do Parsing Errors Come From
- Author
-
Kaili Müürisep and Helen Nigol
- Subjects
Parsing ,Computer science ,business.industry ,Speech recognition ,Parsing expression grammar ,computer.software_genre ,Top-down parsing ,TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES ,Parser combinator ,GLR parser ,Top-down parsing language ,S-attributed grammar ,Artificial intelligence ,business ,computer ,Natural language processing ,Bottom-up parsing - Abstract
This paper discusses some issues of developing a parser for spoken Estonian which is based on an already existing parser for written language, and employs the Constraint Grammar framework. When we used a corpus of face-to-face everyday conversations as the training and testing material, the parser gained the recall 97.6% and the precision 91.8%. The parsing of institutional phone calls turned out to be a more complicated task, with the recall dropping by 3%. In this paper, we will focus on parsing nonfluent speech using a rule-based parser. We will give an overview of parsing errors and ways to overcome them.
- Published
- 2008
- Full Text
- View/download PDF
45. Partial Parsing: Combining Choice with Commitment
- Author
-
Malcolm Wallace
- Subjects
TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES ,Parsing ,Parser combinator ,Computer science ,Programming language ,Top-down parsing language ,S-attributed grammar ,Parsing expression grammar ,Combinatory logic ,Top-down parsing ,computer.software_genre ,computer ,Bottom-up parsing - Abstract
Parser combinators, often monadic, are a venerable and widely-used solution to read data from some external format. However, the capability to return a partial parse has, until now, been largely missing. When only a small portion of the entire data is desired, it has been necessary either to parse the entire input in any case, or to break up the grammar into smaller pieces and move some work outside the world of combinators. This paper presents a technique for mixing lazy, demand-driven, parsing with strict parsing, all within the same set of combinators. The grammar specification remains complete and unbroken, yet only sufficient input is consumed to satisfy the result demanded. It is built on a combinationof applicativeand monadicparsers. Monadic parsing alone is insufficient to allow a choice operator to coexist with the early commitment needed for lazy results. Applicative parsing alone can give partial results, but does not permit context-sensitive grammars. But used together, we gain both partiality and a flexible ease of use. Performance results demonstrate that partial parsing is often faster and more space-efficient than strict parsing, but never worse. The trade-off is that partiality has consequences when dealing with ill-formed input.
- Published
- 2008
- Full Text
- View/download PDF
46. Apply a Rough Set-Based Classifier to Dependency Parsing
- Author
-
Yangsheng Ji, Ruoce Ma, Xinyu Dai, and Lin Shang
- Subjects
Parsing ,Computer science ,business.industry ,Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing) ,Top-down parsing ,computer.software_genre ,Machine learning ,TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES ,Parser combinator ,Dependency grammar ,S-attributed grammar ,Rough set ,Artificial intelligence ,Deterministic parsing ,business ,computer ,Natural language processing ,Bottom-up parsing - Abstract
A rough set-based semi-naive Bayesian classification method is applied to dependency parsing, which is an important task in syntactic structure analysis of natural language processing. Many parsing algorithms have emerged combined with statistical machine learning techniques. The rough set-based classifier is embedded with Nivre's deterministic parsing algorithm to conduct dependency parsing task on a Chinese corpus. Experimental results show that the method has a good performance on dependency parsing task. Moreover, the experiments have justified the effectiveness of the classification influence.
- Published
- 2008
- Full Text
- View/download PDF
47. Dependency Parsing by Transformation and Combination
- Author
-
Jens Nilsson and Joakim Nivre
- Subjects
Parsing ,LR parser ,business.industry ,Computer science ,Treebank ,Top-down parsing ,computer.software_genre ,TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES ,Dependency graph ,Parser combinator ,Top-down parsing language ,Artificial intelligence ,business ,computer ,Natural language processing ,Bottom-up parsing - Abstract
This study presents new language and treebank independent graph transformations that improve accuracy in data-driven dependency parsing. We show that individual generic graph transformations can increase accuracy across treebanks, but especially when they are combined using established parser combination techniques. The combination experiments also indicate that the presumed best way to combine parsers, using the highest scoring parsers, is not necessarily the best approach.
- Published
- 2008
- Full Text
- View/download PDF
48. The Data-Oriented Parsing Approach: Theory and Application
- Author
-
Bod, R., Fulcher, J., Jain, L.C., and Language and Computation (ILLC, FNWI/FGw)
- Subjects
Parsing ,Computer science ,business.industry ,computer.software_genre ,Top-down parsing ,TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES ,Parser combinator ,Top-down parsing language ,S-attributed grammar ,Artificial intelligence ,Deterministic parsing ,business ,computer ,Natural language processing ,Bottom-up parsing ,Data-oriented parsing - Abstract
Parsing models have many applications in AI, ranging from natural language processing (NLP) and computational music analysis to logic programming and computational learning. Broadly conceived, a parsing model seeks to uncover the underlying structure of an input, that is, the various ways in which elements of the input combine to form phrases or constituents and how those phrases recursively combine to form a tree structure for the whole input. During the last fifteen years, a major shift has taken place from rule-based, deterministic parsing to corpus-based, probabilistic parsing. A quick glance over the NLP literature from the last ten years, for example, indicates that virtually all natural language parsing systems are currently probabilistic. The same development can be observed in (stochastic) logic programming and (statistical) relational learning. This trend towards probabilistic parsing is not surprising: the increasing availability of very large collections of text, music, images and the like allow for inducing statistically motivated parsing systems from actual data.
- Published
- 2008
- Full Text
- View/download PDF
49. Using Constraints over Finite Sets of Integers for Range Concatenation Grammar Parsing
- Author
-
Yannick Parmentier, Wolfgang Maier, Natural Language Processing: representation, inference and semantics (TALARIS), Inria Nancy - Grand Est, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université Henri Poincaré - Nancy 1 (UHP)-Université Nancy 2-Institut National Polytechnique de Lorraine (INPL)-Centre National de la Recherche Scientifique (CNRS)-Université Henri Poincaré - Nancy 1 (UHP)-Université Nancy 2-Institut National Polytechnique de Lorraine (INPL)-Centre National de la Recherche Scientifique (CNRS), Emmy Noether Project (SFB 833), Eberhard Karls Universität Tübingen = Eberhard Karls University of Tuebingen, Chalmers University of Technology and University of Gothenburg, and Bengt Nordström, Aarne Ranta
- Subjects
Computer science ,Concatenation ,02 engineering and technology ,computer.software_genre ,Top-down parsing ,[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL] ,Computer Science::Hardware Architecture ,Parser combinator ,0202 electrical engineering, electronic engineering, information engineering ,Parsing ,060201 languages & linguistics ,business.industry ,Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing) ,06 humanities and the arts ,Parsing expression grammar ,TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES ,Range Concatenation Grammar ,Constraints ,0602 languages and literature ,020201 artificial intelligence & image processing ,S-attributed grammar ,Top-down parsing language ,Artificial intelligence ,business ,computer ,Natural language processing ,Bottom-up parsing - Abstract
International audience; Range Concatenation Grammar (RCG) is a formalism with interesting formal properties (it has a polynomial parsing time while being more powerful than Linear Context-Free Rewriting Systems). In this context, we present a constraint-based extension of the state-of-the-art RCG parsing algorithm of (Boullier, 2000), which has been used for the implementation of an open-source parsing architecture.
- Published
- 2008
- Full Text
- View/download PDF
50. DCGs + Memoing = Packrat Parsing but Is It Worth It?
- Author
-
Ralph Becket and Zoltan Somogyi
- Subjects
Parsing ,business.industry ,Programming language ,Computer science ,Memoization ,Parsing expression grammar ,computer.software_genre ,Top-down parsing ,TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES ,Parser combinator ,Top-down parsing language ,S-attributed grammar ,Artificial intelligence ,business ,computer ,Natural language processing ,Bottom-up parsing - Abstract
Packrat parsing is a newly popular technique for efficiently implementing recursive descent parsers. Packrat parsing avoids the potential exponential costs of recursive descent parsing with backtracking by ensuring that each production rule in the grammar is tested at most once against each position in the input stream. This paper argues that (a) packrat parsers can be trivially implemented using a combination of definite clause grammar rules and memoing, and that (b) packrat parsing may actually be significantly less efficient than plain recursive descent with backtracking, but (c) memoing the recognizers of just one or two nonterminals, selected in accordance with Amdahl's law, can sometimes yield speedups. We present experimental evidence to support these claims.
- Published
- 2007
- Full Text
- View/download PDF
Catalog
Discovery Service for Jio Institute Digital Library
For full access to our library's resources, please sign in.