TRAINING TREE ADJOINING GRAMMARS WITH HUGE TEXT CORPUS USING SPARK MAP REDUCE
- Source :
- ICTACT Journal on Soft Computing, Vol 5, Iss 4, Pp 1021-1026 (2015)
- Publication Year :
- 2015
- Publisher :
- ICT Academy of Tamil Nadu, 2015.
Abstract
- Tree adjoining grammars (TAGs) are mildly context-sensitive formalisms used mainly for modelling natural languages. Use of and research on these psycholinguistic formalisms have been erratic over the past decade, because the grammars are demanding to construct and difficult to parse. However, they hold promise for formalism-based NLP in multilingual scenarios. In this paper we demonstrate a basic synchronous tree adjoining grammar for the English-Tamil language pair that can be used readily for machine translation. We have also developed a multithreaded chart parser that produces ambiguous deep structures and a par dependency structure known as the TAG derivation. We then focus on a model for training this TAG for each language on a large text corpus, using a map-reduce frequency count model in Spark followed by estimation of various probabilistic parameters for the grammar trees; these parameters can be used to perform statistical parsing on the trained grammar.
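The frequency-count training step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses plain Python rather than Spark so that it runs without a cluster, and the tree names, root categories, toy derivations, and the relative-frequency estimator are all assumptions made for illustration. The map and reduce phases stand in for what would typically be a `flatMap` followed by `reduceByKey` over an RDD.

```python
from collections import Counter
from functools import reduce

# Hypothetical toy corpus: each item lists the elementary trees used in one
# sentence's TAG derivation, written as (tree_name, root_category).
derivations = [
    [("alpha_NP", "NP"), ("alpha_V_trans", "S"), ("beta_Adj", "N")],
    [("alpha_NP", "NP"), ("alpha_V_intrans", "S")],
    [("alpha_NP", "NP"), ("alpha_V_trans", "S")],
]

def map_phase(derivation):
    """Emit one (key, 1) pair per elementary tree used in a derivation."""
    return [(tree, 1) for tree in derivation]

def reduce_phase(pairs):
    """Sum the counts for each key, in the spirit of Spark's reduceByKey."""
    def add(acc, pair):
        key, n = pair
        acc[key] += n
        return acc
    return reduce(add, pairs, Counter())

def estimate(counts):
    """Assumed estimator: count(tree) / total count of trees sharing its root category."""
    totals = Counter()
    for (_, category), n in counts.items():
        totals[category] += n
    return {(name, cat): n / totals[cat] for (name, cat), n in counts.items()}

# Flatten the mapped pairs (the flatMap step), then reduce and estimate.
pairs = [p for d in derivations for p in map_phase(d)]
counts = reduce_phase(pairs)
probs = estimate(counts)

for key in sorted(probs):
    print(key, counts[key], round(probs[key], 3))
```

In this sketch the estimated probabilities of trees that share a root category sum to one, which is the property a statistical parser would rely on when ranking competing derivations.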
- Subjects :
- Text corpus
lcsh:Computer engineering. Computer hardware
Machine translation
Computer science
Attribute grammar
lcsh:TK7885-7895
Mildly context-sensitive grammar formalism
Top-down parsing
RDDs
Parser combinator
Rule-based machine translation
Regular tree grammar
Probabilistic Grammar
Spark
Parsing
Chart parser
Grammar
Programming language
Parsing expression grammar
Context-free grammar
Tree-adjoining grammar
Ambiguous grammar
Extended Affix Grammar
Synchronous context-free grammar
Statistical parsing
S-attributed grammar
Artificial intelligence
L-attributed grammar
TAGs
Natural language
Natural language processing
Details
- Language :
- English
- ISSN :
- 2229-6956 and 0976-6561
- Volume :
- 5
- Issue :
- 4
- Database :
- OpenAIRE
- Journal :
- ICTACT Journal on Soft Computing
- Accession number :
- edsair.doi.dedup.....cc01f6132f3a894cf4dc3b37948a0b52