Annotating Sanskrit Corpus: Adapting IL-POSTS

Authors :
Girish Nath Jha
Madhav Gopal
Diwakar Mishra
Source :
Human Language Technology. Challenges for Computer Science and Linguistics ISBN: 9783642200946, LTC
Publication Year :
2011
Publisher :
Springer Berlin Heidelberg, 2011.

Abstract

In this paper we present an experiment in using the hierarchical Indic Languages POS Tagset (IL-POSTS) (Baskaran et al. 2008a, 2008b), developed by Microsoft Research India (MSRI) for tagging Indian languages, to annotate a Sanskrit corpus. Sanskrit is a language with rich morphology and relatively free word order. The authors have added and dropped certain tags according to the requirements of the Sanskrit data. A revision of the IL-POSTS annotation guidelines is also presented. The authors also report an experiment in training the tagger at MSRI and document its results.

Details

ISBN :
978-3-642-20094-6
Database :
OpenAIRE
Journal :
Human Language Technology. Challenges for Computer Science and Linguistics ISBN: 9783642200946, LTC
Accession number :
edsair.doi...........188749d5727780d7406d62a55feb6b23
Full Text :
https://doi.org/10.1007/978-3-642-20095-3_34