1. Language Identification and Transliteration approaches for Code-Mixed Text.
- Author
- Kumbhar, Madhuri and Thakre, Kalpana
- Subjects
- *MACHINE translating, *SOCIAL media, *TRANSLITERATION, *DIGITAL technology, *NATURAL languages, *NATURAL language processing
- Abstract
- With the advent of the Web, people have become part of the digital era. They actively create and share a variety of content on the web. Unlike in earlier days, people widely use different social platforms to talk about their interests and hobbies and to review movies and purchased items in natural language. Processing such natural language text is challenging when it mixes languages. A sizable proportion of users communicate in a regional language but use code-mixing and scripts such as Roman and Devanagari for English and Marathi. These texts are generally informal, casual, and short, and non-standard spelling variations are among the prime challenges in language processing. Language identification in mixed text is challenging because the Romanized strings of several languages are similar. Mixed text must be transformed into the native script (Devanagari) for further processing such as information retrieval, machine translation, and question answering. Because Marathi lacks a standard orthography in Latin script, language modelling and identification of mixed text are challenging. Many NLP (Natural Language Processing) applications, ranging from machine translation to information retrieval, use machine transliteration as an input mechanism for non-Roman scripts. In this paper, different techniques and approaches presented by researchers for processing code-mixed text and Indian regional languages are discussed. Tasks such as language identification, transliteration, and Named Entity Recognition are reviewed with respect to statistical, rule-based, and neural approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2024