1. Automatic Language Identification in Texts
- Author
- Tommi Jauhiainen, Marcos Zampieri, Timothy Baldwin, and Krister Lindén
- Subjects
- Computational linguistics, Text processing (Computer science)
- Abstract
This book provides readers with a brief account of the history of Language Identification (LI) research and a survey of the features and methods most commonly used in the LI literature. LI is the problem of determining the language in which a document is written and is a crucial part of many text processing pipelines. The authors use a unified notation to clarify the relationships between common LI methods. The book introduces LI performance evaluation methods and takes a detailed look at LI-related shared tasks. The authors identify open issues, discuss the applications of LI and related tasks, and propose future directions for research in LI.
- Published
- 2024