
Natural language inference for Malayalam language using language agnostic sentence representation

Authors :
Sara Renjit
Sumam Mary Idicula
Source :
PeerJ Computer Science, Vol 7, p e508 (2021)
Publication Year :
2021
Publisher :
PeerJ Inc., 2021.

Abstract

Natural language inference (NLI) is an essential subtask in many natural language processing (NLP) applications. It is a directional relationship from premise to hypothesis: a pair of texts is defined as entailed if the meaning of one text can be inferred from the other. NLI is also known as textual entailment recognition, and it is used to recognize entailed and contradictory sentences in NLP systems such as question answering, summarization, and information retrieval. This paper describes NLI for Malayalam, a low-resource Indian language that is the regional language of Kerala and is spoken by more than 30 million people. The paper presents a Malayalam NLI dataset, named MaNLI, and applies it to NLI in Malayalam using different models, namely Doc2Vec (paragraph vector), fastText, BERT (Bidirectional Encoder Representations from Transformers), and LASER (Language Agnostic Sentence Representation). Our work attempts NLI in two ways, as binary classification and as multiclass classification. In both settings, LASER outperformed the other techniques. For multiclass classification, NLI using LASER-based sentence embeddings outperformed the other techniques by a significant margin of 12% in accuracy. For binary classification, the LASER-based NLI system improved accuracy by 9% over the other techniques.
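As an illustration of how sentence embeddings can feed an NLI classifier, the following minimal sketch combines a premise vector and a hypothesis vector into one feature vector. The combination [u, v, |u - v|, u * v] is a common convention in the sentence-embedding literature and is an assumption here; the abstract does not state how the embeddings were combined. The function name `pair_features` and the three-dimensional vectors are hypothetical placeholders, not actual LASER output or the authors' code.

```python
import numpy as np


def pair_features(premise_vec, hypothesis_vec):
    """Combine two sentence embeddings into a single feature vector.

    Assumed scheme (not confirmed by the abstract): [u, v, |u - v|, u * v].
    Keeping u before v preserves the premise-to-hypothesis direction.
    """
    u = np.asarray(premise_vec, dtype=float)
    v = np.asarray(hypothesis_vec, dtype=float)
    return np.concatenate([u, v, np.abs(u - v), u * v])


# Placeholder vectors standing in for real sentence embeddings.
u = np.array([1.0, 2.0, 3.0])   # premise embedding (hypothetical)
v = np.array([1.0, 0.0, -1.0])  # hypothesis embedding (hypothetical)

features = pair_features(u, v)
print(features.shape)  # four blocks of length 3
```

The resulting vector could then be passed to any classifier, trained with two labels for the binary setting or with more for the multiclass setting. Because the premise and hypothesis occupy fixed positions, swapping them yields a different feature vector, which matches the directional nature of entailment.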

Details

Language :
English
ISSN :
2376-5992
Volume :
7
Database :
OpenAIRE
Journal :
PeerJ Computer Science
Accession number :
edsair.doi.dedup.....b7a7f245c64d50fa5a262d2e228180ff