Natural language inference for Malayalam language using language agnostic sentence representation
- Source :
- PeerJ Computer Science, Vol 7, p e508 (2021)
- Publication Year :
- 2021
- Publisher :
- PeerJ Inc., 2021.
Abstract
- Natural language inference (NLI) is an essential subtask in many natural language processing (NLP) applications. It is a directional relationship from premise to hypothesis: a pair of texts is entailed if the meaning of one text (the hypothesis) can be inferred from the other (the premise). NLI is also known as textual entailment recognition, and it is used to recognize entailed and contradictory sentences in NLP systems such as question answering, summarization, and information retrieval. This paper describes NLI for Malayalam, a low-resource Indian language that is the regional language of Kerala and is spoken by more than 30 million people. The paper presents the Malayalam NLI dataset, named MaNLI, and uses it for NLI in Malayalam with four models: Doc2Vec (paragraph vector), fastText, BERT (Bidirectional Encoder Representations from Transformers), and LASER (Language-Agnostic SEntence Representations). Our work attempts NLI in two ways, as binary classification and as multiclass classification. In both settings, LASER outperformed the other techniques. For multiclass classification, NLI using LASER-based sentence embeddings outperformed the other techniques by a significant margin of 12% in accuracy. For binary classification, the LASER-based NLI system improved accuracy by 9% over the other techniques.
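The abstract frames NLI as classification over sentence embeddings of a premise and a hypothesis. The sketch below illustrates that general idea only and is not the paper's pipeline. It assumes the two embeddings are combined by concatenating them with their element-wise absolute difference and product (a common choice, not stated in the abstract), uses a nearest-centroid classifier as a stand-in for whatever classifier the authors trained, and replaces real LASER vectors with hypothetical 2-dimensional toy vectors. The label names are also assumptions.

```python
from math import dist


def pair_features(u, v):
    """Combine a premise embedding u and a hypothesis embedding v into one vector.

    Assumed feature scheme: [u, v, |u - v|, u * v].
    """
    diff = [abs(a - b) for a, b in zip(u, v)]
    prod = [a * b for a, b in zip(u, v)]
    return list(u) + list(v) + diff + prod


def fit_centroids(features, labels):
    """Average the feature vectors of each label (nearest-centroid training)."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + xi for s, xi in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(centroids, x):
    """Return the label whose centroid is closest to x in Euclidean distance."""
    return min(centroids, key=lambda y: dist(centroids[y], x))


# Hypothetical toy vectors standing in for real sentence embeddings.
train = [
    ([1.0, 0.0], [1.0, 0.1], "entailment"),
    ([1.0, 0.0], [-1.0, 0.0], "contradiction"),
    ([1.0, 0.0], [0.0, 1.0], "neutral"),
]
X = [pair_features(u, v) for u, v, _ in train]
y = [label for _, _, label in train]

centroids = fit_centroids(X, y)
prediction = predict(centroids, pair_features([0.9, 0.0], [0.9, 0.1]))
print(prediction)
```

Under the same assumptions, the binary setting would reuse this code with the labels collapsed to two classes, for example "entailment" versus everything else.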
- Subjects :
- General Computer Science
Computer science
Data Mining and Machine Learning
Malayalam
02 engineering and technology
Natural language inference
Multiclass classification
03 medical and health sciences
0302 clinical medicine
FastText
0202 electrical engineering, electronic engineering, information engineering
Question answering
Textual entailment
Doc2Vec
Word Embeddings
QA75.5-76.95
Automatic summarization
Natural Language and Speech
Computational Linguistics
Binary classification
Electronic computers. Computer science
030221 ophthalmology & optometry
020201 artificial intelligence & image processing
LASER
Artificial intelligence
Paragraph
Natural language processing
Sentence
BERT
Details
- Language :
- English
- ISSN :
- 2376-5992
- Volume :
- 7
- Database :
- OpenAIRE
- Journal :
- PeerJ Computer Science
- Accession number :
- edsair.doi.dedup.....b7a7f245c64d50fa5a262d2e228180ff