
A Study of Contrastive Learning Algorithms for Sentence Representation Based on Simple Data Augmentation.

Authors :
Liu, Xiaodong
Gong, Wenyin
Li, Yuxin
Li, Yanchi
Li, Xiang
Source :
Applied Sciences (2076-3417); Sep2023, Vol. 13 Issue 18, p10120, 15p
Publication Year :
2023

Abstract

In the era of deep learning, representation-based text-matching algorithms built on BERT and its variants have become mainstream, but they are limited by the quality of the sentence vectors that BERT generates. The SimCSE algorithm, proposed in 2021, improved sentence vector quality to a certain extent. SimCSE, however, has a length bias: the greater the difference in length between two sentences, the less likely the model is to judge them similar. To address this problem, this paper proposes the EdaCSE algorithm, which perturbs sentence length with a simple data augmentation method that does not affect sentence semantics. Meaningless English punctuation marks are added to the original sentence, so that the model no longer tends to treat sentences of similar length as similar. Experiments with the BERT series of models on five different datasets show that EdaCSE yields average improvements of 1.67, 0.84, and 1.08 across the five datasets. [ABSTRACT FROM AUTHOR]
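
The length perturbation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which punctuation marks are used, how many are added, or where they are placed, so the punctuation set, the `max_insertions` parameter, the random word-boundary placement, and the function names `perturb_length` and `make_positive_pair` are all assumptions made for illustration.

```python
import random

# Assumed punctuation set; the abstract does not specify which marks are used.
PUNCTUATION = [",", ".", ";", ":", "!", "?"]


def perturb_length(sentence, max_insertions=3, rng=None):
    """Return a copy of `sentence` with 1..max_insertions punctuation
    tokens inserted at random word boundaries (hypothetical scheme)."""
    rng = rng or random.Random()
    tokens = sentence.split()
    n_insert = rng.randint(1, max_insertions)
    for _ in range(n_insert):
        # Any boundary is allowed, including before the first word
        # and after the last one.
        pos = rng.randint(0, len(tokens))
        tokens.insert(pos, rng.choice(PUNCTUATION))
    return " ".join(tokens)


def make_positive_pair(sentence, rng=None):
    """Pair a sentence with a length-perturbed copy of itself, so the two
    views of one sentence differ in length but keep the same words."""
    return sentence, perturb_length(sentence, rng=rng)
```

Under these assumptions, `make_positive_pair("the cat sat on the mat")` returns the original sentence together with a copy that is one to three tokens longer and keeps every original word in order. In a contrastive setup the two would be encoded as a positive pair.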

Details

Language :
English
ISSN :
20763417
Volume :
13
Issue :
18
Database :
Complementary Index
Journal :
Applied Sciences (2076-3417)
Publication Type :
Academic Journal
Accession number :
172359640
Full Text :
https://doi.org/10.3390/app131810120