Back to Search Start Over

Semantically-informed distance and similarity measures for paraphrase plagiarism identification

Authors :
Manuel Montes-y-Gómez
Esaú Villatoro-Tello
Marc Franco-Salvador
Paolo Rosso
Luis Villaseñor-Pineda
Miguel A. Álvarez-Carmona
Source :
RiuNet. Repositorio Institucional de la Universitat Politécnica de Valéncia, instname
Publication Year :
2018
Publisher :
IOS Press, 2018.

Abstract

[EN] Paraphrase plagiarism identification represents a very complex task given that plagiarized texts are intentionally modified through several rewording techniques. Accordingly, this paper introduces two new measures for evaluating the relatedness of two given texts: a semantically-informed similarity measure and a semantically-informed edit distance. Both measures are able to extract semantic information from either an external resource or a distributed representation of words, resulting in informative features for training a supervised classifier for detecting paraphrase plagiarism. Obtained results indicate that the proposed metrics are consistently good in detecting different types of paraphrase plagiarism. In addition, results are very competitive against state-of-the art methods having the advantage of representing a much more simple but equally effective solution.<br />This work was partially supported by CONACYT under scholarship 401887, project grants 257383, 258588 and 2016-01-2410 and under the Thematic Networks program (Language Technologies Thematic Network project 281795). The work of the fourth author was partially supported by the SomEMBED TIN2015-71147-C2-1-P MINECO research project and by the Generalitat Valenciana under the grant ALMAMATER (Prometeo II/2014/030).

Details

Language :
English
Database :
OpenAIRE
Journal :
RiuNet. Repositorio Institucional de la Universitat Politécnica de Valéncia, instname
Accession number :
edsair.doi.dedup.....5397c62045cebb90442f77880d6d1c04