Semantically-informed distance and similarity measures for paraphrase plagiarism identification
- Source :
- RiuNet. Repositorio Institucional de la Universitat Politècnica de València, instname
- Publication Year :
- 2018
- Publisher :
- IOS Press, 2018.
Abstract
- Paraphrase plagiarism identification is a very complex task because plagiarized texts are intentionally modified through several rewording techniques. Accordingly, this paper introduces two new measures for evaluating the relatedness of two given texts: a semantically-informed similarity measure and a semantically-informed edit distance. Both measures can extract semantic information from either an external resource or a distributed representation of words, yielding informative features for training a supervised classifier to detect paraphrase plagiarism. The results indicate that the proposed metrics consistently perform well at detecting different types of paraphrase plagiarism. In addition, the results are very competitive with state-of-the-art methods, with the advantage of being a much simpler yet equally effective solution.
- This work was partially supported by CONACYT under scholarship 401887, project grants 257383, 258588 and 2016-01-2410, and under the Thematic Networks program (Language Technologies Thematic Network project 281795). The work of the fourth author was partially supported by the SomEMBED TIN2015-71147-C2-1-P MINECO research project and by the Generalitat Valenciana under the grant ALMAMATER (Prometeo II/2014/030).
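The abstract does not spell out the formulas for either measure. As an illustration only, the sketch below shows one common way to build a semantically-informed edit distance: a word-level Levenshtein distance whose substitution cost is 1 minus the cosine similarity of the two words' embeddings, alongside a simple best-match similarity score. The function names (`word_sim`, `semantic_edit_distance`, `semantic_similarity`), the toy vectors, and the exact cost and scoring choices are assumptions, not the authors' definitions. In practice the vectors might come from a word2vec model, and `word_sim` could instead query an external lexical resource.

```python
import math
from typing import Dict, List, Sequence

Vectors = Dict[str, List[float]]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def word_sim(w1: str, w2: str, vectors: Vectors) -> float:
    """Hypothetical word relatedness in [0, 1]: exact match, else clipped
    embedding cosine, else 0 for out-of-vocabulary words."""
    if w1 == w2:
        return 1.0
    if w1 in vectors and w2 in vectors:
        return max(0.0, cosine(vectors[w1], vectors[w2]))
    return 0.0


def semantic_edit_distance(src: List[str], tgt: List[str], vectors: Vectors) -> float:
    """Word-level Levenshtein distance where substituting related words is cheap.
    Assumption (not necessarily the paper's formula): insert/delete cost 1,
    substitution cost 1 - word_sim."""
    n, m = len(src), len(tgt)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = float(i)  # delete i source words
    for j in range(1, m + 1):
        d[0][j] = float(j)  # insert j target words
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 1.0 - word_sim(src[i - 1], tgt[j - 1], vectors)
            d[i][j] = min(d[i - 1][j] + 1.0,      # deletion
                          d[i][j - 1] + 1.0,      # insertion
                          d[i - 1][j - 1] + sub)  # soft substitution
    return d[n][m]


def semantic_similarity(src: List[str], tgt: List[str], vectors: Vectors) -> float:
    """Hypothetical symmetric best-match similarity in [0, 1]."""
    if not src or not tgt:
        return 0.0

    def directed(a: List[str], b: List[str]) -> float:
        # Average, over words in a, of each word's best match in b.
        return sum(max(word_sim(x, y, vectors) for y in b) for x in a) / len(a)

    return 0.5 * (directed(src, tgt) + directed(tgt, src))


if __name__ == "__main__":
    # Toy 2-d "embeddings" (illustrative only, not trained vectors).
    toy = {"car": [1.0, 0.0], "automobile": [0.9, 0.1], "red": [0.0, 1.0]}
    a = "the red car".split()
    b = "the red automobile".split()
    # Feature vector that could be fed to any supervised classifier.
    features = [semantic_edit_distance(a, b, toy), semantic_similarity(a, b, toy)]
    print(features)
```

With these toy vectors, "car" and "automobile" point in almost the same direction, so the distance between "the red car" and "the red automobile" comes out close to 0, whereas a plain token-level Levenshtein distance would be 1. Scores of this kind are the sort of features the abstract describes feeding to a supervised classifier; which classifier to use is left open here.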
- Subjects :
- FOS: Computer and information sciences
- Statistics and Probability
- Computer science
- Edit distance
- Word2vec representation
- 02 engineering and technology
- computer.software_genre
- Paraphrase
- Semantic similarity
- Similarity (network science)
- Artificial Intelligence
- 020204 information systems
- 0202 electrical engineering, electronic engineering, information engineering
- Paraphrase plagiarism
- Computer Science - Computation and Language
- business.industry
- General Engineering
- Plagiarism identification
- Identification (information)
- Scholarship
- Thematic map
- Work (electrical)
- 020201 artificial intelligence & image processing
- Artificial intelligence
- business
- Computation and Language (cs.CL)
- computer
- LENGUAJES Y SISTEMAS INFORMATICOS
- Natural language processing
Details
- Language :
- English
- Database :
- OpenAIRE
- Journal :
- RiuNet. Repositorio Institucional de la Universitat Politècnica de València, instname
- Accession number :
- edsair.doi.dedup.....5397c62045cebb90442f77880d6d1c04