1. ALGORITMO FONÉTICO PARA DETECCIÓN DE CADENAS DE TEXTO DUPLICADAS EN EL IDIOMA ESPAÑOL.
- Author
- Amón, Iván, Moreno, Francisco, and Echeverri, Jaime
- Subjects
- *SPELLING errors, *ALGORITHMS, *ENGINEERING models, *DATA quality, *PREVENTION, *SPANISH language, *PHONETICS, *ORTHOGRAPHY & spelling
- Abstract
Data that should be identical often are not, because of misspellings and typos, variations in word order, and the use of prefixes and suffixes, among other causes. Phonetic techniques for duplicate detection are not geared toward the Spanish language, which makes it difficult to identify and correct problems such as spelling errors in texts written in this language. In this research paper we propose an algorithm called PhoneticSpanish to detect duplicate text strings, which takes into account the presence of spelling errors in Spanish. The proposed algorithm was compared with nine techniques for detecting duplicates. The results were satisfactory: the algorithm performed better than the other techniques, and the results demonstrate opportunities for improved analysis of information in Spanish. [ABSTRACT FROM AUTHOR]
- Published
- 2012