Bridging Linguistic Gaps: Developing a Greek Text Simplification Dataset.

Authors :
Agathos, Leonidas
Avgoustis, Andreas
Kryelesi, Xristiana
Makridou, Aikaterini
Tzanis, Ilias
Mouratidis, Despoina
Kermanidis, Katia Lida
Kanavos, Andreas
Source :
Information (2078-2489); Aug 2024, Vol. 15 Issue 8, p500, 25p
Publication Year :
2024

Abstract

Text simplification is crucial in bridging the comprehension gap in today's information-rich environment. Despite advancements in English text simplification, languages with intricate grammatical structures, such as Greek, often remain under-explored. The complexity of Greek grammar, characterized by its flexible syntactic ordering, presents unique challenges that hinder comprehension for native speakers, learners, tourists, and international students. This paper introduces a comprehensive dataset for Greek text simplification, containing over 7500 sentences across diverse topics such as history, science, and culture, tailored to address these challenges. We outline the methodology for compiling this dataset, including a collection of texts from Greek Wikipedia, their annotation with simplified versions, and the establishment of robust evaluation metrics. Additionally, the paper details the implementation of quality control measures and the application of machine learning techniques to analyze text complexity. Our experimental results demonstrate the dataset's initial effectiveness and potential in reducing linguistic barriers and enhancing communication, with initial machine learning models showing promising directions for future improvements in classifying text complexity. The development of this dataset marks a significant step toward improving accessibility and comprehension for a broad audience of Greek speakers and learners, fostering a more inclusive society. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
2078-2489
Volume :
15
Issue :
8
Database :
Complementary Index
Journal :
Information (2078-2489)
Publication Type :
Academic Journal
Accession number :
179353969
Full Text :
https://doi.org/10.3390/info15080500