Zeta & Eta: An Exploration and Evaluation of two Dispersion-based Measures of Distinctiveness

Authors :
Du, Keli
Dudar, Julia
Rok, Cora
Schöch, Christof
Publication Year :
2021
Publisher :
Zenodo, 2021.

Abstract

In corpus linguistics, numerous statistical measures have been adopted to analyze large amounts of textual data from a contrastive perspective in order to extract characteristic or “distinctive” features. While the most widely used keyness measures are based on word frequency, an increasing number of research papers have recently suggested dispersion-based measures as a better solution. These, however, are not new to Computational Literary Studies (CLS). In 2007, John Burrows introduced Zeta, a statistical measure that is mainly based on the degree of dispersion of a feature in a text corpus. In this paper, we also introduce Eta, a new measure of distinctiveness that is based on the deviation of proportions suggested by Stefan Gries. By comparing Eta with Zeta, we demonstrate that both measures are able to identify relevant, interpretable distinctive words in a target corpus. Additionally, we make a first attempt to detect the key differences between these two measures by interpreting the top distinctive words.

DFG Priority Programme SPP 2207 "Computational Literary Studies"
Online: https://gepris.dfg.de/gepris/projekt/402743989 https://dfg-spp-cls.github.io/

Subproject: "Zeta and Company. Measures of Distinctiveness for Digital Literary Studies"
Online: https://gepris.dfg.de/gepris/projekt/424211690 https://dfg-spp-cls.github.io/projects_en/2020/01/24/TP-Zeta_and_Company/ https://zeta-project.eu/de/
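
The dispersion idea behind the two measures named in the abstract can be illustrated with a minimal Python sketch. It rests on assumptions that are not stated in this record: Zeta is shown in a commonly used variant (the share of target segments containing a word minus the share of comparison segments containing it), and only Gries's deviation of proportions (DP) is shown for the second measure, because the abstract does not say how the paper combines DP values into Eta. The function names and the toy segments are hypothetical.

```python
from typing import Sequence


def document_proportion(word: str, segments: Sequence[Sequence[str]]) -> float:
    """Share of segments in which `word` occurs at least once."""
    if not segments:
        raise ValueError("need at least one segment")
    return sum(word in segment for segment in segments) / len(segments)


def zeta(word: str,
         target: Sequence[Sequence[str]],
         comparison: Sequence[Sequence[str]]) -> float:
    """Assumed Zeta variant: difference of document proportions.

    Ranges from -1 (only in comparison segments) to 1 (only in target segments).
    """
    return document_proportion(word, target) - document_proportion(word, comparison)


def deviation_of_proportions(word: str, segments: Sequence[Sequence[str]]) -> float:
    """Gries's DP: half the summed absolute difference between each segment's
    share of the word's occurrences and its share of the corpus size.

    Values near 0 mean the word is spread evenly; values near 1 mean it is
    concentrated in few segments.
    """
    sizes = [len(segment) for segment in segments]
    counts = [list(segment).count(word) for segment in segments]
    total_size = sum(sizes)
    total_count = sum(counts)
    if total_count == 0:
        raise ValueError(f"{word!r} does not occur in the corpus")
    return 0.5 * sum(
        abs(count / total_count - size / total_size)
        for count, size in zip(counts, sizes)
    )


# Toy data (hypothetical): each inner list is one tokenized text segment.
target_segments = [["sea", "ship"], ["sea", "storm"], ["sea", "sail"], ["land", "ship"]]
comparison_segments = [["land", "farm"], ["sea", "farm"], ["land", "road"], ["land", "hill"]]

zeta_sea = zeta("sea", target_segments, comparison_segments)
dp_sea_target = deviation_of_proportions("sea", target_segments)
```

In this toy example "sea" appears in three of four target segments and one of four comparison segments, so the assumed Zeta variant marks it as distinctive for the target group, while DP describes how evenly it is spread within a single group.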

Details

Database :
OpenAIRE
Accession number :
edsair.doi.dedup.....0671d36ffa5ccfc5848610924fba5686
Full Text :
https://doi.org/10.5281/zenodo.5532519