Zeta revisited.

Authors :
Hoover, David L
Source :
Digital Scholarship in the Humanities. Dec 2022, Vol. 37, Issue 4, pp. 1002-1021. 20p.
Publication Year :
2022

Abstract

It has been claimed that traditional interpretations of Zeta based on graphs with bisector lines (Craig and Kinney, 2009, Shakespeare, Computers, and the Mystery of Authorship. Cambridge: Cambridge University Press) are unsound, that validation with held-out segments is inadequate, that counting types rather than tokens is suspect, and that the relationship between stronger Zeta scores and shorter word n-grams is an artifact of the method rather than a research finding (Rizvi, 2019a, The interpretation of zeta test results. Digital Scholarship in the Humanities, 34 (2): 401–18). All of these claims are unsound. The separation of base and counter segments in a Zeta analysis is in fact an important research finding, the traditional interpretation of Zeta results based on a bisector line remains sound, and validation with held-out segments is an appropriate method. Zeta's reliance on consistency rather than frequency is not a bug but a valuable feature that provides an important complementary method to those based on frequency. The results of extensive testing on corpora of prose, poetry, and drama show that increasing word n-gram length (but not character n-gram length) is often negatively correlated with classification accuracy. Higher Zeta scores are often correlated with better results, though the results vary a great deal depending on the corpus, the classification method, and the type and length of n-gram. Most importantly, these results show that Zeta gives strong results that are competitive with methods like Cosine Delta and Support Vector Machine on these classification tasks. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
2055-768X
Volume :
37
Issue :
4
Database :
Academic Search Index
Journal :
Digital Scholarship in the Humanities
Publication Type :
Academic Journal
Accession number :
159850225
Full Text :
https://doi.org/10.1093/llc/fqab095