1. Big data or not enough? Zeta test reliability and the attribution of Henry VI.
- Author
- Barber, Ros
- Subjects
- *TEST reliability, *RELIABILITY in engineering, *BIG data, *ALGORITHMS, *STYLOMETRY, *ATTRIBUTION of authorship
- Abstract
- In 2016, the editors of the New Oxford Shakespeare announced that certain Shakespeare plays could be attributed to co-authors, and certain anonymous plays to Shakespeare, on the basis of non-traditional attribution methods known collectively as computational stylistics, or stylometry. This article investigates the efficacy of a key algorithm used to attribute parts of the Henry VI plays to Christopher Marlowe, the Zeta method invented by John Burrows and adapted by Hugh Craig. Zeta, a test widely used in computational stylistics, is described by Gabriel Egan as 'by some way the most powerful general-purpose authorship tool currently available'. This article offers extensive independent testing of Zeta. Following criticism of the existing method of Zeta analysis, this article introduces a new, statistically sound method for analysing Zeta results. It investigates a claim that the test is 99.9% reliable in differentiating Shakespeare's style from Marlowe's. Examining the conditions under which certain authors were ruled in or out of co-authorship of the Henry VI plays, it determines the effect of disparity in data set size on Zeta's reliability, as well as the effect of small data sets. Several test results confirm that Zeta is unduly influenced by genre. The article concludes that in the light of this study, the small canons of most Early Modern dramatists, particularly where they are genre-skewed like Marlowe's, do not provide enough data for Zeta to be reliable. [ABSTRACT FROM AUTHOR]
- Published
- 2021