Large-scale digital humanities projects require, as a necessary precondition, that texts, archival materials, and historical individuals become data: a process that involves choices about collection, curation, and preparation. While scholars of media and digital culture have made clear the mediated and constructed nature of data, practitioners of “distant reading” and related methods have been less inclined to offer a transparent account of their materials. This essay models a theoretically rigorous approach to a new dataset of the authors’ creation: a set of 1,421 playbills from eighteenth-century London. Tracing how categories such as genre and authorship operate on playbills over time, it finds that the inclusion of genre is a more powerful mode of categorization for eighteenth-century theatrical publics than the inclusion of a named author. The case study of the generically and authorially indeterminate dramatic adaptations of Oroonoko, and of the shifting categories used to advertise them, reveals that eighteenth-century theatrical publics had an idiom, previously unrecognized by scholars, for talking about generic ambiguity and even for using it to market performances. Oroonoko and other plays that similarly challenged conventional generic and authorial categorization were often advertised as “a Play,” a seemingly empty label that proves to carry significance when these playbills are subjected to quantitative analysis. Throughout, the essay attends to the transformation of archival artifacts into data objects, insisting that the knowledge claims these playbills can support are enabled, rather than hampered, by an awareness of the highly mediated nature of the dataset. As such, this study demonstrates how a more reflective approach to humanities data collection opens up new interpretive terrain, one that takes advantage of the opportunities available at scale while maintaining the humanities’ commitment to ambiguity, mediation, and situatedness.