Large-Scale Optical Character Recognition of Ancient Greek

Authors :
Bruce Robertson
Federico Boschetti
Source :
Mouseion (Calg.) 14 (III series) (2017): 341–359.
Publication Year :
2017
Publisher :
University of Calgary Press, for the Classical Association of Canada, Calgary, Canada, 2017.

Abstract

This paper documents our campaign to undertake the large-scale optical character recognition of ancient, or polytonic, Greek. Building upon the Gamera OCR engine and developing a suite of post-processing tools, including automatic spellcheck, we processed 1,200 volumes comprising 329,002,271 Greek words. A sample of 10 pages is studied in detail; they demonstrate the degree to which each step of post-processing improved the results, and with which source documents. These pages attain an average character accuracy of about 96%. These results will provide a basis for further improvements, including the training of other open-source OCR engines.

Details

Language :
English
Database :
OpenAIRE
Journal :
Mouseion (Calg.) 14 (III series) (2017): 341–359.
Accession number :
edsair.doi.dedup.....44fde5cd4c241a0d083b56b2da74ad01