Large-Scale Optical Character Recognition of Ancient Greek
- Source :
- Mouseion (Calg.) 14 (III series) (2017): 341–359. Authors: Bruce Robertson, Federico Boschetti.
- Publication Year :
- 2017
- Publisher :
- University of Calgary Press, for the Classical Association of Canada, Calgary, Canada, 2017.
Abstract
- This paper documents our campaign to undertake the large-scale optical character recognition of ancient, or polytonic, Greek. Building upon the Gamera OCR engine and developing a suite of post-processing tools, including automatic spellcheck, we processed 1,200 volumes comprising 329,002,271 Greek words. A sample of 10 pages is studied in detail; they demonstrate the degree to which each step of post-processing improved the results, and with which source documents. These pages attain an average character accuracy of about 96%. These results will provide a basis for further improvements, including the training of other open-source OCR engines.
- Subjects :
- Archeology
History
Scale (ratio)
Suite
05 social sciences
02 engineering and technology
Ancient Greek
Optical character recognition
Ancient history
OCR
0202 electrical engineering, electronic engineering, information engineering
020201 artificial intelligence & image processing
Artificial intelligence
0509 other social sciences
Classics
050904 information & library sciences
Natural language processing
Details
- Language :
- English
- Database :
- OpenAIRE
- Journal :
- Mouseion (Calg.) 14 (III series) (2017): 341–359.
- Accession number :
- edsair.doi.dedup.....44fde5cd4c241a0d083b56b2da74ad01