Design of Text Resources and Tools

Authors :
Gábor Mihály Tóth
Barbara McGillivray
Source :
Applying Language Technology in Humanities Research ISBN: 9783030464929
Publication Year :
2020
Publisher :
Springer International Publishing, 2020.

Abstract

This chapter guides the reader through the key stages of creating language resources. After explaining the difference between linguistic corpora and other text collections, the authors briefly introduce the typology of corpora created by corpus linguists and the concept of corpus annotation. Basic terminology from natural language processing (NLP) and corpus linguistics is introduced, alongside an explanation of the main components of an NLP pipeline and tools, including pre-processing, part-of-speech tagging, lemmatization, and entity extraction.

Details

ISBN :
978-3-030-46492-9
ISBNs :
9783030464929
Database :
OpenAIRE
Accession number :
edsair.doi...........a2897af027ed680832e7513c77970355