Design of Text Resources and Tools
- Source :
- Applying Language Technology in Humanities Research ISBN: 9783030464929
- Publication Year :
- 2020
- Publisher :
- Springer International Publishing, 2020.
Abstract
- This chapter guides the reader through the key stages of creating language resources. After explaining the difference between linguistic corpora and other text collections, the authors briefly introduce the typology of corpora created by corpus linguists and the concept of corpus annotation. Basic terminology from natural language processing (NLP) and corpus linguistics is introduced, alongside an explanation of the main components of an NLP pipeline and tools, including pre-processing, part-of-speech tagging, lemmatization, and entity extraction.
- Subjects :
- Typology
Computer science
Part-of-speech tagging
Lemmatisation
InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL
ComputingMethodologies_ARTIFICIALINTELLIGENCE
Pipeline (software)
Terminology
Metadata
Annotation
ComputingMethodologies_PATTERNRECOGNITION
Corpus linguistics
ComputingMethodologies_DOCUMENTANDTEXTPROCESSING
Artificial intelligence
Natural language processing
Details
- ISBN :
- 978-3-030-46492-9
- Database :
- OpenAIRE
- Accession number :
- edsair.doi...........a2897af027ed680832e7513c77970355