Back to Search
Start Over
User-Generated Text Corpus for Evaluating Japanese Morphological Analysis and Lexical Normalization
- Publication Year :
- 2023
-
Abstract
- Morphological analysis (MA) and lexical normalization (LN) are both important tasks for Japanese user-generated text (UGT). To evaluate and compare different MA/LN systems, we have constructed a publicly available Japanese UGT corpus. Our corpus comprises 929 sentences annotated with morphological and normalization information, along with category information we classified for frequent UGT-specific phenomena. Experiments on the corpus demonstrated the low performance of existing MA/LN methods for non-general words and non-standard forms, indicating that the corpus would be a challenging benchmark for further research on UGT.<br />Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies,Online,NAACL
Details
- Database :
- OAIster
- Notes :
- English
- Publication Type :
- Electronic Resource
- Accession number :
- edsoai.on1378466831
- Document Type :
- Electronic Resource