The CACAPO Dataset: A Multilingual, Multi-Domain Dataset for Neural Pipeline and End-to-End Data-to-Text Generation
- Source :
- Proceedings of The 13th International Conference on Natural Language Generation, pp. 68-79
- Publication Year :
- 2020
Abstract
- This paper describes the CACAPO dataset, built for training both neural pipeline and end-to-end data-to-text language generation systems. The dataset is multilingual (Dutch and English) and contains almost 10,000 sentences from human-written news texts in the sports, weather, stocks, and incidents domains, together with aligned attribute-value paired data. The dataset is unique in that the linguistic variation and indirect ways of expressing data in these texts reflect the challenges of real-world NLG tasks.
- Subjects :
- ComputingMethodologies_DOCUMENTANDTEXTPROCESSING
Details
- Language :
- English
- Database :
- OpenAIRE
- Journal :
- Proceedings of The 13th International Conference on Natural Language Generation
- Accession number :
- edsair.narcis........fac43e94b9275d7bc9002c9a3af6e2e6