SciReviewGen: A Large-scale Dataset for Automatic Literature Review Generation
- Publication Year :
- 2023
Abstract
- Automatic literature review generation is one of the most challenging tasks in natural language processing. Although large language models have tackled literature review generation, the absence of large-scale datasets has been a stumbling block to progress. We release SciReviewGen, consisting of over 10,000 literature reviews and 690,000 papers cited in the reviews. Based on the dataset, we evaluate recent transformer-based summarization models on the literature review generation task, including Fusion-in-Decoder extended for literature review generation. Human evaluation results show that some machine-generated summaries are comparable to human-written reviews, while revealing the challenges of automatic literature review generation such as hallucinations and a lack of detailed information. Our dataset and code are available at https://github.com/tetsu9923/SciReviewGen.
- Comment: ACL Findings 2023 (to appear). arXiv admin note: text overlap with arXiv:1810.04020 by other authors
Details
- Database :
- arXiv
- Publication Type :
- Report
- Accession number :
- edsarx.2305.15186
- Document Type :
- Working Paper