1. SciReviewGen: A Large-scale Dataset for Automatic Literature Review Generation
- Author
Kasanishi, Tetsu; Isonuma, Masaru; Mori, Junichiro; Sakata, Ichiro
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence
- Abstract
Automatic literature review generation is one of the most challenging tasks in natural language processing. Although large language models have tackled literature review generation, the absence of large-scale datasets has been a stumbling block to progress. We release SciReviewGen, consisting of over 10,000 literature reviews and 690,000 papers cited in the reviews. Based on the dataset, we evaluate recent transformer-based summarization models on the literature review generation task, including Fusion-in-Decoder extended for literature review generation. Human evaluation results show that some machine-generated summaries are comparable to human-written reviews, while revealing the challenges of automatic literature review generation, such as hallucinations and a lack of detailed information. Our dataset and code are available at https://github.com/tetsu9923/SciReviewGen.
- Comment
ACL Findings 2023 (to appear). arXiv admin note: text overlap with arXiv:1810.04020 by other authors
- Published
2023