SciReviewGen: A Large-scale Dataset for Automatic Literature Review Generation

Authors :
Kasanishi, Tetsu
Isonuma, Masaru
Mori, Junichiro
Sakata, Ichiro
Publication Year :
2023

Abstract

Automatic literature review generation is one of the most challenging tasks in natural language processing. Although large language models have tackled literature review generation, the absence of large-scale datasets has been a stumbling block to progress. We release SciReviewGen, consisting of over 10,000 literature reviews and 690,000 papers cited in the reviews. Based on the dataset, we evaluate recent transformer-based summarization models on the literature review generation task, including Fusion-in-Decoder extended for literature review generation. Human evaluation results show that some machine-generated summaries are comparable to human-written reviews, while revealing the challenges of automatic literature review generation such as hallucinations and a lack of detailed information. Our dataset and code are available at https://github.com/tetsu9923/SciReviewGen.

Comment: ACL Findings 2023 (to appear). arXiv admin note: text overlap with arXiv:1810.04020 by other authors

Details

Database :
arXiv
Publication Type :
Report
Accession number :
edsarx.2305.15186
Document Type :
Working Paper