Section mixture models for scientific document summarization.

Authors :
Conroy, John M.
Davis, Sashka T.
Source :
International Journal on Digital Libraries. Sep 2018, Vol. 19 Issue 2/3, p305-322. 18p. 12 Charts, 5 Graphs.
Publication Year :
2018

Abstract

In this paper, we present a system for summarization of scientific and structured documents that has three components: section mixture models to estimate the weights of terms; a hypothesis test to select a subset of these terms; and a sentence extractor based on techniques for combinatorial optimization. The section mixture models approach is an adaptation of a bigram mixture model based on the main sections of a scientific document and a collection of citing sentences (citances) from papers that reference the document. The model was adapted from earlier work on biomedical documents used in the summarization task of the 2014 Text Analysis Conference (TAC 2014). The mixture model trained on the biomedical data was also applied to the data for the Computational Linguistics scientific summarization task of the Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (CL-SciSumm 2016). This model gives rise to machine-generated summaries with ROUGE scores that are nearly as strong as those seen on the biomedical data, and it was also the highest-scoring submission to the task of generating a human summary. For sentence extraction, we use the OCCAMS algorithm (Davis et al., in: Vreeken, Ling, Zaki, Siebes, Yu, Goethals, Webb, Wu (eds) ICDM workshops, IEEE Computer Society, pp 454-463, 2012), which takes the sentences from the original document and the term weights computed by the language models and outputs a set of minimally overlapping sentences whose combined term coverage is maximized. Finally, we explore the importance of an appropriate background model for the hypothesis test that selects terms in order to achieve the best-quality summaries. [ABSTRACT FROM AUTHOR]
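
The sentence-extraction step described in the abstract (choose sentences that together cover as much term weight as possible with little overlap) can be illustrated with a simple greedy heuristic. The sketch below is not the OCCAMS algorithm and is not taken from the paper; it is a generic greedy approximation to weighted term coverage under a word budget. The function name, the whitespace tokenization, the use of unigrams rather than bigrams, and the `max_words` parameter are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def greedy_coverage_summary(
    sentences: List[str],
    term_weights: Dict[str, float],
    max_words: int,
) -> List[int]:
    """Return indices of sentences chosen greedily by new term weight per word.

    Illustrative stand-in only: a plain greedy heuristic for budgeted
    weighted coverage, not the OCCAMS algorithm named in the abstract.
    """
    # Assumption: terms are lowercase whitespace-separated unigrams.
    sentence_terms: List[Set[str]] = [set(s.lower().split()) for s in sentences]
    lengths: List[int] = [len(s.split()) for s in sentences]

    covered: Set[str] = set()   # terms already present in the summary
    chosen: List[int] = []      # selected sentence indices, in pick order
    remaining = max_words       # word budget still available

    while True:
        best_index = -1
        best_ratio = 0.0
        for i, terms in enumerate(sentence_terms):
            # Skip sentences already chosen, empty, or too long for the budget.
            if i in chosen or lengths[i] == 0 or lengths[i] > remaining:
                continue
            # Only terms not yet covered contribute, which penalizes overlap.
            gain = sum(term_weights.get(t, 0.0) for t in terms - covered)
            ratio = gain / lengths[i]
            if ratio > best_ratio:
                best_index, best_ratio = i, ratio
        if best_index < 0:
            break  # nothing fits the budget or adds new weighted terms
        chosen.append(best_index)
        covered |= sentence_terms[best_index]
        remaining -= lengths[best_index]
    return chosen
```

Because a sentence earns credit only for terms the summary does not already contain, a near-duplicate of a chosen sentence has zero gain and is never selected, which mirrors the "minimally overlapping" property the abstract attributes to the extractor.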

Details

Language :
English
ISSN :
1432-5012
Volume :
19
Issue :
2/3
Database :
Academic Search Index
Journal :
International Journal on Digital Libraries
Publication Type :
Academic Journal
Accession number :
131259128
Full Text :
https://doi.org/10.1007/s00799-017-0218-6