Multi-document summarization using off the shelf compression software

Authors :
Dragomir R. Radev
Timothy Allison
Stanko Dimitrov
Amardeep Grewal
Source :
Proceedings of the HLT-NAACL 03 on Text summarization workshop.
Publication Year :
2003
Publisher :
Association for Computational Linguistics, 2003.

Abstract

This study examines whether common off-the-shelf compression software such as gzip can be used to enhance existing summaries and to produce summaries from scratch. Because the gzip algorithm compresses a file by removing repetitive data, we should be able to determine which sentences in a summary contain the least repetitive data by comparing the gzipped size of the summary with a given sentence to its gzipped size without that sentence. We hypothesized that adding the sentence that increases the gzipped size of the summary the most gives the summary the sentence with the most new information. In this study, the hypothesis held in many cases, to varying degrees.
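
The selection idea described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names (`gzipped_size`, `pick_next_sentence`, `build_summary`), the joining of sentences with a single space, and the greedy loop for building a summary from scratch are all assumptions made for the example. Only the core comparison (gzipped size with the sentence versus without it) comes from the abstract.

```python
import gzip


def gzipped_size(text: str) -> int:
    """Return the number of bytes in the gzip-compressed form of text."""
    return len(gzip.compress(text.encode("utf-8")))


def pick_next_sentence(summary: list[str], candidates: list[str]) -> str:
    """Pick the candidate that increases the gzipped summary size the most.

    Assumption: sentences are joined with a single space before compression.
    """
    base = gzipped_size(" ".join(summary))
    return max(
        candidates,
        key=lambda s: gzipped_size(" ".join(summary + [s])) - base,
    )


def build_summary(sentences: list[str], max_sentences: int) -> list[str]:
    """Greedily build a summary from scratch (hypothetical driver loop)."""
    summary: list[str] = []
    remaining = list(sentences)
    while remaining and len(summary) < max_sentences:
        best = pick_next_sentence(summary, remaining)
        summary.append(best)
        remaining.remove(best)  # drop the chosen sentence from the pool
    return summary


if __name__ == "__main__":
    current = ["The cat sat on the mat."]
    pool = [
        "The cat sat on the mat.",
        "Quantum computers factor large integers quickly.",
    ]
    # The repeated sentence compresses to almost nothing, so the
    # novel sentence is the one selected.
    print(pick_next_sentence(current, pool))
```

Note that a raw size increase also tends to favor longer sentences, since more text generally means more compressed bytes. The abstract does not say whether or how the study accounts for sentence length, so the sketch leaves the score unnormalized.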

Details

Database :
OpenAIRE
Journal :
Proceedings of the HLT-NAACL 03 on Text summarization workshop
Accession number :
edsair.doi...........2b2577f42371f939e0d5fe5e5459dd00
Full Text :
https://doi.org/10.3115/1119467.1119470