Language features in extractive summarization: Humans Vs. Machines.

Authors :: Arroyo-Fernández, Ignacio; Curiel, Arturo; Méndez-Cruz, Carlos-Francisco
Source :: Knowledge-Based Systems. Sep 2019, Vol. 180, pp. 1-11. 11 p.
Publication Year :: 2019
Abstract: This paper presents a comparative statistical analysis of the language features most commonly used for Automatic Text Summarization (ATS), namely: Parts of Speech (PoS) (unigrams and bigrams), sentiments (by token and sentence), and Rhetorical Structure Theory (RST) relations. The analyses were carried out on both human-made and machine-made summaries, in order to determine whether current ATS systems capture the same kind of information as humans do. Our results show that there are some marked differences between machine-made and human-made summaries, which at times may seem counterintuitive. For instance, named entities were usually frequent in machine-made summaries, but not in human-made ones. Similarly, words perceived to hold a "neutral" sentiment were systematically favored by machines, but not always by humans.
• This paper investigates the pertinence of language features commonly utilized in ATS.
• Statistical comparisons were conducted between human-made and machine-made summaries.
• Interestingly, some features were used moderately by humans, but not by machines.
[ABSTRACT FROM AUTHOR]