PGN-LM Model and Forcing-Seq2Seq Model: Multiple Automatic Models of Title Generation for Natural Text using Deep Learning

Authors :
To Thanh Nhan
Nguyen Thi Hiep Thuan
Quan Thanh Tho
Source :
REV Journal on Electronics and Communications. 12
Publication Year :
2022
Publisher :
The Radio and Electronics Association of Vietnam (REV), 2022.

Abstract

In the current era, the amount of information on the Internet in general, and in the electronic press in particular, has grown rapidly, and this information is valuable in many aspects of life. Many users post high-quality writing as casual blogs, notes, or reviews, and some of these posts are even selected by editors for publication in professional venues. However, the original posts often come without titles, which must be added manually by the editing teams. With recent advances in AI techniques, especially deep learning, this task can be automated. Although automatic title generation can be considered a special case of text summarization, it poses some notably different requirements. A title is generally short, yet it needs to capture the major content while still maintaining the writing style of the original document. To satisfy these constraints, we introduce the PGN-LM Model, an architecture that evolved from the Pointer Generator Network. It can handle the Out-of-Vocabulary problem that traditional Seq2Seq models cannot, and it is combined with language modeling techniques. In addition, we introduce the Forcing-Seq2Seq Model, an enhanced Seq2Seq architecture in which classical TF-IDF scores are combined with a Named Entity Recognition method to identify the major keywords of the original text. To enforce the appearance of those keywords in the generated titles, a specific Teacher Forcing mechanism combined with the language model technique is employed. We have tested our approaches on real datasets and obtained promising initial results on both automatic metrics and human evaluation.
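
The keyword-identification step mentioned in the abstract (TF-IDF scores combined with Named Entity Recognition) can be illustrated with a minimal sketch. This record does not give the paper's actual scoring formula, so everything here is an assumption for illustration: the function name `rank_keywords`, the whitespace tokenizer, the multiplicative `entity_boost`, and the idea that named entities are supplied as a precomputed set rather than produced by a real NER model.

```python
import math
from collections import Counter


def rank_keywords(docs, doc_index, entities=frozenset(), entity_boost=2.0, top_k=3):
    """Rank tokens of docs[doc_index] by TF-IDF, boosting named entities.

    Hypothetical illustration only: `entities` stands in for the output of
    an NER step, and the multiplicative boost is an assumed way to combine
    the two signals, not the method from the paper.
    """
    # Naive lowercase whitespace tokenization (an assumption for brevity).
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each token appears.
    df = Counter(tok for toks in tokenized for tok in set(toks))

    target = tokenized[doc_index]
    tf = Counter(target)

    scores = {}
    for tok, count in tf.items():
        # Term frequency times inverse document frequency.
        score = (count / len(target)) * math.log(n_docs / df[tok])
        # Entities are expected in lowercase, matching the tokenizer.
        if tok in entities:
            score *= entity_boost
        scores[tok] = score

    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
]
print(rank_keywords(docs, 0, top_k=2))
print(rank_keywords(docs, 0, entities={"sat"}, top_k=2))
```

In a pipeline like the one the abstract describes, the top-ranked keywords would then be the tokens whose appearance in the generated title is enforced during decoding; that enforcement step is not shown here.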

Details

ISSN :
1859378X
Volume :
12
Database :
OpenAIRE
Journal :
REV Journal on Electronics and Communications
Accession number :
edsair.doi...........26637967729da6fe3b82ed55f24fdd64
Full Text :
https://doi.org/10.21553/rev-jec.285