PGN-LM Model and Forcing-Seq2Seq Model: Multiple Automatic Models of Title Generation for Natural Text using Deep Learning
- Source :
- REV Journal on Electronics and Communications. 12
- Publication Year :
- 2022
- Publisher :
- The Radio and Electronics Association of Vietnam (REV), 2022.
Abstract
- In the current era, the volume of information on the Internet in general, and in the electronic press in particular, has grown rapidly and is valuable in many aspects of life. Many users post high-quality writing as casual blogs, notes, or reviews, and some of these posts are even selected by editors for publication in professional venues. However, the original posts often come without titles, which must be added manually by editorial teams. With recent advances in AI techniques, especially deep learning, this task could be done automatically. Although automatic title generation can be considered a special case of text summarization, it poses some notably different requirements: a title is generally short, yet it must capture the main content while preserving the writing style of the original document. To meet these constraints, we introduce the PGN-LM Model, an architecture evolved from the Pointer Generator Network that can handle the out-of-vocabulary problem that traditional Seq2Seq models cannot, combined with language modeling techniques. We also introduce the Forcing-Seq2Seq Model, an enhanced Seq2Seq architecture in which classical TF-IDF scores are combined with Named Entity Recognition to identify the major keywords of the original text. To enforce the appearance of those keywords in the generated titles, a specific Teacher Forcing mechanism is employed together with a language model technique. We have tested our approaches on real datasets and obtained promising initial results under both automatic and human evaluation.
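The keyword-identification step mentioned in the abstract can be illustrated with a minimal TF-IDF sketch. This is not the authors' implementation: the function name `tfidf_keywords`, the whitespace tokenizer, the unsmoothed IDF formula, and the toy corpus are all assumptions made for illustration, and the Named Entity Recognition component is omitted entirely (the abstract does not say how the two signals are weighted).

```python
import math
from collections import Counter


def tfidf_keywords(docs, doc_index, top_k=3):
    """Rank the terms of docs[doc_index] by TF-IDF against the whole corpus.

    Hypothetical helper: naive lowercase/whitespace tokenization and a plain
    log(N / df) inverse document frequency, chosen only for brevity.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    tokens = tokenized[doc_index]
    tf = Counter(tokens)

    # Term frequency (relative count) times inverse document frequency.
    scores = {
        term: (count / len(tokens)) * math.log(n_docs / df[term])
        for term, count in tf.items()
    }

    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


# Toy corpus (assumed data, not from the paper).
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
]
print(tfidf_keywords(docs, 0, top_k=1))
```

In this toy corpus, a term that appears in every document (such as "the") receives a score of zero, while a term unique to one document ranks highest for that document. A system like the one described would presumably also boost or filter candidates using recognized named entities before forcing them into the generated title.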
- Subjects :
- Pulmonary and Respiratory Medicine
Pediatrics, Perinatology and Child Health
Details
- ISSN :
- 1859-378X
- Volume :
- 12
- Database :
- OpenAIRE
- Journal :
- REV Journal on Electronics and Communications
- Accession number :
- edsair.doi...........26637967729da6fe3b82ed55f24fdd64
- Full Text :
- https://doi.org/10.21553/rev-jec.285