ResneSt-Transformer: Joint attention segmentation-free for end-to-end handwriting paragraph recognition model

Authors :: Mohammed Hamdan
Mohamed Cheriet
Source :: Array, Vol 19, Article 100300 (2023)
Publication Year :: 2023
Publisher :: Elsevier, 2023.
Abstract: Offline handwritten text recognition (HTR) typically relies on segmented text-line images for training and transcription. However, acquiring line-level position and transcript information can be challenging and time-consuming, while automatic line segmentation algorithms are prone to errors that impede the recognition phase. To address these issues, we introduce a state-of-the-art solution that integrates vision and language models using efficient split and multi-head attention neural networks, referred to as joint attention (ResneSt-Transformer), for end-to-end recognition of handwritten paragraphs. Our proposed novel one-stage, segmentation-free pipeline employs joint attention mechanisms to process paragraph images in an end-to-end trainable manner. This pipeline comprises three modules, with the output of one serving as the input for the next. Initially, a feature extraction module employing a CNN with a split attention mechanism (ResneSt50) is utilized. Subsequently, we develop an encoder module containing four transformer layers to generate robust representations of the entire paragraph image. Lastly, we design a decoder module with six transformer layers to construct weighted masks. The encoder and decoder modules incorporate a multi-head self-attention mechanism and positional encoding, enabling the model to concentrate on specific feature maps at the current time step. By leveraging joint attention and a segmentation-free approach, our neural network calculates split attention weights on the visual representation, facilitating implicit line segmentation. This strategy signifies a substantial advancement toward achieving end-to-end transcription of entire paragraphs. Experiments conducted on paragraph-level benchmark datasets, including RIMES, IAM, and READ 2016 test datasets, demonstrate competitive results compared to recent paragraph-level models while maintaining reduced complexity.
The code and pre-trained models are available on our GitHub repository here: HTTPS link.
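
Illustrative note (not part of the published abstract): the abstract says the encoder and decoder combine positional encoding with attention so that each decoding step concentrates on particular positions of the visual feature map. The sketch below shows that general idea in plain Python with a single attention head. It is not the authors' implementation. The sinusoidal form of the positional encoding, the function names `sinusoidal_position` and `attend`, and the toy 4x4 `feature_map` are all assumptions made for illustration; the abstract does not state which positional encoding variant is used.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sinusoidal_position(pos, dim):
    """Sinusoidal encoding for one position (assumed variant, see lead-in)."""
    return [
        math.sin(pos / 10000 ** (i / dim)) if i % 2 == 0
        else math.cos(pos / 10000 ** ((i - 1) / dim))
        for i in range(dim)
    ]


def attend(query, keys, values):
    """Single-head scaled dot-product attention for one query vector.

    Returns the attention weights over positions and the weighted
    sum of the value vectors (the context vector).
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, context


# Hypothetical flattened feature map: 4 positions, 4 channels each.
feature_map = [
    [0.1, 0.0, 0.2, 0.0],
    [0.9, 0.1, 0.0, 0.3],
    [0.0, 0.8, 0.1, 0.0],
    [0.2, 0.2, 0.7, 0.1],
]

# Add position information so attention can tell positions apart.
encoded = [
    [f + p for f, p in zip(feat, sinusoidal_position(i, 4))]
    for i, feat in enumerate(feature_map)
]

# Use position 1 as the query; keys and values are all positions.
weights, context = attend(encoded[1], encoded, encoded)
```

Here `weights` holds one non-negative value per feature-map position and the values sum to 1, which is the sense in which attention acts as a soft mask over the image. A multi-head version would run several such attentions on different learned projections and concatenate the results.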

Subjects :: Handwritten text recognition
ResneSt
Transformer
Self attention
Segmentation-free
Lexicon-free
Computer engineering. Computer hardware
TK7885-7895
Electronic computers. Computer science
QA75.5-76.95

Details

Language :: English
ISSN :: 25900056
Volume :: 19
Article number :: 100300
Database :: Directory of Open Access Journals
Journal :: Array
Publication Type :: Academic Journal
Accession number :: edsdoj.108f8a1ba1c34f84b30773ec0d611dcb
Document Type :: article
Full Text :: https://doi.org/10.1016/j.array.2023.100300
