
OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition

Authors :: Wan, Jianqiang
Song, Sibo
Yu, Wenwen
Liu, Yuliang
Cheng, Wenqing
Huang, Fei
Bai, Xiang
Yao, Cong
Yang, Zhibo
Publication Year :: 2024
Abstract: Recently, visually-situated text parsing (VsTP) has experienced notable advancements, driven by the increasing demand for automated document understanding and the emergence of Generative Large Language Models (LLMs) capable of processing document-based questions. Various methods have been proposed to address the challenging problem of VsTP. However, due to the diversified targets and heterogeneous schemas, previous works usually design task-specific architectures and objectives for individual tasks, which inadvertently leads to modal isolation and complex workflows. In this paper, we propose a unified paradigm for parsing visually-situated text across diverse scenarios. Specifically, we devise a universal model, called OmniParser, which can simultaneously handle three typical visually-situated text parsing tasks: text spotting, key information extraction, and table recognition. In OmniParser, all tasks share the unified encoder-decoder architecture, the unified objective: point-conditioned text generation, and the unified input & output representation: prompt & structured sequences. Extensive experiments demonstrate that the proposed OmniParser achieves state-of-the-art (SOTA) or highly competitive performance on 7 datasets for the three visually-situated text parsing tasks, despite its unified, concise design. The code is available at https://github.com/AlibabaResearch/AdvancedLiterateMachinery.
Comment: CVPR 2024

Subjects :: Computer Science - Computer Vision and Pattern Recognition

Details

Database :: arXiv
Publication Type :: Report
Accession number :: edsarx.2403.19128
Document Type :: Working Paper
