Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models

Authors :: Wu, Haoning
Zhang, Zicheng
Zhang, Erli
Chen, Chaofeng
Liao, Liang
Wang, Annan
Xu, Kaixin
Li, Chunyi
Hou, Jingwen
Zhai, Guangtao
Xue, Geng
Sun, Wenxiu
Yan, Qiong
Lin, Weisi
Publication Year :: 2023
Abstract: Multi-modality foundation models, as represented by GPT-4V, have brought a new paradigm for low-level visual perception and understanding tasks, in which a single model can respond to a broad range of natural human instructions. While existing foundation models have shown exciting potential on low-level visual tasks, their related abilities are still preliminary and need to be improved. To enhance these models, we conduct a large-scale subjective experiment collecting a large volume of real human feedback on low-level vision. Each feedback entry follows a pathway that starts with a detailed description of the low-level visual appearance (*e.g.* clarity, color, brightness) of an image and ends with an overall conclusion, with an average length of 45 words. The constructed **Q-Pathway** dataset includes 58K detailed human feedback entries on 18,973 images with diverse low-level appearance. Moreover, to enable foundation models to robustly respond to diverse types of questions, we design a GPT-participated conversion that processes these feedback entries into 200K instruction-response pairs in diverse formats. Experimental results indicate that **Q-Instruct** consistently elevates low-level perception and understanding abilities across several foundation models. We anticipate that our datasets can pave the way for a future in which general intelligence can perceive and understand low-level visual appearance and evaluate visual quality like a human. Our dataset, model zoo, and demo are published at: https://q-future.github.io/Q-Instruct.
Comment: 16 pages, 11 figures, pages 12-16 as appendix

Subjects :: Computer Science - Computer Vision and Pattern Recognition
Computer Science - Multimedia

Details

Database :: arXiv
Publication Type :: Report
Accession number :: edsarx.2311.06783
Document Type :: Working Paper