Author: "Shi, Haoyuan" / Database: OAIster - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Shi, Haoyuan"' showing total 5 results

1. Cognitive Visual-Language Mapper: Advancing Multimodal Comprehension with Enhanced Visual Knowledge Alignment

Author: Li, Yunxin, Chen, Xinyu, Hu, Baotian, Shi, Haoyuan, and Zhang, Min
Abstract: Evaluating and rethinking the current landscape of Large Multimodal Models (LMMs), we observe that widely used visual-language projection approaches (e.g., Q-former or MLP) focus on the alignment of image-text descriptions yet ignore the visual knowledge-dimension alignment, i.e., connecting visuals to their relevant knowledge. Visual knowledge plays a significant role in analyzing, inferring, and interpreting information from visuals, helping improve the accuracy of answers to knowledge-based visual questions. In this paper, we mainly explore improving LMMs with visual-language knowledge alignment, especially aimed at challenging knowledge-based visual question answering (VQA). To this end, we present a Cognitive Visual-Language Mapper (CVLM), which contains a pretrained Visual Knowledge Aligner (VKA) and a Fine-grained Knowledge Adapter (FKA) used in the multimodal instruction tuning stage. Specifically, we design the VKA based on the interaction between a small language model and a visual encoder, training it on collected image-knowledge pairs to achieve visual knowledge acquisition and projection. FKA is employed to distill the fine-grained visual knowledge of an image and inject it into Large Language Models (LLMs). We conduct extensive experiments on knowledge-based VQA benchmarks, and experimental results show that CVLM significantly improves the performance of LMMs on knowledge-based VQA (average gain of 5.0%). Ablation studies also verify the effectiveness of VKA and FKA, respectively.
Comment: work in progress, under review
Published: 2024

2. TokenUnify: Scalable Autoregressive Visual Pre-training with Mixture Token Prediction

Author: Chen, Yinda, Shi, Haoyuan, Liu, Xiaoyu, Shi, Te, Zhang, Ruobing, Liu, Dong, Xiong, Zhiwei, and Wu, Feng
Abstract: Autoregressive next-token prediction is a standard pretraining method for large-scale language models, but its application to vision tasks is hindered by the non-sequential nature of image data, leading to cumulative errors. Most vision models employ masked autoencoder (MAE) based pretraining, which faces scalability issues. To address these challenges, we introduce TokenUnify, a novel pretraining method that integrates random token prediction, next-token prediction, and next-all token prediction. We provide theoretical evidence demonstrating that TokenUnify mitigates cumulative errors in visual autoregression. Alongside TokenUnify, we have assembled a large-scale electron microscopy (EM) image dataset with ultra-high resolution, ideal for creating spatially correlated long sequences. This dataset includes over 120 million annotated voxels, making it the largest neuron segmentation dataset to date and providing a unified benchmark for experimental validation. Leveraging the Mamba network, which is inherently suited for long-sequence modeling, on this dataset, TokenUnify not only reduces the computational complexity but also leads to a significant 45% improvement in segmentation performance on downstream EM neuron segmentation tasks compared to existing methods. Furthermore, TokenUnify demonstrates superior scalability over MAE and traditional autoregressive methods, effectively bridging the gap between pretraining strategies for language and vision models. Code is available at https://github.com/ydchen0806/TokenUnify.
Published: 2024

3. VisionGraph: Leveraging Large Multimodal Models for Graph Theory Problems in Visual Context

Author: Li, Yunxin, Hu, Baotian, Shi, Haoyuan, Wang, Wei, Wang, Longyue, and Zhang, Min
Abstract: Large Multimodal Models (LMMs) have achieved impressive success in visual understanding and reasoning, remarkably improving the performance of mathematical reasoning in a visual context. Yet, a challenging type of visual math lies in the multimodal graph theory problem, which demands that LMMs understand the graphical structures accurately and perform multi-step reasoning on the visual graph. Additionally, exploring multimodal graph theory problems will lead to more effective strategies in fields like biology, transportation, and robotics planning. To step forward in this direction, we are the first to design a benchmark named VisionGraph, used to explore the capabilities of advanced LMMs in solving multimodal graph theory problems. It encompasses eight complex graph problem tasks, from connectivity to shortest path problems. Subsequently, we present a Description-Program-Reasoning (DPR) chain to enhance the logical accuracy of reasoning processes through graphical structure description generation and algorithm-aware multi-step reasoning. Our extensive study shows that 1) GPT-4V outperforms Gemini Pro in multi-step graph reasoning; 2) All LMMs exhibit inferior perception accuracy for graphical structures, whether in zero/few-shot settings or with supervised fine-tuning (SFT), which further affects problem-solving performance; 3) DPR significantly improves the multi-step graph reasoning capabilities of LMMs and the GPT-4V (DPR) agent achieves SOTA performance.
Comment: 17 pages; Accepted by ICML 2024
Published: 2024

4. Toward Moiré-Free and Detail-Preserving Demosaicking

Author: Li, Xuanchen, Niu, Yan, Zhao, Bo, Shi, Haoyuan, and An, Zitong
Abstract: 3D convolutions are commonly employed by demosaicking neural models, in the same way as in solving other image restoration problems. Counter-intuitively, we show that 3D convolutions implicitly impede the RGB color spectra from exchanging complementary information, resulting in spectral-inconsistent inference of the local spatial high-frequency components. As a consequence, shallow 3D convolution networks suffer from Moiré artifacts, while deep 3D convolutions cause over-smoothness. We analyze the fundamental difference between demosaicking and other problems that predict lost pixels between available ones (e.g., super-resolution reconstruction), and present the underlying reasons for the conflict between Moiré-free and detail-preserving reconstruction. From this new perspective, our work decouples the common standard convolution procedure into spectral and spatial feature aggregations, which allows strengthening global communication in the spectral dimension while respecting local contrast in the spatial dimension. We apply our demosaicking model to two tasks: Joint Demosaicking-Denoising and Independently Demosaicking. In both applications, our model substantially alleviates artifacts such as Moiré and over-smoothness at similar or lower computational cost compared with currently top-performing models, as validated by diverse evaluations. Source code will be released along with paper publication.
Comment: 11 pages, 5 figures, 5 tables
Published: 2023

5. Quantification of abasic sites in DNA and RNA by LC-MS/MS method

Author: Shi, Haoyuan
Abstract: Apurinic/apyrimidinic (AP) sites are common damage lesions of both DNA and RNA, which result from glycosidic bond hydrolysis of normal nucleotides and from damage repair mechanisms acting on damaged nucleotides with abnormal bases. As an important damage intermediate, AP sites provide information on the response of DNA and RNA to physical and chemical factors. Quantification of AP sites will reveal their relation to endogenous or exogenous damage sources. However, strictly quantitative research on AP sites is limited by technical problems, and the existing analytical methods lack sufficient sensitivity and selectivity. Given the potential of liquid chromatography-tandem mass spectrometry (LC-MS/MS) and related detection techniques, we attempted to develop a strict quantification method for AP sites in DNA and RNA using these techniques. The method we have built consists of several essential processes: 1) pretreatment of DNA/RNA samples; 2) enzyme-catalyzed digestion of AP site-containing DNA/RNA; 3) derivatization reaction with pentafluorophenylhydrazine (PFPH); and 4) quantitative studies by the LC-MS/MS method combined with the isotope dilution technique. The LC-MS/MS detection is performed on a triple quadrupole mass spectrometer that shows high performance in both detection sensitivity and selectivity. The detection limit is as low as 6.5 fmol, equivalent to 4 AP sites per 10^9 nucleotides in 5 μg DNA, and at least 10 times lower than that of existing quantification methods. The whole process of the developed method was examined and assessed using AP site-containing oligonucleotides as reference samples. The quantitative method succeeded in monitoring methyl methanesulfonate (MMS)-induced formation of AP sites in cellular DNA. This method also works for RNA, and allows comparison of the different responses of DNA and RNA under depurinating conditions.
Published: 2015
