Author: "Zheng, Jinliang" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Zheng, Jinliang"' showing total 20 results

Start Over Author "Zheng, Jinliang"

20 results on '"Zheng, Jinliang"'

1. Robo-MUTUAL: Robotic Multimodal Task Specification via Unimodal Learning

Author: Li, Jianxiong, Wang, Zhihao, Zheng, Jinliang, Zhou, Xiaoai, Wang, Guanming, Song, Guanglu, Liu, Yu, Liu, Jingjing, Zhang, Ya-Qin, Yu, Junzhi, and Zhan, Xianyuan
Subjects: Computer Science - Robotics, Computer Science - Computer Vision and Pattern Recognition
Abstract: Multimodal task specification is essential for enhanced robotic performance, where \textit{Cross-modality Alignment} enables the robot to holistically understand complex task instructions. Directly annotating multimodal instructions for model training proves impractical, due to the sparsity of paired multimodal data. In this study, we demonstrate that by leveraging unimodal instructions abundant in real data, we can effectively teach robots to learn multimodal task specifications. First, we endow the robot with strong \textit{Cross-modality Alignment} capabilities, by pretraining a robotic multimodal encoder using extensive out-of-domain data. Then, we employ two Collapse and Corrupt operations to further bridge the remaining modality gap in the learned multimodal representation. This approach projects different modalities of identical task goal as interchangeable representations, thus enabling accurate robotic operations within a well-aligned multimodal latent space. Evaluation across more than 130 tasks and 4000 evaluations on both simulated LIBERO benchmark and real robot platforms showcases the superior capabilities of our proposed framework, demonstrating significant advantage in overcoming data constraints in robotic learning. Website: zh1hao.wang/Robo_MUTUAL, Comment: preprint
Published: 2024

2. MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment

Author: Liu, Jihao, Huang, Xin, Zheng, Jinliang, Liu, Boxiao, Wang, Jia, Yoshie, Osamu, Liu, Yu, and Li, Hongsheng
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: This paper introduces MM-Instruct, a large-scale dataset of diverse and high-quality visual instruction data designed to enhance the instruction-following capabilities of large multimodal models (LMMs). While existing visual instruction datasets often focus on question-answering, they struggle to generalize to broader application scenarios such as creative writing, summarization, or image analysis. To address these limitations, we propose a novel approach to constructing MM-Instruct that leverages the strong instruction-following capabilities of existing LLMs to generate novel visual instruction data from large-scale but conventional image captioning datasets. MM-Instruct first leverages ChatGPT to automatically generate diverse instructions from a small set of seed instructions through augmenting and summarization. It then matches these instructions with images and uses an open-sourced large language model (LLM) to generate coherent answers to the instruction-image pairs. The LLM is grounded by the detailed text descriptions of images in the whole answer generation process to guarantee the alignment of the instruction data. Moreover, we introduce a benchmark based on the generated instruction data to evaluate the instruction-following capabilities of existing LMMs. We demonstrate the effectiveness of MM-Instruct by training a LLaVA-1.5 model on the generated data, denoted as LLaVA-Instruct, which exhibits significant improvements in instruction-following capabilities compared to LLaVA-1.5 models. The MM-Instruct dataset, benchmark, and pre-trained models are available at https://github.com/jihaonew/MM-Instruct., Comment: Dataset and models are available at https://github.com/jihaonew/MM-Instruct
Published: 2024

3. Instruction-Guided Visual Masking

Author: Zheng, Jinliang, Li, Jianxiong, Cheng, Sijie, Zheng, Yinan, Li, Jiaming, Liu, Jihao, Liu, Yu, Liu, Jingjing, and Zhan, Xianyuan
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Computer Science - Robotics
Abstract: Instruction following is crucial in contemporary LLM. However, when extended to multimodal setting, it often suffers from misalignment between specific textual instruction and targeted local region of an image. To achieve more accurate and nuanced multimodal instruction following, we introduce Instruction-guided Visual Masking (IVM), a new versatile visual grounding model that is compatible with diverse multimodal models, such as LMM and robot model. By constructing visual masks for instruction-irrelevant regions, IVM-enhanced multimodal models can effectively focus on task-relevant image regions to better align with complex instructions. Specifically, we design a visual masking data generation pipeline and create an IVM-Mix-1M dataset with 1 million image-instruction pairs. We further introduce a new learning technique, Discriminator Weighted Supervised Learning (DWSL) for preferential IVM training that prioritizes high-quality data samples. Experimental results on generic multimodal tasks such as VQA and embodied robotic control demonstrate the versatility of IVM, which as a plug-and-play tool, significantly boosts the performance of diverse multimodal models, yielding new state-of-the-art results across challenging multimodal benchmarks. Code, model and data are available at https://github.com/2toinf/IVM., Comment: NeurIPS 2024
Published: 2024

4. Enhancing Vision-Language Model with Unmasked Token Alignment

Author: Liu, Jihao, Zheng, Jinliang, Liu, Boxiao, Liu, Yu, and Li, Hongsheng
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Contrastive pre-training on image-text pairs, exemplified by CLIP, becomes a standard technique for learning multi-modal visual-language representations. Although CLIP has demonstrated remarkable performance, training it from scratch on noisy web-scale datasets is computationally demanding. On the other hand, mask-then-predict pre-training approaches, like Masked Image Modeling (MIM), offer efficient self-supervised learning for single-modal representations. This paper introduces Unmasked Token Alignment (UTA), a method that leverages existing CLIP models to further enhance its vision-language representations. UTA trains a Vision Transformer (ViT) by aligning unmasked visual tokens to the corresponding image tokens from a frozen CLIP vision encoder, which automatically aligns the ViT model with the CLIP text encoder. The pre-trained ViT can be directly applied for zero-shot evaluation even without training on image-text pairs. Compared to MIM approaches, UTA does not suffer from training-finetuning inconsistency and is much more training-efficient by avoiding using the extra [MASK] tokens. Extensive experimental results demonstrate that UTA can enhance CLIP models and outperform existing MIM methods on various uni- and multi-modal benchmarks. Code and models are available at https://github.com/jihaonew/UTA., Comment: Accepted by TMLR; Code and models are available at https://github.com/jihaonew/UTA
Published: 2024

5. GLID: Pre-training a Generalist Encoder-Decoder Vision Model

Author: Liu, Jihao, Zheng, Jinliang, Liu, Yu, and Li, Hongsheng
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: This paper proposes a GeneraLIst encoder-Decoder (GLID) pre-training method for better handling various downstream computer vision tasks. While self-supervised pre-training approaches, e.g., Masked Autoencoder, have shown success in transfer learning, task-specific sub-architectures are still required to be appended for different downstream tasks, which cannot enjoy the benefits of large-scale pre-training. GLID overcomes this challenge by allowing the pre-trained generalist encoder-decoder to be fine-tuned on various vision tasks with minimal task-specific architecture modifications. In the GLID training scheme, pre-training pretext task and other downstream tasks are modeled as "query-to-answer" problems, including the pre-training pretext task and other downstream tasks. We pre-train a task-agnostic encoder-decoder with query-mask pairs. During fine-tuning, GLID maintains the pre-trained encoder-decoder and queries, only replacing the topmost linear transformation layer with task-specific linear heads. This minimizes the pretrain-finetune architecture inconsistency and enables the pre-trained model to better adapt to downstream tasks. GLID achieves competitive performance on various vision tasks, including object detection, image segmentation, pose estimation, and depth estimation, outperforming or matching specialist models such as Mask2Former, DETR, ViTPose, and BinsFormer., Comment: CVPR 2024
Published: 2024

6. DecisionNCE: Embodied Multimodal Representations via Implicit Preference Learning

Author: Li, Jianxiong, Zheng, Jinliang, Zheng, Yinan, Mao, Liyuan, Hu, Xiao, Cheng, Sijie, Niu, Haoyi, Liu, Jihao, Liu, Yu, Liu, Jingjing, Zhang, Ya-Qin, and Zhan, Xianyuan
Subjects: Computer Science - Robotics, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Multimodal pretraining is an effective strategy for the trinity of goals of representation learning in autonomous robots: 1) extracting both local and global task progressions; 2) enforcing temporal consistency of visual representation; 3) capturing trajectory-level language grounding. Most existing methods approach these via separate objectives, which often reach sub-optimal solutions. In this paper, we propose a universal unified objective that can simultaneously extract meaningful task progression information from image sequences and seamlessly align them with language instructions. We discover that via implicit preferences, where a visual trajectory inherently aligns better with its corresponding language instruction than mismatched pairs, the popular Bradley-Terry model can transform into representation learning through proper reward reparameterizations. The resulted framework, DecisionNCE, mirrors an InfoNCE-style objective but is distinctively tailored for decision-making tasks, providing an embodied representation learning framework that elegantly extracts both local and global task progression features, with temporal consistency enforced through implicit time contrastive learning, while ensuring trajectory-level instruction grounding via multimodal joint encoding. Evaluation on both simulated and real robots demonstrates that DecisionNCE effectively facilitates diverse downstream policy learning tasks, offering a versatile solution for unified representation and reward learning. Project Page: https://2toinf.github.io/DecisionNCE/, Comment: ICML 2024
Published: 2024

7. MixMAE: Mixed and Masked Autoencoder for Efficient Pretraining of Hierarchical Vision Transformers

Author: Liu, Jihao, Huang, Xin, Zheng, Jinliang, Liu, Yu, and Li, Hongsheng
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: In this paper, we propose Mixed and Masked AutoEncoder (MixMAE), a simple but efficient pretraining method that is applicable to various hierarchical Vision Transformers. Existing masked image modeling (MIM) methods for hierarchical Vision Transformers replace a random subset of input tokens with a special [MASK] symbol and aim at reconstructing original image tokens from the corrupted image. However, we find that using the [MASK] symbol greatly slows down the training and causes pretraining-finetuning inconsistency, due to the large masking ratio (e.g., 60% in SimMIM). On the other hand, MAE does not introduce [MASK] tokens at its encoder at all but is not applicable for hierarchical Vision Transformers. To solve the issue and accelerate the pretraining of hierarchical models, we replace the masked tokens of one image with visible tokens of another image, i.e., creating a mixed image. We then conduct dual reconstruction to reconstruct the two original images from the mixed input, which significantly improves efficiency. While MixMAE can be applied to various hierarchical Transformers, this paper explores using Swin Transformer with a large window size and scales up to huge model size (to reach 600M parameters). Empirical results demonstrate that MixMAE can learn high-quality visual representations efficiently. Notably, MixMAE with Swin-B/W14 achieves 85.1% top-1 accuracy on ImageNet-1K by pretraining for 600 epochs. Besides, its transfer performances on the other 6 datasets show that MixMAE has better FLOPs / performance tradeoff than previous popular MIM methods. Code is available at https://github.com/Sense-X/MixMIM., Comment: CVPR2023. Code: https://github.com/Sense-X/MixMIM
Published: 2022

8. MixMAE: Mixed and Masked Autoencoder for Efficient Pretraining of Hierarchical Vision Transformers

Author: Liu, Jihao, primary, Huang, Xin, additional, Zheng, Jinliang, additional, Liu, Yu, additional, and Li, Hongsheng, additional
Published: 2023
Full Text: View/download PDF

9. Subdural evacuating port system with subdural thrombolysis for the treatment of chronic subdural hematoma in patients older than 80 years

Author: Liu, Tianqing, primary, Gao, Zhenwen, additional, Zhou, Jianjun, additional, Lai, Xiaoyan, additional, Chen, Xiaomei, additional, Rao, Qiong, additional, Guo, Dongbin, additional, Zheng, Jinliang, additional, Lin, Fuxin, additional, Lin, Yuanxiang, additional, and Lin, Zhiqin, additional
Published: 2023
Full Text: View/download PDF

10. Transcriptome analysis reveals the potential mechanism of the albino skin development in pufferfish Takifugu obscurus

Author: Jin, Wu, Wen, Haibo, Du, Xingwei, Zheng, Jinliang, and Gu, Ruobo
Published: 2015

11. Revealing the quadrupole radiation of liquid gallium nanospheres

Author: Chen, Jingdong, primary, Li, Xuan, additional, Zheng, Jinliang, additional, Ye, Xingmei, additional, and Lin, Huichuan, additional
Published: 2022
Full Text: View/download PDF

12. Transcriptome analysis reveals the potential mechanism of the albino skin development in pufferfish Takifugu obscurus

Author: Wu Jin, Zheng Jinliang, Haibo Wen, Ruobo Gu, and Du Xingwei
Subjects: Male, genetic structures, Linoleic acid, Biology, Hemoglobin complex, Transcriptome, chemistry.chemical_compound, Gene expression, Animals, KEGG, Gene, Skin, Genetics, Pigmentation, Fugu, Gene Expression Profiling, Gene Expression Regulation, Developmental, Cell Biology, General Medicine, Takifugu, Gene Ontology, Biochemistry, chemistry, Female, Arachidonic acid, Developmental Biology
Abstract: Our aim was to explore the potential mechanism underlying albino in Takifugu obscurus. The transcriptome sequencing of the skin samples from normal T. obscurus and albino ones was conducted in this paper. The differentially expressed genes (DEGs) in albino fish compared with controls were assayed. The gene ontology (GO) and Kyoto encyclopedia of genes and genomes (KEGG) pathway analysis were performed to explore the differentially expressed gene (DEG)-related functions and pathways. A total of 32 genes were found to be differentially expressed, including eight upregulated ones and 24 downregulated ones. Five GO terms were significantly enriched such as hemoglobin complex and oxygen transporter activity. The significantly enriched pathways contained linoleic acid metabolism and arachidonic acid metabolism. Hemoglobin complex, linoleic, and arachidonic acid metabolism may dysregulated in albino fugu. Dietary control of the linoleic acid and arachidonic acid may be an effective management for mediating albino in T. obscurus.
Published: 2015

13. Alcohol Consumption Is a Risk Factor for Lower Extremity Arterial Disease in Chinese Patients with T2DM

Author: Yang, Shanshan, primary, Wang, Shuang, additional, Yang, Bo, additional, Zheng, Jinliang, additional, Cai, Yuping, additional, and Yang, Zhengguo, additional
Published: 2017
Full Text: View/download PDF

14. Weight loss before a diagnosis of type 2 diabetes mellitus is a risk factor for diabetes complications

Author: Yang, Shanshan, primary, Wang, Shuang, additional, Yang, Bo, additional, Zheng, Jinliang, additional, Cai, Yuping, additional, and Yang, Zhengguo, additional
Published: 2016
Full Text: View/download PDF

15. Ventricular-Peritoneal Shunt for an Infant With Post-Meningitic External Hydrocephalus

Author: Que, Shuanglin, primary, Gao, Zhenwen, additional, Zheng, Jinliang, additional, Lu, Jing, additional, Qiu, Ping, additional, Qi, Xiaolong, additional, and Yang, Xi, additional
Published: 2015
Full Text: View/download PDF

16. Weight loss before a diagnosis of type 2 diabetes mellitus is a risk factor for diabetes complications.

Author: Shanshan Yang, Shuang Wang, Bo Yang, Jinliang Zheng, Yuping Cai, Zhengguo Yang, Yang, Shanshan, Wang, Shuang, Yang, Bo, Zheng, Jinliang, Cai, Yuping, and Yang, Zhengguo
Published: 2016
Full Text: View/download PDF

17. Motif Recognition Parallel Algorithm Based on GPU

Author: Zheng, Jinliang, primary, Lu, Jun, additional, Shi, Xinyi, additional, Shi, Yan, additional, and Jing, Ruiqing, additional
Published: 2014
Full Text: View/download PDF

18. The Key Algorithms of Promoter Data Parallel Processing Based on OpenMP

Author: Shi, Yan, primary, Lu, Jun, additional, Shi, Xinyi, additional, Zheng, Jinliang, additional, and Li, Jun, additional
Published: 2014
Full Text: View/download PDF

19. First studies of embryonic and larval development of Coilia nasus (Engraulidae) under controlled conditions

Author: Xu, Gangchun, primary, Tang, Xue, additional, Zhang, Chengxiang, additional, Gu, Ruobo, additional, Zheng, Jinliang, additional, Xu, Pao, additional, and Le, Guowei, additional
Published: 2010
Full Text: View/download PDF

20. Identification of miRNAs and Their Target Genes Involved in Cucumber Fruit Expansion Using Small RNA and Degradome Sequencing.

Author: Sun, Yongdong, Luo, Weirong, Chang, Huaicheng, Li, Zhenxia, Zhou, Junguo, Li, Xinzheng, Zheng, Jinliang, and Hao, Mingxian
Subjects: CUCUMBERS, NON-coding RNA, NUCLEOTIDE sequence, MICRORNA, GENE targeting, GENE regulatory networks, FRUIT
Abstract: Fruit expansion is an essential and very complex biological process. Regulatory roles of microRNAs (miRNAs) and miRNA–mRNA modules in the cucumber fruit expansion are not yet to be investigated. In this work, 1253 known and 1269 novel miRNAs were identified from nine cucumber fruit small RNA (sRNA) libraries through high-throughput sequencing. A total of 105 highly differentially expressed miRNAs were recognized in the fruit on five days post anthesis with pollination (EXP_5d) sRNA library. Further, expression patterns of 11 differentially expressed miRNAs were validated by quantitative real-time PCR (qRT-PCR). The expression patterns were similar to sRNAs sequencing data. Transcripts of 1155 sequences were predicted as target genes of differentially expressed miRNAs by degradome sequencing. Gene Ontology (GO) enrichment showed that these target genes were involved in 24 biological processes, 15 cell components and nine molecular functions. Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis demonstrated that these target genes were significantly enriched in 19 pathways and the enriched KEGG pathways were associated with environmental adaptation, signal transduction and translation. Based on the functional prediction of miRNAs and target genes, our findings suggest that miRNAs have a potential regulatory role in cucumber fruit expansion by targeting their target genes, which provide important data for understanding the miRNA-mediated regulatory networks controlling fruit expansion in cucumber. Specific miRNAs could be selected for further functional research and molecular breeding in cucumber. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources

Refine your results

20 results on '"Zheng, Jinliang"'

1. Robo-MUTUAL: Robotic Multimodal Task Specification via Unimodal Learning

2. MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment

3. Instruction-Guided Visual Masking

4. Enhancing Vision-Language Model with Unmasked Token Alignment

5. GLID: Pre-training a Generalist Encoder-Decoder Vision Model

6. DecisionNCE: Embodied Multimodal Representations via Implicit Preference Learning

7. MixMAE: Mixed and Masked Autoencoder for Efficient Pretraining of Hierarchical Vision Transformers

8. MixMAE: Mixed and Masked Autoencoder for Efficient Pretraining of Hierarchical Vision Transformers

9. Subdural evacuating port system with subdural thrombolysis for the treatment of chronic subdural hematoma in patients older than 80 years

10. Transcriptome analysis reveals the potential mechanism of the albino skin development in pufferfish Takifugu obscurus

11. Revealing the quadrupole radiation of liquid gallium nanospheres

12. Transcriptome analysis reveals the potential mechanism of the albino skin development in pufferfish Takifugu obscurus

13. Alcohol Consumption Is a Risk Factor for Lower Extremity Arterial Disease in Chinese Patients with T2DM

14. Weight loss before a diagnosis of type 2 diabetes mellitus is a risk factor for diabetes complications

15. Ventricular-Peritoneal Shunt for an Infant With Post-Meningitic External Hydrocephalus

16. Weight loss before a diagnosis of type 2 diabetes mellitus is a risk factor for diabetes complications.

17. Motif Recognition Parallel Algorithm Based on GPU

18. The Key Algorithms of Promoter Data Parallel Processing Based on OpenMP

19. First studies of embryonic and larval development of Coilia nasus (Engraulidae) under controlled conditions

20. Identification of miRNAs and Their Target Genes Involved in Cucumber Fruit Expansion Using Small RNA and Degradome Sequencing.

Catalog

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Journal

Database

Publisher

20 results on '"Zheng, Jinliang"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources