Author: "Ma, Xinzhu" / Publication Year Range: Last 10 years - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Ma, Xinzhu"' showing total 34 results

1. PRANCE: Joint Token-Optimization and Structural Channel-Pruning for Adaptive ViT Inference

Author: Li, Ye, Tang, Chen, Meng, Yuan, Fan, Jiajun, Chai, Zenghao, Ma, Xinzhu, Wang, Zhi, and Zhu, Wenwu
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We introduce PRANCE, a Vision Transformer compression framework that jointly optimizes the activated channels and reduces tokens, based on the characteristics of inputs. Specifically, PRANCE leverages adaptive token optimization strategies for a certain computational budget, aiming to accelerate ViTs' inference from a unified data and architectural perspective. However, the joint framework poses challenges to both architectural and decision-making aspects. First, while ViTs inherently support variable-token inference, they do not facilitate dynamic computations for variable channels. To overcome this limitation, we propose a meta-network using weight-sharing techniques to support arbitrary channels of the Multi-head Self-Attention and Multi-layer Perceptron layers, serving as a foundational model for architectural decision-making. Second, simultaneously optimizing the structure of the meta-network and input data constitutes a combinatorial optimization problem with an extremely large decision space, reaching up to around $10^{14}$, making supervised learning infeasible. To this end, we design a lightweight selector employing Proximal Policy Optimization for efficient decision-making. Furthermore, we introduce a novel "Result-to-Go" training mechanism that models ViTs' inference process as a Markov decision process, significantly reducing action space and mitigating delayed-reward issues during training. Extensive experiments demonstrate the effectiveness of PRANCE in reducing FLOPs by approximately 50%, retaining only about 10% of tokens while achieving lossless Top-1 accuracy. Additionally, our framework is shown to be compatible with various token optimization techniques such as pruning, merging, and sequential pruning-merging strategies. The code is available at https://github.com/ChildTang/PRANCE.
Published: 2024

2. Q-DiT: Accurate Post-Training Quantization for Diffusion Transformers

Author: Chen, Lei, Meng, Yuan, Tang, Chen, Ma, Xinzhu, Jiang, Jingyan, Wang, Xin, Wang, Zhi, and Zhu, Wenwu
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Recent advancements in diffusion models, particularly the trend of architectural transformation from UNet-based Diffusion to Diffusion Transformer (DiT), have significantly improved the quality and scalability of image synthesis. Despite the incredible generative quality, the large computational requirements of these large-scale models significantly hinder the deployments in real-world scenarios. Post-training Quantization (PTQ) offers a promising solution by compressing model sizes and speeding up inference for the pretrained models while eliminating model retraining. However, we have observed the existing PTQ frameworks exclusively designed for both ViT and conventional Diffusion models fall into biased quantization and result in remarkable performance degradation. In this paper, we find that the DiTs typically exhibit considerable variance in terms of both weight and activation, which easily runs out of the limited numerical representations. To address this issue, we devise Q-DiT, which seamlessly integrates three techniques: fine-grained quantization to manage substantial variance across input channels of weights and activations, an automatic search strategy to optimize the quantization granularity and mitigate redundancies, and dynamic activation quantization to capture the activation changes across timesteps. Extensive experiments on the ImageNet dataset demonstrate the effectiveness of the proposed Q-DiT. Specifically, when quantizing DiT-XL/2 to W8A8 on ImageNet 256x256, Q-DiT achieves a remarkable reduction in FID by 1.26 compared to the baseline. Under a W4A8 setting, it maintains high fidelity in image generation, showcasing only a marginal increase in FID and setting a new benchmark for efficient, high-quality quantization in diffusion transformers. Code is available at https://github.com/Juanerx/Q-DiT.
Published: 2024
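
The fine-grained quantization idea mentioned in the abstract above can be illustrated with a minimal, generic sketch. This is not the Q-DiT implementation: the function names, the asymmetric min/max quantizer, and the toy weight row are all assumptions chosen for illustration. It only shows why giving each small group of channels its own scale helps when magnitudes vary widely across channels.

```python
def quantize_group(values, bits=8):
    """Asymmetric uniform quantization of one group (hypothetical helper).

    Returns the dequantized values so the rounding error can be inspected.
    """
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    # Guard against a constant group, where hi == lo.
    scale = (hi - lo) / levels if hi > lo else 1.0
    return [round((v - lo) / scale) * scale + lo for v in values]


def quantize_row(row, group_size, bits=8):
    """Quantize a row in consecutive groups, each with its own scale."""
    out = []
    for start in range(0, len(row), group_size):
        out.extend(quantize_group(row[start:start + group_size], bits))
    return out


def max_abs_error(a, b):
    """Largest absolute difference between two equal-length sequences."""
    return max(abs(x - y) for x, y in zip(a, b))


# Toy row: four small-magnitude channels followed by four large ones.
row = [0.01, -0.02, 0.03, 0.00, 5.0, -4.0, 3.5, 2.0]
coarse = quantize_row(row, group_size=len(row), bits=4)  # one scale for all
fine = quantize_row(row, group_size=4, bits=4)           # one scale per group

# The small channels are reproduced far more accurately with per-group scales.
small_err_coarse = max_abs_error(row[:4], coarse[:4])
small_err_fine = max_abs_error(row[:4], fine[:4])
```

With a single scale, the large channels dictate the step size and the small channels collapse onto one level; with per-group scales, the small channels get a step size matched to their own range.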

3. Evaluating the Generalization Ability of Quantized LLMs: Benchmark, Analysis, and Toolbox

Author: Liu, Yijun, Meng, Yuan, Wu, Fang, Peng, Shenhao, Yao, Hang, Guan, Chaoyu, Tang, Chen, Ma, Xinzhu, Wang, Zhi, and Zhu, Wenwu
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: Large language models (LLMs) have exhibited exciting progress in multiple scenarios, while the huge computational demands hinder their deployments in lots of real-world applications. As an effective means to reduce memory footprint and inference cost, quantization also faces challenges in performance degradation at low bit-widths. Understanding the impact of quantization on LLM capabilities, especially the generalization ability, is crucial. However, the community's main focus remains on the algorithms and models of quantization, with insufficient attention given to whether the quantized models can retain the strong generalization abilities of LLMs. In this work, we fill this gap by providing a comprehensive benchmark suite for this research topic, including an evaluation system, detailed analyses, and a general toolbox. Specifically, based on the dominant pipeline in LLM quantization, we primarily explore the impact of calibration data distribution on the generalization of quantized LLMs and conduct the benchmark using more than 40 datasets within two main scenarios. Based on this benchmark, we conduct extensive experiments with two well-known LLMs (English and Chinese) and four quantization algorithms to investigate this topic in-depth, yielding several counter-intuitive and valuable findings, e.g., models quantized using a calibration set with the same distribution as the test data are not necessarily optimal. Besides, to facilitate future research, we also release a modular-designed toolbox, which decouples the overall pipeline into several separate components, e.g., base LLM module, dataset module, quantizer module, etc. and allows subsequent researchers to easily assemble their methods through a simple configuration. Our benchmark suite is publicly available at https://github.com/TsingmaoAI/MI-optimize
Published: 2024

4. BEACON: Benchmark for Comprehensive RNA Tasks and Language Models

Author: Ren, Yuchen, Chen, Zhiyuan, Qiao, Lifeng, Jing, Hongtai, Cai, Yuchen, Xu, Sheng, Ye, Peng, Ma, Xinzhu, Sun, Siqi, Yan, Hongliang, Yuan, Dong, Ouyang, Wanli, and Liu, Xihui
Subjects: Quantitative Biology - Quantitative Methods, Computer Science - Machine Learning
Abstract: RNA plays a pivotal role in translating genetic instructions into functional outcomes, underscoring its importance in biological processes and disease mechanisms. Despite the emergence of numerous deep learning approaches for RNA, particularly universal RNA language models, there remains a significant lack of standardized benchmarks to assess the effectiveness of these methods. In this study, we introduce the first comprehensive RNA benchmark BEACON (BEnchmArk for COmprehensive RNA Task and Language Models). First, BEACON comprises 13 distinct tasks derived from extensive previous work covering structural analysis, functional studies, and engineering applications, enabling a comprehensive assessment of the performance of methods on various RNA understanding tasks. Second, we examine a range of models, including traditional approaches like CNNs, as well as advanced RNA foundation models based on language models, offering valuable insights into the task-specific performances of these models. Third, we investigate the vital RNA language model components from the tokenizer and positional encoding aspects. Notably, our findings emphasize the superiority of single nucleotide tokenization and the effectiveness of Attention with Linear Biases (ALiBi) over traditional positional encoding methods. Based on these insights, a simple yet strong baseline called BEACON-B is proposed, which can achieve outstanding performance with limited data and computational resources. The datasets and source code of our benchmark are available at https://github.com/terry-r123/RNABenchmark.
Published: 2024

5. TMPQ-DM: Joint Timestep Reduction and Quantization Precision Selection for Efficient Diffusion Models

Author: Sun, Haojun, Tang, Chen, Wang, Zhi, Meng, Yuan, Jiang, Jingyan, Ma, Xinzhu, and Zhu, Wenwu
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Diffusion models have emerged as preeminent contenders in the realm of generative models. Distinguished by their distinctive sequential generative processes, characterized by hundreds or even thousands of timesteps, diffusion models progressively reconstruct images from pure Gaussian noise, with each timestep necessitating full inference of the entire model. However, the substantial computational demands inherent to these models present challenges for deployment; quantization is thus widely used to lower the bit-width for reducing the storage and computing overheads. Current quantization methodologies primarily focus on model-side optimization, disregarding the temporal dimension, such as the length of the timestep sequence, thereby allowing redundant timesteps to continue consuming computational resources, leaving substantial scope for accelerating the generative process. In this paper, we introduce TMPQ-DM, which jointly optimizes timestep reduction and quantization to achieve a superior performance-efficiency trade-off, addressing both temporal and model optimization aspects. For timestep reduction, we devise a non-uniform grouping scheme tailored to the non-uniform nature of the denoising process, thereby mitigating the explosive combinations of timesteps. In terms of quantization, we adopt a fine-grained layer-wise approach to allocate varying bit-widths to different layers based on their respective contributions to the final generative performance, thus rectifying performance degradation observed in prior studies. To expedite the evaluation of fine-grained quantization, we further devise a super-network to serve as a precision solver by leveraging shared quantization results. These two design components are seamlessly integrated within our framework, enabling rapid joint exploration of the exponentially large decision space via a gradient-free evolutionary search algorithm.
Published: 2024

6. Retraining-free Model Quantization via One-Shot Weight-Coupling Learning

Author: Tang, Chen, Meng, Yuan, Jiang, Jiacheng, Xie, Shuzhao, Lu, Rongwei, Ma, Xinzhu, Wang, Zhi, and Zhu, Wenwu
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Quantization is of significance for compressing the over-parameterized deep neural models and deploying them on resource-limited devices. Fixed-precision quantization suffers from performance drop due to the limited numerical representation ability. Conversely, mixed-precision quantization (MPQ) is advocated to compress the model effectively by allocating heterogeneous bit-width for layers. MPQ is typically organized into a searching-retraining two-stage process. In this paper, we devise a one-shot training-searching paradigm for mixed-precision model compression. Specifically, in the first stage, all potential bit-width configurations are coupled and thus optimized simultaneously within a set of shared weights. However, our observations reveal a previously unseen and severe bit-width interference phenomenon among highly coupled weights during optimization, leading to considerable performance degradation under a high compression ratio. To tackle this problem, we first design a bit-width scheduler to dynamically freeze the most turbulent bit-width of layers during training, to ensure the remaining bit-widths converge properly. Then, taking inspiration from information theory, we present an information distortion mitigation technique to align the behavior of the bad-performing bit-widths to the well-performing ones. In the second stage, an inference-only greedy search scheme is devised to evaluate the goodness of configurations without introducing any additional training costs. Extensive experiments on three representative models and three datasets demonstrate the effectiveness of the proposed method. Code is available at https://github.com/1hunters/retraining-free-quantization.
Published: 2024

7. GUPNet++: Geometry Uncertainty Propagation Network for Monocular 3D Object Detection

Author: Lu, Yan, Ma, Xinzhu, Yang, Lei, Zhang, Tianzhu, Liu, Yating, Chu, Qi, He, Tong, Li, Yonghui, and Ouyang, Wanli
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Geometry plays a significant role in monocular 3D object detection. It can be used to estimate object depth by using the perspective projection between object's physical size and 2D projection in the image plane, which can introduce mathematical priors into deep models. However, this projection process also introduces error amplification, where the error of the estimated height is amplified and reflected into the projected depth. It leads to unreliable depth inferences and also impairs training stability. To tackle this problem, we propose a novel Geometry Uncertainty Propagation Network (GUPNet++) by modeling geometry projection in a probabilistic manner. This ensures depth predictions are well-bounded and associated with a reasonable uncertainty. The significance of introducing such geometric uncertainty is two-fold: (1) It models the uncertainty propagation relationship of the geometry projection during training, improving the stability and efficiency of the end-to-end model learning. (2) It can be derived to a highly reliable confidence to indicate the quality of the 3D detection result, enabling more reliable detection inference. Experiments show that the proposed approach not only obtains state-of-the-art (SOTA) performance in image-based monocular 3D detection but also demonstrates superiority in efficacy with a simplified framework. Comment: 18 pages, 9 figures
Published: 2023

8. Rethinking the BERT-like Pretraining for DNA Sequences

Author: Liang, Chaoqi, Bai, Weiqiang, Qiao, Lifeng, Ren, Yuchen, Sun, Jianle, Ye, Peng, Yan, Hongliang, Ma, Xinzhu, Zuo, Wangmeng, and Ouyang, Wanli
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: With the success of large-scale pretraining in NLP, there is an increasing trend of applying it to the domain of life sciences. In particular, pretraining methods based on DNA sequences have garnered growing attention due to their potential to capture generic information about genes. However, existing pretraining methods for DNA sequences largely rely on direct adoptions of BERT pretraining from NLP, lacking a comprehensive understanding and a specifically tailored approach. To address this research gap, we first conducted a series of exploratory experiments and gained several insightful observations: 1) In the fine-tuning phase of downstream tasks, when using K-mer overlapping tokenization instead of K-mer non-overlapping tokenization, both overlapping and non-overlapping pretraining weights show consistent performance improvement. 2) During the pre-training process, using K-mer overlapping tokenization quickly produces clear K-mer embeddings and reduces the loss to a very low level, while using K-mer non-overlapping tokenization results in less distinct embeddings and continuously decreases the loss. 3) Using overlapping tokenization causes the self-attention in the intermediate layers of pre-trained models to tend to overly focus on certain tokens, reflecting that these layers are not adequately optimized. In summary, overlapping tokenization can benefit the fine-tuning of downstream tasks but leads to inadequate pretraining with fast convergence. To unleash the pretraining potential, we introduce a novel approach called RandomMask, which gradually increases the task difficulty of BERT-like pretraining by continuously expanding its mask boundary, forcing the model to learn more knowledge. RandomMask is simple but effective, achieving top-tier performance on 26 of 28 datasets spanning 7 downstream tasks.
Published: 2023
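
The two tokenization schemes contrasted in the abstract above differ only in stride. The following is a minimal sketch under that reading; the function name `kmer_tokenize` and the example sequence are hypothetical and are not taken from the paper's code.

```python
def kmer_tokenize(sequence, k, overlapping=True):
    """Split a DNA string into K-mers (hypothetical helper).

    Overlapping tokenization slides the window by 1; non-overlapping
    tokenization slides it by k and drops any incomplete trailing K-mer.
    """
    stride = 1 if overlapping else k
    return [sequence[i:i + k] for i in range(0, len(sequence) - k + 1, stride)]


seq = "ATGCGTAC"
overlap = kmer_tokenize(seq, 3, overlapping=True)
# ['ATG', 'TGC', 'GCG', 'CGT', 'GTA', 'TAC']
nonoverlap = kmer_tokenize(seq, 3, overlapping=False)
# ['ATG', 'CGT']  (the trailing "AC" is dropped)
```

In the overlapping case, adjacent tokens share k-1 nucleotides, so a masked token is largely recoverable from its neighbours. That may be one reason such pretraining converges quickly, although this reading is an interpretation rather than a statement from the abstract.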

9. Towards Fair and Comprehensive Comparisons for Image-Based 3D Object Detection

Author: Ma, Xinzhu, Wang, Yongtao, Zhang, Yinmin, Xia, Zhiyi, Meng, Yuan, Wang, Zhihui, Li, Haojie, and Ouyang, Wanli
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: In this work, we build a modular-designed codebase, formulate strong training recipes, design an error diagnosis toolbox, and discuss current methods for image-based 3D object detection. In particular, different from other highly mature tasks, e.g., 2D object detection, the community of image-based 3D object detection is still evolving, where methods often adopt different training recipes and tricks resulting in unfair evaluations and comparisons. What is worse, these tricks may overwhelm their proposed designs in performance, even leading to wrong conclusions. To address this issue, we build a modular-designed codebase and formulate unified training standards for the community. Furthermore, we also design an error diagnosis toolbox to measure the detailed characterization of detection models. Using these tools, we analyze current methods in-depth under varying settings and provide discussions for some open questions, e.g., discrepancies in conclusions on KITTI-3D and nuScenes datasets, which have led to different dominant methods for these datasets. We hope that this work will facilitate future research in image-based 3D object detection. Our code will be released at https://github.com/OpenGVLab/3dodi. Comment: ICCV23, code will be released soon
Published: 2023

10. An Empirical Study of Pseudo-Labeling for Image-based 3D Object Detection

Author: Ma, Xinzhu, Meng, Yuan, Zhang, Yinmin, Bai, Lei, Hou, Jun, Yi, Shuai, and Ouyang, Wanli
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Image-based 3D detection is an indispensable component of the perception system for autonomous driving. However, it still suffers from unsatisfying performance, one of the main reasons for which is the limited training data. Unfortunately, annotating the objects in the 3D space is extremely time/resource-consuming, which makes it hard to extend the training set arbitrarily. In this work, we focus on the semi-supervised manner and explore the feasibility of a cheaper alternative, i.e. pseudo-labeling, to leverage the unlabeled data. For this purpose, we conduct extensive experiments to investigate whether the pseudo-labels can provide effective supervision for the baseline models under varying settings. The experimental results not only demonstrate the effectiveness of the pseudo-labeling mechanism for image-based 3D detection (e.g. under monocular setting, we achieve 20.23 AP for moderate level on the KITTI-3D testing set without bells and whistles, improving the baseline model by 6.03 AP), but also show several interesting and noteworthy findings (e.g. the models trained with pseudo-labels perform better than those trained with ground-truth annotations based on the same training data). We hope this work can provide insights for the image-based 3D detection community under a semi-supervised setting. The codes, pseudo-labels, and pre-trained models will be publicly available. Comment: tech report
Published: 2022

11. Better Teacher Better Student: Dynamic Prior Knowledge for Knowledge Distillation

Author: Qiu, Zengyu, Ma, Xinzhu, Yang, Kunlin, Liu, Chunya, Hou, Jun, Yi, Shuai, and Ouyang, Wanli
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Knowledge distillation (KD) has shown very promising capabilities in transferring learning representations from large models (teachers) to small models (students). However, as the capacity gap between students and teachers becomes larger, existing KD methods fail to achieve better results. Our work shows that the 'prior knowledge' is vital to KD, especially when applying large teachers. Particularly, we propose the dynamic prior knowledge (DPK), which integrates part of teacher's features as the prior knowledge before the feature distillation. This means that our method also takes the teacher's feature as 'input', not just 'target'. Besides, we dynamically adjust the ratio of the prior knowledge during the training phase according to the feature gap, thus guiding the student in an appropriate difficulty. To evaluate the proposed method, we conduct extensive experiments on two image classification benchmarks (i.e. CIFAR100 and ImageNet) and an object detection benchmark (i.e. MS COCO). The results demonstrate the superiority of our method in performance under varying settings. Besides, our DPK makes the performance of the student model positively correlated with that of the teacher model, which means that we can further boost the accuracy of students by applying larger teachers. More importantly, DPK provides a fast solution in teacher model selection for any given model. Comment: ICLR'23 accepted
Published: 2022

12. 3D Object Detection from Images for Autonomous Driving: A Survey

Author: Ma, Xinzhu, Ouyang, Wanli, Simonelli, Andrea, and Ricci, Elisa
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: 3D object detection from images, one of the fundamental and challenging problems in autonomous driving, has received increasing attention from both industry and academia in recent years. Benefiting from the rapid development of deep learning technologies, image-based 3D detection has achieved remarkable progress. Particularly, more than 200 works have studied this problem from 2015 to 2021, encompassing a broad spectrum of theories, algorithms, and applications. However, to date no recent survey exists to collect and organize this knowledge. In this paper, we fill this gap in the literature and provide the first comprehensive survey of this novel and continuously growing research field, summarizing the most commonly used pipelines for image-based 3D detection and deeply analyzing each of their components. Additionally, we also propose two new taxonomies to organize the state-of-the-art methods into different categories, with the intent of providing a more systematic review of existing methods and facilitating fair comparisons with future works. In retrospect of what has been achieved so far, we also analyze the current challenges in the field and discuss future directions for image-based 3D detection research. Comment: Accepted by T-PAMI
Published: 2022

13. MonoDistill: Learning Spatial Features for Monocular 3D Object Detection

Author: Chong, Zhiyu, Ma, Xinzhu, Zhang, Hong, Yue, Yuxin, Li, Haojie, Wang, Zhihui, and Ouyang, Wanli
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: 3D object detection is a fundamental and challenging task for 3D scene understanding, and the monocular-based methods can serve as an economical alternative to the stereo-based or LiDAR-based methods. However, accurately detecting objects in the 3D space from a single image is extremely difficult due to the lack of spatial cues. To mitigate this issue, we propose a simple and effective scheme to introduce the spatial information from LiDAR signals to the monocular 3D detectors, without introducing any extra cost in the inference phase. In particular, we first project the LiDAR signals into the image plane and align them with the RGB images. After that, we use the resulting data to train a 3D detector (LiDAR Net) with the same architecture as the baseline model. Finally, this LiDAR Net can serve as the teacher to transfer the learned knowledge to the baseline model. Experimental results show that the proposed method can significantly boost the performance of the baseline model and ranks 1st place among all monocular-based methods on the KITTI benchmark. Besides, extensive ablation studies are conducted, which further prove the effectiveness of each part of our designs and illustrate what the baseline model has learned from the LiDAR Net. Our code will be released at https://github.com/monster-ghost/MonoDistill. Comment: Accepted by ICLR 2022
Published: 2022

14. Learning Geometry-Guided Depth via Projective Modeling for Monocular 3D Object Detection

Author: Zhang, Yinmin, Ma, Xinzhu, Yi, Shuai, Hou, Jun, Wang, Zhihui, Ouyang, Wanli, and Xu, Dan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: As a crucial task of autonomous driving, 3D object detection has made great progress in recent years. However, monocular 3D object detection remains a challenging problem due to the unsatisfactory performance in depth estimation. Most existing monocular methods typically directly regress the scene depth while ignoring important relationships between the depth and various geometric elements (e.g. bounding box sizes, 3D object dimensions, and object poses). In this paper, we propose to learn geometry-guided depth estimation with projective modeling to advance monocular 3D object detection. Specifically, a principled geometry formula with projective modeling of 2D and 3D depth predictions in the monocular 3D object detection network is devised. We further implement and embed the proposed formula to enable geometry-aware deep representation learning, allowing effective 2D and 3D interactions for boosting the depth estimation. Moreover, we provide a strong baseline through addressing substantial misalignment between 2D annotation and projected boxes to ensure robust learning with the proposed geometric formula. Experiments on the KITTI dataset show that our method remarkably improves the detection performance of the state-of-the-art monocular-based method without extra data by 2.80% on the moderate test setting. The model and code will be released at https://github.com/YinminZhang/MonoGeo. Comment: 16 pages, 11 figures
Published: 2021

15. Geometry Uncertainty Projection Network for Monocular 3D Object Detection

Author: Lu, Yan, Ma, Xinzhu, Yang, Lei, Zhang, Tianzhu, Liu, Yating, Chu, Qi, Yan, Junjie, and Ouyang, Wanli
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Geometry Projection is a powerful depth estimation method in monocular 3D object detection. It estimates depth dependent on heights, which introduces mathematical priors into the deep model. But the projection process also introduces the error amplification problem, in which the error of the estimated height will be amplified and reflected greatly at the output depth. This property leads to uncontrollable depth inferences and also damages the training efficiency. In this paper, we propose a Geometry Uncertainty Projection Network (GUP Net) to tackle the error amplification problem at both inference and training stages. Specifically, a GUP module is proposed to obtain the geometry-guided uncertainty of the inferred depth, which not only provides highly reliable confidence for each depth but also benefits depth learning. Furthermore, at the training stage, we propose a Hierarchical Task Learning strategy to reduce the instability caused by error amplification. This learning algorithm monitors the learning situation of each task by a proposed indicator and adaptively assigns the proper loss weights for different tasks according to their pre-tasks situation. Based on that, each task starts learning only when its pre-tasks are learned well, which can significantly improve the stability and efficiency of the training process. Extensive experiments demonstrate the effectiveness of the proposed method. The overall model can infer more reliable object depth than existing methods and outperforms the state-of-the-art image-based monocular 3D detectors by 3.74% and 4.7% AP40 of the car and pedestrian categories on the KITTI benchmark. Comment: To appear at ICCV2021
Published: 2021
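
The error amplification described in the abstract above follows from the standard pinhole projection relation, depth = focal length x 3D height / 2D height. The sketch below uses that textbook relation with a first-order propagation of the height error; the function names and all numeric values are assumptions for illustration, and it is not the GUP module from the paper.

```python
def projected_depth(focal_px, height_3d_m, height_2d_px):
    """Pinhole-camera depth from an object's physical and image heights."""
    return focal_px * height_3d_m / height_2d_px


def depth_std_from_height_std(focal_px, height_std_m, height_2d_px):
    """First-order propagation of a 3D-height error into depth.

    Depth is linear in the 3D height, so the error scales by f / h_2d.
    """
    return focal_px * height_std_m / height_2d_px


FOCAL_PX = 700.0     # assumed focal length in pixels
HEIGHT_M = 1.5       # assumed physical object height in metres
HEIGHT_STD_M = 0.1   # assumed error of the estimated 3D height

# A nearby object appears tall in the image, a distant one appears short.
near_depth = projected_depth(FOCAL_PX, HEIGHT_M, 150.0)
far_depth = projected_depth(FOCAL_PX, HEIGHT_M, 30.0)
near_std = depth_std_from_height_std(FOCAL_PX, HEIGHT_STD_M, 150.0)
far_std = depth_std_from_height_std(FOCAL_PX, HEIGHT_STD_M, 30.0)
```

The same 0.1 m height error maps to a depth error five times larger for the object that is five times smaller in the image, which is the amplification effect the abstract refers to.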

16. Delving into Localization Errors for Monocular 3D Object Detection

Author: Ma, Xinzhu, Zhang, Yinmin, Xu, Dan, Zhou, Dongzhan, Yi, Shuai, Li, Haojie, and Ouyang, Wanli
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Estimating 3D bounding boxes from monocular images is an essential component in autonomous driving, while accurate 3D object detection from this kind of data is very challenging. In this work, by intensive diagnosis experiments, we quantify the impact introduced by each sub-task and find that the 'localization error' is the vital factor in restricting monocular 3D detection. Besides, we also investigate the underlying reasons behind localization errors, analyze the issues they might bring, and propose three strategies. First, we revisit the misalignment between the center of the 2D bounding box and the projected center of the 3D object, which is a vital factor leading to low localization accuracy. Second, we observe that accurately localizing distant objects with existing technologies is almost impossible, while those samples will mislead the learned network. To this end, we propose to remove such samples from the training set for improving the overall performance of the detector. Lastly, we also propose a novel 3D IoU oriented loss for the size estimation of the object, which is not affected by 'localization error'. We conduct extensive experiments on the KITTI dataset, where the proposed method achieves real-time detection and outperforms previous methods by a large margin. The code will be made available at: https://github.com/xinzhuma/monodle. Comment: CVPR'2021, code will be made available
Published: 2021

17. Rethinking Pseudo-LiDAR Representation

Author: Ma, Xinzhu, Liu, Shinan, Xia, Zhiyi, Zhang, Hongwen, Zeng, Xingyu, and Ouyang, Wanli
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: The recently proposed pseudo-LiDAR based 3D detectors greatly improve the benchmark of monocular/stereo 3D detection task. However, the underlying mechanism remains obscure to the research community. In this paper, we perform an in-depth investigation and observe that the efficacy of pseudo-LiDAR representation comes from the coordinate transformation, instead of data representation itself. Based on this observation, we design an image based CNN detector named PatchNet, which is more generalized and can be instantiated as pseudo-LiDAR based 3D detectors. Moreover, the pseudo-LiDAR data in our PatchNet is organized as the image representation, which means existing 2D CNN designs can be easily utilized for extracting deep features from input data and boosting 3D detection performance. We conduct extensive experiments on the challenging KITTI dataset, where the proposed PatchNet outperforms all existing pseudo-LiDAR based counterparts. Code has been made available at: https://github.com/xinzhuma/patchnet. Comment: ECCV2020. Supplemental Material attached
Published: 2020

18. Accurate Monocular Object Detection via Color-Embedded 3D Reconstruction for Autonomous Driving

Author: Ma, Xinzhu, Wang, Zhihui, Li, Haojie, Zhang, Pengbo, Fan, Xin, and Ouyang, Wanli
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: In this paper, we propose a monocular 3D object detection framework in the domain of autonomous driving. Unlike previous image-based methods, which focus on RGB features extracted from 2D images, our method solves this problem in the reconstructed 3D space in order to exploit 3D contexts explicitly. To this end, we first leverage a stand-alone module to transform the input data from the 2D image plane to 3D point cloud space for a better input representation, then we perform the 3D detection using a PointNet backbone network to obtain the objects' 3D locations, dimensions and orientations. To enhance the discriminative capability of point clouds, we propose a multi-modal feature fusion module to embed the complementary RGB cue into the generated point cloud representation. We argue that it is more effective to infer the 3D bounding boxes from the generated 3D scene space (i.e., X, Y, Z space) than from the image plane (i.e., R, G, B image plane). Evaluation on the challenging KITTI dataset shows that our approach boosts the performance of state-of-the-art monocular approaches by a large margin. Comment: To appear in ICCV'19
Published: 2019

19. User-Guided Deep Anime Line Art Colorization with Conditional Adversarial Networks

Author: Ci, Yuanzheng, Ma, Xinzhu, Wang, Zhihui, Li, Haojie, and Luo, Zhongxuan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Scribble-color-based line art colorization is a challenging computer vision problem, since neither greyscale values nor semantic information is present in line art, and the lack of authentic illustration-line art training pairs also increases the difficulty of model generalization. Recently, several methods based on Generative Adversarial Nets (GANs) have achieved great success. They can generate colorized illustrations conditioned on given line art and color hints. However, these methods fail to capture the authentic illustration distributions and are hence perceptually unsatisfying in the sense that they often lack accurate shading. To address these challenges, we propose a novel deep conditional adversarial architecture for scribble-based anime line art colorization. Specifically, we integrate the conditional framework with WGAN-GP criteria as well as the perceptual loss to enable us to robustly train a deep network that makes the synthesized images more natural and real. We also introduce a local features network that is independent of synthetic data. With GANs conditioned on features from such a network, we notably increase the generalization capability over "in the wild" line art. Furthermore, we collect two datasets that provide high-quality colorful illustrations and authentic line art for training and benchmarking. With the proposed model trained on our illustration dataset, we demonstrate that images synthesized by the presented approach are considerably more realistic and precise than those of alternative approaches. Comment: Accepted for publication at the 2018 ACM Multimedia Conference (MM '18)
Published: 2018
Full Text: View/download PDF

20. Push-and-Pull: A General Training Framework with Differential Augmentor for Domain Generalized Point Cloud Classification

Author: Xu, Jiahao, primary, Ma, Xinzhu, additional, Zhang, Lin, additional, Zhang, Bo, additional, and Chen, Tao, additional
Published: 2024
Full Text: View/download PDF

21. Disparity-Based Robust Unstructured Terrain Segmentation

Author: Zhang, Pengbo, Ma, Xinzhu, Wang, Zhihui, Li, Haojie, Luo, Zhongxuan, Hutchison, David, Series Editor, Kanade, Takeo, Series Editor, Kittler, Josef, Series Editor, Kleinberg, Jon M., Series Editor, Mattern, Friedemann, Series Editor, Mitchell, John C., Series Editor, Naor, Moni, Series Editor, Pandu Rangan, C., Series Editor, Steffen, Bernhard, Series Editor, Terzopoulos, Demetri, Series Editor, Tygar, Doug, Series Editor, Weikum, Gerhard, Series Editor, Lai, Jian-Huang, editor, Liu, Cheng-Lin, editor, Chen, Xilin, editor, Zhou, Jie, editor, Tan, Tieniu, editor, Zheng, Nanning, editor, and Zha, Hongbin, editor
Published: 2018
Full Text: View/download PDF

22. Rethinking Pseudo-LiDAR Representation

Author: Ma, Xinzhu, primary, Liu, Shinan, additional, Xia, Zhiyi, additional, Zhang, Hongwen, additional, Zeng, Xingyu, additional, and Ouyang, Wanli, additional
Published: 2020
Full Text: View/download PDF

23. Learning Pixel-Wise Continuous Depth Representation via Clustering for Depth Completion

Author: Chen, Shenglun, Zhang, Hong, Ma, Xinzhu, Wang, Zhihui, and Li, Haojie
Abstract: Depth completion is a long-standing challenge in computer vision, where classification-based methods have made tremendous progress in recent years. However, most existing classification-based methods rely on pre-defined pixel-shared and discrete depth values as depth categories. This representation fails to capture the continuous depth values that conform to the real depth distribution, leading to depth smearing in boundary regions. To address this issue, we revisit depth completion from the clustering perspective and propose a novel clustering-based framework called CluDe which focuses on learning the pixel-wise and continuous depth representation. The key idea of CluDe is to iteratively update the pixel-shared and discrete depth representation to its corresponding pixel-wise and continuous counterpart, driven by the real depth distribution. Specifically, CluDe first utilizes depth value clustering to learn a set of depth centers as the depth representation. While these depth centers are pixel-shared and discrete, they are more in line with the real depth distribution compared to pre-defined depth categories. Then, CluDe estimates offsets for these depth centers, enabling their dynamic adjustment along the depth axis of the depth distribution to generate the pixel-wise and continuous depth representation. Extensive experiments demonstrate that CluDe successfully reduces depth smearing around object boundaries by utilizing pixel-wise and continuous depth representation. Furthermore, CluDe achieves state-of-the-art performance on the VOID datasets and outperforms classification-based methods on the KITTI dataset.
Published: 2024
Full Text: View/download PDF

24. 3D Object Detection From Images for Autonomous Driving: A Survey

Author: Ma, Xinzhu, Ouyang, Wanli, Simonelli, Andrea, and Ricci, Elisa
Abstract: 3D object detection from images, one of the fundamental and challenging problems in autonomous driving, has received increasing attention from both industry and academia in recent years. Benefiting from the rapid development of deep learning technologies, image-based 3D detection has achieved remarkable progress. In particular, more than 200 works have studied this problem from 2015 to 2021, encompassing a broad spectrum of theories, algorithms, and applications. However, to date no recent survey exists to collect and organize this knowledge. In this paper, we fill this gap in the literature and provide the first comprehensive survey of this novel and continuously growing research field, summarizing the most commonly used pipelines for image-based 3D detection and deeply analyzing each of their components. Additionally, we propose two new taxonomies to organize the state-of-the-art methods into different categories, with the intent of providing a more systematic review of existing methods and facilitating fair comparisons with future works. Looking back at what has been achieved so far, we also analyze the current challenges in the field and discuss future directions for image-based 3D detection research.
Published: 2024
Full Text: View/download PDF

25. 3D Object Detection From Images for Autonomous Driving: A Survey

Author: Ma, Xinzhu, primary, Ouyang, Wanli, additional, Simonelli, Andrea, additional, and Ricci, Elisa, additional
Published: 2023
Full Text: View/download PDF

26. Disparity-Based Robust Unstructured Terrain Segmentation

Author: Zhang, Pengbo, primary, Ma, Xinzhu, additional, Wang, Zhihui, additional, Li, Haojie, additional, and Luo, Zhongxuan, additional
Published: 2018
Full Text: View/download PDF

27. Residue 49 of AtMinD1 Plays a Key Role in the Guidance of Chloroplast Division by Regulating the ARC6-AtMinD1 Interaction

Author: Zhang, Yanhua, primary, Zhang, Xiaochen, additional, Cui, Huanshuo, additional, Ma, Xinzhu, additional, Hu, Guipeng, additional, Wei, Jing, additional, He, Yikun, additional, and Hu, Yong, additional
Published: 2021
Full Text: View/download PDF

28. Geometry Uncertainty Projection Network for Monocular 3D Object Detection

Author: Lu, Yan, primary, Ma, Xinzhu, additional, Yang, Lei, additional, Zhang, Tianzhu, additional, Liu, Yating, additional, Chu, Qi, additional, Yan, Junjie, additional, and Ouyang, Wanli, additional
Published: 2021
Full Text: View/download PDF

29. Delving into Localization Errors for Monocular 3D Object Detection

Author: Ma, Xinzhu, primary, Zhang, Yinmin, additional, Xu, Dan, additional, Zhou, Dongzhan, additional, Yi, Shuai, additional, Li, Haojie, additional, and Ouyang, Wanli, additional
Published: 2021
Full Text: View/download PDF

30. Accurate Monocular 3D Object Detection via Color-Embedded 3D Reconstruction for Autonomous Driving

Author: Ma, Xinzhu, primary, Wang, Zhihui, additional, Li, Haojie, additional, Zhang, Pengbo, additional, Ouyang, Wanli, additional, and Fan, Xin, additional
Published: 2019
Full Text: View/download PDF

31. Self-Adaption Multi-classifier Fusion Networks for Image Recognition

Author: Guo, Zengyuan, primary, Ma, Xinzhu, additional, Li, Haojie, additional, Wang, Zhihui, additional, and Zhang, Pengbo, additional
Published: 2019
Full Text: View/download PDF

32. Learning to Segment Unseen Category Objects using Gradient Gaussian Attention

Author: Zhang, Pengbo, primary, Wang, Zhihui, additional, Ma, Xinzhu, additional, Li, Haojie, additional, and Li, Jianjun, additional
Published: 2019
Full Text: View/download PDF

33. User-Guided Deep Anime Line Art Colorization with Conditional Adversarial Networks

Author: Ci, Yuanzheng, primary, Ma, Xinzhu, additional, Wang, Zhihui, additional, Li, Haojie, additional, and Luo, Zhongxuan, additional
Published: 2018
Full Text: View/download PDF

34. An Efficient Protocol With Bidirectional Verification for Storage Security in Cloud Computing

Author: Feng, Bin, primary, Ma, Xinzhu, additional, Guo, Cheng, additional, Shi, Hui, additional, Fu, Zhangjie, additional, and Qiu, Tie, additional
Published: 2016
Full Text: View/download PDF
