8,118 results for "Ding, Jian"
Search Results
2. A computational transition for detecting correlated stochastic block models by low-degree polynomials
- Author
-
Chen, Guanyi, Ding, Jian, Gong, Shuyang, and Li, Zhangsong
- Subjects
Mathematics - Probability ,Computer Science - Data Structures and Algorithms ,Computer Science - Machine Learning ,Mathematics - Statistics Theory ,Primary 68Q87, Secondary 62M20 - Abstract
Detection of correlation in a pair of random graphs is a fundamental statistical and computational problem that has been extensively studied in recent years. In this work, we consider a pair of correlated (sparse) stochastic block models $\mathcal{S}(n,\tfrac{\lambda}{n};k,\epsilon;s)$ that are subsampled from a common parent stochastic block model $\mathcal S(n,\tfrac{\lambda}{n};k,\epsilon)$ with $k=O(1)$ symmetric communities, average degree $\lambda=O(1)$, divergence parameter $\epsilon$, and subsampling probability $s$. For the detection problem of distinguishing this model from a pair of independent Erd\H{o}s-R\'enyi graphs with the same edge density $\mathcal{G}(n,\tfrac{\lambda s}{n})$, we focus on tests based on \emph{low-degree polynomials} of the entries of the adjacency matrices, and we determine the threshold that separates the easy and hard regimes. More precisely, we show that this class of tests can distinguish these two models if and only if $s> \min \{ \sqrt{\alpha}, \frac{1}{\lambda \epsilon^2} \}$, where $\alpha\approx 0.338$ is Otter's constant and $\frac{1}{\lambda \epsilon^2}$ is the Kesten-Stigum threshold. Our proof of low-degree hardness is based on a conditional variant of the low-degree likelihood calculation., Comment: 75 pages, 2 figures
- Published
- 2024
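The easy/hard boundary stated in this abstract is a one-line formula, so it can be sanity-checked numerically. A minimal sketch, assuming the threshold exactly as quoted (the function name and example parameter values are illustrative, not from the paper):

```python
import math

OTTER = 0.338  # Otter's constant alpha, approximately, as quoted in the abstract

def low_degree_detectable(s, lam, eps):
    """Per the stated threshold, low-degree tests succeed iff
    s > min(sqrt(alpha), 1 / (lam * eps**2))."""
    kesten_stigum = 1.0 / (lam * eps ** 2)
    return s > min(math.sqrt(OTTER), kesten_stigum)

# With lam = 4, eps = 0.8, the Kesten-Stigum term 1/(4 * 0.64) ~ 0.39 is
# smaller than sqrt(0.338) ~ 0.58, so it is the binding threshold here.
print(low_degree_detectable(0.9, lam=4.0, eps=0.8))  # True
print(low_degree_detectable(0.3, lam=4.0, eps=0.8))  # False
```

Note how, depending on which term of the minimum is smaller, either the Otter-constant bound or the Kesten-Stigum bound governs detectability.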
3. Goldfish: Vision-Language Understanding of Arbitrarily Long Videos
- Author
-
Ataallah, Kirolos, Shen, Xiaoqian, Abdelrahman, Eslam, Sleiman, Essam, Zhuge, Mingchen, Ding, Jian, Zhu, Deyao, Schmidhuber, Jürgen, and Elhoseiny, Mohamed
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Most current LLM-based models for video understanding can only process videos a few minutes long. However, they struggle with lengthy videos due to challenges such as "noise and redundancy", as well as "memory and computation" constraints. In this paper, we present Goldfish, a methodology tailored for comprehending videos of arbitrary lengths. We also introduce the TVQA-long benchmark, specifically designed to evaluate models' capabilities in understanding long videos with questions in both vision and text content. Goldfish approaches these challenges with an efficient retrieval mechanism that initially gathers the top-k video clips relevant to the instruction before proceeding to provide the desired response. This design of the retrieval mechanism enables Goldfish to efficiently process arbitrarily long video sequences, facilitating its application in contexts such as movies or television series. To facilitate the retrieval process, we developed MiniGPT4-Video, which generates detailed descriptions for the video clips. In addressing the scarcity of benchmarks for long video evaluation, we adapted the TVQA short video benchmark for extended content analysis by aggregating questions from entire episodes, thereby shifting the evaluation from partial to full episode comprehension. We attained a 41.78% accuracy rate on the TVQA-long benchmark, surpassing previous methods by 14.94%. Our MiniGPT4-Video also shows exceptional performance in short video comprehension, exceeding existing state-of-the-art methods by 3.23%, 2.03%, 16.5% and 23.59% on the MSVD, MSRVTT, TGIF, and TVQA short video benchmarks, respectively. These results indicate that our models achieve significant improvements in both long and short-video understanding. Our models and code have been made publicly available at https://vision-cair.github.io/Goldfish_website/, Comment: 25 pages, 11 figures, accepted by ECCV 2024
- Published
- 2024
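The retrieval-then-answer design described in this abstract (gather the top-k clips most relevant to the instruction, then respond from them) can be sketched with generic embeddings. Everything below (the toy vectors, cosine similarity as the score, and the value of k) is illustrative, not the paper's implementation:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def top_k_clips(query_vec, clip_vecs, k=3):
    """Rank clip-description embeddings by similarity to the instruction
    embedding and keep the indices of the top-k (the retrieval step)."""
    ranked = sorted(range(len(clip_vecs)),
                    key=lambda i: cosine(query_vec, clip_vecs[i]),
                    reverse=True)
    return ranked[:k]

# Toy 3-d "embeddings" for 5 clips; the query is closest to clips 0 and 1.
clips = [(1, 0, 0), (0.9, 0.1, 0), (0, 1, 0), (0, 0, 1), (0.7, 0.7, 0)]
print(top_k_clips((1, 0, 0), clips, k=2))  # [0, 1]
```

Only the retrieved clips would then be passed to the answering model, which is what keeps the cost independent of total video length.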
4. InfiniBench: A Comprehensive Benchmark for Large Multimodal Models in Very Long Video Understanding
- Author
-
Ataallah, Kirolos, Gou, Chenhui, Abdelrahman, Eslam, Pahwa, Khushbu, Ding, Jian, and Elhoseiny, Mohamed
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Understanding long videos, ranging from tens of minutes to several hours, presents unique challenges in video comprehension. Despite the increasing importance of long-form video content, existing benchmarks primarily focus on shorter clips. To address this gap, we introduce InfiniBench, a comprehensive benchmark for very long video understanding, which presents: 1) the longest video duration, averaging 52.59 minutes per video; 2) the largest number of question-answer pairs, 108.2K; 3) diversity in questions that examine nine different skills and include both multiple-choice and open-ended questions; 4) human-centric design, as the video sources come from movies and daily TV shows, with specific human-level question designs such as Movie Spoiler Questions that require critical thinking and comprehensive understanding. Using InfiniBench, we comprehensively evaluate existing Large Multi-Modality Models (LMMs) on each skill, including commercial models such as GPT-4o and Gemini 1.5 Flash and the open-source models. The evaluation shows significant challenges in our benchmark. Our findings reveal that even leading AI models like GPT-4o and Gemini 1.5 Flash face challenges in achieving high performance in long video understanding, with average accuracies of just 49.16% and 42.72%, and average scores of 3.22 and 2.71 out of 5, respectively. We hope this benchmark will stimulate the LMMs community towards long video and human-level understanding. Our benchmark can be accessed at https://vision-cair.github.io/InfiniBench/, Comment: 24 pages, 25 figures
- Published
- 2024
5. VRSBench: A Versatile Vision-Language Benchmark Dataset for Remote Sensing Image Understanding
- Author
-
Li, Xiang, Ding, Jian, and Elhoseiny, Mohamed
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
We introduce a new benchmark designed to advance the development of general-purpose, large-scale vision-language models for remote sensing images. Although several vision-language datasets in remote sensing have been proposed to pursue this goal, existing datasets are typically tailored to single tasks, lack detailed object information, or suffer from inadequate quality control. Exploring these improvement opportunities, we present a Versatile vision-language Benchmark for Remote Sensing image understanding, termed VRSBench. This benchmark comprises 29,614 images, with 29,614 human-verified detailed captions, 52,472 object references, and 123,221 question-answer pairs. It facilitates the training and evaluation of vision-language models across a broad spectrum of remote sensing image understanding tasks. We further evaluated state-of-the-art models on this benchmark for three vision-language tasks: image captioning, visual grounding, and visual question answering. Our work aims to significantly contribute to the development of advanced vision-language models in the field of remote sensing. The data and code can be accessed at https://github.com/lx709/VRSBench., Comment: Submitted for consideration at a conference
- Published
- 2024
6. iMotion-LLM: Motion Prediction Instruction Tuning
- Author
-
Felemban, Abdulwahab, Bakr, Eslam Mohamed, Shen, Xiaoqian, Ding, Jian, Mohamed, Abduallah, and Elhoseiny, Mohamed
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
We introduce iMotion-LLM, a Multimodal Large Language Model (LLM) with trajectory prediction, tailored to guide interactive multi-agent scenarios. Different from conventional motion prediction approaches, iMotion-LLM capitalizes on textual instructions as key inputs for generating contextually relevant trajectories. By enriching the real-world driving scenarios in the Waymo Open Dataset with textual motion instructions, we created InstructWaymo. Leveraging this dataset, iMotion-LLM integrates a pretrained LLM, fine-tuned with LoRA, to translate scene features into the LLM input space. iMotion-LLM offers significant advantages over conventional motion prediction models. First, it can generate trajectories that align with the provided instructions when the instructed direction is feasible. Second, when given an infeasible direction, it can reject the instruction, thereby enhancing safety. These findings act as milestones in empowering autonomous navigation systems to interpret and predict the dynamics of multi-agent environments, laying the groundwork for future advancements in this field.
- Published
- 2024
7. One-arm Probabilities for Metric Graph Gaussian Free Fields below and at the Critical Dimension
- Author
-
Cai, Zhenhao and Ding, Jian
- Subjects
Mathematics - Probability - Abstract
For the critical level-set of the Gaussian free field on the metric graph of $\mathbb Z^d$, we consider the one-arm probability $\theta_d(N)$, i.e., the probability that the boundary of a box of side length $2N$ is connected to the center. We prove that $\theta_d(N)$ is $O(N^{-\frac{d}{2}+1})$ for $3\le d\le 5$, and is $N^{-2+o(1)}$ for $d=6$. Our upper bounds match the lower bounds in a previous work by Ding and Wirth up to a constant factor for $3\le d\le 5$, and match the exponent therein for $d=6$. Combined with our previous result that $\theta_d(N) \asymp N^{-2}$ for $d>6$, this seems to present the first percolation model whose one-arm probabilities are essentially completely understood in all dimensions. In particular, these results fully confirm Werner's conjectures (2021) on the one-arm exponents: \begin{equation*} \text{(1) for}\ 3\le d<d_c=6,\ \theta_d(N)=N^{-\frac{d}{2}+1+o(1)};\quad \text{(2) for}\ d=d_c,\ \theta_d(N)=N^{-2+o(1)};\quad \text{(3) for}\ d>d_c,\ \theta_d(N)=N^{-2+o(1)}. \end{equation*} Prior to our work, Drewitz, Pr\'evost and Rodriguez obtained upper bounds for $d\in \{3, 4\}$, which are very sharp although they lose some diverging factors. In the same work, they conjectured that $\theta_{d_c}(N) = N^{-2+o(1)}$, which is now established. In addition, in a recent concurrent work, Drewitz, Pr\'evost and Rodriguez independently obtained the up-to-constant upper bound for $d=3$. - Published
- 2024
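The exponents stated in this abstract fit a single piecewise formula. The sketch below paraphrases them, ignoring o(1) corrections and constant factors, writing $\theta_d(N) = N^{-e(d)+o(1)}$:

```python
def one_arm_exponent(d):
    """Polynomial decay exponent e(d) for the one-arm probability
    theta_d(N) = N^{-e(d)+o(1)} on the metric-graph GFF level set,
    per the abstract: e(d) = d/2 - 1 for 3 <= d <= 6, and e(d) = 2
    for d > 6 (the two branches agree at the critical dimension d_c = 6)."""
    if d < 3:
        raise ValueError("results are stated for d >= 3")
    return min(d / 2 - 1, 2)

print([one_arm_exponent(d) for d in (3, 4, 5, 6, 7, 10)])
# [0.5, 1.0, 1.5, 2.0, 2.0, 2.0]
```

The formula makes the "critical dimension" picture visible: the exponent grows linearly in d up to d = 6 and then freezes at 2.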
8. Kestrel: Point Grounding Multimodal LLM for Part-Aware 3D Vision-Language Understanding
- Author
-
Fei, Junjie, Ahmed, Mahmoud, Ding, Jian, Bakr, Eslam Mohamed, and Elhoseiny, Mohamed
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Computation and Language - Abstract
While 3D MLLMs have achieved significant progress, they are restricted to object and scene understanding and struggle to understand 3D spatial structures at the part level. In this paper, we introduce Kestrel, representing a novel approach that empowers 3D MLLMs with part-aware understanding, enabling better interpretation and segmentation grounding of 3D objects at the part level. Despite its significance, the current landscape lacks tasks and datasets that endow and assess this capability. Therefore, we propose two novel tasks: (1) Part-Aware Point Grounding, the model is tasked with directly predicting a part-level segmentation mask based on user instructions, and (2) Part-Aware Point Grounded Captioning, the model provides a detailed caption that includes part-level descriptions and their corresponding masks. To support learning and evaluating for these tasks, we introduce 3DCoMPaT Grounded Instructions Dataset (3DCoMPaT-GRIN). 3DCoMPaT-GRIN Vanilla, comprising 789k part-aware point cloud-instruction-segmentation mask triplets, is used to evaluate MLLMs' ability of part-aware segmentation grounding. 3DCoMPaT-GRIN Grounded Caption, containing 107k part-aware point cloud-instruction-grounded caption triplets, assesses both MLLMs' part-aware language comprehension and segmentation grounding capabilities. Our introduced tasks, dataset, and Kestrel represent a preliminary effort to bridge the gap between human cognition and 3D MLLMs, i.e., the ability to perceive and engage with the environment at both global and part levels. Extensive experiments on the 3DCoMPaT-GRIN show that Kestrel can generate user-specified segmentation masks, a capability not present in any existing 3D MLLM. Kestrel thus established a benchmark for evaluating the part-aware language comprehension and segmentation grounding of 3D objects. Project page at https://feielysia.github.io/Kestrel.github.io/
- Published
- 2024
9. Distilling Implicit Multimodal Knowledge into LLMs for Zero-Resource Dialogue Generation
- Author
-
Zhang, Bo, Ma, Hui, Ding, Jian, Wang, Jian, Xu, Bo, and Lin, Hongfei
- Subjects
Computer Science - Computation and Language ,Computer Science - Multimedia - Abstract
Integrating multimodal knowledge into large language models (LLMs) represents a significant advancement in dialogue generation capabilities. However, the effective incorporation of such knowledge in zero-resource scenarios remains a substantial challenge due to the scarcity of diverse, high-quality dialogue datasets. To address this, we propose the Visual Implicit Knowledge Distillation Framework (VIKDF), an innovative approach aimed at enhancing LLMs for enriched dialogue generation in zero-resource contexts by leveraging implicit multimodal knowledge. VIKDF comprises two main stages: knowledge distillation, using an Implicit Query Transformer to extract and encode visual implicit knowledge from image-text pairs into knowledge vectors; and knowledge integration, employing a novel Bidirectional Variational Information Fusion technique to seamlessly integrate these distilled vectors into LLMs. This enables the LLMs to generate dialogues that are not only coherent and engaging but also exhibit a deep understanding of the context through implicit multimodal cues, effectively overcoming the limitations of zero-resource scenarios. Our extensive experimentation across two dialogue datasets shows that VIKDF outperforms existing state-of-the-art models in generating high-quality dialogues. The code will be publicly available following acceptance., Comment: Under Review
- Published
- 2024
10. When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models
- Author
-
Ma, Xianzheng, Bhalgat, Yash, Smart, Brandon, Chen, Shuai, Li, Xinghui, Ding, Jian, Gu, Jindong, Chen, Dave Zhenyu, Peng, Songyou, Bian, Jia-Wang, Torr, Philip H, Pollefeys, Marc, Nießner, Matthias, Reid, Ian D, Chang, Angel X., Laina, Iro, and Prisacariu, Victor Adrian
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Robotics - Abstract
As large language models (LLMs) evolve, their integration with 3D spatial data (3D-LLMs) has seen rapid progress, offering unprecedented capabilities for understanding and interacting with physical spaces. This survey provides a comprehensive overview of the methodologies enabling LLMs to process, understand, and generate 3D data. Highlighting the unique advantages of LLMs, such as in-context learning, step-by-step reasoning, open-vocabulary capabilities, and extensive world knowledge, we underscore their potential to significantly advance spatial comprehension and interaction within embodied Artificial Intelligence (AI) systems. Our investigation spans various 3D data representations, from point clouds to Neural Radiance Fields (NeRFs). It examines their integration with LLMs for tasks such as 3D scene understanding, captioning, question-answering, and dialogue, as well as LLM-based agents for spatial reasoning, planning, and navigation. The paper also includes a brief review of other methods that integrate 3D and language. The meta-analysis presented in this paper reveals significant progress yet underscores the necessity for novel approaches to harness the full potential of 3D-LLMs. Hence, with this paper, we aim to chart a course for future research that explores and expands the capabilities of 3D-LLMs in understanding and interacting with the complex 3D world. To support this survey, we have established a project page where papers related to our topic are organized and listed: https://github.com/ActiveVisionLab/Awesome-LLM-3D.
- Published
- 2024
11. High-Order Synchrosqueezed Chirplet Transforms for Multicomponent Signal Analysis
- Author
-
Yen, Yi-Ju, Lu, De-Yan, Yeh, Sing-Yuan, Ding, Jian-Jiun, and Shen, Chun-Yen
- Subjects
Mathematics - Numerical Analysis ,Electrical Engineering and Systems Science - Signal Processing ,65T99, 42C99, 42A38 - Abstract
This study focuses on the analysis of signals containing multiple components with crossover instantaneous frequencies (IFs). This problem was initially solved with the chirplet transform (CT). The result can be sharpened by adding a synchrosqueezing step, yielding the synchrosqueezed chirplet transform (SCT). However, we found that the SCT fails for signals with strong chirp modulation due to inaccurate estimation of the IF. In this paper, we present an improvement of the post-transformation of the CT. The main goal of this paper is to amend the estimation introduced in the SCT and carry out the high-order synchrosqueezed chirplet transform. The proposed method reduces estimation error when facing more strongly chirp-modulated multi-component signals. The theoretical analysis of the new reassignment ingredient is provided. Numerical experiments on some synthetic signals are presented to verify the effectiveness of the proposed high-order SCT.
- Published
- 2024
12. Polynomial lower bound on the effective resistance for the one-dimensional critical long-range percolation
- Author
-
Ding, Jian, Fan, Zherui, and Huang, Lu-Jing
- Subjects
Mathematics - Probability ,60K35, 82B27, 82B43 - Abstract
In this work, we study the critical long-range percolation on $\mathbb{Z}$, where an edge connects $i$ and $j$ independently with probability $1-\exp\{-\beta |i-j|^{-2}\}$ for some fixed $\beta>0$. Viewing this as a random electric network where each edge has a unit conductance, we show that with high probability the effective resistances from the origin 0 to $[-N, N]^c$ and from the interval $[-N,N]$ to $[-2N,2N]^c$ (conditioned on no edge joining $[-N,N]$ and $[-2N,2N]^c$) both have a polynomial lower bound in $N$. Our bound holds for all $\beta>0$ and thus rules out a potential phase transition (around $\beta = 1$) which seemed to be a reasonable possibility., Comment: 26 pages, 10 figures
- Published
- 2024
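The edge law in this abstract is simple to simulate. A minimal sketch of sampling the long-range edges inside a finite window (the window size, the seed, and β = 1 below are illustrative choices, not from the paper):

```python
import math
import random

def sample_edges(n, beta, seed=0):
    """Sample one-dimensional critical long-range percolation on
    {0, ..., n-1}: the edge {i, j} is open independently with
    probability 1 - exp(-beta * |i - j|^{-2}), as in the abstract."""
    rng = random.Random(seed)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            p = 1.0 - math.exp(-beta * (j - i) ** -2)
            if rng.random() < p:
                edges.append((i, j))
    return edges

edges = sample_edges(50, beta=1.0)
# Nearest-neighbour edges dominate: p(|i-j| = 1) = 1 - e^{-1} ~ 0.632,
# while p decays like beta / |i-j|^2 for long edges.
print(len(edges))
```

Viewing each open edge as a unit conductance turns this sample into the random electric network whose effective resistance the paper bounds from below.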
13. Invisible and Semi-invisible Decays of Bottom Baryons
- Author
-
Zheng, Yong, Ding, Jian-Nan, Li, Dong-Hao, Li, Lei-Yi, Lü, Cai-Dian, and Yu, Fu-Sheng
- Subjects
High Energy Physics - Phenomenology ,High Energy Physics - Experiment - Abstract
The similar densities of dark matter and baryons in the universe imply that they might arise from the same ultraviolet model. B-Mesogenesis, which assumes dark matter is charged under the baryon number, attempts to simultaneously explain the origin of baryon asymmetry and dark matter in the universe. In particular, B-Mesogenesis might induce bottom-baryon decays into invisible or semi-invisible final states, which provide a distinctive signal for probing this scenario. In this work, we systematically study the invisible decays of bottom baryons into dark matter, and semi-invisible decays of bottom baryons into a meson or a photon together with a dark matter particle. In particular, the fully invisible decays can probe the stable particles in B-Mesogenesis. Some QCD-based frameworks are used to calculate the hadronic matrix elements under the B-Mesogenesis model. We estimate the constraints on the Wilson coefficients, or on the product of some new physics couplings with the Wilson coefficients, from the semi-invisible and invisible decays of bottom baryons at future colliders., Comment: 25 pages, 7 figures
- Published
- 2024
- Full Text
- View/download PDF
14. MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens
- Author
-
Ataallah, Kirolos, Shen, Xiaoqian, Abdelrahman, Eslam, Sleiman, Essam, Zhu, Deyao, Ding, Jian, and Elhoseiny, Mohamed
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
This paper introduces MiniGPT4-Video, a multimodal Large Language Model (LLM) designed specifically for video understanding. The model is capable of processing both temporal visual and textual data, making it adept at understanding the complexities of videos. Building upon the success of MiniGPT-v2, which excelled in translating visual features into the LLM space for single images and achieved impressive results on various image-text benchmarks, this paper extends the model's capabilities to process a sequence of frames, enabling it to comprehend videos. MiniGPT4-video does not only consider visual content but also incorporates textual conversations, allowing the model to effectively answer queries involving both visual and text components. The proposed model outperforms existing state-of-the-art methods, registering gains of 4.22%, 1.13%, 20.82%, and 13.1% on the MSVD, MSRVTT, TGIF, and TVQA benchmarks respectively. Our models and code have been made publicly available here https://vision-cair.github.io/MiniGPT4-video/, Comment: 6 pages,8 figures
- Published
- 2024
15. Unveiling the charge density wave mechanism in vanadium-based Bi-layered kagome metals
- Author
-
Yang, Yi-Chen, Cho, Soohyun, Li, Tong-Rui, Liu, Xiang-Qi, Liu, Zheng-Tai, Jiang, Zhi-Cheng, Ding, Jian-Yang, Xia, Wei, Tao, Zi-Cheng, Liu, Jia-Yu, Jing, Wen-Chuan, Huang, Yu, Shi, Yu-Ming, Huh, Soonsang, Kondo, Takeshi, Sun, Zhe, Liu, Ji-Shan, Ye, Mao, Wang, Yi-Lin, Guo, Yan-Feng, and Shen, Da-Wei
- Subjects
Condensed Matter - Materials Science - Abstract
The charge density wave (CDW), as a hallmark of the vanadium-based kagome superconductors AV3Sb5 (A = K, Rb, Cs), has attracted intensive attention. However, the fundamental controversy regarding the underlying mechanism of the CDW therein persists. Recently, the vanadium-based bi-layered kagome metal ScV6Sn6, reported to exhibit a long-range charge order below 94 K, has emerged as a promising candidate to further clarify this core issue. Here, employing micro-focused angle-resolved photoemission spectroscopy ({\mu}-ARPES) and first-principles calculations, we systematically studied the unique CDW order in vanadium-based bi-layered kagome metals by comparing ScV6Sn6 with its isostructural counterpart YV6Sn6, which lacks a CDW ground state. Combining ARPES data and the corresponding joint density of states (DOS), we suggest that the VHS nesting mechanism might be invalid in these materials. Besides, in ScV6Sn6, we identified multiple hybridization energy gaps resulting from CDW-induced band folding, along with an anomalous band dispersion, implying a potential electron-phonon-coupling-driven mechanism underlying the formation of the CDW order. Our findings not only comprehensively map the electronic structure of V-based bi-layer kagome metals but also provide constructive experimental evidence for the unique origin of the CDW in this system., Comment: 14 pages, 5 figures
- Published
- 2024
16. Discovery of a novel BTK inhibitor S-016 and identification of a new strategy for the treatment of lymphomas including BTK inhibitor-resistant lymphomas
- Author
-
Song, Pei-ran, Wan, Zhi-peng, Huang, Ge-ge, Song, Zi-lan, Zhang, Tao, Tong, Lin-jiang, Fang, Yan, Tang, Hao-tian, Xue, Yu, Zhan, Zheng-sheng, Feng, Fang, Li, Yan, Shi, Wen-hao, Huang, Yu-qing, Chen, Yi, Duan, Wen-hu, Ding, Jian, Zhang, Ao, and Xie, Hua
- Published
- 2024
- Full Text
- View/download PDF
17. Long range order for three-dimensional random field Ising model throughout the entire low temperature regime
- Author
-
Ding, Jian, Liu, Yu, and Xia, Aoteng
- Published
- 2024
- Full Text
- View/download PDF
18. Optical soliton solutions of the resonant nonlinear Schrödinger equation with Kerr-law nonlinearity
- Author
-
Leta, Temesgen Desta, Liu, Wenjun, and Ding, Jian
- Published
- 2024
- Full Text
- View/download PDF
19. Downregulation of RNF128 Inhibits the Proliferation, Migration, Invasion and EMT of Colorectal Cancer Cells
- Author
-
Wang, Meng, Ding, Jian, Zhao, Aihong, Zhang, Yixin, Zhou, Yongkun, and Tian, Zhaochun
- Published
- 2024
- Full Text
- View/download PDF
20. Absolute and relative disparity mechanisms revealed by an equivalent noise analysis.
- Author
-
Ding, Jian, Lu, Hilary, and Levi, Dennis
- Subjects
Depth Perception ,Vision Disparity ,Noise ,Personality Inventory ,Vision ,Binocular - Abstract
The precision of stereopsis and vergence are ultimately limited by internal binocular disparity noise. Here we propose an equivalent noise model with both global and local internal disparity noises to provide a unified explanation of both absolute and relative disparity thresholds. To test this model, we developed a psychophysical procedure to measure the equivalent internal disparity noise by adding external disparity noise to random-Gabor-patch stereograms. We used the method of constant stimuli to measure the minimum and maximum disparity thresholds (Dmin and Dmax) for both absolute and relative disparity. Consistent with previous studies, we found that Dmin thresholds are substantially worse for absolute disparity than for relative disparity. We tested three relative disparity mechanisms: (1) the difference between the monocular separations of targets projecting to the two eyes; (2) the direct measurement of relative disparity; and (3) the difference of absolute disparities of targets. Computing the difference of absolute disparities when detecting relative disparity, Mechanism 3 cancels global noise, resulting in a much lower relative Dmin threshold, and provides a reasonable fit to the experimental data. We also found that the presence of as much as 2400 arcsec of external disparity noise does not appear to affect the Dmax threshold. This observation suggests that Dmax is implicated in a mechanism that disregards the disparity variance of individual items, relying instead on the average disparity across all items, supporting the depth model proposed in our previous study (Ding & Levi, 2021), which posits distinct mechanisms governing Dmin and Dmax thresholds.
- Published
- 2024
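Mechanism 3 in this abstract, computing relative disparity as the difference of two absolute disparities, cancels whatever noise is shared globally between targets, which is why the relative Dmin threshold can be far lower than the absolute one. A toy simulation with made-up noise magnitudes (the sigma values below are illustrative, not fitted to the paper's data) shows the cancellation:

```python
import random
import statistics

def simulate(n=100_000, sigma_global=20.0, sigma_local=2.0, seed=0):
    """Model each target's absolute disparity noise as a shared global
    term g (e.g. vergence noise, common to both targets) plus an
    independent local term. Differencing two targets' absolute
    disparities cancels g (Mechanism 3)."""
    rng = random.Random(seed)
    abs_noise, rel_noise = [], []
    for _ in range(n):
        g = rng.gauss(0, sigma_global)            # shared global noise
        l1 = rng.gauss(0, sigma_local)            # local noise, target 1
        l2 = rng.gauss(0, sigma_local)            # local noise, target 2
        abs_noise.append(g + l1)                  # absolute-disparity noise
        rel_noise.append((g + l1) - (g + l2))     # relative: g cancels
    return statistics.stdev(abs_noise), statistics.stdev(rel_noise)

sd_abs, sd_rel = simulate()
# Absolute noise is dominated by the global term; relative noise is only
# the (much smaller) combined local terms.
print(sd_abs > sd_rel)  # True
```

When the global component dominates, the relative threshold tracks only the local noise, matching the observation that Dmin is substantially worse for absolute than for relative disparity.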
21. Uni3DL: Unified Model for 3D and Language Understanding
- Author
-
Li, Xiang, Ding, Jian, Chen, Zhaoyang, and Elhoseiny, Mohamed
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
In this work, we present Uni3DL, a unified model for 3D and Language understanding. Distinct from existing unified vision-language models in 3D which are limited in task variety and predominantly dependent on projected multi-view images, Uni3DL operates directly on point clouds. This approach significantly expands the range of supported tasks in 3D, encompassing both vision and vision-language tasks in 3D. At the core of Uni3DL, a query transformer is designed to learn task-agnostic semantic and mask outputs by attending to 3D visual features, and a task router is employed to selectively generate task-specific outputs required for diverse tasks. With a unified architecture, our Uni3DL model enjoys seamless task decomposition and substantial parameter sharing across tasks. Uni3DL has been rigorously evaluated across diverse 3D vision-language understanding tasks, including semantic segmentation, object detection, instance segmentation, visual grounding, 3D captioning, and text-3D cross-modal retrieval. It demonstrates performance on par with or surpassing state-of-the-art (SOTA) task-specific models. We hope our benchmark and Uni3DL model will serve as a solid step to ease future research in unified models in the realm of 3D and language understanding. Project page: https://uni3dl.github.io.
- Published
- 2023
22. Chemical Constituents from the Fungus Inonotus sinensis
- Author
-
Ding, Jian-Hai, Yang, Yu-Qin, Quan, Chen-Xi, Xiong, Cai-Zhi, Luo, Wen-Guan, Zhao, Jia-Min, Xiao, Yang, Chen, Tian-Yin, Zhang, Dan-Min, Cao, Jin-Fen, Liu, Shi-Wei, and Liu, Ji-Kai
- Published
- 2024
- Full Text
- View/download PDF
23. One-arm exponent of critical level-set for metric graph Gaussian free field in high dimensions
- Author
-
Cai, Zhenhao and Ding, Jian
- Published
- 2024
- Full Text
- View/download PDF
24. A Polynomial Time Iterative Algorithm for Matching Gaussian Matrices with Non-vanishing Correlation
- Author
-
Ding, Jian and Li, Zhangsong
- Published
- 2024
- Full Text
- View/download PDF
25. The marine-derived compound TAG alleviates Parkinson’s disease by restoring RUBCN-mediated lipid metabolism homeostasis
- Author
-
Yang, Pei, Liu, Yang, Tong, Zhi-wu, Huang, Qian-hui, Xie, Xia-hong, Mao, Shi-yu, Ding, Jian-hua, Lu, Ming, Tan, Ren-xiang, and Hu, Gang
- Published
- 2024
- Full Text
- View/download PDF
26. Developing a CRISPR/FrCas9 system for core promoter editing in rice
- Author
-
Wang, Hui, Ding, Jian, Zhu, Jingyan, Liu, Xiaoshuang, Xu, Rongfang, Qin, Ruiying, Gu, Dongfang, Li, Min, Wei, Pengcheng, and Li, Juan
- Published
- 2024
- Full Text
- View/download PDF
27. Simultaneous Quantification of 39 Pesticides and Veterinary Drug Residues in Aquaculture Products Using Ultra Performance Liquid Chromatography Tandem Mass Spectrometry with Modified QuEChERS
- Author
-
Li, Shuo, Liu, Yijun, Jiang, Dan, Liu, Mengyao, Ding, Jian, Zhao, Fei, Liu, Yang, Hu, Xia, Mao, Xiqin, and Zhao, Qiancheng
- Published
- 2024
- Full Text
- View/download PDF
28. A phase transition and critical phenomenon for the two-dimensional random field Ising model
- Author
-
Ding, Jian, Huang, Fenglin, and Xia, Aoteng
- Subjects
Mathematics - Probability ,60K35, 82B44 - Abstract
We study the random field Ising model in a two-dimensional box with side length $N$ where the external field is given by independent normal variables with mean $0$ and variance $\epsilon^2$. Our primary result is the following phase transition at $T = T_c$: for $\epsilon \ll N^{-7/8}$ the boundary influence (i.e., the difference between the spin averages at the center of the box with the plus and the minus boundary conditions) decays as $N^{-1/8}$ and thus the disorder essentially has no effect on the boundary influence; for $\epsilon \gg N^{-7/8}$, the boundary influence decays as $N^{-\frac{1}{8}}e^{-\Theta(\epsilon^{8/7}\, N)}$ (i.e., the disorder contributes a factor of $e^{-\Theta(\epsilon^{8/7}\, N)}$ to the decay rate). For a natural notion of the correlation length, i.e., the minimal size of the box where the boundary influence shrinks by a factor of $2$ from that with no external field, we also prove the following: as $\epsilon\downarrow 0$ the correlation length transitions from $\Theta(\epsilon^{-8/7})$ at $T_c$ to $e^{\Theta(\epsilon^{-4/3})}$ for $T < T_c$., Comment: 65 pages; minor revision throughout over previous version
- Published
- 2023
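The two scales in this abstract are consistent with each other: the disorder factor $e^{-\Theta(\epsilon^{8/7} N)}$ becomes order one precisely when $N$ is of order $\epsilon^{-8/7}$, the stated correlation length at $T_c$. A sketch that ignores all constants (the constant c below is a placeholder, not from the paper):

```python
import math

def boundary_influence_scale(N, eps, c=1.0):
    """Order-of-magnitude boundary influence at T_c per the abstract:
    N^{-1/8} * exp(-c * eps^{8/7} * N), with c an unknown constant."""
    return N ** -0.125 * math.exp(-c * eps ** (8 / 7) * N)

def correlation_length_at_Tc(eps, c=1.0):
    """Scale at which the disorder factor reaches e^{-1}: N ~ eps^{-8/7}."""
    return eps ** (-8 / 7) / c

eps = 1e-3
N_star = correlation_length_at_Tc(eps)
# Well below N_star the pure N^{-1/8} decay dominates; around N_star and
# beyond, the exponential disorder factor takes over.
small = boundary_influence_scale(N_star / 100, eps)
large = boundary_influence_scale(N_star, eps)
print(small > large)  # True
```

This is only the order-of-magnitude bookkeeping implied by the stated rates; the paper's actual result controls the hidden constants in the $\Theta(\cdot)$ notation.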
29. Efficiently matching random inhomogeneous graphs via degree profiles
- Author
-
Ding, Jian, Fei, Yumou, and Wang, Yuanzheng
- Subjects
Computer Science - Data Structures and Algorithms ,Mathematics - Probability ,Mathematics - Statistics Theory ,Statistics - Machine Learning - Abstract
In this paper, we study the problem of recovering the latent vertex correspondence between two correlated random graphs with vastly inhomogeneous and unknown edge probabilities between different pairs of vertices. Inspired by and extending the matching algorithm via degree profiles by Ding, Ma, Wu and Xu (2021), we obtain an efficient matching algorithm as long as the minimal average degree is at least $\Omega(\log^{2} n)$ and the minimal correlation is at least $1 - O(\log^{-2} n)$., Comment: 44 pages, 3 figures
- Published
- 2023
30. Tightness of exponential metrics for log-correlated Gaussian fields in arbitrary dimension
- Author
-
Ding, Jian, Gwynne, Ewain, and Zhuang, Zijie
- Subjects
Mathematics - Probability ,Mathematical Physics - Abstract
We prove the tightness of a natural approximation scheme for an analog of the Liouville quantum gravity metric on $\mathbb R^d$ for arbitrary $d\geq 2$. More precisely, let $\{h_n\}_{n\geq 1}$ be a suitable sequence of Gaussian random functions which approximates a log-correlated Gaussian field on $\mathbb R^d$. Consider the family of random metrics on $\mathbb R^d$ obtained by weighting the lengths of paths by $e^{\xi h_n}$, where $\xi > 0$ is a parameter. We prove that if $\xi$ belongs to the subcritical phase (which is defined by the condition that the distance exponent $Q(\xi)$ is greater than $\sqrt{2d}$), then after appropriate re-scaling, these metrics are tight and that every subsequential limit is a metric on $\mathbb R^d$ which induces the Euclidean topology. We include a substantial list of open problems., Comment: 71 pages, 9 figures; minor revision
- Published
- 2023
31. Prompting Segmentation with Sound Is Generalizable Audio-Visual Source Localizer
- Author
-
Wang, Yaoting, Liu, Weisong, Li, Guangyao, Ding, Jian, Hu, Di, and Li, Xi
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Machine Learning ,Computer Science - Multimedia ,Computer Science - Sound ,Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
If a model has never seen an object and heard its sound simultaneously, can it still accurately localize the object's visual position from input audio? In this work, we concentrate on the Audio-Visual Localization and Segmentation tasks but under the demanding zero-shot and few-shot scenarios. To achieve this goal, different from existing approaches that mostly employ the encoder-fusion-decoder paradigm to decode localization information from the fused audio-visual feature, we introduce the encoder-prompt-decoder paradigm, aiming to better fit the data scarcity and varying data distribution dilemmas with the help of abundant knowledge from pre-trained models. Specifically, we first propose to construct a Semantic-aware Audio Prompt (SAP) to help the visual foundation model focus on sounding objects; meanwhile, the semantic gap between the visual and audio modalities is encouraged to shrink. Then, we develop a Correlation Adapter (ColA) to keep training effort minimal while maintaining adequate knowledge of the visual foundation model. Equipped with these designs, the new paradigm outperforms other fusion-based methods in both the unseen-class and cross-dataset settings, as extensive experiments demonstrate. We hope that our work can further promote the generalization study of Audio-Visual Localization and Segmentation in practical application scenarios., Comment: Accepted by AAAI 2024
- Published
- 2023
32. On the Robustness of Object Detection Models in Aerial Images
- Author
-
He, Haodong, Ding, Jian, and Xia, Gui-Song
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
The robustness of object detection models is a major concern when applied to real-world scenarios. However, the performance of most object detection models degrades when applied to images subjected to corruptions, since they are usually trained and evaluated on clean datasets. Enhancing the robustness of object detection models is of utmost importance, especially for those designed for aerial images, which feature complex backgrounds and substantial variations in the scales and orientations of objects. This paper addresses the challenge of assessing the robustness of object detection models in aerial images, with a specific emphasis on scenarios where images are affected by clouds. In this study, we introduce two novel benchmarks based on DOTA-v1.0. The first benchmark encompasses 19 prevalent corruptions, while the second focuses on cloud-corrupted images, a phenomenon uncommon in natural pictures yet frequent in aerial photography. We systematically evaluate the robustness of mainstream object detection models and perform numerous ablation experiments. Through our investigations, we find that enhanced model architectures, larger networks, well-crafted modules, and judicious data augmentation strategies collectively enhance the robustness of aerial object detection models. The benchmarks we propose and our comprehensive experimental analyses can facilitate research on robust object detection in aerial images. Codes and datasets are available at: (https://github.com/hehaodong530/DOTA-C), Comment: 16 pages
- Published
- 2023
33. Uniqueness of the critical long-range percolation metrics
- Author
-
Ding, Jian, Fan, Zherui, and Huang, Lu-Jing
- Subjects
Mathematics - Probability ,60K35, 05C12, 82B27, 82B43 - Abstract
In this work, we study the random metric for the critical long-range percolation on $\mathbb{Z}^d$. A recent work by B\"aumler [3] implies the subsequential scaling limit, and our main contribution is to prove that the subsequential limit is uniquely characterized by a natural list of axioms. Our proof method is greatly inspired by recent works of Gwynne and Miller [42], and Ding and Gwynne [25] on the uniqueness of Liouville quantum gravity metrics., Comment: 100 pages, 17 figures
- Published
- 2023
34. Towards Generic and Controllable Attacks Against Object Detection
- Author
-
Li, Guopeng, Xu, Yue, Ding, Jian, and Xia, Gui-Song
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence - Abstract
Existing adversarial attacks against Object Detectors (ODs) suffer from two inherent limitations. Firstly, ODs have complicated meta-structure designs, hence most advanced attacks for ODs concentrate on attacking specific detector-intrinsic structures, which makes it hard for them to work on other detectors and motivates us to design a generic attack against ODs. Secondly, most works against ODs make Adversarial Examples (AEs) by generalizing image-level attacks from classification to detection, which brings redundant computations and perturbations in semantically meaningless areas (e.g., backgrounds) and creates an urgent need for controllable attacks on ODs. To this end, we propose a generic white-box attack, LGP (local perturbations with adaptively global attacks), to blind mainstream object detectors with controllable perturbations. For a detector-agnostic attack, LGP tracks high-quality proposals and optimizes three heterogeneous losses simultaneously. In this way, we can fool the crucial components of ODs with a part of their outputs without the limitations of specific structures. Regarding controllability, we establish an object-wise constraint that exploits foreground-background separation adaptively to induce the attachment of perturbations to foregrounds. Experimentally, the proposed LGP successfully attacked sixteen state-of-the-art object detectors on the MS-COCO and DOTA datasets, achieving promising imperceptibility and transferability. Codes are publicly released at https://github.com/liguopeng0923/LGP.git
- Published
- 2023
35. One-arm exponent of critical level-set for metric graph Gaussian free field in high dimensions
- Author
-
Cai, Zhenhao and Ding, Jian
- Subjects
Mathematics - Probability - Abstract
In this paper, we study the critical level-set of the Gaussian free field (GFF) on the metric graph $\widetilde{\mathbb{Z}}^d,d>6$. We prove that the one-arm probability (i.e., the probability of the event that the origin is connected to the boundary of the box $B(N)$) is proportional to $N^{-2}$, where $B(N)$ is centered at the origin and has side length $2\lfloor N \rfloor$. Our proof is greatly inspired by Kozma and Nachmias [29], which proves the analogous result for critical bond percolation for $d\geq 11$, and by Werner [51], which conjectures the similarity between the GFF level-set and bond percolation in general and proves this connection for various geometric aspects.
- Published
- 2023
36. A polynomial-time iterative algorithm for random graph matching with non-vanishing correlation
- Author
-
Ding, Jian and Li, Zhangsong
- Subjects
Computer Science - Data Structures and Algorithms ,Mathematics - Probability ,Mathematics - Statistics Theory ,Statistics - Machine Learning ,68Q87, 90C35 - Abstract
We propose an efficient algorithm for matching two correlated Erd\H{o}s--R\'enyi graphs with $n$ vertices whose edges are correlated through a latent vertex correspondence. When the edge density $q= n^{- \alpha+o(1)}$ for a constant $\alpha \in [0,1)$, we show that our algorithm has polynomial running time and succeeds in recovering the latent matching as long as the edge correlation is non-vanishing. This is closely related to our previous work on a polynomial-time algorithm that matches two Gaussian Wigner matrices with non-vanishing correlation, and provides the first polynomial-time random graph matching algorithm (regardless of the regime of $q$) when the edge correlation is below the square root of Otter's constant (which is $\approx 0.338$)., Comment: 62 pages, 1 figure
- Published
- 2023
37. HGFormer: Hierarchical Grouping Transformer for Domain Generalized Semantic Segmentation
- Author
-
Ding, Jian, Xue, Nan, Xia, Gui-Song, Schiele, Bernt, and Dai, Dengxin
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Current semantic segmentation models have achieved great success under the independent and identically distributed (i.i.d.) condition. However, in real-world applications, test data might come from a different domain than training data. Therefore, it is important to improve model robustness against domain differences. This work studies semantic segmentation under the domain generalization setting, where a model is trained only on the source domain and tested on the unseen target domain. Existing works show that Vision Transformers are more robust than CNNs and show that this is related to the visual grouping property of self-attention. In this work, we propose a novel hierarchical grouping transformer (HGFormer) to explicitly group pixels to form part-level masks and then whole-level masks. The masks at different scales aim to segment out both parts and a whole of classes. HGFormer combines mask classification results at both scales for class label prediction. We assemble multiple interesting cross-domain settings by using seven public semantic segmentation datasets. Experiments show that HGFormer yields more robust semantic segmentation results than per-pixel classification methods and flat grouping transformers, and outperforms previous methods significantly. Code will be available at https://github.com/dingjiansw101/HGFormer., Comment: Accepted by CVPR 2023
- Published
- 2023
38. FreePoint: Unsupervised Point Cloud Instance Segmentation
- Author
-
Zhang, Zhikai, Ding, Jian, Jiang, Li, Dai, Dengxin, and Xia, Gui-Song
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Instance segmentation of point clouds is a crucial task in the 3D field with numerous applications that involve localizing and segmenting objects in a scene. However, achieving satisfactory results requires a large number of manual annotations, which is a time-consuming and expensive process. To alleviate the dependency on annotations, we propose a novel framework, FreePoint, for underexplored unsupervised class-agnostic instance segmentation on point clouds. In detail, we represent the point features by combining coordinates, colors, and self-supervised deep features. Based on the point features, we perform a bottom-up multicut algorithm to segment point clouds into coarse instance masks as pseudo labels, which are used to train a point cloud instance segmentation model. We propose an id-as-feature strategy at this stage to alleviate the randomness of the multicut algorithm and improve the pseudo labels' quality. During training, we propose a weakly-supervised two-step training strategy and corresponding losses to overcome the inaccuracy of coarse masks. FreePoint has achieved breakthroughs in unsupervised class-agnostic instance segmentation on point clouds, outperforming previous traditional methods by over 18.2% and the competitive concurrent work UnScene3D by 5.5% in AP. Additionally, when used as a pretext task and fine-tuned on S3DIS, FreePoint performs significantly better than existing self-supervised pre-training methods with limited annotations and surpasses CSC by 6.0% in AP with 10% annotation masks.
- Published
- 2023
39. Temporal Convolution Network Based Onset Detection and Query by Humming System Design
- Author
-
Hung, Yu Cheng and Ding, Jian-Jiun
- Subjects
Computer Science - Sound ,Computer Science - Multimedia ,Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
Onsets are a key factor in splitting audio into individual notes. In this paper, we ensemble multiple temporal convolution network (TCN) based models and utilize a restricted-frequency-range spectrogram to achieve more robust onset detection. Unlike existing onset detection for query-by-humming (QBH) systems, which works only in clean scenarios, our combination of onset detection and speech enhancement prevents noise from affecting the onset detection function (ODF). Compared to a CNN model, which exploits spatial features of the spectrogram, the TCN model exploits both spatial and temporal features. To enable QBH in noisy scenarios, we apply TCN-based speech enhancement as a preprocessor for QBH. With the combination of TCN-based speech enhancement and onset detection, simulations show that the proposal enables the QBH system in both noisy and clean circumstances with a short response time., Comment: This paper has been withdrawn by the author due to a crucial definition of the probability threshold and several grammar and vocabulary mistakes
- Published
- 2023
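Entry 39's onset detection function (ODF) can be contrasted with the classical baseline it improves upon: spectral flux, i.e., the summed positive change between consecutive STFT magnitude frames, followed by thresholded peak picking. The sketch below implements that classical baseline (not the paper's TCN ensemble); the frame, hop, and threshold values are illustrative defaults.

```python
import numpy as np

def spectral_flux_onsets(x, sr, frame=1024, hop=256, delta=0.1):
    # Classical ODF: positive spectral flux between consecutive
    # Hann-windowed STFT magnitude frames, then simple peak picking.
    window = np.hanning(frame)
    n_frames = 1 + (len(x) - frame) // hop
    mags = np.empty((n_frames, frame // 2 + 1))
    for i in range(n_frames):
        seg = x[i * hop : i * hop + frame] * window
        mags[i] = np.abs(np.fft.rfft(seg))
    # Half-wave rectified frame-to-frame magnitude increase.
    flux = np.maximum(mags[1:] - mags[:-1], 0.0).sum(axis=1)
    flux /= flux.max() + 1e-12  # normalize the ODF to [0, 1]
    # A frame is an onset if it is a local maximum above the threshold.
    peaks = [
        i for i in range(1, len(flux) - 1)
        if flux[i] > delta and flux[i] >= flux[i - 1] and flux[i] > flux[i + 1]
    ]
    return [(i + 1) * hop / sr for i in peaks]  # onset times in seconds
```

Restricting the spectrogram to a limited frequency range, as the paper does, amounts to slicing `mags` to the relevant bins before computing the flux.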
40. Pitch Estimation by Denoising Preprocessor and Hybrid Estimation Model
- Author
-
Hung, Yu Cheng, Chen, Ping Hung, and Ding, Jian Jiun
- Subjects
Computer Science - Sound ,Computer Science - Multimedia ,Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
Pitch estimation, i.e., estimating the fundamental frequency and the MIDI note number, plays a critical role in music signal analysis and vocal signal processing. In this work, we propose a new architecture based on a learning-based enhancement preprocessor and a combination of several traditional and deep-learning pitch estimation methods to achieve better pitch estimation performance in both noisy and clean scenarios. We test 17 different types of noise at 4 SNR (dB) levels. The results show that the proposed pitch estimation performs better in both noisy and clean scenarios with a short response time., Comment: From ICCE-Taiwan
- Published
- 2023
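Among the traditional estimators entry 40 combines with deep models, the simplest is autocorrelation pitch detection: the lag of the strongest autocorrelation peak inside the plausible period range gives the fundamental period. A minimal sketch (an illustrative classical baseline, not the paper's hybrid model; the `fmin`/`fmax` bounds are assumed defaults):

```python
import numpy as np

def estimate_pitch(x, sr, fmin=50.0, fmax=1000.0):
    # Autocorrelation pitch estimator: pick the lag with the highest
    # autocorrelation within the allowed period range.
    x = x - x.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1 :]
    lo = int(sr / fmax)                    # shortest allowed period (samples)
    hi = min(int(sr / fmin), len(ac) - 1)  # longest allowed period (samples)
    lag = lo + int(np.argmax(ac[lo : hi + 1]))
    return sr / lag                        # fundamental frequency in Hz

def freq_to_midi(f):
    # MIDI note number of frequency f (A4 = 440 Hz = MIDI 69).
    return 69 + 12 * np.log2(f / 440.0)
```

The integer-lag resolution limits accuracy at high pitches, which is one reason hybrid schemes refine or replace such estimators.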
41. Branched-chain amino acid transaminase 1 confers EGFR-TKI resistance through epigenetic glycolytic activation
- Author
-
Zhang, Tao, Pan, Zilu, Gao, Jing, Wu, Qingqing, Bai, Gang, Li, Yan, Tong, Linjiang, Feng, Fang, Lai, Mengzhen, Liu, Yingqiang, Song, Peiran, Ning, Yi, Tang, Haotian, Luo, Wen, Chen, Yi, Fang, Yan, Zhang, Hui, Liu, Qiupei, Zhang, Yudi, Wang, Hua, Chen, Zhiwei, Chen, Yi, Geng, Meiyu, Ji, Hongbin, Zhao, Guilong, Zhou, Hu, Ding, Jian, and Xie, Hua
- Published
- 2024
- Full Text
- View/download PDF
42. Evaluating the pro-survival potential of apoptotic bodies derived from 2D- and 3D- cultured adipose stem cells in ischaemic flaps
- Author
-
Yu, Gaoxiang, Ding, Jian, Yang, Ningning, Ge, Lu, Chen, Nuo, Zhang, Xuzi, Wang, Qiuchen, Liu, Xian, Zhang, Xuanlong, Jiang, Xiaoqiong, Geng, Yibo, Zhang, Chenxi, Pan, Jiadong, Wang, Xiangyang, Gao, Weiyang, Li, Zhijie, Zhang, Hongyu, Ni, Wenfei, Xiao, Jian, Zhou, Kailiang, and Yang, Liangliang
- Published
- 2024
- Full Text
- View/download PDF
43. Integration of case-based learning and three-dimensional printing for tetralogy of fallot instruction in clinical medical undergraduates: a randomized controlled trial
- Author
-
Zhao, Jian, Gong, Xin, Ding, Jian, Xiong, Kepin, Zhuang, Kangle, Huang, Rui, Li, Shu, and Miao, Huachun
- Published
- 2024
- Full Text
- View/download PDF
44. Synergistic effect of Gd and Sr on the microstructure and mechanical properties of Al–Si–Mg alloy
- Author
-
Dai, Jiahang, Xia, Xingchuan, Wang, Yao, Wang, Jiangbo, Xin, Wei, Zhang, Enkuan, Ding, Jian, and Liu, Yongchang
- Published
- 2024
- Full Text
- View/download PDF
45. From bench to bedside: current development and emerging trend of KRAS-targeted therapy
- Author
-
Chen, Yi, Liu, Qiu-pei, Xie, Hua, and Ding, Jian
- Published
- 2024
- Full Text
- View/download PDF
46. A Novel Hierarchical Structured mCeO2 NR/Mn3O4/Pt as an Efficient Catalyst for Low-Temperature Toluene Combustion
- Author
-
Wang, Jing, Chen, Hao, Jiao, Chaonan, Dai, Jian, Peng, Yinxian, and Ding, Jian
- Published
- 2024
- Full Text
- View/download PDF
47. Dynamic Coarse-to-Fine Learning for Oriented Tiny Object Detection
- Author
-
Xu, Chang, Ding, Jian, Wang, Jinwang, Yang, Wen, Yu, Huai, Yu, Lei, and Xia, Gui-Song
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Detecting arbitrarily oriented tiny objects poses intense challenges to existing detectors, especially for label assignment. Despite the exploration of adaptive label assignment in recent oriented object detectors, the extreme geometry shape and limited feature of oriented tiny objects still induce severe mismatch and imbalance issues. Specifically, the position prior, positive sample feature, and instance are mismatched, and the learning of extreme-shaped objects is biased and unbalanced due to little proper feature supervision. To tackle these issues, we propose a dynamic prior along with a coarse-to-fine assigner, dubbed DCFL. On one hand, we model the prior, label assignment, and object representation all in a dynamic manner to alleviate the mismatch issue. On the other, we leverage coarse prior matching and a finer posterior constraint to dynamically assign labels, providing appropriate and relatively balanced supervision for diverse instances. Extensive experiments on six datasets show substantial improvements over the baseline. Notably, we obtain state-of-the-art performance for one-stage detectors on the DOTA-v1.5, DOTA-v2.0, and DIOR-R datasets under single-scale training and testing. Codes are available at https://github.com/Chasel-Tsui/mmrotate-dcfl., Comment: Accepted by CVPR2023
- Published
- 2023
48. On the prevalence of the periodicity of maximizing measures
- Author
-
Ding, Jian, Li, Zhiqiang, and Zhang, Yiwei
- Subjects
Mathematics - Dynamical Systems ,Mathematics - Probability ,37A99 (Primary) 05C80, 37A50, 37C40, 37D35 (Secondary) - Abstract
For a continuous map $T: X\rightarrow X$ on a compact metric space $(X,d)$, we say that a function $f: X \rightarrow \mathbb{R}$ has the property $\mathscr{P}_T$ if its time averages along forward orbits of $T$ are maximized at a periodic orbit. In this paper, we prove that for the one-sided full shift on two symbols, the property $\mathscr{P}_T$ is prevalent (in the sense of Hunt--Sauer--Yorke) in spaces of Lipschitz functions with respect to metrics with mildly fast decaying rate on the diameters of cylinder sets. This result is a strengthening of \cite[Theorem~A]{BZ16}, confirms the prediction mentioned in the ICM proceeding contribution of J. Bochi (\cite[Section 1]{Boc18}) suggested by experimental evidence, and is another step towards the Hunt--Ott conjectures in the area of ergodic optimization., Comment: 25 pages
- Published
- 2023
- Full Text
- View/download PDF
49. Few-Shot Object Detection via Variational Feature Aggregation
- Author
-
Han, Jiaming, Ren, Yuqiang, Ding, Jian, Yan, Ke, and Xia, Gui-Song
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
As few-shot object detectors are often trained with abundant base samples and fine-tuned on few-shot novel examples, the learned models are usually biased to base classes and sensitive to the variance of novel examples. To address this issue, we propose a meta-learning framework with two novel feature aggregation schemes. More precisely, we first present a Class-Agnostic Aggregation (CAA) method, where the query and support features can be aggregated regardless of their categories. The interactions between different classes encourage class-agnostic representations and reduce confusion between base and novel classes. Based on the CAA, we then propose a Variational Feature Aggregation (VFA) method, which encodes support examples into class-level support features for robust feature aggregation. We use a variational autoencoder to estimate class distributions and sample variational features from distributions that are more robust to the variance of support examples. Besides, we decouple classification and regression tasks so that VFA is performed on the classification branch without affecting object localization. Extensive experiments on PASCAL VOC and COCO demonstrate that our method significantly outperforms a strong baseline (up to 16\%) and previous state-of-the-art methods (4\% on average). Code will be available at: \url{https://github.com/csuhan/VFA}, Comment: Accepted by AAAI2023
- Published
- 2023
50. Detecting Building Changes with Off-Nadir Aerial Images
- Author
-
Pang, Chao, Wu, Jiang, Ding, Jian, Song, Can, and Xia, Gui-Song
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
The tilted viewing nature of the off-nadir aerial images brings severe challenges to the building change detection (BCD) problem: the mismatch of the nearby buildings and the semantic ambiguity of the building facades. To tackle these challenges, we present a multi-task guided change detection network model, named MTGCD-Net. The proposed model approaches the specific BCD problem by designing three auxiliary tasks, including: (1) a pixel-wise classification task to predict the roofs and facades of buildings; (2) an auxiliary task for learning the roof-to-footprint offsets of each building to account for the misalignment between building roof instances; and (3) an auxiliary task for learning the identical roof matching flow between bi-temporal aerial images to tackle the building roof mismatch problem. These auxiliary tasks provide indispensable and complementary building parsing and matching information. The predictions of the auxiliary tasks are finally fused to the main building change detection branch with a multi-modal distillation module. To train and test models for the BCD problem with off-nadir aerial images, we create a new benchmark dataset, named BANDON. Extensive experiments demonstrate that our model achieves superior performance over the previous state-of-the-art competitors.
- Published
- 2023