15,779 results for "Zhao, Chen"
Search Results
2. SynTQA: Synergistic Table-based Question Answering via Mixture of Text-to-SQL and E2E TQA
- Author
- Zhang, Siyue, Luu, Anh Tuan, and Zhao, Chen
- Subjects
- Computer Science - Computation and Language
- Abstract
Text-to-SQL parsing and end-to-end question answering (E2E TQA) are the two main approaches to the table-based question answering task. Despite success on multiple benchmarks, they have yet to be compared, and their synergy remains unexplored. In this paper, we identify their different strengths and weaknesses by evaluating state-of-the-art models on benchmark datasets: Text-to-SQL is superior on questions involving arithmetic operations and long tables, while E2E TQA excels at ambiguous questions, non-standard table schemas, and complex table contents. To combine these strengths, we propose a synergistic table-based question answering approach that integrates different models via answer selection and is agnostic to model type. Further experiments validate that ensembling models with either a feature-based or an LLM-based answer selector significantly improves performance over individual models., Comment: EMNLP 2024
- Published
- 2024
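The answer-selection strategy described in this abstract can be illustrated with a small sketch: given one candidate answer from a Text-to-SQL parser and one from an E2E TQA model, a selector routes each question to the model whose strengths it matches. This is a hypothetical, hand-crafted stand-in for the paper's learned (feature-based or LLM-based) selector; the cue list and function names are invented for illustration.

```python
def select_answer(question, sql_answer, e2e_answer):
    """Pick between two table-QA candidate answers with a hand-crafted,
    feature-based rule (a toy stand-in for a learned answer selector).

    Heuristic mirroring the abstract's findings: Text-to-SQL tends to win
    on arithmetic questions; E2E TQA on ambiguous/lookup questions.
    """
    arithmetic_cues = ("how many", "sum", "total", "average", "difference")
    q = question.lower()
    if any(cue in q for cue in arithmetic_cues):
        return sql_answer  # arithmetic -> trust the SQL-derived answer
    return e2e_answer      # otherwise -> trust the end-to-end model

print(select_answer("What is the total revenue in 2020?", "1,204", "about 1200"))
print(select_answer("Which city hosted the event?", "row 3", "Paris"))
```

A learned selector would replace the keyword rule with features of the question, table, and both candidate answers, but the routing interface stays the same.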
3. Minimal extension property of direct images
- Author
- Zhao, Chen
- Subjects
- Mathematics - Algebraic Geometry
- Abstract
Given a projective morphism $f:X\to Y$ from a complex space to a complex manifold, we prove the Griffiths semi-positivity and minimal extension property of the direct image sheaf $f_\ast(\mathscr{F})$. Here, $\mathscr{F}$ is a coherent sheaf on $X$, which consists of the Grauert-Riemenschneider dualizing sheaf, a multiplier ideal sheaf, and a variation of Hodge structure (or more generally, a tame harmonic bundle)., Comment: Comments are welcome
- Published
- 2024
4. O-Mamba: O-shape State-Space Model for Underwater Image Enhancement
- Author
- Dong, Chenyu, Zhao, Chen, Cai, Weiling, and Yang, Bo
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
Underwater image enhancement (UIE) faces significant challenges due to complex underwater lighting conditions. Recently, Mamba-based methods have achieved promising results in image enhancement tasks. However, these methods commonly rely on VMamba, which models only spatial information and struggles with the cross-color-channel dependencies in underwater images caused by the differential attenuation of light wavelengths, limiting the effective use of deep networks. In this paper, we propose a novel UIE framework called O-Mamba. O-Mamba employs an O-shaped dual-branch network to separately model spatial and cross-channel information, utilizing the efficient global receptive field of state-space models optimized for underwater images. To enhance information interaction between the two branches and effectively exploit multi-scale information, we design a Multi-scale Bi-mutual Promotion Module, which includes MS-MoE for fusing multi-scale information within branches, a Mutual Promotion module for interaction between spatial and channel information across branches, and a cyclic multi-scale optimization strategy to maximize the use of multi-scale information. Extensive experiments demonstrate that our method achieves state-of-the-art (SOTA) results. The code is available at https://github.com/chenydong/O-Mamba.
- Published
- 2024
5. BURExtract-Llama: An LLM for Clinical Concept Extraction in Breast Ultrasound Reports
- Author
- Chen, Yuxuan, Yang, Haoyan, Pan, Hengkai, Siddiqui, Fardeen, Verdone, Antonio, Zhang, Qingyang, Chopra, Sumit, Zhao, Chen, and Shen, Yiqiu
- Subjects
- Computer Science - Computation and Language; Computer Science - Artificial Intelligence
- Abstract
Breast ultrasound is essential for detecting and diagnosing abnormalities, with radiology reports summarizing key findings like lesion characteristics and malignancy assessments. Extracting this critical information is challenging due to the unstructured nature of these reports, with varied linguistic styles and inconsistent formatting. While proprietary LLMs like GPT-4 are effective, they are costly and raise privacy concerns when handling protected health information. This study presents a pipeline for developing an in-house LLM to extract clinical information from radiology reports. We first use GPT-4 to create a small labeled dataset, then fine-tune a Llama3-8B model on it. Evaluated on clinician-annotated reports, our model achieves an average F1 score of 84.6%, which is on par with GPT-4. Our findings demonstrate the feasibility of developing an in-house LLM that not only matches GPT-4's performance but also offers cost reductions and enhanced data privacy., Comment: This paper has been accepted as the oral paper for the HCHM workshop, ACM Multimedia 2024
- Published
- 2024
6. CO2Wounds-V2: Extended Chronic Wounds Dataset From Leprosy Patients
- Author
- Sanchez, Karen, Hinojosa, Carlos, Mieles, Olinto, Zhao, Chen, Ghanem, Bernard, and Arguello, Henry
- Subjects
- Electrical Engineering and Systems Science - Image and Video Processing; Computer Science - Computer Vision and Pattern Recognition
- Abstract
Chronic wounds pose an ongoing global health concern, largely due to the prevalence of conditions such as diabetes and leprosy. The standard method of monitoring these wounds is visual inspection by healthcare professionals, a practice that can be challenging for patients in remote areas with inadequate transportation and healthcare infrastructure. This has led to the development of algorithms for the analysis and follow-up of wound images, which perform image-processing tasks such as classification, detection, and segmentation. However, the effectiveness of these algorithms depends heavily on the availability of comprehensive and varied wound image data, which is usually scarce. This paper introduces the CO2Wounds-V2 dataset, an extended collection of RGB wound images from leprosy patients with corresponding semantic segmentation annotations, aiming to enhance the development and testing of image-processing algorithms in the medical field., Comment: 2024 IEEE International Conference on Image Processing (ICIP 2024)
- Published
- 2024
7. Learning Fair Invariant Representations under Covariate and Correlation Shifts Simultaneously
- Author
- Li, Dong, Zhao, Chen, Shao, Minglai, and Wang, Wenjun
- Subjects
- Computer Science - Machine Learning; Computer Science - Artificial Intelligence; Computer Science - Computers and Society
- Abstract
Achieving the generalization of an invariant classifier from training domains to shifted test domains while simultaneously considering model fairness is a substantial and complex challenge in machine learning. Existing methods address the problem of fairness-aware domain generalization, focusing on either covariate shift or correlation shift, but rarely consider both at the same time. In this paper, we introduce a novel approach that focuses on learning a fairness-aware domain-invariant predictor within a framework addressing both covariate and correlation shifts simultaneously, ensuring its generalization to unknown test domains inaccessible during training. In our approach, data are first disentangled into content and style factors in latent spaces. Furthermore, fairness-aware domain-invariant content representations can be learned by mitigating sensitive information and retaining as much other information as possible. Extensive empirical studies on benchmark datasets demonstrate that our approach surpasses state-of-the-art methods with respect to model accuracy as well as both group and individual fairness., Comment: CIKM 2024
- Published
- 2024
8. Reproduction of NGC1052-DF4 by self-interacting dark matter: dark matter deficiency and tidal features
- Author
- Zhang, Zhao-Chen, Bi, Xiao-Jun, and Yin, Peng-Fei
- Subjects
- Astrophysics - Cosmology and Nongalactic Astrophysics; High Energy Physics - Phenomenology
- Abstract
Observations of the velocity dispersion indicate a severe dark matter (DM) deficit in the ultra-diffuse galaxy NGC1052-DF4 (DF4). Ultra-deep images obtained with the Gemini telescope, the deepest imaging data to date, confirm the presence of tidal tails in DF4, suggesting a tidal formation. To enhance tidal effects, we consider self-interaction among DM particles. Using an N-body simulation in the self-interacting dark matter (SIDM) scenario, we reproduce a DM-deficient galaxy consistent with all observational data on DF4. Specifically, our simulation yields an extremely low DM-to-star mass ratio and a radial surface brightness profile very similar to that from the deep images, reproducing the tidal features accurately. By performing simulations with similar tidal effects and various SIDM cross-sections, we show that SIDM has a significant impact on the DM-to-star mass ratio in the central region of the galaxy. Our work confirms the tidal formation of DF4 in theory., Comment: 9 pages, 6 figures
- Published
- 2024
9. Harnessing Temporal Causality for Advanced Temporal Action Detection
- Author
- Liu, Shuming, Sui, Lin, Zhang, Chen-Lin, Mu, Fangzhou, Zhao, Chen, and Ghanem, Bernard
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
As a fundamental task in long-form video understanding, temporal action detection (TAD) aims to capture inherent temporal relations in untrimmed videos and identify candidate actions with precise boundaries. Over the years, various networks, including convolutions, graphs, and transformers, have been explored for effective temporal modeling for TAD. However, these modules typically treat past and future information equally, overlooking the crucial fact that changes in action boundaries are essentially causal events. Inspired by this insight, we propose leveraging the temporal causality of actions to enhance TAD representation by restricting the model's access to only past or future context. We introduce CausalTAD, which combines causal attention and causal Mamba to achieve state-of-the-art performance on multiple benchmarks. Notably, with CausalTAD, we ranked 1st in the Action Recognition, Action Detection, and Audio-Based Interaction Detection tracks at the EPIC-Kitchens Challenge 2024, as well as 1st in the Moment Queries track at the Ego4D Challenge 2024. Our code is available at https://github.com/sming256/OpenTAD/., Comment: 1st in Moment Queries track at the Ego4D Challenge 2024; 1st in Action Recognition, Action Detection, and Audio-Based Interaction Detection tracks at the EPIC-Kitchens Challenge 2024
- Published
- 2024
10. Surfel-based Gaussian Inverse Rendering for Fast and Relightable Dynamic Human Reconstruction from Monocular Video
- Author
- Zhao, Yiqun, Wu, Chenming, Huang, Binbin, Zhi, Yihao, Zhao, Chen, Wang, Jingdong, and Gao, Shenghua
- Subjects
- Computer Science - Computer Vision and Pattern Recognition; Computer Science - Graphics
- Abstract
Efficient and accurate reconstruction of a relightable, dynamic clothed human avatar from a monocular video is crucial for the entertainment industry. This paper introduces the Surfel-based Gaussian Inverse Avatar (SGIA) method, which enables efficient training and rendering for relightable dynamic human reconstruction. SGIA advances previous Gaussian avatar methods by comprehensively modeling Physically-Based Rendering (PBR) properties of clothed human avatars, allowing avatars to be manipulated into novel poses under diverse lighting conditions. Specifically, our approach integrates pre-integration and image-based lighting for fast light calculations that surpass the performance of existing implicit-based techniques. To address the challenges of material-lighting disentanglement and accurate geometry reconstruction, we propose an innovative occlusion approximation strategy and a progressive training approach. Extensive experiments demonstrate that SGIA not only recovers highly accurate physical properties but also significantly enhances the realistic relighting of dynamic human avatars, providing a substantial speed advantage. We present more results on our project page: https://GS-IA.github.io., Comment: Under Review; Project Page: https://GS-IA.github.io
- Published
- 2024
11. Cycle Contrastive Adversarial Learning for Unsupervised image Deraining
- Author
- Zhao, Chen, Cai, Weiling, Hu, ChengWei, and Yuan, Zheng
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
To tackle the difficulty of fitting paired real-world data for single image deraining (SID), recent unsupervised methods have achieved notable success. However, these methods often struggle to generate high-quality, rain-free images because they pay little attention to semantic representation and image content, resulting in ineffective separation of content from the rain layer. In this paper, we propose a novel cycle contrastive generative adversarial network for unsupervised SID, called CCLGAN. The framework combines cycle contrastive learning (CCL) and location contrastive learning (LCL). CCL improves image reconstruction and rain-layer removal by pulling similar features closer and pushing dissimilar features apart in both the semantic and discriminative spaces. At the same time, LCL preserves content information by constraining mutual information at the same location across different exemplars. Extensive experiments demonstrate the superior performance of CCLGAN and the effectiveness of its components.
- Published
- 2024
12. Variational Quantum Imaginary Time Evolution for Matrix Product State Ansatz with Tests on Transcorrelated Hamiltonians
- Author
- Li, Hao-En, Li, Xiang, Huang, Jia-Cheng, Zhang, Guang-Ze, Shen, Zhu-Ping, Zhao, Chen, Li, Jun, and Hu, Han-Shi
- Subjects
- Quantum Physics; Physics - Chemical Physics
- Abstract
The matrix product state (MPS) ansatz offers a promising approach for finding the ground state of molecular Hamiltonians and solving quantum chemistry problems. Building on this concept, the proposed technique of quantum circuit MPS (QCMPS) enables the simulation of chemical systems using a relatively small number of qubits. In this study, we enhance the optimization performance of the QCMPS ansatz by employing the variational quantum imaginary time evolution (VarQITE) approach. Guided by McLachlan's variational principle, the VarQITE method provides analytical metrics and gradients, resulting in improved convergence efficiency and robustness of the QCMPS. We validate these improvements numerically through simulations of $\rm H_2$, $\rm H_4$, and $\rm LiH$ molecules. Additionally, given that VarQITE is applicable to non-Hermitian Hamiltonians, we evaluate its effectiveness in preparing the ground state of transcorrelated (TC) Hamiltonians. This approach yields energy estimates comparable to the complete basis set (CBS) limit while using even fewer qubits. Specifically, we perform simulations of the beryllium atom and $\rm LiH$ molecule using only three qubits, maintaining high fidelity with the CBS ground state energy of these systems. This qubit reduction is achieved through the combined advantages of both the QCMPS ansatz and transcorrelation. Our findings demonstrate the potential practicality of this quantum chemistry algorithm on near-term quantum devices., Comment: 15 pages, 8 figures
- Published
- 2024
13. Hybrid Structure-from-Motion and Camera Relocalization for Enhanced Egocentric Localization
- Author
- Mai, Jinjie, Hamdi, Abdullah, Giancola, Silvio, Zhao, Chen, and Ghanem, Bernard
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
We build our pipeline, EgoLoc-v1, mainly inspired by EgoLoc. We propose a model-ensemble strategy to improve the camera pose estimation part of the VQ3D task, which previous work has shown to be essential. The core idea is not only to run SfM on egocentric videos but also to perform 2D-3D matching between existing 3D scans and 2D video frames. In this way, we obtain a hybrid SfM and camera relocalization pipeline that provides more camera poses, leading to a higher QwP and overall success rate. Our method achieves the best performance on the most important metric, the overall success rate, surpassing the previous state of the art, the competitive EgoLoc, by $1.5\%$. The code is available at https://github.com/Wayne-Mai/egoloc_v1., Comment: 1st place winner of the 2024 Ego4D-Ego-Exo4D Challenge in VQ3D
- Published
- 2024
14. Large-scale quantum reservoir learning with an analog quantum computer
- Author
- Kornjača, Milan, Hu, Hong-Ye, Zhao, Chen, Wurtz, Jonathan, Weinberg, Phillip, Hamdan, Majd, Zhdanov, Andrii, Cantu, Sergio H., Zhou, Hengyun, Bravo, Rodrigo Araiza, Bagnall, Kevin, Basham, James I., Campo, Joseph, Choukri, Adam, DeAngelo, Robert, Frederick, Paige, Haines, David, Hammett, Julian, Hsu, Ning, Hu, Ming-Guang, Huber, Florian, Jepsen, Paul Niklas, Jia, Ningyuan, Karolyshyn, Thomas, Kwon, Minho, Long, John, Lopatin, Jonathan, Lukin, Alexander, Macrì, Tommaso, Marković, Ognjen, Martínez-Martínez, Luis A., Meng, Xianmei, Ostroumov, Evgeny, Paquette, David, Robinson, John, Rodriguez, Pedro Sales, Singh, Anshuman, Sinha, Nandan, Thoreen, Henry, Wan, Noel, Waxman-Lenz, Daniel, Wong, Tak, Wu, Kai-Hsin, Lopes, Pedro L. S., Boger, Yuval, Gemelke, Nathan, Kitagawa, Takuya, Keesling, Alexander, Gao, Xun, Bylinskii, Alexei, Yelin, Susanne F., Liu, Fangli, and Wang, Sheng-Tao
- Subjects
- Quantum Physics; Condensed Matter - Disordered Systems and Neural Networks; Physics - Atomic Physics
- Abstract
Quantum machine learning has gained considerable attention as quantum technology advances, presenting a promising approach for efficiently learning complex data patterns. Despite this promise, most contemporary quantum methods require significant resources for variational parameter optimization and face issues with vanishing gradients, leading to experiments that are either limited in scale or lack potential for quantum advantage. To address this, we develop a general-purpose, gradient-free, and scalable quantum reservoir learning algorithm that harnesses the quantum dynamics of neutral-atom analog quantum computers to process data. We experimentally implement the algorithm, achieving competitive performance across various categories of machine learning tasks, including binary and multi-class classification, as well as timeseries prediction. Effective and improving learning is observed with increasing system sizes of up to 108 qubits, demonstrating the largest quantum machine learning experiment to date. We further observe comparative quantum kernel advantage in learning tasks by constructing synthetic datasets based on the geometric differences between generated quantum and classical data kernels. Our findings demonstrate the potential of utilizing classically intractable quantum correlations for effective machine learning. We expect these results to stimulate further extensions to different quantum hardware and machine learning paradigms, including early fault-tolerant hardware and generative machine learning tasks., Comment: 10 + 14 pages, 4 + 7 figures
- Published
- 2024
15. Semantic-guided Adversarial Diffusion Model for Self-supervised Shadow Removal
- Author
- Zeng, Ziqi, Zhao, Chen, Cai, Weiling, and Dong, Chenyu
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
Existing unsupervised methods have addressed the challenges of inconsistent paired data and the tedious acquisition of ground-truth labels in shadow removal tasks. However, GAN-based training often suffers from mode collapse and unstable optimization. Furthermore, because of the complex mapping between the shadow and shadow-free domains, adversarial learning alone is not enough to capture the underlying relationship between the two domains, resulting in low-quality generated images. To address these problems, we propose a two-stage, semantic-guided adversarial diffusion framework for self-supervised shadow removal. In the first stage, a semantic-guided generative adversarial network (SG-GAN) produces a coarse result and constructs paired synthetic data through a cycle-consistent structure. In the second stage, the coarse result is refined with a diffusion-based restoration module (DBRM) to enhance texture details and suppress edge artifacts. Meanwhile, we propose a multi-modal semantic prompter (MSP) that extracts accurate semantic information from real images and text, guiding the shadow removal network in SG-GAN to restore images better. Experiments on multiple public datasets demonstrate the effectiveness of our method.
- Published
- 2024
16. XLD: A Cross-Lane Dataset for Benchmarking Novel Driving View Synthesis
- Author
- Li, Hao, Yuan, Ming, Zhang, Yan, Wu, Chenming, Zhao, Chen, Song, Chunyu, Feng, Haocheng, Ding, Errui, Zhang, Dingwen, and Wang, Jingdong
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
Thoroughly testing autonomy systems is crucial in the pursuit of safe autonomous driving vehicles. It necessitates creating safety-critical scenarios that go beyond what can be safely collected from real-world data, as many of these scenarios occur infrequently on public roads. However, the evaluation of most existing NVS methods relies on sporadic sampling of image frames from the training data, comparing the rendered images with ground-truth images using image-quality metrics. This evaluation protocol falls short of the actual requirements of closed-loop simulation. Specifically, the true application demands the capability to render novel views that extend beyond the original trajectory (such as cross-lane views), which are difficult to capture in the real world. To bridge this gap, this paper presents a novel driving view synthesis dataset and benchmark specifically designed for autonomous driving simulations. The dataset is unique in that it includes testing images captured by deviating from the training trajectory by 1-4 meters. It comprises six sequences covering various times and weather conditions. Each sequence contains 450 training images, 150 testing images, and the corresponding camera poses and intrinsic parameters. Leveraging this novel dataset, we establish the first realistic benchmark for evaluating existing NVS approaches under front-only and multi-camera settings. The experimental findings underscore the significant gap in current approaches, revealing their inadequate ability to fulfill the demanding prerequisites of cross-lane or closed-loop simulation. Our dataset is released publicly at the project page: https://3d-aigc.github.io/XLD/., Comment: project page: https://3d-aigc.github.io/XLD/
- Published
- 2024
17. VDG: Vision-Only Dynamic Gaussian for Driving Simulation
- Author
- Li, Hao, Li, Jingfeng, Zhang, Dingwen, Wu, Chenming, Shi, Jieqi, Zhao, Chen, Feng, Haocheng, Ding, Errui, Wang, Jingdong, and Han, Junwei
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
Dynamic Gaussian splatting has led to impressive advances in scene reconstruction and novel-view image synthesis. Existing methods, however, rely heavily on pre-computed poses and Gaussian initialization from Structure-from-Motion (SfM) algorithms or expensive sensors. For the first time, this paper addresses this issue by integrating self-supervised visual odometry (VO) into our pose-free dynamic Gaussian method (VDG) to boost pose and depth initialization and static-dynamic decomposition. Moreover, VDG works with RGB image input only and constructs dynamic scenes faster and at larger scale than pose-free dynamic view-synthesis methods. We demonstrate the robustness of our approach via extensive quantitative and qualitative experiments. Our results show favorable performance over state-of-the-art dynamic view synthesis methods. Additional videos and source code will be posted on our project page at https://3d-aigc.github.io/VDG.
- Published
- 2024
18. Algorithmic Fault Tolerance for Fast Quantum Computing
- Author
- Zhou, Hengyun, Zhao, Chen, Cain, Madelyn, Bluvstein, Dolev, Duckering, Casey, Hu, Hong-Ye, Wang, Sheng-Tao, Kubica, Aleksander, and Lukin, Mikhail D.
- Subjects
- Quantum Physics
- Abstract
Fast, reliable logical operations are essential for the realization of useful quantum computers, as they are required to implement practical quantum algorithms at large scale. By redundantly encoding logical qubits into many physical qubits and using syndrome measurements to detect and subsequently correct errors, one can achieve very low logical error rates. However, for most practical quantum error correcting (QEC) codes such as the surface code, it is generally believed that due to syndrome extraction errors, multiple extraction rounds -- on the order of the code distance d -- are required for fault-tolerant computation. Here, we show that contrary to this common belief, fault-tolerant logical operations can be performed with constant time overhead for a broad class of QEC codes, including the surface code with magic state inputs and feed-forward operations, to achieve "algorithmic fault tolerance". Through the combination of transversal operations and novel strategies for correlated decoding, despite only having access to partial syndrome information, we prove that the deviation from the ideal measurement result distribution can be made exponentially small in the code distance. We supplement this proof with circuit-level simulations in a range of relevant settings, demonstrating the fault tolerance and competitive performance of our approach. Our work sheds new light on the theory of fault tolerance, potentially reducing the space-time cost of practical fault-tolerant quantum computation by orders of magnitude.
- Published
- 2024
19. FADE: Towards Fairness-aware Augmentation for Domain Generalization via Classifier-Guided Score-based Diffusion Models
- Author
- Lin, Yujie, Li, Dong, Zhao, Chen, and Shao, Minglai
- Subjects
- Computer Science - Machine Learning; Computer Science - Artificial Intelligence
- Abstract
Fairness-aware domain generalization (FairDG) has emerged as a critical challenge for deploying trustworthy AI systems, particularly in scenarios involving distribution shifts. Traditional methods for addressing fairness have failed in domain generalization due to their lack of consideration for distribution shifts. Although disentanglement has been used to tackle FairDG, it is limited by its strong assumptions. To overcome these limitations, we propose Fairness-aware Classifier-Guided Score-based Diffusion Models (FADE) as a novel approach to effectively address the FairDG issue. Specifically, we first pre-train a score-based diffusion model (SDM) and two classifiers to equip the model with strong generalization capabilities across different domains. Then, we guide the SDM using these pre-trained classifiers to effectively eliminate sensitive information from the generated data. Finally, the generated fair data is used to train downstream classifiers, ensuring robust performance under new data distributions. Extensive experiments on three real-world datasets demonstrate that FADE not only enhances fairness but also improves accuracy in the presence of distribution shifts. Additionally, FADE outperforms existing methods in achieving the best accuracy-fairness trade-offs.
- Published
- 2024
20. AGFA-Net: Attention-Guided and Feature-Aggregated Network for Coronary Artery Segmentation using Computed Tomography Angiography
- Author
- Liu, Xinyun and Zhao, Chen
- Subjects
- Electrical Engineering and Systems Science - Image and Video Processing; Computer Science - Computer Vision and Pattern Recognition
- Abstract
Coronary artery disease (CAD) remains a prevalent cardiovascular condition, posing significant health risks worldwide. This pathology, characterized by plaque accumulation in coronary artery walls, leads to myocardial ischemia and symptoms including chest pain and shortness of breath. Accurate segmentation of coronary arteries from coronary computed tomography angiography (CCTA) images is crucial for diagnosis and treatment planning. Traditional segmentation methods face challenges in handling low-contrast images and complex anatomical structures. In this study, we propose an attention-guided, feature-aggregated 3D deep network (AGFA-Net) for coronary artery segmentation from CCTA images. AGFA-Net leverages attention mechanisms and feature refinement modules to capture salient features and enhance segmentation accuracy. Evaluation on a dataset comprising 1,000 CCTA scans demonstrates AGFA-Net's superior performance, achieving an average Dice similarity coefficient of 86.74% and a Hausdorff distance of 0.23 mm under 5-fold cross-validation. Ablation studies further validate the effectiveness of the proposed modules, highlighting their contributions to improved segmentation accuracy. Overall, AGFA-Net offers a robust and reliable solution for coronary artery segmentation, addressing the challenges posed by varying vessel sizes, complex anatomies, and low image contrast., Comment: 13 pages, 7 figures
- Published
- 2024
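The Dice metric quoted above is a standard overlap score between a predicted and a reference segmentation mask; a minimal, self-contained computation (independent of AGFA-Net itself):

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    """Dice similarity coefficient between two binary masks:
    DSC = 2|A ∩ B| / (|A| + |B|); 1.0 is perfect overlap, 0.0 is none."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    return 2.0 * intersection / (pred.sum() + target.sum() + eps)

# Two toy 2x3 masks sharing 2 of their 3 foreground pixels each.
pred   = np.array([[1, 1, 0], [0, 1, 0]])
target = np.array([[1, 0, 0], [0, 1, 1]])
print(round(dice_coefficient(pred, target), 3))  # → 0.667
```

The small `eps` term guards against division by zero when both masks are empty; segmentation papers sometimes instead define the empty-vs-empty case as a perfect score of 1.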
21. Hofstadter spectrum in a semiconductor moiré lattice
- Author
- Zhao, Chen, Wu, Ming, Ma, Zhen, Liang, Miao, Lu, Ming, Gao, Jin-Hua, and Xie, X. C.
- Subjects
- Condensed Matter - Mesoscale and Nanoscale Physics
- Abstract
Recently, the Hofstadter spectrum of a twisted $\mathrm{WSe_2/MoSe_2}$ heterobilayer was observed in experiment [C. R. Kometter et al., Nat. Phys. 19, 1861 (2023)], but the origin of the Hofstadter states remained unclear. Here, we present a comprehensive theoretical interpretation of the observed Hofstadter states by calculating an accurate Hofstadter spectrum. We point out that the valley Zeeman effect, a unique feature of transition metal dichalcogenide (TMD) materials, plays a crucial role in determining the shape of the Hofstadter spectrum, owing to the narrow bandwidth of the moiré bands. This is distinct from graphene-based moiré systems. We further predict that the Hofstadter spectrum of the moiré flat band, which was not observed in the experiment, can be observed in the same system at a larger twist angle $2^\circ\lesssim\theta\lesssim 3^\circ$. Our theory paves the way for further studies of the interplay between Hofstadter states and correlated insulating states in such moiré lattice systems., Comment: 7 pages, 4 figures
- Published
- 2024
22. OpenGaussian: Towards Point-Level 3D Gaussian-based Open Vocabulary Understanding
- Author
- Wu, Yanmin, Meng, Jiarui, Li, Haijie, Wu, Chenming, Shi, Yahao, Cheng, Xinhua, Zhao, Chen, Feng, Haocheng, Ding, Errui, Wang, Jingdong, and Zhang, Jian
- Subjects
- Computer Science - Computer Vision and Pattern Recognition; Computer Science - Robotics
- Abstract
This paper introduces OpenGaussian, a method based on 3D Gaussian Splatting (3DGS) capable of 3D point-level open-vocabulary understanding. Our primary motivation stems from the observation that existing 3DGS-based open-vocabulary methods mainly focus on 2D pixel-level parsing. These methods struggle with 3D point-level tasks due to weak feature expressiveness and inaccurate 2D-3D feature associations. To ensure robust feature representation and 3D point-level understanding, we first train instance features with 3D consistency using SAM masks without cross-frame associations. These features exhibit both intra-object consistency and inter-object distinction. We then propose a two-stage codebook that discretizes these features from coarse to fine levels. At the coarse level, the positional information of the 3D points enables location-based clustering, which is then refined at the fine level. Finally, we introduce an instance-level 3D-2D feature association method that links 3D points to 2D masks, which are in turn associated with 2D CLIP features. Extensive experiments, including open-vocabulary 3D object selection, 3D point cloud understanding, click-based 3D object selection, and ablation studies, demonstrate the effectiveness of the proposed method. Project page: https://3d-aigc.github.io/OpenGaussian, Comment: technical report, 15 pages
- Published
- 2024
23. A Staged Approach using Machine Learning and Uncertainty Quantification to Predict the Risk of Hip Fracture
- Author
-
Shaik, Anjum, Larsen, Kristoffer, Lane, Nancy E., Zhao, Chen, Su, Kuan-Jui, Keyak, Joyce H., Tian, Qing, Sha, Qiuying, Shen, Hui, Deng, Hong-Wen, and Zhou, Weihua
- Subjects
Physics - Medical Physics ,Computer Science - Machine Learning - Abstract
Despite advancements in medical care, hip fractures impose a significant burden on individuals and healthcare systems. This paper focuses on the prediction of hip fracture risk in older and middle-aged adults, where falls and compromised bone quality are predominant factors. We propose a novel staged model that combines advanced imaging and clinical data to improve predictive performance. By using CNNs to extract features from hip DXA images, along with clinical variables, shape measurements, and texture features, our method provides a comprehensive framework for assessing fracture risk. A staged machine learning-based model was developed using two ensemble models: Ensemble 1 (clinical variables only) and Ensemble 2 (clinical variables and DXA imaging features). This staged approach used uncertainty quantification from Ensemble 1 to decide if DXA features are necessary for further prediction. Ensemble 2 exhibited the highest performance, achieving an AUC of 0.9541, an accuracy of 0.9195, a sensitivity of 0.8078, and a specificity of 0.9427. The staged model also performed well, with an AUC of 0.8486, an accuracy of 0.8611, a sensitivity of 0.5578, and a specificity of 0.9249, outperforming Ensemble 1, which had an AUC of 0.5549, an accuracy of 0.7239, a sensitivity of 0.1956, and a specificity of 0.8343. Furthermore, the staged model suggested that 54.49% of patients did not require DXA scanning. It effectively balanced accuracy and specificity, offering a robust solution when DXA data acquisition is not always feasible. Statistical tests confirmed significant differences between the models, highlighting the advantages of the advanced modeling strategies. Our staged approach could identify individuals at risk with high accuracy while reducing unnecessary DXA scanning. It holds great promise for guiding interventions to prevent hip fractures at reduced cost and radiation exposure., Comment: 29 pages, 5 figures, 6 tables
- Published
- 2024
24. Moir\'e flat bands in alternating twisted $\mathrm{MoTe_2}$ multilayer
- Author
-
Liang, Miao, Ding, Shi-Ping, Wu, Ming, Zhao, Chen, and Gao, Jin-Hua
- Subjects
Condensed Matter - Mesoscale and Nanoscale Physics - Abstract
The long-awaited fractional quantum anomalous Hall (FQAH) effect recently has been observed in the twisted $\mathrm{MoTe_2}$ homobilayers, causing a great sensation. Here, we theoretically investigate the moir\'e band structures of a closely related system, the alternating twisted multilayer $\mathrm{MoTe_2}$ (ATML-$\mathrm{MoTe_2}$), where the adjacent layers have opposite twist angles. We illustrate that such ATML-$\mathrm{MoTe_2}$ is a very unique moir\'e system, exhibiting multiple topological flat bands highly controllable by the layer number and twist angle, which is not only an ideal platform to simulate Hubbard model, but also may host FQAH states. Specifically, an N-layer ATML-$\mathrm{MoTe_2}$ ($N \geq 3$) always possesses $N-2$ topological flat bands near Fermi energy $E_f$, which has an odd-even dependent decomposition rule to understand the behaviors of the moir\'e flat bands. We predict three intriguing examples: (1) The AT3L-$\mathrm{MoTe_2}$ ($N=3$) has one isolated moir\'e flat band, which corresponds to a triangular lattice Hubbard model, resembling the twisted TMD heterobilayers. (2) The AT4L-$\mathrm{MoTe_2}$ ($N=4$) has two topological flat bands that are very similar to the twisted $\mathrm{MoTe_2}$ homobilayers, implying the possible existence of FQAH states. (3) When $N>4$, the giant density of states (DOS) induced by the multiple moir\'e flat bands may induce exotic correlated states., Comment: 11 pages, 5 figures
- Published
- 2024
25. A review on machine learning for arterial extraction and quantitative assessment on invasive coronary angiograms
- Author
-
Baral, Pukar, Zhao, Chen, Esposito, Michele, and Zhou, Weihua
- Subjects
Physics - Medical Physics - Abstract
Purpose of Review: Machine learning has recently developed rapidly in the field of medicine, playing an important role in disease diagnosis. The aim of this paper is to provide an overview of advancements in machine learning techniques applied to invasive coronary angiography (ICA) for segmentation of coronary arteries and for quantitative evaluation such as fractional flow reserve (FFR) and stenosis assessment. Recent Findings: ICA is used extensively along with machine learning techniques for the segmentation of arteries and for quantitative evaluation of stenosis, coronary artery disease, and fractional flow reserve, representing a trend towards computational methods for enhanced diagnostic precision in cardiovascular medicine. Summary: Various research studies have been conducted in this field, each using different algorithms and datasets. Their performance largely depends on the algorithms employed and the datasets used for training and evaluation. Despite this progress, however, there remains a need for machine learning (ML) algorithms that can be easily integrated into clinical practice.
- Published
- 2024
26. Causal Diffusion Autoencoders: Toward Counterfactual Generation via Diffusion Probabilistic Models
- Author
-
Komanduri, Aneesh, Zhao, Chen, Chen, Feng, and Wu, Xintao
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence ,Statistics - Methodology - Abstract
Diffusion probabilistic models (DPMs) have become the state-of-the-art in high-quality image generation. However, DPMs have an arbitrary noisy latent space with no interpretable or controllable semantics. Although there has been significant research effort to improve image sample quality, there is little work on representation-controlled generation using diffusion models. Specifically, causal modeling and controllable counterfactual generation using DPMs is an underexplored area. In this work, we propose CausalDiffAE, a diffusion-based causal representation learning framework to enable counterfactual generation according to a specified causal model. Our key idea is to use an encoder to extract high-level semantically meaningful causal variables from high-dimensional data and model stochastic variation using reverse diffusion. We propose a causal encoding mechanism that maps high-dimensional data to causally related latent factors and parameterize the causal mechanisms among latent factors using neural networks. To enforce the disentanglement of causal variables, we formulate a variational objective and leverage auxiliary label information in a prior to regularize the latent space. We propose a DDIM-based counterfactual generation procedure subject to do-interventions. Finally, to address the limited label supervision scenario, we also study the application of CausalDiffAE when a part of the training data is unlabeled, which also enables granular control over the strength of interventions in generating counterfactuals during inference. We empirically show that CausalDiffAE learns a disentangled latent space and is capable of generating high-quality counterfactual images., Comment: Accepted to the 27th European Conference on Artificial Intelligence (ECAI 2024)
- Published
- 2024
27. Improved Optimization for the Neural-network Quantum States and Tests on the Chromium Dimer
- Author
-
Li, Xiang, Huang, Jia-Cheng, Zhang, Guang-Ze, Li, Hao-En, Shen, Zhu-Ping, Zhao, Chen, Li, Jun, and Hu, Han-Shi
- Subjects
Physics - Chemical Physics ,Quantum Physics - Abstract
The advent of Neural-network Quantum States (NQS) has significantly advanced wave function ansatz research, sparking a resurgence in orbital space variational Monte Carlo (VMC) exploration. This work introduces three algorithmic enhancements to reduce computational demands of VMC optimization using NQS: an adaptive learning rate algorithm, constrained optimization, and block optimization. We evaluate the refined algorithm on complex multireference bond stretches of $\rm H_2O$ and $\rm N_2$ within the cc-pVDZ basis set and calculate the ground-state energy of the strongly correlated chromium dimer ($\rm Cr_2$) in the Ahlrichs SV basis set. Our results achieve superior accuracy compared to coupled cluster theory at a relatively modest CPU cost. This work demonstrates how to enhance optimization efficiency and robustness using these strategies, opening a new path to optimize large-scale Restricted Boltzmann Machine (RBM)-based NQS more effectively and marking a substantial advancement in NQS's practical quantum chemistry applications., Comment: 13 pages, 9 figures, and 2 tables
- Published
- 2024
- Full Text
- View/download PDF
28. Gaussian-LIC: Real-Time Photo-Realistic SLAM with Gaussian Splatting and LiDAR-Inertial-Camera Fusion
- Author
-
Lang, Xiaolei, Li, Laijian, Wu, Chenming, Zhao, Chen, Liu, Lina, Liu, Yong, Lv, Jiajun, and Zuo, Xingxing
- Subjects
Computer Science - Robotics - Abstract
In this paper, we present a real-time photo-realistic SLAM method based on marrying Gaussian Splatting with LiDAR-Inertial-Camera SLAM. Most existing radiance-field-based SLAM systems mainly focus on bounded indoor environments, equipped with RGB-D or RGB sensors. However, their performance tends to degrade when extended to unbounded scenes or under adverse conditions such as violent motion and changing illumination. In contrast, targeting general scenarios, our approach additionally tightly fuses LiDAR, IMU, and camera for robust pose estimation and photo-realistic online mapping. To compensate for regions unobserved by the LiDAR, we propose to integrate both the triangulated visual points from images and LiDAR points for initializing 3D Gaussians. In addition, the modeling of the sky and varying camera exposure have been realized for high-quality rendering. Notably, we implement our system purely with C++ and CUDA, and meticulously design a series of strategies to accelerate the online optimization of the Gaussian-based scene representation. Extensive experiments demonstrate that our method outperforms its counterparts while maintaining real-time capability. Impressively, regarding photo-realistic mapping, our method with our estimated poses even surpasses all the compared approaches that utilize privileged ground-truth poses for mapping. Our code will be released on project page https://xingxingzuo.github.io/gaussian_lic.
- Published
- 2024
29. Towards Automated Movie Trailer Generation
- Author
-
Argaw, Dawit Mureja, Soldan, Mattia, Pardo, Alejandro, Zhao, Chen, Heilbron, Fabian Caba, Chung, Joon Son, and Ghanem, Bernard
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Movie trailers are an essential tool for promoting films and attracting audiences. However, the process of creating trailers can be time-consuming and expensive. To streamline this process, we propose an automatic trailer generation framework that generates plausible trailers from a full movie by automating shot selection and composition. Our approach draws inspiration from machine translation techniques and models the movies and trailers as sequences of shots, thus formulating the trailer generation problem as a sequence-to-sequence task. We introduce Trailer Generation Transformer (TGT), a deep-learning framework utilizing an encoder-decoder architecture. TGT movie encoder is tasked with contextualizing each movie shot representation via self-attention, while the autoregressive trailer decoder predicts the feature representation of the next trailer shot, accounting for the relevance of shots' temporal order in trailers. Our TGT significantly outperforms previous methods on a comprehensive suite of metrics., Comment: Accepted to CVPR 2024
- Published
- 2024
30. AddSR: Accelerating Diffusion-based Blind Super-Resolution with Adversarial Diffusion Distillation
- Author
-
Xie, Rui, Tai, Ying, Zhao, Chen, Zhang, Kai, Zhang, Zhenyu, Zhou, Jun, Ye, Xiaoqian, Wang, Qian, and Yang, Jian
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Electrical Engineering and Systems Science - Image and Video Processing - Abstract
Blind super-resolution methods based on stable diffusion showcase formidable generative capabilities in reconstructing clear high-resolution images with intricate details from low-resolution inputs. However, their practical applicability is often hampered by poor efficiency, stemming from the requirement of thousands or hundreds of sampling steps. Inspired by the efficient adversarial diffusion distillation (ADD), we design AddSR to address this issue by incorporating the ideas of both distillation and ControlNet. Specifically, we first propose a prediction-based self-refinement strategy to provide high-frequency information in the student model output with marginal additional time cost. Furthermore, we refine the training process by employing HR images, rather than LR images, to regulate the teacher model, providing a more robust constraint for distillation. Second, we introduce a timestep-adaptive ADD to address the perception-distortion imbalance problem introduced by the original ADD. Extensive experiments demonstrate that AddSR generates better restoration results while achieving faster speed than previous SD-based state-of-the-art models (e.g., $7\times$ faster than SeeSR).
- Published
- 2024
31. Click chemistry-mediated enrichment of circulating tumor cells and tumor-derived extracellular vesicles for dual liquid biopsy in differentiated thyroid cancer
- Author
-
Feng, Bing, Wang, Jing, Zhang, Ryan Y, Wei, Anna Yaxuan, Zhao, Chen, Yen, Ying-Tzu, Ji, You-Ren, Kim, Hyoyong, Ju, Yong, Smalley, Matthew, Zuo, Vivian Xufei, Cheng, Liwen, Phung, Aaron, Zhou, Ziang, Yu, Sitong, DiBernardo, Gabriella, Memarzadeh, Sanaz, Posadas, Edwin M, Chai-Ho, Wanxing, Agopian, Vatche, Lee, Junseok, Yeh, Michael W, Wu, James, Zheng, Guangjuan, Tseng, Hsian-Rong, and Zhu, Yazhen
- Subjects
Biomedical and Clinical Sciences ,Oncology and Carcinogenesis ,Cancer ,Clinical Research ,Good Health and Well Being ,Biomedical Engineering ,Nanotechnology ,Nanoscience & Nanotechnology ,Medical biotechnology ,Biomedical engineering - Abstract
Circulating tumor cells (CTCs) and tumor-derived extracellular vesicles (tEVs) are two crucial methodologies of liquid biopsy. Given their distinct size differences and release dynamics, CTCs and tEVs potentially offer synergistic capabilities in the non-invasive detection of differentiated thyroid cancer (DTC), a typically indolent tumor. We present the Combined DTC CTC/tEV Assay, integrating dual liquid biopsy processes: i) DTC CTC enrichment by Click Chips, followed by analysis of seven DTC-specific genes, and ii) DTC tEV enrichment by Click Beads, succeeded by mRNA cargo quantification in DTC tEVs. This method utilizes click chemistry, leveraging a pair of bioorthogonal and highly reactive functional motifs (tetrazine, Tz, and trans-cyclooctene, TCO), to overcome the challenges encountered in the conventional immunoaffinity-based enrichment of CTCs and tEVs. The Combined DTC CTC/tEV Assay synergistically combines the diagnostic precision of CTCs with the sensitivity of tEVs, demonstrating superior diagnostic accuracy in DTC detection and boasting an AUROC of 0.99. This outperforms the individual diagnostic performance of using either DTC CTC or DTC tEV alone. This integration enables full utilization of a patient's blood sample, and marks a significant evolution in the development of nanomaterial-based liquid biopsy technologies to address challenging unmet clinical needs in cancer care.
- Published
- 2024
32. A new hip fracture risk index derived from FEA-computed proximal femur fracture loads and energies-to-failure.
- Author
-
Cao, Xuewei, Sigurdsson, Sigurdur, Zhao, Chen, Zhou, Weihua, Liu, Anqi, Deng, Hong-Wen, Gudnason, Vilmundur, Sha, Qiuying, Lang, Thomas, and Keyak, Joyce
- Subjects
Bone strength ,Finite element analysis ,Hip fracture risk ,Osteoporosis ,Principal component analysis ,Male ,Humans ,Proximal Femoral Fractures ,Hip Fractures ,Bone Density ,Femur ,ROC Curve ,Finite Element Analysis - Abstract
Hip fracture risk assessment is an important but challenging task. Quantitative CT-based patient-specific finite element (FE) analysis (FEA) incorporates bone geometry and bone density in the proximal femur. We developed a global FEA-computed fracture risk index to increase the prediction accuracy of hip fracture incidence. PURPOSE: Quantitative CT-based patient-specific finite element (FE) analysis (FEA) incorporates bone geometry and bone density in the proximal femur to compute the force (fracture load) and energy necessary to break the proximal femur in a particular loading condition. The fracture loads and energies-to-failure are individually associated with incident hip fracture, and provide different structural information about the proximal femur. METHODS: We used principal component analysis (PCA) to develop a global FEA-computed fracture risk index that incorporates the FEA-computed yield and ultimate failure loads and energies-to-failure in four loading conditions of 110 hip fracture subjects and 235 age- and sex-matched control subjects from the AGES-Reykjavik study. Using a logistic regression model, we compared the prediction performance for hip fracture based on the stratified resampling. RESULTS: We referred the first principal component (PC1) of the FE parameters as the global FEA-computed fracture risk index, which was the significant predictor of hip fracture (p-value
- Published
- 2024
33. Your Co-Workers Matter: Evaluating Collaborative Capabilities of Language Models in Blocks World
- Author
-
Wu, Guande, Zhao, Chen, Silva, Claudio, and He, He
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence ,Computer Science - Human-Computer Interaction - Abstract
Language agents that interact with the world on their own have great potential for automating digital tasks. While large language model (LLM) agents have made progress in understanding and executing tasks such as textual games and webpage control, many real-world tasks also require collaboration with humans or other LLMs in equal roles, which involves intent understanding, task coordination, and communication. To test LLMs' ability to collaborate, we design a blocks-world environment, where two agents, each having unique goals and skills, build a target structure together. To complete the goals, they can act in the world and communicate in natural language. Under this environment, we design increasingly challenging settings to evaluate different collaboration perspectives, from independent to more complex, dependent tasks. We further adopt chain-of-thought prompts that include intermediate reasoning steps to model the partner's state and identify and correct execution errors. Both human-machine and machine-machine experiments show that LLM agents have strong grounding capacities, and our approach significantly improves the evaluation metric.
- Published
- 2024
34. Invertible Diffusion Models for Compressed Sensing
- Author
-
Chen, Bin, Zhang, Zhenyu, Li, Weiqi, Zhao, Chen, Yu, Jiwen, Zhao, Shijie, Chen, Jie, and Zhang, Jian
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
While deep neural networks (NN) significantly advance image compressed sensing (CS) by improving reconstruction quality, the necessity of training current CS NNs from scratch constrains their effectiveness and hampers rapid deployment. Although recent methods utilize pre-trained diffusion models for image reconstruction, they struggle with slow inference and restricted adaptability to CS. To tackle these challenges, this paper proposes Invertible Diffusion Models (IDM), a novel efficient, end-to-end diffusion-based CS method. IDM repurposes a large-scale diffusion sampling process as a reconstruction model, and finetunes it end-to-end to recover original images directly from CS measurements, moving beyond the traditional paradigm of one-step noise estimation learning. To enable such memory-intensive end-to-end finetuning, we propose a novel two-level invertible design to transform both (1) the multi-step sampling process and (2) the noise estimation U-Net in each step into invertible networks. As a result, most intermediate features are cleared during training to reduce up to 93.8% GPU memory. In addition, we develop a set of lightweight modules to inject measurements into noise estimator to further facilitate reconstruction. Experiments demonstrate that IDM outperforms existing state-of-the-art CS networks by up to 2.64dB in PSNR. Compared to the recent diffusion model-based approach DDNM, our IDM achieves up to 10.09dB PSNR gain and 14.54 times faster inference.
- Published
- 2024
35. Graphs Generalization under Distribution Shifts
- Author
-
Tian, Qin, Wang, Wenjun, Zhao, Chen, Shao, Minglai, Zhang, Wang, and Li, Dong
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence - Abstract
Traditional machine learning methods rely heavily on the assumption of independent and identically distributed data, which imposes limitations when the test distribution deviates from the training distribution. To address this crucial issue, out-of-distribution (OOD) generalization, which aims to achieve satisfactory generalization performance under unknown distribution shifts, has made significant progress. However, OOD methods for graph-structured data currently lack clarity and remain relatively unexplored due to two primary challenges. Firstly, distribution shifts on graphs often occur simultaneously on node attributes and graph topology. Secondly, capturing invariant information amidst diverse distribution shifts proves to be a formidable challenge. To overcome these obstacles, in this paper, we introduce a novel framework, namely Graph Learning Invariant Domain genERation (GLIDER). The goal is to (1) diversify variations across domains by modeling the potential seen or unseen variations of attribute distribution and topological structure and (2) minimize the discrepancy of the variation in a representation space where the target is to predict semantic labels. Extensive experimental results indicate that our model outperforms baseline methods on node-level OOD generalization across domains under simultaneous distribution shifts on node features and topological structures.
- Published
- 2024
36. TexRO: Generating Delicate Textures of 3D Models by Recursive Optimization
- Author
-
Wu, Jinbo, Liu, Xing, Wu, Chenming, Gao, Xiaobo, Liu, Jialun, Liu, Xinqi, Zhao, Chen, Feng, Haocheng, Ding, Errui, and Wang, Jingdong
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
This paper presents TexRO, a novel method for generating delicate textures of a known 3D mesh by optimizing its UV texture. The key contributions are two-fold. We propose an optimal viewpoint selection strategy that finds the smallest set of viewpoints covering all the faces of a mesh. Our viewpoint selection strategy guarantees the completeness of a generated result. We propose a recursive optimization pipeline that optimizes a UV texture at increasing resolutions, with an adaptive denoising method that re-uses existing textures for new texture generation. Through extensive experimentation, we demonstrate the superior performance of TexRO in terms of texture quality, detail preservation, visual consistency, and, notably, runtime speed, outperforming other current methods. The broad applicability of TexRO is further confirmed through its successful use on diverse 3D models., Comment: Technical report. Project page: https://3d-aigc.github.io/TexRO
- Published
- 2024
37. DVMNet: Computing Relative Pose for Unseen Objects Beyond Hypotheses
- Author
-
Zhao, Chen, Zhang, Tong, Dang, Zheng, and Salzmann, Mathieu
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Robotics - Abstract
Determining the relative pose of an object between two images is pivotal to the success of generalizable object pose estimation. Existing approaches typically approximate the continuous pose representation with a large number of discrete pose hypotheses, which incurs a computationally expensive process of scoring each hypothesis at test time. By contrast, we present a Deep Voxel Matching Network (DVMNet) that eliminates the need for pose hypotheses and computes the relative object pose in a single pass. To this end, we map the two input RGB images, reference and query, to their respective voxelized 3D representations. We then pass the resulting voxels through a pose estimation module, where the voxels are aligned and the pose is computed in an end-to-end fashion by solving a least-squares problem. To enhance robustness, we introduce a weighted closest voxel algorithm capable of mitigating the impact of noisy voxels. We conduct extensive experiments on the CO3D, LINEMOD, and Objaverse datasets, demonstrating that our method delivers more accurate relative pose estimates for novel objects at a lower computational cost compared to state-of-the-art methods. Our code is released at: https://github.com/sailor-z/DVMNet/., Comment: Accepted by CVPR 2024
- Published
- 2024
38. Tidal Formation of dark matter deficit diffuse galaxy NGC1052-DF2 by SIDM
- Author
-
Zhang, Zhao-Chen, Bi, Xiao-Jun, and Yin, Peng-Fei
- Subjects
Astrophysics - Cosmology and Nongalactic Astrophysics ,High Energy Physics - Phenomenology - Abstract
Observations have revealed a significant dark matter deficit in the ultra-diffuse galaxy NGC1052-DF2 (DF2). It is widely accepted that the formation of this unique galaxy can be attributed to the tidal stripping of its host galaxy, NGC1052. In this study, we simulate the evolution of a satellite system containing globular clusters (GCs) within an accreting host halo in the framework of self-interacting dark matter (SIDM). Our simulation results suggest that the heightened tidal stripping resulting from DM self-interactions can give rise to the transformation of a conventional dwarf galaxy into a dark matter deficit galaxy resembling DF2. By comparing the simulation results with identical initial conditions in both the standard cold dark matter (CDM) and SIDM models, we find that the latter is more likely to replicate the properties of DF2. Furthermore, we demonstrate that a DF2 analog can also be produced on an orbit with a greater pericenter distance by increasing the strength of DM self-interactions. This suggests that the issue of extreme orbital parameters can be mitigated by implementing the SIDM model. The distributions of the GC population derived in our SIDM simulation are consistent with the observed characteristics of DF2. For comparison, we also explored the potential for achieving GC distributions in the context of CDM., Comment: 12 pages, 7 figures
- Published
- 2024
39. GGRt: Towards Pose-free Generalizable 3D Gaussian Splatting in Real-time
- Author
-
Li, Hao, Gao, Yuanyuan, Wu, Chenming, Zhang, Dingwen, Dai, Yalun, Zhao, Chen, Feng, Haocheng, Ding, Errui, Wang, Jingdong, and Han, Junwei
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
This paper presents GGRt, a novel approach to generalizable novel view synthesis that alleviates the need for real camera poses, complexity in processing high-resolution images, and lengthy optimization processes, thus facilitating stronger applicability of 3D Gaussian Splatting (3D-GS) in real-world scenarios. Specifically, we design a novel joint learning framework that consists of an Iterative Pose Optimization Network (IPO-Net) and a Generalizable 3D-Gaussians (G-3DG) model. With the joint learning mechanism, the proposed framework can inherently estimate robust relative pose information from the image observations and thus primarily alleviate the requirement of real camera poses. Moreover, we implement a deferred back-propagation mechanism that enables high-resolution training and inference, overcoming the resolution constraints of previous methods. To enhance the speed and efficiency, we further introduce a progressive Gaussian cache module that dynamically adjusts during training and inference. As the first pose-free generalizable 3D-GS framework, GGRt achieves inference at $\ge$ 5 FPS and real-time rendering at $\ge$ 100 FPS. Through extensive experimentation, we demonstrate that our method outperforms existing NeRF-based pose-free techniques in terms of inference speed and effectiveness. It can also approach the real pose-based 3D-GS methods. Our contributions provide a significant leap forward for the integration of computer vision and computer graphics into practical applications, offering state-of-the-art results on LLFF, KITTI, and Waymo Open datasets and enabling real-time rendering for immersive experiences., Comment: Project page: https://3d-aigc.github.io/GGRt
- Published
- 2024
40. Correlated decoding of logical algorithms with transversal gates
- Author
-
Cain, Madelyn, Zhao, Chen, Zhou, Hengyun, Meister, Nadine, Ataides, J. Pablo Bonilla, Jaffe, Arthur, Bluvstein, Dolev, and Lukin, Mikhail D.
- Subjects
Quantum Physics ,Condensed Matter - Disordered Systems and Neural Networks ,Condensed Matter - Statistical Mechanics - Abstract
Quantum error correction is believed to be essential for scalable quantum computation, but its implementation is challenging due to its considerable space-time overhead. Motivated by recent experiments demonstrating efficient manipulation of logical qubits using transversal gates (Bluvstein et al., Nature 626, 58-65 (2024)), we show that the performance of logical algorithms can be substantially improved by decoding the qubits jointly to account for physical error propagation during transversal entangling gates. We find that such correlated decoding improves the performance of both Clifford and non-Clifford transversal entangling gates, and explore two decoders offering different computational runtimes and accuracies. By considering deep logical Clifford circuits, we find that correlated decoding can significantly improve the space-time cost by reducing the number of rounds of noisy syndrome extraction per gate. These results demonstrate that correlated decoding provides a major advantage in early fault-tolerant computation, and indicate it has considerable potential to reduce the space-time cost in large-scale logical algorithms., Comment: 7+12 pages, 5+3 figures
- Published
- 2024
41. Learning A Physical-aware Diffusion Model Based on Transformer for Underwater Image Enhancement
- Author
-
Zhao, Chen, Dong, Chenyu, and Cai, Weiling
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Underwater visuals undergo various complex degradations, inevitably influencing the efficiency of underwater vision tasks. Recently, diffusion models have been applied to underwater image enhancement (UIE) tasks and have attained SOTA performance. However, these methods fail to consider the physical properties and underwater imaging mechanisms in the diffusion process, limiting the information completion capacity of diffusion models. In this paper, we introduce a novel UIE framework, named PA-Diff, designed to exploit knowledge of physics to guide the diffusion process. PA-Diff consists of a Physics Prior Generation (PPG) Branch, an Implicit Neural Reconstruction (INR) Branch, and a Physics-aware Diffusion Transformer (PDT) Branch. The PPG branch is designed to produce physics prior knowledge. By utilizing this physics prior to guide the diffusion process, the PDT branch obtains underwater-aware ability and models the complex distributions of real-world underwater scenes. The INR branch learns robust feature representations from diverse underwater images via implicit neural representation, which reduces the difficulty of restoration for the PDT branch. Extensive experiments prove that our method achieves the best performance on UIE tasks.
- Published
- 2024
42. GVA: Reconstructing Vivid 3D Gaussian Avatars from Monocular Videos
- Author
-
Liu, Xinqi, Wu, Chenming, Liu, Jialun, Liu, Xing, Wu, Jinbo, Zhao, Chen, Feng, Haocheng, Ding, Errui, and Wang, Jingdong
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
In this paper, we present a novel method that facilitates the creation of vivid 3D Gaussian avatars from monocular video inputs (GVA). Our innovation lies in addressing the intricate challenges of delivering high-fidelity human body reconstructions and aligning 3D Gaussians with human skin surfaces accurately. The key contributions of this paper are twofold. Firstly, we introduce a pose refinement technique to improve hand and foot pose accuracy by aligning normal maps and silhouettes. Precise pose is crucial for correct shape and appearance reconstruction. Secondly, we address the problems of unbalanced aggregation and initialization bias that previously diminished the quality of 3D Gaussian avatars, through a novel surface-guided re-initialization method that ensures accurate alignment of 3D Gaussian points with avatar surfaces. Experimental results demonstrate that our proposed method achieves high-fidelity and vivid 3D Gaussian avatar reconstruction. Extensive experimental analyses validate the performance qualitatively and quantitatively, demonstrating that it achieves state-of-the-art performance in photo-realistic novel view synthesis while offering fine-grained control over the human body and hand pose. Project page: https://3d-aigc.github.io/GVA/.
- Published
- 2024
43. HOISDF: Constraining 3D Hand-Object Pose Estimation with Global Signed Distance Fields
- Author
-
Qi, Haozhe, Zhao, Chen, Salzmann, Mathieu, and Mathis, Alexander
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Human hands are highly articulated and versatile at handling objects. Jointly estimating the 3D poses of a hand and the object it manipulates from a monocular camera is challenging due to frequent occlusions. Thus, existing methods often rely on intermediate 3D shape representations to increase performance. These representations are typically explicit, such as 3D point clouds or meshes, and thus provide information in the direct surroundings of the intermediate hand pose estimate. To address this, we introduce HOISDF, a Signed Distance Field (SDF) guided hand-object pose estimation network, which jointly exploits hand and object SDFs to provide a global, implicit representation over the complete reconstruction volume. Specifically, the role of the SDFs is threefold: equip the visual encoder with implicit shape information, help to encode hand-object interactions, and guide the hand and object pose regression via SDF-based sampling and by augmenting the feature representations. We show that HOISDF achieves state-of-the-art results on hand-object pose estimation benchmarks (DexYCB and HO3Dv2). Code is available at https://github.com/amathislab/HOISDF, Comment: Accepted at CVPR 2024. 9 figures, many tables
- Published
- 2024
44. Multi-graph Graph Matching for Coronary Artery Semantic Labeling
- Author
-
Zhao, Chen, Xu, Zhihui, Baral, Pukar, Esposito, Michel, and Zhou, Weihua
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Coronary artery disease (CAD) stands as the leading cause of death worldwide, and invasive coronary angiography (ICA) remains the gold standard for assessing vascular anatomical information. However, deep learning-based methods encounter challenges in generating semantic labels for arterial segments, primarily due to the morphological similarity between arterial branches and the varying anatomy of the arterial system across different projection view angles and patients. To address this challenge, we model the vascular tree as a graph and propose a multi-graph graph matching (MGM) algorithm for coronary artery semantic labeling. The MGM algorithm assesses the similarity between arteries in multiple vascular tree graphs, considering the cycle consistency between each pair of graphs. As a result, unannotated arterial segments are appropriately labeled by matching them with annotated segments. Through the incorporation of anatomical graph structure, radiomics features, and semantic mapping, the proposed MGM model achieves an impressive accuracy of 0.9471 for coronary artery semantic labeling on our multi-site dataset with 718 ICAs. With the semantically labeled arteries, an overall accuracy of 0.9155 was achieved for stenosis detection. The proposed MGM presents a novel tool for coronary artery analysis using multiple ICA-derived graphs, offering valuable insights into vascular health and pathology.
- Published
- 2024
45. PICO: Accelerating All k-Core Paradigms on GPU
- Author
-
Zhao, Chen, Yu, Ting, Zheng, Zhigao, Jin, Song, Jiang, Jiawei, Du, Bo, and Tao, Dacheng
- Subjects
Computer Science - Distributed, Parallel, and Cluster Computing - Abstract
Core decomposition is a well-established graph mining problem with various applications that involves partitioning the graph into hierarchical subgraphs. Solutions to this problem have been developed using both bottom-up and top-down approaches from the perspective of vertex convergence dependency. However, existing algorithms have not effectively harnessed GPU performance to expedite core decomposition, despite the growing need for enhanced performance. Moreover, approaching the performance limitations of core decomposition from two different directions within a parallel synchronization structure has not been thoroughly explored. This paper introduces an efficient GPU acceleration framework, PICO, for the Peel and Index2core paradigms of k-core decomposition. We propose PeelOne, a Peel-based algorithm designed to simplify the parallel logic and minimize atomic operations by eliminating vertices that are 'under-core'. We also propose an Index2core-based algorithm, named HistoCore, which addresses the issue of extensive redundant computations across both vertices and edges. Extensive experiments on an NVIDIA RTX 3090 GPU show that PeelOne outperforms all other Peel-based algorithms, and HistoCore outperforms all other Index2core-based algorithms. Furthermore, HistoCore even outperforms PeelOne by a 1.1x - 3.2x speedup on six datasets, challenging the stereotype that the Index2core paradigm performs much worse than Peel in a shared-memory parallel setting.
- Published
- 2024
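As context for the Peel paradigm this entry builds on, here is a minimal sequential sketch of classic bottom-up core decomposition in Python. It illustrates only the standard peeling idea (repeatedly remove a minimum-degree vertex), not the paper's GPU-parallel PeelOne or HistoCore algorithms; the function name and edge-list representation are hypothetical choices for this sketch.

```python
def core_numbers(edges):
    """Core number of every vertex via sequential bottom-up peeling:
    repeatedly remove a minimum-degree vertex; its core number is the
    largest minimum degree seen so far."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    core = {}
    remaining = set(adj)
    k = 0
    while remaining:
        v = min(remaining, key=lambda x: deg[x])  # peel a min-degree vertex
        k = max(k, deg[v])  # core numbers are non-decreasing during peeling
        core[v] = k
        remaining.remove(v)
        for w in adj[v]:
            if w in remaining:
                deg[w] -= 1  # removing v lowers its neighbors' degrees
    return core
```

For example, a triangle with one pendant vertex yields core number 2 for the triangle vertices and 1 for the pendant, regardless of how ties are broken.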
46. Parallel Structures in Pre-training Data Yield In-Context Learning
- Author
-
Chen, Yanda, Zhao, Chen, Yu, Zhou, McKeown, Kathleen, and He, He
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence ,Computer Science - Machine Learning - Abstract
Pre-trained language models (LMs) are capable of in-context learning (ICL): they can adapt to a task with only a few examples given in the prompt without any parameter update. However, it is unclear where this capability comes from as there is a stark distribution shift between pre-training text and ICL prompts. In this work, we study what patterns of the pre-training data contribute to ICL. We find that LMs' ICL ability depends on $\textit{parallel structures}$ in the pre-training data -- pairs of phrases following similar templates in the same context window. Specifically, we detect parallel structures by checking whether training on one phrase improves prediction of the other, and conduct ablation experiments to study their effect on ICL. We show that removing parallel structures in the pre-training data reduces LMs' ICL accuracy by 51% (vs 2% from random ablation). This drop persists even when excluding common patterns such as n-gram repetitions and long-range dependency, showing the diversity and generality of parallel structures. A closer look at the detected parallel structures indicates that they cover diverse linguistic tasks and span long distances in the data.
- Published
- 2024
47. Dynamic Environment Responsive Online Meta-Learning with Fairness Awareness
- Author
-
Zhao, Chen, Mi, Feng, Wu, Xintao, Jiang, Kai, Khan, Latifur, and Chen, Feng
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence ,Computer Science - Computers and Society - Abstract
The fairness-aware online learning framework has emerged as a potent tool within the context of continuous lifelong learning. In this scenario, the learner's objective is to progressively acquire new tasks as they arrive over time, while also guaranteeing statistical parity among various protected sub-populations, such as race and gender, when it comes to the newly introduced tasks. A significant limitation of current approaches lies in their heavy reliance on the i.i.d (independent and identically distributed) assumption concerning data, leading to a static regret analysis of the framework. Nevertheless, it's crucial to note that achieving low static regret does not necessarily translate to strong performance in dynamic environments characterized by tasks sampled from diverse distributions. In this paper, to tackle the fairness-aware online learning challenge in evolving settings, we introduce a unique regret measure, FairSAR, by incorporating long-term fairness constraints into a strongly adapted loss regret framework. Moreover, to determine an optimal model parameter at each time step, we introduce an innovative adaptive fairness-aware online meta-learning algorithm, referred to as FairSAOML. This algorithm possesses the ability to adjust to dynamic environments by effectively managing bias control and model accuracy. The problem is framed as a bi-level convex-concave optimization, considering both the model's primal and dual parameters, which pertain to its accuracy and fairness attributes, respectively. Theoretical analysis yields sub-linear upper bounds for both loss regret and the cumulative violation of fairness constraints. Our experimental evaluation on various real-world datasets in dynamic environments demonstrates that our proposed FairSAOML algorithm consistently outperforms alternative approaches rooted in the most advanced prior online learning methods., Comment: Accepted by TKDD, extended from KDD 2022. arXiv admin note: substantial text overlap with arXiv:2205.11264
- Published
- 2024
48. Point cloud-based registration and image fusion between cardiac SPECT MPI and CTA
- Author
-
Tang, Shaojie, Miao, Penpen, Gao, Xingyu, Zhong, Yu, Zhu, Dantong, Wen, Haixing, Xu, Zhihui, Wei, Qiuyue, Yao, Hongping, Huang, Xin, Gao, Rui, Zhao, Chen, and Zhou, Weihua
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing ,Computer Science - Computer Vision and Pattern Recognition - Abstract
A method was proposed for the point cloud-based registration and image fusion between cardiac single photon emission computed tomography (SPECT) myocardial perfusion images (MPI) and cardiac computed tomography angiograms (CTA). Firstly, the left ventricle (LV) epicardial regions (LVERs) in SPECT and CTA images were segmented by using different U-Net neural networks trained to generate the point clouds of the LV epicardial contours. Secondly, according to the characteristics of cardiac anatomy, the special points of anterior and posterior interventricular grooves (APIGs) were manually marked in both SPECT and CTA image volumes. Thirdly, we developed an in-house program for coarsely registering the special points of APIGs to ensure a correct cardiac orientation alignment between SPECT and CTA images. Fourthly, we employed the ICP, SICP, or CPD algorithm to achieve a fine registration of the point clouds (together with the special points of APIGs) of the LV epicardial surfaces in SPECT and CTA images. Finally, the image fusion between SPECT and CTA was realized after the fine registration. The experimental results showed that the cardiac orientation was aligned well and the mean distance error of the optimal registration method (CPD with affine transform) was consistently less than 3 mm. The proposed method could effectively fuse the structures from cardiac CTA and SPECT functional images, and demonstrated potential to assist in the accurate diagnosis of cardiac diseases by combining the complementary advantages of the two imaging modalities.
- Published
- 2024
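For readers unfamiliar with the fine-registration step in this entry, here is a minimal 2D sketch of the closed-form rigid alignment that sits at the core of one ICP iteration: given point correspondences, find the rotation and translation minimizing the summed squared distances. This is an illustration only, in 2D rather than the paper's 3D setting, and `align_2d` is a hypothetical name, not the in-house program's API.

```python
import math

def align_2d(src, dst):
    """Best rigid transform (theta, t) mapping src points onto dst points
    with known correspondences, via the 2D closed form: center both sets,
    build the cross-covariance sums, then theta = atan2(sum of cross
    products, sum of dot products)."""
    n = len(src)
    cx_s = sum(p[0] for p in src) / n; cy_s = sum(p[1] for p in src) / n
    cx_d = sum(q[0] for q in dst) / n; cy_d = sum(q[1] for q in dst) / n
    sxx = sxy = syx = syy = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay = px - cx_s, py - cy_s  # centered source point
        bx, by = qx - cx_d, qy - cy_d  # centered destination point
        sxx += ax * bx; sxy += ax * by
        syx += ay * bx; syy += ay * by
    theta = math.atan2(sxy - syx, sxx + syy)
    c, s = math.cos(theta), math.sin(theta)
    # Translation maps the rotated source centroid onto the destination centroid.
    tx = cx_d - (c * cx_s - s * cy_s)
    ty = cy_d - (s * cx_s + c * cy_s)
    return theta, (tx, ty)
```

A full ICP loop would alternate this solve with nearest-neighbor correspondence updates until convergence; SICP and CPD replace the correspondence or transform model with more robust variants.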
49. 3D Lymphoma Segmentation on PET/CT Images via Multi-Scale Information Fusion with Cross-Attention
- Author
-
Huang, Huan, Qiu, Liheng, Yang, Shenmiao, Li, Longxi, Nan, Jiaofen, Li, Yanting, Han, Chuang, Zhu, Fubao, Zhao, Chen, and Zhou, Weihua
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing ,Computer Science - Computer Vision and Pattern Recognition - Abstract
Background: Accurate segmentation of diffuse large B-cell lymphoma (DLBCL) lesions is challenging due to their complex patterns in medical imaging. Objective: This study aims to develop a precise segmentation method for DLBCL using 18F-Fluorodeoxyglucose (FDG) positron emission tomography (PET) and computed tomography (CT) images. Methods: We propose a 3D dual-branch encoder segmentation method using shifted window transformers and a Multi-Scale Information Fusion (MSIF) module. To enhance feature integration, the MSIF module performs multi-scale feature fusion using cross-attention mechanisms with a shifted window framework. A gated neural network within the MSIF module dynamically balances the contributions from each modality. The model was optimized using the Dice Similarity Coefficient (DSC) loss function. Additionally, we computed the total metabolic tumor volume (TMTV) and performed statistical analyses. Results: The model was trained and validated on a dataset of 165 DLBCL patients using 5-fold cross-validation, achieving a DSC of 0.7512. Statistical analysis showed a significant improvement over comparative methods (p < 0.05). Additionally, a Pearson correlation coefficient of 0.91 and an R^2 of 0.89 were observed when comparing manual annotations to segmentation results for TMTV measurement. Conclusion: This study presents an effective automatic segmentation method for DLBCL that leverages the complementary strengths of PET and CT imaging. Our method has the potential to improve diagnostic interpretations and assist in treatment planning for DLBCL patients., Comment: 19 pages, 7 figures; reference added
- Published
- 2024
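The Dice Similarity Coefficient used above both as the training loss and the evaluation metric has a simple definition, DSC = 2|A∩B| / (|A| + |B|). A minimal sketch over flat binary masks (a hypothetical helper for illustration, not the paper's implementation; the epsilon guards against empty masks):

```python
def dice_coefficient(pred, target, eps=1e-7):
    """Dice Similarity Coefficient between two flat binary masks.
    DSC = 2|A ∩ B| / (|A| + |B|); training typically minimizes 1 - DSC."""
    inter = sum(p * t for p, t in zip(pred, target))   # |A ∩ B|
    total = sum(pred) + sum(target)                    # |A| + |B|
    return (2.0 * inter + eps) / (total + eps)
```

Half-overlapping masks score 0.5 and identical masks score 1.0, matching the intuition that the reported DSC of 0.7512 reflects substantial but imperfect overlap with manual annotations.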
50. Supervised Algorithmic Fairness in Distribution Shifts: A Survey
- Author
-
Shao, Minglai, Li, Dong, Zhao, Chen, Wu, Xintao, Lin, Yujie, and Tian, Qin
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence ,Computer Science - Computers and Society - Abstract
Supervised fairness-aware machine learning under distribution shifts is an emerging field that addresses the challenge of maintaining equitable and unbiased predictions when faced with changes in data distributions from source to target domains. In real-world applications, machine learning models are often trained on a specific dataset but deployed in environments where the data distribution may shift over time due to various factors. This shift can lead to unfair predictions, disproportionately affecting certain groups characterized by sensitive attributes, such as race and gender. In this survey, we provide a summary of various types of distribution shifts and comprehensively investigate existing methods based on these shifts, highlighting six commonly used approaches in the literature. Additionally, this survey lists publicly available datasets and evaluation metrics for empirical studies. We further explore the interconnection with related research fields, discuss the significant challenges, and identify potential directions for future studies., Comment: IJCAI 2024
- Published
- 2024