43,192 results for "Zhang, Fan"
Search Results
2. Deaf and Hard of Hearing Students' Configurational and Phonological Processing in Chinese One- and Two-Character Word Recognition
- Author
Wang, Xiaoyun, Li, Degao, and Zhang, Fan
- Published
- 2021
3. MICCAI-CDMRI 2023 QuantConn Challenge Findings on Achieving Robust Quantitative Connectivity through Harmonized Preprocessing of Diffusion MRI
- Author
Newlin, Nancy R., Schilling, Kurt, Koudoro, Serge, Chandio, Bramsh Qamar, Kanakaraj, Praitayini, Moyer, Daniel, Kelly, Claire E., Genc, Sila, Chen, Jian, Yang, Joseph Yuan-Mou, Wu, Ye, He, Yifei, Zhang, Jiawei, Zeng, Qingrun, Zhang, Fan, Adluru, Nagesh, Nath, Vishwesh, Pathak, Sudhir, Schneider, Walter, Gade, Anurag, Rathi, Yogesh, Hendriks, Tom, Vilanova, Anna, Chamberland, Maxime, Pieciak, Tomasz, Ciupek, Dominika, Vega, Antonio Tristán, Aja-Fernández, Santiago, Malawski, Maciej, Ouedraogo, Gani, Machnio, Julia, Ewert, Christian, Thompson, Paul M., Jahanshad, Neda, Garyfallidis, Eleftherios, and Landman, Bennett A.
- Subjects
Physics - Medical Physics, Computer Science - Machine Learning
- Abstract
White matter alterations are increasingly implicated in neurological diseases and their progression. International-scale studies use diffusion-weighted magnetic resonance imaging (DW-MRI) to qualitatively identify changes in white matter microstructure and connectivity. Yet, quantitative analysis of DW-MRI data is hindered by inconsistencies stemming from varying acquisition protocols. There is a pressing need to harmonize the preprocessing of DW-MRI datasets to ensure the derivation of robust quantitative diffusion metrics across acquisitions. In the MICCAI-CDMRI 2023 QuantConn challenge, participants were provided raw data from the same individuals collected on the same scanner but with two different acquisitions and tasked with preprocessing the DW-MRI to minimize acquisition differences while retaining biological variation. Submissions are evaluated on the reproducibility and comparability of cross-acquisition bundle-wise microstructure measures, bundle shape features, and connectomics. The key innovations of the QuantConn challenge are that (1) we assess bundles and tractography in the context of harmonization for the first time, (2) we assess connectomics in the context of harmonization for the first time, and (3) we have 10x additional subjects over prior harmonization challenge, MUSHAC and 100x over SuperMUDI. We find that bundle surface area, fractional anisotropy, connectome assortativity, betweenness centrality, edge count, modularity, nodal strength, and participation coefficient measures are most biased by acquisition and that machine learning voxel-wise correction, RISH mapping, and NeSH methods effectively reduce these biases. In addition, microstructure measures AD, MD, RD, bundle length, connectome density, efficiency, and path length are least biased by these acquisition differences.
- Comment
Accepted for publication at the Journal of Machine Learning for Biomedical Imaging (MELBA) https://melba-journal.org/2024/019
- Published
- 2024
4. Disposable Opto-Acoustic Window Enabled Cost-effective Photoacoustic-Ultrasound Dual-modal Imaging
- Author
Jiang, Yunhui, Zhang, Fan, Zheng, Yuwei, Sun, Ruixi, and Gao, Fei
- Subjects
Physics - Medical Physics
- Abstract
Photoacoustic imaging (PAI) and ultrasound imaging (USI) are important biomedical imaging techniques, due to their unique and complementary advantages in tissue's structure and function visualization. In this Letter, we proposed a coaxial photoacoustic-ultrasound dual-modal imaging system (coPAUS) with disposable opto-acoustic window. This opto-acoustic window allows part of light to go through it, and another part of light to be converted to ultrasound transmission signal by photoacoustic effect. By single laser pulse illumination, both PA signals and reflected US signals can be generated. Then, a linear array probe receives both PA and US signals, enabling simultaneous dual-modal PA and US imaging. Ex vivo experiments were conducted involving pencil lead, hair, and plastic tube with black spot, as well as in vivo experiment on human finger. The system's resolutions for PA and US imaging are 215 um and 91.125 um, with signal-to-noise ratios for PA and US signals reached up to 37.48 dB and 29.75 dB, respectively, proving the feasibility of the coPAUS dual-modal imaging. The proposed coPAUS system with disposable opto-acoustic window provides an immediate and cost-effective approach to enable US imaging capability based on an existing PA imaging system.
- Comment
9 pages, 6 figures, 1 table
- Published
- 2024
5. A Novel Deep Learning Tractography Fiber Clustering Framework for Functionally Consistent White Matter Parcellation Using Multimodal Diffusion MRI and Functional MRI
- Author
Wang, Jin, Guo, Bocheng, Li, Yijie, Wang, Junyi, Chen, Yuqian, Rushmore, Jarrett, Makris, Nikos, Rathi, Yogesh, O'Donnell, Lauren J, and Zhang, Fan
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
- Abstract
Tractography fiber clustering using diffusion MRI (dMRI) is a crucial strategy for white matter (WM) parcellation. Current methods primarily use the geometric information of fibers (i.e., the spatial trajectories) to group similar fibers into clusters, overlooking the important functional signals present along the fiber tracts. There is increasing evidence that neural activity in the WM can be measured using functional MRI (fMRI), offering potentially valuable multimodal information for fiber clustering. In this paper, we develop a novel deep learning fiber clustering framework, namely Deep Multi-view Fiber Clustering (DMVFC), that uses joint dMRI and fMRI data to enable functionally consistent WM parcellation. DMVFC can effectively integrate the geometric characteristics of the WM fibers with the fMRI BOLD signals along the fiber tracts. It includes two major components: 1) a multi-view pretraining module to compute embedding features from fiber geometric information and functional signals separately, and 2) a collaborative fine-tuning module to simultaneously refine the two kinds of embeddings. In the experiments, we compare DMVFC with two state-of-the-art fiber clustering methods and demonstrate superior performance in achieving functionally meaningful and consistent WM parcellation results.
- Comment
5 pages, 3 figures
- Published
- 2024
6. Human-inspired Perspectives: A Survey on AI Long-term Memory
- Author
He, Zihong, Lin, Weizhe, Zheng, Hao, Zhang, Fan, Jones, Matt, Aitchison, Laurence, Xu, Xuhai, Liu, Miao, Kristensson, Per Ola, and Shen, Junxiao
- Subjects
Computer Science - Artificial Intelligence
- Abstract
With the rapid advancement of AI systems, their abilities to store, retrieve, and utilize information over the long term - referred to as long-term memory - have become increasingly significant. These capabilities are crucial for enhancing the performance of AI systems across a wide range of tasks. However, there is currently no comprehensive survey that systematically investigates AI's long-term memory capabilities, formulates a theoretical framework, and inspires the development of next-generation AI long-term memory systems. This paper begins by systematically introducing the mechanisms of human long-term memory, then explores AI long-term memory mechanisms, establishing a mapping between the two. Based on the mapping relationships identified, we extend the current cognitive architectures and propose the Cognitive Architecture of Self-Adaptive Long-term Memory (SALM). SALM provides a theoretical framework for the practice of AI long-term memory and holds potential for guiding the creation of next-generation long-term memory driven AI systems. Finally, we delve into the future directions and application prospects of AI long-term memory.
- Published
- 2024
7. AdaptiveISP: Learning an Adaptive Image Signal Processor for Object Detection
- Author
Wang, Yujin, Xu, Tianyi, Zhang, Fan, Xue, Tianfan, and Gu, Jinwei
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Image Signal Processors (ISPs) convert raw sensor signals into digital images, which significantly influence the image quality and the performance of downstream computer vision tasks. Designing ISP pipeline and tuning ISP parameters are two key steps for building an imaging and vision system. To find optimal ISP configurations, recent works use deep neural networks as a proxy to search for ISP parameters or ISP pipelines. However, these methods are primarily designed to maximize the image quality, which are sub-optimal in the performance of high-level computer vision tasks such as detection, recognition, and tracking. Moreover, after training, the learned ISP pipelines are mostly fixed at the inference time, whose performance degrades in dynamic scenes. To jointly optimize ISP structures and parameters, we propose AdaptiveISP, a task-driven and scene-adaptive ISP. One key observation is that for the majority of input images, only a few processing modules are needed to improve the performance of downstream recognition tasks, and only a few inputs require more processing. Based on this, AdaptiveISP utilizes deep reinforcement learning to automatically generate an optimal ISP pipeline and the associated ISP parameters to maximize the detection performance. Experimental results show that AdaptiveISP not only surpasses the prior state-of-the-art methods for object detection but also dynamically manages the trade-off between detection performance and computational cost, especially suitable for scenes with large dynamic range variations. Project website: https://openimaginglab.github.io/AdaptiveISP/.
- Comment
Accepted at NeurIPS2024
- Published
- 2024
8. TractShapeNet: Efficient Multi-Shape Learning with 3D Tractography Point Clouds
- Author
Lo, Yui, Chen, Yuqian, Liu, Dongnan, Legarreta, Jon Haitz, Zekelman, Leo, Zhang, Fan, Rushmore, Jarrett, Rathi, Yogesh, Makris, Nikos, Golby, Alexandra J., Cai, Weidong, and O'Donnell, Lauren J.
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
- Abstract
Brain imaging studies have demonstrated that diffusion MRI tractography geometric shape descriptors can inform the study of the brain's white matter pathways and their relationship to brain function. In this work, we investigate the possibility of utilizing a deep learning model to compute shape measures of the brain's white matter connections. We introduce a novel framework, TractShapeNet, that leverages a point cloud representation of tractography to compute five shape measures: length, span, volume, total surface area, and irregularity. We assess the performance of the method on a large dataset including 1065 healthy young adults. Experiments for shape measure computation demonstrate that our proposed TractShapeNet outperforms other point cloud-based neural network models in both the Pearson correlation coefficient and normalized error metrics. We compare the inference runtime results with the conventional shape computation tool DSI-Studio. Our results demonstrate that a deep learning approach enables faster and more efficient shape measure computation. We also conduct experiments on two downstream language cognition prediction tasks, showing that shape measures from TractShapeNet perform similarly to those computed by DSI-Studio. Our code will be available at: https://github.com/SlicerDMRI/TractShapeNet.
- Comment
10 pages, 2 figures, 4 tables. This work has been submitted to the IEEE for possible publication
- Published
- 2024
9. RediSwap: MEV Redistribution Mechanism for CFMMs
- Author
Zhang, Mengqian, Yang, Sen, and Zhang, Fan
- Subjects
Computer Science - Computer Science and Game Theory, Computer Science - Cryptography and Security
- Abstract
Automated Market Makers (AMMs) are essential to decentralized finance, offering continuous liquidity and enabling intermediary-free trading on blockchains. However, participants in AMMs are vulnerable to Maximal Extractable Value (MEV) exploitation. Users face threats such as front-running, back-running, and sandwich attacks, while liquidity providers (LPs) incur the loss-versus-rebalancing (LVR). In this paper, we introduce RediSwap, a novel AMM designed to capture MEV at the application level and refund it fairly among users and liquidity providers. At its core, RediSwap features an MEV-redistribution mechanism that manages arbitrage opportunities within the AMM pool. We formalize the mechanism design problem and the desired game-theoretical properties. A central insight underpinning our mechanism is the interpretation of the maximal MEV value as the sum of LVR and individual user losses. We prove that our mechanism is incentive-compatible and Sybil-proof, and demonstrate that it is easy for arbitrageurs to participate. We empirically compared RediSwap with existing solutions by replaying historical AMM trades. Our results suggest that RediSwap can achieve better execution than UniswapX in 89% of trades and reduce LPs' loss to under 0.5% of the original LVR in most cases.
- Published
- 2024
10. Neural Predictor for Flight Control with Payload
- Author
Jin, Ao, Li, Chenhao, Wang, Qinyi, Liu, Ya, Huang, Panfeng, and Zhang, Fan
- Subjects
Computer Science - Robotics, Electrical Engineering and Systems Science - Systems and Control
- Abstract
Aerial robotics for transporting suspended payloads as the form of freely-floating manipulator are growing great interest in recent years. However, the prior information of the payload, such as the mass, is always hard to obtain accurately in practice. The force/torque caused by payload and residual dynamics will introduce unmodeled perturbations to the system, which negatively affects the closed-loop performance. Different from estimation-like methods, this paper proposes Neural Predictor, a learning-based approach to model force/torque caused by payload and residual dynamics as a dynamical system. It results a hybrid model including both the first-principles dynamics and the learned dynamics. This hybrid model is then integrated into a MPC framework to improve closed-loop performance. Effectiveness of proposed framework is verified extensively in both numerical simulations and real-world flight experiments. The results indicate that our approach can capture force/torque caused by payload and residual dynamics accurately, respond quickly to the changes of them and improve the closed-loop performance significantly. In particular, Neural Predictor outperforms a state-of-the-art learning-based estimator and has reduced the force and torque estimation errors by up to 66.15% and 33.33% while using less samples.
- Comment
8 pages
- Published
- 2024
11. The shape of the brain's connections is predictive of cognitive performance: an explainable machine learning study
- Author
Lo, Yui, Chen, Yuqian, Liu, Dongnan, Liu, Wan, Zekelman, Leo, Rushmore, Jarrett, Zhang, Fan, Rathi, Yogesh, Makris, Nikos, Golby, Alexandra J., Cai, Weidong, and O'Donnell, Lauren J.
- Subjects
Quantitative Biology - Neurons and Cognition, Computer Science - Machine Learning, Electrical Engineering and Systems Science - Image and Video Processing
- Abstract
The shape of the brain's white matter connections is relatively unexplored in diffusion MRI tractography analysis. While it is known that tract shape varies in populations and across the human lifespan, it is unknown if the variability in dMRI tractography-derived shape may relate to the brain's functional variability across individuals. This work explores the potential of leveraging tractography fiber cluster shape measures to predict subject-specific cognitive performance. We implement machine learning models to predict individual cognitive performance scores. We study a large-scale database from the HCP-YA study. We apply an atlas-based fiber cluster parcellation to the dMRI tractography of each individual. We compute 15 shape, microstructure, and connectivity features for each fiber cluster. Using these features as input, we train a total of 210 models to predict 7 different NIH Toolbox cognitive performance assessments. We apply an explainable AI technique, SHAP, to assess the importance of each fiber cluster for prediction. Our results demonstrate that shape measures are predictive of individual cognitive performance. The studied shape measures, such as irregularity, diameter, total surface area, volume, and branch volume, are as effective for prediction as microstructure and connectivity measures. The overall best-performing feature is a shape feature, irregularity, which describes how different a cluster's shape is from an idealized cylinder. Further interpretation using SHAP values suggest that fiber clusters with features highly predictive of cognitive ability are widespread throughout the brain, including fiber clusters from the superficial association, deep association, cerebellar, striatal, and projection pathways. This study demonstrates the strong potential of shape descriptors to enhance the study of the brain's white matter and its relationship to cognitive function.
- Published
- 2024
12. Resolution Enhancement of Under-sampled Photoacoustic Microscopy Images using Implicit Neural Representations
- Author
Xiao, Youshen, Liao, Sheng, Tian, Xuanyang, Zhang, Fan, Dong, Xinlong, Jiang, Yunhui, Chen, Xiyu, Sun, Ruixi, Zhang, Yuyao, and Gao, Fei
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Acoustic-Resolution Photoacoustic Microscopy (AR-PAM) is promising for subcutaneous vascular imaging, but its spatial resolution is constrained by the Point Spread Function (PSF). Traditional deconvolution methods like Richardson-Lucy and model-based deconvolution use the PSF to improve resolution. However, accurately measuring the PSF is difficult, leading to reliance on less accurate blind deconvolution techniques. Additionally, AR-PAM suffers from long scanning times, which can be reduced via down-sampling, but this necessitates effective image recovery from under-sampled data, a task where traditional interpolation methods fall short, particularly at high under-sampling rates. To address these challenges, we propose an approach based on Implicit Neural Representations (INR). This method learns a continuous mapping from spatial coordinates to initial acoustic pressure, overcoming the limitations of discrete imaging and enhancing AR-PAM's resolution. By treating the PSF as a learnable parameter within the INR framework, our technique mitigates inaccuracies associated with PSF estimation. We evaluated our method on simulated vascular data, showing significant improvements in Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM) over conventional methods. Qualitative enhancements were also observed in leaf vein and in vivo mouse brain microvasculature images. When applied to a custom AR-PAM system, experiments with pencil lead demonstrated that our method delivers sharper, higher-resolution results, indicating its potential to advance photoacoustic microscopy.
- Published
- 2024
13. Few Exemplar-Based General Medical Image Segmentation via Domain-Aware Selective Adaptation
- Author
Xu, Chen, Huang, Qiming, Hou, Yuqi, Wu, Jiangxing, Zhang, Fan, Chang, Hyung Jin, and Jiao, Jianbo
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Medical image segmentation poses challenges due to domain gaps, data modality variations, and dependency on domain knowledge or experts, especially for low- and middle-income countries (LMICs). Whereas for humans, given a few exemplars (with corresponding labels), we are able to segment different medical images even without extensive domain-specific clinical training. In addition, current SAM-based medical segmentation models use fine-grained visual prompts, such as the bounding rectangle generated from manually annotated target segmentation mask, as the bounding box (bbox) prompt during the testing phase. However, in actual clinical scenarios, no such precise prior knowledge is available. Our experimental results also reveal that previous models nearly fail to predict when given coarser bbox prompts. Considering these issues, in this paper, we introduce a domain-aware selective adaptation approach to adapt the general knowledge learned from a large model trained with natural images to the corresponding medical domains/modalities, with access to only a few (e.g. less than 5) exemplars. Our method mitigates the aforementioned limitations, providing an efficient and LMICs-friendly solution. Extensive experimental analysis showcases the effectiveness of our approach, offering potential advancements in healthcare diagnostics and clinical applications in LMICs.
- Comment
Accepted in ACCV 2024
- Published
- 2024
14. iFANnpp: Nuclear Power Plant Digital Twin for Robots and Autonomous Intelligence
- Author
Do, Youndo, Zebrowitz, Marc, Stahl, Jackson, and Zhang, Fan
- Subjects
Computer Science - Robotics
- Abstract
Robotics has gained significant attention due to its autonomy and ability to automate in the nuclear industry. However, the increasing complexity of robots has led to a growing demand for advanced simulation and control methods to predict robot behavior and optimize plant performance. Most existing digital twins only address parts of systems and do not offer an overall design of nuclear power plants. Furthermore, they are often designed for specific algorithms or tasks, making them unsuitable for broader research applications or other potential projects. In response, we propose a comprehensive nuclear power plant designed to enhance real-time monitoring, operational efficiency, and predictive maintenance. We selected to model a full-scope nuclear power plant in Unreal Engine 5 to incorporate the complexities and various phenomena. The high-resolution simulation environment is integrated with a General Pressurized Water Reactor Simulator, a high-fidelity physics-driven software, to create a realistic flow of nuclear power plant and a real-time updating virtual environment. Furthermore, the virtual environment provides various features and a Python bridge for researchers to test custom algorithms and frameworks easily. The digital twin's performance is presented, and several research ideas - such as multi-robot task scheduling and robot navigation in the radiation area - using implemented features are presented.
- Comment
12 pages, 9 figures
- Published
- 2024
15. UW-GS: Distractor-Aware 3D Gaussian Splatting for Enhanced Underwater Scene Reconstruction
- Author
Wang, Haoran, Anantrasirichai, Nantheera, Zhang, Fan, and Bull, David
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
3D Gaussian splatting (3DGS) offers the capability to achieve real-time high quality 3D scene rendering. However, 3DGS assumes that the scene is in a clear medium environment and struggles to generate satisfactory representations in underwater scenes, where light absorption and scattering are prevalent and moving objects are involved. To overcome these, we introduce a novel Gaussian Splatting-based method, UW-GS, designed specifically for underwater applications. It introduces a color appearance that models distance-dependent color variation, employs a new physics-based density control strategy to enhance clarity for distant objects, and uses a binary motion mask to handle dynamic content. Optimized with a well-designed loss function supporting for scattering media and strengthened by pseudo-depth maps, UW-GS outperforms existing methods with PSNR gains up to 1.26dB. To fully verify the effectiveness of the model, we also developed a new underwater dataset, S-UW, with dynamic object masks.
- Published
- 2024
16. Emu3: Next-Token Prediction is All You Need
- Author
Wang, Xinlong, Zhang, Xiaosong, Luo, Zhengxiong, Sun, Quan, Cui, Yufeng, Wang, Jinsheng, Zhang, Fan, Wang, Yueze, Li, Zhen, Yu, Qiying, Zhao, Yingli, Ao, Yulong, Min, Xuebin, Li, Tao, Wu, Boya, Zhao, Bo, Zhang, Bowen, Wang, Liangdong, Liu, Guang, He, Zheqi, Yang, Xi, Liu, Jingjing, Lin, Yonghua, Huang, Tiejun, and Wang, Zhongyuan
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
While next-token prediction is considered a promising path towards artificial general intelligence, it has struggled to excel in multimodal tasks, which are still dominated by diffusion models (e.g., Stable Diffusion) and compositional approaches (e.g., CLIP combined with LLMs). In this paper, we introduce Emu3, a new suite of state-of-the-art multimodal models trained solely with next-token prediction. By tokenizing images, text, and videos into a discrete space, we train a single transformer from scratch on a mixture of multimodal sequences. Emu3 outperforms several well-established task-specific models in both generation and perception tasks, surpassing flagship models such as SDXL and LLaVA-1.6, while eliminating the need for diffusion or compositional architectures. Emu3 is also capable of generating high-fidelity video via predicting the next token in a video sequence. We simplify complex multimodal model designs by converging on a singular focus: tokens, unlocking great potential for scaling both during training and inference. Our results demonstrate that next-token prediction is a promising path towards building general multimodal intelligence beyond language. We open-source key techniques and models to support further research in this direction.
- Comment
Project Page: https://emu.baai.ac.cn
- Published
- 2024
17. DualDn: Dual-domain Denoising via Differentiable ISP
- Author
Li, Ruikang, Wang, Yujin, Chen, Shiqi, Zhang, Fan, Gu, Jinwei, and Xue, Tianfan
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
- Abstract
Image denoising is a critical component in a camera's Image Signal Processing (ISP) pipeline. There are two typical ways to inject a denoiser into the ISP pipeline: applying a denoiser directly to captured raw frames (raw domain) or to the ISP's output sRGB images (sRGB domain). However, both approaches have their limitations. Residual noise from raw-domain denoising can be amplified by the subsequent ISP processing, and the sRGB domain struggles to handle spatially varying noise since it only sees noise distorted by the ISP. Consequently, most raw or sRGB domain denoising works only for specific noise distributions and ISP configurations. To address these challenges, we propose DualDn, a novel learning-based dual-domain denoising. Unlike previous single-domain denoising, DualDn consists of two denoising networks: one in the raw domain and one in the sRGB domain. The raw domain denoising adapts to sensor-specific noise as well as spatially varying noise levels, while the sRGB domain denoising adapts to ISP variations and removes residual noise amplified by the ISP. Both denoising networks are connected with a differentiable ISP, which is trained end-to-end and discarded during the inference stage. With this design, DualDn achieves greater generalizability compared to most learning-based denoising methods, as it can adapt to different unseen noises, ISP parameters, and even novel ISP pipelines. Experiments show that DualDn achieves state-of-the-art performance and can adapt to different denoising architectures. Moreover, DualDn can be used as a plug-and-play denoising module with real cameras without retraining, and still demonstrate better performance than commercial on-camera denoising. The project website is available at: https://openimaginglab.github.io/DualDn/
- Comment
Accepted at ECCV 2024, Project page: https://openimaginglab.github.io/DualDn/
- Published
- 2024
18. Cloud Adversarial Example Generation for Remote Sensing Image Classification
- Author
Ma, Fei, Feng, Yuqiang, Zhang, Fan, and Zhou, Yongsheng
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Most existing adversarial attack methods for remote sensing images merely add adversarial perturbations or patches, resulting in unnatural modifications. Clouds are common atmospheric effects in remote sensing images. Generating clouds on these images can produce adversarial examples better aligning with human perception. In this paper, we propose a Perlin noise based cloud generation attack method. Common Perlin noise based cloud generation is a random, non-optimizable process, which cannot be directly used to attack the target models. We design a Perlin Gradient Generator Network (PGGN), which takes a gradient parameter vector as input and outputs the grids of Perlin noise gradient vectors at different scales. After a series of computations based on the gradient vectors, cloud masks at corresponding scales can be produced. These cloud masks are then weighted and summed depending on a mixing coefficient vector and a scaling factor to produce the final cloud masks. The gradient vector, coefficient vector and scaling factor are collectively represented as a cloud parameter vector, transforming the cloud generation into a black-box optimization problem. The Differential Evolution (DE) algorithm is employed to solve for the optimal solution of the cloud parameter vector, achieving a query-based black-box attack. Detailed experiments confirm that this method has strong attack capabilities and achieves high query efficiency. Additionally, we analyze the transferability of the generated adversarial examples and their robustness in adversarial defense scenarios.
- Published
- 2024
19. NVRC: Neural Video Representation Compression
- Author
Kwan, Ho Man, Gao, Ge, Zhang, Fan, Gower, Andrew, and Bull, David
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Electrical Engineering and Systems Science - Image and Video Processing
- Abstract
Recent advances in implicit neural representation (INR)-based video coding have demonstrated its potential to compete with both conventional and other learning-based approaches. With INR methods, a neural network is trained to overfit a video sequence, with its parameters compressed to obtain a compact representation of the video content. However, although promising results have been achieved, the best INR-based methods are still out-performed by the latest standard codecs, such as VVC VTM, partially due to the simple model compression techniques employed. In this paper, rather than focusing on representation architectures as in many existing works, we propose a novel INR-based video compression framework, Neural Video Representation Compression (NVRC), targeting compression of the representation. Based on the novel entropy coding and quantization models proposed, NVRC, for the first time, is able to optimize an INR-based video codec in a fully end-to-end manner. To further minimize the additional bitrate overhead introduced by the entropy models, we have also proposed a new model compression framework for coding all the network, quantization and entropy model parameters hierarchically. Our experiments show that NVRC outperforms many conventional and learning-based benchmark codecs, with a 24% average coding gain over VVC VTM (Random Access) on the UVG dataset, measured in PSNR. As far as we are aware, this is the first time an INR-based video codec achieving such performance. The implementation of NVRC will be released at www.github.com.
- Published
- 2024
20. EVENet: Evidence-based Ensemble Learning for Uncertainty-aware Brain Parcellation Using Diffusion MRI
- Author
Li, Chenjun, Yang, Dian, Yao, Shun, Wang, Shuyue, Wu, Ye, Zhang, Le, Li, Qiannuo, Cho, Kang Ik Kevin, Seitz-Holland, Johanna, Ning, Lipeng, Legarreta, Jon Haitz, Rathi, Yogesh, Westin, Carl-Fredrik, O'Donnell, Lauren J., Sochen, Nir A., Pasternak, Ofer, and Zhang, Fan
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing ,Computer Science - Computer Vision and Pattern Recognition - Abstract
In this study, we developed an Evidence-based Ensemble Neural Network, namely EVENet, for anatomical brain parcellation using diffusion MRI. The key innovation of EVENet is the design of an evidential deep learning framework to quantify predictive uncertainty at each voxel during a single inference. Using EVENet, we obtained accurate parcellation and uncertainty estimates across different datasets from healthy and clinical populations and with different imaging acquisitions. The overall network includes five parallel subnetworks, where each is dedicated to learning the FreeSurfer parcellation for a certain diffusion MRI parameter. An evidence-based ensemble methodology is then proposed to fuse the individual outputs. We perform experimental evaluations on large-scale datasets from multiple imaging sources, including high-quality diffusion MRI data from healthy adults and clinical diffusion MRI data from participants with various brain diseases (schizophrenia, bipolar disorder, attention-deficit/hyperactivity disorder, Parkinson's disease, cerebral small vessel disease, and neurosurgical patients with brain tumors). Compared to several state-of-the-art methods, our experimental results demonstrate highly improved parcellation accuracy across the multiple testing datasets despite the differences in dMRI acquisition protocols and health conditions. Furthermore, thanks to the uncertainty estimation, our EVENet approach demonstrates a good ability to detect abnormal brain regions in patients with lesions, enhancing the interpretability and reliability of the segmentation results., Comment: 15 pages, 5 figures
- Published
- 2024
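Note on record 20: the abstract says that an evidential deep learning framework yields a per-voxel uncertainty in a single inference, but it does not give the formulas. The sketch below shows the commonly used Dirichlet (subjective-logic) formulation as an illustration only. Whether EVENet uses exactly this form is an assumption, and the function name `dirichlet_uncertainty` is hypothetical.

```python
def dirichlet_uncertainty(evidence):
    """Turn non-negative per-class evidence for one voxel into
    expected class probabilities and a scalar uncertainty.

    Assumed (generic) evidential formulation, not taken from the abstract:
      alpha_k = evidence_k + 1,  S = sum(alpha),
      p_k = alpha_k / S,         u = K / S.
    """
    if not evidence:
        raise ValueError("evidence must contain at least one class")
    if any(e < 0 for e in evidence):
        raise ValueError("evidence must be non-negative")
    num_classes = len(evidence)
    alpha = [e + 1.0 for e in evidence]   # Dirichlet concentration parameters
    strength = sum(alpha)                 # Dirichlet strength S
    probs = [a / strength for a in alpha] # expected class probabilities
    uncertainty = num_classes / strength  # 1.0 when there is no evidence at all
    return probs, uncertainty


if __name__ == "__main__":
    # A voxel with no evidence is maximally uncertain.
    print(dirichlet_uncertainty([0.0, 0.0, 0.0]))
    # Strong evidence for class 0 lowers the uncertainty.
    print(dirichlet_uncertainty([9.0, 0.0, 0.0]))
```

In this formulation the uncertainty shrinks as total evidence grows, which is what lets a single forward pass flag voxels where the network has collected little evidence for any label.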
21. Renormalized Connection for Scale-preferred Object Detection in Satellite Imagery
- Author
-
Zhang, Fan, Li, Lingling, Jiao, Licheng, Liu, Xu, Liu, Fang, Yang, Shuyuan, and Hou, Biao
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Satellite imagery, due to its long-range imaging, brings with it a variety of scale-preferred tasks, such as the detection of tiny/small objects, making the precise localization and detection of small objects of interest a challenging task. In this article, we design a Knowledge Discovery Network (KDN) to implement the renormalization group theory in terms of efficient feature extraction. Renormalized connection (RC) on the KDN enables "synergistic focusing" of multi-scale features. Based on our observations of KDN, we abstract a class of RCs with different connection strengths, called n21C, and generalize it to FPN-based multi-branch detectors. In a series of FPN experiments on the scale-preferred tasks, we found that the "divide-and-conquer" idea of FPN severely hampers the detector's learning in the right direction due to the large number of large-scale negative samples and interference from background noise. Moreover, these negative samples cannot be eliminated by the focal loss function. The RCs extend the multi-level feature's "divide-and-conquer" mechanism of the FPN-based detectors to a wide range of scale-preferred tasks, and enable synergistic effects of multi-level features on the specific learning goal. In addition, interference activations in two aspects are greatly reduced and the detector learns in a more correct direction. Extensive experiments of 17 well-designed detection architectures embedded with n21s on five different levels of scale-preferred tasks validate the effectiveness and efficiency of the RCs. In particular, the simplest linear form of RC, E421C, performs well in all tasks and satisfies the scaling property of RGT. We hope that our approach will transfer a large number of well-designed detectors from the computer vision community to the remote sensing community., Comment: 24 pages, 14 figures Journal
- Published
- 2024
22. Transmit Beamforming Design for ISAC with Stacked Intelligent Metasurfaces
- Author
-
Li, Shunyu, Zhang, Fan, Mao, Tianqi, Na, Rui, Wang, Zhaocheng, and Karagiannidis, George K.
- Subjects
Electrical Engineering and Systems Science - Signal Processing - Abstract
This paper proposes a transmit beamforming strategy for the integrated sensing and communication (ISAC) systems enabled by the novel stacked intelligent metasurface (SIM) architecture, where the base station (BS) simultaneously performs downlink communication and radar target detection via different beams. To ensure superior dual-function performance simultaneously, we design the multi-layer cascading beamformer by maximizing the sum rate of the users while optimally shaping the normalized beam pattern for detection. A dual-normalized differential gradient descent (D3) algorithm is further proposed to solve the resulting non-convex multi-objective problem (MOP), where gradient differences and dual normalization are employed to ensure a fair trade-off between communication and sensing objectives. Numerical results demonstrate the superiority of the proposed beamforming design in terms of balancing communication and sensing performance.
- Published
- 2024
23. Affordance-based Robot Manipulation with Flow Matching
- Author
-
Zhang, Fan and Gienger, Michael
- Subjects
Computer Science - Robotics ,Computer Science - Artificial Intelligence - Abstract
We present a framework for assistive robot manipulation, which focuses on two fundamental challenges: first, efficiently adapting large-scale models to downstream scene affordance understanding tasks, especially in daily living scenarios where gathering multi-task data involving humans requires strenuous effort; second, effectively learning robot trajectories by grounding the visual affordance model. We tackle the first challenge by employing a parameter-efficient prompt tuning method that prepends learnable text prompts to the frozen vision model to predict manipulation affordances in multi-task scenarios. Then we propose to learn robot trajectories guided by affordances in a supervised Flow Matching method. Flow matching represents a robot visuomotor policy as a conditional process of flowing random waypoints to desired robot trajectories. Finally, we introduce a real-world dataset with 10 tasks across Activities of Daily Living to test our framework. Our extensive evaluation highlights that the proposed prompt tuning method for learning manipulation affordance with language prompter achieves competitive performance and even outperforms other finetuning protocols across data scales, while satisfying parameter efficiency. Learning multi-task robot trajectories with flow matching policy also leads to consistently better generalization performance and faster inference than alternative behavior cloning methods, especially given multimodal robot action distributions. Our framework seamlessly unifies affordance model learning and trajectory generation with flow matching for robot manipulation.
- Published
- 2024
24. PNVC: Towards Practical INR-based Video Compression
- Author
-
Gao, Ge, Kwan, Ho Man, Zhang, Fan, and Bull, David
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Neural video compression has recently demonstrated significant potential to compete with conventional video codecs in terms of rate-quality performance. These learned video codecs are however associated with various issues related to decoding complexity (for autoencoder-based methods) and/or system delays (for implicit neural representation (INR) based models), which currently prevent them from being deployed in practical applications. In this paper, targeting a practical neural video codec, we propose a novel INR-based coding framework, PNVC, which innovatively combines autoencoder-based and overfitted solutions. Our approach benefits from several design innovations, including a new structural reparameterization-based architecture, hierarchical quality control, modulation-based entropy modeling, and scale-aware positional embedding. Supporting both low delay (LD) and random access (RA) configurations, PNVC outperforms existing INR-based codecs, achieving nearly 35%+ BD-rate savings against HEVC HM 18.0 (LD) - almost 10% more compared to one of the state-of-the-art INR-based codecs, HiNeRV and 5% more over VTM 20.0 (LD), while maintaining 20+ FPS decoding speeds for 1080p content. This represents an important step forward for INR-based video coding, moving it towards practical deployment. The source code will be available for public evaluation.
- Published
- 2024
25. Mitochondrial Genome of an 8,400-Year-Old Individual from Northern China Reveals a Novel Subclade under C5d
- Author
-
Wu, Xiyan, Ning, Chao, Bao, Qingchuan, Gao, Shizhu, Zhang, Fan, Wu, Sihao, Li, Tianjiao, Fan, Linyuan, Li, Tao, Yang, Xuan, Cai, Dawei, and Cui, Yinqiu
- Published
- 2020
26. Cerebrovascular disease is associated with Alzheimer’s plasma biomarker concentrations in adults with Down syndrome
- Author
-
Edwards, Natalie C, Lao, Patrick J, Alshikho, Mohamad J, Ericsson, Olivia M, Rizvi, Batool, Petersen, Melissa E, O’Bryant, Sid, Aguilar, Lisi Flores, Simoes, Sabrina, Mapstone, Mark, Tudorascu, Dana L, Janelidze, Shorena, Hansson, Oskar, Handen, Benjamin L, Christian, Bradley T, Lee, Joseph H, Lai, Florence, Rosas, H Diana, Zaman, Shahid, Lott, Ira T, Yassa, Michael A, Aizenstein, Howard J, Ances, Beau M, Andrews, Howard F, Bell, Karen, Birn, Rasmus M, Brickman, Adam M, Bulova, Peter, Cheema, Amrita, Chen, Kewei, Clare, Isabel, Cohen, Ann D, Constantino, John N, Doran, Eric W, Fagan, Anne, Feingold, Eleanor, Foroud, Tatiana M, Harp, Jordan, Hartley, Sigan L, Head, Elizabeth, Henson, Rachel, Hom, Christy, Honig, Lawrence, Ikonomovic, Milos D, Johnson, Sterling C, Jordan, Courtney, Kamboh, M Ilyas, Keator, David, Klunk, William E, Kofler, Julia K, Kreisl, William Charles, Krinsky-McHale, Sharon J, Lao, Patrick, Laymon, Charles, Lupson, Victoria, Mathis, Chester A, Minhas, Davneet Singh, Nadkarni, Neelesh, Parisi, Melissa, Pang, Deborah, Petersen, Melissa, Price, Julie C, Pulsifer, Margaret, Rafii, Michael S, Reiman, Eric, Rosas, Herminia Diana, Ryan, Laurie, Schmitt, Frederick, Schupf, Nicole, Silverman, Wayne P, Tumuluru, Rameshwari, Tycko, Benjamin, Varadarajan, Badri, White, Desiree A, Zhang, Fan, Gutierrez, José, and Wilcock, Donna M
- Subjects
Biomedical and Clinical Sciences ,Biological Psychology ,Clinical Sciences ,Neurosciences ,Psychology ,Alzheimer's Disease Related Dementias (ADRD) ,Aging ,Cerebrovascular ,Brain Disorders ,Vascular Cognitive Impairment/Dementia ,Neurodegenerative ,Alzheimer's Disease including Alzheimer's Disease Related Dementias (AD/ADRD) ,Alzheimer's Disease ,Prevention ,Dementia ,Acquired Cognitive Impairment ,Clinical Research ,Biomedical Imaging ,2.1 Biological and endogenous factors ,4.1 Discovery and preclinical testing of markers and technologies ,Neurological ,Alzheimer's disease ,Down syndrome ,cerebrovascular disease ,magnetic resonance imaging ,biomarkers ,Alzheimer’s Biomarkers Consortium–Down Syndrome (ABC-DS) Investigators ,Alzheimer’s disease ,Clinical sciences ,Biological psychology - Abstract
By age 40 years, over 90% of adults with Down syndrome have Alzheimer's disease pathology and most progress to dementia. Despite having few systemic vascular risk factors, individuals with Down syndrome have elevated cerebrovascular disease markers that track with the clinical progression of Alzheimer's disease, suggesting a role of cerebrovascular disease that is hypothesized to be mediated by inflammatory factors. This study examined the pathways through which small vessel cerebrovascular disease contributes to Alzheimer's disease-related pathophysiology and neurodegeneration in adults with Down syndrome. One hundred eighty-five participants from the Alzheimer's Biomarkers Consortium-Down Syndrome [mean (SD) age = 45.2 (9.3) years] with available MRI and plasma biomarker data were included in this study. White matter hyperintensity (WMH) volumes were derived from T2-weighted fluid-attenuated inversion recovery MRI scans, and plasma biomarker concentrations of amyloid beta 42/40, phosphorylated tau 217, astrocytosis (glial fibrillary acidic protein) and neurodegeneration (neurofilament light chain) were measured with ultrasensitive immunoassays. We examined the bivariate relationships of WMH, amyloid beta 42/40, phosphorylated tau 217 and glial fibrillary acidic protein with age-residualized neurofilament light chain across Alzheimer's disease diagnostic groups. A series of mediation and path analyses examined statistical pathways linking WMH and Alzheimer's disease pathophysiology to promote neurodegeneration in the total sample and groups stratified by clinical diagnosis. There was a direct and indirect bidirectional effect through the glial fibrillary acidic protein of WMH on phosphorylated tau 217 concentration, which was associated with neurofilament light chain concentration in the entire sample. 
Amongst cognitively stable participants, WMH was directly and indirectly, through glial fibrillary acidic protein, associated with phosphorylated tau 217 concentration, and in those with mild cognitive impairment, there was a direct effect of WMH on phosphorylated tau 217 and neurofilament light chain concentrations. There were no associations of WMH with biomarker concentrations among those diagnosed with dementia. The findings from this cross-sectional study suggest that among individuals with Down syndrome, cerebrovascular disease promotes neurodegeneration by increasing astrocytosis and tau pathophysiology in the presymptomatic phases of Alzheimer's disease, but future studies will need to confirm these associations with longitudinal data. This work joins an emerging literature that implicates cerebrovascular disease and its interface with neuroinflammation as a core pathological feature of Alzheimer's disease in adults with Down syndrome.
- Published
- 2024
27. Signatures of sliding Wigner crystals in bilayer graphene at zero and finite magnetic fields
- Author
-
Seiler, Anna M., Statz, Martin, Eckel, Christian, Weimer, Isabell, Pöhls, Jonas, Watanabe, Kenji, Taniguchi, Takashi, Zhang, Fan, and Weitz, R. Thomas
- Subjects
Condensed Matter - Mesoscale and Nanoscale Physics ,Condensed Matter - Strongly Correlated Electrons - Abstract
AB-stacked bilayer graphene has emerged as a fascinating yet simple platform for exploring macroscopic quantum phenomena of correlated electrons. Unexpectedly, a phase with negative dR/dT has recently been observed when a large electric displacement field is applied and the charge carrier density is tuned to the vicinity of an ultra-low-density van Hove singularity. This phase exhibits features consistent with Wigner crystallization, including a characteristic temperature dependence and non-linear current bias behavior. However, more direct evidence for the emergence of an electron crystal in AB-stacked bilayer graphene at zero magnetic field remains elusive. Here we explore the low-frequency noise consistent with depinning and sliding of a Wigner crystal lattice. The current bias and frequency dependence of these noise spectra align well with findings from previous experimental and theoretical studies on the quantum electron solids. Our results offer transport signatures consistent with Wigner crystallization in AB-stacked bilayer graphene at zero and finite magnetic fields, paving the way for further substantiating an anomalous Hall crystal in its original form.
- Published
- 2024
28. When Diffusion MRI Meets Diffusion Model: A Novel Deep Generative Model for Diffusion MRI Generation
- Author
-
Zhu, Xi, Zhang, Wei, Li, Yijie, O'Donnell, Lauren J., and Zhang, Fan
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing ,Computer Science - Computer Vision and Pattern Recognition - Abstract
Diffusion MRI (dMRI) is an advanced imaging technique characterizing tissue microstructure and white matter structural connectivity of the human brain. The demand for high-quality dMRI data is growing, driven by the need for better resolution and improved tissue contrast. However, acquiring high-quality dMRI data is expensive and time-consuming. In this context, deep generative modeling emerges as a promising solution to enhance image quality while minimizing acquisition costs and scanning time. In this study, we propose a novel generative approach to perform dMRI generation using deep diffusion models. It can generate high dimension (4D) and high resolution data preserving the gradients information and brain structure. We demonstrated our method through an image mapping task aimed at enhancing the quality of dMRI images from 3T to 7T. Our approach demonstrates highly enhanced performance in generating dMRI images when compared to the current state-of-the-art (SOTA) methods. This achievement underscores a substantial progression in enhancing dMRI quality, highlighting the potential of our novel generative approach to revolutionize dMRI imaging standards., Comment: 11 pages, 3 figures
- Published
- 2024
29. HDN: Hybrid Deep-learning and Non-line-of-sight Reconstruction Framework for Photoacoustic Brain Imaging
- Author
-
Wan, Pengcheng, Zhang, Fan, Shen, Yuting, Shang, Xin, Zhao, Hulin, Liu, Shuangli, Feng, Xiaohua, and Gao, Fei
- Subjects
Physics - Medical Physics ,Electrical Engineering and Systems Science - Image and Video Processing ,Physics - Optics - Abstract
Photoacoustic imaging (PAI) combines the high contrast of optical imaging with the deep penetration depth of ultrasonic imaging, showing great potential in cerebrovascular disease detection. However, the ultrasonic wave suffers strong attenuation and multi-scattering when it passes through the skull tissue, resulting in the distortion of the collected photoacoustic (PA) signal. In this paper, inspired by the principles of deep learning and non-line-of-sight (NLOS) imaging, we propose an image reconstruction framework named HDN (Hybrid Deep-learning and Non-line-of-sight), which consists of the signal extraction part and difference utilization part. The signal extraction part is used to correct the distorted signal and reconstruct an initial image. The difference utilization part is used to make further use of the signal difference between the distorted signal and corrected signal, reconstructing the residual image between the initial image and the target image. The test results on a PA digital brain simulation dataset show that compared with the traditional delay-and-sum (DAS) method and deep-learning-based method, HDN achieved superior performance in both signal correction and image reconstruction. Specifically for the SSIM index, the HDN reached 0.606 in imaging results, compared to 0.154 for the DAS method and 0.307 for the deep-learning-based method., Comment: 8 pages, 8 figures
- Published
- 2024
30. A dynamical systems perspective on the celestial mechanical contribution to the emergence of life
- Author
-
Zhang, Fan
- Subjects
Nonlinear Sciences - Chaotic Dynamics - Abstract
Biological activities are often seen entrained onto the day-night and other celestial mechanical cycles (e.g., seasonal and lunar), but studies on the origin of life have largely not accounted for such periodic external environmental variations. We argue that this may be an important omission, because the signature replication behaviour of life represents temporal memory in the dynamics of ecosystems, that signifies the absence of mixing properties (i.e., the dynamics are not fully chaotic), and entrainment onto regular, periodic external perturbative influences has been proven capable of suppressing chaos, and thus may bring otherwise unstable chemical reaction sets into viability, as precursors to abiogenesis. As well, external perturbations may be necessary to prevent an open dissipative (bio)chemical system from collapsing into the opposite extreme -- the point attractor of thermal equilibrium. In short, life may precariously rest on the edge of chaos, and open-loop periodic perturbation rooted in celestial mechanics (and should be simulated in laboratory experiments in origin-of-life studies) may help with the balancing. Such considerations, if pertinent, would also be consequential to exobiology, e.g., in regard to tidal-locking properties of potential host worlds., Comment: 6 pages
- Published
- 2024
31. Diverse Impacts of Spin-Orbit Coupling on Superconductivity in Rhombohedral Graphene
- Author
-
Yang, Jixiang, Shi, Xiaoyan, Ye, Shenyong, Yoon, Chiho, Lu, Zhengguang, Kakani, Vivek, Han, Tonghang, Seo, Junseok, Shi, Lihan, Watanabe, Kenji, Taniguchi, Takashi, Zhang, Fan, and Ju, Long
- Subjects
Condensed Matter - Superconductivity ,Condensed Matter - Strongly Correlated Electrons - Abstract
Engineering non-Abelian quasiparticles by combining superconductivity and topological states has been proposed as a route to realize topological quantum computation. Rhombohedral multilayer graphene with layer number N>=3 has been shown to be a promising platform, as it hosts integer and fractional quantum anomalous Hall effects when proximitized by transition metal dichalcogenide (TMD) and a moiré potential. However, superconductivity in similar devices has remained largely unexplored, although the proximitized spin-orbit-coupling (SOC) effect has been shown to strengthen or induce superconductivity in both crystalline and twisted graphene. Here we report electron transport measurements of TMD-proximitized rhombohedral trilayer graphene (RTG) at temperatures down to 40 mK. We observed a new hole-doped superconducting state SC4 with a transition temperature Tc of 230 mK. On the electron-doped side, we identified a new isospin-symmetry breaking three-quarter-metal (TQM) phase. Near this three-quarter-metal state, the state SC3, very weak in bare RTG, is fully developed into a superconducting state at 110 mK. By performing fermiology analysis based on the quantum oscillation measurement, we showed that the SC3 and SC4 states reside at the phase boundaries between different isospin-symmetry-breaking states. These observations are aligned with the existing understanding that SOC enhances graphene superconductivity. Surprisingly, the original superconducting state SC1 in bare RTG is strongly suppressed in the presence of TMD, and we cannot find it down to the base temperature of our measurement. Our observations form the basis of exploring superconductivity and non-Abelian quasiparticles in rhombohedral graphene devices, and provide experimental evidence that challenges the understanding of the impacts of SOC on graphene superconductivity., Comment: 35 pages; 4 figures, 1 table, 13 extended data figures
- Published
- 2024
32. BVI-UGC: A Video Quality Database for User-Generated Content Transcoding
- Author
-
Qi, Zihao, Feng, Chen, Zhang, Fan, Xu, Xiaozhong, Liu, Shan, and Bull, David
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing ,Computer Science - Computer Vision and Pattern Recognition - Abstract
In recent years, user-generated content (UGC) has become one of the major video types consumed via streaming networks. Numerous research contributions have focused on assessing its visual quality through subjective tests and objective modeling. In most cases, objective assessments are based on a no-reference scenario, where the corresponding reference content is assumed not to be available. However, full-reference video quality assessment is also important for UGC in the delivery pipeline, particularly associated with the video transcoding process. In this context, we present a new UGC video quality database, BVI-UGC, for user-generated content transcoding, which contains 60 (non-pristine) reference videos and 1,080 test sequences. In this work, we simulated the creation of non-pristine reference sequences (with a wide range of compression distortions), typical of content uploaded to UGC platforms for transcoding. A comprehensive crowdsourced subjective study was then conducted involving more than 3,500 human participants. Based on this collected subjective data, we benchmarked the performance of 10 full-reference and 11 no-reference quality metrics. Our results demonstrate the poor performance (SROCC values are lower than 0.6) of these metrics in predicting the perceptual quality of UGC in two different scenarios (with or without a reference)., Comment: 12 pages, 11 figures
- Published
- 2024
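Note on record 32: the abstract reports metric performance as SROCC (Spearman rank-order correlation coefficient) values below 0.6. As a reminder of what that figure measures, the following is a minimal pure-Python sketch that ranks metric predictions and subjective scores (average ranks for ties) and takes the Pearson correlation of the ranks. The function names and the sample numbers are illustrative assumptions, not data from the paper.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def srocc(predicted, subjective):
    """Spearman rank-order correlation between metric output and opinion scores."""
    if len(predicted) != len(subjective) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    return pearson(average_ranks(predicted), average_ranks(subjective))


if __name__ == "__main__":
    metric_scores = [0.2, 0.5, 0.4, 0.9]   # hypothetical metric outputs
    opinion_scores = [1.0, 3.0, 4.0, 5.0]  # hypothetical subjective scores
    print(srocc(metric_scores, opinion_scores))
```

Because only rank order matters, SROCC is insensitive to any monotonic rescaling of a metric, which is why it is a common way to compare quality metrics with different output ranges.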
33. Non-Hermitian Singularities in Scattering Spectra of Mie Resonators
- Author
-
Zhang, Fan, Solodovchenko, Nikolay S., Fan, Hangkai, Limonov, Mikhail F., Song, Mingzhao, Kivshar, Yuri S., and Bogdanov, Andrey A.
- Subjects
Physics - Classical Physics ,Physics - Optics - Abstract
Non-Hermitian systems are known to possess unique singularities in the scattering spectra such as exceptional points, bound states in the continuum, diabolic points, and anapole states, which are usually considered to be independent. Here, we demonstrate the fundamental relationships between non-Hermitian singularities and observe them experimentally in the scattering spectra. We reveal that exceptional points appear in the anapole regime, and diabolic points are associated with superscattering. We confirm our findings with microwave experiments by measuring the scattering spectra of subwavelength Mie-resonant ceramic rings. Our study underpins the generic behavior of non-Hermitian singularities in the scattering spectra of subwavelength resonators, uncovering their novel applications in non-Hermitian nonlinear optics and topological photonics., Comment: 19 pages, 5 figures
- Published
- 2024
34. Mesh deformation-based single-view 3D reconstruction of thin eyeglasses frames with differentiable rendering
- Author
-
Zhang, Fan, Ji, Ziyue, Kang, Weiguang, Li, Weiqing, and Su, Zhiyong
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Graphics - Abstract
With the support of Virtual Reality (VR) and Augmented Reality (AR) technologies, the 3D virtual eyeglasses try-on application is well on its way to becoming a new trending solution that offers a "try on" option to select the perfect pair of eyeglasses at the comfort of your own home. Reconstructing eyeglasses frames from a single image with traditional depth and image-based methods is extremely difficult due to their unique characteristics such as lack of sufficient texture features, thin elements, and severe self-occlusions. In this paper, we propose the first mesh deformation-based reconstruction framework for recovering high-precision 3D full-frame eyeglasses models from a single RGB image, leveraging prior and domain-specific knowledge. Specifically, based on the construction of a synthetic eyeglasses frame dataset, we first define a class-specific eyeglasses frame template with pre-defined keypoints. Then, given an input eyeglasses frame image with thin structure and few texture features, we design a keypoint detector and refiner to detect predefined keypoints in a coarse-to-fine manner to estimate the camera pose accurately. After that, using differentiable rendering, we propose a novel optimization approach for producing correct geometry by progressively performing free-form deformation (FFD) on the template mesh. We define a series of loss functions to enforce consistency between the rendered result and the corresponding RGB input, utilizing constraints from inherent structure, silhouettes, keypoints, per-pixel shading information, and so on. Experimental results on both the synthetic dataset and real images demonstrate the effectiveness of the proposed algorithm.
- Published
- 2024
35. Benchmarking Conventional and Learned Video Codecs with a Low-Delay Configuration
- Author
-
Teng, Siyue, Jiang, Yuxuan, Gao, Ge, Zhang, Fan, Davis, Thomas, Liu, Zoe, and Bull, David
- Subjects
Computer Science - Multimedia ,Computer Science - Computer Vision and Pattern Recognition ,Electrical Engineering and Systems Science - Image and Video Processing - Abstract
Recent advances in video compression have seen significant coding performance improvements with the development of new standards and learning-based video codecs. However, most of these works focus on application scenarios that allow a certain amount of system delay (e.g., Random Access mode in MPEG codecs), which is not always acceptable for live delivery. This paper conducts a comparative study of state-of-the-art conventional and learned video coding methods based on a low delay configuration. Specifically, this study includes two MPEG standard codecs (H.266/VVC VTM and JVET ECM), two AOM codecs (AV1 libaom and AVM), and two recent neural video coding models (DCVC-DC and DCVC-FM). To allow a fair and meaningful comparison, the evaluation was performed on test sequences defined in the AOM and MPEG common test conditions in the YCbCr 4:2:0 color space. The evaluation results show that the JVET ECM codecs offer the best overall coding performance among all codecs tested, with a 16.1% (based on PSNR) average BD-rate saving over AOM AVM, and 11.0% over DCVC-FM. We also observed inconsistent performance with the learned video codecs, DCVC-DC and DCVC-FM, for test content with large background motions.
- Published
- 2024
36. BVI-AOM: A New Training Dataset for Deep Video Compression Optimization
- Author
-
Nawała, Jakub, Jiang, Yuxuan, Zhang, Fan, Zhu, Xiaoqing, Sole, Joel, and Bull, David
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing - Abstract
Deep learning is now playing an important role in enhancing the performance of conventional hybrid video codecs. These learning-based methods typically require diverse and representative training material for optimization in order to achieve model generalization and optimal coding performance. However, existing datasets either offer limited content variability or come with restricted licensing terms constraining their use to research purposes only. To address these issues, we propose a new training dataset, named BVI-AOM, which contains 956 uncompressed sequences at various resolutions from 270p to 2160p, covering a wide range of content and texture types. The dataset comes with more flexible licensing terms and offers competitive performance when used as a training set for optimizing deep video coding tools. The experimental results demonstrate that when used as a training set to optimize two popular network architectures for two different coding tools, the proposed dataset leads to additional bitrate savings of up to 0.29 and 2.98 percentage points in terms of PSNR-Y and VMAF, respectively, compared to an existing training dataset, BVI-DVC, which has been widely used for deep video coding. The BVI-AOM dataset is available at https://github.com/fan-aaron-zhang/bvi-aom, Comment: 5 pages, 5 figures. Swapped the PSNR-HVS plot in Fig. 3 for a PSNR-YUV plot. Updated Fig. 3 (SI/TI/CF plots) and added the URL to the dataset
- Published
- 2024
37. Field-Tunable Valley Coupling and Localization in a Dodecagonal Semiconductor Quasicrystal
- Author
-
Liu, Zhida, Gao, Qiang, Li, Yanxing, Liu, Xiaohui, Zhang, Fan, Kim, Dong Seob, Ni, Yue, Mackenzie, Miles, Abudayyeh, Hamza, Watanabe, Kenji, Taniguchi, Takashi, Shih, Chih-Kang, Khalaf, Eslam, and Li, Xiaoqin
- Subjects
Condensed Matter - Materials Science ,Physics - Optics - Abstract
Quasicrystals are characterized by atomic arrangements possessing long-range order without periodicity. Van der Waals (vdW) bilayers provide a unique opportunity to controllably vary atomic alignment between two layers from a periodic moiré crystal to an aperiodic quasicrystal. Here, we reveal a remarkable consequence of the unique atomic arrangement in a dodecagonal WSe2 quasicrystal: the K and Q valleys in separate layers are brought arbitrarily close in momentum space via higher-order Umklapp scatterings. A modest perpendicular electric field is sufficient to induce strong interlayer K-Q hybridization, manifested as a new hybrid excitonic doublet. Concurrently, we observe the disappearance of the trion resonance and attribute it to quasicrystal potential driven localization. Our findings highlight the remarkable attribute of incommensurate systems to bring any pair of momenta into close proximity, thereby introducing a novel aspect to valley engineering., Comment: 12 pages, 12 figures
- Published
- 2024
38. DiM-Gesture: Co-Speech Gesture Generation with Adaptive Layer Normalization Mamba-2 framework
- Author
-
Zhang, Fan, Ji, Naye, Gao, Fuxing, Zhao, Bozuo, Wu, Jingmei, Jiang, Yanbing, Du, Hui, Ye, Zhenqing, Zhu, Jiayang, Zhong, WeiFan, Yan, Leyao, and Ma, Xiaomeng
- Subjects
Computer Science - Graphics ,Computer Science - Artificial Intelligence ,Computer Science - Robotics ,Computer Science - Sound - Abstract
Speech-driven gesture generation is an emerging domain within virtual human creation, where current methods predominantly utilize Transformer-based architectures that necessitate extensive memory and are characterized by slow inference speeds. In response to these limitations, we propose DiM-Gestures, a novel end-to-end generative model crafted to create highly personalized 3D full-body gestures solely from raw speech audio, employing Mamba-based architectures. This model integrates a Mamba-based fuzzy feature extractor with a non-autoregressive Adaptive Layer Normalization (AdaLN) Mamba-2 diffusion architecture. The extractor, leveraging a Mamba framework and a WavLM pre-trained model, autonomously derives implicit, continuous fuzzy features, which are then unified into a singular latent feature. This feature is processed by the AdaLN Mamba-2, which implements a uniform conditional mechanism across all tokens to robustly model the interplay between the fuzzy features and the resultant gesture sequence. This innovative approach guarantees high fidelity in gesture-speech synchronization while maintaining the naturalness of the gestures. Employing a diffusion model for training and inference, our framework has undergone extensive subjective and objective evaluations on the ZEGGS and BEAT datasets. These assessments substantiate our model's enhanced performance relative to contemporary state-of-the-art methods, demonstrating competitive outcomes with the DiTs architecture (Persona-Gestors) while optimizing memory usage and accelerating inference speed., Comment: 10 pages, 10 figures. arXiv admin note: text overlap with arXiv:2403.10805
- Published
- 2024
39. Diffusion Feedback Helps CLIP See Better
- Author
-
Wang, Wenxuan, Sun, Quan, Zhang, Fan, Tang, Yepeng, Liu, Jing, and Wang, Xinlong
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Contrastive Language-Image Pre-training (CLIP), which excels at abstracting open-world representations across domains and modalities, has become a foundation for a variety of vision and multimodal tasks. However, recent studies reveal that CLIP has severe visual shortcomings: for example, it can hardly distinguish orientation, quantity, color, structure, etc. These visual shortcomings also limit the perception capabilities of multimodal large language models (MLLMs) built on CLIP. The main reason could be that the image-text pairs used to train CLIP are inherently biased, due to the lack of distinctiveness in the text and of diversity in the images. In this work, we present a simple post-training approach for CLIP models, which largely overcomes its visual shortcomings via a self-supervised diffusion process. We introduce DIVA, which uses the DIffusion model as a Visual Assistant for CLIP. Specifically, DIVA leverages generative feedback from text-to-image diffusion models to optimize CLIP representations, with only images (without corresponding text). We demonstrate that DIVA substantially improves CLIP's performance (e.g., by 3-7%) on the challenging MMVP-VLM benchmark, which assesses fine-grained visual abilities, and enhances the performance of MLLMs and vision models on multimodal understanding and segmentation tasks. Extensive evaluation on 29 image classification and retrieval benchmarks confirms that our framework preserves CLIP's strong zero-shot capabilities. The code is available at https://github.com/baaivision/DIVA.
- Published
- 2024
40. White Matter Geometry-Guided Score-Based Diffusion Model for Tissue Microstructure Imputation in Tractography Imaging
- Author
-
Lo, Yui, Chen, Yuqian, Zhang, Fan, Liu, Dongnan, Zekelman, Leo, Cetin-Karayumak, Suheyla, Rathi, Yogesh, Cai, Weidong, and O'Donnell, Lauren J.
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Parcellation of white matter tractography provides anatomical features for disease prediction, anatomical tract segmentation, surgical brain mapping, and non-imaging phenotype classifications. However, parcellation does not always reach 100% accuracy due to various factors, including inter-individual anatomical variability and the quality of neuroimaging scan data. The failure to identify parcels causes a problem of missing microstructure data values, which is especially challenging for downstream tasks that analyze large brain datasets. In this work, we propose a novel deep-learning model to impute tissue microstructure: the White Matter Geometry-guided Diffusion (WMG-Diff) model. Specifically, we first propose a deep score-based guided diffusion model to impute tissue microstructure for diffusion magnetic resonance imaging (dMRI) tractography fiber clusters. Second, we propose a white matter atlas geometric relationship-guided denoising function to guide the reverse denoising process at the subject-specific level. Third, we train and evaluate our model on a large dataset with 9342 subjects. Comprehensive experiments for tissue microstructure imputation and a downstream non-imaging phenotype prediction task demonstrate that our proposed WMG-Diff outperforms the compared state-of-the-art methods in both error and accuracy metrics. Our code will be available at: https://github.com/SlicerDMRI/WMG-Diff., Comment: This paper has been accepted for presentation at The 31st International Conference on Neural Information Processing (ICONIP 2024). 12 pages, 3 figures, 2 tables
- Published
- 2024
41. CrudiTEE: A Stick-and-Carrot Approach to Building Trustworthy Cryptocurrency Wallets with TEEs
- Author
-
Zhou, Lulu, Liu, Zeyu, Zhang, Fan, and Reiter, Michael K.
- Subjects
Computer Science - Cryptography and Security - Abstract
Cryptocurrency introduces usability challenges by requiring users to manage signing keys. Popular signing key management services (e.g., custodial wallets), however, either introduce a trusted party or burden users with managing signing key shares, posing the same usability challenges. TEEs (Trusted Execution Environments) are a promising technology to avoid both, but practical implementations of TEEs suffer from various side-channel attacks that have proven hard to eliminate. This paper explores a new approach to side-channel mitigation through economic incentives for TEE-based cryptocurrency wallet solutions. By taking the cost and profit of side-channel attacks into consideration, we designed a Stick-and-Carrot-based cryptocurrency wallet, CrudiTEE, that leverages penalties (the stick) and rewards (the carrot) to disincentivize attackers from exfiltrating signing keys in the first place. We model the attacker's behavior using a Markov Decision Process (MDP) to evaluate the effectiveness of the bounty and enable the service provider to adjust the parameters of the bounty's reward function accordingly.
- Published
- 2024
42. Deep multimodal saliency parcellation of cerebellar pathways: linking microstructure and individual function through explainable multitask learning
- Author
-
Tchetchenian, Ari, Zekelman, Leo, Chen, Yuqian, Rushmore, Jarrett, Zhang, Fan, Yeterian, Edward H., Makris, Nikos, Rathi, Yogesh, Meijering, Erik, Song, Yang, and O'Donnell, Lauren J.
- Subjects
Quantitative Biology - Neurons and Cognition ,Computer Science - Machine Learning - Abstract
Parcellation of human cerebellar pathways is essential for advancing our understanding of the human brain. Existing diffusion MRI tractography parcellation methods have been successful in defining major cerebellar fibre tracts, while relying solely on fibre tract structure. However, each fibre tract may relay information related to multiple cognitive and motor functions of the cerebellum. Hence, it may be beneficial for parcellation to consider the potential importance of the fibre tracts for individual motor and cognitive functional performance measures. In this work, we propose a multimodal data-driven method for cerebellar pathway parcellation, which incorporates both measures of microstructure and connectivity, and measures of individual functional performance. Our method involves first training a multitask deep network to predict various cognitive and motor measures from a set of fibre tract structural features. The importance of each structural feature for predicting each functional measure is then computed, resulting in a set of structure-function saliency values that are clustered to parcellate cerebellar pathways. We refer to our method as Deep Multimodal Saliency Parcellation (DeepMSP), as it computes the saliency of structural measures for predicting cognitive and motor functional performance, with these saliencies being applied to the task of parcellation. Applying DeepMSP we found that it was feasible to identify multiple cerebellar pathway parcels with unique structure-function saliency patterns that were stable across training folds.
- Published
- 2024
43. AGORA: Open More and Trust Less in Binary Verification Service
- Author
-
Chen, Hongbo, Zhou, Quan, Yang, Sen, Han, Xing, Zhang, Fan, Zhang, Danfeng, and Wang, Xiaofeng
- Subjects
Computer Science - Cryptography and Security - Abstract
Binary verification plays a pivotal role in software security, yet building a verification service that is both open and trustworthy poses a formidable challenge. In this paper, we introduce a novel binary verification service, AGORA, scrupulously designed to overcome the challenge. At the heart of this approach lies a strategic insight: certain tasks can be delegated to untrusted entities, while the corresponding validators are securely housed within the trusted computing base (TCB). AGORA can validate untrusted assertions generated for versatile policies. Through a novel blockchain-based bounty task manager, it also utilizes crowdsourcing to remove trust in theorem provers. These synergistic techniques successfully ameliorate the TCB size burden associated with two procedures: binary analysis and theorem proving. The design of AGORA allows untrusted parties to participate in these complex processes. Moreover, based on running the optimized TCB within trusted execution environments and recording the verification process on a blockchain, the public can audit the correctness of verification results. By implementing verification workflows for software-based fault isolation policy and side-channel mitigation, our evaluation demonstrates the efficacy of AGORA.
- Published
- 2024
44. Cluster Sliding Ferroelectricity in Trilayer Quasi-Hexagonal C60
- Author
-
Wang, Xuefei, Ren, Yanhan, Qiu, Shi, Zhang, Fan, Li, Xueao, Gao, Junfeng, Gao, Weiwei, and Zhao, Jijun
- Subjects
Condensed Matter - Materials Science ,Physics - Computational Physics - Abstract
Electric polarization typically originates from non-centrosymmetric charge distributions. Since chemical bonds between atoms of the same elements favor centrosymmetric crystal structures and symmetrically distributed electron charges, elemental ferroelectrics are extremely rare. In comparison to atoms, elemental clusters are less symmetric and typically have various preferred orientations in crystals. Consequently, the assembly of clusters with different orientations tends to break the inversion symmetry. Based on this concept, we show that sliding ferroelectricity naturally emerges in trilayer quasi-hexagonal phase (qHP) C60, a cluster-assembled carbon allotrope recently synthesized. Trilayer qHP C60's have several stable polar structures, which are distinguishable in second-harmonic generation (SHG) responses. Compared to previously found elemental ferroelectrics, trilayer qHP C60's have sizable band gaps and some of them have both switchable out-of-plane and in-plane polarizations. Remarkably, the out-of-plane and in-plane polarizations are decoupled, enabling an easy-to-implement construction of Van der Waals homostructures with ferroelectrically switchable chirality., Comment: 5 figures
- Published
- 2024
45. Black-Box Opinion Manipulation Attacks to Retrieval-Augmented Generation of Large Language Models
- Author
-
Chen, Zhuo, Liu, Jiawei, Liu, Haotan, Cheng, Qikai, Zhang, Fan, Lu, Wei, and Liu, Xiaozhong
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence ,Computer Science - Cryptography and Security - Abstract
Retrieval-Augmented Generation (RAG) is applied to solve hallucination problems and real-time constraints of large language models, but it also induces vulnerabilities against retrieval corruption attacks. Existing research mainly explores the unreliability of RAG in white-box and closed-domain QA tasks. In this paper, we aim to reveal the vulnerabilities of Retrieval-Enhanced Generative (RAG) models when faced with black-box attacks for opinion manipulation. We explore the impact of such attacks on user cognition and decision-making, providing new insight to enhance the reliability and security of RAG models. We manipulate the ranking results of the retrieval model in RAG with instruction and use these results as data to train a surrogate model. By applying adversarial retrieval attack methods to the surrogate model, black-box transfer attacks on RAG are further realized. Experiments conducted on opinion datasets across multiple topics show that the proposed attack strategy can significantly alter the opinion polarity of the content generated by RAG. This demonstrates the model's vulnerability and, more importantly, reveals the potential negative impact on user cognition and decision-making, making it easier to mislead users into accepting incorrect or biased information., Comment: 10 pages, 3 figures, under review
- Published
- 2024
46. LTRL: Boosting Long-tail Recognition via Reflective Learning
- Author
-
Zhao, Qihao, Dai, Yalun, Lin, Shen, Hu, Wei, Zhang, Fan, and Liu, Jun
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
In real-world scenarios, knowledge distributions exhibit a long tail. Humans manage to master knowledge uniformly across imbalanced distributions, a feat attributed to their diligent practices of reviewing, summarizing, and correcting errors. Motivated by this learning process, we propose a novel learning paradigm, called reflecting learning, for handling long-tail recognition. Our method integrates three processes: reviewing past predictions during training, summarizing and leveraging the feature relation across classes, and correcting gradient conflict for loss functions. These designs are lightweight enough to plug and play with existing long-tail learning methods, achieving state-of-the-art performance in popular long-tail visual benchmarks. The experimental results highlight the great potential of reflecting learning in dealing with long-tail recognition., Comment: ECCV2024, Oral
- Published
- 2024
47. TractGraphFormer: Anatomically Informed Hybrid Graph CNN-Transformer Network for Classification from Diffusion MRI Tractography
- Author
-
Chen, Yuqian, Zhang, Fan, Wang, Meng, Zekelman, Leo R., Cetin-Karayumak, Suheyla, Xue, Tengfei, Zhang, Chaoyi, Song, Yang, Makris, Nikos, Rathi, Yogesh, Cai, Weidong, and O'Donnell, Lauren J.
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
The relationship between brain connections and non-imaging phenotypes is increasingly studied using deep neural networks. However, the local and global properties of the brain's white matter networks are often overlooked in convolutional network design. We introduce TractGraphFormer, a hybrid Graph CNN-Transformer deep learning framework tailored for diffusion MRI tractography. This model leverages local anatomical characteristics and global feature dependencies of white matter structures. The Graph CNN module captures white matter geometry and grey matter connectivity to aggregate local features from anatomically similar white matter connections, while the Transformer module uses self-attention to enhance global information learning. Additionally, TractGraphFormer includes an attention module for interpreting predictive white matter connections. In sex prediction tests, TractGraphFormer shows strong performance in large datasets of children (n=9345) and young adults (n=1065). Overall, our approach suggests that widespread connections in the WM are predictive of the sex of an individual, and consistent predictive anatomical tracts are identified across the two datasets. The proposed approach highlights the potential of integrating local anatomical information and global feature dependencies to improve prediction performance in machine learning with diffusion MRI tractography., Comment: 23 pages, 4 figures
- Published
- 2024
48. Self-consistent theory for the fractional quantum anomalous Hall effect in rhombohedral pentalayer graphene
- Author
-
Huang, Ke, Li, Xiao, Sarma, Sankar Das, and Zhang, Fan
- Subjects
Condensed Matter - Strongly Correlated Electrons ,Condensed Matter - Mesoscale and Nanoscale Physics - Abstract
The fractional quantum anomalous Hall (FQAH) effect in rhombohedral pentalayer graphene (PLG) has attracted significant attention due to its potential for observing exotic quantum states. In this work, we present a self-consistent Hartree-Fock theory for the FQAH effect in rhombohedral PLG. In particular, we focus on the convergence of the Hartree-Fock calculation with various reference fields and discuss the stability of the FQAH states in PLG. We show that the so-called charge neutrality scheme provides an unambiguous result for the Hartree-Fock calculation, as it ensures a convergence with respect to the momentum cutoff. Based on the Hartree-Fock band structure, we further carry out exact diagonalization calculations to explore the stability of the FQAH states in PLG. Our work provides an improved and unified (minimal) theoretical framework to understand the FQAH effect in rhombohedral PLG and paves the way for future experimental and theoretical studies., Comment: 19 pages, 12 figures. Comments are welcome
- Published
- 2024
49. DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
- Author
-
Li, Xiaotong, Zhang, Fan, Diao, Haiwen, Wang, Yueze, Wang, Xinlong, and Duan, Ling-Yu
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence - Abstract
Existing Multimodal Large Language Models (MLLMs) increasingly emphasize complex understanding of various visual elements, including multiple objects, text information, and spatial relations. Their development for comprehensive visual perception hinges on the availability of high-quality image-text datasets that offer diverse visual elements and thorough image descriptions. However, the scarcity of such hyper-detailed datasets currently hinders progress within the MLLM community. The bottleneck stems from the limited perceptual capabilities of current caption engines, which fall short in providing complete and accurate annotations. To facilitate cutting-edge research on MLLMs for comprehensive vision perception, we propose Perceptual Fusion, using a low-budget but highly effective caption engine for complete and accurate image descriptions. Specifically, Perceptual Fusion integrates diverse perception experts as image priors to provide explicit information on visual elements and adopts an efficient MLLM as a centric pivot to mimic advanced MLLMs' perception abilities. We carefully select 1M highly representative images from the uncurated LAION dataset and generate dense descriptions using our engine, dubbed DenseFusion-1M. Extensive experiments validate that our engine outperforms its counterparts, where the resulting dataset significantly improves the perception and cognition abilities of existing MLLMs across diverse vision-language benchmarks, especially with high-resolution images as inputs. The dataset and code are publicly available at https://github.com/baaivision/DenseFusion.
- Published
- 2024
50. First-order Néel-VBS transition in $S=3/2$ antiferromagnets
- Author
-
Zhang, Fan, Guo, Wenan, and Kaul, Ribhu K.
- Subjects
Condensed Matter - Strongly Correlated Electrons ,Condensed Matter - Statistical Mechanics - Abstract
We study the transition between Néel and columnar valence-bond solid ordering in two-dimensional $S=3/2$ square lattice quantum antiferromagnets with SO(3) symmetry. According to the deconfined criticality scenario, this transition can be direct and continuous like the well-studied $S=1/2$ case. To study the global phase diagram, we work with four multi-spin couplings with full rotational symmetry, that are free of the sign-problem of quantum Monte Carlo. Exploring the phase diagram with quantum Monte Carlo simulations, we find that the phase transition between Néel and valence-bond solid is strongly first-order in the parts of the phase diagram that we have accessed., Comment: 11 pages, 16 figures
- Published
- 2024