Author: "Yi, A." - Searchworks@Jio Institute Digital Library Search Results

Your search for author "Yi, A." returned 2,869,125 results


1. Memory benefits of daily-living-related contextual cueing for individuals with subjective cognitive decline and mild cognitive impairment

Author: Liu, Chien-hsiou, Li, Kuan-yi, Liao, Wan-wen, Chuang, I-ching, Huang, Yan-hua, and Wu, Ching-yi
Published: 2024

2. BERT-like pre-training for symbolic piano music classification tasks

Author: Chou, Yi-Hui, Chen, I-Chun, Chang, Chin-Jui, Ching, Joann, and Yang, Yi-Hsuan
Published: 2024

3. TIGRE v3: Efficient and easy to use iterative computed tomographic reconstruction toolbox for real datasets

Author: Biguri, Ander, Sadakane, Tomoyuki, Lindroos, Reuben, Liu, Yi, Landman, Malena Sabaté, Du, Yi, Lohvithee, Manasavee, Kaser, Stefanie, Hatamikia, Sepideh, Bryll, Robert, Valat, Emilien, Wonglee, Sarinrat, Blumensath, Thomas, and Schönlieb, Carola-Bibiane
Subjects: Physics - Medical Physics, Computer Science - Mathematical Software, Mathematics - Optimization and Control
Abstract: Computed Tomography (CT) has been widely adopted in medicine and it is increasingly being used in scientific and industrial applications. In parallel, research in different mathematical areas concerning discrete inverse problems has led to the development of new sophisticated numerical solvers that can be applied in the context of CT. The Tomographic Iterative GPU-based Reconstruction (TIGRE) toolbox was born almost a decade ago precisely in the gap between mathematics and high performance computing for real CT data, providing user-friendly open-source software tools for image reconstruction. However, since its inception, the tools' features and codebase have had over a twenty-fold increase, and now include greater geometric flexibility, a variety of modern algorithms for image reconstruction, high-performance computing features and support for other CT modalities, like proton CT. The purpose of this work is two-fold: first, it provides a structured overview of the current version of the TIGRE toolbox, providing appropriate descriptions and references, and serving as a comprehensive and peer-reviewed guide for the user; second, it is an opportunity to illustrate the performance of several of the available solvers showcasing real CT acquisitions, which are typically not openly available to algorithm developers.
Published: 2024

4. Exemplar Masking for Multimodal Incremental Learning

Author: Lee, Yi-Lun, Lee, Chen-Yu, Chiu, Wei-Chen, and Tsai, Yi-Hsuan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Multimodal incremental learning needs to digest the information from multiple modalities while concurrently learning new knowledge without forgetting the previously learned information. There are numerous challenges for this task, mainly including the larger storage size of multimodal data in exemplar-based methods and the computational requirement of fine-tuning on huge multimodal models. In this paper, we leverage the parameter-efficient tuning scheme to reduce the burden of fine-tuning and propose the exemplar masking framework to efficiently replay old knowledge. Specifically, the non-important tokens are masked based on the attention weights and the correlation across different modalities, significantly reducing the storage size of an exemplar and consequently saving more exemplars under the same memory buffer. Moreover, we design a multimodal data augmentation technique to diversify exemplars for replaying prior knowledge. In experiments, we not only evaluate our method in existing multimodal datasets but also extend the ImageNet-R dataset to a multimodal dataset as a real-world application, where captions are generated by querying multimodal large language models (e.g., InstructBLIP). Extensive experiments show that our exemplar masking framework is more efficient and robust to catastrophic forgetting under the same limited memory buffer. Code is available at https://github.com/YiLunLee/Exemplar_Masking_MCIL.
Comment: Project page: https://github.com/YiLunLee/Exemplar_Masking_MCIL
Published: 2024

5. Delve into Visual Contrastive Decoding for Hallucination Mitigation of Large Vision-Language Models

Author: Lee, Yi-Lun, Tsai, Yi-Hsuan, and Chiu, Wei-Chen
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: While large vision-language models (LVLMs) have shown impressive capabilities in generating plausible responses correlated with input visual contents, they still suffer from hallucinations, where the generated text inaccurately reflects visual contents. To address this, recent approaches apply contrastive decoding to calibrate the model's response via contrasting output distributions with original and visually distorted samples, demonstrating promising hallucination mitigation in a training-free manner. However, the potential of changing information in visual inputs is not well-explored, so a deeper investigation into the behaviors of visual contrastive decoding is of great interest. In this paper, we first explore various methods for contrastive decoding to change visual contents, including image downsampling and editing. Downsampling images reduces the detailed textual information while editing yields new contents in images, providing new aspects as visual contrastive samples. To further study benefits by using different contrastive samples, we analyze probability-level metrics, including entropy and distribution distance. Interestingly, the effect of these samples in mitigating hallucinations varies a lot across LVLMs and benchmarks. Based on our analysis, we propose a simple yet effective method to combine contrastive samples, offering a practical solution for applying contrastive decoding across various scenarios. Extensive experiments are conducted to validate the proposed fusion method among different benchmarks.
Comment: Under review. Project pages: https://github.com/YiLunLee/VCD_Analysis
Published: 2024

6. $V_{cb}$ puzzle in semi-leptonic $B\to D^*$ decays revisited

Author: Li, Shuang-Yi, Xu, Jie, Shi, Rui-Xiang, Geng, Li-Sheng, Zhang, Yu-Jie, and Zhang, Yi
Subjects: High Energy Physics - Phenomenology, High Energy Physics - Experiment
Abstract: Inspired by the newly reported $B\to D^*(\to D\pi)\ell\bar{\nu}_\ell$ differential decay rates by the Belle and Belle II Collaborations, we revisit the $V_{cb}$ puzzle in semi-leptonic $B\to D^*$ decays, considering the latest lattice QCD simulations and light-cone sum rule results. We examine the commonly used Caprini-Lellouch-Neubert (CLN), Boyd-Grinstein-Lebed (BGL), and heavy quark effective theory (HQET) parameterizations. We show that these three parameterizations lead to consistent results and reconfirm the $V_{cb}$ puzzle. Then, we use a state-of-the-art Bayesian method to estimate the impact of higher-order terms beyond the present HQET expansion on the uncertainty of $V_{cb}$. We show that higher-order effects cannot eliminate the deviation between the exclusive and inclusive determinations of $V_{cb}$. Finally, utilizing the best-fit results obtained in the HQET parameterization as inputs, we predict the relevant observables, i.e., $R_{D^*}$, $F_L^{D^*}$, and $P_\tau^{D^*}$, sensitive to new physics in the $B\to D^*\ell\bar{\nu}_\ell$ decays. We conclude that lepton-universality violations still exist in the $b\to c\tau\nu$ transitions.
Comment: 21 pages, 1 figure, 9 tables
Published: 2024

7. Experimental electronic phase diagram in a diamond-lattice antiferromagnetic system

Author: Ji, Liang-Wen, Yang, Wu-Zhang, Lu, Yi-Ming, Lu, Jia-Yi, Li, Jing, Liu, Yi, Ren, Zhi, and Cao, Guang-Han
Subjects: Condensed Matter - Strongly Correlated Electrons, Condensed Matter - Materials Science
Abstract: We report Ni-doping effect on the magnetic and electronic properties of thiospinel Co$_{1-x}$Ni$_x$[Co$_{0.3}$Ir$_{1.7}$]S$_4$ (0 $\leq x \leq$ 1). The parent compound Co[Co$_{0.3}$Ir$_{1.7}$]S$_4$ exhibits antiferromagnetic order below $T_\mathrm{N} \sim$ 292 K within the $A$-site diamond sublattice, along with a narrow charge-transfer gap. Upon Ni doping, an insulator-to-metal crossover occurs at $x \sim$ 0.35, and the antiferromagnetism is gradually suppressed, with $T_\mathrm{N}$ decreasing to 23 K at $x =$ 0.7. In the metallic state, a spin-glass-like transition emerges at low temperatures. The antiferromagnetic transition is completely suppressed at $x_\mathrm{c} \sim$ 0.95, around which a non-Fermi-liquid behavior emerges, evident from the $T^\alpha$ temperature dependence with $\alpha \approx$ 1.2-1.3 in resistivity and divergent behavior of $C/T$ in specific heat at low temperatures. Meanwhile, the electronic specific heat coefficient $\gamma$ increases substantially, signifying an enhancement of the quasiparticle effective mass. The magnetic phase diagram has been established, in which an antiferromagnetic quantum critical point is avoided at $x_\mathrm{c}$. Conversely, the observed glass-like tail above the critical concentration aligns more closely with theoretical predictions for an extended region of quantum Griffiths phase in the presence of strong disorder.
Comment: 8 pages, 5 figures
Published: 2024

8. Photometric and Spectroscopic Investigations of Three Large Amplitude Contact Binaries

Author: Xu, Xin, Li, Kai, Liu, Fei, Yan, Qian-Xue, Wang, Yi-Fan, Cui, Xin-Yu, Wang, Jing-Yi, Gao, Xing, Sun, Guo-You, Wu, Cheng-Yu, and Li, Mu-Zi-Mei
Subjects: Astrophysics - Solar and Stellar Astrophysics
Abstract: We performed photometric and spectroscopic studies of three large amplitude contact binaries, NSVS 2418361, ATLAS J057.1170+31.2384 and NSVS 7377875. The amplitudes of the three systems' light curves are more than 0.7 magnitude. We analyzed the light curves using the Wilson-Devinney code to yield physical parameters. The photometric solutions suggested that NSVS 7377875 belongs to an A-subtype contact binary, while the others are classified as W-subtype ones. Furthermore, the mass ratio of NSVS 7377875 is higher than 0.72, so it belongs to H-subtype contact binaries. Since their light curves have unequal heights at the two maxima, which is called the O'Connell effect, a dark spot on the primary component for each target was required to get a better fit of light curves. The orbital period investigation shows that the period of NSVS 2418361 is increasing, indicating a mass transfer from the less massive component to the more massive one, while the other targets exhibit no long-term variation. Our spectral subtraction analysis of LAMOST spectra revealed excess emissions in the $H_\alpha$ line, indicating chromospheric activity in all three targets. The Gaia distance was applied to estimate the absolute parameters of the three targets, and we obtained their evolutionary state. The relationships between the energy transfer parameter of 76 H-subtype contact binaries and their bolometric luminosity ratios, as well as their contact degree, were presented. We discovered that H-subtype systems have a less efficient energy transfer rate, which corresponds to the conclusion proposed by Csizmadia & Klagyivik.
Comment: 21 pages, 10 figures, 12 tables, accepted by AJ
Published: 2024

9. How interfacial tension enhances drag in turbulent Taylor-Couette flow with neutrally buoyant and equally viscous droplets

Author: Su, Jinghong, Zhang, Yi-bao, Wang, Cheng, Yi, Lei, Xu, Fan, Fan, Yaning, Wang, Junwu, and Sun, Chao
Subjects: Physics - Fluid Dynamics
Abstract: The presence of dispersed-phase droplets can result in a notable increase in the system's drag. However, our understanding of the mechanism underlying this phenomenon remains limited. In this study, we use three-dimensional direct numerical simulations with a modified multi-marker volume-of-fluid method to investigate liquid-liquid two-phase turbulence in a Taylor-Couette geometry. The dispersed phase has the same density and viscosity as the continuous phase. The Reynolds number $Re\equiv r_i\omega_i d/\nu$ is fixed at 5200, the volume fraction of the dispersed phase is up to $40\%$, and the Weber number $We\equiv \rho u^2_\tau d/\sigma$ is around 8. It is found that the increase in the system's drag originates from the contribution of interfacial tension. Specifically, droplets experience significant deformation and stretching in the streamwise direction due to shear near the inner cylinder. Consequently, the rear end of the droplets lags behind the fore head. This causes opposing interfacial tension effects on the fore head and rear end of the droplets. For the fore head of the droplets, the effect of interfacial tension appears to act against the flow direction. For the rear end, the effect appears to act in the flow direction. The increase in the system's drag is primarily attributed to the effect of interfacial tension on the fore head of the droplets which leads to the hindering effect of the droplets on the surrounding continuous phase. This hindering effect disrupts the formation of high-speed streaks, favoring the formation of low-speed ones, which are generally associated with higher viscous stress and drag of the system. This study provides new insights into the mechanism of drag enhancement reported in our previous experiments.
Published: 2024

10. From Estimands to Robust Inference of Treatment Effects in Platform Trials

Author: Qian, Yuhan, Yi, Yifan, Shao, Jun, Yi, Yanyao, Mayer-Hamblett, Nicole, Heagerty, Patrick J., and Ye, Ting
Subjects: Statistics - Methodology
Abstract: A platform trial is an innovative clinical trial design that uses a master protocol (i.e., one overarching protocol) to evaluate multiple treatments in an ongoing manner and can accelerate the evaluation of new treatments. However, the flexibility that marks the potential of platform trials also creates inferential challenges. Two key challenges are the precise definition of treatment effects and the robust and efficient inference on these effects. To address these challenges, we first define a clinically meaningful estimand that characterizes the treatment effect as a function of the expected outcomes under two given treatments among concurrently eligible patients. Then, we develop weighting and post-stratification methods for estimation of treatment effects with minimal assumptions. To fully leverage the efficiency potential of data from concurrently eligible patients, we also consider a model-assisted approach for baseline covariate adjustment to gain efficiency while maintaining robustness against model misspecification. We derive and compare asymptotic distributions of proposed estimators in theory and propose robust variance estimators. The proposed estimators are empirically evaluated in a simulation study and illustrated using the SIMPLIFY trial. Our methods are implemented in the R package RobinCID.
Published: 2024

11. Look a Group at Once: Multi-Slide Modeling for Survival Prediction

Author: Li, Xinyang, Zhang, Yi, Xie, Yi, Yang, Jianfei, Wang, Xi, Chen, Hao, and Zhang, Haixian
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Survival prediction is a critical task in pathology. In clinical practice, pathologists often examine multiple cases, leveraging a broader spectrum of cancer phenotypes to enhance pathological assessment. Despite significant advancements in deep learning, current solutions typically model each slide as a sample, struggling to effectively capture comparable and slide-agnostic pathological features. In this paper, we introduce GroupMIL, a novel framework inspired by the clinical practice of collective analysis, which models multiple slides as a single sample and organizes groups of patches and slides sequentially to capture cross-slide prognostic features. We also present GPAMamba, a model designed to facilitate intra- and inter-slide feature interactions, effectively capturing local micro-environmental characteristics within slide-level graphs while uncovering essential prognostic patterns across an extended patch sequence within the group framework. Furthermore, we develop a dual-head predictor that delivers comprehensive survival risk and probability assessments for each patient. Extensive empirical evaluations demonstrate that our model significantly outperforms state-of-the-art approaches across five datasets from The Cancer Genome Atlas.
Published: 2024

12. Revisit of discrete energy bands in Galilean moon's footprint tails: remote signals of particle absorption

Author: Yang, Fan, Xuzhi-Zhou, Liu, Ying, Sun, Yi-Xin, Yin, Ze-Fan, Hao, Yi-Xin, Liu, Zhi-Yang, Blanc, Michel, Zhao, Jiu-Tong, He, Dong-Wen, Wu, Ya-Ze, Wang, Shan, Yue, Chao, and Zong, Qiu-Gang
Subjects: Astrophysics - Earth and Planetary Astrophysics, Physics - Space Physics
Abstract: Recent observations from the Juno spacecraft during its transit over flux tubes of the Galilean moons have identified sharp enhancements of particle fluxes at discrete energies. These banded structures have been suspected to originate from a bounce resonance between particles and standing Alfvén waves generated by the moon-magnetospheric interaction. Here, we show that predictions from the above hypothesis are inconsistent with the observations, and propose an alternative interpretation that the banded structures are remote signals of particle absorption at the moons. In this scenario, whether a particle would encounter the moon before reaching Juno depends on the number of bounce cycles it experiences within a fixed section of drift motion determined by moon-spacecraft longitudinal separation. Therefore, the absorption bands are expected to appear at discrete, equally-spaced velocities consistent with the observations. This finding improves our understanding of moon-plasma interactions and provides a potential way to evaluate the Jovian magnetospheric models.
Comment: 15 pages, 4 figures
Published: 2024

13. SMILE-UHURA Challenge -- Small Vessel Segmentation at Mesoscopic Scale from Ultra-High Resolution 7T Magnetic Resonance Angiograms

Author: Chatterjee, Soumick, Mattern, Hendrik, Dörner, Marc, Sciarra, Alessandro, Dubost, Florian, Schnurre, Hannes, Khatun, Rupali, Yu, Chun-Chih, Hsieh, Tsung-Lin, Tsai, Yi-Shan, Fang, Yi-Zeng, Yang, Yung-Ching, Huang, Juinn-Dar, Xu, Marshall, Liu, Siyu, Ribeiro, Fernanda L., Bollmann, Saskia, Chintalapati, Karthikesh Varma, Radhakrishna, Chethan Mysuru, Kumara, Sri Chandana Hudukula Ram, Sutrave, Raviteja, Qayyum, Abdul, Mazher, Moona, Razzak, Imran, Rodero, Cristobal, Niederren, Steven, Lin, Fengming, Xia, Yan, Wang, Jiacheng, Qiu, Riyu, Wang, Liansheng, Panah, Arya Yazdan, Jurdi, Rosana El, Fu, Guanghui, Arslan, Janan, Vaillant, Ghislain, Valabregue, Romain, Dormont, Didier, Stankoff, Bruno, Colliot, Olivier, Vargas, Luisa, Chacón, Isai Daniel, Pitsiorlas, Ioannis, Arbeláez, Pablo, Zuluaga, Maria A., Schreiber, Stefanie, Speck, Oliver, and Nürnberger, Andreas
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition
Abstract: The human brain receives nutrients and oxygen through an intricate network of blood vessels. Pathology affecting small vessels, at the mesoscopic scale, represents a critical vulnerability within the cerebral blood supply and can lead to severe conditions, such as Cerebral Small Vessel Diseases. The advent of 7 Tesla MRI systems has enabled the acquisition of higher spatial resolution images, making it possible to visualise such vessels in the brain. However, the lack of publicly available annotated datasets has impeded the development of robust, machine learning-driven segmentation algorithms. To address this, the SMILE-UHURA challenge was organised. This challenge, held in conjunction with the ISBI 2023, in Cartagena de Indias, Colombia, aimed to provide a platform for researchers working on related topics. The SMILE-UHURA challenge addresses the gap in publicly available annotated datasets by providing an annotated dataset of Time-of-Flight angiography acquired with 7T MRI. This dataset was created through a combination of automated pre-segmentation and extensive manual refinement. In this manuscript, sixteen submitted methods and two baseline methods are compared both quantitatively and qualitatively on two different datasets: held-out test MRAs from the same dataset as the training data (with labels kept secret) and a separate 7T ToF MRA dataset where both input volumes and labels are kept secret. The results demonstrate that most of the submitted deep learning methods, trained on the provided training dataset, achieved reliable segmentation performance. Dice scores reached up to 0.838 $\pm$ 0.066 and 0.716 $\pm$ 0.125 on the respective datasets, with an average performance of up to 0.804 $\pm$ 0.15.
Published: 2024

14. DarkSHINE Baseline Design Report: Physics Prospects and Detector Technologies

Author: Chen, Jing, Chen, Ji-Yuan, Chen, Jun-Feng, Chen, Xiang, Fu, Chang-Bo, Guo, Jun, Guo, Yi-Han, Khaw, Kim Siang, Li, Jia-Lin, Li, Liang, Li, Shu, Lin, Yu-ming, Liu, Dan-Ning, Liu, Kang, Liu, Kun, Liu, Qi-Bin, Liu, Zhi, Lu, Ze-Jia, Lv, Meng, Song, Si-Yuan, Sun, Tong, Tang, Jian-Nan, Wan, Wei-Shi, Wang, Dong, Wang, Xiao-Long, Wang, Yu-Feng, Wang, Zhen, Wang, Zi-Rui, Wu, Wei-Hao, Yang, Hai-Jun, Yang, Lin, Yang, Yong, Yu, Dian, Yuan, Rui, Zhang, Jun-Hua, Zhang, Yu-Lei, Zhang, Yun-Long, Zhao, Zhi-Yu, Zhou, Bai-Hong, Zhu, Chun-Xiang, Zhu, Xu-Liang, and Zhu, Yi-Fan
Subjects: Physics - Instrumentation and Detectors, High Energy Physics - Experiment
Abstract: DarkSHINE is a newly proposed fixed-target experiment initiative to search for the invisible decay of Dark Photon via missing energy/momentum signatures, based on the high repetition rate electron beam to be deployed/delivered by the Shanghai High repetition rate XFEL and Extreme light facility (SHINE). This report elaborates the baseline design of the DarkSHINE experiment by introducing the physics goals, experimental setups, details of each sub-detector system's technical design, signal and background modeling, expected search sensitivities and future prospects, which mark an important step towards the further prototyping and technical demonstrations.
Published: 2024

15. A fiber array architecture for atom quantum computing

Author: Li, Xiao, Hou, Jia-Yi, Wang, Jia-Chao, Wang, Guang-Wei, He, Xiao-Dong, Zhou, Feng, Wang, Yi-Bo, Liu, Min, Wang, Jin, Xu, Peng, and Zhan, Ming-Sheng
Subjects: Quantum Physics
Abstract: Arrays of single atoms trapped in optical tweezers are increasingly recognized as a promising platform for scalable quantum computing. In both the fault-tolerant and NISQ eras, the ability to individually control qubits is essential for the efficient execution of quantum circuits. Time-division multiplexed control schemes based on atom shuttling or beam scanning have been employed to build programmable neutral atom quantum processors, but achieving high-rate, highly parallel gate operations remains a challenge. Here, we propose a fiber array architecture for atom quantum computing capable of fully independent control of individual atoms. The trapping and addressing lasers for each individual atom are emitted from the same optical waveguide, enabling robust control through common-mode suppression of beam pointing noise. Using a fiber array, we experimentally demonstrate the trapping and independent control of ten single atoms in two-dimensional optical tweezers, achieving individually addressed single-qubit gates with an average fidelity of 0.9966(3). Moreover, we perform simultaneous arbitrary single-qubit gates on four randomly selected qubits, resulting in an average fidelity of 0.9961(4). Our work paves the way for time-efficient execution of quantum algorithms on neutral atom quantum computers.
Comment: 12 pages
Published: 2024

16. RoundTable: Investigating Group Decision-Making Mechanism in Multi-Agent Collaboration

Author: Cho, Young-Min, Shu, Raphael, Das, Nilaksh, Alkhouli, Tamer, Lai, Yi-An, Cai, Jason, Sunkara, Monica, and Zhang, Yi
Subjects: Computer Science - Multiagent Systems, Computer Science - Artificial Intelligence
Abstract: This study investigates the efficacy of Multi-Agent Systems in eliciting cross-agent communication and enhancing collective intelligence through group decision-making in a decentralized setting. Unlike centralized mechanisms, where a fixed hierarchy governs social choice, decentralized group decision-making allows agents to engage in joint deliberation. Our research focuses on the dynamics of communication and decision-making within various social choice methods. By applying different voting rules in various environments, we find that moderate decision flexibility yields better outcomes. Additionally, exploring the linguistic features of agent-to-agent conversations reveals indicators of effective collaboration, offering insights into communication patterns that facilitate or hinder collaboration. Finally, we propose various methods for determining the optimal stopping point in multi-agent collaborations based on linguistic cues. Our findings contribute to a deeper understanding of how decentralized decision-making and group conversation shape multi-agent collaboration, with implications for the design of more effective MAS environments.
Comment: preprint
Published: 2024

17. Building a Taiwanese Mandarin Spoken Language Model: A First Attempt

Author: Yang, Chih-Kai, Fu, Yu-Kuan, Li, Chen-An, Lin, Yi-Cheng, Lin, Yu-Xiang, Chen, Wei-Chih, Chung, Ho Lam, Kuan, Chun-Yi, Huang, Wei-Ping, Lu, Ke-Han, Lin, Tzu-Quan, Wang, Hsiu-Hsuan, Hu, En-Pei, Hsu, Chan-Jan, Tseng, Liang-Hsuan, Chiu, I-Hsiang, Sanga, Ulin, Chen, Xuanjun, Hsu, Po-chun, Yang, Shu-wen, and Lee, Hung-yi
Subjects: Computer Science - Computation and Language, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: This technical report presents our initial attempt to build a spoken large language model (LLM) for Taiwanese Mandarin, specifically tailored to enable real-time, speech-to-speech interaction in multi-turn conversations. Our end-to-end model incorporates a decoder-only transformer architecture and aims to achieve seamless interaction while preserving the conversational flow, including full-duplex capabilities allowing simultaneous speaking and listening. The paper also details the training process, including data preparation with synthesized dialogues and adjustments for real-time interaction. We also developed a platform to evaluate conversational fluency and response coherence in multi-turn dialogues. We hope the release of the report can contribute to the future development of spoken LLMs in Taiwanese Mandarin.
Comment: Work in progress
Published: 2024

18. Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks

Author: Huang, Chien-yu, Chen, Wei-Chih, Yang, Shu-wen, Liu, Andy T., Li, Chen-An, Lin, Yu-Xiang, Tseng, Wei-Cheng, Diwan, Anuj, Shih, Yi-Jen, Shi, Jiatong, Chen, William, Chen, Xuanjun, Hsiao, Chi-Yuan, Peng, Puyuan, Wang, Shih-Heng, Kuan, Chun-Yi, Lu, Ke-Han, Chang, Kai-Wei, Yang, Chih-Kai, Ritter-Gutierrez, Fabian, Chuang, Ming To, Huang, Kuan-Po, Arora, Siddhant, Lin, You-Kuan, Yeo, Eunjung, Chang, Kalvin, Chien, Chung-Ming, Choi, Kwanghee, Hsieh, Cheng-Hsiu, Lin, Yi-Cheng, Yu, Chee-En, Chiu, I-Hsiang, Guimarães, Heitor R., Han, Jionghao, Lin, Tzu-Quan, Lin, Tzu-Yuan, Chang, Homu, Chang, Ting-Wu, Chen, Chun Wei, Chen, Shou-Jen, Chen, Yu-Hua, Cheng, Hsi-Chun, Dhawan, Kunal, Fang, Jia-Lin, Fang, Shi-Xin, Chiang, Kuan-Yu Fang, Fu, Chi An, Hsiao, Hsien-Fu, Hsu, Ching Yu, Huang, Shao-Syuan, Wei, Lee Chen, Lin, Hsi-Che, Lin, Hsuan-Hao, Lin, Hsuan-Ting, Lin, Jian-Ren, Liu, Ting-Chun, Lu, Li-Chun, Pai, Tsung-Min, Pasad, Ankita, Kuan, Shih-Yun Shan, Shon, Suwon, Tang, Yuxun, Tsai, Yun-Shao, Wei, Jui-Chiang, Wei, Tzu-Chieh, Wu, Chengxi, Wu, Dien-Ruei, Yang, Chao-Han Huck, Yang, Chieh-Chi, Yip, Jia Qi, Yuan, Shao-Xiang, Noroozi, Vahid, Chen, Zhehuai, Wu, Haibin, Livescu, Karen, Harwath, David, Watanabe, Shinji, and Lee, Hung-yi
Subjects: Computer Science - Computation and Language, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Multimodal foundation models, such as Gemini and ChatGPT, have revolutionized human-machine interactions by seamlessly integrating various forms of data. Developing a universal spoken language model that comprehends a wide range of natural language instructions is critical for bridging communication gaps and facilitating more intuitive interactions. However, the absence of a comprehensive evaluation benchmark poses a significant challenge. We present Dynamic-SUPERB Phase-2, an open and evolving benchmark for the comprehensive evaluation of instruction-based universal speech models. Building upon the first generation, this second version incorporates 125 new tasks contributed collaboratively by the global research community, expanding the benchmark to a total of 180 tasks, making it the largest benchmark for speech and audio evaluation. While the first generation of Dynamic-SUPERB was limited to classification tasks, Dynamic-SUPERB Phase-2 broadens its evaluation capabilities by introducing a wide array of novel and diverse tasks, including regression and sequence generation, across speech, music, and environmental audio. Evaluation results indicate that none of the models performed well universally. SALMONN-13B excelled in English ASR, while WavLLM demonstrated high accuracy in emotion recognition, but current models still require further innovations to handle a broader range of tasks. We will soon open-source all task data and the evaluation pipeline.
Published: 2024

19. NeuroFly: A framework for whole-brain single neuron reconstruction

Author: Zhao, Rubin, Liu, Yang, Zhang, Shiqi, Yi, Zijian, Xiao, Yanyang, Xu, Fang, Yang, Yi, and Zhou, Pencheng
Subjects: Computer Science - Computer Vision and Pattern Recognition, Quantitative Biology - Quantitative Methods
Abstract: Neurons, with their elongated, tree-like dendritic and axonal structures, enable efficient signal integration and long-range communication across brain regions. By reconstructing individual neurons' morphology, we can gain valuable insights into brain connectivity, revealing the structural basis of cognition, movement, and perception. Despite the accumulation of extensive 3D microscopic imaging data, progress has been considerably hindered by the absence of automated tools to streamline this process. Here we introduce NeuroFly, a validated framework for large-scale automatic single neuron reconstruction. This framework breaks down the process into three distinct stages: segmentation, connection, and proofreading. In the segmentation stage, we perform automatic segmentation followed by skeletonization to generate over-segmented neuronal fragments without branches. During the connection stage, we use a 3D image-based path following approach to extend each fragment and connect it with other fragments of the same neuron. Finally, human annotators are required only to proofread the few unresolved positions. The first two stages of our process are clearly defined computer vision problems, and we have trained robust baseline models to solve them. We validated NeuroFly's efficiency using in-house datasets that include a variety of challenging scenarios, such as dense arborizations, weak axons, and images with contamination. We will release the datasets along with a suite of visualization and annotation tools for better reproducibility. Our goal is to foster collaboration among researchers to address the neuron reconstruction challenge, ultimately accelerating advancements in neuroscience research. The dataset and code are available at https://github.com/beanli161514/neurofly
Published: 2024

20. Exploring the Interplay Between Video Generation and World Models in Autonomous Driving: A Survey

Author: Fu, Ao, Zhou, Yi, Zhou, Tao, Yang, Yi, Gao, Bojun, Li, Qun, Wu, Guobin, and Shao, Ling
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Robotics
Abstract: World models and video generation are pivotal technologies in the domain of autonomous driving, each playing a critical role in enhancing the robustness and reliability of autonomous systems. World models, which simulate the dynamics of real-world environments, and video generation models, which produce realistic video sequences, are increasingly being integrated to improve situational awareness and decision-making capabilities in autonomous vehicles. This paper investigates the relationship between these two technologies, focusing on how their structural parallels, particularly in diffusion-based models, contribute to more accurate and coherent simulations of driving scenarios. We examine leading works such as JEPA, Genie, and Sora, which exemplify different approaches to world model design, thereby highlighting the lack of a universally accepted definition of world models. These diverse interpretations underscore the field's evolving understanding of how world models can be optimized for various autonomous driving tasks. Furthermore, this paper discusses the key evaluation metrics employed in this domain, such as Chamfer distance for 3D scene reconstruction and Fréchet Inception Distance (FID) for assessing the quality of generated video content. By analyzing the interplay between video generation and world models, this survey identifies critical challenges and future research directions, emphasizing the potential of these technologies to jointly advance the performance of autonomous driving systems. The findings presented in this paper aim to provide a comprehensive understanding of how the integration of video generation and world models can drive innovation in the development of safer and more reliable autonomous vehicles.
Published: 2024

21. Automated Vulnerability Detection Using Deep Learning Technique

Author: Yang, Guan-Yan, Ko, Yi-Heng, Wang, Farn, Yeh, Kuo-Hui, Chang, Haw-Shiang, and Chen, Hsueh-Yi
Subjects: Computer Science - Cryptography and Security, Computer Science - Artificial Intelligence, Computer Science - Software Engineering, D.2.4, D.2.5
Abstract: Our work explores the utilization of deep learning, specifically leveraging the CodeBERT model, to enhance code security testing for Python applications by detecting SQL injection vulnerabilities. Unlike traditional security testing methods that may be slow and error-prone, our approach transforms source code into vector representations and trains a Long Short-Term Memory (LSTM) model to identify vulnerable patterns. When compared with existing static application security testing (SAST) tools, our model displays superior performance, achieving higher precision, recall, and F1-score. The study demonstrates that deep learning techniques, particularly with CodeBERT's advanced contextual understanding, can significantly improve vulnerability detection, presenting a scalable methodology applicable to various programming languages and vulnerability types., Comment: 4 pages, 1 figure; presented at the 30th International Conference on Computational & Experimental Engineering and Sciences (ICCES2024)
Published: 2024

22. SleepNetZero: Zero-Burden Zero-Shot Reliable Sleep Staging With Neural Networks Based on Ballistocardiograms

Author: Li, Shuzhen, Chen, Yuxin, Chen, Xuesong, Gao, Ruiyang, Zhang, Yupeng, Yu, Chao, Li, Yunfei, Ye, Ziyi, Huang, Weijun, Yi, Hongliang, Leng, Yue, and Wu, Yi
Subjects: Electrical Engineering and Systems Science - Signal Processing, Computer Science - Machine Learning
Abstract: Sleep monitoring plays a crucial role in maintaining good health, with sleep staging serving as an essential metric in the monitoring process. Traditional methods, utilizing medical sensors like EEG and ECG, can be effective but often present challenges such as unnatural user experience, complex deployment, and high costs. Ballistocardiography (BCG), a type of piezoelectric sensor signal, offers a non-invasive, user-friendly, and easily deployable alternative for long-term home monitoring. However, reliable BCG-based sleep staging is challenging due to the limited sleep monitoring data available for BCG. A restricted training dataset prevents the model from generalizing across populations. Additionally, transferring to BCG faces difficulty ensuring model robustness when migrating from other data sources. To address these issues, we introduce SleepNetZero, a zero-shot learning based approach for sleep staging. To tackle the generalization challenge, we propose a series of BCG feature extraction methods that align BCG components with corresponding respiratory, cardiac, and movement channels in PSG. This allows models to be trained on large-scale PSG datasets that are diverse in population. For the migration challenge, we employ data augmentation techniques, significantly enhancing generalizability. We conducted extensive training and testing on large datasets (12393 records from 9637 different subjects), achieving an accuracy of 0.803 and a Cohen's Kappa of 0.718. SleepNetZero was also deployed in a real prototype (monitoring pads) and tested in actual hospital settings (265 users), demonstrating an accuracy of 0.697 and a Cohen's Kappa of 0.589. To the best of our knowledge, this work represents the first known reliable BCG-based sleep staging effort and marks a significant step towards in-home health monitoring., Comment: 25 pages
Published: 2024

23. YourSkatingCoach: A Figure Skating Video Benchmark for Fine-Grained Element Analysis

Author: Chen, Wei-Yi, Lin, Yi-Ling, Su, Yu-An, Yeh, Wei-Hsin, and Ku, Lun-Wei
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Combining sports and machine learning involves leveraging ML algorithms and techniques to extract insight from sports-related data such as player statistics, game footage, and other relevant information. However, datasets related to figure skating in the literature focus primarily on element classification and are currently unavailable or exhibit only limited access, which greatly raises the entry barrier to developing visual sports technology for it. Moreover, when using such data to help athletes improve their skills, we find they are very coarse-grained: they work for learning what an element is, but they are poorly suited to learning whether the element is good or bad. Here we propose air time detection, a novel motion analysis task, the goal of which is to accurately detect the duration of the air time of a jump. We present YourSkatingCoach, a large, novel figure skating dataset which contains 454 videos of jump elements, the detected skater skeletons in each video, along with the gold labels of the start and ending frames of each jump, together as a video benchmark for figure skating. In addition, although this type of task is often viewed as classification, we cast it as a sequential labeling problem and propose a Transformer-based model to calculate the duration. Experimental results show that the proposed model yields favorable results against a strong baseline. To further verify the generalizability of the fine-grained labels, we apply the same process to other sports as cross-sports tasks but for coarse-grained task action classification. Here we fine-tune the classification to demonstrate that figure skating, as it contains the essential body movements, constitutes a strong foundation for adaptation to other sports.
Published: 2024

24. Gain-Loss Coupled Systems

Author: Zhang, Chunlei, Kim, Mun, Zhang, Yi-Hui, Wang, Yi-Pu, Trivedi, Deepanshu, Krasnok, Alex, Wang, Jianbo, Isleifson, Dustin, Roshko, Roy, and Hu, Can-Ming
Subjects: Quantum Physics, Condensed Matter - Mesoscale and Nanoscale Physics, Physics - Applied Physics, Physics - Optics
Abstract: Achieving oscillations with small dimensions, high power, high coherence, and low phase noise has been a long-standing goal in wave physics, driving innovations across classical electromagnetic theory and quantum physics. Key applications include electronic oscillators, lasers, and spin-torque oscillations. In recent decades, physicists have increasingly focused on harnessing passive oscillatory modes to manipulate these oscillations, leading to the development of diverse gain-loss coupled systems, including photon-photon, exciton-photon, photon-magnon, magnon-phonon, and magnon-magnon couplings. This review provides a comprehensive overview of these systems, exploring their fundamental physical structures, key experimental observations, and theoretical insights. By synthesizing insights from these studies, we propose future research directions to further advance the understanding and application of gain-loss coupled systems for quantum science and quantum technologies. (The field of gain-loss coupled systems is vast. The authors welcome suggestions and feedback from the community to continuously improve this review article until it is published)., Comment: 20 pages, 10 figures
Published: 2024

25. Non-Hermitian Hamiltonian Approach for Two-Dimensional Spectroscopy

Author: Zhang, Hao-Yue, Huang, Bin-Yao, Jin, Jing-Yi-Ran, Yao, Yi-Xuan, and Ai, Qing
Subjects: Quantum Physics, Physics - Optics
Abstract: Two-dimensional spectroscopy (2DS) offers significant advantages in terms of high temporal and frequency resolutions and signal-to-noise ratio. Until now, the response-function (RF) formalism has been the prevalent theoretical description. In this study, we compare the non-Hermitian Hamiltonian (NHH) method with the RF formalism in a three-level system with a constant control field. We obtain the signals from both approaches and compare their population dynamics and 2DS. We propose the quasi-Green function for the NHH method, which allows all possible Liouville paths to be inferred. Although the NHH method overestimates relaxations, it also provides a more comprehensive description. Our results demonstrate that the NHH method is more suitable than the RF formalism for investigating the systems that are either dissipative or complex via the 2DS.
Published: 2024

26. Can Large Audio-Language Models Truly Hear? Tackling Hallucinations with Multi-Task Assessment and Stepwise Audio Reasoning

Author: Kuan, Chun-Yi and Lee, Hung-yi
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Computation and Language, Computer Science - Sound
Abstract: Recent advancements in large audio-language models (LALMs) have shown impressive capabilities in understanding and reasoning about audio and speech information. However, these models still face challenges, including hallucinating non-existent sound events, misidentifying the order of sound events, and incorrectly attributing sound sources, which undermine their reliability and real-world application. To systematically evaluate these issues, we propose three distinct tasks: object existence, temporal order, and object attribute within audio. These tasks assess the models' comprehension of critical audio information aspects. Our experimental results reveal limitations in these fundamental tasks, underscoring the need for better models in recognizing specific sound events, determining event sequences, and identifying sound sources. To improve performance in these areas, we introduce a multi-turn chain-of-thought approach, which demonstrates significantly improved model performance across the proposed tasks., Comment: 5 pages, 1 figure
Published: 2024

27. A Study of Decay Rate of Bound Negative Muons

Author: Deng, Jian-Bo, Deng, Miao-Yi, Ma, Shi-Jie, Wang, Rui-Bo, Fan, Qi-Qi, He, Peng-Zhang, He, Yi-Peng, Li, Shuo-Wen, and Hu, Xian-Ru
Subjects: High Energy Physics - Phenomenology
Abstract: A number of experiments show that the decay lifetimes of muons bound to atomic nuclei are longer than the decay lifetimes of free muons. In this paper, a scheme of extending quantum mechanics (EQM) is proposed to resolve this problem. The Schrödinger equation is obtained to demonstrate the validity of this approach. The decay ratio of bound muons is also calculated in EQM, and the result is in good agreement with the experimental data., Comment: 5 pages, 1 figure, 2 tables
Published: 2024

28. Bots can Snoop: Uncovering and Mitigating Privacy Risks of Bots in Group Chats

Author: Chou, Kai-Hsiang, Lin, Yi-Min, Wang, Yi-An, Li, Jonathan Weiping, Kim, Tiffany Hyun-Jin, and Hsiao, Hsu-Chun
Subjects: Computer Science - Cryptography and Security
Abstract: New privacy concerns arise with chatbots on group messaging platforms. Chatbots may access information beyond their intended functionalities, such as messages unintended for chatbots or senders' identities. Chatbot operators may exploit such information to infer personal information and link users across groups, potentially leading to personal data breaches, pervasive tracking, and targeted advertising. Our analysis of conversation datasets shows that (1) chatbots often access far more messages than needed, and (2) when a user joins a new group with chatbots, there is a 3.4% chance that at least one of the chatbots can recognize and associate the user with their previous interactions in other groups. Although state-of-the-art group messaging protocols provide robust end-to-end security and some platforms have implemented policies to limit chatbot access, no platforms successfully combine these features. This paper introduces SnoopGuard, a secure group messaging protocol that ensures user privacy against chatbots while maintaining strong end-to-end security. Our method offers selective message access, preventing chatbots from accessing unrelated messages, and ensures sender anonymity within the group. SnoopGuard achieves $O(\log n + m)$ message-sending complexity for a group of $n$ users and $m$ chatbots, compared to $O(\log(n + m))$ in state-of-the-art protocols, with acceptable overhead for enhanced privacy. Our prototype implementation shows that sending a message in a group of 50 users and 10 chatbots takes about 30 milliseconds when integrated with Message Layer Security (MLS)., Comment: 18 pages, 5 figures
Published: 2024

29. First Very Long Baseline Interferometry Detections at 870 μm

Author: Raymond, Alexander W., Doeleman, Sheperd S., Asada, Keiichi, Blackburn, Lindy, Bower, Geoffrey C., Bremer, Michael, Broguiere, Dominique, Chen, Ming-Tang, Crew, Geoffrey B., Dornbusch, Sven, Fish, Vincent L., García, Roberto, Gentaz, Olivier, Goddi, Ciriaco, Han, Chih-Chiang, Hecht, Michael H., Huang, Yau-De, Janssen, Michael, Keating, Garrett K., Koay, Jun Yi, Krichbaum, Thomas P., Lo, Wen-Ping, Matsushita, Satoki, Matthews, Lynn D., Moran, James M., Norton, Timothy J., Patel, Nimesh, Pesce, Dominic W., Ramakrishnan, Venkatessh, Rottmann, Helge, Roy, Alan L., Sánchez, Salvador, Tilanus, Remo P. J., Titus, Michael, Torne, Pablo, Wagner, Jan, Weintroub, Jonathan, Wielgus, Maciek, Young, André, Akiyama, Kazunori, Albentosa-Ruíz, Ezequiel, Alberdi, Antxon, Alef, Walter, Algaba, Juan Carlos, Anantua, Richard, Azulay, Rebecca, Bach, Uwe, Baczko, Anne-Kathrin, Ball, David, Baloković, Mislav, Bandyopadhyay, Bidisha, Barrett, John, Bauböck, Michi, Benson, Bradford A., Bintley, Dan, Blundell, Raymond, Bouman, Katherine L., Boyce, Hope, Brissenden, Roger, Britzen, Silke, Broderick, Avery E., Bronzwaer, Thomas, Bustamante, Sandra, Carlstrom, John E., Chael, Andrew, Chan, Chi-kwan, Chang, Dominic O., Chatterjee, Koushik, Chatterjee, Shami, Chen, Yongjun, Cheng, Xiaopeng, Cho, Ilje, Christian, Pierre, Conroy, Nicholas S., Conway, John E., Crawford, Thomas M., Cruz-Osorio, Alejandro, Cui, Yuzhu, Dahale, Rohan, Davelaar, Jordy, De Laurentis, Mariafelicia, Deane, Roger, Dempsey, Jessica, Desvignes, Gregory, Dexter, Jason, Dhruv, Vedant, Dihingia, Indu K., Dzib, Sergio A., Eatough, Ralph P., Emami, Razieh, Falcke, Heino, Farah, Joseph, Fomalont, Edward, Fontana, Anne-Laure, Ford, H. Alyson, Foschi, Marianna, Fraga-Encinas, Raquel, Freeman, William T., Friberg, Per, Fromm, Christian M., Fuentes, Antonio, Galison, Peter, Gammie, Charles F., Georgiev, Boris, Gold, Roman, Gómez-Ruiz, Arturo I., Gómez, José L., Gu, Minfeng, Gurwell, Mark, Hada, Kazuhiro, Haggard, Daryl, Hesper, Ronald, Heumann, Dirk, Ho, Luis C., Ho, Paul, Honma, Mareki, Huang, Chih-Wei L., Huang, Lei, Hughes, David H., Ikeda, Shiro, Impellizzeri, C. M. Violette, Inoue, Makoto, Issaoun, Sara, James, David J., Jannuzi, Buell T., Jeter, Britton, Jiang, Wu, Jiménez-Rosales, Alejandra, Johnson, Michael D., Jorstad, Svetlana, Jones, Adam C., Joshi, Abhishek V., Jung, Taehyun, Karuppusamy, Ramesh, Kawashima, Tomohisa, Kettenis, Mark, Kim, Dong-Jin, Kim, Jae-Young, Kim, Jongsoo, Kim, Junhan, Kino, Motoki, Kocherlakota, Prashant, Kofuji, Yutaro, Koch, Patrick M., Koyama, Shoko, Kramer, Carsten, Kramer, Joana A., Kramer, Michael, Kubo, Derek, Kuo, Cheng-Yu, La Bella, Noemi, Lee, Sang-Sung, Levis, Aviad, Li, Zhiyuan, Lico, Rocco, Lindahl, Greg, Lindqvist, Michael, Lisakov, Mikhail, Liu, Jun, Liu, Kuo, Liuzzo, Elisabetta, Lobanov, Andrei P., Loinard, Laurent, Lonsdale, Colin J., Lowitz, Amy E., Lu, Ru-Sen, MacDonald, Nicholas R., Mahieu, Sylvain, Maier, Doris, Mao, Jirong, Marchili, Nicola, Markoff, Sera, Marrone, Daniel P., Marscher, Alan P., Martí-Vidal, Iván, Medeiros, Lia, Menten, Karl M., Mizuno, Izumi, Mizuno, Yosuke, Montgomery, Joshua, Moriyama, Kotaro, Moscibrodzka, Monika, Mulaudzi, Wanga, Müller, Cornelia, Müller, Hendrik, Mus, Alejandro, Musoke, Gibwa, Myserlis, Ioannis, Nagai, Hiroshi, Nagar, Neil M., Nakamura, Masanori, Narayanan, Gopal, Natarajan, Iniyan, Nathanail, Antonios, Fuentes, Santiago Navarro, Neilsen, Joey, Ni, Chunchong, Nowak, Michael A., Oh, Junghwan, Okino, Hiroki, Sánchez, Héctor Raúl Olivares, Oyama, Tomoaki, Özel, Feryal, Palumbo, Daniel C. M., Paraschos, Georgios Filippos, Park, Jongho, Parsons, Harriet, Pen, Ue-Li, Piétu, Vincent, PopStefanija, Aleksandar, Porth, Oliver, Prather, Ben, Principe, Giacomo, Psaltis, Dimitrios, Pu, Hung-Yi, Raffin, Philippe A., Rao, Ramprasad, Rawlings, Mark G., Ricarte, Angelo, Ripperda, Bart, Roelofs, Freek, Romero-Cañizales, Cristina, Ros, Eduardo, Roshanineshat, Arash, Ruiz, Ignacio, Ruszczyk, Chet, Rygl, Kazi L. J., Sánchez-Argüelles, David, Sánchez-Portal, Miguel, Sasada, Mahito, Satapathy, Kaushik, Savolainen, Tuomas, Schloerb, F. Peter, Schonfeld, Jonathan, Schuster, Karl-Friedrich, Shao, Lijing, Shen, Zhiqiang, Small, Des, Sohn, Bong Won, SooHoo, Jason, Salas, León David Sosapanta, Souccar, Kamal, Srinivasan, Ranjani, Stanway, Joshua S., Sun, He, Tazaki, Fumie, Tetarenko, Alexandra J., Tiede, Paul, Toma, Kenji, Toscano, Teresa, Traianou, Efthalia, Trent, Tyler, Trippe, Sascha, Turk, Matthew, van Bemmel, Ilse, van Langevelde, Huib Jan, van Rossum, Daniel R., Vos, Jesse, Ward-Thompson, Derek, Wardle, John, Washington, Jasmin E., Wharton, Robert, Wiik, Kaj, Witzel, Gunther, Wondrak, Michael F., Wong, George N., Wu, Qingwen, Yadlapalli, Nitika, Yamaguchi, Paul, Yfantis, Aristomenis, Yoon, Doosoo, Younsi, Ziri, Yu, Wei, Yuan, Feng, Yuan, Ye-Fei, Zensus, J. Anton, Zhang, Shuo, Zhao, Guang-Yao, and Zhao, Shan-Shan
Subjects: Astrophysics - Instrumentation and Methods for Astrophysics, Astrophysics - High Energy Astrophysical Phenomena
Abstract: The first very long baseline interferometry (VLBI) detections at 870$\mu$m wavelength (345$\,$GHz frequency) are reported, achieving the highest diffraction-limited angular resolution yet obtained from the surface of the Earth, and the highest-frequency example of the VLBI technique to date. These include strong detections for multiple sources observed on inter-continental baselines between telescopes in Chile, Hawaii, and Spain, obtained during observations in October 2018. The longest-baseline detections approach 11$\,$G$\lambda$ corresponding to an angular resolution, or fringe spacing, of 19$\mu$as. The Allan deviation of the visibility phase at 870$\mu$m is comparable to that at 1.3$\,$mm on the relevant integration time scales between 2 and 100$\,$s. The detections confirm that the sensitivity and signal chain stability of stations in the Event Horizon Telescope (EHT) array are suitable for VLBI observations at 870$\mu$m. Operation at this short wavelength, combined with anticipated enhancements of the EHT, will lead to a unique high angular resolution instrument for black hole studies, capable of resolving the event horizons of supermassive black holes in both space and time., Comment: Corresponding author: S. Doeleman
Published: 2024

30. Energy calibration of GTM on ground

Author: Huang, Chien-You, Chang, Hsiang-Kuang, Lin, Chih-Hsun, Tsao, Che-Chih, Hu, Chin-Ping, Chang, Hao-Min, Chen, Yan-Fu, Feng, An-Hsuan, Huang, Yi-Wen, Lin, Tzu-Hsuan, Tsao, Yi-Ning, Wu, Chih-En, and Wu, Chun-Wei
Subjects: Astrophysics - Instrumentation and Methods for Astrophysics
Abstract: The Gamma-ray Transients Monitor (GTM) on board the Formosat-8B (FS-8B) satellite is designed to detect and localize Gamma-Ray Bursts (GRBs). By utilizing 2+2 CITIROC chips to manipulate 4+4 detectors, which are composed of GAGG(Ce) scintillators coupled with Silicon Photomultipliers (SiPMs) and oriented in various directions to achieve all-sky coverage, the GRB saturation fluences of GTM in the 50 keV to 1 MeV range for Short GRBs (SGRBs) and Long GRBs (LGRBs) were estimated to be about $3.1 \times 10^{-4}$ and $5.0 \times 10^{-3}\ {\rm erg/cm^2}$, respectively, based on simulations. To precisely interpret the GTM readout signal in terms of energy, several measurements for isotope and gain calibration were conducted. Despite encountering issues with crosstalk and SiPM saturation effect in the data, the energy spectrum can still be recovered by appropriately discarding channel noise and mapping with the correct ADC-to-energy relation. This paper summarizes the energy resolution of GTM and the linear variations in the relationship between photon energy and readout signal. At 662 keV, the energy resolution is about 16 %. Also, it demonstrates that greater gain is achieved by increasing voltage or decreasing temperature.
Published: 2024

31. PrefixQuant: Static Quantization Beats Dynamic through Prefixed Outliers in LLMs

Author: Chen, Mengzhao, Liu, Yi, Wang, Jiahao, Bin, Yi, Shao, Wenqi, and Luo, Ping
Subjects: Computer Science - Machine Learning, Computer Science - Computation and Language
Abstract: Quantization is essential for deploying Large Language Models (LLMs) by enhancing memory efficiency and inference speed. Existing methods for activation quantization mainly address channel-wise outliers, often neglecting token-wise outliers, leading to reliance on costly per-token dynamic quantization. To address this, we introduce PrefixQuant, a novel technique that isolates outlier tokens offline without re-training. Specifically, PrefixQuant identifies high-frequency outlier tokens and prefixes them in the KV cache, preventing the generation of outlier tokens during inference and simplifying quantization. To our knowledge, PrefixQuant is the first to enable efficient per-tensor static quantization to outperform expensive per-token dynamic quantization. For instance, in W4A4KV4 (4-bit weight, 4-bit activation, and 4-bit KV cache) Llama-3-8B, PrefixQuant with per-tensor static quantization achieves a 7.43 WikiText2 perplexity and 71.08% average accuracy on 5 common-sense reasoning tasks, outperforming previous per-token dynamic quantization methods like QuaRot with 0.98 perplexity improvement and +5.98 points accuracy. Additionally, the inference speed of W4A4 quantized models using PrefixQuant is 1.60x to 2.81x faster than FP16 models and exceeds QuaRot models by 1.2x to 1.3x. Our code is available at https://github.com/ChenMnZ/PrefixQuant., Comment: A PTQ method to significantly boost the performance of static activation quantization
Published: 2024

32. Estimating Body and Hand Motion in an Ego-sensed World

Author: Yi, Brent, Ye, Vickie, Zheng, Maya, Müller, Lea, Pavlakos, Georgios, Ma, Yi, Malik, Jitendra, and Kanazawa, Angjoo
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: We present EgoAllo, a system for human motion estimation from a head-mounted device. Using only egocentric SLAM poses and images, EgoAllo guides sampling from a conditional diffusion model to estimate 3D body pose, height, and hand parameters that capture the wearer's actions in the allocentric coordinate frame of the scene. To achieve this, our key insight is in representation: we propose spatial and temporal invariance criteria for improving model performance, from which we derive a head motion conditioning parameterization that improves estimation by up to 18%. We also show how the bodies estimated by our system can improve the hands: the resulting kinematic and temporal constraints result in over 40% lower hand estimation errors compared to noisy monocular estimates. Project page: https://egoallo.github.io/, Comment: v2: fixed figures for Safari, typos
Published: 2024

33. Current Status of Inert Higgs Dark Matter with Dark Fermions

Author: Fan, Yi-Zhong, Li, Yao-Yu, Lu, Chih-Ting, Luo, Xiao-Yi, Tang, Tian-Peng, Tran, Van Que, and Tsai, Yue-Lin Sming
Subjects: High Energy Physics - Phenomenology
Abstract: The precision measurements of the muon magnetic moment and the $W$ boson mass have sparked interest in the potential deviations from standard model (SM) predictions. While it may be premature to attribute any excesses in these precision measurements to new physics, they do offer a valuable indication of potential directions for physics beyond the SM. Additionally, the particle nature of dark matter (DM) remains a crucial enigma. Despite the absence of any definitive DM signal in direct detection and collider experiments, the Galactic Center GeV $\gamma$-ray excess and the AMS-02 antiproton ($\overline{p}$) excess could potentially offer hints related to the evidence of DM. Motivated by these observations, we propose a simple DM model that addresses all these issues. This model extends the SM by incorporating singlet and doublet Dirac fermion fields, along with a doublet complex scalar field. For the viable parameter regions in this model, we find that future upgrades of the Large Hadron Collider and DM direct detection experiments can only partially probe them, while future high-energy muon colliders hold promise for exploring the unexplored parameter space., Comment: 33 pages, 10 figures, 2 tables. Comments are welcome
Published: 2024

34. Intestinal Symptoms among Children Aged 2-7 Years with Autism Spectrum Disorder in 13 Cities of China

Author: Ting Yang, Qian Zhang, Li Chen, Ying Dai, Fei-Yong Jia, Yan Hao, Ling Li, Jie Zhang, Li-Jie Wu, Xiao-Yan Ke, Ming-Ji Yi, Qi Hong, Jin-Jin Chen, Shuan-Feng Fang, Yi-Chao Wang, Qi Wang, Chun-Hua Jin, Jie Chen, and Ting-Yu Li
Abstract: Background: Autism spectrum disorder (ASD) is a multifactorial, pervasive, neurodevelopmental disorder, of which intestinal symptoms collectively represent one of the most common comorbidities. Methods: In this study, 1,222 children with ASD and 1,206 typically developing (TD) children aged 2-7 years were enrolled from 13 cities in China. Physical measurement and basic information questionnaires were conducted in ASD and TD children. The Childhood Autism Rating Scale (CARS), Social Responsiveness Scale (SRS), and Autism Behavior Checklist (ABC) were used to evaluate the clinical symptoms of children with ASD. The six-item Gastrointestinal Severity Index (6-GSI) was used to evaluate the prevalence of intestinal symptoms in the two groups. Results: The detection rates of constipation, stool odor, and total intestinal symptoms in ASD children were significantly higher than those in TD children (40.098% vs. 25.622%, 17.021% vs. 9.287%, and 53.601% vs. 41.294%, respectively). Autistic children presenting with intestinal comorbidity had significantly higher scores on the ABC, SRS, CARS, and multiple subscales than autistic children without intestinal symptoms, suggesting that intestinal comorbidity may exacerbate the core symptoms of ASD children. Conclusion: Intestinal dysfunction was significantly more common in autistic than in TD children. This dysfunction may aggravate the core symptoms of children with ASD.
Published: 2024

35. Advancing readiness for change in substance use for people with substance use disorders using the Kawa model based intervention program: A quasi-experimental study

Author: Hsiao, Han-Yi, Wang, Tsui-Ying, Lee, Chun-Hung, Lu, Young-Chin, Huang, Yu-Chen, Chien, Ying-Chun, Potenza, Marc N, and Lin, Chung-Ying
Published: 2024

36. Comparative genomics revealing the genomic characteristics of 'Klebsiella variicola' clinical isolates in China

Author: Yang, Fang, Liu, Fei-Yi, and Zhong, Yi-Ming
Published: 2024

37. Temporally distinct 3D multi-omic dynamics in the developing human brain

Author: Heffel, Matthew G, Zhou, Jingtian, Zhang, Yi, Lee, Dong-Sung, Hou, Kangcheng, Pastor-Alonso, Oier, Abuhanna, Kevin D, Galasso, Joseph, Kern, Colin, Tai, Chu-Yi, Garcia-Padilla, Carlos, Nafisi, Mahsa, Zhou, Yi, Schmitt, Anthony D, Li, Terence, Haeussler, Maximilian, Wick, Brittney, Zhang, Martin Jinye, Xie, Fangming, Ziffra, Ryan S, Mukamel, Eran A, Eskin, Eleazar, Nowakowski, Tomasz J, Dixon, Jesse R, Pasaniuc, Bogdan, Ecker, Joseph R, Zhu, Quan, Bintu, Bogdan, Paredes, Mercedes F, and Luo, Chongyuan
Subjects: Biological Sciences, Genetics, Brain Disorders, Neurosciences, Mental Illness, Mental Health, Human Genome, 2.1 Biological and endogenous factors, 1.1 Normal biological development and functioning, Neurological, Humans, Cell Differentiation, Chromatin, Disease Susceptibility, DNA Methylation, Epigenesis, Genetic, Epigenomics, Fetus, Hippocampus, Multiomics, Neuroglia, Neurons, Prefrontal Cortex, Schizophrenia, Single Molecule Imaging, Single-Cell Analysis, Time Factors, Infant, Newborn, General Science & Technology
Abstract: The human hippocampus and prefrontal cortex play critical roles in learning and cognition, yet the dynamic molecular characteristics of their development remain enigmatic. Here we investigated the epigenomic and three-dimensional chromatin conformational reorganization during the development of the hippocampus and prefrontal cortex, using more than 53,000 joint single-nucleus profiles of chromatin conformation and DNA methylation generated by single-nucleus methyl-3C sequencing (snm3C-seq3). The remodelling of DNA methylation is temporally separated from chromatin conformation dynamics. Using single-cell profiling and multimodal single-molecule imaging approaches, we have found that short-range chromatin interactions are enriched in neurons, whereas long-range interactions are enriched in glial cells and non-brain tissues. We reconstructed the regulatory programs of cell-type development and differentiation, finding putatively causal common variants for schizophrenia strongly overlapping with chromatin loop-connected, cell-type-specific regulatory regions. Our data provide multimodal resources for studying gene regulatory dynamics in brain development and demonstrate that single-cell three-dimensional multi-omics is a powerful approach for dissecting neuropsychiatric risk loci.
Published: 2024

38. Viral proteins resolve the virus-vector conundrum during hemipteran-mediated transmission by subverting salicylic acid signaling pathway.

Author: Zhang, Jing-Ru, Liu, Yi-Ming, Li, Di, Wu, Yi-Jie, Zhao, Shi-Xing, Wang, Xiao-Wei, Liu, Shu-Sheng, Walling, Linda, and Pan, Li-Long
Subjects: Salicylic Acid, Animals, Signal Transduction, Plant Diseases, Insect Vectors, Begomovirus, Viral Proteins, Nicotiana, Hemiptera, Plant Proteins, HSP90 Heat-Shock Proteins
Abstract: Hemipteran insects transmit viruses when infesting plants, during which vectors activate salicylic acid (SA)-regulated antiviral defenses. How vector-borne plant viruses circumvent these antiviral defenses is largely unexplored. During co-infections of begomoviruses and betasatellites in plants, betasatellite-encoded βC1 proteins interfere with SA signaling and reduce the activation of antiviral resistance. βC1 inhibits SA-induced degradation of NbNPR3 (Nicotiana benthamiana nonexpressor of pathogenesis-related genes 3), a negative regulator of SA signaling. βC1 does not bind directly to NbNPR3, but regulates NbNPR3 degradation via heat shock protein 90s (NbHSP90s). NbHSP90s bind to both NbNPR3 and βC1 and suppress SA signaling. This viral success strategy appears to be conserved as it is also documented for viral proteins encoded by two aphid-borne viruses. Our findings reveal an exquisite mechanism that facilitates the persistence of vector-borne plant viruses and provide important insights into the intricacies of the virus life cycle.
Published: 2024

39. CDK12 loss drives prostate cancer progression, transcription-replication conflicts, and synthetic lethality with paralog CDK13.

Author: Tien, Jean, Luo, Jie, Chang, Yu, Zhang, Yuping, Cheng, Yunhui, Wang, Xiaoju, Yang, Jianzhang, Mannan, Rahul, Mahapatra, Somnath, Shah, Palak, Wang, Xiao-Ming, Todd, Abigail, Eyunni, Sanjana, Cheng, Caleb, Rebernick, Ryan, Xiao, Lanbo, Bao, Yi, Neiswender, James, Brough, Rachel, Pettitt, Stephen, Cao, Xuhong, Miner, Stephanie, Zhou, Licheng, Wu, Yi-Mi, Labanca, Estefania, Wang, Yuzhuo, Parolia, Abhijit, Cieslik, Marcin, Robinson, Dan, Wang, Zhen, Feng, Felix, Chou, Jonathan, Lord, Christopher, Ding, Ke, and Chinnaiyan, Arul
Subjects: CDK12, CDK13, Cdk12 knockout, R-loops, paralog-based synthetic lethality, prostate cancer, transcription-replication conflicts, Male, Animals, Humans, Cyclin-Dependent Kinases, Mice, Synthetic Lethal Mutations, Prostatic Neoplasms, Tumor Suppressor Protein p53, Disease Progression, PTEN Phosphohydrolase, Genomic Instability, Transcription, Genetic, Organoids, Prostatic Neoplasms, Castration-Resistant, Cell Proliferation, DNA Replication, Mice, Knockout, Cell Line, Tumor, Mice, Inbred C57BL, CDC2 Protein Kinase
Abstract: Biallelic loss of cyclin-dependent kinase 12 (CDK12) defines a metastatic castration-resistant prostate cancer (mCRPC) subtype. It remains unclear, however, whether CDK12 loss drives prostate cancer (PCa) development or uncovers pharmacologic vulnerabilities. Here, we show Cdk12 ablation in murine prostate epithelium is sufficient to induce preneoplastic lesions with lymphocytic infiltration. In allograft-based CRISPR screening, Cdk12 loss associates positively with Trp53 inactivation but negatively with Pten inactivation. Moreover, concurrent Cdk12/Trp53 ablation promotes proliferation of prostate-derived organoids, while Cdk12 knockout in Pten-null mice abrogates prostate tumor growth. In syngeneic systems, Cdk12/Trp53-null allografts exhibit luminal morphology and immune checkpoint blockade sensitivity. Mechanistically, Cdk12 inactivation mediates genomic instability by inducing transcription-replication conflicts. Strikingly, CDK12-mutant organoids and patient-derived xenografts are sensitive to inhibition or degradation of the paralog kinase, CDK13. We therein establish CDK12 as a bona fide tumor suppressor, mechanistically define how CDK12 inactivation causes genomic instability, and advance a therapeutic strategy for CDK12-mutant mCRPC.
Published: 2024

40. Copy-Move Detection in Optical Microscopy: A Segmentation Network and A Dataset

Author: Shao, Hao-Chiang, Liao, Yuan-Rong, Tseng, Tse-Yu, Chuo, Yen-Liang, and Lin, Fong-Yi
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
Abstract: With increasing revelations of academic fraud, detecting forged experimental images in the biomedical field has become a public concern. The challenge lies in the fact that copy-move targets can include background tissue, small foreground objects, or both, which may be out of the training domain and subject to unseen attacks, rendering standard object-detection-based approaches less effective. To address this, we reformulate the problem of detecting biomedical copy-move forgery regions as an intra-image co-saliency detection task and propose CMSeg-Net, a copy-move forgery segmentation network capable of identifying unseen duplicated areas. Built on a multi-resolution encoder-decoder architecture, CMSeg-Net incorporates self-correlation and correlation-assisted spatial-attention modules to detect intra-image regional similarities within feature tensors at each observation scale. This design helps distinguish even small copy-move targets in complex microscopic images from other similar objects. Furthermore, we created a copy-move forgery dataset of optical microscopic images, named FakeParaEgg, using open data from the ICIP 2022 Challenge to support CMSeg-Net's development and verify its performance. Extensive experiments demonstrate that our approach outperforms previous state-of-the-art methods on the FakeParaEgg dataset and other open copy-move detection datasets, including CASIA-CMFD, CoMoFoD, and CMF. The FakeParaEgg dataset, our source code, and the CMF dataset with our manually defined segmentation ground truths are available at https://github.com/YoursEver/FakeParaEgg. Comment: submitted to IEEE SPL
Published: 2024

41. TSGaussian: Semantic and Depth-Guided Target-Specific Gaussian Splatting from Sparse Views

Author: Zhao, Liang, Bao, Zehan, Xie, Yi, Chen, Hong, Chen, Yaohui, and Li, Weifu
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Recent advances in Gaussian Splatting have significantly advanced the field, achieving both panoptic and interactive segmentation of 3D scenes. However, existing methodologies often overlook the critical need for reconstructing specified targets with complex structures from sparse views. To address this issue, we introduce TSGaussian, a novel framework that combines semantic constraints with depth priors to avoid geometry degradation in challenging novel view synthesis tasks. Our approach prioritizes computational resources on designated targets while minimizing background allocation. Bounding boxes from YOLOv9 serve as prompts for Segment Anything Model to generate 2D mask predictions, ensuring semantic accuracy and cost efficiency. TSGaussian effectively clusters 3D gaussians by introducing a compact identity encoding for each Gaussian ellipsoid and incorporating 3D spatial consistency regularization. Leveraging these modules, we propose a pruning strategy to effectively reduce redundancy in 3D gaussians. Extensive experiments demonstrate that TSGaussian outperforms state-of-the-art methods on three standard datasets and a new challenging dataset we collected, achieving superior results in novel view synthesis of specific objects. Code is available at: https://github.com/leon2000-ai/TSGaussian.
Published: 2024

42. Timealign: A multi-modal object detection method for time misalignment fusing in autonomous driving

Author: Song, Zhihang, Peng, Lihui, Hu, Jianming, Yao, Danya, and Zhang, Yi
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Multi-modal perception methods are thriving in the autonomous driving field because they make better use of complementary data from different sensors. Such methods depend on calibration and synchronization between sensors to get accurate environmental information. Studies have already examined space-alignment robustness in the autonomous driving object detection process; however, research on time alignment is relatively scarce. Because LiDAR point clouds are more challenging to transfer in real time in real-world experiments, our study used historical LiDAR frames to better align features when LiDAR data lags. We designed a Timealign module, based on the SOTA GraphBEV framework, that predicts and combines LiDAR features with observation to tackle such time misalignment. Comment: 8 pages, 3 figures
Published: 2024

43. Enhancing Fine-Grained Vision-Language Pretraining with Negative Augmented Samples

Author: Wang, Yeyuan, Gao, Dehong, Yi, Lei, Jin, Linbo, Zhang, Jinxia, Yang, Libin, and Cai, Xiaoyan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Existing Vision-Language Pretraining (VLP) methods have achieved remarkable improvements across a variety of vision-language tasks, confirming their effectiveness in capturing coarse-grained semantic correlations. However, their capability for fine-grained understanding, which is critical for many nuanced vision-language applications, remains limited. Prevailing VLP models often overlook the intricate distinctions in expressing different modal features and typically depend on the similarity of holistic features for cross-modal interactions. Moreover, these models directly align and integrate features from different modalities, focusing more on coarse-grained general representations, thus failing to capture the nuanced differences necessary for tasks demanding a more detailed perception. In response to these limitations, we introduce Negative Augmented Samples (NAS), a refined vision-language pretraining model that innovatively incorporates NAS to specifically address the challenge of fine-grained understanding. NAS utilizes a Visual Dictionary (VD) as a semantic bridge between visual and linguistic domains. Additionally, it employs a Negative Visual Augmentation (NVA) method based on the VD to generate challenging negative image samples. These samples deviate from positive samples exclusively at the token level, thereby necessitating that the model discerns the subtle disparities between positive and negative samples with greater precision. Comprehensive experiments validate the efficacy of NAS components and underscore its potential to enhance fine-grained vision-language comprehension. Comment: 15 pages, accepted by AAAI 2025, full paper
Published: 2024

44. NeRF-Texture: Synthesizing Neural Radiance Field Textures

Author: Huang, Yi-Hua, Cao, Yan-Pei, Lai, Yu-Kun, Shan, Ying, and Gao, Lin
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Graphics
Abstract: Texture synthesis is a fundamental problem in computer graphics that would benefit various applications. Existing methods are effective in handling 2D image textures. In contrast, many real-world textures contain meso-structure in the 3D geometry space, such as grass, leaves, and fabrics, which cannot be effectively modeled using only 2D image textures. We propose a novel texture synthesis method with Neural Radiance Fields (NeRF) to capture and synthesize textures from given multi-view images. In the proposed NeRF texture representation, a scene with fine geometric details is disentangled into the meso-structure textures and the underlying base shape. This allows textures with meso-structure to be effectively learned as latent features situated on the base shape, which are fed into a NeRF decoder trained simultaneously to represent the rich view-dependent appearance. Using this implicit representation, we can synthesize NeRF-based textures through patch matching of latent features. However, inconsistencies between the metrics of the reconstructed content space and the latent feature space may compromise the synthesis quality. To enhance matching performance, we further regularize the distribution of latent features by incorporating a clustering constraint. In addition to generating NeRF textures over a planar domain, our method can also synthesize NeRF textures over curved surfaces, which are practically useful. Experimental results and evaluations demonstrate the effectiveness of our approach.
Published: 2024

45. Cycle-Consistent Bridge Diffusion Model for Accelerated MRI Reconstruction

Author: Song, Tao, Wu, Yicheng, Hu, Minhao, Luo, Xiangde, Luo, Guoting, Wang, Guotai, Guo, Yi, Xu, Feng, and Zhang, Shaoting
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition
Abstract: Accelerated MRI reconstruction techniques aim to reduce examination time while maintaining high image fidelity, which is highly desirable in clinical settings for improving patient comfort and hospital efficiency. Existing deep learning methods typically reconstruct images from under-sampled data with traditional reconstruction approaches, but they still struggle to provide high-fidelity results. In recent years, diffusion models have shown great potential to improve the fidelity of generated images. However, their inference process, which starts from random Gaussian noise, introduces instability into the results and usually requires thousands of sampling steps, resulting in sub-optimal reconstruction quality and low efficiency. To address these challenges, we propose the Cycle-Consistent Bridge Diffusion Model (CBDM). CBDM employs two bridge diffusion models to construct a cycle-consistent diffusion process with a consistency loss, enhancing the fine-grained details of reconstructed images and reducing the number of diffusion steps. Moreover, CBDM incorporates a Contourlet Decomposition Embedding Module (CDEM), which captures multi-scale structural texture knowledge in images through frequency domain decomposition pyramids and directional filter banks to improve structural fidelity. Extensive experiments demonstrate the superiority of our model, with higher reconstruction quality and fewer training iterations, achieving a new state of the art for accelerated MRI reconstruction on both the fastMRI and IXI datasets.
Published: 2024

46. Dual-Zone Hard-Core Model for RTS/CTS Handshake Analysis in WLANs

Author: Zhong, Yi, Chen, Zhuoling, Zhang, Wenyi, and Haenggi, Martin
Subjects: Computer Science - Networking and Internet Architecture, Computer Science - Information Theory
Abstract: This paper introduces a new stochastic geometry-based model to analyze the Request-to-Send/Clear-to-Send (RTS/CTS) handshake mechanism in wireless local area networks (WLANs). We develop an advanced hard-core point process model, termed the dual-zone hard-core process (DZHCP), which extends traditional hard-core models to capture the spatial interactions and exclusion effects introduced by the RTS/CTS mechanism. This model integrates key parameters accounting for the thinning effects imposed by RTS/CTS, enabling a refined characterization of active transmitters in the network. Analytical expressions are derived for the intensity of the DZHCP, the mean interference, and an approximation of the success probability, providing insight into how network performance depends on critical design parameters. Our results provide better estimates of interference levels and success probability, which could inform strategies for better interference management and improved performance in future WLAN designs. Comment: submitted to IEEE Transactions on Wireless Communications
Published: 2024

47. Benchmarking Table Comprehension In The Wild

Author: Pan, Yikang, Zhu, Yi, Xie, Rand, and Liu, Yizhi
Subjects: Computer Science - Computation and Language
Abstract: Large Language Models (LLMs), while being increasingly dominant on a myriad of knowledge-intensive activities, have only had limited success understanding lengthy table-text mixtures, such as academic papers and financial reports. Recent advances in long-context LLMs have opened up new possibilities for this field. Nonetheless, we identify two roadblocks: (1) Prior benchmarks of table question answering (TableQA) have focused on isolated tables without context, making it hard to evaluate models in real-world scenarios. (2) Prior benchmarks have focused on some narrow skill sets of table comprehension such as table recognition, data manipulation/calculation, table summarization etc., while a skilled human employs those skills collectively. In this work, we introduce TableQuest, a new benchmark designed to evaluate the holistic table comprehension capabilities of LLMs in the natural table-rich context of financial reports. We employ a rigorous data processing and filtering procedure to ensure that the question-answer pairs are logical, reasonable, and diverse. We experiment with 7 state-of-the-art models, and find that despite reasonable accuracy in locating facts, they often falter when required to execute more sophisticated reasoning or multi-step calculations. We conclude with a qualitative study of the failure modes and discuss the challenges of constructing a challenging benchmark. We make the evaluation data, judging procedure and results of this study publicly available to facilitate research in this field. Comment: Accepted at TRL Workshop@NeurIPS 2024. Link to data: https://github.com/boson-ai/Table_eval_public
Published: 2024

48. Sharpening Your Density Fields: Spiking Neuron Aided Fast Geometry Learning

Author: Gu, Yi, Wang, Zhaorui, Ye, Dongjun, and Xu, Renjing
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Neural Radiance Fields (NeRF) have achieved remarkable progress in neural rendering. Extracting geometry from NeRF typically relies on the Marching Cubes algorithm, which uses a hand-crafted threshold to define the level set. However, this threshold-based approach requires laborious and scenario-specific tuning, limiting its practicality for real-world applications. In this work, we seek to enhance the efficiency of this method during the training time. To this end, we introduce a spiking neuron mechanism that dynamically adjusts the threshold, eliminating the need for manual selection. Despite its promise, directly training with the spiking neuron often results in model collapse and noisy outputs. To overcome these challenges, we propose a round-robin strategy that stabilizes the training process and enables the geometry network to achieve a sharper and more precise density distribution with minimal computational overhead. We validate our approach through extensive experiments on both synthetic and real-world datasets. The results show that our method significantly improves the performance of threshold-based techniques, offering a more robust and efficient solution for NeRF geometry extraction.
Published: 2024

49. Can Students Beyond The Teacher? Distilling Knowledge from Teacher's Bias

Author: Zhang, Jianhua, Gao, Yi, Liu, Ruyu, Cheng, Xu, Zhang, Houxiang, and Chen, Shengyong
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Knowledge distillation (KD) is a model compression technique that transfers knowledge from a large teacher model to a smaller student model to enhance its performance. Existing methods often assume that the student model is inherently inferior to the teacher model. However, we identify that the fundamental issue affecting student performance is the bias transferred by the teacher. Current KD frameworks transmit both right and wrong knowledge, introducing bias that misleads the student model. To address this issue, we propose a novel strategy to rectify bias and greatly improve the student model's performance. Our strategy involves three steps: First, we differentiate knowledge and design a bias elimination method to filter out biases, retaining only the right knowledge for the student model to learn. Next, we propose a bias rectification method to rectify the teacher model's wrong predictions, fundamentally addressing bias interference. The student model learns from both the right knowledge and the rectified biases, greatly improving its prediction accuracy. Additionally, we introduce a dynamic learning approach with a loss function that updates weights dynamically, allowing the student model to quickly learn right knowledge-based easy tasks initially and tackle hard tasks corresponding to biases later, greatly enhancing the student model's learning efficiency. To the best of our knowledge, this is the first strategy enabling the student model to surpass the teacher model. Experiments demonstrate that our strategy, as a plug-and-play module, is versatile across various mainstream KD frameworks. We will release our code after the paper is accepted.
Published: 2024

50. Understand the Effectiveness of Shortcuts through the Lens of DCA

Author: Sun, Youran, Liu, Yihua, and Niu, Yi-Shuai
Subjects: Computer Science - Machine Learning, Computer Science - Neural and Evolutionary Computing, Mathematics - Optimization and Control
Abstract: Difference-of-Convex Algorithm (DCA) is a well-known nonconvex optimization algorithm for minimizing a nonconvex function that can be expressed as the difference of two convex ones. Many famous existing optimization algorithms, such as SGD and proximal point methods, can be viewed as special DCAs with specific DC decompositions, making it a powerful framework for optimization. On the other hand, shortcuts are a key architectural feature in modern deep neural networks, facilitating both training and optimization. We showed that the shortcut neural network gradient can be obtained by applying DCA to vanilla neural networks, networks without shortcut connections. Therefore, from the perspective of DCA, we can better understand the effectiveness of networks with shortcuts. Moreover, we proposed a new architecture called NegNet that does not fit the previous interpretation but performs on par with ResNet and can be included in the DCA framework.
Published: 2024
