Author: "Zhang, Li" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Zhang, Li"' showing total 128,443 results

201. Sedimentary Carbon and Nitrogen Dynamics Reveal Impact of Human Land-Use Change on Kawainui Marsh, O‘ahu, Hawai‘i

Author: Anderson, Brittany, Zhang, Li, Wang, Huining, Lu, Tianyi, Horgen, F. David, Culliney, John, and Fang, Jiasong
Published: 2017

202. Ensemble Learning Models Based on Noninvasive Features for Type 2 Diabetes Screening: Model Development and Validation

Author: Yang, Tianzhou, Zhang, Li, Yi, Liwei, Feng, Huawei, Li, Shimeng, Chen, Haoyu, Zhu, Junfeng, Zhao, Jian, Zeng, Yingyue, and Liu, Hongsheng
Subjects: Computer applications to medicine. Medical informatics, R858-859.7
Abstract: Background: Early diabetes screening can effectively reduce the burden of disease. However, natural population–based screening projects require a large number of resources. With the emergence and development of machine learning, researchers have started to pursue more flexible and efficient methods to screen or predict type 2 diabetes. Objective: The aim of this study was to build prediction models based on the ensemble learning method for diabetes screening to further improve the health status of the population in a noninvasive and inexpensive manner. Methods: The dataset for building and evaluating the diabetes prediction model was extracted from the National Health and Nutrition Examination Survey from 2011-2016. After data cleaning and feature selection, the dataset was split into a training set (80%, 2011-2014), test set (20%, 2011-2014) and validation set (2015-2016). Three simple machine learning methods (linear discriminant analysis, support vector machine, and random forest) and easy ensemble methods were used to build diabetes prediction models. The performance of the models was evaluated through 5-fold cross-validation and external validation. The Delong test (2-sided) was used to test the performance differences between the models. Results: We selected 8057 observations and 12 attributes from the database. In the 5-fold cross-validation, the three simple methods yielded highly predictive performance models with areas under the curve (AUCs) over 0.800, wherein the ensemble methods significantly outperformed the simple methods. When we evaluated the models in the test set and validation set, the same trends were observed. The ensemble model of linear discriminant analysis yielded the best performance, with an AUC of 0.849, an accuracy of 0.730, a sensitivity of 0.819, and a specificity of 0.709 in the validation set. Conclusions: This study indicates that efficient screening using machine learning methods with noninvasive tests can be applied to a large population and achieve the objective of secondary prevention.
Published: 2020

203. PROC2PDDL: Open-Domain Planning Representations from Texts

Author: Zhang, Tianyi, Zhang, Li, Hou, Zhaoyi, Wang, Ziyu, Gu, Yuling, Clark, Peter, Callison-Burch, Chris, and Tandon, Niket
Subjects: Computer Science - Computation and Language
Abstract: Planning in a text-based environment continues to be a major challenge for AI systems. Recent approaches have used language models to predict a planning domain definition (e.g., PDDL) but have only been evaluated in closed-domain simulated environments. To address this, we present Proc2PDDL, the first dataset containing open-domain procedural texts paired with expert-annotated PDDL representations. Using this dataset, we evaluate state-of-the-art models on defining the preconditions and effects of actions. We show that Proc2PDDL is highly challenging, with GPT-3.5's success rate close to 0% and GPT-4's around 35%. Our analysis shows both syntactic and semantic errors, indicating LMs' deficiency in both generating domain-specific programs and reasoning about events. We hope this analysis and dataset help future progress towards integrating the best of LMs and formal planning.
Comment: In NLRSE 2024, the 2nd Natural Language Reasoning and Structured Explanations Workshop
Published: 2024

204. CAMixerSR: Only Details Need More 'Attention'

Author: Wang, Yan, Liu, Yi, Zhao, Shijie, Li, Junlin, and Zhang, Li
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
Abstract: To satisfy the rapidly increasing demands on the large image (2K-8K) super-resolution (SR), prevailing methods follow two independent tracks: 1) accelerate existing networks by content-aware routing, and 2) design better super-resolution networks via token mixer refining. Despite directness, they encounter unavoidable defects (e.g., inflexible route or non-discriminative processing) limiting further improvements of quality-complexity trade-off. To erase the drawbacks, we integrate these schemes by proposing a content-aware mixer (CAMixer), which assigns convolution for simple contexts and additional deformable window-attention for sparse textures. Specifically, the CAMixer uses a learnable predictor to generate multiple bootstraps, including offsets for windows warping, a mask for classifying windows, and convolutional attentions for endowing convolution with the dynamic property, which modulates attention to include more useful textures self-adaptively and improves the representation capability of convolution. We further introduce a global classification loss to improve the accuracy of predictors. By simply stacking CAMixers, we obtain CAMixerSR which achieves superior performance on large-image SR, lightweight SR, and omnidirectional-image SR.
Comment: Accepted by CVPR 2024
Published: 2024

205. Modular Blind Video Quality Assessment

Author: Wen, Wen, Li, Mu, Zhang, Yabin, Liao, Yiting, Li, Junlin, Zhang, Li, and Ma, Kede
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
Abstract: Blind video quality assessment (BVQA) plays a pivotal role in evaluating and improving the viewing experience of end-users across a wide range of video-based platforms and services. Contemporary deep learning-based models primarily analyze video content in its aggressively subsampled format, while being blind to the impact of the actual spatial resolution and frame rate on video quality. In this paper, we propose a modular BVQA model and a method of training it to improve its modularity. Our model comprises a base quality predictor, a spatial rectifier, and a temporal rectifier, responding to the visual content and distortion, spatial resolution, and frame rate changes on video quality, respectively. During training, spatial and temporal rectifiers are dropped out with some probabilities to render the base quality predictor a standalone BVQA model, which should work better with the rectifiers. Extensive experiments on both professionally-generated content and user-generated content video databases show that our quality model achieves superior or comparable performance to current methods. Additionally, the modularity of our model offers an opportunity to analyze existing video quality databases in terms of their spatial and temporal complexity.
Comment: Accepted by CVPR 2024; Camera-ready version
Published: 2024

206. RSAM-Seg: A SAM-based Approach with Prior Knowledge Integration for Remote Sensing Image Semantic Segmentation

Author: Zhang, Jie, Yang, Xubing, Jiang, Rui, Shao, Wei, and Zhang, Li
Subjects: Computer Science - Computer Vision and Pattern Recognition, Electrical Engineering and Systems Science - Image and Video Processing
Abstract: The development of high-resolution remote sensing satellites has provided great convenience for research work related to remote sensing. Segmentation and extraction of specific targets are essential tasks when facing the vast and complex remote sensing images. Recently, the introduction of Segment Anything Model (SAM) provides a universal pre-training model for image segmentation tasks. While the direct application of SAM to remote sensing image segmentation tasks does not yield satisfactory results, we propose RSAM-Seg, which stands for Remote Sensing SAM with Semantic Segmentation, as a tailored modification of SAM for the remote sensing field that eliminates the need for manual intervention to provide prompts. Adapter-Scale, a set of supplementary scaling modules, is proposed in the multi-head attention blocks of the encoder part of SAM. Furthermore, Adapter-Feature modules are inserted between the Vision Transformer (ViT) blocks. These modules aim to incorporate high-frequency image information and image embedding features to generate image-informed prompts. Experiments are conducted on four distinct remote sensing scenarios, encompassing cloud detection, field monitoring, building detection and road mapping tasks. The experimental results not only showcase the improvement over the original SAM and U-Net across cloud, buildings, fields and roads scenarios, but also highlight the capacity of RSAM-Seg to discern absent areas within the ground truth of certain datasets, affirming its potential as an auxiliary annotation method. In addition, the performance in few-shot scenarios is commendable, underscoring its potential in dealing with limited datasets.
Comment: 12 pages, 11 figures
Published: 2024

207. Data Interpreter: An LLM Agent For Data Science

Author: Hong, Sirui, Lin, Yizhang, Liu, Bang, Liu, Bangbang, Wu, Binhao, Zhang, Ceyao, Wei, Chenxing, Li, Danyang, Chen, Jiaqi, Zhang, Jiayi, Wang, Jinlin, Zhang, Li, Zhang, Lingyao, Yang, Min, Zhuge, Mingchen, Guo, Taicheng, Zhou, Tuo, Tao, Wei, Tang, Xiangru, Lu, Xiangtao, Zheng, Xiawu, Liang, Xinbing, Fei, Yaying, Cheng, Yuheng, Gou, Zhibin, Xu, Zongze, and Wu, Chenglin
Subjects: Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Large Language Model (LLM)-based agents have shown effectiveness across many applications. However, their use in data science scenarios requiring solving long-term interconnected tasks, dynamic data adjustments and domain expertise remains challenging. Previous approaches primarily focus on individual tasks, making it difficult to assess the complete data science workflow. Moreover, they struggle to handle real-time changes in intermediate data and fail to adapt dynamically to evolving task dependencies inherent to data science problems. In this paper, we present Data Interpreter, an LLM-based agent designed to automatically solve various data science problems end-to-end. Our Data Interpreter incorporates two key modules: 1) Hierarchical Graph Modeling, which breaks down complex problems into manageable subproblems, enabling dynamic node generation and graph optimization; and 2) Programmable Node Generation, a technique that refines and verifies each subproblem to iteratively improve code generation results and robustness. Extensive experiments consistently demonstrate the superiority of Data Interpreter. On InfiAgent-DABench, it achieves a 25% performance boost, raising accuracy from 75.9% to 94.9%. For machine learning and open-ended tasks, it improves performance from 88% to 95%, and from 60% to 97%, respectively. Moreover, on the MATH dataset, Data Interpreter achieves remarkable performance with a 26% improvement compared to state-of-the-art baselines. The code is available at https://github.com/geekan/MetaGPT.
Published: 2024

208. Clustering and Ranking: Diversity-preserved Instruction Selection through Expert-aligned Quality Estimation

Author: Ge, Yuan, Liu, Yilun, Hu, Chi, Meng, Weibin, Tao, Shimin, Zhao, Xiaofeng, Ma, Hongxia, Zhang, Li, Chen, Boxing, Yang, Hao, Li, Bei, Xiao, Tong, and Zhu, Jingbo
Subjects: Computer Science - Computation and Language
Abstract: With contributions from the open-source community, a vast amount of instruction tuning (IT) data has emerged. Given the significant resource allocation required for training and evaluating models, it is advantageous to have an efficient method for selecting high-quality IT data. However, existing methods for instruction data selection have limitations such as relying on fragile external APIs, being affected by biases in GPT models, or reducing the diversity of the selected instruction dataset. In this paper, we propose an industrial-friendly, expert-aligned and diversity-preserved instruction data selection method: Clustering and Ranking (CaR). CaR employs a two-step process: first, it ranks instruction pairs using a high-accuracy (84.25%) scoring model aligned with expert preferences; second, it preserves dataset diversity through clustering. In our experiment, CaR efficiently selected a mere 1.96% of Alpaca's IT data, yet the resulting AlpaCaR model surpassed Alpaca's performance by an average of 32.1% in GPT-4 evaluations. Moreover, we find that data selecting is a consistent paradigm whether the pre-trained model is more capable or the model parameters scaling up. Our approach employs compact models with 550M parameters and incurs just 11.2% of the financial outlay of current methods, enhancing its industrial deployability.
Comment: Accepted by EMNLP2024
Published: 2024

209. From Summary to Action: Enhancing Large Language Models for Complex Tasks with Open World APIs

Author: Liu, Yulong, Yuan, Yunlong, Wang, Chunwei, Han, Jianhua, Ma, Yongqiang, Zhang, Li, Zheng, Nanning, and Xu, Hang
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition
Abstract: The distinction between humans and animals lies in the unique ability of humans to use and create tools. Tools empower humans to overcome physiological limitations, fostering the creation of magnificent civilizations. Similarly, enabling foundational models like Large Language Models (LLMs) with the capacity to learn external tool usage may serve as a pivotal step toward realizing artificial general intelligence. Previous studies in this field have predominantly pursued two distinct approaches to augment the tool invocation capabilities of LLMs. The first approach emphasizes the construction of relevant datasets for model fine-tuning. The second approach, in contrast, aims to fully exploit the inherent reasoning abilities of LLMs through in-context learning strategies. In this work, we introduce a novel tool invocation pipeline designed to control massive real-world APIs. This pipeline mirrors the human task-solving process, addressing complicated real-life user queries. At each step, we guide LLMs to summarize the achieved results and determine the next course of action. We term this pipeline `from Summary to action', Sum2Act for short. Empirical evaluations of our Sum2Act pipeline on the ToolBench benchmark show significant performance improvements, outperforming established methods like ReAct and DFSDT. This highlights Sum2Act's effectiveness in enhancing LLMs for complex real-world tasks.
Published: 2024

210. BLO-SAM: Bi-level Optimization Based Overfitting-Preventing Finetuning of SAM

Author: Zhang, Li, Liang, Youwei, Zhang, Ruiyi, Javadi, Amirhosein, and Xie, Pengtao
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: The Segment Anything Model (SAM), a foundation model pretrained on millions of images and segmentation masks, has significantly advanced semantic segmentation, a fundamental task in computer vision. Despite its strengths, SAM encounters two major challenges. Firstly, it struggles with segmenting specific objects autonomously, as it relies on users to manually input prompts like points or bounding boxes to identify targeted objects. Secondly, SAM faces challenges in excelling at specific downstream tasks, like medical imaging, due to a disparity between the distribution of its pretraining data, which predominantly consists of general-domain images, and the data used in downstream tasks. Current solutions to these problems, which involve finetuning SAM, often lead to overfitting, a notable issue in scenarios with very limited data, like in medical imaging. To overcome these limitations, we introduce BLO-SAM, which finetunes SAM based on bi-level optimization (BLO). Our approach allows for automatic image segmentation without the need for manual prompts, by optimizing a learnable prompt embedding. Furthermore, it significantly reduces the risk of overfitting by training the model's weight parameters and the prompt embedding on two separate subsets of the training dataset, each at a different level of optimization. We apply BLO-SAM to diverse semantic segmentation tasks in general and medical domains. The results demonstrate BLO-SAM's superior performance over various state-of-the-art image semantic segmentation methods.
Published: 2024

211. A First Look at GPT Apps: Landscape and Vulnerability

Author: Zhang, Zejun, Zhang, Li, Yuan, Xin, Zhang, Anlan, Xu, Mengwei, and Qian, Feng
Subjects: Computer Science - Cryptography and Security, Computer Science - Computation and Language
Abstract: Following OpenAI's introduction of GPTs, a surge in GPT apps has led to the launch of dedicated LLM app stores. Nevertheless, given its debut, there is a lack of sufficient understanding of this new ecosystem. To fill this gap, this paper presents a first comprehensive longitudinal (5-month) study of the evolution, landscape, and vulnerability of the emerging LLM app ecosystem, focusing on two GPT app stores: GPTStore.AI and the official OpenAI GPT Store. Specifically, we develop two automated tools and a TriLevel configuration extraction strategy to efficiently gather metadata (i.e., names, creators, descriptions, etc.) and user feedback for all GPT apps across these two stores, as well as configurations (i.e., system prompts, knowledge files, and APIs) for the top 10,000 popular apps. Our extensive analysis reveals: (1) the user enthusiasm for GPT apps consistently rises, whereas creator interest plateaus within three months of GPTs' launch; (2) nearly 90% of system prompts can be easily accessed due to widespread failure to secure GPT app configurations, leading to considerable plagiarism and duplication among apps. Our findings highlight the necessity of enhancing the LLM app ecosystem by the app stores, creators, and users.
Published: 2024

212. FrameNeRF: A Simple and Efficient Framework for Few-shot Novel View Synthesis

Author: Xing, Yan, Wang, Pan, Liu, Ligang, Li, Daolun, and Zhang, Li
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Graphics
Abstract: We present a novel framework, called FrameNeRF, designed to apply off-the-shelf fast high-fidelity NeRF models with fast training speed and high rendering quality for few-shot novel view synthesis tasks. The training stability of fast high-fidelity models is typically constrained to dense views, making them unsuitable for few-shot novel view synthesis tasks. To address this limitation, we utilize a regularization model as a data generator to produce dense views from sparse inputs, facilitating subsequent training of fast high-fidelity models. Since these dense views are pseudo ground truth generated by the regularization model, original sparse images are then used to fine-tune the fast high-fidelity model. This process helps the model learn realistic details and correct artifacts introduced in earlier stages. By leveraging an off-the-shelf regularization model and a fast high-fidelity model, our approach achieves state-of-the-art performance across various benchmark datasets.
Published: 2024

213. Calibrating Large Language Models with Sample Consistency

Author: Lyu, Qing, Shridhar, Kumar, Malaviya, Chaitanya, Zhang, Li, Elazar, Yanai, Tandon, Niket, Apidianaki, Marianna, Sachan, Mrinmaya, and Callison-Burch, Chris
Subjects: Computer Science - Computation and Language
Abstract: Accurately gauging the confidence level of Large Language Models' (LLMs) predictions is pivotal for their reliable application. However, LLMs are often uncalibrated inherently and elude conventional calibration techniques due to their proprietary nature and massive scale. In this work, we explore the potential of deriving confidence from the distribution of multiple randomly sampled model generations, via three measures of consistency. We perform an extensive evaluation across various open and closed-source models on nine reasoning datasets. Results show that consistency-based calibration methods outperform existing post-hoc approaches. Meanwhile, we find that factors such as intermediate explanations, model scaling, and larger sample sizes enhance calibration, while instruction-tuning makes calibration more difficult. Moreover, confidence scores obtained from consistency have the potential to enhance model performance. Finally, we offer practical guidance on choosing suitable consistency metrics for calibration, tailored to the characteristics of various LMs.
Published: 2024

214. LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens

Author: Ding, Yiran, Zhang, Li Lyna, Zhang, Chengruidong, Xu, Yuanyuan, Shang, Ning, Xu, Jiahang, Yang, Fan, and Yang, Mao
Subjects: Computer Science - Computation and Language
Abstract: A large context window is a desirable feature in large language models (LLMs). However, due to high fine-tuning costs, scarcity of long texts, and catastrophic values introduced by new token positions, current extended context windows are limited to around 128k tokens. This paper introduces LongRoPE that, for the first time, extends the context window of pre-trained LLMs to an impressive 2048k tokens, with at most 1k fine-tuning steps at training lengths within 256k, while maintaining performance at the original short context window. This is achieved by three key innovations: (i) we identify and exploit two forms of non-uniformities in positional interpolation through an efficient search, providing a better initialization for fine-tuning and enabling an 8x extension in non-fine-tuning scenarios; (ii) we introduce a progressive extension strategy that first fine-tunes a 256k length LLM and then conducts a second positional interpolation on the fine-tuned extended LLM to achieve a 2048k context window; (iii) we readjust LongRoPE on 8k length to recover the short context window performance. Extensive experiments on LLaMA2 and Mistral across various tasks demonstrate the effectiveness of our method. Models extended via LongRoPE retain the original architecture with minor modifications to the positional embedding, and can reuse most pre-existing optimizations.
Published: 2024

215. Higher-order and fractional discrete time crystals in Floquet-driven Rydberg atoms

Author: Liu, Bang, Zhang, Li-Hua, Wang, Qi-Feng, Ma, Yu, Han, Tian-Yu, Zhang, Jun, Zhang, Zheng-Yuan, Shao, Shi-Yao, Li, Qing, Chen, Han-Chao, Shi, Bao-Sen, and Ding, Dong-Sheng
Subjects: Condensed Matter - Quantum Gases, Physics - Atomic Physics
Abstract: Higher-order and fractional discrete time crystals (DTCs) are exotic phases of matter where the discrete time translation symmetry is broken into higher-order and non-integer category. Generation of these unique DTCs has been widely studied theoretically in different systems. However, no current experimental methods can probe these higher-order and fractional DTCs in any quantum many-body systems. We demonstrate an experimental approach to observe higher-order and fractional DTCs in Floquet-driven Rydberg atomic gases. We have discovered multiple n-DTCs with integer values of n = 2, 3, and 4, and others ranging up to 14, along with fractional n-DTCs with n values beyond the integers. The system response can transition between adjacent integer DTCs, during which the fractional DTCs are investigated. Study of higher-order and fractional DTCs expands fundamental knowledge of non-equilibrium dynamics and is promising for discovery of more complex temporal symmetries beyond the single discrete time translation symmetry.
Comment: 17 pages, 10 figures, to be published in Nature Communications
Published: 2024

216. Bifurcation of time crystals in driven and dissipative Rydberg atomic gas

Author: Liu, Bang, Zhang, Li-Hua, Liu, Zong-Kai, Zhang, Jun, Zhang, Zheng-Yuan, Shao, Shi-Yao, Li, Qing, Chen, Han-Chao, Ma, Yu, Han, Tian-Yu, Wang, Qi-Feng, Ding, Dong-Sheng, and Shi, Bao-Sen
Subjects: Condensed Matter - Quantum Gases, Physics - Atomic Physics
Abstract: A time crystal is an exotic phase of matter where time-translational symmetry is broken; this phase differs from the spatial symmetry breaking induced in crystals in space. Lots of experiments report the transition from a thermal equilibrium phase to time crystal phase. However, there is no experimental method to probe the bifurcation effect of distinct time crystals in quantum many-body systems. Here, in a driven and dissipative many-body Rydberg atom system, we observe multiple continuous dissipative time crystals and emergence of more complex temporal symmetries beyond the single time crystal phase. Bifurcation of time crystals in strongly interacting Rydberg atoms is observed; the process manifests as a transition from a time crystal state of long temporal order to one of short temporal order, or vice versa. By manipulating the driving field parameters, we observe the time crystal's bistability and a hysteresis loop. These investigations indicate new possibilities for control and manipulation of the temporal symmetries of non-equilibrium systems.
Comment: Added references
Published: 2024

217. Observation of temporal topological boundary states of light in a momentum bandgap

Author: Ren, Yudong, Ye, Kangpeng, Chen, Qiaolu, Chen, Fujia, Zhang, Li, Pan, Yuang, Li, Wenhao, Li, Xinrui, Zhang, Lu, Chen, Hongsheng, and Yang, Yihao
Subjects: Physics - Optics
Abstract: Topological phases have prevailed across diverse disciplines, spanning electronics, photonics, and acoustics. Hitherto, the understanding of these phases has centred on energy (frequency) bandstructures, showcasing topological boundary states at spatial interfaces. Recent strides have uncovered a unique category of bandstructures characterized by gaps in momentum, referred to as momentum bandgaps or k gaps, notably driven by breakthroughs in photonic time crystals. This discovery hints at abundant topological phases defined within momentum bands, alongside a wealth of topological boundary states in the time domain. Here, we report the first experimental observation of k-gap topology in a large-scale optical temporal synthetic lattice, manifesting as temporal topological boundary states. These boundary states are uniquely situated at temporal interfaces between two subsystems with distinct k-gap topology. Counterintuitively, despite the exponential amplification of k-gap modes within both subsystems, these topological boundary states exhibit decay in both temporal directions. Our findings mark a significant pathway for delving into k gaps, temporal topological states, and time-varying physics.
Published: 2024

218. A Neural-network Enhanced Video Coding Framework beyond ECM

Author: Zhao, Yanchen, He, Wenxuan, Jia, Chuanmin, Wang, Qizhe, Li, Junru, Li, Yue, Lin, Chaoyi, Zhang, Kai, Zhang, Li, and Ma, Siwei
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: In this paper, a hybrid video compression framework is proposed that serves as a demonstrative showcase of deep learning-based approaches extending beyond the confines of traditional coding methodologies. The proposed hybrid framework is founded upon the Enhanced Compression Model (ECM), which is a further enhancement of the Versatile Video Coding (VVC) standard. We have augmented the latest ECM reference software with well-designed coding techniques, including block partitioning, deep learning-based loop filter, and the activation of block importance mapping (BIM) which was integrated but previously inactive within ECM, further enhancing coding performance. Compared with ECM-10.0, our method achieves 6.26, 13.33, and 12.33 BD-rate savings for the Y, U, and V components under random access (RA) configuration, respectively.
Published: 2024

219. Translating Images to Road Network: A Sequence-to-Sequence Perspective

Author: Lu, Jiachen, Peng, Renyuan, Cai, Xinyue, Xu, Hang, Wen, Feng, Zhang, Wei, and Zhang, Li
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: The extraction of road network is essential for the generation of high-definition maps since it enables the precise localization of road landmarks and their interconnections. However, generating road network poses a significant challenge due to the conflicting underlying combination of Euclidean (e.g., road landmarks location) and non-Euclidean (e.g., road topological connectivity) structures. Existing methods struggle to merge the two types of data domains effectively, but few of them address it properly. Instead, our work establishes a unified representation of both types of data domain by projecting both Euclidean and non-Euclidean data into an integer series called RoadNet Sequence. Further than modeling an auto-regressive sequence-to-sequence Transformer model to understand RoadNet Sequence, we decouple the dependency of RoadNet Sequence into a mixture of auto-regressive and non-autoregressive dependency. Building on this, our proposed non-autoregressive sequence-to-sequence approach leverages non-autoregressive dependencies while fixing the gap towards auto-regressive dependencies, resulting in success on both efficiency and accuracy. We further identify two main bottlenecks in the current RoadNetTransformer on a non-overfitting split of the dataset: poor landmark detection limited by the BEV Encoder and error propagation to topology reasoning. Therefore, we propose Topology-Inherited Training to inherit better topology knowledge into RoadNetTransformer. Additionally, we collect SD-Maps from open-source map datasets and use this prior information to significantly improve landmark detection and reachability. Extensive experiments on nuScenes dataset demonstrate the superiority of RoadNet Sequence representation and the non-autoregressive approach compared to existing state-of-the-art alternatives.
Comment: V1 is the ICCV 2023 conference version, and V2 is the extended version
Published: 2024

220. Microwave control of collective quantum jump statistics of a dissipative Rydberg gas

Author: Liu, Zong-Kai, Sun, Kong-Hao, Cabot, Albert, Carollo, Federico, Zhang, Jun, Zhang, Zheng-Yuan, Zhang, Li-Hua, Liu, Bang, Han, Tian-Yu, Li, Qing, Ma, Yu, Chen, Han-Chao, Lesanovsky, Igor, Ding, Dong-Sheng, and Shi, Bao-Sen
Subjects: Quantum Physics
Abstract: Quantum many-body systems near phase transitions respond collectively to externally applied perturbations. We explore this phenomenon in a laser-driven dissipative Rydberg gas that is tuned to a bistable regime. Here two metastable phases coexist, which feature a low and high density of Rydberg atoms, respectively. The ensuing collective dynamics, which we monitor in situ, is characterized by stochastic collective jumps between these two macroscopically distinct many-body phases. We show that the statistics of these jumps can be controlled using a dual-tone microwave field. In particular, we find that the distribution of jump times develops peaks corresponding to subharmonics of the relative microwave detuning. Our study demonstrates the control of collective statistical properties of dissipative quantum many-body systems without the necessity of fine-tuning or of ultra cold temperatures. Such robust phenomena may find technological applications in quantum sensing and metrology.
Published: 2024

221. S-Agents: Self-organizing Agents in Open-ended Environments

Author: Chen, Jiaqi, Jiang, Yuxian, Lu, Jiachen, and Zhang, Li
Subjects: Computer Science - Artificial Intelligence, Computer Science - Multiagent Systems
Abstract: Leveraging large language models (LLMs), autonomous agents have significantly improved, gaining the ability to handle a variety of tasks. In open-ended settings, optimizing collaboration for efficiency and effectiveness demands flexible adjustments. Despite this, current research mainly emphasizes fixed, task-oriented workflows and overlooks agent-centric organizational structures. Drawing inspiration from human organizational behavior, we introduce a self-organizing agent system (S-Agents) with a "tree of agents" structure for dynamic workflow, an "hourglass agent architecture" for balancing information priorities, and a "non-obstructive collaboration" method to allow asynchronous task execution among agents. This structure can autonomously coordinate a group of agents, efficiently addressing the challenges of open and dynamic environments without human intervention. Our experiments demonstrate that S-Agents proficiently execute collaborative building tasks and resource collection in the Minecraft environment, validating their effectiveness., Comment: ICLR 2024 Workshop on Large Language Model (LLM) Agents
Published: 2024

222. RA-Rec: An Efficient ID Representation Alignment Framework for LLM-based Recommendation

Author: Yu, Xiaohan, Zhang, Li, Zhao, Xin, Wang, Yue, and Ma, Zhongrui
Subjects: Computer Science - Information Retrieval, Computer Science - Artificial Intelligence
Abstract: Large language models (LLM) have recently emerged as a powerful tool for a variety of natural language processing tasks, bringing a new surge of combining LLM with recommendation systems, termed as LLM-based RS. Current approaches generally fall into two main paradigms, the ID direct usage paradigm and the ID translation paradigm, noting their core weakness stems from lacking recommendation knowledge and uniqueness. To address this limitation, we propose a new paradigm, ID representation, which incorporates pre-trained ID embeddings into LLMs in a complementary manner. In this work, we present RA-Rec, an efficient ID representation alignment framework for LLM-based recommendation, which is compatible with multiple ID-based methods and LLM architectures. Specifically, we treat ID embeddings as soft prompts and design an innovative alignment module and an efficient tuning method with tailored data construction for alignment. Extensive experiments demonstrate RA-Rec substantially outperforms current state-of-the-art methods, achieving up to 3.0% absolute HitRate@100 improvements while utilizing less than 10x training data., Comment: 10 pages
Published: 2024

223. TimeSiam: A Pre-Training Framework for Siamese Time-Series Modeling

Author: Dong, Jiaxiang, Wu, Haixu, Wang, Yuxuan, Qiu, Yunzhong, Zhang, Li, Wang, Jianmin, and Long, Mingsheng
Subjects: Computer Science - Machine Learning
Abstract: Time series pre-training has recently garnered wide attention for its potential to reduce labeling expenses and benefit various downstream tasks. Prior methods are mainly based on pre-training techniques well-acknowledged in vision or language, such as masked modeling and contrastive learning. However, randomly masking time series or calculating series-wise similarity will distort or neglect inherent temporal correlations crucial in time series data. To emphasize temporal correlation modeling, this paper proposes TimeSiam as a simple but effective self-supervised pre-training framework for Time series based on Siamese networks. Concretely, TimeSiam pre-trains Siamese encoders to capture intrinsic temporal correlations between randomly sampled past and current subseries. With a simple data augmentation method (e.g., masking), TimeSiam can benefit from diverse augmented subseries and learn internal time-dependent representations through a past-to-current reconstruction. Moreover, learnable lineage embeddings are also introduced to distinguish temporal distance between sampled series and further foster the learning of diverse temporal correlations. TimeSiam consistently outperforms extensive advanced pre-training baselines, demonstrating superior forecasting and classification capabilities across 13 standard benchmarks in both intra- and cross-domain scenarios.
Published: 2024

224. S-NeRF++: Autonomous Driving Simulation via Neural Reconstruction and Generation

Author: Chen, Yurui, Zhang, Junge, Xie, Ziyang, Li, Wenye, Zhang, Feihu, Lu, Jiachen, and Zhang, Li
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Autonomous driving simulation system plays a crucial role in enhancing self-driving data and simulating complex and rare traffic scenarios, ensuring navigation safety. However, traditional simulation systems, which often heavily rely on manual modeling and 2D image editing, struggled with scaling to extensive scenes and generating realistic simulation data. In this study, we present S-NeRF++, an innovative autonomous driving simulation system based on neural reconstruction. Trained on widely-used self-driving datasets such as nuScenes and Waymo, S-NeRF++ can generate a large number of realistic street scenes and foreground objects with high rendering quality as well as offering considerable flexibility in manipulation and simulation. Specifically, S-NeRF++ is an enhanced neural radiance field for synthesizing large-scale scenes and moving vehicles, with improved scene parameterization and camera pose learning. The system effectively utilizes noisy and sparse LiDAR data to refine training and address depth outliers, ensuring high-quality reconstruction and novel-view rendering. It also provides a diverse foreground asset bank by reconstructing and generating different foreground vehicles to support comprehensive scenario creation. Moreover, we have developed an advanced foreground-background fusion pipeline that skillfully integrates illumination and shadow effects, further enhancing the realism of our simulations. With the high-quality simulated data provided by our S-NeRF++, we found the perception methods enjoy performance boosts on several autonomous driving downstream tasks, further demonstrating our proposed simulator's effectiveness.
Published: 2024

225. LVC-LGMC: Joint Local and Global Motion Compensation for Learned Video Compression

Author: Jiang, Wei, Li, Junru, Zhang, Kai, and Zhang, Li
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
Abstract: Existing learned video compression models employ flow net or deformable convolutional networks (DCN) to estimate motion information. However, the limited receptive fields of flow net and DCN inherently direct their attentiveness towards the local contexts. Global contexts, such as large-scale motions and global correlations among frames are ignored, presenting a significant bottleneck for capturing accurate motions. To address this issue, we propose a joint local and global motion compensation module (LGMC) for learned video coding. More specifically, we adopt flow net for local motion compensation. To capture global context, we employ the cross attention in feature domain for motion compensation. In addition, to avoid the quadratic complexity of vanilla cross attention, we divide the softmax operations in attention into two independent softmax operations, leading to linear complexity. To validate the effectiveness of our proposed LGMC, we integrate it with DCVC-TCM and obtain learned video compression with joint local and global motion compensation (LVC-LGMC). Extensive experiments demonstrate that our LVC-LGMC has significant rate-distortion performance improvements over baseline DCVC-TCM., Comment: Accepted to ICASSP 2024 (lecture presentation). The first attempt to use cross attention for bits-free motion estimation and motion compensation
Published: 2024

226. LaneGraph2Seq: Lane Topology Extraction with Language Model via Vertex-Edge Encoding and Connectivity Enhancement

Author: Peng, Renyuan, Cai, Xinyue, Xu, Hang, Lu, Jiachen, Wen, Feng, Zhang, Wei, and Zhang, Li
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Understanding road structures is crucial for autonomous driving. Intricate road structures are often depicted using lane graphs, which include centerline curves and connections forming a Directed Acyclic Graph (DAG). Accurate extraction of lane graphs relies on precisely estimating vertex and edge information within the DAG. Recent research highlights Transformer-based language models' impressive sequence prediction abilities, making them effective for learning graph representations when graph data are encoded as sequences. However, existing studies focus mainly on modeling vertices explicitly, leaving edge information simply embedded in the network. Consequently, these approaches fall short in the task of lane graph extraction. To address this, we introduce LaneGraph2Seq, a novel approach for lane graph extraction. It leverages a language model with vertex-edge encoding and connectivity enhancement. Our serialization strategy includes a vertex-centric depth-first traversal and a concise edge-based partition sequence. Additionally, we use classifier-free guidance combined with nucleus sampling to improve lane connectivity. We validate our method on prominent datasets, nuScenes and Argoverse 2, showcasing consistent and compelling results. Our LaneGraph2Seq approach demonstrates superior performance compared to state-of-the-art techniques in lane graph extraction., Comment: AAAI 2024
Published: 2024

227. Characteristics of the MTx optical transmitter in Total Ionizing Dose

Author: Gong, D., Hou, S., Juang, B. J., Li, J. -H., Liu, C., Liu, T., Qi, M., Ye, J., Zhang, Lei, Zhang, Li, and Zhu, H. P.
Subjects: Physics - Instrumentation and Detectors, High Energy Physics - Experiment
Abstract: The dual-channel multi-mode 850 nm optical Miniature Transmitter (MTx) is developed for data transmission of the ATLAS LAr calorimeter readout at LHC. The MTx's are exposed to the radiation field of proton-proton collisions, therefore, the tolerance in Total Ionizing Dose (TID) is required. The TID effects in the MTx are investigated with X-rays and Co-60 gamma-rays for the active components of VCSEL diodes and the customized Link-on-Chip laser driver (LOCld) developed in 0.25 um Silicon-on-Sapphire CMOS technology. The irradiation tests were conducted at various dose rates. The responses to TID are observed with degradation of laser currents at initial dose of 10 to 100 Gy(SiO2), and partial recovery with additional TID to a stable output about 90 % of the original. The optical eye diagrams of irradiated samples show slightly increased jittering, and are suitable for the ATLAS requirement of 5 Gbps applications., Comment: 7 pages, 10 figures
Published: 2024

228. Boundary-induced topological chiral extended states in Weyl metamaterial waveguides

Author: Han, Ning, Chen, Fujia, Li, Mingzhu, Zhao, Rui, Li, Wenhao, Chen, Qiaolu, Zhang, Li, Pan, Yuang, Ma, Jingwen, Yu, Zhi-Ming, Chen, Hongsheng, and Yang, Yihao
Subjects: Physics - Optics
Abstract: In topological physics, it is commonly understood that the existence of the boundary states of a topological system is inherently dictated by its bulk. A classic example is that the surface Fermi arc states of a Weyl system are determined by the chiral charges of Weyl points within the bulk. Contrasting with this established perspective, here, we theoretically and experimentally discover a family of topological chiral bulk states extending over photonic Weyl metamaterial waveguides, solely induced by the waveguide boundaries, independently of the waveguide width. Notably, these bulk states showcase discrete momenta and function as wormhole tunnels that connect Fermi-arc surface states living in different two dimensional spaces via a third dimension. Our work offers a magnetic-field-free mechanism for robust chiral bulk transport of waves and highlights the boundaries as a new degree of freedom to regulate bulk Weyl quasiparticles.
Published: 2024

229. Efficient4D: Fast Dynamic 3D Object Generation from a Single-view Video

Author: Pan, Zijie, Yang, Zeyu, Zhu, Xiatian, and Zhang, Li
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Generating dynamic 3D object from a single-view video is challenging due to the lack of 4D labeled data. An intuitive approach is to extend previous image-to-3D pipelines by transferring off-the-shelf image generation models such as score distillation sampling. However, this approach would be slow and expensive to scale due to the need for back-propagating the information-limited supervision signals through a large pretrained model. To address this, we propose an efficient video-to-4D object generation framework called Efficient4D. It generates high-quality spacetime-consistent images under different camera views, and then uses them as labeled data to directly reconstruct the 4D content through a 4D Gaussian splatting model. Importantly, our method can achieve real-time rendering under continuous camera trajectories. To enable robust reconstruction under sparse views, we introduce inconsistency-aware confidence-weighted loss design, along with a lightly weighted score distillation loss. Extensive experiments on both synthetic and real videos show that Efficient4D offers a remarkable 10-fold increase in speed when compared to prior art alternatives while preserving the quality of novel view synthesis. For example, Efficient4D takes only 10 minutes to model a dynamic object, vs 120 minutes by the previous art model Consistent4D., Comment: Technical report
Published: 2024

230. A Survey of Resource-efficient LLM and Multimodal Foundation Models

Author: Xu, Mengwei, Yin, Wangsong, Cai, Dongqi, Yi, Rongjie, Xu, Daliang, Wang, Qipeng, Wu, Bingyang, Zhao, Yihao, Yang, Chen, Wang, Shihe, Zhang, Qiyang, Lu, Zhenyan, Zhang, Li, Wang, Shangguang, Li, Yuanchun, Liu, Yunxin, Jin, Xin, and Liu, Xuanzhe
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: Large foundation models, including large language models (LLMs), vision transformers (ViTs), diffusion, and LLM-based multimodal models, are revolutionizing the entire machine learning lifecycle, from training to deployment. However, the substantial advancements in versatility and performance these models offer come at a significant cost in terms of hardware resources. To support the growth of these large models in a scalable and environmentally sustainable way, there has been a considerable focus on developing resource-efficient strategies. This survey delves into the critical importance of such research, examining both algorithmic and systemic aspects. It offers a comprehensive analysis and valuable insights gleaned from existing literature, encompassing a broad array of topics from cutting-edge model architectures and training/serving algorithms to practical system designs and implementations. The goal of this survey is to provide an overarching understanding of how current approaches are tackling the resource challenges posed by large foundation models and to potentially inspire future breakthroughs in this field.
Published: 2024

231. UPDP: A Unified Progressive Depth Pruner for CNN and Vision Transformer

Author: Liu, Ji, Tang, Dehua, Huang, Yuanxian, Zhang, Li, Zeng, Xiaocheng, Li, Dong, Lu, Mingjie, Peng, Jinzhang, Wang, Yu, Jiang, Fan, Tian, Lu, and Sirasao, Ashish
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Traditional channel-wise pruning methods by reducing network channels struggle to effectively prune efficient CNN models with depth-wise convolutional layers and certain efficient modules, such as popular inverted residual blocks. Prior depth pruning methods by reducing network depths are not suitable for pruning some efficient models due to the existence of some normalization layers. Moreover, finetuning subnet by directly removing activation layers would corrupt the original model weights, hindering the pruned model from achieving high performance. To address these issues, we propose a novel depth pruning method for efficient models. Our approach proposes a novel block pruning strategy and progressive training method for the subnet. Additionally, we extend our pruning method to vision transformer models. Experimental results demonstrate that our method consistently outperforms existing depth pruning methods across various pruning configurations. We obtained three pruned ConvNeXtV1 models with our method applying on ConvNeXtV1, which surpass most SOTA efficient models with comparable inference performance. Our method also achieves state-of-the-art pruning performance on the vision transformer model.
Published: 2024

232. Optimal Transcoding Resolution Prediction for Efficient Per-Title Bitrate Ladder Estimation

Author: Yang, Jinhai, Guo, Mengxi, Zhao, Shijie, Li, Junlin, and Zhang, Li
Subjects: Computer Science - Multimedia, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Electrical Engineering and Systems Science - Image and Video Processing
Abstract: Adaptive video streaming requires efficient bitrate ladder construction to meet heterogeneous network conditions and end-user demands. Per-title optimized encoding typically traverses numerous encoding parameters to search the Pareto-optimal operating points for each video. Recently, researchers have attempted to predict the content-optimized bitrate ladder for pre-encoding overhead reduction. However, existing methods commonly estimate the encoding parameters on the Pareto front and still require subsequent pre-encodings. In this paper, we propose to directly predict the optimal transcoding resolution at each preset bitrate for efficient bitrate ladder construction. We adopt a Temporal Attentive Gated Recurrent Network to capture spatial-temporal features and predict transcoding resolutions as a multi-task classification problem. We demonstrate that content-optimized bitrate ladders can thus be efficiently determined without any pre-encoding. Our method well approximates the ground-truth bitrate-resolution pairs with a slight Bjøntegaard Delta rate loss of 1.21% and significantly outperforms the state-of-the-art fixed ladder., Comment: Accepted by the 2024 Data Compression Conference (DCC) for presentation as a poster. This is the full paper
Published: 2024

233. DGDNN: Decoupled Graph Diffusion Neural Network for Stock Movement Prediction

Author: You, Zinuo, Shi, Zijian, Bo, Hongbo, Cartlidge, John, Zhang, Li, and Ge, Yan
Subjects: Computer Science - Machine Learning, Computer Science - Neural and Evolutionary Computing
Abstract: Forecasting future stock trends remains challenging for academia and industry due to stochastic inter-stock dynamics and hierarchical intra-stock dynamics influencing stock prices. In recent years, graph neural networks have achieved remarkable performance in this problem by formulating multiple stocks as graph-structured data. However, most of these approaches rely on artificially defined factors to construct static stock graphs, which fail to capture the intrinsic interdependencies between stocks that rapidly evolve. In addition, these methods often ignore the hierarchical features of the stocks and lose distinctive information within. In this work, we propose a novel graph learning approach implemented without expert knowledge to address these issues. First, our approach automatically constructs dynamic stock graphs by entropy-driven edge generation from a signal processing perspective. Then, we further learn task-optimal dependencies between stocks via a generalized graph diffusion process on constructed stock graphs. Last, a decoupled representation learning scheme is adopted to capture distinctive hierarchical intra-stock features. Experimental results demonstrate substantial improvements over state-of-the-art baselines on real-world datasets. Moreover, the ablation study and sensitivity study further illustrate the effectiveness of the proposed method in modeling the time-evolving inter-stock and intra-stock dynamics., Comment: 12 pages, 5 figures, author manuscript accepted for ICAART 2024 (International Conference on Agents and Artificial Intelligence)
Published: 2024

234. FGENet: Fine-Grained Extraction Network for Congested Crowd Counting

Author: Ma, Hao-Yuan, Zhang, Li, and Wei, Xiang-Yi
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Crowd counting has gained significant popularity due to its practical applications. However, mainstream counting methods ignore precise individual localization and suffer from annotation noise because of counting from estimating density maps. Additionally, they also struggle with high-density images. To address these issues, we propose an end-to-end model called Fine-Grained Extraction Network (FGENet). Different from methods estimating density maps, FGENet directly learns the original coordinate points that represent the precise localization of individuals. This study designs a fusion module, named Fine-Grained Feature Pyramid (FGFP), that is used to fuse feature maps extracted by the backbone of FGENet. The fused features are then passed to both regression and classification heads, where the former provides predicted point coordinates for a given image, and the latter determines the confidence level for each predicted point being an individual. At the end, FGENet establishes correspondences between prediction points and ground truth points by employing the Hungarian algorithm. For training FGENet, we design a robust loss function, named Three-Task Combination (TTC), to mitigate the impact of annotation noise. Extensive experiments are conducted on four widely used crowd counting datasets. Experimental results demonstrate the effectiveness of FGENet. Notably, our method achieves a remarkable improvement of 3.14 points in Mean Absolute Error (MAE) on the ShanghaiTech Part A dataset, showcasing its superiority over the existing state-of-the-art methods. Even more impressively, FGENet surpasses previous benchmarks on the UCF_CC_50 dataset with an astounding enhancement of 30.16 points in MAE., Comment: Accepted by 30th International Conference on MultiMedia Modeling
Published: 2024

235. Inhibition of endothelial histone deacetylase 2 shifts endothelial-mesenchymal transitions in cerebral arteriovenous malformation models.

Author: Zhao, Yan, Wu, Xiuju, Yang, Yang, Zhang, Li, Cai, Xinjiang, Chen, Sydney, Vera, Abigail, Ji, Jaden, Boström, Kristina, and Yao, Yucheng
Subjects: Endothelial cells, Mouse models, Vascular biology, Animals, Humans, Mice, Disease Models, Animal, Endothelial Cells, Histone Deacetylase 2, Intracranial Arteriovenous Malformations, Male, Female
Abstract: Cerebral arteriovenous malformations (AVMs) are the most common vascular malformations worldwide and the leading cause of hemorrhagic strokes that may result in crippling neurological deficits. Here, using recently generated mouse models, we uncovered that cerebral endothelial cells (ECs) acquired mesenchymal markers and caused vascular malformations. Interestingly, we found that limiting endothelial histone deacetylase 2 (HDAC2) prevented cerebral ECs from undergoing mesenchymal differentiation and reduced cerebral AVMs. We found that endothelial expression of HDAC2 and enhancer of zeste homolog 1 (EZH1) was altered in cerebral AVMs. These alterations changed the abundance of H4K8ac and H3K27me in the genes regulating endothelial and mesenchymal differentiation, which caused the ECs to acquire mesenchymal characteristics and form AVMs. This investigation demonstrated that the induction of HDAC2 altered specific histone modifications, which resulted in mesenchymal characteristics in the ECs and cerebral AVMs. The results provide insight into the epigenetic impact on AVMs.
Published: 2024

236. Mediator kinase inhibition reverses castration resistance of advanced prostate cancer

Author: Li, Jing, Hilimire, Thomas A, Yueying, Liu, Wang, Lili, Liang, Jiaxin, Győrffy, Balázs, Sikirzhytski, Vitali, Ji, Hao, Zhang, Li, Cheng, Chen, Ding, Xiaokai, Kerr, Kendall R, Dowling, Charles E, Chumanevich, Alexander A, Mack, Zachary T, Schools, Gary P, Lim, Chang-uk, Ellis, Leigh, Zi, Xiaolin, Porter, Donald C, Broude, Eugenia V, McInnes, Campbell, Wilding, George, Lilly, Michael B, Roninson, Igor B, and Chen, Mengqian
Subjects: Biomedical and Clinical Sciences, Clinical Sciences, Oncology and Carcinogenesis, Urologic Diseases, Genetics, Prostate Cancer, Cancer, Development of treatments and therapeutic interventions, 5.1 Pharmaceuticals, Male, Humans, Animals, Prostatic Neoplasms, Castration-Resistant, Mice, Cyclin-Dependent Kinases, Cyclin-Dependent Kinase 8, Cell Line, Tumor, Xenograft Model Antitumor Assays, Protein Kinase Inhibitors, Gene Expression Regulation, Neoplastic, Tumor Microenvironment, Oncology, Therapeutics, Transcription, Medical and Health Sciences, Immunology, Biological sciences, Biomedical and clinical sciences, Health sciences
Abstract: Mediator kinases CDK19 and CDK8, pleiotropic regulators of transcriptional reprogramming, are differentially regulated by androgen signaling, but both kinases are upregulated in castration-resistant prostate cancer (CRPC). Genetic or pharmacological inhibition of CDK8 and CDK19 reverses the castration-resistant phenotype and restores the sensitivity of CRPC xenografts to androgen deprivation in vivo. Prolonged CDK8/19 inhibitor treatment combined with castration not only suppressed the growth of CRPC xenografts but also induced tumor regression and cures. Transcriptomic analysis revealed that Mediator kinase inhibition amplified and modulated the effects of castration on gene expression, disrupting CRPC adaptation to androgen deprivation. Mediator kinase inactivation in tumor cells also affected stromal gene expression, indicating that Mediator kinase activity in CRPC molded the tumor microenvironment. The combination of castration and Mediator kinase inhibition downregulated the MYC pathway, and Mediator kinase inhibition suppressed a MYC-driven CRPC tumor model even without castration. CDK8/19 inhibitors showed efficacy in patient-derived xenograft models of CRPC, and a gene signature of Mediator kinase activity correlated with tumor progression and overall survival in clinical samples of metastatic CRPC. These results indicate that Mediator kinases mediated androgen-independent in vivo growth of CRPC, supporting the development of CDK8/19 inhibitors for the treatment of this presently incurable disease.
Published: 2024

237. Dysregulation of CD4+ and CD8+ resident memory T, myeloid, and stromal cells in steroid-experienced, checkpoint inhibitor colitis.

Author: He, Jun, Kim, Yang-Joon, Mennillo, Elvira, Rusu, Iulia, Bain, Jared, Rao, Arjun, Andersen, Christopher, Law, Karen, Yang, Hai, Tsui, Jessica, Shen, Alan, Davidson, Brittany, Kushnoor, Divyashree, Shi, Yimin, Fan, Frances, Cheung, Alexander, Zhang, Li, Fong, Lawrence, Combes, Alexis, Pisco, Angela, Oh, David, and Kattah, Michael
Subjects: Colitis, Immune Checkpoint Inhibitor, Immune related adverse event - irAE, Memory, Humans, CD8-Positive T-Lymphocytes, Endothelial Cells, Tumor Necrosis Factor Inhibitors, Colitis, CD4-Positive T-Lymphocytes, Steroids, Stromal Cells
Abstract: BACKGROUND: Colitis caused by checkpoint inhibitors (CPI) is frequent and is treated with empiric steroids, but CPI colitis mechanisms in steroid-experienced or refractory disease are unclear. METHODS: Using colon biopsies and blood from predominantly steroid-experienced CPI colitis patients, we performed multiplexed single-cell transcriptomics and proteomics to nominate contributing populations. RESULTS: CPI colitis biopsies showed enrichment of CD4+ resident memory (RM) T cells in addition to CD8+ RM and cytotoxic CD8+ T cells. Matching T cell receptor (TCR) clonotypes suggested that both RMs are progenitors that yield cytotoxic effectors. Activated, CD38+ HLA-DR+ CD4+ RM and cytotoxic CD8+ T cells were enriched in steroid-experienced and a validation data set of steroid-naïve CPI colitis, underscoring their pathogenic potential across steroid exposure. Distinct from ulcerative colitis, CPI colitis exhibited perturbed stromal metabolism (NAD+, tryptophan) impacting epithelial survival and inflammation. Endothelial cells in CPI colitis after anti-TNF and anti-cytotoxic T-lymphocyte-associated antigen 4 (anti-CTLA-4) upregulated the integrin α4β7 ligand molecular vascular addressin cell adhesion molecule 1 (MAdCAM-1), which may preferentially respond to vedolizumab (anti-α4β7). CONCLUSIONS: These findings nominate CD4+ RM and MAdCAM-1+ endothelial cells for targeting in specific subsets of CPI colitis patients.
Published: 2024

238. Immune Modulation with RANKL Blockade through Denosumab Treatment in Patients with Cancer.

Author: Chang, Hewitt, Marquez, Jaqueline, Chen, Brandon, Kim, Daniel, Cheng, Michael, Liu, Eric, Yang, Hai, Zhang, Li, Sinha, Meenal, Cheung, Alexander, Kwek, Serena, Chow, Eric, Bridge, Mark, Aggarwal, Rahul, Friedlander, Terence, Small, Eric, Anderson, Mark, and Fong, Lawrence
Subjects: Humans, Male, Bone Neoplasms, Denosumab, Kidney Neoplasms, RANK Ligand, Prostatic Neoplasms
Abstract: Denosumab is a fully human mAb that binds receptor activator of NFκB ligand (RANKL). It is routinely administered to patients with cancer to reduce the incidence of new bone metastasis. RANK-RANKL interactions regulate bone turnover by controlling osteoclast recruitment, development, and activity. However, these interactions also can regulate immune cells including dendritic cells and medullary thymic epithelial cells. Inhibition of the latter results in reduced thymic negative selection of T cells and could enhance the generation of tumor-specific T cells. We examined whether administering denosumab could modulate circulating immune cells in patients with cancer. Blood was collected from 23 patients with prostate cancer and 3 patients with renal cell carcinoma, all of whom had advanced disease and were receiving denosumab, prior to and during denosumab treatment. Using high-dimensional mass cytometry, we found that denosumab treatment by itself induced modest effects on circulating immune cell frequency and activation. We also found minimal changes in the circulating T-cell repertoire and the frequency of new thymic emigrants with denosumab treatment. However, when we stratified patients by whether they were receiving chemotherapy and/or steroids, patients receiving these concomitant treatments showed significantly greater immune modulation, including an increase in the frequency of natural killer cells early and classical monocytes later. We also saw broad induction of CTLA-4 and TIM3 expression in circulating lymphocytes and some monocyte populations. These findings suggest that denosumab treatment by itself has modest immunomodulatory effects, but when combined with conventional cancer treatments, can lead to the induction of immunologic checkpoints. See related Spotlight by Nasrollahi and Davar, p. 383.
Published: 2024

239. Abstract P114: Micro RNAs Associated With Clinical Classifiers of Gender, Race and Ethnicity in the Diabetes Prevention Program

Author: Stroebel, Benjamin M, Zhang, Li, Aouizerat, Bradley E, Lewis, Kimberly A, Longoria, Kayla D, Gadgil, Meghana, and Flowers, Elena
Subjects: Epidemiology, Biomedical and Clinical Sciences, Health Sciences, Cardiovascular, Genetics, Diabetes, Obesity, Biotechnology, Clinical Research, Prevention, Nutrition, Metabolic and endocrine, Good Health and Well Being, Cardiorespiratory Medicine and Haematology, Clinical Sciences, Public Health and Health Services, Cardiovascular System & Hematology, Cardiovascular medicine and haematology, Clinical sciences, Sports science and exercise
Abstract: Introduction: Metabolic syndrome (MetS) is a prominent risk factor for both cardiovascular disease (CVD) and type 2 diabetes (T2D). MicroRNAs (miRs) are small noncoding RNA molecules that target messenger RNAs to alter gene expression. Circulating miRs have been studied as potential clinically meaningful biomarkers of risk for MetS as they are readily measured from blood. Though associations among MetS components and social constructs of race, ethnicity, and gender have previously been established, grouping according to these constructs alone may conflate genetic ancestry with the environmental/behavioral effects of being characterized as a particular gender, race, or ethnicity. The purpose of this study was to identify differences in circulating miRs associated with demographic factors and MetS components to better illuminate the nuanced health impacts of race, ethnicity and gender. Methods: This was a secondary analysis of a subset of participants (N=1000) from the Diabetes Prevention Program (DPP). A custom Fireplex assay was used to quantify miRs from banked plasma collected at baseline. Correlations between miRs and metabolic syndrome components were assessed by Pearson’s correlation coefficient. Multivariable linear models adjusted for age and weight were used to analyze associations between gender, race, ethnicity, and miR expression. The Benjamini-Hochberg false discovery rate (FDR) method was applied. Results: The sample was 68% female, 19% Black, 15% Hispanic, and mean age was 52 ± 10 years. After adjusting for multiple comparisons (FDR
Published: 2024

240. Observation of vortex-string chiral modes in metamaterials.

Author: Ma, Jingwen, Jia, Ding, Zhang, Li, Guan, Yi-Jun, Ge, Yong, Sun, Hong-Xiang, Yuan, Shou-Qi, Chen, Hongsheng, Yang, Yihao, and Zhang, Xiang
Abstract: As hypothetical topological defects in the geometry of spacetime, vortex strings could have played many roles in cosmology, and their distinct features can provide observable clues about the early universe's evolution. A key feature of vortex strings is that they can interact with Weyl fermionic modes and support massless chiral-anomaly states along strings. To date, despite many attempts to detect vortex strings in astrophysics or to emulate them in artificially created systems, observation of these vortex-string chiral modes remains experimentally elusive. Here we report experimental observations of vortex-string chiral modes using a metamaterial system. This is implemented by inhomogeneous perturbation of Yang-monopole phononic metamaterials. The measured linear dispersion and modal profiles confirm the existence of topological modes bound to and propagating along the string with the chiral anomaly. Our work provides a platform for studying diverse cosmic topological defects in astrophysics and offers applications as topological fibres in communication techniques.
Published: 2024

241. CDK1 inhibition reduces osteogenesis in endothelial cells in vascular calcification.

Author: Zhao, Yan, Yang, Yang, Wu, Xiuju, Zhang, Li, Ji, Jaden, Chen, Sydney, Vera, Abigail, Boström, Kristina, Yao, Yucheng, and Cai, Xinjiang
Subjects: Cardiovascular disease, Cell biology, Vascular biology, Animals, Mice, Calcification, Physiologic, Cell Differentiation, Endothelial Cells, Osteogenesis, Vascular Calcification
Abstract: Vascular calcification is a severe complication of cardiovascular diseases. Previous studies demonstrated that endothelial lineage cells transitioned into osteoblast-like cells and contributed to vascular calcification. Here, we found that inhibition of cyclin-dependent kinase (CDK) prevented endothelial lineage cells from transitioning to osteoblast-like cells and reduced vascular calcification. We identified a robust induction of CDK1 in endothelial cells (ECs) in calcified arteries and showed that EC-specific gene deletion of CDK1 decreased the calcification. We found that limiting CDK1 induced E-twenty-six specific sequence variant 2 (ETV2), which was responsible for blocking endothelial lineage cells from undergoing osteoblast differentiation. We also found that inhibition of CDK1 reduced vascular calcification in a diabetic mouse model. Together, the results highlight the importance of CDK1 suppression and suggest CDK1 inhibition as a potential option for treating vascular calcification.
Published: 2024

242. Evaluation of Culture Conducive to Academic Success by Gender at a Comprehensive Cancer Center.

Author: Westring, Alyssa, Velazquez, Ana, Bank, Erin, Bergsland, Emily, Boreta, Lauren, Conroy, Patricia, Daras, Mariza, Sibley, Amanda, Hsu, Gerald, Paris, Pamela, Piawah, Sorbarikor, Sinha, Sumi, Tsang, Mazie, Venook, Alan, Wong, Melisa, Yom, Sue, Van Loon, Katherine, Hermiston, Michelle, Sosa, Julie Ann, Zhang, Li, and Keenan, Bridget
Subjects: culture, gender equity, health workforce, oncologists, Humans, Female, Male, Sexism, Academic Success, Pandemics, Faculty, Medical, COVID-19, Neoplasms
Abstract: INTRODUCTION: The primary objective of this study was to determine whether workplace culture in academic oncology differed by gender during the COVID-19 pandemic. MATERIALS AND METHODS: We used the Culture Conducive to Women's Academic Success (CCWAS), a validated survey tool, to investigate the academic climate at an NCI-designated Cancer Center. We adapted the CCWAS to be applicable to people of all genders. The full membership of the Cancer Center was surveyed (total faculty = 429). The questions in each of 4 CCWAS domains (equal access to opportunities, work-life balance, freedom from gender bias, and leadership support) were scored using a 5-point Likert scale. Median score and interquartile ranges for each domain were calculated. RESULTS: A total of 168 respondents (men = 58, women = 106, n = 4 not disclosed) submitted survey responses. The response rate was 39% overall and 70% among women faculty. We found significant differences in perceptions of workplace culture by gender, both in responses to individual questions and in the overall score in the following domains: equal access to opportunities, work-life balance, and leader support, and in the total score for the CCWAS. CONCLUSIONS: Our survey is the first of its kind completed during the COVID-19 pandemic at an NCI-designated Cancer Center, in which myriad factors contributed to burnout and workplace challenges. These results point to specific issues that detract from the success of women pursuing careers in academic oncology. Identifying these issues can be used to design and implement solutions to improve workforce culture, mitigate gender bias, and retain faculty.
Published: 2024

243. CD46-targeted theranostics for Positron Emission Tomography and 225Ac-Radiopharmaceutical Therapy of Multiple Myeloma

Author: Wadhwa, Anju, Wang, Sinan, Patiño-Escobar, Bonell, Bidkar, Anil P, Bobba, Kondapa Naidu, Chan, Emily, Meher, Niranjan, Bidlingmaier, Scott, Su, Yang, Dhrona, Suchi, Geng, Huimin, Sarin, Vishesh, VanBrocklin, Henry F, Wilson, David M, He, Jiang, Zhang, Li, Steri, Veronica, Wong, Sandy W, Martin, Thomas G, Seo, Youngho, Liu, Bin, Wiita, Arun P, and Flavell, Robert R
Subjects: Biomedical and Clinical Sciences, Oncology and Carcinogenesis, Hematology, Biotechnology, Rare Diseases, Cancer, Biomedical Imaging, Bioengineering, Orphan Drug, 5.1 Pharmaceuticals, Development of treatments and therapeutic interventions, Male, Humans, Animals, Mice, Multiple Myeloma, Precision Medicine, Actinium, Radioisotopes, Radiopharmaceuticals, Zirconium, Cell Line, Tumor, Positron Emission Tomography Computed Tomography, Antibodies, Membrane Cofactor Protein, Oncology & Carcinogenesis, Clinical sciences, Oncology and carcinogenesis
Abstract: Purpose: Multiple myeloma is a plasma cell malignancy with an unmet clinical need for improved imaging methods and therapeutics. Recently, we identified CD46 as an overexpressed therapeutic target in multiple myeloma and developed the antibody YS5, which targets a cancer-specific epitope on this protein. We further developed the CD46-targeting PET probe [89Zr]Zr-DFO-YS5 for imaging and [225Ac]Ac-DOTA-YS5 for radiopharmaceutical therapy of prostate cancer. These prior studies suggested the feasibility of the CD46 antigen as a theranostic target in multiple myeloma. Herein, we validate [89Zr]Zr-DFO-YS5 for immunoPET imaging and [225Ac]Ac-DOTA-YS5 for radiopharmaceutical therapy of multiple myeloma in murine models. Experimental design: In vitro saturation binding was performed using the CD46 expressing MM.1S multiple myeloma cell line. ImmunoPET imaging using [89Zr]Zr-DFO-YS5 was performed in immunodeficient (NSG) mice bearing subcutaneous and systemic multiple myeloma xenografts. For radioligand therapy, [225Ac]Ac-DOTA-YS5 was prepared, and both dose escalation and fractionated dose treatment studies were performed in mice bearing MM1.S-Luc systemic xenografts. Tumor burden was analyzed using BLI, and body weight and overall survival were recorded to assess antitumor effect and toxicity. Results: [89Zr]Zr-DFO-YS5 demonstrated high affinity for CD46 expressing MM.1S multiple myeloma cells (Kd = 16.3 nmol/L). In vitro assays in multiple myeloma cell lines demonstrated high binding, and bioinformatics analysis of human multiple myeloma samples revealed high CD46 expression. [89Zr]Zr-DFO-YS5 PET/CT specifically detected multiple myeloma lesions in a variety of models, with low uptake in controls, including CD46 knockout (KO) mice or multiple myeloma mice using a nontargeted antibody. In the MM.1S systemic model, localization of uptake on PET imaging correlated well with the luciferase expression from tumor cells. A treatment study using [225Ac]Ac-DOTA-YS5 in the MM.1S systemic model demonstrated a clear tumor volume and survival benefit in the treated groups. Conclusions: Our study showed that the CD46-targeted probe [89Zr]Zr-DFO-YS5 can successfully image CD46-expressing multiple myeloma xenografts in murine models, and [225Ac]Ac-DOTA-YS5 can effectively inhibit the growth of multiple myeloma. These results demonstrate that CD46 is a promising theranostic target for multiple myeloma, with the potential for clinical translation.
Published: 2024

244. Explicit construction of quasi-periodic discrete Schrödinger operators with Cantor spectrum

Author: Hou, Xuanji and Zhang, Li
Subjects: Mathematics - Dynamical Systems
Abstract: We construct one-dimensional difference Schrödinger operators with a class of Gevrey potentials such that Cantor spectrum occurs, together with estimates of the open spectral gaps. The proof is based on KAM and the Moser-Pöschel argument. Comment: The loopholes in the proof of Chapter 5 and the errors in the text of Chapter 1 have been revised
Published: 2023

245. Fast kV-Switching and Dual-Layer Flat-Panel Detector Enabled Cone-Beam CT Joint Spectral Imaging

Author: Zhou, Hao, Zhang, Li, Wang, Zhilei, and Gao, Hewei
Subjects: Physics - Medical Physics
Abstract: Purpose: Fast kV-switching (FKS) and dual-layer flat-panel detector (DL-FPD) technologies have been actively studied as promising dual-energy solutions for FPD-based cone-beam computed tomography (CBCT). However, CBCT spectral imaging is known to face challenges in obtaining accurate and robust material discrimination performance due to the limited energy separation. To further improve CBCT spectral imaging capability, this work aims to promote a source-detector joint spectral imaging solution which takes advantage of both FKS and DL-FPD, and to conduct a feasibility study on the first tabletop CBCT system developed with the joint spectral imaging capability. Methods: In this work, the first FKS and DL-FPD jointly enabled multi-energy tabletop CBCT system has been developed in our laboratory. To evaluate its spectral imaging performance, a set of physics experiments is conducted, where the multi-energy and head phantoms are scanned using the 80/105/130kVp switching pairs and projection data are collected using a prototype DL-FPD. To compensate for the slight angular mismatch between the low- and high-energy projections in FKS, a dual-domain projection completion scheme is implemented. Afterwards, material decomposition is carried out by using the maximum-likelihood method, followed by reconstruction of basis material and virtual monochromatic images. Results: The physics experiments confirmed the feasibility and superiority of the joint spectral imaging, whose CNR for the multi-energy phantom was boosted by an average of 21.9%, 20.4% for water and 32.8%, 62.8% for iodine when compared with that of the FKS and DL-FPD in fan-beam and cone-beam experiments, respectively. Conclusions: A feasibility study of the joint spectral imaging for CBCT by utilizing both the FKS and DL-FPD was conducted, with the first tabletop CBCT system having such a capability being developed.
Published: 2023

246. Harnessing Diffusion Models for Visual Perception with Meta Prompts

Author: Wan, Qiang, Huang, Zilong, Kang, Bingyi, Feng, Jiashi, and Zhang, Li
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: The issue of generative pretraining for vision models has persisted as a long-standing conundrum. At present, the text-to-image (T2I) diffusion model demonstrates remarkable proficiency in generating high-definition images matching textual inputs, a feat made possible through its pre-training on large-scale image-text pairs. This leads to a natural inquiry: can diffusion models be utilized to tackle visual perception tasks? In this paper, we propose a simple yet effective scheme to harness a diffusion model for visual perception tasks. Our key insight is to introduce learnable embeddings (meta prompts) to the pre-trained diffusion models to extract proper features for perception. The effect of meta prompts is two-fold. First, as a direct replacement of the text embeddings in the T2I models, they can activate task-relevant features during feature extraction. Second, they are used to re-arrange the extracted features to ensure that the model focuses on the most pertinent features for the task at hand. Additionally, we design a recurrent refinement training strategy that fully leverages the property of diffusion models, thereby yielding stronger visual features. Extensive experiments across various benchmarks validate the effectiveness of our approach. Our approach achieves new performance records in depth estimation tasks on NYU depth V2 and KITTI, and in the semantic segmentation task on CityScapes. Concurrently, the proposed method attains results comparable to the current state-of-the-art in semantic segmentation on ADE20K and pose estimation on COCO datasets, further exemplifying its robustness and versatility.
Published: 2023

247. Layer-dependent evolution of electronic structures and correlations in rhombohedral multilayer graphene

Author: Zhang, Yang, Zhou, Yue-Ying, Zhang, Shihao, Cai, Hao, Tong, Ling-Hui, Tian, Yuan, Chen, Tongtong, Tian, Qiwei, Zhang, Chen, Wang, Yiliu, Zou, Xuming, Liu, Xingqiang, Hu, Yuanyuan, Ren, Ya-Ning, Zhang, Li, Zhang, Lijie, Wang, Wen-Xiao, He, Lin, Liao, Lei, Qin, Zhihui, and Yin, Long-Jing
Subjects: Condensed Matter - Mesoscale and Nanoscale Physics, Condensed Matter - Strongly Correlated Electrons
Abstract: The recent discovery of superconductivity and magnetism in trilayer rhombohedral graphene (RG) establishes an ideal, untwisted platform to study strongly correlated electronic phenomena. However, the correlated effects in multilayer RG have received limited attention, and, particularly, the evolution of the correlations with increasing layer number remains an unresolved question. Here, we show the observation of layer-dependent electronic structures and correlations, surprisingly at liquid-nitrogen temperature, in RG multilayers from 3 to 9 layers by using scanning tunneling microscopy and spectroscopy. We explicitly determine layer-enhanced low-energy flat bands and interlayer coupling strengths. The former directly demonstrates the further flattening of low-energy bands in thicker RG, and the latter indicates the presence of varying interlayer interactions in RG multilayers. Moreover, we find significant splittings of the flat bands, ranging from ~50 to 80 meV, at 77 K when they are partially filled, indicating the emergence of interaction-induced strongly correlated states. Particularly, the strength of the correlated states is notably enhanced in thicker RG and reaches its maximum in six-layer RG, directly validating theoretical predictions and establishing abundant new candidates for strongly correlated systems. Our results provide valuable insights into the layer dependence of the electronic properties in RG and demonstrate it as a suitable system for investigating robust and highly accessible correlated phases. Comment: 20 pages, 4 figures
Published: 2023

248. Fewer is More: Boosting LLM Reasoning with Reinforced Context Pruning

Author: Huang, Xijie, Zhang, Li Lyna, Cheng, Kwang-Ting, Yang, Fan, and Yang, Mao
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Large Language Models (LLMs) have shown impressive capabilities, yet they still struggle with math reasoning. In this work, we propose CoT-Influx, a novel approach that pushes the boundary of few-shot Chain-of-Thoughts (CoT) learning to improve LLM mathematical reasoning. Motivated by the observation that adding more concise CoT examples in the prompt can improve LLM reasoning performance, CoT-Influx employs a coarse-to-fine pruner to maximize the input of effective and concise CoT examples. The pruner first selects as many crucial CoT examples as possible and then prunes unimportant tokens to fit the context window. A math reasoning dataset with diverse difficulty levels and reasoning steps is used to train the pruner, along with a math-specialized reinforcement learning approach. As a result, by enabling more CoT examples with double the context window size in tokens, CoT-Influx significantly outperforms various prompting baselines across various LLMs (LLaMA2-7B, 13B, 70B) and 5 math datasets, achieving up to 4.55% absolute improvements. Remarkably, without any fine-tuning, LLaMA2-70B with CoT-Influx surpasses GPT-3.5 and a wide range of larger LLMs (PaLM, Minerva 540B, etc.) on the GSM8K. CoT-Influx serves as a plug-and-play module for LLMs and is compatible with most existing reasoning prompting techniques, such as self-consistency and self-verification.
Published: 2023

249. Second-harmonic generation with a 440,000% W^-1 conversion efficiency in a lithium niobate microcavity without periodic poling

Author: Wu, Xiao, Hao, Zhenzhong, Zhang, Li, Jia, Di, Ma, Rui, Bo, Fang, Gao, Feng, Zhang, Guoquan, and Xu, Jingjun
Subjects: Physics - Optics, Physics - Applied Physics
Abstract: Thin-film lithium niobate (TFLN) enables extremely high-efficiency second-order nonlinear optical effects due to its large nonlinear coefficient d33 and strong optical field localization. Here, we first designed and fabricated a pulley-waveguide-coupled microring resonator with an intrinsic quality factor above 9.4 × 10^5 on the reverse-polarized double-layer X-cut TFLN. In such a TFLN resonator without fine domain structures, second harmonic generation with an absolute (normalized) conversion efficiency of 30% (440,000% W^-1), comparable to that in periodically poled lithium niobate (PPLN) microring resonators, was realized with a sub-microwatt continuous pump. This work reduces the dependence of high-efficiency nonlinear frequency conversion on PPLN microcavities that are difficult to prepare.
Published: 2023

250. Joint Training or Not: An Exploration of Pre-trained Speech Models in Audio-Visual Speaker Diarization

Author: Zhao, Huan, Zhang, Li, Li, Yue, Wang, Yannan, Wang, Hongji, Rao, Wei, Wang, Qing, and Xie, Lei
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Sound
Abstract: The scarcity of labeled audio-visual datasets is a constraint for training superior audio-visual speaker diarization systems. To improve the performance of audio-visual speaker diarization, we leverage pre-trained supervised and self-supervised speech models. Specifically, we adopt supervised (ResNet and ECAPA-TDNN) and self-supervised pre-trained models (WavLM and HuBERT) as the speaker and audio embedding extractors in an end-to-end audio-visual speaker diarization (AVSD) system. Then we explore the effectiveness of different frameworks, including Transformer, Conformer, and cross-attention mechanism, in the audio-visual decoder. To mitigate the degradation of performance caused by separate training, we jointly train the audio encoder, speaker encoder, and audio-visual decoder in the AVSD system. Experiments on the MISP dataset demonstrate that the proposed method achieves superior performance and obtained third place in the MISP Challenge 2022.
Published: 2023
