Author: "Li, Jialu" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Li, Jialu"' showing total 1,340 results


1. DreamRunner: Fine-Grained Storytelling Video Generation with Retrieval-Augmented Motion Adaptation

Author: Wang, Zun, Li, Jialu, Lin, Han, Yoon, Jaehong, and Bansal, Mohit
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: Storytelling video generation (SVG) has recently emerged as a task to create long, multi-motion, multi-scene videos that consistently represent the story described in the input text script. SVG holds great potential for diverse content creation in media and entertainment; however, it also presents significant challenges: (1) objects must exhibit a range of fine-grained, complex motions, (2) multiple objects need to appear consistently across scenes, and (3) subjects may require multiple motions with seamless transitions within a single scene. To address these challenges, we propose DreamRunner, a novel story-to-video generation method: First, we structure the input script using a large language model (LLM) to facilitate both coarse-grained scene planning as well as fine-grained object-level layout and motion planning. Next, DreamRunner presents retrieval-augmented test-time adaptation to capture target motion priors for objects in each scene, supporting diverse motion customization based on retrieved videos, thus facilitating the generation of new videos with complex, scripted motions. Lastly, we propose a novel spatial-temporal region-based 3D attention and prior injection module SR3AI for fine-grained object-motion binding and frame-by-frame semantic control. We compare DreamRunner with various SVG baselines, demonstrating state-of-the-art performance in character consistency, text alignment, and smooth transitions. Additionally, DreamRunner exhibits strong fine-grained condition-following ability in compositional text-to-video generation, significantly outperforming baselines on T2V-ComBench. Finally, we validate DreamRunner's robust ability to generate multi-object interactions with qualitative examples., Comment: Project website: https://dreamrunner-story2video.github.io/
Published: 2024

2. Leapfrog Latent Consistency Model (LLCM) for Medical Images Generation

Author: Polamreddy, Lakshmikar R., Roy, Kalyan, Yueh, Sheng-Han, Mahato, Deepshikha, Kuppili, Shilpa, Li, Jialu, and Zhang, Youshan
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: The scarcity of accessible medical image data poses a significant obstacle in effectively training deep learning models for medical diagnosis, as hospitals refrain from sharing their data due to privacy concerns. In response, we gathered a diverse dataset named MedImgs, which comprises over 250,127 images spanning 61 disease types and 159 classes of both humans and animals from open-source repositories. We propose a Leapfrog Latent Consistency Model (LLCM) that is distilled from a retrained diffusion model based on the collected MedImgs dataset, which enables our model to generate real-time high-resolution images. We formulate the reverse diffusion process as a probability flow ordinary differential equation (PF-ODE) and solve it in latent space using the Leapfrog algorithm. This formulation enables rapid sampling without necessitating additional iterations. Our model demonstrates state-of-the-art performance in generating medical images. Furthermore, our model can be fine-tuned with any custom medical image datasets, facilitating the generation of a vast array of images. Our experimental results outperform those of existing models on unseen dog cardiac X-ray images. Source code is available at https://github.com/lskdsjy/LeapfrogLCM., Comment: Total 16 pages including 5 figures and 36 references
Published: 2024

3. SparrowVQE: Visual Question Explanation for Course Content Understanding

Author: Li, Jialu, Thota, Manish Kumar, Gokhman, Ruslan, Holik, Radek, and Zhang, Youshan
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language
Abstract: Visual Question Answering (VQA) research seeks to create AI systems to answer natural language questions in images, yet VQA methods often yield overly simplistic and short answers. This paper aims to advance the field by introducing Visual Question Explanation (VQE), which enhances the ability of VQA to provide detailed explanations rather than brief responses and address the need for more complex interaction with visual content. We first created an MLVQE dataset from a 14-week streamed video machine learning course, including 885 slide images, 110,407 words of transcripts, and 9,416 designed question-answer (QA) pairs. Next, we proposed a novel SparrowVQE, a small 3 billion parameters multimodal model. We trained our model with a three-stage training mechanism consisting of multimodal pre-training (slide images and transcripts feature alignment), instruction tuning (tuning the pre-trained model with transcripts and QA pairs), and domain fine-tuning (fine-tuning slide image and QA pairs). Eventually, our SparrowVQE can understand and connect visual information using the SigLIP model with transcripts using the Phi-2 language model with an MLP adapter. Experimental results demonstrate that our SparrowVQE achieves better performance in our developed MLVQE dataset and outperforms state-of-the-art methods in the other five benchmark VQA datasets. The source code is available at https://github.com/YoushanZhang/SparrowVQE.
Published: 2024

4. Unbounded: A Generative Infinite Game of Character Life Simulation

Author: Li, Jialu, Li, Yuanzhen, Wadhwa, Neal, Pritch, Yael, Jacobs, David E., Rubinstein, Michael, Bansal, Mohit, and Ruiz, Nataniel
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Graphics, Computer Science - Machine Learning
Abstract: We introduce the concept of a generative infinite game, a video game that transcends the traditional boundaries of finite, hard-coded systems by using generative models. Inspired by James P. Carse's distinction between finite and infinite games, we leverage recent advances in generative AI to create Unbounded: a game of character life simulation that is fully encapsulated in generative models. Specifically, Unbounded draws inspiration from sandbox life simulations and allows you to interact with your autonomous virtual character in a virtual world by feeding, playing with and guiding it - with open-ended mechanics generated by an LLM, some of which can be emergent. In order to develop Unbounded, we propose technical innovations in both the LLM and visual generation domains. Specifically, we present: (1) a specialized, distilled large language model (LLM) that dynamically generates game mechanics, narratives, and character interactions in real-time, and (2) a new dynamic regional image prompt Adapter (IP-Adapter) for vision models that ensures consistent yet flexible visual generation of a character across multiple environments. We evaluate our system through both qualitative and quantitative analysis, showing significant improvements in character life simulation, user instruction following, narrative coherence, and visual consistency for both characters and the environments compared to traditional related approaches., Comment: Project page: https://generative-infinite-game.github.io/
Published: 2024

5. A quasi-ohmic back contact achieved by inserting single-crystal graphene in flexible Kesterite solar cells

Author: Ji, Yixiong, Yang, Wentong, Yan, Di, Luo, Wei, Li, Jialu, Tang, Shi, Fu, Jintao, Bullock, James, Gao, Mei, Li, Xin, Li, Zhancheng, Yang, Jun, Wei, Xingzhan, Shi, Haofei, Liu, Fangyang, and Mulvaney, Paul
Subjects: Condensed Matter - Materials Science, Condensed Matter - Mesoscale and Nanoscale Physics
Abstract: Flexible photovoltaics with a lightweight and adaptable nature that allows for deployment on curved surfaces and in building facades have always been a goal vigorously pursued by researchers in thin-film solar cell technology. The recent strides made in improving the sunlight-to-electricity conversion efficiency of kesterite Cu$_{2}$ZnSn(S, Se)$_{4}$ (CZTSSe) suggest it to be a perfect candidate. However, making use of rare Mo foil in CZTSSe solar cells causes severe problems in thermal expansion matching, uneven grain growth, and severe problems at the back contact of the devices. Herein, a strategy utilizing single-crystal graphene to modify the back interface of flexible CZTSSe solar cells is proposed. It will be shown that the insertion of graphene at the Mo foil/CZTSSe interface provides strong physical support for the subsequent deposition of the CZTSSe absorber layer, improving the adhesion between the absorber layer and the Mo foil substrate. Additionally, the graphene passivates the rough sites on the surface of the Mo foil, enhancing the chemical homogeneity of the substrate, and resulting in a more crystalline and homogeneous CZTSSe absorber layer on the Mo foil substrate. The detrimental reaction between Mo and CZTSSe has also been eliminated. Through an analysis of the electrical properties, it is found that the introduction of graphene at the back interface promotes the formation of a quasi-ohmic contact at the back contact, decreasing the back contact barrier of the solar cell, and leading to efficient collection of charges at the back interface. This investigation demonstrates that solution-based CZTSSe photovoltaic devices could form the basis of cheap and flexible solar cells.
Published: 2024

6. Vision-and-Language Navigation Today and Tomorrow: A Survey in the Era of Foundation Models

Author: Zhang, Yue, Ma, Ziqiao, Li, Jialu, Qiao, Yanyuan, Wang, Zun, Chai, Joyce, Wu, Qi, Bansal, Mohit, and Kordjamshidi, Parisa
Subjects: Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition
Abstract: Vision-and-Language Navigation (VLN) has gained increasing attention over recent years and many approaches have emerged to advance its development. The remarkable achievements of foundation models have shaped the challenges and proposed methods for VLN research. In this survey, we provide a top-down review that adopts a principled framework for embodied planning and reasoning, and emphasizes the current methods and future opportunities leveraging foundation models to address VLN challenges. We hope our in-depth discussions could provide valuable resources and insights: on one hand, to milestone the progress and explore opportunities and potential roles for foundation models in this field, and on the other, to organize different challenges and solutions in VLN to foundation model researchers., Comment: Authors contributed equally to this work, and supervisors contributed equal advising to this work
Published: 2024

7. Sound Tagging in Infant-centric Home Soundscapes

Author: Khan, Mohammad Nur Hossain, Li, Jialu, McElwain, Nancy L., Hasegawa-Johnson, Mark, and Islam, Bashima
Subjects: Computer Science - Sound, Computer Science - Machine Learning, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Certain environmental noises have been associated with negative developmental outcomes for infants and young children. Though classifying or tagging sound events in a domestic environment is an active research area, previous studies focused on data collected from a non-stationary microphone placed in the environment or from the perspective of adults. Further, many of these works ignore infants or young children in the environment or have data collected from only a single family where noise from the fixed sound source can be moderate at the infant's position or vice versa. Thus, despite the recent success of large pre-trained models for noise event detection, the performance of these models on infant-centric noise soundscapes in the home is yet to be explored. To bridge this gap, we have collected and labeled noises in home soundscapes from 22 families in an unobtrusive manner, where the data are collected through an infant-worn recording device. In this paper, we explore the performance of a large pre-trained model (Audio Spectrogram Transformer [AST]) on our noise-conditioned infant-centric environmental data as well as publicly available home environmental datasets. Utilizing different training strategies such as resampling, utilizing public datasets, mixing public and infant-centric training sets, and data augmentation using noise and masking, we evaluate the performance of a large pre-trained model on sparse and imbalanced infant-centric data. Our results show that fine-tuning the large pre-trained model by combining our collected dataset with public datasets increases the F1-score from 0.11 (public datasets) and 0.76 (collected datasets) to 0.84 (combined datasets) and Cohen's Kappa from 0.013 (public datasets) and 0.77 (collected datasets) to 0.83 (combined datasets) compared to only training with public or collected datasets, respectively., Comment: Accepted in IEEE/ACM CHASE 2024
Published: 2024

8. Vision Transformer Segmentation for Visual Bird Sound Denoising

Author: Kumar, Sahil, Li, Jialu, and Zhang, Youshan
Subjects: Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Audio denoising, especially in the context of bird sounds, remains a challenging task due to persistent residual noise. Traditional and deep learning methods often struggle with artificial or low-frequency noise. In this work, we propose ViTVS, a novel approach that leverages the power of the vision transformer (ViT) architecture. ViTVS adeptly combines segmentation techniques to disentangle clean audio from complex signal mixtures. Our key contributions encompass the development of ViTVS, introducing comprehensive, long-range, and multi-scale representations. These contributions directly tackle the limitations inherent in conventional approaches. Extensive experiments demonstrate that ViTVS outperforms state-of-the-art methods, positioning it as a benchmark solution for real-world bird sound denoising applications. Source code is available at: https://github.com/aiai-4/ViVTS., Comment: INTERSPEECH 2024
Published: 2024

9. Complex Image-Generative Diffusion Transformer for Audio Denoising

Author: Li, Junhui, Wang, Pu, Li, Jialu, and Zhang, Youshan
Subjects: Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: The audio denoising technique has captured widespread attention in the deep neural network field. Recently, the audio denoising problem has been converted into an image generation task, and deep learning-based approaches have been applied to tackle this problem. However, its performance is still limited, leaving room for further improvement. In order to enhance audio denoising performance, this paper introduces a complex image-generative diffusion transformer that captures more information from the complex Fourier domain. We explore a novel diffusion transformer by integrating the transformer with a diffusion model. Our proposed model demonstrates the scalability of the transformer and expands the receptive field of sparse attention using attention diffusion. Our work is among the first to utilize diffusion transformers to deal with the image generation task for audio denoising. Extensive experiments on two benchmark datasets demonstrate that our proposed model outperforms state-of-the-art methods., Comment: INTERSPEECH 2024
Published: 2024

10. Diffusion Gaussian Mixture Audio Denoise

Author: Wang, Pu, Li, Junhui, Li, Jialu, Guo, Liangdong, and Zhang, Youshan
Subjects: Computer Science - Sound, Computer Science - Computation and Language, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Recent diffusion models have achieved promising performances in audio-denoising tasks. The unique property of the reverse process could recover clean signals. However, the distribution of real-world noises does not comply with a single Gaussian distribution and is even unknown. The sampling of Gaussian noise conditions limits its application scenarios. To overcome these challenges, we propose a DiffGMM model, a denoising model based on the diffusion and Gaussian mixture models. We employ the reverse process to estimate parameters for the Gaussian mixture model. Given a noisy audio signal, we first apply a 1D-U-Net to extract features and train linear layers to estimate parameters for the Gaussian mixture model, and we approximate the real noise distributions. The noisy signal is continuously subtracted from the estimated noise to output clean audio signals. Extensive experimental results demonstrate that the proposed DiffGMM model achieves state-of-the-art performance., Comment: INTERSPEECH 2024
Published: 2024

11. On the Interpretation of Mid-Infrared Absorption Lines of Gas-Phase H$_2$O as Observed by JWST/MIRI

Author: Li, Jialu, Boogert, Adwin, and Tielens, Alexander G. G. M.
Subjects: Astrophysics - Instrumentation and Methods for Astrophysics, Astrophysics - Astrophysics of Galaxies
Abstract: Ro-vibrational absorption lines of H$_2$O in the 5-8 $\mu$m wavelength range selectively probe gas against the mid-infrared continuum emitting background of the inner regions of YSOs and AGN and deliver important information about these warm, dust-obscured environments. JWST/MIRI detects these lines in many lines of sight at a moderate spectral resolving power of $R\sim3500$ (FWHM of 85 km/s). Based on our analysis of high-resolution SOFIA/EXES observations, we find that the interpretation of JWST/MIRI absorption spectra can be severely hampered by the blending of individual transitions and the lost information on the intrinsic line width or the partial coverage of the background continuum source. In this paper, we point out problems such as degeneracy that arise in deriving physical properties from an insufficiently resolved spectrum. This can lead to differences in the column density by two orders of magnitude. We emphasize the importance of weighting optically thin and weak lines in spectral analyses and provide recipes for breaking down the coupled parameters. We also provide an online tool to generate the H$_2$O absorption line spectra that can be compared to observations., Comment: Accepted for publication in ApJS. 26 pages, 23 figures. Comments are more than welcome!
Published: 2024

12. QUADFormer: Learning-based Detection of Cyber Attacks in Quadrotor UAVs

Author: Wang, Pengyu, Yang, Zhaohua, Yang, Nachuan, Wang, Zikai, Li, Jialu, Zhang, Fan, Wang, Chaoqun, Wang, Jiankun, Meng, Max Q. -H., and Shi, Ling
Subjects: Computer Science - Robotics
Abstract: Safety-critical intelligent cyber-physical systems, such as quadrotor unmanned aerial vehicles (UAVs), are vulnerable to different types of cyber attacks, and the absence of timely and accurate attack detection can lead to severe consequences. When UAVs are engaged in large outdoor maneuvering flights, their system constitutes highly nonlinear dynamics that include non-Gaussian noises. Therefore, the commonly employed traditional statistics-based and emerging learning-based attack detection methods do not yield satisfactory results. In response to the above challenges, we propose QUADFormer, a novel Quadrotor UAV Attack Detection framework with transFormer-based architecture. This framework includes a residue generator designed to generate a residue sequence sensitive to anomalies. Subsequently, this sequence is fed into a transformer structure with disparity in correlation to specifically learn its statistical characteristics for the purpose of classification and attack detection. Finally, we design an alert module to ensure the safe execution of tasks by UAVs under attack conditions. We conduct extensive simulations and real-world experiments, and the results show that our method has achieved superior detection performance compared with many state-of-the-art methods.
Published: 2024

13. Cryo-EM structure of the human subcortical maternal complex and the associated discovery of infertility-associated variants

Author: Chi, Pengliang, Ou, Guojin, Liu, Sibei, Ma, Qianhong, Lu, Yuechao, Li, Jinhong, Li, Jialu, Qi, Qianqian, Han, Zhuo, Zhang, Zihan, Liu, Qingting, Guo, Li, Chen, Jing, Wang, Xiang, Huang, Wei, Li, Lei, and Deng, Dong
Published: 2024

14. SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data

Author: Li, Jialu, Cho, Jaemin, Sung, Yi-Lin, Yoon, Jaehong, and Bansal, Mohit
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Recent text-to-image (T2I) generation models have demonstrated impressive capabilities in creating images from text descriptions. However, these T2I generation models often fall short of generating images that precisely match the details of the text inputs, such as incorrect spatial relationship or missing objects. In this paper, we introduce SELMA: Skill-Specific Expert Learning and Merging with Auto-Generated Data, a novel paradigm to improve the faithfulness of T2I models by fine-tuning models on automatically generated, multi-skill image-text datasets, with skill-specific expert learning and merging. First, SELMA leverages an LLM's in-context learning capability to generate multiple datasets of text prompts that can teach different skills, and then generates the images with a T2I model based on the prompts. Next, SELMA adapts the T2I model to the new skills by learning multiple single-skill LoRA (low-rank adaptation) experts followed by expert merging. Our independent expert fine-tuning specializes multiple models for different skills, and expert merging helps build a joint multi-skill T2I model that can generate faithful images given diverse text prompts, while mitigating the knowledge conflict from different datasets. We empirically demonstrate that SELMA significantly improves the semantic alignment and text faithfulness of state-of-the-art T2I diffusion models on multiple benchmarks (+2.1% on TIFA and +6.9% on DSG), human preference metrics (PickScore, ImageReward, and HPS), as well as human evaluation. Moreover, fine-tuning with image-text pairs auto-collected via SELMA shows comparable performance to fine-tuning with ground truth data. Lastly, we show that fine-tuning with images from a weaker T2I model can help improve the generation quality of a stronger T2I model, suggesting promising weak-to-strong generalization in T2I models., Comment: First two authors contributed equally; Project website: https://selma-t2i.github.io/
Published: 2024

15. Analysis of Self-Supervised Speech Models on Children's Speech and Infant Vocalizations

Author: Li, Jialu, Hasegawa-Johnson, Mark, and McElwain, Nancy L.
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Sound
Abstract: To understand why self-supervised learning (SSL) models have empirically achieved strong performances on several speech-processing downstream tasks, numerous studies have focused on analyzing the encoded information of the SSL layer representations in adult speech. Limited work has investigated how pre-training and fine-tuning affect SSL models encoding children's speech and vocalizations. In this study, we aim to bridge this gap by probing SSL models on two relevant downstream tasks: (1) phoneme recognition (PR) on the speech of adults, older children (8-10 years old), and younger children (1-4 years old), and (2) vocalization classification (VC) distinguishing cry, fuss, and babble for infants under 14 months old. For younger children's PR, the superiority of fine-tuned SSL models is largely due to their ability to learn features that represent older children's speech and then adapt those features to the speech of younger children. For infant VC, SSL models pre-trained on large-scale home recordings learn to leverage phonetic representations at middle layers, and thereby enhance the performance of this task., Comment: Accepted to 2024 ICASSP Workshop of Self-supervision in Audio, Speech and Beyond (SASB)
Published: 2024

16. VLN-Video: Utilizing Driving Videos for Outdoor Vision-and-Language Navigation

Author: Li, Jialu, Padmakumar, Aishwarya, Sukhatme, Gaurav, and Bansal, Mohit
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: Outdoor Vision-and-Language Navigation (VLN) requires an agent to navigate through realistic 3D outdoor environments based on natural language instructions. The performance of existing VLN methods is limited by insufficient diversity in navigation environments and limited training data. To address these issues, we propose VLN-Video, which utilizes the diverse outdoor environments present in driving videos in multiple cities in the U.S. augmented with automatically generated navigation instructions and actions to improve outdoor VLN performance. VLN-Video combines the best of intuitive classical approaches and modern deep learning techniques, using template infilling to generate grounded navigation instructions, combined with an image rotation similarity-based navigation action predictor to obtain VLN style data from driving videos for pretraining deep learning VLN models. We pre-train the model on the Touchdown dataset and our video-augmented dataset created from driving videos with three proxy tasks: Masked Language Modeling, Instruction and Trajectory Matching, and Next Action Prediction, so as to learn temporally-aware and visually-aligned instruction representations. The learned instruction representation is adapted to the state-of-the-art navigator when fine-tuning on the Touchdown dataset. Empirical results demonstrate that VLN-Video significantly outperforms previous state-of-the-art models by 2.1% in task completion rate, achieving a new state-of-the-art on the Touchdown dataset., Comment: AAAI 2024
Published: 2024

17. Every Node is Different: Dynamically Fusing Self-Supervised Tasks for Attributed Graph Clustering

Author: Zhu, Pengfei, Wang, Qian, Wang, Yu, Li, Jialu, and Hu, Qinghua
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Attributed graph clustering is an unsupervised task that partitions nodes into different groups. Self-supervised learning (SSL) shows great potential in handling this task, and some recent studies simultaneously learn multiple SSL tasks to further boost performance. Currently, different SSL tasks are assigned the same set of weights for all graph nodes. However, we observe that some graph nodes whose neighbors are in different groups require significantly different emphases on SSL tasks. In this paper, we propose to dynamically learn the weights of SSL tasks for different nodes and fuse the embeddings learned from different SSL tasks to boost performance. We design an innovative graph clustering approach, namely Dynamically Fusing Self-Supervised Learning (DyFSS). Specifically, DyFSS fuses features extracted from diverse SSL tasks using distinct weights derived from a gating network. To effectively learn the gating network, we design a dual-level self-supervised strategy that incorporates pseudo labels and the graph structure. Extensive experiments on five datasets show that DyFSS outperforms the state-of-the-art multi-task SSL methods by up to 8.66% on the accuracy metric. The code of DyFSS is available at: https://github.com/q086/DyFSS.
Published: 2024

18. Molecular insights into the inhibition of proton-activated chloride channel by transfer RNA

Author: Chi, Pengliang, Wang, Xiang, Li, Jialu, Yang, Hui, Li, Kaiju, Zhang, Yuqi, Lin, Shiyi, Yu, Leiye, Liu, Shiqi, Chen, Lu, Ren, Ruobing, Wu, Jianping, Huang, Zhuo, Geng, Jia, and Deng, Dong
Published: 2024

19. Dynamics in Star-forming Cores (DiSCo): Project Overview and the First Look toward the B1 and NGC 1333 Regions in Perseus

Author: Chen, Che-Yu, Friesen, Rachel, Li, Jialu, Schmiedeke, Anika, Frayer, David, Li, Zhi-Yun, Tobin, John, Looney, Leslie W., Offner, Stella, Mundy, Lee G., Harris, Andrew I., Church, Sarah, Ostriker, Eve C., Pineda, Jaime E., Hsieh, Tien-Hao, and Lam, Ka Ho
Subjects: Astrophysics - Astrophysics of Galaxies, Astrophysics - Solar and Stellar Astrophysics
Abstract: The internal velocity structure within dense gaseous cores plays a crucial role in providing the initial conditions for star formation in molecular clouds. However, the kinematic properties of dense gas at core scales (~0.01 - 0.1 pc) have not been extensively characterized because of instrument limitations until the unique capabilities of GBT-Argus became available. The ongoing GBT-Argus Large Program, Dynamics in Star-forming Cores (DiSCo) thus aims to investigate the origin and distribution of angular momenta of star-forming cores. DiSCo will survey all starless cores and Class 0 protostellar cores in the Perseus molecular complex down to ~0.01 pc scales with < 0.05 km/s velocity resolution using the dense gas tracer N$_2$H$^+$. Here, we present the first datasets from DiSCo toward the B1 and NGC 1333 regions in Perseus. Our results suggest that a dense core's internal velocity structure has little correlation with other core-scale properties, indicating these gas motions may originate externally from cloud-scale turbulence. These first datasets also reaffirm the ability of GBT-Argus for studying dense core velocity structure and provide an empirical basis for future studies that address the angular momentum problem with a statistically broad sample., Comment: 17 pages, 12 figures, accepted by MNRAS
Published: 2023

20. GBT/Argus Observations of Molecular Gas in the Inner Regions of IC 342

Author: Li, Jialu, Harris, Andrew I, Rosolowsky, Erik, Kepley, Amanda, Frayer, David, Bolatto, Alberto, Leroy, Adam K, Meyer, Jennifer Donovan, Church, Sarah, Gundersen, Joshua Ott, Cleary, Kieran, and members, DEGAS team
Subjects: Astrophysics - Astrophysics of Galaxies
Abstract: We report observations of the ground state transitions of $^{12}$CO, $^{13}$CO, C$^{18}$O, HCN, and HCO$^+$ at 88-115 GHz in the inner region of the nearby galaxy IC 342. These data were obtained with the 16-pixel spectroscopic focal plane array Argus on the 100-m Robert C. Byrd Green Bank Telescope (GBT) at 6-9$^{\prime\prime}$ resolution. In the nuclear bar region, the intensity distributions of $^{12}$CO(1-0) and $^{13}$CO(1-0) emission trace moderate densities, and differ from the dense gas distributions sampled in C$^{18}$O(1-0), HCN(1-0), and HCO$^+$(1-0). We observe a constant HCN(1-0)-to-HCO$^+$(1-0) ratio of 1.2$\pm$0.1 across the whole $\sim$1 kpc bar. This indicates that HCN(1-0) and HCO$^+$(1-0) lines have intermediate optical depth, and that the corresponding $n_{\textrm{H}_2}$ of the gas producing the emission is of 10$^{4.5-6}$ cm$^{-3}$. We show that HCO$^+$(1-0) is thermalized and HCN(1-0) is close to thermalization. The very tight correlation between HCN(1-0) and HCO$^+$(1-0) intensities across the 1~kpc bar suggests that this ratio is more sensitive to the relative abundance of the two species than to the gas density. We confirm the angular offset ($\sim$10$^{\prime\prime}$) between the spatial distribution of molecular gas and the star formation sites. Finally, we find a breakdown of the $L_\textrm{IR}$-$L_\textrm{HCN}$ correlation at high spatial resolution due to the effect of incomplete sampling of star-forming regions by HCN emission in IC 342. The scatter of the $L_\textrm{IR}$-$L_\textrm{HCN}$ relation decreases as the spatial scale increases from 10$^{\prime\prime}$~to 30$^{\prime\prime}$ (170-510~pc), and is comparable to the scatter of the global relation at the scale of 340 pc., Comment: Accepted for publication in ApJ. 18 pages, 10 figures, and 4 tables. Comments are more than welcome!
Published: 2023

21. DCHT: Deep Complex Hybrid Transformer for Speech Enhancement

Author: Li, Jialu, Li, Junhui, Wang, Pu, and Zhang, Youshan
Subjects: Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Most of the current deep learning-based approaches for speech enhancement only operate in the spectrogram or waveform domain. Although a cross-domain transformer combining waveform- and spectrogram-domain inputs has been proposed, its performance can be further improved. In this paper, we present a novel deep complex hybrid transformer that integrates both spectrogram- and waveform-domain approaches to improve the performance of speech enhancement. The proposed model consists of two parts: a complex Swin-Unet in the spectrogram domain and a dual-path transformer network (DPTnet) in the waveform domain. We first construct a complex Swin-Unet network in the spectrogram domain and perform speech enhancement in the complex audio spectrum. We then introduce improved DPT by adding memory-compressed attention. Our model is capable of learning multi-domain features to reduce existing noise on different domains in a complementary way. The experimental results on the BirdSoundsDenoising dataset and the VCTK+DEMAND dataset indicate that our method can achieve better performance compared to state-of-the-art methods., Comment: IEEE DDP conference
Published: 2023

22. DPATD: Dual-Phase Audio Transformer for Denoising

Author: Li, Junhui, Wang, Pu, Li, Jialu, Wang, Xinzhe, and Zhang, Youshan
Subjects: Computer Science - Sound, Computer Science - Computation and Language, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Recent high-performance transformer-based speech enhancement models demonstrate that time-domain methods can achieve performance similar to that of time-frequency domain methods. However, time-domain speech enhancement systems typically receive input audio sequences consisting of a large number of time steps, making it challenging to model extremely long sequences and train models to perform adequately. In this paper, we utilize smaller audio chunks as input to achieve efficient utilization of audio information to address the above challenges. We propose a dual-phase audio transformer for denoising (DPATD), a novel model to organize transformer layers in a deep structure to learn clean audio sequences for denoising. DPATD splits the audio input into smaller chunks, where the input length can be proportional to the square root of the original sequence length. Our memory-compressed explainable attention is efficient and converges faster compared to the frequently used self-attention module. Extensive experiments demonstrate that our model outperforms state-of-the-art methods., Comment: IEEE DDP
Published: 2023

23. Complex Image Generation SwinTransformer Network for Audio Denoising

Author: Zhang, Youshan and Li, Jialu
Subjects: Computer Science - Sound, Computer Science - Computer Vision and Pattern Recognition, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Achieving high-performance audio denoising is still a challenging task in real-world applications. Existing time-frequency methods often ignore the quality of generated frequency domain images. This paper converts the audio denoising problem into an image generation task. We first develop a complex image generation SwinTransformer network to capture more information from the complex Fourier domain. We then impose structure similarity and detailed loss functions to generate high-quality images and develop an SDR loss to minimize the difference between denoised and clean audios. Extensive experiments on two benchmark datasets demonstrate that our proposed model is better than state-of-the-art methods.
Published: 2023

24. Multimodal Large Language Model for Visual Navigation

Author: Tsai, Yao-Hung Hubert, Dhar, Vansh, Li, Jialu, Zhang, Bowen, and Zhang, Jian
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Robotics
Abstract: Recent efforts to enable visual navigation using large language models have mainly focused on developing complex prompt systems. These systems incorporate instructions, observations, and history into massive text prompts, which are then combined with pre-trained large language models to facilitate visual navigation. In contrast, our approach aims to fine-tune large language models for visual navigation without extensive prompt engineering. Our design involves a simple text prompt, current observations, and a history collector model that gathers information from previous observations as input. For output, our design provides a probability distribution of possible actions that the agent can take during navigation. We train our model using human demonstrations and collision signals from the Habitat-Matterport 3D Dataset (HM3D). Experimental results demonstrate that our method outperforms state-of-the-art behavior cloning methods and effectively reduces collision rates.
Published: 2023

25. Exploration of common pathogenesis and candidate hub genes between HIV and monkeypox co-infection using bioinformatics and machine learning

Author: Li, Jialu, Hao, Yiwei, Wu, Liang, Liang, Hongyuan, Ni, Liang, Wang, Fang, Wang, Sa, Duan, Yujiao, Xu, Qiuhua, Xiao, Jinjing, Yang, Di, Gao, Guiju, Ding, Yi, Gao, Chengyu, Xiao, Jiang, and Zhao, Hongxin
Published: 2024
Full Text: View/download PDF

26. Structural insights into the inhibition mechanism of fungal GWT1 by manogepix

Author: Dai, Xinli, Liu, Xuanzhong, Li, Jialu, Chen, Hui, Yan, Chuangye, Li, Yaozong, Liu, Hanmin, Deng, Dong, and Wang, Xiang
Published: 2024
Full Text: View/download PDF

27. The subcortical maternal complex modulates the cell cycle during early mammalian embryogenesis via 14-3-3

Author: Han, Zhuo, Wang, Rui, Chi, Pengliang, Zhang, Zihan, Min, Ling, Jiao, Haizhan, Ou, Guojin, Zhou, Dan, Qin, Dandan, Xu, Chengpeng, Gao, Zheng, Qi, Qianqian, Li, Jialu, Lu, Yuechao, Wang, Xiang, Chen, Jing, Yu, Xingjiang, Hu, Hongli, Li, Lei, and Deng, Dong
Published: 2024
Full Text: View/download PDF

28. The structural basis of the activation and inhibition of DSR2 NADase by phage proteins

Author: Wang, Ruiwen, Xu, Qi, Wu, Zhuoxi, Li, Jialu, Guo, Hao, Liao, Tianzhui, Shi, Yuan, Yuan, Ling, Gao, Haishan, Yang, Rong, Shi, Zhubing, and Li, Faxiang
Published: 2024
Full Text: View/download PDF

29. The relationship between the psychological resilience and post-traumatic growth of college students during the COVID-19 pandemic: a model of conditioned processes mediated by negative emotions and moderated by deliberate rumination

Author: Xu, Yanhua, Ni, Yonghui, Yang, Jiayan, Wu, Jiamin, Lin, Yating, Li, Jialu, Zeng, Wei, Zeng, Yuqing, Huang, Dongtao, Wu, Xingrou, Shao, Jinlian, Li, Qian, and Zhu, Ziqi
Published: 2024
Full Text: View/download PDF

30. Prognostic and predictive value of interstitial lung abnormalities and EGFR mutation status in patients with non-small cell lung cancer

Author: Xu, Xiaoli, Zhu, Min, Wang, Zixing, Li, Jialu, Ouyang, Tao, Chen, Cen, Huang, Kewu, Zhang, Yuhui, and Gao, Yanli L.
Published: 2024
Full Text: View/download PDF

31. Regressive vision transformer for dog cardiomegaly assessment

Author: Li, Jialu and Zhang, Youshan
Published: 2024
Full Text: View/download PDF

32. Effects of acetazolamide combined with remote ischemic preconditioning on risk of acute mountain sickness: a randomized clinical trial

Author: Liu, Moqi, Jiao, Xueqiao, Li, Rui, Li, Jialu, Wang, Lu, Wang, Liyan, Wang, Yishu, Lv, Chunmei, Huang, Dan, Wei, Ran, Wang, Liming, Ji, Xunming, and Guo, Xiuhai
Published: 2024
Full Text: View/download PDF

33. Pollution characteristics, bioavailability, and risk assessment of heavy metals in urban road dust from Zhengzhou, China

Author: Li, Jialu, Zuo, Qiting, Feng, Feng, Jia, Hongtao, and Ji, Yingxin
Published: 2024
Full Text: View/download PDF

34. Improvement of multi-channel active noise control algorithm for turboprop aircraft cabin

Author: SHEN Hao, XUE Qing, CHEN Tingyu, LI Jialu, and SHEN Xing
Subjects: active noise control, multi-channel system, filtered-x least mean square algorithm, adaptive filtering, Motor vehicles. Aeronautics. Astronautics, TL1-4050
Abstract: At present, the most widely used control algorithms in the field of active noise control (ANC) are the classical FxLMS algorithm and its improved variants. When these are applied to noise control in large spaces such as a turboprop aircraft cabin, the amount of computation grows rapidly as the number of channels in the ANC system increases, and the real-time requirements become difficult to meet. The Sequential Partial Update FxLMS (SPU-FxLMS) algorithm effectively addresses this problem, but its convergence performance is weaker than that of the FxLMS algorithm. In this paper, the SPU-FxLMS algorithm is improved to address its slow convergence, so that the algorithm converges faster in the early stage of operation and then converges at a lower speed after approaching a smooth state. After converging to a steady state, the algorithm continues to run with a low computational load. Theoretical derivation and simulation analysis of the improved algorithm are carried out. The results show that the algorithm has good noise reduction performance and robustness while further reducing the amount of computation, and has good prospects for engineering applications.
Published: 2024
Full Text: View/download PDF

35. Enhancing Child Vocalization Classification with Phonetically-Tuned Embeddings for Assisting Autism Diagnosis

Author: Li, Jialu, Hasegawa-Johnson, Mark, and Karahalios, Karrie
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Sound
Abstract: The assessment of children at risk of autism typically involves a clinician observing, taking notes, and rating children's behaviors. A machine learning model that can label adult and child audio may largely save labor in coding children's behaviors, helping clinicians capture critical events and better communicate with parents. In this study, we leverage Wav2Vec 2.0 (W2V2), pre-trained on 4300 hours of home audio of children under 5 years old, to build a unified system for tasks of clinician-child speaker diarization and vocalization classification (VC). To enhance children's VC, we build a W2V2 phoneme recognition system for children under 4 years old, and we incorporate its phonetically-tuned embeddings as auxiliary features or recognize pseudo phonetic transcripts as an auxiliary task. We test our method on two corpora (Rapid-ABC and BabbleCor) and obtain consistent improvements. Additionally, we outperform the state-of-the-art performance on the reproducible subset of BabbleCor. Code available at https://huggingface.co/lijialudew, Comment: Accepted to Interspeech 2024
Published: 2023

36. Scaling Data Generation in Vision-and-Language Navigation

Author: Wang, Zun, Li, Jialu, Hong, Yicong, Wang, Yi, Wu, Qi, Bansal, Mohit, Gould, Stephen, Tan, Hao, and Qiao, Yu
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Recent research in language-guided visual navigation has demonstrated a significant demand for the diversity of traversable environments and the quantity of supervision for training generalizable agents. To tackle the common data scarcity issue in existing vision-and-language navigation datasets, we propose an effective paradigm for generating large-scale data for learning, which applies 1200+ photo-realistic environments from HM3D and Gibson datasets and synthesizes 4.9 million instruction trajectory pairs using fully-accessible resources on the web. Importantly, we investigate the influence of each component in this paradigm on the agent's performance and study how to adequately apply the augmented data to pre-train and fine-tune an agent. Thanks to our large-scale dataset, the performance of an existing agent can be pushed up (+11% absolute with regard to previous SoTA) to a significantly new best of 80% single-run success rate on the R2R test split by simple imitation learning. The long-lasting generalization gap between navigating in seen and unseen environments is also reduced to less than 1% (versus 8% in the previous best method). Moreover, our paradigm also facilitates different models to achieve new state-of-the-art navigation results on CVDN, REVERIE, and R2R in continuous environments., Comment: ICCV 2023
Published: 2023

37. Identification and Characterization of Two Novel Extracellular β-Glucanases from Chaetomium globosum against Fusarium sporotrichioides

Author: Jiang, Cheng, Miao, Guopeng, Li, Jialu, Zhang, Ziyu, Li, Jiamin, Zhu, Shuyan, Zhang, Jinhu, and Zhou, Xingyu
Published: 2024
Full Text: View/download PDF

38. High-resolution SOFIA/EXES Spectroscopy of Water Absorption Lines in the Massive Young Binary W3 IRS 5

Author: Li, Jialu, Boogert, Adwin, Barr, Andrew G., DeWitt, Curtis, Rashman, Maisie, Neufeld, David, Indriolo, Nick, Pendleton, Yvonne, Montiel, Edward, Richter, Matt, Chiar, J. E., and Tielens, Alexander G. G.
Subjects: Astrophysics - Astrophysics of Galaxies, Astrophysics - Solar and Stellar Astrophysics
Abstract: We present in this paper mid-infrared (5-8~$\mu$m) spectroscopy toward the massive young binary W3~IRS~5, using the EXES spectrometer in high-resolution mode ($R\sim$50,000) from the NASA Stratospheric Observatory for Infrared Astronomy (SOFIA). Many ($\sim$180) $\nu_2$=1--0 and ($\sim$90) $\nu_2$=2-1 absorption rovibrational transitions are identified. Two hot components over 500 K and one warm component of 190 K are identified through Gaussian fittings and rotation diagram analysis. Each component is linked to a CO component identified in the IRTF/iSHELL observations ($R$=88,100) through their kinematic and temperature characteristics. Revealed by the large scatter in the rotation diagram, opacity effects are important, and we adopt two curve-of-growth analyses, resulting in column densities of $\sim10^{19}$ cm$^{-2}$. In one analysis, the model assumes a foreground slab. The other assumes a circumstellar disk with an outward-decreasing temperature in the vertical direction. The disk model is favored because fewer geometry constraints are needed, although this model faces challenges as the internal heating source is unknown. We discuss the chemical abundances along the line of sight based on the CO-to-H$_2$O connection. In the hot gas, all oxygen not locked in CO resides in water. In the cold gas, we observe a substantial shortfall of oxygen and suggest that the potential carrier could be organics in solid ice., Comment: Accepted for publication in ApJ. 34 pages, 13 figures, and 14 tables. Comments are more than welcome!
Published: 2023
Full Text: View/download PDF

39. PanoGen: Text-Conditioned Panoramic Environment Generation for Vision-and-Language Navigation

Author: Li, Jialu and Bansal, Mohit
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Vision-and-Language Navigation (VLN) requires the agent to follow language instructions to navigate through 3D environments. One main challenge in VLN is the limited availability of photorealistic training environments, which makes it hard to generalize to new and unseen environments. To address this problem, we propose PanoGen, a generation method that can potentially create an infinite number of diverse panoramic environments conditioned on text. Specifically, we collect room descriptions by captioning the room images in existing Matterport3D environments, and leverage a state-of-the-art text-to-image diffusion model to generate the new panoramic environments. We use recursive outpainting over the generated images to create consistent 360-degree panorama views. Our new panoramic environments share similar semantic information with the original environments by conditioning on text descriptions, which ensures the co-occurrence of objects in the panorama follows human intuition, and creates enough diversity in room appearance and layout with image outpainting. Lastly, we explore two ways of utilizing PanoGen in VLN pre-training and fine-tuning. We generate instructions for paths in our PanoGen environments with a speaker built on a pre-trained vision-and-language model for VLN pre-training, and augment the visual observation with our panoramic environments during agents' fine-tuning to avoid overfitting to seen environments. Empirically, learning with our PanoGen environments achieves the new state-of-the-art on the Room-to-Room, Room-for-Room, and CVDN datasets. Pre-training with our PanoGen speaker data is especially effective for CVDN, which has under-specified instructions and needs commonsense knowledge. Lastly, we show that the agent can benefit from training with more generated panoramic environments, suggesting promising results for scaling up the PanoGen environments., Comment: Project Webpage: https://pano-gen.github.io/
Published: 2023

40. Towards Robust Family-Infant Audio Analysis Based on Unsupervised Pretraining of Wav2vec 2.0 on Large-Scale Unlabeled Family Audio

Author: Li, Jialu, Hasegawa-Johnson, Mark, and McElwain, Nancy L.
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Sound
Abstract: To perform automatic family audio analysis, past studies have collected recordings using phone, video, or audio-only recording devices like LENA, investigated supervised learning methods, and used or fine-tuned general-purpose embeddings learned from large pretrained models. In this study, we advance the audio component of a new infant wearable multi-modal device called LittleBeats (LB) by learning family audio representation via wav2vec 2.0 (W2V2) pretraining. We show that, given a limited number of labeled LB home recordings, W2V2 pretrained using 1k hours of unlabeled home recordings outperforms oracle W2V2 pretrained on 960 hours of unlabeled LibriSpeech in terms of parent/infant speaker diarization (SD) and vocalization classifications (VC) at home. Extra relevant external unlabeled and labeled data further benefit W2V2 pretraining and fine-tuning. With SpecAug and environmental speech corruptions, we obtain 12% relative gain on SD and moderate boost on VC. Code and model weights are available., Comment: Proceedings of Interspeech 2023; v4 version updates: correction of W2V2-base pretrained on 960-hour of LibriSpeech and number of families participated for LENA home recordings
Published: 2023
Full Text: View/download PDF

41. Improving Vision-and-Language Navigation by Generating Future-View Image Semantics

Author: Li, Jialu and Bansal, Mohit
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Vision-and-Language Navigation (VLN) is the task that requires an agent to navigate through the environment based on natural language instructions. At each step, the agent takes the next action by selecting from a set of navigable locations. In this paper, we aim to take one step further and explore whether the agent can benefit from generating the potential future view during navigation. Intuitively, humans will have an expectation of what the future environment will look like, based on the natural language instructions and surrounding views, which will aid correct navigation. Hence, to equip the agent with this ability to generate the semantics of future navigation views, we first propose three proxy tasks during the agent's in-domain pre-training: Masked Panorama Modeling (MPM), Masked Trajectory Modeling (MTM), and Action Prediction with Image Generation (APIG). These three objectives teach the model to predict missing views in a panorama (MPM), predict missing steps in the full trajectory (MTM), and generate the next view based on the full instruction and navigation history (APIG), respectively. We then fine-tune the agent on the VLN task with an auxiliary loss that minimizes the difference between the view semantics generated by the agent and the ground truth view semantics of the next step. Empirically, our VLN-SIG achieves the new state-of-the-art on both the Room-to-Room dataset and the CVDN dataset. We further show that our agent learns to fill in missing patches in future views qualitatively, which brings more interpretability over agents' predicted actions. Lastly, we demonstrate that learning to predict future view semantics also enables the agent to have better performance on longer paths., Comment: CVPR 2023 (Project webpage: https://jialuli-luka.github.io/VLN-SIG)
Published: 2023

42. Recognition Method for Train Coupler Handle Based on YOLOv5 Model

Author: Liu, Zhiyuan, Li, Yan, Xu, Zhanmou, Li, Jialu, Ding, Jiayi, Zhang, Xiong, Wan, Shuting, Zhao, Jingyi, Guo, Rui, Cai, Wei, Chaari, Fakher, Series Editor, Gherardini, Francesco, Series Editor, Ivanov, Vitalii, Series Editor, Haddar, Mohamed, Series Editor, Cavas-Martínez, Francisco, Editorial Board Member, di Mare, Francesca, Editorial Board Member, Kwon, Young W., Editorial Board Member, Tolio, Tullio A. M., Editorial Board Member, Trojanowska, Justyna, Editorial Board Member, Schmitt, Robert, Editorial Board Member, Xu, Jinyang, Editorial Board Member, Halgamuge, Saman K., editor, Zhang, Hao, editor, Zhao, Dingxuan, editor, and Bian, Yongming, editor
Published: 2024
Full Text: View/download PDF

43. A Visual Detection Method for Train Couplers Based on YOLOv8 Model

Author: Zhao, Wenning, Yao, Xin, Wang, Bixin, Ding, Jiayi, Li, Jialu, Zhang, Xiong, Wan, Shuting, Zhao, Jingyi, Guo, Rui, Cai, Wei, Chaari, Fakher, Series Editor, Gherardini, Francesco, Series Editor, Ivanov, Vitalii, Series Editor, Haddar, Mohamed, Series Editor, Cavas-Martínez, Francisco, Editorial Board Member, di Mare, Francesca, Editorial Board Member, Kwon, Young W., Editorial Board Member, Tolio, Tullio A. M., Editorial Board Member, Trojanowska, Justyna, Editorial Board Member, Schmitt, Robert, Editorial Board Member, Xu, Jinyang, Editorial Board Member, Halgamuge, Saman K., editor, Zhang, Hao, editor, Zhao, Dingxuan, editor, and Bian, Yongming, editor
Published: 2024
Full Text: View/download PDF

44. Screening for Late-Onset Fetal Growth Restriction in Antepartum Fetal Monitoring Using Deep Forest and SHAP

Author: Huo, Jianhong, Li, Guohua, Li, Chongwen, Li, Xia, Liu, Guiqing, Chen, Qinqun, Li, Jialu, Hao, Yuexing, Wei, Hang, Xhafa, Fatos, Series Editor, Cao, Bing-Yuan, editor, Wang, Shu-Feng, editor, Nasseri, Hadi, editor, and Zhong, Yu-Bin, editor
Published: 2024
Full Text: View/download PDF

45. A UUV Underwater Wireless Power Transfer System

Author: Li, Jialu, Zhang, Jiantao, Lu, Wei, Zhao, Jian, Cui, Shumei, Angrisani, Leopoldo, Series Editor, Arteaga, Marco, Series Editor, Chakraborty, Samarjit, Series Editor, Chen, Jiming, Series Editor, Chen, Shanben, Series Editor, Chen, Tan Kay, Series Editor, Dillmann, Rüdiger, Series Editor, Duan, Haibin, Series Editor, Ferrari, Gianluigi, Series Editor, Ferre, Manuel, Series Editor, Jabbari, Faryar, Series Editor, Jia, Limin, Series Editor, Kacprzyk, Janusz, Series Editor, Khamis, Alaa, Series Editor, Kroeger, Torsten, Series Editor, Li, Yong, Series Editor, Liang, Qilian, Series Editor, Martín, Ferran, Series Editor, Ming, Tan Cher, Series Editor, Minker, Wolfgang, Series Editor, Misra, Pradeep, Series Editor, Mukhopadhyay, Subhas, Series Editor, Ning, Cun-Zheng, Series Editor, Nishida, Toyoaki, Series Editor, Oneto, Luca, Series Editor, Panigrahi, Bijaya Ketan, Series Editor, Pascucci, Federica, Series Editor, Qin, Yong, Series Editor, Seng, Gan Woon, Series Editor, Speidel, Joachim, Series Editor, Veiga, Germano, Series Editor, Wu, Haitao, Series Editor, Zamboni, Walter, Series Editor, Tan, Kay Chen, Series Editor, Cai, Chunwei, editor, Qu, Xiaohui, editor, Mai, Ruikun, editor, Zhang, Pengcheng, editor, Chai, Wenping, editor, and Wu, Shuai, editor
Published: 2024
Full Text: View/download PDF

46. BirdSoundsDenoising: Deep Visual Audio Denoising for Bird Sounds

Author: Zhang, Youshan and Li, Jialu
Subjects: Computer Science - Sound, Computer Science - Computer Vision and Pattern Recognition, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Audio denoising has been explored for decades using both traditional and deep learning-based methods. However, these methods are still limited to either manually added artificial noise or lower denoised audio quality. To overcome these challenges, we collect a large-scale natural noise bird sound dataset. We are the first to transfer the audio denoising problem into an image segmentation problem and propose a deep visual audio denoising (DVAD) model. With a total of 14,120 audio images, we develop an audio ImageMask tool and propose to use a few-shot generalization strategy to label these images. Extensive experimental results demonstrate that the proposed model achieves state-of-the-art performance. We also show that our method can be easily generalized to speech denoising, audio separation, audio enhancement, and noise estimation., Comment: WACV 2023
Published: 2022

47. Structural basis of the subcortical maternal complex and its implications in reproductive disorders

Author: Chi, Pengliang, Ou, Guojin, Qin, Dandan, Han, Zhuo, Li, Jialu, Xiao, Qingjie, Gao, Zheng, Xu, Chengpeng, Qi, Qianqian, Liu, Qingting, Liu, Sibei, Li, Jinhong, Guo, Li, Lu, Yuechao, Chen, Jing, Wang, Xiang, Shi, Hubing, Li, Lei, and Deng, Dong
Published: 2024
Full Text: View/download PDF

48. CLEAR: Improving Vision-Language Navigation with Cross-Lingual, Environment-Agnostic Representations

Author: Li, Jialu, Tan, Hao, and Bansal, Mohit
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Vision-and-Language Navigation (VLN) tasks require an agent to navigate through the environment based on language instructions. In this paper, we aim to solve two key challenges in this task: utilizing multilingual instructions for improved instruction-path grounding and navigating through new environments that are unseen during training. To address these challenges, we propose CLEAR: Cross-Lingual and Environment-Agnostic Representations. First, our agent learns a shared and visually-aligned cross-lingual language representation for the three languages (English, Hindi and Telugu) in the Room-Across-Room dataset. Our language representation learning is guided by text pairs that are aligned by visual information. Second, our agent learns an environment-agnostic visual representation by maximizing the similarity between semantically-aligned image pairs (with constraints on object-matching) from different environments. Our environment agnostic visual representation can mitigate the environment bias induced by low-level visual information. Empirically, on the Room-Across-Room dataset, we show that our multilingual agent gets large improvements in all metrics over the strong baseline model when generalizing to unseen environments with the cross-lingual language representation and the environment-agnostic visual representation. Furthermore, we show that our learned language and visual representations can be successfully transferred to the Room-to-Room and Cooperative Vision-and-Dialogue Navigation task, and present detailed qualitative and quantitative generalization and grounding analysis. Our code is available at https://github.com/jialuli-luka/CLEAR, Comment: NAACL 2022 Findings (18 pages)
Published: 2022

49. Calculation and analysis of the foundation settlement of an office building in Tongzhou

Author: Wang, Yahui, primary, Li, Jianfei, additional, Li, Jialu, additional, and Zhou, Shengbin, additional
Published: 2024
Full Text: View/download PDF

50. Strengthening appraisal and design of a building foundation excavation in Beijing

Author: Zhou, Shengbin, primary, Li, Jialu, additional, and Wang, Yahui, additional
Published: 2024
Full Text: View/download PDF
