Author: "Liu, Yuhan" / Topic: computer science - artificial intelligence - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Liu, Yuhan"' showing total 17 results

Start Over Author "Liu, Yuhan" Topic computer science - artificial intelligence

17 results on '"Liu, Yuhan"'

1. DroidSpeak: Enhancing Cross-LLM Communication

Author: Liu, Yuhan, Choukse, Esha, Lu, Shan, Jiang, Junchen, and Musuvathi, Madan
Subjects: Computer Science - Multiagent Systems, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: In multi-agent systems utilizing Large Language Models (LLMs), communication between agents traditionally relies on natural language. This communication often includes the full context of the query so far, which can introduce significant prefill-phase latency, especially with long contexts. We introduce DroidSpeak, a novel framework to target this cross-LLM communication by leveraging the reuse of intermediate data, such as input embeddings (E-cache) and key-value caches (KV-cache). We efficiently bypass the need to reprocess entire contexts for fine-tuned versions of the same foundational model. This approach allows faster context integration while maintaining the quality of task performance. Experimental evaluations demonstrate DroidSpeak's ability to significantly accelerate inter-agent communication, achieving up to a 2.78x speedup in prefill latency with negligible loss in accuracy. Our findings underscore the potential to create more efficient and scalable multi-agent systems.
Published: 2024

2. From a Tiny Slip to a Giant Leap: An LLM-Based Simulation for Fake News Evolution

Author: Liu, Yuhan, Song, Zirui, Zhang, Xiaoqing, Chen, Xiuying, and Yan, Rui
Subjects: Computer Science - Social and Information Networks, Computer Science - Artificial Intelligence
Abstract: With the growing spread of misinformation online, research has increasingly focused on detecting and tracking fake news. However, an overlooked issue is that fake news does not naturally exist in social networks -- it often originates from distorted facts or deliberate fabrication by malicious actors. Understanding how true news gradually evolves into fake news is critical for early detection and prevention, reducing its spread and impact. Hence, in this paper, we take the first step toward simulating and revealing this evolution, proposing a Fake News evolUtion Simulation framEwork (FUSE) based on large language models (LLMs). Specifically, we employ LLM as agents to represent individuals in a simulated social network. We define four types of agents commonly observed in daily interactions: spreaders, who propagate information; commentators, who provide opinions and interpretations; verifiers, who check the accuracy of information; and bystanders, who passively observe without engaging. For simulated environments, we model various social network structures, such as high-clustering networks and scale-free networks, to mirror real-world network dynamics. Each day, the agents engage in belief exchanges, reflect on their thought processes, and reintroduce the news accordingly. Given the lack of prior work in this area, we developed a FUSE-EVAL evaluation framework to measure the deviation from true news during the fake news evolution process. The results show that FUSE successfully captures the underlying patterns of how true news transforms into fake news and accurately reproduces previously discovered instances of fake news, aligning closely with human evaluations. Moreover, our work provides insights into the fact that combating fake news should not be delayed until it has fully evolved; instead, prevention in advance is key to achieving better outcomes.
Published: 2024

3. Autoregressive Action Sequence Learning for Robotic Manipulation

Author: Zhang, Xinyu, Liu, Yuhan, Chang, Haonan, Schramm, Liam, and Boularias, Abdeslam
Subjects: Computer Science - Robotics, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Autoregressive models have demonstrated remarkable success in natural language processing. In this work, we design a simple yet effective autoregressive architecture for robotic manipulation tasks. We propose the Chunking Causal Transformer (CCT), which extends the next-single-token prediction of causal transformers to support multi-token prediction in a single pass. Further, we design a novel attention interleaving strategy that allows CCT to be trained efficiently with teacher-forcing. Based on CCT, we propose the Autoregressive Policy (ARP) model, which learns to generate action sequences autoregressively. We find that action sequence learning enables better leverage of the underlying causal relationships in robotic tasks. We evaluate ARP across diverse robotic manipulation environments, including Push-T, ALOHA, and RLBench, and show that it outperforms the state-of-the-art methods in all tested environments, while being more efficient in computation and parameter sizes. Video demonstrations, our source code, and the models of ARP can be found at http://github.com/mlzxy/arp.
Published: 2024

4. Dr. GPT in Campus Counseling: Understanding Higher Education Students' Opinions on LLM-assisted Mental Health Services

Author: Zhang, Owen Xingjian, Zhou, Shuyao, Geng, Jiayi, Liu, Yuhan, and Liu, Sunny Xun
Subjects: Computer Science - Human-Computer Interaction, Computer Science - Artificial Intelligence
Abstract: In response to the increasing mental health challenges faced by college students, we sought to understand their perspectives on how AI applications, particularly Large Language Models (LLMs), can be leveraged to enhance their mental well-being. Through pilot interviews with ten diverse students, we explored their opinions on the use of LLMs across five fictional scenarios: General Information Inquiry, Initial Screening, Reshaping Patient-Expert Dynamics, Long-term Care, and Follow-up Care. Our findings revealed that students' acceptance of LLMs varied by scenario, with participants highlighting both potential benefits, such as proactive engagement and personalized follow-up care, and concerns, including limitations in training data and emotional support. These insights inform how AI technology should be designed and implemented to effectively support and enhance students' mental well-being, particularly in scenarios where LLMs can complement traditional methods, while maintaining empathy and respecting individual preferences., Comment: 5 pages
Published: 2024

5. Unleash the Power of Ellipsis: Accuracy-enhanced Sparse Vector Technique with Exponential Noise

Author: Liu, Yuhan, Wang, Sheng, Liu, Yixuan, Li, Feifei, and Chen, Hong
Subjects: Computer Science - Cryptography and Security, Computer Science - Artificial Intelligence
Abstract: The Sparse Vector Technique (SVT) is one of the most fundamental tools in differential privacy (DP). It works as a backbone for adaptive data analysis by answering a sequence of queries on a given dataset, and gleaning useful information in a privacy-preserving manner. Unlike the typical private query releases that directly publicize the noisy query results, SVT is less informative -- it keeps the noisy query results to itself and only reveals a binary bit for each query, indicating whether the query result surpasses a predefined threshold. To provide a rigorous DP guarantee for SVT, prior works in the literature adopt a conservative privacy analysis by assuming the direct disclosure of noisy query results as in typical private query releases. This approach, however, hinders SVT from achieving higher query accuracy due to an overestimation of the privacy risks, which further leads to an excessive noise injection using the Laplacian or Gaussian noise for perturbation. Motivated by this, we provide a new privacy analysis for SVT by considering its less informative nature. Our analysis results not only broaden the range of applicable noise types for perturbation in SVT, but also identify the exponential noise as optimal among all evaluated noises (which, however, is usually deemed non-applicable in prior works). The main challenge in applying exponential noise to SVT is mitigating the sub-optimal performance due to the bias introduced by noise distributions. To address this, we develop a utility-oriented optimal threshold correction method and an appending strategy, which enhances the performance of SVT by increasing the precision and recall, respectively. The effectiveness of our proposed methods is substantiated both theoretically and empirically, demonstrating significant improvements up to $50\%$ across evaluated metrics.
Published: 2024

6. Scaling Manipulation Learning with Visual Kinematic Chain Prediction

Author: Zhang, Xinyu, Liu, Yuhan, Chang, Haonan, and Boularias, Abdeslam
Subjects: Computer Science - Robotics, Computer Science - Artificial Intelligence
Abstract: Learning general-purpose models from diverse datasets has achieved great success in machine learning. In robotics, however, existing methods in multi-task learning are typically constrained to a single robot and workspace, while recent work such as RT-X requires a non-trivial action normalization procedure to manually bridge the gap between different action spaces in diverse environments. In this paper, we propose the visual kinematics chain as a precise and universal representation of quasi-static actions for robot learning over diverse environments, which requires no manual adjustment since the visual kinematic chains can be automatically obtained from the robot's model and camera parameters. We propose the Visual Kinematics Transformer (VKT), a convolution-free architecture that supports an arbitrary number of camera viewpoints, and that is trained with a single objective of forecasting kinematic structures through optimal point-set matching. We demonstrate the superior performance of VKT over BC transformers as a general agent on Calvin, RLBench, Open-X, and real robot manipulation tasks. Video demonstrations can be found at https://mlzxy.github.io/visual-kinetic-chain., Comment: CoRL 2024
Published: 2024

7. From Skepticism to Acceptance: Simulating the Attitude Dynamics Toward Fake News

Author: Liu, Yuhan, Chen, Xiuying, Zhang, Xiaoqing, Gao, Xing, Zhang, Ji, and Yan, Rui
Subjects: Computer Science - Social and Information Networks, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: In the digital era, the rapid propagation of fake news and rumors via social networks brings notable societal challenges and impacts public opinion regulation. Traditional fake news modeling typically forecasts the general popularity trends of different groups or numerically represents opinions shift. However, these methods often oversimplify real-world complexities and overlook the rich semantic information of news text. The advent of large language models (LLMs) provides the possibility of modeling subtle dynamics of opinion. Consequently, in this work, we introduce a Fake news Propagation Simulation framework (FPS) based on LLM, which studies the trends and control of fake news propagation in detail. Specifically, each agent in the simulation represents an individual with a distinct personality. They are equipped with both short-term and long-term memory, as well as a reflective mechanism to mimic human-like thinking. Every day, they engage in random opinion exchanges, reflect on their thinking, and update their opinions. Our simulation results uncover patterns in fake news propagation related to topic relevance, and individual traits, aligning with real-world observations. Additionally, we evaluate various intervention strategies and demonstrate that early and appropriately frequent interventions strike a balance between governance cost and effectiveness, offering valuable insights for practical applications. Our study underscores the significant utility and potential of LLMs in combating fake news.
Published: 2024

8. MMoE: Robust Spoiler Detection with Multi-modal Information and Domain-aware Mixture-of-Experts

Author: Zeng, Zinan, Ye, Sen, Cai, Zijian, Wang, Heng, Liu, Yuhan, Zhang, Haokai, and Luo, Minnan
Subjects: Computer Science - Artificial Intelligence
Abstract: Online movie review websites are valuable for information and discussion about movies. However, the massive spoiler reviews detract from the movie-watching experience, making spoiler detection an important task. Previous methods simply focus on reviews' text content, ignoring the heterogeneity of information in the platform. For instance, the metadata and the corresponding user's information of a review could be helpful. Besides, the spoiler language of movie reviews tends to be genre-specific, thus posing a domain generalization challenge for existing methods. To this end, we propose MMoE, a multi-modal network that utilizes information from multiple modalities to facilitate robust spoiler detection and adopts Mixture-of-Experts to enhance domain generalization. MMoE first extracts graph, text, and meta feature from the user-movie network, the review's textual content, and the review's metadata respectively. To handle genre-specific spoilers, we then adopt Mixture-of-Experts architecture to process information in three modalities to promote robustness. Finally, we use an expert fusion layer to integrate the features from different perspectives and make predictions based on the fused embedding. Experiments demonstrate that MMoE achieves state-of-the-art performance on two widely-used spoiler detection datasets, surpassing previous SOTA methods by 2.56% and 8.41% in terms of accuracy and F1-score. Further experiments also demonstrate MMoE's superiority in robustness and generalization.
Published: 2024

9. How connectivity structure shapes rich and lazy learning in neural circuits

Author: Liu, Yuhan Helena, Baratin, Aristide, Cornford, Jonathan, Mihalas, Stefan, Shea-Brown, Eric, and Lajoie, Guillaume
Subjects: Computer Science - Neural and Evolutionary Computing, Computer Science - Artificial Intelligence, Quantitative Biology - Neurons and Cognition
Abstract: In theoretical neuroscience, recent work leverages deep learning tools to explore how some network attributes critically influence its learning dynamics. Notably, initial weight distributions with small (resp. large) variance may yield a rich (resp. lazy) regime, where significant (resp. minor) changes to network states and representation are observed over the course of learning. However, in biology, neural circuit connectivity could exhibit a low-rank structure and therefore differs markedly from the random initializations generally used for these studies. As such, here we investigate how the structure of the initial weights -- in particular their effective rank -- influences the network learning regime. Through both empirical and theoretical analyses, we discover that high-rank initializations typically yield smaller network changes indicative of lazier learning, a finding we also confirm with experimentally-driven initial connectivity in recurrent neural networks. Conversely, low-rank initialization biases learning towards richer learning. Importantly, however, as an exception to this rule, we find lazier learning can still occur with a low-rank initialization that aligns with task and data statistics. Our research highlights the pivotal role of initial weight structures in shaping learning regimes, with implications for metabolic costs of plasticity and risks of catastrophic forgetting., Comment: Published at ICLR 2024
Published: 2023

10. Automatic and Efficient Customization of Neural Networks for ML Applications

Author: Liu, Yuhan, Wan, Chengcheng, Du, Kuntai, Hoffmann, Henry, Jiang, Junchen, Lu, Shan, and Maire, Michael
Subjects: Computer Science - Software Engineering, Computer Science - Artificial Intelligence, Computer Science - Networking and Internet Architecture
Abstract: ML APIs have greatly relieved application developers of the burden to design and train their own neural network models -- classifying objects in an image can now be as simple as one line of Python code to call an API. However, these APIs offer the same pre-trained models regardless of how their output is used by different applications. This can be suboptimal as not all ML inference errors can cause application failures, and the distinction between inference errors that can or cannot cause failures varies greatly across applications. To tackle this problem, we first study 77 real-world applications, which collectively use six ML APIs from two providers, to reveal common patterns of how ML API output affects applications' decision processes. Inspired by the findings, we propose ChameleonAPI, an optimization framework for ML APIs, which takes effect without changing the application source code. ChameleonAPI provides application developers with a parser that automatically analyzes the application to produce an abstract of its decision process, which is then used to devise an application-specific loss function that only penalizes API output errors critical to the application. ChameleonAPI uses the loss function to efficiently train a neural network model customized for each application and deploys it to serve API invocations from the respective application via existing interface. Compared to a baseline that selects the best-of-all commercial ML API, we show that ChameleonAPI reduces incorrect application decisions by 43%.
Published: 2023

11. OneAdapt: Fast Adaptation for Deep Learning Applications via Backpropagation

Author: Du, Kuntai, Liu, Yuhan, Hao, Yitian, Zhang, Qizheng, Wang, Haodong, Huang, Yuyang, Ananthanarayanan, Ganesh, and Jiang, Junchen
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Multimedia, Computer Science - Networking and Internet Architecture
Abstract: Deep learning inference on streaming media data, such as object detection in video or LiDAR feeds and text extraction from audio waves, is now ubiquitous. To achieve high inference accuracy, these applications typically require significant network bandwidth to gather high-fidelity data and extensive GPU resources to run deep neural networks (DNNs). While the high demand for network bandwidth and GPU resources could be substantially reduced by optimally adapting the configuration knobs, such as video resolution and frame rate, current adaptation techniques fail to meet three requirements simultaneously: adapt configurations (i) with minimum extra GPU or bandwidth overhead; (ii) to reach near-optimal decisions based on how the data affects the final DNN's accuracy, and (iii) do so for a range of configuration knobs. This paper presents OneAdapt, which meets these requirements by leveraging a gradient-ascent strategy to adapt configuration knobs. The key idea is to embrace DNNs' differentiability to quickly estimate the accuracy's gradient to each configuration knob, called AccGrad. Specifically, OneAdapt estimates AccGrad by multiplying two gradients: InputGrad (i.e. how each configuration knob affects the input to the DNN) and DNNGrad (i.e. how the DNN input affects the DNN inference output). We evaluate OneAdapt across five types of configurations, four analytic tasks, and five types of input data. Compared to state-of-the-art adaptation schemes, OneAdapt cuts bandwidth usage and GPU usage by 15-59% while maintaining comparable accuracy or improves accuracy by 1-5% while using equal or fewer resources., Comment: SoCC' 23
Published: 2023
Full Text: View/download PDF

12. Knowledge Crosswords: Geometric Knowledge Reasoning with Large Language Models

Author: Ding, Wenxuan, Feng, Shangbin, Liu, Yuhan, Tan, Zhaoxuan, Balachandran, Vidhisha, He, Tianxing, and Tsvetkov, Yulia
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: We propose Knowledge Crosswords, a geometric knowledge reasoning benchmark consisting of incomplete knowledge networks bounded by structured factual constraints, where LLMs are tasked with inferring the missing facts to meet all constraints. The novel setting of geometric knowledge reasoning necessitates new LM abilities beyond existing atomic/linear multi-hop QA, such as backtracking, verifying facts and constraints, reasoning with uncertainty, and more. Knowledge Crosswords contains 2,101 individual problems, covering diverse knowledge domains, and is further divided into three difficulty levels. We conduct extensive experiments to evaluate existing LLMs and approaches on Knowledge Crosswords. Results demonstrate that baseline approaches struggle with larger knowledge networks and semantically-equivalent entity distractors. In light of their limitations, we propose two new approaches, Staged Prompting and Verify-All, to augment LLMs' abilities for error-aware backtracking and constraint verification. Our Verify-All significantly outperforms prior methods and is more robust towards problems in the hard subset. Further analysis shows that geometric knowledge reasoning poses new challenges to LLMs' knowledge abilities, particularly in robustness towards varying option orders, complex structural constraints in knowledge networks, "none of the above" scenarios, and more.
Published: 2023

13. GRACE: Loss-Resilient Real-Time Video through Neural Codecs

Author: Cheng, Yihua, Zhang, Ziyi, Li, Hanchen, Arapin, Anton, Zhang, Yue, Zhang, Qizheng, Liu, Yuhan, Zhang, Xu, Yan, Francis Y., Mazumdar, Amrita, Feamster, Nick, and Jiang, Junchen
Subjects: Computer Science - Multimedia, Computer Science - Artificial Intelligence, Computer Science - Networking and Internet Architecture
Abstract: In real-time video communication, retransmitting lost packets over high-latency networks is not viable due to strict latency requirements. To counter packet losses without retransmission, two primary strategies are employed -- encoder-based forward error correction (FEC) and decoder-based error concealment. The former encodes data with redundancy before transmission, yet determining the optimal redundancy level in advance proves challenging. The latter reconstructs video from partially received frames, but dividing a frame into independently coded partitions inherently compromises compression efficiency, and the lost information cannot be effectively recovered by the decoder without adapting the encoder. We present a loss-resilient real-time video system called GRACE, which preserves the user's quality of experience (QoE) across a wide range of packet losses through a new neural video codec. Central to GRACE's enhanced loss resilience is its joint training of the neural encoder and decoder under a spectrum of simulated packet losses. In lossless scenarios, GRACE achieves video quality on par with conventional codecs (e.g., H.265). As the loss rate escalates, GRACE exhibits a more graceful, less pronounced decline in quality, consistently outperforming other loss-resilient schemes. Through extensive evaluation on various videos and real network traces, we demonstrate that GRACE reduces undecodable frames by 95% and stall duration by 90% compared with FEC, while markedly boosting video quality over error concealment methods. In a user study with 240 crowdsourced participants and 960 subjective ratings, GRACE registers a 38% higher mean opinion score (MOS) than other baselines.
Published: 2023

14. Agent-based Simulation for Online Mental Health Matching

Author: Liu, Yuhan, Fang, Anna, Moriarty, Glen, Kraut, Robert, and Zhu, Haiyi
Subjects: Computer Science - Human-Computer Interaction, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Online mental health communities (OMHCs) are an effective and accessible channel to give and receive social support for individuals with mental and emotional issues. However, a key challenge on these platforms is finding suitable partners to interact with given that mechanisms to match users are currently underdeveloped. In this paper, we collaborate with one of the world's largest OMHC to develop an agent-based simulation framework and explore the trade-offs in different matching algorithms. The simulation framework allows us to compare current mechanisms and new algorithmic matching policies on the platform, and observe their differing effects on a variety of outcome metrics. Our findings include that usage of the deferred-acceptance algorithm can significantly better the experiences of support-seekers in one-on-one chats while maintaining low waiting time. We note key design considerations that agent-based modeling reveals in the OMHC context, including the potential benefits of algorithmic matching on marginalized communities.
Published: 2023

15. TwiBot-22: Towards Graph-Based Twitter Bot Detection

Author: Feng, Shangbin, Tan, Zhaoxuan, Wan, Herun, Wang, Ningnan, Chen, Zilong, Zhang, Binchi, Zheng, Qinghua, Zhang, Wenqian, Lei, Zhenyu, Yang, Shujie, Feng, Xinshun, Zhang, Qingyue, Wang, Hongrui, Liu, Yuhan, Bai, Yuyang, Wang, Heng, Cai, Zijian, Wang, Yanbo, Zheng, Lijing, Ma, Zihan, Li, Jundong, and Luo, Minnan
Subjects: Computer Science - Social and Information Networks, Computer Science - Artificial Intelligence
Abstract: Twitter bot detection has become an increasingly important task to combat misinformation, facilitate social media moderation, and preserve the integrity of the online discourse. State-of-the-art bot detection methods generally leverage the graph structure of the Twitter network, and they exhibit promising performance when confronting novel Twitter bots that traditional methods fail to detect. However, very few of the existing Twitter bot detection datasets are graph-based, and even these few graph-based datasets suffer from limited dataset scale, incomplete graph structure, as well as low annotation quality. In fact, the lack of a large-scale graph-based Twitter bot detection benchmark that addresses these issues has seriously hindered the development and evaluation of novel graph-based bot detection approaches. In this paper, we propose TwiBot-22, a comprehensive graph-based Twitter bot detection benchmark that presents the largest dataset to date, provides diversified entities and relations on the Twitter network, and has considerably better annotation quality than existing datasets. In addition, we re-implement 35 representative Twitter bot detection baselines and evaluate them on 9 datasets, including TwiBot-22, to promote a fair comparison of model performance and a holistic understanding of research progress. To facilitate further research, we consolidate all implemented codes and datasets into the TwiBot-22 evaluation framework, where researchers could consistently evaluate new models and datasets. The TwiBot-22 Twitter bot detection benchmark and evaluation framework are publicly available at https://twibot22.github.io/, Comment: NeurIPS 2022, Datasets and Benchmarks Track
Published: 2022

16. Human Behavior Recognition Method Based on CEEMD-ES Radar Selection

Author: Zhang, Zhaolin, Song, Mingqi, Meng, Wugang, Liu, Yuhan, Li, Fengcong, Feng, Xiang, and Zhao, Yinan
Subjects: Electrical Engineering and Systems Science - Signal Processing, Computer Science - Artificial Intelligence
Abstract: In recent years, the millimeter-wave radar to identify human behavior has been widely used in medical,security, and other fields. When multiple radars are performing detection tasks, the validity of the features contained in each radar is difficult to guarantee. In addition, processing multiple radar data also requires a lot of time and computational cost. The Complementary Ensemble Empirical Mode Decomposition-Energy Slice (CEEMD-ES) multistatic radar selection method is proposed to solve these problems. First, this method decomposes and reconstructs the radar signal according to the difference in the reflected echo frequency between the limbs and the trunk of the human body. Then, the radar is selected according to the difference between the ratio of echo energy of limbs and trunk and the theoretical value. The time domain, frequency domain and various entropy features of the selected radar are extracted. Finally, the Extreme Learning Machine (ELM) recognition model of the ReLu core is established. Experiments show that this method can effectively select the radar, and the recognition rate of three kinds of human actions is 98.53%., Comment: 4 pages, 5 figures
Published: 2022

17. Beyond accuracy: generalization properties of bio-plausible temporal credit assignment rules

Author: Liu, Yuhan Helena, Ghosh, Arna, Richards, Blake A., Shea-Brown, Eric, and Lajoie, Guillaume
Subjects: Computer Science - Neural and Evolutionary Computing, Computer Science - Artificial Intelligence, Quantitative Biology - Neurons and Cognition
Abstract: To unveil how the brain learns, ongoing work seeks biologically-plausible approximations of gradient descent algorithms for training recurrent neural networks (RNNs). Yet, beyond task accuracy, it is unclear if such learning rules converge to solutions that exhibit different levels of generalization than their nonbiologically-plausible counterparts. Leveraging results from deep learning theory based on loss landscape curvature, we ask: how do biologically-plausible gradient approximations affect generalization? We first demonstrate that state-of-the-art biologically-plausible learning rules for training RNNs exhibit worse and more variable generalization performance compared to their machine learning counterparts that follow the true gradient more closely. Next, we verify that such generalization performance is correlated significantly with loss landscape curvature, and we show that biologically-plausible learning rules tend to approach high-curvature regions in synaptic weight space. Using tools from dynamical systems, we derive theoretical arguments and present a theorem explaining this phenomenon. This predicts our numerical results, and explains why biologically-plausible rules lead to worse and more variable generalization properties. Finally, we suggest potential remedies that could be used by the brain to mitigate this effect. To our knowledge, our analysis is the first to identify the reason for this generalization gap between artificial and biologically-plausible learning rules, which can help guide future investigations into how the brain learns solutions that generalize., Comment: NeurIPS 2022 Camera Ready Version
Published: 2022

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources

Refine your results

17 results on '"Liu, Yuhan"'

1. DroidSpeak: Enhancing Cross-LLM Communication

2. From a Tiny Slip to a Giant Leap: An LLM-Based Simulation for Fake News Evolution

3. Autoregressive Action Sequence Learning for Robotic Manipulation

4. Dr. GPT in Campus Counseling: Understanding Higher Education Students' Opinions on LLM-assisted Mental Health Services

5. Unleash the Power of Ellipsis: Accuracy-enhanced Sparse Vector Technique with Exponential Noise

6. Scaling Manipulation Learning with Visual Kinematic Chain Prediction

7. From Skepticism to Acceptance: Simulating the Attitude Dynamics Toward Fake News

8. MMoE: Robust Spoiler Detection with Multi-modal Information and Domain-aware Mixture-of-Experts

9. How connectivity structure shapes rich and lazy learning in neural circuits

10. Automatic and Efficient Customization of Neural Networks for ML Applications

11. OneAdapt: Fast Adaptation for Deep Learning Applications via Backpropagation

12. Knowledge Crosswords: Geometric Knowledge Reasoning with Large Language Models

13. GRACE: Loss-Resilient Real-Time Video through Neural Codecs

14. Agent-based Simulation for Online Mental Health Matching

15. TwiBot-22: Towards Graph-Based Twitter Bot Detection

16. Human Behavior Recognition Method Based on CEEMD-ES Radar Selection

17. Beyond accuracy: generalization properties of bio-plausible temporal credit assignment rules

Catalog

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Database

17 results on '"Liu, Yuhan"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources