1. Top-Theta Attention: Sparsifying Transformers by Compensated Thresholding
- Authors
Berestizshevsky, Konstantin; Andri, Renzo; Cavigelli, Lukas
- Subjects
Computer Science - Computation and Language; Computer Science - Artificial Intelligence; 68T01; I.2
- Abstract
The attention mechanism is essential for the impressive capabilities of transformer-based Large Language Models (LLMs). However, calculating attention is computationally intensive due to its quadratic dependency on the sequence length. We introduce a novel approach called Top-Theta Attention, or simply Top-$\theta$, which selectively prunes less essential attention elements by comparing them against carefully calibrated thresholds. This method greatly improves the efficiency of self-attention matrix multiplication while preserving model accuracy, reducing the number of required V cache rows by 3x during generative decoding and the number of attention elements by 10x during the prefill phase. Our method does not require model retraining; it needs only a brief calibration phase to become resilient to distribution shifts, so the thresholds do not have to be recalibrated for different datasets. Unlike top-k attention, Top-$\theta$ eliminates full-vector dependency, making it suitable for tiling and scale-out and avoiding costly top-k search. A key innovation of our approach is the development of efficient numerical compensation techniques, which help preserve model accuracy even under aggressive pruning of attention scores.
- Comment
8 pages, 11 figures, work under submission
- Published
2025
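
The abstract above describes pruning attention scores against calibrated per-head thresholds rather than running a top-k search. The following is a minimal, illustrative PyTorch sketch of that general idea, not the authors' implementation: the function name, the per-head threshold tensor `theta`, and the renormalization step used as a stand-in for the paper's numerical compensation are all assumptions made for clarity.

```python
import torch

def top_theta_attention_sketch(q, k, v, theta):
    """Illustrative threshold-pruned attention (hypothetical sketch).

    q, k, v : tensors of shape (batch, heads, seq, head_dim)
    theta   : (heads,) pre-calibrated thresholds on the pre-softmax scores
    """
    d = q.shape[-1]
    scores = (q @ k.transpose(-2, -1)) / d ** 0.5        # (B, H, Sq, Sk)
    probs = torch.softmax(scores, dim=-1)

    # Keep only attention elements whose scores clear the per-head threshold.
    keep = scores >= theta.view(1, -1, 1, 1)
    pruned_probs = probs * keep

    # Crude compensation: renormalize the surviving probability mass so each
    # row still sums to 1. This is one plausible stand-in for the paper's
    # numerical compensation techniques, not the authors' exact method.
    denom = pruned_probs.sum(dim=-1, keepdim=True).clamp_min(1e-9)
    pruned_probs = pruned_probs / denom

    return pruned_probs @ v
```

Because the decision to keep or drop each element depends only on a fixed threshold, it can be made per tile without first seeing the full score vector, which is the property the abstract contrasts with top-k selection.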