61,630 results for "Schaeffer A"
Search Results
2. Best-of-N Jailbreaking
- Author
- Hughes, John, Price, Sara, Lynch, Aengus, Schaeffer, Rylan, Barez, Fazl, Koyejo, Sanmi, Sleight, Henry, Jones, Erik, Perez, Ethan, and Sharma, Mrinank
- Subjects
- Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
- Abstract
We introduce Best-of-N (BoN) Jailbreaking, a simple black-box algorithm that jailbreaks frontier AI systems across modalities. BoN Jailbreaking works by repeatedly sampling variations of a prompt with a combination of augmentations - such as random shuffling or capitalization for textual prompts - until a harmful response is elicited. We find that BoN Jailbreaking achieves high attack success rates (ASRs) on closed-source language models, such as 89% on GPT-4o and 78% on Claude 3.5 Sonnet when sampling 10,000 augmented prompts. Further, it is similarly effective at circumventing state-of-the-art open-source defenses like circuit breakers. BoN also seamlessly extends to other modalities: it jailbreaks vision language models (VLMs) such as GPT-4o and audio language models (ALMs) like Gemini 1.5 Pro, using modality-specific augmentations. BoN reliably improves when we sample more augmented prompts. Across all modalities, ASR, as a function of the number of samples (N), empirically follows power-law-like behavior for many orders of magnitude. BoN Jailbreaking can also be composed with other black-box algorithms for even more effective attacks - combining BoN with an optimized prefix attack achieves up to a 35% increase in ASR. Overall, our work indicates that, despite their capability, language models are sensitive to seemingly innocuous changes to inputs, which attackers can exploit across modalities.
- Published
- 2024
3. Jailbreak Defense in a Narrow Domain: Limitations of Existing Methods and a New Transcript-Classifier Approach
- Author
- Wang, Tony T., Hughes, John, Sleight, Henry, Schaeffer, Rylan, Agrawal, Rajashree, Barez, Fazl, Sharma, Mrinank, Mu, Jesse, Shavit, Nir, and Perez, Ethan
- Subjects
- Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Cryptography and Security
- Abstract
Defending large language models against jailbreaks so that they never engage in a broadly-defined set of forbidden behaviors is an open problem. In this paper, we investigate the difficulty of jailbreak defense when we only want to forbid a narrowly-defined set of behaviors. As a case study, we focus on preventing an LLM from helping a user make a bomb. We find that popular defenses such as safety training, adversarial training, and input/output classifiers are unable to fully solve this problem. In pursuit of a better solution, we develop a transcript-classifier defense which outperforms the baseline defenses we test. However, our classifier defense still fails in some circumstances, which highlights the difficulty of jailbreak defense even in a narrow domain.
- Comment: Accepted to the AdvML-Frontiers and SoLaR workshops at NeurIPS 2024
- Published
- 2024
4. VICON: Vision In-Context Operator Networks for Multi-Physics Fluid Dynamics Prediction
- Author
- Cao, Yadi, Liu, Yuxuan, Yang, Liu, Yu, Rose, Schaeffer, Hayden, and Osher, Stanley
- Subjects
- Computer Science - Machine Learning, Mathematics - Numerical Analysis, Physics - Fluid Dynamics
- Abstract
In-Context Operator Networks (ICONs) are models that learn operators across different types of PDEs using a few-shot, in-context approach. Although they show successful generalization to various PDEs, existing methods treat each data point as a single token and suffer from computational inefficiency when processing dense data, limiting their application in higher spatial dimensions. In this work, we propose Vision In-Context Operator Networks (VICON), incorporating a vision transformer architecture that efficiently processes 2D functions through patch-wise operations. We evaluate our method on three fluid dynamics datasets, demonstrating both superior performance (reducing the scaled $L^2$ error by $40\%$ and $61.6\%$ on two compressible-flow benchmark datasets, respectively) and computational efficiency (requiring only one-third of the inference time per frame) in long-term rollout predictions, compared to the current state-of-the-art sequence-to-sequence model with fixed timestep prediction: Multiple Physics Pretraining (MPP). Compared to MPP, our method preserves the benefits of in-context operator learning, enabling flexible context formation when dealing with insufficient frame counts or varying timestep values.
- Published
- 2024
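The patch-wise tokenization VICON borrows from vision transformers is easy to illustrate. The sketch below is not the paper's implementation; the field size and patch size are arbitrary assumptions, and the real model embeds patches with learned projections rather than raw flattening.

```python
import numpy as np

def patchify(field: np.ndarray, p: int) -> np.ndarray:
    """Split a 2D field (H, W) into flattened p-by-p patch tokens."""
    H, W = field.shape
    assert H % p == 0 and W % p == 0
    return (field.reshape(H // p, p, W // p, p)
                 .transpose(0, 2, 1, 3)
                 .reshape(-1, p * p))

field = np.random.rand(64, 64)   # e.g. one frame of a 2D flow variable
tokens = patchify(field, 8)
print(tokens.shape)              # (64, 64): 64 tokens instead of 4096 point tokens
```

Treating 64 patches rather than 4,096 individual grid points as tokens is what keeps the transformer's quadratic attention cost manageable in two spatial dimensions.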
5. Interpretation of High-Dimensional Regression Coefficients by Comparison with Linearized Compressing Features
- Author
- Schaeffer, Joachim, Rhyu, Jinwook, Droop, Robin, Findeisen, Rolf, and Braatz, Richard
- Subjects
- Computer Science - Machine Learning, Statistics - Machine Learning
- Abstract
Linear regression is often deemed inherently interpretable; however, challenges arise for high-dimensional data. We focus on further understanding how linear regression approximates nonlinear responses from high-dimensional functional data, motivated by predicting cycle life for lithium-ion batteries. We develop a linearization method to derive feature coefficients, which we compare with the closest regression coefficients of the path of regression solutions. We showcase the methods on battery data case studies where a single nonlinear compressing feature, $g\colon \mathbb{R}^p \to \mathbb{R}$, is used to construct a synthetic response, $\mathbf{y} \in \mathbb{R}$. This unifying view of linear regression and compressing features for high-dimensional functional data helps to understand (1) how regression coefficients are shaped in the highly regularized domain and how they relate to linearized feature coefficients and (2) how the shape of regression coefficients changes as a function of regularization to approximate nonlinear responses by exploiting local structures.
- Comment: This manuscript is a short communication. 9 pages, 4 figures
- Published
- 2024
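The comparison described above can be imitated numerically: build a response through a single nonlinear compressing feature, then compare ridge coefficients along the regularization path with the linearized feature coefficients (the gradient of $g$). A toy sketch, with an assumed $g(x) = \tanh(w^\top x)$ and synthetic correlated features standing in for battery data:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
idx = np.arange(50)
C = 0.9 ** np.abs(idx[:, None] - idx[None, :])   # smooth, correlated "functional" features
X = rng.normal(size=(200, 50)) @ np.linalg.cholesky(C).T
w = np.linspace(0.0, 1.0, 50)
y = np.tanh(X @ w)                               # response built from one compressing feature g

x0 = X.mean(axis=0)
grad_g = (1.0 - np.tanh(x0 @ w) ** 2) * w        # linearized feature coefficients at the mean

for alpha in (1e-2, 1e0, 1e2):                   # walk the regularization path
    beta = Ridge(alpha=alpha).fit(X, y).coef_
    cos = beta @ grad_g / (np.linalg.norm(beta) * np.linalg.norm(grad_g))
    print(f"alpha={alpha:8.2g}  cosine(beta, grad g) = {cos:.3f}")
```

With correlated features, the direction of the ridge coefficients shifts along the path, which is the kind of regularization-dependent shape change the abstract discusses.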
6. Set-Based Retrograde Analysis: Precomputing the Solution to 24-card Bridge Double Dummy Deals
- Author
- Stone, Isaac, Sturtevant, Nathan R., and Schaeffer, Jonathan
- Subjects
- Computer Science - Artificial Intelligence
- Abstract
Retrograde analysis is used in game-playing programs to solve states at the end of a game, working backwards toward the start of the game. The algorithm iterates through and computes the perfect-play value for as many states as resources allow. We introduce setrograde analysis, which achieves the same results by operating on sets of states that have the same game value. The algorithm is demonstrated by computing exact solutions for Bridge double dummy card-play. We strongly solve all deals with 24 cards remaining to be played ($10^{27}$ states, which can be reduced to $10^{15}$ states using preexisting techniques). The setrograde algorithm performs a factor of $10^3$ fewer search operations than a standard retrograde algorithm, producing a database with a factor of $10^4$ fewer entries. For applicable domains, this allows retrograde searching to reach unprecedented search depths.
- Published
- 2024
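For readers new to retrograde analysis, the backward iteration is easy to see on a toy subtraction game (take 1 or 2 stones; taking the last stone wins). This sketch shows only the standard state-by-state algorithm; the paper's set-based variant, which groups states sharing the same game value, is not reproduced here.

```python
def retrograde_solve(max_stones: int) -> dict:
    """Classic retrograde analysis on a toy game: states nearest the end are solved first."""
    value = {0: "LOSS"}  # terminal state: the player to move has already lost
    for n in range(1, max_stones + 1):
        children = [value[n - take] for take in (1, 2) if n - take >= 0]
        # A state is a WIN iff some move leads to a LOSS for the opponent.
        value[n] = "WIN" if "LOSS" in children else "LOSS"
    return value

print(retrograde_solve(12))  # LOSS exactly at multiples of 3
```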
7. DeepONet as a Multi-Operator Extrapolation Model: Distributed Pretraining with Physics-Informed Fine-Tuning
- Author
- Zhang, Zecheng, Moya, Christian, Lu, Lu, Lin, Guang, and Schaeffer, Hayden
- Subjects
- Computer Science - Machine Learning
- Abstract
We propose a novel fine-tuning method that achieves multi-operator learning by training a distributed neural operator on diverse function data and then zero-shot fine-tuning the neural network using physics-informed losses for downstream tasks. Operator learning effectively approximates solution operators for PDEs and various PDE-related problems, yet it often struggles to generalize to new tasks. To address this, we investigate fine-tuning a pretrained model while carefully selecting an initialization that enables rapid adaptation to new tasks with minimal data. Our approach uses distributed learning to integrate data from various operators during pre-training, while physics-informed methods enable zero-shot fine-tuning, minimizing the reliance on downstream data. We investigate standard fine-tuning and Low-Rank Adaptation fine-tuning, applying both to train complex nonlinear target operators that are difficult to learn using random initialization alone. Through comprehensive numerical examples, we demonstrate the advantages of our approach, showcasing significant improvements in accuracy. Our findings provide a robust framework for advancing multi-operator learning and highlight the potential of transfer learning techniques in this domain.
- Published
- 2024
8. Novel Clinical-Grade Prostate Cancer Detection and Grading Model: Development and Prospective Validation Using Real World Data, with Performance Assessment on IHC Requested Cases
- Author
- Nateghi, Ramin, Zhou, Ruoji, Saft, Madeline, Schnauss, Marina, Neill, Clayton, Alam, Ridwan, Handa, Nicole, Huang, Mitchell, Li, Eric V, Goldstein, Jeffery A, Schaeffer, Edward M, Nadim, Menatalla, Pourakpour, Fattaneh, Isaila, Bogdan, Felicelli, Christopher, Mehta, Vikas, Nezami, Behtash G, Ross, Ashley, Yang, Ximing, and Cooper, Lee AD
- Subjects
- Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
- Abstract
Artificial intelligence may assist healthcare systems in meeting increasing demand for pathology services while maintaining diagnostic quality and reducing turnaround time and costs. We aimed to investigate the performance of an institutionally developed system for prostate cancer detection, grading, and workflow optimization and to contrast this with commercial alternatives. From August 2021 to March 2023, we scanned 21,396 slides from 1,147 patients with positive biopsies. We developed models for cancer detection, grading, and screening of equivocal cases for IHC ordering. We compared a task-specific model trained using the PANDA dataset of prostate cancer biopsies with one built using features extracted by the general-purpose histology foundation model, UNI, and compared their performance on an unfiltered, prospectively collected dataset that reflects our patient population (1,737 slides, 95 patients). We evaluated the contributions of a bespoke model designed to improve sensitivity in detecting small cancer foci and scoring of broader patterns observed at lower resolution. We found high concordance between the developed systems and the pathologist reference in detection (AUC 98.5, sensitivity 95.0, and specificity 97.8), ISUP grading (quadratic Cohen's kappa 0.869), and grade group 3 or higher (AUC 97.5, sensitivity 94.9, specificity 96.6), comparable to published data from commercial systems. Screening could reduce IHC ordering for equivocal cases by 44.5% with an overall error rate of 1.8% (1.4% false positive, 0.4% false negative rates). Institutions like academic medical centers that have high scanning volumes and report abstraction capabilities can develop accurate computational pathology models for internal use. These models have the potential to aid in a quality control role and to improve workflow in the pathology lab to help meet future challenges in prostate cancer diagnosis.
- Published
- 2024
9. On minimal positive heights for blocks of almost quasi-simple groups
- Author
- Malle, Gunter and Fry, A. A. Schaeffer
- Subjects
- Mathematics - Representation Theory, Mathematics - Group Theory, 20C15, 20C20, 20C33
- Abstract
The Eaton-Moretó conjecture extends the recently-proven Brauer height zero conjecture to blocks with non-abelian defect group, positing equality between the minimal positive heights of a block of a finite group and its defect group. Here we provide further evidence for the inequality in this conjecture that is not implied by Dade's conjecture. Specifically, we consider minimal counter-examples and show that these cannot be found among almost quasi-simple groups for $p\ge5$. Along the way, we observe that most such blocks have minimal positive height equal to 1.
- Published
- 2024
10. Pushing the Boundaries: Interferometric Mass Photometry at the Quantum Limit of Sensitivity
- Author
- Müller, Fabian, Köse, Emre, Meixner, Alfred J., Schäffer, Erik, and Braun, Daniel
- Subjects
- Quantum Physics, Physics - Instrumentation and Detectors, Physics - Optics
- Abstract
We present an innovative optical imaging system for measuring parameters of a small particle such as a macromolecule or nanoparticle at the quantum limit of sensitivity. In comparison to the conventional confocal interferometric scattering (iSCAT) approach, our setup adds a second arm to form a Michelson interferometer that allows us to tune a relative phase. We evaluate the quantum Cramér-Rao bound (QCRB) for different quantum states, including single-mode coherent states, multi-frequency coherent states, and phase-averaged coherent states. Our results show that the proposed setup can achieve the QCRB of sensitivity and outperform iSCAT for all considered quantum states for mass and phase estimation of a particle.
- Published
- 2024
11. ZIP-FIT: Embedding-Free Data Selection via Compression-Based Alignment
- Author
- Obbad, Elyas, Mlauzi, Iddah, Miranda, Brando, Schaeffer, Rylan, Obbad, Kamal, Bedi, Suhana, and Koyejo, Sanmi
- Subjects
- Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
- Abstract
Data selection is crucial for optimizing language model (LM) performance on specific tasks, yet most existing methods fail to effectively consider the target task distribution. Current approaches either ignore task-specific requirements entirely or rely on approximations that fail to capture the nuanced patterns needed for tasks like Autoformalization or code generation. Methods that do consider the target distribution often rely on simplistic representations, like hashed n-gram features, which can lead to collisions and introduce noise. We introduce ZIP-FIT, a data selection framework that uses gzip compression to directly measure alignment between potential training data and the target task distribution. In extensive evaluations on Autoformalization and Python code generation, ZIP-FIT significantly outperforms leading baselines like DSIR and D4. Models trained on ZIP-FIT-selected data achieve their lowest cross-entropy loss up to 85.1\% faster than baselines, demonstrating that better task alignment leads to more efficient learning. In addition, ZIP-FIT performs selection up to 65.8\% faster than DSIR and two orders of magnitude faster than D4. Notably, ZIP-FIT shows that smaller, well-aligned datasets often outperform larger but less targeted ones, demonstrating that a small amount of higher-quality data is superior to a large amount of lower-quality data. Our results imply that task-aware data selection is crucial for efficient domain adaptation, and that compression offers a principled way to measure task alignment. By showing that targeted data selection can dramatically improve task-specific performance, our work provides new insights into the relationship between data quality, task alignment, and model learning efficiency.
- Published
- 2024
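The abstract does not spell out ZIP-FIT's exact scoring rule, but the idea of measuring alignment with gzip can be sketched with the classic normalized compression distance: a candidate that compresses well together with target-domain text is better aligned. The example strings are invented.

```python
import gzip

def gzip_size(text: str) -> int:
    return len(gzip.compress(text.encode("utf-8")))

def ncd(a: str, b: str) -> float:
    """Normalized compression distance: smaller means better aligned."""
    ca, cb, cab = gzip_size(a), gzip_size(b), gzip_size(a + b)
    return (cab - min(ca, cb)) / max(ca, cb)

target = "theorem add_comm (a b : Nat) : a + b = b + a"    # target-task example
candidates = [
    "lemma mul_comm' (x y : Nat) : x * y = y * x",
    "The weather in Paris is mild in October.",
]
for c in sorted(candidates, key=lambda c: ncd(c, target)):  # select best-aligned data first
    print(f"{ncd(c, target):.3f}  {c!r}")
```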
12. Collapse or Thrive? Perils and Promises of Synthetic Data in a Self-Generating World
- Author
- Kazdan, Joshua, Schaeffer, Rylan, Dey, Apratim, Gerstgrasser, Matthias, Rafailov, Rafael, Donoho, David L., and Koyejo, Sanmi
- Subjects
- Computer Science - Machine Learning, Computer Science - Artificial Intelligence
- Abstract
The increasing presence of AI-generated content on the internet raises a critical question: What happens when generative machine learning models are pretrained on web-scale datasets containing data created by earlier models? Some authors prophesy $\textit{model collapse}$ under a "$\textit{replace}$" scenario: a sequence of models, the first trained with real data and each later one trained only on synthetic data from its preceding model. In this scenario, models successively degrade. Others see collapse as easily avoidable; in an "$\textit{accumulate}$" scenario, a sequence of models is trained, but each training uses all real and synthetic data generated so far. In this work, we deepen and extend the study of these contrasting scenarios. First, collapse versus avoidance of collapse is studied by comparing the replace and accumulate scenarios on each of three prominent generative modeling settings; we find the same contrast emerges in all three settings. Second, we study a compromise scenario: the available data remains the same as in the accumulate scenario, but, unlike $\textit{accumulate}$ and like $\textit{replace}$, each model is trained using a fixed compute budget; we demonstrate that model test loss on real data is larger than in the $\textit{accumulate}$ scenario, but apparently plateaus, unlike the divergence seen with $\textit{replace}$. Third, we study the relative importance of cardinality and proportion of real data for avoiding model collapse. Surprisingly, we find a non-trivial interaction between real and synthetic data, where the value of synthetic data for reducing test loss depends on the absolute quantity of real data. Our insights are particularly important when forecasting whether future frontier generative models will collapse or thrive, and our results open avenues for empirically and mathematically studying the context-dependent value of synthetic data.
- Published
- 2024
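The replace-versus-accumulate contrast can be reproduced in miniature with a Gaussian fit-and-resample loop. This toy model is an illustration, not one of the paper's three settings: under replace, the expected variance of the fitted distribution shrinks every generation, while accumulate anchors each fit to the original real data.

```python
import numpy as np

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, 50)   # a small real dataset exaggerates the effect

def fit_and_sample(data: np.ndarray, n: int) -> np.ndarray:
    return rng.normal(data.mean(), data.std(), n)  # fit a Gaussian, sample synthetic data

replace = real
accumulate = [real]
for _ in range(100):
    replace = fit_and_sample(replace, 50)                              # keep only newest data
    accumulate.append(fit_and_sample(np.concatenate(accumulate), 50))  # keep everything

print("replace std:   ", round(float(replace.std()), 3))   # expected variance shrinks each step
print("accumulate std:", round(float(np.concatenate(accumulate).std()), 3))  # stays near 1
```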
13. Systematic Feature Design for Cycle Life Prediction of Lithium-Ion Batteries During Formation
- Author
- Rhyu, Jinwook, Schaeffer, Joachim, Li, Michael L., Cui, Xiao, Chueh, William C., Bazant, Martin Z., and Braatz, Richard D.
- Subjects
- Computer Science - Machine Learning, Statistics - Applications, I.2.6
- Abstract
Optimization of the formation step in lithium-ion battery manufacturing is challenging due to limited physical understanding of solid electrolyte interphase formation and the long testing time (~100 days) for cells to reach the end of life. We propose a systematic feature design framework that requires minimal domain knowledge for accurate cycle life prediction during formation. Two simple Q(V) features designed from our framework, extracted from formation data without any additional diagnostic cycles, achieved a median of 9.20% error for cycle life prediction, outperforming thousands of autoML models using pre-defined features. We attribute the strong performance of our designed features to their physical origins - the voltage ranges identified by our framework capture the effects of formation temperature and microscopic particle resistance heterogeneity. By designing highly interpretable features, our approach can accelerate formation research, leveraging the interplay between data-driven feature design and mechanistic understanding.
- Comment: Main: 27 pages, 6 figures. SI: 13 pages, 9 figures
- Published
- 2024
14. Safe Learning-Based Optimization of Model Predictive Control: Application to Battery Fast-Charging
- Author
- Hirt, Sebastian, Höhl, Andreas, Pohlodek, Johannes, Schaeffer, Joachim, Pfefferkorn, Maik, Braatz, Richard D., and Findeisen, Rolf
- Subjects
- Electrical Engineering and Systems Science - Systems and Control, Computer Science - Machine Learning
- Abstract
Model predictive control (MPC) is a powerful tool for controlling complex nonlinear systems under constraints, but often struggles with model uncertainties and the design of suitable cost functions. To address these challenges, we discuss an approach that integrates MPC with safe Bayesian optimization to optimize long-term closed-loop performance despite significant model-plant mismatches. By parameterizing the MPC stage cost function using a radial basis function network, we employ Bayesian optimization as a multi-episode learning strategy to tune the controller without relying on precise system models. This method mitigates conservativeness introduced by overly cautious soft constraints in the MPC cost function and provides probabilistic safety guarantees during learning, ensuring that safety-critical constraints are met with high probability. As a practical application, we apply our approach to fast charging of lithium-ion batteries, a challenging task due to the complicated battery dynamics and strict safety requirements, with the additional requirement that the method be implementable in real time. Simulation results demonstrate that, in the context of model-plant mismatch, our method reduces charging times compared to traditional MPC methods while maintaining safety. This work extends previous research by emphasizing closed-loop constraint satisfaction and offers a promising solution for enhancing performance in systems where model uncertainties and safety are critical concerns.
- Comment: 7 pages, 4 figures, submitted to ACC 2025
- Published
- 2024
15. Succinct Fermion Data Structures
- Author
- Carolan, Joseph and Schaeffer, Luke
- Subjects
- Quantum Physics
- Abstract
Simulating fermionic systems on a quantum computer requires representing fermionic states using qubits. The complexity of many simulation algorithms depends on the complexity of implementing rotations generated by fermionic creation-annihilation operators, and the space depends on the number of qubits used. While standard fermion encodings like Jordan-Wigner are space optimal for arbitrary fermionic systems, physical symmetries like particle conservation can reduce the number of physical configurations, allowing improved space complexity. Such space saving is only feasible if the gate overhead is small, suggesting a (quantum) data structures problem, wherein one would like to minimize space used to represent a fermionic state, while still enabling efficient rotations. We define a structure which naturally captures mappings from fermions to systems of qubits. We then instantiate it in two ways, giving rise to two new second-quantized fermion encodings of $F$ fermions in $M$ modes. An information theoretic minimum of $\mathcal{I}:=\lceil\log \binom{M}{F}\rceil$ qubits is required for such systems, a bound we nearly match over the entire parameter regime. (1) Our first construction uses $\mathcal I+o(\mathcal I)$ qubits when $F=o(M)$, and allows rotations generated by creation-annihilation operators in $O(\mathcal I)$ gates and $O(\log M \log \log M)$ depth. (2) Our second construction uses $\mathcal I+O(1)$ qubits when $F=\Theta(M)$, and allows rotations generated by creation-annihilation operators in $O(\mathcal I^3)$ gates. In relation to comparable prior work, the first represents a polynomial improvement in both space and gate complexity (against Kirby et al. 2022), and the second represents an exponential improvement in gate complexity at the cost of only a constant number of additional qubits (against Harrison et al. or Shee et al. 2022), in the described parameter regimes.
- Published
- 2024
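The information-theoretic minimum quoted above, $\mathcal{I}=\lceil\log\binom{M}{F}\rceil$ (base-2, since the count is in qubits), is simple to evaluate against the $M$ qubits that Jordan-Wigner uses regardless of $F$:

```python
from math import ceil, comb, log2

def info_bound(M: int, F: int) -> int:
    """Minimum qubits to represent F fermions in M modes: ceil(log2 C(M, F))."""
    return ceil(log2(comb(M, F)))

for M, F in [(100, 5), (100, 50), (1000, 10)]:
    print(f"M={M:4d}, F={F:3d}: Jordan-Wigner {M:4d} qubits, "
          f"information bound {info_bound(M, F):4d} qubits")
```

The savings are largest in the dilute regime $F = o(M)$, which is exactly where the paper's first construction operates.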
16. Analyzing the Effective Use of Augmented Reality Glasses in University Physics Laboratory Courses for the Example Topic of Optical Polarization
- Author
- Daniel Laumann, Paul Schlummer, Adrian Abazi, Rasmus Borkamp, Jonas Lauströer, Wolfram Pernice, Carsten Schuck, Reinhard Schulz-Schaeffer, and Stefan Heusler
- Abstract
For nearly two decades, augmented reality (AR) has found diverse applications in education, particularly in science education, where its efficacy has been supported by relevant theories and many empirical studies. However, previous studies have revealed the following research deficit: while AR technology appears to influence learning-related variables, at the time of this study little research had been published on the use of AR glasses in physics, a discipline for which this technology seems particularly promising in the context of laboratory experiments. Thus, the present study uses an experimental comparison-group design to investigate how the use of AR glasses in a physics laboratory experiment (compared to a learning setting without AR) influences students' motivation to learn, their cognitive load during the learning process, and their learning achievement. The study (sample size N = 75) investigated the impact of AR glasses in a physics laboratory experiment on optical polarization. Results align with prior research, indicating heightened motivation among learners using AR applications. However, the absence of a significant difference in cognitive load between AR and non-AR learners was unexpected. Despite expectations based on spatial contiguity, learners with AR showed no advantage in learning achievement, challenging existing meta-analyses in physics education. These findings suggest a need to shift focus from surface features, like the specific AR technology, to the content design of AR applications. Future studies should analyze the deep structure of AR applications, identifying features conducive to learning.
- Published
- 2024
17. Prostate Cancer Risk Stratification in NRG Oncology Phase III Randomized Trials Using Multimodal Deep Learning With Digital Histopathology.
- Author
- Tward, Jonathan, Huang, Huei-Chung, Esteva, Andre, Mohamad, Osama, van der Wal, Douwe, Simko, Jeffry, DeVries, Sandy, Zhang, Jingbin, Joun, Songwan, Showalter, Timothy, Schaeffer, Edward, Morgan, Todd, Monson, Jedidiah, Wallace, James, Bahary, Jean-Paul, Sandler, Howard, Spratt, Daniel, Rodgers, Joseph, Feng, Felix, and Tran, Phuoc
- Subjects
- Humans, Male, Prostatic Neoplasms, Deep Learning, Risk Assessment, Aged, Randomized Controlled Trials as Topic, Middle Aged, Clinical Trials, Phase III as Topic
- Abstract
PURPOSE: Current clinical risk stratification methods for localized prostate cancer are suboptimal, leading to over- and undertreatment. Recently, machine learning approaches using digital histopathology have shown superior prognostic ability in phase III trials. This study aims to develop a clinically usable risk grouping system using multimodal artificial intelligence (MMAI) models that outperform current National Comprehensive Cancer Network (NCCN) risk groups. MATERIALS AND METHODS: The cohort comprised 9,787 patients with localized prostate cancer from eight NRG Oncology randomized phase III trials, treated with radiation therapy, androgen deprivation therapy, and/or chemotherapy. Locked MMAI models, which used digital histopathology images and clinical data, were applied to each patient. Expert consensus on cut points defined low-, intermediate-, and high-risk groups on the basis of 10-year distant metastasis rates of 3% and 10%. The MMAI's reclassification and prognostic performance were compared with the three-tier NCCN risk groups. RESULTS: The median follow-up for censored patients was 7.9 years. According to NCCN risk categories, 30.4% of patients were low-risk, 25.5% intermediate-risk, and 44.1% high-risk. The MMAI risk classification identified 43.5% of patients as low-risk, 34.6% as intermediate-risk, and 21.8% as high-risk. MMAI reclassified 1,039 (42.0%) patients initially categorized by NCCN. Despite the MMAI low-risk group being larger than the NCCN low-risk group, the 10-year metastasis risks were comparable: 1.7% (95% CI, 0.2 to 3.2) for NCCN and 3.2% (95% CI, 1.7 to 4.7) for MMAI. The overall 10-year metastasis risk for NCCN high-risk patients was 16.6%, with MMAI further stratifying this group into low-, intermediate-, and high-risk, showing metastasis rates of 3.4%, 8.2%, and 26.3%, respectively. CONCLUSION: The MMAI risk grouping system expands the population of men identified as having low metastatic risk and accurately pinpoints a high-risk subset with elevated metastasis rates. This approach aims to prevent both overtreatment and undertreatment in localized prostate cancer, facilitating shared decision making.
- Published
- 2024
18. Neural Scaling Laws of Deep ReLU and Deep Operator Network: A Theoretical Study
- Author
- Liu, Hao, Zhang, Zecheng, Liao, Wenjing, and Schaeffer, Hayden
- Subjects
- Computer Science - Machine Learning, Statistics - Machine Learning
- Abstract
Neural scaling laws play a pivotal role in the performance of deep neural networks and have been observed in a wide range of tasks. However, a complete theoretical framework for understanding these scaling laws remains underdeveloped. In this paper, we explore the neural scaling laws for deep operator networks, which involve learning mappings between function spaces, with a focus on the Chen and Chen style architecture. These approaches, which include the popular Deep Operator Network (DeepONet), approximate the output functions using a linear combination of learnable basis functions and coefficients that depend on the input functions. We establish a theoretical framework to quantify the neural scaling laws by analyzing their approximation and generalization errors. We articulate the relationship between the approximation and generalization errors of deep operator networks and key factors such as network model size and training data size. Moreover, we address cases where input functions exhibit low-dimensional structures, allowing us to derive tighter error bounds. These results also hold for deep ReLU networks and other similar structures. Our results offer a partial explanation of the neural scaling laws in operator learning and provide a theoretical foundation for their applications.
- Published
- 2024
19. Software for the SpaceDREAM Robotic Arm
- Author
- Mühlbauer, Maximilian, Chalon, Maxime, Ulmer, Maximilian, and Albu-Schäffer, Alin
- Subjects
- Computer Science - Robotics
- Abstract
Impedance-controlled robots are widely used on Earth to perform interaction-rich tasks and will be a key enabler for In-Space Servicing, Assembly and Manufacturing (ISAM) activities. This paper introduces the software architecture used on the On-Board Computer (OBC) for the planned SpaceDREAM mission, conducted by the German Aerospace Center (DLR) in cooperation with KINETIK Space GmbH and the Technical University of Munich (TUM), which aims to validate such a robotic arm in Low Earth Orbit (LEO). During the mission, several free motion as well as contact tasks are to be performed in order to verify proper functionality of the robot in position and impedance control on joint level as well as in Cartesian control. The tasks are selected to be representative of subsequent servicing missions, e.g. requiring interface docking or precise manipulation. The software on the OBC commands the robot's joints via SpaceWire to perform those mission tasks, reads camera images and data from additional sensors, and sends telemetry data through an Ethernet link via the spacecraft down to Earth. It is set up to execute a predefined mission after receiving a start signal from the spacecraft, while it should be extendable to receive commands from Earth for later missions. The core design principle was to reuse as much existing software as possible and to stay close to existing robot software stacks at DLR. This allowed for a quick full operational start of the robot arm compared to a custom development of all robot software, a lower entry barrier for software developers, as well as reuse of existing libraries. While not every line of code can be tested with this design, most of the software has already proven its functionality through daily execution on multiple robot systems.
- Published
- 2024
20. Time-Series Forecasting, Knowledge Distillation, and Refinement within a Multimodal PDE Foundation Model
- Author
- Jollie, Derek, Sun, Jingmin, Zhang, Zecheng, and Schaeffer, Hayden
- Subjects
- Computer Science - Machine Learning
- Abstract
Symbolic encoding has been used in multi-operator learning as a way to embed additional information for distinct time-series data. For spatiotemporal systems described by time-dependent partial differential equations, the equation itself provides an additional modality to identify the system. The utilization of symbolic expressions alongside time-series samples allows for the development of multimodal predictive neural networks. A key challenge with current approaches is that the symbolic information, i.e. the equations, must be manually preprocessed (simplified, rearranged, etc.) to match and relate to the existing token library, which increases costs and reduces flexibility, especially when dealing with new differential equations. We propose a new token library based on SymPy to encode differential equations as an additional modality for time-series models. The proposed approach incurs minimal cost, is automated, and maintains high prediction accuracy for forecasting tasks. Additionally, we include a Bayesian filtering module that connects the different modalities to refine the learned equation. This improves the accuracy of the learned symbolic representation and the predicted time series.
- Published
- 2024
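The abstract says the token library is built on SymPy but gives no details, so the following is only a minimal illustration of the general idea: parse a PDE into a SymPy expression tree and emit a prefix token sequence via preorder traversal. Burgers' equation is an assumed example.

```python
import sympy as sp

x, t, nu = sp.symbols("x t nu")
u = sp.Function("u")(x, t)
# Burgers' equation residual: u_t + u * u_x - nu * u_xx
expr = sp.Derivative(u, t) + u * sp.Derivative(u, x) - nu * sp.Derivative(u, x, 2)

def tokenize(e):
    """Emit a prefix (Polish) token sequence via SymPy's preorder traversal."""
    return [type(node).__name__ if node.args else str(node)
            for node in sp.preorder_traversal(e)]

print(tokenize(expr))  # e.g. ['Add', 'Derivative', 'u', ...] fed to the model as tokens
```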
21. PROSE-FD: A Multimodal PDE Foundation Model for Learning Multiple Operators for Forecasting Fluid Dynamics
- Author
- Liu, Yuxuan, Sun, Jingmin, He, Xinjie, Pinney, Griffin, Zhang, Zecheng, and Schaeffer, Hayden
- Subjects
- Computer Science - Machine Learning, Mathematics - Numerical Analysis, Physics - Fluid Dynamics
- Abstract
We propose PROSE-FD, a zero-shot multimodal PDE foundation model for simultaneous prediction of heterogeneous two-dimensional physical systems related to distinct fluid dynamics settings. These systems include shallow water equations and the Navier-Stokes equations with incompressible and compressible flow, regular and complex geometries, and different buoyancy settings. This work presents a new transformer-based multi-operator learning approach that fuses symbolic information to perform operator-based data prediction, i.e., non-autoregressive prediction. By incorporating multiple modalities in the inputs, the PDE foundation model builds in a pathway for including mathematical descriptions of the physical behavior. We pre-train our foundation model on 6 parametric families of equations collected from 13 datasets, including over 60K trajectories. Our model outperforms popular operator learning, computer vision, and multi-physics models in benchmark forward prediction tasks. We test our architecture choices with ablation studies.
- Published
- 2024
22. Modeling of a continuous superradiant laser on the sub-mHz $^1$S$_0\,\rightarrow\,^3$P$_0$ transition in neutral strontium-88
- Author
- Dubey, Swadheen, Kazakov, Georgy A., Heizenreder, Benedikt, Zhou, Sheng, Bennetts, Shayne, Schäffer, Stefan Alaric, Sitaram, Ananya, and Schreck, Florian
- Subjects
- Physics - Atomic Physics, Quantum Physics
- Abstract
Continuous superradiance using a narrow optical transition has the potential to improve the short-term stability of state-of-the-art optical clocks. Even though pulsed superradiant emission on a mHz-linewidth clock transition has been shown, true continuous operation, without Fourier limitation, has turned out to be extremely challenging. The trade-off between maintaining a high atomic flux while minimizing decoherence effects presents a significant obstacle. Here, we discuss the design of a machine that could overcome this problem by combining a high-flux continuous beam of ultracold strontium atoms with a bowtie cavity for the generation of superradiant lasing. To evaluate the feasibility of our design, we present simulation results for continuous high-efficiency cooling, loading, and pumping to the upper lasing state inside the bowtie cavity. We then present two different models for simulating the generated superradiant field, taking into account position-dependent shifts, collisional decoherence, light shifts, and atom loss. Finally, we estimate a laser linewidth of less than 100 mHz, limited by atom number fluctuations, and resulting in an output power of hundreds of fW.
- Comment: 20 pages, 9 figures
- Published
- 2024
23. Interactive incremental learning of generalizable skills with local trajectory modulation
- Author
- Knauer, Markus, Albu-Schäffer, Alin, Stulp, Freek, and Silvério, João
- Subjects
- Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Robotics
- Abstract
The problem of generalization in learning from demonstration (LfD) has received considerable attention over the years, particularly within the context of movement primitives, where a number of approaches have emerged. Recently, two important approaches have gained recognition. While one leverages via-points to adapt skills locally by modulating demonstrated trajectories, another relies on so-called task-parameterized models that encode movements with respect to different coordinate systems, using a product of probabilities for generalization. While the former are well-suited to precise, local modulations, the latter aim at generalizing over large regions of the workspace and often involve multiple objects. Addressing the quality of generalization by leveraging both approaches simultaneously has received little attention. In this work, we propose an interactive imitation learning framework that simultaneously leverages local and global modulations of trajectory distributions. Building on the kernelized movement primitives (KMP) framework, we introduce novel mechanisms for skill modulation from direct human corrective feedback. Our approach particularly exploits the concept of via-points to incrementally and interactively 1) improve the model accuracy locally, 2) add new objects to the task during execution and 3) extend the skill into regions where demonstrations were not provided. We evaluate our method on a bearing ring-loading task using a torque-controlled, 7-DoF, DLR SARA robot.
- Comment: 21 pages, 16 figures
- Published
- 2024
24. LeMON: Learning to Learn Multi-Operator Networks
- Author
- Sun, Jingmin, Zhang, Zecheng, and Schaeffer, Hayden
- Subjects
- Computer Science - Machine Learning
- Abstract
Single-operator learning involves training a deep neural network to learn a specific operator, whereas recent work in multi-operator learning uses an operator embedding structure to train a single neural network on data from multiple operators. Thus, multi-operator learning is capable of predicting a range of operators within one model. In this work, we propose pretraining and fine-tuning strategies for solving PDEs using multi-operator learning. One key aspect is that by increasing the number of families of operators used in pretraining, a PDE foundation model can be fine-tuned to downstream tasks involving new PDEs with a limited number of samples, thus outperforming single-operator neural networks. Specifically, a multi-operator learning model pre-trained with data from diverse PDE families can predict unseen operators after fine-tuning with only a limited number of operators from the new family, enabling it to serve as a data-free PDE solver. We also show that the proposed training and fine-tuning method is able to predict new operators in zero-shot prediction without samples. Additionally, we introduce a PDE-agnostic meta-learning algorithm to improve the adaptability of the model to various PDEs by providing a better parameter initialization process. To address the needs of applications with limited computing resources, we explore low-rank adaptation methods that reduce computational costs while enhancing solver accuracy. Lastly, by examining the scaling law with respect to the number of operator families, we establish and highlight its potential for broad adaptation in PDE-solving tasks.
- Published
- 2024
25. Proton Radiography Inversions with Source Extraction and Comparison to Mesh Methods
- Author
- Griff-McMahon, J., Valenzuela-Villaseca, V., Malko, S., Fiksel, G., Rosenberg, M. J., Schaeffer, D. B., and Fox, W.
- Subjects
- Physics - Plasma Physics
- Abstract
Proton radiography is a central diagnostic technique for measuring electromagnetic (EM) fields in high-energy-density, laser-produced plasmas. In this technique, protons traverse the plasma where they accumulate small EM deflections which lead to variations in the proton fluence pattern on a detector. Path-integrated EM fields can then be extracted from the fluence image through an inversion process. In this work, experiments of laser-driven foils were conducted on the OMEGA laser and magnetic field reconstructions were performed using both "fluence-based" techniques and high-fidelity "mesh-based" methods. We implement nonzero boundary conditions into the inversion and show their importance by comparing against mesh measurements. Good agreement between the methods is found only when nonzero boundary conditions are used. We also introduce an approach to determine the unperturbed proton source profile, which is a required input in fluence reconstruction algorithms. In this approach, a fluence inversion is embedded inside of a mesh region, which provides overconstrained magnetic boundary conditions. A source profile is then iteratively optimized to satisfy the boundary information. This method substantially enhances the accuracy in recovering EM fields. Lastly, we propose a scheme to quantify uncertainty in the final inversion that is introduced through errors in the source retrieval.
- Comment: 16 pages, 12 figures
- Published
- 2024
26. Continuous cavity-QED with an atomic beam
- Author
- Famà, Francesca, Zhou, Sheng, Heizenreder, Benedikt, Tang, Mikkel, Bennetts, Shayne, Jäger, Simon B., Schäffer, Stefan A., and Schreck, Florian
- Subjects
- Physics - Atomic Physics, Quantum Physics
- Abstract
Atoms coupled to cavities provide an exciting playground for the study of fundamental interactions of atoms mediated through a common channel. Many applications of cavity-QED, and of cold-atom experiments more broadly, suffer from limitations caused by the transient nature of an atomic loading cycle. The development of continuous operation schemes is necessary to push these systems to the next level of performance. Here we present a machine designed to produce a continuous flux of collimated atoms that traverse an optical cavity. The atom-light interaction is enhanced by a fast-decaying cavity, optimal for studying phenomena where atomic properties dominate. We demonstrate the transition to a collective strong coupling regime, heralded by a normal-mode splitting. We observe a second phase with a binary normal-mode splitting, born from an offset in the mean velocity of the atoms. Inverting the atomic ensemble in the collective strong coupling regime, we measure continuous optical gain. This work sets the stage for studying threshold conditions for continuous collective phenomena, such as continuous superradiant lasing.
- Published
- 2024
27. When Do Universal Image Jailbreaks Transfer Between Vision-Language Models?
- Author
- Schaeffer, Rylan, Valentine, Dan, Bailey, Luke, Chua, James, Eyzaguirre, Cristóbal, Durante, Zane, Benton, Joe, Miranda, Brando, Sleight, Henry, Hughes, John, Agrawal, Rajashree, Sharma, Mrinank, Emmons, Scott, Koyejo, Sanmi, and Perez, Ethan
- Subjects
- Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Cryptography and Security, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
- Abstract
The integration of new modalities into frontier AI systems offers exciting capabilities, but also increases the possibility such systems can be adversarially manipulated in undesirable ways. In this work, we focus on a popular class of vision-language models (VLMs) that generate text outputs conditioned on visual and textual inputs. We conducted a large-scale empirical study to assess the transferability of gradient-based universal image "jailbreaks" using a diverse set of over 40 open-parameter VLMs, including 18 new VLMs that we publicly release. Overall, we find that transferable gradient-based image jailbreaks are extremely difficult to obtain. When an image jailbreak is optimized against a single VLM or against an ensemble of VLMs, the jailbreak successfully jailbreaks the attacked VLM(s), but exhibits little-to-no transfer to any other VLMs; transfer is not affected by whether the attacked and target VLMs possess matching vision backbones or language models, whether the language model underwent instruction-following and/or safety-alignment training, or many other factors. Only two settings display partially successful transfer: between identically-pretrained and identically-initialized VLMs with slightly different VLM training data, and between different training checkpoints of a single VLM. Leveraging these results, we then demonstrate that transfer can be significantly improved against a specific target VLM by attacking larger ensembles of "highly-similar" VLMs. These results stand in stark contrast to existing evidence of universal and transferable text jailbreaks against language models and transferable adversarial attacks against image classifiers, suggesting that VLMs may be more robust to gradient-based transfer attacks.
- Published
- 2024
28. Open Problems in Technical AI Governance
- Author
- Reuel, Anka, Bucknall, Ben, Casper, Stephen, Fist, Tim, Soder, Lisa, Aarne, Onni, Hammond, Lewis, Ibrahim, Lujain, Chan, Alan, Wills, Peter, Anderljung, Markus, Garfinkel, Ben, Heim, Lennart, Trask, Andrew, Mukobi, Gabriel, Schaeffer, Rylan, Baker, Mauricio, Hooker, Sara, Solaiman, Irene, Luccioni, Alexandra Sasha, Rajkumar, Nitarshan, Moës, Nicolas, Ladish, Jeffrey, Guha, Neel, Newman, Jessica, Bengio, Yoshua, South, Tobin, Pentland, Alex, Koyejo, Sanmi, Kochenderfer, Mykel J., and Trager, Robert
- Subjects
- Computer Science - Computers and Society
- Abstract
AI progress is creating a growing range of risks and opportunities, but it is often unclear how they should be navigated. In many cases, the barriers and uncertainties faced are at least partly technical. Technical AI governance, referring to technical analysis and tools for supporting the effective governance of AI, seeks to address such challenges. It can help to (a) identify areas where intervention is needed, (b) identify and assess the efficacy of potential governance actions, and (c) enhance governance options by designing mechanisms for enforcement, incentivization, or compliance. In this paper, we explain what technical AI governance is, why it is important, and present a taxonomy and incomplete catalog of its open problems. This paper is intended as a resource for technical researchers or research funders looking to contribute to AI governance.
- Comment: Ben Bucknall and Anka Reuel contributed equally and share the first author position
- Published
- 2024
29. Kinetic study of shock formation and particle acceleration in laser-driven quasi-parallel magnetized collisionless shocks
- Author
- Zhang, Yu, Heuer, Peter V, Davies, Jonathan R, Schaeffer, Derek B, Wen, Han, García-Rubio, Fernando, and Ren, Chuang
- Subjects
- Nuclear and Plasma Physics, Space Sciences, Physical Sciences, Astronomical and Space Sciences, Atomic, Molecular, Nuclear, Particle and Plasma Physics, Classical Physics, Fluids & Plasmas, Nuclear and plasma physics, Space sciences
- Published
- 2024
30. Comparison of brace to observation in stable, radiological developmental dysplasia of the hip: a protocol for a global multicentre non-inferiority randomised trial.
- Author
- Zomar, Bryn, Bone, Jeffrey, Nguyen, Vuong, Mulpuri, Kishore, Kelley, Simon, and Schaeffer, Emily
- Subjects
- hip, paediatric orthopaedics, radiology & imaging, randomized controlled trial, ultrasound, Humans, Braces, Infant, Developmental Dysplasia of the Hip, Multicenter Studies as Topic, Watchful Waiting, Equivalence Trials as Topic, Female, Radiography, Infant, Newborn, Randomized Controlled Trials as Topic, Ultrasonography, Hip Dislocation, Congenital, Male
- Abstract
INTRODUCTION: Brace treatment is common to address radiological dysplasia in infants with developmental dysplasia of the hip (DDH); however, it is unclear whether bracing provides significant benefit above careful observation by ultrasound. If observation alone is non-inferior to bracing for radiological dysplasia, unnecessary treatment may be avoided. Therefore, the purpose of this study is to determine whether observation is non-inferior to bracing for infants with radiological dysplasia. METHODS AND ANALYSIS: This will be a multicentre, global, randomised, non-inferiority trial performed under the auspices of a global prospective registry for infants and children diagnosed with DDH. Patients will be included if they present with radiological dysplasia (centred hip, alpha angle 43-60°, percent femoral head coverage greater than 35% measured on ultrasound) of a clinically stable hip under 3 months old. Patients will be excluded if they present with clinical hip instability, have received prior treatment, or have known/suspected neuromuscular, collagen, chromosomal or lower-extremity congenital abnormalities or syndromic-associated hip abnormalities. Patients will be enrolled and randomised to undergo observation alone or brace treatment with a Pavlik harness for a minimum of 6 weeks. Follow-up visits will occur at 6 weeks, 1 year and 2 years post-enrolment. The primary outcome will be the norm-referenced acetabular index measured on the 2-year radiograph with a 3° non-inferiority margin. A total of 514 patients will be included. The study is anticipated to start in April 2024 and end in September 2028. The primary outcome will be compared between arms with a mixed-effects model with a random intercept for study centre and a single covariate for the treatment group. If the lower bound of the 95% CI lies within 3° of the mean, we will treat this as evidence for non-inferiority. ETHICS AND DISSEMINATION: Ethics approval has been obtained from the lead site's ethics board (University of British Columbia Children's and Women's Research Ethics Board). Ethics approval will be obtained from the local ethics committees or institutional review boards at each institution prior to patient enrolment. It is intended that the results of this study shall be published in peer-reviewed journals and presented at suitable conferences. TRIAL REGISTRATION NUMBER: NCT05869851.
- Published
- 2024
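The stated analysis plan, a mixed-effects model with a random intercept for study centre and a single treatment covariate, can be sketched with statsmodels. All data below are simulated stand-ins; the effect sizes and noise levels are invented, not trial assumptions.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_patients, n_centres = 514, 20
df = pd.DataFrame({
    "centre": rng.integers(0, n_centres, n_patients),
    "brace": rng.integers(0, 2, n_patients),  # 0 = observation, 1 = Pavlik harness
})
centre_effect = rng.normal(0.0, 1.0, n_centres)   # hypothetical between-centre variation
df["acetabular_index"] = (22.0 - 0.5 * df["brace"]
                          + centre_effect[df["centre"].to_numpy()]
                          + rng.normal(0.0, 3.0, n_patients))

# Random intercept for study centre, single covariate for the treatment group.
fit = smf.mixedlm("acetabular_index ~ brace", df, groups=df["centre"]).fit()
lo, hi = fit.conf_int().loc["brace"]
print(f"treatment difference 95% CI: ({lo:.2f}, {hi:.2f}) degrees")
# Non-inferiority is assessed against the trial's 3-degree margin.
```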
31. Gaussian process-based online health monitoring and fault analysis of lithium-ion battery systems from field data
- Author
- Schaeffer, Joachim, Lenz, Eric, Gulla, Duncan, Bazant, Martin Z., Braatz, Richard D., and Findeisen, Rolf
- Subjects
- Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Electrical Engineering and Systems Science - Systems and Control, Statistics - Applications, I.2.6
- Abstract
Health monitoring, fault analysis, and detection are critical for the safe and sustainable operation of battery systems. We apply Gaussian process resistance models to lithium iron phosphate battery field data to effectively separate the time-dependent and operating-point-dependent resistance. The data set contains 29 battery systems returned to the manufacturer for warranty, each with eight cells in series, totaling 232 cells and 131 million data rows. We develop probabilistic fault detection rules using recursive spatiotemporal Gaussian processes. These processes allow the quick processing of over a million data points, enabling advanced online monitoring and furthering the understanding of battery pack failure in the field. The analysis underlines that often, only a single cell shows abnormal behavior or a knee point, consistent with weakest-link failure for cells connected in series, amplified by local resistive heating. The results further the understanding of how batteries degrade and fail in the field and demonstrate the potential of efficient online monitoring based on data. We open-source the code and publish the large data set upon completion of the review of this article.
- Comment: This version is outdated. The final version is published as an open access journal article: https://doi.org/10.1016/j.xcrp.2024.102258
- Published
- 2024
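The paper's recursive spatiotemporal Gaussian processes are built for 131 million rows of field data; as a much smaller stand-in, a batch Gaussian process fit with scikit-learn can illustrate the probabilistic flagging idea: train on a healthy history, then flag measurements outside the predictive 3-sigma band. The synthetic resistance trace and injected knee point are invented.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(1)
t = np.linspace(0.0, 100.0, 200)[:, None]   # time (e.g. days in service)
r = 1.0 + rng.normal(0.0, 0.01, 200)        # cell resistance, arbitrary units
r[180:] += 0.15                             # injected knee-point fault

gp = GaussianProcessRegressor(kernel=RBF(20.0) + WhiteKernel(1e-4),
                              normalize_y=True)
gp.fit(t[:150], r[:150])                    # fit on healthy history only
mean, std = gp.predict(t, return_std=True)
flags = np.abs(r - mean) > 3.0 * std        # probabilistic fault rule
print("flagged indices:", np.where(flags)[0])  # the injected fault should dominate
```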
32. Geomagnetic dipole stability and zonal flow changes controlled by mantle heat flux heterogeneities
- Author
- Frasson, Thomas, Schaeffer, Natanaël, Nataf, Henri-Claude, and Labrosse, Stéphane
- Subjects
- Astrophysics - Earth and Planetary Astrophysics, Physics - Geophysics
- Abstract
Palaeomagnetic evidence shows that the behaviour of the geodynamo has changed during geological times. Variations in the heat flux at the core-mantle boundary (CMB) due to mantle convection could be responsible. Previous studies, based on unrealistically viscous dynamo simulations, have shown that large-scale CMB heat flux heterogeneities impact the magnetic dipole stability. To better understand how they affect the geodynamo, we used several simulations, ranging from standard numerical dynamos to more extreme parameters, including strong-field cases and turbulent cases. We show that heterogeneities with realistic amplitudes can favour a multipolar dynamo by either forcing equatorially antisymmetric zonal flows or eastward zonal flows. Strong-field dynamo models are found to be less sensitive, due to significant westward flows. We also find that the dipolar fraction of the magnetic field is best captured by $M^*=M\ E_{\eta}\ \dfrac{l_c}{\pi}$ where $M$ is the magnetic to kinetic energy ratio, $E_{\eta}$ is the magnetic Ekman number, and $l_c$ is the dominant harmonic degree of the flow, with multipolar dynamos found at lower $M^*$. $M^*$ estimated for the Earth's core is consistent with a reversing dipolar magnetic field. Within the range of $M^*$ susceptible to reversals, breaking the equatorial symmetry or forcing eastward zonal flows in our simulations consistently triggers reversals or a transition towards multipolar dynamos. Our results support that time variations of heat-flux heterogeneities driven by mantle convection through Earth's history are capable of inducing the significant variations in the reversal frequency observed in the palaeomagnetic record.
- Comment: 23 pages, 14+ figures, 5 appendices
- Published
- 2024
33. Uncovering Latent Memories: Assessing Data Leakage and Memorization Patterns in Frontier AI Models
- Author
- Duan, Sunny, Khona, Mikail, Iyer, Abhiram, Schaeffer, Rylan, and Fiete, Ila R
- Subjects
- Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, Quantitative Biology - Neurons and Cognition
- Abstract
Frontier AI systems are making transformative impacts across society, but such benefits are not without costs: models trained on web-scale datasets containing personal and private data raise profound concerns about data privacy and security. Language models are trained on extensive corpora including potentially sensitive or proprietary information, and the risk of data leakage, where the model response reveals pieces of such information, remains inadequately understood. Prior work has investigated what factors drive memorization, identifying sequence complexity and the number of repetitions as key drivers. Here, we focus on the evolution of memorization over training. We begin by reproducing findings that the probability of memorizing a sequence scales logarithmically with the number of times it is present in the data. We next show that sequences which are apparently not memorized after the first encounter can be "uncovered" throughout the course of training even without subsequent encounters, a phenomenon we term "latent memorization". The presence of latent memorization presents a challenge for data privacy, as memorized sequences may be hidden at the final checkpoint of the model but remain easily recoverable. To this end, we develop a diagnostic test relying on the cross-entropy loss to uncover latent memorized sequences with high accuracy.
- Published
- 2024
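The cross-entropy diagnostic can be illustrated with any causal language model: score candidate strings by their mean token loss and treat unusually low loss as a memorization signal. GPT-2 and the example strings below are stand-ins; the paper's actual test on training checkpoints is more involved.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"  # small stand-in model
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name).eval()

def mean_token_loss(text: str) -> float:
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        return model(ids, labels=ids).loss.item()  # mean cross-entropy per token

for s in ["To be, or not to be, that is the question.",  # likely seen in training
          "zq7 kd1 vvx 93h plm 2rr ocb"]:                 # random, unseen string
    print(f"{mean_token_loss(s):6.3f}  {s!r}")
```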
34. $\texttt{cunuSHT}$: GPU Accelerated Spherical Harmonic Transforms on Arbitrary Pixelizations
- Author
- Belkner, Sebastian, Duivenvoorden, Adriaan J., Carron, Julien, Schaeffer, Nathanael, and Reinecke, Martin
- Subjects
- Astrophysics - Instrumentation and Methods for Astrophysics, Astrophysics - Cosmology and Nongalactic Astrophysics
- Abstract
We present $\texttt{cunusht}$, a general-purpose Python package that wraps a highly efficient CUDA implementation of the nonuniform spin-$0$ spherical harmonic transform. The method is applicable to arbitrary pixelization schemes, including schemes constructed from equally-spaced iso-latitude rings as well as completely nonuniform ones. The algorithm has an asymptotic scaling of $\mathrm{O}(\ell_{\rm max}^3)$ for maximum multipole $\ell_{\rm max}$ and achieves machine-precision accuracy. While $\texttt{cunusht}$ is developed with applications in cosmology in mind, it is applicable to various other interpolation problems on the sphere. We outperform the fastest available CPU algorithm by a factor of up to 5 for problems with a nonuniform pixelization and $\ell_{\rm max}>4\cdot10^3$ when comparing a single modern GPU to a modern 32-core CPU. This performance is achieved by utilizing the double Fourier sphere method in combination with the nonuniform fast Fourier transform and by avoiding transfers between the host and device. For scenarios without GPU availability, $\texttt{cunusht}$ wraps existing CPU libraries. $\texttt{cunusht}$ is publicly available and includes tests, documentation, and demonstrations.
- Published
- 2024
35. In-Context Learning of Energy Functions
- Author
-
Schaeffer, Rylan, Khona, Mikail, and Koyejo, Sanmi
- Subjects
Computer Science - Machine Learning - Abstract
In-context learning is a powerful capability of certain machine learning models that arguably underpins the success of today's frontier AI models. However, in-context learning is critically limited to settings where the in-context distribution of interest $p_{\theta}^{ICL}( x|\mathcal{D})$ can be straightforwardly expressed and/or parameterized by the model; for instance, language modeling relies on expressing the next-token distribution as a categorical distribution parameterized by the network's output logits. In this work, we present a more general form of in-context learning without such a limitation that we call \textit{in-context learning of energy functions}. The idea is to instead learn the unconstrained and arbitrary in-context energy function $E_{\theta}^{ICL}(x|\mathcal{D})$ corresponding to the in-context distribution $p_{\theta}^{ICL}(x|\mathcal{D})$. To do this, we use classic ideas from energy-based modeling (a toy sketch follows this entry). We provide preliminary evidence that our method empirically works on synthetic data. Interestingly, our work contributes (to the best of our knowledge) the first example of in-context learning where the input space and output space differ from one another, suggesting that in-context learning is a more general capability than previously realized., Comment: Proceedings of the 1st Workshop on In-Context Learning at the 41st International Conference on Machine Learning, Vienna, Austria. 2024. arXiv admin note: text overlap with arXiv:2402.10202
- Published
- 2024
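A toy sketch of the idea in entry 35: a network maps a context set $\mathcal{D}$ and a query $x$ to a scalar energy and is trained contrastively so that data receive lower energy than noise. The architecture and the contrastive objective here are illustrative assumptions, not the authors' exact training procedure:

import torch
import torch.nn as nn

class InContextEnergy(nn.Module):
    """Scalar energy E_theta(x | D) from a pooled context embedding plus the query."""
    def __init__(self, dim=1, hidden=64):
        super().__init__()
        self.ctx = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, hidden))
        self.head = nn.Sequential(nn.Linear(hidden + dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, D, x):
        c = self.ctx(D).mean(dim=0)           # permutation-invariant summary of the context set
        return self.head(torch.cat([c, x]))   # unconstrained scalar energy

model = InContextEnergy()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
D = 2.0 + 0.5 * torch.randn(32, 1)            # context sampled from the in-context distribution
x_pos, x_neg = D[0], 3.0 * torch.randn(1)     # a data point vs. a broad noise sample
opt.zero_grad()
loss = model(D, x_pos) - model(D, x_neg)      # contrastive step: lower energy on data than noise
loss.backward()
opt.step()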
36. Quantifying Variance in Evaluation Benchmarks
- Author
-
Madaan, Lovish, Singh, Aaditya K., Schaeffer, Rylan, Poulton, Andrew, Koyejo, Sanmi, Stenetorp, Pontus, Narang, Sharan, and Hupkes, Dieuwke
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence - Abstract
Evaluation benchmarks are the cornerstone of measuring capabilities of large language models (LLMs), as well as driving progress in said capabilities. Originally designed to make claims about capabilities (or lack thereof) in fully pretrained models, evaluation benchmarks are now also extensively used to decide between various training choices. Despite this widespread usage, we rarely quantify the variance in our evaluation benchmarks, which dictates whether differences in performance are meaningful. Here, we define and measure a range of metrics geared towards quantifying variance in evaluation benchmarks, including seed variance across initialisations and monotonicity during training (a minimal seed-variance example follows this entry). By studying a large number of models -- both openly available and pretrained from scratch -- we provide empirical estimates for a variety of variance metrics, with considerations and recommendations for practitioners. We also evaluate the utility and tradeoffs of continuous versus discrete performance measures and explore options for better understanding and reducing this variance. We find that simple changes, such as framing choice tasks (like MMLU) as completion tasks, can often reduce variance for smaller scale ($\sim$7B) models, while more involved methods inspired from the human testing literature (such as item analysis and item response theory) struggle to meaningfully reduce variance. Overall, our work provides insights into variance in evaluation benchmarks, suggests LM-specific techniques to reduce variance, and more generally encourages practitioners to carefully factor in variance when comparing models.
- Published
- 2024
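One of the simplest metrics named in entry 36 is seed variance: the spread of a benchmark score across runs differing only in the initialisation seed. A minimal computation with fabricated scores:

import numpy as np

# Hypothetical benchmark accuracies for one model trained with 5 different seeds.
scores_by_seed = np.array([0.612, 0.598, 0.631, 0.604, 0.619])

seed_mean = scores_by_seed.mean()
seed_std = scores_by_seed.std(ddof=1)  # unbiased estimate across seeds
print(f"mean={seed_mean:.3f}, seed std={seed_std:.3f}")
# Two checkpoints closer than roughly 2 * seed_std are hard to distinguish meaningfully.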
37. Towards an Improved Understanding and Utilization of Maximum Manifold Capacity Representations
- Author
-
Schaeffer, Rylan, Lecomte, Victor, Pai, Dhruv Bhandarkar, Carranza, Andres, Isik, Berivan, Unell, Alyssa, Khona, Mikail, Yerxa, Thomas, LeCun, Yann, Chung, SueYeon, Gromov, Andrey, Shwartz-Ziv, Ravid, and Koyejo, Sanmi
- Subjects
Computer Science - Machine Learning ,Computer Science - Computer Vision and Pattern Recognition ,Quantitative Biology - Neurons and Cognition - Abstract
Maximum Manifold Capacity Representations (MMCR) is a recent multi-view self-supervised learning (MVSSL) method that matches or surpasses other leading MVSSL methods. MMCR is intriguing because it does not fit neatly into any of the commonplace MVSSL lineages, instead originating from a statistical mechanical perspective on the linear separability of data manifolds. In this paper, we seek to improve our understanding and our utilization of MMCR. To better understand MMCR, we leverage tools from high dimensional probability to demonstrate that MMCR incentivizes alignment and uniformity of learned embeddings. We then leverage tools from information theory to show that such embeddings maximize a well-known lower bound on mutual information between views, thereby connecting the geometric perspective of MMCR to the information-theoretic perspective commonly discussed in MVSSL. To better utilize MMCR, we mathematically predict and experimentally confirm non-monotonic changes in the pretraining loss akin to double descent but with respect to atypical hyperparameters. We also discover compute scaling laws that enable predicting the pretraining loss as a function of gradient steps, batch size, embedding dimension and number of views. We then show that MMCR, originally applied to image data, is performant on multimodal image-text data. By more deeply understanding the theoretical and empirical behavior of MMCR, our work yields insights into improving MVSSL methods (a sketch of the objective follows this entry).
- Published
- 2024
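The MMCR objective is usually stated as maximizing the nuclear norm of the matrix of view-averaged, unit-normalized embeddings. A minimal sketch of that objective, offered as a hedged reconstruction rather than the authors' reference implementation:

import torch

def mmcr_loss(z):
    """z: (batch, views, dim) embeddings. MMCR maximizes the nuclear norm of
    the batch of view-averaged unit embeddings, so we minimize its negative."""
    z = torch.nn.functional.normalize(z, dim=-1)  # project each embedding to the unit sphere
    centroids = z.mean(dim=1)                     # (batch, dim): average over views
    return -torch.linalg.matrix_norm(centroids, ord="nuc")

loss = mmcr_loss(torch.randn(256, 2, 128))

Maximizing the nuclear norm spreads the centroids across many directions, which connects to the alignment and uniformity properties the entry derives.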
38. A normal version of Brauer's height zero conjecture
- Author
-
Moretó, Alexander and Fry, A. A. Schaeffer
- Subjects
Mathematics - Group Theory - Abstract
The celebrated It\^o-Michler theorem asserts that a prime $p$ does not divide the degree of any irreducible character of a finite group $G$ if and only if $G$ has a normal and abelian Sylow $p$-subgroup. The principal block case of the recently proven Brauer's height zero conjecture isolates the abelian part in the It\^o-Michler theorem. In this paper, we show that the normal part can also be isolated in a similar way. This is a consequence of work on a strong form of the so-called Brauer's height zero conjecture for two primes, due to Malle and Navarro. Using our techniques, we also provide an alternate proof of this conjecture., Comment: Revised following Gunter Malle's suggestions
- Published
- 2024
39. Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive?
- Author
-
Schaeffer, Rylan, Schoelkopf, Hailey, Miranda, Brando, Mukobi, Gabriel, Madan, Varun, Ibrahim, Adam, Bradley, Herbie, Biderman, Stella, and Koyejo, Sanmi
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence ,Computer Science - Computation and Language - Abstract
Predictable behavior from scaling advanced AI systems is an extremely desirable property. Although a well-established literature exists on how pretraining performance scales, the literature on how particular downstream capabilities scale is significantly muddier. In this work, we take a step back and ask: why has predicting specific downstream capabilities with scale remained elusive? While many factors are certainly responsible, we identify a new factor that makes modeling scaling behavior on widely used multiple-choice question-answering benchmarks challenging. Using five model families and twelve well-established multiple-choice benchmarks, we show that downstream performance is computed from negative log likelihoods via a sequence of transformations that progressively degrade the statistical relationship between performance and scale (the chain of transformations is sketched after this entry). We then reveal the mechanism causing this degradation: downstream metrics require comparing the correct choice against a small number of specific incorrect choices, meaning accurately predicting downstream capabilities requires predicting not just how probability mass concentrates on the correct choice with scale, but also how probability mass fluctuates on specific incorrect choices with scale. We empirically study how probability mass on the correct choice co-varies with probability mass on incorrect choices with increasing compute, suggesting that scaling laws for incorrect choices might be achievable. Our work also explains why pretraining scaling laws are commonly regarded as more predictable than downstream capabilities and contributes towards establishing scaling-predictable evaluations of frontier AI models.
- Published
- 2024
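A minimal illustration of the degradation mechanism described in entry 39: multiple-choice accuracy depends on the correct choice's negative log likelihood only through its comparison with a handful of specific incorrect choices. All numbers are fabricated:

import numpy as np

nll = np.array([1.20, 1.35, 1.28, 1.90])   # per-choice NLLs for one item; choice 0 is correct

probs = np.exp(-nll) / np.exp(-nll).sum()  # probability mass restricted to the choices
accuracy = float(np.argmin(nll) == 0)      # the final discrete downstream metric

# Improving the correct choice's NLL with scale changes accuracy only when it crosses
# the best incorrect choice, so each transformation (restrict, normalize, argmax)
# degrades the statistical relationship between the metric and compute.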
40. MTEB-French: Resources for French Sentence Embedding Evaluation and Analysis
- Author
-
Ciancone, Mathieu, Kerboua, Imene, Schaeffer, Marion, and Siblini, Wissam
- Subjects
Computer Science - Computation and Language ,Computer Science - Information Retrieval ,Computer Science - Machine Learning - Abstract
Recently, numerous embedding models have been made available and widely used for various NLP tasks. The Massive Text Embedding Benchmark (MTEB) has greatly simplified the process of choosing a model that performs well on several tasks in English, but extensions to other languages remain challenging. This is why we expand MTEB to propose the first massive benchmark of sentence embeddings for French. We gather 15 existing datasets in an easy-to-use interface and create three new French datasets for a global evaluation of 8 task categories. We compare 51 carefully selected embedding models on a large scale, conduct comprehensive statistical tests, and analyze the correlation between model performance and many of their characteristics. We find that although no model is best on all tasks, large multilingual models pre-trained on sentence similarity perform exceptionally well. Our work comes with open-source code, new datasets and a public leaderboard (a usage sketch follows this entry).
- Published
- 2024
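Evaluating a model with the MTEB library generally follows the pattern below. The model name is a placeholder and the task-selection argument has changed across mteb versions, so treat this as a usage sketch rather than the authors' exact script:

from mteb import MTEB
from sentence_transformers import SentenceTransformer

# Placeholder model; any embedding model exposing an `encode` method works with MTEB.
model = SentenceTransformer("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")

# Select tasks by language; newer mteb releases instead use mteb.get_tasks(languages=[...]).
evaluation = MTEB(task_langs=["fr"])
results = evaluation.run(model, output_folder="results/mteb-french")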
41. Characters and Sylow $3$-subgroup abelianization
- Author
-
Giannelli, Eugenio, Rizo, Noelia, Fry, A. A. Schaeffer, and Vallejo, Carolina
- Subjects
Mathematics - Group Theory ,Mathematics - Representation Theory - Abstract
We characterize when a finite group G possesses a Sylow 3-subgroup P with abelianization of order 9 in terms of the number of height zero characters lying in the principal 3-block of G, settling a conjecture put forward by Navarro, Sambale, and Tiep in 2018. Along the way, we show that a recent result of Laradji on the number of characters of height zero in a block that lie above a given character of some normal subgroup holds for blocks of maximal defect without any hypothesis on the group., Comment: slight title change and other minor changes, following helpful comments from Gunter Malle and Benjamin Sambale
- Published
- 2024
42. Principal eigenstate classical shadows
- Author
-
Grier, Daniel, Pashayan, Hakop, and Schaeffer, Luke
- Subjects
Quantum Physics ,Computer Science - Information Theory ,Computer Science - Machine Learning - Abstract
Given many copies of an unknown quantum state $\rho$, we consider the task of learning a classical description of its principal eigenstate. Namely, assuming that $\rho$ has an eigenstate $|\phi\rangle$ with (unknown) eigenvalue $\lambda > 1/2$, the goal is to learn a (classical shadows style) classical description of $|\phi\rangle$ which can later be used to estimate expectation values $\langle \phi |O| \phi \rangle$ for any $O$ in some class of observables. We consider the sample-complexity setting in which generating a copy of $\rho$ is expensive, but joint measurements on many copies of the state are possible. We present a protocol for this task whose sample complexity scales with the principal eigenvalue $\lambda$, and show that it is optimal within a space of natural approaches, e.g., applying quantum state purification followed by a single-copy classical shadows scheme. Furthermore, when $\lambda$ is sufficiently close to $1$, the performance of our algorithm is optimal--matching the sample complexity for pure state classical shadows., Comment: 38 pages
- Published
- 2024
43. Accounting for the Effects of Probabilistic Uncertainty During Fast Charging of Lithium-ion Batteries
- Author
-
Kim, Minsu, Schaeffer, Joachim, Berliner, Marc D., Sagnier, Berta Pedret, Findeisen, Rolf, and Braatz, Richard D.
- Subjects
Electrical Engineering and Systems Science - Systems and Control - Abstract
Batteries are nonlinear dynamical systems that can be modeled by porous electrode theory (PET) models. The aim of optimal fast charging is to reduce the charging time while keeping battery degradation low. Most past studies assume that the ambient temperature is a fixed known value and that all PET model parameters are perfectly known. In real battery operation, however, the ambient temperature and the model parameters are uncertain. To ensure that operational constraints are satisfied at all times in the context of model-based optimal control, uncertainty quantification is required. Here, we analyze optimal fast charging for modest uncertainty in the ambient temperature and 23 model parameters. Uncertainty quantification of the battery model is carried out using non-intrusive polynomial chaos expansion and the results are verified with Monte Carlo simulations (a one-dimensional sketch follows this entry). The method is investigated for a constant current--constant voltage charging strategy, which is standard for fast charging subject to operating below maximum current and charging constraints. Our results demonstrate that uncertainty in the ambient temperature and in the parameters leads to violations of the voltage, temperature, and lithium-plating overpotential constraints, and they identify a subset of key parameters, among all of the uncertain ones, that matter most for fast charging. The C-rate and charge constraints are then adjusted so that the probability of violating the degradation acceleration condition is below a pre-specified value. This demonstrates a computationally efficient approach for determining fast-charging protocols that take probabilistic uncertainties into account., Comment: 6 pages, 5 figures, accepted for ACC 2024
- Published
- 2024
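Non-intrusive polynomial chaos, named in entry 43, fits a polynomial surrogate in a standardized random variable using model evaluations alone, after which moments follow directly from the coefficients. A one-dimensional sketch with a toy response and a Gaussian ambient-temperature perturbation (the model, magnitudes, and truncation order are illustrative assumptions):

import numpy as np
from numpy.polynomial.hermite_e import hermevander
from math import factorial

def toy_peak_temperature(t_amb):
    """Placeholder response: peak cell temperature as a function of ambient temperature."""
    return 25.0 + 0.8 * t_amb + 0.002 * t_amb ** 2

# Standard-normal germ xi; uncertain ambient temperature T = 298 K + 5 K * xi.
rng = np.random.default_rng(1)
xi = rng.standard_normal(200)
y = toy_peak_temperature(298.0 + 5.0 * xi)

# Non-intrusive PCE: least-squares fit of a degree-3 probabilists' Hermite expansion.
V = hermevander(xi, 3)                        # columns He_0..He_3 evaluated at the samples
coef, *_ = np.linalg.lstsq(V, y, rcond=None)

# Orthogonality (<He_j He_k> = k! delta_jk) gives moments directly from coefficients.
pce_mean = coef[0]
pce_var = sum(factorial(k) * coef[k] ** 2 for k in range(1, 4))
print(pce_mean, pce_var, y.mean(), y.var())   # PCE moments vs. Monte Carlo check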
44. Towards a Foundation Model for Partial Differential Equations: Multi-Operator Learning and Extrapolation
- Author
-
Sun, Jingmin, Liu, Yuxuan, Zhang, Zecheng, and Schaeffer, Hayden
- Subjects
Computer Science - Machine Learning ,Mathematics - Numerical Analysis - Abstract
Foundation models, such as large language models, have demonstrated success in addressing various language and image processing tasks. In this work, we introduce a multi-modal foundation model for scientific problems, named PROSE-PDE. Our model, designed for bi-modality to bi-modality learning, is a multi-operator learning approach which can predict future states of spatiotemporal systems while concurrently learning the underlying governing equations of the physical system. Specifically, we focus on multi-operator learning by training on distinct one-dimensional time-dependent nonlinear constant-coefficient partial differential equations, with potential applications in physics, geology, and biology. More importantly, we provide three extrapolation studies to demonstrate that PROSE-PDE can generalize physical features through the robust training of multiple operators and that the proposed model can extrapolate to predict PDE solutions whose models or data were unseen during training. Furthermore, we show through systematic numerical experiments that the utilization of the symbolic modality in our model effectively resolves the well-posedness problems of training multiple operators and thus enhances the model's predictive capabilities.
- Published
- 2024
45. Araucaria: Simplifying INC Fault Tolerance with High-Level Intents
- Author
-
Parizotto, Ricardo, Haque, Israat, and Schaeffer-Filho, Alberto
- Subjects
Computer Science - Networking and Internet Architecture ,Computer Science - Distributed, Parallel, and Cluster Computing - Abstract
Network programmability allows modification of fine-grained data plane functionality. The performance benefits of data plane programmability have motivated many researchers to offload computation that previously operated only on servers to the network, creating the notion of in-network computing (INC). Because failures can occur in the data plane, fault tolerance mechanisms are essential for INC. However, INC operators and developers must manually encode fault tolerance requirements in the source code using domain knowledge, which is time-consuming and error-prone. In this work, we present Araucaria, a system that aims to simplify the definition and implementation of fault tolerance requirements for INC. The system allows requirements specification using an intent language, which enables the expression of consistency and availability requirements in a constrained natural language (a hypothetical intent is shown after this entry). A refinement process translates the intent and incorporates the essential building blocks and configurations into the INC code. We present a prototype of Araucaria and analyze the end-to-end system behavior. Experiments demonstrate that the refinement scales to multiple intents and that the system provides fault tolerance with negligible overhead in failure scenarios.
- Published
- 2024
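Entry 45 describes intents written in a constrained natural language that a refinement process compiles into INC building blocks. The snippet below is an entirely hypothetical illustration of what such an intent could look like; the paper's actual grammar may differ:

intent cache-protection:
    for program "in-network-kvstore"
    require consistency strong
    require availability at least 99.9 percent
    on device failure fall back to server replica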
46. Swing-Up of a Weakly Actuated Double Pendulum via Nonlinear Normal Modes
- Author
-
Sachtler, Arne, Calzolari, Davide, Raff, Maximilian, Schmidt, Annika, Wotte, Yannik P., Della Santina, Cosimo, Remy, C. David, and Albu-Schäffer, Alin
- Subjects
Electrical Engineering and Systems Science - Systems and Control ,Computer Science - Robotics - Abstract
We identify the nonlinear normal modes spawning from the stable equilibrium of a double pendulum under gravity, and we establish their connection to homoclinic orbits through the unstable upright position as energy increases. This result is exploited to devise an efficient swing-up strategy for a double pendulum with weak, saturating actuators. Our approach involves stabilizing the system onto periodic orbits associated with the nonlinear modes while gradually injecting energy. Since these modes are autonomous system evolutions, the required control effort for stabilization is minimal. Even with actuator limitations of less than 1% of the maximum gravitational torque, the proposed method accomplishes the swing-up of the double pendulum by allowing sufficient time., Comment: Preprint of a paper to appear at the European Control Conference (ECC) 2024 in Stockholm, Sweden
- Published
- 2024
47. Testing common approximations to predict the 21cm signal at the Epoch of Reionization and Cosmic Dawn
- Author
-
Schaeffer, Timothée, Giri, Sambit K., and Schneider, Aurel
- Subjects
Astrophysics - Cosmology and Nongalactic Astrophysics - Abstract
Predicting the 21cm signal from the epoch of reionization and cosmic dawn is a complex and challenging task. Various simplifying assumptions have been applied over the last decades to make the modeling more affordable. In this paper, we investigate the validity of several such assumptions, using a simulation suite consisting of three different astrophysical source models that agree with the current constraints on the reionization history and the UV luminosity function. We first show that the common assumption of a saturated spin temperature may lead to significant errors in the 21cm clustering signal over the full reionization period. The same is true for the assumption of a neutral universe during the cosmic dawn, which may lead to significant deviations from the correct signal during the heating and Lyman-$\alpha$ coupling periods. Another popular simplifying assumption consists of predicting the global differential brightness temperature ($dT_b$) from the averaged reionization fraction, gas temperature, and Lyman-$\alpha$ coupling. We show that such an approach leads to a 10 percent deeper absorption signal compared to the results obtained by averaging the final $dT_b$-map (see the note after this entry). Finally, we investigate the simplifying method of breaking the 21cm clustering signal into different auto and cross components that are then solved assuming linearity. We show that even though the individual fields have a variance well below unity, they often cannot be treated perturbatively, as the perturbations are strongly non-Gaussian. As a consequence, predictions based on the perturbative solution of individual auto and cross power spectra may lead to strongly biased results, even if higher-order terms are taken into account.
- Published
- 2024
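The bias from forming $dT_b$ out of averaged quantities is a Jensen-type effect: $dT_b$ is a nonlinear function of correlated fields, so the map of the means differs from the mean of the map. Schematically, using the standard brightness-temperature expression (a hedged reconstruction, with prefactors and the cosmology dependence omitted), $dT_b \propto x_{\rm HI}\,(1+\delta)\,\left(1 - T_{\gamma}/T_S\right)$, and in general $\langle x_{\rm HI}\,(1+\delta)\,(1 - T_{\gamma}/T_S) \rangle \neq \langle x_{\rm HI}\rangle\,\left(1 - T_{\gamma}/\langle T_S \rangle\right)$, because the product of averages discards the cross-correlations between ionization, density, and temperature.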
48. X-ray imaging and electron temperature evolution in laser-driven magnetic reconnection experiments at the National Ignition Facility
- Author
-
Valenzuela-Villaseca, V., Molina, J. M., Schaeffer, D. B., Malko, S., Griff-McMahon, J., Lezhnin, K., Rosenberg, M. J., Hu, S. X., Kalantar, D., Trosseille, C., Park, H. -S., Remington, B. A., Fiksel, G., Uzdensky, D., Bhattacharjee, A., and Fox, W.
- Subjects
Physics - Plasma Physics ,Astrophysics - High Energy Astrophysical Phenomena ,Astrophysics - Solar and Stellar Astrophysics - Abstract
We present results from X-ray imaging of high-aspect-ratio magnetic reconnection experiments driven at the National Ignition Facility. Two parallel, self-magnetized, elongated laser-driven plumes are produced by tiling 40 laser beams. A magnetic reconnection layer is formed by the collision of the plumes. A gated X-ray framing pinhole camera with a micro-channel plate (MCP) detector produces, through various filters, multiple images of the formation and evolution of both the plumes and the current sheet. As the diagnostic integrates plasma self-emission along the line of sight, 2-dimensional electron temperature maps $\langle T_e \rangle_Y$ are constructed by taking the ratio of the intensities of images obtained with different filters (a schematic of this inference follows this entry). The plumes have a characteristic temperature $\langle T_e \rangle_Y = 240 \pm 20$ eV at 2 ns after the initial laser irradiation and exhibit a slow cooling up to 4 ns. The reconnection layer forms at 3 ns with a temperature $\langle T_e \rangle_Y = 280 \pm 50$ eV as the result of the collision of the plumes. The error bars of the plume and current-sheet temperatures separate at $4$ ns, showing heating of the current sheet relative to the colder inflows. Using a semi-analytical model, we find that the observed heating of the current sheet is consistent with being produced by electron-ion drag, rather than by the conversion of magnetic to kinetic energy., Comment: Submitted to Physics of Plasmas. 19 pages (total), 14 figures, 2 tables
- Published
- 2024
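Filter-ratio thermometry, as used in entry 48, infers the electron temperature by comparing band-integrated self-emission through two different filters; for a bremsstrahlung-like spectrum the ratio increases monotonically with $T_e$ and can be inverted against a lookup table. A schematic sketch with idealized top-hat filter bands (all numbers are illustrative, not the experiment's filter curves):

import numpy as np

E = np.linspace(0.1, 10.0, 2000)          # photon energy grid in keV

def band_intensity(T_e, lo, hi):
    """Band-integrated self-emission for an exp(-E/T_e) bremsstrahlung-like
    spectrum seen through an idealized top-hat filter passing lo..hi keV."""
    mask = (E >= lo) & (E <= hi)
    return np.trapz(np.exp(-E[mask] / T_e), E[mask])

def filter_ratio(T_e):
    return band_intensity(T_e, 1.0, 2.0) / band_intensity(T_e, 0.3, 1.0)

# Invert a measured per-pixel intensity ratio against a lookup table to get T_e.
T_grid = np.linspace(0.05, 1.0, 500)      # candidate temperatures in keV
ratios = np.array([filter_ratio(T) for T in T_grid])
measured = 0.06                           # fabricated ratio for one pixel
T_e_pixel = T_grid[np.argmin(np.abs(ratios - measured))]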
49. Learning Model Predictive Control Parameters via Bayesian Optimization for Battery Fast Charging
- Author
-
Hirt, Sebastian, Höhl, Andreas, Schaeffer, Joachim, Pohlodek, Johannes, Braatz, Richard D., and Findeisen, Rolf
- Subjects
Electrical Engineering and Systems Science - Systems and Control ,Computer Science - Machine Learning - Abstract
Tuning parameters in model predictive control (MPC) presents significant challenges, particularly when there is a notable discrepancy between the controller's predictions and the actual behavior of the closed-loop plant. This mismatch may stem from factors like substantial model-plant differences, limited prediction horizons that do not cover the entire time of interest, or unforeseen system disturbances. Such mismatches can jeopardize both performance and safety, including constraint satisfaction. Traditional methods address this issue by modifying the finite-horizon cost function to better reflect the overall operational cost, learning parts of the prediction model from data, or implementing robust MPC strategies, which might be either computationally intensive or overly cautious. As an alternative, directly optimizing or learning the controller parameters to enhance closed-loop performance has been proposed. We apply Bayesian optimization for efficient learning of unknown model parameters and parameterized constraint backoff terms, aiming to improve the closed-loop performance of battery fast charging (a sketch of the outer loop follows this entry). This approach establishes a hierarchical control framework where Bayesian optimization directly fine-tunes closed-loop behavior towards a global and long-term objective, while MPC handles lower-level, short-term control tasks. For lithium-ion battery fast charging, we show that the learning approach not only ensures safe operation but also maximizes closed-loop performance. This includes maintaining the battery's operation below its maximum terminal voltage and reducing charging times, all achieved using a standard nominal MPC model with a short horizon and notable initial model-plant mismatch., Comment: 6 pages, 5 figures, accepted for ADCHEM 2024
- Published
- 2024
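The hierarchical loop in entry 49 treats one closed-loop fast-charging episode as a single expensive function evaluation over the MPC parameter vector. A minimal sketch with scikit-optimize; the parameter names, bounds, and the rollout stub are placeholders rather than the paper's setup:

from skopt import gp_minimize
from skopt.space import Real

def closed_loop_cost(params):
    """Placeholder: run one fast-charging episode under MPC with these parameters
    and return charging time plus penalties for constraint violations."""
    backoff_v, model_gain = params
    return (backoff_v - 0.05) ** 2 + (model_gain - 1.2) ** 2  # toy surrogate cost

space = [Real(0.0, 0.2, name="voltage_backoff"), Real(0.5, 2.0, name="model_gain")]
result = gp_minimize(closed_loop_cost, space, n_calls=30, random_state=0)
print(result.x, result.fun)  # best parameters found and their closed-loop cost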
50. Cycle Life Prediction for Lithium-ion Batteries: Machine Learning and More
- Author
-
Schaeffer, Joachim, Galuppini, Giacomo, Rhyu, Jinwook, Asinger, Patrick A., Droop, Robin, Findeisen, Rolf, and Braatz, Richard D.
- Subjects
Electrical Engineering and Systems Science - Systems and Control ,Computer Science - Machine Learning - Abstract
Batteries are dynamic systems with complicated nonlinear aging that is highly dependent on cell design, chemistry, manufacturing, and operating conditions. Prediction of battery cycle life and estimation of aging states are important to accelerate battery R&D and testing, and to further the understanding of how batteries degrade. Beyond testing, battery management systems rely on real-time models and onboard diagnostics and prognostics for safe operation. Estimating the state of health and remaining useful life of a battery is important to optimize performance and to use resources efficiently. This tutorial begins with an overview of first-principles, machine learning, and hybrid battery models. Then, a typical pipeline for the development of interpretable machine learning models is explained and showcased for cycle life prediction from laboratory testing data (a minimal sketch follows this entry). We highlight the challenges of machine learning models, motivating the incorporation of physics into hybrid modeling approaches, which are needed to decipher the aging trajectory of batteries but require more data and further work on the physics of battery degradation. The tutorial closes with a discussion on generalization and further research directions., Comment: 6 pages, 3 figures, accepted for ACC 2024
- Published
- 2024
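A minimal sketch of the kind of interpretable pipeline the tutorial in entry 50 walks through: hand-crafted early-cycle features feeding a sparse linear model whose coefficients remain inspectable per feature. Data and feature semantics are fabricated placeholders:

import numpy as np
from sklearn.linear_model import ElasticNetCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Fabricated early-cycle features for 40 cells (e.g., variance of dQ/dV differences,
# initial capacity, charge-time statistics) and their observed cycle lives.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = 800 + 120 * X[:, 0] - 60 * X[:, 1] + rng.normal(scale=20, size=40)

# Sparse, scaled linear model: coefficients stay interpretable per feature.
pipe = make_pipeline(StandardScaler(), ElasticNetCV(cv=5))
pipe.fit(X, np.log(y))  # log cycle-life targets are common in this literature
print(pipe.named_steps["elasticnetcv"].coef_)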