2,472 results for "P. Tenenbaum"
Search Results
2. Sketching With Your Voice: 'Non-Phonorealistic' Rendering of Sounds via Vocal Imitation
- Author
- Caren, Matthew, Chandra, Kartik, Tenenbaum, Joshua B., Ragan-Kelley, Jonathan, and Ma, Karima
- Subjects
- Computer Science - Graphics, Computer Science - Computation and Language, Computer Science - Human-Computer Interaction, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing, I.3.8
- Abstract
We present a method for automatically producing human-like vocal imitations of sounds: the equivalent of "sketching," but for auditory rather than visual representation. Starting with a simulated model of the human vocal tract, we first try generating vocal imitations by tuning the model's control parameters to make the synthesized vocalization match the target sound in terms of perceptually-salient auditory features. Then, to better match human intuitions, we apply a cognitive theory of communication to take into account how human speakers reason strategically about their listeners. Finally, we show through several experiments and user studies that when we add this type of communicative reasoning to our method, it aligns with human intuitions better than matching auditory features alone does. This observation has broad implications for the study of depiction in computer graphics., Comment: SIGGRAPH Asia 2024
- Published
- 2024
3. SIFToM: Robust Spoken Instruction Following through Theory of Mind
- Author
- Ying, Lance, Liu, Jason Xinyu, Aarya, Shivam, Fang, Yizirui, Tellex, Stefanie, Tenenbaum, Joshua B., and Shu, Tianmin
- Subjects
- Computer Science - Robotics, Computer Science - Artificial Intelligence, Computer Science - Human-Computer Interaction, Computer Science - Multiagent Systems
- Abstract
Spoken language instructions are ubiquitous in agent collaboration. However, in human-robot collaboration, recognition accuracy for human speech is often influenced by various speech and environmental factors, such as background noise, speakers' accents, and mispronunciations. When faced with noisy or unfamiliar auditory inputs, humans use context and prior knowledge to disambiguate the stimulus and take pragmatic actions, a process referred to as top-down processing in cognitive science. We present a cognitively inspired model, Speech Instruction Following through Theory of Mind (SIFToM), to enable robots to pragmatically follow human instructions under diverse speech conditions by inferring the human's goal and joint plan as a prior for speech perception and understanding. We test SIFToM in simulated home experiments (VirtualHome 2). Results show that the SIFToM model outperforms state-of-the-art speech and language models, approaching human-level accuracy on challenging speech instruction following tasks. We then demonstrate its ability at the task planning level on a mobile manipulator for breakfast preparation tasks., Comment: 7 pages, 4 figures
- Published
- 2024
4. What Makes a Maze Look Like a Maze?
- Author
- Hsu, Joy, Mao, Jiayuan, Tenenbaum, Joshua B., Goodman, Noah D., and Wu, Jiajun
- Subjects
- Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Machine Learning
- Abstract
A unique aspect of human visual understanding is the ability to flexibly interpret abstract concepts: acquiring lifted rules explaining what they symbolize, grounding them across familiar and unfamiliar contexts, and making predictions or reasoning about them. While off-the-shelf vision-language models excel at making literal interpretations of images (e.g., recognizing object categories such as tree branches), they still struggle to make sense of such visual abstractions (e.g., how an arrangement of tree branches may form the walls of a maze). To address this challenge, we introduce Deep Schema Grounding (DSG), a framework that leverages explicit structured representations of visual abstractions for grounding and reasoning. At the core of DSG are schemas--dependency graph descriptions of abstract concepts that decompose them into more primitive-level symbols. DSG uses large language models to extract schemas, then hierarchically grounds concrete to abstract components of the schema onto images with vision-language models. The grounded schema is used to augment visual abstraction understanding. We systematically evaluate DSG and different methods in reasoning on our new Visual Abstractions Dataset, which consists of diverse, real-world images of abstract concepts and corresponding question-answer pairs labeled by humans. We show that DSG significantly improves the abstract visual reasoning performance of vision-language models, and is a step toward human-aligned understanding of visual abstractions.
- Published
- 2024
5. Evaluating Multiview Object Consistency in Humans and Image Models
- Author
- Bonnen, Tyler, Fu, Stephanie, Bai, Yutong, O'Connell, Thomas, Friedman, Yoni, Kanwisher, Nancy, Tenenbaum, Joshua B., and Efros, Alexei A.
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
We introduce a benchmark to directly evaluate the alignment between human observers and vision models on a 3D shape inference task. We leverage an experimental design from the cognitive sciences which requires zero-shot visual inferences about object shape: given a set of images, participants identify which contain the same/different objects, despite considerable viewpoint variation. We draw from a diverse range of images that include common objects (e.g., chairs) as well as abstract shapes (i.e., procedurally generated `nonsense' objects). After constructing over 2000 unique image sets, we administer these tasks to human participants, collecting 35K trials of behavioral data from over 500 participants. This includes explicit choice behaviors as well as intermediate measures, such as reaction time and gaze data. We then evaluate the performance of common vision models (e.g., DINOv2, MAE, CLIP). We find that humans outperform all models by a wide margin. Using a multi-scale evaluation approach, we identify underlying similarities and differences between models and humans: while human-model performance is correlated, humans allocate more time/processing on challenging trials. All images, data, and code can be accessed via our project page., Comment: Project page: https://tzler.github.io/MOCHI/ Code: https://github.com/tzler/mochi_code Huggingface dataset: https://huggingface.co/datasets/tzler/MOCHI
- Published
- 2024
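The zero-shot same/different design in the entry above can be illustrated with a generic odd-one-out scorer over image embeddings. This is a hedged sketch of how vision models are commonly evaluated on such tasks, not the paper's actual protocol; the toy vectors below stand in for real model features (e.g., DINOv2 or CLIP outputs):

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity between two feature vectors.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def odd_one_out(embeddings):
    # Score each image by its mean similarity to the others; the image
    # depicting a different object should be the least similar one.
    n = len(embeddings)
    scores = [np.mean([cosine(embeddings[i], embeddings[j])
                       for j in range(n) if j != i])
              for i in range(n)]
    return int(np.argmin(scores))
```

With three embeddings of one object and one of another, `odd_one_out` returns the index of the mismatched image; human choices on the same trials can then be compared against the model's.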
6. On a family of arithmetic series related to the M\"obius function
- Author
- Tenenbaum, Gérald
- Subjects
- Mathematics - Number Theory, 11N37, 11N25, 11N56
- Abstract
Let $P^-(n)$ denote the smallest prime factor of an integer $n>1$. Furthermore let $\mu$ and $\omega$ denote respectively the M\"obius function and the number of distinct prime factors function. We show that, given any set ${{\scr P}}$ of prime numbers with a natural density, we have $\sum_{P^-(n)\in \scr P}\mu(n)\omega(n)/n=0$ and provide an effective estimate for the rate of convergence. This extends a recent result of Alladi and Johnson, who considered the case when ${\scr P}$ is an arithmetic progression.
- Published
- 2024
7. Understanding Epistemic Language with a Bayesian Theory of Mind
- Author
- Ying, Lance, Zhi-Xuan, Tan, Wong, Lionel, Mansinghka, Vikash, and Tenenbaum, Joshua B.
- Subjects
- Computer Science - Computation and Language, Computer Science - Artificial Intelligence
- Abstract
How do people understand and evaluate claims about others' beliefs, even though these beliefs cannot be directly observed? In this paper, we introduce a cognitive model of epistemic language interpretation, grounded in Bayesian inferences about other agents' goals, beliefs, and intentions: a language-augmented Bayesian theory-of-mind (LaBToM). By translating natural language into an epistemic ``language-of-thought'', then evaluating these translations against the inferences produced by inverting a probabilistic generative model of rational action and perception, LaBToM captures graded plausibility judgments about epistemic claims. We validate our model in an experiment where participants watch an agent navigate a maze to find keys hidden in boxes needed to reach their goal, then rate sentences about the agent's beliefs. In contrast with multimodal LLMs (GPT-4o, Gemini Pro) and ablated models, our model correlates highly with human judgments for a wide range of expressions, including modal language, uncertainty expressions, knowledge claims, likelihood comparisons, and attributions of false belief., Comment: 21 pages
- Published
- 2024
8. Spectral Approximation for Substitution Systems
- Author
- Band, Ram, Beckus, Siegfried, Pogorzelski, Felix, and Tenenbaum, Lior
- Subjects
- Mathematics - Spectral Theory, Mathematics - Dynamical Systems, 37B10, 37B52, 52C23, 81Q10
- Abstract
We study periodic approximations of aperiodic Schr\"odinger operators on lattices in Lie groups with dilation structure. The potentials arise through symbolic substitution systems that have been recently introduced in this setting. We characterize convergence of spectra of associated Schr\"odinger operators in the Hausdorff distance via properties of finite graphs. As a consequence, new examples of periodic approximations are obtained. We further prove that there are substitution systems that do not admit periodic approximations in higher dimensions, in contrast to the one-dimensional case. On the other hand, if the spectra converge, then we show that the rate of convergence is necessarily exponentially fast. These results are new even for substitutions over $\mathbb{Z}^d$., Comment: 33 pages, 5 figures
- Published
- 2024
9. Can Large Language Models Understand Symbolic Graphics Programs?
- Author
- Qiu, Zeju, Liu, Weiyang, Feng, Haiwen, Liu, Zhen, Xiao, Tim Z., Collins, Katherine M., Tenenbaum, Joshua B., Weller, Adrian, Black, Michael J., and Schölkopf, Bernhard
- Subjects
- Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition
- Abstract
Against the backdrop of enthusiasm for large language models (LLMs), there is an urgent need to scientifically assess their capabilities and shortcomings. This is nontrivial in part because it is difficult to find tasks which the models have not encountered during training. Utilizing symbolic graphics programs, we propose a domain well-suited to test multiple spatial-semantic reasoning skills of LLMs. Popular in computer graphics, these programs procedurally generate visual data. While LLMs exhibit impressive skills in general program synthesis and analysis, symbolic graphics programs offer a new layer of evaluation: they allow us to test an LLM's ability to answer different-grained semantic-level questions of the images or 3D geometries without a vision encoder. To semantically understand the symbolic programs, LLMs would need to possess the ability to "imagine" and reason how the corresponding graphics content would look with only the symbolic description. We use this task to evaluate LLMs by creating a large benchmark for the semantic visual understanding of symbolic graphics programs, built procedurally with minimal human effort. Particular emphasis is placed on transformations of images that leave the image level semantics invariant while introducing significant changes to the underlying program. We evaluate commercial and open-source LLMs on our benchmark to assess their ability to reason about visual output of programs, finding that LLMs considered stronger at reasoning generally perform better. Lastly, we introduce a novel method to improve this ability -- Symbolic Instruction Tuning (SIT), in which the LLM is finetuned with pre-collected instruction data on symbolic graphics programs. 
Interestingly, we find that SIT not only improves the LLM's understanding of symbolic programs, but also improves general reasoning ability on various other benchmarks., Comment: Technical Report v2 (46 pages, 24 figures, project page: https://sgp-bench.github.io/, substantial update from v1)
- Published
- 2024
10. Compositional Physical Reasoning of Objects and Events from Videos
- Author
- Chen, Zhenfang, Dong, Shilong, Yi, Kexin, Li, Yunzhu, Ding, Mingyu, Torralba, Antonio, Tenenbaum, Joshua B., and Gan, Chuang
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
Understanding and reasoning about objects' physical properties in the natural world is a fundamental challenge in artificial intelligence. While some properties like colors and shapes can be directly observed, others, such as mass and electric charge, are hidden from the objects' visual appearance. This paper addresses the unique challenge of inferring these hidden physical properties from objects' motion and interactions and predicting corresponding dynamics based on the inferred physical properties. We first introduce the Compositional Physical Reasoning (ComPhy) dataset. For a given set of objects, ComPhy includes limited videos of them moving and interacting under different initial conditions. The model is evaluated based on its capability to unravel the compositional hidden properties, such as mass and charge, and use this knowledge to answer a set of questions. Besides the synthetic videos from simulators, we also collect a real-world dataset to further test the physical reasoning abilities of different models. We evaluate state-of-the-art video reasoning models on ComPhy and reveal their limited ability to capture these hidden properties, which leads to inferior performance. We also propose a novel neuro-symbolic framework, Physical Concept Reasoner (PCR), that learns and reasons about both visible and hidden physical properties from question answering. After training, PCR demonstrates remarkable capabilities. It can detect and associate objects across frames, ground visible and hidden physical properties, make future and counterfactual predictions, and utilize these extracted representations to answer challenging questions., Comment: arXiv admin note: text overlap with arXiv:2205.01089
- Published
- 2024
11. Symbolic metaprogram search improves learning efficiency and explains rule learning in humans.
- Author
- Rule, Joshua, Piantadosi, Steven, Cropper, Andrew, Ellis, Kevin, Nye, Maxwell, and Tenenbaum, Joshua
- Subjects
- Humans, Learning, Algorithms
- Abstract
Throughout their lives, humans seem to learn a variety of rules for things like applying category labels, following procedures, and explaining causal relationships. These rules are often algorithmically rich but are nonetheless acquired with minimal data and computation. Symbolic models based on program learning successfully explain rule-learning in many domains, but performance degrades quickly as program complexity increases. It remains unclear how to scale symbolic rule-learning methods to model human performance in challenging domains. Here we show that symbolic search over the space of metaprograms (programs that revise programs) dramatically improves learning efficiency. On a behavioral benchmark of 100 algorithmically rich rules, this approach fits human learning more accurately than alternative models while also using orders of magnitude less search. The computation required to match median human performance is consistent with conservative estimates of human thinking time. Our results suggest that metaprogram-like representations may help human learners to efficiently acquire rules.
- Published
- 2024
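The core move in the entry above, searching over programs that revise programs rather than rebuilding programs from scratch, can be illustrated with a toy sketch. The list-transformation domain, the primitive set, and the single-edit metaprograms below are all invented for illustration; they are not the paper's benchmark or search procedure:

```python
# Programs are pipelines of primitive list-to-list operations.
PRIMITIVES = {
    "reverse": lambda xs: xs[::-1],
    "double":  lambda xs: [2 * x for x in xs],
    "tail":    lambda xs: xs[1:],
}

def run(program, xs):
    # Execute a pipeline of primitives left to right.
    for op in program:
        xs = PRIMITIVES[op](xs)
    return xs

def metaprograms(program):
    # Metaprograms revise an existing program: here, every single-edit
    # revision (insert one primitive, or delete one primitive).
    for i in range(len(program) + 1):
        for op in PRIMITIVES:
            yield program[:i] + [op] + program[i:]
    for i in range(len(program)):
        yield program[:i] + program[i + 1:]

def revise(program, examples):
    # One step of symbolic search over revisions of a near-miss program.
    for candidate in metaprograms(program):
        if all(run(candidate, xs) == ys for xs, ys in examples):
            return candidate
    return None
```

Starting from a near-miss like `["tail", "double"]`, `revise` only has to explore one-edit neighbors to fit examples such as `([1, 2, 3], [6, 4])`, rather than the full space of pipelines.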
12. Infinite Ends from Finite Samples: Open-Ended Goal Inference as Top-Down Bayesian Filtering of Bottom-Up Proposals
- Author
- Zhi-Xuan, Tan, Kang, Gloria, Mansinghka, Vikash, and Tenenbaum, Joshua B.
- Subjects
- Computer Science - Artificial Intelligence
- Abstract
The space of human goals is tremendously vast; and yet, from just a few moments of watching a scene or reading a story, we seem to spontaneously infer a range of plausible motivations for the people and characters involved. What explains this remarkable capacity for intuiting other agents' goals, despite the infinitude of ends they might pursue? And how does this cohere with our understanding of other people as approximately rational agents? In this paper, we introduce a sequential Monte Carlo model of open-ended goal inference, which combines top-down Bayesian inverse planning with bottom-up sampling based on the statistics of co-occurring subgoals. By proposing goal hypotheses related to the subgoals achieved by an agent, our model rapidly generates plausible goals without exhaustive search, then filters out goals that would be irrational given the actions taken so far. We validate this model in a goal inference task called Block Words, where participants try to guess the word that someone is stacking out of lettered blocks. In comparison to both heuristic bottom-up guessing and exact Bayesian inference over hundreds of goals, our model better predicts the mean, variance, efficiency, and resource rationality of human goal inferences, achieving similar accuracy to the exact model at a fraction of the cognitive cost, while also explaining garden-path effects that arise from misleading bottom-up cues. Our experiments thus highlight the importance of uniting top-down and bottom-up models for explaining the speed, accuracy, and generality of human theory-of-mind., Comment: Accepted for publication at CogSci 2024. 6 pages, 4 figures. (Appendix: 5 pages, 6 figures, 2 tables)
- Published
- 2024
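The combination described above, bottom-up proposal of goal hypotheses plus top-down filtering of irrational ones, can be sketched in a deterministic toy version of a Block Words-style task. The tiny lexicon and the all-or-nothing rationality filter are illustrative assumptions, not the paper's sequential Monte Carlo model:

```python
# Hypothetical lexicon of candidate goal words.
LEXICON = ["cat", "car", "cart", "dog", "draw", "crane"]

def infer_goal_posterior(letter_sequence):
    particles = {}  # goal hypothesis -> weight
    prefix = ""
    for letter in letter_sequence:
        prefix += letter
        # Bottom-up: propose goals related to the subgoals achieved so
        # far (words consistent with the stacked-letter prefix).
        for w in LEXICON:
            if w.startswith(prefix):
                particles.setdefault(w, 1.0)
        # Top-down filter: drop goals that would make the observed
        # actions irrational (words the prefix now rules out).
        particles = {w: wt for w, wt in particles.items()
                     if w.startswith(prefix)}
    total = sum(particles.values()) or 1.0
    return {w: wt / total for w, wt in particles.items()}
```

After observing the letters "c", "a", the posterior is uniform over "cat", "car", and "cart"; hypotheses are generated without enumerating the whole lexicon's extensions up front, which is the point of pairing proposals with filtering.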
13. Building Machines that Learn and Think with People
- Author
- Collins, Katherine M., Sucholutsky, Ilia, Bhatt, Umang, Chandra, Kartik, Wong, Lionel, Lee, Mina, Zhang, Cedegao E., Zhi-Xuan, Tan, Ho, Mark, Mansinghka, Vikash, Weller, Adrian, Tenenbaum, Joshua B., and Griffiths, Thomas L.
- Subjects
- Computer Science - Human-Computer Interaction, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
- Abstract
What do we want from machine intelligence? We envision machines that are not just tools for thought, but partners in thought: reasonable, insightful, knowledgeable, reliable, and trustworthy systems that think with us. Current artificial intelligence (AI) systems satisfy some of these criteria, some of the time. In this Perspective, we show how the science of collaborative cognition can be put to work to engineer systems that really can be called ``thought partners,'' systems built to meet our expectations and complement our limitations. We lay out several modes of collaborative thought in which humans and AI thought partners can engage and propose desiderata for human-compatible thought partnerships. Drawing on motifs from computational cognitive science, we motivate an alternative scaling path for the design of thought partners and ecosystems around their use through a Bayesian lens, whereby the partners we construct actively build and reason over models of the human and world.
- Published
- 2024
14. People use fast, goal-directed simulation to reason about novel games
- Author
- Zhang, Cedegao E., Collins, Katherine M., Wong, Lionel, Weller, Adrian, and Tenenbaum, Joshua B.
- Subjects
- Computer Science - Computer Science and Game Theory, Computer Science - Artificial Intelligence, Quantitative Biology - Neurons and Cognition
- Abstract
We can evaluate features of problems and their potential solutions well before we can effectively solve them. When considering a game we have never played, for instance, we might infer whether it is likely to be challenging, fair, or fun simply from hearing the game rules, prior to deciding whether to invest time in learning the game or trying to play it well. Many studies of game play have focused on optimality and expertise, characterizing how people and computational models play based on moderate to extensive search and after playing a game dozens (if not thousands or millions) of times. Here, we study how people reason about a range of simple but novel connect-n style board games. We ask people to judge how fair and how fun the games are from very little experience: just thinking about the game for a minute or so, before they have ever actually played with anyone else, and we propose a resource-limited model that captures their judgments using only a small number of partial game simulations and almost no lookahead search., Comment: Accepted at CogSci 2024 as a talk
- Published
- 2024
15. Potential Based Diffusion Motion Planning
- Author
- Luo, Yunhao, Sun, Chen, Tenenbaum, Joshua B., and Du, Yilun
- Subjects
- Computer Science - Robotics, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
- Abstract
Effective motion planning in high dimensional spaces is a long-standing open problem in robotics. One class of traditional motion planning algorithms corresponds to potential-based motion planning. An advantage of potential-based motion planning is composability -- different motion constraints can be easily combined by adding corresponding potentials. However, constructing motion paths from potentials requires solving a global optimization across the configuration-space potential landscape, which is often prone to local minima. We propose a new approach towards learning potential-based motion planning, where we train a neural network to capture and learn easily optimizable potentials over motion planning trajectories. We illustrate the effectiveness of this approach, significantly outperforming both classical and recent learned motion planning approaches and avoiding issues with local minima. We further illustrate its inherent composability, enabling us to generalize to a multitude of different motion constraints., Comment: ICML 2024. Project page and code at https://energy-based-model.github.io/potential-motion-plan/
- Published
- 2024
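The composability property that the abstract above highlights, combining constraints by adding potentials, is easiest to see in a classical potential-field planner. This is a hedged sketch of that classical baseline with made-up attractive/repulsive potentials and hand-picked constants, not the paper's learned diffusion-based method:

```python
import numpy as np

def goal_potential(q, goal):
    # Attractive term: grows with squared distance to the goal.
    return 0.5 * np.sum((q - goal) ** 2)

def obstacle_potential(q, center, radius, d0=0.3, eta=0.01):
    # Repulsive term: active only within distance d0 of the obstacle.
    d = max(np.linalg.norm(q - center) - radius, 1e-3)
    if d >= d0:
        return 0.0
    return 0.5 * eta * (1.0 / d - 1.0 / d0) ** 2

def combined_potential(q, goal, obstacles):
    # Composability: independent constraints combine by adding potentials.
    return goal_potential(q, goal) + sum(
        obstacle_potential(q, c, r) for c, r in obstacles)

def plan(start, goal, obstacles, step=0.05, iters=500):
    # Follow the clipped negative numerical gradient of the total potential.
    q = np.asarray(start, dtype=float)
    path = [q.copy()]
    for _ in range(iters):
        grad = np.zeros_like(q)
        for i in range(len(q)):
            e = np.zeros_like(q)
            e[i] = 1e-4
            grad[i] = (combined_potential(q + e, goal, obstacles)
                       - combined_potential(q - e, goal, obstacles)) / 2e-4
        n = np.linalg.norm(grad)
        if n > 1.0:
            grad = grad / n  # clip step size for stability near obstacles
        q = q - step * grad
        path.append(q.copy())
    return path
```

Adding another constraint means adding another potential term to `combined_potential`; the local-minima failure mode the abstract mentions shows up here whenever attraction and repulsion cancel, which is what the learned approach aims to avoid.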
16. On partial derivatives of some summatory functions
- Author
- Tenenbaum, Gérald
- Subjects
- Mathematics - Number Theory, 11N25, 11N37, 11K65, secondary 11N35, 11N64
- Abstract
Let $f$ be a real arithmetic function and let $g:[1,\infty[\to{\mathbb R}$ be a smooth function. We describe two emblematic instances in which saddle-point estimates may be used to evaluate the frequency, on the set of integers $n\leqslant x$, of the event $\{f(n)\leqslant g(n)\}$ from those relevant to the event $\{f(n)\leqslant y\}$. The first example revisits Dickman's historical contribution to the theory of friable integers. The second is concerned with the distribution of the squarefree kernel of an integer.
- Published
- 2024
17. Compositional Image Decomposition with Diffusion Models
- Author
- Su, Jocelin, Liu, Nan, Wang, Yanbo, Tenenbaum, Joshua B., and Du, Yilun
- Subjects
- Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
- Abstract
Given an image of a natural scene, we are able to quickly decompose it into a set of components such as objects, lighting, shadows, and foreground. We can then envision a scene where we combine certain components with those from other images, for instance a set of objects from our bedroom and animals from a zoo under the lighting conditions of a forest, even if we have never encountered such a scene before. In this paper, we present a method to decompose an image into such compositional components. Our approach, Decomp Diffusion, is an unsupervised method which, when given a single image, infers a set of different components in the image, each represented by a diffusion model. We demonstrate how components can capture different factors of the scene, ranging from global scene descriptors like shadows or facial expression to local scene descriptors like constituent objects. We further illustrate how inferred factors can be flexibly composed, even with factors inferred from other models, to generate a variety of scenes sharply different from those seen at training time. Website and code at https://energy-based-model.github.io/decomp-diffusion., Comment: ICML 2024, Webpage: https://energy-based-model.github.io/decomp-diffusion
- Published
- 2024
18. Evaluating Large Vision-and-Language Models on Children's Mathematical Olympiads
- Author
- Cherian, Anoop, Peng, Kuan-Chuan, Lohit, Suhas, Matthiesen, Joanna, Smith, Kevin, and Tenenbaum, Joshua B.
- Subjects
- Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition
- Abstract
Recent years have seen significant progress in the general-purpose problem solving abilities of large vision and language models (LVLMs), such as ChatGPT, Gemini, etc.; some of these breakthroughs even seem to enable AI models to outperform human abilities in varied tasks that demand higher-order cognitive skills. Are the current large AI models indeed capable of generalized problem solving as humans do? A systematic analysis of AI capabilities for joint vision and text reasoning, however, is missing in the current scientific literature. In this paper, we make an effort towards filling this gap, by evaluating state-of-the-art LVLMs on their mathematical and algorithmic reasoning abilities using visuo-linguistic problems from children's Olympiads. Specifically, we consider problems from the Mathematical Kangaroo (MK) Olympiad, a popular international competition targeted at children in grades 1-12 that tests children's deeper mathematical abilities using puzzles appropriately gauged to their age and skills. Using the puzzles from MK, we created a dataset, dubbed SMART-840, consisting of 840 problems from years 2020-2024. With our dataset, we analyze the mathematical reasoning power of LVLMs; their responses on our puzzles offer a direct way to compare against those of children. Our results show that modern LVLMs do demonstrate increasingly powerful reasoning skills in solving problems for higher grades, but lack the foundations to correctly answer problems designed for younger children. Further analysis shows that there is no significant correlation between the reasoning capabilities of AI models and that of young children, and their capabilities appear to be based on a different type of reasoning than the cumulative knowledge that underlies children's mathematics and logic skills.
- Published
- 2024
19. Learning Iterative Reasoning through Energy Diffusion
- Author
- Du, Yilun, Mao, Jiayuan, and Tenenbaum, Joshua B.
- Subjects
- Computer Science - Machine Learning, Computer Science - Artificial Intelligence
- Abstract
We introduce iterative reasoning through energy diffusion (IRED), a novel framework for learning to reason for a variety of tasks by formulating reasoning and decision-making problems with energy-based optimization. IRED learns energy functions to represent the constraints between input conditions and desired outputs. After training, IRED adapts the number of optimization steps during inference based on problem difficulty, enabling it to solve problems outside its training distribution -- such as more complex Sudoku puzzles, matrix completion with large value magnitudes, and pathfinding in larger graphs. Key to our method's success are two novel techniques: learning a sequence of annealed energy landscapes for easier inference and a combination of score function and energy landscape supervision for faster and more stable training. Our experiments show that IRED outperforms existing methods in continuous-space reasoning, discrete-space reasoning, and planning tasks, particularly in more challenging scenarios. Code and visualizations at https://energy-based-model.github.io/ired/, Comment: ICML 2024, website: https://energy-based-model.github.io/ired/
- Published
- 2024
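The general recipe behind the entry above, answering a query by descending an energy function over candidate solutions, with an annealed sequence of landscapes, can be sketched on a toy problem. The hand-coded linear-constraint energy and the cooling schedule below are illustrative stand-ins; IRED's energies are learned and its landscapes are annealed by training, not by a noise schedule:

```python
import numpy as np

def energy(x, A, b):
    # Low energy <=> x satisfies the linear constraints A @ x = b.
    r = A @ x - b
    return float(r @ r)

def solve_by_descent(A, b, steps=2000, lr=0.05, anneal=0.995, seed=0):
    # Iterative reasoning as noisy gradient descent on the energy,
    # with the injected noise cooled over time (annealing).
    rng = np.random.default_rng(seed)
    x = rng.normal(size=A.shape[1])
    noise = 0.1
    for _ in range(steps):
        grad = 2.0 * A.T @ (A @ x - b)  # analytic gradient of the energy
        x = x - lr * grad + noise * rng.normal(size=x.shape)
        noise *= anneal                 # cool the landscape exploration
    return x
```

Harder problems can simply be given more descent steps at inference time, which is the adaptive-computation property the abstract emphasizes.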
20. Representational Alignment Supports Effective Machine Teaching
- Author
- Sucholutsky, Ilia, Collins, Katherine M., Malaviya, Maya, Jacoby, Nori, Liu, Weiyang, Sumers, Theodore R., Korakakis, Michalis, Bhatt, Umang, Ho, Mark, Tenenbaum, Joshua B., Love, Brad, Pardos, Zachary A., Weller, Adrian, and Griffiths, Thomas L.
- Subjects
- Computer Science - Machine Learning
- Abstract
A good teacher should not only be knowledgeable, but should also be able to communicate in a way that the student understands -- to share the student's representation of the world. In this work, we integrate insights from machine teaching and pragmatic communication with the burgeoning literature on representational alignment to characterize a utility curve defining a relationship between representational alignment and teacher capability for promoting student learning. To explore the characteristics of this utility curve, we design a supervised learning environment that disentangles representational alignment from teacher accuracy. We conduct extensive computational experiments with machines teaching machines, complemented by a series of experiments in which machines teach humans. Drawing on our findings that improved representational alignment with a student improves student learning outcomes (i.e., task accuracy), we design a classroom matching procedure that assigns students to teachers based on the utility curve. If we are to design effective machine teachers, it is not enough to build teachers that are accurate -- we want teachers that can align, representationally, to their students too., Comment: Preprint
- Published
- 2024
21. Physically Compatible 3D Object Modeling from a Single Image
- Author
- Guo, Minghao, Wang, Bohan, Ma, Pingchuan, Zhang, Tianyuan, Owens, Crystal Elaine, Gan, Chuang, Tenenbaum, Joshua B., He, Kaiming, and Matusik, Wojciech
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
We present a computational framework that transforms single images into 3D physical objects. The visual geometry of a physical object in an image is determined by three orthogonal attributes: mechanical properties, external forces, and rest-shape geometry. Existing single-view 3D reconstruction methods often overlook this underlying composition, presuming rigidity or neglecting external forces. Consequently, the reconstructed objects fail to withstand real-world physical forces, resulting in instability or undesirable deformation -- diverging from their intended designs as depicted in the image. Our optimization framework addresses this by embedding physical compatibility into the reconstruction process. We explicitly decompose the three physical attributes and link them through static equilibrium, which serves as a hard constraint, ensuring that the optimized physical shapes exhibit desired physical behaviors. Evaluations on a dataset collected from Objaverse demonstrate that our framework consistently enhances the physical realism of 3D models over existing methods. The utility of our framework extends to practical applications in dynamic simulations and 3D printing, where adherence to physical compatibility is paramount.
- Published
- 2024
22. LLM and Simulation as Bilevel Optimizers: A New Paradigm to Advance Physical Scientific Discovery
- Author
- Ma, Pingchuan, Wang, Tsun-Hsuan, Guo, Minghao, Sun, Zhiqing, Tenenbaum, Joshua B., Rus, Daniela, Gan, Chuang, and Matusik, Wojciech
- Subjects
- Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computational Engineering, Finance, and Science
- Abstract
Large Language Models have recently gained significant attention in scientific discovery for their extensive knowledge and advanced reasoning capabilities. However, they encounter challenges in effectively simulating observational feedback and grounding it with language to propel advancements in physical scientific discovery. Conversely, human scientists undertake scientific discovery by formulating hypotheses, conducting experiments, and revising theories through observational analysis. Inspired by this, we propose to enhance the knowledge-driven, abstract reasoning abilities of LLMs with the computational strength of simulations. We introduce Scientific Generative Agent (SGA), a bilevel optimization framework: LLMs act as knowledgeable and versatile thinkers, proposing scientific hypotheses and reasoning about discrete components, such as physics equations or molecule structures; meanwhile, simulations function as experimental platforms, providing observational feedback and optimizing via differentiability for continuous parts, such as physical parameters. We conduct extensive experiments to demonstrate our framework's efficacy in constitutive law discovery and molecular design, unveiling novel solutions that differ from conventional human expectations yet remain coherent upon analysis., Comment: ICML 2024
- Published
- 2024
23. STAR: A Benchmark for Situated Reasoning in Real-World Videos
- Author
-
Wu, Bo, Yu, Shoubin, Chen, Zhenfang, Tenenbaum, Joshua B, and Gan, Chuang
- Subjects
Computer Science - Artificial Intelligence ,Computer Science - Computation and Language ,Computer Science - Computer Vision and Pattern Recognition - Abstract
Reasoning in the real world is not divorced from situations. How to capture the present knowledge from surrounding situations and perform reasoning accordingly is crucial and challenging for machine intelligence. This paper introduces a new benchmark that evaluates the situated reasoning ability via situation abstraction and logic-grounded question answering for real-world videos, called Situated Reasoning in Real-World Videos (STAR Benchmark). This benchmark is built upon the real-world videos associated with human actions or interactions, which are naturally dynamic, compositional, and logical. The dataset includes four types of questions: interaction, sequence, prediction, and feasibility. We represent the situations in real-world videos by hyper-graphs connecting extracted atomic entities and relations (e.g., actions, persons, objects, and relationships). Besides visual perception, situated reasoning also requires structured situation comprehension and logical reasoning. Questions and answers are procedurally generated. The answering logic of each question is represented by a functional program based on a situation hyper-graph. We compare various existing video reasoning models and find that they all struggle on this challenging situated reasoning task. We further propose a diagnostic neuro-symbolic model that can disentangle visual perception, situation abstraction, language understanding, and functional reasoning to understand the challenges of this benchmark., Comment: NeurIPS
- Published
- 2024
24. Elements of World Knowledge (EWOK): A cognition-inspired framework for evaluating basic world knowledge in language models
- Author
-
Ivanova, Anna A., Sathe, Aalok, Lipkin, Benjamin, Kumar, Unnathi, Radkani, Setayesh, Clark, Thomas H., Kauf, Carina, Hu, Jennifer, Pramod, R. T., Grand, Gabriel, Paulun, Vivian, Ryskina, Maria, Akyürek, Ekin, Wilcox, Ethan, Rashid, Nafisa, Choshen, Leshem, Levy, Roger, Fedorenko, Evelina, Tenenbaum, Joshua, and Andreas, Jacob
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence ,Computer Science - Machine Learning - Abstract
The ability to build and leverage world models is essential for a general-purpose AI agent. Testing such capabilities is hard, in part because the building blocks of world models are ill-defined. We present Elements of World Knowledge (EWOK), a framework for evaluating world modeling in language models by testing their ability to use knowledge of a concept to match a target text with a plausible/implausible context. EWOK targets specific concepts from multiple knowledge domains known to be vital for world modeling in humans. Domains range from social interactions (help/hinder) to spatial relations (left/right). Both contexts and targets are minimal pairs. Objects, agents, and locations in the items can be flexibly filled in, enabling easy generation of multiple controlled datasets. We then introduce EWOK-CORE-1.0, a dataset of 4,374 items covering 11 world knowledge domains. We evaluate 20 open-weights large language models (1.3B--70B parameters) across a battery of evaluation paradigms along with a human norming study comprising 12,480 measurements. The overall performance of all tested models is worse than human performance, with results varying drastically across domains. These data highlight simple cases where even large models fail and present rich avenues for targeted research on LLM world modeling capabilities., Comment: 21 pages (11 main), 7 figures. Authors Anna Ivanova, Aalok Sathe, Benjamin Lipkin contributed equally
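The minimal-pair evaluation logic can be sketched as follows (the item wording and the word-overlap scorer are illustrative stand-ins, not EWOK data or a real language-model likelihood):

```python
import re

def score(context, target):
    """Stand-in for a language model's plausibility score of `target`
    given `context`: simple word overlap (illustrative only)."""
    ctx = set(re.findall(r"[a-z]+", context.lower()))
    tgt = set(re.findall(r"[a-z]+", target.lower()))
    return len(ctx & tgt)

def accuracy(items):
    """An item is correct when the target scores higher in its plausible
    context than in the minimally different implausible one."""
    correct = sum(
        score(it["plausible"], it["target"])
        > score(it["implausible"], it["target"])
        for it in items)
    return correct / len(items)

# Hypothetical item in the spirit of the help/hinder domain (not from EWOK).
items = [{
    "target": "So she helped him.",
    "plausible": "She wanted to help, so she helped him reach the shelf.",
    "implausible": "She hindered him at every turn.",
}]
```

In the actual benchmark, `score` would be a model's conditional log-probability of the target text, evaluated under both contexts of the minimal pair.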
- Published
- 2024
25. Finding structure in logographic writing with library learning
- Author
-
Jiang, Guangyuan, Hofer, Matthias, Mao, Jiayuan, Wong, Lionel, Tenenbaum, Joshua B., and Levy, Roger P.
- Subjects
Computer Science - Computation and Language - Abstract
One hallmark of human language is its combinatoriality -- reusing a relatively small inventory of building blocks to create a far larger inventory of increasingly complex structures. In this paper, we explore the idea that combinatoriality in language reflects a human inductive bias toward representational efficiency in symbol systems. We develop a computational framework for discovering structure in a writing system. Built on top of state-of-the-art library learning and program synthesis techniques, our computational framework discovers known linguistic structures in the Chinese writing system and reveals how the system evolves towards simplification under pressures for representational efficiency. We demonstrate how a library learning approach, utilizing learned abstractions and compression, may help reveal the fundamental computational principles that underlie the creation of combinatorial structures in human cognition, and offer broader insights into the evolution of efficient communication systems., Comment: Accepted at CogSci 2024 (Talk)
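One round of the abstraction-and-compression idea can be sketched with a byte-pair-style merge (a simplified stand-in for the paper's library-learning machinery; the "stroke" corpus below is invented):

```python
from collections import Counter

def one_abstraction_step(corpus):
    """Find the most frequent adjacent pair of symbols across the corpus,
    add it to the library as a new composite symbol, rewrite the corpus."""
    pairs = Counter()
    for seq in corpus:
        pairs.update(zip(seq, seq[1:]))
    (a, b), _ = pairs.most_common(1)[0]
    new_sym = a + b
    rewritten = []
    for seq in corpus:
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == (a, b):
                out.append(new_sym); i += 2
            else:
                out.append(seq[i]); i += 1
        rewritten.append(out)
    return new_sym, rewritten

def description_length(corpus, library):
    """Crude MDL proxy: total symbols used plus library size."""
    return sum(len(s) for s in corpus) + len(library)

# "Characters" as stroke-token sequences sharing a repeated sub-component.
corpus = [list("wood"), list("woods"), list("woodx")]
sym, rewritten = one_abstraction_step(corpus)
before = description_length(corpus, [])
after = description_length(rewritten, [sym])
```

Iterating such steps while the description length keeps dropping yields a library of reusable sub-components, which is the pressure toward representational efficiency the abstract refers to.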
- Published
- 2024
26. Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems
- Author
-
Dalrymple, David "davidad", Skalse, Joar, Bengio, Yoshua, Russell, Stuart, Tegmark, Max, Seshia, Sanjit, Omohundro, Steve, Szegedy, Christian, Goldhaber, Ben, Ammann, Nora, Abate, Alessandro, Halpern, Joe, Barrett, Clark, Zhao, Ding, Zhi-Xuan, Tan, Wing, Jeannette, and Tenenbaum, Joshua
- Subjects
Computer Science - Artificial Intelligence - Abstract
Ensuring that AI systems reliably and robustly avoid harmful or dangerous behaviours is a crucial challenge, especially for AI systems with a high degree of autonomy and general intelligence, or systems used in safety-critical contexts. In this paper, we will introduce and define a family of approaches to AI safety, which we will refer to as guaranteed safe (GS) AI. The core feature of these approaches is that they aim to produce AI systems which are equipped with high-assurance quantitative safety guarantees. This is achieved by the interplay of three core components: a world model (which provides a mathematical description of how the AI system affects the outside world), a safety specification (which is a mathematical description of what effects are acceptable), and a verifier (which provides an auditable proof certificate that the AI satisfies the safety specification relative to the world model). We outline a number of approaches for creating each of these three core components, describe the main technical challenges, and suggest a number of potential solutions to them. We also argue for the necessity of this approach to AI safety, and for the inadequacy of the main alternative approaches.
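The three-component structure (world model, safety specification, verifier) can be made concrete on a toy finite system; the exhaustive rollout check below is my own stand-in for a proof-producing verifier, not anything proposed in the paper:

```python
def world_model(state, action):
    """Toy world model: how an action changes the (1D) world state."""
    return state + action

def safety_spec(state):
    """Toy safety specification: the state must stay within bounds."""
    return -10 <= state <= 10

def verify(policy, start_states, horizon=3):
    """Exhaustive rollout check standing in for a verifier: from every
    start state, following the policy never violates the specification."""
    for s0 in start_states:
        s = s0
        for _ in range(horizon):
            s = world_model(s, policy(s))
            if not safety_spec(s):
                return False
    return True

safe_policy   = lambda s: -1 if s > 0 else (1 if s < 0 else 0)  # head toward 0
unsafe_policy = lambda s: 5                                      # always push up
```

A GS-AI verifier would emit an auditable certificate rather than a boolean, but the interface is the same: it relates a policy to a specification through the world model.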
- Published
- 2024
27. Mean values of arithmetic functions and application to sums of powers
- Author
-
de la Bretèche, Régis and Tenenbaum, Gérald
- Subjects
Mathematics - Number Theory ,11N25, secondary 11N37, 11N64, 11P05 - Abstract
We provide new upper bounds for sums of certain arithmetic functions in many variables at polynomial arguments and, exploiting recent progress on the mean-value of the Erd\H os-Hooley $\Delta$-function, we derive lower bounds for the cardinality of those integers not exceeding a given limit that are expressible as some sums of powers.
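For context (the definition is standard in the literature but not restated in the abstract), the Erd\H os-Hooley $\Delta$-function measures the maximal number of divisors of $n$ in a window of logarithmic length one:

```latex
\Delta(n) \;=\; \max_{u \in \mathbb{R}} \#\bigl\{\, d : d \mid n,\ u < \log d \le u + 1 \,\bigr\}
```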
- Published
- 2024
28. GOMA: Proactive Embodied Cooperative Communication via Goal-Oriented Mental Alignment
- Author
-
Ying, Lance, Jha, Kunal, Aarya, Shivam, Tenenbaum, Joshua B., Torralba, Antonio, and Shu, Tianmin
- Subjects
Computer Science - Human-Computer Interaction ,Computer Science - Artificial Intelligence ,Computer Science - Multiagent Systems - Abstract
Verbal communication plays a crucial role in human cooperation, particularly when the partners only have incomplete information about the task, environment, and each other's mental state. In this paper, we propose a novel cooperative communication framework, Goal-Oriented Mental Alignment (GOMA). GOMA formulates verbal communication as a planning problem that minimizes the misalignment between the parts of agents' mental states that are relevant to the goals. This approach enables an embodied assistant to reason about when and how to proactively initiate communication with humans in natural language to help achieve better cooperation. We evaluate our approach against strong baselines in two challenging environments, Overcooked (a multiplayer game) and VirtualHome (a household simulator). Our experimental results demonstrate that large language models struggle with generating meaningful communication that is grounded in the social and physical context. In contrast, our approach can successfully generate concise verbal communication for the embodied assistant to effectively boost cooperative performance as well as human users' perception of the assistant., Comment: 8 pages, 5 figures
- Published
- 2024
29. Partially Observable Task and Motion Planning with Uncertainty and Risk Awareness
- Author
-
Curtis, Aidan, Matheos, George, Gothoskar, Nishad, Mansinghka, Vikash, Tenenbaum, Joshua, Lozano-Pérez, Tomás, and Kaelbling, Leslie Pack
- Subjects
Computer Science - Robotics ,Computer Science - Artificial Intelligence - Abstract
Integrated task and motion planning (TAMP) has proven to be a valuable approach to generalizable long-horizon robotic manipulation and navigation problems. However, the typical TAMP problem formulation assumes full observability and deterministic action effects. These assumptions limit the ability of the planner to gather information and make decisions that are risk-aware. We propose a strategy for TAMP with Uncertainty and Risk Awareness (TAMPURA) that is capable of efficiently solving long-horizon planning problems with initial-state and action outcome uncertainty, including problems that require information gathering and avoiding undesirable and irreversible outcomes. Our planner reasons under uncertainty at both the abstract task level and continuous controller level. Given a set of closed-loop goal-conditioned controllers operating in the primitive action space and a description of their preconditions and potential capabilities, we learn a high-level abstraction that can be solved efficiently and then refined to continuous actions for execution. We demonstrate our approach on several robotics problems where uncertainty is a crucial factor and show that reasoning under uncertainty in these problems outperforms previously proposed determinized planning, direct search, and reinforcement learning strategies. Lastly, we demonstrate our planner on two real-world robotics problems using recent advancements in probabilistic perception.
- Published
- 2024
30. WatChat: Explaining perplexing programs by debugging mental models
- Author
-
Chandra, Kartik, Collins, Katherine M., Crichton, Will, Chen, Tony, Li, Tzu-Mao, Weller, Adrian, Nigam, Rachit, Tenenbaum, Joshua, and Ragan-Kelley, Jonathan
- Subjects
Computer Science - Programming Languages ,Computer Science - Artificial Intelligence ,Computer Science - Human-Computer Interaction - Abstract
Often, a good explanation for a program's unexpected behavior is a bug in the programmer's code. But sometimes, an even better explanation is a bug in the programmer's mental model of the language or API they are using. Instead of merely debugging our current code ("giving the programmer a fish"), what if our tools could directly debug our mental models ("teaching the programmer to fish")? In this paper, we apply recent ideas from computational cognitive science to offer a principled framework for doing exactly that. Given a "why?" question about a program, we automatically infer potential misconceptions about the language/API that might cause the user to be surprised by the program's behavior -- and then analyze those misconceptions to provide explanations of the program's behavior. Our key idea is to formally represent misconceptions as counterfactual (erroneous) semantics for the language/API, which can be inferred and debugged using program synthesis techniques. We demonstrate our framework, WatChat, by building systems for explanation in two domains: JavaScript type coercion, and the Git version control system. We evaluate WatChatJS and WatChatGit by comparing their outputs to experimentally-collected human-written explanations in these two domains: we show that WatChat's explanations exhibit key features of human-written explanation, unlike those of a state-of-the-art language model., Comment: This is a preprint of work presented in early-stage non-archival form at the ACL Natural Language Reasoning and Structured Explanations Workshop
- Published
- 2024
31. Loose LIPS Sink Ships: Asking Questions in Battleship with Language-Informed Program Sampling
- Author
-
Grand, Gabriel, Pepe, Valerio, Andreas, Jacob, and Tenenbaum, Joshua B.
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence - Abstract
Questions combine our mastery of language with our remarkable facility for reasoning about uncertainty. How do people navigate vast hypothesis spaces to pose informative questions given limited cognitive resources? We study these tradeoffs in a classic grounded question-asking task based on the board game Battleship. Our language-informed program sampling (LIPS) model uses large language models (LLMs) to generate natural language questions, translate them into symbolic programs, and evaluate their expected information gain. We find that with a surprisingly modest resource budget, this simple Monte Carlo optimization strategy yields informative questions that mirror human performance across varied Battleship board scenarios. In contrast, LLM-only baselines struggle to ground questions in the board state; notably, GPT-4V provides no improvement over non-visual baselines. Our results illustrate how Bayesian models of question-asking can leverage the statistics of language to capture human priors, while highlighting some shortcomings of pure LLMs as grounded reasoners., Comment: Accepted to CogSci 2024
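The expected-information-gain criterion used to rank candidate questions can be computed directly on a toy hypothesis space (the 1x4 board below is my own miniature, not the paper's stimuli):

```python
import math
from collections import Counter

def entropy(weights):
    total = sum(weights)
    return -sum(w / total * math.log2(w / total) for w in weights if w)

def expected_information_gain(hypotheses, question):
    """EIG of a question = prior entropy minus expected posterior entropy,
    where `question` maps a hypothesis (a board) to an answer."""
    prior = entropy([1] * len(hypotheses))
    answer_counts = Counter(question(h) for h in hypotheses)
    expected_posterior = sum(
        n / len(hypotheses) * entropy([1] * n)
        for n in answer_counts.values())
    return prior - expected_posterior

# Toy hypothesis space: a 1x4 strip with one ship of length 2 (3 placements).
boards = [(0, 1), (1, 2), (2, 3)]
q_cell0 = lambda b: 0 in b   # "Is there a ship at cell 0?" -- splits 1 vs 2
q_any   = lambda b: True     # "Is there a ship somewhere?" -- uninformative
```

Here `q_cell0` yields roughly 0.92 bits while the trivially true question yields 0; LIPS ranks LLM-generated, program-translated questions by essentially this quantity (over a sampled, prior-weighted hypothesis space rather than a uniform enumeration).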
- Published
- 2024
32. Approximations of symbolic substitution systems in one dimension
- Author
-
Tenenbaum, Lior
- Subjects
Mathematics - Dynamical Systems ,Mathematical Physics ,Mathematics - Spectral Theory ,52C23, 37B10, 37B52, 81Q10 - Abstract
Periodic approximations of quasicrystals are a powerful tool in analyzing spectra of Schr\"odinger operators arising from quasicrystals, given the known theory for periodic crystals. Namely, we seek periodic operators whose spectra approximate the spectrum of the limiting operator (of the quasicrystal). This naturally leads to studying the convergence of the underlying dynamical systems. We treat dynamical systems which are based on one-dimensional substitutions. We first find natural candidates of dynamical subsystems to approximate the substitution dynamical system. Subsequently, we offer a characterization of their convergence and provide estimates for the rate of convergence. We apply the proposed theory to some guiding examples., Comment: 12 pages, 4 figures, written for the proceedings of ICQ 15 and submitted to the Israel Journal of Chemistry
- Published
- 2024
- Full Text
- View/download PDF
33. Pragmatic Instruction Following and Goal Assistance via Cooperative Language-Guided Inverse Planning
- Author
-
Zhi-Xuan, Tan, Ying, Lance, Mansinghka, Vikash, and Tenenbaum, Joshua B.
- Subjects
Computer Science - Artificial Intelligence ,Computer Science - Computation and Language ,Computer Science - Machine Learning - Abstract
People often give instructions whose meaning is ambiguous without further context, expecting that their actions or goals will disambiguate their intentions. How can we build assistive agents that follow such instructions in a flexible, context-sensitive manner? This paper introduces cooperative language-guided inverse plan search (CLIPS), a Bayesian agent architecture for pragmatic instruction following and goal assistance. Our agent assists a human by modeling them as a cooperative planner who communicates joint plans to the assistant, then performs multimodal Bayesian inference over the human's goal from actions and language, using large language models (LLMs) to evaluate the likelihood of an instruction given a hypothesized plan. Given this posterior, our assistant acts to minimize expected goal achievement cost, enabling it to pragmatically follow ambiguous instructions and provide effective assistance even when uncertain about the goal. We evaluate these capabilities in two cooperative planning domains (Doors, Keys & Gems and VirtualHome), finding that CLIPS significantly outperforms GPT-4V, LLM-based literal instruction following and unimodal inverse planning in both accuracy and helpfulness, while closely matching the inferences and assistive judgments provided by human raters., Comment: Accepted to AAMAS 2024. 8 pages (excl. references), 5 figures/tables. (Appendix: 8 pages, 8 figures/tables). Code available at: https://github.com/probcomp/CLIPS.jl
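The multimodal Bayesian update at the heart of this architecture can be sketched with stubbed likelihood tables (the numbers and goal names below are illustrative; in CLIPS the utterance likelihood comes from an LLM scoring the instruction under each hypothesized joint plan):

```python
def posterior_over_goals(goals, action_lik, utter_lik, prior=None):
    """Multimodal Bayesian update: P(goal | actions, utterance) is
    proportional to prior * P(actions | goal) * P(utterance | plan(goal))."""
    prior = prior or {g: 1.0 / len(goals) for g in goals}
    unnorm = {g: prior[g] * action_lik[g] * utter_lik[g] for g in goals}
    z = sum(unnorm.values())
    return {g: p / z for g, p in unnorm.items()}

goals = ["red_gem", "blue_gem"]
# Hypothetical evidence: the human's movement weakly favors red_gem, while
# an ambiguous instruction fits the blue_gem plan far better.
action_lik = {"red_gem": 0.6, "blue_gem": 0.4}
utter_lik  = {"red_gem": 0.1, "blue_gem": 0.9}
post = posterior_over_goals(goals, action_lik, utter_lik)
```

Given such a posterior, the assistant then chooses the action minimizing expected goal-achievement cost, which is what lets it act helpfully even while the goal remains uncertain.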
- Published
- 2024
34. Grounding Language about Belief in a Bayesian Theory-of-Mind
- Author
-
Ying, Lance, Zhi-Xuan, Tan, Wong, Lionel, Mansinghka, Vikash, and Tenenbaum, Joshua
- Subjects
Computer Science - Artificial Intelligence ,Computer Science - Computation and Language - Abstract
Despite the fact that beliefs are mental states that cannot be directly observed, humans talk about each other's beliefs on a regular basis, often using rich compositional language to describe what others think and know. What explains this capacity to interpret the hidden epistemic content of other minds? In this paper, we take a step towards an answer by grounding the semantics of belief statements in a Bayesian theory-of-mind: By modeling how humans jointly infer coherent sets of goals, beliefs, and plans that explain an agent's actions, then evaluating statements about the agent's beliefs against these inferences via epistemic logic, our framework provides a conceptual role semantics for belief, explaining the gradedness and compositionality of human belief attributions, as well as their intimate connection with goals and plans. We evaluate this framework by studying how humans attribute goals and beliefs while watching an agent solve a doors-and-keys gridworld puzzle that requires instrumental reasoning about hidden objects. In contrast to pure logical deduction, non-mentalizing baselines, and mentalizing that ignores the role of instrumental plans, our model provides a much better fit to human goal and belief attributions, demonstrating the importance of theory-of-mind for a semantics of belief., Comment: Published at CogSci 2024
- Published
- 2024
35. ContPhy: Continuum Physical Concept Learning and Reasoning from Videos
- Author
-
Zheng, Zhicheng, Yan, Xin, Chen, Zhenfang, Wang, Jingzhou, Lim, Qin Zhi Eddie, Tenenbaum, Joshua B., and Gan, Chuang
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
We introduce the Continuum Physical Dataset (ContPhy), a novel benchmark for assessing machine physical commonsense. ContPhy complements existing physical reasoning benchmarks by encompassing the inference of diverse physical properties, such as mass and density, across various scenarios and predicting corresponding dynamics. We evaluated a range of AI models and found that they still struggle to achieve satisfactory performance on ContPhy, which shows that current AI models still lack physical commonsense for the continuum, especially soft bodies, and illustrates the value of the proposed dataset. We also introduce an oracle model (ContPRO) that marries particle-based physical dynamics models with recent large language models, enjoying the advantages of both: precise dynamic predictions and interpretable reasoning. ContPhy aims to spur progress in perception and reasoning within diverse physical settings, narrowing the divide between human and machine intelligence in understanding the physical world. Project page: https://physical-reasoning-project.github.io, Comment: The first three authors contributed equally to this work
- Published
- 2024
36. Bone impairment in atypical hemolytic and uremic syndrome treated by long-term eculizumab
- Author
-
Regnier, Maitena, Leclerc, Anne-Laure Sellier, Tenenbaum, Julie, Desjonqueres, Marine, Chavassieux, Pascale, Fremeaux-Bacchi, Véronique, Farlay, Delphine, and Bacchetta, Justine
- Published
- 2024
- Full Text
- View/download PDF
37. Computed tomography in patients with sepsis presenting to the emergency department: exploring its role in light of patient outcomes
- Author
-
Pohlan, Julian, Möckel, Martin, Slagman, Anna, Tenenbaum, Hannah, Stolz, Jules, Rubarth, Kerstin, Winning, Johannes, Bauer, Michael, Reinhart, Konrad, Stacke, Angelika, Dewey, Marc, and Bolanaki, Myrto
- Published
- 2024
- Full Text
- View/download PDF
38. Clinical and economic inpatient burden of respiratory syncytial virus (RSV) infections in children < 2 years of age in Germany, 2014–2019: a retrospective health claims analysis
- Author
-
Lade, Caroline, Bayer, Lea, Huebbe, Bennet, Riedel, Jennifer, Melnik, Sima, Brestrich, Gordon, von Eiff, Christof, and Tenenbaum, Tobias
- Published
- 2024
- Full Text
- View/download PDF
39. The severity of respiratory syncytial virus infection in children during the SARS-CoV-2/COVID-19 pandemic: A nationwide study of 11,915 cases in Germany
- Author
-
Maslowski, Sarah, Hohenstein, Sven, Bollmann, Andreas, Karagiannidis, Christian, Papan, Cihan, Thal, Serge C., Wirth, Stefan, Tenenbaum, Tobias, and Aydin, Malik
- Published
- 2024
- Full Text
- View/download PDF
40. HAZARD Challenge: Embodied Decision Making in Dynamically Changing Environments
- Author
-
Zhou, Qinhong, Chen, Sunli, Wang, Yisong, Xu, Haozhe, Du, Weihua, Zhang, Hongxin, Du, Yilun, Tenenbaum, Joshua B., and Gan, Chuang
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence ,Computer Science - Computation and Language - Abstract
Recent advances in high-fidelity virtual environments serve as one of the major driving forces for building intelligent embodied agents to perceive, reason and interact with the physical world. Typically, these environments remain unchanged unless agents interact with them. However, in real-world scenarios, agents might also face dynamically changing environments characterized by unexpected events and need to rapidly take action accordingly. To remedy this gap, we propose a new simulated embodied benchmark, called HAZARD, specifically designed to assess the decision-making abilities of embodied agents in dynamic situations. HAZARD consists of three unexpected disaster scenarios, including fire, flood, and wind, and specifically supports the utilization of large language models (LLMs) to assist commonsense reasoning and decision-making. This benchmark enables us to evaluate autonomous agents' decision-making capabilities across various pipelines, including reinforcement learning (RL), rule-based, and search-based methods in dynamically changing environments. As a first step toward addressing this challenge using large language models, we further develop an LLM-based agent and perform an in-depth analysis of its promise and the challenges it faces in solving these tasks. HAZARD is available at https://vis-www.cs.umass.edu/hazard/., Comment: ICLR 2024. The first two authors contributed equally to this work
- Published
- 2024
41. MMToM-QA: Multimodal Theory of Mind Question Answering
- Author
-
Jin, Chuanyang, Wu, Yutong, Cao, Jing, Xiang, Jiannan, Kuo, Yen-Ling, Hu, Zhiting, Ullman, Tomer, Torralba, Antonio, Tenenbaum, Joshua B., and Shu, Tianmin
- Subjects
Computer Science - Artificial Intelligence ,Computer Science - Computation and Language ,Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Machine Learning - Abstract
Theory of Mind (ToM), the ability to understand people's mental states, is an essential ingredient for developing machines with human-level social intelligence. Recent machine learning models, particularly large language models, seem to show some aspects of ToM understanding. However, existing ToM benchmarks use unimodal datasets - either video or text. Human ToM, on the other hand, is more than video or text understanding. People can flexibly reason about another person's mind based on conceptual representations (e.g., goals, beliefs, plans) extracted from any available data. To address this, we introduce a multimodal Theory of Mind question answering (MMToM-QA) benchmark. MMToM-QA comprehensively evaluates machine ToM both on multimodal data and on different kinds of unimodal data about a person's activity in a household environment. To engineer multimodal ToM capacity, we propose a novel method, BIP-ALM (Bayesian Inverse Planning Accelerated by Language Models). BIP-ALM extracts unified representations from multimodal data and utilizes language models for scalable Bayesian inverse planning. We conducted a systematic comparison of human performance, BIP-ALM, and state-of-the-art models, including GPT-4. The experiments demonstrate that large language models and large multimodal models still lack robust ToM capacity. BIP-ALM, on the other hand, shows promising results, by leveraging the power of both model-based mental inference and language models., Comment: ACL 2024. 26 pages, 11 figures, 7 tables
- Published
- 2024
42. How does the primate brain combine generative and discriminative computations in vision?
- Author
-
Peters, Benjamin, DiCarlo, James J., Gureckis, Todd, Haefner, Ralf, Isik, Leyla, Tenenbaum, Joshua, Konkle, Talia, Naselaris, Thomas, Stachenfeld, Kimberly, Tavares, Zenna, Tsao, Doris, Yildirim, Ilker, and Kriegeskorte, Nikolaus
- Subjects
Quantitative Biology - Neurons and Cognition ,Computer Science - Artificial Intelligence ,Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Machine Learning - Abstract
Vision is widely understood as an inference problem. However, two contrasting conceptions of the inference process have each been influential in research on biological vision as well as the engineering of machine vision. The first emphasizes bottom-up signal flow, describing vision as a largely feedforward, discriminative inference process that filters and transforms the visual information to remove irrelevant variation and represent behaviorally relevant information in a format suitable for downstream functions of cognition and behavioral control. In this conception, vision is driven by the sensory data, and perception is direct because the processing proceeds from the data to the latent variables of interest. The notion of "inference" in this conception is that of the engineering literature on neural networks, where feedforward convolutional neural networks processing images are said to perform inference. The alternative conception is that of vision as an inference process in Helmholtz's sense, where the sensory evidence is evaluated in the context of a generative model of the causal processes giving rise to it. In this conception, vision inverts a generative model through an interrogation of the evidence in a process often thought to involve top-down predictions of sensory data to evaluate the likelihood of alternative hypotheses. The authors include scientists rooted, in roughly equal numbers, in each of the two conceptions, and are motivated to overcome what might be a false dichotomy between them and to engage the other perspective in the realm of theory and experiment. The primate brain employs an unknown algorithm that may combine the advantages of both conceptions. We explain and clarify the terminology, review the key empirical evidence, and propose an empirical research program that transcends the dichotomy and sets the stage for revealing the mysterious hybrid algorithm of primate vision.
- Published
- 2024
43. Health Care Cost Reductions with Machine Learning–Directed Evaluations during Radiation Therapy — An Economic Analysis of a Randomized Controlled Study
- Author
-
Natesan, Divya, Eisenstein, Eric L, Thomas, Samantha M, Eclov, Neville CW, Dalal, Nicole H, Stephens, Sarah J, Malicki, Mary, Shields, Stacey, Cobb, Alyssa, Mowery, Yvonne M, Niedzwiecki, Donna, Tenenbaum, Jessica D, Palta, Manisha, and Hong, Julian C
- Subjects
Information and Computing Sciences ,Biomedical and Clinical Sciences ,Machine Learning ,Good Health and Well Being
- Published
- 2024
44. Bayes3D: fast learning and inference in structured generative models of 3D objects and scenes
- Author
-
Gothoskar, Nishad, Ghavami, Matin, Li, Eric, Curtis, Aidan, Noseworthy, Michael, Chung, Karen, Patton, Brian, Freeman, William T., Tenenbaum, Joshua B., Klukas, Mirko, and Mansinghka, Vikash K.
- Subjects
Computer Science - Robotics - Abstract
Robots cannot yet match humans' ability to rapidly learn the shapes of novel 3D objects and recognize them robustly despite clutter and occlusion. We present Bayes3D, an uncertainty-aware perception system for structured 3D scenes that reports accurate posterior uncertainty over 3D object shape, pose, and scene composition in the presence of clutter and occlusion. Bayes3D delivers these capabilities via a novel hierarchical Bayesian model for 3D scenes and a GPU-accelerated coarse-to-fine sequential Monte Carlo algorithm. Quantitative experiments show that Bayes3D can learn 3D models of novel objects from just a handful of views, recognizing them more robustly and with orders of magnitude less training data than neural baselines, and tracking 3D objects faster than real time on a single GPU. We also demonstrate that Bayes3D learns complex 3D object models and accurately infers 3D scene composition when used on a Panda robot in a tabletop scenario.
- Published
- 2023
45. Learning adaptive planning representations with natural language guidance
- Author
-
Wong, Lionel, Mao, Jiayuan, Sharma, Pratyusha, Siegel, Zachary S., Feng, Jiahai, Korneev, Noa, Tenenbaum, Joshua B., and Andreas, Jacob
- Subjects
Computer Science - Artificial Intelligence ,Computer Science - Computation and Language ,Computer Science - Robotics - Abstract
Effective planning in the real world requires not only world knowledge, but the ability to leverage that knowledge to build the right representation of the task at hand. Decades of hierarchical planning techniques have used domain-specific temporal action abstractions to support efficient and accurate planning, almost always relying on human priors and domain knowledge to decompose hard tasks into smaller subproblems appropriate for a goal or set of goals. This paper describes Ada (Action Domain Acquisition), a framework for automatically constructing task-specific planning representations using task-general background knowledge from language models (LMs). Starting with a general-purpose hierarchical planner and a low-level goal-conditioned policy, Ada interactively learns a library of planner-compatible high-level action abstractions and low-level controllers adapted to a particular domain of planning tasks. On two language-guided interactive planning benchmarks (Mini Minecraft and ALFRED Household Tasks), Ada strongly outperforms other approaches that use LMs for sequential decision-making, offering more accurate plans and better generalization to complex tasks.
- Published
- 2023
46. How to guess a gradient
- Author
-
Singhal, Utkarsh, Cheung, Brian, Chandra, Kartik, Ragan-Kelley, Jonathan, Tenenbaum, Joshua B., Poggio, Tomaso A., and Yu, Stella X.
- Subjects
Computer Science - Machine Learning ,Computer Science - Neural and Evolutionary Computing - Abstract
How much can you say about the gradient of a neural network without computing a loss or knowing the label? This may sound like a strange question: surely the answer is "very little." However, in this paper, we show that gradients are more structured than previously thought. Gradients lie in a predictable low-dimensional subspace which depends on the network architecture and incoming features. Exploiting this structure can significantly improve gradient-free optimization schemes based on directional derivatives, which have struggled to scale beyond small networks trained on toy datasets. We study how to narrow the gap in optimization performance between methods that calculate exact gradients and those that use directional derivatives. Furthermore, we highlight new challenges in overcoming the large gap between optimizing with exact gradients and guessing the gradients.
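The directional-derivative estimator the abstract refers to is easy to sketch on a toy quadratic (my own setup, not the paper's experiments):

```python
import numpy as np

def guess_gradient(f, w, n_dirs, eps=1e-5, seed=0):
    """Estimate grad f(w) from forward directional derivatives along
    random isotropic directions -- the gradient-free scheme in question."""
    rng = np.random.default_rng(seed)
    g = np.zeros_like(w)
    for _ in range(n_dirs):
        v = rng.standard_normal(len(w))
        g += (f(w + eps * v) - f(w)) / eps * v
    return g / n_dirs

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

A = np.diag(np.arange(1.0, 11.0))      # quadratic f(w) = 0.5 * w^T A w
f = lambda w: 0.5 * w @ A @ w
w = np.ones(10)
true_grad = A @ w
cos_few  = cosine(guess_gradient(f, w, n_dirs=5),    true_grad)
cos_many = cosine(guess_gradient(f, w, n_dirs=1000), true_grad)
```

With isotropic directions, alignment improves only slowly with the sample budget; the paper's observation is that gradients lie in a predictable low-dimensional subspace, so restricting the sampled directions to that subspace makes such estimators far more sample-efficient.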
- Published
- 2023
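The baseline that this abstract builds on, gradient-free optimization from directional derivatives, can be sketched in a few lines: sample a random direction, estimate the derivative of the loss along it by finite differences, and step along that direction. This is a generic illustration of the technique, not the paper's subspace-structured method:

```python
import numpy as np

def directional_derivative(f, w, v, eps=1e-6):
    # Central finite difference of f along direction v.
    return (f(w + eps * v) - f(w - eps * v)) / (2 * eps)

def guess_and_step(f, w, rng, lr=0.1):
    # One step of gradient-free descent: sample a random unit direction,
    # estimate df/dv, and move along v scaled by that estimate.
    v = rng.standard_normal(w.shape)
    v /= np.linalg.norm(v)
    d = directional_derivative(f, w, v)
    return w - lr * d * v

# Minimize a simple quadratic f(w) = ||w||^2 from w = (1, 1, 1, 1).
f = lambda w: float(w @ w)
rng = np.random.default_rng(0)
w = np.ones(4)
for _ in range(500):
    w = guess_and_step(f, w, rng)
```

Note that a single random direction yields an unbiased but high-variance gradient estimate, which is exactly why such methods struggle to scale; the paper's observation is that restricting guesses to a predictable low-dimensional subspace shrinks that variance.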
47. What Planning Problems Can A Relational Neural Network Solve?
- Author
-
Mao, Jiayuan, Lozano-Pérez, Tomás, Tenenbaum, Joshua B., and Kaelbling, Leslie Pack
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence ,Computer Science - Neural and Evolutionary Computing ,Statistics - Machine Learning - Abstract
Goal-conditioned policies are generally understood to be "feed-forward" circuits, in the form of neural networks that map from the current state and the goal specification to the next action to take. However, under what circumstances such a policy can be learned and how efficient the policy will be are not well understood. In this paper, we present a circuit complexity analysis for relational neural networks (such as graph neural networks and transformers) representing policies for planning problems, by drawing connections with serialized goal regression search (S-GRS). We show that there are three general classes of planning problems, in terms of the growth of circuit width and depth as a function of the number of objects and planning horizon, providing constructive proofs. We also illustrate the utility of this analysis for designing neural networks for policy learning., Comment: NeurIPS 2023 (Spotlight). Project page: https://concepts-ai.com/p/goal-regression-width/
- Published
- 2023
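The serialized goal regression search (S-GRS) that the analysis above connects to can be illustrated, in heavily simplified form, as backward search over subgoal sets with delete-free STRIPS operators. This toy version (operator tuples, predicate names, and the breadth-first strategy are all assumptions for illustration) shows the regression step itself:

```python
from collections import deque

def regression_search(init, goal, operators):
    # Backward search over subgoal sets: regress the goal through operators
    # whose add-effects contribute to it, until init satisfies the subgoal.
    # operators: list of (name, pre, add) tuples with frozenset pre/add.
    # Simplification: delete effects are ignored (delete-free planning).
    frontier = deque([(frozenset(goal), [])])
    seen = {frozenset(goal)}
    while frontier:
        subgoal, plan = frontier.popleft()
        if subgoal <= init:
            return plan  # actions in execution order
        for name, pre, add in operators:
            if add & subgoal:
                regressed = (subgoal - add) | pre
                if regressed not in seen:
                    seen.add(regressed)
                    frontier.append((regressed, [name] + plan))
    return None

ops = [
    ("craft_plank", frozenset({"has_log"}), frozenset({"has_plank"})),
    ("craft_table", frozenset({"has_plank"}), frozenset({"has_table"})),
]
plan = regression_search(frozenset({"has_log"}), {"has_table"}, ops)
# → ["craft_plank", "craft_table"]
```

The paper's circuit-width results are phrased in terms of how large these regressed subgoal sets grow with object count and horizon; the sketch only shows the search they are measured over.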
48. Learning Reusable Manipulation Strategies
- Author
-
Mao, Jiayuan, Tenenbaum, Joshua B., Lozano-Pérez, Tomás, and Kaelbling, Leslie Pack
- Subjects
Computer Science - Robotics ,Computer Science - Artificial Intelligence ,Computer Science - Machine Learning - Abstract
Humans demonstrate an impressive ability to acquire and generalize manipulation "tricks." Even from a single demonstration, such as using soup ladles to reach for distant objects, we can apply this skill to new scenarios involving different object positions, sizes, and categories (e.g., forks and hammers). Additionally, we can flexibly combine various skills to devise long-term plans. In this paper, we present a framework that enables machines to acquire such manipulation skills, referred to as "mechanisms," through a single demonstration and self-play. Our key insight lies in interpreting each demonstration as a sequence of changes in robot-object and object-object contact modes, which provides a scaffold for learning detailed samplers for continuous parameters. These learned mechanisms and samplers can be seamlessly integrated into standard task and motion planners, enabling their compositional use., Comment: CoRL 2023. Project page: https://concepts-ai.com/p/mechanisms/
- Published
- 2023
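The key representational move in the abstract above, reading a demonstration as a sequence of contact-mode changes, can be sketched as collapsing a per-timestep contact trace into its distinct modes. The body names and the example trace are hypothetical, and this shows only the scaffold, not the learned samplers:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ContactMode:
    # Which pairs of bodies are in contact during one segment.
    contacts: frozenset  # e.g. {("gripper", "ladle"), ("ladle", "bowl")}

def segment(demo_modes):
    # Collapse a per-timestep trace into the sequence of distinct contact
    # modes; this mode sequence is the scaffold a "mechanism" is built on.
    mechanism = []
    for mode in demo_modes:
        if not mechanism or mechanism[-1] != mode:
            mechanism.append(mode)
    return mechanism

# Hypothetical ladle-reaching demonstration, one ContactMode per timestep.
demo = [
    ContactMode(frozenset()),                        # approach
    ContactMode(frozenset({("gripper", "ladle")})),  # grasp
    ContactMode(frozenset({("gripper", "ladle")})),  # carry (same mode)
    ContactMode(frozenset({("gripper", "ladle"),
                           ("ladle", "bowl")})),     # tool touches target
]
mechanism = segment(demo)
```

Continuous parameters (grasp pose, contact point, trajectory timing) are then sampled per mode transition, which is what makes the skill transfer to new object positions, sizes, and categories.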
49. Fetal surgery using fetoscopic endoluminal tracheal occlusion for severe congenital diaphragmatic hernia: a single-center experience
- Author
-
Idelson, Ana, Tenenbaum-Gavish, Kinneret, Danon, David, Duvdevani, Nir-Ram, Bromiker, Ruben, Klinger, Gil, Orbach-Zinger, Sharon, Almog, Anastasia, Sharabi-Nov, Adi, Meiri, Hamutal, Nicolaides, Kypros H., Wiznitzer, Arnon, and Gielchinsky, Yuval
- Published
- 2024
- Full Text
- View/download PDF
50. Using games to understand the mind
- Author
-
Allen, Kelsey, Brändle, Franziska, Botvinick, Matthew, Fan, Judith E., Gershman, Samuel J., Gopnik, Alison, Griffiths, Thomas L., Hartshorne, Joshua K., Hauser, Tobias U., Ho, Mark K., de Leeuw, Joshua R., Ma, Wei Ji, Murayama, Kou, Nelson, Jonathan D., van Opheusden, Bas, Pouncy, Thomas, Rafner, Janet, Rahwan, Iyad, Rutledge, Robb B., Sherson, Jacob, Şimşek, Özgür, Spiers, Hugo, Summerfield, Christopher, Thalmann, Mirko, Vélez, Natalia, Watrous, Andrew J., Tenenbaum, Joshua B., and Schulz, Eric
- Published
- 2024
- Full Text
- View/download PDF