Author: "Kembhavi, A." - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Kembhavi, A."' showing total 1,801 results

Start Over Author "Kembhavi, A."

1,801 results on '"Kembhavi, A."'

1. Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models

Author: Deitke, Matt, Clark, Christopher, Lee, Sangho, Tripathi, Rohun, Yang, Yue, Park, Jae Sung, Salehi, Mohammadreza, Muennighoff, Niklas, Lo, Kyle, Soldaini, Luca, Lu, Jiasen, Anderson, Taira, Bransom, Erin, Ehsani, Kiana, Ngo, Huong, Chen, YenSung, Patel, Ajay, Yatskar, Mark, Callison-Burch, Chris, Head, Andrew, Hendrix, Rose, Bastani, Favyen, VanderBilt, Eli, Lambert, Nathan, Chou, Yvonne, Chheda, Arnavi, Sparks, Jenna, Skjonsberg, Sam, Schmitz, Michael, Sarnat, Aaron, Bischoff, Byron, Walsh, Pete, Newell, Chris, Wolters, Piper, Gupta, Tanmay, Zeng, Kuo-Hao, Borchardt, Jon, Groeneveld, Dirk, Dumas, Jen, Nam, Crystal, Lebrecht, Sophie, Wittlif, Caitlin, Schoenick, Carissa, Michel, Oscar, Krishna, Ranjay, Weihs, Luca, Smith, Noah A., Hajishirzi, Hannaneh, Girshick, Ross, Farhadi, Ali, and Kembhavi, Aniruddha
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Today's most advanced multimodal models remain proprietary. The strongest open-weight models rely heavily on synthetic data from proprietary VLMs to achieve good performance, effectively distilling these closed models into open ones. As a result, the community is still missing foundational knowledge about how to build performant VLMs from scratch. We present Molmo, a new family of VLMs that are state-of-the-art in their class of openness. Our key innovation is a novel, highly detailed image caption dataset collected entirely from human annotators using speech-based descriptions. To enable a wide array of user interactions, we also introduce a diverse dataset mixture for fine-tuning that includes in-the-wild Q&A and innovative 2D pointing data. The success of our approach relies on careful choices for the model architecture details, a well-tuned training pipeline, and, most critically, the quality of our newly collected datasets, all of which will be released. The best-in-class 72B model within the Molmo family not only outperforms others in the class of open weight and data models but also compares favorably against proprietary systems like GPT-4o, Claude 3.5, and Gemini 1.5 on both academic benchmarks and human evaluation. We will be releasing all of our model weights, captioning and fine-tuning data, and source code in the near future. Select model weights, inference code, and demo are available at https://molmo.allenai.org.
Published: 2024

2. FLaRe: Achieving Masterful and Adaptive Robot Policies with Large-Scale Reinforcement Learning Fine-Tuning

Author: Hu, Jiaheng, Hendrix, Rose, Farhadi, Ali, Kembhavi, Aniruddha, Martin-Martin, Roberto, Stone, Peter, Zeng, Kuo-Hao, and Ehsani, Kiana
Subjects: Computer Science - Robotics, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: In recent years, the Robotics field has initiated several efforts toward building generalist robot policies through large-scale multi-task Behavior Cloning. However, direct deployments of these policies have led to unsatisfactory performance, where the policy struggles with unseen states and tasks. How can we break through the performance plateau of these models and elevate their capabilities to new heights? In this paper, we propose FLaRe, a large-scale Reinforcement Learning fine-tuning framework that integrates robust pre-trained representations, large-scale training, and gradient stabilization techniques. Our method aligns pre-trained policies towards task completion, achieving state-of-the-art (SoTA) performance both on previously demonstrated and on entirely novel tasks and embodiments. Specifically, on a set of long-horizon mobile manipulation tasks, FLaRe achieves an average success rate of 79.5% in unseen environments, with absolute improvements of +23.6% in simulation and +30.7% on real robots over prior SoTA methods. By utilizing only sparse rewards, our approach can enable generalizing to new capabilities beyond the pretraining data with minimal human effort. Moreover, we demonstrate rapid adaptation to new embodiments and behaviors with less than a day of fine-tuning. Videos can be found on the project website at https://robot-flare.github.io/
Published: 2024

3. PoliFormer: Scaling On-Policy RL with Transformers Results in Masterful Navigators

Author: Zeng, Kuo-Hao, Zhang, Zichen, Ehsani, Kiana, Hendrix, Rose, Salvador, Jordi, Herrasti, Alvaro, Girshick, Ross, Kembhavi, Aniruddha, and Weihs, Luca
Subjects: Computer Science - Robotics, Computer Science - Computer Vision and Pattern Recognition
Abstract: We present PoliFormer (Policy Transformer), an RGB-only indoor navigation agent trained end-to-end with reinforcement learning at scale that generalizes to the real-world without adaptation despite being trained purely in simulation. PoliFormer uses a foundational vision transformer encoder with a causal transformer decoder enabling long-term memory and reasoning. It is trained for hundreds of millions of interactions across diverse environments, leveraging parallelized, multi-machine rollouts for efficient training with high throughput. PoliFormer is a masterful navigator, producing state-of-the-art results across two distinct embodiments, the LoCoBot and Stretch RE-1 robots, and four navigation benchmarks. It breaks through the plateaus of previous work, achieving an unprecedented 85.5% success rate in object goal navigation on the CHORES-S benchmark, a 28.5% absolute improvement. PoliFormer can also be trivially extended to a variety of downstream applications such as object tracking, multi-object navigation, and open-vocabulary navigation with no finetuning.
Published: 2024

4. CodeNav: Beyond tool-use to using real-world codebases with LLM agents

Author: Gupta, Tanmay, Weihs, Luca, and Kembhavi, Aniruddha
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Software Engineering
Abstract: We present CodeNav, an LLM agent that navigates and leverages previously unseen code repositories to solve user queries. In contrast to tool-use LLM agents that require ``registration'' of all relevant tools via manual descriptions within the LLM context, CodeNav automatically indexes and searches over code blocks in the target codebase, finds relevant code snippets, imports them, and uses them to iteratively generate a solution with execution feedback. To highlight the core-capabilities of CodeNav, we first showcase three case studies where we use CodeNav for solving complex user queries using three diverse codebases. Next, on three benchmarks, we quantitatively compare the effectiveness of code-use (which only has access to the target codebase) to tool-use (which has privileged access to all tool names and descriptions). Finally, we study the effect of varying kinds of tool and library descriptions on code-use performance, as well as investigate the advantage of the agent seeing source code as opposed to natural descriptions of code. All code will be made open source under a permissive license.
Published: 2024

5. Task Me Anything

Author: Zhang, Jieyu, Huang, Weikai, Ma, Zixian, Michel, Oscar, He, Dong, Gupta, Tanmay, Ma, Wei-Chiu, Farhadi, Ali, Kembhavi, Aniruddha, and Krishna, Ranjay
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Benchmarks for large multimodal language models (MLMs) now serve to simultaneously assess the general capabilities of models instead of evaluating for a specific capability. As a result, when a developer wants to identify which models to use for their application, they are overwhelmed by the number of benchmarks and remain uncertain about which benchmark's results are most reflective of their specific use case. This paper introduces Task-Me-Anything, a benchmark generation engine which produces a benchmark tailored to a user's needs. Task-Me-Anything maintains an extendable taxonomy of visual assets and can programmatically generate a vast number of task instances. Additionally, it algorithmically addresses user queries regarding MLM performance efficiently within a computational budget. It contains 113K images, 10K videos, 2K 3D object assets, over 365 object categories, 655 attributes, and 335 relationships. It can generate 750M image/video question-answering pairs, which focus on evaluating MLM perceptual capabilities. Task-Me-Anything reveals critical insights: open-source MLMs excel in object and attribute recognition but lack spatial and temporal understanding; each model exhibits unique strengths and weaknesses; larger models generally perform better, though exceptions exist; and GPT4o demonstrates challenges in recognizing rotating/moving objects and distinguishing colors., Comment: website: https://www.task-me-anything.org
Published: 2024

6. Preserving Identity with Variational Score for General-purpose 3D Editing

Author: Le, Duong H., Pham, Tuan, Kembhavi, Aniruddha, Mandt, Stephan, Ma, Wei-Chiu, and Lu, Jiasen
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: We present Piva (Preserving Identity with Variational Score Distillation), a novel optimization-based method for editing images and 3D models based on diffusion models. Specifically, our approach is inspired by the recently proposed method for 2D image editing - Delta Denoising Score (DDS). We pinpoint the limitations in DDS for 2D and 3D editing, which causes detail loss and over-saturation. To address this, we propose an additional score distillation term that enforces identity preservation. This results in a more stable editing process, gradually optimizing NeRF models to match target prompts while retaining crucial input characteristics. We demonstrate the effectiveness of our approach in zero-shot image and neural field editing. Our method successfully alters visual attributes, adds both subtle and substantial structural elements, translates shapes, and achieves competitive results on standard 2D and 3D editing benchmarks. Additionally, our method imposes no constraints like masking or pre-training, making it compatible with a wide range of pre-trained diffusion models. This allows for versatile editing without needing neural field-to-mesh conversion, offering a more user-friendly experience., Comment: 22 pages, 14 figures
Published: 2024

7. Automated Detection of Galactic Rings from SDSS Images

Author: Abraham, Linn, Abraham, Sheelu, Kembhavi, Ajit K., Philip, N. S., Aniyan, A. K., Barway, Sudhanshu, and Kumar, Harish
Subjects: Astrophysics - Astrophysics of Galaxies
Abstract: Morphological features in galaxies, like spiral arms, bars, rings, tidal tails etc. carry information about their structure, origin and evolution. It is therefore important to catalog and study such features and to correlate them with other basic galaxy properties, the environment in which the galaxies are located and their interactions with other galaxies. The volume of present and future data on galaxies is so large that traditional methods, which involve expert astronomers identifying morphological features through visual inspection, are no longer sufficient. It is therefore necessary to use AI based techniques like machine learning and deep learning for finding morphological structures quickly and efficiently. We report in this study the application of deep learning for finding ring like structures in galaxy images from the Sloan Digital Sky Survey (SDSS) data release DR18. We use a catalog by Buta (2017) of ringed galaxies from the SDSS to train the network, reaching good accuracy and recall, and generate a catalog of 29420 galaxies of which 4855 have ring like structures with prediction confidence exceeding 90 percent. Using a catalog of barred galaxy images identified by Abraham et. al. (2018) using deep learning techniques, we identify a set of 2087 galaxies with bars as well as rings. The catalog should be very useful in understanding the origin of these important morphological structures. As an example of the usefulness of the catalog, we explore the environments and star formation characteristics of ring galaxies in our sample., Comment: 22 pages, 15 figures, 1 Table. Accepted for publication in The Astrophysical Journal (ApJ)
Published: 2024

8. Iterated Learning Improves Compositionality in Large Vision-Language Models

Author: Zheng, Chenhao, Zhang, Jieyu, Kembhavi, Aniruddha, and Krishna, Ranjay
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: A fundamental characteristic common to both human vision and natural language is their compositional nature. Yet, despite the performance gains contributed by large vision and language pretraining, recent investigations find that most-if not all-our state-of-the-art vision-language models struggle at compositionality. They are unable to distinguish between images of " a girl in white facing a man in black" and "a girl in black facing a man in white". Moreover, prior work suggests that compositionality doesn't arise with scale: larger model sizes or training data don't help. This paper develops a new iterated training algorithm that incentivizes compositionality. We draw on decades of cognitive science research that identifies cultural transmission-the need to teach a new generation-as a necessary inductive prior that incentivizes humans to develop compositional languages. Specifically, we reframe vision-language contrastive learning as the Lewis Signaling Game between a vision agent and a language agent, and operationalize cultural transmission by iteratively resetting one of the agent's weights during training. After every iteration, this training paradigm induces representations that become "easier to learn", a property of compositional languages: e.g. our model trained on CC3M and CC12M improves standard CLIP by 4.7%, 4.0% respectfully in the SugarCrepe benchmark., Comment: CVPR 2024
Published: 2024

9. Light Curve Classification with DistClassiPy: a new distance-based classifier

Author: Chaini, Siddharth, Mahabal, Ashish, Kembhavi, Ajit, and Bianco, Federica B.
Subjects: Astrophysics - Instrumentation and Methods for Astrophysics, Astrophysics - Solar and Stellar Astrophysics, Computer Science - Machine Learning
Abstract: The rise of synoptic sky surveys has ushered in an era of big data in time-domain astronomy, making data science and machine learning essential tools for studying celestial objects. While tree-based models (e.g. Random Forests) and deep learning models dominate the field, we explore the use of different distance metrics to aid in the classification of astrophysical objects. We developed DistClassiPy, a new distance metric based classifier. The direct use of distance metrics is unexplored in time-domain astronomy, but distance-based methods can help make classification more interpretable and decrease computational costs. In particular, we applied DistClassiPy to classify light curves of variable stars, comparing the distances between objects of different classes. Using 18 distance metrics on a catalog of 6,000 variable stars across 10 classes, we demonstrate classification and dimensionality reduction. Our classifier meets state-of-the-art performance but has lower computational requirements and improved interpretability. Additionally, DistClassiPy can be tailored to specific objects by identifying the most effective distance metric for that classification. To facilitate broader applications within and beyond astronomy, we have made DistClassiPy open-source and available at https://pypi.org/project/distclassipy/., Comment: Accepted for publication in Astronomy and Computing (2024). 24 pages, 19 figures
Published: 2024
Full Text: View/download PDF

10. Global Data in Astronomy: Challenges and Opportunities

Author: Hložek, Renee, Cui, Chenzhou, Allen, Mark, Whitelock, Patricia, McIver, Jess, Longo, Giuseppe, Fluke, Christopher, Kembhavi, Ajit, Sharma, Pranav, and Mahabal, Ashish
Subjects: Astrophysics - Instrumentation and Methods for Astrophysics, Physics - Data Analysis, Statistics and Probability
Abstract: Policy Brief on "Global Data in Astronomy: Challenges and Opportunities", distilled from the corresponding panel that was part of the discussions during S20 Policy Webinar on Astroinformatics for Sustainable Development held on 6-7 July 2023. Astronomy is increasingly becoming a data-driven science. Advances in our understanding of the physical mechanisms at work in the Universe require building ever-more sensitive telescopes to gather observations of the cosmos to test and advance our theoretical models of how the universe works. To confront the observed data with our theoretical models we require data hosting, archiving and storage and high-performance computing resources to run the theoretical calculations and compare our simulated and observed universe. We also require the sophisticated development of highly skilled human resources. Newer large projects are often run through international collaborations and partnerships, driving a need for 'open science' and collaborative structure across national boundaries. While astronomical data are useful scientifically, the data do not come with the same ethical/privacy-related restrictions as medical/biological data. Moreover, the ability to use data for new scientific analysis extends and expands the impact and reach of scientific surveys -- this is a strength that national funding agencies should capitalize on. We discuss the management and analysis of such large volumes of data and the corresponding significant challenges that require policy-level preparations. The policy webinar took place during the G20 presidency in India (2023). A summary based on the seven panels can be found here: arxiv:2401.04623., Comment: 5 pages. The panel videos including keynotes and the white papers are available on the S20 site at: https://s20india.org/science-policy-webinar-astroinformatics-for-sustainable-development/
Published: 2024

11. AstroInformatics: Recommendations for Global Cooperation

Author: Mahabal, Ashish, Sharma, Pranav, Adhikari, Rana, Allen, Mark, Andreon, Stefano, Bhalerao, Varun, Bianco, Federica, Brown, Anthony, Cenko, S. Bradley, Coehlo, Paula, Cooke, Jeffery, Crichton, Daniel, Cui, Chenzhou, de Carvalho, Reinaldo, Doyle, Richard, Eyer, Laurent, Fanaroff, Bernard, Fluke, Christopher, Forster, Francisco, Govender, Kevin, Graham, Matthew J., Hložek, Renée, Irawati, Puji, Kembhavi, Ajit, Kollmeier, Juna, Krone-Martins, Alberto, Kulkarni, Shri, Longo, Giuseppe, McBride, Vanessa, McIver, Jess, Mitra, Sanjit, Prusti, Timo, Ramaprakash, A. N., Reddy, Eswar, Reitze, David H., Rosa, Reinaldo R., Santos, Rafael, Sekiguchi, Kazuhiro, Sheth, Kartik, Somasundaram, Seetha, Souradeep, Tarun, Srianand, R., Subramaniam, Annapurni, Szalay, Alex, Tendulkar, Shriharsh, Trouille, Laura, Wadadekar, Yogesh, and Whitelock, Patricia
Subjects: Astrophysics - Instrumentation and Methods for Astrophysics, Physics - Data Analysis, Statistics and Probability
Abstract: Policy Brief on "AstroInformatics, Recommendations for Global Collaboration", distilled from panel discussions during S20 Policy Webinar on Astroinformatics for Sustainable Development held on 6-7 July 2023. The deliberations encompassed a wide array of topics, including broad astroinformatics, sky surveys, large-scale international initiatives, global data repositories, space-related data, regional and international collaborative efforts, as well as workforce development within the field. These discussions comprehensively addressed the current status, notable achievements, and the manifold challenges that the field of astroinformatics currently confronts. The G20 nations present a unique opportunity due to their abundant human and technological capabilities, coupled with their widespread geographical representation. Leveraging these strengths, significant strides can be made in various domains. These include, but are not limited to, the advancement of STEM education and workforce development, the promotion of equitable resource utilization, and contributions to fields such as Earth Science and Climate Science. We present a concise overview, followed by specific recommendations that pertain to both ground-based and space data initiatives. Our team remains readily available to furnish further elaboration on any of these proposals as required. Furthermore, we anticipate further engagement during the upcoming G20 presidencies in Brazil (2024) and South Africa (2025) to ensure the continued discussion and realization of these objectives. The policy webinar took place during the G20 presidency in India (2023). Notes based on the seven panels will be separately published., Comment: 7 pages
Published: 2024

12. Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action

Author: Lu, Jiasen, Clark, Christopher, Lee, Sangho, Zhang, Zichen, Khosla, Savya, Marten, Ryan, Hoiem, Derek, and Kembhavi, Aniruddha
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: We present Unified-IO 2, the first autoregressive multimodal model that is capable of understanding and generating image, text, audio, and action. To unify different modalities, we tokenize inputs and outputs -- images, text, audio, action, bounding boxes, etc., into a shared semantic space and then process them with a single encoder-decoder transformer model. Since training with such diverse modalities is challenging, we propose various architectural improvements to stabilize model training. We train our model from scratch on a large multimodal pre-training corpus from diverse sources with a multimodal mixture of denoisers objective. To learn an expansive set of skills, such as following multimodal instructions, we construct and finetune on an ensemble of 120 datasets with prompts and augmentations. With a single unified model, Unified-IO 2 achieves state-of-the-art performance on the GRIT benchmark and strong results in more than 35 benchmarks, including image generation and understanding, natural language understanding, video and audio understanding, and robotic manipulation. We release all our models to the research community., Comment: 38 pages, 20 figures
Published: 2023

13. Promptable Behaviors: Personalizing Multi-Objective Rewards from Human Preferences

Author: Hwang, Minyoung, Weihs, Luca, Park, Chanwoo, Lee, Kimin, Kembhavi, Aniruddha, and Ehsani, Kiana
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Robotics
Abstract: Customizing robotic behaviors to be aligned with diverse human preferences is an underexplored challenge in the field of embodied AI. In this paper, we present Promptable Behaviors, a novel framework that facilitates efficient personalization of robotic agents to diverse human preferences in complex environments. We use multi-objective reinforcement learning to train a single policy adaptable to a broad spectrum of preferences. We introduce three distinct methods to infer human preferences by leveraging different types of interactions: (1) human demonstrations, (2) preference feedback on trajectory comparisons, and (3) language instructions. We evaluate the proposed method in personalized object-goal navigation and flee navigation tasks in ProcTHOR and RoboTHOR, demonstrating the ability to prompt agent behaviors to satisfy human preferences in various scenarios. Project page: https://promptable-behaviors.github.io
Published: 2023

14. Holodeck: Language Guided Generation of 3D Embodied AI Environments

Author: Yang, Yue, Sun, Fan-Yun, Weihs, Luca, VanderBilt, Eli, Herrasti, Alvaro, Han, Winson, Wu, Jiajun, Haber, Nick, Krishna, Ranjay, Liu, Lingjie, Callison-Burch, Chris, Yatskar, Mark, Kembhavi, Aniruddha, and Clark, Christopher
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Robotics
Abstract: 3D simulated environments play a critical role in Embodied AI, but their creation requires expertise and extensive manual effort, restricting their diversity and scope. To mitigate this limitation, we present Holodeck, a system that generates 3D environments to match a user-supplied prompt fully automatedly. Holodeck can generate diverse scenes, e.g., arcades, spas, and museums, adjust the designs for styles, and can capture the semantics of complex queries such as "apartment for a researcher with a cat" and "office of a professor who is a fan of Star Wars". Holodeck leverages a large language model (i.e., GPT-4) for common sense knowledge about what the scene might look like and uses a large collection of 3D assets from Objaverse to populate the scene with diverse objects. To address the challenge of positioning objects correctly, we prompt GPT-4 to generate spatial relational constraints between objects and then optimize the layout to satisfy those constraints. Our large-scale human evaluation shows that annotators prefer Holodeck over manually designed procedural baselines in residential scenes and that Holodeck can produce high-quality outputs for diverse scene types. We also demonstrate an exciting application of Holodeck in Embodied AI, training agents to navigate in novel scenes like music rooms and daycares without human-constructed data, which is a significant step forward in developing general-purpose embodied agents., Comment: Published in CVPR 2024, 21 pages, 27 figures, 2 tables
Published: 2023

15. Harmonic Mobile Manipulation

Author: Yang, Ruihan, Kim, Yejin, Hendrix, Rose, Kembhavi, Aniruddha, Wang, Xiaolong, and Ehsani, Kiana
Subjects: Computer Science - Robotics, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Recent advancements in robotics have enabled robots to navigate complex scenes or manipulate diverse objects independently. However, robots are still impotent in many household tasks requiring coordinated behaviors such as opening doors. The factorization of navigation and manipulation, while effective for some tasks, fails in scenarios requiring coordinated actions. To address this challenge, we introduce, HarmonicMM, an end-to-end learning method that optimizes both navigation and manipulation, showing notable improvement over existing techniques in everyday tasks. This approach is validated in simulated and real-world environments and adapts to novel unseen settings without additional tuning. Our contributions include a new benchmark for mobile manipulation and the successful deployment with only RGB visual observation in a real unseen apartment, demonstrating the potential for practical indoor robot deployment in daily life. More results are on our project site: https://rchalyang.github.io/HarmonicMM/, Comment: More results are on our project site: https://rchalyang.github.io/HarmonicMM/
Published: 2023

16. SPOC: Imitating Shortest Paths in Simulation Enables Effective Navigation and Manipulation in the Real World

Author: Ehsani, Kiana, Gupta, Tanmay, Hendrix, Rose, Salvador, Jordi, Weihs, Luca, Zeng, Kuo-Hao, Singh, Kunal Pratap, Kim, Yejin, Han, Winson, Herrasti, Alvaro, Krishna, Ranjay, Schwenk, Dustin, VanderBilt, Eli, and Kembhavi, Aniruddha
Subjects: Computer Science - Robotics, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition
Abstract: Reinforcement learning (RL) with dense rewards and imitation learning (IL) with human-generated trajectories are the most widely used approaches for training modern embodied agents. RL requires extensive reward shaping and auxiliary losses and is often too slow and ineffective for long-horizon tasks. While IL with human supervision is effective, collecting human trajectories at scale is extremely expensive. In this work, we show that imitating shortest-path planners in simulation produces agents that, given a language instruction, can proficiently navigate, explore, and manipulate objects in both simulation and in the real world using only RGB sensors (no depth map or GPS coordinates). This surprising result is enabled by our end-to-end, transformer-based, SPOC architecture, powerful visual encoders paired with extensive image augmentation, and the dramatic scale and diversity of our training data: millions of frames of shortest-path-expert trajectories collected inside approximately 200,000 procedurally generated houses containing 40,000 unique 3D assets. Our models, data, training code, and newly proposed 10-task benchmarking suite CHORES are available in https://spoc-robot.github.io., Comment: First six authors contributed equally. Project page: https://spoc-robot.github.io/
Published: 2023

17. Selective Visual Representations Improve Convergence and Generalization for Embodied AI

Author: Eftekhar, Ainaz, Zeng, Kuo-Hao, Duan, Jiafei, Farhadi, Ali, Kembhavi, Ani, and Krishna, Ranjay
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Embodied AI models often employ off the shelf vision backbones like CLIP to encode their visual observations. Although such general purpose representations encode rich syntactic and semantic information about the scene, much of this information is often irrelevant to the specific task at hand. This introduces noise within the learning process and distracts the agent's focus from task-relevant visual cues. Inspired by selective attention in humans-the process through which people filter their perception based on their experiences, knowledge, and the task at hand-we introduce a parameter-efficient approach to filter visual stimuli for embodied AI. Our approach induces a task-conditioned bottleneck using a small learnable codebook module. This codebook is trained jointly to optimize task reward and acts as a task-conditioned selective filter over the visual observation. Our experiments showcase state-of-the-art performance for object goal navigation and object displacement across 5 benchmarks, ProcTHOR, ArchitecTHOR, RoboTHOR, AI2-iTHOR, and ManipulaTHOR. The filtered representations produced by the codebook are also able generalize better and converge faster when adapted to other simulation environments such as Habitat. Our qualitative analyses show that agents explore their environments more effectively and their representations retain task-relevant information like target object recognition while ignoring superfluous information about other objects. Code and pretrained models are available at our project website: https://embodied-codebook.github.io., Comment: See project website: https://embodied-codebook.github.io
Published: 2023

18. Open X-Embodiment: Robotic Learning Datasets and RT-X Models

Author: Collaboration, Open X-Embodiment, O'Neill, Abby, Rehman, Abdul, Gupta, Abhinav, Maddukuri, Abhiram, Gupta, Abhishek, Padalkar, Abhishek, Lee, Abraham, Pooley, Acorn, Gupta, Agrim, Mandlekar, Ajay, Jain, Ajinkya, Tung, Albert, Bewley, Alex, Herzog, Alex, Irpan, Alex, Khazatsky, Alexander, Rai, Anant, Gupta, Anchit, Wang, Andrew, Kolobov, Andrey, Singh, Anikait, Garg, Animesh, Kembhavi, Aniruddha, Xie, Annie, Brohan, Anthony, Raffin, Antonin, Sharma, Archit, Yavary, Arefeh, Jain, Arhan, Balakrishna, Ashwin, Wahid, Ayzaan, Burgess-Limerick, Ben, Kim, Beomjoon, Schölkopf, Bernhard, Wulfe, Blake, Ichter, Brian, Lu, Cewu, Xu, Charles, Le, Charlotte, Finn, Chelsea, Wang, Chen, Xu, Chenfeng, Chi, Cheng, Huang, Chenguang, Chan, Christine, Agia, Christopher, Pan, Chuer, Fu, Chuyuan, Devin, Coline, Xu, Danfei, Morton, Daniel, Driess, Danny, Chen, Daphne, Pathak, Deepak, Shah, Dhruv, Büchler, Dieter, Jayaraman, Dinesh, Kalashnikov, Dmitry, Sadigh, Dorsa, Johns, Edward, Foster, Ethan, Liu, Fangchen, Ceola, Federico, Xia, Fei, Zhao, Feiyu, Frujeri, Felipe Vieira, Stulp, Freek, Zhou, Gaoyue, Sukhatme, Gaurav S., Salhotra, Gautam, Yan, Ge, Feng, Gilbert, Schiavi, Giulio, Berseth, Glen, Kahn, Gregory, Yang, Guangwen, Wang, Guanzhi, Su, Hao, Fang, Hao-Shu, Shi, Haochen, Bao, Henghui, Amor, Heni Ben, Christensen, Henrik I, Furuta, Hiroki, Bharadhwaj, Homanga, Walke, Homer, Fang, Hongjie, Ha, Huy, Mordatch, Igor, Radosavovic, Ilija, Leal, Isabel, Liang, Jacky, Abou-Chakra, Jad, Kim, Jaehyung, Drake, Jaimyn, Peters, Jan, Schneider, Jan, Hsu, Jasmine, Vakil, Jay, Bohg, Jeannette, Bingham, Jeffrey, Wu, Jeffrey, Gao, Jensen, Hu, Jiaheng, Wu, Jiajun, Wu, Jialin, Sun, Jiankai, Luo, Jianlan, Gu, Jiayuan, Tan, Jie, Oh, Jihoon, Wu, Jimmy, Lu, Jingpei, Yang, Jingyun, Malik, Jitendra, Silvério, João, Hejna, Joey, Booher, Jonathan, Tompson, Jonathan, Yang, Jonathan, Salvador, Jordi, Lim, Joseph J., Han, Junhyek, Wang, Kaiyuan, Rao, Kanishka, Pertsch, Karl, Hausman, Karol, Go, Keegan, Gopalakrishnan, Keerthana, Goldberg, Ken, Byrne, Kendra, Oslund, Kenneth, Kawaharazuka, Kento, Black, Kevin, Lin, Kevin, Zhang, Kevin, Ehsani, Kiana, Lekkala, Kiran, Ellis, Kirsty, Rana, Krishan, Srinivasan, Krishnan, Fang, Kuan, Singh, Kunal Pratap, Zeng, Kuo-Hao, Hatch, Kyle, Hsu, Kyle, Itti, Laurent, Chen, Lawrence Yunliang, Pinto, Lerrel, Fei-Fei, Li, Tan, Liam, Fan, Linxi "Jim", Ott, Lionel, Lee, Lisa, Weihs, Luca, Chen, Magnum, Lepert, Marion, Memmel, Marius, Tomizuka, Masayoshi, Itkina, Masha, Castro, Mateo Guaman, Spero, Max, Du, Maximilian, Ahn, Michael, Yip, Michael C., Zhang, Mingtong, Ding, Mingyu, Heo, Minho, Srirama, Mohan Kumar, Sharma, Mohit, Kim, Moo Jin, Kanazawa, Naoaki, Hansen, Nicklas, Heess, Nicolas, Joshi, Nikhil J, Suenderhauf, Niko, Liu, Ning, Di Palo, Norman, Shafiullah, Nur Muhammad Mahi, Mees, Oier, Kroemer, Oliver, Bastani, Osbert, Sanketi, Pannag R, Miller, Patrick "Tree", Yin, Patrick, Wohlhart, Paul, Xu, Peng, Fagan, Peter David, Mitrano, Peter, Sermanet, Pierre, Abbeel, Pieter, Sundaresan, Priya, Chen, Qiuyu, Vuong, Quan, Rafailov, Rafael, Tian, Ran, Doshi, Ria, Mart'in-Mart'in, Roberto, Baijal, Rohan, Scalise, Rosario, Hendrix, Rose, Lin, Roy, Qian, Runjia, Zhang, Ruohan, Mendonca, Russell, Shah, Rutav, Hoque, Ryan, Julian, Ryan, Bustamante, Samuel, Kirmani, Sean, Levine, Sergey, Lin, Shan, Moore, Sherry, Bahl, Shikhar, Dass, Shivin, Sonawani, Shubham, Tulsiani, Shubham, Song, Shuran, Xu, Sichun, Haldar, Siddhant, Karamcheti, Siddharth, Adebola, Simeon, Guist, Simon, Nasiriany, Soroush, Schaal, Stefan, Welker, Stefan, Tian, Stephen, Ramamoorthy, Subramanian, Dasari, Sudeep, Belkhale, Suneel, Park, Sungjae, Nair, Suraj, Mirchandani, Suvir, Osa, Takayuki, Gupta, Tanmay, Harada, Tatsuya, Matsushima, Tatsuya, Xiao, Ted, Kollar, Thomas, Yu, Tianhe, Ding, Tianli, Davchev, Todor, Zhao, Tony Z., Armstrong, Travis, Darrell, Trevor, Chung, Trinity, Jain, Vidhi, Kumar, Vikash, Vanhoucke, Vincent, Zhan, Wei, Zhou, Wenxuan, Burgard, Wolfram, Chen, Xi, Chen, Xiangyu, Wang, Xiaolong, Zhu, Xinghao, Geng, Xinyang, Liu, Xiyuan, Liangwei, Xu, Li, Xuanlin, Pang, Yansong, Lu, Yao, Ma, Yecheng Jason, Kim, Yejin, Chebotar, Yevgen, Zhou, Yifan, Zhu, Yifeng, Wu, Yilin, Xu, Ying, Wang, Yixuan, Bisk, Yonatan, Dou, Yongqiang, Cho, Yoonyoung, Lee, Youngwoon, Cui, Yuchen, Cao, Yue, Wu, Yueh-Hua, Tang, Yujin, Zhu, Yuke, Zhang, Yunchu, Jiang, Yunfan, Li, Yunshuang, Li, Yunzhu, Iwasawa, Yusuke, Matsuo, Yutaka, Ma, Zehan, Xu, Zhuo, Cui, Zichen Jeff, Zhang, Zichen, Fu, Zipeng, and Lin, Zipeng
Subjects: Computer Science - Robotics
Abstract: Large, high-capacity models trained on diverse datasets have shown remarkable successes on efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning methods train a separate model for every application, every robot, and even every environment. Can we instead train generalist X-robot policy that can be adapted efficiently to new robots, tasks, and environments? In this paper, we provide datasets in standardized data formats and models to make it possible to explore this possibility in the context of robotic manipulation, alongside experimental results that provide an example of effective X-robot policies. We assemble a dataset from 22 different robots collected through a collaboration between 21 institutions, demonstrating 527 skills (160266 tasks). We show that a high-capacity model trained on this data, which we call RT-X, exhibits positive transfer and improves the capabilities of multiple robots by leveraging experience from other platforms. More details can be found on the project website https://robotics-transformer-x.github.io., Comment: Project website: https://robotics-transformer-x.github.io
Published: 2023

19. OBJECT 3DIT: Language-guided 3D-aware Image Editing

Author: Michel, Oscar, Bhattad, Anand, VanderBilt, Eli, Krishna, Ranjay, Kembhavi, Aniruddha, and Gupta, Tanmay
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Graphics
Abstract: Existing image editing tools, while powerful, typically disregard the underlying 3D geometry from which the image is projected. As a result, edits made using these tools may become detached from the geometry and lighting conditions that are at the foundation of the image formation process. In this work, we formulate the newt ask of language-guided 3D-aware editing, where objects in an image should be edited according to a language instruction in context of the underlying 3D scene. To promote progress towards this goal, we release OBJECT: a dataset consisting of 400K editing examples created from procedurally generated 3D scenes. Each example consists of an input image, editing instruction in language, and the edited image. We also introduce 3DIT : single and multi-task models for four editing tasks. Our models show impressive abilities to understand the 3D composition of entire scenes, factoring in surrounding objects, surfaces, lighting conditions, shadows, and physically-plausible object configurations. Surprisingly, training on only synthetic scenes from OBJECT, editing capabilities of 3DIT generalize to real-world images.
Published: 2023

20. Objaverse-XL: A Universe of 10M+ 3D Objects

Author: Deitke, Matt, Liu, Ruoshi, Wallingford, Matthew, Ngo, Huong, Michel, Oscar, Kusupati, Aditya, Fan, Alan, Laforte, Christian, Voleti, Vikram, Gadre, Samir Yitzhak, VanderBilt, Eli, Kembhavi, Aniruddha, Vondrick, Carl, Gkioxari, Georgia, Ehsani, Kiana, Schmidt, Ludwig, and Farhadi, Ali
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Natural language processing and 2D vision models have attained remarkable proficiency on many tasks primarily by escalating the scale of training data. However, 3D vision tasks have not seen the same progress, in part due to the challenges of acquiring high-quality 3D data. In this work, we present Objaverse-XL, a dataset of over 10 million 3D objects. Our dataset comprises deduplicated 3D objects from a diverse set of sources, including manually designed objects, photogrammetry scans of landmarks and everyday items, and professional scans of historic and antique artifacts. Representing the largest scale and diversity in the realm of 3D datasets, Objaverse-XL enables significant new possibilities for 3D vision. Our experiments demonstrate the improvements enabled with the scale provided by Objaverse-XL. We show that by training Zero123 on novel view synthesis, utilizing over 100 million multi-view rendered images, we achieve strong zero-shot generalization abilities. We hope that releasing Objaverse-XL will enable further innovations in the field of 3D vision at scale.
Published: 2023

21. SugarCrepe: Fixing Hackable Benchmarks for Vision-Language Compositionality

Author: Hsieh, Cheng-Yu, Zhang, Jieyu, Ma, Zixian, Kembhavi, Aniruddha, and Krishna, Ranjay
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: In the last year alone, a surge of new benchmarks to measure compositional understanding of vision-language models have permeated the machine learning ecosystem. Given an image, these benchmarks probe a model's ability to identify its associated caption amongst a set of compositional distractors. Surprisingly, we find significant biases in all these benchmarks rendering them hackable. This hackability is so dire that blind models with no access to the image outperform state-of-the-art vision-language models. To remedy this rampant vulnerability, we introduce SugarCrepe, a new benchmark for vision-language compositionality evaluation. We employ large language models, instead of rule-based templates used in previous benchmarks, to generate fluent and sensical hard negatives, and utilize an adversarial refinement mechanism to maximally reduce biases. We re-evaluate state-of-the-art models and recently proposed compositionality inducing strategies, and find that their improvements were hugely overestimated, suggesting that more innovation is needed in this important direction. We release SugarCrepe and the code for evaluation at: https://github.com/RAIVNLab/sugar-crepe.
Published: 2023

22. MIMIC: Masked Image Modeling with Image Correspondences

Author: Marathe, Kalyani, Bigverdi, Mahtab, Khan, Nishat, Kundu, Tuhin, Howe, Patrick, S, Sharan Ranjit, Bhattad, Anand, Kembhavi, Aniruddha, Shapiro, Linda G., and Krishna, Ranjay
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Dense pixel-specific representation learning at scale has been bottlenecked due to the unavailability of large-scale multi-view datasets. Current methods for building effective pretraining datasets heavily rely on annotated 3D meshes, point clouds, and camera parameters from simulated environments, preventing them from building datasets from real-world data sources where such metadata is lacking. We propose a pretraining dataset-curation approach that does not require any additional annotations. Our method allows us to generate multi-view datasets from both real-world videos and simulated environments at scale. Specifically, we experiment with two scales: MIMIC-1M with 1.3M and MIMIC-3M with 3.1M multi-view image pairs. We train multiple models with different masked image modeling objectives to showcase the following findings: Representations trained on our automatically generated MIMIC-3M outperform those learned from expensive crowdsourced datasets (ImageNet-1K) and those learned from synthetic environments (MULTIVIEW-HABITAT) on two dense geometric tasks: depth estimation on NYUv2 (1.7%), and surface normals estimation on Taskonomy (2.05%). For dense tasks which also require object understanding, we outperform MULTIVIEW-HABITAT, on semantic segmentation on ADE20K (3.89%), pose estimation on MSCOCO (9.4%), and reduce the gap with models pre-trained on the object-centric expensive ImageNet-1K. We outperform even when the representations are frozen, and when downstream training data is limited to few-shot. Larger dataset (MIMIC-3M) significantly improves performance, which is promising since our curation method can arbitrarily scale to produce even larger datasets. MIMIC code, dataset, and pretrained models are open-sourced at https://github.com/RAIVNLab/MIMIC.
Published: 2023

23. Neural Priming for Sample-Efficient Adaptation

Author: Wallingford, Matthew, Ramanujan, Vivek, Fang, Alex, Kusupati, Aditya, Mottaghi, Roozbeh, Kembhavi, Aniruddha, Schmidt, Ludwig, and Farhadi, Ali
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition
Abstract: We propose Neural Priming, a technique for adapting large pretrained models to distribution shifts and downstream tasks given few or no labeled examples. Presented with class names or unlabeled test samples, Neural Priming enables the model to recall and conditions its parameters on relevant data seen throughout pretraining, thereby priming it for the test distribution. Neural Priming can be performed at test time, even for pretraining datasets as large as LAION-2B. Performing lightweight updates on the recalled data significantly improves accuracy across a variety of distribution shift and transfer learning benchmarks. Concretely, in the zero-shot setting, we see a 2.45% improvement in accuracy on ImageNet and 3.81% accuracy improvement on average across standard transfer learning benchmarks. Further, using Neural Priming at inference to adapt to distribution shift, we see a 1.41% accuracy improvement on ImageNetV2. These results demonstrate the effectiveness of Neural Priming in addressing the challenge of limited labeled data and changing distributions. Code is available at github.com/RAIVNLab/neural-priming., Comment: 18 pages, 7 figures, 9 tables
Published: 2023

24. Exposing and Addressing Cross-Task Inconsistency in Unified Vision-Language Models

Author: Maharana, Adyasha, Kamath, Amita, Clark, Christopher, Bansal, Mohit, and Kembhavi, Aniruddha
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: As general purpose vision models get increasingly effective at a wide set of tasks, it is imperative that they be consistent across the tasks they support. Inconsistent AI models are considered brittle and untrustworthy by human users and are more challenging to incorporate into larger systems that take dependencies on their outputs. Measuring consistency between very heterogeneous tasks that might include outputs in different modalities is challenging since it is difficult to determine if the predictions are consistent with one another. As a solution, we introduce a benchmark dataset, CocoCon, where we create contrast sets by modifying test instances for multiple tasks in small but semantically meaningful ways to change the gold label and outline metrics for measuring if a model is consistent by ranking the original and perturbed instances across tasks. We find that state-of-the-art vision-language models suffer from a surprisingly high degree of inconsistent behavior across tasks, especially for more heterogeneous tasks. To alleviate this issue, we propose a rank correlation-based auxiliary training objective, computed over large automatically created cross-task contrast sets, that improves the multi-task consistency of large unified models while retaining their original accuracy on downstream tasks., Comment: TMLR 2024; Project Website: https://adymaharana.github.io/cococon/
Published: 2023

25. Neural Radiance Field Codebooks

Author: Wallingford, Matthew, Kusupati, Aditya, Fang, Alex, Ramanujan, Vivek, Kembhavi, Aniruddha, Mottaghi, Roozbeh, and Farhadi, Ali
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Compositional representations of the world are a promising step towards enabling high-level scene understanding and efficient transfer to downstream tasks. Learning such representations for complex scenes and tasks remains an open challenge. Towards this goal, we introduce Neural Radiance Field Codebooks (NRC), a scalable method for learning object-centric representations through novel view reconstruction. NRC learns to reconstruct scenes from novel views using a dictionary of object codes which are decoded through a volumetric renderer. This enables the discovery of reoccurring visual and geometric patterns across scenes which are transferable to downstream tasks. We show that NRC representations transfer well to object navigation in THOR, outperforming 2D and 3D representation learning methods by 3.1% success rate. We demonstrate that our approach is able to perform unsupervised segmentation for more complex synthetic (THOR) and real scenes (NYU Depth) better than prior methods (29% relative improvement). Finally, we show that NRC improves on the task of depth ordering by 5.5% accuracy in THOR., Comment: 19 pages, 8 figures, 9 tables
Published: 2023

26. Agro-Food Waste, a Potential Bioresource for Nutraceuticals

Author: Gupte, Arpita, primary, Kulkarni, Anuja, additional, Pawar, Pratik, additional, Jadhav, Smita, additional, and Kembhavi, Anjali, additional
Published: 2024
Full Text: View/download PDF

27. Objaverse: A Universe of Annotated 3D Objects

Author: Deitke, Matt, Schwenk, Dustin, Salvador, Jordi, Weihs, Luca, Michel, Oscar, VanderBilt, Eli, Schmidt, Ludwig, Ehsani, Kiana, Kembhavi, Aniruddha, and Farhadi, Ali
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Graphics, Computer Science - Robotics
Abstract: Massive data corpora like WebText, Wikipedia, Conceptual Captions, WebImageText, and LAION have propelled recent dramatic progress in AI. Large neural models trained on such datasets produce impressive results and top many of today's benchmarks. A notable omission within this family of large-scale datasets is 3D data. Despite considerable interest and potential applications in 3D vision, datasets of high-fidelity 3D models continue to be mid-sized with limited diversity of object categories. Addressing this gap, we present Objaverse 1.0, a large dataset of objects with 800K+ (and growing) 3D models with descriptive captions, tags, and animations. Objaverse improves upon present day 3D repositories in terms of scale, number of categories, and in the visual diversity of instances within a category. We demonstrate the large potential of Objaverse via four diverse applications: training generative 3D models, improving tail category segmentation on the LVIS benchmark, training open-vocabulary object-navigation models for Embodied AI, and creating a new benchmark for robustness analysis of vision models. Objaverse can open new directions for research and enable new applications across the field of AI., Comment: Website: objaverse.allenai.org
Published: 2022

28. Phone2Proc: Bringing Robust Robots Into Our Chaotic World

Author: Deitke, Matt, Hendrix, Rose, Weihs, Luca, Farhadi, Ali, Ehsani, Kiana, and Kembhavi, Aniruddha
Subjects: Computer Science - Robotics, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition
Abstract: Training embodied agents in simulation has become mainstream for the embodied AI community. However, these agents often struggle when deployed in the physical world due to their inability to generalize to real-world environments. In this paper, we present Phone2Proc, a method that uses a 10-minute phone scan and conditional procedural generation to create a distribution of training scenes that are semantically similar to the target environment. The generated scenes are conditioned on the wall layout and arrangement of large objects from the scan, while also sampling lighting, clutter, surface textures, and instances of smaller objects with randomized placement and materials. Leveraging just a simple RGB camera, training with Phone2Proc shows massive improvements from 34.7% to 70.7% success rate in sim-to-real ObjectNav performance across a test suite of over 200 trials in diverse real-world environments, including homes, offices, and RoboTHOR. Furthermore, Phone2Proc's diverse distribution of generated scenes makes agents remarkably robust to changes in the real world, such as human movement, object rearrangement, lighting changes, or clutter., Comment: https://allenai.org/project/phone2proc
Published: 2022

29. A General Purpose Supervisory Signal for Embodied Agents

Author: Singh, Kunal Pratap, Salvador, Jordi, Weihs, Luca, and Kembhavi, Aniruddha
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Training effective embodied AI agents often involves manual reward engineering, expert imitation, specialized components such as maps, or leveraging additional sensors for depth and localization. Another approach is to use neural architectures alongside self-supervised objectives which encourage better representation learning. In practice, there are few guarantees that these self-supervised objectives encode task-relevant information. We propose the Scene Graph Contrastive (SGC) loss, which uses scene graphs as general-purpose, training-only, supervisory signals. The SGC loss does away with explicit graph decoding and instead uses contrastive learning to align an agent's representation with a rich graphical encoding of its environment. The SGC loss is generally applicable, simple to implement, and encourages representations that encode objects' semantics, relationships, and history. Using the SGC loss, we attain significant gains on three embodied tasks: Object Navigation, Multi-Object Navigation, and Arm Point Navigation. Finally, we present studies and analyses which demonstrate the ability of our trained representation to encode semantic cues about the environment.
Published: 2022

30. Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action.

Author: Jiasen Lu, Christopher Clark, Sangho Lee, Zichen Zhang 0016, Savya Khosla, Ryan Marten, Derek Hoiem, and Aniruddha Kembhavi
Published: 2024
Full Text: View/download PDF

31. Promptable Behaviors: Personalizing Multi-Objective Rewards from Human Preferences.

Author: Minyoung Hwang, Luca Weihs, Chanwoo Park, Kimin Lee, Aniruddha Kembhavi, and Kiana Ehsani
Published: 2024
Full Text: View/download PDF

32. Iterated Learning Improves Compositionality in Large Vision-Language Models.

Author: Chenhao Zheng, Jieyu Zhang, Aniruddha Kembhavi, and Ranjay Krishna
Published: 2024
Full Text: View/download PDF

33. SPOC: Imitating Shortest Paths in Simulation Enables Effective Navigation and Manipulation in the Real World.

Author: Kiana Ehsani, Tanmay Gupta, Rose Hendrix, Jordi Salvador, Luca Weihs, Kuo-Hao Zeng, Kunal Pratap Singh, Yejin Kim, Winson Han, Alvaro Herrasti, Ranjay Krishna, Dustin Schwenk, Eli VanderBilt, and Aniruddha Kembhavi
Published: 2024
Full Text: View/download PDF

34. Holodeck: Language Guided Generation of 3D Embodied AI Environments.

Author: Yue Yang, Fan-Yun Sun, Luca Weihs, Eli VanderBilt, Alvaro Herrasti, Winson Han, Jiajun Wu 0001, Nick Haber, Ranjay Krishna, Lingjie Liu, Chris Callison-Burch, Mark Yatskar, Aniruddha Kembhavi, and Christopher Clark
Published: 2024
Full Text: View/download PDF

35. MIMIC: Masked Image Modeling with Image Correspondences.

Author: Kalyani Marathe, Mahtab Bigverdi, Nishat Khan, Tuhin Kundu, Patrick Howe, Sharan Ranjit S, Anand Bhattad, Aniruddha Kembhavi, Linda G. Shapiro, and Ranjay Krishna
Published: 2024
Full Text: View/download PDF

36. Seeing the Unseen: Visual Common Sense for Semantic Placement.

Author: Ram Ramrakhya, Aniruddha Kembhavi, Dhruv Batra, Zsolt Kira, Kuo-Hao Zeng, and Luca Weihs
Published: 2024
Full Text: View/download PDF

37. Open X-Embodiment: Robotic Learning Datasets and RT-X Models : Open X-Embodiment Collaboration.

Author: Abby O'Neill, Abdul Rehman, Abhiram Maddukuri, Abhishek Gupta 0004, Abhishek Padalkar, Abraham Lee, Acorn Pooley, Agrim Gupta, Ajay Mandlekar, Ajinkya Jain, Albert Tung, Alex Bewley, Alexander Herzog, Alex Irpan, Alexander Khazatsky, Anant Rai, Anchit Gupta, Andrew Wang, Anikait Singh, Animesh Garg, Aniruddha Kembhavi, Annie Xie, Anthony Brohan, Antonin Raffin, Archit Sharma, Arefeh Yavary, Arhan Jain, Ashwin Balakrishna, Ayzaan Wahid, Ben Burgess-Limerick, Beomjoon Kim, Bernhard Schölkopf, Blake Wulfe, Brian Ichter, Cewu Lu, Charles Xu 0003, Charlotte Le, Chelsea Finn, Chen Wang 0053, Chenfeng Xu, Cheng Chi, Chenguang Huang, Christine Chan, Christopher Agia, Chuer Pan, Chuyuan Fu, Coline Devin, Danfei Xu, Daniel Morton, Danny Driess, Daphne Chen, Deepak Pathak, Dhruv Shah, Dieter Büchler, Dinesh Jayaraman, Dmitry Kalashnikov, Dorsa Sadigh, Edward Johns, Ethan Paul Foster, Fangchen Liu, Federico Ceola, Fei Xia, Feiyu Zhao, Freek Stulp, Gaoyue Zhou, Gaurav S. Sukhatme, Gautam Salhotra, Ge Yan, Gilbert Feng, Giulio Schiavi, Glen Berseth, Gregory Kahn, Guanzhi Wang, Hao Su 0001, Haoshu Fang, Haochen Shi, Henghui Bao, Heni Ben Amor, Henrik I. Christensen, Hiroki Furuta, Homer Walke, Hongjie Fang, Huy Ha, Igor Mordatch, Ilija Radosavovic, Isabel Leal, Jacky Liang, Jad Abou-Chakra, Jaehyung Kim 0001, Jaimyn Drake, Jan Peters 0001, Jan Schneider, Jasmine Hsu, Jeannette Bohg, Jeffrey Bingham, Jeffrey Wu, Jensen Gao, Jiaheng Hu, Jiajun Wu 0001, Jialin Wu, Jiankai Sun, Jianlan Luo, Jiayuan Gu, Jie Tan, Jihoon Oh, Jimmy Wu, Jingpei Lu, Jingyun Yang, Jitendra Malik, João Silvério, Joey Hejna, Jonathan Booher, Jonathan Tompson, Jonathan Yang, Jordi Salvador, Joseph J. Lim, Junhyek Han, Kaiyuan Wang, Kanishka Rao, Karl Pertsch, Karol Hausman, Keegan Go, Keerthana Gopalakrishnan, Ken Goldberg, Kendra Byrne, Kenneth Oslund, Kento Kawaharazuka, Kevin Black, Kevin Lin, Kevin Zhang 0002, Kiana Ehsani, Kiran Lekkala, Kirsty Ellis, Krishan Rana, Krishnan Srinivasan, Kuan Fang, Kunal Pratap Singh, Kuo-Hao Zeng, Kyle Hatch, Kyle Hsu, Laurent Itti, Lawrence Yunliang Chen, Lerrel Pinto, Li Fei-Fei 0001, Liam Tan, Linxi Jim Fan, Lionel Ott, Lisa Lee, Luca Weihs, Magnum Chen, Marion Lepert, Marius Memmel, Masayoshi Tomizuka, Masha Itkina, Mateo Guaman Castro, Max Spero, Maximilian Du, Michael Ahn, Michael C. Yip, Mingtong Zhang, Mingyu Ding, Minho Heo, Mohan Kumar Srirama, Mohit Sharma 0001, Moo Jin Kim, Naoaki Kanazawa, Nicklas Hansen 0001, Nicolas Heess, Nikhil J. Joshi, Niko Sünderhauf, Ning Liu, Norman Di Palo, Nur Muhammad (Mahi) Shafiullah, Oier Mees, Oliver Kroemer, Osbert Bastani, Pannag R. Sanketi, Patrick Tree Miller, Patrick Yin, Paul Wohlhart, Peng Xu, Peter David Fagan, Peter Mitrano, Pierre Sermanet, Pieter Abbeel, Priya Sundaresan, Qiuyu Chen, Quan Vuong, Rafael Rafailov, Ran Tian, Ria Doshi, Roberto Martín-Martín, Rohan Baijal, Rosario Scalise, Rose Hendrix, Roy Lin, Runjia Qian, Ruohan Zhang, Russell Mendonca, Rutav Shah, Ryan Hoque, Ryan Julian, Samuel Bustamante, Sean Kirmani, Sergey Levine, Shan Lin, Sherry Moore, Shikhar Bahl, Shivin Dass, Shubham D. Sonawani, Shuran Song, Sichun Xu, Siddhant Haldar, Siddharth Karamcheti, Simeon Adebola, Simon Guist, Soroush Nasiriany, Stefan Schaal, Stefan Welker, Stephen Tian, Subramanian Ramamoorthy, Sudeep Dasari, Suneel Belkhale, Sungjae Park, Suraj Nair 0003, Suvir Mirchandani, Takayuki Osa, Tanmay Gupta, Tatsuya Harada, Tatsuya Matsushima, Ted Xiao, Thomas Kollar, Tianhe Yu, Tianli Ding, Todor Davchev, Tony Z. Zhao, Travis Armstrong, Trevor Darrell, Trinity Chung, Vidhi Jain, Vincent Vanhoucke, Wei Zhan, Wenxuan Zhou, Wolfram Burgard, Xi Chen, Xiaolong Wang, Xinghao Zhu, Xinyang Geng, Xiyuan Liu, Liangwei Xu, Xuanlin Li, Yao Lu, Yecheng Jason Ma, Yejin Kim, Yevgen Chebotar, Yifan Zhou, Yifeng Zhu, Yilin Wu, Ying Xu, Yixuan Wang, Yonatan Bisk, Yoonyoung Cho, Youngwoon Lee, Yuchen Cui, Yue Cao, Yueh-Hua Wu, Yujin Tang, Yuke Zhu, Yunchu Zhang, Yunfan Jiang 0001, Yunshuang Li, Yunzhu Li, Yusuke Iwasawa, Yutaka Matsuo, Zehan Ma, Zhuo Xu, Zichen Jeff Cui, Zichen Zhang 0016, and Zipeng Lin
Published: 2024
Full Text: View/download PDF

38. Breast Cancer

Author: Nair, Nita S., Wadasadawala, Tabassum, Thakkar, Purvi, Kembhavi, Seema A., Bajpai, Jyoti, Gulia, Seema, Pathak, Rima Sanjay, Turkar, Siddharth, Badwe, Rajendra A., Badwe, Rajendra A., editor, Gupta, Sudeep, editor, Shrikhande, Shailesh V., editor, and Laskar, Siddhartha, editor
Published: 2024
Full Text: View/download PDF

39. Imaging in Oncology

Author: Kembhavi, Seema A., Thakkar (Popat), Palak Bhavesh, Thakur, Meenakshi H., Badwe, Rajendra A., editor, Gupta, Sudeep, editor, Shrikhande, Shailesh V., editor, and Laskar, Siddhartha, editor
Published: 2024
Full Text: View/download PDF

40. SatlasPretrain: A Large-Scale Dataset for Remote Sensing Image Understanding

Author: Bastani, Favyen, Wolters, Piper, Gupta, Ritwik, Ferdinando, Joe, and Kembhavi, Aniruddha
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Remote sensing images are useful for a wide variety of planet monitoring applications, from tracking deforestation to tackling illegal fishing. The Earth is extremely diverse -- the amount of potential tasks in remote sensing images is massive, and the sizes of features range from several kilometers to just tens of centimeters. However, creating generalizable computer vision methods is a challenge in part due to the lack of a large-scale dataset that captures these diverse features for many tasks. In this paper, we present SatlasPretrain, a remote sensing dataset that is large in both breadth and scale, combining Sentinel-2 and NAIP images with 302M labels under 137 categories and seven label types. We evaluate eight baselines and a proposed method on SatlasPretrain, and find that there is substantial room for improvement in addressing research challenges specific to remote sensing, including processing image time series that consist of images from very different types of sensors, and taking advantage of long-range spatial context. Moreover, we find that pre-training on SatlasPretrain substantially improves performance on downstream tasks, increasing average accuracy by 18% over ImageNet and 6% over the next best baseline. The dataset, pre-trained model weights, and code are available at https://satlas-pretrain.allen.ai/., Comment: ICCV 2023
Published: 2022

41. Visual Programming: Compositional visual reasoning without training

Author: Gupta, Tanmay and Kembhavi, Aniruddha
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: We present VISPROG, a neuro-symbolic approach to solving complex and compositional visual tasks given natural language instructions. VISPROG avoids the need for any task-specific training. Instead, it uses the in-context learning ability of large language models to generate python-like modular programs, which are then executed to get both the solution and a comprehensive and interpretable rationale. Each line of the generated program may invoke one of several off-the-shelf computer vision models, image processing routines, or python functions to produce intermediate outputs that may be consumed by subsequent parts of the program. We demonstrate the flexibility of VISPROG on 4 diverse tasks - compositional visual question answering, zero-shot reasoning on image pairs, factual knowledge object tagging, and language-guided image editing. We believe neuro-symbolic approaches like VISPROG are an exciting avenue to easily and effectively expand the scope of AI systems to serve the long tail of complex tasks that people may wish to perform.
Published: 2022

42. I Can't Believe There's No Images! Learning Visual Tasks Using only Language Supervision

Author: Gu, Sophia, Clark, Christopher, and Kembhavi, Aniruddha
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language
Abstract: Many high-level skills that are required for computer vision tasks, such as parsing questions, comparing and contrasting semantics, and writing descriptions, are also required in other domains such as natural language processing. In this paper, we ask whether it is possible to learn those skills from text data and then transfer them to vision tasks without ever training on visual training data. Key to our approach is exploiting the joint embedding space of contrastively trained vision and language encoders. In practice, there can be systematic differences between embedding spaces for different modalities in contrastive models, and we analyze how these differences affect our approach and study strategies to mitigate this concern. We produce models using only text training data on four representative tasks: image captioning, visual entailment, visual question answering and visual news captioning, and evaluate them on standard benchmarks using images. We find these models perform close to models trained on images, while surpassing prior work for captioning and visual entailment in this text-only setting by over 9 points, and outperforming all prior work on visual news by over 30 points. We also showcase a variety of stylistic image captioning models that are trained using no image data and no human-curated language data, but instead using readily-available text data from books, the web, or language models., Comment: website (https://prior.allenai.org/projects/close), code (https://github.com/allenai/close)
Published: 2022

43. Photometric identification of compact galaxies, stars and quasars using multiple neural networks

Author: Chaini, Siddharth, Bagul, Atharva, Deshpande, Anish, Gondkar, Rishi, Sharma, Kaushal, Vivek, M., and Kembhavi, Ajit
Subjects: Astrophysics - Astrophysics of Galaxies, Astrophysics - Instrumentation and Methods for Astrophysics, Computer Science - Machine Learning
Abstract: We present MargNet, a deep learning-based classifier for identifying stars, quasars and compact galaxies using photometric parameters and images from the Sloan Digital Sky Survey (SDSS) Data Release 16 (DR16) catalogue. MargNet consists of a combination of Convolutional Neural Network (CNN) and Artificial Neural Network (ANN) architectures. Using a carefully curated dataset consisting of 240,000 compact objects and an additional 150,000 faint objects, the machine learns classification directly from the data, minimising the need for human intervention. MargNet is the first classifier focusing exclusively on compact galaxies and performs better than other methods to classify compact galaxies from stars and quasars, even at fainter magnitudes. This model and feature engineering in such deep learning architectures will provide greater success in identifying objects in the ongoing and upcoming surveys, such as Dark Energy Survey (DES) and images from the Vera C. Rubin Observatory., Comment: 14 pages, 10 figures, Accepted for publication in MNRAS
Published: 2022
Full Text: View/download PDF

44. Retrospectives on the Embodied AI Workshop

Author: Deitke, Matt, Batra, Dhruv, Bisk, Yonatan, Campari, Tommaso, Chang, Angel X., Chaplot, Devendra Singh, Chen, Changan, D'Arpino, Claudia Pérez, Ehsani, Kiana, Farhadi, Ali, Fei-Fei, Li, Francis, Anthony, Gan, Chuang, Grauman, Kristen, Hall, David, Han, Winson, Jain, Unnat, Kembhavi, Aniruddha, Krantz, Jacob, Lee, Stefan, Li, Chengshu, Majumder, Sagnik, Maksymets, Oleksandr, Martín-Martín, Roberto, Mottaghi, Roozbeh, Raychaudhuri, Sonia, Roberts, Mike, Savarese, Silvio, Savva, Manolis, Shridhar, Mohit, Sünderhauf, Niko, Szot, Andrew, Talbot, Ben, Tenenbaum, Joshua B., Thomason, Jesse, Toshev, Alexander, Truong, Joanne, Weihs, Luca, and Wu, Jiajun
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We present a retrospective on the state of Embodied AI research. Our analysis focuses on 13 challenges presented at the Embodied AI Workshop at CVPR. These challenges are grouped into three themes: (1) visual navigation, (2) rearrangement, and (3) embodied vision-and-language. We discuss the dominant datasets within each theme, evaluation metrics for the challenges, and the performance of state-of-the-art models. We highlight commonalities between top approaches to the challenges and identify potential future directions for Embodied AI research.
Published: 2022

45. Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks

Author: Lu, Jiasen, Clark, Christopher, Zellers, Rowan, Mottaghi, Roozbeh, and Kembhavi, Aniruddha
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We propose Unified-IO, a model that performs a large variety of AI tasks spanning classical computer vision tasks, including pose estimation, object detection, depth estimation and image generation, vision-and-language tasks such as region captioning and referring expression, to natural language processing tasks such as question answering and paraphrasing. Developing a single unified model for such a large variety of tasks poses unique challenges due to the heterogeneous inputs and outputs pertaining to each task, including RGB images, per-pixel maps, binary masks, bounding boxes, and language. We achieve this unification by homogenizing every supported input and output into a sequence of discrete vocabulary tokens. This common representation across all tasks allows us to train a single transformer-based architecture, jointly on over 90 diverse datasets in the vision and language fields. Unified-IO is the first model capable of performing all 7 tasks on the GRIT benchmark and produces strong results across 16 diverse benchmarks like NYUv2-Depth, ImageNet, VQA2.0, OK-VQA, Swig, VizWizGround, BoolQ, and SciTail, with no task-specific fine-tuning. Code and demos for Unified-IO are available at: https://unified-io.allenai.org.
Published: 2022

46. What do navigation agents learn about their environment?

Author: Dwivedi, Kshitij, Roig, Gemma, Kembhavi, Aniruddha, and Mottaghi, Roozbeh
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, Computer Science - Robotics
Abstract: Today's state of the art visual navigation agents typically consist of large deep learning models trained end to end. Such models offer little to no interpretability about the learned skills or the actions of the agent taken in response to its environment. While past works have explored interpreting deep learning models, little attention has been devoted to interpreting embodied AI systems, which often involve reasoning about the structure of the environment, target characteristics and the outcome of one's actions. In this paper, we introduce the Interpretability System for Embodied agEnts (iSEE) for Point Goal and Object Goal navigation agents. We use iSEE to probe the dynamic representations produced by these agents for the presence of information about the agent as well as the environment. We demonstrate interesting insights about navigation agents using iSEE, including the ability to encode reachable locations (to avoid obstacles), visibility of the target, progress from the initial spawn location as well as the dramatic effect on the behaviors of agents when we mask out critical individual neurons. The code is available at: https://github.com/allenai/iSEE, Comment: CVPR 2022
Published: 2022

47. ProcTHOR: Large-Scale Embodied AI Using Procedural Generation

Author: Deitke, Matt, VanderBilt, Eli, Herrasti, Alvaro, Weihs, Luca, Salvador, Jordi, Ehsani, Kiana, Han, Winson, Kolve, Eric, Farhadi, Ali, Kembhavi, Aniruddha, and Mottaghi, Roozbeh
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Robotics
Abstract: Massive datasets and high-capacity models have driven many recent advancements in computer vision and natural language understanding. This work presents a platform to enable similar success stories in Embodied AI. We propose ProcTHOR, a framework for procedural generation of Embodied AI environments. ProcTHOR enables us to sample arbitrarily large datasets of diverse, interactive, customizable, and performant virtual environments to train and evaluate embodied agents across navigation, interaction, and manipulation tasks. We demonstrate the power and potential of ProcTHOR via a sample of 10,000 generated houses and a simple neural model. Models trained using only RGB images on ProcTHOR, with no explicit mapping and no human task supervision produce state-of-the-art results across 6 embodied AI benchmarks for navigation, rearrangement, and arm manipulation, including the presently running Habitat 2022, AI2-THOR Rearrangement 2022, and RoboTHOR challenges. We also demonstrate strong 0-shot results on these benchmarks, via pre-training on ProcTHOR with no fine-tuning on the downstream benchmark, often beating previous state-of-the-art systems that access the downstream training data., Comment: ProcTHOR website: https://procthor.allenai.org
Published: 2022

48. GRIT: General Robust Image Task Benchmark

Author: Gupta, Tanmay, Marten, Ryan, Kembhavi, Aniruddha, and Hoiem, Derek
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Computer vision models excel at making predictions when the test distribution closely resembles the training distribution. Such models have yet to match the ability of biological vision to learn from multiple sources and generalize to new data sources and tasks. To facilitate the development and evaluation of more general vision systems, we introduce the General Robust Image Task (GRIT) benchmark. GRIT evaluates the performance, robustness, and calibration of a vision system across a variety of image prediction tasks, concepts, and data sources. The seven tasks in GRIT are selected to cover a range of visual skills: object categorization, object localization, referring expression grounding, visual question answering, segmentation, human keypoint detection, and surface normal estimation. GRIT is carefully designed to enable the evaluation of robustness under image perturbations, image source distribution shift, and concept distribution shift. By providing a unified platform for thorough assessment of skills and concepts learned by a vision model, we hope GRIT catalyzes the development of performant and robust general purpose vision systems.
Published: 2022

49. Object Manipulation via Visual Target Localization

Author: Ehsani, Kiana, Farhadi, Ali, Kembhavi, Aniruddha, and Mottaghi, Roozbeh
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, Computer Science - Robotics
Abstract: Object manipulation is a critical skill required for Embodied AI agents interacting with the world around them. Training agents to manipulate objects, poses many challenges. These include occlusion of the target object by the agent's arm, noisy object detection and localization, and the target frequently going out of view as the agent moves around in the scene. We propose Manipulation via Visual Object Location Estimation (m-VOLE), an approach that explores the environment in search for target objects, computes their 3D coordinates once they are located, and then continues to estimate their 3D locations even when the objects are not visible, thus robustly aiding the task of manipulating these objects throughout the episode. Our evaluations show a massive 3x improvement in success rate over a model that has access to the same sensory suite but is trained without the object location estimator, and our analysis shows that our agent is robust to noise in depth perception and agent localization. Importantly, our proposed approach relaxes several assumptions about idealized localization and perception that are commonly employed by recent works in embodied AI -- an important step towards training agents for object manipulation in the real world.
Published: 2022

50. ASC me to Do Anything: Multi-task Training for Embodied AI

Author: Lu, Jiasen, Salvador, Jordi, Mottaghi, Roozbeh, and Kembhavi, Aniruddha
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Embodied AI has seen steady progress across a diverse set of independent tasks. While these varied tasks have different end goals, the basic skills required to complete them successfully overlap significantly. In this paper, our goal is to leverage these shared skills to learn to perform multiple tasks jointly. We propose Atomic Skill Completion (ASC), an approach for multi-task training for Embodied AI, where a set of atomic skills shared across multiple tasks are composed together to perform the tasks. The key to the success of this approach is a pre-training scheme that decouples learning of the skills from the high-level tasks making joint training effective. We use ASC to train agents within the AI2-THOR environment to perform four interactive tasks jointly and find it to be remarkably effective. In a multi-task setting, ASC improves success rates by a factor of 2x on Seen scenes and 4x on Unseen scenes compared to no pre-training. Importantly, ASC enables us to train a multi-task agent that has a 52% higher Success Rate than training 4 independent single task agents. Finally, our hierarchical agents are more interpretable than traditional black-box architectures., Comment: 22 pages, 11 figures
Published: 2022

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Journal

Region

Database

Publisher

1,801 results on '"Kembhavi, A."'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources