Author: "Oh, Jean" - Searchworks@Jio Institute Digital Library Search Results

Your search for author "Oh, Jean" returned 342 results

1. Grounding Robot Policies with Visuomotor Language Guidance

Author: Bucker, Arthur, Ortega-Kral, Pablo, Francis, Jonathan, and Oh, Jean
Subjects: Computer Science - Robotics, Computer Science - Artificial Intelligence
Abstract: Recent advances in the fields of natural language processing and computer vision have shown great potential in understanding the underlying dynamics of the world from large-scale internet data. However, translating this knowledge into robotic systems remains an open challenge, given the scarcity of human-robot interactions and the lack of large-scale datasets of real-world robotic data. Previous robot learning approaches such as behavior cloning and reinforcement learning have shown great capabilities in learning robotic skills from human demonstrations or from scratch in specific environments. However, these approaches often require task-specific demonstrations or designing complex simulation environments, which limits the development of generalizable and robust policies for new settings. Aiming to address these limitations, we propose an agent-based framework for grounding robot policies to the current context, considering the constraints of a current robot and its environment using visuomotor-grounded language guidance. The proposed framework is composed of a set of conversational agents designed for specific roles -- namely, high-level advisor, visual grounding, monitoring, and robotic agents. Given a base policy, the agents collectively generate guidance at run time to shift the action distribution of the base policy towards more desirable future states. We demonstrate that our approach can effectively guide manipulation policies to achieve significantly higher success rates both in simulation and in real-world experiments without the need for additional human demonstrations or extensive exploration. Project videos at https://sites.google.com/view/motorcortex/home.
Comment: 19 pages, 6 figures, 1 table
Published: 2024

2. The Foundational Pose as a Selection Mechanism for the Design of Tool-Wielding Multi-Finger Robotic Hands

Author: Wang, Sunyu, Oh, Jean H., and Pollard, Nancy S.
Subjects: Computer Science - Robotics
Abstract: To wield an object means to hold and move it in a way that exploits its functions. When we wield tools -- such as writing with a pen or cutting with scissors -- our hands would reach very specific poses, often drastically different from how we pick up the same objects just to transport them. In this work, we investigate the design of tool-wielding multi-finger robotic hands based on a hypothesis: the poses that a tool and a hand reach during tool-wielding -- what we call "foundational poses" (FPs) -- can be used as a selection mechanism in the design process. We interpret FPs as snapshots that capture the workings of underlying mechanisms formed by the tool and the hand, and one hand can form multiple mechanisms with the same tool. We tested our hypothesis in a hand design experiment, where we developed a sampling-based design optimization framework that uses FPs to computationally generate many different hand designs and evaluate them in multiple metrics. The results show that more than 99% of the 10,785 generated hand designs successfully wielded tools in simulation, supporting our hypothesis. Meanwhile, our methods provide insights into the non-convex, multi-objective hand design optimization problem that could be hard to unveil otherwise, such as clustering and the Pareto front. Lastly, we demonstrate our methods' real-world feasibility and potential with a hardware prototype equipped with rigid endoskeleton and soft skin.
Published: 2024

3. SEAL: Towards Safe Autonomous Driving via Skill-Enabled Adversary Learning for Closed-Loop Scenario Generation

Author: Stoler, Benjamin, Navarro, Ingrid, Francis, Jonathan, and Oh, Jean
Subjects: Computer Science - Robotics, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Verification and validation of autonomous driving (AD) systems and components is of increasing importance, as such technology increases in real-world prevalence. Safety-critical scenario generation is a key approach to robustify AD policies through closed-loop training. However, existing approaches for scenario generation rely on simplistic objectives, resulting in overly-aggressive or non-reactive adversarial behaviors. To generate diverse adversarial yet realistic scenarios, we propose SEAL, a scenario perturbation approach which leverages learned scoring functions and adversarial, human-like skills. SEAL-perturbed scenarios are more realistic than SOTA baselines, improving ego task success by more than 20% across real-world, in-distribution, and out-of-distribution scenarios. To facilitate future research, we release our code and tools: https://github.com/cmubig/SEAL
Comment: 8 pages, 4 figures, 2 tables
Published: 2024

4. FIReStereo: Forest InfraRed Stereo Dataset for UAS Depth Perception in Visually Degraded Environments

Author: Dhrafani, Devansh, Liu, Yifei, Jong, Andrew, Shin, Ukcheol, He, Yao, Harp, Tyler, Hu, Yaoyu, Oh, Jean, and Scherer, Sebastian
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Robotics
Abstract: Robust depth perception in visually-degraded environments is crucial for autonomous aerial systems. Thermal imaging cameras, which capture infrared radiation, are robust to visual degradation. However, due to lack of a large-scale dataset, the use of thermal cameras for unmanned aerial system (UAS) depth perception has remained largely unexplored. This paper presents a stereo thermal depth perception dataset for autonomous aerial perception applications. The dataset consists of stereo thermal images, LiDAR, IMU and ground truth depth maps captured in urban and forest settings under diverse conditions like day, night, rain, and smoke. We benchmark representative stereo depth estimation algorithms, offering insights into their performance in degraded conditions. Models trained on our dataset generalize well to unseen smoky conditions, highlighting the robustness of stereo thermal imaging for depth perception. We aim for this work to enhance robotic perception in disaster scenarios, allowing for exploration and operations in previously unreachable areas. The dataset and source code are available at https://firestereo.github.io.
Comment: Under review in RA-L. The first 2 authors contributed equally
Published: 2024

5. VPOcc: Exploiting Vanishing Point for Monocular 3D Semantic Occupancy Prediction

Author: Kim, Junsu, Lee, Junhee, Shin, Ukcheol, Oh, Jean, and Joo, Kyungdon
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Robotics
Abstract: Monocular 3D semantic occupancy prediction is becoming important in robot vision due to the compactness of using a single RGB camera. However, existing methods often do not adequately account for camera perspective geometry, resulting in information imbalance along the depth range of the image. To address this issue, we propose a vanishing point (VP) guided monocular 3D semantic occupancy prediction framework named VPOcc. Our framework consists of three novel modules utilizing VP. First, in the VPZoomer module, we initially utilize VP in feature extraction to achieve information balanced feature extraction across the scene by generating a zoom-in image based on VP. Second, we perform perspective geometry-aware feature aggregation by sampling points towards VP using a VP-guided cross-attention (VPCA) module. Finally, we create an information-balanced feature volume by effectively fusing original and zoom-in voxel feature volumes with a balanced feature volume fusion (BVFV) module. Experiments demonstrate that our method achieves state-of-the-art performance for both IoU and mIoU on SemanticKITTI and SSCBench-KITTI360. These results are obtained by effectively addressing the information imbalance in images through the utilization of VP. Our code will be available at www.github.com/anonymous.
Published: 2024

6. RoPotter: Toward Robotic Pottery and Deformable Object Manipulation with Structural Priors

Author: Yoo, Uksang, Hung, Adam, Francis, Jonathan, Oh, Jean, and Ichnowski, Jeffrey
Subjects: Computer Science - Robotics
Abstract: Humans are capable of continuously manipulating a wide variety of deformable objects into complex shapes. This is made possible by our intuitive understanding of material properties and mechanics of the object, for reasoning about object states even when visual perception is occluded. These capabilities allow us to perform diverse tasks ranging from cooking with dough to expressing ourselves with pottery-making. However, developing robotic systems to robustly perform similar tasks remains challenging, as current methods struggle to effectively model volumetric deformable objects and reason about the complex behavior they typically exhibit. To study the robotic systems and algorithms capable of deforming volumetric objects, we introduce a novel robotics task of continuously deforming clay on a pottery wheel. We propose a pipeline for perception and pottery skill-learning, called RoPotter, wherein we demonstrate that structural priors specific to the task of pottery-making can be exploited to simplify the pottery skill-learning process. Namely, we can project the cross-section of the clay to a plane to represent the state of the clay, reducing dimensionality. We also demonstrate a mesh-based method of occluded clay state recovery, toward robotic agents capable of continuously deforming clay. Our experiments show that by using the reduced representation with structural priors based on the deformation behaviors of the clay, RoPotter can perform the long-horizon pottery task with 44.4% lower final shape error compared to the state-of-the-art baselines.
Published: 2024

7. Amelia: A Large Model and Dataset for Airport Surface Movement Forecasting

Author: Navarro, Ingrid, Ortega-Kral, Pablo, Patrikar, Jay, Wang, Haichuan, Ye, Zelin, Park, Jong Hoon, Oh, Jean, and Scherer, Sebastian
Subjects: Computer Science - Machine Learning
Abstract: The growing demand for air travel requires technological advancements in air traffic management as well as mechanisms for monitoring and ensuring safe and efficient operations. In terminal airspaces, predictive models of future movements and traffic flows can help with proactive planning and efficient coordination; however, varying airport topologies, and interactions with other agents, among other factors, make accurate predictions challenging. Data-driven predictive models have shown promise for handling numerous variables to enable various downstream tasks, including collision risk assessment, taxi-out time prediction, departure metering, and emission estimations. While data-driven methods have shown improvements in these tasks, prior works lack large-scale curated surface movement datasets within the public domain and the development of generalizable trajectory forecasting models. In response to this, we propose two contributions: (1) Amelia-48, a large surface movement dataset collected using the System Wide Information Management (SWIM) Surface Movement Event Service (SMES). With data collection beginning in Dec 2022, the dataset provides more than a year's worth of SMES data (~30TB) and covers 48 airports within the US National Airspace System. In addition to releasing this data in the public domain, we also provide post-processing scripts and associated airport maps to enable research in the forecasting domain and beyond. (2) Amelia-TF model, a transformer-based next-token-prediction large multi-agent multi-airport trajectory forecasting model trained on 292 days or 9.4 billion tokens of position data encompassing 10 different airports with varying topology. The open-sourced model is validated on unseen airports with experiments showcasing the different prediction horizon lengths, ego-agent selection strategies, and training recipes to demonstrate the generalization capabilities.
Comment: 24 pages, 9 figures, 8 tables
Published: 2024

8. Flow4D: Leveraging 4D Voxel Network for LiDAR Scene Flow Estimation

Author: Kim, Jaeyeul, Woo, Jungwan, Shin, Ukcheol, Oh, Jean, and Im, Sunghoon
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Understanding the motion states of the surrounding environment is critical for safe autonomous driving. These motion states can be accurately derived from scene flow, which captures the three-dimensional motion field of points. Existing LiDAR scene flow methods extract spatial features from each point cloud and then fuse them channel-wise, resulting in the implicit extraction of spatio-temporal features. Furthermore, they utilize 2D Bird's Eye View and process only two frames, missing crucial spatial information along the Z-axis and the broader temporal context, leading to suboptimal performance. To address these limitations, we propose Flow4D, which temporally fuses multiple point clouds after the 3D intra-voxel feature encoder, enabling more explicit extraction of spatio-temporal features through a 4D voxel network. However, while using 4D convolution improves performance, it significantly increases the computational load. For further efficiency, we introduce the Spatio-Temporal Decomposition Block (STDB), which combines 3D and 1D convolutions instead of using heavy 4D convolution. In addition, Flow4D further improves performance by using five frames to take advantage of richer temporal information. As a result, the proposed method achieves a 45.9% higher performance compared to the state-of-the-art while running in real-time, and won 1st place in the 2024 Argoverse 2 Scene Flow Challenge. The code is available at https://github.com/dgist-cvlab/Flow4D.
Comment: 8 pages, 4 figures
Published: 2024

9. How is the Pilot Doing: VTOL Pilot Workload Estimation by Multimodal Machine Learning on Psycho-physiological Signals

Author: Park, Jong Hoon, Chen, Lawrence, Higgins, Ian, Zheng, Zhaobo, Mehrotra, Shashank, Salubre, Kevin, Mousaei, Mohammadreza, Willits, Steven, Levedahl, Blain, Buker, Timothy, Xing, Eliot, Misu, Teruhisa, Scherer, Sebastian, and Oh, Jean
Subjects: Computer Science - Human-Computer Interaction
Abstract: Vertical take-off and landing (VTOL) aircraft do not require a prolonged runway, thus allowing them to land almost anywhere. In recent years, their flexibility has made them popular in development, research, and operation. When compared to traditional fixed-wing aircraft and rotorcraft, VTOLs bring unique challenges as they combine many maneuvers from both types of aircraft. Pilot workload is a critical factor for safe and efficient operation of VTOLs. In this work, we conduct a user study to collect multimodal data from 28 pilots while they perform a variety of VTOL flight tasks. We analyze and interpolate behavioral patterns related to their performance and perceived workload. Finally, we build machine learning models to estimate their workload from the collected data. Our results are promising, suggesting that quantitative and accurate VTOL pilot workload monitoring is viable. Such assistive tools would help the research field understand VTOL operations and serve as a stepping stone for the industry to ensure VTOL safe operations and further remote operations.
Comment: 8 pages, 7 figures
Published: 2024

10. Towards Human-Centered Construction Robotics: A Reinforcement Learning-Driven Companion Robot for Contextually Assisting Carpentry Workers

Author: Wu, Yuning, Wei, Jiaying, Oh, Jean, and Llach, Daniel Cardoso
Subjects: Computer Science - Robotics, Computer Science - Artificial Intelligence, Computer Science - Human-Computer Interaction, Computer Science - Machine Learning
Abstract: In the dynamic construction industry, traditional robotic integration has primarily focused on automating specific tasks, often overlooking the complexity and variability of human aspects in construction workflows. This paper introduces a human-centered approach with a "work companion rover" designed to assist construction workers within their existing practices, aiming to enhance safety and workflow fluency while respecting construction labor's skilled nature. We conduct an in-depth study on deploying a robotic system in carpentry formwork, showcasing a prototype that emphasizes mobility, safety, and comfortable worker-robot collaboration in dynamic environments through a contextual Reinforcement Learning (RL)-driven modular framework. Our research advances robotic applications in construction, advocating for collaborative models where adaptive robots support rather than replace humans, underscoring the potential for an interactive and collaborative human-robot workforce.
Comment: 8 pages, 9 figures. This work has been submitted to the IEEE for possible publication
Published: 2024

11. Design and Control Co-Optimization for Automated Design Iteration of Dexterous Anthropomorphic Soft Robotic Hands

Author: Mannam, Pragna, Liu, Xingyu, Zhao, Ding, Oh, Jean, and Pollard, Nancy
Subjects: Computer Science - Robotics
Abstract: We automate soft robotic hand design iteration by co-optimizing design and control policy for dexterous manipulation skills in simulation. Our design iteration pipeline combines genetic algorithms and policy transfer to learn control policies for nearly 400 hand designs, testing grasp quality under external force disturbances. We validate the optimized designs in the real world through teleoperation of pickup and reorient manipulation tasks. Our real world evaluation, from over 900 teleoperated tasks, shows that the trend in design performance in simulation resembles that of the real world. Furthermore, we show that optimized hand designs from our approach outperform existing soft robot hands from prior work in the real world. The results highlight the usefulness of simulation in guiding parameter choices for anthropomorphic soft robotic hand systems, and the effectiveness of our automated design iteration approach, despite the sim-to-real gap.
Published: 2024

12. CoFRIDA: Self-Supervised Fine-Tuning for Human-Robot Co-Painting

Author: Schaldenbrand, Peter, Parmar, Gaurav, Zhu, Jun-Yan, McCann, James, and Oh, Jean
Subjects: Computer Science - Robotics
Abstract: Prior robot painting and drawing work, such as FRIDA, has focused on decreasing the sim-to-real gap and expanding input modalities for users, but the interaction with these systems generally exists only in the input stages. To support interactive, human-robot collaborative painting, we introduce the Collaborative FRIDA (CoFRIDA) robot painting framework, which can co-paint by modifying and engaging with content already painted by a human collaborator. To improve text-image alignment, FRIDA's major weakness, our system uses pre-trained text-to-image models; however, pre-trained models in the context of real-world co-painting do not perform well because they (1) do not understand the constraints and abilities of the robot and (2) cannot perform co-painting without making unrealistic edits to the canvas and overwriting content. We propose a self-supervised fine-tuning procedure that can tackle both issues, allowing the use of pre-trained state-of-the-art text-image alignment models with robots to enable co-painting in the physical world. Our open-source approach, CoFRIDA, creates paintings and drawings that match the input text prompt more clearly than FRIDA, both from a blank canvas and one with human created work. More generally, our fine-tuning procedure successfully encodes the robot's constraints and abilities into a foundation model, showcasing promising results as an effective method for reducing sim-to-real gaps.
Published: 2024

13. SoRTS: Learned Tree Search for Long Horizon Social Robot Navigation

Author: Navarro, Ingrid, Patrikar, Jay, Dantas, Joao P. A., Baijal, Rohan, Higgins, Ian, Scherer, Sebastian, and Oh, Jean
Subjects: Computer Science - Robotics
Abstract: The fast-growing demand for fully autonomous robots in shared spaces calls for the development of trustworthy agents that can safely and seamlessly navigate in crowded environments. Recent models for motion prediction show promise in characterizing social interactions in such environments. Still, adapting them for navigation is challenging as they often suffer from generalization failures. Prompted by this, we propose Social Robot Tree Search (SoRTS), an algorithm for safe robot navigation in social domains. SoRTS aims to augment existing socially aware motion prediction models for long-horizon navigation using Monte Carlo Tree Search. We use social navigation in general aviation as a case study to evaluate our approach and further the research in full-scale aerial autonomy. In doing so, we introduce XPlaneROS, a high-fidelity aerial simulator that enables human-robot interaction. We use XPlaneROS to conduct a first-of-its-kind user study where 26 FAA-certified pilots interact with a human pilot, our algorithm, and its ablation. Our results, supported by statistical evidence, show that SoRTS exhibits a comparable performance to competent human pilots, significantly outperforming its ablation. Finally, we complement these results with a broad set of self-play experiments to showcase our algorithm's performance in scenarios with increasing complexity.
Comment: arXiv admin note: substantial text overlap with arXiv:2304.01428
Published: 2023

14. SafeShift: Safety-Informed Distribution Shifts for Robust Trajectory Prediction in Autonomous Driving

Author: Stoler, Benjamin, Navarro, Ingrid, Jana, Meghdeep, Hwang, Soonmin, Francis, Jonathan, and Oh, Jean
Subjects: Computer Science - Robotics
Abstract: As autonomous driving technology matures, the safety and robustness of its key components, including trajectory prediction, are vital. Though real-world datasets, such as Waymo Open Motion, provide realistic recorded scenarios for model development, they often lack truly safety-critical situations. Rather than utilizing unrealistic simulation or dangerous real-world testing, we instead propose a framework to characterize such datasets and find hidden safety-relevant scenarios within. Our approach expands the spectrum of safety-relevance, allowing us to study trajectory prediction models under a safety-informed, distribution shift setting. We contribute a generalized scenario characterization method, a novel scoring scheme to find subtly-avoided risky scenarios, and an evaluation of trajectory prediction models in this setting. We further contribute a remediation strategy, achieving a 10% average reduction in prediction collision rates. To facilitate future research, we release our code to the public: github.com/cmubig/SafeShift
Comment: 10 pages, 5 figures, 5 tables
Published: 2023

15. EigenTrajectory: Low-Rank Descriptors for Multi-Modal Trajectory Forecasting

Author: Bae, Inhwan, Oh, Jean, and Jeon, Hae-Gon
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, Computer Science - Robotics
Abstract: Capturing high-dimensional social interactions and feasible futures is essential for predicting trajectories. To address this complex nature, several attempts have been devoted to reducing the dimensionality of the output variables via parametric curve fitting such as the Bézier curve and B-spline function. However, these functions, which originate in computer graphics fields, are not suitable to account for socially acceptable human dynamics. In this paper, we present EigenTrajectory ($\mathbb{ET}$), a trajectory prediction approach that uses a novel trajectory descriptor to form a compact space, known here as $\mathbb{ET}$ space, in place of Euclidean space, for representing pedestrian movements. We first reduce the complexity of the trajectory descriptor via a low-rank approximation. We transform the pedestrians' history paths into our $\mathbb{ET}$ space represented by spatio-temporal principal components, and feed them into off-the-shelf trajectory forecasting models. The inputs and outputs of the models as well as social interactions are all gathered and aggregated in the corresponding $\mathbb{ET}$ space. Lastly, we propose a trajectory anchor-based refinement method to cover all possible futures in the proposed $\mathbb{ET}$ space. Extensive experiments demonstrate that our EigenTrajectory predictor can significantly improve both the prediction accuracy and reliability of existing trajectory forecasting models on public benchmarks, indicating that the proposed descriptor is suited to represent pedestrian behaviors. Code is publicly available at https://github.com/inhwanbae/EigenTrajectory.
Comment: Accepted at ICCV 2023
Published: 2023

16. Designing Anthropomorphic Soft Hands through Interaction

Author: Mannam, Pragna, Shaw, Kenneth, Bauer, Dominik, Oh, Jean, Pathak, Deepak, and Pollard, Nancy
Subjects: Computer Science - Robotics
Abstract: Modeling and simulating soft robot hands can aid in design iteration for complex and high degree-of-freedom (DoF) morphologies. This can be further supplemented by iterating on the design based on its performance in real world manipulation tasks. However, iterating in the real world requires an approach that allows us to test new designs quickly at low costs. In this paper, we leverage rapid prototyping of the hand using 3D-printing, and utilize teleoperation to evaluate the hand in real world manipulation tasks. Using this method, we design a 3D-printed 16-DoF dexterous anthropomorphic soft hand (DASH) and iteratively improve its design over five iterations. Rapid prototyping techniques such as 3D-printing allow us to directly evaluate the fabricated hand without modeling it in simulation. We show that the design improves over five design iterations through evaluating the hand's performance in 30 real-world teleoperated manipulation tasks. Testing over 900 demonstrations shows that our final version of DASH can solve 19 of the 30 tasks compared to Allegro, a popular rigid hand in the market, which can only solve 7 tasks. We open-source our CAD models as well as the teleoperated dataset for further study.
Published: 2023

17. FishRecGAN: An End to End GAN Based Network for Fisheye Rectification and Calibration

Author: Shen, Xin, Joo, Kyungdon, and Oh, Jean
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: We propose an end-to-end deep learning approach to rectify fisheye images and simultaneously calibrate camera intrinsic and distortion parameters. Our method consists of two parts: a Quick Image Rectification Module developed with a Pix2Pix GAN and Wasserstein GAN (W-Pix2PixGAN), and a Calibration Module with a CNN architecture. Our Quick Rectification Network performs robust rectification with good resolution, making it suitable for constant calibration in camera-based surveillance equipment. To achieve high-quality calibration, we use the straightened output from the Quick Rectification Module as a guidance-like semantic feature map for the Calibration Module to learn the geometric relationship between the straightened feature and the distorted feature. We train and validate our method with a large synthesized dataset labeled with well-simulated parameters applied to a perspective image dataset. Our solution has achieved robust performance in high-resolution with a significant PSNR value of 22.343.
Comment: 18 pages, 7 figures, 4 tables, accepted by AAIML 2023
Published: 2023

18. Regularizing Self-training for Unsupervised Domain Adaptation via Structural Constraints

Author: Das, Rajshekhar, Francis, Jonathan, Mehta, Sanket Vaibhav, Oh, Jean, Strubell, Emma, and Moura, Jose
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Self-training based on pseudo-labels has emerged as a dominant approach for addressing conditional distribution shifts in unsupervised domain adaptation (UDA) for semantic segmentation problems. A notable drawback, however, is that this family of approaches is susceptible to erroneous pseudo labels that arise from confirmation biases in the source domain and that manifest as nuisance factors in the target domain. A possible source for this mismatch is the reliance on only photometric cues provided by RGB image inputs, which may ultimately lead to sub-optimal adaptation. To mitigate the effect of mismatched pseudo-labels, we propose to incorporate structural cues from auxiliary modalities, such as depth, to regularise conventional self-training objectives. Specifically, we introduce a contrastive pixel-level objectness constraint that pulls the pixel representations within a region of an object instance closer, while pushing those from different object categories apart. To obtain object regions consistent with the true underlying object, we extract information from both depth maps and RGB-images in the form of multimodal clustering. Crucially, the objectness constraint is agnostic to the ground-truth semantic labels and, hence, appropriate for unsupervised domain adaptation. In this work, we show that our regularizer significantly improves top performing self-training methods (by up to 2 points) in various UDA benchmarks for semantic segmentation. We include all code in the supplementary.
Published: 2023

19. Core Challenges in Embodied Vision-Language Planning

Author: Francis, Jonathan, Kitamura, Nariaki, Labelle, Felix, Lu, Xiaopeng, Navarro, Ingrid, and Oh, Jean
Subjects: Computer Science - Robotics, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Human-Computer Interaction
Abstract: Recent advances in the areas of Multimodal Machine Learning and Artificial Intelligence (AI) have led to the development of challenging tasks at the intersection of Computer Vision, Natural Language Processing, and Robotics. Whereas many approaches and previous survey pursuits have characterised one or two of these dimensions, there has not been a holistic analysis at the center of all three. Moreover, even when combinations of these topics are considered, more focus is placed on describing, e.g., current architectural methods, as opposed to also illustrating high-level challenges and opportunities for the field. In this survey paper, we discuss Embodied Vision-Language Planning (EVLP) tasks, a family of prominent embodied navigation and manipulation problems that jointly leverage computer vision and natural language for interaction in physical environments. We propose a taxonomy to unify these tasks and provide an in-depth analysis and comparison of the current and new algorithmic approaches, metrics, simulators, and datasets used for EVLP tasks. Finally, we present the core challenges that we believe new EVLP works should seek to address, and we advocate for task construction that enables model generalisability and furthers real-world deployment.
Comment: Extended Abstract accepted to the 32nd International Joint Conference on Artificial Intelligence (IJCAI 2023); special journal track for authors of published JAIR 2022 and AIJ 2022 papers. 6 pages, 2 figures. arXiv admin note: substantial text overlap with arXiv:2106.13948
Published: 2023

20. Learned Tree Search for Long-Horizon Social Robot Navigation in Shared Airspace

Author: Navarro, Ingrid, Patrikar, Jay, Dantas, Joao P. A., Baijal, Rohan, Higgins, Ian, Scherer, Sebastian, and Oh, Jean
Subjects: Computer Science - Robotics, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Computer Science - Multiagent Systems
Abstract: The fast-growing demand for fully autonomous aerial operations in shared spaces necessitates developing trustworthy agents that can safely and seamlessly navigate in crowded, dynamic spaces. In this work, we propose Social Robot Tree Search (SoRTS), an algorithm for the safe navigation of mobile robots in social domains. SoRTS aims to augment existing socially-aware trajectory prediction policies with a Monte Carlo Tree Search planner for improved downstream navigation of mobile robots. To evaluate the performance of our method, we choose the use case of social navigation for general aviation. To aid this evaluation, within this work, we also introduce X-PlaneROS, a high-fidelity aerial simulator, to enable more research in full-scale aerial autonomy. By conducting a user study based on the assessments of 26 FAA certified pilots, we show that SoRTS performs comparably to a competent human pilot, significantly outperforming our baseline algorithm. We further complement these results with self-play experiments in scenarios with increasing complexity., Comment: 8 Pages, 3 Figs, 4 Tables
Published: 2023

21. Complementary Random Masking for RGB-Thermal Semantic Segmentation

Author: Shin, Ukcheol, Lee, Kyunghyun, Kweon, In So, and Oh, Jean
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Robotics
Abstract: RGB-thermal semantic segmentation is one potential solution to achieve reliable semantic scene understanding in adverse weather and lighting conditions. However, the previous studies mostly focus on designing a multi-modal fusion module without consideration of the nature of multi-modality inputs. Therefore, the networks easily become over-reliant on a single modality, making it difficult to learn complementary and meaningful representations for each modality. This paper proposes 1) a complementary random masking strategy of RGB-T images and 2) self-distillation loss between clean and masked input modalities. The proposed masking strategy prevents over-reliance on a single modality. It also improves the accuracy and robustness of the neural network by forcing the network to segment and classify objects even when one modality is partially available. Also, the proposed self-distillation loss encourages the network to extract complementary and meaningful representations from a single modality or complementary masked modalities. Based on the proposed method, we achieve state-of-the-art performance over three RGB-T semantic segmentation benchmarks. Our source code is available at https://github.com/UkcheolShin/CRM_RGBTSeg., Comment: ICRA 2024, Our source code is available at https://github.com/UkcheolShin/CRM_RGBTSeg
Published: 2023

22. Robot Synesthesia: A Sound and Emotion Guided AI Painter

Author: Misra, Vihaan, Schaldenbrand, Peter, and Oh, Jean
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: If a picture paints a thousand words, sound may voice a million. While recent robotic painting and image synthesis methods have achieved progress in generating visuals from text inputs, the translation of sound into images is vastly unexplored. Generally, sound-based interfaces and sonic interactions have the potential to expand accessibility and control for the user and provide a means to convey complex emotions and the dynamic aspects of the real world. In this paper, we propose an approach for using sound and speech to guide a robotic painting process, known here as robot synesthesia. For general sound, we encode the simulated paintings and input sounds into the same latent space. For speech, we decouple speech into its transcribed text and the tone of the speech. Whereas we use the text to control the content, we estimate the emotions from the tone to guide the mood of the painting. Our approach has been fully integrated with FRIDA, a robotic painting framework, adding sound and speech to FRIDA's existing input modalities, such as text and style. In two surveys, participants were able to correctly guess the emotion or natural sound used to generate a given painting more than twice as likely as random chance. On our sound-guided image manipulation and music-guided paintings, we discuss the results qualitatively., Comment: 9 pages, 10 figures
Published: 2023

23. Towards Equitable Representation in Text-to-Image Synthesis Models with the Cross-Cultural Understanding Benchmark (CCUB) Dataset

Author: Liu, Zhixuan, Shin, Youeun, Okogwu, Beverley-Claire, Yun, Youngsik, Coleman, Lia, Schaldenbrand, Peter, Kim, Jihie, and Oh, Jean
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: It has been shown that accurate representation in media improves the well-being of the people who consume it. By contrast, inaccurate representations can negatively affect viewers and lead to harmful perceptions of other cultures. To achieve inclusive representation in generated images, we propose a culturally-aware priming approach for text-to-image synthesis using a small but culturally curated dataset that we collected, known here as Cross-Cultural Understanding Benchmark (CCUB) Dataset, to fight the bias prevalent in giant datasets. Our proposed approach is comprised of two fine-tuning techniques: (1) Adding visual context via fine-tuning a pre-trained text-to-image synthesis model, Stable Diffusion, on the CCUB text-image pairs, and (2) Adding semantic context via automated prompt engineering using the fine-tuned large language model, GPT-3, trained on our CCUB culturally-aware text data. The CCUB dataset is curated and our approach is evaluated by people who have a personal relationship with that particular culture. Our experiments indicate that priming using both text and image is effective in improving the cultural relevance and decreasing the offensiveness of generated images while maintaining quality., Comment: Still ongoing work
Published: 2023

24. Knowledge-driven Scene Priors for Semantic Audio-Visual Embodied Navigation

Author: Tatiya, Gyan, Francis, Jonathan, Bondi, Luca, Navarro, Ingrid, Nyberg, Eric, Sinapov, Jivko, and Oh, Jean
Subjects: Computer Science - Robotics, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition
Abstract: Generalisation to unseen contexts remains a challenge for embodied navigation agents. In the context of semantic audio-visual navigation (SAVi) tasks, the notion of generalisation should include both generalising to unseen indoor visual scenes as well as generalising to unheard sounding objects. However, previous SAVi task definitions do not include evaluation conditions on truly novel sounding objects, resorting instead to evaluating agents on unheard sound clips of known objects; meanwhile, previous SAVi methods do not include explicit mechanisms for incorporating domain knowledge about object and region semantics. These weaknesses limit the development and assessment of models' abilities to generalise their learned experience. In this work, we introduce the use of knowledge-driven scene priors in the semantic audio-visual embodied navigation task: we combine semantic information from our novel knowledge graph that encodes object-region relations, spatial knowledge from dual Graph Encoder Networks, and background knowledge from a series of pre-training tasks -- all within a reinforcement learning framework for audio-visual navigation. We also define a new audio-visual navigation sub-task, where agents are evaluated on novel sounding objects, as opposed to unheard clips of known objects. We show improvements over strong baselines in generalisation to unseen regions and novel sounding objects, within the Habitat-Matterport3D simulation environment, under the SoundSpaces task., Comment: 19 pages, 8 figures, 9 tables
Published: 2022

25. Distribution-aware Goal Prediction and Conformant Model-based Planning for Safe Autonomous Driving

Author: Francis, Jonathan, Chen, Bingqing, Yao, Weiran, Nyberg, Eric, and Oh, Jean
Subjects: Computer Science - Robotics, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, Electrical Engineering and Systems Science - Systems and Control
Abstract: The feasibility of collecting a large amount of expert demonstrations has inspired growing research interests in learning-to-drive settings, where models learn by imitating the driving behaviour from experts. However, exclusively relying on imitation can limit agents' generalisability to novel scenarios that are outside the support of the training data. In this paper, we address this challenge by factorising the driving task, based on the intuition that modular architectures are more generalisable and more robust to changes in the environment compared to monolithic, end-to-end frameworks. Specifically, we draw inspiration from the trajectory forecasting community and reformulate the learning-to-drive task as obstacle-aware perception and grounding, distribution-aware goal prediction, and model-based planning. Firstly, we train the obstacle-aware perception module to extract salient representation of the visual context. Then, we learn a multi-modal goal distribution by performing conditional density-estimation using normalising flow. Finally, we ground candidate trajectory predictions to road geometry, and plan the actions based on vehicle dynamics. Under the CARLA simulator, we report state-of-the-art results on the CARNOVEL benchmark., Comment: Accepted: 1st Workshop on Safe Learning for Autonomous Driving, at the International Conference on Machine Learning (ICML 2022); Best Paper Award
Published: 2022

26. Challenges in Close-Proximity Safe and Seamless Operation of Manned and Unmanned Aircraft in Shared Airspace

Author: Patrikar, Jay, Dantas, Joao P. A., Ghosh, Sourish, Kapoor, Parv, Higgins, Ian, Aloor, Jasmine J., Navarro, Ingrid, Sun, Jimin, Stoler, Ben, Hamidi, Milad, Baijal, Rohan, Moon, Brady, Oh, Jean, and Scherer, Sebastian
Subjects: Computer Science - Robotics
Abstract: We propose developing an integrated system to keep autonomous unmanned aircraft safely separated and behave as expected in conjunction with manned traffic. The main goal is to achieve safe manned-unmanned vehicle teaming to improve system performance, have each (robot/human) teammate learn from each other in various aircraft operations, and reduce the manning needs of manned aircraft. The proposed system anticipates and reacts to other aircraft using natural language instructions and can serve as a co-pilot or operate entirely autonomously. We point out the main technical challenges where improvements on current state-of-the-art are needed to enable Visual Flight Rules to fully autonomous aerial operations, bringing insights to these critical areas. Furthermore, we present an interactive demonstration in a prototypical scenario with one AI pilot and one human pilot sharing the same terminal airspace, interacting with each other using language, and landing safely on the same runway. We also show a demonstration of a vision-only aircraft detection system.
Published: 2022

27. Towards Real-Time Text2Video via CLIP-Guided, Pixel-Level Optimization

Author: Schaldenbrand, Peter, Liu, Zhixuan, and Oh, Jean
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: We introduce an approach to generating videos based on a series of given language descriptions. Frames of the video are generated sequentially and optimized by guidance from the CLIP image-text encoder; iterating through language descriptions, weighting the current description higher than others. As opposed to optimizing through an image generator model itself, which tends to be computationally heavy, the proposed approach computes the CLIP loss directly at the pixel level, achieving general content at a speed suitable for near real-time systems. The approach can generate videos in up to 720p resolution, variable frame-rates, and arbitrary aspect ratios at a rate of 1-2 frames per second. Please visit our website to view videos and access our open-source code: https://pschaldenbrand.github.io/text2video/ .
Published: 2022

28. FRIDA: A Collaborative Robot Painter with a Differentiable, Real2Sim2Real Planning Environment

Author: Schaldenbrand, Peter, McCann, James, and Oh, Jean
Subjects: Computer Science - Robotics
Abstract: Painting is an artistic process of rendering visual content that achieves the high-level communication goals of an artist that may change dynamically throughout the creative process. In this paper, we present a Framework and Robotics Initiative for Developing Arts (FRIDA) that enables humans to produce paintings on canvases by collaborating with a painter robot using simple inputs such as language descriptions or images. FRIDA introduces several technical innovations for computationally modeling a creative painting process. First, we develop a fully differentiable simulation environment for painting, adopting the idea of real to simulation to real (real2sim2real). We show that our proposed simulated painting environment is higher fidelity to reality than existing simulation environments used for robot painting. Second, to model the evolving dynamics of a creative process, we develop a planning approach that can continuously optimize the painting plan based on the evolving canvas with respect to the high-level goals. In contrast to existing approaches where the content generation process and action planning are performed independently and sequentially, FRIDA adapts to the stochastic nature of using paint and a brush by continually re-planning and re-assessing its semantic goals based on its visual perception of the painting progress. We describe the details on the technical approach as well as the system integration.
Published: 2022

29. Follow The Rules: Online Signal Temporal Logic Tree Search for Guided Imitation Learning in Stochastic Domains

Author: Aloor, Jasmine Jerry, Patrikar, Jay, Kapoor, Parv, Oh, Jean, and Scherer, Sebastian
Subjects: Computer Science - Robotics
Abstract: Seamlessly integrating rules in Learning-from-Demonstrations (LfD) policies is a critical requirement to enable the real-world deployment of AI agents. Recently, Signal Temporal Logic (STL) has been shown to be an effective language for encoding rules as spatio-temporal constraints. This work uses Monte Carlo Tree Search (MCTS) as a means of integrating STL specification into a vanilla LfD policy to improve constraint satisfaction. We propose augmenting the MCTS heuristic with STL robustness values to bias the tree search towards branches with higher constraint satisfaction. While the domain-independent method can be applied to integrate STL rules online into any pre-trained LfD algorithm, we choose goal-conditioned Generative Adversarial Imitation Learning as the offline LfD policy. We apply the proposed method to the domain of planning trajectories for General Aviation aircraft around a non-towered airfield. Results using the simulator trained on real-world data showcase 60% improved performance over baseline LfD methods that do not use STL heuristics., Comment: Accepted for publication at IEEE International Conference on Robotics and Automation (ICRA) 2023, 7 pages
Published: 2022

30. T2FPV: Dataset and Method for Correcting First-Person View Errors in Pedestrian Trajectory Prediction

Author: Stoler, Benjamin, Jana, Meghdeep, Hwang, Soonmin, and Oh, Jean
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Robotics
Abstract: Predicting pedestrian motion is essential for developing socially-aware robots that interact in a crowded environment. While the natural visual perspective for a social interaction setting is an egocentric view, the majority of existing work in trajectory prediction therein has been investigated purely in the top-down trajectory space. To support first-person view trajectory prediction research, we present T2FPV, a method for constructing high-fidelity first-person view (FPV) datasets given a real-world, top-down trajectory dataset; we showcase our approach on the ETH/UCY pedestrian dataset to generate the egocentric visual data of all interacting pedestrians, creating the T2FPV-ETH dataset. In this setting, FPV-specific errors arise due to imperfect detection and tracking, occlusions, and field-of-view (FOV) limitations of the camera. To address these errors, we propose CoFE, a module that further refines the imputation of missing data in an end-to-end manner with trajectory forecasting algorithms. Our method reduces the impact of such FPV errors on downstream prediction performance, decreasing displacement error by more than 10% on average. To facilitate research engagement, we release our T2FPV-ETH dataset and software tools.
Published: 2022

31. Social-PatteRNN: Socially-Aware Trajectory Prediction Guided by Motion Patterns

Author: Navarro, Ingrid and Oh, Jean
Subjects: Computer Science - Robotics
Abstract: As robots across domains start collaborating with humans in shared environments, algorithms that enable them to reason over human intent are important to achieve safe interplay. In our work, we study human intent through the problem of predicting trajectories in dynamic environments. We explore domains where navigation guidelines are relatively strictly defined but not clearly marked in their physical environments. We hypothesize that within these domains, agents tend to exhibit short-term motion patterns that reveal context information related to the agent's general direction, intermediate goals and rules of motion, e.g., social behavior. From this intuition, we propose Social-PatteRNN, an algorithm for recurrent, multi-modal trajectory prediction that exploits motion patterns to encode the aforesaid contexts. Our approach guides long-term trajectory prediction by learning to predict short-term motion patterns. It then extracts sub-goal information from the patterns and aggregates it as social context. We assess our approach across three domains: human crowds, humans in sports and manned aircraft in terminal airspace, achieving state-of-the-art performance.
Published: 2022

32. RCA: Ride Comfort-Aware Visual Navigation via Self-Supervised Learning

Author: Yao, Xinjie, Zhang, Ji, and Oh, Jean
Subjects: Computer Science - Robotics, Computer Science - Artificial Intelligence
Abstract: Under shared autonomy, wheelchair users expect vehicles to provide safe and comfortable rides while following users' high-level navigation plans. To find such a path, vehicles negotiate with different terrains and assess their traversal difficulty. Most prior works model surroundings either through geometric representations or semantic classifications, which do not reflect perceived motion intensity and ride comfort in downstream navigation tasks. We propose to model ride comfort explicitly in traversability analysis using proprioceptive sensing. We develop a self-supervised learning framework to predict traversability costmap from first-person-view images by leveraging vehicle states as training signals. Our approach estimates how the vehicle would feel if traversing over based on terrain appearances. We then show our navigation system provides human-preferred ride comfort through robot experiments together with a human evaluation study.
Published: 2022

33. Learn-to-Race Challenge 2022: Benchmarking Safe Learning and Cross-domain Generalisation in Autonomous Racing

Author: Francis, Jonathan, Chen, Bingqing, Ganju, Siddha, Kathpal, Sidharth, Poonganam, Jyotish, Shivani, Ayush, Vyas, Vrushank, Genc, Sahika, Zhukov, Ivan, Kumskoy, Max, Koul, Anirudh, Oh, Jean, and Nyberg, Eric
Subjects: Computer Science - Robotics, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, Electrical Engineering and Systems Science - Systems and Control
Abstract: We present the results of our autonomous racing virtual challenge, based on the newly-released Learn-to-Race (L2R) simulation framework, which seeks to encourage interdisciplinary research in autonomous driving and to help advance the state of the art on a realistic benchmark. Analogous to racing being used to test cutting-edge vehicles, we envision autonomous racing to serve as a particularly challenging proving ground for autonomous agents as: (i) they need to make sub-second, safety-critical decisions in a complex, fast-changing environment; and (ii) both perception and control must be robust to distribution shifts, novel road features, and unseen obstacles. Thus, the main goal of the challenge is to evaluate the joint safety, performance, and generalisation capabilities of reinforcement learning agents on multi-modal perception, through a two-stage process. In the first stage of the challenge, we evaluate an autonomous agent's ability to drive as fast as possible, while adhering to safety constraints. In the second stage, we additionally require the agent to adapt to an unseen racetrack through safe exploration. In this paper, we describe the new L2R Task 2.0 benchmark, with refined metrics and baseline approaches. We also provide an overview of deployment, evaluation, and rankings for the inaugural instance of the L2R Autonomous Racing Virtual Challenge (supported by Carnegie Mellon University, Arrival Ltd., AICrowd, Amazon Web Services, and Honda Research), which officially used the new L2R Task 2.0 benchmark and received over 20,100 views, 437 active participants, 46 teams, and 733 model submissions -- from 88+ unique institutions, in 58+ different countries. Finally, we release leaderboard results from the challenge and provide description of the two top-ranking approaches in cross-domain model transfer, across multiple sensor configurations and simulated races., Comment: 20 pages, 4 figures, 2 tables
Published: 2022

34. StyleCLIPDraw: Coupling Content and Style in Text-to-Drawing Translation

Author: Schaldenbrand, Peter, Liu, Zhixuan, and Oh, Jean
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Generating images that fit a given text description using machine learning has improved greatly with the release of technologies such as the CLIP image-text encoder model; however, current methods lack artistic control of the style of image to be generated. We present an approach for generating styled drawings for a given text description where a user can specify a desired drawing style using a sample image. Inspired by a theory in art that style and content are generally inseparable during the creative process, we propose a coupled approach, known here as StyleCLIPDraw, whereby the drawing is generated by optimizing for style and content simultaneously throughout the process as opposed to applying style transfer after creating content in a sequence. Based on human evaluation, the styles of images generated by StyleCLIPDraw are strongly preferred to those by the sequential approach. Although the quality of content generation degrades for certain styles, overall considering both content and style, StyleCLIPDraw is found far more preferred, indicating the importance of style, look, and feel of machine generated images to people as well as indicating that style is coupled in the drawing process itself. Our code (https://github.com/pschaldenbrand/StyleCLIPDraw), a demonstration (https://replicate.com/pschaldenbrand/style-clip-draw), and style evaluation data (https://www.kaggle.com/pittsburghskeet/drawings-with-style-evaluation-styleclipdraw) are publicly available.
Published: 2022

35. UGV-UAV Object Geolocation in Unstructured Environments

Author: Guttendorf, David, Hamilton, D. W. Wilson, Heckman, Anne Harris, Herman, Herman, Jonathan, Felix, Kannappan, Prasanna, Mireles, Nicholas, Navarro-Serment, Luis, Oh, Jean, Pu, Wei, Saxena, Rohan, Schneider, Jeff, Schnur, Matt, Tiernan, Carter, and Tabor, Trenton
Subjects: Computer Science - Robotics
Abstract: A robotic system of multiple unmanned ground vehicles (UGVs) and unmanned aerial vehicles (UAVs) has the potential for advancing autonomous object geolocation performance. Much research has focused on algorithmic improvements on individual components, such as navigation, motion planning, and perception. In this paper, we present a UGV-UAV object detection and geolocation system, which performs perception, navigation, and planning autonomously in real scale in unstructured environment. We designed novel sensor pods equipped with multispectral (visible, near-infrared, thermal), high resolution (181.6 Mega Pixels), stereo (near-infrared pair), wide field of view (192 degree HFOV) array. We developed a novel on-board software-hardware architecture to process the high volume sensor data in real-time, and we built a custom AI subsystem composed of detection, tracking, navigation, and planning for autonomous objects geolocation in real-time. This research is the first real scale demonstration of such high speed data processing capability. Our novel modular sensor pod can boost relevant computer vision and machine learning research. Our novel hardware-software architecture is a solid foundation for system-level and component-level research. Our system is validated through data-driven offline tests as well as a series of field tests in unstructured environments. We present quantitative results as well as discussions on key robotic system level challenges which manifest when we build and test the system. This system is the first step toward a UGV-UAV cooperative reconnaissance system in the future., Comment: Authors are with National Robotics Engineering Center, the Robotics Institute of Carnegie Mellon University, Pittsburgh PA, listed in alphabetical order. E-mail: wpu@nrec.ri.cmu.edu
Published: 2022

36. StyleCLIPDraw: Coupling Content and Style in Text-to-Drawing Synthesis

Author: Schaldenbrand, Peter, Liu, Zhixuan, and Oh, Jean
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language
Abstract: Generating images that fit a given text description using machine learning has improved greatly with the release of technologies such as the CLIP image-text encoder model; however, current methods lack artistic control of the style of image to be generated. We introduce StyleCLIPDraw which adds a style loss to the CLIPDraw text-to-drawing synthesis model to allow artistic control of the synthesized drawings in addition to control of the content via text. Whereas performing decoupled style transfer on a generated image only affects the texture, our proposed coupled approach is able to capture a style in both texture and shape, suggesting that the style of the drawing is coupled with the drawing process itself. More results and our code are available at https://github.com/pschaldenbrand/StyleCLIPDraw, Comment: Superseded by arXiv:2202.12362
Published: 2021

37. Autonomous Exploration Development Environment and the Planning Algorithms

Author: Cao, Chao, Zhu, Hongbiao, Yang, Fan, Xia, Yukun, Choset, Howie, Oh, Jean, and Zhang, Ji
Subjects: Computer Science - Robotics
Abstract: Autonomous Exploration Development Environment is an open-source repository released to facilitate the development of high-level planning algorithms and integration of complete autonomous navigation systems. The repository contains representative simulation environment models, fundamental navigation modules, e.g., local planner, terrain traversability analysis, waypoint following, and visualization tools. Together with two of our high-level planner releases -- TARE planner for exploration and FAR planner for route planning, we detail usage of the three open-source repositories and share experiences in the integration of autonomous navigation systems. We use DARPA Subterranean Challenge as a use case where the repositories together form the main navigation system of the CMU-OSU Team. In the end, we discuss a few potential use cases in extended applications.
Published: 2021

38. FAR Planner: Fast, Attemptable Route Planner using Dynamic Visibility Update

Author: Yang, Fan, Cao, Chao, Zhu, Hongbiao, Oh, Jean, and Zhang, Ji
Subjects: Computer Science - Robotics
Abstract: The problem of path planning in unknown environments remains a challenging problem - as the environment is gradually observed during the navigation, the underlying planner has to update the environment representation and replan, promptly and constantly, to account for the new observations. In this paper, we present a visibility graph-based planning framework capable of dealing with navigation tasks in both known and unknown environments. The planner employs a polygonal representation of the environment and constructs the representation by extracting edge points around obstacles to form enclosed polygons. With that, the method dynamically updates a global visibility graph using a two-layered data structure, expanding the visibility edges along with the navigation and removing edges that become occluded by newly observed obstacles. When navigating in unknown environments, the method is attemptable in discovering a way to the goal by picking up the environment layout on the fly, updating the visibility graph, and fast re-planning corresponding to the newly observed environment. We evaluate the method in simulated and real-world settings. The method shows the capability to attempt and navigate through unknown environments, reducing the travel time by up to 12-47% relative to search-based methods (A*, D* Lite) and by more than 24-35% relative to sampling-based methods (RRT*, BIT*, and SPARS)., Comment: This paper includes 8 pages, 10 figures, and 3 tables
Published: 2021

39. Safe Autonomous Racing via Approximate Reachability on Ego-vision

Author: Chen, Bingqing, Francis, Jonathan, Oh, Jean, Nyberg, Eric, and Herbert, Sylvia L.
Subjects: Computer Science - Robotics, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Electrical Engineering and Systems Science - Systems and Control
Abstract: Racing demands each vehicle to drive at its physical limits, when any safety infraction could lead to catastrophic failure. In this work, we study the problem of safe reinforcement learning (RL) for autonomous racing, using the vehicle's ego-camera view and speed as input. Given the nature of the task, autonomous agents need to be able to 1) identify and avoid unsafe scenarios under the complex vehicle dynamics, and 2) make sub-second decisions in a fast-changing environment. To satisfy these criteria, we propose to incorporate Hamilton-Jacobi (HJ) reachability theory, a safety verification method for general non-linear systems, into the constrained Markov decision process (CMDP) framework. HJ reachability not only provides a control-theoretic approach to learn about safety, but also enables low-latency safety verification. Though HJ reachability is traditionally not scalable to high-dimensional systems, we demonstrate that with neural approximation, the HJ safety value can be learned directly on vision context -- the highest-dimensional problem studied via the method, to-date. We evaluate our method on several benchmark tasks, including Safety Gym and Learn-to-Race (L2R), a recently-released high-fidelity autonomous racing environment. Our approach has significantly fewer constraint violations in comparison to other constrained RL baselines in Safety Gym, and achieves the new state-of-the-art results on the L2R benchmark task. We provide additional visualization of agent behavior at the following anonymized paper website: https://sites.google.com/view/safeautonomousracing/home, Comment: 17 pages, 15 figures, 3 tables
Published: 2021

40. Predicting Like A Pilot: Dataset and Method to Predict Socially-Aware Aircraft Trajectories in Non-Towered Terminal Airspace

Author: Patrikar, Jay, Moon, Brady, Oh, Jean, and Scherer, Sebastian
Subjects: Computer Science - Robotics, Computer Science - Human-Computer Interaction
Abstract: Pilots operating aircraft in un-towered airspace rely on their situational awareness and prior knowledge to predict the future trajectories of other agents. These predictions are conditioned on the past trajectories of other agents, agent-agent social interactions and environmental context such as airport location and weather. This paper provides a dataset, $\textit{TrajAir}$, that captures this behaviour in a non-towered terminal airspace around a regional airport. We also present a baseline socially-aware trajectory prediction algorithm, $\textit{TrajAirNet}$, that uses the dataset to predict the trajectories of all agents. The dataset is collected for 111 days over 8 months and contains ADS-B transponder data along with the corresponding METAR weather data. The data is processed to be used as a benchmark with other publicly available social navigation datasets. To the best of authors' knowledge, this is the first 3D social aerial navigation dataset thus introducing social navigation for autonomous aviation. $\textit{TrajAirNet}$ combines state-of-the-art modules in social navigation to provide predictions in a static environment with a dynamic context. Both the $\textit{TrajAir}$ dataset and $\textit{TrajAirNet}$ prediction algorithm are open-source. The dataset, codebase, and video are available at https://theairlab.org/trajair/, https://github.com/castacks/trajairnet, and https://youtu.be/elAQXrxB2gw respectively., Comment: 7 pages, 4 figures, ICRA 2022
Published: 2021
Full Text: View/download PDF

41. Localize, Group, and Select: Boosting Text-VQA by Scene Text Modeling

Author: Lu, Xiaopeng, Fan, Zhen, Wang, Yansen, Oh, Jean, and Rose, Carolyn P.
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language
Abstract: As an important task in multimodal context understanding, Text-VQA (Visual Question Answering) aims at question answering through reading text information in images. It differentiates from the original VQA task as Text-VQA requires large amounts of scene-text relationship understanding, in addition to the cross-modal grounding capability. In this paper, we propose Localize, Group, and Select (LOGOS), a novel model which attempts to tackle this problem from multiple aspects. LOGOS leverages two grounding tasks to better localize the key information of the image, utilizes scene text clustering to group individual OCR tokens, and learns to select the best answer from different sources of OCR (Optical Character Recognition) texts. Experiments show that LOGOS outperforms previous state-of-the-art methods on two Text-VQA benchmarks without using additional OCR annotation data. Ablation studies and analysis demonstrate the capability of LOGOS to bridge different modalities and better understand scene text., Comment: 9 pages
Published: 2021

42. Core Challenges in Embodied Vision-Language Planning

Author: Francis, Jonathan, Kitamura, Nariaki, Labelle, Felix, Lu, Xiaopeng, Navarro, Ingrid, and Oh, Jean
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Robotics
Abstract: Recent advances in the areas of multimodal machine learning and artificial intelligence (AI) have led to the development of challenging tasks at the intersection of Computer Vision, Natural Language Processing, and Embodied AI. Whereas many approaches and previous survey pursuits have characterised one or two of these dimensions, there has not been a holistic analysis at the center of all three. Moreover, even when combinations of these topics are considered, more focus is placed on describing, e.g., current architectural methods, as opposed to also illustrating high-level challenges and opportunities for the field. In this survey paper, we discuss Embodied Vision-Language Planning (EVLP) tasks, a family of prominent embodied navigation and manipulation problems that jointly use computer vision and natural language. We propose a taxonomy to unify these tasks and provide an in-depth analysis and comparison of the new and current algorithmic approaches, metrics, simulated environments, as well as the datasets used for EVLP tasks. Finally, we present the core challenges that we believe new EVLP works should seek to address, and we advocate for task construction that enables model generalizability and furthers real-world deployment., Comment: Journal of Artificial Intelligence Research 74 (2022) 459-515
Published: 2021

43. Language Understanding for Field and Service Robots in a Priori Unknown Environments

Author: Walter, Matthew R., Patki, Siddharth, Daniele, Andrea F., Fahnestock, Ethan, Duvallet, Felix, Hemachandra, Sachithra, Oh, Jean, Stentz, Anthony, Roy, Nicholas, and Howard, Thomas M.
Subjects: Computer Science - Robotics, Computer Science - Computation and Language
Abstract: Contemporary approaches to perception, planning, estimation, and control have allowed robots to operate robustly as our remote surrogates in uncertain, unstructured environments. This progress now creates an opportunity for robots to operate not only in isolation, but also with and alongside humans in our complex environments. Realizing this opportunity requires an efficient and flexible medium through which humans can communicate with collaborative robots. Natural language provides one such medium, and through significant progress in statistical methods for natural-language understanding, robots are now able to interpret a diverse array of free-form commands. However, most contemporary approaches require a detailed, prior spatial-semantic map of the robot's environment that models the space of possible referents of an utterance. Consequently, these methods fail when robots are deployed in new, previously unknown, or partially-observed environments, particularly when mental models of the environment differ between the human operator and the robot. This paper provides a comprehensive description of a novel learning framework that allows field and service robots to interpret and correctly execute natural-language instructions in a priori unknown, unstructured environments. Integral to our approach is its use of language as a "sensor" -- inferring spatial, topological, and semantic information implicit in the utterance and then exploiting this information to learn a distribution over a latent environment model. We incorporate this distribution in a probabilistic, language grounding model and infer a distribution over a symbolic representation of the robot's action space. We use imitation learning to identify a belief-space policy that reasons over the environment and behavior distributions. We evaluate our framework through a variety of navigation and mobile-manipulation experiments., Comment: Field Robotics (accepted, to appear)
Published: 2021

44. Self-supervised Learning of 3D Object Understanding by Data Association and Landmark Estimation for Image Sequence

Author: Yu, Hyeonwoo and Oh, Jean
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: In this paper, we propose a self-supervised learning method for multi-object pose estimation. 3D object understanding from 2D image is a challenging task that infers additional dimension from reduced-dimensional information. In particular, the estimation of the 3D localization or orientation of an object requires precise reasoning, unlike other simple clustering tasks such as object classification. Therefore, the scale of the training dataset becomes more crucial. However, it is challenging to obtain large amount of 3D dataset since achieving 3D annotation is expensive and time-consuming. If the scale of the training dataset can be increased by involving the image sequence obtained from simple navigation, it is possible to overcome the scale limitation of the dataset and to have efficient adaptation to the new environment. However, when the self annotation is conducted on single image by the network itself, training performance of the network is bounded to the self performance. Therefore, we propose a strategy to exploit multiple observations of the object in the image sequence in order to surpass the self-performance: first, the landmarks for the global object map are estimated through network prediction and data association, and the corrected annotation for a single frame is obtained. Then, network fine-tuning is conducted including the dataset obtained by self-annotation, thereby exceeding the performance boundary of the network itself. The proposed method was evaluated on the KITTI driving scene dataset, and we demonstrate the performance improvement in the pose estimation of multi-object in 3D space.
Published: 2021

45. Domain Adaptive Monocular Depth Estimation With Semantic Information

Author: Lu, Fei, Yu, Hyeonwoo, and Oh, Jean
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: The advent of deep learning has brought an impressive advance to monocular depth estimation, e.g., supervised monocular depth estimation has been thoroughly investigated. However, the large amount of the RGB-to-depth dataset may not be always available since collecting accurate depth ground truth according to the RGB image is a time-consuming and expensive task. Although the network can be trained on an alternative dataset to overcome the dataset scale problem, the trained model is hard to generalize to the target domain due to the domain discrepancy. Adversarial domain alignment has demonstrated its efficacy to mitigate the domain shift on simple image classification tasks in previous works. However, traditional approaches hardly handle the conditional alignment as they solely consider the feature map of the network. In this paper, we propose an adversarial training model that leverages semantic information to narrow the domain gap. Based on the experiments conducted on the datasets for the monocular depth estimation task including KITTI and Cityscapes, the proposed compact model achieves state-of-the-art performance comparable to complex latest models and shows favorable results on boundaries and objects at far distances., Comment: 8 pages, 5 figures, code will be released soon
Published: 2021

46. Core Challenges of Social Robot Navigation: A Survey

Author: Mavrogiannis, Christoforos, Baldini, Francesca, Wang, Allan, Zhao, Dapeng, Trautman, Pete, Steinfeld, Aaron, and Oh, Jean
Subjects: Computer Science - Robotics, Computer Science - Human-Computer Interaction
Abstract: Robot navigation in crowded public spaces is a complex task that requires addressing a variety of engineering and human factors challenges. These challenges have motivated a great amount of research resulting in important developments for the fields of robotics and human-robot interaction over the past three decades. Despite the significant progress and the massive recent interest, we observe a number of significant remaining challenges that prohibit the seamless deployment of autonomous robots in public pedestrian environments. In this survey article, we organize existing challenges into a set of categories related to broader open problems in motion planning, behavior design, and evaluation methodologies. Within these categories, we review past work, and offer directions for future research. Our work builds upon and extends earlier survey efforts by a) taking a critical perspective and diagnosing fundamental limitations of adopted practices in the field and b) offering constructive feedback and ideas that we aspire will drive research in the field over the coming decade., Comment: Minor formatting edits (36 pages, 3 figures)
Published: 2021

47. Anchor Distance for 3D Multi-Object Distance Estimation from 2D Single Shot

Author: Yu, Hyeonwoo and Oh, Jean
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Visual perception of the objects in a 3D environment is a key to successful performance in autonomous driving and simultaneous localization and mapping (SLAM). In this paper, we present a real time approach for estimating the distances to multiple objects in a scene using only a single-shot image. Given a 2D Bounding Box (BBox) and object parameters, a 3D distance to the object can be calculated directly using 3D reprojection; however, such methods are prone to significant errors because an error from the 2D detection can be amplified in 3D. In addition, it is also challenging to apply such methods to a real-time system due to the computational burden. In the case of the traditional multi-object detection methods, existing works have been developed for specific tasks such as object segmentation or 2D BBox regression. These methods introduce the concept of anchor BBox for elaborate 2D BBox estimation, and predictors are specialized and trained for specific 2D BBoxes. In order to estimate the distances to the 3D objects from a single 2D image, we introduce the notion of $\textit{anchor distance}$ based on an object's location and propose a method that applies the anchor distance to the multi-object detector structure. We let the predictors catch the distance prior using anchor distance and train the network based on the distance. The predictors can be characterized to the objects located in a specific distance range. By propagating the distance prior using a distance anchor to the predictors, it is feasible to perform the precise distance estimation and real-time execution simultaneously. The proposed method achieves about 30 FPS speed, and shows the lowest RMSE compared to the existing methods., Comment: submitted to RA-letter with ICRA2021 option
Published: 2021

48. Anytime 3D Object Reconstruction using Multi-modal Variational Autoencoder

Author: Yu, Hyeonwoo and Oh, Jean
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: For effective human-robot teaming, it is important for the robots to be able to share their visual perception with the human operators. In a harsh remote collaboration setting, data compression techniques such as autoencoder can be utilized to obtain and transmit the data in terms of latent variables in a compact form. In addition, to ensure real-time runtime performance even under unstable environments, an anytime estimation approach is desired that can reconstruct the full contents from incomplete information. In this context, we propose a method for imputation of latent variables whose elements are partially lost. To achieve the anytime property with only a few dimensions of variables, exploiting prior information of the category-level is essential. A prior distribution used in variational autoencoders is simply assumed to be isotropic Gaussian regardless of the labels of each training datapoint. This type of flattened prior makes it difficult to perform imputation from the category-level distributions. We overcome this limitation by exploiting a category-specific multi-modal prior distribution in the latent space. The missing elements of the partially transferred data can be sampled, by finding a specific modal according to the remaining elements. Since the method is designed to use partial elements for anytime estimation, it can also be applied for data over-compression. Based on the experiments on the ModelNet and Pascal3D datasets, the proposed approach shows consistently superior performance over autoencoder and variational autoencoder up to 70% data loss., Comment: IEEE Robotics and Automation Letters (accepted with ICRA2022 options)
Published: 2021

49. Content Masked Loss: Human-Like Brush Stroke Planning in a Reinforcement Learning Painting Agent

Author: Schaldenbrand, Peter and Oh, Jean
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: The objective of most Reinforcement Learning painting agents is to minimize the loss between a target image and the paint canvas. Human painter artistry emphasizes important features of the target image rather than simply reproducing it (DiPaola 2007). Using adversarial or L2 losses in the RL painting models, although its final output is generally a work of finesse, produces a stroke sequence that is vastly different from that which a human would produce since the model does not have knowledge about the abstract features in the target image. In order to increase the human-like planning of the model without the use of expensive human data, we introduce a new loss function for use with the model's reward function: Content Masked Loss. In the context of robot painting, Content Masked Loss employs an object detection model to extract features which are used to assign higher weight to regions of the canvas that a human would find important for recognizing content. The results, based on 332 human evaluators, show that the digital paintings produced by our Content Masked model show detectable subject matter earlier in the stroke sequence than existing methods without compromising on the quality of the final painting. Our code is available at https://github.com/pschaldenbrand/ContentMaskedLoss.
Published: 2020

50. Prompt engineering with ChatGPT3.5 and GPT4 to improve patient education on retinal diseases

Author: Jung, Hoyoung, Oh, Jean, Stephenson, Kirk A.J., Joe, Aaron W., and Mammo, Zaid N.
Published: 2024
Full Text: View/download PDF
