Author: "Melnik, Andrew" - Searchworks@Jio Institute Digital Library Search Results

Your search for "Melnik, Andrew" returned 120 results

1. SplatR : Experience Goal Visual Rearrangement with 3D Gaussian Splatting and Dense Feature Matching

Author: S, Arjun P, Melnik, Andrew, and Nandi, Gora Chand
Subjects: Computer Science - Robotics, Computer Science - Computer Vision and Pattern Recognition
Abstract: The Experience Goal Visual Rearrangement task stands as a foundational challenge within Embodied AI, requiring an agent to construct a robust world model that accurately captures the goal state. The agent uses this world model to restore a shuffled scene to its original configuration, making an accurate representation of the world essential for successfully completing the task. In this work, we present a novel framework that leverages 3D Gaussian Splatting as a 3D scene representation for the experience goal visual rearrangement task. Recent advances in volumetric scene representation, like 3D Gaussian Splatting, offer fast rendering of high-quality, photo-realistic novel views. Our approach enables the agent to have consistent views of the current and the goal setting of the rearrangement task, which enables the agent to directly compare the goal state and the shuffled state of the world in image space. To compare these views, we propose to use a dense feature matching method with visual features extracted from a foundation model, leveraging its advantages of a more universal feature representation, which facilitates robustness and generalization. We validate our approach on the AI2-THOR rearrangement challenge benchmark and demonstrate improvements over the current state-of-the-art methods.
Published: 2024

2. Object and Contact Point Tracking in Demonstrations Using 3D Gaussian Splatting

Author: Büttner, Michael, Francis, Jonathan, Rhodin, Helge, and Melnik, Andrew
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Robotics
Abstract: This paper introduces a method to enhance Interactive Imitation Learning (IIL) by extracting touch interaction points and tracking object movement from video demonstrations. The approach extends current IIL systems by providing robots with detailed knowledge of both where and how to interact with objects, particularly complex articulated ones like doors and drawers. By leveraging cutting-edge techniques such as 3D Gaussian Splatting and FoundationPose for tracking, this method allows robots to better understand and manipulate objects in dynamic environments. The research lays the foundation for more effective task learning and execution in autonomous robotic systems. Comment: CoRL 2024, Workshop on Lifelong Learning for Home Robots, Munich, Germany
Published: 2024

3. Towards Open-World Mobile Manipulation in Homes: Lessons from the Neurips 2023 HomeRobot Open Vocabulary Mobile Manipulation Challenge

Author: Yenamandra, Sriram, Ramachandran, Arun, Khanna, Mukul, Yadav, Karmesh, Vakil, Jay, Melnik, Andrew, Büttner, Michael, Harz, Leon, Brown, Lyon, Nandi, Gora Chand, PS, Arjun, Yadav, Gaurav Kumar, Kala, Rahul, Haschke, Robert, Luo, Yang, Zhu, Jinxin, Han, Yansen, Lu, Bingyi, Gu, Xuan, Liu, Qinyuan, Zhao, Yaping, Ye, Qiting, Dou, Chenxiao, Chua, Yansong, Kuzma, Volodymyr, Humennyy, Vladyslav, Partsey, Ruslan, Francis, Jonathan, Chaplot, Devendra Singh, Chhablani, Gunjan, Clegg, Alexander, Gervet, Theophile, Jain, Vidhi, Ramrakhya, Ram, Szot, Andrew, Wang, Austin, Yang, Tsung-Yen, Edsinger, Aaron, Kemp, Charlie, Shah, Binit, Kira, Zsolt, Batra, Dhruv, Mottaghi, Roozbeh, Bisk, Yonatan, and Paxton, Chris
Subjects: Computer Science - Robotics, Computer Science - Computer Vision and Pattern Recognition
Abstract: In order to develop robots that can effectively serve as versatile and capable home assistants, it is crucial for them to reliably perceive and interact with a wide variety of objects across diverse environments. To this end, we proposed Open Vocabulary Mobile Manipulation as a key benchmark task for robotics: finding any object in a novel environment and placing it on any receptacle surface within that environment. We organized a NeurIPS 2023 competition featuring both simulation and real-world components to evaluate solutions to this task. Our baselines on the most challenging version of this task, using real perception in simulation, achieved only a 0.8% success rate; by the end of the competition, the best participants achieved a 10.8% success rate, a 13x improvement. We observed that the most successful teams employed a variety of methods, yet two common threads emerged among the best solutions: enhancing error detection and recovery, and improving the integration of perception with decision-making processes. In this paper, we detail the results and methodologies used, both in simulation and real-world settings. We discuss the lessons learned and their implications for future research. Additionally, we compare performance in real and simulated environments, emphasizing the necessity for robust generalization to novel settings.
Published: 2024

4. Video Diffusion Models: A Survey

Author: Melnik, Andrew, Ljubljanac, Michal, Lu, Cong, Yan, Qi, Ren, Weiming, and Ritter, Helge
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Diffusion generative models have recently become a powerful technique for creating and modifying high-quality, coherent video content. This survey provides a comprehensive overview of the critical components of diffusion models for video generation, including their applications, architectural design, and temporal dynamics modeling. The paper begins by discussing the core principles and mathematical formulations, then explores various architectural choices and methods for maintaining temporal consistency. A taxonomy of applications is presented, categorizing models based on input modalities such as text prompts, images, videos, and audio signals. Advancements in text-to-video generation are discussed to illustrate the state-of-the-art capabilities and limitations of current approaches. Additionally, the survey summarizes recent developments in training and evaluation practices, including the use of diverse video and image datasets and the adoption of various evaluation metrics to assess model performance. The survey concludes with an examination of ongoing challenges, such as generating longer videos and managing computational costs, and offers insights into potential future directions for the field. By consolidating the latest research and developments, this survey aims to serve as a valuable resource for researchers and practitioners working with video diffusion models. Website: https://github.com/ndrwmlnk/Awesome-Video-Diffusion-Models
Published: 2024

5. Lane Segmentation Refinement with Diffusion Models

Author: Ruiz, Antonio, Melnik, Andrew, Wang, Dong, and Ritter, Helge
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: The lane graph is a key component for building high-definition (HD) maps and crucial for downstream tasks such as autonomous driving or navigation planning. Previously, He et al. (2022) explored the extraction of the lane-level graph from aerial imagery utilizing a segmentation-based approach. However, segmentation networks struggle to achieve perfect segmentation masks, resulting in inaccurate lane graph extraction. We explore additional enhancements to refine this segmentation-based approach and extend it with a diffusion probabilistic model (DPM) component. This combination further improves the GEO F1 and TOPO F1 scores, which are crucial indicators of the quality of a lane graph, in the undirected graph in non-intersection areas. We conduct experiments on a publicly available dataset, demonstrating that our method outperforms the previous approach, particularly in enhancing the connectivity of such a graph, as measured by the TOPO F1 score. Moreover, we perform ablation studies on the individual components of our method to understand their contribution and evaluate their effectiveness.
Published: 2024

6. Cognitive Planning for Object Goal Navigation using Generative AI Models

Author: S, Arjun P, Melnik, Andrew, and Nandi, Gora Chand
Subjects: Computer Science - Robotics, Computer Science - Computer Vision and Pattern Recognition
Abstract: Recent advancements in Generative AI, particularly in Large Language Models (LLMs) and Large Vision-Language Models (LVLMs), offer new possibilities for integrating cognitive planning into robotic systems. In this work, we present a novel framework for solving the object goal navigation problem that generates efficient exploration strategies. Our approach enables a robot to navigate unfamiliar environments by leveraging LLMs and LVLMs to understand the semantic structure of the scene. To address the challenge of representing complex environments without overwhelming the system, we propose a 3D modular scene representation, enriched with semantic descriptions. This representation is dynamically pruned using an LLM-based mechanism, which filters irrelevant information and focuses on task-specific data. By combining these elements, our system generates high-level sub-goals that guide the exploration of the robot toward the target object. We validate our approach in simulated environments, demonstrating its ability to enhance object search efficiency while maintaining scalability in complex settings.
Published: 2024

7. Natural Language as Policies: Reasoning for Coordinate-Level Embodied Control with LLMs

Author: Mikami, Yusuke, Melnik, Andrew, Miura, Jun, and Hautamäki, Ville
Subjects: Computer Science - Robotics, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, I.2.9, I.2.7
Abstract: We demonstrate experimental results with LLMs that address robotics task planning problems. Recently, LLMs have been applied in robotics task planning, particularly using a code generation approach that converts complex high-level instructions into mid-level policy codes. In contrast, our approach acquires text descriptions of the task and scene objects, then formulates task planning through natural language reasoning, and outputs coordinate-level control commands, thus reducing the necessity for intermediate representation code as policies with pre-defined APIs. Our approach is evaluated on a multi-modal prompt simulation benchmark, demonstrating that our prompt engineering experiments with natural language reasoning significantly enhance success rates compared to its absence. Furthermore, our approach illustrates the potential for natural language descriptions to transfer robotics skills from known tasks to previously unseen tasks. Project website: https://natural-language-as-policies.github.io/. Comment: 8 pages, 2 figures
Published: 2024

8. Zero-shot Imitation Policy via Search in Demonstration Dataset

Author: Malato, Federico, Leopold, Florian, Melnik, Andrew, and Hautamäki, Ville
Subjects: Computer Science - Artificial Intelligence
Abstract: Behavioral cloning uses a dataset of demonstrations to learn a policy. To overcome computationally expensive training procedures and address the policy adaptation problem, we propose to use latent spaces of pre-trained foundation models to index a demonstration dataset, instantly access similar relevant experiences, and copy behavior from these situations. Actions from a selected similar situation can be performed by the agent until representations of the agent's current situation and the selected experience diverge in the latent space. Thus, we formulate our control problem as a dynamic search problem over a dataset of experts' demonstrations. We test our approach on BASALT MineRL-dataset in the latent representation of a Video Pre-Training model. We compare our model to state-of-the-art, Imitation Learning-based Minecraft agents. Our approach can effectively recover meaningful demonstrations and show human-like behavior of an agent in the Minecraft environment in a wide variety of scenarios. Experimental results reveal that performance of our search-based approach clearly wins in terms of accuracy and perceptual evaluation over learning-based models.
Published: 2024

9. Benchmarks for Physical Reasoning AI

Author: Melnik, Andrew, Schiewer, Robin, Lange, Moritz, Muresanu, Andrei, Saeidi, Mozhgan, Garg, Animesh, and Ritter, Helge
Subjects: Computer Science - Artificial Intelligence
Abstract: Physical reasoning is a crucial aspect in the development of general AI systems, given that human learning starts with interacting with the physical world before progressing to more complex concepts. Although researchers have studied and assessed the physical reasoning of AI approaches through various specific benchmarks, there is no comprehensive approach to evaluating and measuring progress. Therefore, we aim to offer an overview of existing benchmarks and their solution approaches and propose a unified perspective for measuring the physical reasoning capacity of AI systems. We select benchmarks that are designed to test algorithmic performance in physical reasoning tasks. While each of the selected benchmarks poses a unique challenge, their ensemble provides a comprehensive proving ground for an AI generalist agent with a measurable skill level for various physical reasoning concepts. This gives an advantage to such an ensemble of benchmarks over other holistic benchmarks that aim to simulate the real world by intertwining its complexity and many concepts. We group the presented set of physical reasoning benchmarks into subcategories so that more narrow generalist AI agents can be tested first on these groups.
Published: 2023

10. UniTeam: Open Vocabulary Mobile Manipulation Challenge

Author: Melnik, Andrew, Büttner, Michael, Harz, Leon, Brown, Lyon, Nandi, Gora Chand, PS, Arjun, Yadav, Gaurav Kumar, Kala, Rahul, and Haschke, Robert
Subjects: Computer Science - Robotics, Computer Science - Artificial Intelligence
Abstract: This report introduces our UniTeam agent - an improved baseline for the "HomeRobot: Open Vocabulary Mobile Manipulation" challenge. The challenge poses problems of navigation in unfamiliar environments, manipulation of novel objects, and recognition of open-vocabulary object classes. This challenge aims to facilitate cross-cutting research in embodied AI using recent advances in machine learning, computer vision, natural language, and robotics. In this work, we conducted an exhaustive evaluation of the provided baseline agent; identified deficiencies in perception, navigation, and manipulation skills; and improved the baseline agent's performance. Notably, enhancements were made in perception - minimizing misclassifications; navigation - preventing infinite loop commitments; picking - addressing failures due to changing object visibility; and placing - ensuring accurate positioning for successful object placement.
Published: 2023

11. Language-Conditioned Semantic Search-Based Policy for Robotic Manipulation Tasks

Author: Sheikh, Jannik, Melnik, Andrew, Nandi, Gora Chand, and Haschke, Robert
Subjects: Computer Science - Robotics, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Reinforcement learning and Imitation Learning approaches utilize policy learning strategies that are difficult to generalize well with just a few examples of a task. In this work, we propose a language-conditioned semantic search-based method to produce an online search-based policy from the available demonstration dataset of state-action trajectories. Here we directly acquire actions from the most similar manipulation trajectories found in the dataset. Our approach surpasses the performance of the baselines on the CALVIN benchmark and exhibits strong zero-shot adaptation capabilities. This holds great potential for expanding the use of our online search-based policy approach to tasks typically addressed by Imitation Learning or Reinforcement Learning-based policies.
Published: 2023

12. Behavioral Cloning via Search in Embedded Demonstration Dataset

Author: Malato, Federico, Leopold, Florian, Hautamäki, Ville, and Melnik, Andrew
Subjects: Computer Science - Artificial Intelligence
Abstract: Behavioural cloning uses a dataset of demonstrations to learn a behavioural policy. To overcome various learning and policy adaptation problems, we propose to use latent space to index a demonstration dataset, instantly access similar relevant experiences, and copy behavior from these situations. Actions from a selected similar situation can be performed by the agent until representations of the agent's current situation and the selected experience diverge in the latent space. Thus, we formulate our control problem as a search problem over a dataset of experts' demonstrations. We test our approach on BASALT MineRL-dataset in the latent representation of a Video PreTraining model. We compare our model to state-of-the-art Minecraft agents. Our approach can effectively recover meaningful demonstrations and show human-like behavior of an agent in the Minecraft environment in a wide variety of scenarios. Experimental results reveal that performance of our search-based approach is comparable to trained models, while allowing zero-shot task adaptation by changing the demonstration examples.
Published: 2023

13. Contrastive Language, Action, and State Pre-training for Robot Learning

Author: Rana, Krishan, Melnik, Andrew, and Sünderhauf, Niko
Subjects: Computer Science - Robotics, Computer Science - Artificial Intelligence
Abstract: In this paper, we introduce a method for unifying language, action, and state information in a shared embedding space to facilitate a range of downstream tasks in robot learning. Our method, Contrastive Language, Action, and State Pre-training (CLASP), extends the CLIP formulation by incorporating distributional learning, capturing the inherent complexities and one-to-many relationships in behaviour-text alignment. By employing distributional outputs for both text and behaviour encoders, our model effectively associates diverse textual commands with a single behaviour and vice-versa. We demonstrate the utility of our method for the following downstream tasks: zero-shot text-behaviour retrieval, captioning unseen robot behaviours, and learning a behaviour prior for language-conditioned reinforcement learning. Our distributional encoders exhibit superior retrieval and captioning performance on unseen datasets, and the ability to generate meaningful exploratory behaviours from textual commands, capturing the intricate relationships between language, action, and state. This work represents an initial step towards developing a unified pre-trained model for robotics, with the potential to generalise to a broad range of downstream tasks.
Published: 2023

14. Shape complexity estimation using VAE

Author: Rothgaenger, Markus, Melnik, Andrew, and Ritter, Helge
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: In this paper, we compare methods for estimating the complexity of two-dimensional shapes and introduce a method that exploits reconstruction loss of Variational Autoencoders with different sizes of latent vectors. Although complexity of a shape is not a well-defined attribute, different aspects of it can be estimated. We demonstrate that our method captures some aspects of shape complexity. Code and training details will be publicly available.
Published: 2023

15. Towards Solving Fuzzy Tasks with Human Feedback: A Retrospective of the MineRL BASALT 2022 Competition

Author: Milani, Stephanie, Kanervisto, Anssi, Ramanauskas, Karolis, Schulhoff, Sander, Houghton, Brandon, Mohanty, Sharada, Galbraith, Byron, Chen, Ke, Song, Yan, Zhou, Tianze, Yu, Bingquan, Liu, He, Guan, Kai, Hu, Yujing, Lv, Tangjie, Malato, Federico, Leopold, Florian, Raut, Amogh, Hautamäki, Ville, Melnik, Andrew, Ishida, Shu, Henriques, João F., Klassert, Robert, Laurito, Walter, Novoseller, Ellen, Goecks, Vinicius G., Waytowich, Nicholas, Watkins, David, Miller, Josh, and Shah, Rohin
Subjects: Computer Science - Artificial Intelligence
Abstract: To facilitate research in the direction of fine-tuning foundation models from human feedback, we held the MineRL BASALT Competition on Fine-Tuning from Human Feedback at NeurIPS 2022. The BASALT challenge asks teams to compete to develop algorithms to solve tasks with hard-to-specify reward functions in Minecraft. Through this competition, we aimed to promote the development of algorithms that use human feedback as channels to learn the desired behavior. We describe the competition and provide an overview of the top solutions. We conclude by discussing the impact of the competition and future directions for improvement.
Published: 2023

16. Deep Detection Dreams: Enhancing Visualization Tools for Single Stage Object Detectors

Author: Limberg, Christian, Harter, Augustin, Melnik, Andrew, Ritter, Helge, and Prendinger, Helmut
Editor: Filipe, Joaquim (Editorial Board Member), Ghosh, Ashish (Editorial Board Member), Zhou, Lizhu (Editorial Board Member), de Sousa, A. Augusto, Bashford-Rogers, Thomas, Paljic, Alexis, Ziat, Mounia, Hurter, Christophe, Purchase, Helen, Radeva, Petia, Farinella, Giovanni Maria, and Bouatouch, Kadi
Published: 2024

17. Shape Complexity Estimation Using VAE

Author: Rothgänger, Markus, Melnik, Andrew, and Ritter, Helge
Editor: Kacprzyk, Janusz (Series Editor), Gomide, Fernando (Advisory Editor), Kaynak, Okyay (Advisory Editor), Liu, Derong (Advisory Editor), Pedrycz, Witold (Advisory Editor), Polycarpou, Marios M. (Advisory Editor), Rudas, Imre J. (Advisory Editor), Wang, Jun (Advisory Editor), and Arai, Kohei
Published: 2024

18. Stroke-based Rendering: From Heuristics to Deep Learning

Author: Nolte, Florian, Melnik, Andrew, and Ritter, Helge
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: In the last few years, artistic image-making with deep learning models has gained a considerable amount of traction. A large number of these models operate directly in the pixel space and generate raster images. This is, however, not how most humans would produce artworks; they would, for example, plan a sequence of shapes and strokes to draw. Recent developments in deep learning methods help to bridge the gap between stroke-based paintings and pixel photo generation. With this survey, we aim to provide a structured introduction and understanding of common challenges and approaches in stroke-based rendering algorithms. These algorithms range from simple rule-based heuristics to stroke optimization and deep reinforcement agents, trained to paint images with differentiable vector graphics and neural rendering.
Published: 2022

19. Behavioral Cloning via Search in Video PreTraining Latent Space

Author: Malato, Federico, Leopold, Florian, Raut, Amogh, Hautamäki, Ville, and Melnik, Andrew
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition
Abstract: Our aim is to build autonomous agents that can solve tasks in environments like Minecraft. To do so, we used an imitation learning-based approach. We formulate our control problem as a search problem over a dataset of experts' demonstrations, where the agent copies actions from a similar demonstration trajectory of image-action pairs. We perform a proximity search over the BASALT MineRL-dataset in the latent representation of a Video PreTraining model. The agent copies the actions from the expert trajectory as long as the state representations of the agent and the selected expert trajectory from the dataset do not diverge. Then the proximity search is repeated. Our approach can effectively recover meaningful demonstration trajectories and show human-like behavior of an agent in the Minecraft environment.
Published: 2022

20. Face Generation and Editing with StyleGAN: A Survey

Author: Melnik, Andrew, Miasayedzenkau, Maksim, Makarovets, Dzianis, Pirshtuk, Dzianis, Akbulut, Eren, Holzmann, Dennis, Renusch, Tarek, Reichert, Gustav, and Ritter, Helge
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Our goal with this survey is to provide an overview of the state-of-the-art deep learning methods for face generation and editing using StyleGAN. The survey covers the evolution of StyleGAN, from PGGAN to StyleGAN3, and explores relevant topics such as suitable metrics for training, different latent representations, GAN inversion to latent spaces of StyleGAN, face image editing, cross-domain face stylization, face restoration, and even Deepfake applications. We aim to provide an entry point into the field for readers who have basic knowledge of deep learning and are looking for an accessible introduction and overview.
Published: 2022

21. Planning with RL and episodic-memory behavioral priors

Author: Beohar, Shivansh and Melnik, Andrew
Subjects: Computer Science - Artificial Intelligence
Abstract: The practical application of learning agents requires sample-efficient and interpretable algorithms. Learning from behavioral priors is a promising way to bootstrap agents with a better-than-random exploration policy or a safeguard against the pitfalls of early learning. Existing solutions for imitation learning require a large number of expert demonstrations and rely on hard-to-interpret learning methods like Deep Q-learning. In this work we present a planning-based approach that can use these behavioral priors for effective exploration and learning in a reinforcement learning environment, and we demonstrate that curated exploration policies in the form of behavioral priors can help an agent learn faster. Comment: Published in ICRA 2022 BPRL Workshop
Published: 2022

22. Solving Learn-to-Race Autonomous Racing Challenge by Planning in Latent Space

Author: Beohar, Shivansh, Heinrich, Fabian, Kala, Rahul, Ritter, Helge, and Melnik, Andrew
Subjects: Computer Science - Artificial Intelligence
Abstract: The Learn-to-Race Autonomous Racing Virtual Challenge, hosted on the www.aicrowd.com platform, consisted of two tracks: Single and Multi Camera. Our UniTeam team was among the final winners in the Single Camera track. The agent is required to pass the previously unknown F1-style track in the minimum time with the least amount of off-road driving violations. In our approach, we used the U-Net architecture for road segmentation, a variational autoencoder for encoding a road binary mask, and a nearest-neighbor search strategy that selects the best action for a given state. Our agent achieved an average speed of 105 km/h on stage 1 (known track) and 73 km/h on stage 2 (unknown track) without any off-road driving violations. Here we present our solution and results. Comment: Published in SL4AD Workshop, ICML 2022
Published: 2022

23. Faces: AI Blitz XIII Solutions

Author: Melnik, Andrew, Akbulut, Eren, Sheikh, Jannik, Loos, Kira, Buettner, Michael, and Lenze, Tobias
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: The AI Blitz XIII Faces challenge, hosted on the www.aicrowd.com platform, consisted of five problems: Sentiment Classification, Age Prediction, Mask Prediction, Face Recognition, and Face De-Blurring. Our team GLaDOS took second place. Here we present our solutions and results. Code implementation: https://github.com/ndrwmlnk/ai-blitz-xiii
Published: 2022

24. Traffic4cast at NeurIPS 2021 -- Temporal and Spatial Few-Shot Transfer Learning in Gridded Geo-Spatial Processes

Author: Eichenberger, Christian, Neun, Moritz, Martin, Henry, Herruzo, Pedro, Spanring, Markus, Lu, Yichao, Choi, Sungbin, Konyakhin, Vsevolod, Lukashina, Nina, Shpilman, Aleksei, Wiedemann, Nina, Raubal, Martin, Wang, Bo, Vu, Hai L., Mohajerpoor, Reza, Cai, Chen, Kim, Inhi, Hermes, Luca, Melnik, Andrew, Velioglu, Riza, Vieth, Markus, Schilling, Malte, Bojesomo, Alabi, Marzouqi, Hasan Al, Liatsis, Panos, Santokhi, Jay, Hillier, Dylan, Yang, Yiming, Sarwar, Joned, Jordan, Anna, Hewage, Emil, Jonietz, David, Tang, Fei, Gruca, Aleksandra, Kopp, Michael, Kreil, David, and Hochreiter, Sepp
Subjects: Computer Science - Machine Learning
Abstract: The IARAI Traffic4cast competitions at NeurIPS 2019 and 2020 showed that neural networks can successfully predict future traffic conditions 1 hour into the future on simply aggregated GPS probe data in time and space bins. We thus reinterpreted the challenge of forecasting traffic conditions as a movie completion task. U-Nets proved to be the winning architecture, demonstrating an ability to extract relevant features in this complex real-world geo-spatial process. Building on the previous competitions, Traffic4cast 2021 now focuses on the question of model robustness and generalizability across time and space. Moving from one city to an entirely different city, or moving from pre-COVID times to times after COVID hit the world thus introduces a clear domain shift. We thus, for the first time, release data featuring such domain shifts. The competition now covers ten cities over 2 years, providing data compiled from over 10^12 GPS probe data. Winning solutions captured traffic dynamics sufficiently well to even cope with these complex domain shifts. Surprisingly, this seemed to require only the previous 1h traffic dynamic history and static road graph as input. Comment: Pre-print under review, submitted to Proceedings of Machine Learning Research
Published: 2022

25. A Graph-based U-Net Model for Predicting Traffic in unseen Cities

Author: Hermes, Luca, Hammer, Barbara, Melnik, Andrew, Velioglu, Riza, Vieth, Markus, and Schilling, Malte
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Accurate traffic prediction is a key ingredient to enable traffic management like rerouting cars to reduce road congestion or regulating traffic via dynamic speed limits to maintain a steady flow. A way to represent traffic data is in the form of temporally changing heatmaps visualizing attributes of traffic, such as speed and volume. In recent works, U-Net models have shown SOTA performance on traffic forecasting from heatmaps. We propose to combine the U-Net architecture with graph layers, which improves spatial generalization to unseen road networks compared to a vanilla U-Net. In particular, we specialize existing graph operations to be sensitive to geographical topology and generalize pooling and upsampling operations to be applicable to graphs. Comment: About to be published in IJCNN Proceedings 2022
Published: 2022

26. YOLO -- You only look 10647 times

Author: Limberg, Christian, Melnik, Andrew, Harter, Augustin, and Ritter, Helge
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: In this work, we explain the "You Only Look Once" (YOLO) single-stage object detection approach as a parallel classification of 10647 fixed region proposals. We support this view by showing that each of YOLO's output pixels is attentive to a specific sub-region of previous layers, comparable to a local region proposal. This understanding reduces the conceptual gap between YOLO-like single-stage object detection models, RCNN-like two-stage region proposal based models, and ResNet-like image classification models. In addition, we created interactive exploration tools for a better visual understanding of the YOLO information processing streams: https://limchr.github.io/yolo_visualization
Published: 2022

27. Critic Guided Segmentation of Rewarding Objects in First-Person Views

Author: Melnik, Andrew, Harter, Augustin, Limberg, Christian, Rana, Krishan, Suenderhauf, Niko, and Ritter, Helge
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: This work discusses a learning approach to mask rewarding objects in images using sparse reward signals from an imitation learning dataset. For that, we train an Hourglass network using only feedback from a critic model. The Hourglass network learns to produce a mask to decrease the critic's score of a high score image and increase the critic's score of a low score image by swapping the masked areas between these two images. We trained the model on an imitation learning dataset from the NeurIPS 2020 MineRL Competition Track, where our model learned to mask rewarding objects in a complex interactive 3D environment with a sparse reward signal. This approach was part of the 1st place winning solution in this competition. Video demonstration and code: https://rebrand.ly/critic-guided-segmentation
Published: 2021

28. Towards robust and domain agnostic reinforcement learning competitions

Author: Guss, William Hebgen, Milani, Stephanie, Topin, Nicholay, Houghton, Brandon, Mohanty, Sharada, Melnik, Andrew, Harter, Augustin, Buschmaas, Benoit, Jaster, Bjarne, Berganski, Christoph, Heitkamp, Dennis, Henning, Marko, Ritter, Helge, Wu, Chengjie, Hao, Xiaotian, Lu, Yiming, Mao, Hangyu, Mao, Yihuan, Wang, Chao, Opanowicz, Michal, Kanervisto, Anssi, Schraner, Yanick, Scheller, Christian, Zhou, Xiren, Liu, Lu, Nishio, Daichi, Tsuneda, Toi, Ramanauskas, Karolis, and Juceviciute, Gabija
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Neural and Evolutionary Computing, Computer Science - Robotics, Statistics - Machine Learning
Abstract: Reinforcement learning competitions have formed the basis for standard research benchmarks, galvanized advances in the state-of-the-art, and shaped the direction of the field. Despite this, a majority of challenges suffer from the same fundamental problems: participant solutions to the posed challenge are usually domain-specific, biased to maximally exploit compute resources, and not guaranteed to be reproducible. In this paper, we present a new framework of competition design that promotes the development of algorithms that overcome these barriers. We propose four central mechanisms for achieving this end: submission retraining, domain randomization, desemantization through domain obfuscation, and the limitation of competition compute and environment-sample budget. To demonstrate the efficacy of this design, we proposed, organized, and ran the MineRL 2020 Competition on Sample-Efficient Reinforcement Learning. In this work, we describe the organizational outcomes of the competition and show that the resulting participant submissions are reproducible, non-specific to the competition environment, and sample/resource efficient, despite the difficult competition task. Comment: 20 pages, several figures, published PMLR
Published: 2021

29. Solving Physics Puzzles by Reasoning about Paths

Author: Harter, Augustin, Melnik, Andrew, Kumar, Gaurav, Agarwal, Dhruv, Garg, Animesh, and Ritter, Helge
Subjects: Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Computer Science - Robotics
Abstract: We propose a new deep learning model for goal-driven tasks that require intuitive physical reasoning and intervention in the scene to achieve a desired end goal. Its modular structure is motivated by hypothesizing a sequence of intuitive steps that humans apply when trying to solve such a task. The model first predicts the path the target object would follow without intervention and the path the target object should follow in order to solve the task. Next, it predicts the desired path of the action object and generates the placement of the action object. All components of the model are trained jointly in a supervised way; each component receives its own learning signal but learning signals are also backpropagated through the entire architecture. To evaluate the model we use PHYRE, a benchmark test for goal-driven physical reasoning in 2D mechanics puzzles. Comment: 1st NeurIPS workshop on Interpretable Inductive Biases and Physically Structured Learning (2020)
Published: 2020

30. Modularization of End-to-End Learning: Case Study in Arcade Games

Author: Melnik, Andrew, Fleer, Sascha, Schilling, Malte, and Ritter, Helge
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
Abstract: Complex environments and tasks pose a difficult problem for holistic end-to-end learning approaches. Decomposition of an environment into interacting controllable and non-controllable objects allows supervised learning for non-controllable objects and universal value function approximator learning for controllable objects. Such decomposition should lead to a shorter learning time and better generalisation capability. Here, we consider arcade-game environments as sets of interacting objects (controllable, non-controllable) and propose a set of functional modules that are specialized on mastering different types of interactions in a broad range of environments. The modules utilize regression, supervised learning, and reinforcement learning algorithms. Results of this case study in different Atari games suggest that human-level performance can be achieved by a learning agent within a human amount of game experience (10-15 minutes game time) when a proper decomposition of an environment or a task is provided. However, automatization of such decomposition remains a challenging problem. This case study shows how a model of a causal structure underlying an environment or a task can benefit learning time and generalization capability of the agent, and argues in favor of exploiting modular structure in contrast to using pure end-to-end learning approaches.
Published: 2019

31. Transfer Learning with Jukebox for Music Source Separation

Author: El Amri, Wadhah Zai, Tautz, Oliver, Ritter, Helge, Melnik, Andrew, Rannenberg, Kai, Editor-in-Chief, Soares Barbosa, Luís, Editorial Board Member, Goedicke, Michael, Editorial Board Member, Tatnall, Arthur, Editorial Board Member, Neuhold, Erich J., Editorial Board Member, Stiller, Burkhard, Editorial Board Member, Tröltzsch, Fredi, Editorial Board Member, Pries-Heje, Jan, Editorial Board Member, Kreps, David, Editorial Board Member, Reis, Ricardo, Editorial Board Member, Furnell, Steven, Editorial Board Member, Mercier-Laurent, Eunika, Editorial Board Member, Winckler, Marco, Editorial Board Member, Malaka, Rainer, Editorial Board Member, Maglogiannis, Ilias, editor, Iliadis, Lazaros, editor, Macintyre, John, editor, and Cortez, Paulo, editor
Published: 2022
Full Text: View/download PDF

32. Learning to Run challenge solutions: Adapting reinforcement learning methods for neuromusculoskeletal environments

Author: Kidziński, Łukasz, Mohanty, Sharada Prasanna, Ong, Carmichael, Huang, Zhewei, Zhou, Shuchang, Pechenko, Anton, Stelmaszczyk, Adam, Jarosik, Piotr, Pavlov, Mikhail, Kolesnikov, Sergey, Plis, Sergey, Chen, Zhibo, Zhang, Zhizheng, Chen, Jiale, Shi, Jun, Zheng, Zhuobin, Yuan, Chun, Lin, Zhihui, Michalewski, Henryk, Miłoś, Piotr, Osiński, Błażej, Melnik, Andrew, Schilling, Malte, Ritter, Helge, Carroll, Sean, Hicks, Jennifer, Levine, Sergey, Salathé, Marcel, and Delp, Scott
Subjects: Computer Science - Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
Abstract: In the NIPS 2017 Learning to Run challenge, participants were tasked with building a controller for a musculoskeletal model to make it run as fast as possible through an obstacle course. Top participants were invited to describe their algorithms. In this work, we present eight solutions that used deep reinforcement learning approaches, based on algorithms such as Deep Deterministic Policy Gradient, Proximal Policy Optimization, and Trust Region Policy Optimization. Many solutions use similar relaxations and heuristics, such as reward shaping, frame skipping, discretization of the action space, symmetry, and policy blending. However, each of the eight teams implemented different modifications of the known algorithms. Comment: 27 pages, 17 figures
Published: 2018

33. Face Generation and Editing With StyleGAN: A Survey

Author: Melnik, Andrew, primary, Miasayedzenkau, Maksim, additional, Makaravets, Dzianis, additional, Pirshtuk, Dzianis, additional, Akbulut, Eren, additional, Holzmann, Dennis, additional, Renusch, Tarek, additional, Reichert, Gustav, additional, and Ritter, Helge, additional
Published: 2024
Full Text: View/download PDF

34. Zero-Shot Imitation Policy Via Search In Demonstration Dataset

Author: Malato, Federico, primary, Leopold, Florian, additional, Melnik, Andrew, additional, and Hautamäki, Ville, additional
Published: 2024
Full Text: View/download PDF

35. Learn to Move Through a Combination of Policy Gradient Algorithms: DDPG, D4PG, and TD3

Author: Bach, Nicolas, Melnik, Andrew, Schilling, Malte, Korthals, Timo, Ritter, Helge, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Nicosia, Giuseppe, editor, Ojha, Varun, editor, La Malfa, Emanuele, editor, Jansen, Giorgio, editor, Sciacca, Vincenzo, editor, Pardalos, Panos, editor, Giuffrida, Giovanni, editor, and Umeton, Renato, editor
Published: 2020
Full Text: View/download PDF

36. An Error-Based Addressing Architecture for Dynamic Model Learning

Author: Bach, Nicolas, Melnik, Andrew, Rosetto, Federico, Ritter, Helge, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Nicosia, Giuseppe, editor, Ojha, Varun, editor, La Malfa, Emanuele, editor, Jansen, Giorgio, editor, Sciacca, Vincenzo, editor, Pardalos, Panos, editor, Giuffrida, Giovanni, editor, and Umeton, Renato, editor
Published: 2020
Full Text: View/download PDF

37. Decentralized control and local information for robust and adaptive decentralized Deep Reinforcement Learning

Author: Schilling, Malte, Melnik, Andrew, Ohl, Frank W., Ritter, Helge J., and Hammer, Barbara
Published: 2021
Full Text: View/download PDF

38. Correction to: Transfer Learning with Jukebox for Music Source Separation

Author: Zai El Amri, Wadhah, primary, Tautz, Oliver, additional, Ritter, Helge, additional, and Melnik, Andrew, additional
Published: 2022
Full Text: View/download PDF

39. Transfer Learning with Jukebox for Music Source Separation

Author: Zai El Amri, Wadhah, primary, Tautz, Oliver, additional, Ritter, Helge, additional, and Melnik, Andrew, additional
Published: 2022
Full Text: View/download PDF

40. Exploring Unseen Environments with Robots using Large Language and Vision Models through a Procedurally Generated 3D Scene Representation

Author: S, Arjun P, Melnik, Andrew, and Nandi, Gora Chand
Abstract: Recent advancements in Generative Artificial Intelligence, particularly in the realm of Large Language Models (LLMs) and Large Vision Language Models (LVLMs), have enabled the prospect of leveraging cognitive planners within robotic systems. This work focuses on solving the object goal navigation problem by mimicking human cognition to attend, perceive and store task-specific information and generate plans with the same. We introduce a comprehensive framework capable of exploring an unfamiliar environment in search of an object by leveraging the capabilities of Large Language Models (LLMs) and Large Vision Language Models (LVLMs) in understanding the underlying semantics of our world. A challenging task in using LLMs to generate high-level sub-goals is to efficiently represent the environment around the robot. We propose to use a 3D scene modular representation, with semantically rich descriptions of the object, to provide the LLM with task-relevant information. But providing the LLM with a mass of contextual information (rich 3D scene semantic representation) can lead to redundant and inefficient plans. We propose to use an LLM-based pruner that leverages the capabilities of in-context learning to prune out irrelevant goal-specific information.
Published: 2024

41. An Approach to Hierarchical Deep Reinforcement Learning for a Decentralized Walking Control Architecture

Author: Schilling, Malte, Melnik, Andrew, Kacprzyk, Janusz, Series Editor, Pal, Nikhil R., Advisory Editor, Bello Perez, Rafael, Advisory Editor, Corchado, Emilio S., Advisory Editor, Hagras, Hani, Advisory Editor, Kóczy, László T., Advisory Editor, Kreinovich, Vladik, Advisory Editor, Lin, Chin-Teng, Advisory Editor, Lu, Jie, Advisory Editor, Melin, Patricia, Advisory Editor, Nedjah, Nadia, Advisory Editor, Nguyen, Ngoc Thanh, Advisory Editor, Wang, Jun, Advisory Editor, and Samsonovich, Alexei V., editor
Published: 2019
Full Text: View/download PDF

42. Critic Guided Segmentation of Rewarding Objects in First-Person Views

Author: Melnik, Andrew, primary, Harter, Augustin, additional, Limberg, Christian, additional, Rana, Krishan, additional, Sünderhauf, Niko, additional, and Ritter, Helge, additional
Published: 2021
Full Text: View/download PDF

43. Learn to Move Through a Combination of Policy Gradient Algorithms: DDPG, D4PG, and TD3

Author: Bach, Nicolas, primary, Melnik, Andrew, additional, Schilling, Malte, additional, Korthals, Timo, additional, and Ritter, Helge, additional
Published: 2020
Full Text: View/download PDF

44. An Approach to Hierarchical Deep Reinforcement Learning for a Decentralized Walking Control Architecture

Author: Schilling, Malte, primary and Melnik, Andrew, additional
Published: 2018
Full Text: View/download PDF

45. Learning to Run Challenge Solutions: Adapting Reinforcement Learning Methods for Neuromusculoskeletal Environments

Author: Kidziński, Łukasz, primary, Mohanty, Sharada Prasanna, additional, Ong, Carmichael F., additional, Huang, Zhewei, additional, Zhou, Shuchang, additional, Pechenko, Anton, additional, Stelmaszczyk, Adam, additional, Jarosik, Piotr, additional, Pavlov, Mikhail, additional, Kolesnikov, Sergey, additional, Plis, Sergey, additional, Chen, Zhibo, additional, Zhang, Zhizheng, additional, Chen, Jiale, additional, Shi, Jun, additional, Zheng, Zhuobin, additional, Yuan, Chun, additional, Lin, Zhihui, additional, Michalewski, Henryk, additional, Milos, Piotr, additional, Osinski, Blazej, additional, Melnik, Andrew, additional, Schilling, Malte, additional, Ritter, Helge, additional, Carroll, Sean F., additional, Hicks, Jennifer, additional, Levine, Sergey, additional, Salathé, Marcel, additional, and Delp, Scott, additional
Published: 2018
Full Text: View/download PDF

46. YOLO: You Only Look 10647 Times

Author: Limberg, Christian, primary, Melnik, Andrew, additional, Ritter, Helge, additional, and Prendinger, Helmut, additional
Published: 2023
Full Text: View/download PDF

47. Stroke-based Rendering: From Heuristics to Deep Learning

Author: Nolte, Florian, Melnik, Andrew, and Ritter, Helge
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Machine Learning (cs.LG)
Abstract: In the last few years, artistic image-making with deep learning models has gained a considerable amount of traction. A large number of these models operate directly in the pixel space and generate raster images. This is however not how most humans would produce artworks, for example, by planning a sequence of shapes and strokes to draw. Recent developments in deep learning methods help to bridge the gap between stroke-based paintings and pixel photo generation. With this survey, we aim to provide a structured introduction and understanding of common challenges and approaches in stroke-based rendering algorithms. These algorithms range from simple rule-based heuristics to stroke optimization and deep reinforcement agents, trained to paint images with differentiable vector graphics and neural rendering.
Published: 2023
Full Text: View/download PDF

48. A Graph-based U-Net Model for Predicting Traffic in unseen Cities

Author: Hermes, Luca, primary, Hammer, Barbara, additional, Melnik, Andrew, additional, Velioglu, Riza, additional, Vieth, Markus, additional, and Schilling, Malte, additional
Published: 2022
Full Text: View/download PDF

49. Critic Guided Segmentation of Rewarding Objects in First-Person Views

Author: Edelkamp, Stefan, Möller, Ralf, Rueckert, Elmar, Melnik, Andrew, Harter, Augustin, Limberg, Christian, Rana, Krishan, Sünderhauf, Niko, and Ritter, Helge
Abstract: This work discusses a learning approach to mask rewarding objects in images using sparse reward signals from an imitation learning dataset. For that we train an Hourglass network using only feedback from a critic model. The Hourglass network learns to produce a mask to decrease the critic’s score of a high score image and increase the critic’s score of a low score image by swapping the masked areas between these two images. We trained the model on an imitation learning dataset from the NeurIPS 2020 MineRL Competition Track, where our model learned to mask rewarding objects in a complex interactive 3D environment with a sparse reward signal. This approach was part of the 1st place winning solution in this competition. Video demonstration and code: https://rebrand.ly/critic-guided-segmentation.
Published: 2021

50. Using Tactile Sensing to Improve the Sample Efficiency and Performance of Deep Deterministic Policy Gradients for Simulated In-Hand Manipulation Tasks

Author: Melnik, Andrew, primary, Lach, Luca, additional, Plappert, Matthias, additional, Korthals, Timo, additional, Haschke, Robert, additional, and Ritter, Helge, additional
Published: 2021
Full Text: View/download PDF
