26 results on "Joseph J. Lim"
Search Results
2. Scaling simulation-to-real transfer by learning a latent space of robot skills
- Author
-
Zhanpeng He, Eric Heiden, Hejia Zhang, Joseph J. Lim, Ryan Julian, Gaurav S. Sukhatme, Stefan Schaal, and Karol Hausman
- Subjects
Artificial neural network, Computer science, business.industry, Applied Mathematics, Mechanical Engineering, Space (mathematics), Artificial Intelligence, Modeling and Simulation, Decomposition (computer science), Robot, Reinforcement learning, Artificial intelligence, Electrical and Electronic Engineering, Transfer of learning, business, Scaling, Feature learning, Software
- Abstract
We present a strategy for simulation-to-real transfer, which builds on recent advances in robot skill decomposition. Rather than focusing on minimizing the simulation–reality gap, we propose a method for increasing the sample efficiency and robustness of existing simulation-to-real approaches which exploits hierarchy and online adaptation. Instead of learning a unique policy for each desired robotic task, we learn a diverse set of skills and their variations, and embed those skill variations in a continuously parameterized space. We then interpolate, search, and plan in this space to find a transferable policy which solves more complex, high-level tasks by combining low-level skills and their variations. In this work, we first characterize the behavior of this learned skill space, by experimenting with several techniques for composing pre-learned latent skills. We then discuss an algorithm which allows our method to perform long-horizon tasks never seen in simulation, by intelligently sequencing short-horizon latent skills. Our algorithm adapts to unseen tasks online by repeatedly choosing new skills from the latent space, using live sensor data and simulation to predict which latent skill will perform best next in the real world. Importantly, our method learns to control a real robot in joint-space to achieve these high-level tasks with little or no on-robot time, despite the fact that the low-level policies may not be perfectly transferable from simulation to real, and that the low-level skills were not trained on any examples of high-level tasks. In addition to our results indicating a lower sample complexity for families of tasks, we believe that our method provides a promising template for combining learning-based methods with proven classical robotics algorithms such as model-predictive control.
- Published
- 2020
- Full Text
- View/download PDF
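To make the skill-sequencing idea in the abstract above concrete, here is a minimal Python sketch (not the authors' code) of choosing the next latent skill online: candidate latents are rolled out in simulation from the current state and the best-scoring one is executed on the robot. `simulate_rollout`, `execute_on_robot`, and `task_score` are hypothetical stand-ins.

```python
# Hypothetical sketch of online latent-skill sequencing: candidate latent skills
# are rolled out in a simulator from the robot's current state, and the skill
# whose predicted outcome scores best is executed on the real robot.
import numpy as np

def choose_next_skill(candidates, state, simulate_rollout, task_score):
    """Return the latent skill vector with the best predicted outcome."""
    scores = [task_score(simulate_rollout(state, z)) for z in candidates]
    return candidates[int(np.argmax(scores))]

def run_task(initial_state, latent_dim, horizon, simulate_rollout,
             execute_on_robot, task_score, n_candidates=32, seed=0):
    rng = np.random.default_rng(seed)
    state = initial_state
    for _ in range(horizon):
        # Sample candidate points from the continuous latent skill space.
        candidates = rng.normal(size=(n_candidates, latent_dim))
        z = choose_next_skill(candidates, state, simulate_rollout, task_score)
        state = execute_on_robot(state, z)  # live sensor data updates the state
    return state
```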
3. Policy Transfer across Visual and Dynamics Domain Gaps via Iterative Grounding
- Author
-
Grace Zhang, Linghan Zhong, Youngwoon Lee, and Joseph J. Lim
- Subjects
FOS: Computer and information sciences, Computer Science - Machine Learning, Policy transfer, Robot calibration, Computer science, Ground, Computer Science - Artificial Intelligence, Robot learning, Machine Learning (cs.LG), Task (project management), Domain (software engineering), Computer Science - Robotics, Artificial Intelligence (cs.AI), Human–computer interaction, Code (cryptography), Robot, Robotics (cs.RO)
- Abstract
The ability to transfer a policy from one environment to another is a promising avenue for efficient robot learning in realistic settings where task supervision is not available. This can allow us to take advantage of environments well suited for training, such as simulators or laboratories, to learn a policy for a real robot in a home or office. To succeed, such policy transfer must overcome both the visual domain gap (e.g. different illumination or background) and the dynamics domain gap (e.g. different robot calibration or modelling error) between source and target environments. However, prior policy transfer approaches either cannot handle a large domain gap or can only address one type of domain gap at a time. In this paper, we propose a novel policy transfer method with iterative "environment grounding", IDAPT, that alternates between (1) directly minimizing both visual and dynamics domain gaps by grounding the source environment in the target environment domains, and (2) training a policy on the grounded source environment. This iterative training progressively aligns the domains between the two environments and adapts the policy to the target environment. Once trained, the policy can be directly executed on the target environment. The empirical results on locomotion and robotic manipulation tasks demonstrate that our approach can effectively transfer a policy across visual and dynamics domain gaps with minimal supervision and interaction with the target environment. Videos and code are available at https://clvrai.com/idapt. (Robotics: Science and Systems (RSS), 2021)
- Published
- 2021
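A minimal sketch of the alternating loop described in the IDAPT abstract above. All helper functions are injected as hypothetical stand-ins rather than the released code.

```python
# Sketch of iterative grounding: (1) ground the source environment in the
# target's visual and dynamics domains, (2) train the policy on the grounded
# source environment, and repeat. Every helper here is a stand-in.
def idapt_loop(source_env, target_env, policy, fit_visual_translation,
               fit_action_transformation, ground, train_policy, n_iterations=3):
    for _ in range(n_iterations):
        # (1) Grounding: fit visual and dynamics corrections from a small
        # amount of target-environment interaction.
        visual_map = fit_visual_translation(source_env, target_env)
        dynamics_map = fit_action_transformation(source_env, target_env, policy)
        grounded_env = ground(source_env, visual_map, dynamics_map)
        # (2) Policy training on the grounded source environment.
        policy = train_policy(policy, grounded_env)
    return policy  # directly executable on the target environment
```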
4. IKEA Furniture Assembly Environment for Long-Horizon Complex Manipulation Tasks
- Author
-
Joseph J. Lim, Youngwoon Lee, and Edward S. Hu
- Subjects
Task (computing), Human–computer interaction, Computer science, business.industry, Control (management), Robot, Reinforcement learning, business, Robot learning, Automation, GeneralLiterature_MISCELLANEOUS, Domain (software engineering), Rendering (computer graphics)
- Abstract
The IKEA Furniture Assembly Environment is one of the first benchmarks for testing and accelerating the automation of long-horizon and hierarchical manipulation tasks. The environment is designed to advance reinforcement learning and imitation learning from simple toy tasks to complex tasks requiring both long-term planning and sophisticated low-level control. Our environment features 60 furniture models, 6 robots, photorealistic rendering, and domain randomization. We evaluate reinforcement learning and imitation learning methods on the proposed environment. Our experiments show furniture assembly is a challenging task due to its long horizon and sophisticated manipulation requirements, which provides ample opportunities for future research. The environment is publicly available at https://clvrai.com/furniture.
- Published
- 2021
- Full Text
- View/download PDF
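For flavor, here is a hypothetical gym-style interaction loop with such an environment. The benchmark's actual Python API is documented at https://clvrai.com/furniture and may differ; the stub class below is purely illustrative.

```python
# Hypothetical gym-style usage sketch; observation and action sizes are made up.
import numpy as np

class FurnitureEnvStub:
    """Stand-in for a long-horizon furniture-assembly environment."""
    def reset(self):
        return np.zeros(32)                      # observation
    def step(self, action):
        obs = np.random.randn(32)
        reward, done = 0.0, False                # sparse, long-horizon reward
        return obs, reward, done, {}

env = FurnitureEnvStub()
obs = env.reset()
for _ in range(100):                             # one short rollout
    action = np.random.uniform(-1, 1, size=8)    # placeholder policy
    obs, reward, done, info = env.step(action)
    if done:
        obs = env.reset()
```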
5. Qualitative and quantitative differences in endometrial inflammatory gene expression precede the development of bovine uterine disease
- Author
-
Kieran G. Meade, Joseph J. Lim, Amy Brewer, Aspinas Chapwanya, Paul Cormican, and Cliona O'Farrelly
- Subjects
0301 basic medicine, Interleukin-1beta, lcsh:Medicine, Cattle Diseases, Disease, Endometrium, Article, Transcriptome, Andrology, 03 medical and health sciences, Interleukin-1alpha, Gene expression, Immunogenetics, Animals, Medicine, lcsh:Science, Transcriptomics, Retrospective Studies, Uterine Diseases, Multidisciplinary, Sequence Analysis, RNA, Tumor Necrosis Factor-alpha, business.industry, Gene Expression Profiling, lcsh:R, Postpartum Period, 0402 animal and dairy science, 04 agricultural and veterinary sciences, medicine.disease, 040201 dairy & animal science, Gene expression profiling, 030104 developmental biology, medicine.anatomical_structure, Gene Expression Regulation, IL1A, Case-Control Studies, lcsh:Q, Cattle, Female, Tumor necrosis factor alpha, Endometritis, business
- Abstract
The transcriptome of the endometrium early postpartum was profiled to determine if inflammatory gene expression was elevated in cows which subsequently developed uterine disease. Endometrial cytobrush samples were collected at 7 days postpartum (DPP) from 112 Holstein–Friesian dairy cows, from which 27 were retrospectively chosen for RNA-seq on the basis of disease classification [ten healthy and an additional 17 diagnosed with cytological endometritis (CYTO) or purulent vaginal discharge (PVD)] at 21 DPP. 297 genes were significantly differentially expressed between cows that remained healthy versus those that subsequently developed PVD, including IL1A and IL1B (adjusted p < …) … IL1A, IL1B and TNFA. Despite the expected heterogeneity associated with natural infection, enhanced activation of the inflammatory response is likely a key contributory feature of both PVD and CYTO development. Prognostic biomarkers of uterine disease would be particularly valuable for seasonal-based dairy systems where any delay to conception undermines sustainability.
- Published
- 2020
- Full Text
- View/download PDF
6. Scaling Simulation-to-Real Transfer by Learning Composable Robot Skills
- Author
-
Joseph J. Lim, Ryan Julian, Stefan Schaal, Gaurav S. Sukhatme, Karol Hausman, Eric Heiden, Zhanpeng He, and Hejia Zhang
- Subjects
0209 industrial biotechnology, business.industry, Computer science, Control (management), Parameterized complexity, Robotics, 02 engineering and technology, Reuse, 020901 industrial engineering & automation, Human–computer interaction, 0202 electrical engineering, electronic engineering, information engineering, Decomposition (computer science), Embedding, Robot, 020201 artificial intelligence & image processing, Artificial intelligence, business, Set (psychology)
- Abstract
We present a novel solution to the problem of simulation-to-real transfer, which builds on recent advances in robot skill decomposition. Rather than focusing on minimizing the simulation-reality gap, we learn a set of diverse policies that are parameterized in a way that makes them easily reusable. This diversity and parameterization of low-level skills allows us to find a transferable policy that is able to use combinations and variations of different skills to solve more complex, high-level tasks. In particular, we first use simulation to jointly learn a policy for a set of low-level skills, and a “skill embedding” parameterization which can be used to compose them. Later, we learn high-level policies which actuate the low-level policies via this skill embedding parameterization. The high-level policies encode how and when to reuse the low-level skills together to achieve specific high-level tasks. Importantly, our method learns to control a real robot in joint-space to achieve these high-level tasks with little or no on-robot time, despite the fact that the low-level policies may not be perfectly transferable from simulation to real, and that the low-level skills were not trained on any examples of high-level tasks. We illustrate the principles of our method using informative simulation experiments. We then verify its usefulness for real robotics problems by learning, transferring, and composing free-space and contact motion skills on a Sawyer robot using only joint-space control. We experiment with several techniques for composing pre-learned skills, and find that our method allows us to use both learning-based approaches and efficient search-based planning to achieve high-level tasks using only pre-learned skills.
- Published
- 2020
- Full Text
- View/download PDF
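A compact PyTorch sketch of the two-level structure the abstract above describes: a low-level policy conditioned on a skill-embedding vector z, and a high-level policy that acts by emitting points in that embedding. Architectures and sizes are illustrative only, not the paper's.

```python
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, LATENT_DIM = 16, 7, 4

class LowLevelPolicy(nn.Module):
    """pi_low(a | s, z): executes a skill variation selected by z."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + LATENT_DIM, 64), nn.ReLU(),
            nn.Linear(64, ACTION_DIM))
    def forward(self, state, z):
        return self.net(torch.cat([state, z], dim=-1))

class HighLevelPolicy(nn.Module):
    """pi_high(z | s): composes skills by emitting latent skill vectors."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(),
            nn.Linear(64, LATENT_DIM))
    def forward(self, state):
        return self.net(state)

state = torch.randn(1, STATE_DIM)
z = HighLevelPolicy()(state)          # high level picks a latent skill
action = LowLevelPolicy()(state, z)   # low level turns it into joint commands
```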
7. 3D Interpreter Networks for Viewer-Centered Wireframe Modeling
- Author
-
Joseph J. Lim, Antonio Torralba, Tianfan Xue, Joshua B. Tenenbaum, William T. Freeman, Jiajun Wu, Yuandong Tian, Massachusetts Institute of Technology. Department of Brain and Cognitive Sciences, and Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
- Subjects
FOS: Computer and information sciences, Computer Science - Machine Learning, Computer science, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, 02 engineering and technology, 010501 environmental sciences, computer.software_genre, 01 natural sciences, Synthetic data, Machine Learning (cs.LG), Rendering (computer graphics), Artificial Intelligence, 0202 electrical engineering, electronic engineering, information engineering, Computer vision, Image retrieval, 0105 earth and related environmental sciences, Ground truth, business.industry, Object structure, 3d shapes, Real image, 020201 artificial intelligence & image processing, Computer Vision and Pattern Recognition, Artificial intelligence, business, computer, Software, Interpreter
- Abstract
Understanding 3D object structure from a single image is an important but challenging task in computer vision, mostly due to the lack of 3D object annotations for real images. Previous research tackled this problem by either searching for a 3D shape that best explains 2D annotations, or training purely on synthetic data with ground truth 3D information. In this work, we propose 3D INterpreter Networks (3D-INN), an end-to-end trainable framework that sequentially estimates 2D keypoint heatmaps and 3D object skeletons and poses. Our system learns from both 2D-annotated real images and synthetic 3D data. This is made possible mainly by two technical innovations. First, heatmaps of 2D keypoints serve as an intermediate representation to connect real and synthetic data. 3D-INN is trained on real images to estimate 2D keypoint heatmaps from an input image; it then predicts 3D object structure from heatmaps using knowledge learned from synthetic 3D shapes. By doing so, 3D-INN benefits from the variation and abundance of synthetic 3D objects, without suffering from the domain difference between real and synthesized images, often due to imperfect rendering. Second, we propose a Projection Layer, mapping estimated 3D structure back to 2D. During training, it ensures that 3D-INN predicts 3D structure whose projection is consistent with the 2D annotations of real images. Experiments show that the proposed system performs well on both 2D keypoint estimation and 3D structure recovery. We also demonstrate that the recovered 3D information has wide vision applications, such as image retrieval. (© 2018. NSF Robust Intelligence grant 1212849, NSF Big Data grant 1447476, NSF Robust Intelligence grant 1524817, ONR MURI grant N00014-16-1-2007.)
- Published
- 2018
- Full Text
- View/download PDF
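A toy version of the Projection Layer idea above: project predicted 3D keypoints to 2D and penalize disagreement with the 2D annotations. A scaled orthographic camera is assumed here purely for illustration; the paper's actual layer may differ.

```python
import torch

def projection_loss(kp3d, kp2d, scale=1.0):
    """kp3d: (N, 3) predicted 3D keypoints; kp2d: (N, 2) annotated 2D keypoints."""
    projected = scale * kp3d[:, :2]          # orthographic projection: drop depth
    return ((projected - kp2d) ** 2).mean()  # mean squared reprojection error

kp3d = torch.randn(12, 3, requires_grad=True)
kp2d = torch.randn(12, 2)
loss = projection_loss(kp3d, kp2d)
loss.backward()                              # gradients flow back to the 3D estimate
```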
8. Characterization of circulating plasma proteins in dairy cows with cytological endometritis
- Author
-
Amy Brewer, Paolo Nanni, Joseph J. Lim, Jonas Grossmann, Laura Kunz, André M. Almeida, Kieran G. Meade, John J. Callanan, Aspinas Chapwanya, Blake A. Miller, University of Zurich, and Chapwanya, Aspinas
- Subjects
0301 basic medicine, endometritis, 1303 Biochemistry, Proteome, Biophysics, Cattle Diseases, 610 Medicine & health, 10071 Functional Genomics Center Zurich, Biology, Proteomics, Biochemistry, Andrology, 03 medical and health sciences, Immune system, medicine, Animals, Lactation, dairy cows, Dairy cattle, Proteomic Profile, 030102 biochemistry & molecular biology, Postpartum Period, Blood Proteins, Cell Biology, Puerperal Disorders, medicine.disease, Blood proteins, Fold change, Dairying, 030104 developmental biology, 570 Life sciences, biology, Cattle, Female, Endometritis, Biomarkers, plasma proteins, 1304 Biophysics
- Abstract
Early diagnosis of endometritis in dairy cattle currently requires invasive techniques and specialist expertise. The goal of this study is to use a gel-free, mass-spectrometry-based proteomics approach to compare the plasma proteome of dairy cattle with cytological endometritis to that of cattle without. Blood samples were collected from cows (N = 112) seven days postpartum (DPP). Plasma samples from a cohort of 20 animals, with cytological endometritis (n = 10) and without (n = 10) as classified at 21 DPP, were selected for proteomic analysis. Differential abundances of proteins between the two animal groups were determined using both a fold-change threshold (≥1.5) and a statistical significance threshold (p < …). (Journal of Proteomics, 205, ISSN 1874-3919, ISSN 1876-7737)
- Published
- 2019
- Full Text
- View/download PDF
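An illustrative dual-threshold filter of the kind the abstract above describes. The significance cutoff is truncated in the record, so p < 0.05 is assumed here only for illustration, and the data are synthetic stand-ins.

```python
# Retain proteins that pass both a fold-change cutoff (>= 1.5 in either
# direction) and an assumed significance cutoff of p < 0.05.
import pandas as pd

proteins = pd.DataFrame({
    "protein": ["P1", "P2", "P3", "P4"],
    "fold_change": [2.1, 1.2, 1.8, 0.4],
    "p_value": [0.01, 0.03, 0.20, 0.004],
})
# max(fc, 1/fc) treats up- and down-regulation symmetrically.
effect = proteins["fold_change"].apply(lambda fc: max(fc, 1 / fc))
differential = proteins[(effect >= 1.5) & (proteins["p_value"] < 0.05)]
print(differential)   # P1 and P4 pass both thresholds
```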
9. High-fidelity facial and speech animation for VR HMDs
- Author
-
Joseph J. Lim, Kyle Olszewski, Shunsuke Saito, and Hao Li
- Subjects
Facial expression, Computer science, business.industry, Facial motion capture, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 020207 software engineering, 02 engineering and technology, Animation, Virtual reality, Computer Graphics and Computer-Aided Design, 0202 electrical engineering, electronic engineering, information engineering, Eye tracking, 020201 artificial intelligence & image processing, Emotional expression, Computer vision, Artificial intelligence, business, Computer animation, Computer facial animation, ComputingMethodologies_COMPUTERGRAPHICS
- Abstract
Significant challenges currently prohibit expressive interaction in virtual reality (VR). Occlusions introduced by head-mounted displays (HMDs) make existing facial tracking techniques intractable, and even state-of-the-art techniques used for real-time facial tracking in unconstrained environments fail to capture subtle details of the user's facial expressions that are essential for compelling speech animation. We introduce a novel system for HMD users to control a digital avatar in real-time while producing plausible speech animation and emotional expressions. Using a monocular camera attached to an HMD, we record multiple subjects performing various facial expressions and speaking several phonetically-balanced sentences. These images are used with artist-generated animation data corresponding to these sequences to train a convolutional neural network (CNN) to regress images of a user's mouth region to the parameters that control a digital avatar. To make training this system more tractable, we use audio-based alignment techniques to map images of multiple users making the same utterance to the corresponding animation parameters. We demonstrate that this approach is also feasible for tracking the expressions around the user's eye region with an internal infrared (IR) camera, thereby enabling full facial tracking. This system requires no user-specific calibration, uses easily obtainable consumer hardware, and produces high-quality animations of speech and emotional expressions. Finally, we demonstrate the quality of our system on a variety of subjects and evaluate its performance against state-of-the-art real-time facial tracking techniques.
- Published
- 2016
- Full Text
- View/download PDF
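A minimal PyTorch sketch of the regression step described above: a small CNN maps a mouth-region image to a vector of avatar animation parameters. The architecture and parameter count are illustrative, not the authors'.

```python
import torch
import torch.nn as nn

class MouthToParams(nn.Module):
    def __init__(self, n_params=30):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4))
        self.head = nn.Linear(32 * 4 * 4, n_params)
    def forward(self, x):                        # x: (B, 1, H, W) mouth image
        return self.head(self.features(x).flatten(1))

model = MouthToParams()
params = model(torch.randn(2, 1, 64, 64))        # (2, 30) animation parameters
loss = nn.functional.mse_loss(params, torch.randn(2, 30))  # regression target
```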
10. Learning and using the arrow of time
- Author
-
Joseph J. Lim, Andrew Zisserman, William T. Freeman, and Donglai Wei
- Subjects
Class (computer programming), business.industry, SIGNAL (programming language), 02 engineering and technology, 010501 environmental sciences, Machine learning, computer.software_genre, 01 natural sciences, Visualization, Arrow of time, 0202 electrical engineering, electronic engineering, information engineering, Task analysis, Codec, Action recognition, 020201 artificial intelligence & image processing, Artificial intelligence, business, computer, Sensory cue, 0105 earth and related environmental sciences
- Abstract
We seek to understand the arrow of time in videos – what makes videos look like they are playing forwards or backwards? Can we visualize the cues? Can the arrow of time be a supervisory signal useful for activity analysis? To this end, we build three large-scale video datasets and apply a learning-based approach to these tasks. To learn the arrow of time efficiently and reliably, we design a ConvNet suitable for extended temporal footprints and for class activation visualization, and study the effect of artificial cues, such as cinematographic conventions, on learning. Our trained model achieves state-of-the-art performance on large-scale real-world video datasets. Through cluster analysis and localization of important regions for the prediction, we examine learned visual cues that are consistent among many samples and show when and where they occur. Lastly, we use the trained ConvNet for two applications: self-supervision for action recognition, and video forensics – determining whether Hollywood film clips have been deliberately reversed in time, a manipulation often used for special effects.
- Published
- 2018
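The supervisory signal in the abstract above comes essentially for free. Below is a sketch of generating forward/backward training labels by reversing clips along the time axis; tensor shapes are illustrative.

```python
import torch

def make_arrow_of_time_batch(clips):
    """clips: (B, T, C, H, W) video clips, all playing forwards."""
    reversed_clips = torch.flip(clips, dims=[1])          # reverse the time axis
    x = torch.cat([clips, reversed_clips], dim=0)
    y = torch.cat([torch.ones(len(clips)), torch.zeros(len(clips))])  # 1 = forward
    perm = torch.randperm(len(x))                         # shuffle the batch
    return x[perm], y[perm]

x, y = make_arrow_of_time_batch(torch.randn(4, 8, 3, 32, 32))
print(x.shape, y.shape)   # torch.Size([8, 8, 3, 32, 32]) torch.Size([8])
```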
11. Demo2Vec: Reasoning Object Affordances from Online Videos
- Author
-
Silvio Savarese, Te-Lin Wu, Joseph J. Lim, Kuan Fang, and Daniel Yang
- Subjects
Computer science, business.industry, 02 engineering and technology, 010501 environmental sciences, Object (computer science), 01 natural sciences, Recurrent neural network, Action (philosophy), Feature (computer vision), Human–computer interaction, 0202 electrical engineering, electronic engineering, information engineering, Robot, 020201 artificial intelligence & image processing, Artificial intelligence, business, Affordance, 0105 earth and related environmental sciences
- Abstract
Watching expert demonstrations is an important way for humans and robots to reason about affordances of unseen objects. In this paper, we consider the problem of reasoning object affordances through the feature embedding of demonstration videos. We design the Demo2Vec model which learns to extract embedded vectors of demonstration videos and predicts the interaction region and the action label on a target image of the same object. We introduce the Online Product Review dataset for Affordance (OPRA) by collecting and labeling diverse YouTube product review videos. Our Demo2Vec model outperforms various recurrent neural network baselines on the collected dataset.
- Published
- 2018
- Full Text
- View/download PDF
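A skeletal PyTorch rendering of the structure described above: an RNN encoder embeds per-frame demonstration features, and two heads predict an action label and an interaction heatmap. Conditioning on the target image is omitted for brevity, and all sizes are illustrative.

```python
import torch
import torch.nn as nn

class Demo2VecSketch(nn.Module):
    def __init__(self, feat_dim=128, embed_dim=64, n_actions=7, heatmap_hw=28):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, embed_dim, batch_first=True)
        self.action_head = nn.Linear(embed_dim, n_actions)
        self.heatmap_head = nn.Linear(embed_dim, heatmap_hw * heatmap_hw)
        self.hw = heatmap_hw
    def forward(self, demo_feats):               # (B, T, feat_dim) video features
        _, h = self.encoder(demo_feats)
        z = h[-1]                                 # demonstration embedding
        action_logits = self.action_head(z)
        heatmap = self.heatmap_head(z).view(-1, self.hw, self.hw)
        return z, action_logits, heatmap

model = Demo2VecSketch()
z, logits, heatmap = model(torch.randn(2, 16, 128))
```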
12. Freedom from Tritrichomonas foetus infection in cattle in St. Kitts
- Author
-
Aspinas Chapwanya, Chaoqun Yao, Christopher Vosloo, Hilari M. French, Robert O. Gilbert, Rebecca L. Schleisman, Kimberly E. Coker, Juan C. Samper, John J. Callanan, Fortune Sithole, and Joseph J. Lim
- Subjects
0301 basic medicine, Male, Veterinary medicine, 030106 microbiology, St Kitts, Preputial gland, Cattle Diseases, Tritrichomonas foetus, Biology, 03 medical and health sciences, Food Animals, Pregnancy, Calving interval, medicine, Animals, Mating, Protozoan Infections, Animal, reproductive and urinary physiology, Fetus, Trichomoniasis, Protozoan Infections, medicine.disease, biology.organism_classification, Animal Science and Zoology, Cattle, Female, Saint Kitts and Nevis
- Abstract
Trichomonosis is an endemic disease in cattle that are reared under extensive conditions and bred by natural mating. It causes profound economic losses to producers by increasing calving intervals, increasing embryo losses, and decreasing pregnancy rates. The aim of this study was to determine whether Tritrichomonas foetus infections were absent from cattle in St. Kitts. Preputial samples from bulls (n = 78), with the sample size determined using the modified hypergeometric method, were tested with the InPouch™ culture for the presence of T. foetus. Results highlighted an absence of trichomoniasis in bulls on St. Kitts with 95% confidence.
- Published
- 2017
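The 95% confidence claim above can be illustrated with a simple binomial approximation (not the modified hypergeometric method itself): with n sampled bulls, design prevalence p, and test sensitivity se, the probability of detecting at least one positive, if infection is present, is 1 - (1 - p*se)^n. The prevalence and sensitivity values below are assumptions for illustration only.

```python
def detection_confidence(n, prevalence, sensitivity=1.0):
    """Probability of at least one positive among n samples, binomial approx."""
    return 1 - (1 - prevalence * sensitivity) ** n

def min_sample_size(prevalence, sensitivity=1.0, confidence=0.95):
    """Smallest n that reaches the target detection confidence."""
    n = 1
    while detection_confidence(n, prevalence, sensitivity) < confidence:
        n += 1
    return n

print(detection_confidence(78, prevalence=0.05))  # ~0.982 for 78 bulls at 5%
print(min_sample_size(0.05))                      # 59 samples for 95% confidence
```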
13. Unsupervised Visual-Linguistic Reference Resolution in Instructional Videos
- Author
-
Joseph J. Lim, Juan Carlos Niebles, De-An Huang, and Li Fei-Fei
- Subjects
FOS: Computer and information sciences, Referring expression, Computer science, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, 02 engineering and technology, 010501 environmental sciences, Pragmatics, Resolution (logic), 01 natural sciences, Linguistics, Visualization, Action (philosophy), 0202 electrical engineering, electronic engineering, information engineering, Key (cryptography), 020201 artificial intelligence & image processing, 0105 earth and related environmental sciences
- Abstract
We propose an unsupervised method for reference resolution in instructional videos, where the goal is to temporally link an entity (e.g., "dressing") to the action (e.g., "mix yogurt") that produced it. The key challenge is the inevitable visual-linguistic ambiguities arising from the changes in both visual appearance and referring expression of an entity in the video. This challenge is amplified by the fact that we aim to resolve references with no supervision. We address these challenges by learning a joint visual-linguistic model, where linguistic cues can help resolve visual ambiguities and vice versa. We verify our approach by training our model, without supervision, on more than two thousand unstructured cooking videos from YouTube, and show that our visual-linguistic model can substantially improve upon a state-of-the-art linguistic-only model on reference resolution in instructional videos. (CVPR 2017)
- Published
- 2017
14. Single Image 3D Interpreter Network
- Author
-
Joseph J. Lim, Joshua B. Tenenbaum, William T. Freeman, Tianfan Xue, Jiajun Wu, Yuandong Tian, and Antonio Torralba
- Subjects
Ground truth, Artificial neural network, business.industry, Computer science, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 02 engineering and technology, 010501 environmental sciences, Real image, computer.software_genre, Object (computer science), 01 natural sciences, Task (project management), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Computer vision, Artificial intelligence, Single image, business, computer, Interpreter, 0105 earth and related environmental sciences
- Abstract
Understanding 3D object structure from a single image is an important but difficult task in computer vision, mostly due to the lack of 3D object annotations in real images. Previous work tackles this problem by either solving an optimization task given 2D keypoint positions, or training on synthetic data with ground truth 3D information.
- Published
- 2016
- Full Text
- View/download PDF
15. Target-driven Visual Navigation in Indoor Scenes using Deep Reinforcement Learning
- Author
-
Joseph J. Lim, Eric Kolve, Abhinav Gupta, Ali Farhadi, Yuke Zhu, Li Fei-Fei, and Roozbeh Mottaghi
- Subjects
Feature engineering, FOS: Computer and information sciences, 0209 industrial biotechnology, Generalization, business.industry, media_common.quotation_subject, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, 02 engineering and technology, Trial and error, Machine learning, computer.software_genre, Visualization, 020901 industrial engineering & automation, 0202 electrical engineering, electronic engineering, information engineering, Reinforcement learning, Robot, 020201 artificial intelligence & image processing, Artificial intelligence, Physics engine, Function (engineering), business, computer, media_common
- Abstract
Two less-addressed issues of deep reinforcement learning are (1) lack of generalization capability to new target goals, and (2) data inefficiency, i.e., the model requires several (and often costly) episodes of trial and error to converge, which makes it impractical to apply to real-world scenarios. In this paper, we address these two issues and apply our model to the task of target-driven visual navigation. To address the first issue, we propose an actor-critic model whose policy is a function of the goal as well as the current state, which allows it to generalize better. To address the second issue, we propose the AI2-THOR framework, which provides an environment with high-quality 3D scenes and a physics engine. Our framework enables agents to take actions and interact with objects; hence, we can collect a huge number of training samples efficiently. We show that our proposed method (1) converges faster than the state-of-the-art deep reinforcement learning methods, (2) generalizes across targets and across scenes, (3) generalizes to a real robot scenario with a small amount of fine-tuning (although the model is trained in simulation), and (4) is end-to-end trainable and does not need feature engineering, feature matching between frames, or 3D reconstruction of the environment. The supplementary video can be accessed at the following link: https://youtu.be/SmBxMDiOrvs.
- Published
- 2016
- Full Text
- View/download PDF
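A compact PyTorch sketch of the goal-conditioned idea described above: policy and value are functions of both the current observation and the target (goal) observation, so new targets do not require retraining from scratch. The shared-trunk design and sizes are illustrative.

```python
import torch
import torch.nn as nn

class GoalConditionedActorCritic(nn.Module):
    def __init__(self, obs_dim=512, hidden=256, n_actions=4):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(2 * obs_dim, hidden), nn.ReLU())
        self.policy = nn.Linear(hidden, n_actions)    # actor: action logits
        self.value = nn.Linear(hidden, 1)             # critic: state value
    def forward(self, obs, goal):
        h = self.trunk(torch.cat([obs, goal], dim=-1))
        return self.policy(h), self.value(h)

model = GoalConditionedActorCritic()
logits, value = model(torch.randn(1, 512), torch.randn(1, 512))
action = torch.distributions.Categorical(logits=logits).sample()
```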
16. Physics 101: Learning Physical Object Properties from Unlabeled Videos
- Author
-
Joseph J. Lim, Joshua B. Tenenbaum, Hongyi Zhang, Jiajun Wu, and William T. Freeman
- Subjects
Information retrieval ,Computer science ,0202 electrical engineering, electronic engineering, information engineering ,020207 software engineering ,020201 artificial intelligence & image processing ,02 engineering and technology - Published
- 2016
- Full Text
- View/download PDF
17. Experimental evaluation of support vector machine-based and correlation-based approaches to automatic particle selection
- Author
-
Joseph J. Lim, Robert M. Glaeser, Jitendra Malik, Dieter Typke, Bong-Gyoon Han, and Pablo Arbeláez
- Subjects
Computer science, Fourier shell correlation, business.industry, Homogeneity (statistics), Cryoelectron Microscopy, Pattern recognition, Function (mathematics), Texture (music), computer.software_genre, Support vector machine, Set (abstract data type), Software, Structural Biology, False positive paradox, Artificial intelligence, Data mining, business, computer, Algorithms
- Abstract
The goal of this study is to evaluate the performance of software for automated particle-boxing, and in particular the performance of a new tool (TextonSVM) that recognizes the characteristic texture of particles of interest. As part of a high-throughput protocol, we use human editing that is based solely on class-average images to create final data sets that are enriched in what the investigator considers to be true-positive particles. The Fourier shell correlation (FSC) function is then used to characterize the homogeneity of different single-particle data sets that are derived from the same micrographs by two or more alternative methods. We find that the homogeneity is generally quite similar for class-edited data sets obtained by the texture-based method and by SIGNATURE, a cross-correlation-based method. The precision-recall characteristics of the texture-based method are, on the other hand, significantly better than those of the cross-correlation based method; that is to say, the texture-based approach produces a smaller fraction of false positives in the initial set of candidate particles. The computational efficiency of the two approaches is generally within a factor of two of one another. In situations when it is helpful to use a larger number of templates (exemplars), however, TextonSVM scales in a much more efficient way than do boxing programs that are based on localized cross-correlation.
- Published
- 2011
- Full Text
- View/download PDF
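An illustrative scikit-learn pipeline in the spirit of the comparison above: an SVM trained on texture feature vectors for particle vs. background windows, scored by precision and recall. The features and labels are synthetic stand-ins for real micrograph descriptors.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 48))               # 48-D texture descriptors per window
y = (X[:, :8].sum(axis=1) > 0).astype(int)   # 1 = particle, 0 = background
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = SVC(kernel="rbf").fit(X_train, y_train)
pred = clf.predict(X_test)
print("precision:", precision_score(y_test, pred))   # fraction of true positives
print("recall:   ", recall_score(y_test, pred))      # fraction of particles found
```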
18. Discovering states and transformations in image collections
- Author
-
Joseph J. Lim, Phillip Isola, Edward H. Adelson, Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology. Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science, Isola, Phillip John, Lim, Joseph Jaewhan, and Adelson, Edward H
- Subjects
Set (abstract data type), Meaning (philosophy of language), Theoretical computer science, Transformation (function), Computer science, Simple (abstract algebra), Object Class, Folding (DSP implementation), Variety (linguistics), Object (computer science), Image (mathematics)
- Abstract
Objects in visual scenes come in a rich variety of transformed states. A few classes of transformation have been heavily studied in computer vision: mostly simple, parametric changes in color and geometry. However, transformations in the physical world occur in many more flavors, and they come with semantic meaning: e.g., bending, folding, aging, etc. The transformations an object can undergo tell us about its physical and functional properties. In this paper, we introduce a dataset of objects, scenes, and materials, each of which is found in a variety of transformed states. Given a novel collection of images, we show how to explain the collection in terms of the states and transformations it depicts. Our system works by generalizing across object classes: states and transformations learned on one set of objects are used to interpret the image collection for an entirely new object class.
- Published
- 2015
- Full Text
- View/download PDF
19. Looking Beyond the Visible Scene
- Author
-
Byoungkwon An, Joseph J. Lim, Antonio Torralba, and Aditya Khosla
- Subjects
Pixel, Computer science, business.industry, Perception, media_common.quotation_subject, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Scene statistics, Computer vision, Artificial intelligence, business, Sensory cue, ComputingMethodologies_COMPUTERGRAPHICS, media_common
- Abstract
A common thread that ties together many prior works in scene understanding is their focus on the aspects directly present in a scene such as its categorical classification or the set of objects. In this work, we propose to look beyond the visible elements of a scene; we demonstrate that a scene is not just a collection of objects and their configuration or the labels assigned to its pixels - it is so much more. From a simple observation of a scene, we can tell a lot about the environment surrounding the scene such as the potential establishments near it, the potential crime rate in the area, or even the economic climate. Here, we explore several of these aspects from both the human perception and computer vision perspective. Specifically, we show that it is possible to predict the distance of surrounding establishments such as McDonald's or hospitals even by using scenes located far from them. We go a step further to show that both humans and computers perform well at navigating the environment based only on visual cues from scenes. Lastly, we show that it is possible to predict the crime rates in an area simply by looking at a scene without any real-time criminal activity. Simply put, here, we illustrate that it is possible to look beyond the visible scene.
- Published
- 2014
- Full Text
- View/download PDF
20. FPM: Fine Pose Parts-Based Model with 3D CAD Models
- Author
-
Aditya Khosla, Joseph J. Lim, and Antonio Torralba
- Subjects
Computer science, business.industry, Leverage (statistics), CAD, Computer vision, Artificial intelligence, Real image, business, 3D pose estimation, Pose, Object detection
- Abstract
We introduce a novel approach to the problem of localizing objects in an image and estimating their fine pose. Given exact CAD models and a few real training images with aligned models, we propose to leverage the geometric information from CAD models and appearance information from real images to learn a model that can accurately estimate fine pose in real images. Specifically, we propose FPM, a fine pose parts-based model, that combines geometric information in the form of shared 3D parts in deformable part based models, and appearance information in the form of objectness, to achieve both fast and accurate fine pose estimation. Our method significantly outperforms current state-of-the-art algorithms in both accuracy and speed.
- Published
- 2014
- Full Text
- View/download PDF
21. Parsing IKEA Objects: Fine Pose Estimation
- Author
-
Hamed Pirsiavash, Joseph J. Lim, Antonio Torralba, Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science, Lim, Joseph Jaewhan, Pirsiavash, Hamed, and Torralba, Antonio
- Subjects
Parsing, business.industry, Computer science, Pattern recognition, computer.software_genre, 3D pose estimation, Articulated body pose estimation, Object detection, Set (abstract data type), Computer vision, Artificial intelligence, Focus (optics), business, computer, Pose
- Abstract
We address the problem of localizing and estimating the fine pose of objects in the image with exact 3D models. Our main focus is to unify contributions from the 1970s with recent advances in object detection: use local keypoint detectors to find candidate poses and score global alignment of each candidate pose to the image. Moreover, we also provide a new dataset containing fine-aligned objects with their exactly matched 3D models, and a set of models for widely used objects. We also evaluate our algorithm both on object detection and fine pose estimation, and show that our method outperforms state-of-the-art algorithms. (Funding: United States Office of Naval Research, Multidisciplinary University Research Initiative N000141010933.)
- Published
- 2013
22. Sketch Tokens: A Learned Mid-level Representation for Contour and Object Detection
- Author
-
Joseph J. Lim, C. Lawrence Zitnick, and Piotr Dollár
- Subjects
Contextual image classification, Computer science, business.industry, Feature extraction, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Pattern recognition, Object detection, Sketch, Random forest, Object-class detection, Histogram, Viola–Jones object detection framework, Computer vision, Artificial intelligence, business, Feature detection (computer vision)
- Abstract
We propose a novel approach to both learning and detecting local contour-based representations for mid-level features. Our features, called sketch tokens, are learned using supervised mid-level information in the form of hand drawn contours in images. Patches of human generated contours are clustered to form sketch token classes and a random forest classifier is used for efficient detection in novel images. We demonstrate our approach on both top-down and bottom-up tasks. We show state-of-the-art results on the top-down task of contour detection while being over 200x faster than competing methods. We also achieve large improvements in detection accuracy for the bottom-up tasks of pedestrian and object detection as measured on INRIA and PASCAL, respectively. These gains are due to the complementary information provided by sketch tokens to low-level features such as gradient histograms.
- Published
- 2013
- Full Text
- View/download PDF
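A toy reconstruction of the two-stage recipe above: cluster patches of hand-drawn contours into "sketch token" classes, then train a random forest to predict the token class of an image patch from per-patch features. The data here are synthetic placeholders; real inputs would be contour patches and low-level image features.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
contour_patches = rng.normal(size=(500, 35 * 35))    # flattened contour patches
tokens = KMeans(n_clusters=16, n_init=10, random_state=0).fit(contour_patches)

patch_features = rng.normal(size=(500, 64))          # per-patch image features
forest = RandomForestClassifier(n_estimators=50, random_state=0)
forest.fit(patch_features, tokens.labels_)           # detect tokens in new images
probs = forest.predict_proba(rng.normal(size=(10, 64)))  # (10, 16) token scores
```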
23. Exploiting hierarchical context on a large database of object categories
- Author
-
Joseph J. Lim, Myung Jin Choi, Antonio Torralba, and Alan S. Willsky
- Subjects
Context model, Interpretation (logic), Computer science, business.industry, Feature extraction, Object model, Pattern recognition, Context (language use), Image segmentation, Artificial intelligence, Object (computer science), business, Object detection
- Abstract
There has been a growing interest in exploiting contextual information in addition to local features to detect and localize multiple object categories in an image. Context models can efficiently rule out some unlikely combinations or locations of objects and guide detectors to produce a semantically coherent interpretation of a scene. However, the performance benefit from using context models has been limited because most of these methods were tested on datasets with only a few object categories, in which most images contain only one or two object categories. In this paper, we introduce a new dataset with images that contain many instances of different object categories and propose an efficient model that captures the contextual information among more than a hundred object categories. We show that our context model can be applied to scene understanding tasks that local detectors alone cannot solve.
- Published
- 2010
- Full Text
- View/download PDF
24. Context by region ancestry
- Author
-
Pablo Arbeláez, Joseph J. Lim, Jitendra Malik, and Chunhui Gu
- Subjects
Context model, Contextual image classification, Computer science, business.industry, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Context (language use), Pattern recognition, Image segmentation, Set (abstract data type), Tree (data structure), Discriminative model, Segmentation, Artificial intelligence, business
- Abstract
In this paper, we introduce a new approach for modeling visual context. For this purpose, we consider the leaves of a hierarchical segmentation tree as elementary units. Each leaf is described by features of its ancestral set, the regions on the path linking the leaf to the root. We construct region trees by using a high-performance segmentation method. We then learn the importance of different descriptors (e.g. color, texture, shape) of the ancestors for classification. We report competitive results on the MSRC segmentation dataset and the MIT scene dataset, showing that region ancestry efficiently encodes information about discriminative parts, objects and scenes.
- Published
- 2009
- Full Text
- View/download PDF
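A small sketch of the ancestry idea above: represent each leaf of a segmentation tree by the descriptors gathered along the path from the leaf up to the root. The tree and descriptor values are hypothetical placeholders.

```python
def ancestral_features(leaf, parent, descriptor):
    """Concatenate descriptors of `leaf` and all of its ancestors."""
    features, node = [], leaf
    while node is not None:
        features.extend(descriptor[node])   # e.g. color/texture/shape features
        node = parent.get(node)             # walk one level up the tree
    return features

parent = {"leaf": "region", "region": "root", "root": None}
descriptor = {"leaf": [0.1, 0.9], "region": [0.4, 0.2], "root": [0.7, 0.3]}
print(ancestral_features("leaf", parent, descriptor))  # leaf + region + root
```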
25. Recognition using regions
- Author
-
Joseph J. Lim, Pablo Arbeláez, Jitendra Malik, and Chunhui Gu
- Subjects
Caltech 101, Contextual image classification, business.industry, Computer science, Feature extraction, Pattern recognition, Image segmentation, Object detection, Histogram, Segmentation, Computer vision, Artificial intelligence, Face detection, business, Classifier (UML)
- Abstract
This paper presents a unified framework for object detection, segmentation, and classification using regions. Region features are appealing in this context because: (1) they encode shape and scale information of objects naturally; (2) they are only mildly affected by background clutter. Regions have not been popular as features due to their sensitivity to segmentation errors. In this paper, we start by producing a robust bag of overlaid regions for each image using the method of Arbeláez et al. (CVPR 2009). Each region is represented by a rich set of image cues (shape, color and texture). We then learn region weights using a max-margin framework. In detection and segmentation, we apply a generalized Hough voting scheme to generate hypotheses of object locations, scales and support, followed by a verification classifier and a constrained segmenter on each hypothesis. The proposed approach significantly outperforms the state of the art on the ETHZ shape database (87.1% average detection rate compared to Ferrari et al.'s 67.2%), and achieves competitive performance on the Caltech 101 database.
- Published
- 2009
- Full Text
- View/download PDF
26. A Distributed Message Passing Algorithm for Sensor Localization
- Author
-
Joseph J. Lim and Max Welling
- Subjects
Brooks–Iyengar algorithm, Basis (linear algebra), Mean squared error, Computer science, Distributed algorithm, Expectation propagation, Message passing, Wireless sensor network, Algorithm
- Abstract
We propose a fully distributed message passing algorithm based on expectation propagation for the purpose of sensor localization. Sensors perform noisy measurements of their mutual distances and their relative angles. These measurements form the basis for an iterative, local (i.e. distributed) algorithm to compute the sensors' locations, including uncertainties for these estimates. This approach offers a distributed, computationally efficient and flexible framework for information fusion in sensor networks.
- Published
- 2007
- Full Text
- View/download PDF
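A simplified, fully local relaxation in the spirit of the algorithm above: each sensor repeatedly nudges its position estimate so that distances to its neighbors match the noisy measured distances. This gradient-style update is a stand-in for the paper's expectation-propagation messages, not the actual algorithm, and it omits the angle measurements and uncertainty estimates.

```python
import numpy as np

def localize(estimates, edges, measured, n_iters=200, step=0.1):
    """estimates: {node: 2-D position}; measured: {(i, j): distance}."""
    for _ in range(n_iters):
        for (i, j) in edges:
            diff = estimates[i] - estimates[j]
            dist = np.linalg.norm(diff) + 1e-9
            # Move both endpoints to shrink the residual between estimated
            # and measured distance; each node uses only local information.
            correction = step * (dist - measured[(i, j)]) * diff / dist
            estimates[i] -= correction / 2
            estimates[j] += correction / 2
    return estimates

est = {a: np.random.randn(2) for a in "ABC"}
edges = [("A", "B"), ("B", "C"), ("A", "C")]
measured = {e: 1.0 for e in edges}   # noisy pairwise distance measurements
print(localize(est, edges, measured))
```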