Author: "Quanshi Zhang" / Topic: artificial intelligence - Searchworks@Jio Institute Digital Library Search Results

1. Mining Interpretable AOG Representations From Convolutional Networks via Active Question Answering

Author: Quanshi Zhang, Ruiming Cao, Jie Ren, Ying Nian Wu, Ge Huang, and Song-Chun Zhu
Subjects: Artificial neural network, business.industry, Computer science, Applied Mathematics, Pattern recognition, Convolutional neural network, Visualization, Computational Theory and Mathematics, Artificial Intelligence, Question answering, Graph (abstract data type), Computer Vision and Pattern Recognition, Artificial intelligence, business, Software
Abstract: In this paper, we present a method to mine object-part patterns from conv-layers of a pre-trained convolutional neural network (CNN). The mined object-part patterns are organized by an And-Or graph (AOG). This interpretable AOG representation consists of a four-layer semantic hierarchy, i.e., semantic parts, part templates, latent patterns, and neural units. The AOG associates each object part with certain neural units in feature maps of conv-layers. The AOG is constructed with very few annotations (e.g., 3–20) of object parts. We develop a question-answering (QA) method that uses active human-computer communications to mine patterns from a pre-trained CNN, in order to explain features in conv-layers incrementally. During the learning process, our QA method uses the current AOG for part localization. The QA method actively identifies objects, whose feature maps cannot be explained by the AOG. Then, our method asks people to annotate parts on the unexplained objects, and uses answers to discover CNN patterns corresponding to newly labeled parts. In this way, our method gradually grows new branches and refines existing branches on the AOG to semanticize CNN representations. In experiments, our method exhibited a high learning efficiency. Our method used about $1/6$–$1/3$ of the part annotations for training, but achieved similar or better part-localization performance than fast-RCNN methods.
Published: 2021
Full Text: View/download PDF

2. Interpretable CNNs for Object Classification

Author: Quanshi Zhang, Ying Nian Wu, Xin Wang, Song-Chun Zhu, and Huilin Zhou
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Computer science, Computer Vision and Pattern Recognition (cs.CV), Feature extraction, Computer Science - Computer Vision and Pattern Recognition, Machine Learning (stat.ML), 02 engineering and technology, Convolutional neural network, Machine Learning (cs.LG), Statistics - Machine Learning, Artificial Intelligence, 0202 electrical engineering, electronic engineering, information engineering, Training set, Artificial neural network, business.industry, Applied Mathematics, Pattern recognition, Object (computer science), Visualization, Computational Theory and Mathematics, Filter (video), Benchmark (computing), 020201 artificial intelligence & image processing, Computer Vision and Pattern Recognition, Artificial intelligence, business, Software
Abstract: This paper proposes a generic method to learn interpretable convolutional filters in a deep convolutional neural network (CNN) for object classification, where each interpretable filter encodes features of a specific object part. Our method does not require additional annotations of object parts or textures for supervision. Instead, we use the same training data as traditional CNNs. Our method automatically assigns each interpretable filter in a high conv-layer with an object part of a certain category during the learning process. Such explicit knowledge representations in conv-layers of the CNN help people clarify the logic encoded in the CNN, i.e., answering what patterns the CNN extracts from an input image and uses for prediction. We have tested our method using different benchmark CNNs with various architectures to demonstrate the broad applicability of our method. Experiments have shown that our interpretable filters are much more semantically meaningful than traditional filters.
Published: 2021
Full Text: View/download PDF

3. Quantifying the Knowledge in a DNN to Explain Knowledge Distillation for Classification

Author: Quanshi Zhang, Xu Cheng, Yilan Chen, and Zhefan Rao
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computational Theory and Mathematics, Computer Science - Artificial Intelligence, Artificial Intelligence, Computer Vision and Pattern Recognition (cs.CV), Applied Mathematics, Computer Science - Computer Vision and Pattern Recognition, Computer Vision and Pattern Recognition, Software, Machine Learning (cs.LG)
Abstract: Compared to traditional learning from scratch, knowledge distillation sometimes makes the DNN achieve superior performance. In this paper, we provide a new perspective to explain the success of knowledge distillation based on the information theory, i.e. quantifying knowledge points encoded in intermediate layers of a DNN for classification. To this end, we consider the signal processing in a DNN as a layer-wise process of discarding information. A knowledge point is referred to as an input unit, the information of which is discarded much less than that of other input units. Thus, we propose three hypotheses for knowledge distillation based on the quantification of knowledge points. 1. The DNN learning from knowledge distillation encodes more knowledge points than the DNN learning from scratch. 2. Knowledge distillation makes the DNN more likely to learn different knowledge points simultaneously. In comparison, the DNN learning from scratch tends to encode various knowledge points sequentially. 3. The DNN learning from knowledge distillation is often more stably optimized than the DNN learning from scratch. To verify the above hypotheses, we design three types of metrics with annotations of foreground objects to analyze feature representations of the DNN, i.e. the quantity and the quality of knowledge points, the learning speed of different knowledge points, and the stability of optimization directions. In experiments, we diagnosed various DNNs on different classification tasks, including image classification, 3D point cloud classification, binary sentiment classification, and question answering, which verified the above hypotheses.
Published: 2022

4. Interpretable Compositional Convolutional Neural Networks

Author: Jiaqi Fan, Shikun Huang, Zhihua Wei, Binbin Zhang, Quanshi Zhang, Ping Zhao, and Wen Shen
Subjects: FOS: Computer and information sciences, Computer science, business.industry, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Pattern recognition, Filter (signal processing), ENCODE, Object (computer science), Convolutional neural network, Image (mathematics), Visual patterns, Core (graph theory), Artificial intelligence, business, Interpretability
Abstract: The reasonable definition of semantic interpretability presents the core challenge in explainable AI. This paper proposes a method to modify a traditional convolutional neural network (CNN) into an interpretable compositional CNN, in order to learn filters that encode meaningful visual patterns in intermediate convolutional layers. In a compositional CNN, each filter is supposed to consistently represent a specific compositional object part or image region with a clear meaning. The compositional CNN learns from image labels for classification without any annotations of parts or regions for supervision. Our method can be broadly applied to different types of CNNs. Experiments have demonstrated the effectiveness of our method. Comment: IJCAI2021
Published: 2021
Full Text: View/download PDF

5. Reports of the Workshops Held at the 2019 AAAI Conference on Artificial Intelligence

Author: Guy Barash, Mauricio Castillo-Effen, Niyati Chhaya, Peter Clark, Huáscar Espinoza, Eitan Farchi, Christopher Geib, Odd Erik Gundersen, Seán Ó hÉigeartaigh, José Hernández-Orallo, Chiori Hori, Xiaowei Huang, Kokil Jaidka, Pavan Kapanipathi, Sarah Keren, Seokhwan Kim, Marc Lanctot, Danny Lange, Julian McAuley, David Martinez, Marwan Mattar, Mausam, Martin Michalowski, Reuth Mirsky, Roozbeh Mottaghi, Joseph Osborn, Julien Perolat, Martin Schmid, Arash Shaban-Nejad, Onn Shehory, Biplav Srivastava, William Streilein, Kartik Talamadupula, Julian Togelius, Koichiro Yoshino, Quanshi Zhang, and Imed Zitouni
Subjects: Computer science, business.industry, Deep learning, Robotics, Plan (drawing), Recommender system, computer.software_genre, Knowledge extraction, Artificial Intelligence, Reinforcement learning, Artificial intelligence, Dialog system, business, computer, Agile software development
Abstract: The workshop program of the Association for the Advancement of Artificial Intelligence’s 33rd Conference on Artificial Intelligence (AAAI-19) was held in Honolulu, Hawaii, on Sunday and Monday, January 27–28, 2019. There were fifteen workshops in the program: Affective Content Analysis: Modeling Affect-in-Action, Agile Robotics for Industrial Automation Competition, Artificial Intelligence for Cyber Security, Artificial Intelligence Safety, Dialog System Technology Challenge, Engineering Dependable and Secure Machine Learning Systems, Games and Simulations for Artificial Intelligence, Health Intelligence, Knowledge Extraction from Games, Network Interpretability for Deep Learning, Plan, Activity, and Intent Recognition, Reasoning and Learning for Human-Machine Dialogues, Reasoning for Complex Question Answering, Recommender Systems Meet Natural Language Processing, Reinforcement Learning in Games, and Reproducible AI. This report contains brief summaries of all the workshops that were held.
Published: 2019
Full Text: View/download PDF

6. Verifiability and Predictability: Interpreting Utilities of Network Architectures for Point Cloud Processing

Author: Ping Zhao, Shikun Huang, Binbin Zhang, Zhihua Wei, Wen Shen, Panyue Chen, and Quanshi Zhang
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Network architecture, Computer science, business.industry, Computer Vision and Pattern Recognition (cs.CV), Deep learning, Computer Science - Computer Vision and Pattern Recognition, Machine Learning (stat.ML), Machine learning, computer.software_genre, Machine Learning (cs.LG), Statistics - Machine Learning, Order (exchange), Robustness (computer science), Code (cryptography), Artificial intelligence, Predictability, business, Representation (mathematics), computer, Rotation (mathematics)
Abstract: In this paper, we diagnose deep neural networks for 3D point cloud processing to explore utilities of different intermediate-layer network architectures. We propose a number of hypotheses on the effects of specific intermediate-layer network architectures on the representation capacity of DNNs. In order to prove the hypotheses, we design five metrics to diagnose various types of DNNs from the following perspectives: information discarding, information concentration, rotation robustness, adversarial robustness, and neighborhood inconsistency. We conduct comparative studies based on such metrics to verify the hypotheses. We further use the verified hypotheses to revise intermediate-layer architectures of existing DNNs and improve their utilities. Experiments demonstrate the effectiveness of our method. The code will be released when this paper is accepted.
Published: 2021
Full Text: View/download PDF

7. Explaining Knowledge Distillation by Quantifying the Knowledge

Author: Quanshi Zhang, Xu Cheng, Zhefan Rao, and Yilan Chen
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial neural network, business.industry, Computer science, Computer Vision and Pattern Recognition (cs.CV), Knowledge engineering, Computer Science - Computer Vision and Pattern Recognition, Machine Learning (stat.ML), 02 engineering and technology, 010501 environmental sciences, 01 natural sciences, Machine Learning (cs.LG), law.invention, Visualization, Statistics - Machine Learning, law, 0202 electrical engineering, electronic engineering, information engineering, Task analysis, Entropy (information theory), 020201 artificial intelligence & image processing, Artificial intelligence, business, Distillation, 0105 earth and related environmental sciences
Abstract: This paper presents a method to interpret the success of knowledge distillation by quantifying and analyzing task-relevant and task-irrelevant visual concepts that are encoded in intermediate layers of a deep neural network (DNN). More specifically, three hypotheses are proposed as follows. 1. Knowledge distillation makes the DNN learn more visual concepts than learning from raw data. 2. Knowledge distillation ensures that the DNN is prone to learning various visual concepts simultaneously, whereas in the scenario of learning from raw data, the DNN learns visual concepts sequentially. 3. Knowledge distillation yields more stable optimization directions than learning from raw data. Accordingly, we design three types of mathematical metrics to evaluate feature representations of the DNN. In experiments, we diagnosed various DNNs, and the above hypotheses were verified.
Published: 2020
Full Text: View/download PDF

8. Extraction of an Explanatory Graph to Interpret a CNN

Author: Song-Chun Zhu, Ruiming Cao, Feng Shi, Ying Nian Wu, Quanshi Zhang, and Xin Wang
Subjects: Artificial neural network, Computer science, business.industry, Applied Mathematics, Feature extraction, Pattern recognition, 02 engineering and technology, Convolutional neural network, Graph, Visualization, Computational Theory and Mathematics, Artificial Intelligence, 0202 electrical engineering, electronic engineering, information engineering, Task analysis, Effective method, Graph (abstract data type), 020201 artificial intelligence & image processing, Computer Vision and Pattern Recognition, Graphical model, Artificial intelligence, business, Software
Abstract: This paper introduces an explanatory graph representation to reveal object parts encoded inside convolutional layers of a CNN. Given a pre-trained CNN, each filter in a conv-layer usually represents a mixture of object parts. We develop a simple yet effective method to learn an explanatory graph, which automatically disentangles object parts from each filter without any part annotations. Specifically, given the feature map of a filter, we mine neural activations from the feature map, which correspond to different object parts. The explanatory graph is constructed to organize each mined part as a graph node. Each edge connects two nodes, whose corresponding object parts usually co-activate and keep a stable spatial relationship. Experiments show that each graph node consistently represented the same object part through different images, which boosted the transferability of CNN features. The explanatory graph transferred features of object parts to the task of part localization, and our method significantly outperformed other approaches.
Published: 2020

9. Prediction and Simulation of Human Mobility Following Natural Disasters

Author: Quanshi Zhang, Yoshihide Sekimoto, Xing Xie, Ryosuke Shibasaki, Xuan Song, and Nicholas Jing Yuan
Subjects: education.field_of_study, Mobility model, Operations research, Emergency management, business.industry, Computer science, Population, 02 engineering and technology, Flow network, Theoretical Computer Science, Risk analysis (engineering), Artificial Intelligence, 020204 information systems, Urban computing, 0202 electrical engineering, electronic engineering, information engineering, Global Positioning System, 020201 artificial intelligence & image processing, Unavailability, business, education, Natural disaster
Abstract: In recent decades, the frequency and intensity of natural disasters have increased significantly, and this trend is expected to continue. Therefore, understanding and predicting human behavior and mobility during a disaster will play a vital role in planning effective humanitarian relief, disaster management, and long-term societal reconstruction. However, such research is very difficult to perform owing to the uniqueness of various disasters and the unavailability of reliable and large-scale human mobility data. In this study, we collect big and heterogeneous data (e.g., GPS records of 1.6 million users over 3 years, data on earthquakes that have occurred in Japan over 4 years, news report data, and transportation network data) to study human mobility following natural disasters. An empirical analysis is conducted to explore the basic laws governing human mobility following disasters, and an effective human mobility model is developed to predict and simulate population movements. The experimental results demonstrate the efficiency of our model, and they suggest that human mobility following disasters can be significantly more predictable and be more easily simulated than previously thought.
Published: 2016
Full Text: View/download PDF

10. Object Discovery: Soft Attributed Graph Mining

Author: Ryosuke Shibasaki, Xiaowei Shao, Huijing Zhao, Quanshi Zhang, and Xuan Song
Subjects: Matching (graph theory), Computer science, Applied Mathematics, Graph theory, 02 engineering and technology, computer.software_genre, Fuzzy logic, Graph, Data modeling, Visualization, Computational Theory and Mathematics, Categorization, Artificial Intelligence, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, Unsupervised learning, Graph (abstract data type), 020201 artificial intelligence & image processing, Computer Vision and Pattern Recognition, Data mining, Pattern matching, computer, Software
Abstract: We categorize this research in terms of its contribution to both graph theory and computer vision. From the theoretical perspective, this study can be considered as the first attempt to formulate the idea of mining maximal frequent subgraphs in the challenging domain of messy visual data, and as a conceptual extension to the unsupervised learning of graph matching. We define a soft attributed pattern (SAP) to represent the common subgraph pattern among a set of attributed relational graphs (ARGs), considering both their structure and attributes. Regarding the differences between ARGs with fuzzy attributes and conventional labeled graphs, we propose a new mining strategy that directly extracts the SAP with the maximal graph size without applying node enumeration. Given an initial graph template and a number of ARGs, we develop an unsupervised method to modify the graph template into the maximal-size SAP. From a practical perspective, this research develops a general platform for learning the category model (i.e., the SAP) from cluttered visual data (i.e., the ARGs) without labeling "what is where," thereby opening the possibility for a series of applications in the era of big visual data. Experiments demonstrate the superior performance of the proposed method on RGB/RGB-D images and videos.
Published: 2016
Full Text: View/download PDF

11. Explaining Neural Networks Semantically and Quantitatively

Author: Jie Ren, Ge Huang, Hao Chen, Runjin Chen, and Quanshi Zhang
Subjects: FOS: Computer and information sciences, Interpretation (logic), Artificial neural network, Computer science, business.industry, Computer Vision and Pattern Recognition (cs.CV), Feature extraction, Computer Science - Computer Vision and Pattern Recognition, 010501 environmental sciences, 01 natural sciences, Convolutional neural network, 030218 nuclear medicine & medical imaging, 03 medical and health sciences, 0302 clinical medicine, Artificial intelligence, business, 0105 earth and related environmental sciences
Abstract: This paper presents a method to explain the knowledge encoded in a convolutional neural network (CNN) quantitatively and semantically. The analysis of the specific rationale of each prediction made by the CNN presents a key issue of understanding neural networks, but it is also of significant practical values in certain applications. In this study, we propose to distill knowledge from the CNN into an explainable additive model, so that we can use the explainable model to provide a quantitative explanation for the CNN prediction. We analyze the typical bias-interpreting problem of the explainable model and develop prior losses to guide the learning of the explainable additive model. Experimental results have demonstrated the effectiveness of our method.
Published: 2018

12. Interpretable Convolutional Neural Networks

Author: Ying Nian Wu, Song-Chun Zhu, and Quanshi Zhang
Subjects: FOS: Computer and information sciences, Training set, Artificial neural network, Knowledge representation and reasoning, Computer science, business.industry, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, 02 engineering and technology, Convolutional neural network, Visualization, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, Task analysis, Entropy (information theory), 020201 artificial intelligence & image processing, Artificial intelligence, business
Abstract: This paper proposes a method to modify traditional convolutional neural networks (CNNs) into interpretable CNNs, in order to clarify knowledge representations in high conv-layers of CNNs. In an interpretable CNN, each filter in a high conv-layer represents a certain object part. We do not need any annotations of object parts or textures to supervise the learning process. Instead, the interpretable CNN automatically assigns each filter in a high conv-layer with an object part during the learning process. Our method can be applied to different types of CNNs with different structures. The clear knowledge representation in an interpretable CNN can help people understand the logic inside a CNN, i.e., based on which patterns the CNN makes the decision. Experiments showed that filters in an interpretable CNN were more semantically meaningful than those in traditional CNNs. Comment: In this version, we release the website of the code. Compared to the previous version, we have corrected all values of location instability in Tables 3–6 by dividing the values by sqrt(2), i.e., a=a/sqrt(2). Such revisions do NOT decrease the significance of the superior performance of our method, because we make the same correction to location-instability values of all baselines.
Published: 2018
Full Text: View/download PDF

13. From RGB-D Images to RGB Images

Author: Ryosuke Shibasaki, Xiaowei Shao, Xuan Song, Quanshi Zhang, and Huijing Zhao
Subjects: Scale (ratio), Computer science, business.industry, Supervised learning, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Pattern recognition, Object (computer science), Object detection, Field (computer science), Theoretical Computer Science, Artificial Intelligence, Scalability, RGB color model, Computer vision, Artificial intelligence, Transfer of learning, business
Abstract: Mining object-level knowledge, that is, building a comprehensive category model base, from a large set of cluttered scenes presents a considerable challenge to the field of artificial intelligence. How to initiate model learning with the least human supervision (i.e., manual labeling) and how to encode the structural knowledge are two elements of this challenge, as they largely determine the scalability and applicability of any solution. In this article, we propose a model-learning method that starts from a single-labeled object for each category, and mines further model knowledge from a number of informally captured, cluttered scenes. However, in these scenes, target objects are relatively small and have large variations in texture, scale, and rotation. Thus, to reduce the model bias normally associated with less supervised learning methods, we use the robust 3D shape in RGB-D images to guide our model learning, then apply the properly trained category models to both object detection and recognition in more conventional RGB images. In addition to model training for their own categories, the knowledge extracted from the RGB-D images can also be transferred to guide model learning for a new category, in which only RGB images without depth information in the new category are provided for training. Preliminary testing shows that the proposed method performs as well as fully supervised learning methods.
Published: 2015
Full Text: View/download PDF

14. Interpreting CNNs via Decision Trees

Author: Yu Yang, Ying Nian Wu, Haotian Ma, and Quanshi Zhang
Subjects: FOS: Computer and information sciences, Computer science, business.industry, Deep learning, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Decision tree, 02 engineering and technology, Visual reasoning, 010501 environmental sciences, Object (computer science), 01 natural sciences, Convolutional neural network, 0202 electrical engineering, electronic engineering, information engineering, Feature (machine learning), 020201 artificial intelligence & image processing, Artificial intelligence, business, Feature learning, 0105 earth and related environmental sciences
Abstract: This paper aims to quantitatively explain the rationales of each prediction that is made by a pre-trained convolutional neural network (CNN). We propose to learn a decision tree, which clarifies the specific reason for each prediction made by the CNN at the semantic level. I.e., the decision tree decomposes feature representations in high conv-layers of the CNN into elementary concepts of object parts. In this way, the decision tree tells people which object parts activate which filters for the prediction and how much each object part contributes to the prediction score. Such semantic and quantitative explanations for CNN predictions have specific values beyond the traditional pixel-level analysis of CNNs. More specifically, our method mines all potential decision modes of the CNN, where each mode represents a typical case of how the CNN uses object parts for prediction. The decision tree organizes all potential decision modes in a coarse-to-fine manner to explain CNN predictions at different fine-grained levels. Experiments have demonstrated the effectiveness of the proposed method.
Published: 2018
Full Text: View/download PDF

15. Mining Object Parts from CNNs via Active Question-Answering

Author: Ruiming Cao, Quanshi Zhang, Ying Nian Wu, and Song-Chun Zhu
Subjects: FOS: Computer and information sciences, business.industry, Computer science, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Cognitive neuroscience of visual object recognition, 020207 software engineering, 02 engineering and technology, Machine learning, computer.software_genre, Convolutional neural network, Object detection, Visualization, 0202 electrical engineering, electronic engineering, information engineering, Question answering, Graph (abstract data type), 020201 artificial intelligence & image processing, Artificial intelligence, business, computer
Abstract: Given a convolutional neural network (CNN) that is pre-trained for object classification, this paper proposes to use active question-answering to semanticize neural patterns in conv-layers of the CNN and mine part concepts. For each part concept, we mine neural patterns in the pre-trained CNN, which are related to the target part, and use these patterns to construct an And-Or graph (AOG) to represent a four-layer semantic hierarchy of the part. As an interpretable model, the AOG associates different CNN units with different explicit object parts. We use an active human-computer communication to incrementally grow such an AOG on the pre-trained CNN as follows. We allow the computer to actively identify objects, whose neural patterns cannot be explained by the current AOG. Then, the computer asks a human about the unexplained objects, and uses the answers to automatically discover certain CNN patterns corresponding to the missing knowledge. We incrementally grow the AOG to encode new knowledge discovered during the active-learning process. In experiments, our method exhibits high learning efficiency. Our method uses about 1/6-1/3 of the part annotations for training, but achieves similar or better part-localization performance than fast-RCNN methods. Comment: Published in CVPR 2017
Published: 2017
Full Text: View/download PDF

16. Intelligent System for Human Behavior Analysis and Reasoning Following Large-Scale Disasters

Author: Yoshihide Sekimoto, Xuan Song, Teerayut Horanont, Ryosuke Shibasaki, Satoshi Ueyama, and Quanshi Zhang
Subjects: education.field_of_study, Emergency management, ComputingMethodologies_SIMULATIONANDMODELING, Computer Networks and Communications, Computer science, business.industry, Scale (chemistry), Population, Computer security, computer.software_genre, Behavioral analysis, Fukushima daiichi, Artificial Intelligence, Global Positioning System, education, business, computer
Abstract: By mining "big GPS records" of 1.6 million users, an intelligent system automatically discovers, analyzes, and simulates population evacuations during the Great East Japan Earthquake and the Fukushima Daiichi nuclear accident.
Published: 2013
Full Text: View/download PDF

17. A novel dynamic model for multiple pedestrians tracking in extremely crowded scenarios

Author: Xiaowei Shao, Quanshi Zhang, Huijing Zhao, Hongbin Zha, Ryosuke Shibasaki, and Xuan Song
Subjects: Tracking model, Subway station, business.industry, Computer science, Novelty, High density, Hardware and Architecture, Signal Processing, Fully automatic, Multi target tracking, Computer vision, Artificial intelligence, business, Merge (version control), Software, Information Systems
Abstract: Tracking hundreds of persons in large, high-density scenarios is a particularly challenging task due to frequent occlusions and merged measurements. In such circumstances, a stronger dynamic model for prediction usually plays a more important role in the overall tracking process. In this paper, we propose an elaborate dynamic model for multiple pedestrians tracking in extremely crowded environments. The novelty of this tracking model is that the global semantic scene structure, the local instantaneous crowd flow, and the social interactions among persons are taken into account together and combined into a unified approach, which can make the prediction of persons' motion more powerful and accurate. We apply the proposed model by using an online "tracking-learning" framework, which not only performs robust tracking in extremely crowded scenarios, but also ensures that the entire process is fully automatic and online. The testing is conducted at the JR subway station in Tokyo, and the experimental results show that the system with our tracking model can robustly track more than 180 targets at the same time while occlusions and merges/splits frequently occur.
Published: 2013
Full Text: View/download PDF

18. A fully online and unsupervised system for large and high-density area surveillance

Author: Jinshi Cui, Huijing Zhao, Xiaowei Shao, Hongbin Zha, Quanshi Zhang, Xuan Song, and Ryosuke Shibasaki
Subjects: Structure (mathematical logic), Subway station, Cover (telecommunications), Computer science, business.industry, High density, Tracking (particle physics), Theoretical Computer Science, Artificial Intelligence, Feature (computer vision), Key (cryptography), Computer vision, Artificial intelligence, business, Abnormality detection
Abstract: For reasons of public security, an intelligent surveillance system that can cover a large, crowded public area has become an urgent need. In this article, we propose a novel laser-based system that can simultaneously perform tracking, semantic scene learning, and abnormality detection in a fully online and unsupervised way. Furthermore, these three tasks cooperate with each other in one framework to improve their respective performances. The proposed system has the following key advantages over previous ones: (1) It can cover quite a large area (more than 60×35m), and simultaneously perform robust tracking, semantic scene learning, and abnormality detection in a high-density situation. (2) The overall system can vary with time, incrementally learn the structure of the scene, and perform fully online abnormal activity detection and tracking. This feature makes our system suitable for real-time applications. (3) The surveillance tasks are carried out in a fully unsupervised manner, so that there is no need for manual labeling and the construction of huge training datasets. We successfully apply the proposed system to the JR subway station in Tokyo, and demonstrate that it can cover an area of 60×35m, robustly track more than 150 targets at the same time, and simultaneously perform online semantic scene learning and abnormality detection with no human intervention.
Published: 2013
Full Text: View/download PDF

19. Mining Deep And-Or Object Structures via Cost-Sensitive Question-Answer-Based Active Annotations

Author: Hao Zhang, Ying Nian Wu, Song-Chun Zhu, and Quanshi Zhang
Subjects: FOS: Computer and information sciences, Computer science, business.industry, Computer Vision and Pattern Recognition (cs.CV), Cost sensitive, Computer Science - Computer Vision and Pattern Recognition, 020207 software engineering, 02 engineering and technology, Viewpoints, Machine learning, computer.software_genre, Discriminative model, Signal Processing, Model learning, 0202 electrical engineering, electronic engineering, information engineering, Graph (abstract data type), 020201 artificial intelligence & image processing, Computer Vision and Pattern Recognition, Artificial intelligence, Question answer, business, computer, Software, Generative grammar
Abstract: This paper presents a cost-sensitive active Question-Answering (QA) framework for learning a nine-layer And-Or graph (AOG) from web images. The AOG explicitly represents object categories, poses/viewpoints, parts, and detailed structures within the parts in a compositional hierarchy. The QA framework is designed to minimize an overall risk, which trades off the loss and query costs. The loss is defined for nodes in all layers of the AOG, including the generative loss (measuring the likelihood of the images) and the discriminative loss (measuring the fitness to human answers). The cost comprises both the human labor of answering questions and the computational cost of model learning. The cost-sensitive QA framework iteratively selects different storylines of questions to update different nodes in the AOG. Experiments showed that our method required much less human supervision (e.g. labeling parts on 3–10 training objects for each category) and achieved better performance than baseline methods.
Published: 2017
Full Text: View/download PDF

20. Unsupervised skeleton extraction and motion capture from 3D deformable matching

Author: Huijing Zhao, Quanshi Zhang, Xiaowei Shao, Xuan Song, and Ryosuke Shibasaki
Subjects: Markov random field, Matching (graph theory), business.industry, Computer science, Cognitive Neuroscience, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Point cloud, Skeleton (category theory), Object (computer science), Motion capture, Computer Science Applications, Artificial Intelligence, Feature (computer vision), Point (geometry), Computer vision, Artificial intelligence, business, ComputingMethodologies_COMPUTERGRAPHICS
Abstract: This paper presents a novel method to extract skeletons of complex articulated objects from 3D point cloud sequences collected by the Kinect. Our approach is more robust than traditional video-based and stereo-based approaches, as the Kinect directly provides 3D information without any markers, 2D-to-3D-transition assumptions, or feature point extraction. We track all the raw 3D points on the object and use the point trajectories to determine the object skeleton. Point tracking is achieved by 3D non-rigid matching based on the Markov Random Field (MRF) Deformation Model. To reduce the large computational cost of the non-rigid matching, a coarse-to-fine procedure is proposed. To the best of our knowledge, this is the first method to extract skeletons of highly deformable objects from 3D point cloud sequences by point tracking. Experiments demonstrate our method's good performance, and the extracted skeletons are successfully applied to motion capture.
Published: 2013
Full Text: View/download PDF

21. When 3D Reconstruction Meets Ubiquitous RGB-D Images

Author: Quanshi Zhang, Huijing Zhao, Xiaowei Shao, Xuan Song, and Ryosuke Shibasaki
Subjects: Structure (mathematical logic), Set (abstract data type), business.industry, Computer science, 3D reconstruction, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, RGB color model, Computer vision, Artificial intelligence, business, Object (computer science), Rotation (mathematics), ComputingMethodologies_COMPUTERGRAPHICS
Abstract: 3D reconstruction from a single image is a classical problem in computer vision. However, it still poses great challenges for the reconstruction of daily-use objects with irregular shapes. In this paper, we propose to learn 3D reconstruction knowledge from informally captured RGB-D images, which will probably be ubiquitously used in daily life. The learning of 3D reconstruction is defined as a category modeling problem, in which a model for each category is trained to encode category-specific knowledge for 3D reconstruction. The category model estimates the pixel-level 3D structure of an object from its 2D appearance, by taking into account considerable variations in rotation, 3D structure, and texture. Learning 3D reconstruction from ubiquitous RGB-D images creates a new set of challenges. Experimental results have demonstrated the effectiveness of the proposed approach.
Published: 2014
Full Text: View/download PDF

22. Attributed Graph Mining and Matching: An Attempt to Define and Extract Soft Attributed Patterns

Author: Huijing Zhao, Ryosuke Shibasaki, Quanshi Zhang, Xiaowei Shao, and Xuan Song
Subjects: Factor-critical graph, business.industry, Voltage graph, Pattern recognition, Strength of a graph, Butterfly graph, Simplex graph, law.invention, law, Line graph, Artificial intelligence, Null graph, Graph property, business, Computer Science::Databases, MathematicsofComputing_DISCRETEMATHEMATICS, Mathematics
Abstract: Graph matching and graph mining are two typical areas in artificial intelligence. In this paper, we define the soft attributed pattern (SAP) to describe the common subgraph pattern among a set of attributed relational graphs (ARGs), considering both the graphical structure and graph attributes. We propose a direct solution to extract the SAP with the maximal graph size without node enumeration. Given an initial graph template and a number of ARGs, we modify the graph template into the maximal SAP among the ARGs in an unsupervised fashion. The maximal SAP extraction is equivalent to learning a graphical model (i.e. an object model) from large ARGs (i.e. cluttered RGB/RGB-D images) for graph matching, which extends the concept of "unsupervised learning for graph matching." Furthermore, this study can also be regarded as the first known approach to formulating "maximal graph mining" in the graph domain of ARGs. Our method exhibits superior performance on RGB and RGB-D images.
Published: 2014
Full Text: View/download PDF

23. Start from minimum labeling: Learning of 3D object models and point labeling from a large and complex environment

Author: Ryosuke Shibasaki, Huijing Zhao, Quanshi Zhang, Xiaowei Shao, and Xuan Song
Subjects: business.industry, Computer science, Point (geometry), Artificial intelligence, business, Object (computer science)
Published: 2014
Full Text: View/download PDF

24. Learning Graph Matching: Oriented to Category Modeling from Cluttered Scenes

Author: Huijing Zhao, Ryosuke Shibasaki, Xiaowei Shao, Quanshi Zhang, and Xuan Song
Subjects: Matching (graph theory), business.industry, Image matching, Graph theory, Machine learning, computer.software_genre, 3-dimensional matching, Model learning, Graph (abstract data type), Artificial intelligence, Graphical model, business, computer, Mathematics
Abstract: Although graph matching is a fundamental problem in pattern recognition, and has drawn broad interest from many fields, the problem of learning graph matching has not received much attention. In this paper, we redefine the learning of graph matching as a model learning problem. In addition to conventional training of matching parameters, our approach modifies the graph structure and attributes to generate a graphical model. In this way, the model learning is oriented toward both matching and recognition performance, and can proceed in an unsupervised fashion. Experiments demonstrate that our approach outperforms conventional methods for learning graph matching.
Published: 2013
Full Text: View/download PDF

25. Category Modeling from Just a Single Labeling: Use Depth Information to Guide the Learning of 2D Models

Author: Xiaowei Shao, Xuan Song, Ryosuke Shibasaki, Huijing Zhao, and Quanshi Zhang
Subjects: Computer science, business.industry, Feature extraction, Pattern recognition, Object (computer science), Object detection, Object-class detection, Feature (computer vision), RGB color model, Object model, Computer vision, Viola–Jones object detection framework, Graphical model, Artificial intelligence, business
Abstract: An object model base that covers a large number of object categories is of great value for many computer vision tasks. As artifacts are usually designed to have various textures, their structure is the primary distinguishing feature between different categories. Thus, how to encode this structural information and how to start the model learning with a minimum of human labeling become two key challenges for the construction of the model base. We design a graphical model that uses object edges to represent object structures, and this paper aims to incrementally learn this category model from one labeled object and a number of casually captured scenes. However, the incremental model learning may be biased due to the limited human labeling. Therefore, we propose a new strategy that uses the depth information in RGB-D images to guide the model learning for object detection in ordinary RGB images. In experiments, the proposed method achieves performance as good as that of supervised methods that require the labeling of all target objects.
Published: 2013
Full Text: View/download PDF

26. Laser-based intelligent surveillance and abnormality detection in extremely crowded scenarios

Author: Huijing Zhao, Quanshi Zhang, Xiaowei Shao, Ryosuke Shibasaki, Xuan Song, and Hongbin Zha
Subjects: Computer science, business.industry, Robustness (computer science), ComputerApplications_COMPUTERSINOTHERSYSTEMS, Computer vision, Artificial intelligence, business, Object detection, Abnormality detection
Abstract: Abnormal activity detection plays a crucial role in surveillance applications, and a surveillance system that can perform robustly in extremely crowded areas has become an urgent need for public security. In this paper, we propose a novel laser-based system that can simultaneously perform tracking, semantic scene learning, and abnormality detection in large and crowded environments. In our system, a novel abnormality detection model is proposed, which considers and combines various factors that influence human activity. Moreover, this model intensively investigates the relationship between pedestrians' social behaviors and their walking scenarios. We successfully applied the proposed system to the JR subway station in Tokyo, where it can cover a 60×35m area, robustly track more than 180 targets at the same time, and simultaneously perform online semantic scene learning and abnormality detection with no human intervention.
Published: 2012
Full Text: View/download PDF

27. Moving object classification using horizontal laser scan data

Author: Jinshi Cui, Ryosuke Shibasaki, Quanshi Zhang, Huijing Zhao, Hongbin Zha, and Masaki Chiba
Subjects: Laser scanning, business.industry, Computer science, Property (programming), Feature extraction, Simultaneous localization and mapping, Object (computer science), Object detection, Lidar, Feature (computer vision), Global Positioning System, Computer vision, Artificial intelligence, business
Abstract: Motivated by two potential applications, i.e. enhancing driving safety and traffic data collection, a system has been developed using a single-layer horizontal laser scanner as the major sensor for both localization and perception of the surroundings in a large dynamic urban environment. This research focuses on a classification method that, given a stream of laser measurements, classifies each moving object as a person, a group of people, a bicycle, or a car. A number of features are defined after examining the properties of the data appearance. A classification method is proposed after examining the likelihood measures between each pair of feature and class. Experimental results are presented, demonstrating that the algorithm is efficient with respect to both driving safety and traffic data collection in highly dynamic environments.
Published: 2009
Full Text: View/download PDF

27 results on '"Quanshi Zhang"'
