6 results for "Wang, James Z."
Search Results
2. ARBEE: Towards Automated Recognition of Bodily Expression of Emotion in the Wild
- Author
Luo, Yu, Ye, Jianbo, Adams, Jr., Reginald B., Li, Jia, Newman, Michelle G., and Wang, James Z.
- Published
- 2020
- Full Text
- View/download PDF
3. PaDNet: Pan-Density Crowd Counting.
- Author
Tian, Yukun, Lei, Yiming, Zhang, Junping, and Wang, James Z.
- Subjects
Computer vision; Crowds; Density; Counting
- Abstract
Crowd counting is a highly challenging problem in computer vision and machine learning. Most previous methods have focused on crowds of consistent density, i.e., either sparse or dense, and therefore performed well in global estimation while neglecting local accuracy. To make crowd counting more useful in the real world, we propose a new perspective, named pan-density crowd counting, which aims to count people in crowds of varying density. Specifically, we propose the Pan-Density Network (PaDNet), which is composed of the following critical components. First, the Density-Aware Network (DAN) contains multiple subnetworks pretrained on scenarios with different densities; this module is capable of capturing pan-density information. Second, the Feature Enhancement Layer (FEL) effectively captures the global and local contextual features and generates a weight for each density-specific feature. Third, the Feature Fusion Network (FFN) embeds spatial context and fuses these density-specific features. Further, the metrics Patch MAE (PMAE) and Patch RMSE (PRMSE) are proposed to better evaluate performance on global and local estimation. Extensive experiments on four crowd counting benchmark datasets (ShanghaiTech, UCF_CC_50, UCSD, and UCF-QNRF) indicate that PaDNet achieves state-of-the-art recognition performance and high robustness in pan-density crowd counting.
- Published
- 2020
- Full Text
- View/download PDF
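The Patch MAE / Patch RMSE idea described in the PaDNet abstract can be sketched as follows. This is a hypothetical implementation: the grid size, the use of density-map sums as per-patch counts, and the function name are assumptions for illustration, not the paper's exact definition.

```python
import numpy as np

def patch_metrics(pred, gt, grid=(4, 4)):
    """Hypothetical Patch MAE / Patch RMSE: split predicted and
    ground-truth density maps into a grid of patches and average
    the per-patch count errors, so local mistakes are not hidden
    by a good global total."""
    rows, cols = grid
    h_idx = np.array_split(np.arange(pred.shape[0]), rows)
    w_idx = np.array_split(np.arange(pred.shape[1]), cols)
    errs = []
    for hi in h_idx:
        for wi in w_idx:
            p = pred[np.ix_(hi, wi)].sum()  # predicted count in patch
            g = gt[np.ix_(hi, wi)].sum()    # ground-truth count in patch
            errs.append(p - g)
    errs = np.array(errs)
    pmae = np.abs(errs).mean()
    prmse = np.sqrt((errs ** 2).mean())
    return pmae, prmse
```

With a 1x1 grid these reduce to the usual global MAE/RMSE; finer grids penalize methods that over-count in one region and under-count in another.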
4. MILES: Multiple-Instance Learning via Embedded Instance Selection.
- Author
Chen, Yixin, Bi, Jinbo, and Wang, James Z.
- Subjects
Machine learning; Computational learning theory; Computer vision; Labels; Application software; Algorithms
- Abstract
Multiple-instance problems arise in situations where training class labels are attached to sets of samples (called bags) instead of to individual samples within each bag (called instances). Most previous multiple-instance learning (MIL) algorithms are developed based on the assumption that a bag is positive if and only if at least one of its instances is positive. Although this assumption works well for drug activity prediction, it is rather restrictive for other applications, especially in computer vision. We propose a learning method, MILES (Multiple-Instance Learning via Embedded instance Selection), which converts the multiple-instance learning problem into a standard supervised learning problem without imposing the assumption relating instance labels to bag labels. MILES maps each bag into a feature space defined by the instances in the training bags via an instance similarity measure. This feature mapping often produces a large number of redundant or irrelevant features, so a 1-norm SVM is applied to select important features and construct classifiers simultaneously. We have performed extensive experiments. In comparison with other methods, MILES demonstrates competitive classification accuracy, high computational efficiency, and robustness to labeling uncertainty.
- Published
- 2006
- Full Text
- View/download PDF
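The bag-to-feature-space mapping the MILES abstract describes can be sketched as below. The Gaussian similarity (a bag's coordinate for a candidate instance is its closest instance's similarity, exp(-||x_ij - c||^2 / sigma^2)) and the value of sigma are assumptions standing in for the paper's instance similarity measure; the function name is also an assumption.

```python
import numpy as np

def embed_bags(bags, concept_instances, sigma=1.0):
    """Map each bag (a set of instance vectors) to a fixed-length
    feature vector: one coordinate per candidate concept instance,
    valued by the bag's most-similar instance. The resulting matrix
    feeds a standard supervised learner (MILES uses a 1-norm SVM)."""
    feats = []
    for bag in bags:
        bag = np.asarray(bag, dtype=float)
        row = []
        for c in concept_instances:
            d2 = ((bag - c) ** 2).sum(axis=1)          # squared distances
            row.append(np.exp(-d2 / sigma ** 2).max())  # closest instance wins
        feats.append(row)
    return np.array(feats)
```

After this embedding, bag labels become ordinary sample labels, which is how the method sidesteps the "positive iff one instance is positive" assumption.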
5. Automatic Linguistic Indexing of Pictures by a Statistical Modeling Approach.
- Author
Li, Jia, and Wang, James Z.
- Subjects
Image processing; Indexing; Computer vision; Markov processes
- Abstract
Automatic linguistic indexing of pictures is an important but highly challenging problem for researchers in computer vision and content-based image retrieval. In this paper, we introduce a statistical modeling approach to this problem. Categorized images are used to train a dictionary of hundreds of statistical models, each representing a concept. Images of a given concept are regarded as instances of a stochastic process that characterizes the concept. To measure the extent of association between an image and the textual description of a concept, we compute the likelihood of the occurrence of the image under the characterizing stochastic process; a high likelihood indicates a strong association. In our experimental implementation, we focus on a particular family of stochastic processes, the two-dimensional multiresolution hidden Markov models (2D MHMMs). We implemented and tested our ALIP (Automatic Linguistic Indexing of Pictures) system on a photographic image database of 600 concepts, each with about 40 training images. The system is evaluated quantitatively using more than 4,600 images outside the training database and compared with a random annotation scheme. Experiments have demonstrated the good accuracy of the system and its high potential for linguistic indexing of photographic images.
- Published
- 2003
- Full Text
- View/download PDF
6. DeepStroke: An efficient stroke screening framework for emergency rooms with multimodal adversarial deep learning.
- Author
Cai, Tongan, Ni, Haomiao, Yu, Mingli, Huang, Xiaolei, Wong, Kelvin, Volpi, John, Wang, James Z., and Wong, Stephen T.C.
- Subjects
Hospital emergency services; Deep learning; Multimodal user interfaces; Facial paralysis; Speech disorders; Face; Physicians; Medical screening
- Abstract
• A powerful multimodal deep learning framework for stroke screening in ER settings
• Spatiotemporal facial frame proposal tackles "in-the-wild" patient conditions
• Multi-level fusion of visual and audio features achieves better overall performance
• Adversarial training mitigates "face-remembering" and learns stroke features
• Transfer learning reduces facial-attribute bias and improves generalizability
In an emergency room (ER) setting, stroke triage or screening is a common challenge. A quick CT is usually done instead of MRI due to MRI's slow throughput and high cost. Clinical tests are commonly referred to during the process, but the misdiagnosis rate remains high. We propose a novel multimodal deep learning framework, DeepStroke, to achieve computer-aided stroke presence assessment by recognizing patterns of minor facial muscle incoordination and speech inability for patients with suspicion of stroke in an acute setting. DeepStroke takes one-minute facial video data and audio data readily available during stroke triage for local facial paralysis detection and global speech disorder analysis. Transfer learning was adopted to reduce face-attribute biases and improve generalizability. We leverage multimodal lateral fusion to combine the low- and high-level features and provide mutual regularization for joint training. Novel adversarial training is introduced to obtain identity-free and stroke-discriminative features. Experiments on our video-audio dataset with actual ER patients show that DeepStroke outperforms state-of-the-art models and performs better than both a triage team and ER doctors, attaining 10.94% higher sensitivity and 7.37% higher accuracy than traditional stroke triage when specificity is aligned. Meanwhile, each assessment can be completed in less than six minutes, demonstrating the framework's great potential for clinical translation.
- Published
- 2022
- Full Text
- View/download PDF
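The comparison "when specificity is aligned" in the DeepStroke abstract amounts to choosing the model's decision threshold so its specificity matches the baseline's, then reading off sensitivity at that threshold. A minimal sketch of that evaluation step (the function name and the exhaustive threshold search are assumptions, not the paper's code):

```python
import numpy as np

def sensitivity_at_specificity(scores, labels, target_specificity):
    """Pick the decision threshold whose specificity is closest to
    the target, then report sensitivity at that threshold."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    best_t, best_gap = None, np.inf
    for t in np.unique(scores):          # every achievable cutoff
        pred = scores >= t
        tn = ((~pred) & (labels == 0)).sum()
        fp = (pred & (labels == 0)).sum()
        spec = tn / max(tn + fp, 1)
        if abs(spec - target_specificity) < best_gap:
            best_t, best_gap = t, abs(spec - target_specificity)
    pred = scores >= best_t
    tp = (pred & (labels == 1)).sum()
    fn = ((~pred) & (labels == 1)).sum()
    return tp / max(tp + fn, 1)
```

Aligning specificity before comparing sensitivity keeps the comparison fair: otherwise a screener could inflate sensitivity simply by flagging more patients.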
Discovery Service for Jio Institute Digital Library