Author: "Yunlu Xu" / Topic: 02 engineering and technology - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Yunlu Xu"' showing total 9 results

1. FREE: A Fast and Robust End-to-End Video Text Spotter

Author: Liang Qiao, Shiliang Pu, Fei Wu, Zhanzhan Cheng, Jing Lu, Yi Niu, Baorui Zou, Shuigeng Zhou, and Yunlu Xu
Subjects: Computer science, business.industry, Process (computing), 02 engineering and technology, Spotting, Computer Graphics and Computer-Aided Design, Pipeline (software), End-to-end principle, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Computer vision, Artificial intelligence, Scale (map), business, Software
Abstract: Currently, video text spotting tasks usually follow a four-stage pipeline: detecting text regions in individual images, recognizing localized text regions frame by frame, tracking text streams, and post-processing to generate final results. However, such pipelines may suffer from huge computational cost as well as sub-optimal results due to interference from low-quality text and the non-trainable pipeline strategy. In this article, we propose a fast and robust end-to-end video text spotting framework named FREE that recognizes the localized text stream only once instead of frame by frame. Specifically, FREE first employs a well-designed spatial-temporal detector that learns text locations among video frames. Then a novel text recommender is developed to select the highest-quality text from text streams for recognition. Here, the recommender is implemented by assembling text tracking, quality scoring and recognition into a trainable module. It not only avoids interference from low-quality text but also dramatically speeds up video text spotting. FREE unites the detector and recommender into a whole framework, and helps achieve global optimization. Besides, we collect a large-scale video text dataset for promoting the video text spotting community, containing 100 videos from 21 real-life scenarios. Extensive experiments on public benchmarks show that our method greatly speeds up the text spotting process and also achieves state-of-the-art results.
Published: 2021
Full Text: View/download PDF

2. Segregated Temporal Assembly Recurrent Networks for Weakly Supervised Multiple Action Detection

Author: Fei Wu, Yunlu Xu, Shiliang Pu, Yi Niu, Jianwen Xie, Chengwei Zhang, and Zhanzhan Cheng
Subjects: FOS: Computer and information sciences, 0209 industrial biotechnology, Computer science, business.industry, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Pattern recognition, 02 engineering and technology, General Medicine, Star (graph theory), Class (biology), Term (time), 020901 industrial engineering & automation, Recurrent neural network, Action (philosophy), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Noise (video), Artificial intelligence, business
Abstract: This paper proposes a segregated temporal assembly recurrent (STAR) network for weakly-supervised multiple action detection. The model learns from untrimmed videos with only video-level labels as supervision and predicts the intervals of multiple actions. Specifically, we first assemble video clips according to class labels by an attention mechanism that learns class-variable attention weights and thus helps relieve noise from background or other actions. Secondly, we build temporal relationships between actions by feeding the assembled features into an enhanced recurrent neural network. Finally, we transform the output of the recurrent neural network into the corresponding action distribution. In order to generate more precise temporal proposals, we design a score term called segregated temporal gradient-weighted class activation mapping (ST-GradCAM) fused with attention weights. Experiments on the THUMOS'14 and ActivityNet1.3 datasets show that our approach outperforms the state-of-the-art weakly-supervised method, and performs on par with its fully-supervised counterparts. Comment: Accepted to Proc. AAAI Conference on Artificial Intelligence 2019
Published: 2019
Full Text: View/download PDF

3. TRIE: End-to-End Text Reading and Information Extraction for Document Understanding

Author: Zhanzhan Cheng, Yunlu Xu, Yi Niu, Liang Qiao, Fei Wu, Jing Lu, Shiliang Pu, and Peng Zhang
Subjects: FOS: Computer and information sciences, Focus (computing), business.industry, Computer science, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, 02 engineering and technology, 010501 environmental sciences, computer.software_genre, Semantics, 01 natural sciences, Task (computing), Variable (computer science), Information extraction, Structured text, Trie, 0202 electrical engineering, electronic engineering, information engineering, Key (cryptography), ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, 020201 artificial intelligence & image processing, Artificial intelligence, business, computer, Natural language processing, 0105 earth and related environmental sciences
Abstract: Since real-world ubiquitous documents (e.g., invoices, tickets, resumes and leaflets) contain rich information, automatic document image understanding has become a hot topic. Most existing works decouple the problem into two separate tasks: (1) text reading for detecting and recognizing texts in images and (2) information extraction for analyzing and extracting key elements from previously extracted plain text. However, they mainly focus on improving the information extraction task, while neglecting the fact that text reading and information extraction are mutually correlated. In this paper, we propose a unified end-to-end text reading and information extraction network, where the two tasks can reinforce each other. Specifically, the multimodal visual and textual features of text reading are fused for information extraction and, in turn, the semantics in information extraction contribute to the optimization of text reading. On three real-world datasets with diverse document images (from fixed layout to variable layout, from structured text to semi-structured text), our proposed method significantly outperforms the state-of-the-art methods in both efficiency and accuracy. Comment: Accepted to ACM MM2020. Code is available at https://davar-lab.github.io/publication.html or https://github.com/hikopensource/DAVAR-Lab-OCR
Published: 2020

4. Sparse coding with cross-view invariant dictionaries for person re-identification

Author: Weidong Qiu, Yunlu Xu, Jie Guo, and Zheng Huang
Subjects: K-SVD, Computer Networks and Communications, Computer science, business.industry, 020206 networking & telecommunications, Pattern recognition, 02 engineering and technology, Disjoint sets, Re identification, Hardware and Architecture, 0202 electrical engineering, electronic engineering, information engineering, Media Technology, 020201 artificial intelligence & image processing, Computer vision, Artificial intelligence, Invariant (mathematics), Neural coding, business, Software, Subspace topology
Abstract: The task of matching observations of the same person in disjoint views captured by non-overlapping cameras is known as the person re-identification problem. It is challenging owing to low-quality images, inter-object occlusions, and variations in illumination, viewpoints and poses. Unlike previous approaches that learn Mahalanobis-like distance metrics, we propose a novel approach based on dictionary learning that takes advantage of sparse coding to encode features representing different people discriminatively and in a cross-view invariant way. Firstly, we propose a robust and discriminative feature extraction method of different feature levels. The feature representations are projected to a lower computation common subspace. Secondly, we learn a single cross-view invariant dictionary for each feature level for different camera views, and a fusion strategy is utilized to generate the final matching results. Experimental statistics show the superior performance of our approach in comparison with state-of-the-art methods on two publicly available benchmark datasets, VIPeR and PRID 2011.
Published: 2017
Full Text: View/download PDF

5. Text Perceptron: Towards End-to-End Arbitrary-Shaped Text Spotting

Author: Sanli Tang, Yi Niu, Fei Wu, Liang Qiao, Shiliang Pu, Zhanzhan Cheng, and Yunlu Xu
Subjects: FOS: Computer and information sciences, Computer science, business.industry, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Boundary (topology), Pattern recognition, 02 engineering and technology, General Medicine, Spotting, Perceptron, Pipeline (software), End-to-end principle, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, Feature (machine learning), 020201 artificial intelligence & image processing, Segmentation, Artificial intelligence, business
Abstract: Many approaches have recently been proposed to detect irregular scene text and have achieved promising results. However, their localization results may not satisfy the subsequent text recognition part well, mainly for two reasons: 1) recognizing arbitrary-shaped text is still a challenging task, and 2) prevalent non-trainable pipeline strategies between text detection and text recognition lead to suboptimal performance. To handle this incompatibility problem, in this paper we propose an end-to-end trainable text spotting approach named Text Perceptron. Concretely, Text Perceptron first employs an efficient segmentation-based text detector that learns the latent text reading order and boundary information. Then a novel Shape Transform Module (abbr. STM) is designed to transform the detected feature regions into regular morphologies without extra parameters. It unites text detection and the following recognition part into a whole framework, and helps the whole network achieve global optimization. Experiments show that our method achieves competitive performance on two standard text benchmarks, i.e., ICDAR 2013 and ICDAR 2015, and also clearly outperforms existing methods on the irregular text benchmarks SCUT-CTW1500 and Total-Text. Comment: Accepted by AAAI2020. Code is available at https://davar-lab.github.io/publication.html or https://github.com/hikopensource/DAVAR-Lab-OCR
Published: 2020

6. Adversarial Seeded Sequence Growing for Weakly-Supervised Temporal Action Localization

Author: Chengwei Zhang, Fei Wu, Futai Zou, Shiliang Pu, Yunlu Xu, Yi Niu, and Zhanzhan Cheng
Subjects: FOS: Computer and information sciences, Computer science, business.industry, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Pattern recognition, 02 engineering and technology, Adversarial system, Discriminative model, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, business, Classifier (UML)
Abstract: Temporal action localization is an important yet challenging research topic due to its various applications. Since frame-level or segment-level annotations of untrimmed videos require considerable labor, studies on weakly-supervised action detection have been springing up. However, most existing frameworks rely on the Class Activation Sequence (CAS) to localize actions by minimizing the video-level classification loss, which exploits the most discriminative parts of actions but ignores the minor regions. In this paper, we propose a novel weakly-supervised framework that eliminates these drawbacks through adversarial learning of two modules. Specifically, the first module is a well-designed Seeded Sequence Growing (SSG) Network for progressively extending seed regions (namely the highly reliable regions initialized by a CAS-based framework) to their expected boundaries. The second module is a specific classifier for mining trivial or incomplete action regions, which is trained on the shared features after erasing the seeded regions activated by SSG. In this way, a whole network composed of these two modules can be trained in an adversarial manner. The goal of the adversary is to mine features that are difficult for the action classifier. That is, erasure by SSG forces the classifier to discover minor or even new action regions on the input feature sequence, and the classifier drives the seeds to grow, alternately. Finally, we obtain the action locations and categories from the well-trained SSG and the classifier. Extensive experiments on two public benchmarks, THUMOS'14 and ActivityNet1.3, demonstrate the impressive performance of our proposed method compared with the state of the art. Comment: To appear in ACM MM2019
Published: 2019
Full Text: View/download PDF

7. Focusing Attention: Towards Accurate Text Recognition in Natural Images

Author: Shuigeng Zhou, Fan Bai, Shiliang Pu, Zhanzhan Cheng, Gang Zheng, and Yunlu Xu
Subjects: FOS: Computer and information sciences, Computer science, business.industry, Computer Vision and Pattern Recognition (cs.CV), Feature extraction, Computer Science - Computer Vision and Pattern Recognition, 020207 software engineering, 02 engineering and technology, Optical character recognition, computer.software_genre, Machine learning, Recurrent neural network, 0202 electrical engineering, electronic engineering, information engineering, Feature (machine learning), Natural (music), Focusing attention, 020201 artificial intelligence & image processing, Artificial intelligence, State (computer science), business, computer
Abstract: Scene text recognition has been a hot research topic in computer vision due to its various applications. The state of the art is the attention-based encoder-decoder framework that learns the mapping between input images and output sequences in a purely data-driven way. However, we observe that existing attention-based methods perform poorly on complicated and/or low-quality images. One major reason is that existing methods cannot get accurate alignments between feature areas and targets for such images. We call this phenomenon "attention drift". To tackle this problem, in this paper we propose the FAN (Focusing Attention Network) method, which employs a focusing attention mechanism to automatically draw back the drifted attention. FAN consists of two major components: an attention network (AN) that is responsible for recognizing character targets as in the existing methods, and a focusing network (FN) that is responsible for adjusting attention by evaluating whether AN pays proper attention to the target areas in the images. Furthermore, unlike the existing methods, we adopt a ResNet-based network to enrich deep representations of scene text images. Extensive experiments on various benchmarks, including the IIIT5k, SVT and ICDAR datasets, show that the FAN method substantially outperforms the existing methods. Comment: Revise the description of IC15 datasets (1811 samples)
Published: 2017
Full Text: View/download PDF

8. Joint Dictionary Learning for Person Re-identification

Author: Jie Guo, Zheng Huang, and Yunlu Xu
Subjects: K-SVD, business.industry, Computer science, Small number, 020206 networking & telecommunications, 02 engineering and technology, Machine learning, computer.software_genre, Viewpoints, Re identification, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, Invariant (mathematics), business, Dictionary learning, computer
Abstract: Person re-identification is the task of matching an individual captured by one or more cameras against a gallery of provided candidates from a different camera view. It is a hard task owing to variations in illumination, viewpoints and poses, and to the small number of annotated training individuals. To obtain proper distance metrics, we propose a novel approach based on dictionary learning. Our method decomposes the dictionary into a view-invariant dictionary and a view-specific one in order to overcome the limited performance resulting from the balance between discrimination and cross-view invariance. We present this multi-task joint dictionary learning method and show its competitive performance by comparing our results with state-of-the-art methods on two publicly available datasets.
Published: 2017
Full Text: View/download PDF

9. Active Control of Photon Recycling for Tunable Optoelectronic Materials

Author: Elizabeth M. Tennyson, Edo Waks, Jeremy N. Munday, Je-Hyung Kim, Yunlu Xu, Marina S. Leite, Sabyasachi Barik, and Joseph Murray
Subjects: Materials science, Band gap, business.industry, Photon recycling, 02 engineering and technology, 021001 nanoscience & nanotechnology, Active control, 01 natural sciences, Atomic and Molecular Physics, and Optics, Electronic, Optical and Magnetic Materials, law.invention, law, Optoelectronic materials, 0103 physical sciences, Optoelectronics, 010306 general physics, 0210 nano-technology, business, Light-emitting diode
Published: 2018
Full Text: View/download PDF
