16 results for "text information extraction"
Search Results
2. Entity Recognition of Event Elements in Airport-Category Chinese NOTAMs Based on a BERT-Bi-LSTM-CRF Model (基于BERT-Bi-LSTM-CRF模型的机场类中文航行通告要素实体识别).
- Author
-
郝宽公, 董兵, 吴悦, 彭自琛, and 罗创
- Abstract
NOTAMs (notices to airmen) carry important information in the field of civil aviation intelligence. To address problems of Chinese NOTAMs such as dense professional terminology, inconsistent formatting, and complex semantics, an entity recognition model based on BERT-Bi-LSTM-CRF was proposed to extract event element entities from the E items of NOTAMs. Word vectors were pre-trained with the bidirectional encoder representations from transformers (BERT) model to capture rich semantic features, fed into a bidirectional long short-term memory (Bi-LSTM) network to extract contextual features, and finally a conditional random field (CRF) layer output the best entity label sequence. An original corpus of airport-category NOTAMs was collected and organized; after data pre-processing and text annotation, training, validation, and evaluation sets suitable for entity recognition experiments were formed. In comparison experiments on this data against other entity recognition models, the BERT-Bi-LSTM-CRF model achieved an accuracy of 89.68%, a recall of 81.77%, and an F1 value of 85.54%; the F1 value improves noticeably on existing models, validating the model's effectiveness for element entity recognition in airport-category NOTAMs. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
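The entry above trains a sequence tagger on annotated NOTAM text. As a hedged illustration of the data format such a model learns from, the sketch below converts annotated entity spans into BIO tags; the NOTAM tokens, span offsets, and entity types are invented for illustration and are not from the paper.

```python
# Hypothetical sketch: converting annotated entity spans into the BIO tag
# sequence a BERT-Bi-LSTM-CRF tagger is trained on. Tokens, offsets, and
# entity types below are invented for illustration.

def spans_to_bio(tokens, spans):
    """spans: list of (start_token, end_token_exclusive, entity_type)."""
    tags = ["O"] * len(tokens)
    for start, end, etype in spans:
        tags[start] = f"B-{etype}"
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"
    return tags

tokens = ["RWY", "09L/27R", "CLSD", "DUE", "TO", "MAINT"]
spans = [(0, 2, "RWY"), (5, 6, "REASON")]
print(spans_to_bio(tokens, spans))
# → ['B-RWY', 'I-RWY', 'O', 'O', 'O', 'B-REASON']
```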
3. Event Detection and Information Extraction Strategies from Text: A Preliminary Study Using GENIA Corpus
- Author
-
Abdullah, Mohd Hafizul Afifi, Aziz, Norshakirah, Abdulkadir, Said Jadid, Akhir, Emelia Akashah Patah, Talpur, Noureen, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Al-Sharafi, Mohammed A., editor, Al-Emran, Mostafa, editor, Al-Kabi, Mohammed Naji, editor, and Shaalan, Khaled, editor
- Published
- 2023
- Full Text
- View/download PDF
4. Effective Approach for Fine-Tuning Pre-Trained Models for the Extraction of Texts From Source Codes
- Author
-
Shruthi D., Chethan H.K., and Agughasi Victor Ikechukwu
- Subjects
text information extraction, T5 model, pre-processing framework, code summarization, natural language processing, Information technology, T58.5-58.64
- Abstract
This study introduces SR-Text, a robust approach leveraging pre-trained models like BERT and T5 for enhanced text extraction from source codes. Addressing the limitations of traditional manual summarization, our methodology focuses on fine-tuning these models to better understand and generate contextual summaries, thus overcoming challenges such as long-term dependency and dataset quality issues. We conduct a detailed analysis of programming language syntax and semantics to develop syntax-aware text retrieval techniques, significantly boosting the accuracy and relevance of the texts extracted. The paper also explores a hybrid approach that integrates statistical machine learning with rule-based methods, enhancing the robustness and adaptability of our text extraction processes across diverse coding styles and languages. Empirical results from a meticulously curated dataset demonstrate marked improvements in performance metrics: precision increased by 15%, recall by 20%, and an F1 score enhancement of 18%. These improvements underscore the effectiveness of using advanced machine learning models in software engineering tasks. This research not only paves the way for future work in multilingual code summarization but also discusses broader implications for automated software analysis tools, proposing directions for future research to further refine and expand this methodology.
- Published
- 2024
- Full Text
- View/download PDF
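The abstract above quotes gains in precision, recall, and F1. As a minimal reminder of how those metrics relate, the sketch below computes them from true-positive, false-positive, and false-negative counts; the counts are invented for illustration.

```python
# Minimal sketch of the precision / recall / F1 metrics quoted in the
# abstract. The tp/fp/fn counts below are invented for illustration.

def prf1(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

p, r, f1 = prf1(tp=80, fp=20, fn=20)
print(round(p, 2), round(r, 2), round(f1, 2))  # → 0.8 0.8 0.8
```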
5. Named Entity Recognition in Equipment Support Field Using Tri-Training Algorithm and Text Information Extraction Technology
- Author
-
Chenguang Liu, Yongli Yu, Xingxin Li, and Peng Wang
- Subjects
Text information extraction, named entity recognition, equipment support, automatic word segmentation, information identification, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Weapon and equipment names form an important class of military named entities that are difficult to identify because of complex components, heterogeneous forms, and a scarce annotation corpus. Here, the automatic recognition of weaponry equipment names is specifically explored, and an NER (named entity recognition) algorithm based on BI-LSTM-CRF (bi-directional long short-term memory with a conditional random field) is proposed, demonstrating the effectiveness of domain features in domain-specific entity recognition. Firstly, Chinese characters are represented by word embeddings and input into the model. Then, the input feature vector sequence is processed by the BI-LSTM neural network to extract contextual semantic features. Finally, the learned features are connected to a linear-chain CRF, the named entities in the equipment support field are labeled, and the NER results are output. The experimental results show that the accuracy of the NER algorithm based on the BI-LSTM-CRF model is 92.02%, the recall rate is 93.21%, and the F1 value reaches 93.88%. This model outperforms the BI-LSTM neural network model and the LSTM-CRF model, and provides a reference for entity recognition in the field of equipment support.
- Published
- 2021
- Full Text
- View/download PDF
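The CRF layer described above selects the best label sequence by Viterbi decoding over emission and transition scores. The sketch below shows that decoding step in miniature; the tiny emission/transition scores and the "EQ" (equipment) label are invented for illustration.

```python
# Hedged sketch of the Viterbi decoding a linear-chain CRF performs to pick
# the best label sequence. Scores and labels below are invented.

def viterbi(emissions, transitions, labels):
    # emissions: list over time of {label: score}
    # transitions: {(prev_label, cur_label): score}
    best = {lab: (emissions[0][lab], [lab]) for lab in labels}
    for emit in emissions[1:]:
        new_best = {}
        for cur in labels:
            prev, (score, path) = max(
                ((p, best[p]) for p in labels),
                key=lambda kv: kv[1][0] + transitions[(kv[0], cur)],
            )
            new_best[cur] = (score + transitions[(prev, cur)] + emit[cur],
                             path + [cur])
        best = new_best
    return max(best.values(), key=lambda v: v[0])[1]

labels = ["O", "B-EQ", "I-EQ"]
transitions = {(a, b): 0.0 for a in labels for b in labels}
transitions[("O", "I-EQ")] = -10.0   # forbid I- without a preceding B-/I-
emissions = [{"O": 0.1, "B-EQ": 1.0, "I-EQ": 0.2},
             {"O": 0.3, "B-EQ": 0.1, "I-EQ": 0.9},
             {"O": 1.0, "B-EQ": 0.0, "I-EQ": 0.2}]
print(viterbi(emissions, transitions, labels))  # → ['B-EQ', 'I-EQ', 'O']
```

The transition penalty is what distinguishes a CRF from per-token classification: an illegal tag sequence like O → I-EQ is suppressed even when its emission score is high.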
6. Dietary Composition Perception Algorithm Using Social Robot Audition for Mandarin Chinese
- Author
-
Zhidong Su, Yang Li, and Guanci Yang
- Subjects
Dietary composition perception, semantic understanding, robot audition, social robot, text information extraction, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
As population aging becomes more serious, social robots have an increasingly significant influence on human life. By employing regular question-and-answer conversations or wearable devices, some social robotics products can establish personal health archives, but they are unable to collect diet information automatically through robot vision or audition. A healthy diet can reduce a person's risk of developing cancer, diabetes, heart disease, and other age-related diseases. In order to automatically perceive the dietary composition of the elderly by listening to people chatting, this paper proposes a chat-based dietary composition perception algorithm (DCPA) that uses social robot audition to understand semantic information and perceive dietary composition in Mandarin Chinese. Firstly, based on Mel-frequency cepstrum coefficients and a convolutional neural network, a speaker recognition method is designed to identify speech data. Building on speech segmentation and speaker recognition, an audio segment classification method is proposed to distinguish different speakers and to store their identities and the order of their utterances in a conversation. Secondly, a dietetic lexicon is established, and two dietary composition semantic understanding algorithms are proposed to understand eating semantics and sense dietary-composition information. To evaluate the performance of DCPA, we implemented it on our social robot platform and established two categories of test datasets, covering one-person and multi-person chats. The test results show that DCPA is capable of understanding users' dietary compositions, with F1 scores of 0.9505, 0.8940, and 0.8768 for one-person talking, a two-person chat, and a three-person chat, respectively. DCPA is robust in obtaining dietary information.
- Published
- 2020
- Full Text
- View/download PDF
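The abstract above builds a dietetic lexicon and matches it against transcribed speech. The sketch below illustrates only that lexicon-matching step; the lexicon entries, nutrient tags, and transcript are all invented, and the paper's actual algorithms operate on Mandarin Chinese speech, not English text.

```python
# Hypothetical sketch of the lexicon-matching step: scanning a transcribed
# utterance for entries of a small dietetic lexicon. All data here is
# invented for illustration.

DIETETIC_LEXICON = {
    "rice": "carbohydrate",
    "tofu": "protein",
    "spinach": "fiber",
}

def perceive_diet(transcript):
    words = transcript.lower().split()
    return {w: DIETETIC_LEXICON[w] for w in words if w in DIETETIC_LEXICON}

print(perceive_diet("I had rice and tofu for lunch"))
# → {'rice': 'carbohydrate', 'tofu': 'protein'}
```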
7. VIETNAMESE TEXT EXTRACTION FROM BOOK COVERS
- Author
-
Phan Thị Thanh Nga, Nguyễn Thị Huyền Trang, Nguyễn Văn Phúc, Thái Duy Quý, and Võ Phương Bình
- Subjects
book cover, OCR (optical character recognition), text information extraction, Vietnamese text detection, General Works
- Abstract
Automatic information extraction from images reduces cost and manual effort and enables timely processing. Converting printed book covers into readable text for later automated processing would be useful to a wide range of users such as librarians, bookshop keepers, and individual users. In this paper, we present a novel method for Vietnamese text extraction from images of scanned book covers. The proposed system accepts a snapshot of the book cover, filters the input image to enhance its quality, locates the regions containing text, then applies an optical character recognizer (OCR) to extract the text. The last step filters the extracted text against a dictionary to produce the final result. Experiments with the proposed system on our dataset delivered encouraging results.
- Published
- 2017
- Full Text
- View/download PDF
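The pipeline above ends by filtering noisy OCR output against a dictionary. A minimal sketch of that idea, assuming a fuzzy-matching post-filter (the dictionary and the sample OCR tokens are invented; the paper works on Vietnamese text):

```python
# Sketch of a dictionary-based OCR post-filter: snap each noisy token to the
# closest dictionary entry, dropping tokens with no close match. The
# dictionary and OCR tokens below are invented for illustration.
import difflib

DICTIONARY = ["python", "programming", "introduction", "edition"]

def dictionary_filter(ocr_tokens):
    cleaned = []
    for token in ocr_tokens:
        match = difflib.get_close_matches(token.lower(), DICTIONARY,
                                          n=1, cutoff=0.8)
        if match:
            cleaned.append(match[0])
    return cleaned

print(dictionary_filter(["Pythom", "Programing", "xyz123"]))
# → ['python', 'programming']
```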
8. Application Study of Hidden Markov Model and Maximum Entropy in Text Information Extraction
- Author
-
Li, Rong, Liu, Li-ying, Fu, He-fang, Zheng, Jia-heng, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Goebel, Randy, editor, Siekmann, Jörg, editor, Wahlster, Wolfgang, editor, Deng, Hepu, editor, Wang, Lanzhou, editor, Wang, Fu Lee, editor, and Lei, Jingsheng, editor
- Published
- 2009
- Full Text
- View/download PDF
9. F-Logic Data and Knowledge Reasoning in the Semantic Web Context
- Author
-
Meštrović, Ana, Čubrilo, Mirko, Machado, J. A. Tenreiro, editor, Pátkai, Béla, editor, and Rudas, Imre J., editor
- Published
- 2009
- Full Text
- View/download PDF
10. A Survey on Text Information Extraction from Born-Digital and Scene Text Images
- Author
-
Faustina Joan, S. P. and Valli, S.
- Published
- 2019
- Full Text
- View/download PDF
11. A Fast Caption Detection Method for Low Quality Video Images.
- Author
-
Gui, Tianyi, Sun, Jun, Naoi, Satoshi, Katsuyama, Yutaka, Minagawa, Akihiro, and Hotta, Yoshinobu
- Abstract
Captions in videos are important and accurate clues for video retrieval. In this paper, we propose a fast and robust video caption detection and localization algorithm to handle low-quality video images. First, stroke response maps are extracted from the complex background by a stroke filter. Then, two localization algorithms locate thin-stroke and thick-stroke caption regions respectively. Finally, a HOG-based SVM classifier is applied to the detected results to further remove noise. Experimental results show the superior performance of our proposed method compared with existing work in terms of accuracy and speed. [ABSTRACT FROM PUBLISHER]
- Published
- 2012
- Full Text
- View/download PDF
12. Color reduction for complex document images.
- Author
-
Nikolaou, Nikos and Papamarkos, Nikos
- Subjects
-
IMAGE processing, COLOR, DATA mining, IMAGING systems, AUTOMATIC extracting (Information science)
- Abstract
A new technique for color reduction of complex document images is presented in this article. It significantly reduces the number of colors in the document image (to fewer than 15 colors in most cases) so as to obtain solid characters and uniform local backgrounds; the technique can therefore be used as a preprocessing step by text information extraction applications. Specifically, using the edge map of the document image, a representative set of samples is chosen to construct a 3D color histogram. Based on these samples in the 3D color space, a relatively large number of colors (usually no more than 100) is obtained by a simple clustering procedure, and the final colors are obtained by applying a mean-shift based procedure. An edge-preserving smoothing filter is also used as a preprocessing stage that significantly enhances the quality of the initial image. Experimental results demonstrate the method's capability to produce correctly segmented complex color documents in which the character elements can be easily extracted as connected components. © 2009 Wiley Periodicals, Inc. Int J Imaging Syst Technol, 19, 14–26, 2009 [ABSTRACT FROM AUTHOR]
- Published
- 2009
- Full Text
- View/download PDF
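The color-reduction idea above collapses a document to a handful of solid colors. As a rough sketch of the first stage only, the snippet below quantizes RGB pixels by binning the color cube; the bin size and sample pixels are invented, and the paper's full method adds edge-based sampling, clustering, and a mean-shift refinement.

```python
# Rough sketch of color quantization by binning the RGB cube, so a complex
# document collapses to a few solid colors. Bin size and pixels are invented.

def quantize(pixels, bin_size=64):
    # Map each channel value to the centre of its histogram bin.
    def snap(v):
        return (v // bin_size) * bin_size + bin_size // 2
    return [(snap(r), snap(g), snap(b)) for r, g, b in pixels]

pixels = [(12, 200, 30), (20, 210, 25), (250, 10, 5)]
print(sorted(set(quantize(pixels))))
# → [(32, 224, 32), (224, 32, 32)]
```

Note how the first two pixels, visually both "green", land in the same bin, which is exactly the property that makes characters come out as solid connected components.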
13. A stroke filter and its application to text localization
- Author
-
Jung, Cheolkon, Liu, Qifeng, and Kim, Joongkyu
- Subjects
-
LOCALIZATION theory, DIGITAL video, DIGITAL images, TEXT files, DATA mining, PHOTOGRAPHIC light filters, DATABASES
- Abstract
Abstract: Most researchers have used edge, intensity, corner, and texture features for text localization in video images. However, these features do not fully coincide with the features of text and cannot fulfill all the necessary conditions of text, so it is very difficult to localize text robustly with them in video images that have complex backgrounds with strong edge or texture clutter. In this paper, we propose a stroke filter which can detect the strokes of text for robust text localization. Using this stroke filter, we can remove text candidates that have strong edges but are not text. Furthermore, we apply the stroke filter to our text localization system and localize text more robustly in video images. The effectiveness and efficiency of the proposed method are verified by extensive experiments on a challenging database containing 480 video images. [Copyright Elsevier]
- Published
- 2009
- Full Text
- View/download PDF
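The stroke filter described above responds to thin runs of pixels that differ from the background on both sides. A simplified 1-D sketch of that core idea (the intensity row and stroke width are invented; the paper defines the filter over 2-D local regions and orientations):

```python
# Illustrative 1-D sketch of a stroke filter: a stroke is a thin run of
# pixels brighter (or darker) than BOTH flanking backgrounds at roughly
# stroke-width distance. Row and width below are invented.

def stroke_response(row, width):
    resp = [0] * len(row)
    for i in range(width, len(row) - width):
        left, right = row[i - width], row[i + width]
        # High response only when the pixel exceeds both flanks.
        resp[i] = min(row[i] - left, row[i] - right)
    return resp

row = [10, 10, 200, 210, 205, 10, 10]   # bright 3-pixel stroke on dark bg
print(stroke_response(row, width=2))
# → [0, 0, -5, 200, 5, 0, 0]
```

A plain edge detector would fire at both sides of any contrast boundary; the two-sided `min` is what makes the response specific to stroke-like structures.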
14. A new approach for text segmentation using a stroke filter
- Author
-
Jung, Cheolkon, Liu, Qifeng, and Kim, Joongkyu
- Subjects
-
ALGORITHMS, FILTERS (Mathematics), DIGITAL filters (Mathematics), FOUNDATIONS of arithmetic
- Abstract
Abstract: We propose a new method for achieving robust text segmentation in images by using a stroke filter. Segmenting text accurately and robustly from a complex background is known to be a very difficult task: most existing methods are sensitive to text color, size, font, and background clutter, because they use simple segmentation methods or require prior knowledge about text shape. In this paper, we attempt to exploit the intrinsic characteristics of text via the stroke filter and design a new, robust algorithm for text segmentation. First, we describe the stroke filter briefly, based on local region analysis. Second, text color polarity determination and local region growing are performed successively, based on the response of the stroke filter. Finally, a feedback procedure driven by the recognition score of an optical character recognition (OCR) module is used to improve the segmentation performance. Through experiments on a large database, we demonstrate that our method performs impressively in terms of accuracy and robustness. [Copyright Elsevier]
- Published
- 2008
- Full Text
- View/download PDF
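One step the abstract above mentions is determining text color polarity from the stroke filter response. A hedged sketch of that idea, assuming polarity is decided by comparing mean intensities (the pixel values are invented; the paper derives polarity from the sign of the stroke filter response over local regions):

```python
# Hedged sketch of text colour polarity determination: compare mean
# intensity of candidate stroke pixels against the surrounding background.
# The pixel lists below are invented for illustration.

def text_polarity(stroke_pixels, background_pixels):
    stroke_mean = sum(stroke_pixels) / len(stroke_pixels)
    bg_mean = sum(background_pixels) / len(background_pixels)
    return "light-on-dark" if stroke_mean > bg_mean else "dark-on-light"

print(text_polarity([220, 230, 210], [40, 50, 45]))  # → light-on-dark
```

Knowing the polarity lets the subsequent region-growing step decide whether to grow bright or dark pixels into the text mask.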
15. Text information extraction in images and video: a survey
- Author
-
Jung, Keechul, In Kim, Kwang, and K. Jain, Anil
- Subjects
-
PERFORMANCE, ALGORITHMS, SURVEYS, ALGEBRA
- Abstract
Text data present in images and video contain useful information for automatic annotation, indexing, and structuring of images. Extraction of this information involves detection, localization, tracking, extraction, enhancement, and recognition of the text from a given image. However, variations of text due to differences in size, style, orientation, and alignment, as well as low image contrast and complex backgrounds, make the problem of automatic text extraction extremely challenging. While comprehensive surveys of related problems such as face detection, document analysis, and image and video indexing can be found, the problem of text information extraction is not well surveyed. A large number of techniques have been proposed to address this problem, and the purpose of this paper is to classify and review these algorithms, discuss benchmark data and performance evaluation, and point out promising directions for future research. [Copyright Elsevier]
- Published
- 2004
- Full Text
- View/download PDF
16. Classification of regions extracted from scene images by morphological filters in text or non-text using decision tree
- Author
-
Luz Alves, Wonder Alexandre, Hashimoto, Ronaldo Fumio, and Skala, Václav
- Subjects
morphological filter, text region classification, text localization, text information extraction, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION
- Abstract
We present in this work a new method for classifying regions extracted from scene images by morphological filters as text or non-text using a decision tree. Our technique can be divided into three parts. Firstly, we extract a set of regions with a robust scheme based on morphological filters. Then, after a refinement step, a set of text attributes is obtained for each region. In the last step, a decision tree is built to classify the regions as text or non-text. Experiments performed on images from the public ICDAR dataset show that this method is a good alternative for practical problems involving text location in scene images.
- Published
- 2010
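The method above classifies candidate regions with a decision tree over per-region text attributes. A toy sketch of that final step, written as a hand-coded tree (the attribute names, thresholds, and sample region are invented; the paper learns the tree from labelled ICDAR regions):

```python
# Toy sketch of text / non-text region classification with a small
# hand-written decision tree. Attributes and thresholds are invented.

def classify_region(region):
    # region: dict with 'aspect_ratio', 'fill_ratio', 'stroke_width_var'
    if region["fill_ratio"] < 0.1:
        return "non-text"        # too sparse to be a character
    if region["stroke_width_var"] > 5.0:
        return "non-text"        # character strokes have near-constant width
    if 0.1 <= region["aspect_ratio"] <= 10.0:
        return "text"
    return "non-text"

r = {"aspect_ratio": 0.6, "fill_ratio": 0.45, "stroke_width_var": 1.2}
print(classify_region(r))  # → text
```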