197 results for "metadata extraction"
Search Results
2. Automatic Extraction and Cluster Analysis of Natural Disaster Metadata Based on the Unified Metadata Framework.
- Author
-
Wang, Zongmin, Shi, Xujie, Yang, Haibo, Yu, Bo, and Cai, Yingchun
- Subjects
*NATURAL disasters, *CLUSTER analysis (Statistics), *INFORMATION technology, *METADATA, *DATA mining, *NATURAL resources
- Abstract
The development of information technology has led to massive, multidimensional, and heterogeneously sourced disaster data. However, there is currently no universal metadata standard for managing natural disasters. Common pre-trained models for information extraction require extensive training data and show limited effectiveness when annotated resources are scarce. This study establishes a unified natural disaster metadata standard, utilizes self-trained universal information extraction (UIE) models and Python libraries to extract metadata stored in both structured and unstructured forms, and analyzes the results using the Word2vec-Kmeans clustering algorithm. The results show that (1) the self-trained UIE model, with a learning rate of 3 × 10⁻⁴ and a batch_size of 32, significantly improves extraction results for various natural disasters by over 50%. Our optimized UIE model outperforms many other extraction methods in terms of precision, recall, and F1 scores. (2) The quality assessments of consistency, completeness, and accuracy for ten tables all exceed 0.80, with variances between the three dimensions of 0.04, 0.03, and 0.05. The overall evaluation of the tables' data items also exceeds 0.80, consistent with the results at the table level. The metadata model framework constructed in this study demonstrates high-quality stability. (3) Taking the flood dataset as an example, clustering reveals five main themes with high similarity within clusters, and the differences between clusters are significant relative to the differences within clusters at a significance level of 0.01. Overall, this experiment supports effective sharing of disaster data resources and enhances natural disaster emergency response efficiency. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
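The record above pairs a UIE extractor with Word2vec-KMeans clustering of the extracted metadata. A minimal sketch of that clustering stage, assuming gensim and scikit-learn; the tokenizer, vector size, and toy corpus are illustrative stand-ins, not the authors' configuration:

```python
import numpy as np
from gensim.models import Word2Vec
from sklearn.cluster import KMeans

# Toy metadata descriptions standing in for the extracted disaster records.
docs = [
    "flood warning issued for the river basin",
    "heavy rainfall caused flash flooding downtown",
    "earthquake damaged several bridges in the region",
    "aftershocks followed the earthquake overnight",
]
tokenized = [d.lower().split() for d in docs]

# Train a small Word2vec model on the metadata text.
w2v = Word2Vec(tokenized, vector_size=100, window=5, min_count=1, seed=1)

def doc_vector(tokens):
    # Represent each record as the mean of its word vectors.
    vecs = [w2v.wv[t] for t in tokens if t in w2v.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(w2v.vector_size)

X = np.vstack([doc_vector(t) for t in tokenized])

# The abstract reports five themes for the flood dataset; two suffice here.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)
```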
3. Deep Neural Networks for Automated Metadata Extraction
- Author
-
El Omari, Abdellah, Antari, Jilali, Elkina, Hamza, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Motahhir, Saad, editor, and Bossoufi, Badre, editor
- Published
- 2024
- Full Text
- View/download PDF
4. Real-Time Security Risk Assessment From CCTV Using Hand Gesture Recognition
- Author
-
Murat Koca
- Subjects
CCTV footage, deep learning, cyber security, hand gesture recognition, media-pipe, metadata extraction, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Closed-Circuit Television (CCTV) surveillance systems, long associated with physical security, become even more valuable when combined with cybersecurity measures. Combining traditional surveillance with cyber defenses is a flexible method for protecting against both physical and digital dangers. This study introduces the use of convolutional neural networks (CNNs) and hand gesture detection on CCTV data to perform real-time security risk assessments. The suggested method's emphasis on automated extraction of key information, such as identity and behavior, illustrates its special use in silent or acoustically challenging settings. This study uses deep learning techniques to develop a novel approach for detecting hand gestures in CCTV images, automatically extracting relevant features using the MediaPipe architecture. For instance, it facilitates risk assessment through hand gestures in noisy environments or muted audio streams. Given this method's uniqueness and efficiency, the suggested solution can alert the appropriate authorities in the event of a security breach. There is considerable opportunity to develop applications across security, law enforcement, and public safety, in settings such as shopping malls, educational institutions, transportation, and the armed forces, and in scenarios such as theft and abduction.
- Published
- 2024
- Full Text
- View/download PDF
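A minimal sketch of the landmark-extraction stage described above, using the MediaPipe Hands solution with OpenCV; the paper's downstream gesture classifier is not public and is omitted, and the input filename is hypothetical:

```python
import cv2
import mediapipe as mp

hands = mp.solutions.hands.Hands(
    static_image_mode=False, max_num_hands=2, min_detection_confidence=0.5)

cap = cv2.VideoCapture("cctv_feed.mp4")  # hypothetical CCTV recording
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # MediaPipe expects RGB input; OpenCV decodes frames as BGR.
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    for lm in results.multi_hand_landmarks or []:
        # Each detected hand yields 21 (x, y, z) landmarks; a downstream
        # classifier would map these to gesture classes such as "help".
        print(len(lm.landmark), "landmarks detected")
cap.release()
hands.close()
```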
5. Generic features selection for structure classification of diverse styled scholarly articles.
- Author
-
Waqas, Muhammad and Anjum, Nadeem
- Abstract
The enormous growth of online research publications in diversified domains has driven the research community to mine these valuable scientific resources by searching online digital libraries and publishers' websites. A precise search that lists the most related articles requires applying semantic queries to a document's metadata and structural elements. Online search engines and digital libraries, however, offer only keyword-based search over full-body text, which produces excessive results. Therefore, a research article's structural and metadata information has to be stored in machine-comprehensible form by online research publishers. In recent years, the research community has adopted different approaches to extract structural information from research documents, such as rule-based heuristics and machine-learning-based approaches. Studies suggest that machine-learning-based techniques produce optimal results for document structure extraction from publishers with diversified publication layouts. In this paper, we have proposed thirteen different logical layout structural (LLS) components. We have identified an innovative two-stage set of generic features associated with the LLS. This approach gives our technique an advantage over the state-of-the-art for structural classification of digital scientific articles with diversified publication styles. We applied chi-square (χ²) for feature selection, and the final results revealed that an SVM (kernel function) produced the optimal result, with an overall F-measure of 0.95. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
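A minimal sketch of the chi-square feature selection plus SVM pipeline the abstract describes, using scikit-learn; the random count matrix stands in for the paper's block-level features, and the 13 classes mirror the proposed LLS components:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Random non-negative counts stand in for layout features
# (chi2 requires non-negative values); 13 classes mirror the LLS components.
rng = np.random.default_rng(0)
X = rng.integers(0, 10, size=(300, 120)).astype(float)
y = rng.integers(0, 13, size=300)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
pipe = make_pipeline(SelectKBest(chi2, k=40), SVC(kernel="rbf"))
pipe.fit(X_tr, y_tr)
print(f"accuracy: {pipe.score(X_te, y_te):.2f}")  # near chance on random data
```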
6. Event and Entity Extraction from Generated Video Captions
- Author
-
Scherer, Johannes, Bhowmik, Deepayan, Scherp, Ansgar, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Holzinger, Andreas, editor, Kieseberg, Peter, editor, Cabitza, Federico, editor, Campagner, Andrea, editor, Tjoa, A Min, editor, and Weippl, Edgar, editor
- Published
- 2023
- Full Text
- View/download PDF
7. Deep Learning Approaches for Big Data-Driven Metadata Extraction in Online Job Postings.
- Author
-
Skondras, Panagiotis, Zotos, Nikos, Lagios, Dimitris, Zervas, Panagiotis, Giotopoulos, Konstantinos C., and Tzimas, Giannis
- Subjects
*JOB postings, *DEEP learning, *MACHINE learning, *FEEDFORWARD neural networks, *LANGUAGE models, *METADATA, *JOB classification
- Abstract
This article presents a study on the multi-class classification of job postings using machine learning algorithms. With the growth of online job platforms, there has been an influx of labor market data. Machine learning, particularly NLP, is increasingly used to analyze and classify job postings. However, the effectiveness of these algorithms largely hinges on the quality and volume of the training data. In our study, we propose a multi-class classification methodology for job postings, drawing on AI models such as text-davinci-003 and the quantized versions of Falcon 7b (Falcon), Wizardlm 7B (Wizardlm), and Vicuna 7B (Vicuna) to generate synthetic datasets. These synthetic data are employed in two use-case scenarios: (a) exclusively as training datasets composed of synthetic job postings (for situations where no real data is available) and (b) as an augmentation method to bolster underrepresented job title categories. To evaluate our proposed method, we relied on two well-established approaches: a feedforward neural network (FFNN) and the BERT model. Both the use cases and training methods were assessed against a genuine job posting dataset to gauge classification accuracy. Our experiments substantiated the benefits of using synthetic data to enhance job posting classification. In the first scenario, the models' performance matched, and occasionally exceeded, that of models trained on real data. In the second scenario, the augmented classes outperformed their unaugmented counterparts in most instances. This research confirms that AI-generated datasets can enhance the efficacy of NLP algorithms, especially in multi-class classification of job postings. While data augmentation can boost model generalization, its impact varies: it is especially beneficial for simpler models like the FFNN, whereas BERT, with its context-aware architecture, also benefits from augmentation but sees limited improvement. Selecting the right type and amount of augmentation is essential. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
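A minimal sketch of generating synthetic job postings with the OpenAI API, in the spirit of the study above; text-davinci-003 is deprecated, so a current chat model is used as a stand-in, and the prompt wording is an assumption:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def synthetic_posting(job_title: str) -> str:
    resp = client.chat.completions.create(
        # Stand-in model; the paper used text-davinci-003 and local 7B models.
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": f"Write a realistic job posting for a {job_title}. "
                       "Include responsibilities, requirements, and benefits.",
        }],
    )
    return resp.choices[0].message.content

# Augment an underrepresented job-title category before training the classifier.
samples = [synthetic_posting("wind turbine technician") for _ in range(3)]
```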
8. Generating Synthetic Resume Data with Large Language Models for Enhanced Job Description Classification †.
- Author
-
Skondras, Panagiotis, Zervas, Panagiotis, and Tzimas, Giannis
- Subjects
LANGUAGE models, MACHINE learning, JOB classification, TRANSFORMER models, JOB descriptions, NATURAL language processing, FEEDFORWARD neural networks
- Abstract
In this article, we investigate the potential of synthetic resumes as a means for the rapid generation of training data and their effectiveness in data augmentation, especially in categories marked by sparse samples. The widespread implementation of machine learning algorithms in natural language processing (NLP) has notably streamlined the resume classification process, delivering time and cost efficiencies for hiring organizations. However, the performance of these algorithms depends on the abundance of training data. While selecting the right model architecture is essential, it is also crucial to ensure the availability of a robust, well-curated dataset. For many categories in the job market, data sparsity remains a challenge. To deal with this challenge, we employed the OpenAI API to generate both structured and unstructured resumes tailored to specific criteria. These synthetically generated resumes were cleaned, preprocessed, and then utilized to train two distinct models: a transformer model (BERT) and a feedforward neural network (FFNN) that incorporated Universal Sentence Encoder 4 (USE4) embeddings. Both models were evaluated on the multiclass classification of resumes; when trained on an augmented dataset containing 60 percent real data (from the Indeed website) and 40 percent synthetic data from ChatGPT, the transformer model presented exceptional accuracy. The FFNN, predictably, achieved lower accuracy. These findings highlight the value of augmenting real-world data with ChatGPT-generated synthetic resumes, especially in the context of limited training data. The suitability of the BERT model for such classification tasks further reinforces this narrative. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
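A minimal sketch of the FFNN-with-USE4 side of the study above: encode resumes with Universal Sentence Encoder 4 from TensorFlow Hub and fit a small feedforward classifier (scikit-learn's MLPClassifier as a stand-in for the paper's FFNN); the four inline resumes are illustrative:

```python
import tensorflow_hub as hub
from sklearn.neural_network import MLPClassifier

embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

resumes = [
    "Registered nurse with 5 years of ICU experience",
    "Backend developer skilled in Go and PostgreSQL",
    "Pediatric nurse, BLS certified",
    "Software engineer building REST APIs in Java",
]
labels = ["nurse", "developer", "nurse", "developer"]

X = embed(resumes).numpy()  # 512-dimensional USE4 sentence embeddings
clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
clf.fit(X, labels)
print(clf.predict(embed(["Night-shift nurse for the surgical ward"]).numpy()))
```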
9. Annotated Open Corpus Construction and BERT-Based Approach for Automatic Metadata Extraction From Korean Academic Papers
- Author
-
Hyesoo Kong, Hwamook Yoon, Jaewook Seol, Mihwan Hyun, Hyejin Lee, Soonyoung Kim, and Wonjun Choi
- Subjects
BERT, corpus construction, metadata extraction, transfer learning, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
With the accelerating development of science and technology, the number of academic papers published in various fields is increasing rapidly. Academic papers, especially in science and technology fields, are a crucial medium for researchers, who identify knowledge about the latest technological trends, develop new technologies, and conduct derivative studies. Therefore, the continual collection of extensive academic papers, structuring of metadata, and construction of databases are significant tasks. However, research on automatic metadata extraction from Korean papers is currently not being actively conducted owing to insufficient Korean training data. We automatically constructed the largest labeled corpus in South Korea to date from 315,320 PDF papers belonging to 503 Korean academic journals; this labeled corpus can be used to train models that automatically extract 12 metadata types from PDF papers. The labeled corpus is available at https://doi.org/10.23057/48. Moreover, we developed an inspection process and guidelines for the automatically constructed data and performed a full inspection of the validation and testing data. The reliability of the inspected data was verified through inter-annotator agreement measurement. Using our corpus, we trained and evaluated a BERT-based transfer learning model to verify its reliability. Furthermore, we proposed new training methods that improve the metadata extraction performance for Korean papers, and through these methods, we developed the KorSciBERT-ME-J and KorSciBERT-ME-J+C models. KorSciBERT-ME-J showed the highest performance, with an F1 score of 99.36%, as well as robust performance in automatic metadata extraction from Korean academic papers in various formats.
- Published
- 2023
- Full Text
- View/download PDF
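The record above fine-tunes BERT for token-level metadata extraction. A minimal sketch of the model setup with Hugging Face Transformers; KorSciBERT is not assumed to be on the public hub, so multilingual BERT stands in, and the BIO label set covers only three illustrative metadata types:

```python
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

# Illustrative BIO labels for three of the paper's twelve metadata types.
labels = ["O", "B-TITLE", "I-TITLE", "B-AUTHOR", "I-AUTHOR", "B-ABSTRACT", "I-ABSTRACT"]
tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=len(labels))

text = "Deep Learning for Metadata Extraction Kong Hyesoo Yoon Hwamook"
enc = tok(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    pred = model(**enc).logits.argmax(-1)[0]
# The classification head is untrained here; predictions become meaningful
# only after fine-tuning on a labeled corpus such as the one the paper releases.
print([labels[int(i)] for i in pred])
```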
10. A hybrid strategy to extract metadata from scholarly articles by utilizing support vector machine and heuristics.
- Author
-
Waqas, Muhammad, Anjum, Nadeem, and Afzal, Muhammad Tanvir
- Abstract
The immense growth of online research publications has driven the research community to extract valuable information from scientific resources by exploring online digital libraries and publishers' websites. Metadata stored in a machine-comprehensible form can facilitate a precise search that lists the most related articles by applying semantic queries to a document's metadata and structural elements. Online search engines and digital libraries, however, offer only keyword-based search over full-body text, which produces excessive results. In recent years, the research community has adopted different approaches to extract structural information from research documents. We have distributed the content of an article into two levels, logical layout and metadata. This strategy gives our technique an advantage over the state-of-the-art (SOTA) in extracting metadata from publications with diversified styles. The experimental results reveal that the proposed approach achieves a significant performance gain of 20.26% to 27.14%. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
11. EKSTRAKCIJA METAPODATKOV S POMOČJO STROJNEGA UČENJA [Metadata Extraction Using Machine Learning].
- Author
-
SABADIN, Ivančica
- Abstract
Copyright of Moderna Arhivistika is the property of Maribor Provincial Archives and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2023
- Full Text
- View/download PDF
12. Multi-perspective Approach for Curating and Exploring the History of Climate Change in Latin America within Digital Newspapers.
- Author
-
Vargas-Solar, Genoveva, Zechinelli-Martini, José-Luis, A. Espinosa-Oviedo, Javier, and M. Vilches-Blázquez, Luis
- Abstract
This paper introduces a multi-perspective approach to deal with curation and exploration issues in historical newspapers. It has been implemented in the platform LACLICHEV (Latin American Climate Change Evolution platform). Exploring the history of climate change through digitized newspapers published around two centuries ago introduces four challenges: (1) curating content to track entries describing meteorological events; (2) processing (digging into) colloquial language (and its geographic variations) to extract meteorological events; (3) analyzing newspapers to discover meteorological patterns possibly associated with climate change; and (4) designing tools for exploring the extracted content. LACLICHEV provides tools for curating, exploring, and analyzing historical newspaper articles, their descriptions and locations, and the vocabularies used to refer to meteorological events. This platform makes it possible to understand and identify possible patterns and models that can build an empirical and social view of the history of climate change in the Latin American region. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
13. Using Provenance in Data Analytics for Seismology: Challenges and Directions
- Author
-
da Costa, Umberto Souza, Espinosa-Oviedo, Javier Alfonso, Musicante, Martin A., Vargas-Solar, Genoveva, Zechinelli-Martini, José-Luis, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Chiusano, Silvia, editor, Cerquitelli, Tania, editor, Wrembel, Robert, editor, Nørvåg, Kjetil, editor, Catania, Barbara, editor, Vargas-Solar, Genoveva, editor, and Zumpano, Ester, editor
- Published
- 2022
- Full Text
- View/download PDF
14. Re-purposing Excavation Database Content as Paradata: An Explorative Analysis of Paradata Identification Challenges and Opportunities
- Author
-
Lisa Börjesson, Olle Sköld, Zanna Friberg, Daniel Löwenborg, Gísli Pálsson, and Isto Huvila
- Subjects
metadata, paradata, metadata extraction, data reuse, research data, unstructured data, archaeological data, Bibliography. Library science. Information resources
- Abstract
Although data reusers request information about how research data was created and curated, this information is often non-existent or only briefly covered in data descriptions. The need for such contextual information is particularly critical in fields like archaeology, where old legacy data created during different time periods and through varying methodological framings and fieldwork documentation practices retains its value as an important information source. This article explores the presence of contextual information in archaeological data with a specific focus on data provenance and processing information, i.e., paradata. The purpose of the article is to identify and explicate types of paradata in field observation documentation. The method used is an explorative close reading of field data from an archaeological excavation enriched with geographical metadata. The analysis covers technical and epistemological challenges and opportunities in paradata identification, and discusses the possibility of using identified paradata in data descriptions and for data reliability assessments. Results show that it is possible to identify both knowledge organisation paradata (KOP) relating to data structuring and knowledge-making paradata (KMP) relating to fieldwork methods and interpretative processes. However, while the data contains many traces of the research process, there is an uneven and, in some categories, low level of structure and systematicity that complicates automated metadata and paradata identification and extraction. The results show a need to broaden the understanding of how structure and systematicity are used and how they impact research data in archaeology and comparable field sciences. The insight into how a dataset's KOP and KMP can be read is also a methodological contribution to data literacy research and practice development. On a repository level, the results underline the need to include paradata about dataset creation, purpose, terminology, dataset internal and external relations, and any data colloquialisms that require explanation for reusers.
- Published
- 2022
- Full Text
- View/download PDF
15. Automatic Annotation of Images in Persian Scientific Documents Based on Text Analysis Methods
- Author
-
Azadeh Fakhrzadeh, Mohadeseh Rahnama, and Jalal A Nasiri
- Subjects
image tagging, text analysis, image annotation, image retrieval, metadata extraction, information technology, Bibliography. Library science. Information resources
- Abstract
In this paper, a new method for annotating images in Persian scientific documents is suggested. Images in scientific documents contain valuable information; in many cases, by analyzing the images one can understand the main idea and important results of a document. Due to the explosive growth of image data, automatic image annotation has attracted extensive attention and become one of the growing subjects in the literature. Image annotation is the first step in image retrieval methods, in which descriptive tags are assigned to each image. Here, the text associated with an image is used for annotation: the caption and the part of the document that references the image are considered. Noun phrases in the associated text are ranked by five different methods: term frequency; inverse document frequency; term frequency–inverse document frequency; cosine similarity between word embeddings of noun phrases in the text and the caption; and a combination of the term frequency–inverse document frequency and cosine similarity methods. In every method, the image tags are the noun phrases with the highest rank. The suggested methods are evaluated on test data from the Iran scientific information database (Ganj), the main database of Persian scientific documents. The term frequency–inverse document frequency method gives the best results.
- Published
- 2022
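A minimal sketch of the best-performing variant reported above, TF-IDF ranking of candidate noun phrases from an image's associated text; noun-phrase extraction is stubbed with a fixed candidate list, and the English toy corpus stands in for the Persian Ganj data:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "the proposed segmentation network improves boundary accuracy",
    "figure three shows the segmentation results on the test set",
    "training loss curves for the baseline model",
]
associated_text_index = 1  # caption plus the paragraph referencing the image
candidates = ["segmentation results", "test set", "training loss"]

# Score only the candidate noun phrases, treating each as a vocabulary entry.
vec = TfidfVectorizer(ngram_range=(1, 2), vocabulary=candidates)
row = vec.fit_transform(corpus).toarray()[associated_text_index]
ranked = sorted(zip(vec.get_feature_names_out(), row), key=lambda p: -p[1])
print(ranked[:2])  # the top-ranked noun phrases become the image tags
```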
16. LACLICHEV: Exploring the History of Climate Change in Latin America Within Newspapers Digital Collections
- Author
-
Vargas-Solar, Genoveva, Zechinelli-Martini, José-Luis, Espinosa-Oviedo, Javier A., Vilches-Blázquez, Luis M., Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Bellatreche, Ladjel, editor, Dumas, Marlon, editor, Karras, Panagiotis, editor, Matulevičius, Raimundas, editor, Awad, Ahmed, editor, Weidlich, Matthias, editor, Ivanović, Mirjana, editor, and Hartig, Olaf, editor
- Published
- 2021
- Full Text
- View/download PDF
17. Üstverilerin Derin Öğrenme Algoritmaları Kullanılarak Otomatik Olarak Çıkartılması ve Sınıflanması [Automatic Extraction and Classification of Metadata Using Deep Learning Algorithms]
- Author
-
Murat İnce
- Subjects
üstveri çıkartma, konvolüsyonel sinir ağları, tekrarlayan sinir ağları, metadata extraction, convolutional neural networks, recurrent neural networks, Technology, Engineering (General). Civil engineering (General), TA1-2040, Science, Science (General), Q1-390
- Abstract
Today, the need for digital content has grown with the spread of information technologies. Creating such content is a time-consuming and costly process, and learning objects are used in its creation. For these objects to be discoverable and readable by computers is important for reusability and shareability. For this reason, the objects are used together with metadata that carries their descriptive identity information. The better this metadata is created and classified, the more usable the objects become. Many methods have therefore been developed to extract metadata from objects automatically. In this study, metadata was automatically extracted from the content of learning objects and classified using deep learning methods such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), together with Natural Language Processing (NLP) techniques. The success and accuracy of the system were tested on sample learning objects. The results showed that the system can be used successfully.
- Published
- 2021
- Full Text
- View/download PDF
18. Generating Synthetic Resume Data with Large Language Models for Enhanced Job Description Classification
- Author
-
Panagiotis Skondras, Panagiotis Zervas, and Giannis Tzimas
- Subjects
metadata extraction, resumes, CV, big data, multiclass classification, ChatGPT, Information technology, T58.5-58.64
- Abstract
In this article, we investigate the potential of synthetic resumes as a means for the rapid generation of training data and their effectiveness in data augmentation, especially in categories marked by sparse samples. The widespread implementation of machine learning algorithms in natural language processing (NLP) has notably streamlined the resume classification process, delivering time and cost efficiencies for hiring organizations. However, the performance of these algorithms depends on the abundance of training data. While selecting the right model architecture is essential, it is also crucial to ensure the availability of a robust, well-curated dataset. For many categories in the job market, data sparsity remains a challenge. To deal with this challenge, we employed the OpenAI API to generate both structured and unstructured resumes tailored to specific criteria. These synthetically generated resumes were cleaned, preprocessed, and then utilized to train two distinct models: a transformer model (BERT) and a feedforward neural network (FFNN) that incorporated Universal Sentence Encoder 4 (USE4) embeddings. Both models were evaluated on the multiclass classification of resumes; when trained on an augmented dataset containing 60 percent real data (from the Indeed website) and 40 percent synthetic data from ChatGPT, the transformer model presented exceptional accuracy. The FFNN, predictably, achieved lower accuracy. These findings highlight the value of augmenting real-world data with ChatGPT-generated synthetic resumes, especially in the context of limited training data. The suitability of the BERT model for such classification tasks further reinforces this narrative.
- Published
- 2023
- Full Text
- View/download PDF
19. Automatic Detection of the Boundary between Metadata and Body in Persian Theses using BA_SVM
- Author
-
Mohadese Rahnama, Seyed Mohammad Hossein Hasheminejad, and Jalal A Nasiri
- Subjects
metadata extraction, information extraction, support vector machine (svm), metaheuristic algorithm, bat algorithm (ba), Bibliography. Library science. Information resources
- Abstract
Metadata extraction facilitates the process of indexing and improves information retrieval, and automating this process is more efficient than manual extraction. Examples of thesis metadata are the names of students and professors, title, field, degree, abstract, keywords, etc. In this paper, the aim is automatic detection of the boundary between the metadata and the main body in Persian theses. To this end, 250 theses were collected from the IRANDOC system. Features were extracted from the paragraphs of each thesis, and the paragraphs were then classified with a support vector machine into two classes: metadata and body. In this study, the Bat algorithm is used to tune the SVM parameters. The results reveal that the proposed method predicts the paragraph type with 96.6 percent accuracy.
- Published
- 2021
20. LAME: Layout-Aware Metadata Extraction Approach for Research Articles.
- Author
-
Jongyun Choi, Hyesoo Kong, Hwamook Yoon, Heungseon Oh, and Yuchul Jung
- Subjects
METADATA, SCHOLARLY periodicals, CONFERENCE papers, ACADEMIC conferences
- Abstract
The volume of academic literature, such as academic conference papers and journals, has increased rapidly worldwide, and research on metadata extraction is ongoing. However, high-performing metadata extraction is still challenging due to the diverse layout formats of different journal publishers. To accommodate this diversity, we propose a novel LAyout-aware Metadata Extraction (LAME) framework with three key components: automatic layout analysis, construction of a large metadata training set, and implementation of a metadata extractor. In the framework, we designed an automatic layout analysis using PDFMiner. Based on the layout analysis, a large volume of metadata-separated training data, including the title, abstract, author names, author-affiliated organizations, and keywords, was automatically extracted. Moreover, we constructed a pre-trained model, Layout-MetaBERT, to extract metadata from academic journals with varying layout formats. The experimental results with our metadata extractor exhibited robust performance (Macro-F1, 93.27%) in metadata extraction for unseen journals with different layout formats. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
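A minimal sketch of LAME's first stage, PDFMiner-based layout analysis; the top-of-page title heuristic is an illustrative assumption rather than the paper's trained Layout-MetaBERT, and the filename is hypothetical:

```python
from pdfminer.high_level import extract_pages
from pdfminer.layout import LTTextContainer

def first_page_blocks(pdf_path):
    # Yield (top y-coordinate, text) for each text block on the first page.
    page = next(iter(extract_pages(pdf_path)))
    for element in page:
        if isinstance(element, LTTextContainer):
            yield element.bbox[3], element.get_text().strip()  # bbox[3] = top y

# Crude heuristic: on many layouts the title is the top-most text block.
blocks = sorted(first_page_blocks("article.pdf"), reverse=True)  # hypothetical file
print("candidate title:", blocks[0][1])
```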
21. Understanding Scanner Utilization With Real-Time DICOM Metadata Extraction
- Author
-
Pradeeban Kathiravelu, Ashish Sharma, and Puneet Sharma
- Subjects
Biomedical imaging, digital imaging and communications in medicine (DICOM), metadata extraction, picture archiving and communication system (PACS), scanner utilization, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Understanding system performance metrics ensures better utilization of radiology resources with more targeted interventions. The images produced by radiology scanners typically follow the DICOM (Digital Imaging and Communications in Medicine) standard format. DICOM images contain textual metadata that can be used to calculate key timing parameters, such as exact study durations and scanner utilization. However, hospital networks lack the resources and capabilities to extract the metadata from the images quickly and to automatically compute scanner utilization properties. Thus, they resort to using data records from Radiology Information Systems (RIS). However, data acquired from RIS are prone to human errors, rendering many derived key performance metrics inadequate and inaccurate. Hence, there is motivation to establish real-time image transfer from the Picture Archiving and Communication Systems (PACS) to research clusters, where the DICOM images received from the scanners can be processed to evaluate scanner utilization metrics efficiently and quickly. This paper analyzes scanner utilization by developing a real-time monitoring framework that retrieves radiology images into a research cluster using the DICOM networking protocol and then extracts and processes the metadata from the images. Our proposed approach facilitates a better understanding of scanner utilization across a vast healthcare network by observing properties such as study duration, the interval between encounters, and the series count of studies. Benchmarks against the RIS data indicate that our proposed framework, based on real-time PACS data, estimates scanner utilization more accurately. Furthermore, our framework has been running stably, performing its computations in pseudo real-time on our extensive healthcare network for more than two years.
- Published
- 2021
- Full Text
- View/download PDF
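A minimal sketch of deriving a study duration from DICOM metadata with pydicom, assuming the AcquisitionDate and AcquisitionTime tags are present in the files; the real framework streams images from PACS in real time, which is out of scope here:

```python
from datetime import datetime
import pydicom

def acquisition_time(path):
    ds = pydicom.dcmread(path, stop_before_pixels=True)  # read metadata only
    # DICOM TM values may carry fractional seconds; keep the HHMMSS part.
    stamp = ds.AcquisitionDate + ds.AcquisitionTime.split(".")[0]
    return datetime.strptime(stamp, "%Y%m%d%H%M%S")

paths = ["img001.dcm", "img002.dcm"]  # hypothetical files from one study
times = [acquisition_time(p) for p in paths]
print("study duration:", max(times) - min(times))
```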
22. Dijital Kütüphanelerde Dokümanlardan Bilgi Geri Kazanımı için Kullanılan Güncel Teknolojiler: Derleme Çalışması [Current Technologies for Information Retrieval from Documents in Digital Libraries: A Survey]
- Author
-
Mohamed Amin Abdisamad, Alev Mutlu, Furkan Göz, Öztürk Tüfekçi, Kerem Küçük, and Osman Kabasakal
- Subjects
doküman işleme, üst veri çıkarımı, varlık ismi tanıma, anahtar kelime çıkarımı, doküman benzerliği, document processing, metadata extraction, name entity recognition, keyword extraction, document similarity, Technology, Engineering (General). Civil engineering (General), TA1-2040, Science, Science (General), Q1-390
- Abstract
In recent years, the number of digital information resources available on different topics has grown enormously. Many of the systems that provide access to these digital information resources focus on browsing, searching, and information retrieval tools. Digital libraries, electronic libraries, and Web pages offer many new possibilities for improving information access and for creating and organizing document collections hierarchically according to different key criteria. Different search tools can provide more comprehensive document coverage by using software-based services to organize, index, and summarize the documents accessible through information retrieval techniques. The technologies applied to search mechanisms in digital libraries have made it necessary to use different methods and technologies for managing document collections, extracting meaningful data, and determining relationships between documents. In particular, the relationships between documents cannot be explicitly defined by either their forms or their types. This study presents a comprehensive survey of the methods and techniques used for extracting metadata from document content, recognizing named entities, extracting keywords, and establishing document similarity for digital libraries.
- Published
- 2021
- Full Text
- View/download PDF
23. Text and metadata extraction from scanned Arabic documents using support vector machines.
- Author
-
Qin, Wenda, Elanwar, Randa, and Betke, Margrit
- Subjects
*SUPPORT vector machines, *SUPERVISED learning, *DOCUMENT imaging systems, *METADATA, *IMAGE analysis
- Abstract
Text information in scanned documents becomes accessible only when extracted and interpreted by a text recognizer. For a recognizer to work successfully, it must have detailed location information about the regions of the document images that it is asked to analyse. It needs to focus on page regions containing text, skipping non-text regions such as illustrations or photographs. However, text recognizers do not work as logical analyzers. Logical layout analysis automatically determines the function of a document text region, that is, it labels each region as a title, paragraph, caption, and so on, and is thus an essential part of a document understanding system. In the past, rule-based algorithms were used to conduct logical layout analysis on limited-size datasets. We instead focus on supervised learning methods for logical layout analysis. We describe LABA, a system based on multiple support vector machines that performs logical Layout Analysis of scanned Book pages in Arabic. The system detects the function of a text region based on the analysis of various image features and a voting mechanism. For a baseline comparison, we implemented an older but state-of-the-art neural network method. We evaluated LABA using a dataset of scanned pages from illustrated Arabic books and obtained high recall and precision values. We also found that the F-measure of LABA is higher for five of the six tested classes compared to the state-of-the-art method. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
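A minimal sketch of LABA's multiple-SVM voting idea with scikit-learn; the random vectors stand in for the image features computed from scanned Arabic pages, and the six classes are illustrative:

```python
import numpy as np
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(240, 30))    # stand-in for per-region image features
y = rng.integers(0, 6, size=240)  # six classes: title, paragraph, caption, ...

# Several SVMs with different kernels vote on each region's logical function.
vote = VotingClassifier([
    ("linear", SVC(kernel="linear", probability=True)),
    ("rbf", SVC(kernel="rbf", probability=True)),
    ("poly", SVC(kernel="poly", degree=3, probability=True)),
], voting="soft")

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
vote.fit(X_tr, y_tr)
print(f"accuracy: {vote.score(X_te, y_te):.2f}")  # near chance on random data
```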
24. ارائة روشی برای برچسب زدن تصاویر موجود در... [A Method for Tagging Images in...]
- Author
-
آزاده فخرزاده, محدثه رهنما, and جلال‌الدین نصیری
- Subjects
SCIENCE databases, NOUN phrases (Grammar), IMAGE retrieval, ANNOTATIONS, INFORMATION technology
- Abstract
In this paper, a new method for annotating images in Persian scientific documents is suggested. Images in scientific documents contain valuable information; in many cases, by analyzing the images one can understand the main idea and important results of a document. Due to the explosive growth of image data, automatic image annotation has attracted extensive attention and become one of the growing subjects in the literature. Image annotation is the first step in image retrieval methods, in which descriptive tags are assigned to each image. Here, the text associated with an image is used for annotation: the caption and the part of the document that references the image are considered. Noun phrases in the associated text are ranked by five different methods: term frequency; inverse document frequency; term frequency–inverse document frequency; cosine similarity between word embeddings of noun phrases in the text and the caption; and a combination of the term frequency–inverse document frequency and cosine similarity methods. In every method, the image tags are the noun phrases with the highest rank. The suggested methods are evaluated on test data from the Iran scientific information database (Ganj), the main database of Persian scientific documents. The term frequency–inverse document frequency method gives the best results. [ABSTRACT FROM AUTHOR]
- Published
- 2022
25. A Structure-Based Method for Building a Database of Extracted Figures from Scientific Documents: A Case Study of Iran Scientific Information Database (GANJ)
- Author
-
Azadeh Fakhrzadeh and Amir Hossein Seddighi
- Subjects
image processing, image extraction, metadata extraction, information technology, Bibliography. Library science. Information resources
- Abstract
Figures in scientific documents are a rich source of information, and the first step in retrieving information from such figures is to build a valid figure database. To this end, we developed a system for generating a figure database from scholarly Persian documents on a large scale. The first step is to parse the files and extract figures and their corresponding descriptions. There are two general approaches to extracting figures from documents: one is based on image processing methods, and the other is based on processing the file primitives. The focus of this paper is on the latter, which has been shown to be the better choice for search engines because of its speed and scalability. We propose a structure-based method that extracts figures and their descriptions by analyzing the file layout. This information is saved in a database with a specific structure and is indexed for retrieval by the search engine. The proposed algorithm was implemented in the Python programming language. As a benchmark, we used the basic method in the literature, which is based on processing the PDF file. We employed the proposed method in a case study on the Iran scientific information database (Ganj): 150 scientific documents were randomly chosen from the Ganj database and analyzed using the two methods. Based on our experimental results, the proposed method is more efficient than the basic method, especially for Persian documents. Many challenges remain unresolved for Persian documents when using the basic method: the number of noise images produced by the basic method is high, and the extracted Persian text is not well organized. Our proposed method overcomes some of these drawbacks and is recommended for generating figure databases from scientific Persian documents. It correctly extracts about 40% of the images with their corresponding descriptions, which is 10% better than the basic method.
- Published
- 2020
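A minimal sketch of structure-based figure extraction from a PDF's primitives, using PyMuPDF as a stand-in for the paper's own Python implementation; the caption-prefix heuristic and the filename are assumptions:

```python
import fitz  # PyMuPDF

doc = fitz.open("thesis.pdf")  # hypothetical Persian document
for page in doc:
    # Caption candidates: text blocks starting with a figure marker.
    captions = [b[4].strip() for b in page.get_text("blocks")
                if b[4].strip().startswith(("Figure", "Fig.", "شکل"))]
    for i, img in enumerate(page.get_images(full=True)):
        pix = fitz.Pixmap(doc, img[0])   # img[0] is the image xref
        if pix.n >= 4:                    # CMYK or alpha: convert to RGB first
            pix = fitz.Pixmap(fitz.csRGB, pix)
        pix.save(f"page{page.number}_img{i}.png")
    if captions:
        print(page.number, captions)
```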
26. An open-source framework for neuroscience metadata management applied to digital reconstructions of neuronal morphology
- Author
-
Kayvan Bijari, Masood A. Akram, and Giorgio A. Ascoli
- Subjects
Neuroscience curation, Metadata extraction, Knowledge engineering, Data sharing, Information management tools, Neuronal morphology, Computer applications to medicine. Medical informatics, R858-859.7, Computer software, QA76.75-76.765
- Abstract
Research advancements in neuroscience entail the production of a substantial amount of data requiring interpretation, analysis, and integration. The complexity and diversity of neuroscience data necessitate the development of specialized databases and associated standards and protocols. NeuroMorpho.Org is an online repository of over one hundred thousand digitally reconstructed neurons and glia shared by hundreds of laboratories worldwide. Every entry of this public resource is associated with essential metadata describing animal species, anatomical region, cell type, experimental condition, and additional information relevant to contextualize the morphological content. Until recently, the lack of a user-friendly, structured metadata annotation system relying on standardized terminologies constituted a major hindrance in this effort, limiting the data release pace. Over the past 2 years, we have transitioned the original spreadsheet-based metadata annotation system of NeuroMorpho.Org to a custom-developed, robust, web-based framework for extracting, structuring, and managing neuroscience information. Here we release the metadata portal publicly and explain its functionality to enable usage by data contributors. This framework facilitates metadata annotation, improves terminology management, and accelerates data sharing. Moreover, its open-source development provides the opportunity of adapting and extending the code base to other related research projects with similar requirements. This metadata portal is a beneficial web companion to NeuroMorpho.Org which saves time, reduces errors, and aims to minimize the barrier for direct knowledge sharing by domain experts. The underlying framework can be progressively augmented with the integration of increasingly autonomous machine intelligence components.
- Published
- 2020
- Full Text
- View/download PDF
27. PARDA: A Dataset for Scholarly PDF Document Metadata Extraction Evaluation
- Author
-
Fan, Tiantian, Liu, Junming, Qiu, Yeliang, Jiang, Congfeng, Zhang, Jilin, Zhang, Wei, Wan, Jian, Akan, Ozgur, Series Editor, Bellavista, Paolo, Series Editor, Cao, Jiannong, Series Editor, Coulson, Geoffrey, Series Editor, Dressler, Falko, Series Editor, Ferrari, Domenico, Series Editor, Gerla, Mario, Series Editor, Kobayashi, Hisashi, Series Editor, Palazzo, Sergio, Series Editor, Sahni, Sartaj, Series Editor, Shen, Xuemin (Sherman), Series Editor, Stan, Mircea, Series Editor, Xiaohua, Jia, Series Editor, Zomaya, Albert Y., Series Editor, Gao, Honghao, editor, Wang, Xinheng, editor, Yin, Yuyu, editor, and Iqbal, Muddesar, editor
- Published
- 2019
- Full Text
- View/download PDF
28. Selection Methods for Geodata Visualization of Metadata Extracted from Unstructured Digital Data for Scientific Heritage Studies
- Author
-
Prokudin, Dmitry, Levit, Georgy, Hossfeld, Uwe, Barbosa, Simone Diniz Junqueira, Editorial Board Member, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Kotenko, Igor, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Alexandrov, Daniel A., editor, Boukhanovsky, Alexander V., editor, Chugunov, Andrei V., editor, Kabanov, Yury, editor, Koltsova, Olessia, editor, and Musabirov, Ilya, editor
- Published
- 2019
- Full Text
- View/download PDF
29. Semantic Representation of Scientific Publications
- Author
-
Vahdati, Sahar, Fathalla, Said, Auer, Sören, Lange, Christoph, Vidal, Maria-Esther, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Doucet, Antoine, editor, Isaac, Antoine, editor, Golub, Koraljka, editor, Aalberg, Trond, editor, and Jatowt, Adam, editor
- Published
- 2019
- Full Text
- View/download PDF
30. Üstverilerin Derin Öğrenme Algoritmaları Kullanılarak Otomatik Olarak Çıkartılması ve Sınıflanması [Automatic Extraction and Classification of Metadata Using Deep Learning Algorithms].
- Author
-
İNCE, Murat
- Abstract
Copyright of Duzce University Journal of Science & Technology is the property of Duzce University Journal of Science & Technology and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2021
- Full Text
- View/download PDF
31. FLAG-PDFe: Features Oriented Metadata Extraction Framework for Scientific Publications
- Author
-
Muhammad Waqas Ahmed and Muhammad Tanvir Afzal
- Subjects
Machine learning, research article, metadata extraction, text patterns, document structure analysis, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
The unprecedented growth of research publications in diversified domains has overwhelmed the research community; extracting this enormous information by manually analyzing research documents is a cumbersome process. To automatically extract the content of a document in a structured way, metadata and content must be annotated. The scientific community has been focusing on automatic content extraction by forming different heuristics and applying different machine learning techniques. ESWC, one of the renowned conference organizers, runs a state-of-the-art challenge on extracting metadata such as authors, affiliations, countries in affiliations, supplementary material, sections, tables, figures, funding agencies, and EU-funded projects from the PDF files of research articles. We have proposed a feature-centric technique that can extract the logical layout structure of articles from publishers with diversified composition styles. To extract unique metadata from a research article placed in the logical layout structure, we have developed a novel four-stage approach, FLAG-PDFe. The approach is built upon distinct and generic features based on the textual and geometric information in the raw content of research documents. In the first stage, the distinct features are used to identify the different physical layout components of an individual article. Since research journals follow their own publishing styles and layout formats, we develop generic features in the second stage to handle these diversified publishing patterns. In the third stage, after a comprehensive evaluation of the generic features and machine learning models, we employ support vector classification (SVC) to extract the logical layout structure (LLS), i.e., the sections of an article. Finally, we apply heuristics to the LLS to extract the desired metadata of an article. On the gold-standard dataset, the approach yields a recall of 0.877, a precision of 0.928, and an F-measure of 0.897, a 16% gain in F-measure over the best approach of the ESWC challenge.
- Published
- 2020
- Full Text
- View/download PDF
32. Reference Metadata Extraction from Korean Research Papers
- Author
-
Seol, Jae-Wook, Choi, Won-Jun, Jeong, Hee-Seok, Hwang, Hye-Kyong, Yoon, Hwa-Mook, Hutchison, David, Series Editor, Kanade, Takeo, Series Editor, Kittler, Josef, Series Editor, Kleinberg, Jon M., Series Editor, Mattern, Friedemann, Series Editor, Mitchell, John C., Series Editor, Naor, Moni, Series Editor, Pandu Rangan, C., Series Editor, Steffen, Bernhard, Series Editor, Terzopoulos, Demetri, Series Editor, Tygar, Doug, Series Editor, Groza, Adrian, editor, and Prasath, Rajendra, editor
- Published
- 2018
- Full Text
- View/download PDF
33. A Metadata Extractor for Books in a Digital Library
- Author
-
Akhtar, Sk. Simran, Sanyal, Debarshi Kumar, Chattopadhyay, Samiran, Bhowmick, Plaban Kumar, Das, Partha Pratim, Hutchison, David, Series Editor, Kanade, Takeo, Series Editor, Kittler, Josef, Series Editor, Kleinberg, Jon M., Series Editor, Mattern, Friedemann, Series Editor, Mitchell, John C., Series Editor, Naor, Moni, Series Editor, Pandu Rangan, C., Series Editor, Steffen, Bernhard, Series Editor, Terzopoulos, Demetri, Series Editor, Tygar, Doug, Series Editor, Weikum, Gerhard, Series Editor, Dobreva, Milena, editor, Hinze, Annika, editor, and Žumer, Maja, editor
- Published
- 2018
- Full Text
- View/download PDF
34. Metadata Extraction for Scientific Papers
- Author
-
Meng, Binjie, Hou, Lei, Yang, Erhong, Li, Juanzi, Hutchison, David, Series Editor, Kanade, Takeo, Series Editor, Kittler, Josef, Series Editor, Kleinberg, Jon M., Series Editor, Mattern, Friedemann, Series Editor, Mitchell, John C., Series Editor, Naor, Moni, Series Editor, Pandu Rangan, C., Series Editor, Steffen, Bernhard, Series Editor, Terzopoulos, Demetri, Series Editor, Tygar, Doug, Series Editor, Weikum, Gerhard, Series Editor, Sun, Maosong, editor, Liu, Ting, editor, Wang, Xiaojie, editor, Liu, Zhiyuan, editor, and Liu, Yang, editor
- Published
- 2018
- Full Text
- View/download PDF
35. Like a rainbow in the dark: metadata annotation for HPC applications in the age of dark data.
- Author
-
Schembera, Björn
- Subjects
*METADATA, *MIDDLE Ages, *DATA management, *RAINBOWS, *ANNOTATIONS
- Abstract
The deluge of dark data is about to happen. Lacking data management capabilities, especially in the field of supercomputing, and missing data documentation (i.e., missing metadata annotation) constitute a major source of dark data. The present work contributes to addressing this challenge by presenting ExtractIng, a generic automated metadata extraction toolkit. Existing metadata of simulation output files scattered through the file system can be aggregated, parsed, and converted to the EngMeta metadata model. Use cases from computational engineering are considered to demonstrate the viability of ExtractIng. The evaluation results show that the metadata extraction is simulation-code independent in the sense that it can handle data outputs from various fields of science, is easy to integrate into simulation workflows, and is compatible with a multitude of computational environments. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
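A minimal sketch of ExtractIng-style harvesting of metadata scattered through simulation output files; the "key = value" log format and the directory name are assumptions, and mapping the result onto the EngMeta model is omitted:

```python
import re
from pathlib import Path

PATTERN = re.compile(r"^\s*(\w+)\s*=\s*(.+?)\s*$")

def harvest(root):
    # Walk the run directory and collect "key = value" pairs from log files.
    metadata = {}
    for path in Path(root).rglob("*.log"):
        for line in path.read_text(errors="ignore").splitlines():
            m = PATTERN.match(line)
            if m:
                metadata.setdefault(m.group(1), m.group(2))
    return metadata

print(harvest("simulation_run_042"))  # e.g. {'solver': 'cfd3d', 'timestep': '1e-5'}
```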
36. Automatic Detection of the Boundary between Metadata and Body in Persian Theses using BA_SVM.
- Author
-
Rahnama, Mohadese, Hossein Hasheminejad, Seyed Mohammad, and Nasiri, Jalal A.
- Subjects
METADATA, SUPPORT vector machines, INFORMATION retrieval, ALGORITHMS
- Abstract
Metadata extraction facilitates the process of indexing and improves information retrieval, and automating this process is more efficient than manual extraction. Examples of thesis metadata are the names of students and professors, title, field, degree, abstract, keywords, etc. In this paper, the aim is automatic detection of the boundary between the metadata and the main body in Persian theses. To this end, 250 theses were collected from the IRANDOC system. Features were extracted from the paragraphs of each thesis, and the paragraphs were then classified with a support vector machine into two classes: metadata and body. In this study, the Bat algorithm is used to tune the SVM parameters. The results reveal that the proposed method predicts the paragraph type with 96.6 percent accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
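A highly simplified, bat-algorithm-style search over the SVM hyperparameters (C, gamma), sketching the BA_SVM idea above; loudness and pulse-rate updates are omitted, and random data stands in for the thesis paragraph features:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = rng.integers(0, 2, size=200)  # metadata vs. body paragraphs

def fitness(pos):
    C, gamma = 10.0 ** pos  # bat positions live in log10([C, gamma]) space
    return cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=3).mean()

low, high = np.array([-2.0, -4.0]), np.array([2.0, 0.0])
pos = rng.uniform(low, high, size=(10, 2))  # 10 bats
vel = np.zeros_like(pos)
fits = [fitness(p) for p in pos]
best, best_fit = pos[int(np.argmax(fits))].copy(), max(fits)

for _ in range(15):
    freq = rng.uniform(0, 1, size=(10, 1))  # frequency-driven movement
    vel += (pos - best) * freq
    pos = np.clip(pos + vel, low, high)
    for p in pos:
        f = fitness(p)
        if f > best_fit:
            best, best_fit = p.copy(), f

print("best (C, gamma):", 10.0 ** best, "CV accuracy:", round(best_fit, 3))
```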
37. An Efficient Framework for Algorithmic Metadata Extraction over Scholarly Documents Using Deep Neural Networks
- Author
-
Raghavendra Nayaka, P. and Ranjan, Rajeev
- Published
- 2023
- Full Text
- View/download PDF
38. Metadata Extraction Analysis: A Review of Video Data in Effect to Social Media Compression
- Author
-
Dawn Iris Calibo and Jasmin D. Niguidula
- Subjects
video compression, metadata extraction, video analysis, social media, Computer software, QA76.75-76.765
- Abstract
In the 21st century, with the continual improvement of the web, online social networks have adopted techniques to streamline the loading speed of their pages. Through metadata extraction, two of the most well-known social networking sites were examined to observe the effects of video compression on files uploaded to them with standard parameters. The assessment demonstrates that the two sites exhibit key similarities and differences. The research further explains the structural differences between the social networks that produce these outcomes.
- Published
- 2019
- Full Text
- View/download PDF
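A minimal sketch of extracting container and stream metadata from an uploaded video with ffprobe (assumes ffmpeg is installed); comparing these fields before and after a platform re-encodes a file reveals its compression choices:

```python
import json
import subprocess

def probe(path):
    # Ask ffprobe for JSON-formatted container and stream metadata.
    out = subprocess.run(
        ["ffprobe", "-v", "quiet", "-print_format", "json",
         "-show_format", "-show_streams", path],
        capture_output=True, text=True, check=True).stdout
    return json.loads(out)

info = probe("uploaded_video.mp4")  # hypothetical file
video = next(s for s in info["streams"] if s["codec_type"] == "video")
print(video["codec_name"], video["width"], video["height"],
      info["format"]["bit_rate"])
```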
39. Video Summarization Framework for Newscasts and Reports – Work in Progress
- Author
-
Leszczuk, Mikołaj, Grega, Michał, Koźbiał, Arian, Gliwski, Jarosław, Wasieczko, Krzysztof, Smaïli, Kamel, Barbosa, Simone Diniz Junqueira, Series editor, Chen, Phoebe, Series editor, Filipe, Joaquim, Series editor, Kotenko, Igor, Series editor, Sivalingam, Krishna M., Series editor, Washio, Takashi, Series editor, Yuan, Junsong, Series editor, Zhou, Lizhu, Series editor, Dziech, Andrzej, editor, and Czyżewski, Andrzej, editor
- Published
- 2017
- Full Text
- View/download PDF
40. ارائة روشی ساختارمحور برای ایجاد پایگاه د... [A Structure-Based Method for Creating a Database of...]
- Author
-
آزاده فخرزاده and امیرحسین صدیقی
- Subjects
SCIENCE databases, INFORMATION resources, ALGORITHMS, IMAGE processing, SEARCH engines, PYTHON programming language
- Abstract
Figures in scientific documents are rich sources of information, and the first step in retrieving information from such figures is to build a valid figure database. To this end, we developed a system for generating a figure database from scholarly Persian documents on a large scale. The first step is to parse the files and extract figures and their corresponding descriptions. There are two general approaches to extracting figures from documents: one is based on image processing methods, and the other is based on processing the file primitives. The focus of this paper is on the latter, which has been shown to be the better choice for search engines because of its speed and scalability. We propose a structure-based method that extracts figures and their descriptions by analyzing the file layout. This information is saved in a database with a specific structure and is indexed for retrieval by the search engine. The proposed algorithm was implemented in the Python programming language. As a benchmark, we used the basic method in the literature, which is based on processing the PDF file. We employed the proposed method in a case study on the Iran scientific information database (Ganj): 150 scientific documents were randomly chosen from the Ganj database and analyzed using the two methods. Based on our experimental results, the proposed method is more efficient than the basic method, especially for Persian documents. Many challenges remain unresolved for Persian documents when using the basic method: the number of noise images produced by the basic method is high, and the extracted Persian text is not well organized. Our proposed method overcomes some of these drawbacks and is recommended for generating figure databases from scientific Persian documents. It correctly extracts about 40% of the images with their corresponding descriptions, which is 10% better than the basic method. [ABSTRACT FROM AUTHOR]
- Published
- 2020
41. An open-source framework for neuroscience metadata management applied to digital reconstructions of neuronal morphology.
- Author
-
Bijari, Kayvan, Akram, Masood A., and Ascoli, Giorgio A.
- Subjects
NEUROSCIENCES, ANIMAL species, METADATA, ARTIFICIAL intelligence, MORPHOLOGY, DATABASE design
- Abstract
Research advancements in neuroscience entail the production of a substantial amount of data requiring interpretation, analysis, and integration. The complexity and diversity of neuroscience data necessitate the development of specialized databases and associated standards and protocols. NeuroMorpho.Org is an online repository of over one hundred thousand digitally reconstructed neurons and glia shared by hundreds of laboratories worldwide. Every entry of this public resource is associated with essential metadata describing animal species, anatomical region, cell type, experimental condition, and additional information relevant to contextualize the morphological content. Until recently, the lack of a user-friendly, structured metadata annotation system relying on standardized terminologies constituted a major hindrance in this effort, limiting the data release pace. Over the past 2 years, we have transitioned the original spreadsheet-based metadata annotation system of NeuroMorpho.Org to a custom-developed, robust, web-based framework for extracting, structuring, and managing neuroscience information. Here we release the metadata portal publicly and explain its functionality to enable usage by data contributors. This framework facilitates metadata annotation, improves terminology management, and accelerates data sharing. Moreover, its open-source development provides the opportunity of adapting and extending the code base to other related research projects with similar requirements. This metadata portal is a beneficial web companion to NeuroMorpho.Org which saves time, reduces errors, and aims to minimize the barrier for direct knowledge sharing by domain experts. The underlying framework can be progressively augmented with the integration of increasingly autonomous machine intelligence components. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
42. Architectural Challenges of Genotype-Phenotype Data Management
- Author
-
Chlebiej, Michał, Habela, Piotr, Rutkowski, Andrzej, Szulc, Iwona, Wiśniewski, Piotr, Stencel, Krzysztof, Diniz Junqueira Barbosa, Simone, Series editor, Chen, Phoebe, Series editor, Du, Xiaoyong, Series editor, Filipe, Joaquim, Series editor, Kara, Orhun, Series editor, Liu, Ting, Series editor, Kotenko, Igor, Series editor, Sivalingam, Krishna M., Series editor, Washio, Takashi, Series editor, Kozielski, Stanisław, editor, Mrozek, Dariusz, editor, Kasprowski, Paweł, editor, Małysiak-Mrozek, Bożena, editor, and Kostrzewa, Daniel, editor
- Published
- 2016
- Full Text
- View/download PDF
43. Extracting enhanced artificial intelligence model metadata from software repositories
- Author
-
Tsay, Jason, Braz, Alan, Hirzel, Martin, Shinnar, Avraham, and Mummert, Todd
- Published
- 2022
- Full Text
- View/download PDF
44. Metadata Extraction from Conference Proceedings Using Template-Based Approach
- Author
-
Kovriguina, Liubov, Shipilo, Alexander, Kozlov, Fedor, Kolchin, Maxim, Cherny, Eugene, Diniz Junqueira Barbosa, Simone, Series editor, Chen, Phoebe, Series editor, Du, Xiaoyong, Series editor, Filipe, Joaquim, Series editor, Kara, Orhun, Series editor, Kotenko, Igor, Series editor, Liu, Ting, Series editor, Sivalingam, Krishna M., Series editor, Washio, Takashi, Series editor, Gandon, Fabien, editor, Cabrio, Elena, editor, Stankovic, Milan, editor, and Zimmermann, Antoine, editor
- Published
- 2015
- Full Text
- View/download PDF
45. Discovering the Topical Evolution of the Digital Library Evaluation Community
- Author
-
Papachristopoulos, Leonidas, Kleidis, Nikos, Sfakakis, Michalis, Tsakonas, Giannis, Papatheodorou, Christos, Diniz Junqueira Barbosa, Simone, Series editor, Chen, Phoebe, Series editor, Du, Xiaoyong, Series editor, Filipe, Joaquim, Series editor, Kotenko, Igor, Series editor, Liu, Ting, Series editor, Sivalingam, Krishna M., Series editor, Washio, Takashi, Series editor, Garoufallou, Emmanouel, editor, Hartley, Richard J., editor, and Gaitanou, Panorea, editor
- Published
- 2015
- Full Text
- View/download PDF
46. Bookmarklet-Triggered Literature Metadata Extraction System Using Cloud Plugins
- Author
-
Ma, Kun, Abraham, Ajith, Kacprzyk, Janusz, Series editor, Abraham, Ajith, editor, Muda, Azah Kamilah, editor, and Choo, Yun-Huoy, editor
- Published
- 2015
- Full Text
- View/download PDF
47. Exploring LOD through metadata extraction and data-driven visualizations
- Author
-
Oscar Peña, Unai Aguilera, and Diego López-de-Ipiña
- Published
- 2016
- Full Text
- View/download PDF
48. Metadata Extraction and Management in Data Lakes With GEMMS
- Author
-
Christoph Quix, Rihan Hai, and Ivan Vatov
- Subjects
Metadata management, data integration, scientific data, metadata extraction, data lakes, Information technology, T58.5-58.64
- Abstract
In addition to volume and velocity, Big Data is also characterized by its variety. Variety in structure and semantics requires new integration approaches that can resolve the integration challenges even for large volumes of data. Data lakes should reduce upfront integration costs and provide a more flexible way for data integration and analysis, as source data is loaded into the data lake repository in its original structure. Some syntactic transformation might be applied to enable access to the data in one common repository; however, deep semantic integration is done only after the initial loading of the data into the data lake. Thereby, data is easily made available and can be restructured, aggregated, and transformed as required by later applications. Metadata management is a crucial component in a data lake, as the source data needs to be described by metadata to capture its semantics. We developed a Generic and Extensible Metadata Management System for data lakes (called GEMMS) that aims at the automatic extraction of metadata from a wide variety of data sources. Furthermore, the metadata is managed in an extensible metamodel that distinguishes structural and semantic metadata. The use case applied for evaluation is from the life science domain, where the data is often stored only in files, which hinders data access and efficient querying. The GEMMS framework has proven to be useful in this domain. In particular, the extensibility and flexibility of the framework are important, as data and metadata structures in scientific experiments cannot be defined a priori.
- Published
- 2016
- Full Text
- View/download PDF
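A minimal sketch of a metamodel separating structural from semantic metadata, loosely following the distinction GEMMS draws; the field names are illustrative, not the published metamodel:

```python
from dataclasses import dataclass, field

@dataclass
class StructuralMetadata:
    file_format: str
    fields: dict[str, str]  # attribute name -> data type

@dataclass
class SemanticMetadata:
    domain_terms: list[str] = field(default_factory=list)  # e.g. ontology concepts

@dataclass
class DataLakeEntry:
    source_path: str
    structure: StructuralMetadata
    semantics: SemanticMetadata

# One entry per ingested source file; extraction would populate these fields.
entry = DataLakeEntry(
    "experiments/assay_12.csv",
    StructuralMetadata("csv", {"sample_id": "str", "concentration": "float"}),
    SemanticMetadata(["enzyme assay", "concentration"]),
)
print(entry.structure.fields)
```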
49. Mathematical Content Semantic Markup Methods and Open Scientific E-Journals Management Systems
- Author
-
Elizarov, Alexander, Lipachev, Evgeny, Zuev, Denis, Junqueira Barbosa, Simone Diniz, Series editor, Chen, Phoebe, Series editor, Cuzzocrea, Alfredo, Series editor, Du, Xiaoyong, Series editor, Filipe, Joaquim, Series editor, Kara, Orhun, Series editor, Kotenko, Igor, Series editor, Sivalingam, Krishna M., Series editor, Ślęzak, Dominik, Series editor, Washio, Takashi, Series editor, Yang, Xiaokang, Series editor, Klinov, Pavel, editor, and Mouromtsev, Dmitry, editor
- Published
- 2014
- Full Text
- View/download PDF
50. MetaExtractor: A System for Metadata Extraction from Structured Data Sources
- Author
-
Pomares-Quimbaya, Alexandra, Torres-Moreno, Miguel Eduardo, Roldán, Fabián, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Cuzzocrea, Alfredo, editor, Kittl, Christian, editor, Simos, Dimitris E., editor, Weippl, Edgar, editor, and Xu, Lida, editor
- Published
- 2013
- Full Text
- View/download PDF