197 results for "metadata extraction"
Search Results
2. Automatic Extraction and Cluster Analysis of Natural Disaster Metadata Based on the Unified Metadata Framework.
- Author
-
Wang, Zongmin, Shi, Xujie, Yang, Haibo, Yu, Bo, and Cai, Yingchun
- Subjects
*NATURAL disasters, *CLUSTER analysis (Statistics), *INFORMATION technology, *METADATA, *DATA mining, *NATURAL resources
- Abstract
The development of information technology has led to massive, multidimensional, and heterogeneously sourced disaster data. However, there is currently no universal metadata standard for managing natural disasters. Common pre-trained models for information extraction require extensive training data and show limited effectiveness when annotated resources are scarce. This study establishes a unified natural disaster metadata standard, utilizes self-trained universal information extraction (UIE) models and Python libraries to extract metadata stored in both structured and unstructured forms, and analyzes the results using the Word2vec-Kmeans clustering algorithm. The results show that (1) the self-trained UIE model, with a learning rate of 3 × 10⁻⁴ and a batch_size of 32, significantly improves extraction results for various natural disasters by over 50%. Our optimized UIE model outperforms many other extraction methods in terms of precision, recall, and F1 scores. (2) The quality assessments of consistency, completeness, and accuracy for ten tables all exceed 0.80, with variances between the three dimensions of 0.04, 0.03, and 0.05. The overall evaluation of the tables' data items also exceeds 0.80, consistent with the results at the table level. The metadata model framework constructed in this study demonstrates high-quality stability. (3) Taking the flood dataset as an example, clustering reveals five main themes with high similarity within clusters, and the differences between clusters are significant relative to the differences within clusters at a significance level of 0.01. Overall, this experiment supports effective sharing of disaster data resources and enhances natural disaster emergency response efficiency. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
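The record above pairs a UIE extractor with Word2vec-KMeans clustering of the extracted metadata. A minimal sketch of that clustering stage, assuming gensim and scikit-learn; the tokenizer, vector size, and toy corpus are illustrative stand-ins, not the authors' configuration:

```python
import numpy as np
from gensim.models import Word2Vec
from sklearn.cluster import KMeans

# Toy metadata descriptions standing in for the extracted disaster records.
docs = [
    "flood warning issued for the river basin",
    "heavy rainfall caused flash flooding downtown",
    "earthquake damaged several bridges in the region",
    "aftershocks followed the earthquake overnight",
]
tokenized = [d.lower().split() for d in docs]

# Train a small Word2vec model on the metadata text.
w2v = Word2Vec(tokenized, vector_size=100, window=5, min_count=1, seed=1)

def doc_vector(tokens):
    # Represent each record as the mean of its word vectors.
    vecs = [w2v.wv[t] for t in tokens if t in w2v.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(w2v.vector_size)

X = np.vstack([doc_vector(t) for t in tokenized])

# The abstract reports five themes for the flood dataset; two suffice here.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)
```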
3. Deep Neural Networks for Automated Metadata Extraction
- Author
-
El Omari, Abdellah, Antari, Jilali, Elkina, Hamza, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Motahhir, Saad, editor, and Bossoufi, Badre, editor
- Published
- 2024
- Full Text
- View/download PDF
4. Real-Time Security Risk Assessment From CCTV Using Hand Gesture Recognition
- Author
-
Murat Koca
- Subjects
CCTV footage, deep learning, cyber security, hand gesture recognition, media-pipe, metadata extraction, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Closed-Circuit Television (CCTV) surveillance systems, long associated with physical security, become even more valuable when combined with cybersecurity measures. Combining traditional surveillance with cyber defenses is a flexible method for protecting against both physical and digital dangers. This study introduces the use of convolutional neural networks (CNNs) and hand gesture detection on CCTV data to perform real-time security risk assessments. The suggested method's emphasis on automated extraction of key information, such as identity and behavior, illustrates its special use in silent or acoustically challenging settings. This study uses deep learning techniques to develop a novel approach for detecting hand gestures in CCTV images, automatically extracting relevant features using the MediaPipe architecture. For instance, it facilitates risk assessment through hand gestures in noisy environments or muted audio streams. Given this method's uniqueness and efficiency, the suggested solution can alert the appropriate authorities in the event of a security breach. There is considerable opportunity to develop applications across security, law enforcement, and public safety, in settings such as shopping malls, educational institutions, transportation, and the armed forces, and in scenarios such as theft and abduction.
- Published
- 2024
- Full Text
- View/download PDF
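A minimal sketch of the landmark-extraction stage described above, using the MediaPipe Hands solution with OpenCV; the paper's downstream gesture classifier is not public and is omitted, and the input filename is hypothetical:

```python
import cv2
import mediapipe as mp

hands = mp.solutions.hands.Hands(
    static_image_mode=False, max_num_hands=2, min_detection_confidence=0.5)

cap = cv2.VideoCapture("cctv_feed.mp4")  # hypothetical CCTV recording
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # MediaPipe expects RGB input; OpenCV decodes frames as BGR.
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    for lm in results.multi_hand_landmarks or []:
        # Each detected hand yields 21 (x, y, z) landmarks; a downstream
        # classifier would map these to gesture classes such as "help".
        print(len(lm.landmark), "landmarks detected")
cap.release()
hands.close()
```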
5. Generic features selection for structure classification of diverse styled scholarly articles.
- Author
-
Waqas, Muhammad and Anjum, Nadeem
- Abstract
The enormous growth of online research publications in diversified domains has driven the research community to mine these valuable scientific resources by searching online digital libraries and publishers' websites. A precise search that lists the most related articles requires applying semantic queries to a document's metadata and structural elements. Online search engines and digital libraries, however, offer only keyword-based search over full-body text, which produces excessive results. Therefore, a research article's structural and metadata information has to be stored in machine-comprehensible form by online research publishers. In recent years, the research community has adopted different approaches to extract structural information from research documents, such as rule-based heuristics and machine-learning-based approaches. Studies suggest that machine-learning-based techniques produce optimal results for document structure extraction from publishers with diversified publication layouts. In this paper, we have proposed thirteen different logical layout structural (LLS) components. We have identified an innovative two-stage set of generic features associated with the LLS. This approach gives our technique an advantage over the state-of-the-art for structural classification of digital scientific articles with diversified publication styles. We applied chi-square (χ²) for feature selection, and the final results revealed that an SVM (kernel function) produced the optimal result, with an overall F-measure of 0.95. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
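A minimal sketch of the chi-square feature selection plus SVM pipeline the abstract describes, using scikit-learn; the random count matrix stands in for the paper's block-level features, and the 13 classes mirror the proposed LLS components:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Random non-negative counts stand in for layout features
# (chi2 requires non-negative values); 13 classes mirror the LLS components.
rng = np.random.default_rng(0)
X = rng.integers(0, 10, size=(300, 120)).astype(float)
y = rng.integers(0, 13, size=300)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
pipe = make_pipeline(SelectKBest(chi2, k=40), SVC(kernel="rbf"))
pipe.fit(X_tr, y_tr)
print(f"accuracy: {pipe.score(X_te, y_te):.2f}")  # near chance on random data
```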
6. Event and Entity Extraction from Generated Video Captions
- Author
-
Scherer, Johannes, Bhowmik, Deepayan, Scherp, Ansgar, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Holzinger, Andreas, editor, Kieseberg, Peter, editor, Cabitza, Federico, editor, Campagner, Andrea, editor, Tjoa, A Min, editor, and Weippl, Edgar, editor
- Published
- 2023
- Full Text
- View/download PDF
7. Deep Learning Approaches for Big Data-Driven Metadata Extraction in Online Job Postings.
- Author
-
Skondras, Panagiotis, Zotos, Nikos, Lagios, Dimitris, Zervas, Panagiotis, Giotopoulos, Konstantinos C., and Tzimas, Giannis
- Subjects
*JOB postings, *DEEP learning, *MACHINE learning, *FEEDFORWARD neural networks, *LANGUAGE models, *METADATA, *JOB classification
- Abstract
This article presents a study on the multi-class classification of job postings using machine learning algorithms. With the growth of online job platforms, there has been an influx of labor market data. Machine learning, particularly NLP, is increasingly used to analyze and classify job postings. However, the effectiveness of these algorithms largely hinges on the quality and volume of the training data. In our study, we propose a multi-class classification methodology for job postings, drawing on AI models such as text-davinci-003 and the quantized versions of Falcon 7b (Falcon), Wizardlm 7B (Wizardlm), and Vicuna 7B (Vicuna) to generate synthetic datasets. These synthetic data are employed in two use-case scenarios: (a) exclusively as training datasets composed of synthetic job postings (for situations where no real data is available) and (b) as an augmentation method to bolster underrepresented job title categories. To evaluate our proposed method, we relied on two well-established approaches: a feedforward neural network (FFNN) and the BERT model. Both the use cases and training methods were assessed against a genuine job posting dataset to gauge classification accuracy. Our experiments substantiated the benefits of using synthetic data to enhance job posting classification. In the first scenario, the models' performance matched, and occasionally exceeded, that of models trained on real data. In the second scenario, the augmented classes outperformed their unaugmented counterparts in most instances. This research confirms that AI-generated datasets can enhance the efficacy of NLP algorithms, especially in multi-class classification of job postings. While data augmentation can boost model generalization, its impact varies: it is especially beneficial for simpler models like the FFNN, whereas BERT, with its context-aware architecture, also benefits from augmentation but sees limited improvement. Selecting the right type and amount of augmentation is essential. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
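A minimal sketch of generating synthetic job postings with the OpenAI API, in the spirit of the study above; text-davinci-003 is deprecated, so a current chat model is used as a stand-in, and the prompt wording is an assumption:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def synthetic_posting(job_title: str) -> str:
    resp = client.chat.completions.create(
        # Stand-in model; the paper used text-davinci-003 and local 7B models.
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": f"Write a realistic job posting for a {job_title}. "
                       "Include responsibilities, requirements, and benefits.",
        }],
    )
    return resp.choices[0].message.content

# Augment an underrepresented job-title category before training the classifier.
samples = [synthetic_posting("wind turbine technician") for _ in range(3)]
```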
8. Generating Synthetic Resume Data with Large Language Models for Enhanced Job Description Classification †.
- Author
-
Skondras, Panagiotis, Zervas, Panagiotis, and Tzimas, Giannis
- Subjects
LANGUAGE models, MACHINE learning, JOB classification, TRANSFORMER models, JOB descriptions, NATURAL language processing, FEEDFORWARD neural networks
- Abstract
In this article, we investigate the potential of synthetic resumes as a means for the rapid generation of training data and their effectiveness in data augmentation, especially in categories marked by sparse samples. The widespread implementation of machine learning algorithms in natural language processing (NLP) has notably streamlined the resume classification process, delivering time and cost efficiencies for hiring organizations. However, the performance of these algorithms depends on the abundance of training data. While selecting the right model architecture is essential, it is also crucial to ensure the availability of a robust, well-curated dataset. For many categories in the job market, data sparsity remains a challenge. To deal with this challenge, we employed the OpenAI API to generate both structured and unstructured resumes tailored to specific criteria. These synthetically generated resumes were cleaned, preprocessed, and then utilized to train two distinct models: a transformer model (BERT) and a feedforward neural network (FFNN) that incorporated Universal Sentence Encoder 4 (USE4) embeddings. Both models were evaluated on the multiclass classification of resumes; when trained on an augmented dataset containing 60 percent real data (from the Indeed website) and 40 percent synthetic data from ChatGPT, the transformer model presented exceptional accuracy. The FFNN, predictably, achieved lower accuracy. These findings highlight the value of augmenting real-world data with ChatGPT-generated synthetic resumes, especially in the context of limited training data. The suitability of the BERT model for such classification tasks further reinforces this narrative. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
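A minimal sketch of the FFNN-with-USE4 side of the study above: encode resumes with Universal Sentence Encoder 4 from TensorFlow Hub and fit a small feedforward classifier (scikit-learn's MLPClassifier as a stand-in for the paper's FFNN); the four inline resumes are illustrative:

```python
import tensorflow_hub as hub
from sklearn.neural_network import MLPClassifier

embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

resumes = [
    "Registered nurse with 5 years of ICU experience",
    "Backend developer skilled in Go and PostgreSQL",
    "Pediatric nurse, BLS certified",
    "Software engineer building REST APIs in Java",
]
labels = ["nurse", "developer", "nurse", "developer"]

X = embed(resumes).numpy()  # 512-dimensional USE4 sentence embeddings
clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
clf.fit(X, labels)
print(clf.predict(embed(["Night-shift nurse for the surgical ward"]).numpy()))
```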
9. Annotated Open Corpus Construction and BERT-Based Approach for Automatic Metadata Extraction From Korean Academic Papers
- Author
-
Hyesoo Kong, Hwamook Yoon, Jaewook Seol, Mihwan Hyun, Hyejin Lee, Soonyoung Kim, and Wonjun Choi
- Subjects
BERT, corpus construction, metadata extraction, transfer learning, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
With the accelerating development of science and technology, the number of academic papers published in various fields is increasing rapidly. Academic papers, especially in science and technology fields, are a crucial medium for researchers, who identify knowledge about the latest technological trends, develop new technologies, and conduct derivative studies. Therefore, the continual collection of extensive academic papers, structuring of metadata, and construction of databases are significant tasks. However, research on automatic metadata extraction from Korean papers is currently not being actively conducted owing to insufficient Korean training data. We automatically constructed the largest labeled corpus in South Korea to date from 315,320 PDF papers belonging to 503 Korean academic journals; this labeled corpus can be used to train models that automatically extract 12 metadata types from PDF papers. The labeled corpus is available at https://doi.org/10.23057/48. Moreover, we developed an inspection process and guidelines for the automatically constructed data and performed a full inspection of the validation and testing data. The reliability of the inspected data was verified through inter-annotator agreement measurement. Using our corpus, we trained and evaluated a BERT-based transfer learning model to verify its reliability. Furthermore, we proposed new training methods that improve the metadata extraction performance for Korean papers, and through these methods, we developed the KorSciBERT-ME-J and KorSciBERT-ME-J+C models. KorSciBERT-ME-J showed the highest performance, with an F1 score of 99.36%, as well as robust performance in automatic metadata extraction from Korean academic papers in various formats.
- Published
- 2023
- Full Text
- View/download PDF
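The record above fine-tunes BERT for token-level metadata extraction. A minimal sketch of the model setup with Hugging Face Transformers; KorSciBERT is not assumed to be on the public hub, so multilingual BERT stands in, and the BIO label set covers only three illustrative metadata types:

```python
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

# Illustrative BIO labels for three of the paper's twelve metadata types.
labels = ["O", "B-TITLE", "I-TITLE", "B-AUTHOR", "I-AUTHOR", "B-ABSTRACT", "I-ABSTRACT"]
tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=len(labels))

text = "Deep Learning for Metadata Extraction Kong Hyesoo Yoon Hwamook"
enc = tok(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    pred = model(**enc).logits.argmax(-1)[0]
# The classification head is untrained here; predictions become meaningful
# only after fine-tuning on a labeled corpus such as the one the paper releases.
print([labels[int(i)] for i in pred])
```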
10. A hybrid strategy to extract metadata from scholarly articles by utilizing support vector machine and heuristics.
- Author
-
Waqas, Muhammad, Anjum, Nadeem, and Afzal, Muhammad Tanvir
- Abstract
The immense growth of online research publications has driven the research community to extract valuable information from scientific resources by exploring online digital libraries and publishers' websites. Metadata stored in a machine-comprehensible form can facilitate a precise search that lists the most related articles by applying semantic queries to a document's metadata and structural elements. Online search engines and digital libraries, however, offer only keyword-based search over full-body text, which produces excessive results. In recent years, the research community has adopted different approaches to extract structural information from research documents. We have distributed the content of an article into two levels, logical layout and metadata. This strategy gives our technique an advantage over the state-of-the-art (SOTA) in extracting metadata from publications with diversified styles. The experimental results reveal that the proposed approach achieves a significant performance gain of 20.26% to 27.14%. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
11. EKSTRAKCIJA METAPODATKOV S POMOČJO STROJNEGA UČENJA [Metadata Extraction Using Machine Learning].
- Author
-
SABADIN, Ivančica
- Abstract
Copyright of Moderna Arhivistika is the property of Maribor Provincial Archives and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2023
- Full Text
- View/download PDF
12. Multi-perspective Approach for Curating and Exploring the History of Climate Change in Latin America within Digital Newspapers.
- Author
-
Vargas-Solar, Genoveva, Zechinelli-Martini, José-Luis, A. Espinosa-Oviedo, Javier, and M. Vilches-Blázquez, Luis
- Abstract
This paper introduces a multi-perspective approach to deal with curation and exploration issues in historical newspapers. It has been implemented in the platform LACLICHEV (Latin American Climate Change Evolution platform). Exploring the history of climate change through digitized newspapers published around two centuries ago introduces four challenges: (1) curating content to track entries describing meteorological events; (2) processing (digging into) colloquial language (and its geographic variations) to extract meteorological events; (3) analyzing newspapers to discover meteorological patterns possibly associated with climate change; and (4) designing tools for exploring the extracted content. LACLICHEV provides tools for curating, exploring, and analyzing historical newspaper articles, their descriptions and locations, and the vocabularies used to refer to meteorological events. This platform makes it possible to understand and identify possible patterns and models that can build an empirical and social view of the history of climate change in the Latin American region. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
13. Using Provenance in Data Analytics for Seismology: Challenges and Directions
- Author
-
da Costa, Umberto Souza, Espinosa-Oviedo, Javier Alfonso, Musicante, Martin A., Vargas-Solar, Genoveva, Zechinelli-Martini, José-Luis, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Chiusano, Silvia, editor, Cerquitelli, Tania, editor, Wrembel, Robert, editor, Nørvåg, Kjetil, editor, Catania, Barbara, editor, Vargas-Solar, Genoveva, editor, and Zumpano, Ester, editor
- Published
- 2022
- Full Text
- View/download PDF
14. Re-purposing Excavation Database Content as Paradata: An Explorative Analysis of Paradata Identification Challenges and Opportunities
- Author
-
Lisa Börjesson, Olle Sköld, Zanna Friberg, Daniel Löwenborg, Gísli Pálsson, and Isto Huvila
- Subjects
metadata, paradata, metadata extraction, data reuse, research data, unstructured data, archaeological data, Bibliography. Library science. Information resources
- Abstract
Although data reusers request information about how research data was created and curated, this information is often non-existent or only briefly covered in data descriptions. The need for such contextual information is particularly critical in fields like archaeology, where old legacy data created during different time periods and through varying methodological framings and fieldwork documentation practices retains its value as an important information source. This article explores the presence of contextual information in archaeological data with a specific focus on data provenance and processing information, i.e., paradata. The purpose of the article is to identify and explicate types of paradata in field observation documentation. The method used is an explorative close reading of field data from an archaeological excavation enriched with geographical metadata. The analysis covers technical and epistemological challenges and opportunities in paradata identification, and discusses the possibility of using identified paradata in data descriptions and for data reliability assessments. Results show that it is possible to identify both knowledge organisation paradata (KOP) relating to data structuring and knowledge-making paradata (KMP) relating to fieldwork methods and interpretative processes. However, while the data contains many traces of the research process, there is an uneven and, in some categories, low level of structure and systematicity that complicates automated metadata and paradata identification and extraction. The results show a need to broaden the understanding of how structure and systematicity are used and how they impact research data in archaeology and comparable field sciences. The insight into how a dataset's KOP and KMP can be read is also a methodological contribution to data literacy research and practice development. On a repository level, the results underline the need to include paradata about dataset creation, purpose, terminology, dataset internal and external relations, and any data colloquialisms that require explanation for reusers.
- Published
- 2022
- Full Text
- View/download PDF
15. Automatic Annotation of Images in Persian Scientific Documents Based on Text Analysis Methods
- Author
-
Azadeh Fakhrzadeh, Mohadeseh Rahnama, and Jalal A Nasiri
- Subjects
image tagging, text analysis, image annotation, image retrieval, metadata extraction, information technology, Bibliography. Library science. Information resources
- Abstract
In this paper, a new method for annotating images in Persian scientific documents is suggested. Images in scientific documents contain valuable information; in many cases, by analyzing the images one can understand the main idea and important results of a document. Due to the explosive growth of image data, automatic image annotation has attracted extensive attention and become one of the growing subjects in the literature. Image annotation is the first step in image retrieval methods, in which descriptive tags are assigned to each image. Here, the text associated with an image is used for annotation: the caption and the part of the document that references the image are considered. Noun phrases in the associated text are ranked by five different methods: term frequency; inverse document frequency; term frequency–inverse document frequency; cosine similarity between word embeddings of noun phrases in the text and the caption; and a combination of the term frequency–inverse document frequency and cosine similarity methods. In every method, the image tags are the noun phrases with the highest rank. The suggested methods are evaluated on test data from the Iran scientific information database (Ganj), the main database of Persian scientific documents. The term frequency–inverse document frequency method gives the best results.
- Published
- 2022
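A minimal sketch of the best-performing variant reported above, TF-IDF ranking of candidate noun phrases from an image's associated text; noun-phrase extraction is stubbed with a fixed candidate list, and the English toy corpus stands in for the Persian Ganj data:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "the proposed segmentation network improves boundary accuracy",
    "figure three shows the segmentation results on the test set",
    "training loss curves for the baseline model",
]
associated_text_index = 1  # caption plus the paragraph referencing the image
candidates = ["segmentation results", "test set", "training loss"]

# Score only the candidate noun phrases, treating each as a vocabulary entry.
vec = TfidfVectorizer(ngram_range=(1, 2), vocabulary=candidates)
row = vec.fit_transform(corpus).toarray()[associated_text_index]
ranked = sorted(zip(vec.get_feature_names_out(), row), key=lambda p: -p[1])
print(ranked[:2])  # the top-ranked noun phrases become the image tags
```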
16. LACLICHEV: Exploring the History of Climate Change in Latin America Within Newspapers Digital Collections
- Author
-
Vargas-Solar, Genoveva, Zechinelli-Martini, José-Luis, Espinosa-Oviedo, Javier A., Vilches-Blázquez, Luis M., Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Bellatreche, Ladjel, editor, Dumas, Marlon, editor, Karras, Panagiotis, editor, Matulevičius, Raimundas, editor, Awad, Ahmed, editor, Weidlich, Matthias, editor, Ivanović, Mirjana, editor, and Hartig, Olaf, editor
- Published
- 2021
- Full Text
- View/download PDF
17. Üstverilerin Derin Öğrenme Algoritmaları Kullanılarak Otomatik Olarak Çıkartılması ve Sınıflanması [Automatic Extraction and Classification of Metadata Using Deep Learning Algorithms]
- Author
-
Murat İnce
- Subjects
üstveri çıkartma, konvolüsyonel sinir ağları, tekrarlayan sinir ağları, metadata extraction, convolutional neural networks, recurrent neural networks, Technology, Engineering (General). Civil engineering (General), TA1-2040, Science, Science (General), Q1-390
- Abstract
Today, the need for digital content has grown with the spread of information technologies. Creating such content is a time-consuming and costly process, and learning objects are used in its creation. For these objects to be discoverable and readable by computers is important for reusability and shareability. For this reason, the objects are used together with metadata that carries their descriptive identity information. The better this metadata is created and classified, the more usable the objects become. Many methods have therefore been developed to extract metadata from objects automatically. In this study, metadata was automatically extracted from the content of learning objects and classified using deep learning methods such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), together with Natural Language Processing (NLP) techniques. The success and accuracy of the system were tested on sample learning objects. The results showed that the system can be used successfully.
- Published
- 2021
- Full Text
- View/download PDF
18. Generating Synthetic Resume Data with Large Language Models for Enhanced Job Description Classification
- Author
-
Panagiotis Skondras, Panagiotis Zervas, and Giannis Tzimas
- Subjects
metadata extraction, resumes, CV, big data, multiclass classification, ChatGPT, Information technology, T58.5-58.64
- Abstract
In this article, we investigate the potential of synthetic resumes as a means for the rapid generation of training data and their effectiveness in data augmentation, especially in categories marked by sparse samples. The widespread implementation of machine learning algorithms in natural language processing (NLP) has notably streamlined the resume classification process, delivering time and cost efficiencies for hiring organizations. However, the performance of these algorithms depends on the abundance of training data. While selecting the right model architecture is essential, it is also crucial to ensure the availability of a robust, well-curated dataset. For many categories in the job market, data sparsity remains a challenge. To deal with this challenge, we employed the OpenAI API to generate both structured and unstructured resumes tailored to specific criteria. These synthetically generated resumes were cleaned, preprocessed, and then utilized to train two distinct models: a transformer model (BERT) and a feedforward neural network (FFNN) that incorporated Universal Sentence Encoder 4 (USE4) embeddings. Both models were evaluated on the multiclass classification of resumes; when trained on an augmented dataset containing 60 percent real data (from the Indeed website) and 40 percent synthetic data from ChatGPT, the transformer model presented exceptional accuracy. The FFNN, predictably, achieved lower accuracy. These findings highlight the value of augmenting real-world data with ChatGPT-generated synthetic resumes, especially in the context of limited training data. The suitability of the BERT model for such classification tasks further reinforces this narrative.
- Published
- 2023
- Full Text
- View/download PDF
19. Automatic Detection of the Boundary between Metadata and Body in Persian Theses using BA_SVM
- Author
-
Mohadese Rahnama, Seyed Mohammad Hossein Hasheminejad, and Jalal A Nasiri
- Subjects
metadata extraction, information extraction, support vector machine (svm), metaheuristic algorithm, bat algorithm (ba), Bibliography. Library science. Information resources
- Abstract
Metadata extraction facilitates the process of indexing and improves information retrieval, and automating this process is more efficient than manual extraction. Examples of thesis metadata are the names of students and professors, title, field, degree, abstract, keywords, etc. In this paper, the aim is automatic detection of the boundary between the metadata and the main body in Persian theses. To this end, 250 theses were collected from the IRANDOC system. Features were extracted from the paragraphs of each thesis, and the paragraphs were then classified with a support vector machine into two classes: metadata and body. In this study, the Bat algorithm is used to tune the SVM parameters. The results reveal that the proposed method predicts the paragraph type with 96.6 percent accuracy.
- Published
- 2021
20. LAME: Layout-Aware Metadata Extraction Approach for Research Articles.
- Author
-
Jongyun Choi, Hyesoo Kong, Hwamook Yoon, Heungseon Oh, and Yuchul Jung
- Subjects
METADATA, SCHOLARLY periodicals, CONFERENCE papers, ACADEMIC conferences
- Abstract
The volume of academic literature, such as academic conference papers and journals, has increased rapidly worldwide, and research on metadata extraction is ongoing. However, high-performing metadata extraction is still challenging due to the diverse layout formats of different journal publishers. To accommodate this diversity, we propose a novel LAyout-aware Metadata Extraction (LAME) framework with three key components: automatic layout analysis, construction of a large metadata training set, and implementation of a metadata extractor. In the framework, we designed an automatic layout analysis using PDFMiner. Based on the layout analysis, a large volume of metadata-separated training data, including the title, abstract, author names, author-affiliated organizations, and keywords, was automatically extracted. Moreover, we constructed a pre-trained model, Layout-MetaBERT, to extract metadata from academic journals with varying layout formats. The experimental results with our metadata extractor exhibited robust performance (Macro-F1, 93.27%) in metadata extraction for unseen journals with different layout formats. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
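A minimal sketch of LAME's first stage, PDFMiner-based layout analysis; the top-of-page title heuristic is an illustrative assumption rather than the paper's trained Layout-MetaBERT, and the filename is hypothetical:

```python
from pdfminer.high_level import extract_pages
from pdfminer.layout import LTTextContainer

def first_page_blocks(pdf_path):
    # Yield (top y-coordinate, text) for each text block on the first page.
    page = next(iter(extract_pages(pdf_path)))
    for element in page:
        if isinstance(element, LTTextContainer):
            yield element.bbox[3], element.get_text().strip()  # bbox[3] = top y

# Crude heuristic: on many layouts the title is the top-most text block.
blocks = sorted(first_page_blocks("article.pdf"), reverse=True)  # hypothetical file
print("candidate title:", blocks[0][1])
```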
21. Understanding Scanner Utilization With Real-Time DICOM Metadata Extraction
- Author
-
Pradeeban Kathiravelu, Ashish Sharma, and Puneet Sharma
- Subjects
Biomedical imaging, digital imaging and communications in medicine (DICOM), metadata extraction, picture archiving and communication system (PACS), scanner utilization, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Understanding system performance metrics ensures better utilization of radiology resources with more targeted interventions. The images produced by radiology scanners typically follow the DICOM (Digital Imaging and Communications in Medicine) standard format. DICOM images contain textual metadata that can be used to calculate key timing parameters, such as exact study durations and scanner utilization. However, hospital networks lack the resources and capabilities to extract the metadata from the images quickly and to automatically compute scanner utilization properties. Thus, they resort to using data records from Radiology Information Systems (RIS). However, data acquired from RIS are prone to human errors, rendering many derived key performance metrics inadequate and inaccurate. Hence, there is motivation to establish real-time image transfer from the Picture Archiving and Communication Systems (PACS) to research clusters, where the DICOM images received from the scanners can be processed to evaluate scanner utilization metrics efficiently and quickly. This paper analyzes scanner utilization by developing a real-time monitoring framework that retrieves radiology images into a research cluster using the DICOM networking protocol and then extracts and processes the metadata from the images. Our proposed approach facilitates a better understanding of scanner utilization across a vast healthcare network by observing properties such as study duration, the interval between encounters, and the series count of studies. Benchmarks against the RIS data indicate that our proposed framework, based on real-time PACS data, estimates scanner utilization more accurately. Furthermore, our framework has been running stably, performing its computations in pseudo real-time on our extensive healthcare network for more than two years.
- Published
- 2021
- Full Text
- View/download PDF
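A minimal sketch of deriving a study duration from DICOM metadata with pydicom, assuming the AcquisitionDate and AcquisitionTime tags are present in the files; the real framework streams images from PACS in real time, which is out of scope here:

```python
from datetime import datetime
import pydicom

def acquisition_time(path):
    ds = pydicom.dcmread(path, stop_before_pixels=True)  # read metadata only
    # DICOM TM values may carry fractional seconds; keep the HHMMSS part.
    stamp = ds.AcquisitionDate + ds.AcquisitionTime.split(".")[0]
    return datetime.strptime(stamp, "%Y%m%d%H%M%S")

paths = ["img001.dcm", "img002.dcm"]  # hypothetical files from one study
times = [acquisition_time(p) for p in paths]
print("study duration:", max(times) - min(times))
```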
22. Dijital Kütüphanelerde Dokümanlardan Bilgi Geri Kazanımı için Kullanılan Güncel Teknolojiler: Derleme Çalışması [Current Technologies for Information Retrieval from Documents in Digital Libraries: A Survey]
- Author
-
Mohamed Amin Abdisamad, Alev Mutlu, Furkan Göz, Öztürk Tüfekçi, Kerem Küçük, and Osman Kabasakal
- Subjects
doküman işleme, üst veri çıkarımı, varlık ismi tanıma, anahtar kelime çıkarımı, doküman benzerliği, document processing, metadata extraction, name entity recognition, keyword extraction, document similarity, Technology, Engineering (General). Civil engineering (General), TA1-2040, Science, Science (General), Q1-390
- Abstract
In recent years, the number of digital information resources available on different topics has grown enormously. Many of the systems that provide access to these digital information resources focus on browsing, searching, and information retrieval tools. Digital libraries, electronic libraries, and Web pages offer many new possibilities for improving information access and for creating and organizing document collections hierarchically according to different key criteria. Different search tools can provide more comprehensive document coverage by using software-based services to organize, index, and summarize the documents accessible through information retrieval techniques. The technologies applied to search mechanisms in digital libraries have made it necessary to use different methods and technologies for managing document collections, extracting meaningful data, and determining relationships between documents. In particular, the relationships between documents cannot be explicitly defined by either their forms or their types. This study presents a comprehensive survey of the methods and techniques used for extracting metadata from document content, recognizing named entities, extracting keywords, and establishing document similarity for digital libraries.
- Published
- 2021
- Full Text
- View/download PDF
23. Text and metadata extraction from scanned Arabic documents using support vector machines.
- Author
-
Qin, Wenda, Elanwar, Randa, and Betke, Margrit
- Subjects
*SUPPORT vector machines, *SUPERVISED learning, *DOCUMENT imaging systems, *METADATA, *IMAGE analysis
- Abstract
Text information in scanned documents becomes accessible only when extracted and interpreted by a text recognizer. For a recognizer to work successfully, it must have detailed location information about the regions of the document images that it is asked to analyse. It needs to focus on page regions containing text, skipping non-text regions such as illustrations or photographs. However, text recognizers do not work as logical analyzers. Logical layout analysis automatically determines the function of a document text region, that is, it labels each region as a title, paragraph, caption, and so on, and is thus an essential part of a document understanding system. In the past, rule-based algorithms were used to conduct logical layout analysis on limited-size datasets. We instead focus on supervised learning methods for logical layout analysis. We describe LABA, a system based on multiple support vector machines that performs logical Layout Analysis of scanned Book pages in Arabic. The system detects the function of a text region based on the analysis of various image features and a voting mechanism. For a baseline comparison, we implemented an older but state-of-the-art neural network method. We evaluated LABA using a dataset of scanned pages from illustrated Arabic books and obtained high recall and precision values. We also found that the F-measure of LABA is higher for five of the six tested classes compared to the state-of-the-art method. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
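A minimal sketch of LABA's multiple-SVM voting idea with scikit-learn; the random vectors stand in for the image features computed from scanned Arabic pages, and the six classes are illustrative:

```python
import numpy as np
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(240, 30))    # stand-in for per-region image features
y = rng.integers(0, 6, size=240)  # six classes: title, paragraph, caption, ...

# Several SVMs with different kernels vote on each region's logical function.
vote = VotingClassifier([
    ("linear", SVC(kernel="linear", probability=True)),
    ("rbf", SVC(kernel="rbf", probability=True)),
    ("poly", SVC(kernel="poly", degree=3, probability=True)),
], voting="soft")

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
vote.fit(X_tr, y_tr)
print(f"accuracy: {vote.score(X_te, y_te):.2f}")  # near chance on random data
```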
24. ارائة روشی برای برچسب زدن تصاویر موجود در... [A Method for Tagging Images in...]
- Author
-
آزاده فخرزاده, محدثه رهنما, and جلال‌الدین نصیری
- Subjects
SCIENCE databases, NOUN phrases (Grammar), IMAGE retrieval, ANNOTATIONS, INFORMATION technology
- Abstract
In this paper, a new method for annotating images in Persian scientific documents is suggested. Images in scientific documents contain valuable information; in many cases, by analyzing the images one can understand the main idea and important results of a document. Due to the explosive growth of image data, automatic image annotation has attracted extensive attention and become one of the growing subjects in the literature. Image annotation is the first step in image retrieval methods, in which descriptive tags are assigned to each image. Here, the text associated with an image is used for annotation: the caption and the part of the document that references the image are considered. Noun phrases in the associated text are ranked by five different methods: term frequency; inverse document frequency; term frequency–inverse document frequency; cosine similarity between word embeddings of noun phrases in the text and the caption; and a combination of the term frequency–inverse document frequency and cosine similarity methods. In every method, the image tags are the noun phrases with the highest rank. The suggested methods are evaluated on test data from the Iran scientific information database (Ganj), the main database of Persian scientific documents. The term frequency–inverse document frequency method gives the best results. [ABSTRACT FROM AUTHOR]
- Published
- 2022
25. A Structure-Based Method for Building a Database of Extracted Figures from Scientific Documents: A Case Study of Iran Scientific Information Database (GANJ)
- Author
-
Azadeh Fakhrzadeh and Amir Hossein Seddighi
- Subjects
image processing, image extraction, metadata extraction, information technology, Bibliography. Library science. Information resources
- Abstract
Figures in scientific documents are a rich source of information, and the first step in retrieving information from such figures is to build a valid figure database. To this end, we developed a system for generating a figure database from scholarly Persian documents on a large scale. The first step is to parse the files and extract figures and their corresponding descriptions. There are two general approaches to extracting figures from documents: one is based on image processing methods, and the other is based on processing the file primitives. The focus of this paper is on the latter, which has been shown to be the better choice for search engines because of its speed and scalability. We propose a structure-based method that extracts figures and their descriptions by analyzing the file layout. This information is saved in a database with a specific structure and is indexed for retrieval by the search engine. The proposed algorithm was implemented in the Python programming language. As a benchmark, we used the basic method in the literature, which is based on processing the PDF file. We employed the proposed method in a case study on the Iran scientific information database (Ganj): 150 scientific documents were randomly chosen from the Ganj database and analyzed using the two methods. Based on our experimental results, the proposed method is more efficient than the basic method, especially for Persian documents. Many challenges remain unresolved for Persian documents when using the basic method: the number of noise images produced by the basic method is high, and the extracted Persian text is not well organized. Our proposed method overcomes some of these drawbacks and is recommended for generating figure databases from scientific Persian documents. It correctly extracts about 40% of the images with their corresponding descriptions, which is 10% better than the basic method.
- Published
- 2020
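A minimal sketch of structure-based figure extraction from a PDF's primitives, using PyMuPDF as a stand-in for the paper's own Python implementation; the caption-prefix heuristic and the filename are assumptions:

```python
import fitz  # PyMuPDF

doc = fitz.open("thesis.pdf")  # hypothetical Persian document
for page in doc:
    # Caption candidates: text blocks starting with a figure marker.
    captions = [b[4].strip() for b in page.get_text("blocks")
                if b[4].strip().startswith(("Figure", "Fig.", "شکل"))]
    for i, img in enumerate(page.get_images(full=True)):
        pix = fitz.Pixmap(doc, img[0])   # img[0] is the image xref
        if pix.n >= 4:                    # CMYK or alpha: convert to RGB first
            pix = fitz.Pixmap(fitz.csRGB, pix)
        pix.save(f"page{page.number}_img{i}.png")
    if captions:
        print(page.number, captions)
```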
26. An open-source framework for neuroscience metadata management applied to digital reconstructions of neuronal morphology
- Author
-
Kayvan Bijari, Masood A. Akram, and Giorgio A. Ascoli
- Subjects
Neuroscience curation, Metadata extraction, Knowledge engineering, Data sharing, Information management tools, Neuronal morphology, Computer applications to medicine. Medical informatics, R858-859.7, Computer software, QA76.75-76.765
- Abstract
Research advancements in neuroscience entail the production of a substantial amount of data requiring interpretation, analysis, and integration. The complexity and diversity of neuroscience data necessitate the development of specialized databases and associated standards and protocols. NeuroMorpho.Org is an online repository of over one hundred thousand digitally reconstructed neurons and glia shared by hundreds of laboratories worldwide. Every entry of this public resource is associated with essential metadata describing animal species, anatomical region, cell type, experimental condition, and additional information relevant to contextualize the morphological content. Until recently, the lack of a user-friendly, structured metadata annotation system relying on standardized terminologies constituted a major hindrance in this effort, limiting the data release pace. Over the past 2 years, we have transitioned the original spreadsheet-based metadata annotation system of NeuroMorpho.Org to a custom-developed, robust, web-based framework for extracting, structuring, and managing neuroscience information. Here we release the metadata portal publicly and explain its functionality to enable usage by data contributors. This framework facilitates metadata annotation, improves terminology management, and accelerates data sharing. Moreover, its open-source development provides the opportunity of adapting and extending the code base to other related research projects with similar requirements. This metadata portal is a beneficial web companion to NeuroMorpho.Org which saves time, reduces errors, and aims to minimize the barrier for direct knowledge sharing by domain experts. The underlying framework can be progressively augmented with the integration of increasingly autonomous machine intelligence components.
- Published
- 2020
- Full Text
- View/download PDF
27. PARDA: A Dataset for Scholarly PDF Document Metadata Extraction Evaluation
- Author
-
Fan, Tiantian, Liu, Junming, Qiu, Yeliang, Jiang, Congfeng, Zhang, Jilin, Zhang, Wei, Wan, Jian, Akan, Ozgur, Series Editor, Bellavista, Paolo, Series Editor, Cao, Jiannong, Series Editor, Coulson, Geoffrey, Series Editor, Dressler, Falko, Series Editor, Ferrari, Domenico, Series Editor, Gerla, Mario, Series Editor, Kobayashi, Hisashi, Series Editor, Palazzo, Sergio, Series Editor, Sahni, Sartaj, Series Editor, Shen, Xuemin (Sherman), Series Editor, Stan, Mircea, Series Editor, Xiaohua, Jia, Series Editor, Zomaya, Albert Y., Series Editor, Gao, Honghao, editor, Wang, Xinheng, editor, Yin, Yuyu, editor, and Iqbal, Muddesar, editor
- Published
- 2019
- Full Text
- View/download PDF
28. Selection Methods for Geodata Visualization of Metadata Extracted from Unstructured Digital Data for Scientific Heritage Studies
- Author
-
Prokudin, Dmitry, Levit, Georgy, Hossfeld, Uwe, Barbosa, Simone Diniz Junqueira, Editorial Board Member, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Kotenko, Igor, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Alexandrov, Daniel A., editor, Boukhanovsky, Alexander V., editor, Chugunov, Andrei V., editor, Kabanov, Yury, editor, Koltsova, Olessia, editor, and Musabirov, Ilya, editor
- Published
- 2019
- Full Text
- View/download PDF
29. Semantic Representation of Scientific Publications
- Author
-
Vahdati, Sahar, Fathalla, Said, Auer, Sören, Lange, Christoph, Vidal, Maria-Esther, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Doucet, Antoine, editor, Isaac, Antoine, editor, Golub, Koraljka, editor, Aalberg, Trond, editor, and Jatowt, Adam, editor
- Published
- 2019
- Full Text
- View/download PDF
30. Üstverilerin Derin Öğrenme Algoritmaları Kullanılarak Otomatik Olarak Çıkartılması ve Sınıflanması [Automatic Extraction and Classification of Metadata Using Deep Learning Algorithms].
- Author
-
İNCE, Murat
- Abstract
Copyright of Duzce University Journal of Science & Technology is the property of Duzce University Journal of Science & Technology and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2021
- Full Text
- View/download PDF
31. FLAG-PDFe: Features Oriented Metadata Extraction Framework for Scientific Publications
- Author
-
Muhammad Waqas Ahmed and Muhammad Tanvir Afzal
- Subjects
Machine learning, research article, metadata extraction, text patterns, document structure analysis, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
The unprecedented growth of research publications in diversified domains has overwhelmed the research community; extracting this enormous information by manually analyzing research documents is a cumbersome process. To automatically extract the content of a document in a structured way, metadata and content must be annotated. The scientific community has been focusing on automatic content extraction by forming different heuristics and applying different machine learning techniques. ESWC, one of the renowned conference organizers, runs a state-of-the-art challenge on extracting metadata such as authors, affiliations, countries in affiliations, supplementary material, sections, tables, figures, funding agencies, and EU-funded projects from the PDF files of research articles. We have proposed a feature-centric technique that can extract the logical layout structure of articles from publishers with diversified composition styles. To extract unique metadata from a research article placed in the logical layout structure, we have developed a novel four-stage approach, FLAG-PDFe. The approach is built upon distinct and generic features based on the textual and geometric information in the raw content of research documents. In the first stage, the distinct features are used to identify the different physical layout components of an individual article. Since research journals follow their own publishing styles and layout formats, we develop generic features in the second stage to handle these diversified publishing patterns. In the third stage, after a comprehensive evaluation of the generic features and machine learning models, we employ support vector classification (SVC) to extract the logical layout structure (LLS), i.e., the sections of an article. Finally, we apply heuristics to the LLS to extract the desired metadata of an article. On the gold-standard dataset, the approach yields a recall of 0.877, a precision of 0.928, and an F-measure of 0.897, a 16% gain in F-measure over the best approach of the ESWC challenge.
- Published
- 2020
- Full Text
- View/download PDF
32. Reference Metadata Extraction from Korean Research Papers
- Author
-
Seol, Jae-Wook, Choi, Won-Jun, Jeong, Hee-Seok, Hwang, Hye-Kyong, Yoon, Hwa-Mook, Hutchison, David, Series Editor, Kanade, Takeo, Series Editor, Kittler, Josef, Series Editor, Kleinberg, Jon M., Series Editor, Mattern, Friedemann, Series Editor, Mitchell, John C., Series Editor, Naor, Moni, Series Editor, Pandu Rangan, C., Series Editor, Steffen, Bernhard, Series Editor, Terzopoulos, Demetri, Series Editor, Tygar, Doug, Series Editor, Groza, Adrian, editor, and Prasath, Rajendra, editor
- Published
- 2018
- Full Text
- View/download PDF
33. A Metadata Extractor for Books in a Digital Library
- Author
-
Akhtar, Sk. Simran, Sanyal, Debarshi Kumar, Chattopadhyay, Samiran, Bhowmick, Plaban Kumar, Das, Partha Pratim, Hutchison, David, Series Editor, Kanade, Takeo, Series Editor, Kittler, Josef, Series Editor, Kleinberg, Jon M., Series Editor, Mattern, Friedemann, Series Editor, Mitchell, John C., Series Editor, Naor, Moni, Series Editor, Pandu Rangan, C., Series Editor, Steffen, Bernhard, Series Editor, Terzopoulos, Demetri, Series Editor, Tygar, Doug, Series Editor, Weikum, Gerhard, Series Editor, Dobreva, Milena, editor, Hinze, Annika, editor, and Žumer, Maja, editor
- Published
- 2018
- Full Text
- View/download PDF
34. Metadata Extraction for Scientific Papers
- Author
-
Meng, Binjie, Hou, Lei, Yang, Erhong, Li, Juanzi, Hutchison, David, Series Editor, Kanade, Takeo, Series Editor, Kittler, Josef, Series Editor, Kleinberg, Jon M., Series Editor, Mattern, Friedemann, Series Editor, Mitchell, John C., Series Editor, Naor, Moni, Series Editor, Pandu Rangan, C., Series Editor, Steffen, Bernhard, Series Editor, Terzopoulos, Demetri, Series Editor, Tygar, Doug, Series Editor, Weikum, Gerhard, Series Editor, Sun, Maosong, editor, Liu, Ting, editor, Wang, Xiaojie, editor, Liu, Zhiyuan, editor, and Liu, Yang, editor
- Published
- 2018
- Full Text
- View/download PDF
35. Like a rainbow in the dark: metadata annotation for HPC applications in the age of dark data.
- Author
-
Schembera, Björn
- Subjects
*METADATA, *MIDDLE Ages, *DATA management, *RAINBOWS, *ANNOTATIONS
- Abstract
The deluge of dark data is about to happen. Lacking data management capabilities, especially in the field of supercomputing, and missing data documentation (i.e., missing metadata annotation) constitute a major source of dark data. The present work contributes to addressing this challenge by presenting ExtractIng, a generic automated metadata extraction toolkit. Existing metadata of simulation output files scattered through the file system can be aggregated, parsed, and converted to the EngMeta metadata model. Use cases from computational engineering are considered to demonstrate the viability of ExtractIng. The evaluation results show that the metadata extraction is simulation-code independent in the sense that it can handle data outputs from various fields of science, is easy to integrate into simulation workflows, and is compatible with a multitude of computational environments. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
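A minimal sketch of ExtractIng-style harvesting of metadata scattered through simulation output files; the "key = value" log format and the directory name are assumptions, and mapping the result onto the EngMeta model is omitted:

```python
import re
from pathlib import Path

PATTERN = re.compile(r"^\s*(\w+)\s*=\s*(.+?)\s*$")

def harvest(root):
    # Walk the run directory and collect "key = value" pairs from log files.
    metadata = {}
    for path in Path(root).rglob("*.log"):
        for line in path.read_text(errors="ignore").splitlines():
            m = PATTERN.match(line)
            if m:
                metadata.setdefault(m.group(1), m.group(2))
    return metadata

print(harvest("simulation_run_042"))  # e.g. {'solver': 'cfd3d', 'timestep': '1e-5'}
```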
36. Automatic Detection of the Boundary between Metadata and Body in Persian Theses using BA_SVM.
- Author
-
Rahnama, Mohadese, Hossein Hasheminejad, Seyed Mohammad, and Nasiri, Jalal A.
- Subjects
METADATA, SUPPORT vector machines, INFORMATION retrieval, ALGORITHMS
- Abstract
Metadata extraction facilitates the process of indexing and improves information retrieval, and automating this process is more efficient than manual extraction. Examples of thesis metadata are the names of students and professors, title, field, degree, abstract, keywords, etc. In this paper, the aim is automatic detection of the boundary between the metadata and the main body in Persian theses. To this end, 250 theses were collected from the IRANDOC system. Features were extracted from the paragraphs of each thesis, and the paragraphs were then classified with a support vector machine into two classes: metadata and body. In this study, the Bat algorithm is used to tune the SVM parameters. The results reveal that the proposed method predicts the paragraph type with 96.6 percent accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
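A highly simplified, bat-algorithm-style search over the SVM hyperparameters (C, gamma), sketching the BA_SVM idea above; loudness and pulse-rate updates are omitted, and random data stands in for the thesis paragraph features:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = rng.integers(0, 2, size=200)  # metadata vs. body paragraphs

def fitness(pos):
    C, gamma = 10.0 ** pos  # bat positions live in log10([C, gamma]) space
    return cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=3).mean()

low, high = np.array([-2.0, -4.0]), np.array([2.0, 0.0])
pos = rng.uniform(low, high, size=(10, 2))  # 10 bats
vel = np.zeros_like(pos)
fits = [fitness(p) for p in pos]
best, best_fit = pos[int(np.argmax(fits))].copy(), max(fits)

for _ in range(15):
    freq = rng.uniform(0, 1, size=(10, 1))  # frequency-driven movement
    vel += (pos - best) * freq
    pos = np.clip(pos + vel, low, high)
    for p in pos:
        f = fitness(p)
        if f > best_fit:
            best, best_fit = p.copy(), f

print("best (C, gamma):", 10.0 ** best, "CV accuracy:", round(best_fit, 3))
```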
37. An Efficient Framework for Algorithmic Metadata Extraction over Scholarly Documents Using Deep Neural Networks
- Author
-
Raghavendra Nayaka, P. and Ranjan, Rajeev
- Published
- 2023
- Full Text
- View/download PDF
38. Metadata Extraction Analysis: A Review of Video Data in Effect to Social Media Compression
- Author
-
Dawn Iris Calibo and Jasmin D. Niguidula
- Subjects
video compression, metadata extraction, video analysis, social media, Computer software, QA76.75-76.765
- Abstract
In the 21st century, with the continual improvement of the web, online social networks have adopted techniques to streamline the loading speed of their pages. Through metadata extraction, two of the most well-known social networking sites were examined to observe the effects of video compression on files uploaded to them with standard parameters. The assessment demonstrates that the two sites exhibit key similarities and differences. The research further explains the structural differences between the social networks that produce these outcomes.
- Published
- 2019
- Full Text
- View/download PDF
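A minimal sketch of extracting container and stream metadata from an uploaded video with ffprobe (assumes ffmpeg is installed); comparing these fields before and after a platform re-encodes a file reveals its compression choices:

```python
import json
import subprocess

def probe(path):
    # Ask ffprobe for JSON-formatted container and stream metadata.
    out = subprocess.run(
        ["ffprobe", "-v", "quiet", "-print_format", "json",
         "-show_format", "-show_streams", path],
        capture_output=True, text=True, check=True).stdout
    return json.loads(out)

info = probe("uploaded_video.mp4")  # hypothetical file
video = next(s for s in info["streams"] if s["codec_type"] == "video")
print(video["codec_name"], video["width"], video["height"],
      info["format"]["bit_rate"])
```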
39. Video Summarization Framework for Newscasts and Reports – Work in Progress
- Author
-
Leszczuk, Mikołaj, Grega, Michał, Koźbiał, Arian, Gliwski, Jarosław, Wasieczko, Krzysztof, Smaïli, Kamel, Barbosa, Simone Diniz Junqueira, Series editor, Chen, Phoebe, Series editor, Filipe, Joaquim, Series editor, Kotenko, Igor, Series editor, Sivalingam, Krishna M., Series editor, Washio, Takashi, Series editor, Yuan, Junsong, Series editor, Zhou, Lizhu, Series editor, Dziech, Andrzej, editor, and Czyżewski, Andrzej, editor
- Published
- 2017
- Full Text
- View/download PDF
40. ارائة روشی ساختارمحور برای ایجاد پایگاه د... [A Structure-Based Method for Creating a Database of...]
- Author
-
آزاده فخرزاده and امیرحسین صدیقی
- Subjects
SCIENCE databases, INFORMATION resources, ALGORITHMS, IMAGE processing, SEARCH engines, PYTHON programming language
- Abstract
Figures in scientific documents are rich sources of information, and the first step in retrieving information from such figures is to build a valid figure database. To this end, we developed a system for generating a figure database from scholarly Persian documents on a large scale. The first step is to parse the files and extract figures and their corresponding descriptions. There are two general approaches to extracting figures from documents: one is based on image processing methods, and the other is based on processing the file primitives. The focus of this paper is on the latter, which has been shown to be the better choice for search engines because of its speed and scalability. We propose a structure-based method that extracts figures and their descriptions by analyzing the file layout. This information is saved in a database with a specific structure and is indexed for retrieval by the search engine. The proposed algorithm was implemented in the Python programming language. As a benchmark, we used the basic method in the literature, which is based on processing the PDF file. We employed the proposed method in a case study on the Iran scientific information database (Ganj): 150 scientific documents were randomly chosen from the Ganj database and analyzed using the two methods. Based on our experimental results, the proposed method is more efficient than the basic method, especially for Persian documents. Many challenges remain unresolved for Persian documents when using the basic method: the number of noise images produced by the basic method is high, and the extracted Persian text is not well organized. Our proposed method overcomes some of these drawbacks and is recommended for generating figure databases from scientific Persian documents. It correctly extracts about 40% of the images with their corresponding descriptions, which is 10% better than the basic method. [ABSTRACT FROM AUTHOR]
- Published
- 2020
41. An open-source framework for neuroscience metadata management applied to digital reconstructions of neuronal morphology.
- Author
-
Bijari, Kayvan, Akram, Masood A., and Ascoli, Giorgio A.
- Subjects
NEUROSCIENCES, ANIMAL species, METADATA, ARTIFICIAL intelligence, MORPHOLOGY, DATABASE design
- Abstract
Research advancements in neuroscience entail the production of a substantial amount of data requiring interpretation, analysis, and integration. The complexity and diversity of neuroscience data necessitate the development of specialized databases and associated standards and protocols. NeuroMorpho.Org is an online repository of over one hundred thousand digitally reconstructed neurons and glia shared by hundreds of laboratories worldwide. Every entry of this public resource is associated with essential metadata describing animal species, anatomical region, cell type, experimental condition, and additional information relevant to contextualize the morphological content. Until recently, the lack of a user-friendly, structured metadata annotation system relying on standardized terminologies constituted a major hindrance in this effort, limiting the data release pace. Over the past 2 years, we have transitioned the original spreadsheet-based metadata annotation system of NeuroMorpho.Org to a custom-developed, robust, web-based framework for extracting, structuring, and managing neuroscience information. Here we release the metadata portal publicly and explain its functionality to enable usage by data contributors. This framework facilitates metadata annotation, improves terminology management, and accelerates data sharing. Moreover, its open-source development provides the opportunity of adapting and extending the code base to other related research projects with similar requirements. This metadata portal is a beneficial web companion to NeuroMorpho.Org which saves time, reduces errors, and aims to minimize the barrier for direct knowledge sharing by domain experts. The underlying framework can be progressively augmented with the integration of increasingly autonomous machine intelligence components. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
42. Architectural Challenges of Genotype-Phenotype Data Management
- Author
-
Chlebiej, Michał, Habela, Piotr, Rutkowski, Andrzej, Szulc, Iwona, Wiśniewski, Piotr, Stencel, Krzysztof, Diniz Junqueira Barbosa, Simone, Series editor, Chen, Phoebe, Series editor, Du, Xiaoyong, Series editor, Filipe, Joaquim, Series editor, Kara, Orhun, Series editor, Liu, Ting, Series editor, Kotenko, Igor, Series editor, Sivalingam, Krishna M., Series editor, Washio, Takashi, Series editor, Kozielski, Stanisław, editor, Mrozek, Dariusz, editor, Kasprowski, Paweł, editor, Małysiak-Mrozek, Bożena, editor, and Kostrzewa, Daniel, editor
- Published
- 2016
- Full Text
- View/download PDF
43. Extracting enhanced artificial intelligence model metadata from software repositories
- Author
-
Tsay, Jason, Braz, Alan, Hirzel, Martin, Shinnar, Avraham, and Mummert, Todd
- Published
- 2022
- Full Text
- View/download PDF
44. Metadata Extraction from Conference Proceedings Using Template-Based Approach
- Author
-
Kovriguina, Liubov, Shipilo, Alexander, Kozlov, Fedor, Kolchin, Maxim, Cherny, Eugene, Diniz Junqueira Barbosa, Simone, Series editor, Chen, Phoebe, Series editor, Du, Xiaoyong, Series editor, Filipe, Joaquim, Series editor, Kara, Orhun, Series editor, Kotenko, Igor, Series editor, Liu, Ting, Series editor, Sivalingam, Krishna M., Series editor, Washio, Takashi, Series editor, Gandon, Fabien, editor, Cabrio, Elena, editor, Stankovic, Milan, editor, and Zimmermann, Antoine, editor
- Published
- 2015
- Full Text
- View/download PDF
45. Discovering the Topical Evolution of the Digital Library Evaluation Community
- Author
-
Papachristopoulos, Leonidas, Kleidis, Nikos, Sfakakis, Michalis, Tsakonas, Giannis, Papatheodorou, Christos, Diniz Junqueira Barbosa, Simone, Series editor, Chen, Phoebe, Series editor, Du, Xiaoyong, Series editor, Filipe, Joaquim, Series editor, Kotenko, Igor, Series editor, Liu, Ting, Series editor, Sivalingam, Krishna M., Series editor, Washio, Takashi, Series editor, Garoufallou, Emmanouel, editor, Hartley, Richard J., editor, and Gaitanou, Panorea, editor
- Published
- 2015
- Full Text
- View/download PDF
46. Bookmarklet-Triggered Literature Metadata Extraction System Using Cloud Plugins
- Author
-
Ma, Kun, Abraham, Ajith, Kacprzyk, Janusz, Series editor, Abraham, Ajith, editor, Muda, Azah Kamilah, editor, and Choo, Yun-Huoy, editor
- Published
- 2015
- Full Text
- View/download PDF
47. Exploring LOD through metadata extraction and data-driven visualizations
- Author
-
Oscar Peña, Unai Aguilera, and Diego López-de-Ipiña
- Published
- 2016
- Full Text
- View/download PDF
48. Metadata Extraction and Management in Data Lakes With GEMMS
- Author
-
Christoph Quix, Rihan Hai, and Ivan Vatov
- Subjects
Metadata management, data integration, scientific data, metadata extraction, data lakes, Information technology, T58.5-58.64
- Abstract
In addition to volume and velocity, Big Data is also characterized by its variety. Variety in structure and semantics requires new integration approaches that can resolve the integration challenges even for large volumes of data. Data lakes should reduce upfront integration costs and provide a more flexible way for data integration and analysis, as source data is loaded into the data lake repository in its original structure. Some syntactic transformation might be applied to enable access to the data in one common repository; however, deep semantic integration is done only after the initial loading of the data into the data lake. Thereby, data is easily made available and can be restructured, aggregated, and transformed as required by later applications. Metadata management is a crucial component in a data lake, as the source data needs to be described by metadata to capture its semantics. We developed a Generic and Extensible Metadata Management System for data lakes (called GEMMS) that aims at the automatic extraction of metadata from a wide variety of data sources. Furthermore, the metadata is managed in an extensible metamodel that distinguishes structural and semantic metadata. The use case applied for evaluation is from the life science domain, where the data is often stored only in files, which hinders data access and efficient querying. The GEMMS framework has proven to be useful in this domain. In particular, the extensibility and flexibility of the framework are important, as data and metadata structures in scientific experiments cannot be defined a priori.
- Published
- 2016
- Full Text
- View/download PDF
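A minimal sketch of a metamodel separating structural from semantic metadata, loosely following the distinction GEMMS draws; the field names are illustrative, not the published metamodel:

```python
from dataclasses import dataclass, field

@dataclass
class StructuralMetadata:
    file_format: str
    fields: dict[str, str]  # attribute name -> data type

@dataclass
class SemanticMetadata:
    domain_terms: list[str] = field(default_factory=list)  # e.g. ontology concepts

@dataclass
class DataLakeEntry:
    source_path: str
    structure: StructuralMetadata
    semantics: SemanticMetadata

# One entry per ingested source file; extraction would populate these fields.
entry = DataLakeEntry(
    "experiments/assay_12.csv",
    StructuralMetadata("csv", {"sample_id": "str", "concentration": "float"}),
    SemanticMetadata(["enzyme assay", "concentration"]),
)
print(entry.structure.fields)
```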
49. Mathematical Content Semantic Markup Methods and Open Scientific E-Journals Management Systems
- Author
-
Elizarov, Alexander, Lipachev, Evgeny, Zuev, Denis, Junqueira Barbosa, Simone Diniz, Series editor, Chen, Phoebe, Series editor, Cuzzocrea, Alfredo, Series editor, Du, Xiaoyong, Series editor, Filipe, Joaquim, Series editor, Kara, Orhun, Series editor, Kotenko, Igor, Series editor, Sivalingam, Krishna M., Series editor, Ślęzak, Dominik, Series editor, Washio, Takashi, Series editor, Yang, Xiaokang, Series editor, Klinov, Pavel, editor, and Mouromtsev, Dmitry, editor
- Published
- 2014
- Full Text
- View/download PDF
50. MetaExtractor: A System for Metadata Extraction from Structured Data Sources
- Author
-
Pomares-Quimbaya, Alexandra, Torres-Moreno, Miguel Eduardo, Roldán, Fabián, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Cuzzocrea, Alfredo, editor, Kittl, Christian, editor, Simos, Dimitris E., editor, Weippl, Edgar, editor, and Xu, Lida, editor
- Published
- 2013
- Full Text
- View/download PDF