445 results for "Data Engineering"
Search Results
2. Data Engineering for Nonverbal Expression Analysis - Case Studies of Borderline Personality Disorder
- Author
-
Eraña-Diaz, Marta-Lilia, Rosales-Lagarde, Alejandra, Reyes-Soto, Adriana, Arango-de-Montis, Iván, Rodríguez-Delgado, Andrés, Muñoz-Delgado, Jairo, Ghosh, Ashish, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Singh, Mayank, editor, Tyagi, Vipin, editor, Gupta, P. K., editor, Flusser, Jan, editor, Ören, Tuncer, editor, Cherif, Amar Ramdane, editor, and Tomar, Ravi, editor
- Published
- 2025
- Full Text
- View/download PDF
3. Research Profile on Data Science in the Field of Tourism.
- Author
-
Bustamante Martínez, Alexander, Galvis Lista, Ernesto Amaru, and Gonzalez Zabala, Mayda Patricia
- Subjects
TOURISTS, DATA science, SENTIMENT analysis, TOURISM, BIG data, TOURIST attractions, METADATA
- Abstract
This research examined trends in data science application within tourism using SCOPUS for bibliographic records and VOSViewer for metadata analysis. It highlighted the top 100 keywords based on their strength. Since 2012, there's been an exponential rise in related publications. There seems to be a link between a nation's economic tourism strength and its research output. Countries like Australia, Italy, and Spain, which are known for tourism, also dominate research contributions. China has led in this domain, consistently upping its publications since 2012. The study identified some key areas: Cluster 1 emphasizes Big Data's role in enhancing tourism services; Cluster 2 explores the intricacies of human language in mining tourist reviews; Cluster 3 delves into sentiment polarity detection in texts; while Cluster 4 presents metrics for gauging destination competitiveness. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
4. Clinical bioinformatics desiderata for molecular tumor boards.
- Author
-
Pallocca, Matteo, Betti, Martina, Baldinelli, Sara, Palombo, Ramona, Bucci, Gabriele, Mazzarella, Luca, Tonon, Giovanni, and Ciliberto, Gennaro
- Subjects
*MEDICAL informatics, *DIGITAL technology, *CANCER patients, *GENOMICS, *CANCER treatment
- Abstract
Clinical Bioinformatics is a knowledge framework required to interpret data of medical interest via computational methods. This area became of dramatic importance in precision oncology, fueled by cancer genomic profiling: most definitions of Molecular Tumor Boards require the presence of bioinformaticians. However, all available literature remained rather vague on what are the specific needs in terms of digital tools and expertise to tackle and interpret genomics data to assign novel targeted or biomarker-driven targeted therapies to cancer patients. To fill this gap, in this article, we present a catalog of software families and human skills required for the tumor board bioinformatician, with specific examples of real-world applications associated with each element presented. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
5. Modernizing Surgical Quality: A Novel Approach to Improving Detection of Surgical Site Infections in the Veteran Population.
- Author
-
Perkins, Louis, O'Keefe, Thomas, Ardill, William, and Potenza, Bruce
- Subjects
*SURGICAL site infections, *ELECTRONIC health records, *SURGERY, *LOGISTIC regression analysis, *MACHINE learning
- Abstract
Introduction: Surgical site infections (SSIs) are an important quality measure. Identifying SSIs often relies upon a time-intensive manual review of a sample of common surgical cases. In this study, we sought to develop a predictive model for SSI identification using antibiotic pharmacy data extracted from the electronic medical record (EMR). Methods: A retrospective analysis was performed on all surgeries at a Veterans Affairs Medical Center between January 9, 2020 and January 9, 2022. Patients receiving outpatient antibiotics within 30 days of their surgery were identified, and chart review was performed to detect instances of SSI as defined by VA Surgery Quality Improvement Program criteria. Binomial logistic regression was used to select variables to include in the model, which was trained using k-fold cross validation. Results: Of the 8,253 surgeries performed during the study period, patients in 793 (9.6%) cases were prescribed outpatient antibiotics within 30 days of their procedure; SSI was diagnosed in 128 (1.6%) patients. Logistic regression identified time from surgery to antibiotic prescription, ordering location of the prescription, length of prescription, type of antibiotic, and operating service as important variables to include in the model. On testing, the final model demonstrated good predictive value with c-statistic of 0.81 (confidence interval: 0.71–0.90). Hosmer–Lemeshow testing demonstrated good fit of the model with p value of 0.97. Conclusion: We propose a model that uses readily attainable data from the EMR to identify SSI occurrences. In conjunction with local case-by-case reporting, this tool can improve the accuracy and efficiency of SSI identification. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
6. An End-to-End Deep Learning Framework for Fault Detection in Marine Machinery.
- Author
-
Rigas, Spyros, Tzouveli, Paraskevi, and Kollias, Stefanos
- Subjects
*DEEP learning, *PRODUCT management software, *ACQUISITION of data, *INTERNET of things, *CLOUD computing
- Abstract
The Industrial Internet of Things has enabled the integration and analysis of vast volumes of data across various industries, with the maritime sector being no exception. Advances in cloud computing and deep learning (DL) are continuously reshaping the industry, particularly in optimizing maritime operations such as Predictive Maintenance (PdM). In this study, we propose a novel DL-based framework focusing on the fault detection task of PdM in marine operations, leveraging time-series data from sensors installed on shipboard machinery. The framework is designed as a scalable and cost-efficient software solution, encompassing all stages from data collection and pre-processing at the edge to the deployment and lifecycle management of DL models. The proposed DL architecture utilizes Graph Attention Networks (GATs) to extract spatio-temporal information from the time-series data and provides explainable predictions through a feature-wise scoring mechanism. Additionally, a custom evaluation metric with real-world applicability is employed, prioritizing both prediction accuracy and the timeliness of fault identification. To demonstrate the effectiveness of our framework, we conduct experiments on three types of open-source datasets relevant to PdM: electrical data, bearing datasets, and data from water circulation experiments. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
7. Algor-ethics: charting the ethical path for AI in critical care.
- Author
-
Montomoli, Jonathan, Bitondo, Maria Maddalena, Cascella, Marco, Rezoagli, Emanuele, Romeo, Luca, Bellini, Valentina, Semeraro, Federico, Gamberini, Emiliano, Frontoni, Emanuele, Agnoletti, Vanni, Altini, Mattia, Benanti, Paolo, and Bignami, Elena Giovanna
- Abstract
The integration of Clinical Decision Support Systems (CDSS) based on artificial intelligence (AI) in healthcare is a groundbreaking evolution with enormous potential, but its development and ethical implementation present unique challenges, particularly in critical care, where physicians often deal with life-threatening conditions requiring rapid action and with patients unable to participate in the decision-making process. Moreover, development of AI-based CDSS is complex and should address different sources of bias, including data acquisition, health disparities, domain shifts during clinical use, and cognitive biases in decision-making. In this scenario, algor-ethics is mandatory and emphasizes the integration of 'Human-in-the-Loop' and 'Algorithmic Stewardship' principles, and the benefits of advanced data engineering. The establishment of Clinical AI Departments (CAID) is necessary to lead AI innovation in healthcare, ensuring ethical integrity and human-centered development in this rapidly evolving field. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
8. Special issue on feature engineering editorial.
- Author
-
Verdonck, Tim, Baesens, Bart, Óskarsdóttir, María, and vanden Broucke, Seppe
- Subjects
MACHINE learning, ENGINEERING design, SIMPLE machines, ENGINEERING, TIME series analysis, DECISION trees
- Abstract
In order to improve the performance of any machine learning model, it is important to focus more on the data itself instead of continuously developing new algorithms. This is exactly the aim of feature engineering. It can be defined as the clever engineering of data, thereby exploiting the intrinsic bias of the machine learning technique to our benefit, ideally both in terms of accuracy and interpretability at the same time. Oftentimes it will be applied in combination with simple machine learning techniques such as regression models or decision trees to boost their performance (whilst maintaining the interpretability property which is so often needed in analytical modeling) but it may also improve complex techniques such as XGBoost and neural networks. Feature engineering aims at designing smart features in one of two possible ways: either by adjusting existing features using various transformations or by extracting or creating new meaningful features (a process often called "featurization") from different sources (e.g., transactional data, network data, time series data, text data, etc.). [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
9. Extensive data engineering to the rescue: building a multi-species katydid detector from unbalanced, atypical training datasets.
- Author
-
Madhusudhana, Shyam, Klinck, Holger, and Symes, Laurel B.
- Subjects
*KATYDIDS, *TROPICAL ecosystems, *DEEP learning, *DATA augmentation, *AUDITORY masking, *BIODIVERSITY monitoring, *DETECTORS
- Abstract
Passive acoustic monitoring (PAM) is a powerful tool for studying ecosystems. However, its effective application in tropical environments, particularly for insects, poses distinct challenges. Neotropical katydids produce complex species-specific calls, spanning mere milliseconds to seconds and spread across broad audible and ultrasonic frequencies. However, subtle differences in inter-pulse intervals or central frequencies are often the only discriminatory traits. These extremities, coupled with low source levels and susceptibility to masking by ambient noise, challenge species identification in PAM recordings. This study aimed to develop a deep learning-based solution to automate the recognition of 31 katydid species of interest in a biodiverse Panamanian forest with over 80 katydid species. Besides the innate challenges, our efforts were also encumbered by a limited and imbalanced initial training dataset comprising domain-mismatched recordings. To overcome these, we applied rigorous data engineering, improving input variance through controlled playback re-recordings and by employing physics-based data augmentation techniques, and tuning signal-processing, model and training parameters to produce a custom well-fit solution. Methods developed here are incorporated into Koogu, an open-source Python-based toolbox for developing deep learning-based bioacoustic analysis solutions. The parametric implementations offer a valuable resource, enhancing the capabilities of PAM for studying insects in tropical ecosystems. This article is part of the theme issue 'Towards a toolkit for global insect biodiversity monitoring'. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
10. Bayesian Neural Networks for predicting the severity of symptoms: a case study.
- Author
-
Belciug, Smaranda and Mihai, Tiberiu
- Abstract
Childhood allergies are a problem that seems to be forgotten by the Artificial Intelligence community, even though they affect millions of children. In this paper we study the prevalence of childhood allergies and some demographic statistics, and predict the severity of the most frequently encountered allergy, asthma. For this we used two publicly available datasets, one for Data Engineering and Exploratory Data Analysis, and the other for Bayesian Neural Networks. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
11. Discover Data
- Subjects
data science, data collection, data processing, data analysis, big data, data engineering, Information technology, T58.5-58.64, Electronic computers. Computer science, QA75.5-76.95
- Published
- 2024
12. STAC, an open standard to describe and catalog geospatial data on the web
- Author
-
Giorgio Basile
- Subjects
geospatial, data engineering, cloud native, stac, Cartography, GA101-1776, Cadastral mapping, GA109.5
- Abstract
STAC is a recent geospatial standard for describing and cataloging geospatial assets. It is part of a broader innovation effort called Cloud-Native Geospatial, which provides modern standards and tools to efficiently access raster and vector data in the cloud.
- Published
- 2024
13. Exploiting Formal Concept Analysis for Data Modeling in Data Lakes
- Author
-
Bendimerad, Anes, Mathonat, Romain, Remil, Youcef, Kaytoue, Mehdi, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Cabrera, Inma P., editor, Ferré, Sébastien, editor, and Obiedkov, Sergei, editor
- Published
- 2024
- Full Text
- View/download PDF
14. Automated Identification of Existing and Potential Urban Central Places Based on Open Data and Public Interest
- Author
-
Pavlova, Anna, Katynsus, Aleksandr, Natykin, Maksim, Mityagin, Sergey, Hartmanis, Juris, Founding Editor, van Leeuwen, Jan, Series Editor, Hutchison, David, Editorial Board Member, Kanade, Takeo, Editorial Board Member, Kittler, Josef, Editorial Board Member, Kleinberg, Jon M., Editorial Board Member, Kobsa, Alfred, Series Editor, Mattern, Friedemann, Editorial Board Member, Mitchell, John C., Editorial Board Member, Naor, Moni, Editorial Board Member, Nierstrasz, Oscar, Series Editor, Pandu Rangan, C., Editorial Board Member, Sudan, Madhu, Series Editor, Terzopoulos, Demetri, Editorial Board Member, Tygar, Doug, Editorial Board Member, Weikum, Gerhard, Series Editor, Vardi, Moshe Y, Series Editor, Goos, Gerhard, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Gervasi, Osvaldo, editor, Murgante, Beniamino, editor, Garau, Chiara, editor, Taniar, David, editor, C. Rocha, Ana Maria A., editor, and Faginas Lago, Maria Noelia, editor
- Published
- 2024
- Full Text
- View/download PDF
15. The Programmable World and Its Emerging Privacy Nightmare
- Author
-
Kotilainen, Pyry, Mehraj, Ali, Mikkonen, Tommi, Mäkitalo, Niko, Hartmanis, Juris, Founding Editor, van Leeuwen, Jan, Series Editor, Hutchison, David, Editorial Board Member, Kanade, Takeo, Editorial Board Member, Kittler, Josef, Editorial Board Member, Kleinberg, Jon M., Editorial Board Member, Kobsa, Alfred, Series Editor, Mattern, Friedemann, Editorial Board Member, Mitchell, John C., Editorial Board Member, Naor, Moni, Editorial Board Member, Nierstrasz, Oscar, Series Editor, Pandu Rangan, C., Editorial Board Member, Sudan, Madhu, Series Editor, Terzopoulos, Demetri, Editorial Board Member, Tygar, Doug, Editorial Board Member, Weikum, Gerhard, Series Editor, Vardi, Moshe Y, Series Editor, Goos, Gerhard, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Stefanidis, Kostas, editor, Systä, Kari, editor, Matera, Maristella, editor, Heil, Sebastian, editor, Kondylakis, Haridimos, editor, and Quintarelli, Elisa, editor
- Published
- 2024
- Full Text
- View/download PDF
16. Confiance.ai Program Software Engineering for a Trustworthy AI
- Author
-
Gelin, Rodolphe, Kacprzyk, Janusz, Series Editor, and Aldinhas Ferreira, Maria Isabel, editor
- Published
- 2024
- Full Text
- View/download PDF
17. Rethinking Data Acquisition to Data Analytics in Bioprocessing
- Author
-
Bongard, Sophia, Kees, Nicole, Guimarães, Pedro Ivo, Großkopf, Tobias, Schönbohm, Avo, editor, von Horsten, Hans Henning, editor, Plugmann, Philipp, editor, and von Stosch, Moritz, editor
- Published
- 2024
- Full Text
- View/download PDF
18. A Comprehensive Review of Migration of Big Data Applications to Public Clouds: Current Requirements, Types, Strategies, and Case Studies
- Author
-
Nama, Vihaan, Prabhu, H. Vishalakshi, Bansal, Jagdish Chand, Series Editor, Deep, Kusum, Series Editor, Nagar, Atulya K., Series Editor, and Uddin, Mohammad Shorif, editor
- Published
- 2024
- Full Text
- View/download PDF
19. Data Engineering-Based Research on the Disasters Along the China–Pakistan Economic Corridor
- Author
-
Zhang, Yaonan, Kang, Jianfang, Ai, Minghao, Min, Yufang, Li, Hongxing, Feng, Keting, Zhao, Guohui, Li, Xirong, Wu, Yamin, Chinese Academy of Sciences, Ministry of Education of the PRC, Ministry of Science and Technology of the PRC, China Association for Science and Technology, Chinese Academy of Social Sciences, Chinese Academy of Engineering, National Natural Science Foundation of China, and Chinese Academy of Agricultural Sciences
- Published
- 2024
- Full Text
- View/download PDF
20. Implementation of Machine Learning Based Systems into Production
- Author
-
Bulaev, Vladimir I., Wu, Wei, Series Editor, and Lin, Jia'en, editor
- Published
- 2024
- Full Text
- View/download PDF
21. Event-Based Data Pipelines in Recommender Systems: The Data Engineering Perspective
- Author
-
Reddy, Deexith, Sinha, Urjoshi, Rajput, Rohan Singh, Akan, Ozgur, Editorial Board Member, Bellavista, Paolo, Editorial Board Member, Cao, Jiannong, Editorial Board Member, Coulson, Geoffrey, Editorial Board Member, Dressler, Falko, Editorial Board Member, Ferrari, Domenico, Editorial Board Member, Gerla, Mario, Editorial Board Member, Kobayashi, Hisashi, Editorial Board Member, Palazzo, Sergio, Editorial Board Member, Sahni, Sartaj, Editorial Board Member, Shen, Xuemin, Editorial Board Member, Stan, Mircea, Editorial Board Member, Jia, Xiaohua, Editorial Board Member, Zomaya, Albert Y., Editorial Board Member, Miraz, Mahdi H., editor, Southall, Garfield, editor, Ali, Maaruf, editor, and Ware, Andrew, editor
- Published
- 2024
- Full Text
- View/download PDF
22. Securing IoT Using Supervised Machine Learning
- Author
-
Iqbal, Sania, Qureshi, Shaima, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Challa, Rama Krishna, editor, Aujla, Gagangeet Singh, editor, Mathew, Lini, editor, Kumar, Amod, editor, Kalra, Mala, editor, Shimi, S. L., editor, Saini, Garima, editor, and Sharma, Kanika, editor
- Published
- 2024
- Full Text
- View/download PDF
23. STAC, an open standard to describe and catalog geospatial data on the web.
- Author
-
Basile, Giorgio
- Subjects
*GEOSPATIAL data, *VECTOR data, *CATALOGS, *STANDARDS
- Abstract
STAC is a recent geospatial standard for describing and cataloging geospatial assets. It is part of a broader innovation effort called Cloud-Native Geospatial, which provides modern standards and tools to efficiently access raster and vector data in the cloud. [ABSTRACT FROM AUTHOR]
- Published
- 2024
24. Captive markets and medical artificial intelligence.
- Author
-
Yong Min Lee, Stretton, Brandon, Sheryn Tan, Gupta, Aashray, Kovoor, Joshua, Bacchi, Stephen, Wanyin Lim, and Weng Onn Chan
- Subjects
*ARTIFICIAL intelligence, *MACHINE learning, *CONVOLUTIONAL neural networks, *LANGUAGE models, *NATURAL language processing, *COMPUTER literacy
- Abstract
This article explores the implications of captive markets and medical artificial intelligence (AI) in healthcare systems. It discusses how AI companies with large datasets can create captive markets, limiting consumer choice in service providers. The article emphasizes the risks of healthcare institutions relying on a single AI provider and suggests strategies to mitigate these risks, such as promoting interoperability and fostering local AI expertise. It also addresses the importance of data privacy and conflict of interest disclosures when procuring AI services for healthcare systems. The article provides valuable insights into the challenges and considerations of implementing AI in healthcare. [Extracted from the article]
- Published
- 2024
- Full Text
- View/download PDF
25. Building an Econometrics Model for Pier Construction in an Indonesian Oil and Gas Company.
- Author
-
Putra Andrian, Yoga
- Subjects
CONSTRUCTION management, PETROLEUM industry, PIERS, LITERATURE reviews, CONSTRUCTION projects, GAS companies, ENERGY demand management
- Abstract
In the face of escalating global energy demands, the construction and management of pier infrastructure emerge as pivotal challenges, particularly for energy companies like Pertamina Patra Niaga in Indonesia. This paper aims to optimize pier construction management through the implementation of an OmniClass Work Breakdown Structure (WBS), addressing the critical need for standardized project management methodologies in complex, large-scale construction projects. By integrating a multidimensional WBS approach, akin to the hypercube or tesseract model, this study explores the enhancement of project planning, execution, and management. Employing a comprehensive literature review and case study analysis, the research investigates the efficacy of OmniClass WBS in facilitating better project coordination, cost estimation, and risk management. The findings underscore the significant advantages of adopting a standardized, multidimensional WBS, including improved data management and project outcome predictability. This paper concludes that the OmniClass WBS framework not only optimizes pier construction projects but also serves as a model for future infrastructure development endeavors within the energy sector. [ABSTRACT FROM AUTHOR]
- Published
- 2024
26. Fundamental Components and Principles of Supervised Machine Learning Workflows with Numerical and Categorical Data.
- Author
-
Kampezidou, Styliani I., Tikayat Ray, Archana, Bhat, Anirudh Prabhakara, Pinon Fischer, Olivia J., and Mavris, Dimitri N.
- Subjects
*SUPERVISED learning, *WORKFLOW, *DATA augmentation, *ENGINEERING models, *AUTOMATION, *MACHINE learning, *RESEARCH personnel
- Abstract
This paper offers a comprehensive examination of the process involved in developing and automating supervised end-to-end machine learning workflows for forecasting and classification purposes. It offers a complete overview of the components (i.e., feature engineering and model selection), principles (i.e., bias–variance decomposition, model complexity, overfitting, model sensitivity to feature assumptions and scaling, and output interpretability), models (i.e., neural networks and regression models), methods (i.e., cross-validation and data augmentation), metrics (i.e., Mean Squared Error and F1-score) and tools that rule most supervised learning applications with numerical and categorical data, as well as their integration, automation, and deployment. The end goal and contribution of this paper is the education and guidance of the non-AI expert academic community regarding complete and rigorous machine learning workflows and data science practices, from problem scoping to design and state-of-the-art automation tools, including basic principles and reasoning in the choice of methods. The paper delves into the critical stages of supervised machine learning workflow development, many of which are often omitted by researchers, and covers foundational concepts essential for understanding and optimizing a functional machine learning workflow, thereby offering a holistic view of task-specific application development for applied researchers who are non-AI experts. This paper may be of significant value to academic researchers developing and prototyping machine learning workflows for their own research or as customer-tailored solutions for government and industry partners. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
27. Fundamental Components and Principles of Supervised Machine Learning Workflows with Numerical and Categorical Data
- Author
-
Styliani I. Kampezidou, Archana Tikayat Ray, Anirudh Prabhakara Bhat, Olivia J. Pinon Fischer, and Dimitri N. Mavris
- Subjects
machine learning workflow; supervised learning; numerical data; categorical data; data engineering; extraction, loading, transformation; Electrical engineering. Electronics. Nuclear engineering; TK1-9971
- Abstract
This paper offers a comprehensive examination of the process involved in developing and automating supervised end-to-end machine learning workflows for forecasting and classification purposes. It offers a complete overview of the components (i.e., feature engineering and model selection), principles (i.e., bias–variance decomposition, model complexity, overfitting, model sensitivity to feature assumptions and scaling, and output interpretability), models (i.e., neural networks and regression models), methods (i.e., cross-validation and data augmentation), metrics (i.e., Mean Squared Error and F1-score) and tools that rule most supervised learning applications with numerical and categorical data, as well as their integration, automation, and deployment. The end goal and contribution of this paper is the education and guidance of the non-AI expert academic community regarding complete and rigorous machine learning workflows and data science practices, from problem scoping to design and state-of-the-art automation tools, including basic principles and reasoning in the choice of methods. The paper delves into the critical stages of supervised machine learning workflow development, many of which are often omitted by researchers, and covers foundational concepts essential for understanding and optimizing a functional machine learning workflow, thereby offering a holistic view of task-specific application development for applied researchers who are non-AI experts. This paper may be of significant value to academic researchers developing and prototyping machine learning workflows for their own research or as customer-tailored solutions for government and industry partners.
- Published
- 2024
- Full Text
- View/download PDF
28. Incorporating Deep Learning Model Development With an End-to-End Data Pipeline
- Author
-
Kaichong Zhang
- Subjects
Artificial intelligence, business intelligence, database management, data engineering, data pipeline, deep learning, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
The rising popularity of artificial intelligence has led to an increasing amount of research on deep learning models. Many current studies have focused on topics such as model structures, model optimization techniques, fine-tuning, and transfer learning, aiming to create novel models that have greater predictability in one or more fields of interest. However, while model development is important, it should not be limited to the topics mentioned above. Instead, the scope of research can be broadened to encompass the holistic design of an end-to-end pipeline for deep learning model development, which includes data storage, extract, transform, and load (ETL), business intelligence, model training and testing, and incremental learning. This paper therefore aims to underscore the importance of this data pipeline and provide a paradigm that delineates each aspect of this pipeline in detail through a practical case study centered on the end-to-end development of recommender system models. Compared to the conventional model development process, the novel data pipeline provides more organized and efficient data storage and data preparation, easier and more manageable visualization solutions, and a more comprehensive approach to model evaluation and model selection through the use of databases, business intelligence tools, and incremental learning.
- Published
- 2024
- Full Text
- View/download PDF
29. The E(G)TL Model: A Novel Approach for Efficient Data Handling and Extraction in Multivariate Systems
- Author
-
Aleksejs Vesjolijs
- Subjects
generative AI, ETL, data engineering, data mesh, DataOps, hyperloop, Technology, Applied mathematics. Quantitative methods, T57-57.97
- Abstract
This paper introduces the EGTL (extract, generate, transfer, load) model, a theoretical framework designed to enhance the traditional ETL processes by integrating a novel ‘generate’ step utilizing generative artificial intelligence (GenAI). This enhancement optimizes data extraction and processing, presenting a high-level solution architecture that includes innovative data storage concepts: the Fusion and Alliance stores. The Fusion store acts as a virtual space for immediate data cleaning and profiling post-extraction, facilitated by GenAI, while the Alliance store serves as a collaborative data warehouse for both business users and AI processes. EGTL was developed to facilitate advanced data handling and integration within digital ecosystems. This study defines the EGTL solution design, setting the groundwork for future practical implementations and exploring the integration of best practices from data engineering, including DataOps principles and data mesh architecture. This research underscores how EGTL can improve the data engineering pipeline, illustrating the interactions between its components. The EGTL model was tested in the prototype web-based Hyperloop Decision-Making Ecosystem with tasks ranging from data extraction to code generation. Experiments demonstrated an overall success rate of 93% across five difficulty levels. Additionally, the study highlights key risks associated with EGTL implementation and offers comprehensive mitigation strategies.
- Published
- 2024
- Full Text
- View/download PDF
30. Understanding project success involving analytic-based decision support in the digital era: a focus on IC and agile project management
- Author
-
Kudyba, Stephan and D Cruz, Agnel
- Published
- 2023
- Full Text
- View/download PDF
31. Building Advanced Web Applications Using Data Ingestion and Data Processing Tools.
- Author
-
Šprem, Šimun, Tomažin, Nikola, Matečić, Jelena, and Horvat, Marko
- Subjects
ELECTRONIC data processing, WEB-based user interfaces, BIOMETRIC identification, DATA libraries, INGESTION, REAL-time computing
- Abstract
Today, advanced websites serve as robust data repositories that constantly collect various user-centered information and prepare it for subsequent processing. The data collected can include a wide range of important information from email addresses, usernames, and passwords to demographic information such as age, gender, and geographic location. User behavior metrics are also collected, including browsing history, click patterns, and time spent on pages, as well as different preferences like product selection, language preferences, and individual settings. Interactions, device information, transaction history, authentication data, communication logs, and various analytics and metrics contribute to the comprehensive range of user-centric information collected by websites. A method to systematically ingest and transfer such differently structured information to a central message broker is thoroughly described. In this context, a novel tool—Dataphos Publisher—for the creation of ready-to-digest data packages is presented. Data acquired from the message broker are employed for data quality analysis, storage, conversion, and downstream processing. A brief overview of the commonly used and freely available tools for data ingestion and processing is also provided. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
32. Bull Breeding Soundness Assessment Using Artificial Neural Network-Based Predictive Models.
- Author
-
Marín-Urías, Luis F., García-Ramírez, Pedro J., Domínguez-Mancera, Belisario, Hernández-Beltrán, Antonio, Vásquez-Santacruz, José A., Cervantes-Acosta, Patricia, Barrientos-Morales, Manuel, and Portillo-Vélez, Rogelio de J.
- Subjects
PREDICTION models, BULLS, VETERINARY medicine, FISH spawning, ARTIFICIAL neural networks
- Abstract
For years, efforts have been devoted to establishing an effective bull breeding soundness evaluation procedure; usual research on this subject is based on bull breeding soundness examination (BBSE) methodologies, which have significant limitations in terms of their evaluation procedure, such as their high cost, time consumption, and administrative difficulty, as well as a lack of diagnostic laboratories equipped to handle the more difficult cases. This research focused on the creation of a prediction model to supplement and/or improve the BBSE approach through the study of two algorithms, namely, clustering and artificial neural networks (ANNs), to find the optimum machine learning (ML) approach for our application, with an emphasis on data categorization accuracy. This tool was designed to assist veterinary medicine and farmers in identifying key factors and increasing certainty in their decision-making during the selection of bulls for breeding purposes, providing data from a limited number of factors generated from a deep pairing study of bulls. Zebu, European, and crossbred bulls were the general groupings. The data utilized in the model's creation (N = 359) considered five variables that influence improvement decisions. This approach enhanced decision-making by 12% compared to traditional breeding bull management. ANN obtained an accuracy of 90%, with precision rates of 97% for satisfactory, 92% for unsatisfactory, and 85% for bad. These results indicate that the proposed method can be considered an effective alternative for innovative decision-making in traditional BBSE. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
33. An End-to-End Deep Learning Framework for Fault Detection in Marine Machinery
- Author
-
Spyros Rigas, Paraskevi Tzouveli, and Stefanos Kollias
- Subjects
data collection, data engineering, deep learning, fault detection, marine IoT, MLOps, Chemical technology, TP1-1185
- Abstract
The Industrial Internet of Things has enabled the integration and analysis of vast volumes of data across various industries, with the maritime sector being no exception. Advances in cloud computing and deep learning (DL) are continuously reshaping the industry, particularly in optimizing maritime operations such as Predictive Maintenance (PdM). In this study, we propose a novel DL-based framework focusing on the fault detection task of PdM in marine operations, leveraging time-series data from sensors installed on shipboard machinery. The framework is designed as a scalable and cost-efficient software solution, encompassing all stages from data collection and pre-processing at the edge to the deployment and lifecycle management of DL models. The proposed DL architecture utilizes Graph Attention Networks (GATs) to extract spatio-temporal information from the time-series data and provides explainable predictions through a feature-wise scoring mechanism. Additionally, a custom evaluation metric with real-world applicability is employed, prioritizing both prediction accuracy and the timeliness of fault identification. To demonstrate the effectiveness of our framework, we conduct experiments on three types of open-source datasets relevant to PdM: electrical data, bearing datasets, and data from water circulation experiments.
- Published
- 2024
- Full Text
- View/download PDF
34. Sustainable AI - Standards, Current Practices and Recommendations
- Author
-
Banipal, Indervir Singh, Asthana, Shubhi, Mazumder, Sourav, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, and Arai, Kohei, editor
- Published
- 2023
- Full Text
- View/download PDF
35. Procedurally Generated Colonoscopy and Laparoscopy Data for Improved Model Training Performance
- Author
-
Dowrick, Thomas, Chen, Long, Ramalhinho, João, Puyal, Juana González-Bueno, Clarkson, Matthew J., Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Bhattarai, Binod, editor, Ali, Sharib, editor, Rau, Anita, editor, Nguyen, Anh, editor, Namburete, Ana, editor, Caramalau, Razvan, editor, and Stoyanov, Danail, editor
- Published
- 2023
- Full Text
- View/download PDF
36. So You Want to Work in Tech: How Do You Make the Leap?
- Author
-
Barnes, Matthew, Esarey, Justin, Series Editor, and Jackson, Natalie, editor
- Published
- 2023
- Full Text
- View/download PDF
37. Cloud-Based Simulation Model for Agriculture Big Data in the Kingdom of Bahrain
- Author
-
Ghanim, Mohammed, Alammary, Jaflah, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Yang, Xin-She, editor, Sherratt, R. Simon, editor, Dey, Nilanjan, editor, and Joshi, Amit, editor
- Published
- 2023
- Full Text
- View/download PDF
38. A Survey-Based Evaluation of the Data Engineering Maturity in Practice
- Author
-
Tebernum, Daniel, Altendeitering, Marcel, Howar, Falk, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Cuzzocrea, Alfredo, editor, Gusikhin, Oleg, editor, Hammoudi, Slimane, editor, and Quix, Christoph, editor
- Published
- 2023
- Full Text
- View/download PDF
39. Towards Development of Data Architecture for Learning Analytics Projects Using Data Engineering Approach
- Author
-
Popovych, Valerii, Drlik, Martin, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Tanwar, Sudeep, editor, Wierzchon, Slawomir T., editor, Singh, Pradeep Kumar, editor, Ganzha, Maria, editor, and Epiphaniou, Gregory, editor
- Published
- 2023
- Full Text
- View/download PDF
40. High Performance Dataframes from Parallel Processing Patterns
- Author
-
Perera, Niranda, Kamburugamuve, Supun, Widanage, Chathura, Abeykoon, Vibhatha, Uyar, Ahmet, Shan, Kaiying, Maithree, Hasara, Lenadora, Damitha, Kanewala, Thejaka Amila, Fox, Geoffrey, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Wyrzykowski, Roman, editor, Dongarra, Jack, editor, Deelman, Ewa, editor, and Karczewski, Konrad, editor
- Published
- 2023
- Full Text
- View/download PDF
41. Data Engineering in Action
- Author
-
Cascavilla, Giuseppe, Dalla Palma, Stefano, Driessen, Stefan, van den Heuvel, Willem-Jan, De Pascale, Daniel, Sangiovanni, Mirella, Schouten, Gerard, Liebregts, Werner, editor, van den Heuvel, Willem-Jan, editor, van den Born, Arjan, editor, Van den Heuvel, Willem-Jan, Section Editor, Tamburri, Damian A., Section Editor, Böing-Messing, Florian, Section Editor, and Lafarre, Anne J. F., Section Editor
- Published
- 2023
- Full Text
- View/download PDF
42. The Unlikely Wedlock Between Data Science and Entrepreneurship
- Author
-
van den Born, Arjan, Liebregts, Werner, van den Heuvel, Willem-Jan, Liebregts, Werner, editor, van den Heuvel, Willem-Jan, editor, van den Born, Arjan, editor, Van den Heuvel, Willem-Jan, Section Editor, Tamburri, Damian A., Section Editor, Böing-Messing, Florian, Section Editor, and Lafarre, Anne J. F., Section Editor
- Published
- 2023
- Full Text
- View/download PDF
43. Requirements on and Selection of Data Storage Technologies for Life Cycle Assessment
- Author
-
Ulbig, Michael, Merschak, Simon, Hehenberger, Peter, Bachler, Johann, Rannenberg, Kai, Editor-in-Chief, Soares Barbosa, Luís, Editorial Board Member, Goedicke, Michael, Editorial Board Member, Tatnall, Arthur, Editorial Board Member, Neuhold, Erich J., Editorial Board Member, Stiller, Burkhard, Editorial Board Member, Tröltzsch, Fredi, Editorial Board Member, Pries-Heje, Jan, Editorial Board Member, Kreps, David, Editorial Board Member, Reis, Ricardo, Editorial Board Member, Furnell, Steven, Editorial Board Member, Mercier-Laurent, Eunika, Editorial Board Member, Winckler, Marco, Editorial Board Member, Malaka, Rainer, Editorial Board Member, Noël, Frédéric, editor, Nyffenegger, Felix, editor, Rivest, Louis, editor, and Bouras, Abdelaziz, editor
- Published
- 2023
- Full Text
- View/download PDF
44. Application of microservices patterns to big data systems
- Author
-
Pouya Ataei and Daniel Staegemann
- Subjects
Big data, Microservices, Microservices patterns, Big data architecture, Data architecture, Data engineering, Computer engineering. Computer hardware, TK7885-7895, Information technology, T58.5-58.64, Electronic computers. Computer science, QA75.5-76.95
- Abstract
The panorama of data is ever evolving, and big data has emerged to become one of the most hyped terms in the industry. Today, users are the perpetual producers of data that, if gleaned and crunched, have the potential to reveal game-changing patterns. This has introduced an important shift regarding the role of data in organizations, and many strive to harness the power of this new material. However, institutionalizing data is not an easy task and requires the absorption of a great deal of complexity. According to the literature, it is estimated that only 13% of organizations succeeded in delivering on their data strategy. Among the root challenges, big data system development and data architecture are prominent. To this end, this study aims to facilitate data architecture and big data system development by applying well-established patterns of microservices architecture to big data systems. This objective is achieved by two systematic literature reviews, and infusion of results through thematic synthesis. The result of this work is a series of theories that explicate how microservices patterns could be useful for big data systems. These theories are then validated through expert opinion gathering with 7 experts from the industry. The findings that emerged from this study indicate that big data architectures can benefit from many principles and patterns of microservices architecture.
- Published
- 2023
- Full Text
- View/download PDF
45. DATA ENGINEERING IN CRISP-DM PROCESS PRODUCTION DATA – CASE STUDY
- Author
-
Jolanta BRZOZOWSKA, Jakub PIZOŃ, Gulzhan BAYTIKENOVA, Arkadiusz GOLA, Alfiya ZAKIMOVA, and Katarzyna PIOTROWSKA
- Subjects
data engineering, data mining, CRISP-DM, assembly, process planning, Information technology, T58.5-58.64, Electronic computers. Computer science, QA75.5-76.95
- Abstract
The paper describes one of the methods of data acquisition in data mining models used to support decision-making. The study presents the possibilities of data collection using the phases of the CRISP-DM model for an organization and presents the possibility of adapting the model for analysis and management in the decision-making process. The first three phases of implementing the CRISP-DM model are described using data from an enterprise with small batch production as an example. The paper presents the CRISP-DM based model for data mining in the process of predicting assembly cycle time. The developed solution has been evaluated using real industrial data and will be part of a methodology that allows estimating the assembly time of a finished product at the quotation stage, i.e., without the detailed technology of the product being known.
- Published
- 2023
- Full Text
- View/download PDF
46. The automated collapse data constructor technique and the data‐driven methodology for seismic collapse risk assessment.
- Author
-
Bijelić, Nenad, Lignos, Dimitrios G., and Alahi, Alexandre
- Subjects
GROUND motion, RISK assessment, MACHINE learning, PROGRESSIVE collapse, EARTHQUAKE engineering, DECISION trees, EARTHQUAKE hazard analysis, BUILDING failures
- Abstract
The majority of past research on the application of machine learning (ML) in earthquake engineering has focused on contrasting the predictive performance of different ML algorithms. In contrast, the emphasis of this paper is on the use of data to boost the predictive performance of surrogates. To that end, a novel data engineering methodology for seismic collapse risk assessment is proposed. This method, termed the automated collapse data constructor (ACDC), stems from a combined understanding of the ground motion characteristics and the collapse process. In addition, the data‐driven collapse classifier (D2C2) methodology is proposed, which enables conversion of the collapse data from a regression format to a classification format. The D2C2 methodology can be used with any classification tool, and it allows estimation of seismic collapse capacities in a way analogous to the incremental dynamic analysis. The proposed methodologies are tested in a case study using decision trees (XGBoost) and neural network classifiers with an extensive dataset of collapse responses of a 4‐story and an 8‐story steel moment resisting frame. The results suggest that the ACDC methodology allows for dramatic improvement of the predictive performance of data‐driven tools while at the same time significantly reducing data requirements. Specifically, the proposed method can reduce the number of ground motions required for collapse risk assessment from at least forty, as traditionally used, to fewer than twenty motions. Moreover, interpretation of feature importance conforms with the engineering understanding while revealing a novel, period‐dependent measure of ground motion duration. All data and code developed in this research are made openly available. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
47. Autonomous Federated Learning for Distributed Intrusion Detection Systems in Public Networks
- Author
-
Alireza Bakhshi Zadi Mahmoodi, Saeid Sheikhi, Ella Peltonen, and Panos Kostakos
- Subjects
Network security, cybersecurity, federated learning, data engineering, distributed computing, stream processing, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
The rapid integration of IoT, cloud, and edge computing has resulted in highly interconnected networks, emphasizing the need for advanced Intrusion Detection Systems (IDS) to maintain security. Successful AI-based IDS relies on high-quality data for model training. Even though a vast array of datasets from controlled settings are accessible, many fall short as they are outdated and lack the representative data of network traffic dynamics typically seen in public networks. This paper aims to advance understanding in designing testbed architectures for defense mechanisms within public networks. At its core, this research introduces a unique testbed utilizing the connectivity of panOULU Municipal public network in the city of Oulu, Finland. This experimental setup examines AI-driven security across the public network. It utilizes edge-to-cloud infrastructures, incorporating Software-Defined Networking (SDN) and Network Function Virtualization (NFV) via the VMware vSphere platform. During the training phase, a script distinguishes incoming packets as either benign or malicious based on well-defined local parameters and simulated attack scenarios. This labeled data is then utilized for training machine learning models within the Federated Learning framework, FED-ML. Subsequently, these models are evaluated on previously unseen data. The entire procedure, from traffic gathering to model training, operates without human involvement. The evaluation dataset and testbed configuration we have made publicly available through this research can deepen our understanding of the challenges in safeguarding public networks, especially those that blend various technologies in diverse environments.
- Published
- 2023
- Full Text
- View/download PDF
48. Towards a domain-driven distributed reference architecture for big data systems.
- Author
-
Ataei, Pouya and Litchfield, Alan
- Subjects
BIG data, COMPUTER software development, INFRASTRUCTURE (Economics), DATA science, INFORMATION storage & retrieval systems
- Abstract
The proliferation of digital devices, the rapid development of software, and today's infrastructure have augmented users' capability to produce data at an unprecedented rate. This accelerated growth of data, which could be called the era of big data, has forced a paradigm shift in data engineering because the variety, velocity and volume of data overwhelmed existing systems. While companies attempt to extract benefit from big data, success rates are still low. Challenges such as rapid changes in technology, organizational culture, complexity in data engineering, impediments to system development, and a lack of effective big data architectures mean that only an estimated 20% of companies achieved their goals. To this end, this study explores a domain-driven distributed big data reference architecture that addresses issues in data architecture, data engineering, and system development. This reference architecture is empirically grounded and evaluated through deployment in a real-world scenario as an instantiated prototype, solving a problem in practice. The results of the evaluation demonstrate utility and applicability but with architectural trade-offs and challenges. [ABSTRACT FROM AUTHOR]
- Published
- 2023
49. On Studying the Effect of Data Quality on Classification Performances
- Author
-
Jouseau, Roxane, Salva, Sébastien, Samir, Chafik, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Yin, Hujun, editor, Camacho, David, editor, and Tino, Peter, editor
- Published
- 2022
- Full Text
- View/download PDF
50. Improving the Yield and Revenue of Indian Crop Production Using Data Engineering
- Author
-
Domala, Jayashree, Dogra, Manmohan, Dsouza, Kevin, Fernandes, Dwayne, Srinivasaraghavan, Anuradha, Xhafa, Fatos, Series Editor, Gupta, Deepak, editor, Polkowski, Zdzislaw, editor, Khanna, Ashish, editor, Bhattacharyya, Siddhartha, editor, and Castillo, Oscar, editor
- Published
- 2022
- Full Text
- View/download PDF