103 results for '"ETL"'
Search Results
2. PDCP: A Set of Tools for Extracting, Transforming, and Loading Radiotherapy Data from the Orthanc Research PACS
- Author
-
Ali Haidar, Farhannah Aly, and Lois Holloway
- Subjects
Orthanc, ETL, data mining - Abstract
The Orthanc server is a lightweight, open-source picture archiving and communication system (PACS) used to store digital imaging and communications in medicine (DICOM) data. It is widely used in research environments because it is free, open source, and scalable. To enable the use of radiotherapy (RT) data stored in Orthanc for data mining and machine learning tasks, the records need to be extracted, validated, linked, and presented in a usable format. This paper reports patient data collection and processing (PDCP), a set of tools created in Python for extracting, transforming, and loading RT data from the Orthanc PACS. PDCP enables querying, retrieving, and validating patient imaging summaries; analysing associations between patient DICOM data; retrieving patient imaging data into a local directory; preparing the records for use in various research questions; and tracking the patient data collection process, including the reasons for excluding a patient's data. PDCP aims to simplify data preparation in such applications and was made extensible to accommodate additional data preparation tasks.
- Published
- 2022
- Full Text
- View/download PDF
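A minimal sketch of the kind of extraction step described in the PDCP entry above, using Orthanc's REST API over HTTP. The server URL, credentials, staging directory, and validation rule are illustrative assumptions; PDCP's actual interfaces may differ.

    # Sketch: pull patient summaries from an Orthanc server via its REST API
    # and stage them locally for later transformation/loading steps.
    import json
    import pathlib
    import requests  # third-party HTTP client

    ORTHANC_URL = "http://localhost:8042"       # assumed server address
    AUTH = ("orthanc", "orthanc")               # assumed credentials
    OUT_DIR = pathlib.Path("staging/patients")  # assumed local staging directory
    OUT_DIR.mkdir(parents=True, exist_ok=True)

    # Orthanc exposes the stored patients as a list of internal identifiers.
    patient_ids = requests.get(f"{ORTHANC_URL}/patients", auth=AUTH, timeout=30).json()

    for pid in patient_ids:
        # Each patient resource carries the main DICOM tags and the list of studies.
        summary = requests.get(f"{ORTHANC_URL}/patients/{pid}", auth=AUTH, timeout=30).json()
        # Basic validation before staging: skip records without a PatientID tag.
        if not summary.get("MainDicomTags", {}).get("PatientID"):
            continue
        (OUT_DIR / f"{pid}.json").write_text(json.dumps(summary, indent=2))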
3. Analisa Kategori Barang dengan Penjualan Terbanyak dalam Jangka Waktu 3 Bulan Menggunakan Data Warehouse [Analysis of the Best-Selling Item Category over a Three-Month Period Using a Data Warehouse]
- Author
-
I Putu Agus Eka Pratama and Rey Bernard
- Subjects
etl, data warehouse, multidimensional data, tableau, pentaho data integration - Abstract
UD. Makmur Sejahtera, one of the largest distributors of daily necessities in Manokwari, Papua, holds sales transaction data for every item category and item type. These data are still stored physically as paper receipts and have not been digitised, so they cannot be fully exploited to help UD. Makmur Sejahtera increase its sales. The basic idea of this study is to use digital sales transaction data to determine which item category sold the most over a three-month period (July 2020 to September 2020) through an Extraction, Transformation, Loading (ETL) process based on Pentaho Data Integration; the results are then stored as multidimensional data, categorised, and visualised using Tableau. The tests show that rice was the best-selling item category over the three-month period, and that the Data Warehouse implementation greatly helped UD. Makmur Sejahtera achieve its business goals.
- Published
- 2022
- Full Text
- View/download PDF
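A minimal sketch of the core aggregation behind the entry above (best-selling category over July–September 2020). The file name and column names are assumptions; the study itself uses Pentaho Data Integration and Tableau rather than Python.

    # Sketch: "which category sold most in Q3 2020", the question the pipeline answers.
    import pandas as pd

    sales = pd.read_csv("sales_notes_digitised.csv",   # assumed extract of the paper receipts
                        parse_dates=["transaction_date"])

    q3 = sales[(sales["transaction_date"] >= "2020-07-01") &
               (sales["transaction_date"] <= "2020-09-30")]

    totals = (q3.groupby("item_category")["quantity_sold"]
                .sum()
                .sort_values(ascending=False))

    print(totals.head(1))   # e.g. the rice category, per the study's finding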
4. Public Sector Credit Card Transactions
- Author
-
Mays, Morgan
- Subjects
Big Data ,Data Warehouse ,Government Spending ,Data Set ,Debit Card Transactions ,Data Visualization ,Data Science ,Purchase Card Transactions ,Credit Card Transactions ,csv ,Transactions ,ETL ,Spending Habits ,Purchases ,Government Spending Habits ,PCard Transactions ,Transactional Data ,Financial Transactions - Abstract
An open source data integration project combining credit card transaction data of major US government entities.
- Published
- 2022
- Full Text
- View/download PDF
5. DATA ANALYSIS AND ETL TOOLS IN BUSINESS INTELLIGENCE
- Author
-
Keval Rajesh Dhanani, Param Pankaj Doshi, Geetha S, and IRJCS: International Research Journal of Computer Science
- Subjects
ETL, Business Intelligence, EIS, Knowledge management, Computer science, DSS - Abstract
Business intelligence (BI) is a collection of software and services that converts data into actionable insights, which can inform an enterprise's strategic and tactical business decisions. BI provides users with detailed intelligence about the state of the business through tools that access and analyse data sets and deliver their findings in reports, summaries, dashboards, and the like. The technology has advanced considerably in the area of information systems, evolving from DSS (Decision Support Systems) to EIS (Executive Information Systems) to Business Intelligence systems. ETL (Extract, Transform, and Load) is the procedure of pulling data out of various data sources, processing it according to business rules and calculations, and transferring the reshaped data into a data warehouse. ETL lies at the core of Business Intelligence systems because of the in-depth analytics data it provides; with ETL, enterprises can obtain historical, current, and predictive views of real business data.
- Published
- 2020
- Full Text
- View/download PDF
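A minimal ETL sketch matching the definition in the entry above: extract rows from a flat-file source, apply a business calculation, and load the reshaped rows into a warehouse table. SQLite stands in for the data warehouse, and all file, table, and column names are assumptions.

    # Minimal Extract-Transform-Load sketch against an assumed CSV source.
    import csv
    import sqlite3

    def extract(path):
        with open(path, newline="") as f:
            yield from csv.DictReader(f)

    def transform(row):
        # Example business calculation: net revenue = gross - discount.
        return {
            "order_id": row["order_id"],
            "net_revenue": float(row["gross"]) - float(row["discount"]),
        }

    def load(rows, db_path="warehouse.db"):
        con = sqlite3.connect(db_path)
        con.execute("CREATE TABLE IF NOT EXISTS fact_orders (order_id TEXT, net_revenue REAL)")
        con.executemany("INSERT INTO fact_orders VALUES (:order_id, :net_revenue)", rows)
        con.commit()
        con.close()

    if __name__ == "__main__":
        load(transform(r) for r in extract("orders.csv"))  # assumed source file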
6. Development of Data Warehouse for Financial Report in Faculty of Science, Naresuan University
- Author
-
Kittipong Songsiri
- Subjects
ETL ,Data Warehouse ,OLAP ,Financial Report Data Warehouse - Abstract
Journal of Applied Informatics and Technology, 4, 2, 99-113
- Published
- 2022
- Full Text
- View/download PDF
7. Numerical Simulation and Optimization of Inorganic Lead-Free Cs3Bi2I9-Based Perovskite Photovoltaic Cell: Impact of Various Design Parameters
- Author
-
Arnob Das, Susmita Datta Peu, Md Abdul Mannan Akanda, Mostafa M. Salah, Md. Sejan Hossain, and Barun Kumar Das
- Subjects
defect density, perovskite solar cell, Cs3Bi2I9, PCE, ETL, HTL, fill factor (FF) - Abstract
Lead halide-based perovskite solar cells have attracted much attention in the photovoltaic industry due to their high efficiency, ease of manufacture, light weight, and low cost. However, these cells are not manufactured commercially because of the toxicity of lead. To explore lead-free inorganic perovskite solar cells (PSCs), we investigated a novel Cs3Bi2I9-based perovskite configuration in the SCAPS-1D software using different hole transport layers (HTLs), with WS2 applied as the electron transport layer (ETL). Comparative analysis of the various design configurations reveals that ITO/WS2/Cs3Bi2I9/PEDOT:PSS/Au offers the best performance, with a power conversion efficiency (PCE) of 20.12%. After optimizing the thickness, bandgap, defect density, and carrier density, the efficiency of this configuration increases from 20.12% to 24.91%. Improvements in other performance parameters, such as the short-circuit current (17.325 mA/cm2), open-circuit voltage (1.5683 V), and fill factor (91.66%), are also observed after tuning these attributes. This investigation indicates the potential of Cs3Bi2I9 as a lead-free and stable perovskite material that can contribute to the renewable energy sector.
- Published
- 2023
- Full Text
- View/download PDF
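As a quick consistency check (not part of the source record), the optimized figures reported in the abstract above reproduce the stated efficiency under the standard one-sun input power commonly assumed in SCAPS-1D studies, P_in = 100 mW/cm^2:

    \mathrm{PCE} \;=\; \frac{J_{sc}\, V_{oc}\, \mathrm{FF}}{P_{in}}
    \;=\; \frac{17.325~\mathrm{mA/cm^2} \times 1.5683~\mathrm{V} \times 0.9166}{100~\mathrm{mW/cm^2}}
    \;\approx\; 24.9\%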
8. A META DATA VAULT APPROACH FOR EVOLUTIONARY INTEGRATION OF BIG DATA SETS: CASE STUDY USING THE NCBI DATABASE FOR GENETIC VARIATION
- Author
-
Zaineb Naamane and Vladan Jovanovic
- Subjects
Data Warehouse, Big Data, Schema Evolution, Master Data Vault, Business Data Vault, Data Mart, Business Intelligence, Genetic Variation, Data Warehouse Evolution, Data Vault (DV), Materialized View, Relational Database Management System (RDMS), NoSQL, Master Data Management (MDM), Data Warehouse (DW), Metadata, ETL, Metadata Repository, Enterprise Data Warehouse (EDW), Data Mining - Abstract
A data warehouse integrates data from various heterogeneous data sources and creates a consolidated view of the data that is optimized for reporting and analysis. Today, business and technology are constantly evolving, which directly affects the data sources: new data sources can emerge while others become unavailable. The DW or data mart based on these sources needs to reflect those changes. Various solutions for adapting a data warehouse after changes in the data sources and business requirements have been proposed in the literature [1]. However, research on DW evolution has focused mainly on managing changes in the dimensional model, while other aspects related to the ETL and to maintaining the history of changes have not been addressed. The paper presents a Meta Data Vault model that combines a data vault based data warehouse with master data management. A major focus of this research is to keep both the history of changes and a "single version of the truth" through an MDM integrated with the DW. The paper also outlines the load patterns used to load data into the data warehouse and the materialized views used to deliver data to end users. To test the proposed model, we used big data sets from the biomedical field and, for each modification of the data source schema, we outline the changes that need to be made to the EDW, the data marts, and the ETL.
- Published
- 2021
- Full Text
- View/download PDF
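A minimal sketch of the generic Data Vault pattern that the entry above builds on: a hub holds the stable business key, while a satellite keeps the full history of attribute changes (one row per load date), so nothing is overwritten. The table and column names are illustrative, not the paper's actual schema.

    # Sketch of a hub + satellite pair in the Data Vault style, using SQLite.
    import sqlite3

    ddl = """
    CREATE TABLE IF NOT EXISTS hub_variant (
        variant_hk    TEXT PRIMARY KEY,   -- hash of the business key
        variant_id    TEXT NOT NULL,      -- business key (e.g. a genetic-variation id)
        load_date     TEXT NOT NULL,
        record_source TEXT NOT NULL
    );
    CREATE TABLE IF NOT EXISTS sat_variant_details (
        variant_hk    TEXT NOT NULL REFERENCES hub_variant(variant_hk),
        load_date     TEXT NOT NULL,      -- each change gets a new row, keeping history
        hash_diff     TEXT NOT NULL,      -- detects attribute changes between loads
        gene_symbol   TEXT,
        clinical_significance TEXT,
        record_source TEXT NOT NULL,
        PRIMARY KEY (variant_hk, load_date)
    );
    """

    con = sqlite3.connect("meta_data_vault.db")  # assumed local database file
    con.executescript(ddl)
    con.close()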
9. Rancang Bangun Engine ETL Data Warehouse dengan Menggunakan Bahasa Python [Design and Implementation of a Data Warehouse ETL Engine Using the Python Language]
- Author
-
Dewa Komang Tri Adhitya Putra and I Made Suwija Putra
- Subjects
Data Warehouse, Data collection, Database, Information technology, Computer science, Relational database, Online analytical processing, Automation, Manual, Python (programming language), Data structure, ETL, Systems engineering, Online transaction processing, Database transaction - Abstract
Big companies that have many branches in different locations often have difficulty analysing the transaction processes of each branch. The problem experienced by company management is the slow delivery of the massive data produced by the branches to the head office, which makes the analysis of the company's performance slow and inaccurate. The results of this analysis are used to support decision making and will only produce the right information if the data are complete and relevant. A good and reliable method for collecting massive data is the data warehouse approach. A data warehouse is a relational database designed to optimize queries for Online Analytical Processing (OLAP) over the transactions of various data sources, and it can record any changes in the data so that the data become more structured. To collect the data, a data warehouse uses extract, transform, and load (ETL) steps to read data from the Online Transaction Processing (OLTP) system, reshape the data into a uniform structure, and save them to their final location in the data warehouse. This study presents a solution for implementing ETL that can work automatically or manually, as needed, using the Python programming language, thereby simplifying the ETL process and adapting to the state of the databases in the company's systems.
- Published
- 2019
- Full Text
- View/download PDF
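A minimal sketch of an ETL routine that can be run on demand ("manual") or on a fixed interval ("automatic"), in the spirit of the tool described in the entry above. The OLTP source, warehouse target, query, and naive scheduling loop are all illustrative assumptions.

    # Sketch of a small ETL engine with a manual and an automatic mode.
    import sqlite3
    import time

    def run_etl(oltp_db="oltp.db", dw_db="warehouse.db"):
        src, dst = sqlite3.connect(oltp_db), sqlite3.connect(dw_db)
        dst.execute("""CREATE TABLE IF NOT EXISTS fact_sales
                       (branch TEXT, sale_date TEXT, amount REAL)""")
        # Extract from the transactional system and unify the structure (transform).
        rows = src.execute(
            "SELECT branch_code, date(created_at), total_amount FROM transactions"
        ).fetchall()
        dst.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", rows)
        dst.commit()
        src.close()
        dst.close()

    def main(mode="manual", interval_seconds=3600):
        if mode == "manual":
            run_etl()
        else:  # "automatic": naive polling loop; a real deployment would use a scheduler
            while True:
                run_etl()
                time.sleep(interval_seconds)

    if __name__ == "__main__":
        main()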
10. Load Variability of Training Sessions and Competition in Female Basketball
- Subjects
ETL, ITL, Training, Load, Variability - Published
- 2021
11. Reduced Graphene Oxide-Modified Tin Oxide Thin Films for Energy and Environmental Applications
- Author
-
Dai, Xinchen
- Subjects
ETL ,Tin oxide ,RGO ,Electron transport layer ,Reduced graphene oxide ,SnO2 - Abstract
Metal halide perovskite solar cells (PSCs) have attracted tremendous attention because of their rapid development. To enhance the power conversion efficiency (PCE) of PSCs, significant research effort has focused on the optimization of the electron transport layer (ETL). SnO2 has been extensively used as an ETL due to its excellent electron transport properties. Optimizing the fabrication of SnO2 and passivating its structural defects are essential to improving ETL performance. This thesis aims to (i) investigate the fabrication of pristine SnO2 thin films and characterize their material properties and (ii) develop a novel fabrication route for reduced graphene oxide (RGO) modified SnO2 thin films as ETLs and characterize their material properties. The first part was achieved by investigating the effects of UV-ozone treatment on the fabrication of SnO2 thin films, as well as the effects of precursor concentration and heating temperature on the resulting film properties. The results showed that UV-ozone pretreatment is essential for depositing a continuous SnO2 thin film. A high precursor concentration resulted in low roughness and a high density of n-type defects, which decreases the electrical conductivity, while a high heat-treatment temperature increased crystallinity and reduced oxygen vacancies and residual Cl. The second part was achieved by investigating the effects of (i) the preparation time of the RGO-SnO2 precursor and (ii) the RGO concentration on the properties of RGO-SnO2 thin films. The RGO-SnO2 thin film was successfully fabricated by spin coating a precursor solution of SnCl2 in ethanol (95%) mixed with graphene oxide (GO), followed by heating at low temperatures. With RGO modification, RGO-SnO2 thin films with high crystallinity and low oxygen vacancy content were achieved, resulting in high electrical conductivity. The study showed that a precursor preparation time of 3 h and an RGO addition of 2.62 wt% gave the best properties of the RGO-SnO2 thin films. This ETL has a surface roughness Ra of
- Published
- 2021
- Full Text
- View/download PDF
12. An Analytical Tool for Georeferenced Sensor Data based on ELK Stack
- Author
-
François Pinet, Thi Thu Trang Ngo, Myoung-Ah Kang, David Sarramia, Laboratoire d'Informatique, de Modélisation et d'Optimisation des Systèmes (LIMOS), Ecole Nationale Supérieure des Mines de St Etienne (ENSM ST-ETIENNE)-Centre National de la Recherche Scientifique (CNRS)-Université Clermont Auvergne (UCA)-Institut national polytechnique Clermont Auvergne (INP Clermont Auvergne), Université Clermont Auvergne (UCA)-Université Clermont Auvergne (UCA), Laboratoire de Physique de Clermont (LPC), Institut National de Physique Nucléaire et de Physique des Particules du CNRS (IN2P3)-Centre National de la Recherche Scientifique (CNRS)-Université Clermont Auvergne (UCA), Technologies et systèmes d'information pour les agrosystèmes (UR TSCF), Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE), Geoscience and Remote Sensing Society (GRSS), acm In-Cooperation, acm SIGSPATIAL, and Ecole Nationale Supérieure des Mines de St Etienne-Centre National de la Recherche Scientifique (CNRS)-Université Clermont Auvergne (UCA)-Institut national polytechnique Clermont Auvergne (INP Clermont Auvergne)
- Subjects
Spatial Data Warehouse, Data Integration, Georeferenced Sensor Data, Streaming Data, NoSQL, Data Lake, ETL, Elasticsearch, ELK Stack, Georeference, Remote sensing - Abstract
In the context of the French CAP 2025 I-Site project, an environmental data lake called CEBA is being built at the Auvergne regional level. Its goal is to integrate data from heterogeneous sensors, provide end users with tools to query and analyse georeferenced environmental data, and open the data. The sensors collect different environmental measurements depending on their location (air and soil temperature, water quality, etc.), and these measurements are used by different research laboratories to analyse the environment. The main component for shipping and storing the data is the ELK stack: data are collected from sensors through Beats and streamed by Logstash to Elasticsearch, and scientists can query the data through Kibana. In this paper, we propose a data warehouse frontend to CEBA based on the ELK stack. We also propose an additional component for the ELK stack that performs streaming ETL, integrating and aggregating streaming data from different sensors and sources according to a user-supplied configuration, in order to give end users more analytical capabilities over the data. We show the architecture of the system, present the functionalities of the data lake through examples, and finally present an example dashboard of the data in Kibana.
- Published
- 2021
- Full Text
- View/download PDF
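A minimal sketch of the kind of streaming-ETL step described in the entry above: aggregate raw sensor readings over a window and index the summary document in Elasticsearch through its REST API. The endpoint URL, index name, and document fields are assumptions, not the project's actual configuration.

    # Sketch: aggregate sensor readings and index the result in Elasticsearch.
    import datetime
    import statistics
    import requests

    ES_URL = "http://localhost:9200"        # assumed Elasticsearch endpoint
    INDEX = "sensor-hourly-aggregates"      # assumed target index

    def aggregate(readings):
        """readings: list of dicts like {"sensor_id": ..., "air_temp_c": ...}."""
        values = [r["air_temp_c"] for r in readings]
        return {
            "sensor_id": readings[0]["sensor_id"],
            "window_end": datetime.datetime.utcnow().isoformat(),
            "air_temp_c_avg": statistics.mean(values),
            "air_temp_c_max": max(values),
        }

    def index_document(doc):
        # POST to the index's _doc endpoint creates a new document.
        resp = requests.post(f"{ES_URL}/{INDEX}/_doc", json=doc, timeout=10)
        resp.raise_for_status()

    index_document(aggregate([
        {"sensor_id": "station-01", "air_temp_c": 14.2},
        {"sensor_id": "station-01", "air_temp_c": 15.1},
    ]))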
13. Implantació de business intelligence a un grup hoteler [Implementation of business intelligence in a hotel group]
- Author
-
Gómez López, Sergio, Daradoumis Haralabus, Atanasi, and Andrés Sanz, Humberto
- Subjects
ETL, Computer science -- TFG, Electronic data processing -- TFG, dashboard, analytical model - Abstract
The year 2020 will be remembered for the fight against COVID-19 and the effects of the global pandemic, which has caused an unprecedented health, social, and economic crisis. At the economic level, the uncertainty generated by the pandemic caused a major crisis in the hotel sector. As a consequence, hotel managers have to pay much more attention to ensuring business viability. Good decision making is always based on experience, intuition, and data. While experience and intuition are within our control, the figures and details of the business are often hidden inside the company's databases. This is where Business Intelligence becomes increasingly important for large and medium-sized hotel groups. This work aims to implement a hotel-management BI solution that adequately responds to the information needs of hotels. The solution exploits the Sales and Reservations information to provide management with a decision-support tool. Developed with Microsoft technology, it encompasses the following phases: an integration phase, where the ETL processes needed to create a DWH are defined; a data-modelling phase, where a DataMart is implemented to exploit the information in the reports; and, finally, a data-visualization phase, covering the creation of the reports required by the hotel group.
- Published
- 2021
14. Distributed Data Warehouse Resource Monitoring
- Author
-
Filipe Sá, Maryam Abbasi, Pedro Martins, and Filipe Caldeira
- Subjects
Computer science, Distributed computing, Scalability, data warehouse, pipeline, ETL, monitoring, actuate, transformation - Abstract
In this paper, we investigate the problem of providing scalability (out and in) to the Extraction, Transformation, Load (ETL) and Querying (ETL+Q) process of data warehouses. In general, data loading, transformation, and integration are heavy tasks that are performed only periodically rather than row by row. Parallel architectures and mechanisms can optimize the ETL process by speeding up each part of the pipeline as more performance is needed. We propose parallelization solutions for each part of ETL+Q, which we integrate into a framework that enables automatic scalability and freshness for any data warehouse and ETL+Q process. Our results show that the proposed algorithms can scale the system to provide the desired processing speed in both big-data and small-data scenarios.
- Published
- 2021
- Full Text
- View/download PDF
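A minimal sketch of the parallelisation idea behind the ETL+Q scalability described in the entry above: split the extracted rows into chunks and transform them in parallel workers before a single sequential load. The chunking strategy and the transformation itself are illustrative assumptions, not the paper's framework.

    # Sketch: parallel transform stage using a process pool.
    from multiprocessing import Pool

    def transform_chunk(chunk):
        # Stand-in for a heavy per-row transformation.
        return [{"id": r["id"], "value": r["value"] * 1.1} for r in chunk]

    def parallel_transform(rows, workers=4, chunk_size=10_000):
        chunks = [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]
        with Pool(processes=workers) as pool:
            transformed = pool.map(transform_chunk, chunks)
        return [row for chunk in transformed for row in chunk]

    if __name__ == "__main__":
        data = [{"id": i, "value": float(i)} for i in range(100_000)]
        loaded = parallel_transform(data)   # in a real system this feeds the load stage
        print(len(loaded))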
15. EHDEN - D4.7 - Yearly Progress Report on Technical Framework
- Author
-
Speybroeck, Michel Van, Moinat, Maxim, Reisberg, Sulev, Blacketer, Clair, Sandijk, Sebastiaan Van, Rijnbeek, Peter, José-Luis Oliveira, and Trifan, Alina
- Subjects
ETL ,Framework ,EHDEN portal ,Arachne ,White Rabbit ,Rabbit in a Hat ,Quality Dashboard ,Security ,Interoperability ,FAIR - Abstract
The EHDEN technical framework comprises the entire set of technical constructs, i.e. applications and their interoperability, including the security setup, within the context of the EHDEN project. The framework covers both the components for centralised functionality, such as the EHDEN portal, and the components used in the local mapping of databases, together with tools for managing and verifying data quality. While year 1 focused on establishing the architecture and technical foundation, year 2 focused on maturing and providing working solutions for the key components. The EHDEN portal has a better plugin structure, and the database catalogue has improved usability features and now contains a plugin (database dashboards) to provide more flexibility in presenting (summarised) database information. This will enable prospective users who want to assess the feasibility of a study concept to retrieve all relevant information about participating databases, from metadata to data-profiling information, from a single place. The enhanced API capability will also ensure that the data catalogue contributes to the FAIRification of the data sources. The new release of Arachne is another step forward in providing a technical solution that fully supports the execution of federated studies. On the local side, the updated White Rabbit/Rabbit in a Hat tooling improves practical usability and makes it the de facto standard for data profiling and documenting the ETL logic. The full integration of the Data Quality Dashboard into the mapping process ensures that EHDEN employs a fully transparent mechanism to monitor and manage the data quality of mapped data sources.
- Published
- 2020
- Full Text
- View/download PDF
16. Big Data platform deployment in a HPC cluster
- Author
-
Ferrer López, Pol, Universitat Autònoma de Barcelona. Escola d'Enginyeria, and Fernández González, Rafael
- Subjects
Big Data ,Microsoft Azure ,HDFS ,Apache Kafka ,Apache Spark ,VPC ,MapR ,On-premise ,VPN ,ETL ,Hadoop ,API ,Apache Drill ,HPC ,Amazon Web Services ,Cloud ,Python - Abstract
Big Data is a term that is becoming increasingly important in our sector: as the years go by, companies work with ever larger, more complex, and more critical amounts of data. It is therefore important to use software and platforms that help us manage, distribute, and analyse these data. This work describes the experience of the Applied Intelligence department (Accenture) in deploying a well-known Big Data platform and its various components. The platform runs an ETL (Extract, Transform and Load) process in charge of preparing data that will later be used in a predictive model developed by colleagues in the company. A large amount of documentation is produced so that future team members can deploy and work with the platform.
- Published
- 2020
17. Aplicación móvil para predicciones de apuestas deportivas [Mobile application for sports betting predictions]
- Author
-
Batista Díaz, Alejandro, Rodríguez Penabad, Miguel, and Enxeñaría informática, Grao en
- Subjects
ETL, REST Services, Statistics and Predictions, Sport Bets, Android App - Abstract
[Abstract] The objective of this final degree project is to create a mobile application that predicts sports betting results for the Spanish football league (La Liga). To achieve this, a complete process is developed that starts with the extraction of data from an external provider and, through an ETL flow, loads it into a database of our own design and control. A REST service is created to export the data required by the necessary functionalities and, finally, a mobile application is designed to show sports predictions based on statistical models. We focus on bets on the number of goals, number of cards, and number of corners in a match. Final degree project (UDC.FIC), Computer Engineering, 2019/2020.
- Published
- 2020
18. An Evaluation of How Big-Data and Data Warehouses Improve Business Intelligence Decision Making
- Author
-
Pedro Martins, Anthony Martins, Filipe Sá, Filipe Caldeira, and Á. Rocha
- Subjects
Data Warehouse, OLAP, Computer science, Online analytical processing, Data transformation, Big Data, Data Mart, Data science, ETL, Business Intelligence, Analytical methods, Performance indicator, Architecture, Power BI - Abstract
Analysing and understanding how to combine a data warehouse with business intelligence tools, and with other information or tools for visualizing KPIs, are critical factors in raising the competencies and business results of an organization. This article reviews data warehouse concepts and their appropriate use in business intelligence projects, with a focus on large amounts of information. Nowadays, data volumes are larger and more critical, and proper data analysis is essential for a successful project. From importing data to displaying results, there are crucial tasks such as extracting information, transforming it, analysing it, and storing the data for later querying. This work contributes a proposed Big Data Business Intelligence architecture for an efficient BI platform, together with an explanation of each step in creating a data warehouse and of how data transformation is designed to provide useful and valuable information. To make that information usable, Business Intelligence tools are presented and evaluated, contributing to the continuous improvement of business results.
- Published
- 2020
- Full Text
- View/download PDF
19. Solució de Business Intelligence per a l'estudi i anàlisi de la miocardiopatia hipertròfica en pacients d'una consulta de cardiopaties familiars [Business Intelligence solution for the study and analysis of hypertrophic cardiomyopathy in patients of a familial heart disease clinic]
- Author
-
Juan Tomàs, Laia and Andrés Sanz, Humberto
- Subjects
family cardiopathies, data warehouse, dashboard, business intelligence, ETL, Data management -- TFG, Data warehousing -- TFG - Abstract
The aim of this project is to design and implement a Business Intelligence (BI) solution for a Familial Heart Disease clinic at a third-level hospital in Catalonia for the study and analysis of patients with Hypertrophic Cardiomyopathy (HCM). HCM is a disease characterized by abnormal thickening of the walls of the heart. Its origin usually lies in hereditary genetic mutations, which cause the great variability in its manifestations, evolution, and prognosis. The project responds to the need for a solution to collect and analyse clinical and management data from patients with HCM. The solution includes the following activities: compilation of the needs and requirements of the data model and the BI solution; functional, logical, and physical design of a data warehouse; implementation of the Extract, Transform and Load (ETL) processes; and definition of the indicator panel and graphical displays in reports and dashboards. The resulting BI solution aims to be a support tool for the analysis of clinical practice and care of patients with HCM, facilitating decision-making and the preparation of scientific publications that generate new knowledge.
- Published
- 2020
20. A Machine Learning Approach to Reduce Dimensional Space in Large Datasets
- Author
-
Alejandro Reina Reina, Rafael M. Terol, David Gil, Saber Ziaei, Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos, Universidad de Alicante. Departamento de Tecnología Informática y Computación, Universidad de Alicante. Instituto Universitario de Investigación Informática, and Lucentia
- Subjects
Large dataset, Machine learning, Dashboards, Data mining, Dimensionality reduction, PCA, Cross-validation, ETL - Abstract
Computing over large datasets is both a research problem and a huge challenge: massive amounts of data must be mined and crunched to analyse them successfully, yet they constitute a valuable source of information across different, cross-folded domains and therefore represent an irreplaceable opportunity. Hence, the increasing number of environments that use data-intensive computation need more complex calculations than those applied to grid-based infrastructures. This paper analyses the algorithms most commonly used for the complex problem of handling large datasets, where part of the research effort focuses on reducing the dimensional space. We then present a novel machine learning method that reduces dimensional space in large datasets. The approach is carried out in distinct phases: merging all datasets into a single large one, performing the Extract, Transform and Load (ETL) process, applying the Principal Component Analysis (PCA) algorithm before machine learning techniques, and finally displaying the results by means of dashboards. The major contribution of this paper is a novel architecture, divided into five phases, that presents a hybrid machine learning method for reducing dimensional space in large datasets. To verify the correctness of our proposal, we present a case study with a complex dataset, specifically an epileptic seizure recognition database. The experiments carried out are very promising and present encouraging results that could be applied to a great number of different domains. This work was partially funded by Grant RTI2018-094283-B-C32, ECLIPSE-UA (Spanish Ministry of Education and Science), and in part by the Lucentia AGI Grant. This work was also partially funded by the GENDER-NET Plus Joint Call on Gender and UN Sustainable Development Goals (European Commission, Grant Agreement 741874), funded in Spain by "La Caixa" Foundation (ID 100010434) under code LCF/PR/DE18/52010001 to MTH.
- Published
- 2020
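A minimal sketch of the reduction phase described in the entry above: after the merged dataset has been through ETL, scale the features and apply PCA before a classifier. The dataset shape, random stand-in data, classifier, and 95%-variance threshold are illustrative assumptions, not the paper's exact pipeline.

    # Sketch: PCA-based dimensionality reduction inside a cross-validated pipeline.
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler
    from sklearn.pipeline import make_pipeline
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 178))      # stand-in for the merged feature matrix
    y = rng.integers(0, 2, size=500)     # stand-in labels (e.g. seizure / no seizure)

    model = make_pipeline(
        StandardScaler(),
        PCA(n_components=0.95),          # keep components explaining 95% of the variance
        LogisticRegression(max_iter=1000),
    )
    scores = cross_val_score(model, X, y, cv=5)
    print(scores.mean())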
21. A Methodology for Data WareHousing Processes Based on Experience
- Author
-
Francisco Fariña Molina, Fernando Medina Quispe, and Wilson Castillo-Rojas
- Subjects
ETL, OLAP, datamart, data warehouse, data warehousing, Business intelligence - Abstract
The article presents a new methodology for data warehousing processes that integrates different approaches, techniques, and methodologies, such as the specification of information requirements, relational modeling, a development model combining the Kimball and Hefesto proposals, an augmented extraction-transformation-load process that explicitly incorporates an indicator-validation phase, and, finally, integrated and interactive visualizations for the multidimensional analysis of the obtained indicators. The proposed methodology is based not only on the theoretical aspects described in the article but also on the experience gained by the research team in the development of various data warehousing projects, mainly oriented to generating indicators of academic productivity and management for a university, which is the case study in which the methodology is applied. The successful results of the different projects in which this methodology has been used support its effectiveness.
- Published
- 2018
- Full Text
- View/download PDF
22. Aplicación de Inteligencia de Negocios para el análisis de vulnerabilidades en pro de incrementar el nivel de seguridad en un CSIRT académico [Application of Business Intelligence to vulnerability analysis in order to increase the security level of an academic CSIRT]
- Author
-
César Javier Villacís-Silva, Walter Marcelo Fuertes-Díaz, Francisco Xavier Reyes-Mena, Ernesto Pérez-Estévez, Carlos Enrique Guzmán-Jaramillo, and Paúl Fernando Bernal-Barzallo
- Subjects
cybersecurity, Computer science, Intrusion detection system, business intelligence, decision making, vulnerability analysis, Scrum, Software, electronic data processing, Graphical user interface, Database, ETL, analysis tools, early alerts - Abstract
This study aimed at designing a solution based on Business Intelligence for acquiring data and information from a wide variety of sources and utilizing them in decision-making for the vulnerability analysis of an Academic CSIRT (Computer Security Incident Response Team). The study was carried out in a CSIRT that gathers several Ecuadorian universities. We applied the Action-Research methodology with a qualitative approach, divided into three phases. First, we qualitatively evaluated two intrusion-detection analysis tools used by the CSIRT (Passive Vulnerability Scanner and Snort) to verify their advantages and whether they are exclusive or complementary; at the same time, these tools recorded the real-time logs of the incidents in a MySQL relational database. Second, we applied Ralph Kimball's methodology to develop several routines that apply the "Extract, Transform, and Load" process to the non-normalized logs, which were subsequently processed through a graphical user interface. Third, we built a software application using Scrum to connect the obtained logs to the Pentaho BI tool and thus generate early alerts as a strategic factor. The results demonstrate the functionality of the designed solution, which generates early alerts and, consequently, increases the security level of the CSIRT members.
- Published
- 2018
- Full Text
- View/download PDF
23. BigDimETL with NoSQL Database
- Author
-
Faiez Gargouri, Olivier Teste, Faiza Ghozzi, Hana Mallek, Centre National de la Recherche Scientifique - CNRS (FRANCE), Institut National Polytechnique de Toulouse - INPT (FRANCE), Université Toulouse III - Paul Sabatier - UT3 (FRANCE), Université Toulouse - Jean Jaurès - UT2J (FRANCE), Université Toulouse 1 Capitole - UT1 (FRANCE), Université de Sfax (TUNISIA), and Institut National Polytechnique de Toulouse - Toulouse INP (FRANCE)
- Subjects
Decision support system, HBase, Computer science, Twitter, Big Data, Artificial intelligence, NoSQL, Data science, Data warehouse, ETL, Join operation, Knowledge extraction, Business intelligence, Distributed data store - Abstract
In the last decade, we have witnessed an explosion of the volume of data available on the Web, due to rapid technological advances and the availability of smart devices and social networks such as Twitter, Facebook, Instagram, etc. The concept of Big Data was created to deal with this constant increase. Many domains must take this growth of data into consideration, especially Business Intelligence (BI), where the data are full of knowledge that is crucial for effective decision making. However, new problems and challenges have appeared for Decision Support Systems and must be addressed. Accordingly, the purpose of this paper is to adapt Extract-Transform-Load (ETL) processes to Big Data technologies in order to support decision making and knowledge discovery. We propose a new approach called Big Dimensional ETL (BigDimETL) that deals with the ETL development process while taking the multidimensional structure into account. In addition, to accelerate data handling, we use the MapReduce paradigm and HBase as a distributed storage mechanism that provides data warehousing capabilities. Experimental results show that our adaptation of the ETL operations performs well, especially for the join operation.
- Published
- 2018
- Full Text
- View/download PDF
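A minimal sketch of the MapReduce-style adaptation of an ETL transform in the spirit of the entry above: map raw tweets to (dimension key, measure) pairs, then reduce by key into fact rows. This is plain Python illustrating the paradigm, not the paper's Hadoop/HBase implementation, and the record fields are assumptions.

    # Sketch: map/reduce over extracted tweets to build aggregated fact rows.
    from collections import defaultdict

    tweets = [  # assumed extracted records
        {"user": "alice", "lang": "en", "retweets": 3},
        {"user": "bob",   "lang": "fr", "retweets": 1},
        {"user": "alice", "lang": "en", "retweets": 5},
    ]

    def map_phase(record):
        # Emit a key built from the dimensional attributes and the measure to aggregate.
        yield (record["user"], record["lang"]), record["retweets"]

    def reduce_phase(pairs):
        totals = defaultdict(int)
        for key, value in pairs:
            totals[key] += value
        return totals

    pairs = (kv for t in tweets for kv in map_phase(t))
    fact_rows = reduce_phase(pairs)
    print(fact_rows)   # {('alice', 'en'): 8, ('bob', 'fr'): 1} -> rows for the fact table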
24. Business Intelligence for the Programs of the Secretaries of Health, Education and Planning in a Territorial Entity
- Author
-
Maria-Alejandra Varona-Taborda, Jorge-Cesar Mosquera-Ramírez, César-Augusto Medina-Moreno, Diego-Fernando Lemus-Muñoz, Carlos-Julián Muñoz-Hernandez, and Christian-Gustavo Arias-Iragorri
- Subjects
ETL, territorial entity, datamart, Kimball methodology, business intelligence - Abstract
Territorial entities in Colombia are required by law to register and report, to different instances, the control information of the government programs they administer. However, the information is distributed across numerous state-run and in-house platforms and is processed and delivered in different formats. This situation makes the comprehensive management of territorial data difficult: although the information exists, it is isolated, and its analysis is carried out independently by each party responsible for the process. The objective of this research is the implementation of a business intelligence model that allows the integration and analysis of data from the programs of the Health, Education and Planning secretariats of a territorial entity. Ralph Kimball's methodology was followed, implementing a star-schema model in the Datamart with MySQL as the database manager. An ETL system was built with the Pentaho tool to extract, transform and load the data into the Datamart. The cubes, reports and dashboards are produced with tools such as Pentaho and Power BI, making a correct interpretation of the resulting information possible. After applying business intelligence, an adequate analysis of the information is achieved, supporting decision-making and the application of new strategies to solve specific problems through control panels, visualisation of indicators and report generation.
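For illustration, a minimal Python sketch of the load step into such a MySQL-hosted star schema follows. It is a sketch under stated assumptions, not the paper's implementation (the paper performs this step with Pentaho Data Integration), and the table and column names (dim_program, fact_enrolment) are hypothetical.

import mysql.connector  # pip install mysql-connector-python

conn = mysql.connector.connect(
    host="localhost", user="etl", password="etl", database="datamart"
)
cur = conn.cursor()

# Insert the dimension row first so the fact row can reference its surrogate key.
cur.execute(
    "INSERT INTO dim_program (program_code, program_name, secretariat) "
    "VALUES (%s, %s, %s)",
    ("EDU-01", "School meals", "Education"),
)
program_key = cur.lastrowid

# The fact row stores the measure at the grain defined by the dimensional model.
cur.execute(
    "INSERT INTO fact_enrolment (program_key, date_key, enrolled_students) "
    "VALUES (%s, %s, %s)",
    (program_key, 20200701, 1250),
)
conn.commit()
conn.close()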
- Published
- 2021
- Full Text
- View/download PDF
25. Implementing data-driven decision support system based on independent educational data mart
- Author
-
Zahraa Alhilfi, Marwah Kamil Hussein, Alaa Khalaf Hamoud, and Rabab Hassan Sabr
- Subjects
Decision support system ,OLAP ,Process management ,General Computer Science ,business.industry ,Computer science ,Process (engineering) ,Emerging technologies ,InformationSystems_INFORMATIONSYSTEMSAPPLICATIONS ,Online analytical processing ,InformationSystems_DATABASEMANAGEMENT ,Field (computer science) ,Data-driven ,ETL ,Knowledge base ,Educational data mart ,Independent data mart ,Performance indicator ,Electrical and Electronic Engineering ,business ,KPI - Abstract
Decision makers in the educational field always seek new technologies and tools that provide solid, fast answers to support the decision-making process. They need a platform that utilises students' academic data and turns it into knowledge for making the right strategic decisions. In this paper, a roadmap for implementing a data-driven decision support system (DSS) based on an educational data mart is presented. The independent data mart is built from students' grades in 8 subjects at a private school (Al-Iskandaria Primary School in Basrah province, Iraq). The roadmap starts with pre-processing the paper-based data source and ends with providing three categories of online analytical processing (OLAP) queries (multidimensional OLAP, desktop OLAP and web OLAP). A key performance indicator (KPI) is implemented as an essential part of the educational DSS to measure school performance. The static evaluation method shows that the proposed DSS satisfies the privacy, security and performance aspects, with no errors found after inspecting the DSS knowledge base. The evaluation shows that a data-driven DSS based on an independent data mart with KPIs and OLAP is one of the best platforms to support short- to long-term academic decisions.
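As an illustration of the kind of KPI such a data mart can expose, the following pandas sketch computes a per-subject pass rate over a small, made-up fact table; the column names and the pass threshold are assumptions and are not taken from the paper.

import pandas as pd

fact = pd.DataFrame({
    "student_id": [1, 1, 2, 2, 3, 3],
    "subject":    ["Math", "Science", "Math", "Science", "Math", "Science"],
    "degree":     [78, 45, 90, 67, 55, 40],   # marks out of 100 (assumed scale)
})

# KPI: share of marks at or above the pass threshold, per subject.
PASS_MARK = 50
kpi = (fact.assign(passed=fact["degree"] >= PASS_MARK)
           .groupby("subject")["passed"]
           .mean()
           .rename("pass_rate"))
print(kpi)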
- Published
- 2021
- Full Text
- View/download PDF
26. An investigation on the influence of temperature variation on the performance of tin (Sn) based perovskite solar cells using various transport layers and absorber layers
- Author
-
Sanjay Tiwari, Ayush Khare, and Priyanka Roy
- Subjects
Materials science ,business.industry ,Graphene ,Photovoltaic system ,chemistry.chemical_element ,QC350-467 ,Optics. Light ,Electron transport chain ,Atomic and Molecular Physics, and Optics ,law.invention ,Temperature variation ,ETL ,HTL ,PEDOT:PSS ,chemistry ,law ,Optoelectronics ,SCAPS-1D ,business ,Tin ,Tin based perovskite solar cells ,Perovskite (structure) - Abstract
Perovskite solar cells (PSCs) have become the fastest-growing photovoltaic (PV) cells to date. Despite their numerous merits, these cells face issues such as poor stability and toxicity. The issue of toxicity is dealt with by using less-toxic substitutes for lead (Pb), such as tin (Sn). However, the performance attained by Sn-based PSCs still lags far behind that of Pb-based PSCs. In this work, we discuss some of the desirable electron transport layers (ETLs) and hole transport layers (HTLs) that can be used to fabricate an efficient Sn-based PSC, and report their performance. HTLs such as Spiro-OMeTAD, graphene, PEDOT:PSS, Cu2O, CuI and CuSCN, and ETLs such as ZnO, TiO2 and PCBM, are used in this study. We also simulate PSCs based on three different Sn perovskite materials: (i) MASnI3, (ii) FASnI3 and (iii) CsSn0.5Ge0.5I3. We study theoretically the performance of the PSCs with various transport layers and absorber layers by varying the temperature from 300 K to 400 K.
- Published
- 2021
- Full Text
- View/download PDF
27. EHDEN - D4.3 - Yearly Progress Report on Technical Framework
- Author
-
Maxim Moinat and Michel Van Speybroeck
- Subjects
Tantalus ,ETL ,OHDSI ,EHDEN Database catalogue ,Standardised Vocabularies ,EHDEN portal ,Arachne ,Security ,Athena ,ATLAS ,Data Quality - Abstract
The EHDEN technical framework (see deliverable D4.1) saw significant progress during its first year across all constituting components. The foundation was laid in a series of workshops that culminated in Deliverable D4.1 Technical Framework Design and Architecture. Subsequently, progress was made across all defined components, in particular, but not limited to, the database, the Arachne federated network application, the ETL tooling, the Data Quality Dashboard and a first step towards an integrated security layer. Specific attention was also paid to providing and integrating documentation (see the EHDEN Academy: https://www.academy.ehden.eu). A working prototype of the main applications is available for testing purposes under http://test.ehden.eu. This deliverable summarizes the progress on the technical framework in year 1; for more details we refer to D4.1 Technical Framework Design and Architecture.
- Published
- 2019
- Full Text
- View/download PDF
28. EHDEN - D4.1 - Technical Framework Design and Architecture
- Author
-
Klebanov, Gregory, Hottgenroth, Antje, Bochoven, Kees Van, Hughes, Nigel, Moinat, Maxim, Speybroeck, Michel Van, Sandijk, Sebastiaan Van, and Rijnbeek, Peter
- Subjects
ETL ,Database catalogue ,Common data model ,Federated network ,Architecture ,Analytical tools ,Training ,Standardized vocabularies ,EHDEN ,FOS: Civil engineering ,Technical Framework ,Academy ,OMOP - Abstract
The core objective of the EHDEN consortium is to provide all the necessary services that enable a sustainable distributed European data network to perform fast, scalable, and highly reproducible research, while respecting privacy regulations, local data provenance and governance. This includes services and tools to perform data standardization, analytical pipelines, tools to share study results and tools for stakeholder engagement and training. WP4 in EHDEN will implement a technical framework as well as all necessary processes for a high-quality and fully reproducible data workflow. Core to EHDEN is open science, which entails the use of open source tools, an open federated health data network, and an engaged community that shares the EHDEN vision and mission. This deliverable provides a high-level overview of the technical framework and describes the functionality of all the components. Furthermore, we explain which components will need to interact and, to some extent, how they will be integrated. The underlying principles of the architecture are maximal usage of available open source tools, a federated data network approach enabling data profiling, data assessments and full studies, data quality assurance, and interactive dashboards. The implementation will include a modular framework fit for up-scaling as well as access security measures. Building the EHDEN ecosystem is clearly a socio-technical challenge. EHDEN must serve a wide range of stakeholders, including industry, regulators, academia, health care system stakeholders, patient organisations, etc. The needs of the various stakeholders are considered during the build of the technical framework. The central components, which will be accessible via the EHDEN portal, are: 1) Database Catalogue (database characteristics, partly automated meta-data generation); 2) EHDEN Network Study Workflow Platform (ARACHNE), including Data Node; 3) Study Designer (ATLAS); 4) Dashboards; 5) EHDEN Academy (training). The central platform will interact with local components at Data Partners and Researchers (e.g. OMOP mapping, ETL tools). The core infrastructure is hosted on the AWS platform and co-located in the same region and data centre, with the exception of the ELIXIR A&A system, whose provider is detailed later in this document. The EHDEN ecosystem will implement a Single Sign On (SSO) approach to security, including authentication and authorization, with ELIXIR being used as an identity provider. The systems will be integrated, e.g. each system will master, manage and share its core data sets with the other systems, as defined in the common data model (to be defined). This document provides an architecture blueprint for the EHDEN framework. It lays a foundation for further work which will continue to evolve during the project lifetime.
- Published
- 2019
- Full Text
- View/download PDF
29. A web-based support system for biometeorological research
- Author
-
Pablo Fdez-Arroyabe, Laura Sebastia, Ángel Marqués-Mateu, and Benjamín Arroquia-Cuadros
- Subjects
Atmospheric Science ,Service (systems architecture) ,010504 meteorology & atmospheric sciences ,Exploit ,Computer science ,Process (engineering) ,Health, Toxicology and Mutagenesis ,Cloud computing ,Reuse ,01 natural sciences ,Data science ,03 medical and health sciences ,0302 clinical medicine ,Meteorology ,Web application ,Humans ,0105 earth and related environmental sciences ,030203 arthritis & rheumatology ,Internet ,Ecology ,business.industry ,Geoprocessing ,Geomatics ,ETL ,Webmapping ,INGENIERIA CARTOGRAFICA, GEODESIA Y FOTOGRAMETRIA ,Web mapping ,Biometeorology ,business ,LENGUAJES Y SISTEMAS INFORMATICOS ,Forecasting - Abstract
Data are the fundamental building blocks of scientific studies that seek to understand natural phenomena in space and time. The notion of data processing is ubiquitous and operates in nearly any project that requires gaining insight from the data. The increasing availability of information sources, data formats and download services offered to users makes it difficult to reuse or exploit the potential of those new resources in multiple scientific fields. In this paper, we present a spatial extract-transform-load (spatial-ETL) approach for downloading atmospheric datasets in order to produce new biometeorological indices and expose them publicly for reuse in research studies. The technologies and processes involved in our work are clearly defined in a context where the GDAL library and the Python programming language are key elements for the development and implementation of the geoprocessing tools. Since the National Oceanic and Atmospheric Administration (NOAA) is the source of information, the ETL process is executed each time this service publishes an updated atmospheric prediction model, thus obtaining different forecasts for spatial and temporal analyses. As a result, we present a web application intended for downloading these newly created datasets after processing, and for visualising interactive web maps with the outcomes of a number of geoprocessing tasks. We also elaborate on all functions and technologies used in the design of those processes, with emphasis on the optimisation of the resources as implemented in cloud services.
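A rough sketch of the transform and load steps of such a spatial ETL is shown below, assuming a NOAA GRIB file has already been downloaded locally and that bands 1 and 2 hold 2 m temperature and relative humidity; the index formula is only a placeholder and not one of the paper's biometeorological indices.

from osgeo import gdal  # GDAL Python bindings

ds = gdal.Open("gfs_forecast.grib2")                           # assumed local file
temperature = ds.GetRasterBand(1).ReadAsArray().astype(float)  # assumed band
humidity = ds.GetRasterBand(2).ReadAsArray().astype(float)     # assumed band

index = temperature + 0.1 * humidity  # placeholder index, not from the paper

# Load step: write the derived raster so a web application can serve it.
driver = gdal.GetDriverByName("GTiff")
out = driver.Create("bio_index.tif", ds.RasterXSize, ds.RasterYSize, 1,
                    gdal.GDT_Float32)
out.SetGeoTransform(ds.GetGeoTransform())
out.SetProjection(ds.GetProjection())
out.GetRasterBand(1).WriteArray(index)
out.FlushCache()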
- Published
- 2019
30. Web Applications for Interoperability of Biodiversity Data in France
- Author
-
El-Makki Voundy, Amandine Sahl, Olivier Rovellotti, Julien Corny, and Camille Monchicourt
- Subjects
business.industry ,Data management ,Interoperability ,Biodiversity ,standard ,General Medicine ,web ,World Wide Web ,ETL ,Open data ,open-data ,Web application ,data management ,business - Abstract
In the context of the French law for the reconquest of biodiversity (Legifrance 2016), public and private stakeholders must share environmental impact assessment data as open data with the French National Inventory of the Natural Heritage (Muséum national d'Histoire naturelle 2019). In order to achieve this, the Information System for Nature and Landscape (SINP) provided standards and guidelines for protocols, taxonomy, and metadata in order to comply with the FAIR (Findability, Accessibility, Interoperability, Reusability; Wilkinson et al. 2016) concept of data management. However, private institutions, who must run environmental impact assessments, can be confused by the number of technical details and the high level of data literacy needed to comply with these standards. Here, we present several tools (GeoNature 2019, Natural Solutions 2019) that we are currently developing to facilitate raw biodiversity data conversion and export using the SINP standards (Jomier et al. 2018). Although the SINP and Darwin Core (Wieczorek et al. 2012) standards share common concepts and properties, the SINP standards focus on data reusability in the framework of French environmental programs, resulting in the creation of specific mandatory attributes (Chataigner et al. 2014). Our tools perform extract, transform and load (ETL) operations as well as RDF (Resource Description Framework) exports using an ad-hoc ontology adapted to the specificities of the SINP standard. Finally, we observed that despite the success of the process (after one year, nearly one thousand datasets are available on the SINP web platform), several issues still need to be addressed, including data quality issues, which could hamper data reuse by stakeholders.
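As a hedged illustration of the RDF export step, the rdflib sketch below expresses one occurrence record as triples; the namespace URI and property names are invented placeholders rather than the actual SINP ontology used by the tools.

from rdflib import Graph, Literal, Namespace, RDF, URIRef

SINP = Namespace("http://example.org/sinp#")  # placeholder namespace
g = Graph()

occurrence = URIRef("http://example.org/occurrence/42")
g.add((occurrence, RDF.type, SINP.Occurrence))
g.add((occurrence, SINP.taxonName, Literal("Lynx lynx")))
g.add((occurrence, SINP.observationDate, Literal("2019-05-12")))

# Serialise to Turtle so the triples can be inspected or published.
print(g.serialize(format="turtle"))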
- Published
- 2019
31. Data discovery method for Extract- Transform-Load
- Author
-
Kary Främling, Manik Madhikerrni, Adj. Prof. Främling Kary group, Department of Computer Science, Aalto-yliopisto, and Aalto University
- Subjects
ta113 ,Data Warehouse ,Computer science ,business.industry ,Process (engineering) ,Extract, transform, load ,Data discovery ,Information System ,Data Discovery ,Data modeling ,Database ,ETL ,Trigger ,Data extraction ,Unified Modeling Language ,Analytics ,Reverse Engineering ,Information Retrieval ,Information system ,Process Mapping ,Software engineering ,business ,computer ,computer.programming_language - Abstract
Information Systems (ISs) are fundamental to streamline operations and support processes of any modern enterprise. Being able to perform analytics over the data managed in various enterprise ISs is becoming increasingly important for organisational growth. Extract, Transform, and Load (ETL) comprises the necessary pre-processing steps of any data mining activity. Due to the complexity of modern ISs, extracting data is becoming increasingly complicated and time-consuming. In order to ease the process, this paper proposes a methodology and a pilot implementation that aim to simplify the data extraction process by leveraging the end-users' knowledge and understanding of the specific IS. The paper first provides a brief introduction and the current state of the art regarding existing ETL processes and techniques. Then, it explains the proposed methodology in detail. Finally, test results of typical data-extraction tasks on four commercial ISs are reported.
- Published
- 2019
- Full Text
- View/download PDF
32. Multi-node Approach for Map Data Processing
- Author
-
Vít Ptošek and Kateřina Slaninová
- Subjects
Big data processing ,020203 distributed computing ,Data processing ,road network quality ,Finite-state machine ,Computer science ,Spatial database ,0211 other engineering and technologies ,pipeline ,Routing algorithm ,021107 urban & regional planning ,OpenStreetMap ,02 engineering and technology ,computer.file_format ,Hierarchical Data Format ,computer.software_genre ,Graph ,big data parsing ,ETL ,state machine ,multi-node processing ,0202 electrical engineering, electronic engineering, information engineering ,Data mining ,Persistent data structure ,computer - Abstract
OpenStreetMap (OSM) is a popular collaborative open-source project that offers a free editable map of the whole world. However, this data often needs further purpose-specific processing to become truly valuable information to work with. The main motivation of this paper is therefore to propose a design for big data processing and data mining that produces detailed traffic statistics and graphs representing a road network. To ensure that the routing algorithms on our High-Performance Computing (HPC) platform work correctly, it is essential to prepare the OSM data so that it is usable for the above-mentioned graph, and to store the persistent data in both a spatial database and HDF5 format.
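A simplified sketch of the persistence step follows: storing a parsed road-network edge list in HDF5 with h5py. The edge list is hard-coded here; in the pipeline described above it would come from parsing OpenStreetMap ways, and the column layout is assumed.

import numpy as np
import h5py

# Each row: (source node id, target node id, length in metres) -- assumed layout.
edges = np.array([
    [1, 2, 120.5],
    [2, 3, 310.0],
    [3, 1,  87.2],
])

with h5py.File("road_network.h5", "w") as f:
    f.create_dataset("edges", data=edges, compression="gzip")
    f.attrs["source"] = "OpenStreetMap extract (example)"

The same edge list would typically also be loaded into a spatial database for ad-hoc queries, with HDF5 serving the batch routing jobs.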
- Published
- 2019
- Full Text
- View/download PDF
33. Sistema BI per a l'avaluació de tractaments per reduir el colesterol
- Author
-
Sala Pardo, Joan, Prados Carrasco, Ferran, and Amorós Alcaraz, David
- Subjects
ETL ,Open source intelligence -- TFM ,open source ,business intelligence - Abstract
The main goal of this work is to deploy a BI system able to answer business questions about the effectiveness of different treatments for reducing patients' cholesterol levels. The available data contain information about the treatments, the patients, their habits, where they live, and the measurements needed to track their evolution. In addition to these requirements, the choice of the BI solution considered that it had to be open source, offer some kind of free licence, be installable on a Linux operating system and provide a web-based main user interface. After comparing the main alternatives meeting these requirements, Knowage CE was chosen as the BI product and PostgreSQL as the database engine. The business questions are to be answered through multidimensional analyses, so the design covers the data warehouse model, the OLAP cube and the corresponding ETL processes. Starting from an environment suitable for running the chosen solutions, the implementation phase comprised the installation and configuration of the database engine, Knowage, the ETL processes and the OLAP schema. With this infrastructure in place, the analyses answering the business questions were created. Beyond the well-known benefits of BI, it is worth highlighting that the system was implemented with no initial investment and is able to satisfy current and future business needs.
- Published
- 2019
34. Análisis y visualización Big Data en eventos deportivos
- Author
-
Monedero Carreras, Álvaro José, Bregón Bregón, Aníbal, Martínez Prieto, Miguel Angel, and Universidad de Valladolid. Escuela de Ingeniería Informática de Valladolid
- Subjects
Big Data ,ETL ,Visualization ,Analysis ,Professional football - Abstract
Nowadays, the use of Big Data technologies has grown considerably due to the search for agility and innovation by large companies. These technologies provide large storage and processing capacity, but without a good data visualisation strategy all this capacity is worthless for the business. Accordingly, this project combines both areas: it develops a tool that extracts sports results and statistics through web scraping, transforms them and stores them in a database, and finally provides several dashboards from which the stored and indexed information can be explored visually. With the development of this project, information on matches (from the early 2000s onwards) and standings (from the early 1990s onwards) of the Spanish professional football league (La Liga) is extracted and stored, allowing the end user to analyse and visualise the collected information. Finally, one of the main goals of the project is to highlight the importance of establishing a good visualisation strategy, so the data are indexed in a search engine and several dashboards are created to analyse the stored information.
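A hedged sketch of the extract step is given below: scraping one results table with requests and BeautifulSoup. The URL and the HTML structure are hypothetical stand-ins for the actual La Liga results pages the project targets; the parsed rows would then be transformed and indexed in a search engine for the dashboards.

import requests
from bs4 import BeautifulSoup

URL = "https://example.org/laliga/2018-2019/results"  # placeholder URL
html = requests.get(URL, timeout=30).text
soup = BeautifulSoup(html, "html.parser")

matches = []
for row in soup.select("table.results tr"):          # assumed CSS structure
    cells = [c.get_text(strip=True) for c in row.find_all("td")]
    if len(cells) >= 3:
        matches.append({"home": cells[0], "score": cells[1], "away": cells[2]})

print(matches[:5])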
- Published
- 2019
35. Sistema de inteligencia de negocio para el análisis de los tratamientos de reducción del colesterol
- Author
-
Mantilla Gómez, Fernando, Prados Carrasco, Ferran, and Amorós Alcaraz, David
- Subjects
ETL ,OLAP ,business intelligence - Abstract
The goal is to create a Business Intelligence (BI) environment that enables the analysis of the information generated during an experiment whose purpose is to verify the effectiveness of different treatments aimed at reducing cholesterol levels. The information is analysed in the form of data cubes, which allows a much more careful analysis and the extraction of much more elaborate conclusions. The project follows a methodology for developing Business Intelligence projects that facilitates the analysis, design and implementation of the solution. Pentaho is used as the Business Intelligence solution for the creation of the ETL processes, the generation of the cubes and the creation of the dashboard. The final result is the implementation of a Business Intelligence system that facilitates the acquisition, storage and exploitation of data associated with patients whose cholesterol levels have been measured, allowing users to check the effectiveness of the treatments and to filter by geographical area, eating habits and other criteria.
- Published
- 2019
36. A methodology for cohort harmonisation in multicentre clinical research
- Author
-
Isabelle Bos, João Rafael Almeida, Pieter Jelle Visser, José Luís Oliveira, Luís Bastão Silva, VU University medical center, Neurology, and Amsterdam Neuroscience - Neurodegeneration
- Subjects
Computer science ,Computer applications to medicine. Medical informatics ,OMOP CDM ,R858-859.7 ,Database schema ,Health Informatics ,Data structure ,Data science ,Variety (cybernetics) ,ETL ,Clinical trial ,Data harmonisation ,Open source ,Clinical research ,Cohort ,Observational study ,Observational studies ,Clinical studies - Abstract
Many clinical trials and scientific studies have been conducted aiming for a better understanding of specific medical conditions. However, these studies are often based on a small number of participants due to the difficulty of finding people with similar medical characteristics who are available to participate. This is particularly critical in rare diseases, where the reduced number of subjects hinders reliable findings. To generate more substantial clinical evidence by increasing the power of the analyses, researchers have started to perform data harmonisation and multiple-cohort analyses. However, the analysis of heterogeneous data sources implies dealing with different data structures, terminologies, concepts, languages and, most importantly, the knowledge behind the data. In this paper, we present a methodology to harmonise different cohorts into a standard data schema, helping the research community to generate evidence from a wider variety of data sources. Our methodology was inspired by the OHDSI Common Data Model, which aims to harmonise EHR datasets for observational studies, and leverages knowledge and open-source tools to perform multicentric disease-specific studies. The proposal was validated using Alzheimer's disease cohorts from several countries, ultimately combining 6,669 subjects and 172 clinical concepts. The harmonised datasets now enable multi-cohort querying and analysis, supporting the execution of new research. The methodology was implemented in the Python language and is available, under the MIT licence, at https://bioinformatics-ua.github.io/CMToolkit/.
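For readers unfamiliar with the harmonisation step, the pandas sketch below maps one cohort's source columns onto shared variable names and units; it is illustrative only and does not reproduce the CMToolkit API, and the mapping table and column names are invented.

import pandas as pd

source = pd.DataFrame({
    "subject": ["A01", "A02"],
    "mmse_total": [24, 29],      # cognitive score as stored in this cohort
    "age_months": [840, 912],    # age stored in months in this cohort
})

# Per-cohort mapping: source column -> (standard variable, transformation).
mapping = {
    "mmse_total": ("MMSE", lambda s: s),
    "age_months": ("age_years", lambda s: s / 12),
}

harmonised = pd.DataFrame({"subject": source["subject"]})
for src_col, (std_name, transform) in mapping.items():
    harmonised[std_name] = transform(source[src_col])

print(harmonised)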
- Published
- 2021
- Full Text
- View/download PDF
37. Design of Novel ETL Model to Analyse Corona Virus Data
- Author
-
Amit Kumar Dewangan, Akhilesh Kumar Shrivas, and S. M. Ghosh
- Subjects
lcsh:Medical technology ,Coronavirus disease 2019 (COVID-19) ,Computer science ,lcsh:Medicine ,Value (computer science) ,Health Informatics ,text mining ,02 engineering and technology ,computer.software_genre ,Execution time ,Field (computer science) ,etl ,0502 economics and business ,0202 electrical engineering, electronic engineering, information engineering ,Computer Science (miscellaneous) ,corona virus ,data analytics ,pandemic ,lcsh:R ,05 social sciences ,Process (computing) ,Multiple data ,covid-19 ,lcsh:R855-855.5 ,Join (sigma algebra) ,050211 marketing ,020201 artificial intelligence & image processing ,Data mining ,Day to day ,computer - Abstract
INTRODUCTION: The corona disease was first recognized in 2019 in Wuhan, the capital of China's Hubei province, and from there it continued spreading, eventually being declared a pandemic across all nations. The COVID-19 virus affects people in various ways; it is a kind of respiratory disease. Confirmed cases were increasing day by day in India, which led to a complete nationwide lockdown. OBJECTIVE: The objective of this research is to design a novel Extract-Transform-Load (NETL) model to analyse COVID-19 data in India. METHODS: The extraction of useful information from a large database is a well-connected research field of text mining. This paper proposes a novel extract-transform-load (ETL) model to process the COVID-19 data of India and obtain exact recovery figures from multiple data sources across different states of India. A knowledge-based model is discussed that generates knowledge through three modules: split, validation, and join. RESULTS: The outcome of the proposed NETL process is an output file describing total positive cases, active cases, recovered cases, and the death rate for different regions. NETL is analysed in terms of accuracy, failure count, and execution time; the proposed process is more accurate and takes less compilation time, with a lower failure count, than existing models. CONCLUSION: To analyse the coronavirus data in India, a novel ETL (NETL) model is proposed. In this model, a total of 9 CSV files are processed as input to produce results in different categories. The model has three modules, namely splitting, verification, and join. The dataset is split based on its coupling attributes and then joined on a single value to produce updated results for the current dataset. The last stage of the process joins the data generated through splitting. The proposed NETL model is more accurate than existing ETL models. © 2020 Amit Kumar Dewangan et al., licensed to EAI
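The split / validate / join idea can be sketched with pandas as follows; the file names and column names are hypothetical and are not taken from the paper's dataset.

import glob
import pandas as pd

frames = []
for path in glob.glob("covid_state_*.csv"):          # split: one file per state
    df = pd.read_csv(path)
    # Validation: drop rows with missing or negative counts before joining.
    df = df.dropna(subset=["confirmed", "recovered", "deaths"])
    df = df[(df[["confirmed", "recovered", "deaths"]] >= 0).all(axis=1)]
    frames.append(df)

# Join: combine the validated partitions and aggregate per state.
combined = pd.concat(frames, ignore_index=True)
summary = combined.groupby("state")[["confirmed", "recovered", "deaths"]].sum()
summary["active"] = summary["confirmed"] - summary["recovered"] - summary["deaths"]
print(summary)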
- Published
- 2020
- Full Text
- View/download PDF
38. Urea-Doped ZnO Films as the Electron Transport Layer for High Efficiency Inverted Polymer Solar Cells
- Author
-
Yuezhen Wu, Yuying Hao, Guo Chen, Ruqin Zhang, Kunpeng Guo, Hua Wang, Zhongqiang Wang, and Z.H. Wang
- Subjects
Electron transport layer ,Materials science ,Passivation ,urea ,02 engineering and technology ,010402 general chemistry ,01 natural sciences ,Polymer solar cell ,lcsh:Chemistry ,chemistry.chemical_compound ,Exciton dissociation ,Energy conversion efficiency ,Doping ,General Chemistry ,021001 nanoscience & nanotechnology ,PCE ,0104 chemical sciences ,ETL ,lcsh:QD1-999 ,chemistry ,Chemical engineering ,ZnO ,Urea ,0210 nano-technology ,Light absorber ,polymer solar cells - Abstract
In this paper, urea-doped ZnO (U-ZnO) is investigated as a modified electron transport layer (ETL) in inverted polymer solar cells (PSCs). Using a blend of Poly{4,8-bis[(2-ethylhexyl)oxy] benzo [1,2-b:4,5-b'] dithiophene-2,6-diyl-alt-3-fluoro-2-[(2-ethylhexyl)carbonyl] thieno [3,4-b] thiophene-4,6-diyl} (PTB7) and [6,6]-phenyl-C71-butyric acid methyl ester (PC71BM) as the light absorber, a champion power conversion efficiency (PCE) of 9.15% was obtained for the U-ZnO ETL based PSCs, which is 15% higher than that of the pure ZnO ETL based PSCs (7.76%). It was demonstrated that urea helps to passivate defects in the ZnO ETL, resulting in enhanced exciton dissociation, suppressed charge recombination and more efficient charge extraction. This work suggests that the U-ZnO ETL offers promising potential for achieving highly efficient PSCs.
- Published
- 2018
- Full Text
- View/download PDF
39. A Proposed DDS Enabled Model for Data Warehouses with Real Time Updates
- Author
-
Munesh Chandra Trivedi, Avadhesh Kumar Gupta, and Virendra Kumar Yadav
- Subjects
SIMPLE (military communications protocol) ,Computer science ,media_common.quotation_subject ,Triggered update ,Initialization ,Data type ,Phase (combat) ,Data science ,Data warehouse ,Mapping engine ,Design phase ,Etl ,Deviation detection system ,Contradiction ,Quality (business) ,media_common - Abstract
A data warehouse generally contains both historical and current data from various data sources. In the world of computing, a data warehouse can be defined as a system created for the analysis and reporting of both types of data. These analysis reports are then used by an organization to make decisions that support its growth. Constructing a data warehouse appears simple: collecting data from the data sources into one place (after extraction, transformation and loading). However, construction involves several issues such as inconsistent data, logic conflicts, user acceptance, cost, quality, security, stakeholders' contradictions, REST alignment, etc. These issues need to be overcome, otherwise they lead to unfortunate consequences affecting the organization's growth. The proposed model tries to solve issues such as REST alignment and stakeholders' contradictions by involving experts from various domains (technical, analytical, decision makers, management representatives, etc.) during the initialization phase to better understand the requirements, and by mapping these requirements to data sources during the design phase of the data warehouse.
- Published
- 2018
40. Sistema d'intel·ligència de negoci per a l'anàlisi de la publicitat en entorns digitals
- Author
-
Ribó Pascual, Albert, Universitat Oberta de Catalunya, Guitart Hormigo, María Isabel, and Amorós Alcaraz, David
- Subjects
ETL ,OLAP ,Management information systems -- TFM ,business intelligence - Abstract
This work shows how, thanks to Business Intelligence tools, relationships between data can be extracted in order to improve the performance of advertisements. To do so, BI platforms are studied, one is chosen, and the necessary steps are followed to obtain the required information and support decision-making. The work also provides a global view of why BI platforms have become indispensable for the competitiveness of companies, supporting the complex analysis of multidimensional data and ultimately guiding fact-based decision-making.
- Published
- 2018
41. Using Semantic Web Technologies for Exploratory OLAP: A Survey
- Author
-
Rafael Berlanga, Torben Bach Pedersen, Alkis Simitsis, Victoria Nebot, Alberto Abelló, Oscar Romero, María José Aramburu, Universitat Politècnica de Catalunya. Departament d'Enginyeria de Serveis i Sistemes d'Informació, and Universitat Politècnica de Catalunya. MPI - Modelització i Processament de la Informació
- Subjects
Computer science ,Informàtica::Sistemes d'informació [Àrees temàtiques de la UPC] ,education ,Semantic data model ,computer.software_genre ,Business intelligence ,Data modeling ,Data mining ,Semantic Web ,OLAP ,business.industry ,Online analytical processing ,InformationSystems_DATABASEMANAGEMENT ,Data discovery ,business data processing ,Reasoning ,Data science ,Data warehousing ,Data warehouse ,Computer Science Applications ,ETL ,data warehouses ,Web semàtica ,Computational Theory and Mathematics ,Business -- Data processing ,Scalability ,Mineria de dades ,business ,computer ,Semantic web ,Information Systems ,Data integration - Abstract
This paper describes the convergence of some of the most influential technologies in the last few years, namely data warehousing (DW), on-line analytical processing (OLAP), and the Semantic Web (SW). OLAP is used by enterprises to derive important business-critical knowledge from data inside the company. However, the most interesting OLAP queries can no longer be answered on internal data alone, external data must also be discovered (most often on the web), acquired, integrated, and (analytically) queried, resulting in a new type of OLAP, exploratory OLAP . When using external data, an important issue is knowing the precise semantics of the data. Here, SW technologies come to the rescue, as they allow semantics (ranging from very simple to very complex) to be specified for web-available resources. SW technologies do not only support capturing the “passive” semantics, but also support active inference and reasoning on the data. The paper first presents a characterization of DW/OLAP environments, followed by an introduction to the relevant SW foundation concepts. Then, it describes the relationship of multidimensional (MD) models and SW technologies, including the relationship between MD models and SW formalisms. Next, the paper goes on to survey the use of SW technologies for data modeling and data provisioning, including semantic data annotation and semantic-aware extract, transform, and load (ETL) processes. Finally, all the findings are discussed and a number of directions for future research are outlined, including SW support for intelligent MD querying, using SW technologies for providing context to data warehouses, and scalability issues.
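As a small, generic example of the external data acquisition that exploratory OLAP relies on, the sketch below pulls facts from a public SPARQL endpoint so they could later be conformed and loaded next to internal warehouse data; the endpoint and query are illustrative and unrelated to the survey itself.

from SPARQLWrapper import SPARQLWrapper, JSON

endpoint = SPARQLWrapper("https://dbpedia.org/sparql")
endpoint.setQuery("""
    PREFIX dbo: <http://dbpedia.org/ontology/>
    SELECT ?country ?population WHERE {
        ?country a dbo:Country ;
                 dbo:populationTotal ?population .
    } LIMIT 10
""")
endpoint.setReturnFormat(JSON)

# Each binding could become a row in an external dimension or fact table.
for binding in endpoint.query().convert()["results"]["bindings"]:
    print(binding["country"]["value"], binding["population"]["value"])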
- Published
- 2015
- Full Text
- View/download PDF
42. BigDimETL: ETL for Multidimensional Big Data
- Author
-
Faiza Ghozzi, Hana Mallek, Faiez Gargouri, Olivier Teste, Multimedia, InfoRmation systems and Advanced Computing Laboratory (MIRACL), Faculté des Sciences Economiques et de Gestion de Sfax (FSEG Sfax), Université de Sfax - University of Sfax-Université de Sfax - University of Sfax, Systèmes d’Informations Généralisées (IRIT-SIG), Institut de recherche en informatique de Toulouse (IRIT), Université Toulouse 1 Capitole (UT1), Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse - Jean Jaurès (UT2J)-Université Toulouse III - Paul Sabatier (UT3), Université Fédérale Toulouse Midi-Pyrénées-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP), Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse 1 Capitole (UT1), Université Fédérale Toulouse Midi-Pyrénées, Université Toulouse - Jean Jaurès (UT2J), Centre National de la Recherche Scientifique - CNRS (FRANCE), Institut National Polytechnique de Toulouse - Toulouse INP (FRANCE), Université Toulouse III - Paul Sabatier - UT3 (FRANCE), Université Toulouse - Jean Jaurès - UT2J (FRANCE), Université Toulouse 1 Capitole - UT1 (FRANCE), Université de Sfax (TUNISIA), and Institut National Polytechnique de Toulouse - INPT (FRANCE)
- Subjects
Decision support system ,Computer science ,Process (engineering) ,Twitter ,Big data ,02 engineering and technology ,Multidimensional structure ,Data warehouse ,Bigdata ,020204 information systems ,Parallel processing ,0202 electrical engineering, electronic engineering, information engineering ,MapReduce ,Social media ,Structure (mathematical logic) ,Théorie de l'information ,business.industry ,Recherche d'information ,Data science ,ETL ,[INFO.INFO-IT]Computer Science [cs]/Information Theory [cs.IT] ,[INFO.INFO-IR]Computer Science [cs]/Information Retrieval [cs.IR] ,020201 artificial intelligence & image processing ,business - Abstract
With the broad range of data available on the World Wide Web and the increasing use of social media such as Facebook, Twitter, YouTube, etc., a "Big Data" notion has emerged. The latter has become an important aspect of today's business, since it is full of knowledge that is crucial for effective decision-making. However, this kind of data brings with it new problems and challenges for the Decision Support System (DSS) that must be addressed. In this paper, we propose a new approach called BigDimETL (Big Dimensional ETL) that deals with the ETL (Extract-Transform-Load) development process. Our approach focuses on integrating Big Data while taking into account the MultiDimensional Structure (MDS) through a MapReduce paradigm.
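The MapReduce idea behind such an approach can be illustrated in plain Python: map each tweet to a (dimension key, 1) pair and reduce by summing, yielding a fact-like count per (hashtag, day). This stands in for an actual Hadoop job and uses made-up tweet records.

from collections import defaultdict

tweets = [
    {"hashtag": "#bigdata", "day": "2017-03-01"},
    {"hashtag": "#bigdata", "day": "2017-03-01"},
    {"hashtag": "#etl",     "day": "2017-03-02"},
]

# Map phase: emit ((hashtag, day), 1) for every tweet.
mapped = (((t["hashtag"], t["day"]), 1) for t in tweets)

# Reduce phase: sum the counts for each composite dimension key.
facts = defaultdict(int)
for key, value in mapped:
    facts[key] += value

print(dict(facts))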
- Published
- 2017
- Full Text
- View/download PDF
43. Disseny i implementació d'un sistema d'intel·ligència de negoci per parcs eòlics amb boies meteorològiques
- Author
-
Béjar Latonda, Santiago, Universitat Oberta de Catalunya, Amorós Alcaraz, David, and Guitart Hormigo, María Isabel
- Subjects
ETL ,JasperReports ,Relational databases -- TFM ,business intelligence - Abstract
The purpose of this work is the design and implementation of a Business Intelligence (BI) solution for analysing data from a number of offshore wind farms. In particular: 1) design and implement a data warehouse (DWH) suited to the original data and the intended analyses; 2) design and implement the ETL processes that transform and load the data into the DWH; 3) compare and deploy different open-source BI solutions; 4) design and implement an efficient OLAP cube for running queries; and 5) perform the analyses on the OLAP cube.
- Published
- 2017
44. Anàlisis d'eines Bussines Intelligence en el mercat actual
- Author
-
Mancebo González, Natalia, Universitat Oberta de Catalunya, Andrés Sanz, Humberto, and Daradoumis Haralabus, Atanasi
- Subjects
ETL ,dashboard ,reports ,Management information systems -- TFG ,business intelligence - Abstract
This work consists of an analysis of Business Intelligence tools on the current market. First, the components that make up Business Intelligence are explained; these are later used to evaluate the selected tools. The work centres on a study of 20 Business Intelligence tools, detailing for each one its description and characteristics as well as its strong and weak points.
- Published
- 2017
45. Aplicación de herramientas de Business Intelligence en datos del entorno de salud
- Author
-
Ayechu Abendaño, Uxue, Escuela Técnica Superior de Ingenieros Industriales y de Telecomunicación, Telekomunikazio eta Industria Ingeniarien Goi Mailako Eskola Teknikoa, and Astrain Escola, José Javier
- Subjects
ETL ,Computer science ,Data mart ,Tableau ,Data warehouse ,Medicine ,Business intelligence - Abstract
This project arises from the need to exploit the health information of Navarre. Previously, this need was met by creating printed lists, spreadsheets, Access files linked to OLTP databases, PDF dashboards, and so on. Currently, these methods are unsustainable due to the high cost of the technical personnel required to create the reports manually and the difficulty of jointly exploiting the different products that hold clinical information. For this reason, a Business Intelligence (BI) system has been developed that integrates clinical and administrative information from products such as Atenea, Lakora, Leire, HCI and Lamia. The system provides a Data Mart with several dimensions and fact tables that are very useful for analysing health data; these dimensions and tables integrate the information from all the products mentioned, yielding complete, high-quality information. In addition to the Data Mart, valuable reports are generated, for example the chronic-patients dashboard, which shows indicators on chronic patients in Navarre with several aggregations, by zone and place. The Data Mart and the dashboards are used to support important decisions such as the allocation of the health resources of the Government of Navarre.
- Published
- 2017
46. Tecnología Business Intelligence para tomar las mejores decisiones en tiendas Gourmet
- Author
-
Alcañiz Villanueva, José Ángel, Universitat Oberta de Catalunya, Andrés Sanz, Humberto, and Daradoumis Haralabus, Atanasi
- Subjects
ETL ,OLAP ,Pentaho ,data warehouse ,Management information systems -- TFG ,business intelligence - Abstract
In this work, a complete BI process is carried out using several of its methodologies and technologies in order to transform an organisation's data into information that answers the needs of the business. First, the tool that best suits the needs of the project is selected. Once the tools are installed and configured for the working environment, an ETL process is developed to build a data warehouse. This repository is well suited for obtaining information, since the data are integrated and organised into an appropriate model so that they can be analysed and the information presented quickly. Finally, taking into account the requirements of the company, a reporting and OLAP cube system is created. This approach provides the company with the knowledge needed to make the best decisions for meeting its objectives and improving its business processes.
- Published
- 2017
47. Automating user-centered design of data-intensive processes
- Author
-
Theodorou, Vasileios, Abelló, Alberto, Lehner, Wolfgang, Abello, Alberto, Technische Universität Dresden, Abelló Gamazo, Alberto, and Universitat Politècnica de Catalunya. Departament d'Enginyeria de Serveis i Sistemes d'Informació
- Subjects
ETL ,Industrial espionage ,Computer science [UPC subject areas] ,process quality ,quality measures ,user-centered design ,ddc:004 ,Data management - Abstract
Business Intelligence (BI) enables organizations to collect and analyze internal and external business data to generate knowledge and business value, and to provide decision support at the strategic, tactical, and operational levels. The consolidation of data coming from many sources as a result of managerial and operational business processes, usually referred to as Extract-Transform-Load (ETL), is itself a statically defined process, and knowledge workers have little to no control over the characteristics of the presentable data to which they have access. Two main reasons dictate the reassessment of this rigid approach in the context of modern business environments. First, the service-oriented nature of today's business, combined with the increasing volume of available data, makes it impossible for an organization to proactively design efficient data management processes. Second, enterprises can benefit significantly from analyzing the behavior of their business processes, fostering their optimization. Hence, we took a first step towards quality-aware ETL process design automation by defining, through a systematic literature review, a set of ETL process quality characteristics and the relationships between them, and by providing quantitative measures for each characteristic. Subsequently, we produced a model that represents ETL process quality characteristics and the dependencies among them, and we showcased, through the application of a Goal Model with quantitative components (i.e., indicators), how our model can provide the basis for subsequent analysis to reason about and make informed ETL design decisions. In addition, we introduced our holistic view of quality-aware ETL process design by presenting a framework for user-centered declarative ETL. This included the definition of an architecture and methodology for the rapid, incremental, qualitative improvement of ETL process models, promoting automation and reducing complexity, as well as a clear separation of business-user and IT roles, where each user is presented with appropriate views and assigned fitting tasks. In this direction, we built a tool, POIESIS, which facilitates incremental, quantitative improvement of ETL process models, with users as key participants working through well-defined collaborative interfaces. To evaluate different quality characteristics of an ETL process design, we proposed an automated data generation framework for evaluating ETL processes (Bijoux). To this end, we classified the operations based on the part of the input data they access for processing, which helps Bijoux during data generation both to identify the constraints that specific operation semantics imply over the input data and to decide at which level the data should be generated (e.g., single field, single tuple, complete dataset). Bijoux offers data generation capabilities in a modular and configurable manner, which can be used to evaluate the quality of different parts of an ETL process. Moreover, we introduced a methodology that can be applied to concrete contexts, building a repository of patterns and rules. This generated knowledge base can be used during the design and maintenance phases of ETL processes, automatically exposing understandable conceptual representations of the processes and providing useful insight for design decisions.
Collectively, these contributions have raised the level of abstraction of ETL process components, revealing their quality characteristics at a granular level and allowing for evaluation and automated (re-)design that takes business users' quality goals into consideration.
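As a rough illustration of the data-generation idea described above, the sketch below (plain Python with hypothetical class and function names; it is not the actual Bijoux API) classifies ETL operations by the granularity of input they access and sizes the generated synthetic input accordingly:

    # Illustrative sketch only: classify ETL operations by the part of the input
    # they inspect (single field, single tuple, or whole dataset) and generate
    # synthetic data that respects the constraint each operation implies.
    import random
    from dataclasses import dataclass
    from typing import Callable, Dict, Literal

    Granularity = Literal["field", "tuple", "dataset"]

    @dataclass
    class Operation:
        name: str
        granularity: Granularity              # which part of the input the operation accesses
        constraint: Callable[[dict], bool]    # predicate the generated tuples must satisfy

    def generate_tuple(schema: Dict[str, type]) -> dict:
        """Generate one random tuple for a simple {column: type} schema."""
        return {col: random.randint(0, 100) if typ is int else f"{col}_{random.randint(0, 9)}"
                for col, typ in schema.items()}

    def generate_for_operation(op: Operation, schema: Dict[str, type], n: int = 10) -> list:
        """Produce a synthetic dataset sized to the operation's granularity."""
        if op.granularity == "field":
            data = [generate_tuple(schema) for _ in range(3)]       # only one column matters
        elif op.granularity == "tuple":
            data = [generate_tuple(schema) for _ in range(n)]       # row-by-row operations
        else:
            data = [generate_tuple(schema) for _ in range(n * 10)]  # e.g., aggregation, dedup
        # keep only tuples satisfying the operation's input constraint, so the data
        # is a valid input for this part of the ETL flow
        return [t for t in data if op.constraint(t)]

    if __name__ == "__main__":
        schema = {"age": int, "name": str}
        filter_op = Operation("filter_adults", "field", lambda t: t["age"] >= 0)
        print(generate_for_operation(filter_op, schema))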
- Published
- 2017
48. Análisis, diseño e implementación de un screener de acciones de empresas mediante un Sistema de Inteligencia de Negocio
- Author
-
Ramírez Rodríguez, Aridany Vicente, Universitat Oberta de Catalunya, and Martínez Fontes, Xavier
- Subjects
ETL ,data warehouse ,Management information systems -- TFG ,business intelligence - Abstract
This final degree project aims to apply the techniques and technological developments of Business Intelligence (BI) to help private investors narrow the universe of stocks to invest in, through an analysis tool that lets them apply a series of conditions or restrictions that the selected companies must fulfil.
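By way of illustration only (the field names and thresholds below are hypothetical and not taken from the thesis), such a screener boils down to applying a list of user-defined conditions to a universe of stocks and keeping those that satisfy all of them:

    # Minimal screener sketch: keep only the stocks that pass every condition.
    from typing import Callable, List

    def screen(stocks: List[dict], conditions: List[Callable[[dict], bool]]) -> List[dict]:
        """Return the stocks that satisfy every user-defined condition."""
        return [s for s in stocks if all(cond(s) for cond in conditions)]

    universe = [
        {"ticker": "AAA", "pe_ratio": 12.0, "dividend_yield": 3.1},
        {"ticker": "BBB", "pe_ratio": 35.0, "dividend_yield": 0.0},
        {"ticker": "CCC", "pe_ratio": 9.5,  "dividend_yield": 4.2},
    ]

    conditions = [
        lambda s: s["pe_ratio"] < 15,         # valuation restriction
        lambda s: s["dividend_yield"] > 2.0,  # income restriction
    ]

    print([s["ticker"] for s in screen(universe, conditions)])  # -> ['AAA', 'CCC']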
- Published
- 2017
49. Inteligencia de negocio en nefrología del Complejo Hospitalario Universitario Insular-Materno Infantil de Gran Canaria
- Author
-
García García, Diego, Universitat Oberta de Catalunya, Martínez Fontes, Xavier, and Daradoumis Haralabus, Atanasi
- Subjects
ETL ,nephrology ,CMI ,Industrial espionage -- TFG ,business intelligence ,Business intelligence -- TFG - Abstract
This project pursues two objectives: the design and implementation of a business intelligence system for the Nephrology Service of the Complejo Hospitalario Universitario Insular-Materno Infantil, which allows the effectiveness of patients' hemodialysis sessions to be controlled and monitored through a dashboard; and the enrichment of the Cantonera information system of the Servicio Canario de Salud by exporting the care-activity data of the hemodialysis sessions to flat files.
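As a minimal sketch of the second objective (export to flat files), assuming hypothetical column names rather than the actual Cantonera file layout, Python's standard csv module is enough to write one row per hemodialysis session:

    # Illustrative export of session activity to a flat CSV file.
    import csv

    sessions = [
        {"patient_id": "P001", "session_date": "2017-05-02", "duration_min": 240, "ktv": 1.4},
        {"patient_id": "P002", "session_date": "2017-05-02", "duration_min": 210, "ktv": 1.2},
    ]

    with open("hemodialysis_sessions.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["patient_id", "session_date", "duration_min", "ktv"])
        writer.writeheader()        # one header row, then one row per session
        writer.writerows(sessions)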
- Published
- 2017
50. A Model-Driven Framework for Hardware-Software Co-design of Dataflow Applications
- Author
-
Ahmad, Waheed, Yildiz, Bugra M., Rensink, Arend, Stoelinga, Mariëlle, Berger, C., Mousavi, M., and Wisniewski, R.
- Subjects
Model checking ,Dataflow ,Computer science ,Model transformation ,Interoperability ,SDF ,02 engineering and technology ,computer.software_genre ,Eclipse ,Extensibility ,Uppaal Cora ,EuGenia ,Ecore ,GMF ,HW/SW co-design ,Metamodel ,0202 electrical engineering, electronic engineering, information engineering ,EC Grant Agreement nr.: FP7/318490 ,Viola-Jones face detector ,computer.programming_language ,Model Transformation ,Programming language ,business.industry ,020206 networking & telecommunications ,Synchronous Data Flow ,UPPAAL ,020202 computer hardware & architecture ,Metamodeling ,ETL ,Model Driven Engineering ,Priced Timed Automata ,Model-driven architecture ,Software engineering ,business ,computer - Abstract
Hardware-software (HW-SW) co-design allows system-level objectives to be met by exploiting the synergy of hardware and software. Current tools and approaches for HW-SW co-design face difficulties coping with the increasing complexity of modern-day applications due to, e.g., concurrency and energy constraints. Therefore, an automated modeling approach is needed that satisfies modularity, extensibility, and interoperability requirements. Model-Driven Engineering (MDE) is a prominent paradigm that, by treating models as first-class citizens, helps to fulfill these requirements. This paper presents a state-of-the-art MDE-based framework for HW-SW co-design of dataflow applications, based on the synchronous dataflow (SDF) graph formalism. In the framework, we introduce a reusable set of three coherent metamodels for creating HW-SW co-design models covering SDF graphs, hardware platforms, and the allocation of SDF tasks to hardware. The framework also contains model transformations that cast these models into priced timed-automata models, the input language of the well-known model checker Uppaal Cora. We demonstrate how our framework satisfies the requirements of modularity, extensibility, and interoperability in an industrial case study.
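As a rough, self-contained illustration of the SDF formalism the framework builds on (the actor names and rates below are made up and unrelated to the paper's metamodels or case study), an SDF graph can be represented as a list of channels and its repetition vector derived from the balance equations r[src] * prod == r[dst] * cons:

    # Illustrative SDF sketch: compute the repetition vector of a small,
    # connected graph by propagating rational firing rates along its channels.
    from fractions import Fraction
    from math import gcd, lcm

    # channels: (source actor, destination actor, tokens produced, tokens consumed)
    channels = [
        ("camera", "detect", 1, 2),   # the detector consumes 2 frames per firing
        ("detect", "output", 1, 1),
    ]

    actors = {a for ch in channels for a in ch[:2]}
    rates = {}

    def solve(start):
        """Assign a rational rate to every actor reachable from 'start'."""
        rates[start] = Fraction(1)
        stack = [start]
        while stack:
            a = stack.pop()
            for src, dst, prod, cons in channels:
                for known, other, k_rate, o_rate in ((src, dst, prod, cons), (dst, src, cons, prod)):
                    if known == a and other not in rates:
                        rates[other] = rates[a] * k_rate / o_rate
                        stack.append(other)

    solve(next(iter(actors)))

    # scale the rational solution to the smallest positive integer vector
    denom_lcm = lcm(*(r.denominator for r in rates.values()))
    ints = {a: int(r * denom_lcm) for a, r in rates.items()}
    num_gcd = gcd(*ints.values())
    repetition = {a: v // num_gcd for a, v in ints.items()}
    print(repetition)  # e.g. {'camera': 2, 'detect': 1, 'output': 1}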
- Published
- 2017
- Full Text
- View/download PDF