532 results for "Parallel database"
Search Results
2. Transliteration from English to Telugu Using Phrase-Based Machine Translation for General Domain English Words
- Author
-
Mogla, Radha, Vasantha Lakshmi, Chellapilla, Chatterjee, Niladri, Kumar, Amit, editor, Senatore, Sabrina, editor, and Gunjan, Vinit Kumar, editor
- Published
- 2023
- Full Text
- View/download PDF
3. CARP: Cost Effective Load-Balancing Approach for Range-Partitioned Data
- Author
-
Belayadi, Djahida, Hidouci, Khaled-Walid, Midoun, Khadidja, Demigha, Oualid, editor, Djamaa, Badis, editor, and Amamra, Abdenour, editor
- Published
- 2019
- Full Text
- View/download PDF
4. Cost Effective Load-Balancing Approach for Range-Partitioned Main-Memory Resident Data
- Author
-
Belayadi, Djahida, Hidouci, Khaled-Walid, Bellatreche, Ladjel, Ordonez, Carlos, Hartmann, Sven, editor, Ma, Hui, editor, Hameurlain, Abdelkader, editor, Pernul, Günther, editor, and Wagner, Roland R., editor
- Published
- 2018
- Full Text
- View/download PDF
5. Universal algorithm of processing of requests with use of parallel technology
- Author
-
Timofeeva N.E. and Dmitrieva K.A.
- Subjects
algorithm, database, distributed database, parallel database, MapReduce, query processing, Engineering (General). Civil engineering (General), TA1-2040, Chemistry, QD1-999 - Abstract
Processing and storing large amounts of information is one of the difficult and interesting tasks of the moment. The performance of the system as a whole depends on how well the performance and reliability of the database are implemented. One of the most difficult aspects of this issue is the handling of a database query and its efficient execution. In this paper we consider modern methods and models of query processing in databases. We propose an algorithm for servicing user requests that uses parallel technologies when exchanging information with the nodes of a distributed database and a dictionary, and that reduces query execution time, which in turn increases the speed of the system as a whole. We also review the current technologies for storing large amounts of data: distributed databases, parallel databases, and MapReduce.
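The following is a minimal sketch, assuming hypothetical node addresses and a stub query_node function, of the core idea in this abstract: exchanging information with the nodes of a distributed database in parallel so that total latency tracks the slowest node rather than the sum over all nodes. It is not the authors' algorithm.
```python
# Minimal sketch: fan a query out to every database node in parallel and
# merge the partial results. NODES and query_node() are hypothetical stubs.
from concurrent.futures import ThreadPoolExecutor

NODES = ["node1:5432", "node2:5432", "node3:5432"]  # hypothetical addresses

def query_node(node, sql):
    # Placeholder for a real remote call returning this node's partial rows.
    return [(node, sql)]

def parallel_query(sql):
    # One worker per node: total latency tracks the slowest node,
    # not the sum over all nodes as in a serial exchange.
    with ThreadPoolExecutor(max_workers=len(NODES)) as pool:
        partials = pool.map(lambda n: query_node(n, sql), NODES)
    return [row for part in partials for row in part]  # merge step

print(parallel_query("SELECT * FROM t WHERE k = 42"))
```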
- Published
- 2018
- Full Text
- View/download PDF
6. Comparing and Analyzing the Energy Efficiency of Cloud Database and Parallel Database
- Author
-
Song, Jie, Li, Tiantian, Liu, Xuebing, Zhu, Zhiliang, Wyld, David C., editor, Zizka, Jan, editor, and Nagamalai, Dhinaharan, editor
- Published
- 2012
- Full Text
- View/download PDF
7. Comparing Hadoop and Fat-Btree Based Access Method for Small File I/O Applications
- Author
-
Luo, Min, Yokota, Haruo, Chen, Lei, editor, Tang, Changjie, editor, Yang, Jun, editor, and Gao, Yunjun, editor
- Published
- 2010
- Full Text
- View/download PDF
8. NHSS: A speech and singing parallel database
- Author
-
Bidisha Sharma, Karthika Vijayan, Haizhou Li, Xiaoxue Gao, and Xiaohai Tian
- Subjects
FOS: Computer and information sciences, Sound (cs.SD), Linguistics and Language, Computer science, Computer Science - Human-Computer Interaction, Computer Science - Sound, Language and Linguistics, Human-Computer Interaction (cs.HC), Spectral mapping, Audio and Speech Processing (eess.AS), Reading (process), FOS: Electrical engineering, electronic engineering, information engineering, Natural (music), Communication, Parallel database, Lyrics, Computer Science Applications, Modeling and Simulation, Language technology, Benchmark (computing), Computer Vision and Pattern Recognition, Artificial intelligence, Singing, Software, Natural language processing, Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
We present a database of parallel recordings of speech and singing, collected and released by the Human Language Technology (HLT) laboratory at the National University of Singapore (NUS), called the NUS-HLT Speak-Sing (NHSS) database. We release this database to the public to support research activities that include, but are not limited to, comparative studies of acoustic attributes of speech and singing signals, cooperative synthesis of speech and singing voices, and speech-to-singing conversion. This database consists of recordings of sung vocals of English pop songs, the spoken counterpart of lyrics of the songs read by the singers in their natural reading manner, and manually prepared utterance-level and word-level annotations. The audio recordings in the NHSS database correspond to 100 songs sung and spoken by 10 singers, resulting in a total of 7 hours of audio data. There are 5 male and 5 female singers, singing and reading the lyrics of 10 songs each. In this paper, we discuss the design methodology of the database, analyse the similarities and dissimilarities in characteristics of speech and singing voices, and provide some strategies to address relationships between these characteristics for converting one to another. We develop benchmark systems, which can be used as references for speech-to-singing alignment, spectral mapping, and conversion using the NHSS database. (Accepted to Speech Communication.)
- Published
- 2021
- Full Text
- View/download PDF
9. Design and Implementation of Human-Computer Interaction System in Parallel Digital Library System Based on Neural Network
- Author
-
Jun Cao
- Subjects
Computer science, Parallel database, Parallel algorithm, Information processing, User requirements document, Digital library, Computer Science Applications, QA76.75-76.765, Information and Communications Technology, Computer cluster, Key (cryptography), Computer software, Software engineering, Software - Abstract
Information and communication technologies are widely regarded as probable assets for socioeconomic development in developing countries. Studies have shown that improved telecommunication infrastructure has facilitated the development of underserved populations in various ways. Among existing ICT applications, digital library systems provide better solutions and respond to a variety of unmet needs of research institutions and scientific communities. With the development of digital library technology, the parallel database system has become the main tool for efficient information processing in digital library systems. On this basis, in the parallel environment of a computer cluster, coordinating communication allows the coordinator, the collection machines, and the query processor to carry out distribution, loading, and maintenance operations. This works with high efficiency, saves considerable time, enables the digital library to meet user requirements effectively, satisfies the digital library's performance requirements for data, and solves the key problem in the parallel algorithm. The experimental results show that this parallel technique has very good performance and efficiency.
- Published
- 2021
- Full Text
- View/download PDF
10. Parallel Database for Student Counselling through Single Window System for Admission in Engineering Colleges.
- Author
-
Gnana Singh, D. Asir Antony, Leavline, E. Jebamalar, and Preethi, B.
- Subjects
DATABASES, SQL, EDUCATIONAL counseling, UNIVERSITY & college admission, ENGINEERING students - Abstract
Nowadays, the growth of the educational sector generates massive data such as details of students and staff, their academic performance, administrative and research details, etc. These data are stored in distributed databases. The parallel database performs parallel query processing; hence, the query processing time is reduced and the throughput of transaction processing is improved. The parallel database system also improves performance by employing parallelism in various database management operations such as loading data, building indexes, and evaluating queries. The data is partitioned and placed across multiple disks for parallel input and output (I/O) operations, achieving parallelism on queries using pipeline parallelism. Individual relational operations such as sort, join, aggregation, etc. are executed in the parallel database by each processor working independently on its own partition. Queries are expressed in a high-level language such as structured query language (SQL) and are translated into relational algebra for query processing. Thus, the parallel database increases the throughput of database management and reduces query processing time. This paper presents a parallel database for student counselling for engineering university admission using the MySQL relational database management system. [ABSTRACT FROM AUTHOR]
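A minimal sketch of the partitioned parallelism this abstract describes (hash partitioning plus independent per-partition operators); the table, partition count, and count operation are illustrative stand-ins, not the paper's MySQL design.
```python
# Sketch of partitioned parallelism: hash-partition rows, run the same
# operation on each partition independently, then combine partial results.
from multiprocessing import Pool

N_PARTITIONS = 4

def partition(rows, key):
    parts = [[] for _ in range(N_PARTITIONS)]
    for row in rows:
        parts[hash(row[key]) % N_PARTITIONS].append(row)  # hash partitioning
    return parts

def local_count(part):
    return len(part)  # each worker scans only its own partition

if __name__ == "__main__":
    students = [{"id": i, "branch": i % 7} for i in range(1000)]
    parts = partition(students, "id")
    with Pool(N_PARTITIONS) as pool:
        counts = pool.map(local_count, parts)  # intra-operation parallelism
    print(sum(counts))                         # combine -> 1000
```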
- Published
- 2017
11. Scheduling algorithms for parallel transaction processing systems
- Author
-
Wang, Jiahong, Li, Jie, Kameda, Hisao, and Malyshkin, Victor, editor
- Published
- 1997
- Full Text
- View/download PDF
12. Data Placement Strategy for a Parallel Database System
- Author
-
Ibáñez-Espiga, M. B., Williams, M. H., Tjoa, A Min, editor, and Ramos, Isidro, editor
- Published
- 1992
- Full Text
- View/download PDF
13. Transactional and Spatial Query Processing in the Big Data Era
- Author
-
Kim, Young-Seok
- Subjects
Computer science, Database, Log-Structured Merge Tree, Parallel Database, Spatial Query Processing, Transaction Processing - Abstract
Over the past decade, the proliferation of mobile devices has generated a variety of data at an unprecedented rate. The trend will be further accelerated by the advent of the Internet-of-Things era. Such data include signals, texts, photos, and videos tagged with date, time, and geo coordinates. The data are structured, semi-structured, or unstructured. Data-processing systems that aim to ingest, store, index, and analyze Big Data must deal with such data efficiently. In response, we have developed Apache AsterixDB, a parallel, semi-structured information management platform that provides the ability to ingest, store, index, query, and analyze mass quantities of data. The key contributions of this thesis fall in two major parts. First, in order to store and index newly generated data and make them queryable in a timely manner, a record-level transaction model was designed and implemented in AsterixDB based on the read-committed isolation level. Second, due to the importance of efficient query processing for such dynamic geo-tagged data, we implemented five variants of representative, disk-resident spatial indexing methods on top of the Log-Structured Merge-tree-based (LSM) storage layer in AsterixDB and evaluated their pros and cons in light of the dynamic characteristics of geo-tagged Big Data.
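The LSM storage design mentioned above can be illustrated with a toy store; the memtable limit and flush policy here are simplifications for illustration, not AsterixDB's implementation.
```python
# Toy log-structured merge (LSM) store: writes hit an in-memory component;
# when it fills, it is flushed as an immutable sorted run. Reads check the
# memtable first, then runs from newest to oldest. Illustrative only.
MEMTABLE_LIMIT = 4

class LSMStore:
    def __init__(self):
        self.memtable = {}
        self.runs = []  # list of sorted, immutable (key, value) lists

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= MEMTABLE_LIMIT:
            self.runs.append(sorted(self.memtable.items()))  # flush
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):  # newest run wins
            for k, v in run:
                if k == key:
                    return v
        return None

store = LSMStore()
for i in range(10):
    store.put(i, i * i)
print(store.get(3), store.get(9))  # -> 9 81
```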
- Published
- 2016
14. Application of resolution principle in search agent's semantic query optimizing model.
- Author
-
Xin-Hua Xu, Wen-Hui Tian, Quan Li, and Liang Wang
- Subjects
MATHEMATICAL optimization, ARTIFICIAL intelligence, PREDICATE (Logic), SEMANTICS (Philosophy), DATABASES - Published
- 2015
15. A Systematic Approach for English-Hindi Parallel Database Creation for Transliteration of General Domain English Words
- Author
-
Radha Mogla, Niladri Chatterjee, and C. Vasantha Lakshmi
- Subjects
Hindi, Phrase, Machine translation, Computer science, Parallel database, Phonetics, Pronunciation, Task (project management), Transliteration, Artificial intelligence, Natural language processing - Abstract
Transliteration is the task of converting a text written in one language, called the source, to another language, called the target, while preserving the phonetic properties of the source language. The present work considers the task of transliteration with English and Hindi as the source and target language, respectively. The aim is to create a database by considering the diversity of letter and grapheme pronunciations in English and to develop a simple machine transliteration system from English to Hindi, using the phrase-based statistical machine translation (PBSMT) model. The system was trained using the open-source toolkits Moses and GIZA++. This will help Hindi readers without knowledge of English to learn the correct pronunciation of an English word, as the output will be written in Hindi script. To test the trained system, five short stories were transliterated and the results were compared with a transliteration tool available online, 'C-DAC Transliteration'.
- Published
- 2021
- Full Text
- View/download PDF
16. Integration of large-scale data processing systems and traditional parallel database technology
- Author
-
Daniel J. Abadi, Azza Abouzied, Kamil Bajda-Pawlikowski, and Avi Silberschatz
- Subjects
Flexibility (engineering), SQL, Database, Computer science, Parallel database, General Engineering, Fault tolerance, Data processing system, Software, Scalability, Management system - Abstract
In 2009 we explored the feasibility of building a hybrid SQL data analysis system that takes the best features from two competing technologies: large-scale data processing systems (such as Google MapReduce and Apache Hadoop) and parallel database management systems (such as Greenplum and Vertica). We built a prototype, HadoopDB, and demonstrated that it can deliver the high SQL query performance and efficiency of parallel database management systems while still providing the scalability, fault tolerance, and flexibility of large-scale data processing systems. Subsequently, HadoopDB grew into a commercial product, Hadapt, whose technology was eventually acquired by Teradata. In this paper, we provide an overview of HadoopDB's original design, and its evolution during the subsequent ten years of research and development effort. We describe how the project innovated both in the research lab, and as a commercial product at Hadapt and Teradata. We then discuss the current vibrant ecosystem of software projects (most of which are open source) that continued HadoopDB's legacy of implementing a systems level integration of large-scale data processing systems and parallel database technology.
- Published
- 2019
- Full Text
- View/download PDF
17. A parallel query processing system based on graph-based database partitioning
- Author
-
Donghyoung Han, Yoon-Min Nam, and Min-Soo Kim
- Subjects
SQL, Information Systems and Management, Database, Computer science, Parallel database, Graph based, Query plan, Artificial Intelligence, Control and Systems Engineering, Data redundancy, Scalability, Graph (abstract data type), Computer Science Applications, Theoretical Computer Science, Software - Abstract
As parallel database systems have large amounts of data to process, it is important to utilize a scalable and efficient horizontal database partitioning method. The existing partitioning methods have major drawbacks that not only cause large amounts of data redundancy but also still require expensive shuffle operations for join queries in many cases—despite their high data redundancy. We elucidate upon the drawbacks originating from the tree-based partitioning schemes and propose a novel graph-based database partitioning method called GPT that both improves the query performance and reduces data redundancy. We integrate the proposed GPT method into a parallel query processing system, Spark SQL, across all the relevant layers and modules, including the query plan generator and the scan operator. Through extensive experiments using three benchmarks, TPC-DS, IMDB and BioWarehouse, we show that GPT significantly outperforms the state-of-the-art method in terms of both storage overhead and query performance.
- Published
- 2019
- Full Text
- View/download PDF
18. COMPASS: Online Sketch-based Query Optimization for In-Memory Databases
- Author
-
Jun Hyung Shin, Asoke Datta, Florin Rusu, and Yesdaulet Izenov
- Subjects
Speedup, Database, Relational database, Computer science, Compass, Parallel database, Graph (abstract data type), Joins, Cardinality (SQL statements), Query optimization - Abstract
Cost-based query optimization remains a critical task in relational databases even after decades of research and industrial development. Query optimizers rely on a large range of statistical synopses for accurate cardinality estimation. As the complexity of selections and the number of join predicates increase, two problems arise. First, statistics cannot be incrementally composed to effectively estimate the cost of the sub-plans generated in plan enumeration. Second, small errors are propagated exponentially through joins, which can lead to severely sub-optimal plans. In this paper, we introduce COMPASS, a novel query optimization paradigm for in-memory databases based on a single type of statistics---Fast-AGMS sketches. In COMPASS, query optimization and execution are intertwined. Selection predicates and sketch updates are pushed-down and evaluated online during query optimization. This allows Fast-AGMS sketches to be computed only over the relevant tuples---which enhances cardinality estimation accuracy. Plan enumeration is performed over the query join graph by incrementally composing attribute-level sketches---not by building a separate sketch for every sub-plan. We prototype COMPASS in MapD -- an open-source parallel database -- and perform extensive experiments over the complete JOB benchmark. The results prove that COMPASS generates better execution plans -- both in terms of cardinality and runtime -- compared to four other database systems. Overall, COMPASS achieves a speedup ranging from 1.35X to 11.28X in cumulative query execution time over the considered competitors.
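For readers unfamiliar with Fast-AGMS sketches, the following minimal sketch shows the join-size estimation idea they support, per our simplified reading; it is not MapD's or COMPASS's code, and the hash family and sketch dimensions are illustrative choices.
```python
# Fast-AGMS / Count-Sketch join-size estimation: each key updates one counter
# per sketch row with a +/-1 sign; the join-size estimate is the median over
# rows of the dot products of the two relations' sketches.
import hashlib
from statistics import median

ROWS, WIDTH = 7, 256

def h(tag, row, key, mod):
    # Deterministic seeded hash via blake2b; any pairwise-independent
    # hash family would do in a real implementation.
    d = hashlib.blake2b(f"{tag}:{row}:{key}".encode(), digest_size=8).digest()
    return int.from_bytes(d, "big") % mod

def build_sketch(keys):
    sk = [[0] * WIDTH for _ in range(ROWS)]
    for key in keys:
        for r in range(ROWS):
            sign = 1 if h("s", r, key, 2) else -1
            sk[r][h("b", r, key, WIDTH)] += sign
    return sk

def estimate_join_size(sk_a, sk_b):
    return median(sum(x * y for x, y in zip(ra, rb))
                  for ra, rb in zip(sk_a, sk_b))

r_keys = [i % 100 for i in range(10_000)]  # join column of R
s_keys = [i % 100 for i in range(5_000)]   # join column of S
truth = sum(r_keys.count(k) * s_keys.count(k) for k in set(r_keys))
print(truth, estimate_join_size(build_sketch(r_keys), build_sketch(s_keys)))
```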
- Published
- 2021
- Full Text
- View/download PDF
19. DBSpinner: Making a Case for Iterative Processing in Databases
- Author
-
Chen Jianjun, Jason Yang Sun, Xiaodong Zhang, Sofoklis Floratos, and Ahmad Ghazal
- Subjects
SQL, Information engineering, Constant (computer programming), Relational database management system, Computer engineering, Computer science, Relational database, Shared nothing architecture, Parallel database, Table (database) - Abstract
Relational database management systems (RDBMSs) have limited iterative processing support. Recursive queries were added to ANSI SQL; however, their semantics do not allow aggregation functions, which disqualifies their use for several applications, such as PageRank and shortest path computations. Recently, another SQL extension, iterative Common Table Expressions (CTEs), was proposed to enable users to perform general iterative computations on RDBMSs. In this work, we demonstrate how iterative CTEs can be efficiently incorporated into a production RDBMS without major intrusion to the system. We have prototyped our approach on Futurewei's MPPDB, a shared-nothing relational parallel database engine. The implementation is based on a functional rewrite that translates iterative CTEs to other existing SQL operators. Thus, query plans of iterative CTEs can be optimized and executed by the engine with minimal modification to the code base. We have also applied several optimizations specifically for iterative CTEs to i) minimize data movement, ii) reuse results that remain constant, and iii) push down predicates to avoid unnecessary data processing. We verified our implementation through extensive experimental evaluation using real-world datasets and queries. The results show the feasibility of the rewrite approach and the effectiveness of the optimizations, which improve performance by an order of magnitude in some cases.
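To see why iteration plus aggregation matters, here is a hypothetical stand-alone sketch of the fixpoint semantics that iterative computation exposes: single-source shortest paths, which needs a MIN aggregate per iteration of the kind ANSI recursive CTEs disallow. The graph and names are made up.
```python
# Iterative fixpoint with aggregation: re-evaluate a "query body" (join the
# working table with edges, take MIN per vertex) until nothing changes.
import math

edges = {("a", "b"): 1, ("b", "c"): 2, ("a", "c"): 5, ("c", "d"): 1}

dist = {"a": 0.0}                        # working table, seeded with source
while True:
    new = dict(dist)
    for (u, v), w in edges.items():      # join working table with edges
        if u in dist:
            new[v] = min(new.get(v, math.inf), dist[u] + w)  # MIN aggregate
    if new == dist:                      # fixpoint reached -> stop iterating
        break
    dist = new

print(dist)  # {'a': 0.0, 'b': 1.0, 'c': 3.0, 'd': 4.0}
```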
- Published
- 2021
- Full Text
- View/download PDF
20. Query Processing and Optimization of Parallel Database System in Multi Processor Environments.
- Author
-
Deepak, Sukheja, Kumar, Singh Umesh, Durgesh, Mishra, and K., Pandya Bhupendra
- Abstract
In the present scenario, parallel database systems are applicable to a broad range of systems, from transactional database (OLTP) servers to decision support system (OLAP) servers. These developments involve database processing and querying over parallel systems. A key to the success of parallel database systems, particularly in decision-support applications (data warehousing), is parallel query optimization. Given a SQL query, parallel query optimization has the goal of finding a parallel plan that delivers the query result in minimal time, and various useful and competent optimization solutions have been implemented for parallel databases. Parallel DBSs have developed recently as a way to build high-performance, high-availability database servers on multiprocessor computer architectures at a much lower price than mainframe computers. The objective of this paper is to define a novel approach to achieving parallelism for multithreaded relational query execution that maximizes resource utilization of CPU and memory. This technique offers a solution to the problem of minimizing the response time of input queries against parallel databases. [ABSTRACT FROM PUBLISHER]
- Published
- 2012
- Full Text
- View/download PDF
21. A Genetic Optimization Physical Planner for Big Data Warehouses
- Author
-
Yacine Mestoui, Carlos Ordonez, Soumia Benkrid, and Ladjel Bellatreche
- Subjects
Decision support system, Computer science, Online analytical processing, Parallel database, Big data, Context (language use), Workload, Machine learning, Business intelligence, Overhead (computing), Artificial intelligence - Abstract
Workload-driven approaches for partitioning and tuning traditional Parallel Database systems are well studied in the literature. Unfortunately, in the context of new generation "Big Data" warehouses, these approaches are not correctly adapted to Business Intelligence 2.0, where the analyst is at the heart of decision support systems. This "disconnect" situation strongly impacts both data partitioning and fragment allocation processes, which are essential to achieve good query performance. To overcome this problem, recent studies proposed online data partitioning and fragment allocation using AI techniques to improve query performance with adaptive behavior. Nevertheless, they have important limitations: they add significant overhead and they tend to focus on the current workload, ignoring query logs. With such motivation in mind, we first formulate the problem of optimizing database partitioning subject to feasibility constraints, based on a query workload. We then introduce a proactive partitioning approach combining offline and online processing phases, inspired by closed-loop control (used in engineering disciplines) and genetic algorithms (from AI). We present an experimental validation on a big data cluster that shows promising results on typical OLAP workloads.
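A toy genetic search in the spirit of the genetic optimization named in the title (illustrative only; the paper's cost models and feasibility constraints are not reproduced): evolve fragment-to-node assignments so the estimated load of the slowest node is minimized.
```python
# Toy genetic algorithm for a partitioning decision: assign fragments to
# nodes to minimize the most-loaded node's cost (the makespan).
import random

random.seed(0)
FRAGS = [random.randint(1, 20) for _ in range(12)]  # per-fragment scan cost
NODES = 3

def cost(assign):  # makespan of the slowest node
    load = [0] * NODES
    for frag_cost, node in zip(FRAGS, assign):
        load[node] += frag_cost
    return max(load)

def mutate(assign):
    a = list(assign)
    a[random.randrange(len(a))] = random.randrange(NODES)
    return a

def crossover(p, q):
    cut = random.randrange(1, len(p))
    return p[:cut] + q[cut:]

pop = [[random.randrange(NODES) for _ in FRAGS] for _ in range(30)]
for _ in range(200):  # keep the fittest, breed and mutate the rest
    pop.sort(key=cost)
    pop = pop[:10] + [mutate(crossover(*random.sample(pop[:10], 2)))
                      for _ in range(20)]
best = min(pop, key=cost)
print(cost(best), best)
```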
- Published
- 2020
- Full Text
- View/download PDF
22. Improving round-robin through load adjusted-load informed algorithm in parallel database server application
- Author
-
Peace Asuquo Frank, Uduakobong-Aniebiat Okon, and Nseabasi Peter Essien
- Subjects
Application server, Computer science, Parallel database, Operating system - Published
- 2020
- Full Text
- View/download PDF
23. Parallel Database Management Systems for GIS
- Author
-
B.M. Gittings, S. Dowers, R.G. Healey, and M.J. Tranter
- Subjects
Database, Computer science, Parallel database, Management system - Published
- 2020
- Full Text
- View/download PDF
24. Data Equilibrium Method of Distributed Parallel Database Under High Load
- Author
-
Dingxiang Zhang
- Subjects
Human-Computer Interaction, Artificial Intelligence, Computer science, Parallel database, High load, Computer Vision and Pattern Recognition, Computational science - Abstract
The traditional method cannot make predictive judgments about the future load of the system, which slows convergence in the local updating process and wastes resources. Aiming at this problem, a data equalization method based on an ant colony optimization algorithm is proposed. When calculating the integrated load of the server cluster, two kinds of load information are mainly used: input indicators and server indexes. A formal description of the task scheduling problem under high load in a distributed parallel database is given and a mathematical model is established; virtual machines with independent and differing resource requirements are deployed across the servers to balance the system, which gives good global convergence and effectively controls system resource usage. Experiments showed that the proposed method avoids unnecessary migrations caused by instantaneous peaks, which reduces the overhead of the system.
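As a simplified stand-in for the balancing step (greedy least-loaded placement rather than the paper's ant colony optimization, which is named plainly here as swapped in), the following shows how VMs with differing resource demands might be spread across servers; all numbers are hypothetical.
```python
# Greedy least-loaded placement: each VM, described by its resource demand,
# is deployed onto the server with the lowest current predicted load.
import heapq

vm_demands = [8, 3, 5, 7, 2, 6, 4, 1]       # hypothetical load units
servers = [(0.0, s) for s in range(3)]       # (current load, server id)
heapq.heapify(servers)

placement = {}
for vm, demand in enumerate(vm_demands):
    load, sid = heapq.heappop(servers)       # least-loaded server first
    placement[vm] = sid
    heapq.heappush(servers, (load + demand, sid))

print(placement)                             # vm -> server
print(sorted(servers))                       # final per-server loads
```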
- Published
- 2019
- Full Text
- View/download PDF
25. The Study on Implementation of parallel database for Terminology of Korean Linguistics
- Author
-
Byung-sun Park
- Subjects
Computer science, Parallel database, Linguistics, Terminology - Published
- 2018
- Full Text
- View/download PDF
27. A Hybrid System of Hadoop and DBMS for Earthquake Precursor Application.
- Author
-
Tao Luo, Wei Yuan, Pan Deng, Yunquan Zhang, and Guoliang Chen
- Subjects
HYBRID systems, EARTHQUAKES, COMPARATIVE studies, WAREHOUSES, BIG data, SCALABILITY - Abstract
Compared with traditional data warehouse applications, big data analytics is huge and complex and requires massive performance and scalability. In this paper, we explore the feasibility and versatility of building a hybrid system that not only retains the analytical DBMS but can also handle the demands of rapidly exploding data applications. We propose a hybrid system prototype which takes the DBMS as the underlying storage and execution unit, and Hadoop as an index layer and a cache. Experiments show that our system meets the demand and will be appropriate for analogous big data analysis applications. [ABSTRACT FROM AUTHOR]
- Published
- 2013
28. Parallel Database
- Author
-
LIU, LING, editor and ÖZSU, M. TAMER, editor
- Published
- 2009
- Full Text
- View/download PDF
29. A high-performance spatial database based approach for pathology imaging algorithm evaluation.
- Author
-
Fusheng Wang, Jun Kong, Jingjing Gao, Lee A. D. Cooper, Tahsin Kurc, Zhengwen Zhou, Adler, David, Vergara-Niedermayr, Cristobal, Katigbak, Bryan, Brat, Daniel J., and Saltz, Joel H.
- Subjects
*ALGORITHMS, *PATHOLOGY, *SQL, *DATABASE management, *IMAGE analysis - Abstract
Background: Algorithm evaluation provides a means to characterize variability across image analysis algorithms, validate algorithms by comparison with human annotations, combine results from multiple algorithms for performance improvement, and facilitate algorithm sensitivity studies. The sizes of images and image analysis results in pathology image analysis pose significant challenges in algorithm evaluation. We present an efficient parallel spatial database approach to model, normalize, manage, and query large volumes of analytical image result data. This provides an efficient platform for algorithm evaluation. Our experiments with a set of brain tumor images demonstrate the application, scalability, and effectiveness of the platform. Context: The paper describes an approach and platform for evaluation of pathology image analysis algorithms. The platform facilitates algorithm evaluation through a high-performance database built on the Pathology Analytic Imaging Standards (PAIS) data model. Aims: (1) Develop a framework to support algorithm evaluation by modeling and managing analytical results and human annotations from pathology images; (2) Create a robust data normalization tool for converting, validating, and fixing spatial data from algorithm or human annotations; (3) Develop a set of queries to support data sampling and result comparisons; (4) Achieve high performance computation capacity via a parallel data management infrastructure, parallel data loading and spatial indexing optimizations in this infrastructure. Materials and Methods: We have considered two scenarios for algorithm evaluation: (1) algorithm comparison where multiple result sets from different methods are compared and consolidated; and (2) algorithm validation where algorithm results are compared with human annotations. We have developed a spatial normalization toolkit to validate and normalize spatial boundaries produced by image analysis algorithms or human annotations. The validated data were formatted based on the PAIS data model and loaded into a spatial database. To support efficient data loading, we have implemented a parallel data loading tool that takes advantage of multi-core CPUs to accelerate data injection. The spatial database manages both geometric shapes and image features or classifications, and enables spatial sampling, result comparison, and result aggregation through expressive structured query language (SQL) queries with spatial extensions. To provide scalable and efficient query support, we have employed a shared nothing parallel database architecture, which distributes data homogenously across multiple database partitions to take advantage of parallel computation power and implements spatial indexing to achieve high I/O throughput. Results: Our work proposes a high performance, parallel spatial database platform for algorithm validation and comparison. This platform was evaluated by storing, managing, and comparing analysis results from a set of brain tumor whole slide images. The tools we develop are open source and available to download. Conclusions: Pathology image algorithm validation and comparison are essential to iterative algorithm development and refinement. One critical component is the support for queries involving spatial predicates and comparisons. In our work, we develop an efficient data model and parallel database approach to model, normalize, manage and query large volumes of analytical image result data. 
Our experiments demonstrate that the data partitioning strategy and the grid-based indexing result in good data distribution across database nodes and reduce I/O overhead in spatial join queries through parallel retrieval of relevant data and quick subsetting of datasets. The set of tools in the framework provide a full pipeline to normalize, load, manage and query analytical results for algorithm evaluation. [ABSTRACT FROM AUTHOR]
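A minimal sketch of the grid-based declustering idea referenced in the results above, as we read it (grid size, spatial domain, and partition count are made up): each object is indexed by the grid cell containing its centroid, and cells are spread across database partitions so spatial queries can fetch relevant data in parallel.
```python
# Grid-based spatial declustering: map each object's centroid to a fixed
# grid cell, then spread cells across partitions for parallel I/O.
GRID, PARTITIONS = 16, 4

def cell_of(x, y, w=1000.0, h=1000.0):
    return int(x / w * GRID), int(y / h * GRID)

def partition_of(cell):
    cx, cy = cell
    return (cx * GRID + cy) % PARTITIONS  # round-robin over cells

objects = [(37.5, 812.0), (500.0, 500.0), (512.0, 498.0), (999.0, 1.0)]
for i, (x, y) in enumerate(objects):
    c = cell_of(x, y)
    print(f"object {i}: cell={c} partition={partition_of(c)}")
```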
- Published
- 2013
- Full Text
- View/download PDF
30. MIN-entropy: A New Signature File Declustering Algorithm for Intra-query Parallelism.
- Author
-
Byoung-Mo IM, Myoung Ho Kim, and Jae Soo Yoo
- Subjects
ENTROPY, PARALLELISM (Linguistics), ALGORITHMS, HAMMING distance, INFORMATION theory - Published
- 1997
31. Cluster based parallel database management system for data intensive computing.
- Author
-
Li, Jianzhong and Zhang, Wei
- Abstract
This paper describes a computer-cluster based parallel database management system (DBMS), InfiniteDB, developed by the authors. InfiniteDB aims to efficiently support data-intensive computing in response to the rapid growth in database size and the need for high-performance analysis of massive databases. It can be efficiently executed in computing systems composed of thousands of computers, such as cloud computing systems. It supports the parallelisms of intra-query, inter-query, intra-operation, inter-operation and pipelining. It provides effective strategies for managing massive databases, including multiple data declustering methods, declustering-aware algorithms for relational and other database operations, and an adaptive query optimization method. It also provides the functions of parallel data warehousing and data mining, a coordinator-wrapper mechanism to support the integration of heterogeneous information resources on the Internet, and fault-tolerant and resilient infrastructures. It has been used in many applications and has proved quite effective for data-intensive computing. [ABSTRACT FROM AUTHOR]
- Published
- 2009
- Full Text
- View/download PDF
32. Dependable configurations and recovery for the Fat-Btree parallel directory structure.
- Author
-
Miyazaki, Jun and Yokota, Haruo
- Subjects
*CONFIGURATIONS (Geometry), *DATABASES, *COMPUTER files, *PARALLEL computers, *COMPUTERS, *PROBABILITY theory, *DATA protection, *ELECTRONIC data processing, *COMPUTER system conversion, *COMPUTER security - Abstract
In this paper, we will discuss a highly dependable system configuration method and recovery method for our proposed Fat-Btree structure, which is a directory structure for shared-nothing parallel computers. The goals are to enhance the service of operation during failure, to minimize the probability of data loss, and to enhance availability by combining physiological logging, logical logging, disk mirroring, staggered allocation, etc. Various system configurations formed by the combination of these methods will be quantitatively evaluated in this paper. As a result, it will be shown that a combination of physiological logging and disk mirroring is most appropriate when hardware cost can be ignored, and that a combination of global logical logging and physiological logging is most appropriate when hardware cost is considered. © 2006 Wiley Periodicals, Inc. Electron Comm Jpn Pt 3, 89(12): 42–58, 2006; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/ecjc.20288 [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
33. Protein database search of hybrid alignment algorithm based on GPU parallel acceleration
- Author
-
Jincai Wang, Zhanxiu Cai, Jianping Ma, Bo Lian, and Wei Zhou
- Subjects
Smith–Waterman algorithm, Speedup, Computer science, Parallel database, Sequence alignment, Needleman–Wunsch algorithm, Parallel computing, Theoretical Computer Science, CUDA, Hardware and Architecture, Graphics, Algorithm, Software, Information Systems - Abstract
In biological research, alignment of protein sequences by computer is often needed to find similarities between them. Although results can be computed in a reasonable time for the alignment of two sequences, it is still very CPU time-consuming when solving massive sequence-alignment problems such as protein database search. In this paper, an optimized protein database search method is presented and tested with the Swiss-Prot database on graphics processing unit (GPU) devices, and the power of CPU multi-threaded computing is also employed to realize GPU-based heterogeneous parallelism. In our proposed method, a hybrid alignment approach is implemented by combining the Smith–Waterman local alignment algorithm with the Needleman–Wunsch global alignment algorithm, and parallel database search is realized with the compute unified device architecture (CUDA) parallel computing framework. In the experiment, the algorithm is tested on lower-end and higher-end personal computers equipped with GeForce GTX 750 Ti and GeForce GTX 1070 graphics cards, respectively. The results show that the parallel method proposed in this paper can achieve a speedup of up to 138.86 times over the serial counterpart, improving the efficiency and convenience of protein database search significantly.
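For reference, a plain serial implementation of Smith–Waterman local alignment scoring, the algorithm the paper accelerates on GPUs; the scoring parameters here are simple illustrative choices, not the paper's substitution matrix.
```python
# Smith-Waterman local alignment (scoring only): H[i][j] holds the best
# score of any local alignment ending at a[i-1], b[j-1]; negatives reset
# to zero, which is what makes the alignment local.
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i-1][j-1] + (match if a[i-1] == b[j-1] else mismatch)
            H[i][j] = max(0, diag, H[i-1][j] + gap, H[i][j-1] + gap)
            best = max(best, H[i][j])
    return best  # score of the highest-scoring local alignment

print(smith_waterman("HEAGAWGHEE", "PAWHEAE"))
```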
- Published
- 2017
- Full Text
- View/download PDF
34. Structured Parallel Efficient Execution Database Management System Over Enormous Dataset with MapReduce using Matlab
- Author
-
P. Suresh Varma and Uma Mahesh Kumar Gandham
- Subjects
Scheme (programming language), Multidisciplinary, Database, Computer science, Parallel database, Big data, Scalability, Management system, Dependability, Implementation - Abstract
Objective: MapReduce is a programming model and an associated implementation for processing and generating huge data sets. The objective of the present paper is to retrieve data from an enormous dataset in an efficient manner with MapReduce. Methodology: The present paper uses a structured, parallel, efficient-execution Database Management System, i.e., a Parallel Database Management System (PDBMS), implemented in Matlab. This paper uses the broad concept of the paradigms rather than the exact implementations of MapReduce and Parallel DBMS. Such enormous data analysis on large clusters presents new opportunities and challenges for developing a highly scalable and efficient distributed computation system that is easy to program and that supports multi-complex-system optimization to maximize performance and dependability. To overcome this problem, we realize a new algorithm, the Structured Parallel Efficient Execution Database Management (SPEED'MS) System, over an enormous dataset with MapReduce. Findings: An optimizer is responsible for converting scripts into efficient execution plans for the distributed computation engine. SPEED'MS is being used daily for a variety of data analysis and data mining applications powering Bing and other online services. The algorithm has been tested with Matlab. Applications: The MapReduce concept has potential applications such as clinical big data analysis, bioinformatics, and distributed programming.
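A minimal in-process word-count sketch of the MapReduce programming model referenced above; it illustrates only the map, shuffle, and reduce phases and is unrelated to the paper's Matlab implementation.
```python
# Minimal MapReduce (word count): map emits (word, 1) pairs, sorting plays
# the role of the shuffle, and reduce sums the values per key.
from itertools import groupby
from operator import itemgetter

def map_fn(record):
    for word in record.split():
        yield word, 1

def reduce_fn(key, values):
    return key, sum(values)

def mapreduce(records):
    pairs = sorted(kv for rec in records for kv in map_fn(rec))  # map+shuffle
    return [reduce_fn(k, [v for _, v in grp])
            for k, grp in groupby(pairs, key=itemgetter(0))]     # reduce

print(mapreduce(["big data big clusters", "data data"]))
# [('big', 2), ('clusters', 1), ('data', 3)]
```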
- Published
- 2017
- Full Text
- View/download PDF
35. PhoeniQ: Failure-Tolerant Query Processing in Multi-node Environments
- Author
-
Masaru Kitsuregawa, Kazuo Goda, Yutaro Bessho, and Yuto Hayamizu
- Subjects
Computer science, Distributed computing, Pipeline (computing), Node (networking), Parallel database, Degree of parallelism, Parallel processing (DSP implementation), State (computer science) - Abstract
Parallel processing is a flagship approach for answering analytical queries on large-scale databases. As the database scale increases, a larger number of processing nodes are likely to be incorporated to increase the degree of parallelism. However, this solution results in an increased probability of node failure. If such a failure happens during query processing, the processing often has to restart from scratch. This temporal cost may not be acceptable to the user. In this paper, we propose PhoeniQ, a fault-tolerant query processing mechanism for analytical parallel database systems. PhoeniQ takes a package-level checkpoint for every operator pipeline and replicates the output of stateful operators among different processing nodes. If a single processing node fails during processing, another node can resume the execution state of the failed node so that the query can continue to run. This paper presents our intensive experiments based on our prototype, which demonstrate that PhoeniQ can continue query processing in the face of node failures at significantly smaller cost than the conventional approach.
- Published
- 2020
- Full Text
- View/download PDF
36. POTENTIAL: A highly adaptive core of parallel database system.
- Author
-
Wen, Jirong, Chen, Hong, and Wang, Shan
- Abstract
POTENTIAL is a virtual database machine based on general computing platforms, especially parallel computing platforms. It provides a complete solution to high-performance database systems via a 'virtual processor + virtual data bus + virtual memory' architecture. Virtual processors manage all CPU resources in the system, on which various operations run. The virtual data bus is responsible for managing data transmission between associated operations, forming the hinges of the entire system. Virtual memory provides efficient data storage and buffering mechanisms that conform to data reference behaviors in database systems. The architecture of POTENTIAL is very clear and has many good features, including high efficiency, high scalability, high extensibility, high portability, etc. [ABSTRACT FROM AUTHOR]
- Published
- 2000
- Full Text
- View/download PDF
37. Distributed and Parallel Database Design
- Author
-
Patrick Valduriez and M. Tamer Özsu
- Subjects
Theoretical computer science, Computer science, Schema (psychology), Parallel database, Fragmentation (computing), Database design, Conceptual schema - Abstract
A typical database design is a process which starts from a set of requirements and results in the definition of a schema that defines the set of relations. The distribution design starts from this global conceptual schema (GCS) and follows two tasks: partitioning (fragmentation) and allocation.
- Published
- 2019
- Full Text
- View/download PDF
38. Light Database Encryption Design Utilizing Multicore Processors for Mobile Devices
- Author
-
M. Hafiz Yusoff, Khairulmizam Samsudin, R. Badlishah Ahmad, and Mohammad Ahmed Alomari
- Subjects
Multi-core processor, Computer science, Parallel database, Database encryption, Encryption, Relational database management system, Disk encryption, Embedded system, Storage security, Mobile device - Abstract
The confidentiality of data stored in embedded and handheld devices has become an urgent necessity more than ever before. Encryption of sensitive data is a well-known technique to preserve its confidentiality; however, it comes with certain costs that can heavily impact the device's processing resources. Utilizing the multicore processors with which current embedded devices are equipped has brought a new era of enhancing data confidentiality while maintaining suitable device performance. Encrypting the complete storage area, also known as Full Disk Encryption (FDE), can still be challenging, especially with newly emerging massive storage systems. Alternatively, since the most sensitive user data reside inside persistent databases, it is more efficient to focus on securing SQLite databases through encryption, SQLite being the most common RDBMS in handheld and embedded systems. This paper addresses the problem of ensuring data protection in embedded and mobile devices while maintaining suitable device performance by mitigating the impact of encryption. We present a proposed design for a parallel database encryption system, called SQLite-XTS. The proposed system encrypts data stored in databases transparently on-the-fly without the need for any user intervention. To maintain proper device performance, the system takes advantage of the commodity multicore processors available in most embedded and mobile devices.
- Published
- 2019
- Full Text
- View/download PDF
39. Resource bricolage and resource selection for parallel database systems
- Author
-
Rimma V. Nehme, Jeffrey F. Naughton, and Jiexing Li
- Subjects
Optimization problem, Linear programming, Computer science, Process (engineering), Distributed computing, Parallel database, Workload, Database tuning, Resource (project management), Hardware and Architecture, Performance prediction, Information Systems - Abstract
Running parallel database systems in an environment with heterogeneous resources has become increasingly common, due to cluster evolution and increasing interest in moving applications into public clouds. Performance differences among machines in the same cluster pose new challenges for parallel database systems. First, for database systems running in a heterogeneous cluster, the default uniform data partitioning strategy may overload some of the slow machines, while at the same time it may underutilize the more powerful machines. Since the processing time of a parallel query is determined by the slowest machine, such an allocation strategy may result in a significant query performance degradation. Second, since machines might have varying resources or performance, different choices of machines may lead to different costs or performance for executing the same workload. By carefully selecting the most suitable machines for running a workload, we may achieve better performance with the same budget, or we may meet the same performance requirements with a lower cost. We address these challenges by introducing techniques we call resource bricolage and resource selection that improve database performance in heterogeneous environments. Our approaches quantify the performance differences among machines with various resources as they process workloads with diverse resource requirements. For the purpose of better resource utilization, we formalize the problem of minimizing workload execution time and view it as an optimization problem, and then, we employ linear programming to obtain a recommended data partitioning scheme. For the purpose of better resource selection, we formalize two problems: One minimizes the total workload execution time with a given budget, and the other minimizes the total budget with a given performance target. We then employ different mixed-integer programs to search for the optimal resource selection decisions. We verify the effectiveness of both resource bricolage and resource selection techniques with an extensive experimental study.
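Our reading of the core linear program behind the resource bricolage idea can be sketched as follows (the per-machine processing times are hypothetical measurements; requires SciPy): choose per-machine data fractions so the finish time of the slowest machine is minimized.
```python
# LP for heterogeneous data partitioning: variables are data fractions x_i
# per machine plus the makespan T; minimize T subject to x_i * t_i <= T
# and sum(x_i) = 1.
from scipy.optimize import linprog

t = [1.0, 2.0, 4.0]                 # seconds per unit of data, per machine
n = len(t)
c = [0.0] * n + [1.0]               # objective: minimize T (last variable)
A_ub = [[(t[i] if j == i else 0.0) for j in range(n)] + [-1.0]
        for i in range(n)]          # x_i * t_i - T <= 0
b_ub = [0.0] * n
A_eq = [[1.0] * n + [0.0]]          # fractions sum to 1
b_eq = [1.0]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * (n + 1))
print(res.x[:n], res.x[n])          # optimum: x_i proportional to 1/t_i, T = 4/7
```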
- Published
- 2016
- Full Text
- View/download PDF
40. Resource Bricolage for Parallel DBMSs on Heterogeneous Clusters
- Author
-
Jeffrey F. Naughton, Rimma V. Nehme, and Jiexing Li
- Subjects
Scheme (programming language), Resource (project management), Optimization problem, Process (engineering), Computer science, Parallel database, Distributed computing, Software, Database tuning, Information Systems - Abstract
Running parallel database systems in an environment with heterogeneous resources has become increasingly common, due to cluster evolution and increasing interest in moving applications into public clouds or shared infrastructures. For database systems running in a heterogeneous cluster, the default uniform data partitioning strategy may overload some of the slow machines while at the same time it may underutilize the more powerful machines. Since the processing time of a parallel query is determined by the slowest machine, such an allocation strategy may result in a significant query performance degradation. We take a first step to address this problem by introducing a technique we call resource bricolage that improves database performance in heterogeneous environments. Our approach quantifies the performance differences among machines with various resources as they process workloads with diverse resource requirements. We formalize the problem of minimizing workload execution time and view it as an optimization problem, and then we employ linear programming to obtain a recommended data partitioning scheme. We verify the effectiveness of our technique with an extensive experimental study on a commercial database system.
- Published
- 2016
- Full Text
- View/download PDF
41. Resource Management Method of Distributed Parallel Database Based on Directed Graph
- Author
-
Jiang Shusong, Shi Yingjie, Huayun Zhang, Xiang Wang, and Zhou Yuchao
- Subjects
Network congestion, History, Resource (project management), Computer science, Search algorithm, Distributed computing, Node (networking), Parallel database, Resource management, Cache, Directed graph, Computer Science Applications, Education - Abstract
A distributed parallel database resource management method based on a directed graph is proposed. By using a content distributor based on distributed unstructured P2P network association, the high performance and stability of the distributed parallel database system in a dynamically changing environment are guaranteed. The problems of network congestion caused by too many redundant messages and of the "barrel effect" caused by resource imbalance due to differential configuration are solved through a resource search algorithm based on directed-graph lookahead, in which the query node caches the resource information of its two-level neighbor nodes. By adopting the Linux cgroups resource management mechanism and fully considering a multi-tenant, multi-factor resource scheduling strategy, the method reduces resource fragmentation and better addresses distributed parallel database storage and hot spot processing.
- Published
- 2020
- Full Text
- View/download PDF
42. Apara: Workload-Aware Data Partition and Replication for Parallel Databases
- Author
-
Chunxi Zhang, Rong Zhang, Yuming Li, Aoying Zhou, and Xiaolei Zhang
- Subjects
Distributed database, Computer science, Distributed computing, Parallel database, Workload, Partition (database), Computer data storage, Data partitioning - Abstract
Data partition and replication mechanisms directly determine query execution patterns in parallel database systems, and thus have a great impact on system performance. Recently, there have been some workload-aware data storage techniques, but they suffer from narrow support for complex workloads or large storage requirements. In order to support complex analytical workloads over massive distributed database systems, we design and implement a workload-aware data partition and replication tool called Apara. We design two heuristic algorithms and define two cost models for effective data partition calculation and efficient replication usage. We run a set of experiments to compare Apara with the other representative work. The results show that Apara consistently outperforms the primary solutions on TPC-H workloads.
- Published
- 2019
- Full Text
- View/download PDF
43. Simulation of Performance Analysis of MongoDB, PIG, HIVE Storage, Map Reduce, Spark and Yarn
- Author
-
Monika Monu and Sat Pal
- Subjects
SQL, Database, Computer science, Parallel database, Big data, Process (computing), Volume (computing), Yarn, Information extraction, Spark (mathematics) - Abstract
Nowadays, information varies widely in size or volume, complexity, variety, rate of growth, and veracity. Companies have reached a stage where traditional techniques and analytical tools fail to handle the data. Big Data keeps growing rapidly, and its size cannot be pinned down. Hadoop is a framework capable of evaluating big data; it is applied to process big data sets across numerous clusters. Tools such as Hadoop and MapReduce can manage this huge amount of data, as can Apache Hive and NoSQL systems. Information extraction is considered essential because of the rapid growth of unstructured text data; it is computationally intensive, and MapReduce and parallel database management systems are applied to evaluate this huge volume of information. This paper introduces big data tools such as Apache Hive and Apache Pig; a comparison of Hive and Pig is made based on several parameters, which shows that Hive performs better than Pig. The major difference between Hadoop MapReduce and Spark lies in the way they process data: Spark can do it in memory, whereas Hadoop MapReduce needs to read from and write to disk. Thus, the processing speeds differ, and Spark can be up to 100 times faster than MapReduce.
- Published
- 2019
- Full Text
- View/download PDF
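The in-memory point can be made concrete. A minimal PySpark sketch (assuming a local Spark installation; the input path is illustrative): caching keeps the tokenized data in memory, so the second action is served without re-reading the file, which is the behavior contrasted with MapReduce's per-stage disk I/O.

    # Minimal PySpark sketch: cache() keeps data in memory between actions,
    # unlike Hadoop MapReduce, which writes intermediate results to disk.
    from pyspark import SparkContext

    sc = SparkContext("local[*]", "cache-demo")
    words = (sc.textFile("input.txt")            # illustrative path
               .flatMap(lambda line: line.split())
               .cache())                         # keep in memory after first use

    total = words.count()                        # first action: reads the file
    top = (words.map(lambda w: (w, 1))
                .reduceByKey(lambda a, b: a + b)
                .takeOrdered(5, key=lambda kv: -kv[1]))  # served from memory
    print(total, top)
    sc.stop()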
44. Similarity query support in big data management systems
- Author
-
Michael J. Carey, Alexander Behm, Vinayak Borkar, Wenhai Li, Rares Vernica, Chen Li, Inci Cetindil, and Taewoo Kim
- Subjects
Information retrieval, Computer science, Parallel database, Joins, Query language, Query optimization, Operator (computer programming), Hardware and Architecture, Analytics, Software, Record linkage, Information Systems
- Abstract
Similarity query processing is becoming increasingly important in many applications such as data cleaning, record linkage, Web search, and document analytics. In this paper, we study how to provide end-to-end similarity query support natively in a parallel database system. We discuss how to express a similarity predicate in its query language, how to build indexes, how to answer similarity queries (selections and joins) efficiently in the runtime engine, possibly using indexes, and how to optimize similarity queries. One particular challenge is how to incorporate existing similarity join algorithms, which often require a series of steps to achieve high efficiency, including collecting token frequencies, finding matching record id pairs, and reassembling result records based on id pairs. We present a novel approach that uses existing runtime operators to implement such complex join algorithms without reinventing the wheel; doing so positions the system to automatically benefit from future improvements to those operators. The approach includes a technique to transform a similarity join plan into an efficient operator-based physical plan during query optimization by using a template expressed largely in the system’s user-level query language; this technique greatly simplifies the specification of such a transformation rule. We use Apache AsterixDB, a parallel Big Data management system, to illustrate and validate our techniques. We conduct an experimental study using several large, real datasets on a parallel computing cluster to assess the similarity query support. We also include experiments involving three other parallel systems and report the efficacy and performance results. A toy sketch of the three-step join pattern follows this entry.
- Published
- 2020
- Full Text
- View/download PDF
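The three-step join structure described in the entry above (token-frequency collection, id-pair matching, record reassembly) can be sketched generically. This is a textbook prefix-filtered set-similarity join on toy data, not AsterixDB's operator plan:

    # Hedged sketch of the similarity-join pipeline the abstract outlines:
    # (1) collect global token frequencies, (2) find candidate record-id
    # pairs via a prefix-filter inverted index, (3) verify and reassemble
    # result records from the id pairs. Toy data; illustrative only.
    import math
    from collections import Counter, defaultdict
    from itertools import combinations

    THRESHOLD = 0.5                                    # Jaccard threshold
    records = {1: "parallel database systems",
               2: "parallel database system",
               3: "graph partitioning methods"}
    tokens = {rid: set(text.split()) for rid, text in records.items()}

    freq = Counter(t for ts in tokens.values() for t in ts)       # stage 1

    index = defaultdict(set)                                      # stage 2
    for rid, ts in tokens.items():
        ordered = sorted(ts, key=lambda t: (freq[t], t))          # rarest first
        k = len(ts) - math.ceil(THRESHOLD * len(ts)) + 1          # prefix length
        for t in ordered[:k]:
            index[t].add(rid)

    def jaccard(a, b):
        return len(a & b) / len(a | b)

    pairs = {p for ids in index.values()
             for p in combinations(sorted(ids), 2)
             if jaccard(tokens[p[0]], tokens[p[1]]) >= THRESHOLD}

    print([(records[a], records[b]) for a, b in pairs])           # stage 3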
45. Dynamic Data Reallocation for Skew Management in Shared-Nothing Parallel Databases.
- Author
-
Helal, Abdelsalam, Yuan, David, and EL-Rewini, Hesham
- Abstract
The shared-nothing parallel database architecture is gaining wide popularity due to its scalability and increased data availability. However, in order to efficiently utilize parallelism in such an architecture, independent data sets must be assigned to different processing nodes. This, of course, can initially be achieved by employing a careful partitioning scheme that allocates disjoint data sets to different processors. However, variations in the data access pattern may render some processors overloaded while others are underloaded. This skewness in data access decreases the effective parallelism and eventually leads to overall performance degradation. A number of solutions have been proposed to periodically perform data re-allocation to remove the skewness in data access. Most of the proposed solutions perform either static re-allocation, which requires the system to be taken off-line, or dynamic but non-transactional re-allocation. In this paper, we introduce a dynamic and transactional re-allocation scheme based on the work on disk cooling in shared-memory architectures by Scheuermann et al. The proposed scheme enhances the effective parallelism in the system regardless of the variations in the pattern of access. The proposed scheme detects access skew as it occurs and re-allocates data partitions to underloaded processing elements on the fly. Only the block being moved becomes unavailable. In addition, mutual consistency among transactions concurrent with the re-allocation event is preserved. The proposed scheme also uses replication as an additional cooling mechanism to help distribute access load over multiple replicas. We conducted a series of simulation experiments to study the behavior of shared-nothing parallel database systems with and without the proposed dynamic re-allocation scheme. We also experimented with several replication strategies to measure their impact on system performance. Finally, we studied the effect of using different concurrency control strategies on the efficiency of dynamic re-allocation. [ABSTRACT FROM AUTHOR] A toy illustration of the cooling step follows this entry.
- Published
- 1997
- Full Text
- View/download PDF
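A toy version of the cooling step (assumed data structures; locking, replication, and the transactional protocol are deliberately omitted):

    # Hedged sketch of skew-triggered reallocation in the disk-cooling
    # style: detect the hottest node and move its hottest partition to the
    # coolest node, one block at a time. Illustrative only.

    def node_load(parts, temps):
        return sum(temps[p] for p in parts)

    def rebalance_step(placement, temps):
        """placement: {node: set(partitions)}, temps: {partition: access rate}."""
        hot = max(placement, key=lambda n: node_load(placement[n], temps))
        cold = min(placement, key=lambda n: node_load(placement[n], temps))
        if hot == cold or not placement[hot]:
            return None
        victim = max(placement[hot], key=lambda p: temps[p])  # hottest partition
        placement[hot].remove(victim)       # in the real scheme, only this
        placement[cold].add(victim)         # block is briefly unavailable
        return victim, hot, cold

    placement = {"n1": {"p1", "p2"}, "n2": {"p3"}, "n3": set()}
    temps = {"p1": 90, "p2": 10, "p3": 20}
    print(rebalance_step(placement, temps))   # ('p1', 'n1', 'n3')
    print(placement)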
46. Efficiently extendible mappings for balanced data distribution.
- Author
-
Choy, D., Fagin, R., and Stockmeyer, L.
- Abstract
In data storage applications, a large collection of consecutively numbered data 'buckets' is often mapped to a relatively small collection of consecutively numbered storage 'bins.' For example, in parallel database applications, buckets correspond to hash buckets of data and bins correspond to database nodes. In disk array applications, buckets correspond to logical tracks and bins correspond to physical disks in an array. Measures of the 'goodness' of a mapping method include balance, relocation cost, computation time, and storage requirements. One contribution of this paper is to give a new mapping method, the Interval-Round-Robin (IRR) method. The IRR method has optimal balance and relocation cost, and its time complexity and storage requirements compare favorably with known methods. Specifically, if m is the number of times that the number of bins and/or buckets has increased, then the time complexity is O(log m) and the storage is O(m). Another contribution of the paper is to identify the concept of a history-independent mapping, meaning informally that the mapping does not 'remember' the past history of expansions to the number of buckets and bins, but only the current number of buckets and bins. Thus, such mappings require very little information to be stored. Assuming that balance and relocation are optimal, we prove that history-independent mappings are possible if the number of buckets is fixed (so only the number of bins can increase), but not possible if the number of bins and buckets can both increase. [ABSTRACT FROM AUTHOR] A naive minimal-relocation illustration follows this entry.
- Published
- 1996
- Full Text
- View/download PDF
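The balance/relocation trade-off is easy to demonstrate. The sketch below is not the IRR method; it is a naive scheme showing that adding one bin can restore optimal balance while relocating only the minimum number of buckets:

    # Hedged illustration of the two 'goodness' measures: balance (bin
    # sizes differ by at most one) and relocation cost (buckets moved on
    # expansion). A naive minimal-relocation scheme, not the IRR method.

    def add_bin(assignment, num_bins):
        """assignment: {bucket: bin}. Rebalance onto num_bins + 1 bins by
        moving only excess buckets from overfull bins to the new bin."""
        new_bin = num_bins
        target = len(assignment) // (num_bins + 1)   # floor of ideal bin size
        sizes = {b: 0 for b in range(num_bins + 1)}
        for b in assignment.values():
            sizes[b] += 1
        moved = 0
        for bucket, b in sorted(assignment.items()):
            if sizes[new_bin] >= target:
                break
            if sizes[b] > target:                    # overfull: donate a bucket
                assignment[bucket] = new_bin
                sizes[b] -= 1
                sizes[new_bin] += 1
                moved += 1
        return moved

    assignment = {i: i % 3 for i in range(12)}   # 12 buckets on 3 bins
    print(add_bin(assignment, 3))                # 3 buckets relocated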
47. CARP: Cost Effective Load-Balancing Approach for Range-Partitioned Data
- Author
-
Khaled-Walid Hidouci, Khadidja Midoun, and Djahida Belayadi
- Subjects
Skewed data, Mathematical optimization, Computer science, Parallel database, Skew, Response time, Fuzzy control system, Load balancing (computing), Partition (database), Tuple
- Abstract
One of the important issues in range-partitioning schemes is data skew. The tuple distribution across nodes may be skewed (some nodes hold many tuples, while others hold few). Processing skewed data not only slows down response time but also creates hot nodes. In such a situation, data may need to be moved from the most-loaded partitions to the least-loaded ones in order to meet storage-balancing requirements. Early work focused on achieving load balancing; today's work focuses on reducing its cost, which largely means reducing the cost of maintaining partition statistics. In this context, we propose an improvement to one of the best load-balancing schemes, that of Ganesan et al., which reduces the cost of maintaining load-balancing statistics. We introduce the concept of a fuzzy system image: both nodes and clients have only approximate information about the load distribution, yet they can locate any data with almost the same efficiency as with exact partition statistics. Furthermore, maintaining the load-distribution statistics requires no additional messages, as opposed to the efficient solutions in the state of the art, which require at least O(log n) messages. A sketch of the approximate-lookup idea follows this entry.
- Published
- 2018
- Full Text
- View/download PDF
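A minimal sketch of looking up a key with an approximate partition image (the boundary values and forwarding rule are assumptions, not the authors' protocol):

    # Hedged sketch of locating a key with approximate (possibly stale)
    # partition boundaries: binary-search the client's fuzzy image, then
    # let nodes forward to a neighbor if the guess is off. Illustrative.
    import bisect

    true_bounds = [100, 250, 400]    # node i serves keys in [bounds[i-1], bounds[i])
    fuzzy_bounds = [110, 240, 400]   # client's slightly stale image of the bounds

    def locate(key):
        guess = bisect.bisect_right(fuzzy_bounds, key)    # client-side guess
        hops = 0
        while True:                  # nodes correct a near-miss by forwarding
            lo = true_bounds[guess - 1] if guess > 0 else float("-inf")
            hi = true_bounds[guess] if guess < len(true_bounds) else float("inf")
            if lo <= key < hi:
                return guess, hops
            guess += 1 if key >= hi else -1
            hops += 1

    print(locate(105))   # fuzzy image points at node 0; one hop corrects to node 1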
48. Modeling parallel processing of databases on the central processor Intel Xeon Phi KNL
- Author
-
A.I. Rekachinsky, R.A. Chulkevich, and Pavel S. Kostenetskiy
- Subjects
Hardware architecture, Coprocessor, Database, Computer science, Parallel database, Multiprocessing, Parallel processing, Management system, Performance improvement, Xeon Phi
- Abstract
The development of parallel database management systems is an urgent problem due to the rapid growth of information volumes. The basic principles of DBMS performance improvement today include the use of multiprocessor systems [8]. At the same time, acceleration can be achieved with new hardware architectures, such as hybrid clusters with manycore coprocessors. Deploying such architectures is limited by the high cost of the hardware and its configuration. Therefore, developing models that can determine key characteristics and compare the runtimes of different database queries, without using real hardware and without accounting for exact execution details, is a highly topical problem. This paper describes the development of a mathematical model that explores the effectiveness of a new manycore accelerator with the Intel Xeon Phi Knights Landing hardware architecture for parallel database processing. An illustrative Amdahl-style estimate follows this entry.
- Published
- 2018
- Full Text
- View/download PDF
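The general shape of such analytic models can be illustrated with an Amdahl-style estimate; every constant below is an assumption for illustration, not a value from the paper:

    # Hedged sketch of an analytic runtime model: Amdahl-style split of a
    # query's work into serial and parallel fractions. All constants are
    # illustrative assumptions, not the paper's model.

    def estimated_runtime(tuples, per_tuple_us, cores, serial_frac=0.05,
                          efficiency=0.8):
        work = tuples * per_tuple_us / 1e6               # total CPU-seconds
        serial = work * serial_frac
        parallel = work * (1 - serial_frac) / (cores * efficiency)
        return serial + parallel

    # Hypothetical comparison: 16 fast cores vs. 64 manycore-accelerator
    # cores that are assumed 3x slower per tuple.
    xeon = estimated_runtime(1e8, 0.5, cores=16)
    knl = estimated_runtime(1e8, 1.5, cores=64)
    print(f"16 cores: {xeon:.2f}s   64 slower cores: {knl:.2f}s")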
49. A Graph-Based Database Partitioning Method for Parallel OLAP Query Processing
- Author
-
Donghyoung Han, Min-Soo Kim, and Yoon-Min Nam
- Subjects
Database, Computer science, Online analytical processing, Parallel database, Graph-based, Data redundancy, Scalability, Graph (abstract data type)
- Abstract
As the amount of data to process increases, a scalable and efficient horizontal database partitioning method becomes more important for OLAP query processing on parallel database platforms. Existing partitioning methods have major drawbacks: a large amount of data redundancy and, despite that redundancy, frequent failure to support join processing without shuffles. We show that these drawbacks arise from their tree-based partitioning schemes, and we propose a novel graph-based database partitioning method called GPT that improves query performance with lower data redundancy. Through extensive experiments using three benchmarks, we show that GPT significantly outperforms the state-of-the-art method in terms of both storage overhead and query performance. A sketch of the graph-based intuition follows this entry.
- Published
- 2018
- Full Text
- View/download PDF
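The graph-based intuition can be sketched as follows: tables become vertices, join frequencies become edge weights, and a greedy maximum spanning forest picks the table pairs to co-partition. This illustrates the general idea, not the GPT algorithm:

    # Hedged sketch of graph-based co-partitioning: tables are vertices,
    # join frequency is the edge weight; greedily keep heavy edges that
    # connect new components (maximum spanning forest via union-find).
    # Illustrative only -- not the GPT algorithm from the paper.

    def co_partition(edges):
        """edges: list of (weight, table_a, table_b). Returns kept pairs."""
        parent = {}
        def find(x):
            parent.setdefault(x, x)
            while parent[x] != x:
                parent[x] = parent[parent[x]]   # path halving
                x = parent[x]
            return x
        kept = []
        for w, a, b in sorted(edges, reverse=True):   # heaviest joins first
            ra, rb = find(a), find(b)
            if ra != rb:                              # avoid conflicting keys
                parent[ra] = rb
                kept.append((a, b, w))
        return kept

    joins = [(120, "orders", "customer"), (80, "orders", "lineitem"),
             (30, "customer", "lineitem"), (10, "part", "lineitem")]
    print(co_partition(joins))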
50. Banian: A cross-platform interactive query system for structured big data
- Author
-
Tao Xu, Dongsheng Wang, and Guodong Liu
- Subjects
SQL, Multidisciplinary, Database, Relational database, View, Computer science, Parallel database, Query optimization, Scalability, Query by Example, Sargable
- Abstract
The rapid growth of structured data has presented new technological challenges in the research fields of big data and relational databases. In this paper, we present Banian, an efficient system for managing and analyzing PB-level structured data. Banian overcomes the storage-structure limitations of relational databases and effectively integrates interactive querying with large-scale storage management. It provides a uniform query interface for cross-platform datasets and thus offers favorable compatibility and scalability. Banian's architecture comprises three layers: (1) a storage layer using HDFS for the distributed storage of massive data; (2) a scheduling and execution layer employing the splitting and scheduling technology of parallel databases; and (3) an application layer providing a cross-platform query interface and supporting standard SQL. We evaluate Banian using PB-level Internet data and the TPC-H benchmark. The results show that, compared with Hive, Banian improves query performance by up to 30 times and achieves better scalability and concurrency. A toy illustration of the three-layer query path follows this entry.
- Published
- 2015
- Full Text
- View/download PDF
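The three-layer split-and-schedule path can be caricatured in a few lines (a toy in-memory list stands in for HDFS; all names are assumptions, not Banian's code):

    # Hedged sketch of a three-layer query path: application layer (SQL
    # in), scheduling layer (fragmenting the scan), execution workers
    # (concurrent fragment scans merged into one result). Illustrative.
    from concurrent.futures import ThreadPoolExecutor

    STORAGE = list(range(1_000_000))         # stands in for HDFS blocks

    def fragments(data, n):                  # scheduling layer: split the scan
        step = (len(data) + n - 1) // n
        return [data[i:i + step] for i in range(0, len(data), step)]

    def run_fragment(rows):                  # execution layer: one worker
        return sum(1 for r in rows if r % 7 == 0)

    def query(workers=4):                    # application layer merges results
        with ThreadPoolExecutor(max_workers=workers) as pool:
            partials = pool.map(run_fragment, fragments(STORAGE, workers))
        return sum(partials)

    print(query())   # 142858 rows match "x % 7 = 0"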