4,815 results on '"Query processing"'
Search Results
202. A Query Processing Framework for Large-Scale Scientific Data Analysis
- Author
-
Fegaras, Leonidas, Hutchison, David, Series Editor, Kanade, Takeo, Series Editor, Kittler, Josef, Series Editor, Kleinberg, Jon M., Series Editor, Mattern, Friedemann, Series Editor, Mitchell, John C., Series Editor, Naor, Moni, Series Editor, Pandu Rangan, C., Series Editor, Steffen, Bernhard, Series Editor, Terzopoulos, Demetri, Series Editor, Tygar, Doug, Series Editor, Weikum, Gerhard, Series Editor, Hameurlain, Abdelkader, editor, Wagner, Roland, editor, Hartmann, Sven, editor, and Ma, Hui, editor
- Published
- 2018
203. Distributed k-Nearest Neighbor Queries in Metric Spaces
- Author
-
Ding, Xin, Zhang, Yuanliang, Chen, Lu, Gao, Yunjun, Zheng, Baihua, Hutchison, David, Series Editor, Kanade, Takeo, Series Editor, Kittler, Josef, Series Editor, Kleinberg, Jon M., Series Editor, Mattern, Friedemann, Series Editor, Mitchell, John C., Series Editor, Naor, Moni, Series Editor, Pandu Rangan, C., Series Editor, Steffen, Bernhard, Series Editor, Terzopoulos, Demetri, Series Editor, Tygar, Doug, Series Editor, Weikum, Gerhard, Series Editor, Cai, Yi, editor, Ishikawa, Yoshiharu, editor, and Xu, Jianliang, editor
- Published
- 2018
204. ESTA: An Energy-Efficient Spatio-Temporal Query Algorithm for Wireless Sensor Networks
- Author
-
Liu, Liang, Xu, Zhe, Wang, Yi-Ting, Qin, Xiao-Lin, Hutchison, David, Series Editor, Kanade, Takeo, Series Editor, Kittler, Josef, Series Editor, Kleinberg, Jon M., Series Editor, Mattern, Friedemann, Series Editor, Mitchell, John C., Series Editor, Naor, Moni, Series Editor, Pandu Rangan, C., Series Editor, Steffen, Bernhard, Series Editor, Terzopoulos, Demetri, Series Editor, Tygar, Doug, Series Editor, Weikum, Gerhard, Series Editor, Wang, Guojun, editor, Chen, Jinjun, editor, and Yang, Laurence T., editor
- Published
- 2018
205. MPP SQL Query Optimization with RTCG
- Author
-
Sridhar, K. T., Sakkeer, M. A., Andrews, Shiju, Johnson, Jimson, Hutchison, David, Series Editor, Kanade, Takeo, Series Editor, Kittler, Josef, Series Editor, Kleinberg, Jon M., Series Editor, Mattern, Friedemann, Series Editor, Mitchell, John C., Series Editor, Naor, Moni, Series Editor, Pandu Rangan, C., Series Editor, Steffen, Bernhard, Series Editor, Terzopoulos, Demetri, Series Editor, Tygar, Doug, Series Editor, Weikum, Gerhard, Series Editor, Mondal, Anirban, editor, Gupta, Himanshu, editor, Srivastava, Jaideep, editor, Reddy, P. Krishna, editor, and Somayajulu, D.V.L.N., editor
- Published
- 2018
206. Fault Tolerant Data Stream Processing in Cooperation with OLTP Engine
- Author
-
Ishikawa, Yoshiharu, Sugiura, Kento, Takao, Daiki, Hutchison, David, Series Editor, Kanade, Takeo, Series Editor, Kittler, Josef, Series Editor, Kleinberg, Jon M., Series Editor, Mattern, Friedemann, Series Editor, Mitchell, John C., Series Editor, Naor, Moni, Series Editor, Pandu Rangan, C., Series Editor, Steffen, Bernhard, Series Editor, Terzopoulos, Demetri, Series Editor, Tygar, Doug, Series Editor, Weikum, Gerhard, Series Editor, Mondal, Anirban, editor, Gupta, Himanshu, editor, Srivastava, Jaideep, editor, Reddy, P. Krishna, editor, and Somayajulu, D.V.L.N., editor
- Published
- 2018
207. Ad-hoc Video Search Improved by the Word Sense Filtering of Query Terms
- Author
-
Hirakawa, Koji, Kikuchi, Kotaro, Ueki, Kazuya, Kobayashi, Tetsunori, Hayashi, Yoshihiko, Hutchison, David, Series Editor, Kanade, Takeo, Series Editor, Kittler, Josef, Series Editor, Kleinberg, Jon M., Series Editor, Mattern, Friedemann, Series Editor, Mitchell, John C., Series Editor, Naor, Moni, Series Editor, Pandu Rangan, C., Series Editor, Steffen, Bernhard, Series Editor, Terzopoulos, Demetri, Series Editor, Tygar, Doug, Series Editor, Weikum, Gerhard, Series Editor, Tseng, Yuen-Hsien, editor, Sakai, Tetsuya, editor, Jiang, Jing, editor, Ku, Lun-Wei, editor, Park, Dae Hoon, editor, Yeh, Jui-Feng, editor, Yu, Liang-Chih, editor, Lee, Lung-Hao, editor, and Chen, Zhi-Hong, editor
- Published
- 2018
208. A Hybrid Framework for Query Processing and Data Analytics on Spark
- Author
-
Chen, Haokun, Zhang, Xiaowang, Zhang, Jiahui, Feng, Zhiyong, Hutchison, David, Series Editor, Kanade, Takeo, Series Editor, Kittler, Josef, Series Editor, Kleinberg, Jon M., Series Editor, Mattern, Friedemann, Series Editor, Mitchell, John C., Series Editor, Naor, Moni, Series Editor, Pandu Rangan, C., Series Editor, Steffen, Bernhard, Series Editor, Terzopoulos, Demetri, Series Editor, Tygar, Doug, Series Editor, Weikum, Gerhard, Series Editor, U, Leong Hou, editor, and Xie, Haoran, editor
- Published
- 2018
209. Spatial Batch-Queries Processing Using xBR-trees in Solid-State Drives
- Author
-
Roumelis, George, Vassilakopoulos, Michael, Corral, Antonio, Fevgas, Athanasios, Manolopoulos, Yannis, Hutchison, David, Series Editor, Kanade, Takeo, Series Editor, Kittler, Josef, Series Editor, Kleinberg, Jon M., Series Editor, Mattern, Friedemann, Series Editor, Mitchell, John C., Series Editor, Naor, Moni, Series Editor, Pandu Rangan, C., Series Editor, Steffen, Bernhard, Series Editor, Terzopoulos, Demetri, Series Editor, Tygar, Doug, Series Editor, Weikum, Gerhard, Series Editor, Abdelwahed, El Hassan, editor, Bellatreche, Ladjel, editor, Golfarelli, Mattéo, editor, Méry, Dominique, editor, and Ordonez, Carlos, editor
- Published
- 2018
210. In-Memory Interval Joins.
- Author
-
Bouros, Panagiotis, Mamoulis, Nikos, Tsitsigkos, Dimitrios, and Terrovitis, Manolis
- Abstract
The interval join is a popular operation in temporal, spatial, and uncertain databases. The majority of interval join algorithms assume that input data reside on disk and so, their focus is to minimize the I/O accesses. Recently, an in-memory approach based on plane sweep (PS) for modern hardware was proposed which greatly outperforms previous work. However, this approach relies on a complex data structure and its parallelization has not been adequately studied. In this article, we investigate in-memory interval joins in two directions. First, we explore the applicability of a largely ignored forward scan (FS)-based plane sweep algorithm, for single-threaded join evaluation. We propose four optimizations for FS that greatly reduce its cost, making it competitive or even faster than the state-of-the-art. Second, we study in depth the parallel computation of interval joins. We design a non-partitioning-based approach that determines independent tasks of the join algorithm to run in parallel. Then, we address the drawbacks of the previously proposed hash-based partitioning and suggest a domain-based partitioning approach that does not produce duplicate results. Within our approach, we propose a novel breakdown of the partition-joins into mini-joins to be scheduled in the available CPU threads and propose an adaptive domain partitioning, aiming at load balancing. We also investigate how the partitioning phase can benefit from modern parallel hardware. Our thorough experimental analysis demonstrates the advantage of our novel partitioning-based approach for parallel computation. [ABSTRACT FROM AUTHOR]
- Published
- 2021
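The forward scan (FS) idea named in the abstract above can be illustrated with a short sketch. This is a plain, unoptimized version written for illustration only: it includes none of the four optimizations or the parallel partitioning the authors describe, and the function name and the closed-interval convention are assumptions.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # closed interval [start, end], assumed start <= end


def forward_scan_join(r: List[Interval], s: List[Interval]) -> List[Tuple[Interval, Interval]]:
    """Return every pair (x, y), x from r and y from s, whose intervals overlap."""
    r = sorted(r)  # sort both inputs by start point
    s = sorted(s)
    out = []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i][0] <= s[j][0]:
            # r[i] starts first: scan forward in s while s[k] starts inside r[i].
            k = j
            while k < len(s) and s[k][0] <= r[i][1]:
                out.append((r[i], s[k]))
                k += 1
            i += 1
        else:
            # s[j] starts first: scan forward in r while r[k] starts inside s[j].
            k = i
            while k < len(r) and r[k][0] <= s[j][1]:
                out.append((r[k], s[j]))
                k += 1
            j += 1
    return out


pairs = forward_scan_join([(1, 5), (6, 8)], [(4, 7), (9, 10)])
```

Each interval is used once as the "sweeping" interval and compared only against intervals from the other input that start no earlier, so no pair is reported twice and no auxiliary structure is needed.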
211. An Analysis of Enrollment and Query Attacks on Hierarchical Bloom Filter-Based Biometric Systems.
- Author
-
Shomaji, Sumaiya, Ghosh, Pallabi, Ganji, Fatemeh, Woodard, Damon, and Forte, Domenic
- Abstract
A Hierarchical Bloom Filter (HBF)-based biometric framework was recently proposed to provide compact storage, noise tolerance, and fast query processing for resource-constrained environments, e.g., the Internet of Things (IoT). While security and privacy were also touted as features of the HBF, they were not thoroughly evaluated. Compared to classical Bloom filters (BFs), the HBF uses a threshold parameter to make robust authentication decisions when it encounters noise in the biometric input, which one might expect to lead to security issues. In this paper, the attack vectors that could compromise HBF security by increasing the false positive authentication of non-members and by leaking soft information about enrolled members are explored. With quantitative analyses, the security of HBF-based biometric systems under these well-defined attack vectors is evaluated, and it is concluded that the framework is more difficult to attack than the classical Bloom filter. Further, experimental results show that soft biometric information is also kept private. [ABSTRACT FROM AUTHOR]
- Published
- 2021
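The classical Bloom filter that the abstract above uses as its point of comparison can be sketched as follows. This is a minimal illustration, not the HBF framework and not its threshold-based decision rule; the class name, the SHA-256-based hashing scheme, and the default sizes are assumptions chosen for readability.

```python
import hashlib


class BloomFilter:
    """Classical Bloom filter: no false negatives, tunable false positive rate."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity only

    def _positions(self, item: str):
        # Derive num_hashes positions by salting a single hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # True means "possibly enrolled"; False means "definitely not enrolled".
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter()
bf.add("alice-template")
```

A query that returns `True` for an item that was never added is a false positive, which is the quantity the enrollment and query attacks in the abstract try to inflate.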
212. Query processing optimization in broadcasting XML data in mobile communications.
- Author
-
Shekarriz, Mohsen, Babamir, Seyed Morteza, and Mirabi, Meghdad
- Subjects
*XML (Extensible Markup Language); *DATA transmission systems; *PROCESS optimization; *BROADCAST channels; *WIRELESS channels; *BROADCASTING industry
- Abstract
Today, XML, as a de facto standard, is used to broadcast data over mobile wireless networks. In these networks, mobile clients send their XML queries over a wireless broadcast channel and receive their desired XML data from the channel. However, downloading the whole XML data by a mobile device is a challenge since the mobile devices used by clients are small battery-powered devices with limited resources. To meet this challenge, the XML data should be indexed in such a way that the desired XML data can be found easily and only such data, rather than the whole XML data, is downloaded by the mobile clients. Several indexing methods have been proposed to selectively access the XML data over an XML stream. However, the existing indexing methods increase the size of the XML stream by including extra information in it. In this paper, a new XML stream structure is proposed to disseminate the XML data over a broadcast channel by grouping and summarizing the structural information of XML nodes. By summarizing such information, the size of the XML stream can be reduced and, therefore, the latency of retrieving the desired XML data over a wireless broadcast channel can be reduced. The proposed XML stream structure also contains indexes in order to skip the irrelevant parts of the XML stream. It can therefore reduce the energy consumption of mobile devices in downloading the results of XML queries. In addition, our proposed XML stream structure can process different types of XML queries, and experimental results showed that it improves the performance of XML query processing over the XML data stream compared to existing work in terms of access and tuning times. [ABSTRACT FROM AUTHOR]
- Published
- 2021
213. Progressive approaches to flexible group skyline queries.
- Author
-
Yang, Zhibang, Zhou, Xu, Li, Kenli, Gao, Yunjun, and Li, Keqin
- Subjects
POINT set theory
- Abstract
The G-Skyline (GSky) query is formulated to report optimal groups that are not dominated by any other group of the same size. In particular, a given group G1 dominates another group G2 if every point p ∈ G1 dominates or equals a point p′ ∈ G2 and, at the same time, at least one point p dominates p′. Most existing group skyline queries need to calculate an aggregate point for each group. Compared to these queries, the GSky query is more practical because it avoids specifying an aggregate function, which can cause important results containing non-skyline points to be missed. This means the GSky query can return much more comprehensive results, which contain not only the G-Skylines consisting of skyline points but also the G-Skylines including non-skyline points. Here, a non-skyline point is one that is dominated by another point in a given data set. However, the GSky query usually returns too many results, making it a big burden for users to pick out their expected results. To address this issue, we investigate a flexible group skyline query, namely the Flexible G-Skyline (FGSky) query, which is flexible and practical for directly computing the optimal groups on the basis of user preferences. In this paper, we formulate the FGSky query, identify its properties, and present effective pruning strategies. In addition, we propose progressive algorithms for the FGSky query in which a grouping strategy and a layered strategy are utilized to get better query performance. Through extensive experiments on both synthetic and real data sets, we demonstrate the efficiency, effectiveness, and progressiveness of the proposed algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2021
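The dominance relations described in the abstract above can be made concrete with a small brute-force sketch. Two assumptions are made here and are not stated in the record: smaller coordinate values are treated as better, and group dominance pairs the points of the two groups through a permutation, which follows the usual G-Skyline formulation. The function names are illustrative, and the permutation search is only practical for very small groups.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates_or_equals(p: Point, q: Point) -> bool:
    # p is at least as good as q in every dimension (smaller is better).
    return all(a <= b for a, b in zip(p, q))


def dominates(p: Point, q: Point) -> bool:
    # p is at least as good everywhere and strictly better somewhere.
    return dominates_or_equals(p, q) and any(a < b for a, b in zip(p, q))


def skyline(points: Sequence[Point]) -> List[Point]:
    # Points that no other point dominates.
    return [p for p in points if not any(dominates(q, p) for q in points)]


def group_dominates(g1: Sequence[Point], g2: Sequence[Point]) -> bool:
    # g1 dominates g2 if some pairing matches every p in g1 to a p' in g2
    # that it dominates or equals, with at least one strict dominance.
    if len(g1) != len(g2):
        return False
    for perm in permutations(g2):
        pairs = list(zip(g1, perm))
        if all(dominates_or_equals(p, q) for p, q in pairs) and any(
            dominates(p, q) for p, q in pairs
        ):
            return True
    return False


sky = skyline([(1, 4), (2, 2), (4, 1), (3, 3)])
```

In this data set the point (3, 3) is a non-skyline point because (2, 2) dominates it, yet a group containing it may still be undominated, which is why G-Skyline results can include non-skyline points.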
214. Exploring Means to Enhance the Efficiency of GPU Bitmap Index Query Processing.
- Author
-
Tran, Brandon, Schaffner, Brennan, Myre, Joseph M., Sawin, Jason, and Chiu, David
- Subjects
GRAPHICS processing units; COMPUTER systems; DATA structures; METADATA
- Abstract
Once exotic, computational accelerators are now commonly available in many computing systems. Graphics processing units (GPUs) are perhaps the most frequently encountered computational accelerators. Recent work has shown that GPUs are beneficial when analyzing massive data sets. Specifically related to this study, it has been demonstrated that GPUs can significantly reduce the query processing time of database bitmap index queries. Bitmap indices are typically used for large, read-only data sets and are often compressed using some form of hybrid run-length compression. In this paper, we present three GPU algorithm enhancement strategies for executing queries of bitmap indices compressed using word aligned hybrid compression: (1) data structure reuse (2) metadata creation with various type alignment and (3) a preallocated memory pool. The data structure reuse greatly reduces the number of costly memory system calls. The use of metadata exploits the immutable nature of bitmaps to pre-calculate and store necessary intermediate processing results. This metadata reduces the number of required query-time processing steps. Preallocating a memory pool can reduce or entirely remove the overhead of memory operations during query processing. Our empirical study showed that performing a combination of these strategies can achieve 32.4 × to 98.7 × speedup over the current state-of-the-art implementation. Our study also showed that by using our enhancements, a common gaming GPU can achieve a 15.0 × speedup over a more expensive high-end CPU. [ABSTRACT FROM AUTHOR]
- Published
- 2021
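The word-aligned hybrid (WAH) compression mentioned in the abstract above can be illustrated with a CPU-side sketch of the encoding itself. It assumes the common 32-bit WAH layout (31 payload bits per literal word; fill words carry a fill bit and a 30-bit run length counted in 31-bit groups) and does not reproduce the GPU query algorithms or the three enhancement strategies from the paper. The function names are illustrative.

```python
from typing import List

PAYLOAD = 31               # data bits carried by one 32-bit word
MAX_RUN = (1 << 30) - 1    # largest run length a fill word can hold


def wah_encode(bits: List[int]) -> List[int]:
    """Encode a list of 0/1 values into 32-bit WAH words (zero-padded at the end)."""
    padded = list(bits) + [0] * (-len(bits) % PAYLOAD)
    words: List[int] = []
    for start in range(0, len(padded), PAYLOAD):
        group = padded[start:start + PAYLOAD]
        if all(b == group[0] for b in group):
            fill_bit = group[0]
            prev = words[-1] if words else 0
            extendable = (
                (prev >> 31) == 1
                and ((prev >> 30) & 1) == fill_bit
                and (prev & MAX_RUN) < MAX_RUN
            )
            if extendable:
                words[-1] = prev + 1  # extend the current run by one group
            else:
                words.append((1 << 31) | (fill_bit << 30) | 1)
        else:
            value = 0
            for b in group:
                value = (value << 1) | b
            words.append(value)  # literal word: most significant bit is 0
    return words


def wah_decode(words: List[int]) -> List[int]:
    bits: List[int] = []
    for w in words:
        if w >> 31:  # fill word
            bits.extend([(w >> 30) & 1] * ((w & MAX_RUN) * PAYLOAD))
        else:        # literal word
            bits.extend((w >> shift) & 1 for shift in range(PAYLOAD - 1, -1, -1))
    return bits


sample = [0] * 62 + [1] + [0] * 30 + [1] * 31
encoded = wah_encode(sample)
```

Because a compressed bitmap is immutable once built, quantities such as the starting bit offset of each word could be precomputed, which is the kind of metadata the abstract refers to.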
215. Internal and external memory set containment join.
- Author
-
Yang, Chengcheng, Deng, Dong, Shang, Shuo, Zhu, Fan, Liu, Li, and Shao, Ling
- Abstract
A set containment join operates on two set-valued attributes with a subset (⊆) relationship as the join condition. It has many real-world applications, such as in publish/subscribe services and inclusion dependency discovery. Existing solutions can be broadly classified into union-oriented and intersection-oriented methods. Based on several recent studies, union-oriented methods are not competitive as they involve an expensive subset enumeration step. Intersection-oriented methods build an inverted index on one attribute and perform inverted list intersection on another attribute. Existing intersection-oriented methods intersect inverted lists one-by-one. In contrast, in this paper, we propose to intersect all the inverted lists simultaneously while skipping many irrelevant entries in the lists. To share computation, we utilize the prefix tree structure and extend our novel list intersection method to operate on the prefix tree. To further improve the efficiency, we propose to partition the data and process each partition separately. Each partition will be associated with a much smaller inverted index, and the set containment join cost can be significantly reduced. Moreover, to support large-scale datasets that are beyond the available memory space, we develop a novel adaptive data partition method that is designed to fully leverage the available memory and achieve high I/O efficiency, thereby exhibiting outstanding performance for external memory set containment join. We evaluate our methods using both real-world and synthetic datasets. Experimental results demonstrate that our method outperforms state-of-the-art methods by up to 10× when the dataset resides completely in memory. Furthermore, our approach achieves up to two orders of magnitude improvement in I/O efficiency compared with a baseline method when the dataset size exceeds the main memory space. [ABSTRACT FROM AUTHOR]
- Published
- 2021
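The intersection-oriented approach summarized in the abstract above can be sketched in a few lines. Note that this shows the baseline style the authors contrast themselves with (inverted lists intersected one by one), not their simultaneous intersection, prefix tree sharing, or partitioning; the function name and the use of Python sets as inverted lists are assumptions.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Set, Tuple


def set_containment_join(
    r_sets: Sequence[Set[Hashable]], s_sets: Sequence[Set[Hashable]]
) -> List[Tuple[int, int]]:
    """Return all index pairs (i, j) such that r_sets[i] is a subset of s_sets[j]."""
    # Inverted index on S: element -> ids of the S sets that contain it.
    index: Dict[Hashable, Set[int]] = defaultdict(set)
    for j, s in enumerate(s_sets):
        for element in s:
            index[element].add(j)

    result: List[Tuple[int, int]] = []
    for i, r in enumerate(r_sets):
        # Intersect the inverted lists of r's elements, shortest list first.
        lists = sorted((index.get(e, set()) for e in r), key=len)
        candidates = set(lists[0]) if lists else set(range(len(s_sets)))
        for inverted_list in lists[1:]:
            candidates &= inverted_list
            if not candidates:
                break  # no S set can contain r any more
        result.extend((i, j) for j in sorted(candidates))
    return result


matches = set_containment_join(
    [{1, 2}, {3}, {5}],
    [{1, 2, 3}, {2, 3}, {1, 2}],
)
```

An S set survives the intersection only if it appears in the inverted list of every element of the R set, which is exactly the subset condition.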
216. GRAPES-DD: exploiting decision diagrams for index-driven search in biological graph databases.
- Author
-
Licheri, Nicola, Bonnici, Vincenzo, Beccuti, Marco, and Giugno, Rosalba
- Subjects
BIOLOGICAL databases; CHARTS, diagrams, etc.; ELECTRIC power filters; MORPHOLOGY; DATA structures; PATTERN matching; RDF (Document markup language)
- Abstract
Background: Graphs are mathematical structures widely used for expressing relationships among elements when representing biomedical and biological information. On top of these representations, several analyses are performed. A common task is the search of one substructure within one graph, called target. The problem is referred to as one-to-one subgraph search, and it is known to be NP-complete. Heuristics and indexing techniques can be applied to facilitate the search. Indexing techniques are also exploited in the context of searching in a collection of target graphs, referred to as one-to-many subgraph problem. Filter-and-verification methods that use indexing approaches provide a fast pruning of target graphs or parts of them that do not contain the query. The expensive verification phase is then performed only on the subset of promising targets. Indexing strategies extract graph features at a sufficient granularity level for performing a powerful filtering step. Features are memorized in data structures allowing an efficient access. Indexing size, querying time and filtering power are key points for the development of efficient subgraph searching solutions. Results: An existing approach, GRAPES, has been shown to have good performance in terms of speed-up for both one-to-one and one-to-many cases. However, it suffers in the size of the built index. For this reason, we propose GRAPES-DD, a modified version of GRAPES in which the indexing structure has been replaced with a Decision Diagram. Decision Diagrams are a broad class of data structures widely used to encode and manipulate functions efficiently. Experiments on biomedical structures and synthetic graphs have confirmed our expectation showing that GRAPES-DD has substantially reduced the memory utilization compared to GRAPES without worsening the searching time. 
Conclusion: The use of Decision Diagrams for searching in biochemical and biological graphs is completely new and potentially promising thanks to their ability to compactly encode sets by exploiting their structure and regularity, and to manipulate entire sets of elements at once, instead of exploring each single element explicitly. Search strategies based on Decision Diagrams make indexing more affordable, for biochemical graphs and beyond, allowing us to potentially deal with huge and ever-growing collections of biochemical and biological structures. [ABSTRACT FROM AUTHOR]
- Published
- 2021
217. Integrity Authentication for SQL Query Evaluation on Outsourced Databases: A Survey.
- Author
-
Zhang, Bo, Dong, Boxiang, and Wang, Wendy Hui
- Subjects
*SQL; *DATABASE management; *CLOUD computing; *DATA integrity; *DATABASES; *DATA management
- Abstract
Spurred by the development of cloud computing, there has been considerable recent interest in the Database-as-a-Service (DaaS) paradigm. Users lacking in expertise or computational resources can outsource their data and database management needs to a third-party service provider. Outsourcing, however, raises an important issue of result integrity: how can the client verify with lightweight overhead that the query results returned by the service provider are correct (i.e., the same as the results of query execution locally)? This survey focuses on categorizing and reviewing the progress on the current approaches for result integrity of SQL query evaluation in the DaaS model. The survey also includes some potential future research directions for result integrity verification of the outsourced computations. [ABSTRACT FROM AUTHOR]
- Published
- 2021
218. Optimizing Multi-Query Evaluation in Federated RDF Systems.
- Author
-
Peng, Peng, Ge, Qi, Zou, Lei, Ozsu, M. Tamer, Xu, Zhiwei, and Zhao, Dongyan
- Subjects
*HEURISTIC algorithms; *RDF (Document markup language); *APPROXIMATION algorithms
- Abstract
This paper revisits the classical problem of multiple query optimization in federated RDF systems. We propose a heuristic query rewriting-based approach to optimize the evaluation of multiple queries. This approach can take advantage of SPARQL 1.1 to share the common computation of multiple queries while considering the cost of both query evaluation and data shipment. Although we prove that finding the optimal rewriting for multiple queries is NP-complete, we propose a heuristic rewriting algorithm with a bounded approximation ratio. Furthermore, we propose an efficient method to use the interconnection topology between RDF sources to filter out irrelevant sources, and utilize some characteristics of SPARQL 1.1 to optimize multiple joins of intermediate matches. The extensive experimental studies show that the proposed techniques are effective, efficient and scalable. [ABSTRACT FROM AUTHOR]
- Published
- 2021
219. Demythization of Structural XML Query Processing: Comparison of Holistic and Binary Approaches.
- Author
-
Lukas, Petr, Baca, Radim, Kratky, Michal, and Ling, Tok Wang
- Subjects
*XML (Extensible Markup Language); *VECTOR spaces; *CONSTRUCTION planning; *IMPEDANCE matching; *TWIGS; *SEMANTICS
- Abstract
XML queries can be modeled by twig pattern queries (TPQs) specifying predicates on XML nodes and XPath relationships satisfied between them. A lot of TPQ types have been proposed; this paper takes into account a TPQ model extended by a specification of output and non-output query nodes since it complies with the XQuery semantics and, in many cases, it leads to a more efficient query processing. In general, there are two types of approaches to process a TPQ: holistic joins and binary joins. Whereas the binary join approach builds a query plan as a tree of interconnected binary operators, the holistic join approach evaluates a whole query using one operator (i.e., using one complex algorithm). Surprisingly, a thorough analytical and experimental comparison is still missing despite an enormous research effort in this area. In this paper, we try to fill this gap; we analytically and experimentally show that the binary joins used in a fully-pipelined plan (i.e., the plan where each join operation does not wait for the complete result of the previous operation and no explicit sorting is used) can often outperform the holistic joins, especially for TPQs with a higher ratio of non-output query nodes. The main contributions of this paper can be summarized as follows: (i) we introduce several improvements of existing binary join approaches allowing to build a fully-pipelined plan for a TPQ considering non-output query nodes, (ii) we prove that for a certain class of TPQs such a plan has the linear time complexity with respect to the size of the input and output as well as the linear space complexity with respect to the XML document depth (i.e., the same complexity as the holistic join approaches), (iii) we show that our improved binary join approach outperforms the holistic join approaches in many situations, and (iv) we propose a simple combined approach that utilizes advantages of both types of approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2021
220. Selectivity estimation with density-model-based multidimensional histogram.
- Author
-
Zhang, Meifan and Wang, Hongzhi
- Subjects
HISTOGRAMS; DENSITY; PAILS
- Abstract
Histograms are widely used in selectivity estimation for one-dimensional data. Using the one-dimensional histograms to estimate the selectivity of the multidimensional queries will result in a high estimation error, unless the assumption of attribute independence is true. Constructing a multidimensional histogram also brings great challenges. The storage of a multidimensional histogram exponentially increases with the number of dimensions. In this paper, we propose a density-model-based multidimensional histogram. It uses a lightweight density model to predict the densities of a large number of regions instead of storing too many buckets. The experimental results indicate that our method can provide highly accurate selectivity estimations while occupying little space. In addition, the superiority of our method is more evident in high-dimensional data. [ABSTRACT FROM AUTHOR]
- Published
- 2021
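The estimation error that the abstract above attributes to the attribute independence assumption can be shown with a small worked example. The sketch below builds two one-dimensional equi-width histograms over perfectly correlated data and multiplies their estimates; it does not implement the authors' density-model-based histogram, and the function names, bucket count, and data are assumptions made for illustration.

```python
from typing import List, Sequence


def histogram_1d(values: Sequence[float], lo: float, hi: float, buckets: int) -> List[int]:
    """Equi-width histogram over the domain [lo, hi)."""
    width = (hi - lo) / buckets
    counts = [0] * buckets
    for v in values:
        idx = min(int((v - lo) / width), buckets - 1)
        counts[idx] += 1
    return counts


def range_selectivity(counts: List[int], lo: float, hi: float, q_lo: float, q_hi: float) -> float:
    """Estimated fraction of values in [q_lo, q_hi), assuming uniformity inside buckets."""
    width = (hi - lo) / len(counts)
    covered = 0.0
    for b, count in enumerate(counts):
        b_lo = lo + b * width
        b_hi = b_lo + width
        overlap = max(0.0, min(b_hi, q_hi) - max(b_lo, q_lo))
        covered += count * overlap / width
    return covered / sum(counts)


# Perfectly correlated attributes: y always equals x.
points = [(i, i) for i in range(10)]
hx = histogram_1d([p[0] for p in points], 0, 10, 2)
hy = histogram_1d([p[1] for p in points], 0, 10, 2)

# Query: x in [0, 5) AND y in [5, 10).
estimate = range_selectivity(hx, 0, 10, 0, 5) * range_selectivity(hy, 0, 10, 5, 10)
true_selectivity = sum(1 for x, y in points if 0 <= x < 5 and 5 <= y < 10) / len(points)
```

Here the independence-based estimate is 0.25 while no point actually satisfies the query, which is the kind of error a multidimensional histogram is meant to avoid.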
221. Wavelet-Based Least Common Ancestor Algorithm for Aggregate Query Processing in Energy Aware Wireless Sensor Network.
- Author
-
Bhardwaj, Reeta and Kumar, Dinesh
- Abstract
A wireless sensor network (WSN) is a network of sensors that sense data and transmit it to the sink node. Constraints such as energy, memory, and bandwidth compel researchers to develop efficient methods for data transmission in WSNs. Accordingly, this paper introduces a data aggregation mechanism based on query processing, Wavelet-based Least Common Ancestor-Sliding Window (WLCA-SW). The proposed WLCA-SW addresses energy loss and memory shortage through the successive steps of query processing, duplicate detection, data compression using the wavelet transformation, and data aggregation. The proposed WLCA-SW integrates the weighted sliding window and Least Common Ancestor (LCA), which enables energy-aware aggregate query processing and de-duplication such that duplicate records are detected prior to the communication of the sensed data to the sink node. The weighted sliding window is an extension of the existing time-based sliding windows. The effectiveness of the proposed aggregate processing approach is evaluated based on metrics such as the number of alive nodes, data reduction rate, data-loss percentage, and residual energy, which are found to be 33, 85%, 8.222%, and 0.0610 J, respectively, at the end of 1000 rounds using 150 nodes for analysis. Moreover, the proposed method has the minimum aggregation error of 0.03 when the analysis is performed using 50 nodes. [ABSTRACT FROM AUTHOR]
- Published
- 2021
222. Top-k Vehicle Matching in Social Ridesharing: A Price-Aware Approach.
- Author
-
Li, Yafei, Wan, Ji, Chen, Rui, Xu, Jianliang, Fu, Xiaoyi, Gu, Hongyan, Lv, Pei, and Xu, Mingliang
- Subjects
*RIDESHARING; *TRAFFIC congestion; *TIME-based pricing; *SOCIAL cohesion; *AIR pollution; *AIR traffic
- Abstract
In the past few years ridesharing has largely reshaped the transportation marketplace. It is envisioned as a promising solution to transportation-related problems in metropolitan cities, such as traffic congestion and air pollution. In the current ridesharing research, social ridesharing, which makes use of social relations among drivers and riders to address safety issues, and dynamic pricing are two active directions with important business implications. Simultaneously optimizing social cohesion and revenue is vital to a commercial ridesharing platform's sustainable development, which, however, has not been previously studied. In this paper, we first present a new pricing scheme that better incentivizes drivers and riders to participate in ridesharing, and then propose a novel type of Price-aware Top-k Matching (PTkM) queries which retrieve the top-k vehicles for a rider's request by taking into account both social relations and revenue. We design an efficient algorithm with a set of powerful pruning techniques to tackle this problem. Moreover, we propose a novel index tailored to our problem to further speed up query processing. Extensive experimental results on real datasets show that our proposed algorithms achieve desirable performance for real-world deployment. [ABSTRACT FROM AUTHOR]
- Published
- 2021
223. CorrectMR: Authentication of Distributed SQL Execution on MapReduce.
- Author
-
Zhang, Bo, Dong, Boxiang, and Wang, Wendy Hui
- Subjects
*SQL; *DATA structures; *DATABASE security; *TASK analysis
- Abstract
In this paper, we consider the SQL Selection-GroupBy-Aggregation (SGA) query evaluation on an untrusted MapReduce system in which mappers and reducers may return incorrect results. We design CorrectMR, a system that supports efficient verification of result correctness for both intermediate and final results of SGA queries. CorrectMR includes the design of Pedersen Merkle R-tree (PMR-tree), a new authenticated data structure (ADS). To enable efficient verification, CorrectMR includes a distributed ADS construction mechanism that allows mappers/reducers to construct PMR-trees in parallel without a centralized party. CorrectMR provides the following verification functionality: (1) correctness verification of PMR-trees by replication; (2) correctness verification of intermediate (final, resp.) query results by constructing local (global, resp.) PMR-trees and verification objects. Our experimental results demonstrate the efficiency and effectiveness of CorrectMR. [ABSTRACT FROM AUTHOR]
- Published
- 2021
224. Spatio-Temporal Reachable Area Calculation Based on Urban Traffic Data.
- Author
-
Tang, Xiaolan, Chai, Minglu, Chen, Xiaoran, and Chen, Wenlong
- Abstract
The spatio-temporal reachable area, i.e., the area that vehicles can reach from a specific starting location and starting time within a given time period, is widely used in urban applications, such as online ride-hailing services and charging station search for electric cars. Existing studies on this issue usually utilize the static road network, which captures the spatial features, whereas the temporal characteristics of dynamic urban traffic have not been explored in depth. In this article, we propose a spatio-temporal reachable area calculation scheme, named STRC, based on a tremendous amount of vehicle trajectory data. In STRC, the objective trajectory fragments are extracted according to the starting node and the query time period, and then mapped to reachable nodes to reduce the computational complexity. From the reachable road segments that contain reachable nodes, a boundary segment is selected in each sector area by using the arrival-based one-segment policy for time-sensitive applications or the arrival-and-distance-based k-segment policy for distance-related applications. Experiments based on large-scale trajectory data from 34,040 taxis in Beijing over one month show that STRC outputs a more accurate reachable area, one that covers the trajectory points with a much smaller area, than the compared schemes. [ABSTRACT FROM AUTHOR]
- Published
- 2021
225. Quality of service aware optimization of sensor network queries
- Author
-
Galpin, Ixent
- Subjects
621.382; Sensor networks; Query Processing; Quality of Service
- Abstract
Sensor networks comprise resource-constrained wireless nodes with the capability of gathering information about their surroundings and have recently risen to prominence with the promise of being an effective computing platform for diverse applications, ranging from event detection to environmental monitoring. The database community proposed the use of sensor network query processors (SNQPs) as means to meet data collection requirements using a declarative query language. Declarative queries posed against a sensor network constitute an effective means to repurpose sensor networks and reduce the high software development costs associated with them. The range of sensor network applications is very broad. Such applications have diverse, and often conflicting, QoS expectations in terms of the delivery time of results, the acquisition interval at which data is collected, the total energy consumption of the deployment, or the network lifetime. The conflicting nature of these desiderata is aggravated by the resource-constrained nature of sensor networks as a computing fabric, making it particularly challenging to reconcile the trade-offs that arise. Previously, SNQPs have been focussed on evaluating queries as energy-efficiently as possible. There has been comparatively less work on attempting to meet a broad range of optimization goals and constraints that captured these QoS expectations. In this respect, previous work in SNQP has not aimed at being general purpose across the breadth of applications to which sensor networks have been applied. This PhD dissertation presents an approach for enabling QoS-awareness in SNQPs so that query evaluation plans are generated that exhibit good performance for a broader range of sensor network applications in terms of their QoS expectations. 
The research contributions reported here include (a) a functional decomposition of the decision-making steps required to compile a declarative query into a query evaluation plan in a sensor network setting; (b) algorithms to implement these decision-making steps; and (c) an empirical evaluation to show the benefits of QoS-awareness compared to a representative fixed-goal SNQP.
- Published
- 2010
226. LocationSpark: In-memory Distributed Spatial Query Processing and Optimization
- Author
-
Mingjie Tang, Yongyang Yu, Ahmed R. Mahmood, Qutaibah M. Malluhi, Mourad Ouzzani, and Walid G. Aref
- Subjects
spatial data, query processing, in-memory computation, parallel computing, query optimization, Information technology, T58.5-58.64
- Abstract
Due to the ubiquity of spatial data applications and the large amounts of spatial data that these applications generate and process, there is a pressing need for scalable spatial query processing. In this paper, we present new techniques for spatial query processing and optimization in an in-memory and distributed setup to address scalability. More specifically, we introduce new techniques for handling the query skew that commonly occurs in practice and for minimizing communication costs accordingly. We propose a distributed query scheduler that uses a new cost model to minimize the cost of spatial query processing. The scheduler generates query execution plans that minimize the effect of query skew. The query scheduler utilizes new spatial indexing techniques based on bitmap filters to forward queries to the appropriate local nodes. Each local computation node is responsible for optimizing and selecting its best local query execution plan based on the indexes and the nature of the spatial queries in that node. All the proposed spatial query processing and optimization techniques are prototyped inside Spark, a distributed memory-based computation system. Our prototype system is termed LocationSpark. The experimental study is based on real datasets and demonstrates that LocationSpark can enhance distributed spatial query processing by up to an order of magnitude over existing in-memory and distributed spatial systems.
- Published
- 2020
227. R*-Grove: Balanced Spatial Partitioning for Large-Scale Datasets
- Author
-
Tin Vu and Ahmed Eldawy
- Subjects
big spatial data, partitioning, R*-Grove, index optimization, query processing, Information technology, T58.5-58.64
- Abstract
The rapid growth of big spatial data has urged the research community to develop several big spatial data systems. Regardless of their architecture, one of the fundamental requirements of all these systems is to spatially partition the data efficiently across machines. The core challenges of big spatial partitioning are building partitions of high spatial quality while simultaneously taking advantage of distributed processing models by providing load-balanced partitions. Previous work on big spatial partitioning reuses existing index search trees as-is, e.g., the R-tree family, STR, Kd-tree, and Quad-tree, by building a temporary tree for a sample of the input and using its leaf nodes as partition boundaries. However, we show in this paper that none of those techniques has addressed the mentioned challenges completely. This paper proposes a novel partitioning method, termed R*-Grove, which can partition very large spatial datasets into high-quality partitions with excellent load balance and block utilization. This appealing property allows R*-Grove to outperform existing techniques in spatial query processing. R*-Grove can be easily integrated into any big data platform such as Apache Spark or Apache Hadoop. Our experiments show that R*-Grove outperforms the existing partitioning techniques for big spatial data systems. With all the proposed work publicly available as open source, we envision that R*-Grove will be adopted by the community to better serve big spatial data research.
- Published
- 2020
228. Optimizing Skyline Query Processing in Incomplete Data
- Author
-
Yonis Gulzar, Ali A. Alwan, and Sherzod Turaev
- Subjects
Algorithms, incomplete data, database, preference queries, query processing, skylines, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Given the significance of skyline queries, they are incorporated in various modern applications including personalized recommendation systems as well as decision-making and decision-support systems. Skyline queries are used to identify superior data items in the database. Most of the previously proposed skyline algorithms work on a complete database where the data are always present (non-missing). However, in many contemporary real-world databases, particularly those with large cardinality and high dimensionality, such an assumption is not necessarily valid. Hence, missing data pose new challenges, because skyline query processing cannot easily apply methods that are designed for complete data. This is due to the fact that imperfect data cause the loss of the transitivity property of the skyline method and lead to cyclic dominance. This paper presents a framework called Optimized Incomplete Skyline (OIS) which utilizes a technique that simplifies the skyline process on a database with missing data and helps prune the data items before performing the skyline process. The proposed strategy assures that the number of domination tests is significantly reduced. A set of experiments has been conducted using both real and synthetic datasets, aimed at validating the performance of the framework. The experimental results confirm that the OIS framework is indeed superior and steadily outperforms the current approaches in terms of the number of domination tests required to retrieve the skylines.
- Published
- 2019
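The loss of transitivity mentioned in the abstract above can be illustrated with a small sketch. It assumes a dominance test that compares two items only on the dimensions where both values are present, with smaller values preferred; this definition is common in work on incomplete skylines but is an assumption here, not something the abstract states, and the function name `dominates` and the sample items are hypothetical. It is not the OIS framework itself.

```python
from typing import Optional, Sequence

Value = Optional[float]  # None marks a missing value


def dominates(a: Sequence[Value], b: Sequence[Value]) -> bool:
    """Return True if a dominates b on the dimensions both items have (smaller is better)."""
    common = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not common:
        return False  # no shared dimension, so the items are incomparable
    no_worse = all(x <= y for x, y in common)
    strictly_better = any(x < y for x, y in common)
    return no_worse and strictly_better


# Three hypothetical items whose missing values produce a dominance cycle.
a = (1, None, 3)
b = (2, 1, None)
c = (None, 2, 2)

cycle = dominates(a, b) and dominates(b, c) and dominates(c, a)
print("cyclic dominance:", cycle)
```

Because each pair shares only one dimension, every item is dominated by another, so a naive "discard anything dominated" pass would return an empty skyline. This is the kind of situation that algorithms for incomplete data have to handle explicitly.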
229. Gridvoronoi: An Efficient Spatial Index for Nearest Neighbor Query Processing
- Author
-
Chongsheng Zhang, George Almpanidis, Faegheh Hasibi, and Gaojuan Fan
- Subjects
Geospatial analysis, nearest neighbour methods, query processing, spatial databases, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
In this paper, based upon the Voronoi Diagram, we propose GridVoronoi, a novel spatial index that enables users to find the spatial nearest neighbour (NN) in two-dimensional (2D) datasets in almost O(1) time. GridVoronoi augments the Voronoi Diagram with a virtual grid to promptly find out (in a geometric space) which Voronoi cell contains the query point. It consists of an off-line data pre-processing phase and an on-line query processing phase. In the off-line phase, the digital geographical space is partitioned with a Voronoi Diagram and a virtual grid, respectively. Next, for each square unit (i.e., grid cell), the corresponding Voronoi cells that contain or intersect with this square are derived and kept in a hashmap-like structure. In the on-line phase, for each real-time spatial NN query, the algorithm first identifies which virtual square(s) contain(s) this query, then looks up the hashmap structure to find the corresponding Voronoi cell(s) for this grid cell and the final result for the query. Overall, GridVoronoi significantly reduces the time complexity of finding the spatial NN in 2D space, and thus improves the efficiency of real-time spatial NN queries and Location Based Services.
- Published
- 2019
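The two phases described in the abstract above (an off-line grid-to-cell lookup table, then an on-line hash lookup) can be sketched as follows. This is a simplified illustration, not the authors' implementation: instead of computing exact Voronoi cell and grid square intersections, it stores for each grid square a conservative superset of candidate sites, using the bound that the nearest site of any point in a square lies within `d_min + 2r` of the square's centre, where `d_min` is the centre's nearest-site distance and `r` is the square's half-diagonal. The names `build_index` and `nearest_site` are hypothetical, and queries are assumed to fall inside the bounding box of the sites.

```python
import math


def build_index(sites, cell_size):
    """Off-line phase: map each grid square to the sites that may be nearest to a point in it."""
    min_x = min(x for x, _ in sites)
    min_y = min(y for _, y in sites)
    max_x = max(x for x, _ in sites)
    max_y = max(y for _, y in sites)
    nx = int((max_x - min_x) // cell_size) + 1
    ny = int((max_y - min_y) // cell_size) + 1
    half_diag = cell_size * math.sqrt(2) / 2
    cells = {}
    for i in range(nx):
        for j in range(ny):
            cx = min_x + (i + 0.5) * cell_size
            cy = min_y + (j + 0.5) * cell_size
            dists = [math.hypot(cx - x, cy - y) for x, y in sites]
            # Conservative bound (plus a small float tolerance): any site outside it
            # cannot be the nearest neighbour of a point inside this square.
            bound = min(dists) + 2 * half_diag + 1e-9
            cells[(i, j)] = [k for k, d in enumerate(dists) if d <= bound]
    return {"origin": (min_x, min_y), "cell_size": cell_size, "cells": cells}


def nearest_site(sites, index, q):
    """On-line phase: hash the query to its grid square, then scan only that square's candidates."""
    ox, oy = index["origin"]
    size = index["cell_size"]
    key = (int((q[0] - ox) // size), int((q[1] - oy) // size))
    candidates = index["cells"][key]  # raises KeyError if q is outside the indexed area
    return min(candidates, key=lambda k: math.hypot(q[0] - sites[k][0], q[1] - sites[k][1]))


sites = [(0, 0), (4, 1), (1, 5), (6, 6), (3, 3), (9, 2)]
index = build_index(sites, cell_size=2.0)
print(sites[nearest_site(sites, index, (5.0, 5.0))])
```

A smaller `cell_size` shortens the candidate lists and so speeds up queries, at the cost of a larger table. That trade-off is a property of this sketch and is not taken from the paper.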
230. Efficient Similarity Search on Quasi-Metric Graphs
- Author
-
Tianming Zhang, Yunjun Gao, Lu Chen, Guanlin Chen, and Shiliang Pu
- Subjects
Algorithm, graph, metric space, query processing, similarity search, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Similarity search in metric spaces finds objects similar to a given object, and has received much attention as it is able to support various data types and flexible similarity metrics. In real-life applications, metric spaces might be combined with graphs, resulting in geo-social networks, citation graphs, and social image graphs, to name but a few. In this paper, we introduce a new notion called the quasi-metric graph, which connects metric data using a graph, and formulate similarity search on quasi-metric graphs based on a combined similarity metric that considers both metric data similarity and graph similarity. We propose two simple, efficient approaches, the best-first method and the breadth-first method, which traverse the quasi-metric graph following the best-first and the breadth-first paradigms, respectively, and utilize the triangle inequality to prune unnecessary evaluations. Extensive experiments with three real datasets demonstrate the effectiveness and efficiency of our proposed methods compared with several baseline methods.
- Published
- 2019
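The triangle-inequality pruning mentioned in the abstract above can be illustrated with a generic pivot-based range search in a metric space. This sketch is not the best-first or breadth-first method of the paper and ignores the graph component entirely. It only shows the pruning rule: for a pivot `p`, `|d(q, p) - d(o, p)|` is a lower bound on `d(q, o)`, so an object whose lower bound already exceeds the radius can be skipped without computing its distance. The function name `range_search` and the toy data are hypothetical.

```python
def range_search(objects, pivot_dists, dist, pivot, query, radius):
    """Return (hits, evaluated): objects within radius of query, and how many distances were computed."""
    dq = dist(query, pivot)
    hits, evaluated = [], 0
    for obj, dp in zip(objects, pivot_dists):
        # Triangle inequality: d(query, obj) >= |d(query, pivot) - d(obj, pivot)|.
        if abs(dq - dp) > radius:
            continue  # pruned without evaluating dist(query, obj)
        evaluated += 1
        if dist(query, obj) <= radius:
            hits.append(obj)
    return hits, evaluated


def abs_diff(a, b):
    """A simple metric on numbers, used only for this illustration."""
    return abs(a - b)


objects = [0, 1, 5, 9, 10, 20]
pivot = 0
pivot_dists = [abs_diff(o, pivot) for o in objects]  # can be precomputed once, off-line

hits, evaluated = range_search(objects, pivot_dists, abs_diff, pivot, query=9.5, radius=1)
print(hits, evaluated)
```

Here only two of the six objects require a distance computation. With an expensive metric, such as one over images or long strings, the saved evaluations dominate the cost of the precomputed pivot distances.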
231. Toward Securing Cloud-Based Data Analytics: A Discussion on Current Solutions and Open Issues
- Author
-
Somayeh Sobati Moghadam and Amjad Fayoumi
- Subjects
Cloud computing, data analytics, data privacy, query processing, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
In the last few years, organizations and business professionals have realized the value of collaborative data analytics in supporting decision-making. Where several activities, such as cleansing, aggregation, analysis, and visualization, are performed on online data by different stakeholders, cloud-based data analytics has become a favored choice for business professionals due to the elasticity, availability, scalability, and pay-as-you-go features offered by cloud computing. However, large amounts of the data stored on the cloud are very sensitive (e.g., innovation, financial, legal, and customers' data), and so data privacy remains one of the top concerns for many reasons, mainly those relating to legal or competition issues. In this paper, we review the security and cryptographic mechanisms which aim to make data analytics secure in a cloud environment and discuss current research challenges.
- Published
- 2019
232. Universal algorithm of processing of requests with use of parallel technology
- Author
-
Timofeeva N.E. and Dmitrieva K.A.
- Subjects
algorithm, database, distributed database, parallel database, MapReduce, query processing, Engineering (General). Civil engineering (General), TA1-2040, Chemistry, QD1-999
- Abstract
Processing and storing large amounts of information is currently one of the most difficult and interesting tasks. The performance of a system as a whole depends on the performance and reliability of its database. One of the most difficult aspects of this issue is handling a database query and executing it efficiently. In this paper we consider modern methods and models of query processing in databases. We offer an algorithm for servicing user requests that uses parallel technologies when exchanging information with the nodes of a distributed database and a dictionary, and that reduces query execution time, which in turn increases the speed of the system as a whole. We also review current technologies for storing large amounts of data: distributed and parallel databases, and MapReduce.
- Published
- 2018
233. An Enhanced Cloud Based View Materialization Approach for Peer-to-Peer Architecture
- Author
-
Megahed, M. E., Ismail, Rasha M., Badr, Nagwa L., Tolba, Mohamed Fahmy, Kacprzyk, Janusz, Series editor, Jain, Lakhmi C., Series editor, Hassanien, Aboul Ella, editor, Mostafa Fouad, Mohamed, editor, Manaf, Azizah Abdul, editor, Zamani, Mazdak, editor, and Ahmad, Rabiah, editor
- Published
- 2017
234. Accelerating Hash-Based Query Processing Operations on FPGAs by a Hash Table Caching Technique
- Author
-
Salami, Behzad, Arcas-Abella, Oriol, Sonmez, Nehir, Unsal, Osman, Kestelman, Adrian Cristal, Diniz Junqueira Barbosa, Simone, Series editor, Chen, Phoebe, Series editor, Du, Xiaoyong, Series editor, Filipe, Joaquim, Series editor, Kara, Orhun, Series editor, Kotenko, Igor, Series editor, Liu, Ting, Series editor, Sivalingam, Krishna M., Series editor, Washio, Takashi, Series editor, Barrios Hernández, Carlos Jaime, editor, Gitler, Isidoro, editor, and Klapp, Jaime, editor
- Published
- 2017
235. A Recursive Continuous Query Language for Integration of Streams and Graphs
- Author
-
Watanabe, Yousuke, Xhafa, Fatos, Series editor, Barolli, Leonard, editor, and Amato, Flora, editor
- Published
- 2017
236. Describing and Comparing Big Data Querying Tools
- Author
-
Rodrigues, Mário, Santos, Maribel Yasmina, Bernardino, Jorge, Rocha, Álvaro, editor, Correia, Ana Maria, editor, Adeli, Hojjat, editor, Reis, Luís Paulo, editor, and Costanzo, Sandra, editor
- Published
- 2017
237. Cloud-Assisted Data Storage and Query Processing at Vehicular Ad-Hoc Sensor Networks
- Author
-
Lai, Yongxuan, Zheng, Lv, Wang, Tian, Yang, Fang, Zhou, Qifeng, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Wang, Guojun, editor, Atiquzzaman, Mohammed, editor, Yan, Zheng, editor, and Choo, Kim-Kwang Raymond, editor
- Published
- 2017
238. On the Need for Applications Aware Adaptive Middleware in Real-Time RDF Data Analysis (Short Paper)
- Author
-
Shamszaman, Zia Ush, Ali, Muhammad Intizar, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Panetto, Hervé, editor, Debruyne, Christophe, editor, Gaaloul, Walid, editor, Papazoglou, Mike, editor, Paschke, Adrian, editor, Ardagna, Claudio Agostino, editor, and Meersman, Robert, editor
- Published
- 2017
239. Bulk Insertions into xBR-trees
- Author
-
Roumelis, George, Vassilakopoulos, Michael, Corral, Antonio, Manolopoulos, Yannis, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Ouhammou, Yassine, editor, Ivanovic, Mirjana, editor, Abelló, Alberto, editor, and Bellatreche, Ladjel, editor
- Published
- 2017
240. Storing Join Relationships for Fast Join Query Processing
- Author
-
Hamdi, Mohammed, Yu, Feng, Alswedani, Sarah, Hou, Wen-Chi, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Benslimane, Djamal, editor, Damiani, Ernesto, editor, Grosky, William I., editor, Hameurlain, Abdelkader, editor, Sheth, Amit, editor, and Wagner, Roland R., editor
- Published
- 2017
241. WSN-DD: A Wireless Sensor Network Deployment Design Tool
- Author
-
Bonilla Bonilla, David Santiago, Galpin, Ixent, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Calì, Andrea, editor, Wood, Peter, editor, Martin, Nigel, editor, and Poulovassilis, Alexandra, editor
- Published
- 2017
242. Private Conjunctive Query over Encrypted Data
- Author
-
Saha, Tushar Kanti, Koshiba, Takeshi, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Joye, Marc, editor, and Nitaj, Abderrahmane, editor
- Published
- 2017
243. (A)kNN Query Processing on the Cloud: A Survey
- Author
-
Nodarakis, Nikolaos, Rapti, Angeliki, Sioutas, Spyros, Tsakalidis, Athanasios K., Tsolis, Dimitrios, Tzimas, Giannis, Panagis, Yannis, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Sellis, Timos, editor, and Oikonomou, Konstantinos, editor
- Published
- 2017
244. Compression-Aware In-Memory Query Processing: Vision, System Design and Beyond
- Author
-
Hildebrandt, Juliana, Habich, Dirk, Damme, Patrick, Lehner, Wolfgang, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Blanas, Spyros, editor, Bordawekar, Rajesh, editor, Lahiri, Tirthankar, editor, Levandoski, Justin, editor, and Pavlo, Andrew, editor
- Published
- 2017
245. Overtaking CPU DBMSes with a GPU in Whole-Query Analytic Processing with Parallelism-Friendly Execution Plan Optimization
- Author
-
Agbaria, Adnan, Minor, David, Peterfreund, Natan, Rozenberg, Eyal, Rosenberg, Ofer, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Blanas, Spyros, editor, Bordawekar, Rajesh, editor, Lahiri, Tirthankar, editor, Levandoski, Justin, editor, and Pavlo, Andrew, editor
- Published
- 2017
246. A Distributed Multi-level Composite Index for KNN Processing on Long Time Series
- Author
-
Wang, Xiaqing, Fang, Zicheng, Wang, Peng, Zhu, Ruiyuan, Wang, Wei, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Candan, Selçuk, editor, Chen, Lei, editor, Pedersen, Torben Bach, editor, Chang, Lijun, editor, and Hua, Wen, editor
- Published
- 2017
247. Research Issues for Next Generation Content-Based Image Retrieval
- Author
-
Tyagi, Vipin
- Published
- 2017
248. Spatial Network Big Databases: An Introduction
- Author
-
Yang, KwangSoo and Shekhar, Shashi
- Published
- 2017
249. Exploratory Ad-Hoc Analytics for Big Data
- Author
-
Eberius, Julian, Thiele, Maik, Lehner, Wolfgang, Zomaya, Albert Y., editor, and Sakr, Sherif, editor
- Published
- 2017
250. Bit-Oriented Sampling for Aggregation on Big Data.
- Author
-
Hu, Huan and Li, Jianzhong
- Subjects
*CENTRAL limit theorem, *BIG data, *CHEBYSHEV approximation, *DATA analysis
- Abstract
The efficiency of big data analysis has become a bottleneck. Aggregation is a fundamental analytical task. It usually consumes a lot of time, so sampling-based aggregation is often used to improve response time at the cost of result accuracy. In all of the related works, sampling is conducted at the granularity of the data item. Considering that the bits at different bit positions of each data item contribute differently to an aggregation result, the performance of sampling-based aggregation may be improved if sampling is conducted at the granularity of the bit. Thus, this paper studies bit-oriented sampling for aggregation. Two methods of bit-oriented uniform sampling-based aggregation, DVBM and DVFM, are proposed, which are based on the central limit theorem or Chebyshev's inequality. They are much more efficient than the traditional methods of data-oriented uniform sampling-based aggregation. DVBM can guarantee a given error bound of aggregation under the assumption that the sample variance equals the dataset variance. By contrast, DVFM achieves the same goal without that assumption, but it could result in a larger sample size. Extensive experiments are carried out and the results show that DVBM and DVFM are both efficient and effective. [ABSTRACT FROM AUTHOR]
- Published
- 2021
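The abstract above refers to error bounds derived from the central limit theorem and from Chebyshev's inequality. As a rough illustration of how the two bounds differ, the sketch below computes the sample size each one requires for conventional item-level uniform sampling of a mean, given a known standard deviation `sigma`, an error bound `eps`, and a failure probability `delta`. These are the textbook formulas, n ≥ (z·sigma/eps)² for the normal approximation and n ≥ sigma²/(eps²·delta) for Chebyshev. They are not the DVBM or DVFM methods, and the function names and example numbers are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_clt(sigma: float, eps: float, delta: float) -> int:
    """Sample size so that |sample mean - true mean| <= eps with probability about 1 - delta (normal approximation)."""
    z = NormalDist().inv_cdf(1 - delta / 2)  # two-sided critical value
    return math.ceil((z * sigma / eps) ** 2)


def sample_size_chebyshev(sigma: float, eps: float, delta: float) -> int:
    """Sample size from Chebyshev's inequality: P(|mean error| >= eps) <= sigma^2 / (n * eps^2) <= delta."""
    return math.ceil(sigma ** 2 / (eps ** 2 * delta))


n_clt = sample_size_clt(sigma=10.0, eps=1.0, delta=0.05)
n_cheb = sample_size_chebyshev(sigma=10.0, eps=1.0, delta=0.05)
print(n_clt, n_cheb)
```

Chebyshev's inequality makes no distributional assumption, so it asks for noticeably more samples than the normal approximation for the same `eps` and `delta`.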