332 results for "Cuckoo hashing"
Search Results
202. Peeling arguments and double hashing
- Author
-
Justin Thaler and Michael Mitzenmacher
- Subjects
Vertex (graph theory) ,Cuckoo hashing ,Theoretical computer science ,Universal hashing ,Hash function ,Data structure ,Randomness ,Double hashing ,K-independent hashing ,Mathematics - Abstract
The analysis of several algorithms and data structures can be reduced to the analysis of the following greedy “peeling” process: start with a random hypergraph; find a vertex of degree at most k, and remove it and all of its adjacent hyperedges from the graph; repeat until there is no suitable vertex. This specific process finds the k-core of a hypergraph, and variations on this theme have proven useful in analyzing, for example, decoding of low-density parity-check codes, several hash-based data structures such as cuckoo hashing, and algorithms for satisfiability of random formulae. This approach can be analyzed in several ways, two common ones being via a corresponding branching process or a fluid-limit family of differential equations. In this paper, we note an interesting aspect of these types of processes: the results are generally the same when the randomness is structured in the manner of double hashing. This phenomenon allows us to use less randomness and simplify the implementation of several hash-based data structures and algorithms. We explore this approach from both an empirical and a theoretical perspective, examining theoretical justifications as well as simulation results for specific problems.
- Published
- 2012
- Full Text
- View/download PDF
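The greedy peeling process described in the abstract above is simple enough to sketch in a few lines. The following is an editor's toy illustration, not the authors' code (function and variable names are mine): it repeatedly removes a vertex of degree at most k together with its hyperedges, and returns the indices of the hyperedges that survive (the core of the hypergraph).

```python
from collections import defaultdict

def peel(num_vertices, hyperedges, k=1):
    """Greedy peeling: repeatedly pick a vertex of degree at most k and
    delete it together with all hyperedges containing it.  Returns the
    indices of the hyperedges that survive (the core of the hypergraph)."""
    incident = defaultdict(set)               # vertex -> live edge indices
    for i, edge in enumerate(hyperedges):
        for v in edge:
            incident[v].add(i)
    live = set(range(len(hyperedges)))
    stack = [v for v in range(num_vertices) if len(incident[v]) <= k]
    while stack:
        v = stack.pop()
        for i in list(incident[v]):
            if i not in live:
                continue
            live.discard(i)
            for u in hyperedges[i]:           # deleting edge i lowers degrees
                incident[u].discard(i)
                if u != v and len(incident[u]) <= k:
                    stack.append(u)
    return live
```

A path of pair-edges peels away completely, while a cycle has no vertex of degree at most 1 and survives intact; this is exactly the success/failure distinction in cuckoo hashing, where each key is an edge between its two candidate buckets.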
203. Convergence of multivariate belief propagation, with applications to cuckoo hashing and load balancing
- Author
-
Leconte, Mathieu; Lelarge, Marc; and Massoulié, Laurent. Affiliations: Laboratory of Information, Network and Communication Sciences (LINCS), UPMC-Inria-Institut Mines-Télécom [Paris] (IMT); Technicolor Paris lab, Technicolor; Dynamics of Geometric Networks (DYOGENE), Département d'informatique de l'École normale supérieure (DI-ENS), ENS Paris, PSL, Inria, CNRS; Inria Paris-Rocquencourt; Laboratoire d'informatique de l'école normale supérieure (LIENS); Microsoft Research - Inria Joint Centre (MSR - INRIA)
- Subjects
FOS: Computer and information sciences ,upshifted likelihood ratio stochastic order ,05C80 - Random graphs ,Discrete Mathematics (cs.DM) ,Probability (math.PR) ,load balancing ,[INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS] ,010102 general mathematics ,01 natural sciences ,capacitated b-matchings ,[MATH.MATH-PR]Mathematics [math]/Probability [math.PR] ,[INFO.INFO-NI]Computer Science [cs]/Networking and Internet Architecture [cs.NI] ,010104 statistics & probability ,local weak convergence ,FOS: Mathematics ,cuckoo hashing ,Mathematics - Combinatorics ,Combinatorics (math.CO) ,0101 mathematics ,random graphs ,Mathematics - Probability ,belief propagation ,Computer Science - Discrete Mathematics - Abstract
This paper is motivated by two applications, namely i) generalizations of cuckoo hashing, a computationally simple approach to assigning keys to objects, and ii) load balancing in content distribution networks, where one is interested in determining the impact of content replication on performance. These two problems admit a common abstraction: in both scenarios, performance is characterized by the maximum weight of a generalization of a matching in a bipartite graph, featuring node and edge capacities. Our main result is a law of large numbers characterizing the asymptotic maximum weight matching in the limit of large bipartite random graphs, when the graphs admit a local weak limit that is a tree. This result specializes to the two application scenarios, yielding new results in both contexts. In contrast with previous results, the key novelty is the ability to handle edge capacities with arbitrary integer values. An analysis of belief propagation algorithms (BP) with multivariate belief vectors underlies the proof. In particular, we show convergence of the corresponding BP by exploiting monotonicity of the belief vectors with respect to the so-called upshifted likelihood ratio stochastic order. This auxiliary result can be of independent interest, providing a new set of structural conditions which ensure convergence of BP. (10-page format plus proofs in the appendix; 24 pages total.)
- Published
- 2012
204. CS1001.py
- Author
-
Benny Chor and Rani Hod
- Subjects
Cuckoo hashing ,Undergraduate curriculum ,Computer science ,Programming language ,Hash function ,ComputingMilieux_COMPUTERSANDEDUCATION ,Python (programming language) ,computer.software_genre ,Error detection and correction ,computer ,Curriculum ,Data type ,computer.programming_language - Abstract
We describe the curriculum, initial experience, and preliminary evaluation of an introductory CS course for students taking CS as their single or double major. The course is taught during the first or second semester of the first year of studies. It is centered around eleven to thirteen topics, offering wide coverage of major CS subjects. Many of these topics are not covered in "traditional" introductory CS courses, and some of them are not covered anywhere in the standard undergraduate curricula. Examples: digital image representation and processing, error correction and detection codes, hashing (including Cuckoo hashing), and text compression. The programming language used in the course is Python. The students are exposed to all standard programming language constructs (commands, assignments, functions, conditionals, iterations, recursion, etc.) and basic data types (e.g. integers, floating-point numbers, lists, dictionaries, and sets), as well as less basic constructs, like higher-order functions, lambda expressions, classes and methods, and iterators. We have set a dual learning outcome: the students are expected to acquire good knowledge of and proficiency in programming and in understanding short programs. We also expect them to get a broad view of central subjects in Computer Science.
- Published
- 2012
- Full Text
- View/download PDF
205. Boosting multiple hash tables to search
- Author
-
Jin-Cheng Li
- Subjects
Computer science ,Hash function ,Linear hashing ,Locality-sensitive hashing ,K-independent hashing ,Open addressing ,Data_FILES ,Computer Science::Data Structures and Algorithms ,Consistent hashing ,Computer Science::Databases ,Computer Science::Cryptography and Security ,business.industry ,Universal hashing ,Dynamic perfect hashing ,Pattern recognition ,2-choice hashing ,Hash table ,Hopscotch hashing ,Cuckoo hashing ,SUHA ,Locality preserving hashing ,Feature hashing ,Artificial intelligence ,business ,Perfect hash function ,Extendible hashing ,Double hashing ,Tabulation hashing - Abstract
Hashing-based approximate nearest neighbor (ANN) search is an important technique for reducing search time and storage in large-scale information retrieval problems. Semi-supervised hashing (SSH) methods capture the semantic similarity of data and avoid overfitting to training data. SSH outperforms supervised, unsupervised, and random-projection-based hashing methods in semantic retrieval tasks. However, current semi-supervised and supervised hashing methods that search by hash lookup in a single hash table usually suffer from low recall. To achieve high recall, an exhaustive search by Hamming ranking is needed, which dramatically decreases retrieval precision and increases search time. In this paper, we propose to learn multiple semi-supervised hash tables using a boosting technique to overcome this problem. Multiple hash tables are learned sequentially with boosting to maximize the hashing accuracy of each hash table. Mis-hashed samples in the current hash table are penalized with large weights, and the algorithm then uses the new weights to learn the next hash table. Given a query, true semantically similar samples missed from the active buckets (the buckets within a small Hamming radius of the query) of one hash table are more likely to be found in the active buckets of the next hash table. Experimental results show that our method achieves high recall while preserving high precision, outperforming several state-of-the-art hashing methods.
- Published
- 2012
- Full Text
- View/download PDF
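The multi-table idea in the abstract above — a query's candidate set is the union of its buckets across several tables, so each extra table can only raise recall — can be sketched as follows. This is an editor's illustration using random hyperplane hashes; the paper instead learns its tables with boosting, and all names here are mine.

```python
import random

def _code(x, planes):
    """One sign bit per random hyperplane."""
    return tuple(int(sum(w * xi for w, xi in zip(pl, x)) >= 0.0)
                 for pl in planes)

def build_tables(points, num_tables=4, nbits=8, seed=0):
    """Index `points` in several independent random-hyperplane hash tables.
    (Sketch only: the paper learns its tables sequentially with boosting
    instead of drawing them at random.)"""
    rng = random.Random(seed)
    dim = len(points[0])
    tables = []
    for _ in range(num_tables):
        planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                  for _ in range(nbits)]
        buckets = {}
        for idx, p in enumerate(points):
            buckets.setdefault(_code(p, planes), []).append(idx)
        tables.append((planes, buckets))
    return tables

def query(tables, q):
    """Union of q's exact buckets across all tables: adding tables can only
    add candidates, which is how multiple tables raise recall."""
    cands = set()
    for planes, buckets in tables:
        cands.update(buckets.get(_code(q, planes), []))
    return cands
```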
206. Reducing cache misses in hash join probing phase by pre-sorting strategy (abstract only)
- Author
-
Jae-Myung Kim, Woon-Hak Kang, Gihwan Oh, and Sang-Won Lee
- Subjects
Hash join ,Primary clustering ,HAT-trie ,Computer science ,Dynamic perfect hashing ,Hash function ,Hash buster ,Block nested loop ,Parallel computing ,Merkle tree ,Rolling hash ,Hash table ,Hash tree ,Cuckoo hashing ,Hash list ,SHA-2 ,Quadratic probing ,Data_FILES ,Hash chain ,Cryptographic hash function ,Hash filter ,Cache ,Tuple ,Perfect hash function ,Double hashing - Abstract
Recently, several studies on multi-core cache-aware hash join have been carried out [Kim09VLDB, Blanas11SIGMOD]. In particular, the work of Blanas has shown that a rather simple no-partitioning hash join can outperform the work of Kim. Meanwhile, the simple but best-performing hash join of Blanas still experiences severe cache misses in the probing phase. Because the key values of tuples in the outer relation are not sorted or clustered, each outer record has a different hashed key value and thus accesses a different hash bucket. Since the hash table of the inner table is usually much larger than the CPU cache, the reference to a hash bucket of the inner table by each outer record is highly likely to incur a cache miss. To reduce cache misses in the hash join probing phase, we propose a new join algorithm, Sorted Probing (SP for short), which pre-sorts the hashed key values of the outer table so that accesses to the hash buckets of the inner table have strong temporal locality, thus minimizing cache misses during the probing phase. As an optimization of the sorting step, we used the cache-aware AlphaSort technique, which extracts from each record of the data set its key and a pointer, and then sorts the pairs of (key, rec_ptr). For performance evaluation, we used two hash join algorithms from Blanas' work, no partitioning (NP) and independent partitioning (IP), in a standard C++ program provided by Blanas. We also implemented AlphaSort, added it before the probing phase of NP and IP, and call the resulting algorithms NP+SP and IP+SP. For a synthetic workload, IP+SP outperforms all other algorithms: IP+SP is up to 30% faster than the other algorithms.
- Published
- 2012
- Full Text
- View/download PDF
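The sorted-probing idea above can be illustrated in a few lines: build the hash table on the inner relation, then probe with the outer tuples visited in order of their hashed key, so consecutive probes hit nearby buckets. This is an editor's sketch (names are mine); Python's dict hides the cache behaviour that motivates the paper, so it only shows the access pattern, not the speedup.

```python
def hash_join_sorted_probe(inner, outer):
    """Hash join with a pre-sorted probe phase (the 'SP' idea): sorting the
    outer side by hashed key gives the probes temporal locality on the
    inner table's buckets."""
    table = {}
    for key, payload in inner:                    # build phase
        table.setdefault(key, []).append(payload)
    out = []
    # probe phase: visit outer tuples ordered by the hash of their key
    for key, payload in sorted(outer, key=lambda t: hash(t[0])):
        for match in table.get(key, []):
            out.append((key, match, payload))
    return out
```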
207. Hash table in Chinese Chess
- Author
-
Li Hongye, Xiao Chenjun, Lv Huizhan, and Wang Jiao
- Subjects
Primary clustering ,Theoretical computer science ,Universal hashing ,Computer science ,Dynamic perfect hashing ,Hash buster ,Hash function ,Zobrist hashing ,2-choice hashing ,Linear hashing ,Rolling hash ,Hash table ,Hopscotch hashing ,Hash tree ,Cuckoo hashing ,Data_FILES ,Hash chain ,Cryptographic hash function ,Consistent hashing ,Perfect hash function ,Double hashing ,Tabulation hashing - Abstract
The hash table is a very important technique in computer games. With the development of computer Chinese Chess, hash tables are widely used and some implementations are innovative. Several notable hash table implementations are introduced, some of them new and some combining specific modifications with traditional ones, all taking into account the special characteristics of Chinese Chess. Two common methods, Zobrist hashing and lockless hashing in parallel search, are also presented. The experimental results reveal that these hash tables are remarkable and essential, significantly improving overall performance.
- Published
- 2012
- Full Text
- View/download PDF
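Zobrist hashing, mentioned in the abstract above, assigns each (square, piece) pair a fixed random bitstring; a position's hash is the XOR of the entries for its occupied squares, so a move updates the hash in O(1). The sketch below uses a hypothetical toy board and piece codes of my own, not a real Chinese Chess implementation.

```python
import random

rng = random.Random(42)
PIECES = ["r", "c", "h"]                       # hypothetical piece codes
ZTABLE = {(sq, pc): rng.getrandbits(64)        # one random 64-bit word per
          for sq in range(9) for pc in PIECES} # (square, piece) pair

def zobrist(position):
    """Full hash of a position given as a dict square -> piece."""
    h = 0
    for sq, pc in position.items():
        h ^= ZTABLE[(sq, pc)]
    return h

def move(h, piece, src, dst):
    """Incremental update: XOR the piece out of src and into dst, O(1)."""
    return h ^ ZTABLE[(src, piece)] ^ ZTABLE[(dst, piece)]
```

The incremental update is what makes Zobrist hashing attractive for transposition tables in game-tree search: the engine never rehashes a whole position during search.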
208. Explicit and Efficient Hash Families Suffice for Cuckoo Hashing with a Stash
- Author
-
Philipp Woelfel, Martin Aumüller, and Martin Dietzfelbinger
- Subjects
FOS: Computer and information sciences ,General Computer Science ,Universal hashing ,Computer science ,Applied Mathematics ,Dynamic perfect hashing ,0102 computer and information sciences ,02 engineering and technology ,01 natural sciences ,Hash table ,Computer Science Applications ,K-independent hashing ,Combinatorics ,Cuckoo hashing ,010201 computation theory & mathematics ,Computer Science - Data Structures and Algorithms ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Data Structures and Algorithms (cs.DS) ,F.2.2 ,Perfect hash function ,Double hashing ,Tabulation hashing - Abstract
It is shown that for cuckoo hashing with a stash as proposed by Kirsch, Mitzenmacher, and Wieder (2008) families of very simple hash functions can be used, maintaining the favorable performance guarantees: with stash size $s$ the probability of a rehash is $O(1/n^{s+1})$, and the evaluation time is $O(s)$. Instead of the full randomness needed for the analysis of Kirsch et al. and of Kutzelnigg (2010) (resp. $\Theta(\log n)$-wise independence for standard cuckoo hashing) the new approach even works with 2-wise independent hash families as building blocks. Both construction and analysis build upon the work of Dietzfelbinger and Woelfel (2003). The analysis, which can also be applied to the fully random case, utilizes a graph counting argument and is much simpler than previous proofs. As a byproduct, an algorithm for simulating uniform hashing is obtained. While it requires about twice as much space as the most space efficient solutions, it is attractive because of its simple and direct structure., Comment: 18 Pages
- Published
- 2012
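The scheme analysed above — cuckoo hashing where keys displaced by a failed eviction walk go into a small stash instead of forcing a rehash — can be sketched as follows. This is an editor's illustration: the paper's point is that simple 2-wise independent families suffice, but here Python's built-in hash salted with per-table seeds stands in for the hash functions, purely for demonstration.

```python
import random

class CuckooWithStash:
    """Cuckoo hashing with a stash (sketch of the Kirsch-Mitzenmacher-Wieder
    scheme the paper analyses; hash functions are illustrative only)."""
    def __init__(self, size=16, stash_size=4, max_kicks=32, seed=0):
        self.t1 = [None] * size
        self.t2 = [None] * size
        self.stash = []
        self.stash_size = stash_size
        self.max_kicks = max_kicks
        rng = random.Random(seed)
        self.s1, self.s2 = rng.getrandbits(32), rng.getrandbits(32)

    def _h1(self, key): return hash((self.s1, key)) % len(self.t1)
    def _h2(self, key): return hash((self.s2, key)) % len(self.t2)

    def insert(self, key):
        for _ in range(self.max_kicks):
            i = self._h1(key)
            if self.t1[i] is None:
                self.t1[i] = key
                return True
            key, self.t1[i] = self.t1[i], key   # evict and keep walking
            j = self._h2(key)
            if self.t2[j] is None:
                self.t2[j] = key
                return True
            key, self.t2[j] = self.t2[j], key
        if len(self.stash) < self.stash_size:   # failed walk: use the stash
            self.stash.append(key)
            return True
        return False                            # stash full: rehash needed

    def __contains__(self, key):
        return (self.t1[self._h1(key)] == key
                or self.t2[self._h2(key)] == key
                or key in self.stash)
```

A lookup inspects at most two table cells plus the s stash entries, which is where the O(s) evaluation time and the O(1/n^(s+1)) rehash probability in the abstract come from.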
209. CPHASH
- Author
-
Nickolai Zeldovich, Zviad Metreveli, M. Frans Kaashoek, Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science, Kaashoek, M. Frans, Metreveli, Zviad, and Zeldovich, Nickolai
- Subjects
Hardware_MEMORYSTRUCTURES ,Computer science ,HAT-trie ,CPU cache ,Dynamic perfect hashing ,Hash function ,Message passing ,Parallel computing ,Linear hashing ,Computer Graphics and Computer-Aided Design ,Hash table ,Cuckoo hashing ,Content addressable network ,Rainbow table ,Cache ,Cache algorithms ,Software - Abstract
CPHash is a concurrent hash table for multicore processors. CPHash partitions its table across the caches of cores and uses message passing to transfer lookups/inserts to a partition. CPHash's message passing avoids the need for locks, pipelines batches of asynchronous messages, and packs multiple messages into a single cache line transfer. Experiments on an 80-core machine with 2 hardware threads per core show that CPHash has ~1.6x higher throughput than a hash table implemented using fine-grained locks. An analysis shows that CPHash wins because it experiences fewer cache misses and its cache misses are less expensive, owing to less contention for the on-chip interconnect and DRAM. CPServer, a key/value cache server using CPHash, achieves ~5% higher throughput than a key/value cache server that uses a hash table with fine-grained locks, but both achieve better throughput and scalability than memcached. The throughput of CPHash and CPServer also scales near-linearly with the number of cores. (Funding: Quanta Computer; National Science Foundation (U.S.), Award 915164.)
- Published
- 2012
- Full Text
- View/download PDF
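The core CPHash idea — each partition of the table is owned by exactly one server thread, and clients send messages instead of taking locks — can be sketched with ordinary threads and queues. This is an editor's toy model, not the real CPHash (which batches asynchronous messages and packs them into cache-line transfers); all names are mine.

```python
import threading
import queue

class PartitionedHash:
    """Sketch of a message-passing partitioned hash table: keys are routed
    by hash to the one thread that owns their partition, so the per-thread
    dicts need no locks."""
    def __init__(self, num_partitions=4):
        self.inboxes = [queue.Queue() for _ in range(num_partitions)]
        for inbox in self.inboxes:
            threading.Thread(target=self._serve, args=(inbox,),
                             daemon=True).start()

    def _serve(self, inbox):
        table = {}                      # touched only by this thread
        while True:
            op, key, value, reply = inbox.get()
            if op == "stop":
                return
            if op == "insert":
                table[key] = value
                reply.put(None)
            else:                       # "lookup"
                reply.put(table.get(key))

    def _send(self, op, key, value=None):
        reply = queue.Queue(maxsize=1)
        self.inboxes[hash(key) % len(self.inboxes)].put((op, key, value, reply))
        return reply.get()              # synchronous here, for simplicity

    def insert(self, key, value):
        self._send("insert", key, value)

    def lookup(self, key):
        return self._send("lookup", key)

    def close(self):
        for inbox in self.inboxes:
            inbox.put(("stop", None, None, None))
```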
210. LHlf
- Author
-
Donghui Zhang and Per-Ake Larson
- Subjects
Universal hashing ,Computer science ,Dynamic perfect hashing ,Hash function ,Parallel computing ,Linear hashing ,2-choice hashing ,Computer Graphics and Computer-Aided Design ,Hash table ,K-independent hashing ,Locality-sensitive hashing ,Hopscotch hashing ,Cuckoo hashing ,Open addressing ,Pearson hashing ,Coalesced hashing ,SUHA ,Data_FILES ,Feature hashing ,Consistent hashing ,Extendible hashing ,Perfect hash function ,Software ,Double hashing ,Tabulation hashing - Abstract
LHlf is a new hash table designed to allow very high levels of concurrency. The table is lock-free and grows and shrinks automatically according to the number of items in the table. Insertions, lookups, and deletions are never blocked. LHlf is based on linear hashing but adopts recursive split-ordering of the items within a bucket in order to split and merge lists in a lock-free manner. LHlf is as fast as the best previous lock-free design, and in addition it offers stable performance, uses less space, and supports both expansions and contractions.
- Published
- 2012
- Full Text
- View/download PDF
211. *Dictionary Data Structures
- Author
-
Stefan Schrödl and Stefan Edelkamp
- Subjects
Cuckoo hashing ,Theoretical computer science ,Universal hashing ,Hash function ,Data structure ,Priority queue ,Algorithm ,Perfect hash function ,Pairing heap ,Hash table ,Mathematics - Abstract
To enhance the efficiency of A*, this chapter discusses different dictionary data structures. Priority queues are provided for integer and totally ordered keys. For duplicate elimination, hashing is studied, including provably constant access time. Moreover, maintaining partial states in the form of substrings or subsets is considered.
- Published
- 2012
- Full Text
- View/download PDF
212. Explicit and Efficient Hash Families Suffice for Cuckoo Hashing with a Stash
- Author
-
Martin Dietzfelbinger, Philipp Woelfel, and Martin Aumüller
- Subjects
Discrete mathematics ,Universal hashing ,Dynamic perfect hashing ,0102 computer and information sciences ,02 engineering and technology ,01 natural sciences ,Hash table ,K-independent hashing ,Hopscotch hashing ,Combinatorics ,Cuckoo hashing ,Open addressing ,010201 computation theory & mathematics ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Double hashing ,Mathematics - Abstract
It is shown that for cuckoo hashing with a stash as proposed by Kirsch, Mitzenmacher, and Wieder (2008), families of very simple hash functions can be used, maintaining the favorable performance guarantees: with stash size s the probability of a rehash is O(1/n^(s+1)), and the evaluation time is O(s). Instead of the full randomness needed for the analysis of Kirsch et al. and of Kutzelnigg (2010) (resp. Θ(log n)-wise independence for standard cuckoo hashing), the new approach even works with 2-wise independent hash families as building blocks. Both construction and analysis build upon the work of Dietzfelbinger and Woelfel (2003). The analysis, which can also be applied to the fully random case, utilizes a graph counting argument and is much simpler than previous proofs. As a byproduct, an algorithm for simulating uniform hashing is obtained. While it requires about twice as much space as the most space-efficient solutions, it is attractive because of its simple and direct structure.
- Published
- 2012
- Full Text
- View/download PDF
213. On the Vulnerability of Hardware Hash Tables to Sophisticated Attacks
- Author
-
Udi Ben-Porat, Anat Bremler-Barr, Hanoch Levy, Bernhard Plattner, Computer Engineering and Networks Laboratory (TIK), Eidgenössische Technische Hochschule - Swiss Federal Institute of Technology [Zürich] (ETH Zürich), Computer Science department [Herzliya], Interdisciplinay Center Herzliya, School of Computer Science (TAU-CS), Tel Aviv University [Tel Aviv], Robert Bestak, Lukas Kencl, Li Erran Li, Joerg Widmer, Hao Yin, and TC 6
- Subjects
Computer science ,business.industry ,Universal hashing ,Dynamic perfect hashing ,Hash function ,020206 networking & telecommunications ,020207 software engineering ,02 engineering and technology ,Computer security ,computer.software_genre ,Hash table ,[INFO.INFO-NI]Computer Science [cs]/Networking and Internet Architecture [cs.NI] ,Cuckoo hashing ,Open addressing ,0202 electrical engineering, electronic engineering, information engineering ,[INFO]Computer Science [cs] ,business ,Consistent hashing ,computer ,Double hashing ,Computer hardware - Abstract
Part 3: Reliability and Resilience. Peacock and Cuckoo hashing schemes are currently the most studied hash implementations for hardware network systems (such as NIDS, firewalls, etc.). In this work we evaluate their vulnerability to sophisticated complexity Denial of Service (DoS) attacks. We show that an attacker can use insertion of carefully selected keys to hit the Peacock and Cuckoo hashing schemes at their weakest points. For Peacock hashing, we show that after the attacker fills up only a fraction (typically 5%-10%) of the buckets, the table completely loses its ability to handle collisions, causing the discard rate (of new keys) to increase dramatically (100 to 1,800 times higher). For Cuckoo hashing, we show an attack that can impose on the system an excessive number of memory accesses and degrade its performance. We analyze the vulnerability of the system as a function of the critical parameters and provide simulation results as well.
- Published
- 2012
- Full Text
- View/download PDF
214. Performance of linear hashing schemes for primary key retrieval
- Author
-
Yannis Manolopoulos and Nikos A. Lorentzos
- Subjects
Theoretical computer science ,Computer science ,Hash function ,Linear hashing ,Locality-sensitive hashing ,K-independent hashing ,Open addressing ,Data_FILES ,Computer Science::Data Structures and Algorithms ,Consistent hashing ,Computer Science::Databases ,Computer Science::Cryptography and Security ,Universal hashing ,Dynamic perfect hashing ,2-choice hashing ,Hash table ,Hopscotch hashing ,Cuckoo hashing ,Hardware and Architecture ,Locality preserving hashing ,Feature hashing ,Extendible hashing ,Algorithm ,Software ,Double hashing ,Information Systems - Abstract
Linear hashing is one of the most attractive dynamic hashing schemes. Linear hashing with partial expansions and linear hashing with priority splitting are two variations with improved space and time performance. Here, we propose a new structure, termed linear hashing with partial expansions and priority splitting. The four structures are compared by simulation, and it is shown that the new scheme outperforms its predecessors in both time and space costs.
- Published
- 1994
- Full Text
- View/download PDF
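Plain linear hashing, the base scheme all the variants above build on, splits one bucket at a time in a fixed order, so the table grows smoothly instead of doubling at once. The sketch below is an editor's illustration of the base scheme only (not the partial-expansion or priority-splitting variants); all names are mine.

```python
class LinearHash:
    """Plain linear hashing (sketch): buckets are split one at a time, in
    order; a round ends when every original bucket has been split."""
    def __init__(self, initial=4, max_load=0.75):
        self.n0 = initial                      # buckets at start of round
        self.split = 0                         # next bucket to split
        self.buckets = [[] for _ in range(initial)]
        self.count = 0
        self.max_load = max_load

    def _addr(self, key):
        i = hash(key) % self.n0
        if i < self.split:                     # bucket already split this
            i = hash(key) % (2 * self.n0)      # round: use the finer hash
        return i

    def insert(self, key, value):
        self.buckets[self._addr(key)].append((key, value))
        self.count += 1
        if self.count / len(self.buckets) > self.max_load:
            self._split_next()

    def _split_next(self):
        b = self.split
        items = self.buckets[b]
        self.buckets[b] = []
        self.buckets.append([])                # new bucket b + n0
        self.split += 1
        for key, value in items:               # redistribute b's items
            self.buckets[hash(key) % (2 * self.n0)].append((key, value))
        if self.split == self.n0:              # round complete
            self.n0 *= 2
            self.split = 0

    def get(self, key, default=None):
        for k, v in self.buckets[self._addr(key)]:
            if k == key:
                return v
        return default
```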
215. Coherent Parallel Hashing
- Author
-
Samuel Hornus, Anass Lasram, Ismael García, and Sylvain Lefebvre. Affiliations: Geometry and Graphics Group (GGG), Universitat de Girona (UdG); Geometry and Lighting (ALICE), INRIA Lorraine, Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), CNRS. Funding: Ministerio de Ciencia e Innovación, Spain (TIN2010-20590-C02-02); European Project GOODSHAPE (EC:FP7:ERC, ERC-2007-StG, 205693)
- Subjects
sparse data ,Computer science ,Hash function ,02 engineering and technology ,Parallel computing ,Linear hashing ,hashing ,K-independent hashing ,ACM: I.: Computing Methodologies/I.3: COMPUTER GRAPHICS/I.3.6: Methodology and Techniques/I.3.6.2: Graphics data structures and data types ,Open addressing ,Computer graphics ,parallel ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,Consistent hashing ,Sparse matrix ,Universal hashing ,Dynamic perfect hashing ,Infografia ,020207 software engineering ,spatial hashing ,2-choice hashing ,Computer Graphics and Computer-Aided Design ,Hash table ,[INFO.INFO-GR]Computer Science [cs]/Graphics [cs.GR] ,Hopscotch hashing ,Cuckoo hashing ,Locality preserving hashing ,Feature hashing ,[INFO.INFO-DC]Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC] ,Double hashing ,coherent - Abstract
Recent spatial hashing schemes hash millions of keys in parallel, compacting sparse spatial data in small hash tables while still allowing fast access from the GPU. Unfortunately, available schemes suffer from two drawbacks: multiple runs of the construction process are often required before success, and the random nature of the hash functions decreases access performance. We introduce a new parallel hashing scheme which reaches a high load factor with a very low failure rate. In addition, our scheme has the unique advantage of exploiting coherence in the data and the access patterns for faster performance. Compared to existing approaches, it exhibits much greater locality of memory accesses and consistent execution paths within groups of threads. This is especially well suited to computer graphics applications, where spatial coherence is common. In the absence of coherence our scheme performs similarly to previous methods, but does not suffer from construction failures. Our scheme is based on the Robin Hood scheme, modified to quickly abort queries of keys that are not in the table and to preserve coherence. We demonstrate our scheme on a variety of data sets, analyzing construction and access performance as well as cache and thread behavior. (From the French summary: the technique can insert several hundred million keys per second into tables that are 99% full, and gives very fast access when queries arrive in a coherent order, making it particularly suited to computer graphics applications.)
- Published
- 2011
- Full Text
- View/download PDF
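The Robin Hood scheme that the paper builds on, including the early-abort lookup it mentions, has a compact sequential core: an arriving key steals the slot of any resident that sits closer to its home bucket, which evens out probe lengths and lets a lookup stop as soon as it passes a "richer" resident. This editor's sketch shows that sequential core only (the paper runs it in parallel on a GPU), assumes the table never fills, and uses names of my own.

```python
def rh_insert(table, key):
    """Robin Hood insertion into an open-addressing table (a list with None
    for empty slots).  Assumes the table is never completely full."""
    n = len(table)
    pos, dist = hash(key) % n, 0
    while True:
        cur = table[pos]
        if cur is None:
            table[pos] = key
            return
        cur_dist = (pos - hash(cur) % n) % n
        if cur_dist < dist:                    # resident is richer: swap
            table[pos], key, dist = key, cur, cur_dist
        pos = (pos + 1) % n
        dist += 1

def rh_lookup(table, key):
    """Early abort: once we pass a resident closer to home than our probe
    distance, the key cannot occur further on, so it is absent."""
    n = len(table)
    pos, dist = hash(key) % n, 0
    while True:
        cur = table[pos]
        if cur is None:
            return False
        if cur == key:
            return True
        if (pos - hash(cur) % n) % n < dist:
            return False
        pos = (pos + 1) % n
        dist += 1
```

The early abort is what makes negative queries cheap at the 99% load factors the paper reports: a miss terminates after roughly the mean probe length rather than scanning to the next empty slot.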
216. Complementary hashing for approximate nearest neighbor search
- Author
-
Gang Zeng, Hao Xu, Shipeng Li, Nenghai Yu, Jingdong Wang, and Zhu Li
- Subjects
Primary clustering ,Theoretical computer science ,Universal hashing ,Computer science ,Dynamic perfect hashing ,Linear probing ,Nearest neighbor search ,Hash function ,2-choice hashing ,Linear hashing ,Hash table ,Hopscotch hashing ,Locality-sensitive hashing ,K-independent hashing ,Cuckoo hashing ,Open addressing ,SUHA ,Locality preserving hashing ,Data_FILES ,Feature hashing ,Consistent hashing ,Perfect hash function ,Extendible hashing ,Double hashing - Abstract
Recently, hashing-based approximate nearest neighbor (ANN) techniques have been attracting a lot of attention in computer vision. Data-dependent hashing methods, e.g., spectral hashing, are expected to perform better than their data-blind counterparts, e.g., locality sensitive hashing (LSH). However, most data-dependent hashing methods employ only a single hash table. When higher recall is desired, they have to retrieve an exponentially growing number of hash buckets around the bucket containing the query, which may drag down the precision rapidly. In this paper, we propose a complementary hashing approach, which is able to balance precision and recall in a more effective way. The key idea is to employ multiple complementary hash tables, learned sequentially in a boosting manner, so that, given a query, its true nearest neighbors missed from the active bucket of one hash table are more likely to be found in the active bucket of the next hash table. Compared with LSH, which can also exploit multiple hash tables, our approach is more effective at finding true NNs, thanks to the complementarity of its hash tables. Experimental results on large-scale ANN search show that the proposed method significantly improves performance and outperforms the state-of-the-art.
- Published
- 2011
- Full Text
- View/download PDF
217. Fast GPU-based locality sensitive hashing for k-nearest neighbor computation
- Author
-
Dinesh Manocha and Jia Pan
- Subjects
Cuckoo hashing ,Theoretical computer science ,Computer science ,Computation ,Hash function ,Parallel computing ,Queue ,Partition (database) ,Hash table ,Locality-sensitive hashing ,k-nearest neighbors algorithm - Abstract
We present an efficient GPU-based parallel LSH algorithm to perform approximate k-nearest neighbor computation in high-dimensional spaces. We use the Bi-level LSH algorithm, which can compute k-nearest neighbors with higher accuracy and is amenable to parallelization. During the first level, we use the parallel RP-tree algorithm to partition datasets into several groups so that items similar to each other are clustered together. The second level involves computing the Bi-level LSH code for each item and constructing a hierarchical hash table. The hash table is based on parallel cuckoo hashing and Morton curves. In the query step, we use GPU-based work queues to accelerate short-list search, which is one of the main bottlenecks in LSH-based algorithms. We demonstrate the results on large image datasets with 200,000 images, each represented as a 512-dimensional vector. In practice, our GPU implementation can obtain more than 40X acceleration over a single-core CPU-based LSH implementation.
- Published
- 2011
- Full Text
- View/download PDF
218. A balanced semi-supervised hashing method for CBIR
- Author
-
Xiangwei Kong, Haiyan Fu, and Jianhui Zhou
- Subjects
business.industry ,Universal hashing ,Computer science ,Dynamic perfect hashing ,Hash function ,Watermark ,Pattern recognition ,2-choice hashing ,Linear hashing ,Hash table ,k-nearest neighbors algorithm ,Hopscotch hashing ,Locality-sensitive hashing ,Cuckoo hashing ,Open addressing ,Locality preserving hashing ,Data_FILES ,Artificial intelligence ,Feature hashing ,business ,Consistent hashing ,Extendible hashing ,Double hashing - Abstract
Hashing methods have attracted much attention in large-scale image retrieval research in recent years, because they are not only fast but also require little memory. This paper proposes a balanced semi-supervised hashing method that divides an image into several blocks. Using improved semi-supervised hashing, we obtain a short hash code for each block; joined together, these form the hash code of the whole image. In the improved semi-supervised hashing, the supervision is obtained by combining the similarity of image pairs with label information. Extensive experiments demonstrate that, for CBIR, our method achieves a better balance among retrieval speed, storage of the original data, and retrieval accuracy than state-of-the-art hashing methods.
- Published
- 2011
- Full Text
- View/download PDF
219. Brief announcement
- Author
-
Michael Mitzenmacher and Michael T. Goodrich
- Subjects
Set (abstract data type) ,Cuckoo hashing ,Theoretical computer science ,Key (cryptography) ,Data mining ,Multimap ,computer.software_genre ,Inverted index ,Data structure ,computer ,Auxiliary memory ,Associative array ,Mathematics - Abstract
Many data structures support dictionaries, also known as maps or associative arrays, which store and manage a set of key-value pairs. A multimap is a generalization that allows multiple values to be associated with the same key. We study how multimaps can be implemented efficiently online in external-memory frameworks, with constant expected I/O. The key technique used to achieve our results is a combination of cuckoo hashing using buckets that hold multiple items with a multiqueue implementation to cope with varying numbers of values per key.
- Published
- 2011
- Full Text
- View/download PDF
220. External-Memory Multimaps
- Author
-
Michael T. Goodrich, Justin Thaler, Michael Mitzenmacher, and Elaine Angelino
- Subjects
FOS: Computer and information sciences ,Theoretical computer science ,General Computer Science ,Computer science ,Applied Mathematics ,0102 computer and information sciences ,02 engineering and technology ,Multimap ,Data structure ,Inverted index ,computer.software_genre ,01 natural sciences ,Computer Science Applications ,Set (abstract data type) ,Cuckoo hashing ,010201 computation theory & mathematics ,020204 information systems ,Computer Science - Data Structures and Algorithms ,0202 electrical engineering, electronic engineering, information engineering ,Data Structures and Algorithms (cs.DS) ,Out-of-core algorithm ,Memory model ,Data mining ,computer ,Auxiliary memory - Abstract
Many data structures support dictionaries, also known as maps or associative arrays, which store and manage a set of key-value pairs. A \emph{multimap} is a generalization that allows multiple values to be associated with the same key. For example, the inverted file data structure that is used prevalently in the infrastructure supporting search engines is a type of multimap, where words are used as keys and document pointers are used as values. We study the multimap abstract data type and how it can be implemented efficiently online in external memory frameworks, with constant expected I/O performance. The key technique used to achieve our results is a combination of cuckoo hashing using buckets that hold multiple items with a multiqueue implementation to cope with varying numbers of values per key. Our external-memory results are for the standard two-level memory model. (Accepted to ISAAC 2011; 22 pages, 6 figures.)
- Published
- 2011
221. A cuckoo hashing variant with improved memory utilization and insertion time
- Author
-
Ely Porat and Bar Shalem
- Subjects
FOS: Computer and information sciences ,Theoretical computer science ,Computer science ,Hash function ,Process (computing) ,Disjoint sets ,2-choice hashing ,Orders of magnitude (area) ,Cuckoo hashing ,Computer Science - Data Structures and Algorithms ,Data_FILES ,Data Structures and Algorithms (cs.DS) ,Page ,Algorithm ,Auxiliary memory - Abstract
Cuckoo hashing [4] is a multiple choice hashing scheme in which each item can be placed in multiple locations, and collisions are resolved by moving items to their alternative locations. In the classical implementation of two-way cuckoo hashing, the memory is partitioned into contiguous disjoint fixed-size buckets. Each item is hashed to two buckets, and may be stored in any of the positions within those buckets. Ref. [2] analyzed a variation in which the buckets are contiguous and overlap. However, many systems retrieve data from secondary storage in same-size blocks called pages. Fetching a page is a relatively expensive process; but once a page is fetched, its contents can be accessed orders of magnitude faster. We utilize this property of memory retrieval, presenting a variant of cuckoo hashing incorporating the following constraint: each bucket must be fully contained in a single page, but buckets are not necessarily contiguous. Empirical results show that this modification increases memory utilization and decreases the number of iterations required to insert an item. If each item is hashed to two buckets of capacity two, the page size is 8, and each bucket is fully contained in a single page, the memory utilization equals 89.71% in the classical contiguous disjoint bucket variant, 93.78% in the contiguous overlapping bucket variant, and increases to 97.46% in our new non-contiguous bucket variant. When the memory utilization is 92% and we use breadth first search to look for a vacant position, the number of iterations required to insert a new item is dramatically reduced from 545 in the contiguous overlapping buckets variant to 52 in our new non-contiguous bucket variant. In addition to the empirical results, we present a theoretical lower bound on the memory utilization of our variation as a function of the page size.
- Published
- 2011
- Full Text
- View/download PDF
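The classical two-way cuckoo insertion that the abstract above builds on (collisions resolved by moving items to their alternative locations) can be sketched as follows. This is a minimal single-slot sketch; the hash functions, table size, and eviction limit are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of classical two-way cuckoo hashing: each key has two
# candidate slots, and a collision evicts the resident key to its
# alternate slot. Parameters below are illustrative assumptions.

SIZE = 16          # number of slots in the table
MAX_KICKS = 32     # give up after this many evictions (a real table rehashes)
table = [None] * SIZE

def h1(key: int) -> int:
    return key % SIZE

def h2(key: int) -> int:
    return (key * 0x9E3779B1 >> 8) % SIZE  # a second, independent-looking hash

def insert(key: int) -> bool:
    """Insert key, evicting residents along a cuckoo path; False on failure."""
    pos = h1(key)
    for _ in range(MAX_KICKS):
        if table[pos] is None:
            table[pos] = key
            return True
        key, table[pos] = table[pos], key               # evict the resident
        pos = h2(key) if pos == h1(key) else h1(key)    # send it to its other slot
    return False  # likely a cycle; a full implementation would rehash

def lookup(key: int) -> bool:
    return table[h1(key)] == key or table[h2(key)] == key
```

Lookups probe at most two slots; the paper's variant keeps this property while constraining each bucket to lie within a single memory page.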
222. Cuckoo directory: A scalable directory for many-core systems
- Author
-
Ken Balet, Pejman Lotfi-Kamran, Michael Ferdman, and Babak Falsafi
- Subjects
010302 applied physics ,Hardware_MEMORYSTRUCTURES ,biology ,Distributed database ,Computer science ,Distributed computing ,02 engineering and technology ,Directory ,Parallel computing ,biology.organism_classification ,Cuckoo hash ,01 natural sciences ,020202 computer hardware & architecture ,Cuckoo hashing ,Many core ,Server ,0103 physical sciences ,Scalability ,0202 electrical engineering, electronic engineering, information engineering ,System on a chip ,Cuckoo - Abstract
Growing core counts have highlighted the need for scalable on-chip coherence mechanisms. The increase in the number of on-chip cores exposes the energy and area costs of scaling the directories. Duplicate-tag-based directories require highly associative structures that grow with core count, precluding scalability due to prohibitive power consumption. Sparse directories overcome the power barrier by reducing directory associativity, but require storage area over-provisioning to avoid high invalidation rates. We propose the Cuckoo directory, a power- and area-efficient scalable distributed directory. The Cuckoo directory scales to high core counts without the energy costs of wide associative lookup and without gross capacity over-provisioning. Simulation of a 16-core CMP with commercial server and scientific workloads shows that the Cuckoo directory eliminates invalidations while being up to four times more power-efficient than the Duplicate-tag directory, and 24% more power-efficient and up to seven times more area-efficient than the Sparse directory organization. Analytical projections indicate that the Cuckoo directory retains its energy and area benefits with increasing core count, efficiently scaling to at least 1024 cores.
- Published
- 2011
- Full Text
- View/download PDF
223. Simplificación de mallas de triángulos (Triangle mesh simplification)
- Author
-
Pasenau De Riera, Miguel Adolfo, Andújar Gran, Carlos Antonio, and Universitat Politècnica de Catalunya. Departament de Llenguatges i Sistemes Informàtics
- Subjects
Simplificación de malla ,Hash híbrido ,Visualització tridimensional (Informàtica) ,So, imatge i multimèdia::Creació multimèdia::Imatge digital [Àrees temàtiques de la UPC] ,Tablas hash ,Hash de cuco ,Level-of-detail ,Cuckoo hashing ,Mesh decimation ,Nivel de detalle ,Hybrid hash ,Three-dimensional display systems ,Image processing -- Digital techniques ,Computer Science::Data Structures and Algorithms ,Hash tables ,Representació visual en tres dimensions - Abstract
An algorithm has been developed to simplify triangle meshes using the uniform vertex clustering scheme coupled with a hybrid cuckoo hash table. The approach proposed by Christopher DeCoro has been used, but instead of his probabilistic octree for storing the occupied cells of the 3D grid, the hybrid cuckoo hash table proposed by Dan A. Alcantara has been implemented. This hybrid hash table combines classical sparse perfect hashing with the more recently developed cuckoo hashing, and is used to accumulate the quadric error functions and the vertices of each cell. An entirely multi-core CPU algorithm has been developed and tested with more than thirty models on three different platforms. The simplification algorithm has also been enhanced by forcing each simplified vertex to remain in its cell, correcting the normals of flipped triangles, outputting collapsed triangles as lines, and filtering out repeated triangles and lines. The hybrid hash table has been extended to support 64-bit keys. Some experiments with the hybrid hash are also presented.
- Published
- 2011
224. Cuckoo Hashing with Pages
- Author
-
Michael Mitzenmacher, Michael Rink, and Martin Dietzfelbinger
- Subjects
Information retrieval ,Database ,Computer science ,0102 computer and information sciences ,02 engineering and technology ,2-choice hashing ,computer.software_genre ,01 natural sciences ,Cuckoo hashing ,010201 computation theory & mathematics ,Backup ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,computer - Abstract
A downside of cuckoo hashing is that it requires lookups to multiple locations, making it a less compelling alternative when lookups are expensive. One such setting is when memory is arranged in large pages, and the major cost is the number of page accesses. We propose the study of cuckoo hashing with pages, advocating approaches where each key has several possible locations, or cells, on a single page, and additional choices on a second backup page. We show experimentally that with k cell choices on one page and a single backup cell choice, one can achieve nearly the same loads as when each key has k + 1 random cells to choose from, with most lookups requiring just one page access, even when keys are placed online using a simple algorithm. While our results are currently experimental, they suggest several interesting new open theoretical questions for cuckoo hashing with pages.
- Published
- 2011
- Full Text
- View/download PDF
225. External-Memory Multimaps
- Author
-
Elaine Angelino, Michael Mitzenmacher, Justin Thaler, and Michael T. Goodrich
- Subjects
Theoretical computer science ,Computer science ,0102 computer and information sciences ,02 engineering and technology ,Multimap ,Inverted index ,Data structure ,01 natural sciences ,Associative array ,Set (abstract data type) ,Cuckoo hashing ,010201 computation theory & mathematics ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,Out-of-core algorithm ,Auxiliary memory - Abstract
Many data structures support dictionaries, also known as maps or associative arrays, which store and manage a set of key-value pairs. A multimap is a generalization that allows multiple values to be associated with the same key. For example, the inverted file data structure commonly used in search engines is a type of multimap, with words as keys and document pointers as values. We study the multimap abstract data type and how it can be implemented efficiently online in external memory frameworks, with constant expected I/O performance. The key technique used to achieve our results is a combination of cuckoo hashing using buckets that hold multiple items with a multiqueue implementation to cope with varying numbers of values per key. Our results are provably optimal up to constant factors.
- Published
- 2011
- Full Text
- View/download PDF
226. A new hashing function
- Author
-
Shibai Tong, Shiyuan Yang, and Zhiyu Tian
- Subjects
Theoretical computer science ,Universal hashing ,Computer science ,Dynamic perfect hashing ,Linear probing ,Hash function ,2-choice hashing ,Linear hashing ,Rolling hash ,Hash table ,Management Information Systems ,Hopscotch hashing ,Locality-sensitive hashing ,K-independent hashing ,Open addressing ,Cuckoo hashing ,Hardware and Architecture ,SUHA ,Locality preserving hashing ,Feature hashing ,Consistent hashing ,Algorithm ,Extendible hashing ,Perfect hash function ,Double hashing - Abstract
Existing hashing functions have various limitations. In this paper a new hashing function is proposed, which divides the range of the key-values into equal segments and maps the key-values in each segment linearly onto the whole address range. The paper analyzes the statistical behavior of the function and points out that, theoretically, by increasing the number of segments, the distribution of the resulting hash values can always be made to approach uniformity, provided the key-values can be regarded as continuous. Two methods for obtaining the number of segments, one deterministic and one probabilistic, are also proposed, along with the algorithm.
- Published
- 1993
- Full Text
- View/download PDF
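The segmented-linear scheme described in the abstract above can be sketched in a few lines: split the key range into equal segments and map each segment linearly onto the full address range. The key range, segment count, and table size below are illustrative assumptions, not values from the paper.

```python
# Sketch of a segmented-linear hash: the key range [0, KEY_MAX) is split
# into S equal segments, and keys within each segment are mapped linearly
# onto the whole address range [0, M). All parameters are illustrative.

KEY_MAX = 1_000_000   # keys assumed to lie in [0, KEY_MAX)
S = 8                 # number of segments
M = 1024              # address range (table size)
SEG = KEY_MAX // S    # width of one segment

def seg_hash(key: int) -> int:
    offset = key % SEG          # position within the key's segment
    return (offset * M) // SEG  # linear map of the segment onto [0, M)
```

Increasing S makes each segment narrower, so the linear map spreads even locally clustered keys across the whole address range, which is the intuition behind the paper's uniformity claim.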
227. Multi-directory hashing
- Author
-
Henry Davies, Anastasia Analyti, Hsiao-Yu Chou, and Sakti Pramanik
- Subjects
Hardware_MEMORYSTRUCTURES ,Theoretical computer science ,Computer science ,Universal hashing ,Dynamic perfect hashing ,Hash function ,2-choice hashing ,Linear hashing ,Hash table ,Locality-sensitive hashing ,Hopscotch hashing ,K-independent hashing ,Open addressing ,Cuckoo hashing ,Hardware and Architecture ,Locality preserving hashing ,Data_FILES ,Feature hashing ,Consistent hashing ,Extendible hashing ,Software ,Double hashing ,Information Systems - Abstract
We present a new dynamic hashing scheme for disk-based databases, called Multi-Directory Hashing (MDH). MDH uses multiple hash directories to access a file. The size of each hash directory grows dynamically with the file size. The advantages of MDH are enhanced concurrency, improved bucket utilization and smaller total directory size than single-directory hashing. The expected utilization of MDH increases monotonically and approaches 100% as the number of hash directories increases. A variation of MDH, called Main Memory Multi-Directory Hashing (MM-MDH), is also described. MM-MDH achieves optimal search time in main memory databases. The performance of both methods is analyzed through theoretical and experimental results.
- Published
- 1993
- Full Text
- View/download PDF
228. The ZCache: Decoupling Ways and Associativity
- Author
-
Daniel Sanchez and Christos Kozyrakis
- Subjects
Smart Cache ,Cuckoo hashing ,Hardware_MEMORYSTRUCTURES ,Cache coloring ,Computer science ,Cache invalidation ,Multithreading ,Content-addressable storage ,Cache ,Parallel computing ,Cache pollution ,Cache algorithms ,CAS latency - Abstract
The ever-increasing importance of main memory latency and bandwidth is pushing CMPs towards caches with higher capacity and associativity. Associativity is typically improved by increasing the number of ways. This reduces conflict misses, but increases hit latency and energy, placing a stringent trade-off on cache design. We present the zcache, a cache design that allows much higher associativity than the number of physical ways (e.g. a 64-associative cache with 4 ways). The zcache draws on previous research on skew-associative caches and cuckoo hashing. Hits, the common case, require a single lookup, incurring the latency and energy costs of a cache with a very low number of ways. On a miss, additional tag lookups happen off the critical path, yielding an arbitrarily large number of replacement candidates for the incoming block. Unlike conventional designs, the zcache provides associativity by increasing the number of replacement candidates, but not the number of cache ways. To understand the implications of this approach, we develop a general analysis framework that allows us to compare associativity across different cache designs (e.g. a set-associative cache and a zcache) by representing associativity as a probability distribution. We use this framework to show that for zcaches, associativity depends only on the number of replacement candidates, and is independent of other factors (such as the number of cache ways or the workload). We also show that, for the same number of replacement candidates, the associativity of a zcache is superior to that of a set-associative cache for most workloads. Finally, we perform detailed simulations of multithreaded and multiprogrammed workloads on a large-scale CMP with the zcache as the last-level cache. We show that zcaches provide higher performance and better energy efficiency than conventional caches without incurring the overheads of designs with a large number of ways.
- Published
- 2010
- Full Text
- View/download PDF
229. The Power of Simple Tabulation Hashing
- Author
-
Mihai Pǎtraşcu and Mikkel Thorup
- Subjects
FOS: Computer and information sciences ,Discrete mathematics ,Universal hashing ,Dynamic perfect hashing ,Zobrist hashing ,2-choice hashing ,Hash table ,K-independent hashing ,Combinatorics ,Cuckoo hashing ,Artificial Intelligence ,Hardware and Architecture ,Control and Systems Engineering ,Computer Science - Data Structures and Algorithms ,Data Structures and Algorithms (cs.DS) ,Algorithm ,Software ,Double hashing ,Information Systems ,Mathematics ,Tabulation hashing - Abstract
Randomized algorithms are often enjoyed for their simplicity, but the hash functions used to yield the desired theoretical guarantees are often neither simple nor practical. Here we show that the simplest possible tabulation hashing provides unexpectedly strong guarantees. The scheme itself dates back to Zobrist in 1970 who used it for game playing programs. Keys are viewed as consisting of c characters. We initialize c tables H 1, ..., H c mapping characters to random hash codes. A key x = ( x 1 , ..., x c ) is hashed to H 1[ x 1] ⊕ ⋯ ⊕ H c [ x c ], where ⊕ denotes bit-wise exclusive-or. While this scheme is not even 4-independent, we show that it provides many of the guarantees that are normally obtained via higher independence, for example, Chernoff-type concentration, min-wise hashing for estimating set intersection, and cuckoo hashing.
- Published
- 2010
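The simple tabulation scheme described in the abstract above is short enough to sketch directly: view the key as c characters and XOR together per-position random table entries. The parameters below (c = 4 byte-sized characters, 32-bit hash codes) are illustrative assumptions.

```python
import random

# Sketch of simple (Zobrist-style) tabulation hashing: a key of C
# characters x1..xC is hashed to H1[x1] ^ ... ^ HC[xC], where each Hi is
# a table of random hash codes. C and BITS are illustrative assumptions.

C = 4       # number of 8-bit "characters" per key
BITS = 32   # width of the hash codes

random.seed(42)  # fixed seed so the tables are reproducible here
TABLES = [[random.getrandbits(BITS) for _ in range(256)] for _ in range(C)]

def tab_hash(key: int) -> int:
    """Hash a C-byte integer key by XOR-ing one table entry per byte."""
    h = 0
    for i in range(C):
        char = (key >> (8 * i)) & 0xFF  # extract the i-th byte of the key
        h ^= TABLES[i][char]
    return h
```

Each lookup is just C table reads and XORs, which is the simplicity the paper highlights; the surprise is that this weakly independent scheme still supports Chernoff-type concentration, min-wise hashing, and cuckoo hashing.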
230. Efficient nearest-neighbor computation for GPU-based motion planning
- Author
-
Dinesh Manocha, Jia Pan, and Christian Lauterbach
- Subjects
Cuckoo hashing ,Computational complexity theory ,Computer science ,Data parallelism ,Parallel algorithm ,Approximation algorithm ,Parallel computing ,Motion planning ,Time complexity ,Locality-sensitive hashing ,k-nearest neighbors algorithm - Abstract
We present a novel k-nearest neighbor search (KNNS) algorithm for proximity computation in motion planning that exploits the computational capabilities of many-core GPUs. Our approach uses locality sensitive hashing and cuckoo hashing to construct an efficient KNNS algorithm that has linear space and time complexity and exploits the multiple cores and data parallelism effectively. In practice, we observe an order-of-magnitude improvement in speed and scalability over prior GPU-based KNNS algorithms. On some benchmarks, our KNNS algorithm improves the performance of the overall planner by 20–40 times over a CPU-based planner and up to 2 times over a GPU-based planner.
- Published
- 2010
- Full Text
- View/download PDF
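The locality sensitive hashing building block the abstract above combines with cuckoo hashing can be sketched with random hyperplanes: nearby vectors tend to fall on the same side of each hyperplane and therefore get the same code. The dimensionality and code length below are illustrative assumptions, not the paper's configuration.

```python
import random

# Sketch of random-hyperplane LSH: each of NBITS random hyperplanes
# contributes one bit (the sign of the dot product), and the concatenated
# bits form the hash code. DIM and NBITS are illustrative assumptions.

DIM = 8      # vector dimensionality
NBITS = 4    # length of the hash code in bits

random.seed(0)
PLANES = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NBITS)]

def lsh_code(v):
    """Concatenate the signs of NBITS random projections into a bit code."""
    code = 0
    for i, plane in enumerate(PLANES):
        dot = sum(p * x for p, x in zip(plane, v))
        if dot >= 0:
            code |= 1 << i
    return code
```

Vectors hashing to the same code land in the same bucket, and the GPU pipeline in the paper stores such buckets in a cuckoo hash table for constant-time lookup.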
231. A distributed linear hashing enabling efficient retrieval for range queries
- Author
-
Tatsuo Tsuji and Ken Higuchi
- Subjects
Theoretical computer science ,Range query (data structures) ,Computer science ,Hash function ,Linear hashing ,Locality-sensitive hashing ,K-independent hashing ,Open addressing ,Consistent hashing ,Computer Science::Databases ,Computer Science::Cryptography and Security ,Universal hashing ,Dynamic perfect hashing ,Linear probing ,2-choice hashing ,Hash table ,Hopscotch hashing ,Cuckoo hashing ,Tree (data structure) ,Tree structure ,SUHA ,Locality preserving hashing ,Feature hashing ,Extendible hashing ,Perfect hash function ,Double hashing ,Tabulation hashing - Abstract
For efficient retrieval of data, the design of the data structure is important. Tree structures and hash tables are popular data structures. A hash table is a simple data structure that can be searched very fast for an exact-match query. For a range query, however, a hashing scheme must search many more data blocks than other data structures. Order-preserving linear hashing is one solution to this problem. Its hash function is a combination of a division function and a bit-reversal function. With this kind of hashing, nearby data can in many cases be stored in the same data block, as with tree structures, so good range-query performance is expected. In this paper, we evaluate the performance of order-preserving linear hashing in a distributed environment. The experimental results show that the proposed scheme performs better than traditional distributed linear hashing.
- Published
- 2010
- Full Text
- View/download PDF
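The division-plus-bit-reversal hash function mentioned in the abstract above can be sketched in a few lines. This is a hedged illustration of that combination only; the bit width is an assumption, and the paper's distributed scheme involves considerably more machinery.

```python
# Sketch of the hash function described above: a division (modulo) step
# followed by bit reversal of the resulting bucket index. BITS is an
# illustrative assumption (a table of 2**BITS buckets).

BITS = 4  # table of 16 buckets

def bit_reverse(x: int, bits: int) -> int:
    """Reverse the low `bits` bits of x (e.g. 0b0001 -> 0b1000 for 4 bits)."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (x & 1)
        x >>= 1
    return r

def op_linear_hash(key: int) -> int:
    return bit_reverse(key % (1 << BITS), BITS)
```

The reversal step is what lets linear hashing split buckets incrementally while keeping an order-friendly address layout, which is the property the paper exploits for range queries.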
232. Advanced Hashing with Hybrid Key Duplication for IP Address Lookup
- Author
-
Rujroj Tiengtavat and Wei-Ming Lin
- Subjects
Theoretical computer science ,Universal hashing ,business.industry ,Computer science ,Dynamic perfect hashing ,Hash function ,Linear hashing ,2-choice hashing ,Hash table ,K-independent hashing ,Hopscotch hashing ,Locality-sensitive hashing ,Cuckoo hashing ,Open addressing ,Coalesced hashing ,Pearson hashing ,SUHA ,Data_FILES ,Feature hashing ,business ,Consistent hashing ,Extendible hashing ,Double hashing ,Computer network - Abstract
Hashing techniques have been widely adopted for general IP address lookup and for network intrusion detection. In most commonly used XOR-hashing algorithms, each hash key bit is explicitly XORed at most once during the hash process, which may limit the amount of randomness the hashing process can introduce. This paper looks further into various ways of duplicating and re-using key bits to maximize the randomness of the hashing process and thereby further enhance overall performance.
- Published
- 2010
- Full Text
- View/download PDF
233. Load balancing and orientability thresholds for random hypergraphs
- Author
-
Pu Gao and Nicholas C. Wormald
- Subjects
Combinatorics ,Discrete mathematics ,Cuckoo hashing ,Hypergraph ,Conjecture ,Orientability ,Graph ,Vertex (geometry) ,Mathematics - Abstract
Let h>w>0 be two fixed integers. Let H be a random hypergraph whose hyperedges are all of cardinality h. To w-orient a hyperedge, we assign exactly w of its vertices positive signs with respect to the hyperedge, and the rest negative. A (w,k)-orientation of H consists of a w-orientation of all hyperedges of H, such that each vertex receives at most k positive signs from its incident hyperedges. When k is large enough, we determine the threshold for the existence of a (w,k)-orientation of a random hypergraph. The (w,k)-orientation of hypergraphs is strongly related to a general version of the off-line load balancing problem. The graph case, when h=2 and w=1, was solved recently by Cain, Sanders and Wormald and independently by Fernholz and Ramachandran, thereby settling a conjecture made by Karp and Saks. Motivated by a problem of cuckoo hashing, the special hypergraph case with w=k=1 was solved in three separate preprints dating from October 2009, by Frieze and Melsted, by Fountoulakis and Panagiotou, and by Dietzfelbinger, Goerdt, Mitzenmacher, Montanari, Pagh and Rink.
- Published
- 2010
- Full Text
- View/download PDF
234. Tight Thresholds for Cuckoo Hashing via XORSAT
- Author
-
Rasmus Pagh, Martin Dietzfelbinger, Michael Rink, Andreas Goerdt, Andrea Montanari, and Michael Mitzenmacher
- Subjects
Combinatorics ,Cuckoo hashing ,Open addressing ,Dynamic perfect hashing ,Data_FILES ,2-choice hashing ,Consistent hashing ,Algorithm ,Double hashing ,Hash table ,K-independent hashing ,Mathematics - Abstract
We settle the question of tight thresholds for offline cuckoo hashing. The problem can be stated as follows: we have n keys to be hashed into m buckets, each capable of holding a single key. Each key has k ≥ 3 (distinct) associated buckets chosen uniformly at random and independently of the choices of other keys. A hash table can be constructed successfully if each key can be placed into one of its buckets. We seek thresholds ck such that, as n goes to infinity, if n/m ≤ c for some c < ck then a hash table can be constructed successfully with high probability, and if n/m ≥ c for some c > ck then a hash table cannot be constructed successfully with high probability. Here we are considering the offline version of the problem, where all keys and hash values are given, so the problem is equivalent to previous models of multiple-choice hashing. We find the thresholds for all values of k > 2 by showing that they are in fact the same as the previously known thresholds for the random k-XORSAT problem. We then extend these results to the setting where keys can have differing numbers of choices, and make a conjecture (based on experimental observations) that extends our result to cuckoo hash tables storing multiple keys per bucket.
- Published
- 2010
- Full Text
- View/download PDF
235. Oblivious RAM Revisited
- Author
-
Tzachy Reinman and Benny Pinkas
- Subjects
TheoryofComputation_MISCELLANEOUS ,Sorting algorithm ,Computer science ,020206 networking & telecommunications ,02 engineering and technology ,Parallel computing ,Cuckoo hashing ,Constant (computer programming) ,020204 information systems ,Secure two-party computation ,0202 electrical engineering, electronic engineering, information engineering ,Overhead (computing) ,Oblivious ram ,Protocol (object-oriented programming) - Abstract
We reinvestigate the oblivious RAM concept introduced by Goldreich and Ostrovsky, which enables a client that can store only a constant amount of data locally to store n data items remotely and access them while hiding which items are being accessed. Oblivious RAM is often cited as a powerful tool, but is also commonly considered impractical due to its overhead, which is asymptotically efficient but quite high in practice. We redesign the oblivious RAM protocol using modern tools, namely cuckoo hashing and a new oblivious sorting algorithm. The resulting protocol uses only O(n) external memory, and replaces each data request by only O(log^2 n) requests.
- Published
- 2010
- Full Text
- View/download PDF
236. On the Insertion Time of Cuckoo Hashing
- Author
-
Angelika Steger, Konstantinos Panagiotou, and Nikolaos Fountoulakis
- Subjects
FOS: Computer and information sciences ,General Computer Science ,E.2 ,General Mathematics ,Dynamic perfect hashing ,Hash function ,G.2.2 ,Linear hashing ,Hash table ,Hopscotch hashing ,Cuckoo hashing ,Open addressing ,Computer Science - Data Structures and Algorithms ,Data Structures and Algorithms (cs.DS) ,Algorithm ,Double hashing ,Mathematics - Abstract
Cuckoo hashing is an efficient technique for creating large hash tables with high space utilization and guaranteed constant access times. There, each item can be placed in a location given by any one of k different hash functions. In this paper we further investigate the random walk heuristic for inserting new items into the hash table in an online fashion. Provided that k > 2 and that the number of items in the table is below (but arbitrarily close to) the theoretically achievable load threshold, we show a polylogarithmic bound on the maximum insertion time that holds with high probability. (27 pages; final version accepted by the SIAM Journal on Computing.)
- Published
- 2010
- Full Text
- View/download PDF
237. Orientability of Random Hypergraphs and the Power of Multiple Choices
- Author
-
Nikolaos Fountoulakis and Konstantinos Panagiotou
- Subjects
Random graph ,Combinatorics ,Discrete mathematics ,Connected component ,Hypergraph ,symbols.namesake ,Cuckoo hashing ,symbols ,Orientability ,Poisson distribution ,Giant component ,Vertex (geometry) ,Mathematics - Abstract
A hypergraph H = (V,E) is called s-orientable if there is an assignment of each edge e ∈ E to one of its vertices v ∈ e such that no vertex is assigned more than s edges. Let Hn,m,k be a hypergraph drawn uniformly at random from the set of all k-uniform hypergraphs with n vertices and m edges. In this paper we establish the threshold for the 1-orientability of Hn,m,k for all k ≥ 3, i.e., we determine a critical quantity ck* such that with probability 1 - o(1) the graph Hn,cn,k has a 1-orientation if c < ck*, and has no 1-orientation if c > ck*. We present two applications of this result that involve the paradigm of multiple choices. First, we show how it implies sharp load thresholds for cuckoo hash tables, where each element chooses k out of n locations. In particular, for each k ≥ 3 we prove that with probability 1 - o(1) the maximum number of elements that can be hashed is (1 - o(1))ck*n, and more items prevent a successful allocation. Second, we study random graph processes, where in each step we have the choice among any edge connecting k random vertices. Here we show the existence of a phase transition for avoiding a giant connected component, and quantify precisely the dependence on k.
- Published
- 2010
- Full Text
- View/download PDF
238. Massively Parallel Cuckoo Pattern Matching Applied for NIDS/NIPS
- Author
-
Surin Kittitornkun and Tran Ngoc Thinh
- Subjects
Cuckoo hashing ,Hardware_MEMORYSTRUCTURES ,Speedup ,biology ,Parallel processing (DSP implementation) ,Computer science ,Hash function ,Control reconfiguration ,Pattern matching ,Parallel computing ,biology.organism_classification ,Massively parallel ,Cuckoo - Abstract
This paper describes a Cuckoo-based Pattern Matching (CPM) engine based on a recently developed hashing algorithm called cuckoo hashing. We implement an improved parallel cuckoo hashing scheme suitable for hardware-based multi-pattern matching of patterns of arbitrary length. CPM can rapidly update a static pattern set without reconfiguration while consuming the lowest amount of hardware. With the power of massively parallel processing, the speedup of CPM is up to 128X compared with a serial cuckoo implementation. Compared with other hardware systems, CPM performs far better and saves 30% of the area.
- Published
- 2010
- Full Text
- View/download PDF
239. Lookup with CAM Aided Hash Table
- Author
-
Julong Lan, Yuxiang Hu, and Chengwei Wan
- Subjects
Cuckoo hashing ,Computer science ,Dynamic perfect hashing ,Hash function ,Parallel computing ,Linear hashing ,Rolling hash ,Perfect hash function ,Algorithm ,Hash table ,Double hashing - Abstract
Hashing is popularly adopted when handling a large number of IP flows. This paper focuses on the lookup performance of the Content Addressable Memory Aided Hash Table (CAHT). High throughput is achievable when the average number of memory accesses is minimized. By rational approximation, the paper provides a lower bound on the average number of memory accesses for a CAHT. When the hash function is perfect, or nearly so, the lower bound is achieved. The lower bound is validated through the analysis of some typical CAHTs, and simulation results agree with the theoretical model. With this lower bound and the overflow fraction of the CAHT, we can evaluate the throughput and the required CAM capacity, and thus choose the proper hashing scheme in actual applications.
- Published
- 2009
- Full Text
- View/download PDF
240. 3.5-Way Cuckoo Hashing for the Price of 2-and-a-Bit
- Author
-
Rina Panigrahy and Eric Lehman
- Subjects
Random graph ,Combinatorics ,Cuckoo hashing ,Hash function ,Ball (bearing) ,Disjoint sets ,Hash table ,Bin ,Search tree ,Mathematics - Abstract
The study of hashing is closely related to the analysis of balls and bins; items are hashed to memory locations much as balls are thrown into bins. In particular, Azar et al. [2] considered putting each ball in the less-full of two random bins. This lowers the probability that a bin exceeds a certain load from exponentially small to doubly exponentially small, giving maximum load log log n + O(1) with high probability. Cuckoo hashing [20] draws on this idea. Each item is hashed to two buckets of capacity k. If both are full, then the insertion procedure moves previously-inserted items to their alternate buckets to make space for the new item. In a natural implementation, the buckets are represented by partitioning a fixed array of memory into non-overlapping blocks of size k. An item is hashed to two such blocks and may be stored at any location within either one. We analyze a simple twist in which each item is hashed to two arbitrary size-k memory blocks. (So consecutive blocks are no longer disjoint, but rather overlap by k − 1 locations.) This twist increases the space utilization from 1 − (2/e + o(1))^k to 1 − (1/e + o(1))^{1.59k} in general. For k = 2, the new method improves utilization from 89.7% to 96.5%, yet lookups access only two items at each of two random locations. This result is surprising because the opposite happens in the non-cuckoo setting; if items are not moved during later insertions, then shifting from non-overlapping to overlapping blocks makes the distribution less uniform.
- Published
- 2009
- Full Text
- View/download PDF
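The overlapping-window idea from the abstract above can be sketched in a few lines. This is a minimal illustration only: the hash functions are stand-ins (Python's built-in `hash` on salted tuples), the eviction policy is a plain random walk, and none of it reflects the paper's actual construction or analysis.

```python
import random

class WindowedCuckoo:
    """Sketch of cuckoo hashing with overlapping size-k windows.

    Each key hashes to two window start positions in one shared array
    and may reside in any of the k consecutive slots of either window,
    so consecutive windows overlap by k - 1 locations."""

    def __init__(self, m, k=2, max_kicks=500):
        self.table = [None] * m
        self.m, self.k, self.max_kicks = m, k, max_kicks

    def _windows(self, key):
        # Two window start positions; illustrative stand-in hashes.
        return (hash(("a", key)) % (self.m - self.k + 1),
                hash(("b", key)) % (self.m - self.k + 1))

    def lookup(self, key):
        return any(self.table[s + i] == key
                   for s in self._windows(key) for i in range(self.k))

    def insert(self, key):
        for _ in range(self.max_kicks):
            slots = [s + i for s in self._windows(key) for i in range(self.k)]
            for pos in slots:
                if self.table[pos] is None:
                    self.table[pos] = key
                    return True
            # All 2k candidate slots full: evict a random occupant, retry.
            victim = random.choice(slots)
            key, self.table[victim] = self.table[victim], key
        return False  # give up; a real implementation would rehash
```

Even this toy version shows the shape of the result: with k = 2 each lookup inspects only four array slots at two random positions, while the overlap lets the table run at higher load than disjoint blocks would.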
241. Backyard Cuckoo Hashing: Constant Worst-Case Operations with a Succinct Representation
- Author
-
Yuriy Arbitman, Gil Segev, and Moni Naor
- Subjects
FOS: Computer and information sciences ,Computational complexity theory ,Linear probing ,Binary logarithm ,Data structure ,Upper and lower bounds ,Combinatorics ,Cuckoo hashing ,Computer Science - Data Structures and Algorithms ,Data Structures and Algorithms (cs.DS) ,Computer Science::Data Structures and Algorithms ,Perfect hash function ,Randomness ,Mathematics - Abstract
The performance of a dynamic dictionary is measured mainly by its update time, lookup time, and space consumption. In terms of update time and lookup time there are known constructions that guarantee constant-time operations in the worst case with high probability, and in terms of space consumption there are known constructions that use essentially optimal space. However, although the first analysis of a dynamic dictionary dates back more than 45 years (to Knuth's analysis of linear probing in 1963), the trade-off between these aspects of performance is still not completely understood. In this paper we settle two fundamental open problems: - We construct the first dynamic dictionary that enjoys the best of both worlds: it stores n elements using (1+epsilon)n memory words, and guarantees constant-time operations in the worst case with high probability. Specifically, for any epsilon = \Omega((\log\log n / \log n)^{1/2}) and for any sequence of polynomially many operations, with high probability over the randomness of the initialization phase, all operations are performed in constant time which is independent of epsilon. The construction is a two-level variant of cuckoo hashing, augmented with a "backyard" that handles a large fraction of the elements, together with a de-amortized perfect hashing scheme for eliminating the dependency on epsilon. - We present a variant of the above construction that uses only (1+o(1))B bits, where B is the information-theoretic lower bound for representing a set of size n taken from a universe of size u, and guarantees constant-time operations in the worst case with high probability, as before. This problem was open even in the amortized setting. One of the main ingredients of our construction is a permutation-based variant of cuckoo hashing, which significantly improves the space consumption of cuckoo hashing when dealing with a rather small universe., Comment: FOCS 2010
- Published
- 2009
- Full Text
- View/download PDF
242. Applications of a Splitting Trick
- Author
-
Martin Dietzfelbinger and Michael Rink
- Subjects
Set (abstract data type) ,Combinatorics ,Cuckoo hashing ,Simple (abstract algebra) ,Hash function ,Context (language use) ,Data structure ,Randomness ,Tabulation hashing ,Mathematics - Abstract
We study applications of a simple method for circumventing the "full randomness assumption" when building a hashing-based data structure for a set S of keys. The general approach is to "split" S into "pieces" S_i by a splitting hash function. On a piece S_i, a method or data structure for generating full randomness is used that uses more space than |S_i|. Under certain circumstances, this data structure can be "shared" among the constructions for the pieces S_i, which leads to a tighter overall space bound. The method was introduced in the context of cuckoo hashing and its variants, but it seems to have wider applicability. To demonstrate its power and some subtleties, we study three new applications, improving previous constructions: (i) Space-efficient simulation of full randomness on S (following work by Pagh and Pagh (2003/08) and Dietzfelbinger and Woelfel (2003)); (ii) Construction of highly independent functions in the style of Siegel (1989/2004); (iii) One-probe schemes as in work by Buhrman, Miltersen, Radhakrishnan, and Venkatesh (2000/02) and Pagh and Pagh (2002).
- Published
- 2009
- Full Text
- View/download PDF
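The first-level "splitting" step in the abstract above is simple enough to sketch. In this toy version a splitting hash routes each key to a piece S_i, and a plain dict stands in for the per-piece (possibly shared) full-randomness structure the paper actually builds; the hash is an illustrative stand-in, not the paper's.

```python
class SplitTable:
    """Sketch of the splitting idea: a first-level hash partitions the
    key set S into pieces S_i; each piece is then handled by a small
    per-piece structure (here a dict, standing in for the shared
    structure of the paper)."""

    def __init__(self, pieces):
        self.parts = [dict() for _ in range(pieces)]
        self.pieces = pieces

    def _piece(self, key):
        # Splitting hash function (stand-in): chooses the piece S_i.
        return hash(("split", key)) % self.pieces

    def put(self, key, val):
        self.parts[self._piece(key)][key] = val

    def get(self, key):
        return self.parts[self._piece(key)].get(key)
```

The point of the trick is that each piece is small enough that a space-hungry fully random construction becomes affordable, especially when one copy of that construction can be shared across all pieces.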
243. Some Open Questions Related to Cuckoo Hashing
- Author
-
Michael Mitzenmacher
- Subjects
Random graph ,Cuckoo hashing ,Information retrieval ,Work (electrical) ,Insertion time ,business.industry ,Computer science ,Hash function ,Artificial intelligence ,business ,Online setting ,Hash table - Abstract
The purpose of this brief note is to describe recent work in the area of cuckoo hashing, including a clear description of several open problems, with the hope of spurring further research.
- Published
- 2009
- Full Text
- View/download PDF
244. De-amortized Cuckoo Hashing: Provable Worst-Case Performance and Experimental Results
- Author
-
Yuriy Arbitman, Gil Segev, and Moni Naor
- Subjects
Cuckoo hashing ,Open addressing ,Theoretical computer science ,Universal hashing ,Dynamic perfect hashing ,Linear hashing ,Algorithm ,Perfect hash function ,Hash table ,Mathematics ,Hopscotch hashing - Abstract
Cuckoo hashing is a highly practical dynamic dictionary: it provides amortized constant insertion time, worst case constant deletion time and lookup time, and good memory utilization. However, with a noticeable probability during the insertion of n elements some insertion requires Ω(log n) time. Whereas such an amortized guarantee may be suitable for some applications, in other applications (such as high-performance routing) this is highly undesirable. Kirsch and Mitzenmacher (Allerton '07) proposed a de-amortization of cuckoo hashing using queueing techniques that preserve its attractive properties. They demonstrated a significant improvement to the worst case performance of cuckoo hashing via experimental results, but left open the problem of constructing a scheme with provable properties. In this work we present a de-amortization of cuckoo hashing that provably guarantees constant worst case operations. Specifically, for any sequence of polynomially many operations, with overwhelming probability over the randomness of the initialization phase, each operation is performed in constant time. In addition, we present a general approach for proving that the performance guarantees are preserved when using hash functions with limited independence instead of truly random hash functions. Our approach relies on a recent result of Braverman (CCC '09) showing that poly-logarithmic independence fools AC^0 circuits, and may find additional applications in various similar settings. Our theoretical analysis and experimental results indicate that the scheme is highly efficient, and provides a practical alternative to the only other known approach for constructing dynamic dictionaries with such worst case guarantees, due to Dietzfelbinger and Meyer auf der Heide (ICALP '90).
- Published
- 2009
- Full Text
- View/download PDF
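The queue-based de-amortization described above can be sketched as follows. This is a minimal illustration in the spirit of the Kirsch–Mitzenmacher / Arbitman–Naor–Segev approach, not their scheme: each operation spends only a constant budget of displacement steps, and items still being displaced wait in a small queue; the hash functions are stand-ins.

```python
from collections import deque

class DeamortizedCuckoo:
    """Sketch of queue-based de-amortized cuckoo hashing: every insert
    performs at most `steps` displacement moves, and any item not yet
    placed waits in a queue that later operations keep draining."""

    def __init__(self, m, steps=4):
        self.t = [None] * m
        self.m, self.steps = m, steps
        self.queue = deque()

    def _slots(self, key):
        # Two candidate slots; illustrative stand-in hashes.
        return hash(("h1", key)) % self.m, hash(("h2", key)) % self.m

    def lookup(self, key):
        # An item lives either in the table or still in the queue.
        return key in self.queue or any(
            self.t[s] == key for s in self._slots(key))

    def insert(self, key):
        self.queue.append(key)
        # Constant work budget per operation, spent on the queue front.
        for _ in range(self.steps):
            if not self.queue:
                return
            key = self.queue.popleft()
            s1, s2 = self._slots(key)
            if self.t[s1] is None:
                self.t[s1] = key
            elif self.t[s2] is None:
                self.t[s2] = key
            else:
                # Evict the occupant of s1; it rejoins the queue.
                self.t[s1], key = key, self.t[s1]
                self.queue.append(key)
```

The provable version additionally bounds the queue length with overwhelming probability, which is exactly what turns the amortized guarantee into a worst-case one.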
245. Order-preserving minimal perfect hash functions and information retrieval
- Author
-
Amjad M. Daoud, Edward A. Fox, Lenwood S. Heath, and Qi Fan Chen
- Subjects
Primary clustering ,Theoretical computer science ,Computer science ,Hash function ,Linear hashing ,Rolling hash ,K-independent hashing ,Open addressing ,Cryptographic hash function ,Consistent hashing ,Computer Science::Databases ,Information retrieval ,Universal hashing ,Dynamic perfect hashing ,2-choice hashing ,General Business, Management and Accounting ,Hash table ,Computer Science Applications ,Hopscotch hashing ,Hash tree ,Cuckoo hashing ,Hash chain ,Feature hashing ,Perfect hash function ,Double hashing ,Information Systems - Abstract
Rapid access to information is essential for a wide variety of retrieval systems and applications. Hashing has long been used when the fastest possible direct search is desired, but is generally not appropriate when sequential or range searches are also required. This paper describes a hashing method, developed for collections that are relatively static, that supports both direct and sequential access. Indeed, the algorithm described gives hash functions that are optimal in terms of time and hash table space utilization, and that preserve any a priori ordering desired. Furthermore, the resulting order preserving minimal perfect hash functions (OPMPHFs) can be found using space and time that is on average linear in the number of keys involved.
- Published
- 1991
- Full Text
- View/download PDF
246. Hopscotch Hashing
- Author
-
Moran Tzafrir, Nir Shavit, and Maurice Herlihy
- Subjects
Computer science ,Linear probing ,Hash function ,020206 networking & telecommunications ,020207 software engineering ,02 engineering and technology ,Parallel computing ,Hash table ,Hopscotch hashing ,Cuckoo hashing ,Synchronization (computer science) ,Chaining ,0202 electrical engineering, electronic engineering, information engineering ,Cache - Abstract
We present a new class of resizable sequential and concurrent hash map algorithms directed at both uni-processor and multicore machines. The new hopscotch algorithms are based on a novel hopscotch multi-phased probing and displacement technique that has the flavors of chaining, cuckoo hashing, and linear probing, all put together, yet avoids the limitations and overheads of these former approaches. The resulting algorithms provide tables with very low synchronization overheads and high cache hit ratios. In a series of benchmarks on a state-of-the-art 64-way Niagara II multicore machine, a concurrent version of hopscotch proves to be highly scalable, delivering in some cases 2 or even 3 times the throughput of today's most efficient concurrent hash algorithm, Lea's ConcurrentHashMap from java.concurr.util. Moreover, in tests on both Intel and Sun uni-processor machines, a sequential version of hopscotch consistently outperforms the most effective sequential hash table algorithms including cuckoo hashing and bounded linear probing. The most interesting feature of the new class of hopscotch algorithms is that they continue to deliver good performance when the hash table is more than 90% full, increasing their advantage over other algorithms as the table density grows.
- Published
- 2008
- Full Text
- View/download PDF
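The probing-and-displacement technique described above can be sketched in sequential form. This is a simplified illustration, not the authors' implementation: a tiny neighborhood size H is used (real tables use e.g. 32, matching a machine word of bitmap), there is no resizing or concurrency, and the hash function is a stand-in.

```python
H = 4  # neighborhood size; a real implementation would use e.g. 32

class Hopscotch:
    """Sketch of sequential hopscotch hashing: every key is stored
    within H slots of its home bucket. An empty slot found farther away
    by linear probing is "hopped" backward by displacing keys that stay
    inside their own neighborhoods, so lookups scan at most H slots."""

    def __init__(self, m):
        self.t = [None] * m
        self.m = m

    def _home(self, key):
        return hash(key) % self.m  # stand-in hash

    def lookup(self, key):
        h = self._home(key)
        return any(self.t[(h + i) % self.m] == key for i in range(H))

    def insert(self, key):
        h = self._home(key)
        # Linear-probe for the first empty slot (offset from h).
        free = next((d for d in range(self.m)
                     if self.t[(h + d) % self.m] is None), None)
        if free is None:
            return False
        # Hop the empty slot back until it lies within the neighborhood.
        while free >= H:
            moved = False
            for d in range(H - 1, 0, -1):  # candidates just before the hole
                cand = (h + free - d) % self.m
                ck = self.t[cand]
                if ck is None:
                    continue
                # ck may move into the hole only if the hole is still
                # within ck's own neighborhood of H slots.
                if (h + free - self._home(ck)) % self.m < H:
                    self.t[(h + free) % self.m] = ck
                    self.t[cand] = None
                    free -= d
                    moved = True
                    break
            if not moved:
                return False  # a real table would resize here
        self.t[(h + free) % self.m] = key
        return True
```

Because every key sits within H slots of its home, a lookup touches one short, cache-friendly run of memory, which is the source of the high-density performance the abstract reports.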
247. More Robust Hashing: Cuckoo Hashing with a Stash
- Author
-
Udi Wieder, Adam Kirsch, and Michael Mitzenmacher
- Subjects
Cuckoo hashing ,Open addressing ,Theoretical computer science ,Universal hashing ,Dynamic perfect hashing ,Linear hashing ,Consistent hashing ,Algorithm ,Hash table ,Hopscotch hashing ,Mathematics - Abstract
Cuckoo hashing holds great potential as a high-performance hashing scheme for real applications. Up to this point, the greatest drawback of cuckoo hashing appears to be that there is a polynomially small but practically significant probability that a failure occurs during the insertion of an item, requiring an expensive rehashing of all items in the table. In this paper, we show that this failure probability can be dramatically reduced by the addition of a very small constant-sized stash. We demonstrate both analytically and through simulations that stashes of size equivalent to only three or four items yield tremendous improvements, enhancing cuckoo hashing's practical viability in both hardware and software. Our analysis naturally extends previous analyses of multiple cuckoo hashing variants, and the approach may prove useful in further related schemes.
- Published
- 2008
- Full Text
- View/download PDF
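The stash idea from the abstract above is easy to sketch: when an insertion's displacement walk runs too long, the wandering item is parked in a tiny constant-sized stash instead of triggering a full rehash, and lookups also check the stash. This is a minimal illustration with stand-in hash functions, not the paper's analyzed construction.

```python
import random

class StashCuckoo:
    """Sketch of cuckoo hashing with a constant-sized stash: failed
    displacement walks park an item in the stash rather than forcing
    an expensive rehash of the whole table."""

    def __init__(self, m, stash_size=4, max_kicks=100):
        self.t = [None] * m
        self.m, self.max_kicks = m, max_kicks
        self.stash, self.stash_size = [], stash_size

    def _slots(self, key):
        # Two candidate slots; illustrative stand-in hashes.
        return hash(("h1", key)) % self.m, hash(("h2", key)) % self.m

    def lookup(self, key):
        s1, s2 = self._slots(key)
        return self.t[s1] == key or self.t[s2] == key or key in self.stash

    def insert(self, key):
        for _ in range(self.max_kicks):
            s1, s2 = self._slots(key)
            if self.t[s1] is None:
                self.t[s1] = key
                return True
            if self.t[s2] is None:
                self.t[s2] = key
                return True
            pos = random.choice((s1, s2))
            key, self.t[pos] = self.t[pos], key  # evict and keep walking
        if len(self.stash) < self.stash_size:
            self.stash.append(key)  # park the homeless item in the stash
            return True
        return False  # stash full: a real implementation rehashes
```

The paper's point is quantitative: a stash of only three or four items drives the rehash probability down dramatically, at the cost of a constant-size extra scan on each lookup.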
248. History-Independent Cuckoo Hashing
- Author
-
Udi Wieder, Moni Naor, and Gil Segev
- Subjects
Cuckoo hashing ,Amortized analysis ,Open addressing ,Hardware_MEMORYSTRUCTURES ,Theoretical computer science ,Computer science ,Dynamic perfect hashing ,2-choice hashing ,Consistent hashing ,Algorithm ,Hash table ,Hopscotch hashing - Abstract
Cuckoo hashing is an efficient and practical dynamic dictionary. It provides expected amortized constant update time, worst case constant lookup time, and good memory utilization. Various experiments demonstrated that cuckoo hashing is highly suitable for modern computer architectures and distributed settings, and offers significant improvements compared to other schemes. In this work we construct a practical history-independent dynamic dictionary based on cuckoo hashing. In a history-independent data structure, the memory representation at any point in time yields no information on the specific sequence of insertions and deletions that led to its current content, other than the content itself. Such a property is significant when preventing unintended leakage of information, and was also found useful in several algorithmic settings. Our construction enjoys most of the attractive properties of cuckoo hashing. In particular, no dynamic memory allocation is required, updates are performed in expected amortized constant time, and membership queries are performed in worst case constant time. Moreover, with high probability, the lookup procedure queries only two memory entries which are independent and can be queried in parallel. The approach underlying our construction is to enforce a canonical memory representation on cuckoo hashing. That is, up to the initial randomness, each set of elements has a unique memory representation.
- Published
- 2008
- Full Text
- View/download PDF
249. Memory-Efficient Fuzzy Fingerprint Vault based on the Geometric Hashing
- Author
-
Hanna Choi, Sungju Lee, Yongwha Chung, and Daesung Moon
- Subjects
Cuckoo hashing ,Theoretical computer science ,Computer science ,Dynamic perfect hashing ,Hash function ,Linear hashing ,Rolling hash ,Algorithm ,Double hashing ,Hash table ,Hopscotch hashing - Abstract
One of the solutions to the auto-alignment problem in the fuzzy fingerprint vault exploited the idea of the geometric hashing technique. Although this solution can provide higher verification accuracy, it requires more memory space due to the large size of the hash table. In this paper, we propose an approach to reduce the size of the hash table by using a time-memory tradeoff without sacrificing the verification accuracy. That is, instead of generating the full hash table at the enrollment phase, our approach generates the enrollment hash table "on-the-fly" at the verification phase. The size of the hash table can be reduced further by selecting the basis set carefully. Based on the experimental results, we confirm that the proposed approach can reduce both the static and the dynamic memory requirements without sacrificing either the verification accuracy or the security level.
- Published
- 2008
- Full Text
- View/download PDF
250. Blooming Trees for Minimal Perfect Hashing
- Author
-
Gianni Antichi, Gregorio Procissi, Fabio Vitucci, Domenico Ficara, and Stefano Giordano
- Subjects
Primary clustering ,Theoretical computer science ,Computer science ,Hash function ,Cryptography ,Linear hashing ,Rolling hash ,K-independent hashing ,Open addressing ,Cryptographic hash function ,Consistent hashing ,business.industry ,Universal hashing ,Dynamic perfect hashing ,SWIFFT ,Bloom filter ,2-choice hashing ,Hash table ,Hash tree ,Cuckoo hashing ,SUHA ,Hash chain ,Feature hashing ,business ,Perfect hash function ,Double hashing - Abstract
Hash tables are used in many networking applications, such as lookup and packet classification. However, the need for collision resolution makes them slow and unsuitable for fast operations. Therefore, perfect hash functions have been introduced to make the hashing mechanism more efficient. In particular, a minimal perfect hash function is a function that maps a set of n keys into a set of n integers without collisions. In the literature, there are many schemes to construct a minimal perfect hash function, based either on mathematical properties of polynomials or on graph theory. This paper proposes a new scheme which shows remarkable results in terms of space consumption and processing speed. It is based on an alternative to Bloom filters and requires about 4 bits per key and 12.8 seconds to construct an MPHF with 3.8×10^9 elements.
- Published
- 2008
- Full Text
- View/download PDF
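Since the scheme above is built as an alternative to Bloom filters, a minimal standard Bloom filter is useful background. This is textbook material, not the paper's Blooming Trees structure; the hash functions are stand-ins derived from Python's built-in `hash`.

```python
class BloomFilter:
    """Minimal standard Bloom filter (background for the abstract
    above). With m bits and k hash functions storing n keys, the
    false-positive rate is roughly (1 - e^(-kn/m))^k; there are never
    false negatives."""

    def __init__(self, m, k):
        self.bits = [False] * m
        self.m, self.k = m, k

    def _positions(self, key):
        # k bit positions per key; illustrative stand-in hashes.
        return [hash((i, key)) % self.m for i in range(self.k)]

    def add(self, key):
        for p in self._positions(key):
            self.bits[p] = True

    def __contains__(self, key):
        return all(self.bits[p] for p in self._positions(key))
```

A perfect-hashing scheme like the one above trades the Bloom filter's false positives for a collision-free mapping, which is why its space budget (about 4 bits per key here) is the figure of merit.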