162 results on '"Non-blocking algorithm"'
Search Results
2. Duplicacy: A New Generation of Cloud Backup Tool Based on Lock-Free Deduplication
- Author
-
Gilbert Gang Chen, Yangdong Deng, and Zonghui Li
- Subjects
Record locking ,Computer Networks and Communications ,Computer science ,business.industry ,Cloud computing ,Computer Science Applications ,Hardware and Architecture ,Backup ,Server ,Scalability ,Data_FILES ,Non-blocking algorithm ,Data deduplication ,business ,Cloud storage ,Software ,Information Systems ,Computer network - Abstract
The pervasive deployment of cloud services poses an ever-increasing demand for cross-client deduplication solutions to save network bandwidth, lower storage costs, and improve backup speeds. However, existing solutions typically depend on lock-based approaches relying on a centralized chunk database, which tends to hinder performance scalability. In this work, we present a new cross-client cloud backup solution, named Duplicacy, based on a Lock-Free Deduplication approach. Lock-Free Deduplication stores chunks to network or cloud storage using content hashes as file names. It then adopts a two-step fossil deletion algorithm to solve the hard problem of deleting unreferenced chunks in the presence of concurrent backups, without the need for any locks. Experiments demonstrate that Duplicacy delivers significant backup performance improvements over previous well-known backup tools. In addition, Duplicacy can work with many general-purpose network or cloud storage services that only support a basic set of file operations, turning them into sophisticated deduplication-aware storage servers without server-side changes.
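A minimal sketch of the content-addressed chunk naming idea described in this abstract: a chunk is written to a file whose name is derived from a hash of its contents, so a client that produces a chunk another client already uploaded simply finds the file present and skips the write, with no lock or central chunk database. The storage layout and the use of std::hash (instead of a real cryptographic content hash) are illustrative assumptions, not the tool's actual implementation.

```cpp
#include <filesystem>
#include <fstream>
#include <functional>
#include <string>

namespace fs = std::filesystem;

// Store a chunk under a name derived from its contents. If an identical chunk
// already exists, skip the upload - this is what enables cross-client
// deduplication without coordination.
// NOTE: std::hash is only a stand-in for a real cryptographic content hash.
bool store_chunk(const fs::path& storage_dir, const std::string& chunk) {
    const std::string name = std::to_string(std::hash<std::string>{}(chunk));
    const fs::path target = storage_dir / name;
    if (fs::exists(target)) {
        return false;                      // deduplicated: chunk already stored
    }
    const fs::path tmp = storage_dir / (name + ".tmp");
    std::ofstream(tmp, std::ios::binary) << chunk;
    fs::rename(tmp, target);               // atomic publish on most filesystems
    return true;                           // new chunk uploaded
}
```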
- Published
- 2022
3. BQ: A Lock-Free Queue with Batching
- Author
-
Yossi Lev, Gal Milman, Erez Petrank, Victor Luchangco, and Alex Kogan
- Subjects
Multi-core processor ,Linearizability ,Computer science ,Concurrent data structure ,020207 software engineering ,0102 computer and information sciences ,02 engineering and technology ,Parallel computing ,Data structure ,01 natural sciences ,Computer Science Applications ,Computational Theory and Mathematics ,010201 computation theory & mathematics ,Hardware and Architecture ,Modeling and Simulation ,0202 electrical engineering, electronic engineering, information engineering ,Non-blocking algorithm ,Concurrent computing ,Performance improvement ,Queue ,Software - Abstract
Concurrent data structures provide fundamental building blocks for concurrent programming. Standard concurrent data structures may be extended by allowing a sequence of operations to be submitted as a batch for later execution. Such a batch can then be executed more efficiently than the standard execution of one operation at a time. In this article, we develop a novel algorithmic extension to the prevalent FIFO queue data structure that exploits such batching scenarios. An implementation in C++ on a multicore machine demonstrates a significant performance improvement of more than an order of magnitude (depending on the batch lengths and the number of threads) compared to previous queue implementations.
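The benefit of batching can be conveyed with a much simpler structure than BQ itself: a Vyukov-style multi-producer/single-consumer intrusive queue in which a producer pre-links a whole batch of nodes and publishes it with a single atomic exchange, instead of paying one synchronized operation per element. This is only an illustrative sketch under that assumption, not the BQ algorithm from the paper (and a producer stalled between its two steps can briefly hide the batch from the consumer, so it is weaker than a fully lock-free queue).

```cpp
#include <atomic>

struct Node {
    int value;
    std::atomic<Node*> next{nullptr};
};

// Simplified MPSC queue: producers publish a pre-linked batch with one
// atomic exchange on the tail; the single consumer walks the next pointers.
class BatchQueue {
    Node stub_;                       // sentinel node
    std::atomic<Node*> tail_{&stub_};
    Node* head_{&stub_};              // owned by the single consumer
public:
    // 'first'..'last' is a chain already linked via next; last->next == nullptr.
    void enqueue_batch(Node* first, Node* last) {
        Node* prev = tail_.exchange(last, std::memory_order_acq_rel);
        prev->next.store(first, std::memory_order_release);  // publish whole batch
    }
    // Returns the next node, or nullptr if the queue appears empty.
    Node* dequeue() {
        Node* next = head_->next.load(std::memory_order_acquire);
        if (next == nullptr) return nullptr;
        head_ = next;
        return next;                  // caller reads next->value
    }
};
```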
- Published
- 2022
4. Concurrent linearizable nearest neighbour search in LockFree-kD-tree
- Author
-
Ivan Walulya, Bapi Chatterjee, and Philippas Tsigas
- Subjects
Structure (mathematical logic) ,Linearizability ,Theoretical computer science ,General Computer Science ,Concurrent data structure ,Computer science ,Nearest neighbor search ,020207 software engineering ,0102 computer and information sciences ,02 engineering and technology ,Data structure ,Abstract data type ,01 natural sciences ,Theoretical Computer Science ,k-d tree ,010201 computation theory & mathematics ,Scalability ,0202 electrical engineering, electronic engineering, information engineering ,Non-blocking algorithm - Abstract
Nearest neighbour search (NNS) is a fundamental problem in many application domains dealing with multidimensional data. In a concurrent setting, where dynamic modifications are allowed, a linearizable implementation of the NNS is highly desirable. This paper introduces the LockFree-kD-tree (LFkD-tree): a lock-free concurrent kD-tree, which implements an abstract data type (ADT) that provides the operations Add, Remove, Contains, and NNS. Our implementation is linearizable. The operations in the LFkD-tree use single-word read and compare-and-swap (CAS) atomic primitives, which are readily supported on available multi-core processors. We experimentally evaluate the LFkD-tree using several benchmarks comprising real-world and synthetic datasets. The experiments show that the presented design is scalable and achieves significant speed-up compared to the implementations of an existing sequential kD-tree and a recently proposed multidimensional indexing structure, PH-tree.
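A minimal sketch of the single-word compare-and-swap pattern the abstract refers to, using std::atomic: a helper tries to swing a child pointer from an expected old subtree to a new one and reports whether a concurrent updater got there first, in which case the caller re-reads and retries. The node layout is a placeholder, not the LFkD-tree's actual structure.

```cpp
#include <atomic>

struct KdNode {
    double key[2];                        // illustrative 2-D point
    std::atomic<KdNode*> left{nullptr};
    std::atomic<KdNode*> right{nullptr};
};

// Attempt to replace 'expected' with 'replacement' in the given child slot.
// Returns false if another thread modified the child first; the caller then
// re-traverses and retries - the usual lock-free update pattern.
bool swing_child(std::atomic<KdNode*>& child, KdNode* expected, KdNode* replacement) {
    return child.compare_exchange_strong(expected, replacement,
                                         std::memory_order_acq_rel,
                                         std::memory_order_acquire);
}
```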
- Published
- 2021
5. On the implementation of memory reclamation methods in a lock-free hash trie design
- Author
-
Pedro Moreno, Miguel Areias, and Ricardo Rocha
- Subjects
Theoretical computer science ,Record locking ,Computer Networks and Communications ,Computer science ,Hash function ,020206 networking & telecommunications ,02 engineering and technology ,Data structure ,Hash table ,Theoretical Computer Science ,Artificial Intelligence ,Hardware and Architecture ,Hash trie ,Trie ,Data_FILES ,0202 electrical engineering, electronic engineering, information engineering ,Key (cryptography) ,Non-blocking algorithm ,020201 artificial intelligence & image processing ,Software - Abstract
Hash tries are a trie-based data structure with nearly ideal characteristics for the implementation of hash maps. Starting from a particular lock-free hash map data structure, named Lock-Free Hash Tries, we focus on solving the problem of memory reclamation without losing the lock-freedom property. To the best of our knowledge, outside garbage collected environments, there is no current implementation of hash maps that is able to reclaim memory in a lock-free manner. To achieve this goal, we propose an approach for memory reclamation specific to Lock-Free Hash Tries that exploits the characteristics of its structure in order to achieve efficient memory reclamation with low and well-defined memory bounds. We present and discuss in detail the key algorithms required to easily reproduce our implementation by others. Experimental results show that our approach obtains better results when compared with other state-of-the-art memory reclamation methods and provides a competitive and scalable hash map implementation when compared to lock-based implementations.
- Published
- 2021
6. Lock-free Contention Adapting Search Trees
- Author
-
Konstantinos Sagonas, Kjell Winblad, and Bengt Jonsson
- Subjects
Linearizability ,Range query (data structures) ,Computer science ,Concurrent data structure ,Distributed computing ,020207 software engineering ,0102 computer and information sciences ,02 engineering and technology ,Data structure ,01 natural sciences ,Computer Science Applications ,Computational Theory and Mathematics ,010201 computation theory & mathematics ,Hardware and Architecture ,Modeling and Simulation ,Scalability ,0202 electrical engineering, electronic engineering, information engineering ,Non-blocking algorithm ,Software - Abstract
Concurrent key-value stores with range query support are crucial for the scalability and performance of many applications. Existing lock-free data structures of this kind use a fixed synchronization granularity. Using a fixed synchronization granularity in a concurrent key-value store with range query support is problematic as the best performing synchronization granularity depends on a number of factors that are difficult to predict, such as the level of contention and the number of items that are accessed by range queries. We present the first linearizable lock-free key-value store with range query support that dynamically adapts its synchronization granularity. This data structure is called the lock-free contention adapting search tree (LFCA tree). An LFCA tree automatically performs local adaptations of its synchronization granularity based on heuristics that take contention and the performance of range queries into account. We show that the operations of LFCA trees are linearizable, that the lookup operation is wait-free, and that the remaining operations (insert, remove and range query) are lock-free. Our experimental evaluation shows that LFCA trees achieve more than twice the throughput of related lock-free data structures in many scenarios. Furthermore, LFCA trees are able to perform substantially better than data structures with a fixed synchronization granularity over a wide range of scenarios due to their ability to adapt to the scenario at hand.
- Published
- 2021
7. On the correctness and efficiency of a novel lock-free hash trie map design
- Author
-
Miguel Areias and Ricardo Rocha
- Subjects
Correctness ,Theoretical computer science ,Computer Networks and Communications ,Computer science ,Hash function ,020206 networking & telecommunications ,02 engineering and technology ,Data structure ,Hash table ,Theoretical Computer Science ,Artificial Intelligence ,Hardware and Architecture ,Hash trie ,Trie ,0202 electrical engineering, electronic engineering, information engineering ,Non-blocking algorithm ,020201 artificial intelligence & image processing ,Software - Abstract
Hash tries are a trie-based data structure with nearly ideal characteristics for the implementation of hash maps. In this paper, we present a novel, simple and scalable hash trie map design that fully supports the concurrent search, insert and remove operations on hash maps. To the best of our knowledge, our proposal is the first that combines the following characteristics: (i) it is lock-free; (ii) it uses fixed-size data structures; and (iii) it maintains access to all internal data structures as persistent memory references. Our design is modular enough to allow different types of configurations aimed at different trade-offs between memory usage and execution time, and can be easily implemented in any type of language, library or within other complex data structures. We discuss in detail the key algorithms required to easily reproduce our implementation by others and we present a proof of correctness showing that our proposal is linearizable and lock-free for the search, insert and remove operations. Experimental results show that our proposal is quite competitive when compared against other state-of-the-art proposals implemented in Java.
- Published
- 2021
8. HD-Tree: High performance Lock-Free Nearest Neighbor Search KD-Tree
- Author
-
Sang-gi Lee and NaiHoon Jung
- Subjects
Combinatorics ,Tree (data structure) ,k-d tree ,Nearest neighbor search ,Non-blocking algorithm ,Mathematics - Published
- 2020
9. Concurrent updates to pages with fixed-size rows using lock-free algorithms
- Author
-
Hanuma Kodavalla, Raghavendra Thallam Kodandaramaih, and Girish Mittur Venkataramanappa
- Subjects
Event (computing) ,Computer science ,Transaction log ,Concurrency ,Synchronization (computer science) ,General Engineering ,Non-blocking algorithm ,Operating system ,Paging ,computer.software_genre ,Throughput (business) ,computer ,Row - Abstract
Database systems based on the ARIES [11] protocol rely on Write Ahead Logging (WAL) to recover the database in the event of a crash. The WAL protocol requires that changes to the database be recorded in the transaction log before the underlying database page is updated. WAL also mandates that the log record corresponding to the change is persisted to disk before the updated page. While WAL allows updates to the database using in-place updates or shadow paging, database systems that perform in-place updates typically latch the page exclusively for the entire duration of log generation and the change on the page. The exclusive latch on the page prevents other threads from modifying the page at the same time, reducing concurrency and negatively impacting the throughput of the system. While approaches like Segment-Based recovery [16] attempt to solve the contention by pushing the burden of synchronization to the application, along with a proposal for recovering parts of pages, this paper takes a different approach by providing a mechanism to support concurrent updates to certain kinds of pages under a shared latch using lock-free algorithms. The pages are recovered using the existing ARIES protocol with a few modifications. This approach significantly boosts the throughput of an ARIES-based database system, without any application changes. The paper describes in detail the challenges of implementing the mechanism and how ARIES concepts like the page LSN, logging and checkpoints are handled to support concurrent updates on space maintenance pages in Microsoft SQL Server. The paper also presents experimental results showcasing the impact of the work.
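One way such pages can accept concurrent inserts under a shared latch is to keep the row-allocation state in an atomic bitmap and claim a free fixed-size slot with a single read-modify-write, as sketched below. This is a hedged illustration of the general technique, not SQL Server's actual page layout or logging protocol.

```cpp
#include <atomic>
#include <cstdint>
#include <optional>

// A page holding up to 64 fixed-size rows; bit i of 'allocated' marks slot i in use.
struct Page {
    std::atomic<std::uint64_t> allocated{0};
    // ... 64 fixed-size row slots would follow here ...
};

// Claim a free slot without an exclusive latch: fetch_or atomically sets the
// bit; if it was already set, another thread won the race and we try the next.
std::optional<int> claim_row_slot(Page& page) {
    std::uint64_t snapshot = page.allocated.load(std::memory_order_relaxed);
    for (int slot = 0; slot < 64; ++slot) {
        const std::uint64_t bit = std::uint64_t{1} << slot;
        if (snapshot & bit) continue;                       // looks taken, skip
        const std::uint64_t prev =
            page.allocated.fetch_or(bit, std::memory_order_acq_rel);
        if ((prev & bit) == 0) return slot;                 // we own this slot
        snapshot = prev;                                    // lost the race, refresh view
    }
    return std::nullopt;                                    // page full
}
```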
- Published
- 2020
10. FEAST
- Author
-
Aravind Natarajan, Arunmoezhi Ramachandran, and Neeraj Mittal
- Subjects
Amortized analysis ,Computer science ,Concurrent data structure ,020207 software engineering ,0102 computer and information sciences ,02 engineering and technology ,Parallel computing ,01 natural sciences ,Computer Science Applications ,Tree (data structure) ,Computational Theory and Mathematics ,Shared memory ,010201 computation theory & mathematics ,Hardware and Architecture ,Binary search tree ,Modeling and Simulation ,0202 electrical engineering, electronic engineering, information engineering ,Non-blocking algorithm ,Enhanced Data Rates for GSM Evolution ,Software - Abstract
We present a lock-free algorithm for concurrent manipulation of a binary search tree (BST) in an asynchronous shared memory system that supports search, insert, and delete operations. In addition to read and write instructions, our algorithm uses (single-word) compare-and-swap (CAS) and bit-test-and-set (BTS) read-modify-write (RMW) instructions, both of which are commonly supported by many modern processors including Intel 64 and AMD64. In contrast to most of the existing concurrent algorithms for a binary search tree, our algorithm is edge-based rather than node-based. When compared to other concurrent algorithms for a binary search tree, modify (insert and delete) operations in our algorithm (a) work on a smaller section of the tree, (b) execute fewer RMW instructions, or (c) use fewer dynamically allocated objects. In our experiments, our lock-free algorithm significantly outperformed all other algorithms for a concurrent binary search tree especially when the contention was high. We also describe modifications to our basic lock-free algorithm so that the amortized complexity of any operation in the modified algorithm can be bounded by the sum of the tree height and the point contention to within a constant factor while preserving the other desirable features of our algorithm.
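The bit-test-and-set flavour of the algorithm can be illustrated with the common trick of stealing a low-order bit of an edge (a child pointer stored as an integer) to flag it for a pending delete: a fetch_or on the packed word typically compiles to a single locked bit-set instruction. This is a generic sketch of pointer marking, under the assumption of aligned nodes, and not FEAST's exact edge representation.

```cpp
#include <atomic>
#include <cstdint>

constexpr std::uintptr_t kMarkBit = 1;   // low bit is free because nodes are aligned

struct TreeNode;
using PackedEdge = std::atomic<std::uintptr_t>;   // pointer + mark bit in one word

// Flag the edge for deletion; returns true if this thread set the mark first.
bool mark_edge(PackedEdge& edge) {
    const std::uintptr_t prev = edge.fetch_or(kMarkBit, std::memory_order_acq_rel);
    return (prev & kMarkBit) == 0;
}

// Helpers to read the real pointer and the mark out of the packed word.
TreeNode* edge_target(const PackedEdge& edge) {
    return reinterpret_cast<TreeNode*>(edge.load(std::memory_order_acquire) & ~kMarkBit);
}
bool edge_marked(const PackedEdge& edge) {
    return (edge.load(std::memory_order_acquire) & kMarkBit) != 0;
}
```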
- Published
- 2020
11. A Lock Free Approach To Parallelize The Cellular Potts Model: Application To Ductal Carcinoma In Situ
- Author
-
Alberto Salguero and Antonio J. Tomeu
- Subjects
Speedup ,Java ,DCIS ,Computer science ,Breast Neoplasms ,Parallel computing ,Cellular Automata ,03 medical and health sciences ,0302 clinical medicine ,parallel ,Non-blocking algorithm ,Humans ,Computer Simulation ,Cellular Potts Model ,030304 developmental biology ,computer.programming_language ,0303 health sciences ,Multi-core processor ,cellular automata ,speedup ,Cellular Potts model ,multicore ,Transactional memory ,Computational Biology ,General Medicine ,software transactional memory ,Cellular automaton ,cellular potts model ,Carcinoma, Intraductal, Noninfiltrating ,dcis ,030220 oncology & carcinogenesis ,Software transactional memory ,computer ,TP248.13-248.65 ,Research Article ,Biotechnology - Abstract
In the field of computational biology, the Cellular Potts Model (CPM) has been used to simulate multiscale biological systems. The CPM determines the actions that simulated cells can perform by computing an energy Hamiltonian that takes into account the influence exerted by neighboring cells, under a wide range of parameters. There are some proposals in the literature that parallelize the CPM; in all cases, either lock-based techniques or other techniques that require large amounts of information to be disseminated among parallel tasks are used to preserve data coherence. In both cases, computational performance is limited. This work proposes an alternative approach for the parallelization of the model that uses transactional memory to maintain the coherence of the information. A Java implementation has been applied to the simulation of ductal adenocarcinoma of the breast in situ (DCIS). Times and speedups of the simulated execution of the model on the cluster of our university are analyzed. The results show a good speedup.
- Published
- 2020
12. EQueue: Elastic Lock-Free FIFO Queue for Core-to-Core Communication on Multi-Core Processors
- Author
-
Xiong Fu, Tian Yangfeng, and Junchang Wang
- Subjects
020203 distributed computing ,Multi-core processor ,General Computer Science ,Computer science ,CPU cache ,General Engineering ,020207 software engineering ,02 engineering and technology ,Parallel computing ,Burstiness ,0202 electrical engineering, electronic engineering, information engineering ,Non-blocking algorithm ,Memory footprint ,multi-core processors ,General Materials Science ,Double-ended queue ,Lock-free queue ,lcsh:Electrical engineering. Electronics. Nuclear engineering ,Electrical and Electronic Engineering ,Instruction cycle ,Queue ,pipeline parallelism ,lcsh:TK1-9971 - Abstract
In recent years, the number of CPU cores in a multi-core processor keeps increasing. To leverage the increasing hardware resources, programmers need to develop parallelized software programs. One promising approach to parallelizing high-performance applications is pipeline parallelism, which divides a task into a series of subtasks and then maps these subtasks to a group of CPU cores, making the communication scheme between the subtasks running on different cores a critical component of the parallelized programs. One widely used implementation of the communication scheme is software-based, lock-free first-in-first-out queues that move data between different subtasks. The primary design goal of prior lock-free queues was higher throughput, such that the technique of batching data was heavily used in their enqueue and dequeue operations. Unfortunately, a lock-free queue with batching heavily depends on the assumption that data arrive at a constant rate and the queue is in an equilibrium state. Experimentally we found that the equilibrium state of a queue rarely happens in real, high-performance use cases (e.g., 10Gbps+ network applications) because the data arrival rate fluctuates sharply. As a result, existing queues suffer from performance degradation when used in real applications on multi-core processors. In this paper, we present EQueue, a lock-free queue that handles this robustness issue in existing queues. EQueue is lock-free, efficient, and robust. EQueue can adaptively (1) shrink its queue size when the data arrival rate is low, keeping its memory footprint small to better utilize the CPU cache, and (2) enlarge its queue size to avoid overflow when the data arrival rate is bursty. Experimental results show that when used in high-performance applications, EQueue can always perform an enqueue/dequeue operation in less than 50 CPU cycles, outperforming FastForward and MCRingBuffer, two state-of-the-art queues, by factors of 3 and 2, respectively.
- Published
- 2020
13. Pointer life cycle types for lock-free data structures with memory reclamation
- Author
-
Roland Meyer and Sebastian Wolff
- Subjects
FOS: Computer and information sciences ,Computer Science - Programming Languages ,Linearizability ,Hazard pointer ,Programming language ,Computer science ,Type inference ,020207 software engineering ,02 engineering and technology ,Specification language ,Data structure ,computer.software_genre ,020204 information systems ,Pointer (computer programming) ,0202 electrical engineering, electronic engineering, information engineering ,Non-blocking algorithm ,Safety, Risk, Reliability and Quality ,computer ,Software ,Programming Languages (cs.PL) ,Garbage collection - Abstract
We consider the verification of lock-free data structures that manually manage their memory with the help of a safe memory reclamation (SMR) algorithm. Our first contribution is a type system that checks whether a program properly manages its memory. If the type check succeeds, it is safe to ignore the SMR algorithm and consider the program under garbage collection. Intuitively, our types track the protection of pointers as guaranteed by the SMR algorithm. There are two design decisions. The type system does not track any shape information, which makes it extremely lightweight. Instead, we rely on invariant annotations that postulate a protection by the SMR. To this end, we introduce angels, ghost variables with an angelic semantics. Moreover, the SMR algorithm is not hard-coded but a parameter of the type system definition. To achieve this, we rely on a recent specification language for SMR algorithms. Our second contribution is to automate the type inference and the invariant check. For the type inference, we show a quadratic-time algorithm. For the invariant check, we give a source-to-source translation that links our programs to off-the-shelf verification tools. It compiles away the angelic semantics. This allows us to infer appropriate annotations automatically in a guess-and-check manner. To demonstrate the effectiveness of our type-based verification approach, we check linearizability for various list and set implementations from the literature with both hazard pointers and epoch-based memory reclamation. For many of the examples, this is the first time they are verified automatically. For the ones where there is a competitor, we obtain a speed-up of up to two orders of magnitude.
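For readers unfamiliar with the SMR schemes this type system targets, the core hazard-pointer protocol is a publish-and-revalidate loop: a thread announces the pointer it is about to dereference and re-reads the source until the announcement is consistent, so a reclaimer knows not to free that node. The sketch below shows only that protection step, with a fixed thread bound as an assumption; retirement and scanning are omitted.

```cpp
#include <atomic>

constexpr int kMaxThreads = 8;                 // illustrative fixed bound
std::atomic<void*> g_hazard[kMaxThreads];      // one hazard slot per thread

struct ListNode {
    int value;
    std::atomic<ListNode*> next{nullptr};
};
std::atomic<ListNode*> g_head{nullptr};

// Announce the head we intend to read, then re-check that it is still the head.
// After this returns, a reclaimer that honours g_hazard must not free the node
// until the slot is cleared again.
ListNode* protect_head(int tid) {
    ListNode* p;
    do {
        p = g_head.load(std::memory_order_acquire);
        g_hazard[tid].store(p, std::memory_order_seq_cst);
    } while (p != g_head.load(std::memory_order_acquire));
    return p;
}

void release_hazard(int tid) {
    g_hazard[tid].store(nullptr, std::memory_order_release);
}
```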
- Published
- 2019
14. Parallel Multi-split Extendible Hashing for Persistent Memory
- Author
-
Jianxi Chen, Jing Hu, Qing Yang, Zhouxuan Peng, Ya Yu, and Yifeng Zhu
- Subjects
Concurrency control ,Computer science ,Scalability ,Hash function ,Non-blocking algorithm ,Overhead (computing) ,Parallel computing ,Data structure ,Extendible hashing ,Hash table - Abstract
Emerging persistent memory (PM) is a promising technology that provides near-DRAM performance and disk-like durability. However, many data structures designed for DRAM, such as hash tables and B-trees, are sub-optimal for PM. Prior studies have shown that the scalability of hash tables on Intel Optane DC Persistent Memory Modules (DCPMM) degrades significantly due to expensive lock-based concurrency control and massive data movements during rehashing. This paper proposes a lock-free parallel multi-split extendible hashing scheme (PMEH), which eliminates the lock contention overhead, reduces data movements during rehashing, and ensures data consistency. Under the widely used YCSB workloads, the evaluation results show that compared to other state-of-the-art hashing schemes, PMEH is up to 1.38x faster for insertion and up to 1.9x faster for deletion, while reducing extra writes by 52%. In addition, PMEH can ensure instant recovery regardless of data size.
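On Optane-class persistent memory, consistency hinges on explicitly writing cache lines back and ordering those write-backs, which is where much of the cost such schemes must manage comes from. Below is a hedged sketch of the usual flush-and-fence helper; it assumes a CLWB-capable x86 CPU and a compiler flag such as -mclwb, and it is not PMEH's recovery protocol.

```cpp
#include <immintrin.h>
#include <cstddef>
#include <cstdint>

// Flush every cache line covering [addr, addr+len) and order the flushes.
// CLWB writes the line back without evicting it, which is the preferred
// persistence instruction on recent Intel CPUs (requires -mclwb).
inline void persist(const void* addr, std::size_t len) {
    constexpr std::size_t kLine = 64;
    auto p = reinterpret_cast<std::uintptr_t>(addr) & ~(kLine - 1);
    const auto end = reinterpret_cast<std::uintptr_t>(addr) + len;
    for (; p < end; p += kLine) {
        _mm_clwb(reinterpret_cast<void*>(p));
    }
    _mm_sfence();   // order the write-backs before subsequent stores
}
```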
- Published
- 2021
15. Towards an Elastic Lock-Free Hash Trie Design
- Author
-
Miguel Areias and Ricardo Rocha
- Subjects
Memory management ,Hash trie ,Computer science ,Hash function ,Data_FILES ,Non-blocking algorithm ,Context (language use) ,Parallel computing ,Data structure ,Collision ,Hash table - Abstract
A key aspect of any hash map design is the problem of dynamically resizing it in order to deal with hash collisions. In this context, elasticity refers to the ability to automatically resize the internal data structures that support the hash map operations in order to meet varying workloads, thus optimizing the overall memory consumption of the hash map. This work extends a previous lock-free hash trie design to support elastic hashing, i.e., expand saturated hash levels and compress unused hash levels, such that, at each point in time, the number of levels in a path matches the current demand as closely as possible. Experimental results show that elasticity effectively improves the search operation and, in doing so, our design becomes very competitive when compared to other state-of-the-art designs implemented in Java.
- Published
- 2021
16. Lock-free Data Structures for Data Stream Processing
- Author
-
Constantin Pohl and Alexander Baumstark
- Subjects
020203 distributed computing ,Computer science ,Distributed computing ,Design elements and principles ,020207 software engineering ,02 engineering and technology ,Data structure ,Stream processing ,Shared memory ,Multithreading ,0202 electrical engineering, electronic engineering, information engineering ,Non-blocking algorithm ,Tuple ,Data stream processing - Abstract
Processing data in real time instead of storing it and reading from tables has led to a specialization of DBMS into the so-called data stream processing paradigm. While high throughput and low latency are key requirements to keep up with varying stream behavior and to allow fast reaction to incoming events, there are many possibilities for how to achieve them. In combination with modern hardware, like server CPUs with tens of cores, the parallelization of stream queries for multithreading and vectorization is a common scheme. High degrees of parallelism, however, need efficient synchronization mechanisms to allow good scaling with threads for shared memory access. In this work, we identify the most time-consuming operations for stream processing, using our own stream processing engine PipeFabric as an example. In addition, we present different design principles of lock-free data structures which are suited to overcome those bottlenecks. We finally demonstrate how lock-freedom greatly improves performance for join processing and tuple exchange between operators under different workloads. Nevertheless, the efficient usage of lock-free data structures comes with additional efforts and pitfalls, which we also discuss in this paper.
- Published
- 2019
17. A study of the lock-free tour problem and path-based reformulations
- Author
-
Karen Smilowitz, Sanjay Mehrotra, and Mehmet Başdere
- Subjects
Mathematical optimization ,Computer science ,Block (telecommunications) ,Path (graph theory) ,Non-blocking algorithm ,Combinatorial optimization ,Integer programming ,Industrial and Manufacturing Engineering ,MathematicsofComputing_DISCRETEMATHEMATICS ,Course (navigation) - Abstract
Motivated by marathon course design, this article introduces a novel tour-finding problem, the Lock-Free Tour Problem (LFTP), which ensures that the resulting tour does not block access to certain critical vertices. The LFTP is formulated as a mixed-integer linear program. Structurally, the LFTP yields excessive subtour formation, causing the standard branch-and-cut approach to perform poorly, even with valid inequalities derived from the locking properties of the LFTP. For this reason, we introduce path-based reformulations arising from a provably stronger disjunctive program, where disjunctions are obtained by fixing the order in which must-visit edges are visited. In computational tests, the reformulations are shown to yield up to 100 times improvement in solution times. Additional tests demonstrate the value of the reformulations for more general tour-finding problems with visit requirements and length restrictions. Finally, practical insights from the Bank of America Chicago Marathon are presented. Supplementary materials are available for this article; we refer the reader to the publisher's online edition for additional experiments.
- Published
- 2019
18. Proposal and Evaluation of Mutual Exclusion Method using Lock-free Algorism for Embedded Systems using Multi-core Processor
- Author
-
Yuhei Yabuta, Shigeki Nankaku, Shingo Oidate, Hiroshi Noborio, Atsuyuki Takahashi, Takuya Yamasaki, and Kenta Fujimoto
- Subjects
Multi-core processor ,symbols.namesake ,Computer science ,business.industry ,Embedded system ,symbols ,Non-blocking algorithm ,Mutual exclusion ,Electrical and Electronic Engineering ,Algorism ,business - Published
- 2019
19. Program code optimization of Relacy Race Detector library
- Author
-
O. V. Doronin, K. I. Dergun, A. M. Dergachev, and A. O. Klyuchev
- Subjects
Computer science ,Fiber (computer science) ,Program code ,lcsh:QA75.5-76.95 ,operating system ,Non-blocking algorithm ,lcsh:QC350-467 ,Relacy Race Detector ,business.industry ,Mechanical Engineering ,Detector ,thread scheduler ,Atomic and Molecular Physics, and Optics ,Computer Science Applications ,Electronic, Optical and Magnetic Materials ,Multithreading ,data races ,lock-free algorithms ,lcsh:Electronic computers. Computer science ,testing applications ,business ,lcsh:Optics. Light ,Computer hardware ,multithreading ,fiber ,Information Systems - Abstract
The paper presents the results of research on the Relacy Race Detector (RRD) library as applied to the problem of multithreaded code testing. The study revealed several shortcomings of the RRD library: a static number of threads, a complex project structure, errors in the implementation, and a lack of support for snapshots. This work corrects the shortcomings described above and presents a new approach for taking an atomic snapshot of multiple threads using the fork and fiber mechanisms. With these results and the implemented changes, it is now easier to use the RRD library for testing multithreaded applications.
- Published
- 2019
20. Non-blocking Patricia tries with replace operations
- Author
-
Niloufar Shafiei
- Subjects
FOS: Computer and information sciences ,Theoretical computer science ,Computer Networks and Communications ,Computer science ,Radix tree ,0102 computer and information sciences ,Parallel computing ,01 natural sciences ,Theoretical Computer Science ,03 medical and health sciences ,0302 clinical medicine ,Hash array mapped trie ,Trie ,Non-blocking algorithm ,Data_FILES ,Implementation ,business.industry ,Modular design ,Data structure ,Tree (data structure) ,Computer Science - Distributed, Parallel, and Cluster Computing ,Computational Theory and Mathematics ,Shared memory ,Binary search tree ,010201 computation theory & mathematics ,Hardware and Architecture ,Asynchronous communication ,030220 oncology & carcinogenesis ,Theory of computation ,Distributed, Parallel, and Cluster Computing (cs.DC) ,business - Abstract
This paper presents a non-blocking Patricia trie implementation for an asynchronous shared-memory system using Compare&Swap. The trie implements a linearizable set and supports three update operations: insert adds an element, delete removes an element and replace replaces one element by another. The replace operation is interesting because it changes two different locations of the tree atomically. If all update operations modify different parts of the trie, they run completely concurrently. The implementation also supports a wait-free find operation, which only reads shared memory and never changes the data structure. Empirically, we compare our algorithms to some existing set implementations. (To appear in the 33rd IEEE International Conference on Distributed Computing Systems, ICDCS 2013.)
- Published
- 2019
21. Themis: An AST-Based Lock-Free Routes Synchronizing and Sharing System for Self-Driving in Edge Computing Environments
- Author
-
Tun Lu, Jiang Te, and Ning Gu
- Subjects
Scheme (programming language) ,Schedule ,General Computer Science ,Computer science ,business.industry ,Distributed computing ,General Engineering ,collaborative tools ,Synchronizing ,Cloud computing ,Edge computing ,distributed computing ,Transmission (telecommunications) ,Non-blocking algorithm ,collaborative software ,General Materials Science ,lcsh:Electrical engineering. Electronics. Nuclear engineering ,business ,computer ,lcsh:TK1-9971 ,computer.programming_language - Abstract
The rapid development of self-driving technology has made self-driving cars come into reality, and some people have already adopted different levels of self-driving technology in real driving practice. With the help of self-driving technology, people can now share and schedule their routes together while driving. However, such activities are currently poorly supported, as they demand high responsiveness, strong consistency guarantees, and low transmission costs. To support this novel scenario, we developed Themis, an edge-computing-oriented and highly efficient AST-based lock-free route synchronizing and sharing system built on a fundamental itinerary planning model. We optimized the system by adopting partial replication, snapshots of history operations, and compression of consecutive operations, which leave certain calculations at the cloud side and reduce the amount and size of data transmitted over the network. In addition, we analyze and prove the correctness of our scheme in detail and examine the significant performance improvements through experiments.
- Published
- 2019
22. Transactional memory as an approach to building a lock-free data structure
- Author
-
S.A. Pyankov and S.L. Babichev
- Subjects
Record locking ,Computer science ,Operating system ,Non-blocking algorithm ,General Earth and Planetary Sciences ,Transactional memory ,computer.software_genre ,Data structure ,computer ,Lock (computer science) ,ABA problem ,General Environmental Science - Abstract
The development of lock-free data structures is a vital problem. Existing approaches are ineffective and susceptible to various problems, the main one being the ABA problem. In 2013, Intel added hardware support for transactional memory to its processors. This study describes a lock-free implementation of a treap. The authors compare the results of experiments carried out on lock-based and TSX-based treaps. Analysis of this comparison reveals the feasibility of this approach for building a lock-free data structure.
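The hardware-transactional approach can be sketched with Intel's RTM intrinsics: the update runs inside _xbegin/_xend and falls back to a conventional lock if the transaction aborts, with the transactional path reading the lock flag so the two paths exclude each other. This is a generic lock-elision skeleton under the assumption of TSX-capable hardware (compiled with -mrtm), not the authors' treap code.

```cpp
#include <immintrin.h>
#include <atomic>
#include <mutex>

std::mutex fallback_lock;
std::atomic<bool> fallback_held{false};

// Run 'update' transactionally; retry a few times, then take the fallback lock.
template <typename Fn>
void run_transactional(Fn&& update) {
    for (int attempt = 0; attempt < 3; ++attempt) {
        unsigned status = _xbegin();
        if (status == _XBEGIN_STARTED) {
            // Abort if someone currently holds the fallback lock, so the
            // transactional and locked paths never run concurrently.
            if (fallback_held.load(std::memory_order_relaxed)) _xabort(0xff);
            update();
            _xend();
            return;
        }
        // Transaction aborted (conflict, capacity, interrupt, ...): retry.
    }
    std::lock_guard<std::mutex> guard(fallback_lock);
    fallback_held.store(true, std::memory_order_relaxed);
    update();
    fallback_held.store(false, std::memory_order_relaxed);
}
```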
- Published
- 2019
23. Blaze-Tasks
- Author
-
Damian Dechev, Peter Pirkelbauer, Amalee Wilson, and Christina Peterson
- Subjects
020203 distributed computing ,Generic programming ,Computer science ,020207 software engineering ,02 engineering and technology ,Parallel computing ,Cilk ,Scheduling (computing) ,Software portability ,Shared memory ,Hardware and Architecture ,Problem domain ,Parallel programming model ,0202 electrical engineering, electronic engineering, information engineering ,Non-blocking algorithm ,computer ,Software ,Information Systems ,computer.programming_language - Abstract
Compared to threads, tasks are a more fine-grained alternative. The task parallel programming model offers benefits in terms of better performance portability and better load-balancing for problems that exhibit nonuniform workloads. A common scenario of task parallel programming is that a task is recursively decomposed into smaller sub-tasks. Depending on the problem domain, the number of created sub-tasks may be nonuniform, thereby creating potential for significant load imbalances in the system. Dynamic load-balancing mechanisms will distribute the tasks across available threads. The final result of a computation may be modeled as a reduction over the results of all sub-tasks. This article describes a simple, yet effective prototype framework, Blaze-Tasks, for task scheduling and task reductions on shared memory architectures. The framework has been designed with lock-free techniques and generic programming principles in mind. Blaze-Tasks is implemented entirely in C++17 and is thus portable. To load-balance the computation, Blaze-Tasks uses task stealing. To manage contention on a task pool, the number of lock-free attempts to steal a task depends on the distance between thief and pool owner and the estimated number of tasks in a victim’s pool. This article evaluates the Blaze framework on Intel and IBM dual-socket systems using nine benchmarks and compares its performance with other task parallel frameworks. While Cilk outperforms Blaze on Intel on most benchmarks, the evaluation shows that Blaze is competitive with OpenMP and other library-based implementations. On IBM, the experiments show that Blaze outperforms other approaches on most benchmarks.
- Published
- 2018
24. A Memory Efficient Lock-Free Circular Queue
- Author
-
Frank Liu, Narasinga Rao Miniskar, and Jeffrey S. Vetter
- Subjects
Circular buffer ,Record locking ,Memory management ,Computer science ,Synchronization (computer science) ,Non-blocking algorithm ,Double-ended queue ,Parallel computing ,Mutual exclusion ,Queue - Abstract
Hardware queues are important in many applications, such as data transfer and the synchronization of concurrent modules that need mutual exclusion constructs. State-of-the-art bounded (fixed-size) lock-free circular queues are implemented either with read/write atomic operations, with barrier conditions, or by separating dequeue and enqueue operations. However, these queues always require an unused element at all times to safeguard the front and rear pointers of the queue and avoid data races, which wastes memory. This waste is especially disadvantageous in applications such as I/O data transfer and image transfer between processing filters, where a large element size is needed. We propose a lock-free solution for the bounded circular queue that uses read/write atomic operations but does not need an extra element in the queue. The proposed solution is implemented and verified in both Verilog and C. We also demonstrate its effectiveness by comparing its area and delay metrics with implementations of other existing queue designs.
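For context, the conventional single-producer/single-consumer ring that such designs improve on looks like the sketch below: front and rear indices are atomics, and one slot is deliberately left unused so that "full" and "empty" can be told apart. That is exactly the wasted element the abstract refers to; this sketch is the textbook baseline, not the paper's queue.

```cpp
#include <atomic>
#include <cstddef>

template <typename T, std::size_t N>
class SpscRing {
    T buf_[N];
    std::atomic<std::size_t> front_{0};   // advanced by the consumer
    std::atomic<std::size_t> rear_{0};    // advanced by the producer
public:
    bool enqueue(const T& v) {
        const std::size_t rear = rear_.load(std::memory_order_relaxed);
        const std::size_t next = (rear + 1) % N;
        if (next == front_.load(std::memory_order_acquire))
            return false;                 // "full": one slot is always left unused
        buf_[rear] = v;
        rear_.store(next, std::memory_order_release);
        return true;
    }
    bool dequeue(T& out) {
        const std::size_t front = front_.load(std::memory_order_relaxed);
        if (front == rear_.load(std::memory_order_acquire))
            return false;                 // empty
        out = buf_[front];
        front_.store((front + 1) % N, std::memory_order_release);
        return true;
    }
};
```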
- Published
- 2021
25. OrcGC
- Author
-
Pedro Ramalhete, Pascal Felber, and Andreia Correia
- Subjects
hazard pointers ,Scheme (programming language) ,020203 distributed computing ,Basis (linear algebra) ,business.industry ,Computer science ,Reference counting ,memory reclamation ,020207 software engineering ,02 engineering and technology ,Data structure ,Object (computer science) ,Allocator ,Land reclamation ,0202 electrical engineering, electronic engineering, information engineering ,Non-blocking algorithm ,business ,computer ,Computer hardware ,computer.programming_language - Abstract
Dynamic lock-free data structures require a memory reclamation scheme with similar progress guarantees. To date, lock-free reclamation schemes have been applied to data structures on a case-by-case basis, often with modifications to the data structure's algorithm. In this paper we introduce two new lock-free reclamation schemes, one manual and the other automatic with user-annotated types. The manual reclamation scheme, named pass-the-pointer (PTP), has lock-free progress and a bound on the number of unreclaimed objects that is linear in the number of threads. The automatic lock-free memory reclamation scheme, which we named OrcGC, uses PTP and object reference counting to automatically detect when to protect and when to de-allocate an object. OrcGC has a linear bound on memory usage and can be used with any allocator. We propose a new methodology that utilizes OrcGC to provide lock-free memory reclamation to a data structure. We conducted a performance evaluation on two machines, an Intel and an AMD, applying PTP and OrcGC to several lock-free data structures, providing lock-free memory reclamation where before there was none. On the Intel machine we saw no significant performance impact, while on AMD we observed a worst-case performance drop below 50%.
- Published
- 2021
26. A lock-free relaxed concurrent queue for fast work distribution
- Author
-
Stergios V. Anastasiadis and Giorgos Kappes
- Subjects
020203 distributed computing ,Shared memory ,Computer science ,Concurrency ,Transfer (computing) ,0202 electrical engineering, electronic engineering, information engineering ,Non-blocking algorithm ,020207 software engineering ,Throughput ,02 engineering and technology ,Parallel computing ,Latency (engineering) ,Queue - Abstract
The operation of modern systems requires low-latency and high-throughput producer-consumer communication over shared memory. In order to achieve fast communication at high concurrency, we define a relaxed ordering model that splits the queue operations into two stages: the sequential assignment to queue slots and their subsequent concurrent execution. Based on this model, we design and implement the linearizable and lock-free algorithm called Relaxed Concurrent Queue Single (RCQS). We experimentally show that RCQS achieves an advantage of factors to orders of magnitude over state-of-the-art queue algorithms in operation latency and item transfer speed.
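The two-stage split described above, sequential slot assignment followed by concurrent completion, can be conveyed with a deliberately simplified sketch in which a fetch-and-add hands each producer its own slot index and each slot carries a sequence number telling whose turn it is. Unlike RCQS, this simplification spins when a slot is not yet ready, so it illustrates only the ordering model and is not itself lock-free.

```cpp
#include <atomic>
#include <cstddef>

template <typename T, std::size_t N>
class TwoStageQueue {
    struct Slot {
        std::atomic<std::size_t> seq;
        T value;
    };
    Slot slots_[N];
    std::atomic<std::size_t> enq_{0};
    std::atomic<std::size_t> deq_{0};
public:
    TwoStageQueue() {
        for (std::size_t i = 0; i < N; ++i) slots_[i].seq.store(i);
    }
    void enqueue(const T& v) {
        // Stage 1: grab a unique position with a single fetch-and-add.
        const std::size_t pos = enq_.fetch_add(1, std::memory_order_relaxed);
        Slot& s = slots_[pos % N];
        // Stage 2: complete the operation on our own slot, independently of
        // other producers (spin until the slot is free for this round).
        while (s.seq.load(std::memory_order_acquire) != pos) { /* spin */ }
        s.value = v;
        s.seq.store(pos + 1, std::memory_order_release);   // ready for dequeue
    }
    void dequeue(T& out) {
        const std::size_t pos = deq_.fetch_add(1, std::memory_order_relaxed);
        Slot& s = slots_[pos % N];
        while (s.seq.load(std::memory_order_acquire) != pos + 1) { /* spin */ }
        out = s.value;
        s.seq.store(pos + N, std::memory_order_release);   // slot reusable next lap
    }
};
```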
- Published
- 2021
27. Splash-4: Improving Scalability with Lock-Free Constructs
- Author
-
Stefanos Kaxiras, Alberto Ros, Ruixiang Shao, Eduardo José Gómez-Hernández, and Christos Sakalis
- Subjects
Optimization ,Hardware_MEMORYSTRUCTURES ,business.industry ,Computer science ,Suite ,Benchmarks ,Multiprocessing ,Parallel computing ,ComputerSystemsOrganization_PROCESSORARCHITECTURES ,Synchronization ,Software ,Synchronization (computer science) ,Scalability ,Benchmark (computing) ,Non-blocking algorithm ,Atomic operations ,Memory model ,business ,Simulation - Abstract
Over the past three decades, the parallel applications of the Splash-2 benchmark suite have been instrumental in advancing multiprocessor research. Recently, the Splash-3 benchmarks eliminated performance bugs, data races, and improper synchronization that plagued the Splash-2 benchmarks after the definition of the C memory model. In this work, we revisit the Splash-3 benchmarks and adapt them for contemporary architectures with atomic operations and lock-free constructs. With our changes, we improve the scalability of most benchmarks for up to 32 and 64 cores, showing an improvement of up to 9x on actual machines, and up to 5x in simulation, over the unmodified Splash-3 benchmarks. To denote the substantive nature of the improvements in the Splash-3 benchmarks and to re-introduce them in contemporary research, we refer to the new collection as Splash-4. (GitHub repository: https://github.com/OdnetninI/Splash-4)
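The kind of change described here is typically mechanical: a counter or reduction that was protected by a lock becomes a single atomic read-modify-write. A hedged before/after sketch is given below; the variable names are illustrative and not taken from the benchmarks themselves.

```cpp
#include <atomic>
#include <mutex>

// Lock-based style: a short critical section just to bump a shared counter.
std::mutex m;
long counter_locked = 0;
void add_locked(long x) {
    std::lock_guard<std::mutex> g(m);
    counter_locked += x;
}

// Lock-free style: the same update as one atomic instruction.
std::atomic<long> counter_atomic{0};
void add_atomic(long x) {
    counter_atomic.fetch_add(x, std::memory_order_relaxed);
}
```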
- Published
- 2021
28. TSLQueue: An Efficient Lock-Free Design for Priority Queues
- Author
-
Philippas Tsigas and Adones Rukundo
- Subjects
High memory ,Skip list ,Computer science ,business.industry ,Concurrency ,Non-blocking algorithm ,Linked list ,Abstract data type ,Priority queue ,business ,Queue ,Computer network - Abstract
Priority queues are fundamental abstract data types, often used to manage limited resources in parallel systems. Typical proposed parallel priority queue implementations are based on heaps or skip lists. In recent literature, skip lists have been shown to be the most efficient design choice for implementing priority queues. Though numerous intricate implementations of skip list based queues have been proposed in the literature, their performance is constrained by the high number of global atomic updates per operation and the high memory consumption, which are proportional to the number of sub-lists in the queue.
- Published
- 2021
29. A Relaxed Balanced Lock-Free Binary Search Tree
- Author
-
Lindsay Groves, Manish Kumar Singh, and Alex Potanin
- Subjects
Compare-and-swap ,Tree (data structure) ,Concurrent data structure ,Binary search tree ,Computer science ,Non-blocking algorithm ,Parallel computing ,Word (computer architecture) - Abstract
This paper presents a new relaxed balanced concurrent binary search tree using a single-word compare-and-swap primitive, in which all operations are lock-free. Our design separates balancing actions from update operations and includes a lock-free balancing mechanism in addition to the insert, search, and relaxed delete operations. Search in our design is not affected by ongoing concurrent update operations or by the movement of nodes by tree restructuring operations. Our experiments show that our algorithm performs better than other state-of-the-art concurrent BSTs.
- Published
- 2021
30. Synch: A framework for concurrent data-structures and benchmarks
- Author
-
Nikolaos D. Kallimanis
- Subjects
FOS: Computer and information sciences ,Computer science ,0102 computer and information sciences ,02 engineering and technology ,Software_PROGRAMMINGTECHNIQUES ,01 natural sciences ,Synchronization (computer science) ,Computer Science - Data Structures and Algorithms ,0202 electrical engineering, electronic engineering, information engineering ,Non-blocking algorithm ,Concurrent computing ,Data Structures and Algorithms (cs.DS) ,computer.programming_language ,Multi-core processor ,Xeon ,Concurrent data structure ,business.industry ,020206 networking & telecommunications ,Objective-C ,Computer Science - Distributed, Parallel, and Cluster Computing ,010201 computation theory & mathematics ,POSIX ,Embedded system ,Distributed, Parallel, and Cluster Computing (cs.DC) ,business ,computer - Abstract
The recent advancements in multicore machines highlight the need to simplify concurrent programming in order to leverage their computational power. One way to achieve this is by designing efficient concurrent data structures (e.g. stacks, queues, hash-tables, etc.) and synchronization techniques (e.g. locks, combining techniques, etc.) that perform well in machines with large amounts of cores. In contrast to ordinary, sequential data-structures, concurrent data-structures allow multiple threads to simultaneously access and/or modify them. Synch is an open-source framework that not only provides some common high-performance concurrent data-structures, but also provides researchers with the tools for designing and benchmarking high-performance concurrent data-structures. The Synch framework contains a substantial set of concurrent data-structures such as queues, stacks, combining-objects, hash-tables, locks, etc. and it provides a user-friendly runtime for developing and benchmarking concurrent data-structures. Among other features, the provided runtime offers functionality for creating threads easily (both POSIX and user-level threads), tools for measuring performance, etc. Moreover, the provided concurrent data-structures and the runtime are highly optimized for contemporary NUMA multiprocessors such as AMD Epyc and Intel Xeon.
- Published
- 2021
31. QoS monitoring in real-time streaming overlays based on lock-free data structures
- Author
-
Valerio De Luca, Catiuscia Melle, and Franco Tommasi
- Subjects
Real-time streaming, P2P streaming, Peer churning, QoS, Playback continuity, Lock-free ,Computer Networks and Communications ,Computer science ,business.industry ,Node (networking) ,Quality of service ,020206 networking & telecommunications ,02 engineering and technology ,Video quality ,Network congestion ,Hardware and Architecture ,Synchronization (computer science) ,0202 electrical engineering, electronic engineering, information engineering ,Media Technology ,Key (cryptography) ,Non-blocking algorithm ,020201 artificial intelligence & image processing ,business ,Software ,Computer network - Abstract
Peer-to-peer streaming is a well-known technology for the large-scale distribution of real-time audio/video contents. Delay requirements are very strict in interactive real-time scenarios (such as synchronous distance learning), where playback lag should be of the order of seconds. Playback continuity is another key aspect in these cases: in the presence of peer churning and network congestion, a peer-to-peer overlay should quickly rearrange connections among receiving nodes to avoid freezing phenomena that may compromise audio/video understanding. For this reason, we designed a QoS monitoring algorithm that quickly detects broken or congested links: each receiving node is able to independently decide whether it should switch to a secondary sending node, called a "fallback node". The architecture takes advantage of a multithreaded design based on lock-free data structures, which improve performance by avoiding synchronization among threads. We show the good responsiveness of the proposed approach on machines with different computational capabilities: the measured times prove that both node departures and QoS degradations are promptly detected, and that clients can quickly restore stream reception. According to PSNR and SSIM, two well-known full-reference video quality metrics, QoE remains acceptable on the receiving nodes of our resilient overlay even in the presence of swap procedures.
- Published
- 2021
32. Lock-Free Parallel Computing Using Theatre
- Author
-
Christian Nigro and Libero Nigro
- Subjects
Structure (mathematical logic) ,Java ,Exploit ,Computer science ,Node (networking) ,Serialization ,Non-blocking algorithm ,Thread (computing) ,Parallel computing ,computer ,computer.programming_language ,Scheduling (computing) - Abstract
Theatre is a control-based actor system currently developed in Java, whose design specifically addresses the development of predictable, time-constrained distributed systems. Theatre, though, can also be used for untimed concurrent applications. The control structure regulating message scheduling and dispatching can be customized by programming. This paper describes a novel implementation, pTheatre (Parallel Theatre), whose control structure can exploit the potential of parallel computing offered by today's multi-core machines. With respect to the distributed implementation of Theatre, pTheatre is more lightweight because it avoids the use of Java serialization during actor migration and when transmitting messages from one computing node (theatre/thread) to another. In addition, no locking mechanism is used either in high-level actor programs or in the underlying runtime support. This way, common pitfalls related to classic multi-threaded programming are naturally avoided, and the possibility of high-performance computing is opened up. The paper demonstrates the potential of the achieved realization through a parallel matrix multiplication example.
- Published
- 2020
33. Lock-free Fill-in Queue
- Author
-
Basem Assiri
- Subjects
Correctness ,business.industry ,Computer science ,020206 networking & telecommunications ,02 engineering and technology ,Linked list ,Data structure ,0202 electrical engineering, electronic engineering, information engineering ,Linked data structure ,Non-blocking algorithm ,Array data structure ,Double-ended queue ,business ,Queue ,Computer network - Abstract
One of the fundamental data structures commonly used in parallel and concurrent systems is the first-in-first-out (FIFO) queue. Many studies propose lock-based concurrent queue algorithms to satisfy correctness properties. However, the use of locks results in delays, contention, deadlock and other issues. Instead, lock-free algorithms have been introduced to overcome such issues and improve performance. Some lock-free algorithms use the atomic compare-and-swap (CAS) operation, while others replace it with fetch-and-add (FAA) to improve performance. From an implementation perspective, some queue algorithms use an array data structure to ease the enqueuing and dequeuing processes, but the size of the queue is static. On the other hand, many queue algorithms use a linked data structure where the size of the queue is dynamic, but the enqueuing and dequeuing processes are complicated. In this paper, we introduce new algorithms for a concurrent queue that merge the ideas of array and linked data structures to get the advantages of both. Our queue consists of a circular linked list with k empty (dummy) nodes, so in the normal case it works like an array. The enqueue process places the data in one of the empty nodes (at the tail position), while the dequeue process deletes the data from a non-empty node (at the head position), with no need to maintain the linked list. However, our queue is dynamic: we change the queue size by either creating a new node and connecting it to the linked list, or deleting some existing nodes. Our algorithm eases the enqueue and dequeue processes and reduces the use of CAS operations. Therefore, it improves performance compared to existing queue algorithms that use CAS, and almost matches the performance of the algorithms that use FAA.
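The fill-in idea, a fixed ring of pre-allocated nodes into which enqueuers CAS values and from which dequeuers take them back out, can be sketched at the node level as below. The sentinel value and the node layout are illustrative assumptions; how head/tail positions advance and how the ring grows or shrinks is where the authors' actual algorithm does its work and is omitted here.

```cpp
#include <atomic>

constexpr int kEmpty = -1;   // assumes -1 is never a real payload (illustrative)

// One node of the circular linked list; 'value' doubles as the empty/full flag.
struct FillNode {
    std::atomic<int> value{kEmpty};
    FillNode* next{nullptr};           // ring link, set up once at construction
};

// Try to place v into this node; succeeds only if the node was empty (dummy).
bool try_fill(FillNode& n, int v) {
    int expected = kEmpty;
    return n.value.compare_exchange_strong(expected, v,
                                           std::memory_order_acq_rel,
                                           std::memory_order_acquire);
}

// Try to take a value out of this node, leaving it empty for reuse.
bool try_drain(FillNode& n, int& out) {
    int observed = n.value.load(std::memory_order_acquire);
    while (observed != kEmpty) {
        if (n.value.compare_exchange_weak(observed, kEmpty,
                                          std::memory_order_acq_rel,
                                          std::memory_order_acquire)) {
            out = observed;
            return true;
        }
    }
    return false;                       // node was empty
}
```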
- Published
- 2020
34. Re-evaluation of Atomic Operations and Graph Coloring for Unstructured Finite Volume GPU Simulations
- Author
-
Xu Sun, Xi Zhang, Yang Liu, Yutong Lu, Xiaohu Guo, and Yunfei Du
- Subjects
Speedup ,Finite volume method ,Computer science ,010103 numerical & computational mathematics ,Parallel computing ,01 natural sciences ,010305 fluids & plasmas ,Race condition ,CUDA ,0103 physical sciences ,Non-blocking algorithm ,Node (circuits) ,Polygon mesh ,Graph coloring ,0101 mathematics - Abstract
In general, race conditions can be resolved by introducing synchronisation or breaking data dependencies. Atomic operations and graph coloring are the two typical approaches to avoiding race conditions. Graph coloring algorithms have generally been considered the winning algorithms in the literature due to their lock-free implementations. In this paper, we present the GPU-accelerated algorithms of the unstructured cell-centered finite volume Computational Fluid Dynamics (CFD) software framework named PHengLEI, which was originally developed for aerodynamics applications with arbitrary hybrid meshes. Overall, the newly developed GPU framework demonstrates up to a 4.8x speedup compared with 18 MPI tasks run on the latest Intel CPU node. Furthermore, enormous effort has been invested in optimizing data dependencies which could lead to race conditions due to unstructured mesh indirect addressing and the related reduction operations. With a careful comparison between our optimised graph coloring and atomic operations using a series of numerical tests with different mesh sizes, the results show that atomic operations are more efficient than our optimised graph coloring in all of the test cases on the Nvidia Tesla V100 GPU. Specifically, for the summation operation, using atomicAdd is twice as fast as graph coloring. For the maximum operation, a speedup of 1.5 to 2 is found for atomicMax vs. graph coloring.
- Published
- 2020
35. Lock-free multithreaded semi-global matching with an arbitrary number of path directions
- Author
-
D. Frommholz
- Subjects
lcsh:Applied optics. Photonics ,Ground truth ,Speedup ,Offset (computer science) ,Discretization ,Computer science ,lcsh:T ,lcsh:TA1501-1820 ,OpenMP ,Intrinsics ,lcsh:Technology ,Line Rasterization ,SIMD ,Stereo Matching ,lcsh:TA1-2040 ,SGM ,Multithreading ,Sicherheitsforschung und Anwendungen ,Non-blocking algorithm ,lcsh:Engineering (General). Civil engineering (General) ,Algorithm - Abstract
This paper describes an efficient implementation of the semi-global matching (SGM) algorithm on multi-core processors that allows a nearly arbitrary number of path directions for the cost aggregation stage. The scanlines for each orientation are discretized iteratively once, and the regular substructures of the obtained template are reused and shifted to concurrently sum up the path cost in at most two sweeps per direction over the disparity space image. Since path overlaps do not occur at any time, no expensive thread synchronization is needed. To further reduce the runtime for high counts of path directions, pixel-wise disparity gating is applied, and both the cost function and the disparity loop of SGM are optimized using current single instruction multiple data (SIMD) intrinsics for two major CPU architectures. Performance evaluation of the proposed implementation on synthetic ground truth reveals a reduced height error when the number of aggregation directions is significantly increased or when the paths start with an angular offset. Overall runtime shows a speedup that is nearly linear in the number of available processors.
- Published
- 2020
36. LOCKED-Free Journaling: Improving the Coalescing Degree in EXT4 Journaling
- Author
-
Youjip Won, Yongjun Park, and Kyoungho Koo
- Subjects
Computer science ,Journaling file system ,ext4 ,Non-blocking algorithm ,Operating system ,CPU time ,Commit ,Latency (engineering) ,computer.software_genre ,Database transaction ,computer ,Bottleneck - Abstract
With the recent development of low-latency storage devices, IO latency is no longer a critical performance bottleneck for filesystems. Instead, CPU utilization and lock contention have become more critical factors in achieving higher performance. However, EXT4's transaction commit procedure is not suitable for low-latency storage devices due to the presence of the transaction's LOCKED state. In this paper, we first analyze threads that are blocked when trying to update the filesystem because of the LOCKED state and fsync() operations. We then propose an Elimination Transaction Lock-Up scheme that optimizes the transaction commit procedure for low-latency SSDs. With the lock-up elimination scheme, transaction lock-up overheads for journaling threads can be efficiently eliminated while still ensuring the consistency of the EXT4 filesystem.
- Published
- 2020
37. Fair and trustworthy: Lock-free enhanced tendermint blockchain algorithm
- Author
-
Wazir Zada Khan and Basem Assiri
- Subjects
Cryptocurrency ,Blockchain ,Correctness ,Fairness ,Computer science ,020206 networking & telecommunications ,02 engineering and technology ,Commit ,Set (abstract data type) ,Consensus protocols ,Lock-free ,0202 electrical engineering, electronic engineering, information engineering ,Non-blocking algorithm ,020201 artificial intelligence & image processing ,Blockchain protocol ,Electrical and Electronic Engineering ,Byzantine fault tolerance ,Algorithm ,Smart contracts ,Block (data storage) - Abstract
Blockchain technology is used to make online transactions secure by maintaining a distributed and decentralized ledger of records across multiple computers. Tendermint is a general-purpose blockchain engine composed of two parts: Tendermint Core and the blockchain application interface. The application interface makes Tendermint suitable for a wide range of applications. In this paper, we analyze and improve Practical Byzantine Fault Tolerance (PBFT), a consensus-based Tendermint blockchain algorithm. First, to avoid the negative effects of locks, we propose a lock-free algorithm for the blockchain in which the proposal and voting phases are concurrent whereas the commit phase is sequential; this design allows parallelism. Secondly, a new methodology is used to decide the size of the voter set, a subset of the blockchain nodes, taking into account block sensitivity and the trustworthiness of nodes. Thirdly, to select the voter-set nodes fairly, we employ a random walk algorithm. Fourthly, we obtain the wait-freedom property by using a timeout, so that every block is eventually committed or aborted. In addition, we discuss voting conflicts and consensus issues that serve as correctness properties, and we provide some supporting techniques.
- Published
- 2020
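To make the timeout argument above concrete, the sketch below collects votes concurrently with an atomic counter and lets the proposer decide at a deadline, so the block is eventually committed or aborted even if some voters never respond. The 2/3-style threshold, the timings and all names are assumptions for illustration, not the paper's protocol.

// Minimal sketch: concurrent, lock-free vote counting with a timeout, so a block is
// eventually either committed or aborted (illustrative; not the paper's protocol).
#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>
#include <vector>

int main() {
    const int voters = 7;
    const int threshold = 2 * voters / 3 + 1;        // assumed 2/3-style majority
    std::atomic<int> yes_votes{0};

    // Voters cast their votes concurrently; the tally is a plain atomic counter.
    std::vector<std::thread> pool;
    for (int i = 0; i < voters; ++i) {
        pool.emplace_back([&, i] {
            std::this_thread::sleep_for(std::chrono::milliseconds(10 * i));
            yes_votes.fetch_add(1, std::memory_order_relaxed);
        });
    }

    // The proposer waits until the threshold is reached or the deadline passes,
    // so the decision cannot be blocked forever by slow or silent voters.
    auto deadline = std::chrono::steady_clock::now() + std::chrono::milliseconds(100);
    while (std::chrono::steady_clock::now() < deadline &&
           yes_votes.load(std::memory_order_relaxed) < threshold) {
        std::this_thread::yield();
    }
    bool commit = yes_votes.load() >= threshold;
    std::printf(commit ? "block committed\n" : "block aborted\n");

    for (auto& t : pool) t.join();
    return 0;
}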
38. Tracking in Order to Recover - Detectable Recovery of Lock-Free Data Structures
- Author
-
Attiya, Hagit, Ben-Baruch, Ohad, Fatourou, Panagiota, Hendler, Danny, and Kosmas, Eleftherios
- Subjects
FOS: Computer and information sciences ,non-volatile memory ,Computer science ,Concurrent data structure ,persistence cost analysis ,Real-time computing ,Crash ,persistence ,Tracking (particle physics) ,Data structure ,exchanger ,tree ,Non-volatile memory ,Computer Science - Distributed, Parallel, and Cluster Computing ,linked-list ,Non-blocking algorithm ,concurrent data structures ,Non-volatile random-access memory ,Distributed, Parallel, and Cluster Computing (cs.DC) ,NVM-based computing ,State (computer science) ,recoverable algorithms and data structures ,lock-freedom ,synchronization - Abstract
This paper presents a generic approach for deriving detectably recoverable implementations of many widely-used concurrent data structures. Such implementations are appealing for emerging systems featuring byte-addressable non-volatile main memory (NVMM), whose persistence makes it possible to efficiently resurrect failed threads after crashes. Detectable recovery ensures that after a crash, every executed operation is able to recover and return a correct response, and that the state of the data structure is not corrupted. Our approach, called Tracking, amends descriptor objects used in existing lock-free helping schemes with additional fields that track an operation's progress towards completion and persists these fields in order to ensure detectable recovery. Tracking avoids full-fledged logging and tracks the progress of concurrent operations in a per-thread manner, thus reducing the cost of ensuring detectable recovery. We have applied Tracking to derive detectably recoverable implementations of a linked list, a binary search tree, and an exchanger. Our experimental analysis introduces a new way of analyzing the cost of persistence instructions, not by simply counting them but by separating them into categories based on the impact they have on performance. The analysis reveals that understanding the actual persistence cost of an algorithm on machines with real NVMM is more complicated than previously thought and requires thorough evaluation, since the impact of different persistence instructions on performance may vary greatly. We consider this analysis to be one of the major contributions of the paper. (This paper appeared in the Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP'22.)
- Published
- 2020
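A toy reading of the descriptor idea above: the operation descriptor carries a status field that is flushed to persistent memory before and after the update it describes, so that recovery can detect completion and re-obtain the response. The flush helper, the descriptor layout and the single-store "operation" are assumptions for illustration (x86 intrinsics are used); the paper's Tracking scheme and its lock-free helping are not reproduced here.

// Toy illustration (x86-only): an operation descriptor whose progress fields are
// explicitly flushed, so a recovery procedure can tell after a crash whether the
// operation completed and what it returned. The real Tracking scheme is far richer.
#include <immintrin.h>   // _mm_clflush, _mm_sfence
#include <atomic>
#include <cstdint>
#include <cstdio>

// Flush one cache line and fence. Real NVMM code would prefer CLWB/CLFLUSHOPT;
// CLFLUSH is used here only because it is universally available on x86.
static void persist(const void* p) {
    _mm_clflush(p);
    _mm_sfence();
}

struct OpDescriptor {               // assumed to fit in a single cache line
    std::atomic<uint64_t> status;   // 0 = pending, 1 = done
    uint64_t result;                // response to report again after recovery
};

std::atomic<uint64_t> shared_word{0};   // the location the operation modifies

// A detectably recoverable "set", heavily simplified and single-location.
uint64_t recoverable_set(OpDescriptor& d, uint64_t new_value) {
    d.result = shared_word.load();
    d.status.store(0, std::memory_order_relaxed);
    persist(&d);                  // the announced operation is durable before acting
    shared_word.store(new_value);
    persist(&shared_word);        // the effect is durable
    d.status.store(1, std::memory_order_release);
    persist(&d);                  // completion is durable: recovery would see "done"
    return d.result;
}

int main() {
    OpDescriptor d;
    uint64_t old = recoverable_set(d, 42);
    std::printf("old=%llu new=%llu\n",
                (unsigned long long)old, (unsigned long long)shared_word.load());
    return 0;
}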
39. NVTraverse: in NVRAM data structures, the destination is more important than the journey
- Author
-
Naama Ben-David, Guy E. Blelloch, Yuanhao Wei, Michal Friedman, and Erez Petrank
- Subjects
FOS: Computer and information sciences ,020203 distributed computing ,Class (computer programming) ,Computer science ,Concurrent data structure ,Distributed computing ,020207 software engineering ,02 engineering and technology ,Data structure ,Non-volatile memory ,Tree traversal ,Transformation (function) ,Computer Science - Distributed, Parallel, and Cluster Computing ,0202 electrical engineering, electronic engineering, information engineering ,Non-blocking algorithm ,Non-volatile random-access memory ,Distributed, Parallel, and Cluster Computing (cs.DC) - Abstract
The recent availability of fast, dense, byte-addressable non-volatile memory has led to increasing interest in the problem of designing and specifying durable data structures that can recover from system crashes. However, designing durable concurrent data structures that are efficient and also satisfy a correctness criterion has proven to be very difficult, leading many algorithms to be inefficient or incorrect in a concurrent setting. In this paper, we present a general transformation that takes a lock-free data structure from a general class called traversal data structure (that we formally define) and automatically transforms it into an implementation of the data structure for the NVRAM setting that is provably durably linearizable and highly efficient. The transformation hinges on the observation that many data structure operations begin with a traversal phase that does not need to be persisted, and thus we only begin persisting when the traversal reaches its destination. We demonstrate the transformation's efficiency through extensive measurements on a system with Intel's recently released Optane DC persistent memory, showing that it can outperform competitors on many workloads.
- Published
- 2020
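The observation above, that the traversal phase need not be persisted and flushes can wait until the traversal reaches its destination, can be sketched with an insert into a sorted singly linked list: reads along the way are plain loads, and only the new node and the link that publishes it are flushed. Everything here (the persist helper built on x86 CLFLUSH, the insert-only list without marking or helping) is a simplified assumption for illustration, not the NVTraverse transformation itself.

// Sketch: the traversal is not persisted; only the nodes at the "destination" are flushed.
#include <immintrin.h>   // _mm_clflush, _mm_sfence
#include <atomic>
#include <cstdio>

struct Node {
    int key;
    std::atomic<Node*> next;
    Node(int k, Node* n) : key(k), next(n) {}
};

static void persist(const void* p) { _mm_clflush(p); _mm_sfence(); }

std::atomic<Node*> head{nullptr};

bool insert(int key) {
    Node* node = new Node(key, nullptr);
    while (true) {
        // Traversal phase: plain reads, nothing is flushed along the way.
        std::atomic<Node*>* prev = &head;
        Node* cur = prev->load();
        while (cur && cur->key < key) {
            prev = &cur->next;
            cur = prev->load();
        }
        if (cur && cur->key == key) { delete node; return false; }   // already present
        node->next.store(cur);
        persist(node);                       // destination: the new node must be durable first
        if (prev->compare_exchange_strong(cur, node)) {
            persist(prev);                   // then the link that made it reachable
            return true;
        }
        // CAS failed: the window changed under us; re-traverse (still without flushing).
    }
}

int main() {
    insert(2); insert(1); insert(3);
    for (Node* n = head.load(); n; n = n->next.load()) std::printf("%d ", n->key);
    std::printf("\n");
    return 0;
}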
40. CHTKC: a robust and efficient k-mer counting algorithm based on a lock-free chaining hash table
- Author
-
Lili Dong, Guohua Wang, Jianan Wang, and Su Chen
- Subjects
0303 health sciences ,Genome ,Computer science ,030302 biochemistry & molecular biology ,Computational Biology ,DNA ,Linked list ,Hash table ,Substring ,03 medical and health sciences ,Counting problem ,k-mer ,Chaining ,Non-blocking algorithm ,Humans ,Error detection and correction ,Molecular Biology ,Algorithm ,Algorithms ,030304 developmental biology ,Information Systems - Abstract
Motivation: Calculating the frequency of occurrence of each substring of length k in DNA sequences is a common task in many bioinformatics applications, including genome assembly, error correction, and sequence alignment. Although the problem is simple to state, efficiently counting datasets with high sequencing depth or large genome size is a challenge. Results: We propose a robust and efficient method, CHTKC, to solve the k-mer counting problem with a lock-free hash table that uses linked lists to resolve collisions. We also design new mechanisms to optimize memory usage and to handle situations where memory is not sufficient to accommodate all k-mers. CHTKC has been thoroughly tested on seven datasets under multiple memory-usage scenarios and compared with Jellyfish2 and KMC3. Our work shows that effectively solving the k-mer counting problem with a hash-table-based method remains feasible.
- Published
- 2020
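The chaining approach described above can be reduced to a small core: each bucket is an atomic head pointer, a previously unseen k-mer is published by a CAS on that head, and counts are bumped with fetch-and-add. The sketch below shows only this core under assumed names (fixed bucket count, std::hash, k-mers packed into 64 bits); CHTKC's memory-management mechanisms are not shown.

// Sketch of a lock-free chaining hash table for counting (find-or-insert + fetch_add).
#include <atomic>
#include <cstdint>
#include <cstdio>
#include <functional>
#include <thread>
#include <vector>

struct Node {
    uint64_t kmer;                  // k-mer packed into 64 bits (assumes k <= 32)
    std::atomic<uint64_t> count{0};
    Node* next;
    Node(uint64_t k, Node* n) : kmer(k), next(n) {}
};

constexpr size_t BUCKETS = 1 << 12;
std::atomic<Node*> table[BUCKETS];

// Increment the count of `kmer`, inserting its node on first sight.
void count_kmer(uint64_t kmer) {
    size_t b = std::hash<uint64_t>{}(kmer) % BUCKETS;
    while (true) {
        Node* head = table[b].load(std::memory_order_acquire);
        for (Node* n = head; n; n = n->next)
            if (n->kmer == kmer) {                       // already present: just count
                n->count.fetch_add(1, std::memory_order_relaxed);
                return;
            }
        Node* node = new Node(kmer, head);
        node->count.store(1, std::memory_order_relaxed);
        if (table[b].compare_exchange_weak(head, node,   // publish the node with CAS
                                           std::memory_order_release,
                                           std::memory_order_relaxed))
            return;
        delete node;                                     // lost the race; rescan the chain
    }
}

int main() {
    std::vector<std::thread> pool;
    for (int t = 0; t < 4; ++t)
        pool.emplace_back([] { for (int i = 0; i < 1000; ++i) count_kmer(i % 8); });
    for (auto& t : pool) t.join();
    for (size_t b = 0; b < BUCKETS; ++b)
        for (Node* n = table[b].load(); n; n = n->next)
            std::printf("kmer %llu -> %llu\n",
                        (unsigned long long)n->kmer, (unsigned long long)n->count.load());
    return 0;
}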
41. Scalable Concurrent Pools Based on Diffracting Trees
- Author
-
M. S. Kupriyanov, Alexandr D. Anenkov, and Alexey A. Paznikov
- Subjects
0209 industrial biotechnology ,Computer science ,Concurrent data structure ,02 engineering and technology ,Parallel computing ,Thread (computing) ,Data structure ,Tree (data structure) ,020901 industrial engineering & automation ,Multithreading ,Synchronization (computer science) ,Scalability ,0202 electrical engineering, electronic engineering, information engineering ,Non-blocking algorithm ,020201 artificial intelligence & image processing ,Throughput (business) - Abstract
Multithreading synchronization is one of the most essential problems in parallel programming. The concurrent pool is one of the most common and widely demanded data structures in scalable applications. A promising way to implement a concurrent pool is to use a diffracting (diffraction) tree as an auxiliary data structure to increase scalability. In this work, we optimize diffracting-tree-based pools and propose implementations that outperform the existing ones. We designed concurrent pools based on diffracting trees that optimize the threads' access to global variables in order to maximize the efficiency (throughput) of the data structure. We performed experimental modeling to evaluate the efficiency of the concurrent pools. We give evidence that our pools have higher scalability than the existing pool implementations based on diffracting trees. We discuss the experimental results and provide guidance for using them.
- Published
- 2020
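A diffracting tree routes each arriving thread through a small tree of balancers; in the simplest (non-diffracting) form a balancer is a single bit that every visiting thread flips atomically, taking the left or right wire accordingly, which spreads threads evenly over the leaf pools. The sketch below shows just that toggle-routing layer, with mutex-protected vectors standing in for the per-leaf pools and without the prism arrays that give diffracting trees their name; all identifiers are illustrative.

// Toggle-balancer routing of a 2-level tree over 4 leaf pools (simplified sketch).
#include <atomic>
#include <cstdint>
#include <cstdio>
#include <mutex>
#include <thread>
#include <vector>

struct Balancer {
    std::atomic<uint32_t> toggle{0};
    // Each visiting thread flips the bit and takes the wire it read (0 = left, 1 = right).
    int traverse() { return toggle.fetch_xor(1, std::memory_order_relaxed) & 1; }
};

struct LeafPool {                       // stand-in for the real per-leaf pool
    std::mutex m;
    std::vector<int> items;
    void put(int v) { std::lock_guard<std::mutex> g(m); items.push_back(v); }
};

Balancer root, level1[2];
LeafPool leaves[4];

void producer(int base) {
    for (int i = 0; i < 1000; ++i) {
        int w0 = root.traverse();        // pick the left/right subtree
        int w1 = level1[w0].traverse();  // pick the leaf within the subtree
        leaves[2 * w0 + w1].put(base + i);
    }
}

int main() {
    std::vector<std::thread> pool;
    for (int t = 0; t < 4; ++t) pool.emplace_back(producer, t * 1000);
    for (auto& t : pool) t.join();
    for (int l = 0; l < 4; ++l)
        std::printf("leaf %d received %zu items\n", l, leaves[l].items.size());
    return 0;
}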
42. Re-implementation of Lock-free Contention Adapting SearchTrees
- Author
-
Vo Kha Tam, Triet S. Nguyen, Kevin Boyles, Benjamin Le Heup, and Hong Hui Bao
- Subjects
Structure (mathematical logic) ,Memory leak ,Tree (data structure) ,Memory management ,Computer science ,Non-blocking algorithm ,Multi core programming ,Parallel computing ,Search tree ,Treap - Abstract
To explore multi-core programming, we re-implement the Lock Free Contention Adapting Search Tree. We follow the structure of the original, using immutable treaps as leaf nodes implemented with an array for better performance with memory caching. Memory leaks are prevented through preallocation of elements. We evaluate the performance of the LFCA tree and compare it to the previous MRLock version. The LFCA tree performs better in all cases with multiple threads.
- Published
- 2020
43. Re-Implementation of BQ: A Lock-Free Queue with Batching Using MR Locks
- Author
-
Joshua E Howell, Rocco DiGiorgio, and Benjamin Rhiner
- Subjects
Record locking ,Source code ,Correctness ,Programming language ,Computer science ,media_common.quotation_subject ,Concurrency ,computer.software_genre ,Data structure ,restrict ,Non-blocking algorithm ,computer ,Queue ,media_common - Abstract
This implementation is based on a publication by Gal Milman, Alex Kogan, Yossi Lev, Victor Luchangco, and Erez Petrank in 2018 [1]. Titled BQ: A Lock-Free Queue with Batching, it describes the data structure's functionality and correctness, along with the previous work it is based on [1]. All implementation details are from this paper unless otherwise stated. In this series of papers, we reimplement and performance-test their data structure. This first paper covers an initial implementation involving no concurrency, using an MR Lock [2] to restrict threads' access to the data structure. This ensures correctness, as the operations on the queue are guaranteed to be mutually exclusive. Our source code is available here: https://github.com/ClayDiGiorgio/ParallelTeam24
- Published
- 2020
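As a rough picture of the approach described above, the sketch below wraps a plain queue in a single lock and lets callers submit a whole batch of operations under one acquisition. std::mutex stands in for the MRLock of the re-implementation, and the types and method names are assumptions, not the authors' code.

// Coarse-grained queue with batched operations under one lock acquisition
// (std::mutex stands in here for the MRLock used in the re-implementation).
#include <cstdio>
#include <deque>
#include <mutex>
#include <optional>
#include <vector>

struct Op { bool is_enqueue; int value; };            // one batched operation

class LockedBatchQueue {
    std::mutex m;                                     // single lock: operations are mutually exclusive
    std::deque<int> q;
public:
    void enqueue(int v) { std::lock_guard<std::mutex> g(m); q.push_back(v); }
    std::optional<int> dequeue() {
        std::lock_guard<std::mutex> g(m);
        if (q.empty()) return std::nullopt;
        int v = q.front(); q.pop_front();
        return v;
    }
    // Execute a whole batch under one lock acquisition; results align with the ops.
    std::vector<std::optional<int>> execute_batch(const std::vector<Op>& ops) {
        std::lock_guard<std::mutex> g(m);
        std::vector<std::optional<int>> results;
        for (const Op& op : ops) {
            if (op.is_enqueue) { q.push_back(op.value); results.push_back(std::nullopt); }
            else if (q.empty()) results.push_back(std::nullopt);
            else { results.push_back(q.front()); q.pop_front(); }
        }
        return results;
    }
};

int main() {
    LockedBatchQueue bq;
    bq.execute_batch({{true, 1}, {true, 2}, {false, 0}});
    if (auto v = bq.dequeue()) std::printf("dequeued %d\n", *v);
    return 0;
}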
44. Lock-free transactional vector
- Author
-
Kenneth Lamar, Damian Dechev, and Christina Peterson
- Subjects
Shared memory ,Transactional leadership ,Concurrent data structure ,Computer science ,Scalability ,Non-blocking algorithm ,Software transactional memory ,Transactional memory ,Parallel computing ,Data structure - Abstract
The vector is a fundamental data structure, offering constant-time traversal to elements and a dynamically resizable range of indices. While several concurrent vectors exist, a composition of concurrent vector operations dependent on each other can lead to undefined behavior. Techniques for providing transactional capabilities for data structure operations include Software Transactional Memory (STM) and transactional transformation methodologies. Transactional transformations convert concurrent data structures into their transactional equivalents at an operation level, rather than STM's object or memory level. To the best of our knowledge, existing STMs do not support dynamic read/write sets in a lock-free manner, and transactional transformation methodologies are unsuitable for the vector's contiguous memory layout. In this work, we present the first lock-free transactional vector. It integrates the fast lock-free resizing and instant logical status changes from related works. Our approach pre-processes transactions to reduce shared memory access and simplify access logic. This can be done without locking elements or verifying conflicts between transactions. We compare our design against state-of-the-art transactional designs, GCC STM, Transactional Boosting, and STO. All data structures are tested on four different platforms, including x86_64 and ARM architectures. We find that our lock-free transactional vector generally offers better scalability than STM and STO, and competitive performance with Transactional Boosting, but with additional lock-free guarantees. In scenarios with only reads and writes, our vector is as much as 47% faster than Transactional Boosting.
- Published
- 2020
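One way to read the pre-processing idea above is sketched below: before the shared vector is touched, the per-transaction operation list is collapsed so that each index is written at most once and reads of an index already written in the same transaction are served locally. This is an illustrative interpretation under assumed types, not the paper's actual pre-processing pass or its lock-free commit logic.

// Illustrative transaction pre-processing: collapse per-index writes and satisfy
// reads-after-writes locally, so the shared vector is touched as little as possible.
#include <cstdio>
#include <map>
#include <vector>

enum class OpType { Read, Write };
struct VecOp { OpType type; size_t index; int value; };   // value used only for writes

struct PreprocessedTxn {
    std::map<size_t, int> write_set;     // last write per index
    std::vector<size_t>   read_set;      // indices that still need a shared read
};

PreprocessedTxn preprocess(const std::vector<VecOp>& ops) {
    PreprocessedTxn txn;
    for (const VecOp& op : ops) {
        if (op.type == OpType::Write) {
            txn.write_set[op.index] = op.value;            // later writes overwrite earlier ones
        } else if (!txn.write_set.count(op.index)) {
            txn.read_set.push_back(op.index);              // read-after-write is served locally
        }
    }
    return txn;
}

int main() {
    // write(3,10), write(3,20), read(3), read(7): only index 7 needs a shared read,
    // and index 3 is written exactly once (with 20).
    auto txn = preprocess({{OpType::Write, 3, 10}, {OpType::Write, 3, 20},
                           {OpType::Read, 3, 0}, {OpType::Read, 7, 0}});
    std::printf("shared writes: %zu, shared reads: %zu\n",
                txn.write_set.size(), txn.read_set.size());
    return 0;
}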
45. Restricted memory-friendly lock-free bounded queues
- Author
-
Nikita Koval and Vitaly Aksenov
- Subjects
Concurrent data structure ,Computer science ,Bounded function ,Path (graph theory) ,Scalability ,Non-blocking algorithm ,Parallel computing ,Software system ,Queue ,Implementation - Abstract
Multi-producer multi-consumer FIFO queues are among the fundamental concurrent data structures used in software systems. A lot of progress has been made on designing concurrent bounded and unbounded queues [1--10]. As previous works show, it is extremely hard to come up with an efficient algorithm. There are two orthogonal ways to improve the performance of fair concurrent queues: reducing the number of compare-and-swap (CAS) calls, and making queues more memory-friendly by reducing the number of allocations. The most up-to-date efficient algorithms take the first path and replace CAS with the more scalable fetch-and-add (FAA) [3, 4, 10]. For the second path, the standard way to design memory-friendly versions is to implement queues on top of arrays [2--4, 10]. For unbounded queues it is reasonable to allocate memory in chunks and construct a linked queue over them; this approach significantly improves performance. Bounded queues are more memory-friendly by design: they are represented as a fixed-size array of elements even in theory. However, most bounded queue implementations still have issues with memory allocation --- typically, they either use descriptors [5, 8] or store additional meta-information along with the elements [1, 6, 7, 9].
- Published
- 2020
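The CAS-versus-FAA distinction drawn above comes down to a retry loop versus a single unconditional instruction. The toy sketch below contrasts the two on a shared counter (a counter, not a queue); under contention the CAS loop may fail and retry many times, whereas fetch_add always completes in one step.

// CAS retry loop vs. a single fetch-and-add on a contended counter (toy comparison).
#include <atomic>
#include <cstdio>
#include <thread>
#include <vector>

std::atomic<long> counter{0};

void increment_with_cas(int iters) {
    for (int i = 0; i < iters; ++i) {
        long cur = counter.load(std::memory_order_relaxed);
        // Under contention this loop may retry many times before the CAS succeeds.
        while (!counter.compare_exchange_weak(cur, cur + 1, std::memory_order_relaxed)) {}
    }
}

void increment_with_faa(int iters) {
    for (int i = 0; i < iters; ++i)
        counter.fetch_add(1, std::memory_order_relaxed);   // always succeeds in one step
}

int main() {
    auto run = [](void (*fn)(int)) {
        counter.store(0);
        std::vector<std::thread> pool;
        for (int t = 0; t < 8; ++t) pool.emplace_back(fn, 100000);
        for (auto& t : pool) t.join();
        return counter.load();
    };
    long cas_total = run(increment_with_cas);
    long faa_total = run(increment_with_faa);
    std::printf("CAS result: %ld, FAA result: %ld\n", cas_total, faa_total);
    return 0;
}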
46. A Compression-Based Design for Higher Throughput in a Lock-Free Hash Map
- Author
-
Ricardo Rocha, Miguel Areias, and Pedro Moreno
- Subjects
Computer engineering ,Computer science ,Hash function ,Non-blocking algorithm ,Cache ,Data structure ,Throughput (business) ,Hash table - Abstract
Lock-free implementation techniques are known to improve the overall throughput of concurrent data structures. A hash map is an important data structure used to organize information that must be accessed frequently. A key role of a hash map is the ability to balance workloads by dynamically adjusting its internal data structures in order to provide the fastest possible access to the information. This work extends a previous lock-free hash map design to also support lock-free compression. The main goal is to significantly reduce the depth of the internal hash levels within the hash map, in order to minimize cache misses and increase the overall throughput. To materialize our design, we redesigned the existing search, insert, remove and expand operations so as to maintain the lock-freedom property of the whole design. Experimental results show that lock-free compression effectively improves the search operation and, in doing so, outperforms the previous design, which was already quite competitive when compared against the concurrent hash map design supported by Intel.
- Published
- 2020
47. A more Pragmatic Implementation of the Lock-free, Ordered, Linked List
- Author
-
Jesper Larsson Träff and Manuel Pöter
- Subjects
FOS: Computer and information sciences ,020203 distributed computing ,Theoretical computer science ,Computer science ,Concurrent data structure ,020207 software engineering ,02 engineering and technology ,Linked list ,Fixed point ,Data structure ,Hash table ,Computer Science - Distributed, Parallel, and Cluster Computing ,Computer Science - Data Structures and Algorithms ,0202 electrical engineering, electronic engineering, information engineering ,Non-blocking algorithm ,Data Structures and Algorithms (cs.DS) ,Distributed, Parallel, and Cluster Computing (cs.DC) ,Unique key ,Invariant (computer science) - Abstract
The lock-free, ordered, linked list is an important, standard example of a concurrent data structure. An obvious, practical drawback of textbook implementations is that failed compare-and-swap (CAS) operations lead to retraversal of the entire list (retries), which is particularly harmful for such a linear-time data structure. We alleviate this drawback, first by observing that failed CAS operations under some conditions do not require a full retry, and second by maintaining approximate backwards pointers that are used to find a closer starting position in the list for the operation retry. Experiments with both a worst-case deterministic benchmark and a standard, randomized, mixed-operation throughput benchmark on three shared-memory systems (Intel Xeon, AMD EPYC, SPARC-T5) show practical improvements ranging from significant to dramatic, up to several orders of magnitude.
- Published
- 2020
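The observation that a failed CAS need not force a full retraversal can be illustrated with an insert-only sorted list: when the CAS fails, the search restarts from the node where it failed rather than from the head, which is safe here precisely because nodes are never removed. This is a simplified sketch, not the paper's list (which supports deletion and maintains approximate backward pointers).

// Sorted lock-free linked list (insert-only sketch): a failed CAS retries from the
// node where it failed instead of re-traversing from the head. With no deletions,
// that node is guaranteed to still be in the list, so the shortcut is safe.
#include <atomic>
#include <climits>
#include <cstdio>

struct Node {
    int key;
    std::atomic<Node*> next{nullptr};
    explicit Node(int k) : key(k) {}
};

Node head_sentinel{INT_MIN};              // sentinel smaller than every real key
Node* const head = &head_sentinel;

bool insert(int key) {
    Node* node = new Node(key);
    Node* start = head;                    // where the search (re)starts
    while (true) {
        Node* prev = start;
        Node* cur = prev->next.load(std::memory_order_acquire);
        while (cur && cur->key < key) {    // advance to the insertion window
            prev = cur;
            cur = prev->next.load(std::memory_order_acquire);
        }
        if (cur && cur->key == key) { delete node; return false; }
        node->next.store(cur, std::memory_order_relaxed);
        if (prev->next.compare_exchange_strong(cur, node, std::memory_order_release))
            return true;
        start = prev;                      // retry locally, not from the head
    }
}

int main() {
    insert(5); insert(1); insert(3); insert(3);
    for (Node* n = head->next.load(); n; n = n->next.load()) std::printf("%d ", n->key);
    std::printf("\n");
    return 0;
}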
48. Improving Per-Node Computing Efficiency by an Adaptive Lock-Free Scheduling Model
- Author
-
Mincong Yu, Naqin Zhou, Zhishuo Zheng, Xinyang Wang, and Deyu Qi
- Subjects
Artificial Intelligence ,Hardware and Architecture ,business.industry ,Computer science ,Non-blocking algorithm ,Computer Vision and Pattern Recognition ,Electrical and Electronic Engineering ,business ,Software ,Scheduling (computing) ,Computer network - Published
- 2018
49. A tale of lock-free agents: towards Software Transactional Memory in parallel Agent-Based Simulation
- Author
-
Peer-Olaf Siebers and Jonathan Thaler
- Subjects
Correctness ,Computer science ,Parallel programming ,0211 other engineering and technologies ,lcsh:Analysis ,02 engineering and technology ,Parallel computing ,0202 electrical engineering, electronic engineering, information engineering ,Non-blocking algorithm ,Massively parallel ,computer.programming_language ,Functional programming ,Multi-core processor ,021103 operations research ,lcsh:T57-57.97 ,Applied Mathematics ,lcsh:QA299.6-433 ,Python (programming language) ,Computer Science Applications ,Haskell ,Modeling and Simulation ,lcsh:Applied mathematics. Quantitative methods ,Software transactional memory ,020201 artificial intelligence & image processing ,Software Transactional Memory ,computer ,Agent-Based Simulation - Abstract
With the decline of Moore's law and the ever-increasing availability of cheap, massively parallel hardware, it becomes more and more important to embrace parallel programming methods to implement Agent-Based Simulations (ABS). This has been acknowledged in the field for a while, and numerous studies on distributed parallel ABS exist, focusing primarily on Parallel Discrete Event Simulation as the underlying mechanism. However, these concepts and tools are inherently difficult to master and apply, and are often excessive when implementers simply want to parallelise their own custom agent-based model implementation. With the programming languages established in the field (Python, Java and C++), it is not easy to address the complexities of parallel programming, due to unrestricted side effects and the intricacies of low-level locking semantics. Therefore, in this paper we propose a lock-free approach to parallel ABS using Software Transactional Memory (STM) in conjunction with the pure functional programming language Haskell, which in combination removes some of the problems and complexities of parallel implementations in imperative approaches. We present two case studies in which we compare the performance of lock-based and lock-free STM implementations of two different, well-known agent-based models, investigating the scaling behaviour both with an increasing number of CPU cores and with an increasing number of agents. We show that the lock-free STM implementations consistently outperform the lock-based ones and scale much better to an increasing number of CPU cores, both on local hardware and on Amazon EC2. Further, by utilizing the pure functional language Haskell we gain the benefits of immutable data and the lack of unrestricted side effects guaranteed at compile time, making validation easier and leading to increased confidence in the correctness of an implementation, something of fundamental importance and benefit in parallel programming in general and in scientific computing such as ABS in particular.
- Published
- 2019
50. HMalloc: A Hybrid, Scalable, and Lock-Free Memory Allocator for Multi-Threaded Applications
- Author
-
Yiping Yao, Tianlin Li, Zhongwei Lin, and Wenjie Tang
- Subjects
Hardware_MEMORYSTRUCTURES ,Computer science ,Locality ,False sharing ,0102 computer and information sciences ,02 engineering and technology ,Thread (computing) ,Parallel computing ,01 natural sciences ,Lock (computer science) ,Allocator ,Memory management ,Shared memory ,010201 computation theory & mathematics ,Scalability ,0202 electrical engineering, electronic engineering, information engineering ,Non-blocking algorithm ,020201 artificial intelligence & image processing - Abstract
An efficient multi-threaded memory allocator has a great impact on the performance of applications with frequent memory allocation and deallocation operations. Currently, most popular memory allocators ignore the difference between thread-local and shared memory and manage them in a unified manner, which cannot make good use of the spatiotemporal locality of memory accesses. To solve this problem, this paper proposes a hybrid, lock-free multi-threaded memory allocator named HMalloc. The allocator separates local memory from shared memory. There is no false sharing or lock contention in the local memory allocation and deallocation process. Moreover, a coalescence-free scheme is used to optimize this process. Further, a flag-based shared memory allocation and deallocation method is proposed to achieve lock-free shared memory management. Experimental results show that HMalloc achieves significant performance improvements compared with existing well-known memory allocators.
- Published
- 2019
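The flag-based, lock-free shared deallocation described above is commonly realised by letting a non-owner thread push a freed block onto the owning thread's list with a CAS, while the owner drains that list with a single exchange. The sketch below shows only that push/drain pattern under assumed names; it is not HMalloc itself.

// Sketch: lock-free "remote free" list per owner thread (Treiber-style CAS push).
// A thread freeing memory it does not own pushes the block onto the owner's list;
// the owner later drains the list without taking any lock.
#include <atomic>
#include <cstdio>

struct Block {
    Block* next;
    // ... payload would follow in a real allocator ...
};

struct ThreadHeap {
    std::atomic<Block*> remote_frees{nullptr};   // blocks freed by other threads

    void remote_free(Block* b) {                 // called by non-owner threads
        Block* old_head = remote_frees.load(std::memory_order_relaxed);
        do {
            b->next = old_head;
        } while (!remote_frees.compare_exchange_weak(old_head, b,
                                                     std::memory_order_release,
                                                     std::memory_order_relaxed));
    }

    Block* drain() {                             // called by the owner thread
        // Grab the whole list with one atomic exchange; no lock is needed, and there
        // is no ABA concern here because only the owner ever removes blocks.
        return remote_frees.exchange(nullptr, std::memory_order_acquire);
    }
};

int main() {
    ThreadHeap heap;
    static Block a, b;
    heap.remote_free(&a);
    heap.remote_free(&b);
    int n = 0;
    for (Block* cur = heap.drain(); cur; cur = cur->next) ++n;
    std::printf("drained %d remotely freed blocks\n", n);
    return 0;
}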