704 results for "Concurrent computing"
Search Results
2. APPLICATION OF REAL-TIME FAN SCHEDULING IN EXPLORATION-EXPLOITATION TO OPTIMIZE MINIMUM FUNCTION OBJECTIVES
- Author
-
Mariano LARIOS-GÓMEZ, Perfecto M. QUINTERO-FLORES, and Mario ANZURES-GARCÍA
- Subjects
real-time task scheduling, genetic algorithms, concurrent computing, Information technology, T58.5-58.64, Electronic computers. Computer science, QA75.5-76.95
- Abstract
This paper presents the application of a task scheduling algorithm called Fan, based on an artificial intelligence technique (genetic algorithms), to the problem of finding minima of objective functions, where equations are predefined to measure the return on investment. This work combines the methodologies of population exploration and exploitation. Candidates with good aptitudes are retained until better learning based on the non-termination conditions is found and an individual provides a better predisposition, adhering to the established constraints, exhausting all possible options, and satisfying the stopping condition. A real-time task planning algorithm based on consensus techniques was applied. A software tool was developed, and the scheduler called FAN was adapted to handle the execution of periodic, aperiodic, and sporadic tasks in controlled environments, ensuring that strict time restrictions are met. In the first phase of the work, it is shown how convergence to an evolved solution occurs within a few iterations. In the second stage, exploitation was improved, giving the algorithm better convergence and feasibility. As a result, a population was used and iterations were applied with the Fan algorithm, obtaining a better predisposition, which occurs in asynchronous processes while scheduling in real time.
- Published
- 2023
- Full Text
- View/download PDF
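The FAN scheduling work above (entry 2) rests on a standard genetic-algorithm loop that balances exploration (crossover over a population) with exploitation (mutation of promising individuals) until a stopping condition is met. The following is only a minimal sketch of that generic loop; the objective function, population size, and operator rates are illustrative assumptions and do not reproduce the authors' FAN scheduler.
```python
import random

# Minimal genetic-algorithm sketch for minimizing an objective function.
# The objective, population size, and operator rates are illustrative
# assumptions; they are not taken from the FAN scheduler described above.

def objective(x):
    """Toy objective: a shifted sphere function with minimum at (3, -1)."""
    return (x[0] - 3.0) ** 2 + (x[1] + 1.0) ** 2

def random_individual():
    return [random.uniform(-10, 10), random.uniform(-10, 10)]

def crossover(a, b):
    # Uniform crossover: each gene comes from either parent (exploration).
    return [random.choice(pair) for pair in zip(a, b)]

def mutate(ind, rate=0.2, scale=0.5):
    # Gaussian mutation refines good candidates (exploitation).
    return [g + random.gauss(0, scale) if random.random() < rate else g
            for g in ind]

def evolve(pop_size=30, generations=100, tol=1e-6):
    population = [random_individual() for _ in range(pop_size)]
    best = min(population, key=objective)
    for _ in range(generations):
        # Tournament selection keeps individuals with better fitness.
        parents = [min(random.sample(population, 3), key=objective)
                   for _ in range(pop_size)]
        population = [mutate(crossover(random.choice(parents),
                                       random.choice(parents)))
                      for _ in range(pop_size)]
        candidate = min(population, key=objective)
        if objective(candidate) < objective(best):
            best = candidate
        if objective(best) < tol:   # stopping condition
            break
    return best, objective(best)

if __name__ == "__main__":
    solution, value = evolve()
    print("best individual:", solution, "objective:", value)
```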
3. Discovering and Analyzing Contextual Behavioral Patterns From Event Logs.
- Author
-
Acheli, Mehdi, Grigori, Daniela, and Weidlich, Matthias
- Subjects
POINT processes, INFORMATION storage & retrieval systems
- Abstract
Event logs that are recorded by information systems provide a valuable starting point for the analysis of processes in various domains, reaching from healthcare, through logistics, to e-commerce. Specifically, behavioral patterns discovered from an event log enable operational insights, even in scenarios where process execution is rather unstructured and shows a large degree of variability. While such behavioral patterns capture frequently recurring episodes of a process’ behavior, they are not limited to sequential behavior but include notions of concurrency and exclusive choices. Existing algorithms to discover behavioral patterns are context-agnostic, though. They neglect the context in which patterns are observed, which severely limits the granularity at which behavioral regularities are identified. In this paper, we therefore present an approach to discover contextual behavioral patterns. Contextual patterns may be frequent solely in a certain partition of the event log, which enables fine-granular insights into the aspects that influence the conduct of a process. Moreover, we show how to analyze the discovered contextual behavioral patterns in terms of causal relations between context information and the patterns, as well as correlations between the patterns themselves. A complete analysis methodology leveraging all the tools presented in the paper and supplemented by interpretation guidelines is also provided. Finally, experiments with real-world event logs demonstrate the effectiveness of our techniques in obtaining fine-granular process insights. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
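Entry 3 discovers behavioral patterns that are frequent only within certain partitions (contexts) of an event log. As a rough illustration of that idea, the sketch below partitions a toy log by a context attribute and counts directly-follows pairs per partition; the log, attribute names, and support threshold are invented, and the paper's actual patterns also cover concurrency and exclusive choices, which this sketch does not.
```python
from collections import Counter, defaultdict

# Toy event log: each case has a context attribute and an ordered list of
# activities. Data and attribute names are invented for illustration only.
log = [
    {"context": "outpatient", "trace": ["register", "triage", "exam", "bill"]},
    {"context": "outpatient", "trace": ["register", "triage", "exam", "bill"]},
    {"context": "emergency",  "trace": ["register", "exam", "surgery", "bill"]},
    {"context": "emergency",  "trace": ["register", "surgery", "exam", "bill"]},
]

def directly_follows_by_context(event_log):
    """Count directly-follows pairs separately for each context partition."""
    patterns = defaultdict(Counter)
    for case in event_log:
        trace = case["trace"]
        for a, b in zip(trace, trace[1:]):
            patterns[case["context"]][(a, b)] += 1
    return patterns

def contextual_patterns(event_log, min_support=2):
    """Keep pairs that are frequent within a context, i.e. context-specific."""
    per_context = directly_follows_by_context(event_log)
    return {ctx: [p for p, n in counts.items() if n >= min_support]
            for ctx, counts in per_context.items()}

if __name__ == "__main__":
    for ctx, pats in contextual_patterns(log).items():
        print(ctx, "->", pats)
```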
4. Efficient Oblivious Query Processing for Range and kNN Queries.
- Author
-
Chang, Zhao, Xie, Dong, Li, Feifei, Phillips, Jeff M., and Balasubramonian, Rajeev
- Subjects
BATCH processing, DATA encryption, MATHEMATICAL optimization, RANDOM access memory, CLOUD computing, TRUST
- Abstract
Increasingly, individuals and companies adopt a cloud service provider as a primary data and IT infrastructure platform. The remote access of the data inevitably brings the issue of trust. Data encryption is necessary to keep sensitive information secure and private on the cloud. Yet adversaries can still learn valuable information regarding encrypted data by observing data access patterns. To solve this problem, Oblivious RAMs (ORAMs) have been proposed to completely hide access patterns. However, most ORAM constructions are expensive and not suitable to deploy in a database for supporting query processing over large data. Furthermore, an ORAM processes queries synchronously and hence does not provide high throughput for concurrent query processing. In this article, we design a practical oblivious query processing framework to enable efficient query processing over a cloud database. In particular, we focus on processing multiple range and kNN queries asynchronously and concurrently with high throughput. The key idea is to integrate indices into ORAM, which leverages a suite of optimization techniques (e.g., oblivious batch processing and caching). The effectiveness and efficiency of our oblivious query processing framework are demonstrated through extensive evaluations over large datasets. Our construction shows an order of magnitude speedup in comparison with other baselines. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
5. Coordinating Fast Concurrency Adapting With Autoscaling for SLO-Oriented Web Applications.
- Author
-
Liu, Jianshu, Zhang, Shungeng, Wang, Qingyang, and Wei, Jinpeng
- Subjects
WEB-based user interfaces, CLOUD computing, RESOURCE allocation, BOTTLENECKS (Manufacturing)
- Abstract
Cloud providers tend to support dynamic computing resources reallocation (e.g., Autoscaling) to handle the bursty workload for web applications (e.g., e-commerce) in the cloud environment. Nevertheless, we demonstrate that directly scaling a bottleneck server without quickly adjusting its soft resources (e.g., server threads and database connections) can cause significant response time fluctuations of the target web application. Since soft resources determine the request processing concurrency of each server in the system, simply scaling out/in the bottleneck service can unintentionally change the concurrency level of related services, inducing either under- or over-utilization of the critical hardware resource. In this paper, we propose the Scatter-Concurrency-Throughput (SCT) model, which can rapidly identify the near-optimal soft resource allocation of each server in the system using the measurement of each server’s real-time throughput and concurrency. Furthermore, we implement a Concurrency-aware autoScaling (ConScale) framework that integrates the SCT model to quickly reallocate the soft resources of the key servers in the system to best utilize the new hardware resource capacity after the system scaling. Based on extensive experimental comparisons with two widely used hardware-only scaling mechanisms for web applications: EC2-AutoScaling (VM-based autoscaler) and Kubernetes HPA (container-based autoscaler), we show that ConScale can successfully mitigate the response time fluctuations over the system scaling phase in both VM-based and container-based environments. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
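Entry 5's SCT model selects a near-optimal soft resource allocation (for example, a thread pool size) from real-time throughput and concurrency measurements. The paper's model is not reproduced here; the sketch below only illustrates the general pattern of sweeping concurrency levels against a synthetic workload and keeping the smallest level whose measured throughput is close to the best observed. The workload, candidate levels, and tolerance are assumptions.
```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_request(latency_s=0.005):
    """Stand-in for one request handled by a server (assumed workload)."""
    time.sleep(latency_s)

def measure_throughput(concurrency, n_requests=200):
    """Run n_requests at a given concurrency level and return requests/s."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        list(pool.map(lambda _: fake_request(), range(n_requests)))
    return n_requests / (time.perf_counter() - start)

def near_optimal_concurrency(levels=(1, 2, 4, 8, 16, 32, 64), tolerance=0.95):
    """Smallest concurrency whose throughput reaches `tolerance` of the peak."""
    samples = {c: measure_throughput(c) for c in levels}
    peak = max(samples.values())
    for c in sorted(samples):
        if samples[c] >= tolerance * peak:
            return c, samples
    return max(levels), samples

if __name__ == "__main__":
    level, samples = near_optimal_concurrency()
    for c in sorted(samples):
        print(f"concurrency={c:3d}  throughput={samples[c]:8.1f} req/s")
    print("chosen soft resource (thread) allocation:", level)
```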
6. Intermittent-Aware Distributed Concurrency Control.
- Author
-
Tsai, Wei-Che, Chen, Wei-Ming, Kuo, Tei-Wei, and Hsiu, Pi-Cheng
- Subjects
ENERGY harvesting, INTERNET of things, DATA management, NONVOLATILE memory, TASK analysis
- Abstract
Internet of Things (IoT) devices are gradually adopting battery-less, energy harvesting solutions, thereby driving the development of an intermittent computing paradigm to accumulate computation progress across multiple power cycles. While many attempts have been made to enable standalone intermittent systems, little attention has focused on IoT networks formed by intermittent devices. We observe that the computation progress improved by distributed task concurrency in an intermittent network can be significantly offset by data unavailability due to frequent system failures. This article presents an intermittent-aware distributed concurrency control protocol which leverages existing data copies inherently created in the network to improve the computation progress of concurrently executed tasks. In particular, we propose a borrowing-based data management method to increase data availability and an intermittent two-phase commit procedure incorporated with distributed backward validation to ensure data consistency in the network. The proposed protocol was integrated into a FreeRTOS-extended intermittent operating system running on Texas Instruments devices. Experimental results show that the computation progress can be significantly improved, and this improvement is more apparent under weaker power, where more devices will remain offline for longer duration. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
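Entry 6 ensures data consistency across power failures with an intermittent two-phase commit combined with borrowed data copies. The protocol details are specific to the paper; the sketch below shows only a plain two-phase commit skeleton among in-memory participants, where the vote logic and the data model are simplified assumptions rather than the paper's validation scheme.
```python
class Participant:
    """A device holding a local copy of some data items."""
    def __init__(self, name, store):
        self.name = name
        self.store = dict(store)
        self.pending = None

    def prepare(self, updates):
        # Phase 1: vote yes only if every touched key is locally available.
        if all(key in self.store for key in updates):
            self.pending = dict(updates)
            return True
        return False

    def commit(self):
        # Phase 2a: apply the staged updates.
        self.store.update(self.pending or {})
        self.pending = None

    def abort(self):
        # Phase 2b: drop the staged updates.
        self.pending = None

def two_phase_commit(participants, updates):
    """Coordinator: commit `updates` on all participants or on none."""
    voted_yes = [p for p in participants if p.prepare(updates)]
    if len(voted_yes) == len(participants):
        for p in participants:
            p.commit()
        return True
    for p in voted_yes:
        p.abort()
    return False

if __name__ == "__main__":
    devices = [Participant("sensor-a", {"t": 20}), Participant("sensor-b", {"t": 21})]
    print("committed:", two_phase_commit(devices, {"t": 25}))
    print("committed:", two_phase_commit(devices, {"humidity": 40}))  # aborts
```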
7. A High Performance Concurrency Protocol for Smart Contracts of Permissioned Blockchain.
- Author
-
Jin, Cheqing, Pang, Shuaifeng, Qi, Xiaodong, Zhang, Zhao, and Zhou, Aoying
- Subjects
GRAPH algorithms, PARALLEL algorithms, CONTRACTS, SUBGRAPHS, BLOCKCHAINS
- Abstract
Although the emergence of the programmable smart contract makes blockchain systems easily embrace a wide range of industrial services, how to execute smart contracts efficiently becomes a big challenge nowadays. Due to the existence of Byzantine nodes, existing mature concurrency control protocols in databases cannot be employed directly, since the mechanism of executing smart contracts varies a lot. Furthermore, even though smart contract execution follows a two-phase style, i.e., the primary node executes a batch of smart contracts in the first phase and the validators replay them in the second phase, existing parallel solutions merely focus on the optimization of the first phase, rather than the second phase. In this paper, we propose a novel two-phase concurrency control protocol to optimize both phases for the first time. First, the primary executes transactions in parallel and generates a transaction dependency graph with high parallelism for validators. Then, a graph partition algorithm is devised to divide the original graph into several sub-graphs to preserve parallelism and reduce communication cost remarkably. Finally, we propose a deterministic replay protocol to re-execute the primary’s parallel schedule concurrently. Moreover, this two-phase protocol is further optimized by integrating with PBFT. Theoretical analysis and extensive experimental results illustrate that the proposed scheme outperforms state-of-the-art solutions significantly. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
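Entry 7's primary node executes transactions in parallel and hands validators a dependency graph so they can deterministically replay the same schedule. The sketch below shows a common way to derive such a graph from read/write sets and to replay it in conflict-respecting batches; the transactions, their read/write sets, and the batch-based replay are illustrative assumptions, not the paper's PBFT-integrated protocol.
```python
from collections import defaultdict, deque

# Transactions with assumed read/write sets (illustrative only).
transactions = {
    "t1": {"reads": {"A"}, "writes": {"B"}},
    "t2": {"reads": {"B"}, "writes": {"C"}},
    "t3": {"reads": {"A"}, "writes": {"D"}},
    "t4": {"reads": {"C", "D"}, "writes": {"A"}},
}

def dependency_graph(txns, order):
    """Edge ti -> tj if tj (later in the primary's order) conflicts with ti."""
    edges = defaultdict(set)
    for i, ti in enumerate(order):
        for tj in order[i + 1:]:
            wi, ri = txns[ti]["writes"], txns[ti]["reads"]
            wj, rj = txns[tj]["writes"], txns[tj]["reads"]
            if wi & (rj | wj) or ri & wj:   # WR, WW, or RW conflict
                edges[ti].add(tj)
    return edges

def deterministic_replay_order(txns, edges):
    """Topological batches: a transaction becomes ready once its deps finish."""
    indegree = {t: 0 for t in txns}
    for src in edges:
        for dst in edges[src]:
            indegree[dst] += 1
    ready = deque(sorted(t for t, d in indegree.items() if d == 0))
    schedule = []
    while ready:
        batch = list(ready)        # these could be re-executed in parallel
        ready.clear()
        schedule.append(batch)
        for t in batch:
            for nxt in edges[t]:
                indegree[nxt] -= 1
                if indegree[nxt] == 0:
                    ready.append(nxt)
    return schedule

if __name__ == "__main__":
    order = ["t1", "t2", "t3", "t4"]
    graph = dependency_graph(transactions, order)
    print("dependency edges:", dict(graph))
    print("parallel replay batches:", deterministic_replay_order(transactions, graph))
```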
8. Space-Efficient Subgraph Search Over Streaming Graph With Timing Order Constraint.
- Author
-
Li, Youhuan, Zou, Lei, Ozsu, M. Tamer, and Zhao, Dongyan
- Subjects
STREAMING video & television, SOCIAL networks, PUBLIC spaces, DATA analysis
- Abstract
The growing popularity of dynamic applications such as social networks provides a promising way to detect valuable information in real time. These applications create high-speed data that can be easily modeled as a streaming graph. Efficient analysis over these data is of great significance. In this paper, we study the subgraph (isomorphism) search over streaming graph data that obeys timing order constraints over the occurrence of edges in the stream. The sliding window model is employed to focus on the most recent data. We propose an efficient solution to answer subgraph search, introduce optimizations to greatly reduce the space cost, and design concurrency management to improve system throughput. Extensive experiments on real network traffic data and synthetic social streaming data show that our solution outperforms comparable ones by one order of magnitude with less space cost. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
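Entry 8 searches for subgraphs over a streaming graph while a sliding window keeps only the most recent edges and a timing-order constraint restricts matches. The sketch below shows just the window maintenance and a timing-ordered two-edge match; the window length, edge format, and brute-force matching are assumptions, and the paper's space-efficient index is not reproduced.
```python
from collections import deque

class SlidingWindowGraph:
    """Keep only edges whose timestamp falls inside the last `window` time units."""
    def __init__(self, window):
        self.window = window
        self.edges = deque()          # (timestamp, src, dst), in arrival order

    def insert(self, timestamp, src, dst):
        self.edges.append((timestamp, src, dst))
        self._expire(timestamp)

    def _expire(self, now):
        # Drop edges that have slid out of the window.
        while self.edges and self.edges[0][0] <= now - self.window:
            self.edges.popleft()

    def match_ordered_path(self, a, b, c):
        """Find a->b->c where the first edge occurred before the second
        (a timing-order constraint on edge occurrence)."""
        hits = []
        for t1, s1, d1 in self.edges:
            if (s1, d1) != (a, b):
                continue
            for t2, s2, d2 in self.edges:
                if (s2, d2) == (b, c) and t2 > t1:
                    hits.append((t1, t2))
        return hits

if __name__ == "__main__":
    g = SlidingWindowGraph(window=10)
    g.insert(1, "u", "v")
    g.insert(4, "v", "w")
    g.insert(20, "x", "y")            # expires the first two edges
    print(g.match_ordered_path("u", "v", "w"))   # [] because they slid out
```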
9. Handling Concurrency in Behavior Trees.
- Author
-
Colledanchise, Michele and Natale, Lorenzo
- Subjects
FINITE state machines, VIDEO game industry, SOURCE code, TREES
- Abstract
This article addresses the concurrency issues affecting behavior trees (BTs), a popular tool to model the behaviors of autonomous agents in the video game and robotics industries. BT designers can easily build complex behaviors by composing simpler ones, which represents a key advantage of BTs. The parallel composition of BTs expresses a way to combine concurrent behaviors that has high potential, since composing pre-existing BTs in parallel is easier than composing classical control architectures, such as finite state machines or teleo-reactive programs, in parallel. However, BT designers rarely use such composition due to the underlying concurrency problems, similar to the ones faced in concurrent programming. As a result, the parallel composition, despite its potential, finds application only in the composition of simple behaviors or where the designer can guarantee the absence of conflicts by design. In this article, we define two new BT nodes to tackle the concurrency problems in BTs and we show how to exploit them to create predictable behaviors. In addition, we introduce measures to assess execution performance and show how different design choices affect them. We validate our approach in both simulations and the real world. Simulated experiments provide statistically significant data, whereas real-world experiments show the applicability of our method on real robots. We provide an open-source implementation of the novel BT formulation and have published all the source code to reproduce the numerical examples and experiments. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
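Entry 9 studies the parallel composition of behavior trees, whose concurrency semantics the article makes predictable with two new node types. Those nodes are specific to the article; the sketch below only implements a conventional parallel node over simple leaf actions so that the kind of composition under discussion is concrete. The tick semantics and thresholds follow common BT conventions and are assumptions here.
```python
from enum import Enum

class Status(Enum):
    SUCCESS = 1
    FAILURE = 2
    RUNNING = 3

class Action:
    """Leaf node: runs for a fixed number of ticks, then succeeds."""
    def __init__(self, name, ticks_needed):
        self.name = name
        self.remaining = ticks_needed

    def tick(self):
        if self.remaining > 0:
            self.remaining -= 1
            return Status.RUNNING
        return Status.SUCCESS

class Parallel:
    """Classical parallel node: ticks every child each cycle.
    Succeeds when at least `success_threshold` children have succeeded,
    fails as soon as success becomes impossible."""
    def __init__(self, children, success_threshold):
        self.children = children
        self.success_threshold = success_threshold

    def tick(self):
        statuses = [child.tick() for child in self.children]
        successes = statuses.count(Status.SUCCESS)
        failures = statuses.count(Status.FAILURE)
        if successes >= self.success_threshold:
            return Status.SUCCESS
        if failures > len(self.children) - self.success_threshold:
            return Status.FAILURE
        return Status.RUNNING

if __name__ == "__main__":
    tree = Parallel([Action("navigate", 3), Action("scan", 1)],
                    success_threshold=2)
    status = Status.RUNNING
    while status is Status.RUNNING:
        status = tree.tick()
        print("tick ->", status)
```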
10. ConcSpectre: Be Aware of Forthcoming Malware Hidden in Concurrent Programs.
- Author
-
Liu, Yang, Xu, Zisen, Fan, Ming, Hao, Yu, Chen, Kai, Chen, Hao, Cai, Yan, Yang, Zijiang, and Liu, Ting
- Subjects
MALWARE, INTERNET servers, COMPUTER systems, DEBUGGING, COMPUTER software security
- Abstract
Concurrent programs with multiple threads executing in parallel are widely used to unleash the power of multicore computing systems. Owing to their complexity, a lot of research focuses on testing and debugging concurrent programs. Besides correctness, we find that security can also be compromised by concurrency. In this article, we present concurrent program spectre (ConcSpectre), a new security threat that hides malware in nondeterministic thread interleavings. To demonstrate such threat, we have developed a stealth malware technique called concurrent logic bomb by partitioning a piece of malicious code and injecting its components separately into a concurrent program. The malicious behavior can be triggered by certain thread interleavings that rarely happen (e.g., <1%) under a normal execution environment. However, with a new technique called controllable probabilistic activation, we can activate such ConcSpectre malware with a very high probability (e.g., >90%) by remotely disturbing thread scheduling. In the evaluation, more than 1000 ConcSpectre samples are generated, which bypassed most of the antivirus engines in VirusTotal and four well-known online dynamic malware analysis systems. We also demonstrate how to remotely trigger a ConcSpectre sample on a web server and control its activation probability. Our work shows an urgent need for new malware analysis methods for concurrent programs. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
11. MUSE: A Multistage Assembling Algorithm for Simultaneous Localization of Large-Scale Massive Passive RFIDs.
- Author
-
Ma, Yongtao, Tian, Chenglong, and Liu, Hankai
- Subjects
GRAPH theory, KNOWLEDGE graphs, ALGORITHMS, SLAM (Robotics), SYNTHETIC aperture radar, THEORY of knowledge
- Abstract
In this paper, MUSE, an algorithm enabled by a backscattering tag-to-tag network (BTTN), is presented to accomplish simultaneous 2-D localization of large-scale (10 m × 10 m), massive (20–50) passive UHF RFIDs. In BTTNs, the most intractable problem is the high-frequency loss of range measurements. In a particular case of 30 tags to be located with a maximum communication range of 3 m, the loss rate is nearly 85.75 percent. In the proposed framework, we utilize relevant knowledge from graph theory to obtain underlying subsets in which tags can communicate with each other and then assemble them stage by stage to achieve overall localization. Theoretical analysis shows that multistage assembly imparts extraordinary characteristics to MUSE: assembling rectifies fragment maps given some condition, and in later stages prevents errors flowing down into the next stage. Experimental analysis shows that the condition is easy to satisfy. Furthermore, an analytical expression for the Cramér-Rao lower bound is also derived as a benchmark to evaluate the localization performance. Extensive simulations demonstrate that MUSE outperforms existing algorithms for simultaneous localization. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
12. Enabling Scalable and Extensible Memory-Mapped Datastores in Userspace.
- Author
-
Peng, Ivy B., Gokhale, Maya B., Youssef, Karim, Iwabuchi, Keita, and Pearce, Roger
- Subjects
DYNAMIC loads, SYSTEMS software, DYNAMIC balance (Mechanics), MACHINE learning, SOFTWARE architecture, METAGENOMICS
- Abstract
Exascale workloads are expected to incorporate data-intensive processing in close coordination with traditional physics simulations. These emerging scientific, data-analytics and machine learning applications need to access a wide variety of datastores in flat files and structured databases. Programmer productivity is greatly enhanced by mapping datastores into the application process's virtual memory space to provide a unified “in-memory” interface. Currently, memory mapping is provided by system software primarily designed for generality and reliability. However, scalability at high concurrency is a formidable challenge on exascale systems. Also, there is a need for extensibility to support new datastores potentially requiring HPC data transfer services. In this article, we present UMap, a scalable and extensible userspace service for memory-mapping datastores. Through decoupled queue management, concurrency aware adaptation, and dynamic load balancing, UMap enables application performance to scale even at high concurrency. We evaluate UMap in data-intensive applications, including sorting, graph traversal, database operations, and metagenomic analytics. Our results show that UMap as a userspace service outperforms an optimized kernel-based service across a wide range of intra-node concurrency by 1.22–1.9×. We performed two case studies to demonstrate UMap's extensibility. First, a new datastore residing in remote memory is incorporated into UMap as an application-specific plugin. Second, we present a persistent memory allocator Metall built atop UMap for unified storage/memory. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
13. libEnsemble: A Library to Coordinate the Concurrent Evaluation of Dynamic Ensembles of Calculations.
- Author
-
Hudson, Stephen, Larson, Jeffrey, Navarro, John-Luke, and Wild, Stefan M.
- Subjects
WORKFLOW management, COORDINATES
- Abstract
Almost all applications stop scaling at some point; those that don't are seldom performant when considering time to solution on anything but aspirational/unicorn resources. Recognizing these tradeoffs as well as greater user functionality in a near-term exascale computing era, we present libEnsemble, a library aimed at particular scalability- and capability-stretching uses. libEnsemble enables running concurrent instances of an application in dynamically allocated ensembles through an extensible Python library. We highlight the structure, execution, and capabilities of the library on leading pre-exascale environments as well as advanced capabilities for exascale environments and beyond. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
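Entry 13's libEnsemble coordinates dynamically allocated ensembles of concurrent calculations through an extensible Python library. Rather than reproduce the libEnsemble API, the sketch below uses only the standard library to show the underlying pattern of a manager keeping several ensemble members in flight at once; the simulation function and parameter sets are placeholders.
```python
import math
from concurrent.futures import ProcessPoolExecutor, as_completed

def simulate(params):
    """Placeholder for one ensemble member (e.g., a simulation run)."""
    x = params["x"]
    return {"x": x, "result": math.sin(x) * math.exp(-0.1 * x)}

def run_ensemble(parameter_sets, max_workers=4):
    """Manager: keep `max_workers` simulations in flight until all finish."""
    results = []
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(simulate, p): p for p in parameter_sets}
        for future in as_completed(futures):
            results.append(future.result())
    return results

if __name__ == "__main__":
    ensemble = [{"x": 0.5 * i} for i in range(16)]
    for record in sorted(run_ensemble(ensemble), key=lambda r: r["x"]):
        print(f"x={record['x']:4.1f}  result={record['result']:+.4f}")
```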
14. Distributed Tree-Based Machine Learning for Short-Term Load Forecasting With Apache Spark
- Author
-
Ameema Zainab, Ali Ghrayeb, Haitham Abu-Rub, Shady S. Refaat, and Othmane Bouhali
- Subjects
Apache spark, concurrent computing, load forecasting, parallel processing, resource management, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Machine learning algorithms have been intensively applied to perform load forecasting to obtain better accuracies as compared to traditional statistical methods. However, with the huge increase in data size, sophisticated models have to be created which require big data platforms. Optimal and effective use of the available computational resources can be attained by maximizing the effective utilization of the cluster nodes. Parallel computing is demanded to allow for optimal resource utilization in dealing with smart grid big data. In this paper, a master-slave parallel computing paradigm is utilized and experimented with for load forecasting in a multi-AMI environment. The paper proposes a concurrent job scheduling algorithm in a multi-energy data source environment using Apache Spark. An efficient resource utilization strategy is proposed for submitting multiple Spark jobs to reduce job completion time. The optimal number of clusters is used in this paper to group the data and additionally reduce the computational time. Multiple tree-based machine learning algorithms are tested with parallel computation to evaluate the performance with tunable parameters on a real-world dataset. Three years of real data from one thousand distribution transformers in Spain are used to demonstrate the performance of the proposed methodology, with a trade-off between accuracy and processing time.
- Published
- 2021
- Full Text
- View/download PDF
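Entry 14 schedules multiple Spark jobs concurrently so that per-cluster forecasting models train in parallel. The sketch below illustrates only that scheduling pattern, with a thread pool submitting one job per data cluster; the train_forecaster function, cluster list, and reported error metric are placeholders, and a real deployment would submit Spark work inside the job function (Spark accepts concurrent jobs from multiple threads sharing one session).
```python
import random
import time
from concurrent.futures import ThreadPoolExecutor

def train_forecaster(cluster_id):
    """Placeholder for training one short-term load-forecasting model.
    In the paper's setting, this is where a Spark job over the cluster's
    smart-meter data would be submitted."""
    time.sleep(random.uniform(0.1, 0.3))                    # pretend work
    return cluster_id, round(random.uniform(2.0, 5.0), 2)   # fake MAPE (%)

def schedule_jobs(cluster_ids, max_concurrent_jobs=4):
    """Submit one job per data cluster, keeping several in flight at once."""
    with ThreadPoolExecutor(max_workers=max_concurrent_jobs) as pool:
        return dict(pool.map(train_forecaster, cluster_ids))

if __name__ == "__main__":
    clusters = [f"cluster-{i}" for i in range(8)]
    start = time.perf_counter()
    errors = schedule_jobs(clusters)
    print(f"trained {len(errors)} models in {time.perf_counter() - start:.2f}s")
    for cid, mape in errors.items():
        print(cid, "MAPE:", mape)
```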
15. Performance Analysis, Design Considerations, and Applications of Extreme-Scale In Situ Infrastructures
- Author
-
Bethel, E. [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)]
- Published
- 2016
- Full Text
- View/download PDF
16. Garou: An Efficient and Secure Off-Blockchain Multi-Party Payment Hub.
- Author
-
Ye, Yongjie, Ren, Zhifeng, Luo, Xiapu, Zhang, Jingjing, and Wu, Weigang
- Abstract
To mitigate the scalability problem of decentralized cryptocurrencies such as Bitcoin and Ethereum, the payment channel, which allows two parties to perform secure coin transfers without involving the blockchain, has been proposed. The payment channel increases the transaction throughput of two parties to a level that is only limited by their network bandwidth. Recent proposals focus on extending the two-party payment channel to the N-party payment hub. Unfortunately, none of them can achieve both efficiency and flexibility in the absence of a trusted third party. In this paper, we propose Garou, a secure N-party payment hub that allows multiple parties to perform secure off-chain coin transfers. Except in the case of disputes, participants within the payment hub can make concurrent and direct coin transfers with each other without the involvement of the blockchain or any third-party intermediaries. This allows Garou to achieve both high performance and flexibility. If all participants stay online, Garou guarantees all honest users’ balance security against strong adversarial capabilities. To demonstrate the feasibility of the Garou protocol, we develop a proof of concept prototype for the Ethereum network. Experimental evaluations show that, with 300 participants and without disputes, the maximum transaction throughput of Garou is 20× higher than that of state-of-the-art payment hubs. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
17. Performance testing of STL and Qt library elements in multi-threaded processing
- Author
-
Piotr Krasowski and Jakub Smołka
- Subjects
concurrent computing, multithreading, container performance, data structures, Information technology, T58.5-58.64, Electronic computers. Computer science, QA75.5-76.95
- Abstract
In recent years multithreaded processing has become an important aspect of programming. Computers with a multi-core processor are now widely available, enabling the creation of more efficient applications. Many libraries support multithreaded solutions, but performance information is often lacking. The use of appropriate data structures and algorithms significantly speeds up the process of creating and developing applications. This article describes selected elements of the Qt and STL libraries and compares their performance in concurrent programming. The time needed to perform individual operations was analysed.
- Published
- 2020
- Full Text
- View/download PDF
18. Dynamic Block-Wise Local Learning Algorithm for Efficient Neural Network Training.
- Author
-
Lee, Gwangho, Lee, Sunwoo, and Jeon, Dongsuk
- Subjects
MACHINE learning, ALGORITHMS, ARTIFICIAL neural networks, COMPUTATIONAL complexity
- Abstract
In the backpropagation algorithm, the error calculated from the output of the neural network must be backpropagated through the layers to update the weights of each layer, making it difficult to parallelize the training process and requiring frequent off-chip memory access. Local learning algorithms locally generate error signals which are used for weight updates, removing the need for backpropagation of error signals. However, prior works rely on large, complex auxiliary networks for reliable training, which results in large computational overhead and undermines the advantages of local learning. In this work, we propose a local learning algorithm that significantly reduces computational complexity as well as improves training performance. Our algorithm combines multiple consecutive layers into a block and performs local learning on a block-by-block basis, while dynamically changing block boundaries during training. In experiments, our approach achieves 95.68% and 79.42% test accuracy on the CIFAR-10 and CIFAR-100 datasets, respectively, using a small fully connected layer as the auxiliary network, closely matching the performance of the backpropagation algorithm. Multiply-accumulate (MAC) operations and off-chip memory access are also reduced by up to 15% and 81%, respectively, compared to backpropagation. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
19. Workflow Refactoring for Maximizing Concurrency and Block-Structuredness.
- Author
-
Song, Wei, Jacobsen, Hans-Arno, Cheung, S.C., Liu, Hongyu, and Ma, Xiaoxing
- Abstract
In the era of Internet and big data, contemporary workflows become increasingly large in scale and complex in structure, introducing greater challenges for workflow modeling. Workflows often lack maximized concurrency and block-structuredness in terms of control flow, even though languages supporting block-structuredness (e.g., BPEL) are employed. Existing workflow refactoring approaches mostly focus on maximizing concurrency according to dependences between activities, but do not consider the block-structuredness of the refactored workflow. It is easier to comprehend and analyze a workflow that is block-structured and to transform it into BPEL-like processes. In this paper, we aim at maximizing both concurrency and block-structuredness. Nevertheless, not all workflows can be refactored with a block-structured representation, and it is intractable to make sure that the refactored workflows are as block-structured as possible. We first define a well-formed dependence pattern of activities. The control flow among the activities in this pattern can be represented in block-structured forms with maximized concurrency. Then, we propose a greedy heuristics-based graph reduction approach to recursively find such patterns. In this way, the resulting workflow has maximized concurrency and its block-structuredness approximates the optimum. We show the effectiveness and efficiency of our approach with real-world scientific workflows. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
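Entry 19 refactors workflows so that activities without control-flow dependences run concurrently. A standard way to expose the concurrency a dependence relation permits is to group activities into levels, where everything on a level can be placed in one parallel block, as sketched below for an assumed dependence graph; the paper additionally enforces block-structuredness, which this sketch does not attempt.
```python
# Assumed control-flow dependences of a small workflow:
# an edge (a, b) means activity b must wait for activity a.
dependences = [
    ("receive_order", "check_stock"),
    ("receive_order", "check_credit"),
    ("check_stock", "ship"),
    ("check_credit", "ship"),
    ("ship", "invoice"),
]

def concurrency_levels(edges):
    """Assign each activity a level; activities on the same level have no
    dependence between them and can be placed in a parallel block."""
    activities = {a for e in edges for a in e}
    preds = {a: {s for s, d in edges if d == a} for a in activities}
    level = {}
    while len(level) < len(activities):
        for a in activities:
            if a in level:
                continue
            if all(p in level for p in preds[a]):
                level[a] = 1 + max((level[p] for p in preds[a]), default=0)
    blocks = {}
    for a, lv in level.items():
        blocks.setdefault(lv, []).append(a)
    return [sorted(blocks[lv]) for lv in sorted(blocks)]

if __name__ == "__main__":
    for i, block in enumerate(concurrency_levels(dependences), start=1):
        print(f"level {i}: run concurrently -> {block}")
```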
20. ConcurDB: Concurrent Query Authentication for Outsourced Databases.
- Author
-
Bajaj, Sumeet, Chakraborti, Anrin, and Sion, Radu
- Subjects
DATABASES, NETWORK hubs, DATABASE security, ONLINE data processing, DATA structures
- Abstract
Clients of outsourced databases need Query Authentication (QA) guaranteeing the integrity and authenticity of query results returned by potentially compromised providers. Prior work provides QA assurances for a limited class of queries by deploying several software-based cryptographic constructs. The constructs are often designed assuming read-only or infrequently updated databases. For dynamic datasets, the data owner is required to perform all updates on behalf of clients. Hence, for concurrent updates by multiple clients, such as for OLTP workloads, existing QA solutions are inefficient. We present ConcurDB, a concurrent QA scheme that enables simultaneous updates by multiple clients. To realize concurrent QA, we have designed several new mechanisms. First, we identify and use an important relationship between QA and memory checking to decouple query execution and verification. We allow clients to execute transactions concurrently and perform verifications in parallel using an offline memory checking based protocol. Then, to extend QA to a multi-client scenario, we design new protocols that enable clients to securely exchange a small set of authentication data even when using the untrusted provider as a communication hub. Finally, we overcome provider-side replay attacks. Using ConcurDB, we provide and evaluate concurrent QA for the full TPC-C benchmark. For updates, ConcurDB shows a 4x performance increase over existing solutions. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
21. Concurrent Healthcare Data Processing and Storage Framework Using Deep-Learning in Distributed Cloud Computing Environment.
- Author
-
Yan, Shengguang, He, Lijuan, Seo, Jaebok, and Lin, Minmin
- Abstract
Distributed cloud computing environments rely on sophisticated communication and sharing paradigms for ease of access, information processing, and analysis. The challenging characteristic of such cloud computing environments is concurrent access, as both the service provider and the end-user rely on a common sharing platform. In this article, a retrieval and storage-based indexing framework (RSIF) is designed to improve the concurrency of user and service provider access to cloud-stored healthcare data. Concurrency is achieved through replication-free and continuous indexing and time-constrained retrieval of stored information. The process of classifying the constraints for data augmentation and update is performed using deep learning for all the storage instances. Through conditional assessment, the learning process determines the approximation of indexing and ordering for storing and retrieval, respectively. This helps to reduce the time for access and retrieval concurrently, provided the processes are independent. The simulation analysis using the metrics of discontinuous indexing, replicated data, retrieval time, and cost proves the reliability of the proposed framework. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
22. OODIDA: On-Board/Off-Board Distributed Real-Time Data Analytics for Connected Vehicles.
- Author
-
Ulm, Gregor, Smith, Simon, Nilsson, Adrian, Gustavsson, Emil, and Jirstrand, Mats
- Subjects
PROGRAMMING languages, ELECTRONIC data processing, VEHICLES, VEHICLE routing problem, EDGE computing
- Abstract
A fleet of connected vehicles easily produces many gigabytes of data per hour, making centralized (off-board) data processing impractical. In addition, there is the issue of distributing tasks to on-board units in vehicles and processing them efficiently. Our solution to this problem is On-board/Off-board Distributed Data Analytics (OODIDA), which is a platform that tackles both task distribution to connected vehicles as well as concurrent execution of tasks on arbitrary subsets of edge clients. Its message-passing infrastructure has been implemented in Erlang/OTP, while the end points use a language-independent JSON interface. Computations can be carried out in arbitrary programming languages. The message-passing infrastructure of OODIDA is highly scalable, facilitating the execution of large numbers of concurrent tasks. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
23. Facilitating rapid prototyping in the distributed data analytics platform OODIDA via active-code replacement
- Author
-
Gregor Ulm, Simon Smith, Adrian Nilsson, Emil Gustavsson, and Mats Jirstrand
- Subjects
Distributed computing, Concurrent computing, Distributed data processing, Hot swapping, Code replacement, Erlang, Computer engineering. Computer hardware, TK7885-7895, Electronic computers. Computer science, QA75.5-76.95
- Abstract
OODIDA (On-board/Off-board Distributed Data Analytics) is a platform for distributed real-time analytics, targeting fleets of reference vehicles in the automotive industry. Its users are data analysts. The bulk of the data analytics tasks are performed by clients (on-board), while a central cloud server performs supplementary tasks (off-board). OODIDA can be automatically packaged and deployed, which necessitates restarting parts of the system, or all of it. As this is potentially disruptive, we added the ability to execute user-defined Python modules on clients as well as the server. These modules can be replaced without restarting any part of the system; they can even be replaced between iterations of an ongoing assignment. This feature is referred to as active-code replacement. It facilitates use cases such as iterative A/B testing of machine learning algorithms or modifying experimental algorithms on-the-fly. Various safeguards are in place to ensure that custom code does not have harmful consequences, for instance by limiting the allowed types for return values or prohibiting importing of certain modules of the Python standard library. Consistency of results is achieved by majority vote, which prevents tainted state. Our evaluation shows that active-code replacement can be done in less than a second in an idealized setting whereas a standard deployment takes many orders of magnitude more time. The main contribution of this paper is the description of a relatively straightforward approach to active-code replacement that is very user-friendly. It enables a data analyst to quickly execute custom code on the cloud server as well as on client devices. Sensible safeguards and design decisions ensure that this feature can be used by non-specialists who are not familiar with the implementation of OODIDA in general or this feature in particular. As a consequence of adding the active-code replacement feature, OODIDA is now very well-suited for rapid prototyping.
- Published
- 2020
- Full Text
- View/download PDF
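Entry 23's active-code replacement ships user-defined Python modules to running clients, with safeguards such as restricting allowed return types and prohibiting certain imports. The sketch below shows one plausible shape for such safeguards: load a user-supplied function from source text, run it, and reject forbidden imports or disallowed return types. The allow-list, deny-list, function name, and loading mechanism are assumptions, not OODIDA's actual implementation.
```python
import types

# Allow-list of return types for user-supplied analytics code (assumed policy).
ALLOWED_RETURN_TYPES = (int, float, list, dict)

# Modules the user code must not import (assumed subset of a real deny-list).
FORBIDDEN_IMPORTS = ("os", "subprocess", "socket")

USER_SOURCE = """
def process(values):
    # User-defined on-board analytics: return the mean of a window.
    return sum(values) / len(values)
"""

def load_user_module(source):
    """Reject obviously forbidden imports, then execute the module source."""
    for name in FORBIDDEN_IMPORTS:
        if f"import {name}" in source:
            raise ValueError(f"forbidden import: {name}")
    module = types.ModuleType("user_code")
    exec(compile(source, "<active-code>", "exec"), module.__dict__)
    return module

def run_safely(source, values):
    module = load_user_module(source)
    result = module.process(values)
    if not isinstance(result, ALLOWED_RETURN_TYPES):
        raise TypeError(f"disallowed return type: {type(result).__name__}")
    return result

if __name__ == "__main__":
    print("client result:", run_safely(USER_SOURCE, [3.0, 4.0, 5.0]))
```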
24. Global Analysis of C Concurrency in High-Level Synthesis.
- Author
-
Ramanathan, Nadesh, Constantinides, George A., and Wickerson, John
- Subjects
GLOBAL analysis (Mathematics), COMPILERS (Computer programs), FIELD programmable gate arrays, CARTOGRAPHY software
- Abstract
When mapping C programs to hardware, high-level synthesis (HLS) tools reorder independent instructions, aiming to obtain a schedule that requires as few clock cycles as possible. However, when synthesizing multithreaded C programs, reordering opportunities are limited by the presence of atomic operations (“atomics”), the fundamental concurrency primitives in C. Existing HLS tools analyze and schedule each thread in isolation. In this article, we argue that thread-local analysis is conservative, especially since HLS compilers have access to the entire program. Hence, we propose a global analysis that exploits information about memory accesses by all threads when scheduling each thread. Implemented in the LegUp HLS tool, our analysis is sensitive to sequentially consistent (SC) and weak atomics and supports loop pipelining. Since the semantics of C atomics is complicated, we formally verify that our analysis correctly implements the C memory model using the Alloy model checker. Compared with thread-local analysis, our global analysis achieves a $2.3\times $ average speedup on a set of lock-free data structures and data-flow patterns. We also apply our analysis to a larger application: a lock-free, streamed, and load-balanced implementation of Google’s PageRank, where we see a $1.3\times $ average speedup compared with the thread-local analysis. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
25. A Fast MPEG’s CDVS Implementation for GPU Featured in Mobile Devices
- Author
-
Alessandro Garbo and Stefano Quer
- Subjects
Computer applications, concurrent computing, embedded software, image analysis, object detection, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
The Moving Picture Experts Group's Compact Descriptors for Visual Search (MPEG's CDVS) intends to standardize technologies in order to enable an interoperable, efficient, and cross-platform solution for internet-scale visual search applications and services. Among the key technologies within CDVS, we recall the format of visual descriptors, the descriptor extraction process, and the algorithms for indexing and matching. Unfortunately, these steps require precision and computation accuracy. Moreover, they are very time-consuming, as they need running times in the order of seconds when implemented on the central processing unit (CPU) of modern mobile devices. In this paper, to reduce computation times and maintain precision and accuracy, we re-design, for many-cores embedded graphical processor units (GPUs), all main local descriptor extraction pipeline phases of the MPEG's CDVS standard. To reach this goal, we introduce new techniques to adapt the standard algorithm to parallel processing. Furthermore, to reduce memory accesses and efficiently distribute the kernel workload, we use new approaches to store and retrieve CDVS information on proper GPU data structures. We present a complete experimental analysis on a large and standard test set. Our experiments show that our GPU-based approach is remarkably faster than the CPU-based reference implementation of the standard, and it maintains a comparable precision in terms of true and false positive rates.
- Published
- 2018
- Full Text
- View/download PDF
26. A Polynomial-Time Algorithm to Obtain State Machine Cover of Live and Safe Petri Nets.
- Author
-
Karatkevich, Andrei G. and Wisniewski, Remigiusz
- Subjects
PETRI nets, ALGORITHMS, DATA mining, COMPUTATIONAL complexity, MATHEMATICAL models, DISCRETE systems
- Abstract
A method to find the minimized state machine cover of live and safe Petri nets (PNs) is proposed in this paper. Such a cover is required in some implementation methods of PN-based control algorithms and in other applications such as data mining. Most of the known methods for its computation have at least exponential time and space complexity; this is true not only for the methods which guarantee that a minimal cover is obtained but also for some methods of finding approximate solutions. The proposed algorithm is an approximation algorithm and has polynomial computational complexity. The experimental verification shows a very high efficiency of the presented technique. The proposed method is especially valuable in the case of large systems, where a solution is obtained hundreds of times more quickly than by other methods. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
27. Analysis of Threading Libraries for High Performance Computing.
- Author
-
Castello, Adrian, Gual, Rafael Mayo, Seo, Sangmin, Balaji, Pavan, Quintana-Orti, Enrique S., and Pena, Antonio J.
- Subjects
HIGH performance computing
- Abstract
With the appearance of multi-/many-core machines, applications and runtime systems have evolved in order to exploit the new on-node concurrency brought by new software paradigms. POSIX threads (Pthreads) was widely adopted for that purpose and remains the most used threading solution on current hardware. Lightweight thread (LWT) libraries emerged as an alternative, offering lighter mechanisms to tackle the massive concurrency of current hardware. In this article, we analyze in detail the most representative threading libraries, including Pthread- and LWT-based solutions. In addition, to examine the suitability of LWTs for different use cases, we develop a set of microbenchmarks consisting of OpenMP patterns commonly found in current parallel codes, and we compare the results using threading libraries and OpenMP implementations. Moreover, we study the semantics offered by threading libraries in order to expose the similarities among different LWT application programming interfaces and their advantages over Pthreads. This article shows that LWT libraries outperform solutions based on operating system threads when tasks and nested parallelism are required. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
28. An MDE-Based Tool for Early Analysis of UML2.0/PSM Atomic and Composite Components.
- Author
-
Rouis, Taoufik Sakka, Bhiri, Mohamed Tahar, Sliman, Layth, and Kmimech, Mourad
- Abstract
System analysis is a crucial activity throughout component-based architecture design. It enables detecting and correcting errors in the early stages of the system development life cycle. In this article, we consider system analysis in the UML2.0 component-based architectural design phase. This is done by proposing a model-driven engineering (MDE) tool called UML2Ada. It enables the systematic translation of UML2.0 atomic and composite components into the Ada concurrent language. This, in turn, supports the validation of the source description using Ada dynamic analysis tools such as GNATprove and ObjectAda. In addition, by using an Ada static analysis tool such as FLAVERS or INCA, the proposed tool enables the detection of the potential behavioral concurrency properties of the Ada concurrent program. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
29. Adaptive Model-Based Scheduling in Software Transactional Memory.
- Author
-
Di Sanzo, Pierangelo, Pellegrini, Alessandro, Sannicandro, Marco, Ciciani, Bruno, and Quaglia, Francesco
- Subjects
SCHEDULING software, MARKOV processes, AUTHORSHIP, MEMORY, PREDICTION models
- Abstract
Software Transactional Memory (STM) stands as a powerful concurrent programming paradigm, enabling atomicity and isolation while accessing shared data. On the downside, STM may suffer from performance degradation due to excessive conflicts among concurrent transactions, which cause a waste of CPU cycles and energy because of transaction aborts. An approach to cope with this issue consists of putting in place smart scheduling strategies which temporarily suspend the execution of some transactions in order to reduce the transaction conflict rate. In this article, we present an adaptive model-based transaction scheduling technique relying on a Markov Chain-based performance model of STM systems. Our scheduling technique is adaptive in a twofold sense: (i) it controls the execution of transactions depending on throughput predictions by the model as a function of the current system state; (ii) it re-tunes the Markov Chain-based model on-line to adapt it, and the resulting transaction scheduling decisions, to dynamic variations of the workload. We have been able to achieve the latter target thanks to the fact that our performance model is extremely lightweight. In fact, to be recomputed, it requires a reduced set of input parameters, whose values can be estimated via a few on-line samples related to the current workload dynamics. We also present a scheduler that implements our adaptive technique, which we integrated within the open source TinySTM package. Further, we report the results of an experimental study based on the STAMP benchmark suite, which has been aimed at assessing both the accuracy of our performance model in predicting the actual system throughput and the advantages of the adaptive scheduling policy over literature techniques. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
30. Advances in Concurrent Computing for Digital Stochastic Measurement Simulation.
- Author
-
Pjevalica, Nebojsa, Pjevalica, Velibor, and Petrovic, Nenad
- Subjects
FOURIER transforms, ELECTRIC power distribution grids, STOCHASTIC analysis, SIMULATION methods & models, MEASUREMENT, ELECTRIC transients, DOMAIN decomposition methods
- Abstract
This paper introduces a concurrent computing technique for the acceleration of digital stochastic measurement simulations. Digital stochastic measurement is an advanced methodology based on a specific parallel hardware structure, utilized for orthogonal transformation calculus/decomposition. The methodology is analyzed in detail, starting from the very basic idea and moving toward recent references, covering the main research directions and trends. The oversampling nature of the evaluated digital stochastic measurement, along with demanding arithmetic requirements, implies exhausting simulation complexity. As a test case, several typical power grid signals were harmonically analyzed through a discrete Fourier transformation based on the proposed methodology. A harmonic decomposition was simulated with several levels of computing concurrency. Across all the simulated scenarios, the main success criterion was model accuracy, while the parameter used for selection of the optimal simulation computing technique was the overall calculation speed. Final results showed the thread-pool computing technique to be the optimal simulation platform. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
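Entry 30 accelerates digital stochastic measurement simulations by computing harmonic decompositions with several levels of computing concurrency, with a thread pool emerging as the best platform. The sketch below shows the general pattern: the DFT coefficients of a synthetic power-grid-like signal are computed concurrently, one harmonic per task. The signal, harmonic count, and use of a process pool (to sidestep Python's GIL for pure-Python arithmetic) are illustrative choices, not the paper's implementation.
```python
import cmath
import math
from concurrent.futures import ProcessPoolExecutor
from functools import partial

def dft_coefficient(signal, k):
    """Naive DFT: complex amplitude of the k-th harmonic of `signal`."""
    n = len(signal)
    return sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n)
               for t in range(n)) / n

def harmonic_analysis(signal, num_harmonics, workers=4):
    """Compute the first `num_harmonics` coefficients concurrently,
    one harmonic per task."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(partial(dft_coefficient, signal),
                             range(num_harmonics)))

if __name__ == "__main__":
    # Synthetic 50 Hz grid voltage with 3rd and 5th harmonics (assumed data),
    # sampled over exactly one fundamental period so bin k = k-th harmonic.
    n, f0 = 256, 50.0
    t = [i / (n * f0) for i in range(n)]
    signal = [math.sin(2 * math.pi * f0 * ti)
              + 0.2 * math.sin(2 * math.pi * 3 * f0 * ti)
              + 0.1 * math.sin(2 * math.pi * 5 * f0 * ti) for ti in t]
    for k, c in enumerate(harmonic_analysis(signal, 8)):
        print(f"harmonic {k}: amplitude {2 * abs(c):.3f}")
```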
31. Optimizing Parallel I/O Accesses through Pattern-Directed and Layout-Aware Replication.
- Author
-
He, Shuibing, Yin, Yanlong, Sun, Xian-He, Zhang, Xuechen, and Li, Zongpeng
- Subjects
SOLID state drives, DATA replication, BENEFIT performances, COMPUTER systems, PATTERNMAKING
- Abstract
As the performance gap between processors and storage devices keeps increasing, I/O performance becomes a critical bottleneck of modern high-performance computing systems. In this paper, we propose a pattern-directed and layout-aware data replication design, named PDLA, to improve the performance of parallel I/O systems. PDLA includes an HDD-based scheme H-PDLA and an SSD-based scheme S-PDLA. For applications with relatively low I/O concurrency, H-PDLA identifies access patterns of applications and makes a reorganized data replica for each access pattern on HDD-based servers with an optimized data layout. Moreover, to accommodate applications with high I/O concurrency, S-PDLA replicates critical access patterns that can bring performance benefits on SSD-based servers or on HDD-based and SSD-based servers. We have implemented the proposed replication scheme under the MPICH2 library on top of the OrangeFS file system. Experimental results show that H-PDLA can significantly improve the original parallel I/O system performance and demonstrate the advantages of S-PDLA over H-PDLA. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
32. Toward Automated Analysis of Electrocardiogram Big Data by Graphics Processing Unit for Mobile Health Application
- Author
-
Xiaomao Fan, Runge Chen, Chenguang He, Yunpeng Cai, Pu Wang, and Ye Li
- Subjects
GPU computing, mobile health, automated ECG analysis, parallel algorithm, concurrent computing, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
With the rapid development of mobile health technologies and applications in recent years, large amounts of electrocardiogram (ECG) signals that need to be processed in a timely manner have been produced. Although the CPU-based sequential automated ECG analysis algorithm (CPU-AECG) designed for identifying seven types of heartbeats has been in use for years, it is single-threaded, and handling many concurrent ECG signals still poses a severe challenge. In this paper, we propose a novel GPU-based automated ECG analysis algorithm (GPU-AECG) to effectively shorten the program executing time. A new concurrency-based GPU-AECG, named cGPU-AECG, is also developed to handle multiple concurrent signals. Compared with the CPU-AECG, our cGPU-AECG achieves a 35 times speedup when handling 24-h-long ECG data, without reducing the classification accuracy. With cGPU-AECG, we can handle 24-h ECG signals from thousands of users in a few seconds and provide prompt feedback, which not only greatly improves the user experience of mobile health services, but also reduces the economic cost of building healthcare platforms.
- Published
- 2017
- Full Text
- View/download PDF
33. LPM: A Systematic Methodology for Concurrent Data Access Pattern Optimization from a Matching Perspective.
- Author
-
Liu, Yuhang and Sun, Xian-He
- Subjects
COMPUTING platforms, COMPUTER systems, PATTERN matching, MATHEMATICAL optimization
- Abstract
As applications become increasingly data intensive, conventional computing systems become increasingly inefficient due to data access performance bottlenecks. While intensive efforts have been made in developing new memory technologies and in designing special purpose machines, there is a lack of solutions for evaluating and utilizing recent hardware advancements to address the memory-wall problem in a systematic way. In this study, we present the memory Layered Performance Matching (LPM) methodology to provide a systematic approach for data access performance optimization. LPM uniquely presents and utilizes data access concurrency, in addition to data access locality, in a hierarchical memory system. The LPM methodology consists of models and algorithms, and is supported by a series of analytic results for its correctness. The rationale of LPM is to reduce the overall data access delay through the matching of the data request rate and the data supply rate at each layer of a memory hierarchy, with a balanced consideration of data locality, data concurrency, and latency hiding of data flow. Extensive experimentation on both physical platforms and software simulators confirms our theoretical findings and shows that the LPM approach can be applied in diverse computing platforms and can effectively guide performance optimization of memory systems. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
34. The Queuing-First Approach for Tail Management of Interactive Services.
- Author
-
Mirhosseini, Amirhossein and Wenisch, Thomas F.
- Subjects
TAILS, HICCUPS
- Abstract
Managing high-percentile tail latencies is key to designing user-facing cloud services. Rare system hiccups or unusual code paths make some requests take 10×–100× longer than the average. Prior work seeks to reduce tail latency by trying to address primarily root causes of slow requests. However, often the bulk of requests comprising the tail are not these rare slow-to-execute requests. Rather, due to head-of-line blocking, most of the tail comprises requests enqueued behind slow-to-execute requests. Under high disparity service distributions, queuing effects drastically magnify the impact of rare system hiccups and can result in high tail latencies even under modest load. We demonstrate that improving the queuing behavior of a system often yields greater benefit than mitigating the individual system hiccups that increase service time tails. We suggest two general directions to improve system queuing behavior, server pooling and common-case service acceleration, and discuss circumstances where each is most beneficial. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
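Entry 34 argues that most tail requests are not themselves slow; they are queued behind rare slow requests, so queuing behavior dominates the tail. The simulation sketch below makes that visible for a single FIFO server with a high-disparity service distribution: the end-to-end (queueing plus service) tail is far worse than the service-time tail alone. The arrival rate, hiccup probability, and hiccup size are assumed parameters.
```python
import random

def simulate_fifo(n_requests=100_000, arrival_rate=0.7,
                  mean_service=1.0, hiccup_prob=0.001, hiccup_size=100.0):
    """Single FIFO server; a tiny fraction of requests suffer a 'hiccup'."""
    random.seed(1)
    clock, server_free_at = 0.0, 0.0
    service_times, sojourn_times = [], []
    for _ in range(n_requests):
        clock += random.expovariate(arrival_rate)              # next arrival
        service = random.expovariate(1.0 / mean_service)
        if random.random() < hiccup_prob:
            service += hiccup_size                              # rare slow request
        start = max(clock, server_free_at)                      # head-of-line blocking
        server_free_at = start + service
        service_times.append(service)
        sojourn_times.append(server_free_at - clock)
    return service_times, sojourn_times

def percentile(values, p):
    ordered = sorted(values)
    return ordered[int(p * (len(ordered) - 1))]

if __name__ == "__main__":
    service, sojourn = simulate_fifo()
    for p in (0.50, 0.99, 0.999):
        print(f"p{round(p * 100, 1):g}: service={percentile(service, p):7.2f}  "
              f"end-to-end={percentile(sojourn, p):7.2f}")
```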
35. Fast Wait-Free Construction for Pool-Like Objects with Weakened Internal Order: Stacks as an Example.
- Author
-
Peng, Yaqiong, Yun, Xiaochun, and Hao, Zhiyu
- Subjects
DATA structures, CONSTRUCTION, ARCHITECTURE
- Abstract
This paper focuses on a large class of concurrent data structures that we call pool-like objects (e.g., stack, double-ended queue, and queue). Performance and progress guarantee are two important characteristics for concurrent data structures. In the aspect of performance, weakening the internal order in a pool-like object is an effective technique to reduce the synchronization cost among threads accessing the object, but no objects with weakened internal order provide a progress guarantee as strong as wait-freedom. Meanwhile, wait-free algorithms tend to be inefficient, which is mainly attributed to the helping mechanisms. Based on the philosophy of existing helping mechanisms, a wait-free pool-like object with weakened internal order would suffer from unnecessary process of getting the latest object state and synchronization. This paper takes a state-of-the-art implementation of stacks with weakened internal order as an example, and transforms it into a highly-efficient wait-free stack named WF-TS-Stack. The transformation method includes a helping mechanism with state reuse and a relaxed removal scheme. In addition, we use a simple and effective scheme to further improve the performance of WF-TS-Stack in Non-Uniform Memory Access (NUMA) architectures. Our evaluation with representative benchmarks shows that WF-TS-Stack outperforms its original building blocks by up to 1.45× at maximum concurrency. We also discuss how to yield an efficient double-ended queue (deque) variant of WF-TS-Stack, because deque is a more generalized pool-like object. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
36. Composable Building Blocks to Open Up Processor Design.
- Author
-
Zhang, Sizhuo, Wright, Andrew, and Bourgeat, Thomas
- Subjects
MICROPROCESSORS, MODULAR design, RESEARCH libraries, DESIGN
- Abstract
In this article, we present a framework called Composable Modular Design (CMD) to facilitate the design of out-of-order processors. In CMD, 1) the interface methods of modules provide instantaneous access and perform atomic updates to the state elements inside the module; 2) every interface method is guarded, i.e., it cannot be applied unless it is ready; and 3) modules are composed by atomic rules which call interface methods of different modules. A rule either successfully updates the state of all the called modules or does nothing. The atomicity properties of interfaces in CMD ensure composability when modules are refined selectively. We show the efficacy of CMD by building an out-of-order RISC-V processor which boots Linux. Modules designed using CMD (e.g., ROB, load-store unit, etc.) can be used and refined by other implementations. We believe that this framework can revolutionize architectural research and practice as the library of reusable components grows. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
37. Applying Transactional Memory for Concurrency-Bug Failure Recovery in Production Runs.
- Author
-
Chen, Yuxi, Wang, Shu, Lu, Shan, and Sankaralingam, Karthikeyan
- Subjects
TRANSACTIONAL analysis, HARDWARE, TEXT messages, SYNCHRONIZATION software, FEASIBILITY studies
- Abstract
Concurrency bugs widely exist and severely threaten system availability. Techniques that help recover from concurrency-bug failures during production runs are highly desired. This paper proposes BugTM, an approach that applies transactional memory techniques for concurrency-bug recovery in production runs. Requiring no knowledge about where concurrency bugs are, BugTM uses static analysis and code transformation to enable BugTM-transformed software to recover from a concurrency-bug failure by rolling back and re-executing the recent history of a failure thread. BugTM is instantiated as three schemes that have different trade-offs in performance and recovery capability: BugTM$_H$ uses existing hardware transactional memory (HTM) support, BugTM$_S$ leverages software transactional memory techniques, and BugTM$_{HS}$ is a software-hardware hybrid design. BugTM greatly improves the recovery capability of state-of-the-art techniques with low run-time overhead and no changes to OS or hardware, while guaranteeing not to introduce new bugs. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
38. The benefits of concurrent computing in tribology system design.
- Author
-
Wang, Nenzi, Chen, Hsin-Yi, Chen, Yu-Wen, and Lin, Yu-Chun
- Subjects
- *
TRIBOLOGY , *COMPUTER operating systems , *PROGRAMMING languages , *COMPUTER programming , *COMPUTATION laboratories - Abstract
Abstract Many optimum designs of tribological components are highly time-constrained before final production. It is well known that the process of a complex simulated design can be considerably accelerated by using some form of parallel computing. Also, for many tribological models, additional assumptions can be relaxed and stricter design constraints applied if the execution can be sped up. In this study, concurrent computing for tribological design is proposed, in which parallel computing is performed using the multitasking capability of today's operating systems. In the concurrent computing, a master program that manages the optimization process launches a number of standalone slave programs (air bearing models) in quick succession, and the operating system (MS Windows) manages their parallel execution. Apart from a standard programming language (Fortran 95), this approach uses none of the general parallel programming paradigms or directives, such as the message passing interface, OpenMP, or coding with graphics processing units. In this study, the algorithm used for the multiobjective optimization is the group inching fortification method, and the concurrent computing is carried out at the algorithm level. High parallel computing speedups are obtained in the simulated bearing designs. The approach can also be applied when using commercial general-purpose packages for modelling together with self-coded methods for the optimum design of tribological components or systems. Highlights • The concurrent computing for tribological optimization is accomplished through the multitasking ability of the operating system. • The concurrent computing for tribological studies is not restricted to parallel programming languages or directives. • The concurrent computing can handle variable computing loads and parallel tasks for tribological optimizations. [ABSTRACT FROM AUTHOR]
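A minimal Java sketch of the master/slave scheme described above (the original master was written in Fortran 95): the master launches several standalone slave executables in quick succession and lets the operating system schedule them in parallel. The executable name and its argument are hypothetical placeholders.

import java.util.ArrayList;
import java.util.List;

// Master program sketch: launch standalone slave programs concurrently and wait for all of them.
// "air_bearing_model.exe" and its input file argument are hypothetical placeholders.
public final class ConcurrentMaster {
    public static void main(String[] args) throws Exception {
        int slaves = 8;
        List<Process> running = new ArrayList<>();
        for (int i = 0; i < slaves; i++) {
            ProcessBuilder pb = new ProcessBuilder("air_bearing_model.exe", "case" + i + ".in");
            pb.inheritIO();                      // let slaves write to the master's console
            running.add(pb.start());             // launched in quick succession; the OS runs them in parallel
        }
        for (Process p : running) {
            int exit = p.waitFor();              // the master resumes the optimization once all slaves finish
            if (exit != 0) System.err.println("A slave model failed with exit code " + exit);
        }
    }
}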
- Published
- 2019
- Full Text
- View/download PDF
39. Integrating Concurrency Control in n-Tier Application Scaling Management in the Cloud.
- Author
-
Wang, Qingyang, Chen, Hui, Zhang, Shungeng, Hu, Liting, and Palanisamy, Balaji
- Subjects
- *
ELECTRONIC commerce , *RESOURCE management , *DATABASES , *WEB-based user interfaces , *CLOUD computing - Abstract
Scaling complex distributed systems such as e-commerce applications is an important practice for simultaneously achieving high performance and high resource efficiency in the cloud. Most previous research focuses on hardware resource scaling to handle runtime workload variation. Through extensive experiments using a representative n-tier web application benchmark (RUBBoS), we demonstrate that scaling an n-tier system by adding or removing VMs without appropriately re-allocating soft resources (e.g., server threads and connections) may lead to significant performance degradation resulting from an implicit change of request-processing concurrency in the system, causing either over- or under-utilization of the critical hardware resource in the system. We build a concurrency-aware model that determines a near-optimal soft resource allocation for each tier by combining operational queuing laws with fine-grained online measurement data of the system. We then develop a dynamic concurrency management (DCM) framework that integrates the concurrency-aware model to intelligently reallocate soft resources in the system during the scaling process. We compare DCM with Amazon EC2-AutoScale, the state-of-the-art hardware-only scaling management solution, using six real-world bursty workload traces. The experimental results show that DCM achieves significantly shorter tail latency and higher throughput compared to Amazon EC2-AutoScale under all the workload traces. [ABSTRACT FROM AUTHOR]
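The paper's concurrency-aware model is not spelled out in the abstract; as a stand-in, the sketch below applies Little's law (N = X × R) to online measurements to pick a near-optimal thread-pool size for one tier and reallocates that soft resource accordingly. The class name and the choice of formula are assumptions, not the authors' model.

import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Sketch of concurrency-aware soft-resource reallocation for one tier.
// The target-concurrency formula (Little's law: N = X * R) is a stand-in for the paper's queuing model.
public final class SoftResourceController {
    private final ThreadPoolExecutor workerPool =
            new ThreadPoolExecutor(16, 16, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<>());

    // throughputPerSec and avgResponseSec come from fine-grained online measurement.
    public void reallocate(double throughputPerSec, double avgResponseSec) {
        int target = Math.max(1, (int) Math.ceil(throughputPerSec * avgResponseSec));
        if (target >= workerPool.getMaximumPoolSize()) {     // growing: raise the ceiling first
            workerPool.setMaximumPoolSize(target);
            workerPool.setCorePoolSize(target);
        } else {                                             // shrinking: lower the core size first
            workerPool.setCorePoolSize(target);
            workerPool.setMaximumPoolSize(target);
        }
    }
}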
- Published
- 2019
- Full Text
- View/download PDF
40. A Distributed Heterogeneous Inspection System for High Performance Inline Surface Defect Detection.
- Author
-
Yu-Cheng Chou, Wei-Chieh Liao, Yan-Liang Chen, Ming Chang, and Po Ting Lin
- Subjects
HIGH performance computing ,SURFACE defects ,TURNAROUND time - Abstract
This paper presents the Distributed Heterogeneous Inspection System (DHIS), which comprises two CUDA workstations and is equipped with CPU distributed computing, CPU concurrent computing, and GPU concurrent computing functions. Thirty-two grayscale images, each with 5,000×12,288 pixels and simulated defect patterns, were created to evaluate the performances of three system configurations: (1) DHIS; (2) two CUDA workstations with CPU distributed computing and GPU concurrent computing; (3) one CUDA workstation with GPU concurrent computing. Experimental results indicated that: (1) only DHIS can satisfy the time limit, and the average turnaround time of DHIS is 37.65% of the time limit; (2) a good linear relationship exists between the processing speed ratio and the instruction sequence quantity ratio. [ABSTRACT FROM AUTHOR]
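A small Java sketch of the CPU-side distribution idea only: the 32 images are treated as independent tasks handed to a pool of workers, and the turnaround time is measured against a time limit. The GPU/CUDA stages and the two-workstation distribution of DHIS are not shown; all names are illustrative.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative CPU-side work distribution: 32 independent images dispatched to a worker pool.
// The CUDA defect-detection kernels of DHIS are not reproduced here.
public final class InspectionDriver {
    public static void main(String[] args) throws Exception {
        int images = 32;
        ExecutorService pool = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        long start = System.nanoTime();
        List<Future<Integer>> results = new ArrayList<>();
        for (int i = 0; i < images; i++) {
            final int imageId = i;
            results.add(pool.submit((Callable<Integer>) () -> inspect(imageId))); // one task per image
        }
        int totalDefects = 0;
        for (Future<Integer> f : results) totalDefects += f.get();
        double turnaroundSec = (System.nanoTime() - start) / 1e9;   // compare against the inline time limit
        System.out.printf("defects=%d turnaround=%.2fs%n", totalDefects, turnaroundSec);
        pool.shutdown();
    }

    // Placeholder for the per-image defect-detection pipeline (e.g., thresholding + blob analysis).
    private static int inspect(int imageId) { return 0; }
}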
- Published
- 2019
41. Towards the Concurrent Execution of Multiple Hyperspectral Imaging Applications by Means of Computationally Simple Operations
- Author
-
María Díaz, Raúl Guerra, Pablo Horstrand, Sebastián López, José F. López, and Roberto Sarmiento
- Subjects
hyperspectral imaging ,lossy compression ,anomaly detection ,hardware-friendly ,advanced processing algorithms ,concurrent computing ,Science - Abstract
The on-board processing of remotely sensed hyperspectral images is gaining momentum for applications that demand a quick response, as an alternative to conventional approaches where the acquired images are processed off-line once they have been transmitted to the ground segment. However, the adoption of this on-board processing strategy brings further challenges for the remote-sensing research community due to the high data rate of the new-generation hyperspectral sensors and the limited amount of available on-board computational resources. This situation becomes even more stringent when different time-sensitive applications coexist, since different tasks must be sequentially processed on the same computing device. In this work, we have dealt with this issue by defining a set of core operations that extracts spectral features useful for many hyperspectral analysis techniques, such as unmixing, compression and target/anomaly detection. Accordingly, it permits the concurrent execution of such techniques by reusing operations, thereby requiring far fewer computational resources than if they were executed separately. In particular, in this manuscript we have verified the benefits of our proposal for the concurrent execution of the lossy compression and anomaly detection processes in hyperspectral images. To evaluate the performance, several images taken by an unmanned aerial vehicle have been used. The obtained results clearly support the benefits of our proposal not only in terms of accuracy but also in terms of computational burden, reducing the number of operations to be executed by roughly 50%. Future research lines are focused on extending this methodology to other fields such as target detection, classification and dimensionality reduction.
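A schematic Java sketch of the reuse idea: a shared core operation (here, a per-pixel brightness and its deviation from a running mean, chosen only for illustration) is computed once and then consumed by both a compressor stub and an anomaly-detector stub. The paper's actual core operations are not reproduced; all names are illustrative.

// Schematic illustration of reusing one set of core spectral operations for two consumers.
// The per-pixel feature below is an illustrative stand-in for the paper's core operations.
public final class SharedCoreOps {
    public static void main(String[] args) {
        int pixels = 4, bands = 5;
        double[][] cube = new double[pixels][bands];        // hyperspectral pixels x spectral bands (dummy data)
        double runningMean = 0.0;
        for (int p = 0; p < pixels; p++) {
            // Core operation, computed once per pixel.
            double brightness = 0.0;
            for (int b = 0; b < bands; b++) brightness += cube[p][b];
            runningMean += (brightness - runningMean) / (p + 1);
            double deviation = Math.abs(brightness - runningMean);

            compressPixel(p, brightness);                    // consumer 1: lossy compression stage
            flagAnomaly(p, deviation);                       // consumer 2: anomaly detection stage
        }
    }
    private static void compressPixel(int pixel, double brightness) { /* feed the compressor */ }
    private static void flagAnomaly(int pixel, double deviation)    { /* threshold the deviation */ }
}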
- Published
- 2020
- Full Text
- View/download PDF
42. SymphonyDB: A Polyglot Model for Knowledge Graph Query Processing
- Author
-
Masoud Salehpour and Joseph G. Davis
- Subjects
FOS: Computer and information sciences ,Information retrieval ,Computer Science - Performance ,Point (typography) ,business.industry ,Computer science ,Data management ,Polyglot ,Databases (cs.DB) ,Semantics ,Data access layer ,Performance (cs.PF) ,Consistency (database systems) ,Computer Science - Databases ,Computer Science - Distributed, Parallel, and Cluster Computing ,Benchmark (computing) ,Concurrent computing ,Distributed, Parallel, and Cluster Computing (cs.DC) ,business - Abstract
Unlocking the full potential of Knowledge Graphs (KGs) to enable or enhance various semantic and other applications requires Data Management Systems (DMSs) to efficiently store and process the content of KGs. However, the increases in the size and variety of KG datasets, as well as the growing diversity of KG queries, pose efficiency challenges for the current generation of DMSs, to the extent that the performance of representative DMSs varies significantly across query types and no single platform dominates. We present our extensible prototype, SymphonyDB, as an approach to addressing this problem based on a polyglot model of query processing within a multi-database system. A unified access layer analyzes/translates individual queries just-in-time and matches each to the likely best-performing DMS among Virtuoso, Blazegraph, RDF-3X, and MongoDB, the representative DMSs included in our prototype at this time. The results of our experiments with the prototype over well-known KG benchmark datasets and queries point to the efficiency and consistency of its performance across different query types and datasets.
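A toy Java sketch of the unified-access-layer idea: inspect each incoming query just-in-time and route it to the backend expected to perform best for that query type. The routing heuristic below is a placeholder of our own, not SymphonyDB's actual analyzer/translator, and no real backend APIs are called.

// Toy router illustrating a polyglot access layer: pick a backend per query just-in-time.
// The heuristic is a placeholder, not SymphonyDB's actual query analysis.
public final class QueryRouter {
    enum Backend { VIRTUOSO, BLAZEGRAPH, RDF3X, MONGODB }

    public static Backend route(String query) {
        String q = query.toLowerCase();
        if (q.contains("describe") || q.contains("construct")) return Backend.MONGODB;   // document-shaped results
        if (q.chars().filter(c -> c == '?').count() > 6)        return Backend.RDF3X;    // many join variables
        if (q.contains("filter") || q.contains("regex"))        return Backend.VIRTUOSO; // heavy filtering
        return Backend.BLAZEGRAPH;                                                       // default choice
    }

    public static void main(String[] args) {
        System.out.println(route("SELECT ?s WHERE { ?s ?p ?o FILTER regex(?o, \"graph\") }"));
    }
}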
- Published
- 2022
43. Specification of the Framework for Evaluating the Efficiency of Mechanisms for Concurrent Processes
- Author
-
Nazaruk Vladislav and Rusakov Pavel
- Subjects
concurrent computing ,efficiency ,inter-process communication ,mechanisms ,software framework ,Computer software ,QA76.75-76.765 - Abstract
The goal of this paper is to give a specification of a software framework that evaluates the efficiency of different mechanisms for concurrent processes, notably process synchronization mechanisms. The paper discusses the concept of this framework, its potential users and some necessary considerations, including assumptions. Further, it defines general requirements for the framework and presents its desired conceptual design. The paper ends with conclusions and possible directions for future work.
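The framework itself is only specified, not implemented, in the paper; the Java sketch below shows the kind of micro-benchmark such a framework would automate, timing one synchronization mechanism (a mutex-protected counter) under a configurable number of threads. All names and parameters are illustrative.

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

// Micro-benchmark sketch of the kind an evaluation framework would run:
// time N threads performing increments protected by one chosen mechanism (here a ReentrantLock).
public final class MechanismBenchmark {
    public static void main(String[] args) throws InterruptedException {
        int threads = 8, opsPerThread = 1_000_000;
        ReentrantLock lock = new ReentrantLock();
        long[] counter = {0};
        CountDownLatch done = new CountDownLatch(threads);

        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            new Thread(() -> {
                for (int i = 0; i < opsPerThread; i++) {
                    lock.lock();                 // the mechanism under evaluation
                    try { counter[0]++; } finally { lock.unlock(); }
                }
                done.countDown();
            }).start();
        }
        done.await();
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("mechanism=ReentrantLock threads=%d ops=%d time=%.3fs%n",
                threads, (long) threads * opsPerThread, seconds);
    }
}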
- Published
- 2014
- Full Text
- View/download PDF
44. Implementing Snapshot Objects on Top of Crash-Prone Asynchronous Message-Passing Systems.
- Author
-
Delporte-Gallet, Carole, Fauconnier, Hugues, Rajsbaum, Sergio, and Raynal, Michel
- Subjects
- *
PHOTOGRAPHS as information resources , *COMPUTER storage devices , *ASYNCHRONOUS transfer mode , *MESSAGE passing (Computer science) , *ALGORITHMS , *TIMESTAMPS - Abstract
In asynchronous crash-prone read/write shared-memory systems there is the notion of a snapshot object, which simulates the behavior of an array of single-writer/multi-reader (SWMR) shared registers that can be read atomically. Processes in the system can access the object by invoking (any number of times) two operations, denoted $\mathsf{write}()$ and $\mathsf{snapshot}()$. A process invokes $\mathsf{write}()$ to update the value of its register in the array. When it invokes $\mathsf{snapshot}()$, the process obtains the values of all registers, as if it read them simultaneously. It is known that a snapshot object can be implemented on top of SWMR registers, tolerating any number of process failures. Snapshot objects provide a level of abstraction higher than individual SWMR registers, and they simplify the design of applications. Building a snapshot object on an asynchronous crash-prone message-passing system has similar benefits. The object can be implemented by using the known simulations of a SWMR shared memory on top of an asynchronous message-passing system (if fewer than half the processes can crash), and then building a snapshot object on top of the simulated SWMR memory. This paper presents an algorithm that implements a snapshot object directly on top of the message-passing system, without building an intermediate layer of a SWMR shared memory. To the authors' knowledge, the proposed algorithm is the first to provide such a direct construction. The algorithm is more efficient than the indirect solution, yet relatively simple. [ABSTRACT FROM AUTHOR]
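The paper's message-passing construction is not reproduced here; for intuition, the Java sketch below shows the classic shared-memory snapshot idea it builds on: each process writes its own register, and snapshot() repeatedly collects all registers until two consecutive collects agree (the "double collect"). Sequence numbers keep repeated identical values distinguishable. All names are illustrative.

import java.util.Arrays;
import java.util.concurrent.atomic.AtomicReferenceArray;

// Shared-memory intuition for a snapshot object (not the paper's message-passing algorithm):
// write() updates one SWMR register; snapshot() uses the classic double collect.
// Note: this simple version is not wait-free; it may retry while writers keep updating.
public final class DoubleCollectSnapshot {
    private static final class Cell {
        final long seq; final int value;
        Cell(long seq, int value) { this.seq = seq; this.value = value; }
    }
    private final AtomicReferenceArray<Cell> regs;

    public DoubleCollectSnapshot(int processes) {
        regs = new AtomicReferenceArray<>(processes);
        for (int i = 0; i < processes; i++) regs.set(i, new Cell(0, 0));
    }

    public void write(int processId, int value) {            // single writer per index
        Cell old = regs.get(processId);
        regs.set(processId, new Cell(old.seq + 1, value));
    }

    public int[] snapshot() {
        while (true) {
            Cell[] first = collect(), second = collect();
            boolean unchanged = true;
            for (int i = 0; i < first.length; i++)
                if (first[i].seq != second[i].seq) { unchanged = false; break; }
            if (unchanged)                                    // no register changed between the two collects
                return Arrays.stream(second).mapToInt(c -> c.value).toArray();
        }
    }

    private Cell[] collect() {
        Cell[] copy = new Cell[regs.length()];
        for (int i = 0; i < copy.length; i++) copy[i] = regs.get(i);
        return copy;
    }
}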
- Published
- 2018
- Full Text
- View/download PDF
45. Software Transactional Memory in Java on Clojure: A Performance Analysis.
- Author
-
Tomeu, Antonio, Salguero, Alberto, and Capel, Manuel
- Abstract
In the field of concurrency and parallelism, it is known that using lock-based synchronization techniques to synchronize threads is dangerous. When locks are introduced to control access to a shared resource, the probability of making control errors that give rise to safety and liveness problems is very high, and the correctness analysis of software developed under this paradigm is very complex. An alternative already consolidated in research, although not fully adopted in software production, is Software Transactional Memory (STM), which protects access to shared data within transactions. Using STM, when two threads attempt to access the same data, the transaction handler resolves the conflict without the need for explicit locks in our code. This results in more readable and secure software with potentially greater concurrency, since threads do not block waiting for locks. In this work we analyze an STM implementation for Java, supported by the Clojure functional language, and compare its performance with the standard blocking model that Java supports through critical regions in the standard concurrency API and through the lock classes of the high-level concurrency API. The results show that the analyzed STM model for Java over Clojure falls far short of offering better performance than the standard blocking model. [ABSTRACT FROM AUTHOR]
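A minimal sketch of the kind of comparison the paper performs, assuming Clojure's jar is on the classpath and using its standard Java interop classes (clojure.lang.Ref and clojure.lang.LockingTransaction): one shared counter updated inside STM transactions versus one updated inside a Java synchronized block. The single-threaded loops here are only a timing skeleton; the paper's benchmarks run many threads.

import clojure.lang.LockingTransaction;
import clojure.lang.Ref;

// Minimal STM-vs-lock comparison sketch (Clojure STM driven from Java vs. a synchronized block).
public final class StmVsLock {
    private static final Ref stmCounter = new Ref(0L);
    private static long lockCounter = 0;
    private static final Object lock = new Object();

    static void stmIncrement() throws Exception {
        LockingTransaction.runInTransaction(() -> {
            long current = (Long) stmCounter.deref();
            stmCounter.set(current + 1);          // conflicting transactions are retried by the STM
            return null;
        });
    }

    static void lockedIncrement() {
        synchronized (lock) { lockCounter++; }    // the blocking baseline
    }

    public static void main(String[] args) throws Exception {
        long start = System.nanoTime();
        for (int i = 0; i < 100_000; i++) stmIncrement();
        System.out.printf("STM:  %d in %.3fs%n", (Long) stmCounter.deref(), (System.nanoTime() - start) / 1e9);

        start = System.nanoTime();
        for (int i = 0; i < 100_000; i++) lockedIncrement();
        System.out.printf("lock: %d in %.3fs%n", lockCounter, (System.nanoTime() - start) / 1e9);
    }
}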
- Published
- 2018
- Full Text
- View/download PDF
46. Automated Testing of WS-BPEL Service Compositions: A Scenario-Oriented Approach.
- Author
-
Sun, Chang-ai, Zhao, Yan, Pan, Lin, Liu, Huai, and Chen, Tsong Yueh
- Abstract
Nowadays, service oriented architecture (SOA) has become one mainstream paradigm for developing distributed applications. As the basic unit in SOA, web services can be composed to construct complex applications. The quality of web services and their compositions is critical to the success of SOA applications. Testing, as a major quality assurance technique, is confronted with new challenges in the context of service compositions. In this paper, we propose a scenario-oriented testing approach that can automatically generate test cases for service compositions. Our approach is particularly focused on the service compositions specified by Business Process Execution Language for web services (WS-BPEL), a widely recognized executable service composition language. In the approach, a WS-BPEL service composition is first abstracted into a graph model; test scenarios are then derived from the model; finally, test cases are generated according to different scenarios. We also developed a prototype tool implementing the proposed approach, and an empirical study was conducted to demonstrate the applicability and effectiveness of our approach. The experimental results show that the automatic scenario-oriented testing approach is effective in detecting many types of faults seeded in the service compositions. [ABSTRACT FROM AUTHOR]
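A simplified Java sketch of the scenario-derivation step: the composition is abstracted into a directed graph, and test scenarios are enumerated as start-to-end paths (loops are bounded here by visiting each activity at most once). The graph encoding and names are illustrative, not the paper's exact model.

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

// Sketch of scenario derivation: enumerate start-to-end paths of the abstracted composition graph.
// Each path is one test scenario from which concrete test cases would then be generated.
public final class ScenarioDeriver {
    public static List<List<String>> scenarios(Map<String, List<String>> graph, String start, String end) {
        List<List<String>> paths = new ArrayList<>();
        dfs(graph, start, end, new ArrayDeque<>(), paths);
        return paths;
    }

    private static void dfs(Map<String, List<String>> g, String node, String end,
                            Deque<String> path, List<List<String>> out) {
        if (path.contains(node)) return;                 // bound loops: visit each activity at most once
        path.addLast(node);
        if (node.equals(end)) out.add(new ArrayList<>(path));
        else for (String next : g.getOrDefault(node, List.of())) dfs(g, next, end, path, out);
        path.removeLast();
    }

    public static void main(String[] args) {
        // Tiny example: receive -> (approve | reject) -> reply, expressed as adjacency lists.
        Map<String, List<String>> bpelGraph = Map.of(
                "receive", List.of("approve", "reject"),
                "approve", List.of("reply"),
                "reject",  List.of("reply"));
        scenarios(bpelGraph, "receive", "reply").forEach(System.out::println);
    }
}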
- Published
- 2018
- Full Text
- View/download PDF
47. Performance Evaluation and Optimization of HBM-Enabled GPU for Data-Intensive Applications.
- Author
-
Zhu, Maohua, Zhuo, Youwei, Wang, Chao, Chen, Wenguang, and Xie, Yuan
- Subjects
GRAPHICS processing units ,COMPUTER storage devices ,ELECTRIC power consumption - Abstract
Graphics processing units (GPUs) are widely used to accelerate data-intensive applications. To improve the performance of data-intensive applications, higher GPU memory bandwidth is desirable. Traditional graphics double data rate memories achieve higher bandwidth by increasing frequency, which leads to excessive power consumption. Recently, a new memory technology called high-bandwidth memory (HBM), based on 3-D die-stacking technology, has been used in the latest generation of GPUs; it can provide both high bandwidth and low power consumption with in-package-stacked DRAM memory. However, the capacity of integrated in-package-stacked memory is limited (e.g., only 4 GB for the state-of-the-art HBM-enabled GPU, AMD Radeon Fury X). In this paper, we implement two representative data-intensive applications, convolutional neural network (CNN) and breadth-first search (BFS), on an HBM-enabled GPU to evaluate the improvement brought by the adoption of the HBM, and to investigate techniques to fully unleash the benefits of such an HBM-enabled GPU. Based on the evaluation results, we first propose a software pipeline to alleviate the capacity limitation of the HBM for CNN. We then design two programming techniques to improve the utilization of memory bandwidth for the BFS application. Experimental results demonstrate that our pipelined CNN training achieves a $1.63\times $ speedup on an HBM-enabled GPU compared with the best high-performance GPU in the market, and the two optimization techniques for the BFS algorithm make it up to $24.5\times $ ( $9.8\times $ and $2.5\times $ for each technique, respectively) faster than conventional implementations. [ABSTRACT FROM AUTHOR]
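The CUDA implementation is not reproduced here; the Java sketch below only illustrates the software-pipeline idea mentioned in the abstract: while one batch is being processed, the next batch is staged into the limited fast memory by a second thread, so transfer and compute overlap. The small queue capacity stands in for the constrained HBM budget; all names are illustrative.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Conceptual two-stage software pipeline: a staging thread moves batches into limited fast
// memory while the compute thread processes the previous batch, so the two steps overlap.
// This illustrates the pipelining idea only, not the paper's CUDA implementation.
public final class PipelineSketch {
    public static void main(String[] args) throws InterruptedException {
        int batches = 16;
        BlockingQueue<float[]> staged = new ArrayBlockingQueue<>(2);   // small capacity ~ limited fast-memory budget

        Thread stager = new Thread(() -> {
            try {
                for (int i = 0; i < batches; i++) {
                    float[] batch = new float[1 << 20];                // "copy" the next batch into fast memory
                    staged.put(batch);                                 // blocks when fast memory is full
                }
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });

        Thread computer = new Thread(() -> {
            try {
                for (int i = 0; i < batches; i++) {
                    float[] batch = staged.take();                     // overlaps with staging of the next batch
                    process(batch);
                }
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });

        stager.start(); computer.start();
        stager.join(); computer.join();
    }
    private static void process(float[] batch) { /* placeholder for one training step */ }
}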
- Published
- 2018
- Full Text
- View/download PDF
48. Quadboost: A Scalable Concurrent Quadtree.
- Author
-
Zhou, Keren, Tan, Guangming, and Zhou, Wei
- Subjects
- *
QUADTREES , *MATHEMATICAL decoupling , *HARDWARE , *SEARCH algorithms , *INFORMATION retrieval - Abstract
Building concurrent spatial trees is more complicated than building concurrent binary search trees, since the spatial hierarchy must be preserved during modifications. We present a non-blocking quadtree (quadboost) that supports concurrent insert, remove, move, and contain operations, in which the move operation combines the searches for different keys and modifies different positions atomically. To increase concurrency, a decoupling approach is proposed to separate physical adjustment from logical removal within the remove operation. In addition, we design a continuous-find mechanism to reduce the search cost. Experimental results show that quadboost scales well on a multi-core system with 32 hardware threads. It outperforms existing concurrent trees in retrieving two-dimensional keys, with up to 109 percent improvement when the number of threads is large. Furthermore, the move operation outperforms the best-known algorithm by up to 47 percent. [ABSTRACT FROM AUTHOR]
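For orientation only, the following compact, coarsely synchronized point quadtree in Java shows the kind of spatial structure being made concurrent; quadboost's non-blocking insert/remove/move algorithms and continuous-find mechanism are deliberately not reproduced, and all names are illustrative.

// Minimal point quadtree (2-D keys) with coarse synchronization, for orientation only.
// quadboost's non-blocking operations are not reproduced here.
public final class SimpleQuadtree {
    private static final class Node {
        final double x, y;
        final Node[] child = new Node[4];      // quadrants relative to (x, y): NE, NW, SW, SE
        Node(double x, double y) { this.x = x; this.y = y; }
    }
    private Node root;

    private static int quadrant(Node n, double x, double y) {
        if (x >= n.x) return (y >= n.y) ? 0 : 3;
        return (y >= n.y) ? 1 : 2;
    }

    public synchronized void insert(double x, double y) {
        if (root == null) { root = new Node(x, y); return; }
        Node cur = root;
        while (true) {
            if (cur.x == x && cur.y == y) return;          // key already present
            int q = quadrant(cur, x, y);
            if (cur.child[q] == null) { cur.child[q] = new Node(x, y); return; }
            cur = cur.child[q];
        }
    }

    public synchronized boolean contains(double x, double y) {
        Node cur = root;
        while (cur != null) {
            if (cur.x == x && cur.y == y) return true;
            cur = cur.child[quadrant(cur, x, y)];
        }
        return false;
    }
}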
- Published
- 2018
- Full Text
- View/download PDF
49. Prototyping of Concurrent Control Systems With Application of Petri Nets and Comparability Graphs.
- Author
-
Wisniewski, Remigiusz, Karatkevich, Andrei, Adamski, Marian, Costa, Aniko, and Gomes, Luis
- Subjects
PETRI nets ,PROTOTYPES ,GRAPH theory - Abstract
This paper presents a novel prototyping technique for concurrent control systems described by interpreted Petri nets. The technique is based on the decomposition of an interpreted Petri net into concurrent sequential automata. In general, minimum decomposition requires runtime that is exponential in the number of Petri net places. We show that in many cases, including real-life ones, the minimum decomposition problem can be solved in polynomial time. The proposed method allows a concurrent control system to be implemented with a minimal number of sequential components, requires polynomial time, and can be applied to most of the considered cases. The presented concept is illustrated by a real-life industrial example of a beverage production and distribution machine implemented in a field-programmable gate array. [ABSTRACT FROM AUTHOR]
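A tiny Java sketch of the execution model the prototyping technique starts from: a place/transition net where a transition is enabled when all of its input places are marked, and firing moves tokens from input to output places. The decomposition into concurrent sequential automata itself is not shown, and the encoding is illustrative.

// Minimal place/transition net semantics: a transition is enabled when every input place
// holds a token; firing consumes those tokens and produces tokens in the output places.
// (The paper's decomposition into concurrent sequential automata is not shown here.)
public final class TinyPetriNet {
    private final int[] marking;                       // tokens per place
    private final int[][] inputs, outputs;             // per transition: input/output place indices

    public TinyPetriNet(int[] initialMarking, int[][] inputs, int[][] outputs) {
        this.marking = initialMarking.clone();
        this.inputs = inputs;
        this.outputs = outputs;
    }

    public boolean enabled(int t) {
        for (int p : inputs[t]) if (marking[p] == 0) return false;
        return true;
    }

    public boolean fire(int t) {
        if (!enabled(t)) return false;
        for (int p : inputs[t])  marking[p]--;
        for (int p : outputs[t]) marking[p]++;
        return true;
    }

    public static void main(String[] args) {
        // p0 --t0--> p1 --t1--> p2, starting with one token in p0.
        TinyPetriNet net = new TinyPetriNet(new int[]{1, 0, 0},
                new int[][]{{0}, {1}}, new int[][]{{1}, {2}});
        System.out.println(net.fire(0) + " " + net.fire(1));   // prints: true true
    }
}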
- Published
- 2018
- Full Text
- View/download PDF
50. MV-FTL: An FTL That Provides Page-Level Multi-Version Management.
- Author
-
Lee, Doogie, Shin, Mincheol, Choi, Wongi, Roh, Hongchan, and Park, Sanghyun
- Subjects
- *
SOLID state electronics , *DATABASE management , *COMPUTER systems , *COMPUTER storage devices , *ALGORITHMS - Abstract
In this paper, we propose MV-FTL, a multi-version flash translation layer (FTL) that provides page-level multi-version management. By extending a unique characteristic of solid-state drives (SSDs), the out-of-place (OoP) update, to multi-version management, MV-FTL can both guarantee atomic page updates for each transaction and provide concurrency among transactions, without requiring redundant log writes. For evaluation, we first modified SQLite, a lightweight database management system (DBMS), to cooperate with MV-FTL. Owing to the architectural simplicity of SQLite, we clearly show that MV-FTL improves both the performance and the concurrency of the system. In addition, to prove its effectiveness in a full-fledged enterprise-level DBMS, we modified MyRocks, a MySQL variant by Facebook, to use our new Patch Compaction algorithm, which relies heavily on MV-FTL. The TPC-C and LinkBench benchmark tests demonstrate that MV-FTL reduces the overall amount of writes, implying that MV-FTL can be effective in such DBMSs. [ABSTRACT FROM AUTHOR]
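A schematic Java sketch of page-level multi-versioning on top of out-of-place updates: every write of a logical page gets a fresh physical page under a new version number, and a reader sees the newest version not newer than its own snapshot. MV-FTL's actual on-device data structures, commit handling and garbage collection are not shown; all names are illustrative.

import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Schematic page-level multi-version mapping built on out-of-place updates.
// A write never overwrites: it allocates a new physical page under a new version,
// and a reader picks the newest version not exceeding its snapshot version.
public final class MultiVersionMapping {
    private final Map<Long, TreeMap<Long, Long>> table = new HashMap<>(); // logical page -> (version -> physical page)
    private long nextPhysicalPage = 0;
    private long currentVersion = 0;

    public synchronized long beginSnapshot() { return currentVersion; }   // readers see state as of this version

    public synchronized void write(long logicalPage) {
        long version = ++currentVersion;                                  // out-of-place update: new version, new page
        table.computeIfAbsent(logicalPage, k -> new TreeMap<>())
             .put(version, nextPhysicalPage++);
    }

    public synchronized Long read(long logicalPage, long snapshotVersion) {
        TreeMap<Long, Long> versions = table.get(logicalPage);
        if (versions == null) return null;
        Map.Entry<Long, Long> visible = versions.floorEntry(snapshotVersion); // newest version <= snapshot
        return visible == null ? null : visible.getValue();
    }
}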
- Published
- 2018
- Full Text
- View/download PDF