Search Results (1,860 results)
2. Call For Papers: Security in Grid and Distributed Systems
- Published: 2005
3. Call for Papers
- Published: 2002
4. Call for Papers: General Purpose Parallel Processing Using GPUs
- Published: 2008
5. Call For papers: Fill in from material uploaded
- Published: 2006
6. Call for papers (last folioed page)
- Published: 2006
7. Call for Papers
- Published: 2006
8. Call for Papers: Parallel Techniques for Information Extraction
- Published: 2005
9. Call for papers special issue of the journal of parallel distributed computing (JPDC): Grids in bioinformatics and computational biology
- Published: 2005
10. Papers to Appear in Forthcoming Issues.
- Published: 2002
11. Call for Papers: Special Issue on High-Performance Computational Biology
- Published: 2002
12. Papers to Appear in Forthcoming Issues.
- Published: 2002
13. Papers to Appear in Forthcoming Issues.
- Published: 2002
14. Papers to Appear in Forthcoming Issues.
- Published: 2002
15. Papers to Appear in Forthcoming Issues.
- Published: 2002
16. Papers to Appear in Forthcoming Issues.
- Published: 2002
17. Papers to Appear in Forthcoming Issues.
- Published: 2002
18. Papers to Appear in Forthcoming Issues.
- Published: 2002
19. Papers to Appear in Forthcoming Issues.
- Published: 2002
20. Papers to Appear in Forthcoming Issues.
- Published: 2002
21. Novel schemes for embedding Hamiltonian paths and cycles in balanced hypercubes with exponential faulty edges.
- Author: Li, Xiao-Yan, Zhao, Kun, Zhuang, Hongbin, and Jia, Xiaohua
- Subjects: HYPERCUBES, FAULT tolerance (Engineering), COMPUTER systems, PARALLEL programming, NETWORK performance
- Abstract
The balanced hypercube BH_n plays an essential role in large-scale parallel and distributed computing systems. With the increasing probability of edge faults in large-scale networks and the widespread applications of Hamiltonian paths and cycles, it is especially essential to study the fault tolerance of networks in the presence of Hamiltonian paths and cycles. However, existing research on edge faults ignores that it is almost impossible for all faulty edges to be concentrated in a certain dimension. Thus, the fault tolerance performance of interconnection networks is severely underestimated. This paper focuses on three measures, t-partition-edge fault-tolerant Hamiltonian, t-partition-edge fault-tolerant Hamiltonian laceable, and t-partition-edge fault-tolerant strongly Hamiltonian laceable, and utilizes these measures to explore the existence of Hamiltonian paths and cycles in balanced hypercubes with exponentially many faulty edges. We show that BH_n is 2^(n−1)-partition-edge fault-tolerant Hamiltonian laceable, 2^(n−1)-partition-edge fault-tolerant Hamiltonian, and (2^(n−1) − 1)-partition-edge fault-tolerant strongly Hamiltonian laceable for n ≥ 2. Comparison results show the partitioned fault model can provide exponential fault tolerance as the value of the dimension n grows. • Based on the partitioned fault model, this paper proposes three novel indicators for the balanced hypercube BH_n. • BH_n is 2^(n−1) (resp. 2^(n−1) − 1)-partition-edge fault-tolerant (resp. strongly) Hamiltonian laceable. • Comparison results show the partitioned fault model can provide exponential fault tolerance as the dimension n grows. [ABSTRACT FROM AUTHOR]
- Published: 2023
22. Leaderless consensus.
- Author: Antoniadis, Karolos, Benhaim, Julien, Desjardins, Antoine, Poroma, Elias, Gramoli, Vincent, Guerraoui, Rachid, Voron, Gauthier, and Zablotchi, Igor
- Subjects: WIDE area networks, MESSAGE passing (Computer science)
- Abstract
Classic synchronous consensus algorithms are leaderless in that processes exchange proposals, choose the maximum value and decide when they see the same choice across a couple of rounds. Indulgent consensus algorithms are typically leader-based. Although they tolerate unexpected delays and find practical applications in blockchains running over an open network like the Internet, their performance is highly dependent on the activity of a single participant. This paper asks whether, under eventual synchrony, it is possible to deterministically solve consensus without a leader. The fact that the weakest failure detector to solve consensus is one that also eventually elects a leader seems to indicate that the answer to the question is negative. We prove in this paper that the answer is actually positive. We first give a precise definition of the very notion of a leaderless algorithm. Then we present three indulgent leaderless consensus algorithms, each we believe interesting in its own right: (i) for shared memory, (ii) for message passing with omission failures and (iii) for message passing with Byzantine failures. Finally, we implement a Byzantine fault tolerant (BFT) state machine replication (SMR), that is leaderless. Our empirical results demonstrate that it is faster and more robust than HotStuff, the recent BFT SMR algorithm used in the Facebook Libra blockchain when deployed in a wide area network. • Eventually synchronous consensus algorithms are leader-based. We show that they can be leaderless. • We propose a formal definition of leaderless consensus algorithm. • We propose a didactic construction of a byzantine fault tolerant consensus algorithm. • We prove correctness. [ABSTRACT FROM AUTHOR]
- Published: 2023
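The classic leaderless pattern the abstract opens with (processes exchange proposals, adopt the maximum, and decide once the choice repeats across rounds) can be sketched in a few lines. Below is a minimal simulation under a failure-free synchronous round model; the paper's indulgent algorithms for eventual synchrony are far subtler.

```python
# A minimal sketch of the classic leaderless synchronous pattern: every
# process broadcasts its value, adopts the maximum it receives, and decides
# once its choice is unchanged across two consecutive rounds. Assumes no
# failures; this is not the paper's indulgent algorithm.

def leaderless_consensus(proposals):
    values = list(proposals)
    decided = [False] * len(values)
    while not all(decided):
        m = max(values)                  # synchronous all-to-all exchange
        for i in range(len(values)):
            if values[i] == m:           # same choice seen twice: decide
                decided[i] = True
            values[i] = m                # adopt the maximum proposal
    return values

assert leaderless_consensus([3, 7, 1, 7]) == [7, 7, 7, 7]
```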
23. Reducing energy consumption using heterogeneous voltage frequency scaling of data-parallel applications for multicore systems.
- Author: Bratek, Pawel, Szustak, Lukasz, Wyrzykowski, Roman, and Olas, Tomasz
- Subjects: VOLTAGE, FLUID dynamics, MULTICORE processors, PARALLEL algorithms
- Abstract
This paper investigates the exploitation of heterogeneous DVFS (dynamic voltage frequency scaling) control for improving the energy efficiency of data-parallel applications on ccNUMA shared-memory systems. We propose to adjust the clock frequency individually for the appropriately selected groups of cores, taking into account the diversified costs of parallel computation. This paper aims to evaluate the proposed approach using two different data-parallel applications: solving the 3D diffusion problem, and MPDATA fluid dynamics application. As a result, we observe the energy-savings gains of up to 20 percentage points over the traditional homogeneous frequency scaling approach on the server with two 18-core Intel Xeon Gold 6240. Additionally, we confirm the effectiveness of our strategy using two 64-core AMD EPYC 7773X. This paper also introduces two pruning algorithms that help select the optimal heterogeneous DVFS setups taking into account the energy or performance profile of studied applications. Finally, the cost and efficiency of developed algorithms are verified and compared experimentally against the brute-force search. • Heterogeneous DVFS method for energy efficiency of regular data-parallel applications. • Individually adjusting clock frequency for cores based on workload distribution. • Pruning algorithms for selecting optimal heterogeneous DVFS setups. [ABSTRACT FROM AUTHOR]
- Published: 2023
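The core idea of heterogeneous DVFS, as described above, is to slow down core groups whose share of the computation is lighter so that all groups finish together. A minimal sketch of that idea follows, with hypothetical workloads and frequency levels; it is not the paper's pruning algorithms.

```python
# Illustrative sketch (not the authors' method): core groups with lighter
# workloads run at proportionally lower clock frequencies so no group
# finishes early and idles, saving energy without stretching the runtime.
# Frequencies (GHz) and workloads are hypothetical.

def pick_group_frequencies(workloads, f_max=3.3, f_levels=(1.0, 1.8, 2.4, 3.3)):
    w_crit = max(workloads)  # the critical (slowest) group sets the pace
    freqs = []
    for w in workloads:
        target = f_max * w / w_crit              # ideal proportional frequency
        # round up to the nearest supported P-state so no group becomes critical
        freqs.append(min(f for f in f_levels if f >= target))
    return freqs

print(pick_group_frequencies([100, 60, 30, 95]))  # -> [3.3, 2.4, 1.0, 3.3]
```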
24. Distributed three-way formal concept analysis for large formal contexts.
- Author: Chunduri, Raghavendra Kumar and Cherukuri, Aswani Kumar
- Subjects: DATA mining software, PARALLEL algorithms, DISTRIBUTED algorithms, KNOWLEDGE representation (Information theory), SOFTWARE engineering, MINING engineering, SOFTWARE engineers
- Abstract
Three-way concept analysis (3WCA) is a framework, based on formal concept analysis (FCA) and three-way decisions, that is used in the field of knowledge discovery to resolve uncertainties in many domains such as machine learning, data mining, and software engineering. 3WCA requires both the formal context and its complement context for generating concepts and constructing the concept lattice. In three-way concept analysis, data are analyzed using two types of concept lattices constructed with classical FCA algorithms: object-induced (OE) concept lattices and attribute-induced (AE) concept lattices. The existing FCA algorithms focus on the sequential generation of OE and AE concepts rather than finding them in parallel, and cannot process large datasets efficiently. The main contribution of this paper is a novel parallel algorithm for concept generation and construction of the three-way concept lattice for knowledge discovery and representation in large datasets. Aiming to construct an efficient algorithm for 3WCA, this paper first discusses the existing algorithms for concept generation. Further, we develop an efficient algorithm for OE and AE concept generation and lattice construction. Extensive experiments are conducted on various datasets to evaluate the efficiency of the proposed algorithm. Both the experimental and statistical results demonstrate the efficacy of the algorithm on larger datasets. The proposed algorithm can also significantly decrease the time required for OE/AE concept generation and lattice construction compared to the existing classical FCA algorithms. • This article proposes an efficient distributed algorithm for performing three-way concept analysis (3WCA). • The proposed algorithm distributes the input data across the cluster and performs in-memory computations. • The experimental and statistical validations prove that the proposed algorithm is efficient for 3WCA. [ABSTRACT FROM AUTHOR]
- Published: 2023
25. A job scheduling algorithm based on parallel workload prediction on computational grid.
- Author: Tang, Xiaoyong, Liu, Yi, Deng, Tan, Zeng, Zexin, Huang, Haowei, Wei, Qiyu, Li, Xiaorong, and Yang, Li
- Subjects: GRID computing, PARALLEL algorithms, SCHEDULING, HEURISTIC, PREDICTION models, FORECASTING
- Abstract
Generally, the computational grid consists of a large number of computing nodes, some of which are idle due to the uneven geographical distribution of computing requirements. This may cause workload unbalancing problems, which affect the performance of large-scale computational grids. In order to balance the computing requirements and computing nodes, we propose a job scheduling algorithm based on the workload prediction of computing nodes. We first analyze the causes of workload imbalance and the feasibility of reallocating computing resources. Secondly, we design an application and workload-aware scheduling algorithm (AWAS) by combining the previously designed workload prediction model. To reduce the complexity of the AWAS algorithm, we propose a parallel job scheduling method based on computing node workload prediction. The experiments show that the AWAS algorithm can balance the workload among different computing nodes on the real-world dataset. In addition, we parallelize the workload prediction model, in terms of both its internal structure and its data set, so that AWAS can be applied to more computing nodes of large-scale computing grids. Experimental results show that the combination of the two can achieve satisfactory acceleration efficiency. • In this paper, we propose an application and grid computing node workload aware scheduling algorithm (AWAS). • AWAS optimizes grid loads using greedy and heuristic methods. • This paper parallelizes the scheduling process of AWAS to improve its efficiency. • The experimental results show that AWAS can achieve workload balancing in the China National Grid. [ABSTRACT FROM AUTHOR]
- Published: 2023
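At its simplest, workload-prediction-driven scheduling places each job on the node whose predicted load is lowest. The sketch below is a greedy stand-in for the AWAS algorithm described above, with running totals in place of the paper's learned prediction model.

```python
# Illustrative sketch only: each job goes to the node whose predicted
# workload is currently lowest, and the prediction is updated with the
# job's cost. The paper uses a learned workload prediction model; here a
# running total stands in for it.
import heapq

def schedule(jobs, n_nodes):
    heap = [(0.0, node) for node in range(n_nodes)]  # (predicted load, node id)
    placement = {}
    for job_id, cost in jobs:
        load, node = heapq.heappop(heap)     # least-loaded predicted node
        placement[job_id] = node
        heapq.heappush(heap, (load + cost, node))
    return placement

print(schedule([("j1", 5.0), ("j2", 2.0), ("j3", 4.0)], n_nodes=2))
```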
26. Reliability of the weight vector generation method of the multi-objective evolutionary algorithm and application.
- Author: Gao, Shuzhi, Ren, Xuepeng, Zhang, Yimin, and Tang, Haihong
- Subjects: EVOLUTIONARY algorithms, COEFFICIENTS (Statistics), RANK correlation (Statistics), BENCHMARK problems (Computer science), STATISTICAL correlation, LIGHT aircraft
- Abstract
The decomposition-based multi-objective evolutionary algorithm first generates a set of weight vectors in advance, and it is very important to select a set of appropriate weight vectors for the decomposition-based algorithm. A variety of weight vector generation methods have been proposed in the existing algorithms, but most algorithms still use a pre-defined weight vector generation method; the pre-defined weight vectors are too specialized for simplex-like front surfaces, which results in poor performance on front surfaces with irregularities. At the same time, many existing algorithms have proposed new adaptive strategies for weight vectors, but if a more suitable set of weight vectors is generated at the beginning, and the update strategy is applied afterwards, the algorithm can achieve a better balance between diversity and convergence. In order to select a suitable weight vector, this paper proposes a multi-stage MOEA. The algorithm is divided into multiple stages according to the evolution process. First, in the early stage of evolution, the reliability of multiple weight vector generation methods is evaluated according to the Spearman correlation coefficient from statistics, and the most suitable weight generation method is chosen. Secondly, this method is applied to the search for high-quality solutions in the middle of evolution. Finally, a weight vector adaptive strategy is adopted throughout the overall evolution process. In the experiments, the proposed algorithm is analyzed on benchmark test problems, a mechanical bearing, and a light aircraft gear reducer. The experimental results show the effectiveness of the proposed algorithm. • This paper proposes a multi-stage, multi-objective algorithm. • A method of judging the reliability of the weight vectors is proposed. • A way to find high-quality solutions is proposed. • The algorithm is compared with 15 other algorithms on 47 test problems. • Mechanical bearings and gear reducers are optimized. [ABSTRACT FROM AUTHOR]
- Published: 2022
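For context, the pre-defined weight vectors the abstract criticizes are typically produced by the standard simplex-lattice (Das-Dennis) construction: every vector whose components are multiples of 1/h and sum to 1. A sketch of that baseline generator follows; the paper's reliability-based selection among generators is not shown.

```python
# Sketch of the common simplex-lattice (Das-Dennis) weight vector generator
# used by decomposition-based MOEAs; not the paper's multi-stage selection.
from itertools import combinations

def simplex_lattice(m, h):
    """All m-dimensional weight vectors with components i/h summing to 1."""
    vectors = []
    for dividers in combinations(range(1, m + h), m - 1):
        # stars-and-bars: gaps between dividers give the integer parts
        parts = [b - a - 1 for a, b in zip((0,) + dividers, dividers + (m + h,))]
        vectors.append([p / h for p in parts])
    return vectors

print(len(simplex_lattice(3, 12)))  # C(14, 2) = 91 weight vectors
```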
27. A cluster-based routing method with authentication capability in Vehicular Ad hoc Networks (VANETs).
- Author: Azhdari, Mohammad Sadegh, Barati, Ali, and Barati, Hamid
- Subjects: VEHICULAR ad hoc networks, AD hoc computer networks, END-to-end delay, MESSAGE authentication codes, DATA packeting, ROUTING algorithms, WIRELESS communications
- Abstract
• This paper presents a clustered routing algorithm with authentication capability. • This routing method prioritizes data packets based on their type. • This paper suggests a suitable clustering algorithm. • This scheme reduces end-to-end delay and improves the packet delivery rate. • Comparison of the proposed method with three routing methods indicates its superiority. Routing is challenging in vehicular ad hoc networks due to their features, such as high mobility of nodes and unstable wireless communication links. Therefore, it is an interesting issue for researchers. In addition, it is very important to design an authentication mechanism between the source node and the destination node, because these networks are exposed to many attacks due to the features mentioned above. In this paper, we present a fuzzy logic-based routing method with authentication capability in vehicular ad hoc networks. The proposed routing method has three phases: a clustering phase, a routing phase between cluster head nodes, and an authentication phase. In the first phase, vehicles are clustered using an efficient scheme. In the proposed method, we define two types of data packets: immediate and ordinary. The different data packet types have different route discovery processes, which are described in Phase 2. Note that each data packet type is divided into two groups: simple and secure. Simple data packets have no authentication mechanism. On the other hand, secure data packets use an authentication mechanism based on a message authentication code (MAC) and symmetric key cryptography. We simulate the proposed method using NS2. Simulation results are compared with three routing protocols: AODV, R2SCDT, and 3VSR. Experiments show that the proposed method outperforms the others in terms of end-to-end delay, packet collision, packet delivery rate (PDR), packet loss rate (PLR), and throughput. However, it increases the routing overhead slightly. [ABSTRACT FROM AUTHOR]
- Published: 2022
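The authentication phase rests on a standard building block: a message authentication code computed over a symmetric key shared by source and destination. A minimal sketch with a hypothetical key and packet format follows; the paper's key distribution and exact packet layout are not specified here.

```python
# Minimal illustration of MAC-over-symmetric-key authentication for
# "secure" packets. The key and packet fields are hypothetical.
import hmac, hashlib

SHARED_KEY = b"pre-shared-symmetric-key"   # assumed to be distributed securely

def tag_packet(payload: bytes) -> bytes:
    return hmac.new(SHARED_KEY, payload, hashlib.sha256).digest()

def verify_packet(payload: bytes, tag: bytes) -> bool:
    return hmac.compare_digest(tag_packet(payload), tag)

pkt = b"IMMEDIATE|src=v12|dst=v47|data=..."
tag = tag_packet(pkt)
assert verify_packet(pkt, tag)               # destination accepts
assert not verify_packet(pkt + b"x", tag)    # tampered packet rejected
```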
28. Local outlier factor for anomaly detection in HPCC systems.
- Author: Adesh, Arya, G, Shobha, Shetty, Jyoti, and Xu, Lili
- Subjects: CREDIT card fraud, BIG data, DISTRIBUTED computing, TIME complexity, COMPUTER systems, BANK fraud, SMART cards, COMPUTER workstation clusters
- Abstract
• LOF is an unsupervised anomaly detection algorithm that mines anomalies by calculating the local density of data points relative to their neighborhood. In this work, the LOF algorithm was implemented using the ECL (Enterprise Control Language) programming language on the HPCC Systems (high-performance computing cluster) platform, an open-source distributed computing platform. • Improved LOF is a modified version of normal LOF, designed to handle datasets with duplicates. This work discusses the implementation of both the normal LOF and improved LOF algorithms in HPCC Systems for the credit card fraud and localization data for person activity datasets. • Segmented k-d tree and unsegmented k-d tree are techniques proposed for neighbor search in a distributed system, with worst-case time complexity of O((MinPts · |D|) · log(|D|)), where |D| represents the number of data points in the dataset and MinPts is the hyperparameter value. • LOF is compared with other anomaly detection algorithms like COF, LoOP, and kNN across 6 benchmark datasets on the HPCC Systems platform, demonstrating a favorable balance between execution time and precision in anomaly detection. • The LOF implementation was compared across big-data frameworks like Spark, Hadoop, and HPCC Systems, revealing superior scalability and performance in HPCC Systems, especially with larger datasets and higher MinPts values. Local Outlier Factor (LOF) is an unsupervised anomaly detection algorithm that finds anomalies by assessing the local density of a data point relative to its neighborhood. Anomaly detection is the process of finding anomalies in datasets. Anomalies in real-time datasets may indicate critical events like bank frauds, data compromise, network threats, etc. This paper deals with the implementation of the LOF algorithm on the HPCC Systems platform, which is an open-source distributed computing platform for big data analytics. Improved LOF is also proposed, which efficiently detects anomalies in datasets rich in duplicates. The impact of varying hyperparameters on the performance of LOF is examined in HPCC Systems. This paper examines the performance of LOF against other algorithms like COF, LoOP, and kNN over several datasets in HPCC Systems. Additionally, the efficacy of LOF is evaluated across big-data frameworks such as Spark, Hadoop, and HPCC Systems by comparing their runtime performances. [ABSTRACT FROM AUTHOR]
- Published: 2024
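As a reference point for what LOF computes (independent of the paper's ECL/HPCC Systems implementation), scikit-learn's off-the-shelf version flags points whose local density is far below that of their nearest neighbors; its n_neighbors parameter plays the role of MinPts. The data here is synthetic.

```python
# Quick LOF reference using scikit-learn; not the paper's distributed
# implementation. LOF scores near 1 mean inlier; much greater than 1, outlier.
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (200, 2)),      # dense cluster
               [[8.0, 8.0], [9.0, -7.0]]])      # two obvious outliers

lof = LocalOutlierFactor(n_neighbors=20)        # MinPts-like hyperparameter
labels = lof.fit_predict(X)                     # -1 = anomaly, 1 = inlier
scores = -lof.negative_outlier_factor_
print(labels[-2:], scores[-2:].round(1))        # injected points flag as anomalies
```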
29. Balancing privacy and performance in federated learning: A systematic literature review on methods and metrics.
- Author: Mohammadi, Samaneh, Balador, Ali, Sinaei, Sima, and Flammini, Francesco
- Subjects: FEDERATED learning, DISTRIBUTED artificial intelligence, DATA privacy, ARTIFICIAL intelligence, PRIVACY, DEEP learning
- Abstract
Federated learning (FL) as a novel paradigm in Artificial Intelligence (AI), ensures enhanced privacy by eliminating data centralization and brings learning directly to the edge of the user's device. Nevertheless, new privacy issues have been raised particularly during training and the exchange of parameters between servers and clients. While several privacy-preserving FL solutions have been developed to mitigate potential breaches in FL architectures, their integration poses its own set of challenges. Incorporating these privacy-preserving mechanisms into FL at the edge computing level can increase both communication and computational overheads, which may, in turn, compromise data utility and learning performance metrics. This paper provides a systematic literature review on essential methods and metrics to support the most appropriate trade-offs between FL privacy and other performance-related application requirements such as accuracy, loss, convergence time, utility, communication, and computation overhead. We aim to provide an extensive overview of recent privacy-preserving mechanisms in FL used across various applications, placing a particular focus on quantitative privacy assessment approaches in FL and the necessity of achieving a balance between privacy and the other requirements of real-world FL applications. This review collects, classifies, and discusses relevant papers in a structured manner, emphasizing challenges, open issues, and promising research directions. • A comprehensive category of privacy-preserving mechanisms in Federated Learning. • Analyze the impact of privacy-preserving mechanisms in Federated Learning systems. • Investigate the trade-offs between privacy and other performance requirements. • Investigate existing methods and metrics for assessing privacy in Federated Learning. [ABSTRACT FROM AUTHOR]
- Published: 2024
30. Spatiotemporal dynamics analysis and parameter optimization of a network epidemic-like propagation model based on neural network method.
- Author: Shen, Shuling, Chen, Xinlin, and Zhu, Linhe
- Subjects: CONVOLUTIONAL neural networks, MONTE Carlo method, PARAMETER identification, SYSTEM identification, NEURAL circuitry
- Abstract
In this paper, a reaction-diffusion model is established to study the dynamic behavior of rumor propagation. Firstly, we consider the existence of the positive equilibrium points. Then, we perform a stability analysis to study the conditions for the occurrence of Turing instability. Secondly, we use multiscale analysis to derive the expression of the amplitude equation. Realistic conditions are taken into account in the numerical simulations, which show that controlling the spread rate of rumors and the number of new Internet users has a great effect on curbing the spread of online rumors. Furthermore, it is proved that the analysis of the amplitude equation plays a decisive role in the formation of Turing patterns. We also discuss the phenomenon of Turing patterns when the network structure changes, and verify the rationality of the model by the Monte Carlo method. Finally, we consider two methods, based on statistical principles and a convolutional neural network respectively, to identify the parameters of the reaction-diffusion system with Turing instability by using stable patterns. The statistical principle-based method offers superior accuracy, whereas the convolutional neural network-based approach significantly reduces recognition time and cuts down time costs. • This paper investigates a rumor propagation system with reaction-diffusion behavior. • The properties of the Turing pattern with and without a network background are studied. • We compare different methods to select the optimal parameter identification for the system. [ABSTRACT FROM AUTHOR]
- Published: 2024
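The general shape of the model class the abstract works in is the two-species reaction-diffusion system below; the paper's specific rumor-propagation kinetics and network coupling are not reproduced here.

```latex
% Generic two-species reaction-diffusion system (not the paper's exact model):
\begin{aligned}
\frac{\partial u}{\partial t} &= f(u,v) + d_1 \nabla^2 u,\\
\frac{\partial v}{\partial t} &= g(u,v) + d_2 \nabla^2 v.
\end{aligned}
% Turing instability: a steady state (u^*, v^*) that is stable without
% diffusion becomes unstable for some wavenumber k, i.e.
% \det\!\big(J - k^2 D\big) < 0 for some k^2 > 0,
% with J the Jacobian of (f,g) at (u^*, v^*) and D = \mathrm{diag}(d_1, d_2).
```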
31. Parameter identification method of a reaction-diffusion network information propagation system based on optimization theory.
- Author: Ding, Yi and Zhu, Linhe
- Subjects: PARAMETER identification, MATHEMATICAL optimization, INFORMATION networks, INFORMATION storage & retrieval systems, ALLEE effect, IDENTIFICATION, DIFFUSION of innovations
- Abstract
With the development of the times, rumors spread rapidly on the Internet. Firstly, this paper establishes a reaction-diffusion system with an Allee effect to describe the rumor spreading process and derives the necessary conditions for the emergence of Turing bifurcation. Next, a parameter identification approach utilizing optimal control theory is presented. Ultimately, the impact of the magnitude of certain parameters in the objective function on parameter identification is examined through numerous parameter identifications in continuous space and on various complex networks. Additionally, the convergence rates and error magnitudes of different algorithms for parameter identification are studied across different spatial structures. • The necessary conditions for the emergence of Turing bifurcation are investigated. • This paper applies optimal control theory to achieve parameter identification. • The effects of the three algorithms and of the topologies on parameter identification are studied. [ABSTRACT FROM AUTHOR]
- Published: 2024
32. GPU-accelerated scalable solver with bit permutated cyclic-min algorithm for quadratic unconstrained binary optimization.
- Author: Yasudo, Ryota, Nakano, Koji, Ito, Yasuaki, Katsuki, Ryota, Tabata, Yusuke, Yazane, Takashi, and Hamano, Kenichiro
- Subjects: QUANTUM annealing, RANDOM numbers, ISING model, COMBINATORIAL optimization, ALGORITHMS, GRAPHICS processing units, IMAGE encryption
- Abstract
• The adaptive bulk search is a QUBO solver alternative to quantum annealing. • The cyclic-min algorithm is an SA-like algorithm without random number generation. • The bit-permutated cyclic-min algorithm provides near-optimal solutions for QUBO. • Our scalable implementation attains linear improvement of the search rate. A wide range of combinatorial optimization problems can be reduced to the Ising model, and equivalently the quadratic unconstrained binary optimization (QUBO) problem. Thus, in recent years, researchers have proposed to solve QUBO on FPGAs, GPUs, and special-purpose processors. The adaptive bulk search (ABS) is a previously proposed framework for solving QUBO in parallel on multiple GPUs. In the ABS, a CPU host performs a GA-based global search while GPUs asynchronously perform many local searches in parallel. The original ABS adopts a simple local search algorithm called the cyclic-min algorithm, which does not use pseudo-random numbers. However, the lack of randomness may cause a potential drawback of restricted bit-flipping operations in a local search. To avoid this drawback, this paper proposes a cyclic-min algorithm with randomly generated multiple bit permutations, which enables a more effective local search with random number generation in CPUs (not in GPUs). Furthermore, this paper introduces a scalable implementation of the ABS with MPI and OpenMP. Our experimental results on TSUBAME3.0 show that the solution quality improves and the throughput increases linearly as the number of GPUs increases; with 256 GPUs, it evaluates 20.1 × 10^12 solutions per second. [ABSTRACT FROM AUTHOR]
- Published: 2022
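The cyclic-min idea described above is easy to state: scan the bits in a fixed order and flip any bit whose flip lowers the QUBO energy E(x) = x^T Q x, with no random numbers needed; the bit-permutated variant simply scans in a randomly generated order. A plain CPU sketch of that local search follows, not the paper's GPU ABS implementation.

```python
# Sketch of cyclic single-bit-flip descent for QUBO (E(x) = x^T Q x).
# The bit-permutated variant passes a random scan order.
import numpy as np

def flip_gain(Q, x, i):
    # energy change from flipping x[i]; valid for general (non-symmetric) Q
    s = Q[i, i] + sum((Q[i, j] + Q[j, i]) * x[j] for j in range(len(x)) if j != i)
    return (1 - 2 * x[i]) * s

def cyclic_min(Q, x, order=None, sweeps=10):
    order = range(len(x)) if order is None else order
    for _ in range(sweeps):
        improved = False
        for i in order:
            if flip_gain(Q, x, i) < 0:   # flip only if it lowers the energy
                x[i] ^= 1
                improved = True
        if not improved:
            break
    return x

rng = np.random.default_rng(1)
Q = rng.normal(size=(16, 16))
x = rng.integers(0, 2, 16)
x = cyclic_min(Q, x, order=rng.permutation(16))  # bit-permutated variant
print(x, x @ Q @ x)
```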
33. MIDP: An MDP-based intelligent big data processing scheme for vehicular edge computing.
- Author: Liu, Shun, Yang, Qiang, Zhang, Shaobo, Wang, Tian, and Xiong, Neal N.
- Subjects: BIG data, EDGE computing, ELECTRONIC data processing, DISTRIBUTION (Probability theory), MARKOV processes, DECISION making
- Abstract
The number of Vehicle Equipment (VE) units connected to the Internet is increasing, and these VEs generate tasks that contain large amounts of data. Processing these tasks requires a lot of computing resources. Therefore, offloading compute-intensive tasks from resource-limited vehicles to Vehicular Edge Computing (VEC) servers, which involves big data transmission, processing, and computation, is a promising approach. In a network, multiple providers provide VEC servers. When a vehicle generates a task, our goal is to make an intelligent decision on whether and when to offload this task to VEC servers, to minimize the task completion time and total big data processing time. When a vehicle passes VEC servers, it can decide to offload its task to the VEC server in the current communication range, or continue to drive until it reaches the next server's communication range. This issue can be considered as an asset selling problem. It is challenging to make a smart decision for a vehicle with only a local view, because the vehicle is not sure when the next VEC server will be available, or how much computing capacity the next VEC server will have available. Firstly, this paper formulates the problem as a Markov Decision Process (MDP), defining and analyzing the state set, action set, reward model, and state transition probability distribution. Then it uses the Asynchronous Advantage Actor-Critic (A3C) algorithm to solve this MDP problem, builds the various elements of the A3C algorithm, and uses the Actor (the strategy function) to generate the vehicle's two actions: offloading, and moving without offloading. Thirdly, it uses the Critic (the value function) to evaluate the Actor's behavior and guide the Actor's actions in subsequent stages. The Actor starts from the initial state in the state space and proceeds until it enters the termination state, forming a complete decision-making process. It minimizes the completion time of task offloading through learning, thereby reducing the delay of big data processing. Compared to the Immediately Offload (IO) scheme and the Expect Offload (EO) scheme, the MIDP scheme proposed in this paper reduces the average task offloading delay to 29.93% and 29.99%, coming close to the EO scheme in terms of task completion rate and achieving up to a 66.6% improvement over the IO scheme. • A task offloading approach in the Internet of Vehicles (IoV) is proposed. • An MDP-based Intelligent Big Data Processing (MIDP) scheme is proposed. • Simulations demonstrate the better performance of the MIDP scheme. [ABSTRACT FROM AUTHOR]
- Published: 2022
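The asset-selling framing mentioned in the abstract is an optimal-stopping MDP: at each VEC server encounter the vehicle either offloads now or pays a cost to continue to the next server. The paper solves the learned version with A3C; the toy sketch below instead solves a fully known instance by backward induction, with hypothetical delays, just to show the MDP structure.

```python
# Optimal-stopping illustration of the offloading MDP (not the paper's A3C).
# expected_delay[k]: task completion time if offloaded at the k-th VEC
# server encountered; travel_cost: delay incurred by driving to the next one.

def offload_policy(expected_delay, travel_cost=1.0):
    n = len(expected_delay)
    V = [0.0] * (n + 1)
    V[n] = float("inf")               # past the last server the task never offloads
    act = [None] * n
    for k in reversed(range(n)):      # backward induction, minimizing delay
        stay = travel_cost + V[k + 1]           # drive on to the next server
        stop = expected_delay[k]                # offload here
        V[k], act[k] = min((stop, "offload"), (stay, "continue"))
    return V[0], act

val, policy = offload_policy([9.0, 4.0, 6.0])
print(val, policy)   # waits for the second server: 1.0 + 4.0 = 5.0
```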
34. ProvNet: Networked bi-directional blockchain for data sharing with verifiable provenance.
- Author: Chenli, Changhao, Tang, Wenyi, Gomulka, Frank, and Jung, Taeho
- Subjects: INFORMATION sharing, BLOCKCHAINS, DATA security, SYSTEMS design, MATHEMATICAL optimization, RECORD collecting
- Abstract
Data sharing is increasingly popular, especially in scientific research and business fields where large volumes of datasets need to be used, but it involves data security and privacy concerns. This paper mitigates such concerns by tracking and logging the history of shared data (i.e., provenance records) while preserving data privacy. This is a challenging problem in the data sharing scenario of this paper because the environment is decentralized and internal logs are not publicly accessible due to privacy concerns. We present ProvNet, a decentralized data sharing platform that can detect malicious users and provide secure provenance records using the newly proposed networked blockchain, without disclosing raw data contents. Valid sharing records are collected and stored in the blocknet, and misbehavior is detected with the stored provenance records according to our accountable protocols. We give a proof-of-concept implementation, and evaluation results show that the overhead is acceptable. • A networked blockchain, the blocknet, stores data sharing records bi-directionally. • MinHash is used to check datasets' similarities, thus protecting data ownership. • Chameleon Hash is leveraged to add future pointers without changing previous hashes. • A provenance graph built from the blocknet provides users with trackability of shared data. • Prototype performance is acceptable; an indexing system is designed for optimization. [ABSTRACT FROM AUTHOR]
- Published: 2022
35. A method for reducing cloud service request peaks based on game theory.
- Author: Xiao, Zheng, Wang, Mengyuan, Chronopoulos, Anthony Theodore, and Jiang, Jiuchuan
- Subjects: GAME theory, NASH equilibrium, ARTIFICIAL intelligence, WIRELESS Internet, INTERNET access, CLOUD computing
- Abstract
• Minimization of both cloud service cost and request peak is addressed using game theory. • The Nash equilibrium of the formalized game is proven theoretically. • A distributed Peak Clipping Algorithm (DPCA) is proposed to calculate the Nash equilibrium solution. • Experiments validate that the request peak and the payment cost of the customer are reduced simultaneously, thereby increasing the benefit. The rapid development of Internet of Things (IoT) technology and the popularity of Artificial Intelligence (AI) research have brought new opportunities for the development of cloud computing (CC). With the increasing number of mobile Internet access devices and IoT access devices, the number of task requests from CC customers for AI services in the network has also experienced explosive growth. In this paper, the focus is on the possible overload of cloud providers during the peak period of cloud service requests. The time attributes of cloud task execution are classified to avoid overloading the cloud provider as much as possible. In a distributed cloud environment, it is necessary to consider the time-flexibility attributes of cloud tasks to compete for cloud resources reasonably. In this work, game theory (GT) is introduced to formulate a cloud service scheduling game, in which the participants are cloud customers who purchase cloud services. The players' strategies are the time flexibility of each cloud task. The problem is formulated as minimizing the cost of scheduling cloud services, and a noncooperative game among the customers (as players) is presented. The existence of the Nash equilibrium (NE) solution of the game is then proved, and a new algorithm is proposed in this paper to compute it. In addition, the convergence analysis of the proposed PCA algorithm and the proof of its convergence to the NE are also included. At the end of the paper, simulations are presented to verify the theoretical analysis. The experimental results show that the proposed PCA algorithm can converge to the Nash equilibrium very quickly, effectively reducing the peak value and increasing profit. [ABSTRACT FROM AUTHOR]
- Published: 2022
36. Identifying challenges and opportunities of in-memory computing on large HPC systems.
- Author: Huang, Dan, Qin, Zhenlu, Liu, Qing, Podhorszki, Norbert, and Klasky, Scott
- Subjects: HIGH performance computing, COMPUTER systems, PHENOMENOLOGICAL theory (Physics), SCIENTIFIC discoveries, LIBRARY users, DATA analysis
- Abstract
With the increasing fidelity and resolution enabled by high-performance computing systems, simulation-based scientific discovery is able to model and understand microscopic physical phenomena at a level that was not possible in the past. A grand challenge the HPC community is facing is how to maintain the large amounts of analysis data generated from simulations. In-memory computing, among others, is recognized to be a viable path forward and has experienced tremendous success in the past decade. Nevertheless, there has been a lack of a complete study and understanding of in-memory computing as a whole on HPC systems. Given the enlarging disparity between compute and HPC storage I/O, it is urgent for the HPC community to assess the state of in-memory computing and understand the challenges and opportunities. This paper presents a comprehensive study of in-memory computing with regard to its software evolution, performance, usability, robustness, and portability. In particular, we conduct an in-depth analysis of the evolution of in-memory computing based upon more than 3,000 commits, and use realistic workflows for two scientific workloads, i.e., LAMMPS and Laplace, to quantitatively assess state-of-the-art in-memory computing libraries, including DataSpaces, DIMES, Flexpath, Decaf, and SENSEI, on two leading supercomputers, Titan and Cori. Our studies not only illustrate the performance and scalability, but also reveal the key aspects that are of interest to library developers and users, including usability, robustness, portability, potential design defects, etc. • This paper presents a comprehensive study of in-memory computing. • This work covers software performance, usability, robustness, and portability. • An in-depth analysis is conducted on the evolution of in-memory computing. [ABSTRACT FROM AUTHOR]
- Published: 2022
37. DQS: A QoS-driven routing optimization approach in SDN using deep reinforcement learning.
- Author: Aguirre Sanchez, Lizeth Patricia, Shen, Yao, and Guo, Minyi
- Subjects: DEEP reinforcement learning, REINFORCEMENT learning, END-to-end delay, SOFTWARE-defined networking, NETWORK performance, ROUTING algorithms
- Abstract
In recent decades, the exponential growth of applications has intensified traffic demands, posing challenges in ensuring optimal user experiences within modern networks. Traditional congestion avoidance and control mechanisms embedded in conventional routing struggle to promptly adapt to new-generation networks. Current routing approaches risk adverse outcomes such as (1) scalability constraints, (2) high convergence times, and (3) congestion due to inadequate real-time traffic prioritization. To address these issues, this paper introduces DQS, a QoS-driven routing optimization for Software-Defined Networking (SDN) that uses Deep Reinforcement Learning (DRL) to optimize routing and enhance QoS efficiency. Employing DRL, the proposed DQS optimizes routing decisions by intelligently distributing traffic, guided by a multi-objective-function-driven DRL agent that considers both link and queue metrics. Despite the complexity of the network, DQS sustains scalability while significantly reducing convergence times. Through a Docker-based OpenFlow prototype, results highlight a substantial 20-30% reduction in end-to-end delay compared to baseline methods. • DQS tackles scalability and congestion with a multi-objective loss function, integrating CoS into routing decisions to make informed decisions. • DQS employs a seven-parameter traffic classifier with ML techniques, efficiently categorizing CoS and distinguishing mice and elephant traffic. • The paper presents DQS, employing DRL to optimize routing decisions, minimize delay and loss, and maximize bandwidth use across links and queues. • We evaluate DQS with a Docker-based SDN prototype, showing its adaptability, scalability, and responsiveness to traffic changes and QoS demands. • Results show a 20-30% delay reduction and a 14% processing-time improvement over state-of-the-art algorithms, enhancing network performance. [ABSTRACT FROM AUTHOR]
- Published: 2024
38. Evaluating performance portability of five shared-memory programming models using a high-order unstructured CFD solver.
- Author: Dai, Zhe, Deng, Liang, Che, YongGang, Li, Ming, Zhang, Jian, and Wang, Yueqing
- Subjects: COMPUTATIONAL fluid dynamics, DATA structures
- Abstract
This paper presents implementing and optimizing a high-order unstructured computational fluid dynamics (CFD) solver using five shared-memory programming models: CUDA, OpenACC, OpenMP, Kokkos, and OP2. The study aims to evaluate the performance of these models on different hardware architectures, including NVIDIA GPUs, x86-based Intel/AMD, and Arm-based systems. The goal is to determine whether these models can provide developers with performance-portable solvers running efficiently on various architectures. The paper forms a more holistic view of a high-order solver across multiple platforms by visualizing performance portability (PP) and measuring productivity. It gives guidelines for translating existing codebases and their data structures to these models. • We port and optimize a high-order unstructured CFD application by using five shared-memory programming models. • We evaluate the performance portability of five programming models on diverse hardware. • We analyze the workload from the perspective of code volume and learning cost. [ABSTRACT FROM AUTHOR]
- Published: 2024
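A widely used way to quantify the performance portability (PP) the abstract visualizes is Pennycook et al.'s metric: the harmonic mean of the application's efficiency over the platform set, and zero if any platform is unsupported. Whether this paper uses exactly that definition is not confirmed here; the sketch shows the common metric itself.

```python
# Common PP metric (Pennycook et al.): harmonic mean of per-platform
# efficiencies, zero if the application fails to run anywhere in the set.

def performance_portability(efficiencies):
    """efficiencies: per-platform efficiency in (0, 1], or None if unsupported."""
    if any(e is None or e == 0 for e in efficiencies):
        return 0.0
    return len(efficiencies) / sum(1.0 / e for e in efficiencies)

# Hypothetical numbers: 90% efficiency on a GPU, 60% on x86, 45% on Arm.
print(round(performance_portability([0.90, 0.60, 0.45]), 3))  # -> 0.6
```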
39. Communication lower-bounds for distributed-memory computations for mass spectrometry based omics data.
- Author: Saeed, Fahad, Haseeb, Muhammad, and Iyengar, S.S.
- Subjects: MASS spectrometry, PARALLEL algorithms, PARALLEL programming, DATABASES, METAGENOMICS
- Abstract
• We present a theoretical framework that can be used for analyzing and quantifying the performance of parallel algorithms designed for MS-based omics data. • We prove the communication lower bounds for the existing parallel algorithms. • We also prove communication lower bounds that can theoretically be achieved by parallel algorithms for MS-based omics analysis. • Extensive experimentation with state-of-the-art tools confirms our theoretical results. • This is the first proof of any communication bounds for parallel algorithms for MS-based omics. Mass spectrometry (MS) based omics data analysis requires significant time and resources. To date, few parallel algorithms have been proposed for deducing peptides from mass spectrometry-based data. However, these parallel algorithms were designed and developed when the amount of data that needed to be processed was smaller in scale. In this paper, we prove that the communication bound reached by the existing parallel algorithms is Ω(mn + 2rq/p), where m and n are the dimensions of the theoretical database matrix, q and r are the dimensions of the spectra, and p is the number of processors. We further prove that a communication-optimal strategy with fast memory M = mn + 2qr/p can achieve Ω(2mnq/p), but this is not achieved by any existing parallel proteomics algorithm to date. To validate our claim, we performed a meta-analysis of published parallel algorithms and their performance results. We show that sub-optimal speedups with an increasing number of processors are a direct consequence of not achieving the communication lower bounds. We further validate our claim by performing experiments which demonstrate the communication bounds that are proved in this paper. Consequently, we assert that a next generation of provably, and demonstrably, superior parallel algorithms is urgently needed for MS-based large systems-biology studies, especially for meta-proteomics, proteogenomics, microbiome, and proteomics for non-model organisms. Our hope is that this paper will excite the parallel computing community to further investigate parallel algorithms for highly influential MS-based omics problems. [ABSTRACT FROM AUTHOR]
- Published: 2022
40. Distributed and individualized computation offloading optimization in a fog computing environment.
- Author: Li, Keqin
- Subjects: NASH equilibrium, COMBINATORIAL optimization, ALGORITHMS, MATHEMATICAL optimization, INFORMATION technology
- Abstract
• Formulate a non-cooperative game with both UEs and MECs as players. • Prove the existence of and convergence to a Nash equilibrium. • Develop a set of algorithms to find the Nash equilibrium. • Demonstrate numerical examples of non-cooperative games with and without MECs' participation. In a newly emerged fog computing environment, various user equipments (UE) enhance their computing power and extend their battery lifetime by computation offloading to mobile edge cloud (MEC) servers. Such an environment is distributed and competitive in nature. In this paper, we take a game theoretical approach to computation offloading optimization in a fog computing environment. Such an approach captures and characterizes the nature of a competitive environment. The main contributions of the paper can be summarized as follows. First, we formulate a non-cooperative game with both UEs and MECs as players. Each UE attempts to minimize the execution time of its tasks with an energy constraint. Each MEC attempts to minimize the product of its power consumption for computation and execution time for allocated tasks. Second, we develop a heuristic algorithm for a UE to determine its "heuristically" best response to the current situation, an algorithm for an MEC to determine its best response to the current situation, and an iterative algorithm to find the Nash equilibrium. Third, we prove that our iterative algorithm converges to a Nash equilibrium. We demonstrate numerical examples of our non-cooperative games with and without MECs' participation. We observe that our iterative algorithm always quickly converges to a Nash equilibrium. The uniqueness of our non-cooperative games is that the strategy set of a player can be discrete and the payoff function of a player can be obtained by a heuristic algorithm for combinatorial optimization. To the best of the author's knowledge, there has been no such investigation of non-cooperative games based on combinatorial optimization for computation offloading optimization in a fog computing environment. [ABSTRACT FROM AUTHOR]
- Published: 2022
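The iterative algorithm described above follows the standard best-response dynamics: players take turns replacing their strategy with a best response to everyone else's current choices, until no one can improve. A schematic with placeholder payoffs follows; in the paper each UE and MEC computes its best response with its own heuristic, which is not reproduced here.

```python
# Generic best-response iteration toward a Nash equilibrium. The cost
# function and strategy sets below are toy placeholders, not the paper's.

def nash_by_best_response(strategies, strategy_sets, cost, max_iters=100):
    for _ in range(max_iters):
        changed = False
        for i in range(len(strategies)):          # each player in turn
            best = min(strategy_sets[i], key=lambda s: cost(i, s, strategies))
            if cost(i, best, strategies) < cost(i, strategies[i], strategies):
                strategies[i] = best              # adopt the best response
                changed = True
        if not changed:                           # no player can improve: NE
            return strategies
    return strategies

# Toy congestion game: two players pick a server; sharing raises the cost.
def cost(i, s, strat):
    others = sum(1 for j, t in enumerate(strat) if j != i and t == s)
    return 1.0 + others

print(nash_by_best_response([0, 0], [[0, 1], [0, 1]], cost))  # -> [1, 0]
```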
41. Application of the Layered Algorithm in search of an airborne contaminant source.
- Author: Szaban, Miroslaw and Wawrzynczak, Anna
- Subjects: SEARCH algorithms, METROPOLITAN areas, CELLULAR automata, ALGORITHMS, MATHEMATICAL optimization, MICROBIOLOGICAL aerosols
- Abstract
The paper presents a new optimization method, the Layered Algorithm (LA). The proposed algorithm reduces the initial area to the sub-area containing the optimum. The proposed technique is based on the classification of the optimized function values (data sampled by sensors). The classification method uses a two-dimensional, three-state Cellular Automaton (CA). The CA classifies all area points ascribed to the CA cells based on their values. Applying the categorization layers to the data makes it possible to identify the areas of different levels. Consequently, after analysis, a sub-area containing the optimum can be designated. In this paper, the proposed algorithm is applied to find the location of an airborne contaminant source by analyzing the concentration of released substances reported by mobile sensors distributed over the domain of interest. A Gaussian dispersion model simulation of the contaminant dispersion in an urbanized area is applied to generate the data used to verify the efficiency of the proposed Layered Algorithm. The LA successfully estimates the sub-area of the considered domain where the contamination source is located, taking into account data from the sensors alone. • An optimization method reducing the initial area to the sub-area containing the optimum. • Verification of the Layered Algorithm (LA) with selected benchmark functions. • Analysis of the LA's efficiency in localizing an airborne contaminant source. • LA results assessed by the measures: classification and accuracy error, relevance. [ABSTRACT FROM AUTHOR]
- Published: 2022
42. A high-performance VLSI array reconfiguration scheme based on network flow under row and column rerouting.
- Author: Ding, Hao, Qian, Junyan, Zhao, Lingzhong, and Zhai, Zhongyi
- Subjects: VERY large scale circuit integration, ARRAY processors, TELECOMMUNICATION equipment, ALGORITHMS, BOTTLENECKS (Manufacturing)
- Abstract
• A network flow model of the VLSI processor array is constructed under row and column rerouting. • A new strategy for selecting the bottleneck row in the logical array using the minimum cut technique is proposed. • The proposed schemes significantly reduce the interconnect redundancy of the logical subarray. Reconfiguration algorithms have been extensively investigated to ensure the reliability and stability of processor arrays with faults. It is important to reduce the power consumption, capacitance, and communication costs in the processors by reducing the interconnection length of the VLSI array. This paper discusses the reconfiguration problem of the high-performance VLSI processor array under the row and column rerouting constraints. A novel method, making use of the idea of network flow, is proposed in this paper. Firstly, a network flow model of the VLSI processor array is constructed, such that the high-performance VLSI target array can be obtained by utilizing the minimum-cost flow algorithm. Secondly, we propose a new strategy for bottleneck row selection in the logical array using the minimum cut technique, which can find a more suitable bottleneck row. Finally, we conducted experiments that clearly reveal the efficiency of the new rerouting scheme and algorithm in reducing the number of long interconnects. The experimental results show that, for a host array of size 256×256, the number of long interconnects in the subarray can be reduced by up to 79.22% and 55.88% without performance penalty for random faults with densities of 1% and 25% respectively, when compared with the state-of-the-art. In addition, the proposed scheme improves on the existing algorithm in terms of subarray size. On a 256×256 host array with 25% fault density, the average improvement in subarray size is up to 3.77% compared with the state-of-the-art. [ABSTRACT FROM AUTHOR]
- Published: 2021
43. Dynamic fault tolerant scheduling with response time minimization for multiple failures in cloud.
- Author: Gupta, Pushpanjali, Sahoo, Prasan Kumar, and Veeravalli, Bharadwaj
- Subjects: SCHEDULING, CLOUD computing, QUALITY of service, REACTION time
- Abstract
With the increasing demand for large amount of computing resources, the cloud is widely used for executing large number of independent tasks. In order to successfully execute more tasks and maximize the revenues, the cloud service providers (CSPs) should provide reliable services, while maximizing the resource utilization. Providing better Quality of Service (QoS), while maximizing the resource utilization in the event of failures is a critical research issue which needs to be addressed. In this paper, an Elastic pull-based Dynamic Fault Tolerant (E-DFT) scheduling mechanism is designed for minimizing the response time while executing the backups during multiple failures of independent tasks. A basic core primary backup model is also used and integrated with the backup tasks overlapping (BTO) and backup tasks fusion (BTF) techniques to tolerate multiple simultaneous failures. Simulation results show that the proposed E-DFT scheduling can achieve better performance in terms of guarantee ratio and resource utilization over other existing scheduling algorithms. • Cloud resource utilization problem is addressed in this paper. • Multiple-failures tolerant scheduling mechanism is proposed for real-time tasks. • The response time is minimized using the proposed scheduling mechanism. • A basic core primary backup model integrated with the backup tasks is considered. • Rigorous performance evaluation is attempted to demonstrate resource utilization. [ABSTRACT FROM AUTHOR]
- Published: 2021
44. Preemptive scheduling on unrelated machines with fractional precedence constraints.
- Author: Aggarwal, Vaneet, Lan, Tian, and Peddireddy, Dheeraj
- Subjects: MATRIX decomposition, SCHEDULING, DECOMPOSITION method, PRODUCTION scheduling, SCHOOL schedules
- Abstract
• The notion of fractional precedence constraints is formalized. • The flexibility that the progress of follower jobs can lag behind (fractionally) their leads is exploited. • Sufficient and necessary conditions for the constrained job scheduling are provided. • A novel matrix decomposition algorithm is developed to verify the feasibility of the objective. • Approximation guarantees for makespan are provided. Many programming models, e.g., MapReduce, introduce precedence constraints between the jobs. This paper formalizes a notion of precedence constraints, called fractional precedence constraints, where the progress of follower jobs only has to lag behind (fractionally) their leads. For a general set of fractional precedence constraints between the jobs, this paper provides a new class of preemptive scheduling algorithms on unrelated machines that have arbitrary processing speeds. In particular, for a given makespan, we establish both sufficient and necessary conditions on the existence of a feasible job schedule, and then propose an efficient scheduling algorithm based on a novel matrix decomposition method, if the sufficient conditions are satisfied. The algorithm is shown to be a Polynomial-Time Approximation Scheme (PTAS), i.e., its solution is able to achieve any feasible makespan with an approximation bound of 1 + ϵ, for an arbitrary ϵ > 0. [ABSTRACT FROM AUTHOR]
- Published: 2021
45. Evolving PDC curriculum and tools: A study in responding to technological change.
- Author: Adams, Joel C.
- Subjects: RASPBERRY Pi, CURRICULUM change, FREEWARE (Computer software), PARALLEL programming, MESSAGE passing (Computer science), GRAPHICS processing units
- Abstract
• A history of parallel and distributed computing (PDC) technology evolution, from the late 1990s to the present. • Specific ways in which teaching PDC has changed during that timespan. • Descriptions and pictures of a wide variety of Beowulf clusters. • Select software tools specifically designed for PDC education. • A dozen insights reflecting the author's experiences as a PDC educator. Much has changed about parallel and distributed computing (PDC) since the author began teaching the topic in the late 1990s. This paper reviews some of the key changes to the field and describes their impacts on his work as a PDC educator. Such changes include: the availability of free implementations of the message passing interface (MPI) for distributed-memory multiprocessors; the development of the Beowulf cluster; the advent of multicore architectures; the development of free multithreading languages and libraries such as OpenMP; the availability of (relatively) inexpensive manycore accelerator devices (e.g., GPUs); the availability of free software platforms like CUDA, OpenACC, OpenCL, and OpenMP for using accelerators; the development of inexpensive single board computers (SBCs) like the Raspberry Pi, and other changes. The paper details the evolution of PDC education at the author's institution in response to these changes, including curriculum changes, seven different Beowulf cluster designs, and the development of pedagogical tools and techniques specifically for PDC education. The paper also surveys many of the hardware and software infrastructure options available to PDC educators, provides a strategy for choosing among them, and provides practical advice for PDC pedagogy. Through these discussions, the reader may see how much PDC education has changed over the past two decades, identify some areas of PDC that have remained stable during this same time period, and so gain new insight into how to efficiently invest one's time as a PDC educator. [ABSTRACT FROM AUTHOR]
- Published: 2021
46. Energy saving strategy and Nash equilibrium of hybrid P2P networks.
- Author: Ma, Zhanyou, Zhang, Changzhen, Zhang, Liyuan, and Wang, Shunzhi
- Subjects: ENERGY consumption, NASH equilibrium, MATRIX analytic methods, GEOMETRIC approach, PEER-to-peer architecture (Computer networks), QUEUING theory, CORPORATE profits
- Abstract
• A novel queueing theory method to deal with the free riding problem in P2P networks. • A sleep/wakeup mechanism and asynchronous vacation strategy aimed at reducing the energy consumption of the system. • Quantified energy consumption of peers in each state. • Nash equilibrium between the arrival rate and the net profit of a single node. • A social optimal strategy. This paper proposes a penalty strategy with differentiated service rates based on the free riding phenomenon in P2P networks, and establishes an M/M/c+d queueing model. Based on this model, a sleep/wakeup mechanism is introduced for the peers at the service end, and a single asynchronous vacation strategy is adopted to reduce the energy consumption of the system. In addition, the energy consumption of peers in each state is quantified, and the relationship between the energy consumption and the parameters of the system is analyzed. In order to prevent requesting nodes from requesting excessive unnecessary services and increasing the energy consumption of the system, this paper analyzes the Nash equilibrium between the arrival rate and the net profit of a single node, and then studies the optimization of social profit. The stationary distribution of the queueing model is obtained by the matrix-geometric solution method, the performance indicators of the system are constructed, and the system performance is analyzed by numerical experiments. Experimental results show that the model developed in this paper has a significant penalty effect on free riding behavior, and that the single asynchronous vacation strategy not only saves more than 10% of the total energy consumption compared with the single synchronous vacation strategy, but also makes hybrid P2P networks more flexible and efficient. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
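The model in this entry is an M/M/c+d queue with sleep/wakeup and asynchronous vacations, solved by matrix-geometric methods. As a much-simplified stand-in that shows the flavor of the underlying computation, the sketch below solves a plain M/M/c queue with a finite buffer via its birth-death balance equations and evaluates a toy energy model; all parameters and power levels are hypothetical, and the paper's vacation mechanics are omitted.

```python
# Simplified stand-in for the paper's model: the stationary distribution of an
# M/M/c queue with finite buffer K, from the birth-death balance equations
# p[n] = p[n-1] * lam / (min(n, c) * mu). The paper's M/M/c+d model with
# sleep/wakeup and asynchronous vacations needs a matrix-geometric solution;
# this toy omits those features, and all parameters are hypothetical.

lam, mu, c, K = 3.0, 1.0, 4, 40    # arrival rate, service rate, servers, buffer

p = [1.0]                          # unnormalized probabilities, p[0] = 1
for n in range(1, K + 1):
    p.append(p[-1] * lam / (min(n, c) * mu))
Z = sum(p)
p = [x / Z for x in p]             # normalize to a distribution

busy = sum(min(n, c) * p[n] for n in range(K + 1))   # mean busy servers
idle = c - busy
e_busy, e_idle = 1.0, 0.4          # hypothetical per-server power levels
print(f"mean busy servers = {busy:.3f}")
print(f"mean power draw  = {busy * e_busy + idle * e_idle:.3f}")
```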
47. The power of agents in a dispersed system – The Shapley-Shubik power index.
- Author
-
Przybyła-Kasperek, Małgorzata
- Subjects
- *
COMPUTATIONAL complexity - Abstract
• Application of the Shapley-Shubik index to determine the agents' strength in a dispersed decision-making system. • A new method for generating the local decisions within one cluster. • A new method of determining a set of global decisions. • A comparison of the results obtained with those reported in previous papers. In this paper, dispersed knowledge accumulated in several decision tables is considered. Dispersing the knowledge is not part of the system's operation; we assume that the knowledge is already in dispersed form when it is provided to the system. An advanced process of detecting the relations between the decision tables and constructing coalitions is used. The purpose of this paper is to use the Shapley-Shubik power index to measure the strength of each coalition. Alongside this measure, a simple method of combining the decision vectors generated from the local decision tables is applied. The Shapley-Shubik index is used in order to reduce the computational complexity compared with the approach proposed in earlier papers. In this paper, the results of experiments are presented and the two approaches are compared. Based on these results, some conclusions are drawn. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
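The Shapley-Shubik power index itself is easy to state: an agent's power is the fraction of all orderings of the agents in which that agent is pivotal, i.e., its arrival pushes the accumulated weight past the quota. The sketch below computes the index for a small weighted voting game by brute-force enumeration; the weights and quota are a textbook example, not data from the paper, and the paper's use of the index over coalitions of local decision tables is more elaborate.

```python
# Shapley-Shubik power index by brute force: enumerate all n! orderings and
# count, for each agent, the orderings in which it is pivotal (its weight
# pushes the running total past the quota). Exponential in n, hence the
# interest in cheaper approaches for larger systems.
from itertools import permutations

def shapley_shubik(weights, quota):
    n = len(weights)
    pivots = [0] * n
    for order in permutations(range(n)):
        running = 0
        for agent in order:
            running += weights[agent]
            if running >= quota:         # this agent tips the coalition over
                pivots[agent] += 1
                break
    total = sum(pivots)                  # equals n!
    return [count / total for count in pivots]

# Textbook example: weights 50, 49, 1 with quota 51 give index (2/3, 1/6, 1/6),
# showing that very unequal weights (49 vs. 1) can carry identical power.
print(shapley_shubik([50, 49, 1], quota=51))
```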
48. Integrating batched sparse iterative solvers for the collision operator in fusion plasma simulations on GPUs.
- Author
-
Kashi, Aditya, Nayak, Pratik, Kulkarni, Dhruva, Scheinberg, Aaron, Lin, Paul, and Anzt, Hartwig
- Subjects
- *
SPARSE matrices , *GRAPHICS processing units , *SUPERCOMPUTERS , *LINEAR algebra , *PLASMA confinement , *PLASMA devices , *GINKGO - Abstract
Batched linear solvers, which solve many small, related but independent problems, are increasingly important for highly parallel processors such as graphics processing units (GPUs). GPUs need a substantial amount of concurrent work to operate efficiently, so solving the small problems one by one is not an option. Because of the small size of each problem, the task of implementing a parallel partitioning scheme and mapping the problem to hardware is not trivial. Significant attention has recently been given to batched dense linear algebra; however, there is also interest in utilizing sparse iterative solvers in a batched form. An example use case is found in a gyrokinetic Particle-In-Cell (PIC) code used for modeling magnetically confined fusion plasma devices. The collision operator has been identified as a bottleneck, and a proxy app has been created to facilitate optimizations and porting to GPUs. The current collision kernel linear solver does not run on the GPU, which is a major bottleneck. As these matrices are sparse and well-conditioned, batched sparse iterative solvers are an attractive option. A batched sparse iterative solver capability has recently been developed in the Ginkgo library. In this paper, we describe how Ginkgo's batched solver technology can be integrated into the XGC collision kernel to accelerate the simulation. Solve times on NVIDIA V100 and A100 GPUs and AMD MI100 GPUs are compared against a dual-socket, 40-core Intel Xeon Skylake CPU node for matrices from the collision kernel of XGC. Further, the speedups observed for the overall collision kernel are presented in comparison with several modern CPUs on multiple supercomputer systems. The results suggest that Ginkgo's batched sparse iterative solvers are well suited to efficient utilization of the GPU for this problem, and that the performance portability of Ginkgo in conjunction with Kokkos (used within XGC as the heterogeneous programming model) allows seamless execution on exascale-oriented heterogeneous architectures. • Fast batched sparse iterative linear solvers for modern graphics processing units. • Implementation of different batched sparse matrix formats. • Automatic tuning of shared memory utilization on the GPU. • A strategy for integration into the plasma simulation code XGC via Kokkos. • Performance results on various CPUs and on V100, A100, and MI100 GPUs. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
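Ginkgo's actual batched-solver API is not reproduced here. Instead, the sketch below illustrates the core idea the abstract describes: many small, well-conditioned sparse systems that share one sparsity pattern are solved together, with a single control flow driving the whole batch. It is a NumPy sketch of an unpreconditioned batched conjugate-gradient solver, a schematic of the technique rather than the library's implementation.

```python
import numpy as np

def batched_cg(indptr, indices, data, b, iters=100, tol=1e-10):
    """Unpreconditioned CG on a batch of SPD systems sharing one CSR sparsity
    pattern: data has shape (batch, nnz), b has shape (batch, n). One control
    flow advances every system in the batch, mirroring the batched-solver idea."""
    n = b.shape[1]

    def matvec(x):
        # y[k, i] = sum over row i's nonzeros of A[k, i, j] * x[k, j]
        y = np.zeros_like(x)
        for i in range(n):
            lo, hi = indptr[i], indptr[i + 1]
            y[:, i] = np.einsum('kj,kj->k', data[:, lo:hi], x[:, indices[lo:hi]])
        return y

    x = np.zeros_like(b)
    r = b - matvec(x)
    p = r.copy()
    rs = np.einsum('ki,ki->k', r, r)                 # per-system ||r||^2
    for _ in range(iters):
        Ap = matvec(p)
        alpha = rs / np.einsum('ki,ki->k', p, Ap)
        x += alpha[:, None] * p
        r -= alpha[:, None] * Ap
        rs_new = np.einsum('ki,ki->k', r, r)
        if np.all(np.sqrt(rs_new) < tol):            # whole batch converged
            break
        p = r + (rs_new / rs)[:, None] * p
        rs = rs_new
    return x

# Demo: a batch of 3 identical tridiagonal SPD systems (2 on the diagonal,
# -1 off-diagonal) of size 4; the exact solution of A x = 1 is [2, 3, 3, 2].
n, batch = 4, 3
indptr, indices, vals = [0], [], []
for i in range(n):
    for j in (i - 1, i, i + 1):
        if 0 <= j < n:
            indices.append(j)
            vals.append(2.0 if i == j else -1.0)
    indptr.append(len(indices))
x = batched_cg(np.array(indptr), np.array(indices),
               np.tile(vals, (batch, 1)), np.ones((batch, n)))
print(x)
```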
49. Interactive anomaly-based DDoS attack detection method in cloud computing environments using a third party auditor.
- Author
-
Mahdavi Hezavehi, Sasha and Rahmani, Rouhollah
- Subjects
- *
DENIAL of service attacks , *CLOUD computing , *INFORMATION technology , *AUDITORS , *CONDITIONED response - Abstract
Cloud computing environments are indispensable components of most information technology organizations and users' lives. Despite the multiple benefits of cloud computing environments, cloud users (CUs) as well as cloud service providers (CSPs) may suffer from the detrimental effects of distributed denial of service (DDoS) attacks, such as unavailability of cloud services or lengthy response times. In this paper, we provide a threshold anomaly-based DDoS attack detection method to protect cloud environments against DDoS attacks and to reduce their consequences for CSPs. Our method includes three newly defined components: 1. a third party auditor (TPA), which interacts directly with each datacenter of the CSP; 2. a zone delimiter (ZD), which shields the sensitive internal specifications of a CSP from the TPA; and 3. a protocol defined to coordinate the TPA, ZD, and CSPs for DDoS attack detection via the TPA. We analyze our proposed method by devising and conducting a simulation strategy for an intrusion detection system in CSPs. Results illustrate that interactive communication between the TPA and the datacenters of CSPs improves the user experience of CUs during DDoS attacks by reducing excessive attack-filtering stages. Moreover, using an intrusion detection system (IDS), we investigate the efficiency of the proposed method in recovering CSPs from DDoS attacks. We further demonstrate the efficiency of our method through accuracy and qualitative comparisons with other existing methods. • A cloud auditor detects denial of service or distributed denial of service attacks on the services of a cloud service provider. • Interactive communication between the cloud auditor and datacenters improves the user experience during attacks. • When attacks remain undetected, cloud users will not experience any unusual conditions such as long response times. • When attacks remain undetected, hosts will not experience any unusual conditions such as hardware overheating. • The method functions completely independently of any other intrusion detection mechanism or filtration method. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
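The paper's detection pipeline involves the TPA, the ZD, and a coordination protocol; those details are beyond an abstract. As a minimal illustration of the threshold anomaly idea the abstract names, the sketch below flags a traffic source whose per-interval request count exceeds a multiple of its own moving baseline. The class name, window, and thresholds are hypothetical choices, not the authors' algorithm.

```python
# Toy threshold anomaly detector: flag a source whose request count in one
# interval exceeds `factor` times its moving-average baseline. This is a
# hypothetical illustration of threshold-based anomaly detection, not the
# paper's TPA/ZD protocol.
from collections import defaultdict, deque

class RateAnomalyDetector:
    def __init__(self, window=10, factor=5.0, min_baseline=1.0, warmup=3):
        self.window = window              # past intervals kept per source
        self.factor = factor              # spike multiple treated as anomalous
        self.min_baseline = min_baseline  # floor so quiet sources aren't over-flagged
        self.warmup = warmup              # intervals observed before flagging starts
        self.history = defaultdict(lambda: deque(maxlen=window))

    def observe(self, source, count):
        """Record one interval's request count; return True if anomalous."""
        hist = self.history[source]
        baseline = max(sum(hist) / len(hist), self.min_baseline) if hist else self.min_baseline
        anomalous = len(hist) >= self.warmup and count > self.factor * baseline
        if not anomalous:
            hist.append(count)            # keep attack traffic out of the baseline
        return anomalous

det = RateAnomalyDetector()
traffic = [8, 10, 9, 11, 10, 400]          # final interval simulates a flood
print([det.observe("10.0.0.7", r) for r in traffic])
# -> [False, False, False, False, False, True]
```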
50. High performance HITA based Binary Edward Curve Crypto processor for FPGA platforms.
- Author
-
Kalaiarasi, M., Venkatasubramani, V.R., Manikandan, M.S.K., and Rajaram, S.
- Subjects
- *
ELLIPTIC curve cryptography , *PARALLEL algorithms , *FIELD programmable gate arrays , *ARCHITECTURAL design , *RESILIENT design - Abstract
In embedded and resource-constrained environments, Elliptic Curve Cryptography (ECC) has been noted as an efficient and suitable methodology for achieving information security via public-key cryptography. However, a drawback of ECC is the lack of a unified point-operation formula, which makes it prone to side-channel attacks; ECC also does not satisfy the completeness property, so the addition formula is not defined for all pairs of input points. The Edwards curve, with its unified addition law and completeness property, proved to be the answer to the aforementioned flaws. Achieving high throughput while maintaining low resource usage is a key issue for ECC hardware implementations in many applications. This paper presents the implementation of a Binary Edwards Curve (BEC) crypto processor over GF(2^233) for FPGA platforms. The architecture is modified to perform scalar multiplication in parallel using two hybrid Karatsuba field multipliers. Field inversion, one of the most expensive operations during reconversion, is also performed in parallel using an efficient Hex Itoh-Tsujii inversion algorithm (HITA). The hardware resources are shared between the point operations and the inversion. Exploiting parallelism in the point and inversion operations reduces the clock cycles consumed, and the resulting architecture is more efficient in terms of throughput over area. The design takes 0.038 ms on Xilinx Virtex-4 and 0.031 ms on Virtex-7 FPGA platforms to perform a 233-bit point multiplication. It is 73.57%, 13.71%, 14.76%, and 48.76% more efficient than existing BEC scalar multiplication designs. This proposed scalable, side-channel-attack-resilient design outperforms existing techniques with respect to throughput over area. • Parallelism in the BEC is achieved with two modular hybrid Karatsuba multipliers. • Inversion during reconversion is done with the new Hex Itoh-Tsujii inversion algorithm (HITA). • The point operations and HITA share the same resources with reduced clock cycles. • The architecture is designed for FPGA platforms, yielding high throughput and area-time performance. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
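The field arithmetic behind this entry can be made concrete in miniature. An Itoh-Tsujii-style inverter computes the field inverse x^(-1) = x^(2^m - 2) in GF(2^m), reaching that power with far fewer multiplications via an addition chain on m - 1; the paper's Hex variant (HITA) further restructures this chain. The sketch below shows the underlying algebra in a toy field, GF(2^8) with the AES polynomial, using plain square-and-multiply rather than the paper's HITA chain, and nothing like the GF(2^233) hardware design.

```python
# Toy GF(2^m) arithmetic showing what an Itoh-Tsujii-style inverter computes:
# the inverse x^(2^m - 2). Here m = 8 with the AES irreducible polynomial
# x^8 + x^4 + x^3 + x + 1 (0x11B), chosen for illustration only; the paper's
# processor works in GF(2^233) with a much cheaper addition chain (HITA).

M, POLY = 8, 0x11B

def gf_mul(a, b):
    """Carry-less multiply of two field elements, reduced mod POLY."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> M:           # degree reached m: fold back with the polynomial
            a ^= POLY
    return r

def gf_inv(x):
    """x^(2^m - 2) by plain square-and-multiply (about 2(m-1) multiplies at
    worst); Itoh-Tsujii reaches the same power with O(log m) multiplies."""
    result, base, e = 1, x, (1 << M) - 2
    while e:
        if e & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        e >>= 1
    return result

x = 0x57
print(hex(gf_mul(x, gf_inv(x))))   # prints 0x1: x * x^(-1) = 1
```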