14,339 results on '"PARALLEL algorithms"'
Search Results
2. Scheduling on parallel dedicated machines with job rejection.
- Author
-
Mor, Baruch and Mosheiov, Gur
- Subjects
NP-hard problems ,DYNAMIC programming ,PRODUCTION scheduling ,TARDINESS ,PARALLEL algorithms - Abstract
We study scheduling problems on parallel dedicated machines. Thus, each job can be processed on one specific machine only. The option of job-rejection is considered, and the total permitted rejection cost of all the jobs is bounded. Six scheduling problems are solved: ( $ i $ i) minimising makespan, ( $ ii $ ii) minimising makespan with release-dates, ( $ iii $ iii) minimising total completion time, ( $ iv $ iv) minimising total weighted completion time, ( $ v $ v) minimising total load, and ( $ vi $ vi) minimising maximum tardiness. Pseudo-polynomial dynamic programming algorithms are introduced for all these NP-hard problems. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
3. An efficient and universal parallel algorithm for high-dimensional quantum dynamics in poly-atomic reactions.
- Author
-
Zhou, Yong, Lu, Yunpeng, Zhang, Zhaojun, and Zhang, Dong H.
- Subjects
- *
QUANTUM theory , *PARALLEL algorithms , *NUCLEAR reactions , *MESSAGE passing (Computer science) , *WAVE functions - Abstract
This study presents a parallel algorithm for high-dimensional quantum dynamics simulations in poly atomic reactions, integrating distributed- and shared-memory models. The distributions of the wave function and potential energy matrix across message passing interface processes are based on bundled radial and angular dimensions, with implementations featuring either two- or one-sided communication schemes. Using realistic parameters for the H + NH3 reaction, performance assessment reveals linear scalability, exceeding 90% efficiency with up to 600 processors. In addition, owing to the universal and concise structure, the algorithm demonstrates remarkable extensibility to diverse reaction systems, as demonstrated by successes with six-atom and four-atom reactions. This work establishes a robust foundation for high-dimensional dynamics studies, showcasing the algorithm's efficiency, scalability, and adaptability. The algorithm's potential as a valuable tool for unraveling quantum dynamics complexities is underscored, paving the way for future advancements in the field. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
4. Parallel Acyclic Joins: Optimal Algorithms and Cyclicity Separation.
- Author
-
Hu, Xiao and Tao, Yufei
- Subjects
ALGORITHMS ,DATABASES ,HYPERGRAPHS ,OPEN-ended questions ,PARALLEL algorithms - Abstract
We study equi-join computation in the massively parallel computation (MPC) model. Currently, a main open question under this topic is whether it is possible to design an algorithm that can process any join with load O(N polylog N/p
1/ρ* ) — measured in the number of words communicated per machine — where N is the total number of tuples in the input relations, ρ* is the join's fractional edge covering number, and p is the number of machines. We settle the question in the negative for the class of tuple-based algorithms (all the known MPC join algorithms fall in this class) by proving the existence of a join query with ρ* = 2 that requires a load of Ω (N/p1/3 ) to evaluate. Our lower bound provides solid evidence that the "AGM bound" alone is not sufficient for characterizing the hardness of join evaluation in MPC (a phenomenon that does not exist in RAM). The hard join instance identified in our argument is cyclic, which leaves the question of whether O(N polylog N/p1/ρ* ) is still possible for acyclic joins. We answer this question in the affirmative by showing that any acyclic join can be evaluated with load O(N / p1/ρ* ), which is asymptotically optimal (there are no polylogarithmic factors in our bound). The separation between cyclic and acyclic joins is yet another phenomenon that is absent in RAM. Our algorithm owes to the discovery of a new mathematical structure — we call "canonical edge cover" — of acyclic hypergraphs, which has numerous non-trivial properties and makes an elegant addition to database theory. [ABSTRACT FROM AUTHOR]- Published
- 2024
- Full Text
- View/download PDF
5. DeepQMC: An open-source software suite for variational optimization of deep-learning molecular wave functions.
- Author
-
Schätzle, Z., Szabó, P. B., Mezera, M., Hermann, J., and Noé, F.
- Subjects
- *
WAVE functions , *SCHRODINGER equation , *LIBRARY software , *COMPUTATIONAL chemistry , *QUANTUM chemistry , *PARALLEL algorithms , *LEARNING communities - Abstract
Computing accurate yet efficient approximations to the solutions of the electronic Schrödinger equation has been a paramount challenge of computational chemistry for decades. Quantum Monte Carlo methods are a promising avenue of development as their core algorithm exhibits a number of favorable properties: it is highly parallel and scales favorably with the considered system size, with an accuracy that is limited only by the choice of the wave function Ansatz. The recently introduced machine-learned parametrizations of quantum Monte Carlo Ansätze rely on the efficiency of neural networks as universal function approximators to achieve state of the art accuracy on a variety of molecular systems. With interest in the field growing rapidly, there is a clear need for easy to use, modular, and extendable software libraries facilitating the development and adoption of this new class of methods. In this contribution, the DeepQMC program package is introduced, in an attempt to provide a common framework for future investigations by unifying many of the currently available deep-learning quantum Monte Carlo architectures. Furthermore, the manuscript provides a brief introduction to the methodology of variational quantum Monte Carlo in real space, highlights some technical challenges of optimizing neural network wave functions, and presents example black-box applications of the program package. We thereby intend to make this novel field accessible to a broader class of practitioners from both the quantum chemistry and the machine learning communities. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
6. GRASP algorithms for the unrelated parallel machines scheduling problem with additional resources during processing and setups.
- Author
-
Lopez-Esteve, Axel, Perea, Federico, and Yepes-Borrero, Juan C.
- Subjects
DISTRIBUTION (Probability theory) ,MACHINERY ,SCHEDULING ,SETUP time ,SUPPLEMENTARY employment ,PARALLEL algorithms - Abstract
This paper addresses an unrelated parallel machines scheduling problem with the need of additional resources during the processing of the jobs, as well as during the setups that machines need between the processing of any two jobs. This problem is highly complex, and therefore in this paper we propose several constructive heuristics to solve it. To improve the performance of these heuristics, we propose several variations, including randomisation with different probability distributions and a local search phase, having this way GRASP algorithms. The results of extensive experiments over randomly generated instances show several findings on the different parameters that characterise our constructive algorithms. In particular, we highlight the fact that non-uniform probability distributions might be advisable for choosing elements of a restricted candidate list in GRASP algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
7. Investigating heterogeneity in IRTree models for multiple response processes with score‐based partitioning.
- Author
-
Debelak, Rudolf, Meiser, Thorsten, and Gernand, Alicia
- Subjects
- *
ITEM response theory , *FALSE positive error , *RECURSIVE partitioning , *PARALLEL algorithms , *ERROR rates - Abstract
Item response tree (IRTree) models form a family of psychometric models that allow researchers to control for multiple response processes, such as different sorts of response styles, in the measurement of latent traits. While IRTree models can capture quantitative individual differences in both the latent traits of interest and the use of response categories, they maintain the basic assumption that the nature and weighting of latent response processes are homogeneous across the entire population of respondents. In the present research, we therefore propose a novel approach for detecting heterogeneity in the parameters of IRTree models across subgroups that engage in different response behavior. The approach uses score‐based tests to reveal violations of parameter heterogeneity along extraneous person covariates, and it can be employed as a model‐based partitioning algorithm to identify sources of differences in the strength of trait‐based responding or other response processes. Simulation studies demonstrate generally accurate Type I error rates and sufficient power for metric, ordinal, and categorical person covariates and for different types of test statistics, with the potential to differentiate between different types of parameter heterogeneity. An empirical application illustrates the use of score‐based partitioning in the analysis of latent response processes with real data. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
8. Sequential and Parallel Algorithms to Compute Turbulent Coherent Structures.
- Author
-
Gandía-Barberá, Sergio, Cremades, Andres, Vinuesa, Ricardo, Hoyas, Sergio, and Pérez-Quiles, María Jezabel
- Subjects
- *
COHERENT structures , *PARALLEL algorithms , *TURBULENCE , *TURBULENT flow , *PARALLEL programming - Abstract
The behavior of turbulent flows remains a significant unsolved problem in physics. Recently, a large quantity of effort has been directed toward understanding the non-linear interactions of the different flow structures in order to address this challenge. In this paper, different implementations of one exact method for identifying these structures are analyzed. This includes two sequential algorithms and a parallelizable one, developed to handle large-scale data efficiently. The new parallel algorithm offers significant advantages in handling the computational demands of large simulations, providing a more scalable solution for future research. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
9. Improved parallel finite element methods for the stationary Navier–Stokes problem.
- Author
-
Du, Guangzhi and Zuo, Liyun
- Subjects
- *
FINITE element method , *PARALLEL algorithms , *NUMERICAL analysis , *STOKES equations , *EQUATIONS , *ALGORITHMS - Abstract
In this study, two improved parallel finite element algorithms based on two-grid strategies are developed to approximate the stationary Navier–Stokes equations. Algorithms are devised to improve the existing local and parallel finite element methods to arrive at L 2 optimal velocity approximation by considering one further coarse grid correction. Rigorously numerical analysis is established, and numerical experiments are reported to support the theoretical findings. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
10. A Novel Approach for Parallel Document Clustering Using an Enhanced Parallel WAND Algorithm.
- Author
-
Ali, Wael, Khamis, Soheir, and Zakaria, Wael
- Subjects
SEARCH algorithms ,DATA structures ,INFORMATION retrieval ,PARALLEL algorithms ,SEARCH engines ,DOCUMENT clustering - Abstract
Document clustering is crucial for managing a large textual data available on the Internet, even though it is computationally costly to cluster high dimensional and large datasets. To tackle these obstacles, a widely used information retrieval method called the weighted AND (WAND) algorithm is utilized as an essential stage in a document clustering process to make it more effective. WAND uses an efficient data structure known as an inverted index to determine document scores and ranks, allowing it to extract the topK documents that are most similar to a given query. Owing to its effectiveness, several versions of parallel algorithms have been proposed to enhance it. However, challenges in document clustering increase since it requires retrieving a higher number of topK and processing longer queries. So, in this paper, an enhanced parallel version of the WAND algorithm (PWAND) is proposed. PWAND divides the inverted index into partitions, each is assigned a specific percentage of topK according to its relevance to the given query. Furthermore, a novel PWAND-based Parallel PArtitional Clustering (PWPPAC) approach that combines the parallel execution of clustering stages with PWAND is proposed. Based on the practical results across a variety of datasets, PWAND is a promising method since it produces results that are extremely match to those obtained by WAND, but with a significant speedup, where the maximum recorded speedup is 85.7x on AG-News dataset. Moreover, the results show that employing the PWAND algorithm in the clustering process makes it more efficient, while maintaining accuracy and quality of clustering. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
11. Hybrid Parallelism Image Encryption Algorithm Based on a Modified Blowfish Algorithm.
- Author
-
Abdulkadhim, Ahmed Abd Ali and Lakizadeh, Amir
- Subjects
IMAGE encryption ,DIGITAL technology ,MULTICORE processors ,MATRICES (Mathematics) ,DIGITAL communications ,PARALLEL algorithms - Abstract
Image encryption is a vital area in our current digital age, playing an important role in protecting information and improving data quality. Encryption protects privacy and enhances security in various applications, such as communications, cloud storage, and digital transmission. As the size and complexity of images increase, the importance of using parallel methods in image processing and encryption becomes more prominent. These methods allow exploiting the multiple processing capabilities available in modern devices, such as multi-core processors, which enhance efficiency and speed in processing large data sets.in this paper, we present a modified Blowfish algorithm and a hybrid parallel approach to overcome the known weaknesses of the traditional Blowfish algorithm. First, we use a Pascal matrix to permuting the image pixels, and the results of this operation are used as input to the modified version of the Blowfish algorithm. In this version, the P matrix is modified using a hybrid chaos approach, which improves the encryption process. Furthermore, this encryption is implemented using a hybrid parallel processing approach, which enhances the performance and efficiency of data processing. Tests and Result are shown using test images (256*256) from the USC-SIPI and CVG-UGR databases. were quicker outcomes and more secure encryption. In addition, the average execution time of encryption and decryption reached (0.00618ms, 0.003292 ms) the information entropy screening rate reached 7.99735, which is close to the optimal ratio of 8. and NPCR and UACI reached (99.639, 33.42825). The algorithm has achieved a high level of security. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
12. Parallel linearized ADMM with application to multichannel image restoration and reconstruction.
- Author
-
He, Chuan, Peng, Wenshen, Wang, Junwei, Feng, Xiaowei, and Jiao, Licheng
- Subjects
- *
IMAGE reconstruction , *NONSMOOTH optimization , *PARALLEL algorithms , *INVERSE problems , *INPAINTING - Abstract
Many large-scale regularized inverse problems in imaging such as image restoration and reconstruction can be modeled as a generic objective function involves sum of nonsmooth but proximable terms, which are usually linear-operator-coupled. For the solution of these problems, a parallel linearized alternating direction method of multipliers (PLADMM) is proposed in this paper. At each step of the proposed algorithm, the proximity operators of the nondifferential terms are called individually. This leads to a highly parallel algorithm structure, where most sub-steps can be simultaneously solved. Profiting from the linearization step, the linear inverse operation is excluded. The convergence property of the proposed method is analyzed. The image deblurring, inpainting, and pMRI reconstruction experiments show that the proposed method has vast applicable vistas. Compared with the state-of-the-art methods, such as PADMM [21], FTVD-v4 [22], PPDS [33], FUSL [8], LADM [36], and ALADMM [27], it gains competitive results both in terms of quantitative indicators, such as PSNR or SSIM, and in terms of visual impression. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
13. All-to-all reconfigurability with sparse and higher-order Ising machines.
- Author
-
Nikhar, Srijan, Kannan, Sidharth, Aadit, Navid Anjum, Chowdhury, Shuvro, and Camsari, Kerem Y.
- Subjects
GATE array circuits ,GIBBS sampling ,GREEDY algorithms ,PARALLEL algorithms ,TEMPERING ,GRAPHICS processing units - Abstract
Domain-specific hardware to solve computationally hard optimization problems has generated tremendous excitement. Here, we evaluate probabilistic bit (p-bit) based Ising Machines (IM) on the 3-Regular 3-Exclusive OR Satisfiability (3R3X), as a representative hard optimization problem. We first introduce a multiplexed architecture that emulates all-to-all network functionality while maintaining highly parallelized chromatic Gibbs sampling. We implement this architecture in a single Field-Programmable Gate Array (FPGA) and show that running the adaptive parallel tempering algorithm demonstrates competitive algorithmic and prefactor advantages over alternative IMs by D-Wave, Toshiba, and Fujitsu. We also implement higher-order interactions that lead to better prefactors without changing algorithmic scaling for the XORSAT problem. Even though FPGA implementations of p-bits are still not quite as fast as the best possible greedy algorithms accelerated on Graphics Processing Units (GPU), scaled magnetic versions of p-bit IMs could lead to orders of magnitude improvements over the state of the art for generic optimization. Specialized hardware for hard optimization is gaining traction. Here, the authors introduce a sparse, multiplexed, and reconfigurable p-bit Ising Machine on Field-Programmable Gate Arrays, using adaptive parallel tempering and higher-order interactions to achieve competitive performance on the 3-Regular 3-XORSAT problem. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
14. Field Programmable Gate Array (FPGA) Implementation of Parallel Jacobi for Eigen-Decomposition in Direction of Arrival (DOA) Estimation Algorithm.
- Author
-
Zhou, Shuang and Zhou, Li
- Subjects
- *
FIELD programmable gate arrays , *MULTIPLE Signal Classification , *COVARIANCE matrices , *PARALLEL processing , *ELECTRONIC data processing , *PARALLEL algorithms - Abstract
The eigen-decomposition of a covariance matrix is a key step in the Direction of Arrival (DOA) estimation algorithms such as subspace classes. Eigen-decomposition using the parallel Jacobi algorithm implemented on FPGA offers excellent parallelism and real-time performance. Addressing the high complexity and resource consumption of the traditional parallel Jacobi algorithm implemented on FPGA, this study proposes an improved FPGA-based parallel Jacobi algorithm for eigen-decomposition. By analyzing the relationship between angle calculation and rotation during the Jacobi algorithm decomposition process, leveraging parallelism in the data processing, and based on the concepts of time-division multiplexing and parallel partition processing, this approach effectively reduces FPGA resource consumption. The improved parallel Jacobi algorithm is then applied to the classic DOA estimation algorithm, the MUSIC algorithm, and implemented on Xilinx's Zynq FPGA. Experimental results demonstrate that this parallel approach can reduce resource consumption by approximately 75% compared to the traditional method but introduces little additional time consumption. The proposed method in this paper will solve the problem of great hardware consumption of eigen-decomposition based on FPGA in DOA applications. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
15. Fast CU Partition Decision Algorithm Based on Bayesian and Texture Features.
- Author
-
Tian, Erlin, Yang, Yifan, and Zhang, Qiuwen
- Subjects
BLOCK codes ,VIDEO coding ,INTERNET speed ,VIDEO processing ,VIDEO compression ,PARALLEL algorithms - Abstract
As internet speeds increase and user demands for video quality grow, video coding standards continue to evolve. H.266/Versatile Video Coding (VVC), as the new generation of video coding standards, further improves compression efficiency but also brings higher computational complexity. Despite the significant advancements VVC has made in compression ratio and video quality, the introduction of new coding techniques and complex coding unit (CU) partitioning methods have also led to increased encoding complexity. This complexity not only extends encoding time but also increases hardware resource consumption, limiting the application of VVC in real-time video processing and low-power devices.To alleviate the encoding complexity of VVC, this paper puts forward a Bayesian and texture-feature-based fast splitting algorithm for coding intraframe bloc of VVC, which aims to reduce unnecessary computational steps, enhance encoding efficiency, and maintain video quality as much as possible. In the stage of rapid coding, the video frames are coded by the original VVC test model (VTM), and Joint Rough Mode Decision (JRMD) evaluation cost is used to update the parameter in the Bayesian algorithm to come and set the two thresholds to judge whether the current coding block continues to be split or not. Then, for coding blocks larger than those satisfying the above threshold conditions, the predominant direction of the texture within the coding block is ascertained by calculating the standard deviations along both the horizontal and vertical axes so as to skip some unnecessary splits in the current coding block patterns. The findings from our experiments demonstrate that our proposed approach improves the encoding rate by 1.40% on average, and the execution time of the encoder has been reduced by 49.50%. The overall algorithm has optimized the VVC intraframe coding technology and reduced the coding complexity of VVC. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
16. Parallel adaptive large neighborhood search based on spark to solve VRPTW.
- Author
-
Liu, Songzuo, Sun, Jian, Duan, Xiaohong, and Liu, Guofang
- Subjects
- *
EVOLUTIONARY algorithms , *SEARCH algorithms , *ARITHMETIC , *NEIGHBORHOODS , *SCHEDULING , *PARALLEL algorithms , *SIMULATED annealing - Abstract
Aiming at the multi-objective vehicle path planning problem with time windows (VRPTW), a Spark-based parallel Adaptive Large Neighborhood Search algorithm (Spark-ALNS) is proposed to solve it. The main design of the 4-point strategy: (1) Design a new simulated annealing algorithm cooling strategy to achieve a better jump out of the local optimal solution. (2) Adopt CW initialization to accelerate the convergence speed. (3) Use three destruction operators and three repair operators to implement local path optimization. (4) A new parallel strategy is proposed to improve the algorithm's accuracy and reduce the running time. To illustrate the algorithm's effectiveness, the arithmetic example in Solomon is used as an example. The experimental results show that the proposed Spark-ALNS can find better solutions, get the known optimal solutions for 41 out of 56 instances, and find new optimal solutions for 31 algorithms, which outperforms other evolutionary algorithms. The runtime is 3–5 times better than other parallel algorithms and is able to solve VRPTW effectively. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
17. Photovoltaic Maximum Power Point Tracking Technology Based on Improved Multi‐Subgroup Parallel Optimization Algorithm with Shuffled Frog Leaping Algorithm.
- Author
-
Guo, Zhen, Ye, Ming‐Hao, Chen, Shuang, Nai, Ji‐Qiu, Tong, Di, and Wang, Shu
- Subjects
- *
OPTIMIZATION algorithms , *PHOTOVOLTAIC power systems , *ENERGY development , *CLEAN energy , *PARALLEL algorithms , *PHOTOVOLTAIC power generation - Abstract
To enhance the power generation efficiency of the photovoltaic system, it is necessary to ensure that it can operate stably at the global maximum power point (MPP). This paper presents a study on the maximum power point tracking(MPPT) technology of photovoltaic system and a new MPPT technique based on the improved multi‐subgroup parallel optimization algorithm with shuffled frog leaping algorithm (IMSPO‐SFLA) is proposed, which combines the improved multi‐subgroup parallel optimization algorithm with the shuffled frog leaping algorithm. Moreover, the developed MPPT technique improves the tracking accuracy of the MPP, increased the speed of MPP tracking, the problem of the traditional frog leaping algorithm being prone to falling into a local optimum is successfully overcame by it. In order to verify the effectiveness of the method, a simulation model is established in MATLAB in this paper. The Simulation results show that the method proposed in this paper can make the photovoltaic power generation system be operated more efficiently, which contributes to the development of the renewable energy field and promotes the progress of clean energy technology. Compared with the traditional algorithm, it has obvious advantages in realizing the MPPT of PV power generation system. © 2024 Institute of Electrical Engineers of Japan and Wiley Periodicals LLC. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
18. Niching Global Optimisation: Systematic Literature Review.
- Author
-
Matanga, Yves, Owolawi, Pius, Du, Chunling, and van Wyk, Etienne
- Subjects
- *
OPTIMIZATION algorithms , *GLOBAL optimization , *METAHEURISTIC algorithms , *CONFERENCE papers , *DATABASES , *PARALLEL algorithms - Abstract
Niching in global optimisation refers to a set of techniques designed to identify multiple optimal solutions within a nonlinear, multimodal landscape. These algorithms enhance the exploratory capabilities of conventional metaheuristics by maintaining diversity and supporting coexisting subpopulations across a search space, thereby allowing a more deterministic approach to the true global optimum. Niching algorithms can be categorised into three primary subfamilies: sequential or temporal niching, parallel or spatial niching, and hybrid models which integrate various niching subparadigms. This research paper aims to explore the effectiveness and limitations of different niching algorithms by providing a systematic literature review of the theoretical frameworks within these subfamilies. Eleven major niching native subparadigms have been identified: fitness sharing, crowding, clearing, speciation, restricted tournament selection, clustering, multiobjectivisation, embedded hybrid methods, ensemble hybrid methods, and other hybrid approaches. This study offers a detailed examination of each paradigm's theoretical foundation, including template algorithmic layouts, and delineates the unique elements of each approach. Research contributions from the inception of niching to 2024 have been aggregated from the SCOPUS database and systematically classified. Data aggregation included journal articles, conference papers, review papers, and research reports published in English only following the PRISMA framework. Application papers with novel theoretical ideas were also taken into account. In all, 203 research works were retained under the inclusion and exclusion criteria. This study concludes with overarching high-level recommendations for future research in modern niching optimisation, emphasising the development of space and time-scalable methods to enhance the adaptability and efficiency of optimisation algorithms in diverse, increasingly multivariable multimodal problems. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
19. On the adaptive solution of phase‐field problems with A‐stable explicit last‐stage diagonally implicit Runge–Kutta (ELDIRK) methods.
- Author
-
Westermann, Hendrik and Mahnken, Rolf
- Subjects
- *
TIME integration scheme , *INITIAL value problems , *SLAUGHTERING , *ALGORITHMS , *PARALLEL algorithms - Abstract
Runge–Kutta algorithms with adaptive step size control provide reliable tools for the solution of initial value problems with diagonally implicit Runge‐Kutta (DIRK) methods as the most common approach. In this contribution, the new low‐order explicit last‐stage diagonally implicit Runge–Kutta (ELDIRK) methods are investigated, combining implicit schemes with an additional explicit evaluation as an explicit last stage. This results in Butcher tableaus with two solutions of different convergence orders suitable for embedded methods, where the higher‐order solution is achieved by additional explicit evaluations. Thus, the iterative solution of non‐linear systems is omitted for the additional stage, presenting a major reduction in computational cost for the determination of a local error estimate. The key contribution is the application of the novel Butcher tableaux to phase‐field problems, solved with the finite‐element method, leading to substantial numerical investigations with an efficient approach for DIRK schemes. The most important aspects are the investigations of stability properties which lead to the novel class of A‐stable ELDIRK methods. In accordance, the study of the convergence orders is presented. A local error estimator is presented, such that adaptive step size control for the new low‐order embedded schemes based on an empirical approach for error estimation is achieved. A suitable parallel algorithm is presented with conclusive two‐dimensional phase‐field simulations based on a Kobayashi‐Warren‐Carter model. The higher‐order convergence suggested by the novel schemes is confirmed, and their effective results are demonstrated, resulting in a valuable semi‐explicit addition to the family of Runge–Kutta time integration schemes. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
20. 基于子网融合的多智能体系统 自组网连通性恢复方法.
- Author
-
何杏宇, 余萍萍, and 杨桂松
- Subjects
- *
MULTIAGENT systems , *REINFORCEMENT (Psychology) , *PARALLEL algorithms , *ENERGY consumption , *PHYSICAL mobility , *REINFORCEMENT learning - Abstract
It is challenging to quickly restore full connectivity in a damaged multi-agent self-organizing network while maintaining the residual connectivity structure. Therefore, this paper proposed a connectivity restoring method based on subnet fusion for self-organized networks in multi-agent systems. Firstly, the method designed a subnet partition algorithm based on network fault detection, to identify faulty nodes and subnet fragmentation in the system. Secondly, the method deployed a leaderfollower mobility model within each subnet to maintain the residual network connectivity. Finally, the method designed a reinforcement learning-based subnet fusion algorithm for leader election, where elected leaders periodically according to a reward function related to mobility distance and energy consumption, being responsible for guiding their followers to move for fusion between subnets. The experimental results show that this method reduces average restoration time by 11.3% and decreases energy consumption by 10.58%, demonstrating its advantages in efficiency and energy usage. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
21. An efficient partial charging and data gathering strategy using multiple mobile vehicles in wireless rechargeable sensor networks.
- Author
-
Yadav, Chandra Bhushan Kumar and Dash, Dinesh
- Subjects
- *
METAHEURISTIC algorithms , *WIRELESS sensor networks , *ENERGY transfer , *PARALLEL algorithms , *ENERGY consumption - Abstract
Wireless rechargeable sensor networks (WRSNs) are a popular and promising field of research that can be used in many different fields. The battery life and storage space of the sensors are limited, due to this it is hard to keep the network up for longer. Combining wireless energy transfer and wireless data gathering devices on a Mobile Vehicle (MV) is one solution to this challenge. The objective of this work is to reduce the number of dead sensors and packet delivery delay. We proposed the circle-covering based algorithm to determine sojourn point based on energy consumption rates and the location of sensors. An Improved Grey Wolf Optimization (IGWO) meta-heuristic algorithm partitions the network into the minimum number of regions, assigns a MV to each region, and ensures balanced sub-tour lengths among the regions. A novel weight function is proposed to determine the order of the sojourn points. To accomplish our objectives, we propose a heuristic partial charging and data gathering strategy to determine the sojourn times of the MVs at the sojourn points. The performance of our proposed PCDGS scheme is compared with IMPSS, PMCDC, MOAC and JERDC schemes. The simulation results show that our proposed PCDGS scheme outperforms the others. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
22. GPU-accelerated relaxed graph pattern matching algorithms.
- Author
-
Benachour, Amira, Yahiaoui, Saïd, Bouhenni, Sarra, Kheddouci, Hamamache, and Nouali-Taboudjemat, Nadia
- Subjects
- *
SOCIAL network analysis , *ISOMORPHISM (Mathematics) , *PARALLEL processing , *ALGORITHMS , *GRAPH algorithms , *SCALABILITY , *PARALLEL algorithms - Abstract
Graph pattern matching is widely used in real-world applications, such as social network analysis. Since the traditional subgraph isomorphism is NP-complete and often too restrictive to catch sensible matches, relaxed graph pattern matching models are used. However, existing algorithms suffer from limited linear scalability and restricted degrees of parallelism. In this paper, we propose fast parallel algorithms, GPGS and GPDS, for graph simulation and dual simulation, respectively. They make most use of the GPU performance by adopting the edge-centric processing model. We perform parallel computations on the data graph edges to evaluate the matching constraints for each vertex allowing for fast and scalable algorithms. To the best of our knowledge, we present the first GPU-based algorithms for graph simulation and dual simulation. Extensive experiments on synthetic and real-world data graphs demonstrate that our algorithms significantly outperform existing methods, achieving up to 74.8 × acceleration for GPGS and up to 114.2 × acceleration for GPDS. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
23. Using the graph p-distance coloring algorithm for partitioning atoms of some fullerenes.
- Author
-
Mosawi, Hanifa and Tavakoli, Mostafa
- Subjects
- *
GRAPH coloring , *PARALLEL algorithms , *MOLECULAR graphs , *FULLERENES , *PLANAR graphs - Abstract
A molecular graph p-distance coloring is a partitioning of atoms in such a way that there are no path of distance less than or equal to p between the atoms of each set. A fullerene graph is a planar 3-connected cubic graph whose faces are pentagons and hexagons. In this paper we apply some existing coloring algorithms for partitioning atoms of fullerenes based on their distances from each other. We also use these algorithm for coloring faces of some fullerenes. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
24. Recovering nested structures in networks: an evaluation of hierarchical clustering techniques.
- Author
-
Gera, Imre and London, András
- Subjects
HIERARCHICAL clustering (Cluster analysis) ,DATA mining ,PARALLEL algorithms ,ALGORITHMS - Abstract
In this article, we present various algorithms to partition the nodes of a network into groups that show the property of nestedness. Since perfect nestedness is a rare phenomenon, we consider the task from a data mining perspective, and we search for groups having high-level of nestedness. We utilize both agglomerative and divisive hierarchical clustering procedures and compare them on several benchmark and real-life networks. Furthermore, we propose different metrics derived from the results of our algorithms. We show that average-linkage and complete-linkage clustering can recover the largest fully nested clusters, and that the cluster size-weighted mean nestedness was a more stable metric for measuring clustering performance. Our proposed algorithms allow us to create multiple resolution views of nestedness-based clustering of networks, extending the field of graph-based data mining. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
25. ERP 全景视频 VVC 帧内编码 CU 快速划分算法.
- Author
-
李强, 董阳, and 赵宇
- Subjects
PARALLEL algorithms ,LATITUDE ,ALGORITHMS ,VIDEO coding ,ENCODING ,VIDEOS ,WAVELET transforms - Abstract
Copyright of Journal of Chongqing University of Posts & Telecommunications (Natural Science Edition) is the property of Chongqing University of Posts & Telecommunications and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2024
- Full Text
- View/download PDF
26. Report on the exact methods for finding minimum-sized DFA.
- Author
-
Wieczorek, Wojciech, Strąk, Łukasz, and Nowakowski, Arkadiusz
- Subjects
MACHINE theory ,SEARCH algorithms ,PARALLEL algorithms ,MULTIPROCESSORS ,MATHEMATICAL models - Abstract
This paper presents four state-of-art methods for the finite-state automaton inference based on a sample of labeled strings. The first algorithm is Exbar, and the next three are mathematical models based on ASP, SAT and SMT theories. The potentiality of using multiprocessor computers in the context of automata inference was our research's primary goal. In a series of experiments, we showed that our parallelization of the exbar algorithm is the best choice when a multiprocessor system is available. Furthermore, we obtained a superlinear speedup for some of the prepared datasets, achieving almost a 5-fold speedup on the median, using 12 and 24 processes. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
27. Parallel Discrete Harmony Search Algorithm for the Graph Coloring Problem.
- Author
-
Chemaa, Sofiane, Kout, Akram, Djelloul, Halima, and Harrag, Nassir
- Subjects
GRAPH algorithms ,GRAPH coloring ,SEARCH algorithms ,PARALLEL algorithms ,NP-hard problems - Abstract
Graph coloring is an NP-hard combinatorial optimization problem with significant implications in both theoretical and practical contexts due to its complexity and extensive applicability. In this work, a novel approach is proposed to address the graph coloring problem through a parallel discrete Harmony Search algorithm, termed PDHSCol. By harnessing the robustness of the Harmony Search algorithm and integrating parallel processing, our algorithm enhances performance by concurrently generating and evaluating multiple solutions. Implemented in MATLAB, PDHSCol was evaluated using a variety of DIMACS benchmark instances. The experimental results demonstrate that the algorithm performs effectively and yields promising improvements over various other methods, highlighting its potential to deliver high-quality solutions. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
28. Fast CU decision method based on texture characteristics and decision tree for depth map intra-coding.
- Author
-
Si, Lina, Yan, Aohui, and Zhang, Qiuwen
- Subjects
- *
VIDEO coding , *PARALLEL algorithms , *DECISION trees , *SUPPLY & demand , *ALGORITHMS - Abstract
As the demand for higher quality 3D videos continues to grow, the 3D extensions of the Efficient Video Coding (3D-HEVC) standard are gradually unable to meet the needs of users. Versatile Video Coding Standard (H. 266/VVC), as an advanced video coding standard, adopts the nested Multi-Type Tree quadtree (QTMT) partitioning structure that current fast depth map coding unit (CU) partitioning methods cannot apply. Therefore, we have designed a fast intra-frame CU partitioning algorithm for VVC 3D video depth maps. Our proposed algorithm in this article consists of two steps, including two sub-algorithms. First, by analyzing the relationship between image entropy and variance and depth map CU division, we establish a bi-criterion decision algorithm to determine whether the texture complexity of the current CU is low enough to terminate its partitioning process. Then, for CUs that have been determined by the first algorithm to need further partitioning, we use a decision tree model based on Light Gradient Boosting Machine (LGBM) to predict which direction of Rate-Distortion Optimization (RDO) calculation can be skipped, which can avoid some unnecessary RDO calculations in a certain direction. The final experiment demonstrated the effect of the proposed algorithm, which can reduce 47.65% complexity of VVC 3D video intra-coding with negligible 0.23% Bjøntegaard Delta Bitrate (BDBR) increase, superior to other advanced methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
29. Reinforcement learning‐based composite suboptimal control for Markov jump singularly perturbed systems with unknown dynamics.
- Author
-
Li, Wenqian, Jia, Guolong, Wang, Yun, Su, Lei, and Shen, Hao
- Subjects
- *
MACHINE learning , *SYSTEM dynamics , *ALGORITHMS , *REINFORCEMENT learning , *PARALLEL algorithms - Abstract
In this article, a model‐free parallel reinforcement learning method is proposed to solve the suboptimal control problem for the Markov jump singularly perturbed systems. First, since fast and slow dynamics coexist in Markov jump singularly perturbed systems, it may lead to ill‐conditioned numerical problems during the controller design process. Therefore, the original system can be decomposed into independent subsystems at different time‐scales by employing the reduced order method. In addition, two model‐free parallel reinforcement learning algorithms are designed to obtain the optimal controllers for the fast and slow subsystems, respectively. Moreover, within the framework of reinforcement learning, the controllers of the Markov jump singularly perturbed systems can be acquired without the systems dynamics. Finally, a numerical example is introduced to prove the effectiveness of proposed algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
30. Nonlinear association between atherogenic index of plasma and chronic kidney disease: a nationwide cross-sectional study.
- Author
-
Wang, Bo, Jiang, Chunqi, Qu, Yinuo, Wang, Jun, Yan, Chuanzhu, and Zhang, Xin
- Subjects
- *
RECURSIVE partitioning , *CHRONIC kidney failure , *PARALLEL algorithms , *CURVE fitting , *INVERSE relationships (Mathematics) - Abstract
Background: The interplay between metabolic disorders and chronic kidney disease (CKD) has been well-documented. However, the connection between CKD and atherogenic index of plasma (AIP) remains understudied. This research delves into the correlation between these two factors, aiming to shed new light on their potential association. Methods: The relationship between AIP and CKD was evaluated using a weighted multivariate logistic regression model, and the curvilinear relationship between AIP and CKD was explored through smooth curve fitting. We engaged a recursive partitioning algorithm in conjunction with a two-stage linear regression model to precisely determine the inflection point. By conducting stratified analyses, the heterogeneity within subpopulations was explored. Results: In the regression model that accounted for all covariates, ORs (95% CI) for the association between CKD and AIP were 1.12 (0.91, 1.36), indicating no significant association between AIP and CKD. However, sensitivity analyses suggested that the relationship between them may be non-linear. Smooth curve analysis confirmed the non-linear relationship between AIP and CKD, identifying an inflection point at -0.55. Below this threshold, AIP exhibited a significant inverse correlation with CKD. Conversely, above this threshold, a pronounced positive correlation was detected. Stratified analyses elucidated that a non-linear association between AIP and CKD was observed among female participants and those aged 50 and above. Conclusion: We found a curvilinear relationship between chronic kidney disease and atherogenic index of plasma. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
31. A simple two-stage carrier-phase estimation algorithm for 32-QAM coherent optical communication systems.
- Author
-
Peng, Min, Wang, Xiangqing, Yang, Xiaokun, Wang, Dongfei, Sharma, Abhishek, and Kumari, Meet
- Subjects
QUADRATURE amplitude modulation ,BIT error rate ,SIGNAL-to-noise ratio ,PARALLEL algorithms ,AMPLITUDE modulation - Abstract
The combination of high-order modulation formats and linewidth-tolerant carrier phase estimation (CPE) can effectively improve spectrum efficiency and relax the limitation of laser linewidth. This paper presents a simple two-stage CPE algorithm for polarization-multiplexed (PM) 32-quadrature amplitude modulation (32-QAM) coherent optical communication systems. The algorithm uses an enhanced QPSK partitioning algorithm combined with a simplified 4th power CPE method for coarse estimation in the initial stage and maximum likelihood (ML) detection in the subsequent fine stage. The CPE algorithm significantly increases the number of symbols used in the first stage of coarse estimation. This results in a significant increase in the stability and reliability of the phase estimation, and the CPE algorithm significantly reduces the computational complexity. The optimal parameters, phase estimation performance, and system performance of the algorithm were investigated by building a 22 Gbaud PM 32-QAM coherent system simulation platform and a 5 Gbaud PM 32-QAM coherent system experimental platform. The results show that the proposed two-stage CPE algorithm has a stronger linewidth tolerance difference than the conventional QPSK, and the two-stage CPE algorithm with an optimal block length of 105 performs comparable to blind phase search (BPS). The optical signal noise Ratio (OSNR) value is 21.2 dB and the bit error rate (BER) is 1.8 χ 10
-3 for the optimal block length of 105. The receiving-end DSP unit with a flexible scheme and good communication performance will have potential applications in adaptive elastic optical networks. [ABSTRACT FROM AUTHOR]- Published
- 2024
- Full Text
- View/download PDF
32. Short-Term Electrical Load Forecasting Based on IDBO-PTCN-GRU Model.
- Author
-
Gong, Renxi, Wei, Zhihuan, Qin, Yan, Liu, Tao, and Xu, Jiawei
- Subjects
- *
OPTIMIZATION algorithms , *ELECTRICAL load , *STANDARD deviations , *LATIN hypercube sampling , *DUNG beetles , *PARALLEL algorithms - Abstract
Accurate electrical load forecasting is crucial for the stable operation of power systems. However, existing forecasting models face limitations when handling multidimensional features and feature interactions. Additionally, traditional metaheuristic algorithms tend to become trapped in local optima during the optimization process, negatively impacting model performance and prediction accuracy. To address these challenges, this paper proposes a short-term electrical load forecasting method based on a parallel Temporal Convolutional Network–Gated Recurrent Unit (PTCN-GRU) model, optimized by an improved Dung Beetle Optimization algorithm (IDBO). This method employs a parallel TCN structure, using TCNs with different kernel sizes to extract and integrate multi-scale temporal features, thereby overcoming the limitations of traditional TCNs in processing multidimensional input data. Furthermore, this paper enhances the optimization performance and global search capability of the traditional Dung Beetle Optimization algorithm through several key improvements. Firstly, Latin hypercube sampling is introduced to increase the diversity of the initial population. Next, the Golden Sine Algorithm is integrated to refine the search behavior. Finally, a Cauchy–Gaussian mutation strategy is incorporated in the later stages of iteration to further strengthen the global search capability. Extensive experimental results demonstrate that the proposed IDBO-PTCN-GRU model significantly outperforms comparison models across all evaluation metrics. Specifically, the mean absolute error (MAE), mean absolute percentage error (MAPE), and root mean square error (RMSE) were reduced by 15.01%, 14.44%, and 14.42%, respectively, while the coefficient of determination (R2) increased by 2.13%. This research provides a novel approach to enhancing the accuracy of electrical load forecasting. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
33. Optimizing sparse general matrix–matrix multiplication for DCUs.
- Author
-
Guo, Hengliang, Wang, Haolei, Chen, Wanting, Zhang, Congxiang, Han, Yubo, Zhu, Shengguang, Zhang, Dujuan, Guo, Yang, Shang, Jiandong, Wan, Tao, Li, Qingyang, and Wu, Gang
- Subjects
- *
SPARSE matrices , *PARALLEL algorithms , *MULTIPLICATION , *ALGORITHMS , *MATRICES (Mathematics) - Abstract
Sparse general matrix–matrix multiplication (SpGEMM) is a crucial and complex computational task in many practical applications. Improving the performance of SpGEMM on SIMT processors like modern GPUs is challenging due to the unpredictable sparsity of sparse matrices. Although existing GPU solutions have made progress in improving performance through advanced algorithm design, they ignore some optimizations related to specific processor architectures. This can result in a partially inefficient implementation of their algorithms. This paper focuses on optimizing four inefficient parts of the NSparse algorithm on DCU (a GPU-like accelerator). The optimizations include: 1) setting parameters to improve the load balance of the second matrix by extracting maximum row information at runtime; 2) reducing overhead of binning operations by making full use of registers and shared memory effectively; 3) improving numerical SpGEMM performance by adjusting its calculation mode; and 4) enhancing global load balance through finer-grained grouping and kernel configurations. Experiment results demonstrate that when compared to five state-of-the-art SpGEMM algorithms (bhSparse, KokkosKernels, NSparse, rocSparse, and spECK), our optimized method achieves an average of 7.99x (up to 18.2x), 8.01x (up to 20.83x), 2.37x (up to 6.16x), 1.82x (up to 4.20x), and 1.63x (up to 5.01x) speedups on 29 sparse matrices with different sparse structures, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
34. Iterative algorithms for partitioned neural network approximation to partial differential equations.
- Author
-
Yang, Hee Jun and Kim, Hyea Hyun
- Subjects
- *
ARTIFICIAL neural networks , *PARTIAL differential equations , *PARALLEL programming , *PARALLEL algorithms , *ALGORITHMS - Abstract
To enhance solution accuracy and training efficiency in neural network approximation to partial differential equations, partitioned neural networks can be used as a solution surrogate instead of a single large and deep neural network defined on the whole problem domain. In such a partitioned neural network approach, suitable interface conditions or subdomain boundary conditions are combined to obtain a convergent approximate solution. However, there has been no rigorous study on the convergence and parallel computing enhancement on the partitioned neural network approach. In this paper, iterative algorithms are proposed to enhance parallel computation performance in the partitioned neural network approximation. Our iterative algorithms are based on classical additive Schwarz domain decomposition methods. For the proposed iterative algorithms, their convergence is analyzed under an error assumption on the local and coarse neural network solutions. Numerical results are also included to show the performance of the proposed iterative algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
35. A parallel stabilized finite element method for the Navier-Stokes problem.
- Author
-
Han, Jing, Du, Guangzhi, and Mi, Shilin
- Subjects
- *
FINITE element method , *PARALLEL algorithms , *NAVIER-Stokes equations , *ALGORITHMS - Abstract
In this article, we mainly propose and analyze a parallel stabilized finite element algorithm based upon two-grid discretization for the Navier-Stokes problem. The lowest equal-order finite element pairs are considered for the finite element discretization and a stabilized term based on two local Gauss integrations is introduced to circumvent the discrete inf-sup condition. The main idea is to utilize a partition of unity to assemble weighted local corrections of solutions on sub-domains. A further coarse grid correction is carried out to derive the optimal error estimates both in H 1 norm and L 2 norm. Theoretical results are rigorously established and some numerical experiments are reported to verify the theoretical results. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
36. Study on Large-Scale Urban Water Distribution Network Computation Method Based on a GPU Framework.
- Author
-
Zhang, Rongbin, Hou, Jingming, Li, Jingsi, Wang, Tian, and Imran, Muhammad
- Subjects
WATER management ,WATER distribution ,MUNICIPAL water supply ,MATRIX inversion ,WATER supply ,PARALLEL algorithms ,GRAPHICS processing units - Abstract
Large-scale urban water distribution network simulation plays a critical role in the construction, monitoring, and maintenance of urban water distribution systems. However, during the simulation process, matrix inversion calculations generate a large amount of computational data and consume significant amounts of time, posing challenges for practical applications. To address this issue, this paper proposes a parallel gradient calculation algorithm based on GPU hardware and the CUDA Toolkit library and compares it with the EPANET model and a model based on CPU hardware and the Armadillo library. The results show that the GPU-based model not only achieves a precision level very close to the EPANET model, reaching 99% accuracy, but also significantly outperforms the CPU-based model. Furthermore, during the simulation, the GPU architecture is able to efficiently handle large-scale data and achieve faster convergence, significantly reducing the overall simulation time. Particularly in handling larger-scale water distribution networks, the GPU architecture can improve computational efficiency by up to 13 times. Further analysis reveals that different GPU models exhibit significant differences in computational efficiency, with memory capacity being a key factor affecting performance. GPU devices with larger memory capacity demonstrate higher computational efficiency when processing large-scale water distribution networks. This study demonstrates the advantages of GPU acceleration technology in the simulation of large-scale urban water distribution networks and provides important theoretical and technical support for practical applications in this field. By carefully selecting and configuring GPU devices, the computational efficiency of large-scale water distribution networks can be significantly improved, providing more efficient solutions for future urban water resource management and planning. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
37. Fast power flow calculation for distribution networks based on graph models and hierarchical forward-backward sweep parallel algorithm.
- Author
-
Wang, Xinrui, Chen, Wengang, Tian, Ruimin, Ji, Yuze, Zhu, Jianfei, Chen, Zhiyi, Ruan, Jiaqi, and Zhao, Jian
- Subjects
ELECTRICAL load ,PARALLEL programming ,FLOWGRAPHS ,TEST systems ,SUBGRAPHS ,PARALLEL algorithms - Abstract
Introduction: In response to the issues of complexity and low efficiency in line loss calculations for actual distribution networks, this paper proposes a fast power flow calculation method for distribution networks based on Neo4j graph models and a hierarchical forward-backward sweep parallel algorithm. Methods: Firstly, Neo4j is used to describe the distribution network structure as a simple graph model composed of nodes and edges. Secondly, a hierarchical forward-backward sweep method is adopted to perform power flow calculations on the graph model network. Finally, during the computation of distribution network subgraphs, the method is combined with the Bulk Synchronous Parallel (BSP) computing model to quickly complete the line loss analysis. Results and Discussion: Results from the IEEE 33-node test system demonstrate that the proposed method can calculate network losses quickly and accurately, with a computation time of only 0.175s, which is lower than the MySQL and Neo4j graph methods that do not consider hierarchical parallel computing. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
38. Exploring Estimation Procedures for Reducing Dimensionality in Psychological Network Modeling.
- Author
-
Shi, Dingjing, Christensen, Alexander P., Day, Eric Anthony, Golino, Hudson F., and Garrido, Luis Eduardo
- Subjects
- *
MONTE Carlo method , *BAYESIAN analysis , *PARALLEL algorithms , *PSYCHOMETRICS , *SAMPLE size (Statistics) - Abstract
AbstractTo understand psychological data, it is crucial to examine the structure and dimensions of variables. In this study, we examined alternative estimation algorithms to the conventional GLASSO-based exploratory graph analysis (EGA) in network psychometric models to assess the dimensionality structure of the data. The study applied Bayesian conjugate or Jeffreys’ priors to estimate the graphical structure and then used the Louvain community detection algorithm to partition and identify groups of nodes, which allowed the detection of the multi- and unidimensional factor structures. Monte Carlo simulations suggested that the two alternative Bayesian estimation algorithms had comparable or better performance when compared with the GLASSO-based EGA and conventional parallel analysis (PA). When estimating the multidimensional factor structure, the analytically based method (i.e., EGA.analytical) showed the best balance between accuracy and mean biased/absolute errors, with the highest accuracy tied with EGA but with the smallest errors. The sampling-based approach (EGA.sampling) yielded higher accuracy and smaller errors than PA; lower accuracy but also lower errors than EGA. Techniques from the two algorithms had more stable performance than EGA and PA across different data conditions. When estimating the unidimensional structure, the PA technique performed the best, followed closely by EGA, and then EGA.analytical and EGA.sampling. Furthermore, the study explored four full Bayesian techniques to assess dimensionality in network psychometrics. The results demonstrated superior performance when using Bayesian hypothesis testing or deriving posterior samples of graph structures under small sample sizes. The study recommends using the EGA.analytical technique as an alternative tool for assessing dimensionality and advocates for the usefulness of the EGA.sampling method as a valuable alternate technique. The findings also indicated encouraging results for extending the regularization-based network modeling EGA method to the Bayesian framework and discussed future directions in this line of work. The study illustrated the practical application of the techniques to two empirical examples in R. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
39. Extended ADMM for general penalized quantile regression with linear constraints in big data.
- Author
-
Liu, Yongxin and Zeng, Peng
- Subjects
- *
PARALLEL algorithms , *INDEPENDENT variables , *DATA warehousing , *DISTRIBUTED algorithms , *BIG data , *QUANTILE regression - Abstract
Quantile regression offers a powerful means of understanding the comprehensive relationship between response variables and predictors. By formulating prior domain knowledge and assumptions as constraints on parameters, the estimation efficiency can be enhanced. This paper studies some methods based on multi-block ADMM (Alternating Direction Method of Multipliers) to fit general penalized quantile regression models with linear constraints on regression coefficients. Different formulations for handling linear constraints and general penalties are explored and compared. Among these formulations, the most efficient one is identified, which provides an explicit expression for each parameter during iterations and eliminates the nested-loop in existing algorithms. Furthermore, this work addresses the challenges posed by big data by developing a parallel ADMM algorithm suitable for distributed data storage. The algorithm's convergence and a robust stopping criterion are established. To demonstrate the excellent performance of the proposed algorithms, extensive numerical experiments and a real data example are presented. These empirical validations showcase the effectiveness of the methods in handling complex datasets. The details of theoretical proofs and different algorithm variations are provided in the Appendix. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
40. Sea Fog Recognition near Coastline Using Millimeter-Wave Radar Based on Machine Learning.
- Author
-
Li, Tao, Qiu, Jianhua, and Xue, Jianjun
- Subjects
- *
METEOROLOGICAL observations , *K-means clustering , *PARALLEL algorithms , *MACHINE learning , *RADAR - Abstract
Sea fog is a hazardous natural phenomenon that reduces visibility, posing a threat to ports and nearshore navigation, making the identification of nearshore sea fog crucial. Millimeter-wave radar has significant advantages over satellites in capturing sudden and localized sea fog weather. The use of millimeter-wave radar for sea fog identification is still in the exploratory stage in operational fields. Therefore, this paper proposes a nearshore sea fog identification algorithm that combines millimeter-wave radar with multiple machine learning methods. Firstly, Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is used to partition radar echoes, followed by the K-means clustering algorithm (KMEANS) to divide the partitions into recognition units. Then, Sea-Fog-Recognition-Convolutional Neural Network (SFRCNN) is used to classify whether the recognition units are sea fog areas, and finally, the partition coverage algorithm is employed to improve identification accuracy. The experiments conducted using millimeter-wave radar observation data from the Pingtan Meteorological Observation Base in Fujian, China, achieved an identification accuracy of 96.94%. The results indicate that the proposed algorithm performs well and expands the application prospects of such equipment in meteorological operations. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
41. 多核处理器共享 Cache 的划分算法.
- Author
-
吕海玉, 罗广, 朱嘉炜, and 张凤登
- Subjects
- *
MULTICORE processors , *PARALLEL algorithms , *COMPUTER storage devices , *CYCLING instruction , *MATHEMATICAL models - Abstract
In order to optimize the performance of multi-core processors, this study deeply investigates the management strategy of shared Cache on multi-core processors, and proposes a shared Cache partitioning algorithm MT-FTP(Memory Time based Fair and Throughput Partitioning) based on the fairness of cache time and throughput rate. A mathematical model based on the fairness and throughput index is established, and the partitioning flow of the algorithm is analyzed in the proposed study. The simulation results show that the MT-FTP algorithm has excellent performance in system throughput, and its average IPC (Instructions Per Cycles) value is 1.3% higher than that of UCP(Use Case Point) algorithm and 11.6% higher than that of LRU (Least Recently Used) algorithm. The average fairness of MT-FTP algorithm is 17% higher than that of LRU algorithm, and 16.5% higher than that of UCP algorithm. This algorithm realizes the fairness of shared Cache partition and takes into account the throughput of the system. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
42. GraphZeppelin: How to Find Connected Components (Even When Graphs Are Dense, Dynamic, and Massive).
- Author
-
Tench, David, West, Evan, Zhang, Victor, Bender, Michael A., Chowdhury, Abiyaz, Delayo, Daniel, Dellas, J. Ahmed, Farach-Colton, Martín, Seip, Tyler, and Zhang, Kenny
- Subjects
- *
ARTIFICIAL intelligence , *DATA structures , *HIGH performance computing , *INFORMATION storage & retrieval systems , *DENSE graphs , *PARALLEL algorithms , *GRAPH algorithms - Published
- 2024
- Full Text
- View/download PDF
43. A multi-threaded particle swarm optimization-kmeans algorithm based on MapReduce.
- Author
-
Wang, Xikang, Wang, Tongxi, and Xiang, Hua
- Subjects
- *
PARTICLE swarm optimization , *K-means clustering , *BIG data , *COMPUTATIONAL complexity , *RESEARCH personnel , *PARALLEL algorithms - Abstract
The particle swarm optimization-K-Means algorithm is proposed by the related researchers to improve the clustering accuracy of the K-Means algorithm. However, the particle swarm optimization-K-Means algorithm brings more burden to the computation, and the computational efficiency is low when dealing with large data sets. To solve this problem, a parallel particle swarm K-Means algorithm based on MapReduce with multi-threading is proposed. The algorithm performs parallel computation by dividing the particle swarm into several equal-sized sub-populations based on the number of available nodes in the cluster and distributing them to each node. It uses a multi-threaded execution in the evaluation stage, which has the highest computational complexity in the evolutionary process. Experiments show that although splitting the population will affect the optimization effect to some extent, the proposed still can effectively optimize the clustering results of the K-Means algorithm, and the computational efficiency is significantly improved compared with serial particle swarm optimization k-means algorithm and MapReduce-based non-multithreaded particle swarm optimization k-means algorithm, in the experiment with the largest dataset and a configuration of 16 nodes, the proposed algorithm is 58 times faster than the serial algorithm. Furthermore, the computing efficiency can be improved in the clusters with more CPU cores. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
44. Quantitative study of thermal barrier models for paper-based barrier materials using adaptive neuro-fuzzy inference system.
- Author
-
Xia, Zi'ang, Wang, Long, Li, Chaojie, Li, Xue, Yang, Jingxue, Xu, Baoming, Wang, Na, Li, Yao, and Zhang, Heng
- Subjects
- *
AERODYNAMIC heating , *SMART materials , *QUANTITATIVE research , *STANDARD deviations , *PARALLEL algorithms - Abstract
A composite silicone emulsion-biomass polymer paper-based barrier coating material with high barrier performance was prepared by double-layer coating, and the material was tested for oil repellency. The composition-structure-property data set of the paper-based barrier materials was constructed based on the experimental data. An adaptive neuro-fuzzy inference system (ANFIS) was used to construct a prediction model of the coating structure in high-temperature environments to achieve quantitative analysis of the barrier performance in high-temperature environments. The ANFIS prediction model was constructed based on two algorithms, the grid partitioning algorithm and the subtractive clustering algorithm, and the accuracy of the model determined by the two algorithms was compared for training, validation and testing of this experimental data. The results showed that the prediction model of the grid partitioning method had a better fit with the experimental data, with a root mean square error (RMSE) value of 7.00383 and a R-squared (R2) of 0.9644 between the model prediction data and the actual data. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
45. Low Complexity Parallel Carrier Frequency Offset Estimation Based on Time-Tagged QPSK Partitioning for Coherent Free-Space Optical Communication.
- Author
-
Zhang, Siqi, Wang, Liqian, Liu, Kunfeng, and Ding, Shuang
- Subjects
QUADRATURE amplitude modulation ,ATMOSPHERIC turbulence ,TELECOMMUNICATION systems ,AMPLITUDE modulation ,PARALLEL algorithms - Abstract
To effectively mitigate the effects of atmospheric turbulence in free space optical (FSO) communication, we propose a parallel carrier frequency offset estimation (FOE) scheme based on time-tagged QPSK partitioning (TTQP). This scheme can be applied to spatial diversity polarization multiplexing (PM) coherent FSO communication systems. Specifically, the TTQP scheme performs QPSK partitioning by time-tagging signal points, accurately recording the time intervals between signals, and significantly reducing implementation complexity through a modified Mth power algorithm. The simulation results for the PM 16-quadrature amplitude modulation (QAM) validate the effectiveness of the proposed scheme. Compared to traditional QPSK partitioning algorithms, the TTQP algorithm achieves high accuracy, low complexity, and multi-format versatility in high-speed coherent FSO communication. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
46. Cross-Correlation Algorithm Based on Speeded-Up Robust Features Parallel Acceleration for Shack–Hartmann Wavefront Sensing.
- Author
-
Wen, Linxiong, Mei, Xiaohan, Tan, Yi, Zhang, Zhiyun, Chai, Fangfang, Wu, Jiayao, Wang, Shuai, and Yang, Ping
- Subjects
CROSS correlation ,ALGORITHMS ,SPEED ,SURFING ,PARALLEL algorithms ,ATMOSPHERE - Abstract
A cross-correlation algorithm to obtain the sub-aperture shifts that occur is a crucial aspect of scene-based SHWS (Shack–Hartmann wavefront sensing). However, when the sub-image is partially absent within the atmosphere, the traditional cross-correlation algorithm can easily obtain the wrong shift results. To overcome this drawback, we propose an algorithm based on SURFs (speeded-up-robust features) matching. In addition, to meet the speed required by wavefront sensing, CUDA parallel optimization of SURF matching is carried out using a GPU thread execution model and a programming model. The results show that the shift error can be reduced by more than two times, and the parallel algorithm can achieve nearly ten times the acceleration ratio. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
47. Prediction of blood‐brain barrier permeability using machine learning approaches based on various molecular representation.
- Author
-
Liang, Li, Liu, Zhiwen, Yang, Xinyi, Zhang, Yanmin, Liu, Haichun, and Chen, Yadong
- Subjects
MACHINE learning ,DRUG discovery ,CENTRAL nervous system diseases ,CENTRAL nervous system ,PARALLEL algorithms ,BLOOD-brain barrier ,SKIN permeability - Abstract
The assessment of compound blood‐brain barrier (BBB) permeability poses a significant challenge in the discovery of drugs targeting the central nervous system. Conventional experimental approaches to measure BBB permeability are labor‐intensive, cost‐ineffective, and time‐consuming. In this study, we constructed six machine learning classification models by combining various machine learning algorithms and molecular representations. The model based on ExtraTree algorithm and random partitioning strategy obtains the best prediction result, with AUC value of 0.932±0.004 and balanced accuracy (BA) of 0.837±0.010 for the test set. We employed the SHAP method to identify important features associated with BBB permeability. In addition, matched molecular pair (MMP) analysis and representative substructure derivation method were utilized to uncover the transformation rules and distinctive structural features of BBB permeable compounds. The machine learning models proposed in this work can serve as an effective tool for assessing BBB permeability in the drug discovery for central nervous system disease. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
48. A Novel Low-Complexity and Parallel Algorithm for DCT IV Transform and Its GPU Implementation.
- Author
-
Chiper, Doru Florin and Dobrea, Dan Marius
- Subjects
ARITHMETIC ,ALGORITHMS ,FACTORIZATION ,SPEED - Abstract
This study proposes a novel factorization method for the DCT IV algorithm that allows for breaking it into four or eight sections that can be run in parallel. Moreover, the arithmetic complexity has been significantly reduced. Based on the proposed new algorithm for DCT IV, the speed performance has been improved substantially. The performance of this algorithm was verified using two different GPU systems produced by the NVIDIA company. The experimental results show that the novel proposed DCT algorithm achieves an impressive reduction in the total processing time. The proposed method is very efficient, improving the algorithm speed by more than 4-times—that was expected by segmenting the DCT algorithm into four sections running in parallel. The speed improvements are about five-times higher—at least 5.41 on Jetson AGX Xavier, and 10.11 on Jetson Orin Nano—if we compare with the classical implementation (based on a sequential approach) of DCT IV. Using a parallel formulation with eight sections running in parallel, the improvement in speed performance is even higher, at least 8.08-times on Jetson AGX Xavier and 11.81-times on Jetson Orin Nano. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
49. PARALLEL BUCKET-SORT ALGORITHM ON OPTICAL CHAINED-CUBIC TREE INTERCONNECTION NETWORK.
- Author
-
Mahafzah, Basel A.
- Subjects
PARALLEL processing ,RESEARCH personnel ,ALGORITHMS ,PAILS ,PARALLEL algorithms ,TREES - Abstract
The performance of sorting algorithms has a great impact on many computationally intensive applications. Researchers worked on parallelizing many sorting algorithms on various interconnection networks to improve their sequential counterpart performance. One of these interconnection networks is the optical chained-cubic tree (OCCT). In this paper, a parallel bucket sort (PBS) algorithm is presented and applied to the OCCT interconnection network. This PBS algorithm is evaluated analytically and by simulation in terms of various performance metrics including parallel runtime, computation time, communication time, concatenation time, speedup and efficiency, for a different number of processors, dataset sizes and data distributions including random and descending distributions. Simulation results show that the highest obtained speedup is approximately 912x on OCCT using 1020 processors, which means that the parallel runtime of the PBS on 1020 processors is 912 times faster than the sequential runtime of BS on a single processor. [ABSTRACT FROM AUTHOR]
- Published
- 2024
50. A Parallel Image Encryption Method Based on Hybrid Chaotic Technique.
- Author
-
Abdulkadhim, Ahmed Abd Ali and Lakizadeh, Amir
- Subjects
DATA privacy ,ADVANCED Encryption Standard ,DIGITAL technology ,PARALLEL algorithms ,DATA integrity ,IMAGE encryption - Abstract
Encrypting images is crucial for several reasons, as protection against identity theft, privacy and data integrity, particularly in today's digital age where visual data is extensively shared and stored. Where Advanced Encryption Standard, Rivest-Shamir-Adleman algorithms and others, suffer from some drawbacks, as proper management of encryption keys and computationally expensive. Using chaotic systems for image encryption with parallel algorithms, offers several unique advantages. this article presents a novel design for a chaos-based parallel encryption algorithm. The proposed encryption method consists of Generate keys stage, The permutation stage and Diffusion stage. Where the dividing the image into several threads, creating keys from hyper chaotic, and employing parallel processing to handle the threads. Each portion of the image is encrypted using a different thread, Subsequently, the operation is finished by combining the encrypted portions. Experimental results are shown using test images (512*512 and 256*256) from the USC-SIPI and CVG-UGR databases. were quicker outcomes and more secure encryption. In addition, the information entropy screening rate reached 7.999319, which is close to the optimal ratio of 8. The average execution time of encryption and decryption reached (0.01154 ms, 0.02416 ms) and NPCR and UACI reached (99.643, 33.4798). The algorithm has achieved a high level of security. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
Catalog
Discovery Service for Jio Institute Digital Library
For full access to our library's resources, please sign in.