Author: "毛伊敏" - Searchworks@Jio Institute Digital Library Search Results

1. Parallel deep forest based on mutual information and fused weighting [基于互信息和融合加权的并行深度森].

Author: 毛伊敏 and 李文豪
Abstract: In big data environments, the parallel deep forest algorithm faces several challenges: an abundance of irrelevant and redundant features, imbalanced multi-granularity scanning, inadequate classification performance, and low parallelization efficiency. To tackle these issues, this paper proposes a parallel deep forest algorithm based on mutual information and mixed weighting (PDF-MIMW). First, in the dimensionality reduction phase, the algorithm introduces a feature extraction strategy based on mutual information (FE-MI), which filters the original feature set by combining feature importance, interaction, and redundancy metrics, thereby eliminating irrelevant and redundant features. Next, in the multi-granularity scanning phase, it proposes an improved multi-granularity scanning strategy based on padding (IMGS-P), which pads the reduced features and randomly samples the subsequences obtained after window scanning, thereby keeping the multi-granularity scanning balanced. Then, in the cascade forest construction phase, it puts forward a sub-forest construction strategy based on mixed weighting (SFCMW), which uses the Spark framework to construct weighted sub-forests in parallel, thereby enhancing the model’s classification performance. Finally, in the class vector merging phase, it designs a load balancing strategy based on a mixed particle swarm algorithm, which optimizes the load distribution among task nodes in the Spark framework, reducing the waiting time during class vector merging and improving the parallelization efficiency of the model. Experiments demonstrate that the PDF-MIMW algorithm achieves superior classification performance and higher training efficiency in the big data environment. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
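
Illustration for record 1: the abstract says FE-MI filters features using mutual-information-based importance, interaction, and redundancy metrics, but this record does not give the formulas. The sketch below therefore shows only the underlying quantity, the empirical mutual information between a discrete feature and the class label, plus a plain relevance ranking. The function names and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y) in bits for two discrete sequences."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log2(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


def rank_features_by_relevance(columns, labels):
    """Return feature names sorted by decreasing mutual information with the labels."""
    scores = {name: mutual_information(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy data (hypothetical): one fully informative feature, one constant feature.
labels = [0, 0, 1, 1]
columns = {
    "copy_of_label": [0, 0, 1, 1],
    "constant": [5, 5, 5, 5],
}
ranking = rank_features_by_relevance(columns, labels)
```

A filter of this kind would keep the top-ranked features; handling interaction and redundancy between features, as the abstract describes, would need additional pairwise terms that are not sketched here.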

2. Parallel deep forest algorithm based on Spark and the NRSCA strategy [基于 Spark 和 NRSCA 策略的并行深度森林算法].

Author: 毛伊敏 and 刘绍芬
Subjects: *ROUGH sets, *ALGORITHMS
Abstract: This paper proposes a parallel deep forest algorithm based on Spark and the NRSCA strategy (PDF-SNRSCA) to address several issues that parallel deep forest algorithms encounter in big data environments: excessive redundant and irrelevant features, low utilization of features at both ends, slow model convergence, and low parallel efficiency of cascading forests. First, the algorithm proposes a feature selection strategy based on neighborhood rough sets and the Fisher score (FSNRS), which measures the correlation and redundancy of features to effectively reduce the number of redundant and irrelevant features. Second, it proposes a scanning strategy based on random selection and equidistant extraction (S-RSEE), which ensures that all features are used with the same probability and solves the low utilization of features at both ends during multi-granularity scanning. Finally, using the Spark framework, the algorithm trains the cascading forests in parallel and proposes a feature filtering mechanism based on the importance index (FFM-II) to balance the dimensions of the enhanced class vectors and the original class vectors, thereby accelerating model convergence. Meanwhile, the algorithm designs a task scheduling mechanism based on SCA (TSM-SCA) to redistribute tasks and keep the cluster load balanced, which solves the low parallel efficiency of cascading forests. Experiments show that the PDF-SNRSCA algorithm effectively improves the classification performance of deep forests and greatly improves the efficiency of their parallel training. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

3. Improved parallel incremental association rule mining algorithm [改进的并行关联规则增量挖掘算法].

Author: 毛伊敏, 邓千虎, 邓小鸿, and 刘蔚
Subjects: *BIG data, *ALGORITHMS, *HETEROGENEOUS computing, *PARALLEL programming, *COMPUTER workstation clusters, *PARALLEL processing
Abstract: In the big data environment, the Can-tree-based incremental association rule algorithm suffers from excessive space occupation by the tree structure, inefficient frequent pattern mining, and insufficient parallelization performance on MapReduce clusters. To address these problems, this paper proposes MR-PARIRM. First, it designs RS-SIM to merge similar items in the dataset and constructs the Can-tree from the merged data, thereby reducing the space occupied by the tree structure. Second, it proposes MPS to prune and merge the propagation paths in the tree structure, thereby compressing the frequent pattern search space and speeding up frequent item mining. Finally, MR-PARIRM uses DSS to dynamically schedule computing tasks in the heterogeneous MapReduce cluster, thereby balancing the load and effectively improving the parallel computing capability of the cluster. The experimental simulation results show that MR-PARIRM performs relatively better in the big data environment and is suitable for parallel processing of large-scale data. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

4. Parallel clustering algorithm based on grid density and locality-sensitive hash functions [基于网格密度和局部敏感哈希函数的并行化聚类算法].

Author: 毛伊敏, 陶涛, and 曹文梁
Subjects: *BIG data, *DATABASES, *PARALLEL algorithms, *DATA mapping, *DENSITY, *FUZZY clustering technique
Abstract: To address the sensitivity to initial centers, high inter-node communication overhead, and low cluster efficiency of partitioning-based clustering algorithms for big data, this paper proposes a partitioning-based clustering algorithm using grid density and locality-sensitive hash functions based on MapReduce, named PBGDLSH-MR. First, based on the initial data set, it proposes GDS (grid density strategy) to obtain the initial cluster centers, which avoids the sensitivity caused by random selection of initial cluster centers. Second, it proposes DP-LSH (data partitioning based on locality-sensitive hash functions) to map closely related data objects into the same sub-dataset and get data partitions on the map. Meanwhile, it designs a formula SI (similarity improvement) to evaluate the data partitioning results, which reduces the communication overhead between nodes. In addition, this paper designs AGS (adaptive grouping strategy) to handle data skew in the data partitions, which improves cluster efficiency. Finally, based on MapReduce, it mines the cluster centers in parallel to generate the final clustering results. The experimental results show that PBGDLSH-MR produces better clustering results and parallelizes better on big data. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF
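
Illustration for record 4: the abstract describes DP-LSH as mapping closely related data objects into the same sub-dataset with locality-sensitive hash functions, without naming the hash family. The sketch below assumes one common choice, random-hyperplane (sign) hashing, purely for illustration; the helper names, the number of hash bits, and the sample points are all hypothetical.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplane normals with Gaussian components."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def lsh_key(point, hyperplanes):
    """One bit per hyperplane: which side of the hyperplane the point lies on."""
    bits = []
    for h in hyperplanes:
        dot = sum(p * w for p, w in zip(point, h))
        bits.append("1" if dot >= 0 else "0")
    return "".join(bits)


def partition(points, hyperplanes):
    """Group points by hash key; each bucket plays the role of one data partition."""
    buckets = {}
    for p in points:
        buckets.setdefault(lsh_key(p, hyperplanes), []).append(p)
    return buckets


# Hypothetical sample: the first two points point in the same direction.
points = [(1.0, 2.0), (2.0, 4.0), (-1.0, -2.0), (3.0, -1.0)]
hyperplanes = make_hyperplanes(dim=2, n_bits=4, seed=42)
buckets = partition(points, hyperplanes)
```

Points with a small angle between them agree on most bits and tend to share a bucket, which is the property a partitioner of this kind relies on. How the paper scores partitions with SI or regroups skewed partitions with AGS is not shown.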

5. Parallel deep convolutional neural network optimization algorithm for big data [基于大数据的并行化深度卷积神经网络优化算法].

Author: 毛伊敏, 张瑞朋, and 曹文梁
Subjects: *CONVOLUTIONAL neural networks, *CONJUGATE gradient methods, *PARALLEL algorithms, *MATHEMATICAL optimization, *BIG data, *PHYSIOLOGICAL effects of acceleration
Abstract: To address the excessive redundant parameters, slow convergence, and low parallel efficiency of parallel DCNN algorithms in the big data environment, this paper proposes a parallel deep convolutional neural network optimization algorithm named PDCNNO. First, the algorithm designs the PFM strategy to pretrain the network and obtain a compressed network, which effectively reduces redundant parameters and the time and space complexity of DCNN training. Second, it designs CGMSE to obtain local classification results, which achieves rapid convergence of the conjugate gradient method and improves the convergence speed of the network. Finally, in the reduce phase, it proposes the LBRLA strategy to obtain the global classification results, which achieves fast and uniform grouping of data and improves the speedup ratio of the parallel system. Experiments show that the algorithm not only reduces the time and space complexity of DCNN training in the big data environment, but also improves the parallelization performance of the parallel system. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

6. Essential protein identification based on centrality and modularity [基于中心性和模块特性的关键蛋白质识别].

Author: 毛伊敏, 章宇盟, and 胡健
Subjects: *TOPOLOGICAL property, *ALGORITHMS, *PROTEIN-protein interactions, *CENTRALITY, *PROTEINS
Abstract: Because of noise in the PPI network and the poor identification accuracy of essential proteins, this paper proposes a method named UCM, based on centrality and modularity, to identify essential proteins. First, the method integrates topological data and biological data to construct a multi-attribute network, reducing the impact of noise (false positives and false negatives) in the original PPI network. Second, according to the topological and biological properties of essential proteins, the paper develops a clustering algorithm to mine essential modules from the multi-attribute network, which emphasizes the multi-dimensional importance of essential proteins within essential modules. Finally, based on centrality and modularity, it designs EIS to improve the accuracy of predicting essential proteins from topological and biological properties. The UCM method is applied to the DIP dataset to predict essential proteins. Compared with ten other methods for predicting essential proteins, the experimental results show that UCM identifies more essential proteins and performs better at predicting them. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF

7. Weighted protein complex identification algorithm based on fuzzy ant colony [基于模糊蚁群的加权蛋白质复合物识别算法].

Author: 毛伊敏, 刘银萍, and 胡健
Subjects: *PROTEOMICS, *ANT algorithms, *PEARSON correlation (Statistics), *SEED proteins, *SOFTWARE measurement, *PROTEIN content of food
Abstract: To address the low accuracy, low recall, and low running efficiency of protein complex identification algorithms based on ant colony and FCM clustering, this paper proposes a novel protein complex recognition algorithm named FAC-PC. First, combining the Pearson correlation coefficient and the edge aggregation coefficient, the algorithm constructs a weighted protein network. Second, to overcome the massive merge-and-filter steps and the repeated pick-up and drop-down operations of the ant colony clustering algorithm, it designs the EPS metric to select essential proteins and the PFC metric to traverse the neighbors of essential proteins and obtain essential group proteins. It then uses the essential group proteins in place of the seed nodes during ant colony clustering, which improves accuracy and time performance. Furthermore, it proposes the SI metric to optimize the probabilities of the ant colony's pick-up and drop-down operations and obtain the number of clusters. Finally, it uses the essential proteins and the number of clusters obtained by the improved ant colony algorithm to initialize the FCM algorithm, designs a membership update strategy to optimize the membership update, and proposes a new FCM objective function that balances intra-cluster and inter-cluster variation; the improved FCM algorithm then identifies the protein complexes. The FAC-PC algorithm is used to identify protein complexes on DIP data. The experimental results show that FAC-PC achieves better accuracy and recall and identifies protein complexes more reasonably. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF

8. Essential protein prediction based on complex participation degree and density [基于复合物参与度和密度的关键蛋白质预测].

Author: 毛伊敏 and 刘银萍
Abstract: Identification of essential proteins in the protein-protein interaction (PPI) network tends to focus only on the topological characteristics of the nodes, PPI data contain many false positives, and essential protein recognition algorithms based on complex information do not comprehensively consider the neighborhood information of nodes or the influence of complex mining on the recognition of essential proteins; as a result, the accuracy and specificity of the recognition results are low. To deal with these problems, an essential protein prediction algorithm based on participation degree in protein complexes and density (PEC) is proposed. First, GO annotation information and the edge aggregation coefficient are used to construct a weighted PPI network to overcome the influence of false positives on the experimental results. A similarity matrix is constructed from the edge weights of protein interactions. The maximum difference between eigenvectors is designed to automatically determine the partition number K. Meanwhile, K initial clustering centers are selected according to the degree of protein nodes in the weighted network. Furthermore, spectral clustering and the fuzzy C-means (FCM) clustering algorithm are combined to mine protein complexes, which improves clustering accuracy and reduces the data dimension. Second, scores for essential proteins are proposed based on the degree of participation in protein complexes and the neighborhood subgraph density. Experimental results on the DIP and Krogan datasets show that, compared with 10 classic algorithms (DC, BC, CC, SC, IC, PeC, WDC, LIDC, LBCC and UC), PEC correctly identifies more essential proteins with higher accuracy and specificity. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

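9. Research and application of the uncertain NNSB-OPTICS clustering algorithm in landslide hazard prediction [不确定NNSB-OPTICS聚类算法在滑坡危险性预测中的研究与应用].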

Author: 毛伊敏, 陈华彬, 李忠利, and 张灿龙
Subjects: *LANDSLIDE prediction, *MODEL theory, *LANDSLIDES, *RAINFALL probabilities, *RAINFALL, *ALGORITHMS, *DENSITY
Abstract: Because rainfall and other uncertain factors are difficult to obtain and handle effectively in landslide hazard prediction, and because the OPTICS-PLUS algorithm requires a manually set density threshold and has high time complexity, this paper proposes an uncertain NNSB-OPTICS clustering algorithm and applies it to landslide prediction to improve prediction accuracy. First, the algorithm optimizes the expansion strategy of the OPTICS-PLUS algorithm, which avoids manual setting of the density threshold and improves efficiency. Then, according to the distribution characteristics of rainfall data, and combining the EW distance formula with cloud model theory, the paper puts forward the EC distance formula, which handles uncertain rainfall data effectively. Finally, the paper applies the uncertain NNSB-OPTICS clustering algorithm to predict landslide hazard in the Baota district of Yan’an city, where the landslide prediction accuracy reaches 87.9%. The experimental results show that this method effectively improves the accuracy of landslide prediction and is highly feasible. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

10. Parallel deep convolutional neural network optimization algorithm based on Spark and AMPSO [基于 Spark和 AMPSO 的并行深度卷积神经网络优化算法].

Author: 刘卫明, 罗全成, 毛伊敏, and 彭喆
Subjects: *CONVOLUTIONAL neural networks, *OPTIMIZATION algorithms, *PARTICLE swarm optimization, *DIFFERENTIAL evolution, *PARALLEL algorithms, *DYNAMIC loads, *DYNAMIC balance (Mechanics)
Abstract: This paper proposes a parallel deep convolutional neural network optimization algorithm based on Spark and AMPSO (PDCNN-SAMPSO) to address several issues that parallel DCNN algorithms encounter in big data environments: excessive redundant parameters, slow convergence, a tendency to fall into local optima, and low parallel efficiency. First, the algorithm designs a kernel pruning strategy based on importance and similarity (KP-IS), which addresses the excessive redundant parameters by pruning redundant convolution kernels in the model. Second, it proposes a model parallel training strategy based on the adaptive mutation particle swarm optimization algorithm (MPT-AMPSO), which addresses the slow convergence and the tendency to fall into local optima by initializing the model parameters with the adaptive mutation particle swarm optimization algorithm (AMPSO). Finally, the algorithm proposes a dynamic load balancing strategy based on node performance (DLBNP) to balance the load of each node in the cluster and improve parallel efficiency. Experiments show that, when using 8 computing nodes to process the CompCars dataset, the runtime of PDCNN-SAMPSO is 22%, 30%, 37% and 27% lower than that of Dis-CNN, DS-DCNN, CLR-Distributed-CNN and RS-DCNN, respectively; the speedup ratio is higher by 1.707, 1.424, 1.859, and 0.922, respectively; and the top-1 accuracy is higher by 4.01%, 4.89%, 2.42%, and 5.94%, respectively. This indicates that PDCNN-SAMPSO has good classification performance in the big data environment and is suitable for parallel training of DCNN models. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

11. Parallel random forest algorithm combining gain ratio and stacked autoencoders [结合增益率与堆叠自编码器的并行随机森林算法].

Author: 刘卫明, 陈伟达, 毛伊敏, and 陈志刚
Subjects: *RANDOM forest algorithms, *LATIN hypercube sampling, *ACTIVE learning, *BIG data, *ELECTRONIC data processing, *ALGORITHMS
Abstract: In the big data environment, the random forest algorithm suffers from excessive redundant and irrelevant features, insufficient spatial information content in feature subspaces, and low parallelization efficiency. To resolve these issues, this paper presents PRFGRSAE. First, the algorithm proposes DRNGRSAE, which filters redundant and irrelevant features from the feature set and extracts features with stacked autoencoders, effectively reducing the number of redundant and irrelevant features. Second, it proposes SSLF, which combines Latin hypercube sampling and normalized correlation degree; it forms feature subspaces with high spatial expression by performing multi-layer division sampling on the feature set and ensures the information content of the feature subspaces. Finally, it proposes DSVLA, a reducer allocation strategy combined with variable action learning automata, which allocates clusters evenly to reducers for processing and effectively improves parallelization efficiency. Experimental results show that, compared with the IMRF, KSMRF, and GAPRF algorithms, the speedup ratio and accuracy of the PRFGRSAE algorithm are significantly improved. The algorithm therefore obtains higher accuracy and parallel efficiency when processing large data, especially data sets with more features. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

12. Parallel K-means algorithm based on MapReduce and MSSA [基于Map Reduce和MSSA的并行K-means算法].

Author: 刘卫明, 崔瑜, 毛伊敏, and 刘蔚
Subjects: *DIMENSIONAL reduction algorithms, *RANK correlation (Statistics), *PARALLEL algorithms, *SEARCH algorithms, *CENTROID, *K-means clustering
Abstract: In the big data environment, the parallel K-means clustering algorithm suffers from poor clustering quality, unbalanced data partitioning, and sensitivity to the initial cluster centroids. To solve these problems, this paper proposes a parallel K-means algorithm based on MapReduce and MSSA (MR-MSKCA). First, the algorithm designs a dimensionality reduction strategy (DRKCAE), which uses the Kendall correlation coefficient and a deep sparse autoencoder to weight and extract features, improving the clustering effect on high-dimensional data. Second, the paper proposes a uniform partition strategy based on two-stage mapping (UPS), which divides the data set into uniform data partitions. Finally, the algorithm proposes a non-uniform mutation sparrow search algorithm (MSSA) to obtain the parallel K-means cluster centroids, which solves the sensitivity to initial centroids. Compared with MR-KNMF, MR-PGDLSH, and MR-GAPKCA, the running time of MR-MSKCA decreases by 45.1%, 49.1%, and 59.8%, and the clustering effect improves by 19.2%, 22.8%, and 24%, respectively. Experiments show that the MR-MSKCA algorithm not only has excellent performance but also adapts well to large-scale datasets. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

13. Parallel deep convolutional neural network optimization algorithm based on Im2col [基于Im2col的并行深度卷积神经网络优化算法].

Author: 胡健, 龚克, 毛伊敏, 陈志刚, and 陈亮
Subjects: *ARTIFICIAL neural networks, *CONVOLUTIONAL neural networks, *FEATURE extraction, *MATHEMATICAL optimization, *ALGORITHMS, *PARALLEL algorithms
Abstract: In the large data environment, the parallel deep convolutional neural network (DCNN) algorithm has many problems, such as excessive data redundancy, slow convolution layer operation, and poor convergence of the loss function. This paper proposes a parallel deep convolutional neural network optimization algorithm based on the Im2col method. First, the algorithm proposes a parallel feature extraction strategy based on the Marr-Hildreth operator to extract target features from the data as input to the convolutional neural network, which effectively avoids excessive data redundancy. Second, the algorithm designs a parallel model training strategy based on the Im2col method: redundant convolution kernels are removed by designing the Mahalanobis distance center value, and the convolution layer operation is sped up by combining the MapReduce and Im2col methods. Finally, the algorithm proposes an improved mini-batch gradient descent strategy, which eliminates the effect of abnormal data on the batch gradient and solves the poor convergence of the loss function. The experimental results show that the IA-PDCNNOA algorithm performs well in deep convolutional neural network computation in the large data environment and is suitable for parallel DCNN model training on large datasets. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF
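
Illustration for record 13: Im2col is a standard technique that unrolls every sliding window of an image into a row so that convolution becomes a matrix product. The sketch below shows only that generic idea for a single-channel image with stride 1 and no padding, in plain Python. It is not the paper's MapReduce-based strategy, and the function names and sample data are hypothetical.

```python
def im2col(image, kh, kw):
    """Unroll each kh x kw window of a 2-D image (list of rows) into one flat row."""
    h, w = len(image), len(image[0])
    rows = []
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            rows.append([image[i + di][j + dj] for di in range(kh) for dj in range(kw)])
    return rows


def conv2d_im2col(image, kernel):
    """Convolve (cross-correlation, as in CNN layers) via im2col and dot products."""
    kh, kw = len(kernel), len(kernel[0])
    flat_kernel = [v for row in kernel for v in row]
    out_w = len(image[0]) - kw + 1
    # Each output pixel is the dot product of one unrolled patch with the flat kernel.
    flat_out = [sum(a * b for a, b in zip(patch, flat_kernel))
                for patch in im2col(image, kh, kw)]
    return [flat_out[r:r + out_w] for r in range(0, len(flat_out), out_w)]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, 1]]
result = conv2d_im2col(image, kernel)  # [[6, 8], [12, 14]]
```

Because every patch becomes an independent row, the dot products can be computed in any order or on different workers, which is the property that makes the layout attractive for parallel execution.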

14. A privacy-preserving KNN query method for streaming data in the power Internet of Things [面向电力物联网流数据的一种具有隐私保护的KNN查询方法].

Author: 易叶青, 易颖杰, 刘云如, and 毛伊敏
Abstract: The power Internet of Things (PIoT) is a smart service system that offers full-state awareness, efficient information processing, and convenient and flexible applications to users. However, these services also pose a risk of privacy leakage. Existing research on privacy protection of power data mainly concentrates on secure aggregation and seldom addresses the core technology of many basic services, such as KNN query. Unlike traditional relational data, the PIoT collects streaming data on user electricity consumption, and the various power parameters exhibit dynamic correlations. Attackers can use data mining and other methods to infer future trends in data changes. Therefore, this paper proposes a privacy-preserving KNN query method. First, it proposes a similarity measurement model based on bucket distance and proves upper and lower bounds on the error between this model and the similarity measurement model based on Euclidean distance. Through this model, similarity measurement can be transformed into set intersection operations. Then, it constructs a privacy-preserving function, which can generate different data privacy-preserving functions and query privacy-preserving functions for various smart terminals by substituting different parameters. On this basis, it proposes a data encoding scheme based on bucket partitioning and random number allocation. After encryption by the privacy-preserving function, the encoded data are ciphertext-indistinguishable and can effectively resist various attacks, such as chosen-plaintext attacks, data mining attacks, statistical analysis attacks, ICA attacks, and inference prediction attacks. Analysis and simulation demonstrate that the proposed secure KNN query method has both high security and low overhead. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

15. Parallel support vector machine algorithm based on Relief and BFO [基于 Relief 和 BFO 的并行支持向量机算法].

Author: 胡健, 王祥太, 毛伊敏, and 刘蔚
Subjects: *BIG data, *SUPPORT vector machines, *PROBLEM solving, *ALGORITHMS, *PARALLEL algorithms
Abstract: To address the problems of parallel support vector machine (SVM) algorithms in the big data environment, such as sensitivity to redundant data, difficulty in parameter selection, and low parallelization efficiency, this paper proposes a parallel SVM algorithm using Relief and the bacterial foraging optimization (BFO) algorithm based on MapReduce (RBFO-PSVM). First, the algorithm designs a feature weight calculation strategy (MI-Relief), which uses mutual information to improve the weight calculation function of the Relief algorithm, eliminating redundant features in the data set and effectively reducing redundant data to support parallelism. Second, the paper proposes a hybrid BFO algorithm based on MapReduce (MR-HBFO), which selects the optimal SVM parameters in parallel and solves the difficulty of SVM parameter selection. Finally, it proposes the kernel clustering strategy (KCS) to reduce the size of the data set involved in parallel training, and a cross-fusion cascaded parallel SVM (CFCPSVM) to improve the feedback mechanism of the cascade SVM (CSVM). It trains the SVM with the MapReduce programming framework, which improves the parallelization efficiency of the parallel SVM. Experiments show that the RBFO-PSVM algorithm has a better classification effect on large data sets and is more suitable for large data environments. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

16. Parallel density-based clustering algorithm based on grouping and IGSA [基于分组和 IGSA 的并行密度聚类算法].

Author: 胡春安, 王家欣, and 毛伊敏
Subjects: *ALGORITHMS, *NEIGHBORHOODS, *BIG data, *SCALABILITY
Abstract: To address the poor scalability, weak parameter optimization ability, and low parallelization efficiency of density-based clustering algorithms for big data, this paper develops MR-GDBIGS, a density-based clustering algorithm based on groups and improved gravitational search. First, it proposes GSP to divide the data and speed up neighborhood search. Second, the algorithm designs IGSA, which uses the PUF. In addition, based on IGSA, it dynamically selects the optimal parameters for local clustering, which improves the local clustering effect. Finally, based on the cover tree and MapReduce, the paper develops MR-CTMC to obtain the clustering result more quickly, which improves the efficiency of merging core clusters in the density-based clustering algorithm. The experimental results show that MR-GDBIGS produces better clustering results and parallelizes better on big data. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

17. Parallel density-based clustering algorithm based on MapReduce and IFOA [基于 MapReduce 和 IFOA 的并行密度聚类算法].

Author: 胡健, 徐锴滨, and 毛伊敏
Subjects: *GRID cells, *PARALLEL programming, *MATHEMATICAL optimization, *COMPUTER workstation clusters, *ALGORITHMS
Abstract: To address the unreasonable division of data grids, weak parameter optimization ability, and low parallelization efficiency of density-based clustering algorithms for big data, this paper proposes a density-based clustering algorithm using improved fruit fly optimization based on MapReduce, named MR-DBIFOA. First, based on the KD-Tree, it designs a division strategy that divides grid cells adaptively. Second, it proposes an improved fruit fly optimization algorithm (IFOA) that uses KLSS and CFF. Then, based on IFOA, it dynamically selects the optimal parameters for local clustering, which improves the local clustering effect. Meanwhile, to improve parallel efficiency, it proposes a density-based clustering algorithm using IFOA to compute the local clusters in parallel. Finally, based on the QR-Tree and MapReduce, it proposes a cluster merging algorithm (MR-QRMEC) to obtain the clustering result more quickly, which improves the efficiency of merging core clusters in the density-based clustering algorithm. The experimental results show that MR-DBIFOA produces better clustering results and parallelizes better on big data. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

18. Research on a parallel frequent itemset mining algorithm based on MapReduce [基于MapReduce的并行频繁项集挖掘算法研究].

Author: 刘卫明, 张弛, and 毛伊敏
Subjects: *IMPACT loads, *BIG data, *MINES & mineral resources, *ALGORITHMS, *PROCESSIONS
Abstract: To address the excessive time and space complexity and the unbalanced node load of the parallel frequent itemset mining algorithm MRPrePost, this paper proposes an optimized parallel frequent itemset mining algorithm based on MapReduce, named PFIMD. First, the algorithm adopts a data structure called DiffNodeset, which effectively avoids the defect that the N-list cardinality becomes very large in the MRPrePost algorithm. Second, to reduce the time complexity of the algorithm, it designs T-wcs to avoid invalid calculations in the process of connecting two DiffNodesets. Finally, considering the impact of cluster load on the efficiency of the parallel algorithm, it proposes LBSBDG, which decreases the size of the PPC-Tree on each computing node and reduces the time required to traverse the PPC-Tree by evenly grouping the items in the F-list. The experimental results show that the modified algorithm performs better at mining frequent itemsets in a big data environment. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

19. Complex mining in dynamic weighted PPI networks based on ant colony clustering [基于蚁群聚类的动态加权 PPI 网络复合物挖掘].

Author: 胡健, 朱海湾, and 毛伊敏
Subjects: *COMPUTER simulation, *COMPUTATIONAL complexity, *PROTEIN-protein interactions, *FUZZY algorithms, *ORDER picking systems, *PROTEINS, *SPEED
Abstract: Because static PPI networks have difficulty truly reflecting the dynamic character of cells, and because mining protein complexes with ant colony clustering converges slowly and has low clustering precision and recall, this paper proposes an ant colony clustering algorithm based on fuzzy granular and closeness degree to mine protein complexes in dynamic weighted PPI networks, named FGCDACC-DPC (joint fuzzy granular and closeness degree ant colony clustering-DPC). First, based on the topological and biological characteristics of the PPI network, it designs a comprehensive weight metric (CWM) to accurately describe the interactions between proteins. Second, the method constructs a series of dense and highly co-expressed complex cores based on the basic characteristics of complexes, and then employs picking and dropping operations based on fuzzy granular and closeness degree to cluster the nodes in the PPI network, effectively reducing computational complexity and randomness and speeding up clustering. Finally, the algorithm designs a local and global weight update strategy founded on function transmission and timing functional relevance theory, which achieves function transmission between different generations of ant colonies and networks at different times to effectively improve clustering accuracy. The FGCDACC-DPC algorithm is used to mine protein complexes on DIP data. Experimental results demonstrate that the algorithm achieves better precision and recall and identifies protein complexes more reasonably. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF

20. Application of the uncertain PAHT clustering algorithm to landslide hazard prediction [不确定PAHT聚类算法在滑坡危险性预测上的应用].

Author: 胡健, 朱玲, and 毛伊敏
Subjects: *LANDSLIDES, *LANDSLIDE prediction, *DATA modeling, *RAINFALL, *ALGORITHMS
Abstract: In clustering studies of landslide prediction, two difficulties lead to a bad prediction effect: determining the number of clusters, which traditional clustering algorithms need to set in advance, and accurately measuring rainfall, an important landslide-inducing factor. Therefore, this paper proposes a new clustering algorithm, the uncertain PAHT algorithm. The algorithm introduces an uncertain data model called M-D distance, which effectively measures uncertain rainfall, and, following the idea of hierarchical clustering, determines the k value by finding the best threshold p*. Taking the Baota district of Yan’an as an example, comparative experiments verify the effectiveness of the uncertain M-D distance and the PAHT algorithm, and the feasibility of the uncertain PAHT algorithm for landslide hazard prediction. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

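21. Research and application of the uncertain approximate-backbone ant colony clustering algorithm in landslide hazard prediction [不确定近似骨架蚁群聚类算法在滑坡危险性预测中的研究与应用].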

Author: 刘卫明, 李忠利, and 毛伊敏
Abstract: The uncertain factor of rainfall is hard to handle accurately, and the ant colony clustering algorithm easily gets trapped in sub-optimal solutions and searches the search space slowly. To improve the prediction accuracy of landslide hazard, we propose an uncertain ant colony clustering algorithm based on approximate backbone. First, it uses the Gauss point probability model to describe the uncertain data and measure their similarity. Second, we introduce pheromone redistribution and adaptive dynamic variables to update the local and global pheromones, improving the algorithm's search speed, and incorporate the genetic algorithm to prevent it from falling into a local optimum early. Finally, combining approximate backbone theory, we build an uncertain ant colony clustering algorithm model based on approximate backbone, which reduces the number of iterations and obtains the clustering solution rapidly. Experiments on real UCI datasets and landslide datasets from the Baota district of Yan'an show that the proposed method achieves higher clustering quality and a prediction accuracy of 93.3%, which verifies its feasibility. [ABSTRACT FROM AUTHOR]
Published: 2018
Full Text: View/download PDF

22. Research on a cloud data index mechanism combining segmented bitmaps and B+ trees [结合分段位图和B+树的云数据索引机制研究].

Author: 贺智明, 张慧云, and 毛伊敏
Abstract: To address the large storage space of index data in bitmap indexes and their low retrieval efficiency, this paper develops a cloud data index mechanism that combines a segmented bitmap and the B+ tree (BBI). When the index is created, BBI divides the data into several segments based on a fixed number and builds a bitmap index for each segment. This changes the factor that determines the amount of index data from the range of attribute values to the product of the number of segments and that fixed number, which greatly reduces the storage space of the index data. Furthermore, it builds a B+ tree on each data node, so that unnecessary computation on local nodes can be avoided according to the global distribution information. Retrieval efficiency is therefore greatly improved. The experimental results show that the BBI index is a better data index in cloud data index mechanisms. [ABSTRACT FROM AUTHOR]
Published: 2016
Full Text: View/download PDF
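
Illustration for record 22: the abstract describes splitting the data into segments and building a bitmap index per segment. The sketch below is one minimal reading of that idea, using Python integers as bitmaps over a single attribute column. The segment size, the function names, and the sample column are hypothetical, and the per-node B+ tree from the abstract is not modeled.

```python
def build_segmented_bitmaps(values, segment_size):
    """Split `values` into fixed-size segments; per segment, map each value to an int bitmask."""
    segments = []
    for start in range(0, len(values), segment_size):
        bitmaps = {}
        for offset, v in enumerate(values[start:start + segment_size]):
            # Set bit `offset` in the bitmap of value v for this segment.
            bitmaps[v] = bitmaps.get(v, 0) | (1 << offset)
        segments.append(bitmaps)
    return segments


def lookup(segments, segment_size, value):
    """Return the row positions whose attribute equals `value`."""
    rows = []
    for seg_no, bitmaps in enumerate(segments):
        mask = bitmaps.get(value, 0)  # a segment without the value contributes nothing
        offset = 0
        while mask:
            if mask & 1:
                rows.append(seg_no * segment_size + offset)
            mask >>= 1
            offset += 1
    return rows


# Hypothetical attribute column with three distinct values.
values = ["a", "b", "a", "c", "b", "a", "c"]
segments = build_segmented_bitmaps(values, segment_size=3)
rows_a = lookup(segments, 3, "a")  # [0, 2, 5]
```

In this layout a segment stores bitmaps only for the values that actually occur in it, so a segment that never contains a value costs nothing for that value.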

22 results on '"毛伊敏"'
