278 results on '"association rules"'
Search Results
2. Research on correlation factor analysis and prediction method of overhead transmission line defect state based on association rule mining and RBF-SVM
- Author
-
Zuming Yan, Xinghua Wang, Xiangang Peng, Yongbin Zeng, Xiaoye Liu, and Haoliang Yuan
- Subjects
Transmission lines ,Support vector machine ,Association rule learning ,Computer science ,020209 energy ,Defect state ,Decision tree ,Classification prediction ,02 engineering and technology ,Association rules ,computer.software_genre ,General Energy ,Electric power transmission ,020401 chemical engineering ,Transmission line ,0202 electrical engineering, electronic engineering, information engineering ,Feature (machine learning) ,Overhead (computing) ,lcsh:Electrical engineering. Electronics. Nuclear engineering ,State (computer science) ,Data mining ,0204 chemical engineering ,lcsh:TK1-9971 ,computer - Abstract
The effective assessment and prediction of the defect state of transmission lines can provide important technical support for their maintenance management. Existing analyses and predictions of transmission line defect state often rely on a single operating parameter and ignore the influence of internal and external factors such as meteorological and operating conditions. This paper therefore proposes a method for correlation factor analysis and prediction of transmission line defect state based on association rule mining and RBF-SVM. First, following the defect state assessment of transmission lines and based on existing data, a characteristic library of the defect state and its correlation factors is constructed by considering various relevant influencing factors. Then the FP-Growth algorithm is applied to association rule mining to find the internal and external factors that are strongly associated with defects; the association rules serve as input features of the prediction model, which avoids the influence of weakly associated factors on the accuracy of defect state prediction. Finally, RBF-SVM is used to predict the defect state and achieves better prediction accuracy than three commonly used methods: linear SVM, ANN, and the decision tree. The proposed approach is illustrated by predicting the defect state of an overhead transmission line in a certain area. The results verify the effectiveness of the method and provide a reference for the maintenance of transmission lines.
- Published
- 2021
3. A novel artificial intelligent approach: comparison of machine learning tools and algorithms based on optimization DEA Malmquist productivity index for eco-efficiency evaluation
- Author
-
Kamyar Kabirifar, Elham Shadkam, Mirpouya Mirmozaffari, Tayyebeh Asgari Gashteroodkhani, Seyyed Mohammad Khalili, and Reza Yazdani (affiliations: University of Texas at Arlington [Arlington], Khayyam University, Ferdowsi University Mashhad, University of New South Wales [Sydney] (UNSW), Islamic Azad University, and University of Guilan)
- Subjects
Artificial Intelligent ,Optimization ,[SPI.OTHER]Engineering Sciences [physics]/Other ,Association rule learning ,Eco-efficiency ,Process (engineering) ,Computer science ,020209 energy ,Strategy and Management ,02 engineering and technology ,010501 environmental sciences ,Association rules ,Machine learning ,computer.software_genre ,7. Clean energy ,01 natural sciences ,[INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI] ,0202 electrical engineering, electronic engineering, information engineering ,Data envelopment analysis ,Additive model ,Productivity ,0105 earth and related environmental sciences ,business.industry ,[INFO.INFO-RO]Computer Science [cs]/Operations Research [cs.RO] ,Classification ,Slack variable ,Statistical classification ,General Energy ,Data Envelopment Analysis ,13. Climate action ,Two-stage additive models ,Artificial intelligence ,business ,computer ,Algorithm - Abstract
Purpose: Cement, as one of the major components of construction activities, releases a tremendous amount of carbon dioxide (CO2) into the atmosphere, resulting in adverse environmental impacts and high energy consumption. Increasing demand for CO2 consumption has urged construction companies and decision-makers to consider ecological efficiency affected by CO2 consumption. Therefore, this paper aims to develop a method capable of analyzing and assessing the determining factors of eco-efficiency in Iran's 22 local cement companies over 2015–2019. Design/methodology/approach: This research uses two well-known artificial intelligence approaches, namely optimization data envelopment analysis (DEA) and machine learning algorithms, at the first and second steps, respectively. To find the superior model, the CCR model, the BCC model and additive DEA models are used to measure the efficiency of decision processes. Radial models measure efficiency through a proportional decrease or increase of inputs/outputs and neglect slacks, which is a critical limitation. Thus, the additive model, a well-known non-proportional and non-radial DEA model that considers desirable and undesirable outputs, is used to solve the problem. Additive models measure efficiency via slack variables. Considering both input and output orientation is one of the main advantages of the additive model. Findings: After applying the proposed model, the Malmquist productivity index is computed to evaluate the productivity of companies over 2015–2019. Although DEA is a well-regarded evaluation method, it fails to extract unknown information. Thus, machine learning algorithms play an important role in this step. Association rules are used to extract hidden rules and to introduce the three strongest rules. Finally, three data mining classification algorithms in three different tools are applied to identify the superior algorithm and tool. A new model that converts the two-stage process to a single stage is proposed to obtain the eco-efficiency of the whole system. This model is proposed to fix the efficiency of a two-stage process and prevent dependency on various weights. Undesirable outputs and desirable inputs are converted to final desirable inputs in the single-stage model in order to minimize inputs, and desirable outputs are turned into final desirable outputs in order to maximize outputs, which has a positive effect on the efficiency of the whole process. Originality/value: The proposed approach provides a chance to recognize patterns of the whole system by combining DEA and data mining techniques over the selected period (five years from 2015 to 2019). Meanwhile, because the cement industry is one of the foremost manufacturers of naturally harmful material with an undesirable by-product, specific stress is given to pollution control investment, or undesirable output, while evaluating energy use efficiency. The study concentrates on responding to five preliminary questions.
- Published
- 2021
4. Sustainable Development of College and University Education by use of Data Mining Methods
- Author
-
Liwen Wang and Soo-Jin Chung
- Subjects
Scheme (programming language) ,Education reform ,Apriori algorithm ,sustainable development ,Warning system ,Association rule learning ,lcsh:T58.5-58.64 ,apriori algorithm ,lcsh:Information technology ,Teaching method ,Mathematical statistics ,General Engineering ,data mining ,Education ,association rules ,students-oriented ,Mathematics education ,ComputingMilieux_COMPUTERSANDEDUCATION ,lcsh:L ,computer ,Decision tree model ,computer.programming_language ,lcsh:Education - Abstract
To improve the education efficiency of students, a student-centered education plan is explored. First, the Apriori algorithm for association rules is used to mine potentially related patterns in the score data of college students and to establish a reasonable teaching method. Second, aided by the decision tree model, the factors affecting students' academic performance and the potential relationships between different courses are studied. Finally, the Apriori algorithm combined with the decision tree model is used to generate an early warning mechanism for students' achievement, and the course performance of college students is empirically analyzed. The results show that the C language has a two-sided dependence on many subjects, and that courses are linked in the chain higher mathematics → linear algebra → mathematical statistics → computer composition principle → computer network. The teaching scheme C language → C++ → Java conforms better to the learning mechanism of college students. Empirical analysis shows that the early warning mechanism built on the Apriori algorithm and the decision tree model can effectively analyze students' courses and predict students' achievement. The proposed method can provide a theoretical basis for students, teachers, and university administrators to carry out education reform and education management decision-making, improve students' performance and education quality, and realize the "student-oriented" education concept, so it can be applied to actual education management.
- Published
- 2021
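As a minimal illustration of the Apriori step named in the abstract above, the sketch below mines frequent course combinations from a toy data set. It is not the authors' implementation: the `records` data (each set lists courses in which a hypothetical student scored well), the `apriori` function name, and the `min_support` value of 0.6 are all assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    current = {s for s in current if support(s) >= min_support}

    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


# Hypothetical data: courses in which each of five students scored well.
records = [
    {"C", "C++", "Java"},
    {"C", "C++"},
    {"C", "Java"},
    {"C++", "Java"},
    {"C", "C++", "Java"},
]
frequent = apriori(records, min_support=0.6)
for itemset, sup in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

The prune step is what distinguishes Apriori from brute-force enumeration: a candidate is only counted against the data if all of its smaller subsets were already found frequent.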
5. Research of insomnia on traditional Chinese medicine diagnosis and treatment based on machine learning
- Author
-
Tao Liu, Zechen Li, Dongdong Yang, Shan Liang, Yu Fang, Yuqi Tang, and Shanshan Gao
- Subjects
0301 basic medicine ,Insomnia ,Association rule learning ,Computer science ,Sample (statistics) ,Traditional Chinese medicine ,Machine learning ,computer.software_genre ,Association rules ,03 medical and health sciences ,0302 clinical medicine ,Cluster analysis ,Diagnosis ,Acupuncture ,medicine ,Medical prescription ,Pharmacology ,business.industry ,Research ,lcsh:Other systems of medicine ,lcsh:RZ201-999 ,Hierarchical clustering ,Random forest ,TCM ,030104 developmental biology ,Complementary and alternative medicine ,Artificial intelligence ,medicine.symptom ,business ,computer ,030217 neurology & neurosurgery - Abstract
Background: Insomnia, as one of the dominant diseases of traditional Chinese medicine (TCM), has been extensively studied in recent years. To explore novel approaches to research on TCM diagnosis and treatment, this paper presents a strategy for the study of insomnia based on machine learning. Methods: First, 654 insomnia cases were collected from an experienced TCM doctor as sample data. Second, in light of the characteristics of TCM diagnosis and treatment, the contents of the research samples were divided into four parts: the basic information, the four diagnostic methods, the treatment based on syndrome differentiation, and the main prescription. These four parts were then analyzed by three methods: frequency analysis, association rules, and hierarchical cluster analysis. Finally, a comprehensive study of all four parts was conducted with a random forest. Results: Analysis of the four parts revealed some essential connections. Based on the model established by the random forest, the accuracy of predicting the main prescription from the combination of the four diagnostic methods and the treatment based on syndrome differentiation was 0.85. Furthermore, feature extraction with the random forest showed that the syndrome differentiation of the five zang-organs was the most significant parameter of TCM diagnosis and treatment. Conclusions: The results indicate that machine learning methods are worth adopting to study the dominant diseases of TCM and to explore the crucial rules of diagnosis and treatment.
- Published
- 2021
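6. Discovering Web Users' Information Retrieval and Visit Patterns with Web Mining: The Case of Kırklareli University [Web Kullanıcılarının Bilgi Erişim ve Ziyaret Desenlerinin Web Madenciliği ile Keşfi: Kırklareli Üniversitesi Örneği]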
- Author
-
Çiğdem Selçukcan Erol and Veli Özcan Budak
- Subjects
General Computer Science ,Association rule learning ,web usage mining ,Computer science ,020209 energy ,Communication. Mass media ,QA75.5-76.95 ,02 engineering and technology ,P87-96 ,association rules ,World Wide Web ,apriori ,Web mining ,Electronic computers. Computer science ,0202 electrical engineering, electronic engineering, information engineering ,T1-995 ,020201 artificial intelligence & image processing ,information retrieval ,Technology (General) - Abstract
Web sites are an interaction tool through which first contact is made with the target audience, whether institutional or individual. During periods of heavy information retrieval and visit traffic, this tool holds significant potential for detecting different patterns in user behavior. These patterns can play a critical role in clarifying user needs and in enabling site developers to make updates in line with those needs. The aim of this study is to determine how the information retrieval needs of users of Kırklareli University's web sites changed during the period in which the worldwide Covid-19 pandemic intensified in Turkey. To this end, users' information retrieval and visit behaviors were examined with the Apriori algorithm, both independently and jointly, in order to reveal the relationships between them. With respect to information retrieval, the results showed that users tried to meet their information needs with various search terms related to "thesis writing". This result indicates that graduate students in particular were active during this period. With respect to visit behavior, pages on the themes of "distance education", "coronavirus" and "holiday" were visited most. As for visit behaviors that followed information retrieval behaviors, co-occurrences of visits on the themes of "thesis writing", "holiday" and "postponement of education" stood out. The behavior patterns uncovered by the study, and suggestions on how these patterns can be used, are explained in detail within the study.
- Published
- 2020
7. Machine Learning-Based HIV Risk Estimation Using Incidence Rate Ratios
- Author
-
Oliver Haas, Andreas Maier, and Eva Rothgang
- Subjects
Medicine (General) ,bias ,QH471-489 ,Association rule learning ,Population ,Rate ratio ,Machine learning ,computer.software_genre ,association rules ,R5-920 ,Acquired immunodeficiency syndrome (AIDS) ,risk estimation ,Medicine ,Medical diagnosis ,education ,Estimation ,education.field_of_study ,Receiver operating characteristic ,business.industry ,Reproduction ,HIV ,General Medicine ,Emergency department ,medicine.disease ,machine learning ,clinical data ,Artificial intelligence ,ddc:620 ,business ,computer - Abstract
HIV/AIDS is an ongoing global pandemic, with an estimated 39 million infected worldwide. Early detection is anticipated to help improve outcomes and prevent further infections. Point-of-care diagnostics make HIV/AIDS diagnoses available both earlier and to a broader population. Wide-spread and automated HIV risk estimation can offer objective guidance. This supports providers in making an informed decision when considering patients with high HIV risk for HIV testing or pre-exposure prophylaxis (PrEP). We propose a novel machine learning method that allows providers to use the data from a patient's previous stays at the clinic to estimate their HIV risk. All variables available in the clinical data are considered, making the set of variables objective and independent of expert opinions. The proposed method builds on association rules that are derived from the data. The incidence rate ratio (IRR) is determined for each rule. Given a new patient, the average IRR of all applicable rules is used to estimate their HIV risk. The method was tested and validated on the publicly available clinical database MIMIC-IV, which consists of around 525,000 hospital stays that included a stay at the intensive care unit or emergency department. We evaluated the method using the area under the receiver operating characteristic curve (AUC). The best performance with an AUC of 0.88 was achieved with a model consisting of 78 rules. A threshold value of 1.0, i.e. an IRR that denotes no association, leads to a sensitivity of 98% and a specificity of 51%. The rules were grouped into social factors (e.g. homelessness, violence), drug abuse, psychological illnesses (e.g. depression, PTSD), previously known associations (e.g. pulmonary, neurological diseases), and new associations (e.g. diabetes, insulin uptake). In conclusion, we propose a novel HIV risk estimation method that builds on existing clinical data. 
It incorporates a wide range of variables, leading to a model that is independent of expert opinions. It supports providers in making informed decisions in the point-of-care diagnostics process by estimating a patient's HIV risk.
- Published
- 2021
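The scoring idea in the abstract above (one incidence rate ratio per rule, averaged over the rules that apply to a new patient, compared against a threshold of 1.0) can be sketched as follows. This is only an illustration under stated assumptions: the rule antecedents, the IRR values, the person-time inputs, and the fallback score of 1.0 when no rule applies are all hypothetical and are not taken from the paper or from MIMIC-IV.

```python
from statistics import mean


def incidence_rate_ratio(cases_exposed, time_exposed, cases_unexposed, time_unexposed):
    """IRR = incidence rate among the exposed / incidence rate among the unexposed."""
    return (cases_exposed / time_exposed) / (cases_unexposed / time_unexposed)


def risk_score(patient_features, rules, default=1.0):
    """Average the IRR of every rule whose antecedent is contained in the patient's features.

    `default` (assumed here) is returned when no rule applies; 1.0 denotes no association.
    """
    applicable = [irr for antecedent, irr in rules if antecedent <= patient_features]
    return mean(applicable) if applicable else default


# Hypothetical rules: (antecedent feature set, IRR of that rule).
rules = [
    (frozenset({"homelessness"}), 4.0),
    (frozenset({"depression", "drug_abuse"}), 2.0),
    (frozenset({"diabetes"}), 0.5),
]

patient = {"homelessness", "diabetes"}
score = risk_score(patient, rules)
flagged = score > 1.0  # threshold of 1.0, as in the abstract

# Hypothetical counts: 30 cases over 1000 exposed person-years,
# 10 cases over 2000 unexposed person-years.
example_irr = incidence_rate_ratio(30, 1000, 10, 2000)
print(score, flagged, example_irr)
```

A rule counts as applicable only when all of its antecedent features are present, which is why the two-feature rule does not contribute for this patient.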
8. An Association Rules-Based Method for Outliers Cleaning of Measurement Data in the Distribution Network
- Author
-
He Mi, Cheng Guo, Xian Meng, Xin He, Ruimin Duan, Hua Kuang, and Risheng Qin
- Subjects
DBSCAN ,Economics and Econometrics ,Association rule learning ,Computer science ,Energy Engineering and Power Technology ,computer.software_genre ,General Works ,association rules ,Set (abstract data type) ,outliers cleaning ,outliers repairing ,distribution network ,Reliability (statistics) ,Mahalanobis distance ,measurement data ,Renewable Energy, Sustainability and the Environment ,InformationSystems_DATABASEMANAGEMENT ,ComputingMethodologies_PATTERNRECOGNITION ,Fuel Technology ,Data quality ,Outlier ,outliers detection ,Noise (video) ,Data mining ,computer - Abstract
For any power system, the reliability of measurement data is essential in operation, management and planning. However, measurement data are inevitably prone to outliers, which may affect the results of data-based applications. To improve data quality, this paper studies an outlier cleaning method for measurement data in the distribution network. The method is based on a set of association rules (AR) that are automatically generated from historical measurement data. First, the association rules are mined using density-based spatial clustering of applications with noise (DBSCAN), k-means and the Apriori technique to detect outliers. Then, for the repair step that follows outlier detection, the proposed method uses a distance-based model to calculate the repair cost of outliers, which describes the similarity between outliers and normal data. In addition, the Mahalanobis distance is employed in the repair cost function to reduce errors, which enables precise outlier cleaning of measurement data in the distribution network. Test results on simulated datasets with artificial errors verify the superiority of the proposed method for outlier detection and repair.
- Published
- 2021
9. Multi-Objective Optimization for High-Dimensional Maximal Frequent Itemset Mining
- Author
-
Xuan Ma, Hisakazu Ogura, Yalong Zhang, Dongfen Ye, and Wei Yu
- Subjects
Technology ,Association rule learning ,Computer science ,QH301-705.5 ,QC1-999 ,Big data ,frequent itemset mining ,High dimensional ,Space (commercial competition) ,computer.software_genre ,Multi-objective optimization ,Set (abstract data type) ,association rules ,big data ,General Materials Science ,Biology (General) ,Instrumentation ,QD1-999 ,Fluid Flow and Transfer Processes ,maximal frequent itemset ,business.industry ,Process Chemistry and Technology ,Physics ,General Engineering ,InformationSystems_DATABASEMANAGEMENT ,Engineering (General). Civil engineering (General) ,Computer Science Applications ,Running time ,Chemistry ,multi-objective optimization ,A priori and a posteriori ,Data mining ,TA1-2040 ,business ,computer - Abstract
The solution space of frequent itemsets generally grows exponentially because of the high-dimensional attributes of big data. However, big data association rule analysis presupposes mining the frequent itemsets in high-dimensional transaction sets. Classical algorithms such as Apriori and FP-Growth, as well as their derivatives, are unacceptable for practical big data analysis in such an explosive solution space because of their huge consumption of storage space and running time. A multi-objective optimization algorithm is proposed to mine the frequent itemsets of high-dimensional data. First, all frequent 2-itemsets are generated by scanning the transaction sets; new items are then added to them, and the resulting itemsets serve as the objects of population evolution. The algorithm searches for maximal frequent itemsets in order to gather more non-empty subsets, because all non-empty subsets of a frequent itemset are themselves frequent. During the run, lethal gene fragments in individuals are recorded and eliminated so that individuals may resurge. Finally, the Pareto optimal solution set of frequent itemsets is obtained. All non-empty subsets of these solutions are frequent itemsets, and all of their supersets are non-frequent itemsets. The practicability and validity of the proposed algorithm on big data were demonstrated by experiments.
- Published
- 2021
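The property the abstract above relies on (every non-empty subset of a maximal frequent itemset is frequent, and every proper superset is not) can be checked on a toy example. The sketch below uses brute-force enumeration, not the multi-objective evolutionary search the paper proposes; the `transactions` data, the function names, and the `min_count` threshold of 2 are assumptions made for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_count):
    """Brute-force enumeration of all itemsets contained in >= min_count transactions."""
    items = sorted({i for t in transactions for i in t})
    result = set()
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = frozenset(combo)
            if sum(s <= t for t in transactions) >= min_count:
                result.add(s)
    return result


def maximal_only(frequent):
    """Keep the frequent itemsets that have no frequent proper superset."""
    return {s for s in frequent if not any(s < other for other in frequent)}


# Hypothetical transactions over the items a, b, c, d.
transactions = [
    frozenset("abc"),
    frozenset("abc"),
    frozenset("ab"),
    frozenset("cd"),
    frozenset("cd"),
]
frequent = frequent_itemsets(transactions, min_count=2)
maximal = maximal_only(frequent)
print(sorted(sorted(s) for s in maximal))
```

Because the maximal itemsets determine all other frequent itemsets through their subsets, storing only `maximal` is a compact summary of `frequent`, which is the motivation for searching for them directly in high-dimensional data.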
10. Big Data-Driven Abnormal Behavior Detection in Healthcare Based on Association Rules
- Author
-
Hui Yang, Runtong Zhang, Donghua Chen, Jie He, and Shengyao Zhou
- Subjects
General Computer Science ,Association rule learning ,Computer science ,Big data ,02 engineering and technology ,Disease cluster ,abnormal behavior ,association rules ,03 medical and health sciences ,Order (exchange) ,Health care ,0202 electrical engineering, electronic engineering, information engineering ,General Materials Science ,health care economics and organizations ,030304 developmental biology ,0303 health sciences ,business.industry ,General Engineering ,Medical insurance ,Risk analysis (engineering) ,Benchmark (computing) ,020201 artificial intelligence & image processing ,healthcare insurance ,lcsh:Electrical engineering. Electronics. Nuclear engineering ,Abnormality ,business ,lcsh:TK1-9971 - Abstract
Healthcare insurance fraud causes millions of dollars of losses to public healthcare funds around the world in various ways, which makes it very important to strengthen the management of medical insurance in order to guarantee the steady operation of medical insurance funds. Healthcare fraud detection methods can reduce the losses of healthcare insurance funds and improve medical quality. Existing fraud detection studies mostly focus on finding normal behavior patterns and treat those who violate them as fraudsters. However, fraudsters can often disguise themselves with some normal behaviors, such as consistent behaviors when they seek medical treatment. To address these issues, we combined a MapReduce distributed computing model and association rule mining to propose a medical cluster behavior detection algorithm based on frequent pattern mining. It can detect certain consistent behaviors of patients in medical treatment activities. By analyzing 1.5 million medical claim records, we have verified the effectiveness of the method. Experiments show that this method performs better than several benchmark methods.
- Published
- 2020
11. Cause Analysis of Traffic Accidents on Urban Roads Based on an Improved Association Rule Mining Algorithm
- Author
-
Qiuru Cai
- Subjects
Apriori algorithm ,General Computer Science ,Association rule learning ,Lift (data mining) ,Computer science ,Control (management) ,General Engineering ,causes of traffic accidents ,data mining ,association rules ,Transport engineering ,Multiple factors ,Cause analysis ,General Materials Science ,lcsh:Electrical engineering. Electronics. Nuclear engineering ,Dimension (data warehouse) ,Urban roads ,lcsh:TK1-9971 - Abstract
Traffic accidents on urban roads are the result of joint action among multiple factors, namely human, vehicle, road and environment. To identify the key causes of such accidents, it is necessary to mine the association rules between relevant risk factors from accident statistics. Considering the multiple layers and dimensions of accident data, this paper improves the Apriori algorithm to mine the association rules between risk factors and probes into the causes of traffic accidents on urban roads. According to the layer and dimension of specific attributes, parameters such as support, confidence and lift were adjusted to find the qualified association rules between risk factors. The results were further screened to obtain a series of meaningful association rules. The research results enable the traffic department to formulate pertinent accident control measures and promote traffic safety on urban roads.
- Published
- 2020
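The three parameters named in the abstract above have standard definitions: support(A → C) = P(A ∪ C), confidence(A → C) = P(A ∪ C) / P(A), and lift(A → C) = confidence / P(C). The sketch below computes them for a single rule; it shows the plain definitions only, not the paper's layer- and dimension-specific adjustment, and the accident records and factor names are hypothetical.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) of the rule antecedent -> consequent."""
    a = frozenset(antecedent)
    c = frozenset(consequent)
    n = len(transactions)
    count_a = sum(a <= t for t in transactions)          # records containing A
    count_c = sum(c <= t for t in transactions)          # records containing C
    count_ac = sum((a | c) <= t for t in transactions)   # records containing both
    if count_a == 0 or count_c == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    support = count_ac / n
    confidence = count_ac / count_a
    lift = confidence / (count_c / n)
    return support, confidence, lift


# Hypothetical accident records, each a set of observed risk factors and outcomes.
accidents = [
    frozenset({"rain", "speeding", "severe"}),
    frozenset({"rain", "severe"}),
    frozenset({"speeding"}),
    frozenset({"rain"}),
]
support, confidence, lift = rule_metrics(accidents, {"rain"}, {"severe"})
print(support, confidence, lift)
```

A lift above 1 indicates that the antecedent and consequent co-occur more often than they would if they were independent, which is why lift is commonly used to screen rules that pass the support and confidence thresholds.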
12. Exploring the Correlation Between Attention and Cognitive Load Through Association Rule Mining by Using a Brainwave Sensing Headband
- Author
-
Shu-Chen Cheng, Yu-Ping Cheng, Yueh-Min Huang, and You-Yi Chen
- Subjects
General Computer Science ,Association rule learning ,Internet of Things ,0206 medical engineering ,02 engineering and technology ,Electroencephalography ,association rules ,Correlation ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,medicine ,General Materials Science ,Baseline (configuration management) ,Wearable technology ,medicine.diagnostic_test ,business.industry ,General Engineering ,Cognition ,data mining ,lcsh:Electrical engineering. Electronics. Nuclear engineering ,business ,Psychology ,lcsh:TK1-9971 ,electroencephalography ,020602 bioinformatics ,Cognitive load ,Cognitive psychology - Abstract
In recent years, Internet of Things (IoT) technology has brought many applications and developments for wearable devices, and the use of non-invasive electroencephalography (EEG) instruments to measure attention has been a topic of discussion. However, the correlation between attention and cognitive load has rarely been analyzed by data mining. For this reason, this study used head-mounted non-invasive EEG instruments based on IoT technology to collect attention values related to two courses and extracurricular activities, and used a cognitive load questionnaire to investigate the cognitive loads of subjects. Correlation analysis was carried out through data mining technology to find the correlation between attention and cognitive load. In addition, six short-term experiments and relaxation experiments were designed to measure the subjects' maximum and minimum attention values, so as to propose a strategy for setting the attention baseline. According to the results of the various experiments, subjects suffering from overload showed a state of inattention during the whole activity, while subjects with a high load showed low sustained attention; only subjects with a medium load showed high sustained attention. Subjects with a low load showed inattention for nearly the entire activity. In this study, a strategy for setting an attention baseline was proposed to normalize the attention values from different EEG instruments. The correlation between attention value and cognitive load was analyzed using association rule mining so that, in the future, changes in cognitive load can be estimated effectively by measuring the attention value instead of using a questionnaire.
- Published
- 2020
13. Power System Fault Classification and Prediction Based on a Three-Layer Data Mining Structure
- Author
-
Yunliang Wang, Xiaodong Wang, Yanjuan Wu, and Yannan Guo
- Subjects
General Computer Science ,Association rule learning ,Computer science ,020209 energy ,02 engineering and technology ,Association rules ,Fault (power engineering) ,computer.software_genre ,Data modeling ,Electric power system ,Local optimum ,Linear regression ,power system fault ,0202 electrical engineering, electronic engineering, information engineering ,General Materials Science ,Cluster analysis ,K-means ,stochastic gradient descent algorithm ,Training set ,General Engineering ,Regression analysis ,data mining ,TK1-9971 ,machine learning ,Stochastic gradient descent ,020201 artificial intelligence & image processing ,Electrical engineering. Electronics. Nuclear engineering ,Data mining ,Fault model ,computer - Abstract
With traditional fault diagnosis methods in power systems, it is difficult to accurately classify and predict the types of faults. With the emergence of big data technology, fault classification and prediction methods based on big data analysis and processing have been applied in power systems. To make the classification and prediction of fault types more accurate, this paper proposes a hybrid data mining method for power system fault classification and prediction based on clustering, association rules and stochastic gradient descent. The method uses a three-layer data mining model. The first layer uses the K-means clustering algorithm to preprocess the original fault data source and proposes self-encoding to simplify the data form. The second layer uses association rules to eliminate the data that have little impact on the prediction results, and the highly correlated data are mined to become the regression training data. The third layer first uses cross-validation to obtain the optimal parameters of each fault model and then uses stochastic gradient descent for regression training to obtain a classification and prediction model for each fault type. Finally, a verification example shows that, compared with a single data mining algorithm model, the proposed method is more competitive in terms of data mining, and the established power system fault classification and prediction model has global optimality and higher prediction accuracy, which makes it feasible for real-time online power system fault classification and prediction. The method reduces the disturbance from low-impact or irrelevant data by mining the fault data three times, and it uses cross-validation to optimize the multiple regression parameters of the regression model, addressing the problems of low accuracy, large errors and a tendency to fall into a local optimum in fault classification and prediction.
- Published
- 2020
14. High-Frequency Path Mining-Based Reward and Punishment Mechanism for Multi-Colony Ant Colony Optimization
- Author
-
Han Pan, Xiaoming You, and Sheng Liu
- Subjects
0209 industrial biotechnology ,Mathematical optimization ,General Computer Science ,Association rule learning ,Computer science ,02 engineering and technology ,minimum spanning tree ,Minimum spanning tree ,ComputingMethodologies_ARTIFICIALINTELLIGENCE ,association rules ,020901 industrial engineering & automation ,Local optimum ,0202 electrical engineering, electronic engineering, information engineering ,General Materials Science ,Motion planning ,Cluster analysis ,path planning ,gaussian filter ,Reward and punishment mechanism ,Lift (data mining) ,Ant colony optimization algorithms ,General Engineering ,Pheromone ,020201 artificial intelligence & image processing ,lcsh:Electrical engineering. Electronics. Nuclear engineering ,lcsh:TK1-9971 - Abstract
To address the tendency of the traditional ant colony algorithm to fall into local optima and its poor convergence speed, this paper proposes a High-frequency path mining-based Reward and Punishment mechanism for multi-colony Ant Colony Optimization (HRPACO). First, the pheromone concentration on paths with an effective strong association is rewarded adaptively according to the lift of association rules to accelerate convergence. Second, the pheromone concentration on the path of the minimum spanning tree is punished adaptively according to the support of association rules to improve the diversity of the colony. The interaction of the reward and punishment mechanisms can effectively balance diversity and convergence. Finally, a self-evolutionary mechanism based on a Gaussian filter is proposed to adaptively adjust the pheromone concentration by dynamic smoothing of the pheromone matrix, so as to help the colony jump out of local optima. The TSP is used to verify the performance of the algorithm. The simulation results show that the proposed algorithm can effectively accelerate convergence and improve solution accuracy, especially for large-scale problems. In addition, path planning is used to verify the feasibility of the proposed algorithm. The simulation results show that the algorithm can find an effective and better path even in environments with complex obstacles.
- Published
- 2020
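The abstract above relies on the support and lift of association rules. As a rough illustration of those two measures only, the sketch below computes them over a toy set of transactions. Treating each ant tour as a transaction of edge labels is an assumption made for the example; the listing does not say how HRPACO encodes paths, and the names `tours`, `support` and `lift` are hypothetical.

```python
from typing import FrozenSet, Iterable, List


def support(transactions: List[FrozenSet[str]], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    wanted = frozenset(itemset)
    hits = sum(1 for t in transactions if wanted <= t)
    return hits / len(transactions)


def lift(transactions: List[FrozenSet[str]],
         antecedent: Iterable[str],
         consequent: Iterable[str]) -> float:
    """lift(A -> B) = support(A and B) / (support(A) * support(B))."""
    a, b = frozenset(antecedent), frozenset(consequent)
    sa, sb = support(transactions, a), support(transactions, b)
    if sa == 0 or sb == 0:
        return 0.0  # rule is undefined when either side never occurs
    return support(transactions, a | b) / (sa * sb)


# Hypothetical data: each tour is the set of edges it used.
tours = [
    frozenset({"AB", "BC", "CD"}),
    frozenset({"AB", "BC", "DA"}),
    frozenset({"AB", "CD"}),
    frozenset({"BC", "DA"}),
]

value = lift(tours, {"AB"}, {"BC"})
print(support(tours, {"AB"}), value)
```

A lift above 1 means the two sides co-occur more often than independence would predict; here the value is below 1, so edges `AB` and `BC` appear together slightly less often than chance.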
15. Discovery of Frequent Patterns of Episodes Within a Time Window for Alarm Management Systems
- Author
-
Adel Hidri, Minyar Sassi Hidri, and Ahmed Selmi
- Subjects
General Computer Science ,Association rule learning ,Computer science ,020209 energy ,media_common.quotation_subject ,alarm management ,Sequential pattern mining ,02 engineering and technology ,Machine learning ,computer.software_genre ,Field (computer science) ,Adaptability ,association rules ,020401 chemical engineering ,Alarm management ,0202 electrical engineering, electronic engineering, information engineering ,General Materials Science ,0204 chemical engineering ,media_common ,business.industry ,Scale (chemistry) ,General Engineering ,data mining ,artificial intelligence ,Marketing strategy ,Analytics ,Artificial intelligence ,lcsh:Electrical engineering. Electronics. Nuclear engineering ,Line (text file) ,business ,computer ,lcsh:TK1-9971 - Abstract
The field of sequential pattern mining is expanding through numerous studies and has a large number of applications, such as language processing, alarm management, and event management on a broader scale. Its use began with processing item baskets to learn patterns and direct marketing strategy, but it has since been generalized to telecommunication alarm management in several works. Our work is in line with this, as it tries to locate and identify patterns in order to make predictive statements about certain patterns. It centers on providing a way to break sequences into episodes and assigning them confidence and support values, more precisely on the discovery of frequent patterns of episodes within a time window. Experimental results have shown the effectiveness of our sequential pattern mining approach and its adaptability to alarm management and analytics.
- Published
- 2020
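The idea of breaking an alarm sequence into episodes inside a time window and scoring them with support and confidence can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' algorithm: windows are fixed and non-overlapping, order inside a window is ignored, and the event data and function names are hypothetical.

```python
from collections import defaultdict
from typing import FrozenSet, Iterable, List, Tuple


def windows_from_events(events: Iterable[Tuple[float, str]],
                        width: float) -> List[FrozenSet[str]]:
    """Group (timestamp, alarm) pairs into non-overlapping windows of `width`."""
    buckets = defaultdict(set)
    for timestamp, alarm in events:
        buckets[int(timestamp // width)].add(alarm)
    # Windows without any event are not represented at all.
    return [frozenset(buckets[k]) for k in sorted(buckets)]


def episode_support(windows: List[FrozenSet[str]], episode: Iterable[str]) -> float:
    """Fraction of windows in which all alarms of the episode occur."""
    wanted = frozenset(episode)
    return sum(1 for w in windows if wanted <= w) / len(windows)


def episode_confidence(windows: List[FrozenSet[str]],
                       antecedent: Iterable[str],
                       consequent: Iterable[str]) -> float:
    """confidence(A -> B) = support(A and B) / support(A)."""
    a, b = frozenset(antecedent), frozenset(consequent)
    sa = episode_support(windows, a)
    return 0.0 if sa == 0 else episode_support(windows, a | b) / sa


# Hypothetical alarm log: (seconds since start, alarm type).
events = [(0, "A"), (3, "B"), (12, "A"), (15, "C"), (21, "A"), (25, "B"), (34, "C")]
windows = windows_from_events(events, width=10)
conf_a_b = episode_confidence(windows, {"A"}, {"B"})
print(windows, episode_support(windows, {"A", "B"}), conf_a_b)
```

Sliding (overlapping) windows and ordered episodes would need a different counting scheme; the fixed buckets here only keep the example short.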
16. A Combined Approach for Customer Profiling in Video on Demand Services Using Clustering and Association Rule Mining
- Author
-
Serhat Peker, Cigdem Turhan, and Sinem Guney
- Subjects
Apriori algorithm ,General Computer Science ,Association rule learning ,business.industry ,Computer science ,General Engineering ,Customer segmentation ,IPTV ,data mining ,Service provider ,computer.software_genre ,Marketing strategy ,association rules ,VoD services ,Market segmentation ,Profiling (information science) ,General Materials Science ,RFM model ,Data mining ,lcsh:Electrical engineering. Electronics. Nuclear engineering ,business ,Cluster analysis ,computer ,lcsh:TK1-9971 ,clustering - Abstract
The purpose of this paper is to propose a combined data mining approach for analyzing and profiling customers in video on demand (VoD) services. The proposed approach integrates clustering and association rule mining. For customer segmentation, the LRFMP model is employed alongside the k-means algorithm, and the Apriori algorithm is used to generate association rules between the identified customer groups and content genres. The applicability of the proposed approach is demonstrated on real-world data obtained from an Internet protocol television (IPTV) operator. In this way, four main customer groups are identified: “high consuming-valuable subscribers”, “less consuming subscribers”, “less consuming-loyal subscribers” and “disloyal subscribers”. For each group of customers, a different marketing strategy or action is proposed, mainly campaigns, special-day promotions, discounted materials, offering favorite content, etc. Further, genres preferred by these customer segments are extracted using the Apriori algorithm. The results obtained from this case study also show that the proposed approach provides an efficient tool to form different customer segments with specific content rental characteristics, and to generate useful association rules for these distinct groups. The proposed combined approach would help IPTV service providers implement effective CRM and customer-based marketing strategies.
- Published
- 2020
17. Sensing the Web for Induction of Association Rules and their Composition through Ensemble Techniques
- Author
-
Giovanni Pilato, Filippo Vella, Ignazio Infantino, and Agnese Augello
- Subjects
World Wide Web ,Boosting (machine learning) ,Association rule learning ,Computer science ,0202 electrical engineering, electronic engineering, information engineering ,General Earth and Planetary Sciences ,020206 networking & telecommunications ,020201 artificial intelligence & image processing ,Association Rules ,Web Sensing ,Emergency ,Big Data ,Boosting ,Ensemble techniques ,02 engineering and technology ,General Environmental Science - Abstract
Starting from geophysical data collected from heterogeneous sources, such as meteorological stations and information gathered from the web, we seek unknown connections between the sampled values through the extraction of association rules. These rules imply the co-occurrence of two or more symbols in the same representation, and the rule confidence may vary according to the collected data. Starting from traditional algorithms such as FP-Growth and Apriori, we propose the creation of complex association rules through boosting of simpler ones. The composition yields rules that are robust and allows a larger number of interesting rules to emerge.
- Published
- 2020
18. A Novel Association Rule-Based Data Mining Approach for Internet of Things Based Wireless Sensor Networks
- Author
-
Sohail Abbas, Walid Osamy, Ahmed Salim, and Ahmed M. Khedr
- Subjects
Scheme (programming language) ,distributed databases ,General Computer Science ,Association rule learning ,Computer science ,Internet of Things ,Stability (learning theory) ,02 engineering and technology ,computer.software_genre ,Association rules ,Base station ,0202 electrical engineering, electronic engineering, information engineering ,Overhead (computing) ,General Materials Science ,computer.programming_language ,network lifetime ,business.industry ,General Engineering ,Volume (computing) ,020206 networking & telecommunications ,stability ,wireless sensor networks based-clustering ,Analytics ,020201 artificial intelligence & image processing ,Data mining ,lcsh:Electrical engineering. Electronics. Nuclear engineering ,business ,Wireless sensor network ,computer ,lcsh:TK1-9971 - Abstract
The Wireless Sensor Network (WSN) is one of the fundamental technologies used in the Internet of Things (IoT) and is deployed in diverse applications to carry out precise real-time observations. The limited resources of WSNs, combined with the massive volume of fast-flowing IoT data, make the aggregation and analysis of data more challenging. Recently, data mining-based solutions have been proposed to effectively handle the data generated by the sensors and to analyze the data patterns in order to deduce the required information. The increasing need for these techniques motivated us to propose a distributed and efficient data mining technique that not only handles the massive and rapidly generated data at the nodes, but also increases the life span of the network. In this paper, we propose a novel scheme for the IoT-based WSN that mines the sensor data using association rules without moving it to any Cluster Head (CH) or Base Station (BS). The proposed scheme enables sensors to perform computations locally, and only the minimum higher-level statistical summaries of the data at Cluster Members (CMs) are exchanged with their CH. This considerably reduces the communication overhead, which ultimately prolongs the network lifetime. The proposed scheme is evaluated via extensive simulations, and the results demonstrate that integrating the proposed scheme into existing protocols significantly reduces the communication overhead, which ultimately prolongs the network lifetime and stability.
- Published
- 2020
19. A Study On Profiling Students via Data Mining
- Author
-
Mustafa Temiz and Mehmet Ali Alan
- Subjects
Apriori algorithm ,Operations Research and Management Science ,Association rule learning ,lcsh:T55.4-60.8 ,Computer science ,Big data ,data warehouse ,lcsh:Business ,computer.software_genre ,Personalization ,association rules ,Management of Technology and Innovation ,ComputingMilieux_COMPUTERSANDEDUCATION ,Profiling (information science) ,lcsh:Industrial engineering. Management engineering ,student profile ,business.industry ,data mining ,Data warehouse ,Data Mining,Association Rules,Student Profile,Data Warehouse ,Financial transaction ,Data analysis ,Data mining ,business ,lcsh:HF5001-6182 ,computer ,Yöneylem, Araştırma ve Yönetim Bilimi - Abstract
Data mining is a significant method used to reveal hidden patterns and connections within big data. It is used in various fields such as financial transactions, banking, education, the health sector, logistics and security. Association rule mining, one of the basic methods of data mining, is most often applied to analyse customers' consumption habits, but it is also used to profile patients and students. Just as customizing for a customer is of high significance, so is distinguishing and customizing for a student. In this study, we attempted to profile students by mining the student data of a high school. A set of attributes that can directly affect student performance, such as health conditions, financial resources, living standards and the education level of the families, was taken into consideration. For that purpose, following analysis of the data of 443 students in the database, a data warehouse was established. The Apriori algorithm, one of the popular algorithms of association rule mining, was used for the data analysis. The Apriori algorithm produced 72 rules that are accurate above 90%. It is thought that the produced rules can help in profiling the students and can contribute to the work of school management, teachers, parents and students.
- Published
- 2019
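To make the Apriori step in the abstract above concrete, here is a minimal, unoptimized sketch of level-wise frequent itemset mining followed by rule generation with a confidence cut-off. Reading "accurate above 90%" as a confidence threshold of 0.9 is an assumption, and the student attributes in `records` are invented for illustration, not taken from the study.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Itemset = FrozenSet[str]


def apriori(transactions: List[Itemset], min_support: float) -> Dict[Itemset, float]:
    """Return every itemset whose support is at least `min_support`."""
    n = len(transactions)
    current = {frozenset([item]) for t in transactions for item in t}
    frequent: Dict[Itemset, float] = {}
    k = 1
    while current:
        # Count each candidate and keep those meeting the support threshold.
        level = {}
        for cand in current:
            sup = sum(1 for t in transactions if cand <= t) / n
            if sup >= min_support:
                level[cand] = sup
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {c for c in joined
                   if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent


def rules(frequent: Dict[Itemset, float],
          min_confidence: float) -> List[Tuple[Itemset, Itemset, float, float]]:
    """Yield (antecedent, consequent, support, confidence) for strong rules."""
    found = []
    for itemset, sup in frequent.items():
        for size in range(1, len(itemset)):
            for ante in map(frozenset, combinations(sorted(itemset), size)):
                conf = sup / frequent[ante]  # subsets of frequent sets are frequent
                if conf >= min_confidence:
                    found.append((ante, itemset - ante, sup, conf))
    return found


# Hypothetical student records, one set of attribute=value items per student.
records = [
    frozenset({"income=low", "parent_edu=primary", "grade=low"}),
    frozenset({"income=low", "parent_edu=primary", "grade=low"}),
    frozenset({"income=high", "parent_edu=university", "grade=high"}),
    frozenset({"income=high", "parent_edu=university", "grade=high"}),
    frozenset({"income=low", "parent_edu=university", "grade=high"}),
]

strong = rules(apriori(records, min_support=0.4), min_confidence=0.9)
for ante, cons, sup, conf in strong:
    print(sorted(ante), "->", sorted(cons), f"support={sup:.2f} confidence={conf:.2f}")
```

On real data a library implementation would be preferable; the point here is only the join, prune and confidence-filter structure.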
20. Mining Algorithm for Weighted FP-Growth Frequent Item Sets based on Ordered FP-Tree
- Author
-
Shaohong Yin and Yuanyuan Li
- Subjects
China ,Association rule learning ,Computer science ,Computer Science::Information Retrieval ,Weighted Model ,General Engineering ,Downward closure property ,Ordered FPTree ,Weighted Ordered FP-Tree ,Space (commercial competition) ,computer.software_genre ,Data mining algorithm ,Association Rules ,Tree (data structure) ,Data Mining ,Data mining ,computer - Abstract
The FP-growth algorithm is a classic algorithm for mining frequent item sets, but it has certain disadvantages when mining weighted frequent item sets. Based on the weighted downward closure property of the weighted model, this paper proposes a method that reduces the use of storage space by constructing a weighted ordered FP-tree, so as to improve the generation efficiency of weighted frequent item sets.
- Published
- 2019
21. Discovering hidden patterns in Turkish construction projects delays related to project characteristic
- Author
-
İsmail Cengiz Yılmaz and Ezgi Kazan
- Subjects
Association rule learning ,Computer science ,Turkish ,Delay analysis ,İnşaat Mühendisliği ,construction projects ,Data science ,Civil Engineering ,language.human_language ,lcsh:TH1-9745 ,apriori algorithms ,association rules ,Construction Projects,Delay Analysis,Apriori Algorithms,Association Rules ,delay analysis ,language ,lcsh:TA401-492 ,lcsh:Materials of engineering and construction. Mechanics of materials ,lcsh:Building construction - Abstract
Construction projects, which have great effects on all countries' economies, are delivered late for many reasons, and these delays bring many negative consequences. Checking the source of these consequences or minimizing their effects is very important for time and cost savings in the construction sector, especially for continuously developing countries such as Turkey. Determining the project factors that cause delay, analysing their impacts and taking protective measures will help to reduce losses. In this study, the project factors that may cause delays are identified, and some important association rules between these factors and delay are extracted from data collected from Turkish public and private construction projects. Some recommendations for reducing delays are also presented, based on the extracted rules.
- Published
- 2019
22. Rough set‐based rule generation and Apriori‐based rule generation from table data sets: a survey and a combination
- Author
-
Hiroshi Sakai and Michinori Nakata
- Subjects
information incompleteness ,0209 industrial biotechnology ,Apriori algorithm ,database management systems ,granular computing ,Association rule learning ,apriori algorithm ,Computer Networks and Communications ,Computer science ,02 engineering and technology ,table data sets ,computer.software_genre ,incomplete information databases ,information analysis ,association rules ,020901 industrial engineering & automation ,Software ,Artificial Intelligence ,Complete information ,rule generators ,authors ,0202 electrical engineering, electronic engineering, information engineering ,Information system ,rough set theory ,software tools ,rough sets nondeterministic information analysis ,lcsh:Computer software ,business.industry ,Granular computing ,nondeterministic information systems ,lcsh:P98-98.5 ,data mining ,outstanding researches ,Knowledge acquisition ,Human-Computer Interaction ,knowledge acquisition ,novel researches ,lcsh:QA76.75-76.765 ,020201 artificial intelligence & image processing ,apriori-based rule generation ,Computer Vision and Pattern Recognition ,Rough set ,Data mining ,intelligent rule generator ,lcsh:Computational linguistics. Natural language processing ,business ,computer ,computational methodologies ,Information Systems - Abstract
The authors have been working with new computational methodologies such as rough sets, information incompleteness, data mining, and granular computing, and have developed software tools for association rules as well as new mathematical frameworks. They term this research Rough sets Non-deterministic Information Analysis (RNIA). They followed several novel lines of research, especially Pawlak's rough sets, Lipski's incomplete information databases, Orłowska's non-deterministic information systems, and Agrawal's Apriori algorithm, all of which are outstanding studies related to information incompleteness, data mining, and rule generation. They have been trying to combine these lines of research and to realise a more intelligent rule generator that handles data sets with information incompleteness. This study surveys the authors’ research highlights on rule generators and considers a combination of them.
- Published
- 2019
23. A novel machine learning approach for database exploitation detection and privilege control
- Author
-
Chee Keong Wee and Richi Nayak
- Subjects
reinforcement learning ,Association rule learning ,Computer Networks and Communications ,Network security ,Computer science ,Control (management) ,privilege control ,ComputingMilieux_LEGALASPECTSOFCOMPUTING ,Privilege (computing) ,computer.software_genre ,lcsh:Telecommunication ,Database ,association rules ,lcsh:TK5101-6720 ,Computer Science (miscellaneous) ,Reinforcement learning ,self-healing ,Electrical and Electronic Engineering ,lcsh:T58.5-58.64 ,business.industry ,lcsh:Information technology ,anomaly detection ,Computer Science Applications ,ComputingMilieux_MANAGEMENTOFCOMPUTINGANDINFORMATIONSYSTEMS ,Anomaly detection ,business ,computer - Abstract
Despite being protected by firewalls and network security systems, databases are vulnerable to attacks, especially when the perpetrators are from within the organization and have authorized access to these systems. Detecting their malicious activities is difficult, as each database has its own set of unique usage activities and generic exploitation avoidance rules are usually not applicable. This paper proposes a novel method to improve the security of a database by using machine learning to learn the user behaviour unique to a database environment and applying that learning to detect anomalous user activities through the analysis of sequences of user session data. Once these suspicious users are detected, their privileges are systematically suppressed. The empirical analysis shows that the proposed approach can intuitively adapt to any database that supports a wide variety of clients and enforce stringent control customized to the specific IT systems.
- Published
- 2019
24. Market basket analysis with association rules in the retail sector using Orange. Case Study: Appliances Sales Company
- Author
-
Garcia-Diaz Maria-Elena, Marcos Martinez, Belén Escobar, and Diego P. Pinto-Roa
- Subjects
Association rule learning ,Market Basket Analysis ,Orange Canvas ,Affinity analysis ,QA75.5-76.95 ,General Medicine ,Orange (colour) ,Association Rules ,Commerce ,Knowledge Discovery in Databases ,Electronic computers. Computer science ,Data Mining ,Business ,FP-Growth ,Retail sector - Abstract
This research analyzes the shopping basket using association rules in the retail sector, specifically in a company selling home goods such as appliances, computer items, furniture, and sporting goods, among others. With the rise of globalization and the advancement of technology, retail companies constantly struggle to maintain and raise their profits, as well as ordering the products and services that the customer wants to obtain. In this sense, they need a new approach to identify different objectives in order to be more competitive and successful, looking for new decision-making strategies. To achieve this goal and obtain clear and efficient strategies, given the large amounts of data collected in business transactions, the need arises to analyze such data intelligently in order to extract useful knowledge that supports decision-making and an understanding of the association patterns that occur in sales and customer behavior. Information such as which product will make the most profit and which products are sold together is of great value for stocking products in inventory. Knowing when a product is out of fashion can support effective inventory management. In this sense, this work presents the product association rules obtained by analyzing the data with the FP-Growth algorithm using the Orange tool.
- Published
- 2021
25. Study of the Behavior of Cryptocurrencies in Turbulent Times Using Association Rules
- Author
-
Miguel Andrés Porro V., José Benito Hernández C., and Andrés García-Medina
- Subjects
Apriori algorithm ,Cryptocurrency ,Association rule learning ,Series (mathematics) ,Turbulence ,General Mathematics ,02 engineering and technology ,cryptocurrencies ,01 natural sciences ,010305 fluids & plasmas ,association rules ,0103 physical sciences ,0202 electrical engineering, electronic engineering, information engineering ,Computer Science (miscellaneous) ,Econometrics ,Economics ,QA1-939 ,020201 artificial intelligence & image processing ,time series ,Engineering (miscellaneous) ,Database transaction ,Mathematics - Abstract
We studied the effects of the financial turbulence of 2020 on the cryptocurrency market, taking into account both prices and volumes from December 2019 to July 2020. Time series were transformed into transaction matrices, and the Apriori algorithm was applied to find the association rules between different currencies, identifying whether the price or the volume of the currencies composes the rules. We divided the data set into two subsets and found that, before the decline in cryptocurrency prices, the association rules were generally formed by prices, whereas afterwards transaction volumes dominated the association rules.
- Published
- 2021
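One way to picture the "time series transformed into transaction matrices" step is sketched below. The encoding is an assumption chosen for illustration (each time step becomes a transaction of `<series>_up` / `<series>_down` items based on the sign of the change); the listing does not describe the encoding the authors actually used, and the series values are made up.

```python
from typing import Dict, FrozenSet, List, Tuple


def to_transactions(series: Dict[str, List[float]]) -> List[FrozenSet[str]]:
    """Turn equal-length series into one transaction per time step (from step 1)."""
    length = len(next(iter(series.values())))
    transactions = []
    for t in range(1, length):
        items = set()
        for name, values in series.items():
            if values[t] > values[t - 1]:
                items.add(f"{name}_up")
            elif values[t] < values[t - 1]:
                items.add(f"{name}_down")
            # Unchanged values contribute no item.
        transactions.append(frozenset(items))
    return transactions


def to_matrix(transactions: List[FrozenSet[str]]) -> Tuple[List[str], List[List[int]]]:
    """Binary transaction matrix: one row per time step, one column per item."""
    columns = sorted({item for t in transactions for item in t})
    rows = [[1 if col in t else 0 for col in columns] for t in transactions]
    return columns, rows


# Hypothetical daily prices and volumes.
series = {
    "BTC_price": [100, 105, 103, 110],
    "ETH_price": [10, 11, 10, 12],
    "BTC_volume": [5, 5, 7, 6],
}

tx = to_transactions(series)
columns, matrix = to_matrix(tx)
print(columns)
for row in matrix:
    print(row)
```

The resulting rows can be fed to any frequent-itemset miner; rules whose items end in `_price_*` or `_volume_*` then indicate whether prices or volumes compose the rule.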
26. A Novel Decision-Making Process for COVID-19 Fighting Based on Association Rules and Bayesian Methods
- Author
-
Adel Thaljaoui, Fayez Alfayez, and Salim El Khediri
- Subjects
General Computer Science ,Coronavirus disease 2019 (COVID-19) ,Association rule learning ,Computer science ,AcademicSubjects/SCI01540 ,Bayesian probability ,02 engineering and technology ,Bayesian network’s structure learning based on data approach ,Machine learning ,computer.software_genre ,030218 nuclear medicine & medical imaging ,association rules ,03 medical and health sciences ,0302 clinical medicine ,Section C: Computational Intelligence, Machine Learning and Data Analytics ,0202 electrical engineering, electronic engineering, information engineering ,Decision-making ,autonomous decision-making ,business.industry ,Bayesian network ,COVID-19 ,Bayesian networks ,020201 artificial intelligence & image processing ,Original Article ,Artificial intelligence ,business ,computer - Abstract
Since the first case was recorded in Wuhan in late 2019, COVID-19 has continued to spread widely and rapidly, affecting the health of millions all over the globe. Numerous strategies have been devised to fight this pandemic, among which early isolation is considered one of the most effective. Proposing useful methods to screen and diagnose the patient's situation in order to specify adequate clinical management represents a significant challenge in reducing mortality rates. Inspired by the current global health situation, we introduce a new autonomous decision-making process that consists of two modules. The first module is data analysis based on a Bayesian network, which is employed to indicate the severity of coronavirus symptoms and then classify COVID-19 cases as severe, moderate or mild. The second module is decision-making based on an association rules method that autonomously generates the adequate decision. To construct the Bayesian network model, we used an effective method-oriented data for the sake of learning its structure. As a result, the algorithm accuracy in making the correct decision is 30% and in making the adequate decision is 70%. These experimental results demonstrate the importance of the suggested methods for decision-making.
- Published
- 2021
27. A novel association rule mining method for the identification of rare functional dependencies in Complex Technical Infrastructures from alarm data
- Author
-
Ahmed Shokry, Luigi Serio, Piero Baraldi, Federico Antonello, Enrico Zio, Ugo Gentile, Politecnico di Milano [Milan] (POLIMI), Centre de Mathématiques Appliquées - Ecole Polytechnique (CMAP), École polytechnique (X)-Centre National de la Recherche Scientifique (CNRS), Centre de recherche sur les Risques et les Crises (CRC), MINES ParisTech - École nationale supérieure des mines de Paris, Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL), Kyung Hee University (KHU), and European Organization for Nuclear Research (CERN)
- Subjects
0209 industrial biotechnology ,Association rule learning ,Computer science ,General Engineering ,Alarm data ,02 engineering and technology ,computer.software_genre ,Association rules ,Computer Science Applications ,Identification (information) ,ALARM ,020901 industrial engineering & automation ,Artificial Intelligence ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Abnormal behaviors ,Data mining ,Complex Technical Infrastructures ,[SHS.GEST-RISQ]Humanities and Social Sciences/domain_shs.gest-risq ,Representation (mathematics) ,Functional dependency ,Rare functional dependencies ,computer ,ComputingMilieux_MISCELLANEOUS - Abstract
This work presents a data-driven method for identifying rare functional dependencies among components of different systems of Complex Technical Infrastructures (CTIs) from large-scale databases of alarm messages. It is based on the representation of the alarm data in a binary form, the use of a novel association rule mining algorithm properly tailored for discovering rare dependencies among components of different systems and on the identification of groups of functionally dependent components. The proposed method is applied to a synthetic alarm database generated by a simulated CTI model and to a real large-scale database of alarms collected in the CTI of CERN (European Organization for Nuclear Research). The obtained results show the effectiveness of the proposed method.
- Published
- 2021
28. Unsupervised Machine Learning and Data Mining Procedures Reveal Short Term, Climate Driven Patterns Linking Physico-Chemical Features and Zooplankton Diversity in Small Ponds
- Author
-
Nicolò Bellin, Marco Bartoli, Valeria Rossi, Erica Racchetti, and Catia Maurone
- Subjects
0106 biological sciences ,Fuzzy clustering ,Association rule learning ,Computer science ,Ecology (disciplines) ,Geography, Planning and Development ,Fuzzy set ,Aquatic Science ,010603 evolutionary biology ,01 natural sciences ,Biochemistry ,Fuzzy logic ,association rules ,nutrients ,chlorophyll ,Cluster analysis ,TD201-500 ,Water Science and Technology ,Water supply for domestic and industrial purposes ,010604 marine biology & hydrobiology ,Hydraulic engineering ,Unsupervised learning ,fuzzy clustering ,Physical geography ,Surface runoff ,TC1-978 - Abstract
Machine Learning (ML) is an increasingly accessible discipline in computer science that develops dynamic algorithms capable of data-driven decisions and whose use in ecology is growing. Fuzzy sets are suitable descriptors of ecological communities as compared to other standard algorithms and allow the description of decisions that include elements of uncertainty and vagueness. However, fuzzy sets are scarcely applied in ecology. In this work, an unsupervised machine learning algorithm (fuzzy c-means) and association rule mining were applied to assess the factors influencing the assemblage composition and distribution patterns of 12 zooplankton taxa in 24 shallow ponds in northern Italy. The fuzzy c-means algorithm was implemented to classify the ponds in terms of the taxa they support, and to identify the influence of chemical and physical environmental features on the assemblage patterns. Data retrieved during 2014 and 2015 were compared, taking into account that 2014 late spring and summer air temperatures were much lower than historical records, whereas 2015 mean monthly air temperatures were much warmer than historical averages. In both years, fuzzy c-means shows a strong clustering of ponds into two groups, contrasting sites characterized by different physico-chemical and biological features. Climatic anomalies affecting the temperature regime, together with the main water supply to shallow ponds (e.g., surface runoff vs. groundwater), represent disturbance factors producing large interannual differences in the chemistry, biology and short-term dynamics of small aquatic ecosystems. Unsupervised machine learning algorithms and fuzzy sets may help in capturing such apparently erratic differences.
- Published
- 2021
29. COVID-19 patient diagnosis and treatment data mining algorithm based on association rules
- Author
-
Wei Miao and Zicheng Shan
- Subjects
Decision support system ,Information retrieval ,Association rule learning ,Computer science ,business.industry ,Online analytical processing ,Decision tree ,online analytical processing ,data warehouse ,Original Articles ,computer.software_genre ,Data warehouse ,Expert system ,Theoretical Computer Science ,association rules ,Computational Theory and Mathematics ,Web mining ,Knowledge base ,Artificial Intelligence ,Control and Systems Engineering ,COVID‐19 patients ,Original Article ,diagnosis treatment data mining ,business ,computer - Abstract
Association rules are used in different data mining applications, including Web mining, intrusion detection, and bioinformatics. This study mainly discusses a COVID‐19 patient diagnosis and treatment data mining algorithm based on association rules. The data considered comprise general data, the key time intervals during the main diagnosis and treatment process (including onset to dyspnea, first diagnosis, admission, mechanical ventilation, death, and the time from first diagnosis to admission, etc.), the cause of death by laboratory examination, and so forth. The frequency of drug use was counted, and an association rule algorithm was used to analyse the effect of drug treatment. The results could provide a reference for rational drug use in COVID‐19 patients. In this study, the data are pre‐processed in order to improve the efficiency of data mining. Secondly, the main objective of this data mining application is to extract association rules of COVID‐19 complications, so the attributes to be mined are the various diseases. It is therefore necessary to classify individual disease types. During the construction of the association rules database, the data in the data warehouse are analysed online and association rule mining is performed. The results are stored in the knowledge base for decision support. For example, the prediction results of the decision tree can be displayed at this level. After the construction of the mining model, the display interface can be mined, and the decision‐maker can input the corresponding attribute value and then predict it. 0.76% of people had COVID‐19, CHD and hypertension together, while 46.5% of people with COVID‐19 and CHD were likely to have hypertension. This study is helpful to analyse the imaging factors of COVID‐19 disease.
[ABSTRACT FROM AUTHOR] Copyright of Expert Systems is the property of Wiley-Blackwell.
- Published
- 2021
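The two percentages quoted in the abstract above can be related by the standard definitions of support and confidence. The short calculation below assumes, as an interpretation not stated in the listing, that 0.76% is the support of {COVID-19, CHD, hypertension} and 46.5% is the confidence of the rule {COVID-19, CHD} -> {hypertension}; under that reading the share of people with both COVID-19 and CHD follows directly.

```python
# Assumed reading of the abstract's figures (interpretation, not source data):
support_all_three = 0.0076   # support of {COVID-19, CHD, hypertension}
confidence = 0.465           # confidence of {COVID-19, CHD} -> {hypertension}

# confidence = support(A and B) / support(A)  =>  support(A) = support(A and B) / confidence
support_antecedent = support_all_three / confidence

print(f"{support_antecedent:.2%}")  # prints 1.63%
```

In words: if the reading holds, roughly 1.6% of the records involve both COVID-19 and CHD, and a little under half of those also involve hypertension.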
30. Finding Effective Item Assignment Plans with Weighted Item Associations Using A Hybrid Genetic Algorithm
- Author
-
Kwang Il Ahn, Kichun Lee, and Minho Ryu
- Subjects
0209 industrial biotechnology ,Association rule learning ,Computer science ,Association (object-oriented programming) ,Crossover ,hybrid genetic algorithm ,02 engineering and technology ,computer.software_genre ,lcsh:Technology ,association rules ,lcsh:Chemistry ,020901 industrial engineering & automation ,Operator (computer programming) ,item assignment ,Genetic algorithm ,0202 electrical engineering, electronic engineering, information engineering ,General Materials Science ,lcsh:QH301-705.5 ,Instrumentation ,Fluid Flow and Transfer Processes ,cross-selling ,lcsh:T ,Process Chemistry and Technology ,General Engineering ,lcsh:QC1-999 ,Tabu search ,Purchasing ,Computer Science Applications ,lcsh:Biology (General) ,lcsh:QD1-999 ,lcsh:TA1-2040 ,Benchmark (computing) ,020201 artificial intelligence & image processing ,Data mining ,lcsh:Engineering (General). Civil engineering (General) ,computer ,lcsh:Physics - Abstract
By identifying useful relationships within massive datasets, association rule mining can provide new insights to decision-makers. Item assignment models based on associations between items are used to place items in a retail or e-commerce environment to increase sales. However, existing models fail to combine these associations with item-specific information, such as profit and purchasing frequency. To find effective assignments with item-specific information, we propose a new hybrid genetic algorithm that incorporates a robust tabu search with a novel rectangular partially matched crossover, focusing on rectangular layouts. Interestingly, we show that our item assignment model is equivalent to the popular NP-hard quadratic assignment problem. We show the effectiveness of the proposed algorithm using benchmark instances from QAPLIB and synthetic databases that represent real-life retail situations, and compare our algorithm with other existing algorithms. We also show that the proposed crossover operator outperforms a few existing ones in both fitness values and search times. The experimental results show that the proposed item assignment model not only generates a more profitable assignment plan than the other tested models based on association alone, but also obtains better solutions than the other tested algorithms.
- Published
- 2021
31. Establishing a Multiple-Criteria Decision-Making Model for Stock Investment Decisions Using Data Mining Techniques
- Author
-
Mu-Jung Huang, Cheng-Kai Fu, Kuo-Chih Cheng, Kuo-Hua Wang, Lan-Hui Lin, and Huo-Ming Wang
- Subjects
Apriori algorithm ,Association rule learning ,Computer science ,apriori algorithm ,Geography, Planning and Development ,Decision tree ,Financial ratio ,TJ807-830 ,02 engineering and technology ,Management, Monitoring, Policy and Law ,TD194-195 ,Profit (economics) ,Renewable energy sources ,association rules ,decision tree ,0202 electrical engineering, electronic engineering, information engineering ,Econometrics ,multiple-criteria decision-making ,GE1-350 ,Environmental effects of industries and plants ,Renewable Energy, Sustainability and the Environment ,Decision tree learning ,020207 software engineering ,data mining ,Investment (macroeconomics) ,Environmental sciences ,Multiple criteria ,020201 artificial intelligence & image processing ,Decision model ,Decision-making models - Abstract
This study integrates the decision tree algorithm with the Apriori algorithm to explore the relationships among financial ratios, corporate governance, and stock returns in order to establish a stock investment decision model. The sports and leisure related industries serve as the research target. The data are collected and processed to generate decision trees and association rules. Based on the analysis outcome, an investment decision model is constructed for investors who expect to decrease their investment risks and further increase their profits. This stock investment decision model is one type of multiple-criteria decision-making model. This study makes three contributions for investors. (1) It proposes a systematic model for exploring related data through the decision tree algorithm and the Apriori algorithm to reveal implicit investment knowledge. (2) An effective investment decision model is established and is expected to provide a reference basis for stock-picking decisions. (3) The investment decision model is enhanced with implicit rules found among variables using association rules.
- Published
- 2021
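Note on entry 31: the abstract relies on the Apriori algorithm but does not describe its steps. The following is a minimal sketch of level-wise frequent itemset mining, not the authors' implementation. The function name `apriori`, the toy attribute labels (`high_roe`, `low_debt`, `positive_return`), and the 0.5 support threshold are all assumptions made for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    current = {s for s in current if support(s) >= min_support}

    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


# Hypothetical toy data: each "transaction" is the set of labels for one stock.
toy = [
    {"high_roe", "low_debt", "positive_return"},
    {"high_roe", "positive_return"},
    {"low_debt"},
    {"high_roe", "low_debt", "positive_return"},
]
frequent_itemsets = apriori(toy, min_support=0.5)
```

The prune step is what distinguishes Apriori from brute-force enumeration: a candidate is counted against the data only if all of its smaller subsets were already found frequent.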
32. Clustering Based Approach to Enhance Association Rule Mining
- Author
-
Anu Sahni, Paul Stynes, Samruddhi Kanhere, and Pramod Pathak
- Subjects
Association rule learning ,Computer science ,differential market basket analysis ,Differential (mechanical device) ,Affinity analysis ,computer.software_genre ,retail analytics ,lcsh:Telecommunication ,association rules ,Set (abstract data type) ,Product (business) ,market basket analysis ,lcsh:TK5101-6720 ,Scalability ,Data mining ,Cluster analysis ,computer ,scalability ,Consumer behaviour ,clustering - Abstract
Association rule mining algorithms such as Apriori and FP-Growth are used extensively in the retail industry to uncover consumer buying patterns. However, scaling these algorithms to rapidly growing data is a major challenge. This research presents a novel clustering-based approach that addresses the problem by reducing the dataset size. The products are clustered based on their frequency and price. Another important aspect of this study is finding interesting rules by performing differential market basket analysis, which identifies association rules that are likely to be ignored in the trivial approach. With the cluster-based approach, the same set of rules can be generated using only 7% of the total 16,210 items, which directly reduces the processing overhead and thus the computation time. Furthermore, results obtained from differential market basket analysis highlight a few interesting rules that were missing from the original set of rules. The clustering-based approach used in this study not only covers frequent items but also considers their contribution to overall revenue by taking price into account. In addition, the least contributing product exclusion rate is improved from 45% to 93%. These results suggest that the computation cost can be significantly reduced, and more accurate rules can be generated, by applying differential market basket analysis.
- Published
- 2021
33. Construction of Materialized Views in Non-Binary Data Space
- Author
-
Bibekananda Shit, Santanu Roy, Agostino Cortesi, and Soumya Sen
- Subjects
Materialized view ,Speedup ,Settore INF/01 - Informatica ,Association rule learning ,Computer science ,Non-binary data space ,Association rules ,Dynamic support count ,Construct (python library) ,Space (commercial competition) ,computer.software_genre ,Database-centric architecture ,Binary data ,Benchmark (computing) ,Data mining ,computer - Abstract
Materialized views are heavily used to speed up the query response time of data-centric applications. In the literature, the construction and dynamic maintenance of materialized views are carried out in a Binary Data Space where all attributes are given the same weight. Considering different weights may be particularly significant when similar queries are posed by multiple users, since taking into account the number of accesses to the different attribute values makes it possible to tune the materialized views accordingly. The methodology for constructing weighted materialized views introduced in this paper is based on association mining techniques applied in a Non-Binary Data Space. The proposed algorithm has been verified by simulation experiments with two benchmark datasets using practical transactional queries. The experimental results demonstrate the superiority of our proposal in terms of query Hit-Miss ratio and the flexibility to extend the view size according to the requirements of practical applications.
- Published
- 2021
34. Association rule-based malware classification using common subsequences of API calls
- Author
-
Gianni D'Angelo, Francesco Palmieri, and Massimo Ficco
- Subjects
0209 industrial biotechnology ,Exploit ,Association rule learning ,Computer science ,Markov chain ,Evasion (network security) ,02 engineering and technology ,API call sequence ,Association rules ,Machine learning ,Malware dynamic analysis ,Markov chains ,Sequence alignment ,computer.software_genre ,020901 industrial engineering & automation ,Obfuscation ,0202 electrical engineering, electronic engineering, information engineering ,Code (cryptography) ,Obfuscation (software) ,Association rule ,Malware ,020201 artificial intelligence & image processing ,Data mining ,computer ,Software - Abstract
Emerging malware poses increasing challenges to detection systems as its variety and sophistication continue to increase. Malware developers use complex techniques to produce malware variants by removing, replacing, and adding useless API calls to the code; these changes are specifically designed to evade detection mechanisms without affecting the original functionality of the malicious code involved. In this work, a new recurring subsequences alignment-based algorithm that exploits associative rules is proposed to infer malware behaviors. The proposed approach exploits the probabilities of transitioning between two API invocations in the call sequence and also considers their timeline, by extracting subsequences of API calls that are not necessarily consecutive and that are representative of common malicious behaviors of specific subsets of malware. The resulting malware classification scheme, capable of operating within dynamic analysis scenarios in which API calls are traced at runtime, is inherently robust against evasion/obfuscation techniques based on API call flow perturbation. It has been experimentally compared with two detectors based on Markov chain and API call sequence alignment algorithms, which are among the most widely adopted approaches for malware classification. In this experimental assessment, the proposed approach showed excellent classification performance, outperforming its competitors.
- Published
- 2021
35. Checking Sets of Pure Evolving Association Rules
- Author
-
Carlo Combi, Romeo Rizzi, and Pietro Sala
- Subjects
Algebra and Number Theory ,Association rule learning ,Computer science ,computer.software_genre ,Data complexity ,Theoretical Computer Science ,Association Rules ,Computational Theory and Mathematics ,Data Complexity ,Data Mining ,Data mining ,computer ,Information Systems ,Data Mining, Association Rules, Data Complexity - Abstract
Extracting association rules from large datasets has been widely studied in many variants in the last two decades; such rules make it possible to extract relations between values that occur more “often” in a database. With temporal association rules, the concept has been extended to temporal databases. In this context, the “most frequent” patterns of evolution of one or more attribute values are extracted. In the temporal setting, especially where the interference between temporal patterns cannot be neglected (e.g., in medical domains), we may be looking for a set of temporal association rules for which a “significant” portion of the original database represents a consistent model for all of them. In this work, we introduce a simple and intuitive form of temporal association rules, called pure evolving association rules (PE-ARs for short), and we study the complexity of checking a set of PE-ARs over an instance of a temporal relation under approximation (i.e., a percentage of tuples that may be deleted from the original relation). As a by-product of our study, we address the complexity class of a general problem on Directed Acyclic Graphs, which is theoretically interesting per se.
- Published
- 2021
36. Knowledge Discovering on Graphene Green Technology by Text Mining in National R&D Projects in South Korea
- Author
-
Richa Kumari, Byeong-Hee Lee, Jae Yun Jeong, Tae-Hyun Kim, and Ji Yeon Lee
- Subjects
Topic model ,Engineering ,Association rule learning ,020209 energy ,media_common.quotation_subject ,Geography, Planning and Development ,topic modeling ,lcsh:TJ807-830 ,lcsh:Renewable energy sources ,02 engineering and technology ,Management, Monitoring, Policy and Law ,New Deal ,association rules ,Green New Deal ,0202 electrical engineering, electronic engineering, information engineering ,Project management ,Project management system ,National Science and Technology Information Service ,lcsh:Environmental sciences ,media_common ,lcsh:GE1-350 ,Government ,Renewable Energy, Sustainability and the Environment ,business.industry ,lcsh:Environmental effects of industries and plants ,021001 nanoscience & nanotechnology ,project management ,Engineering management ,lcsh:TD194-195 ,Service (economics) ,green new deal policy ,0210 nano-technology ,business - Abstract
This paper reviews the development of South Korea’s national research and development (R&D) in graphene technology, focusing on projects that have been classified as “green” technology. A total of 826 projects (USD 210 billion) from 2010 to 2019 were collected from the National Science and Technology Information Service (NTIS), which is a full-cycle national R&D project management system in South Korea. We then analyzed the R&D trend using diverse text mining methods, including frequency analysis, association rule mining, and topic modeling. The analysis suggests that the number of graphene green technology (GT) R&D projects and the research expenses will rise again under the incumbent government along with the implementation of the Korean New Deal policy, which integrates the Green New Deal and the Digital New Deal.
- Published
- 2020
37. Fault Diagnosis of Traction Transformer Based on Bayesian Network
- Author
-
Pan Weiguo, Lin Sheng, Feng Ding, Sheng Bi, Xiao Yong, and Guo Xiaomin
- Subjects
Control and Optimization ,Association rule learning ,Computer science ,Energy Engineering and Power Technology ,02 engineering and technology ,01 natural sciences ,lcsh:Technology ,law.invention ,association rules ,Traction transformer ,law ,0103 physical sciences ,0202 electrical engineering, electronic engineering, information engineering ,Electrical and Electronic Engineering ,Transformer ,Engineering (miscellaneous) ,Leakage (electronics) ,010302 applied physics ,Renewable Energy, Sustainability and the Environment ,business.industry ,lcsh:T ,Fossil fuel ,Bayesian network ,traction transformer ,fault diagnosis ,Reliability engineering ,conditional probability ,020201 artificial intelligence & image processing ,business ,human activities ,Energy (miscellaneous) ,Test data ,Voltage - Abstract
As the core equipment of a traction power supply system, the traction transformer is very important to ensure the safe and reliable operation of the system. At present, the three-ratio method is mainly used to distinguish transformer faults, but this method has some defects, such as insufficient coding and over-general fault classification. At the same time, on-site maintenance personnel make empirical judgments based on various test data, which are subjective and uncertain to a certain extent. For cases with multiple abnormal data and relatively complex conditions, on-site personnel often need to discuss and even dismantle the transformer to identify the fault, which is time-consuming and costly. To improve fault diagnosis for traction transformers, this paper uses a Bayesian network to correlate the causes and effects of various tests and faults. By combining the results of field tests, the fault is diagnosed from the causal probabilities of the Bayesian network, rather than by relying on an anomaly that occurred in a single test. Using the Bayesian network makes the diagnosis results more accurate and objective. In this paper, the frequent test anomalies of the traction transformer are taken into account in the network, so that the network can more comprehensively analyze the operating condition of the traction transformer and judge the type of fault. According to field situations, and building on the existing symptom set for Bayesian network fault diagnosis, this paper further considers the insulation resistance, dielectric loss tangent value, oil and gas, power frequency voltage, and leakage current. By combining the association rules algorithm and the experience of the field operators, the cause–effect relationships of the test data and the conditional probability parameters of the network are obtained. Then, the Bayesian network is constructed and used for traction transformer fault diagnosis.
The case study shows that the four types of fault diagnosed using the Bayesian network model proposed in this paper are consistent with the fault types inspected by on-site operators, which shows promising engineering application prospects.
- Published
- 2020
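Note on entry 37: the abstract describes diagnosing a fault from the conditional probabilities of several test results rather than from a single anomalous test. The sketch below illustrates that idea with the simplest possible network, in which every test depends only on one fault variable (a naive Bayes structure). This structure, the function name `fault_posterior`, the fault and test labels, and every probability value are assumptions made for illustration; none of them come from the paper.

```python
def fault_posterior(priors, likelihoods, observed):
    """Posterior P(fault | test results) for a single fault variable.

    priors:      {fault: P(fault)}
    likelihoods: {fault: {test: P(test is abnormal | fault)}}
    observed:    {test: True if abnormal, False if normal}
    Assumes tests are conditionally independent given the fault.
    """
    scores = {}
    for fault, prior in priors.items():
        p = prior
        for test, abnormal in observed.items():
            p_abnormal = likelihoods[fault][test]
            p *= p_abnormal if abnormal else (1.0 - p_abnormal)
        scores[fault] = p
    total = sum(scores.values())  # normalizing constant P(observed)
    return {fault: s / total for fault, s in scores.items()}


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
priors = {"winding_fault": 0.3, "insulation_aging": 0.5, "core_fault": 0.2}
likelihoods = {
    "winding_fault": {"insulation_resistance": 0.2, "leakage_current": 0.3},
    "insulation_aging": {"insulation_resistance": 0.8, "leakage_current": 0.7},
    "core_fault": {"insulation_resistance": 0.1, "leakage_current": 0.2},
}
observed = {"insulation_resistance": True, "leakage_current": True}
posterior = fault_posterior(priors, likelihoods, observed)
most_likely = max(posterior, key=posterior.get)
```

With both tests abnormal, the unnormalized scores are 0.018, 0.28, and 0.004, so `insulation_aging` dominates even though neither test alone would be decisive. A full Bayesian network would also model dependencies among tests, which this sketch omits.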
38. Apriori Algorithm for the Data Mining of Global Cyberspace Security Issues for Human Participatory Based on Association Rules
- Author
-
Zhi Li, Xuyu Li, Runhua Tang, and Lin Zhang
- Subjects
Apriori algorithm ,Association rule learning ,cyberspace security ,Internet privacy ,lcsh:BF1-990 ,Sample (statistics) ,02 engineering and technology ,050105 experimental psychology ,association rules ,Web page ,0202 electrical engineering, electronic engineering, information engineering ,Data Protection Act 1998 ,Psychology ,0501 psychology and cognitive sciences ,General Psychology ,Original Research ,ComputingMilieux_THECOMPUTINGPROFESSION ,business.industry ,network sovereignty ,05 social sciences ,data mining ,lcsh:Psychology ,Cyber-attack ,020201 artificial intelligence & image processing ,The Internet ,business ,Cyberspace - Abstract
This study explored global cyberspace security issues, with the purpose of breaking the stereotype in people’s cognition of cyberspace problems, which reflects the relationship between interdependence and association. Based on the Apriori algorithm in association rules, a total of 181 strong rules were mined from 40 target websites, and 56,096 web pages were associated with global cyberspace security. Moreover, this study analyzed support, confidence, promotion, leverage, and reliability to achieve comprehensive coverage of the data. A total of 15,661 sites mentioned cyberspace security-related words from the total sample of 22,493 professional websites, accounting for 69.6%, while only 735 sites mentioned cyberspace security-related words from the total sample of 33,603 non-professional sites, accounting for 2%. Due to language restrictions, the number of samples of target professional websites and non-target websites is limited. Meanwhile, the number of strong rules selected is not satisfactory. Nowadays, the core global cyberspace security issues include internet sovereignty, cyberspace security, cyber attack, cyber crime, data leakage, and data protection.
- Published
- 2020
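Note on entry 38: the abstract evaluates rules by support, confidence, "promotion", leverage, and reliability without defining them. The sketch below computes the standard definitions of support, confidence, lift, and leverage for a single rule. Reading "promotion" as lift is an assumption, and "reliability" is omitted because the abstract does not define it. The function name `rule_metrics` and the toy keyword sets are hypothetical.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Standard interestingness measures for the rule antecedent -> consequent."""
    a, c = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(a <= set(t) for t in transactions)         # contain antecedent
    n_c = sum(c <= set(t) for t in transactions)         # contain consequent
    n_ac = sum((a | c) <= set(t) for t in transactions)  # contain both

    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    # Lift > 1 means the consequent is more likely when the antecedent is present.
    lift = confidence / (n_c / n) if n_c else 0.0
    # Leverage: observed joint frequency minus the frequency expected
    # if antecedent and consequent were independent.
    leverage = support - (n_a / n) * (n_c / n)
    return {
        "support": support,
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
    }


# Hypothetical toy data: each set holds the security keywords found on one page.
pages = [
    {"cyber attack", "data leakage"},
    {"cyber attack", "data leakage", "data protection"},
    {"cyber attack"},
    {"data protection"},
    {"data leakage", "data protection"},
]
metrics = rule_metrics(pages, {"cyber attack"}, {"data leakage"})
```

For this toy rule, two of five pages contain both keywords (support 0.4) and two of the three "cyber attack" pages also mention "data leakage" (confidence 2/3), giving a lift slightly above 1 and a small positive leverage.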
39. Medical Health Benefit Management System for Real-Time Notification of Fraud Using Historical Medical Records
- Author
-
Shoab A. Khan, Irum Matloob, Habibur Rahman, and Farhan Hussain
- Subjects
Knowledge management ,Association rule learning ,Computer science ,Specialty ,anomaly ,02 engineering and technology ,01 natural sciences ,lcsh:Technology ,association score ,association rules ,lcsh:Chemistry ,Health care ,0202 electrical engineering, electronic engineering, information engineering ,General Materials Science ,Instrumentation ,lcsh:QH301-705.5 ,Reimbursement ,Fluid Flow and Transfer Processes ,Service (business) ,Government ,business.industry ,lcsh:T ,Process Chemistry and Technology ,010401 analytical chemistry ,General Engineering ,outlier ,lcsh:QC1-999 ,0104 chemical sciences ,Computer Science Applications ,lcsh:Biology (General) ,lcsh:QD1-999 ,lcsh:TA1-2040 ,Management system ,020201 artificial intelligence & image processing ,fraud ,business ,lcsh:Engineering (General). Civil engineering (General) ,Transaction data ,lcsh:Physics ,clustering - Abstract
This paper presents a novel framework for fraud detection in healthcare systems which self-learns from historical medical data. Historical medical records are required for training and testing of machine learning models. The main problem faced by both private and government-supported health schemes is a rapid rise in the number of claims by beneficiaries, mostly based on fraudulent billing. Detection of fraudulent transactions in healthcare systems is a strenuous task due to intricate relationships among dynamic elements including doctors, patients, and services. In light of the aforementioned challenges in health support programs, there is a need to develop intelligent fraud detection models for tracing the loopholes in procedures which may lead to successful reimbursement of fraudulent medical bills. To address the issue of fraud in healthcare programs, our solution proposes a framework based on three entities (patient, doctor, service). Firstly, the framework computes association scores for three elements of the healthcare ecosystem, namely patients, doctors, and services. The framework filters out identified cases using association scores. The confidence values, after G-means clustering of transactional data, are computed for each service in each specialty. Rules are generated based on the confidence values of services for each specialty. Then, an evaluation of identified cases is done using a rule engine. The framework classifies cases as fraudulent activities based on the similarity bit’s value. The validation of the framework is performed on the transactional data of a local hospital’s employees, which includes many reported cases of fraudulent activities in addition to some introduced anomalies.
- Published
- 2020
40. Automatic identification of knowledge related to dengue cases in the state of Piauí in public databases using Filtered-Association Rules Networks
- Author
-
Jâina Carolina Meneses Calçada, Solange Oliveira Rezende, Joan Davi Santos Silva, and Dario Brito Calçada
- Subjects
Association Rules ,Dengue ,Epidemiological surveillance ,Knowledge Discovery ,Networks ,education.field_of_study ,DENGUE ,General Computer Science ,Association rule learning ,Computer science ,Population ,medicine.disease ,computer.software_genre ,Dengue fever ,Identification (information) ,Knowledge extraction ,Computer Science ,Information system ,medicine ,State (computer science) ,Data mining ,education ,computer - Abstract
Dengue has been an endemic disease in Brazil since the 1980s and in Piauí since 1996. The number of cases increases each year, with the incidence of more severe symptoms. This research aimed to evaluate the use of an automatic knowledge identification technique on factors related to the number of dengue occurrences. We built a dataset from data available in the Information System for Notifiable Diseases (SINAN) and meteorological data of the municipalities of the coastal plain of Piauí. The technique used was Filtered Association Rules Networks, which allows visual analysis of knowledge through the use of network structures and rule filtering. As a main result, we confirmed the understanding that the largest number of cases occurs in May, when the rainfall indexes are decreasing, and that socio-cultural and race factors do not interfere in the identification of the population at higher risk. This research presents the innovation of using a computational technique of automatic knowledge discovery that can assist epidemiological surveillance in the elaboration of prevention actions.
- Published
- 2020
41. SAERMA: Stacked Autoencoder Rule Mining Algorithm for the Interpretation of Epistatic Interactions in GWAS for Extreme Obesity
- Author
-
Carl Chalmers, Nurul Hashimah Ahamed Hassain Malim, Basma Abdulaimma, Casimiro Aday Curbelo Montañez, Denis Reilly, Paul Fergus, and Francesco Falciani
- Subjects
FOS: Computer and information sciences ,QA75 ,epistasis ,Computer Science - Machine Learning ,obesity ,General Computer Science ,Association rule learning ,Computer science ,Machine Learning (stat.ML) ,Genome-wide association study ,02 engineering and technology ,Association rules ,Machine Learning (cs.LG) ,QA76 ,03 medical and health sciences ,0302 clinical medicine ,Statistics - Machine Learning ,autoencoders ,Genetic variation ,0202 electrical engineering, electronic engineering, information engineering ,genome-wide association studies (GWAS) ,General Materials Science ,Quantitative Biology - Genomics ,030212 general & internal medicine ,Gene ,Genetic association ,Interpretability ,Genomics (q-bio.GN) ,Artificial neural network ,business.industry ,Deep learning ,General Engineering ,Linear model ,deep learning ,Autoencoder ,R1 ,FOS: Biological sciences ,Epistasis ,020201 artificial intelligence & image processing ,Artificial intelligence ,lcsh:Electrical engineering. Electronics. Nuclear engineering ,business ,Algorithm ,Classifier (UML) ,lcsh:TK1-9971 ,Curse of dimensionality - Abstract
One of the most important challenges in the analysis of high-throughput genetic data is the development of efficient computational methods to identify statistically significant Single Nucleotide Polymorphisms (SNPs). Genome-wide association studies (GWAS) use single-locus analysis where each SNP is independently tested for association with phenotypes. The limitation with this approach, however, is its inability to explain genetic variation in complex diseases. Alternative approaches are required to model the intricate relationships between SNPs. Our proposed approach extends GWAS by combining deep learning stacked autoencoders (SAEs) and association rule mining (ARM) to identify epistatic interactions between SNPs. Following traditional GWAS quality control and association analysis, the most significant SNPs are selected and used in the subsequent analysis to investigate epistasis. SAERMA controls the classification results produced in the final fully connected multi-layer feedforward artificial neural network (MLP) by manipulating the interestingness measures, support and confidence, in the rule generation process. The best classification results were achieved with 204 SNPs compressed to 100 units (77% AUC, 77% SE, 68% SP, 53% Gini, logloss=0.58, and MSE=0.20), although it was possible to achieve 73% AUC (77% SE, 63% SP, 45% Gini, logloss=0.62, and MSE=0.21) with 50 hidden units, both supported by close model interpretation.
- Published
- 2020
42. Medical Data Stream Distribution Pattern Association Rule Mining Algorithm Based on Density Estimation
- Author
-
Dong Li, Yanwei Wang, and Xiaofeng Li
- Subjects
Data stream ,General Computer Science ,Association rule learning ,distribution pattern ,Computer science ,02 engineering and technology ,mining ,Stability (probability) ,Structural equation modeling ,association rules ,020204 information systems ,Histogram ,density estimation ,0202 electrical engineering, electronic engineering, information engineering ,Range (statistics) ,General Materials Science ,Cluster analysis ,medical data ,Compound neural network ,General Engineering ,Density estimation ,TK1-9971 ,Data redundancy ,020201 artificial intelligence & image processing ,Electrical engineering. Electronics. Nuclear engineering ,Algorithm - Abstract
Traditional data mining methods do not analyze the data distribution and derive incomplete association rules. As a result, their mining results suffer from high redundancy probability, large root-mean-square error of approximation (RMSEA), and long running times. To handle these issues, this paper proposes a medical data stream distribution pattern association rule mining algorithm based on density estimation. The paper collects medical data, uses a distance method to detect abnormal orphan data in the data stream, detects duplicate data in the data stream by the similar field matching degree, and eliminates both the abnormal data and the duplicate data. Then, the data stream density is estimated based on histogram estimation samples. According to the data density estimation results, the paper analyzes the distribution of the medical data stream from the perspectives of concentration, dispersion, and morphological characteristics of the data distribution. Afterwards, the data distribution pattern association rule mining model is constructed based on a compound neural network: data distribution parameters are entered into the model’s clustering layer, and in-depth training is conducted on the BP (Back Propagation) neural network at the model’s mining layer. Meanwhile, all rules under the combination of the hidden layer’s neuron activity values and the corresponding output values, and all rules under the combination of the hidden layer’s neuron activity values and the corresponding input values, are derived, completing the association rule mining of the medical data stream distribution pattern. The experimental results show that the proposed algorithm has a contour curve closest to the true probability density curve; the dispersion degree of the medical data is within a reasonable range, and the medical data has high stability; the data redundancy probability is smaller; the mining result’s RMSEA is small; and data mining takes less time.
- Published
- 2019
43. Recommendations for Mobile Apps Based on the HITS Algorithm Combined With Association Rules
- Author
-
Wei Li, Yiwen Zhang, Xiangliang Zhong, Qilin Wu, Dengcheng Yan, and Yuan Ting Yan
- Subjects
app recommendation ,010302 applied physics ,General Computer Science ,Association rule learning ,Computer science ,Download ,General Engineering ,Mobile apps ,data mining ,02 engineering and technology ,HITS algorithm ,021001 nanoscience & nanotechnology ,01 natural sciences ,association rules ,World Wide Web ,0103 physical sciences ,Recommender systems ,General Materials Science ,lcsh:Electrical engineering. Electronics. Nuclear engineering ,Electrical and Electronic Engineering ,0210 nano-technology ,lcsh:TK1-9971 - Abstract
With the increasing popularity of intelligent devices, the mobile apps market has exploded. Due to the large number of candidate app services, it has become very difficult for users to choose the mobile apps that they want to install. Therefore, it is crucial to improve users' experience and make personalized recommendations. In some cases, the traditional recommendation methods can be convenient, but they still have some shortcomings, resulting in inaccurate recommendations in general. To address this issue, this paper proposes a method for mobile app recommendation that is based on the Hyperlink-Induced Topic Search (HITS) algorithm combined with association rules. This method integrates the authority and hub scores into the candidate applications through the download and rating information, and it not only considers the importance of mobile apps in association rules but also takes the reliability factor of users into account. Experiments with the Huawei application market datasets show that the proposed method significantly improves recommendation accuracy compared with the traditional methods.
- Published
- 2019
44. TRICE: Mining Frequent Itemsets by Iterative TRimmed Transaction LattICE in Sparse Big Data
- Author
-
Hamayoun Shahwani, Ch. Muhammad Nadeem Faisal, Muhammad Umar Chaudhry, Muhammad Yasir, Mudassar Ahmad, Muhammad Ashraf, Shahzad Sarwar, and Muhammad Asif Habib
- Subjects
big data applications ,General Computer Science ,Association rule learning ,Computer science ,business.industry ,Big data ,pattern recognition ,General Engineering ,frequent itemset mining ,data mining ,computer.software_genre ,Association rules ,Lattice (order) ,pervasive computing ,Unsupervised learning ,General Materials Science ,Data mining ,lcsh:Electrical engineering. Electronics. Nuclear engineering ,business ,Database transaction ,computer ,lcsh:TK1-9971 - Abstract
Sparseness is often observed in big data emanating from a variety of sources, including IoT, pervasive computing, and behavioral data. Frequent itemset mining is the first and foremost step of association rule mining, which is a distinguished unsupervised machine learning problem. However, techniques for frequent itemset mining are least explored for sparse real-world data, where they show somewhat comparable performance. By contrast, the methods are adequately validated for dense data and stand apart from each other in terms of performance. Hence, there is a pressing need to evaluate these techniques and to propose new ones for large sparse real-world datasets. In this study, a novel method, Mining Frequent Itemsets by Iterative TRimmed Transaction lattICE (TRICE), is proposed. TRICE iteratively generates combinations of varying-sized trimmed subsets of $I$, where $I$ denotes the set of distinct items in a database. Extensive experiments are conducted to assess TRICE against the HARPP, FP-Growth, optimized SaM, and optimized RElim algorithms. The experimental results show that TRICE outperforms all these algorithms in terms of both running time and memory consumption. TRICE maintains a substantial performance gap for all sparse real-world datasets on all minimum support thresholds. Moreover, an assessment of the memory use of the optimized SaM and RElim algorithms has been performed for the first time.
- Published
- 2019
45. Content Recommendation Algorithm for Intelligent Navigator in Fog Computing Based IoT Environment
- Author
-
Jiuzhi Lin, Fuhong Lin, Xingshuo An, Yutong Zhou, Ilsun You, and Xing Lü
- Subjects
Service (systems architecture) ,Content recommendation ,General Computer Science ,Association rule learning ,Computer science ,Cloud computing ,02 engineering and technology ,association rules ,Internet of Vehicles ,0202 electrical engineering, electronic engineering, information engineering ,General Materials Science ,Relevance (information retrieval) ,Mobile technology ,Edge computing ,business.industry ,General Engineering ,020206 networking & telecommunications ,Traffic congestion ,020201 artificial intelligence & image processing ,The Internet ,lcsh:Electrical engineering. Electronics. Nuclear engineering ,fog computing ,Internet of Things ,business ,Algorithm ,lcsh:TK1-9971 - Abstract
With the development of the Internet and mobile technologies, the Internet of Things (IoT) era has arrived. Vehicle networking technology can not only facilitate people’s travel but also effectively alleviate traffic congestion. The development of fog computing technology opens up many possibilities for the Internet of Vehicles (IoV). The intelligent navigator is a very important part of human–computer interaction in the IoV, and it handles a large number of content recommendation tasks for users. In order to recommend content more accurately, we propose a weighted interest degree recommendation algorithm using association rules for intelligence in the IoV. First, the user data are analyzed to establish the association rule mining algorithm. Second, the user interest score is predicted by analyzing the relevance between user interests, so that personalized services can be recommended to the user. The simulation results show that the proposed algorithm achieves higher recommendation accuracy.
- Published
- 2019
46. Publishing Anonymized Set-Valued Data via Disassociation towards Analysis
- Author
-
Nancy Awad, Bechara Al Bouna, Laurent Philippe, Jean-François Couchot, Franche-Comté Électronique Mécanique, Thermique et Optique - Sciences et Technologies (UMR 6174) (FEMTO-ST), Université de Technologie de Belfort-Montbeliard (UTBM)-Ecole Nationale Supérieure de Mécanique et des Microtechniques (ENSMM)-Université de Franche-Comté (UFC), Université Bourgogne Franche-Comté [COMUE] (UBFC)-Université Bourgogne Franche-Comté [COMUE] (UBFC)-Centre National de la Recherche Scientifique (CNRS), and Université Antonine (UA)
- Subjects
ant colony clustering ,Association rule learning ,Computer Networks and Communications ,Property (programming) ,Computer science ,media_common.quotation_subject ,knowledge extraction ,02 engineering and technology ,Data publishing ,[INFO.INFO-SE]Computer Science [cs]/Software Engineering [cs.SE] ,computer.software_genre ,privacy ,anonymization ,Set (abstract data type) ,[INFO.INFO-IU]Computer Science [cs]/Ubiquitous Computing ,association rules ,[INFO.INFO-CR]Computer Science [cs]/Cryptography and Security [cs.CR] ,Knowledge extraction ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,Cluster analysis ,media_common ,lcsh:T58.5-58.64 ,lcsh:Information technology ,Probabilistic logic ,Ambiguity ,16. Peace & justice ,[INFO.INFO-MO]Computer Science [cs]/Modeling and Simulation ,[INFO.INFO-MA]Computer Science [cs]/Multiagent Systems [cs.MA] ,utility ,disassociation ,020201 artificial intelligence & image processing ,[INFO.INFO-ET]Computer Science [cs]/Emerging Technologies [cs.ET] ,Data mining ,[INFO.INFO-DC]Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC] ,computer - Abstract
Data publishing is a challenging task under privacy preservation constraints. To ensure privacy, many anonymization techniques have been proposed. They differ in terms of the mathematical properties they verify and in terms of the functional objectives expected. Disassociation is one of the techniques that aim at anonymizing set-valued datasets (e.g., discrete locations, search and shopping items) while guaranteeing the confidentiality property known as $k^m$-anonymity. Disassociation separates the items of an itemset into vertical chunks to create ambiguity in the original associations. In a previous work, we defined a new ant-based clustering algorithm for the disassociation technique that preserves some items associated together, called utility rules, throughout the anonymization process, for accurate analysis. In this paper, we examine the disassociated dataset in terms of knowledge extraction. To make data analysis easy on top of the anonymized dataset, we define neighbor datasets, that is, datasets that result from a probabilistic re-association process. To assess the neighborhood notion, set-valued datasets are formalized as trees and a tree edit distance (TED) is applied directly between these neighbors. Finally, we show experimentally that the neighbors are faithful for knowledge extraction in future analysis.
- Published
- 2020
47. Standardizing interestingness measures for association rules
- Author
-
Mateen Shaikh, Paul D. McNicholas, M. Luiza Antonie, and Thomas Brendan Murphy
- Subjects
FOS: Computer and information sciences ,Association rule learning ,Computer science ,Machine Learning (stat.ML) ,02 engineering and technology ,Association rules ,computer.software_genre ,Statistics - Applications ,01 natural sciences ,Machine Learning (cs.LG) ,010104 statistics & probability ,Text categorization ,Statistics - Machine Learning ,020204 information systems ,Frequency patterns ,0202 electrical engineering, electronic engineering, information engineering ,Applications (stat.AP) ,0101 mathematics ,Interestingness measures ,business.industry ,Computer Science Applications ,Computer Science - Learning ,Standardizations ,Artificial intelligence ,business ,computer ,Analysis ,Natural language processing ,Information Systems - Abstract
Interestingness measures provide information about association rules. The value of an interestingness measure is often interpreted relative to the overall range of the interestingness measure. However, properties of individual association rules can further restrict what value an interestingness measure can achieve. These additional constraints are not typically taken into account in analysis, potentially misleading the investigator. Considering the value of an interestingness measure relative to this further constrained range provides greater insight than the original range alone and can even alter researchers' impressions of the data. Standardizing interestingness measures takes these additional restrictions into account, resulting in values that provide a relative measure of the attainable values. We explore the impacts of standardizing interestingness measures on real and simulated data.
Insight Research Centre; Natural Sciences and Engineering Research Council of Canada; Ontario Ministry of Research and Innovation
- Published
- 2018
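The idea in the abstract of entry 47 can be illustrated with lift. The sketch below is not the authors' standardization: it assumes the only constraints are the Fréchet bounds on the joint probability given the marginals P(A) and P(B), and it rescales lift to [0, 1] within that attainable range. The function names `lift_bounds` and `standardized_lift` are hypothetical.

```python
def lift_bounds(p_a, p_b):
    """Range of lift(A -> B) attainable given only the marginals P(A), P(B)."""
    joint_min = max(0.0, p_a + p_b - 1.0)  # Fréchet lower bound on P(A and B)
    joint_max = min(p_a, p_b)              # Fréchet upper bound on P(A and B)
    denom = p_a * p_b
    return joint_min / denom, joint_max / denom


def standardized_lift(p_ab, p_a, p_b):
    """Rescale lift to [0, 1] relative to its attainable range (assumed bounds)."""
    lo, hi = lift_bounds(p_a, p_b)
    if hi == lo:
        # Happens when a marginal equals 1: lift can take only one value.
        raise ValueError("lift has a single attainable value for these marginals")
    lift = p_ab / (p_a * p_b)
    return (lift - lo) / (hi - lo)


# Hypothetical rule with P(A) = 0.8, P(B) = 0.9, P(A and B) = 0.75.
raw_lift = 0.75 / (0.8 * 0.9)
relative = standardized_lift(0.75, 0.8, 0.9)
```

With these made-up numbers the raw lift is only slightly above 1, yet it sits about halfway through the range that the marginals allow, which is the kind of reinterpretation the abstract describes.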
48. Usage Apriori and clustering algorithms in WEKA tools to mining dataset of traffic accidents
- Author
-
Faisal Mohammed Nafie Ali and Abdelmoneim Ali Mohamed Hamed
- Subjects
Association rule learning ,Computer Networks and Communications ,Computer science ,computer.software_genre ,traffic accidents ,lcsh:Telecommunication ,Set (abstract data type) ,association rules ,lcsh:TK5101-6720 ,0502 economics and business ,Expectation–maximization algorithm ,Computer Science (miscellaneous) ,0501 psychology and cognitive sciences ,Electrical and Electronic Engineering ,Cluster analysis ,EM algorithm ,Data mining ,050107 human factors ,050210 logistics & transportation ,lcsh:T58.5-58.64 ,lcsh:Information technology ,05 social sciences ,InformationSystems_DATABASEMANAGEMENT ,Apriori ,Computer Science Applications ,ComputingMethodologies_PATTERNRECOGNITION ,A priori and a posteriori ,computer ,clustering - Abstract
The aim of this study is to find approaches for investigating association rule mining and clustering algorithms that derive new rules from a broad set of discovered rules taken from traffic accident data in Alghat Province, KSA. Several tools are applied in data mining to extract data. WEKA provides implementations of learning algorithms that can be run efficiently on any dataset. WEKA includes many algorithms for mining data, of which Apriori and clustering are among the best known. Apriori is a simple algorithm for mining repeated patterns from a transaction dataset in order to find frequent itemsets and associations between itemsets. Clustering is a technique for grouping a collection of items that have similar features. Association rules are applied to find connections between data items in a transactional database, and association rule mining algorithms are used to discover frequent associations. WEKA was used to analyse a traffic dataset composed of 946 instances and 8 attributes. The Apriori algorithm and EM clustering were applied to the traffic dataset to discover the factors that cause accidents. The results show that the Apriori algorithm performs better than the EM clustering algorithm.
- Published
- 2018
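The level-wise search that Apriori performs, as summarized in the abstract of entry 48, can be sketched in a few lines. This is a minimal illustration and not WEKA's implementation; the `apriori` function, the attribute names, and the five transactions are hypothetical and unrelated to the study's 946-instance dataset.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    current = {s for s in current if support(s) >= min_support}

    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s)
        k += 1
        # Join step: unite pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


# Made-up accident records, each a set of contributing conditions.
transactions = [
    {"rain", "night", "speeding"},
    {"rain", "night"},
    {"rain", "speeding"},
    {"night", "speeding"},
    {"rain", "night", "speeding"},
]
frequent = apriori(transactions, min_support=0.5)
```

On this toy data the three single conditions and the three pairs are frequent, while the three-item set is generated as a candidate but falls below the threshold. Rules would then be derived from the frequent itemsets by filtering on confidence.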
49. On Two Apriori-Based Rule Generators: Apriori in Prolog and Apriori in SQL
- Author
-
Kao-Yi Shen, Hiroshi Sakai, and Michinori Nakata
- Subjects
Apriori algorithm ,SQL ,Association rule learning ,Computer science ,apriori algorithm ,InformationSystems_DATABASEMANAGEMENT ,02 engineering and technology ,computer.software_genre ,Human-Computer Interaction ,Prolog ,association rules ,prolog ,ComputingMethodologies_PATTERNRECOGNITION ,Artificial Intelligence ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,A priori and a posteriori ,020201 artificial intelligence & image processing ,Computer Vision and Pattern Recognition ,Data mining ,computer ,rule generation ,computer.programming_language - Abstract
This paper focuses on two Apriori-based rule generators. The first is the rule generator in Prolog and C, and the second is the one in SQL. They are named Apriori in Prolog and Apriori in SQL, respectively. Each rule generator is based on the Apriori algorithm. However, each rule generator has its own properties. Apriori in Prolog employs the equivalence classes defined by table data sets and follows the framework of rough sets. On the other hand, Apriori in SQL employs a search for rule generation and does not make use of equivalence classes. This paper clarifies the properties of these two rule generators and considers effective applications of each to existing data sets.
- Published
- 2018
50. Method of Association Rules Mining and Its Application in Analysis of Seawater Samples
- Author
-
Xinhang Xu, Yonghong Liu, Hongtao Zhang, and Qiuhong Sun
- Subjects
Fitness function ,Association rule learning ,lcsh:T58.5-58.64 ,Computer science ,lcsh:T ,lcsh:Information technology ,Crossover ,Photoelectric sensor ,General Engineering ,computer.software_genre ,lcsh:Technology ,Association Rules ,Set (abstract data type) ,Immune Genetic Algorithm (IGA) ,Potential Data ,Mutation (genetic algorithm) ,Genetic algorithm ,A priori and a posteriori ,Data mining ,computer - Abstract
This paper aims to set up new rules for processing seawater quality monitoring data collected by a photoelectric sensor network, and to extract the useful information contained in the data. For this purpose, the immune algorithm was introduced into the classical genetic algorithm, the fitness function was designed, and the crossover and mutation probabilities were adjusted, thus creating the adaptive immune genetic algorithm (IIGA). The new algorithm was described in detail and applied to an actual case. Through a comparison between the IIGA, IGA and Apriori algorithms, the author concluded that the IIGA not only shortened the mining time, but also ensured the operation accuracy. The research findings are of great importance to association rule mining in various fields.
- Published
- 2018
Discovery Service for Jio Institute Digital Library