236 results on '"rule-based classification"'
Search Results
2. Assessing the cropping intensity dynamics of the Gosaba CD block of Indian Sundarbans using satellite-based remote sensing.
- Author
Ghosh, Argha, Nanda, Manoj Kumar, Sarkar, Debolina, Sarkar, Sukamal, Brahmachari, Koushik, and Mainuddin, Mohammed
- Subjects
SATELLITE-based remote sensing, CROP rotation, AGRICULTURAL productivity, SHIFTING cultivation, FOOD security, RAINFALL
- Abstract
Food availability is one of the dimensions of food security, and it is necessary to analyze the crop production scenario to estimate the availability of food in a region. Cropping sequence and cropping intensity indicate the seasonal crop production, thereby indicating the seasonal availability of food. Seasonal variation of per capita or per household availability of the cropped land determines the food security status of a given region. In Indian Sundarbans region, people's livelihood is seriously threatened by the food insecurity. The present study aimed to determine the seasonality of cropped land as well as the cropping intensities of Gosaba CD block of Indian Sundarbans during 2017–2018, 2018–2019 and 2019–2020 cropping years using Multi-dated Sentinel-2 data. Rule-based classification was applied for cropping sequence and cropping intensity mapping. Winter season cropped land was the lowest (< 16% of the village area). The area under crop–fallow–crop sequence (200% cropping intensity) decreased, while the area under crop–fallow–fallow (100% cropping intensity) sequence increased. Area under 300% cropping intensity gradually decreased. The average cropping intensity changed from 150% in 2017–2018 to 124% and 136% in 2018–2019 and 2019–2020, respectively. Large variation of the seasonal cropped land per household was estimated, and it became the worst during winter when it became less than 0.5 bighas (0.07 ha). Crop cultivation during dry season depended on the rainfall pattern and surface water availability. The present study successfully addressed the cropping scenario and food insecurity of the study area, and hopefully, it will help the planners and policy makers to take necessary actions for cropping intensification and ensuring food security in the Indian Sundarbans region. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
3. Classification of Human Learning Stages via Kernel Distribution Embeddings
- Author
Madeleine Shuhn-Tsuan Yuh, Kendric Ray Ortiz, Kylie Sue Sommer-Kohrt, Meeko Oishi, and Neera Jain
- Subjects
Cyberphysical systems, cognitive systems and control, human-in-the-loop systems, rule-based classification, kernel methods, Control engineering systems. Automatic machinery (General), TJ212-225, Technology
- Abstract
Adaptive automation, automation which is responsive to the human's performance via the alteration of control laws or level of assistance, is an important tool for training humans to attain new skills when operating dynamical systems. When coupled with cognitive feedback, adaptive automation has the potential to further facilitate human training, but requires precise assessments of human progression through various learning stages. This is challenging because of the underlying dynamics, as well as the stochasticity inherent to human action. We propose a data-driven approach to assess learning stages in a complex quadrotor landing task that is responsive to stochastic, human-in-the-loop quadrotor dynamics. We represent each learning stage as a distribution of canonical trajectories for that learning stage, then employ kernel distribution embeddings in combination with a rule-based heuristic, to determine which canonical distribution a sample landing trajectory is closest to. We demonstrate our approach on experimental human subject data, and use our approach to evaluate the efficacy of cognitively-based adaptive automation designed to calibrate self-confidence. Our approach is more accurate than standard classification methods, such as nearest centroid assignment, which rely on metrics that are not inherently suited to analysis of trajectories of stochastic dynamical systems.
- Published
- 2024
- Full Text
- View/download PDF
4. Concise and interpretable multi-label rule sets.
- Author
Ciaperoni, Martino, Xiao, Han, and Gionis, Aristides
- Subjects
NAIVE Bayes classification, REDUNDANCY in engineering, SAMPLING (Process)
- Abstract
Multi-label classification is becoming increasingly ubiquitous, but not much attention has been paid to interpretability. In this paper, we develop a multi-label classifier that can be represented as a concise set of simple "if-then" rules, and thus, it offers better interpretability compared to black-box models. Notably, our method is able to find a small set of relevant patterns that lead to accurate multi-label classification, while existing rule-based classifiers are myopic and wasteful in searching rules, requiring a large number of rules to achieve high accuracy. In particular, we formulate the problem of choosing multi-label rules to maximize a target function, which considers not only discrimination ability with respect to labels, but also diversity. Accounting for diversity helps to avoid redundancy, and thus, to control the number of rules in the solution set. To tackle the said maximization problem, we propose a 2-approximation algorithm, which circumvents the exponential-size search space of rules using a novel technique to sample highly discriminative and diverse rules. In addition to our theoretical analysis, we provide a thorough experimental evaluation and a case study, which indicate that our approach offers a trade-off between predictive performance and interpretability that is unmatched in previous work. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
5. Rule-Based Text Classification of Dental Diagnosis.
- Author
Mei WANG, AGRAWAL, Anushka, ROGERS, Nicole, JOHN, Vanchit, and THYVALIKAKATH, Thankam
- Abstract
Unstructured medical records boast an abundance of information that could greatly facilitate medical decision-making and improve patient care. With the development of Natural Language Processing (NLP) methodology, the free-text medical data starts to attract more and more research attention. Most existing studies try to leverage the power of such unstructured data using Machine Learning algorithms, which would usually require a relatively large training set, and high computational capacity. However, when faced with a smaller-scale project, opting for an alternative approach may be more effective and practical. This project proposes an efficient and light-weight rule-based approach to categorize dental diagnosis data. It not only fills the void of dental records in the medical free-text processing area, but also demonstrates that with expertly designed research structure and proper implementation, simple method could achieve our study goal very competently. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
6. CRC: Consolidated Rules Construction for Expressive Ensemble Classification
- Author
Almutairi, Manal, Stahl, Frederic, Bramer, Max, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Bramer, Max, editor, and Stahl, Frederic, editor
- Published
- 2022
- Full Text
- View/download PDF
7. Design of metaheuristic rough set-based feature selection and rule-based medical data classification model on MapReduce framework
- Author
Bhukya Hanumanthu and Manchala Sadanandam
- Subjects
mapreduce, big data analytics, medical data classification, rough set, feature selection, rule-based classification, Science, Electronic computers. Computer science, QA75.5-76.95
- Abstract
Recently, big data analytics have gained significant attention in healthcare industry due to generation of massive quantities of data in various forms such as electronic health records, sensors, medical imaging, and pharmaceutical details. However, the data gathered from various sources are intrinsically uncertain owing to noise, incompleteness, and inconsistency. The analysis of such huge data necessitates advanced analytical techniques using machine learning and computational intelligence for effective decision making. To handle data uncertainty in healthcare sector, this article presents a novel metaheuristic rough set-based feature selection with rule-based medical data classification (MRSFS-RMDC) technique on MapReduce framework. The proposed MRSFS-RMDC technique designs a butterfly optimization algorithm for minimal rough set selection. In addition, Hadoop MapReduce is applied to process massive quantity of data. Moreover, a rule-based classification approach named Repeated Incremental Pruning to Produce Error Reduction (RIPPER) is used with the inclusion of a set of conditional rules. The RIPPER will scale in a linear way with the number of training records utilized and is suitable to build models with data uncertainty. The proposed MRSFS-RMDC technique is validated using benchmark dataset and the results are inspected under varying aspects. The experimental results highlighted the supremacy of the MRSFS-RMDC technique over the recent state of art methods in terms of different performance measures. The proposed methodology has achieved a higher F-score of 96.49%.
- Published
- 2022
- Full Text
- View/download PDF
8. Rule-Based Arabic Sentiment Analysis using Binary Equilibrium Optimization Algorithm.
- Author
Rahab, Hichem, Haouassi, Hichem, and Laouid, Abdelkader
- Subjects
*SENTIMENT analysis, *MATHEMATICAL optimization, *METAHEURISTIC algorithms, *WEB development, *ONLINE social networks, *USER-generated content, *RECOMMENDER systems
- Abstract
With the development of websites and social networks, Internet users generate a massive amount of comments and information on the Web. Sentiment analysis, also called opinion mining, offers an opportunity to mine the people's sentiments and emotions from the textual comments. In the last decade, sentiment analysis has been applied in research areas such as recommendation and support systems and has become an area of interest for many researchers. Therefore, many studies have been carried out on English, while other languages, such as Arabic, received less attention. Increasingly, sentiment analysis researchers use machine learning due to its excellent performance. However, the generated models are black boxes and non-interpretable by the users. The rule-based classification is a promising approach for generating interpretable models. This work proposes a classification rule-based Arabic sentiment analysis approach together with a new binary equilibrium optimization metaheuristic algorithm as an optimization method for classification rule generation from Arabic documents. The proposed approach has been experimented on the Opinion Corpus for Arabic (OCA) and generates a classification model of thirteen rules. The comparison results with state-of-the-art methods show that the proposed approach outperforms all other white-box models regarding classification accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
9. WPEviRC: A Multi-rules-based Classifier for Evidential Databases Without Class Label Ambiguities.
- Author
Bahri, Nassim, Tobji, Mohamed Anis Bach, and Yaghlane, Boutheina Ben
- Subjects
*AMBIGUITY, *TAGS (Metadata), *DATABASES, *IMPERFECTION, *PERFORMANCE theory, *DECISION making
- Abstract
Rule-based classifiers use a collection of high-quality rules to classify new data instances. They can be categorized according to the adopted classification strategy: Classifiers based on a single rule, and classifiers based on multiple rules. Many works were proposed in this field. However, most of them do not handle imperfect data. In this study, we focus on the issue of multi-rules-based classification for evidential data, i.e., data where imperfection is modeled via the belief functions theory. In this respect, we introduce a new algorithm called WPEviRC. This latter involves a two-level pruning technique to remove redundant and noisy rules. Finally, it applies the Dempster rule of combination to fuse the selected rules and make the final decision. To evaluate the proposed method, we carried out extensive experiments on several benchmark data sets. The performance study showed interesting results in comparison to existing methods. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
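Note on record 9: the abstract names the Dempster rule of combination as the step that fuses the selected rules. The following is a minimal Python sketch of that general rule for two mass functions, not the WPEviRC implementation. The function name `dempster_combine` and the dictionary representation (focal sets as `frozenset` keys, masses as values) are assumptions made for illustration.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function is a dict mapping a frozenset (focal element)
    to its mass; the masses of each input are assumed to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            # Agreeing evidence: the product of masses goes to the intersection.
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            # Disjoint focal elements contribute to the conflict mass K.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("total conflict: the two sources cannot be combined")
    # Normalise by 1 - K so the combined masses sum to 1 again.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}
```

For example, with hypothetical classes A and B, combining `{A: 0.6, {A, B}: 0.4}` with `{B: 0.3, {A, B}: 0.7}` gives a conflict mass of 0.18, and the remaining products are rescaled by 1/0.82.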
10. Rule-based continuous line classification using shape and positional relationships between objects in piping and instrumentation diagram.
- Author
Han, Seung-Tae, Moon, Yoochan, Lee, Hyunsik, and Mun, Duhwan
- Subjects
*INDUSTRIALISM, *CLASSIFICATION, *DIGITIZATION
- Abstract
Recently, the digitization of piping and instrumentation diagrams (P&IDs) has become increasingly necessary in the plant industry. In previous studies on line recognition in P&IDs, lines were classified according to line signs after line detection. However, detailed studies on the classification of continuous lines are limited. Herein, a rule-based method for classifying continuous lines in a P&ID is proposed. This method uses the shape and positional relationships between objects in the P&ID to classify continuous lines into eight types: drain, annotation, spec break, dimension, extension, leader, thick continuous, and continuous line. A prototype system was developed to validate the proposed method, and line classification experiments were conducted on actual high-density industrial P&ID systems. The experimental results indicated that the average precision, recall, and F1 score for the five P&IDs were 96.300%, 97.945%, and 96.415%, respectively. These results indicate the excellent performance of the proposed method in classifying continuous lines for various P&IDs. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
11. Efficient sequential covering strategy for classification rules mining using a discrete equilibrium optimization algorithm.
- Author
MALIK, Mohamed Mahdi and HAOUASSI, Hichem
- Subjects
MATHEMATICAL optimization, ASSOCIATION rule mining, METAHEURISTIC algorithms, CLASSIFICATION algorithms, CLASSIFICATION, DATA mining, EQUILIBRIUM, ALGORITHMS
- Abstract
Rule-based classification is one of the important tasks in data mining due to its wide applications, particularly in the domains that need to interpret the classification decision such as medical diagnosis. The rule-based classification is a combination of the classification and association rule mining fields which aims at building interpretable classifiers by means of classification rules. This paper presents a novel and efficient sequential covering strategy for Classification Rule Mining to improve the interpretability of classifiers using a Discrete Equilibrium Optimization Algorithm called DEOA-CRM. Our approach benefits from the advantages of associative classification and population-based intelligence. It is inspired by the recent meta-heuristic equilibrium optimization algorithm. New discrete operators defined enable our approach to avoid local solutions and find global ones, improving the exploration and exploitation power in the search space. The proposed DEOA-CRM is tested on a total of 12 test data sets of various sizes and benchmarked with four recent and well-known rule-based classification mining algorithms. The obtained results confirm the efficiency of our algorithm in three chosen measures. Our approach fully deserves its use for classification rules generation to help decision-makers generate accurate and interpretable models. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
12. Task-Based Visual Interactive Modeling: Decision Trees and Rule-Based Classifiers.
- Author
Streeb, Dirk, Metz, Yannick, Schlegel, Udo, Schneider, Bruno, El-Assady, Mennatallah, Neth, Hansjorg, Chen, Min, and Keim, Daniel A.
- Subjects
DECISION trees, VISUAL analytics, MACHINE learning, SYSTEMS design, TASK analysis, SYSTEMS development
- Abstract
Visual analytics enables the coupling of machine learning models and humans in a tightly integrated workflow, addressing various analysis tasks. Each task poses distinct demands to analysts and decision-makers. In this survey, we focus on one canonical technique for rule-based classification, namely decision tree classifiers. We provide an overview of available visualizations for decision trees with a focus on how visualizations differ with respect to 16 tasks. Further, we investigate the types of visual designs employed, and the quality measures presented. We find that (i) interactive visual analytics systems for classifier development offer a variety of visual designs, (ii) utilization tasks are sparsely covered, (iii) beyond classifier development, node-link diagrams are omnipresent, (iv) even systems designed for machine learning experts rarely feature visual representations of quality measures other than accuracy. In conclusion, we see a potential for integrating algorithmic techniques, mathematical quality measures, and tailored interactive visualizations to enable human experts to utilize their knowledge more effectively. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
13. Assessing the spatial variation of cropping intensity using multi-temporal Sentinel-2 data by rule-based classification.
- Author
Ghosh, Argha, Nanda, Manoj K., and Sarkar, Debolina
- Subjects
NORMALIZED difference vegetation index, SPATIAL variation, CROP rotation, CROP management
- Abstract
The present study was conducted to analyze cropping intensity of four blocks (Mogra-Chinsurah, Polba-Dadpur, Singur and Haripal) of the Gangetic alluvial zone of India using multi-dated Sentinel-2 data in 2018–19 cropping year. It was observed that during peak growing stage all crops showed higher Normalized Difference Vegetation Index (NDVI) values (0.4 to 0.73) and NDVI became as low as 0.06 when the fields were vacant. Sentinel-2 data acquired in the peak crop growing period during each cropping season were carefully selected, and NDVI was computed over the whole study area. Rule-based classification was applied for cropping sequence and cropping intensity classification based on the occurrence and non-occurrence of crops using NDVI threshold (0.4). Sentinel-2 images acquired on 22/10/2018, 6/12/2018, 30/1/2019 and 30/4/2019 were used for masking of trees and non-agricultural area. October 22, January 30 and April 30 imageries demonstrated peak crop growing period during kharif, rabi and pre-kharif seasons whereas December 6 image represented occurrence of no or little crop in the study area. Crop acreage was the highest in Polba-Dadpur block during all the three seasons. The crop–fallow–crop sequence occupied the highest areas (43%) followed by crop–crop–crop sequence (39%). 50% and 39% of the total cultivated land was under 200% and 300% cropping intensities. Overall accuracies of cropping system and cropping intensity classification were 88.54% and 87.85%, respectively. Sentinel-2 data can be successfully used for cropping system analysis which helps in crop planning and management. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
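Note on record 13: the abstract describes classifying cropping sequence and cropping intensity from the occurrence or non-occurrence of a crop in each season, using an NDVI threshold of 0.4. The Python sketch below illustrates that kind of rule for a single pixel and is not the authors' code. The function names, the season order (kharif, rabi, pre-kharif) and the choice of `>=` rather than `>` at the threshold are assumptions.

```python
NDVI_THRESHOLD = 0.4  # crop-presence threshold reported in the abstract


def cropping_sequence(kharif, rabi, pre_kharif, threshold=NDVI_THRESHOLD):
    """Label a pixel, e.g. 'crop-fallow-crop', from three peak-season NDVI values."""
    seasons = (kharif, rabi, pre_kharif)
    return "-".join("crop" if v >= threshold else "fallow" for v in seasons)


def cropping_intensity(kharif, rabi, pre_kharif, threshold=NDVI_THRESHOLD):
    """Cropping intensity in percent: 100 times the number of cropped seasons."""
    return 100 * sum(v >= threshold for v in (kharif, rabi, pre_kharif))
```

A pixel with hypothetical NDVI values 0.65, 0.10 and 0.55 would be labelled `crop-fallow-crop` with 200% cropping intensity, which matches how the abstract pairs sequences with intensities.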
14. Fuzzy Rule-Based Classification Method for Incremental Rule Learning.
- Author
Niu, Jiaojiao, Chen, Degang, Li, Jinhai, and Wang, Hui
- Subjects
GRANULAR computing, ARTIFICIAL intelligence, CLASSIFICATION
- Abstract
Granular rules have been extensively used for classification in fuzzy datasets to promote the advancement of artificial intelligence. However, due to the diversity of data types, how to improve the readability of the extracted granular rules while ensuring efficiency is always a challenge. Since granular reduct in granular computing (GrC) can simplify real complex problem and dataset, this article carries out granular rule learning from the perspective of granular reduct by taking formal concept analysis (FCA)-based GrC method as a framework. Specifically, for achieving classification task, we first propose a method to update the granular reduct, and then explore the updating mechanism of fuzzy granular rule in a reduced dataset. Second, a novel fuzzy rule-based classification model named FRCM is presented for fuzzy granular rule learning. In order to verify the effectiveness of the proposed model, some numerical experiments for incremental learning and fuzzy rule mining are conducted to demonstrate that FRCM can achieve the state-of-the-art classification performance. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
15. Application of Data Mining and Machine Learning in Microwave Radiometry (MWR)
- Author
Levshinskii, Vladislav, Galazis, Christoforos, Ovchinnikov, Lev, Vesnin, Sergey, Losev, Alexander, Goryanin, Igor, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Kotenko, Igor, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Roque, Ana, editor, Tomczyk, Arkadiusz, editor, De Maria, Elisabetta, editor, Putze, Felix, editor, Moucek, Roman, editor, Fred, Ana, editor, and Gamboa, Hugo, editor
- Published
- 2020
- Full Text
- View/download PDF
16. Rule-Based BCG Matrix for Product Portfolio Analysis
- Author
Chiu, Chih-Chung, Lin, Kuo-Sui, Kacprzyk, Janusz, Series Editor, and Lee, Roger, editor
- Published
- 2020
- Full Text
- View/download PDF
17. Web Service Classification and Prediction Using Rule-Based Approach with Recommendations for Quality Improvements
- Author
Swami Das, M., Govardhan, A., Vijaya Lakshmi, D., Kacprzyk, Janusz, Series Editor, Pal, Nikhil R., Advisory Editor, Bello Perez, Rafael, Advisory Editor, Corchado, Emilio S., Advisory Editor, Hagras, Hani, Advisory Editor, Kóczy, László T., Advisory Editor, Kreinovich, Vladik, Advisory Editor, Lin, Chin-Teng, Advisory Editor, Lu, Jie, Advisory Editor, Melin, Patricia, Advisory Editor, Nedjah, Nadia, Advisory Editor, Nguyen, Ngoc Thanh, Advisory Editor, Wang, Jun, Advisory Editor, Raju, K. Srujan, editor, Govardhan, A., editor, Rani, B. Padmaja, editor, Sridevi, R., editor, and Murty, M. Ramakrishna, editor
- Published
- 2020
- Full Text
- View/download PDF
18. Exploring the Suitability of Rule-Based Classification to Provide Interpretability in Outcome-Based Process Predictive Monitoring.
- Author
Lee, Suhwan, Comuzzi, Marco, and Kwon, Nahyun
- Subjects
*ASSOCIATION rule mining, *PREDICTION models, *CLASSIFICATION, *RUNNING speed
- Abstract
The development of models for process outcome prediction using event logs has evolved in the literature with a clear focus on performance improvement. In this paper, we take a different perspective, focusing on obtaining interpretable predictive models for outcome prediction. We propose to use association rule-based classification, which results in inherently interpretable classification models. Although association rule mining has been used with event logs for process model approximation and anomaly detection in the past, its application to an outcome-based predictive model is novel. Moreover, we propose two ways of visualising the rules obtained to increase the interpretability of the model. First, the rules composing a model can be visualised globally. Second, given a running case on which a prediction is made, the rules influencing the prediction for that particular case can be visualised locally. The experimental results on real world event logs show that in most cases the performance of the rule-based classifier (RIPPER) is close to the one of traditional machine learning approaches. We also show the application of the global and local visualisation methods to real world event logs. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
19. The Importance of Agronomic Knowledge for Crop Detection by Sentinel-2 in the CAP Controls Framework: A Possible Rule-Based Classification Approach.
- Author
Sarvia, Filippo, De Petris, Samuele, Ghilardi, Federica, Xausa, Elena, Cantamessa, Gianluca, and Borgogno-Mondino, Enrico
- Subjects
*TIME-domain analysis, *MULTISPECTRAL imaging, *AGRICULTURAL policy, *RICE straw, *SOY flour, *CROPS, *RANDOM forest algorithms
- Abstract
Farmers are supported by European Union (EU) through contributions related to the common agricultural policy (CAP). To obtain grants, farmers have to apply every year according to the national/regional procedure that, presently, relies on the Geo-Spatial Aid Application (GSAA). To ensure the properness of applications, national/regional payment agencies (PA) operate random controls through in-field surveys. EU regulation n. 809/2014 has introduced a new approach to CAP controls based on Copernicus Sentinel-2 (S2) data. These are expected to better address PA checks on the field, suggesting eventual inconsistencies between satellite-based deductions and farmers' declarations. Within this framework, this work proposed a hierarchical (HI) approach to the classification of crops (soya, corn, wheat, rice, and meadow) explicitly aimed at supporting CAP controls in agriculture, with special concerns about the Piemonte Region (NW Italy) agricultural situation. To demonstrate the effectiveness of the proposed approach, a comparison is made between HI and other, more ordinary approaches. In particular, two algorithms were considered as references: the minimum distance (MD) and the random forest (RF). Tests were operated in a study area located in the southern part of the Vercelli province (Piemonte), which is mainly devoted to agriculture. Training and validation steps were performed for all the classification approaches (HI, MD, RF) using the same ground data. MD and RF were based on S2-derived NDVI image time series (TS) for the 2020 year. 
Differently, HI was built according to a rule-based approach developing according to the following steps: (a) TS standard deviation analysis in the time domain for meadows mapping; (b) MD classification of winter part of TS in the time domain for wheat detection; (c) MD classification of summer part of TS in the time domain for corn classification; (d) selection of a proper summer multi-spectral image (SMSI) useful for separating rice from soya with MD operated in the spectral domain. To separate crops of interest from other classes, MD-based classifications belonging to HI were thresholded by Otsu's method. Overall accuracy for MD, RF, and HI were found to be 63%, 80%, and 89%, respectively. It is worth remarking that thanks to the SMSI-based approach of HI, a significant improvement was obtained in soya and rice classification. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
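Note on record 19: the abstract states that the minimum-distance classifications inside the hierarchical approach were thresholded by Otsu's method. The Python sketch below shows the general technique on a plain list of scores: it picks the cut that maximizes the between-class variance. It is an illustration only, not the authors' pipeline. The function name `otsu_threshold` and the use of raw values instead of a binned image histogram are assumptions.

```python
def otsu_threshold(values):
    """Return the cut t that maximizes between-class variance.

    Values <= t form one class and values > t the other.
    Expects a non-empty sequence of numbers.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    best_t, best_var = xs[0], -1.0
    cum = 0.0
    for i in range(n - 1):
        cum += xs[i]
        if xs[i] == xs[i + 1]:
            continue  # a cut must fall between two distinct values
        w0 = (i + 1) / n                   # weight of the lower class
        w1 = 1.0 - w0                      # weight of the upper class
        mu0 = cum / (i + 1)                # mean of the lower class
        mu1 = (total - cum) / (n - i - 1)  # mean of the upper class
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, xs[i]
    return best_t
```

With hypothetical distance scores `[0.1, 0.12, 0.15, 0.8, 0.85, 0.9]` the function returns `0.15`, so the three low scores would be kept as the crop of interest and the three high scores assigned to other classes.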
20. R.ROSETTA: an interpretable machine learning framework
- Author
Mateusz Garbulowski, Klev Diamanti, Karolina Smolińska, Nicholas Baltzer, Patricia Stoll, Susanne Bornelöv, Aleksander Øhrn, Lars Feuk, and Jan Komorowski
- Subjects
Transcriptomics, Interpretable machine learning, Big data, Rough sets, Rule-based classification, R package, Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
- Abstract
Background: Machine learning involves strategies and algorithms that may assist bioinformatics analyses in terms of data mining and knowledge discovery. In several applications, viz. in Life Sciences, it is often more important to understand how a prediction was obtained rather than knowing what prediction was made. To this end so-called interpretable machine learning has been recently advocated. In this study, we implemented an interpretable machine learning package based on the rough set theory. An important aim of our work was provision of statistical properties of the models and their components. Results: We present the R.ROSETTA package, which is an R wrapper of ROSETTA framework. The original ROSETTA functions have been improved and adapted to the R programming environment. The package allows for building and analyzing non-linear interpretable machine learning models. R.ROSETTA gathers combinatorial statistics via rule-based modelling for accessible and transparent results, well-suited for adoption within the greater scientific community. The package also provides statistics and visualization tools that facilitate minimization of analysis bias and noise. The R.ROSETTA package is freely available at https://github.com/komorowskilab/R.ROSETTA. To illustrate the usage of the package, we applied it to a transcriptome dataset from an autism case–control study. Our tool provided hypotheses for potential co-predictive mechanisms among features that discerned phenotype classes. These co-predictors represented neurodevelopmental and autism-related genes. Conclusions: R.ROSETTA provides new insights for interpretable machine learning analyses and knowledge-based systems. We demonstrated that our package facilitated detection of dependencies for autism-related genes. Although the sample application of R.ROSETTA illustrates transcriptome data analysis, the package can be used to analyze any data organized in decision tables.
- Published
- 2021
- Full Text
- View/download PDF
21. Fuzzy three-way rule learning and its classification methods.
- Author
Cai, Mingjie, Yan, Mingzhe, and Jia, Zhenhua
- Subjects
*ARTIFICIAL intelligence, *GRANULAR computing, *CLASSIFICATION, *CLASSIFICATION algorithms, *MACHINE learning
- Abstract
Rules play a crucial role in classification tasks, driving the advancement of artificial intelligence. However, how to improve the interpretability of extracted rules while ensuring the performance of classification tasks is always a challenge, owing to the diversity of data types. Since three-way decision rules derive and explain from positive and negative aspects and provide more detailed information than general rules, this article explores fuzzy three-way rule learning from the perspective of two-way granular reduct by taking the FCA-based granular computing method as a framework. Specifically, we first present the object-induced fuzzy three-way granular rules and the object-induced two-way fuzzy three-way rules. Then, the fuzzy three-way rule-based dynamic updating method (FTRDUM) and the weight-based voting method are proposed to improve the classification performance. Finally, to illustrate the effectiveness of FTRDUM, some numerical experiments are conducted. The results show the superiority of the proposed algorithm in classification accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
22. Mathematical Variable Detection in PDF Scientific Documents
- Author
Hai Phong, Bui, Manh Hoang, Thang, Le, Thi-Lan, Aizawa, Akiko, Hutchison, David, Editorial Board Member, Kanade, Takeo, Editorial Board Member, Kittler, Josef, Editorial Board Member, Kleinberg, Jon M., Editorial Board Member, Mattern, Friedemann, Editorial Board Member, Mitchell, John C., Editorial Board Member, Naor, Moni, Editorial Board Member, Pandu Rangan, C., Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Terzopoulos, Demetri, Editorial Board Member, Tygar, Doug, Editorial Board Member, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Nguyen, Ngoc Thanh, editor, Gaol, Ford Lumban, editor, Hong, Tzung-Pei, editor, and Trawiński, Bogdan, editor
- Published
- 2019
- Full Text
- View/download PDF
23. ReG-Rules: An Explainable Rule-Based Ensemble Learner for Classification
- Author
Manal Almutairi, Frederic Stahl, and Max Bramer
- Subjects
Data mining, ensemble learning, explainable algorithms, rule-based classification, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
The learning of classification models to predict class labels of new and previously unseen data instances is one of the most essential tasks in data mining. A popular approach to classification is ensemble learning, where a combination of several diverse and independent classification models is used to predict class labels. Ensemble models are important as they tend to improve the average classification accuracy over any member of the ensemble. However, classification models are also often required to be explainable to reduce the risk of irreversible wrong classification. Explainability of classification models is needed in many critical applications such as stock market analysis, credit risk evaluation, intrusion detection, etc. Unfortunately, ensemble learning decreases the level of explainability of the classification, as the analyst would have to examine many decision models to gain insights about the causality of the prediction. The aim of the research presented in this paper is to create an ensemble method that is explainable in the sense that it presents the human analyst with a conditioned view of the most relevant model aspects involved in the prediction. To achieve this aim the authors developed a rule-based explainable ensemble classifier termed Ranked ensemble G-Rules (ReG-Rules) which gives the analyst an extract of the most relevant classification rules for each individual prediction. During the evaluation process ReG-Rules was evaluated in terms of its theoretical computational complexity, empirically on benchmark datasets and qualitatively with respect to the complexity and readability of the induced rule sets. The results show that ReG-Rules scales linearly, delivers a high accuracy and at the same time delivers a compact and manageable set of rules describing the predictions made.
- Published
- 2021
24. Rule-based meta-analysis reveals the major role of PB2 in influencing influenza A virus virulence in mice
- Author
-
Fransiskus Xaverius Ivan and Chee Keong Kwoh
- Subjects
Influenza A virus ,Mouse models ,Virulence ,Proteins ,Meta-analysis ,Rule-based classification ,Biotechnology ,TP248.13-248.65 ,Genetics ,QH426-470 - Abstract
Background: Influenza A virus (IAV) poses threats to human health and life. Many individual studies have been carried out in mice to uncover the viral factors responsible for the virulence of IAV infections. Nonetheless, a single study may not provide enough confidence about virulence factors, hence combining several studies in a meta-analysis is desirable to provide a better view. For this, we documented more than 500 records of IAV infections in mice for which the viral proteins could be retrieved and the mouse lethal dose 50 or, alternatively, weight loss and/or survival data were available for virulence classification. Results: IAV virulence models were learned from various datasets containing aligned IAV proteins and the corresponding two virulence classes (avirulent and virulent) or three virulence classes (low, intermediate and high virulence). Three proven rule-based learning approaches, i.e., OneR, JRip and PART, and additionally random forest were used for modelling. PART models achieved the best performance, with moderate average model accuracies ranging from 65.0 to 84.4% and from 54.0 to 66.6% for the two-class and three-class problems, respectively. PART models were comparable to or even better than random forest models and should be preferred based on the Occam’s razor principle. Interestingly, the average accuracy of the models was improved when host information was taken into account. For model interpretation, we observed that although many sites in HA were highly correlated with virulence, PART models based on sites in PB2 could compete against and were often better than PART models based on sites in HA. Moreover, PART had a high preference to include sites in PB2 when models were learned from datasets containing the concatenated alignments of all IAV proteins. Several sites with a known contribution to virulence were found as the top protein sites, and site pairs that may synergistically influence virulence were also uncovered.
Conclusion: Modelling IAV virulence is a challenging problem. Rule-based models generated using viral proteins are useful for their interpretability, but only achieve moderate performance. The development of more advanced approaches that learn models from features extracted from both viral and host proteins should be considered in future work.
- Published
- 2019
25. Combined Use of 3D and HSI for the Classification of Printed Circuit Board Components.
- Author
-
Polat, Songuel, Tremeau, Alain, and Boochs, Frank
- Subjects
PRINTED circuits ,ELECTRONIC equipment ,PLASTICS ,WASTE recycling ,CLASSIFICATION ,ELECTRONICS recycling ,ELECTRONIC waste - Abstract
Successful recycling of electronic waste requires accurate separation of materials such as plastics, PCBs and electronic components on PCBs (capacitors, transistors, etc.). This article therefore proposes a vision approach based on a combination of 3D and HSI data, relying on the mutual support of the datasets to compensate for the weaknesses of using a 3D or HSI sensor alone. The combined dataset serves as a basis for the extraction of geometric and spectral features. The classification is performed and evaluated based on these extracted features, which are exploited through rules. The efficiency of the proposed approach is demonstrated using real electronic waste and leads to convincing results with an overall accuracy (OA) of 98.24%. To illustrate that the addition of 3D data has added value, a comparison is also performed with an SVM classification based only on hyperspectral data. [ABSTRACT FROM AUTHOR]
- Published
- 2021
26. Remote Sensing of Wetland Types: Temperate Bogs, Mires, and Fens
- Author
-
Lucas, Richard, Finlayson, C. Max, editor, Everard, Mark, editor, Irvine, Kenneth, editor, McInnes, Robert J., editor, Middleton, Beth A., editor, van Dam, Anne A., editor, and Davidson, Nick C., editor
- Published
- 2018
27. Rule-Based Classification
- Author
-
Tung, Anthony K. H., Shim, Kyuseok, Section editor, Liu, Ling, editor, and Özsu, M. Tamer, editor
- Published
- 2018
28. Rule Induction with Iterated Local Search.
- Author
-
Jabba, Ayad Mohammed
- Subjects
CLASSIFICATION algorithms ,SEARCH algorithms ,ALGORITHMS ,MATHEMATICAL induction ,DATA mining ,MATHEMATICAL optimization - Abstract
The vast amount of data available in modern Internet applications makes it difficult to extract the knowledge behind these data. Data classification is a useful tool for handling large amounts of data by sorting them into coherent groups. Large data size degrades the performance of a classification technique, especially when the data contain uninformative (i.e., irrelevant or noisy) records. Stochastic local search is an optimization approach that can be employed to find the informative data from which to build the classification model. Such methods search for the optimal patterns with which to construct the classifier. Thus, this research introduces a stochastic local search algorithm for rule induction called Iterated-Miner (iterated local search-based rule induction). The purpose of the algorithm is to construct classification rules from data. This classification algorithm is inspired by the concepts and principles of stochastic local search and rule induction. The performance of the proposed classifier was evaluated against well-known, state-of-the-art classification algorithms, namely Ant-Miner, ACO/PSO2, PART, FURIA, and ACO/GA, on 10 UCI datasets. The results demonstrate that our classifier is superior to all of these classifiers with respect to classification accuracy, and that the model sizes produced by our classifier are competitive with those discovered by the other classifiers. [ABSTRACT FROM AUTHOR]
- Published
- 2021
29. S3PSO: Students’ Performance Prediction Based on Particle Swarm Optimization
- Author
-
Seyed M. H. Hasheminejad and M. Sarvmili
- Subjects
Educational Data Mining ,Particle Swarm Optimization ,Rule-Based Classification ,Information technology ,T58.5-58.64 ,Computer software ,QA76.75-76.765 - Abstract
Nowadays, new methods are required to take advantage of the rich and extensive data created by educational systems. Data mining algorithms have been used in educational systems, especially e-learning systems, owing to the broad usage of these systems. One reason for using data mining in educational systems is to provide a model that predicts students’ final results in a course. In this paper, we propose a novel rule-based classification method, called S3PSO (Students’ Performance Prediction based on Particle Swarm Optimization), to extract hidden rules that can be used to predict students’ final outcome. The proposed S3PSO method is based on the Particle Swarm Optimization (PSO) algorithm in discrete space. The S3PSO particle encoding yields rules that are more interpretable, even for non-expert users such as instructors. In S3PSO, Support, Confidence, and Comprehensibility criteria are used to calculate the fitness of each rule. Comparing the results obtained from S3PSO with those of other rule-based classification methods such as CART, C4.5, and ID3 reveals that S3PSO improves the fitness value by 31% on the Moodle data set. Additionally, comparing the results obtained from S3PSO with those of other classification methods such as SVM, KNN, Naïve Bayes, Neural Network and APSO reveals that S3PSO improves accuracy by 9% on the Moodle data set and yields promising results for predicting students’ final outcome.
- Published
- 2019
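The S3PSO abstract above names Support, Confidence, and Comprehensibility as the criteria behind each rule's fitness but does not define them. The following is a minimal sketch of the standard support and confidence measures for a single classification rule. The comprehensibility function is only a hypothetical proxy (fewer conditions means a more readable rule) and is not taken from the paper, and the records, attribute names, and values are invented for illustration.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


def rule_metrics(records: List[Record], antecedent: Dict[str, str],
                 label_key: str, label: str) -> Tuple[float, float]:
    """Return (support, confidence) of the rule 'antecedent -> label'."""
    # Records that satisfy every condition of the rule.
    covered = [r for r in records
               if all(r.get(k) == v for k, v in antecedent.items())]
    # Covered records whose class matches the rule's prediction.
    correct = [r for r in covered if r[label_key] == label]
    support = len(correct) / len(records) if records else 0.0
    confidence = len(correct) / len(covered) if covered else 0.0
    return support, confidence


def comprehensibility(antecedent: Dict[str, str], n_attributes: int) -> float:
    """Hypothetical proxy (an assumption, not the paper's definition):
    rules with fewer conditions score closer to 1."""
    return 1.0 - len(antecedent) / n_attributes


# Invented example data: two attributes plus the class label 'outcome'.
records = [
    {"forum_posts": "high", "quiz": "pass", "outcome": "pass"},
    {"forum_posts": "high", "quiz": "fail", "outcome": "pass"},
    {"forum_posts": "low", "quiz": "fail", "outcome": "fail"},
    {"forum_posts": "high", "quiz": "pass", "outcome": "fail"},
]
rule = {"forum_posts": "high"}  # IF forum_posts = high THEN outcome = pass

support, confidence = rule_metrics(records, rule, "outcome", "pass")
readability = comprehensibility(rule, n_attributes=2)
print(support, confidence, readability)
```

How the three values are combined into one fitness score is not stated in the abstract, so the sketch leaves them separate.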
30. On the Use of Binary Features in a Rule-Based Approach for Defect Detection on Patterned Textiles
- Author
-
Rocio A. Lizarraga-Morales, Fernando E. Correa-Tome, Raul E. Sanchez-Yanez, and Jonathan Cepeda-Negrete
- Subjects
Textile defect detection ,local binary features ,rule-based classification ,visual inspection ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
The quality assurance of fabrics is a fundamental issue in the textile manufacturing industry. Automatic and accurate detection of defects is one of the most important and challenging tasks in order to guarantee the quality of fabrics. In this paper, we propose an approach for defect detection on textiles with patterned texture using a rule-based classification system and local binary features. In our proposal, rules are automatically learned from the textile samples using a rough-set-based approach. The proposed system analyzes the texture of fabrics using a combination of local binary features, which have been shown to be highly discriminatory. Our approach is performed in two stages: training and testing. During the training stage, binary features from both defective and defect-free images are extracted and used to formulate an ensemble of rough-set-based rules. For the testing stage, we submit different samples of fabrics, and they are classified as defective or defect-free. The proposed method is quantitatively evaluated on an extensive dataset of images of defective fabrics. These experiments show that the proposed approach achieves higher accuracy than state-of-the-art methods.
- Published
- 2019
31. The Importance of Agronomic Knowledge for Crop Detection by Sentinel-2 in the CAP Controls Framework: A Possible Rule-Based Classification Approach
- Author
-
Filippo Sarvia, Samuele De Petris, Federica Ghilardi, Elena Xausa, Gianluca Cantamessa, and Enrico Borgogno-Mondino
- Subjects
agronomic knowledge ,hierarchical crops classification ,rule-based classification ,common agricultural policy controls ,sentinel-2 ,Agriculture - Abstract
Farmers are supported by European Union (EU) through contributions related to the common agricultural policy (CAP). To obtain grants, farmers have to apply every year according to the national/regional procedure that, presently, relies on the Geo-Spatial Aid Application (GSAA). To ensure the properness of applications, national/regional payment agencies (PA) operate random controls through in-field surveys. EU regulation n. 809/2014 has introduced a new approach to CAP controls based on Copernicus Sentinel-2 (S2) data. These are expected to better address PA checks on the field, suggesting eventual inconsistencies between satellite-based deductions and farmers’ declarations. Within this framework, this work proposed a hierarchical (HI) approach to the classification of crops (soya, corn, wheat, rice, and meadow) explicitly aimed at supporting CAP controls in agriculture, with special concerns about the Piemonte Region (NW Italy) agricultural situation. To demonstrate the effectiveness of the proposed approach, a comparison is made between HI and other, more ordinary approaches. In particular, two algorithms were considered as references: the minimum distance (MD) and the random forest (RF). Tests were operated in a study area located in the southern part of the Vercelli province (Piemonte), which is mainly devoted to agriculture. Training and validation steps were performed for all the classification approaches (HI, MD, RF) using the same ground data. MD and RF were based on S2-derived NDVI image time series (TS) for the 2020 year. 
In contrast, HI was built with a rule-based approach in the following steps: (a) TS standard deviation analysis in the time domain for meadows mapping; (b) MD classification of the winter part of the TS in the time domain for wheat detection; (c) MD classification of the summer part of the TS in the time domain for corn classification; (d) selection of a proper summer multi-spectral image (SMSI) useful for separating rice from soya with MD operated in the spectral domain. To separate crops of interest from other classes, the MD-based classifications belonging to HI were thresholded by Otsu’s method. Overall accuracy for MD, RF, and HI was found to be 63%, 80%, and 89%, respectively. It is worth remarking that, thanks to the SMSI-based approach of HI, a significant improvement was obtained in soya and rice classification.
- Published
- 2022
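The minimum distance (MD) step with a rejection threshold described in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the reference NDVI profiles are invented, Euclidean distance is assumed, and the rejection threshold `max_distance` is a fixed hypothetical value, whereas the abstract derives its thresholds with Otsu's method.

```python
import math
from typing import Dict, List


def euclidean(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two equally long time series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def minimum_distance_classify(series: List[float],
                              references: Dict[str, List[float]],
                              max_distance: float) -> str:
    """Assign the label of the nearest reference profile, or 'other'
    when even the nearest profile is farther than max_distance."""
    best_label, best_dist = "other", float("inf")
    for label, profile in references.items():
        dist = euclidean(series, profile)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label if best_dist <= max_distance else "other"


# Invented mean NDVI profiles (four dates) for two crop classes.
references = {
    "wheat": [0.3, 0.7, 0.8, 0.3],
    "corn": [0.2, 0.2, 0.6, 0.8],
}

# Hypothetical fixed threshold; the abstract uses Otsu's method instead.
wheat_like = minimum_distance_classify([0.35, 0.65, 0.75, 0.3], references, 0.3)
rejected = minimum_distance_classify([0.9, 0.9, 0.9, 0.9], references, 0.3)
print(wheat_like, rejected)
```

The rejection step is what lets a hierarchical scheme pass unmatched pixels on to the next rule instead of forcing them into the nearest class.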
32. Exploring the Suitability of Rule-Based Classification to Provide Interpretability in Outcome-Based Process Predictive Monitoring
- Author
-
Suhwan Lee, Marco Comuzzi, and Nahyun Kwon
- Subjects
business process ,event log ,predictive monitoring ,explainability ,rule-based classification ,Industrial engineering. Management engineering ,T55.4-60.8 ,Electronic computers. Computer science ,QA75.5-76.95 - Abstract
The development of models for process outcome prediction using event logs has evolved in the literature with a clear focus on performance improvement. In this paper, we take a different perspective, focusing on obtaining interpretable predictive models for outcome prediction. We propose to use association rule-based classification, which results in inherently interpretable classification models. Although association rule mining has been used with event logs for process model approximation and anomaly detection in the past, its application to an outcome-based predictive model is novel. Moreover, we propose two ways of visualising the rules obtained to increase the interpretability of the model. First, the rules composing a model can be visualised globally. Second, given a running case on which a prediction is made, the rules influencing the prediction for that particular case can be visualised locally. The experimental results on real-world event logs show that in most cases the performance of the rule-based classifier (RIPPER) is close to that of traditional machine learning approaches. We also show the application of the global and local visualisation methods to real-world event logs.
- Published
- 2022
33. Interpretable Machine Learning Reveals Dissimilarities Between Subtypes of Autism Spectrum Disorder
- Author
-
Mateusz Garbulowski, Karolina Smolinska, Klev Diamanti, Gang Pan, Khurram Maqbool, Lars Feuk, and Jan Komorowski
- Subjects
autism spectrum disorder ,interpretable machine learning ,transcriptomics ,rule-based classification ,autism spectrum disorder subtypes ,data integration ,Genetics ,QH426-470 - Abstract
Autism spectrum disorder (ASD) is a heterogeneous neuropsychiatric disorder with a complex genetic background. Analysis of altered molecular processes in ASD patients requires linear and nonlinear methods that provide interpretable solutions. Interpretable machine learning provides legible models that allow explaining biological mechanisms and support analysis of clinical subgroups. In this work, we investigated several case-control studies of gene expression measurements of ASD individuals. We constructed a rule-based learning model from three independent datasets that we further visualized as a nonlinear gene-gene co-predictive network. To find dissimilarities between ASD subtypes, we scrutinized a topological structure of the network and estimated a centrality distance. Our analysis revealed that autism is the most severe subtype of ASD, while pervasive developmental disorder-not otherwise specified and Asperger syndrome are closely related and milder ASD subtypes. Furthermore, we analyzed the most important ASD-related features that were described in terms of gene co-predictors. Among others, we found a strong co-predictive mechanism between EMC4 and TMEM30A, which may suggest a co-regulation between these genes. The present study demonstrates the potential of applying interpretable machine learning in bioinformatics analyses. Although the proposed methodology was designed for transcriptomics data, it can be applied to other omics disciplines.
- Published
- 2021
34. Inundation mapping of Kerala flood event in 2018 using ALOS-2 and temporal Sentinel-1 SAR images.
- Subjects
*FLOOD warning systems , *FLOODS , *EMERGENCY management , *REMOTE sensing , *GOVERNMENT agencies , *SECONDARY analysis - Abstract
In August 2018, the southern Indian state of Kerala received unusually heavy rainfall leading to large-scale flooding and destruction. Reliable flood inundation maps derived from remote sensing techniques help in flood disaster management activities. The freely available Sentinel-1A/B SAR data have the potential for flood inundation mapping due to their all-weather imaging capability. In this study, temporal dual-pol Sentinel-1 SAR data have been utilized. Single-date ALOS-2/PALSAR-2 commercial SAR data were also used to fill the gap between Sentinel-1 acquisitions during the peak flood period. Two flood-mapping approaches, viz. rule-based classification in the case of temporal SAR data and a histogram-based thresholding approach in the case of single-date imagery, were utilized in the study. Also, flood inundation mapping with different data constraints, i.e. availability of single-date and multi-date imagery, has been analysed and discussed. The obtained results were validated with multiple data sources such as survey data and secondary data from government agencies. An overall accuracy of 90.6% and a critical success index of 81.6% were achieved with the proposed rule-based classification approach. This study highlights the potential of the combination of Sentinel-1 and ALOS-2/PALSAR-2 data for flood inundation mapping. [ABSTRACT FROM AUTHOR]
- Published
- 2021
35. R.ROSETTA: an interpretable machine learning framework.
- Author
-
Garbulowski, Mateusz, Diamanti, Klev, Smolińska, Karolina, Baltzer, Nicholas, Stoll, Patricia, Bornelöv, Susanne, Øhrn, Aleksander, Feuk, Lars, and Komorowski, Jan
- Subjects
*MACHINE learning , *DATA mining , *GENES , *LEARNING strategies , *STATISTICAL models , *ROUGH sets - Abstract
Background: Machine learning involves strategies and algorithms that may assist bioinformatics analyses in terms of data mining and knowledge discovery. In several applications, viz. in Life Sciences, it is often more important to understand how a prediction was obtained rather than knowing what prediction was made. To this end so-called interpretable machine learning has been recently advocated. In this study, we implemented an interpretable machine learning package based on the rough set theory. An important aim of our work was provision of statistical properties of the models and their components. Results: We present the R.ROSETTA package, which is an R wrapper of ROSETTA framework. The original ROSETTA functions have been improved and adapted to the R programming environment. The package allows for building and analyzing non-linear interpretable machine learning models. R.ROSETTA gathers combinatorial statistics via rule-based modelling for accessible and transparent results, well-suited for adoption within the greater scientific community. The package also provides statistics and visualization tools that facilitate minimization of analysis bias and noise. The R.ROSETTA package is freely available at https://github.com/komorowskilab/R.ROSETTA. To illustrate the usage of the package, we applied it to a transcriptome dataset from an autism case–control study. Our tool provided hypotheses for potential co-predictive mechanisms among features that discerned phenotype classes. These co-predictors represented neurodevelopmental and autism-related genes. Conclusions: R.ROSETTA provides new insights for interpretable machine learning analyses and knowledge-based systems. We demonstrated that our package facilitated detection of dependencies for autism-related genes. Although the sample application of R.ROSETTA illustrates transcriptome data analysis, the package can be used to analyze any data organized in decision tables. [ABSTRACT FROM AUTHOR]
- Published
- 2021
36. Interpretable Machine Learning Reveals Dissimilarities Between Subtypes of Autism Spectrum Disorder.
- Author
-
Garbulowski, Mateusz, Smolinska, Karolina, Diamanti, Klev, Pan, Gang, Maqbool, Khurram, Feuk, Lars, and Komorowski, Jan
- Subjects
AUTISM spectrum disorders ,MACHINE learning ,ASPERGER'S syndrome ,GENETIC disorders ,GENES ,HEBBIAN memory - Abstract
Autism spectrum disorder (ASD) is a heterogeneous neuropsychiatric disorder with a complex genetic background. Analysis of altered molecular processes in ASD patients requires linear and nonlinear methods that provide interpretable solutions. Interpretable machine learning provides legible models that allow explaining biological mechanisms and support analysis of clinical subgroups. In this work, we investigated several case-control studies of gene expression measurements of ASD individuals. We constructed a rule-based learning model from three independent datasets that we further visualized as a nonlinear gene-gene co-predictive network. To find dissimilarities between ASD subtypes, we scrutinized a topological structure of the network and estimated a centrality distance. Our analysis revealed that autism is the most severe subtype of ASD, while pervasive developmental disorder-not otherwise specified and Asperger syndrome are closely related and milder ASD subtypes. Furthermore, we analyzed the most important ASD-related features that were described in terms of gene co-predictors. Among others, we found a strong co-predictive mechanism between EMC4 and TMEM30A, which may suggest a co-regulation between these genes. The present study demonstrates the potential of applying interpretable machine learning in bioinformatics analyses. Although the proposed methodology was designed for transcriptomics data, it can be applied to other omics disciplines. [ABSTRACT FROM AUTHOR]
- Published
- 2021
37. Semi-automatic multi-segmentation classification for land cover change dynamics in North Macedonia from 1988 to 2014.
- Author
-
Kaplan, Gordana
- Abstract
Land cover assessment and monitoring are essential for sustainable management of natural resources and environmental protection. Object-based image analysis (OBIA) for land cover classification has become an area of interest due to its superiority over the pixel-based classification method. The main objective of this paper is developing a method for land cover classification on the national and sub-national level in the Republic of North Macedonia for mapping and monitoring the land cover changes in the study area from 1988 to 2014. For that purpose, in this study, we combine OBIA with rule set semi-automated multi-segmentation classification for large-scale areas over medium-resolution satellite imagery. Thus, Landsat image collections over North Macedonia have been combined with topographic and settlement layers for land cover classification. Based on the knowledge of certain land cover features, rule-based classification has been developed using two different segmentation parameters. The results show that the overall agreement of the new semi-automatic classification method developed for North Macedonia is 83%. The most significant change in the land cover can be noticed in the forest class, with a total increase of 8% on national and 15% in the South-East region. These results confirm that this new semi-automatic, cost-effective, and accurate land cover classification method can be easily employed and adjusted for different study areas and can be used in numerous remote sensing applications. [ABSTRACT FROM AUTHOR]
- Published
- 2021
38. Multi-view Genetic Programming Learning to Obtain Interpretable Rule-Based Classifiers for Semi-supervised Contexts. Lessons Learnt
- Author
-
Carlos García-Martínez and Sebastián Ventura
- Subjects
Multi-view learning ,Rule-based classification ,Comprehensibility ,Semi-supervised learning ,Co-training ,Grammar-based genetic programming ,Electronic computers. Computer science ,QA75.5-76.95 - Abstract
Multi-view learning analyzes the information from several perspectives and has largely been applied on semi-supervised contexts. It has not been extensively analyzed for inducing interpretable rule-based classifiers. We present a multi-view and grammar-based genetic programming model for inducing rules for semi-supervised contexts. It evolves several populations and views, and promotes both accuracy and agreement among the views. This work details how and why common practices may not produce the expected results when inducing rule-based classifiers under this methodology.
- Published
- 2020
39. Automatic quality measurement of aortic contrast-enhanced CT angiographies for patient-specific dose optimization.
- Author
-
Pallenberg, René, Fleitmann, Marja, Soika, Kira, Stroth, Andreas Martin, Gerlach, Jan, Fürschke, Alexander, Barkhausen, Jörg, Bischof, Arpad, and Handels, Heinz
- Abstract
Purpose: Iodine-containing contrast agent (CA) used in contrast-enhanced CT angiography (CTA) can pose a health risk for patients. A system that adjusts the frequently used standard CA dose for individual patients based on their clinical parameters can be useful. As a basis for this, the quality of the image contrast in CTA volumes has to be determined, especially to recognize excessive contrast induced by CA overdosing. However, a manual assessment with a ROI-based image contrast classification is a time-consuming step in everyday clinical practice. Methods: We propose a method to automate the contrast measurement of aortic CTA volumes. The proposed algorithm is based on the mean HU values in selected ROIs that are automatically positioned in the CTA volume. First, an automatic localization algorithm determines the CTA image slices for certain ROIs, followed by the localization of these ROIs. A rule-based classification using the mean HU values in the ROIs categorizes images as having insufficient, optimal or excessive contrast. Results: In 95.89% of cases (70 out of 73 CTAs obtained with the ulrich medical CT motion contrast media injector), the algorithm chose the same image contrast class as the radiological expert. The critical case of missing an overdose did not occur, giving a positive predictive value of 100%. Conclusion: The resulting system works well within our range of considered scan protocols for detecting enhanced areas in CTA volumes. Our work automated the classification of CA-induced image contrast, which reduces the time medical practitioners need compared with performing such an assessment manually. [ABSTRACT FROM AUTHOR]
- Published
- 2020
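The rule-based step in the CTA abstract above, which maps mean HU values in the ROIs to one of three contrast classes, can be illustrated with a small sketch. The abstract gives neither the thresholds nor the way values from several ROIs are combined, so both are assumptions here: the cut-offs `low_hu` and `high_hu` are placeholder numbers with no clinical meaning, and the ROI means are simply averaged.

```python
from typing import List


def classify_contrast(roi_mean_hu: List[float],
                      low_hu: float = 200.0,
                      high_hu: float = 400.0) -> str:
    """Classify image contrast from the mean HU values of several ROIs.

    low_hu and high_hu are placeholder thresholds (assumptions, not
    values from the paper and not clinical guidance).
    """
    if not roi_mean_hu:
        raise ValueError("at least one ROI value is required")
    # Assumption: combine the ROIs by averaging their mean HU values.
    overall = sum(roi_mean_hu) / len(roi_mean_hu)
    if overall < low_hu:
        return "insufficient"
    if overall > high_hu:
        return "excessive"
    return "optimal"


# Invented ROI measurements for three hypothetical scans.
results = [
    classify_contrast([150.0, 180.0]),
    classify_contrast([250.0, 300.0]),
    classify_contrast([450.0, 500.0]),
]
print(results)
```

Because the rules are explicit thresholds, each decision can be traced back to the measured HU values, which is the interpretability benefit such abstracts emphasize.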
40. Rule-based meta-analysis reveals the major role of PB2 in influencing influenza A virus virulence in mice.
- Author
-
Ivan, Fransiskus Xaverius and Chee Keong Kwoh
- Abstract
Background: Influenza A virus (IAV) poses threats to human health and life. Many individual studies have been carried out in mice to uncover the viral factors responsible for the virulence of IAV infections. Nonetheless, a single study may not provide enough confidence about virulence factors, hence combining several studies in a meta-analysis is desirable to provide a better view. For this, we documented more than 500 records of IAV infections in mice for which the viral proteins could be retrieved and the mouse lethal dose 50 or, alternatively, weight loss and/or survival data were available for virulence classification. Results: IAV virulence models were learned from various datasets containing aligned IAV proteins and the corresponding two virulence classes (avirulent and virulent) or three virulence classes (low, intermediate and high virulence). Three proven rule-based learning approaches, i.e., OneR, JRip and PART, and additionally random forest were used for modelling. PART models achieved the best performance, with moderate average model accuracies ranging from 65.0 to 84.4% and from 54.0 to 66.6% for the two-class and three-class problems, respectively. PART models were comparable to or even better than random forest models and should be preferred based on the Occam’s razor principle. Interestingly, the average accuracy of the models was improved when host information was taken into account. For model interpretation, we observed that although many sites in HA were highly correlated with virulence, PART models based on sites in PB2 could compete against and were often better than PART models based on sites in HA. Moreover, PART had a high preference to include sites in PB2 when models were learned from datasets containing the concatenated alignments of all IAV proteins. Several sites with a known contribution to virulence were found as the top protein sites, and site pairs that may synergistically influence virulence were also uncovered.
Conclusion: Modelling IAV virulence is a challenging problem. Rule-based models generated using viral proteins are useful for their interpretability, but only achieve moderate performance. The development of more advanced approaches that learn models from features extracted from both viral and host proteins should be considered in future work. [ABSTRACT FROM AUTHOR]
- Published
- 2019
41. A Fuzzy Rule-Based Learning Algorithm for Customer Churn Prediction
- Author
-
Huang, Bingquan, Huang, Ying, Chen, Chongcheng, Kechadi, M. -T., Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, and Perner, Petra, editor
- Published
- 2016
42. Combined Use of 3D and HSI for the Classification of Printed Circuit Board Components
- Author
-
Songuel Polat, Alain Tremeau, and Frank Boochs
- Subjects
hyperspectral imaging, 3D data, point cloud, classification, rule-based classification, waste sorting, Technology, Engineering (General). Civil engineering (General), TA1-2040, Biology (General), QH301-705.5, Physics, QC1-999, Chemistry, QD1-999
- Abstract
Successful recycling of electronic waste requires accurate separation of materials such as plastics, PCBs and electronic components on PCBs (capacitors, transistors, etc.). This article therefore proposes a vision approach based on a combination of 3D and HSI data, relying on the mutual support of the datasets to compensate for the weaknesses of using a 3D or HSI sensor alone. The combined dataset serves as a basis for the extraction of geometric and spectral features. Classification is performed and evaluated on these extracted features, which are exploited through rules. The efficiency of the proposed approach is demonstrated using real electronic waste and yields an overall accuracy (OA) of 98.24%. To illustrate the added value of the 3D data, a comparison is also performed with an SVM classification based only on hyperspectral data.
- Published
- 2021
43. EARC: Evidential association rule-based classification.
- Author
-
Geng, Xiaojiao, Liang, Yan, and Jiao, Lianmeng
- Subjects
- *ALGORITHMS, *MODEL-based reasoning, *ASSOCIATION rule mining, *CLASSIFICATION, *MULTIPLE criteria decision making
- Abstract
As an extension of classical fuzzy rule-based classification, belief rule-based classification is a promising technique for handling hybrid information with multiple uncertainties in real-world applications. However, the antecedent structure of each resultant rule is fixed and hence may cause overfitting when few instances are available, while some resultant rules are also redundant due to the similarity of neighboring rules. Here, an evidential association rule-based classification method, called EARC, is developed by integrating evidential association rule mining and classification to obtain an accurate and compact classification model. First, new measures of evidential support and confidence are proposed to represent rule interestingness. Then, a three-stage rule mining algorithm is developed to generate a set of evidential classification association rules: an Apriori-based search for frequent fuzzy itemsets to discover all possible antecedents, derivation of evidential consequents in the belief function framework, and extraction of reliable rules using the measures of evidential support and confidence. Further, to make the classification efficient, rule prescreening and rule selection procedures are presented for deleting redundant rules and obtaining an accurate classifier, respectively. Finally, an improved belief reasoning process is presented for classifying each input instance by combining the top K activated rules. Experimental results on real-world datasets demonstrate the superiority of the proposed method in classification accuracy and interpretability. [ABSTRACT FROM AUTHOR]
- Published
- 2021
44. A Travel Time Prediction Algorithm Using Rule-Based Classification on MapReduce
- Author
-
Lee, HyunJo, Hong, Seungtae, Kim, Hyung Jin, Chang, Jae-Woo, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Chen, Qiming, editor, Hameurlain, Abdelkader, editor, Toumani, Farouk, editor, Wagner, Roland, editor, and Decker, Hendrik, editor
- Published
- 2015
45. Rule-based classification of energy theft and anomalies in consumers load demand profile
- Author
-
Sonal Jain, Kushan A. Choksi, and Naran M. Pindoriya
- Subjects
pattern classification, power consumption, data mining, security of data, learning (artificial intelligence), fraud, power system management, power engineering computing, data privacy, metering, power meters, knowledge based systems, meta data, classification block, rule-based classification, energy theft, consumers load demand profile, advanced metering infrastructure, ami, consumers consumption patterns, power utilities, fraud detection methodology, data mining techniques, consumer consumption patterns, rule-base learning, validation technique, energy anomalies, abnormality type classification, validation block, privacy preservation, metadata, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
The advent of advanced metering infrastructure (AMI) opens the door to a comprehensive analysis of consumers' consumption patterns, including energy theft studies, which was not possible beforehand. This study proposes a fraud detection methodology that uses data mining techniques such as hierarchical clustering and decision tree classification to identify abnormalities in consumer consumption patterns and then classifies the abnormality type as anomaly, fraud, or high or low power consumption using rule-based learning. The proposed algorithm uses a real-time dataset from Nana Kajaliyala village, Gujarat, India. The focus has been on generalizing the algorithm for varied practical cases to make it adaptive to non-malicious changes in consumer profile. In addition, this study proposes a novel validation technique that uses predicted profiles to ensure accurate separation of anomaly and theft targets. The results exhibit a high detection ratio and a low false-positive ratio owing to the application of an appropriate validation block. The proposed methodology is also investigated from the point of view of privacy preservation and is found to be relatively secure owing to low sampling rates and minimal usage of metadata and the communication layer. The proposed algorithm has an edge over state-of-the-art theft detection algorithms in detection accuracy and robustness to outliers.
- Published
- 2019
46. Rule-Based Text Classification of Dental Diagnosis.
- Author
-
Wang M, Agrawal A, Rogers N, John V, and Thyvalikakath T
- Subjects
- Humans, Machine Learning, Medical Records, Natural Language Processing, Clinical Decision-Making, Algorithms
- Abstract
Unstructured medical records contain an abundance of information that could greatly facilitate medical decision-making and improve patient care. With the development of Natural Language Processing (NLP) methodology, free-text medical data has started to attract more and more research attention. Most existing studies try to leverage the power of such unstructured data using Machine Learning algorithms, which usually require a relatively large training set and high computational capacity. However, for a smaller-scale project, an alternative approach may be more effective and practical. This project proposes an efficient and lightweight rule-based approach to categorize dental diagnosis data. It not only fills the gap left by the absence of dental records in medical free-text processing, but also demonstrates that, with an expertly designed research structure and proper implementation, a simple method can achieve the study goal very competently.
- Published
- 2024
47. Neutrosophic rule-based prediction system for toxicity effects assessment of biotransformed hepatic drugs.
- Author
-
Basha, Sameh H., Tharwat, Alaa, Abdalla, Areeg, and Hassanien, Aboul Ella
- Subjects
- *DRUG toxicity, *TOXICITY testing, *BIOTRANSFORMATION (Metabolism), *CLASSIFICATION
- Abstract
Highlights
• Novel Neutrosophic Rule-based Classification System (NRCS) models are proposed.
• The dataset has 553 drugs that are biotransformed in the liver.
• Three rough set-based algorithms are used for feature selection.
• Three sampling algorithms are used to obtain balanced data.
• The proposed models obtained promising results.
Abstract
Measuring toxicity is an important step in drug development. However, the current experimental methods used to estimate drug toxicity are expensive and require considerable computational effort. Therefore, these methods are not suitable for large-scale evaluation of drug toxicity. As a consequence, there is a high demand for computational models that can predict drug toxicity risks. In this paper, we used a dataset that consists of 553 drugs that are biotransformed in the liver. In this data, there are four toxic effects, namely, mutagenic, tumorigenic, irritant and reproductive effects. Each drug is represented by 31 chemical descriptors. This paper proposes two models for predicting drug toxicity risks. The proposed models consist of three phases. In the first phase, the most discriminative features are selected using rough set-based methods to reduce the classification time and improve the classification performance. In the second phase, three different sampling algorithms, namely, Random Under-Sampling, Random Over-Sampling, and Synthetic Minority Oversampling Technique (SMOTE), are used to obtain balanced data. In the third phase, the first proposed model employs the Neutrosophic Rule-based Classification System (NRCS), and the second model uses Genetic NRCS (GNRCS) to classify an unknown drug as toxic or non-toxic. The experimental results showed that the proposed models obtained high sensitivity (89–93%), specificity (91–97%), and GM (90–94%) for all toxic effects.
Overall, the results indicate that the proposed models could be used to predict drug toxicity in the early stages of drug development. [ABSTRACT FROM AUTHOR]
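The three figures reported in this abstract can be related to each other with a short sketch. It assumes that GM denotes the geometric mean of sensitivity and specificity, which is the usual meaning in imbalanced classification; the abstract itself does not expand the abbreviation. The function name, the labels and the toy predictions are hypothetical and are not the paper's data.

```python
import math

def sensitivity_specificity_gm(y_true, y_pred, positive="toxic"):
    """Compute sensitivity, specificity and their geometric mean for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sensitivity = tp / (tp + fn)  # share of toxic drugs that were flagged
    specificity = tn / (tn + fp)  # share of non-toxic drugs that were cleared
    return sensitivity, specificity, math.sqrt(sensitivity * specificity)

# Hypothetical toy labels: 4 toxic and 6 non-toxic drugs.
y_true = ["toxic"] * 4 + ["non-toxic"] * 6
y_pred = ["toxic", "toxic", "toxic", "non-toxic"] + ["non-toxic"] * 5 + ["toxic"]

sens, spec, gm = sensitivity_specificity_gm(y_true, y_pred)
print(round(sens, 3), round(spec, 3), round(gm, 3))
```

Because the geometric mean drops sharply when either rate is low, it rewards classifiers that do well on both classes, which is why it is often reported alongside resampling methods for imbalanced data.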
- Published
- 2019
48. Evolving rule-based classifiers with genetic programming on GPUs for drifting data streams.
- Author
-
Cano, Alberto and Krawczyk, Bartosz
- Subjects
- *GENETIC programming, *GRAPHICS processing units, *INSTRUCTIONAL systems, *MACHINE learning, *DATA mining
- Abstract
Highlights
• Grammar-guided genetic programming rule-based classifier for drifting data streams.
• Online induction of highly accurate and interpretable rules.
• Fast adaptation to any type of concept drift.
• Mechanisms for rule diversification and adaptation.
• Efficient implementation on GPUs suitable for high-speed data streams.
Abstract
Designing efficient algorithms for mining massive high-speed data streams has become one of the contemporary challenges for the machine learning community. Such models must display the highest possible accuracy and the ability to adapt swiftly to any kind of change, while at the same time having low time and memory complexity. However, little attention has been paid to designing learning systems that allow us to gain a better understanding of incoming data. There are few proposals on how to design interpretable classifiers for drifting data streams, and most of them are characterized by a significant trade-off between accuracy and interpretability. In this paper, we show that it is possible to have all of these desirable properties in one model. We introduce ERulesD2S: an evolving rule-based classifier for drifting data streams. By using grammar-guided genetic programming, we are able to obtain accurate sets of rules per class that adapt to changes in the stream without the need for an explicit drift detector. Additionally, we augment our learning model with new proposals for rule propagation and data stream sampling, in order to maintain a balance between learning and forgetting of concepts. To improve the efficiency of mining massive and non-stationary data, we implement ERulesD2S parallelized on GPUs. A thorough experimental study on 30 datasets shows that ERulesD2S is able to adapt efficiently to any type of concept drift and outperform state-of-the-art rule-based classifiers, while using a small number of rules.
At the same time, ERulesD2S is highly competitive with other single and ensemble learners in terms of accuracy and computational complexity, while offering fully interpretable classification rules. Additionally, we show that ERulesD2S can scale up efficiently to high-dimensional data streams, while offering very fast update and classification times. Finally, we present the learning capabilities of ERulesD2S for sparsely labeled data streams. [ABSTRACT FROM AUTHOR]
- Published
- 2019
49. Knowledge-based Systems and Interestingness Measures: Analysis with Clinical Datasets
- Author
-
Jabez J. Christopher, Khanna H. Nehemiah, and Kannan Arputharaj
- Subjects
knowledge base, decision support systems, rule-based classification, rule list, interestingness measures, Electronic computers. Computer science, QA75.5-76.95
- Abstract
Knowledge mined from clinical data can be used for medical diagnosis and prognosis. Improving the quality of the knowledge base can enhance the prediction efficiency of a knowledge-based system. Designing accurate and precise clinical decision support systems that use the mined knowledge is still a broad area of research. This work analyses the variation in classification accuracy for such knowledge-based systems using different rule lists. The purpose of this work is not to improve the prediction accuracy of a decision support system, but to analyze the factors that influence the efficiency and design of the knowledge base in a rule-based decision support system. Three benchmark medical datasets are used. Rules are extracted using a supervised machine learning algorithm (PART). Each rule in the ruleset is evaluated using nine frequently used rule interestingness measures. After the measure values are calculated, the rule lists are used for performance evaluation. Experimental results show variation in classification accuracy for different rule lists. Confidence and Laplace measures yield relatively superior accuracy: 81.188% for the heart disease dataset and 78.255% for the diabetes dataset. The accuracy of the knowledge-based prediction system depends predominantly on the organization of the ruleset. Rule length needs to be considered when deciding the rule ordering. A subset of a rule, or a combination of rule elements, may form a new rule and sometimes be a member of the rule list. Redundant rules should be eliminated. Prior knowledge about the domain will enable knowledge engineers to design a better knowledge base.
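The effect of the interestingness measure on rule ordering can be illustrated with a small sketch. It assumes the common textbook definitions, confidence = correct / covered and Laplace = (correct + 1) / (covered + number of classes); the abstract does not spell out its formulas, so these may differ from the ones the authors used. The `Rule` class and the coverage counts are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Rule:
    name: str
    covered: int   # training instances matching the rule's antecedent
    correct: int   # of those, instances that also match the rule's class

def confidence(rule):
    return rule.correct / rule.covered

def laplace(rule, n_classes=2):
    # Laplace-corrected accuracy: penalises rules supported by few instances.
    return (rule.correct + 1) / (rule.covered + n_classes)

# Hypothetical rules with made-up coverage counts.
rules = [Rule("r1", 4, 4), Rule("r2", 50, 45), Rule("r3", 20, 15)]

# A rule list is applied top-down, so the sort order decides which rule fires first.
by_confidence = [r.name for r in sorted(rules, key=confidence, reverse=True)]
by_laplace = [r.name for r in sorted(rules, key=laplace, reverse=True)]
print(by_confidence, by_laplace)
```

With these counts, confidence ranks the perfectly accurate but thinly supported `r1` first, whereas Laplace promotes the well-supported `r2`. A difference of this kind is one way the same ruleset can yield different accuracies under different orderings.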
- Published
- 2016
50. Copyright Infringement Detection of Music Videos on YouTube by Mining Video and Uploader Meta-data
- Author
-
Agrawal, Swati, Sureka, Ashish, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Bhatnagar, Vasudha, editor, and Srinivasa, Srinath, editor
- Published
- 2013