25 results
Search Results
2. High-Dimensional Ensemble Learning Classification: An Ensemble Learning Classification Algorithm Based on High-Dimensional Feature Space Reconstruction.
- Author
- Zhao, Miao and Ye, Ning
- Subjects
- MACHINE learning, CLASSIFICATION algorithms, FEATURE selection, NAIVE Bayes classification, HIGH-dimensional model representation, CLASSIFICATION, ALGORITHMS, PROBLEM solving
- Abstract
When performing classification tasks on high-dimensional data, traditional machine learning algorithms often fail to adequately filter out the useful information in the features, leading to low classification accuracy. Therefore, this paper explores high-dimensional data along both the data-feature dimension and the model-ensemble dimension. We propose a high-dimensional ensemble learning classification algorithm focusing on feature space reconstruction and classifier ensembling, called the HDELC algorithm. First, the algorithm considers feature space reconstruction and generates a feature space reconstruction matrix, effectively achieving feature selection and reconstruction for high-dimensional data. An optimal feature space is generated for the subsequent classifier ensemble, which enhances the representativeness of the feature space. Second, we recursively determine the number of classifiers and the number of feature subspaces in the ensemble model. Different classifiers in the ensemble system are assigned mutually exclusive, non-intersecting feature subspaces for model training. The experimental results show that the HDELC algorithm has advantages on most high-dimensional datasets, owing to its more efficient feature space ensemble capability and relatively reliable ensemble performance. The HDELC algorithm makes it possible to solve the classification problem for high-dimensional data effectively and has substantial research and application value. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
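The mutually exclusive, non-intersecting feature subspaces described in the abstract above can be sketched as a simple partitioning step. This is only an illustration of the idea, not the HDELC reconstruction matrix itself; the shuffle and round-robin split are assumptions.

```python
import random

def partition_feature_subspaces(n_features, n_classifiers, seed=0):
    """Split feature indices into mutually exclusive subsets, one per
    classifier, so that no two classifiers share a feature."""
    rng = random.Random(seed)
    indices = list(range(n_features))
    rng.shuffle(indices)
    # Round-robin assignment yields non-intersecting subspaces.
    return [sorted(indices[k::n_classifiers]) for k in range(n_classifiers)]

subspaces = partition_feature_subspaces(n_features=10, n_classifiers=3)
# Every feature appears in exactly one subspace.
assert sorted(i for s in subspaces for i in s) == list(range(10))
```

Each classifier in the ensemble would then be trained only on the columns listed in its own subspace.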
3. The Feasibility of Solving the Satisfiability Problem Using Various Machine Learning Approaches.
- Author
- Lan Zhang, Lei Hu, and Lina Yu
- Subjects
- DEEP learning, MACHINE learning, PROBLEM solving, MODAL logic, PROPOSITION (Logic), CLASSIFICATION algorithms
- Abstract
In this paper, we propose a novel approach to solving theorem-proving problems without relying on any deduction method. We transform logical formulas into numbers, vectors and matrices, and feed the corresponding data into various machine learning algorithms to predict their satisfiability. Here, we introduce ProverX, a novel theorem prover that utilizes various binary classification algorithms, ranging from traditional machine learning to deep learning, to tackle the satisfiability (SAT) problem of propositional logic. Empirical experiments were conducted to evaluate the performance of ProverX using datasets we generated. ProverX achieved accuracy rates ranging from 81.8% to 98.7%, demonstrating a remarkable speedup of almost 180 times compared to CTL-RP, a highly efficient prover. These results demonstrate the feasibility of replacing deduction with learning in theorem proving, opening promising avenues for further exploration in more complex logics (e.g., Modal Logic, Coalition Logic, Propositional Linear-Time Temporal Logic and Computation Tree Logic), provided that their resolution methods exist. We also present a detailed analysis of the best-performing machine learning approaches for this task and introduce a new algorithm, IPDA, which has the potential to further enhance performance. [ABSTRACT FROM AUTHOR]
- Published
- 2024
4. Balanced K-Star: An Explainable Machine Learning Method for Internet-of-Things-Enabled Predictive Maintenance in Manufacturing.
- Author
- Ghasemkhani, Bita, Aktas, Ozlem, and Birant, Derya
- Subjects
- MACHINE learning, PRODUCT management software, INTERNET of things, CLASSIFICATION algorithms, ARTIFICIAL intelligence, PROBLEM solving
- Abstract
Predictive maintenance (PdM) combines Internet of Things (IoT) technologies with machine learning (ML) to predict probable failures that create a need for maintenance of manufacturing equipment, providing the opportunity to solve the related problems and make adaptive decisions in a timely manner. However, a standard ML algorithm cannot be directly applied to a PdM dataset, which is highly imbalanced since, in most cases, signals correspond to normal rather than critical conditions. To deal with this data imbalance, this paper proposes a novel explainable ML method entitled "Balanced K-Star", based on the K-Star classification algorithm, for PdM in an IoT-based manufacturing environment. Experiments conducted on a PdM dataset showed that the proposed Balanced K-Star method outperformed the standard K-Star method in terms of classification accuracy. The results also showed that the proposed method (98.75%) achieved higher accuracy than state-of-the-art methods (91.74%) on the same data. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
5. Study of Multi-Class Classification Algorithms' Performance on Highly Imbalanced Network Intrusion Datasets.
- Author
- Bulavas, Viktoras, Marcinkevičius, Virginijus, and Rumiński, Jacek
- Subjects
- CLASSIFICATION algorithms, DECISION trees, PROBLEM solving, FEATURE selection, ALGORITHMS, MACHINE learning, RANDOM forest algorithms, COMPUTER networks
- Abstract
This paper is devoted to the problem of class imbalance in machine learning, focusing on the intrusion detection of rare classes in computer networks. Class imbalance occurs when one class heavily outnumbers the others. In this paper, we are particularly interested in classifiers, as pattern recognition and anomaly detection can be framed as classification problems. Since the majority of any organization's network traffic is benign and malignant traffic is rare, researchers have to deal with a class imbalance problem. Substantial research has been undertaken to identify methods or data features that allow these attacks to be identified accurately. The usual tactic for dealing with class imbalance, however, is to label all malignant traffic as one class and then solve a binary classification problem. In this paper, we instead choose not to group or drop rare classes, and we investigate what can be done to achieve good multi-class classification performance. Rare-class records were up-sampled using the SMOTE method (Chawla et al., 2002) to preset ratio targets. Experiments on three network traffic datasets, namely CIC-IDS2017, CSE-CIC-IDS2018 (Sharafaldin et al., 2018) and LITNET-2020 (Damasevicius et al., 2020), were performed with the aim of reliably recognizing the rare malignant classes present in these datasets. Popular machine learning algorithms were chosen to compare their readiness to support rare-class detection. Algorithm hyperparameters were tuned over a wide range of values, different feature selection methods were used, and tests were executed with and without over-sampling to assess multi-class classification performance on rare classes.
A ranking of the machine learning algorithms based on Precision, Balanced Accuracy Score, Ḡ, and the bias-variance decomposition of the prediction error shows that decision tree ensembles (AdaBoost, Random Forest and Gradient Boosting) performed best on the network intrusion datasets used in this research. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
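The ranking metrics named in the abstract above (Balanced Accuracy and Ḡ) are simple functions of per-class recall. A minimal sketch, assuming Ḡ denotes the geometric mean of per-class recalls, a common convention in imbalanced learning that the abstract does not spell out:

```python
import math

def per_class_recall(y_true, y_pred):
    """Recall for each class present in y_true."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        support = sum(1 for t in y_true if t == c)
        recalls.append(tp / support)
    return recalls

def balanced_accuracy(y_true, y_pred):
    """Arithmetic mean of per-class recalls."""
    r = per_class_recall(y_true, y_pred)
    return sum(r) / len(r)

def g_mean(y_true, y_pred):
    """Geometric mean of per-class recalls (assumed reading of G-bar)."""
    r = per_class_recall(y_true, y_pred)
    return math.prod(r) ** (1.0 / len(r))

y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]
assert balanced_accuracy(y_true, y_pred) == 0.625  # (0.75 + 0.5) / 2
```

Unlike plain accuracy, both metrics penalize a classifier that ignores a rare class, which is why they are preferred for imbalanced intrusion data.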
6. Learning vector quantization classifiers for ROC-optimization.
- Author
- Villmann, T., Kaden, M., Hermann, W., and Biehl, M.
- Subjects
- LEARNING vector quantization, CLASSIFICATION algorithms, COMBINATORIAL optimization, MACHINE learning, PROBLEM solving
- Abstract
This paper proposes a variant of the generalized learning vector quantizer (GLVQ) that explicitly optimizes the area under the receiver operating characteristic (ROC) curve for binary classification problems instead of the classification accuracy, which is frequently not an appropriate criterion for classifier evaluation. This is particularly important in the case of overlapping class distributions, when the user has to decide on the trade-off between high true-positive and low false-positive rates. The model keeps the idea of prototype-based learning vector quantization trained by stochastic gradient descent. For this purpose, a GLVQ-based cost function is presented that describes the area under the ROC curve in terms of a sum of local discriminant functions. This cost function reflects the rank statistics underlying ROC analysis, which are incorporated into the design of the prototype-based discriminant function. The resulting learning scheme for the prototype vectors uses structured inputs, i.e. ordered pairs of data vectors from both classes. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
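The cost function described above expresses the area under the ROC curve through ordered pairs of data from both classes; the underlying rank statistic is the Wilcoxon-Mann-Whitney estimate of the AUC. A sketch of that pairwise estimate over arbitrary classifier scores (not the GLVQ prototype machinery itself):

```python
def auc_wmw(pos_scores, neg_scores):
    """Wilcoxon-Mann-Whitney estimate of the area under the ROC curve:
    the fraction of (positive, negative) score pairs ranked correctly,
    counting ties as half a correct ranking."""
    wins = 0.0
    for sp in pos_scores:
        for sn in neg_scores:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

assert auc_wmw([0.9, 0.8], [0.1, 0.2]) == 1.0  # perfectly separated classes
assert auc_wmw([0.5], [0.5]) == 0.5            # a tie scores as chance level
```

Optimizing a smoothed version of this pairwise sum, rather than per-sample accuracy, is what lets a model trade off true-positive against false-positive performance directly.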
7. Differentially Private Kernel Support Vector Machines Based on the Exponential and Laplace Hybrid Mechanism.
- Author
- Sun, Zhenlong, Yang, Jing, Li, Xiaoye, and Zhang, Jianpei
- Subjects
- SUPPORT vector machines, CLASSIFICATION algorithms, MEAN value theorems, PROBLEM solving, ALGORITHMS, SYMMETRIC matrices, MACHINE learning
- Abstract
Support vector machines (SVMs) are among the most robust and accurate of the well-known machine learning algorithms, especially for classification. SVMs train a classification model by solving an optimization problem that decides which instances in the training dataset are the support vectors (SVs). However, SVs are intact instances taken from the training dataset, so directly releasing the classification model of an SVM carries significant risk to the privacy of individuals when the training data contain sensitive information. In this paper, we study the problem of how to release the classification model of kernel SVMs while preventing privacy leakage of the SVs and satisfying the requirements of privacy protection. We propose a new differentially private algorithm for kernel SVMs based on an exponential and Laplace hybrid mechanism, named DPKSVMEL. The DPKSVMEL algorithm has two major advantages over existing private SVM algorithms. One is that it protects the privacy of the SVs by post-processing, so the training process of the non-private kernel SVM does not change. The other is that the scoring function values are derived directly from the symmetric kernel matrix generated during training and do not require additional storage space or complex sensitivity analysis. In the DPKSVMEL algorithm, we define a similarity parameter to denote the correlation, or distance, between the non-SVs and each SV. Each non-SV is then assigned to a group with one of the SVs according to the maximal value of the similarity. For a given similarity parameter value, if the number of non-SVs in a group is greater than k, we replace the SV with the mean of the top-k most similar non-SVs, selected randomly within the group by the exponential mechanism. Otherwise, we add random noise to the SV by the Laplace mechanism. We theoretically prove that the DPKSVMEL algorithm satisfies differential privacy.
The extensive experiments show the effectiveness of the DPKSVMEL algorithm for kernel SVMs on real datasets; meanwhile, it achieves higher classification accuracy than existing private SVM algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
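The Laplace mechanism that the abstract falls back to when a group has too few non-SVs is the standard way to privatize a numeric value. A generic sketch, with sensitivity and epsilon as caller-supplied parameters; the paper derives its own sensitivity, so this is not the DPKSVMEL scoring step itself:

```python
import math, random

def laplace_mechanism(value, sensitivity, epsilon, rng=None):
    """Add Laplace(0, sensitivity/epsilon) noise to a numeric value,
    the standard construction for epsilon-differential privacy."""
    rng = rng or random
    scale = sensitivity / epsilon
    # Inverse-CDF sampling of the Laplace distribution.
    u = rng.random() - 0.5
    noise = -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))
    return value + noise

# Smaller epsilon (stronger privacy) gives a larger noise scale.
private_value = laplace_mechanism(10.0, sensitivity=1.0, epsilon=0.5)
```

Applied per coordinate of a support vector, this hides the exact training instance while keeping the released value useful on average.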
8. Transfer Learning Analysis of Multi-Class Classification for Landscape-Aware Algorithm Selection.
- Author
- Škvorc, Urban, Eftimov, Tome, and Korošec, Peter
- Subjects
- CLASSIFICATION algorithms, MACHINE learning, KNOWLEDGE transfer, PROBLEM solving
- Abstract
In optimization, algorithm selection, the selection of the most suitable algorithm for a specific problem, is of great importance, as algorithm performance depends heavily on the problem being solved. However, when using machine learning for algorithm selection, the performance of the algorithm selection model depends on the data used to train and test the model, and existing optimization benchmarks provide only a limited amount of data. To help with this problem, artificial problem generation has been shown to be a useful tool for augmenting existing benchmark problems. In this paper, we are interested in the problem of knowledge transfer between artificially generated and existing handmade benchmark problems in the domain of continuous numerical optimization. That is, can an algorithm selection model trained purely on artificially generated problems correctly provide algorithm recommendations for existing handmade problems? We show that such a model produces low-quality results, and we also provide explanations of how the algorithm selection model works and show the differences between the problem data sets in order to explain the model's performance. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
9. Machine learning based phishing detection from URLs.
- Author
- Sahingoz, Ozgur Koray, Buber, Ebubekir, Demir, Onder, and Diri, Banu
- Subjects
- MACHINE learning, PHISHING, UNIFORM Resource Locators, CLASSIFICATION algorithms, PROBLEM solving
- Abstract
Highlights
• Use of seven different classification algorithms and NLP-based features.
• A big URL dataset is produced and shared (36,400 legitimate and 37,175 phishing URLs).
• Real-time and language-independent classification algorithms.
• Feature-rich classifiers with word vectors, NLP-based and hybrid features.
• The proposed approach reaches a 97.98% accuracy rate.
Abstract
Due to the rapid growth of the Internet, users have shifted their preference from traditional shopping to electronic commerce. Instead of robbing banks or shops, criminals nowadays try to find their victims in cyberspace with specific tricks. Using the anonymous structure of the Internet, attackers deploy new techniques, such as phishing, to deceive victims with fake websites that collect sensitive information such as account IDs, usernames and passwords. Deciding whether a web page is legitimate or phishing is a very challenging problem because of the semantics-based structure of the attack, which mainly exploits computer users' vulnerabilities. Although software companies launch new anti-phishing products that use blacklists, heuristics, and visual and machine learning-based approaches, these products cannot prevent all phishing attacks. In this paper, a real-time anti-phishing system that uses seven different classification algorithms and natural language processing (NLP) based features is proposed. The system has the following distinguishing properties compared to other studies in the literature: language independence, use of a huge amount of phishing and legitimate data, real-time execution, detection of new websites, independence from third-party services, and use of feature-rich classifiers. To measure the performance of the system, a new dataset was constructed, and the experimental results were obtained on it.
According to the experimental and comparative results from the implemented classification algorithms, Random Forest algorithm with only NLP based features gives the best performance with the 97.98% accuracy rate for detection of phishing URLs. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
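The abstract does not enumerate its NLP-based features, so the lexical URL features below (length, digit count, IP-address host, a suspicious-word list) are purely illustrative assumptions about the kind of inputs such a phishing classifier might consume:

```python
from urllib.parse import urlparse

# Illustrative token list only; a real system would learn or curate this.
SUSPICIOUS_TOKENS = {"login", "verify", "secure", "account", "update"}

def url_features(url):
    """Extract simple lexical features from a URL for a downstream
    binary phishing/legitimate classifier."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    return {
        "url_length": len(url),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_subdomains": max(host.count(".") - 1, 0),
        "has_ip_host": host.replace(".", "").isdigit(),
        "suspicious_words": sum(tok in url.lower() for tok in SUSPICIOUS_TOKENS),
    }

f = url_features("http://192.168.0.1/secure-login/update")
assert f["has_ip_host"] and f["suspicious_words"] == 3
```

Feature dictionaries like this would be vectorized and fed to any of the seven classifiers the study compares.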
10. Machine learning-based multi-documents sentiment-oriented summarization using linguistic treatment.
- Author
- Abdi, Asad, Shamsuddin, Siti Mariyam, Hasan, Shafaatunnur, and Piran, Jalil
- Subjects
- MACHINE learning, INFORMATION processing, COMPUTER users, CLASSIFICATION algorithms, PROBLEM solving
- Abstract
Sentiment summarization is the process of automatically creating a compressed version of the opinionated information expressed in a text. This paper presents a machine learning-based approach to summarizing users' opinions expressed in reviews using: (1) sentiment knowledge, to calculate a sentence sentiment score as one of the features for sentence-level classification; it integrates multiple strategies to tackle the problems of sentiment shifters, sentence types and word-coverage limits; (2) a word embedding model, a deep-learning-inspired method for capturing the meaning of and semantic relationships among words and extracting a vector representation for each word; and (3) statistical and linguistic knowledge to determine salient sentences. The proposed method combines several types of features into a unified feature set to design a more accurate classification system ("True": the sentence belongs to the extractive reference summary; "False": otherwise). To achieve better performance, we carried out a study of four well-known feature selection techniques and seven of the most popular classifiers, to select the most relevant set of features and find an efficient machine learning classifier, respectively. The proposed method is applied to three different datasets, and the results show that integrating a support vector machine-based classification method with Information Gain (IG) as the feature selection technique can significantly improve performance and make the method comparable to other existing methods. Furthermore, a method that learns from this unified feature set obtains better performance than one that learns from a feature subset. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
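Information Gain (IG), the feature selection technique the study pairs with the SVM classifier, scores a feature by how much it reduces the entropy of the labels. A minimal sketch for a discrete-valued feature:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(Y) of a label sequence, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """IG(Y; X) = H(Y) - H(Y | X) for a discrete-valued feature X."""
    n = len(labels)
    cond = 0.0
    for v in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == v]
        cond += len(subset) / n * entropy(subset)
    return entropy(labels) - cond

# A feature that perfectly splits the labels has IG equal to H(Y) = 1 bit.
assert information_gain([0, 0, 1, 1], ["a", "a", "b", "b"]) == 1.0
```

Ranking features by this score and keeping the top ones is the usual way IG is used for feature selection before training a classifier.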
11. Improved Local Search Based Approximation Algorithm for Hard Uniform Capacitated k-Median Problem.
- Author
- Grover, Sapna, Gupta, Neelima, and Pancholi, Aditya
- Subjects
- SEARCH algorithms, APPROXIMATION algorithms, PROBLEM solving, MACHINE learning, CLASSIFICATION algorithms
- Abstract
Copyright of Informatica (03505596) is the property of Slovene Society Informatika and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2018
- Full Text
- View/download PDF
12. A collective learning approach for semi-supervised data classification.
- Author
- UYLAŞ SATI, Nur
- Subjects
- DATA mining, MACHINE learning, PROBLEM solving, CLASSIFICATION algorithms, DATA analysis
- Abstract
Semi-supervised data classification is a significant field of study in machine learning and data mining, since it deals with datasets that contain a few labeled and many unlabeled instances. Researchers are interested in this field because most real-life datasets have this property. In this paper we suggest a collective method for solving semi-supervised data classification problems. Examples in R^1 are presented and solved to give a clear understanding of the method. For comparison with state-of-the-art methods, the well-known machine learning tool WEKA is used. Experiments are performed on real-world datasets from the UCI dataset repository. Results are reported as testing accuracies obtained by ten-fold cross-validation. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
13. Effective data generation for imbalanced learning using conditional generative adversarial networks.
- Author
- Douzas, Georgios and Bacao, Fernando
- Subjects
- MACHINE learning, CLASSIFICATION algorithms, PROBLEM solving, APPROXIMATION algorithms, DATA distribution
- Abstract
Learning from imbalanced datasets is a frequent but challenging task for standard classification algorithms. Although there are different strategies to address this problem, methods that generate artificial data for the minority class constitute a more general approach than algorithmic modifications. Standard oversampling methods are variations of the SMOTE algorithm, which generates synthetic samples along the line segment that joins minority class samples. These approaches are therefore based on local information rather than on the overall minority class distribution. In contrast to these algorithms, in this paper the conditional version of Generative Adversarial Networks (cGAN) is used to approximate the true data distribution and generate data for the minority class of various imbalanced datasets. The performance of cGAN is compared against multiple standard oversampling algorithms. We present empirical results that show a significant improvement in the quality of the generated data when cGAN is used as an oversampling algorithm. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
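The line-segment interpolation that the abstract attributes to SMOTE can be sketched directly. This is a minimal single-sample version; the full algorithm batches this step over k nearest neighbors and a target sampling ratio:

```python
import random

def smote_sample(minority, k=1, rng=None):
    """Generate one synthetic minority point by interpolating along the
    line segment between a random minority sample and one of its k
    nearest minority-class neighbors (the core SMOTE step)."""
    rng = rng or random.Random(0)
    base = rng.choice(minority)
    # Nearest neighbors by squared Euclidean distance, excluding base itself.
    neighbors = sorted(
        (p for p in minority if p is not base),
        key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
    )[:k]
    neighbor = rng.choice(neighbors)
    gap = rng.random()  # random position along the joining segment
    return tuple(a + gap * (b - a) for a, b in zip(base, neighbor))

minority = [(0.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
synthetic = smote_sample(minority, k=1, rng=random.Random(1))
# All inputs lie on the line y = x, so the synthetic point does too.
assert abs(synthetic[0] - synthetic[1]) < 1e-9
```

Because every synthetic point sits on a segment between existing minority samples, SMOTE uses only local information, which is exactly the limitation the cGAN approach targets.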
14. Using machine learning to detect and localize concealed objects in passive millimeter-wave images.
- Author
- López-Tapia, Santiago, Molina, Rafael, and Pérez de la Blanca, Nicolás
- Subjects
- MACHINE learning, MILLIMETER waves, PROBLEM solving, CLASSIFICATION algorithms, COMPUTATIONAL complexity
- Abstract
The detection and location of objects concealed under clothing is a very challenging task that has crucial applications in security. In this domain, passive millimeter-wave images (PMMWIs) can be used. However, the quality of the acquired images, and the unknown position, shape, and size of hidden objects render this task difficult. In this paper, we propose a machine learning-based solution to this detection/localization problem. Our method outperforms currently used approaches. The effect of non-stationary noise on different classification algorithms is analyzed and discussed, and a detailed experimental comparative study of classification techniques is presented using a new and comprehensive PMMWI database. The low computational testing cost of this solution allows for its use in real-time applications. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
15. A novel multi-swarm particle swarm optimization with dynamic learning strategy.
- Author
- Ye, Wenxing, Feng, Weiying, and Fan, Suohai
- Subjects
- PARTICLE swarm optimization, MACHINE learning, INFORMATION sharing, PROBLEM solving, CLASSIFICATION algorithms
- Abstract
In this paper, we propose a novel multi-swarm particle swarm optimization with a dynamic learning strategy (PSO-DLS) to improve the performance of PSO. To promote information exchange among sub-swarms, the particle classification mechanism classifies the particles in each sub-swarm into ordinary particles and communication particles with different tasks at each iteration. The ordinary particles focus on exploitation under the guidance of the local best position in their sub-swarm, while the communication particles, with their dynamic ability, focus on exploration under the guidance of a united local best position in a new search region and promote information exchange among sub-swarms. Moreover, the strategy employs a dynamic control mechanism with an increasing parameter p for the classification operation, which gives ordinary particles an increasing tendency to evolve into communication particles during the search process. A simple analysis of the search behavior supports its remarkable impact on maintaining diversity and finding better solutions. Experimental results on 15 function problems of CEC 2015 in 10 and 30 dimensions also demonstrate its promising effectiveness in solving complex problems, compared statistically with other algorithms. Furthermore, the computational times reveal the efficient design of PSO-DLS. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
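For contrast with PSO-DLS, here is a plain single-swarm, global-best PSO baseline; the sub-swarms, the ordinary/communication particle classification and the parameter p of the proposed method are layered on top of an update loop like this one. The coefficient values are conventional defaults, not taken from the paper:

```python
import random

def pso_minimize(f, dim, n_particles=20, iters=100, bounds=(-5.0, 5.0), seed=0):
    """Plain global-best PSO: each particle is pulled toward its own best
    position and the swarm-wide best position."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    w, c1, c2 = 0.7, 1.5, 1.5  # conventional inertia/acceleration settings
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

best, val = pso_minimize(lambda x: sum(xi * xi for xi in x), dim=2)
assert val < 1e-2  # converges near the 2-D sphere-function optimum
```

PSO-DLS replaces the single global best with per-sub-swarm local bests plus a united local best for its communication particles, precisely to avoid the premature convergence this baseline is prone to.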
16. Mexican Hat Wavelet Kernel ELM for Multiclass Classification.
- Author
- Wang, Jie, Song, Yi-Fan, and Ma, Tian-Lei
- Subjects
- FEEDFORWARD neural networks, MACHINE learning, WAVELETS (Mathematics), CLASSIFICATION algorithms, PROBLEM solving
- Abstract
Kernel extreme learning machine (KELM) is a feedforward neural network method widely used in classification problems. To some extent, it solves the problems of invalid nodes and large computational complexity in ELM. However, the traditional KELM classifier often has low test accuracy on multiclass classification problems. To solve this problem, a new classifier, the Mexican Hat wavelet KELM classifier, is proposed in this paper. The proposed classifier improves the training accuracy and reduces the training time in multiclass classification problems. Moreover, the validity of the Mexican Hat wavelet as a kernel function for ELM is rigorously proved. Experimental results on different datasets show that the performance of the proposed classifier is significantly superior to that of the compared classifiers. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
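A common way to build a wavelet kernel is the product form over the Mexican Hat mother wavelet psi(t) = (1 - t^2) e^(-t^2 / 2). Whether the paper uses exactly this construction is not stated in the abstract, so treat this as an assumed sketch of the general technique:

```python
import math

def mexican_hat(t):
    """Mexican Hat (Ricker) mother wavelet, up to normalization."""
    return (1.0 - t * t) * math.exp(-t * t / 2.0)

def wavelet_kernel(x, z, a=1.0):
    """Product-form wavelet kernel k(x, z) = prod_i psi((x_i - z_i) / a),
    with dilation parameter a; a drop-in replacement for the usual
    RBF kernel in a KELM-style model."""
    return math.prod(mexican_hat((xi - zi) / a) for xi, zi in zip(x, z))

assert wavelet_kernel((1.0, 2.0), (1.0, 2.0)) == 1.0  # psi(0) = 1, so k(x, x) = 1
```

Since the kernel depends only on coordinate differences, it is symmetric in its arguments, as a kernel must be.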
17. Incremental rough set approach for hierarchical multicriteria classification.
- Author
- Luo, Chuan, Li, Tianrui, Chen, Hongmei, Fujita, Hamido, and Yi, Zhang
- Subjects
- ROUGH sets, MULTIPLE criteria decision making, CLASSIFICATION algorithms, PROBLEM solving, DATA analysis, MACHINE learning
- Abstract
Multicriteria classification refers to classifying objects evaluated on a set of criteria into preference-ordered decision classes. The dominance-based rough set approach has been successfully introduced to express and reason about inconsistencies with the dominance principle in multicriteria classification problems. Hierarchical attribute values exist extensively in real-world applications, providing a hierarchical form in which to organize, view and analyze data from different perspectives to accommodate preference variability. In this study, we extend the dominance-based rough set approach by applying an incremental learning technique to hierarchical multicriteria classification in which attribute values vary dynamically across different levels of granulation. We formalize the dynamic characteristics of knowledge granules under cut refinement and coarsening through attribute-value taxonomies in hierarchical multicriteria decision systems. Consequently, incremental algorithms for computing dominance-based rough approximations of preference-ordered decision classes are developed that take the resulting prior knowledge as input and recompute only those outputs that depend on the changed attribute values. This paper presents the theoretical foundation of the proposed approach. Example analysis and experimental evaluation are also provided to illustrate its feasibility and efficiency. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
18. Multi-class classification via heterogeneous ensemble of one-class classifiers.
- Author
- Kang, Seokho, Cho, Sungzoon, and Kang, Pilsung
- Subjects
- CLASSIFICATION algorithms, REGRESSION analysis, PROBLEM solving, MACHINE learning, EXPERIMENTS
- Abstract
In this paper, a multi-class classification method based on a heterogeneous ensemble of one-class classifiers is proposed. The proposed method consists of two phases: training heterogeneous one-class classifiers for each class using various one-class classification algorithms, and constructing an ensemble by combining the base classifiers using multi-response linear regression-based stacking. The use of various classification algorithms contributes towards increasing the diversity of the ensemble, while stacking resolves the normalization issues arising from the different output scales of the base classifiers. In addition, we demonstrate the selective utilization of base classifiers by adopting a stepwise variable selection procedure during stacking. Through experiments on multi-class benchmark datasets, we conclude that the proposed method outperforms, with statistical significance, methods based on single one-class classification algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
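The stacking step described above learns a linear combination of base-classifier outputs. A single-response least-squares sketch follows; the paper uses multi-response linear regression with stepwise variable selection over one-class classifier outputs, and the toy scores below are invented for illustration:

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def stack_weights(base_scores, targets):
    """Least-squares stacking for one response: fit weights w so the
    weighted base-classifier scores approximate the class indicator,
    via the normal equations X^T X w = X^T y."""
    m = len(base_scores[0])
    XtX = [[sum(row[i] * row[j] for row in base_scores) for j in range(m)]
           for i in range(m)]
    Xty = [sum(row[i] * y for row, y in zip(base_scores, targets))
           for i in range(m)]
    return solve(XtX, Xty)

# Base model 1 tracks the target; base model 2 is noise, so stacking
# assigns it a much smaller weight.
scores = [[1.0, 0.3], [0.9, 0.9], [0.1, 0.2], [0.2, 0.8]]
targets = [1.0, 1.0, 0.0, 0.0]
w = stack_weights(scores, targets)
assert w[0] > 0.5 > abs(w[1])
```

Fitting one such weight vector per class (one response per class indicator) is what "multi-response" stacking means, and it sidesteps the fact that different one-class classifiers emit scores on different scales.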
19. Deep instance envelope network-based imbalance learning algorithm with multilayer fuzzy C-means clustering and minimum interlayer discrepancy.
- Author
- Li, Fan, Zhang, Xiaoheng, Wang, Pin, and Li, Yongming
- Subjects
- FUZZY algorithms, CLASSIFICATION algorithms, DATA mining, MACHINE learning, PROBLEM solving, ALGORITHMS
- Abstract
Imbalanced learning is important and challenging, since the classification of imbalanced datasets is a prevalent problem in machine learning and data mining. Sampling approaches have been proposed to address this issue, and cluster-based oversampling methods have shown great potential, as they aim to tackle between-class and within-class imbalance simultaneously. However, all existing clustering methods are based on a one-time approach: owing to the lack of a priori knowledge, the number of clusters is often set improperly, which leads to poor clustering performance. In addition, existing methods are likely to generate noisy instances. To solve these problems, this paper proposes a deep instance envelope network-based imbalanced learning algorithm with multilayer fuzzy c-means (MlFCM) clustering and a minimum interlayer discrepancy mechanism based on the maximum mean discrepancy (MIDMD). The algorithm can guarantee high-quality balanced instances using a deep instance envelope network in the absence of prior knowledge. First, the MlFCM is designed for the original minority-class instances to obtain deep instances and increase the diversity of instances. Then, the MIDMD is proposed to avoid generating noisy instances and to maintain the consistency of the instance interlayers. Next, the multilayer FCM and the minimum interlayer discrepancy mechanism are combined to construct a deep instance envelope network, the MlFC&IDMD. Finally, an imbalance learning algorithm is proposed based on the MlFC&IDMD. In the experimental section, thirty-three popular public datasets are used for verification, and over ten representative algorithms are used for comparison. The experimental results show that the proposed approach significantly outperforms other popular methods.
• For the first time, a multilayer FCM (MlFCM) algorithm for classification is proposed to mine the instances for more information.
• A minimum interlayer discrepancy mechanism (MIDMD) is proposed to make the distribution of instances before and after clustering consistent.
• A deep instance envelope network (MlFC&IDMD) is constructed by combining the MlFCM and MIDMD.
• An efficient strategy is proposed to determine the number of layers L of the deep instance envelope network.
• A new imbalance learning algorithm is proposed based on the MlFC&IDMD.
[ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
20. Cost-Sensitive Variational Autoencoding Classifier for Imbalanced Data Classification.
- Author
- Liu, Fen and Qian, Quan
- Subjects
- CLASSIFICATION algorithms, AMORPHOUS substances, CLASSIFICATION, DATA distribution, MACHINE learning, PROBLEM solving
- Abstract
Classification is among the core tasks in machine learning. Existing classification algorithms are typically based on the assumption of at least roughly balanced data classes. When performing tasks involving imbalanced data, such classifiers ignore the minority data in consideration of the overall accuracy. The performance of traditional classification algorithms based on the assumption of balanced data distribution is insufficient because the minority-class samples are often more important than others, such as positive samples, in disease diagnosis. In this study, we propose a cost-sensitive variational autoencoding classifier that combines data-level and algorithm-level methods to solve the problem of imbalanced data classification. Cost-sensitive factors are introduced to assign a high cost to the misclassification of minority data, which biases the classifier toward minority data. We also designed misclassification costs closely related to tasks by embedding domain knowledge. Experimental results show that the proposed method performed the classification of bulk amorphous materials well. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
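The algorithm-level half of the cost-sensitive approach described above can be illustrated with a much simpler model than a variational autoencoder: a logistic regression whose loss weights minority-class errors by a cost factor. This is a sketch of the cost-sensitive weighting idea only, not the authors' model; `weighted_logistic_sgd` and the `cost_pos` parameter are invented for illustration.

```python
import numpy as np

def weighted_logistic_sgd(X, y, cost_pos=5.0, lr=0.1, epochs=200):
    """Logistic regression whose cross-entropy loss weights positive (minority)
    misclassifications by cost_pos, biasing the decision toward the minority class."""
    w = np.zeros(X.shape[1])
    b = 0.0
    weights = np.where(y == 1, cost_pos, 1.0)  # per-sample misclassification cost
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        g = weights * (p - y)                  # weighted gradient of cross-entropy
        w -= lr * (X.T @ g) / len(y)
        b -= lr * g.mean()
    return w, b
```

Raising `cost_pos` trades overall accuracy for minority-class recall, which is the bias the abstract argues for in settings like disease diagnosis.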
21. DRG grouping by machine learning: from expert-oriented to data-based method.
- Author
-
Liu, Xiaoting, Fang, Chenhao, Wu, Chao, Yu, Jianxing, and Zhao, Qi
- Subjects
MACHINE learning ,PROBLEM solving ,DECISION making ,HEALTH care reform ,CLASSIFICATION algorithms - Abstract
Background: Diagnosis-related groups (DRGs) are a payment system that can effectively curb excessive increases in healthcare costs and are applied as a principal measure in the healthcare reform in China. However, expert-oriented DRG grouping is a black box with the drawbacks of upcoding and high cost. Methods: This study proposes a method of data-based grouping, designed and updated by machine learning algorithms, which can be trained on real cases or even simulated cases. It inherits the decision-making rules of the expert-oriented grouping and improves performance through continuous low-cost updates. Five typical classification algorithms were assessed and suggestions are made for algorithm choice. Kappa coefficients are reported to evaluate grouping performance. Results: Based on tenfold cross-validation, experiments showed that data-based grouping had classification performance similar to the expert-oriented grouping when suitable algorithms were chosen. Groupings trained on simulated cases were less accurate when tested on real cases than on simulated ones, but the kappa coefficients of the best model were still higher than 0.6. When the grouping was tested in a new DRG system, the update significantly improved the average kappa coefficient from 0.1534 to 0.6435; with enough computational resources, the update process could be completed in a very short time. Conclusions: As a new potential option, data-based grouping meets the requirements of the DRG system and has the advantages of high transparency and low cost in the design and update process. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
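The kappa coefficient used to evaluate grouping agreement in the study above is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. A minimal implementation (the function name is ours, not the paper's):

```python
import numpy as np

def cohens_kappa(y_true, y_pred):
    """Cohen's kappa: agreement between two labelings, corrected for chance."""
    labels = np.unique(np.concatenate([y_true, y_pred]))
    idx = {l: i for i, l in enumerate(labels)}
    n = len(y_true)
    cm = np.zeros((len(labels), len(labels)))
    for t, p in zip(y_true, y_pred):
        cm[idx[t], idx[p]] += 1
    po = np.trace(cm) / n                 # observed agreement
    pe = (cm.sum(0) @ cm.sum(1)) / n**2   # chance agreement from the marginals
    return (po - pe) / (1 - pe)
```

Values above 0.6, the threshold cited in the abstract, are conventionally read as substantial agreement.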
22. Random Fuzzy Granular Decision Tree.
- Author
-
Li, Wei, Ma, Xiaoyu, Chen, Yumin, Dai, Bin, Chen, Runjing, Tang, Chao, Luo, Youmeng, and Zhang, Kaiqiang
- Subjects
FUZZY graphs ,FUZZY clustering technique ,PROBLEM solving ,GRANULAR computing ,DECISION trees ,STANDARD deviations ,INTRACLASS correlation ,CLASSIFICATION algorithms ,MACHINE learning - Abstract
In this study, the classification problem is approached from the viewpoint of granular computing: the problem is transformed into an equivalent one in the fuzzy granular space. While most classification algorithms handle only numerical data, the random fuzzy granular decision tree (RFGDT) can handle both numerical data and nonnumerical data such as information granules. The approach consists of four measures. First, an adaptive global random clustering (AGRC) algorithm is proposed that can adaptively find the optimal cluster centers, maximize the ratio of interclass to intraclass standard deviation, and avoid falling into a local optimum. Second, on the basis of AGRC, a parallel model is designed for fuzzy granulation of the data to construct the granular space, which greatly improves efficiency compared with serial granulation. Third, in the fuzzy granular space, the RFGDT is designed to classify the fuzzy granules; it selects important features as tree nodes based on the information gain ratio and avoids overfitting through the proposed pruning algorithm. Finally, datasets from the UC Irvine Machine Learning Repository are employed for verification. Theory and experimental results show that the RFGDT is efficient, accurate, and robust in solving classification problems. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
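The node-selection criterion named in the abstract above, the information gain ratio, normalizes a split's information gain by the split's own entropy so that many-valued attributes are not automatically favored. A minimal sketch for categorical splits (function names are ours):

```python
import numpy as np

def entropy(y):
    """Shannon entropy of a label vector, in bits."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def gain_ratio(y, split):
    """Information gain of a categorical split on y, divided by the
    split's intrinsic value (its own entropy), as in C4.5-style trees."""
    n = len(y)
    cond = sum((split == v).sum() / n * entropy(y[split == v])
               for v in np.unique(split))
    gain = entropy(y) - cond
    iv = entropy(split)
    return gain / iv if iv > 0 else 0.0
```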
23. Gully Erosion Susceptibility Mapping in Highly Complex Terrain Using Machine Learning Models.
- Author
-
Yang, Annan, Wang, Chunmei, Pang, Guowei, Long, Yongqing, Wang, Lei, Cruse, Richard M., and Yang, Qinke
- Subjects
MACHINE learning ,LAND degradation ,EROSION ,PROBLEM solving ,CLASSIFICATION algorithms ,ALGORITHMS ,CURVE fitting ,CURVES - Abstract
Gully erosion is the most severe type of water erosion and a major land degradation process. The efficiency and interpretability of gully erosion susceptibility mapping (GESM) remain a challenge, especially in complex terrain. In this study, a WoE-MLC model, which combines machine learning classification algorithms with the statistical weight of evidence (WoE) model, was applied in the Loess Plateau to address this problem. The three machine learning (ML) algorithms utilized were random forest (RF), gradient boosted decision trees (GBDT), and extreme gradient boosting (XGBoost). The results showed that: (1) GESM was well predicted by both the machine learning regression models and the WoE-MLC models, with area under the curve (AUC) values greater than 0.92 in both cases, the latter being more computationally efficient and interpretable; (2) the XGBoost algorithm was the most effective of the three, with the strongest generalization ability and the best performance in avoiding overfitting (averaged AUC = 0.947), followed by RF (averaged AUC = 0.944) and GBDT (averaged AUC = 0.938); and (3) slope gradient, land use, and altitude were the main factors for GESM. This study may provide a viable method for gully erosion susceptibility mapping at large scale. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
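The statistical half of the WoE-MLC model above, weight of evidence, scores each class of a categorical factor by how strongly it is associated with the event (here, gully presence). A minimal sketch, assuming binary 0/1 event labels; the function name and the 0.5 smoothing offset are our choices, not necessarily the paper's:

```python
import numpy as np

def weight_of_evidence(factor, event):
    """Per-class WoE of a categorical factor against a binary event array:
    log of (event share in class) over (non-event share in class)."""
    total_e = event.sum()
    total_ne = len(event) - total_e
    woe = {}
    for v in np.unique(factor):
        mask = factor == v
        e = event[mask].sum()
        ne = mask.sum() - e
        # 0.5 offset avoids log(0) for classes with no events or no non-events
        woe[v] = np.log(((e + 0.5) / (total_e + 0.5)) /
                        ((ne + 0.5) / (total_ne + 0.5)))
    return woe
```

Positive WoE marks classes (e.g. steep slope bands) where gullies are over-represented; the per-class scores then serve as interpretable inputs to the classifier.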
24. Education 4.0: Teaching the Basics of KNN, LDA and Simple Perceptron Algorithms for Binary Classification Problems.
- Author
-
Lopez-Bernal, Diego, Balderas, David, Ponce, Pedro, and Molina, Arturo
- Subjects
FISHER discriminant analysis ,DISRUPTIVE innovations ,CLASSIFICATION algorithms ,PROBLEM solving ,MACHINE learning ,ALGORITHMS ,TEACHER educators - Abstract
One of the main focuses of Education 4.0 is to provide students with knowledge of disruptive technologies, such as Machine Learning (ML), together with the skills to apply this knowledge to real-life problems. Both students and professors therefore need teaching and learning tools that ease the introduction to such topics. This study contributes to the development of those tools by introducing the basic theory behind three machine learning classification algorithms, K-Nearest-Neighbor (KNN), Linear Discriminant Analysis (LDA), and the Simple Perceptron, and by discussing the advantages and disadvantages of each method. We further analyze how these methods behave under different conditions by implementing them on a test bench: in addition to describing each algorithm, we apply all three to three different binary classification problems on three different datasets and compare their performance in these case studies. Teachers can use the findings to give students the basic knowledge of the KNN, LDA, and perceptron algorithms, and at the same time as a guide to applying them to real-life problems beyond the presented datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
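Of the three classifiers introduced in the study above, the Simple Perceptron has the shortest classroom-ready implementation: it updates its weights only when a sample is misclassified. A minimal sketch for labels in {-1, +1} (function names are ours):

```python
import numpy as np

def perceptron_train(X, y, epochs=20):
    """Rosenblatt perceptron for y in {-1, +1}: update weights only on mistakes.
    Converges when the data are linearly separable."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:   # misclassified (or on the boundary)
                w += yi * xi
                b += yi
    return w, b

def perceptron_predict(X, w, b):
    return np.where(X @ w + b >= 0, 1, -1)
```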
25. Exploring Symmetry of Binary Classification Performance Metrics.
- Author
-
Luque, Amalia, Carrasco, Alejandro, Martín, Alejandro, and Lama, Juan Ramón
- Subjects
MATHEMATICAL symmetry ,CLASSIFICATION algorithms ,METRIC spaces ,PROBLEM solving ,BINARY control systems - Abstract
Selecting the proper performance metric is a key issue for most classification problems in machine learning. Although the specialized literature has addressed several topics regarding these metrics, their symmetries have not yet been systematically studied. This research focuses on ten metrics based on the binary confusion matrix, and their symmetric behaviour is formally defined under all types of transformations. Through simulated experiments covering the full range of datasets and classification results, the symmetric behaviour of these metrics is explored by exposing them to hundreds of simple or combined symmetric transformations. Cross-symmetries among the metrics and statistical symmetries are also explored. The results show that, in all cases, three and only three types of symmetries arise: labelling inversion (between positive and negative classes); scoring inversion (between good and bad classifiers); and the combination of these two inversions. Additionally, certain metrics are shown to be independent of the imbalance in the dataset, and two cross-symmetries are identified. These results give deeper insight into the behaviour of various performance metrics, an indicator for properly interpreting their values, and a guide for their selection in specific applications. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
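The labelling-inversion symmetry identified above is easy to check numerically: swapping the positive and negative classes maps the confusion matrix (TP, FN, FP, TN) to (TN, FP, FN, TP), leaving accuracy unchanged while sensitivity and specificity exchange roles. A small illustration (the counts are arbitrary made-up values):

```python
def metrics(tp, fn, fp, tn):
    """A few binary classification metrics from the confusion matrix counts."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # recall on the positive class
        "specificity": tn / (tn + fp),   # recall on the negative class
    }

# Labelling inversion: (tp, fn, fp, tn) -> (tn, fp, fn, tp)
m = metrics(40, 10, 5, 45)
m_inv = metrics(45, 5, 10, 40)
# accuracy is invariant; sensitivity and specificity swap under the inversion
```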
Discovery Service for Jio Institute Digital Library