4,039 results
Search Results
2. A Review Paper on Dimensionality Reduction Techniques.
- Author
-
Mulla, Faizan Riyaz and Gupta, Anil Kumar
- Subjects
FEATURE selection, DATA compression, MATRIX decomposition, MACHINE learning, RANDOM variables, PREDICTION models
- Abstract
Dimensionality Reduction (DR) is the process of reducing the numerous features or random variables under consideration to a limited number of features by obtaining a set of principal variables. These techniques offer great value in machine learning, where they come in handy to simplify a classification or a regression dataset, thereby yielding a better-performing predictive model. Techniques used for DR include Feature Selection methods, Matrix Factorization, AutoEncoder methods, and Manifold Learning. Merits of DR include data compression, reduced storage space, and removal of redundant features. This paper attempts to review various techniques used to carry out dimensionality reduction while providing an exhaustive comparative study over the merits and demerits of each of the techniques used under the empirical experiments performed by the authors whose work is being reviewed. [ABSTRACT FROM AUTHOR]
- Published
- 2022
3. Classification models for likelihood prediction of diabetes at early stage using feature selection
- Author
-
Oladimeji, Oladosu Oyebisi, Oladimeji, Abimbola, and Oladimeji, Olayanju
- Published
- 2024
4. Auto-encoder-based algorithm for the selection of key characteristics for products to reduce inspection efforts
- Author
-
Greipel, Jonathan S., Frank, Regina M., Huber, Meike, Steland, Ansgar, and Schmitt, Robert H.
- Published
- 2023
5. Feature engineering of EEG applied to mental disorders: a systematic mapping study
- Author
-
García-Ponsoda, Sandra, García-Carrasco, Jorge, Teruel, Miguel A., Maté, Alejandro, and Trujillo, Juan
- Published
- 2023
6. Feature Mining and Sensitivity Analysis with Adaptive Sparse Attention for Bearing Fault Diagnosis.
- Author
-
Jiang, Qinglei, Bao, Binbin, Hou, Xiuqun, Huang, Anzheng, Jiang, Jiajie, and Mao, Zhiwei
- Subjects
FAULT diagnosis, SENSITIVITY analysis, RECOMMENDER systems, FEATURE selection, FILTER paper, MACHINE learning
- Abstract
Bearing fault diagnosis for equipment-safe operation has a crucial role. In recent years, more achievements have been made in bearing fault diagnosis. However, for the fault diagnosis model, the representation and sensitivity of bearing fault features have a great influence on the diagnosis output results; thus, the attention mechanism is particularly important for the selection of features. However, global attention focuses on all sequences, which is computationally expensive and not ideal for fault diagnosis tasks. The local attention mechanism ignores the relationship between non-adjacent sequences. To address the respective shortcomings of global attention and local attention, an adaptive sparse attention network is proposed in this paper to filter fault-sensitive information by soft threshold filtering. In addition, the effects of different signal representation domains on fault diagnosis results are investigated to filter out signal representation forms with better performance. Finally, the proposed adaptive sparse attention network is applied to cross-working conditions diagnosis of bearings. The adaptive sparse attention mechanism focuses on the signal characteristics of different frequency bands for different fault types. The proposed network model achieves better overall performance when comparing the cross-conditions diagnosis accuracy and model convergence speed. [ABSTRACT FROM AUTHOR]
- Published
- 2023
7. Optimizing feature selection in intrusion detection systems: Pareto dominance set approaches with mutual information and linear correlation.
- Author
-
Barbosa, Guilherme Nunes Nasseh, Andreoni, Martin, and Mattos, Diogo Menezes Ferrazani
- Subjects
FEATURE selection, INTRUSION detection systems (Computer security), MACHINE learning, SOCIAL dominance, PEARSON correlation (Statistics), FILTER paper
- Abstract
In the realm of network intrusion detection using machine learning, feature selection aims for computational efficiency, enhanced performance, and model interpretability, preventing overfitting and optimizing data visualization. This paper proposes a filtering method for feature selection, which optimizes information quantity and linear correlation between resultant features. The method identifies Pareto dominant pairs of informative and correlated features, constructs a graph, and selects key features based on betweenness centrality in its connected components. The proposal yields a more concise and informative dataset representation. Experimental results, using three diverse datasets, demonstrate that the proposal achieves more than 95% accuracy in classifying network attacks with just 14% of the total number features in original datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2024
8. Classifying breast cancer using multi-view graph neural network based on multi-omics data.
- Author
-
Yanjiao Ren, Yimeng Gao, Wei Du, Weibo Qiao, Wei Li, Qianqian Yang, Yanchun Liang, and Gaoyang Li
- Subjects
GRAPH neural networks, DEEP learning, MACHINE learning, FEATURE selection, BREAST cancer, TUMOR classification
- Abstract
Introduction: As the evaluation indices, cancer grading and subtyping have diverse clinical, pathological, and molecular characteristics with prognostic and therapeutic implications. Although researchers have begun to study cancer differentiation and subtype prediction, most of relevant methods are based on traditional machine learning and rely on single omics data. It is necessary to explore a deep learning algorithm that integrates multi-omics data to achieve classification prediction of cancer differentiation and subtypes. Methods: This paper proposes a multi-omics data fusion algorithm based on a multi-view graph neural network (MVGNN) for predicting cancer differentiation and subtype classification. The model framework consists of a graph convolutional network (GCN) module for learning features from different omics data and an attention module for integrating multi-omics data. Three different types of omics data are used. For each type of omics data, feature selection is performed using methods such as the chi-square test and minimum redundancy maximum relevance (mRMR). Weighted patient similarity networks are constructed based on the selected omics features, and GCN is trained using omics features and corresponding similarity networks. Finally, an attention module integrates different types of omics features and performs the final cancer classification prediction. Results: To validate the cancer classification predictive performance of the MVGNN model, we conducted experimental comparisons with traditional machine learning models and currently popular methods based on integrating multi-omics data using 5-fold cross-validation. Additionally, we performed comparative experiments on cancer differentiation and its subtypes based on single omics data, two omics data, and three omics data. Discussion: This paper proposed the MVGNN model and it performed well in cancer classification prediction based on multiple omics data. [ABSTRACT FROM AUTHOR]
- Published
- 2024
9. Effects of feature selection on lane-change maneuver recognition: an analysis of naturalistic driving data
- Author
-
Li, Xiaohan, Wang, Wenshuo, Zhang, Zhang, and Rötting, Matthias
- Published
- 2019
10. An Intrusion Detection Method Based on Hybrid Machine Learning and Neural Network in the Industrial Control Field.
- Author
-
Sun, Duo, Zhang, Lei, Jin, Kai, Ling, Jiasheng, and Zheng, Xiaoyuan
- Subjects
INTRUSION detection systems (Computer security), MACHINE learning, ARTIFICIAL neural networks, INDUSTRIAL controls manufacturing, FEATURE selection, COMPUTER network traffic, MACHINE theory
- Abstract
Aiming at the imbalance of industrial control system data and the poor detection effect of industrial control intrusion detection systems on network attack traffic problems, we propose an ETM-TBD model based on hybrid machine learning and neural network models. Aiming at the problem of high dimensionality and imbalance in the amount of sample data in the massive data of industrial control systems, this paper proposes an IG-based feature selection method and an oversampling method for SMOTE. In the ETM-TBD model, we propose a hyperparameter optimization method based on Bayesian optimization used to optimize the parameters of the four basic machine learners in the model. By introducing a multi-head-attention mechanism, the Transformer module increases the attention between local features and global features, enabling the discovery of the internal relationship between features. Additionally, the BiGRU is used to preserve the temporal features of the dataset, while the DNN is used to extract deeper features. Finally, the SoftMax classifier is used to classify the output. By analyzing the results of the comparison and ablation experiments, it can be concluded that the F1-score of the ETM-TBD model on a robotic arm dataset is 0.9665 and the model has very low FNR and FPR scores of 0.0263 and 0.0081, respectively. It can be seen that the model in this paper is better than the traditional single machine learning algorithm as well as the algorithm lacking any of the modules. [ABSTRACT FROM AUTHOR]
- Published
- 2023
11. A model for skin cancer using combination of ensemble learning and deep learning.
- Author
-
Hosseinzadeh, Mehdi, Hussain, Dildar, Zeki Mahmood, Firas Muhammad, A. Alenizi, Farhan, Varzeghani, Amirhossein Noroozi, Asghari, Parvaneh, Darwesh, Aso, Malik, Mazhar Hussain, and Lee, Sang-Woong
- Subjects
SKIN cancer, DEEP learning, FEATURE selection, MACHINE learning, RANDOM forest algorithms, SURVIVAL rate
- Abstract
Skin cancer has a significant impact on the lives of many individuals annually and is recognized as the most prevalent type of cancer. In the United States, an estimated annual incidence of approximately 3.5 million people receiving a diagnosis of skin cancer underscores its widespread prevalence. Furthermore, the prognosis for individuals afflicted with advancing stages of skin cancer experiences a substantial decline in survival rates. This paper is dedicated to aiding healthcare experts in distinguishing between benign and malignant skin cancer cases by employing a range of machine learning and deep learning techniques and different feature extractors and feature selectors to enhance the evaluation metrics. In this paper, different transfer learning models are employed as feature extractors, and to enhance the evaluation metrics, a feature selection layer is designed, which includes diverse techniques such as Univariate, Mutual Information, ANOVA, PCA, XGB, Lasso, Random Forest, and Variance. Among transfer models, DenseNet-201 was selected as the primary feature extractor to identify features from data. Subsequently, the Lasso method was applied for feature selection, utilizing diverse machine learning approaches such as MLP, XGB, RF, and NB. To optimize accuracy and precision, ensemble methods were employed to identify and enhance the best-performing models. The study provides accuracy and sensitivity rates of 87.72% and 92.15%, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2024
12. Comprehensive review of solar radiation modeling based on artificial intelligence and optimization techniques: future concerns and considerations.
- Author
-
Attar, Nasrin Fathollahzadeh, Sattari, Mohammad Taghi, Prasad, Ramendra, and Apaydin, Halit
- Subjects
SOLAR radiation, ARTIFICIAL intelligence, MATHEMATICAL optimization, RENEWABLE energy sources, SOLAR energy, FEATURE selection
- Abstract
An alternative energy source such as solar is one of the most important renewable resources. A reliable solar radiation prediction is essential for various applications in agriculture, industry, transport, and the environment because they reduce greenhouse gases and are environmentally friendly. Solar radiation data series have embedded fluctuations and noise signals due to complexity, stochasticity, non-stationarity, and nonlinearity with uncertain and time-varying nature. Aside from being highly nonlinear, solar radiation is highly influenced by the environment and environmental parameters such as air temperature, cloud cover, surface reflectivity, and aerosols. In addition, the spatial measurements of these variables are not readily available. To tackle these challenges, it is necessary to consider data preprocessing techniques and to develop and test precise solar radiation predicting models at different forecast horizons. There is, however, controversy regarding the performance of such models in various studies. Comparisons are not conducted systematically among the different studies. Using a critical literature review, the authors hope to answer these questions and believe that further investigation of solar radiation can benefit researchers and practitioners alike. This study presents a comprehensive evaluation of solar radiation modeling using artificial intelligence in the last 15 years and provides a novel detailed analysis of the available models. The studies conducted in different climates of the world that were published in distinguished journals were considered (i.e., 90 papers in total) for this purpose. Newly discovered procedures for optimizing forecasts, data cleaning, feature selection, classification methods, and stand-alone or hybrid data-driven models for solar radiation prediction and modeling were evaluated. 
The results strikingly showed that the most used artificial intelligence methods were artificial neural network, adaptive neuro-fuzzy inference system, and decision tree family of models. In addition, the extreme learning machine, support vector machine, and particle swarm optimization were the most used optimization techniques in solar radiation modeling. In terms of forecast horizons, the most common forecast horizon found in papers was on the daily scale (51% of studies), followed by the hourly scale (26%), and the least common was the monthly scale (18%). Based on the regional studies, the highest number of solar radiation papers originated from Asia, with Europe in second place and African countries in third place. An increasing trend in the number of papers from 2011 to 2015 was noted, and the second peak started from 2018 till the present. Under each section, a summary of findings is provided. The paper concludes with future thoughts and directions on solar radiation modeling. [ABSTRACT FROM AUTHOR]
- Published
- 2023
13. Retrieval of High-Frequency Temperature Profiles by FY-4A/GIIRS Based on Generalized Ensemble Learning.
- Author
-
Gen WANG, Wei HAN, Song YUAN, Jing WANG, Ruo-Ying YIN, Song YE, and Feng XIE
- Subjects
GEOSTATIONARY satellites, MACHINE learning, ATMOSPHERIC temperature, FEATURE selection, RANDOM forest algorithms, WEATHER forecasting
- Abstract
The temperature profile is an important parameter of the atmospheric thermal state in atmospheric monitoring and weather forecasting. The hyperspectral infrared sounder of a geostationary satellite provides abundant spectral information and can retrieve the temperature profile. Based on the mediumwave channel data (independent variable and model input data) of FY-4A/GIIRS (geosynchronous interferometric infrared sounder) and ERA5 reanalysis data (dependent variable and model output data), the atmospheric temperature profile is retrieved by generalized ensemble learning. Firstly, the feature variables of the model are constructed. Because there are many GIIRS channels, a two-step feature selection method is adopted: step 1--establish a blacklist of GIIRS channels; step 2--select feature variables by using the method of importance permutation. Secondly, they are integrated based on optimizing and adjusting the hyperparameters of three basic machine learning models (Random Forest, XGBoost and LightGBM). Generalized ensemble learning nonlinear convex optimization is used to optimize the weight of each basic model. Finally, based on high-frequency GIIRS observations of Typhoon Lekima and Typhoon Higos, testing and method evaluation of the temperature profile retrievals are carried out. The results show that LightGBM achieves the best retrieval result among the three basic models, followed by Random Forest and finally XGBoost. The root-mean-square error of the whole temperature profile in the training dataset of generalized ensemble learning is less than 0.3 K, while that of the testing dataset is less than 1.4 K, and that between 150 hPa and 925 hPa is less than 1 K. The retrieval results correlate well with the radiosonde temperature profile. The performance of generalized ensemble learning is better than the performances of the three basic models, but it depends on the retrieval results of LightGBM. 
In the Lekima experimental case, compared to other channels selected for temperature retrieval models, the importance of mediumwave channels 9 and 307 of GIIRS ranks first and second, respectively. The method in this paper provides a new solution and technical support for retrieving atmospheric parameters from hyperspectral and other satellite data. [ABSTRACT FROM AUTHOR]
- Published
- 2024
14. Meticulous Review: Cutting-Edge Cervix Cancer Stratification Using Image Processing And Machine Learning.
- Author
-
Bhavsar, Barkha and Shrimali, Bela
- Subjects
IMAGE processing, MACHINE learning, CERVICAL cancer, DEEP learning, CELL nuclei, CANCER education
- Abstract
Cervical cancer has been among the top cancers found in women in developing countries for many years. Classification of cervical cancer through a traditional microscopic approach is a monotonous and prolonged task. Hospital doctors often cannot identify the cancer cells because the nucleus of a cell, which contains the genetic material (DNA), is typically very small and often not visible to the naked eye. Due to the different perspectives of doctors, cancer stages are sometimes misclassified, which leads to low recovery and late medication. The use of Image Processing and Machine Learning technologies can reduce misclassification and inaccurate prediction. Although many deep learning techniques are available for cervical cancer cell detection and classification, the performance of such techniques for prediction and classification with real and sample datasets is the main challenge. In this paper, we conduct a thorough state-of-the-art review of the available current literature. The objective of this paper is to bring forth in-depth knowledge to novice researchers with a thorough understanding of the architecture of the computer-assisted classification process. The current literature is studied, analyzed, and discussed with their approaches, results, and methodologies. [ABSTRACT FROM AUTHOR]
- Published
- 2024
15. A Review of Machine Learning's Role in Cardiovascular Disease Prediction: Recent Advances and Future Challenges.
- Author
-
Naser, Marwah Abdulrazzaq, Majeed, Aso Ahmed, Alsabah, Muntadher, Al-Shaikhli, Taha Raad, and Kaky, Kawa M.
- Subjects
MACHINE learning, CARDIOVASCULAR diseases, ARTIFICIAL intelligence, EARLY diagnosis, TREATMENT delay (Medicine)
- Abstract
Cardiovascular disease is the leading cause of global mortality and responsible for millions of deaths annually. The mortality rate and overall consequences of cardiac disease can be reduced with early disease detection. However, conventional diagnostic methods encounter various challenges, including delayed treatment and misdiagnoses, which can impede the course of treatment and raise healthcare costs. The application of artificial intelligence (AI) techniques, especially machine learning (ML) algorithms, offers a promising pathway to address these challenges. This paper emphasizes the central role of machine learning in cardiac health and focuses on precise cardiovascular disease prediction. In particular, this paper is driven by the urgent need to fully utilize the potential of machine learning to enhance cardiovascular disease prediction. In light of the continued progress in machine learning and the growing public health implications of cardiovascular disease, this paper aims to offer a comprehensive analysis of the topic. This review paper encompasses a wide range of topics, including the types of cardiovascular disease, the significance of machine learning, feature selection, the evaluation of machine learning models, data collection & preprocessing, evaluation metrics for cardiovascular disease prediction, and the recent trends & suggestion for future works. In addition, this paper offers a holistic view of machine learning's role in cardiovascular disease prediction and public health. We believe that our comprehensive review will contribute significantly to the existing body of knowledge in this essential area. [ABSTRACT FROM AUTHOR]
- Published
- 2024
16. A Study of Breast Cancer Classification Algorithms by Fusing Machine Learning and Deep Learning.
- Author
-
Sun, Lifei and Li, Sen
- Subjects
DEEP learning, CLASSIFICATION algorithms, TUMOR classification, MACHINE learning, BREAST cancer, FEATURE selection
- Abstract
Although breast cancer, with easy recurrence and high mortality, has become one of the leading causes of cancer death in women, early and accurate diagnosis of breast cancer can effectively increase the likelihood of a cure. Therefore, it is particularly important to improve the accuracy of early diagnosis of breast cancer. However, conventional early diagnosis relies on human experience and has a low accuracy rate. Therefore, many researchers have proposed various machine learning methods to improve the accuracy and efficiency of prediction. Most of the existing studies around breast cancer classification adopt a single algorithm to fit breast cancer data but ignore the applicability of different breast cancer data features to the model. In this paper, we adopt machine algorithms to strip the features of machine learning methods from the rest of the features and attempt to enhance the model effect by designing deep learning model structures to find the hidden patterns in the rest of the features. In addition, due to strict medical data privacy requirements and high collection difficulty and cost, the model designed in this paper will be trained on a small number of samples. As a result, we attempt to find a minimization model for a breast cancer classification algorithm that features both low cost and high efficiency. At the same time, the deep learning model is further designed to complement the original model when it is possible to introduce complex data indicators. Experimental values show that the design model in this paper performs best not only under limited data and limited indicators but also under limited data complex indicators, demonstrating the effectiveness of the approach of mixed comparison and feature selection of multiple classification algorithms. In summary, the fusion model designed and implemented in this paper performs well in the experiments, and the accuracy of the model test reaches 98.3%. [ABSTRACT FROM AUTHOR]
- Published
- 2023
17. Mitigation of Adversarial Attacks in 5G Networks with a Robust Intrusion Detection System Based on Extremely Randomized Trees and Infinite Feature Selection.
- Author
-
Baldini, Gianmarco
- Subjects
FEATURE selection, 5G networks, INTRUSION detection systems (Computer security), MACHINE learning, COMMUNICATION infrastructure, DEEP learning
- Abstract
Intrusion Detection Systems (IDSs) are an important tool to mitigate cybersecurity threats in the ICT infrastructures. Preferable properties of the IDSs are the optimization of the attack detection accuracy and the minimization of the computing resources and time. A significant portion of IDSs presented in the research literature is based on Machine Learning (ML) and Deep Learning (DL) elements, but they may be prone to adversarial attacks, which may undermine the overall performance of the IDS algorithm. This paper proposes a novel IDS focused on the detection of cybersecurity attacks in 5G networks, which addresses in a simple but effective way two specific adversarial attacks: (1) tampering of the labeled set used to train the ML algorithm, (2) modification of the features in the training data set. The approach is based on the combination of two algorithms, which have been introduced recently in the research literature. The first algorithm is the Extremely Randomized Tree (ERT) algorithm, which enhances the capability of Decision Tree (DT) and Random Forest (RF) algorithms to perform classification in data sets, which are unbalanced and of large size as IDS data sets usually are (legitimate traffic messages are more numerous than attack related messages). The second algorithm is the recently introduced Infinite Feature Selection algorithm, which is used to optimize the choice of the hyper-parameter defined in the approach and improve the overall computing efficiency. The result of the application of the proposed approach on a recently published 5G IDS data set proves its robustness against adversarial attacks with different degrees of severity calculated as the percentage of the tampered data set samples. [ABSTRACT FROM AUTHOR]
- Published
- 2024
18. Machine learning model based on radiomics features for AO/OTA classification of pelvic fractures on pelvic radiographs.
- Author
-
Park, Jun Young, Lee, Seung Hwan, Kim, Young Jae, Kim, Kwang Gi, and Lee, Gil Jae
- Subjects
PELVIC fractures, NAIVE Bayes classification, RECEIVER operating characteristic curves, RADIOMICS, MACHINE learning, RADIOGRAPHS, FEATURE selection, KEGEL exercises
- Abstract
Depending on the degree of fracture, pelvic fracture can be accompanied by vascular damage, and in severe cases, it may progress to hemorrhagic shock. Pelvic radiography can quickly diagnose pelvic fractures, and the Association for Osteosynthesis Foundation and Orthopedic Trauma Association (AO/OTA) classification system is useful for evaluating pelvic fracture instability. This study aimed to develop a radiomics-based machine-learning algorithm to quickly diagnose fractures on pelvic X-ray and classify their instability. The data used were pelvic anteroposterior radiographs of 990 adults over 18 years of age diagnosed with pelvic fractures, and 200 normal subjects. A total of 93 features were extracted based on radiomics: 18 first-order, 24 GLCM, 16 GLRLM, 16 GLSZM, 5 NGTDM, and 14 GLDM features. To improve the performance of machine learning, the feature selection methods RFE, SFS, LASSO, and Ridge were used, and the machine learning models used LR, SVM, RF, XGB, MLP, KNN, and LGBM. Performance measurement was evaluated by area under the curve (AUC) by analyzing the receiver operating characteristic curve. The machine learning model was trained based on the selected features using four feature-selection methods. When the RFE feature selection method was used, the average AUC was higher than that of the other methods. Among them, the combination with the machine learning model SVM showed the best performance, with an average AUC of 0.75±0.06. By obtaining a feature-importance graph for the combination of RFE and SVM, it is possible to identify features with high importance. The AO/OTA classification of normal pelvic rings and pelvic fractures on pelvic AP radiographs using a radiomics-based machine learning model showed the highest AUC when using the SVM classification combination. Further research on the radiomic features of each part of the pelvic bone constituting the pelvic ring is needed. [ABSTRACT FROM AUTHOR]
- Published
- 2024
19. Requirement Dependency Extraction Based on Improved Stacking Ensemble Machine Learning.
- Author
-
Guan, Hui, Xu, Hang, and Cai, Lie
- Subjects
PARTICLE swarm optimization, STACKING machines, FEATURE selection, SEARCH algorithms, MACHINE learning, FEATURE extraction
- Abstract
To address the cost and efficiency issues of manually analysing requirement dependency in requirements engineering, a requirement dependency extraction method based on part-of-speech features and an improved stacking ensemble learning model (P-Stacking) is proposed. Firstly, to overcome the problem of singularity in the feature extraction process, this paper integrates part-of-speech features, TF-IDF features, and Word2Vec features during the feature selection stage. The particle swarm optimization algorithm is used to allocate weights to part-of-speech tags, which enhances the significance of crucial information in requirement texts. Secondly, to overcome the performance limitations of standalone machine learning models, an improved stacking model is proposed. The Low Correlation Algorithm and Grid Search Algorithms are utilized in P-stacking to automatically select the optimal combination of the base models, which reduces manual intervention and improves prediction performance. The experimental results show that compared with the method based on TF-IDF features, the highest F1 scores of a standalone machine learning model in the three datasets were improved by 3.89%, 10.68%, and 21.4%, respectively, after integrating part-of-speech features and Word2Vec features. Compared with the method based on a standalone machine learning model, the improved stacking ensemble machine learning model improved F1 scores by 2.29%, 5.18%, and 7.47% in the testing and evaluation of three datasets, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2024
20. Chronic kidney disease prediction using boosting techniques based on clinical parameters.
- Author
-
Ganie, Shahid Mohammad, Dutta Pramanik, Pijush Kanti, Mallik, Saurav, and Zhao, Zhongming
- Subjects
CHRONIC kidney failure, BOOSTING algorithms, MACHINE learning, RECEIVER operating characteristic curves, FORECASTING, FEATURE selection
- Abstract
Chronic kidney disease (CKD) has become a major global health crisis, causing millions of yearly deaths. Predicting the possibility of a person being affected by the disease will allow timely diagnosis and precautionary measures leading to preventive strategies for health. Machine learning techniques have been popularly applied in various disease diagnoses and predictions. Ensemble learning approaches have become useful for predicting many complex diseases. In this paper, we utilise the boosting method, one of the popular ensemble learning approaches, to achieve a higher prediction accuracy for CKD. Five boosting algorithms are employed: XGBoost, CatBoost, LightGBM, AdaBoost, and gradient boosting. We experimented with the CKD data set from the UCI machine learning repository. Various preprocessing steps are employed to achieve better prediction performance, along with suitable hyperparameter tuning and feature selection. We assessed the degree of importance of each feature in the dataset leading to CKD. The performance of each model was evaluated with accuracy, precision, recall, F1-score, area under the receiver operating characteristic curve (AUC-ROC), and runtime. AdaBoost was found to have the overall best performance among the five algorithms, scoring the highest in almost all the performance measures. It attained 100% and 98.47% accuracy for training and testing sets. This model also exhibited better precision, recall, and AUC-ROC curve performance. [ABSTRACT FROM AUTHOR]
- Published
- 2023
21. Multi-Classification and Tree-Based Ensemble Network for the Intrusion Detection System in the Internet of Vehicles.
- Author
-
Gou, Wanting, Zhang, Haodi, and Zhang, Ronghui
- Subjects
INTRUSION detection systems (Computer security) ,COMPUTER network traffic ,MACHINE learning ,TELECOMMUNICATION systems ,FEATURE selection ,INTERNET - Abstract
The Internet of Vehicles (IoV) employs vehicle-to-everything (V2X) technology to establish intricate interconnections among the Internet, the IoT network, and in-vehicle networks (IVNs), forming a complex vehicle communication network. However, the vehicle communication network is very vulnerable to attacks. The implementation of an intrusion detection system (IDS) is essential to ensure the security of in-vehicle/inter-vehicle communication in IoV. Within this context, the imbalanced nature of network traffic data and the diversity of network attacks stand as pivotal factors in IDS performance. On the one hand, network traffic data often suffer heavily from data imbalance, which impairs the detection performance. To address this issue, this paper employs a hybrid approach combining the Synthetic Minority Over-sampling Technique (SMOTE) and RandomUnderSampler to achieve a balanced class distribution. On the other hand, the diversity of network attacks constitutes another significant factor contributing to poor intrusion detection model performance. Most current machine learning-based IDSs mainly perform binary classification and deal poorly with multiclass classification. This paper proposes an adaptive tree-based ensemble network as the intrusion detection engine for the IDS in IoV. This engine employs a deep-layer structure, wherein diverse ML models are stacked as layers and are interconnected in a cascading manner, which enables accurate and efficient multiclass classification, facilitating the precise identification of diverse network attacks. Moreover, a machine learning-based approach is used for feature selection to reduce feature dimensionality, substantially alleviating the computational overhead. Finally, we evaluate the proposed IDS performance on various cyber-attacks from the in-vehicle and external networks in IoV by using the network intrusion detection dataset CICIDS2017 and the vehicle security dataset Car-Hacking. The experimental results demonstrate remarkable performance, with an F1-score of 0.965 on the CICIDS2017 dataset and an F1-score of 0.9999 on the Car-Hacking dataset. These scores demonstrate that our IDS can achieve efficient and precise multiclass classification. This research provides a valuable reference for ensuring the cybersecurity of IoV. [ABSTRACT FROM AUTHOR]
- Published
- 2023
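Editor's illustration for record 21: the hybrid resampling idea (over-sample minority classes with SMOTE-style interpolation, randomly under-sample majority classes) can be sketched in plain Python. This is a minimal sketch, not the authors' pipeline: the function names, the `target_per_class` parameter, and the toy data are all hypothetical, and the interpolation step is a simplified stand-in for a full SMOTE implementation.

```python
import math
import random
from collections import Counter

def smote_like_oversample(minority, n_new, k=3, rng=None):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours.
    Requires at least two minority samples."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Neighbours sorted by Euclidean distance, excluding the base point.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic

def hybrid_resample(X, y, target_per_class, k=3, seed=0):
    """Bring every class to target_per_class samples (hypothetical parameter):
    larger classes are randomly under-sampled, smaller ones over-sampled."""
    rng = random.Random(seed)
    X_out, y_out = [], []
    for label in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == label]
        if len(rows) > target_per_class:
            rows = rng.sample(rows, target_per_class)
        elif len(rows) < target_per_class:
            rows = rows + smote_like_oversample(rows, target_per_class - len(rows), k, rng)
        X_out.extend(rows)
        y_out.extend([label] * len(rows))
    return X_out, y_out

# Toy, made-up traffic data: 100 "benign" flows versus 5 "attack" flows.
X = [(float(i), float(i % 7)) for i in range(100)] + [(200.0 + i, 1.0) for i in range(5)]
y = ["benign"] * 100 + ["attack"] * 5
X_bal, y_bal = hybrid_resample(X, y, target_per_class=20)
print(Counter(y_bal))
```

Resampling of this kind is normally applied to the training split only, so that the test set keeps the original class distribution.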
22. Water quality classification model with small features and class imbalance based on fuzzy rough sets
- Author
-
Shehab, Sara A., Darwish, Ashraf, and Hassanien, Aboul Ella
- Published
- 2023
23. Machine learning and the prediction of changes in profitability.
- Author
-
Jones, Stewart, Moser, William J., and Wieland, Matthew M.
- Subjects
MACHINE learning ,FEATURE selection ,INDEPENDENT variables ,PROFITABILITY ,FORECASTING - Abstract
Copyright of Contemporary Accounting Research is the property of Canadian Academic Accounting Association and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2023
24. Blind Deblurring of Remote-Sensing Single Images Based on Feature Alignment.
- Author
-
Zhu, Baoyu, Lv, Qunbo, Yang, Yuanbo, Sui, Xuefu, Zhang, Yu, Tang, Yinhui, and Tan, Zheng
- Subjects
REMOTE-sensing images ,GENERATIVE adversarial networks ,REMOTE sensing ,CONVOLUTION codes ,DEEP learning ,MACHINE learning ,KALMAN filtering - Abstract
Motion blur recovery is a common method in the field of remote sensing image processing that can effectively improve the accuracy of detection and recognition. Among the existing motion blur recovery methods, the algorithms based on deep learning do not rely on a priori knowledge and, thus, have better generalizability. However, the existing deep learning algorithms usually suffer from feature misalignment, resulting in a high probability of missing details or errors in the recovered images. This paper proposes an end-to-end generative adversarial network (SDD-GAN) for single-image motion deblurring to address this problem and to optimize the recovery of blurred remote sensing images. Firstly, this paper applies a feature alignment module (FAFM) in the generator to learn the offset between feature maps, to adjust the position of each sample in the convolution kernel, and to align the feature maps according to the context. Secondly, a feature importance selection module is introduced in the generator to adaptively filter the feature maps in the spatial and channel domains, preserving reliable details in the feature maps and improving the performance of the algorithm. In addition, this paper constructs a remote sensing dataset (RSDATA) based on the mechanism of image blurring caused by the high-speed orbital motion of satellites. Comparative experiments are conducted on self-built remote sensing datasets and public datasets as well as on real remote sensing blurred images taken by an in-orbit satellite (CX-6(02)). The results show that the algorithm in this paper outperforms the comparison algorithms in terms of both quantitative evaluation and visual effects. [ABSTRACT FROM AUTHOR]
- Published
- 2022
25. Optimal PMU Placement for Fault Classification and Localization Using Enhanced Feature Selection in Machine Learning Algorithms.
- Author
-
Faza, Ayman, Al-Mousa, Amjed, and Alqudah, Rajaa
- Subjects
FEATURE selection ,PHASOR measurement ,NAIVE Bayes classification ,FEATURE extraction ,FAULT location (Engineering) ,CLASSIFICATION algorithms ,MACHINE learning - Abstract
Machine learning (ML) algorithms are increasingly used in power systems applications. One important application is the classification and localization of various types of transmission line faults. Using voltage and current measurements from phasor measurement units (PMUs), a number of useful features can be extracted, which can form the basis of an ML-based prediction of the fault type, line, and distance on the line. This paper proposes a technique to find the optimal number and placement of PMUs by performing thorough feature selection. The features are selected to maximize the accuracy of the ML classification and regression algorithms. The results show that for the IEEE 14 bus system, the use of only five PMUs is sufficient to obtain high levels of accuracy. For example, testing accuracies of 99.0% and 97.1% can be achieved for the fault type and fault line location, respectively. As for the fault distance along the line, a testing MAE of 3.1% can be obtained along with an R² score of 94.4%. Adding more PMUs does not provide any additional value in terms of accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2024
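Editor's illustration for record 25: the two regression metrics quoted in the abstract, mean absolute error (MAE) and the R² score, can be computed as below. The fault-distance values are made up for illustration (expressed as a fraction of line length) and are not taken from the paper.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical fault distances as a fraction of the line length.
y_true = [0.10, 0.40, 0.55, 0.90]
y_pred = [0.12, 0.35, 0.60, 0.88]

mae = mean_absolute_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
print(f"MAE = {mae:.3f}, R^2 = {r2:.3f}")
```

An MAE reported as a percentage, as in the abstract, presumably corresponds to an error measured relative to line length; that reading is an assumption here.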
26. A CLASS SPECIFIC FEATURE SELECTION METHOD FOR IMPROVING THE PERFORMANCE OF TEXT CLASSIFICATION.
- Author
-
VENKATESH V., SHARAN S. B., MAHALAXMY S., MONISHA, S., D. S., ASHICK SANJEY, and ASHOKKUMAR P.
- Subjects
FEATURE selection ,MACHINE learning ,CLASSIFICATION - Abstract
Recently, a significant amount of research work has been carried out in the field of feature selection. Although these methods help to increase the accuracy of the machine learning classification, the selected subset of features considers all the classes and may not select recommendable features for a particular class. The main goal of our paper is to propose a new class-specific feature selection algorithm that is capable of selecting an appropriate subset of features for each class. In this regard, we first perform class binarization and then select the best features for each class. During the feature selection process, we deal with class imbalance problems and redundancy elimination. The Weighted Average Voting Ensemble method is used for the final classification. Finally, we carry out experiments to compare our proposed feature selection approach with the existing popular feature selection methods. The results prove that our feature selection method outperforms the existing methods with an accuracy of more than 37%. [ABSTRACT FROM AUTHOR]
- Published
- 2024
27. Genetic Algorithm for Feature Selection Applied to Financial Time Series Monotonicity Prediction: Experimental Cases in Cryptocurrencies and Brazilian Assets.
- Author
-
Contreras, Rodrigo Colnago, Xavier da Silva, Vitor Trevelin, Xavier da Silva, Igor Trevelin, Viana, Monique Simplicio, Santos, Francisco Lledo dos, Zanin, Rodrigo Bruno, Martins, Erico Fernandes Oliveira, and Guido, Rodrigo Capobianco
- Subjects
MACHINE learning ,GENETIC algorithms ,TIME series analysis ,CRYPTOCURRENCIES ,FEATURE selection ,INVESTORS ,ASSETS (Accounting) - Abstract
Since financial assets on stock exchanges were created, investors have sought to predict their future values. Currently, cryptocurrencies are also seen as assets. Machine learning is increasingly adopted to assist and automate investments. The main objective of this paper is to make daily predictions about the movement direction of financial time series through classification models, financial time series preprocessing methods, and feature selection with genetic algorithms. The target time series are Bitcoin, Ibovespa, and Vale. The methodology of this paper includes the following steps: collecting time series of financial assets; data preprocessing; feature selection with genetic algorithms; and the training and testing of machine learning models. The results were obtained by evaluating the models with the area under the ROC curve metric. For the best prediction models for Bitcoin, Ibovespa, and Vale, values of 0.61, 0.62, and 0.58 were obtained, respectively. In conclusion, the feature selection allowed the improvement of performance in most models, and the input series in the form of percentage variation obtained a good performance, although it was composed of fewer attributes in relation to the other sets tested. [ABSTRACT FROM AUTHOR]
- Published
- 2024
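Editor's illustration for record 27: the area under the ROC curve used to score the models equals the probability that a randomly chosen positive example (an "up" day) receives a higher score than a randomly chosen negative one, with ties counted as one half. A minimal sketch with made-up labels and scores follows; the function name is chosen here for illustration.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.
    Ties between a positive and a negative score count as half a win."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up daily labels (1 = price went up) and model scores.
y_true = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.62, 0.48, 0.51, 0.70, 0.40, 0.55, 0.45, 0.35]
auc = roc_auc(y_true, scores)
print(auc)

# A model that gives every day the same score cannot rank at all.
baseline = roc_auc(y_true, [0.5] * len(y_true))
print(baseline)
```

Since 0.5 corresponds to uninformative ranking, the values of 0.61, 0.62, and 0.58 reported in the abstract indicate a modest edge over chance.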
28. Complex Real-Time Monitoring and Decision-Making Assistance System Based on Hybrid Forecasting Module and Social Network Analysis.
- Author
-
Fan, Henghao, Li, Hongmin, Gu, Xiaoyang, and Ren, Zhongqiu
- Subjects
SOCIAL network analysis ,AIR pollution prevention ,PEARSON correlation (Statistics) ,DECISION making ,SENTIMENT analysis ,FEATURE selection ,FORECASTING - Abstract
Timely short-term spatial air quality forecasting is essential for monitoring and prevention in urban agglomerations, providing a new perspective on joint air pollution prevention. However, a single model on air pollution forecasting or spatial correlation analysis is insufficient to meet the strong demand. Thus, this paper proposed a complex real-time monitoring and decision-making assistance system, using a hybrid forecasting module and social network analysis. Firstly, before an accurate forecasting module was constructed, text sentiment analysis and a strategy based on multiple feature selection methods and result fusion were introduced to data preprocessing. Subsequently, CNN-D-LSTM was proposed to improve the feature capture ability to make forecasting more accurate. Then, social network analysis was utilized to explore the spatial transporting characteristics, which could provide solutions to joint prevention and control in urban agglomerations. For experiment simulation, two comparative experiments were constructed for individual models and city cluster forecasting, in which the mean absolute error decreases to 7.8692 and the Pearson correlation coefficient is 0.9816. For overall spatial cluster forecasting, related experiments demonstrated that with appropriate cluster division, the Pearson correlation coefficient could be improved to nearly 0.99. [ABSTRACT FROM AUTHOR]
- Published
- 2024
29. Special Issue "Algorithms for Feature Selection".
- Author
-
Khan, Muhammad Adnan
- Subjects
DEEP learning ,MACHINE learning ,FEATURE selection ,ALGORITHMS - Published
- 2023
30. IoT-cloud based healthcare model for COVID-19 detection: an enhanced k-Nearest Neighbour classifier based approach
- Author
-
Prayag Tiwari, Rajendrani Mukherjee, Deepak Gupta, Aurghyadip Kundu, Ashish Khanna, Indrajit Mukherjee, Mohammad Shorfuzzaman, Department of Computer Science, Aalto-yliopisto, and Aalto University
- Subjects
Feature ,IoT ,Computer science ,Feature selection ,Engineering and technology ,Machine learning ,Classifier ,Theoretical Computer Science ,k-nearest neighbors algorithm ,Classifier (linguistics) ,Electrical engineering, electronic engineering, information engineering ,Regular Paper ,Numerical Analysis ,Learning classifier system ,Ant colony optimization algorithms ,Healthcare ,COVID-19 ,Networking & telecommunications ,Predictive analytics ,Computer Science Applications ,Computational Mathematics ,Computational Theory and Mathematics ,Feature (computer vision) ,Benchmark (computing) ,Artificial intelligence & image processing ,Artificial intelligence ,Cloud ,Software - Abstract
COVID-19 has severely affected the whole world. The pandemic has caused many casualties in a very short span. An IoT-cloud-based healthcare model is urgently required in this situation to support better decisions during the COVID-19 pandemic. In this paper, an attempt has been made to perform predictive analytics regarding the disease using a machine learning classifier. This research proposes an enhanced KNN (k-Nearest Neighbor) algorithm, eKNN, which does not choose the value of k randomly. Instead, it uses a mathematical function of the dataset's sample size to determine the k value. The enhanced KNN algorithm eKNN was evaluated on 7 benchmark COVID-19 datasets of different sizes, gathered from standard data clouds of different countries (Brazil, Mexico, etc.). The enhanced KNN classifier performed significantly better than ordinary KNN. The second research question augmented the enhanced KNN algorithm with feature selection using ACO (Ant Colony Optimization). Results indicated that the enhanced KNN classifier with the feature selection mechanism performed considerably better than enhanced KNN without feature selection. This paper proposes an improved KNN that attempts to find an optimal value of k and studies IoT-cloud-based COVID-19 detection.
- Published
- 2021
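Editor's illustration for record 30: the abstract says eKNN derives k from the dataset's sample size but does not state the function. The sketch below ASSUMES the common square-root heuristic (k close to the square root of n, forced to be odd); this is an assumption made for illustration and may differ from the function used in the paper. All names and data are hypothetical.

```python
import math
from collections import Counter

def k_from_sample_size(n_samples):
    """ASSUMPTION: k = round(sqrt(n)), bumped to the next odd number to reduce
    ties in two-class voting. The abstract does not give eKNN's actual function."""
    k = max(1, round(math.sqrt(n_samples)))
    return k if k % 2 == 1 else k + 1

def knn_predict(X_train, y_train, query, k=None):
    """Majority vote among the k training points nearest to the query."""
    if k is None:
        k = k_from_sample_size(len(X_train))
    k = min(k, len(X_train))  # never ask for more neighbours than samples
    nearest = sorted(zip(X_train, y_train), key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Made-up two-feature data: five "neg" and five "pos" cases.
X_train = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
           (5, 5), (5, 6), (6, 5), (6, 6), (5.5, 5.5)]
y_train = ["neg"] * 5 + ["pos"] * 5

print(k_from_sample_size(len(X_train)))
print(knn_predict(X_train, y_train, (0.2, 0.3)))
print(knn_predict(X_train, y_train, (5.8, 5.9)))
```

Tying k to the sample size removes one hand-tuned hyperparameter, which is the design point the abstract emphasises.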
31. An Industrial Load Classification Method Based on a Two-Stage Feature Selection Strategy and an Improved MPA-KELM Classifier: A Chinese Cement Plant Case.
- Author
-
Zhou, Mengran, Zhu, Ziwei, Hu, Feng, Bian, Kai, and Lai, Wenhao
- Subjects
CEMENT plants ,FEATURE selection ,MACHINE learning ,K-nearest neighbor classification ,DATA scrubbing ,REACTIVE power ,POWER plants - Abstract
Accurately identifying industrial loads helps to accelerate the construction of new power systems and is crucial to today's smart grid development. Therefore, this paper proposes an industrial load classification method based on two-stage feature selection combined with an improved marine predator algorithm (IMPA)-optimized kernel extreme learning machine (KELM). First, the time- and frequency-domain features of electrical equipment (active and reactive power) are extracted from the power data after data cleaning, and the initial feature pool is established. Next, a two-stage feature selection algorithm is proposed to generate the smallest feature subset that yields superior classification accuracy. In the initial selection phase, each feature weight is calculated using ReliefF, and the features with smaller weights are removed to obtain the candidate feature set. In the reselection stage, a k-nearest neighbor (KNN) classifier based on the MPA is designed to obtain the best combination of features from the candidate feature set with respect to the classification accuracy and the number of feature inputs. Third, the IMPA-KELM classifier is developed as a load identification model. The MPA improvement strategy includes self-mapping to generate chaotic sequence initialization and boundary mutation operations. Compared with the MPA, IMPA has a faster convergence speed and more robust global search capability. In this paper, actual data from the cement industry within China are used as a research case. The experimental results show that two-stage feature selection reduces the feature dimensionality from 58 to 3, which is 5.17% of the original. In addition, the proposed IMPA-KELM has the highest overall recognition accuracy of 93.39% compared to the other models. The effectiveness and feasibility of the proposed method are demonstrated. [ABSTRACT FROM AUTHOR]
- Published
- 2023
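Editor's illustration for record 31: the filter-then-wrapper structure of two-stage feature selection can be sketched as follows. This is a simplified stand-in under stated assumptions, not the paper's method: stage one uses basic Relief (one nearest hit and one nearest miss, two classes) rather than ReliefF, and stage two replaces the MPA search with an exhaustive search scored by leave-one-out 1-NN accuracy. The `keep_top` parameter and the toy data are hypothetical.

```python
import math
from itertools import combinations

def relief_weights(X, y):
    """Simplified Relief: reward features that differ on the nearest miss and
    agree on the nearest hit. Assumes features are scaled to [0, 1]."""
    n_features = len(X[0])
    weights = [0.0] * n_features
    for i, (xi, yi) in enumerate(zip(X, y)):
        hits = [x for j, (x, lab) in enumerate(zip(X, y)) if lab == yi and j != i]
        misses = [x for x, lab in zip(X, y) if lab != yi]
        hit = min(hits, key=lambda x: math.dist(x, xi))
        miss = min(misses, key=lambda x: math.dist(x, xi))
        for f in range(n_features):
            weights[f] += (abs(xi[f] - miss[f]) - abs(xi[f] - hit[f])) / len(X)
    return weights

def loo_1nn_accuracy(X, y, features):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on a feature subset."""
    def project(x):
        return [x[f] for f in features]
    correct = 0
    for i, xi in enumerate(X):
        rest = [(project(x), lab) for j, (x, lab) in enumerate(zip(X, y)) if j != i]
        _, predicted = min(rest, key=lambda pair: math.dist(pair[0], project(xi)))
        correct += predicted == y[i]
    return correct / len(X)

def two_stage_select(X, y, keep_top=2):
    # Stage 1 (filter): keep the keep_top features with the largest weights.
    weights = relief_weights(X, y)
    ranked = sorted(range(len(weights)), key=lambda f: weights[f], reverse=True)
    candidates = ranked[:keep_top]
    # Stage 2 (wrapper): smallest subset first, replaced only on a strictly
    # better score, so ties favour fewer features.
    best = None
    for size in range(1, len(candidates) + 1):
        for subset in combinations(candidates, size):
            score = loo_1nn_accuracy(X, y, subset)
            if best is None or score > best[0]:
                best = (score, subset)
    return weights, best

# Made-up data: feature 0 separates the classes, features 1 and 2 do not.
X = [(0.1, 0.9, 0.4), (0.2, 0.1, 0.5), (0.15, 0.5, 0.3), (0.05, 0.3, 0.6),
     (0.9, 0.8, 0.5), (0.8, 0.2, 0.6), (0.85, 0.6, 0.4), (0.95, 0.4, 0.7)]
y = [0, 0, 0, 0, 1, 1, 1, 1]
weights, best = two_stage_select(X, y)
print(weights, best)
```

The reduction reported in the abstract, from 58 features to 3, is consistent with the stated 5.17%, since 3 / 58 ≈ 0.0517.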
32. A hybrid breast cancer classification algorithm based on meta-learning and artificial neural networks.
- Author
-
Luyao Han and Zhixiang Yin
- Subjects
ARTIFICIAL neural networks ,MACHINE learning ,CLASSIFICATION algorithms ,TUMOR classification ,BREAST cancer ,COMPUTATIONAL linguistics ,SPEECH processing systems - Abstract
Breast cancer in women has surpassed lung cancer as the world's most commonly diagnosed new cancer. Regular screening and measures have become an effective way to prevent breast cancer and also provide a good foundation for later treatment. Women should receive regular checkups in the hospital after reaching a certain age. The use of computer-aided technology can improve the accuracy and efficiency of physicians' decision-making. Data pre-processing is required before data analysis, and 16 features are selected using a correlation-based feature selection method. In this paper, meta-learning and Artificial Neural Networks (ANN) are combined to create a hybrid algorithm. The proposed hybrid algorithm for predicting breast cancer achieved 98.74% accuracy and a 98.02% F1-score by combining various meta-learning models whose outputs were used as input features for the ANN models. Therefore, the hybrid algorithm proposed in this paper can obtain better prediction results than a single model. [ABSTRACT FROM AUTHOR]
- Published
- 2022
33. Securing IoT networks in cloud computing environments: a real-time IDS
- Author
-
Biswas, Soham and Ansari, Md. Sarfaraj Alam
- Published
- 2024
34. An adaptive nonlinear whale optimization multi-layer perceptron cyber intrusion detection framework
- Author
-
El-Ghaish, Hany, Miqrish, Haitham, Elmogy, Ahmed, and Elawady, Wael
- Published
- 2024
35. Cloud-based email phishing attack using machine and deep learning algorithm.
- Author
-
Butt, Umer Ahmed, Amin, Rashid, Aldabbas, Hamza, Mohan, Senthilkumar, Alouffi, Bader, and Ahmadian, Ali
- Subjects
MACHINE learning ,PHISHING ,DEEP learning ,SPAM email ,SUPPORT vector machines ,COMPUTER systems - Abstract
Cloud computing refers to the on-demand availability of personal computer system assets, specifically data storage and processing power, without the client's input. Emails are commonly used to send and receive data for individuals or groups. Financial data, credit reports, and other sensitive data are often sent via the Internet. Phishing is a fraudster's technique used to get sensitive data from users by seeming to come from trusted sources. The sender of a phished email can persuade you to give up secret data through misdirection. The main problem is email phishing attacks while sending and receiving email. The attacker sends spam data using email and receives your data when you open and read the email. In recent years, it has been a big problem for everyone. This paper uses different legitimate and phishing data sizes, detects new emails, and uses different features and algorithms for classification. A modified dataset is created after measuring the existing approaches. We created a feature-extracted comma-separated values (CSV) file and a label file, and applied the support vector machine (SVM), Naive Bayes (NB), and long short-term memory (LSTM) algorithms. This experimentation treats the recognition of a phished email as a classification problem. According to the comparison and implementation, SVM, NB and LSTM performance is better and more accurate in detecting email phishing attacks. The classification of email attacks using SVM, NB, and LSTM classifiers achieves the highest accuracy of 99.62%, 97% and 98%, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2023
36. Performance Analysis of Intrusion Detection System in the IoT Environment Using Feature Selection Technique.
- Author
-
Alhanaya, Moody and Ateyeh Al-Shqeerat, Khalil Hamdi
- Subjects
FEATURE selection ,INTRUSION detection systems (Computer security) ,MACHINE learning ,INTERNET of things ,K-nearest neighbor classification ,RANDOM forest algorithms - Abstract
The increasing number of security holes in Internet of Things (IoT) networks raises a question about the reliability of existing network intrusion detection systems. This problem has led to the development of a research area focused on improving network-based intrusion detection system (NIDS) technologies. According to the analysis of different businesses, most researchers focus on improving the classification results of NIDS datasets by combining machine learning and feature reduction techniques. However, these techniques are not suitable for every type of network. In light of this, whether the optimal algorithm and feature reduction techniques can be generalized across various datasets for IoT networks remains an open question. The paper aims to analyze the methods used in this research and whether they can be generalized to other datasets. Six ML models were used in this study, namely, logistic regression (LR), decision trees (DT), Naive Bayes (NB), random forest (RF), K-nearest neighbors (KNN), and linear SVM. The primary detection algorithms used in this study, Principal Component Analysis (PCA) and Gini Impurity-Based Weighted Random Forest (GIWRF), were evaluated against three datasets: ToN-IoT, UNSWNB15, and Bot-IoT. The optimal number of dimensions for each dataset was not studied by applying the PCA algorithm. It is stated in the paper that the selection of datasets affects the performance of the FE techniques and detection algorithms used. Increasing the efficiency of this research area requires a comprehensive standard feature set that can be used to improve quality over time. [ABSTRACT FROM AUTHOR]
- Published
- 2023
37. Differential Privacy High-Dimensional Data Publishing Based on Feature Selection and Clustering.
- Author
-
Chu, Zhiguang, He, Jingsha, Zhang, Xiaolei, Zhang, Xing, and Zhu, Nafei
- Subjects
DATA privacy ,DATABASES ,FEATURE selection ,MACHINE learning ,ELECTRONIC data processing ,CLUSTER analysis (Statistics) - Abstract
As a social information product, the privacy and usability of high-dimensional data are the core issues in the field of privacy protection. Feature selection is a commonly used dimensionality reduction technique for high-dimensional data. Some feature selection methods only process some of the features selected by the algorithm and do not take into account the information associated with the selected features, so the usability of the final experimental results is low. This paper proposes a hybrid method based on feature selection and cluster analysis to solve the data utility and privacy problems of high-dimensional data in the actual publishing process. The proposed method is divided into three stages: (1) screening features; (2) analyzing the clustering of features; and (3) adaptive noise. This paper uses the Wisconsin Breast Cancer Diagnostic (WDBC) database from the UCI Machine Learning Repository. Using classification accuracy to evaluate the performance of the proposed method, the experiments show that the algorithm protects sensitive information while retaining the contribution of the data to the diagnostic results. [ABSTRACT FROM AUTHOR]
- Published
- 2023
38. Application of Machine Learning in Transformer Health Index Prediction
- Author
-
Alhaytham Alqudsi and Ayman El-Hag
- Subjects
feature selection ,insulation health index ,machine learning ,oil/paper insulation ,transformer asset management ,Technology - Abstract
The presented paper aims to establish a strong basis for utilizing machine learning (ML) towards the prediction of the overall insulation health condition of medium voltage distribution transformers based on their oil test results. To validate the presented approach, the ML algorithms were tested on two databases of more than 1000 medium voltage transformer oil samples of ratings in the order of tens of MVA. The oil test results were acquired from in-service transformers (during oil sampling time) of two different utility companies in the gulf region. The illustrated procedure aimed to mimic a realistic scenario of how the utility would benefit from the use of different ML tools towards understanding the insulation health index of their transformers. This objective was achieved using two procedural steps. In the first step, three different data training and testing scenarios were used with several pattern recognition tools for classifying the transformer health condition based on the full set of input test features. In the second step, the same pattern recognition tools were used along with the three training/testing scenarios for a reduced number of test features. Also, a previously developed reduced model was the basis to reduce the needed number of tests for transformer health index calculations. It was found that reducing the number of tests did not influence the accuracy of the ML prediction models, which is considered as a significant advantage in terms of transformer asset management (TAM) cost reduction.
- Published
- 2019
39. Risk Evaluation of Elevators Based on Fuzzy Theory and Machine Learning Algorithms.
- Author
-
Pan, Wei, Xiang, Yi, Gong, Weili, and Shen, Haiying
- Subjects
ELEVATORS ,MACHINE learning ,MACHINE theory ,RISK assessment ,FEATURE selection ,SUPPORT vector machines ,CLASSIFICATION algorithms - Abstract
Elevators have become an integral part of modern buildings, and technological advances have enabled the monitoring of their operational status through sensor technology. In response to the development of the elevator industry and the need for practical elevator operation risk evaluation, this paper proposes an elevator risk evaluation method based on fuzzy theory and machine learning methods. The method begins by establishing an elevator operation risk evaluation index system. The traditional fuzzy comprehensive evaluation method is then employed to evaluate the risk levels of the 50 elevators studied. The collected index data and labels (fuzzy comprehensive evaluation results) are used as inputs to train the support vector machine (SVM) model. To optimize the SVM model, the maximum information coefficient method, enhanced by the correlation-based feature selection (MIC-CFS) method, is employed to select features for the index input to the SVM model. The improved gray wolf algorithm (IGWO) method optimizes the SVM. Finally, the model's performance is verified using new index data. The experimental results demonstrate that introducing machine learning methods for elevator risk evaluation saves time and effort while providing good accuracy compared to the traditional expert evaluation method. The optimization of the SVM model by IGWO and feature selection by the MIC-CFS method results in a more concise SVM model that converges faster during training, exhibits better stability, and achieves higher accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2024
40. Analysis of traditional machine learning approaches on heart attacks prediction.
- Author
-
BERDINANTH, Micheal, SYED, Samah, VELUSAMY, Shudhesh, SUSEELAN, Angel Deborah, and SIVANAIAH, Rajalakshmi
- Subjects
MACHINE learning ,MYOCARDIAL infarction ,MEDICAL personnel ,DECISION trees ,LOGISTIC regression analysis ,REGRESSION trees ,FORECASTING - Abstract
Considering the persistent challenge of early heart attack detection in patients, despite significant advancements in medical systems, this research project is motivated by the imperative need to develop effective predictive machine learning models. The central problem addressed herein is the identification of individuals at risk of experiencing a heart attack. In response to this problem, two distinct models have been devised and meticulously evaluated, namely decision trees and logistic regression, each designed to fulfil the primary objective of this research. Through a rigorous analysis and thorough evaluation of the results, we have scrutinised the performance of these models. The comparison between decision trees and logistic regression provides valuable insights into their efficacy in predicting heart attacks. The culmination of this endeavor not only contributes to the growing body of knowledge in heart attack prediction but also provides healthcare professionals with powerful tools for early diagnosis, potentially saving lives and improving patient outcomes. [ABSTRACT FROM AUTHOR]
- Published
- 2024
41. ReMAHA–CatBoost: Addressing Imbalanced Data in Traffic Accident Prediction Tasks.
- Author
-
Li, Guolian, Wu, Yadong, Bai, Yulong, and Zhang, Weihan
- Subjects
MACHINE learning ,TRAFFIC accidents ,FEATURE selection ,ENGINEERING models ,GENETIC algorithms - Abstract
Featured Application: ReMAHA–CatBoost is an advanced machine learning model designed for predicting traffic accident severity. It is constructed in two parts: ReMAHA (relief–F-based genetic algorithm with over-sampling algorithm for weighted Mahalanobis distance) and CatBoost, to offer an innovative solution in the field of imbalanced data classification. Key Features and Highlights: (1) ReMAHA Over-sampling: ReMAHA employs the Relief–F algorithm for feature selection and combines it with an innovative over-sampling technique to enhance prediction accuracy for minority classes; (2) Feature Engineering: The model leverages feature engineering to determine the significance of different attributes, enabling it to make precise predictions regarding accident severity; and (3) CatBoost Integration: ReMAHA incorporates CatBoost, a state-of-the-art gradient-boosting algorithm, to improve predictive performance by mitigating issues like overfitting and prediction bias. This paper elucidates the working principles of oversampling algorithms in machine learning tasks based on imbalanced datasets, specifically addressing how to resolve the issue of low accuracy stemming from imbalanced data at the data level. Based on the experimental results presented in this paper, it is evident that ReMAHA–CatBoost outperforms several other oversampling algorithms and models, especially on the US–Accidents traffic accident dataset characterized by an extreme class imbalance ratio of 91.40. This improved performance enhances the precision of traffic accident severity prediction. Using historical information from traffic accidents to predict accidents has always been an area of active exploration by researchers in the field of transportation. However, predicting only the occurrence of traffic accidents is insufficient for providing comprehensive information to relevant authorities. 
Therefore, further classification of predicted traffic accidents is necessary to better identify and prevent potential hazards and the escalation of accidents. Due to the significant disparity in the occurrence rates of different severity levels of traffic accidents, data imbalance becomes a critical issue. To address the challenge of predicting extremely imbalanced traffic accident events, this paper introduces a predictive framework named ReMAHA–CatBoost. To evaluate the effectiveness of ReMAHA–CatBoost, we conducted experiments on the US–Accidents traffic accident dataset, where the class label imbalance reaches up to 91.40 times. The experimental results demonstrate that the proposed model in this paper exhibits exceptional predictive performance in the domain of imbalanced traffic accident prediction. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
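The abstract above describes class imbalance (a ratio of 91.40 between the largest and smallest class) and data-level oversampling. As a rough illustration only, the sketch below computes an imbalance ratio and applies plain random oversampling. This is a generic baseline, not the ReMAHA algorithm; the function names and the toy labels are hypothetical.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Return the majority-class count divided by the minority-class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until every class
    has as many samples as the largest class (plain random oversampling)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Toy data (hypothetical): 8 "minor" accidents and 2 "severe" ones.
rows = [[i] for i in range(10)]
labels = ["minor"] * 8 + ["severe"] * 2

print(imbalance_ratio(labels))  # 4.0
balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

Random duplication adds no new information, which is one reason more elaborate oversampling schemes such as the one in the abstract are studied.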
42. Feature mining and classifier selection for API calls-based malware detection.
- Author
-
Balan, Gheorghe, Simion, Ciprian-Alin, Gavriluţ, Dragoş Teodor, and Luchian, Henri
- Subjects
MACHINE learning, MALWARE, DATABASES, FEATURE selection, APPLICATION program interfaces, MACHINE performance, DECISION trees
- Abstract
This paper deals with a major challenge in cyber-security: the need to respond to ever-renewed techniques used by attackers in order to avoid detection based on analysing static features of malware. These constantly renewed techniques consist of various changes in file geometry, entropy, and so on. As a consequence, static malware feature sets describe the malicious files less and less accurately; hence, the performance of machine learning models in detecting new variants of the same malware family may be severely impaired. The paper focuses on a promising approach to this detection challenge: defining file features based on OS (operating system) API (Application Program Interface) call sequences. We explore in detail the detection potential of such features, since, in order to act maliciously, these features are highly unlikely to be hidden. We studied several tens of thousands of such features, a modest-sized subset of which were subsequently fed to several machine learning models. The database used for training and testing consists of 1.5 million files, including malicious files from the polymorphic families Emotet and Trickbot. Using this database, nearly 4,000 pairings (classifier, feature selection algorithm) were trained / tested. Our experimental results show that the API calls-oriented feature mining process is well suited for detecting polymorphic malware. A comparative discussion of the detection results of the various models is presented; depending on the target optimisation criterion (detection rate / false positive rate / saving resources), three of the 4,000 classification models turn out to be best suited for real-world applications: Random Forest, Legacy Neural Networks and Decision Tree. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
43. An Enhanced RIME Optimizer with Horizontal and Vertical Crossover for Discriminating Microseismic and Blasting Signals in Deep Mines.
- Author
-
Zhu, Wei, Li, Zhihui, Heidari, Ali Asghar, Wang, Shuihua, Chen, Huiling, and Zhang, Yudong
- Subjects
RHYME, MINES & mineral resources, REAL numbers, FEATURE selection, BLASTING, GEOPHONE, SWARM intelligence, MICROSEISMS
- Abstract
Real-time monitoring of rock stability during the mining process is critical. This paper first proposed a RIME algorithm (CCRIME) based on vertical and horizontal crossover search strategies to improve the quality of the solutions obtained by the RIME algorithm and further enhance its search capabilities. Then, by constructing a binary version of CCRIME, the key parameters of FKNN were optimized using a binary conversion method. Finally, a discrete CCRIME-based BCCRIME was developed, which uses an S-shaped function transformation approach to address the feature selection issue by converting the search result into a real number that can only be zero or one. The performance of CCRIME was examined in this study from various perspectives, utilizing 30 benchmark functions from IEEE CEC2017. Basic algorithm comparison tests and sophisticated variant algorithm comparison experiments were also carried out. In addition, this paper also used collected microseismic and blasting data for classification prediction to verify the ability of the BCCRIME-FKNN model to process real data. This paper provides new ideas and methods for real-time monitoring of rock mass stability during deep well mineral resource mining. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
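The abstract above mentions an S-shaped function that converts a continuous search result into values that can only be zero or one. A minimal sketch of that general idea follows, assuming the common sigmoid transfer function with a random threshold; the exact transfer function used by BCCRIME is not stated in the abstract, and the names `s_shaped`, `binarize`, and the sample position vector are hypothetical.

```python
import math
import random


def s_shaped(x):
    """Sigmoid transfer function: maps a real value to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def binarize(position, rng):
    """Turn a continuous position vector into a 0/1 feature mask.
    Each feature is kept with probability s_shaped(x)."""
    return [1 if rng.random() < s_shaped(x) else 0 for x in position]


rng = random.Random(42)
# Hypothetical continuous position of one search agent, one value per feature.
position = [-6.0, -0.5, 0.0, 0.5, 6.0]
mask = binarize(position, rng)
selected = [i for i, bit in enumerate(mask) if bit == 1]
print(mask, selected)
```

Strongly negative coordinates are almost always dropped and strongly positive ones almost always kept, so the continuous optimizer can still steer the discrete feature subset.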
44. A hybrid machine learning feature selection model—HMLFSM to enhance gene classification applied to multiple colon cancers dataset.
- Author
-
Al-Rajab, Murad, Lu, Joan, Xu, Qiang, Kentour, Mohamed, Sawsa, Ahlam, Shuweikeh, Emad, Joy, Mike, and Arasaradnam, Ramesh
- Subjects
FEATURE selection, COLON cancer, MACHINE learning, PARTICLE swarm optimization, LITERATURE reviews, CANCER genes
- Abstract
Colon cancer is a significant global health problem, and early detection is critical for improving survival rates. Traditional detection methods, such as colonoscopies, can be invasive and uncomfortable for patients. Machine Learning (ML) algorithms have emerged as a promising approach for non-invasive colon cancer classification using genetic data or patient demographics and medical history. One approach is to use ML to analyse genetic data, or patient demographics and medical history, to predict the likelihood of colon cancer. However, due to the challenges imposed by variable gene expression and the high dimensionality of cancer-related datasets, traditional transductive ML applications have limited accuracy and risk overfitting. In this paper, we propose a new hybrid feature selection model called HMLFSM–Hybrid Machine Learning Feature Selection Model to improve colon cancer gene classification. We developed a multifilter hybrid model including a two-phase feature selection approach, combining Information Gain (IG) and Genetic Algorithms (GA), and minimum Redundancy Maximum Relevance (mRMR) coupling with Particle Swarm Optimization (PSO). We critically tested our model on three colon cancer genetic datasets and found that the new framework outperformed other models with significant accuracy improvements (95%, ~97%, and ~94% accuracies for datasets 1, 2, and 3 respectively). The results show that our approach improves the classification accuracy of colon cancer detection by highlighting important and relevant genes, eliminating irrelevant ones, and revealing the genes that have a direct influence on the classification process. For colon cancer gene analysis, and along with our experiments and literature review, we found that selective input feature extraction prior to feature selection is essential for improving predictive performance. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
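The abstract above names Information Gain (IG) as one of its filter criteria. As an illustration of what such a filter score measures, the sketch below computes IG for a discrete feature against class labels. It assumes gene expression has already been discretized into levels such as "high" and "low", which is an assumption made here for simplicity and not a detail taken from the paper; the gene names and data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Entropy of the labels minus their expected entropy after
    splitting the samples by the value of a discrete feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical toy data: 1 = tumour sample, 0 = normal sample.
labels = [1, 1, 0, 0]
gene_a = ["high", "high", "low", "low"]   # perfectly separates the classes
gene_b = ["high", "low", "high", "low"]   # carries no class information

print(information_gain(gene_a, labels))
print(information_gain(gene_b, labels))
```

A filter stage would rank all genes by such a score and pass only the top-ranked ones to the more expensive search stage.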
45. A Novel Feature Selection Strategy Based on the Harris Hawks Optimization Algorithm for the Diagnosis of Cervical Cancer.
- Author
-
Dong, Minhui, Wang, Yu, Todo, Yuki, and Hua, Yuxiao
- Abstract
Cervical cancer is the fourth most commonly diagnosed cancer and one of the leading causes of cancer-related deaths among females worldwide. Early diagnosis can greatly increase the cure rate for cervical cancer. However, due to the need for substantial medical resources, it is difficult to implement in some areas. With the development of machine learning, utilizing machine learning to automatically diagnose cervical cancer has currently become one of the main research directions in the field. Such an approach typically involves a large number of features. However, a portion of these features is redundant or irrelevant. The task of eliminating redundant or irrelevant features from the entire feature set is known as feature selection (FS). Feature selection methods can roughly be divided into three types, including filter-based methods, wrapper-based methods, and embedded-based methods. Among them, wrapper-based methods are currently the most commonly used approach, and many researchers have demonstrated that these methods can reduce the number of features while improving the accuracy of diagnosis. However, this method still has some issues. Wrapper-based methods typically use heuristic algorithms for FS, which can result in significant computational time. On the other hand, heuristic algorithms are often sensitive to parameters, leading to instability in performance. To overcome this challenge, a novel wrapper-based method named the Binary Harris Hawks Optimization (BHHO) algorithm is proposed in this paper. Compared to other wrapper-based methods, the BHHO has fewer hyper-parameters, which contributes to better stability. Furthermore, we have introduced a rank-based selection mechanism into the algorithm, which endows BHHO with enhanced optimization capabilities and greater generalizability. To comprehensively evaluate the performance of the proposed BHHO, we conducted a series of experiments. 
The experimental results show that the proposed BHHO demonstrates better accuracy and stability compared to other common wrapper-based FS methods on the cervical cancer dataset. Additionally, even on other disease datasets, the proposed algorithm still provides competitive results, proving its generalizability. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
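The abstract above describes wrapper-based feature selection, where a search algorithm proposes feature subsets and a classifier scores each one. The sketch below shows one plausible fitness function for such a wrapper. It is an assumption for illustration, not the BHHO method: it uses a leave-one-out 1-nearest-neighbour classifier and a weighted sum of error rate and subset size, with a hypothetical weight `alpha` and hypothetical toy data.

```python
def loo_1nn_error(rows, labels, mask):
    """Leave-one-out error rate of a 1-nearest-neighbour classifier
    that only looks at the features switched on in mask."""
    idx = [i for i, bit in enumerate(mask) if bit]
    if not idx:
        return 1.0  # an empty subset cannot classify anything
    errors = 0
    for i, row in enumerate(rows):
        # Nearest other sample by squared Euclidean distance on selected features.
        nearest = min(
            (j for j in range(len(rows)) if j != i),
            key=lambda j: sum((row[k] - rows[j][k]) ** 2 for k in idx),
        )
        errors += labels[nearest] != labels[i]
    return errors / len(rows)


def wrapper_fitness(rows, labels, mask, alpha=0.99):
    """Lower is better: mostly error rate, plus a small penalty per kept feature."""
    return alpha * loo_1nn_error(rows, labels, mask) + (1 - alpha) * sum(mask) / len(mask)


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
rows = [[0.0, 5.0], [0.1, 1.0], [0.2, 9.0], [1.0, 5.1], [1.1, 1.1], [1.2, 9.1]]
labels = [0, 0, 0, 1, 1, 1]

for mask in ([1, 0], [0, 1], [1, 1]):
    print(mask, wrapper_fitness(rows, labels, mask))
```

Because every candidate subset requires retraining or re-evaluating the classifier, the cost of this function dominates the running time of a wrapper method, which matches the computational concern raised in the abstract.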
46. Weather-Based Prediction of Power Consumption in District Heating Network: Case Study in Finland.
- Author
-
Vakhnin, Aleksei, Ryzhikov, Ivan, Brester, Christina, Niska, Harri, and Kolehmainen, Mikko
- Subjects
HEATING from central stations, MACHINE learning, FEATURE selection, EVOLUTIONARY algorithms, DIFFERENTIAL evolution, PLANT capacity, GENETIC algorithms, HEAT losses, ENERGY consumption
- Abstract
Accurate prediction of energy consumption in district heating systems plays an important role in supporting effective and clean energy production and distribution in dense urban areas. Predictive models are needed for flexible and cost-effective operation of energy production and usage, e.g., using peak shaving or load shifting to compensate for heat losses in the pipeline. This helps to avoid exceedance of power plant capacity. The purpose of this study is to automate the process of building machine learning (ML) models to solve a short-term power demand prediction problem. The dataset contains a district heating network's measured hourly power consumption and ambient temperature for 415 days. In this paper, we propose a hybrid evolutionary-based algorithm, named GA-SHADE, for the simultaneous optimization of ML models and feature selection. The GA-SHADE algorithm is a hybrid algorithm consisting of a Genetic Algorithm (GA) and success-history-based parameter adaptation for differential evolution (SHADE). The results of the numerical experiments show that the proposed GA-SHADE algorithm allows the identification of simplified ML models with good prediction performance in terms of the optimized feature subset and model hyperparameters. The main contributions of the study are (1) using the proposed GA-SHADE, ML models with varying numbers of features and performance are obtained. (2) The proposed GA-SHADE algorithm self-adapts during operation and has only one control parameter. There is no fine-tuning required before execution. (3) Due to the evolutionary nature of the algorithm, it is not sensitive to the number of features and hyperparameters to be optimized in ML models. In conclusion, this study confirms that each optimized ML model uses a unique set and number of features. Out of the six ML models considered, SVR and NN are better candidates and have demonstrated the best performance across several metrics. 
All numerical experiments were compared against the measurements and proven by the standard statistical tests. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
47. Classification and spectrum optimization method of grease based on infrared spectrum.
- Author
-
Feng, Xin, Xia, Yanqiu, Xie, Peiyuan, and Li, Xiaohe
- Subjects
INFRARED spectra, SELF-organizing maps, FEATURE selection, MACHINE learning, ABSORPTION spectra, CLASSIFICATION
- Abstract
The infrared (IR) absorption spectral data of 63 kinds of lubricating greases containing six different types of thickeners were obtained using IR spectroscopy. The Kohonen neural network algorithm was used to identify the type of lubricating grease. The results show that this machine learning method can effectively eliminate the interference fringes in the IR spectrum, and complete the feature selection and dimensionality reduction of the high-dimensional spectral data. The 63 kinds of greases exhibit spatial clustering under certain IR spectrum recognition spectral bands, which are linked to characteristic peaks of lubricating greases and improve the recognition accuracy of these greases. The model achieved recognition accuracy of 100.00%, 96.08%, 94.87%, 100.00%, and 87.50% for polyurea grease, calcium sulfonate composite grease, aluminum (Al)-based grease, bentonite grease, and lithium-based grease, respectively. Based on the different IR absorption spectrum bands produced by each kind of lubricating grease, the three-dimensional spatial distribution map of the lubricating grease drawn also verifies the accuracy of classification while recognizing the accuracy. This paper demonstrates fast recognition speed and high accuracy, proving that the Kohonen neural network algorithm has an efficient recognition ability for identifying the types of lubricating grease. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
48. A Knowledge-Guided Competitive Co-Evolutionary Algorithm for Feature Selection.
- Author
-
Zhou, Junyi, Zheng, Haowen, Li, Shaole, Hao, Qiancheng, Zhang, Haoyang, Gao, Wenze, and Wang, Xianpeng
- Subjects
MACHINE learning, FEATURE selection, EVOLUTIONARY algorithms, ALGORITHMS, COEVOLUTION, COMBINATORIAL optimization, MACHINE performance
- Abstract
In real-world applications, feature selection is crucial for enhancing the performance of data science and machine learning models. Typically, feature selection is a complex combinatorial optimization problem and a multi-objective optimization problem. Its primary goals are to reduce the dimensionality of the dataset and enhance the performance of machine learning algorithms. The selection of features in high-dimensional datasets is challenging due to the intricate relationships between features, which pose significant challenges to the performance and computational efficiency of algorithms. This paper introduces a Knowledge-Guided Competitive Co-Evolutionary Algorithm (KCCEA) for feature selection, especially for high-dimensional features. In the proposed algorithm, we make improvements to the foundational dominance-based multi-objective evolutionary algorithm in two aspects. First, the use of feature correlation as knowledge to guide evolution enhances the search speed and quality of traditional multi-objective evolutionary algorithm solutions. Second, a dynamically allocated competitive–cooperative evolutionary mechanism is proposed, integrating the improved knowledge-guided evolution with traditional evolutionary algorithms, further enhancing the search efficiency and diversity of solutions. Through rigorous empirical testing on various datasets, the KCCEA demonstrates superior performance compared to basic multi-objective evolutionary algorithms, providing effective solutions to multi-objective feature selection problems while enhancing the interpretability and effectiveness of prediction models. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
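The abstract above treats feature selection as a multi-objective problem solved with a dominance-based evolutionary algorithm. The sketch below illustrates only the dominance relation and the resulting non-dominated set, assuming two objectives to minimize (classification error and number of selected features). It is not the KCCEA algorithm, and the candidate masks and error values are hypothetical.

```python
def dominates(a, b):
    """True if objective vector a is no worse than b on every objective
    and strictly better on at least one (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(o["objectives"], c["objectives"])
                   for o in candidates if o is not c)
    ]


# Hypothetical candidates: objectives are (error rate, number of features).
candidates = [
    {"mask": [1, 1, 1, 1], "objectives": (0.10, 4)},
    {"mask": [1, 0, 1, 0], "objectives": (0.12, 2)},
    {"mask": [1, 1, 0, 0], "objectives": (0.15, 2)},  # dominated by the mask above
    {"mask": [0, 0, 1, 0], "objectives": (0.30, 1)},
]

front = pareto_front(candidates)
for c in front:
    print(c["mask"], c["objectives"])
```

The surviving candidates represent different trade-offs between accuracy and subset size; a multi-objective algorithm returns this set and leaves the final choice to the user.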
49. An Improved Ensemble-Based Cardiovascular Disease Detection System with Chi-Square Feature Selection.
- Author
-
Korial, Ayad E., Gorial, Ivan Isho, and Humaidi, Amjad J.
- Subjects
FEATURE selection, CARDIOVASCULAR diseases, DEEP learning, MACHINE learning, K-nearest neighbor classification, RANDOM forest algorithms
- Abstract
Cardiovascular disease (CVD) is a leading cause of death globally; therefore, early detection of CVD is crucial. Many intelligent technologies, including deep learning and machine learning (ML), are being integrated into healthcare systems for disease prediction. This paper uses a voting ensemble ML with chi-square feature selection to detect CVD early. Our approach involved applying multiple ML classifiers, including naïve Bayes, random forest, logistic regression (LR), and k-nearest neighbor. These classifiers were evaluated through metrics including accuracy, specificity, sensitivity, F1-score, confusion matrix, and area under the curve (AUC). We created an ensemble model by combining predictions from the different ML classifiers through a voting mechanism, whose performance was then measured against individual classifiers. Furthermore, we applied the chi-square feature selection method to the 303 records across 13 clinical features in the Cleveland cardiac disease dataset to identify the 5 most important features. This approach improved the overall accuracy of our ensemble model and reduced the computational load considerably, by more than 50%. Demonstrating superior effectiveness, our voting ensemble model achieved a remarkable accuracy of 92.11%, representing an average improvement of 2.95% over the single highest classifier (LR). These results indicate the ensemble method as a viable and practical approach to improve the accuracy of CVD prediction. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
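The abstract above combines chi-square feature ranking with a voting ensemble. The sketch below illustrates both steps on toy data, assuming categorical features, the Pearson chi-square statistic of the feature/label contingency table, and hard (majority) voting. The paper's exact scoring and voting details are not given in the abstract, and the feature names, labels, and classifier outputs here are hypothetical.

```python
from collections import Counter


def chi_square(feature, labels):
    """Pearson chi-square statistic of the feature/label contingency table."""
    n = len(labels)
    f_counts = Counter(feature)
    y_counts = Counter(labels)
    observed = Counter(zip(feature, labels))
    stat = 0.0
    for f, fc in f_counts.items():
        for y, yc in y_counts.items():
            expected = fc * yc / n
            stat += (observed[(f, y)] - expected) ** 2 / expected
    return stat


def top_k_features(columns, labels, k):
    """Names of the k features with the highest chi-square score."""
    scores = {name: chi_square(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


def majority_vote(predictions):
    """Hard voting: the most common label per sample across classifiers.
    Ties (possible with an even number of classifiers) go to the label seen first."""
    return [Counter(sample).most_common(1)[0][0] for sample in zip(*predictions)]


# Hypothetical toy data: 1 = disease, 0 = no disease.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "a": [1, 1, 1, 1, 0, 0, 0, 0],  # matches the label exactly
    "b": [1, 0, 1, 0, 1, 0, 1, 0],  # independent of the label
    "c": [1, 1, 1, 0, 0, 0, 0, 1],  # partly informative
}
print(top_k_features(columns, labels, k=2))  # ['a', 'c']

# Hypothetical outputs of three classifiers on four test samples.
votes = majority_vote([[1, 0, 1, 0], [1, 1, 0, 0], [0, 1, 1, 0]])
print(votes)  # [1, 1, 1, 0]
```

In the study described above, the same idea would correspond to keeping 5 of the 13 clinical features before training the individual classifiers that feed the vote.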
50. Copy Move Image Forgery Detection using Multi-Level Local Binary Pattern Algorithm.
- Author
-
Mahdi, Marwa Emad and M Ali, Nada Hussein
- Subjects
FORGERY, ALGORITHMS, STATISTICS, FEATURE selection, MACHINE learning, HOUGH transforms, SUPPORT vector machines
- Abstract
Copyright of Journal of Engineering (17264073) is the property of Republic of Iraq Ministry of Higher Education & Scientific Research (MOHESR) and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2024
- Full Text
- View/download PDF