Language: english / Publication Type: Academic Journals / Publication Year Range: This year / Publisher: elsevier b.v. / Search Limiters: Full Text and Peer Reviewed / Topic: feature selection and machine learning - Searchworks@Jio Institute Digital Library Search Results

Showing total 42 results


1. Optimizing feature selection in intrusion detection systems: Pareto dominance set approaches with mutual information and linear correlation.

Author: Barbosa, Guilherme Nunes Nasseh, Andreoni, Martin, and Mattos, Diogo Menezes Ferrazani
Subjects: FEATURE selection, INTRUSION detection systems (Computer security), MACHINE learning, SOCIAL dominance, PEARSON correlation (Statistics), FILTER paper
Abstract: In the realm of network intrusion detection using machine learning, feature selection aims for computational efficiency, enhanced performance, and model interpretability, preventing overfitting and optimizing data visualization. This paper proposes a filtering method for feature selection, which optimizes information quantity and linear correlation between resultant features. The method identifies Pareto dominant pairs of informative and correlated features, constructs a graph, and selects key features based on betweenness centrality in its connected components. The proposal yields a more concise and informative dataset representation. Experimental results, using three diverse datasets, demonstrate that the proposal achieves more than 95% accuracy in classifying network attacks with just 14% of the total number of features in the original datasets. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
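
Note on result 1: the Pareto-dominance idea named in the abstract can be illustrated with a minimal sketch. It assumes each feature already carries two scores where higher is better; the feature names and score values are hypothetical, and the paper's scoring, graph construction and betweenness-centrality steps are not reproduced. This is not the authors' implementation.

```python
def dominates(a, b):
    """True if score tuple a Pareto-dominates b: no worse everywhere, better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores):
    """Return the sorted names of items that no other item dominates."""
    return sorted(
        name
        for name, s in scores.items()
        if not any(dominates(t, s) for other, t in scores.items() if other != name)
    )


# Hypothetical (score_1, score_2) pairs per feature; both are maximised.
scores = {
    "f1": (0.90, 0.20),
    "f2": (0.60, 0.70),
    "f3": (0.50, 0.60),  # dominated by f2 on both scores
    "f4": (0.20, 0.95),
}
front = pareto_front(scores)
print(front)
```

Only `f3` is removed here, because `f2` is at least as good on both scores and strictly better on one.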

2. Software Fault Prediction Using FeatBoost Feature Selection Algorithm.

Author: Medicharla, Sirisha, Kumar, Shubham, Devarakonda, Praphul, Agrawalla, Bikash, and Reddy, B Ramachandra
Subjects: FEATURE selection, COMPUTER software testing, MACHINE learning, ALGORITHMS, SOFTWARE engineering, SOFTWARE reliability
Abstract: A critical aspect of software engineering is software fault prediction, which aims to identify and prevent errors in software systems before their release, since such errors can cause failures or issues for users. Various techniques and tools have been developed to detect software faults, including static code analysis, dynamic testing, and machine learning-based approaches. In the past few years, there has been growing interest in the use of ML models for predicting software faults, as they can effectively analyse high-dimensional datasets and detect complex patterns that are difficult for human experts to detect. However, developing accurate and reliable software fault detection models requires careful selection of data, feature engineering, and model evaluation. The purpose of this paper is to present a comprehensive analysis of potential applications and future research directions in the field of software fault detection. The study emphasizes the importance of identifying and addressing software faults to ensure the reliability and efficiency of software systems. Additionally, the paper outlines various approaches and techniques that can be employed for effective software fault detection. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

3. An Investigation into Ensemble Learning Techniques for Evaluating Soil Fertility through Analytical Approaches.

Author: Pant, Janmejay, Joshi, Mahesh Ch., Singh, Devendra, Pant, Hitesh Kumar, Bhatt, Ashutosh, and Pant, Durgesh
Subjects: SOIL fertility, MACHINE learning, SOIL testing, WEATHER, FEATURE selection
Abstract: At the heart of India's livelihood lies agriculture, a foundational and ever-evolving sector. The realm of agriculture faces modern challenges, encompassing unpredictable climate fluctuations, insufficient irrigation infrastructure, and erratic weather conditions. Amidst these challenges, machine learning has emerged as a valuable tool for evaluating soil fertility and Crop assessment in agriculture. Ensemble machine learning methodologies have garnered attention for their potential to enhance predictive capabilities. These techniques involve constructing meta-classifiers that collectively contribute to improved predictive accuracy. This paper centers on the analysis of soil data acquired from testing laboratories, to predict fertility based on a comprehensive dataset. The study employs prominent ensemble machine learning algorithms, specifically focusing on boosting techniques, to elevate predictive accuracy and ensure heightened consistency. The assessment of soil fertility categories involved the examination of 12 carefully chosen attributes. Diverse soil parameters were measured to facilitate the prediction of soil fertility levels. The outcomes of the experimentation revealed that the application of the boosting technique using the Xgboost algorithm yielded superior accuracy compared to alternative ensemble classifiers, achieving a remarkable 96% accuracy rate. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

4. Machine Learning-Based Intrusion Detection on Multi-Class Imbalanced Dataset Using SMOTE.

Author: Widodo, Akdeas Oktanae, Setiawan, Bambang, and Indraswari, Rarasmaya
Subjects: INTRUSION detection systems (Computer security), MACHINE learning, INFORMATION technology, FEATURE selection, CLASSIFICATION algorithms, COMPUTER network security
Abstract: The rapid development of information technology has brought numerous benefits to society, but it has also led to increased security vulnerabilities in network systems. Intrusion detection systems (IDS) play a crucial role in identifying malicious activities, but they face challenges due to imbalanced datasets where the number of attack samples outweighs normal activities. This paper explores the performance of an IDS using SMOTE (Synthetic Minority Over-sampling Technique) and various classification algorithms to address imbalanced datasets and enhance detection of multi-class intrusions. Related works in the field of intrusion detection are reviewed, highlighting the effectiveness of different algorithms and techniques. The proposed work presents a model that combines SMOTE with log normalization and feature selection to improve IDS performance. Experiments are conducted on the NSL-KDD and CIC-IDS2017 datasets, evaluating different oversampling configurations and machine learning models. The results show that applying SMOTE improves overall performance, with high accuracy, precision, recall, and F1-score. Feature selection has minimal impact on model performance, suggesting the presence of redundant features. The study concludes that SMOTE effectively addresses class imbalance and enhances IDS performance, emphasizing the importance of incorporating oversampling techniques in intrusion detection systems. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
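
Note on result 4: SMOTE creates synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours. The following is a minimal standard-library sketch of that idea, not the configuration used in the paper; the sample points, `k`, and the seed are illustrative assumptions, and at least two minority samples are required.

```python
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each on the segment between a random
    minority sample and one of its k nearest minority neighbours (Euclidean)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared distance to the base point.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Hypothetical two-dimensional minority-class samples.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_new=5)
print(new_points)
```

Because every synthetic point is a convex combination of two real minority samples, it always lies inside the region spanned by the minority class.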

5. Human activity recognition with smartphone-integrated sensors: A survey.

Author: Dentamaro, Vincenzo, Gattulli, Vincenzo, Impedovo, Donato, and Manca, Fabio
Subjects: *HUMAN activity recognition, *MACHINE learning, *FEATURE selection, *FEATURE extraction, *DETECTORS
Abstract: • Newbie study using standard ML techniques with HAR Application and Discussions. • Activities found in Literature with the corresponding reference. • Co-occurrences between activities and sensors with the corresponding reference. • Summary and comparison among the different datasets found in Literature. • Summary of the experimentation settings with performance scores found in Literature. Human Activity Recognition (HAR) is an essential area of research related to the ability of smartphones to retrieve information through embedded sensors and recognize the activity that humans are performing. Researchers have recognized people's activities by processing the data received from the sensors with Machine Learning Models. This work is intended to be a hands-on survey with practical tables capable of guiding the reader through the sensors used in modern smartphones and the highly cited machine learning models developed to perform human activity recognition. Several papers in the literature have been studied, paying attention to the preprocessing, feature extraction, feature selection, and classification techniques of the HAR system. In addition, several summary tables illustrating HAR approaches have been provided: the most popular human activities in the literature with paper references, the most popular datasets available for download (analyzing their characteristics, such as the number of subjects involved, the activities recorded, and the sensors with online availability), co-occurrences between activities and sensors, and a summary table showing the performance obtained by researchers. The paper's goal is to recommend, through the discussion phase and thanks to the tables, the current state of the art on this topic. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

6. Convolutional neural networks for quality and species sorting of roundwood with image and numerical data.

Author: Achatz, Julia, Lukovic, Mirko, Hilt, Simon, Lädrach, Thomas, and Schubert, Mark
Subjects: *CONVOLUTIONAL neural networks, *CROSS-sectional imaging, *RECOMMENDER systems, *IMAGE recognition (Computer vision), *FEATURE selection, *HUMAN error, *MACHINE learning
Abstract: Roundwood sorting is still a manual process in many Swiss sawmills, requiring employees to visually inspect and categorize thousands of logs per day. The heavy workload can be both physically and mentally taxing and can lead to increased rates of human error. State-of-the-art automation systems like X-ray log scanners are expensive and difficult to integrate into existing process lines. This paper proposes a novel recommendation system that leverages recent advances in image classification to automate roundwood classification by quality and species. The system integrates a camera to capture cross-sectional images of logs and record numerical data, such as length, taper, and diameter. The analysis of the resulting dataset highlights the challenges of data imbalance and noise, which makes classification difficult and, in some cases, impossible. However, by using selected datasets with reduced noise, state-of-the-art Convolutional Neural Networks (CNNs) can extract quality and species features. Quality models learn from a manually selected and simplified dataset, featuring samples that experts can clearly classify based on the image's information. Species models are trained on a label-noise-reduced dataset, reflecting real-world complexity. The accuracy on the selected dataset for three quality classes is 80%. The species determination is less challenging and reaches 91% accuracy on a synchronized dataset for the main species spruce and fir. Overall, this paper highlights the potential of Machine Learning in augmenting the roundwood sorting processes and presents a novel system that can improve the efficiency and accuracy of the process. [Display omitted] • Automation of roundwood sorting: Replaces manual sorting with image-based AI. • Integrated camera system in roundwood sorting to collect labeled dataset. • Species prediction: 91% accuracy in spruce–fir distinction. 
• Quality prediction on complexity reduced dataset: 80% accuracy between three main quality levels. • Efficient, adaptable & scalable system which is easy to integrate into existing process lines. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

7. An integrated intrusion detection framework based on subspace clustering and ensemble learning.

Author: Zhu, Jingyi and Liu, Xiufeng
Subjects: *INTRUSION detection systems (Computer security), *COMPUTER network traffic, *PARTICLE swarm optimization, *COMPUTER network security, *FEATURE selection, *MACHINE learning
Abstract: In the rapidly evolving landscape of the Internet of Things (IoT), ensuring robust intrusion detection is paramount for device and data security. This paper proposes a novel method for intrusion detection in IoT networks that leverages a unique blend of subspace clustering and ensemble learning. Our framework integrates three innovative strategies: Clustering Results as Features (CRF), Two-Level Decision Making (TDM), and Iterative Feedback Loop (IFL). These strategies synergize to enhance detection performance and model robustness. We employ mutual information for feature selection and utilize four subspace clustering algorithms – CLIQUE, PROCLUS, SUBCLU, and LOF – to create additional feature sets. Three base learners – NB, LGBM, and XGB – are used in conjunction with a Logistic Regression (LR) meta-learner. To fine-tune our model, we apply Particle Swarm Optimization (PSO) for hyperparameter optimization. We evaluate our framework on the UNSW-NB15 dataset, which contains realistic and diverse IoT network traffic data. The results show that our framework outperforms the state-of-the-art methods in terms of accuracy (97.05%), precision (96.33%), recall (96.55%), F1-score (96.45%), and false positive rate (0.029). Our framework can effectively detect both known and unknown attacks in IoT networks and achieve high accuracy and low false positive rate. The paper contributes both practical implications for network security and theoretical advancements in intrusion detection research. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

8. An improved binary dandelion algorithm using sine cosine operator and restart strategy for feature selection.

Author: Dong, Junwei, Li, Xiaobo, Zhao, Yuxin, Ji, Jingchao, Li, Shaolang, and Chen, Hui
Subjects: *METAHEURISTIC algorithms, *FEATURE selection, *ALGORITHMS, *DANDELIONS, *MACHINE learning, *DATABASES, *CLASSIFICATION algorithms, *DATA mining
Abstract: • A feature selection method based on binary dandelion algorithm is proposed. • The algorithm applies a sine-cosine operator and a restart strategy. • Mutual information and quick bit mutation improve the performance of the algorithm. • The algorithm performs well on datasets with larger dimensions. Feature selection (FS) is an important data preprocessing technology for machine learning and data mining. Metaheuristic algorithm (MH) has been widely used in feature selection because of its powerful search function. This paper presents an improved Binary Dandelion Algorithm using Sine Cosine operator and Restart strategy (SCRBDA) for feature selection. First, the sine cosine operator is used in the radius formula of the core dandelions (CD), which significantly enhances the ability of algorithm development and exploration. Secondly, the algorithm uses a restart strategy to increase its ability to get rid of local optimum. Thirdly, mutual information is used to guide the generation of some dandelions, which pays more attention to the correlation between the selected features and categories. Finally, quick bit mutation is used as the mutation strategy to improve the diversity of the population. The SCRBDA proposed in this paper was tested on 18 datasets of different sizes from UCI machine learning database. The SCRBDA was compared with 8 other classical feature selection algorithms, and the performance of the proposed algorithm was evaluated through feature subset size, classification accuracy, fitness value, and F1-score. The experimental results show that SCRBDA achieves the best performance, which has stronger feature reduction ability and achieves better overall performance on most datasets. Especially on large-scale datasets, SCRBDA can obtain extremely smaller feature subsets while maintaining much higher classification accuracy, and satisfactory F1-score. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

9. Bike sharing and cable car demand forecasting using machine learning and deep learning multivariate time series approaches.

Author: Peláez-Rodríguez, César, Pérez-Aracil, Jorge, Fister, Dušan, Torres-López, Ricardo, and Salcedo-Sanz, Sancho
Subjects: *DEMAND forecasting, *MACHINE learning, *TIME series analysis, *CAR sharing, *CYCLING, *DEEP learning, *FEATURE selection
Abstract: In this paper the performance of different Machine Learning and Deep Learning approaches is evaluated in problems related to green mobility in big cities. Specifically, the forecasting of bike sharing demand in Madrid and Barcelona (Spain) is approached, for different prediction time-horizons, and also a problem of cable car demand forecasting in Madrid city. A large number of predictive variables are considered, which are grouped into four different sets (categorical/calendrical, persistence-based, meteorological and, as a novelty of the paper, information about analogue past instances), whose relevance is studied for all cases. A feature selection mechanism is also incorporated in order to improve the prediction accuracy of the proposed algorithms. A total of 12 different multivariate regression techniques are implemented, ranging from Machine Learning methods to time-series Deep Learning approaches. Excellent results in all the prediction problems approached are reported. Finally, the consequences of obtaining accurate predictions in these three problems of green mobility in big cities are discussed. In addition, it is studied how the results could be exported to other similar cases in more general urban mobility studies. Novelties of the work include: (1) Addressing the forecast problem of passenger flow on a cable car using ML and DL multivariate techniques; (2) using the demand of analogous past instances as an additional feature to solve the demand prediction problems; and (3) the extraction of global conclusions about feature relevance when addressing a demand forecasting problem in green mobility. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

10. Prediction and classification of sEMG-based pinch force between different fingers.

Author: Wu, Yansheng, Liang, Shili, Ma, Yongkai, and Li, Bowen
Subjects: *CONVOLUTIONAL neural networks, *MACHINE learning, *SHORT-term memory, *LONG-term memory, *FEATURE selection, *FEATURE extraction, *PROSTHETICS
Abstract: • Pinch force strength is accurately predicted using sEMG. • Different pinch movements are accurately classified using sEMG. • Similarity of EMG features is evaluated. • An EMG feature-to-image method is proposed. The movement of intelligent EMG-driven prostheses mainly relies on the synergy of different fingers to achieve the function of grasping objects. The paper proposes a novel scheme for force prediction and movement classification of pinches between different fingers based on surface electromyography (sEMG) using machine learning. The pinch force and sEMG signals are recorded synchronously by a data acquisition device. Eight features are extracted, which are proven to have better performance in sEMG-to-force estimation. We present a novel feature selection method that uses a one-dimensional time series similarity assessment based on Manhattan distance to eliminate the repetitive information between features. Three optimal features carrying less repetitive information are retained. Seven machine learning algorithms are used to predict force strength. The results show that the Long Short Term Memory (LSTM) has the best force prediction performance, achieving an R² of 0.9517 and an RMSE of 3.2723. The paper proposes a novel method of converting EMG feature sequences to a normalized gray image in order to classify the finger movement. Five classifiers based on image feature extraction and the Convolutional Neural Network (CNN) are developed respectively. The experimental results indicate that the CNN performs best, achieving an accuracy of 97.66%. In this way, the scheme realizes not only accurate force prediction but also movement classification between different fingers. The proposed methodology has the potential to realize simultaneous force and movement control of a prosthetic hand. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
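
Note on result 10: the abstract describes removing repetitive information between feature series using a Manhattan-distance similarity assessment. One possible reading is sketched below under stated assumptions: each feature series is min-max normalised, and a feature is kept only if it is far enough from every feature already kept. The greedy order, the `threshold` value, and the feature names (`rms`, `mav`, `zc`) are hypothetical and are not taken from the paper.

```python
def min_max(series):
    """Scale a series to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0 for _ in series]
    return [(v - lo) / (hi - lo) for v in series]


def manhattan(a, b):
    """Sum of absolute element-wise differences between two equal-length series."""
    return sum(abs(x - y) for x, y in zip(a, b))


def drop_redundant(features, threshold):
    """Greedy filter: keep a feature only if its normalised series is at least
    `threshold` away (Manhattan distance) from every feature kept so far."""
    kept = {}
    for name, series in features.items():
        norm = min_max(series)
        if all(manhattan(norm, other) >= threshold for other in kept.values()):
            kept[name] = norm
    return list(kept)


# Hypothetical feature time series extracted from the same signal windows.
features = {
    "rms": [1.0, 2.0, 3.0, 4.0],
    "mav": [2.0, 4.0, 6.0, 8.0],  # same shape as rms once normalised
    "zc": [4.0, 1.0, 3.0, 2.0],
}
selected = drop_redundant(features, threshold=0.5)
print(selected)
```

In this toy input `mav` is a scaled copy of `rms`, so its normalised distance to `rms` is zero and it is dropped, while `zc` follows a different pattern and is kept.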

11. Feature selection for chillers fault diagnosis from the perspectives of machine learning and field application.

Author: Wang, Zhanwei, Guo, Jingjing, Xia, Penghua, Wang, Lin, Zhang, Chunxiao, Leng, Qiang, and Zheng, Kaixin
Abstract: • Features are selected from the perspectives of machine learning and field application. • Stepwise FS path combining optimization with machine learning algorithms is proposed. • Best performance using existing features and corresponding feature set are revealed. • Recommendations for feature supplementation to further improve performance are given. • Feature sets are verified to be general and effective by experiments and comparisons. Fault diagnosis (FD) is vital for enhancing chiller efficiency and reliability. Feature selection (FS) is the prerequisite and key to diagnose faults. This paper addresses two intriguing questions from machine learning (ML) and field perspectives. Question-1: Based on commonly installed sensors, what is the best performance that the FD models based on ML algorithms can achieve, and what features are relevant? Question-2: Which features can enhance diagnostic performance? and to what extent? This paper designs a stepwise FS process. First, a field investigation is conducted to gather information on sensors installed in actual chillers. Based on actual field installation, feature calculation cost, and thermodynamic mechanism, three levels of initial feature libraries are created, each containing an increasing number and type of features. An FS method combining an optimization algorithm with an FD model based on ML algorithm is proposed. In the end, the insight into the best diagnostic performance achieved by ML-based models using existing sensors and the corresponding optimal feature subsets is provided, and recommendations for feature supplementation to further improve diagnostic performance are also provided. Compared with other literature-reported feature subsets, the recommended feature subsets show better generality and effectiveness on seven commonly used ML-based models. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

12. Motor imagery EEG task recognition using a nonlinear Granger causality feature extraction and an improved Salp swarm feature selection.

Author: Lin, Ruijing, Dong, Chaoyi, Zhou, Peng, Ma, Pengfei, Ma, Shuang, Chen, Xiaoyan, and Liu, Huanzi
Subjects: FEATURE selection, MACHINE learning, FEATURE extraction, SIMULATED annealing, MOTOR imagery (Cognition), ELECTROENCEPHALOGRAPHY, OPTIMIZATION algorithms, CLASSIFICATION algorithms
Abstract: • Brain network features were extracted by a Granger causal analysis. • Brain functional connectivity contributes to improving MI task classification. • An effective swarm optimization algorithm is used for feature selection. In the study of motor imagery (MI) brain-computer interfaces (BCIs), improving task classification accuracy has always been one of the major challenges in the applications of MI-BCIs. As a type of crucial temporal and spatial feature, nonlinear Granger Causality (NGC) analysis was applied to feature extraction of MI-electroencephalogram (EEG) signals because the constructed brain network features can reflect the causal relationship between different channels in various brain regions. However, MI-BCI task recognition often suffers from the information redundancy of NGC features, and these redundant features increase the complexity of the machine learning models and accordingly reduce the prediction accuracy of the classification algorithms. To address this problem, this paper proposes a step-by-step tent chaos simulated annealing salp swarm feature selection (STCSA_SaSFS) algorithm to select an optimal set of features in a wrapper feature selection model. Then, the effectiveness of this feature selection method is verified using a support vector machine (SVM) classifier. Through the study of task-related MI-BCI EEG data from ten subjects, the experiments showed that the highest classification accuracy of NGC feature extraction plus STCSA_SaSFS reached 97.19%, and the average classification accuracy was 89.57%. This average classification accuracy was 20.07% higher than that of NGC feature extraction without any feature selection, and it was also 2.96% higher than that of NGC feature extraction plus a traditional SaSFS algorithm. The effectiveness of STCSA_SaSFS was also compared with that of other smart swarm optimization algorithms, such as the sparrow search feature selection algorithm (SpSFS). STCSA_SaSFS outperforms SpSFS by 8.07% in average classification accuracy. The algorithm was validated using a public dataset consisting of 10 subjects, which ultimately showed that the feature selection method proposed in this paper (STCSA_SaSFS) has a large advantage in the classification performance of motor imagery brain-computer interface tasks. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

13. Intrusion detection system for cyberattacks in the Internet of Vehicles environment.

Author: Korium, Mohamed Selim, Saber, Mohamed, Beattie, Alexander, Narayanan, Arun, Sahoo, Subham, and Nardelli, Pedro H.J.
Subjects: INTRUSION detection systems (Computer security), COMPUTER network traffic, CYBERTERRORISM, FEATURE selection, INTERNET, MACHINE learning
Abstract: This paper presents a novel framework for intrusion detection specially designed for cyberattacks, such as Denial-of-Service, Distributed Denial-of-Service, Distributed Reflection Denial-of-Service, Brute Force, Botnets, and Sniffing, on vehicles that are situated in the Internet of Vehicles environment. We propose an intrusion detection system based on machine learning that is capable of detecting abnormal behavior by examining network traffic to find unusual data flows. In this paper, we have presented a strategy for intrusion detection through a careful evaluation and selection of the most effective techniques for the following steps of the machine learning process: (i) data preprocessing by using Z-score normalization that preserves the data distribution for the proposed method and handles outliers; (ii) feature selection by using a regression model that simplifies the model complexity and reduces the execution time; and (iii) model selection and training – Random Forest, Extreme Gradient Boosting, Categorical Boosting, Light Gradient Boosting Machine – with hyperparameter optimization to control the behavior in the training phase and to prevent overfitting. The effectiveness of the proposed solution is demonstrated by extensive numerical experiments carried out using the well-known standard datasets CIC-IDS-2017, CSE-CIC-IDS-2018, and CIC-DDoS-2019, both separately and merged. We achieved a high accuracy above 99.8% within a running time of 46.9 s and 0.24 s detection time for the three combined intrusion detection system datasets, thereby showing that the proposed intrusion detection system outperforms the previous methods introduced in the literature. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
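
Note on result 13: the preprocessing step named in the abstract, Z-score normalization, rescales each feature column to zero mean and unit standard deviation. A minimal standard-library sketch follows; the column name and values are hypothetical, and the use of the population standard deviation is an assumption rather than a detail taken from the paper.

```python
from statistics import mean, pstdev


def z_score(column):
    """Return (v - mean) / std for every value; a constant column maps to zeros."""
    mu = mean(column)
    sigma = pstdev(column)  # population standard deviation (assumed here)
    if sigma == 0:
        return [0.0 for _ in column]
    return [(v - mu) / sigma for v in column]


# Hypothetical numeric feature column from a network-traffic dataset.
packet_sizes = [40.0, 60.0, 80.0, 100.0, 120.0]
scaled = z_score(packet_sizes)
print(scaled)
```

In practice the mean and standard deviation would be computed on the training split only and then reused for the test split, so that no test information leaks into preprocessing.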

14. A two-stage feature selection method for hob state recognition.

Author: Jia, Yachao and Li, Guolong
Subjects: *FEATURE selection, *RECOGNITION (Psychology), *VALUE engineering
Abstract: In the mechanical field, tool state seriously affects production quality and efficiency. Recently, machine learning has become an important technique for tool state recognition. Extracting and selecting features from the raw input data of the machine learning model can effectively eliminate useless information and improve the model accuracy. Therefore, a two-stage feature selection method is proposed in this paper. We first qualitatively propose the non-discreteness and separability of signal features. For these two properties, quantitative calculation methods are then developed in two stages. The proposed two-stage feature selection method is applied to machine learning models for tool state recognition. In engineering application verification, the proposed method achieves an average recognition accuracy of 84.4% in hob wear recognition, and average recognition accuracies of 99.8% and 94.4% for two data sets in hob fault recognition, which are significantly higher than those of other feature selection methods. The verification results indicate that the proposed method has good generalization and engineering application value. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

15. Estimation of vehicle control delay using artificial intelligence techniques for heterogeneous traffic conditions.

Author: Ranpura, Pranjal, Shukla, Vipin, and Gujar, Rajesh
Subjects: *MACHINE learning, *ARTIFICIAL neural networks, *ARTIFICIAL intelligence, *TIME delay estimation, *STANDARD deviations, *FEATURE selection
Abstract: The conventional standardized theoretical models (such as Webster, Alcelik, Indo-HCM) used for delay estimation revolve around mathematical hypotheses and assumptions, making them static with limitations in accommodating dynamic traffic behaviors. Recent advances in artificial intelligence make machine learning techniques suitable for estimating vehicle control delay compared to conventional methods. This paper demonstrates the application of several machine learning models developed by focusing on the fluctuations of traffic observed at an intersection having heterogeneous traffic conditions in Ahmedabad city for delay estimation. Several parameters are extracted from on-field and video surveys. However, not all parameters are relevant for accurately estimating vehicle control delay. Hence, a feature selection process consisting of several feature-scoring techniques from filter, wrapper, and embedded methods is applied to all the parameters. This process gave insights into the most statistically relevant independent parameters, and out of all the parameters, cycle time was found to be insignificant, with a feature score of 0 from almost all techniques. Hence, it was removed, and the prominent parameters were then used to build a vehicle control delay model using support vector regression (SVR), K-nearest neighbor (KNN), artificial neural network (ANN), random forest regression (RF), and decision tree regression (DT). Error distribution, standard deviation of errors, coefficient of determination (R²), and Root Mean Squared Error (RMSE) are the parameters used for evaluating the performance of the machine learning models. RF outperformed them all with a standard deviation of errors, R², and RMSE of 11.065, 0.926, and 11.081 on testing data, while ANN, KNN, and DT also performed satisfactorily. Compared with conventional standardized theoretical models, all the machine learning models except SVR performed better. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
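
Note on result 15: two of the evaluation measures named in the abstract, the coefficient of determination (R²) and the Root Mean Squared Error (RMSE), can be computed as in the sketch below. The delay values are hypothetical and are not taken from the paper.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root of the mean squared difference between observed and predicted values."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """1 - (residual sum of squares / total sum of squares around the mean)."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted delays, in seconds.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
print(rmse(observed, predicted), r2(observed, predicted))
```

RMSE is expressed in the units of the target, whereas R² is unitless, so the two measures are usually reported together.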

16. A Robust DDoS Intrusion Detection System Using Convolutional Neural Network.

Author: Najar, Ashfaq Ahmad and S., Manohar Naik
Subjects: *CONVOLUTIONAL neural networks, *INTRUSION detection systems (Computer security), *DENIAL of service attacks, *DEEP learning, *FEATURE selection, *DIGITAL technology, *MACHINE learning
Abstract: In today's digital age, the proliferation of network-connected devices has triggered a surge in cyberattacks. Distributed Denial-of-Service (DDoS) attacks pose a particularly formidable challenge to network security by disrupting access to vital services. While numerous researchers have proposed DDoS detection methods utilizing machine learning and deep learning techniques, developing a robust and reliable DDoS intrusion detection system remains challenging. This challenge is exacerbated by issues such as highly imbalanced data, multi-classification, and computational complexity. This paper proposes an innovative feature selection approach to create a robust intrusion detection system capable of detecting and classifying recent common DDoS attack types. We evaluate the performance of our model on the CICDDoS2019 benchmark dataset. Our experimental results demonstrate that our proposed model outperforms existing methods, achieving a detection accuracy of 96.82%, a recall of 96.82%, a precision of 96.76%, and an F1 score of 96.50%. Additionally, our model exhibits faster prediction times, with the ability to predict an attack in just 0.189 ms. Notably, our approach, combined with preprocessing and feature selection techniques, outperforms previous works and baseline models in DDoS attack classification. [Display omitted] • An innovative hybrid feature extraction method for training of DDoS attack detection model. • We propose a robust CNN model integrated with an Inception mechanism designed to classify various types of DDoS attacks. • Evaluated the network performance of the proposed system in terms of various parameters and compared the results with existing studies and baseline models. • The proposed model outperforms more than 96% in all performance metrics (Precision, Recall, Accuracy, and F1 Score). • The proposed model achieves the fastest prediction time of 0.189 ms among existing models in the literature. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

17. Enhancing speech emotion recognition with the Improved Weighted Average Support Vector method.

Author: Zhang, Xiwen and Xiao, Hui
Subjects: EMOTION recognition, MACHINE learning, SUPPORT vector machines, FEATURE selection, SPEECH
Abstract: • Novel SER Method: Introduces Improved Weighted Average Support Vector (IWASV). • Enhanced Accuracy: Feature selection uses Reptile Search Optimization for improved results. • Dataset Evaluation: Tested on TESS, IITKGP-SEHSC, and RAVDESS datasets. • Superior Performance: Achieved high scores in specificity, recall, F1-score, accuracy, and precision. • Effective Emotion Recognition: Demonstrates superior efficiency in recognizing emotions in speech signals. Emotions play a vital role in human communication: they help individuals express their thoughts and better understand the emotions of others. Speech Emotion Recognition (SER) is a branch of machine learning that aims to develop automated systems capable of detecting and classifying emotions expressed through speech signals. Although various approaches for SER have been established, the success rates vary depending on the language, emotions, and databases. This paper introduces a novel method called Improved Weighted Average Support Vector (IWASV) to enhance speech emotion recognition. The proposed model utilizes an Improved Weighted Average Support Vector that combines a Weighted Average Ensemble with an Improved Support Vector Machine. To enhance accuracy, feature selection employs a local search-based Reptile Search Optimization technique. Experiments were conducted on the TESS, IITKGP-SEHSC, and RAVDESS datasets, and performance evaluation measures such as AUC-ROC, specificity, recall, F1-score, accuracy, and precision were used to assess the proposed and existing methods. The results demonstrated the effectiveness of the IWASV method, achieving a specificity of 97.84%, recall of 97.88%, F1-score of 97.86%, accuracy of 98.56%, precision of 97.93%, and AUC-ROC values of 98.12%, 97.89%, and 97.92% for the TESS, IITKGP-SEHSC, and RAVDESS datasets, respectively. These findings highlight the superior efficiency of the proposed method in recognizing and understanding emotions conveyed through speech signals. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

18. Addressing label noise in leukemia image classification using small loss approach and pLOF with weighted-average ensemble.

Author: Aziz, Md. Tarek, Mahmud, S.M. Hasan, Goh, Kah Ong Michael, and Nandi, Dip
Subjects: IMAGE recognition (Computer vision), COMPUTER-aided diagnosis, COOPERATIVE game theory, FEATURE selection, DATA augmentation, MACHINE learning
Abstract: Machine learning (ML) and deep learning (DL) models have been extensively explored for the early diagnosis of various cancer diseases, including Leukemia, with many of them achieving significant performance improvements comparable to those of human experts. However, challenges like limited image data, inaccurate annotations, and prediction reliability still hinder their broad implementation to establish a trustworthy computer-aided diagnosis (CAD) system. This paper introduces a novel weighted-average ensemble model for classifying Acute Lymphoblastic Leukemia, along with a reliable CAD system that combines the strengths of both ML and DL approaches. First, a variety of filtering methods are extensively analyzed to determine the most suitable image representation, with subsequent data augmentation techniques to expand the training data. Second, a modified, fine-tuned VGG-19 model is proposed and used as a feature extractor to extract meaningful features from the training samples. Third, a small-loss approach and probabilistic local outlier factor (pLOF) are developed on the extracted features to address the label noise issue. Fourth, we propose a weighted-average ensemble model based on the top five models as base learners, with weights calculated based on their model uncertainty to ensure reliable predictions. Fifth, we calculate Shapley values based on cooperative game theory and perform feature selection with different feature combinations to determine the optimal number of features using SHAP. Finally, we integrate these strategies to develop an interpretable CAD system. This system not only predicts the disease but also generates Grad-CAM images to visualize potential affected areas, enhancing both clarity and diagnostic insight. All of our code is provided in the following repository: https://github.com/taareek/leukemia-classification [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

19. Email spam detection by deep learning models using novel feature selection technique and BERT.

Author: Nasreen, Ghazala, Murad Khan, Muhammad, Younus, Muhammad, Zafar, Bushra, and Kashif Hanif, Muhammad
Subjects: FEATURE selection, MACHINE learning, SPAM email, DEEP learning, TECHNOLOGICAL innovations, ELECTRONIC data processing
Abstract: Due to the influx of advancements in technology and the increased simplicity of communication through emails, there has been a severe threat to the global economy and security due to an upsurge in the volume of unsolicited emails. During the training of models, high-dimensional and redundant datasets may degrade the model's classification results because of high memory and computation costs. Feature selection is an important data processing technique that helps in selecting relevant features and subsets of information from the dataset. Therefore, choosing efficient feature selection techniques is very important for the best classification performance of a model. Moreover, most of the research has been performed using traditional machine learning techniques, which are not enough to deal with the huge amount of data and its variations. Also, spammers are becoming smarter with technological advancement. Therefore, there is a need for hybrid techniques consisting of deep learning and conventional algorithms to cope with these problems. We propose a novel scheme in this paper for email spam detection, which results in an improved feature selection approach from the original dataset and increases the accuracy of the classifier as well. The literature has been studied to explore the efficient machine learning models that have been applied by different researchers for email spam detection and feature selection to acquire the best results. Our method, GWO-BERT, has given remarkable results with deep learning techniques such as CNN, biLSTM and LSTM. We compared our models with RF and LSTM on "Lingspam," a publicly available dataset. With different experiments, our technique, GWO-BERT, obtained 99.14% accuracy, which is almost equal to 100 percent. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

20. Ensemble classifiers using multi-objective Genetic Programming for unbalanced data.

Author: Meng, Wenyang, Li, Ying, Gao, Xiaoying, and Ma, Jianbin
Subjects: FEATURE selection, GENETIC programming, MACHINE learning
Abstract: Genetic Programming (GP) can be used to design effective classifiers due to its built-in feature selection and feature construction characteristics. Unbalanced data distributions affect the classification performance of GP classifiers. Some fitness functions have been proposed to solve the class imbalance problem of GP classifiers. However, with the evolution of GP, single-objective GP classifiers evaluated by a single fitness function have poor generalization ability. Moreover, using the best evolved GP classifier for decision-making can easily lead to misclassification. In this paper, multi-objective GP is used to optimize multiple fitness functions including AUC approximation (Wmw), Distance (Dist), and Complexity to evolve ensemble classifiers, which jointly determine the class labels of unknown instances. Experiments on sixteen datasets show that our multi-objective GP can significantly improve classification performance compared with single-objective GP, and our proposed ensemble classifiers evolved by multi-objective GP can further improve the classification performance over the single best GP classifier. Comparisons with six GP-based and five traditional machine learning algorithms show that our proposed approaches can achieve significantly better classification performance in most cases. • Proposed multi-objective GP can significantly improve the classification performance compared with single-objective GP. • Proposed ensemble classifiers can further improve the classification performance. • Proposed approaches can achieve better performance than six GP-based and five traditional machine learning algorithms. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
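
The AUC approximation named in the abstract above (Wmw) refers to the Wilcoxon-Mann-Whitney statistic. A minimal Python sketch of the standard statistic follows; it is not necessarily the authors' exact fitness function, and the function name `wmw_auc` and the 0/1 label encoding are assumptions made for illustration.

```python
def wmw_auc(scores, labels):
    """Wilcoxon-Mann-Whitney estimate of AUC.

    Returns the fraction of (positive, negative) pairs in which the
    positive example receives the higher score; ties count one half.
    Labels are assumed to be 1 (positive) or 0 (negative).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    # Normalising by the number of pairs keeps the value in [0, 1]
    # regardless of how unbalanced the two classes are.
    return wins / (len(pos) * len(neg))
```

Because the statistic is computed over positive-negative pairs rather than over individual predictions, it does not reward a classifier for simply predicting the majority class, which is one reason it is a common objective on unbalanced data.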

21. Effect of feature optimization on performance of machine learning models for predicting traffic incident duration.

Author: Obaid, Lubna, Hamad, Khaled, Khalil, Mohamad Ali, and Nassif, Ali Bou
Subjects: *ARTIFICIAL neural networks, *MACHINE performance, *DATA distribution, *K-nearest neighbor classification, *PRINCIPAL components analysis, *MACHINE learning
Abstract: Developing a high-performing traffic incident-duration prediction model is considered a key component for evaluating the impact of these incidents on the roadway network. Various research studies have developed robust incident-duration prediction models. Still, they have faced many issues in providing an accurate prediction result due to the countless data modeling issues, such as complex correlations, highly skewed data distributions, heteroscedasticity, and outliers. This paper investigates the impact of feature optimization (FO) - a relatively new term encompassing two already-known topics: feature engineering (FE) and feature selection (FS) techniques - on the performance of several machine learning models developed for predicting incident durations. The models developed included multivariate linear regression, decision trees, support vector regressors, K-Nearest Neighbors regression, ensembles, and artificial neural networks. Various FO techniques have been used for each model to derive the massive traffic incidents dataset and repeat the prediction process. Our results show that the proposed filtering, wrapper, and embedded FS techniques can successfully reduce the number of features without sacrificing the prediction performance. Using log-normal transformation to deal with continuous data skewness, min-max normalization to deal with data variability, and principal component analysis (PCA) to reform the dataset into a smaller independent feature subset, FE techniques can enhance the accuracy of incident duration estimation over the assessed ML models. The best-performing FE technique was the PCA since performance improvements were observed across all developed ML models. The best-performing FS technique was the Recursive Feature Elimination, outperforming other tested techniques in reducing model complexity while maintaining model accuracy. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
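
Two of the feature engineering steps named in the abstract above, a log transformation for skewed continuous data and min-max normalization, can be sketched in plain Python. This is an illustrative sketch only: the use of `log1p` (so that zero values are handled) and the function names are assumptions, not details taken from the paper.

```python
import math


def log_transform(values):
    """Compress a right-skewed, non-negative variable with log(1 + v)."""
    return [math.log1p(v) for v in values]


def min_max(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical incident durations in minutes, with one long outlier.
durations = [5, 10, 20, 300]
scaled = min_max(log_transform(durations))
```

Applying the log transformation before scaling keeps the single long incident from pushing the other values close to zero, which is what would happen if min-max normalization were applied to the raw durations.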

22. Probabilistic feature selection for improved asset lifetime estimation in renewables. Application to transformers in photovoltaic power plants.

Author: Ramirez, Ibai, Aizpurua, Jose I., Lasa, Iker, and del Rio, Luis
Subjects: *FEATURE selection, *POWER transformers, *RENEWABLE energy sources, *TRANSFORMER models, *PHOTOVOLTAIC power systems, *ELECTRIC power distribution grids, *POWER plants
Abstract: The increased penetration of renewable energy sources (RESs) as an effective mechanism to reduce carbon emissions leads to an increased weather dependency for power and energy systems. This has created dynamic operation and degradation phenomena, which affect the lifetime estimation of the assets operated with RESs. For the reliable and efficient operation of RES it is crucial to monitor the health of its constituent components and feature selection is a crucial step for building robust and accurate health monitoring approaches. In this context, this paper presents a probabilistic feature selection approach, which probabilistically weights and selects features through a heuristic and iterative process for an improved asset lifetime estimation. Power transformers are key power grid assets and they are used to demonstrate the validity and impact of the proposed approach. The approach is tested on two different photovoltaic power plants operated in Spain and Australia. Results consistently show that the proposed feature-selection approach reduces the prediction error and consistently selects relevant features. The approach has been applied to transformer lifetime estimation, but it can be generally applied to assist in the lifetime estimation of other components operated in RESs. Part of the studies presented here as well as source codes are all open-source under the GitHub repository https://github.com/iramirezg/FeatureSelection. • Probabilistic feature selection approach for improved asset lifetime estimation. • Integration of environmental features for improved renewable-operated asset lifetime. • Systematic and robust feature weighting methodology. • Improved transformer lifetime estimation including sensor and environmental data. • Validated on two real photovoltaic power plant case studies. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

23. A novel machine learning approach for detecting first-time-appeared malware.

Author: Shaukat, Kamran, Luo, Suhuai, and Varadharajan, Vijay
Subjects: *DEEP learning, *MACHINE learning, *MALWARE, *FEATURE selection, *WILCOXON signed-rank test, *REVERSE engineering
Abstract: Conventional malware detection approaches have the overhead of feature extraction, the requirement of domain experts, and are time-consuming and resource-intensive. Learning-based approaches are the mainstay of malware detection as they overcome most of these challenges by significantly improving the detection effectiveness and providing a low false positive rate. The exponential growth of malware variants and first-time-appeared malware, which includes polymorphic and zero-day attacks, are some of the significant challenges to learning-based malware detectors. These challenges have catastrophic impacts on the detection effectiveness of these learning-based malware detectors. This paper proposes a novel deep learning-based framework to detect first-time-appeared malware effectively and efficiently by providing better performance than conventional malware detection approaches. First, it translates and visualises each Windows portable executable (PE) file into a coloured image to eliminate the overhead of feature extraction and the need for domain experts to analyse the features. In the subsequent step, a fine-tuned deep learning model is used to extract the deep features from the last fully connected layer. This step reduces the cost of training that the deep learning models would require if used for end-to-end classification. The third step selects the most important and influential features through a powerful feature selection algorithm. The most important features are then fed to a one-class classifier for final detection. With the one-class classifier, an enclosed boundary around the features of benign data is constructed. Anything outside the boundary is declared as an anomaly/malicious. This enhances the framework's ability to detect evolving, unseen, polymorphic, and zero-day attacks, and reduces the problem of overfitting. The detection effectiveness of the proposed framework is validated against state-of-the-art deep learning models and conventional approaches. The proposed framework outperformed them, with an accuracy of 99.30% on the Malimg dataset. The Wilcoxon signed-rank test is used to validate the statistical significance of the proposed framework. It is evident from the results that the proposed framework is effective and can be used in the defence industry, resulting in more powerful and robust solutions against zero-day and polymorphic attacks. • A novel approach of combining deep learning and machine learning is proposed. First, deep learning is used to extract deep features. The most influential and meticulous features are selected in the subsequent steps to train the machine learning classifier for final detection. The proposed framework eliminates the need for human efforts for reverse engineering tasks. • The proposed framework consists of four steps. In the first step, all PEs are transformed into coloured images. The second step uses a deep learning model to extract the deep features. The subsequent step selects the most important features. Finally, the lightweight and most influential features are sent to the final machine learning classifier for final malware detection. • We demonstrate that the proposed framework is lightweight, resilient, efficient and cost-effective. An in-depth analysis is performed to validate the detection effectiveness and generalisation of the proposed framework on multiple datasets. Our results demonstrate that the proposed framework outperformed conventional and state-of-the-art malware detection approaches. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

24. Machine learning for battery systems applications: Progress, challenges, and opportunities.

Author: Nozarijouybari, Zahra and Fathy, Hosam K.
Subjects: *MACHINE learning, *INSTRUCTIONAL systems, *OPEN scholarship, *ELECTRIC batteries, *FEATURE selection, *ELECTRIC machines, *QUALITY control
Abstract: Machine learning has emerged as a transformative force throughout the entire engineering life cycle of electrochemical batteries. Its applications encompass a wide array of critical domains, including material discovery, model development, quality control during manufacturing, real-time monitoring, state estimation, optimization of charge cycles, fault detection, and life cycle management. Machine learning excels in its ability to identify and capture complex behavioral trends in batteries, which may be challenging to model using more traditional methods. The goal of this survey paper is to synthesize the rich existing literature on battery machine learning into a structured perspective on the successes, challenges, and prospects within this research domain. This critical examination highlights several key insights. Firstly, the selection of data sets, features, and algorithms significantly influences the success of machine learning applications, yet it remains an open research area with vast potential. Secondly, data set richness and size are both pivotal for the efficacy of machine learning algorithms, suggesting a potential for active machine learning techniques in the battery systems domain. Lastly, the field of machine learning in battery systems has extensive room for growth, moving beyond its current focus on specific applications like state of charge (SOC) and state of health (SOH) estimation, offering ample opportunities for innovation and expansion. • Machine learning applications are reviewed for the full battery life cycle. • Machine learning can revolutionize battery design, modeling, and management. • Key benefits of machine learning are transferability and physics independence. • Challenges include feature selection and the size/richness of data for learning. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

25. A multitasking multi-objective differential evolution gene selection algorithm enhanced with new elite and guidance strategies for tumor identification.

Author: Li, Min, Zhao, Yangfan, Lou, Mingzhu, Deng, Shaobo, and Wang, Lei
Subjects: *DIFFERENTIAL evolution, *GENE expression, *FEATURE selection, *GENES, *TUMOR markers, *MACHINE learning, *ALGORITHMS, *KNOWLEDGE transfer
Abstract: • MMODE is developed as a new hybrid gene selection method for tumor identification. • MMODE combines multi-tasking and multi-objective frameworks. • MMODE uses a new elite strategy and a new guidance strategy. • MMODE selects a few genes and achieves high classification accuracy. A key preprocessing step in tumor recognition based on microarray expression profile data and machine learning is to identify tumor marker genes. Gene selection aims to select the most relevant gene subset from the original ultra-high dimensional microarray expression profile data to improve tumor identification performance. Inspired by evolutionary multitasking (EMT) and multi-objective optimization, this paper puts forward a novel multitasking multi-objective differential evolution gene selection algorithm (MMODE) which uses new elite and guidance strategies to select the best gene subsets. MMODE initializes two different populations according to different filtering criteria to increase the diversity of the search. These two populations guide their respective populations to search in the optimal direction through knowledge transfer in the evolutionary process. In addition, MMODE employs new elite and guidance strategies that enable individuals to narrow the search range and jump out of local optima. The proposed algorithm is validated on 13 publicly available microarray expression datasets in comparison with state-of-the-art gene selection algorithms. The experimental results show that MMODE can find smaller gene subsets and achieve higher classification accuracy compared with other algorithms. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

26. Deep-stratification of the cardiovascular risk by ultrasound carotid artery images.

Author: Vila, Maria del Mar, Gago, Lucas, Pérez-Sánchez, Pablo, Grau, Maria, Remeseiro, Beatriz, and Igual, Laura
Subjects: CAROTID artery ultrasonography, ARTIFICIAL neural networks, CAROTID artery, CONVOLUTIONAL neural networks, CAROTID intima-media thickness, CARDIOVASCULAR diseases risk factors, FEATURE selection
Abstract: Cardiovascular risk estimation functions predict the risk of cardiovascular events with clinical data and survival models. These functions accurately stratify individuals into low, moderate, and high-risk categories. However, they tend to classify a considerable number of individuals into the middle-risk category, and often, a subsequent reclassification into high-risk groups is required. Atherosclerosis is the leading cause of cardiovascular events, and ultrasound images of the Carotid Artery (CA) can detect its burden by measuring the carotid intima-media thickness and identifying atherosclerotic plaques. Current risk estimation functions do not consider ultrasound imaging. This paper proposes the use of deep ultrasound CA image features in survival models to improve risk stratification. In particular, we define new deep CA image features, extracting information from a convolutional neural network, and add them to an existing risk function. The experiments carried out show that using deep image features improves the AUC of the risk function to 0.842, and these features are enough to replace the information provided by blood biomarkers. Furthermore, the use of these features resulted in a 20% improvement in the reclassification of risk categories, specifically for individuals who suffered an event, as shown by the net reclassification improvement metric. • First survival model that integrates CA image features from deep neural networks. • Risk stratification enhanced by reclassification using deep image features. • Localized plaque data can replace blood biomarkers in cardiovascular risk prediction. • Reduction of post-event middle-risk individuals, shifting to high risk. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

27. A new filter-based gene selection approach in the DNA microarray domain.

Author: Ouaderhman, Tayeb, Chamlal, Hasna, and Janane, Fatima Zahra
Subjects: *FEATURE selection, *MACHINE learning, *MULTIPLE criteria decision making, *LEARNING ability, *TOPSIS method, *DNA
Abstract: The high dimensionality of data hinders the learning ability of machine learning algorithms. Feature selection techniques can be used to reduce dimensionality, which is an important step for processing high-dimensional data. Feature selection solves this problem by removing irrelevant and redundant information, which can improve learning models, reduce calculation time, and improve learning accuracy. In this paper, a novel filter for feature selection in mixed-attribute datasets is proposed. The independent attributes are mixed or heterogeneous in the sense that both numerical and categorical attribute types may appear together in the same dataset. Based on the preordonnances theory, we use a new concept to quantify the relevance and redundancy of features even when the data are heterogeneous (mixed-type). The technique for order preference by similarity to the ideal solution (TOPSIS) is one of the well-known multicriteria decision-making methods; it is utilized as a weighting and informative feature selection filter. To assess the effectiveness of the proposed method, several experiments, both simulated and real, are performed, including a comparison to other well-known filter methods. The experimental results show that, in most cases, the method yielded competitive results in comparison to other methods. • Introducing new criteria for relevance and complementarity. • Dealing with mixed-attribute datasets without requiring a preprocessing step. • The multi-criteria decision method, namely TOPSIS, is used to score the explanatory features. • Investigating the performance of the proposed filter in high-dimensional data by using simulated and real data sets. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

28. Relationship between feature importance and building characteristics for heating load predictions.

Author: Neubauer, Alexander, Brandt, Stefan, and Kriegel, Martin
Subjects: *HEATING load, *HEATING, *FACTORIAL experiment designs, *RANDOM forest algorithms, *FOREIGN exchange rates, *MACHINE learning, *FEATURE selection
Abstract: The use of machine learning in building technology has become increasingly important in recent years. One of the applications is heating load prediction, which enables demand-side flexibility. Most studies consider the heating load prediction without sufficient context with the existing building characteristics. For an accurate load prediction, suitable features have to be selected according to their importance, the feature importance (FI). The scope of this paper is to investigate whether there is a relationship between the building characteristics and the FI and if so, how strong this relationship is. Additionally, an analysis has been conducted to determine which building characteristics have the most significant impact on FI. For this purpose, a full factorial design of a room with six different building characteristics is carried out. In total, the heating load is calculated for 15 552 room variants. The thermal balance, correlation, random forest FI, permutation FI and SHapley Additive exPlanations (SHAP) values are calculated for these different rooms. The local SHAP values were used to explain the model. These values also provide insight into the interaction of individual features with the heating load. For most variants, the outdoor temperature had the highest FI. It is investigated which building characteristics have the greatest influence on the thermal balance, correlation, FI and SHAP values. A relationship was found between the proportion of thermal balance, the correlation between the features and the label as well as the FI. The greatest association with the thermal balance characteristics was found for the SHAP values. The study shows the systematic relationship between building characteristics and FI. Therefore, FI should always be considered in the context of building characteristics. • SHAP values provide an excellent way of showing how features affect heating load. • Comparison of various feature selection methods for 15 552 different rooms. • Characteristics of the building are reflected in the feature importance. • Air exchange rate has the greatest influence on the feature importance. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

29. Minimising redundancy, maximising relevance: HRV feature selection for stress classification.

Author: Ihianle, Isibor Kennedy, Machado, Pedro, Owa, Kayode, Adama, David Ada, Otuka, Richard, and Lotfi, Ahmad
Subjects: *REDUNDANCY in engineering, *HEART beat, *PEARSON correlation (Statistics), *FEATURE selection
Abstract: Heart rate variability serves as a valuable indicator and biomarker for stress detection and monitoring. Feature selection, which aims to identify relevant features from a large set of variables, is a crucial preprocessing step towards this. However, this task becomes challenging due to high dimensionality and the presence of irrelevant and redundant attributes. The Minimum Redundancy and Maximum Relevance (mRMR) feature selection method addresses this challenge by selecting relevant features while controlling redundancy. This paper presents extensions and evaluated versions of the mRMR feature selection methods for stress detection using Heart Rate Variability (HRV) measures. The proposed feature selection methods extend the traditional mRMR by replacing the Pearson correlation redundancy with non-linear feature redundancy measures capable of capturing more complex relationships between variables. An extensive empirical evaluation is conducted on the proposed mRMR extensions, comparing them with four other baseline feature selection methods using three publicly available datasets. The experimental results demonstrate the effectiveness of incorporating the non-linear feature redundancy measure into the feature selection process. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
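
The greedy selection scheme that the abstract above builds on can be sketched in Python as follows. This is a minimal illustration of the difference form of mRMR (relevance minus mean redundancy), not the authors' implementation. Using absolute Pearson correlation for relevance is an assumption made here for brevity, and the `redundancy` parameter is a hypothetical hook marking where a non-linear redundancy measure of the kind the paper proposes would be substituted.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column has no linear relationship
    return cov / math.sqrt(vx * vy)


def mrmr(features, target, k, redundancy=pearson):
    """Greedily pick k feature names from a dict of name -> column.

    Each round adds the feature with the best score, where
    score = relevance to the target - mean redundancy with the
    features already selected.
    """
    relevance = {name: abs(pearson(col, target))
                 for name, col in features.items()}
    selected, remaining = [], list(features)
    while remaining and len(selected) < k:
        def score(name):
            if not selected:
                return relevance[name]
            red = sum(abs(redundancy(features[name], features[s]))
                      for s in selected) / len(selected)
            return relevance[name] - red
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: "a_dup" is a scaled copy of "a"; "b" is uncorrelated with both.
features = {
    "a": [1, -1, 1, -1],
    "a_dup": [2, -2, 2, -2],
    "b": [1, 1, -1, -1],
}
target = [3, -1, 1, -3]  # equals 2 * a + b
chosen = mrmr(features, target, k=2)
```

On this toy data the second pick is "b" rather than the remaining copy of "a": the copy is more relevant on its own, but its redundancy with the first pick outweighs that relevance.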

30. An efficient classification framework for Type 2 Diabetes incorporating feature interactions.

Author: Tuppad, Ashwini and Devi Patil, Shantala
Subjects: *TYPE 2 diabetes, *FEATURE selection, *TYPE 2 diabetes diagnosis, *MACHINE learning
Abstract: Accurate and timely diagnosis of Type 2 Diabetes is a highly challenging task due to its initial asymptomatic nature and complex risk factor composition. Recently, Machine Learning (ML) has been actively used to build improved Type 2 Diabetes classification systems. One important aspect of these systems has been feature selection. Filter feature selection techniques especially based on mutual information have been popularly employed in recent works. However, most of them have focused on selecting relevant features and eliminating redundant ones. A third relationship called feature interaction may exist if input features are highly correlated, as in the case of Type 2 Diabetes. Feature interaction signifies the additional information about the target provided by the interaction between a subset of input features, that may not be relevant to the target individually. Second, many of the existing ML models are black-box, making the model interpretability very difficult. This paper proposes an efficient ML framework for the classification of Prediabetes and Type 2 Diabetes by incorporating feature interactions. It presents a hybrid filter-wrapper technique called Feature Interaction-based Greedy Sequential Feature selection. Agglomerative Feature Clustering and Dendrogram visualization for the analysis of interactive features is performed. A model-agnostic explainability technique of SHapley Additive explanations (SHAP) is augmented to provide local and global explanations of model predictions. The performance of the proposed classification framework was found to be interpretable as well as efficient with an accuracy of 98.8669%, precision of 98.8660%, recall of 98.8665%, and F-score of 0.9886 for Diabetes. For Prediabetes, an accuracy of 90.1187%, precision of 94.6958%, recall of 90.1187%, and F-score of 0.9403 was obtained. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

31. Abnormal Heart Sound Detection using Time-Frequency Analysis and Machine Learning Techniques.

Author: Sadeghi Nia, Parastoo and Danandeh Hesar, Hamed
Subjects: HEART murmurs, TIME-frequency analysis, MACHINE learning, PARTICLE swarm optimization, DATABASES
Abstract: Phonocardiogram (PCG) signals contain valuable information pertaining to heart valve functionality, rendering them potentially useful for early detection of cardiovascular diseases. Automated classification of heart sounds harbors great promise for identifying cardiac pathologies. This paper introduces a novel automated approach to classify normal and abnormal heart sounds. Our methodology involves partitioning heart sounds into four segments: S1, S2, systolic, and diastolic, followed by extraction of time–frequency and time-statistical features. Prior to data classification, we employ two techniques, Particle Swarm Optimization (PSO) and Sequential Forward Feature Selection (SFFS), for efficient feature selection. We assess the performance of the proposed method on the Physionet Challenge 2016 database, utilizing the 10-fold cross-validation method. To address the issue of dataset imbalance, we apply the synthetic minority over-sampling technique (SMOTE) to create balanced datasets. Our approach surpasses existing methods in the literature, as evidenced by its superior accuracy, sensitivity, and specificity metrics. Specifically, our method achieves an accuracy of 98.03%, a sensitivity of 97.64%, and a specificity of 98.43% in distinguishing normal from abnormal heart sounds on the Physionet database. These findings outperform the results obtained by previously established methods evaluated on the Physionet 2016 challenge database. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

32. Emotion recognition by skeleton-based spatial and temporal analysis.

Author: Oğuz, Abdulhalık and Ertuğrul, Ömer Faruk
Subjects: *EMOTION recognition, *MACHINE learning, *HUMAN-computer interaction, *FEATURE selection, *POSTURE
Abstract: • AER system utilizes kinematic data for enhanced human–computer interaction. • Diverse algorithms achieve >99% success (small windows), >88% (larger). • Innovative analysis: joint neighborhood, window size, axis combination efficiency. • Practical CNN algorithms enable real-time emotion recognition. • Insights from transparent datasets lead to emotion recognition advancements. This study introduces an automatic emotion recognition system (AER) focusing on skeletal-based kinematic datasets for enhanced human–computer interaction. Departing from conventional approaches, it achieves real-time emotion recognition in real-life situations. The dataset covers seven emotions and undergoes assessment by eight diverse machine and deep learning algorithms. A thorough investigation is undertaken by varying window sizes and data states, including raw positions and feature-extracted data. The findings imply that incorporating advanced techniques like joint-related feature extraction and robust classifier models yields promising outcomes. Dataset augmentation via varying window sizes enriches insights into real-world scenarios. Evaluations exhibit classification accuracy surpassing 99% for small windows, 94% for medium, and exceeding 88% for larger windows, thereby confirming the robust nature of the approach. Furthermore, we highlight window size's impact on emotion detection and the benefits of combining coordinate axes for efficiency and accuracy. The analysis intricately examines the contributions of features at both the joint and axis levels, assisting in making well-informed selections. The study's contributions include carefully curated datasets, transparent code, and models, all of which ensure the possibility of replication. The paper establishes a benchmark that bridges theory and practicality, solidifying the proposed approach's effectiveness in balancing accuracy and efficiency. 
By pioneering advanced AER through kinematic data, it sets a new standard for efficacy while driving seamless human–computer interaction through rigorous analysis and strategic design. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

33. Deep learning in stock portfolio selection and predictions.

Author: Alzaman, Chaher
Subjects: *DEEP learning, *MARKET volatility, *FEATURE selection, *RATIO analysis, *COVID-19, *INVESTORS
Abstract: Deep learning (DL) has made its way into many disciplines ranging from health care to self-driving cars. In financial markets, we see a rich literature for DL applications. Particularly, investors require robust algorithms that can navigate and make sense of extremely noisy and volatile markets. In this work, we use deep learning to select a portfolio of stocks and use a genetic algorithm to optimize the hyperparameters of DL. The work analyzes the improvement in using genetic-based hyperparameter optimization over grid searches. The Genetic Algorithm brings 40% improvements in prediction when compared to a random-grid search. Novelty-wise, the work couples a genetic-based hyperparameter optimization with multiple Deep RankNet models to predict the behavior of financial assets. Our results show promising portfolio returns 20% better than the general market. In the highly volatile COVID 19 period, the models exceed market returns by more than double. Overall, this paper brings a comprehensive work that integrates hyperparameter optimization, Deep RankNet, LSTM, period size variations, input variable transformation, feature selection, training/evaluation ratio analysis, and multiple portfolio selection strategies. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

34. Attribute reduction algorithms with an anti-noise mechanism for hybrid data based on fuzzy evidence theory.

Author: Zhang, Qinli, Song, Yan, Peng, Yichun, and Li, Zhaowen
Subjects: *CLASSIFICATION algorithms, *FUZZY algorithms, *HYBRID systems, *ALGORITHMS, *FEATURE selection, *ROUGH sets, *DATA reduction, *MACHINE learning
Abstract: Attribute reduction can remove data noise and redundancy, thus reducing computational complexity, which is very important for machine learning. Because the difference between nominal attribute values is difficult to measure, attribute reduction for hybrid data faces challenges. In addition, most of the existing methods are sensitive to noise due to the lack of an anti-noise mechanism. Decision attribute contains the most important information of data. This paper proposes some techniques that consider the above problems from the perspective of fuzzy evidence theory. First of all, a new distance incorporating decision attributes is defined, and then a new fuzzy relation with an anti-noise mechanism is defined. Furthermore, fuzzy belief and fuzzy plausibility are defined based on the defined new distance and new fuzzy relation. In this framework, two anti-noise attribute reduction algorithms for hybrid data are proposed. Experiments on 12 data sets of various types show that compared with the other 8 state-of-the-art algorithms, the proposed algorithms improve the classification accuracy by at least 2% and the anti-noise ability by at least 11%. Therefore, it can be concluded that the proposed algorithms have excellent anti-noise ability while maintaining good feature selection ability. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

35. Encrypted network traffic classification based on machine learning.

Author: Elmaghraby, Reham T., Abdel Aziem, Nada M., Sobh, Mohammed A., and Bahaa-Eldin, Ayman M.
Subjects: COMPUTER network traffic, TELECOMMUNICATION network management, MACHINE learning, TIME complexity, DATA privacy, STREAMING video & television, K-nearest neighbor classification, FEATURE selection
Abstract: Encrypted traffic is an essential part of maintaining the security and privacy of data transmission. It plays an important role in keeping our networks secure by preventing attackers from intercepting confidential information, which they may access without authorization; However, its effectiveness relies heavily on accurate classification techniques being applied correctly, so we can differentiate between legitimate users' activities versus those attempting malicious activity within the networks' boundaries. Encrypted network traffic is becoming increasingly common in modern communication systems, presenting a challenge for effective network management and security. To address this challenge, machine learning models have been employed to classify encrypted traffic but with limited success due to the lack of clear visibility into packet contents and an inability to inspect their content. For the sake of tackling this issue, more effective research has begun on developing machine learning models for classifying encrypted payloads without relying on inspecting their contents directly. This research will investigate how features like packet length, time stamps or transport layer security (TLS) and encrypted payload information can be used as input features when attempting classification tasks, instead of analyzing unencrypted content directly from packets themselves which would otherwise be impossible given the current technology constraints. The evaluation process will focus on assessing different model architectures, as well as feature selection techniques that yield improved results over the existing approaches. In this paper, we proposed three approaches to identify encrypted traffic and classify different applications such as browsing, VOIP, file transfer and video streaming. 
The first two techniques consist of two stages: the first stage is either a neural network or a bi-directional LSTM, and the second stage is a selection of different classification techniques, namely Random Forest, Support vector machine, Linear regression, and K-nearest neighbor. The final result is achieved using an ensemble voting technique. As for the third technique, the network packets are grouped together by source IP, destination IP and session time before feeding them into three different combinations of LSTM networks: coupled with convolution 1D layers, coupled with convolution 2D layers, or without convolution layers. Like the first two techniques, the final result is achieved by means of ensemble voting. Through extensive comparison between the three approaches, the first approach yielded the highest accuracy. However, the performance of the second and third techniques in terms of time complexity was superior. The achieved accuracies were 96.8%, 95.2% and 96.5% for the proposed techniques, respectively. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

36. An improved high-impedance fault identification scheme for distribution networks based on kernel extreme learning machine.

Author: Sheng, Wanxing, Liu, Keyan, Jia, Dongli, and Wang, Yao
Subjects: *MACHINE learning, *DISCRETE wavelet transforms, *FEATURE extraction, *IDENTIFICATION, *HILBERT-Huang transform, *FEATURE selection
Abstract: • A sensitive fault feature library based on high frequency components is constructed. • A feature selection method based on XGBoost can remove redundant features. • A reliable HIF identification scheme based on KELM can identify fault phase selection accurately. In distribution networks, high-impedance faults (HIFs) occur frequently and have a harmful impact on the distribution network. However, fault detection and fault phase selection of HIFs are challenging due to weak fault characteristics. Therefore, this paper proposes an improved HIF identification scheme based on a kernel extreme learning machine (KELM) that can sensitively identify HIFs and select the fault phase by adaptively extracting the weak fault characteristics. First, a fault feature extraction strategy based on discrete wavelet decomposition (DWT) and the Hilbert–Huang transform (HHT) is proposed to obtain multiple features that describe the weak fault characteristics of HIFs. Second, an XGBoost-based fault feature selection scheme is proposed for screening with sensitive characterization of HIFs. Next, a sensitive and accurate HIF identification scheme based on the improved learning algorithm (KELM) is proposed to enable accurate and sensitive HIF detection and phase selection. Finally, numerical simulations based on PSCAD/EMTDC and MATLAB were carried out, which reveals the effectiveness and accuracy of the proposed HIF identification scheme. Compared with the traditional HIF identification scheme, the proposed method exhibits conciseness and correctness. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

37. Improved binary differential evolution with dimensionality reduction mechanism and binary stochastic search for feature selection.

Author: Ahadzadeh, Behrouz, Abdar, Moloud, Safara, Fatemeh, Aghaei, Leyla, Mirjalili, Seyedali, Khosravi, Abbas, García, Salvador, Karray, Fakhri, and Acharya, U.Rajendra
Subjects: FEATURE selection, MACHINE learning, DIFFERENTIAL evolution, SEARCH algorithms, COVID-19, HEART diseases
Abstract: Computer systems store massive amounts of data with numerous features, leading to the need to extract the most important features for better classification in a wide variety of applications. Poor performance of various machine learning algorithms may be caused by unimportant features that increase the time and memory required to build a classifier. Feature selection (FS) is one of the efficient approaches to reducing the unimportant features. This paper, therefore, presents a new FS, named BDE-BSS-DR, that utilizes Binary Differential Evolution (BDE), Binary Stochastic Search (BSS) algorithm, and Dimensionality Reduction (DR) mechanism. The BSS algorithm increases the search capability of the BDE by escaping from local optimal points and exploring the search space. The DR mechanism then reduces the dimensions of the search space gradually. As a result of using DR, the local optima of the search space and the problem of wrong removal of important features before starting the search process are reduced. The algorithm's efficiency is evaluated on 20 different medical datasets. The obtained outcomes indicate that the BDE-BSS-DR outperforms the BDE and BDE-BSS algorithms significantly. Furthermore, the effectiveness of the proposed algorithms in selecting the most important features of the heart disease data, several cancer diseases, and COVID-19 are also compared with several other state-of-the-art methods. Our results show that the BDE-BSS-DR with SVM classifier has a significant advantage over other methods with an average classification accuracy of 95.05% in heart disease and 99.40% in COVID-19 disease. In addition, the comparisons made with KNN and SVM classification prove the efficiency of the DR and BSS in generating a subset of optimal and informative features. • Proposing an optimal feature selection algorithm for medical datasets. • Applying binary differential evolution approach and local search algorithm. 
• Applying dimensionality reduction mechanism. • Validating on heart disease, several cancer and RNA-seq COVID-19 datasets. • Obtaining outstanding performance in generating optimal feature subset. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

38. Development of a novel wrist pulse system for early diagnosis of pathogenic bacterial infections using optimized feature selection with machine learning approaches.

Author: Kumar, Sachin, Veer, Karan, and Kumar, Sanjeev
Subjects: MACHINE learning, BACTERIAL diseases, ANT algorithms, EARLY diagnosis, NOSOLOGY, FEATURE selection
Abstract: • Wrist pulse system is discussed for early diagnosis of pathogenic bacterial infections. • To understand the effect of Ayurvedic and Traditional Chinese Medicine [TCM] based wrist pulse measurement. • Prototype was designed to measure the pulse signal at three doshas (Vata, Pitta, Kapha) locations. • Further, three different optimization-based feature selection techniques, viz., whale optimization (WOA), ant colony optimization (ACO), and the proposed binary Greywolf optimization (BGWO), were proposed. Pelvic inflammatory disease (PID) and urinary tract infections (UTI) are two pathogenic bacterial infections. PID affects the female reproductive system, whereas UTI affects the urinary system. Females are more vulnerable to both illnesses, which are challenging to detect simultaneously due to similar symptoms. Many clinical procedures have previously been used to diagnose the diseases, but they are painful, costly, prone to radiation, and invasive. Therefore, this paper proposed the Ayurvedic and Traditional Chinese Medicine [TCM] based wrist pulse measurement to diagnose these bacterial infections non-invasively. An artery tonometry-based test prototype was designed to measure the pulse signals at three doshas (Vata, Pitta, Kapha) locations on the radial artery under variable static force. The data is recorded at an intermediate force level to get maximum accuracy. Signal features, viz., time-domain features, autoregressive (AR) model features, approximation entropy, sample entropy, and multiscale entropy, have been calculated. Further, the efficiency of the machine learning model is enhanced through optimization-based feature selection techniques. Three different optimization-based feature selection techniques, viz., whale optimization (WOA), ant colony optimization (ACO), and the proposed binary Greywolf optimization (BGWO), have been proposed to acquire the optimized feature subsets for accurate classification of disease subjects. 
These subsets have been used by popular machine learning techniques to classify the PID and UTI. The results of the experiment demonstrate that as the static force changed, the wrist pulse signal first increased to an intermediate force level and then decreased. Second, the classification accuracy of the machine learning model with feature selection is more than conventional feature classification. Third, the wrist pulse analysis (WPA) shows the maximum accuracy in comparison to conventional clinical PID diagnosis, where Vata dosha has the highest classification accuracy (94.1%) when using BGWO-SVM, followed by Pitta dosha (88.2%) when using BGWO-BNN. Similarly, specificity has been used for UTI classification, where Vata dosha has the highest specificity of 100% with BGWO-SVM, followed by pitta dosha with 83.3% BGWO-BNN. Finally, three dosha classification findings demonstrate that Vata and Pitta doshas have been more significantly impacted, as they have the highest classification accuracy compared to conventional clinical diagnosis, which produces similar results consistent with ayurvedic literature. Therefore, WPA can provide a unique diagnostic method for early PID/UTI identification. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

39. Measuring vertical track irregularities from instrumented heavy haul railway vehicle data using machine learning.

Author: Pires, A.C., Viana, M.C.A., Scaramussa, L.M., Santos, G.F.M., Ramos, P.G., and Santos, A.A.
Subjects: *RAILROAD trains, *STANDARD deviations, *FEATURE selection, *FEATURE extraction
Abstract: This paper proposes a data-driven approach to estimating geometric track irregularities from instrumented railway vehicle (IRV) data. Machine learning is used to find the nonlinear mapping between IRV data and track irregularities. A dynamic model of the BRA1 railway vehicle was used to generate an artificial dataset that contains variables that are measured by the real BRA1 IRV and other variables measured by IRVs found in the literature. An extensive data analysis step was done to verify if the current instrumentation of the BRA1 IRV is sufficient for obtaining both lateral and vertical track irregularities. Feature engineering based on wagon movements, signal integration and time domain statistical metrics were applied to extract features and then the best features were selected using a wrapper method. Eight different regression ML models were trained and optimized after the feature selection using Optuna. The results show that, with the current instrumentation of the BRA1 IRV, obtaining lateral track irregularities is unlikely due to low correlation, however, vertical irregularities can be obtained with a root mean squared error (RMSE) of 0.556 mm. With postprocessing, the RMSE was further reduced to 0.410 mm. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

40. Probabilistic prediction of uniaxial compressive strength for rocks from sparse data using Bayesian Gaussian process regression with Synthetic Minority Oversampling Technique (SMOTE).

Author: Song, Chao, Zhao, Tengyuan, Xu, Ling, and Huang, Xiaolin
Subjects: *KRIGING, *MATERIALS compression testing, *FEATURE selection, *TIME management, *SAMPLE size (Statistics), *FORECASTING
Abstract: Uniaxial compressive strength (UCS) of rocks is one of the key rock strength parameters. Generally speaking, UCS can be measured directly through uniaxial compression tests, which is often unfeasible, especially when intact rock samples are highly fragile. Alternatively, the UCS of rocks can be estimated indirectly from other easily available rock indices. Note that adequate measurement data is the prerequisite for the accurate estimation of UCS using indirect methods. This may be difficult to achieve due to the limitation of time and budget, especially for small- to medium-sized projects. In this case, it becomes challenging to develop a robust and reliable model for UCS estimation using the sparse measurement data. A fully Bayesian Gaussian process regression (fB-GPR) approach with Synthetic Minority Oversampling Technique (SMOTE) is proposed in this paper to address this problem. A real-life example from Malaysia was used for illustration and validation of the proposed method. Results showed that when the synthetic sample size in SMOTE reaches 30 (i.e., optimal synthetic sample size), the coefficient of determination (R²) increases by about 18.92%, and the accuracy of feature selection reaches 98%, compared with the scenario with only sparse measurement data used for fB-GPR model development. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

41. Evaluation of multispectral data for recent manure application: A case study in northern Spain.

Author: Pedrayes, Oscar D., Usamentiaga, Rubén, Trichakis, Yanni, and Bouraoui, Faycal
Subjects: *MULTISPECTRAL imaging, *POLLUTION, *FEATURE selection, *REMOTE-sensing images, *AGRICULTURE, *PRECISION farming, *MANURES
Abstract: • Inclusion of temporal data can improve detection F1-Score by 8%. • Infrared bands provide about 4% more F1-Score than visible bands despite lower resolution. • Using over 80 features provides an increase of about 12% F1-Score over using less than 10. • The proposed method successfully detects all test plots with nearly 90% F1-Score. • A dataset of recent manure application, verified through on-site validation, is made public. The use of manure in agricultural fields during the wet season can lead to environmental pollution by releasing nitrates into nearby water sources. To address this issue, authorities may impose closed periods during which manure application is prohibited. However, ensuring compliance with these regulations can be challenging, as it is difficult to monitor all fields in a country. To tackle this problem, a solution has been proposed that involves employing machine learning techniques in conjunction with satellite imagery to automatically identify freshly manured fields. This paper investigates the relationship and effectiveness of the Sentinel-2 satellite bands and 51 frequently utilized multispectral indices in the context of precision agriculture, by exploring different feature selection methods. The proposed method achieves nearly 90% F1-Score and detects all test plots of the northern Spanish region, showing its potential for large-scale use in precision agriculture and environmental monitoring. This method incorporates temporal data, resulting in an 8% improvement in the detection F1-Score. Despite their lower spatial resolution, infrared bands have proven to be more effective than visible bands, enhancing the F1-Score by 4%. Furthermore, the use of over 80 features contributes to a 12% increase in the F1-Score compared to using fewer than 10 features. For further research and future studies, a dataset of recently manured plots, verified on-site, has been developed and made publicly available. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

42. A systematic literature review of recent lightweight detection approaches leveraging machine and deep learning mechanisms in Internet of Things networks.

Author: Mukhaini, Ghada AL, Anbar, Mohammed, Manickam, Selvakumar, Al-Amiedy, Taief Alaa, and Momani, Ammar Al
Subjects: INTERNET of things, MACHINE learning, DEEP learning, FEATURE selection, DENIAL of service attacks, SCIENCE databases
Abstract: The Internet of Things (IoT) connects daily use devices to the Internet, such as home appliances, health care equipment, sensors, and industrial devices. Concurrently, numerous cyber-attacks target those objects and their backbone IoT networks consecutively. Therefore, several researchers have adopted Machine Learning (ML) and Deep Learning (DL) algorithms to develop efficient Intrusion Detection Systems (IDSs). However, the restricted resources of IoT devices hinder integrating those systems with those tiny devices. Hence, designing lightweight IDSs gets more interest from researchers to build efficient detection models to discard attacks in IoT networks. To give a holistic insight into this research domain, this paper presents a Systematic Literature Review (SLR) to review and analyse the recent ML and DL techniques to lighten the IDS models for detecting attacks in IoT devices. In addition, the literature studies were retrieved from six scientific databases Google Scholar, Science Direct, IEEE Xplore®, Scopus, Web of Science, and Springer. From 4,703 identified records, 57 studies were adopted based on predesigned research questions and inclusion/exclusion criteria. The study's findings illustrate the most recently used ML and DL mechanisms and feature engineering techniques to lighten the proposed IDS models. It also shows the most attacks detected, datasets used, tools and network simulators employed, and evaluation metrics and parameters. Furthermore, it suggests the research challenges and future direction after discussing the limitations of the currently proposed techniques. This study shows that most selected studies are journal articles published in IEEE Xplore®. Furthermore, the most used feature engineering techniques are filter-based, as they deliver better performance and lightness than the developed models. Most studies use correlation algorithms as a feature selection technique. 
Finally, the most discussed attack in the selected studies is the DoS attack. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
