Publication Year Range: This year / Search Limiters: Available in Library Collection / Topic: feature selection - Searchworks@Jio Institute Digital Library Search Results

1. Classification models for likelihood prediction of diabetes at early stage using feature selection

Author: Oladimeji, Oladosu Oyebisi, Oladimeji, Abimbola, and Oladimeji, Olayanju
Published: 2024
Full Text: View/download PDF

2. Optimizing feature selection in intrusion detection systems: Pareto dominance set approaches with mutual information and linear correlation.

Author: Barbosa, Guilherme Nunes Nasseh, Andreoni, Martin, and Mattos, Diogo Menezes Ferrazani
Subjects: FEATURE selection, INTRUSION detection systems (Computer security), MACHINE learning, SOCIAL dominance, PEARSON correlation (Statistics), FILTER paper
Abstract: In the realm of network intrusion detection using machine learning, feature selection aims for computational efficiency, enhanced performance, and model interpretability, preventing overfitting and optimizing data visualization. This paper proposes a filtering method for feature selection, which optimizes information quantity and linear correlation between resultant features. The method identifies Pareto dominant pairs of informative and correlated features, constructs a graph, and selects key features based on betweenness centrality in its connected components. The proposal yields a more concise and informative dataset representation. Experimental results, using three diverse datasets, demonstrate that the proposal achieves more than 95% accuracy in classifying network attacks with just 14% of the total number of features in the original datasets. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
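
Note on record 2: the abstract relies on Pareto dominance over two feature scores. The sketch below shows only the dominance test itself, not the authors' full pipeline (no graph construction or betweenness centrality). The feature names, the score values, and the choice to treat both objectives as "larger is better" are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def pareto_front(scores: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return the names whose two-objective scores are not dominated.

    Both objectives are treated as "larger is better". An item is dominated
    when another item is at least as good on both objectives and strictly
    better on at least one of them.
    """
    front = []
    for name, (a, b) in scores.items():
        dominated = any(
            (oa >= a and ob >= b) and (oa > a or ob > b)
            for other, (oa, ob) in scores.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical scores: (information about the label, negated redundancy).
example_scores = {
    "f1": (0.90, -0.80),
    "f2": (0.70, -0.20),
    "f3": (0.60, -0.50),  # worse than f2 on both objectives
    "f4": (0.30, -0.05),
}
selected = pareto_front(example_scores)
print(selected)
```

In this toy input, f3 is the only item that another item beats on both objectives, so it is the only one dropped.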

3. Musical instrument classifier for early childhood percussion instruments.

Author: Rufino, Brandon, Khan, Ajmal, Dutta, Tilak, and Biddiss, Elaine
Subjects: PERCUSSION instruments, MUSICAL instruments, MIXED reality, FEATURE selection, EARLY childhood education, COGNITIVE ability
Abstract: While the musical instrument classification task is well-studied, there remains a gap in identifying non-pitched percussion instruments which have greater overlaps in frequency bands and variation in sound quality and play style than pitched instruments. In this paper, we present a musical instrument classifier for detecting tambourines, maracas and castanets, instruments that are often used in early childhood music education. We generated a dataset with diverse instruments (e.g., brand, materials, construction) played in different locations with varying background noise and play styles. We conducted sensitivity analyses to optimize feature selection, windowing time, and model selection. We deployed and evaluated our best model in a mixed reality music application with 12 families in a home setting. Our dataset comprised over 369,000 samples recorded in-lab and 35,361 samples recorded with families in a home setting. We observed the Light Gradient Boosting Machine (LGBM) model to perform best using an approximate 93 ms window with only 12 mel-frequency cepstral coefficients (MFCCs) and signal entropy. Our best LGBM model was observed to perform with over 84% accuracy across all three instrument families in-lab and over 73% accuracy when deployed to the home. To our knowledge, the dataset compiled of 369,000 samples of non-pitched instruments is the first of its kind. This work also suggests that a low feature space is sufficient for the recognition of non-pitched instruments. Lastly, real-world deployment and testing of the algorithms created with participants of diverse physical and cognitive abilities was also an important contribution towards more inclusive design practices. This paper lays the technological groundwork for a mixed reality music application that can detect children's use of non-pitched, percussion instruments to support early childhood music education and play. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

4. Classifying breast cancer using multi-view graph neural network based on multi-omics data.

Author: Yanjiao Ren, Yimeng Gao, Wei Du, Weibo Qiao, Wei Li, Qianqian Yang, Yanchun Liang, and Gaoyang Li
Subjects: GRAPH neural networks, DEEP learning, MACHINE learning, FEATURE selection, BREAST cancer, TUMOR classification
Abstract: Introduction: As the evaluation indices, cancer grading and subtyping have diverse clinical, pathological, and molecular characteristics with prognostic and therapeutic implications. Although researchers have begun to study cancer differentiation and subtype prediction, most of the relevant methods are based on traditional machine learning and rely on single omics data. It is necessary to explore a deep learning algorithm that integrates multi-omics data to achieve classification prediction of cancer differentiation and subtypes. Methods: This paper proposes a multi-omics data fusion algorithm based on a multi-view graph neural network (MVGNN) for predicting cancer differentiation and subtype classification. The model framework consists of a graph convolutional network (GCN) module for learning features from different omics data and an attention module for integrating multi-omics data. Three different types of omics data are used. For each type of omics data, feature selection is performed using methods such as the chi-square test and minimum redundancy maximum relevance (mRMR). Weighted patient similarity networks are constructed based on the selected omics features, and GCN is trained using omics features and corresponding similarity networks. Finally, an attention module integrates different types of omics features and performs the final cancer classification prediction. Results: To validate the cancer classification predictive performance of the MVGNN model, we conducted experimental comparisons with traditional machine learning models and currently popular methods based on integrating multi-omics data using 5-fold cross-validation. Additionally, we performed comparative experiments on cancer differentiation and its subtypes based on single omics data, two omics data, and three omics data. Discussion: This paper proposed the MVGNN model and it performed well in cancer classification prediction based on multiple omics data. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
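
Note on record 4: minimum redundancy maximum relevance (mRMR) is named in this abstract and again in records 6 and 17. The following is a minimal sketch of one common greedy variant (relevance minus mean redundancy, with mutual information estimated from counts on discrete data). It is not taken from any of the listed papers; the toy columns `a`, `a_copy`, and `b` and the target are invented for illustration.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def mutual_information(x: Sequence, y: Sequence) -> float:
    """Mutual information (in nats) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )


def mrmr(features: Dict[str, Sequence], target: Sequence, k: int) -> List[str]:
    """Greedily pick k features maximizing relevance minus mean redundancy."""
    relevance = {
        name: mutual_information(col, target) for name, col in features.items()
    }
    selected: List[str] = []
    remaining = list(features)

    def score(name: str) -> float:
        if not selected:
            return relevance[name]
        redundancy = sum(
            mutual_information(features[name], features[s]) for s in selected
        ) / len(selected)
        return relevance[name] - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: "a_copy" is a noisy duplicate of "a"; "b" carries different information.
toy_features = {
    "a": [0, 0, 0, 0, 1, 1, 1, 1],
    "a_copy": [0, 0, 0, 1, 1, 1, 1, 1],
    "b": [0, 0, 1, 1, 0, 0, 1, 0],
}
toy_target = [0, 0, 1, 1, 2, 2, 3, 3]
top_two = mrmr(toy_features, toy_target, k=2)
print(top_two)
```

On this toy data the near-duplicate column is penalized for its overlap with `a`, so the second pick is `b` even though `a_copy` and `b` are about equally relevant to the target on their own.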

5. Automatic Speech Emotion Recognition: a Systematic Literature Review.

Author: Mustafa, Haidy H., Darwish, Nagy R., and Hefny, Hesham A.
Subjects: PATTERN recognition systems, ARTIFICIAL intelligence, FEATURE selection, FEATURE extraction, AUTOMATIC speech recognition, HUMAN-computer interaction
Abstract: Automatic Speech Emotion Recognition (ASER) has recently garnered attention across various fields including artificial intelligence, pattern recognition, and human–computer interaction. However, ASER encounters numerous challenges such as a shortage of diverse datasets, appropriate feature selection, and suitable intelligent recognition techniques. To address these challenges, a systematic literature review (SLR) was conducted following established guidelines. A total of 60 primary research papers spanning from 2011 to 2023 were reviewed to investigate, interpret, and analyze the related literature by addressing five key research questions. Despite being an emerging area with applications in real-life scenarios, ASER still grapples with limitations in existing techniques. This SLR provides a comprehensive overview of existing techniques, datasets, and feature extraction tools in the ASER domain, shedding light on the weaknesses of current research studies. Additionally, it outlines a list of limitations for consideration in future work. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

6. Research on seamount substrate classification method based on machine learning.

Author: DeXiang Huang, YongFu Sun, Wei Gao, WeiKun Xu, Wei Wang, YiXin Zhang, and Lei Wang
Subjects: MACHINE learning, OCEAN mining, OCEANOGRAPHIC submersibles, MINES & mineral resources, CLASSIFICATION, NATURAL resources
Abstract: The western Pacific seamount area is abundant in both biological and mineral resources, making it a crucial location for international investigation of regional seabed resources. An essential stage in comprehending and advancing seamounts is gaining knowledge about the distribution characteristics and laws governing the seabed substrate. Deep-sea geological sampling is challenging because of the intricate nature of the deep-sea environment, resulting in increased difficulty in identifying and evaluating substrates. This study addresses the aforementioned issues by utilizing in-situ video footage obtained from the "Jiaolong" manned deep submersible and shipborne deep-water multibeam data. This data is used as a foundation for constructing a Western Pacific seamount areas substrate classification point set. Additionally, the paper introduces the mRMR-XGBoost substrate classification model. Substrate categorization in deep sea and mountainous regions has been successfully accomplished, yielding a classification accuracy of 92.5%. The classification experiments and box sampling results demonstrate that the mRMR-XGBoost substrate classification model proposed in this paper can efficiently use acoustic and optical data to accurately divide the substrate types in seamount areas, with better classification accuracy, when compared with commonly used machine learning models. It has a significant application value and the best classification effect on the two types of substrates: nodules and gravel substrates. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

7. Research on User Default Prediction Algorithm Based on Adjusted Homogenous and Heterogeneous Ensemble Learning.

Author: Lu, Yao, Wang, Kui, Sun, Hui, Qu, Hanwen, Chen, Jiajia, Liu, Wei, and Chang, Chenjie
Subjects: DEFAULT (Finance), FORECASTING, FEATURE selection, ALGORITHMS, CREDIT risk, ECONOMETRIC models, MACHINE learning, GREEN technology
Abstract: In the field of risk assessment, the traditional econometric models are generally used to assess credit risk. And with the introduction of the "dual-carbon" goals to promote the development of a low-carbon economy, the scale of green credit in China has rapidly expanded. But with the advent of the big data era, due to the poor interpretability of a traditional single machine learning model, it is difficult to capture nonlinear relationships, and there are shortcomings in prediction accuracy and robustness. This paper selects the adjusted ensemble learning model based on the homogeneous and heterogeneous factors for user default prediction, which can efficiently process large quantities of high-dimensional data. This article adjusts each model to adapt to the task and innovatively compares various models. In this paper, the missing value filling method, feature selection, and ensemble model are studied and discussed, and the optimal ensemble model is obtained. When comparing the predictions of single models and ensemble models, the accuracy, sensitivity, specificity, F1-Score, Kappa, and MCC of Categorical Features Gradient Boosting (CatBoost) and Random undersampling Boosting (RUSBoost) all reach 100%. The experimental results prove that the algorithm based on adjusted homogeneous and heterogeneous ensemble learning can predict the user default efficiently and accurately. This paper also provides some references for establishing a risk assessment index system. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

8. Traffic Feature Selection and Distributed Denial of Service Attack Detection in Software-Defined Networks Based on Machine Learning.

Author: Han, Daoqi, Li, Honghui, Fu, Xueliang, and Zhou, Shuncheng
Subjects: DENIAL of service attacks, INTRUSION detection systems (Computer security), FEATURE selection, SOFTWARE-defined networking, MACHINE learning, COMPUTER network traffic, INTELLIGENT transportation systems
Abstract: As 5G technology becomes more widespread, the significant improvement in network speed and connection density has introduced more challenges to network security. In particular, distributed denial of service (DDoS) attacks have become more frequent and complex in software-defined network (SDN) environments. The complexity and diversity of 5G networks result in a great deal of unnecessary features, which may introduce noise into the detection process of an intrusion detection system (IDS) and reduce the generalization ability of the model. This paper aims to improve the performance of the IDS in 5G networks, especially in terms of detection speed and accuracy. It proposes an innovative feature selection (FS) method to filter out the most representative and distinguishing features from network traffic data to improve the robustness and detection efficiency of the IDS. To confirm the suggested method's efficacy, this paper uses four common machine learning (ML) models to evaluate the InSDN, CICIDS2017, and CICIDS2018 datasets and conducts real-time DDoS attack detection on the simulation platform. According to experimental results, the suggested FS technique may match 5G network requirements for high speed and high reliability of the IDS while also drastically cutting down on detection time and preserving or improving DDoS detection accuracy. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

9. Resource Scheduling in URLLC and eMBB Coexistence Based on Dynamic Selection Numerology.

Author: Wang, Lei, Tao, Sijie, Zhao, Lindong, Zhou, Dengyou, Liu, Zhe, and Sun, Yanbing
Subjects: DEEP reinforcement learning, REINFORCEMENT learning, WIRELESS Internet, RESOURCE allocation, SIMULATION software, FEATURE selection
Abstract: This paper focuses on the resource allocation problem of multiplexing two different service scenarios, enhanced mobile broadband (eMBB) and ultrareliable low latency (URLLC) in 5G New Radio, based on dynamic numerology structure, mini-time slot scheduling, and puncturing to achieve optimal resource allocation. To obtain the optimal channel resource allocation under URLLC user constraints, this paper establishes a relevant channel model divided into two convex optimization problems: (a) eMBB resource allocation and (b) URLLC scheduling. We also determine the numerology values at the beginning of each time slot with the help of deep reinforcement learning to achieve flexible resource scheduling. The proposed algorithm is verified in simulation software, and the simulation results show that the dynamic selection of numerologies proposed in this paper can better improve the data transmission rate of eMBB users and reduce the latency of URLLC services compared with the fixed numerology scheme for the same URLLC packet arrival, while the reasonable resource allocation ensures the reliability of URLLC and eMBB communication. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

10. A model for skin cancer using combination of ensemble learning and deep learning.

Author: Hosseinzadeh, Mehdi, Hussain, Dildar, Zeki Mahmood, Firas Muhammad, A. Alenizi, Farhan, Varzeghani, Amirhossein Noroozi, Asghari, Parvaneh, Darwesh, Aso, Malik, Mazhar Hussain, and Lee, Sang-Woong
Subjects: SKIN cancer, DEEP learning, FEATURE selection, MACHINE learning, RANDOM forest algorithms, SURVIVAL rate
Abstract: Skin cancer has a significant impact on the lives of many individuals annually and is recognized as the most prevalent type of cancer. In the United States, an estimated annual incidence of approximately 3.5 million people receiving a diagnosis of skin cancer underscores its widespread prevalence. Furthermore, the prognosis for individuals afflicted with advancing stages of skin cancer experiences a substantial decline in survival rates. This paper is dedicated to aiding healthcare experts in distinguishing between benign and malignant skin cancer cases by employing a range of machine learning and deep learning techniques and different feature extractors and feature selectors to enhance the evaluation metrics. In this paper, different transfer learning models are employed as feature extractors, and to enhance the evaluation metrics, a feature selection layer is designed, which includes diverse techniques such as Univariate, Mutual Information, ANOVA, PCA, XGB, Lasso, Random Forest, and Variance. Among transfer models, DenseNet-201 was selected as the primary feature extractor to identify features from data. Subsequently, the Lasso method was applied for feature selection, utilizing diverse machine learning approaches such as MLP, XGB, RF, and NB. To optimize accuracy and precision, ensemble methods were employed to identify and enhance the best-performing models. The study provides accuracy and sensitivity rates of 87.72% and 92.15%, respectively. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

11. IGA-SOMK++: a new clustering method for constructing web user profiles of older adults in China

Author: Li, Yue, Liu, Chengqi, Hu, Xinyue, Qi, Jianfang, and Chen, Gong
Published: 2024
Full Text: View/download PDF

12. Classification and spatio-temporal evolution analysis of coastal wetlands in the Liaohe Estuary from 1985 to 2023: based on feature selection and sample migration methods.

Author: Lina Ke, Qin Tan, Yao Lu, Quanming Wang, Guangshuai Zhang, Yu Zhao, and Lei Wang
Abstract: Coastal wetlands are important areas with valuable natural resources and diverse biodiversity. Due to the influence of both natural factors and human activities, the landscape of coastal wetlands undergoes significant changes. It is crucial to systematically monitor and analyze the dynamic changes in coastal wetland cover over a long-term time series. In this paper, a long-term time series coastal wetland remote sensing classification process was proposed, which integrated feature selection and sample migration. Utilizing Google Earth Engine (GEE) and Landsat TM/ETM/OLI remote sensing image data, the selected feature set is combined with the sample migration method to generate the training sample set for each target year. The Simple Non-Iterative Clustering-Random Forest (SNIC-RF) model was ultimately employed to accurately map wetland classes in the Liaohe Estuary from 1985 to 2023 and quantitatively evaluate the spatio-temporal pattern change characteristics of wetlands in the study area. The findings indicate that: (1) After feature selection, the accuracy of the model reached 0.88, and the separation of the selected feature set was good. (2) After sample migration, the overall accuracy of sample classification in the target year ranged from 87 to 94%, along with Kappa coefficients of 0.84 to 0.92, thereby ensuring the validity of classification sample migration. (3) SNIC-RF classification results showed better performance of wetland landscape. Compared with RF classification, the overall classification accuracy was increased by 0.69-5.82%, and the Kappa coefficient was increased by 0.0087-0.0751. (4) From 1985 to 2023, there has been a predominant trend of natural wetlands being converted into artificial wetlands. In recent years, this transition has occurred more gently. Finally, this study offers valuable insights into understanding changes and trends in the surface ecological environment of the Liaohe Estuary. The research method can be extended to other types of wetland classification and the comprehensive application of coastal wetland in hydrology, ecology, meteorology, soil, and environment can be further explored on the basis of this research, laying strong groundwork for shaping policies on ecological protection and restoration. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

13. SEDAT: A Stacked Ensemble Learning-Based Detection Model for Multiscale Network Attacks.

Author: Feng, Yan, Yang, Zhihai, Sun, Qindong, and Liu, Yanxiao
Subjects: COMPUTER network traffic, DISTRIBUTION (Probability theory), COMPUTER network security, ANOMALY detection (Computer security), FEATURE selection
Abstract: Anomaly detection for network traffic aims to analyze the characteristics of network traffic in order to discover unknown attacks. Currently, existing detection methods have achieved promising results against high-intensity attacks that aim to interrupt the operation of the target system. In reality, attack behaviors that are commonly exhibited are highly concealed and disruptive. In addition, the attack scales are flexible and variable. In this paper, we construct a multiscale network intrusion behavior dataset, which includes three attack scales and two multiscale attack patterns based on probability distribution. Specifically, we propose a stacked ensemble learning-based detection model for anomalous traffic (or SEDAT for short) to defend against highly concealed multiscale attacks. The model employs a random forest (RF)-based method to select features and introduces multiple base learning autoencoders (AEs) to enhance the representation of multiscale attack behaviors. In addressing the challenge of a single model's inability to capture the regularities of multiscale attack behaviors, SEDAT is capable of adapting to the complex multiscale characteristics in network traffic, enabling the prediction of network access behavior. Comparative experiments demonstrate that SEDAT exhibits superior detection capabilities in multiscale network attacks. In particular, SEDAT achieves an improvement of at least 5% accuracy over baseline methods for detecting multiscale attacks. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

14. Inversion Method for Transformer Winding Hot Spot Temperature Based on Gated Recurrent Unit and Self-Attention and Temperature Lag.

Author: Hao, Yuefeng, Zhang, Zhanlong, Liu, Xueli, Yang, Yu, and Liu, Jun
Subjects: TRANSFORMER models, FEATURE selection, INVERSION (Geophysics), TEMPERATURE
Abstract: The hot spot temperature of transformer windings is an important indicator for measuring insulation performance, and its accurate inversion is crucial to ensure the timely and accurate fault prediction of transformers. However, existing studies mostly directly input obtained experimental or operational data into networks to construct data-driven models, without considering the lag between temperatures, which may lead to the insufficient accuracy of the inversion model. In this paper, a method for inverting the hot spot temperature of transformer windings based on the SA-GRU model is proposed. Firstly, temperature rise experiments are designed to collect the temperatures of the entire side and top of the transformer tank, top oil temperature, ambient temperature, the cooling inlet and outlet temperatures, and winding hot spot temperature. Secondly, experimental data are integrated, considering the lag of the data, to obtain candidate input feature parameters. Then, a feature selection algorithm based on mutual information (MI) is used to analyze the correlation of the data and construct the optimal feature subset to ensure the maximum information gain. Finally, Self-Attention (SA) is applied to optimize the Gated Recurrent Unit (GRU) network, establishing the SA-GRU model to perceive the potential patterns between output feature parameters and input feature parameters, achieving the precise inversion of the hot spot temperature of the transformer windings. The experimental results show that considering the lag of the data can more accurately invert the hot spot temperature of the windings. The inversion method proposed in this paper can reduce redundant input features, lower the complexity of the model, accurately invert the changing trend of the hot spot temperature, and achieve higher inversion accuracy than other classical models, thereby obtaining better inversion results. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
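
Note on record 14: the abstract stresses the time lag between measured temperatures and the winding hot spot temperature. As a rough illustration of what "considering the lag" can mean, the sketch below shifts one series against another and keeps the shift with the strongest linear association. It uses Pearson correlation as a simpler stand-in for the mutual information criterion named in the abstract, and the two temperature series are invented.

```python
import math
from typing import Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def best_lag(
    source: Sequence[float], target: Sequence[float], max_lag: int
) -> Tuple[int, float]:
    """Return (lag, r) such that source[t - lag] correlates best with target[t]."""
    best = (0, pearson(source, target))
    for lag in range(1, max_lag + 1):
        # Drop the last `lag` source samples and the first `lag` target samples
        # so that source[t - lag] lines up with target[t].
        r = pearson(source[:-lag], target[lag:])
        if abs(r) > abs(best[1]):
            best = (lag, r)
    return best


# Hypothetical readings: the second series follows the first two steps later.
measured = [20, 22, 26, 32, 40, 38, 34, 28, 24, 22]
hot_spot = [30, 30, 30, 32, 36, 42, 50, 48, 44, 38]
lag, r = best_lag(measured, hot_spot, max_lag=3)
print(lag, round(r, 3))
```

A lagged copy of the input chosen this way would then be one candidate feature for a later selection step.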

15. Real-Time RGBT Target Tracking Based on Attention Mechanism.

Author: Zhao, Qian, Liu, Jun, Wang, Junjia, and Xiong, Xingzhong
Subjects: ARTIFICIAL satellite tracking, FEATURE selection, INFRARED imaging, THERMOGRAPHY
Abstract: The fusion tracking of RGB and thermal infrared image (RGBT) has attracted widespread interest within target tracking by leveraging the complementing benefits of information from both visible and thermal infrared modalities, but achieving robustness while operating in real time remains a challenge. Aimed at this problem, this paper proposes a real-time tracking network based on the attention mechanism, which can improve the tracking speed with a smaller model, and at the same time, introduce the attention mechanism in the module to strengthen the attention to the important features, which can guarantee a certain tracking accuracy. Specifically, the modal features of visible and thermal infrared are extracted separately by using the backbone of the dual-stream structure; then, the important features in the two modes are selected and enhanced by using the channel attention mechanism in the feature selection enhancement module (FSEM) and the Transformer, while noise is reduced by using gating circuits. Finally, the final enhancement fusion is performed by using the spatial channel adaptive adjustment fusion module (SCAAM) in both the spatial and channel dimensions. The PR/SR of the proposed algorithm tested on the GTOT, RGBT234 and LasHeR datasets are 90.0%/73.0%, 84.4%/60.2%, and 46.8%/34.3%, respectively, and generally good tracking accuracy has been achieved, with a speed of up to 32.3067 fps, meeting the model's real-time requirement. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

16. XGBoost-Enhanced Graph Neural Networks: A New Architecture for Heterogeneous Tabular Data.

Author: Yan, Liuxi and Xu, Yaoqun
Subjects: GRAPH neural networks, ARTIFICIAL neural networks, DATA structures, FEATURE selection, MACHINE learning, PATTERN matching
Abstract: Graph neural networks (GNNs) perform well in text analysis tasks. Their unique structure allows them to capture complex patterns and dependencies in text, making them ideal for processing natural language tasks. At the same time, XGBoost (version 1.6.2) outperforms other machine learning methods on heterogeneous tabular data. However, traditional graph neural networks mainly study isomorphic and sparse data features. Therefore, when dealing with tabular data, traditional graph neural networks encounter challenges such as data structure mismatch, feature selection, and processing difficulties. To solve these problems, we propose a novel architecture, XGNN, which combines the advantages of XGBoost and GNNs to deal with heterogeneous features and graph structures. In this paper, we use GAT for our graph neural network model. We can train XGBoost and GNN end-to-end to fit and adjust the new tree in XGBoost based on the gradient information from the GNN. Extensive experiments on node prediction and node classification tasks demonstrate that the performance of our proposed new model is significantly improved for both prediction and classification tasks and performs particularly well on heterogeneous tabular data. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

17. Machine Learning Model Development to Predict Power Outage Duration (POD): A Case Study for Electric Utilities.

Author: Ghasemkhani, Bita, Kut, Recep Alp, Yilmaz, Reyat, Birant, Derya, Arıkök, Yiğit Ahmet, Güzelyol, Tugay Eren, and Kut, Tuna
Subjects: ELECTRIC utilities, ELECTRIC power failures, FEATURE selection, ELECTRIC power distribution grids, PUBLIC utilities, MACHINE learning, BOOSTING algorithms
Abstract: In the face of increasing climate variability and the complexities of modern power grids, managing power outages in electric utilities has emerged as a critical challenge. This paper introduces a novel predictive model employing machine learning algorithms, including decision tree (DT), random forest (RF), k-nearest neighbors (KNN), and extreme gradient boosting (XGBoost). Leveraging historical sensors-based and non-sensors-based outage data from a Turkish electric utility company, the model demonstrates adaptability to diverse grid structures, considers meteorological and non-meteorological outage causes, and provides real-time feedback to customers to effectively address the problem of power outage duration. Using the XGBoost algorithm with the minimum redundancy maximum relevance (MRMR) feature selection attained 98.433% accuracy in predicting outage durations, better than the state-of-the-art methods showing 85.511% accuracy on average over various datasets, a 12.922% improvement. This paper contributes a practical solution to enhance outage management and customer communication, showcasing the potential of machine learning to transform electric utility responses and improve grid resilience and reliability. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
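
Note on record 17: the "12.922% improvement" quoted in the abstract matches the difference between the two accuracies, that is, a gain in percentage points. The short check below reproduces that figure from the two numbers in the abstract and, for comparison, computes the relative gain over the baseline. The relative figure is not stated in the abstract and the variable names are illustrative.

```python
# Accuracies quoted in the abstract, in percent.
xgb_mrmr_accuracy = 98.433
baseline_accuracy = 85.511

# Absolute gain in percentage points (the figure the abstract reports).
gain_points = round(xgb_mrmr_accuracy - baseline_accuracy, 3)

# Relative gain over the baseline, in percent (not reported in the abstract).
relative_gain = round(
    100 * (xgb_mrmr_accuracy - baseline_accuracy) / baseline_accuracy, 2
)

print(gain_points, relative_gain)
```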

18. Wireless Mouth Motion Recognition System Based on EEG-EMG Sensors for Severe Speech Impairments.

Author: Moon, Kee S., Kang, John S., Lee, Sung Q., Thompson, Jeff, and Satterlee, Nicholas
Subjects: SPEECH, SIGNAL processing, BIOMEDICAL signal processing, FEATURE selection, MOTION detectors, PEOPLE with paralysis, AUTOMATIC speech recognition
Abstract: This study aims to demonstrate the feasibility of using a new wireless electroencephalography (EEG)–electromyography (EMG) wearable approach to generate characteristic EEG-EMG mixed patterns with mouth movements in order to detect distinct movement patterns for severe speech impairments. This paper describes a method for detecting mouth movement based on a new signal processing technology suitable for sensor integration and machine learning applications. This paper examines the relationship between the mouth motion and the brainwave in an effort to develop nonverbal interfacing for people who have lost the ability to communicate, such as people with paralysis. A set of experiments were conducted to assess the efficacy of the proposed method for feature selection. It was determined that the classification of mouth movements was meaningful. EEG-EMG signals were also collected during silent mouthing of phonemes. A few-shot neural network was trained to classify the phonemes from the EEG-EMG signals, yielding classification accuracy of 95%. This technique in data collection and processing bioelectrical signals for phoneme recognition proves a promising avenue for future communication aids. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

19. Digital Visual Design Reengineering and Application Based on K-means Clustering Algorithm.

Author: Lijie Ren and Hyunsuk Kim
Subjects: K-means clustering, BEES algorithm, FEATURE selection, OPTIMIZATION algorithms, ALGORITHMS, FEATURE extraction, STANDARD deviations, CURVES
Abstract: INTRODUCTION: The article discusses the key steps in digital visual design reengineering, with a special emphasis on the importance of information decoding and feature extraction for flat cultural heritage. These processes not only minimize damage to the aesthetic heritage itself but also feature high quality, efficiency, and recyclability. OBJECTIVES: The aim of the article is to explore the issues of gene extraction methods in digital visual design reengineering, proposing a visual gene extraction method through an improved K-means clustering algorithm. METHODS: A visual gene extraction method based on an improved K-means clustering algorithm is proposed. Initially analyzing the digital visual design reengineering process, combined with a color extraction method using the improved JSO algorithm-based K-means clustering algorithm, a gene extraction and clustering method for digital visual design reengineering is proposed and validated through experiments. RESULTS: The results show that the proposed method improves the accuracy, robustness, and real-time performance of clustering. Through comparative analysis with Dunhuang murals, the effectiveness of the color extraction method based on the K-means-JSO algorithm in the application of digital visual design reengineering is verified. The method based on the K-means-GWO algorithm performs best in terms of average clustering time and standard deviation. The optimization curve of color extraction based on the K-means-JSO algorithm converges faster and with better accuracy compared to the K-means-ABC, K-means-GWO, K-means-DE, K-means-CMAES, and K-means-WWCD algorithms. CONCLUSION: The color extraction method of the K-means clustering algorithm improved by the JSO algorithm proposed in this paper solves the problems of insufficient standardization in feature selection, lack of generalization ability, and inefficiency in visual gene extraction methods. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

20. Retrieval of High-Frequency Temperature Profiles by FY-4A/GIIRS Based on Generalized Ensemble Learning.

Author: Gen WANG, Wei HAN, Song YUAN, Jing WANG, Ruo-Ying YIN, Song YE, and Feng XIE
Subjects: GEOSTATIONARY satellites, MACHINE learning, ATMOSPHERIC temperature, FEATURE selection, RANDOM forest algorithms, WEATHER forecasting
Abstract: The temperature profile is an important parameter of the atmospheric thermal state in atmospheric monitoring and weather forecasting. The hyperspectral infrared sounder of a geostationary satellite provides abundant spectral information and can retrieve the temperature profile. Based on the mediumwave channel data (independent variable and model input data) of FY-4A/GIIRS (geosynchronous interferometric infrared sounder) and ERA5 reanalysis data (dependent variable and model output data), the atmospheric temperature profile is retrieved by generalized ensemble learning. Firstly, the feature variables of the model are constructed. Because there are many GIIRS channels, a two-step feature selection method is adopted: step 1--establish a blacklist of GIIRS channels; step 2--select feature variables by using the method of importance permutation. Secondly, they are integrated based on optimizing and adjusting the hyperparameters of three basic machine learning models (Random Forest, XGBoost and LightGBM). Generalized ensemble learning nonlinear convex optimization is used to optimize the weight of each basic model. Finally, based on high-frequency GIIRS observations of Typhoon Lekima and Typhoon Higos, testing and method evaluation of the temperature profile retrievals are carried out. The results show that LightGBM achieves the best retrieval result among the three basic models, followed by Random Forest and finally XGBoost. The root-mean-square error of the whole temperature profile in the training dataset of generalized ensemble learning is less than 0.3 K, while that of the testing dataset is less than 1.4 K, and that between 150 hPa and 925 hPa is less than 1 K. The retrieval results correlate well with the radiosonde temperature profile. The performance of generalized ensemble learning is better than the performances of the three basic models, but it depends on the retrieval results of LightGBM. In the Lekima experimental case, compared to other channels selected for temperature retrieval models, the importance of mediumwave channels 9 and 307 of GIIRS ranks first and second, respectively. The method in this paper provides a new solution and technical support for retrieving atmospheric parameters from hyperspectral and other satellite data. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

21. Wind turbine fault detection and isolation robust against data imbalance using KNN.

Author: Fazli, Ali and Poshtan, Javad
Subjects: WIND turbines, K-nearest neighbor classification, SUPERVISORY control & data acquisition systems, FEATURE selection, FALSE alarms
Abstract: Due to the difficulties of system modeling, nonlinearity effects, uncertainties, and the availability of Wind Turbines (WTs) SCADA system data, data‐driven Fault Detection and Isolation (FDI) methods for WTs have received increasing attention. In this paper, using the wind turbine SCADA data, an effective FDI scheme is proposed using the K‐Nearest Neighbors (KNN) classifier. The operational data set is labeled by the status and warning data sets, and the labeled operational data set, after eliminating invalid data, feature selection, and standardization, is used for training and validation of the FDI model. Data imbalance, which is common in real data sets, does not affect the performance of the proposed method, hence there is no need for data balancing methods in this algorithm and the performance is not deteriorated by occurring false alarms. Therefore, the proposed method has provided impressive performance in FDI compared with previous research on this data set. Also, many of the fault classes addressed in this paper were not considered in previous works on this data set. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

22. Multisource High-Resolution Remote Sensing Image Vegetation Extraction with Comprehensive Multifeature Perception.

Author: Li, Yan, Min, Songhan, Song, Binbin, Yang, Hui, Wang, Biao, and Wu, Yongchuang
Subjects: REMOTE sensing, VEGETATION monitoring, REMOTE-sensing images, FEATURE selection, FEATURE extraction, RANDOM forest algorithms, DATA extraction
Abstract: High-resolution remote sensing image-based vegetation monitoring is a hot topic in remote sensing technology and applications. However, when facing large-scale monitoring across different sensors in broad areas, the current methods suffer from fragmentation and weak generalization capabilities. To address this issue, this paper proposes a multisource high-resolution remote sensing image-based vegetation extraction method that considers the comprehensive perception of multiple features. First, this method utilizes a random forest model to perform feature selection for the vegetation index, selecting an index that enhances the otherness between vegetation and other land features. Based on this, a multifeature synthesis perception convolutional network (MSCIN) is constructed, which enhances the extraction of multiscale feature information, global information interaction, and feature cross-fusion. The MSCIN network simultaneously constructs dual-branch parallel networks for spectral features and vegetation index features, strengthening multiscale feature extraction while reducing the loss of detailed features by simplifying the dense connection module. Furthermore, to facilitate global information interaction between the original spectral information and vegetation index features, a dual-path multihead cross-attention fusion module is designed. This module enhances the differentiation of vegetation from other land features and improves the network's generalization performance, enabling vegetation extraction from multisource high-resolution remote sensing data. To validate the effectiveness of this method, we randomly selected six test areas within Anhui Province and compared the results with three different data sources and other typical methods (NDVI, RFC, OCBDL, and HRNet). The results demonstrate that the MSCIN method proposed in this paper, under the premise of using only GF2 satellite images as samples, exhibits robust accuracy in extraction results across different sensors. It overcomes the rapid degradation of accuracy observed in other methods with various sensors and addresses issues such as internal fragmentation, false positives, and false negatives caused by sample generalization and image diversity. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

23. Meticulous Review: Cutting-Edge Cervix Cancer Stratification Using Image Processing And Machine Learning.

Author: Bhavsar, Barkha and Shrimali, Bela
Subjects: IMAGE processing, MACHINE learning, CERVICAL cancer, DEEP learning, CELL nuclei, CANCER education
Abstract: Cervical cancer has been among the most common cancers found in women in developing countries for many years. Classification of cervical cancer through a traditional microscopic approach is a monotonous and prolonged task. Most of the time hospital doctors cannot identify the cancer cells as sometimes the nucleus of a cell, which contains the genetic material (DNA), is typically very small and often not visible to the naked eye. Due to the different perspectives of doctors, cancer stages are classified falsely, which leads to low recovery and late medication. The use of Image Processing and Machine Learning technologies can reduce misclassification and inaccurate prediction. Although many deep learning techniques are available for cervical cancer cell detection and classification, the performance of such techniques for prediction and classification with real and sample datasets is the main challenge. In this paper, we did a thorough state-of-the-art review of the available current literature. The objective of this paper is to bring forth in-depth knowledge to novice researchers with a thorough understanding of the architecture of the computer-assisted classification process. The current literature is studied, analyzed, and discussed with their approaches, results, and methodologies. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

24. A Review of Machine Learning's Role in Cardiovascular Disease Prediction: Recent Advances and Future Challenges.

Author: Naser, Marwah Abdulrazzaq, Majeed, Aso Ahmed, Alsabah, Muntadher, Al-Shaikhli, Taha Raad, and Kaky, Kawa M.
Subjects: MACHINE learning, CARDIOVASCULAR diseases, ARTIFICIAL intelligence, EARLY diagnosis, TREATMENT delay (Medicine)
Abstract: Cardiovascular disease is the leading cause of global mortality and responsible for millions of deaths annually. The mortality rate and overall consequences of cardiac disease can be reduced with early disease detection. However, conventional diagnostic methods encounter various challenges, including delayed treatment and misdiagnoses, which can impede the course of treatment and raise healthcare costs. The application of artificial intelligence (AI) techniques, especially machine learning (ML) algorithms, offers a promising pathway to address these challenges. This paper emphasizes the central role of machine learning in cardiac health and focuses on precise cardiovascular disease prediction. In particular, this paper is driven by the urgent need to fully utilize the potential of machine learning to enhance cardiovascular disease prediction. In light of the continued progress in machine learning and the growing public health implications of cardiovascular disease, this paper aims to offer a comprehensive analysis of the topic. This review paper encompasses a wide range of topics, including the types of cardiovascular disease, the significance of machine learning, feature selection, the evaluation of machine learning models, data collection & preprocessing, evaluation metrics for cardiovascular disease prediction, and the recent trends & suggestion for future works. In addition, this paper offers a holistic view of machine learning's role in cardiovascular disease prediction and public health. We believe that our comprehensive review will contribute significantly to the existing body of knowledge in this essential area. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

25. Keyword Pool Generation for Web Text Collecting: A Framework Integrating Sample and Semantic Information.

Author: Wu, Xiaolong, Feng, Chong, Li, Qiyuan, and Zhu, Jianping
Subjects: REGRESSION analysis, TEXT mining, KEYWORD searching, SAMPLE size (Statistics)
Abstract: Keyword pools are used as search queries to collect web texts, largely determining the size and coverage of the samples and providing a data basis for subsequent text mining. However, how to generate a refined keyword pool with high similarity and some expandability is a challenge. Currently, keyword pools for search queries aimed at collecting web texts either lack an objective generation method and evaluation system, or have a low utilization rate of sample semantic information. Therefore, this paper proposed a keyword generation framework that integrates sample and semantic information to construct a complete and objective keyword pool generation and evaluation system. The framework includes a data phase and a modeling phase, and its core is in the modeling phase, where both feature ranking and model performance are considered. A regression model about a topic vector and word vectors is constructed for the first time based on word embedding, and keyword pools are generated from the perspective of model performance. In addition, two keyword generation methods, Recursive Feature Introduction (RFI) and Recursive Feature Introduction and Elimination (RFIE), are also proposed in this paper. Different feature ranking algorithms, keyword generation methods and regression models are compared in the experiments. The results show that: (1) When using RFI to generate keywords, the regression model using ranked features has better prediction performance than the baseline model, and the generated keywords are more refined, and the prediction performance of the regression model using tree-based ranked features is significantly better than that of the one using SHAP-based ranked features. (2) The prediction performance of the regression model using RFI with tree-based ranked features is significantly better than that using Recursive Feature Elimination (RFE) with tree-based ranked features. (3) All four regression models using RFI/RFE with SHAP-based/tree-based ranked features have significantly higher average similarity scores and cumulative advantages than the baseline model (the model using RFI with unranked features). (4) Light Gradient Boosting Machine (LGBM) using RFI with SHAP-based ranked features has significantly better prediction performance, higher average similarity scores, and cumulative advantages. In conclusion, our framework can generate a keyword pool that is more similar to the topic, and more refined and expandable, which provides certain research ideas for expanding the research sample size while ensuring the coverage of topics in web text collecting. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

26. Waste Material Classification: A Short-Wave Infrared Discrete-Light-Source Approach Based on Light-Emitting Diodes.

Author: Manakkakudy, Anju, De Iacovo, Andrea, Maiorana, Emanuele, Mitri, Federica, and Colace, Lorenzo
Subjects: WASTE products, LIGHT emitting diodes, RECYCLING management, WASTE management, PLASTIC scrap
Abstract: Waste material classification is a challenging yet important task in waste management. The realization of low-cost waste classification systems and methods is critical to meet the ever-increasing demand for efficient waste management and recycling. In this paper, we demonstrate a simple, compact and low-cost classification system based on optical reflectance measurements in the short-wave infrared for the segregation of waste materials such as plastics, paper, glass, and aluminium. The system comprises a small set of LEDs and one single broadband photodetector. All devices are controlled through low-cost and low-power electronics, and data are gathered and managed via a computer interface. The proposed system reaches accuracy levels as high as 94.3% when considering seven distinct materials and 97.0% when excluding the most difficult to classify, thus representing a valuable proof-of-concept for future system developments. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

27. Applying modified golden jackal optimization to intrusion detection for Software-Defined Networking.

Author: Qiu, Feng, Xu, Hui, and Li, Fukui
Subjects: SOFTWARE-defined networking, ALGORITHMS, INTRUSION detection systems (Computer security), INDEXES, COMPUTER network architectures
Abstract: As a meta-heuristic algorithm, the Golden Jackal Optimization (GJO) algorithm has been widely used in traditional network intrusion detection due to its ease of use and high efficiency. This paper aims to extend its application to the emerging field of Software-Defined Networking (SDN), which is a new network architecture. To adapt the GJO for SDN intrusion detection, a modified Golden Jackal Optimization (mGJO) is proposed to enhance its performance with the use of two strategies. First, an Elite Dynamic Opposite Learning strategy operates during each iteration to find solutions opposite to the current global optimal solutions, which increases population diversity. Second, an updating strategy based on the Golden Sine II Algorithm is utilized in the exploitation phase to update the position information of the golden jackal pairs, which accelerates the search for the best feature subset indexes. To validate the feasibility of the mGJO algorithm, this paper first assesses its optimization capability using benchmark test functions. Then, four UCI datasets and the NSL-KDD dataset are used to test the classification capability of the mGJO algorithm and its application in traditional network intrusion detection. Furthermore, the InSDN dataset is used to validate the feasibility of the mGJO algorithm for SDN intrusion detection. The experimental results show that, when the mGJO algorithm is applied to SDN for intrusion detection, the various indexes of classification and the selection of feature subsets achieve better results. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

28. Research on Ensemble Learning-Based Feature Selection Method for Time-Series Prediction.

Author: Huang, Da, Liu, Zhaoguo, and Wu, Dan
Subjects: SELECTION bias (Statistics), FINANCIAL markets, PREDICTION models, FEATURE selection, FORECASTING
Abstract: Feature selection has perennially stood as a pivotal concern in the realm of time-series forecasting due to its direct influence on the efficacy of predictive models. Conventional approaches to feature selection predominantly rely on domain knowledge and experiential insights and are, therefore, susceptible to individual subjectivity and the resultant inconsistencies in the outcomes. Particularly in domains such as financial markets, and within datasets comprising time-series information, an abundance of features adds complexity, necessitating adept handling of high-dimensional data. The computational expenses associated with traditional methodologies in managing such data dimensions, coupled with vulnerability to the curse of dimensionality, further compound the challenges at hand. In response to these challenges, this paper advocates for an innovative approach—a feature selection method grounded in ensemble learning. The paper explicitly delineates the formal integration of ensemble learning into feature selection, guided by the overarching principle of "good but different". To operationalize this concept, five feature selection methods that are well suited to ensemble learning were identified, and their respective weights were determined through K-fold cross-validation when applied to specific datasets. This ensemble method amalgamates the outcomes of diverse feature selection techniques into a numeric composite, thereby mitigating potential biases inherent in traditional methods and elevating the precision and comprehensiveness of feature selection. Consequently, this method improves the performance of time-series prediction models. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
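
Illustrative note (not from the paper): the abstract above describes merging the outputs of several feature selection methods into one numeric composite, with per-method weights obtained from K-fold cross-validation. The Python sketch below shows one simple way such a weighted combination could work. The function names `min_max`, `combine_scores`, and `top_k`, the min-max normalization, and the example weights are all assumptions; the abstract does not name the five methods or give the weighting formula.

```python
def min_max(scores):
    """Rescale a list of scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def combine_scores(method_scores, weights):
    """Weighted average of normalized per-feature scores.

    method_scores: dict mapping method name -> list of scores, one per feature
    weights: dict mapping method name -> non-negative weight
             (hypothetically derived from cross-validation performance)
    """
    total = sum(weights[name] for name in method_scores)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_features = len(next(iter(method_scores.values())))
    combined = [0.0] * n_features
    for name, scores in method_scores.items():
        if len(scores) != n_features:
            raise ValueError("every method must score every feature")
        share = weights[name] / total  # normalize weights to sum to 1
        for i, s in enumerate(min_max(scores)):
            combined[i] += share * s
    return combined


def top_k(combined, k):
    """Indices of the k features with the highest combined score."""
    order = sorted(range(len(combined)), key=lambda i: combined[i], reverse=True)
    return order[:k]
```

Normalizing each method's scores before weighting keeps a method with a large numeric range from dominating the composite; other choices, such as rank-based aggregation, would fit the same description.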

29. MpoxNet: dual-branch deep residual squeeze and excitation monkeypox classification network with attention mechanism.

Author: Jingbo Sun, Baoxi Yuan, Zhaocheng Sun, Jiajun Zhu, Yuxin Deng, Yi Gong, and Yuhe Chen
Subjects: MONKEYPOX, COVID-19 pandemic, NOSOLOGY, CLASSIFICATION, FEATURE extraction
Abstract: While the world struggles to recover from the devastation wrought by the widespread transmission of COVID-19, monkeypox virus has emerged as a new global pandemic threat. In this paper, a high precision and lightweight classification network MpoxNet based on ConvNext is proposed to meet the need of fast and safe detection of monkeypox classification. In this method, a two-branch depth-separable convolution residual Squeeze and Excitation module is designed. This design aims to extract more feature information with two branches, and greatly reduces the number of parameters in the model by using depth-separable convolution. In addition, our method introduces a convolutional attention module to enhance the extraction of key features within the receptive field. The experimental results show that MpoxNet has achieved remarkable results in monkeypox disease classification: the accuracy rate is 95.28%, the precision rate is 96.40%, the recall rate is 93.00%, and the F1-Score is 95.80%. This is significantly better than the current mainstream classification model. It is worth noting that the FLOPS and the number of parameters of MpoxNet are only 30.68% and 31.87% of those of ConvNext-Tiny, indicating that the model has a small computational burden and model complexity while maintaining efficient performance. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

30. Mitigation of Adversarial Attacks in 5G Networks with a Robust Intrusion Detection System Based on Extremely Randomized Trees and Infinite Feature Selection.

Author: Baldini, Gianmarco
Subjects: FEATURE selection, 5G networks, INTRUSION detection systems (Computer security), MACHINE learning, COMMUNICATION infrastructure, DEEP learning
Abstract: Intrusion Detection Systems (IDSs) are an important tool to mitigate cybersecurity threats in the ICT infrastructures. Preferable properties of the IDSs are the optimization of the attack detection accuracy and the minimization of the computing resources and time. A significant portion of IDSs presented in the research literature is based on Machine Learning (ML) and Deep Learning (DL) elements, but they may be prone to adversarial attacks, which may undermine the overall performance of the IDS algorithm. This paper proposes a novel IDS focused on the detection of cybersecurity attacks in 5G networks, which addresses in a simple but effective way two specific adversarial attacks: (1) tampering of the labeled set used to train the ML algorithm, (2) modification of the features in the training data set. The approach is based on the combination of two algorithms, which have been introduced recently in the research literature. The first algorithm is the Extremely Randomized Tree (ERT) algorithm, which enhances the capability of Decision Tree (DT) and Random Forest (RF) algorithms to perform classification in data sets, which are unbalanced and of large size as IDS data sets usually are (legitimate traffic messages are more numerous than attack related messages). The second algorithm is the recently introduced Infinite Feature Selection algorithm, which is used to optimize the choice of the hyper-parameter defined in the approach and improve the overall computing efficiency. The result of the application of the proposed approach on a recently published 5G IDS data set proves its robustness against adversarial attacks with different degrees of severity calculated as the percentage of the tampered data set samples. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

31. Multi-strategy augmented Harris Hawks optimization for feature selection.

Author: Zhao, Zisong, Yu, Helong, Guo, Hongliang, and Chen, Huiling
Subjects: FEATURE selection, OPTIMIZATION algorithms, GLOBAL optimization, INFORMATION sharing, COMMUNICATION strategies, TECHNOLOGY convergence
Abstract: In the context of increasing data scale, contemporary optimization algorithms struggle with cost and complexity in addressing the feature selection (FS) problem. This paper introduces a Harris hawks optimization (HHO) variant, enhanced with a multi-strategy augmentation (CXSHHO), for FS. The CXSHHO incorporates a communication and collaboration strategy (CC) into the baseline HHO, facilitating better information exchange among individuals, thereby expediting algorithmic convergence. Additionally, a directional crossover (DX) component refines the algorithm's ability to thoroughly explore the feature space. Furthermore, the soft-rime strategy (SR) broadens population diversity, enabling stochastic exploration of an extensive decision space and reducing the risk of local optima entrapment. The CXSHHO's global optimization efficacy is demonstrated through experiments on 30 functions from CEC2017, where it outperforms 15 established algorithms. Moreover, the paper presents a novel FS method based on CXSHHO, validated across 18 varied datasets from UCI. The results confirm CXSHHO's effectiveness in identifying subsets of features conducive to classification tasks. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

32. Correction: Yi et al. SFS-AGGL: Semi-Supervised Feature Selection Integrating Adaptive Graph with Global and Local Information. Information 2024, 15, 57.

Author: Yi, Yugen, Zhang, Haoming, Zhang, Ningyi, Zhou, Wei, Huang, Xiaomei, Xie, Gengsheng, and Zheng, Caixia
Subjects: FEATURE selection, TIME complexity
Abstract: This document is a correction notice for an article titled "SFS-AGGL: Semi-Supervised Feature Selection Integrating Adaptive Graph with Global and Local Information." The original publication contained errors in Table 2, Table 3, Table 4, Figure 2, and Figure 7. Additionally, there were mistakes in some equations. The corrected versions of these tables, figures, and equations are provided in the document. The authors state that these corrections do not affect the scientific conclusions of the paper. The document also includes figures and tables related to the paper's content, providing information on computational complexity and derivatives of formulas. The authors of the document are Yugen Yi, Haoming Zhang, Ningyi Zhang, Wei Zhou, Xiaomei Huang, Gengsheng Xie, and Caixia Zheng. [Extracted from the article]
Published: 2024
Full Text: View/download PDF

33. Machine learning model based on radiomics features for AO/OTA classification of pelvic fractures on pelvic radiographs.

Author: Park, Jun Young, Lee, Seung Hwan, Kim, Young Jae, Kim, Kwang Gi, and Lee, Gil Jae
Subjects: PELVIC fractures, NAIVE Bayes classification, RECEIVER operating characteristic curves, RADIOMICS, MACHINE learning, RADIOGRAPHS, FEATURE selection, KEGEL exercises
Abstract: Depending on the degree of fracture, pelvic fracture can be accompanied by vascular damage, and in severe cases, it may progress to hemorrhagic shock. Pelvic radiography can quickly diagnose pelvic fractures, and the Association for Osteosynthesis Foundation and Orthopedic Trauma Association (AO/OTA) classification system is useful for evaluating pelvic fracture instability. This study aimed to develop a radiomics-based machine-learning algorithm to quickly diagnose fractures on pelvic X-ray and classify their instability. The data used were pelvic anteroposterior radiographs of 990 adults over 18 years of age diagnosed with pelvic fractures, and 200 normal subjects. A total of 93 features were extracted based on radiomics: 18 first-order, 24 GLCM, 16 GLRLM, 16 GLSZM, 5 NGTDM, and 14 GLDM features. To improve the performance of machine learning, the feature selection methods RFE, SFS, LASSO, and Ridge were used, and the machine learning models used LR, SVM, RF, XGB, MLP, KNN, and LGBM. Performance measurement was evaluated by area under the curve (AUC) by analyzing the receiver operating characteristic curve. The machine learning model was trained based on the selected features using four feature-selection methods. When the RFE feature selection method was used, the average AUC was higher than that of the other methods. Among them, the combination with the machine learning model SVM showed the best performance, with an average AUC of 0.75±0.06. By obtaining a feature-importance graph for the combination of RFE and SVM, it is possible to identify features with high importance. The AO/OTA classification of normal pelvic rings and pelvic fractures on pelvic AP radiographs using a radiomics-based machine learning model showed the highest AUC when using the SVM classification combination. Further research on the radiomic features of each part of the pelvic bone constituting the pelvic ring is needed. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

34. FEATURE SELECTION FOR COAL HEATING LEVEL ESTIMATION IN THERMAL POWER PLANTS.

Author: VUJNOVIĆ, Sanja M., CVETINOVIĆ, Dejan B., BAKIĆ, Vukman V., and DJUROVIĆ, Željko M.
Subjects: FEATURE selection, STEAM power plants, POWER plants, THERMAL coal, COAL, ELECTRIC power production, CARBON emissions
Abstract: Several recently signed environmental agreements and protocols emphasize the global need to reduce GHG emissions, with a focus on limiting coal consumption due to high NOx and CO2 emissions. However, many countries, including those in the Western Balkans, rely heavily on coal for electricity generation. The outdated thermal power plant infrastructure in these regions poses a major challenge when it comes to meeting modern environmental standards while maintaining efficiency. This study is part of a more comprehensive research effort which aims to develop an expert system that utilizes existing measurements to estimate key parameters crucial for both energy production and pollution reduction. The focus is on Serbian thermal power plants, particularly plant Nikola Tesla unit B1. One of the critical parameters for optimizing thermal power plant control loops is the heating value of coal, which is challenging to measure in real time due to the coal's varying chemical compositions and caloric values. This paper examines 74 different parameters measured in 59 instances to estimate the heating value of coal at unit B1. Through detailed analysis and feature selection methods, including linear regression, this research aims to identify the most informative parameters for estimating the heating value of coal, which will improve the control system that enables more efficient and environmentally friendly power generation in coal fired thermal power plants. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

35. Automatic Design of Energy-Efficient Dispatching Rules for Multi-Objective Dynamic Flexible Job Shop Scheduling Based on Dual Feature Weight Sets.

Author: Xu, Binzi, Xu, Kai, Fei, Baolin, Huang, Dengchao, Tao, Liang, and Wang, Yan
Subjects: PRODUCTION scheduling, GENETIC programming, FEATURE selection, MANUFACTURING processes, SEARCH algorithms
Abstract: Considering the requirements of the actual production scheduling process, the utilization of the genetic programming hyper-heuristic (GPHH) approach to automatically design dispatching rules (DRs) has recently emerged as a popular optimization approach. However, the decision objects and decision environments for routing and sequencing decisions are different in the dynamic flexible job shop scheduling problem (DFJSSP), leading to different required feature information. Traditional algorithms that allow these two types of scheduling decisions to share one common feature set are not conducive to the further optimization of the evolved DRs, but instead introduce redundant and unnecessary search attempts for algorithm optimization. To address this, some related studies have focused on customizing the feature sets for both routing and sequencing decisions through feature selection when solving single-objective problems. While being effective in reducing the search space, the selected feature sets also diminish the diversity of the obtained DRs, ultimately impacting the optimization performance. Consequently, this paper proposes an improved GPHH with dual feature weight sets for the multi-objective energy-efficient DFJSSP, which includes two novel feature weight measures and one novel hybrid population adjustment strategy. Instead of selecting suitable features, the proposed algorithm assigns appropriate weights to the features based on their multi-objective contribution, which could provide directional guidance to the GPHH while ensuring the search space. Experimental results demonstrate that, compared to existing studies, the proposed algorithm can significantly enhance the optimization performance and interpretability of energy-efficient DRs. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

36. Flexible Ureteroscopy Lithotripsy Operative Time Prediction Model for the Treatment of Kidney Stones.

Author: Baidada, Chafik, Aatila, Mustapha, Lachgar, Mohamed, Hrimech, Hamid, Ommane, Younes, and Houlali, Abderrahim
Subjects: MEDICAL personnel, KIDNEY stones, URETEROSCOPY, PREDICTION models, LITHOTRIPSY, OPERATING rooms, FEATURE selection
Abstract: Effective time and resource management is crucial not only in the operating room but also in healthcare supply chains. Healthcare supply chains involve the movement of medical supplies, equipment, and medications from manufacturers to healthcare providers. Effective management is crucial to ensuring that patients receive the care they need promptly. In the operating room, it is essential to have an information process in place to effectively manage time and resources during the current surgical procedure. This paper focuses on developing a predictive model for the operating time of flexible ureteroscopy for kidney stones. The model can forecast surgical and preoperative time based on patient characteristics and surgeon experience. The model can assist in planning ureteroscopy procedures and preventing surgical complications, which is crucial not only for the operating room but also for healthcare supply chains. The paper presents a study that compares different feature selection methods and regression techniques. The study found that sequential backward selection combined with the extra tree regressor was the most effective approach. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

37. Requirement Dependency Extraction Based on Improved Stacking Ensemble Machine Learning.

Author: Guan, Hui, Xu, Hang, and Cai, Lie
Subjects: PARTICLE swarm optimization, STACKING machines, FEATURE selection, SEARCH algorithms, MACHINE learning, FEATURE extraction
Abstract: To address the cost and efficiency issues of manually analysing requirement dependency in requirements engineering, a requirement dependency extraction method based on part-of-speech features and an improved stacking ensemble learning model (P-Stacking) is proposed. Firstly, to overcome the problem of singularity in the feature extraction process, this paper integrates part-of-speech features, TF-IDF features, and Word2Vec features during the feature selection stage. The particle swarm optimization algorithm is used to allocate weights to part-of-speech tags, which enhances the significance of crucial information in requirement texts. Secondly, to overcome the performance limitations of standalone machine learning models, an improved stacking model is proposed. The Low Correlation Algorithm and Grid Search Algorithms are utilized in P-stacking to automatically select the optimal combination of the base models, which reduces manual intervention and improves prediction performance. The experimental results show that compared with the method based on TF-IDF features, the highest F1 scores of a standalone machine learning model in the three datasets were improved by 3.89%, 10.68%, and 21.4%, respectively, after integrating part-of-speech features and Word2Vec features. Compared with the method based on a standalone machine learning model, the improved stacking ensemble machine learning model improved F1 scores by 2.29%, 5.18%, and 7.47% in the testing and evaluation of three datasets, respectively. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

38. The Holistic Advantage: Unified Quantitative Modeling for Less-Biased, In-Depth Insights into (Socio)Linguistic Variation.

Author: Gonzales, Wilkinson Daniel Wong
Subjects: VARIATION in language, FEATURE selection, SOCIOLINGUISTICS, RANDOM forest algorithms, ENGLISH language, REGRESSION analysis
Abstract: What happens when recognized and diverse conditioning factors of linguistic variation are omitted from analysis and/or are not analyzed under a single analytical procedure? This paper explores the consequences of such a choice on data interpretation and, consequently, (socio)linguistic theorization. Utilizing Twitter-style English in the Philippines (EngPH) as a case study, I employ the Twitter Corpus of Philippine Englishes (TCOPE) primarily to investigate and elucidate variations in three morphosyntactic variables that have been previously examined using a piecemeal approach. I propose a holistic quantitative approach that incorporates documented linguistic, social, diachronic, and stylistic factors in a unified analysis. The paper illustrates the impacts of adopting this holistic approach through two statistical procedures: Bayesian regression modeling and Boruta feature selection with random forest modeling. In contrast to earlier research findings, my overall results reveal biases in non-unified quantitative analyses, where the confidence in the effects of certain factors diminishes in light of others during analysis. The adoption of a unified analysis or modeling also enhances the resolution at which variations have been examined in EngPH. For instance, it highlights that presumed 'universals', such as the hierarchy of linguistic > stylistic > diachronic > social factors in explaining variation in some domains, is contingent on the specific variable under examination. Overall, I argue that unified analyses reduce data distortion and introduce more nuanced interpretations and insights that are critical for establishing a well-grounded empirical theory of EngPH variation and language variation as a whole. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

39. Optimal PMU Placement for Fault Classification and Localization Using Enhanced Feature Selection in Machine Learning Algorithms.

Author: Faza, Ayman, Al-Mousa, Amjed, and Alqudah, Rajaa
Subjects: FEATURE selection, PHASOR measurement, NAIVE Bayes classification, FEATURE extraction, FAULT location (Engineering), CLASSIFICATION algorithms, MACHINE learning
Abstract: Machine learning (ML) algorithms are increasingly used in power systems applications. One important application is the classification and localization of various types of transmission line faults. Using voltage and current measurements from phasor measurement units (PMUs), a number of useful features can be extracted, which can form the basis of a ML-based prediction of the fault type, line, and distance on the line. This paper proposes a technique to find the optimal number and placement of PMUs by performing thorough feature selection. The features are selected to maximize the accuracy of the ML classification and regression algorithms. The results show that for the IEEE 14 bus system, the use of only five PMUs is sufficient to obtain high levels of accuracy. For example, a testing accuracy of 99.0% and 97.1% can be achieved for the fault type and fault line location, respectively. As for the fault distance along the line, the testing MAE of 3.1% can be obtained along with an R² score of 94.4%. Adding more PMUs does not provide any additional value in terms of accuracy. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

40. Enhancing Classification Performance through FeatureBoostThyro: A Comparative Study of Machine Learning Algorithms and Feature Selection.

Author: Bhende, Deepali, Sakarkar, Gopal, Khandar, Punam, Uparkar, Satyajit, and Bhave, Arvind
Subjects: MACHINE learning, FEATURE selection, INDIANS (Asians), SUPPORT vector machines, ENDOCRINE diseases, LOGISTIC regression analysis
Abstract: Early-stage prediction of a disease is an important and challenging task. The application of machine learning techniques is playing an important role in this era. Thyroid is one of the chronic endocrine diseases, and approximately 42 million people in India are affected by this disease. This paper presents a comprehensive investigation into the enhancement of classification performance through the novel 'FeatureBoostThyro' (FBT) model. The study evaluates various machine learning algorithms, including stochastic gradient descent (SGD), K nearest neighbor (KNN), logistic regression (LR), naive bayes (NB), and support vector machine (SVM), in conjunction with diverse feature selection methods. The research systematically explores the impact of feature selection techniques such as information gain, relief F, chi-square, gini index, forward selection, backward selection, recursive feature elimination, and LASSO on model performance across the chosen algorithms. The analysis reveals notable variations in performance metrics, including accuracy, precision, recall, and F1-score, providing valuable insights into the interplay between algorithm and feature selection. One main contribution of this research is the introduction of the FBT model, which consistently outperforms other models across various feature selection methods, making it a promising tool for addressing complex classification tasks. The findings contribute to a broader understanding of model selection and optimization in machine learning applications. The proposed model undergoes evaluation using two distinct datasets: the primary dataset acquired from Lata Mangeshkar Hospital in Nagpur and the secondary dataset obtained from the UCI dataset. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

41. A real-time intelligent lithology identification method based on a dynamic felling strategy weighted random forest algorithm.

Author: Tie Yan, Rui Xu, Shi-Hui Sun, Zhao-Kai Hou, and Jin-Yu Feng
Subjects: RANDOM forest algorithms, PETROLOGY, IDENTIFICATION, FEATURE extraction, FEATURE selection, DATA mining
Abstract: Real-time intelligent lithology identification while drilling is vital to realizing downhole closed-loop drilling. The complex and changeable geological environment in the drilling makes lithology identification face many challenges. This paper studies the problems of difficult feature information extraction, low precision of thin-layer identification and limited applicability of the model in intelligent lithologic identification. The author tries to improve the comprehensive performance of the lithology identification model from three aspects: data feature extraction, class balance, and model design. A new real-time intelligent lithology identification model of dynamic felling strategy weighted random forest algorithm (DFW-RF) is proposed. According to the feature selection results, gamma ray and 2 MHz phase resistivity are the logging while drilling (LWD) parameters that significantly influence lithology identification. The comprehensive performance of the DFW-RF lithology identification model has been verified in the application of 3 wells in different areas. By comparing the prediction results of five typical lithology identification algorithms, the DFW-RF model has a higher lithology identification accuracy rate and F1 score. This model improves the identification accuracy of thin-layer lithology and is effective and feasible in different geological environments. The DFW-RF model plays a truly efficient role in the real-time intelligent identification of lithologic information in closed-loop drilling and has greater applicability, which is worthy of being widely used in logging interpretation. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

42. Special Issue on Advances in Database Engineered Applications.

Author: Chbeir, Richard, Ivanovic, Mirjana, Manolopoulos, Yannis, and Silvestri, Claudio
Subjects: DATABASES, DATA structures, INFORMATION technology, ENGINEERING, APPLIED sciences, FEATURE selection
Abstract: The text describes the 27th International Database Engineering and Applications Symposium (IDEAS-2023) held in Heraklion, Crete, Greece. The conference serves as a platform for data engineering researchers, practitioners, developers, and application users to exchange ideas and experiences. The text also highlights six selected papers from the conference, covering topics such as real-time database operations on an FPGA, optimized path planning for robots, meshfree interpolation of scattered data, optimizing distributed SPARQL queries using machine learning, specification mining over temporal data, and feature selection for improved classification performance. The authors of these papers present their research findings and methodologies in their respective fields. [Extracted from the article]
Published: 2024
Full Text: View/download PDF

43. CMEFS: chaotic mapping-based mayfly optimization with fuzzy entropy for feature selection

Author: Sun, Lin, Liang, Hanbo, Ding, Weiping, Xu, Jiucheng, and Chang, Baofang
Published: 2024
Full Text: View/download PDF

44. Feature selection for hybrid information systems based on fuzzy β covering and fuzzy evidence theory

Author: Ma, Xiaoqin, Hu, Huanhuan, Zhang, Qinli, and Xu, Yi
Published: 2024
Full Text: View/download PDF

45. QNetDiff: a quantitative measurement of network rewiring.

Author: Nose, Shota, Shiroma, Hirotsugu, Yamada, Takuji, and Uno, Yushi
Subjects: LARGE intestine, COLORECTAL cancer, HUMAN body, CANCER patients
Abstract: Bacteria in the human body, particularly in the large intestine, are known to be associated with various diseases. To identify disease-associated bacteria (markers), a typical method is to statistically compare the relative abundance of bacteria between healthy subjects and diseased patients. However, since bacteria do not necessarily cause diseases in isolation, it is also important to focus on the interactions and relationships among bacteria when examining their association with diseases. In fact, although there are common approaches to represent and analyze bacterial interaction relationships as networks, there are limited methods to find bacteria associated with diseases through network-driven analysis. In this paper, we focus on rewiring of the bacterial network and propose a new method for quantifying the rewiring. We then apply the proposed method to a group of colorectal cancer patients. We show that it can identify and detect bacteria that cannot be detected by conventional methods such as abundance comparison. Furthermore, the proposed method is implemented as a general-purpose tool and made available to the general public. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

46. Time Series Feature Selection Method Based on Mutual Information.

Author: Huang, Lin, Zhou, Xingqiang, Shi, Lianhui, and Gong, Li
Subjects: FEATURE selection, PRINCIPAL components analysis, FEATURE extraction, TIME series analysis
Abstract: Time series data have characteristics such as high dimensionality, excessive noise, data imbalance, etc. In the data preprocessing process, feature selection plays an important role in the quantitative analysis of multidimensional time series data. Aiming at the problem of feature selection of multidimensional time series data, a feature selection method for time series based on mutual information (MI) is proposed. One of the difficulties of traditional MI methods is in searching for a suitable target variable. To address this issue, the main innovation of this paper is the hybridization of principal component analysis (PCA) and kernel regression (KR) methods based on MI. Firstly, based on historical operational data, quantifiable system operability is constructed using PCA and KR. The next step is to use the constructed system operability as the target variable for MI analysis to extract the most useful features for the system data analysis. In order to verify the effectiveness of the method, an experiment is conducted on the CMAPSS engine dataset, and the effectiveness of condition recognition is tested based on the extracted features. The results indicate that the proposed method can effectively achieve feature extraction of high-dimensional monitoring data. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

47. A CLASS SPECIFIC FEATURE SELECTION METHOD FOR IMPROVING THE PERFORMANCE OF TEXT CLASSIFICATION.

Author: VENKATESH V., SHARAN S. B., MAHALAXMY S., MONISHA, S., D. S., ASHICK SANJEY, and ASHOKKUMAR P.
Subjects: FEATURE selection, MACHINE learning, CLASSIFICATION
Abstract: Recently, a significant amount of research work has been carried out in the field of feature selection. Although these methods help to increase the accuracy of the machine learning classification, the selected subset of features considers all the classes and may not select recommendable features for a particular class. The main goal of our paper is to propose a new class-specific feature selection algorithm that is capable of selecting an appropriate subset of features for each class. In this regard, we first perform class binarization and then select the best features for each class. During the feature selection process, we deal with class imbalance problems and redundancy elimination. The Weighted Average Voting Ensemble method is used for the final classification. Finally, we carry out experiments to compare our proposed feature selection approach with the existing popular feature selection methods. The results prove that our feature selection method outperforms the existing methods with an accuracy of more than 37%. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

48. Genetic Algorithm for Feature Selection Applied to Financial Time Series Monotonicity Prediction: Experimental Cases in Cryptocurrencies and Brazilian Assets.

Author: Contreras, Rodrigo Colnago, Xavier da Silva, Vitor Trevelin, Xavier da Silva, Igor Trevelin, Viana, Monique Simplicio, Santos, Francisco Lledo dos, Zanin, Rodrigo Bruno, Martins, Erico Fernandes Oliveira, and Guido, Rodrigo Capobianco
Subjects: MACHINE learning, GENETIC algorithms, TIME series analysis, CRYPTOCURRENCIES, FEATURE selection, INVESTORS, ASSETS (Accounting)
Abstract: Since financial assets on stock exchanges were created, investors have sought to predict their future values. Currently, cryptocurrencies are also seen as assets. Machine learning is increasingly adopted to assist and automate investments. The main objective of this paper is to make daily predictions about the movement direction of financial time series through classification models, financial time series preprocessing methods, and feature selection with genetic algorithms. The target time series are Bitcoin, Ibovespa, and Vale. The methodology of this paper includes the following steps: collecting time series of financial assets; data preprocessing; feature selection with genetic algorithms; and the training and testing of machine learning models. The results were obtained by evaluating the models with the area under the ROC curve metric. For the best prediction models for Bitcoin, Ibovespa, and Vale, values of 0.61, 0.62, and 0.58 were obtained, respectively. In conclusion, the feature selection allowed the improvement of performance in most models, and the input series in the form of percentage variation obtained a good performance, although it was composed of fewer attributes in relation to the other sets tested. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

49. Prediction of hydrological and water quality data based on granular-ball rough set and k-nearest neighbor analysis.

Author: Dong, Limei, Zuo, Xinyu, and Xiong, Yiping
Subjects: K-nearest neighbor classification, WATER quality, ROUGH sets, DATA quality, BACK propagation, K-means clustering, FEATURE selection
Abstract: Hydrological and water quality datasets usually encompass a large number of characteristic variables, but not all of these significantly influence analytical outcomes. Therefore, by wisely selecting feature variables with rich information content and removing redundant features, not only can the analysis efficiency be improved, but the model complexity can also be simplified. This paper considers introducing the granular-ball rough set algorithm for feature variable selection and combining it with the k-nearest neighbor method and back propagation network to analyze hydrological and water quality data, thus promoting overall and fused inspection. The results of hydrological water quality data analysis show that the proposed method produces better results compared to using a standalone k-nearest neighbor regressor. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

50. Gene Expression-Based Cancer Classification for Handling the Class Imbalance Problem and Curse of Dimensionality.

Author: Al-Azani, Sadam, Alkhnbashi, Omer S., Ramadan, Emad, and Alfarraj, Motaz
Subjects: TUMOR classification, CANCER genes, MICROARRAY technology, GENE expression, RANDOM forest algorithms, FEATURE selection, CAUSE of death statistics
Abstract: Cancer is a leading cause of death globally. The majority of cancer cases are only diagnosed in the late stages of cancer due to the use of conventional methods. This reduces the chance of survival for cancer patients. Therefore, early detection consequently followed by early diagnoses are important tasks in cancer research. Gene expression microarray technology has been applied to detect and diagnose most types of cancers in their early stages and has gained encouraging results. In this paper, we address the problem of classifying cancer based on gene expression for handling the class imbalance problem and the curse of dimensionality. The oversampling technique is utilized to overcome this problem by adding synthetic samples. Another common issue related to the gene expression dataset addressed in this paper is the curse of dimensionality. This problem is addressed by applying chi-square and information gain feature selection techniques. After applying these techniques individually, we proposed a method to select the most significant genes by combining those two techniques (CHiS and IG). We investigated the effect of these techniques individually and in combination. Four benchmarking biomedical datasets (Leukemia-subtypes, Leukemia-ALLAML, Colon, and CuMiDa) were used. The experimental results reveal that the oversampling techniques improve the results in most cases. Additionally, the performance of the proposed feature selection technique outperforms individual techniques in nearly all cases. In addition, this study provides an empirical study for evaluating several oversampling techniques along with ensemble-based learning. The experimental results also reveal that SVM-SMOTE, along with the random forests classifier, achieved the highest results, with a reporting accuracy of 100%. The obtained results surpass the findings in the existing literature as well. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF


1,251 results
