Descriptor: "Support Vector Machine" / Journal: plos one - Searchworks@Jio Institute Digital Library Search Results

Your search for "Support Vector Machine" returned 1,576 results.

1. Machine learning for cell type classification from single nucleus RNA sequencing data

Author: Le, Huy, Peng, Beverly, Uy, Janelle, Carrillo, Daniel, Zhang, Yun, Aevermann, Brian D, and Scheuermann, Richard H
Subjects: Neurosciences, Genetics, Humans, Logistic Models, Machine Learning, RNA, RNA, Small Nuclear, Sequence Analysis, RNA, Support Vector Machine, General Science & Technology
Abstract: With the advent of single cell/nucleus RNA sequencing (sc/snRNA-seq), the field of cell phenotyping is now a data-driven exercise providing statistical evidence to support cell type/state categorization. However, the task of classifying cells into specific, well-defined categories with the empirical data provided by sc/snRNA-seq remains nontrivial due to the difficulty in determining specific differences between related cell types with close transcriptional similarities, resulting in challenges with matching cell types identified in separate experiments. To investigate possible approaches to overcome these obstacles, we explored the use of supervised machine learning methods (logistic regression, support vector machines, random forests, neural networks, and light gradient boosting machine [LightGBM]) as approaches to classify cell types using snRNA-seq datasets from human brain middle temporal gyrus (MTG) and human kidney. Classification accuracy was evaluated using an F-beta score weighted in favor of precision to account for technical artifacts of gene expression dropout. We examined the impact of hyperparameter optimization and feature selection methods on F-beta score performance. We found that the best performing model for granular cell type classification in both datasets is a multinomial logistic regression classifier and that an effective feature selection step was the most influential factor in optimizing the performance of the machine learning pipelines.
Published: 2022
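
Note on the F-beta score mentioned in the abstract above: the following is a minimal sketch of how an F-beta score favors precision when beta is below 1. The abstract does not state the beta value used, so the default of 0.5 here is an assumption, and the function name and counts are hypothetical.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """F-beta from confusion counts.

    beta < 1 weights precision more heavily than recall;
    beta = 1 gives the ordinary F1 score.
    beta = 0.5 is an assumed default, not a value from the abstract.
    """
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts: high precision (0.8), modest recall (0.5).
f1 = f_beta(tp=8, fp=2, fn=8, beta=1.0)
f_half = f_beta(tp=8, fp=2, fn=8, beta=0.5)
print(f"F1 = {f1:.3f}, F0.5 = {f_half:.3f}")
```

With these counts the precision-weighted score is higher than F1, because the classifier's strength (precision) counts for more.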

2. Computer-aided diagnosis of external and middle ear conditions: A machine learning approach.

Author: Viscaino, Michelle, Maass, Juan, Delano, Paul, Torrente, Mariela, Stott, Carlos, and Auat Cheein, Fernando
Subjects: Adolescent, Adult, Cerumen, Child, Decision Trees, Diagnosis, Computer-Assisted, Ear Diseases, Early Diagnosis, Humans, Image Interpretation, Computer-Assisted, Male, Middle Aged, Myringosclerosis, Otitis Media, Sensitivity and Specificity, Support Vector Machine, Young Adult
Abstract: In medicine, a misdiagnosis or the absence of specialists can affect the patient's health, leading to unnecessary tests and increasing the costs of healthcare. In particular, the lack of specialists in otolaryngology in third world countries forces patients to seek medical attention from general practitioners, who might not have enough training and experience to make a correct diagnosis in this field. To tackle this problem, we propose and test a computer-aided system based on machine learning models and image processing techniques for otoscopic examination, as a support for a more accurate diagnosis of ear conditions at primary care before specialist referral; in particular, for myringosclerosis, earwax plug, and chronic otitis media. To characterize the tympanic membrane and ear canal for each condition, we implemented three different feature extraction methods: color coherence vector, discrete cosine transform, and filter bank. We also considered three machine learning algorithms, support vector machine (SVM), k-nearest neighbor (k-NN), and decision trees, to develop the ear condition predictor model. To conduct the research, our database included 160 images as testing set and 720 images as training and validation sets of 180 patients. We repeatedly trained the learning models using the training dataset and evaluated them using the validation dataset to obtain the feature extraction method and learning model that produce the highest validation accuracy. The results showed that the SVM and k-NN presented the best performance, followed by the decision tree model. Finally, we performed a classification stage (i.e., diagnosis) using testing data, where the SVM model achieved an average classification accuracy of 93.9%, average sensitivity of 87.8%, average specificity of 95.9%, and average positive predictive value of 87.7%. The results show that this system might be used by general practitioners as a reference to make better decisions in the diagnosis of ear pathologies.
Published: 2020
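
Note on the averaged metrics reported in the abstract above: one common way to obtain "average" accuracy, sensitivity, specificity, and positive predictive value for a multi-class classifier is to compute each metric one-vs-rest per class and then take the unweighted mean. The abstract does not say how the averages were formed, so this is an assumed reading, and the confusion matrix below is invented for illustration only.

```python
def one_vs_rest_averages(cm):
    """Macro-averaged metrics from a square confusion matrix.

    cm[i][j] = number of samples of true class i predicted as class j.
    Assumes every class has at least one true and one predicted sample.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    per_class = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                          # class k missed
        fp = sum(cm[i][k] for i in range(n)) - tp     # others called k
        tn = total - tp - fn - fp
        per_class.append({
            "accuracy": (tp + tn) / total,
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
            "ppv": tp / (tp + fp),
        })
    # Unweighted (macro) mean of each metric over the classes.
    return {name: sum(m[name] for m in per_class) / n for name in per_class[0]}


# Hypothetical 3-class confusion matrix (not data from the study).
example_cm = [
    [9, 1, 0],
    [1, 8, 1],
    [0, 2, 8],
]
for name, value in one_vs_rest_averages(example_cm).items():
    print(f"{name}: {value:.3f}")
```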

3. Clinical state tracking in serious mental illness through computational analysis of speech

Author: Arevian, Armen C, Bone, Daniel, Malandrakis, Nikolaos, Martinez, Victor R, Wells, Kenneth B, Miklowitz, David J, and Narayanan, Shrikanth
Subjects: Mental Health, Clinical Research, Behavioral and Social Science, Schizophrenia, Depression, Serious Mental Illness, Brain Disorders, Health Services, Management of diseases and conditions, 4.1 Discovery and preclinical testing of markers and technologies, Detection, screening and diagnosis, 7.1 Individual care needs, Mental health, Good Health and Well Being, Computational Biology, Female, Humans, Male, Mental Disorders, Middle Aged, Pilot Projects, Residence Characteristics, Speech, Support Vector Machine, General Science & Technology
Abstract: Individuals with serious mental illness experience changes in their clinical states over time that are difficult to assess and that result in increased disease burden and care utilization. It is not known if features derived from speech can serve as a transdiagnostic marker of these clinical states. This study evaluates the feasibility of collecting speech samples from people with serious mental illness and explores the potential utility for tracking changes in clinical state over time. Patients (n = 47) were recruited from a community-based mental health clinic with diagnoses of bipolar disorder, major depressive disorder, schizophrenia or schizoaffective disorder. Patients used an interactive voice response system for at least 4 months to provide speech samples. Clinic providers (n = 13) reviewed responses and provided global assessment ratings. We computed features of speech and used machine learning to create models of outcome measures trained using either population data or an individual's own data over time. The system was feasible to use, recording 1101 phone calls and 117 hours of speech. Most (92%) of the patients agreed that it was easy to use. The individually-trained models demonstrated the highest correlation with provider ratings (rho = 0.78, p
Published: 2020

4. Learning from data to predict future symptoms of oncology patients.

Author: Papachristou, Nikolaos, Puschmann, Daniel, Barnaghi, Payam, Cooper, Bruce, Hu, Xiao, Maguire, Roma, Apostolidis, Kathi, Conley, Yvette P, Hammer, Marilyn, Katsaragakis, Stylianos, Kober, Kord M, Levine, Jon D, McCann, Lisa, Patiraki, Elisabeth, Furlong, Eileen P, Fox, Patricia A, Paul, Steven M, Ream, Emma, Wright, Fay, and Miaskowski, Christine
Subjects: Humans, Neoplasms, Depression, Anxiety, Models, Psychological, Female, Male, Support Vector Machine, Neural Networks, Computer, General Science & Technology
Abstract: Effective symptom management is a critical component of cancer treatment. Computational tools that predict the course and severity of these symptoms have the potential to assist oncology clinicians to personalize the patient's treatment regimen more efficiently and provide more aggressive and timely interventions. Three common and inter-related symptoms in cancer patients are depression, anxiety, and sleep disturbance. In this paper, we elaborate on the efficiency of Support Vector Regression (SVR) and Non-linear Canonical Correlation Analysis by Neural Networks (n-CCA) to predict the severity of the aforementioned symptoms between two different time points during a cycle of chemotherapy (CTX). Our results demonstrate that these two methods produced equivalent results for all three symptoms. These types of predictive models can be used to identify high risk patients, educate patients about their symptom experience, and improve the timing of pre-emptive and personalized symptom management interventions.
Published: 2018

5. Plasma metabolomic biomarkers accurately classify acute mild traumatic brain injury from controls

Author: Fiandaca, Massimo S, Mapstone, Mark, Mahmoodi, Amin, Gross, Thomas, Macciardi, Fabio, Cheema, Amrita K, Merchant-Borna, Kian, Bazarian, Jeffrey, and Federoff, Howard J
Subjects: Analytical Chemistry, Biomedical and Clinical Sciences, Chemical Sciences, Brain Disorders, Neurosciences, Clinical Research, Physical Injury - Accidents and Adverse Effects, Traumatic Head and Spine Injury, Traumatic Brain Injury (TBI), Detection, screening and diagnosis, 4.1 Discovery and preclinical testing of markers and technologies, Adolescent, Adult, Area Under Curve, Athletes, Athletic Injuries, Biomarkers, Brain Concussion, Diagnosis, Differential, Female, Humans, Linear Models, Longitudinal Studies, Male, Metabolome, Metabolomics, ROC Curve, Retrospective Studies, Support Vector Machine, Tandem Mass Spectrometry, Universities, Young Adult, General Science & Technology
Abstract: Past and recent attempts at devising objective biomarkers for traumatic brain injury (TBI) in both blood and cerebrospinal fluid have focused on abundance measures of time-dependent proteins. Similar independent determinants would be most welcome in diagnosing the most common form of TBI, mild TBI (mTBI), which remains difficult to define and confirm based solely on clinical criteria. There are currently no consensus diagnostic measures that objectively define individuals as having sustained an acute mTBI. Plasma metabolomic analyses have recently evolved to offer an alternative to proteomic analyses, offering an orthogonal diagnostic measure to what is currently available. The purpose of this study was to determine whether a developed set of metabolomic biomarkers is able to objectively classify college athletes sustaining mTBI from non-injured teammates, within 6 hours of trauma and whether such a biomarker panel could be effectively applied to an independent cohort of TBI and control subjects. A 6-metabolite panel was developed from biomarkers that had their identities confirmed using tandem mass spectrometry (MS/MS) in our Athlete cohort. These biomarkers were defined at ≤6 hours following mTBI and objectively classified mTBI athletes from teammate controls, and provided similar classification of these groups at the 2, 3, and 7 days post-mTBI. The same 6-metabolite panel, when applied to a separate, independent cohort provided statistically similar results despite major differences between the two cohorts. Our confirmed plasma biomarker panel objectively classifies acute mTBI cases from controls within 6 hours of injury in our two independent cohorts. While encouraged by our initial results, we expect future studies to expand on these initial observations.
Published: 2018

6. Classifying age from medial clavicle using a 30-year threshold: An image analysis based approach.

Author: Ivković N, Bašić Ž, and Jerković I
Subjects: Humans, Adult, Female, Male, Middle Aged, Neural Networks, Computer, Principal Component Analysis, Tomography, X-Ray Computed methods, Image Processing, Computer-Assisted methods, Aged, Young Adult, Age Determination by Skeleton methods, Age Factors, Logistic Models, Clavicle diagnostic imaging, Clavicle anatomy & histology, Support Vector Machine
Abstract: This study aimed to develop image-analysis-based classification models for distinguishing individuals younger and older than 30 using the medial clavicle. We extracted 2D images of the medial clavicle from multi-slice computed tomography (MSCT) scans from Clinical Hospital Center Split (n = 204). The sample was divided into a training (164 images) and a testing (40 images) dataset. The images were loaded into Orange Data Mining 3.32.0 and transformed into vectors using the pre-trained neural network Painters, a model trained to predict painters from artwork images. We conducted Principal Components Analysis (PCA) to visualize regularities within data and reduce data dimensionality in classification. We employed three classifiers that provided >80% accuracy: Support Vector Machine (SVM), Logistic Regression (LR), and Neural Network Identity SGD (NNI-SGD). We used 5-fold cross-validation (CV) to obtain optimal variables and performances and validated data on the independent test set, with posterior probability (pp) thresholds of 0.5 (standard) and 0.95. The explainability of the model was assessed visually by analyzing clusters and incorrectly classified images using anthropology field knowledge. Based on the PCA, clavicles clustered into categories under 30 and 40 years, between 40 and 55 years, and over 80 years. The overall accuracy with standard pp ranged from 82.5% to 92.5% for CV and 82.5% to 92.5% for the test set. The posterior probability of 0.95 provided classification accuracy up to 100% but with a lower proportion of images that could be classified. The study showed that image analysis based on a pre-trained deep neural network could contribute to distinguishing clavicles of individuals younger and older than 30.
Competing Interests: The authors have declared that no competing interests exist.
Copyright: © 2024 Ivković et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Published: 2024
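
Note on the posterior probability thresholds in the abstract above: raising the threshold from 0.5 to 0.95 trades coverage for accuracy, since low-confidence cases are left unclassified. The sketch below shows one plausible way to apply such a threshold to a binary classifier's output. The rule (classify only when either class reaches the threshold) and all probabilities are assumptions for illustration, not the study's procedure or data.

```python
def thresholded_accuracy(samples, threshold):
    """Classify only when the posterior for one class reaches `threshold`.

    samples: list of (p_older, true_label) pairs, where p_older is the
    posterior probability of the "older than 30" class and true_label
    is 1 (older) or 0 (younger).
    Returns (accuracy among classified samples, fraction classified).
    """
    classified = correct = 0
    for p_older, true_label in samples:
        if p_older >= threshold:
            predicted = 1
        elif p_older <= 1.0 - threshold:
            predicted = 0
        else:
            continue  # not confident enough: leave unclassified
        classified += 1
        correct += int(predicted == true_label)
    coverage = classified / len(samples)
    accuracy = correct / classified if classified else float("nan")
    return accuracy, coverage


# Hypothetical posteriors and labels (not data from the study).
demo = [(0.98, 1), (0.97, 1), (0.60, 0), (0.55, 1),
        (0.40, 0), (0.03, 0), (0.45, 1), (0.02, 0)]
for t in (0.5, 0.95):
    acc, cov = thresholded_accuracy(demo, t)
    print(f"threshold {t}: accuracy {acc:.2f}, coverage {cov:.2f}")
```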

7. Prediction of breast cancer Invasive Disease Events using transfer learning on clinical data as image-form.

Author: Fanizzi A, Bove S, Comes MC, Di Benedetto EF, Latorre A, Giotta F, Nardone A, Rizzo A, Soranno C, Zito A, and Massafra R
Subjects: Humans, Female, Middle Aged, Neoplasm Invasiveness, Machine Learning, Support Vector Machine, Aged, Adult, Neoplasm Recurrence, Local, Breast Neoplasms pathology, Breast Neoplasms diagnostic imaging, Breast Neoplasms diagnosis, Neural Networks, Computer
Abstract: Background and Objective: Detecting patients at high risk of occurrence of an Invasive Disease Event after a first diagnosis of breast cancer, such as recurrence, distant metastasis, contralateral tumor and second tumor, could support clinical decision-making processes in the treatment of this malignancy. Though several machine learning models analyzing both clinical and histopathological information have been developed in the literature to address this task, these approaches turned out to be unsuitable for describing this problem. Methods: In this study, we designed a novel artificial intelligence-based approach which converts clinical information into an image-form to be analyzed through Convolutional Neural Networks. Specifically, we predicted the occurrence of an Invasive Disease Event at both 5-year and 10-year follow-ups of 696 female patients with a first invasive breast cancer diagnosis enrolled at IRCCS "Giovanni Paolo II" in Bari, Italy. After transforming each patient, represented by a vector of clinical information, to an image form, we extracted low-level quantitative imaging features by means of a pre-trained Convolutional Neural Network, namely, AlexNet. Then, we classified breast cancer patients in the two classes, namely, Invasive Disease Event and non-Invasive Disease Event, via a Support Vector Machine classifier trained on a subset of significant features previously identified. Results: Both 5-year and 10-year models proved particularly accurate in predicting breast cancer recurrence events, achieving an AUC value of 92.07% and 92.84%, an accuracy of 88.71% and 88.82%, a sensitivity of 86.83% and 88.06%, a specificity of 89.55% and 89.3%, and a precision of 71.93% and 84.82%, respectively. Conclusions: This is the first study proposing an approach which converts clinical information into an image-form to develop a decision support system for identifying patients at high risk of occurrence of an Invasive Disease Event, and then defining personalized oncological therapeutic treatments for breast cancer patients.
Competing Interests: The authors declare no competing interests.
Copyright: © 2024 Fanizzi et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Published: 2024

8. Boosted Harris Hawks Shuffled Shepherd Optimization Augmented Deep Learning based motor imagery classification for brain computer interface.

Author: Assiri FY and Ragab M
Subjects: Humans, Imagination physiology, Brain physiology, Support Vector Machine, Algorithms, Brain-Computer Interfaces, Deep Learning, Electroencephalography methods
Abstract: Motor imagery (MI) classification has been commonly employed in making brain-computer interfaces (BCI) to manage the outside tools as a substitute neural muscular path. Effectual MI classification in BCI improves communication and mobility for people with a breakdown or motor damage, delivering a bridge between the brain's intentions and exterior actions. Employing electroencephalography (EEG) or aggressive neural recordings, machine learning (ML) methods are used to interpret patterns of brain action linked with motor image tasks. These models frequently depend upon models like support vector machine (SVM) or deep learning (DL) to distinguish among dissimilar MI classes, such as visualizing left or right limb actions. This procedure allows individuals, particularly those with motor disabilities, to utilize their opinions to command exterior devices like robotic limbs or computer borders. This article presents a Boosted Harris Hawks Shuffled Shepherd Optimization Augmented Deep Learning (BHHSHO-DL) technique based on Motor Imagery Classification for BCI. The BHHSHO-DL technique mainly exploits the hyperparameter-tuned DL approach for MI identification for BCI. Initially, the BHHSHO-DL technique performs data preprocessing utilizing the wavelet packet decomposition (WPD) model. Besides, the enhanced densely connected networks (DenseNet) model extracts the preprocessed data's complex and hierarchical feature patterns. Meanwhile, the BHHSHO technique-based hyperparameter tuning process is accomplished to elect optimal parameter values of the enhanced DenseNet model. Finally, the classification procedure is implemented by utilizing the convolutional autoencoder (CAE) model. The simulation value of the BHHSHO-DL methodology is performed on a benchmark dataset. 
The performance validation of the BHHSHO-DL methodology showed superior accuracy values of 98.15% and 92.23% over other techniques on the BCIC-III and BCIC-IV datasets.
Competing Interests: The authors declare that they have no conflict of interest.
Copyright: © 2024 Assiri, Ragab. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Published: 2024

9. Artificial intelligence model for predicting sexual dimorphism through the hyoid bone in adult patients.

Author: Ferraz AX, Schroder ÂGD, Gonçalves FM, Küchler EC, Santos RS, Zeigelboim BS, Pezzin APT, Taveira KV, Abuabara A, Baratto-Filho F, and de Araujo CM
Subjects: Humans, Female, Male, Adult, Middle Aged, Young Adult, Support Vector Machine, Algorithms, Cephalometry methods, Sex Determination by Skeleton methods, Adolescent, Hyoid Bone diagnostic imaging, Hyoid Bone anatomy & histology, Sex Characteristics, Artificial Intelligence
Abstract: The objective of this study was to develop a predictive model using supervised machine learning to determine sex based on the dimensions of the hyoid bone. Lateral cephalometric radiographs of 495 patients were analyzed, collecting the horizontal and vertical dimensions of the hyoid bone, as well as the distance from the hyoid to the mandible. The following algorithms were trained: Logistic Regression, Gradient Boosting Classifier, K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Multilayer Perceptron Classifier (MLP), Decision Tree, AdaBoost Classifier, and Random Forest Classifier. A 5-fold cross-validation approach was used to validate each model. Model evaluation metrics included areas under the curve (AUC), accuracy, recall, precision, F1 score, and ROC curves. The horizontal dimension of the hyoid bone demonstrated the highest predictive power across all evaluated models. The AUC values of the different trained models ranged from 0.81 to 0.86 on test data and from 0.78 to 0.84 in cross-validation, with the random forest classifier achieving the highest accuracy rates. The supervised machine learning model showed good predictive accuracy, indicating the model's potential for sex determination in forensic and anthropological contexts. These findings suggest that the application of artificial intelligence methods can enhance the accuracy of sex estimation, contributing to significant advancements in the field.
Competing Interests: The authors have declared that no competing interests exist.
Copyright: © 2024 Ferraz et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Published: 2024
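
Note on the 5-fold cross-validation described in the abstract above: the sketch below shows how sample indices can be split into five train/test folds so that every sample is tested exactly once. The shuffling, seed, and function name are illustrative assumptions; the abstract does not say whether folds were shuffled or stratified. The sample count of 495 matches the number of patients in the abstract.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Indices are shuffled once with a fixed seed (an assumption made here
    for reproducibility), then dealt round-robin into k folds.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 495 samples split into 5 folds; each model would be fit on `train`
# and scored on `test`, and the k scores averaged.
for fold_number, (train, test) in enumerate(k_fold_indices(495, k=5), start=1):
    print(f"fold {fold_number}: {len(train)} train, {len(test)} test")
```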

10. Sex prediction through machine learning utilizing mandibular condyles, coronoid processes, and sigmoid notches features.

Author: Basso IB, de Jesus Freitas PF, Ferraz AX, Borkovski AJ, Borkovski AL, Santos RS, Rached RN, Küchler EC, Schroder AGD, de Araujo CM, and Guariza-Filho O
Subjects: Humans, Male, Female, Adult, Mandibular Condyle diagnostic imaging, Algorithms, Young Adult, Adolescent, Sex Determination by Skeleton methods, Support Vector Machine, Middle Aged, Cephalometry methods, Mandible diagnostic imaging, Mandible anatomy & histology, ROC Curve, Machine Learning
Abstract: Characteristics of the mandible structures have been relevant in anthropological and forensic studies for sex prediction. This study aims to evaluate the coronoid process, condyle, and sigmoid notch patterns in sex prediction through supervised machine learning algorithms. Cephalometric radiographs from 410 dental records of patients were screened to investigate the morphology of the coronoid process, condyle, and sigmoid notch and the Co-Gn distance. The following machine learning algorithms were used to build the predictive models: Decision Tree, Gradient Boosting Classifier, K-Nearest Neighbors (KNN), Logistic Regression, Multilayer Perceptron Classifier, Random Forest Classifier, and Support Vector Machine (SVM). A 5-fold cross-validation approach was adopted to validate each model. Metrics such as area under the curve (AUC), accuracy, recall, precision, and F1 Score were calculated for each model, and ROC curves were constructed. All tested variables demonstrated statistical significance (p < 0.10) and were included in the construction of the predictive model. The Co-Gn variable stood out as the most important among the evaluated independent variables, showing greater relevance in three of the four algorithms used in assessing feature importance. In the analysis of the models' performance, the AUC ranged from 0.82 [95% CI = 0.72-0.93] to 0.66 [95% CI = 0.53-0.76] for the test data, and from 0.83 [95% CI = 0.80-0.87] to 0.71 [95% CI = 0.61-0.75] for cross-validation. The precision of the models ranged from 0.83 [95% CI = 0.75-0.91] to 0.68 [95% CI = 0.58-0.78] in the test phase, and from 0.78 [95% CI = 0.74-0.82] to 0.69 [95% CI = 0.65-0.75] in cross-validation. The SVM, KNN, and Gradient Boosting Classifier algorithms stood out with the highest AUC and precision values in both cross-validation and testing. 
The use of condyle, coronoid process, and sigmoid notch characteristics, in combination with supervised machine learning predictive models, shows potential for contributing to sex prediction based on morphometric bone characteristics, particularly regarding the distance between the condyle and gnathion. However, given the study's limitations, these findings should be interpreted with caution.
Competing Interests: The authors have declared that no competing interests exist.
Copyright: © 2024 Basso et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Published: 2024

11. Prediction model and technical and tactical decision analysis of women's badminton singles based on machine learning.

Author: Yuan H, Wang Y, Yang K, and Bin Y
Subjects: Humans, Female, Republic of Korea, Algorithms, Decision Making, Athletes psychology, Support Vector Machine, Racquet Sports, Machine Learning
Abstract: In the Paris Olympic cycle, South Korean women's athlete An Se-young rose to the top of the 2023 BWF Olympic points with a win rate of 89.5%. With An Se-young as the subject, this paper aims to carry out technical and tactical analysis of women's badminton singles and formulate a prediction model based on machine learning. Firstly, An's technical and tactical statistics are analyzed and presented in a proposed "three-stage" data classification method. Secondly, we improve our "three-stage" machine learning dataset using video analysis of 10 matches (21-point games) where An Se-young faced off against four other players ranked in the top five of the Badminton World Federation (BWF) in week 44 of 2023. Finally, we establish a prediction model for the scoring and losing of points in the women's badminton singles based on the 'Decision tree', 'Random forest', 'XGBoost', 'Support vector' and 'K-proximity' algorithms, and analyze the effectiveness of this model. The results show that the improved data classification is reasonable and can be used to predict the final score of a match. When the support vector machine uses the RBF function kernel, the accuracy reaches its highest at 87.5%, and the consistency of this prediction model is strong. An's playstyle is sustained and unified; she does not seek continuous pressure, but rather exploits and maximizes her aggression following any mistake made by her opponents, immediately utilizing assault methods such as kills or dives, often resulting in the conversion of points during the subsequent 2-3 strikes.
Competing Interests: The authors have declared that no competing interests exist.
Copyright: © 2024 Yuan et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Published: 2024
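
Note on the RBF kernel mentioned in the abstract above: the radial basis function kernel scores the similarity of two feature vectors as exp(-gamma * squared distance), so identical vectors score 1 and distant vectors approach 0. The sketch below is a plain-Python illustration; the gamma values and feature vectors are arbitrary assumptions, not settings from the paper.

```python
import math


def rbf_kernel(x, z, gamma=1.0):
    """RBF (Gaussian) kernel: exp(-gamma * ||x - z||^2).

    gamma controls how quickly similarity decays with distance;
    the default of 1.0 is an arbitrary illustrative choice.
    """
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


# Hypothetical 2-feature vectors for three rallies.
rally_a = [0.2, 0.9]
rally_b = [0.25, 0.85]   # close to rally_a
rally_c = [0.9, 0.1]     # far from rally_a
print(rbf_kernel(rally_a, rally_b, gamma=2.0))
print(rbf_kernel(rally_a, rally_c, gamma=2.0))
```

An SVM with this kernel draws its decision boundary in terms of such similarities to the support vectors, which is what lets it separate classes that are not linearly separable in the original features.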

12. Enhancing wind erosion risk assessment through remote sensing techniques.

Author: Boali A, Kariminejad N, and Hosseinalizadeh M
Subjects: Risk Assessment methods, Iran, Environmental Monitoring methods, Support Vector Machine, Machine Learning, Models, Theoretical, Conservation of Natural Resources methods, Wind, Remote Sensing Technology methods
Abstract: Preventing wind erosion and dust storms has always been a major concern in arid and semi-arid areas because of their negative effects on the environment. This study aims to utilize remote sensing and machine learning techniques to model, monitor, and predict the risk of wind erosion in Northeast Iran. Through an examination of relevant studies, a comprehensive review was conducted, leading to the identification of eight remote sensing indicators that exhibited the highest correlation with field data. These indicators were subsequently employed to model the risk of wind erosion in the study area. Various methods including Random Forest (RF), Support Vector Machine (SVM), Gradient Boosting Machine (GBM), and Generalized Linear Models (GLM) were employed to carry out the modeling process. The final method utilized a weighted average of the model, and the SDM statistical package was used to combine different approaches to decrease uncertainty when modeling and monitoring wind erosion in the area. The modeling results indicated that in 2008, the RF model performed the best (AUC = 0.92, TSS = 0.82, and Kappa = 0.96), while in 2023, the GBM model showed superior performance (AUC = 0.95, TSS = 0.79, and Kappa = 0.95). Therefore, the utilization of an ensemble model emerged as an effective approach to reduce uncertainty during the modeling process. By employing the ensemble model, the outcomes obtained accurately depicted an elevated intensity of wind erosion in the northeastern regions of the study area by 2023. Furthermore, considering the climatic scenarios and projected land use changes, it is anticipated that wind erosion intensity will experience a 23% increase in the central and southern parts of the study area by 2038. 
By taking into account the reliable results of the ensemble model, which offers reduced uncertainty, it becomes feasible to implement effective planning, optimal management, and appropriate measures to mitigate the progression of wind erosion.
Competing Interests: The authors have declared that no competing interests exist.
Copyright: © 2024 Boali et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Published: 2024
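
Note on the ensemble and the TSS metric in the abstract above: the sketch below shows a weighted average of per-model risk predictions and the true skill statistic (TSS = sensitivity + specificity - 1). The abstract does not say how the model weights were chosen, so the weights, model outputs, and function names here are hypothetical.

```python
def weighted_average(predictions, weights):
    """Combine per-model predictions into one ensemble value.

    predictions: dict of model name -> predicted risk in [0, 1].
    weights: dict of model name -> non-negative weight (for example a
    validation score; the weighting scheme here is an assumption).
    """
    total_weight = sum(weights[name] for name in predictions)
    return sum(predictions[name] * weights[name] for name in predictions) / total_weight


def true_skill_statistic(sensitivity, specificity):
    """TSS ranges from -1 to 1; 0 means no better than chance."""
    return sensitivity + specificity - 1.0


# Hypothetical per-model risk predictions for one map cell.
cell_predictions = {"RF": 0.8, "SVM": 0.7, "GBM": 0.6, "GLM": 0.5}
# Hypothetical weights (not values reported in the abstract).
model_weights = {"RF": 3.0, "SVM": 1.0, "GBM": 3.0, "GLM": 1.0}
print(weighted_average(cell_predictions, model_weights))
print(true_skill_statistic(sensitivity=0.9, specificity=0.9))
```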

13. Machine learning algorithms to predict treatment success for patients with pulmonary tuberculosis.

Author: Ahamed Fayaz S, Babu L, Paridayal L, Vasantha M, Paramasivam P, Sundarakumar K, and Ponnuraja C
Subjects: Humans, Female, Male, Retrospective Studies, Treatment Outcome, Adult, Sputum microbiology, Antitubercular Agents therapeutic use, Support Vector Machine, Middle Aged, Mycobacterium tuberculosis isolation & purification, Tuberculosis, Pulmonary drug therapy, Tuberculosis, Pulmonary microbiology, Tuberculosis, Pulmonary diagnosis, Machine Learning, Algorithms
Abstract: Despite advancements in detection and treatment, tuberculosis (TB), an infectious illness caused by the bacterium Mycobacterium tuberculosis, continues to pose a serious threat to world health. The TB diagnosis phase includes a patient's medical history, physical examination, chest X-rays, and laboratory procedures, such as molecular testing and sputum culture. In artificial intelligence (AI), machine learning (ML) is an advanced study of statistical algorithms that can learn from historical data and generalize the results to unseen data. Few studies have examined ML algorithms that enable the prediction of treatment success for patients with pulmonary TB (PTB). The objective of this study is to identify an effective and predictive ML algorithm to evaluate the detection of treatment success in PTB patients and to compare the predictive performance of the ML models. In this retrospective study, a total of 1236 PTB patients who were given treatment under a randomized controlled clinical trial at the ICMR-National Institute for Research in Tuberculosis, Chennai, India were considered for data analysis. Multiple ML models were developed and tested to identify the best algorithm to predict the sputum culture conversion of TB patients during the treatment period. In this study, decision tree (DT), random forest (RF), support vector machine (SVM) and naïve Bayes (NB) models were validated with high performance by achieving an area under the curve (AUC) of receiver operating characteristic (ROC) greater than 80%. The salient finding of the study is that the DT model emerged as the best algorithm, with the highest accuracy (92.72%), AUC (0.909), precision (95.90%), recall (95.60%) and F1-score (95.75%) among the ML models. This methodology may be used to study the precise ML model classification for predicting the treatment success of TB patients during the treatment period.
Competing Interests: The authors have declared that no competing interests exist.
Copyright: © 2024 Ahamed Fayaz et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Published: 2024
Full Text: View/download PDF

14. Machine learning-aided microRNA discovery for olive oil quality.

Author: Pakdel MH, Asadi AA, Tavakol E, Shariati V, and Hosseini Mazinani M
Subjects: Gene Expression Regulation, Plant, RNA, Plant genetics, Support Vector Machine, MicroRNAs genetics, Olive Oil, Machine Learning, Olea genetics, Olea metabolism
Abstract: MicroRNAs (miRNAs) are key regulators of gene expression in plants, influencing various biological processes such as oil quality and seed development. Although our knowledge about miRNAs in olive (Olea europaea L.) is progressing, with several miRNAs identified in previous studies, most of these reported miRNAs were predicted without the aid of a reference genome, primarily due to limited genome accessibility at the time. Significant knowledge gaps therefore remain in this area. This study addresses the complexities of miRNA detection in olive, using a high-quality reference genome and a combination of genomics and machine learning-based methods. By leveraging random forest and support vector machine algorithms, we successfully identified 56 novel miRNAs in olive, surpassing the limitations of conventional homology-based methods. Our subsequent analysis revealed that some of these miRNAs are implicated in the regulation of key genes involved in oil quality. Within the context of oil biosynthesis pathways, the novel miRNA Oeu124369 regulates fatty acid biosynthesis by targeting acetyl-CoA acyltransferase 1 and palmitoyl-protein thioesterase, thereby influencing the production of acetyl-CoA and palmitic acid, respectively. These findings underscore the power of machine learning in unraveling the complex miRNA regulatory network in olive and provide a high-quality miRNA resource for future research aimed at improving olive oil production by exploring the target genes of the identified miRNAs to understand their role and their biological processes., Competing Interests: The authors have declared that no competing interests exist., (Copyright: © 2024 Pakdel et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.)
Published: 2024
Full Text: View/download PDF

15. Comparative study of machine learning approaches integrated with genetic algorithm for IVF success prediction.

Author: Dehghan S, Rabiei R, Choobineh H, Maghooli K, Nazari M, and Vahidi-Asl M
Subjects: Humans, Female, Adult, Male, Pregnancy, Neural Networks, Computer, Support Vector Machine, Fertilization in Vitro methods, Machine Learning, Algorithms
Abstract: Introduction: IVF is a widely used assisted reproductive technology with a consistent success rate of around 30%, and improving this rate is crucial due to emotional, financial, and health-related implications for infertile couples. This study aimed to develop a model for predicting IVF outcome by comparing five machine-learning techniques., Method: The study compared five prominent machine learning algorithms, namely Random Forest, Artificial Neural Network (ANN), Support Vector Machine (SVM), Recursive Partitioning and Regression Trees (RPART), and AdaBoost, in the context of IVF success prediction. The study also incorporated a genetic algorithm (GA) as a feature selection method to enhance the predictive models' robustness., Results: Findings demonstrate that AdaBoost, particularly when combined with GA feature selection, achieved the highest accuracy rate of 89.8%. Using GA, Random Forest also demonstrated strong performance, achieving an accuracy rate of 87.4%. The GA significantly improved the performance of all classifiers, emphasizing the importance of feature selection. Ten crucial features, including female age, AMH, endometrial thickness, sperm count, and various indicators of oocyte and embryo quality, were identified as key determinants of IVF success., Conclusion: These findings underscore the potential of machine learning and feature selection techniques to assist IVF clinicians in providing more accurate predictions, enabling tailored treatment plans for each patient. Future research and validation can further enhance the practicality and reliability of these predictive models in clinical IVF practice., Competing Interests: The authors have declared that no competing interests exist., (Copyright: © 2024 Dehghan et al.
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.)
Published: 2024
Full Text: View/download PDF

16. A machine learning approach in a monocentric cohort for predicting primary refractory disease in Diffuse Large B-cell lymphoma patients.

Author: Detrait MY, Warnon S, Lagasse R, Dumont L, De Prophétis S, Hansenne A, Raedemaeker J, Robin V, Verstraete G, Gillain A, Depasse N, Jacmin P, and Pranger D
Subjects: Humans, Male, Female, Middle Aged, Aged, Retrospective Studies, Adult, Prognosis, Aged, 80 and over, Support Vector Machine, ROC Curve, Cohort Studies, Positron Emission Tomography Computed Tomography, Lymphoma, Large B-Cell, Diffuse diagnosis, Lymphoma, Large B-Cell, Diffuse pathology, Lymphoma, Large B-Cell, Diffuse mortality, Machine Learning
Abstract: Introduction: Primary refractory disease affects 30-40% of patients diagnosed with DLBCL and is a significant challenge in disease management due to its poor prognosis. Predicting refractory status could greatly inform treatment strategies, enabling early intervention. Various options are now available based on patient and disease characteristics. Supervised machine-learning techniques, which can predict outcomes in a medical context, appear highly suitable for this purpose., Design: Retrospective monocentric cohort study., Patient Population: Adult patients with a first diagnosis of DLBCL admitted to the hematology unit from 2017 to 2022., Aim: At our center, we evaluated five supervised machine-learning (ML) models as tools for the prediction of primary refractory DLBCL., Main Results: One hundred and thirty patients with Diffuse Large B-cell lymphoma (DLBCL) were included in this study between January 2017 and December 2022. The variables used for analysis included demographic characteristics, clinical condition, disease characteristics, first-line therapy and whether a PET-CT scan was performed after 2 cycles of treatment. We compared five supervised ML models: support vector machine (SVM), Random Forest Classifier (RFC), Logistic Regression (LR), Naïve Bayes (NB) Categorical classifier and eXtreme Gradient Boost (XGBoost), to predict primary refractory disease. The performance of these models was evaluated using the area under the receiver operating characteristic curve (ROC-AUC), accuracy, false positive rate, sensitivity, and F1-score to identify the best model. After a median follow-up of 19.5 months, the overall survival rate was 60% in the cohort. The overall survival at 3 years was 58.5% (95%CI, 51-68.5) and the 3-year progression-free survival was 63% (95%CI, 54-71) using the Kaplan-Meier method.
Of the 124 patients who received a first line treatment, primary refractory disease occurred in 42 patients (33.8%) and 2 patients (1.6%) experienced relapse within 6 months. The univariate analysis on refractory disease status identified age (p = 0.009), Ann Arbor stage (p = 0.013), CMV infection (p = 0.012), comorbidity (p = 0.019), IPI score (p<0.001), first line of treatment (p<0.001), EBV infection (p = 0.008) and socioeconomic status (p = 0.02) as influencing factors. The NB Categorical classifier emerged as the top-performing model, with a ROC-AUC of 0.81 (95% CI, 0.64-0.96), an accuracy of 83%, an F1-score of 0.82, and a low false positive rate of 10% on the validation set. The eXtreme Gradient Boost (XGBoost) model and the Random Forest Classifier (RFC) followed with a ROC-AUC of 0.74 (95%CI, 0.52-0.93) and 0.67 (95%CI, 0.46-0.88) respectively, an accuracy of 78% and 72% respectively, an F1-score of 0.75 and 0.67 respectively, and a false positive rate of 10% for both. The other two models performed worse with ROC-AUC of 0.65 (95%CI, 0.40-0.87) and 0.45 (95%CI, 0.29-0.64) for SVM and LR respectively, an accuracy of 67% and 50% respectively, an F1-score of 0.64 and 0.43 respectively, and a false positive rate of 28% and 37% respectively., Conclusion: Machine learning algorithms, particularly the NB Categorical classifier, have the potential to improve the prediction of primary refractory disease in DLBCL patients, thereby providing a novel decision-making tool for managing this condition. To validate these results on a broader scale, multicenter studies are needed to confirm the results in larger cohorts., Competing Interests: The authors have declared that no competing interests exist., (Copyright: © 2024 Detrait et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.)
Published: 2024
Full Text: View/download PDF

17. Analysis of influencing factors of traffic accidents on urban ring road based on the SVM model optimized by Bayesian method.

Author: Wang L, Xiao M, Lv J, and Liu J
Subjects: Humans, China, Cities, Accidents, Traffic statistics & numerical data, Bayes Theorem, Support Vector Machine
Abstract: Fully exploring the potential influencing factors of traffic accident severity from small-scale samples of accident data in specific scenarios has become a key and effective research method. In order to analyze these factors in the scenario of urban ring roads, this paper collected data records of 1250 traffic accidents of differing severity on the urban ring road of a central city in northwest China over the past 3 years. First, the Support Vector Machine (SVM) model, a non-parametric method, is used to analyze these data, and three kernel functions (linear, inhomogeneous polynomial and Gaussian radial basis) are constructed. Considering comprehensively 16 potential influencing factors covering the driver-vehicle-road-environment integrated system, the SVM models with these three kernel functions are verified, with accuracy reaching 0.771 and F1 reaching 0.841. Then, Bayesian Optimization (BO), Grid Search (GS) and Rough Set (RS) are used as optimizers to adjust the parameters of the Gaussian radial basis function SVM model. The performance of BO-SVM is further improved and reaches the optimum, with an average accuracy of 0.875 on the test set and an F1 of 0.886, outperforming the benchmark models of GS-SVM, RS-SVM, Bilayer-LSTM and BP. Finally, the sensitivity analysis method is used to quantify the sensitivity of accident severity to the potential influencing factors, and the backward selection method is used to screen the core factors that influence accident severity, leading to the conclusion that the core influencing factors are age, driving mileage and vehicle type.
This paper provides a reference for the analysis of the significant influencing factors of road accident severity and theoretical support for the precise formulation of accident improvement strategies., Competing Interests: The authors have declared that no competing interests exist., (Copyright: © 2024 Wang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.)
Published: 2024
Full Text: View/download PDF

18. tRF-BERT: A transformative approach to aspect-based sentiment analysis in the Bengali language.

Author: Ahmed S, Samia MM, Sayma MH, Kabir MM, and Mridha MF
Subjects: Humans, Support Vector Machine, Social Media, Neural Networks, Computer, Algorithms, Restaurants, Language
Abstract: In recent years, the surge in reviews and comments on newspapers and social media has made sentiment analysis a focal point of interest for researchers. Sentiment analysis is also gaining popularity in the Bengali language. However, Aspect-Based Sentiment Analysis is considered a difficult task in the Bengali language due to the shortage of perfectly labeled datasets and the complex variations in the Bengali language. This study used two open-source benchmark datasets of the Bengali language, Cricket, and Restaurant, for our Aspect-Based Sentiment Analysis task. The original work was based on the Random Forest, Support Vector Machine, K-Nearest Neighbors, and Convolutional Neural Network models. In this work, we used the Bidirectional Encoder Representations from Transformers, the Robustly Optimized BERT Approach, and our proposed hybrid transformative Random Forest and Bidirectional Encoder Representations from Transformers (tRF-BERT) models to compare the results with the existing work. After comparing the results, we can clearly see that all the models used in our work achieved better results than any of the previous works on the same dataset. Amongst them, our proposed transformative Random Forest and Bidirectional Encoder Representations from Transformers achieved the highest F1 score and accuracy. The accuracy and F1 score of aspect detection for the Cricket dataset were 0.89 and 0.85, respectively, and for the Restaurant dataset were 0.92 and 0.89 respectively., Competing Interests: The authors have declared that no competing interests exist., (Copyright: © 2024 Ahmed et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.)
Published: 2024
Full Text: View/download PDF

19. Forecasting for Haditha reservoir inflow in the West of Iraq using Support Vector Machine (SVM).

Author: Mahmood OA, Sulaiman SO, and Al-Jumeily D
Subjects: Iraq, Water Supply, Seasons, Support Vector Machine, Forecasting methods, Rivers
Abstract: Accurate inflow forecasting is an essential non-engineering strategy to guarantee flood management and boost the effectiveness of the water supply. As inflow is the primary reservoir input, precise inflow forecasting may also offer appropriate reservoir design and management assistance. This study aims to generalize the machine learning model using the support vector machine (SVM), specifically support vector regression (SVR), to predict the discharges of the Euphrates River upstream of the Haditha Dam reservoir in Anbar province in the west of Iraq. Time series data were collected for the period 1986-2024 for the river's daily, monthly, and seasonal flow. Different SVR kernel functions were applied in this study: linear, quadratic, and Gaussian (RBF). The results showed that performance on the daily time scale is better than on the monthly and seasonal scales. The linear kernel with a time delay of one day outperformed the other SVR kernels, based on the coefficient of determination (R² = 0.95) and the root mean square error (RMSE = 53.29 m³/s) for predicting daily river flow. The results showed that the proposed machine learning model performed well in predicting the daily flow of the Euphrates River upstream of the Haditha Dam reservoir; this indicates that the model might effectively forecast flows, which helps improve water resource management and dam operations., Competing Interests: The authors have declared that no competing interests exist., (Copyright: © 2024 Mahmood et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.)
Published: 2024
Full Text: View/download PDF

20. A machine learning-based prediction of hospital mortality in mechanically ventilated ICU patients.

Author: Li H, Ashrafi N, Kang C, Zhao G, Chen Y, and Pishgar M
Subjects: Humans, Male, Female, Middle Aged, Aged, Support Vector Machine, Critical Illness mortality, Databases, Factual, Respiration, Artificial mortality, Hospital Mortality, Intensive Care Units, Machine Learning
Abstract: Background: Mechanical ventilation (MV) is vital for critically ill ICU patients but carries significant mortality risks. This study aims to develop a predictive model to estimate hospital mortality among MV patients, utilizing comprehensive health data to assist ICU physicians with early-stage alerts., Methods: We developed a Machine Learning (ML) framework to predict hospital mortality in ICU patients receiving MV. Using the MIMIC-III database, we identified 25,202 eligible patients through ICD-9 codes. We employed backward elimination and the Lasso method, selecting 32 features based on clinical insights and literature. Data preprocessing included eliminating columns with over 90% missing data and using mean imputation for the remaining missing values. To address class imbalance, we used the Synthetic Minority Over-sampling Technique (SMOTE). We evaluated several ML models, including CatBoost, XGBoost, Decision Tree, Random Forest, Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and Logistic Regression, using a 70/30 train-test split. The CatBoost model was chosen for its superior performance in terms of accuracy, precision, recall, F1-score, AUROC metrics, and calibration plots., Results: The study involved a cohort of 25,202 patients on MV. The CatBoost model attained an AUROC of 0.862, an increase from an initial AUROC of 0.821, which was the best reported in the literature. It also demonstrated an accuracy of 0.789, an F1-score of 0.747, and better calibration, outperforming other models. These improvements are due to systematic feature selection and the robust gradient boosting architecture of CatBoost., Conclusion: The preprocessing methodology significantly reduced the number of relevant features, simplifying computational processes, and identified critical features previously overlooked. Integrating these features and tuning the parameters, our model demonstrated strong generalization to unseen data. 
This highlights the potential of ML as a crucial tool in ICUs, enhancing resource allocation and providing more personalized interventions for MV patients., Competing Interests: The authors have declared that no competing interests exist., (Copyright: © 2024 Li et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.)
Published: 2024
Full Text: View/download PDF

21. Mathematical modeling and numerical simulation of supercritical processing of drug nanoparticles optimization for green processing: AI analysis.

Author: Aljohani K
Subjects: Carbon Dioxide chemistry, Models, Theoretical, Green Chemistry Technology methods, Computer Simulation, Temperature, Support Vector Machine, Nanoparticles chemistry, Artificial Intelligence, Solubility
Abstract: In recent decades, the poor solubility of novel therapeutic agents has been considered an important challenge in the pharmaceutical industry. Supercritical carbon dioxide (SCCO2) is known as a green, cost-effective, high-performance, and promising solvent for improving the low solubility of drugs with the aim of enhancing their therapeutic effects. The prominent objective of this study is to improve and modify disparate predictive models through artificial intelligence (AI) to estimate the optimized value of the Oxaprozin solubility in the SCCO2 system. In this paper, three different models were developed on a solubility dataset. Pressure (bar) and temperature (K) are the two inputs for each vector, and each vector has one output (solubility). The selected models are NU-SVM, Linear-SVM, and Decision Tree (DT). Models were optimized through their hyper-parameters and assessed using standard metrics. On the R-squared metric, NU-SVM, Linear-SVM, and DT have scores of 0.994, 0.854, and 0.950, respectively. They have RMSE values of 3.0982E-05, 1.5024E-04, and 1.1680E-04, respectively. Based on these evaluations, NU-SVM was considered the most precise method, and the optimal values obtained with this model can be summarized as (T = 336.05 K, P = 400.0 bar, solubility = 0.00127)., Competing Interests: The authors have no conflicts of interest to declare., (Copyright: © 2024 Khalid Aljohani. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.)
Published: 2024
Full Text: View/download PDF

22. Proteome-scale prediction of molecular mechanisms underlying dominant genetic diseases.

Author: Badonyi M and Marsh JA
Subjects: Humans, Support Vector Machine, Genes, Dominant, Mutation, Missense, Gain of Function Mutation, Loss of Function Mutation, Proteome, Genetic Diseases, Inborn genetics
Abstract: Many dominant genetic disorders result from protein-altering mutations, acting primarily through dominant-negative (DN), gain-of-function (GOF), and loss-of-function (LOF) mechanisms. Deciphering the mechanisms by which dominant diseases exert their effects is often experimentally challenging and resource intensive, but is essential for developing appropriate therapeutic approaches. Diseases that arise via a LOF mechanism are more amenable to be treated by conventional gene therapy, whereas DN and GOF mechanisms may require gene editing or targeting by small molecules. Moreover, pathogenic missense mutations that act via DN and GOF mechanisms are more difficult to identify than those that act via LOF using nearly all currently available variant effect predictors. Here, we introduce a tripartite statistical model made up of support vector machine binary classifiers trained to predict whether human protein coding genes are likely to be associated with DN, GOF, or LOF molecular disease mechanisms. We test the utility of the predictions by examining biologically and clinically meaningful properties known to be associated with the mechanisms. Our results strongly support that the models are able to generalise on unseen data and offer insight into the functional attributes of proteins associated with different mechanisms. We hope that our predictions will serve as a springboard for researchers studying novel variants and those of uncertain clinical significance, guiding variant interpretation strategies and experimental characterisation. Predictions for the human UniProt reference proteome are available at https://osf.io/z4dcp/., Competing Interests: The authors have declared that no competing interests exist., (Copyright: © 2024 Badonyi, Marsh. 
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.)
Published: 2024
Full Text: View/download PDF

23. LERCause: Deep learning approaches for causal sentence identification from nuclear safety reports.

Author: Kim J, Kim J, Lee A, Kim J, and Diesner J
Subjects: Humans, Safety, Natural Language Processing, Neural Networks, Computer, Support Vector Machine, Deep Learning
Abstract: Identifying causal sentences from nuclear incident reports is essential for advancing nuclear safety research and applications. Nonetheless, accurately locating and labeling causal sentences in text data is challenging, and might benefit from the usage of automated techniques. In this paper, we introduce LERCause, a labeled dataset combined with labeling methods meant to serve as a foundation for the classification of causal sentences in the domain of nuclear safety. We applied three BERT models (BERT, BioBERT, and SciBERT) to 10,608 annotated sentences from the Licensee Event Report (LER) corpus for predicting sentence labels (Causal vs. non-Causal). We also used a keyword-based heuristic strategy, three standard machine learning methods (Logistic Regression, Gradient Boosting, and Support Vector Machine), and a deep learning approach (Convolutional Neural Network; CNN) for comparison. We found that the BERT-centric models outperformed all other tested models in terms of all evaluation metrics (accuracy, precision, recall, and F1 score). BioBERT resulted in the highest overall F1 score of 94.49% from the ten-fold cross-validation. Our dataset and coding framework can provide a robust baseline for assessing and comparing new causal sentence extraction techniques. As far as we know, our research breaks new ground by leveraging BERT-centric models for causal sentence classification in the nuclear safety domain and by openly distributing labeled data and code to enable reproducibility in subsequent research., Competing Interests: The authors have declared that no competing interests exist., (Copyright: © 2024 Kim et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.)
Published: 2024
Full Text: View/download PDF

24. An optimized model based on adaptive convolutional neural network and grey wolf algorithm for breast cancer diagnosis.

Author: Alnowaiser K, Saber A, Hassan E, and Awad WA
Subjects: Humans, Female, Mammography methods, Diagnosis, Computer-Assisted methods, Support Vector Machine, ROC Curve, Breast Neoplasms diagnosis, Breast Neoplasms diagnostic imaging, Neural Networks, Computer, Algorithms
Abstract: Medical image classification (IC) is a method for categorizing images according to the appropriate pathological stage. It is a crucial stage in computer-aided diagnosis (CAD) systems, which were created to help radiologists with reading and analyzing medical images as well as with the early detection of tumors and other disorders. The use of convolutional neural network (CNN) models in the medical industry has recently increased, and they achieve great results at IC, particularly in terms of high performance and robustness. The proposed method uses pre-trained models such as Dense Convolutional Network (DenseNet)-121 and Visual Geometry Group (VGG)-16 as feature extractor networks, bidirectional long short-term memory (BiLSTM) layers for temporal feature extraction, and the Support Vector Machine (SVM) and Random Forest (RF) algorithms to perform classification. For improved performance, the selected pre-trained CNN hyperparameters have been optimized using a modified grey wolf optimization method. The experimental analysis of the presented model shows that the VGG16 model is powerful for breast cancer (BC) classification, with overall accuracy, sensitivity, specificity, precision, and area under the ROC curve (AUC) of 99.86%, 99.9%, 99.7%, 97.1%, and 1.0, respectively, on the Mammographic Image Analysis Society (MIAS) dataset and 99.4%, 99.03%, 99.2%, 97.4%, and 1.0, respectively, on the INbreast dataset., Competing Interests: The authors have declared that no competing interests exist., (Copyright: © 2024 Alnowaiser et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.)
Published: 2024
Full Text: View/download PDF

25. Developing a transit desert interactive dashboard: Supervised modeling for forecasting transit deserts.

Author: Choi SJ and Jiao J
Subjects: Humans, Transportation, Male, Female, Support Vector Machine, Supervised Machine Learning, Decision Trees, Built Environment, Models, Theoretical, Machine Learning, Forecasting methods
Abstract: Transit deserts refer to regions with a gap in transit services, with the demand for transit exceeding the supply. This study goes beyond merely identifying transit deserts to suggest actionable solutions. Using a multi-class supervised machine learning framework, we analyzed factors leading to transit deserts, distinguishing demand by gender. Our focus was on peak-time periods. After assessing the Support Vector Machine, Decision Tree, Random Forest, and K-nearest Neighbor, we settled on the Random Forest method, supported by Diverse Counterfactual Explanation and SHapley Additive Explanation in our analysis. The ranking of feature importance in the trained Random Forest model revealed that factors such as density, design, distance to transit, diversity in the built environment, and sociodemographic characteristics significantly contribute to the classification of transit deserts. Diverse Counterfactual Explanation suggested that a reduction in population density and an increase in the proportion of green open spaces would likely facilitate the transformation of transit deserts into transit oases. SHapley Additive Explanation highlighted the differential impact of various features on each identified transit desert. Our analysis results indicate that identifying transit deserts can vary depending on whether the data is aggregated or separated by demographics. We found areas that have unique transit needs based on gender. The disparity in transit services was particularly pronounced for women. Our model pinpointed the core elements that define a transit desert. Broadly, to address transit deserts, strategies should prioritize the needs of disadvantaged groups and enhance the design and accessibility of transit in the built environment. Our research extends existing analyses of transit deserts by leveraging machine learning to develop a predictive model. We developed a machine learning-powered interactive dashboard. 
Integrating participatory planning approaches with the development of an interactive interface could enhance ongoing community engagement. Planning practices can evolve with AI in the loop., Competing Interests: The authors declare that they have no known competing interests or personal relationships that could have appeared to influence the work reported in this paper., (Copyright: © 2024 Choi, Jiao. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.)
Published: 2024
Full Text: View/download PDF

26. HAPI: An efficient Hybrid Feature Engineering-based Approach for Propaganda Identification in social media.

Author: Khanday AMUD, Wani MA, Rabani ST, Khan QR, and Abd El-Latif AA
Subjects: Humans, Support Vector Machine, Bayes Theorem, Social Media, Algorithms, Machine Learning
Abstract: Social media platforms serve as communication tools where users freely share information regardless of its accuracy. Propaganda on these platforms refers to the dissemination of biased or deceptive information aimed at influencing public opinion, encompassing various forms such as political campaigns, fake news, and conspiracy theories. This study introduces a Hybrid Feature Engineering Approach for Propaganda Identification (HAPI), designed to detect propaganda in text-based content like news articles and social media posts. HAPI combines conventional feature engineering methods with machine learning techniques to achieve high accuracy in propaganda detection. This study is conducted on data collected from Twitter via its API, and an annotation scheme is proposed to categorize tweets into binary classes (propaganda and non-propaganda). Hybrid feature engineering entails the amalgamation of various features, including Term Frequency-Inverse Document Frequency (TF-IDF), Bag of Words (BoW), sentiment features, and tweet length, among others. Multiple machine learning classifiers are trained and evaluated using the proposed methodology, leveraging a selection of 40 pertinent features identified through the hybrid feature selection technique. All the selected algorithms, including Multinomial Naive Bayes (MNB), Support Vector Machine (SVM), Decision Tree (DT), and Logistic Regression (LR), achieved promising results. The SVM-based HAPI (SVM-HAPI) exhibits superior performance among traditional algorithms, achieving precision, recall, F-Measure, and overall accuracy of 0.69, 0.69, 0.69, and 69.2%, respectively. Furthermore, the proposed approach is compared to well-known existing approaches, where it outperformed most of the studies on several evaluation metrics. This research contributes to the development of a comprehensive system tailored for propaganda identification in textual content.
Nonetheless, the purview of propaganda detection transcends textual data alone. Deep learning algorithms like Artificial Neural Networks (ANN) offer the capability to manage multimodal data, incorporating text, images, audio, and video, thereby considering not only the content itself but also its presentation and contextual nuances during dissemination., Competing Interests: The authors have declared that no competing interests exist., (Copyright: © 2024 Khanday et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.)
Published: 2024
Full Text: View/download PDF

27. Assessing the impact of climate variability on maize yields in the different regions of Ghana-A machine learning perspective.

Author: Gyamerah SA, Asare C, Agbi-Kaeser HO, and Baffour-Ata F
Subjects: Ghana, Climate Change, Support Vector Machine, Agriculture methods, Climate, Crops, Agricultural growth & development, Carbon Dioxide analysis, Carbon Dioxide metabolism, Temperature, Zea mays growth & development, Machine Learning
Abstract: Climate variability has become one of the most pressing issues of our time, affecting various aspects of the environment, including the agriculture sector. This study examines the impact of climate variability on Ghana's maize yield for all agro-ecological zones and administrative regions in Ghana using annual data from 1992 to 2019. The study also employs the stacking ensemble learning model (SELM) in predicting the maize yield in the different regions taking random forest (RF), support vector machine (SVM), gradient boosting (GB), decision tree (DT), and linear regression (LR) as base models. The findings of the study reveal that maize production in the regions of Ghana is inconsistent, with some regions having high variability. All the climate variables considered have positive impact on maize yield, with a lesser variability of temperature in the Guinea savanna zones and a higher temperature variability in the Volta Region. Carbon dioxide (CO2) also plays a significant role in predicting maize yield across all regions of Ghana. Among the machine learning models utilized, the stacking ensemble model consistently performed better in many regions such as in the Western, Upper East, Upper West, and Greater Accra regions. These findings are important in understanding the impact of climate variability on the yield of maize in Ghana, highlighting regional disparities in maize yield in the country, and highlighting the need for advanced techniques for forecasting, which are important for further investigation and interventions for agricultural planning and decision-making on food security in Ghana., Competing Interests: The authors have declared that no competing interests exist., (Copyright: © 2024 Gyamerah et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.)
Published: 2024
Full Text: View/download PDF

28. Remote sensing estimation of sugar beet SPAD based on un-manned aerial vehicle multispectral imagery.

Author: Gao W, Zeng W, Li S, Zhang L, Wang W, Song J, and Wu H
Subjects: Unmanned Aerial Devices, Support Vector Machine, Soil chemistry, Machine Learning, Crops, Agricultural growth & development, Agriculture methods, Droughts, Beta vulgaris, Remote Sensing Technology methods, Remote Sensing Technology instrumentation
Abstract: Accurate, non-destructive and cost-effective estimation of crop canopy Soil Plant Analysis Development (SPAD) is crucial for precision agriculture and cultivation management. Unmanned aerial vehicle (UAV) platforms have shown tremendous potential in predicting crop canopy SPAD. This was because they can rapidly and accurately acquire remote sensing spectral data of the crop canopy in real-time. In this study, a UAV equipped with a five-channel multispectral camera (Blue, Green, Red, Red_edge, Nir) was used to acquire multispectral images of sugar beets. These images were then combined with five machine learning models, namely K-Nearest Neighbor, Lasso, Random Forest, RidgeCV and Support Vector Machine (SVM), as well as ground measurement data to predict the canopy SPAD of sugar beets. The results showed that under both normal irrigation and drought stress conditions, the SPAD values in the normal irrigation treatment were higher than those in the water-limited treatment. Multiple vegetation indices showed a significant correlation with SPAD, with the highest correlation coefficient reaching 0.60. Among the SPAD prediction models, different models showed high estimation accuracy under both normal irrigation and water-limited conditions. The SVM model demonstrated a good performance with a correlation coefficient (R2) of 0.635, root mean square error (Rmse) of 2.13, and relative error (Re) of 0.80% for the prediction and testing values under normal irrigation. Similarly, for the prediction and testing values under drought stress, the SVM model exhibited a correlation coefficient (R2) of 0.609, root mean square error (Rmse) of 2.71, and relative error (Re) of 0.10%. Overall, the SVM model showed good accuracy and stability in the prediction model, greatly facilitating high-throughput phenotyping research of sugar beet canopy SPAD., Competing Interests: The authors have declared that no competing interests exist., (Copyright: © 2024 Gao et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.)
Published: 2024
Full Text: View/download PDF

29. Impact of economic indicators on rice production: A machine learning approach in Sri Lanka.

Author: Kularathne S, Rathnayake N, Herath M, Rathnayake U, and Hoshino Y
Subjects: Sri Lanka, Agriculture economics, Crops, Agricultural growth & development, Crops, Agricultural economics, Crop Production economics, Support Vector Machine, Oryza growth & development, Machine Learning
Abstract: Rice is a crucial crop in Sri Lanka, influencing both its agricultural and economic landscapes. This study delves into the complex interplay between economic indicators and rice production, aiming to uncover correlations and build prediction models using machine learning techniques. The dataset, spanning from 1960 to 2020, includes key economic variables such as GDP, inflation rate, manufacturing output, population, population growth rate, imports, arable land area, military expenditure, and rice production. The study's findings reveal the significant influence of economic factors on rice production in Sri Lanka. Machine learning models, including Linear Regression, Support Vector Machines, Ensemble methods, and Gaussian Process Regression, demonstrate strong predictive accuracy in forecasting rice production based on economic indicators. These results underscore the importance of economic indicators in shaping rice production outcomes and highlight the potential of machine learning in predicting agricultural trends. The study suggests avenues for future research, such as exploring regional variations and refining models based on ongoing data collection., Competing Interests: The authors have declared that no competing interests exist., (Copyright: © 2024 Kularathne et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.)
Published: 2024
Full Text: View/download PDF

30. Exploring the power of data mining for uncovering traditional medicinal plant knowledge: A case study in Shahrbabak, Iran.

Author: Bibak H, Heydari F, and Sadat-Hosseini M
Subjects: Iran, Humans, Middle Aged, Adult, Aged, Male, Female, Aged, 80 and over, Neural Networks, Computer, Knowledge, Algorithms, Support Vector Machine, Plants, Medicinal classification, Data Mining methods, Medicine, Traditional methods
Abstract: The present study recorded indigenous knowledge of medicinal plants in Shahrbabak, Iran. We described a method using data mining algorithms to predict medicinal plants' mode of application. Twenty-one individuals aged 28 to 81 were interviewed. Firstly, data were collected and analyzed based on quantitative indices such as the informant consensus factor (ICF), the cultural importance index (CI), and the relative frequency of citation (RFC). Secondly, the data was classified by support vector machines, J48 decision trees, neural networks, and logistic regression. So, 141 medicinal plants from 43 botanical families were documented. Lamiaceae, with 18 species, was the dominant family among plants, and plant leaves were most frequently used for medicinal purposes. The decoction was the most commonly used preparation method (56%), and therophytes were the most dominant (48.93%) among plants. Regarding the RFC index, the most important species are Adiantum capillus-veneris L. and Plantago ovata Forssk., while Artemisia auseri Boiss. ranked first based on the CI index. The ICF index demonstrated that metabolic disorders are the most common problems among plants in the Shahrbabak region. Finally, the J48 decision tree algorithm consistently outperforms other methods, achieving 95% accuracy in 10-fold cross-validation and 70-30 data split scenarios. The developed model detects with maximum accuracy how to consume medicinal plants., Competing Interests: The authors have declared that no competing interests exist., (Copyright: © 2024 Bibak et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.)
Published: 2024
Full Text: View/download PDF

31. Continuous Distributed Representation of Biological Sequences for Deep Proteomics and Genomics.

Author: Asgari, Ehsaneddin and Mofrad, Mohammad
Subjects: Amino Acid Sequence, Computational Biology, Databases, Protein, Genomics, Intrinsically Disordered Proteins, Natural Language Processing, Nuclear Pore Complex Proteins, Protein Structure, Secondary, Proteins, Proteomics, Support Vector Machine
Abstract: We introduce a new representation and feature extraction method for biological sequences. Named bio-vectors (BioVec) to refer to biological sequences in general with protein-vectors (ProtVec) for proteins (amino-acid sequences) and gene-vectors (GeneVec) for gene sequences, this representation can be widely used in applications of deep learning in proteomics and genomics. In the present paper, we focus on protein-vectors that can be utilized in a wide array of bioinformatics investigations such as family classification, protein visualization, structure prediction, disordered protein identification, and protein-protein interaction prediction. In this method, we adopt artificial neural network approaches and represent a protein sequence with a single dense n-dimensional vector. To evaluate this method, we apply it in classification of 324,018 protein sequences obtained from Swiss-Prot belonging to 7,027 protein families, where an average family classification accuracy of 93%±0.06% is obtained, outperforming existing family classification methods. In addition, we use ProtVec representation to predict disordered proteins from structured proteins. Two databases of disordered sequences are used: the DisProt database as well as a database featuring the disordered regions of nucleoporins rich with phenylalanine-glycine repeats (FG-Nups). Using support vector machine classifiers, FG-Nup sequences are distinguished from structured protein sequences found in Protein Data Bank (PDB) with a 99.8% accuracy, and unstructured DisProt sequences are differentiated from structured DisProt sequences with 100.0% accuracy. These results indicate that by only providing sequence data for various proteins into this model, accurate information about protein structure can be determined. Importantly, this model needs to be trained only once and can then be applied to extract a comprehensive set of information regarding proteins of interest. 
Moreover, this representation can be considered as pre-training for various applications of deep learning in bioinformatics. The related data is available at Life Language Processing Website: http://llp.berkeley.edu and Harvard Dataverse: http://dx.doi.org/10.7910/DVN/JMFHTN.
Published: 2015

32. Insulin Resistance: Regression and Clustering

Author: Yoon, Sangho, Assimes, Themistocles L, Quertermous, Thomas, Hsiao, Chin-Fu, Chuang, Lee-Ming, Hwu, Chii-Min, Rajaratnam, Bala, and Olshen, Richard A
Subjects: Reproductive Medicine, Biomedical and Clinical Sciences, Biological Sciences, Nutrition, Diabetes, Blood Glucose, Cluster Analysis, Female, Glucose Tolerance Test, Humans, Insulin Resistance, Male, Polymorphism, Single Nucleotide, Principal Component Analysis, Regression Analysis, Reproducibility of Results, Support Vector Machine, General Science & Technology
Abstract: In this paper we try to define insulin resistance (IR) precisely for a group of Chinese women. Our definition deliberately does not depend upon body mass index (BMI) or age, although in other studies, with particular random effects models quite different from models used here, BMI accounts for a large part of the variability in IR. We accomplish our goal through application of Gauss mixture vector quantization (GMVQ), a technique for clustering that was developed for application to lossy data compression. Defining data come from measurements that play major roles in medical practice. A precise statement of what the data are is in Section 1. Their family structures are described in detail. They concern levels of lipids and the results of an oral glucose tolerance test (OGTT). We apply GMVQ to residuals obtained from regressions of outcomes of an OGTT and lipids on functions of age and BMI that are inferred from the data. A bootstrap procedure developed for our family data supplemented by insights from other approaches leads us to believe that two clusters are appropriate for defining IR precisely. One cluster consists of women who are IR, and the other of women who seem not to be. Genes and other features are used to predict cluster membership. We argue that prediction with "main effects" is not satisfactory, but prediction that includes interactions may be.
Published: 2014

33. Automatic detection of regions in spinach canopies responding to soil moisture deficit using combined visible and thermal imagery.

Author: Raza, Shan-e-Ahmed, Smith, Hazel K, Clarkson, Graham JJ, Taylor, Gail, Thompson, Andrew J, Clarkson, John, and Rajpoot, Nasir M
Subjects: Spinacia oleracea, Plant Leaves, Crops, Agricultural, Soil, Imaging, Three-Dimensional, Probability, Regression Analysis, Normal Distribution, Humidity, Temperature, Principal Component Analysis, Automation, Agricultural Irrigation, Support Vector Machine, General Science & Technology
Abstract: Thermal imaging has been used in the past for remote detection of regions of canopy showing symptoms of stress, including water deficit stress. Stress indices derived from thermal images have been used as an indicator of canopy water status, but these depend on the choice of reference surfaces and environmental conditions and can be confounded by variations in complex canopy structure. Therefore, in this work, instead of using stress indices, information from thermal and visible light imagery was combined along with machine learning techniques to identify regions of canopy showing a response to soil water deficit. Thermal and visible light images of a spinach canopy with different levels of soil moisture were captured. Statistical measurements from these images were extracted and used to classify between canopies growing in well-watered soil or under soil moisture deficit using Support Vector Machines (SVM) and Gaussian Processes Classifier (GPC) and a combination of both the classifiers. The classification results show a high correlation with soil moisture. We demonstrate that regions of a spinach crop responding to soil water deficit can be identified by using machine learning techniques with a high accuracy of 97%. This method could, in principle, be applied to any crop at a range of scales.
Published: 2014

34. Toward a semi-self-paced EEG brain computer interface: decoding initiation state from non-initiation state in dedicated time slots.

Author: Yang, Lingling, Leung, Howard, Peterson, David A, Sejnowski, Terrence J, and Poizner, Howard
Subjects: Humans, Electroencephalography, Electrooculography, Intention, Time Factors, Female, Male, Young Adult, Brain-Computer Interfaces, Support Vector Machine, General Science & Technology
Abstract: Brain computer interfaces (BCIs) offer a broad class of neurologically impaired individuals an alternative means to interact with the environment. Many BCIs are "synchronous" systems, in which the system sets the timing of the interaction and tries to infer what control command the subject is issuing at each prompting. In contrast, in "asynchronous" BCIs subjects pace the interaction and the system must determine when the subject's control command occurs. In this paper we propose a new idea for BCI which draws upon the strengths of both approaches. The subjects are externally paced and the BCI is able to determine when control commands are issued by decoding the subject's intention for initiating control in dedicated time slots. A single task with randomly interleaved trials was designed to test whether it can be used as stimulus for inducing initiation and non-initiation states when the sensory and motor requirements for the two types of trials are very nearly identical. Further, the essential problem on the discrimination between initiation state and non-initiation state was studied. We tested the ability of EEG spectral power to distinguish between these two states. Among the four standard EEG frequency bands, beta band power recorded over parietal-occipital cortices provided the best performance, achieving an average accuracy of 86% for the correct classification of initiation and non-initiation states. Moreover, delta band power recorded over parietal and motor areas yielded a good performance and thus could also be used as an alternative feature to discriminate these two mental states. The results demonstrate the viability of our proposed idea for a BCI design based on conventional EEG features. Our proposal offers the potential to mitigate the signal detection challenges of fully asynchronous BCIs, while providing greater flexibility to the subject than traditional synchronous BCIs.
Published: 2014

35. Inferring tie strength from online directed behavior.

Author: Jones, Jason J, Settle, Jaime E, Bond, Robert M, Fariss, Christopher J, Marlow, Cameron, and Fowler, James H
Subjects: Humans, Regression Analysis, Reproducibility of Results, ROC Curve, Social Behavior, Interpersonal Relations, Internet, Adolescent, Adult, Middle Aged, Friends, Female, Male, Young Adult, Social Networking, Support Vector Machine, General Science & Technology
Abstract: Some social connections are stronger than others. People have not only friends, but also best friends. Social scientists have long recognized this characteristic of social connections and researchers frequently use the term tie strength to refer to this concept. We used online interaction data (specifically, Facebook interactions) to successfully identify real-world strong ties. Ground truth was established by asking users themselves to name their closest friends in real life. We found the frequency of online interaction was diagnostic of strong ties, and interaction frequency was much more useful diagnostically than were attributes of the user or the user's friends. More private communications (messages) were not necessarily more informative than public communications (comments, wall posts, and other interactions).
Published: 2013

36. A Hybrid convolution neural network for the classification of tree species using hyperspectral imagery.

Author: Wang J and Jiang Y
Subjects: Algorithms, Hyperspectral Imaging methods, Deep Learning, Remote Sensing Technology methods, Neural Networks, Computer, Support Vector Machine, Trees classification
Abstract: In recent years, the advancement of hyperspectral remote sensing technology has greatly enhanced the detailed mapping of tree species. Nevertheless, delving deep into the significance of hyperspectral remote sensing data features for tree species recognition remains a challenging endeavor. The method of Hybrid-CS was proposed to address this challenge by synergizing the strengths of both deep learning and traditional learning techniques. Initially, we extract comprehensive correlation structures and spectral features. Subsequently, a hybrid approach, combining correlation-based feature selection with an optimized recursive feature elimination algorithm, identifies the most valuable feature set. We leverage the Support Vector Machine algorithm to evaluate feature importance and perform classification. Through rigorous experimentation, we evaluate the robustness of hyperspectral image-derived features and compare our method with other state-of-the-art classification methods. The results demonstrate: (1) Superior classification accuracy compared to traditional machine learning methods (e.g., SVM, RF) and advanced deep learning approaches on the tree species dataset. (2) Enhanced classification accuracy achieved by incorporating SVM and CNN information, particularly with the integration of attention mechanisms into the network architecture. Additionally, the classification performance of a two-branch network surpasses that of a single-branch network. (3) Consistent high accuracy across different proportions of training samples, indicating the stability and robustness of the method. This study underscores the potential of hyperspectral images and our proposed methodology for achieving precise tree species classification, thus holding significant promise for applications in forest resource management and monitoring., Competing Interests: The authors have declared that no competing interests exist., (Copyright: © 2024 Wang, Jiang. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.)
Published: 2024
Full Text: View/download PDF

37. Who are the best passing players in professional soccer? A machine learning approach for classifying passes with different levels of difficulty and discriminating the best passing players.

Author: Merlin M, Pinto A, Moura FA, Torres RDS, and Cunha SA
Subjects: Humans, Principal Component Analysis, Algorithms, Soccer, Athletic Performance classification, Machine Learning, Support Vector Machine
Abstract: The present study aimed to assess the use of technical-tactical variables and machine learning (ML) classifiers in the automatic classification of the passing difficulty (DP) level in soccer matches and to illustrate the use of the model with the best performance to distinguish the best passing players. We compared eight ML classifiers according to their accuracy performance in classifying passing events using 35 technical-tactical variables based on spatiotemporal data. The Support Vector Machine (SVM) algorithm achieved a balanced accuracy of 0.70 ± 0.04%, considering a multi-class classification. Next, we illustrate the use of the best-performing classifier in the assessment of players. In our study, 2,522 pass actions were classified by the SVM algorithm as low (53.9%), medium (23.6%), and high difficulty passes (22.5%). Furthermore, we used successful rates in low-DP, medium-DP, and high-DP as inputs for principal component analysis (PCA). The first principal component (PC1) showed a higher correlation with high-DP (0.80), followed by medium-DP (0.73), and low-DP accuracy (0.24). The PC1 scores were used to rank the best passing players. This information can be a very rich performance indication by ranking the best passing players and teams and can be applied in offensive sequences analysis and talent identification., Competing Interests: The authors have declared that no competing interests exist., (Copyright: © 2024 Merlin et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.)
Published: 2024
Full Text: View/download PDF

38. Image classification with symbolic hints using limited resources.

Author: Jørgensen MG, Tětková L, and Hansen LK
Subjects: Machine Learning, Algorithms, Image Processing, Computer-Assisted methods, Humans, Support Vector Machine
Abstract: Typical machine learning classification benchmark problems often ignore the full input data structures present in real-world classification problems. Here we aim to represent additional information as "hints" for classification. We show that under a specific realistic conditional independence assumption, the hint information can be included by late fusion. In two experiments involving image classification with hints taking the form of text metadata, we demonstrate the feasibility and performance of the fusion scheme. We fuse the output of pre-trained image classifiers with the output of pre-trained text models. We show that calibration of the pre-trained models is crucial for the performance of the fused model. We compare the performance of the fusion scheme with a mid-level fusion scheme based on support vector machines and find that these two methods tend to perform quite similarly, albeit the late fusion scheme has only negligible computational costs., Competing Interests: The authors have declared that no competing interests exist., (Copyright: © 2024 Jørgensen et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.)
Published: 2024
Full Text: View/download PDF

39. Predictive modeling of multi-class diabetes mellitus using machine learning and filtering Iraqi diabetes data dynamics.

Author: Sahid MA, Babar MUH, and Uddin MP
Subjects: Humans, Iraq epidemiology, Support Vector Machine, Blood Glucose analysis, Logistic Models, Machine Learning, Diabetes Mellitus diagnosis, Diabetes Mellitus blood
Abstract: Diabetes is a persistent metabolic disorder linked to elevated levels of blood glucose, commonly referred to as blood sugar. This condition can have detrimental effects on the heart, blood vessels, eyes, kidneys, and nerves as time passes. It is a chronic ailment that arises when the body fails to produce enough insulin or is unable to effectively use the insulin it produces. When diabetes is not properly managed, it often leads to hyperglycemia, a condition characterized by elevated blood sugar levels or impaired glucose tolerance. This can result in significant harm to various body systems, including the nerves and blood vessels. In this paper, we propose a multiclass diabetes mellitus detection and classification approach using an extremely imbalanced Laboratory of Medical City Hospital data dynamics. We also formulate a new dataset that is moderately imbalanced based on the Laboratory of Medical City Hospital data dynamics. To correctly identify the multiclass diabetes mellitus, we employ three machine learning classifiers namely support vector machine, logistic regression, and k-nearest neighbor. We also focus on dimensionality reduction (feature selection-filter, wrapper, and embedded method) to prune the unnecessary features and to scale up the classification performance. To optimize the classification performance of classifiers, we tune the model by hyperparameter optimization with 10-fold grid search cross-validation. In the case of the original extremely imbalanced dataset with 70:30 partition and support vector machine classifier, we achieved maximum accuracy of 0.964, precision of 0.968, recall of 0.964, F1-score of 0.962, Cohen kappa of 0.835, and AUC of 0.99 by using the top 4 features according to filter method. By using the top 9 features according to wrapper-based sequential feature selection, the k-nearest neighbor provides an accuracy of 0.935 and 1.0 for the other performance metrics. For our created moderately imbalanced dataset with an 80:20 partition, the SVM classifier achieves a maximum accuracy of 0.938, and 1.0 for other performance metrics. For the multiclass diabetes mellitus detection and classification, our experiments outperformed conducted research based on the Laboratory of Medical City Hospital data dynamics., Competing Interests: The authors have declared that no competing interests exist., (Copyright: © 2024 Sahid et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.)
Published: 2024
Full Text: View/download PDF

40. Predicting malaria outbreak in The Gambia using machine learning techniques.

Author: Khan O, Ajadi JO, and Hossain MP
Subjects: Gambia epidemiology, Humans, Neural Networks, Computer, Algorithms, Malaria epidemiology, Disease Outbreaks, Machine Learning, Support Vector Machine
Abstract: Malaria is the most common cause of death among the parasitic diseases. Malaria continues to pose a growing threat to the public health and economic growth of nations in the tropical and subtropical parts of the world. This study aims to address this challenge by developing a predictive model for malaria outbreaks in each district of The Gambia, leveraging historical meteorological data. To achieve this objective, we employ and compare the performance of eight machine learning algorithms, including C5.0 decision trees, artificial neural networks, k-nearest neighbors, support vector machines with linear and radial kernels, logistic regression, extreme gradient boosting, and random forests. The models are evaluated using 10-fold cross-validation during the training phase, repeated five times to ensure robust validation. Our findings reveal that extreme gradient boosting and decision trees exhibit the highest prediction accuracy on the testing set, achieving 93.3% accuracy, followed closely by random forests with 91.5% accuracy. In contrast, the support vector machine with a linear kernel performs less favorably, showing a prediction accuracy of 84.8% and underperforming in specificity analysis. Notably, the integration of both climatic and non-climatic features proves to be a crucial factor in accurately predicting malaria outbreaks in The Gambia., Competing Interests: The authors have declared that no competing interests exist., (Copyright: © 2024 Khan et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.)
Published: 2024
Full Text: View/download PDF

41. A hybrid CNN-SVM model for enhanced autism diagnosis.

Author: Qiu L and Zhai J
Subjects: Humans, Male, Female, Autism Spectrum Disorder diagnosis, Autism Spectrum Disorder physiopathology, Autism Spectrum Disorder diagnostic imaging, Brain diagnostic imaging, Brain physiopathology, Adolescent, Child, Adult, Young Adult, Support Vector Machine, Magnetic Resonance Imaging methods, Neural Networks, Computer, Autistic Disorder diagnosis, Autistic Disorder physiopathology
Abstract: Autism is a representative pervasive developmental disorder. It exerts influence upon an individual's behavior and performance, potentially co-occurring with other mental illnesses. Consequently, an effective diagnostic approach proves to be invaluable in both therapeutic interventions and the timely provision of medical support. Currently, most scholars' research primarily relies on neuroimaging techniques for auxiliary diagnosis and does not take into account the distinctive features of autism's social impediments. In order to address this deficiency, this paper introduces a novel convolutional neural network-support vector machine model that integrates resting state functional magnetic resonance imaging data with the social responsiveness scale metrics for the diagnostic assessment of autism. We selected 821 subjects containing the social responsiveness scale measure from the publicly available Autism Brain Imaging Data Exchange dataset, including 379 subjects with autism spectrum disorder and 442 typical controls. After preprocessing of fMRI data, we compute the static and dynamic functional connectivity for each subject. Subsequently, convolutional neural networks and attention mechanisms are utilized to extract their respective features. The extracted features, combined with the social responsiveness scale features, are then employed as novel inputs for the support vector machine to categorize autistic patients and typical controls. The proposed model identifies salient features within the static and dynamic functional connectivity, offering a possible biological foundation for clinical diagnosis. By incorporating the behavioral assessments, the model achieves a remarkable classification accuracy of 94.30%, providing a more reliable support for auxiliary diagnosis., Competing Interests: No authors have competing interests., (Copyright: © 2024 Qiu, Zhai. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.)
Published: 2024
Full Text: View/download PDF

42. Developing machine learning models to predict multi-class functional outcomes and death three months after stroke in Sweden.

Author: Otieno JA, Häggström J, Darehed D, and Eriksson M
Subjects: Humans, Sweden epidemiology, Male, Female, Aged, Aged, 80 and over, Prognosis, Middle Aged, Registries, Support Vector Machine, Logistic Models, Neural Networks, Computer, Risk Factors, Stroke physiopathology, Machine Learning
Abstract: Globally, stroke is the third-leading cause of mortality and disability combined, and one of the costliest diseases in society. More accurate predictions of stroke outcomes can guide healthcare organizations in allocating appropriate resources to improve care and reduce both the economic and social burden of the disease. We aim to develop and evaluate the performance and explainability of three supervised machine learning models and the traditional multinomial logistic regression (mLR) in predicting functional dependence and death three months after stroke, using routinely-collected data. This prognostic study included adult patients, registered in the Swedish Stroke Registry (Riksstroke) from 2015 to 2020. Riksstroke contains information on stroke care and outcomes among patients treated in hospitals in Sweden. Prognostic factors (features) included demographic characteristics, pre-stroke functional status, cardiovascular risk factors, medications, acute care, stroke type, and severity. The outcome was measured using the modified Rankin Scale at three months after stroke (a scale of 0-2 indicates independent, 3-5 dependent, and 6 dead). Outcome prediction models included support vector machines, artificial neural networks (ANN), eXtreme Gradient Boosting (XGBoost), and mLR. The models were trained and evaluated on 75% and 25% of the dataset, respectively. Model predictions were explained using SHAP values. The study included 102,135 patients (85.8% ischemic stroke, 53.3% male, mean age 75.8 years, and median NIHSS of 3). All models demonstrated similar overall accuracy (69%-70%). The ANN and XGBoost models performed significantly better than the mLR in classifying dependence with F1-scores of 0.603 (95% CI; 0.594-0.611) and 0.577 (95% CI; 0.568-0.586), versus 0.544 (95% CI; 0.545-0.563) for the mLR model. The factors that contributed most to the predictions were expectedly similar in the models, based on clinical knowledge. 
Our ANN and XGBoost models showed a modest improvement in prediction performance and explainability compared to mLR using routinely-collected data. Their improved ability to predict functional dependence may be of particular importance for the planning and organization of acute stroke care and rehabilitation., Competing Interests: The authors have declared that no competing interests exist., (Copyright: © 2024 Otieno et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.)
Published: 2024
Full Text: View/download PDF

43. Prediction of heart failure patients with distinct left ventricular ejection fraction levels using circadian ECG features and machine learning.

Author: Al Younis SM, Hadjileontiadis LJ, Khandoker AH, Stefanini C, Soulaidopoulos S, Arsenos P, Doundoulakis I, Gatzoulis KA, and Tsioufis K
Subjects: Humans, Female, Male, Aged, Middle Aged, Circadian Rhythm physiology, Support Vector Machine, Neural Networks, Computer, Heart Failure physiopathology, Heart Failure diagnosis, Stroke Volume, Machine Learning, Electrocardiography methods, Ventricular Function, Left physiology
Abstract: Heart failure (HF) encompasses a diverse clinical spectrum, including instances of transient HF or HF with recovered ejection fraction, alongside persistent cases. This dynamic condition exhibits a growing prevalence and entails substantial healthcare expenditures, with anticipated escalation in the future. It is essential to classify HF patients into three groups based on their ejection fraction: reduced (HFrEF), mid-range (HFmEF), and preserved (HFpEF), for purposes such as diagnosis, risk assessment, treatment choice, and the ongoing monitoring of heart failure. Nevertheless, obtaining a definitive prediction poses challenges, requiring reliance on echocardiography. In contrast, an electrocardiogram (ECG) provides a straightforward, quick, continuous assessment of the patient's cardiac rhythm, serving as a cost-effective adjunct to echocardiography. In this research, we evaluate several machine learning (ML)-based classification models, such as K-nearest neighbors (KNN), neural networks (NN), support vector machines (SVM), and decision trees (TREE), to classify left ventricular ejection fraction (LVEF) for three categories of HF patients at hourly intervals, using 24-hour ECG recordings. Information from a heterogeneous group of 303 heart failure patients, encompassing HFpEF, HFmEF, or HFrEF classes, was acquired from a multicenter dataset involving both American and Greek populations. Features extracted from ECG data were employed to train the aforementioned ML classification models, with the training occurring in one-hour intervals. To optimize the classification of LVEF levels in coronary artery disease (CAD) patients, a nested cross-validation approach was employed for hyperparameter tuning. HF patients were best classified using TREE and KNN models, with an overall accuracy of 91.2% and 90.9%, and average area under the curve of the receiver operating characteristics (AUROC) of 0.98, and 0.99, respectively.
Furthermore, according to the experimental findings, the time periods of midnight-1 am, 8-9 am, and 10-11 pm were the ones that contributed to the highest classification accuracy. The results pave the way for creating an automated screening system tailored for patients with CAD, utilizing optimal measurement timings aligned with their circadian cycles., Competing Interests: The authors have declared that no competing interests exist., (Copyright: © 2024 Al Younis et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.)
Published: 2024
Full Text: View/download PDF

44. Explainable machine learning approach for cancer prediction through binarilization of RNA sequencing data.

Author: Chen T and Kabir MF
Subjects: Humans, Neural Networks, Computer, Support Vector Machine, ROC Curve, Decision Trees, Neoplasms genetics, Machine Learning, Sequence Analysis, RNA methods
Abstract: In recent years, researchers have proven the effectiveness and speediness of machine learning-based cancer diagnosis models. However, it is difficult to explain the results generated by machine learning models, especially ones that utilize complex high-dimensional data like RNA sequencing data. In this study, we propose the binarilization technique as a novel way to treat RNA sequencing data and used it to construct explainable cancer prediction models. We tested our proposed data processing technique on five different models, namely neural network, random forest, xgboost, support vector machine, and decision tree, using four cancer datasets collected from the National Cancer Institute Genomic Data Commons. Since our datasets are imbalanced, we evaluated the performance of all models using metrics designed for imbalanced data, like geometric mean, Matthews correlation coefficient, F-Measure, and area under the receiver operating characteristic curve. Our approach showed comparable performance while relying on fewer features. Additionally, we demonstrated that data binarilization offers higher explainability by revealing how each feature affects the prediction. These results demonstrate the potential of the data binarilization technique in improving the performance and explainability of RNA sequencing based cancer prediction models., Competing Interests: The authors have declared that no competing interests exist., (Copyright: © 2024 Chen, Kabir. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.)
Published: 2024
Full Text: View/download PDF

45. Stacking with Recursive Feature Elimination-Isolation Forest for classification of diabetes mellitus.

Author: Idris NF, Ismail MA, Jaya MIM, Ibrahim AO, Abulfaraj AW, and Binzagr F
Subjects: Humans, Algorithms, Data Mining methods, Support Vector Machine, Male, Diabetes Mellitus diagnosis, Machine Learning
Abstract: Diabetes Mellitus is one of the oldest diseases known to humankind, dating back to ancient Egypt. The disease is a chronic metabolic disorder that heavily burdens healthcare providers worldwide due to the steady increase in patients each year. Worryingly, diabetes affects not only the aging population but also children. It is important to control this problem, as diabetes can lead to many health complications. Over time, humankind has started integrating computer technology with the healthcare system. The utilization of artificial intelligence helps healthcare become more efficient in diagnosing diabetes patients, deliver better care, and be more patient-centric. Among the advanced data mining techniques in artificial intelligence, stacking is among the most prominent methods applied in the diabetes domain. Hence, this study opts to investigate the potential of stacking ensembles. The aim of this study is to reduce the high complexity inherent in stacking, as this problem contributes to longer training time, and to reduce the outliers in the diabetes data to improve the classification performance. In addressing this concern, a novel machine learning method called the Stacking Recursive Feature Elimination-Isolation Forest was introduced for diabetes prediction. The application of stacking with Recursive Feature Elimination is to design an efficient model for diabetes diagnosis while using fewer features as resources. This method also incorporates the utilization of Isolation Forest as an outlier removal method. The study uses accuracy, precision, recall, F1 measure, training time, and standard deviation metrics to identify the classification performances.
The proposed method acquired an accuracy of 79.077% for PIMA Indians Diabetes and 97.446% for the Diabetes Prediction dataset, outperforming many existing methods and demonstrating effectiveness in the diabetes domain., Competing Interests: The authors have declared that no competing interests exist., (Copyright: © 2024 Idris et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.)
Published: 2024
Full Text: View/download PDF

46. Extracting the winter wheat using the decision tree based on time series dual-polarization SAR feature and NDVI.

Author: Zhang H, Wang Z, Li Z, Liu X, Wang K, Sun S, Cheng S, and Gao Z
Subjects: China, Optical Imaging, Support Vector Machine, Triticum growth & development, Remote Sensing Technology, Decision Trees, Agriculture methods
Abstract: Winter wheat is one of the most important crops in the world. It is of great significance to obtain the planting area of winter wheat timely and accurately for formulating agricultural policies. Due to the limited resolution of single SAR data and the susceptibility of single optical data to weather conditions, it is difficult to accurately obtain the planting area of winter wheat using only SAR or optical data. To solve the problem of low accuracy of winter wheat extraction using only optical or SAR images, a decision tree classification method combining time series SAR backscattering feature and NDVI (Normalized Difference Vegetation Index) was constructed in this paper. The synergistic use of SAR and optical data can compensate for their respective shortcomings. First, winter wheat was distinguished from other vegetation by NDVI at the maturity stage, and then it was extracted by SAR backscattering feature. This approach facilitates the semi-automated extraction of winter wheat. Taking Yucheng City of Shandong Province as the study area, 9 Sentinel-1 images and one Sentinel-2 image were taken as the data sources, and the spatial distribution of winter wheat in 2022 was obtained. The results indicate that the overall accuracy (OA) and kappa coefficient (Kappa) of the proposed method are 96.10% and 0.94, respectively. Compared with the supervised classification of multi-temporal composite pseudocolor image and single Sentinel-2 image using Support Vector Machine (SVM) classifier, the OA is improved by 10.69% and 5.66%, respectively. Compared with using only SAR feature for decision tree classification, the producer accuracy (PA) and user accuracy (UA) for extracting the winter wheat are improved by 3.08% and 8.25%, respectively. The method proposed in this paper is rapid and accurate, and provides a new technical method for extracting winter wheat., Competing Interests: The authors have declared that no competing interests exist., (Copyright: © 2024 Zhang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.)
Published: 2024
Full Text: View/download PDF

47. Predictive modelling of transport decisions and resources optimisation in pre-hospital setting using machine learning techniques.

Author: Farhat H, Makhlouf A, Gangaram P, El Aifa K, Howland I, Babay Ep Rekik F, Abid C, Khenissi MC, Castle N, Al-Shaikh L, Khadhraoui M, Gargouri I, Laughton J, and Alinier G
Subjects: Humans, Algorithms, Female, Male, Adult, Transportation of Patients methods, Support Vector Machine, Middle Aged, Aged, Adolescent, Young Adult, Machine Learning, Emergency Medical Services
Abstract: Background: The global evolution of pre-hospital care systems faces dynamic challenges, particularly in multinational settings. Machine learning (ML) techniques enable the exploration of deeply embedded data patterns for improved patient care and resource optimisation. This study's objective was to accurately predict cases that necessitated transportation versus those that did not, using ML techniques, thereby facilitating efficient resource allocation., Methods: ML algorithms were utilised to predict patient transport decisions in a Middle Eastern national pre-hospital emergency medical care provider. A comprehensive dataset comprising 93,712 emergency calls from the 999-call centre was analysed using R programming language. Demographic and clinical variables were incorporated to enhance predictive accuracy. Random Forest (RF), Support Vector Machine (SVM), Extreme Gradient Boosting (XGBoost), and Adaptive Boosting (AdaBoost) algorithms were trained and validated., Results: All the trained algorithm models, particularly XGBoost (Accuracy = 83.1%), correctly predicted patients' transportation decisions. Further, they indicated statistically significant patterns that could be leveraged for targeted resource deployment. Moreover, the specificity rates were high; 97.96% in RF and 95.39% in XGBoost, minimising the incidence of incorrectly identified "Transported" cases (False Positive)., Conclusion: The study identified the transformative potential of ML algorithms in enhancing the quality of pre-hospital care in Qatar. The high predictive accuracy of the employed models suggested actionable avenues for day and time-specific resource planning and patient triaging, thereby having potential to contribute to pre-hospital quality, safety, and value improvement. 
These findings pave the way for more nuanced, data-driven quality improvement interventions with significant implications for future operational strategies., Competing Interests: The authors have declared that no competing interests exist., (Copyright: © 2024 Farhat et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.)
Published: 2024
Full Text: View/download PDF

48. Development of new materials for electrothermal metals using data driven and machine learning.

Author: Zhou C, Pei M, Wu C, Xu D, Peng Q, and He G
Subjects: Algorithms, Metals chemistry, Temperature, Support Vector Machine, Machine Learning, Alloys chemistry, Neural Networks, Computer, Titanium chemistry
Abstract: After adopting a combined approach of data-driven methods and machine learning, the prediction of material performance and the optimization of composition design can significantly reduce the development time of materials at a lower cost. In this research, we employed four machine learning algorithms, including linear regression, ridge regression, support vector regression, and backpropagation neural networks, to develop predictive models for the electrical performance data of titanium alloys. Our focus was on two key objectives: resistivity and the temperature coefficient of resistance (TCR). Subsequently, leveraging the results of feature selection, we conducted an analysis to discern the impact of alloying elements on these two electrical properties. The prediction results indicate that for the resistivity data prediction task, the radial basis function kernel-based support vector machine model performs the best, with a correlation coefficient above 0.995 and a percentage error within 2%, demonstrating high predictive capability. For the TCR data prediction task, the best-performing model is a backpropagation neural network with two hidden layers, also with a correlation coefficient above 0.995 and a percentage error within 3%, demonstrating good generalization ability. The feature selection results using random forest and Xgboost indicate that Al and Zr have a significant positive effect on resistivity, while Al, Zr, and V have a significant negative effect on TCR. The conclusion of the composition optimization design suggests that to achieve both high resistivity and TCR, it is recommended to set the Al content in the range of 1.5% to 2% and the Zr content in the range of 2.5% to 3%., Competing Interests: The authors have declared that no competing interests exist., (Copyright: © 2024 Zhou et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.)
Published: 2024
Full Text: View/download PDF

49. The impact of technological innovation on the green digital economy and development strategies.

Author: Liu Y, Yang Y, Zhang X, and Yang Y
Subjects: Neural Networks, Computer, Support Vector Machine, Bayes Theorem, Humans, Sustainable Development trends, Inventions trends, Economic Development trends
Abstract: To investigate the interplay among technological innovation, industrial structure, production methodologies, economic growth, and environmental consequences within the paradigm of a green economy and to put forth strategies for sustainable development, this study scrutinizes the limitations inherent in conventional deep learning networks. Firstly, this study analyzes the limitations and optimization strategies of multi-layer perceptron (MLP) networks under the background of the green economy. Secondly, the MLP network model is optimized, and the dynamic analysis of the impact of technological innovation on the digital economy is discussed. Finally, the effectiveness of the optimization model is verified by experiments. Moreover, a sustainable development strategy based on dynamic analysis is also proposed. The experimental results reveal that, in comparison to traditional Linear Regression (LR), Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM), and Naive Bayes (NB) models, the optimized model in this study demonstrates improved performance across various metrics. With a sample size of 500, the optimized model achieves a prediction accuracy of 97.2% for forecasting future trends, representing an average increase of 14.6%. Precision reaches 95.4%, reflecting an average enhancement of 19.2%, while sensitivity attains 84.1%, with an average improvement of 11.8%. The mean absolute error is only 1.16, exhibiting a 1.4 reduction compared to traditional models and confirming the effectiveness of the optimized model in prediction. In the examination of changes in industrial structure using 2020 data to forecast the output value of traditional and green industries in 2030, it is observed that the output value of traditional industries is anticipated to decrease, with an average decline of 11.4 billion yuan. 
Conversely, propelled by the development of the digital economy, the output value of green industries is expected to increase, with an average growth of 23.4 billion yuan. This shift in industrial structure aligns with the principles and trends of the green economy, further promoting sustainable development. In the study of innovative production methods, the green industry has achieved an increase in output and significantly enhanced production efficiency, showing an average growth of 2.135 million tons compared to the average in 2020. Consequently, this study highlights the dynamic impact of technological innovation on the digital economy and its crucial role within the context of a green economy. It holds certain reference significance for research on the dynamic effects of the digital economy under technological innovation., Competing Interests: The authors have declared that no competing interests exist., (Copyright: © 2024 Liu et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.)
Published: 2024
Full Text: View/download PDF

50. Confirming the statistically significant superiority of tree-based machine learning algorithms over their counterparts for tabular data.

Author: Uddin S and Lu H
Subjects: Humans, Support Vector Machine, Logistic Models, Algorithms, Machine Learning
Abstract: Many individual studies in the literature observed the superiority of tree-based machine learning (ML) algorithms. However, the current body of literature lacks statistical validation of this superiority. This study addresses this gap by employing five ML algorithms on 200 open-access datasets from a wide range of research contexts to statistically confirm the superiority of tree-based ML algorithms over their counterparts. Specifically, it examines two tree-based ML (Decision tree and Random forest) and three non-tree-based ML (Support vector machine, Logistic regression and k-nearest neighbour) algorithms. Results from paired-sample t-tests show that both tree-based ML algorithms reveal better performance than each non-tree-based ML algorithm for the four ML performance measures (accuracy, precision, recall and F1 score) considered in this study, each at p<0.001 significance level. This performance superiority is consistent across both the model development and test phases. This study also used paired-sample t-tests for the subsets of the research datasets from disease prediction (66) and university-ranking (50) research contexts for further validation. The observed superiority of the tree-based ML algorithms remains valid for these subsets. Tree-based ML algorithms significantly outperformed non-tree-based algorithms for these two research contexts for all four performance measures. We discuss the research implications of these findings in detail in this article., Competing Interests: The authors have declared that no competing interests exist., (Copyright: © 2024 Uddin, Lu. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.)
Published: 2024
Full Text: View/download PDF
