Descriptor: "machine learning model" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"machine learning model"' showing total 1,106 results

1. Machine Learning-Based Acoustic System for Maturity Classification of Durian Fruit Before Harvesting

Author: Nguyen, Huu-Phuoc, Huynh, Viet-Lam, Duong, Thanh-Phong, Nguyen, Chanh-Nghiem, Tran, Nhut-Thanh, Li, Gang, Series Editor, Filipe, Joaquim, Series Editor, Ghosh, Ashish, Series Editor, Xu, Zhiwei, Series Editor, Thai-Nghe, Nguyen, editor, Do, Thanh-Nghi, editor, and Benferhat, Salem, editor
Published: 2025

2. Machine learning models can predict cancer-associated disseminated intravascular coagulation in critically ill colorectal cancer patients.

Author: Qin, Li, Mao, Jieling, Gao, Min, Xie, Jingwen, Liang, Zhikun, and Li, Xiaoyan
Abstract: Background: Due to its complex pathogenesis, the assessment of cancer-associated disseminated intravascular coagulation (DIC) is challenging. We aimed to develop a machine learning (ML) model to predict overt DIC in critically ill colorectal cancer (CRC) patients using clinical features and laboratory indicators. Methods: This retrospective study enrolled consecutive CRC patients admitted to the intensive care unit from January 2018 to December 2023. Four ML algorithms were used to construct predictive models using 5-fold cross-validation. The models' performance in predicting overt DIC and 30-day mortality was evaluated using the area under the receiver operating characteristic curve (ROC-AUC) and Cox regression analysis. The performance of three established scoring systems, ISTH DIC-2001, ISTH DIC-2018, and JAAM DIC, was also assessed for survival prediction and served as benchmarks for model comparison. Results: A total of 2,766 patients were enrolled, with 699 (25.3%) diagnosed with overt DIC according to ISTH DIC-2001, 1,023 (36.9%) according to ISTH DIC-2018, and 662 (23.9%) according to JAAM DIC. The extreme gradient boosting (XGB) model outperformed others in DIC prediction (ROC-AUC: 0.848; 95% CI: 0.818–0.878; p < 0.01) and mortality prediction (ROC-AUC: 0.708; 95% CI: 0.646–0.768; p < 0.01). The three DIC scores predicted 30-day mortality with ROC-AUCs of 0.658 for ISTH DIC-2001, 0.692 for ISTH DIC-2018, and 0.673 for JAAM DIC. Conclusion: The results indicate that ML models, particularly the XGB model, can serve as effective tools for predicting overt DIC in critically ill CRC patients. This offers a promising approach to improving clinical decision-making in this high-risk group. [ABSTRACT FROM AUTHOR]
Published: 2024

3. Predicting Performance of Solar updraught Tower using Machine Learning Regression Model.

Author: Patil, Snehal, Dhoble, Ashwin, Sathe, Tushar, and Thawkar, Vivek
Subjects: *REGRESSION analysis, *LEARNING curve, *MATHEMATICAL models, *PLANT collecting, *POWER plants
Abstract: This work aims to analyse the performance of the solar updraught tower (SUT) power plant in the environmental conditions of Nagpur and develop a machine learning (ML) model to predict the power output of the SUT power plant. The developed ML model readily predicts the power output without rigorous calculations. A regularised multivariable polynomial regression technique was used to develop the ML model. Optimum values of the regularisation parameter, degree of the polynomial and the number of training examples were obtained from learning curves. A mathematical model of the SUT power plant is developed for analysing the performance of the SUT power plant and collecting training datasets for the ML regression model. The mathematical model analysed the effect of variation in chimney height, chimney and collector radius on SUT's performance and flow parameters. The results demonstrate that raising the height and radius of the chimney and collector radius enhances the power generation of the SUT power plant. The monthly average power output was observed to be maximum in June and July and minimum in December. The power output is calculated from 5:00 AM to 6:00 PM. The maximum power production in June is 120 kW and in May is 117 kW. The average annual power output is around 64 kW. [ABSTRACT FROM AUTHOR]
Published: 2024

4. Towards ML Models' Recommendations.

Author: Kallab, Lara, Mansour, Elio, and Chbeir, Richard
Abstract: Artificial Intelligence encompasses a range of technologies that replicate human-like cognitive abilities through computer systems, enabling the execution of tasks associated with intelligent beings. A prominent way to achieve this is machine learning (ML), which optimizes system performance by employing learning algorithms to create models based on data and its inherent patterns. Today, a multitude of ML models exist having diverse characteristics, including the algorithm type, training dataset, and resultant performance. Such diversity complicates the selection of an appropriate model for a specific use case, answering user demands. This paper presents an approach for ML models retrieval based on the matching between user inputs and ML models criteria, all described in a semantic ML ontology named SML model (Semantic Machine Learning model), which facilitates the process of ML models selection. Our approach is based on similarities measures that we tested and experimented to score the ML models and retrieve the ones matching, at best, user inputs. [ABSTRACT FROM AUTHOR]
Published: 2024

5. Optimization of Cascade Aeration Characteristics and Predicting Aeration Efficiency with Machine Learning Model in Multistage Filtration.

Author: Saha, Nilanjan, Heim, Ronjon, Mazumdar, Asis, Banerjee, Gourab, and Sarkar, Oushnik
Subjects: MACHINE learning, DISSOLVED oxygen in water, RANDOM forest algorithms, WATER purification, DECISION trees, WATER filtration
Abstract: The study assesses the optimal aeration efficiency of a stepwise cascade aeration system through experimental trials in a lab scale model setup, aimed at determining the geometric and flow characteristics of the cascade system. Subsequently, the collected datasets are employed to evaluate the efficacy of four advanced machine learning algorithms, namely K-nearest neighbour (KNN), gradient boosting regressor (GBR), decision tree regressor (DTR), and random forest regressor (RFR), in predicting the aeration efficiency at 20 °C (E20) of the cascade aeration system. The predictive machine learning tools are compared based on different performance indices and various graphical interpretations including comparative plot, heat plot, plot of relative error, violin diagram, and Taylor diagram. For assessing the accuracy of the best-fitted predictive model, i.e. GBR, a field-scale surface-water–based water treatment plant with a multi-stage filtration unit, which was set up in an arsenic-affected rural area of West Bengal, India, was used to validate the results, and findings were used to optimize the field-scale plant. It is observed that E20 is dependent on dimensionless discharge (dc/h), the square of the number of steps (N²), and inclination (h/l) as per dimension analysis. The analysis reveals that with an increase in inclination, E20 for a specific number of cascade steps drops to a certain point and then increases. The highest aeration efficiency (E20) of 0.913 is observed at a hydraulic loading rate of 0.167 l/m²/s, N = 10 and h/l = 0.64. Furthermore, the results demonstrate that the GBR model (with R² test value of 0.96 and MAE test value of 0.027) emerges as the most accurate regression tool, surpassing the other models in predicting E20 values.
Additionally, the findings indicated that at the flow rates of 0.075, 0.1, 0.125, and 0.15 m³/m²/h with the inclination of 0.363 and N = 10, the dissolved oxygen in water increases by more than 5 mg/l, with corresponding aeration efficiencies (E20) of 0.757, 0.675, and 0.602, respectively. Machine learning models offer the potential to optimize the design of aeration structures for accurate prediction and facilitating cost efficiency. [ABSTRACT FROM AUTHOR]
Published: 2024

6. Spatiotemporal pattern of water hyacinth (Pontederia crassipes) distribution in Lake Tana, Ethiopia, using a random forest machine learning model.

Author: Belayhun, Matiwos, Chere, Zerihun, Abay, Nigus Gebremedhn, Nicola, Yonas, and Asmamaw, Abay
Subjects: MACHINE learning, WATER distribution, RANDOM forest algorithms, REMOTE-sensing images, NOXIOUS weeds, WATER hyacinth
Abstract: Water hyacinth (Pontederia crassipes) is an invasive weed that covers a significant portion of Lake Tana. The infestation has an impact on the lake's ecological and socioeconomic systems. Early detection of the spread of water hyacinth using geospatial techniques is crucial for its effective management and control. The main objective of this study was to examine the spatiotemporal distribution of water hyacinth from 2016 to 2022 using a random forest machine learning model. The study used 16 variables obtained from Sentinel-2A, Sentinel-1 SAR, and SRTM DEM, and a random forest supervised classification model was applied. Seven spectral indices, five spectral bands, two Sentinel-1 SAR bands, and two topographic variables were used in combination to model the spatial distribution of water hyacinth. The model was evaluated using the overall accuracy and kappa coefficient. The findings demonstrated that the overall accuracy ranged from 0.91 to 0.94 and kappa coefficient from 0.88 to 0.92 in the wet season and 0.93 to 0.95 and 0.90 to 0.93 in the dry season, respectively. B11 and B5 (2022), VH, soil adjusted vegetation index (SAVI), and normalized difference water index (NDWI) (2020), B5 and B12 (2018), and VH and slope (2016) are the highly important variables in the classification. The study found that the spatial coverage of water hyacinth was 686.5 and 650.4 ha (2016), 1,851 and 1,259 ha (2018), 1,396.7 and 1,305.7 ha (2020), and 1,436.5 and 1,216.5 ha (2022) in the wet and dry seasons, respectively. The research findings indicate that variables derived from optical (Sentinel-2A and SRTM) and non-optical (Sentinel-1 SAR) satellite imagery effectively identify water hyacinth and display its spatiotemporal spread using the random forest machine learning algorithm. [ABSTRACT FROM AUTHOR]
Published: 2024

7. TPGPred: A Mixed-Feature-Driven Approach for Identifying Thermophilic Proteins Based on GradientBoosting.

Author: Zhao, Cuihuan, Yan, Shuan, and Li, Jiahang
Subjects: *RECEIVER operating characteristic curves, *PROTEIN stability, *AMINO acid sequence, *BIOTECHNOLOGY, *INDUSTRIALIZATION
Abstract: Thermophilic proteins maintain their stability and functionality under extreme high-temperature conditions, making them of significant importance in both fundamental biological research and biotechnological applications. In this study, we developed a machine learning-based thermophilic protein GradientBoosting prediction model, TPGPred, designed to predict thermophilic proteins by leveraging a large-scale dataset of both thermophilic and non-thermophilic protein sequences. By combining various machine learning algorithms with feature-engineering methods, we systematically evaluated the classification performance of the model, identifying the optimal feature combinations and classification models. Trained on a large public dataset of 5652 samples, TPGPred achieved an Accuracy score greater than 0.95 and an Area Under the Receiver Operating Characteristic Curve (AUROC) score greater than 0.98 on an independent test set of 627 samples. Our findings offer new insights into the identification and classification of thermophilic proteins and provide a solid foundation for their industrial application development. [ABSTRACT FROM AUTHOR]
Published: 2024

8. A machine learning model based on results of a comprehensive radiological evaluation can predict the prognosis of basal ganglia cerebral hemorrhage treated with neuroendoscopy.

Author: Hu, Xiaolong, Deng, Peng, Ma, Mian, Tang, Xiaoyu, Qian, Jinghong, Gong, YuHui, Wu, Jiandong, Xu, Xiaowen, and Ding, Zhiliang
Subjects: MACHINE learning, K-nearest neighbor classification, CEREBRAL hemorrhage, BASAL ganglia, SUPPORT vector machines, INTRACEREBRAL hematoma
Abstract: Introduction: Spontaneous intracerebral hemorrhage is the second most common subtype of stroke. Therefore, this study aimed to investigate the risk factors affecting the prognosis of patients with basal ganglia cerebral hemorrhage after neuroendoscopy. Methods: Between January 2020 and January 2024, 130 patients with basal ganglia cerebral hemorrhage who underwent neuroendoscopy were recruited from two independent centers. We split this dataset into training (n = 79), internal validation (n = 22), and external validation (n = 29) sets. The least absolute shrinkage and selection operator-regression algorithm was used to select the top 10 important radiomic features of different regions (perioperative hemorrhage area [PRH], perioperative surround area [PRS], postoperative hemorrhage area [PSH], and postoperative edema area [PSE]). The black hole, island, blend, and swirl signs were evaluated. The top 10 radiomic features and 4 radiological features were combined to construct the k-nearest neighbor classification (KNN), logistic regression (LR), and support vector machine (SVM) models. Finally, the performance of the perioperative hemorrhage and postoperative edema machine learning models was validated using another independent dataset (n = 29). The primary outcome is mRS at 6 months after discharge. The mRS score greater than 3 defined as functional independence. Results: A total of 12 models were built: PRH-KNN, PRH-LR, PRH-SVM, PRS-KNN, PRS-LR, PRS-SVM, PSH-KNN, PSH-LR, PSH-SVM, PSE-KNN, PSE-LR, and PSE-SVM, with corresponding areas under the curve (AUC) values in the internal validation set of 0.95, 0.91, 0.94, 0.52, 0.91, 0.54, 0.67, 0.9, 0.72, 0.92, 0.92, and 0.95, respectively. The AUC values of the PRH-KNN, PRH-LR, PRH-SVM, PSE-KNN, PSE-LR, and PSE-SVM in the external validation were 0.9, 0.92, 0.89, 0.91, 0.92, and 0.88, respectively. 
Conclusion: The model built based on computed tomography images of different regions accurately predicted the prognosis of patients with basal ganglia cerebral hemorrhage treated with neuroendoscopy. The models built based on the preoperative hematoma area and postoperative edema area showed excellent predictive efficacy in external verification, which has important clinical significance. [ABSTRACT FROM AUTHOR]
Published: 2024

9. Small-scale, large impact: utilizing machine learning to assess susceptibility to urban geological disasters—a case study of urban road collapses in Hangzhou.

Author: Yu, Bofan, Xing, Huaixue, Yan, Jiaxing, and Li, Yunan
Abstract: Compared with large-scale geological disasters such as landslides and earthquakes, small-scale urban geological disasters such as collapses and ground fissures are often overlooked. However, the socioeconomic impacts of these small-scale events can often exceed those of larger disasters in major cities. Although the use of machine learning for susceptibility assessment is a well-established aspect of large-scale geological disaster prevention, insufficient disaster samples and resultant dataset imbalances have hindered its application to small-scale urban geological disasters. To address this issue, we propose a comprehensive process that involves defining disaster risk areas to expand disaster sample points, optimizing the extraction method for training and test sets to balance the dataset, and selecting models with high generalization capabilities to enhance prediction accuracy. By focusing on all urban road collapse incidents from 2015 to 2023 in Binjiang District, Hangzhou’s most economically developed areas, we demonstrated the reliability of this process. Furthermore, to support urban policymakers, we employed the SHAP model to demystify the predictive process and assess the impact of factors, providing reliable analytical results. Our approach provides a replicable and comprehensive solution for susceptibility assessments of cities impacted by small-scale geological disasters using machine learning and subsequent analyses. [ABSTRACT FROM AUTHOR]
Published: 2024

10. Diagnostic Potential of Eye Movements in Alzheimer's Disease via a Multiclass Machine Learning Model.

Author: Song, Jiaqi, Huang, Haodong, Liu, Jiarui, Wu, Jiani, Chen, Yingxi, Wang, Lisong, Zhong, Fuxin, Wang, Xiaoqin, Lin, Zihan, Yan, Mengyu, Zhang, Wenbo, Liu, Xintong, Tang, Xinyi, Lü, Yang, and Yu, Weihua
Abstract: Early diagnosis plays a crucial role in controlling Alzheimer's disease (AD) progression and delaying cognitive decline. Traditional diagnostic tools present great challenges to clinical practice due to their invasiveness, high cost, and time-consuming administration. This study was designed to construct a non-invasive and cost-effective classification model based on eye movement parameters to distinguish dementia due to AD (ADD), mild cognitive impairment (MCI), and normal cognition. Eye movement data were collected from 258 subjects, comprising 111 patients with ADD, 81 patients with MCI, and 66 individuals with normal cognition. The fixation, smooth pursuit, prosaccade, and anti-saccade tasks were performed. Machine learning methods were used to screen eye movement parameters and build diagnostic models. Pearson's correlation analysis was used to assess the correlations between the five most important eye movement indicators in the optimal model and neuropsychological scales. The gradient boosting classifier model demonstrated the best classification performance, achieving 68.2% of accuracy and 66.32% of F1-score in multiclass classification of AD. Moreover, the correlation analysis indicated that the eye movement parameters were associated with various cognitive functions, including general cognitive status, attention, visuospatial ability, episodic memory, short-term memory, and language and instrumental activities of daily life. Eye movement parameters in conjunction with machine learning methods achieve satisfactory overall accuracy, making it an effective and less time-consuming method to assist clinical diagnosis of AD. [ABSTRACT FROM AUTHOR]
Published: 2024

11. Machine Learning Model Based on Prognostic Nutritional Index for Predicting Long‐Term Outcomes in Patients With HCC Undergoing Ablation.

Author: Zhang, Nan, Lin, Ke, Qiao, Bin, Yan, Liwei, Jin, Dongdong, Yang, Daopeng, Yang, Yue, Xie, Xiaohua, Xie, Xiaoyan, and Zhuang, Bowen
Subjects: *MACHINE learning, *DECISION making, *SURVIVAL rate, *OVERALL survival, *ABLATION techniques
Abstract: Aims: To develop multiple machine learning (ML) models based on the prognostic nutritional index (PNI) and determine the optimal model for predicting long‐term survival outcomes in hepatocellular carcinoma (HCC) patients after local ablation. Methods: From January 2009 to December 2019, we analyzed data from 848 primary HCC patients who underwent local ablation. ML models were constructed and evaluated using the concordance index (C‐index), concordance‐discordance area under curve (C/D AUC), and Brier scores. The optimal ML model was interpreted using the partial dependence plot (PDP) and SHapley Additive exPlanations (SHAP) framework. Additionally, the prognostic performance of our model was compared with other models. Results: Alkaline phosphatase, preoperation alpha‐fetoprotein level, PNI, tumor number, and tumor size were identified as independent prognostic factors for ML model construction. Among the 19 ML algorithms tested, the Aorsf model showed superior performance in both the training cohort (C/D AUC: 0.733; C‐index: 0.736; Brier score: 0.133) and validation cohort (C/D AUC: 0.713; C‐index: 0.793; Brier score: 0.117). The time‐dependent AUC of the Aorsf model for predicting overall survival was as follows: 1‐, 3‐, 5‐, 7‐, and 9‐year were 0.828, 0.765, 0.781, 0.817, and 0.812 in the training cohort, 0.846, 0.859, 0.824, 0.845, and 0.874 in the validation cohort, respectively. The PDP and SHAP algorithms were employed for visual interpretation. Furthermore, time‐AUC and decision curve analysis demonstrated that the Aorsf model provided superior clinical benefits compared to other models. Conclusion: The PNI‐based Aorsf model effectively predicts long‐term survival outcomes after ablation therapy, making a significant contribution to HCC research by improving surveillance, prevention, and treatment strategies. [ABSTRACT FROM AUTHOR]
Published: 2024

12. A machine learning model based on results of a comprehensive radiological evaluation can predict the prognosis of basal ganglia cerebral hemorrhage treated with neuroendoscopy.

Author: Xiaolong Hu, Peng Deng, Mian Ma, Xiaoyu Tang, Jinghong Qian, YuHui Gong, Jiandong Wu, Xiaowen Xu, and Zhiliang Ding
Subjects: MACHINE learning, K-nearest neighbor classification, CEREBRAL hemorrhage, BASAL ganglia, SUPPORT vector machines, INTRACEREBRAL hematoma
Abstract: Introduction: Spontaneous intracerebral hemorrhage is the second most common subtype of stroke. Therefore, this study aimed to investigate the risk factors affecting the prognosis of patients with basal ganglia cerebral hemorrhage after neuroendoscopy. Methods: Between January 2020 and January 2024, 130 patients with basal ganglia cerebral hemorrhage who underwent neuroendoscopy were recruited from two independent centers. We split this dataset into training (n = 79), internal validation (n = 22), and external validation (n = 29) sets. The least absolute shrinkage and selection operator-regression algorithm was used to select the top 10 important radiomic features of different regions (perioperative hemorrhage area [PRH], perioperative surround area [PRS], postoperative hemorrhage area [PSH], and postoperative edema area [PSE]). The black hole, island, blend, and swirl signs were evaluated. The top 10 radiomic features and 4 radiological features were combined to construct the k-nearest neighbor classification (KNN), logistic regression (LR), and support vector machine (SVM) models. Finally, the performance of the perioperative hemorrhage and postoperative edema machine learning models was validated using another independent dataset (n = 29). The primary outcome is mRS at 6 months after discharge. The mRS score greater than 3 defined as functional independence. Results: A total of 12 models were built: PRH-KNN, PRH-LR, PRH-SVM, PRS-KNN, PRS-LR, PRS-SVM, PSH-KNN, PSH-LR, PSH-SVM, PSE-KNN, PSE-LR, and PSE-SVM, with corresponding areas under the curve (AUC) values in the internal validation set of 0.95, 0.91, 0.94, 0.52, 0.91, 0.54, 0.67, 0.9, 0.72, 0.92, 0.92, and 0.95, respectively. The AUC values of the PRH-KNN, PRH-LR, PRH-SVM, PSE-KNN, PSE-LR, and PSE-SVM in the external validation were 0.9, 0.92, 0.89, 0.91, 0.92, and 0.88, respectively.
Conclusion: The model built based on computed tomography images of different regions accurately predicted the prognosis of patients with basal ganglia cerebral hemorrhage treated with neuroendoscopy. The models built based on the preoperative hematoma area and postoperative edema area showed excellent predictive efficacy in external verification, which has important clinical significance. [ABSTRACT FROM AUTHOR]
Published: 2024

13. A prospective diagnostic model for breast cancer utilizing machine learning to examine the molecular immune infiltrate in HSPB6.

Author: Wang, Lizhe, Wang, Yu, Li, Yueyang, Zhou, Li, Liu, Sihan, Cao, Yongyi, Li, Yuzhi, Liu, Shenting, Du, Jiahui, Wang, Jin, and Zhu, Ting
Abstract: Background: Breast cancer is a significant public health issue worldwide, being the most prevalent cancer among women and a leading cause of death related to this disease. The molecular processes that propel breast cancer progression are not fully elucidated, highlighting the intricate nature of the underlying biology and its crucial impact on global health. The objective of this research was to perform bioinformatics analyses on breast cancer-related datasets to gain a comprehensive understanding of the molecular mechanisms at play and to identify key genes associated with the disease. Methods: The toolkit analyses involve techniques such as differential gene expression analysis, Gene Set Enrichment Analysis (GSEA), Weighted Co-Expression Network Analysis (WGCNA), and Machine Learning algorithms. Furthermore, in vitro cell experiments have demonstrated the impact of HSPB6 on cell migration, proliferation, and apoptosis. Results: The study identified multiple genes that displayed differential expression in breast cancer, notably FHL1 and HSPB6. A machine learning model was developed in this study and specifically trained for breast cancer diagnosis using these genes, achieving high precision. Furthermore, analysis of immune cell infiltration revealed an enrichment of Tregs and M2 macrophages in the treated group, showcasing its significant impact on the tumor’s immunological context. A temporal analysis of breast cancer cells using single-cell RNA sequencing provided insights into cellular developmental trajectories and highlighted changes in expression patterns across key genes during disease progression. The upregulation of HSPB6 in MCF7 cells significantly inhibited both cell migration and proliferation abilities, suggesting that promoting HSPB6 expression could induce ferroptosis in breast cancer cells. Conclusion: Our findings have identified compelling molecular targets and distinctive diagnostic markers for the clinical management of breast cancer. 
This data will serve as crucial guidance for further research in the field. [ABSTRACT FROM AUTHOR]
Published: 2024

14. Machine learning-aided unveiling the relationship between chemical pretreatment and methane production of lignocellulosic waste.

Author: Song, Chao, Zhang, Zhijing, Wang, Xuefeng, Hu, Xuejun, Chen, Chang, and Liu, Guangqing
Subjects: *MACHINE learning, *LIGNOCELLULOSE, *CHEMICAL milling, *ANAEROBIC digestion, *HYDROGEN peroxide
Abstract: • Machine learning model could predict methane yield of pretreated LW accurately. • Digestion time, pretreatment agent and lignin content were crucial factors. • NaOH, KOH and AHP pretreatments were suitable for LW with low lignin content. • LW with high lignin content preferred AHP pretreatment. Chemical pretreatment is a common method to enhance the cumulative methane yield (CMY) of lignocellulosic waste (LW) but its effectiveness is subject to various factors, and accurate estimation of methane production of pretreated LW remains a challenge. Here, based on 254 LW samples, a machine learning (ML) model to predict the methane production performance of pretreated feedstock was constructed using two automated ML platforms (tree-based pipeline optimization tool and neural network intelligence). Furthermore, the interactive effects of pretreatment conditions, feedstock properties, and digestion conditions on methane production of pretreated LW were studied through model interpretability analysis. The optimal ML model performed well on the validation set, and the digestion time, pretreatment agent, and lignin content (LC) were found to be key factors affecting the methane production of pretreated LW. If the LC in the raw LW was lower than 15%, the maximum CMY might be achieved using the NaOH, KOH, and alkaline hydrogen peroxide (AHP) with concentrations of 3.8%, 4.4%, and 4.5%, respectively. On the other hand, if LC was higher than 15%, only high concentrations of AHP exceeding 4% could significantly increase methane production. This study provides valuable guidance for optimizing pretreatment process, comparing different chemical pretreatment approaches, and regulating the operation of large-scale biogas plants. [ABSTRACT FROM AUTHOR]
Published: 2024

15. Comparative Analysis of Prediction Models for the Spatial Distribution of Soil Organic Carbon Density Under Limited Samples.

Author: 袁可, 张晨, 赵建林, 汪珍亮, 杨节, and 许中胜
Subjects: *SUPPORT vector machines, *RANDOM forest algorithms, *CARBON in soils, *LAND use, *TOPSOIL
Abstract: [Objective] The aim of this study is to explore the accuracy and applicability of different machine learning models for predicting the spatial distribution of surface soil organic carbon density (SOCD) with limited samples, which can provide a reference for the study of watershed-scale carbon pools in the Chinese Loess Plateau. [Methods] In this study, we compared the accuracy and stability of the predicted SOCD in topsoil (0-20 cm) by four machine learning models, namely Multiple Linear Stepwise Regression (SR), Random Forest (RF), Extreme Gradient Boosting (XGB) and Support Vector Machine (SVM), based on the limited measured samples in a sub-watershed of Yanhe River in the Chinese Loess Plateau. [Results] (1) Under the condition of limited samples, all models successfully and appropriately predicted the spatial distribution of SOCD, among which the SVM model has the best model performance, and the average RMSE, R² and MAE of 50 predictions are 0.74, 0.43 and 0.64, respectively. (2) The average SOCD of different land use types is consistent between measured and predicted values but shows significant differences among land use types. SOCD decreases in the order: shrubland forestland > grassland > cropland. The total organic carbon of cultivated land in the study area is 2.39×10⁶ t (0-20 cm). (3) The evaluation of feature importance shows that terrain factors, NDVI max, near-infrared surface reflectance (B5) and Brightness index have significant contributions to the accuracy of predictions. [Conclusion] Under the condition of limited samples, the machine learning model combined with controlling features can be effectively applied to the prediction of the spatial distribution of topsoil SOCD at the watershed scale in the Chinese Loess Plateau. [ABSTRACT FROM AUTHOR]
Published: 2024

16. Comparison of Machine Learning Models Using Diffusion-Weighted Images for Pathological Grade of Intrahepatic Mass-Forming Cholangiocarcinoma.

Author: Xing, Li-Hong, Wang, Shu-Ping, Zhuo, Li-Yong, Zhang, Yu, Wang, Jia-Ning, Ma, Ze-Peng, Zhao, Ying-Jia, Yuan, Shuang-Rui, Zu, Qian-He, and Yin, Xiao-Ping
Abstract: Is the radiomic approach, utilizing diffusion-weighted imaging (DWI), capable of predicting the various pathological grades of intrahepatic mass-forming cholangiocarcinoma (IMCC)? Furthermore, which model demonstrates superior performance among the diverse algorithms currently available? The objective of our study is to develop DWI radiomic models based on different machine learning algorithms and identify the optimal prediction model. We undertook a retrospective analysis of the DWI data of 77 patients with IMCC confirmed by pathological testing. Fifty-seven patients initially included in the study were randomly assigned to either the training set or the validation set in a ratio of 7:3. We established four different classifier models, namely random forest (RF), support vector machines (SVM), logistic regression (LR), and gradient boosting decision tree (GBDT), by manually contouring the region of interest and extracting prominent radiomic features. An external validation of the model was performed with the DWI data of 20 patients with IMCC who were subsequently included in the study. The area under the receiver operating curve (AUC), accuracy (ACC), precision (PRE), sensitivity (REC), and F1 score were used to evaluate the diagnostic performance of the model. Following the process of feature selection, a total of nine features were retained, with skewness being the most crucial radiomic feature demonstrating the highest diagnostic performance, followed by Gray Level Co-occurrence Matrix lmc1 (glcm-lmc1) and kurtosis, whose diagnostic performances were slightly inferior to skewness. Skewness and kurtosis showed a negative correlation with the pathological grading of IMCC, while glcm-lmc1 exhibited a positive correlation with the IMCC pathological grade. 
Compared with the other three models, the SVM radiomic model had the best diagnostic performance with an AUC of 0.957, an accuracy of 88.2%, a sensitivity of 85.7%, a precision of 85.7%, and an F1 score of 85.7% in the training set, as well as an AUC of 0.829, an accuracy of 76.5%, a sensitivity of 71.4%, a precision of 71.4%, and an F1 score of 71.4% in the external validation set. The DWI-based radiomic model proved to be efficacious in predicting the pathological grade of IMCC. The model with the SVM classifier algorithm had the best prediction efficiency and robustness. Consequently, this SVM-based model can be further explored as an option for a non-invasive preoperative prediction method in clinical practice. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

17. Data Mining Approach for Evil Twin Attack Identification in Wi-Fi Networks.

Author: Banakh, Roman, Nyemkova, Elena, Justice, Connie, Piskozub, Andrian, and Lakh, Yuriy
Subjects: MACHINE learning, IEEE 802.11 (Standard), WIRELESS sensor networks, COMPUTER networking equipment, COMPUTER network security, INTRUSION detection systems (Computer security)
Abstract: Recent cyber security solutions for wireless networks during internet open access have become critically important for personal data security. The newest WPA3 network security protocol has been used to maximize this protection; however, attackers can use an Evil Twin attack to replace a legitimate access point. The article is devoted to solving the problem of intrusion detection at the OSI model's physical layers. To solve this, a hardware–software complex has been developed to collect information about the signal strength from Wi-Fi access points using wireless sensor networks. The collected data were supplemented with a generative algorithm considering all possible combinations of signal strength. The k-nearest neighbor model was trained on the obtained data to distinguish the signal strength of legitimate from illegitimate access points. To verify the authenticity of the data, an Evil Twin attack was physically simulated, and a machine learning model analyzed the data from the sensors. As a result, the Evil Twin attack was successfully identified based on the signal strength in the radio spectrum. The proposed model can be used in open access points as well as in large corporate and home Wi-Fi networks to detect intrusions aimed at substituting devices in the radio spectrum where IEEE 802.11 networking equipment operates. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

18. A novel strategy based on machine learning of selective cooling control of work roll for improvement of cold rolled strip flatness.

Author: Wang, Pengfei, Deng, Jinkun, Li, Xu, Hua, Changchun, Su, Lihong, and Deng, Guanyu
Subjects: STEEL strip, COLD rolling, OPTIMIZATION algorithms, MACHINE learning, STEEL manufacture, WOLVES, SUPPORT vector machines
Abstract: Precise selective cooling control of work roll can significantly improve the cold rolled strip flatness in steel manufacturing industry. To improve the control accuracy of the coolant output of selective work roll cooling control system, a machine learning (ML) algorithm with differential evolution-gray wolf algorithm optimization support vector machine regression (DE-GWO-SVR) model has been proposed for the first time in this study. This model combines the differential evolution (DE) with grey wolf optimization algorithm (GWO) to improve the optimization performance of the algorithm. Then, the SVR model parameters are optimized with differential evolutionary gray wolf hybrid algorithm (DE-GWO) to improve the regression accuracy. Finally, the influences of data normalization methods and the selection of SVR kernel functions were systematically investigated. Compared with the test results of other regression models, the evaluation index R2 based on the DE-GWO-SVR model is greater and the RMSE, MAE, and MAPE are smaller. The DE-GWO-SVR model performs the best, with a higher regression accuracy than the other regression models. Besides, it has been successfully applied to a 1450 mm five-stand industrial cold rolling mill. The model has higher control accuracy for the thermal crown of the work roll and better control effect for the flatness deviation of the strip steel. This study provides a novel strategy with a help of ML algorithm to effectively improve the flatness quality of cold rolled strips by optimizing the selective cooling control of work roll, which exhibits a great practical application potential in steel manufacturing. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

19. Performance analysis of machine learning models for AQI prediction in Gorakhpur City: a critical study.

Author: Mandvi, Patel, Prabhat Kumar, and Singh, Hrishikesh Kumar
Subjects: MACHINE learning, AIR quality indexes, STANDARD deviations, AIR quality, PARTICULATE matter
Abstract: Air pollution and climate change are two complementary forces that directly or indirectly affect the environment's physical, chemical, and biological processes. The air quality index is a parameter defined to cope with this effect of air pollution. This study delves deeper into predicting this AQI parameter using multiple machine learning-based models. The AQI pollutants considered for this study are particulate matter (PM10, PM2.5), SO2, and NO2. It also tries to develop a comparative analysis of two different machine learning (ML) models, namely XGBoost and Lasso regression. An ever-changing emission concentration of pollutants is displayed by this study conducted in the urban city of Gorakhpur, Uttar Pradesh, India. The validation of prediction accuracies of models was done over several statistical metrics. The value of the R2 metric for XGBoost (0.9985) is higher than the R2 value for Lasso regression (0.9218), indicating lesser variance and higher accuracy of XGBoost in predicting AQI. Various statistical measures are taken into consideration in this study, including mean absolute error (MAE), mean square error (MSE), root mean square error (RMSE), T-test and p-values, and confidence intervals (CI). An increased degree of model accuracy is suggested as XGBoost's MAE, MSE, and RMSE values are significantly lower than Lasso's. Statistically significant performance differences between the XGBoost and Lasso regression models are demonstrated by T-statistics and p-values for MAE, MSE, RMSE, and R2. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

20. Integrated multiomics analysis identified comprehensive crosstalk between diverse programmed cell death patterns and novel molecular subtypes in Hepatocellular Carcinoma

Author: Li Chen, Yuanbo Hu, Yu Li, Bingyu Zhang, Jiale Wang, Mengmeng Deng, Jinlian Zhang, Wenyao Zhu, Hao Gu, and Lingyu Zhang
Subjects: Hepatocellular carcinoma, Molecular subtype, Programmed cell death, Machine learning model, Integrated multi-omics analysis, Medicine, Science
Abstract: Hepatocellular carcinoma (HCC) is a highly aggressive malignancy with increasing global prevalence and is one of the leading causes of cancer-related mortality in the human population. Developing robust clinical prediction models and prognostic stratification strategies is crucial for developing individualized treatment plans. A range of novel forms of programmed cell death (PCD) plays a role in the pathological progression and advancement of HCC, and in-depth study of PCD is expected to further improve the prognosis of HCC patients. Sixteen patterns (apoptosis, autophagy, anoikis, lysosome-dependent cell death, immunogenic cell death, necroptosis, ferroptosis, netosis, pyroptosis, disulfidptosis, entotic cell death, cuproptosis, parthanatos, netotic cell death, alkaliptosis, and oxeiptosis) related to PCD were collected from the literature and used for subsequent analysis. Supervised (Elastic net, Random Forest, XgBoost, and Boruta) and unsupervised (Nonnegative Matrix Factorization, NMF) clustering algorithms were applied to develop and validate a novel classifier for the individualized management of HCC patients at the transcriptomic, proteomic and single-cell levels. Multiple machine learning algorithms developed a programmed cell death index (PCDI) comprising five robust signatures (FTL, G6PD, SLC2A1, HTRA2, and DLAT) in four independent HCC cohorts, and a higher PCDI was predictive of higher pathological grades and worse prognoses. Furthermore, a higher PCDI was found to be correlated with the presence of a repressive tumor immune microenvironment (TME), as determined through an integrated examination of bulk and single-cell transcriptome data. In addition, patients with TP53 mutation had higher PCDI in comparison with TP53 WT patients. 
Three HCC subtypes were identified through unsupervised clustering (NMF), exhibiting distinct prognoses and significant biological processes, among the three subtypes, PCDcluster 3 was of particular interest as it contained a large proportion of patients with high risk and low metabolic activity. Construction and evaluation of the Nomogram model was drawn based on the multivariate logistic regression analysis, and highlighted the robustness of the Nomogram model in other independent HCC cohorts. Finally, to explore the prognostic value, we also validated the frequent upregulation of DLAT in a real-world cohort of human HCC specimens by qPCR, western blot, and immunohistochemical staining (IHC). Together, our work herein comprehensively emphasized PCD-related patterns and key regulators, such as DLAT, contributed to the evolution and prognosis of tumor foci in HCC patients, and strengthened our understanding of PCD characteristics and promoted more effective risk stratification strategies.
Published: 2024
Full Text: View/download PDF

21. Towards ML Models’ Recommendations

Author: Lara Kallab, Elio Mansour, and Richard Chbeir
Subjects: Machine learning model, Supervised learning, Ontology, User input, Similarities criteria, ML models and user inputs alignment, Information technology, T58.5-58.64, Electronic computers. Computer science, QA75.5-76.95
Abstract: Artificial Intelligence encompasses a range of technologies that replicate human-like cognitive abilities through computer systems, enabling the execution of tasks associated with intelligent beings. A prominent way to achieve this is machine learning (ML), which optimizes system performance by employing learning algorithms to create models based on data and its inherent patterns. Today, a multitude of ML models exist with diverse characteristics, including the algorithm type, training dataset, and resultant performance. Such diversity complicates the selection of an appropriate model for a specific use case that answers user demands. This paper presents an approach for ML model retrieval based on matching user inputs against ML model criteria, all described in a semantic ML ontology named SML model (Semantic Machine Learning model), which facilitates the process of ML model selection. Our approach is based on similarity measures that we tested experimentally to score the ML models and retrieve the ones that best match user inputs.
Published: 2024
Full Text: View/download PDF

22. Landslide susceptibility assessment in Shenzhen based on multi-scale convolutional neural networks model

Author: Qing ZHANG, Yi HE, Xueye CHEN, Binghai GAO, Lifeng ZHANG, Zhanao ZHAO, Jiangang LU, and Yalei ZHANG
Subjects: mscnn, landslide susceptibility assessment, machine learning model, shenzhen, Geology, QE1-996.5
Abstract: Convolutional neural network (CNN) models are widely used in landslide susceptibility assessment due to their powerful feature extraction capabilities, but a traditional CNN is no longer able to meet the requirements. Therefore, this paper proposes a multi-scale convolutional neural network (MSCNN) model that can take into account deep and shallow features. By increasing the depth of the model and expanding the receptive field of samples, the MSCNN can tap deeper and more stable features to improve the reliability of landslide susceptibility assessment in complex scenarios. In this study, Shenzhen City is selected as the research area, and 12 landslide conditioning factors in Shenzhen City were selected based on systematic and representative principles. A multi-scale convolutional neural network landslide susceptibility assessment model is constructed and compared with methods such as multilayer perceptron (MLP), support vector machine (SVM), and random forest (RF). The results show that the AUC value (0.99) of the MSCNN model constructed in this paper is higher than that of MLP (0.97), SVM (0.91), and RF (0.85), which indicates that the proposed MSCNN model has excellent prediction ability. The area of extremely high susceptibility in Shenzhen City is approximately 105.3 km², accounting for 4.98% of the total area of the study area, mainly distributed in Longgang District with steep slopes, sparse vegetation cover, and frequent human engineering activities. Slope, surface roughness, and surface relief are identified as the main conditioning factors affecting landslides in Shenzhen City. The landslide susceptibility mapping implemented in this paper reflects the current distribution of landslide disasters in Shenzhen City, providing data support and key technical support for future landslide disaster prevention and control in Shenzhen City.
Published: 2024
Full Text: View/download PDF

23. Prediction of HPC compressive strength based on machine learning

Author: Libing Jin, Jie Duan, Yichen Jin, Pengfei Xue, and Pin Zhou
Subjects: Genetic algorithm, Machine learning model, High-performance concrete, Compressive strength, Parameter analysis, Medicine, Science
Abstract: There is a complex high-dimensional nonlinear mapping relationship between the compressive strength of High-Performance Concrete (HPC) and its components, which has great influence on the accurate prediction of compressive strength. In this paper, an efficient robust software calculation strategy combining BP Neural Network (BPNN), Support Vector Machine (SVM) and Genetic Algorithm (GA) is proposed for the prediction of compressive strength of HPC. 8 features were extracted from the previous literature, and a compressive strength database containing 454 sets of data was constructed. The model was trained and tested, and the performance of 4 Machine Learning (ML) models, namely BPNN, SVM, GA-BPNN and GA-SVM, was compared. The results show that the coupled model is superior to the single model. Moreover, because GA-SVM has better generalization ability and theoretical basis, its convergence speed and prediction accuracy are better than GA-BPNN. Then Grey Relational Analysis (GRA) and Shapley analysis were used to verify the interpretability of the GA-SVM model, which showed that the water-binder ratio had the most significant influence on the compressive strength. Finally, the combination of multiple input variables to evaluate the compressive strength supplemented this research, and again verified the significant influence of water-binder ratio, providing reference value for subsequent research.
Published: 2024
Full Text: View/download PDF

24. Data‐Driven Cycle Life Prediction of Lithium Metal‐Based Rechargeable Battery Based on Discharge/Charge Capacity and Relaxation Features.

Author: Si, Qianli, Matsuda, Shoichi, Yamaji, Youhei, Momma, Toshiyuki, and Tateyama, Yoshitaka
Subjects: *MACHINE learning, *BATTERY management systems, *FEATURE selection, *STORAGE batteries, *ENERGY density
Abstract: Achieving precise estimates of battery cycle life is a formidable challenge due to the nonlinear nature of battery degradation. This study explores an approach using machine learning (ML) methods to predict the cycle life of lithium‐metal‐based rechargeable batteries with a high mass loading LiNi0.8Mn0.1Co0.1O2 electrode, which exhibits a more complicated electrochemical profile under battery operating conditions than the typically studied LiFePO₄/graphite‐based rechargeable batteries. By extracting diverse features from discharge, charge, and relaxation processes, the intricacies of cell behavior are navigated without relying on specific degradation mechanisms. The best‐performing ML model, after feature selection, achieves an R2 of 0.89, showcasing the application of ML in accurately forecasting cycle life. Feature importance analysis unveils the logarithm of the minimum value of the discharge capacity difference between cycles 100 and 10 (Log(|min(ΔDQ 100–10(V))|)) as the most important feature. Despite the inherent challenges, this model demonstrates a remarkable 6.6% test error on unseen data, underscoring its robustness and potential for transformative advancements in battery management systems. This study contributes to the successful application of ML in the realm of cycle life prediction for lithium‐metal‐based rechargeable batteries with practically high energy density design. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

25. Enhancing Remote Sensing Water Quality Inversion through Integration of Multisource Spatial Covariates: A Case Study of Hong Kong's Coastal Nutrient Concentrations.

Author: Zhang, Zewei, Li, Cangbai, Yang, Pan, Xu, Zhihao, Yao, Linlin, Wang, Qi, Chen, Guojun, and Tan, Qian
Subjects: *MACHINE learning, *TOTAL suspended solids, *REMOTE sensing, *TERRITORIAL waters, *WATER quality
Abstract: The application of remote sensing technology for water quality monitoring has attracted much attention recently. Remote sensing inversion in coastal waters with complex hydrodynamics for non-optically active parameters such as total nitrogen (TN) and total phosphorus (TP) remains a challenge. Existing studies build the relationships between remote sensing spectral data and TN/TP directly or indirectly via the mediation of optically active parameters (e.g., total suspended solids). Such models are often prone to overfitting, performing well with the training set but underperforming with the testing set, even though both datasets are from the same region. Using the Hong Kong coastal region as a case study, we address this issue by incorporating spatial covariates such as hydrometeorological and locational variables as additional input features for machine learning-based inversion models. The proposed model effectively alleviates overfitting while maintaining a decent level of accuracy (R2 exceeding 0.7) during the training, validation and testing steps. The gap between model R2 values in training and testing sets is controlled within 7%. A bootstrap uncertainty analysis shows significantly improved model performance as compared to the model with only remote sensing inputs. We further employ the Shapley Additive Explanations (SHAP) analysis to explore each input's contribution to the model prediction, verifying the important role of hydrometeorological and locational variables. Our results provide a new perspective for the development of remote sensing inversion models for TN and TP in similar coastal waters. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

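26. Landslide susceptibility assessment of coal mining subsidence areas based on an information value-machine learning coupled model.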

Author: 牛晨昊, 焦润成, 韩建锋, 王晟宇, 郭学飞, 刘畅, 韩玉成, and 宋国峰
Subjects: MACHINE learning, LANDSLIDE hazard analysis, MINE subsidences, COAL mining, STATISTICAL learning
Abstract: Copyright of Coal Geology & Exploration is the property of Xian Research Institute of China Coal Research Institute and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Published: 2024
Full Text: View/download PDF

27. Risk Mapping of Geological Hazards in Plateau Mountainous Areas Based on Multisource Remote Sensing Data Extraction and Machine Learning (Fuyuan, China).

Author: Zhang, Shaohan, Tan, Shucheng, Sun, Yongqi, Ding, Duanyu, and Yang, Wei
Subjects: MACHINE learning, GEOLOGICAL mapping, REMOTE sensing, EMERGENCY management, RANDOM forest algorithms, GEOLOGICAL maps
Abstract: Selecting the most effective prediction model and correctly identifying the main disaster-driving factors in a specific region are the keys to addressing the challenges of geological hazards. Fuyuan County is a typical plateau mountainous town, and slope geological hazards occur frequently. Therefore, it is highly important to study the spatial distribution characteristics of hazards in this area, explore machine learning models that can be highly matched with the geological environment of the study area, and improve the accuracy and reliability of the slope geological hazard risk zoning map (SGHRZM). This paper proposes a hazard mapping research method based on multisource remote sensing data extraction and machine learning. In this study, we visualize the risk level of geological hazards in the study area according to 10 pathogenic factors. Moreover, the accuracy of the disaster point list was verified on the spot. The results show that the coupling model can maximize the respective advantages of the models used and has the highest mapping accuracy, with an area under the curve (AUC) of 0.923. The random forest (RF) model was the best-performing single model, with an AUC of 0.909. The grid search algorithm (GSA) is an efficient parameter optimization technique that can be used as a preferred method to improve the accuracy of a model. The list of disaster points extracted from remote sensing images is highly reliable. The high-precision coupling model and the single model have good adaptability in the study area. The research results can provide not only scientific references for local government departments to carry out disaster management work but also technical support for relevant research in surrounding mountainous towns. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

28. Feature-Based Classification of Mild Cognitive Impairment and Alzheimer's Disease Based on Optical Coherence Tomographic Angiographic Image.

Author: Visitsattapongse, Sarinporn, Maneerat, Areerat, Trinavarat, Adisak, Rattanabannakit, Chatchawan, Morkphrom, Ekkaphop, Senanarong, Vorapun, Srinonprasert, Varalak, Songsaeng, Dittapong, Atchaneeyasakul, La-ongsri, and Pintavirooj, Chuchart
Subjects: *MACHINE learning, *ALZHEIMER'S disease, *FEATURE extraction, *MILD cognitive impairment, *COHERENCE (Optics)
Abstract: Alzheimer's disease is a type of neurodegenerative disorder that is characterized by the progressive degeneration of brain cells, leading to cognitive decline and memory loss. It is the most common cause of dementia and affects millions of people worldwide. While there is currently no cure for Alzheimer's disease, early detection and treatment can help to slow the progression of symptoms and improve quality of life. This research presents a diagnostic tool for classifying mild cognitive impairment and Alzheimer's disease using feature-based machine learning applied to optical coherence tomographic angiography (OCT-A) images. Several features are extracted from the OCT-A image, including vessel density in five sectors, the area of the foveal avascular zone, retinal thickness, and novel features based on the histogram of the range-filtered OCT-A image. To ensure effectiveness for a diverse population, a large local database for our study was collected. The promising results of our study, with the best accuracy of 92.17%, will provide an efficient diagnostic tool for early detection of Alzheimer's disease. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

29. Machine Learning of Plasma Proteomics Classifies Diagnosis of Interstitial Lung Disease.

Author: Huang, Yong, Ma, Shwu-Fan, Oldham, Justin M., Adegunsoye, Ayodeji, Zhu, Daisy, Murray, Susan, Kim, John S., Bonham, Catherine, Strickland, Emma, Linderholm, Angela L., Lee, Cathryn T., Paul, Tessy, Mannem, Hannah, Maher, Toby M., Molyneaux, Philip L., Strek, Mary E., Martinez, Fernando J., and Noth, Imre
Subjects: INTERSTITIAL lung diseases, MACHINE learning, IDIOPATHIC pulmonary fibrosis, RECEIVER operating characteristic curves, PROTEOMICS
Abstract: Rationale: Distinguishing connective tissue disease–associated interstitial lung disease (CTD-ILD) from idiopathic pulmonary fibrosis (IPF) can be clinically challenging. Objectives: To identify proteins that separate and classify patients with CTD-ILD and those with IPF. Methods: Four registries with 1,247 patients with IPF and 352 patients with CTD-ILD were included in analyses. Plasma samples were subjected to high-throughput proteomics assays. Protein features were prioritized using recursive feature elimination to construct a proteomic classifier. Multiple machine learning models, including support vector machine, LASSO (least absolute shrinkage and selection operator) regression, random forest, and imbalanced Random Forest, were trained and tested in independent cohorts. The validated models were used to classify each case iteratively in external datasets. Measurements and Main Results: A classifier with 37 proteins (proteomic classifier 37 [PC37]) was enriched in the biological process of bronchiole development and smooth muscle proliferation and immune responses. Four machine learning models used PC37 with sex and age score to generate continuous classification values. Receiver operating characteristic curve analyses of these scores demonstrated consistent areas under the curve of 0.85–0.90 in the test cohort and 0.94–0.96 in the single-sample dataset. Binary classification demonstrated 78.6–80.4% sensitivity and 76–84.4% specificity in the test cohort and 93.5–96.1% sensitivity and 69.5–77.6% specificity in the single-sample classification dataset. Composite analysis of all machine learning models confirmed 78.2% (194 of 248) accuracy in the test cohort and 82.9% (208 of 251) in the single-sample classification dataset. Conclusions: Multiple machine learning models trained with large cohort proteomic datasets consistently distinguished CTD-ILD from IPF. Many of the identified proteins are involved in immune pathways. 
We further developed a novel approach for single-sample classification, which could facilitate honing the differential diagnosis of ILD in challenging cases and improve clinical decision making. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

30. Evaluation of Scikit-Learn Machine Learning Algorithms for Improving CMA-WSP v2.0 Solar Radiation Prediction.

Author: Wang, Dan, Shen, Yanbo, Ye, Dong, Yang, Yanchao, Da, Xuanfang, and Mo, Jingyue
Subjects: *SOLAR radiation, *K-nearest neighbor classification, *SOLAR oscillations, *FEATURE selection, *RANDOM forest algorithms
Abstract: This article aims to evaluate the performance of solar radiation forecasts produced by CMA-WSP v2.0 (version 2 of the China Meteorological Administration Wind and Solar Energy Prediction System) and to explore the application of machine learning algorithms from the scikit-learn Python library to improve the solar radiation prediction made by the CMA-WSP v2.0. It is found that the performance of the solar radiation forecasting from the CMA-WSP v2.0 is closely related to the weather conditions, with notable diurnal fluctuations. The mean absolute percentage error (MAPE) produced by the CMA-WSP v2.0 is approximately 74% between 11:00 and 13:00. However, the MAPE ranges from 193% to 242% at 07:00–08:00 and 17:00–18:00, which is greater than that observed at other daytime periods. The MAPE is relatively low (high) for both sunny and cloudy (overcast and rainy) conditions, with a high probability of an absolute percentage error below 25% (above 100%). The forecasts tend to underestimate (overestimate) the observed solar radiation in sunny and cloudy (overcast and rainy) conditions. By applying machine learning models (such as linear regression, decision trees, K-nearest neighbors, random forests regression, adaptive boosting, and gradient boosting regression) to revise the solar radiation forecasts, the MAPE produced by the CMA-WSP v2.0 is significantly reduced. The reduction in the MAPE is closely connected to the weather conditions. The models of K-nearest neighbors, random forests regression, and decision trees can reduce the MAPE in all weather conditions. The K-nearest neighbor model exhibits the most optimal performance among these models, particularly in rainy conditions. The random forest regression model demonstrates the second-best performance compared to that of the K-nearest neighbor model. The gradient boosting regression model has been observed to reduce the MAPE of the CMA-WSP v2.0 in all weather conditions except rainy. 
In contrast, the adaptive boosting (linear regression) model exhibited a diminished capacity to improve the CMA-WSP v2.0 solar radiation prediction, with a slight reduction in MAPE observed only in sunny (sunny and cloudy) conditions. In addition, the input feature selection has a considerable influence on the performance of the machine learning model. The incorporation of the time series data associated with the diurnal variation of solar radiation as an input feature can further improve the model's performance. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

31. Application of a robust MALDI mass spectrometry approach for bee pollen investigation.

Author: Braglia, Chiara, Alberoni, Daniele, Di Gioia, Diana, Giacomelli, Alessandra, Bocquet, Michel, and Bulet, Philippe
Subjects: *MATRIX-assisted laser desorption-ionization, *BEE pollen, *HONEYBEES, *POLLINATION by bees, *PLANT identification, *MACHINE learning, *PLANT diversity, *MASS spectrometry
Abstract: Pollen collected by pollinators can be used as a marker of the foraging behavior as well as indicate the botanical species present in each environment. Pollen intake is essential for pollinators' health and survival. During the foraging activity, some pollinators, such as honeybees, manipulate the collected pollen mixing it with salivary secretions and nectar (corbicular pollen) changing the pollen chemical profile. Different tools have been developed for the identification of the botanical origin of pollen, based on microscopy, spectrometry, or molecular markers. However, up to date, corbicular pollen has never been investigated. In our work, corbicular pollen from 5 regions with different climate conditions was collected during spring. Pollens were identified with microscopy-based techniques, and then analyzed in MALDI-MS. Four different chemical extraction solutions and two physical disruption methods were tested to achieve a MALDI-MS effective protocol. The best performance was obtained using a sonication disruption method after extraction with acetic acid or trifluoroacetic acid. Therefore, we propose a new rapid and reliable methodology for the identification of the botanical origin of the corbicular pollens using MALDI-MS. This new approach opens to a wide range of environmental studies spanning from plant biodiversity to ecosystem trophic interactions. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

32. Machine learning models for enhanced cutting temperature prediction in hard milling process.

Author: Balasuadhakar, A., Kumaran, S. Thirumalai, and Uthayakumar, M.
Abstract: Cutting temperature is the most crucial quality characteristic in the machining process. By prudently controlling this factor, high-precision workpieces can be produced. Determination of cutting temperature in milling operations is a challenging, time-consuming, and expensive process. These cost and time losses can be eliminated by predicting cutting temperature with machine learning models. The present study deals with the prediction of the cutting temperature in end milling of H11 steel with a coated cemented carbide tool under three cooling environments: dry machining, Minimum Quantity Lubrication (MQL), and Nano Fluid Minimum Quantity Lubrication (NMQL). In this study, various machine learning models such as Regularized Linear Regression Model (RLRM), Decision Tree (DT), XGB Regression (XGBR), Support Vector Machine (SVM), K-Nearest Neighbors (KNN) and Gaussian Process Regression (GPR) were developed. These models use speed, feed, and lubrication conditions as input parameters. Among all the models, GPR yielded the best performance, achieving the best evaluation metric scores, with a mean absolute error (MAE), root mean squared error (RMSE), mean absolute percentage error (MAPE), determination coefficient (R2), and accuracy of 14.04, 18.79, 14%, 0.9, and 85%, respectively. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

33. Extraction and recognition of breast morphological feature parameters based on machine learning models.

Author: Su, Huimin, Zhang, Peishan, Guo, Ziyi, Li, Tao, and Zou, Fengyuan
Subjects: MACHINE learning, PRINCIPAL components analysis, SUPPORT vector machines, RANDOM forest algorithms, FEATURE extraction
Abstract: Breast feature parameters can represent breast morphology, which is significant for improving bra fit and is an important aspect of garment ergonomics. To obtain the important feature parameters that can effectively represent breast morphology, this study proposed a feature parameter extraction method based on machine learning models. First, the human body point-cloud data of 201 female college students were obtained by a three-dimensional body scanner, and 24 feature parameters related to breast morphology were acquired. Then, cluster analysis was used to classify breast morphology into four categories: uniform hemisphere, outward expanding circular, converging water drop, and outward expanding hemisphere. Finally, principal component analysis was used to reduce the dimensionality of feature parameters, and three machine learning models, naive Bayes, support vector machine, and random forest, were utilized to extract the parameters after dimensionality reduction. The results showed that principal component analysis could reduce the dimensions of breast feature parameters to seven main parameters. Based on the above three models, the seven main parameters were further reduced to three important feature parameters. They were, in order of importance: breast volume, breast surface area, and longitudinal breast cup straight line length, and the Fisher discriminant function was used to distinguish breast morphology. The recognition accuracy based on the three important feature parameters reached 99%, higher than 97.5% for full feature parameters recognition, and 98% for seven feature parameters recognition. This shows that the three important feature parameters obtained by the machine learning models are effective in characterizing breast morphology. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

34. Toward Large‐Scale Riverine Phosphorus Estimation Using Remote Sensing and Machine Learning.

Author: Ramtel, Pradeep, Feng, Dongmei, and Gardner, John
Subjects: MACHINE learning, PHOSPHORUS in water, WATER quality monitoring, REMOTE sensing, SOIL erosion
Abstract: Phosphorus pollution is a major water quality issue impacting the environment and human health. Traditional methods limit the frequency and extent of total phosphorus (TP) measurements across many rivers. However, remote sensing can accurately estimate riverine TP; nevertheless, no large‐scale assessment of riverine TP using remote sensing exists. Large‐scale models using remote sensing can provide a fast and consistent method for TP measurement, important for data generalization and accessing extensive spatial‐temporal change in TP. Our study uses remote sensing and machine learning to estimate the TP in rivers in the contiguous United States (CONUS). Initially, we developed a national scale matchup data set for Landsat detectable rivers (river width >30 m) using in situ TP and surface reflectance. We used in situ data from the Water Quality Portal (WQP), alongside water surface reflectance data from Landsat 5, 7, and 8 spanning from 1984 to 2021. Then, we used this data set to develop a machine learning (ML) model using different preprocessing methods and algorithms. We found that using high‐level vegetation in the clustering approach and over‐sampling or under‐sampling our training data in the sampling approach improved our model estimation accuracy. We compared XGBLinear, XGBTree, Regularized Random Forest (RRF), and K‐Nearest neighbors ML algorithms and selected XGBLinear as the best model with an R2 of 0.604, RMSE of 0.103 mg/L, mean average error of 0.83, and NSE of 0.602. Finally, we identified human footprint, elevation, river area, and soil erosion as the main attributes influencing the accuracy of estimated TP from the ML model. Plain Language Summary: Phosphorus pollution in rivers is a big problem for the environment and humans. Traditional ways of measuring phosphorus levels in rivers are limited and slow. However, remote sensing can help estimate phosphorus levels on a large scale. 
But there hasn't been a study using remote sensing to measure phosphorus levels in rivers on a large scale. We studied the potential of remote sensing and machine learning to estimate phosphorus in the United States rivers. First, we collected data from 1984 to 2021, both on‐site phosphorus measurements from the Water Quality Portal and satellite river data from the Landsat archive, and developed our model. We found that using vegetation data and changing our data volume improved our model's accuracy. Likewise, we discovered that river characteristics like human activity level, elevation, river size, and soil erosion influenced how accurately our model could estimate phosphorus levels using remote sensing. Key Points: We created a national‐level in situ total phosphorus and water reflectance matchup data set for rivers within CONUS. We developed a machine learning model to simulate riverine phosphorus using water reflectance data retrieved from Landsat. We identified river‐specific characteristics that influence the model performance on retrieving riverine phosphorus from satellite data. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

35. Design of antiviral AGO2-dependent short hairpin RNAs.

Author: Yuanyuan Bie, Jieling Zhang, Jiyao Chen, Yumin Zhang, Muhan Huang, Leike Zhang, Xi Zhou, and Yang Qiu
Subjects: SARS-CoV-2, HAIRPIN (Genetics), SMALL interfering RNA, RNA interference, HEPATITIS A virus
Abstract: The increasing emergence and re-emergence of RNA virus outbreaks underlines the urgent need to develop effective antivirals. RNA interference (RNAi) is a sequence-specific gene silencing mechanism that is triggered by small interfering RNAs (siRNAs) or short hairpin RNAs (shRNAs), which exhibits significant promise for antiviral therapy. AGO2-dependent shRNA (agshRNA) generates a single-stranded guide RNA and presents significant advantages over traditional siRNA and shRNA. In this study, we applied a logistic regression algorithm to a previously published chemically siRNA efficacy dataset and built a machine learning-based model with high predictive power. Using this model, we designed siRNA sequences targeting diverse RNA viruses, including human enterovirus A71 (EV71), Zika virus (ZIKV), dengue virus 2 (DENV2), mouse hepatitis virus (MHV) and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), and transformed them into agshRNAs. We validated the performance of our agshRNA design by evaluating antiviral efficacies of agshRNAs in cells infected with different viruses. Using the agshRNA targeting EV71 as an example, we showed that the anti-EV71 effect of agshRNA was more potent compared with the corresponding siRNA and shRNA. Moreover, the antiviral effect of agshRNA is dependent on AGO2-processed guide RNA, which can load into the RNA-induced silencing complex (RISC). We also confirmed the antiviral effect of agshRNA in vivo. Together, this work develops a novel antiviral strategy that combines machine learning-based algorithm with agshRNA design to custom design antiviral agshRNAs with high efficiency. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

36. An automated multi-classification of communicable diseases using ensemble learning for disease surveillance.

Author: Thakur, Kavita, Sandhu, Navneet Kaur, Kumar, Yogesh, and Thakkar, Hiren Kumar
Abstract: Communicable diseases are considered a significant global public health concern, and their timely detection is crucial for effective prevention and spread control. However, communicable disease data are highly interdependent and too complex to analyze with traditional tools for automatic detection. In contrast, machine learning enabled models have shown tremendous potential in this regard, enabling rapid and accurate identification of communicable diseases. The objective of this research is to identify and classify various communicable diseases efficiently using machine learning models. The data of ten communicable diseases are considered, which are further analyzed by pre-processing, feature selection, and visualization. Later, machine learning models such as Random Forest, Gradient Boosting, Decision Tree, Adaptive Boosting (AdaBoost), extreme Gradient Boosting (XGBoost), Extra Tree, Light Gradient Boosting Machine (Light GBM), and Categorical Boosting (CatBoost), along with the hybridization of XGBoost and Random Forest, are applied and evaluated using parameters such as precision, false detection rate, recall, negative prediction value, F1 score, accuracy, and Matthews correlation coefficient. The confusion matrix of all the models for various classes has also been generated to compute the values of performance metrics. During experimentation, it was found that the random forest and hybridized model classifier obtained the highest accuracy of 99.9%; Random Forest, Extra Tree Classifier, CatBoost, and Hybrid classifier computed the highest Matthews correlation coefficient score of 99.9%; and the Gradient Boosting classifier obtained the best false detection rate value with 95.13% and negative predicted value with 189.82. 
Overall, the research showed that artificial intelligence techniques have the potential to improve communicable disease detection extensively, and future research in this area can help to develop more robust and effective disease surveillance and control tools. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

37. Landslide susceptibility assessment of Shenzhen City based on a multi-scale convolutional neural network.

Author: 张清, 何毅, 陈学业, 高秉海, 张立峰, 赵占鹫, 路建刚, and 张雅蕾
Abstract: Copyright of Chinese Journal of Geological Hazard & Control is the property of China Institute of Geological Environmental Monitoring (CIGEM) Editorial Department and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Published: 2024
Full Text: View/download PDF

38. Prediction of HPC compressive strength based on machine learning.

Author: Jin, Libing, Duan, Jie, Jin, Yichen, Xue, Pengfei, and Zhou, Pin
Subjects: *COMPRESSIVE strength, *GREY relational analysis, *FEATURE extraction, *SUPPORT vector machines, *GENETIC algorithms, *MACHINE learning
Abstract: There is a complex high-dimensional nonlinear mapping relationship between the compressive strength of High-Performance Concrete (HPC) and its components, which has great influence on the accurate prediction of compressive strength. In this paper, an efficient robust software calculation strategy combining BP Neural Network (BPNN), Support Vector Machine (SVM) and Genetic Algorithm (GA) is proposed for the prediction of compressive strength of HPC. 8 features were extracted from the previous literature, and a compressive strength database containing 454 sets of data was constructed. The model was trained and tested, and the performance of 4 Machine Learning (ML) models, namely BPNN, SVM, GA-BPNN and GA-SVM, was compared. The results show that the coupled model is superior to the single model. Moreover, because GA-SVM has better generalization ability and theoretical basis, its convergence speed and prediction accuracy are better than GA-BPNN. Then Grey Relational Analysis (GRA) and Shapley analysis were used to verify the interpretability of the GA-SVM model, which showed that the water-binder ratio had the most significant influence on the compressive strength. Finally, the combination of multiple input variables to evaluate the compressive strength supplemented this research, and again verified the significant influence of water-binder ratio, providing reference value for subsequent research. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

39. A novel univariate interpolation and bivariate regression hybrid method application to biodegradation of bisphenol A diglycidyl ether using laccases from Geobacillus stearothermophilus and Geobacillus thermoparafinivorans strains.

Author: Bibi, Monaza, Yasmin, Azra, Murtza, Iqbal, and Abbas, Sidra
Subjects: GEOBACILLUS stearothermophilus, LACCASE, MACHINE learning, HIGH performance liquid chromatography, GAS chromatography/Mass spectrometry (GC-MS), THERMOPHILIC bacteria, BIOCATALYSIS
Abstract: Bisphenol A diglycidyl ether (BADGE), a derivative of the well-known endocrine disruptor Bisphenol A (BPA), is a potential threat to long-term environmental health due to its prevalence as a micropollutant. This study addresses the previously unexplored area of BADGE toxicity and removal. We investigated, for the first time, the biodegradation potential of laccase isolated from Geobacillus thermophilic bacteria against BADGE. The laccase-mediated degradation process was optimized using a combination of response surface methodology (RSM) and machine learning models. Degradation of BADGE was analyzed by various techniques, including UV-Vis spectrophotometry, high-performance liquid chromatography (HPLC), Fourier transform infrared (FTIR) spectroscopy, and gas chromatography-mass spectrometry (GC-MS). Laccase from Geobacillus stearothermophilus strain MB600 achieved a degradation rate of 93.28% within 30 min, while laccase from Geobacillus thermoparafinivorans strain MB606 reached 94% degradation within 90 min. RSM analysis predicted the optimal degradation conditions to be 60 min reaction time, 80°C temperature, and pH 4.5. Furthermore, CB-Dock simulations revealed good binding interactions between laccase enzymes and BADGE, with an initial binding mode selected for a cavity size of 263 and a Vina score of -5.5, which confirmed the observed biodegradation potential of laccase. These findings highlight the biocatalytic potential of laccases derived from thermophilic Geobacillus strains, notably MB600, for enzymatic decontamination of BADGE-contaminated environments. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

40. Experimental and machine learning based study of compressive strength of geopolymer concrete.

Author: Tran, Ngoc Thanh, Nguyen, Duy Hung, Tran, Quang Thanh, Le, Huy Viet, and Nguyen, Duy-Liem
Subjects: *POLYMER-impregnated concrete, *COMPRESSIVE strength, *CONCRETE curing, *CONCRETE, *MACHINE learning
Abstract: In this study, the aim is to investigate and predict the compressive strength of geopolymer concrete (GPC). The effects of curing method, curing time and concrete age on the compressive strength of GPC were evaluated experimentally. Four curing methods, namely room temperature (25°C), mobile dryer (50°C), heating cabinet type 1 (80°C) and heating cabinet type 2 (100°C) were adopted. Additionally, three curing times, of 8 h, 16 h and 24 h, as well as three concrete ages, of 7 days, 14 days and 28 days, were considered. To predict the compressive strength of GPC, 679 test results were collected to develop various machine learning models. The test results indicated that increasing the curing temperature, curing time and concrete age all led to improvements in the compressive strength of GPC. The mobile dryer showed promise as a curing method for cast-in-place GPC. The proposed machine learning models demonstrated good predictive capacity for the compressive strength of GPC with relatively high accuracy. Through sensitivity analysis, concrete age was identified as the most influential variable affecting the final compressive strength of GPC. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

41. Effects of Seasonal Freezing and Thawing on Soil Moisture and Salinity in the Farmland Shelterbelt System in the Hetao Irrigation District.

Author: Luo, Chengwei, Wang, Ruoshui, Hao, Kexin, Jia, Xiaoxiao, Zhu, Junying, Xin, Zhiming, and Xiao, Huijie
Subjects: *SOIL moisture, *ARTIFICIAL neural networks, *MACHINE learning, *SOIL salinity, *SOIL salinization
Abstract: Water resources are scarce, and secondary soil salinization is severe in the Hetao Irrigation District. Farmland shelterbelt systems (FSS) play a critical role in regulating soil water and salt dynamics within the irrigation district. However, the understanding of soil water and salt migration within FSS during the freeze–thaw period remains unclear due to the complex and multifaceted interactions between water and salt. This study focused on a typical FSS and conducted comprehensive monitoring of soil moisture, salinity, temperature, and meteorological parameters during the freeze–thaw period. The results revealed consistent trends in air temperature and soil temperature overall. Soil freezing durations exceeded thawing durations, and both decreased with an increasing soil depth. At the three critical freeze–thaw nodes, the soil moisture content at a 0–20 cm depth was significantly lower than at a 40–100 cm depth (p < 0.05). The soil water content increased with time and depth at varying distances from the shelterbelt, with an average increase of 7.63% after freezing and thawing. The surface water content at the forest edge (0.3H, 4H) was lower than inside the farmland (1H, 2H, 3H). Soil salt accumulation occurred during both freezing stable periods and melting–thawing periods in the 0–100 cm soil layer near the forest edge (0.3H, 4H), with the highest soil salinity reaching 0.62 g·kg−1. After the freeze–thaw period, the soil salt content in each layer increased by 11.41–47.26% compared to before the freeze–thaw period. Salt accumulation in farmland soil near the shelterbelt was stronger than in the far shelterbelt. The multivariate statistical model demonstrated goodness of fit for soil water and salt as 0.94 and 0.72, respectively, while the BP neural network model showed goodness of fit for soil water and salt as 0.82 and 0.85, respectively. 
Our results provide an efficient theoretical basis for FSS construction and agricultural water management practices. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

42. Evaluation of the Habitat Suitability for Zhuji Torreya Based on Machine Learning Algorithms.

Author: Wu, Liangjun, Yang, Lihui, Li, Yabin, Shi, Jian, Zhu, Xiaochen, and Zeng, Yan
Subjects: MACHINE learning, ALLUVIAL plains, SUPPORT vector machines, RANDOM forest algorithms, WATERSHEDS
Abstract: Torreya, with its dual roles in both food and medicine, has faced multiple challenges in its cultivation in Zhuji city due to frequent global climate disasters in recent years. Therefore, conducting a study on suitable zoning for Torreya habitats based on climatic, topographic, and soil factors is highly important. In this study, we utilized the latitude and longitude coordinates of Torreya distribution points and ecological factor raster data. We thoroughly analyzed the ecological environmental characteristics of the climate, topography, and soil at Torreya distribution points via both physical modeling and machine learning methods. Zhuji city was classified into suitable, moderately suitable, and unsuitable zones to determine regions conducive to Torreya growth. The results indicate that suitable zones for Torreya cultivation in Zhuji city are distributed mainly in mountainous and hilly areas, while unsuitable zones are found predominantly in central basins and northern river plain networks. Moderately suitable zones are located in transitional areas between suitable and unsuitable zones. Compared to climatic factors, soil and topographic factors more significantly restrict Torreya cultivation. Machine learning algorithms can also achieve suitability zoning with a more concise and efficient classification process. In this study, the random forest (RF) algorithm demonstrated greater predictive accuracy than the support vector machine (SVM) and naive Bayes (NB) algorithms, achieving the best classification results. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

43. An empirical framework for event prediction in massive datasets.

Author: Rajita, B. S. A. S., Soni, Samarth, Kumari, Deepa, and Panda, Subhrakanta
Abstract: Certain events always trigger evolutionary changes in temporal Social Networks (SNs) communities. Machine Learning models make predictions for such events. The performance of these ML models largely depends on the dataset's features. Existing literature shows that the community features of the datasets have helped ML models predict the events with some accuracy. However, a temporal dataset has temporal and community features owing to its evolving structures. These temporal features also aid in improving the performance of the ML models. Thus, this work aims to compare the effectiveness of temporal and community features in improving the accuracy of ML models. This paper proposes a framework to extract the detected communities' community- and temporal- features in temporal data. This research also analyses ML models suitable for predicting events based on features and compares their performance. The experimental research shows that adding temporal features improves the prediction accuracy from 79.51 to 81.47% and saves 59.37% of the computational time of ML models. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

44. Prediction of Histological Grade of Oral Squamous Cell Carcinoma Using Machine Learning Models Applied to 18F-FDG-PET Radiomics.

Author: Nikkuni, Yutaka, Nishiyama, Hideyoshi, and Hayashi, Takafumi
Subjects: MACHINE learning, RECEIVER operating characteristic curves, SQUAMOUS cell carcinoma, RADIOMICS, SUPPORT vector machines
Abstract: The histological grade of oral squamous cell carcinoma affects the prognosis. In the present study, we performed a radiomics analysis to extract features from 18F-FDG PET image data, created machine learning models from the features, and verified the accuracy of the prediction of the histological grade of oral squamous cell carcinoma. The subjects were 191 patients in whom an 18F-FDG-PET examination was performed preoperatively and a histopathological grade was confirmed after surgery, and their tumor sizes were sufficient for a radiomics analysis. These patients were split in a 70%/30% ratio for use as training data and testing data, respectively. We extracted 2993 radiomics features from the PET images of each patient. Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), Naïve Bayes (NB), and K-Nearest Neighbor (KNN) machine learning models were created. The areas under the curve obtained from receiver operating characteristic curves for the prediction of the histological grade of oral squamous cell carcinoma were 0.72, 0.71, 0.84, 0.74, and 0.73 for LR, SVM, RF, NB, and KNN, respectively. We confirmed that a PET radiomics analysis is useful for the preoperative prediction of the histological grade of oral squamous cell carcinoma. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

45. High spatial and spectral resolution dataset of hyperspectral look-up tables for 3.5 million traits and structural combinations of Central European temperate broadleaf forests

Author: Tomáš Hanousek, Terézia Slanináková, Tomáš Rebok, and Růžena Janoutová
Subjects: LUT, Radiative transfer model, DART, Machine learning model, Synthetic spectral data, Leaf traits, Computer applications to medicine. Medical informatics, R858-859.7, Science (General), Q1-390
Abstract: Accurate retrieval of forest functional traits from remote sensing data is critical for monitoring forest health and productivity. To achieve sufficient accuracy using inverse methods it is essential to have a representative database of simulated or measured spectral properties together with corresponding forest traits. However, existing datasets are often limited in scope, covering specific sites and times with simplified structures. This limitation hinders the development of generalizable machine learning models for trait prediction. To address this issue, we present a comprehensive high-resolution dataset of hyperspectral Look-Up Tables (LUT) designed for Central European temperate broadleaf forests. The dataset includes 3.5 million unique combinations of leaf biochemical and canopy structural characteristics of forest scenes together with a variety of sun geometry. The spectral data cover wavelengths from 450 nm to 2300 nm, with a resolution of 2 nm. The dataset is organised into two files: one capturing the average reflectance of all scene pixels and another focusing solely on sunlit leaf pixels. LUT were generated using the Discrete Anisotropic Radiative Transfer model version 5.10.0. Virtual forest scenes were based on 3D tree representations derived from Terrestrial Laser Scanning of European beech trees, adjusted to various leaf area index values and structural configurations to simulate natural forest variability. The reflectance data were processed using MATLAB and Python scripts, resulting in hyperspectral cubes that were processed to generate the LUT. The dataset can be used to train machine learning models, such as Random Forest and Support Vector Machines, for predicting forest functional traits and assisting in the calibration of remote sensing algorithms. 
The biggest advantage of the dataset is high spectral and spatial resolution, together with the high number of different trait combinations, which allows for adaptability to different times, locations, and hyper- and multispectral sensors, and can support up-coming hyperspectral satellite missions. ESA Copernicus Hyperspectral Imaging Mission for the Environment (CHIME) and NASA Surface Biology and Geology (SBG) future satellite missions can utilise this dataset to develop their product processors for monitoring forest traits.
Published: 2024
Full Text: View/download PDF

46. Machine learning model for early prediction of survival in gallbladder adenocarcinoma: A comparison study

Author: Weijia Wang, Xin Li, Haiyuan Yu, Fangxuan Li, and Guohua Chen
Subjects: Gallbladder adenocarcinoma, Prognosis, Machine learning model, The Surveillance, epidemiology, and end results program (SEER), Biotechnology, TP248.13-248.65, Medical technology, R855-855.5
Abstract: The prognosis for gallbladder adenocarcinoma (GBAC), a highly malignant cancer, is not good. In order to facilitate individualized risk stratification and improve clinical decision-making, this study set out to create and validate a machine learning model that could accurately predict early survival outcomes in GBAC patients. Five models—RSF, Cox regression, GBM, XGBoost, and Deepsurv—were compared using data from the SEER database (2010–2020). The dataset was divided into training (70 %) and validation (30 %) sets, and the C-index, ROC curves, calibration curves, and decision curve analysis (DCA) were used to assess the model's performance. At 1, 2, and 3-year survival intervals, the RSF model performed better than the others in terms of calibration, discrimination, and clinical net benefit. The most important predictor of survival, according to SHAP analysis, is AJCC stage. Patients were divided into high, medium, and low-risk groups according to RSF-derived risk scores, which revealed notable variations in survival results. These results demonstrate the RSF model's potential as an early survival prediction tool for GBAC patients, which could enhance individualized treatment and decision-making.
Published: 2024
Full Text: View/download PDF

47. Machine learning modeling reveals the spatial variations of lake water salinity on the endorheic Tibetan Plateau

Author: Pengju Xu, Kai Liu, Lan Shi, and Chunqiao Song
Subjects: Lake, Water salinity, Water resources, Tibetan Plateau, Machine learning model, Physical geography, GB3-5030, Geology, QE1-996.5
Abstract: Study region: The endorheic Tibetan Plateau (TP). Study focus: Water salinity is a sensitive indicator of variations in lake hydrologic and physicochemical characteristics. Due to the heterogeneous influences of geographical and climatic factors, lake water salinity is highly sensitive to environmental diversity and changes. The TP hosts a wide distribution of lakes, the majority of which belong to the endorheic drainage type and are saline or salty lakes. However, the harsh environment on the TP poses great challenges for in situ measurements at large scales, impeding the comprehension of the pattern and variations of lake water salinity across the TP. New hydrological insights for the region: Benefiting from extensive field surveys and a meta-analysis, this study establishes machine learning models based on measurements from 100 terminal lakes (>1 km2) and related physical variables. The optimal model (R2 = 0.90, MAE = 8.11 g/L, MAPE = 36.40 %, RMSE = 12.51 g/L, RRMSE = 36.96 g/L) is then applied to predict the water salinity of the other 214 unmeasured terminal lakes. The modeling results reveal a spatial pattern of increasing water salinity of these terminal lakes from south to north across the endorheic basins. Further classification of water salinity levels indicated that more than half (213) of the terminal lakes are in an oligosaline state. This study contributes to a spatially explicit understanding of the distribution variations in water salinity of terminal TP lakes and provides a feasible approach for estimating water salinity of unmeasured lakes at large scales.
Published: 2024
Full Text: View/download PDF

48. Establishing Optimal Machine Learning Models for Monitoring Water Quality in Vietnam’s Upper Ma River

Author: Ngo Thanh Son and Nguyen Duc Loc
Subjects: Water quality monitoring, Machine learning model, Sentinel-2 imagery, Turbidity, Total suspended solids, Upper Ma river, Environmental sciences, GE1-350, Environmental technology. Sanitary engineering, TD1-1066
Abstract: This study aims to establish the optimal regression model for predicting total suspended solids (TSS) and Turbidity based on in situ data and spectral regions of Sentinel-2 images. Various machine learning models were evaluated, including Multilayer Perceptron Regression (MLPR), Random Forest Regression (RFR), AdaBoost Regression (ABR), Multiple Linear Regression (MLR), and K-Nearest Neighbors Regression (KNNR). These models were applied to different band combinations of spectral regions: visible (VIS), near-infrared (NIR), shortwave-infrared (SWIR), VIS+NIR (VNIR), and VIS+NIR+SWIR (VNIR+SWIR). The study results revealed that the MLR model, while not the best performer during training (R2 = 0.89 for TSS and R2 = 0.66 for turbidity), did not exhibit overfitting, with corresponding R² values in testing being 0.80 and 0.42, respectively. Variable selection for MLR models identified optimal spectral bands: B3, B5, B6, B8, B11, and B12 for TSS, and B4, B8, B11, and B12 for Turbidity. The final no-intercept multiple linear regression models achieved R2 = 0.88 for TSS and R2 = 0.62 for turbidity. Performance metrics for TSS were superior, with lower MAE, MSE, and RMSE compared to Turbidity. This study underscores the efficacy of using MLR models with selected spectral bands for accurate and generalizable predictions of TSS and turbidity.
Published: 2024
Full Text: View/download PDF

49. Machine learning models can predict cancer-associated disseminated intravascular coagulation in critically ill colorectal cancer patients

Author: Li Qin, Jieling Mao, Min Gao, Jingwen Xie, Zhikun Liang, and Xiaoyan Li
Subjects: disseminated intravascular coagulation, machine learning model, intensive care unit, colorectal cancer, anticoagulation, Therapeutics. Pharmacology, RM1-950
Abstract: Background: Due to its complex pathogenesis, the assessment of cancer-associated disseminated intravascular coagulation (DIC) is challenging. We aimed to develop a machine learning (ML) model to predict overt DIC in critically ill colorectal cancer (CRC) patients using clinical features and laboratory indicators. Methods: This retrospective study enrolled consecutive CRC patients admitted to the intensive care unit from January 2018 to December 2023. Four ML algorithms were used to construct predictive models using 5-fold cross-validation. The models’ performance in predicting overt DIC and 30-day mortality was evaluated using the area under the receiver operating characteristic curve (ROC-AUC) and Cox regression analysis. The performance of three established scoring systems, ISTH DIC-2001, ISTH DIC-2018, and JAAM DIC, was also assessed for survival prediction and served as benchmarks for model comparison. Results: A total of 2,766 patients were enrolled, with 699 (25.3%) diagnosed with overt DIC according to ISTH DIC-2001, 1,023 (36.9%) according to ISTH DIC-2018, and 662 (23.9%) according to JAAM DIC. The extreme gradient boosting (XGB) model outperformed others in DIC prediction (ROC-AUC: 0.848; 95% CI: 0.818–0.878; p < 0.01) and mortality prediction (ROC-AUC: 0.708; 95% CI: 0.646–0.768; p < 0.01). The three DIC scores predicted 30-day mortality with ROC-AUCs of 0.658 for ISTH DIC-2001, 0.692 for ISTH DIC-2018, and 0.673 for JAAM DIC. Conclusion: The results indicate that ML models, particularly the XGB model, can serve as effective tools for predicting overt DIC in critically ill CRC patients. This offers a promising approach to improving clinical decision-making in this high-risk group.
Published: 2024
Full Text: View/download PDF

50. Spatiotemporal pattern of water hyacinth (Pontederia crassipes) distribution in Lake Tana, Ethiopia, using a random forest machine learning model

Author: Matiwos Belayhun, Zerihun Chere, Nigus Gebremedhn Abay, Yonas Nicola, and Abay Asmamaw
Subjects: aquatic invasive plant, Lake Tana, machine learning model, remote sensing indices, Sentinel image, water hyacinth, Environmental sciences, GE1-350
Abstract: Water hyacinth (Pontederia crassipes) is an invasive weed that covers a significant portion of Lake Tana. The infestation has an impact on the lake’s ecological and socioeconomic systems. Early detection of the spread of water hyacinth using geospatial techniques is crucial for its effective management and control. The main objective of this study was to examine the spatiotemporal distribution of water hyacinth from 2016 to 2022 using a random forest machine learning model. The study used 16 variables obtained from Sentinel-2A, Sentinel-1 SAR, and SRTM DEM, and a random forest supervised classification model was applied. Seven spectral indices, five spectral bands, two Sentinel-1 SAR bands, and two topographic variables were used in combination to model the spatial distribution of water hyacinth. The model was evaluated using the overall accuracy and kappa coefficient. The findings demonstrated that the overall accuracy ranged from 0.91 to 0.94 and kappa coefficient from 0.88 to 0.92 in the wet season and 0.93 to 0.95 and 0.90 to 0.93 in the dry season, respectively. B11 and B5 (2022), VH, soil adjusted vegetation index (SAVI), and normalized difference water index (NDWI) (2020), B5 and B12 (2018), and VH and slope (2016) are the highly important variables in the classification. The study found that the spatial coverage of water hyacinth was 686.5 and 650.4 ha (2016), 1,851 and 1,259 ha (2018), 1,396.7 and 1,305.7 ha (2020), and 1,436.5 and 1,216.5 ha (2022) in the wet and dry seasons, respectively. The research findings indicate that variables derived from optical (Sentinel-2A and SRTM) and non-optical (Sentinel-1 SAR) satellite imagery effectively identify water hyacinth and display its spatiotemporal spread using the random forest machine learning algorithm.
Published: 2024
Full Text: View/download PDF
