Descriptor: "Feature scaling" - Searchworks@Jio Institute Digital Library Search Results

Your search for "Feature scaling" returned 395 results.

1. Practical Aspects in Machine Learning

Author: Gupta, Pramod, Sehgal, Naresh Kumar, and Acken, John M.
Published: 2025
Full Text: View/download PDF

2. CECS-CLIP: Fusing Domain Knowledge for Rare Wildlife Detection Model.

Author: Yang, Feng, Hu, Chunying, Liang, Aokang, Wang, Sheng, Su, Yun, and Xu, Fu
Subjects: *WILDLIFE monitoring, *ENDANGERED species, *KNOWLEDGE base, *RESEARCH teams, *WILDLIFE conservation, *ALGORITHMS
Simple Summary: Accurate detection of wildlife, particularly small and hidden animals, is crucial for conservation efforts. Traditional image-based methods often struggle in complex environments. This study introduces a novel approach that combines image and text data to improve detection accuracy. By incorporating textual information about animal characteristics and leveraging a Concept Enhancement Module (CEM), our model can better understand and locate animals, even in challenging conditions. Experimental results demonstrate a significant improvement in detection accuracy, achieving an average precision of 95.8% on a challenging wildlife dataset. Compared to existing multimodal target detection algorithms, this model achieved at least a 25% improvement in AP and excelled in detecting small targets of certain species, significantly surpassing existing multimodal target detection model benchmarks. This represents a substantial improvement compared to existing state-of-the-art methods. Our multimodal approach offers a promising solution for enhancing wildlife monitoring and conservation efforts.
Abstract: Accurate and efficient wildlife monitoring is essential for conservation efforts. Traditional image-based methods often struggle to detect small, occluded, or camouflaged animals due to the challenges posed by complex natural environments. To overcome these limitations, an innovative multimodal target detection framework is proposed in this study, which integrates textual information from an animal knowledge base as supplementary features to enhance detection performance. First, a concept enhancement module was developed, employing a cross-attention mechanism to fuse features based on the correlation between textual and image features, thereby obtaining enhanced image features.
Secondly, a feature normalization module was developed, amplifying cosine similarity and introducing learnable parameters to continuously weight and transform image features, further enhancing their expressive power in the feature space. Rigorous experimental validation on a specialized dataset provided by the research team at Northwest A&F University demonstrates that our multimodal model achieved a 0.3% improvement in precision over single-modal methods. Compared to existing multimodal target detection algorithms, this model achieved at least a 25% improvement in AP and excelled in detecting small targets of certain species, significantly surpassing existing multimodal target detection model benchmarks. This study offers a multimodal target detection model integrating textual and image information for the conservation of rare and endangered wildlife, providing strong evidence and new perspectives for research in this field. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

3. Gradient scaling and segmented SoftMax Regression Federated Learning (GDS-SRFFL): a novel methodology for attack detection in industrial internet of things (IIoT) networks.

Author: Rajasekaran, Vijay Anand, Indirajithu, Alagiri, Jayalakshmi, P., Nayyar, Anand, and Balusamy, Balamurugan
Subjects: *INTERNET of things, *FEDERATED learning, *CYBERTERRORISM, *COMPUTER network security, *DEEP learning, *SENSOR networks
Abstract: The Industrial Internet of Things (IIoT) is a large-scale IoT-based network comprising sensors, communication channels, and security protocols used in Industry 4.0 for diverse real-time operations. IIoT networks are vulnerable to diverse cyber threats and attacks, and attack detection is the biggest security issue in the IIoT. Various traditional attack detection methods have been proposed by several researchers, but all are insufficient to protect privacy and security. To address this issue, a novel Gradient Descent Scaling and Segmented Regression Fine-tuned Federated Learning (GDS-SRFFL) method is introduced for IIoT network attack detection. The aim of the GDS-SRFFL method is to enhance the security of an IIoT network. Initially, a novel Gradient Descent Scaling-based preprocessing step is applied to the raw dataset to obtain feature-scaled, preprocessed network samples. Then, unwanted intrusions are discovered by a Segmented Regression Fine-tuned Mini-batch Federated Learning model with a novel SoftMax Regression component to ensure the protection of IoT networks. To validate the proposed methodology, experiments were conducted on different parameters, namely accuracy, precision, recall, specificity, and attack detection time. The results show that the proposed GDS-SRFFL improved accuracy by 10%, precision by 13%, recall by 10%, and specificity by 11%, and reduced attack detection time by 28%, compared with existing techniques such as CNN + LSTM (Altunay and Albayrak in Eng Sci Technol Int J 38:101322, 2023, https://doi.org/10.1016/j.jestch.2022.101322), Enhanced Deep and Ensemble learning in SCADA-based IIoT network (Khan et al. in IEEE Trans Ind Inf 19(1):1030–1038, https://doi.org/10.1109/TII.2022.3190352), RNN (Ullah and Mahmoud in IEEE Access 10:62722–62750, 2022, https://doi.org/10.1109/ACCESS.2022.3176317), and other CNN methods.
The proposed method "GDS-SRFFL" has an overall accuracy of 89.42% compared with other existing methods. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

4. Deep learning based features extraction for facial gender classification using ensemble of machine learning technique.

Author: Waris, Fazal, Da, Feipeng, and Liu, Shanghuan
Abstract: Accurate and efficient gender recognition is essential for many applications such as surveillance, security, and biometrics. Recently, deep learning techniques have made remarkable advancements in feature extraction and have become extensively implemented in various applications, including gender classification. However, despite the numerous studies conducted on the problem, correctly recognizing robust and essential features from face images and efficiently distinguishing them with high accuracy in the wild is still a challenging task for real-world applications. This article proposes an approach that combines deep learning and a soft voting-based ensemble model to perform automatic gender classification with high accuracy in an unconstrained environment. In the proposed technique, a novel deep convolutional neural network (DCNN) was designed to extract 128 high-quality and accurate features from face images. The StandardScaler method was then used to preprocess these extracted features, and finally, these preprocessed features were classified with a soft voting ensemble learning model combining the outputs from several machine learning classifiers, such as random forest (RF), support vector machine (SVM), linear discriminant analysis (LDA), logistic regression (LR), gradient boosting classifier (GBC), and XGBoost, to improve the prediction accuracy. The experimental study was performed on the UTK, Labeled Faces in the Wild (LFW), Adience, and FEI datasets. The results show that the proposed approach outperforms all current approaches in terms of accuracy across all datasets. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

5. Machine Learning-Based Water Potability Prediction: Model Evaluation, and Hyperparameter Optimization

Author: Mondal, Anoushka, Dubey, Sudhanshu Sudhakar, Lim, Meng-Hiot, Series Editor, Saha, Apu Kumar, editor, Sharma, Harish, editor, and Prasad, Mukesh, editor
Published: 2024
Full Text: View/download PDF

6. Meticulous predictive modelling for classification of cancerous molecular profiles.

Author: Bhonde, Swati B., Wagh, Sharmila K., and Prasad, Jayashree R.
Subjects: PREDICTION models, RANDOM forest algorithms, DECISION trees, UNCERTAINTY (Information theory), EARLY detection of cancer
Abstract: Functional genomic data has recently been used to aid the effective and early detection of cancer. According to previous research, microarray genomic data has two main problems: high dimensionality and limited sample size. Several researchers have used various statistical and machine learning-based methods to study and assess the cancer classification challenge, but attaining the highest accuracy remains an open problem. When classifying the cancer type, leaving a large number of non-informative genes in the study might lead to skewed results and lower power, which hinders the overall accuracy of the methodology. To overcome this, dimensionality reduction is performed by t-SNE with Kullback–Leibler divergence and Shannon entropy for effective prediction. Decision trees, which are generally employed for classification, are prone to overfitting, especially when a tree is particularly deep. Hence, a Decisive random forest classifier is utilized for cancer prediction, with the resources updated, to classify them efficiently. The performance of the proposed method was compared with that of other SOTA approaches such as BERT, XLNet, RoBERTa, and BART, and the results showed that the proposed method outperformed the other methods. The proposed model predicted cancer with an accuracy of 99%, a recall of 99.8%, a precision of 99%, and an F1-score of 99.6%. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

7. When to Use Standardization and Normalization: Empirical Evidence From Machine Learning Models and XAI

Author: Khaled Mahmud Sujon, Rohayanti Binti Hassan, Zeba Tusnia Towshi, Manal A. Othman, Md Abdus Samad, and Kwonhue Choi
Subjects: Standardization, normalization, feature scaling, data preprocessing, machine learning, explainable AI (XAI), Electrical engineering. Electronics. Nuclear engineering, TK1-9971
Abstract: Optimizing machine learning (ML) model performance relies heavily on appropriate data preprocessing techniques. Despite the widespread use of standardization and normalization, empirical comparisons across different models, dataset sizes, and domains remain sparse. This study bridges this gap by evaluating five machine learning algorithms, namely Support Vector Machine (SVM), Logistic Regression (LR), Random Forest (RF), Extreme Gradient Boosting (XGBoost), and Adaptive Boosting (AdaBoost), on datasets of varying sizes from the business, health, and agriculture domains. This study assessed the models without scaling, with standardized data, and with normalized data. The comparative analysis reveals that while standardization consistently improves the performance of linear models like SVM and LR for large and medium datasets, normalization enhances the performance of linear models for small datasets. Moreover, this study employs SHapley Additive exPlanations (SHAP) summary plots to interpret how each feature contributes to the model's output with unscaled and scaled datasets. This study provides practical guidelines for selecting appropriate scaling techniques based on the characteristics of datasets and compatibility with various algorithms. Finally, this investigation lays a foundation for data preprocessing and feature engineering across diverse models and domains, offering actionable insights for practitioners.
Published: 2024
Full Text: View/download PDF
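Record 7 contrasts standardization with normalization. As a minimal illustration (not the authors' code), the two transforms can be sketched in plain Python. The toy values are hypothetical, and the sketch uses the population standard deviation, the same convention scikit-learn's `StandardScaler` documents.

```python
from statistics import mean, pstdev


def standardize(column):
    """Z-score standardization: (x - mean) / std, using the population std."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        # A constant feature has no spread; map it to 0 instead of dividing by 0.
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


def min_max_normalize(column, low=0.0, high=1.0):
    """Min-max normalization: rescale linearly so min -> low and max -> high."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [low for _ in column]
    return [low + (x - lo) * (high - low) / (hi - lo) for x in column]


if __name__ == "__main__":
    incomes = [20_000, 35_000, 50_000, 120_000]  # hypothetical toy feature
    print(standardize(incomes))        # mean 0, population std 1
    print(min_max_normalize(incomes))  # -> [0.0, 0.15, 0.3, 1.0]
```

In the toy data the single large value pins the top of the min-max range, compressing the other three values into [0, 0.3], while standardization leaves them on an unbounded scale; this sensitivity to outliers is one reason the better choice can depend on the dataset.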

8. The effect of feature normalization methods in radiomics

Author: Aydin Demircioğlu
Subjects: Feature normalization, Feature scaling, Feature selection, Radiomics, High-dimensional datasets, Medical physics. Medical radiology. Nuclear medicine, R895-920
Abstract: Objectives: In radiomics, different feature normalization methods, such as z-Score or Min–Max, are currently utilized, but their specific impact on the model is unclear. We aimed to measure their effect on the predictive performance and the feature selection. Methods: We employed fifteen publicly available radiomics datasets to compare seven normalization methods. Using four feature selection and classifier methods, we used cross-validation to measure the area under the curve (AUC) of the resulting models, the agreement of selected features, and the model calibration. In addition, we assessed whether normalization before cross-validation introduces bias. Results: On average, the difference between the normalization methods was relatively small, with a gain of at most +0.012 in AUC when comparing the z-Score (mean AUC: 0.707 ± 0.102) to no normalization (mean AUC: 0.719 ± 0.107). However, on some datasets, the difference reached +0.051. The z-Score performed best, while the tanh transformation showed the worst performance and even decreased the overall predictive performance. While quantile transformation performed, on average, slightly worse than the z-Score, it outperformed all other methods on one out of three datasets. The agreement between the features selected by different normalization methods was only mild, reaching at most 62%. Applying the normalization before cross-validation did not introduce significant bias. Conclusion: The choice of the feature normalization method influenced the predictive performance but depended strongly on the dataset. It strongly impacted the set of selected features. Critical relevance statement: Feature normalization plays a crucial role in the preprocessing and influences the predictive performance and the selected features, complicating feature interpretation. Key points:
• The impact of feature normalization methods on radiomic models was measured.
• Normalization methods performed similarly on average, but differed more strongly on some datasets.
• Different methods led to different sets of selected features, impeding feature interpretation.
• Model calibration was not largely affected by the normalization method.
Published: 2024
Full Text: View/download PDF
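Record 8 reports that applying normalization before cross-validation did not introduce significant bias on its datasets. The conservative default is still to fit the scaling statistics on each training fold only and reuse them on the held-out fold. The plain-Python sketch below shows that fold-wise variant under simplifying assumptions (z-score scaling, contiguous unshuffled folds, no classifier); all helper names are hypothetical.

```python
from statistics import mean, pstdev


def fit_zscore(train_rows):
    """Estimate per-feature mean and std from the training rows only."""
    params = []
    for col in zip(*train_rows):
        sigma = pstdev(col)
        # Constant features get sigma = 1, so they are centered but not divided by 0.
        params.append((mean(col), sigma if sigma > 0 else 1.0))
    return params


def apply_zscore(rows, params):
    """Scale rows with previously fitted (mean, std) pairs."""
    return [[(x - mu) / sd for x, (mu, sd) in zip(row, params)] for row in rows]


def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) for k contiguous folds (no shuffling, a simplification)."""
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        test_idx = list(range(start, start + size))
        train_idx = [j for j in range(n) if j < start or j >= start + size]
        yield train_idx, test_idx
        start += size


def scaled_folds(rows, k):
    """Yield (scaled_train, scaled_test) with statistics fitted on the training fold only."""
    for train_idx, test_idx in k_fold_indices(len(rows), k):
        params = fit_zscore([rows[i] for i in train_idx])
        yield (
            apply_zscore([rows[i] for i in train_idx], params),
            apply_zscore([rows[i] for i in test_idx], params),
        )


if __name__ == "__main__":
    data = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [5.0, 50.0], [6.0, 60.0]]
    for fold, (train, test) in enumerate(scaled_folds(data, 3)):
        print(fold, test)  # held-out rows scaled with training-fold statistics
```

Normalizing the full dataset once would instead let the held-out rows influence the mean and standard deviation; the abstract's finding suggests this effect was small for its datasets, but the fold-wise form avoids the question entirely.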

9. CECS-CLIP: Fusing Domain Knowledge for Rare Wildlife Detection Model

Author: Feng Yang, Chunying Hu, Aokang Liang, Sheng Wang, Yun Su, and Fu Xu
Subjects: rare wildlife detection, multimodal learning, concept enhancement, feature scaling, Veterinary medicine, SF600-1100, Zoology, QL1-991
Abstract: Accurate and efficient wildlife monitoring is essential for conservation efforts. Traditional image-based methods often struggle to detect small, occluded, or camouflaged animals due to the challenges posed by complex natural environments. To overcome these limitations, an innovative multimodal target detection framework is proposed in this study, which integrates textual information from an animal knowledge base as supplementary features to enhance detection performance. First, a concept enhancement module was developed, employing a cross-attention mechanism to fuse features based on the correlation between textual and image features, thereby obtaining enhanced image features. Secondly, a feature normalization module was developed, amplifying cosine similarity and introducing learnable parameters to continuously weight and transform image features, further enhancing their expressive power in the feature space. Rigorous experimental validation on a specialized dataset provided by the research team at Northwest A&F University demonstrates that our multimodal model achieved a 0.3% improvement in precision over single-modal methods. Compared to existing multimodal target detection algorithms, this model achieved at least a 25% improvement in AP and excelled in detecting small targets of certain species, significantly surpassing existing multimodal target detection model benchmarks. This study offers a multimodal target detection model integrating textual and image information for the conservation of rare and endangered wildlife, providing strong evidence and new perspectives for research in this field.
Published: 2024
Full Text: View/download PDF

10. Optimizing epileptic seizure recognition performance with feature scaling and dropout layers.

Author: Omar, Ahmed and Abd El-Hafeez, Tarek
Subjects: *DEEP learning, *EPILEPSY, *CONVOLUTIONAL neural networks, *FEATURE selection, *PRINCIPAL components analysis, *RECOGNITION (Psychology)
Abstract: Epilepsy is a widespread neurological disorder characterized by recurring seizures that have a significant impact on individuals' lives. Accurately recognizing epileptic seizures is crucial for proper diagnosis and treatment. Deep learning models have shown promise in improving seizure recognition accuracy. However, optimizing their performance for this task remains challenging. This study presents a new approach to optimize epileptic seizure recognition using deep learning models. The study employed a dataset of Electroencephalography (EEG) recordings from multiple subjects and trained nine deep learning architectures with different preprocessing techniques. By combining a 1D convolutional neural network (Conv1D) with a Long Short-Term Memory (LSTM) network, we developed the Conv1D + LSTM architecture. This architecture, augmented with dropout layers, achieved an effective test accuracy of 0.993. The LSTM architecture alone achieved a slightly lower accuracy of 0.986. Additionally, the Bidirectional LSTM (BiLSTM) and Gated Recurrent Unit (GRU) architectures performed exceptionally well, with accuracies of 0.983 and 0.984, respectively. Notably, standard scaling proved to be advantageous, significantly improving the accuracy of both BiLSTM and GRU compared to MinMax scaling. These models consistently achieved high test accuracies across different percentages of Principal Component Analysis (PCA), with the best results obtained when retaining 50% and 90% of the features. Chi-square feature selection also enhanced the classification performance of BiLSTM and GRU models. The study reveals that different deep learning architectures respond differently to feature scaling, PCA, and feature selection methods. Understanding these nuances can lead to optimized models for epileptic seizure recognition, ultimately improving patient outcomes and quality of life. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

11. The effect of feature normalization methods in radiomics.

Author: Demircioğlu, Aydin
Subjects: *RADIOMICS, *FEATURE selection
Abstract: Objectives: In radiomics, different feature normalization methods, such as z-Score or Min–Max, are currently utilized, but their specific impact on the model is unclear. We aimed to measure their effect on the predictive performance and the feature selection. Methods: We employed fifteen publicly available radiomics datasets to compare seven normalization methods. Using four feature selection and classifier methods, we used cross-validation to measure the area under the curve (AUC) of the resulting models, the agreement of selected features, and the model calibration. In addition, we assessed whether normalization before cross-validation introduces bias. Results: On average, the difference between the normalization methods was relatively small, with a gain of at most +0.012 in AUC when comparing the z-Score (mean AUC: 0.707 ± 0.102) to no normalization (mean AUC: 0.719 ± 0.107). However, on some datasets, the difference reached +0.051. The z-Score performed best, while the tanh transformation showed the worst performance and even decreased the overall predictive performance. While quantile transformation performed, on average, slightly worse than the z-Score, it outperformed all other methods on one out of three datasets. The agreement between the features selected by different normalization methods was only mild, reaching at most 62%. Applying the normalization before cross-validation did not introduce significant bias. Conclusion: The choice of the feature normalization method influenced the predictive performance but depended strongly on the dataset. It strongly impacted the set of selected features. Critical relevance statement: Feature normalization plays a crucial role in the preprocessing and influences the predictive performance and the selected features, complicating feature interpretation. Key points:
• The impact of feature normalization methods on radiomic models was measured.
• Normalization methods performed similarly on average, but differed more strongly on some datasets.
• Different methods led to different sets of selected features, impeding feature interpretation.
• Model calibration was not largely affected by the normalization method. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

12. Fault Diagnosis of Electric Drives Using Ensemble Machine Learning Techniques

Author: Paul, Shashank, Chaudhary, Abhishek, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Swaroop, Abhishek, editor, Polkowski, Zdzislaw, editor, Correia, Sérgio Duarte, editor, and Virdee, Bal, editor
Published: 2023
Full Text: View/download PDF

13. Performance of Machine Learning Models on Crime Data

Author: Bhardwaj, Geetika, Bawa, R. K., Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Tanwar, Sudeep, editor, Wierzchon, Slawomir T., editor, Singh, Pradeep Kumar, editor, Ganzha, Maria, editor, and Epiphaniou, Gregory, editor
Published: 2023
Full Text: View/download PDF

14. Numerical Feature Selection and Hyperbolic Tangent Feature Scaling in Machine Learning-Based Detection of Anomalies in the Computer Network Behavior.

Author: Protić, Danijela, Stanković, Miomir, Prodanović, Radomir, Vulić, Ivan, Stojanović, Goran M., Simić, Mitar, Ostojić, Gordana, and Stankovski, Stevan
Subjects: FEATURE selection, MACHINE learning, COMPUTER networks, NETWORK PC (Computer), FEEDFORWARD neural networks, ANOMALY detection (Computer security), SUPERVISED learning
Abstract: Anomaly-based intrusion detection systems identify computer network behavior that deviates from the statistical model of typical network behavior. Binary classifiers based on supervised machine learning are very accurate at classifying network data into two categories: normal traffic and anomalous activity. Most problems with supervised learning are related to the large amount of data required to train the classifiers. Feature selection can be used to reduce datasets. The goal of feature selection is to select a subset of relevant input features to optimize the evaluation and improve the performance of a given classifier. Feature scaling maps all features to the same range, preventing features with large magnitudes from dominating classification models or other features. The most commonly used supervised machine learning models, including decision trees, support vector machine, k-nearest neighbors, weighted k-nearest neighbors, and feedforward neural network, can all be improved by using feature selection and feature scaling. This paper introduces a new feature scaling technique based on a hyperbolic tangent function and the damping strategy of the Levenberg–Marquardt algorithm. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF
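Record 14 proposes a feature scaling technique built on the hyperbolic tangent. Its Levenberg–Marquardt damping strategy is not described in the abstract, so the sketch below shows only the classic tanh-estimator scaling, 0.5 * (tanh(0.01 * (x - mu) / sigma) + 1), as background. It is an assumption-laden stand-in, not the paper's method: the plain mean and population standard deviation replace the robust (Hampel) estimates of the original tanh-estimators, and the `spread` parameter name is hypothetical.

```python
import math
from statistics import mean, pstdev


def tanh_scale(column, spread=0.01):
    """Classic tanh-estimator scaling: 0.5 * (tanh(spread * (x - mu) / sigma) + 1).

    Simplification: mu and sigma are the plain mean and population std here,
    not robust estimates. Results lie between 0 and 1, with the mean mapped to 0.5.
    """
    mu = mean(column)
    sigma = pstdev(column) or 1.0  # guard against a constant column
    return [0.5 * (math.tanh(spread * (x - mu) / sigma) + 1.0) for x in column]


if __name__ == "__main__":
    scores = [1.0, 2.0, 3.0, 4.0, 5.0]     # hypothetical feature values
    print(tanh_scale(scores))              # values cluster tightly around 0.5
    print(tanh_scale(scores, spread=1.0))  # larger spread uses more of (0, 1)
```

Because tanh saturates, extreme values are squashed rather than allowed to dominate, which is the usual motivation for tanh-based scaling; with the conventional spread of 0.01 most values land very close to 0.5, so the constant is often treated as tunable.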

15. Diabetes disease prediction using firefly optimization-based cat-boost classifier in big data analytics.

Author: Geo Jenefer, G. and Deepa, A.J.
Subjects: *FIREFLIES, *ETIOLOGY of diabetes, *BIG data, *DIABETES, *FELIDAE, *ELECTRONIC data processing, *MACHINE learning, *DISEASE progression
Abstract: Globally, diabetes directly causes 1.5 million fatalities each year. It is necessary to predict such diseases at an earlier stage and cure them. Since modern healthcare data comprises huge amounts of information, it is difficult to process such data in conventional databases. Previously, various machine learning (ML) algorithms were used to predict diabetes, and their performance was evaluated. However, those existing algorithms still result in poor accuracy and performance. This work proposes a FOCB (Firefly Optimization-based CatBoost) classifier for predicting diabetes. The PIMA Indian diabetes dataset was taken as the input dataset. The proposed FOCB algorithm was compared with various machine learning algorithms, and the results show that the FOCB classifier gives the best accuracy, 96%, with improved performance. The proposed system was also compared with other FO-based machine learning algorithms such as NB, KNN, RF, AB, GB, XGB, CNN, DBN, and CB, and CB based on FO was shown to produce better accuracy with less Hamming loss. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

16. Multi-Encoder Context Aggregation Network for Structured and Unstructured Urban Street Scene Analysis

Author: Tanmay Singha, Duc-Son Pham, and Aneesh Krishna
Subjects: Semantic segmentation, feature scaling, feature aggregation, deep learning, scene understanding, convolutional neural networks, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
Abstract: Developing computationally efficient semantic segmentation models that are suitable for resource-constrained mobile devices is an open challenge in computer vision research. To address this challenge, we propose a novel real-time semantic scene segmentation model called Multi-encoder Context Aggregation Network (MCANet), which offers the best combination of low model complexity and state-of-the-art (SOTA) performance on benchmark datasets. While we follow the multi-encoder approach, our novelty lies in the varying number of scales to capture both global context and local details effectively. We introduce suitable lateral connections between sub-encoders for improved feature refinement. We also optimize the backbone by exploiting the residual block of MobileNet for resource-constrained applications. On the decoder side, the proposed model includes a new Local and Global Context Aggregation (LGCA) module that significantly enhances semantic details in the segmentation output. Finally, we use several known efficient convolution techniques for the classification module to make the model more computationally efficient. We provide a comprehensive evaluation of MCANet on multiple datasets containing structured and unstructured urban street scenes. Among the existing real-time models with less than 3 million parameters, the proposed model is more competitive as it achieves the SOTA performance without ImageNet pre-trained weights on both structured and unstructured environments while being more compact for resource-constrained applications.
Published: 2023
Full Text: View/download PDF

17. A Machine Learning Framework for Early-Stage Detection of Autism Spectrum Disorders

Author: S. M. Mahedy Hasan, Md Palash Uddin, Md Al Mamun, Muhammad Imran Sharif, Anwaar Ulhaq, and Govind Krishnamoorthy
Subjects: Autism spectrum disorder, machine learning, classification, feature scaling, feature selection technique, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
Abstract: Autism Spectrum Disorder (ASD) is a type of neurodevelopmental disorder that affects the everyday life of affected patients. Though it is considered hard to completely eradicate this disease, disease severity can be mitigated by taking early interventions. In this paper, we propose an effective framework for the evaluation of various Machine Learning (ML) techniques for the early detection of ASD. The proposed framework employs four different Feature Scaling (FS) strategies, i.e., Quantile Transformer (QT), Power Transformer (PT), Normalizer, and Max Abs Scaler (MAS). Then, the feature-scaled datasets are classified through eight simple but effective ML algorithms: Ada Boost (AB), Random Forest (RF), Decision Tree (DT), K-Nearest Neighbors (KNN), Gaussian Naïve Bayes (GNB), Logistic Regression (LR), Support Vector Machine (SVM), and Linear Discriminant Analysis (LDA). Our experiments are performed on four standard ASD datasets (Toddlers, Adolescents, Children, and Adults). By comparing the classification outcomes using various statistical evaluation measures (Accuracy, Receiver Operating Characteristic: ROC curve, F1-score, Precision, Recall, Matthews Correlation Coefficient: MCC, Kappa score, and Log loss), the best-performing classification methods and the best FS techniques for each ASD dataset are identified. After analyzing the experimental outcomes of different classifiers on feature-scaled ASD datasets, it is found that AB predicted ASD with the highest accuracy of 99.25% and 97.95% for Toddlers and Children, respectively, and LDA predicted ASD with the highest accuracy of 97.12% and 99.03% for the Adolescents and Adults datasets, respectively. These highest accuracies are achieved when scaling Toddlers and Children with the Normalizer FS method and Adolescents and Adults with the QT FS method.
Afterward, the ASD risk factors are calculated, and the most important attributes are ranked according to their importance values using four different Feature Selection Techniques (FSTs), i.e., Info Gain Attribute Evaluator (IGAE), Gain Ratio Attribute Evaluator (GRAE), ReliefF Attribute Evaluator (RFAE), and Correlation Attribute Evaluator (CAE). These detailed experimental evaluations indicate that proper fine-tuning of the ML methods can play an essential role in predicting ASD in people of different ages. We argue that the detailed feature importance analysis in this paper will guide the decision-making of healthcare practitioners while screening ASD cases. The proposed framework has achieved promising results compared to existing approaches for the early detection of ASD.
Published: 2023
Full Text: View/download PDF
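Record 17 mixes feature-wise and sample-wise scalers. In scikit-learn, `MaxAbsScaler` divides each feature (column) by its maximum absolute value, while `Normalizer` rescales each sample (row) to unit norm, L2 by default. The plain-Python sketch below mirrors those two behaviours on hypothetical data; it is an illustration, not the library implementation, and the function names are hypothetical.

```python
import math


def max_abs_scale(rows):
    """Column-wise: divide each feature by its maximum absolute value -> [-1, 1]."""
    n_features = len(rows[0])
    # An all-zero column has max 0; fall back to 1.0 so the zeros stay unchanged.
    max_abs = [max(abs(row[j]) for row in rows) or 1.0 for j in range(n_features)]
    return [[x / m for x, m in zip(row, max_abs)] for row in rows]


def l2_normalize_rows(rows):
    """Row-wise: rescale each sample to unit Euclidean (L2) norm."""
    out = []
    for row in rows:
        norm = math.sqrt(sum(x * x for x in row))
        # An all-zero sample cannot be normalized; keep it as-is.
        out.append([x / norm for x in row] if norm > 0 else list(row))
    return out


if __name__ == "__main__":
    data = [[1.0, -4.0], [2.0, 2.0], [-3.0, 0.0]]  # hypothetical samples x features
    print(max_abs_scale(data))      # each column now peaks at magnitude 1
    print(l2_normalize_rows(data))  # each row now has length 1
```

Because `Normalizer` works per row, it changes each sample's values relative to its own magnitude rather than aligning features across samples, which is a different assumption from the column-wise scalers; `MaxAbsScaler` also leaves zeros at zero, so it preserves sparsity.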

18. Ensemble Model Discovery for Prognostication of Diabetes

Author: Bahore, Pranjal, Paliwal, Shreyansh, Rautela, Dipanshu, Chaurasiya, Rahul, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Sharma, Harish, editor, Shrivastava, Vivek, editor, Kumari Bharti, Kusum, editor, and Wang, Lipo, editor
Published: 2022
Full Text: View/download PDF

19. OXGBoost: An Optimized eXtreme Gradient Boosting Algorithm for Classification of Breast Cancer

Author: Kumar, Pullela SVVSR, Neti, Praveen, Kumar, Dirisala J. Nagendra, Murthy, G. S. N., Lalitha, R. V. S., Kalyan Ram, Mylavarapu, Angrisani, Leopoldo, Series Editor, Arteaga, Marco, Series Editor, Panigrahi, Bijaya Ketan, Series Editor, Chakraborty, Samarjit, Series Editor, Chen, Jiming, Series Editor, Chen, Shanben, Series Editor, Chen, Tan Kay, Series Editor, Dillmann, Rüdiger, Series Editor, Duan, Haibin, Series Editor, Ferrari, Gianluigi, Series Editor, Ferre, Manuel, Series Editor, Hirche, Sandra, Series Editor, Jabbari, Faryar, Series Editor, Jia, Limin, Series Editor, Kacprzyk, Janusz, Series Editor, Khamis, Alaa, Series Editor, Kroeger, Torsten, Series Editor, Li, Yong, Series Editor, Liang, Qilian, Series Editor, Martín, Ferran, Series Editor, Ming, Tan Cher, Series Editor, Minker, Wolfgang, Series Editor, Misra, Pradeep, Series Editor, Möller, Sebastian, Series Editor, Mukhopadhyay, Subhas, Series Editor, Ning, Cun-Zheng, Series Editor, Nishida, Toyoaki, Series Editor, Oneto, Luca, Series Editor, Pascucci, Federica, Series Editor, Qin, Yong, Series Editor, Seng, Gan Woon, Series Editor, Speidel, Joachim, Series Editor, Veiga, Germano, Series Editor, Wu, Haitao, Series Editor, Zamboni, Walter, Series Editor, Zhang, Junjie James, Series Editor, Gupta, Deepak, editor, Sambyo, Koj, editor, Prasad, Mukesh, editor, and Agarwal, Sonali, editor
Published: 2022
Full Text: View/download PDF

20. Improving Recommendation for Video Content Using Hyperparameter Tuning in Sparse Data Environment

Author: Gupta, Rohit Kumar, Verma, Vivek Kumar, Mundra, Ankit, Kapoor, Rohan, Mishra, Shekhar, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Nanda, Priyadarsi, editor, Verma, Vivek Kumar, editor, Srivastava, Sumit, editor, Gupta, Rohit Kumar, editor, and Mazumdar, Arka Prokash, editor
Published: 2022
Full Text: View/download PDF

21. Social Network Mining for Predicting Users’ Credibility with Optimal Feature Selection

Author: Jayashree, P., Laila, K., Santhosh Kumar, K., Udayavannan, A., Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Raj, Jennifer S., editor, Palanisamy, Ram, editor, Perikos, Isidoros, editor, and Shi, Yong, editor
Published: 2022
Full Text: View/download PDF

22. Development of Sustainability Assessment Criteria in Selection of Municipal Solid Waste Treatment Technology in Developing Countries: A Case of Ho Chi Minh City, Vietnam.

Author: Le, Phuong Giang, Le, Hung Anh, Dinh, Xuan Thang, and Nguyen, Kieu Lan Phuong
Abstract: Municipal solid waste (MSW) management is a significant problem for developing countries due to lack of sufficient infrastructure, poor management capacity, and low level of waste treatment technology. This study proposes three main groups of criteria, i.e., social, economic, and environmental, that can be used as an effective tool to assess the sustainability of MSW treatment technologies, considering Ho Chi Minh City, Vietnam as a case study. The sustainability assessment criteria consist of a list of indicators which consider potential waste treatment plants. The indicators and technologies then undergo a selection process from identifying assessment goals and key aspects to data collection and consultation of experts. The findings from the previous phase will be used to select the most preferred waste technology through AHP and normalization approaches. As a result, 12 selected indicators are as follows: investment cost, treatment cost, operation and maintenance costs, revenue/benefits, job creation, community consensus, support policy, community health, air pollution, water pollution, greenhouse gas emissions, and land quota. Among three MSW facilities selected, i.e., landfill, compost, and waste-to-energy incineration, waste-to-energy is determined as the best alternative solution for Ho Chi Minh City in a given context of approximately 70% of landfilling being applied. The selection process and indicators found can guide decision-makers and policy on selecting MSW treatment technologies in developing countries. Additionally, Ho Chi Minh City's governors benefit from finding the most appropriate waste technology. A technology adoption roadmap and its implementation plan should be thought through thoroughly to address challenges in MSW management in the city. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

23. A Comprehensive Report on Machine Learning-based Early Detection of Alzheimer's Disease using Multi-modal Neuroimaging Data.

Author: SHARMA, SHALLU and MANDAL, PRAVAT KUMAR
Subjects: *ALZHEIMER'S disease, *COMPUTER-aided diagnosis, *FEATURE selection, *BRAIN imaging, *FEATURE extraction
Abstract: Alzheimer's Disease (AD) is a devastating neurodegenerative brain disorder with no cure. An early identification helps patients with AD sustain a normal living. We have outlined machine learning (ML) methodologies with different schemes of feature extraction to synergize complementary and correlated characteristics of data acquired from multiple modalities of neuroimaging. A variety of feature selection, scaling, and fusion methodologies along with confronted challenges are elaborated for designing an ML-based AD diagnosis system. Additionally, thematic analysis has been provided to compare the ML workflow for possible diagnostic solutions. This comprehensive report adds value to the further advancement of computer-aided early diagnosis system based on multi-modal neuroimaging data from patients with AD. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

24. Enhanced Preprocessing Approach Using Ensemble Machine Learning Algorithms for Detecting Liver Disease.

Author: Md, Abdul Quadir, Kulkarni, Sanika, Joshua, Christy Jackson, Vaichole, Tejas, Mohan, Senthilkumar, and Iwendi, Celestine
Subjects: MACHINE learning, LIVER diseases, FEATURE selection, RANDOM forest algorithms, MISSING data (Statistics)
Abstract: There has been a sharp increase in liver disease globally, and many people are dying without even knowing that they have it. As a result of its limited symptoms, it is extremely difficult to detect liver disease until the very last stage. In the event of early detection, patients can begin treatment earlier, thereby saving their lives. It has become increasingly popular to use ensemble learning algorithms since they perform better than traditional machine learning algorithms. In this context, this paper proposes a novel architecture based on ensemble learning and enhanced preprocessing to predict liver disease using the Indian Liver Patient Dataset (ILPD). Six ensemble learning algorithms are applied to the ILPD, and their results are compared to those obtained with existing studies. The proposed model uses several data preprocessing methods, such as data balancing, feature scaling, and feature selection, to improve the accuracy with appropriate imputations. Multivariate imputation is applied to fill in missing values. On skewed columns, log1p transformation was applied, along with standardization, min–max scaling, maximum absolute scaling, and robust scaling techniques. The selection of features is carried out based on several methods including univariate selection, feature importance, and correlation matrix. These enhanced preprocessed data are trained on Gradient boosting, XGBoost, Bagging, Random Forest, Extra Tree, and Stacking ensemble learning algorithms. The results of the six models were compared with each other, as well as with the models used in other research works. The proposed model using extra tree classifier and random forest, outperformed the other methods with the highest testing accuracy of 91.82% and 86.06%, respectively, portraying our method as a real-world solution for detecting liver disease. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

25. Design and Analysis of Urban Land Lease Price Predicting Model Using Batch Gradient Descent Algorithm.

Author: Niguse, Kifle Berhane
Subjects: *REAL property sales & prices, *COST functions, *URBAN planning, *ALGORITHMS, *BID price
Abstract: Standard and econometric models are appropriate for causal relationships and interpretations among facets of the economy. But with prediction, they tend to over-fit samples and simplify poorly to new, undetected data. This paper presents a batch gradient algorithm for predicting the price of land with large datasets. This paper uses a batch gradient descent algorithm to minimize the cost function, J(θ), iteratively with possible combinations of θ0 and θ1, the number of iterations i = 1500, and learning rates α of 0.01, 0.02, 0.03 for the linear regression case and i = 100, α = 0.3, 0.2, and 0.1 for the multiple regression case. The paper uses Octave-4.0.3(GUI) for implementing 129 samples of the lease bid price of Mekelle City as training sets and feature inputs of two and three for linear regression and multiple regressions. Using α = 0.01, the best fitting parameters found by training the dataset are θ0 = 6.02 and θ1 = 2.30 with a cost of J = 67.82. The model predicts with an accuracy of 92.6% using LR and 91.15% using MLR for a 315 m² land size. As the learning rate increases, the fitting parameters θ0 and θ1 increase and decrease respectively with an equal cost but the model's prediction error increments slowly. With multiple regression, as the learning rate lowers, the model under fits prediction drastically (with an accuracy of 60%) with gradient descent and predicts with an accuracy of 91.5% with ordinary equations. So, prediction with ordinary equations provides the best fit for multiple regressions. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

26. Standardization and Data Augmentation in Genetic Programming.

Author: Owen, Caitlin A., Dick, Grant, and Whigham, Peter A.
Subjects: DATA augmentation, STANDARDIZATION, BENCHMARK problems (Computer science), GENETIC programming, TRAINING needs
Abstract: Genetic programming (GP) is a common method for performing symbolic regression that relies on the use of ephemeral random constants in order to adequately scale predictions. Suitable values for these constants must be drawn from appropriate, but typically unknown, distributions for the problem being modeled. While rarely used with GP, Z-score standardization of feature and response spaces often significantly improves the predictive performance of GP by removing scale issues and reducing error due to bias. However, in some cases it is also associated with erratic error due to variance. This article demonstrates that this variance component increases in the presence of gaps at the boundaries of the training data explanatory variable intervals. An initial solution to this problem is proposed that augments training data with pseudo instances located at the boundaries of the intervals. When applied to benchmark problems, particularly with small training samples, this solution reduces error due to variance and, therefore, total error. Augmentation is shown to also stabilize error in larger problems; however, results suggest that standardized GP works well on such problems with little need for training data augmentation. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

27. Extraction of Mechanistic Features

Author: Liu, Wing Kam, Gan, Zhengtao, Fleming, Mark, Liu, Wing Kam, Gan, Zhengtao, and Fleming, Mark
Published: 2021
Full Text: View/download PDF

28. A Lightweight Multi-scale Feature Fusion Network for Real-Time Semantic Segmentation

Author: Singha, Tanmay, Pham, Duc-Son, Krishna, Aneesh, Gedeon, Tom, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Mantoro, Teddy, editor, Lee, Minho, editor, Ayu, Media Anugerah, editor, Wong, Kok Wai, editor, and Hidayanto, Achmad Nizar, editor
Published: 2021
Full Text: View/download PDF

29. Practical Aspects in Machine Learning

Author: Gupta, Pramod, Sehgal, Naresh K., Gupta, Pramod, and Sehgal, Naresh K.
Published: 2021
Full Text: View/download PDF

30. Cybersecurity in Smart Cities: Detection of Opposing Decisions on Anomalies in the Computer Network Behavior.

Author: Protic, Danijela, Gaur, Loveleen, Stankovic, Miomir, and Rahman, Md Anisur
Subjects: COMPUTER networks, FEEDFORWARD neural networks, SMART cities, NETWORK PC (Computer), COMPUTER engineering, FEATURE selection
Abstract: The increased use of urban technologies in smart cities brings new challenges and issues. Cyber security has become increasingly important as many critical components of information and communication systems depend on it, including various applications and civic infrastructures that use data-driven technologies and computer networks. Intrusion detection systems monitor computer networks for malicious activity. Signature-based intrusion detection systems compare the network traffic pattern to a set of known attack signatures and cannot identify unknown attacks. Anomaly-based intrusion detection systems monitor network traffic to detect changes in network behavior and identify unknown attacks. The biggest obstacle to anomaly detection is building a statistical normality model, which is difficult because a large amount of data is required to estimate the model. Supervised machine learning-based binary classifiers are excellent tools for classifying data as normal or abnormal. Feature selection and feature scaling are performed to eliminate redundant and irrelevant data. Of the 24 features of the Kyoto 2006+ dataset, nine numerical features are considered essential for model training. Min-Max normalization in the range [0,1] and [−1,1], Z-score standardization, and new hyperbolic tangent normalization are used for scaling. A hyperbolic tangent normalization is based on the Levenberg-Marquardt damping strategy and linearization of the hyperbolic tangent function with a narrow slope gradient around zero. Due to proven classification ability, in this study we used a feedforward neural network, decision tree, support vector machine, k-nearest neighbor, and weighted k-nearest neighbor models. Overall accuracy decreased by less than 0.1 per cent, while processing time was reduced more than two-fold. The results show a clear benefit of the TH scaling regarding processing time. 
Regardless of how accurate the classifiers are, their decisions can sometimes differ. Our study describes a conflicting decision detector based on an XOR operation performed on the outputs of two classifiers, the fastest feedforward neural network, and the more accurate but slower weighted k-nearest neighbor model. The results show that up to 6% of different decisions are detected. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

31. Designing a Meta Learning Classifier for Sensor-Enabled Healthcare Applications

Author: Patikar, Srabani, Saha, Anindita, Neogy, Sarmistha, and Chowdhury, Chandreyee
Published: 2024
Full Text: View/download PDF

32. A machine learning classifier-based approach for diabetes mellitus risk prediction.

Author: B JK and Ranganathan M
Abstract: Currently, Diabetes Mellitus (DM) can be life-threatening due to the dietary habits and lifestyle choices of individuals. Diabetes is characterised by elevated levels of glucose in the blood and an excess of protein in the blood. Poor eating habits and lifestyles are largely responsible for the rise in overweight, obesity, and various related conditions. This study investigated many diabetes-related risk forecasting techniques and algorithms. The eight machine learning (ML) algorithms used the diabetes dataset to test various prediction techniques, including a Support Vector Classifier, gradient-boosting, multilayer perceptron, random forest, K-nearest neighbors, logistic regression, extreme gradient boosting, and decision tree. To enhance the diabetic prediction ability of the model, we suggested using Feature Engineering (FE) and feature scaling. For our investigation, we utilized the Mendeley dataset on diabetes to assess the capacity of the model to predict diabetes. We developed a model by using Python programming and eight classification techniques. The Random Forest with 99.21%, Gradient Boosting with 99.61%, Extreme Gradient Boosting, and Decision Tree achieved the highest F1 score (99.81%), accuracy rate (99.80%), precision (99.81%), and recall (99.81%) of all classification approaches., (© 2024 IOP Publishing Ltd. All rights, including for text and data mining, AI training, and similar technologies, are reserved.)
Published: 2024
Full Text: View/download PDF

33. The Impact of Different Feature Scaling Methods on Intrusion Detection for in-Vehicle Controller Area Network (CAN)

Author: Lokman, Siti-Farhana, Othman, Abu Talib, Bakar, Muhamad Husaini Abu, Musa, Shahrulniza, Barbosa, Simone Diniz Junqueira, Editorial Board Member, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Kotenko, Igor, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Anbar, Mohammed, editor, Abdullah, Nibras, editor, and Manickam, Selvakumar, editor
Published: 2020
Full Text: View/download PDF

34. Photoplethysmographic waveform detection for determining hatching egg activity via deep neural network.

Author: Geng, Lei, Guo, Quan, Xiao, Zhitao, Tong, Jun, and Li, Yuelong
Abstract: It is essential to classify dead embryos and live embryos accurately in developing a successful vaccine. The deep learning-based classification of heartbeat signals to determine embryo activity is considered to be the most effective, but generally speaking, existing detection methods are either harmful to embryos or inefficient. The photoplethysmographic (PPG) waveform was used in this study for embryo activity detection. The PPG technique is non-invasive and works based on detection of optical absorption intensity in the blood. We rescaled the original data to weight each feature equally, which allows the CNN model to treat every feature in the data equally without neglecting low-intensity features. We also constructed a novel detection model capable of powerful feature extraction. Our model is based on the CNN structure and GRU. The CNN structure is the basic feature extractor. We added a channel attention mechanism to recalibrate the feature map channel, which enhances the network's ability to extract useful features. The GRU module captures timing characteristics to compensate for the inability of the CNN to extract temporal information. We validated our approach on experimental data to find that it outperforms several baseline methods. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

35. Permission-Based Feature Scaling Method for Lightweight Android Malware Detection

Author: Zhu, Dali, Xi, Tong, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Douligeris, Christos, editor, Karagiannis, Dimitris, editor, and Apostolou, Dimitris, editor
Published: 2019
Full Text: View/download PDF

36. Software and Hardware Design Specifications for Quantifying Carbohydrate Contents in Food

Author: Ishaya GAMBO, Oluwasegun TALABI, Karen OLUFOKUNBI, and Rhodes MASSENON
Subjects: spectrometry, near-infrared radiation, carbohydrate, machine learning, multiple linear regression, feature scaling, diabetes mellitus, Mathematics, QA1-939, Electronic computers. Computer science, QA75.5-76.95
Abstract: Analysis and detection of carbohydrate contents in food have been achieved in the past using different methods. However, these methods require pre-treatment and chemical reaction, which tamper with the sample composition, thereby making it unhealthy for consumption afterwards. A non-destructive method that will preserve the composition of the food sample, while giving an estimated result, is crucial. In this paper, a spectrometer was designed and built that can detect light spectrum penetrating through food samples. The goal is to assist diabetic patients and the general public to identify the amount of carbohydrate in their food for proper medical ration. The paper adopted the experimental research approach. An optical approach was employed using Near-Infrared LED to design a spectrometer that scans through food samples to obtain a spectrum using the TCD1304CD image sensor. The output was extracted using Arduino Uno Microcontroller which runs through a supervised machine learning algorithm, first to train the algorithm, and later to make the prediction using the algorithm. We trained the machine learning model using standard food samples with known composition. The system was tested using standard food samples which are not in the training dataset. The test result obtained was evaluated in comparison to the nutritional data of the testing samples. More so, it shows an average deviation of 46.87% of the expected values provided in the nutritional data of the sample.
Published: 2020
Full Text: View/download PDF

37. Enhanced Preprocessing Approach Using Ensemble Machine Learning Algorithms for Detecting Liver Disease

Author: Abdul Quadir Md, Sanika Kulkarni, Christy Jackson Joshua, Tejas Vaichole, Senthilkumar Mohan, and Celestine Iwendi
Subjects: liver disease, machine learning, multivariate imputation, feature scaling, ensemble learning, gradient boosting, Biology (General), QH301-705.5
Abstract: There has been a sharp increase in liver disease globally, and many people are dying without even knowing that they have it. As a result of its limited symptoms, it is extremely difficult to detect liver disease until the very last stage. In the event of early detection, patients can begin treatment earlier, thereby saving their lives. It has become increasingly popular to use ensemble learning algorithms since they perform better than traditional machine learning algorithms. In this context, this paper proposes a novel architecture based on ensemble learning and enhanced preprocessing to predict liver disease using the Indian Liver Patient Dataset (ILPD). Six ensemble learning algorithms are applied to the ILPD, and their results are compared to those obtained with existing studies. The proposed model uses several data preprocessing methods, such as data balancing, feature scaling, and feature selection, to improve the accuracy with appropriate imputations. Multivariate imputation is applied to fill in missing values. On skewed columns, log1p transformation was applied, along with standardization, min–max scaling, maximum absolute scaling, and robust scaling techniques. The selection of features is carried out based on several methods including univariate selection, feature importance, and correlation matrix. These enhanced preprocessed data are trained on Gradient boosting, XGBoost, Bagging, Random Forest, Extra Tree, and Stacking ensemble learning algorithms. The results of the six models were compared with each other, as well as with the models used in other research works. The proposed model using extra tree classifier and random forest, outperformed the other methods with the highest testing accuracy of 91.82% and 86.06%, respectively, portraying our method as a real-world solution for detecting liver disease.
Published: 2023
Full Text: View/download PDF

38. A Fuzzy Logic Approach to Predict the Popularity of a Presidential Candidate

Author: Mazumder, Pritom, Chowdhury, Navid Anjum, Anwar-Ul-Azim Bhuiya, Moh., Akash, Shabbir Haque, Rahman, Rashedur M., Kacprzyk, Janusz, Series Editor, Sieminski, Andrzej, editor, Kozierkiewicz, Adrianna, editor, Nunez, Manuel, editor, and Ha, Quang Thuy, editor
Published: 2018
Full Text: View/download PDF

39. A Neuronal Morphology Classification Approach Based on Deep Residual Neural Networks

Author: Lin, Xianghong, Zheng, Jianyang, Wang, Xiangwen, Ma, Huifang, Hutchison, David, Series Editor, Kanade, Takeo, Series Editor, Kittler, Josef, Series Editor, Kleinberg, Jon M., Series Editor, Mattern, Friedemann, Series Editor, Mitchell, John C., Series Editor, Naor, Moni, Series Editor, Pandu Rangan, C., Series Editor, Steffen, Bernhard, Series Editor, Terzopoulos, Demetri, Series Editor, Tygar, Doug, Series Editor, Weikum, Gerhard, Series Editor, Cheng, Long, editor, Leung, Andrew Chi Sing, editor, and Ozawa, Seiichi, editor
Published: 2018
Full Text: View/download PDF

40. Predicting the Crystal Structure and Lattice Parameters of the Perovskite Materials via Different Machine Learning Models Based on Basic Atom Properties

Author: Sams Jarin, Yufan Yuan, Mingxing Zhang, Mingwei Hu, Masud Rana, Sen Wang, and Ruth Knibbe
Subjects: machine learning (ML), perovskites, crystal structures, lattice parameters, feature scaling, feature correlations, Crystallography, QD901-999
Abstract: Perovskite materials have high potential for the renewable energy sources such as solar PV cells, fuel cells, etc. Different structural distortions such as crystal structure and lattice parameters have a critical impact on the determination of the perovskite’s structure strength, stability, and overall performance of the materials in the applications. To improve the perovskite performance and accelerate the prediction of different structural distortions, few ML models have been established to predict the type of crystal structures and their lattice parameters using the basic atom characteristics of the perovskite materials. In this work, different ML models such as random forest (RF), support vector machine (SVM), neural network (NN), and genetic algorithm (GA) supported neural network (GA-NN) have been established, whereas support vector regression (SVR) and genetic algorithm-supported support vector regression (GA-SVR) models have been assessed for the prediction of the lattice parameters. The prediction model accuracy for the crystal structure classification is almost 88% in average for GA-NN whereas for the lattice constants regression model GA-SVR model gives ~95% in average which can be further improved by accumulating more robust datasets into the database. These ML models can be used as an alternative process to accelerate the development of finding out new perovskite material by providing valuable insight for the behaviours of the perovskite materials.
Published: 2022
Full Text: View/download PDF

41. Design and Analysis of Urban Land Lease Price Predicting Model Using Batch Gradient Descent Algorithm

Author: Berhane Niguse, Kifle
Subjects: Batch Gradient Descent Algorithm, Cost Function, Feature Scaling, Learning Rate, Machine Learning, Regression, General Medicine
Abstract: Standard and econometric models are appropriate for causal relationships and interpretations among facets of the economy. But with prediction, they tend to over-fit samples and simplify poorly to new, undetected data. This paper presents a batch gradient algorithm for predicting the price of land with large datasets. This paper uses a batch gradient descent algorithm to minimize the cost function, J(θ), iteratively with possible combinations of θ0 and θ1, the number of iterations i = 1500, and learning rates α of 0.01, 0.02, 0.03 for the linear regression case and i = 100, α = 0.3, 0.2, and 0.1 for the multiple regression case. The paper uses Octave-4.0.3(GUI) for implementing 129 samples of the lease bid price of Mekelle City as training sets and feature inputs of two and three for linear regression and multiple regressions. Using α = 0.01, the best fitting parameters found by training the dataset are θ0 = 6.02 and θ1 = 2.30 with a cost of J = 67.82. The model predicts with an accuracy of 92.6% using LR and 91.15% using MLR for a 315 m² land size. As the learning rate increases, the fitting parameters θ0 and θ1 increase and decrease respectively with an equal cost but the model’s prediction error increments slowly. With multiple regression, as the learning rate lowers, the model under fits prediction drastically (with an accuracy of 60%) with gradient descent and predicts with an accuracy of 91.5% with ordinary equations. So, prediction with ordinary equations provides the best fit for multiple regressions.
Published: 2023
Full Text: View/download PDF

42. Common Spatial Pattern with Feature Scaling (FSc-CSP) for Motor Imagery Classification

Author: Prathama, Yohanes de Britto Hertyasta, Shapiai, Mohd Ibrahim, Aris, Siti Armiza Mohd, Ibrahim, Zuwairie, Jaafar, Jafreezal, Fauzi, Hilman, Barbosa, Simone Diniz Junqueira, Series editor, Chen, Phoebe, Series editor, Filipe, Joaquim, Series editor, Kotenko, Igor, Series editor, Sivalingam, Krishna M., Series editor, Washio, Takashi, Series editor, Yuan, Junsong, Series editor, Zhou, Lizhu, Series editor, Mohamed Ali, Mohamed Sultan, editor, Wahid, Herman, editor, Mohd Subha, Nurul Adilla, editor, Sahlan, Shafishuhaza, editor, Md. Yunus, Mohd Amri, editor, and Wahap, Ahmad Ridhwan, editor
Published: 2017
Full Text: View/download PDF

43. Toward Alzheimer's disease classification through machine learning.

Author: Rohini, M. and Surendran, D.
Subjects: *ALZHEIMER'S disease, *NOSOLOGY, *MACHINE learning, *AGE factors in cognition, *SUPERVISED learning, *SYMPTOMS, AGE factors in Alzheimer's disease
Abstract: Alzheimer's disease (AD) and cognitive impairment due to aging are the recently prevailing diseases among aged inhabitants due to an increase in the aging population. Several demographic characters, structural and functional neuroimaging investigations, cardio-vascular studies, neuropsychiatric symptoms, cognitive performances, and biomarkers in cerebrospinal fluids are the various predictors for AD. These input features can be considered for the prediction of symptoms whether they belong to AD or normal cognitive impairment due to aging. In the proposed study, the hypothesis is derived for supervised learning methods such as multivariate linear regression, logistic regression, and SVM. Feature scaling and normalization are performed with features as initial steps for applying the parameters to derive the hypothesis. Performance metrics are analyzed with the implementation results. The present work is applied to 1000 baseline assessment data from Alzheimer's Disease Neuroimaging Initiative (ADNI) studies that give conversion prediction. The comparison of results in the literature suggests that the efficiency of the proposed study is highly advantageous in differentiating AD pathology from cognitive impairment due to aging. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

44. Techniques to Improve the Performance of Supervised Learning Models.

Author: Sethi, Manoj, Kumar, Lalit, Sharma, Shashi Bhushan, and Vikas
Subjects: SUPERVISED learning, MACHINE learning, CATEGORIES (Mathematics), FEATURE extraction, KEY performance indicators (Management), DATA extraction
Abstract: Supervised Learning can be defined as training a model with the data which includes the result itself. Many supervised learning algorithms have been found so far. There are a great number of supervised learning models. Each model performs differently and has its own merits and demerits. Many data preprocessing techniques have been found and hence the combination of various data preprocessing techniques can increase the performance of the present supervised learning models. Raw data contain a lot of noise so it cannot be fed to the learning models directly. It needs to be preprocessed using various data preprocessing techniques. The proposed work compares different combinations of various data preprocessing techniques. Comparison is done using various performance metrics and the combination of different data preprocessing is applied to different models. The work comprises categorical data handling, missing value treatment, feature scaling and feature extraction as the data preprocessing steps. The comparison gives an idea of which technique is better for which type of models. The California census 1990 data has been used for the study. [ABSTRACT FROM AUTHOR]
Published: 2021

45. Automatic feature scaling and selection for support vector machine classification with functional data.

Author: Jiménez-Cordero, Asunción and Maldonado, Sebastián
Subjects: SUPPORT vector machines, FEATURE selection, KERNEL (Mathematics), CLASSIFICATION
Abstract: Functional Data Analysis (FDA) has become a very important field in recent years due to its wide range of applications. However, there are several real-life applications in which hybrid functional data appear, i.e., data with functional and static covariates. The classification of such hybrid functional data is a challenging problem that can be handled with the Support Vector Machine (SVM). Moreover, the selection of the most informative features may yield drastic improvements in the classification rates. In this paper, an embedded feature selection approach for SVM classification is proposed, in which the isotropic Gaussian kernel is modified by associating a bandwidth to each feature. The bandwidths are jointly optimized with the SVM parameters, yielding an alternating optimization approach. The effectiveness of our methodology was tested on benchmark data sets. Indeed, the proposed method achieved the best average performance when compared to 17 other feature selection and SVM classification approaches. A comprehensive sensitivity analysis of the parameters related to our proposal was also included, confirming its robustness. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF
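The central idea in this abstract, replacing the isotropic Gaussian kernel with one that has a bandwidth per feature, can be sketched as below. This assumes the common parameterization K(x, z) = exp(-Σ_j γ_j (x_j - z_j)²) with one non-negative weight γ_j per feature; the paper's exact form, its handling of functional covariates, and the alternating optimization of bandwidths with the SVM parameters are not reproduced. The function names are hypothetical.

```python
import math

# Hypothetical helper names; assumed kernel parameterization, not the authors' code.


def anisotropic_gaussian_kernel(x, z, gammas):
    """K(x, z) = exp(-sum_j gamma_j * (x_j - z_j)**2), with one weight per feature.

    gamma_j = 0 removes feature j from the kernel entirely; equal gammas
    give the ordinary isotropic Gaussian (RBF) kernel.
    """
    if not (len(x) == len(z) == len(gammas)):
        raise ValueError("x, z and gammas must have the same length")
    if any(g < 0 for g in gammas):
        raise ValueError("bandwidth weights must be non-negative")
    return math.exp(-sum(g * (a - b) ** 2 for g, a, b in zip(gammas, x, z)))


def kernel_matrix(X, gammas):
    """Gram matrix for a list of samples X under the per-feature kernel."""
    return [[anisotropic_gaussian_kernel(a, b, gammas) for b in X] for a in X]


X = [[0.0, 0.0], [1.0, 5.0]]
# Second feature switched off: only the first coordinate (distance 1) matters,
# so the off-diagonal entries equal exp(-1) despite the large gap in feature 2.
print(kernel_matrix(X, gammas=[1.0, 0.0]))
```

Driving a weight γ_j to zero makes feature j irrelevant to the kernel, which is how per-feature bandwidths can double as embedded feature selection.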

46. Multiclass spectral feature scaling method for dimensionality reduction.

Author: Matsuda, Momo, Morikuni, Keiichi, Imakura, Akira, Ye, Xiucai, and Sakurai, Tetsuya
Subjects: *MATRIX pencils, *CLASSIFICATION
Abstract: Irregular features disrupt the desired classification. In this paper, we consider aggressively modifying scales of features in the original space according to the label information to form well-separated clusters in low-dimensional space. The proposed method exploits spectral clustering to derive scaling factors that are used to modify the features. Specifically, we reformulate the Laplacian eigenproblem of the spectral clustering as an eigenproblem of a linear matrix pencil whose eigenvector has the scaling factors. Numerical experiments show that the proposed method outperforms well-established supervised dimensionality reduction methods for toy problems with more samples than features and real-world problems with more features than samples. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF

47. Software and Hardware Design Specifications for Quantifying Carbohydrate Contents in Food.

Author: GAMBO, Ishaya, TALABI, Oluwasegun, OLUFOKUNBI, Karen, and MASSENON, Rhodes
Subjects: SUPERVISED learning, SOFTWARE architecture, ALGORITHMS, FOOD composition, ARDUINO (Microcontroller)
Abstract: Analysis and detection of carbohydrate contents in food have been achieved in the past using different methods. However, these methods require pre-treatment and chemical reactions that alter the sample composition, making the sample unfit for consumption afterwards. A nondestructive method that preserves the composition of the food sample while still giving an estimated result is therefore crucial. In this paper, a spectrometer was designed and built that can detect the light spectrum penetrating through food samples. The goal is to help diabetic patients and the general public identify the amount of carbohydrate in their food so that they can ration their intake properly. The paper adopted an experimental research approach. An optical approach was employed, using a near-infrared LED and the TCD1304CD image sensor to build a spectrometer that scans food samples to obtain a spectrum. The output was extracted using an Arduino Uno microcontroller and passed through a supervised machine learning algorithm, first to train the algorithm and later to make predictions with it. We trained the machine learning model using standard food samples of known composition. The system was tested using standard food samples that were not in the training dataset, and the test results were evaluated against the nutritional data of the testing samples. The results show an average deviation of 46.87% from the expected values given in the nutritional data of the samples. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF

48. Small Object Detection in Unmanned Aerial Vehicle Images Using Feature Fusion and Scaling-Based Single Shot Detector With Spatial Context Analysis.

Author: Liang, Xi, Zhang, Jing, Zhuo, Li, Li, Yuzhao, and Tian, Qi
Subjects: *GEOGRAPHIC spatial analysis, *DETECTORS, *FORECASTING, *DECONVOLUTION (Mathematics)
Abstract: Objects in unmanned aerial vehicle (UAV) images are generally small because of the high altitude at which the photographs are taken. Although much effort has gone into object detection, detecting small objects accurately and quickly remains an open challenge. In this paper, we propose a feature fusion and scaling-based single shot detector (FS-SSD) for small object detection in UAV images. The FS-SSD is an enhancement of FSSD, a variant of the original single shot multibox detector (SSD). We add an extra scaling branch, built from a deconvolution module with an average pooling operation, to form a feature pyramid. The original feature fusion branch is adjusted to be better suited to the small object detection task. The two feature pyramids generated by the deconvolution module and the feature fusion module are used together to make predictions. In addition to the deep features learned by the FS-SSD, spatial context analysis is proposed to further improve detection accuracy by incorporating the spatial relationships between objects into object redetection. The interclass and intraclass distances between different object instances are computed as a spatial context, which proves effective for multiclass small object detection. Six experiments are conducted on the PASCAL VOC dataset and two UAV image datasets. The experimental results demonstrate that the proposed method achieves detection speed comparable to, and accuracy superior to, six state-of-the-art methods. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF

49. Auto-detection of epileptic seizure events using deep neural network with different feature scaling techniques.

Author: D.K., Thara, B.G, PremaSudha, and Xiong, Fan
Subjects: *EPILEPSY, *SEIZURES (Medicine), *FEATURE extraction, *ELECTROENCEPHALOGRAPHY, *COST functions, *MACHINE learning, *BRAIN-computer interfaces
Abstract:
• Epilepsy seizure classification using a deep neural network (DNN).
• The DNN gives good results when the dataset is feature-scaled.
• We experimented with four different feature scaling techniques.
• We achieved an accuracy of 97.56%, sensitivity of 98.17%, specificity of 94.93% and ROC of 97.55%.
• The proposed model is compared with an existing machine learning model.
Misdiagnosis of epilepsy is more common when electroencephalogram (EEG) signals are analyzed manually to detect epileptic seizure events. Automated epilepsy detection systems are therefore needed to support neurologists, helping them diagnose epilepsy accurately in less time. In this paper, we develop an automated seizure detection method using a deep neural network and a dataset collected at Bonn University, Germany. The results of the experiment are compared with an existing machine learning method; our model gives better results than the ML methods without the need for feature extraction. Normalizing the dataset with feature scaling techniques is important for obtaining good accuracy, so this experiment also examined feature scaling. We first used StandardScaler with a mean squared error loss and achieved an accuracy of 97.21%, sensitivity of 98.17%, specificity of 94.93%, F1_score of 98.48%, MCC of 91.96% and ROC of 97.55%. The experiment then compared the performance of four feature scaling techniques and four loss functions. The results showed that StandardScaler and RobustScaler are equally good and are the best of the feature scaling techniques tested. A mean squared error loss works better in combination with all of the feature scaling techniques. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF
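The two scalers this abstract reports as best can be sketched in pure Python to show how they differ. The sketch follows the usual definitions, which match scikit-learn's defaults (StandardScaler: subtract the mean and divide by the population standard deviation; RobustScaler: subtract the median and divide by the 25th-75th percentile interquartile range), rather than calling scikit-learn itself. The function names and the sample values are hypothetical; the other two techniques compared in the paper are not named in the abstract and are not shown.

```python
import statistics

# Hypothetical helper names; they re-implement the standard formulas, not the paper's code.


def standard_scale(column):
    """(x - mean) / std, using the population std (ddof=0) as StandardScaler does."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    if std == 0:
        return [0.0 for _ in column]  # constant column: every value equals the mean
    return [(x - mean) / std for x in column]


def robust_scale(column):
    """(x - median) / IQR, with linearly interpolated quartiles as RobustScaler uses."""
    median = statistics.median(column)
    q1, _, q3 = statistics.quantiles(column, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        return [x - median for x in column]  # fall back to centering only
    return [(x - median) / iqr for x in column]


values = [1, 2, 3, 4, 100]  # hypothetical feature with one outlier
print([round(v, 3) for v in standard_scale(values)])
print(robust_scale(values))  # [-1.0, -0.5, 0.0, 0.5, 48.5]
```

The outlier inflates the standard deviation to about 39, so standard scaling squeezes the four inliers into a narrow band around -0.5, whereas the median and IQR ignore the outlier and robust scaling keeps the inliers spread out.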

50. A Study on Data Profiling: Focusing on Attribute Value Quality Index.

Author: Jang, Won-Jung, Lee, Sung-Taek, Kim, Jong-Bae, and Gim, Gwang-Yong
Subjects: BIG data, INDUSTRY 4.0, DATA quality, DATABASES, MACHINE learning
Abstract: In the era of the Fourth Industrial Revolution, companies are focusing on securing artificial intelligence (AI) technology to enhance their competitiveness via machine learning, the core technology of AI, which lets computers learn on their own from high-quality data. Securing good-quality big data is therefore becoming a very important asset for companies seeking to enhance their competitiveness. The volume of digital information is expected to grow rapidly around the world, reaching 90 zettabytes (ZB) by 2020. Presenting a value quality index for each data attribute is meaningful because it lets users evaluate, from their own point of view, whether the data are suitable for use. This, in turn, allows users to decide whether to adopt the data based on the data quality index. In this study, we propose a quality index calculation model for structured and unstructured data, as well as a calculation method for the attribute value quality index (AVQI) and the structured data value quality index (SDVQI). [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF
