10,626 results for "classification methods"
Search Results
2. Naive Bayes classifier – An ensemble procedure for recall and precision enrichment
- Author
Peretz, Or, Koren, Michal, and Koren, Oded
- Published
- 2024
3. Semi-supervised ensemble learning for human activity recognition in casas Kyoto dataset
- Author
Paola Patricia, Ariza-Colpas, Rosberg, Pacheco-Cuentas, Butt-Aziz, Shariq, Marlon Alberto, Piñeres-Melo, Roberto-Cesar, Morales-Ortega, Miguel, Urina-Triana, and Naz, Sumera
- Published
- 2024
4. Efficacy of feature selection and Classification algorithms in cancer remission using medical imaging
- Author
Krzywicka, Małgorzata and Wosiak, Agnieszka
- Published
- 2024
5. Data-Driven Decision-Making to Identify the Target Audience of Higher Education Institutions Using Machine Learning Techniques
- Author
Kobets, Vitaliy, Gulin, Dmytro, Popovych, Ihor, Li, Gang, Series Editor, Filipe, Joaquim, Series Editor, Xu, Zhiwei, Series Editor, Ermolayev, Vadim, editor, Potapov, Igor, editor, Ignatenko, Oleksii, editor, Hornung, Roman, editor, Hlybovets, Andrii, editor, Yakovyna, Vitaliy, editor, Prytula, Yaroslav, editor, and Spivakovsky, Oleksandr, editor
- Published
- 2025
6. Data Mining in Credit Card Approval: Feature Importance Testing Comparison
- Author
Ye, Qingyu, Fong, Simon, Yu, Jiahui, Tallón-Ballesteros, Antonio J., Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Julian, Vicente, editor, Camacho, David, editor, Yin, Hujun, editor, Alberola, Juan M., editor, Nogueira, Vitor Beires, editor, Novais, Paulo, editor, and Tallón-Ballesteros, Antonio, editor
- Published
- 2025
7. Mitigating Risk in the Application of Machine Learning to the Diagnosis of Bronchopulmonary Diseases.
- Author
Yusupova, N. I., Bogdanov, M. R., and Smetanina, O. N.
- Abstract
This paper explores the critical issue of risk mitigation in the application of machine learning-based software solutions to image classification, using chest X-rays for diagnosing bronchopulmonary diseases as a case study. The research outlines the challenge of reducing the risk of diagnostic errors by implementing defensive measures against adversarial attacks. Drawing on experimental data from chest X-ray images, this study identifies the most effective machine learning methods for classification, as well as the most threatening attacks that undermine recognition accuracy. Furthermore, it proposes countermeasures designed to mitigate these risks. The experimental findings lead to a set of recommendations, formulated as guidelines that integrate recognition methods, attack types, and countermeasures, aimed at minimizing the risk of misdiagnosis. [ABSTRACT FROM AUTHOR]
- Published
- 2024
8. Analysis of the Impact of Selected Dynamic Parameters of a Motor Vehicle on CO2 Emissions Using Logistic Regression.
- Author
Rykała, Magdalena, Grzelak, Małgorzata, and Borucka, Anna
- Subjects
INTERNAL combustion engines, CARBON emissions, LOGISTIC regression analysis, MOTOR vehicle driving, MOTOR vehicles
- Abstract
The article analyzes the impact of selected operational parameters of internal combustion engine vehicles on CO2 emissions. The study was preceded by a detailed analysis of the issues related to CO2 emissions in the EU, with a focus on Poland, where the tests were conducted. The key scientific assumption is that individual vehicle users' behaviors significantly impact global CO2 emissions. Daily use of private vehicles, driving style, and attention to fuel efficiency contribute to cumulative effects that can drive the transformation toward more sustainable transport. Therefore, the study was conducted using real-time empirical data obtained from the vehicles' OBD (On-Board Diagnostics) diagnostic systems. This approach enabled the creation of a diagnostic tool allowing each vehicle user to assess CO2 emissions and ultimately manage its levels, which is the biggest innovation of the work. Two levels of CO2 emissions were identified as categorical variables in the model, considered either ecological or non-ecological from the perspective of sustainable transport. The CO2 emission threshold of 200 g/km was adopted based on the average age of vehicles in Poland (14.5 years) and Regulation (EC) No 443/2009 of the European Parliament and of the Council. Three models of logistic regression dedicated to different driving cycle phases (starting, urban driving, and highway driving) were proposed and compared. This study demonstrated that during vehicle starting, the most significant factors influencing the probability of ecological driving are vehicle velocity, relative engine load, and relative throttle position, while for the other two types of movement, engine power and torque should also be considered. The logistic regression model for vehicle start-up obtained a value of sensitivity at about 82% and precision at about 85%. In the case of urban driving, the values of the discussed parameters reach significantly higher levels, with sensitivity at around 96% and precision at about 92%. In turn, the model related to highway driving achieved the highest values among the created models, with sensitivity at around 97% and precision at about 93%. [ABSTRACT FROM AUTHOR]
- Published
- 2024
9. Cervical Cancer Prediction Based on Imbalanced Data Using Machine Learning Algorithms with a Variety of Sampling Methods.
- Author
Muraru, Mădălina Maria, Simó, Zsuzsa, and Iantovics, László Barna
- Subjects
MACHINE learning, PHYSICIANS, K-nearest neighbor classification, CERVICAL cancer, RANDOM forest algorithms, RESAMPLING (Statistics), LOGISTIC regression analysis
- Abstract
Cervical cancer affects a large portion of the female population, making the prediction of this disease using Machine Learning (ML) of utmost importance. ML algorithms can be integrated into complex, intelligent, agent-based systems that can offer decision support to resident medical doctors or even experienced medical doctors. For instance, an experienced medical doctor may diagnose a case but need expert support that relates to another medical specialty. Data imbalance is frequent in healthcare data and has a negative influence on predictions made using ML algorithms. Cancer data, in general, and cervical cancer data, in particular, are frequently imbalanced. For this study, we chose a messy, real-life cervical cancer dataset available in the Kaggle repository that includes large amounts of missing and noisy values. To identify the best imbalanced technique for this medical dataset, the performances of eleven important resampling methods are compared, combined with the following state-of-the-art ML models that are frequently applied in predictive healthcare research: K-Nearest Neighbors (KNN) (with k values of 2 and 3), binary Logistic Regression (bLR), and Random Forest (RF). The studied resampling methods include seven undersampling methods and four oversampling methods. For this dataset, the imbalance ratio was 12.73, with a 95% confidence interval ranging from 9.23% to 16.22%. The obtained results show that resampling methods help improve the classification ability of prediction models applied to cervical cancer data. The applied oversampling techniques for handling imbalanced data generally outperformed the undersampling methods. The average balanced accuracy for oversampling was 77.44%, compared to 62.28% for undersampling. When detecting the minority class, oversampling achieved an average score of 60.80%, while undersampling scored 41.36%.
The logistic regression classifier had the greatest impact on balanced techniques, while random forest achieved promising performance, even before applying balancing techniques. Initially, KNN2 outperformed KNN3 across all metrics, including balanced accuracy, for which KNN2 achieved 53.57%, compared to 52.71% for KNN3. However, after applying oversampling techniques, KNN3 significantly improved its balanced accuracy to 73.78%, while that of KNN2 increased to 63.89%. Additionally, KNN3 outperformed KNN2 in minority class performance, scoring 55.72% compared to KNN2's 33.93%. [ABSTRACT FROM AUTHOR]
- Published
- 2024
10. Stilbenes: A new strategy for protecting light-sensitive foods, a review of their structure classification and singlet oxygen quenching mechanism.
- Author
Ren, Xueyan, Tian, Xiaolu, Cai, Xinyu, Li, Xue, and Kong, Qingjun
- Subjects
*REACTIVE oxygen species, *OXYGEN compounds, *ENERGY development, *FUNCTIONAL groups, *MORPHOLOGY
- Abstract
Natural stilbenes have been studied extensively as a result of their complicated structures and diverse biological activities. Singlet oxygen (1O2), a kind of reactive oxygen species (ROS) has a strong destructive effect on food systems (especially for light-sensitive foods). Many cutting-edge scientific studies have found that some stilbenes not only have extensive quenching properties for ROS, but also can selectively quench 1O2. However, the industry devoted too much energy on the development of more new stilbenes, lacking in-depth summaries and reflections on the characteristics of their basic structure and the mechanism of their extraordinary 1O2 quenching abilities. Therefore, we summarized the classification methods for stilbene compounds and evaluated similarities, differences and possible limitations of different classification methods. In addition, we described the role of different functional groups in stilbenes in quenching of 1O2 and summarized the quenching mechanism of 1O2 by stilbenes. By the way, the current application of stilbene compounds and their potential risks in the food industry were also mentioned in this article. The stilbenes can be used as antioxidants (especially new strategies against 1O2 oxidation) in food systems to improve the shelf life. At this stage, it is necessary to develop more effective and safe food antioxidant stilbenes based on their quenching mechanism. [ABSTRACT FROM AUTHOR]
- Published
- 2024
11. The Application of Dayak Customary Law In West Kalimantan Towards Fulfilling A Sense of Justice For Victims (Case Study of Indecent Acts).
- Author
Fetrus, Wibowo, Basuki Rekso, and Hasibuan, Fauzie Yusuf
- Subjects
CUSTOMARY law, LEGAL psychology, LAW enforcement, PLURALISM, DISPUTE resolution
- Abstract
The position of customary law in the legal system in Indonesia has the same constitutional position as the legal position in general that applies in the life of the state in Indonesia. However, it must be emphasized that there is a difference between customary law and applicable law in general, namely the aspect of its enactment and formation. The applicability of customary law only applies to Indonesians and from the aspect of its form, customary law is generally not written. The method in this research uses empirical legal research. Research approaches using an empirical legal approach include a legal sociology approach, a legal psychology approach and a legal anthropology approach. Techniques for collecting legal materials through identification, inventory of positive legal rules, literature, books, journals, other sources of legal materials. Legal material analysis techniques use legal interpretation (interpretation), grammatical interpretation, systematic interpretation and legal construction methods. The results of the research and discussion show that the application of Kalimantan Dayak customary punishment is carried out by bringing together the two parties who have a conflict, then these parties are seated with traditional heads, tribal chiefs and people who are considered to understand customary law. After being seated together, a discussion was held regarding the customary offense that had occurred, whether a customary offense had indeed occurred or not. If it is known and proven that a customary offense has occurred, then the contents of the victim's demands against the perpetrator will be discussed, hereinafter referred to as customary sanctions. [ABSTRACT FROM AUTHOR]
- Published
- 2024
12. Comparison of SVM, KNN, and Naïve Bayes Classification Methods in Predicting Student Transfers at BK Palu School.
- Author
Nugraha, William, Firmansyah, Gerry, Widodo, Agung Mulyo, and Tjahjono, Budi
- Subjects
SCHOOL transfer policy, SUPPORT vector machines, GRADE repetition, DATA mining, MACHINE learning
- Abstract
Student transfers are a significant issue in schools and can affect the dynamics of education and student performance. This research aims to predict student transfers using a comparative analysis of three classification methods: Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and Naïve Bayes. The study utilizes historical data from BK Palu School, covering the years 2022 to 2024, which includes demographic, academic, socio-economic, and student quality information. The methodology involves data collection, data preparation, algorithm selection, implementation, and evaluation of the three methods. The performance of the classification methods is assessed using metrics such as accuracy, precision, recall, and F1-score. The results indicate that SVM has the highest accuracy in predicting student transfers, followed by KNN and Naïve Bayes. This study contributes to identifying key factors influencing student transfers and offers schools a robust model to develop targeted strategies for reducing transfer rates. Ultimately, this research provides insights into optimizing student retention and improving the overall quality of education. [ABSTRACT FROM AUTHOR]
- Published
- 2024
13. Forecasting hotel cancellations through machine learning.
- Author
Herrera, Anita, Arroyo, Ángel, Jiménez, Alfredo, and Herrero, Álvaro
- Subjects
*ARTIFICIAL neural networks, *SUPERVISED learning, *RADIAL basis functions, *RANDOM forest algorithms, *TOURISM
- Abstract
Accurate and reliable forecasting of cancellations is important for successful revenue management in the tourism industry. The objective of this study is to develop classification models to predict hotel booking cancellations. The work involves a number of key steps, such as data preprocessing to properly prepare the data; feature engineering to identify relevant attributes to help improve the predictive ability of the models; hyperparameter settings of the models, including choice of optimizers and incorporation of dropout layers to avoid overfitting in the neural networks; potential overfitting is evaluated using K‐fold cross‐validation; and performance is analysed using the confusion matrix and various performance metrics. The algorithms used are Multilayer Perceptron Neural Network, Radial Basis Function Neural Network, Deep Neural Network, Decision Tree Classifier, Random Forest Classifier, Ada Boost Classifier and XgBoost Classifier. Finally, the results of all models are compared, visualizing Deep Neural Network and XgBoost as the most suitable models for predicting hotel reservation cancellations. [ABSTRACT FROM AUTHOR]
- Published
- 2024
14. A novel classification method for LUAD that guides personalized immunotherapy on the basis of the cross-talk of coagulation- and macrophage-related genes
- Author
Zhuoqi Li, Ling Chen, Zhigang Wei, Hongtao Liu, Lu Zhang, Fujing Huang, Xiao Wen, and Yuan Tian
- Subjects
coagulation, macrophage, prognosis, LUAD, classification methods, risk score model, Immunologic diseases. Allergy, RC581-607
- Abstract
Purpose: The coagulation process and infiltration of macrophages affect the progression and prognosis of lung adenocarcinoma (LUAD) patients. This study was designed to explore novel classification methods that better guide the precise treatment of LUAD patients on the basis of coagulation and macrophages. Methods: Weighted gene coexpression network analysis (WGCNA) was applied to identify M2 macrophage-related genes, and TAM marker genes were acquired through the analysis of scRNA-seq data. The MSigDB and KEGG databases were used to obtain coagulation-associated genes. The intersecting genes were defined as coagulation and macrophage-related (COMAR) genes. Unsupervised clustering analysis was used to evaluate distinct COMAR patterns for LUAD patients on the basis of the COMAR genes. The R package “limma” was used to identify differentially expressed genes (DEGs) between COMAR patterns. A prognostic risk score model, which was validated through external data cohorts and clinical samples, was constructed on the basis of the COMAR DEGs. Results: In total, 33 COMAR genes were obtained, and three COMAR LUAD subtypes were identified on the basis of the 33 COMAR genes. There were 341 DEGs identified between the three COMAR subtypes, and 60 prognostic genes were selected for constructing the COMAR risk score model. Finally, 15 prognosis-associated genes (CORO1A, EPHA4, FOXM1, HLF, IFIH1, KYNU, LY6D, MUC16, PPARG, S100A8, SPINK1, SPINK5, SPP1, VSIG4, and XIST) were included in the model, which was efficient and robust in predicting LUAD patient prognosis and clinical outcomes in patients receiving anti-PD-1/PD-L1 immunotherapy. Conclusions: LUAD can be classified into three subtypes according to COMAR genes, which may provide guidance for precise treatment.
- Published
- 2025
15. Identifying diseases symptoms and general rules using supervised and unsupervised machine learning
- Author
Fatemeh Sogandi
- Subjects
Diseases symptoms, Classification methods, Association rules, Apriori algorithm, Machine learning algorithms, Medicine, Science
- Abstract
The symptoms of diseases can vary among individuals and may remain undetected in the early stages. Detecting these symptoms is crucial in the initial stage to effectively manage and treat cases of varying severity. Machine learning has made major advances in recent years, proving its effectiveness in various healthcare applications. This study aims to identify patterns of symptoms and general rules regarding symptoms among patients using supervised and unsupervised machine learning. The integration of a rule-based machine learning technique and classification methods is utilized to extend a prediction model. This study analyzes patient data that was available online through the Kaggle repository. After preprocessing the data and exploring descriptive statistics, the Apriori algorithm was applied to identify frequent symptoms and patterns in the discovered rules. Additionally, the study applied several machine learning models for predicting diseases, including stepwise regression, support vector machine, bootstrap forest, boosted trees, and neural-boosted methods. Several predictive machine learning models were applied to the dataset to predict diseases. It was discovered that the stepwise method for fitting outperformed all competitors in this study, as determined through cross-validation conducted for each model based on established criteria. Moreover, numerous significant decision rules were extracted in the study, which can streamline clinical applications without the need for additional expertise. These rules enable the prediction of relationships between symptoms and diseases, as well as between different diseases. Therefore, the results obtained in this study have the potential to improve the performance of prediction models. We can discover diseases symptoms and general rules using supervised and unsupervised machine learning for the dataset.
Overall, the proposed algorithm can support not only healthcare professionals but also patients who face cost and time constraints in diagnosing and treating these diseases.
- Published
- 2024
16. Method for Assigning Railway Traffic Managers to Tasks along with Models for Evaluating and Classifying.
- Author
Restel, Franciszek, Haładyn, Szymon, Mardeusz, Ewa, Starčević, Martin, and Oziębłowski, Mateusz
- Subjects
VIRTUAL reality, FUZZY logic, RAILROADS, CLASSIFICATION
- Abstract
The occurrence of incidences in railway systems leads to impediments and often delays. Because the railway is an anthropotechnical system, two factors are considered as the source of incidents: technical and human. Minimizing adverse incidents in the railway system is the subject of much discussion and research. One of the areas affecting the performance of railway systems is employees. This article presents a method for assigning railway employees to tasks and models for evaluating and classifying railway employees, consisting of two stages. The first stage involves using a survey method and a fuzzy logic model. Each type of service is assigned feature values, obtaining three parameterized employee-role profiles for the train traffic officer. In the second stage, the participant goes through two of the three available evaluation scenarios, during which errors made during the tasks are counted. Validation results of the proposed approach indicate that the method is 87% effective. [ABSTRACT FROM AUTHOR]
- Published
- 2024
17. Identifying diseases symptoms and general rules using supervised and unsupervised machine learning.
- Author
Sogandi, Fatemeh
- Subjects
APRIORI algorithm, MEDICAL personnel, SUPPORT vector machines, SYMPTOMS, NOSOLOGY
- Abstract
The symptoms of diseases can vary among individuals and may remain undetected in the early stages. Detecting these symptoms is crucial in the initial stage to effectively manage and treat cases of varying severity. Machine learning has made major advances in recent years, proving its effectiveness in various healthcare applications. This study aims to identify patterns of symptoms and general rules regarding symptoms among patients using supervised and unsupervised machine learning. The integration of a rule-based machine learning technique and classification methods is utilized to extend a prediction model. This study analyzes patient data that was available online through the Kaggle repository. After preprocessing the data and exploring descriptive statistics, the Apriori algorithm was applied to identify frequent symptoms and patterns in the discovered rules. Additionally, the study applied several machine learning models for predicting diseases, including stepwise regression, support vector machine, bootstrap forest, boosted trees, and neural-boosted methods. Several predictive machine learning models were applied to the dataset to predict diseases. It was discovered that the stepwise method for fitting outperformed all competitors in this study, as determined through cross-validation conducted for each model based on established criteria. Moreover, numerous significant decision rules were extracted in the study, which can streamline clinical applications without the need for additional expertise. These rules enable the prediction of relationships between symptoms and diseases, as well as between different diseases. Therefore, the results obtained in this study have the potential to improve the performance of prediction models. We can discover diseases symptoms and general rules using supervised and unsupervised machine learning for the dataset. 
Overall, the proposed algorithm can support not only healthcare professionals but also patients who face cost and time constraints in diagnosing and treating these diseases. [ABSTRACT FROM AUTHOR]
- Published
- 2024
18. Crime Pattern Analysis: Exploring the Use of ML Algorithms to Identify Patterns in Criminal Behavior, Detect Crime Hotspots, and Support Proactive Law Enforcement Strategies.
- Author
Malik, Pankaj, Pandit, Rakesh, Singh, Lokendra, Shinde, Pinky, Vijayvargiya, Rashmi, and Chourawar, Sanchita
- Subjects
PATTERN recognition systems, CRIMINAL behavior, CRIMINAL methods, CRIME analysis, LAW enforcement
- Abstract
Purpose: The aim of this research is to examine how machine learning (ML) algorithms can improve the understanding of crime patterns and facilitate proactive law enforcement tactics. Understanding crime trends is essential to properly allocating resources and comprehending criminal behaviour. ML provides an effective toolkit for spotting these trends and forecasting the frequency of crimes in the future. Methods: Using crime data, we design, and test machine learning (ML) based methods for pattern recognition, hotspot detection, and predictive modelling. We evaluate the performance of different algorithms by calculating metrics such as precision, recall, and accuracy. Results: Our results indicate that when it came to criminal pattern analysis, the "Decision Tree" strategy performed better than the other methods. Improved classification accuracy and other assessment metrics served as proof of this. Conclusions: This study demonstrates machine learning algorithms can improve criminal pattern analysis, which can help law enforcement. Nonetheless, the research recognizes the constraints of machine learning and explores the moral issues surrounding its application in law enforcement procedures. [ABSTRACT FROM AUTHOR]
- Published
- 2024
19. Avoiding impacts of phylogenetic tip‐state‐errors on dispersal and extirpation rates in alpine plant biogeography.
- Author
Bätscher, Livio and de Vos, Jurriaan M.
- Subjects
*MOUNTAIN plants, *BIOLOGICAL extinction, *BIOGEOGRAPHY, *BIOMES, *PLANT species, *TIMBERLINE
- Abstract
Aim: Many biogeographic analyses require some form of automated state assignment to tips of phylogenetic trees, reflecting a species presence or absence in a particular area, e.g., a biome. As datasets get exponentially larger, such procedures may increasingly induce errors (here called tip‐state‐error), but the specific algorithmic cause and consequence on downstream estimation of dispersal and extinction rates remains poorly known. We aim to improve automated tip‐scoring methods in the context of the alpine biome by leveraging elevation information. We document the profound effect of tip‐state‐errors on Dispersal‐Extirpation‐Cladogenesis (DEC) models. Location: The European Alpine Arc. Taxon: Three thousand three hundred seventeen vascular plant species, emphasizing six focal genera: Campanula, Carex, Festuca, Ranunculus, Saxifraga, and Viola. Methods: We use GBIF data to classify whether species occur above the upper climatic treeline using a newly developed algorithm ElevDistr or a gridded landscape model of thermal belts, under various filtering thresholds. We compared classification performance using the Flora Alpina as validation data. To determine if tip‐state‐error biases the dispersal and extirpation rate estimation, we fit DEC models for selected clades using tip‐states from different classification models. Results: ElevDistr is less error prone than other approaches. Filtering thresholds lower the false positive rate but increase the false negative rate. Inflated false positive rates bias the dispersal rate estimation upward, while inflated false negative rates lead to upward bias in extirpation rate estimation. Main Conclusions: Even moderate tip‐state‐error may lead to profound systematic bias in dispersal and extinction rate estimation if an unbalanced ratio between false positive and false negative rates occurs. Therefore, careful validation is imperative, though ElevDistr alleviates this problem in the context of the alpine environment. 
Overall, our results suggest contrasting rates of alpine biome shifts across the studied genera and have major implications for studies addressing the likelihood of niche evolution versus geographic dispersal. [ABSTRACT FROM AUTHOR]
- Published
- 2024
20. A review on features and methods of potential fishing zone.
- Author
Ya'acob, Norsuzila, Nik Dzulkefli, Nik Nur Shaadah, Abdul Aziz, Mohd Azri, Yusof, Azita Laily, and Umar, Roslan
- Subjects
SUPPORT vector machines, SUSTAINABILITY, ARTIFICIAL neural networks, CLASSIFICATION algorithms, CLASSIFICATION of fish
- Abstract
This review focuses on the importance of identifying potential fishing zones in seawater for sustainable fishing practices. It explores features like sea surface temperature (SST) and sea surface height (SSH), along with classification methods such as classifiers. The features like SST, SSH, and different classifiers used to classify the data, have been figured out in this review study. This study underscores the importance of examining potential fishing zones using advanced analytical techniques. It thoroughly explores the methodologies employed by researchers, covering both past and current approaches. The examination centers on data characteristics and the application of classification algorithms for classification of potential fishing zones. Furthermore, the prediction of potential fishing zones relies significantly on the effectiveness of classification algorithms. Previous research has assessed the performance of models like support vector machines, naïve Bayes, and artificial neural networks (ANN). In the previous result, the results of support vector machine (SVM) were 97.6% more accurate than naive Bayes's 94.2% to classify test data for fisheries classification. By considering the recent works in this area, several recommendations for future works are presented to further improve the performance of the potential fishing zone models, which is important to the fisheries community. [ABSTRACT FROM AUTHOR]
- Published
- 2024
21. Method for Detecting Phishing Sites
- Author
Buchyk, Serhii, Toliupa, Serhii, Buchyk, Oleksandr, Shevchenko, Anatolii, Angrisani, Leopoldo, Series Editor, Arteaga, Marco, Series Editor, Chakraborty, Samarjit, Series Editor, Chen, Shanben, Series Editor, Chen, Tan Kay, Series Editor, Dillmann, Rüdiger, Series Editor, Duan, Haibin, Series Editor, Ferrari, Gianluigi, Series Editor, Ferre, Manuel, Series Editor, Hirche, Sandra, Series Editor, Jabbari, Faryar, Series Editor, Jia, Limin, Series Editor, Kacprzyk, Janusz, Series Editor, Khamis, Alaa, Series Editor, Kroeger, Torsten, Series Editor, Li, Yong, Series Editor, Liang, Qilian, Series Editor, Martín, Ferran, Series Editor, Ming, Tan Cher, Series Editor, Minker, Wolfgang, Series Editor, Misra, Pradeep, Series Editor, Mukhopadhyay, Subhas, Series Editor, Ning, Cun-Zheng, Series Editor, Nishida, Toyoaki, Series Editor, Oneto, Luca, Series Editor, Panigrahi, Bijaya Ketan, Series Editor, Pascucci, Federica, Series Editor, Qin, Yong, Series Editor, Seng, Gan Woon, Series Editor, Speidel, Joachim, Series Editor, Veiga, Germano, Series Editor, Wu, Haitao, Series Editor, Zamboni, Walter, Series Editor, Tan, Kay Chen, Series Editor, Luntovskyy, Andriy, editor, Klymash, Mikhailo, editor, Melnyk, Igor, editor, Beshley, Mykola, editor, and Schill, Alexander, editor
- Published
- 2024
22. Explainable Artificial Intelligence Insight: An Orderly Survey
- Author
Chaudhary, Meghna, Afshar Alam, M., Zafar, Sherin, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Mahapatra, Rajendra Prasad, editor, Peddoju, Sateesh K., editor, Roy, Sudip, editor, and Parwekar, Pritee, editor
- Published
- 2024
23. Credit Card Fraud Analysis Using Machine Learning
- Author
Charitha, Sree, Chowdary, Shivani, Rao, Trupthi, Kodipalli, Ashwini, Kamal, Shoaib, Rohini, B. R., Angrisani, Leopoldo, Series Editor, Arteaga, Marco, Series Editor, Chakraborty, Samarjit, Series Editor, Chen, Jiming, Series Editor, Chen, Shanben, Series Editor, Chen, Tan Kay, Series Editor, Dillmann, Rüdiger, Series Editor, Duan, Haibin, Series Editor, Ferrari, Gianluigi, Series Editor, Ferre, Manuel, Series Editor, Jabbari, Faryar, Series Editor, Jia, Limin, Series Editor, Kacprzyk, Janusz, Series Editor, Khamis, Alaa, Series Editor, Kroeger, Torsten, Series Editor, Li, Yong, Series Editor, Liang, Qilian, Series Editor, Martín, Ferran, Series Editor, Ming, Tan Cher, Series Editor, Minker, Wolfgang, Series Editor, Misra, Pradeep, Series Editor, Mukhopadhyay, Subhas, Series Editor, Ning, Cun-Zheng, Series Editor, Nishida, Toyoaki, Series Editor, Oneto, Luca, Series Editor, Panigrahi, Bijaya Ketan, Series Editor, Pascucci, Federica, Series Editor, Qin, Yong, Series Editor, Seng, Gan Woon, Series Editor, Speidel, Joachim, Series Editor, Veiga, Germano, Series Editor, Wu, Haitao, Series Editor, Zamboni, Walter, Series Editor, Zhang, Junjie James, Series Editor, Tan, Kay Chen, Series Editor, Shetty, N. R., editor, Prasad, N. H., editor, and Nagaraj, H. C., editor
- Published
- 2024
- Full Text
- View/download PDF
24. Differential Evolution-Based Weighted Voting Stacking Ensemble Classifier for Highly Skewed Binary Data Distribution
- Author
-
Dolo, Kgaugelo Moses, Mnkandla, Ernest, Xhafa, Fatos, Series Editor, Woungang, Isaac, editor, and Dhurandher, Sanjay Kumar, editor
- Published
- 2024
- Full Text
- View/download PDF
25. Prediction of one- and three-months yoga practices effect on chronic venous insufficiency based on machine learning classifiers
- Author
-
Xue Han and Nan Hu
- Subjects
Chronic venous insufficiency ,Cross validation ,Univariate selection ,Correlation matrix ,Classification methods ,Optimization algorithms ,Electronic computers. Computer science ,QA75.5-76.95 - Abstract
The rise of technology has heightened work demands, adversely impacting mental health and fitness. The COVID-19 pandemic exacerbates psychological stress, emphasizing the need for non-pharmacological interventions like yoga. Yoga positively influences the autonomic nervous system, benefiting cardio-respiratory health, metabolic efficiency, and conditions like Type-2 diabetes, Chronic Venous disease, and obesity. This study employs a dataset with 100 samples and 43 features related to Chronic Venous Insufficiency (CVI). Logistic and Random Forest classifiers are validated using K-fold cross-validation, with feature selection optimizing prediction accuracy. Hybrid models, enhanced with optimization algorithms, predict Venous Clinical Severity Score (VCSS) before, one, and three months after yoga practices. The Random Forest classifier, particularly RFGT, proves highly accurate in categorizing baseline severity and identifying Mild and Moderate CVI cases. RFGT demonstrated AUC score of 0.9072, 0.8714, 0.7709, and 0.7200 in Absent, Mild, Moderate, and Severe patient groups classification before yoga practices (VCSS-Pre). These values were 0.9158, 0.8644, 0.8142, and 0.6333 for VCSS-1 and reported as 0.9269, 0.8399, 0.7838, and 0.7500 for patients’ classification in VCSS-3. Predicting VCSS scores before yoga intervention assists in categorizing participants for personalized care and efficient resource allocation. The RFC-based models, notably RFGT, show high accuracy in identifying baseline severity, enabling early intervention for high-risk individuals. These models, especially RFGT, perform well in classifying Mild and Moderate CVI cases, informing lifestyle modifications. Predicting VCSS-1 scores evaluates the short-term impact of yoga practices, identifying individuals requiring additional support. RFGT aids in personalized recommendations based on specific factors, crucial for severe conditions. 
Predicting VCSS-3 scores assesses the sustained impact over three months, identifying intervention responders, particularly in Severe and Moderate groups. RFGT demonstrates optimal predictions, contributing to future interventions tailored to individual responses and improved outcomes.
- Published
- 2024
- Full Text
- View/download PDF
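The abstract of entry 25 says its classifiers were validated with K-fold cross-validation on a 100-sample dataset. A minimal sketch of how such folds can be built is given here using only the Python standard library. The value k = 5, the seed, and the function names are assumptions made for illustration; the abstract does not state the number of folds or the tooling the authors used.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Split range(n_samples) into k shuffled folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, starting at position i.
    return [indices[i::k] for i in range(k)]


def train_test_splits(n_samples, k, seed=0):
    """Yield one (train_indices, test_indices) pair per fold."""
    folds = k_fold_indices(n_samples, k, seed)
    for i, test in enumerate(folds):
        # Training set = every index that is not in the held-out fold.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 100 samples as stated in the abstract; k = 5 is an assumed value.
splits = list(train_test_splits(100, 5))
for train, test in splits:
    print(len(train), len(test))
```

Each sample lands in exactly one test fold, so every observation is used for evaluation once and for training k - 1 times.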
26. A Comprehensive Review of Clasifier used with Imbalanced Data in Machine Learning
- Author
-
Muammar Reza Pahlawan, Arief Setyanto, and M. Rudyanto Arief
- Subjects
classification methods ,super vector machine (svm) ,k-nearest neighbors (knn) ,random forest, imbalance data ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
The rapid advance of technology in recent years has produced a large amount of digital content. This also creates opportunities in research fields such as Machine Learning. One of the methods in Machine Learning is classification. Classification aims to group data according to its class. However, factors such as data imbalance can cause the results of this method to fall short of expectations. This study presents a comprehensive review of classification methods in text processing, focusing on handling the challenges posed by imbalanced data. With the exponential growth of digital content, the need to categorize and analyze text data effectively has become increasingly critical. Classification methods play an important role in this effort, facilitating tasks such as sentiment analysis, document classification, and information retrieval. However, the presence of imbalanced data, characterized by skewed class distributions, poses a significant obstacle to the reliability and effectiveness of classification models. Through this study, readers are expected to learn which methods are commonly used for classification, and how those methods generally perform when faced with particular cases such as data imbalance. This review highlights Support Vector Machine (SVM) as the most prominent classification method at 25%, followed by K-Nearest Neighbours (KNN) and Random Forest at 19%, Decision Tree, and Naïve Bayes. Alternative methods tailored to specific research objectives and challenges are also explored. The usage percentages of these methods were obtained from the collection of journal articles that the researchers gathered and examined.
- Published
- 2024
- Full Text
- View/download PDF
27. Comparative Analysis of Classification Methods and Suitable Datasets for Protocol Recognition in Operational Technologies.
- Author
-
Holasova, Eva, Fujdiak, Radek, and Misurec, Jiri
- Subjects
*COMPUTER network traffic , *INFORMATION technology , *CLASSIFICATION , *COMPARATIVE studies , *COMPARATIVE method - Abstract
The interconnection of Operational Technology (OT) and Information Technology (IT) has created new opportunities for remote management, data storage in the cloud, real-time data transfer over long distances, or integration between different OT and IT networks. OT networks require increased attention due to the convergence of IT and OT, mainly due to the increased risk of cyber-attacks targeting these networks. This paper focuses on the analysis of different methods and data processing for protocol recognition and traffic classification in the context of OT specifics. Therefore, this paper summarizes the methods used to classify network traffic, analyzes the methods used to recognize and identify the protocol used in the industrial network, and describes machine learning methods to recognize industrial protocols. The output of this work is a comparative analysis of approaches specifically for protocol recognition and traffic classification in OT networks. In addition, publicly available datasets are compared in relation to their applicability for industrial protocol recognition. Research challenges are also identified, highlighting the lack of relevant datasets and defining directions for further research in the area of protocol recognition and classification in OT environments. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
28. Cluster mapping in Spain: Exploring the correlation between industrial agglomeration and regional performance.
- Author
-
Fernández-Escobedo, Rudy, Eguía-Peña, Begoña, and Aldaz-Odriozola, Leire
- Subjects
TRADE regulation ,URBAN economics ,PATENT offices ,REGIONAL economics ,NATURAL resources ,ACADEMIC-industrial collaboration ,FURNITURE manufacturing ,GAS extraction - Published
- 2024
- Full Text
- View/download PDF
29. Using Automated Machine Learning for Spatial Prediction—The Heshan Soil Subgroups Case Study.
- Author
-
Liang, Peng, Qin, Cheng-Zhi, and Zhu, A-Xing
- Subjects
MACHINE learning ,SOIL classification ,DIGITAL soil mapping ,SOILS ,SOIL sampling - Abstract
Recently, numerous spatial prediction methods with diverse characteristics have been developed. Selecting an appropriate spatial prediction method, along with its data preprocessing and parameter settings, presents a challenging task for many users, especially for non-experts. This paper addresses this challenge by exploring the potential of automated machine learning method proposed in artificial intelligent domain to automatically determine the most suitable method among various machine learning methods. As a case study, the automated machine learning method was applied to predict the spatial distribution of soil subgroups in Heshan farm. A total of 110 soil samples and 10 terrain variables were utilized in the designed experiments. To evaluate the performance, the proposed method was compared to each machine learning method with default parameters values or parameters determined by expert knowledge. The results showed that the proposed method typically achieved higher accuracy scores than the two alternative methods. This suggests that automated machine learning performs effectively in scenarios where numerous machine learning methods are available and offers practical utility in reducing the dependence on users' expertise in spatial prediction. However, a more robust automated framework should be developed to encompass a broader range of spatial prediction methods, such as spatial statistic methods, rather than only focusing on machine learning methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
30. Text Classification Technologies in Document Categorization Systems. A Survey.
- Author
-
Kravets, Alla and Semenochkin, Dmitriy
- Subjects
SCIENCE databases ,DATA analysis ,CONVOLUTIONAL neural networks ,RECURRENT neural networks ,METHODOLOGY - Abstract
This paper presents a literature review from 2013 to 2022 on technologies and datasets used in the field of text classification. The review covered 110 sources from 5 scientific databases, the main criterion for inclusion was the presence of an experimental part involving a classifier or other technologies related to the classification process. The study reviewed the classification process and highlighted three main stages of text classification: data preparation, classifier training, and evaluation of results. Using Kitchenham's Systematic Literature Review methodology, scholarly articles dealing with text classification problems were collected and analyzed. A sample of 243 articles was obtained, and after screening, a resulting sample of 110 articles was obtained. Guided by the two research questions posed, this sample was analyzed and the results of the analysis were presented in graphical format. For each of the identified stages of classification, the frequency of use of the main technologies used in a particular stage was analyzed. Each technology was reviewed within its respective source. In addition, considerable attention was given to analyzing the different datasets used for text classification, with a particular focus on the less frequently used ones. An analysis of the frequency of use of datasets concluded that researchers often use proven and popular datasets to demonstrate the effectiveness of their method. Datasets are less frequently used to solve localized text classification problems. One notable trend identified in the analysis is the increasing prevalence of deep learning technologies in text classification. These technologies, including neural networks, recurrent neural networks (RNNs), convolutional neural networks (CNNs), transformers, and attention mechanisms, have gained considerable popularity among researchers. 
This study provides valuable insights into the evolution of text classification by shedding light on a variety of technologies, approaches, and datasets used by researchers. As text classification continues to evolve and diversify, this review can be a valuable resource for scholars and practitioners in the field, providing. [ABSTRACT FROM AUTHOR]
- Published
- 2024
31. Explainable AI: To Reveal the Logic of Black-Box Models.
- Author
-
Chinu and Bansal, Urvashi
- Subjects
*ARTIFICIAL intelligence , *LOGIC , *INTELLIGENCE service - Abstract
Artificial intelligence (AI) is continuously evolving; however, in the last 10 years, it has gotten considerably more difficult to explain AI models. With the help of explanations, end users can understand the outcomes generated by AI models. The proposed work has shown major issues and gaps in the literature. The main issues found in the literature are unfair/biased decisions made by the model, poor accuracy, reliability, and evaluation metrics to assess the effectiveness of explanations and security of data. Research results obtained in this proposed work highlight the needs, challenges, and opportunities in the field of Explainable artificial intelligence (XAI). How can we make artificial intelligence models explainable? Evaluation of explanations using metrics is the main contribution of this research work. Moreover, the proposed work analyzed different types of explanations, leading companies providing Explainable artificial intelligence services, and open-source tools available in the market for using Explainable artificial intelligence. Finally, based on the reviewed works, the proposed work identified some future directions for designing more transparent models for artificial intelligence. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
32. Typologia rolnictwa: przegląd zagadnień teoretycznych i ujęć empirycznych.
- Author
-
Kossowski, Tomasz M.
- Abstract
Copyright of Rozwój Regionalny & Polityka Regionalna is the property of Uniwersytetu im. Adama Mickiewicza (IH UAM) and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2024
- Full Text
- View/download PDF
33. Impact of The Covid-19 Pandemic on Student Learning Styles: Naïve Bayes and Decision Tree Classification in Education
- Author
-
Zaqi Kurniawan and Rizka Tiaharyadini
- Subjects
covid-19 pandemic ,student learning styles ,classification methods ,naïve bayes ,decision tree ,Information technology ,T58.5-58.64 - Abstract
The Covid-19 pandemic significantly changed education with social distancing and changes in the learning environment. In this study, one strong reason for the significance of the research is the urgency of changes in students' learning styles during the Covid-19 pandemic. Investigating differences in learning styles before and during the pandemic not only provides deep insight into students' adaptation to these changes, but also provides a foundation for the development of more inclusive and adaptive learning strategies in the future. This study aims to analyze the effect of the Covid-19 pandemic on students' learning styles in an educational context, focusing on the comparison of two classification methods, Naïve Bayes and Decision Tree. The study was conducted by collecting data on students' learning styles before and during the Covid-19 pandemic, using various relevant indicators. The data was obtained based on school survey results and online platforms, involving student characteristics and learning preferences. The data was then analyzed using Naïve Bayes and Decision Tree classification methods to identify significant changes in students' learning styles. The results showed the prediction accuracy of learning style changes with Naïve Bayes 68.75% and Decision Tree 87.50%. Recommendations for educators and education policy makers are to develop inclusive and adaptive learning strategies to meet diverse learning preferences.
- Published
- 2024
- Full Text
- View/download PDF
34. Cervical Cancer Prediction Based on Imbalanced Data Using Machine Learning Algorithms with a Variety of Sampling Methods
- Author
-
Mădălina Maria Muraru, Zsuzsa Simó, and László Barna Iantovics
- Subjects
cervical cancer ,sampling methods ,imbalanced datasets ,classification methods ,prediction methods ,K-nearest neighbors ,Technology ,Engineering (General). Civil engineering (General) ,TA1-2040 ,Biology (General) ,QH301-705.5 ,Physics ,QC1-999 ,Chemistry ,QD1-999 - Abstract
Cervical cancer affects a large portion of the female population, making the prediction of this disease using Machine Learning (ML) of utmost importance. ML algorithms can be integrated into complex, intelligent, agent-based systems that can offer decision support to resident medical doctors or even experienced medical doctors. For instance, an experienced medical doctor may diagnose a case but need expert support that relates to another medical specialty. Data imbalance is frequent in healthcare data and has a negative influence on predictions made using ML algorithms. Cancer data, in general, and cervical cancer data, in particular, are frequently imbalanced. For this study, we chose a messy, real-life cervical cancer dataset available in the Kaggle repository that includes large amounts of missing and noisy values. To identify the best imbalanced technique for this medical dataset, the performances of eleven important resampling methods are compared, combined with the following state-of-the-art ML models that are frequently applied in predictive healthcare research: K-Nearest Neighbors (KNN) (with k values of 2 and 3), binary Logistic Regression (bLR), and Random Forest (RF). The studied resampling methods include seven undersampling methods and four oversampling methods. For this dataset, the imbalance ratio was 12.73, with a 95% confidence interval ranging from 9.23% to 16.22%. The obtained results show that resampling methods help improve the classification ability of prediction models applied to cervical cancer data. The applied oversampling techniques for handling imbalanced data generally outperformed the undersampling methods. The average balanced accuracy for oversampling was 77.44%, compared to 62.28% for undersampling. When detecting the minority class, oversampling achieved an average score of 60.80%, while undersampling scored 41.36%. 
The logistic regression classifier had the greatest impact on balanced techniques, while random forest achieved promising performance, even before applying balancing techniques. Initially, KNN2 outperformed KNN3 across all metrics, including balanced accuracy, for which KNN2 achieved 53.57%, compared to 52.71% for KNN3. However, after applying oversampling techniques, KNN3 significantly improved its balanced accuracy to 73.78%, while that of KNN2 increased to 63.89%. Additionally, KNN3 outperformed KNN2 in minority class performance, scoring 55.72% compared to KNN2’s 33.93%.
- Published
- 2024
- Full Text
- View/download PDF
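The abstract of entry 34 compares resampling methods by balanced accuracy. As a rough illustration of the two ideas involved, the sketch below implements plain random oversampling and balanced accuracy (the mean of per-class recall) with the Python standard library. The toy labels, the seed, and the function names are hypothetical; random oversampling is only one simple resampling strategy and is not claimed to be among the eleven methods the study evaluated.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [s for s, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_samples.append(rng.choice(pool))
            out_labels.append(cls)
    return out_samples, out_labels


def balanced_accuracy(y_true, y_pred):
    """Average the recall obtained on each class present in y_true."""
    recalls = []
    for cls in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == cls)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)


# Hypothetical toy data: 12 negatives and 2 positives (imbalance ratio 6).
labels = [0] * 12 + [1] * 2
samples = list(range(14))
_, balanced_labels = random_oversample(samples, labels)

# A degenerate classifier that always predicts the majority class.
always_negative = [0] * len(labels)
plain_accuracy = sum(t == p for t, p in zip(labels, always_negative)) / len(labels)

print(Counter(balanced_labels))
print(plain_accuracy, balanced_accuracy(labels, always_negative))
```

The always-negative classifier looks strong on plain accuracy but scores only 0.5 on balanced accuracy, which is why balanced accuracy is the more informative metric on skewed data. In practice, resampling should be applied to the training split only, so that duplicated samples do not leak into the evaluation data.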
35. Analysis of the Impact of Selected Dynamic Parameters of a Motor Vehicle on CO2 Emissions Using Logistic Regression
- Author
-
Magdalena Rykała, Małgorzata Grzelak, and Anna Borucka
- Subjects
emission ,road transport ,motor vehicles ,European Union ,classification methods ,Technology ,Engineering (General). Civil engineering (General) ,TA1-2040 ,Biology (General) ,QH301-705.5 ,Physics ,QC1-999 ,Chemistry ,QD1-999 - Abstract
The article analyzes the impact of selected operational parameters of internal combustion engine vehicles on CO2 emissions. The study was preceded by a detailed analysis of the issues related to CO2 emissions in the EU, with a focus on Poland, where the tests were conducted. The key scientific assumption is that individual vehicle users’ behaviors significantly impact global CO2 emissions. Daily use of private vehicles, driving style, and attention to fuel efficiency contribute to cumulative effects that can drive the transformation toward more sustainable transport. Therefore, the study was conducted using real-time empirical data obtained from the vehicles’ OBD (On-Board Diagnostics) diagnostic systems. This approach enabled the creation of a diagnostic tool allowing each vehicle user to assess CO2 emissions and ultimately manage its levels, which is the biggest innovation of the work. Two levels of CO2 emissions were identified as categorical variables in the model, considered either ecological or non-ecological from the perspective of sustainable transport. The CO2 emission threshold of 200 g/km was adopted based on the average age of vehicles in Poland (14.5 years) and Regulation (EC) No 443/2009 of the European Parliament and of the Council. Three models of logistic regression dedicated to different driving cycle phases—starting, urban driving, and highway driving—were proposed and compared. This study demonstrated that during vehicle starting, the most significant factors influencing the probability of ecological driving are vehicle velocity, relative engine load, and relative throttle position, while for the other two types of movement, engine power and torque should also be considered. The logistic regression model for vehicle start-up obtained a value of sensitivity at about 82% and precision at about 85%. 
In the case of urban driving, the values of the discussed parameters reach significantly higher levels, with sensitivity at around 96% and precision at about 92%. In turn, the model related to highway driving achieved the highest values among the created models, with sensitivity at around 97% and precision at about 93%.
- Published
- 2024
- Full Text
- View/download PDF
36. Brain tumor MRI identification and classification using DWT, PCA and kernel support vector machine
- Author
-
Омар Фарук, Джахидул Ислам, Сакиб Ахмед, Саджиб Хоссейн, and Нараян Чандра Натх
- Subjects
classification methods ,Discrete Wavelet Transform ,Feature extraction ,Image Segmentation ,Pre-Processing ,Probabilistic Neural Network ,Technology (General) ,T1-995 - Abstract
Classification, segmentation, and the identification of the infection region in MRI images of brain tumors are labor-intensive and iterative processes. Numerous anatomical structures of the human body may be envisioned using an image processing theory. With basic imaging methods, it is challenging to see the aberrant human brain's structure. The neurological structure of the human brain may be distinguished and made clearer using the magnetic resonance imaging technique. The MRI approach uses a number of imaging techniques to evaluate and record the human brain’s interior features. In this study, we focused on strategies for noise removal, gray-level co-occurrence matrix (GLCM) extraction of features, and segmentation of brain tumor regions based on Discrete Wavelet Transform (DWT) to minimize complexity and enhance performance. In turn, this reduces any noise that could have been left over after segmentation due to morphological filtering. Brain MRI scans were utilized to test the accuracy of the classification and the location of the tumor using probabilistic neural network classifiers. The classifier's accuracy and position detection were tested using MRI brain imaging. The efficiency of the suggested approach is demonstrated by experimental findings, which showed that normal and diseased tissues could be distinguished from one another from brain MRI scans with about 100% accuracy.
- Published
- 2024
- Full Text
- View/download PDF
37. Improving fraud detection with semi-supervised topic modeling and keyword integration.
- Author
-
Sánchez, Marco and Urquiza, Luis
- Abstract
Fraud detection through auditors' manual review of accounting and financial records has traditionally relied on human experience and intuition. However, replicating this task using technological tools has represented a challenge for information security researchers. Natural language processing techniques, such as topic modeling, have been explored to extract information and categorize large sets of documents. Topic modeling, such as latent Dirichlet allocation (LDA) or non-negative matrix factorization (NMF), has recently gained popularity for discovering thematic structures in text collections. However, unsupervised topic modeling may not always produce the best results for specific tasks, such as fraud detection. Therefore, in the present work, we propose to use semi-supervised topic modeling, which allows the incorporation of specific knowledge of the study domain through the use of keywords to learn latent topics related to fraud. By leveraging relevant keywords, our proposed approach aims to identify patterns related to the vertices of the fraud triangle theory, providing more consistent and interpretable results for fraud detection. The model's performance was evaluated by training with several datasets and testing it with another one that did not intervene in its training. The results showed efficient performance averages with a 7% increase in performance compared to a previous job. Overall, the study emphasizes the importance of deepening the analysis of fraud behaviors and proposing strategies to identify them proactively. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
38. Blackberry Fruit Classification in Underexposed Images Combining Deep Learning and Image Fusion Methods.
- Author
-
Morales-Vargas, Eduardo, Fuentes-Aguilar, Rita Q., de-la-Cruz-Espinosa, Emanuel, and Hernández-Melgarejo, Gustavo
- Subjects
*IMAGE fusion , *IMAGE recognition (Computer vision) , *DEEP learning , *COMPUTER vision , *FRUIT , *BERRIES , *BLACKBERRIES - Abstract
Berry production is increasing worldwide each year; however, high production leads to labor shortages and an increase in wasted fruit during harvest seasons. This problem opened new research opportunities in computer vision as one main challenge to address is the uncontrolled light conditions in greenhouses and open fields. The high light variations between zones can lead to underexposure of the regions of interest, making it difficult to classify between vegetation, ripe, and unripe blackberries due to their black color. Therefore, the aim of this work is to automate the process of classifying the ripeness stages of blackberries in normal and low-light conditions by exploring the use of image fusion methods to improve the quality of the input image before the inference process. The proposed algorithm adds information from three sources: visible, an improved version of the visible, and a sensor that captures images in the near-infrared spectra, obtaining a mean F1 score of 0.909 ± 0.074 and 0.962 ± 0.028 in underexposed images, without and with model fine-tuning, respectively, which in some cases is an increase of up to 12% in the classification rates. Furthermore, the analysis of the fusion metrics showed that the method could be used in outdoor images to enhance their quality; the weighted fusion helps to improve only underexposed vegetation, improving the contrast of objects in the image without significant changes in saturation and colorfulness. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
39. STABILITY PREDICTION OF QUADRUPED ROBOT MOVEMENT USING CLASSIFICATION METHODS AND PRINCIPAL COMPONENT ANALYSIS.
- Author
-
DIVANDARI, Mohammad, GHABI, Delaram, and KALTEH, Abdol Aziz
- Subjects
PRINCIPAL components analysis ,ROBOT motion ,CENTRAL pattern generators ,K-nearest neighbor classification ,RANDOM forest algorithms ,MACHINE learning ,NAIVE Bayes classification - Abstract
This paper introduces a novel technique for predicting the stability of quadruped robot locomotion using a central pattern generator (CPG). The proposed method utilizes classification methods and principal component analysis (PCA) to predict stability. The objective of this study is to anticipate the stability or instability of robot movement by modifying controlling parameters, referred to as features. The simulations of robot locomotion are conducted in MATLAB/Simulink, generating a dataset of 82 observations with different parameters. Machine learning (ML) techniques are then applied, using classification methods and PCA, to determine the stability condition. Six classification methods, including K-nearest neighbors (KNN), support vector classifier (SVC), Gaussian Naïve Bayes (GaussianNB), logistic regression (LR), decision tree (DT), and random forest (RF) are implemented using Scikit-learn, an open-source ML library in Python. The performance of these classifiers is evaluated using four metrics: precision, recall, accuracy, and F1-score. The results indicate that KNN and SVC exhibit higher metric values compared to the other classifiers, making them more effective for stability prediction. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
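The abstract of entry 39 evaluates its classifiers with precision, recall, accuracy, and F1-score. The sketch below shows how these four metrics follow from the counts of true and false positives and negatives, using plain Python rather than the Scikit-learn calls the authors report using. The label vectors are invented for illustration, and treating 1 as the "stable" class is an assumption.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute precision, recall, accuracy and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard against empty denominators by reporting 0.0 in those cases.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(pairs)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f1": f1}


# Hypothetical labels: 1 = stable gait, 0 = unstable gait.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Here three of four stable cases are detected and one unstable case is misflagged as stable, so all four metrics equal 0.75. On less symmetric data the four values diverge, which is the reason for reporting them together.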
40. Sentinel-2A Verileriyle Trabzon İli 2019-2020 Yılları Arasında Ortaya Çıkan Sınıflandırma Farklarının Çeşitli Algoritmalarla Değerlendirilmesi.
- Author
-
Makineci, Hasan Bilgehan and Akosman, Esma Nur
- Abstract
Copyright of Turkish Journal of Remote Sensing / Türkiye Uzaktan Algılama Dergisi is the property of Turkish Journal of Remote Sensing and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2023
- Full Text
- View/download PDF
41. Evaluating Techniques Based on Supervised Learning Methods in Casas Kyoto Dataset for Human Activity Recognition
- Author
-
García-Restrepo, Johanna-Karinna, Ariza-Colpas, Paola Patricia, Butt-Aziz, Shariq, Piñeres-Melo, Marlon Alberto, Naz, Sumera, De-la-hoz-Franco, Emiro, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Saeed, Khalid, editor, Dvorský, Jiří, editor, Nishiuchi, Nobuyuki, editor, and Fukumoto, Makoto, editor
- Published
- 2023
- Full Text
- View/download PDF
42. Forecast the Early Stage of Diabetes Mellitus Using Machine Learning
- Author
-
Karthikeyini, S., Rupa, M., Athira, S., Ravikumar, M., Bansal, Jagdish Chand, Series Editor, Deep, Kusum, Series Editor, Nagar, Atulya K., Series Editor, Kumar, Sandeep, editor, Hiranwal, Saroj, editor, Purohit, S.D., editor, and Prasad, Mukesh, editor
- Published
- 2023
- Full Text
- View/download PDF
43. Machine Learning Algorithms for Binary Classification of Breast Cancer
- Author
-
Katiyar, Preeti, Singh, Krishna, Angrisani, Leopoldo, Series Editor, Arteaga, Marco, Series Editor, Chakraborty, Samarjit, Series Editor, Chen, Jiming, Series Editor, Chen, Shanben, Series Editor, Chen, Tan Kay, Series Editor, Dillmann, Rüdiger, Series Editor, Duan, Haibin, Series Editor, Ferrari, Gianluigi, Series Editor, Ferre, Manuel, Series Editor, Jabbari, Faryar, Series Editor, Jia, Limin, Series Editor, Kacprzyk, Janusz, Series Editor, Khamis, Alaa, Series Editor, Kroeger, Torsten, Series Editor, Li, Yong, Series Editor, Liang, Qilian, Series Editor, Martín, Ferran, Series Editor, Ming, Tan Cher, Series Editor, Minker, Wolfgang, Series Editor, Misra, Pradeep, Series Editor, Mukhopadhyay, Subhas, Series Editor, Ning, Cun-Zheng, Series Editor, Nishida, Toyoaki, Series Editor, Oneto, Luca, Series Editor, Panigrahi, Bijaya Ketan, Series Editor, Pascucci, Federica, Series Editor, Qin, Yong, Series Editor, Seng, Gan Woon, Series Editor, Speidel, Joachim, Series Editor, Veiga, Germano, Series Editor, Wu, Haitao, Series Editor, Zamboni, Walter, Series Editor, Zhang, Junjie James, Series Editor, Dubey, Ashwani Kumar, editor, Sugumaran, Vijayan, editor, and Chong, Peter Han Joo, editor
- Published
- 2023
- Full Text
- View/download PDF
44. Land Use and Land Cover Classification and Changes Detection Using Machine Learning Approaches
- Author
-
Ebenezer, P. Adlene, Manohar, S., Sakila, V. Sahaya, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Senjyu, Tomonobu, editor, So–In, Chakchai, editor, and Joshi, Amit, editor
- Published
- 2023
45. A Comprehensive Study of Crop Disease Detection Using Machine Learning Classification Techniques
- Author
-
Sagar, Sanjeela, Singh, Jaswinder, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Garg, Lalit, editor, Sisodia, Dilip Singh, editor, Kesswani, Nishtha, editor, Vella, Joseph G., editor, Brigui, Imene, editor, Misra, Sanjay, editor, and Singh, Deepak, editor
- Published
- 2023
46. The Study of the Unsupervised Classification Method Using the K-means Algorithm by a Proposition of a Simple Initialization Technique
- Author
-
Ouchani, Rahma, Merzougui, Mohammed, Nasri, M’barek, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Ben Ahmed, Mohamed, editor, Boudhir, Anouar Abdelhakim, editor, Santos, Domingos, editor, Dionisio, Rogerio, editor, and Benaya, Nabil, editor
- Published
- 2023
47. Review on Facial Recognition System: Past, Present, and Future
- Author
-
Shree, Manu, Dev, Amita, Mohapatra, A. K., Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Saraswat, Mukesh, editor, Chowdhury, Chandreyee, editor, Kumar Mandal, Chintan, editor, and Gandomi, Amir H., editor
- Published
- 2023
48. Methods of Recognition and Classification of Objects in Digital Logistics
- Author
-
Anantchenko, Igor, Zudilova, Tatiana, Ivanov, Sergei, Osipov, Nikita, Osetrova, Irina, Xhafa, Fatos, Series Editor, Ilin, Igor, editor, Jahn, Carlos, editor, and Tick, Andrea, editor
- Published
- 2023
49. Method for Assigning Railway Traffic Managers to Tasks along with Models for Evaluating and Classifying
- Author
-
Franciszek Restel, Szymon Haładyn, Ewa Mardeusz, Martin Starčević, and Mateusz Oziębłowski
- Subjects
train traffic controller, virtual reality, classification methods, Technology, Engineering (General). Civil engineering (General), TA1-2040, Biology (General), QH301-705.5, Physics, QC1-999, Chemistry, QD1-999
- Abstract
Incidents in railway systems lead to impediments and often to delays. Because the railway is an anthropotechnical system, two factors are considered sources of incidents: technical and human. Minimizing adverse incidents in the railway system is the subject of much discussion and research. Employees are one of the factors affecting the performance of railway systems. This article presents a two-stage method for assigning railway employees to tasks, along with models for evaluating and classifying those employees. The first stage uses a survey method and a fuzzy logic model. Each type of service is assigned feature values, yielding three parameterized employee-role profiles for the train traffic officer. In the second stage, the participant goes through two of the three available evaluation scenarios, during which errors made during the tasks are counted. Validation results of the proposed approach indicate that the method is 87% effective.
- Published
- 2024
50. Improving fraud detection with semi-supervised topic modeling and keyword integration
- Author
-
Marco Sánchez and Luis Urquiza
- Subjects
Fraud triangle, Human behavior, Topic modeling, Data mining, Text mining, Classification methods, Electronic computers. Computer science, QA75.5-76.95
- Abstract
Fraud detection through auditors’ manual review of accounting and financial records has traditionally relied on human experience and intuition. However, replicating this task with technological tools has been a challenge for information security researchers. Natural language processing techniques, such as topic modeling, have been explored to extract information and categorize large sets of documents. Topic modeling methods, such as latent Dirichlet allocation (LDA) and non-negative matrix factorization (NMF), have recently gained popularity for discovering thematic structures in text collections. However, unsupervised topic modeling may not always produce the best results for specific tasks, such as fraud detection. Therefore, in the present work, we propose to use semi-supervised topic modeling, which allows domain-specific knowledge to be incorporated through keywords in order to learn latent topics related to fraud. By leveraging relevant keywords, our proposed approach aims to identify patterns related to the vertices of the fraud triangle theory, providing more consistent and interpretable results for fraud detection. The model’s performance was evaluated by training on several datasets and testing on another that was not used in training. The results showed efficient average performance, with a 7% increase in performance compared to previous work. Overall, the study emphasizes the importance of deepening the analysis of fraud behaviors and proposing strategies to identify them proactively.
- Published
- 2024
Discovery Service for Jio Institute Digital Library