99 results on "Öznitelik seçimi" (feature selection)
Search Results
2. Turkish Text Classification Based On Wrapper Feature Selection Using Particle Swarm Optimization.
- Author
- ZORARPACI, Ezgi
- Subjects
NATURAL language processing, TEXT mining, FEATURE selection, DIGITAL technology, SENTIMENT analysis
- Abstract
Copyright of Afyon Kocatepe University Journal of Science & Engineering / Afyon Kocatepe Üniversitesi Fen Ve Mühendislik Bilimleri Dergisi is the property of Afyon Kocatepe University, Faculty of Science & Literature and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2024
3. Determination of Variables Affecting Reading Skills Using the Boruta Algorithm in a Turkish Sample from the PISA 2018.
- Author
- Şehribanoğlu, Sanem
- Subjects
FEATURE selection, RANDOM forest algorithms, SUBJECT headings, READING, GROUP reading
- Abstract
Copyright of Journal of Faculty of Educational Sciences is the property of Ankara University, Faculty of Educational Sciences.
- Published
- 2024
4. Band Selection in Hyperspectral Images with the Relief-F Algorithm [Hiperspektral görüntülerde Relief-F algoritması ile band seçimi].
- Author
- Yılmaz, Mehmet and Atasever, Ümit Haluk
- Abstract
Hyperspectral images contain detailed information for classification. However, their high dimensionality, large data volume and strong correlation between adjacent bands negatively affect classification results. The classification efficiency and accuracy of hyperspectral images can be improved with an appropriate feature selection method. In this study, the Relief-F feature selection algorithm was chosen because it is independent of the classification model, does not rely on a multicollinearity assumption, and can handle noisy values. The Salinas-A, Indian Pines and Pavia University datasets were used as experimental data to examine the effect of the Relief-F algorithm. After band selection, the Support Vector Machine classifier performed better on the Salinas-A and Indian Pines datasets, while the classification accuracy of the Random Forest method was largely preserved. The results show that the Relief-F algorithm identifies the most informative features in hyperspectral images and that the number of bands can be reduced by 60%-70% while maintaining good classification accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2024
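Entry 4 ranks hyperspectral bands by their Relief-F weights. The following is a minimal NumPy sketch of the textbook ReliefF weighting, not the authors' implementation. The function name `relieff`, the min-max scaling, the Manhattan distance and the `n_neighbors` default are all illustrative assumptions.

```python
import numpy as np


def relieff(X, y, n_neighbors=5, n_samples=None, seed=0):
    """Sketch of ReliefF: one relevance weight per feature (higher = more relevant)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, d = X.shape
    # Min-max scale each feature so per-feature differences lie in [0, 1].
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0
    Xs = (X - lo) / span
    classes, counts = np.unique(y, return_counts=True)
    prior = dict(zip(classes.tolist(), (counts / n).tolist()))
    m = n if n_samples is None else n_samples
    rng = np.random.default_rng(seed)
    weights = np.zeros(d)
    for i in rng.choice(n, size=m, replace=False):
        diff = np.abs(Xs - Xs[i])      # per-feature differences to every sample
        dist = diff.sum(axis=1)        # Manhattan distance (assumption)
        dist[i] = np.inf               # never pick the sample itself
        own = y[i].item()
        for c in classes.tolist():
            members = np.flatnonzero(y == c)
            k = min(n_neighbors, len(members) - (1 if c == own else 0))
            if k <= 0:
                continue
            nearest = members[np.argsort(dist[members])[:k]]
            contrib = diff[nearest].mean(axis=0) / m
            if c == own:
                weights -= contrib     # near hits: features that differ are penalised
            else:
                # near misses, weighted by the prior of the other class
                weights += prior[c] / (1.0 - prior[own]) * contrib
    return weights
```

To select bands, one would sort the weights in descending order with `np.argsort(-weights)` and keep the top fraction. Keeping roughly 30%-40% of the bands corresponds to the 60%-70% reduction reported above.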
5. A comparative analysis on the reliability of interpretable machine learning.
- Author
- YILDIRIM, Mustafa, OKAY, Feyza YILDIRIM, and ÖZDEMİR, Suat
- Subjects
*MACHINE learning, *FEATURE selection, *DEFAULT (Finance), *COMPARATIVE studies
- Abstract
There is often a trade-off between accuracy and interpretability in Machine Learning (ML) models: as a model becomes more complex, its accuracy generally increases and its interpretability decreases. Interpretable Machine Learning (IML) methods have emerged to make complex ML models interpretable while maintaining their accuracy, so accuracy remains unchanged while feature importance is determined. In this study, we aim to compare model-agnostic IML methods, including SHAP and ELI5, with intrinsic IML methods and Feature Selection (FS) methods in terms of the similarity of the attributes they select. We also compare the agnostic IML methods (SHAP, LIME, and ELI5) with one another in terms of the similarity of local attribute selection. Experiments were conducted on both general and private datasets to predict company default. The results confirm the reliability of agnostic IML methods, showing up to 86% similarity in selected attributes compared with intrinsic IML methods and FS methods. Additionally, certain agnostic IML methods can interpret models for local instances. The findings indicate that agnostic IML methods can be applied to complex ML models to provide both global and local interpretability while maintaining high accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2024
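Entry 5 reports up to 86% similarity between the attributes selected by different methods, but the listing does not say how that similarity was measured. The sketch below assumes a simple top-k overlap: the number of features two methods both place in their k highest-scoring features, divided by k. The feature names and scores are made up for illustration.

```python
def top_k(importances, k):
    """Names of the k features with the largest scores (ties keep insertion order)."""
    ranked = sorted(importances, key=importances.get, reverse=True)
    return set(ranked[:k])


def selection_similarity(imp_a, imp_b, k):
    """Share of the top-k features two importance rankings agree on (assumed metric)."""
    return len(top_k(imp_a, k) & top_k(imp_b, k)) / k


# Hypothetical global importances, e.g. mean |SHAP| values vs. a filter-method score.
shap_scores = {"age": 0.50, "debt": 0.30, "income": 0.20, "region": 0.01}
fs_scores = {"age": 0.40, "income": 0.35, "region": 0.20, "debt": 0.05}
print(selection_similarity(shap_scores, fs_scores, k=3))
```

Jaccard similarity (intersection over union) or a rank correlation would be equally plausible choices. The overlap ratio is used here only because it is the simplest to read as a percentage.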
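6. Comparison of Feature Selection Methods Using Traditional Machine Learning Methods and Metaheuristic Methods [Geleneksel Makine Öğrenmesi Yöntemleri ve Metasezgisel Yöntemlerle Öznitelik Seçim Yöntemlerinin Karşılaştırılması].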
- Author
- AÇAR, İsmail and AYDİLEK, İbrahim Berkan
- Abstract
Copyright of Dicle University Journal of Engineering / Dicle Üniversitesi Mühendislik Dergisi is the property of Dicle Universitesi.
- Published
- 2024
7. Improvement of Quality Performance in Mask Production by Feature Selection and Machine Learning Methods and An Application.
- Author
- TEBRİZCİK, Semra, ERSÖZ, Süleyman, and AKTEPE, Adnan
- Subjects
FEATURE selection, MACHINE learning, MANUFACTURING defects, EXTREME ultraviolet lithography, MEDICAL masks, MANUFACTURING processes, MASKS
- Abstract
Copyright of Journal of Defense Sciences / Savunma Bilimleri Dergisi is the property of Turkish Military Academy Defense Sciences Institute.
- Published
- 2024
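8. Early Detection and Diagnosis of Breast Cancer Based on Machine Learning Supported by Feature Selection [Öznitelik Seçimi ile Desteklenen Makine Öğrenmesine Dayalı Göğüs Kanserinin Erken Tespiti ve Teşhisi].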
- Author
- AKYEL, Cihan, CİYLAN, Bünyamin, and POLAT, Hüseyin
- Abstract
Copyright of Gazi Üniversitesi Fen Bilimleri Dergisi Part C: Tasarım ve Teknoloji is the property of Gazi University.
- Published
- 2024
9. Hyperparameter Tunning and Feature Selection Methods for Malware Detection.
- Author
- Yılmaz, Esra Kavalcı and Bakır, Halit
- Subjects
MALWARE, SMARTPHONES, MACHINE learning, FEATURE selection, ALGORITHMS
- Abstract
Copyright of Journal of Polytechnic is the property of Journal of Polytechnic.
- Published
- 2024
10. INVESTIGATING THE EFFECT OF FEATURE SELECTION METHODS ON THE SUCCESS OF OVERALL EQUIPMENT EFFECTIVENESS PREDICTION
- Author
- Özlem Kuvat and Ümit Yılmaz
- Subjects
feature selection, machine learning, overall equipment effectiveness, Technology, Engineering (General). Civil engineering (General), TA1-2040
- Abstract
Overall equipment effectiveness (OEE) describes production efficiency by combining availability, performance, and quality and is used to evaluate production equipment’s performance. This research’s aim is to investigate the potential of the feature selection techniques and the multiple linear regression method, which is one of the machine learning techniques, in successfully predicting the OEE of the corrugated department of a box factory. In the study, six different planned downtimes and information on seventeen different previously known concepts related to activities to be performed are used as input features. Moreover, backward elimination, forward selection, stepwise selection, correlation-based feature selection (CFS), genetic algorithm, random forest, extra trees, ridge regression, lasso regression, and elastic net feature selection methods are proposed to find the most distinctive feature subset in the dataset. As a result of the analyses performed on the data set consisting of 23 features, 1 output and 1204 working days of information, the elastic net - multiple linear regression model, which selects 19 attributes, gave the best average R² value compared to other models developed. Occam's razor principle is taken into account since there is not a great difference between the average R² values obtained. Among the models developed according to the principle, the stepwise selection - multiple linear regression model yielded the best R² value among those that selected the fewest features.
- Published
- 2023
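Entry 10 includes forward selection among the methods it pairs with multiple linear regression and evaluates by R². Below is a minimal sketch of greedy forward selection. It ranks candidates by in-sample R² and stops when the best remaining feature adds less than `min_gain`. The paper reports average R² values, probably from repeated validation, so the in-sample score, the stopping threshold and the synthetic data are assumptions for illustration.

```python
import numpy as np


def ols_r2(X, y):
    """In-sample R^2 of an ordinary-least-squares fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    centred = y - y.mean()
    return 1.0 - (resid @ resid) / (centred @ centred)


def forward_selection(X, y, min_gain=1e-3):
    """Greedy forward selection: add the column that raises R^2 most; stop when gain < min_gain."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    selected, best_r2 = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        scores = {j: ols_r2(X[:, selected + [j]], y) for j in remaining}
        j_best = max(scores, key=scores.get)
        if scores[j_best] - best_r2 < min_gain:
            break  # no remaining feature improves the fit enough
        selected.append(j_best)
        remaining.remove(j_best)
        best_r2 = scores[j_best]
    return selected, best_r2


# Synthetic example: only columns 0 and 2 drive the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] - 3.0 * X[:, 2] + rng.normal(scale=0.01, size=200)
print(forward_selection(X, y))
```

Stepwise selection extends this loop by also trying to drop a previously selected column after each addition.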
11. Comparison of Fuzzy Logic and Machine Learning Methods in the Detection of Hepatitis [Hepatit hastalığının tespitinde bulanık mantık ve makine öğrenmesi yöntemlerinin karşılaştırılması].
- Author
- Coşkun, Cengiz and Yüksek, Emre
- Abstract
Copyright of Dicle University Journal of Engineering / Dicle Üniversitesi Mühendislik Dergisi is the property of Dicle Universitesi.
- Published
- 2023
12. Early Detection of Diabetes by Reducing the Number of Features with Metaheuristic Methods [Metasezgisel yöntemlerle öznitelik sayısını azaltarak diyabetin erken dönemde tespiti].
- Author
- ÖZMEN, Tuğberk, KUZU, Üzeyir, KOÇYİĞİT, Yücel, and SARNEL, Haldun
- Subjects
*FEATURE selection, *METAHEURISTIC algorithms, *WHALES, *DIABETES
- Abstract
Diabetes is a metabolic disease that is common worldwide, and the number of people with diabetes is expected to increase every year. This has a negative impact on both individuals' quality of life and the health system, so it is important to diagnose the disease at an early stage. The high dimensionality of the data used for diagnosis increases the cost and time of computation. To avoid this, it is important to select the features most valuable for diagnosis. In this study, feature selection was performed with the Salp Swarm Algorithm, the Artificial Bee Colony Algorithm, the Whale Optimization Algorithm and the Ant Colony Algorithm, using samples from the UCI Machine Learning Repository. To evaluate the selected features, accuracy, sensitivity and specificity were calculated using the k-Nearest Neighbors (KNN), Naive Bayes (NB), Support Vector Machine (SVM) and Artificial Neural Network (ANN) methods. In predicting the probability of having diabetes, an accuracy of 99.04% was obtained with the k-Nearest Neighbors method. [ABSTRACT FROM AUTHOR]
- Published
- 2023
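Entry 12 evaluates each selected feature subset by accuracy, sensitivity and specificity. The standard definitions from a binary confusion matrix are shown here as a small helper. The labels are hypothetical, and treating 1 as the positive (diabetic) class is an assumption.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity (true-positive rate) and specificity (true-negative rate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


# Hypothetical labels: 1 = diabetic, 0 = non-diabetic.
print(binary_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 0, 1, 0, 0, 1, 1, 1]))
# {'accuracy': 0.625, 'sensitivity': 0.75, 'specificity': 0.5}
```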
13. A HYBRID DECISION SUPPORT SYSTEM APPLICATION WITH THE ANALYTIC HIERARCHY PROCESS AND DATA MINING TECHNIQUES: DIAGNOSIS OF COVID19 WITH COMPLETE BLOOD COUNT VALUES.
- Author
- Bursalı, Ahmet and Suner, Aslı
- Subjects
EOSINOPHILS, REVERSE transcriptase polymerase chain reaction, KRUSKAL-Wallis Test, STATISTICS, COVID-19, CLINICAL decision support systems, ANALYTIC hierarchy process, BASOPHILS, INTERNET, LEUCOCYTES, MEDICAL care, ARTIFICIAL intelligence, MACHINE learning, MANN Whitney U Test, LYMPHOCYTES, NEUTROPHILS, DIAGNOSTIC imaging, PLATELET count, DESCRIPTIVE statistics, RESEARCH funding, BLOOD cell count, DECISION making in clinical medicine, STATISTICAL sampling, SENSITIVITY & specificity (Statistics), DATA analysis software, DATA mining, ALGORITHMS, MONOCYTES
- Abstract
Copyright of Karya Journal of Health Science is the property of Karya Journal of Health Science.
- Published
- 2023
14. Feature Selection From MagFace Face Recognition Model With Optimization Algorithms.
- Author
- ÖZDEMIR, Mehmet Fatih and HANBAY, Davut
- Subjects
*FEATURE selection, *FACE perception, *OPTIMIZATION algorithms, *DEEP learning, *ARTIFICIAL intelligence
- Abstract
In recent years, with advances in hardware, many studies have been carried out in the field of artificial intelligence. Face recognition algorithms hold an important place among these developments, and the most successful of them are usually deep learning approaches. SphereFace, CosFace, ArcFace, and MagFace are important deep learning models in the literature. Despite their success, deep learning models are often computationally costly, so methods are needed to reduce their computational load. One effective approach is to select the most valuable embedding features for face recognition, which can reduce cost and further increase accuracy. In this study, the PSO, GA, SCA, and DE optimization algorithms were used to find the most valuable of the 512 embedding features of the MagFace model. As a result, accuracies of 99.83%, 98.57%, and 98.65% were reached with 193, 252, and 280 selected features on the LFW, CFP, and AGEDB datasets, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2023
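Entry 14 uses PSO, among other optimizers, to choose a subset of the 512 MagFace embedding dimensions. Below is a generic binary PSO sketch that uses a sigmoid transfer function, in the style of Kennedy and Eberhart's binary variant. It is not the paper's configuration: the function name, the hyper-parameter values and the toy fitness are assumptions. In a real run, `fitness` would score something like verification accuracy on the selected dimensions.

```python
import math
import random


def binary_pso(fitness, n_features, n_particles=20, n_iter=50,
               w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Maximise fitness(mask) over 0/1 feature masks with a basic binary PSO."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))  # clamp velocity
                # Sigmoid transfer: velocity -> probability that bit d is set.
                prob = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:                 # update personal best
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:                # update global best
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


# Toy fitness (hypothetical): number of bits matching a known "ideal" mask.
TARGET = [1, 0, 1, 1, 0, 0]


def toy_fitness(mask):
    return sum(a == b for a, b in zip(mask, TARGET))


best_mask, best_score = binary_pso(toy_fitness, n_features=len(TARGET))
print(best_mask, best_score)
```

Feature-selection fitness functions often add a small penalty on the number of selected bits so that the search prefers smaller subsets. Whether the paper does this is not stated in the abstract.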
15. Detection of brain tumor with a pre-trained deep learning model based on feature selection using MR images
- Author
- Fatih Demir, Berna Arı, and Kürşat Demir
- Subjects
feature selection, relieff algorithm, mobilenetv2, brain tumor, magnetic resonance imaging, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Brain tumors are among the most dangerous diseases in the world. A brain tumor destroys healthy brain tissue and multiplies abnormally, increasing the pressure inside the skull, which can lead to death if it is not diagnosed early. Magnetic Resonance Imaging (MRI) is a diagnostic method frequently used for soft tissues, and it gives successful results. In this study, brain tumors were detected automatically from MR images. A pre-trained Convolutional Neural Network (CNN) model, MobileNetV2, was used for feature extraction, and the ReliefF algorithm was then used for feature selection. The features extracted with MobileNetV2 and the features selected with the ReliefF algorithm were given separately to the classifiers, and system performance was tested. The experiments showed that the highest performance was obtained by combining MobileNetV2 feature extraction, ReliefF feature selection, and the KNN classifier.
- Published
- 2023
16. Extreme Learning Machine Algorithms for Prediction of Positive Rate in Covid-19: A Comparative Study
- Author
- Funda Kutlu Onay and Salih Berkan Aydemir
- Subjects
covid-19, extreme learning machines, prediction, feature selection, Technology, Engineering (General). Civil engineering (General), TA1-2040, Science, Science (General), Q1-390
- Abstract
Many pandemics have been recorded throughout world history. The Covid-19 outbreak, which emerged at the end of 2019, has recently been a prominent topic in the literature. In this work, extreme learning machine (ELM) algorithms are compared for predicting the positive rate in India, Turkey, Italy, the USA and the UK. The features used in the learning phase are determined with the F-test feature selection method. For each ELM approach, results are obtained for each country using the root mean square error (RMSE) criterion. The radial basis kernel function produces the best estimates, while the linear kernel function yields the highest RMSE. The lowest RMSE, 4.17E-03, was obtained for India with the radial basis kernel ELM. Because Turkey's data contain many outliers, Turkey has the highest RMSE among the countries (0.015-0.035) with the linear kernel method.
- Published
- 2023
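Entry 16 chooses its inputs with the F-test. For a continuous target, the common univariate form is F = r²/(1 - r²)·(n - 2), where r is the Pearson correlation between one feature and the target. This is, for example, the statistic behind scikit-learn's `f_regression`. The pure-Python sketch below computes that statistic and keeps the k best columns. The series and column names are hypothetical.

```python
def f_statistic(x, y):
    """Univariate regression F-statistic: r^2 / (1 - r^2) * (n - 2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no linear signal
    r2 = sxy * sxy / (sxx * syy)
    if r2 >= 1.0:
        return float("inf")  # perfectly linear feature
    return r2 / (1.0 - r2) * (n - 2)


def select_k_best(columns, y, k):
    """Keep the k columns with the largest F-statistics."""
    scores = {name: f_statistic(col, y) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical daily series: target positive rate and three candidate inputs.
positive_rate = [1, 2, 3, 4, 5, 6]
candidates = {
    "tests": [1.1, 1.9, 3.2, 3.9, 5.1, 6.0],
    "mobility": [3, 1, 4, 1, 5, 9],
    "noise": [2, 2, 1, 1, 2, 2],
}
print(select_k_best(candidates, positive_rate, k=2))
# ['tests', 'mobility']
```

The F-test only measures linear association with the target one feature at a time. Redundant inputs that are strongly correlated with each other can therefore both be kept.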
17. INVESTIGATING THE EFFECT OF FEATURE SELECTION METHODS ON THE SUCCESS OF OVERALL EQUIPMENT EFFECTIVENESS PREDICTION.
- Author
- YILMAZ, Ümit and KUVAT, Özlem
- Subjects
*FEATURE selection, *MACHINE learning, *RANDOM forest algorithms, *GENETIC algorithms, *REGRESSION analysis, *SUCCESS
- Abstract
Overall equipment effectiveness (OEE) describes production efficiency by combining availability, performance, and quality and is used to evaluate production equipment's performance. This research's aim is to investigate the potential of the feature selection techniques and the multiple linear regression method, which is one of the machine learning techniques, in successfully predicting the OEE of the corrugated department of a box factory. In the study, six different planned downtimes and information on seventeen different previously known concepts related to activities to be performed are used as input features. Moreover, backward elimination, forward selection, stepwise selection, correlation-based feature selection (CFS), genetic algorithm, random forest, extra trees, ridge regression, lasso regression, and elastic net feature selection methods are proposed to find the most distinctive feature subset in the dataset. As a result of the analyses performed on the data set consisting of 23 features, 1 output and 1204 working days of information, the elastic net - multiple linear regression model, which selects 19 attributes, gave the best average R² value compared to other models developed. Occam's razor principle is taken into account since there is not a great difference between the average R² values obtained. Among the models developed according to the principle, the stepwise selection - multiple linear regression model yielded the best R² value among those that selected the fewest features. [ABSTRACT FROM AUTHOR]
- Published
- 2023
18. Binary White Shark Optimization Algorithm for Feature Selection Problems [Öznitelik seçimi problemleri için ikili beyaz köpekbalığı optimizasyon algoritması].
- Author
- ONAY, Funda KUTLU
- Subjects
*COMMAND & control systems, *GAMMA distributions, *DEBUGGING, *COMPUTER software testing, *REAL-time control, *OPTIMIZATION algorithms
- Abstract
In this study, the geometric process (GP) model is considered in order to calculate the debugging and testing costs of a software product. Under the assumption of the GP model, the debugging and testing costs of the software product are obtained depending on the first and second moment functions of the GP. It is observed that the values of the first and second moment functions of the process must be known in order to calculate the debugging and testing costs. At the same time, the calculation of moment functions also depends on both the distribution of the first interarrival time of the GP and the estimates of the model and distribution parameters. In this study, the proposed debugging and testing costs are calculated for the data set containing 136 failure times of a real-time command and control system. For this dataset, it has been shown in previous studies that the GP with gamma distribution can be proposed as a model. Under gamma distribution assumption, the maximum likelihood estimates of the model parameters are obtained. Using the estimates of the model parameters, the first and second moment functions of the GP are calculated with the help of the numerical methods proposed for these functions. Finally, the debugging and testing costs are obtained for the data set. [ABSTRACT FROM AUTHOR]
- Published
- 2023
19. Effectiveness of Feature Selection Methods in Imbalanced Text Classification [Dengesiz Metin Sınıflandırmada Öznitelik Seçim Yöntemlerinin Etkililiği].
- Author
- TİRYAKİ, Hande and UYSAL, Alper Kürşat
- Published
- 2023
20. AN APPLICATION OF THE FEATURE SELECTION METHOD BASED ON PAIRWISE CORRELATION FOR DIAGNOSIS OF OVARIAN CANCER WITH MACHINE LEARNING.
- Author
- BAŞEĞMEZ, Hülya
- Subjects
FEATURE selection, GENETICS, CANCER diagnosis, OVARIAN cancer, MACHINE learning
- Abstract
Copyright of Bingol University Journal of Economics & Administrative Science is the property of Bingol University Journal of Economics & Administrative Science.
- Published
- 2023
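21. A Comparative Study on Sentiment Classification in Texts Using Filter-Based Feature Selection Methods [Filtre Tabanlı Öznitelik Seçim Yöntemleri Kullanılarak Metinlerde Duygu Sınıflandırması Üzerine Karşılaştırmalı Bir Çalışma].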
- Author
- SAĞBAŞ, Ensar Arif
- Abstract
Sentiment analysis, as a text classification problem, is the critical task of extracting subjective information from online text documents. An important problem in text classification is high dimensionality, and dimension reduction is an effective way to improve classification performance in machine learning: removing irrelevant features can reduce training time and improve classification accuracy. The performance of feature selection methods may vary with the characteristics of the dataset. In this study, the performance of 6 filter-based feature selection methods (Correlation-based feature selection, Chi-square, Gain ratio, Information gain, OneR, and Symmetric uncertainty coefficient) was tested and compared on 9 datasets frequently used in sentiment classification. Filter scores were calculated for each feature selection method on all datasets and sorted in descending order. New feature subsets were then created and classified by adding features one at a time to the previous subset, from the feature with the highest filter score to the one with the lowest. The computational results show that the proposed approach achieves an average accuracy of 94.34% with the Multinomial Naive Bayes classifier on the 9 general sentiment classification datasets. Considering the search space, it can be concluded that this approach can be improved further and is competitive with existing approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2023
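Entry 21 scores every feature with a filter, sorts the scores in descending order, and classifies nested subsets that grow by one feature at a time. The sketch below implements that procedure with information gain, one of the six filters listed, on binary term-presence features. The toy data are hypothetical. Classifying each subset (the paper uses Multinomial Naive Bayes) is left to the caller.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Information gain of a discrete feature column with respect to the labels."""
    n = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [lab for f, lab in zip(feature, labels) if f == value]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


def nested_subsets(columns, labels, score=information_gain):
    """Rank features by filter score (descending) and return the top-1, top-2, ... subsets."""
    ranked = sorted(columns, key=lambda name: score(columns[name], labels), reverse=True)
    return [ranked[:i] for i in range(1, len(ranked) + 1)]


labels = ["pos", "pos", "neg", "neg"]
columns = {"great": [1, 1, 0, 0], "awful": [0, 1, 1, 1], "the": [1, 1, 1, 1]}
for subset in nested_subsets(columns, labels):
    print(subset)  # each subset would then be passed to a classifier
```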
22. Extreme Learning Machine Algorithms for Prediction of Positive Rate in Covid-19: A Comparative Study.
- Author
- AYDEMİR, Salih Berkan and ONAY, Funda KUTLU
- Subjects
FEEDFORWARD neural networks, COVID-19 pandemic, PREDICTION models, FEATURE selection, KERNEL functions
- Abstract
Copyright of Duzce University Journal of Science & Technology is the property of Duzce University Journal of Science & Technology.
- Published
- 2023
23. Feature Selection with a Genetic Algorithm in Wind Power Forecasting [Rüzgar Gücü Tahmininde Genetik Algoritma ile Öznitelik Seçimi].
- Author
- YAĞMUR, Ece and YAĞMUR, Sercan
- Published
- 2022
24. Spam SMS Filtering for Turkish with Machine Learning Algorithms [Makine Öğrenmesi Algoritmaları ile Türkçe için İstenmeyen SMS Filtreleme].
- Author
- Parlak, Bekir
- Abstract
In this study, the effect of various feature selection approaches and preprocessing techniques on filtering spam messages in the Turkish-language Short Message Service (SMS) was investigated. In the filtering stage, the full feature set consists of the features produced by the Bag-of-Words (BoW) model. Distinctive features in the BoW are determined using feature selection methods and then fed into classification algorithms commonly used to classify SMS messages. The filtering framework was evaluated only on the Turkish SMS dataset. Extensive experimental analysis on the relevant datasets revealed that combinations of the MNB classifier and EFS feature selection methods provide better classification performance. The effectiveness of the feature selection methods varies slightly across classifiers. [ABSTRACT FROM AUTHOR]
- Published
- 2022
25. A Data Mining Application for Customer Churn Prediction [Müşteri Kayıplarının Tahmini Üzerine Bir Veri Madenciliği Uygulaması].
- Author
- Büyükkeçeci, Mustafa and Okur, Mehmet Cudi
- Abstract
Copyright of Dokuz Eylul University Muhendislik Faculty of Engineering Journal of Science & Engineering / Dokuz Eylül Üniversitesi Mühendislik Fakültesi Fen ve Mühendislik Dergisi is the property of Dokuz Eylul Universitesi Muhendislik Fakultesi Fen ve Muhendislik Dergisi.
- Published
- 2022
26. The Role of Feature Weighting Methods in Local Feature Selection Methods for Text Classification [Metin Sınıflandırma için Öznitelik Ağırlıklandırma Metotlarının Lokal Öznitelik Seçim Metotları Üzerindeki Rolü].
- Author
- Parlak, Bekir
- Subjects
*SUPPORT vector machines, *FEATURE selection, *AUTOMATIC classification, *ODDS ratio
- Abstract
With the development of internet technologies, the amount of textual data has increased significantly, and automatic text classification approaches have become important for making these data meaningful. Feature selection and feature weighting play an important role in automatic text classification. In this study, the effect of feature weighting methods on local feature selection methods is examined in detail, using two weighting methods, three local feature selection methods, three benchmark datasets, and two classifiers. The highest Micro-F1 and Macro-F1 scores were 92.88 and 65.55 for the Reuters-21578 dataset, 99.02 and 98.15 for the 20Newsgroup dataset, and 97.19 and 93.40 for the Enron1 dataset. Experimental results show that better results are obtained by combining the Odds Ratio (OR) feature selection method, Term Frequency (TF) feature weighting and the Support Vector Machine (SVM) classifier. [ABSTRACT FROM AUTHOR]
- Published
- 2022
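Entry 26 finds that the Odds Ratio (OR) local feature selection method works best in combination with TF weighting and an SVM. "Local" selection scores each term separately for each class and keeps the best terms per class. The sketch below computes the usual OR score, log[P(t|c)(1 - P(t|¬c)) / ((1 - P(t|c))P(t|¬c))], from document frequencies. The add-alpha smoothing, the union of per-class top-k terms and the toy corpus are assumptions for illustration.

```python
import math


def odds_ratio(docs, labels, term, target, alpha=1.0):
    """Log odds ratio of `term` for class `target` vs. all other classes.

    Probabilities are document frequencies with add-alpha smoothing (an assumption).
    """
    in_c = [term in d for d, lab in zip(docs, labels) if lab == target]
    out_c = [term in d for d, lab in zip(docs, labels) if lab != target]
    p = (sum(in_c) + alpha) / (len(in_c) + 2 * alpha)    # P(t | c)
    q = (sum(out_c) + alpha) / (len(out_c) + 2 * alpha)  # P(t | not c)
    return math.log(p * (1 - q) / ((1 - p) * q))


def local_top_terms(docs, labels, k):
    """Union of the k highest-OR terms of every class (ties broken alphabetically)."""
    vocab = sorted(set().union(*docs))
    selected = set()
    for c in sorted(set(labels)):
        # sorted() is stable, so equal scores keep the alphabetical vocabulary order.
        ranked = sorted(vocab, key=lambda t: odds_ratio(docs, labels, t, c), reverse=True)
        selected.update(ranked[:k])
    return selected


docs = [{"win", "free"}, {"free", "offer"}, {"meeting", "notes"}, {"meeting", "free"}]
labels = ["spam", "spam", "ham", "ham"]
print(local_top_terms(docs, labels, k=1))
```

OR rewards terms that are frequent in the target class and rare elsewhere. For this reason, per-class (local) selection is a natural fit for it. Term weighting such as TF is applied afterwards, when the documents are vectorised over the selected terms.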
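27. Classification of Skin Lesion Images Using a Feature Selection Algorithm on Pre-trained Convolutional Neural Network Models [Ön Eğitimli Evrişimsel Sinir Ağı Modellerinde Öznitelik Seçim Algoritmasını Kullanarak Cilt Lezyon Görüntülerinin Sınıflandırılması].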
- Author
- TAŞCI, Burak
- Abstract
According to the World Health Organization, the incidence of skin cancer has been increasing in recent years: between 2 and 3 million non-melanoma skin cancers and at least 132,000 malignant skin cancers occur worldwide each year. Appropriate automatic diagnosis of skin lesions and melanoma recognition can greatly improve the early detection of melanomas, and early diagnosis ensures that patients receive the correct diagnosis and treatment. In this study, to determine whether a skin lesion is malignant, deep features were extracted from skin lesion images with the pre-trained Convolutional Neural Network (CNN) architectures AlexNet and ResNet50 and then combined. Effective and distinctive features were then selected from these deep features with the ReliefF algorithm. Different classifier algorithms were applied to the combined deep features, and the cubic-type Support Vector Machine (SVM) was used because it gave the best results. The proposed method achieves a classification accuracy of 92.41% on the Kaggle dataset and 85.17% on the HAM10000 dataset. The experiments showed that the proposed model achieves higher accuracy than other studies. [ABSTRACT FROM AUTHOR]
- Published
- 2022
28. ANALYSIS OF THE FEATURES FOR AUTOMATIC CLASSIFICATION OF ACADEMIC PERFORMANCE.
- Author
- EREN, Hakan Alp and GUNAL, Efnan SORA
- Subjects
COLLEGE teachers, PERFORMANCE contracts in education, DATA mining, MACHINE learning, FEATURE selection
- Abstract
Copyright of Journal of Engineering & Architectural Faculty of Eskisehir Osmangazi University / Eskişehir Osmangazi Üniversitesi Mühendislik ve Mimarlık Fakültesi Dergisi is the property of Eskisehir Osmangazi University.
- Published
- 2022
29. Classification of the death ratio of COVID-19 Pandemic using Machine Learning Techniques.
- Author
- Ulas, Efehan and Filiz, Enes
- Subjects
COVID-19 pandemic, DEATH rate, EPIDEMIOLOGICAL models, MACHINE learning, FEATURE selection
- Abstract
Copyright of Erzincan University Journal of Science & Technology is the property of Erzincan Binali Yildirim Universitesi.
- Published
- 2022
30. Siber Saldırılar için Rastgele Orman Algoritması Kullanılarak Öznitelik Seçimi. [Feature Selection for Cyber Attacks Using the Random Forest Algorithm]
- Author
BİLEN, Abdulkadir and ÖZER, Ahmet Bedri
- Abstract
As data sizes grow, researchers need methods that make the analysis process easier. It is important to reduce the data size while increasing analysis accuracy: unnecessary fields should be ignored so that more accurate results are produced from fewer inputs. Feature selection is therefore one of the most important first steps in data analysis, and various machine learning methods are used for it. This study used Univariate Feature Selection, Recursive Feature Elimination, Tree-Based Feature Selection and Principal Component Analysis to determine the most important of the 13 features in the data set. The 6, 5 and 4 most important features were then used separately as inputs, and the cyber-attack method was predicted with the Random Forest algorithm. The highest accuracy, 97.24%, was obtained when the number of features was reduced to 4. It was concluded that restricting the prediction to the relevant features is important in terms of data size and speed. The results demonstrate once again the importance of feature selection. [ABSTRACT FROM AUTHOR]
- Published
- 2022
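Univariate feature selection scores each feature independently and keeps the best-scoring ones before the classifier is trained. The abstract does not say which univariate statistic was used; the sketch below assumes the one-way ANOVA F statistic, a common choice for classification, and returns the indices of the top k features, which could then be passed to a Random Forest. `anova_f` and `top_k_features` are hypothetical names, and at least two classes are assumed.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def anova_f(values: Sequence[float], labels: Sequence[int]) -> float:
    """One-way ANOVA F statistic of a single feature across class groups."""
    groups: Dict[int, List[float]] = defaultdict(list)
    for v, c in zip(values, labels):
        groups[c].append(v)
    n, k = len(values), len(groups)  # assumes k >= 2 classes
    grand_mean = sum(values) / n
    means = {c: sum(g) / len(g) for c, g in groups.items()}
    # Between-group and within-group sums of squares
    ss_between = sum(len(g) * (means[c] - grand_mean) ** 2 for c, g in groups.items())
    ss_within = sum((v - means[c]) ** 2 for c, g in groups.items() for v in g)
    if ss_within == 0:
        return float("inf")  # groups are perfectly separated
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def top_k_features(X: Sequence[Sequence[float]], y: Sequence[int], k: int) -> List[int]:
    """Indices of the k features with the largest F statistic, best first."""
    d = len(X[0])
    scores = [anova_f([row[f] for row in X], y) for f in range(d)]
    return sorted(range(d), key=lambda f: scores[f], reverse=True)[:k]


X = [[1.0, 5.0, 0.2], [1.2, 4.0, 0.1], [3.0, 5.5, 0.2], [3.1, 4.5, 0.3]]
y = [0, 0, 1, 1]
print(top_k_features(X, y, 2))
```

Running it for k = 6, 5 and 4 mirrors the study's comparison of different subset sizes.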
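31. Türkiye Covid-19 günlük hasta sayısındaki değişimin sınıflandırılmasına yönelik tahmininin destek vektör makineleri ve k-en yakın komşu algoritmaları ile gerçekleştirilmesi. [Classification-oriented prediction of the change in Turkey's daily number of Covid-19 patients using support vector machines and k-nearest neighbor algorithms]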
- Author
FİLİZ, Enes
- Subjects
COVID-19 pandemic, K-nearest neighbor classification, SUPPORT vector machines, COVID-19 vaccines, COVID-19, FEATURE selection
- Abstract
Since December 2019, the Covid-19 virus has affected our lives and continues to affect the whole world significantly. Investigating the indicators of the Covid-19 virus and vaccination studies is of great interest for overcoming the Covid-19 pandemic in line with World Health Organization recommendations. In this context, many scientific studies have revealed valuable information about the future of the virus. In this study, Covid-19 cases in Turkey were estimated, and changes in the daily number of cases were classified, using support vector machine and k-nearest neighbor algorithms. The indicators that play a critical role in classifying the daily number of patients were determined to be "positivity rate", "filiation rate", "workplace mobility" and "mobility in parks". The k-nearest neighbor algorithm (84.7%) was the most successful algorithm for estimating the daily number of cases when these highlighted features were used. [ABSTRACT FROM AUTHOR]
- Published
- 2022
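The k-nearest neighbor classifier labels a new day by a majority vote among the k most similar days in the training data. The sketch below assumes Euclidean distance, k = 3 and features already on comparable scales; the feature values and the "increase"/"decrease" labels are invented toy data, not values from the study, and `knn_predict` is a hypothetical name.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def knn_predict(train_X: Sequence[Sequence[float]], train_y: Sequence[Hashable],
                query: Sequence[float], k: int = 3) -> Hashable:
    """Return the majority label among the k training points nearest to query."""
    # Sort training indices by Euclidean distance to the query point
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy rows: [positivity rate, workplace mobility change] (hypothetical values)
train_X = [[0.05, -0.30], [0.06, -0.25], [0.12, 0.10], [0.14, 0.05], [0.13, 0.15]]
train_y = ["decrease", "decrease", "increase", "increase", "increase"]
print(knn_predict(train_X, train_y, [0.11, 0.08]))
```

An odd k avoids tied votes in a two-class problem such as increase/decrease.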
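32. Parkinson Hastalarının Tespitinde Karınca Koloni Algoritması ile Seçilen Özniteliklerin Performansa Etkisi [The Effect of Features Selected with the Ant Colony Algorithm on Performance in Detecting Parkinson's Patients]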
- Author
Ali Narin
- Subjects
Parkinson, min-max normalization, ant colony optimization algorithm, feature selection, Technology, Engineering (General). Civil engineering (General), TA1-2040, Science, Science (General), Q1-390
- Abstract
Parkinson's, a neurodegenerative disease, results from the gradual loss of dopamine-producing cells. This loss varies with age. Given that the world's population is aging, the disease can be expected to become more common in the coming years. Diagnosing Parkinson's disease takes a long time: there is no definitive diagnostic mechanism, and the patient is usually monitored for a long period before the disease can be diagnosed. This study proposes a diagnostic mechanism to assist neurologists, in which people with Parkinson's disease are detected automatically from voice data. Min-max normalization was applied to the extracted features, and features were selected with the ant colony algorithm (ACA) with the aim of increasing detection performance. Both normalization and ACA-based feature selection were shown to improve performance. The highest performance, 87.5% accuracy, 89.2% sensitivity, 85.8% specificity and 89.2% precision, was obtained with quadratic-kernel support vector machines and 30 features selected by ACA.
- Published
- 2020
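Min-max normalization rescales each feature to [0, 1] with x' = (x - min) / (max - min), so that features with large numeric ranges do not dominate kernel-based classifiers such as the SVM. A minimal sketch; mapping constant features to 0.0 is an assumption made here to avoid division by zero, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def fit_min_max(X: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-column minimum and maximum, computed on the training data."""
    cols = list(zip(*X))
    return [min(c) for c in cols], [max(c) for c in cols]


def apply_min_max(X: Sequence[Sequence[float]], lows: Sequence[float],
                  highs: Sequence[float]) -> List[List[float]]:
    """Scale each value with the stored min/max; constant columns become 0.0."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in X
    ]


train = [[2.0, 10.0], [4.0, 10.0], [6.0, 10.0]]
lows, highs = fit_min_max(train)
print(apply_min_max(train, lows, highs))
```

Fitting the minimum and maximum on the training split only and reusing them for test samples avoids leaking test information; as a consequence, test values can fall slightly outside [0, 1].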
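33. Öznitelik Seçme Yöntemlerinin Makine Öğrenmesi Tabanlı Saldırı Tespit Sistemi Performansına Etkileri. [Effects of Feature Selection Methods on the Performance of a Machine Learning-Based Intrusion Detection System]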
- Author
Emanet, Sura, Baydogmus, Gozde Karatas, and Demir, Onder
- Abstract
Copyright of Dicle University Journal of Engineering / Dicle Üniversitesi Mühendislik Dergisi is the property of Dicle Universitesi.
- Published
- 2021
34. İMMÜNOTERAPİ TEDAVİSİNİN BAŞARISINI ETKİLEYEN FAKTÖRLER: BİR İZ ANALİZİ ÇALIŞMASI. [Factors Affecting the Success of Immunotherapy Treatment: A Path Analysis Study]
- Author
ACAR, Saliha
- Abstract
Copyright of Journal of Engineering & Architectural Faculty of Eskisehir Osmangazi University / Eskişehir Osmangazi Üniversitesi Mühendislik ve Mimarlık Fakültesi Dergisi is the property of Eskisehir Osmangazi University.
- Published
- 2021
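35. Uyartım frekansının kestiriminde istatistiksel anlamlılığa dayalı olarak seçilen durağan durum görsel uyarılmış potansiyellere ait dalgacık özniteliklerinin değerlendirilmesi. [Evaluation of wavelet features of steady-state visual evoked potentials, selected on the basis of statistical significance, for estimating the stimulation frequency]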
- Author
Sayılgan, Ebru, Yüce, Yılmaz Kemal, and İşler, Yalçın
- Subjects
VISUAL evoked potentials, FEATURE selection, WAVELET transforms, ELECTROENCEPHALOGRAPHY, ENTROPY (Information theory), BRAIN-computer interfaces, HILBERT-Huang transform, WAVELETS (Mathematics)
- Abstract
Electroencephalography (EEG) is a noninvasive method for recording brain activity. Among EEG recording paradigms, recording while a visual stimulus is shown to the subject is one of the most popular. Recently, steady-state visually evoked potentials (SSVEP), in which visual objects blink at a fixed frequency, have become a commonly used method in brain-computer interfaces. Although various features extracted from SSVEP recordings have been used, wavelet-transform features should be preferred because these signals are nonstationary. In this study, common wavelet features were fed to classifiers to find the combination of mother wavelet and classifier that gives the highest accuracy in determining the stimulation frequency. Energy, variance and entropy features were extracted for the five well-known EEG frequency bands using six different mother wavelets, and the performances of six basic classifiers were compared. The study was run both for each subject individually and for all subjects together. The results showed that (i) ANOVA-based feature selection reduces performance, (ii) no single combination of classifier and mother wavelet is best when each subject is evaluated individually, and (iii) when all subjects are evaluated together, the highest performance was achieved by combining an ensemble learner with the Reverse Biorthogonal wavelet. [ABSTRACT FROM AUTHOR]
- Published
- 2021
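The energy, variance and entropy features described above can be illustrated with the simplest mother wavelet, the orthonormal Haar wavelet; the study itself compared six mother wavelets, and mapping decomposition levels onto the five EEG bands depends on the sampling rate, which this sketch ignores. Entropy is assumed here to be the Shannon entropy of each band's normalized coefficient energies. The signal length must be divisible by 2**levels, and all function names are hypothetical.

```python
import math
from statistics import pvariance
from typing import Dict, List, Sequence, Tuple


def haar_step(x: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar DWT (len(x) must be even)."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def band_features(coeffs: Sequence[float]) -> Dict[str, float]:
    """Energy, variance and Shannon entropy (bits) of normalized coefficient energies."""
    energy = sum(c * c for c in coeffs)
    probs = [c * c / energy for c in coeffs if c != 0.0] if energy > 0 else []
    entropy = -sum(p * math.log2(p) for p in probs)
    return {"energy": energy, "variance": pvariance(coeffs), "entropy": entropy}


def wavelet_features(signal: Sequence[float], levels: int) -> List[Dict[str, float]]:
    """Features of the detail band at each level, then of the final approximation."""
    feats = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        feats.append(band_features(detail))
    feats.append(band_features(approx))
    return feats


print(wavelet_features([1, 2, 3, 4, 5, 6, 7, 8], levels=2))
```

Because the Haar transform is orthonormal, the band energies add up to the energy of the original signal, which is a quick sanity check for any implementation.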
36. One-dimensional Center Symmetric Local Binary Pattern Based Epilepsy Detection Method.
- Author
METİN, Serkan
- Subjects
DIAGNOSIS of epilepsy, ELECTROENCEPHALOGRAPHY, FEATURE extraction, KERNEL functions, WAVELET transforms
- Abstract
Epilepsy is diagnosed from EEG signals through visual/manual evaluation by a neurologist. This evaluation is laborious, and its results vary with the neurologist's level of experience. Automated systems built with advanced signal processing techniques are therefore important for diagnosis. In this study, a new feature extraction method based on a multiple-kernel one-dimensional center symmetric local binary pattern (1D-CSLBP) is proposed to identify epileptic seizures. To strengthen the method, multi-level feature extraction was carried out: the discrete wavelet transform (DWT) was used to generate the levels, and features were extracted from the low-pass filter coefficients (L bands) obtained at each level. Neighborhood component analysis (NCA) was used to select the most distinctive features, which were then classified with the k-nearest neighbors (kNN) algorithm. Combining multiple-kernel 1D-CSLBP with NCA yielded a high-performance method: the 1D-CSLBP and NCA-based method reached 100.0% accuracy in the A-E, A-D-E, D-E and C-E cases. [ABSTRACT FROM AUTHOR]
- Published
- 2021
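The abstract gives no formula for 1D-CSLBP, so the sketch below assumes one plausible reading: for each sample, compare the r pairs of neighbours placed symmetrically around it, set bit j when the right neighbour exceeds the left one by more than a threshold, and histogram the resulting r-bit codes. With r = 4 this yields a 16-bin feature vector. The paper's multiple-kernel and multi-level (DWT L band) variants would apply the same idea with several neighbourhood definitions and to each level's coefficients; `cslbp_1d_histogram`, `radius` and `threshold` are hypothetical names.

```python
from typing import List, Sequence


def cslbp_1d_histogram(signal: Sequence[float], radius: int = 4,
                       threshold: float = 0.0) -> List[int]:
    """Histogram of 1D center-symmetric LBP codes (assumed formulation).

    For each interior sample i, bit j-1 (j = 1..radius) is set when
    signal[i + j] - signal[i - j] > threshold, giving codes in [0, 2**radius).
    """
    hist = [0] * (2 ** radius)
    for i in range(radius, len(signal) - radius):
        code = 0
        for j in range(1, radius + 1):
            if signal[i + j] - signal[i - j] > threshold:
                code |= 1 << (j - 1)
        hist[code] += 1
    return hist


# A steadily rising segment produces only the all-ones code (15 for radius 4).
print(cslbp_1d_histogram(list(range(20))))
```

The histogram, not the per-sample codes, is what would be passed on to a selector such as NCA and then to the kNN classifier.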
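37. Kendini tekrarlayan derin sinir ağlarının öznitelik seçim yöntemleri ile iyileştirilmesi ve zaman serisi olarak ele alınan otomatik tanımlama sistemi verilerinde kullanımı. [Improving deep recurrent neural networks with feature selection methods and their use on automatic identification system data treated as time series]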
- Author
Doğan, Yunus
- Subjects
RECURRENT neural networks, BOX-Jenkins forecasting, FEATURE selection, HAZARDOUS substances, AUTOMATIC identification, MACHINE learning
- Abstract
The Automatic Identification System (AIS) is an observation and analysis system that has become compulsory because of the risks of maritime transportation, such as collision, fire, and spillage of hazardous or polluting substances. The literature contains applications of basic mathematical models, statistical models and machine learning algorithms that use AIS data to detect these dangers in advance and to make ship travel controlled and safe. In this study, AIS data were treated as time series, and models developed with the Autoregressive Integrated Moving Average (ARIMA), Multilayer Perceptron (MLP) and Deep Recurrent Neural Networks (DRNN), alongside a traditional route estimation model, were compared for accuracy. In addition, feature selection techniques were incorporated as weights in the MLP and DRNN models, and new algorithms were proposed based on these improvements. Relief, Pearson's Correlation, Gain Ratio and Information Gain (IG) methods were used to compare the accuracy of the route and collision estimations. For these accuracy tests, AIS data from certain time periods in the Çanakkale Strait and the Sea of Marmara were used. The results showed that all approaches achieved similar, high accuracy in the Çanakkale Strait because ships move linearly there. In the Sea of Marmara, by contrast, the best approach was the improved DRNN with IG, owing to the sea's irregular structure. [ABSTRACT FROM AUTHOR]
- Published
- 2020
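Information gain, one of the criteria listed above, measures how much knowing a feature X reduces uncertainty about the target Y: IG(Y; X) = H(Y) minus the sum over values v of P(X = v) · H(Y | X = v). The sketch assumes a discrete feature (continuous AIS attributes such as speed or course would first have to be binned) and does not show how the study used the scores to weight the MLP and DRNN inputs. Function names are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (bits) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """IG(Y; X) = H(Y) - sum_v P(X = v) * H(Y | X = v) for a discrete feature X."""
    n = len(labels)
    conditional = 0.0
    for v, count in Counter(feature).items():
        subset = [y for x, y in zip(feature, labels) if x == v]
        conditional += (count / n) * entropy(subset)
    return entropy(labels) - conditional


labels = ["safe", "safe", "risk", "risk"]
print(information_gain([0, 0, 1, 1], labels))  # feature fully determines the label
print(information_gain([0, 1, 0, 1], labels))  # feature says nothing about the label
```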
38. Data Mining Genome-Based Algorithm for Optimal Gene Selection and Prediction of Colorectal Carcinoma.
- Author
BANJOKO, Alabi Waheed
- Subjects
COLON cancer, DATA mining, ALGORITHMS, FORECASTING, GENE expression profiling
- Abstract
Objective: This study presents a method for the optimal selection of gene subsets to enhance the non-clinical diagnostic classification and prediction of colorectal cancer, using gene expression levels from profiles obtained with an Affymetrix oligonucleotide array. Material and Method: A hybrid multi-objective Support Vector Machine (SVM) feature selection and classification algorithm was employed to determine the biomarker gene subsets that are highly statistically and clinically relevant to the 62 (tumour or normal) responses of the gene expression levels. Gene selection was done in two stages: the first stage used a Bayesian t-test to prune non-informative genes, and the second employed a multi-objective optimization method that adds genes sequentially to determine the optimal pre-selected gene subsets. The SVM with RBF kernel (SVMREF) was fitted sequentially to select the set of near-optimal genes correlated with the response class. Results: The optimally selected gene subset yielded an accuracy of 90.1% on test data that were never used in building the algorithm. Furthermore, principal component analysis and complete-linkage hierarchical clustering indicated near-perfect discrimination between the two clinical response groups of the patients' colorectal cancer status. Conclusion: This work demonstrates that non-clinical diagnosis and prediction of colon cancer from patients' gene signatures in microarray expression data is feasible when appropriate data mining tools are used. [ABSTRACT FROM AUTHOR]
- Published
- 2020
39. Android Platformunda Kötücül Yazılım Tespiti: Literatür İncelemesi. [Malware Detection on the Android Platform: A Literature Review]
- Author
PEYNİRCİ, Gökçer and EMİNAĞAOĞLU, Mete
- Abstract
Copyright of International Journal of Informatics Technologies is the property of Institute of Informatics, Gazi University.
- Published
- 2020
40. DETECTION OF LUNG DISORDERS USING EMBEDDED AND WRAPPER FEATURE SELECTION METHODS
- Author
Mustafa Alptekin ENGİN and Selim ARAS
- Subjects
Engineering, Electrical and Electronic, Respiratory sounds, feature extraction, feature selection, classification
- Abstract
Despite the advances in biomedical signal processing in recent years, the need for fast and highly accurate diagnostic systems for detecting lung disorders persists. The database used in the study consists of 150 normal and 444 abnormal lung sounds, obtained by automatically detecting respiratory cycles in recordings taken from 94 different people during physical examination. Twelve different feature extraction methods were applied in the time and frequency domains, and all data were split into 80% for training and 20% for testing. The features were evaluated using embedded and wrapper feature selection methods: recursive feature elimination, adaptive structure learning, dependence-guided unsupervised feature selection, unsupervised feature selection with ordinal locality, feature selection via concave minimization, and least absolute shrinkage and selection operator (LASSO) feature selection. The features were classified with linear support vector machines, k-nearest neighbors, decision trees and naive Bayes. When the number of features is not limited, 97.3% accuracy is obtained by using recursive feature elimination together with the k-nearest neighbor classifier. When the number of features is limited to three, a classification accuracy of 91.4% was achieved using the adaptive structure learning feature selection method with decision trees.
- Published
- 2022
41. A Comparison of the Multivariate Calibration Methods with Feature Selection for Gas Sensors' Long-Term Drift Effect.
- Author
ERGÜN, Gülnur Begüm and GÜNEY, Selda
- Subjects
ELECTRONIC noses, DETECTORS, CALIBRATION, SIGNAL processing, STANDARDIZATION
- Abstract
In many electronic nose applications where gas sensors are used for a long time, an undesirable drift effect occurs in the sensors and degrades classification quality. Although sensor drift is inevitable, its effect can be reduced with calibration transfer methods. This paper compares various multivariate standardization methods in order to find an effective calibration approach on a comprehensive, publicly available online dataset. Three methods were applied: direct standardization (DS), orthogonal signal correction (OSC) and piecewise direct standardization (PDS). These three methods were also applied to data consisting of selected features. The results show that classification success increased when multivariate calibration was applied to the selected features. They also demonstrate that using the best features in the signal processing stage can play an important role in calibration success. This outcome may offer a new perspective for future work. [ABSTRACT FROM AUTHOR]
- Published
- 2019
42. Makine Öğrenmesi ve Öznitelik Seçim Yöntemleriyle Saldırı Tespiti. [Intrusion Detection with Machine Learning and Feature Selection Methods]
- Author
KAYNAR, Oğuz, ARSLAN, Halil, GÖRMEZ, Yasin, and IŞIK, Yunus Emre
- Abstract
Copyright of International Journal of Informatics Technologies is the property of Institute of Informatics, Gazi University.
- Published
- 2018
43. Sınıflandırma için diferansiyel mahremiyete dayalı öznitelik seçimi. [Differential privacy-based feature selection for classification]
- Author
Var, Esra and İnan, Ali
- Abstract
Selecting a relevant subset of attributes is one of the most important data preprocessing steps in data mining and machine learning solutions. For classification, selection is based on the correlation between an attribute and the class attribute. Although there are various studies on privacy-preserving classification, the literature offers no attribute selection solution for such work. In this study, novel attribute selection methods are proposed based on differential privacy, the state-of-the-art solution in statistical database security. The proposed solutions are implemented with the popular data mining library WEKA, and experimental results confirm their positive effect on classification accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2018
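Differential privacy is commonly achieved by adding calibrated random noise. The sketch below shows one generic way to perturb attribute selection: it adds Laplace noise with scale sensitivity / epsilon to each attribute's relevance score and keeps the k attributes with the highest noisy scores. This is the standard Laplace mechanism applied to hypothetical scores, not the methods proposed by the authors. The `sensitivity` value must bound how much one record can change any score, which depends on the scoring function and is simply assumed here; the sketch also performs no privacy accounting across the k selections, so it carries no formal epsilon guarantee as written.

```python
import random
from typing import List, Optional, Sequence


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials with mean scale."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_top_k(scores: Sequence[float], k: int, epsilon: float,
                sensitivity: float, seed: Optional[int] = None) -> List[int]:
    """Indices of the k attributes with the largest Laplace-perturbed scores."""
    rng = random.Random(seed)
    scale = sensitivity / epsilon  # smaller epsilon means more noise
    noisy = [s + laplace_noise(scale, rng) for s in scores]
    return sorted(range(len(scores)), key=lambda i: noisy[i], reverse=True)[:k]


# Hypothetical attribute-class correlation scores
scores = [0.9, 10.0, 20.0, 1.0]
print(noisy_top_k(scores, k=2, epsilon=1.0, sensitivity=0.01, seed=0))
```

With large score gaps relative to the noise scale, the noisy ranking matches the true one; as epsilon shrinks, weaker attributes increasingly displace stronger ones, which is the accuracy cost of privacy.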
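44. Büyük Dünya Endeksleri Kullanılarak BIST-100 Endeksi Değişim Yönünün Makine Öğrenmesi Algoritmaları ile Sınıflandırılması [Classification of the Direction of Change of the BIST-100 Index with Machine Learning Algorithms Using Major World Indices]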
- Author
Hasan Aykut Karaboğa, Serkan Akogul, and Enes Filiz
- Subjects
BIST-100 index, Machine learning, Feature selection, Classification algorithms, Science, General Medicine
- Abstract
The Borsa İstanbul 100 (BIST-100) index, together with other major world indices, has become part of the globalization of financial markets, and analyzing the relationships between indices can give investors significant advantages. Building on this, the study aims to classify the direction of change (increase/decrease) of the BIST-100 index with various machine learning algorithms, using major world indices and some macroeconomic indicators. To this end, the variables that play an active role in classifying the direction of change of the BIST-100 index were identified, and the study examined whether classification success changes when these variables are used. With all variables, logistic regression gave the most accurate classification (70.6%); with feature selection, the Support Vector Machine with the PUK kernel gave the most accurate classification (71.9%). Higher classification success was thus obtained with fewer variables.
- Published
- 2021
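45. Öznitelik Seçiminde Genetik Algoritma Kullanılarak Kur’an-ı Kerim Ayetlerinin Otomatik Sınıflandırılması [Automatic Classification of Verses of the Holy Qur'an Using a Genetic Algorithm for Feature Selection]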
- Author
MERT, Fatih, AYDIN, Muhammed Ali, and ORMAN, Zeynep
- Subjects
Engineering, Quran, Text Classification, Genetic Algorithm, Machine Learning, Feature Selection
- Abstract
Text classification, also known as text tagging, is the process of dividing a given text into organized groups. Using Natural Language Processing methods, text classifiers can automatically analyze text and then assign a set of predefined tags or categories based on its content. For a verse of the Holy Qur'an, the main purpose of labeling is to determine the theme of the verse. However, current approaches to verse tagging depend primarily on the availability of scholars with deep expertise in the Arabic language and Qur'anic exegesis. This study proposes automating the task of tagging Qur'anic verses with text classification algorithms. In the experiments with the classification algorithms, the 15 predefined categories to which the English translations of the verses belong were used as features. Unlike similar studies in the literature, a Genetic Algorithm was used in the feature selection stage, with the aim of improving final performance through this intermediate step. Finally, the performance of the classification models is compared using various performance evaluation metrics.
- Published
- 2022
46. Feature Selection by Genetic Algorithm for Wind Power Prediction
- Author
ÇETİN YAĞMUR, Ece and YAĞMUR, Sercan
- Subjects
Engineering, Machine learning, Wind power, Renewable energy, Feature selection, Genetic algorithm
- Abstract
The need for renewable energy sources for sustainable development is increasing every day, and wind energy is one of these sources. Because of the stochastic nature of wind, forecasting wind speed and wind power has attracted great interest from researchers in recent years. In this study, wind power was forecast for a wind turbine in Turkey using the data set collected by its SCADA system during 2018 and the meteorological data set published by NASA for the same location. The input variables were wind speed, wind direction and the theoretical power curve from the SCADA system, meteorological parameters from the NASA system, and historical wind power data. Unnecessary features that increase computational complexity were removed from the model with a wrapper selection method to improve model performance, using a Genetic Algorithm (GA) as the wrapper. The study compared the predictive power of different machine learning algorithms under different performance criteria and evaluated the effect of feature selection on the model. In the final model proposed by the GA, the number of variables was reduced from 47 to 9, and a strong prediction model with an R² of 0.98 was obtained with the fewest variables.
- Published
- 2022
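In a wrapper method, a search algorithm proposes feature subsets and a model's performance on each subset serves as the fitness. The sketch below is a generic genetic algorithm over binary masks (tournament selection, one-point crossover, bit-flip mutation and elitism); the hyperparameter values are assumptions, and `toy_fitness` is a hypothetical stand-in for what would, in practice, be a cross-validated score (such as R²) of a model trained on the selected columns, with a penalty on subset size.

```python
import random
from typing import Callable, List, Optional, Tuple

Mask = List[int]


def ga_feature_selection(n_features: int, fitness: Callable[[Mask], float],
                         pop_size: int = 20, generations: int = 40,
                         crossover_rate: float = 0.9, mutation_rate: float = 0.05,
                         seed: Optional[int] = None) -> Tuple[Mask, float]:
    """Search binary feature masks (1 = keep feature) that maximize fitness(mask)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]

    def tournament() -> Mask:
        # Binary tournament: the fitter of two random individuals wins
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    best = max(pop, key=fitness)
    for _ in range(generations):
        children = [best[:]]  # elitism: the best mask always survives
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            if rng.random() < crossover_rate and n_features > 1:
                cut = rng.randint(1, n_features - 1)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best, fitness(best)


USEFUL = {0, 3, 5}  # hypothetical informative feature indices for the toy fitness


def toy_fitness(mask: Mask) -> float:
    # Reward informative features, penalize every selected feature
    return sum(mask[i] for i in USEFUL) - 0.1 * sum(mask)


mask, score = ga_feature_selection(8, toy_fitness, seed=1)
print(mask, score)
```

Because a real fitness evaluation means training and validating a model, practical wrappers usually cache fitness values per mask; this sketch recomputes them for simplicity.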
47. Makine Öğrenimi Teknikleri kullanılarak COVID-19 Pandemisinin ölüm oranının sınıflandırılması [Classification of the Death Ratio of the COVID-19 Pandemic Using Machine Learning Techniques]
- Author
ULAŞ, Efehan and FİLİZ, Enes
- Subjects
Classification, Machine learning, Decision tree, COVID-19, Feature selection, Engineering
- Abstract
Since the COVID-19 pandemic appeared, many epidemiological models have been developed around the world to estimate the number of infected individuals and the death ratio of the COVID-19 outbreak. Several models have been developed for COVID-19 using machine learning techniques; however, studies that consider feature selection in detail are very limited. The aim of this study is therefore to (i) investigate the independent and interactive effects of a diverse set of features and (ii) find the algorithms that are significant for classifying the death ratio of the COVID-19 outbreak. Logistic regression and decision tree algorithms (C4.5, Random Forests and REPTree) were found to be the most suitable. The features identified by the feature selection methods are the number of new tests per thousand, new cases per million, hospital patients per million, and weekly hospital admissions per million. The significance of this study is that high classification accuracy was obtained with only a few features. It shows that only the most relevant features need to be considered in classification and that using all variables is not necessary.
- Published
- 2022
48. Feature selection with ant colony algorithm
- Author
Akcan, Umut, Eroğlu, Duygu Yılmaz, and Bursa Uludağ Üniversitesi/Fen Bilimleri Enstitüsü/Endüstri Mühendisliği Anabilim Dalı.
- Subjects
Data pre-processing, Ant colony algorithm, Feature selection, Hybrid algorithm, Classification
- Abstract
With developing information technologies, the amount of data is growing rapidly. The purpose of data mining is to extract meaningful information from these data. Large, multidimensional data increase computational costs and make it harder to extract meaningful information. The purpose of feature selection is to reduce the dimensionality of the data while keeping information loss to a minimum. In the literature, different feature selection approaches have been proposed under the headings of filter, wrapper, embedded and hybrid methods. This thesis proposes a hybrid approach using the ant colony algorithm. Hybrid methods consist of two steps: in the first step of the proposed method, features are selected in an unsupervised manner with the ant colony algorithm; in the second step, classification models are built with k-nearest neighbor and support vector machine classifiers. The results were compared with those of a published study that uses the ant colony algorithm, and better results were achieved on half of the datasets common to both. After these results verified the effectiveness of the proposed method, different classifiers were used with 10-fold cross-validation to decide which features should be used to achieve higher accuracy. In addition, analyses were carried out to show how the presence or absence of even a single feature affects the results, emphasizing the importance of feature selection. Finally, beyond accuracy, the study analyzed how the F-score, calculated from precision and the true positive rate, changes across different classifiers and training/test configurations, and interpreted the results.
- Published
- 2022
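The ant colony thesis above reports the F-score alongside accuracy, computing it from precision and recall (true positive rate). The minimal Python sketch below computes these metrics for a single positive class and shows why accuracy alone can mislead on imbalanced data. The function names and toy labels are invented for illustration and are not taken from the thesis.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall (true positive rate) and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy, imbalanced labels (hypothetical, not thesis data): predicting the
# majority class everywhere looks good on accuracy but finds no positives.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 10
print(accuracy(y_true, y_pred))
print(precision_recall_f1(y_true, y_pred))
```

On this toy input the accuracy is 0.8 while precision, recall and F1 are all zero. This is the kind of gap that motivates reporting the F-score next to accuracy.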
49. The design and implementation of a method for question answering systems based on hybrid machine learning techniques
- Author
-
Çınaroğlu, Sinem, Bulut, Hasan, and Ege University, Graduate School of Natural and Applied Sciences, Department of Computer Engineering
- Subjects
Feature Selection ,Question Classification ,Deep Learning-Based Feature Representation ,Question Answering Systems ,Majority Voting ,Ensemble Learning
- Abstract
Question answering systems are one of the main research areas of natural-language human-computer interaction. They are architectures that use a document space to automatically answer questions that people ask in natural language. However, the sparsity, high dimensionality and redundancy of the feature spaces used in natural language processing make it considerably harder to return the correct answer to a user's question. Machine learning methods are among the most frequently used solutions to this difficulty. This thesis proposes an effective feature selection method, based on machine learning techniques, that represents the data as accurately as possible. In addition, unlike previous studies, it treats question answering as a classification problem. To solve it, the thesis develops a hybrid system design that uses deep learning-based question answering systems. The proposed methods are then tested on the TREC and SQuAD datasets, which are widely used in question answering research. Their results are compared with the individual performance of baseline machine learning methods. Performance is evaluated with the accuracy and F1-score metrics. The experiments show that the proposed methods improve on the baseline machine learning methods in both question answering and question classification.
- Published
- 2022
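The question answering thesis above lists majority voting among its subject headings. The abstract does not say which base models are combined, so the sketch below assumes simple hard voting over predicted labels. The model names and the TREC-style question-type labels (ABBR, DESC, ENTY, HUM, LOC, NUM) are hypothetical placeholders.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Hard (majority) voting over the label outputs of several classifiers.

    predictions_per_model: list of equal-length label lists, one per model.
    On a tie, the label first seen in model order wins, because
    Counter.most_common keeps first-encountered order among equal counts.
    """
    n_samples = len(predictions_per_model[0])
    if any(len(preds) != n_samples for preds in predictions_per_model):
        raise ValueError("all models must predict the same number of samples")
    combined = []
    for i in range(n_samples):
        votes = Counter(preds[i] for preds in predictions_per_model)
        combined.append(votes.most_common(1)[0][0])
    return combined


# Hypothetical outputs of three base classifiers on four questions
svm_out = ["LOC", "NUM", "HUM", "DESC"]
nb_out = ["LOC", "HUM", "HUM", "ENTY"]
cnn_out = ["ENTY", "NUM", "HUM", "ABBR"]
print(majority_vote([svm_out, nb_out, cnn_out]))
```

The tie rule is a design choice. Soft voting, which averages class probabilities, is a common alternative when the base models expose probability estimates.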
50. Kolektif makine öğrenmesi tabanlı ağ saldırı tespiti [Ensemble machine learning-based network intrusion detection]
- Author
-
Emanet, Şura, Demir, Önder, Karataş Baydoğmuş, Gözde, Marmara University, Graduate School of Natural and Applied Sciences, and Department of Computer Engineering
- Subjects
Machine Learning ,Feature Filtering and Intrusion Detection ,Feature Selection ,Ensemble Learning ,Intrusion Detection System
- Abstract
Internet use is spreading rapidly, and the number of users who spend time online grows steadily. Together these bring cyber risks and threats with them. Malicious users can cause serious damage to the systems and applications in these environments, where many valuable assets such as information, ideas and money are shared. Intrusion Detection Systems (IDSs) play a critical role in securing systems and applications on the Internet. They analyze network activity and traffic to detect possible attacks, violations and threats. Besides classical methods, many machine learning techniques can be used to train them. Recent IDS research shows a steadily growing number of studies that choose machine learning techniques in order to build a dynamic security mechanism. This study focuses on building a high-performance, high-accuracy IDS by using feature selection and ensemble learning methods. The quality of the dataset directly affects IDS efficiency. The study therefore chose CSE-CIC-IDS2018, a well-known, up-to-date IDS dataset with a wide variety of attacks. In the first stage, features were selected with Spearman's correlation analysis, Recursive Feature Elimination (RFE) and the chi-square test, to improve the attack detection process and shorten its duration. The original datasets were compared with the new datasets built from the selected features using seven classifiers: Decision Tree, Gradient Boosting, Adaptive Boosting (AdaBoost), Logistic Regression, Passive-Aggressive, Extra Trees and Multilayer Perceptron. Performance was measured with stratified 5-fold cross-validation. Multi-core parallelism was applied to reduce the computational and time cost this technique incurs. A comparative analysis of the results followed. System performance decreased with Spearman's correlation analysis and the chi-square test but increased with RFE. The Extra Trees classifier achieved the highest accuracy, 98.76%. When execution time is taken into account, however, Logistic Regression and Decision Tree also stood out, with accuracies of 95.15% and 98.65%, respectively. Many studies have shown that a system using an ensemble model can classify better than one using a single classifier. The second stage therefore focused on building a more complex ensemble model with higher accuracy. The ensemble learning approach called “voting” combines the strengths of each classification algorithm. Applying it, the study built a collective model from classifiers selected on the basis of the first-stage results: Decision Tree, Extra Trees and Logistic Regression. With an accuracy of 98.82%, the collective model outperformed the individual approaches that use a single classifier.
- Published
- 2022
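The intrusion detection thesis above evaluates every classifier with stratified 5-fold cross-validation. The pure-Python sketch below shows one way such folds can be built so that each fold keeps roughly the same class proportions. It is not the thesis's code; the function names and toy labels are hypothetical. In practice a library routine such as scikit-learn's `StratifiedKFold` would normally be used instead.

```python
from collections import defaultdict


def stratified_kfold_indices(labels, k=5):
    """Split sample indices into k folds that roughly preserve class proportions.

    The indices of each class are dealt round-robin across the folds, so every
    fold gets about 1/k of each class. No shuffling is done; shuffle the data
    beforehand if its order carries information.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):  # fixed class order keeps the split deterministic
        members = by_class[label]
        for j, idx in enumerate(members):
            folds[(offset + j) % k].append(idx)
        # continue dealing where the previous class stopped, so fold sizes stay balanced
        offset += len(members)
    return [sorted(fold) for fold in folds]


def train_test_splits(labels, k=5):
    """Yield (train_indices, test_indices); each fold serves as the test set once."""
    folds = stratified_kfold_indices(labels, k)
    for i, test in enumerate(folds):
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train, test


# Hypothetical toy labels: 10 benign flows and 5 attacks
labels = ["benign"] * 10 + ["attack"] * 5
for train, test in train_test_splits(labels, k=5):
    print(len(train), [labels[i] for i in test])
```

With these toy labels every test fold holds two benign flows and one attack. This mirrors the one-third attack share of the full set, which is the property stratification is meant to preserve.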