Descriptor: "Data preprocessing" / Language: turkish - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Data preprocessing"' showing total 12 results

1. Haber Sınıflandırma Sistemlerinde Naive Bayes ve Makine Öğrenmesi Algoritmaları Arasında Performans Karşılaştırması.

Author: VEZİROĞLU, Merve and BUCAK, İhsan Ömür
Subjects: *MACHINE learning, *RANDOM forest algorithms, *LOGISTIC regression analysis, *ALGORITHMS, *CLASSIFICATION
Abstract: The rapid increase in digital content, particularly in text-based tasks like news classification, has significantly amplified the demand for automated classification methods. At this point, Natural Language Processing (NLP) techniques offer the potential to efficiently generate results from large datasets without human intervention. This study presents a Naive Bayes (NB)-based classification system, developed using Python, aimed at categorizing news headlines. NB algorithms are favored for text classification problems due to their simplicity and fast computation. The dataset used, derived from BBC News headlines, covers diverse categories such as technology, business, sports, entertainment, and politics. The data preprocessing phase included steps such as text cleaning, removing stop words, and converting the text into numerical data using Count Vectorization. This process plays a critical role in ensuring accurate and effective classification. Five different NB variants were examined in this study: Gaussian, Multinomial, Complement, Bernoulli, and Tree-Augmented Naive Bayes (TAN). The results showed that Multinomial NB delivered the best performance with an accuracy rate of 98.53%. Complement NB achieved 98.31%, TAN 98.20%, Bernoulli 96.74%, while Gaussian NB ranged between 91.79% and 92.92%. Additionally, NB algorithms were compared with advanced machine learning algorithms such as Logistic Regression, Random Forest, Linear Support Vector Classifier, and Multi-Layer Perceptron. The Multi-Layer Perceptron stood out with an accuracy rate of 98.31%, while the other algorithms also surpassed 97% accuracy. This study demonstrates that NB algorithms provide a robust, reliable, and effective solution for news classification problems, with the Multinomial and Complement variants showing particularly high accuracy. Future research will aim to further enhance the performance of these algorithms using larger datasets and new approaches. [ABSTRACT FROM AUTHOR]
Published: 2025
Full Text: View/download PDF
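
The pipeline described in the abstract above (text cleaning, stop-word removal, count vectorization, Multinomial Naive Bayes) can be sketched in plain Python. This is a minimal illustration, not the authors' code: the stop-word list, the toy headlines, and the function names `tokenize`, `train_multinomial_nb`, and `predict` are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict

# Illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "in", "of", "to", "for", "on", "and"}

def tokenize(text):
    """Text cleaning: lowercase, keep alphabetic tokens, drop stop words."""
    cleaned = "".join(c if c.isalpha() else " " for c in text.lower())
    return [w for w in cleaned.split() if w not in STOP_WORDS]

def train_multinomial_nb(docs, labels, alpha=1.0):
    """Count words per class (count vectorization) and store log priors."""
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = tokenize(doc)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    class_counts = Counter(labels)
    return {
        "alpha": alpha,
        "vocab": vocab,
        "word_counts": word_counts,
        "totals": {c: sum(word_counts[c].values()) for c in class_counts},
        "log_priors": {c: math.log(n / len(labels)) for c, n in class_counts.items()},
    }

def predict(model, text):
    """Return the class with the highest log posterior (Laplace smoothing)."""
    counts = Counter(t for t in tokenize(text) if t in model["vocab"])
    vocab_size = len(model["vocab"])
    scores = {}
    for label, log_prior in model["log_priors"].items():
        denom = model["totals"][label] + model["alpha"] * vocab_size
        scores[label] = log_prior + sum(
            n * math.log((model["word_counts"][label][w] + model["alpha"]) / denom)
            for w, n in counts.items()
        )
    return max(scores, key=scores.get)

# Toy headlines (made up for illustration, not from the BBC dataset).
headlines = [
    "Stocks rally as markets rise",
    "Team wins the cup final",
    "Bank profits rise on markets",
    "Striker scores in cup match",
]
categories = ["business", "sport", "business", "sport"]
model = train_multinomial_nb(headlines, categories)
```

Words unseen during training are ignored at prediction time, which mirrors what a fixed count-vectorizer vocabulary would do.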

2. Karar Ağacı ve Kural Tümevarımı ile Eğitsel Veri Madenciliği: SAÜ İLİTAM Örneği.

Author: DEMİRCİOĞLU DİREN, Deniz and HORZUM, Mehmet Barış
Abstract: Copyright of Pamukkale University Journal of Education is the property of Pamukkale University Journal of Education and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Published: 2024
Full Text: View/download PDF

3. Hedef Tespiti için Yere Nüfuz Eden Radar Verisinde Ön işleme Algoritmalarının Karşılaştırılması.

Author: GÜNEY, Selda and ÇOLAK, Deniz
Published: 2020
Full Text: View/download PDF

4. Unsupervised Discretization of Continuous Variables in a Chicken Egg Quality Traits Dataset

Author: Zeynel Cebeci and Figen Yıldız
Subjects: Data preprocessing, Discretization, Unsupervised discretization, Egg quality traits, Agriculture, Agriculture (General), S1-972
Abstract: Discretization is a data pre-processing task that transforms continuous variables into discrete ones so that data mining algorithms such as association rule extraction and classification trees can be applied. In this study we empirically compared the performance of equal width intervals (EWI), equal frequency intervals (EFI), and K-means clustering (KMC) for discretizing 14 continuous variables in a chicken egg quality traits dataset. We found that these unsupervised discretization methods can decrease the training error rates and increase the test accuracies of classification tree models. Comparing the training errors and test accuracies of models built with the C5.0 classification tree algorithm, we also found that EWI, EFI, and KMC produced broadly similar results. Among the rules used for estimating the number of intervals, the Rice rule gave the best result with EWI but not with EFI. The Freedman-Diaconis rule with EFI and the Doane rule with EFI and EWI performed slightly better than the other rules.
Published: 2017
Full Text: View/download PDF
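
Two of the discretization methods named in the abstract above, equal width intervals and equal frequency intervals, together with the Rice rule for the number of intervals, can be sketched as follows. The function names and the sample values are assumptions for illustration; the study's own implementation is not shown in this record.

```python
import math

def rice_rule(n):
    """Rice rule: number of intervals k = ceil(2 * n^(1/3))."""
    return math.ceil(2 * n ** (1 / 3))

def equal_width_bins(values, k):
    """EWI: split the range [min, max] into k intervals of equal width."""
    low, high = min(values), max(values)
    width = (high - low) / k
    if width == 0:
        return [0 for _ in values]
    # The maximum value would land in bin k, so clamp it into the last bin.
    return [min(int((v - low) / width), k - 1) for v in values]

def equal_frequency_bins(values, k):
    """EFI: assign bins by rank so each bin holds about n / k values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins

# Made-up measurements with one outlier to show how the two methods differ.
weights = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
```

With an outlier such as the last value, equal width puts nearly everything in one interval, while equal frequency still balances the bins. Note that this rank-based EFI sketch may place tied values in different bins.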

5. Çevresel Veri Problemleri için Veri Madenciliği ile Veri Ön İşleme.

Author: Eren, Beytullah and Aksangür, İpek
Abstract: Realistic models and accurate estimates are needed to control environmental facilities where waste management is performed. The most important step in developing an accurate prediction model is obtaining clean data, so data from environmental facilities should be cleaned during the preprocessing phase. During data cleaning, 25, 141, 26, 22, 241, 645, and 688 missing values were identified for the pH, EC, AKM (suspended solids), COD, BOD5, Oil-Grease, and TDS parameters, respectively. The missing values were filled in with the mean of each parameter. Then, 10 noisy records were identified and row-based cleaning was performed. To determine seasonal averages, the BOD5 parameter was examined and its seasonal average values were calculated with the program. The study shows that the raw data of an environmental facility can be cleaned with data mining programs and made ready for the subsequent modeling stage. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF
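
The two cleaning steps mentioned in the abstract above, filling missing values with the parameter mean and then removing noisy rows, might look like the sketch below. The sample rows, the pH limits, and the function names are hypothetical; the abstract does not say how noisy records were identified, so a simple range check is assumed here.

```python
def impute_with_mean(rows, columns):
    """Return copies of the rows with None replaced by the column mean."""
    means = {}
    for col in columns:
        observed = [row[col] for row in rows if row[col] is not None]
        means[col] = sum(observed) / len(observed)
    return [
        {col: (means[col] if row[col] is None else row[col]) for col in columns}
        for row in rows
    ]

def drop_noisy_rows(rows, limits):
    """Row-based cleaning: drop rows with a value outside (low, high) limits."""
    return [
        row for row in rows
        if all(low <= row[col] <= high for col, (low, high) in limits.items())
    ]

# Made-up measurements; None marks a missing value.
samples = [
    {"pH": 7.0, "COD": None},
    {"pH": None, "COD": 100.0},
    {"pH": 8.0, "COD": 300.0},
]
filled = impute_with_mean(samples, ["pH", "COD"])
# Assumed plausibility range for pH; a pH of 19.0 is treated as noise.
cleaned = drop_noisy_rows(filled + [{"pH": 19.0, "COD": 150.0}], {"pH": (0.0, 14.0)})
```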

6. Derin öğrenme yöntemleri kullanılarak meme kanseri teşhisi

Author: Canatalay, Peren Cerfi and Uçan, Osman Nuri
Subjects: Resnet50, Deep Learning, Derin Öğrenme, Data Preprocessing, Breast Cancer, Veri Ön İşleme, Meme Kanseri, VGG19
Abstract: Breast cancer is a leading cause of death among women. X-ray images obtained with mammography equipment are among the most frequently used tools for detecting breast cancer at an early stage. The goal of this research is to examine deep learning approaches for breast cancer images that contain tumors. Convolutional neural network (ESA)-based systems were created for the sensitive identification of breast cancer, which is particularly difficult to identify when many characteristics are interrelated. The method involves many parameters. Breast cancer is a highly complex disease with numerous characteristics associated with tumor types. In this study, mammogram image processing techniques and various pattern recognition techniques were used to classify breast cancer. Tumor image enhancement, segmentation, tissue-based image feature extraction, and finally classification of breast cancer mammogram images were carried out successfully with pattern recognition techniques. Detecting the disease with the right methodology plays an important role in making it treatable. In this study, a deep learning technique is applied to a dataset of 731186 X-ray images to decide whether a patient has breast cancer and, if so, whether the tumor is benign or malignant. 80% of the X-ray images were used for training and 20% for testing. Two methods are proposed: the first uses VGG19 and the second uses Resnet50. The experimental results show an accuracy (error-free rate) of 91.74% for VGG19 and 98.81% for Resnet50, so Resnet50 was found to be the more successful method. In addition, the study indicates whether cancer is present in breast X-ray images and, if so, which type of cancer it is.
Published: 2022

7. Sentiment analysis in Turkish texts using machine learning techniques

Author: Düven, Batıbay, Tunalı, Volkan, and Maltepe Üniversitesi, Lisansüstü Eğitim Enstitüsü
Subjects: Sentiment analysis, Duygu analizi, Machine learning, Data preprocessing, Makine öğrenmesi, Veri önişleme
Abstract: People tend to write down their daily experiences, opinions, and personal feelings. At the same time, people need the opinions of others to make decisions or to learn about a subject. Today, this exchange of information takes place on social platforms on the internet, which carry opinions on many topics. Because of the sheer volume of data, it has become impossible to read and analyze these opinions one by one, yet analyzing this information correctly is very important. This is where sentiment analysis comes into play. Sentiment analysis refers to the process of classifying texts that express opinions and presenting those opinions in an understandable way. In this study, sentiment analysis was performed with different machine learning methods on previously labeled training data, and the success rates were compared. Various preprocessing methods were tried in order to increase the success rates, and the results were interpreted.
Published: 2021

8. Hedef tespiti için yere nüfuz eden radar verisinde ön işleme algoritmalarının karşılaştırılması

Author: Çolak, Deniz and Güney, Selda
Subjects: Prescreening, Çapraz korelasyon, Computer science, Data preprocessing, Ön görüntüleme, Ground Penetrating Radar, Veri önişleme, Least mean squares filter, Radar, Remote sensing, Kernel en küçük ortalama kareler, Kernel Least Mean Square, Cross-correlation, Detector, General Medicine, Yere nüfuz eden radar, Kernel (image processing), Line (geometry)
Abstract: Ground Penetrating Radar (GPR) systems have been widely used in archaeology, geology, and civil engineering for about twenty years. GPR is an important remote sensing technology that uses electromagnetic techniques to detect and locate objects and layers beneath the surface. Although metallic objects can be detected and identified by a metal detector, other technologies are needed to detect and identify landmines with plastic or low metal content. System development studies continue, in line with technological developments, to meet the needs of all civil and military fields that require subsurface imaging. In this study, image pre-processing algorithms for processing the data obtained from the radar are examined, along with the effect that image enhancements made during the pre-screening phase have on system performance. The cross-correlation method proposed for the pre-processing phase was compared, in terms of speed and successful detection, with the Least Mean Squares and Kernel Least Mean Squares methods, which are also used for pre-processing. Different methods were examined with a view to real-time operation of the system, and the cross-correlation method was shown to give faster and more successful detections than the other methods.
Published: 2020
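
As a rough illustration of the cross-correlation idea named in the abstract above, the sketch below slides a short reference signature over a one-dimensional trace and reports the offset with the highest score. It is a generic textbook formulation, not the authors' GPR pipeline; the trace, the template, and the function names are assumptions.

```python
def cross_correlate(trace, template):
    """Sliding dot product of the template over the trace (full overlaps only)."""
    m = len(template)
    return [
        sum(trace[lag + i] * template[i] for i in range(m))
        for lag in range(len(trace) - m + 1)
    ]

def best_lag(trace, template):
    """Offset at which the template matches the trace most strongly."""
    scores = cross_correlate(trace, template)
    return max(range(len(scores)), key=scores.__getitem__)

# Made-up trace containing one reflection-like pulse, and a matching template.
trace = [0, 0, 1, 3, 1, 0, 0]
template = [1, 3, 1]
```

Unlike the adaptive Least Mean Squares filters it is compared with, this computation has no weights to update, which is one plausible reason for the speed advantage the abstract reports.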

9. Hastalık tanısı verilerinde veri ön işlemenin topluluk öğrenme sınıflandırma algoritmaları üzerindeki etkisinin incelenmesi

Author: Özkan, Yüksel, Suner Karakülah, Aslı, Biyoistatistik ve Tıbbı Bilişim Anabilim Dalı, and Sağlık Bilimleri Enstitüsü
Subjects: Class Noise, Statistical methods, Missing Values, Biyoistatistik, Statistics, Kayıp Gözlem, Biostatistics, Tanısı, Sınıf Gürültüsü, Class Imbalance, Data processing, Medical informatics, Medical informatics applications, Data Preprocessing, Machine learning, Diagnosis, Veri Ön İşleme, Disease, Topluluk Öğrenme, Sınıf Dengesizliği, Ensemble Learning
Abstract: In the health field, supervised machine learning methods, which compute using artificial intelligence technology, are used when classifying for the identification and examination of disease, particularly to extract meaningful information from complex data. Ensemble learning methods build more successful models by training multiple learners at the same time to solve the same problem. This study aims to compare the performance of classification algorithms after data preprocessing has been applied to problems such as missing values, class noise, and class imbalance, which may be encountered in data sets used for disease diagnosis. Data collected for the diagnosis of diseases such as heart disease, thyroid, hepatitis, lymphedema, breast cancer, and diabetes were taken from the KEEL database. For classification, random forest and weighted subspace random forest were used as bagging algorithms, and additive logistic regression and gradient boosting machines were used as boosting algorithms. Accuracy, sensitivity, specificity, precision, the Kappa statistic, the Youden index, the F-measure, and ROC metrics were used to compare the algorithms, and their run times were also calculated. All statistical analyses were performed with RStudio 1.2.1335 - Windows 7+ (64-bit). When the algorithms were compared on original and processed data, their performance was seen to increase after data preprocessing. In general, boosting algorithms performed better than bagging algorithms, but they also had the longest run times. In conclusion, data preprocessing should not be overlooked if researchers are aiming for high performance. Simulation studies could be carried out that add further aspects of data preprocessing, such as parameter tuning and variable selection.
Published: 2019
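
Most of the evaluation metrics listed in the abstract above follow directly from a binary confusion matrix. The sketch below uses the standard definitions; the counts in the example are made up, and the function name `binary_metrics` is an assumption (the study itself was carried out in R).

```python
def binary_metrics(tp, fn, fp, tn):
    """Standard metrics from a 2x2 confusion matrix (positive = diseased)."""
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    youden = sensitivity + specificity - 1
    # Cohen's kappa: observed agreement corrected for chance agreement.
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n ** 2
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_measure": f_measure,
        "youden": youden,
        "kappa": kappa,
    }

# Hypothetical counts for 100 patients, 50 of them with the disease.
scores = binary_metrics(tp=40, fn=10, fp=5, tn=45)
```

On imbalanced data, accuracy alone can look high while sensitivity is poor, which is presumably why the study reports Kappa and the Youden index alongside it.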

10. On the Analysis of Work Accidents Data by Using Data Preprocessing and Statistical Techniques

Author: Yalccn Oruc, Erdal Kilid, Ahmet Elibol, Sedat Akleylek, Zinnet Duygu Aksehir, and Ondokuz Mayıs Üniversitesi
Subjects: Contingency table, Computer science, accident of employment, job security, univariate frequency analysis, Univariate, Information technology, Terabyte, worker health, machine learning, Work (electrical), data preprocessing, cross tabulation analysis, Preprocessor, Data mining, Volume (compression)
Abstract: The volume of data used in research has increased considerably with the development of information technology. Whereas data were scarce many years ago, data volumes are now expressed in terabytes. Data must pass through a preprocessing stage before being used in machine learning applications. In this stage, missing, noisy, and inconsistent variables in the dataset are detected and the dataset is made fit for analysis. In this study, work accident data were passed through the data preprocessing step, and then univariate frequency and cross tabulation analyses were performed on them. Based on the experimental results, the variables associated with a high risk of work accidents were determined.
Conference: 2nd International Symposium on Multidisciplinary Studies and Innovative Technologies (ISMSIT), Oct 19-21, 2018, Kizilcahamam, Turkey. WOS: 000467794200121. IEEE Turkey Sect, Karabuk Univ, Kutahya Dumlupinar Univ
Published: 2018

11. Veri kümelerindeki eksik değerlerin yeni yaklaşımlar kullanılarak hesaplanması

Author: Aydilek, İbrahim Berkan, Arslan, Ahmet, Enstitüler, Fen Bilimleri Enstitüsü, Bilgisayar Mühendisliği Ana Bilim Dalı, and Bilgisayar Mühendisliği Anabilim Dalı
Subjects: Artificial intelligence, Hibrit yaklaşımlar, Fuzzy c-means, Missing data, Bulanık c-ortalamaları, En yakın k-komşu, Data preprocessing, Kayıp değerler, Computer Engineering and Computer Science and Control, Eksik değerler, Veri önişleme, Incomplete values, Hybrid method, Missing values, Kayıp veriler, Data mining, Bilgisayar Mühendisliği Bilimleri-Bilgisayar ve Kontrol, Imputation, K-Nearest neighbor
Abstract: Data mining, machine learning, and artificial intelligence algorithms need datasets to produce and evaluate research results, and data quality is a significant factor in obtaining accurate results. For various reasons, datasets may contain attributes with missing values. Missing values reduce data quality and may even jeopardize the reliability of research results. Therefore, before a dataset is used in data mining or machine learning methods, its missing values should be handled and estimated without reducing data quality. This thesis reviews conventional imputation techniques and proposes more advanced alternatives. The benefits reported in earlier related studies are discussed to emphasize the importance of missing value imputation. A hybrid approach using fuzzy c-means, support vector regression, and genetic algorithms is proposed, along with a second hybrid approach using k-nearest neighbors and artificial neural networks. An automatic model is proposed that finds the most suitable parameter values for the fuzzy c-means and k-nearest neighbors algorithms on which these approaches are based. The proposed approaches were tested on datasets frequently used in the literature and compared with similar approaches. The shortcomings of similar methods are described in order to show what the proposed hybrid approaches add to the literature. The results showed that the proposed hybrid approaches perform better and more consistently than similar methods.
Published: 2013
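
For orientation, plain k-nearest-neighbor imputation, one of the base algorithms the thesis builds on, can be sketched as below. This is the basic technique only, not the hybrid approach proposed in the thesis; the function name `knn_impute`, the fixed `k`, and the sample rows are assumptions.

```python
import math

def knn_impute(rows, k=2):
    """Fill None entries with the mean of the k nearest complete rows.

    Distance is Euclidean over the attributes the incomplete row does have.
    """
    complete = [row for row in rows if None not in row]
    result = []
    for row in rows:
        if None not in row:
            result.append(list(row))
            continue
        known = [i for i, v in enumerate(row) if v is not None]
        neighbours = sorted(
            complete,
            key=lambda c: math.dist([row[i] for i in known], [c[i] for i in known]),
        )[:k]
        result.append([
            v if v is not None else sum(n[i] for n in neighbours) / len(neighbours)
            for i, v in enumerate(row)
        ])
    return result

# Made-up data: the last row is missing its second attribute.
data = [[1.0, 2.0], [1.1, 2.2], [5.0, 10.0], [1.05, None]]
imputed = knn_impute(data, k=2)
```

The choice of `k` strongly affects the estimate, which is the kind of parameter the thesis proposes to optimize automatically.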

12. A data mining software design and application for association rule technique

Author: Özçakır, Feridun Cemal and Çamurcu, Ali Yılmaz
Subjects: Veri Madenciliği, Birliktelik Kuralı Madenciliği, Data Preprocessing, Association Rule Mining, Veri Önişleme, Data Mining, Apriori
Abstract: In this study, software based on association rules was designed to apply data mining to the patisserie sales data of a company. The steps of the knowledge discovery in databases process were carried out. In the data selection step, data were transferred from the operational database to the application database. Data preprocessing and data reduction were then applied to the data in the database to obtain a data set suitable for data mining. The software uses the Apriori algorithm. With the Apriori algorithm applied to input values for different time periods and different sales locations, associations were observed among products purchased together. In general, products from the same product group were found to be the ones most frequently purchased together. Thanks to the custom design of the software, each stage of the algorithm could be monitored while the software was running.
Published: 2007
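
The frequent-itemset stage of the Apriori algorithm mentioned in the abstract above can be sketched in a few lines. This is a generic, unoptimized version for illustration, not the software described in the record; the transactions and the `min_support` value are made up.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        return sum(itemset <= basket for basket in baskets) / n

    # Level 1 candidates: every single item.
    candidates = {frozenset([item]) for basket in baskets for item in basket}
    frequent = {}
    k = 1
    while candidates:
        level = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                level[c] = s
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {
            c for c in joined
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
    return frequent

# Made-up patisserie baskets.
sales = [
    {"bread", "cake"},
    {"bread", "cake", "tea"},
    {"bread", "tea"},
    {"cake"},
]
frequent_sets = apriori(sales, min_support=0.5)
```

Association rules would then be derived from these frequent itemsets by comparing the support of an itemset with the support of its subsets (confidence), a step left out here for brevity.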
