801 results for '"Nearest-neighbor chain algorithm"'
Search Results
2. Prediction of Protein-Peptide Interactions with a Nearest Neighbor Algorithm
- Author
-
Tao Huang, Yu-Hang Zhang, Bi-Qing Li, Mei-ling Jin, and Yu-Dong Cai
- Subjects
0301 basic medicine ,Nearest neighbor search ,Biology ,Biochemistry ,k-nearest neighbors algorithm ,03 medical and health sciences ,Computational Mathematics ,030104 developmental biology ,Best bin first ,Nearest neighbor graph ,Nearest-neighbor chain algorithm ,Genetics ,Molecular Biology ,Algorithm - Published
- 2018
3. Fast approximate minimum spanning tree based clustering algorithm
- Author
-
Sraban Kumar Mohanty, Aparajita Ojha, and R. Jothi
- Subjects
Spanning tree ,Cognitive Neuroscience ,02 engineering and technology ,Minimum spanning tree ,Computer Science Applications ,Distributed minimum spanning tree ,Combinatorics ,Nearest neighbor graph ,Artificial Intelligence ,020204 information systems ,Nearest-neighbor chain algorithm ,0202 electrical engineering, electronic engineering, information engineering ,Reverse-delete algorithm ,Graph (abstract data type) ,020201 artificial intelligence & image processing ,Cluster analysis ,MathematicsofComputing_DISCRETEMATHEMATICS ,Mathematics - Abstract
Minimum Spanning Tree (MST) based clustering algorithms have been employed successfully to detect clusters of heterogeneous nature. Given a dataset of n random points, most MST-based clustering algorithms first generate a complete graph G of the dataset and then construct the MST from G. The first step is the major bottleneck, taking O(n^2) time. This paper proposes an algorithm, MST-based clustering on a partition-based nearest neighbor graph, to reduce this computational overhead. Using a centroid-based nearest neighbor rule, the proposed algorithm first generates a sparse Local Neighborhood Graph (LNG) and then constructs the approximate MST from the LNG. We prove that both the size of the LNG and the time to construct it are O(n^(3/2)), an O(√n)-factor improvement over the traditional algorithms. The approximate MST is constructed from the LNG in O(n^(3/2) lg n) time, which is asymptotically faster than O(n^2). Experimental analysis on both synthetic and real datasets demonstrates that the computational time is reduced significantly while maintaining the quality of the clusters obtained from the approximate MST.
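The abstract describes the pipeline only at a high level; the following Python sketch is a hypothetical rendering of the same idea (partition the points, build a sparse local neighborhood graph, take its MST) using scikit-learn and SciPy. The √n partition count, the intra-partition complete subgraphs, and the centroid-based bridge edges are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np
from scipy.sparse import lil_matrix
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans

def approximate_mst(X):
    n = len(X)
    k = max(1, int(np.sqrt(n)))                      # ~sqrt(n) partitions
    km = KMeans(n_clusters=k, n_init=10).fit(X)
    G = lil_matrix((n, n))
    # Intra-partition edges: complete subgraph inside each small partition.
    for c in range(k):
        idx = np.flatnonzero(km.labels_ == c)
        d = cdist(X[idx], X[idx])
        for a in range(len(idx)):
            for b in range(a + 1, len(idx)):
                G[idx[a], idx[b]] = d[a, b]
    # Bridge edges: link each partition to its nearest partition through the
    # two points closest to the other partition's centroid.
    cd = cdist(km.cluster_centers_, km.cluster_centers_)
    np.fill_diagonal(cd, np.inf)
    for c in range(k):
        c2 = int(np.argmin(cd[c]))
        i = np.flatnonzero(km.labels_ == c)
        j = np.flatnonzero(km.labels_ == c2)
        p = i[np.argmin(cdist(X[i], km.cluster_centers_[c2:c2 + 1]).ravel())]
        q = j[np.argmin(cdist(X[j], km.cluster_centers_[c:c + 1]).ravel())]
        G[p, q] = np.linalg.norm(X[p] - X[q])
    # Returns a spanning forest if the sparse graph ends up disconnected.
    return minimum_spanning_tree(G.tocsr())
```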
- Published
- 2018
4. Enhanced shared nearest neighbor clustering approach using fuzzy for teleconnection analysis
- Author
-
Kesari Verma and Rika Sharma
- Subjects
Fuzzy clustering ,Computer science ,business.industry ,Centroid ,Boundary (topology) ,Pattern recognition ,02 engineering and technology ,01 natural sciences ,Fuzzy logic ,Theoretical Computer Science ,k-nearest neighbors algorithm ,010104 statistics & probability ,Nearest-neighbor chain algorithm ,0202 electrical engineering, electronic engineering, information engineering ,Cluster (physics) ,020201 artificial intelligence & image processing ,Geometry and Topology ,Artificial intelligence ,0101 mathematics ,business ,Cluster analysis ,Software - Abstract
Massive amounts of Earth science data open an unprecedented opportunity to discover potentially valuable information. Earth science data are complex, nonlinear, and high-dimensional, and the sparsity of data in high-dimensional space poses a major challenge for clustering. The shared nearest neighbor (SNN) clustering algorithm is one of the well-known and efficient methods for handling high-dimensional spatiotemporal data. However, the SNN method does not cluster all of the data, because it imposes a rigid boundary selection. This paper reports a fuzzy shared nearest neighbor (FSNN) algorithm, an enhancement of SNN clustering that can handle data lying in boundary regions by means of fuzzy membership. The clusters obtained can be characterized by their centroids, which summarize the behavior of the ocean points in each cluster. A statistical measure is used to find significant relations between the cluster centroids and existing climate indices; in this study, correlation is used to find significant patterns such as teleconnections or dipoles. The experimentation is performed on the Indian continent, latitude range 7.5°-37.5°N and longitude range 67.5°-97.5°E. Extensive experiments compare the proposed approach with existing clustering methods such as K-means, fuzzy C-means, and SNN. The proposed FSNN algorithm not only handles the data lying in the overlapping region but also finds more compact and well-separated clusters. FSNN shows better results in finding significant correlations between cluster centroids and existing climate indices, and is validated against a ground-truth dataset.
- Published
- 2017
5. An efficient instance selection algorithm for k nearest neighbor regression
- Author
-
Yunsheng Song, Xingwang Zhao, Jiye Liang, and Jing Lu
- Subjects
business.industry ,Generalization ,Cognitive Neuroscience ,Population size ,Pattern recognition ,02 engineering and technology ,Regression ,Computer Science Applications ,k-nearest neighbors algorithm ,ComputingMethodologies_PATTERNRECOGNITION ,Artificial Intelligence ,Simple (abstract algebra) ,020204 information systems ,Nearest-neighbor chain algorithm ,Outlier ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,Algorithm ,Mathematics ,Data reduction - Abstract
The k-nearest neighbor algorithm (kNN) is very simple to understand for classification or regression. It is also a lazy algorithm that does not use the training data points for any generalization; in other words, it keeps all the training data during the testing phase. Thus, the size of the stored training set becomes a major concern for kNN, since a large training set may result in slow execution and large memory requirements. Many efforts have been devoted to this problem, but they have mainly focused on kNN classification. We propose an algorithm to decrease the size of the training set for kNN regression (DISKR). In this algorithm, we first remove the outlier instances that harm the performance of the regressor, and then sort the remaining instances by the difference between their outputs and those of their nearest neighbors. Finally, instances with little contribution, as measured by the training error, are successively deleted following this ordering. The proposed algorithm is compared with five state-of-the-art algorithms on 19 datasets, and the experimental results show that it obtains similar prediction accuracy while having the lowest instance storage ratio.
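As a rough illustration of the DISKR-style reduction described above (outlier removal followed by deletion of low-impact instances), here is a hypothetical Python sketch; the leave-one-out residual used as the "contribution" score and the z-score outlier rule are assumptions made for brevity, not the paper's exact criteria.

```python
import numpy as np
from scipy.spatial.distance import cdist

def loo_knn_residual(X, y, k=5):
    """|y_i - mean of y over i's k nearest neighbors|, leave-one-out."""
    d = cdist(X, X)
    np.fill_diagonal(d, np.inf)          # exclude each point from its own kNN
    nn = np.argsort(d, axis=1)[:, :k]
    return np.abs(y - y[nn].mean(axis=1))

def diskr_like_reduce(X, y, k=5, outlier_z=3.0, keep_ratio=0.5):
    # 1) remove outliers whose output deviates strongly from their neighbors'
    r = loo_knn_residual(X, y, k)
    keep = r < r.mean() + outlier_z * r.std()
    X, y = X[keep], y[keep]
    # 2) keep the instances whose outputs differ most from their neighbors'
    #    (an assumed proxy for "contribution"); drop the low-impact rest
    r = loo_knn_residual(X, y, k)
    n_keep = max(k + 1, int(keep_ratio * len(X)))
    kept = np.sort(np.argsort(r)[-n_keep:])
    return X[kept], y[kept]
```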
- Published
- 2017
6. A clustering algorithm with affine space-based boundary detection
- Author
-
Xiangli Li, Baozhi Qiu, and Qiong Han
- Subjects
DBSCAN ,Clustering high-dimensional data ,Mathematical optimization ,Fuzzy clustering ,Computer science ,Correlation clustering ,Single-linkage clustering ,02 engineering and technology ,Matrix (mathematics) ,Artificial Intelligence ,CURE data clustering algorithm ,Ramer–Douglas–Peucker algorithm ,020204 information systems ,Nearest-neighbor chain algorithm ,0202 electrical engineering, electronic engineering, information engineering ,Cluster analysis ,k-medians clustering ,Harris affine region detector ,k-medoids ,ComputingMethodologies_PATTERNRECOGNITION ,Data stream clustering ,Affine space ,Canopy clustering algorithm ,Affinity propagation ,020201 artificial intelligence & image processing ,Affine transformation ,Algorithm - Abstract
Clustering is an important technique in data mining. The innovative algorithm proposed in this paper obtains clusters by first identifying boundary points as opposed to existing methods that calculate core cluster points before expanding to the boundary points. To achieve this, an affine space-based boundary detection algorithm was employed to divide data points into cluster boundary and internal points. A connection matrix was then formed by establishing neighbor relationships between internal and boundary points to perform clustering. Our clustering algorithm with an affine space-based boundary detection algorithm accurately detected clusters in datasets with different densities, shapes, and sizes. The algorithm excelled at dealing with high-dimensional datasets.
- Published
- 2017
7. A novel version of k nearest neighbor: Dependent nearest neighbor
- Author
-
Ömer Faruk Ertuğrul, Mehmet Emin Tağluk, and Batman Üniversitesi Mühendislik - Mimarlık Fakültesi Elektrik-Elektronik Mühendisliği Bölümü
- Subjects
0209 industrial biotechnology ,Sinc function ,k Nearest Neighbor ,business.industry ,Nearest neighbor search ,Feature vector ,Dependency ,Pattern recognition ,02 engineering and technology ,Similarity ,k-nearest neighbors algorithm ,Dependent Nearest Neighbor ,Euclidean distance ,ComputingMethodologies_PATTERNRECOGNITION ,020901 industrial engineering & automation ,Nearest-neighbor chain algorithm ,0202 electrical engineering, electronic engineering, information engineering ,Probability distribution ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,Fixed-radius near neighbors ,Software ,Mathematics - Abstract
d-NN combines the similarity and dependency between the query and the labeled samples. It maps the samples that are more similar to, and dependent on, the query toward the origin, along the +x axis, and it has an adaptive dependency region determined by a dependency angle and radius; higher accuracies were obtained by using this adaptive dependency region instead of a constant number of nearest neighbors (k). k nearest neighbor (kNN) is one of the basic processes behind various machine learning methods. In kNN, the relation of a query to a neighboring sample is measured by a similarity metric, such as Euclidean distance. The process starts by mapping the training dataset onto a one-dimensional distance space based on the calculated similarities, and then labels the query according to the most dominant label (classification) or the mean of the labels (regression) of the k nearest neighbors. The number of nearest neighbors (k) is chosen according to the desired limit of success. Nonetheless, two distinct samples may have equal distances to the query but different angles in the feature space. The similarity of the query to these two samples should therefore be weighted according to the angle between the query and each sample, differentiating the two distances with angular information. This can be analyzed in the context of dependency and utilized to increase the precision of the classifier. With this point of view, instead of kNN, the query is labeled according to its nearest dependent neighbors, which are determined by a joint function built on the similarity and the dependency; the method is therefore called dependent NN (d-NN). To demonstrate d-NN, it is applied to synthetic datasets with different statistical distributions and to 4 benchmark datasets (Pima Indian, Hepatitis, approximate Sinc, and CASP). Results showed the superiority of d-NN in terms of accuracy and computational cost compared to other popular machine learning methods.
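The joint similarity/dependency function is the heart of d-NN; below is a toy Python sketch of one plausible such combination (Euclidean distance inflated by the angle between the query and sample vectors). The specific joint function and the paper's adaptive dependency region are not reproduced, and `alpha` is a made-up mixing parameter.

```python
import numpy as np

def dnn_label(query, X, y, k=5, alpha=1.0):
    dist = np.linalg.norm(X - query, axis=1)          # similarity term
    cos = (X @ query) / (np.linalg.norm(X, axis=1)
                         * np.linalg.norm(query) + 1e-12)
    ang = np.arccos(np.clip(cos, -1.0, 1.0))          # dependency term (angle)
    score = dist * (1.0 + alpha * ang)                # assumed joint function
    nearest = np.argsort(score)[:k]
    vals, counts = np.unique(y[nearest], return_counts=True)
    return vals[np.argmax(counts)]                    # majority label
```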
- Published
- 2017
8. Evolutionary k-nearest neighbor imputation algorithm for gene expression data
- Author
-
Amal Shehan Perera and Hiroshi Madushani de Silva
- Subjects
0301 basic medicine ,Mean squared error ,Evolutionary algorithm ,02 engineering and technology ,computer.software_genre ,k-nearest neighbors algorithm ,03 medical and health sciences ,Nearest-neighbor chain algorithm ,0202 electrical engineering, electronic engineering, information engineering ,Imputation (statistics) ,Mathematics ,Statistics::Applications ,business.industry ,Pattern recognition ,General Medicine ,Missing data ,Quantitative Biology::Genomics ,030104 developmental biology ,Best bin first ,Data_GENERAL ,020201 artificial intelligence & image processing ,Artificial intelligence ,Data mining ,business ,computer ,Algorithm ,Large margin nearest neighbor - Abstract
Large datasets are produced by the gene expression process using DNA microarray technology. These gene expression data are a common data source that frequently contains missing expression values. In this paper, we present a genetic-algorithm-optimized k-nearest neighbor algorithm (Evolutionary kNNImputation) for missing data imputation. In contrast to common imputation methods, this paper addresses the effectiveness of using supervised learning algorithms for missing data imputation. Missing data imputation approaches can be categorized into four main categories; our focus is on the local approach, into which the proposed Evolutionary k-Nearest Neighbor Imputation algorithm falls. The algorithm is an extension of the common k-nearest neighbor imputation algorithm in which a genetic algorithm is used to optimize some parameters of the k-nearest neighbor algorithm: the selection of the similarity metric and of the parameter value k can be treated as the optimization problem. We compared the proposed Evolutionary k-Nearest Neighbor Imputation algorithm with k-nearest neighbor imputation and mean imputation. The three algorithms were tested on gene expression datasets: certain percentages of values were randomly deleted from the datasets and then recovered with each of the three algorithms. Results show that Evolutionary kNNImputation outperforms kNNImputation and mean imputation, demonstrating the importance of using a supervised learning algorithm in missing data estimation. Although mean imputation showed a low mean error at a few low missing rates, the supervised learning algorithm became more effective at the higher missing rates that are most common among datasets.
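Since tuning the kNN imputer's parameters is the paper's central idea, the sketch below illustrates the underlying optimization problem (choosing k and the similarity metric by reconstruction error on artificially masked entries). Note that it substitutes a plain random search for the paper's genetic algorithm to stay short, so it is an assumption-laden stand-in, not the published method.

```python
import numpy as np
from scipy.spatial.distance import cdist

def knn_impute(X, k, metric):
    """Fill NaNs in each incomplete row from the k nearest complete rows."""
    Xi = X.copy()
    complete = ~np.isnan(X).any(axis=1)
    for i in np.flatnonzero(~complete):
        obs = ~np.isnan(X[i])
        d = cdist(X[i][obs][None, :], X[complete][:, obs], metric=metric)[0]
        nn = np.argsort(d)[:k]
        Xi[i, ~obs] = X[complete][nn][:, ~obs].mean(axis=0)
    return Xi

def tune_knn_imputer(X_complete, mask_rate=0.1, trials=20, seed=0):
    # Score candidate (k, metric) pairs by hiding known entries and
    # measuring reconstruction RMSE; a random search stands in for the GA.
    rng = np.random.default_rng(seed)
    mask = rng.random(X_complete.shape) < mask_rate
    Xm = X_complete.copy()
    Xm[mask] = np.nan
    best_rmse, best_params = np.inf, None
    for _ in range(trials):
        k = int(rng.integers(1, 15))
        metric = str(rng.choice(["euclidean", "cityblock", "cosine"]))
        err = knn_impute(Xm, k, metric) - X_complete
        rmse = np.sqrt(np.mean(err[mask] ** 2))
        if rmse < best_rmse:
            best_rmse, best_params = rmse, (k, metric)
    return best_params, best_rmse
```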
- Published
- 2017
9. Adaptive density distribution inspired affinity propagation clustering
- Author
-
Zhiwen Liu, Zhonghang He, Zheyi Fan, Shuqin Weng, and Jiao Jiang
- Subjects
0209 industrial biotechnology ,Fuzzy clustering ,business.industry ,Correlation clustering ,Constrained clustering ,Pattern recognition ,02 engineering and technology ,020901 industrial engineering & automation ,Data stream clustering ,Artificial Intelligence ,Nearest-neighbor chain algorithm ,0202 electrical engineering, electronic engineering, information engineering ,Affinity propagation ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,Cluster analysis ,Software ,k-medians clustering ,Mathematics - Abstract
As an effective clustering method, Affinity Propagation (AP) has received extensive attention, but its applications are seriously limited by two major deficiencies. First, the ultimate exemplars and clusters are sensitive to a list of user-defined parameters called preferences. Second, it cannot deal with nonspherical clusters. To solve these problems, an adaptive density distribution inspired AP clustering algorithm is proposed in this work. To address the difficulty of preference selection, a density-adaptive preference estimation algorithm is proposed to explore the underlying exemplars; it obtains better clustering results using only the local density distribution of the data. To address the arbitrary-shape cluster problem, a non-parametric similarity measurement strategy based on nearest neighbor searching is presented to describe the true structure of the data, so that data with both spherical and nonspherical distributions can be clustered. Experiments conducted on various synthetic and public datasets demonstrate that the proposed method outperforms other state-of-the-art approaches.
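One way to picture the "density-adaptive preference" idea is to give scikit-learn's AffinityPropagation a per-point preference derived from local density rather than a single global value, as in the hypothetical sketch below; the mapping from kNN density to the preference range is my assumption, not the paper's estimation algorithm.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import AffinityPropagation

def density_adaptive_ap(X, k=10):
    d = cdist(X, X)
    knn_d = np.sort(d, axis=1)[:, 1:k + 1]          # skip self-distance
    density = -knn_d.mean(axis=1)                   # higher = denser
    base = np.median(-d)                            # rough global preference level
    # Denser points get a less negative preference, i.e., a better chance
    # of being chosen as exemplars.
    pref = np.interp(density, (density.min(), density.max()), (base, base / 2))
    return AffinityPropagation(preference=pref, damping=0.9,
                               random_state=0).fit_predict(X)
```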
- Published
- 2017
10. A novel data clustering algorithm based on modified gravitational search algorithm
- Author
-
Jie Xiang, Long Quan, Xiaoyan Xiong, Yuan Lan, Matt Almeter, and Xiaohong Han
- Subjects
DBSCAN ,Clustering high-dimensional data ,Fuzzy clustering ,Computer science ,Population-based incremental learning ,Correlation clustering ,Initialization ,02 engineering and technology ,computer.software_genre ,Artificial Intelligence ,CURE data clustering algorithm ,Nearest-neighbor chain algorithm ,Consensus clustering ,0202 electrical engineering, electronic engineering, information engineering ,Firefly algorithm ,Electrical and Electronic Engineering ,Cluster analysis ,k-medians clustering ,FSA-Red Algorithm ,Constrained clustering ,k-means clustering ,Particle swarm optimization ,020206 networking & telecommunications ,Determining the number of clusters in a data set ,Data stream clustering ,Control and Systems Engineering ,Canopy clustering algorithm ,FLAME clustering ,Affinity propagation ,020201 artificial intelligence & image processing ,Data mining ,Algorithm ,computer - Abstract
Data clustering is a popular analysis tool for data statistics in many fields such as pattern recognition, data mining, machine learning, image analysis, and bioinformatics. The aim of data clustering is to represent large datasets by a smaller number of prototypes or clusters, which brings simplicity to modeling data and thus plays a central role in the process of knowledge discovery and data mining. In this paper, a novel data clustering algorithm based on a modified Gravitational Search Algorithm is proposed, called the Bird Flock Gravitational Search Algorithm (BFGSA). The BFGSA introduces a new diversity mechanism into GSA, inspired by the collective response behavior of birds. This mechanism performs its diversity enhancement through three main steps: initialization, identification of the nearest neighbors, and orientation change. The initialization step generates candidate populations for the second step, and the orientation change step updates the positions of objects based on the nearest neighbors. Due to the collective response mechanism, the BFGSA explores a wider range of the search space and thus escapes suboptimal solutions. The performance of the proposed algorithm is evaluated on 13 real benchmark datasets from the well-known UCI Machine Learning Repository and compared with the standard GSA, the Artificial Bee Colony (ABC), Particle Swarm Optimization (PSO), the Firefly Algorithm (FA), K-means, and four other clustering algorithms from the literature. The simulation results indicate that the BFGSA can effectively be used for data clustering.
- Published
- 2017
11. Efficient Sub-Window Nearest Neighbor Search on Matrix
- Author
-
Tsz Nam Chan, Man Lung Yiu, and Kien A. Hua
- Subjects
Cover tree ,Computer science ,Nearest neighbor search ,02 engineering and technology ,computer.software_genre ,Computer Science Applications ,k-nearest neighbors algorithm ,Locality-sensitive hashing ,Best bin first ,Computational Theory and Mathematics ,Nearest neighbor graph ,020204 information systems ,Nearest-neighbor chain algorithm ,R-tree ,Ball tree ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Data mining ,Fixed-radius near neighbors ,Algorithm ,computer ,Large margin nearest neighbor ,Information Systems - Abstract
We study a nearest neighbor search problem on a matrix by its element values. Given a data matrix D and a query matrix q, the sub-window nearest neighbor search problem finds a sub-window of D that is the most similar to q. This problem has a wide range of applications, e.g., geospatial data integration, object detection, and motion estimation. In this paper, we propose an efficient progressive search solution that overcomes the drawbacks of existing solutions. First, we present a generic approach to build level-based lower bound functions on top of basic lower bound functions. Second, we develop a novel lower bound function for a group of sub-windows, in order to boost the efficiency of our solution. Furthermore, we extend our solution to support irregular-shaped queries. Experimental results on real data demonstrate the efficiency of our proposed methods.
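For orientation, a brute-force baseline for the sub-window nearest neighbor problem looks like the sketch below, with a simple per-row early-abandoning check standing in for the paper's level-based and group lower bounds.

```python
import numpy as np

def subwindow_nn(D, q):
    """Return (row, col) of the sub-window of D most similar to q (SSD)."""
    m, n = q.shape
    best, best_pos = np.inf, None
    for i in range(D.shape[0] - m + 1):
        for j in range(D.shape[1] - n + 1):
            ssd = 0.0
            for r in range(m):                     # early abandon per row
                ssd += ((D[i + r, j:j + n] - q[r]) ** 2).sum()
                if ssd >= best:
                    break
            if ssd < best:
                best, best_pos = ssd, (i, j)
    return best_pos, best
```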
- Published
- 2017
12. A new general nearest neighbor classification based on the mutual neighborhood information
- Author
-
Weiping Ku, Zhibin Pan, and Yidi Wang
- Subjects
Information Systems and Management ,Computer science ,Nearest neighbor search ,Gaussian ,02 engineering and technology ,computer.software_genre ,Management Information Systems ,k-nearest neighbors algorithm ,symbols.namesake ,Artificial Intelligence ,020204 information systems ,Nearest-neighbor chain algorithm ,0202 electrical engineering, electronic engineering, information engineering ,business.industry ,Pattern recognition ,ComputingMethodologies_PATTERNRECOGNITION ,Best bin first ,symbols ,020201 artificial intelligence & image processing ,Artificial intelligence ,Data mining ,business ,Classifier (UML) ,computer ,Software ,Large margin nearest neighbor - Abstract
The nearest neighbor (NN) rule is effective for many applications in pattern classification, such as the famous k-nearest neighbor (kNN) classifier. However, NN-based classifiers perform a one-sided classification, finding the nearest neighbors simply according to the neighborhood of the testing sample. In this paper, we propose a new selection method for nearest neighbors based on a two-sided mode, called the general nearest neighbor (GNN) rule. The mutual neighborhood information of both the testing sample and the training sample is considered, and the overlap of the two neighborhoods is used to decide the general nearest neighbors of the testing sample. To verify the effectiveness of the GNN rule in pattern classification, a k-general nearest neighbor (kGNN) classifier is proposed, which applies the k-neighborhood information of each sample to find the general nearest neighbors. Extensive experiments on twenty real-world datasets from the UCI and KEEL repositories and two Gaussian artificial datasets (the I-I and Ness datasets) show that the kGNN classifier outperforms the kNN classifier and seven other state-of-the-art NN-based classifiers, particularly when the training sample size is small.
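Reading the abstract literally, a training point qualifies as a "general" neighbor when the neighborhood relation holds in both directions. The sketch below encodes that mutual test; the exact neighborhood definition, tie handling, and the fallback rule are assumptions, not the paper's precise formulation.

```python
import numpy as np
from scipy.spatial.distance import cdist

def general_neighbors(x, Xtr, k=7):
    d = cdist(x[None, :], Xtr)[0]
    knn_of_x = np.argsort(d)[:k]
    dtr = cdist(Xtr, Xtr)
    mutual = []
    for t in knn_of_x:
        # t's neighborhood is taken over the training points (minus itself)
        # plus the test point x, appended last.
        dt = np.append(np.delete(dtr[t], t), d[t])
        if d[t] <= np.sort(dt)[k - 1]:     # x is within t's k nearest
            mutual.append(int(t))
    return mutual or list(knn_of_x)        # fall back to plain kNN if empty
```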
- Published
- 2017
13. A framework for distributed nearest neighbor classification using Hadoop
- Author
-
Robert Boykin and Qin Ding
- Subjects
Computational Mathematics ,Computer science ,020204 information systems ,Nearest-neighbor chain algorithm ,0202 electrical engineering, electronic engineering, information engineering ,General Engineering ,020201 artificial intelligence & image processing ,02 engineering and technology ,Data mining ,computer.software_genre ,computer ,Computer Science Applications ,k-nearest neighbors algorithm - Published
- 2017
14. k-Expected Nearest Neighbor Search over Gaussian Objects
- Author
-
Yoshiharu Ishikawa, Jing Zhao, Chuan Xiao, and Tingting Dong
- Subjects
Cover tree ,General Computer Science ,Computer science ,Nearest neighbor search ,02 engineering and technology ,Locality-sensitive hashing ,Combinatorics ,Best bin first ,Nearest neighbor graph ,020204 information systems ,Nearest-neighbor chain algorithm ,Ball tree ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Fixed-radius near neighbors - Published
- 2017
15. A hierarchical algorithm for approximate nearest neighbor searching in a dataset of pyramid-based image representations
- Author
-
Andrey Lange and Mikhail Lange
- Subjects
Computational complexity theory ,business.industry ,Pattern recognition ,02 engineering and technology ,General Medicine ,k-nearest neighbors algorithm ,Best bin first ,Cardinality ,Dimension (vector space) ,Computer Science::Computer Vision and Pattern Recognition ,020204 information systems ,Nearest-neighbor chain algorithm ,0202 electrical engineering, electronic engineering, information engineering ,Pyramid (image processing) ,Artificial intelligence ,business ,MNIST database ,Mathematics - Abstract
An algorithm for hierarchically searching for an approximate nearest image in a given dataset, relative to a submitted image, is suggested. The algorithm is intended for high-dimensional data, and the decision is made in a space of multiresolution pyramid-based image representations. The computational complexity of the algorithm grows logarithmically with the image dimension (number of pixels) and linearly with the dataset cardinality. The efficiency of the algorithm is estimated in terms of a probability distribution of the search errors, defined by the differences of distances between the submitted images and the approximate or exact nearest decisions, respectively. The algorithm has been tested in two applications: searching for the approximate nearest hand-written digits in the MNIST dataset, and gridding noisy images in a digital map taken from Google Maps. For these applications, the empirical search error distributions are calculated for different values of the algorithm parameter, which allows the computational complexity to be traded off against accuracy. Moreover, the linear order of growth of the computational complexity with increasing dataset cardinality can be decreased.
- Published
- 2017
16. Efficient multiple bichromatic mutual nearest neighbor query processing
- Author
-
J. Antoni Sellarès, Marta Fort, and Ministerio de Economía y Competitividad (Espanya)
- Subjects
Theoretical computer science ,Computer science ,Nearest neighbor search ,Parallel algorithm ,Infografia ,02 engineering and technology ,k-nearest neighbors algorithm ,Computer graphics ,Sistemes d'ajuda a la decisió ,Best bin first ,Nearest neighbor graph ,Hardware and Architecture ,020204 information systems ,Nearest-neighbor chain algorithm ,Ball tree ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Fixed-radius near neighbors ,Decision support system ,Software ,Large margin nearest neighbor ,Information Systems - Abstract
In this paper we propose, motivate, and solve multiple bichromatic mutual nearest neighbor queries in the plane, considering multiplicatively weighted Euclidean distances. Given two sets of facilities of different types, a multiple bichromatic mutual (k, k')-nearest neighbor query finds pairs of points, one from each set, such that the point of the first set is a k-nearest neighbor of the point of the second set and, at the same time, the point of the second set is a k'-nearest neighbor of the point of the first set. These queries find applications in collaborative marketing and prospective data analysis, where facilities of one type cooperate with facilities of the other type to obtain reciprocal benefits. We present a sequential and a parallel algorithm, to be run on the CPU and on a Graphics Processing Unit, respectively, for solving multiple bichromatic mutual nearest neighbor queries. We also present the time and space complexity analysis of both algorithms, together with their theoretical comparison. Finally, we provide and discuss experimental results obtained with implementations of the proposed sequential and parallel algorithms. Highlights: We define and motivate multiple mutual bichromatic weighted nearest neighbor queries. We solve 2D multiple mutual nearest neighbor queries sequentially and in parallel. We theoretically analyze and compare the time and space complexity of both algorithms. We experimentally show the algorithms to be effective, robust, and scalable.
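A sequential baseline for the query itself is straightforward, as in the hypothetical sketch below; one common multiplicative-weight convention (distance scaled by the target facility's weight) is assumed, and the paper's GPU-parallel algorithm is not reproduced.

```python
import numpy as np
from scipy.spatial.distance import cdist

def mutual_bnn_pairs(A, wA, B, wB, k=3, kp=3):
    """Pairs (a, b) such that b is a k-NN of a in B and a is a k'-NN of b
    in A, under assumed weighted distances d(a,b)*wB[b] and d(b,a)*wA[a]."""
    d = cdist(A, B)
    knn_a = np.argsort(d * wB[None, :], axis=1)[:, :k]     # each a -> B
    knn_b = np.argsort(d * wA[:, None], axis=0)[:kp, :].T  # each b -> A
    pairs = []
    for a in range(len(A)):
        for b in knn_a[a]:
            if a in knn_b[b]:                              # mutual relation
                pairs.append((a, int(b)))
    return pairs
```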
- Published
- 2016
17. An improved location difference of multiple distances based nearest neighbors searching algorithm
- Author
-
Xiaoru Bi, Liu Yang, and Limei Dong
- Subjects
k-medoids ,Computer science ,0102 computer and information sciences ,02 engineering and technology ,01 natural sciences ,Atomic and Molecular Physics, and Optics ,Electronic, Optical and Magnetic Materials ,k-nearest neighbors algorithm ,Data set ,Tree structure ,010201 computation theory & mathematics ,Search algorithm ,Nearest-neighbor chain algorithm ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Electrical and Electronic Engineering ,Algorithm ,Time complexity ,Curse of dimensionality - Abstract
The location difference of multiple distances based nearest neighbors search algorithm (LDMDBA) performs well in efficiency compared with other kNN algorithms. Its main limitation is that its precision is slightly lower than that of the full search algorithm (FSA). In this paper, we propose an improved LDMDBA algorithm (ILDMDBA) that increases the number of reference points from log(d) to d, where d is the dimensionality of the dataset. In this way, the prediction of ILDMDBA is improved. Our analysis shows that the time complexity of the proposed algorithm is not increased. The effectiveness and efficiency of the proposed algorithm are demonstrated in experiments involving public and artificial datasets.
- Published
- 2016
18. Competitive Quantization for Approximate Nearest Neighbor Search
- Author
-
Ezgi Can Ozan, Moncef Gabbouj, and Serkan Kiranyaz
- Subjects
Linde–Buzo–Gray algorithm ,Computer science ,Nearest neighbor search ,Competitive learning ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,02 engineering and technology ,k-nearest neighbors algorithm ,large-scale retrieval ,Nearest-neighbor chain algorithm ,0202 electrical engineering, electronic engineering, information engineering ,Approximate nearest neighbor search ,Learning vector quantization ,business.industry ,Quantization (signal processing) ,binary codes ,Vector quantization ,020207 software engineering ,Pattern recognition ,Computer Science Applications ,Best bin first ,vector quantization ,Computational Theory and Mathematics ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,Gradient descent ,Algorithm ,Large margin nearest neighbor ,Information Systems - Abstract
In this study, we propose a novel vector quantization algorithm for Approximate Nearest Neighbor (ANN) search based on a joint competitive learning strategy, hence called competitive quantization (CompQ). CompQ is a hierarchical algorithm that iteratively minimizes the quantization error by jointly optimizing the codebooks in each layer using a gradient descent approach. An extensive set of experimental results and comparative evaluations shows that CompQ outperforms the state of the art while retaining a comparable computational complexity.
- Published
- 2016
19. Prototype-Based Classification Using Class Hyperspheres
- Author
-
Hyun-Jong Lee and Doosung Hwang
- Subjects
business.industry ,Computer science ,Nearest-neighbor chain algorithm ,Pattern recognition ,Artificial intelligence ,Greedy algorithm ,business ,Class (biology) ,Large margin nearest neighbor - Published
- 2016
20. Natural Neighbor Reduction Algorithm for Instance-based Learning
- Author
-
Cheng Zhang, Qingsheng Zhu, Dongdong Cheng, Lijun Yang, and Jinlong Huang
- Subjects
Computer science ,020206 networking & telecommunications ,02 engineering and technology ,Filter (signal processing) ,computer.software_genre ,Hybrid algorithm ,Human-Computer Interaction ,Reduction (complexity) ,Best bin first ,Artificial Intelligence ,Nearest-neighbor chain algorithm ,Line (geometry) ,0202 electrical engineering, electronic engineering, information engineering ,Decision boundary ,020201 artificial intelligence & image processing ,Instance-based learning ,Data mining ,computer ,Algorithm ,Software - Abstract
Instance reduction aims to reduce the prohibitive computational cost and storage space of instance-based learning. The most frequently used methods are condensation and edition approaches: condensation removes patterns that are far from the decision boundary and do not contribute to better classification accuracy, while edition removes noisy patterns to improve classification accuracy. In this paper, a new hybrid algorithm called instance reduction based on natural neighbor and nearest enemy is presented. First, an edition algorithm is proposed that uses natural neighbors to filter noisy patterns and smooth the class boundaries; its main advantage is that it does not require any user-defined parameters. Then a new condensation method based on the nearest enemy is used to remove instances far from the decision line, discarding interior instances. Experiments show that the hybrid approach effectively reduces the number of instances while achieving higher classification accuracy than competitive algorithms.
- Published
- 2016
21. Robust clustering by detecting density peaks and assigning points based on fuzzy weighted K-nearest neighbors
- Author
-
Xiaohui Liu, Juanying Xie, Hongchao Gao, Philip W. Grant, and Weixin Xie
- Subjects
DBSCAN ,0209 industrial biotechnology ,Information Systems and Management ,Fuzzy clustering ,Correlation clustering ,Single-linkage clustering ,02 engineering and technology ,computer.software_genre ,Theoretical Computer Science ,020901 industrial engineering & automation ,Artificial Intelligence ,CURE data clustering algorithm ,Nearest-neighbor chain algorithm ,0202 electrical engineering, electronic engineering, information engineering ,Cluster analysis ,k-medians clustering ,Mathematics ,business.industry ,Pattern recognition ,Computer Science Applications ,ComputingMethodologies_PATTERNRECOGNITION ,Control and Systems Engineering ,020201 artificial intelligence & image processing ,Artificial intelligence ,Data mining ,business ,computer ,Software - Abstract
Clustering by fast search and find of Density Peaks (referred to as DPC) was introduced by Alex Rodriguez and Alessandro Laio. The DPC algorithm is based on the idea that cluster centers are characterized by a higher density than their neighbors and by a relatively large distance from points with higher densities. The power of DPC was demonstrated on several test cases: it can intuitively find the number of clusters, can detect and exclude outliers automatically, and recognizes clusters regardless of their shape and of the dimension of the space containing them. However, DPC has some drawbacks to be addressed before it can be widely applied. First, the local density ρi of point i is affected by the cutoff distance dc and is computed in different ways depending on the size of the dataset, which can influence the clustering, especially for small real-world cases. Second, the assignment strategy for the remaining points, after the density peaks (that is, the cluster centers) have been found, can create a "domino effect": once one point is assigned erroneously, many more points may subsequently be mis-assigned. This is especially the case in real-world datasets, where several clusters of arbitrary shape may overlap. To overcome these deficiencies, a robust clustering algorithm is proposed in this paper. To find the density peaks, the algorithm computes the local density ρi of point i relative to its K nearest neighbors, for any size of dataset and independent of the cutoff distance dc, and it assigns the remaining points to the most probable clusters using two new point-assignment strategies. The first strategy assigns non-outliers by a breadth-first search of the K nearest neighbors of a point, starting from the cluster centers. The second strategy assigns outliers, and any points left unassigned by the first procedure, using fuzzy weighted K-nearest neighbors. The proposed clustering algorithm is benchmarked on publicly available synthetic and real-world datasets commonly used for testing the performance of clustering algorithms. Its results are compared not only with DPC but also with several well-known clustering algorithms, including Affinity Propagation (AP), Density-Based Spatial Clustering of Applications with Noise (DBSCAN), and K-means. The benchmarks used are clustering accuracy (Acc), Adjusted Mutual Information (AMI), and Adjusted Rand Index (ARI). The experimental results demonstrate that our proposed clustering algorithm can find cluster centers, recognize clusters regardless of their shape and of the dimension of the space in which they are embedded, is unaffected by outliers, and often outperforms DPC, AP, DBSCAN, and K-means.
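The two quantities driving the method, a kNN-based local density ρ and the distance δ to the nearest denser point, can be computed as in the sketch below; the Gaussian-style density formula is an assumption, since the abstract does not give the exact expression.

```python
import numpy as np
from scipy.spatial.distance import cdist

def density_peaks_knn(X, K=8):
    d = cdist(X, X)
    knn_d = np.sort(d, axis=1)[:, 1:K + 1]           # K nearest, skipping self
    rho = np.exp(-(knn_d ** 2).mean(axis=1))         # assumed kNN local density
    delta = np.full(len(X), np.inf)
    for i in range(len(X)):
        denser = np.flatnonzero(rho > rho[i])
        if denser.size:
            delta[i] = d[i, denser].min()            # nearest denser point
    delta[np.isinf(delta)] = d.max()                 # convention for densest point
    # Candidate centers are points with both large rho and large delta.
    return rho, delta
```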
- Published
- 2016
22. Improved 3D Face Feature-Point Nearest Neighbor Clustering Using Orthogonal Face Map
- Author
-
Mochamad Hariadi, Samuel Gandang Gunanto, and Eko Mulyanto Yuniarto
- Subjects
Health (social science) ,General Computer Science ,Computer science ,business.industry ,General Mathematics ,05 social sciences ,General Engineering ,020207 software engineering ,Pattern recognition ,02 engineering and technology ,Education ,Nearest neighbor clustering ,General Energy ,Feature (computer vision) ,Face (geometry) ,Nearest-neighbor chain algorithm ,0502 economics and business ,0202 electrical engineering, electronic engineering, information engineering ,050211 marketing ,Point (geometry) ,Artificial intelligence ,business ,General Environmental Science - Published
- 2016
23. Initialization of dynamic time warping using tree-based fast Nearest Neighbor
- Author
-
Stergios Poularakis and Ioannis Katsavounidis
- Subjects
Dynamic time warping ,Computer science ,business.industry ,Nearest neighbor search ,Initialization ,020206 networking & telecommunications ,Pattern recognition ,02 engineering and technology ,k-nearest neighbors algorithm ,Euclidean distance ,Tree (data structure) ,ComputingMethodologies_PATTERNRECOGNITION ,Best bin first ,Computer Science::Sound ,Artificial Intelligence ,Nearest-neighbor chain algorithm ,Signal Processing ,Ball tree ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Computer Vision and Pattern Recognition ,Artificial intelligence ,business ,Software - Abstract
Highlights: We initialize the LBKeogh Dynamic Time Warping search using the Euclidean-distance nearest neighbor; we employ a fast nearest neighbor algorithm (fastNN) to increase computational efficiency; we demonstrate successful application on five gesture datasets, requiring about 20% less search time than existing DTW implementations without any drop in recognition accuracy; and we explore system parameters such as the number of training examples and the nature of the data. An efficient way to perform Dynamic Time Warping (DTW) search is to use the LBKeogh lower bound, which can eliminate a large number of candidate vectors from the search. Although effective, LBKeogh begins the DTW search from the first candidate vector, which is typically chosen arbitrarily. In this work, we propose initializing the LBKeogh-based DTW search with the Euclidean-distance nearest neighbor, derived by a fast tree-based nearest neighbor technique. Our experimental results suggest that this simple NN-based approach is quite accurate for trajectory classification of digit and letter gestures and can initialize the DTW search very efficiently, requiring about 20% less search time than existing DTW implementations without any drop in recognition accuracy.
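The mechanics described above (seed the best-so-far with the Euclidean nearest neighbor, then prune candidates whose LBKeogh bound already exceeds it) can be sketched as follows; equal-length sequences are assumed, and a linear scan stands in for the paper's fast tree-based NN step.

```python
import numpy as np

def dtw(a, b, w):
    """Constrained DTW distance (Sakoe-Chiba band of width w)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return np.sqrt(D[n, m])

def lb_keogh(q, c, w):
    """LBKeogh lower bound of DTW(q, c) under band width w."""
    lb = 0.0
    for i, qi in enumerate(q):
        lo, hi = max(0, i - w), min(len(c), i + w + 1)
        u, l = c[lo:hi].max(), c[lo:hi].min()
        if qi > u:
            lb += (qi - u) ** 2
        elif qi < l:
            lb += (qi - l) ** 2
    return np.sqrt(lb)

def dtw_nn(query, candidates, w=5):
    # Seed the best-so-far with the Euclidean nearest neighbor, then scan the
    # rest, computing full DTW only when the lower bound cannot prune.
    eu = [np.linalg.norm(query - c) for c in candidates]
    best = int(np.argmin(eu))
    best_d = dtw(query, candidates[best], w)
    for i, c in enumerate(candidates):
        if i != best and lb_keogh(query, c, w) < best_d:
            d = dtw(query, c, w)
            if d < best_d:
                best, best_d = i, d
    return best, best_d
```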
- Published
- 2016
24. An Improved Affinity Propagation Clustering Algorithm Based on Entropy Weight Method and Principal Component Analysis
- Author
-
Liu Ying, Han Xuming, Ji Qiang, Wang Limin, Zhang Li, and Mu Guangyu
- Subjects
Fuzzy clustering ,General Computer Science ,Computer science ,business.industry ,Correlation clustering ,Pattern recognition ,02 engineering and technology ,01 natural sciences ,010104 statistics & probability ,Data stream clustering ,CURE data clustering algorithm ,Nearest-neighbor chain algorithm ,0202 electrical engineering, electronic engineering, information engineering ,Canopy clustering algorithm ,Affinity propagation ,020201 artificial intelligence & image processing ,Artificial intelligence ,0101 mathematics ,Cluster analysis ,business ,Algorithm - Abstract
The traditional affinity propagation algorithm gives poor results when clustering high-dimensional data, because the "dimension effect" makes it difficult to find the proper class structure. In view of this, we propose an improved algorithm based on the Entropy Weight Method and Principal Component Analysis (EWPCA-AP). The EWPCA-AP algorithm weights the sample data by the Entropy Weight Method, eliminates irrelevant attributes by Principal Component Analysis, and then applies affinity propagation clustering, realizing high-dimensional data clustering in a low-dimensional space. The numerical results of simulation experiments show that the new EWPCA-AP algorithm can effectively eliminate redundant and irrelevant attributes of the data and improve clustering performance. In addition, the proposed algorithm is applied to regional economic data for China, and the clustering result is consistent with the real situation. This algorithm provides a new intelligent evaluation method for the Chinese economy.
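The EWPCA-AP pipeline as described maps naturally onto a few library calls: entropy-weight the features, project with PCA, then run affinity propagation. The sketch below is a compact, assumption-laden rendering; the normalization inside the entropy weighting and the retained-variance threshold are my choices, not the paper's.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import AffinityPropagation

def entropy_weights(X):
    P = (X - X.min(0)) / (np.ptp(X, axis=0) + 1e-12) + 1e-12  # rescale to (0, 1]
    P = P / P.sum(0)
    e = -(P * np.log(P)).sum(0) / np.log(len(X))   # per-feature entropy in [0, 1]
    return (1 - e) / (1 - e).sum()                 # low entropy => high weight

def ewpca_ap(X, var_kept=0.95):
    Xw = X * entropy_weights(X)                    # entropy-weighted features
    Z = PCA(n_components=var_kept).fit_transform(Xw)  # keep 95% of the variance
    return AffinityPropagation(damping=0.9, random_state=0).fit_predict(Z)
```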
- Published
- 2016
25. Engineering optimization by constrained differential evolution with nearest neighbor comparison
- Author
-
Pham Hoang Anh
- Subjects
021103 operations research ,Cover tree ,business.industry ,Nearest neighbor search ,0211 other engineering and technologies ,Pattern recognition ,02 engineering and technology ,k-nearest neighbors algorithm ,Engineering optimization ,Best bin first ,Nearest neighbor graph ,Nearest-neighbor chain algorithm ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,Algorithm ,Large margin nearest neighbor ,Mathematics - Abstract
It has been proposed to utilize nearest neighbor comparison to reduce the number of function evaluations in unconstrained optimization: the comparison omits the function evaluation of a point when the outcome can be judged from its nearest point in the search population. In this paper, a constrained differential evolution (DE) algorithm is proposed that combines the ε-constrained method for handling constraints with the nearest neighbor comparison method. The algorithm is tested on five benchmark engineering design problems, and the results indicate that the proposed DE algorithm is able to find good results in a much smaller number of objective function evaluations than conventional DE, and that it is competitive with other state-of-the-art DE variants.
- Published
- 2016
26. A Novel Approach for Data Clustering using Improved K-means Algorithm
- Author
-
Shubha Puthran and Rishikesh Suryawanshi
- Subjects
DBSCAN ,Normalization (statistics) ,Clustering high-dimensional data ,Fuzzy clustering ,Computer science ,Correlation clustering ,Single-linkage clustering ,02 engineering and technology ,computer.software_genre ,CURE data clustering algorithm ,020204 information systems ,Nearest-neighbor chain algorithm ,Consensus clustering ,0202 electrical engineering, electronic engineering, information engineering ,Cluster analysis ,k-medians clustering ,FSA-Red Algorithm ,k-medoids ,k-means clustering ,Constrained clustering ,Data set ,Determining the number of clusters in a data set ,ComputingMethodologies_PATTERNRECOGNITION ,Data stream clustering ,Canopy clustering algorithm ,FLAME clustering ,Affinity propagation ,020201 artificial intelligence & image processing ,Data mining ,computer - Abstract
In statistics and data mining, k-means is well known for its efficiency in clustering large datasets. The aim is to group data points into clusters such that similar items are lumped together in the same cluster. The k-means clustering algorithm is one of the most commonly used algorithms for clustering analysis, but the existing algorithm is inefficient on large data, and improving it remains a problem. There also exist some flaws in the classical k-means algorithm: it is sensitive to the selection of the initial centroids, and the quality of the resulting clusters heavily depends on that selection. K-means clustering is a method of cluster analysis that aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean. The proposed work performs data clustering efficiently by decreasing the time needed to generate clusters; our aim is to improve performance using normalization and initial centroid selection techniques in the existing algorithm. Experimental results show that the proposed algorithm can overcome the shortcomings of the k-means algorithm. Keywords: analysis, clustering, k-means algorithm, improved k-means algorithm.
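A minimal version of the two ingredients the abstract highlights, normalization and non-random initial centroids, might look like the sketch below; farthest-point seeding is used as a plausible stand-in, since the paper's exact centroid-selection rule is not given here.

```python
import numpy as np
from sklearn.cluster import KMeans

def farthest_point_seeds(X, k, seed=0):
    rng = np.random.default_rng(seed)
    seeds = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        # distance from every point to its closest already-chosen seed
        d = np.min([np.linalg.norm(X - s, axis=1) for s in seeds], axis=0)
        seeds.append(X[np.argmax(d)])              # take the farthest point
    return np.array(seeds)

def improved_kmeans(X, k):
    Xn = (X - X.min(0)) / (np.ptp(X, axis=0) + 1e-12)   # min-max normalization
    init = farthest_point_seeds(Xn, k)
    return KMeans(n_clusters=k, init=init, n_init=1).fit_predict(Xn)
```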
- Published
- 2016
27. A density-based noisy graph partitioning algorithm
- Author
-
Seoung Bum Kim and Jaehong Yu
- Subjects
Clustering high-dimensional data ,DBSCAN ,Fuzzy clustering ,Computer science ,Cognitive Neuroscience ,Correlation clustering ,0211 other engineering and technologies ,0102 computer and information sciences ,02 engineering and technology ,computer.software_genre ,01 natural sciences ,Artificial Intelligence ,CURE data clustering algorithm ,Nearest-neighbor chain algorithm ,Cluster analysis ,k-medians clustering ,021103 operations research ,k-medoids ,Graph partition ,Constrained clustering ,Spectral clustering ,Graph ,Computer Science Applications ,Determining the number of clusters in a data set ,Data stream clustering ,010201 computation theory & mathematics ,Canopy clustering algorithm ,Affinity propagation ,FLAME clustering ,Data mining ,computer ,Algorithm - Abstract
Clustering analysis can facilitate the extraction of implicit patterns in a dataset and elicit its natural groupings without requiring prior classification information. Numerous researchers have focused recently on graph-based clustering algorithms because their graph structure is useful for modeling the local relationships among observations. These algorithms perform reasonably well in their intended applications, but no consensus exists about which of them best satisfies all the conditions encountered in the variety of real situations. In this study, we propose a graph-based clustering algorithm based on a novel density-of-graph structure. In the proposed algorithm, a density coefficient defined for each node is used to classify dense and sparse nodes; the main structures of clusters are identified through the dense nodes, and the sparse nodes are then assigned to specific clusters. Notably, the proposed algorithm does not require the number of clusters to be specified in advance. Experiments on various simulation and benchmark datasets were conducted to examine the properties of the proposed algorithm and to compare its performance with that of existing spectral clustering and modularity-based algorithms. The experimental results demonstrate that the proposed clustering algorithm performs better than its competitors, especially when the cluster structures in the data are inherently noisy and nonlinearly distributed.
- Published
- 2016
28. Algorithm of approximate search for the nearest digital array in a hierarchical data set
- Author
-
A.M. Lange, M.M. Lange, and S.N. Ganebnykh
- Subjects
Set (abstract data type) ,Best bin first ,Computer science ,Nearest-neighbor chain algorithm ,Data mining ,computer.software_genre ,Digital array ,Algorithm ,computer ,Hierarchical database model - Published
- 2016
29. An efficient and scalable density-based clustering algorithm for datasets with complex structures
- Author
-
Yinghua Lv, Tinghuai Ma, Jie Cao, Mznah Al-Rodhaan, Meili Tang, Abdullah Al-Dhelaan, and Yuan Tian
- Subjects
DBSCAN ,business.industry ,Cognitive Neuroscience ,OPTICS algorithm ,Pattern recognition ,02 engineering and technology ,computer.software_genre ,Computer Science Applications ,Locality-sensitive hashing ,Data point ,Artificial Intelligence ,SUBCLU ,020204 information systems ,Nearest-neighbor chain algorithm ,0202 electrical engineering, electronic engineering, information engineering ,Unsupervised learning ,020201 artificial intelligence & image processing ,Data mining ,Artificial intelligence ,Cluster analysis ,business ,Algorithm ,computer ,Mathematics - Abstract
As a research branch of data mining, clustering, as an unsupervised learning scheme, focuses on assigning the objects in a dataset into several groups, called clusters, without any prior knowledge. Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is one of the most widely used clustering algorithms for spatial datasets; it can detect clusters of any shape and can automatically identify noise points. However, DBSCAN has several troublesome limitations: (1) its performance depends on two user-specified parameters, ε and MinPts, where ε represents the maximum radius of a neighborhood around the observing point and MinPts is the minimum number of data points contained in such a neighborhood; (2) the time consumed searching for the nearest neighbors of each object during cluster expansion is intolerable; (3) selecting different starting points yields quite different results; and (4) DBSCAN is unable to identify adjacent clusters of different densities. Beyond these restrictions, the identification of border points is often ignored. In this paper, we address the above problems. First, we improve the traditional locality-sensitive hashing method to implement fast nearest neighbor queries. Second, several definitions are recast on the basis of the influence space of each object, which takes both the nearest neighbors and the reverse nearest neighbors into account; the influence space is shown to be sensitive to local density changes, which reduces the number of parameters and allows adjacent clusters of different densities to be identified. Moreover, this new influence-space-based relationship makes the algorithm insensitive to the order in which points are input. Finally, a new concept, core-density reachability based on the influence space, is put forward to distinguish border objects from noisy objects. Several experiments demonstrate that the performance of our proposed algorithm is better than both the traditional DBSCAN algorithm and the improved algorithm IS-DBSCAN.
- Published
- 2016
30. Extensions of k-Nearest Neighbor Algorithm
- Author
-
Rashmi Agrawal
- Subjects
ComputingMethodologies_PATTERNRECOGNITION ,Best bin first ,General Computer Science ,Computer science ,Nearest-neighbor chain algorithm ,Genetic algorithm ,General Engineering ,Feature selection ,Class (biology) ,Fuzzy logic ,Algorithm ,Large margin nearest neighbor ,k-nearest neighbors algorithm - Abstract
The aim of this study is to review various extensions of the nearest neighbor algorithm and to discuss their approaches along with the limitations of the method. In nonparametric classification, no prior information is required for predicting the class label. k-nearest neighbor is the simplest and best known algorithm used in data mining. The extensions of the k-nearest neighbor algorithm studied here are weighted nearest neighbor, feature selection methods, fuzzy nearest neighbor, genetic-algorithm-based classifiers, and nearest neighbor classifiers using ensembling techniques.
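Of the listed extensions, the weighted nearest neighbor rule is the quickest to illustrate: closer neighbors get larger votes. The sketch below uses inverse-distance weights, which is one common choice among several.

```python
import numpy as np
from scipy.spatial.distance import cdist

def weighted_knn(Xtr, ytr, Xte, k=5):
    d = cdist(Xte, Xtr)
    nn = np.argsort(d, axis=1)[:, :k]
    preds = []
    for row, idx in enumerate(nn):
        w = 1.0 / (d[row, idx] + 1e-12)            # inverse-distance weights
        votes = {}
        for label, wi in zip(ytr[idx], w):
            votes[label] = votes.get(label, 0.0) + wi
        preds.append(max(votes, key=votes.get))    # largest weighted vote wins
    return np.array(preds)
```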
- Published
- 2016
31. k Nearest Neighbor Classification Coprocessor with Weighted Clock-Mapping-Based Searching
- Author
-
Toshinobu Akazawa, Fengwei An, Shogo Yamasaki, Hans Jurgen Mattausch, and Lei Chen
- Subjects
010302 applied physics ,Coprocessor ,business.industry ,Computer science ,Nearest neighbor search ,Pattern recognition ,02 engineering and technology ,01 natural sciences ,020202 computer hardware & architecture ,Electronic, Optical and Magnetic Materials ,k-nearest neighbors algorithm ,Best bin first ,Nearest neighbor graph ,Nearest-neighbor chain algorithm ,0103 physical sciences ,0202 electrical engineering, electronic engineering, information engineering ,Artificial intelligence ,Electrical and Electronic Engineering ,Fixed-radius near neighbors ,business ,Large margin nearest neighbor - Published
- 2016
32. An Approach to Reduce the Computational Burden of Nearest Neighbor Classifier
- Author
-
Rajesh Kumar, C. Shobha Bindu, and P. R. Viswanath
- Subjects
Nearest Neighbour Classifiers ,Computer science ,Nearest neighbor search ,Feature selection ,02 engineering and technology ,Machine learning ,computer.software_genre ,Editing ,k-nearest neighbors algorithm ,Set (abstract data type) ,Cardinality ,Nearest-neighbor chain algorithm ,0202 electrical engineering, electronic engineering, information engineering ,General Environmental Science ,Training set ,business.industry ,020206 networking & telecommunications ,Linear Discriminant Analysis ,General Earth and Planetary Sciences ,020201 artificial intelligence & image processing ,Artificial intelligence ,Data mining ,business ,computer ,Curse of dimensionality - Abstract
Nearest neighbor classifiers demand high computational resources, i.e., time and memory. Reduction of the reference set (training set) and feature selection are two different approaches to this problem. This paper presents a method to reduce the training set in both cardinality and dimensionality, in cascade. Experiments are performed on several benchmark datasets, and the results obtained are satisfactory.
- Published
- 2016
33. A novel supervised cluster adjustment method using a fast exact nearest neighbor search algorithm
- Author
-
Ali Zaghian and Fakhroddin Noorbehbahani
- Subjects
Fuzzy clustering ,Computer science ,Single-linkage clustering ,Correlation clustering ,02 engineering and technology ,computer.software_genre ,Artificial Intelligence ,020204 information systems ,Nearest-neighbor chain algorithm ,0202 electrical engineering, electronic engineering, information engineering ,Cluster analysis ,business.industry ,05 social sciences ,k-means clustering ,Pattern recognition ,ComputingMethodologies_PATTERNRECOGNITION ,Best bin first ,Canopy clustering algorithm ,Computer Vision and Pattern Recognition ,Artificial intelligence ,Data mining ,0509 other social sciences ,050904 information & library sciences ,business ,Algorithm ,computer - Abstract
Supervised clustering is a new research area that aims to improve unsupervised clustering algorithms by exploiting supervised information. Several clustering algorithms exist today, but an effective supervised cluster adjustment method that can adjust the resulting clusters regardless of the applied clustering algorithm has not yet been presented. In this paper, we propose a new supervised cluster adjustment method that can be applied to any clustering algorithm. Since the adjustment method is based on finding nearest neighbors, a novel exact nearest neighbor search algorithm is also introduced, which is significantly faster than the classic one. Several datasets and clustering evaluation metrics are employed to examine the effectiveness of the proposed cluster adjustment method and the proposed fast exact nearest neighbor algorithm comprehensively. The experimental results show that the proposed algorithms are significantly effective in improving clusters and accelerating nearest neighbor searches.
- Published
- 2015
34. Gravitational fixed radius nearest neighbor for imbalanced problem
- Author
-
Zhe Wang, Daqi Gao, and Yujin Zhu
- Subjects
Information Systems and Management ,Cover tree ,Computer science ,Nearest neighbor search ,Nearest neighbour algorithm ,computer.software_genre ,Management Information Systems ,k-nearest neighbors algorithm ,ComputingMethodologies_PATTERNRECOGNITION ,Best bin first ,Nearest neighbor graph ,Artificial Intelligence ,Nearest-neighbor chain algorithm ,R-tree ,Ball tree ,Data mining ,Fixed-radius near neighbors ,computer ,Software ,Large margin nearest neighbor - Abstract
Highlights: We introduce a gravitational scenario into the fixed-radius nearest neighbor rule. The proposed GFRNN addresses the imbalanced classification problem; it does not need any manual parameter setting or coordination, and comparison experiments on 40 datasets validate its effectiveness and efficiency. This paper proposes a novel learning model that introduces the calculation of the pairwise gravitation of selected patterns into the classical fixed-radius nearest neighbor method, in order to overcome the drawback of the original nearest neighbor rule when dealing with imbalanced data. The traditional k-nearest neighbor rule loses power on imbalanced datasets because the final decision can be dominated by patterns from the negative classes in spite of the distance measurements. Unlike existing modified nearest neighbor learning models, the proposed method, named GFRNN, has a simple structure and is thus easy to apply; moreover, none of its parameters need initializing or coordinating during the learning procedure. In practice, GFRNN first selects patterns as candidates from the training set under the fixed-radius nearest neighbor rule, then introduces a metric based on a modified law of gravitation from the physical world to measure the distance between the query pattern and each candidate, and finally makes the decision based on the sum of all the corresponding gravitational forces exerted on the query pattern by the candidates. An experimental comparison against nine typical methods on forty imbalanced datasets validates both the effectiveness and the efficiency of GFRNN. In conclusion, the contribution of this paper is a new, simple nearest neighbor architecture that deals with imbalanced classification effectively without any manual parameter coordination, further expanding the family of nearest-neighbor-based rules.
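The decision rule, as it reads from the abstract, can be sketched as follows: gather the fixed-radius neighbors of the query, let each exert a "force" that decays with squared distance, and pick the class with the largest total. The class-mass definition (inverse class frequency) and the 1-NN fallback for an empty neighborhood are assumptions, not the paper's exact formulas.

```python
import numpy as np

def gfrnn_classify(x, Xtr, ytr, radius):
    d = np.linalg.norm(Xtr - x, axis=1)
    cand = d <= radius                               # fixed-radius candidates
    if not cand.any():
        return ytr[np.argmin(d)]                     # assumed 1-NN fallback
    classes, counts = np.unique(ytr, return_counts=True)
    mass = {c: 1.0 / n for c, n in zip(classes, counts)}  # favors minority class
    force = {c: 0.0 for c in classes}
    for yi, di in zip(ytr[cand], d[cand]):
        force[yi] += mass[yi] / (di ** 2 + 1e-12)    # gravitational-style pull
    return max(force, key=force.get)
```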
- Published
- 2015
35. Optimal nearest neighbor queries in sensor networks
- Author
-
Costas Busch and Gokarna Sharma
- Subjects
Theoretical computer science ,Asymptotically optimal algorithm ,General Computer Science ,Competitive analysis ,Nearest neighbor graph ,Distributed algorithm ,Video tracking ,Nearest-neighbor chain algorithm ,Wireless sensor network ,Theoretical Computer Science ,k-nearest neighbors algorithm ,Mathematics - Abstract
Given a set of m mobile objects in a sensor network, we consider the problem of finding the nearest object among them from any node in the network at any time. These mobile objects are tracked by nearby sensors called proxy nodes. This problem requires an object tracking mechanism that typically relies on two basic operations: query and update. A query is invoked by a node whenever it needs to find the closest object in the network. Updates of an object's location are initiated when the object moves from one location (proxy node) to another. We present a scalable distributed algorithm for tracking these mobile objects such that both the query cost and the update cost are small. The main idea in our algorithm is to maintain a virtual tree of downward paths pointing to the objects. Our algorithm guarantees an asymptotically optimal O(1) approximation for query cost and an O(min{log n, log D}) approximation for update cost in the constant-doubling graph model, where n and D, respectively, are the number of nodes and the diameter of the network. We also give polylogarithmic approximations for both query and update cost in the general graph model. Our algorithm requires only polylogarithmic bits of memory per node. To the best of our knowledge, this is the first algorithm that is asymptotically optimal in handling nearest neighbor queries with low update cost in a distributed setting. A distributed algorithm for the nearest neighbor problem in sensor networks is proposed. It attains a constant approximation for query cost in the constant-doubling graph model. It attains a logarithmic approximation for update cost in the constant-doubling graph model. It attains polylogarithmic approximations for both query and update cost in the general graph model. It is scalable, as both cost approximations are independent of the number of mobile objects.
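The following toy, centralized Python simulation illustrates only the "virtual tree of downward paths" idea; names and structure are hypothetical. The actual algorithm is distributed, and its construction makes the returned object provably near-optimal in distance, which this toy does not attempt: it simply returns some tracked object.

class Node:
    # Toy, centralized stand-in for a network node in the virtual tree.
    def __init__(self, parent=None):
        self.parent = parent
        self.down = None        # next hop toward an object, if any below

def update(proxy):
    # An object moved to `proxy`: repoint the whole root-to-proxy path.
    node, prev = proxy, None
    while node is not None:
        node.down = prev if prev is not None else node   # self-loop at proxy
        prev, node = node, node.parent

def query(start):
    # Walk up until a downward pointer exists, then follow it down.
    node = start
    while node.down is None:    # assumes at least one object was updated
        node = node.parent
    while node.down is not node:
        node = node.down
    return node                 # proxy currently holding the object

root = Node()
a, b = Node(root), Node(root)
p1, p2 = Node(a), Node(b)       # leaf proxies
update(p1)
assert query(p2) is p1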
- Published
- 2015
36. Preserving nearest neighbor consistency in cluster analysis
- Author
-
Jong-Seok Lee
- Subjects
Best bin first ,Nearest neighbor graph ,Consistency (statistics) ,Computer science ,business.industry ,Nearest-neighbor chain algorithm ,Cluster (physics) ,Pattern recognition ,Artificial intelligence ,business ,Large margin nearest neighbor ,k-nearest neighbors algorithm - Published
- 2018
37. Representative points clustering algorithm based on density factor and relevant degree
- Author
-
Di Wu, Long Sheng, and Jiadong Ren
- Subjects
DBSCAN ,Single-linkage clustering ,Correlation clustering ,020207 software engineering ,02 engineering and technology ,computer.software_genre ,Complete-linkage clustering ,Artificial Intelligence ,CURE data clustering algorithm ,Nearest-neighbor chain algorithm ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Computer Vision and Pattern Recognition ,Data mining ,Cluster analysis ,Algorithm ,computer ,Software ,k-medians clustering ,Mathematics - Abstract
Most existing clustering algorithms are seriously affected by noise data and high time cost. In this paper, building on the CURE algorithm, a representative points clustering algorithm based on density factor and relevant degree, called RPCDR, is proposed. The definitions of density factor and relevant degree are presented. A primary representative point whose density factor is less than a prescribed threshold is deleted directly, and new representative points can be reselected from the non-representative points of the corresponding cluster. Moreover, the representative points of each cluster are modeled using the K-nearest neighbor method. Relevant degree is computed by comprehensively considering the correlations of objects within a cluster and between different clusters, and it is then used to judge whether two clusters should be merged. Theoretical analysis and experimental results show that RPCDR achieves better clustering accuracy and execution efficiency.
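As an illustration of the pruning step only, the hedged sketch below assumes the density factor of a representative is the count of data points within a radius, which is one plausible reading rather than the paper's exact definition; the refill rule is likewise a simplification and may re-pick points near existing representatives.

import numpy as np

def prune_representatives(reps, X, radius, min_density):
    # Density factor (assumed): number of data points within `radius`.
    d = np.linalg.norm(X[None, :, :] - reps[:, None, :], axis=2)
    density = (d <= radius).sum(axis=1)
    kept = reps[density >= min_density]       # drop sparse representatives

    # Reselect replacements among the densest data points (simplified).
    d_all = np.linalg.norm(X[None, :, :] - X[:, None, :], axis=2)
    point_density = (d_all <= radius).sum(axis=1)
    n_missing = len(reps) - len(kept)
    refill = X[np.argsort(point_density)[::-1][:n_missing]]
    return np.vstack([kept, refill]) if n_missing else kept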
- Published
- 2015
38. A Genetic Algorithm Based Clustering Ensemble Approach to Learning Relational Databases
- Author
-
Joe Henry Obit, Gabriel Jong Chiye, On Chin Kim, Hui Keng Lau, Rayner Alfred, and Mohd Hanafi Ahmad Hijazi
- Subjects
Health (social science) ,Theoretical computer science ,General Computer Science ,General Mathematics ,Correlation clustering ,Single-linkage clustering ,General Engineering ,Constrained clustering ,computer.software_genre ,Education ,Determining the number of clusters in a data set ,ComputingMethodologies_PATTERNRECOGNITION ,General Energy ,CURE data clustering algorithm ,Nearest-neighbor chain algorithm ,Canopy clustering algorithm ,Data mining ,Cluster analysis ,computer ,General Environmental Science ,Mathematics - Abstract
Clustering is an unsupervised learning task. The k-Means algorithm is one of the well-known and promising clustering algorithms, and it converges to a local optimum in a few iterations. In our work, we hybridize the k-Means algorithm with a Genetic Algorithm to search the global space in order to converge toward a global optimum. A difficulty in clustering is that as the number of clusters increases up to the total number of records in the dataset, each cluster eventually contains a single record, and cluster purity reaches its maximum value of 1; however, such a clustering is useless, since the common regularities among records are not revealed. Choosing the best number of clusters is therefore not trivial. Instead of fixing a possibly inappropriate number of clusters and risking the main purpose of the clustering process, a Genetic Algorithm based k-Means ensemble is proposed in order to find a consensus result over several runs of the clustering task with different numbers of clusters, k.
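The abstract does not specify how the consensus over runs with different k is formed; one common device for combining such runs is a co-association matrix, sketched here under that assumption rather than as the authors' mechanism.

import numpy as np

def co_association(label_runs):
    # Entry (i, j): fraction of runs in which points i and j co-cluster.
    runs = np.asarray(label_runs)          # shape (n_runs, n_points)
    n = runs.shape[1]
    co = np.zeros((n, n))
    for labels in runs:
        co += labels[:, None] == labels[None, :]
    return co / len(runs)

# Pairs co-clustered in most runs (say > 0.5) can then be linked, e.g. by
# single-linkage over 1 - co, to read off a consensus clustering.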
- Published
- 2015
39. Fast approximate matching of binary codes with distinctive bits
- Author
-
Yanping Ma, Bing Zhang, Hongtao Xie, Qiong Dai, Yizhi Liu, and Chenggang Clarence Yan
- Subjects
General Computer Science ,Computer science ,computer.software_genre ,Uniform binary search ,Linear code ,Random binary tree ,Theoretical Computer Science ,Hierarchical clustering ,Nearest-neighbor chain algorithm ,Binary code ,Data mining ,Hamming space ,Cluster analysis ,computer ,Algorithm - Abstract
Although the distance between binary codes can be computed quickly in Hamming space, linear search is not practical for large-scale datasets. Attention has therefore been paid to efficient approximate nearest neighbor search, in which hierarchical clustering trees (HCT) are widely used. However, HCT select cluster centers randomly and build indexes with the entire binary code, which degrades search performance. In this paper, we first propose a new clustering algorithm that chooses cluster centers on the basis of relative distances and uses a more homogeneous partition of the dataset than HCT to build the hierarchical clustering trees. Then, we present an algorithm that compresses binary codes by extracting distinctive bits according to the standard deviation of each bit. Consequently, a new index is proposed using compressed binary codes based on a hierarchical decomposition of binary spaces. Experiments conducted on reference datasets and a dataset of one billion binary codes demonstrate the effectiveness and efficiency of our method.
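The bit-selection step can be illustrated directly. A minimal sketch, assuming plain per-bit standard deviation over a 0/1 code matrix (the paper's center-selection and index construction are not reproduced):

import numpy as np

def distinctive_bits(codes, n_bits):
    # Keep the n_bits positions with the highest standard deviation;
    # near-constant bits carry little discriminative information.
    std = codes.std(axis=0)
    keep = np.argsort(std)[::-1][:n_bits]
    return np.sort(keep)

def hamming(a, b):
    return np.count_nonzero(a != b)

codes = np.random.randint(0, 2, size=(1000, 64))   # toy 64-bit codes
keep = distinctive_bits(codes, 16)
short = codes[:, keep]        # compressed codes for coarse filtering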
- Published
- 2015
40. NBC: An Efficient Hierarchical Clustering Algorithm for Large Datasets
- Author
-
Wei Zhang, Gongxuan Zhang, Tao Li, Zhaomeng Zhu, and Yongli Wang
- Subjects
Linguistics and Language ,Brown clustering ,Computer Networks and Communications ,Computer science ,business.industry ,Single-linkage clustering ,Correlation clustering ,Pattern recognition ,computer.software_genre ,Computer Science Applications ,Hierarchical clustering ,ComputingMethodologies_PATTERNRECOGNITION ,Artificial Intelligence ,CURE data clustering algorithm ,Nearest-neighbor chain algorithm ,Canopy clustering algorithm ,Artificial intelligence ,Data mining ,Hierarchical clustering of networks ,business ,Algorithm ,computer ,Software ,Information Systems - Abstract
Nearest neighbor search is a key technique used in hierarchical clustering, and its computational complexity determines the performance of the hierarchical clustering algorithm. The time complexity of standard agglomerative hierarchical clustering is O(n³), while the time complexity of more advanced hierarchical clustering algorithms (such as nearest neighbor chain, SLINK and CLINK) is O(n²). This paper presents a new nearest neighbor search method called nearest neighbor boundary (NNB), which first divides a large dataset into independent subsets and then finds the nearest neighbor of each point within its subset. When NNB is used, the time complexity of hierarchical clustering can be reduced to O(n log₂ n). Based on NNB, we propose a fast hierarchical clustering algorithm called nearest-neighbor boundary clustering (NBC), and the proposed algorithm can be adapted to parallel and distributed computing frameworks. The experimental results demonstrate that our algorithm is practical for large datasets.
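As a toy illustration of partition-based nearest neighbor search only (not NBC's actual boundary handling), the sketch below buckets 2-D points into grid cells and searches just the adjacent cells:

import numpy as np
from collections import defaultdict

def grid_nn(X, cell):
    # Bucket 2-D points into cells of width `cell` and look only at the
    # 3x3 cell neighborhood. It can miss the true NN when no neighbor
    # lies nearby; real NNB handles subset boundaries properly.
    keys = [tuple(k) for k in np.floor(X / cell).astype(int)]
    buckets = defaultdict(list)
    for i, key in enumerate(keys):
        buckets[key].append(i)
    nn = np.full(len(X), -1)
    for i, (kx, ky) in enumerate(keys):
        best = np.inf
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in buckets.get((kx + dx, ky + dy), []):
                    if j != i:
                        dist = np.linalg.norm(X[i] - X[j])
                        if dist < best:
                            best, nn[i] = dist, j
    return nn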
- Published
- 2015
41. Multi-Threaded Hierarchical Clustering by Parallel Nearest-Neighbor Chaining
- Author
-
Sungroh Yoon and Yongkweon Jeon
- Subjects
Clustering high-dimensional data ,Hierarchical agglomerative clustering ,Fuzzy clustering ,Brown clustering ,Computer science ,Distributed computing ,Correlation clustering ,Constrained clustering ,k-nearest neighbors algorithm ,Hierarchical clustering ,Biclustering ,Data stream clustering ,Computational Theory and Mathematics ,Hardware and Architecture ,CURE data clustering algorithm ,Nearest-neighbor chain algorithm ,Signal Processing ,Consensus clustering ,Canopy clustering algorithm ,Unsupervised learning ,Algorithm design ,Hierarchical clustering of networks ,Cluster analysis - Abstract
Hierarchical agglomerative clustering (HAC) is a clustering method widely used in various disciplines from astronomy to zoology. HAC is useful for discovering hierarchical structure embedded in input data. The cost of executing HAC on large data is typically high, due to the need for maintaining global inter-cluster distance information throughout the execution. To address this issue, we propose a new parallelization scheme for multi-threaded shared-memory machines based on the concept of nearest-neighbor (NN) chains. The proposed multi-threaded algorithm allocates available threads into two groups, one for managing NN chains and the other for updating distance information. In-depth analysis of our approach gives insight into the ideal configuration of threads and theoretical performance bounds. We evaluate our proposed method by testing it with multiple public datasets and comparing its performance with that of several alternatives. In our test, the proposed method completes hierarchical clustering 3.09-51.79 times faster than the alternatives. Our test results also reveal the effects of performance-limiting factors such as starvation in chain growing, overhead incurred from using synchronization locks, and hardware aspects including memory-bandwidth saturation. According to our evaluation, the proposed scheme is effective in improving the HAC algorithm, achieving significant gains over the alternatives in terms of runtime and scalability.
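For reference, a minimal sequential nearest-neighbor-chain implementation with average linkage is sketched below; the paper's contribution is the multi-threaded scheme built on top of this idea, which the sketch does not attempt.

import numpy as np

def nn_chain_hac(X):
    # Average-linkage agglomeration via nearest-neighbor chains.
    # Returns the merge history as (cluster_a, cluster_b) index pairs.
    n = len(X)
    d = np.sqrt(((X[:, None] - X[None, :]) ** 2).sum(-1))
    np.fill_diagonal(d, np.inf)
    size = np.ones(n)
    active = set(range(n))
    chain, merges = [], []
    while len(active) > 1:
        if not chain:
            chain.append(next(iter(active)))
        a = chain[-1]
        b = min((j for j in active if j != a), key=lambda j: d[a, j])
        if len(chain) > 1 and b == chain[-2]:
            # Reciprocal nearest neighbors found: merge them.
            chain.pop()
            chain.pop()
            merges.append((a, b))
            # Lance-Williams average-linkage update; cluster b folds into a.
            for k in active - {a, b}:
                d[a, k] = d[k, a] = ((size[a] * d[a, k] + size[b] * d[b, k])
                                     / (size[a] + size[b]))
            size[a] += size[b]
            active.remove(b)
            d[b, :] = d[:, b] = np.inf
        else:
            chain.append(b)    # keep following nearest neighbors
    return merges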
- Published
- 2015
42. Fast nearest neighbor searching based on improved VP-tree
- Author
-
Shiguang Liu and Yinwei Wei
- Subjects
Computational complexity theory ,Computer science ,computer.software_genre ,k-nearest neighbors algorithm ,Best bin first ,Artificial Intelligence ,Nearest-neighbor chain algorithm ,Signal Processing ,Pattern recognition (psychology) ,Ball tree ,Redundancy (engineering) ,Computer Vision and Pattern Recognition ,Data mining ,computer ,Software ,Vantage-point tree - Abstract
A novel nearest neighbor search framework is proposed. PCA is adopted to optimize the VP-tree structure for efficiency. A novel approach to controlling the pruning conditions of the VP-tree is developed. A complete redundancy-elimination method, CERM, is specially designed. Nearest neighbor search is an important issue in both pattern recognition and image processing. However, most previous methods suffer from high computational complexity, which keeps nearest neighbor search from practical applications. This paper proposes a novel fast nearest neighbor search method that combines an improved VP-tree with the PatchMatch method. PCA (Principal Component Analysis) is employed to optimize the VP-tree so as to improve search speed. We also design an approach to controlling the pruning conditions of the VP-tree, which further improves search efficiency. A thorough redundancy-elimination method on the GPU is also developed, with a computational complexity that is independent of the patch size. Various experiments show that our new method achieves a better balance between computational efficiency and memory requirements, while also improving search accuracy to some extent, with great potential for practical real-time applications.
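A minimal plain VP-tree, without the paper's PCA ordering, pruning-condition control, or GPU redundancy elimination, looks roughly like this:

import random
import numpy as np

class VPTree:
    def __init__(self, points):
        # `points` is a list of equal-length numpy vectors.
        self.point, self.left, self.right, self.mu = None, None, None, 0.0
        if not points:
            return
        idx = random.randrange(len(points))
        self.point = points[idx]            # vantage point
        rest = points[:idx] + points[idx + 1:]
        if not rest:
            return
        dists = [float(np.linalg.norm(p - self.point)) for p in rest]
        self.mu = float(np.median(dists))   # split radius
        inner = [p for p, d in zip(rest, dists) if d <= self.mu]
        outer = [p for p, d in zip(rest, dists) if d > self.mu]
        self.left = VPTree(inner) if inner else None
        self.right = VPTree(outer) if outer else None

    def nearest(self, q, best=(None, float("inf"))):
        if self.point is None:
            return best
        d = float(np.linalg.norm(q - self.point))
        if d < best[1]:
            best = (self.point, d)
        near, far = (self.left, self.right) if d <= self.mu else (self.right, self.left)
        if near is not None:
            best = near.nearest(q, best)
        # Visit the far side only if the best ball crosses the boundary.
        if far is not None and abs(d - self.mu) < best[1]:
            best = far.nearest(q, best)
        return best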
- Published
- 2015
43. Improved nearest neighbor classifiers by weighting and selection of predictors
- Author
-
Dominik Koch and Gerhard Tutz
- Subjects
Statistics and Probability ,Boosting (machine learning) ,02 engineering and technology ,computer.software_genre ,Logistic regression ,01 natural sciences ,Theoretical Computer Science ,k-nearest neighbors algorithm ,010104 statistics & probability ,Nearest-neighbor chain algorithm ,0202 electrical engineering, electronic engineering, information engineering ,0101 mathematics ,Parametric statistics ,Mathematics ,business.industry ,Pattern recognition ,Weighting ,Random forest ,Support vector machine ,ComputingMethodologies_PATTERNRECOGNITION ,Computational Theory and Mathematics ,020201 artificial intelligence & image processing ,Artificial intelligence ,Data mining ,Statistics, Probability and Uncertainty ,business ,computer - Abstract
Nearest neighborhood classification is a flexible classification method that works under weak assumptions. The basic concept is to use the weighted or unweighted sums over class indicators of observations in the neighborhood of the target value. Two modifications that improve the performance are considered here. First, instead of using weights that are determined solely by the distances, we estimate the weights with a logit model; by using a selection procedure such as the lasso or boosting, the relevant nearest neighbors are selected automatically. Second, building on this concept of estimation and selection, we extend the predictor space: we include nearest neighborhood counts, the original predictors themselves, and nearest neighborhood counts that use distances in subdimensions of the predictor space. The resulting classifiers combine the strength of nearest neighbor methods with parametric approaches and, by using subdimensions, are able to select the relevant features. Simulations and real data sets demonstrate that the method yields better misclassification rates than currently available nearest neighborhood methods and is a strong and flexible competitor in classification problems.
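A hedged sketch of the first modification, using scikit-learn's L1-penalized logistic regression as a lasso-style selector over neighbor class indicators; the feature layout (one indicator per neighbor rank) is my assumption about the construction, for binary labels y in {0, 1}.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

def fit_logit_knn(X, y, k=15):
    # Feature m is the class indicator of the m-th nearest neighbor, so a
    # logit fit turns neighbor ranks into learned, selectable weights.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)
    Z = y[idx[:, 1:]]                 # drop the self-neighbor column
    model = LogisticRegression(penalty="l1", solver="liblinear")
    model.fit(Z, y)
    return nn, model

def predict_logit_knn(nn, model, y, X_new, k=15):
    _, idx = nn.kneighbors(X_new, n_neighbors=k)
    return model.predict(y[idx])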
- Published
- 2015
44. Parallelization of a graph-cut based algorithm for hierarchical clustering of web documents
- Author
-
S. Mercy Shalinie and Karthick Seshadri
- Subjects
Clustering high-dimensional data ,Fuzzy clustering ,Theoretical computer science ,Computer Networks and Communications ,Computer science ,Correlation clustering ,Single-linkage clustering ,Parallel algorithm ,Parallel computing ,Theoretical Computer Science ,Text mining ,CURE data clustering algorithm ,Nearest-neighbor chain algorithm ,Cut ,Cluster analysis ,Sequential algorithm ,k-medians clustering ,Brown clustering ,business.industry ,Computer Science Applications ,Hierarchical clustering ,Determining the number of clusters in a data set ,Data stream clustering ,Computational Theory and Mathematics ,Canopy clustering algorithm ,FLAME clustering ,Affinity propagation ,Hierarchical clustering of networks ,business ,Algorithm ,Software - Abstract
We propose a parallelization scheme for an existing algorithm that constructs a web directory containing categories of web documents organized hierarchically. The clustering algorithm automatically infers the number of clusters using a quality function based on graph cuts. A parallel implementation of the algorithm has been developed to run on a cluster of multi-core processors interconnected by an intranet. The effect of the well-known Latent Semantic Indexing on the performance of the clustering algorithm is also considered. The parallelized graph-cut based clustering algorithm achieves an F-measure in the range [0.69, 0.91] for the generated leaf-level clusters while yielding a precision-recall performance in the range [0.66, 0.84] for the entire hierarchy of the generated clusters. As measured via empirical observations, the parallel algorithm achieves an average speedup of 7.38 over its sequential variant, while also yielding a better clustering performance than the sequential algorithm in terms of F-measure.
- Published
- 2015
45. Constructing a graph of connections in clustering algorithm of complex objects
- Subjects
Optimization algorithm ,k-nearest neighbors ,General Medicine ,algorithm Chameleon ,computer.software_genre ,k-nearest neighbors algorithm ,Hierarchical clustering ,graph construction ,Nearest-neighbor chain algorithm ,connectivity ,Graph (abstract data type) ,Data mining ,Cluster analysis ,lcsh:Science (General) ,computer ,Algorithm ,hierarchical clustering ,Mathematics ,clustering ,lcsh:Q1-390 - Abstract
The article describes the results of modifying the Chameleon algorithm. This hierarchical multi-level algorithm consists of several phases: graph construction, coarsening, partitioning, and refinement. Various approaches and algorithms can be used at each phase. The main aim of the work is to study the clustering quality on different data sets using combinations of algorithms at the different stages of the algorithm, and to improve the construction stage by optimizing the choice of k when building the k-nearest-neighbor graph.
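The first phase can be illustrated with a brute-force k-nearest-neighbor graph builder; the similarity weight below is one simple choice for illustration, not necessarily the article's.

import numpy as np

def knn_graph(X, k):
    # Chameleon-style phase one: the k-NN graph, returned as a symmetric
    # adjacency matrix whose weights are similarities.
    d = np.sqrt(((X[:, None] - X[None, :]) ** 2).sum(-1))
    np.fill_diagonal(d, np.inf)
    n = len(X)
    A = np.zeros((n, n))
    for i in range(n):
        for j in np.argsort(d[i])[:k]:
            w = 1.0 / (1.0 + d[i, j])     # one simple similarity choice
            A[i, j] = A[j, i] = w         # symmetrize the directed edges
    return A

# The article's point: cluster quality is sensitive to k, so k itself
# can be chosen by an optimization loop over such graphs.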
- Published
- 2015
46. A Generic Algorithm for k-Nearest Neighbor Graph Construction Based on Balanced Canopy Clustering
- Author
-
Sang-goo Lee, Heasoo Hwang, and Youngki Park
- Subjects
Theoretical computer science ,Graph bandwidth ,Nearest neighbor graph ,Nearest-neighbor chain algorithm ,Canopy clustering algorithm ,Graph (abstract data type) ,Strength of a graph ,Random geometric graph ,Mathematics ,Clustering coefficient - Abstract
Constructing a k-nearest neighbor (k-NN) graph is a primitive operation in the fields of recommender systems, information retrieval, data mining, and machine learning. Although many algorithms have been proposed for constructing k-NN graphs, the existing approaches either cannot be used with various types of similarity measures or degrade in performance as the number of nodes or dimensions increases. In this paper, we present a novel algorithm for k-NN graph construction based on "balanced" canopy clustering. The experimental results show that, irrespective of the number of nodes or dimensions, our algorithm is at least five times faster than the brute-force approach while retaining an accuracy of approximately 92%.
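For context, classic (unbalanced) canopy clustering looks as follows; canopies are cheap overlapping groups that bound which pairs need exact k-NN comparisons later. The paper's balancing of canopy sizes is not reproduced here.

import numpy as np

def canopies(X, t1, t2):
    # Requires loose radius t1 > tight radius t2.
    assert t1 > t2
    remaining = list(range(len(X)))
    out = []
    while remaining:
        c = remaining.pop(0)                      # next canopy center
        d = (np.linalg.norm(X[remaining] - X[c], axis=1)
             if remaining else np.array([]))
        members = [c] + [p for p, di in zip(remaining, d) if di < t1]
        out.append(members)
        # Points within the tight radius t2 are removed for good.
        remaining = [p for p, di in zip(remaining, d) if di >= t2]
    return out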
- Published
- 2015
47. Enhanced Genetic Algorithm with K-Means for the Clustering Problem
- Author
-
J. B. Lønnum, Noureddine Bouhmala, and A. Viken
- Subjects
Determining the number of clusters in a data set ,Cultural algorithm ,CURE data clustering algorithm ,Population-based incremental learning ,Nearest-neighbor chain algorithm ,Correlation clustering ,Canopy clustering algorithm ,Algorithm ,Mathematics ,FSA-Red Algorithm - Abstract
In this paper, an algorithm for the clustering problem that combines a genetic algorithm with the popular K-Means greedy algorithm is proposed. The main idea of this algorithm is to use the genetic search approach to generate new clusters with the well-known two-point crossover and then apply the K-Means technique to further improve the quality of the formed clusters, in order to speed up the search process. Experimental results demonstrate that the proposed genetic algorithm combined with K-Means converges faster while producing the same clustering quality as the standard genetic algorithm.
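A minimal sketch of the described loop, with centroid sets as chromosomes, two-point crossover, and a few Lloyd steps as the K-Means refinement; population management, selection, and fitness are omitted, and the representation is an assumption.

import numpy as np

def two_point_crossover(a, b, rng):
    # Exchange the centroid segment between two cut points.
    k = len(a)
    i, j = sorted(rng.choice(k, size=2, replace=False))
    child = a.copy()
    child[i:j] = b[i:j]
    return child

def kmeans_refine(centers, X, iters=5):
    # A few Lloyd steps to locally improve an offspring's centroids.
    for _ in range(iters):
        d = np.linalg.norm(X[:, None] - centers[None, :], axis=2)
        assign = d.argmin(axis=1)
        for c in range(len(centers)):
            pts = X[assign == c]
            if len(pts):
                centers[c] = pts.mean(axis=0)
    return centers

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
pop = [X[rng.choice(len(X), 3, replace=False)].copy() for _ in range(10)]
child = kmeans_refine(two_point_crossover(pop[0], pop[1], rng), X)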
- Published
- 2015
48. Minimum Spanning Tree-Resembling algorithm for Clusters, Outliers and Hubs
- Author
-
S. John Peter
- Subjects
Algebra and Number Theory ,Spanning tree ,Applied Mathematics ,Correlation clustering ,Single-linkage clustering ,Minimum spanning tree ,ComputingMethodologies_PATTERNRECOGNITION ,CURE data clustering algorithm ,Nearest-neighbor chain algorithm ,Reverse-delete algorithm ,Cluster analysis ,Algorithm ,Analysis ,Mathematics - Abstract
Many systems in science and engineering can be modeled as graphs. Clustering is the process of discovering groups of objects such that objects in the same group are similar and objects belonging to different groups are dissimilar. A number of clustering algorithms exist that can solve the clustering problem, but most of them are very sensitive to their input parameters. Graph-based clustering algorithms aim to find hidden structure among objects and are capable of detecting clusters of irregular shape and size. In this paper we present a new algorithm, MSTRCOH, similar to Minimum Spanning Tree based clustering algorithms. Using this algorithm, subtrees are generated automatically from high-density to low-density regions of the graph, where each subtree, resembling a minimum spanning tree, is considered a cluster. The algorithm also detects the outliers and hubs that are present in the data set. Identifying hubs is useful for applications such as viral market...
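For comparison, a baseline MST-based clustering (cut the longest edges, read off components) is easy to state with SciPy; MSTRCOH's density-guided subtree generation and hub detection are not reproduced here.

import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import minimum_spanning_tree, connected_components

def mst_clusters(X, n_clusters):
    # Assumes distinct points so every off-diagonal distance is non-zero.
    d = np.sqrt(((X[:, None] - X[None, :]) ** 2).sum(-1))
    mst = minimum_spanning_tree(csr_matrix(d)).toarray()
    edges = np.argwhere(mst > 0)              # (n-1, 2) tree edges
    weights = mst[mst > 0]                    # same row-major order
    for r, c in edges[np.argsort(weights)[::-1][:n_clusters - 1]]:
        mst[r, c] = 0.0                       # cut the longest edges
    _, labels = connected_components(csr_matrix(mst), directed=False)
    return labels                             # singletons act as outliers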
- Published
- 2015
49. A simple statistics-based nearest neighbor cluster detection algorithm
- Author
-
José-A. Nieves-Vázquez, Gonzalo Urcid, and Gerhard X. Ritter
- Subjects
business.industry ,Single-linkage clustering ,Pattern recognition ,Complete-linkage clustering ,k-nearest neighbors algorithm ,Data set ,ComputingMethodologies_PATTERNRECOGNITION ,Artificial Intelligence ,Nearest-neighbor chain algorithm ,Signal Processing ,Statistics ,Outlier ,Computer Vision and Pattern Recognition ,Artificial intelligence ,Cluster analysis ,business ,Algorithm ,Software ,k-medians clustering ,Mathematics - Abstract
We propose a new method for autonomously finding clusters in spatial data. The proposed method belongs to the so-called nearest neighbor approaches for finding clusters. It is an iterative technique that produces updated averages and deviations of nearest neighbor distance parameters and results in a final set of clusters. The proposed technique is capable of eliminating background noise and outliers and of detecting clusters with different densities in a given data set. Using a wide variety of data sets, we demonstrate that the proposed cluster-seeking algorithm performs at least as well as various other currently popular algorithms and in several cases surpasses them in performance. Highlights: A new clustering algorithm based on simple statistics and lattice metrics is given. The mathematical rationale is explained in detail and theorem proofs are provided. The classification performance of the SSNN algorithm is illustrated with 2D datasets. Jain's benchmark dataset is used to show the SSNN cluster-finding capability. High-dimensional image patterns are included as additional clustering examples.
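One plausible reading of the statistics-based noise elimination, sketched with Euclidean rather than the paper's lattice metrics; the threshold rule mean + c*std and the iteration scheme are assumptions for illustration.

import numpy as np

def drop_background_noise(X, c=2.0, max_iter=10):
    # Points whose nearest-neighbor distance exceeds mean + c*std of all
    # NN distances are treated as noise; recompute until nothing drops.
    keep = np.arange(len(X))
    for _ in range(max_iter):
        P = X[keep]
        d = np.sqrt(((P[:, None] - P[None, :]) ** 2).sum(-1))
        np.fill_diagonal(d, np.inf)
        nnd = d.min(axis=1)                  # each point's NN distance
        ok = nnd <= nnd.mean() + c * nnd.std()
        if ok.all():
            break
        keep = keep[ok]
    return keep                              # indices of non-noise points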
- Published
- 2015
50. Classification Algorithm Based on Natural Nearest Neighbor
- Author
-
Qingsheng Zhu, Huijun Liu, and Ying Zhang
- Subjects
Majority rule ,Adaptive algorithm ,business.industry ,Computer science ,Nearest neighbor search ,Pattern recognition ,Library and Information Sciences ,computer.software_genre ,Computer Graphics and Computer-Aided Design ,k-nearest neighbors algorithm ,Set (abstract data type) ,ComputingMethodologies_PATTERNRECOGNITION ,Best bin first ,Computational Theory and Mathematics ,Nearest-neighbor chain algorithm ,Artificial intelligence ,Data mining ,business ,Algorithm ,computer ,Large margin nearest neighbor ,Information Systems - Abstract
K-nearest Neighbor (KNN) is a widely used classification method that operates by a majority vote over a k-nearest neighbor set. But when KNN is used to deal with datasets of different characteristics in classification, it is difficult to select an appropriate parameter k, which obviously affects the performance and efficiency of the algorithm. How to select an appropriate value of k for the KNN algorithm has therefore been an open issue in the field of data mining. Natural Nearest Neighbor (3N), proposed by us, is a novel nearest neighbor concept that needs no parameter k: the neighbors of each point are obtained by an adaptive algorithm. This paper proposes a Classification Algorithm based on Natural Nearest Neighbor (CAb3N). Comprehensive experimental results on UCI datasets confirm that CAb3N not only has the advantage of being parameter-free, but also achieves better accuracy and overall performance than both the traditional KNN algorithm and the hubness-based weighted KNN classification algorithm.
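The adaptive, parameter-free neighbor search behind 3N can be sketched as follows: grow the neighborhood round by round until every point has been claimed as someone's neighbor. The stopping rule here is simplified from the paper, which also handles noisy data more carefully.

import numpy as np

def natural_neighbor_rounds(X):
    # Round r: every point claims its (r+1)-th nearest neighbor. Stop when
    # each point has been claimed at least once; r replaces KNN's manual k.
    d = np.sqrt(((X[:, None] - X[None, :]) ** 2).sum(-1))
    np.fill_diagonal(d, np.inf)
    order = np.argsort(d, axis=1)             # neighbors by distance
    claimed = np.zeros(len(X), dtype=bool)
    r = 0
    while not claimed.all() and r < len(X) - 1:
        claimed[order[:, r]] = True
        r += 1
    return r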
- Published
- 2015