359 results for "Cover tree"
Search Results
2. An Improved DBSCAN Algorithm Based on the Neighbor Similarity and Fast Nearest Neighbor Query
- Author
-
Shan-Shan Li
- Subjects
Clustering ,DBSCAN ,neighbor similarity ,cover tree ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
DBSCAN is the best-known density-based clustering algorithm and represents one of the main clustering paradigms. However, DBSCAN performs many redundant distance computations because it uses a brute-force range query to retrieve the neighbors of each point, which yields high complexity (O(n²)) and low efficiency, making it unsuitable for large-scale data. In this paper, an improved DBSCAN based on neighbor similarity is proposed, which uses a cover tree to retrieve the neighbors of each point in parallel and the triangle inequality to filter out many unnecessary distance computations. Experiments conducted on large-scale data sets demonstrate that the proposed algorithm greatly speeds up the original DBSCAN and outperforms the main existing improvements of DBSCAN. Compared with ρ-approximate DBSCAN, currently the fastest (but approximate) version of DBSCAN, the proposed algorithm has two advantages: it is faster, and its result is exact.
- Published
- 2020
- Full Text
- View/download PDF
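The two ingredients named in the abstract above are cover-tree neighbor retrieval and triangle-inequality filtering. The sketch below illustrates only the second ingredient in a generic form (a single random pivot and the function names are assumptions, not the authors' code): a candidate is skipped whenever its precomputed distance to the pivot already certifies it lies outside the eps-ball.

```python
import numpy as np

def range_query_with_pruning(X, q_idx, eps, pivot_dists):
    """Neighbors of X[q_idx] within eps, using precomputed distances to one
    pivot and the triangle inequality to avoid some exact distance calls."""
    q = X[q_idx]
    dq = pivot_dists[q_idx]                  # d(q, pivot)
    neighbors = []
    for i, di in enumerate(pivot_dists):     # di = d(X[i], pivot)
        if abs(di - dq) > eps:               # lower bound on d(q, X[i]) already too large
            continue
        if np.linalg.norm(X[i] - q) <= eps:  # fall back to the exact distance
            neighbors.append(i)
    return neighbors

# toy usage with a single random pivot
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))
pivot_dists = np.linalg.norm(X - X[0], axis=1)
print(len(range_query_with_pruning(X, 5, 0.8, pivot_dists)))
```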
3. Trustworthy and Context-Aware Distributed Online Learning With Autoscaling for Content Caching in Collaborative Mobile Edge Computing
- Author
-
Changkun Jiang, Lixing Chen, Zichuan Xu, Shimin Gong, Xiaofeng Ding, Pan Zhou, and Yulai Xie
- Subjects
Cover tree ,Mobile edge computing ,Computer Networks and Communications ,Computer science ,business.industry ,Distributed computing ,Big data ,Context (language use) ,Autoscaling ,Tree (data structure) ,Artificial Intelligence ,Hardware and Architecture ,Cache ,business ,Mobile device - Abstract
Content caching is widely recognized as a promising way to improve service performance in mobile edge computing (MEC). In the big data era, mobile devices collect massive heterogeneous contents belonging to different users with specific contexts (e.g., hobby, environment, age). However, local content caching without content popularity and context information known in advance is not accurate enough, and the many large-scale contents cached in the local database put high pressure on the content selection process. To handle these issues, we propose a context-aware distributed online learning algorithm for efficient content caching, based on a novel tree-based contextual multi-armed bandit formulation for collaborative MEC. To guarantee trustworthy collaboration, we introduce a trust evaluation factor to find reliable neighboring edge nodes (ENs). Moreover, our system extracts contextual information from users into a context space and builds a content cover tree to maximize caching hit rates and satisfy users' demands. Simulation results based on a real-world dataset indicate that our proposal achieves a balance between caching hit rate and time cost and has a sublinear bound on cumulative regret, which verifies its superior caching-hit performance compared with other related algorithms.
- Published
- 2021
- Full Text
- View/download PDF
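The record above casts collaborative edge caching as a tree-based contextual multi-armed bandit. As a loose illustration of the bandit side only (a plain UCB1 rule inside one context cell; the class name, exploration constant, and cell abstraction are assumptions, not the paper's cover-tree machinery), consider:

```python
import math
from collections import defaultdict

class CellUCB:
    """UCB1-style content selection inside one context cell (illustrative only)."""
    def __init__(self, contents, c=1.0):
        self.contents = list(contents)
        self.c = c
        self.pulls = defaultdict(int)   # times each content was cached
        self.hits = defaultdict(int)    # observed cache hits

    def select(self):
        t = sum(self.pulls.values()) + 1
        def score(a):
            n = self.pulls[a]
            if n == 0:
                return float("inf")     # try unseen contents first
            return self.hits[a] / n + self.c * math.sqrt(math.log(t) / n)
        return max(self.contents, key=score)

    def update(self, content, hit):
        self.pulls[content] += 1
        self.hits[content] += int(hit)

cell = CellUCB(contents=["video_a", "video_b", "video_c"])
choice = cell.select()
cell.update(choice, hit=True)
```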
4. Coverage Lifetime Maximization
- Author
-
Wang, Bang and Wang, Bang
- Published
- 2010
- Full Text
- View/download PDF
5. A Scalable Noise Reduction Technique for Large Case-Based Systems
- Author
-
Segata, Nicola, Blanzieri, Enrico, Cunningham, Pádraig, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Sudan, Madhu, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Vardi, Moshe Y., Series editor, Weikum, Gerhard, Series editor, Goebel, Randy, editor, Siekmann, Jörg, editor, Wahlster, Wolfgang, editor, McGinty, Lorraine, editor, and Wilson, David C., editor
- Published
- 2009
- Full Text
- View/download PDF
6. Diversified and Scalable Service Recommendation With Accuracy Guarantee
- Author
-
Xuyun Zhang, Gautam Srivastava, Tian Wang, Lina Wang, Shaohua Wan, Shaoning Pang, and Lianyong Qi
- Subjects
Information retrieval ,Cover tree ,Computer science ,Hash function ,020206 networking & telecommunications ,02 engineering and technology ,Recommender system ,MovieLens ,Electronic mail ,Human-Computer Interaction ,Set (abstract data type) ,Modeling and Simulation ,0202 electrical engineering, electronic engineering, information engineering ,Information system ,Collaborative filtering ,020201 artificial intelligence & image processing ,Social Sciences (miscellaneous) - Abstract
As one of the most successful recommendation techniques, neighborhood-based collaborative filtering (CF), which recommends appropriate items to a target user by identifying similar users or similar items, has been widely applied in recommender systems. Although many neighborhood-based CF methods have been put forward, some open issues remain unsolved. First, the ever-increasing volume of user–item rating data significantly decreases recommendation efficiency, as a recommender system needs to analyze all the rating data when searching for similar neighbors or similar items; users' requirements for quick responses may then not be met. Second, neighborhood-based CF methods focus mainly on recommendation accuracy while often ignoring other key indicators of recommendation performance, such as recommendation diversity (RD); this can produce similar or redundant items in the recommended list and decrease users' satisfaction. Considering these issues, a diversified and scalable recommendation method (called DR_LT) based on locality-sensitive hashing and a cover tree is proposed in this article, where item topic information is used to optimize the final recommended list. We show the effectiveness of the proposed method through a set of experiments on the MovieLens data set, which demonstrate its feasibility in terms of item recommendation accuracy, diversity, and scalability.
- Published
- 2021
- Full Text
- View/download PDF
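DR_LT, as summarized above, pairs locality-sensitive hashing (for fast neighbor candidates) with a cover tree (for diversification). A minimal sketch of just the LSH step is shown below, assuming random-hyperplane hashing over user rating vectors; the bucket layout and parameters are illustrative, not taken from the paper.

```python
import numpy as np
from collections import defaultdict

def build_lsh_buckets(R, n_bits=12, seed=0):
    """Hash each user's rating vector with random hyperplanes (sign bits)."""
    rng = np.random.default_rng(seed)
    planes = rng.normal(size=(R.shape[1], n_bits))
    codes = (R @ planes > 0).astype(np.uint8)
    buckets = defaultdict(list)
    for user, code in enumerate(codes):
        buckets[code.tobytes()].append(user)
    return planes, buckets

def candidate_neighbors(r_user, planes, buckets):
    """Users sharing the query user's hash bucket become candidate neighbors."""
    code = (r_user @ planes > 0).astype(np.uint8)
    return buckets.get(code.tobytes(), [])

rng = np.random.default_rng(1)
R = rng.integers(0, 6, size=(500, 200)).astype(float)  # toy user-item ratings
planes, buckets = build_lsh_buckets(R)
print(candidate_neighbors(R[0], planes, buckets)[:10])
```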
7. Green Cover (Tree) Analysis of Chennai Metropolitan Area Using High Resolution Satellite Imagery
- Author
-
Murugasan Rajiah, Venkatesan Chinnappa, and Thirumalaivasan Devarajan
- Subjects
Cover tree ,Environmental Chemistry ,Environmental science ,Urban environment ,Change detection ,General Environmental Science ,Remote sensing - Published
- 2021
- Full Text
- View/download PDF
8. Analysis of Tree Edit Distance Algorithms
- Author
-
Dulucq, Serge, Touzet, Hélène, Goos, Gerhard, editor, Hartmanis, Juris, editor, van Leeuwen, Jan, editor, Baeza-Yates, Ricardo, editor, Chávez, Edgar, editor, and Crochemore, Maxime, editor
- Published
- 2003
- Full Text
- View/download PDF
9. Similarity query processing for high-dimensional data
- Author
-
Wei Wang, Chuan Xiao, Ying Zhang, and Jianbin Qin
- Subjects
Clustering high-dimensional data ,Theoretical computer science ,Cover tree ,Computer science ,General Engineering ,02 engineering and technology ,Popularity ,0802 Computation Theory and Mathematics, 0806 Information Systems, 0807 Library and Information Studies ,Locality-sensitive hashing ,Range (mathematics) ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,Embedding ,020201 artificial intelligence & image processing ,Versa ,Strengths and weaknesses - Abstract
Similarity query processing has been an active research topic for several decades. It is an essential procedure in a wide range of applications. Recently, embedding and auto-encoding methods as well as pre-trained models have gained popularity. They basically deal with high-dimensional data, and this trend brings new opportunities and challenges to similarity query processing for high-dimensional data. Meanwhile, new techniques have emerged to tackle this long-standing problem theoretically and empirically. In this tutorial, we summarize existing solutions, especially recent advancements from both database (DB) and machine learning (ML) communities, and analyze their strengths and weaknesses. We review exact and approximate methods such as cover tree, locality sensitive hashing, product quantization, and proximity graphs. We also discuss the selectivity estimation problem and show how researchers are bringing in state-of-the-art ML techniques to address the problem. By highlighting the strong connections between DB and ML, we hope that this tutorial provides an impetus towards new ML for DB solutions and vice versa.
- Published
- 2020
- Full Text
- View/download PDF
10. High-Dimensional Similarity Query Processing for Data Science
- Author
-
Chuan Xiao, Ying Zhang, Wei Wang, Yaoshu Wang, and Jianbin Qin
- Subjects
Cover tree ,Computer science ,Nearest neighbor search ,Similarity (psychology) ,Data deduplication ,Recommender system ,Feature learning ,Data science ,Image retrieval ,Locality-sensitive hashing - Abstract
Similarity query (a.k.a. nearest neighbor query) processing has been an active research topic for several decades. It is an essential procedure in a wide range of applications (e.g., classification & regression, deduplication, image retrieval, and recommender systems). Recently, representation learning and auto-encoding methods as well as pre-trained models have gained popularity. They basically deal with dense high-dimensional data, and this trend brings new opportunities and challenges to similarity query processing. Meanwhile, new techniques have emerged to tackle this long-standing problem theoretically and empirically. This tutorial aims to provide a comprehensive review of high-dimensional similarity query processing for data science. It introduces solutions from a variety of research communities, including data mining (DM), database (DB), machine learning (ML), computer vision (CV), natural language processing (NLP), and theoretical computer science (TCS), thereby highlighting the interplay between modern computer science and artificial intelligence technologies. We first discuss the importance of high-dimensional similarity query processing in data science applications, and then review query processing algorithms such as cover tree, locality sensitive hashing, product quantization, proximity graphs, as well as recent advancements such as learned indexes. We analyze their strengths and weaknesses and discuss the selection of algorithms in various application scenarios. Moreover, we consider the selectivity estimation of high-dimensional similarity queries, and show how researchers are bringing in state-of-the-art ML techniques to address this problem. We expect that this tutorial will provide an impetus towards new technologies for data science.
- Published
- 2021
11. Plug-and-Play Dual-Tree Algorithm Runtime Analysis.
- Author
-
Curtin, Ryan R., Dongryeol Lee, March, William B., and Ram, Parikshit
- Subjects
- *
MACHINE learning , *REINFORCEMENT learning , *ALGORITHMS , *MACHINE theory , *PLUG & play (Computer architecture) , *COMPUTER architecture - Abstract
Numerous machine learning algorithms contain pairwise statistical problems at their core; that is, tasks that require computations over all pairs of input points if implemented naively. Often, tree structures are used to solve these problems efficiently. Dual-tree algorithms can efficiently solve or approximate many of these problems. Using cover trees, rigorous worst-case runtime guarantees have been proven for some of these algorithms. In this paper, we present a problem-independent runtime guarantee for any dual-tree algorithm using the cover tree, separating out the problem-dependent and the problem-independent elements. This allows us to just plug in bounds for the problem-dependent elements to get runtime guarantees for dual-tree algorithms for any pairwise statistical problem without re-deriving the entire proof. We demonstrate this plug-and-play procedure for nearest-neighbor search and approximate kernel density estimation to get improved runtime guarantees. Under mild assumptions, we also present the first linear runtime guarantee for dual-tree based range search. [ABSTRACT FROM AUTHOR]
- Published
- 2015
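The plug-and-play result above rests on splitting any dual-tree algorithm into a problem-independent traversal and problem-dependent BaseCase/Score rules. A compact sketch of that split is given below; the Node fields and the example rules in the comments are assumptions for illustration, not mlpack's API.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    points: list = field(default_factory=list)      # point ids held at this node
    children: list = field(default_factory=list)

    def is_leaf(self):
        return not self.children

def dual_tree(q_node, r_node, base_case, score):
    """Generic depth-first dual-tree traversal: the problem-independent part.
    All problem-dependent logic lives in base_case() and score()."""
    if score(q_node, r_node) == math.inf:
        return                                       # prune this node pair
    if q_node.is_leaf() and r_node.is_leaf():
        for q in q_node.points:
            for r in r_node.points:
                base_case(q, r)                      # e.g. tighten a nearest-neighbor bound
        return
    q_kids = q_node.children if q_node.children else [q_node]
    r_kids = r_node.children if r_node.children else [r_node]
    for qc in q_kids:
        for rc in r_kids:
            dual_tree(qc, rc, base_case, score)

# For nearest-neighbor search one would supply, e.g.:
#   score(qn, rn)  -> math.inf when the minimum possible distance between the two
#                     nodes exceeds the current bound for qn, else that distance
#   base_case(q,r) -> compute d(q, r) and update q's best candidate
```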
12. Effects of Rock Phosphate on Indigenous Rhizobia Associated with Sesbania sesban
- Author
-
Sacko, O., Samba, R., Yattara, I., Faye, N., do Rego, F., Diop, T., Neyra, M., Gueye, M., Dakora, Felix D., editor, Chimphango, Samson B. M., editor, Valentine, Alex J., editor, Elmerich, Claudine, editor, and Newton, William E., editor
- Published
- 2008
- Full Text
- View/download PDF
13. A Practical Index Structure Supporting Fréchet Proximity Queries Among Trajectories
- Author
-
Michael Horton, Martin P. Seybold, Joachim Gudmundsson, and John Pfeifer
- Subjects
Computer Science - Machine Learning ,Cover tree ,Theoretical computer science ,Computer science ,Structure (category theory) ,0102 computer and information sciences ,02 engineering and technology ,01 natural sciences ,k-nearest neighbors algorithm ,020204 information systems ,Computer Science - Data Structures and Algorithms ,0202 electrical engineering, electronic engineering, information engineering ,Discrete Mathematics and Combinatorics ,Cluster analysis ,Fréchet distance ,Computer Science Applications ,Range (mathematics) ,010201 computation theory & mathematics ,Modeling and Simulation ,Signal Processing ,Scalability ,Trajectory ,Computer Science - Computational Geometry ,Geometry and Topology ,F.2.2 ,Information Systems - Abstract
We present a scalable approach for range and k nearest neighbor queries under computationally expensive metrics, like the continuous Fréchet distance on trajectory data. Based on clustering for metric indexes, we obtain a dynamic tree structure whose size is linear in the number of trajectories, regardless of the trajectories' individual sizes or the spatial dimension, which allows one to exploit low “intrinsic dimensionality” of datasets for effective search space pruning. Since the distance computation is expensive, generic metric indexing methods are rendered impractical. We present strategies that (i) improve on known upper and lower bound computations, (ii) build cluster trees without any or very few distance calls, and (iii) search using bounds for metric pruning, interval orderings for reduction, and randomized pivoting for reporting the final results. We analyze the efficiency and effectiveness of our methods with extensive experiments on diverse synthetic and real-world datasets. The results show improvement over state-of-the-art methods for exact queries, and even further speedups are achieved for queries that may return approximate results. Surprisingly, the majority of exact nearest-neighbor queries on real datasets are answered without any distance computations.
- Published
- 2020
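Most of the speedups reported above come from avoiding full Fréchet distance evaluations via cheap upper and lower bounds. As a small, generic illustration (discrete Fréchet distance only, with the well-known endpoint lower bound; this is not the authors' index), the sketch below skips the dynamic program whenever the bound already exceeds the query radius.

```python
import numpy as np

def discrete_frechet(P, Q):
    """Classic O(len(P)*len(Q)) dynamic program for the discrete Fréchet distance."""
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    m, n = len(P), len(Q)
    D = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=2)   # pairwise distances
    C = np.empty((m, n))
    C[0, 0] = D[0, 0]
    for i in range(1, m):
        C[i, 0] = max(C[i - 1, 0], D[i, 0])
    for j in range(1, n):
        C[0, j] = max(C[0, j - 1], D[0, j])
    for i in range(1, m):
        for j in range(1, n):
            C[i, j] = max(min(C[i - 1, j], C[i - 1, j - 1], C[i, j - 1]), D[i, j])
    return C[-1, -1]

def within_radius(P, Q, r):
    """Range-query test with an endpoint lower bound used for early rejection."""
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    lb = max(np.linalg.norm(P[0] - Q[0]), np.linalg.norm(P[-1] - Q[-1]))
    if lb > r:
        return False               # pruned without the expensive DP
    return discrete_frechet(P, Q) <= r
```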
14. Generative Local Metric Learning for Nearest Neighbor Classification
- Author
-
Yung-Kyun Noh, Byoung-Tak Zhang, and Daniel D. Lee
- Subjects
Cover tree ,Nearest neighbor search ,02 engineering and technology ,010501 environmental sciences ,Machine learning ,computer.software_genre ,01 natural sciences ,k-nearest neighbors algorithm ,Discriminative model ,Artificial Intelligence ,0202 electrical engineering, electronic engineering, information engineering ,0105 earth and related environmental sciences ,Mathematics ,business.industry ,Applied Mathematics ,Dimensionality reduction ,Pattern recognition ,Generative model ,Computational Theory and Mathematics ,Metric (mathematics) ,020201 artificial intelligence & image processing ,Computer Vision and Pattern Recognition ,Artificial intelligence ,business ,computer ,Software ,Large margin nearest neighbor - Abstract
We consider the problem of learning a local metric in order to enhance the performance of nearest neighbor classification. Conventional metric learning methods attempt to separate data distributions in a purely discriminative manner; here we show how to take advantage of information from parametric generative models. We focus on the bias in the information-theoretic error arising from finite sampling effects, and find an appropriate local metric that maximally reduces the bias based upon knowledge from generative models. As a byproduct, the asymptotic theoretical analysis in this work relates metric learning to dimensionality reduction from a novel perspective, which was not understood from previous discriminative approaches. Empirical experiments show that this learned local metric enhances the discriminative nearest neighbor performance on various datasets using simple class conditional generative models such as a Gaussian.
- Published
- 2018
- Full Text
- View/download PDF
15. Diversity optimization for recommendation using improved cover tree
- Author
-
Yongqiang Dong, Peng Yang, and Liang Gu
- Subjects
Information Systems and Management ,Correctness ,Cover tree ,business.industry ,Computer science ,Compromise ,media_common.quotation_subject ,Novelty ,02 engineering and technology ,Recommender system ,computer.software_genre ,Machine learning ,Management Information Systems ,Artificial Intelligence ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Artificial intelligence ,Data mining ,business ,computer ,Software ,Diversity (business) ,media_common - Abstract
Recently, diversity optimization has played an increasingly important role in recommender systems for improving user satisfaction, and it has attracted growing attention in the research community. In this paper, we propose a novel diversity-optimization method based on a time-sensitive semantic cover tree (T2SCT). Specifically, we first define the T2SCT and its construction algorithm. Based on the T2SCT, we present the details of the diversified item-selection algorithm and two supplementary algorithms used to obtain a complete diversified item list. We then give a theoretical analysis to prove the correctness of the proposed method. In general, the proposed method makes diverse recommendations with very little compromise on accuracy. Moreover, it converges quickly and exhibits good item novelty, owing to the inherent superiority of the T2SCT. We conduct extensive experiments on a real-world dataset to verify the performance of our method. The results illustrate that the method is effective and efficient, outperforming other conventional approaches.
- Published
- 2017
- Full Text
- View/download PDF
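The T2SCT construction itself is not reproduced here; for orientation, the sketch below shows the common greedy pattern that diversified item selection refines: pick the next item by trading off relevance against similarity to the items already chosen (the MMR-style λ trade-off and cosine similarity are assumptions, not the paper's algorithm).

```python
import numpy as np

def greedy_diversify(scores, item_vecs, k, lam=0.7):
    """Greedy relevance/diversity re-ranking (MMR-style), purely illustrative."""
    item_vecs = item_vecs / np.linalg.norm(item_vecs, axis=1, keepdims=True)
    selected, remaining = [], list(range(len(scores)))
    while remaining and len(selected) < k:
        def mmr(i):
            if not selected:
                return scores[i]
            max_sim = max(float(item_vecs[i] @ item_vecs[j]) for j in selected)
            return lam * scores[i] - (1 - lam) * max_sim   # relevance minus redundancy
        best = max(remaining, key=mmr)
        selected.append(best)
        remaining.remove(best)
    return selected
```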
16. Long-term change in sub-alpine forest cover, tree line and species composition in the Swiss Alps
- Author
-
John Rogan, Peter Bebi, Nathan Mietkiewicz, and Dominik Kulakowski
- Subjects
0106 biological sciences ,Cover tree ,010504 meteorology & atmospheric sciences ,Ecology ,biology ,European Larch ,Climate change ,Alpine climate ,Plant Science ,biology.organism_classification ,010603 evolutionary biology ,01 natural sciences ,Geography ,Centennial ,Aerial photography ,Forest cover ,Dominance (ecology) ,0105 earth and related environmental sciences - Abstract
Context: The 20th century has been marked by dramatic changes in land-use, disturbance regimes, and climate, which have interacted to affect global ecological patterns and dynamics, including changes in the extent, composition, and structure of forest cover. Although much research has highlighted dramatic, short-term ecological change, ongoing trends of land-use change and climate change began more than a century ago. Consequently, quantifying and understanding long-term (e.g., centennial) ecological change is critical to contextualizing recent patterns and processes. Here we document changes in the extent, position, and composition of subalpine forests over the past century in eastern Switzerland. Methods: Position of treeline, forest cover, and forest composition were evaluated using a unique combination of Object-Based Image Classification of an historical (1909) map, recent (2009) aerial photography, and repeat terrestrial photography to minimize the inherent bias of each data source, while providing the most robust representation of long-term ecological change. Results: Over the past century total forest cover expanded by 64.6% and the position of subalpine treeline increased on all aspects. Total forest cover also increased at the highest and lowest elevations on all aspects. Dominance of European larch increased at the highest elevations, but decreased at the lowest elevations, where it was replaced by Norway spruce. These patterns suggest land-use has been the most important driver of forest change over the past century. Conclusions: Major changes in the extent, structure, and dynamics of subalpine forests in the Alps initiated earlier than previously documented and most change occurred prior to the middle of the 20th century. Furthermore, these changes were likely driven primarily by changes in land-use, rather than by changes in climate. A combination of data sources and methodological approaches, such as those of the current study, provides a clearer view of long-term changes and minimizes the biases associated with any single data source or methodology.
- Published
- 2017
- Full Text
- View/download PDF
17. Efficient Sub-Window Nearest Neighbor Search on Matrix
- Author
-
Tsz Nam Chan, Man Lung Yiu, and Kien A. Hua
- Subjects
Cover tree ,Computer science ,Nearest neighbor search ,02 engineering and technology ,computer.software_genre ,Computer Science Applications ,k-nearest neighbors algorithm ,Locality-sensitive hashing ,Best bin first ,Computational Theory and Mathematics ,Nearest neighbor graph ,020204 information systems ,Nearest-neighbor chain algorithm ,R-tree ,Ball tree ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Data mining ,Fixed-radius near neighbors ,Algorithm ,computer ,Large margin nearest neighbor ,Information Systems - Abstract
We study a nearest neighbor search problem on a matrix by its element values. Given a data matrix $D$ and a query matrix $q$, the sub-window nearest neighbor search problem finds the sub-window of $D$ that is the most similar to $q$. This problem has a wide range of applications, e.g., geospatial data integration, object detection, and motion estimation. In this paper, we propose an efficient progressive search solution that overcomes the drawbacks of existing solutions. First, we present a generic approach to build level-based lower bound functions on top of basic lower bound functions. Second, we develop a novel lower bound function for a group of sub-windows in order to boost the efficiency of our solution. Furthermore, we extend our solution to support irregular-shaped queries. Experimental results on real data demonstrate the efficiency of our proposed methods.
- Published
- 2017
- Full Text
- View/download PDF
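The progressive search above hinges on cheap lower bounds that discard most candidate sub-windows before any exact comparison. A minimal stand-in for that idea is sketched below (a single sum-based bound via prefix sums, not the paper's level-based bounds): for the sum of squared differences, Cauchy-Schwarz gives SSD(W, q) ≥ (sum(W) − sum(q))² / |q|, so windows whose sums differ too much from the query's are skipped.

```python
import numpy as np

def best_subwindow(D, q):
    """Find the h x w sub-window of D with the smallest SSD to q,
    pruning candidates with the sum-based lower bound."""
    h, w = q.shape
    q_sum, area = q.sum(), h * w
    # prefix sums let us get any window sum in O(1)
    S = np.zeros((D.shape[0] + 1, D.shape[1] + 1))
    S[1:, 1:] = D.cumsum(0).cumsum(1)
    win_sum = lambda i, j: S[i + h, j + w] - S[i, j + w] - S[i + h, j] + S[i, j]

    best, best_pos = np.inf, None
    for i in range(D.shape[0] - h + 1):
        for j in range(D.shape[1] - w + 1):
            lb = (win_sum(i, j) - q_sum) ** 2 / area
            if lb >= best:
                continue                         # pruned, no exact SSD needed
            ssd = float(((D[i:i + h, j:j + w] - q) ** 2).sum())
            if ssd < best:
                best, best_pos = ssd, (i, j)
    return best_pos, best

rng = np.random.default_rng(0)
D = rng.random((200, 200))
q = D[37:45, 90:102].copy()                      # a planted query window
print(best_subwindow(D, q))                      # recovers position (37, 90) with SSD 0.0
```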
18. Multiscale Geometric Data Analysis via Laplacian Eigenvector Cascading
- Author
-
Joshua L. Mike and Jose A. Perea
- Subjects
ComputingMethodologies_PATTERNRECOGNITION ,Persistent homology ,Cover tree ,Computer science ,MathematicsofComputing_NUMERICALANALYSIS ,Unsupervised learning ,Laplacian matrix ,Eigenfunction ,Laplace operator ,Algorithm ,Graph ,Eigenvalues and eigenvectors ,Geometric data analysis - Abstract
We develop here an algorithmic framework for constructing consistent multiscale Laplacian eigenfunctions (vectors) on data. Consequently, we address the unsupervised machine learning task of finding scalar functions capturing consistent structure across scales in data, in a way that encodes intrinsic geometric and topological features. This is accomplished by two algorithms for eigenvector cascading. We show via examples that cascading accelerates the computation of graph Laplacian eigenvectors, and more importantly, that one obtains consistent bases of the associated eigenspaces across scales. Finally, we present an application to TDA mapper, showing that our multiscale Laplacian eigenvectors identify stable flare-like structures in mapper graphs of varying granularity.
- Published
- 2019
- Full Text
- View/download PDF
19. PERSISTENT COHOMOLOGY OF COVER REFINEMENTS
- Author
-
Markovichenko, Oleksandr
- Subjects
- Mathematics, Cover tree, Persistent homology, Topological data analysis
- Published
- 2022
20. k-Expected Nearest Neighbor Search over Gaussian Objects
- Author
-
Yoshiharu Ishikawa, Jing Zhao, Chuan Xiao, and Tingting Dong
- Subjects
Cover tree ,General Computer Science ,Computer science ,Nearest neighbor search ,02 engineering and technology ,Locality-sensitive hashing ,Combinatorics ,Best bin first ,Nearest neighbor graph ,020204 information systems ,Nearest-neighbor chain algorithm ,Ball tree ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Fixed-radius near neighbors - Published
- 2017
- Full Text
- View/download PDF
21. Natural neighbor: A self-adaptive neighborhood method without parameter K
- Author
-
Jinlong Huang, Ji Feng, and Qingsheng Zhu
- Subjects
Cover tree ,business.industry ,Nearest neighbor search ,Pattern recognition ,02 engineering and technology ,computer.software_genre ,k-nearest neighbors algorithm ,ComputingMethodologies_PATTERNRECOGNITION ,Best bin first ,Nearest neighbor graph ,Artificial Intelligence ,020204 information systems ,Signal Processing ,Ball tree ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Computer Vision and Pattern Recognition ,Artificial intelligence ,Data mining ,business ,Fixed-radius near neighbors ,computer ,Software ,Large margin nearest neighbor ,Mathematics - Abstract
Highlights: our method finds neighbors without any parameter; it can handle manifold-shaped clusters and noise; the number of neighbors of each point is dynamic; most clusters and outliers are easy to identify from their natural neighbors; and the natural neighbor eigenvalue is a better choice of k for traditional KNN. K-nearest neighbor (KNN) and reverse k-nearest neighbor (RkNN) are two bases of many well-established and high-performance pattern-recognition techniques, but both are vulnerable to their parameter choice. Essentially, the challenge is to detect the neighborhood of various data sets while being utterly ignorant of the data characteristics. In this paper, a novel concept of nearest neighbor, named natural neighbor (NaN), is proposed. In contrast to KNN and RkNN, it is a scale-free neighbor, and it can reflect data characteristics better. This article discusses the theoretical model and applications of the natural neighbor in different fields, and we demonstrate the improvement brought by the proposed neighborhood on both synthetic and real-world data sets.
- Published
- 2016
- Full Text
- View/download PDF
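A rough sketch of the search loop implied by this abstract follows (my reading, not the authors' code): the neighborhood size r grows until every point has appeared in some other point's r-nearest-neighbor list, or until the number of such "orphan" points stops changing; the final r plays the role of the natural neighbor eigenvalue.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def natural_neighbor_eigenvalue(X, max_r=50):
    """Sketch of the NaN search: grow r until no point (or a stable set of
    points) lacks a reverse r-nearest neighbor."""
    n = len(X)
    nn = NearestNeighbors().fit(X)
    has_reverse = np.zeros(n, dtype=bool)
    prev_orphans = n
    for r in range(1, max_r + 1):
        # r-th nearest neighbor of every point (column 0 is the point itself)
        idx = nn.kneighbors(X, n_neighbors=r + 1, return_distance=False)[:, r]
        has_reverse[idx] = True
        orphans = int((~has_reverse).sum())
        if orphans == 0 or orphans == prev_orphans:
            return r                       # natural neighbor eigenvalue (approx.)
        prev_orphans = orphans
    return max_r

X = np.random.default_rng(0).normal(size=(300, 2))
print(natural_neighbor_eigenvalue(X))
```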
22. A Survey: Over Various Hashing Techniques Which Provide Nearest Neighbor Search in Large Scale Data
- Author
-
Jitendra Agrawal and Mahendra Kumar Ahirwar
- Subjects
Cover tree ,Computer science ,business.industry ,Nearest neighbor search ,Hash function ,Pattern recognition ,02 engineering and technology ,Large scale data ,computer.software_genre ,Locality-sensitive hashing ,Best bin first ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Artificial intelligence ,Data mining ,business ,computer - Published
- 2016
- Full Text
- View/download PDF
23. Engineering optimization by constrained differential evolution with nearest neighbor comparison
- Author
-
Pham Hoang Anh
- Subjects
021103 operations research ,Cover tree ,business.industry ,Nearest neighbor search ,0211 other engineering and technologies ,Pattern recognition ,02 engineering and technology ,k-nearest neighbors algorithm ,Engineering optimization ,Best bin first ,Nearest neighbor graph ,Nearest-neighbor chain algorithm ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,Algorithm ,Large margin nearest neighbor ,Mathematics - Abstract
It has been proposed to utilize nearest neighbor comparison to reduce the number of function evaluations in unconstrained optimization. The nearest neighbor comparison omits the function evaluation of a point when the comparison can be judged from its nearest point in the search population. In this paper, a constrained differential evolution (DE) algorithm is proposed by combining the ε-constrained method for handling constraints with the nearest neighbor comparison method. The algorithm is tested on five benchmark engineering design problems, and the results indicate that the proposed DE algorithm is able to find good results in a much smaller number of objective function evaluations than conventional DE, while remaining competitive with other state-of-the-art DE variants.
- Published
- 2016
- Full Text
- View/download PDF
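The comparison described above skips an objective evaluation when the outcome of DE's selection step can be predicted from an already evaluated nearby point. The fragment below shows only that gate, in a deliberately simplified form (the prediction rule, inheriting the nearest archived point's value, is an assumption about the general idea, not the paper's exact criterion).

```python
import numpy as np

def should_evaluate(trial, target_value, archive_x, archive_f):
    """Gate inspired by nearest neighbor comparison (simplified): predict the
    trial's value from its nearest evaluated point and skip the real evaluation
    if the prediction would already lose against the target (minimization)."""
    i = int(np.argmin(np.linalg.norm(archive_x - trial, axis=1)))
    predicted = archive_f[i]
    return predicted <= target_value          # evaluate only if it might win

# toy usage on the sphere function
f = lambda x: float((x ** 2).sum())
rng = np.random.default_rng(0)
archive_x = rng.normal(size=(50, 5))
archive_f = np.array([f(x) for x in archive_x])
trial, target_value = rng.normal(size=5), 3.0
if should_evaluate(trial, target_value, archive_x, archive_f):
    value = f(trial)                          # real (expensive) evaluation
```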
24. Self-representation nearest neighbor search for classification
- Author
-
Shichao Zhang, Ming Zong, Lianli Gao, Debo Cheng, Zhang, Shichao, Cheng, Debo, Zong, Ming, and Gao, Lianli
- Subjects
reconstruction ,Cover tree ,sparse coding ,Computer science ,Cognitive Neuroscience ,Nearest neighbor search ,Decision tree ,k nearest neighbors ,Sample (statistics) ,02 engineering and technology ,computer.software_genre ,k-nearest neighbors algorithm ,Artificial Intelligence ,020204 information systems ,decision tree ,Ball tree ,self-representation ,0202 electrical engineering, electronic engineering, information engineering ,Point (geometry) ,business.industry ,Pattern recognition ,Computer Science Applications ,020201 artificial intelligence & image processing ,Data mining ,Artificial intelligence ,business ,computer - Abstract
This paper proposes a self-representation method for kNN (k nearest neighbors) classification. Specifically, this paper first designs a self-reconstruction method to reconstruct each data point by all the data, and the derived reconstruction coefficient is then used for calculating the k value for each training sample. Furthermore, a decision tree is built with the resulting k values for each data point to output labels of the training samples. With the built decision tree, the proposed method classifies test samples. Finally, the experimental results on real datasets showed the proposed method outperformed the state-of-the-art methods.
- Published
- 2016
- Full Text
- View/download PDF
25. EFFECTIVE COVERAGE AND CONNECTIVITY CONSERVATION WITH LOAD BALANCING FOR WSN
- Author
-
Bindhu Madhavi P, T. Jit, MohammedJameel A Zarzari, and M. Nagendra
- Subjects
Cover tree ,Computer science ,Distributed computing ,010401 analytical chemistry ,Probabilistic logic ,020206 networking & telecommunications ,02 engineering and technology ,Energy consumption ,Load balancing (computing) ,01 natural sciences ,0104 chemical sciences ,Nondeterministic algorithm ,Base station ,0202 electrical engineering, electronic engineering, information engineering ,Wireless sensor network ,Efficient energy use - Abstract
The basic objective of a wireless sensor network is to cover as much of the sensing field as possible, with reliable service, for tasks such as locating intrusions in a combat zone or tracking an object. Organizing the sensor nodes into as many subgroups as possible under their energy limits, each able to monitor every required discrete point, and activating the subgroups alternately is the prevalent method for quality assurance. Beyond maximizing the number of subgroups, connectivity among the nodes must also be guaranteed. To assure full coverage together with base-station connectivity of each sensing node, this paper proposes a new algorithm, the Maximum Connected Load-Balancing Cover Tree (MCLCT), which dynamically forms load-balanced routing cover trees. The underlying problem, known as the maximum cover tree problem, has been proved to be NP-complete. The proposed MCLCT consists of two components: first, coverage conservation by a recursive heuristic for coverage optimization, and second, a strategy for route identification using probabilistic load balancing. MCLCT shares the sensing and transmitting load among the nodes, so energy consumption is balanced across the network. Simulation results show that the proposed method accomplishes its objectives in terms of connectivity conservation and energy efficiency.
- Published
- 2016
- Full Text
- View/download PDF
26. NearTree, a data structure and a software toolkit for the nearest-neighbor problem
- Author
-
Lawrence C. Andrews and Herbert J. Bernstein
- Subjects
Cover tree ,Computer science ,business.industry ,Nearest neighbor search ,Software tool ,02 engineering and technology ,computer.software_genre ,Data structure ,Research Papers ,General Biochemistry, Genetics and Molecular Biology ,k-nearest neighbors algorithm ,ComputingMethodologies_PATTERNRECOGNITION ,Software ,020204 information systems ,Metric (mathematics) ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Data mining ,business ,computer - Abstract
Many problems in crystallography and other fields can be treated as nearest-neighbor problems. The neartree data structure provides a flexible way to organize and retrieve metric data, and in some cases it can provide near-optimal performance. NearTree is a software tool that constructs neartrees and provides a number of different query tools.
- Published
- 2016
- Full Text
- View/download PDF
27. Structured Spectral Clustering of PurTree Data
- Author
-
Xiaojun Chen, Rui Mao, Yixiang Fang, and Chao Guo
- Subjects
050101 languages & linguistics ,Cover tree ,Computer science ,business.industry ,05 social sciences ,Pattern recognition ,02 engineering and technology ,Data structure ,Spectral clustering ,0202 electrical engineering, electronic engineering, information engineering ,Graph (abstract data type) ,020201 artificial intelligence & image processing ,0501 psychology and cognitive sciences ,Artificial intelligence ,Benchmark data ,business ,Cluster analysis ,Transaction data ,Subspace topology - Abstract
Recently, a “Purchase Tree” data structure was proposed to compress customer transaction data, together with a local PurTree spectral clustering method to recover the cluster structure from the purchase trees. However, in the PurTree distance, the node weights for the children of a parent node are set to be equal, so differences between nodes are not distinguished. In this paper, we propose a Structured PurTree Subspace Spectral (SPSS) clustering algorithm for PurTree data. In the new method, we propose a PurTree subspace similarity to compute the similarity between two trees, in which a set of sparse and structured node weights is introduced to distinguish the importance of different nodes in a purchase tree. A new clustering model is proposed to learn a structured graph with an explicit cluster structure, and an iterative optimization algorithm is proposed to simultaneously learn the structured graph and the node weights. We also propose a balanced cover tree for fast k-NN searching when building affinity matrices. SPSS was compared with six clustering algorithms on 10 benchmark data sets, and the experimental results show the superiority of the new method.
- Published
- 2019
- Full Text
- View/download PDF
28. Monte-Carlo Tree Search Aided Contextual Online Learning Approach for Wireless Caching
- Author
-
Binhong Dong, Xiaodong Wang, Yang Du, Shaoqian Li, Zhi Chen, and Gao Pengyu
- Subjects
Cover tree ,Wireless network ,business.industry ,Computer science ,Big data ,Monte Carlo tree search ,020206 networking & telecommunications ,02 engineering and technology ,Machine learning ,computer.software_genre ,MovieLens ,Reduction (complexity) ,Tree (data structure) ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Artificial intelligence ,Enhanced Data Rates for GSM Evolution ,business ,computer - Abstract
Caching popular contents at the edge of wireless networks has recently emerged as a promising technique to offload mobile data traffic and improve the quality of service for users. In the big data era, the size of the content space is essentially infinite, and users with common features typically share similar content preferences. To address these issues, we model the wireless caching problem as a contextual multi-armed bandit (CMAB) problem with infinitely many arms, and propose a Monte-Carlo tree search aided contextual upper confidence bound (MCTS-CUCB) algorithm to make accurate caching decisions with low complexity. Specifically, we introduce a tree-based search method that analyzes content subspaces instead of single contents, thereby reducing the computing load. In the search process, a cover tree is built in an incremental and asymmetric manner, which reflects the users' content preferences. Besides, contextualization allows content preferences to be learned for groups of users having similar contexts, which significantly accelerates the learning process and improves the cache hit rate. Our simulation results on a real-world data set (MovieLens 1M) demonstrate that the proposed MCTS-CUCB algorithm achieves a considerable reduction in complexity compared with existing related algorithms while attaining a superior cache hit rate.
- Published
- 2018
- Full Text
- View/download PDF
29. Inexact Gradient Projection and Fast Data Driven Compressed Sensing
- Author
-
Mohammad Golbabaee and Michael Davies
- Subjects
FOS: Computer and information sciences ,Mathematical optimization ,Cover tree ,Computer science ,Computer Science - Information Theory ,Nearest neighbor search ,Population ,Brute-force search ,approximate updates ,02 engineering and technology ,constrained least squares ,Library and Information Sciences ,01 natural sciences ,approximate nearest neighbour search ,0202 electrical engineering, electronic engineering, information engineering ,cover trees ,0101 mathematics ,Projection (set theory) ,education ,compressed sensing ,Linear convergence ,education.field_of_study ,Information Theory (cs.IT) ,020206 networking & telecommunications ,Data structure ,Computer Science Applications ,010101 applied mathematics ,Compressed sensing ,Exact algorithm ,Rate of convergence ,iterative projected gradient ,data driven models ,Convergence ,Algorithm ,Information Systems - Abstract
We study the convergence of the iterative projected gradient (IPG) algorithm for arbitrary (possibly non-convex) sets when both the gradient and projection oracles are computed approximately. We consider different notions of approximation of which we show that the progressive fixed precision and the $(1+\varepsilon)$-optimal oracles can achieve the same accuracy as for the exact IPG algorithm. We show that the former scheme is also able to maintain the (linear) rate of convergence of the exact algorithm under the same embedding assumption. In contrast, the $(1+\varepsilon)$-approximate oracle requires a stronger embedding condition, moderate compression ratios and it typically slows down the convergence. We apply our results to accelerate solving a class of data driven compressed sensing problems, where we replace iterative exhaustive searches over large data sets by fast approximate nearest neighbor search strategies based on the cover tree data structure. For data sets with low intrinsic dimensions, our proposed algorithm achieves a complexity logarithmic in terms of the data set population as opposed to the linear complexity of a brute force search. By running several numerical experiments, we conclude similar observations as predicted by our theoretical analysis.
- Published
- 2018
- Full Text
- View/download PDF
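The accelerated scheme above is iterative projected gradient in which the exact projection onto the data-driven model (the nearest dictionary atom) is replaced by an approximate nearest-neighbor oracle such as a cover tree search. A bare-bones version of that loop is sketched below; scikit-learn's BallTree stands in for the cover tree, and the step size, model, and dimensions are illustrative assumptions.

```python
import numpy as np
from sklearn.neighbors import BallTree

def ipg_data_driven(y, A, atoms, steps=50, mu=None):
    """Iterative projected gradient with projection = (approximate) nearest
    atom in a data-driven dictionary (BallTree stands in for a cover tree)."""
    tree = BallTree(atoms)
    if mu is None:
        mu = 1.0 / np.linalg.norm(A, 2) ** 2          # safe step size
    x = atoms[0].copy()
    for _ in range(steps):
        g = A.T @ (A @ x - y)                         # gradient of 0.5*||Ax - y||^2
        z = x - mu * g
        idx = int(tree.query(z.reshape(1, -1), k=1, return_distance=False)[0, 0])
        x = atoms[idx]                                # projection onto the data model
    return x

rng = np.random.default_rng(0)
atoms = rng.normal(size=(2000, 32))                   # data-driven signal model
x_true = atoms[123]
A = rng.normal(size=(20, 32)) / np.sqrt(20)           # compressive measurements
y = A @ x_true
x_hat = ipg_data_driven(y, A, atoms)
print(np.linalg.norm(x_hat - x_true))                 # small (often 0) when A preserves atom distances
```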
30. Advisory Search and Security on Data Mining using Clustering Approaches
- Author
-
G. Geetha, K. Madhavi, and P. MadhaviLatha
- Subjects
Set (abstract data type) ,Scheme (programming language) ,Data set ,Cover tree ,Dynamic database ,Computer science ,Sorting ,Data mining ,Cluster analysis ,computer.software_genre ,computer ,computer.programming_language - Abstract
Data mining is the procedure for extracting relevant data for a specific category, and extracting a unique relevant set from diversified items is nowadays the main concern. Here, different clustering algorithms are used and compared in terms of execution time and accuracy. The proposed clustering scheme helps the user obtain satisfactory results in a relevant search. Notification services, considered as an example in this paper, operate on a dynamic database, which is a major development over earlier work that depended entirely on a static data set. A cover tree supports dynamic item sorting, with editing, insertion, and deletion of items in the database, and theoretical bounds are provided that enhance the characteristics of cover tree solutions with respect to optimizing the data. The results include the accuracy of the proposed approach in providing valid information.
- Published
- 2018
- Full Text
- View/download PDF
31. Parallel Algorithms for Nearest Neighbor Search Problems in High Dimensions
- Author
-
George Biros and Bo Xiao
- Subjects
Theoretical computer science ,Cover tree ,Computer science ,Applied Mathematics ,Nearest neighbor search ,02 engineering and technology ,k-nearest neighbors algorithm ,Computational Mathematics ,Best bin first ,Nearest neighbor graph ,020204 information systems ,Ball tree ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Fixed-radius near neighbors ,Large margin nearest neighbor - Abstract
The nearest neighbor search problem in general dimensions finds application in computational geometry, computational statistics, pattern recognition, and machine learning. Although there is a significant body of work on theory and algorithms, surprisingly little work has been done on algorithms for high-end computing platforms, and no open source library exists that can scale efficiently to thousands of cores. In this paper, we present algorithms and a library built on top of the message passing interface (MPI) and OpenMP that enable nearest neighbor searches to hundreds of thousands of cores for arbitrary-dimensional datasets. The library supports both exact and approximate nearest neighbor searches. The latter is based on iterative, randomized, and greedy KD-tree ($k$-dimensional tree) searches. We describe novel algorithms for the construction of the KD-tree, give complexity analysis, and provide experimental evidence for the scalability of the method. In our largest runs, we were able to perform an al...
- Published
- 2016
- Full Text
- View/download PDF
32. Space efficient data structures for nearest larger neighbor
- Author
-
Venkatesh Raman, Varunkumar Jayapaul, Seungbum Jo, Srinivasa Rao Satti, and Rajeev Raman
- Subjects
Theoretical computer science ,Cover tree ,Range query (data structures) ,Nearest neighbor search ,0102 computer and information sciences ,02 engineering and technology ,Range minimum query ,01 natural sciences ,Cartesian tree ,Theoretical Computer Science ,Succinct data structure ,Computational Theory and Mathematics ,Nearest neighbor graph ,010201 computation theory & mathematics ,0202 electrical engineering, electronic engineering, information engineering ,Discrete Mathematics and Combinatorics ,020201 artificial intelligence & image processing ,Fixed-radius near neighbors ,Computer Science::Databases ,Mathematics - Abstract
Given a sequence of n elements from a totally ordered set, and a position in the sequence, the nearest larger neighbor (NLN) query returns the position of the element which is closest to the query position and is larger than the element at the query position. The problem of finding the nearest larger neighbors of all the elements in a one-dimensional array has attracted interest due to its applications for parenthesis matching and in computational geometry [1-3]. We consider a data structure version of this problem, which is to preprocess a given sequence of elements to construct a data structure that can answer NLN queries efficiently. We consider time-space tradeoffs for the problem in both the indexing and the encoding models when the input is in a one-dimensional array, and also initiate the study of this problem on two-dimensional arrays.
- Published
- 2016
- Full Text
- View/download PDF
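For the offline array version of the problem above, all nearest larger neighbors can be computed in linear time with two monotone-stack passes (previous and next larger element), keeping whichever side is closer. The sketch below shows that textbook computation; it is not the succinct/indexing data structure studied in the paper.

```python
def nearest_larger_neighbors(a):
    """For each position i, return the index of the closest position j with
    a[j] > a[i], or None if no larger element exists. O(n) via monotone stacks."""
    n = len(a)
    prev_larger, next_larger = [None] * n, [None] * n

    stack = []                                   # indices with decreasing values
    for i in range(n):
        while stack and a[stack[-1]] <= a[i]:
            stack.pop()
        prev_larger[i] = stack[-1] if stack else None
        stack.append(i)

    stack = []
    for i in range(n - 1, -1, -1):
        while stack and a[stack[-1]] <= a[i]:
            stack.pop()
        next_larger[i] = stack[-1] if stack else None
        stack.append(i)

    def closer(i):
        p, q = prev_larger[i], next_larger[i]
        if p is None:
            return q
        if q is None:
            return p
        return p if i - p <= q - i else q        # ties broken toward the left

    return [closer(i) for i in range(n)]

print(nearest_larger_neighbors([3, 1, 4, 1, 5, 9, 2, 6]))
# -> [2, 0, 4, 2, 5, None, 5, 5]
```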
33. Gravitational fixed radius nearest neighbor for imbalanced problem
- Author
-
Zhe Wang, Daqi Gao, and Yujin Zhu
- Subjects
Information Systems and Management ,Cover tree ,Computer science ,Nearest neighbor search ,Nearest neighbour algorithm ,computer.software_genre ,Management Information Systems ,k-nearest neighbors algorithm ,ComputingMethodologies_PATTERNRECOGNITION ,Best bin first ,Nearest neighbor graph ,Artificial Intelligence ,Nearest-neighbor chain algorithm ,R-tree ,Ball tree ,Data mining ,Fixed-radius near neighbors ,computer ,Software ,Large margin nearest neighbor - Abstract
Highlights: we introduce the gravitational scenario into the fixed-radius nearest neighbor rule; the proposed GFRNN deals with the imbalanced classification problem; GFRNN does not need any manual parameter setting or coordination; and comparison experiments on 40 datasets validate its effectiveness and efficiency. This paper proposes a novel learning model that introduces the calculation of the pairwise gravitation of selected patterns into the classical fixed-radius nearest neighbor method, in order to overcome the drawback of the original nearest neighbor rule when dealing with imbalanced data. The traditional k nearest neighbor rule loses power on imbalanced datasets because the final decision may be dominated by patterns from the negative class in spite of the distance measurements. Unlike existing modified nearest neighbor learning models, the proposed method, named GFRNN, has a simple structure and is therefore easy to apply. Moreover, no parameter of GFRNN needs initializing or coordinating during the learning procedure. In practice, GFRNN first selects patterns as candidates from the training set under the fixed-radius nearest neighbor rule, then introduces a metric based on a modified law of gravitation to measure the interaction between the query pattern and each candidate, and finally makes the decision based on the sum of the corresponding gravitational forces exerted by the candidates on the query pattern. An experimental comparison against nine typical methods on forty imbalanced datasets validates both the effectiveness and the efficiency of GFRNN. In conclusion, the contribution of this paper is a new, simple nearest neighbor architecture that deals with imbalanced classification effectively without any manual parameter coordination, further expanding the family of nearest neighbor based rules.
- Published
- 2015
- Full Text
- View/download PDF
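A loose sketch of the decision rule suggested by this abstract follows (unit masses, an inverse-square force, and the fallback for empty neighborhoods are my assumptions; the paper's exact gravitation formula may differ): candidates are the fixed-radius neighbors, and the predicted class is the one exerting the largest total pull on the query.

```python
import numpy as np

def gfrnn_predict(X_train, y_train, query, radius):
    """Illustrative fixed-radius 'gravitational' classification:
    candidates within the radius vote via summed inverse-square pull."""
    d = np.linalg.norm(X_train - query, axis=1)
    cand = d <= radius
    if not cand.any():                           # fall back to the single nearest point
        return y_train[np.argmin(d)]
    forces = 1.0 / np.maximum(d[cand], 1e-12) ** 2
    classes = np.unique(y_train[cand])
    pull = [forces[y_train[cand] == c].sum() for c in classes]
    return classes[int(np.argmax(pull))]

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(3, 1, (20, 2))])  # imbalanced toy data
y = np.array([0] * 200 + [1] * 20)
print(gfrnn_predict(X, y, np.array([3.0, 3.0]), radius=1.5))
```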
34. An efficient tree structure for indexing feature vectors
- Author
-
The-Anh Pham, Sabine Barrat, Jean-Yves Ramel, Mathieu Delalandre, Laboratoire d'Informatique Fondamentale et Appliquée de Tours (LIFAT), Université de Tours (UT)-Institut National des Sciences Appliquées - Centre Val de Loire (INSA CVL), and Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Centre National de la Recherche Scientifique (CNRS)
- Subjects
Cover tree ,business.industry ,Feature vector ,Nearest neighbor search ,Search engine indexing ,[INFO.INFO-CV]Computer Science [cs]/Computer Vision and Pattern Recognition [cs.CV] ,Pattern recognition ,[INFO.INFO-TT]Computer Science [cs]/Document and Text Processing ,k-d tree ,Tree structure ,Best bin first ,[INFO.INFO-TS]Computer Science [cs]/Signal and Image Processing ,Artificial Intelligence ,Search algorithm ,Signal Processing ,[INFO.INFO-IM]Computer Science [cs]/Medical Imaging ,[INFO.INFO-HC]Computer Science [cs]/Human-Computer Interaction [cs.HC] ,Computer Vision and Pattern Recognition ,Artificial intelligence ,business ,Software ,Mathematics - Abstract
International audience; Keywords: Exact nearest neighbour search Approximate nearest neighbour search Feature indexing Randomized KD-trees Randomized clustering trees a b s t r a c t This paper addresses the problem of feature indexing in feature vector space. A linked-node m-ary tree (LM-tree) structure is presented to quickly produce the queries for an approximate and exact nearest neighbour search. Three main contributions are made, which can be attributed to the proposed LM-tree. First, a new polar-space-based method of data decomposition is presented to construct the LM-tree. Second, a novel pruning rule is proposed to efficiently narrow down the search space. Finally, a bandwidth search method is introduced to explore the nodes of the LM-tree. Extensive experiments were performed to study the behaviour of the proposed indexing algorithm. These experimental results showed that the proposed algorithm provides a significant improvement in the search performance for the tasks of both exact and approximate nearest neighbour (ENN/ANN) searches.
- Published
- 2015
- Full Text
- View/download PDF
35. Scaffoldings and Spines: Organizing High-Dimensional Data Using Cover Trees, Local Principal Component Analysis, and Persistent Homology
- Author
-
Ellen Gasparovic, Paul Bendich, Christopher J. Tralie, and John Harer
- Subjects
Clustering high-dimensional data ,Structure (mathematical logic) ,Persistent homology ,Cover tree ,business.industry ,Point cloud ,Pattern recognition ,0102 computer and information sciences ,Machine learning ,computer.software_genre ,01 natural sciences ,010201 computation theory & mathematics ,Graph (abstract data type) ,Topological data analysis ,Cover (algebra) ,Artificial intelligence ,business ,computer ,Mathematics - Abstract
We propose a flexible and multi-scale method for organizing, visualizing, and understanding point cloud datasets sampled from or near stratified spaces. The first part of the algorithm produces a cover tree for a dataset using an adaptive threshold that is based on multi-scale local principal component analysis. The resulting cover tree nodes reflect the local geometry of the space and are organized via a scaffolding graph. In the second part of the algorithm, the goals are to uncover the strata that make up the underlying stratified space using a local dimension estimation procedure and topological data analysis, as well as to ultimately visualize the results in a simplified spine graph. We demonstrate our technique on several synthetic examples and then use it to visualize song structure in musical audio data.
- Published
- 2018
- Full Text
- View/download PDF
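One building block in the pipeline above is a local intrinsic-dimension estimate from local principal component analysis, which drives the adaptive cover tree threshold. A generic version of just that estimate is sketched below; the neighborhood size and the 90% variance cutoff are assumptions, not the paper's multi-scale procedure.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def local_pca_dimension(X, point, k=25, var_fraction=0.9):
    """Estimate local intrinsic dimension at `point`: number of principal
    components needed to explain `var_fraction` of the variance of its k-NN."""
    nn = NearestNeighbors(n_neighbors=k).fit(X)
    idx = nn.kneighbors(point.reshape(1, -1), return_distance=False)[0]
    nbhd = X[idx] - X[idx].mean(axis=0)
    sv = np.linalg.svd(nbhd, compute_uv=False)
    energy = np.cumsum(sv ** 2) / np.sum(sv ** 2)
    return int(np.searchsorted(energy, var_fraction) + 1)

# toy data: a noisy 2-D plane embedded in 10-D
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 2)) @ rng.normal(size=(2, 10)) + 0.01 * rng.normal(size=(2000, 10))
print(local_pca_dimension(X, X[0]))   # typically prints 2
```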
36. A new fast reduction technique based on binary nearest neighbor tree
- Author
-
Juan Li and Yuping Wang
- Subjects
Binary tree ,Cover tree ,business.industry ,Cognitive Neuroscience ,Nearest neighbor search ,Pattern recognition ,computer.software_genre ,Interval tree ,Computer Science Applications ,ComputingMethodologies_PATTERNRECOGNITION ,Best bin first ,Artificial Intelligence ,Ball tree ,Data mining ,Artificial intelligence ,business ,Self-balancing binary search tree ,computer ,Large margin nearest neighbor ,Mathematics - Abstract
The K-nearest neighbor (KNN) rule is one of the most useful supervised classification methods and is widely used in many pattern classification applications due to its simplicity. However, it faces prohibitive computational and storage requirements when dealing with large datasets. A reasonable way of alleviating this problem is to extract a small representative subset from the original dataset without reducing classification accuracy: the most internal patterns are removed and the boundary patterns that contribute to better classification accuracy are retained. To achieve this, a new algorithm based on a binary tree technique and several reduction operations is presented. The key issues of the proposed algorithm are how to build the binary nearest neighbor search tree and how to design reduction strategies that keep the patterns yielding high classification accuracy. In particular, we first utilize several tree control rules and the KNN rule to build a binary nearest neighbor tree for each random pattern. Second, according to the node locations in each binary nearest neighbor tree and the strategies of selection and replacement, different kinds of patterns are obtained as prototypes, close to class boundary regions or located in the interior regions, and some internal patterns are generated. Finally, experimental results show that the proposed algorithm effectively reduces the number of prototypes while maintaining the same level of classification accuracy as the traditional KNN algorithm and other prototype algorithms. Moreover, it is a simple and fast hybrid algorithm for prototype reduction.
- Published
- 2015
- Full Text
- View/download PDF
37. Designing of Semantic Nearest Neighbor Search: Survey
- Author
-
R Pawar Anita, H Mulani Tabssum, B Bandgar Shrimant, and Pansare Rajashree B
- Subjects
Information retrieval ,Cover tree ,Computer science ,Nearest neighbor search - Published
- 2015
- Full Text
- View/download PDF
38. Efficient Coverage and Connectivity Preservation With Load Balance for Wireless Sensor Networks
- Author
-
Joe-Air Jiang, Chia Pang Chen, Maw Yang Liu, Subhas Chandra Mukhopadhyay, and Cheng Long Chuang
- Subjects
Cover tree ,Computer science ,business.industry ,Heuristic ,Distributed computing ,Set cover problem ,Energy consumption ,Load balancing (computing) ,Key distribution in wireless sensor networks ,Node (computer science) ,Mobile wireless sensor network ,Electrical and Electronic Engineering ,Routing (electronic design automation) ,business ,Instrumentation ,Wireless sensor network ,Efficient energy use ,Computer network - Abstract
One of the primary objectives of a wireless sensor network is to provide full coverage of a sensing field for as long as possible. Many tasks, such as object tracking and battlefield intrusion detection, require full coverage at all times. Given the limited energy of sensor nodes, organizing these nodes into a maximal number of subgroups (also called set covers) capable of monitoring all discrete points of interest and then activating them alternately is a prevalent way to provide better quality of surveillance. In addition to maximizing the number of subgroups, guaranteeing the connectivity of sensor nodes (i.e., that links exist between the base station (BS) and the sensor nodes) is also critically important while achieving full coverage. In this paper, we therefore develop a novel maximum connected load-balancing cover tree (MCLCT) algorithm to achieve full coverage as well as BS-connectivity of each sensing node by dynamically forming load-balanced routing cover trees. This task is formulated as a maximum cover tree problem, which has been proved to be NP-complete. The proposed MCLCT consists of two components: 1) a coverage-optimizing recursive heuristic for coverage management and 2) a probabilistic load-balancing strategy for routing path determination. Through MCLCT, the burden of sensing and transmitting is shared among nodes, so energy consumption becomes more even. Extensive simulation results show that our solution outperforms existing ones in terms of energy efficiency and connectivity maintenance.
- Published
- 2015
- Full Text
- View/download PDF
39. Rank-Based Similarity Search: Reducing the Dimensional Dependence
- Author
-
Michael Nett and Michael E. Houle
- Subjects
Similarity (geometry) ,Cover tree ,Triangle inequality ,business.industry ,Applied Mathematics ,Nearest neighbor search ,Iterative deepening depth-first search ,computer.software_genre ,Computational Theory and Mathematics ,Artificial Intelligence ,Metric (mathematics) ,Beam search ,Computer Vision and Pattern Recognition ,Pruning (decision trees) ,Artificial intelligence ,Data mining ,business ,computer ,Algorithm ,Software ,Mathematics - Abstract
This paper introduces a data structure for k-NN search, the Rank Cover Tree (RCT), whose pruning tests rely solely on the comparison of similarity values; other properties of the underlying space, such as the triangle inequality, are not employed. Objects are selected according to their ranks with respect to the query object, allowing much tighter control on the overall execution costs. A formal theoretical analysis shows that with very high probability, the RCT returns a correct query result in time that depends very competitively on a measure of the intrinsic dimensionality of the data set. The experimental results for the RCT show that non-metric pruning strategies for similarity search can be practical even when the representational dimension of the data is extremely high. They also show that the RCT is capable of meeting or exceeding the level of performance of state-of-the-art methods that make use of metric pruning or other selection tests involving numerical constraints on distance values.
- Published
- 2015
- Full Text
- View/download PDF
40. Fast ℓ1-norm Nearest Neighbor Search Using A Simple Variant of Randomized Partition Tree
- Author
-
Kaushik Sinha
- Subjects
Big Data ,Cover tree ,business.industry ,Computer science ,Nearest neighbor search ,Pattern recognition ,Locality-sensitive hashing ,Best bin first ,Randomized Partition Tree ,Nearest-neighbor chain algorithm ,Norm (mathematics) ,R-tree ,Ball tree ,General Earth and Planetary Sciences ,Artificial intelligence ,Nearest Neighbor Search ,business ,Fixed-radius near neighbors ,Algorithm ,General Environmental Science - Abstract
For big data applications, randomized partition trees have recently been shown to be very effective in answering high-dimensional nearest neighbor search queries with provable guarantees when distances are measured using the ℓ2 norm. Unfortunately, if distances are measured using the ℓ1 norm, the same theoretical guarantee does not hold. In this paper, we show that a simple variant of the randomized partition tree, which uses a different randomization based on the 1-stable distribution, can be used to efficiently answer high-dimensional nearest neighbor queries when distances are measured using the ℓ1 norm. Experimental evaluations on eight real datasets suggest that the proposed method achieves better ℓ1-norm nearest neighbor search accuracy with fewer retrieved data points compared to locality-sensitive hashing.
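The key twist in the abstract is replacing Gaussian projections with draws from a 1-stable (Cauchy) distribution; a hedged sketch of such a randomized partition tree for ℓ1 search is given below, with the leaf size and toy data chosen purely for illustration and no claim of matching the author's exact algorithm.

    # Hedged sketch: a random projection tree whose projection directions are
    # drawn from the Cauchy (1-stable) distribution, so projections respect
    # l1 distances. Not the author's exact code.
    import numpy as np

    rng = np.random.default_rng(0)

    def build_tree(X, idx, leaf_size=20):
        if len(idx) <= leaf_size:
            return {"leaf": idx}
        direction = rng.standard_cauchy(X.shape[1])   # 1-stable random direction
        proj = X[idx] @ direction
        split = np.median(proj)
        left, right = idx[proj <= split], idx[proj > split]
        if len(left) == 0 or len(right) == 0:         # degenerate split
            return {"leaf": idx}
        return {"dir": direction, "split": split,
                "left": build_tree(X, left, leaf_size),
                "right": build_tree(X, right, leaf_size)}

    def query(tree, X, q):
        node = tree
        while "leaf" not in node:
            node = node["left"] if q @ node["dir"] <= node["split"] else node["right"]
        cand = node["leaf"]
        return cand[np.argmin(np.abs(X[cand] - q).sum(axis=1))]   # exact l1 in the leaf

    X = rng.random((5000, 50))
    tree = build_tree(X, np.arange(len(X)))
    q = rng.random(50)
    print("approximate l1 nearest neighbor index:", query(tree, X, q))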
- Published
- 2015
- Full Text
- View/download PDF
41. An Improved Random Seed Searching Clustering Algorithm Based on Shared Nearest Neighbor
- Author
-
Ya Ran Su and Xi Xian Niu
- Subjects
Cover tree ,business.industry ,Nearest neighbor search ,Pattern recognition ,General Medicine ,computer.software_genre ,Best bin first ,Nearest neighbor graph ,Nearest-neighbor chain algorithm ,Ball tree ,Artificial intelligence ,Data mining ,Cluster analysis ,business ,computer ,Large margin nearest neighbor ,Mathematics - Abstract
Clustering analysis remains a hot field in data mining. For different types of data sets and application purposes, researchers focus on various aspects, such as adaptability to varying density and shape, noise detection, outlier identification, cluster number determination, accuracy, and optimization. Many related works focus on the Shared Nearest Neighbor similarity measure because of its wide adaptability to data sets with complex distributions. Based on Shared Nearest Neighbor, an improved algorithm is proposed in this paper; it mainly targets the problems of naturally varying density, arbitrary cluster shape, and cluster number determination. The new algorithm starts with a randomly selected seed, follows the direction of its nearest neighbors, finds the neighbors with the most similar features, forms a local maximal cluster, dynamically adjusts the data objects' affiliations to achieve local optimization, and ends the clustering procedure once all data objects have been identified. Experiments verify that the new algorithm copes well with differences in density, shape, noise, and cluster number, and achieves fast optimized searching.
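A hedged sketch of the Shared Nearest Neighbor similarity underlying this abstract (the count of k nearest neighbors two points share) follows; the seed-growing clustering procedure itself is not reproduced, and the toy data and k value are assumptions.

    # Hedged sketch of the Shared Nearest Neighbor (SNN) similarity:
    # similarity of two points = number of k-nearest neighbors they share.
    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def snn_similarity(X, k=10):
        nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
        _, idx = nn.kneighbors(X)
        neigh = [set(row[1:]) for row in idx]        # drop the point itself
        n = len(X)
        sim = np.zeros((n, n), dtype=int)
        for i in range(n):
            for j in range(i + 1, n):
                sim[i, j] = sim[j, i] = len(neigh[i] & neigh[j])
        return sim

    X = np.random.default_rng(0).random((200, 2))
    S = snn_similarity(X, k=10)
    print("most similar pair:", np.unravel_index(np.argmax(S), S.shape))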
- Published
- 2015
- Full Text
- View/download PDF
42. Online diagnosis of labeled timed Petri net using reduction rules
- Author
-
Boukala Mohand Cherif and Ahmed Saadi Hadjira
- Subjects
Reduction (complexity) ,Theoretical computer science ,Cover tree ,Computer science ,Construct (python library) ,Petri net ,State class - Abstract
We propose reduction rules to simplify the labeled timed Petri net model. Our purpose is to construct a labeled delay timed Petri net (LDTPN) that preserves the temporal constraints of the original model. We also introduce an algorithm for computing a Reduced Modified State Class, which is used to obtain the Cover Tree State Class employed for online diagnosis. The solution for online diagnosis proposed in this paper allows us to restrain the state explosion.
- Published
- 2017
- Full Text
- View/download PDF
43. PQBF
- Author
-
Jiangtao Cui, Hong Cheng, and Yingfan Liu
- Subjects
Cover tree ,Computer science ,business.industry ,Nearest neighbor search ,Pattern recognition ,02 engineering and technology ,k-nearest neighbors algorithm ,B-tree ,Best bin first ,Nearest neighbor graph ,020204 information systems ,Nearest-neighbor chain algorithm ,Ball tree ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Artificial intelligence ,Fixed-radius near neighbors ,business ,Algorithm ,Large margin nearest neighbor - Abstract
Approximate nearest neighbor (ANN) search in high-dimensional space plays an essential role in many multimedia applications. Recently, product quantization (PQ) based methods for ANN search have attracted enormous attention in the computer vision community, due to their good balance between accuracy and space requirements. PQ based methods embed a high-dimensional vector into a short binary code (called a PQ code), and the squared Euclidean distance is estimated by the asymmetric quantizer distance (AQD) with fairly high precision. Thus, ANN search in the original space can be converted to similarity search on AQD using the PQ approach. All existing PQ methods are in-memory solutions, which may not handle massive data that cannot fit entirely in memory. In this paper, we propose an I/O-efficient PQ based solution for ANN search. We design an index called PQB+-forest to support efficient similarity search on AQD. PQB+-forest first creates a number of partitions of the PQ codes by a coarse quantizer and then builds a B+-tree, called a PQB+-tree, for each partition. The search process is greatly expedited by focusing on a few selected partitions that are closest to the query, as well as by the pruning power of the PQB+-trees. According to experiments conducted on two large-scale data sets containing up to 1 billion vectors, our method outperforms its competitors, including the state-of-the-art PQ method and the state-of-the-art LSH methods for ANN search.
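A hedged sketch of the generic product quantization encoding and asymmetric quantizer distance (AQD) that the index builds on is shown below; the codebook sizes and synthetic data are illustrative assumptions, and the PQB+-forest itself is not reproduced.

    # Hedged sketch of product quantization (PQ) codes and asymmetric quantizer
    # distance (AQD): generic PQ, not the paper's I/O-efficient index.
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    X = rng.random((10000, 32)).astype(np.float32)
    M, K = 4, 256                                   # 4 subvectors, 256 centroids each
    sub = X.shape[1] // M

    codebooks = [KMeans(n_clusters=K, n_init=4, random_state=0)
                 .fit(X[:, m * sub:(m + 1) * sub]) for m in range(M)]
    codes = np.stack([cb.predict(X[:, m * sub:(m + 1) * sub])
                      for m, cb in enumerate(codebooks)], axis=1)   # n x M codes

    def aqd(query, codes, codebooks):
        # Precompute per-subspace distance tables, then look them up per code.
        tables = [((cb.cluster_centers_ - query[m * sub:(m + 1) * sub]) ** 2).sum(axis=1)
                  for m, cb in enumerate(codebooks)]
        return sum(tables[m][codes[:, m]] for m in range(M))

    q = rng.random(32).astype(np.float32)
    print("approximate nearest neighbor:", np.argmin(aqd(q, codes, codebooks)))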
- Published
- 2017
- Full Text
- View/download PDF
44. AnnexML
- Author
-
Yukihiro Tagami
- Subjects
Cover tree ,Graph embedding ,business.industry ,Nearest neighbor search ,Pattern recognition ,02 engineering and technology ,Best bin first ,Nearest neighbor graph ,020204 information systems ,Ball tree ,0202 electrical engineering, electronic engineering, information engineering ,Graph (abstract data type) ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,Large margin nearest neighbor ,Mathematics - Abstract
Extreme multi-label classification methods have been widely used in Web-scale classification tasks such as Web page tagging and product recommendation. In this paper, we present a novel graph embedding method called "AnnexML". At the training step, AnnexML constructs a k-nearest neighbor graph of label vectors and attempts to reproduce the graph structure in the embedding space. The prediction is efficiently performed by using an approximate nearest neighbor search method that efficiently explores the learned k-nearest neighbor graph in the embedding space. We conducted evaluations on several large-scale real-world data sets and compared our method with recent state-of-the-art methods. Experimental results show that our AnnexML can significantly improve prediction accuracy, especially on data sets that have a larger label space. In addition, AnnexML improves the trade-off between prediction time and accuracy. At the same level of accuracy, the prediction time of AnnexML was up to 58 times faster than that of SLEEC, which is a state-of-the-art embedding-based method.
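As a small illustration of the first training step described above, the sketch below builds a k-nearest neighbor graph of synthetic label vectors with scikit-learn; the embedding learning and prediction stages of AnnexML are not shown, and the vector dimensions are assumptions.

    # Hedged sketch: a k-nearest neighbor graph of (toy) label vectors, the
    # structure AnnexML reproduces in its embedding space.
    import numpy as np
    from sklearn.neighbors import kneighbors_graph

    rng = np.random.default_rng(0)
    label_vectors = rng.random((1000, 50))           # assumed toy label vectors
    G = kneighbors_graph(label_vectors, n_neighbors=10, mode="connectivity")
    print("graph shape:", G.shape, "edges:", G.nnz)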
- Published
- 2017
- Full Text
- View/download PDF
45. Similarity search through one-dimensional embeddings
- Author
-
Maria Camila Nardini Barioni, Humberto Luiz Razente, and Rafael L. Bernardes Lima
- Subjects
Cover tree ,Theoretical computer science ,Computer science ,Intersection (set theory) ,Nearest neighbor search ,Search engine indexing ,02 engineering and technology ,Data structure ,iDistance ,Similarity (network science) ,020204 information systems ,Metric (mathematics) ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing - Abstract
The optimization of similarity queries is often done with specialized data structures known as metric access methods. The use of B+trees to index high-dimensional data for range and nearest neighbor search in metric spaces has recently been proposed. This work introduces a new access method called GroupSim, together with query algorithms for indexing and retrieving complex data by similarity. It employs a single B+tree to dynamically index data elements with regard to a set of one-dimensional embeddings. Our strategy uses a new scheme to store distance information, allowing it to determine directly whether each element lies on the intersection of the embeddings. We compare GroupSim with two related methods, iDistance and OmniB-Forest, and we show empirically that the new access method outperforms them with regard to the time required to run similarity queries.
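GroupSim itself is not reproduced here, but the one-dimensional embedding idea it shares with iDistance can be sketched: each element maps to the key pivot_id * C + dist(element, pivot), so an ordered index can answer range queries by scanning small key intervals; the pivots, constant C, and sorted-array stand-in for the B+tree below are all assumptions.

    # Hedged sketch of an iDistance-style one-dimensional embedding with a
    # sorted array standing in for the B+tree.
    import bisect
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.random((5000, 16))
    pivots = X[rng.choice(len(X), 8, replace=False)]
    C = 1000.0                                        # separates pivot partitions

    part = np.argmin(((X[:, None, :] - pivots[None]) ** 2).sum(-1), axis=1)
    dist = np.linalg.norm(X - pivots[part], axis=1)
    keys = part * C + dist
    order = np.argsort(keys)
    sorted_keys, sorted_ids = keys[order], order

    def range_query(q, r):
        out = []
        for p, piv in enumerate(pivots):
            dq = np.linalg.norm(q - piv)
            # Triangle inequality: matches in partition p have keys in this interval.
            lo = bisect.bisect_left(sorted_keys, p * C + max(dq - r, 0.0))
            hi = bisect.bisect_right(sorted_keys, p * C + dq + r)
            for i in sorted_ids[lo:hi]:               # verify candidates exactly
                if np.linalg.norm(q - X[i]) <= r:
                    out.append(int(i))
        return sorted(set(out))

    print(len(range_query(rng.random(16), 0.6)), "results within radius 0.6")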
- Published
- 2017
- Full Text
- View/download PDF
46. Nearest neighbor search on total bregman balls tree
- Author
-
Daniela P. L. Ferreira, Celia A. Z. Barcelos, and Bruno Moraes Rocha
- Subjects
Similarity (geometry) ,Cover tree ,business.industry ,Computer science ,Nearest neighbor search ,020206 networking & telecommunications ,Pattern recognition ,02 engineering and technology ,010501 environmental sciences ,Bregman divergence ,Similarity measure ,Data structure ,01 natural sciences ,Tree (data structure) ,Metric (mathematics) ,0202 electrical engineering, electronic engineering, information engineering ,Artificial intelligence ,business ,0105 earth and related environmental sciences - Abstract
In high-dimensional spaces, nearest neighbor search (NNS) has a high computational cost, which has motivated research toward NNS methods based on data structures. These structures open the possibility that only part of the data set needs to be examined, thus decreasing the computational cost. NNS methods are for the most part based on the metric properties of the adopted similarity measure. However, there are real applications, involving for example the comparison of probabilities or images, for which an adequate notion of similarity cannot be defined by a metric. In this study we present a new data structure and a new algorithm for NNS using the Total Bregman Divergence (TBD). TBD is invariant to rotation of the coordinate axes and allows for the definition of centers that are robust to noisy data. The structure and the proposed search method allow for the development of meta-algorithms applicable to any TBD. The experiments presented herein show the good performance of the new structure in NNS methods.
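A hedged sketch of a total Bregman divergence, namely the ordinary Bregman divergence normalized by sqrt(1 + ||∇f(y)||²), is given below for the generator f(x) = ||x||²; the ball-tree construction and search from the paper are not reproduced, and this normalization form is stated as an assumption.

    # Hedged sketch of a total Bregman divergence (TBD) for f(x) = ||x||^2.
    # The normalization by sqrt(1 + ||grad f(y)||^2) is what gives rotation
    # invariance of the divergence value.
    import numpy as np

    def total_bregman_sq_euclid(x, y):
        # For f(x) = ||x||^2: grad f(y) = 2y and the Bregman part is ||x - y||^2.
        bregman = np.sum((x - y) ** 2)
        return bregman / np.sqrt(1.0 + np.sum((2.0 * y) ** 2))

    x = np.array([1.0, 2.0, 3.0])
    y = np.array([1.5, 1.0, 2.5])
    print(total_bregman_sq_euclid(x, y))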
- Published
- 2017
- Full Text
- View/download PDF
47. Proposing new method to improve gravitational fixed nearest neighbor algorithm for imbalanced data classification
- Author
-
Hossein Nezamabadi-pour, Mahin Shabani, and Bahareh Nikpour
- Subjects
ComputingMethodologies_PATTERNRECOGNITION ,Cover tree ,Best bin first ,Nearest neighbor graph ,Nearest neighbor search ,Nearest-neighbor chain algorithm ,Data mining ,computer.software_genre ,Fixed-radius near neighbors ,computer ,Large margin nearest neighbor ,Mathematics ,k-nearest neighbors algorithm - Abstract
Classification of imbalanced data sets is one of the basic challenges in the field of machine learning and data mining. Many methods for the classification of such data sets have been proposed to date. In algorithmic-level methods, new algorithms are created that are adapted to the nature of imbalanced data sets. The gravitational fixed radius nearest neighbor algorithm (GFRNN) is an algorithmic-level method developed with the aim of enhancing the k nearest neighbor classifier so that it can deal with imbalanced data sets. This algorithm uses the sum of gravitational forces exerted on a query sample by its nearest neighbors within a fixed radius to determine its label. Simplicity and no need for parameter setting during the run of the algorithm are the main advantages of this method. In this paper, a method is proposed for improving the performance of the GFRNN algorithm in which the mass assigned to each training sample is based on the sum of gravitational forces exerted on it by the other training samples. The results obtained by applying the proposed method to 10 data sets demonstrate its superiority compared with five other algorithms.
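A hedged sketch of the gravitational fixed-radius voting rule described above follows: per class, sum the forces m_i / d² exerted by training samples inside the radius and pick the class pulling hardest; the uniform masses below only hint at the paper's mass-assignment refinement, and the toy data are assumptions.

    # Hedged sketch of gravitational fixed-radius nearest neighbor voting.
    import numpy as np

    def gfrnn_predict(X, y, masses, q, radius, eps=1e-12):
        d = np.linalg.norm(X - q, axis=1)
        inside = d <= radius
        forces = {}
        for cls in np.unique(y[inside]):
            sel = inside & (y == cls)
            # Gravitational pull of class cls on the query sample.
            forces[cls] = np.sum(masses[sel] / (d[sel] ** 2 + eps))
        return max(forces, key=forces.get) if forces else None

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(3, 1, (20, 2))])
    y = np.array([0] * 100 + [1] * 20)                # imbalanced toy data
    masses = np.ones(len(X))                          # uniform masses for the sketch
    print(gfrnn_predict(X, y, masses, q=np.array([2.0, 2.0]), radius=2.0))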
- Published
- 2017
- Full Text
- View/download PDF
48. Fast density peak clustering for large scale data based on kNN
- Author
-
Zheng Zhang, Haibo Li, Wentao Fan, Hailin Li, Xin Liu, Yi Chen, Xiaoliang Hu, Lianlian Shen, Yewang Chen, and Ji-Xiang Du
- Subjects
Information Systems and Management ,Cover tree ,Computer science ,Computation ,Brute-force search ,02 engineering and technology ,Large scale data ,Management Information Systems ,k-nearest neighbors algorithm ,Artificial Intelligence ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Cluster analysis ,Algorithm ,Software ,Curse of dimensionality - Abstract
The Density Peak (DPeak) clustering algorithm is not applicable to large scale data, because its two quantities, ρ and δ, are both obtained by a brute-force algorithm with complexity O(n²). Thus, a simple but fast DPeak, namely FastDPeak, is proposed, which runs in about O(n log n) expected time for data of bounded intrinsic dimensionality. It replaces density with kNN-density, which is computed by a fast kNN algorithm such as the cover tree, yielding a huge improvement for density computations. Based on kNN-density, local density peaks and non-local density peaks are identified, and a fast algorithm, which uses two different strategies to compute δ for them, is also proposed with complexity O(n). Experimental results show that FastDPeak is effective and outperforms other variants of DPeak.
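A hedged sketch of the two DPeak quantities computed the kNN way follows: ρ as the inverse mean distance to the k nearest neighbors and δ as the distance to the nearest point of higher density; brute force stands in below where the paper would use a fast kNN index such as a cover tree, and the particular kNN-density formula is an assumption.

    # Hedged sketch of kNN-density rho and the delta quantity used by DPeak.
    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def knn_density_and_delta(X, k=10):
        dists, _ = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
        rho = 1.0 / dists[:, 1:].mean(axis=1)          # skip the zero self-distance
        full = np.linalg.norm(X[:, None] - X[None], axis=-1)
        delta = np.empty(len(X))
        for i in range(len(X)):
            higher = rho > rho[i]
            # Distance to the nearest point of higher density (max for the global peak).
            delta[i] = full[i, higher].min() if higher.any() else full[i].max()
        return rho, delta

    X = np.random.default_rng(0).random((500, 2))
    rho, delta = knn_density_and_delta(X)
    print("candidate cluster centers:", np.argsort(rho * delta)[-5:])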
- Published
- 2020
- Full Text
- View/download PDF
49. Scalable Nearest Neighbor Algorithms for High Dimensional Data
- Author
-
Marius Muja and David G. Lowe
- Subjects
Clustering high-dimensional data ,Cover tree ,Computer science ,Applied Mathematics ,Nearest neighbor search ,Approximation algorithm ,computer.software_genre ,k-nearest neighbors algorithm ,Hierarchical clustering ,Best bin first ,Computational Theory and Mathematics ,Nearest neighbor graph ,Artificial Intelligence ,Nearest-neighbor chain algorithm ,R-tree ,Ball tree ,Computer Vision and Pattern Recognition ,Data mining ,Fixed-radius near neighbors ,Cluster analysis ,Algorithm ,computer ,Software ,Large margin nearest neighbor - Abstract
For many computer vision and machine learning problems, large training sets are key for good performance. However, the most computationally expensive part of many computer vision and machine learning algorithms consists of finding nearest neighbor matches to high dimensional vectors that represent the training data. We propose new algorithms for approximate nearest neighbor matching and evaluate and compare them with previous algorithms. For matching high dimensional features, we find two algorithms to be the most efficient: the randomized k-d forest and a new algorithm proposed in this paper, the priority search k-means tree. We also propose a new algorithm for matching binary features by searching multiple hierarchical clustering trees and show it outperforms methods typically used in the literature. We show that the optimal nearest neighbor algorithm and its parameters depend on the data set characteristics and describe an automated configuration procedure for finding the best algorithm to search a particular data set. In order to scale to very large data sets that would otherwise not fit in the memory of a single machine, we propose a distributed nearest neighbor matching framework that can be used with any of the algorithms described in the paper. All this research has been released as an open source library called fast library for approximate nearest neighbors (FLANN), which has been incorporated into OpenCV and is now one of the most popular libraries for nearest neighbor matching.
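Since the abstract notes that FLANN has been incorporated into OpenCV, a hedged usage sketch through cv2.FlannBasedMatcher follows; the float descriptor arrays are synthetic stand-ins, and algorithm=1 is assumed to select the randomized k-d tree index as in common OpenCV usage.

    # Hedged usage sketch of FLANN via OpenCV's FlannBasedMatcher.
    import numpy as np
    import cv2

    rng = np.random.default_rng(0)
    des1 = rng.random((500, 128)).astype(np.float32)   # stand-in SIFT-like descriptors
    des2 = rng.random((500, 128)).astype(np.float32)

    index_params = dict(algorithm=1, trees=4)          # randomized k-d forest
    search_params = dict(checks=50)                    # leaves to visit per query
    flann = cv2.FlannBasedMatcher(index_params, search_params)
    matches = flann.knnMatch(des1, des2, k=2)

    # Lowe-style ratio test to keep confident matches.
    good = [m for m, n in matches if m.distance < 0.7 * n.distance]
    print(len(good), "good matches")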
- Published
- 2014
- Full Text
- View/download PDF
50. Approximate nearest neighbor algorithm based on navigable small world graphs
- Author
-
Andrey Logvinov, Alexander Ponomarenko, Yury Malkov, and Vladimir Krylov
- Subjects
Cover tree ,Theoretical computer science ,Computer science ,Nearest neighbor search ,Graph ,Vertex (geometry) ,k-nearest neighbors algorithm ,Metric space ,Best bin first ,Nearest neighbor graph ,Hardware and Architecture ,Nearest-neighbor chain algorithm ,Greedy algorithm ,Software ,Information Systems - Abstract
We propose a novel approach to solving the approximate k-nearest neighbor search problem in metric spaces. The search structure is based on a navigable small world graph with vertices corresponding to the stored elements, edges to links between them, and a variation of a greedy algorithm for searching. The navigable small world is created simply by keeping the old Delaunay graph approximation links produced at the start of construction. The approach is very general, defined in terms of arbitrary metric spaces, and at the same time very simple. The algorithm handles insertions in the same way as queries: by finding approximate neighbors for the inserted element and connecting it to them. Both search and insertion can be done in parallel, requiring only local information from the structure, and the structure can be made distributed. The accuracy of the probabilistic k-nearest neighbor queries can be adjusted without rebuilding the structure. Simulations for data in Euclidean spaces show that the structure built using the proposed algorithm has small world navigation properties with log²(n) insertion and search complexity at fixed accuracy, and performs well at high dimensionality. A simulation on a CoPHiR dataset revealed its high efficiency for large datasets (more than an order of magnitude fewer metric computations at fixed recall) compared to permutation indexes. Only 0.03% of the 10 million 208-dimensional vector dataset needs to be evaluated to achieve 0.999 recall (virtually exact search). For recall 0.93, a processing speed of 2800 queries/s can be achieved on a dual Intel X5675 Xeon server node with a Java implementation.
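A hedged sketch of the greedy routing step on a navigable small world graph follows, over a toy graph of short-range k-NN links plus random long-range links; the paper's incremental insertion procedure is only approximated by this construction.

    # Hedged sketch: greedy routing on a small-world-style neighbor graph.
    import numpy as np

    def greedy_search(graph, X, q, entry):
        current = entry
        while True:
            # Move to whichever neighbor (or current vertex) is closest to the query.
            best = min(graph[current] | {current},
                       key=lambda v: np.linalg.norm(X[v] - q))
            if best == current:
                return current                  # local minimum = approximate NN
            current = best

    rng = np.random.default_rng(0)
    X = rng.random((300, 8))
    graph = {i: set() for i in range(len(X))}
    for i in range(len(X)):
        d = np.linalg.norm(X - X[i], axis=1)
        for j in np.argsort(d)[1:6]:            # 5 short-range neighbors
            graph[i].add(int(j))
            graph[int(j)].add(i)
        j = int(rng.integers(len(X)))            # one random long-range link
        if j != i:
            graph[i].add(j)
            graph[j].add(i)

    q = rng.random(8)
    print("greedy result:", greedy_search(graph, X, q, entry=0),
          "exact NN:", int(np.argmin(np.linalg.norm(X - q, axis=1))))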
- Published
- 2014
- Full Text
- View/download PDF