284 results on '"Vantage-point tree"'
Search Results
2. Indexing algorithm based on storing additional distances in metric space for multi-vantage-point tree
- Author
-
Vladimir Fomin and Igor Akeksandrov
- Subjects
Human-Computer Interaction ,Metric space ,Control and Optimization ,Control and Systems Engineering ,Computer science ,Search engine indexing ,Algorithm ,Software ,Computer Science Applications ,Information Systems ,Vantage-point tree - Abstract
Introduction: The similarity search paradigm is used in various computational tasks, such as classification, data mining, pattern recognition, etc. Currently, the technology of tree-like metric access methods occupies a significant place among search algorithms. The classical problem of reducing the time of similarity search in metric space is relevant for modern systems when processing big complex data. Due to multidimensional nature of the search algorithm effectiveness problem, local research in this direction is in demand, constantly bringing useful results. Purpose: To reduce the computational complexity of tree search algorithms in problems involving metric proximity. Results: We developed a search algorithm for a multi-vantage-point tree, based on the priority node-processing queue. We mathematically formalized the problems of additional calculations and ways to solve them. To improve the performance of similarity search, we have proposed procedures for forming a priority queue of processing nodes and reducing the number of intersections of same level nodes. Structural changes in the multi-vantage-point tree and the use of minimum distances between vantage points and node subtrees provide better search efficiency. More accurate determination of the distance from the search object to the nodes and the fact that the search area intersects with a tree node allows you to reduce the amount of calculations. Practical relevance: The resulting search algorithms need less time to process information due to an insignificant increase in memory requirements. Reducing the information processing time expands the application boundaries of tree metric indexing methods in search problems involving large data sets.
- Published
- 2021
- Full Text
- View/download PDF
3. Fusing Vantage Point Trees and Linear Discriminants for Fast Feature Classification.
- Author
-
Proença, Hugo and Neves, João
- Subjects
- *
NEIGHBORS , *DATA structures , *CENTROID , *DETERMINISTIC algorithms , *GENERALIZATION - Abstract
This paper describes a classification strategy that can be regarded as a more general form of nearest-neighbor classification. It fuses the concepts of nearest neighbor, linear discriminant and Vantage-Point trees, yielding an efficient indexing data structure and classification algorithm. In the learning phase, we define a set of disjoint subspaces of reduced complexity that can be separated by linear discriminants, ending up with an ensemble of simple (weak) classifiers that work locally. In classification, the closest centroids to the query determine the set of classifiers considered, which responses are weighted. The algorithm was experimentally validated in datasets widely used in the field, attaining error rates that are favorably comparable to the state-of-the-art classification techniques. Lastly, the proposed solution has a set of interesting properties for a broad range of applications: 1) it is deterministic; 2) it classifies in time approximately logarithmic with respect to the size of the learning set, being far more efficient than nearest neighbor classification in terms of computational cost; and 3) it keeps the generalization ability of simple models. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
4. TOWARDS ON EXPERIMENTAL COMPARISON OF THE M-TREE INDEX STRUCTURE WITH BK-TREE AND VP-TREE
- Author
-
Attila Kiss, Gergo Gombos, János M. Szalai-Gindl, and István Donkó
- Subjects
M-tree ,postgresql ,Index (economics) ,databases ,Polymers and Plastics ,Computer science ,InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL ,metric space ,Industrial and Manufacturing Engineering ,m-tree ,Combinatorics ,indexes ,Data_FILES ,BK tree ,lcsh:Electrical engineering. Electronics. Nuclear engineering ,Business and International Management ,lcsh:TK1-9971 ,Vantage-point tree - Abstract
In our previous paper, we showed the M-tree index [7] using GiST in the PostgreSQL database. In this paper, we present that result and we extend that with some preliminary experimental results with other indexes. We compare the M-tree index with the BK-tree and the VP-tree indexes. These can be work in metric space with edit distance, that can be used to compare DNA sequences or melody of songs. In this paper, we compare the indexes in PostgreSQL. We use the range based queries to analyze the performance of the indexes. The result shows that the M-tree index is faster than the other two indexes
- Published
- 2020
- Full Text
- View/download PDF
5. GEMINI: a computationally-efficient search engine for large gene expression datasets.
- Author
-
Timothy DeFreitas, Hachem Saddiki, and Patrick Flaherty
- Subjects
- *
NUCLEOTIDE sequencing , *GENE expression , *METADATA , *SEARCH algorithms , *DATABASE searching , *OVARIAN cancer , *CANCER genetics , *GENETICS of breast cancer - Abstract
Background: Low-cost DNA sequencing allows organizations to accumulate massive amounts of genomic data and use that data to answer a diverse range of research questions. Presently, users must search for relevant genomic data using a keyword, accession number of meta-data tag. However, in this search paradigm the form of the query -- a text-based string -- is mismatched with the form of the target -- a genomic profile. Results: To improve access to massive genomic data resources, we have developed a fast search engine, GEMINI, that uses a genomic profile as a query to search for similar genomic profiles. GEMINI implements a nearest-neighbor search algorithm using a vantage-point tree to store a database of n profiles and in certain circumstances achieves an O(log n) expected query time in the limit. We tested GEMINI on breast and ovarian cancer gene expression data from The Cancer Genome Atlas project and show that it achieves a query time that scales as the logarithm of the number of records in practice on genomic data. In a database with 105 samples, GEMINI identifies the nearest neighbor in 0.05 sec compared to a brute force search time of 0.6 sec. Conclusions: GEMINI is a fast search engine that uses a query genomic profile to search for similar profiles in a very large genomic database. It enables users to identify similar profiles independent of sample label, data origin or other meta-data information. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
6. Modified Indexing Algorithm based on Priority Queue in Metric Space for MVP Tree
- Author
-
Denis Gallyamov, Vladimir V. Fomin, Igor Aleksandrov, and Ruslan Kirichek
- Subjects
Computer science ,Nearest neighbor search ,Search engine indexing ,Data classification ,02 engineering and technology ,Tree (data structure) ,020204 information systems ,Metric (mathematics) ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Priority queue ,Cluster analysis ,Algorithm ,Vantage-point tree - Abstract
The Internet of Things (IoT) algorithms process huge amounts of heterogeneous data in real-time. One of the most computationally intensive tasks using cloud technologies is the task of clustering and classifying data. The authors propose to develop an approach to data classification within the “Query by Similarity” paradigm, which uses the technology of data indexing based on Metric Access Methods (MAM). To improve the performance of data indexing, this paper proposes a similar nearest neighbor search method combining a multiple vantage point tree (MVP) and improved algorithms for processing the priority queue of nodes. The following two algorithms for processing the priority queue of nodes were developed: 1) algorithm for all kinds of points-queries, which makes it possible to take into account parent nodes of all higher levels; 2) algorithm for grouped based on clustering of points-queries by reusing previously obtained search results. Experimental results confirm the effectiveness of the proposed approaches and algorithms.
- Published
- 2020
- Full Text
- View/download PDF
7. Fast Scalable Approximate Nearest Neighbor Search for High-dimensional Data
- Author
-
K G Renga Bashyam and Sathish Vadhiyar
- Subjects
Clustering high-dimensional data ,Speedup ,Computer science ,Nearest neighbor search ,Parallel algorithm ,02 engineering and technology ,Load balancing (computing) ,01 natural sciences ,Graph ,010104 statistics & probability ,Search algorithm ,Scalability ,0202 electrical engineering, electronic engineering, information engineering ,Graph (abstract data type) ,020201 artificial intelligence & image processing ,0101 mathematics ,Algorithm ,Vantage-point tree - Abstract
K-Nearest Neighbor (k-NN) search is one of the most commonly used approaches for similarity search. It finds extensive applications in machine learning and data mining. This era of big data warrants efficiently scaling k-NN search algorithms for billion-scale datasets with high dimensionality. In this paper, we propose a solution towards this end where we use vantage point trees for partitioning the dataset across multiple processes and exploit an existing graph-based sequential approximate k-NN search algorithm called HNSW (Hierarchical Navigable Small World) for searching locally within a process. Our hybrid MPI-OpenMP solution employs techniques including exploiting MPI one-sided communication for reducing communication times and partition replication for better load balancing across processes. We demonstrate computation of k-NN for 10,000 queries in the order of seconds using our approach on ∼8000 cores on a dataset with billion points in an 128-dimensional space. We also show 10X speedup over a completely k-d tree-based solution for the same dataset, thus demonstrating better suitability of our solution for high dimensional datasets. Our solution shows almost linear strong scaling
- Published
- 2020
- Full Text
- View/download PDF
8. Combined kNN Classification and Hierarchical Similarity Hash for Fast Malware Detection
- Author
-
Sunoh Choi
- Subjects
Computer science ,Hash function ,similarity hash ,02 engineering and technology ,computer.software_genre ,lcsh:Technology ,lcsh:Chemistry ,malware detection ,Similarity (network science) ,0202 electrical engineering, electronic engineering, information engineering ,General Materials Science ,Instrumentation ,lcsh:QH301-705.5 ,Vantage-point tree ,Fluid Flow and Transfer Processes ,business.industry ,lcsh:T ,Process Chemistry and Technology ,Deep learning ,General Engineering ,deep learning ,020206 networking & telecommunications ,Pattern recognition ,lcsh:QC1-999 ,Computer Science Applications ,Tree (data structure) ,ComputingMilieux_MANAGEMENTOFCOMPUTINGANDINFORMATIONSYSTEMS ,ComputingMethodologies_PATTERNRECOGNITION ,classification ,lcsh:Biology (General) ,lcsh:QD1-999 ,lcsh:TA1-2040 ,Malware ,020201 artificial intelligence & image processing ,Artificial intelligence ,Detection rate ,business ,lcsh:Engineering (General). Civil engineering (General) ,computer ,lcsh:Physics - Abstract
Every day, hundreds of thousands of new malicious files are created. Existing pattern-based antivirus solutions have difficulty detecting these new malicious files. Artificial intelligence (AI)&ndash, based malware detection has been proposed to solve the problem, however, it takes a long time. Similarity hash&ndash, based detection has also been proposed, however, it has a low detection rate. To solve these problems, we propose k-nearest-neighbor (kNN) classification for malware detection with a vantage-point (VP) tree using a similarity hash. When we use kNN classification, we reduce the detection time by 67% and increase the detection rate by 25%. With a VP tree using a similarity hash, we reduce the similarity-hash search time by 20%.
- Published
- 2020
9. Approximating snowflake metrics by trees
- Author
-
William Leeb
- Subjects
Discrete mathematics ,Applied Mathematics ,010102 general mathematics ,02 engineering and technology ,Equivalence of metrics ,01 natural sciences ,Convex metric space ,Intrinsic metric ,Metric space ,Metric (mathematics) ,0202 electrical engineering, electronic engineering, information engineering ,BK tree ,020201 artificial intelligence & image processing ,Metric tree ,0101 mathematics ,Algorithm ,Vantage-point tree ,Mathematics - Abstract
Tree metrics are encountered throughout pure and applied mathematics. Their simple structure makes them a convenient choice of metric in many applications from machine learning and computer science. At the same time, there is an elegant theory of harmonic analysis with respect to tree metrics that parallels the classical theory. A basic question in this field, which is of both theoretical and practical interest, is how to design efficient algorithms for building trees with good metric properties. In particular, given a finite metric space, we seek a random family of dominating tree metrics approximating the underlying metric in expectation. For general metrics, this problem has been solved: on the one hand, there are finite metric spaces that cannot be approximated by trees without incurring a distortion logarithmic in the size of the space, while the tree construction of Fakcharoenphol, Rao, and Talwar (FRT, 2003) shows how to achieve such a logarithmic error for arbitrary metrics. Since a distortion that grows even logarithmically with the size of the set may be too large for practical use in many settings, one naturally asks if there is a more restricted class of metrics where one can do better. The main result of this paper is that certain random family of trees already studied in the computer science literature, including the FRT trees, can be used to approximate snowflake metrics (metrics raised to a power less than 1) with expected distortion bounded by its doubling dimension and the degree of snowflaking. We also show that without snowflaking, the metric distortion can be bounded by a term logarithmic in the distance being approximated and linear in the dimension. We also present an optimal algorithm for building a single FRT tree, whose running time is bounded independently of all problem parameters other than the number of points. We conclude by demonstrating our theoretical results on a numerical example, and applying them to the approximation of the Earth Mover's Distance between probability distributions.
- Published
- 2018
- Full Text
- View/download PDF
10. Exact Nearest Neighbour Search within Constrained Neighbourhood Using the Forest of Vp-Tree-Like Structures
- Author
-
E Myasnikov
- Subjects
Combinatorics ,History ,ComputingMethodologies_PATTERNRECOGNITION ,Neighbourhood (mathematics) ,Nearest neighbour search ,Computer Science Applications ,Education ,Mathematics ,Vantage-point tree - Abstract
In this paper, we address the problem of fast nearest neighbour search. Unfortunately, well-known indexing data structures, such as vp-trees perform poorly on some datasets and do not provide significant acceleration compared to the brute force approach. In the paper, we consider an alternative solution, which can be applied if we are not interested in some fraction of distant nearest neighbours. This solution is based on building the forest of vp-tree-like structures and guarantees the exact nearest neighbour search in the epsilon-neighbourhood of the query point.
- Published
- 2021
- Full Text
- View/download PDF
11. A rule activation method for extended belief rule base with VP-tree and MVP-tree
- Author
-
Xiao-Ting Gong, Qun Su, Yan-Qing Lin, Ying-Ming Wang, and Yang-Geng Fu
- Subjects
Statistics and Probability ,Combinatorics ,Tree (data structure) ,Artificial Intelligence ,Computer science ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,General Engineering ,020201 artificial intelligence & image processing ,02 engineering and technology ,Activation method ,Base (topology) ,Vantage-point tree - Published
- 2017
- Full Text
- View/download PDF
12. A Privacy-Preserving Similarity Search Scheme over Encrypted Word Embeddings
- Author
-
Chiemi Watanabe, Atsuyuki Morishima, Daisuke Aritomo, and Masaki Matsubara
- Subjects
Scheme (programming language) ,Information retrieval ,business.industry ,Computer science ,Nearest neighbor search ,Cloud computing ,Encryption ,Index (publishing) ,Symmetric-key algorithm ,business ,computer ,Word (computer architecture) ,computer.programming_language ,Vantage-point tree - Abstract
Recent evolution in cloud computing platforms have attracted the largest amount of data than ever before. Today, even the most sensitive data are being outsourced, thus, protection is essential to ensure that privacy is not traded for the convenience provided by cloud platforms. Traditional symmetric encryption schemes provide good protection; however, they ruin the merits of cloud computing. Attempts have been made to obtain a scheme where both functionality and protection can be achieved. However, features provided in existing searchable encryption schemes tend to be left behind the latest findings in the information retrieval (IR) area. In this study, we propose a privacy-preserving similar document search system based on Simhash. Our scheme is open to the latest machine-learning based IR schemes, and performance has been tuned utilizing a VP-tree based index, which is optimized for security. Analysis and various tests on real-world datasets demonstrate the scheme's security and efficiency on real-world datasets.
- Published
- 2019
- Full Text
- View/download PDF
13. Nonoptimality of the Maximum-Weight Dependence Tree in Classification
- Author
-
Amin Zollanvari
- Subjects
Discrete mathematics ,K-ary tree ,Applied Mathematics ,020206 networking & telecommunications ,02 engineering and technology ,Mutual information ,Exponential tree ,Interval tree ,Combinatorics ,Tree (data structure) ,Joint probability distribution ,Signal Processing ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Electrical and Electronic Engineering ,Order statistic tree ,Vantage-point tree ,Mathematics - Abstract
Half a century ago, Chow and Liu proved that the distribution of the first-order dependence tree that optimally approximates an arbitrary joint distribution in terms of Kullback–Leibler cross entropy is the distribution of the maximum-weight dependence tree (MWDT). In a $p$ -dimensional space, this is a tree with $p-1$ branches between pairs of variables with each branch having a weight equal to the mutual information between variables at both ends. Today, MWDTs have applications that are more important in classification than only being used to model first-order dependence structure among variables. Nevertheless, whether the MWDT is the optimal first-order tree in classification has been left unexplored so far. In this letter, we study the optimality of the MWDT structure from the stand-point of classification.
- Published
- 2017
- Full Text
- View/download PDF
14. The D-FCM partitioned D-BSP tree for massive point cloud data access and rendering
- Author
-
Yi Zhang
- Subjects
Theoretical computer science ,0211 other engineering and technologies ,Point cloud ,020207 software engineering ,02 engineering and technology ,Division (mathematics) ,Atomic and Molecular Physics, and Optics ,Computer Science Applications ,Rendering (computer graphics) ,Binary space partitioning ,Tree structure ,Principal component analysis ,0202 electrical engineering, electronic engineering, information engineering ,Computers in Earth Sciences ,Space partitioning ,Engineering (miscellaneous) ,Algorithm ,021101 geological & geomatics engineering ,Vantage-point tree ,Mathematics - Abstract
The spatial partitioning of massive point cloud data involves dividing the space into a multi-tree structure step by step, so as to achieve the purpose of fast access and to render the point cloud. The current methods are based on spatial regularity and equal division, which is not consistent with the irregular and non-uniform distribution of most point clouds. This paper presents a directional fuzzy c-means (D-FCM) method for irregular spatial partitioning. The distance metric is weighted by a direction coefficient, which is determined by the eigenvalue of the point cloud. The orientation of each node is adaptively calculated by principal component analysis of the point cloud, and Karhunen-Loeve (KL) transform is applied to the points coordinates in node. A binary space partitioning (BSP) tree structure is used to partition the point cloud data node by node, and a directional BSP (D-BSP) tree is formed. The D-BSP tree structure was tested with point clouds of 0.1 million to over 2 billion points (up to 60 GB). The experimental results showed that the D-BSP tree can ensure that the bounding boxes are close to the actual spatial distribution of the point cloud, it can completely expand along the spatial configuration of the point cloud without generating unnecessary partitioning, and it can achieve a higher rendering speed with less memory requirement.
- Published
- 2016
- Full Text
- View/download PDF
15. Analysis of approaches to feature space partitioning for nonlinear dimensionality reduction
- Author
-
Evgeny Myasnikov
- Subjects
Mathematical optimization ,Dimensionality reduction ,Nonlinear dimensionality reduction ,Recursive partitioning ,02 engineering and technology ,01 natural sciences ,Computer Graphics and Computer-Aided Design ,010309 optics ,k-d tree ,Tree (data structure) ,0103 physical sciences ,Ball tree ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Computer Vision and Pattern Recognition ,Space partitioning ,Algorithm ,Mathematics ,Vantage-point tree - Abstract
One of the most effective ways to reduce the computational complexity of nonlinear dimensionality reduction is hierarchical partitioning of the space with the subsequent approximation of calculations. In this paper, the efficiency of two approaches to space partitioning, the partitioning of input and output spaces, is analyzed. In addition, a method for nonlinear dimensionality reduction is proposed. It is based on construction of a partitioning tree of the input multidimensional space and an iterative procedure of the gradient descent with the approximation carried out on the nodes of the constructed space partitioning tree. In the method proposed, the relative position of the corrected objects and partitioning tree nodes in both input and output spaces is taken into account in the approximation. The method developed was analyzed based on publicly available datasets.
- Published
- 2016
- Full Text
- View/download PDF
16. Spectral Clustering and Vantage Point Indexing for Efficient Data Retrieval
- Author
-
K. Meenakshi Sundaram and R Pushpalatha
- Subjects
Clustering high-dimensional data ,Normalization (statistics) ,High dimensional data points ,General Computer Science ,Computer science ,Search engine indexing ,Process (computing) ,Indexed data ,Vantage point tree ,computer.software_genre ,Spectral clustering ,Clustering ,Data retrieval ,Data mining ,Electrical and Electronic Engineering ,Cluster analysis ,computer ,Vantage-point tree ,Curse of dimensionality - Abstract
Data mining is an essential process for identifying the patterns in large datasets through machine learning techniques and database systems. Clustering of high dimensional data is becoming very challenging process due to curse of dimensionality. In addition, space complexity and data retrieval performance was not improved. In order to overcome the limitation, Spectral Clustering Based VP Tree Indexing Technique is introduced. The technique clusters and indexes the densely populated high dimensional data points for effective data retrieval based on user query. A Normalized Spectral Clustering Algorithm is used to group similar high dimensional data points. After that, Vantage Point Tree is constructed for indexing the clustered data points with minimum space complexity. At last, indexed data gets retrieved based on user query using Vantage Point Tree based Data Retrieval Algorithm. This in turn helps to improve true positive rate with minimum retrieval time. The performance is measured in terms of space complexity, true positive rate and data retrieval time with El Nino weather data sets from UCI Machine Learning Repository. An experimental result shows that the proposed technique is able to reduce the space complexity by 33% and also reduces the data retrieval time by 24% when compared to state-of-the-art-works.
- Published
- 2018
17. A navigation system for tree space
- Author
-
Sarah Mark, Mike Steel, and Jeanette C. McLeod
- Subjects
0301 basic medicine ,Discrete mathematics ,K-ary tree ,General Computer Science ,Interval tree ,Search tree ,Computer Science Applications ,Theoretical Computer Science ,Range tree ,Combinatorics ,03 medical and health sciences ,030104 developmental biology ,Tree structure ,Computational Theory and Mathematics ,Tree rearrangement ,Geometry and Topology ,Left-child right-sibling binary tree ,Vantage-point tree ,Mathematics - Abstract
The reconstruction of evolutionary trees from data sets on overlapping sets of species is a central problem in phylogenetics. Provided that the tree reconstructed for each subset of species is rooted and that these trees t together consistently, the space of all parent trees that ‘display’ these trees was recently shown to satisfy the following strong property: there exists a path from any one parent tree to any other parent tree by a sequence of local rearrangements (nearest neighbour interchanges) so that each intermediate tree also lies in this same tree space. However, the proof of this result uses a non-constructive argument. In this paper we describe a specic, polynomial-time procedure for navigating from any given parent tree to another while remaining in this tree space. The results are of particular relevance to the recent study of ‘phylogenetic terraces’.
- Published
- 2016
- Full Text
- View/download PDF
18. Fast nearest neighbor searching based on improved VP-tree
- Author
-
Shiguang Liu and Yinwei Wei
- Subjects
Computational complexity theory ,Computer science ,computer.software_genre ,k-nearest neighbors algorithm ,Best bin first ,Artificial Intelligence ,Nearest-neighbor chain algorithm ,Signal Processing ,Pattern recognition (psychology) ,Ball tree ,Redundancy (engineering) ,Computer Vision and Pattern Recognition ,Data mining ,computer ,Software ,Vantage-point tree - Abstract
A novel nearest neighbor searching framework is proposed.PCA is adopted to optimize the VP-tree structure for efficiency improvement.A novel approach to controlling the pruning conditions of VP-tree is developed.A completely eliminating redundancy method CERM is specially designed. Nearest neighbor searching is an important issue in both pattern recognition and image processing. However, most of the previous methods suffer from high computational complexity, restricting nearest neighbor searching from practical applications. This paper proposes a novel fast nearest neighbor searching method by combining improved VP-tree and PatchMatch method. PCA (Principal Component Analysis) method is employed to optimize the VP-tree so as to improve the searching speed. We also design an approach to controlling the pruning conditions of VP-tree which further improves the searching efficiency. A thorough redundancy elimination method on GPU is also developed, with a satisfactory independent-of-the-patch-size computational complexity. Various experiments show that our new method achieves a better balance between computational efficiency and memory requirements, while also improves the searching accuracy somehow, with great potential for practical real-time applications.
- Published
- 2015
- Full Text
- View/download PDF
19. Consistency of a phylogenetic tree maximum likelihood estimator
- Author
-
Arindam RoyChoudhury, John Bunge, and Amy D. Willis
- Subjects
Statistics and Probability ,Phylogenetic tree ,Estimation theory ,Applied Mathematics ,Combinatorics ,Tree (data structure) ,Tree structure ,Tree rearrangement ,Computational phylogenetics ,Statistics ,Statistics, Probability and Uncertainty ,Order statistic tree ,Vantage-point tree ,Mathematics - Abstract
Phylogenetic trees represent the order and extent of genetic divergence of a fixed collection of organisms. Order of divergence is represented via the tree structure, and extent of divergence by the branch lengths. Both the tree’s structure and branch lengths are unknown parameters and the tree is estimated using sequence information sampled at a number of genetic sites. Under the model of genetic Brownian motion, we prove that as the number of genetic sites that are sampled becomes large, the maximum likelihood estimator of the tree is consistent. (Our maximum likelihood estimator treats each site as an independent data point, which is different from concatenating the sites.) Existing arguments for consistency rely on the assumption of a finite parameter space or only apply to transition probability matrix-based models, and do not hold here due to the continuous model for branch lengths. The metric space of Billera et al. (2001) is central to the proof. We conclude with some comments on the role of parametric methods in tree estimation.
- Published
- 2015
- Full Text
- View/download PDF
20. A ‘Stochastic Safety Radius’ for Distance-Based Tree Reconstruction
- Author
-
Mike Steel, Olivier Gascuel, Méthodes et Algorithmes pour la Bioinformatique (MAB), Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier (LIRMM), Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM), Centre for Ultrahigh Bandwidth Devices for Optical Systems (CUDOS), and Macquarie University
- Subjects
0301 basic medicine ,Mathematical optimization ,General Computer Science ,Applied Mathematics ,Populations and Evolution (q-bio.PE) ,Radius ,Interval tree ,Upper and lower bounds ,Computer Science Applications ,03 medical and health sciences ,Tree (data structure) ,030104 developmental biology ,FOS: Biological sciences ,Theory of computation ,[INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM] ,Variety (universal algebra) ,Quantitative Biology - Populations and Evolution ,Algorithm ,Mathematics ,Distance based ,Vantage-point tree - Abstract
A variety of algorithms have been proposed for reconstructing trees that show the evolutionary relationships between species by comparing differences in genetic data across present-day taxa. If the leaf-to-leaf distances in a tree can be accurately estimated, then it is possible to reconstruct this tree from these estimated distances, using polynomial-time methods such as the popular `Neighbor-Joining' algorithm. There is a precise combinatorial condition under which distance-based methods are guaranteed to return a correct tree (in full or in part) based on the requirement that the input distances all lie within some `safety radius' of the true distances. Here, we explore a stochastic analogue of this condition, and mathematically establish upper and lower bounds on this `stochastic safety radius' for distance-based tree reconstruction methods. Using simulations, we show how this notion provides a new way to compare the performance of distance-based tree reconstruction methods. This may help explain why Neighbor-Joining performs so well, as its stochastic safety radius appears close to optimal (while its more classical safety radius is the same as many other less accurate methods)., 18 pages, 1 figure, 4 tables
- Published
- 2015
- Full Text
- View/download PDF
21. A Generalized Single Linkage Method for Estimating the Cluster Tree of a Density
- Author
-
Werner Stuetzle and Rebecca Nugent
- Subjects
Statistics and Probability ,business.industry ,Feature vector ,Single-linkage clustering ,Statistics ,Nonparametric statistics ,Pattern recognition ,Cluster (physics) ,FOS: Mathematics ,Discrete Mathematics and Combinatorics ,Graph (abstract data type) ,Artificial intelligence ,Statistics, Probability and Uncertainty ,Cluster diagram ,business ,Cluster analysis ,Algorithm ,Mathematics ,Vantage-point tree ,Probability - Abstract
The goal of clustering is to detect the presence of distinct groups in a data set and assign group labels to the observations. Nonparametric clustering is based on the premise that the observations may be regarded as a sample from some underlying density in feature space and that groups correspond to modes of this density. The goal then is to find the modes and assign each observation to the domain of attraction of a mode. The modal structure of a density is summarized by its cluster tree; modes of the density correspond to leaves of the cluster tree. Estimating the cluster tree is the primary goal of nonparametric cluster analysis. We adopt a plug-in approach to cluster tree estimation: estimate the cluster tree of the feature density by the cluster tree of a density estimate. For some density estimates the cluster tree can be computed exactly, for others we have to be content with an approximation. We present a graph-based method that can approximate the cluster tree of any density estimate. Density estimates tend to have spurious modes caused by sampling variability, leading to spurious branches in the cluster tree. We propose excess mass as a measure for the size of a branch, reflecting the height of the corresponding peak of the density above the surrounding valley floor as well as its spatial extent. Excess mass can be used as a guide for pruning the graph cluster tree. We point out mathematical and algorithmic connections to single linkage clustering and illustrate our approach on several examples.
- Published
- 2018
- Full Text
- View/download PDF
22. Spanifold: spanning tree flattening onto lower dimension
- Author
-
Christopher G. Small, Petr Kobelevskiy, and Shojaeddin Chenouri
- Subjects
Statistics and Probability ,Fractal tree index ,Combinatorics ,Spanning tree ,Segment tree ,Exponential tree ,Statistics, Probability and Uncertainty ,Minimum spanning tree ,Interval tree ,Isomap ,Vantage-point tree ,Mathematics - Abstract
Dimensionality reduction and manifold learning techniques attempt to recover a lower-dimensional submanifold from the data as encoded in high dimensions. Many techniques, linear or non-linear, have been introduced in the literature. Standard methods, such as Isomap and local linear embedding, map the high-dimensional data points into a low dimension so as to globally minimize a so-called energy function, which measures the mismatch between the precise geometry in high dimensions and the approximate geometry in low dimensions. However, the local effects of such minimizations are often unpredictable, because the energy minimization algorithms are global in nature. In contrast to these methods, the Spanifold algorithm of this paper constructs a tree on the manifold and flattens the manifold in such a way as to approximately preserve pairwise distance relationships within the tree. The vertices of this tree are the data points, and the edges of the tree form a subset of the edges of the nearest-neighbour graph on the data. In addition, the pairwise distances between data points close to the root of the tree undergo minimal distortion as the data are flattened. This allows the user to design the flattening algorithm so as to approximately preserve neighbour relationships in any chosen local region of the data. Copyright © 2015 John Wiley & Sons, Ltd.
- Published
- 2015
- Full Text
- View/download PDF
23. The research on nearest neighbor search algorithm based on vantage point tree
- Author
-
Daguang Jiang, Xianghui Zhao, Hejuan Sun, and Junkai Yi
- Subjects
Degree (graph theory) ,Computer science ,Nearest neighbor search ,02 engineering and technology ,01 natural sciences ,k-nearest neighbors algorithm ,Data set ,0103 physical sciences ,Digital image processing ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Pruning (decision trees) ,010306 general physics ,Algorithm ,Vantage-point tree - Abstract
The nearest neighbor search algorithm is an indispensable part of many image processing algorithms. In order to improve its performance, this paper proposes a similar nearest neighbor search algorithm combining multiple vantage point trees and improved pruning search methods. Firstly, when the vantage point tree is created, the vantage points are selected by selecting the points with the best degree of differentiation; Then, a new method of constructing vantage point forest is proposed to improve the nearest neighbor non-compact problem in the superior sub-tree; Finally, an improved pruning search method is proposed to find similar nearest neighbors, which reduces many unnecessary distance calculations. The experimental results show that the algorithm performance has been greatly improved in the data set size, returned the number of different neighbors, data dimension and the number of building trees.
- Published
- 2017
- Full Text
- View/download PDF
24. Data Aggregation for Spatially Correlated data using Polynomial Regression in 3D Wireless Sensor Network
- Author
-
Sanjeev Kumar Gupta, Anshuj Jain, Ankit Tripathi, and Bharti Chourasiya
- Subjects
Fractal tree index ,Polynomial regression ,Tree (data structure) ,Computer science ,Segment tree ,Node (computer science) ,Exponential tree ,Data mining ,Interval tree ,computer.software_genre ,Algorithm ,computer ,Vantage-point tree - Abstract
Sensor nodes observing events, occurring in close proximity, gather data that are highly co-related. This work uses a polynomial regression technique to exploit the spatial correlation of the data in a three dimensional sensor network. The sensing nodes, sense the physical attribute and report their position coordinates (x, y, z) and the sensed value to the nearest tree node. Another set of nodes, categorized as the tree nodes, are responsible for generating a polynomial function of the received data and transmit the coefficients of regression to the parent tree node. The approach proceeds from the bottom to the top. The query from the sink, receives a polynomial function which is generated by the root node of the tree in each cluster to compute the attribute value at any location within the boundary. The proposed approach aims to save a lot of energy in the sensor network. Simulations, performed for different tree heights, indicate that a tree with a depth of four gives the best results.
- Published
- 2014
- Full Text
- View/download PDF
25. An Approximation Algorithm for Constructing Degree-Dependent Node-Weighted Multicast Trees
- Author
-
Hwa-Chun Lin and Hsiu-Ming Yang
- Subjects
Fractal tree index ,Red–black tree ,Tree rotation ,Mathematical optimization ,K-ary tree ,Multicast ,Computer science ,Segment tree ,Approximation algorithm ,Exponential tree ,Interval tree ,k-minimum spanning tree ,Steiner tree problem ,Search tree ,symbols.namesake ,Tree traversal ,Computational Theory and Mathematics ,Hardware and Architecture ,Signal Processing ,symbols ,Vantage-point tree - Abstract
This paper studies the problem of constructing a minimum-cost multicast tree (or Steiner tree) in which each node is associated with a cost that is dependent on its degree in the multicast tree. The cost of a node may depend on its degree in the multicast tree due to a number of reasons. For example, a node may need to perform various processing for sending messages to each of its neighbors in the multicast tree. Thus, the overhead for processing the messages increases as the number of neighbors increases. This paper devises a novel technique to deal with the degree-dependent node costs and applies the technique to develop an approximation algorithm for the degree-dependent node-weighted Steiner tree problem. The bound on the cost of the tree constructed by the proposed approximation algorithm is derived to be 2((ln (k/2))+1)(W_T∗ + B), where k is the size of the set of multicast members, W_T∗ is the cost of a minimum-cost Steiner tree T∗, and B is related to the degree-dependent node costs. Simulations are carried out to study the performance of the proposed algorithm. A distributed implementation of the proposed algorithm is presented. In addition, the proposed algorithm is generalized to solve the degree-dependent node-weighted constrained forest problem.
- Published
- 2014
- Full Text
- View/download PDF
26. Universal Enumerative Coding for Tree Models
- Author
-
Gadiel Seroussi, Marcelo Weinberger, and Alvaro Martin
- Subjects
Fractal tree index ,Discrete mathematics ,K-ary tree ,Segment tree ,Library and Information Sciences ,Interval tree ,Search tree ,Computer Science Applications ,Range tree ,Combinatorics ,Tree structure ,Information Systems ,Mathematics ,Vantage-point tree - Abstract
Efficient enumerative coding for tree sources is, in general, surprisingly intricate-a simple uniform encoding of type classes, which is asymptotically optimal in expectation for many classical models, such as FSMs, turns out not to be so in this case. We describe an efficiently computable enumerative code that is universal in the family of tree models in the sense that, for a string emitted by an unknown source whose model is supported on a known tree, the expected normalized code length of the encoding approaches the entropy rate of the source with a convergence rate (K/2)(log n)/n, where K is the number of free parameters of the model family. Based on recent results characterizing type classes of context trees, the code consists of the index of the sequence in the tree type class, and an efficient description of the class itself using a nonuniform encoding of selected string counts. The results are extended to a twice-universal setting, where the tree underlying the source model is unknown.
- Published
- 2014
- Full Text
- View/download PDF
27. Decision Trees for Mining Data Streams Based on the Gaussian Approximation
- Author
-
Maciej Jaworski, Piotr Duda, Leszek Rutkowski, and Lena Pietruczuk
- Subjects
Tree rotation ,Data stream ,Fractal tree index ,K-ary tree ,Computer science ,Decision tree ,computer.software_genre ,Interval tree ,2–3–4 tree ,Entropy (information theory) ,Vantage-point tree ,Incremental decision tree ,Data stream mining ,Entropy (statistical thermodynamics) ,Decision tree learning ,Segment tree ,ID3 algorithm ,Search tree ,Computer Science Applications ,Range tree ,Tree traversal ,Tree structure ,Computational Theory and Mathematics ,Data mining ,computer ,Information Systems - Abstract
Since the Hoeffding tree algorithm was proposed in the literature, decision trees became one of the most popular tools for mining data streams. The key point of constructing the decision tree is to determine the best attribute to split the considered node. Several methods to solve this problem were presented so far. However, they are either wrongly mathematically justified (e.g., in the Hoeffding tree algorithm) or time-consuming (e.g., in the McDiarmid tree algorithm). In this paper, we propose a new method which significantly outperforms the McDiarmid tree algorithm and has a solid mathematical basis. Our method ensures, with a high probability set by the user, that the best attribute chosen in the considered node using a finite data sample is the same as it would be in the case of the whole data stream.
- Published
- 2014
- Full Text
- View/download PDF
28. Unsupervised Tree Induction for Tree-based Translation
- Author
-
Jiajun Zhang, Yu Zhou, Feifei Zhai, and Chengqing Zong
- Subjects
Fractal tree index ,Linguistics and Language ,Incremental decision tree ,K-ary tree ,Computer science ,business.industry ,Communication ,Pattern recognition ,Interval tree ,Search tree ,Computer Science Applications ,Human-Computer Interaction ,Tree traversal ,Tree structure ,Artificial Intelligence ,Artificial intelligence ,business ,Vantage-point tree - Abstract
In current research, most tree-based translation models are built directly from parse trees. In this study, we go in another direction and build a translation model with an unsupervised tree structure derived from a novel non-parametric Bayesian model. In the model, we utilize synchronous tree substitution grammars (STSG) to capture the bilingual mapping between language pairs. To train the model efficiently, we develop a Gibbs sampler with three novel Gibbs operators. The sampler is capable of exploring the infinite space of tree structures by performing local changes on the tree nodes. Experimental results show that the string-to-tree translation system using our Bayesian tree structures significantly outperforms the strong baseline string-to-tree system using parse trees.
- Published
- 2013
- Full Text
- View/download PDF
29. On the low-dimensional Steiner minimum tree problem in Hamming metric
- Author
-
Joschka Kupilas, Rouven Naujoks, and Ernst Althaus
- Subjects
Discrete mathematics ,K-ary tree ,General Computer Science ,Minimum spanning tree ,k-minimum spanning tree ,Steiner tree problem ,Theoretical Computer Science ,Combinatorics ,symbols.namesake ,Hamming graph ,symbols ,Metric tree ,Gomory–Hu tree ,Mathematics ,Vantage-point tree - Abstract
While it is known that the d-dimensional Steiner minimum tree problem in Hamming metric is NP-complete if d is part of the input, it is an open question whether this also holds for fixed dimensions. In this paper, this question is answered by showing that the Steiner minimum tree problem in Hamming metric is already NP-complete in 3 dimensions. Furthermore, we show that, the minimum spanning tree gives a 2-2d approximation on the Steiner minimum tree for d>=2. Using this result, we analyse the so-called k-LCA and A"k approximation algorithms and show improved approximation guarantees for low dimensions.
- Published
- 2013
- Full Text
- View/download PDF
30. The number of DFAs for a given spanning tree
- Author
-
Parisa Babaali, E. Carta-Gerardino, and C. Knaplund
- Subjects
Fractal tree index ,Tree rotation ,TheoryofComputation_COMPUTATIONBYABSTRACTDEVICES ,K-ary tree ,Computer science ,Minimum spanning tree ,Interval tree ,Theoretical Computer Science ,Trie ,Random tree ,Enumeration ,Vantage-point tree ,Minimum degree spanning tree ,Discrete mathematics ,Finite-state machine ,Spanning tree ,Binary tree ,Segment tree ,Nonlinear Sciences::Cellular Automata and Lattice Gases ,Search tree ,Range tree ,Tree (data structure) ,Tree traversal ,TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES ,Tree structure ,Deterministic finite automaton ,Hardware and Architecture ,Algorithm ,Computer Science::Formal Languages and Automata Theory ,Software ,Order statistic tree ,Information Systems - Abstract
In the last few decades, several techniques to randomly generate a deterministic finite automaton have been developed. These techniques have implications in the enumeration and random generation of automata of size n. One of the ways to generate a finite automaton is to generate a random tree and to complete it to a deterministic finite automaton, assuming that the tree will be the automaton's breadth-first spanning tree. In this paper we explore further this method, and the string representation of a tree, and use it to counting the number of automata having a tree as a breadth-first spanning subtrees. We introduce the notions of characteristic and difference sequence of a tree, and define the weight of a tree. We also present a recursive formula for this quantity in terms of the "derivative" of a tree. Finally, we analyze the implications of this formula in terms of exploring trees with the largest and smallest number of automata in the span of the tree and present simple applications for finding densities of subsets of DFAs.
- Published
- 2013
- Full Text
- View/download PDF
31. Kernel based principal axis tree
- Author
-
Qiaozhi Li, Jie Sheng, and Ankur Teredesai
- Subjects
Computer science ,business.industry ,Pattern recognition ,Interval tree ,01 natural sciences ,Kernel principal component analysis ,Range tree ,010104 statistics & probability ,Tree (data structure) ,k-d tree ,Kernel (statistics) ,Artificial intelligence ,0101 mathematics ,business ,Principal axis theorem ,Vantage-point tree - Abstract
K-nearest neighbors (KNN) describes a problem of finding the nearest K neighbors for a query point from a given data set. Among the available techniques, principal axis tree (PAT) always performs well when being used as an index for nearest neighbors search. As a generalization of k-dimensional tree (k-d tree), PAT uses its principal axis at each step, instead of using a coordinate axis to sort the data set. In this paper, we consider further performance improvement of PAT. Inspired by “kernel tricks” which is computationally cheaper than the explicit computation of the coordinates, we propose a kernel based principal axis tree (KPAT) method. Compared with k-d tree and PAT, the effectiveness of KPAT is verified by experimental results.
- Published
- 2017
- Full Text
- View/download PDF
32. A General Technique for Non-blocking Trees
- Author
-
Faith Ellen, Trevor Brown, and Eric Ruppert
- Subjects
Fractal tree index ,Tree rotation ,Red–black tree ,FOS: Computer and information sciences ,K-ary tree ,AVL tree ,Theoretical computer science ,Computer science ,Optimal binary search tree ,Exponential tree ,Parallel computing ,Interval tree ,Blocking (statistics) ,Trie ,Self-balancing binary search tree ,Vantage-point tree ,Binary tree ,Segment tree ,Computer Graphics and Computer-Aided Design ,Search tree ,Range tree ,Tree traversal ,Tree (data structure) ,Tree structure ,Computer Science - Distributed, Parallel, and Cluster Computing ,Distributed, Parallel, and Cluster Computing (cs.DC) ,Algorithm ,Order statistic tree ,Software ,Left-child right-sibling binary tree - Abstract
We describe a general technique for obtaining provably correct, non-blocking implementations of a large class of tree data structures where pointers are directed from parents to children. Updates are permitted to modify any contiguous portion of the tree atomically. Our non-blocking algorithms make use of the LLX, SCX and VLX primitives, which are multi-word generalizations of the standard LL, SC and VL primitives and have been implemented from single-word CAS. To illustrate our technique, we describe how it can be used in a fairly straightforward way to obtain a non-blocking implementation of a chromatic tree, which is a relaxed variant of a red-black tree. The height of the tree at any time is O(c + log n), where n is the number of keys and c is the number of updates in progress. We provide an experimental performance analysis which demonstrates that our Java implementation of a chromatic tree rivals, and often significantly outperforms, other leading concurrent dictionaries.
- Published
- 2017
- Full Text
- View/download PDF
33. Fusing Vantage Point Trees and Linear Discriminants for Fast Feature Classification
- Author
-
Hugo Proença, Joao C. Neves, and uBibliorum
- Subjects
Image classification ,Linear discriminants ,Linear classifier ,02 engineering and technology ,Library and Information Sciences ,k-nearest neighbors algorithm ,Set (abstract data type) ,Nearest neighbor classification ,Mathematics (miscellaneous) ,Vantage-point tree ,0202 electrical engineering, electronic engineering, information engineering ,Feature (machine learning) ,One-class classification ,Mathematics ,Contextual image classification ,business.industry ,020206 networking & telecommunications ,Pattern recognition ,Linear discriminant analysis ,ComputingMethodologies_PATTERNRECOGNITION ,020201 artificial intelligence & image processing ,Psychology (miscellaneous) ,Artificial intelligence ,Statistics, Probability and Uncertainty ,business - Abstract
Submitted by PROENÇA (hugomcp@ubi.pt) on 2020-02-06T14:40:34Z No. of bitstreams: 1 VPC.pdf: 2013894 bytes, checksum: 8a70fd1281c74eac5c3a74b5434d9d94 (MD5) Rejected by Pessoa (pfep@ubi.pt), reason: Deposito com erros: Nomes dos autores (um por campo), resumo, palavras chave etc. Siga o nosso Manual de Auxilio ao Autodepósito: https://www.ubi.pt/Ficheiros/Sites/7/Documentos/1375/Instruções%20para%20Auto%20depósito%20de%20documentos.pdf (questões do Acesso na pagina 9). Após corrigir é só submeter de novo. Obrigado on 2020-02-07T11:47:48Z (GMT) Submitted by PROENÇA (hugomcp@ubi.pt) on 2020-02-07T12:02:25Z No. of bitstreams: 1 VPC.pdf: 2013894 bytes, checksum: 8a70fd1281c74eac5c3a74b5434d9d94 (MD5) Rejected by Pessoa (pfep@ubi.pt), reason: Deposito com erros: Nomes dos autores (anexar às plataformas curriculares), resumo, palavras chave etc. Siga o nosso Manual de Auxilio ao Autodepósito: https://www.ubi.pt/Ficheiros/Sites/7/Documentos/1375/Instruções%20para%20Auto%20depósito%20de%20documentos.pdf (questões do Acesso na pagina 9). Após corrigir é só submeter de novo. Obrigado on 2020-02-07T16:14:19Z (GMT) Submitted by PROENÇA (hugomcp@ubi.pt) on 2020-02-07T16:38:24Z No. of bitstreams: 1 VPC.pdf: 2013894 bytes, checksum: 8a70fd1281c74eac5c3a74b5434d9d94 (MD5) Approved for entry into archive by Pessoa (pfep@ubi.pt) on 2020-02-10T14:18:54Z (GMT) No. of bitstreams: 1 VPC.pdf: 2013894 bytes, checksum: 8a70fd1281c74eac5c3a74b5434d9d94 (MD5) Approved for entry into archive by Pessoa (pfep@ubi.pt) on 2020-02-10T14:21:54Z (GMT) No. of bitstreams: 1 VPC.pdf: 2013894 bytes, checksum: 8a70fd1281c74eac5c3a74b5434d9d94 (MD5) Made available in DSpace on 2020-02-10T14:21:54Z (GMT). No. of bitstreams: 1 VPC.pdf: 2013894 bytes, checksum: 8a70fd1281c74eac5c3a74b5434d9d94 (MD5) Previous issue date: 2017 info:eu-repo/semantics/publishedVersion
- Published
- 2017
34. An Efficient Algorithm for Node-Weighted Tree Partitioning with Subtrees' Weights in a Given Range
- Author
-
Guangchun Luo, Ke Qin, Hao Chen, Yuhai Liu, and Caihui Qu
- Subjects
Fractal tree index ,Tree rotation ,K-ary tree ,Computer science ,T-tree ,Interval tree ,Search tree ,Combinatorics ,Tree (data structure) ,Artificial Intelligence ,Hardware and Architecture ,Computer Vision and Pattern Recognition ,Electrical and Electronic Engineering ,Software ,Vantage-point tree - Published
- 2013
- Full Text
- View/download PDF
35. Mathematical Properties of the Deep Coalescence Cost
- Author
-
Cuong Than and Noah A. Rosenberg
- Subjects
Fractal tree index ,K-ary tree ,Models, Genetic ,Genetic Speciation ,Applied Mathematics ,Link/cut tree ,Computational Biology ,Interval tree ,Search tree ,Evolution, Molecular ,Tree traversal ,Tree rearrangement ,Genetics ,Quantitative Biology::Populations and Evolution ,ComputingMethodologies_GENERAL ,Algorithm ,Algorithms ,Biotechnology ,Vantage-point tree ,Mathematics - Abstract
In the minimizing-deep-coalescences (MDC) approach for species tree inference, a tree that has the minimal deep coalescence cost for reconciling a collection of gene trees is taken as an estimate of the species tree topology. The MDC method possesses the desirable Pareto property, and in practice it is quite accurate and computationally efficient. Here, in order to better understand the MDC method, we investigate some properties of the deep coalescence cost. We prove that the unit neighborhood of either a rooted species tree or a rooted gene tree under the deep coalescence cost is exactly the same as the tree's unit neighborhood under the rooted nearest-neighbor interchange (NNI) distance. Next, for a fixed species tree, we obtain the maximum deep coalescence cost across all gene trees as well as the number of gene trees that achieve the maximum cost. We also study corresponding problems for a fixed gene tree.
- Published
- 2013
- Full Text
- View/download PDF
36. Expanded Insert Algorithm in a B* Tree
- Author
-
Felipe A. Pati and Thelma D. Palaoag
- Subjects
Tree rotation ,Fractal tree index ,Incremental decision tree ,AVL tree ,K-ary tree ,Theoretical computer science ,Computer science ,Segment tree ,Exponential tree ,Interval tree ,Search tree ,Tree (data structure) ,(a,b)-tree ,Trie ,Algorithm ,Left-child right-sibling binary tree ,Vantage-point tree - Abstract
Tree data structure works great in representing computations unambiguously. In this study, the researchers enhanced the existing insert algorithm in B*Tree by introducing an expanded algorithm for inserting key values in B*Tree. This would further delay, if not reduce redistribution and frequency of splitting nodes in B*Tree. The idea of the expanded algorithm is to check for a cousin node within the same level that can accommodate a key value from its nearest sibling. A cascading effect is expected until the nearest sibling of the overflowing node can accommodate a key value with the premise that the nearest siblings of the overflowing node are full. Exploring the potential of this algorithm can really simplify a complex problem. The expanded insert algorithm would ensure that all leaf nodes in the B*Tree would be full and not just 2/3 full before a redistribution process is to be performed. Thus, reducing the nodes used in a B*Tree that would result to a far more favorable priori and posteriori estimates.
- Published
- 2016
- Full Text
- View/download PDF
37. Minimization of Tree Pattern Queries
- Author
-
Wojciech Czerwiński, Matthias Niewerth, Paweł Parys, and Wim Martens
- Subjects
Fractal tree index ,AVL tree ,K-ary tree ,Theoretical computer science ,Segment tree ,0102 computer and information sciences ,02 engineering and technology ,Interval tree ,01 natural sciences ,Search tree ,Combinatorics ,Tree structure ,010201 computation theory & mathematics ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,Mathematics ,Vantage-point tree - Abstract
We investigate minimization of tree pattern queries that use the child relation, descendant relation, node labels, and wildcards. We prove that minimization for such tree patterns is Sigma2P-complete and thus solve a problem first attacked by Flesca, Furfaro, and Masciari in 2003. We first provide an example that shows that tree patterns cannot be minimized by deleting nodes. This example shows that the M-NR conjecture, which states that minimality of tree patterns is equivalent to their nonredundancy, is false. We then show how the example can be turned into a gadget that allows us to prove Sigma2P-completeness.
- Published
- 2016
- Full Text
- View/download PDF
38. A Novel Approach to Remote Sensing Image Retrieval with Multi-feature VP-Tree Indexing and Online Feature Selection
- Author
-
Shijin Li, Lixin Yuan, and Hui Yu
- Subjects
business.industry ,Computer science ,InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL ,Feature extraction ,Search engine indexing ,020206 networking & telecommunications ,Pattern recognition ,02 engineering and technology ,Automatic image annotation ,Feature (computer vision) ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Visual Word ,Artificial intelligence ,business ,Precision and recall ,Image retrieval ,Remote sensing ,Vantage-point tree - Abstract
With the development of remote sensing (RS) techniques, the amount of RS images increases dramatically. It is a challenge to utilize those RS big data efficiently. Content-based Image Retrieval (CBIR) is a typical approximate similarity search problem, which needs to establish an effective index structure to reduce the time of retrieval. By analyzing the limitations of commonly-used indexing mechanisms in the current CBIR system, we propose a novel scheme that dynamically combines vantage point tree (vp-tree) indexes to CBIR by using spacing-correlation strategies to determine the vantage points. Borrowing ideas from feature selection, we have also put forward a new measure to adaptively online select proper vp-tree indexing in different feature spaces, the distance-contrast-based indexing validity index (DCIVI). And we then employ vp-tree index structure in each feature space, which can properly describe the content of the RS image by the chosen features. Experimental results on various typical land covers retrieval validate that the proposed method is effective and not only is the response speed increased by 70~100 times, but also the retrieval quality (in terms of precision and recall) is improved.
- Published
- 2016
- Full Text
- View/download PDF
39. An improved algorithm for tree edit distance with applications for RNA secondary structure comparison
- Author
-
Kaizhong Zhang and Shihyen Chen
- Subjects
Fractal tree index ,Control and Optimization ,Theoretical computer science ,K-ary tree ,Computer science ,Applied Mathematics ,Segment tree ,Interval tree ,Search tree ,Computer Science Applications ,Tree (data structure) ,Tree traversal ,Computational Theory and Mathematics ,Discrete Mathematics and Combinatorics ,Algorithm ,Vantage-point tree - Abstract
An ordered labeled tree is a tree in which the nodes are labeled and the left-to-right order among siblings is relevant. The edit distance between two ordered labeled trees is the minimum cost of transforming one tree into the other through a sequence of edit operations. We present techniques for speeding up the tree edit distance computation which are applicable to a family of algorithms based on closely related recursion strategies. These techniques aim to reduce repetitious steps in the original algorithms by exploring certain structural features in the tree. When these features exist in a large portion of the tree, the speedup due to our techniques would be significant. Viable examples for application include RNA secondary structure comparison and structured text comparison.
- Published
- 2012
- Full Text
- View/download PDF
40. SEPARABILITY OF POINT SETS BY k-LEVEL LINEAR CLASSIFICATION TREES
- Author
-
Alberto Márquez, Carlos Seara, Delia Garijo, Joseph S. B. Mitchell, Esther M. Arkin, and Universidad de Sevilla. Departamento de Matemática Aplicada I (ETSII)
- Subjects
Discrete mathematics ,K-ary tree ,decision trees ,Applied Mathematics ,Segment tree ,Exponential tree ,Interval tree ,Random binary tree ,Theoretical Computer Science ,Range tree ,Combinatorics ,Computational Mathematics ,machine learning ,classification ,Computational Theory and Mathematics ,Red-blue separation ,Geometry and Topology ,Order statistic tree ,binary space partitions ,Vantage-point tree ,Mathematics - Abstract
Let R and B be sets of red and blue points in the plane in general position. We study the problem of computing a k-level binary space partition (BSP) tree to classify/separate R and B, such that the tree defines a linear decision at each internal node and each leaf of the tree corresponds to a (convex) cell of the partition that contains only red or only blue points. Specifically, we show that a 2-level tree can be computed, if one exists, in time O(n2). We show that a minimum-level (3 ≤ k ≤ log n) tree can be computed in time nO( log n). In the special case of axis-parallel partitions, we show that 2-level and 3-level trees can be computed in time O(n), while a minimum-level tree can be computed in time O(n5).
- Published
- 2012
- Full Text
- View/download PDF
41. Study of Improvement of Search Range Compression Method of VP-tree for Video Indexes
- Author
-
Samuel Sang-Kon Lee, Jea-Jeong Hwang, and Gil-Yang Park
- Subjects
Euclidean distance ,Metric space ,Theoretical computer science ,Node (networking) ,Nearest neighbor search ,Metric (mathematics) ,Search engine indexing ,Point (geometry) ,Algorithm ,Mathematics ,Vantage-point tree - Abstract
In multimedia database, a multidimensional space-based indexing has been used to increase search efficiency. However, this method is inefficient in terms of ubiquity because it uses Euclidean distance as a scale of distance calculation. On the contrary, a metric space-based indexing method, in which metric axiom is prerequisite is widely available because a metric scale other than Euclidean distance could be used. This paper is attempted to propose a way of improving VP-tree, one of the metric space indexing methods. The VP-tree calculates the distance with an object which is ultimately linked to the a leaf node depending on the node fit for the search range from a root node and examines if it is appropriate with the search range. Because search speed decreases as the number of distance calculations at the leaf node increases, however, this paper has proposed a method which uses the latest interface on query object as the base point of trigonometric inequality for improvement after focusing on the trigonometric inequality-based range compression method in a leaf node. This improvement method would be able to narrow the search range and reduce the number of distance calculations. According to a system performance test using 10,000 video data, the new method reduced search time for similar videos by 5-12%, compared to a conventional method.
- Published
- 2012
- Full Text
- View/download PDF
42. A Multidimensional Sequence Approach to Measuring Tree Similarity
- Author
-
Zhiwei Lin, Hui Wang, and Sally McClean
- Subjects
Computer science ,business.industry ,Nearest neighbor search ,Segment tree ,Pattern recognition ,Interval tree ,Search tree ,Computer Science Applications ,Longest common subsequence problem ,Tree (data structure) ,Tree traversal ,Computational Theory and Mathematics ,Similarity (network science) ,Normalized compression distance ,Algorithm design ,Artificial intelligence ,Tree kernel ,business ,Time complexity ,Information Systems ,Vantage-point tree - Abstract
Tree is one of the most common and well-studied data structures in computer science. Measuring the similarity of such structures is key to analyzing this type of data. However, measuring tree similarity is not trivial due to the inherent complexity of trees and the ensuing large search space. Tree kernel, a state of the art similarity measurement of trees, represents trees as vectors in a feature space and measures similarity in this space. When different features are used, different algorithms are required. Tree edit distance is another widely used similarity measurement of trees. It measures similarity through edit operations needed to transform one tree to another. Without any restrictions on edit operations, the computation cost is too high to be applicable to large volume of data. To improve efficiency of tree edit distance, some approximations were introduced into tree edit distance. However, their effectiveness can be compromised. In this paper, a novel approach to measuring tree similarity is presented. Trees are represented as multidimensional sequences and their similarity is measured on the basis of their sequence representations. Multidimensional sequences have their sequential dimensions and spatial dimensions. We measure the sequential similarity by the all common subsequences sequence similarity measurement or the longest common subsequence measurement, and measure the spatial similarity by dynamic time warping. Then we combine them to give a measure of tree similarity. A brute force algorithm to calculate the similarity will have high computational cost. In the spirit of dynamic programming two efficient algorithms are designed for calculating the similarity, which have quadratic time complexity. The new measurements are evaluated in terms of classification accuracy in two popular classifiers (k-nearest neighbor and support vector machine) and in terms of search effectiveness and efficiency in k-nearest neighbor similarity search, using three different data sets from natural language processing and information retrieval. Experimental results show that the new measurements outperform the benchmark measures consistently and significantly.
- Published
- 2012
- Full Text
- View/download PDF
43. Research on Construction of K-d Tree Based on Euclidean Distance
- Author
-
Jun Xu, Hua Feng Ding, Wei Qing Wang, Xing Jiang Xiao, and Yuan Zhang
- Subjects
Discrete mathematics ,K-ary tree ,Tree structure ,Euclidean minimum spanning tree ,Segment tree ,General Engineering ,Interval tree ,Algorithm ,Search tree ,Vantage-point tree ,Mathematics ,Range tree - Abstract
Traditional k-d tree is constructed according to the order in which data appear, so the balance and depth of the constructed k-d tree are not ideal. To overcome the disadvantages of the construction of traditional k-d tree, this paper proposes a new constructing method based on Euclidean distance so that the construction begins with the center of the data, and every time the points of the nearest distance are used to construct k-d tree, so k-d tree generated in this way is relatively better balanced and has better depth, therefore good searching performance is achieved.
- Published
- 2011
- Full Text
- View/download PDF
44. The tree inclusion problem
- Author
-
Philip Bille and Inge Li Gørtz
- Subjects
Discrete mathematics ,K-ary tree ,Linear space ,0102 computer and information sciences ,02 engineering and technology ,Tree-depth ,computer.software_genre ,01 natural sciences ,Range tree ,Combinatorics ,Tree (data structure) ,Mathematics (miscellaneous) ,XML database ,010201 computation theory & mathematics ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Time complexity ,computer ,Vantage-point tree ,Mathematics - Abstract
Given two rooted, ordered, and labeled trees P and T the tree inclusion problem is to determine if P can be obtained from T by deleting nodes in T . This problem has recently been recognized as an important query primitive in XML databases. Kilpeläinen and Mannila [1995] presented the first polynomial-time algorithm using quadratic time and space. Since then several improved results have been obtained for special cases when P and T have a small number of leaves or small depth. However, in the worst case these algorithms still use quadratic time and space. Let n S , l S , and d S denote the number of nodes, the number of leaves, and the depth of a tree S ∈ P , T . In this article we show that the tree inclusion problem can be solved in space O ( n T ) and time: O⎛⎝min⎧⎨⎩lPnTlPlT log log nT + nTnPnTlog nT+ nT log nT⎫⎬⎭⎞⎠. This improves or matches the best known time complexities while using only linear space instead of quadratic. This is particularly important in practical applications, such as XML databases, where the space is likely to be a bottleneck.
- Published
- 2011
- Full Text
- View/download PDF
45. New dissimilarity measure for recognizing noisy subsequence trees
- Author
-
Takanori Yokoyama, Toshinori Watanabe, Hisashi Koga, and Hiroaki Saito
- Subjects
Incremental decision tree ,K-ary tree ,business.industry ,Segment tree ,Pattern recognition ,Interval tree ,Search tree ,Theoretical Computer Science ,Human-Computer Interaction ,Tree traversal ,Tree structure ,Artificial Intelligence ,Artificial intelligence ,business ,Software ,Mathematics ,Vantage-point tree - Abstract
Tree is a data structure used to express various objects such as semistructured data and genes. When objects are represented as trees, computing tree similarity is essential for pattern recognition and retrieval. This paper considers the noisy subsequence tree recognition problem whose purpose is to recognize the original tree, given its noisy subsequence tree. Previous research on this problem relied on constrained tree edit distance to measure the dissimilarity. However, the number of relabelings must be predetermined to compute it. This paper proposes a new dissimilarity measure for this problem. Our dissimilarity measure is obtained by counting the node edit operations included in the unit-cost tree edit distance that contribute to the matching of node labels. The number of relabelings need not be specified to compute our dissimilarity measure. Moreover, our measure achieves more accurate recognition performance and faster execution speed than the constrained tree edit distance. Our measure is also useful to solve the tree inclusion problem which is the problem of deciding whether a tree includes another tree and shows the extent of approximate tree inclusion when a tree incompletely includes another tree. © 2011 Wiley Periodicals, Inc. (An early version of this work was presented at the 8th International Conference on Intelligent Data Engineering and Automated Learning (IDEAL'07), Springer LNCS, Vol. 4881, pp. 643–652, 2007.)
- Published
- 2011
- Full Text
- View/download PDF
46. A metric normalization of tree edit distance
- Author
-
Zhang Chenguang and Yujian Li
- Subjects
M-tree ,Combinatorics ,String-to-string correction problem ,General Computer Science ,Computer science ,Damerau–Levenshtein distance ,Metric tree ,Edit distance ,Jaro–Winkler distance ,Wagner–Fischer algorithm ,Theoretical Computer Science ,Vantage-point tree - Abstract
Traditional normalized tree edit distances do not satisfy the triangle inequality. We present a metric normalization method for tree edit distance, which results in a new normalized tree edit distance fulfilling the triangle inequality, under the condition that the weight function is a metric over the set of elementary edit operations with all costs of insertions/deletions having the same weight. We prove that the new distance, in the range [0, 1], is a genuine metric as a simple function of the sizes of two ordered labeled trees and the tree edit distance between them, which can be directly computed through tree edit distance with the same complexity. Based on an efficient algorithm to represent digits as ordered labeled trees, we show that the normalized tree edit metric can provide slightly better results than other existing methods in handwritten digit recognition experiments using the approximating and eliminating search algorithm (AESA) algorithm.
- Published
- 2011
- Full Text
- View/download PDF
47. Location-selection of Wireless Network Based on Restricted Steiner Tree Algorithm
- Author
-
Shang-wu Yang, Xiao-xu Lu, and Nan Zheng
- Subjects
Discrete mathematics ,computational complexity ,K-ary tree ,Computational complexity theory ,Computer Science::Computational Geometry ,Minimum spanning tree ,k-minimum spanning tree ,Steiner tree problem ,Combinatorics ,symbols.namesake ,Tree (data structure) ,wireless network ,TheoryofComputation_ANALYSISOFALGORITHMSANDPROBLEMCOMPLEXITY ,symbols ,General Earth and Planetary Sciences ,Computer Science::Data Structures and Algorithms ,Algorithm ,restricted Steiner tree ,location-selection ,Decision tree model ,MathematicsofComputing_DISCRETEMATHEMATICS ,General Environmental Science ,Mathematics ,Vantage-point tree - Abstract
The minimum Steiner tree problem has wide application background, such as transportation system, communication network, pipeline design and VISL, etc. It is unfortunately that the computational complexity of the problem is NP- hard. People are common to find some special problems to consider. In this paper, we introduce the definition of restricted Steiner tree problem, i.e., a restricted Steiner tree problem, which the fixed vertices are in the same side of one line L and we find a vertex on L such the distance of the tree is minimal. By the definition and the complexity of the Steiner tree problem, we know that the complexity of this problem is also Np-complete. Therefore, we consider there are two or three fixed vertices to find the restricted Steiner tree problem. We use the algorithm of restricted Steiner tree problem to solve the location-selection of wireless network.
- Published
- 2011
- Full Text
- View/download PDF
48. A General Quad Search Tree on Two-Dimension Data
- Author
-
Xing Jiang Xiao, Yuan Zhang, Jun Xu, Wei Qing Wang, and Hua Feng Ding
- Subjects
Fractal tree index ,Tree traversal ,Theoretical computer science ,Optimal binary search tree ,Segment tree ,Ternary search tree ,General Engineering ,Interval tree ,Algorithm ,Search tree ,Mathematics ,Vantage-point tree - Abstract
In the plane coordinate system, x-coordinate and y-coordinate divide the plane into four quadrants. Combing the values of two-dimension data and the four quadrants, we can construct a general quad search tree easily. The quad search tree can be applied to all the two-dimension data. And, the height of the quad search tree can be reduced effectively, thus we can get a better search speed. The experiments have verified the validity and correctness of the quad search tree.
- Published
- 2011
- Full Text
- View/download PDF
49. Fitting Tree Metrics: Hierarchical Clustering and Phylogeny
- Author
-
Moses Charikar and Nir Ailon
- Subjects
Discrete mathematics ,K-ary tree ,General Computer Science ,Linear programming ,Phylogenetic tree ,General Mathematics ,Correlation clustering ,Approximation algorithm ,Interval tree ,Binary logarithm ,Range tree ,Hierarchical clustering ,Combinatorics ,Tree structure ,Norm (mathematics) ,Metric tree ,Gomory–Hu tree ,Ultrametric space ,Vantage-point tree ,Mathematics - Abstract
Given dissimilarity data on pairs of objects in a set, we study the problem of fitting a tree metric to this data so as to minimize additive error (i.e. some measure of the difference between the tree metric and the given data). This problem arises in constructing an M-level hierarchical clustering of objects (or an ultrametric on objects) so as to match the given dissimilarity data - a basic problem in statistics. Viewed in this way, the problem is a generalization of the correlation clustering problem (which corresponds to M = 1). We give a very simple randomized combinatorial algorithm for the M-level hierarchical clustering problem that achieves an approximation ratio of M+2. This is a generalization of a previous factor 3 algorithm for correlation clustering on complete graphs. The problem of fitting tree metrics also arises in phylogeny where the objective is to learn the evolution tree by fitting a tree to dissimilarity data on taxa. The quality of the fit is measured by taking the l/sub p/ norm of the difference between the tree metric constructed and the given data. Previous results obtained a factor 3 approximation for finding the closest tree tree metric under the l/spl infin/ norm. No nontrivial approximation for general l/sub p/ norms was known before. We present a novel LP formulation for this problem and obtain an O((log n log log n)/sup 1/p/) approximation using this. Enroute, we obtain an O((log n log log n)/sup 1/p/) approximation for the closest ultrametric under the l/sub p/ norm. Our techniques are based on representing and viewing an ultrametric as a hierarchy of clusterings, and may be useful in other contexts.
- Published
- 2011
- Full Text
- View/download PDF
50. Fast k-nearest neighbors search using modified principal axis search tree
- Author
-
Yi-Ching Liaw, Chien-Min Wu, and Maw-Lin Leou
- Subjects
Cover tree ,Fringe search ,business.industry ,Applied Mathematics ,Nearest neighbor search ,Best-first search ,Pattern recognition ,Search tree ,Computational Theory and Mathematics ,Artificial Intelligence ,Search algorithm ,Signal Processing ,Ball tree ,Computer Vision and Pattern Recognition ,Artificial intelligence ,Electrical and Electronic Engineering ,Statistics, Probability and Uncertainty ,business ,Mathematics ,Vantage-point tree - Abstract
The problem of k-nearest neighbors (kNN) is to find the nearest k neighbors for a query point from a given data set. Among available methods, the principal axis search tree (PAT) algorithm always has good performance on finding nearest k neighbors using the PAT structure and a node elimination criterion. In this paper, a novel kNN search algorithm is proposed. The proposed algorithm stores projection values for all data points in leaf nodes. If a leaf node in the PAT cannot be rejected by the node elimination criterion, data points in the leaf node are further checked using their pre-stored projection values to reject more impossible data points. Experimental results show that the proposed method can effectively reduce the number of distance calculations and computation time for the PAT algorithm, especially for the data set with a large dimension or for a search tree with large number of data points in a leaf node.
- Published
- 2010
- Full Text
- View/download PDF
Catalog
Discovery Service for Jio Institute Digital Library
For full access to our library's resources, please sign in.