10 results on '"Holub, J."'
Search Results
2. CONSTANT-MEMORY ITERATIVE GENERATION OF SPECIAL STRINGS REPRESENTING BINARY TREES.
- Author
-
SMYCZYŃSKI, SEBASTIAN and Holub, J.
- Subjects
- *
DATA structures , *ITERATIVE methods (Mathematics) , *PERMUTATIONS , *COMPUTER algorithms , *DECODING algorithms , *COMPUTER storage devices - Abstract
The shapes of binary trees can be encoded as permutations having a very special property. These permutations are tree permutations, or equivalently they avoid subwords of the type 231. The generation of binary trees in natural order corresponds to the generation of these special permutations in the lexicographic order. In this paper we use a stringologic approach to the generation of these special permutations: decompositions of essential parts into the subwords having staircase shapes. A given permutation differs from the next one with respect to its tail called here the working suffix. Some new properties of such working suffixes are discovered in the paper and used to design effective algorithms transforming one tree permutation into its successor or predecessor in the lexicographic order. The algorithms use a constant amount of additional memory and they look only at those elements of the permutation which belong to the working suffix. The best-case, average-case and worst-case time complexities of the algorithms are O(1), O(1), and O(n) respectively. The advantages of our stringologic approach are constant time and iterative generation, while other known algorithms are usually recursive or not constant-memory ones. In this paper we also present a new compact non-recursive linear time algorithm solving a related problem of decoding the shape of a binary tree from its corresponding tree permutation. [ABSTRACT FROM AUTHOR]
- Published
- 2012
3. CROCHEMORE'S REPETITIONS ALGORITHM REVISITED:: COMPUTING RUNS.
- Author
-
FRANEK, FRANTISEK, JIANG, MEI, and Holub, J.
- Subjects
LEMPEL-Ziv algorithm ,COMPARATIVE studies ,DATA structures ,COMPUTER algorithms ,PERFORMANCE evaluation ,FACTORIZATION - Abstract
Crochemore's repetitions algorithm introduced in 1981 was the first O(n log n) algorithm for computing repetitions. Since then, several linear-time worst-case algorithms for computing runs have been introduced. They all follow a similar strategy: first compute the suffix tree or array, then use the suffix tree or array to compute the Lempel-Ziv factorization, then using the Lempel-Ziv factorization compute all the runs. It is conceivable that in practice an extension of Crochemore's repetitions algorithm may outperform the linear-time algorithms, or at least for certain classes of strings. The nature of Crochemore's algorithm lends itself naturally to parallelization, while the linear-time algorithms are not easily conducive to parallelization. For all these reasons it is interesting to explore ways to extend the original Crochemore's repetitions algorithm to compute runs. We present three variants of extending the repetitions algorithm to compute runs: two with a worsen complexity of O(n (log n)
2 ), and one with the same complexity as the original algorithm. The three variants are tested for speed of performance and their memory requirements are analyzed. The third variant is tested and analyzed for various memory-saving alterations. The purpose of this research is to identify the best extension of Crochemore's algorithm for further study, comparison with other algorithms, and parallel implementation. [ABSTRACT FROM AUTHOR]- Published
- 2012
- Full Text
- View/download PDF
4. ON-LINE CONSTRUCTION OF A SMALL AUTOMATON FOR A FINITE SET OF WORDS.
- Author
-
CROCHEMORE, MAXIME, GIAMBRUNO, LAURA, LANGIU, ALESSIO, and Holub, J.
- Subjects
ROBOTS ,SET theory ,ALGORITHMS ,MACHINE theory ,LINEAR time invariant systems ,LINEAR systems - Abstract
In this paper we describe a "light" algorithm for the on-line construction of a small automaton recognising a finite set of words. The algorithm runs in linear time. We carried out good experimental results on real dictionaries, on biological sequences and on the sets of suffixes (resp. factors) of a set of words that shows how our automaton is near to the minimal one. For the suffixes of a text, we propose a modified construction that leads to an even smaller automaton. We moreover construct linear algorithms for the insertion and deletion of a word in a finite set, directly from the constructed automaton. [ABSTRACT FROM AUTHOR]
- Published
- 2012
- Full Text
- View/download PDF
5. ADAPTING BOYER-MOORE-LIKE ALGORITHMS FOR SEARCHING HUFFMAN ENCODED TEXTS.
- Author
-
CANTONE, DOMENICO, FARO, SIMONE, GIAQUINTA, EMANUELE, and Holub, J.
- Subjects
SEARCH algorithms ,ENCODING ,MATCHING theory ,DATA structures ,INFORMATION retrieval ,TEXT files - Abstract
In this paper we propose an efficient approach to the compressed string matching problem on Huffman encoded texts, based on the BOYER-MOORE strategy. Once a candidate valid shift has been located, a subsequent verification phase checks whether the shift is codeword aligned by taking advantage of the skeleton tree data structure. Our approach leads to algorithms that exhibit a sublinear behavior on the average, as shown by extensive experimentation. [ABSTRACT FROM AUTHOR]
- Published
- 2012
- Full Text
- View/download PDF
6. PARALLEL ALGORITHMS FOR MAPPING SHORT DEGENERATE AND WEIGHTED DNA SEQUENCES TO A REFERENCE GENOME.
- Author
-
ILIOPOULOS, COSTAS S., MILLER, MIRKA, PISSIS, SOLON P., and Holub, J.
- Subjects
PARALLEL algorithms ,COMPUTER algorithms ,COST effectiveness ,PROBABILITY theory ,NUCLEOTIDE sequence ,DATA analysis ,MATHEMATICAL mappings - Abstract
One of the most ambitious trends in current biomedical research is the large-scale genomic sequencing of patients. Novel high-throughput (or next-generation) sequencing technologies have redefined the way genome sequencing is performed. They are able to produce millions of short sequences (reads) in a single experiment, and with a much lower cost than previously possible. Due to this massive amount of data, efficient algorithms for mapping these sequences to a reference genome are in great demand, and recently, there has been ample work for publishing such algorithms. One important feature of these algorithms is the support of multithreaded parallel computing in order to speedup the mapping process. In this paper, we design parallel algorithms, which make use of the message-passing parallelism model, to address this problem efficiently. The proposed algorithms also take into consideration the probability scores assigned to each base for occurring in a specific position of a sequence. In particular, we present parallel algorithms for mapping short degenerate and weighted DNA sequences to a reference genome. [ABSTRACT FROM AUTHOR]
- Published
- 2012
- Full Text
- View/download PDF
7. FINDING CHARACTERISTIC SUBSTRINGS FROM COMPRESSED TEXTS.
- Author
-
INENAGA, SHUNSUKE, BANNAI, HIDEO, and Holub, J.
- Subjects
DATA compression ,COMPUTER science ,COMPUTER algorithms ,POLYNOMIALS ,DATABASE management ,DATA mining - Abstract
Text mining from large scaled data is of great importance in computer science. In this paper, we consider fundamental problems on text mining from compressed strings, i.e., computing a longest repeating substring, longest non-overlapping repeating substring, most frequent substring, and most frequent non-overlapping substring from a given compressed string. Also, we tackle the following novel problem: given a compressed text and compressed pattern, compute the representative of the equivalence class of the pattern w.r.t. the text. We present algorithms that solve the above problems in time polynomial in the size of input compressed strings. The compression scheme we consider is straight line program (SLP) which has exponential compression, and therefore our algorithms are more efficient than any existing algorithms that require decompression of given SLPs. [ABSTRACT FROM AUTHOR]
- Published
- 2012
- Full Text
- View/download PDF
8. ALGORITHMS FOR JUMBLED PATTERN MATCHING IN STRINGS.
- Author
-
BURCSI, PÉTER, CICALESE, FERDINANDO, FICI, GABRIELE, LIPTÁK, ZSUZSANNA, and Holub, J.
- Subjects
COMPUTER algorithms ,DATA structures ,MATHEMATICAL analysis ,QUERY (Information retrieval system) ,BINARY number system ,MATCHING theory - Abstract
The Parikh vector p(s) of a string s over a finite ordered alphabet Σ = {a
1 , ..., aσ } is defined as the vector of multiplicities of the characters, p(s) = (p1 , ..., pσ ), where pi = |{j | sj = ai }|. Parikh vector q occurs in s if s has a substring t with p(t) = q. The problem of searching for a query q in a text s of length n can be solved simply and worst-case optimally with a sliding window approach in O(n) time. We present two novel algorithms for the case where the text is fixed and many queries arrive over time. The first algorithm only decides whether a given Parikh vector appears in a binary text. It uses a linear size data structure and decides each query in O(1) time. The preprocessing can be done trivially in Θ(n2 ) time. The second algorithm finds all occurrences of a given Parikh vector in a text over an arbitrary alphabet of size σ ≥ 2 and has sub-linear expected time complexity. More precisely, we present two variants of the algorithm, both using an O(n) size data structure, each of which can be constructed in O(n) time. The first solution is very simple and easy to implement and leads to an expected query time of $O(n{(\frac{\sigma }{{\log \,\sigma }})^{1/2}}\frac{{\log \,m}}{{\sqrt m }})$, where m = ∑i qi is the length of a string with Parikh vector q. The second uses wavelet trees and improves the expected runtime to $O(n{(\frac{\sigma }{{\log \,\sigma }})^{1/2}}\frac{1}{{\sqrt m }})$, i.e., by a factor of log m. [ABSTRACT FROM AUTHOR]- Published
- 2012
- Full Text
- View/download PDF
9. ASYMPTOTIC BEHAVIOUR OF THE MAXIMAL NUMBER OF SQUARES IN STANDARD STURMIAN WORDS.
- Author
-
PIATKOWSKI, MARCIN, RYTTER, WOJCIECH, and Holub, J.
- Subjects
COMBINATORICS ,FIBONACCI sequence ,DATA compression ,BOUNDARY layer (Aerodynamics) ,MATHEMATICAL analysis ,MATHEMATICAL sequences - Abstract
Denote by sq(w) the number of distinct squares in a string w and let $\mathcal{S}$ be the class of standard Sturmian words. They are generalizations of Fibonacci words and are important in combinatorics on words. For Fibonacci words the asymptotic behaviour of the number of runs and the number of squares is the same. We show that for Sturmian words the situation is quite different. The tight bound $\frac{8}{{10}}|w|$ for the number of runs was given in [3]. In this paper we show that the tight bound for the maximal number of squares is $\frac{9}{{10}}|w|$. We use the results of [11], where exact (but not closed) complicated formulas were given for sq(w) for $w\, \in \,\mathcal{S}$. We show that for all $w\, \in \,\mathcal{S}$ we have $sq(w)\, \leq \,\frac{9}{{10}}\,|w|$ and there is an infinite sequence of words ${w_k}\, \in \,\mathcal{S}$ such that lim
k→∞ |wk | = ∞ and ${\lim _{k \to \infty }}\,\frac{{sq({w_k})}}{{|{w_k}|}}\, = \,\frac{9}{{10}}$. Surprisingly the maximal number of squares is reached by the words with recurrences of length only 5. This contrasts with the situation of Fibonacci words, though standard Sturmian words are natural extension of Fibonacci words. If this length drops to 4, the asymptotic behaviour of the maximal number of squares falls down significantly below $\frac{9}{{10}}|w|$. The structure of Sturmian words rich in squares has been discovered by us experimentally and verified theoretically. The upper bound is much harder, its proof is not a matter of simple calculations. The summation formulas for the number of squares are complicated, no closed formula is known. Some nontrivial reductions were necessary. [ABSTRACT FROM AUTHOR]- Published
- 2012
- Full Text
- View/download PDF
10. PATTERN MATCHING WITH SWAPS IN PRACTICE.
- Author
-
CAMPANELLI, MATTEO, CANTONE, DOMENICO, FARO, SIMONE, GIAQUINTA, EMANUELE, and Holub, J.
- Subjects
MATCHING theory ,COMPARATIVE studies ,COMPUTER algorithms ,APPROXIMATION theory ,PATTERNS (Mathematics) ,MATHEMATICAL analysis - Abstract
The Pattern Matching problem with Swaps consists in finding all occurrences of a pattern P in a text T, when disjoint local swaps in the pattern are allowed. In the Approximate Pattern Matching problem with Swaps one seeks, for every text location with a swapped match of P, the number of swaps necessary to obtain a match at the location. In this paper we devise two general algorithms for both Standard and Approximate Pattern Matching with Swaps, named CROSS-SAMPLING and BACKWARD-CROSS-SAMPLING, with a $\mathcal{O}(nm)$ and $\mathcal{O}(n{m^2})$ worst-case time complexity, respectively. Then we provide efficient implementations of them, based on bit-parallelism, which achieve $\mathcal{O}(n)$ and $\mathcal{O}(nm)$ worst-case time complexity, with patterns whose length is comparable to the word-size of the target machine. From an extensive comparison with some of the most effective algorithms for the swap matching problem, it turns out that our algorithms are very flexible and achieve very good results inpractice. [ABSTRACT FROM AUTHOR]
- Published
- 2012
- Full Text
- View/download PDF
Catalog
Discovery Service for Jio Institute Digital Library
For full access to our library's resources, please sign in.