Algorithm for Enumerating All Maximal Frequent Tree Patterns among Words in Tree-Structured Documents and Its Application.
- Source :
- Database Theory & Application; 2009, p107-114, 8p
- Publication Year :
- 2009
Abstract
- To extract structural features among the nodes of tree-structured documents in which characteristic words appear, we previously proposed a text mining algorithm for enumerating all frequent consecutive path patterns (CPPs) on a list W of words (PAKDD, 2004). In this paper, we first extend a CPP to a tree pattern, called a tree association pattern (TAP), over a set W of words. A TAP is an ordered rooted tree t such that the root of t has either no child or at least two children, all leaves of t are labeled with nonempty subsets of W, and all internal nodes, if any exist, are labeled with strings. Next, we present text mining algorithms for enumerating all maximal frequent TAPs in tree-structured documents. We then evaluate our algorithms by reporting experimental results on Reuters news-wires. Finally, as an application of CPPs, we present an algorithm for a wrapper based on CPPs using the XSLT transformation language, and briefly demonstrate the use of the wrapper to translate one of the Reuters news-wires into another XML document. [ABSTRACT FROM AUTHOR]
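The TAP definition in the abstract can be read as a small set of structural constraints. The following is a minimal sketch of those constraints only, not the authors' enumeration algorithm; the `Node` class, the `is_tap` function, and the sample labels are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional


@dataclass
class Node:
    """Hypothetical node of an ordered rooted tree (children keep their order)."""
    label: Optional[str] = None          # string label, used by internal nodes
    words: FrozenSet[str] = frozenset()  # word-set label, used by leaves
    children: List["Node"] = field(default_factory=list)


def is_tap(root: Node, vocabulary: FrozenSet[str]) -> bool:
    """Check the three conditions stated in the abstract's TAP definition."""
    # The root has no child or at least two children.
    if len(root.children) == 1:
        return False
    stack = [root]
    while stack:
        node = stack.pop()
        if node.children:
            # Internal nodes are labeled with strings.
            if not isinstance(node.label, str):
                return False
            stack.extend(node.children)
        else:
            # Leaves are labeled with nonempty subsets of W.
            if not node.words or not node.words <= vocabulary:
                return False
    return True


# Usage with an assumed vocabulary W and made-up labels.
W = frozenset({"oil", "price", "bank"})
leaf_a = Node(words=frozenset({"oil"}))
leaf_b = Node(words=frozenset({"price", "bank"}))
pattern = Node(label="BODY", children=[leaf_a, leaf_b])
print(is_tap(pattern, W))
```

Under this reading, a single leaf on its own also qualifies, since a root with no child is allowed as long as it carries a nonempty subset of W.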
Details
- Language :
- English
- ISBNs :
- 9783642105821
- Database :
- Complementary Index
- Journal :
- Database Theory & Application
- Publication Type :
- Book
- Accession number :
- 76880836
- Full Text :
- https://doi.org/10.1007/978-3-642-10583-8_14