
Algorithm for Enumerating All Maximal Frequent Tree Patterns among Words in Tree-Structured Documents and Its Application.

Authors :
Uchida, Tomoyuki
Kawamoto, Kayo
Source :
Database Theory & Application; 2009, p107-114, 8p
Publication Year :
2009

Abstract

To extract structural features among the nodes of tree-structured documents in which characteristic words appear, we previously proposed a text mining algorithm for enumerating all frequent consecutive path patterns (CPPs) on a list W of words (PAKDD, 2004). In this paper, we first extend a CPP to a tree pattern, called a tree association pattern (TAP), over a set W of words. A TAP is an ordered rooted tree t such that the root of t has either no children or at least 2 children, all leaves of t are labeled with nonempty subsets of W, and all internal nodes, if any exist, are labeled with strings. Next, we present text mining algorithms for enumerating all maximal frequent TAPs in tree-structured documents. We then evaluate our algorithms by reporting experimental results on Reuters news-wires. Finally, as an application of CPPs, we present an algorithm for a CPP-based wrapper that uses the XSLT transformation language, and we briefly demonstrate the wrapper by translating one Reuters news-wire into another XML document. [ABSTRACT FROM AUTHOR]
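
The TAP definition in the abstract can be read as three structural conditions, and the sketch below checks them on a small tree. It is an illustration only, not the authors' algorithm: the `Node` class, the `is_tap` function, and the sample labels are hypothetical names introduced here, and treating a root that has children as an internal node (so it must carry a string label) is an assumption about the definition.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Union

# A label is a string (internal node) or a set of words (leaf).
Label = Union[str, FrozenSet[str]]


@dataclass
class Node:
    """Hypothetical node of an ordered rooted tree."""
    label: Label
    children: List["Node"] = field(default_factory=list)  # list order = sibling order


def is_tap(root: Node, words: FrozenSet[str]) -> bool:
    """Check the three TAP conditions stated in the abstract."""
    # Condition 1: the root has no child or at least 2 children.
    if len(root.children) == 1:
        return False
    stack = [root]
    while stack:
        node = stack.pop()
        if node.children:
            # Condition 3: internal nodes are labeled with strings.
            # Assumption: a root with children counts as an internal node.
            if not isinstance(node.label, str):
                return False
            stack.extend(node.children)
        else:
            # Condition 2: leaves are labeled with nonempty subsets of W.
            if not isinstance(node.label, frozenset):
                return False
            if not node.label or not node.label <= words:
                return False
    return True


# Hypothetical word set and pattern, loosely styled after a news article.
W = frozenset({"oil", "price", "opec"})
pattern = Node("TEXT", [
    Node("TITLE", [Node(frozenset({"oil"}))]),
    Node(frozenset({"price", "opec"})),
])
print(is_tap(pattern, W))
```

Under this reading, a single leaf labeled with a nonempty subset of W is the smallest TAP, and only the root is restricted in its number of children; a non-root internal node may have exactly one child.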

Details

Language :
English
ISBNs :
9783642105821
Database :
Complementary Index
Journal :
Database Theory & Application
Publication Type :
Book
Accession number :
76880836
Full Text :
https://doi.org/10.1007/978-3-642-10583-8_14