Evaluating reliability of tree-patterns in extreme-K categorical samples problems.

Authors :
Chou, Elizabeth
Hsieh, Yin-Chen
Enriquez, Sabrina
Hsieh, Fushing
Source :
Journal of Statistical Computation & Simulation; Dec 2021, Vol. 91 Issue 18, p3828-3849, 22p
Publication Year :
2021

Abstract

Exploratory Data Analysis (EDA) approaches are adopted to address the difficult extreme-K categorical sample problem. Because the observed data are categorical, all comparisons among populations are performed by comparing their distributions in the form of histograms with symbolic bins. A distance measure is designed to evaluate the discrepancy between two symbol-based histograms to facilitate Hierarchical Clustering (HC) algorithms. The resultant binary HC-tree then serves as the basis for our EDA task of discovering tree-patterns of interest. Since each population-leaf's location within a binary HC-tree's geometry is expressed through a binary code sequence, a binary code segment characterizes all commonly shared tree-patterns for all members. We then generate a large ensemble of mimicries of the observed dataset based on multinomial distributions and construct a corresponding ensemble of binary HC-trees. For each tree-pattern identified from the observed dataset, we evaluate its reliability and uncertainty through two histograms. [ABSTRACT FROM AUTHOR]
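
The workflow outlined in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: the abstract does not specify the distance measure, so total variation distance is used here as a stand-in, and "reliability" is simplified to how often two populations fall in the same branch when the tree is cut into two clusters, rather than the binary-code tree-patterns and histograms the paper uses. The toy counts and the function names (`hc_labels`, `mimic`, `co_cluster_rate`) are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

# Toy data (made up): rows are populations, columns are symbolic bins.
counts = np.array([[90, 5, 5],
                   [85, 10, 5],
                   [5, 5, 90],
                   [10, 5, 85]])

def hc_labels(counts, n_clusters=2):
    """Cluster populations by their category proportions."""
    props = counts / counts.sum(axis=1, keepdims=True)
    # Stand-in distance (assumption): total variation = 0.5 * L1 distance.
    dists = 0.5 * pdist(props, metric="cityblock")
    tree = linkage(dists, method="average")
    return fcluster(tree, t=n_clusters, criterion="maxclust")

def mimic(counts, rng):
    """Draw one multinomial mimicry per population, keeping its sample size."""
    sizes = counts.sum(axis=1)
    props = counts / sizes[:, None]
    return np.vstack([rng.multinomial(int(n), p) for n, p in zip(sizes, props)])

def co_cluster_rate(counts, i, j, n_mimics=200, seed=0):
    """Fraction of mimicked trees in which populations i and j share a cluster."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_mimics):
        labels = hc_labels(mimic(counts, rng))
        hits += int(labels[i] == labels[j])
    return hits / n_mimics

same_branch = co_cluster_rate(counts, 0, 1)   # similar populations
cross_branch = co_cluster_rate(counts, 0, 2)  # dissimilar populations
```

A rate near 1 for a pair suggests that their grouping in the observed tree is stable under multinomial resampling; a rate near 0 suggests the pair is consistently separated.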

Details

Language :
English
ISSN :
0094-9655
Volume :
91
Issue :
18
Database :
Complementary Index
Journal :
Journal of Statistical Computation & Simulation
Publication Type :
Academic Journal
Accession number :
153842749
Full Text :
https://doi.org/10.1080/00949655.2021.1951266