How many samples are needed to build a classifier: a general sequential approach

Authors :
Fu, Wenjiang J.
Dougherty, Edward R.
Mallick, Bani
Carroll, Raymond J.
Source :
Bioinformatics; January 2005, Vol. 21 Issue: 1 p63-63, 1p
Publication Year :
2005

Abstract

Motivation: The standard paradigm for a classifier design is to obtain a sample of feature-label pairs and then to apply a classification rule to derive a classifier from the sample data. Typically in laboratory situations the sample size is limited by cost, time or availability of sample material. Thus, an investigator may wish to consider a sequential approach in which there is a sufficient number of patients to train a classifier in order to make a sound decision for diagnosis while at the same time keeping the number of patients as small as possible to make the studies affordable.

Results: A sequential classification procedure is studied via the martingale central limit theorem. It updates the classification rule at each step and provides stopping criteria to ensure with a certain confidence that at stopping a future subject will have misclassification probability smaller than a predetermined threshold. Simulation studies and applications to microarray data analysis are provided. The procedure possesses several attractive properties: (1) it updates the classification rule sequentially and thus does not rely on distributions of primary measurements from other studies; (2) it assesses the stopping criteria at each sequential step and thus can substantially reduce cost via early stopping; and (3) it is not restricted to any particular classification rule and therefore applies to any parametric or non-parametric method, including feature selection or extraction.

Availability: R-code for the sequential stopping rule is available at http://stat.tamu.edu/~wfu/microarray/sequential/R-code.html

Contact: wfu@stat.tamu.edu
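
The general shape of a sequential design with early stopping, as described in the Results, can be illustrated with a small sketch. This is not the authors' procedure or their R code: the stopping rule below swaps in a simple one-sided Wilson upper confidence bound on the test-then-train error rate in place of the martingale central limit theorem argument, and the classifier is a plain nearest-centroid rule. All names (`sequential_design`, `wilson_upper`, `gaussian_stream`) and parameters (`threshold`, `confidence`, `burn_in`, `max_n`) are hypothetical, and the synthetic Gaussian data are an assumption made only for illustration.

```python
import math
import random
from statistics import NormalDist


def predict(sums, counts, x):
    """Nearest-centroid rule: return the label whose running mean is closest to x."""
    best_label, best_dist = None, math.inf
    for label, total in sums.items():
        centroid = [s / counts[label] for s in total]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label  # None if no class has been seen yet


def wilson_upper(errors, m, confidence):
    """One-sided Wilson upper confidence bound for an error proportion errors/m."""
    z = NormalDist().inv_cdf(confidence)
    p = errors / m
    centre = p + z * z / (2 * m)
    margin = z * math.sqrt(p * (1 - p) / m + z * z / (4 * m * m))
    return (centre + margin) / (1 + z * z / m)


def sequential_design(stream, threshold=0.2, confidence=0.95, burn_in=10, max_n=500):
    """Recruit samples one at a time; stop once the bound drops below threshold.

    Assumed (illustrative) rule: after a burn-in, each new sample is first
    classified by the current rule, then used to update it.
    """
    sums, counts = {}, {}
    errors = tested = n = 0
    upper = 1.0
    for n, (x, label) in enumerate(stream, start=1):
        if n > burn_in:
            # Test on the new sample before it is used for training.
            tested += 1
            errors += predict(sums, counts, x) != label
            upper = wilson_upper(errors, tested, confidence)
        # Update the classification rule with the new sample.
        if label not in sums:
            sums[label], counts[label] = [0.0] * len(x), 0
        sums[label] = [s + v for s, v in zip(sums[label], x)]
        counts[label] += 1
        if tested and upper < threshold:
            return {"stopped": True, "n": n, "upper": upper}
        if n >= max_n:
            break
    return {"stopped": False, "n": n, "upper": upper}


def gaussian_stream(seed):
    """Synthetic two-class stream: unit-variance 2-D Gaussians centred at 0 and 3."""
    rng = random.Random(seed)
    while True:
        label = rng.randint(0, 1)
        mu = 3.0 * label
        yield [rng.gauss(mu, 1.0), rng.gauss(mu, 1.0)], label


if __name__ == "__main__":
    print(sequential_design(gaussian_stream(0)))
```

Because each sample is scored before the rule is updated with it, the error count is not contaminated by training on the test point, and any classification rule could be substituted for `predict`. The guarantee offered by this simplified bound is weaker than the one the abstract describes, since it treats the sequentially collected errors as if they were independent draws from a fixed error rate.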

Details

Language :
English
ISSN :
1367-4803 and 1367-4811
Volume :
21
Issue :
1
Database :
Supplemental Index
Journal :
Bioinformatics
Publication Type :
Periodical
Accession number :
ejs6733131
Full Text :
https://doi.org/10.1093/bioinformatics/bth461