1. Verification of gene expression profiles for colorectal cancer using 12 internet public microarray datasets
- Author
Harn-Jing Terng, Sui-Lung Su, Yu-Tien Chang, Hsiu-Ling Chou, Kang-Hua Chen, Chi-Ming Chu, Yun-Wen Shih, Ching-Huang Lai, Yu-Ching Chou, Chung-Tay Yao, Thomas Wetter, Chi-Wen Chang, and Chi-Shuan Huang
- Subjects
Multivariate analysis; Microarray; Colorectal cancer; education; Computational biology; Biology; Bioinformatics; Predictive Value of Tests; Risk Factors; Retrospective Study; Gene expression; Databases, Genetic; medicine; Biomarkers, Tumor; Odds Ratio; Humans; Genetic Predisposition to Disease; Oligonucleotide Array Sequence Analysis; Internet; business.industry; Gene Expression Profiling; Gastroenterology; Reproducibility of Results; General Medicine; Odds ratio; medicine.disease; Gene expression profiling; Logistic Models; Predictive value of tests; Case-Control Studies; Multivariate Analysis; The Internet; business; Colorectal Neoplasms
- Abstract
To verify gene expression profiles for colorectal cancer (CRC) using 12 public microarray datasets available on the internet. Logistic regression analysis was performed, and odds ratios for each gene were determined between CRC cases and controls. Twelve public microarray datasets (GSE4107, GSE4183, GSE8671, GSE9348, GSE10961, GSE13067, GSE13294, GSE13471, GSE14333, GSE15960, GSE17538 and GSE18105), which included 519 cases of adenocarcinoma and 88 normal mucosa controls, were pooled and used to verify 17 selected genes from 3 published studies and to estimate their external generalizability. We validated the 17 CRC-associated genes from the studies by Chang et al (Model 1: 5 genes), Marshall et al (Model 2: 7 genes) and Han et al (Model 3: 5 genes) and performed multivariate logistic regression analysis using the 12 pooled public microarray datasets for external validation. The Hosmer-Lemeshow (H-L) goodness-of-fit test was statistically significant (P = 0.044) for Model 2 of Marshall et al, indicating that observed event rates did not match expected event rates in subgroups of the model population. Models 1, 3 and 4 were well calibrated, that is, expected and observed event rates in subgroups were similar, with non-significant H-L P values of 0.460, 0.194 and 1.000, respectively. A 7-gene model (Model 4) of CPEB4, EIF2S3, MGC20553, MS4A1, ANXA3, TNFAIP6 and IL2RB was selected pairwise and showed the best results in logistic regression analysis (H-L P = 1.000, R² = 0.951, area under the curve = 0.999, accuracy = 0.968, specificity = 0.966 and sensitivity = 0.994). A novel gene expression profile was associated with CRC and can potentially be applied to blood-based detection assays.
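The calibration and discrimination measures named in the abstract (odds ratio, H-L test, area under the curve, sensitivity, specificity, accuracy) can be illustrated with a minimal Python sketch. It assumes predicted probabilities from an already fitted logistic model are available. The function names, the 10-group split, the 0.5 threshold and the synthetic data are illustrative assumptions, not the authors' implementation or data.

```python
import math
import random


def odds_ratio(coefficient):
    """Odds ratio for a one-unit increase in a predictor, from its logistic coefficient."""
    return math.exp(coefficient)


def chi2_sf_even_df(x, df):
    """Chi-square upper-tail probability; this closed form is valid only for even df."""
    if df < 2 or df % 2:
        raise ValueError("this sketch only supports even df >= 2")
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


def hosmer_lemeshow(y_true, p_pred, groups=10):
    """Return (statistic, p_value): observed vs expected events in risk-ordered groups."""
    pairs = sorted(zip(p_pred, y_true), key=lambda t: t[0])
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        if size == 0:
            continue
        observed = sum(y for _, y in chunk)
        expected = sum(p for p, _ in chunk)
        variance = expected * (1.0 - expected / size)
        if variance > 0:  # skip degenerate groups with expected count 0 or size
            stat += (observed - expected) ** 2 / variance
    return stat, chi2_sf_even_df(stat, groups - 2)


def auc(y_true, p_pred):
    """Area under the ROC curve as the fraction of correctly ranked case/control pairs."""
    pos = [p for p, y in zip(p_pred, y_true) if y == 1]
    neg = [p for p, y in zip(p_pred, y_true) if y == 0]
    wins = sum((pp > pn) + 0.5 * (pp == pn) for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))


def classification_metrics(y_true, p_pred, threshold=0.5):
    """Sensitivity, specificity and accuracy at a fixed probability threshold."""
    tp = sum(1 for y, p in zip(y_true, p_pred) if y == 1 and p >= threshold)
    fn = sum(1 for y, p in zip(y_true, p_pred) if y == 1 and p < threshold)
    tn = sum(1 for y, p in zip(y_true, p_pred) if y == 0 and p < threshold)
    fp = sum(1 for y, p in zip(y_true, p_pred) if y == 0 and p >= threshold)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


if __name__ == "__main__":
    random.seed(0)
    # Synthetic predictions and outcomes for illustration only (not the study's data).
    probs = [random.random() for _ in range(607)]
    labels = [1 if random.random() < p else 0 for p in probs]
    stat, p_value = hosmer_lemeshow(labels, probs)
    print(f"H-L chi-square = {stat:.3f}, P = {p_value:.3f}")
    print(f"AUC = {auc(labels, probs):.3f}")
    print(classification_metrics(labels, probs))
```

In this sketch a large H-L P value means observed and expected event counts agree across risk groups, which is the sense in which the abstract calls a model well calibrated. A small P value, as reported for Model 2, points to a mismatch.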
- Published
2013