Comparative Analysis of Transformation Methods for Gene Expression Profiles in Breast Cancer Datasets
- Source :
- BIBE
- Publication Year :
- 2016
- Publisher :
- IEEE, 2016.
Abstract
- Gene expression profiling has been increasingly used in clinical practice. Integrating expression data across multiple experiments provides better insight into the heterogeneity of the biology being examined. One problem in data integration, batch effects arising from platform or laboratory sources, remains a barrier to systematically analyzing data across different datasets. Several methods (such as ComBat) have been proposed to remove batch effects. However, these methods often assume an ideal distribution of the underlying data. Difficulties might be expected when comparing datasets that have fundamentally different (dataset-dependent) distributions. For example, clinical datasets are often collected from patient samples with various disease stages or conditions. Therefore, we compared several mathematical transformations across many datasets, including the nonparametric Z scaling transformation method (NPZ) that we have proposed for clinical use. We selected 2,813 patients with available information on estrogen receptor (ER) status or human epidermal growth factor receptor 2 (HER2) status from 24 Affymetrix HG-U133 (GPL96) or Affymetrix HG-U133 plus 2.0 (GPL570) datasets in the Gene Expression Omnibus database. The microarray expression data were processed with one of the following four methods: Raw (background correction and log transformation only), Microarray Suite 5.0 (MAS5), frozen robust multiarray analysis (fRMA), and radius minimax (RMX). The normalized data were then either left untransformed or transformed with one of four single-array-based transformations (RANK, Z, NPZ, or YuGene), giving five conditions in total. Finally, we compared the ER and HER2 statuses assessed by immunohistochemical (IHC) staining with mRNA expression. We found that applying a single-array-based transformation in addition to normalization improved the concordance rates with IHC staining.
We demonstrated the influence of transformation by using breast cancer samples and showed that adding single-array-based transformations to microarray expression data resulted in stronger correlations with IHC staining.
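As an illustration of what "single-array-based" means here, the following is a minimal sketch of two generic transformations of this kind, a rank transformation and a Z transformation, each computed from the values of one array only. The function names `rank_transform` and `z_transform` and the sample values are assumptions made for this example; the abstract does not give the authors' exact definitions, and NPZ and YuGene are not reproduced because their formulas are not stated in this record.

```python
from statistics import mean, pstdev


def rank_transform(values):
    """Replace each value by its average rank within the array, scaled to (0, 1].

    Hypothetical helper: a generic rank transformation, not necessarily
    the exact RANK method evaluated in the paper.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values so they share one average rank.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank / n
        i = j + 1
    return ranks


def z_transform(values):
    """Centre on the array's own mean and scale by its population standard deviation.

    Hypothetical helper: a generic Z score, not necessarily the exact
    Z method evaluated in the paper.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("cannot Z-transform a constant array")
    return [(v - mu) / sigma for v in values]


# Made-up log-scale expression values for one array (illustration only).
sample = [2.0, 8.0, 4.0, 4.0, 6.0]
ranked = rank_transform(sample)
zscored = z_transform(sample)
```

Because each function uses only the values of a single array, a new sample can be transformed without reference to any other sample or dataset, which is the property that distinguishes these transformations from batch-correction methods that need the full cohort.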
- Subjects :
- 0301 basic medicine
Normalization (statistics)
Microarray
Concordance
Nonparametric statistics
Computational biology
Biology
medicine.disease
computer.software_genre
Gene expression profiling
03 medical and health sciences
030104 developmental biology
0302 clinical medicine
Breast cancer
Transformation (function)
030220 oncology & carcinogenesis
Gene expression
medicine
Data mining
computer
Details
- Database :
- OpenAIRE
- Journal :
- 2016 IEEE 16th International Conference on Bioinformatics and Bioengineering (BIBE)
- Accession number :
- edsair.doi...........bdded895ff0f8d36825e457c309f92d2
- Full Text :
- https://doi.org/10.1109/bibe.2016.51