Carrie Sougnez, Michael R. Reich, Sarah M. Kehoe, Adam Callahan, Giordano Caponigro, Dmitriy Sonkin, Jianwei Che, Todd R. Golub, Robert C. Onofrio, Lili Niu, Kristin G. Ardlie, Wendy Winckler, Pichai Raman, Matthew Meyerson, Markus Warmuth, Kavitha Venkatesan, Adam Margolin, William R. Sellers, Michael D. Jones, Paula Morais, Jordi Barretina, Robert Schlegel, Levi A. Garraway, Sungjoon Kim, Laura E. MacConaill, Nicolas Stransky, Scott Mahan, Ted Liefeld, Christine D. Wilson, Cory M. Johannessen, Stacey Gabriel, Charlie Hatton, Gad Getz, Barbara L. Weber, Supriya Gupta, Andrew I. Su, Jill P. Mesirov, Michael Morrissey, Jessica Harris, and Michael F. Berger
The Cancer Cell Line Encyclopedia (CCLE) is a collaborative effort to assemble a comprehensive resource of human cancer models for basic and translational research. The CCLE aims to contain high-density SNP microarray data, gene expression microarray data and selected cancer gene mutation data for approximately 1,000 human cancer cell lines spanning many tumor types. In addition, we are assessing the sensitivity of some of these cell lines to a series of pharmacological compounds that represent both conventional cytotoxic and targeted agents. Another goal of the CCLE collaboration is the systematic integration of the genomic and pharmacologic datasets in order to identify putative targets of prevalent genetic alterations as well as predictors of pharmacologic sensitivity and resistance. The availability of high-quality data across hundreds of cell lines markedly enhances the statistical power to discover genetic alterations involved in carcinogenesis and molecular predictors of pharmacologic vulnerability. We are assembling systematic algorithms that identify genetic predictors of sensitivity or resistance to particular pharmacological compounds. Toward this end, we integrated a sensitivity dataset for 28 compounds profiled against more than 500 cell lines with all genomic data available in the CCLE. Gene expression, DNA copy-number and loss of heterozygosity values were combined with critical oncogene mutations and genotype information as inputs to multifaceted prediction models for pharmacological sensitivity, whose accuracy was assessed by cross-validation. We followed two complementary paths to predict the sensitivity of cancer cell lines to pharmacological compounds. In a categorical, classification-based model, we used the Naïve Bayes algorithm to identify the features most predictive of whether a cell line is sensitive or resistant to each compound.
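As an illustration of the classification path, the following is a minimal sketch of a Bernoulli Naïve Bayes classifier evaluated by leave-one-out cross-validation. It is not the CCLE implementation: the binary feature encoding, the feature names (`mutation_A`, `amplification_B`), the toy data, the Laplace smoothing, and the log-odds ranking heuristic are all assumptions made for the example.

```python
import math

def train_bernoulli_nb(X, y, alpha=1.0):
    """Fit Bernoulli Naive Bayes on binary feature rows X with class labels y."""
    n_features = len(X[0])
    model = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        # Laplace-smoothed probability that each feature equals 1 within class c
        probs = [(sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
                 for j in range(n_features)]
        model[c] = (math.log(len(rows) / len(y)), probs)
    return model

def predict(model, x):
    """Return the class with the highest log posterior for one feature row."""
    best_class, best_score = None, -math.inf
    for c, (log_prior, probs) in model.items():
        score = log_prior + sum(
            math.log(p if xj else 1.0 - p) for xj, p in zip(x, probs))
        if score > best_score:
            best_class, best_score = c, score
    return best_class

def leave_one_out_accuracy(X, y):
    """Hold out each cell line in turn, train on the rest, and score the prediction."""
    correct = 0
    for i in range(len(X)):
        model = train_bernoulli_nb(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        correct += predict(model, X[i]) == y[i]
    return correct / len(X)

def rank_features(model, names, positive="sensitive", negative="resistant"):
    """Rank features by absolute log-odds between classes (a crude heuristic)."""
    pos, neg = model[positive][1], model[negative][1]
    scores = {n: abs(math.log(p / q)) for n, p, q in zip(names, pos, neg)}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical toy data: one row per cell line, columns are binary genomic features.
feature_names = ["mutation_A", "amplification_B"]
X = [[1, 0], [1, 1], [1, 0], [1, 1], [0, 0], [0, 1], [0, 0], [0, 1]]
y = ["sensitive"] * 4 + ["resistant"] * 4

accuracy = leave_one_out_accuracy(X, y)
ranking = rank_features(train_bernoulli_nb(X, y), feature_names)
print(accuracy, ranking)
```

In this toy set, `mutation_A` separates the two classes and `amplification_B` is uninformative, so the ranking places `mutation_A` first. Continuous inputs such as expression or copy-number values would need discretization or a different likelihood model.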
The second path is a regression-based machine learning approach, the Elastic Net, in which the goal is to predict a continuous value representing the sensitivity of each cell line, such as the GI50. For both approaches, we show example results for several compounds in the collection, such as AZD6244 (MEK), PHA-665752 (MET), and Nutlin-3 (MDM2). We detail the performance of the prediction models as well as the most significant genetic determinants of sensitivity to these inhibitors. Several previously unappreciated genomic predictors of response or intrinsic resistance to targeted anticancer agents have been identified. Our results suggest that this integrative approach, applied to a robust cancer cell line collection such as the CCLE, has considerable power to discover novel associations that augment ongoing basic research into cancer biology and drug discovery.
Citation Information: Clin Cancer Res 2010;16(14 Suppl):PR4.
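The regression-based path described in the abstract can be sketched as an Elastic Net fitted by cyclic coordinate descent. This is a minimal sketch under stated assumptions, not the authors' implementation: the objective is taken to be (1/(2n))·||y − Xβ||² + λ·(ρ·||β||₁ + ((1 − ρ)/2)·||β||²), the function names and the parameters `lam` and `l1_ratio` are hypothetical, and the toy response values are invented stand-ins for a sensitivity measure such as GI50.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; the L1 part of the update."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def fit_elastic_net(X, y, lam=0.1, l1_ratio=0.5, n_iter=200):
    """Fit an Elastic Net by cyclic coordinate descent; returns (intercept, beta)."""
    n, p = len(X), len(X[0])
    # Center the response and features so the intercept can be recovered afterwards
    y_mean = sum(y) / n
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that excludes feature j
            rho = 0.0
            for i in range(n):
                pred_without_j = sum(
                    Xc[i][k] * beta[k] for k in range(p) if k != j)
                rho += Xc[i][j] * (yc[i] - pred_without_j)
            rho /= n
            z = sum(Xc[i][j] ** 2 for i in range(n)) / n
            denom = z + lam * (1.0 - l1_ratio)
            # A constant feature with a pure L1 penalty has denom == 0; keep it at 0
            beta[j] = soft_threshold(rho, lam * l1_ratio) / denom if denom else 0.0
    intercept = y_mean - sum(b * m for b, m in zip(beta, col_means))
    return intercept, beta

# Hypothetical toy data: the response depends only on the first feature.
X = [[0, 1], [1, 0], [2, 1], [3, 0], [4, 1], [5, 0]]
y = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0]

intercept, beta = fit_elastic_net(X, y)
print(intercept, beta)
```

On this toy set the coefficient of the uninformative second feature is driven exactly to zero, while the first coefficient is shrunk slightly below its unpenalized value of 2. This combination of sparsity and shrinkage is what makes the Elastic Net useful when features far outnumber cell lines.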