1. Additional file 7: of Genomics of NSCLC patients both affirm PD-L1 expression and predict their clinical responses to anti-PD-1 immunotherapy
- Author
Brogden, Kim; Parashar, Deepak; Hallier, Andrea; Braun, Terry; Qian, Fang; Rizvi, Naiyer; Bossler, Aaron; Milhem, Mohammed; Chan, Timothy; Abbasi, Taher; Vali, Shireen
- Abstract
Figure S2. An example of the relationship between PD-L1 expression and predicted TGFB1 expression, obtained using Weka 3 algorithms for all patients in the dataset. Similar trends were seen when comparing the PD-L1 expression level with the other 13 predicted molecules. The number of gene mutations identified per patient ranged from 2 to 36, with a total of 264 unique genes across all patients. These categorical data were preprocessed and expanded into a gene vector of length 264, one element per unique gene. Each element was binary: 1 if the patient had a mutation in that gene, 0 otherwise. Two datasets, one including gene mutations (Molecules and Gene Mutations) and one without (Molecules), were used to learn prediction models. The Discovery and Validation datasets followed the split provided, to allow for comparable results. The performance of a subset of these models on the training and testing sets is shown for both the Molecules and the Molecules and Gene Mutations datasets. The SMO support vector machine with a normalized polynomial kernel performed best when applied to the Molecules dataset. This model correctly identified 24 of 29 patients, whereas the simulation models correctly identified 25 of 29, a difference of only one match between the two prediction methods. Several other methods, while not performing as well overall, accurately identified 9 patients in the test dataset, close to the computational simulation model, which successfully identified 10 patients in the test dataset. In general, adding the gene mutation data to the molecule data either maintained or decreased the performance of a model. (DOCX 4114 kb)
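The binary gene-vector encoding described in the abstract can be illustrated with a minimal sketch. This is not the authors' preprocessing code: the patient IDs, gene names, and the helper functions `build_gene_index` and `encode_patient` are hypothetical, and the toy data uses 4 unique genes rather than the 264 reported.

```python
from typing import Dict, List, Set


def build_gene_index(patient_mutations: Dict[str, Set[str]]) -> List[str]:
    """Return a sorted list of every unique mutated gene across all patients."""
    return sorted(set().union(*patient_mutations.values()))


def encode_patient(mutated: Set[str], gene_index: List[str]) -> List[int]:
    """Binary vector: 1 if the patient has a mutation in the gene, 0 otherwise."""
    return [1 if gene in mutated else 0 for gene in gene_index]


# Hypothetical toy data (illustration only, not taken from the study).
patient_mutations = {
    "P1": {"TP53", "KRAS"},
    "P2": {"EGFR"},
    "P3": {"KRAS", "STK11", "EGFR"},
}

gene_index = build_gene_index(patient_mutations)
encoded = {
    pid: encode_patient(genes, gene_index)
    for pid, genes in patient_mutations.items()
}

for pid, vector in encoded.items():
    print(pid, vector)
```

In the study's setting the vector length would equal the number of unique genes (264). A "Molecules and Gene Mutations" style dataset would presumably append these binary columns to each patient's molecule features, though the abstract does not say how the two were combined.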
- Published
- 2018