Heterogeneity in the gene regulatory landscape of leiomyosarcoma

Authors :
Tatiana Belova
Nicola Biondi
Ping-Han Hsieh
Pavlo Lutsik
Priya Chudasama
Marieke L. Kuijjer
Publication Year :
2023
Publisher :
Zenodo, 2023.

Abstract

We reconstructed gene regulatory networks for 80 TCGA and 37 DKFZ leiomyosarcoma samples and used these networks as input to the PORCUPINE (Principal Components Analysis to Obtain Regulatory Contributions Using Pathway-based Interpretation of Network Estimates) method to identify pathways driving leiomyosarcoma heterogeneity. In short, PORCUPINE combines knowledge of biological pathways with permutation-based network analysis to identify pathways that exhibit significant regulatory heterogeneity across a patient population.

This repository contains the following:

“rse_gene.RData” - A RangedSummarizedExperiment-class object for the TCGA RNA-seq data.
“LMS_37_readCount.txt” - Raw expression count data for 37 DKFZ-LMS samples. This file contains a 57,820 by 38 dataframe, where the first column is the gene ID.
“GN_ensemblID_symbol.txt” - A 55,476 by 2 dataframe where the first column is the Ensembl ID and the second column is the gene symbol, corresponding to features in “LMS_37_readCount.txt”.
“preprocessing_and_normalization.R” - R script with the preprocessing and normalization workflow for the data.
“prior.txt” - Prior information on potential regulatory interactions, obtained by scanning promoter regions in the human genome for known TF motifs. This prior network was previously published in Lopes-Ramos et al. 2021 Cancer Research (PMID 34493595). This file contains an 11,151,077 by 3 dataframe, where the first column contains the transcription factor gene IDs, the second column contains the target gene IDs, and the third column shows the presence (1) or absence (0) of a TF motif in the promoter region of a gene.
“exp.txt” - Gene expression data. This file contains a 17,899 by 11,322 dataframe with normalized expression data for each sample. The first column is the gene ID.
“ppi.txt” - Protein-protein interactions between TFs obtained from StringDb (https://string-db.org/), as in Lopes-Ramos et al. 2021 (PMID 34493595). This file contains an 80,037 by 3 dataframe, where the first two columns contain protein IDs and the third column contains a score for each interaction.
“80_tcga_lms_net.RData” - Patient-specific gene regulatory networks for 80 TCGA leiomyosarcoma samples. This file contains an 11,151,077 by 80 dataframe with edge weights for each sample. Edge order corresponds to the edge order in the “edges.RData” file.
“37_dkfz_lms_net.RData” - Patient-specific gene regulatory networks for 37 DKFZ leiomyosarcoma samples. This file contains an 11,151,077 by 37 dataframe with edge weights for each sample. Edge order corresponds to the edge order in the “edges.RData” file.
“edges.RData” - Regulatory edge information. This file contains an 11,151,077 by 3 dataframe with three columns: reg (the transcription factor gene IDs), tar (the target gene IDs), and prior (information from the “prior” network).
“PORCUPINE.zip” - The PORCUPINE R package.

This work was supported by the Norwegian Research Council, Helse Sør-Øst, and University of Oslo through the Centre for Molecular Medicine Norway (187615), the Norwegian Research Council (313932), Familien Blix Fond, as well as the Emmy Noether Programme Grant from the German Research Foundation (DFG, No. CH 2302/1-1).
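
The abstract describes a three-column edge table (transcription factor, target gene, motif flag) and network tables whose rows follow the same edge order. The sketch below illustrates that positional relationship on toy data, using only the Python standard library. It is not part of the deposited workflow, which is written in R. The function names, the tab delimiter, the absence of a header row, and all toy values are assumptions made for illustration; the actual files should be inspected before adapting it.

```python
import csv
import io


def read_prior(handle, delimiter="\t"):
    """Parse rows of (TF gene ID, target gene ID, motif flag) into tuples.

    Assumes a delimited text file with exactly three columns and no header.
    """
    edges = []
    for row in csv.reader(handle, delimiter=delimiter):
        if len(row) != 3:
            raise ValueError(f"expected 3 columns, got {len(row)}")
        reg, tar, flag = row
        edges.append((reg, tar, int(float(flag))))
    return edges


def weights_for_target(edges, weights, target):
    """Collect per-sample weights of all edges pointing at one target gene.

    Relies on row i of `weights` describing edges[i], mirroring the statement
    that edge order in the network files matches the edge table.
    """
    if len(edges) != len(weights):
        raise ValueError("edge list and weight table must have the same number of rows")
    return {reg: w for (reg, tar, _), w in zip(edges, weights) if tar == target}


# Toy stand-ins for the real files (hypothetical IDs and values).
toy_prior = "TF_A\tGENE_1\t1\nTF_A\tGENE_2\t0\nTF_B\tGENE_1\t0\n"
edges = read_prior(io.StringIO(toy_prior))

# One row per edge, one column per sample (two samples here).
toy_weights = [[0.9, 1.1], [-0.2, 0.1], [0.3, -0.4]]

print(weights_for_target(edges, toy_weights, "GENE_1"))
```

Because the network tables carry no edge identifiers of their own, any filtering or reordering has to be applied to the edge table and the weight table together, otherwise the positional match is lost.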

Details

Database :
OpenAIRE
Accession number :
edsair.doi.dedup.....40f27326059085fe3035ad4cf5c1f84b
Full Text :
https://doi.org/10.5281/zenodo.7919284