107 results on '"Soneson C"'
Search Results
2. Community-driven ELIXIR activities in single-cell omics [version 1; peer review: awaiting peer review]
- Author
-
Czarnewski, P., Mahfouz, Ahmed, Calogero, R. A., Palagi, P. M., Portell-Silva, L., Gonzalez-Uriarte, A., Soneson, C., Burdett, T., Szomolay, B., Videm, P., Hotz, H. R., Papatheodorou, I., Hancock, J. M., Gruening, B., Haerty, W., Krause, Roland, Capella-Gutierrez, S., Leskošek, B., Alessandri, L., Arigoni, M., Rezen, T., Botzki, A., Ferk, P., Lindvall, J., Heil, Katharina, Ishaque, N., and Korpelainen, E.
- Published
- 2022
3. Nuclear non-coding RNA regulation: SW01.S2–16
- Author
-
Azzalin, C. M., Shchepachev, V., and Soneson, C.
- Published
- 2013
4. Interactive visualization as a key tool for understanding medical omics data
- Author
-
Marini, F, Soneson, C, Rue-Albrecht, K, and Lun, A
- Subjects
ddc: 610, shiny, Bioconductor, genomics, interactive, 610 Medical sciences, Medicine, exploration, visualization
- Abstract
Life sciences have been evolving through the last decade to become a quantitative discipline, with a leading role played by high-throughput technologies (gene expression profiling, protein quantitation via mass spectrometry, high-throughput imaging). Data is available in different experimental [for full text, please go to the a.m. URL]. 64. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V. (GMDS)
- Published
- 2019
5. Tximeta: Reference sequence checksums for provenance identification in RNA-seq
- Author
-
Pertea, M, Love, MI, Soneson, C, Hickey, PF, Johnson, LK, Pierce, NT, Shepherd, L, Morgan, M, and Patro, R
- Abstract
Correct annotation metadata is critical for reproducible and accurate RNA-seq analysis. When files are shared publicly or among collaborators with incorrect or missing annotation metadata, it becomes difficult or impossible to reproduce bioinformatic analyses from raw data. It also makes it more difficult to locate the transcriptomic features, such as transcripts or genes, in their proper genomic context, which is necessary for overlapping expression data with other datasets. We provide a solution in the form of an R/Bioconductor package tximeta that performs numerous annotation and metadata gathering tasks automatically on behalf of users during the import of transcript quantification files. The correct reference transcriptome is identified via a hashed checksum stored in the quantification output, and key transcript databases are downloaded and cached locally. The computational paradigm of automatically adding annotation metadata based on reference sequence checksums can greatly facilitate genomic workflows, by helping to reduce overhead during bioinformatic analyses, preventing costly bioinformatic mistakes, and promoting computational reproducibility. The tximeta package is available at https://bioconductor.org/packages/tximeta.
- Published
- 2020
6. O-0025 COLORECTAL CANCER SUBTYPING CONSORTIUM (CRCSC) IDENTIFIES CONSENSUS OF MOLECULAR SUBTYPES
- Author
-
Dienstmann, R., Guinney, J., Delorenzi, M., De Reynies, A., Roepman, P., Sadanandam, A., Vermeulen, L., Schlicker, A., Missiaglia, E., Soneson, C., Marisa, L., Homicsko, K., Wang, X., Simon, I., Laurent-Puig, P., Wessels, L., Medema, J.P., Kopetz, S., Friend, S., Tejpar, S., and Group, Colorectal Cancer Subtyping Consortium
- Published
- 2017
7. The ABCs of viral hepatitis - defined biomarker signatures of acute viral hepatitis
- Author
-
Duffy D, Saleh R, Laird M, Soneson C, Le Fouler L, El-Daly M, Casrouge A, Decalf J, Abbas A, Eldin NS, Fontes M, Hamid MA, Mohamed MK, Rafik M, Fontanet A, and Albert ML
- Published
- 2014
8. The SIB Swiss Institute of Bioinformatics' resources: focus on curated databases
- Author
-
Bultet, LA, Aguilar-Rodriguez, J, Ahrens, CH, Ahrne, EL, Ai, N, Aimo, L, Akalin, A, Aleksiev, T, Alocci, D, Altenhoff, A, Alves, I, Ambrosini, G, Pedone, PA, Angelina, P, Anisimova, M, Appel, R, Argoud-Puy, G, Arnold, K, Arpat, B, Artimo, P, Ascencao, K, Auchincloss, A, Axelsen, K, Gerritsen, VB, Bairoch, A, Barisal, P, Baratin, D, Barbato, A, Barbie, V, Barras, D, Barreiro, M, Barret, S, Bastian, F, Batista Neto, TM, Baudis, M, Beaudoing, E, Beckmann, JS, Bekkar, AK, Cammoun, LBH, Benmohammed, S, Bernard, M, Bertelli, C, Bertoni, M, Bienert, S, Bignucolo, O, Bilbao, A, Bilican, A, Blank, D, Blatter, M-C, Blum, L, Bocquet, J, Boeckmann, B, Bolleman, JT, Bordoli, L, Bosshard, L, Boucher, G, Bougueleret, L, Boutet, E, Bovigny, C, Bratulic, S, Breuza, L, Bridge, AJ, Britan, A, Brito, F, Frazao, JB, Bruggmann, R, Bucher, P, Burdet, F, Burger, L, Cabello, EM, Gomez, RMC, Calderon, S, Cannarozzi, G, Carl, S, Casas, CC, Catherinet, S, Perier, RC, Charpilloz, C, Chaskar, PD, Chen, W, Pepe, AC, Chopard, B, Chu, HY, Civic, N, Claassen, M, Clottu, S, Colombo, M, Cosandier, I, Coudert, E, Crespo, I, Creus, M, Cuche, B, Cuendet, MA, Cusin, I, Daga, N, Daina, A, Dauvillier, J, David, F, Davydov, I, Ferreira, MDSRM, de Beer, T, de Castro, E, de Santana, C, Delafontaine, J, Delorenzi, M, Delucinge-Vivier, C, Demirel, O, Derham, R, Dermitzakis, EM, Dib, L, Diene, S, Dilek, N, Dilmi, J, Domagalski, MJ, Dorier, J, Dornevil, D, Dousse, A, Dreos, R, Duchen, P, Roggli, PD, Duperret, ID, Durinx, C, Duvaud, S, Engler, R, Frkek, S, Lopez, PE, Fstreicher, A, Excoffier, L, Fabbretti, R, Falcone, J-L, Falquet, L, Famiglietti, ML, Ferreira, A-M, Feuermann, M, Filliettaz, M, Hegel, V, Foucal, A, Franceschini, A, Fucile, G, Gaidatzis, D, Garcia, V, Gasteiger, E, Gateau, A, Gatti, L, Gaudet, P, Gaudinat, A, Gehant, S, Gfeller, D, Gharib, WH, Ghraichy, M, Gidoin, C, Gil, M, Gleizes, A, Gobeill, J, Gonnet, G, Gos, A, Gotz, L, Gouy, A, Grbic, D, Groux, R, Gruaz-Gumowski, N, Grun, D, Gschwind, A, 
Guex, N, Gupta, S, Getaz, M, Haake, D, Haas, J, Hatzimanikatis, V, Heckel, G, Gardiol, DFH, Hinard, V, Hinz, U, Homicsko, K, Horlacher, O, Hosseini, S-R, Hotz, H-R, Hulo, C, Hundsrucker, C, Ibberson, M, Ilmjarv, S, Ioannidis, V, Ioannidis, P, Iseli, C, Ivanek, R, Iwaszkiewicz, J, Jacquet, P, Jacquot, M, Jagannathan, V, Jan, M, Jensen, J, Johansson, MU, Johner, N, Jungo, F, Junier, T, Kahraman, A, Katsantoni, M, Keller, G, Kerhornou, A, Khalid, F, Klingbiel, D, Kimljenovic, A, Kriventseva, E, Kryuchkova, N, Kumar, S, Kutalik, Z, Kuznetsov, D, Kuzyakiv, R, Lane, L, Lara, V, Ledesma, L, Leleu, M, Lemercier, P, Lew, D, Lieberherr, D, Liechti, R, Lisacek, F, Fischer, H, Litsios, G, Liu, J, Lombardot, T, Mace, A, Maffioletti, S, Mahi, M-A, Maiolo, M, Majjigapu, SR, Malmstrom, L, Mangold, V, Marek, D, Mariethoz, J, Marin, R, Martin, O, Martin, X, Martin-Campos, T, Mary, C, Masclaux, F, Masson, P, Meier, C, Messina, A, Lenoir, MM, Meyer, X, Michel, P-A, Michielin, O, Milanese, A, Missiaglia, E, Perez, JM, Caria, VM, Moret, P, Moretti, S, Morgat, A, Mottaz, A, Mottin, L, Mouscaz, Y, Mueller, M, Murri, R, Mylonas, R, Neuenschwander, S, Nikitin, F, Niknejad, A, Nouspikel, N, Nso, LN, Okoniewski, M, Omasits, U, Paccaud, B, Pachkov, M, Paesano, SG, Pagni, M, Palagi, PM, Pasche, E, Payne, JL, Pedruzzi, I, Peischl, S, Peitsch, M, Perlini, S, Pilbout, S, Podvinec, M, Pohlmann, R, Polizzi, D, Potter, D, Poux, S, Pozzato, M, Pradervand, S, Praz, V, Pruess, M, Pujadas, E, Racle, J, Raschi, M, Ratib, O, Rausell, A, de Laval, VR, Redaschi, N, Rempfer, C, Ren, G, Vandati, RAR, Rib, L, Grognuz, OR, Altimiras, ER, Rivoire, C, Robin, T, Robinson-Rechavi, M, Rodrigues, J, Roechert, B, Roelli, P, Romano, V, Rossier, G, Roth, A, Rougemont, J, Roux, J, Royo, H, Ruch, P, Ruinelli, M, Rustom, M, Sates, A, Roehrig, UF, Rueeger, S, Salamin, N, Sankar, M, Sarkar, N, Saxenhofer, M, Schaeffer, M, Schaerli, Y, Schaper, E, Schmid, A, Schmid, E, Schmid, C, Schmid, M, Schmidt, S, Schmocker, D, Schneider, 
M, Schuepbach, T, Schwede, T, Schuetz, F, Sengstag, T, Serrano, M, Sethi, A, Shahmirzadi, O, Sigrist, C, Silvestro, D, Simao Neto, FA, Simillion, C, Simonovic, M, Skunca, N, Sluzek, K, Soneson, C, Sprouffske, K, Stadler, M, Staehli, S, Stevenson, B, Stockinger, H, Straszewski, J, Stricker, T, Studer, G, Stutz, A, Suffiotti, M, Sundaram, S, Szklarczyk, D, Szovenyi, P, Tegenfeldt, F, Teixeira, D, Tellenbach, S, Smith, AAT, Tognolli, M, Topolsky, I, Thuong, VDT, Tsantoulis, P, Tzika, AC, Agote, AU, van Nimwegen, E, von Mering, C, Varadarajan, A, Veranneman, M, Verbregue, L, Veuthey, A-L, Vishnyakova, D, Vyas, R, Wagner, A, Walther, D, Wan, HW, Wang, M, Waterhouse, R, Waterhouse, A, Wicki, A, Wigger, L, Wirapati, P, Witschi, U, Wyder, S, Wyler, K, Wuethrich, D, Xenarios, I, Yamada, K, Yan, Z, Yasrebi, H, Zahn, M, Zangger, N, Zdobnov, E, Zerzion, D, Zoete, V, and Zoller, S
- Abstract
The SIB Swiss Institute of Bioinformatics (www.isb-sib.ch) provides world-class bioinformatics databases, software tools, services and training to the international life science community in academia and industry. These solutions allow life scientists to turn the exponentially growing amount of data into knowledge. Here, we provide an overview of SIB's resources and competence areas, with a strong focus on curated databases and SIB's most popular and widely used resources. In particular, SIB's Bioinformatics resource portal ExPASy features over 150 resources, including UniProtKB/Swiss-Prot, ENZYME, PROSITE, neXtProt, STRING, UniCarbKB, SugarBindDB, SwissRegulon, EPD, arrayMap, Bgee, SWISS-MODEL Repository, OMA, OrthoDB and other databases, which are briefly described in this article.
- Published
- 2016
9. Isoform prefiltering improves performance of count-based methods for analysis of differential transcript usage.
- Author
-
Soneson, C, Matthes, KL, Nowicka, M, Law, CW, and Robinson, MD
- Abstract
BACKGROUND: RNA-seq has been a boon to the quantitative analysis of transcriptomes. A notable application is the detection of changes in transcript usage between experimental conditions. For example, discovery of pathological alternative splicing may allow the development of new treatments or better management of patients. From an analysis perspective, there are several ways to approach RNA-seq data to unravel differential transcript usage, such as annotation-based exon-level counting, differential analysis of the percentage spliced in, or quantitative analysis of assembled transcripts. The goal of this research is to compare and contrast current state-of-the-art methods, and to suggest improvements to commonly used work flows. RESULTS: We assess the performance of representative work flows using synthetic data and explore the effect of using non-standard counting bin definitions as input to DEXSeq, a state-of-the-art inference engine. Although the canonical counting provided the best results overall, several non-canonical approaches were as good or better in specific aspects and most counting approaches outperformed the evaluated event- and assembly-based methods. We show that an incomplete annotation catalog can have a detrimental effect on the ability to detect differential transcript usage in transcriptomes with few isoforms per gene and that isoform-level prefiltering can considerably improve false discovery rate control. CONCLUSION: Count-based methods generally perform well in the detection of differential transcript usage. Controlling the false discovery rate at the imposed threshold is difficult, particularly in complex organisms, but can be improved by prefiltering the annotation catalog.
- Published
- 2016
10. A glioma classification scheme based on coexpression modules of EGFR and PDGFRA
- Author
-
Sun, Y. (Yingyu), Zhang, W. (Wei), Chen, D. (Dongfeng), Lv, H. (Hui), Zheng, J. (Junxiong), Lilljebjörn, H. (Henrik), Ran, L. (Liang), Bao, T.X. (Tang Xue), Soneson, C. (Charlotte), Olov Sjögren, H. (Hans), Salford, L.G. (Leif), Ji, J. (Jianguang), French, P.J. (Pim), Fioretos, T. (Thoas), Jiang, T. (Tao), and Fan, X. (Xiaolong)
- Abstract
We hypothesized that key signaling pathways of glioma genesis might enable the molecular classification of gliomas. Gene coexpression modules around epidermal growth factor receptor (EGFR) (EM, 29 genes) or platelet derived growth factor receptor A (PDGFRA) (PM, 40 genes) in gliomas were identified. Based on EM and PM expression signatures, nonnegative matrix factorization reproducibly clustered 1,369 adult diffuse gliomas WHO grades II-IV from four independent databases generated in three continents, into the subtypes (EM, PM and EMlowPMlow gliomas) in a morphology-independent manner. Besides their distinct patterns of genomic alterations, EM gliomas were associated with higher age at diagnosis, poorer prognosis, and stronger expression of neural stem cell and astrogenesis genes. Both PM and EMlowPMlow gliomas were associated with younger age at diagnosis and better prognosis. PM gliomas were enriched in the expression of oligodendrogenesis genes, whereas EMlowPMlow gliomas were enriched in the signatures of mature neurons and oligodendrocytes. The EM/PM-based molecular classification scheme is applicable to adult low-grade and high-grade diffuse gliomas, and outperforms existing classification schemes in assigning diffuse gliomas to subtypes with distinct transcriptomic and genomic profiles. The majority of the EM/PM classifiers, including regulators of glial fate decisions, have not been extensively studied in glioma biology. Subsets of these classifiers were coexpressed in mouse glial precursor cells, and frequently amplified or lost in an EM/PM glioma subtype-specific manner, resulting in somatic copy number alteration-dependent gene expression that contributes to EM/PM signatures in glioma samples. EM/PM-based molecular classification provides a molecular diagnostic framework to expedite the search for new glioma therapeutic targets.
- Published
- 2014
11. Colorectal Cancer Subtyping Consortium (CRCSC) Identifies Consensus of Molecular Subtypes
- Author
-
Dienstmann, R., Guinney, J., Delorenzi, M., De Reynies, A., Roepman, P., Sadanandam, A., Vermeulen, L., Schlicker, A., Missiaglia, E., Soneson, C., Marisa, L., Homicsko, K., Wang, X., Simon, I., Laurent-Puig, P., Wessels, L., Medema, J.P., Kopetz, S., Friend, S., and Tejpar, S.
- Published
- 2014
12. A framework for list representation, enabling list stabilization through incorporation of gene exchangeabilities
- Author
-
Soneson, C. and Fontes, M.
- Published
- 2011
13. O-0025 - Colorectal Cancer Subtyping Consortium (CRCSC) Identifies Consensus of Molecular Subtypes
- Author
-
Dienstmann, R., Guinney, J., Delorenzi, M., De Reynies, A., Roepman, P., Sadanandam, A., Vermeulen, L., Schlicker, A., Missiaglia, E., Soneson, C., Marisa, L., Homicsko, K., Wang, X., Simon, I., Laurent-Puig, P., Wessels, L., Medema, J.P., Kopetz, S., Friend, S., and Tejpar, S.
- Published
- 2014
14. A framework for list representation, enabling list stabilization through incorporation of gene exchangeabilities.
- Author
-
Soneson C and Fontes M
- Published
- 2012
15. O-0025 COLORECTAL CANCER SUBTYPING CONSORTIUM (CRCSC) IDENTIFIES CONSENSUS OF MOLECULAR SUBTYPES
- Author
-
Dienstmann, R., Guinney, J., Delorenzi, M., De Reynies, A., Roepman, P., Sadanandam, A., Vermeulen, L., Schlicker, A., Missiaglia, E., Soneson, C., Marisa, L., Homicsko, K., Wang, X., Simon, I., Laurent-Puig, P., Wessels, L., Medema, J.P., Kopetz, S., Friend, S., Tejpar, S., and Group, Colorectal Cancer Subtyping Consortium
16. O-0025 COLORECTAL CANCER SUBTYPING CONSORTIUM (CRCSC) IDENTIFIES CONSENSUS OF MOLECULAR SUBTYPES
- Author
-
Dienstmann, R., Guinney, J., Delorenzi, M., De Reynies, A., Roepman, P., Sadanandam, A., Vermeulen, L., Schlicker, A., Missiaglia, E., Soneson, C., Marisa, L., Homicsko, K., Wang, X., Simon, I., Laurent-Puig, P., Wessels, L., Medema, J.P., Kopetz, S., Friend, S., Tejpar, S., and Group, Colorectal Cancer Subtyping Consortium
17. The projection score - an evaluation criterion for variable subset selection in PCA visualization
- Author
-
Fontes Magnus and Soneson Charlotte
- Subjects
Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
- Abstract
Background: In many scientific domains, it is becoming increasingly common to collect high-dimensional data sets, often with an exploratory aim, to generate new and relevant hypotheses. The exploratory perspective often makes statistically guided visualization methods, such as Principal Component Analysis (PCA), the methods of choice. However, the clarity of the obtained visualizations, and thereby the potential to use them to formulate relevant hypotheses, may be confounded by the presence of the many non-informative variables. For microarray data, more easily interpretable visualizations are often obtained by filtering the variable set, for example by removing the variables with the smallest variances or by only including the variables most highly related to a specific response. The resulting visualization may depend heavily on the inclusion criterion, that is, effectively the number of retained variables. To our knowledge, there exists no objective method for determining the optimal inclusion criterion in the context of visualization. Results: We present the projection score, which is a straightforward, intuitively appealing measure of the informativeness of a variable subset with respect to PCA visualization. This measure can be universally applied to find suitable inclusion criteria for any type of variable filtering. We apply the presented measure to find optimal variable subsets for different filtering methods in both microarray data sets and synthetic data sets. We note also that the projection score can be applied in general contexts, to compare the informativeness of any variable subsets with respect to visualization by PCA. Conclusions: We conclude that the projection score provides an easily interpretable and universally applicable measure of the informativeness of a variable subset with respect to visualization by PCA, that can be used to systematically find the most interpretable PCA visualization in practical exploratory analysis.
- Published
- 2011
18. Integrative analysis of gene expression and copy number alterations using canonical correlation analysis
- Author
-
Fioretos Thoas, Lilljebjörn Henrik, Soneson Charlotte, and Fontes Magnus
- Subjects
Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
- Abstract
Background: With the rapid development of new genetic measurement methods, several types of genetic alterations can be quantified in a high-throughput manner. While the initial focus has been on investigating each data set separately, there is an increasing interest in studying the correlation structure between two or more data sets. Multivariate methods based on Canonical Correlation Analysis (CCA) have been proposed for integrating paired genetic data sets. The high dimensionality of microarray data imposes computational difficulties, which have been addressed for instance by studying the covariance structure of the data, or by reducing the number of variables prior to applying the CCA. In this work, we propose a new method for analyzing high-dimensional paired genetic data sets, which mainly emphasizes the correlation structure and still permits efficient application to very large data sets. The method is implemented by translating a regularized CCA to its dual form, where the computational complexity depends mainly on the number of samples instead of the number of variables. The optimal regularization parameters are chosen by cross-validation. We apply the regularized dual CCA, as well as a classical CCA preceded by a dimension-reducing Principal Components Analysis (PCA), to a paired data set of gene expression changes and copy number alterations in leukemia. Results: Using the correlation-maximizing methods, regularized dual CCA and PCA+CCA, we show that without pre-selection of known disease-relevant genes, and without using information about clinical class membership, an exploratory analysis singles out two patient groups, corresponding to well-known leukemia subtypes. Furthermore, the variables showing the highest relevance to the extracted features agree with previous biological knowledge concerning copy number alterations and gene expression changes in these subtypes. Finally, the correlation-maximizing methods are shown to yield results which are more biologically interpretable than those resulting from a covariance-maximizing method, and provide different insight compared to when each variable set is studied separately using PCA. Conclusions: We conclude that regularized dual CCA as well as PCA+CCA are useful methods for exploratory analysis of paired genetic data sets, and can be efficiently implemented also when the number of variables is very large.
- Published
- 2010
19. Deep quantification of substrate turnover defines protease subsite cooperativity.
- Author
-
Gudipati RK, Gaidatzis D, Seebacher J, Muehlhaeusser S, Kempf G, Cavadini S, Hess D, Soneson C, and Großhans H
- Subjects
- Substrate Specificity, Animals, Humans, Mass Spectrometry, Peptides metabolism, Peptides chemistry, Catalytic Domain, Protein Engineering, Dipeptidyl Peptidase 4 metabolism, Dipeptidyl Peptidase 4 chemistry, Caenorhabditis elegans enzymology, Caenorhabditis elegans metabolism, Caenorhabditis elegans Proteins metabolism, Caenorhabditis elegans Proteins chemistry, Caenorhabditis elegans Proteins genetics, Glucagon-Like Peptide 1 metabolism, Glucagon-Like Peptide 1 chemistry
- Abstract
Substrate specificity determines protease functions in physiology and in clinical and biotechnological applications, yet quantitative cleavage information is often unavailable, biased, or limited to a small number of events. Here, we develop qPISA (quantitative Protease specificity Inference from Substrate Analysis) to study Dipeptidyl Peptidase Four (DPP4), a key regulator of blood glucose levels. We use mass spectrometry to quantify >40,000 peptides from a complex, commercially available peptide mixture. By analyzing changes in substrate levels quantitatively instead of focusing on qualitative product identification through a binary classifier, we can reveal cooperative interactions within DPP4's active pocket and derive a sequence motif that predicts activity quantitatively. qPISA distinguishes DPP4 from the related C. elegans DPF-3 (a DPP8/9-orthologue), and we relate the differences to the structural features of the two enzymes. We demonstrate that qPISA can direct protein engineering efforts like the stabilization of GLP-1, a key DPP4 substrate used in the treatment of diabetes and obesity. Thus, qPISA offers a versatile approach for profiling protease and especially exopeptidase specificity, facilitating insight into enzyme mechanisms and biotechnological and clinical applications. Competing Interests: The authors declare no competing interests. (© 2024. The Author(s).)
- Published
- 2024
20. DifferentialRegulation: a Bayesian hierarchical approach to identify differentially regulated genes.
- Author
-
Tiberi S, Meili J, Cai P, Soneson C, He D, Sarkar H, Avalos-Pacheco A, Patro R, and Robinson MD
- Subjects
- Humans, RNA, Messenger genetics, Gene Expression Profiling methods, RNA Splicing genetics, Gene Expression Regulation, Models, Statistical, Bayes Theorem
- Abstract
Although transcriptomics data is typically used to analyze mature spliced mRNA, recent attention has focused on jointly investigating spliced and unspliced (or precursor-) mRNA, which can be used to study gene regulation and changes in gene expression production. Nonetheless, most methods for spliced/unspliced inference (such as RNA velocity tools) focus on individual samples, and rarely allow comparisons between groups of samples (e.g. healthy vs. diseased). Furthermore, this kind of inference is challenging, because spliced and unspliced mRNA abundance is characterized by a high degree of quantification uncertainty, due to the prevalence of multi-mapping reads, i.e. reads compatible with multiple transcripts (or genes), and/or with both their spliced and unspliced versions. Here, we present DifferentialRegulation, a Bayesian hierarchical method to discover changes between experimental conditions with respect to the relative abundance of unspliced mRNA (over the total mRNA). We model the quantification uncertainty via a latent variable approach, where reads are allocated to their gene/transcript of origin, and to the respective splice version. We designed several benchmarks where our approach shows good performance, in terms of sensitivity and error control, vs. state-of-the-art competitors. Importantly, our tool is flexible, and works with both bulk and single-cell RNA-sequencing data. DifferentialRegulation is distributed as a Bioconductor R package. (© The Author(s) 2024. Published by Oxford University Press.)
- Published
- 2024
21. Author Correction: The shaky foundations of simulating single-cell RNA sequencing data.
- Author
-
Crowell HL, Leonardo SXM, Soneson C, and Robinson MD
- Published
- 2024
- Full Text
- View/download PDF
22. The tidyomics ecosystem: enhancing omic data analyses.
- Author
-
Hutchison WJ, Keyes TJ, Crowell HL, Serizay J, Soneson C, Davis ES, Sato N, Moses L, Tarlinton B, Nahid AA, Kosmac M, Clayssen Q, Yuan V, Mu W, Park JE, Mamede I, Ryu MH, Axisa PP, Paiz P, Poon CL, Tang M, Gottardo R, Morgan M, Lee S, Lawrence M, Hicks SC, Nolan GP, Davis KL, Papenfuss AT, Love MI, and Mangiola S
- Subjects
- Humans, Computational Biology methods, Leukocytes, Mononuclear metabolism, Leukocytes, Mononuclear cytology, Genomics methods, Data Analysis, Software
- Abstract
The growth of omic data presents evolving challenges in data manipulation, analysis and integration. Addressing these challenges, Bioconductor provides an extensive community-driven biological data analysis platform. Meanwhile, tidy R programming offers a revolutionary data organization and manipulation standard. Here we present the tidyomics software ecosystem, bridging Bioconductor to the tidy R paradigm. This ecosystem aims to streamline omic analysis, ease learning and encourage cross-disciplinary collaborations. We demonstrate the effectiveness of tidyomics by analyzing 7.5 million peripheral blood mononuclear cells from the Human Cell Atlas, spanning six data frameworks and ten analysis tools., (© 2024. Crown.)
- Published
- 2024
- Full Text
- View/download PDF
23. Rbm8a deficiency causes hematopoietic defects by modulating Wnt/PCP signaling.
- Author
-
Kocere A, Chiavacci E, Soneson C, Wells HH, Méndez-Acevedo KM, MacGowan JS, Jacobson ST, Hiltabidle MS, Raghunath A, Shavit JA, Panáková D, Williams MLK, Robinson MD, Mosimann C, and Burger A
- Abstract
Defects in blood development frequently occur among syndromic congenital anomalies. Thrombocytopenia-Absent Radius (TAR) syndrome is a rare congenital condition with reduced platelets (hypomegakaryocytic thrombocytopenia) and forelimb anomalies, concurrent with more variable heart and kidney defects. TAR syndrome associates with hypomorphic gene function for RBM8A/Y14 that encodes a component of the exon junction complex involved in mRNA splicing, transport, and nonsense-mediated decay. How perturbing a general mRNA-processing factor causes the selective TAR Syndrome phenotypes remains unknown. Here, we connect zebrafish rbm8a perturbation to early hematopoietic defects via attenuated non-canonical Wnt/Planar Cell Polarity (PCP) signaling that controls developmental cell re-arrangements. In hypomorphic rbm8a zebrafish, we observe a significant reduction of cd41-positive thrombocytes. rbm8a-mutant zebrafish embryos accumulate mRNAs with individual retained introns, a hallmark of defective nonsense-mediated decay; affected mRNAs include transcripts for non-canonical Wnt/PCP pathway components. We establish that rbm8a-mutant embryos show convergent extension defects and that reduced rbm8a function interacts with perturbations in non-canonical Wnt/PCP pathway genes wnt5b, wnt11f2, fzd7a, and vangl2. Using live-imaging, we found reduced rbm8a function impairs the architecture of the lateral plate mesoderm (LPM) that forms hematopoietic, cardiovascular, kidney, and forelimb skeleton progenitors as affected in TAR Syndrome. Both mutants for rbm8a and for the PCP gene vangl2 feature impaired expression of early hematopoietic/endothelial genes including runx1 and the megakaryocyte regulator gfi1aa. Together, our data propose aberrant LPM patterning and hematopoietic defects as consequence of attenuated non-canonical Wnt/PCP signaling upon reduced rbm8a function. These results also link TAR Syndrome to a potential LPM origin and a developmental mechanism., Competing Interests: J.A.S. has been a consultant for Sanofi, Takeda, Genentech, CSL Behring, and HEMA Biologics.
- Published
- 2024
- Full Text
- View/download PDF
24. Convergence of multiple RNA-silencing pathways on GW182/TNRC6.
- Author
-
Welte T, Goulois A, Stadler MB, Hess D, Soneson C, Neagu A, Azzi C, Wisser MJ, Seebacher J, Schmidt I, Estoppey D, Nigsch F, Reece-Hoyes J, Hoepfner D, and Großhans H
- Subjects
- Animals, RNA, Messenger genetics, RNA, Messenger metabolism, Protein Binding, Stem Cells metabolism, Mammals metabolism, Argonaute Proteins genetics, Argonaute Proteins metabolism, MicroRNAs genetics, MicroRNAs metabolism
- Abstract
The RNA-binding protein TRIM71/LIN-41 is a phylogenetically conserved developmental regulator that functions in mammalian stem cell reprogramming, brain development, and cancer. TRIM71 recognizes target mRNAs through hairpin motifs and silences them through molecular mechanisms that await identification. Here, we uncover that TRIM71 represses its targets through RNA-supported interaction with TNRC6/GW182, a core component of the miRNA-induced silencing complex (miRISC). We demonstrate that AGO2, TRIM71, and UPF1 each recruit TNRC6 to specific sets of transcripts to silence them. As cellular TNRC6 levels are limiting, competition occurs among the silencing pathways, such that the loss of AGO proteins or of AGO binding to TNRC6 enhances the activities of the other pathways. We conclude that a miRNA-like silencing activity is shared among different mRNA silencing pathways and that the use of TNRC6 as a central hub provides a means to integrate their activities., Competing Interests: Declaration of interests Several authors are or were employees of Novartis Pharma AG as listed in their affiliations and may own stock in the company., (Copyright © 2023 The Author(s). Published by Elsevier Inc. All rights reserved.)
- Published
- 2023
- Full Text
- View/download PDF
25. Conserved stromal-immune cell circuits secure B cell homeostasis and function.
- Author
-
Lütge M, De Martin A, Gil-Cruz C, Perez-Shibayama C, Stanossek Y, Onder L, Cheng HW, Kurz L, Cadosch N, Soneson C, Robinson MD, Stoeckli SJ, Ludewig B, and Pikor NB
- Subjects
- Mice, Humans, Animals, Immunity, Humoral, Dendritic Cells, Follicular, Homeostasis, B-Lymphocytes, Stromal Cells
- Abstract
B cell zone reticular cells (BRCs) form stable microenvironments that direct efficient humoral immunity with B cell priming and memory maintenance being orchestrated across lymphoid organs. However, a comprehensive understanding of systemic humoral immunity is hampered by the lack of knowledge of global BRC sustenance, function and major pathways controlling BRC-immune cell interactions. Here we dissected the BRC landscape and immune cell interactome in human and murine lymphoid organs. In addition to the major BRC subsets underpinning the follicle, including follicular dendritic cells, PI16+ RCs were present across organs and species. As well as BRC-produced niche factors, immune cell-driven BRC differentiation and activation programs governed the convergence of shared BRC subsets, overwriting tissue-specific gene signatures. Our data reveal that a canonical set of immune cell-provided cues enforce bidirectional signaling programs that sustain functional BRC niches across lymphoid organs and species, thereby securing efficient humoral immunity., (© 2023. The Author(s).)
- Published
- 2023
- Full Text
- View/download PDF
26. mutscan-a flexible R package for efficient end-to-end analysis of multiplexed assays of variant effect data.
- Author
-
Soneson C, Bendel AM, Diss G, and Stadler MB
- Subjects
- Workflow, Software, High-Throughput Nucleotide Sequencing
- Abstract
Multiplexed assays of variant effect (MAVE) experimentally measure the effect of large numbers of sequence variants by selective enrichment of sequences with desirable properties followed by quantification by sequencing. mutscan is an R package for flexible analysis of such experiments, covering the entire workflow from raw reads up to statistical analysis and visualization. The core components are implemented in C++ for efficiency. Various experimental designs are supported, including single or paired reads with optional unique molecular identifiers. To find variants with changed relative abundance, mutscan employs established statistical models provided in the edgeR and limma packages. mutscan is available from https://github.com/fmicompbio/mutscan., (© 2023. The Author(s).)
- Published
- 2023
- Full Text
- View/download PDF
27. Meta-analysis of (single-cell method) benchmarks reveals the need for extensibility and interoperability.
- Author
-
Sonrel A, Luetge A, Soneson C, Mallona I, Germain PL, Knyazev S, Gilis J, Gerber R, Seurinck R, Paul D, Sonder E, Crowell HL, Fanaswala I, Al-Ajami A, Heidari E, Schmeing S, Milosavljevic S, Saeys Y, Mangul S, and Robinson MD
- Subjects
- Workflow, Computational Biology methods, Benchmarking
- Abstract
Computational methods represent the lifeblood of modern molecular biology. Benchmarking is important for all methods, but with a focus here on computational methods, benchmarking is critical to dissect important steps of analysis pipelines, formally assess performance across common situations as well as edge cases, and ultimately guide users on what tools to use. Benchmarking can also be important for community building and advancing methods in a principled way. We conducted a meta-analysis of recent single-cell benchmarks to summarize the scope, extensibility, and neutrality, as well as technical features and whether best practices in open data and reproducible research were followed. The results highlight that while benchmarks often make code available and are in principle reproducible, they remain difficult to extend, for example, as new methods and new ways to assess methods emerge. In addition, embracing containerization and workflow systems would enhance reusability of intermediate benchmarking results, thus also driving wider adoption., (© 2023. The Author(s).)
- Published
- 2023
- Full Text
- View/download PDF
28. The shaky foundations of simulating single-cell RNA sequencing data.
- Author
-
Crowell HL, Morillo Leonardo SX, Soneson C, and Robinson MD
- Subjects
- Computer Simulation, Cluster Analysis, Sequence Analysis, RNA methods, Gene Expression Profiling methods, Single-Cell Analysis methods, Benchmarking
- Abstract
Background: With the emergence of hundreds of single-cell RNA-sequencing (scRNA-seq) datasets, the number of computational tools to analyze aspects of the generated data has grown rapidly. As a result, there is a recurring need to demonstrate whether newly developed methods are truly performant, on their own as well as in comparison to existing tools. Benchmark studies aim to consolidate the space of available methods for a given task and often use simulated data that provide a ground truth for evaluations, thus demanding a high quality standard to make results credible and transferable to real data., Results: Here, we evaluated methods for synthetic scRNA-seq data generation in their ability to mimic experimental data. Besides comparing gene- and cell-level quality control summaries in both one- and two-dimensional settings, we further quantified these at the batch- and cluster-level. Secondly, we investigate the effect of simulators on clustering and batch correction method comparisons, and, thirdly, which and to what extent quality control summaries can capture reference-simulation similarity., Conclusions: Our results suggest that most simulators are unable to accommodate complex designs without introducing artificial effects, they yield over-optimistic performance of integration and potentially unreliable ranking of clustering methods, and it is generally unknown which summaries are important to ensure effective simulation-based method comparisons., (© 2023. The Author(s).)
- Published
- 2023
- Full Text
- View/download PDF
29. Understanding and evaluating ambiguity in single-cell and single-nucleus RNA-sequencing.
- Author
-
He D, Soneson C, and Patro R
- Abstract
Recently, a new modification has been proposed by Hjörleifsson and Sullivan et al. to the model used to classify the splicing status of reads (as spliced (mature), unspliced (nascent), or ambiguous) in single-cell and single-nucleus RNA-seq data. Here, we evaluate both the theoretical basis and practical implementation of the proposed method. The proposed method is highly conservative, and therefore, unlikely to mischaracterize reads as spliced (mature) or unspliced (nascent) when they are not. However, we find that it leaves a large fraction of reads classified as ambiguous, and, in practice, allocates these ambiguous reads in an all-or-nothing manner, and differently between single-cell and single-nucleus RNA-seq data. Further, as implemented in practice, the ambiguous classification is implicit and based on the index against which the reads are mapped, which leads to several drawbacks compared to methods that consider both spliced (mature) and unspliced (nascent) mapping targets simultaneously - for example, the ability to use confidently assigned reads to rescue ambiguous reads based on shared UMIs and gene targets. Nonetheless, we show that these conservative assignment rules can be obtained directly in existing approaches simply by altering the set of targets that are indexed. To this end, we introduce the spliceu reference and show that its use with alevin-fry recapitulates the more conservative proposed classification. We also observe that, on experimental data, and under the proposed allocation rules for ambiguous UMIs, the difference between the proposed classification scheme and existing conventions appears much smaller than previously reported. We demonstrate the use of the new piscem index for mapping simultaneously against spliced (mature) and unspliced (nascent) targets, allowing classification against the full nascent and mature transcriptome in human or mouse in <3GB of memory. Finally, we discuss the potential of incorporating probabilistic evidence into the inference of splicing status, and suggest that it may provide benefits beyond what can be obtained from discrete classification of UMIs as splicing-ambiguous.
- Published
- 2023
- Full Text
- View/download PDF
30. A Phylogenetic Framework to Simulate Synthetic Interspecies RNA-Seq Data.
- Author
-
Bastide P, Soneson C, Stern DB, Lespinet O, and Gallopin M
- Subjects
- RNA-Seq, Phylogeny, Sequence Analysis, RNA methods, Software, Gene Expression Profiling methods
- Abstract
Interspecies RNA-Seq datasets are increasingly common, and have the potential to answer new questions about the evolution of gene expression. Single-species differential expression analysis is now a well-studied problem that benefits from sound statistical methods. Extensive reviews on biological or synthetic datasets have provided the community with a clear picture on the relative performances of the available methods in various settings. However, synthetic dataset simulation tools are still missing in the interspecies gene expression context. In this work, we develop and implement a new simulation framework. This tool builds on both the RNA-Seq and the phylogenetic comparative methods literatures to generate realistic count datasets, while taking into account the phylogenetic relationships between the samples. We illustrate the usefulness of this new framework through a targeted simulation study, that reproduces the features of a recently published dataset, containing gene expression data in adult eye tissue across blind and sighted freshwater crayfish species. Using our simulated datasets, we perform a fair comparison of several approaches used for differential expression analysis. This benchmark reveals some of the strengths and weaknesses of both the classical and phylogenetic approaches for interspecies differential expression analysis, and allows for a reanalysis of the crayfish dataset. The tool has been integrated in the R package compcodeR, freely available on Bioconductor., (© The Author(s) 2022. Published by Oxford University Press on behalf of Society for Molecular Biology and Evolution.)
- Published
- 2023
- Full Text
- View/download PDF
31. monaLisa: an R/Bioconductor package for identifying regulatory motifs.
- Author
-
Machlab D, Burger L, Soneson C, Rijli FM, Schübeler D, and Stadler MB
- Subjects
- Software, Transcription Factors
- Abstract
Summary: Proteins binding to specific nucleotide sequences, such as transcription factors, play key roles in the regulation of gene expression. Their binding can be indirectly observed via associated changes in transcription, chromatin accessibility, DNA methylation and histone modifications. Identifying candidate factors that are responsible for these observed experimental changes is critical to understand the underlying biological processes. Here, we present monaLisa, an R/Bioconductor package that implements approaches to identify relevant transcription factors from experimental data. The package can be easily integrated with other Bioconductor packages and enables seamless motif analyses without any software dependencies outside of R., Availability and Implementation: monaLisa is implemented in R and available on Bioconductor at https://bioconductor.org/packages/monaLisa with the development version hosted on GitHub at https://github.com/fmicompbio/monaLisa., Supplementary Information: Supplementary data are available at Bioinformatics online., (© The Author(s) 2022. Published by Oxford University Press.)
- Published
- 2022
- Full Text
- View/download PDF
32. Hand2 delineates mesothelium progenitors and is reactivated in mesothelioma.
- Author
-
Prummel KD, Crowell HL, Nieuwenhuize S, Brombacher EC, Daetwyler S, Soneson C, Kresoja-Rakic J, Kocere A, Ronner M, Ernst A, Labbaf Z, Clouthier DE, Firulli AB, Sánchez-Iranzo H, Naganathan SR, O'Rourke R, Raz E, Mercader N, Burger A, Felley-Bosco E, Huisken J, Robinson MD, and Mosimann C
- Subjects
- Animals, Basic Helix-Loop-Helix Transcription Factors genetics, Basic Helix-Loop-Helix Transcription Factors metabolism, Epithelium metabolism, Mice, Transcription Factors metabolism, Zebrafish Proteins genetics, Zebrafish Proteins metabolism, Mesothelioma genetics, Zebrafish
- Abstract
The mesothelium lines body cavities and surrounds internal organs, widely contributing to homeostasis and regeneration. Mesothelium disruptions cause visceral anomalies and mesothelioma tumors. Nonetheless, the embryonic emergence of mesothelia remains incompletely understood. Here, we track mesothelial origins in the lateral plate mesoderm (LPM) using zebrafish. Single-cell transcriptomics uncovers a post-gastrulation gene expression signature centered on hand2 in distinct LPM progenitor cells. We map mesothelial progenitors to lateral-most, hand2-expressing LPM and confirm conservation in mouse. Time-lapse imaging of zebrafish hand2 reporter embryos captures mesothelium formation including pericardium, visceral, and parietal peritoneum. We find primordial germ cells migrate with the forming mesothelium as ventral migration boundary. Functionally, hand2 loss disrupts mesothelium formation with reduced progenitor cells and perturbed migration. In mouse and human mesothelioma, we document expression of LPM-associated transcription factors including Hand2, suggesting re-initiation of a developmental program. Our data connects mesothelium development to Hand2, expanding our understanding of mesothelial pathologies., (© 2022. The Author(s).)
- Published
- 2022
- Full Text
- View/download PDF
33. Alevin-fry unlocks rapid, accurate and memory-frugal quantification of single-cell RNA-seq data.
- Author
-
He D, Zakeri M, Sarkar H, Soneson C, Srivastava A, and Patro R
- Subjects
- RNA, Small Nuclear, RNA-Seq, Sequence Analysis, RNA methods, Software, Gene Expression Profiling methods, Single-Cell Analysis methods
- Abstract
The rapid growth of high-throughput single-cell and single-nucleus RNA-sequencing (scRNA-seq and snRNA-seq) technologies has produced a wealth of data over the past few years. The size, volume and distinctive characteristics of these data necessitate the development of new computational methods to accurately and efficiently quantify sc/snRNA-seq data into count matrices that constitute the input to downstream analyses. We introduce the alevin-fry framework for quantifying sc/snRNA-seq data. In addition to being faster and more memory frugal than other accurate quantification approaches, alevin-fry ameliorates the memory scalability and false-positive expression issues that are exhibited by other lightweight tools. We demonstrate how alevin-fry can be effectively used to quantify sc/snRNA-seq data, and also how the spliced and unspliced molecule quantification required as input for RNA velocity analyses can be seamlessly extracted from the same preprocessed data used to generate normal gene expression count matrices., (© 2022. The Author(s), under exclusive licence to Springer Nature America, Inc.)
- Published
- 2022
- Full Text
- View/download PDF
34. Mass cytometric and transcriptomic profiling of epithelial-mesenchymal transitions in human mammary cell lines.
- Author
-
Wagner J, Masek M, Jacobs A, Soneson C, Sivapatham S, Damond N, de Souza N, Robinson MD, and Bodenmiller B
- Subjects
- Cell Line, Female, Gene Expression Profiling, Humans, Breast Neoplasms genetics, Epithelial-Mesenchymal Transition genetics, Transcriptome
- Abstract
Epithelial-mesenchymal transition (EMT) equips breast cancer cells for metastasis and treatment resistance. However, detection, inhibition, and elimination of EMT-undergoing cells is challenging due to the intrinsic heterogeneity of cancer cells and the phenotypic diversity of EMT programs. We comprehensively profiled EMT transition phenotypes in four non-cancerous human mammary epithelial cell lines using a flow cytometry surface marker screen, RNA sequencing, and mass cytometry. EMT was induced in the HMLE and MCF10A cell lines and in the HMLE-Twist-ER and HMLE-Snail-ER cell lines by prolonged exposure to TGFβ1 or 4-hydroxytamoxifen, respectively. Each cell line exhibited a spectrum of EMT transition phenotypes, which we compared to the steady-state phenotypes of fifteen luminal, HER2-positive, and basal breast cancer cell lines. Our data provide multiparametric insights at single-cell level into the phenotypic diversity of EMT at different time points and in four human cellular models. These insights are valuable to better understand the complexity of EMT, to compare EMT transitions between the cellular models used here, and for the design of EMT time course experiments., (© 2022. The Author(s).)
- Published
- 2022
- Full Text
- View/download PDF
35. treeclimbR pinpoints the data-dependent resolution of hierarchical hypotheses.
- Author
-
Huang R, Soneson C, Germain PL, Schmidt TSB, Mering CV, and Robinson MD
- Subjects
- Animals, Bacteria genetics, Blood Pressure genetics, Cerebral Cortex metabolism, Computer Simulation, Databases, Genetic, Gene Expression Regulation, Humans, Infant, Newborn, Mice, MicroRNAs genetics, MicroRNAs metabolism, Phylogeny, Single-Cell Analysis, Algorithms, Models, Genetic
- Abstract
treeclimbR is for analyzing hierarchical trees of entities, such as phylogenies or cell types, at different resolutions. It proposes multiple candidates that capture the latent signal and pinpoints branches or leaves that contain features of interest, in a data-driven way. It outperforms currently available methods on synthetic data, and we highlight the approach on various applications, including microbiome and microRNA surveys as well as single-cell cytometry and RNA-seq datasets. With the emergence of various multi-resolution genomic datasets, treeclimbR provides a thorough inspection on entities across resolutions and gives additional flexibility to uncover biological associations.
- Published
- 2021
- Full Text
- View/download PDF
36. CellMixS: quantifying and visualizing batch effects in single-cell RNA-seq data.
- Author
-
Lütge A, Zyprych-Walczak J, Brykczynska Kunzmann U, Crowell HL, Calini D, Malhotra D, Soneson C, and Robinson MD
- Subjects
- Algorithms, Artifacts, Base Sequence genetics, Data Analysis, Gene Expression Profiling methods, Humans, RNA-Seq methods, Software, Exome Sequencing methods, Sequence Analysis methods, Sequence Analysis, RNA methods, Single-Cell Analysis methods
- Abstract
A key challenge in single-cell RNA-sequencing (scRNA-seq) data analysis is batch effects that can obscure the biological signal of interest. Although there are various tools and methods to correct for batch effects, their performance can vary. Therefore, it is important to understand how batch effects manifest to adjust for them. Here, we systematically explore batch effects across various scRNA-seq datasets according to magnitude, cell type specificity, and complexity. We developed a cell-specific mixing score (cms) that quantifies mixing of cells from multiple batches. By considering distance distributions, the score is able to detect local batch bias as well as differentiate between unbalanced batches and systematic differences between cells of the same cell type. We compare metrics in scRNA-seq data using real and synthetic datasets and whereas these metrics target the same question and are used interchangeably, we find differences in scalability, sensitivity, and ability to handle differentially abundant cell types. We find that cell-specific metrics outperform cell type-specific and global metrics and recommend them for both method benchmarks and batch exploration., (© 2021 Lütge et al.)
- Published
- 2021
- Full Text
- View/download PDF
37. A unique bipartite Polycomb signature regulates stimulus-response transcription during development.
- Author
-
Kitazawa T, Machlab D, Joshi O, Maiorano N, Kohler H, Ducret S, Kessler S, Gezelius H, Soneson C, Papasaikas P, López-Bendito G, Stadler MB, and Rijli FM
- Subjects
- Animals, Chromatin metabolism, Embryonic Stem Cells physiology, Enhancer of Zeste Homolog 2 Protein genetics, Enhancer of Zeste Homolog 2 Protein metabolism, Epigenesis, Genetic, Histones metabolism, Mice, Transgenic, Mutation, Polycomb-Group Proteins metabolism, Promoter Regions, Genetic, RNA Polymerase II genetics, RNA, Messenger genetics, RNA, Messenger metabolism, Rhombencephalon drug effects, Rhombencephalon embryology, Sensory Receptor Cells physiology, Chromatin genetics, Gene Expression Regulation, Developmental, Genes, Immediate-Early, Polycomb-Group Proteins genetics
- Abstract
Rapid cellular responses to environmental stimuli are fundamental for development and maturation. Immediate early genes can be transcriptionally induced within minutes in response to a variety of signals. How their induction levels are regulated and their untimely activation by spurious signals prevented during development is poorly understood. We found that in developing sensory neurons, before perinatal sensory-activity-dependent induction, immediate early genes are embedded into a unique bipartite Polycomb chromatin signature, carrying active H3K27ac on promoters but repressive Ezh2-dependent H3K27me3 on gene bodies. This bipartite signature is widely present in developing cell types, including embryonic stem cells. Polycomb marking of gene bodies inhibits mRNA elongation, dampening productive transcription, while still allowing for fast stimulus-dependent mark removal and bipartite gene induction. We reveal a developmental epigenetic mechanism regulating the rapidity and amplitude of the transcriptional response to relevant stimuli, while preventing inappropriate activation of stimulus-response genes.
- Published
- 2021
- Full Text
- View/download PDF
38. Preprocessing choices affect RNA velocity results for droplet scRNA-seq data.
- Author
-
Soneson C, Srivastava A, Patro R, and Stadler MB
- Subjects
- Algorithms, Animals, Databases, Genetic, Mice, Single-Cell Analysis methods, Computational Biology methods, Gene Expression Profiling methods, RNA, Messenger analysis, RNA, Messenger genetics, RNA, Messenger metabolism, RNA, Small Cytoplasmic analysis, RNA, Small Cytoplasmic genetics, RNA, Small Cytoplasmic metabolism, Sequence Analysis, RNA methods
- Abstract
Experimental single-cell approaches are becoming widely used for many purposes, including investigation of the dynamic behaviour of developing biological systems. Consequently, a large number of computational methods for extracting dynamic information from such data have been developed. One example is RNA velocity analysis, in which spliced and unspliced RNA abundances are jointly modeled in order to infer a 'direction of change' and thereby a future state for each cell in the gene expression space. Naturally, the accuracy and interpretability of the inferred RNA velocities depend crucially on the correctness of the estimated abundances. Here, we systematically compare five widely used quantification tools, in total yielding thirteen different quantification approaches, in terms of their estimates of spliced and unspliced RNA abundances in five experimental droplet scRNA-seq data sets. We show that there are substantial differences between the quantifications obtained from different tools, and identify typical genes for which such discrepancies are observed. We further show that these abundance differences propagate to the downstream analysis, and can have a large effect on estimated velocities as well as the biological interpretation. Our results highlight that abundance quantification is a crucial aspect of the RNA velocity analysis workflow, and that both the definition of the genomic features of interest and the quantification algorithm itself require careful consideration., Competing Interests: I have read the journal’s policy and the authors of this manuscript have the following competing interests: R.P. is a co-founder of Ocean Genomics Inc.
- Published
- 2021
- Full Text
- View/download PDF
39. MiR-CLIP reveals iso-miR selective regulation in the miR-124 targetome.
- Author
-
Wang Y, Soneson C, Malinowska AL, Laski A, Ghosh S, Kanitz A, Gebert LFR, Robinson MD, and Hall J
- Subjects
- 3' Untranslated Regions genetics, Amino Acid Motifs, Argonaute Proteins chemistry, Base Sequence, Binding Sites, Biotin, Cross-Linking Reagents pharmacology, DNA, Complementary genetics, GTP-Binding Proteins genetics, HEK293 Cells, Humans, Immunoprecipitation, MicroRNAs antagonists & inhibitors, Nuclear Proteins genetics, Nucleic Acid Conformation, Photochemistry, Sequence Analysis, DNA, Streptavidin, Trioxsalen radiation effects, Argonaute Proteins metabolism, Gene Expression Regulation, MicroRNAs genetics, Models, Genetic
- Abstract
Many microRNAs regulate gene expression via atypical mechanisms, which are difficult to discern using native cross-linking methods. To ascertain the scope of non-canonical miRNA targeting, methods are needed that identify all targets of a given miRNA. We designed a new class of miR-CLIP probe, whereby psoralen is conjugated to the 3p arm of a pre-microRNA to capture targetomes of miR-124 and miR-132 in HEK293T cells. Processing of pre-miR-124 yields miR-124 and a 5'-extended isoform, iso-miR-124. Using miR-CLIP, we identified overlapping targetomes from both isoforms. From a set of 16 targets, 13 were differently inhibited at mRNA/protein levels by the isoforms. Moreover, delivery of pre-miR-124 into cells repressed these targets more strongly than individual treatments with miR-124 and iso-miR-124, suggesting that isomirs from one pre-miRNA may function synergistically. By mining the miR-CLIP targetome, we identified nine G-bulged target-sites that are regulated at the protein level by miR-124 but not isomiR-124. Using structural data, we propose a model involving AGO2 helix-7 that suggests why only miR-124 can engage these sites. In summary, access to the miR-124 targetome via miR-CLIP revealed for the first time how heterogeneous processing of miRNAs combined with non-canonical targeting mechanisms expand the regulatory range of a miRNA., (© The Author(s) 2020. Published by Oxford University Press on behalf of Nucleic Acids Research.)
- Published
- 2021
- Full Text
- View/download PDF
40. muscat detects subpopulation-specific state transitions from multi-sample multi-condition single-cell transcriptomics data.
- Author
-
Crowell HL, Soneson C, Germain PL, Calini D, Collin L, Raposo C, Malhotra D, and Robinson MD
- Subjects
- Animals, Cerebellar Cortex drug effects, Cerebellar Cortex metabolism, Cluster Analysis, Computational Biology, Computer Simulation, Lipopolysaccharides adverse effects, Male, Mice, Models, Statistical, RNA, Small Cytoplasmic, Signal Transduction, Software, Gene Expression Profiling methods, Sequence Analysis, RNA methods, Single-Cell Analysis methods, Transcriptome
- Abstract
Single-cell RNA sequencing (scRNA-seq) has become an empowering technology to profile the transcriptomes of individual cells on a large scale. Early analyses of differential expression have aimed at identifying differences between subpopulations to identify subpopulation markers. More generally, such methods compare expression levels across sets of cells, thus leading to cross-condition analyses. Given the emergence of replicated multi-condition scRNA-seq datasets, an area of increasing focus is making sample-level inferences, termed here as differential state analysis; however, it is not clear which statistical framework best handles this situation. Here, we surveyed methods to perform cross-condition differential state analyses, including cell-level mixed models and methods based on aggregated pseudobulk data. To evaluate method performance, we developed a flexible simulation that mimics multi-sample scRNA-seq data. We analyzed scRNA-seq data from mouse cortex cells to uncover subpopulation-specific responses to lipopolysaccharide treatment, and provide robust tools for multi-condition analysis within the muscat R package.
- Published
- 2020
- Full Text
- View/download PDF
41. TreeSummarizedExperiment: a S4 class for data with hierarchical structure.
- Author
-
Huang R, Soneson C, Ernst FGM, Rue-Albrecht KC, Yu G, Hicks SC, and Robinson MD
- Subjects
- Phylogeny, Software
- Abstract
Data organized into hierarchical structures (e.g., phylogenies or cell types) arises in several biological fields. It is therefore of interest to have data containers that store the hierarchical structure together with the biological profile data, and provide functions to easily access or manipulate data at different resolutions. Here, we present TreeSummarizedExperiment, an R/S4 class that extends the commonly used SingleCellExperiment class by incorporating tree representations of rows and/or columns (represented by objects of the phylo class). It follows the convention of the SummarizedExperiment class, while providing links between the assays and the nodes of a tree to allow data manipulation at arbitrary levels of the tree. The package is designed to be extensible, allowing new functions on the tree (phylo) to be contributed. As the work is based on the SingleCellExperiment class and the phylo class, both of which are popular classes used in many R packages, it is expected to be able to interact seamlessly with many other tools. Competing Interests: No competing interests were disclosed. (Copyright: © 2021 Huang R et al.)
- Published
- 2020
- Full Text
- View/download PDF
42. Alignment and mapping methodology influence transcript abundance estimation.
- Author
-
Srivastava A, Malik L, Sarkar H, Zakeri M, Almodaresi F, Soneson C, Love MI, Kingsford C, and Patro R
- Subjects
- Algorithms, Animals, Gene Expression Profiling, Mice, Sequence Analysis, RNA, Transcriptome, Chromosome Mapping methods, Sequence Alignment methods
- Abstract
Background: The accuracy of transcript quantification using RNA-seq data depends on many factors, such as the choice of alignment or mapping method and the quantification model being adopted. While the choice of quantification model has been shown to be important, considerably less attention has been given to comparing the effect of various read alignment approaches on quantification accuracy. Results: We investigate the influence of mapping and alignment on the accuracy of transcript quantification in both simulated and experimental data, as well as the effect on subsequent differential expression analysis. We observe that, even when the quantification model itself is held fixed, the effect of choosing a different alignment methodology, or aligning reads using different parameters, on quantification estimates can sometimes be large and can affect downstream differential expression analyses as well. These effects can go unnoticed when assessment is focused too heavily on simulated data, where the alignment task is often simpler than in experimentally acquired samples. We also introduce a new alignment methodology, called selective alignment, to overcome the shortcomings of lightweight approaches without incurring the computational cost of traditional alignment. Conclusion: We observe that, on experimental datasets, the performance of lightweight mapping and alignment-based approaches varies significantly, and highlight some of the underlying factors. We show this variation both in terms of quantification and downstream differential expression analysis. In all comparisons, we also show the improved performance of our proposed selective alignment method and suggest best practices for performing RNA-seq quantification.
- Published
- 2020
- Full Text
- View/download PDF
43. ExploreModelMatrix: Interactive exploration for improved understanding of design matrices and linear models in R.
- Author
-
Soneson C, Marini F, Geier F, Love MI, and Stadler MB
- Subjects
- Learning, Linear Models, Software
- Abstract
Linear and generalized linear models are used extensively in many scientific fields, to model observed data and as the basis for hypothesis tests. The use of such models requires specification of a design matrix, and subsequent formulation of contrasts representing scientific hypotheses of interest. Proper execution of these steps requires a thorough understanding of the meaning of the individual coefficients, and is a frequent source of uncertainty for end users. Here, we present an R/Bioconductor package, ExploreModelMatrix, which enables interactive exploration of design matrices and linear model diagnostics. Given a sample annotation table and a desired design formula, the package displays how the model coefficients are combined to give the fitted values for each combination of predictor variables, which allows users to both extract the interpretation of each individual coefficient, and formulate desired linear contrasts. In addition, the interactive interface displays informative characteristics for the regular linear model corresponding to the provided design, such as variance inflation factors and the pseudoinverse of the design matrix. Competing Interests: No competing interests were disclosed. (Copyright: © 2020 Soneson C et al.)
- Published
- 2020
- Full Text
- View/download PDF
44. Tximeta: Reference sequence checksums for provenance identification in RNA-seq.
- Author
-
Love MI, Soneson C, Hickey PF, Johnson LK, Pierce NT, Shepherd L, Morgan M, and Patro R
- Subjects
- Algorithms, Animals, Drosophila melanogaster, Genomics, Humans, Mice, Models, Statistical, Pattern Recognition, Automated, Programming Languages, Reproducibility of Results, Software, Transcriptome, Computational Biology methods, Gene Expression Profiling, RNA-Seq
- Abstract
Correct annotation metadata is critical for reproducible and accurate RNA-seq analysis. When files are shared publicly or among collaborators with incorrect or missing annotation metadata, it becomes difficult or impossible to reproduce bioinformatic analyses from raw data. It also makes it more difficult to locate the transcriptomic features, such as transcripts or genes, in their proper genomic context, which is necessary for overlapping expression data with other datasets. We provide a solution in the form of an R/Bioconductor package tximeta that performs numerous annotation and metadata gathering tasks automatically on behalf of users during the import of transcript quantification files. The correct reference transcriptome is identified via a hashed checksum stored in the quantification output, and key transcript databases are downloaded and cached locally. The computational paradigm of automatically adding annotation metadata based on reference sequence checksums can greatly facilitate genomic workflows, by helping to reduce overhead during bioinformatic analyses, preventing costly bioinformatic mistakes, and promoting computational reproducibility. The tximeta package is available at https://bioconductor.org/packages/tximeta. Competing Interests: I have read the journal’s policy and the authors of this manuscript have the following competing interests: RP is a co-founder of Ocean Genomics.
- Published
- 2020
- Full Text
- View/download PDF
45. Orchestrating single-cell analysis with Bioconductor.
- Author
-
Amezquita RA, Lun ATL, Becht E, Carey VJ, Carpp LN, Geistlinger L, Marini F, Rue-Albrecht K, Risso D, Soneson C, Waldron L, Pagès H, Smith ML, Huber W, Morgan M, Gottardo R, and Hicks SC
- Subjects
- Gene Expression Profiling, Genome, High-Throughput Nucleotide Sequencing, Software, Single-Cell Analysis methods
- Abstract
Recent technological advancements have enabled the profiling of a large number of genome-wide features in individual cells. However, single-cell data present unique challenges that require the development of specialized methods and software infrastructure to successfully derive biological insights. The Bioconductor project has rapidly grown to meet these demands, hosting community-developed open-source software distributed as R packages. Featuring state-of-the-art computational methods, standardized data infrastructure and interactive data visualization tools, we present an overview and online book (https://osca.bioconductor.org) of single-cell methods for prospective users.
- Published
- 2020
- Full Text
- View/download PDF
46. Publisher Correction: Orchestrating single-cell analysis with Bioconductor.
- Author
-
Amezquita RA, Lun ATL, Becht E, Carey VJ, Carpp LN, Geistlinger L, Marini F, Rue-Albrecht K, Risso D, Soneson C, Waldron L, Pagès H, Smith ML, Huber W, Morgan M, Gottardo R, and Hicks SC
- Abstract
An amendment to this paper has been published and can be accessed via a link at the top of the paper.
- Published
- 2020
- Full Text
- View/download PDF
47. HDCytoData: Collection of high-dimensional cytometry benchmark datasets in Bioconductor object formats.
- Author
-
Weber LM and Soneson C
- Subjects
- Cluster Analysis, Datasets as Topic, Humans, Computational Biology standards, Cytological Techniques standards, Software
- Abstract
Benchmarking is a crucial step during computational analysis and method development. Recently, a number of new methods have been developed for analyzing high-dimensional cytometry data. However, it can be difficult for analysts and developers to find and access well-characterized benchmark datasets. Here, we present HDCytoData, a Bioconductor package providing streamlined access to several publicly available high-dimensional cytometry benchmark datasets. The package is designed to be extensible, allowing new datasets to be contributed by ourselves or other researchers in the future. Currently, the package includes a set of experimental and semi-simulated datasets, which have been used in our previous work to evaluate methods for clustering and differential analyses. Datasets are formatted into standard SummarizedExperiment and flowSet Bioconductor object formats, which include complete metadata within the objects. Access is provided through Bioconductor's ExperimentHub interface. The package is freely available from http://bioconductor.org/packages/HDCytoData. Competing Interests: No competing interests were disclosed. (Copyright: © 2019 Weber LM and Soneson C.)
- Published
- 2019
- Full Text
- View/download PDF
48. A comprehensive examination of Nanopore native RNA sequencing for characterization of complex transcriptomes.
- Author
-
Soneson C, Yao Y, Bratus-Neuenschwander A, Patrignani A, Robinson MD, and Hussain S
- Subjects
- Base Sequence genetics, Cell Line, DNA, Complementary genetics, HEK293 Cells, Humans, Poly A genetics, High-Throughput Nucleotide Sequencing methods, Sequence Analysis, RNA methods, Transcriptome genetics
- Abstract
A platform for highly parallel direct sequencing of native RNA strands was recently described by Oxford Nanopore Technologies, but despite initial efforts it remains crucial to further investigate the technology for quantification of complex transcriptomes. Here we undertake native RNA sequencing of polyA+ RNA from two human cell lines, analysing ~5.2 million aligned native RNA reads. To enable informative comparisons, we also perform relevant ONT direct cDNA- and Illumina-sequencing. We find that while native RNA sequencing does enable some of the anticipated advantages, key unexpected aspects currently hamper its performance, most notably the quite frequent inability to obtain full-length transcripts from single reads, as well as difficulties to unambiguously infer their true transcript of origin. While characterising issues that need to be addressed when investigating more complex transcriptomes, our study highlights that with some defined improvements, native RNA sequencing could be an important addition to the mammalian transcriptomics toolbox.
- Published
- 2019
- Full Text
- View/download PDF
49. ARMOR: An Automated Reproducible MOdular Workflow for Preprocessing and Differential Analysis of RNA-seq Data.
- Author
-
Orjuela S, Huang R, Hembach KM, Robinson MD, and Soneson C
- Subjects
- Databases, Genetic, High-Throughput Nucleotide Sequencing, Workflow, Computational Biology methods, Sequence Analysis, RNA methods, Software
- Abstract
The extensive generation of RNA sequencing (RNA-seq) data in the last decade has resulted in a myriad of specialized software for its analysis. Each software module typically targets a specific step within the analysis pipeline, making it necessary to join several of them to get a single cohesive workflow. Multiple software programs automating this procedure have been proposed, but often lack modularity, transparency or flexibility. We present ARMOR, which performs an end-to-end RNA-seq data analysis, from raw read files, via quality checks, alignment and quantification, to differential expression testing, geneset analysis and browser-based exploration of the data. ARMOR is implemented using the Snakemake workflow management system and leverages conda environments; Bioconductor objects are generated to facilitate downstream analysis, ensuring seamless integration with many R packages. The workflow is easily implemented by cloning the GitHub repository, replacing the supplied input and reference files and editing a configuration file. Although we have selected the tools currently included in ARMOR, the setup is modular and alternative tools can be easily integrated. (Copyright © 2019 Orjuela et al.)
- Published
- 2019
- Full Text
- View/download PDF
50. Essential guidelines for computational method benchmarking.
- Author
-
Weber LM, Saelens W, Cannoodt R, Soneson C, Hapfelmeier A, Gardner PP, Boulesteix AL, Saeys Y, and Robinson MD
- Subjects
- Benchmarking, Datasets as Topic, Publishing, Research Design, Software, Computational Biology standards, Guidelines as Topic
- Abstract
In computational biology and other sciences, researchers are frequently faced with a choice between several computational methods for performing data analyses. Benchmarking studies aim to rigorously compare the performance of different methods using well-characterized benchmark datasets, to determine the strengths of each method or to provide recommendations regarding suitable choices of methods for an analysis. However, benchmarking studies must be carefully designed and implemented to provide accurate, unbiased, and informative results. Here, we summarize key practical guidelines and recommendations for performing high-quality benchmarking analyses, based on our experiences in computational biology.
- Published
- 2019
- Full Text
- View/download PDF