1. Systematic assessment of long-read RNA-seq methods for transcript identification and quantification
- Author
Pardo-Palacios, Francisco J, Wang, Dingjie, Reese, Fairlie, Diekhans, Mark, Carbonell-Sala, Sílvia, Williams, Brian, Loveland, Jane E, De María, Maite, Adams, Matthew S, Balderrama-Gutierrez, Gabriela, Behera, Amit K, Gonzalez Martinez, Jose M, Hunt, Toby, Lagarde, Julien, Liang, Cindy E, Li, Haoran, Meade, Marcus Jerryd, Moraga Amador, David A, Prjibelski, Andrey D, Birol, Inanc, Bostan, Hamed, Brooks, Ashley M, Çelik, Muhammed Hasan, Chen, Ying, Du, Mei RM, Felton, Colette, Göke, Jonathan, Hafezqorani, Saber, Herwig, Ralf, Kawaji, Hideya, Lee, Joseph, Li, Jian-Liang, Lienhard, Matthias, Mikheenko, Alla, Mulligan, Dennis, Nip, Ka Ming, Pertea, Mihaela, Ritchie, Matthew E, Sim, Andre D, Tang, Alison D, Wan, Yuk Kei, Wang, Changqing, Wong, Brandon Y, Yang, Chen, Barnes, If, Berry, Andrew E, Capella-Gutierrez, Salvador, Cousineau, Alyssa, Dhillon, Namrita, Fernandez-Gonzalez, Jose M, Ferrández-Peral, Luis, Garcia-Reyero, Natàlia, Götz, Stefan, Hernández-Ferrer, Carles, Kondratova, Liudmyla, Liu, Tianyuan, Martinez-Martin, Alessandra, Menor, Carlos, Mestre-Tomás, Jorge, Mudge, Jonathan M, Panayotova, Nedka G, Paniagua, Alejandro, Repchevsky, Dmitry, Ren, Xingjie, Rouchka, Eric, Saint-John, Brandon, Sapena, Enrique, Sheynkman, Leon, Smith, Melissa Laird, Suner, Marie-Marthe, Takahashi, Hazuki, Youngworth, Ingrid A, Carninci, Piero, Denslow, Nancy D, Guigó, Roderic, Hunter, Margaret E, Maehr, Rene, Shen, Yin, Tilgner, Hagen U, Wold, Barbara J, Vollmers, Christopher, Frankish, Adam, Au, Kin Fai, Sheynkman, Gloria M, Mortazavi, Ali, Conesa, Ana, and Brooks, Angela N
- Subjects
Biological Sciences, Bioinformatics and Computational Biology, Genetics, Biotechnology, Human Genome, Generic health relevance, Humans, Animals, Mice, RNA-Seq, Gene Expression Profiling, Transcriptome, Sequence Analysis, RNA, Molecular Sequence Annotation, Technology, Medical and Health Sciences, Developmental Biology, Biological sciences
- Abstract
The Long-read RNA-Seq Genome Annotation Assessment Project Consortium was formed to evaluate the effectiveness of long-read approaches for transcriptome analysis. Using different protocols and sequencing platforms, the consortium generated over 427 million long-read sequences from complementary DNA and direct RNA datasets, encompassing human, mouse and manatee species. Developers utilized these data to address challenges in transcript isoform detection, quantification and de novo transcript detection. The study revealed that libraries with longer, more accurate sequences produce more accurate transcripts than those with increased read depth, whereas greater read depth improved quantification accuracy. In well-annotated genomes, tools based on reference sequences demonstrated the best performance. Incorporating additional orthogonal data and replicate samples is advised when aiming to detect rare and novel transcripts or using reference-free approaches. This collaborative study offers a benchmark for current practices and provides direction for future method development in transcriptome analysis.
- Published
2024