MetaAc4C: A multi-module deep learning framework for accurate prediction of N4-acetylcytidine sites based on pre-trained bidirectional encoder representation and generative adversarial networks.

Authors :: Li, Zutan
Jin, Bingbing
Fang, Jingya
Source :: Genomics. Jan2024, Vol. 116 Issue 1, pN.PAG-N.PAG. 1p.
Publication Year :: 2024
Abstract: N4-acetylcytidine (ac4C) is a highly conserved RNA modification that plays a crucial role in various biological processes. Accurately identifying ac4C sites is essential for understanding their regulatory mechanisms. However, existing experimental techniques for ac4C site identification are limited in cost-effectiveness, and the accuracy of current computational methods leaves room for improvement. In this paper, we present MetaAc4C, a deep learning model that leverages pre-trained bidirectional encoder representations from transformers (BERT). The model is based on a bidirectional long short-term memory (BLSTM) architecture that incorporates an attention mechanism and a residual connection. To address data imbalance, we adapt generative adversarial networks to generate synthetic feature samples. On the independent test set, MetaAc4C surpasses the current state-of-the-art ac4C prediction model, improving ACC, MCC, and AUROC by 2.36%, 4.76%, and 3.11%, respectively, on the unbalanced dataset. On the balanced dataset, MetaAc4C improves ACC, MCC, and AUROC by 2.6%, 5.11%, and 1.01%, respectively. Notably, training on RNA samples augmented with WGAN-GP outperforms the SMOTE oversampling method.
• We provide a new computational framework, termed MetaAc4C, that uses BLSTM to automatically capture local and long-range information from RNA sequences. An attention mechanism and a residual connection are employed to capture the critical information in these sequences.
• MetaAc4C uses a BERT module in the embedding layer to treat the RNA sequence as a sentence and extract the semantic information hidden in it, improving prediction accuracy.
• To address data imbalance, we use a generative adversarial network (WGAN-GP) to generate synthetic feature samples and show that it outperforms traditional feature enhancement methods.
[ABSTRACT FROM AUTHOR]
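The abstract reports its gains in ACC and MCC, which are standard binary-classification metrics computed from the confusion matrix. The following is a minimal sketch of those two metrics using their textbook definitions; the function names and the toy labels are illustrative assumptions and are not taken from the paper or its code.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = ac4C site, 0 = non-site)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """ACC = (TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy, made-up labels for illustration only (not data from the study).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)
print(accuracy(*counts), mcc(*counts))
```

MCC accounts for all four confusion-matrix cells, which is why it is commonly reported alongside ACC when the positive and negative classes are unbalanced.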

Subjects :: *GENERATIVE adversarial networks
*DEEP learning
*LANGUAGE models
*RNA modification & restriction
*NUCLEOTIDE sequence
*AMINO acid sequence

Details

Language :: English
ISSN :: 08887543
Volume :: 116
Issue :: 1
Database :: Academic Search Index
Journal :: Genomics
Publication Type :: Academic Journal
Accession number :: 174875016
Full Text :: https://doi.org/10.1016/j.ygeno.2023.110749
