Identification of DNase I hypersensitive sites in the human genome by multiple sequence descriptors.
- Author
- Jin, Yan-Ting; Tan, Yang; Gan, Zhong-Hua; Hao, Yu-Duo; Wang, Tian-Yu; Lin, Hao; Tang, Bo
- Subjects
- *FEATURE extraction, *CLASSIFICATION algorithms, *GENETIC transcription regulation, *HUMAN genome, *RANDOM forest algorithms
- Abstract
• Identifying DHSs can help in understanding the mechanisms of disease development and in treating disease.
• A multi-dimensional feature fusion strategy was used to extract features from DNase samples.
• An overall prediction accuracy of 0.859 was achieved, with an AUC value of 0.837.

DNase I hypersensitive sites (DHSs) are chromatin regions that are highly sensitive to the DNase I enzyme. Studying DHSs is crucial for understanding complex transcriptional regulation mechanisms and for localizing cis-regulatory elements (CREs). Numerous studies have indicated that disease-related loci are often enriched in DHS regions, underscoring the importance of identifying DHSs. Although wet-lab experiments exist for DHS identification, they are often labor-intensive, so there is a strong need for computational methods. In this study, we used experimental data to construct a benchmark dataset. Seven feature extraction methods were employed to capture information about human DHSs, and the F-score was applied to filter the features. After comparing the prediction performance of various classification algorithms through five-fold cross-validation, random forest was selected to build the final model. The model achieved an overall prediction accuracy of 0.859 with an AUC value of 0.837. We hope that this model can assist scholars conducting DNase research in identifying these sites. [ABSTRACT FROM AUTHOR]
- Published
- 2024
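
The abstract mentions F-score feature filtering without giving a formula. The sketch below illustrates one commonly used definition (squared distances of the class means from the overall mean, divided by the sum of the within-class sample variances). Whether the authors used exactly this form is an assumption, and the function names `f_score` and `rank_features` and the toy data are hypothetical, chosen only for illustration.

```python
from statistics import mean, variance


def f_score(pos, neg):
    """F-score of a single feature (assumed definition, see lead-in).

    pos / neg hold the feature's values in the positive (DHS) and
    negative (non-DHS) samples; each list needs at least two values.
    """
    overall = mean(pos + neg)
    # Between-class term: squared distance of each class mean from the overall mean.
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    # Within-class term: sum of sample variances (n - 1 denominator).
    within = variance(pos) + variance(neg)
    if within == 0:
        # A feature constant within both classes carries either no signal or a perfect one.
        return 0.0 if between == 0 else float("inf")
    return between / within


def rank_features(X, y):
    """Return (feature indices sorted by decreasing F-score, list of scores)."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(f_score(pos, neg))
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores


# Toy data (invented): feature 0 separates the classes, feature 1 barely does.
X = [[0.9, 0.5], [1.1, 0.4], [0.1, 0.6], [-0.1, 0.5]]
y = [1, 1, 0, 0]
order, scores = rank_features(X, y)
print(order, scores)
```

In a pipeline like the one the abstract describes, the top-ranked features would then be passed to the classifiers compared under five-fold cross-validation; the cutoff for how many features to keep is not stated in the abstract.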