Identify Key Sequence Features to Improve CRISPR sgRNA Efficacy
- Source :
- IEEE Access, Vol 5, Pp 26582-26590 (2017)
- Publication Year :
- 2017
- Publisher :
- Institute of Electrical and Electronics Engineers (IEEE), 2017.
Abstract
- The CRISPR/Cas9 system is an innovative gene-editing tool in genetic engineering. Although it has enabled several achievements, avoiding off-target effects and improving editing efficacy remain challenging. Previous efforts to evaluate efficacy and design guide RNAs focused mainly on DNA properties. However, some DNA features have not been characterized but can be reflected by protein properties, such as disorder features and sequence conservation. In this paper, we provide a computational framework to identify important features related to CRISPR/Cas9 efficacy, focusing on the properties of the proteins encoded by the target DNA fragments. The maximal-relevance-minimal-redundancy feature selection method was adopted to analyze these features. Incremental feature selection together with a support vector machine was then employed to extract the optimal features, on which an optimal classifier can be constructed. As a result, 152 important features were extracted, and an optimal support vector machine classifier built on them obtained the highest MCC value of 0.355. Finally, a series of detailed biological analyses was performed on the optimal features. The results show that some key factors may differentially affect the binding activity of sgRNAs to their targets. Among them, the disorder status of the target protein sequences was found to be a major factor related to sgRNA efficacy, suggesting that DNA features associated with protein disorder status could also affect CRISPR/Cas9 efficacy.
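The incremental feature selection step and the MCC score named in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the feature ranking is written by hand where an mRMR ranking would go, a leave-one-out nearest-centroid classifier stands in for the support vector machine so that the example needs only the standard library, and the data and the helper names (`mcc`, `loo_nearest_centroid`, `incremental_feature_selection`) are hypothetical.

```python
import math


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def loo_nearest_centroid(X, y, features):
    """Leave-one-out predictions from a nearest-centroid classifier.

    Toy stand-in for the SVM (assumption): any classifier could be used here.
    """
    preds = []
    for i in range(len(X)):
        dist = {}
        for label in (0, 1):
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == label]
            centroid = [sum(r[f] for r in rows) / len(rows) for f in features]
            dist[label] = sum((X[i][f] - c) ** 2 for f, c in zip(features, centroid))
        preds.append(0 if dist[0] <= dist[1] else 1)
    return preds


def incremental_feature_selection(ranked_features, evaluate):
    """Score the top-k ranked features for k = 1..n; return (best_k, scores)."""
    scores = [evaluate(ranked_features[:k]) for k in range(1, len(ranked_features) + 1)]
    best_k = max(range(len(scores)), key=scores.__getitem__) + 1
    return best_k, scores


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 5.0], [0.2, -5.0], [0.1, 4.0], [1.0, -4.0], [0.9, 5.0], [1.1, -5.0]]
y = [0, 0, 0, 1, 1, 1]
ranked = [0, 1]  # hand-assigned ranking standing in for an mRMR ranking

best_k, scores = incremental_feature_selection(
    ranked, lambda feats: mcc(y, loo_nearest_centroid(X, y, feats))
)
print("best number of features:", best_k)
print("MCC for each k:", scores)
```

The loop evaluates nested feature subsets in ranking order and keeps the subset size with the highest MCC, which is the general shape of the procedure the abstract describes when it reports 152 optimal features.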
- Subjects :
- 0301 basic medicine
General Computer Science
Computer science
Feature extraction
Feature selection
Genomics
Computational biology
Bioinformatics
sgRNAs
03 medical and health sciences
chemistry.chemical_compound
Genome editing
CRISPR
General Materials Science
Guide RNA
protein disorder
Cas9
CRISPR/Cas9 system
incremental feature selection
General Engineering
maximal-relevance-minimal-redundancy
Support vector machine
030104 developmental biology
chemistry
lcsh:Electrical engineering. Electronics. Nuclear engineering
Target protein
lcsh:TK1-9971
Classifier (UML)
DNA
Details
- ISSN :
- 2169-3536
- Volume :
- 5
- Database :
- OpenAIRE
- Journal :
- IEEE Access
- Accession number :
- edsair.doi.dedup.....2b4a2ece0aa5d37f048fd3aed86b1198
- Full Text :
- https://doi.org/10.1109/access.2017.2775703