GSCtool: A Novel Descriptor that Characterizes the Genome for Applying Machine Learning in Genomics

Authors :
Zijie Shen
Enhui Shen
Qian-Hao Zhu
Longjiang Fan
Quan Zou
Chu-Yu Ye
Source :
Advanced Intelligent Systems, Vol 5, Iss 12, Pp n/a-n/a (2023)
Publication Year :
2023
Publisher :
Wiley, 2023.

Abstract

Machine learning (ML) is one of the core driving forces for the next breeding stage, Breeding 4.0. A genotype matrix based on single‐nucleotide polymorphisms (SNPs) is often used in ML for genome‐to‐phenotype prediction. The genotype matrix has an inherent defect: the feature spaces it generates across different individuals or groups are inconsistent, which hinders the application of ML. To overcome this challenge, a genome descriptor, the Genic SNPs Composition Tool (GSCtool), is developed. It counts the number of SNPs in each gene of the genome, so the dimension of the feature vectors equals the number of annotated genes in a species. Compared with using the genotype matrix, using GSCtool significantly decreases model training time and yields higher accuracy in phenotype prediction. GSCtool also achieves good performance in variety identification, which is useful in crop variety protection. In general, GSCtool will help facilitate the application and study of genomic ML. The source code and test data of GSCtool are freely available at https://github.com/SZJhacker/GSCtool and https://gitee.com/shenzijie/GSCtool.
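
The per-gene counting idea described in the abstract can be illustrated with a minimal Python sketch. This is not GSCtool's code or interface. The function name `genic_snp_counts`, the tuple layouts, and the toy gene and SNP coordinates are all hypothetical, and the sketch assumes 1-based inclusive gene intervals and a pre-extracted list of variant positions per sample. It only shows why the resulting feature vector has the same length for every sample: one entry per annotated gene, regardless of which SNPs a sample carries.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def genic_snp_counts(genes, snps):
    """Count SNPs falling inside each annotated gene (illustrative only).

    genes: list of (gene_id, chrom, start, end), 1-based inclusive (assumed).
    snps:  iterable of (chrom, pos) variant positions for one sample (assumed).
    Returns a list of counts in the same order as `genes`.
    """
    # Group SNP positions by chromosome and sort them for binary search.
    by_chrom = defaultdict(list)
    for chrom, pos in snps:
        by_chrom[chrom].append(pos)
    for positions in by_chrom.values():
        positions.sort()

    counts = []
    for _gene_id, chrom, start, end in genes:
        positions = by_chrom.get(chrom, [])
        # Number of positions p with start <= p <= end.
        counts.append(bisect_right(positions, end) - bisect_left(positions, start))
    return counts


# Hypothetical annotation shared by all samples of a species.
genes = [
    ("geneA", "chr1", 100, 200),
    ("geneB", "chr1", 500, 900),
    ("geneC", "chr2", 50, 80),
]

# Two hypothetical samples with different SNP sets.
sample_1 = [("chr1", 150), ("chr1", 199), ("chr1", 700), ("chr1", 950)]
sample_2 = [("chr2", 60)]

vec_1 = genic_snp_counts(genes, sample_1)
vec_2 = genic_snp_counts(genes, sample_2)
print(vec_1, vec_2)
```

Both vectors have three entries because the annotation lists three genes, even though the two samples share no SNP positions. A SNP-indexed genotype matrix built separately for each sample would not have matching columns.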

Details

Language :
English
ISSN :
26404567
Volume :
5
Issue :
12
Database :
Directory of Open Access Journals
Journal :
Advanced Intelligent Systems
Publication Type :
Academic Journal
Accession number :
edsdoj.833d710b2de44a22ad0b79ff6e27be02
Document Type :
article
Full Text :
https://doi.org/10.1002/aisy.202300426