A novel method for feature selection based on molecular interactive effect network.

Authors :
Zhang, Yanhui
Lin, Xiaohui
Gao, Zhenbo
Bai, Songnan
Source :
Journal of Pharmaceutical & Biomedical Analysis. Sep2022, Vol. 218, pN.PAG-N.PAG. 1p.
Publication Year :
2022

Abstract

Analyzing biological data with molecular interactions taken into account may yield more accurate identification of disease-related biomarkers. In this study, a novel feature selection method based on a molecule (feature) interactive effect network is proposed, denoted Distance Correlation Gain-Network (DCG-Net). In DCG-Net, the distance correlation gain (DCG) is defined to measure the interactive effects between pairs of features with respect to the process of physiological and pathological changes, and is used to infer the molecule interactive effect network. The DCG index is suitable for both discrete and continuous random variables. A greedy search strategy is then developed to find informative modules of interacting features with high statistical dependence on disease outcome. To evaluate its performance, DCG-Net was compared with eight representative feature selection techniques (t-test, ReliefF, SVM-RFE, mRMR, IG-RFE, INDEED, MN-PCC and Dcor-SFS) on ten public datasets. The experimental results showed superior performance of DCG-Net in classification accuracy, sensitivity, and specificity for three different classifiers. Subsequently, DCG-Net was employed to analyze a lung adenocarcinoma metabolomics dataset; the selected metabolites were involved in important pathways and had better discrimination ability. The experiments demonstrate that DCG can effectively detect molecular interactions, and that incorporating molecular interactions helps identify informative biomarkers reflecting the occurrence and development of complex diseases.

• A new method is proposed to extract important information based on feature interactions.
• The distance correlation gain, suitable for continuous and discrete random variables, is defined.
• A greedy search strategy is developed to find the informative modules in the interactive effect network.
• Experiments on the public datasets and the application to the metabolomics data showed the validity of the method.
• The distance correlation gain is used to explore the interactions between features and construct the molecular network.

[ABSTRACT FROM AUTHOR]
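
The abstract does not give the formula for the distance correlation gain, so the following is only a minimal sketch of the standard sample distance correlation (Székely et al.) that the name suggests DCG builds on. It is not the authors' DCG index; the function names `_centered_distances` and `distance_correlation` are illustrative choices, and NumPy is assumed to be available.

```python
import numpy as np


def _centered_distances(a):
    """Pairwise Euclidean distance matrix of the samples, double-centered."""
    a = np.asarray(a, dtype=float)
    if a.ndim == 1:
        a = a[:, None]  # treat a 1-D input as n samples of one variable
    diff = a[:, None, :] - a[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    row_mean = d.mean(axis=1, keepdims=True)
    col_mean = d.mean(axis=0, keepdims=True)
    return d - row_mean - col_mean + d.mean()


def distance_correlation(x, y):
    """Sample distance correlation between x and y (value in [0, 1]).

    x and y hold the same number of samples; each may be 1-D or 2-D
    (samples in rows). Returns 0.0 if either input is constant.
    """
    A = _centered_distances(x)
    B = _centered_distances(y)
    dcov2 = (A * B).mean()    # squared sample distance covariance
    dvar_x = (A * A).mean()   # squared sample distance variance of x
    dvar_y = (B * B).mean()   # squared sample distance variance of y
    denom = np.sqrt(dvar_x * dvar_y)
    if denom == 0:
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / denom))
```

Because the statistic depends only on pairwise distances between samples, the same function accepts a continuous feature and a numerically coded class label, for example `distance_correlation(feature_values, labels)` with labels coded 0/1. This is in line with the abstract's remark that the index suits both discrete and continuous variables, although how the paper turns this quantity into a pairwise "gain" is not stated here.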

Details

Language :
English
ISSN :
07317085
Volume :
218
Database :
Academic Search Index
Journal :
Journal of Pharmaceutical & Biomedical Analysis
Publication Type :
Academic Journal
Accession number :
157498907
Full Text :
https://doi.org/10.1016/j.jpba.2022.114873