The effect of statistical normalisation on network propagation scores
- Source :
- Bioinformatics, UPCommons. Portal del coneixement obert de la UPC, Universitat Politècnica de Catalunya (UPC)
- Publication Year :
- 2020
- Publisher :
- Cold Spring Harbor Laboratory, 2020.
Abstract
- Motivation: Network diffusion and label propagation are fundamental tools in computational biology, with applications such as gene–disease association, protein function prediction and module discovery. More recently, several publications have introduced a permutation analysis after the propagation process, owing to concerns that network topology can bias diffusion scores. This raises the question of the statistical properties of such diffusion processes, and of the presence of bias, in each of their applications. In this work, we characterized some common null models behind the permutation analysis and the statistical properties of the diffusion scores. We benchmarked seven diffusion scores on three case studies: synthetic signals on a yeast interactome, simulated differential gene expression on a protein–protein interaction network and prospective gene set prediction on another interaction network. For clarity, all the datasets were based on binary labels, but we also present theoretical results for quantitative labels.
- Results: Diffusion scores starting from binary labels were affected by the label codification and exhibited a problem-dependent topological bias that could be removed by statistical normalization. Parametric and non-parametric normalization addressed both points by being codification-independent and by equalizing the bias. We identified and quantified two sources of bias, mean value and variance, that yielded performance differences when normalizing the scores. We provided closed formulae for both and showed how the null covariance is related to the spectral properties of the graph. Although none of the proposed scores systematically outperformed the others, normalization was preferred when the sought positive labels were not aligned with the bias. We conclude that the decision on bias removal should be problem- and data-driven, i.e. based on a quantitative analysis of the bias and its relation to the positive entities.
- Availability: The code is publicly available at https://github.com/b2slab/diffuBench and the data underlying this article are available at https://github.com/b2slab/retroData
- Supplementary information: Supplementary data are available at Bioinformatics online.
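The idea of a parametric normalization of diffusion scores can be illustrated with a small sketch. It is not taken from the article or its repositories; it assumes a regularised Laplacian kernel, binary labels on a subset of labelled nodes, and a null model in which the labels are permuted uniformly among those labelled nodes. Under that assumption the null mean and variance of each node's score follow from standard sampling-without-replacement moments. The function names, the toy graph and the parameter `alpha` are all hypothetical.

```python
import numpy as np

def regularised_laplacian_kernel(adj, alpha=1.0):
    """Graph kernel K = (I + alpha * L)^-1, with L the unnormalised Laplacian.
    The kernel choice is an assumption made for this sketch."""
    lap = np.diag(adj.sum(axis=1)) - adj
    return np.linalg.inv(np.eye(adj.shape[0]) + alpha * lap)

def null_moments(kernel_cols, labels):
    """Mean and variance of each node's raw score when the binary labels
    are permuted uniformly among the labelled nodes.

    kernel_cols: kernel restricted to the labelled columns (n_nodes x n_labelled)
    labels:      0/1 vector over the labelled nodes (length n_labelled)
    """
    n = labels.size
    p = labels.mean()                        # fraction of positives
    row_sum = kernel_cols.sum(axis=1)
    row_sq = (kernel_cols ** 2).sum(axis=1)
    mean = p * row_sum
    var = p * (1.0 - p) / (n - 1) * (n * row_sq - row_sum ** 2)
    return mean, var

def z_scores(kernel_cols, labels):
    """Raw diffusion scores standardised by their permutation-null moments."""
    mean, var = null_moments(kernel_cols, labels)
    return (kernel_cols @ labels - mean) / np.sqrt(var)

# Toy example: a path graph with 6 nodes, 4 of them labelled (hypothetical data).
adj = np.zeros((6, 6))
for i in range(5):
    adj[i, i + 1] = adj[i + 1, i] = 1.0

K = regularised_laplacian_kernel(adj)
labelled = np.array([0, 1, 2, 4])            # indices of labelled nodes
y = np.array([1.0, 0.0, 1.0, 0.0])           # binary labels on those nodes
cols = K[:, labelled]

raw = cols @ y                               # unnormalised diffusion scores
z = z_scores(cols, y)                        # normalised scores
```

In this sketch both the null mean and the null variance differ from node to node because they depend on each node's kernel row, which is one way to see how topology can bias raw scores. A non-parametric variant would estimate the same moments, or empirical p-values, from repeated random permutations of `y`.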
- Subjects :
- Computer science
02 engineering and technology
Biochemistry
Interactome
Diffusion
Computational biology
Protein-protein interaction
0302 clinical medicine
Informàtica [Àrees temàtiques de la UPC]
0202 electrical engineering, electronic engineering, information engineering
Protein function prediction
Prospective Studies
Protein Interaction Maps
Mathematics
Parametric statistics
0303 health sciences
Statistics
Variance (accounting)
Covariance
Original Papers
Graph
Computer Science Applications
Computational Mathematics
Kernel method
Computational Theory and Mathematics
Null (SQL)
Graph (abstract data type)
Network analysis
Mineria de dades
Statistics and Probability
Normalization (statistics)
Network topology
Biologia computacional
03 medical and health sciences
Permutation
Interaction network
020204 information systems
Molecular Biology
Data mining
030304 developmental biology
business.industry
Null (mathematics)
Computational Biology
Proteins
Kernel methods
Pattern recognition
Kernel, Funcions de
Kernel functions
Artificial intelligence
business
030217 neurology & neurosurgery
Details
- Database :
- OpenAIRE
- Journal :
- Bioinformatics, UPCommons. Portal del coneixement obert de la UPC, Universitat Politècnica de Catalunya (UPC)
- Accession number :
- edsair.doi.dedup.....7cb6c544891269b612174c34bc0ea334