The effect of statistical normalisation on network propagation scores

Authors :
Sergio Picart-Armada
Alexandre Perera-Lluna
Wesley K. Thompson
Alfonso Buil
Universitat Politècnica de Catalunya. Doctorat en Enginyeria Biomèdica
Universitat Politècnica de Catalunya. Departament d'Enginyeria de Sistemes, Automàtica i Informàtica Industrial
Universitat Politècnica de Catalunya. B2SLab - Bioinformatics and Biomedical Signals Laboratory
Source :
Bioinformatics, UPCommons. Portal del coneixement obert de la UPC, Universitat Politècnica de Catalunya (UPC)
Publication Year :
2020
Publisher :
Cold Spring Harbor Laboratory, 2020.

Abstract

Motivation: Network diffusion and label propagation are fundamental tools in computational biology, with applications like gene–disease association, protein function prediction and module discovery. More recently, several publications have introduced a permutation analysis after the propagation process, due to concerns that network topology can bias diffusion scores. This raises the question of the statistical properties of such diffusion processes, and of whether they are biased, in each of their applications. In this work, we characterized some common null models behind the permutation analysis and the statistical properties of the diffusion scores. We benchmarked seven diffusion scores on three case studies: synthetic signals on a yeast interactome, simulated differential gene expression on a protein–protein interaction network and prospective gene set prediction on another interaction network. For clarity, all the datasets were based on binary labels, but we also present theoretical results for quantitative labels.

Results: Diffusion scores starting from binary labels were affected by the label codification and exhibited a problem-dependent topological bias that could be removed by statistical normalization. Parametric and non-parametric normalization addressed both points by being codification-independent and by equalizing the bias. We identified and quantified two sources of bias (mean value and variance) that yielded performance differences when normalizing the scores. We provided closed formulae for both and showed how the null covariance is related to the spectral properties of the graph. Although none of the proposed scores systematically outperformed the others, normalization was preferred when the sought positive labels were not aligned with the bias. We conclude that the decision on bias removal should be problem- and data-driven, i.e. based on a quantitative analysis of the bias and its relation to the positive entities.
Availability: The code is publicly available at https://github.com/b2slab/diffuBench and the data underlying this article are available at https://github.com/b2slab/retroData

Supplementary information: Supplementary data are available at Bioinformatics online.
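
To illustrate the parametric and non-parametric normalization that the abstract refers to, the following Python sketch may help. It is not taken from diffuBench. The regularised Laplacian kernel, the toy path graph, the function names, and the null model (every node carries a label and labels are permuted uniformly at random) are all assumptions made for this example. The closed-form moments are the standard finite-population results for a linear statistic under uniform permutation, and may differ in detail from the formulae derived in the paper.

```python
import numpy as np


def regularised_laplacian_kernel(adj):
    """Assumed kernel for this sketch: K = (I + L)^-1, with L the graph Laplacian."""
    lap = np.diag(adj.sum(axis=1)) - adj
    return np.linalg.inv(np.eye(adj.shape[0]) + lap)


def null_moments(kernel, labels):
    """Mean and variance of each node's raw score K @ y under uniform label permutation."""
    n = labels.size
    row_sum = kernel.sum(axis=1)
    row_sq = (kernel ** 2).sum(axis=1)
    mean = labels.mean() * row_sum
    # labels.var() is the population variance (ddof=0) of the label vector.
    var = labels.var() * n / (n - 1) * (row_sq - row_sum ** 2 / n)
    return mean, var


def parametric_z_scores(kernel, labels):
    """Raw scores centred and scaled with the closed-form null moments."""
    mean, var = null_moments(kernel, labels)
    return (kernel @ labels - mean) / np.sqrt(var)


def permutation_z_scores(kernel, labels, n_perm=2000, seed=0):
    """Same normalization, with the null moments estimated by Monte Carlo permutation."""
    rng = np.random.default_rng(seed)
    null = np.stack([kernel @ rng.permutation(labels) for _ in range(n_perm)])
    return (kernel @ labels - null.mean(axis=0)) / null.std(axis=0)


# Hypothetical toy input: a path graph on six nodes, with two positive labels at one end.
adj = np.zeros((6, 6))
for i in range(5):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
labels = np.array([1, 1, 0, 0, 0, 0], dtype=float)

kernel = regularised_laplacian_kernel(adj)
raw = kernel @ labels
z_param = parametric_z_scores(kernel, labels)
z_perm = permutation_z_scores(kernel, labels)
```

Comparing `raw` with `z_param` shows how node ranking can change once the topology-dependent null mean and variance are removed, and `z_perm` should approach `z_param` as `n_perm` grows. With this particular kernel every row sums to one, so the null mean is the same for all nodes and only the variance differs between them; other kernels can also differ in the mean.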

Details

Database :
OpenAIRE
Journal :
Bioinformatics, UPCommons. Portal del coneixement obert de la UPC, Universitat Politècnica de Catalunya (UPC)
Accession number :
edsair.doi.dedup.....7cb6c544891269b612174c34bc0ea334