PPMGS: An efficient and effective solution for distributed privacy-preserving semi-supervised learning.

Authors :
Li, Zhi
Li, Chaozhuo
Li, Zhoujun
Weng, Jian
Huang, Feiran
Zhou, Zhibo
Source :
Information Sciences. Sep 2024, Vol. 678, pN.PAG-N.PAG. 1p.
Publication Year :
2024

Abstract

Recently, distributed semi-supervised learning has attracted increasing research attention because of its considerable practical value. A promising distributed semi-supervised learning method should not only achieve strong classification performance but also protect data privacy in distributed settings. Existing approaches typically compute the similarities between data instances using privacy-preserving computations. This paradigm introduces extra computation and heuristic changes to the algorithm, yielding solutions that are both sub-optimal and time-consuming. In current distributed semi-supervised learning, instance similarities are widely used to capture the underlying manifold or to guide label propagation. This paper argues that instance similarities are not necessary, because the structure of data connections can be estimated from coarser-grained information. We propose a Privacy-preserving Mixture-distribution based Graph Smoothing (PPMGS) model for distributed privacy-preserving semi-supervised learning. Our motivation is to construct a graph based on a Gaussian mixture distribution instead of on individual data instances, which better captures the underlying data distribution and improves model efficiency. PPMGS consists of a privacy-preserving expectation-maximization (EM) phase, which estimates the Gaussian mixture distribution describing the input data, and a mixture-distribution-based graph smoothing algorithm, which learns a distribution-based classifier by fitting a few labeled samples. Experimental results show that PPMGS achieves 5% to 10% higher accuracy and macro-F1 than state-of-the-art privacy-preserving semi-supervised learning methods. In terms of efficiency, it reduces time cost by 97% and communication cost by 96% on the most complex dataset. These numerical results demonstrate that our proposal outperforms state-of-the-art baselines in both efficiency and effectiveness. [ABSTRACT FROM AUTHOR]
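The abstract names two stages: a privacy-preserving EM phase that fits a Gaussian mixture across parties, and graph smoothing that works on the mixture rather than on individual instances. The record does not give the authors' protocol, so the following is only a hypothetical Python sketch of the general idea under loudly stated assumptions: 1-D data, a small number of parties, pairwise additive masking as a stand-in for whatever privacy-preserving computation PPMGS actually uses, the Bhattacharyya coefficient as the affinity between mixture components, and the standard label-spreading update F ← (1 − α)Y + αPF. Every function name below (`local_stats`, `masked_shares`, `smooth_component_labels`, and so on) is invented for illustration.

```python
import math
import random

def gauss_pdf(x, mean, var):
    """Density of a 1-D Gaussian N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def responsibilities(x, weights, means, variances):
    """Posterior probability that x belongs to each mixture component."""
    p = [w * gauss_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(p) or 1e-300  # guard against underflow for far-away points
    return [pj / total for pj in p]

def local_stats(data, weights, means, variances):
    """Local E-step: one party reduces its data to [sum r, sum r*x, sum r*x^2] per component."""
    k = len(means)
    s0, s1, s2 = [0.0] * k, [0.0] * k, [0.0] * k
    for x in data:
        for j, r in enumerate(responsibilities(x, weights, means, variances)):
            s0[j] += r
            s1[j] += r * x
            s2[j] += r * x * x
    return s0 + s1 + s2

def masked_shares(vectors, seed):
    """Simulated pairwise additive masking (assumption, not the paper's protocol):
    party i adds mask m_ij and party j subtracts it, so each upload looks random but the
    masks cancel in the sum. A real protocol derives m_ij from a secret shared by i and j;
    one seeded RNG stands in for that here."""
    rng = random.Random(seed)
    shares = [list(v) for v in vectors]
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            for t in range(len(vectors[0])):
                m = rng.uniform(-1e3, 1e3)
                shares[i][t] += m
                shares[j][t] -= m
    return shares

def m_step(agg, k):
    """Server-side M-step computed from aggregated sufficient statistics only."""
    s0, s1, s2 = agg[:k], agg[k:2 * k], agg[2 * k:]
    n = sum(s0)
    means = [s1[j] / s0[j] for j in range(k)]
    variances = [max(s2[j] / s0[j] - means[j] ** 2, 1e-6) for j in range(k)]
    return [s / n for s in s0], means, variances

def distributed_em(parties, init_means, iters=50):
    """EM where the server only ever sees the masked per-party statistics."""
    k = len(init_means)
    weights, means, variances = [1.0 / k] * k, list(init_means), [1.0] * k
    for it in range(iters):
        stats = [local_stats(d, weights, means, variances) for d in parties]
        agg = [sum(col) for col in zip(*masked_shares(stats, seed=it))]
        weights, means, variances = m_step(agg, k)
    return weights, means, variances

def bhattacharyya(m1, v1, m2, v2):
    """Bhattacharyya coefficient between N(m1, v1) and N(m2, v2); equals 1 when identical."""
    return math.sqrt(2 * math.sqrt(v1 * v2) / (v1 + v2)) * math.exp(-(m1 - m2) ** 2 / (4 * (v1 + v2)))

def smooth_component_labels(gmm, labeled, n_classes, alpha=0.5, iters=50):
    """Label spreading on a graph whose nodes are mixture components, not data instances."""
    _, means, variances = gmm
    k = len(means)
    W = [[bhattacharyya(means[a], variances[a], means[b], variances[b]) for b in range(k)]
         for a in range(k)]
    P = [[w / sum(row) for w in row] for row in W]  # row-normalized; self-affinity keeps rows nonzero
    Y = [[0.0] * n_classes for _ in range(k)]
    for x, c in labeled:  # soft-assign each labeled sample to the components
        for j, r in enumerate(responsibilities(x, *gmm)):
            Y[j][c] += r
    F = [row[:] for row in Y]
    for _ in range(iters):  # F <- (1 - alpha) * Y + alpha * P F
        F = [[(1 - alpha) * Y[a][c] + alpha * sum(P[a][b] * F[b][c] for b in range(k))
              for c in range(n_classes)] for a in range(k)]
    return F

def predict(x, gmm, F):
    """Class scores are responsibility-weighted component label scores."""
    r = responsibilities(x, *gmm)
    scores = [sum(r[j] * F[j][c] for j in range(len(r))) for c in range(len(F[0]))]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy demo: two parties, two clusters, one labeled sample per class.
rng = random.Random(42)
party_a = [rng.gauss(-3, 1) for _ in range(100)] + [rng.gauss(3, 1) for _ in range(60)]
party_b = [rng.gauss(-3, 1) for _ in range(40)] + [rng.gauss(3, 1) for _ in range(80)]
gmm = distributed_em([party_a, party_b], init_means=[-1.0, 1.0])
F = smooth_component_labels(gmm, labeled=[(-3.0, 0), (3.0, 1)], n_classes=2)
print([predict(x, gmm, F) for x in (-2.5, 2.8)])
```

In this sketch the server receives only sums of sufficient statistics, so no pairwise instance similarity is ever computed across parties, and the smoothing graph has as many nodes as mixture components rather than data points. That is one plausible reading of why a mixture-based graph could cut time and communication cost; it is not a reconstruction of the paper's algorithm or of its reported numbers.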

Details

Language :
English
ISSN :
0020-0255
Volume :
678
Database :
Academic Search Index
Journal :
Information Sciences
Publication Type :
Periodical
Accession number :
178148228
Full Text :
https://doi.org/10.1016/j.ins.2024.120934