Back to Search Start Over

Compression of Graphical Structures: Fundamental Limits, Algorithms, and Experiments.

Authors :
Choi, Yongwook
Szpankowski, Wojciech
Source :
IEEE Transactions on Information Theory. Feb2012, Vol. 58 Issue 2, p620-638. 19p.
Publication Year :
2012

Abstract

Information theory traditionally deals with “conventional data,” be it textual data, image, or video data. However, databases of various sorts have come into existence in recent years for storing “unconventional data” including biological data, social data, web data, topographical maps, and medical data. In compressing such data, one must consider two types of information: the information conveyed by the structure itself, and the information conveyed by the data labels implanted in the structure. In this paper, we attempt to address the former problem by studying information of graphical structures (i.e., unlabeled graphs). As the first step, we consider the Erdös–Rényi graphs \cal G(n,p) over n vertices in which edges are added independently and randomly with probability p. We prove that the structural entropy of \cal G(n,p) is $n\choose 2h(p)-\log n!+o(1)=n\choose2h(p)-n\log n+O(n),$ where h(p)=-p\log p-(1-p)\log (1-p) is the entropy rate of a conventional memoryless binary source. Then, we propose a two-stage compression algorithm that asymptotically achieves the structural entropy up to the n\log n term (i.e., the first two leading terms) of the structural entropy. Our algorithm runs either in time O(n^2) in the worst case for any graph or in time O(n+e) on average for graphs generated by \cal G(n,p), where e is the average number of edges. To the best of our knowledge, this is the first provable (asymptotically) optimal graph compressor for Erdös-Rényi graph models. We use combinatorial and analytic techniques such as generating functions, Mellin transform, and poissonization to establish these findings. Our experiments confirm the theoretical results and also show the usefulness of our algorithm for some real-world graphs such as the Internet, biological networks, and social networks. [ABSTRACT FROM PUBLISHER]

Details

Language :
English
ISSN :
00189448
Volume :
58
Issue :
2
Database :
Academic Search Index
Journal :
IEEE Transactions on Information Theory
Publication Type :
Academic Journal
Accession number :
71538930
Full Text :
https://doi.org/10.1109/TIT.2011.2173710