1. Strong evidence of an information-theoretical conservation principle linking all discrete systems
- Author
Gregory W. Warr and Les Hatton
- Subjects
Distribution (number theory); Computer science; proteome; Structure (category theory); 02 engineering and technology; Information theory; 01 natural sciences; Measure (mathematics); Component (UML); 0103 physical sciences; 0202 electrical engineering, electronic engineering, information engineering; music; Statistical physics; lcsh:Science; 010306 general physics; computer; information theory; Multidisciplinary; Zipf's law; software; Rank (computer programming); 020207 software engineering; Statistical mechanics; Computer Science; CoHSI; lcsh:Q; Research Article
- Abstract
Diverse discrete systems share common global properties that lack a unifying theoretical explanation. However, constraining the simplest measure of total information (Hartley–Shannon) in a statistical mechanics framework reveals a principle, the conservation of Hartley–Shannon information (CoHSI), that directly predicts both known and unsuspected common properties of discrete systems, as borne out in the diverse systems of computer software, proteins and music. Discrete systems fall into two categories distinguished by their structure: heterogeneous systems, in which there is a distinguishable order of assembly of the system’s components from an alphabet of unique tokens (e.g. proteins assembled from an alphabet of amino acids), and homogeneous systems, in which unique tokens are simply binned, counted and rank ordered. Heterogeneous systems are characterized by an implicit distribution of component lengths, with a sharp unimodal peak (containing the majority of components) and a power-law tail, whereas homogeneous systems reduce naturally to Zipf’s Law but with a drooping tail in the distribution. We also confirm predictions that very long components are inevitable for heterogeneous systems; that discrete systems can exhibit both heterogeneous and homogeneous behaviour simultaneously; and that in systems with more than one consistent token alphabet (e.g. digital music), the alphabets themselves show a power-law relationship.
- Published
2019
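
The homogeneous case described in the abstract (unique tokens binned, counted and rank ordered) can be illustrated with a minimal Python sketch. It is not the authors' method: the function names `rank_frequency` and `loglog_slope`, the toy token data, and the use of an ordinary least-squares fit on log-log axes are all assumptions made for illustration only. A slope near -1 on such a plot is the usual signature of Zipf's law.

```python
from collections import Counter
import math


def rank_frequency(tokens):
    """Bin and count tokens, then return the counts in descending rank order."""
    counts = Counter(tokens)
    return sorted(counts.values(), reverse=True)


def loglog_slope(frequencies):
    """Least-squares slope of log(frequency) against log(rank).

    Illustrative helper (not from the paper); needs at least two ranks.
    """
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(f) for f in frequencies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Toy data (an assumption): 20 token types whose counts fall off roughly as 1/rank.
tokens = []
for rank in range(1, 21):
    tokens.extend([f"t{rank}"] * (1000 // rank))

freqs = rank_frequency(tokens)
slope = loglog_slope(freqs)
print(freqs[:5], round(slope, 2))
```

On real data such as words in a text or identifiers in source code, the abstract's "drooping tail" would appear as the lowest-ranked frequencies falling below the fitted straight line.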