Compression and the origins of Zipf's law of abbreviation
- Author
Ferrer-i-Cancho, R., Bentz, C., and Seguin, C.
- Subjects
Social and Information Networks (cs.SI); FOS: Computer and information sciences; Computer Science - Computation and Language; Computer Science - Information Theory; Physics - Data Analysis, Statistics and Probability; Information Theory (cs.IT); FOS: Physical sciences; Computer Science - Social and Information Networks; Computation and Language (cs.CL); Data Analysis, Statistics and Probability (physics.data-an)
- Abstract
Languages across the world exhibit Zipf's law of abbreviation, namely that more frequent words tend to be shorter. The generalized version of the law (an inverse relationship between the frequency of a unit and its magnitude) also holds for the behaviours of other species and for the genetic code. The apparent universality of this pattern in human language and its ubiquity in other domains call for a theoretical understanding of its origins. To this end, we generalize the information-theoretic concept of mean code length as a mean energetic cost function over the probability and the magnitude of the types of the repertoire. We show that the minimization of that cost function and a negative correlation between probability and the magnitude of types are intimately related.
- Comment
New results for optimal non-singular coding have been added; some sections have been reorganized.
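A small numerical sketch can make the central claim of the abstract concrete. It assumes a repertoire of n types with probabilities p_i and magnitudes λ_i, and takes the generalized cost to be Λ = Σ p_i λ_i, which reduces to the mean code length when λ_i is a code length. The probabilities are made-up illustrative values. The magnitudes are, by assumption, the lengths of the n shortest distinct non-empty strings over a binary alphabet, i.e. the lengths an optimal non-singular code would use. The helper names (`mean_cost`, `population_cov`, `nonsingular_lengths`) are hypothetical, and this is not the authors' code. A brute-force search over all assignments of those magnitudes to types finds the cheapest one. The check relies on the standard identity Λ = n·cov(p, λ) + mean(λ), which holds because Σ p_i = 1 when the covariance is taken over the n types (dividing by n).

```python
"""Sketch (not the authors' code): the generalized mean code length
Lambda = sum_i p_i * lambda_i, and its link to the covariance between the
probability p_i and the magnitude lambda_i of each type.

Assumptions: a tiny repertoire with made-up probabilities, a fixed multiset of
magnitudes that may be reassigned among types, and covariance over the n types
computed by dividing by n (population covariance).
"""
from itertools import permutations
from statistics import fmean


def mean_cost(p, lam):
    """Hypothetical helper: Lambda = sum of probability times magnitude."""
    return sum(pi * li for pi, li in zip(p, lam))


def population_cov(x, y):
    """Hypothetical helper: covariance over the n types, dividing by n."""
    mx, my = fmean(x), fmean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)


def nonsingular_lengths(n, alphabet_size):
    """Hypothetical helper: lengths of the n shortest distinct non-empty
    strings over an alphabet of the given size, in non-decreasing order."""
    lengths, length, available = [], 1, alphabet_size
    while len(lengths) < n:
        take = min(available, n - len(lengths))
        lengths.extend([length] * take)
        length += 1
        available = alphabet_size ** length
    return lengths


if __name__ == "__main__":
    # Made-up probabilities of 5 types, ranked from most to least frequent.
    p = [0.4, 0.25, 0.15, 0.12, 0.08]
    # Binary alphabet: two strings of length 1, then strings of length 2.
    magnitudes = nonsingular_lengths(len(p), alphabet_size=2)  # [1, 1, 2, 2, 2]

    # Brute force over every way of assigning the magnitudes to the types.
    assignments = list(permutations(magnitudes))
    best = min(assignments, key=lambda lam: mean_cost(p, lam))
    worst = max(assignments, key=lambda lam: mean_cost(p, lam))

    n = len(p)
    for label, lam in (("min", best), ("max", worst)):
        cost = mean_cost(p, lam)
        cov = population_cov(p, lam)
        # Identity: Lambda = n * cov(p, lambda) + mean(lambda), as sum(p) == 1.
        assert abs(cost - (n * cov + fmean(lam))) < 1e-9
        print(f"{label} cost {cost:.2f}  cov {cov:+.4f}  assignment {lam}")
```

When the multiset of magnitudes is fixed, mean(λ) is a constant, so under this identity minimizing Λ over reassignments is the same as making cov(p, λ) as negative as possible. The cheapest assignment gives the shortest magnitudes to the most probable types. Brute force is used here only because n is tiny; sorting magnitudes in ascending order against probabilities in descending order reaches the same minimum.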
- Published
- 2015