The variation of Zipf's law in human language
- Source :
- UPCommons. Portal del coneixement obert de la UPC, Universitat Politècnica de Catalunya (UPC)
- Publication Year :
- 2005
- Publisher :
- Springer Science and Business Media LLC, 2005.
Abstract
- Words in human language follow the so-called Zipf's law. More precisely, the word frequency spectrum follows a power function whose typical exponent is β ≈ 2, although significant variations are found. We hypothesize that the full range of variation reflects our ability to balance the goal of communication, i.e. maximizing information transfer, against the cost of communication imposed by the limitations of the human brain. We show that the more important it is to satisfy the goal of communication, the higher the exponent. Here, assuming that words are used according to their meaning, we explain why variation in β should be limited to a particular domain. On the one hand, we explain a non-trivial lower bound at about β = 1.6 for communication systems that neglect the goal of communication. On the other hand, we find a sudden divergence of β if a certain critical balance is crossed. At the same time, a sharp transition to maximum information transfer, and unfortunately maximum communication cost, is found. Consistent with the upper bound of real exponents, the maximum finite value predicted is about β = 2.4. It is convenient for human language not to cross the transition and to remain in a domain where information transfer is high but comes at a reasonable cost. Therefore, only a particular range of exponents should be found in human speakers. The exponent β thus contains information about the balance between cost and communicative efficiency.
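To make the exponent concrete: if words ranked by frequency obey f(r) ∝ r^(-α) (the rank form of Zipf's law), the frequency spectrum P(f) ∝ f^(-β) satisfies the standard relation β = 1 + 1/α. The typical β ≈ 2 therefore corresponds to α ≈ 1, and the bounds β ≈ 1.6 and β ≈ 2.4 correspond to α ≈ 1.67 and α ≈ 0.71. The Python sketch below converts between the two exponents and gives a rough estimate of β from a token list. The function names, the rank exponent α and the cutoff `f_min` are illustrative assumptions; the estimator is the continuous-approximation maximum-likelihood formula of Clauset, Shalizi and Newman for discrete power laws, swapped in here for illustration and not the method used in the paper.

```python
import math
from collections import Counter
from typing import Iterable


def spectrum_exponent_from_zipf(alpha: float) -> float:
    """Return beta for P(f) ~ f**(-beta) given a rank exponent alpha in f(r) ~ r**(-alpha).

    Uses the standard relation beta = 1 + 1/alpha (alpha > 0).
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return 1.0 + 1.0 / alpha


def zipf_from_spectrum_exponent(beta: float) -> float:
    """Inverse relation alpha = 1 / (beta - 1), valid for beta > 1."""
    if beta <= 1:
        raise ValueError("beta must exceed 1")
    return 1.0 / (beta - 1.0)


def estimate_beta(tokens: Iterable[str], f_min: int = 1) -> float:
    """Rough estimate of the frequency-spectrum exponent beta from a token stream.

    Hypothetical helper: each word type with frequency f >= f_min is treated as one
    sample from P(f) ~ f**(-beta), and beta is estimated with the continuous
    approximation beta = 1 + n / sum(ln(f / (f_min - 0.5))) for discrete data.
    This is an illustrative choice, not the paper's method.
    """
    if f_min < 1:
        raise ValueError("f_min must be at least 1")
    # Frequency of each word type, keeping only types at or above the cutoff.
    freqs = [f for f in Counter(tokens).values() if f >= f_min]
    if not freqs:
        raise ValueError("no word types with frequency >= f_min")
    # Every term is positive because f >= f_min > f_min - 0.5.
    log_sum = sum(math.log(f / (f_min - 0.5)) for f in freqs)
    return 1.0 + len(freqs) / log_sum


if __name__ == "__main__":
    # Map the bounds quoted in the abstract onto rank exponents.
    for beta in (1.6, 2.0, 2.4):
        print(f"beta = {beta}: alpha = {zipf_from_spectrum_exponent(beta):.2f}")
    text = "the cat saw the dog and the dog saw a cat"
    print(f"estimated beta = {estimate_beta(text.split()):.2f}")
```

The toy sentence only exercises the code; an estimate from a handful of tokens says nothing about real exponents. The estimator is also crude at `f_min = 1`, so a careful analysis of a real corpus would choose the cutoff and check the goodness of the power-law fit before reading anything into β.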
- Subjects :
- Information transfer
Information theory
Zipf's law
Computational linguistics
Linguistics
Natural languages
Condensed Matter Physics
Upper and lower bounds
Electronic, Optical and Magnetic Materials
Word lists by frequency
Range (mathematics)
Variation (linguistics)
Large-scale systems
Exponent
Speech
Statistical physics
Computer science::Artificial intelligence::Natural language [UPC subject areas]
Divergence (statistics)
Probability
Mathematics
Details
- ISSN :
- 1434-6036 and 1434-6028
- Volume :
- 44
- Database :
- OpenAIRE
- Journal :
- The European Physical Journal B
- Accession number :
- edsair.doi.dedup.....3df75d5e6916ec4f473deb8367c10269