| Issue | EPL, Volume 151, Number 6, September 2025 |
|---|---|
| Article Number | 62001 |
| Number of page(s) | 7 |
| Section | Mathematical and interdisciplinary physics |
| DOI | https://doi.org/10.1209/0295-5075/adfa3e |
| Published online | 05 September 2025 |
On the class of coding optimality of human languages and the origins of Zipf's law
Quantitative, Mathematical and Computational Linguistics Research Group, Department of Computer Science, Universitat Politècnica de Catalunya - 08034 Barcelona, Catalonia, Spain
Received: 4 June 2025
Accepted: 11 August 2025
Abstract
Here we present a new class of optimality for coding systems. Members of that class are displaced linearly from optimal coding and thus exhibit Zipf's law, namely a power-law distribution of frequency ranks. Within that class, Zipf's law, the size-rank law and the size-probability law form a group-like structure. We identify human languages that are members of the class. All languages showing sufficient agreement with Zipf's law are potential members of the class. In contrast, some communication systems of other species cannot be members of that class because they exhibit an exponential distribution instead, although dolphins and humpback whales might be members. We provide new insight into plots of frequency vs. rank in double logarithmic scale. For any system, a straight line in that scale indicates that the lengths of optimal codes under non-singular coding and under uniquely decodable encoding are displaced by a linear function whose slope is the exponent of Zipf's law. For systems under compression and constrained to be uniquely decodable, such a straight line may indicate that the system is coding close to optimality. We provide support for the hypothesis that Zipf's law originates from compression and define testable conditions for the emergence of Zipf's law in compressing systems.
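The double-logarithmic argument in the abstract can be illustrated numerically. The Python sketch below is not the authors' method; it is a minimal illustration under stated assumptions: a binary code alphabet, idealized (non-integer) uniquely decodable lengths -log2 p_i in the Shannon sense, and log2 i as a continuous proxy for the optimal non-singular length of the type of rank i. All helper names (`zipf_probs`, `exponential_probs`, `nonsingular_length`, `fit_slope`, `shannon_vs_log_rank_slope`, `local_slope`) are hypothetical. Under Zipf's law, p_i proportional to i^(-alpha), one has -log2 p_i = alpha log2 i + log2 Z, so the ideal uniquely decodable length is a linear function of log rank with slope alpha. Under an exponential distribution, the local slope instead grows with rank, so no straight line appears.

```python
import math


def zipf_probs(n, alpha):
    """Zipf's law: p_i proportional to i**(-alpha) for ranks i = 1..n."""
    weights = [i ** (-alpha) for i in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def exponential_probs(n, lam):
    """Exponential rank distribution: p_i proportional to exp(-lam * i)."""
    weights = [math.exp(-lam * i) for i in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def nonsingular_length(i):
    """Optimal binary non-singular code length for rank i (empty word excluded).

    Assumption: the most frequent types get the shortest strings, so ranks 1-2
    get length 1, ranks 3-6 length 2, ranks 7-14 length 3, and so on. This is
    floor(log2(i + 1)), computed exactly with integer bit lengths.
    """
    return (i + 1).bit_length() - 1


def fit_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def shannon_vs_log_rank_slope(probs):
    """Slope of the ideal uniquely decodable length -log2 p_i against log2 i."""
    xs = [math.log2(i) for i in range(1, len(probs) + 1)]
    ys = [-math.log2(p) for p in probs]
    return fit_slope(xs, ys)


def local_slope(probs, i):
    """Increase in -log2 p when the rank doubles from i to 2i (log2 rank grows by 1)."""
    return math.log2(probs[i - 1]) - math.log2(probs[2 * i - 1])


if __name__ == "__main__":
    zipf = zipf_probs(1000, alpha=1.0)
    # Approximately 1.0, the Zipf exponent used to build the distribution.
    print(shannon_vs_log_rank_slope(zipf))
    expo = exponential_probs(200, lam=0.1)
    # The local slope grows with rank, so the log-log curve is not straight.
    print(local_slope(expo, 5), local_slope(expo, 50))
```

For the exponential case the local slope equals lam * i * log2(e), i.e. it is proportional to rank, whereas for Zipf's law it is the constant alpha at every rank. The continuous proxy log2 i is used because the integer non-singular lengths never differ from it by more than one symbol.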
© 2025 EPLA. All rights, including for text and data mining, AI training, and similar technologies, are reserved
