| Issue | EPL, Volume 153, Number 2, January 2026 |
|---|---|
| Article Number | 22002 |
| Number of page(s) | 7 |
| Section | Mathematical and interdisciplinary physics |
| DOI | https://doi.org/10.1209/0295-5075/ae397f |
| Published online | 29 January 2026 |
Permutation entropy and statistical complexity in graph-based representations across world languages
1 Centro Multidisciplinario de Física, Universidad Mayor - Huechuraba, Santiago, Chile
2 Universidad Tecnológica del Perú - Lima, Peru
Received: 6 September 2025
Accepted: 16 January 2026
Abstract
We present a complexity-entropy analysis of word co-occurrence networks built from a parallel corpus of 360 languages spanning diverse typological and geographical groups. Each network represents words as nodes and bigram relations as edges. To capture structural organization, we measure permutation entropy and statistical complexity using the Bandt-Pompe ordinal pattern method, and lexical entropy using the Nemenman-Shafee-Bialek estimator. Language networks cluster in a compact region of the complexity-entropy plane, with consistently low values, clearly distinct from randomized Erdős-Rényi baselines. A negative correlation emerges between lexical and permutation entropy: greater vocabulary diversity tends to coincide with more ordered, less random token sequences. This shift suggests that languages expand their lexical space while reinforcing sequential structure. Family-level comparisons highlight systematic differences: Quechuan and Turkic languages combine high lexical entropy with low permutation entropy, Mayan languages show the reverse pattern, and Panoan and Tupian occupy intermediate positions. These results reveal a robust trade-off between lexical diversity and ordinal structure that supports efficient communication.
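For readers unfamiliar with the Bandt-Pompe approach named in the abstract, the following is a minimal sketch of how normalized permutation entropy and Jensen-Shannon statistical complexity are commonly computed for a generic numeric series. It is not the authors' pipeline: the abstract does not say how the co-occurrence networks are mapped to sequences or which embedding dimension is used, so the input series, the parameter `d`, and the function names `ordinal_distribution` and `complexity_entropy` are illustrative assumptions.

```python
import math
from collections import Counter


def ordinal_distribution(series, d=3):
    """Relative frequencies of the ordinal patterns seen in windows of length d.

    Each window is mapped to the permutation of indices that sorts it.
    Ties are broken by position because Python's sort is stable (an assumption
    of this sketch, not something the abstract specifies).
    """
    counts = Counter(
        tuple(sorted(range(d), key=lambda k: series[i + k]))
        for i in range(len(series) - d + 1)
    )
    total = sum(counts.values())
    return [c / total for c in counts.values()]


def shannon(p):
    """Shannon entropy (natural log) of a probability vector, skipping zeros."""
    return -sum(x * math.log(x) for x in p if x > 0)


def complexity_entropy(series, d=3):
    """Return (H, C): normalized permutation entropy and statistical complexity."""
    n = math.factorial(d)                      # number of possible patterns
    p = ordinal_distribution(series, d)
    p = p + [0.0] * (n - len(p))               # pad unobserved patterns with zero
    h = shannon(p) / math.log(n)               # H in [0, 1]

    # Jensen-Shannon divergence between p and the uniform distribution.
    u = 1.0 / n
    mix = [(x + u) / 2 for x in p]
    js = shannon(mix) - shannon(p) / 2 - math.log(n) / 2

    # Largest possible divergence, reached when a single pattern has probability 1.
    js_max = -0.5 * ((n + 1) / n * math.log(n + 1) - 2 * math.log(2 * n) + math.log(n))

    c = (js / js_max) * h                      # C = normalized disequilibrium * H
    return h, c
```

Calling `complexity_entropy(series, d)` on a series yields one point in the complexity-entropy plane. A strictly increasing series produces a single ordinal pattern and therefore zero entropy, while a series whose patterns are equally frequent reaches entropy close to 1 with complexity close to 0; both extremes sit at the bottom of the plane.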
© 2026 EPLA. All rights, including for text and data mining, AI training, and similar technologies, are reserved
