Issue: EPL, Volume 81, Number 2, January 2008
Article Number: 28006
Number of page(s): 5
Section: Interdisciplinary Physics and Related Areas of Science and Technology
DOI: http://dx.doi.org/10.1209/0295-5075/81/28006
Published online: 17 December 2007
EPL, 81 (2008) 28006

Taxonomy and clustering in collaborative systems: The case of the on-line encyclopedia Wikipedia

A. Capocci¹, F. Rao² and G. Caldarelli³,²,⁴

1  Dipartimento di Informatica e Sistemistica, Università "La Sapienza" - via Ariosto, 25 00185 Rome, Italy
2  Centro Studi e Ricerche e Museo della Fisica "E. Fermi" - Compendio Viminale, 00185 Rome, Italy
3  SMC Centre, INFM-CNR, Dipartimento di Fisica, Università "La Sapienza" - P.le A. Moro 2, 00185 Rome, Italy
4  Linkalab, Center for the Study of Complex Networks - 09100 Cagliari, Italy


received 14 June 2007; accepted in final form 13 November 2007; published January 2008

Abstract
In this paper we investigate the relation between imposed classifications and real clustering in a particular scale-free network, that of the on-line encyclopedia Wikipedia. We find that the distributions of community sizes are statistically similar whether the communities are obtained top-down, from the division into categories present in the archive, or bottom-up, from a community detection algorithm based on the spectral properties of the graph. Despite this statistically similar behaviour, the two methods divide the articles rather differently, signaling that the presence of power laws is a general feature of these systems and cannot be used as a benchmark to evaluate the suitability of a clustering method.
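
To make the comparison described above concrete, the following is a minimal sketch on a toy graph, not the algorithm used in the paper. It assumes a simple spectral bisection (splitting nodes by the sign of the Fiedler vector of the graph Laplacian) as a stand-in for the bottom-up procedure, and a hypothetical hand-written labelling as a stand-in for the imposed categories. The graph, the labels, and the helper names `fiedler_vector`, `size_profile` and `rand_index` are all illustrative assumptions.

```python
from collections import Counter
from itertools import combinations


def fiedler_vector(adj, iters=500):
    """Approximate the Laplacian eigenvector of the second-smallest eigenvalue.

    Uses power iteration on (c*I - L) while projecting out the constant
    vector, which is the trivial eigenvector of L with eigenvalue 0.
    """
    n = len(adj)
    deg = [len(adj[i]) for i in range(n)]
    c = 2 * max(deg) + 1  # shift that makes c*I - L positive definite
    x = [i - (n - 1) / 2 for i in range(n)]  # deterministic, zero-mean start
    for _ in range(iters):
        # (c*I - L) x, with (L x)_i = deg_i * x_i - sum of x over neighbours
        y = [(c - deg[i]) * x[i] + sum(x[j] for j in adj[i]) for i in range(n)]
        mean = sum(y) / n
        y = [v - mean for v in y]
        norm = sum(v * v for v in y) ** 0.5
        x = [v / norm for v in y]
    return x


def size_profile(labels):
    """Sorted list of group sizes for a labelling of the nodes."""
    return sorted(Counter(labels).values())


def rand_index(a, b):
    """Fraction of node pairs that two labellings treat the same way."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


# Toy graph (assumed): two 4-cliques joined by the single edge 3-4.
edges = list(combinations(range(4), 2)) + list(combinations(range(4, 8), 2))
edges.append((3, 4))
adj = [[] for _ in range(8)]
for i, j in edges:
    adj[i].append(j)
    adj[j].append(i)

# Bottom-up: split nodes by the sign of the Fiedler vector.
detected = [int(v > 0) for v in fiedler_vector(adj)]
# Top-down: a hypothetical imposed labelling with the same group sizes.
imposed = ["A", "A", "B", "B", "A", "A", "B", "B"]

print(size_profile(detected), size_profile(imposed))
print(rand_index(detected, imposed))
```

In this toy case the two labellings have identical size profiles, yet they disagree on many node pairs. This mirrors the abstract's point that matching size statistics alone do not show that two divisions agree.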

PACS
89.75.Fb - Structures and organization in complex systems.
89.75.Hc - Networks and genealogical trees.
89.75.-k - Complex systems.

© EPLA 2008