Volume 98, Number 1, April 2012
Number of page(s): 6
Section: Interdisciplinary Physics and Related Areas of Science and Technology
Published online: 10 April 2012
Unveiling the relationship between complex networks metrics and word senses
Institute of Physics of São Carlos, University of São Paulo - P. O. Box 369, Postal Code 13560-970, São Carlos, São Paulo, Brazil
Accepted: 5 March 2012
The automatic disambiguation of word senses (i.e., the identification of which of the meanings is used in a given context for a word that has multiple meanings) is essential for such applications as machine translation and information retrieval, and represents a key step for developing the so-called Semantic Web. Humans disambiguate words in a straightforward fashion, but this does not apply to computers. In this paper we address the problem of Word Sense Disambiguation (WSD) by treating texts as complex networks, and show that word senses can be distinguished upon characterizing the local structure around ambiguous words. Our goal was not to obtain the best possible disambiguation system, but we nevertheless found that in half of the cases our approach outperforms traditional shallow methods. We show that the hierarchical connectivity and clustering of words are usually the most relevant features for WSD. The results reported here shed light on the relationship between semantic and structural parameters of complex networks. They also indicate that when combined with traditional techniques the complex network approach may be useful to enhance the discrimination of senses in large texts.
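As a rough illustration of what "characterizing the local structure around ambiguous words" can mean in practice, the sketch below builds a word network from a token list and computes two local measurements for a target word. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify how the network is built, so linking consecutive words with undirected edges is an assumption, and the function names `build_adjacency_network`, `clustering_coefficient` and `ring_size` are hypothetical. The number of nodes at a given distance is used here only as a simple proxy for hierarchical connectivity.

```python
from collections import defaultdict
from itertools import combinations


def build_adjacency_network(tokens):
    """Undirected graph linking consecutive tokens (an assumed construction)."""
    graph = defaultdict(set)
    for left, right in zip(tokens, tokens[1:]):
        if left != right:  # ignore self-loops from repeated words
            graph[left].add(right)
            graph[right].add(left)
    return graph


def clustering_coefficient(graph, node):
    """Fraction of pairs of neighbours of `node` that are themselves linked."""
    neighbours = graph[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbours, 2) if b in graph[a])
    return 2.0 * links / (k * (k - 1))


def ring_size(graph, node, distance):
    """Number of nodes exactly `distance` hops away from `node`."""
    seen = {node}
    frontier = {node}
    for _ in range(distance):
        # Expand one hop and drop everything already reached.
        frontier = {n for f in frontier for n in graph[f]} - seen
        seen |= frontier
    return len(frontier)


# Toy text in which "bank" is the ambiguous word.
tokens = "the bank of the river and the bank of the city".split()
network = build_adjacency_network(tokens)
features = (
    clustering_coefficient(network, "bank"),
    ring_size(network, "bank", 1),
    ring_size(network, "bank", 2),
)
print(features)
```

In a disambiguation setting, one such feature vector would be computed for each occurrence context of the ambiguous word and passed to a standard classifier; which classifier and which exact measurements to use are left open here.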
PACS: 89.75.Hc – Networks and genealogical trees / 02.50.Sk – Multivariate analysis / 89.20.Ff – Computer science and technology
© EPLA, 2012