Volume 88, Number 6, December 2009
Number of page(s): 4
Section: Interdisciplinary Physics and Related Areas of Science and Technology
Published online: 07 January 2010
Relevance is more significant than correlation: Information filtering on sparse data
1 Web Sciences Center, School of Computer Science and Engineering, University of Electronic Science and Technology of China - 610054 Chengdu, China
2 Department of Physics, University of Fribourg - Chemin du Musée 3, CH-1700 Fribourg, Switzerland
3 Lab of Information Economy and Internet Research, University of Electronic Science and Technology of China - 610054 Chengdu, China
4 Department of Modern Physics, University of Science and Technology of China - Hefei 230026, China
Corresponding author: email@example.com
Accepted: 2 December 2009
In some recommender systems, users vote on objects by giving ratings. In these systems, the similarity between users can be quantified by a benchmark index, the Pearson correlation coefficient, which reflects how their ratings are correlated. An alternative is to calculate the similarity based solely on relevance information, namely whether a user has rated an object at all. The former uses more information than the latter and is intuitively expected to give more accurate rating predictions under the standard collaborative filtering framework. However, the extensive experimental analysis reported in this letter shows the opposite: the latter method, which uses only the relevance information, can outperform the former, especially when the data set is sparse. This finding challenges the conventional wisdom on information filtering and suggests some alternatives for addressing the sparsity problem.
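The two similarity notions and the collaborative filtering step can be sketched as follows. This is a minimal illustration under stated assumptions, not the letter's own implementation. The abstract does not give the exact relevance-based index, so the sketch assumes cosine similarity on binary "has rated" vectors. The prediction uses a common mean-centred, user-based CF formula, which is also an assumption. The function names and the toy `ratings` data are hypothetical.

```python
import math
from typing import Callable, Dict

# Hypothetical toy data (not from the letter): user -> {object: rating}.
Ratings = Dict[str, Dict[str, float]]

ratings: Ratings = {
    "alice": {"a": 5, "b": 3, "c": 4},
    "bob": {"a": 4, "b": 2, "d": 5},
    "carol": {"b": 5, "c": 1, "d": 2},
}


def pearson_similarity(r: Ratings, u: str, v: str) -> float:
    """Pearson correlation of u's and v's ratings over their co-rated objects."""
    common = r[u].keys() & r[v].keys()
    if len(common) < 2:
        return 0.0  # correlation is undefined with fewer than two co-rated objects
    mu = sum(r[u][o] for o in common) / len(common)
    mv = sum(r[v][o] for o in common) / len(common)
    num = sum((r[u][o] - mu) * (r[v][o] - mv) for o in common)
    den = math.sqrt(sum((r[u][o] - mu) ** 2 for o in common)) * math.sqrt(
        sum((r[v][o] - mv) ** 2 for o in common)
    )
    return num / den if den > 0 else 0.0


def relevance_similarity(r: Ratings, u: str, v: str) -> float:
    """Assumed relevance index: cosine similarity of binary 'has rated' vectors.

    Only which objects were rated matters; rating values are ignored.
    """
    ku, kv = len(r[u]), len(r[v])
    if ku == 0 or kv == 0:
        return 0.0
    common = len(r[u].keys() & r[v].keys())
    return common / math.sqrt(ku * kv)


def predict(r: Ratings, u: str, obj: str,
            sim: Callable[[Ratings, str, str], float]) -> float:
    """Mean-centred user-based CF prediction of u's rating on obj (assumed form)."""
    mean_u = sum(r[u].values()) / len(r[u])
    num = den = 0.0
    for v, rv in r.items():
        if v == u or obj not in rv:
            continue  # only neighbours who rated obj contribute
        s = sim(r, u, v)
        mean_v = sum(rv.values()) / len(rv)
        num += s * (rv[obj] - mean_v)
        den += abs(s)
    return mean_u + num / den if den > 0 else mean_u


if __name__ == "__main__":
    for name, sim in [("Pearson", pearson_similarity), ("relevance", relevance_similarity)]:
        print(name, round(predict(ratings, "alice", "d", sim), 3))
```

With this toy data, the Pearson-based prediction of alice's rating on `d` is 5.0 and the relevance-based one is about 4.333. The numbers only exercise the code and say nothing about which method is more accurate. The design shows one plausible intuition for the sparsity effect, offered here as an interpretation rather than the letter's argument. When two users share fewer than two rated objects, the Pearson coefficient is undefined (the sketch returns 0), but the relevance index can still be positive.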
PACS: 89.20.Ff – Computer science and technology / 89.75.Hc – Networks and genealogical trees / 89.75.-k – Complex systems
© EPLA, 2009