Europhys. Lett., 57 (5), pp. 759-764 (2002)
Keyword detection in natural languages and DNA
M. Ortuño1, P. Carpena2, P. Bernaola-Galván2, E. Muñoz3 and A. M. Somoza1
1 Departamento de Física, Universidad de Murcia - Murcia, Spain
2 Departamento de Física Aplicada II, ETSI de Telecomunicación, Universidad de Málaga - Málaga, Spain
3 Facultad de Documentación, Universidad de Murcia - Murcia, Spain
(Received 20 July 2001; accepted in final form 30 November 2001)
Abstract
We show that words in a text exhibit long-range frequency
fluctuations caused by a strong self-attraction, which is directly related
to the relevance of the word to the text considered.
The standard deviation of the distance between successive occurrences
of a word is an excellent parameter to quantify this self-attraction,
and provides us with an effective tool for automatic keyword extraction.
DNA sequences show the same features: "words", for example
codons in the coding part of the sequences, also exhibit this self-attraction.
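
As an illustration of the parameter described in the abstract, the following minimal Python sketch scores each word by the standard deviation of the distances between its successive occurrences. Several details are assumptions made for this example and are not stated in the abstract: the function name spacing_scores, the simple regular-expression tokenizer, the min_count cutoff, and the division of the standard deviation by the mean distance so that frequent and rare words can be compared on the same scale.

```python
import re
from collections import defaultdict
from statistics import mean, pstdev


def spacing_scores(text, min_count=3):
    """Score words by the spread of the gaps between successive occurrences.

    Assumption (not from the abstract): the standard deviation is divided
    by the mean gap so that words of different frequency are comparable.
    """
    # Hypothetical tokenizer: lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())

    # Record the position (word index) of every occurrence of each word.
    positions = defaultdict(list)
    for index, word in enumerate(words):
        positions[word].append(index)

    scores = {}
    for word, pos in positions.items():
        # Words with too few occurrences give unreliable gap statistics.
        if len(pos) < min_count:
            continue
        # Distances between successive occurrences of the same word.
        gaps = [later - earlier for earlier, later in zip(pos, pos[1:])]
        scores[word] = pstdev(gaps) / mean(gaps)
    return scores


def top_keywords(text, n=10, min_count=3):
    """Return the n words with the largest normalized spacing deviation."""
    scores = spacing_scores(text, min_count=min_count)
    return sorted(scores, key=scores.get, reverse=True)[:n]
```

In this sketch, a word whose occurrences are evenly spaced receives a score near zero, while a word whose occurrences cluster together (short gaps inside clusters, long gaps between them) receives a large score and is ranked as a keyword candidate.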
89.20.-a - Interdisciplinary applications of physics.
89.70.+c - Information science.
© EDP Sciences 2002

