Issue Europhys. Lett.
Volume 57, Number 5, March 2002
Page(s) 759 - 764
Section Interdisciplinary physics and related areas of science and technology
DOI http://dx.doi.org/10.1209/epl/i2002-00528-3



Europhys. Lett., 57 (5), pp. 759-764 (2002)

Keyword detection in natural languages and DNA

M. Ortuño¹, P. Carpena², P. Bernaola-Galván², E. Muñoz³ and A. M. Somoza¹

1  Departamento de Física, Universidad de Murcia - Murcia, Spain
2  Departamento de Física Aplicada II, ETSI de Telecomunicación, Universidad de Málaga - Málaga, Spain
3  Facultad de Documentación, Universidad de Murcia - Murcia, Spain

(Received 20 July 2001; accepted in final form 30 November 2001)

Abstract
We show that words in a text present long-range frequency fluctuations due to a strong self-attraction, which is directly related to the relevance of the term to the text considered. The standard deviation of the distance between successive occurrences of a word is an excellent parameter for quantifying this self-attraction, and it provides an effective tool for automatic keyword extraction. DNA sequences present the same features: "words", for example codons in the coding part of the sequences, attract one another.
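The abstract does not give the exact form of the statistic, so the following is only a minimal Python sketch of the idea, not the authors' code. It assumes that σ is the standard deviation of the gaps between successive occurrences divided by their mean gap. It further divides σ by sqrt(1 - p), where p is the word's count divided by the text length. That divisor is the value σ takes for a word scattered at random: the gaps are then geometric, with mean 1/p and variance (1 - p)/p², so their ratio is sqrt(1 - p). A clustered (self-attracting) word should score well above 1, and an evenly spaced word should score near 0. The function names, the tokenizer, and the `min_count` cutoff are all hypothetical choices made for illustration.

```python
import math
import re
from collections import defaultdict


def occurrence_positions(tokens):
    """Map each token to the sorted list of indices where it occurs."""
    positions = defaultdict(list)
    for i, tok in enumerate(tokens):
        positions[tok].append(i)
    return positions


def clustering_sigma(positions):
    """Std. deviation of gaps between successive occurrences divided by the
    mean gap (an assumed normalization). Returns None with fewer than 2 gaps."""
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    if len(gaps) < 2:
        return None
    mean = sum(gaps) / len(gaps)
    var = sum((g - mean) ** 2 for g in gaps) / len(gaps)
    return math.sqrt(var) / mean


def rank_keywords(tokens, min_count=3):
    """Rank tokens by sigma / sqrt(1 - p), most clustered first.
    `min_count` is a hypothetical frequency cutoff, not taken from the paper."""
    n_total = len(tokens)
    scores = {}
    for tok, pos in occurrence_positions(tokens).items():
        if len(pos) < min_count or len(pos) >= n_total:
            continue  # too rare to estimate, or occupies every position
        sigma = clustering_sigma(pos)
        if sigma is None:
            continue
        p = len(pos) / n_total
        # Random placement gives geometric gaps, for which sigma = sqrt(1 - p).
        scores[tok] = sigma / math.sqrt(1 - p)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def words(text):
    """Crude lowercase word tokenizer (illustrative only)."""
    return re.findall(r"[a-z']+", text.lower())


def codons(seq, frame=0):
    """Non-overlapping nucleotide triplets in the given reading frame."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


if __name__ == "__main__":
    # Synthetic text: "a" is evenly spaced, "k" appears in two tight bursts.
    tokens = [f"w{i}" for i in range(100)]
    for i in range(0, 100, 10):
        tokens[i] = "a"
    for i in (1, 2, 3, 51, 52, 53):
        tokens[i] = "k"
    print(rank_keywords(tokens))  # "k" (clustered) ranks above "a" (evenly spaced)
```

For a real text the call would be `rank_keywords(words(text))`. For a DNA sequence, `rank_keywords(codons(seq))` treats the codons of one reading frame as the "words". The sketch does not decide which frame, or which stretch of the sequence, is coding; that is left to the caller.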

PACS
89.20.-a - Interdisciplinary applications of physics.
89.70.+c - Information science.


© EDP Sciences 2002