Europhys. Lett., 57 (5) , pp. 759-764 (2002)
Keyword detection in natural languages and DNA
M. Ortuño1, P. Carpena2, P. Bernaola-Galván2, E. Muñoz3 and A. M. Somoza1
1 Departamento de Física, Universidad de Murcia - Murcia, Spain
2 Departamento de Física Aplicada II, ETSI de Telecomunicación Universidad de Málaga - Málaga, Spain
3 Facultad de Documentación, Universidad de Murcia - Murcia, Spain
(Received 20 July 2001; accepted in final form 30 November 2001)
We show that words in a text present long-range frequency fluctuations due to a strong self-attraction that is directly related to the relevance of the term to the text considered. The standard deviation of the distance between successive occurrences of a word is an excellent parameter to quantify this self-attraction, and provides us with an effective tool for automatic keyword extraction. DNA sequences also present the same features: "words", for example codons in the coding part of the sequences, attract one another.
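As an illustration of the idea in the abstract, the following is a minimal sketch of ranking words by the standard deviation of the distance between their successive occurrences. It is not taken from the paper: the function names `spacing_sigma` and `rank_keywords`, the `min_count` threshold, the regular-expression tokenizer, and the optional division by the mean distance (so that frequent and rare words can be compared on one scale) are all assumptions made here for the example.

```python
import re
from statistics import mean, pstdev


def spacing_sigma(tokens, word, normalize=True):
    """Standard deviation of the gaps between successive occurrences of `word`.

    `normalize=True` divides by the mean gap; this normalization is an
    assumption of this sketch, not something stated in the abstract.
    Returns None when the word occurs too rarely to give at least two gaps.
    """
    positions = [i for i, t in enumerate(tokens) if t == word]
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    if len(gaps) < 2:
        return None
    sigma = pstdev(gaps)
    return sigma / mean(gaps) if normalize else sigma


def rank_keywords(text, min_count=3):
    """Rank words from most clustered (largest score) to most evenly spread."""
    # Hypothetical tokenizer: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    scores = {}
    for word in set(tokens):
        if tokens.count(word) >= min_count:
            score = spacing_sigma(tokens, word)
            if score is not None:
                scores[word] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

In this sketch, a word whose occurrences bunch together in a few passages produces many short gaps and a few very long ones, hence a large score, while a word spread evenly through the text scores close to zero.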
89.20.-a - Interdisciplinary applications of physics.
89.70.+c - Information science.
© EDP Sciences 2002