Volume 57, Number 5, March 2002
Page(s): 759-764
Section: Interdisciplinary physics and related areas of science and technology
Published online: 01 September 2002
Keyword detection in natural languages and DNA
1 Departamento de Física, Universidad de Murcia - Murcia, Spain
2 Departamento de Física Aplicada II, ETSI de Telecomunicación, Universidad de Málaga - Málaga, Spain
3 Facultad de Documentación, Universidad de Murcia - Murcia, Spain
Accepted: 30 November 2001
We show that words in a text exhibit long-range frequency fluctuations due to a strong self-attraction, which is directly related to the relevance of the term to the text considered. The standard deviation of the distance between successive occurrences of a word is an excellent parameter for quantifying this self-attraction, and provides an effective tool for automatic keyword extraction. DNA sequences show the same features: “words”, for example codons in the coding part of the sequences, attract one another.
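A minimal Python sketch of the spacing statistic described above, for illustration only. It assumes that the standard deviation of the gaps between successive occurrences is divided by the mean gap, so that words of different frequency can be compared; this normalization, the population (rather than sample) standard deviation, the letter-only tokenizer, the `min_count` threshold and every function name are hypothetical choices, not the authors' code.

```python
import re
import statistics
from collections import defaultdict


def occurrence_positions(tokens):
    """Map each token to the increasing list of indices where it occurs."""
    positions = defaultdict(list)
    for i, tok in enumerate(tokens):
        positions[tok].append(i)
    return positions


def normalized_spacing_sd(positions):
    """Population standard deviation of the gaps between successive
    occurrences, divided by the mean gap (assumed normalization).
    Returns None when fewer than two gaps are available."""
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    if len(gaps) < 2:
        return None
    # Positions are strictly increasing, so the mean gap is at least 1.
    return statistics.pstdev(gaps) / statistics.fmean(gaps)


def score_tokens(tokens, min_count=3):
    """Rank tokens seen at least min_count times, most clustered first."""
    scores = {}
    for tok, pos in occurrence_positions(tokens).items():
        if len(pos) >= min_count:
            s = normalized_spacing_sd(pos)
            if s is not None:
                scores[tok] = s
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def rank_keywords(text, min_count=3):
    """Hypothetical naive tokenizer: runs of Unicode letters, lower-cased."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return score_tokens(tokens, min_count)


def codons(seq, frame=0):
    """Split a DNA string into non-overlapping triplets in one reading frame."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
```

For a word placed at random, the gaps are roughly geometrically distributed and the ratio is then about sqrt(1 - p), with p the word's relative frequency, i.e. close to 1 for rare words; values well above 1 indicate the clustering the abstract calls self-attraction. Passing `codons(seq)` to `score_tokens` applies the same statistic to codons in a chosen reading frame.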
PACS: 89.20.-a – Interdisciplinary applications of physics / 89.70.+c – Information science
© EDP Sciences, 2002