Issue
EPL
Volume 79, Number 3, August 2007
Article Number 38006
Number of page(s) 5
Section Interdisciplinary Physics and Related Areas of Science and Technology
DOI http://dx.doi.org/10.1209/0295-5075/79/38006
Published online 17 July 2007
EPL, 79 (2007) 38006
DOI: 10.1209/0295-5075/79/38006

1-D random landscapes and non-random data series

T. M. A. Fink1, 2, K. Willbrand3 and F. C. S. Brown4

1  Systems Biology and CNRS UMR 144, Institut Curie Paris 75005, France
2  Theory of Condensed Matter, Cavendish Laboratory - Cambridge CB3 0HE, UK
3  Laboratoire de Physique Statistique, Ecole Normale Supérieure - 75005 Paris, France
4  Departement de Mathematique, Ecole Normale Supérieure - 75005 Paris, France


received 4 May 2007; accepted in final form 11 June 2007; published August 2007
published online 17 July 2007

Abstract
We study the simplest random landscape, the curve formed by joining consecutive data points $f_{1},\ldots,f_{N+1}$ with line segments, where the fi are i.i.d. random numbers and $f_{i}\ne f_{j}$. We label each segment increasing (+) or decreasing (-) and call this string of +'s and -'s the up-down signature $\sigma $. We calculate the probability $P(\sigma (f))$ for a random curve and use it to bound the algorithmic information content of f. We show that f can be compressed by $k = \log_2 {1/P(\sigma)} - N $ bits, where k is a universal currency for comparing the amount of pattern in different curves. By applying our results to microarray time series data, we blindly identify regulatory genes.

PACS
89.70.+c - Information theory and communication theory.
87.14.Gg - DNA, RNA.
89.75.-k - Complex systems.

© Europhysics Letters Association 2007