Volume 84, Number 1, October 2008
Number of page(s): 6
Section: Interdisciplinary Physics and Related Areas of Science and Technology
Published online: 15 September 2008
Identifying short motifs by means of extreme value analysis
Dipartimento di Fisica, Università di Roma “La Sapienza” - p.le Aldo Moro 2, 00185 Rome, Italy, EU
Corresponding author: firstname.lastname@example.org
Accepted: 13 August 2008
Detecting a binding site (a substring of DNA where transcription factors attach) on a long DNA sequence requires recognizing a small pattern against a large background. For short binding sites, the matching probability can display large fluctuations from one putative binding site to another. Here we use a self-consistent statistical procedure that accounts correctly for the large deviations of the matching probability to predict the location of short binding sites. We apply it in two distinct situations: (a) the detection of the binding sites for three specific transcription factors on a set of 134 estrogen-regulated genes; (b) the identification, in a set of 138 possible transcription factors, of the ones binding a specific set of nine genes. In both instances, experimental findings are reproduced (when available) and the number of false positives is significantly reduced with respect to other commonly employed methods.
PACS: 87.10.Vg – Biological information / 02.50.Tt – Inference methods / 87.18.Vf – Systems biology
© EPLA, 2008