Sequence signatures of nucleosome positioning in Caenorhabditis elegans
Print | Close | Text Size:A A A

Kaifu Chen1,2, Lei Wang1,2, Meng Yang1, Jiucheng Liu1,2, Chengqi Xin1,2, Songnian Hu1, and Jun Yu1

1CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100029, China;
2Graduate University of Chinese Academy of Sciences, Beijing 100049, China.

Genomics Proteomics Bioinformatics 2010 Jun; 8(2): 92-102. DOI: 10.1016/S1672-0229(10)60010-1

Abstract: Our recent investigation in the protist Trichomonas vaginalis suggested a DNA sequence periodicity with a unit length of 120.9 nt, which represents a sequence signature for nucleosome positioning. We now extended our observation in higher eukaryotes and identified a similar periodicity of 175 nt in length in Caenorhabditis elegans. In the process of defining the sequence compositional characteristics, we found that the 10.5-nt periodicity, the sequence signature of DNA double helix, may not be sufficient for cross-nucleosome positioning but provides essential guiding rails to facilitate positioning. We further dissected nucleosome-protected sequences and identified a strong positive purine (AG) gradient from the 5'-end to the 3'-end, and also learnt that the nucleosome-enriched regions are GC-rich as compared to the nucleosome-free sequences as purine content is positively correlated with GC content. Sequence characterization allowed us to develop a hidden Markov model (HMM) algorithm for decoding nucleosome positioning computationally, and based on a set of training data from the fifth chromosome of C. elegans, our algorithm predicted 60%-70% of the well-positioned nucleosomes, which is 15%-20% higher than random positioning. We concluded that nucleosomes are not randomly positioned on DNA sequences and yet bind to different genome regions with variable stability, well-positioned nucleosomes leave sequence signatures on DNA, and statistical positioning of nucleosomes across genome can be decoded computationally based on these sequence signatures.