Plastid-LCGbase: A Database for Structural Variation of Plastid Genomes

Plastids are key organelles that carry out photosynthesis and are regarded as important targets for genetic transformation and manipulation. Their structure and number differ from those of other organelles such as mitochondria, and their phenotypes are influenced by both genetic and environmental factors. The plastid genome encodes a considerable number of essential proteins for a variety of biological functions, such as photosynthesis, respiration and translation. Gene number and type vary to some extent among the plastid genomes of different species, although both sequence and function are partly conserved.

Specifically, gene clusters with an operon-like structure have been observed in plastid genomes; the genes in such clusters tend to be co-expressed, which improves transcription and translation efficiency. As a result, these genes tend to be located together in the genome and appear to be maintained by natural selection. To date, existing databases for plastids or chloroplasts have been limited to the annotation of single genomes, and comparative genomic studies have been lacking. Prof. YU Jun of the Beijing Institute of Genomics, Chinese Academy of Sciences, and Dr. WANG Dapeng of the UCL Cancer Institute jointly constructed Plastid-LCGbase, which focuses on the structural variation of genomes and the conservation of gene pairs. The work has been published in Nucleic Acids Research.

This study collects 470 plastid genomes (most of which are chloroplast genomes) and classifies them according to classical taxonomy. Plastid-LCGbase displays the global gene distribution in each genome and the homologous similarity between compared genomes, and shows well-defined variation types, including insertion, deletion, translocation, inversion and rearrangement, at various evolutionary scales. In addition, gene pairs are divided into three categories (“head-to-head”, “head-to-tail” and “tail-to-tail”) and three patterns (“separation”, “overlapping” and “inclusion”), and the variation in distance between neighboring transcription start sites is also taken into account. Furthermore, potential operon structures are identified by concatenating gene pairs that are highly conserved across a large number of genomes.
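
The gene-pair categories and patterns described above can be illustrated with a minimal Python sketch. It is not the database's actual implementation: the Gene record, the function names and the coordinates are hypothetical, and the definitions are the conventional ones assumed here (head-to-head for divergent neighbors, tail-to-tail for convergent neighbors, head-to-tail for neighbors on the same strand). Coordinates are treated as linear, so pairs spanning the origin of a circular plastid genome are not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record; coordinates are 1-based and inclusive."""
    name: str
    start: int   # leftmost coordinate (start < end)
    end: int     # rightmost coordinate
    strand: str  # "+" or "-"


def orientation(a: Gene, b: Gene) -> str:
    """Category of a pair, with `a` lying upstream of `b` (assumed convention)."""
    if a.strand == b.strand:
        return "head-to-tail"
    # Left gene on "-" and right gene on "+": the 5' ends face each other.
    return "head-to-head" if a.strand == "-" else "tail-to-tail"


def pattern(a: Gene, b: Gene) -> str:
    """Positional pattern of a pair, with `a` starting no later than `b`."""
    if b.start > a.end:
        return "separation"   # no shared bases
    if b.end <= a.end:
        return "inclusion"    # b lies entirely within a
    return "overlapping"      # partial overlap


def tss(g: Gene) -> int:
    """Approximate the transcription start site by the gene's 5' end."""
    return g.start if g.strand == "+" else g.end


def classify_pair(g1: Gene, g2: Gene):
    """Return (category, pattern, distance between the two 5' ends)."""
    # Sort by start; on ties put the longer gene first so that the
    # contained gene is always the second one.
    a, b = sorted((g1, g2), key=lambda g: (g.start, -g.end))
    return orientation(a, b), pattern(a, b), abs(tss(b) - tss(a))


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    left = Gene("geneA", 100, 500, "-")
    right = Gene("geneB", 700, 1200, "+")
    print(classify_pair(left, right))
```

Under these assumptions, each neighboring pair receives one category and one pattern, and the distance between 5' ends serves as a simple stand-in for the distance between transcription start sites.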

In summary, this database comprises plastid genome data from a large set of representative species, implements visualization of comparative genomic data, and is expected to provide a powerful platform for exploring the dynamics and conservation rules of plastid genomes in the future.

An overview of Plastid-LCGbase (Image by Dr. WANG Dapeng)