National Genomics Data Center

ZHANG Zhang
zhangzhang(AT)big.ac.cn

Professional Experience
• Executive Director of BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, China, 2016 - Present
• Professor, “100-Talent” Program of the Chinese Academy of Sciences (CAS), Beijing Institute of Genomics, CAS, China, 2011 - Present
• Research Scientist, King Abdullah University of Science and Technology, Kingdom of Saudi Arabia, 2009 - 2011
• Postdoctoral Associate, Yale University, United States of America, 2007 - 2009

Education
• PhD in Computer Science, Institute of Computing Technology, Chinese Academy of Sciences, China, 2007
• MS in Computer Science, Nanjing University of Science and Technology, China, 2004
• BS in Computer Science, Ningxia University, China, 2002

Research Interests
• Big Data Integration and Analytics
• Computational Molecular Evolution

Projects & Resources
• IC4R
• MethBank
• LncRNAWiki
• Database Commons
• KaKs_Calculator

Academic Activities
• Editorial Board Member: Biology Direct (2013 - Present)
• Academic Editor: PLoS ONE (2012 - Present)
• Associate Editor-in-Chief: Genomics, Proteomics & Bioinformatics (2012 - Present)
• Journal Reviewer: Bioinformatics, Biology Direct, BioSystems, BMC Bioinformatics, BMC Evolutionary Biology, BMC Genomics, BMC Plant Biology, BMC Systems Biology, Briefings in Bioinformatics, Chinese Bulletin of Life Sciences, Current Bioinformatics, Database, Evolutionary Bioinformatics, Gene, Genome Biology, Genomics Proteomics & Bioinformatics, In Silico Biology, Integrative Zoology, Journal of Bioinformatics and Computational Biology, Journal of Molecular Evolution, Molecular Biology and Evolution, PLoS ONE, PLoS Pathogens, RNA
• Grant Referee: UK BBSRC
• Executive Committee Member: International Society for Biocuration (2016 - Present)
• Member: Genetics Society of China; International Society for Biocuration

Introduction

Big Data Integration and Analytics

Data Integration
Rapid advances in high-throughput experimental technologies have made biological data grow at an unprecedented, exponential rate. Answering the most important and complex biological questions very often requires integrating diverse data from multiple sources, which in turn requires harnessing collective contributions from the community and building bioinformatics Web APIs for large-scale data integration.

Data Analysis
The fast-growing volume of biological data makes it imperative to develop time-efficient applications for large-scale data analysis. This requires highly efficient computing technologies (e.g., cloud and parallel computing) and lightweight programming environments that make full use of both computing and storage resources.

Data Sharing
Data in the broad sense, including raw data, algorithms, results, pipelines, publications, knowledge, and even connections among people, are growing at an unparalleled pace. Efficient and effective data sharing therefore requires linking researchers all over the world and building scientific social networks.

Computational Molecular Evolution

Modeling Compositional Dynamics
Sequence composition at different levels (e.g., the codon level) reflects the interplay between mutation and selection. Studying sequence composition is of fundamental importance for understanding sequence evolution, because composition is closely related to gene expression, translation speed and/or accuracy, gene function, protein structure, the intrinsic nature of the genetic code, and more.
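
As a minimal illustration of composition at the nucleotide and codon levels, the Python sketch below computes GC content, purine (A+G) content, and GC content at each codon position (GC1, GC2, GC3). It assumes an in-frame coding sequence and ignores ambiguous bases; the function names are hypothetical and are not taken from any tool listed on this page.

```python
def base_fraction(seq: str, targets: str) -> float:
    """Fraction of bases in `targets` among unambiguous A/C/G/T bases (0.0 if none)."""
    acgt = [b for b in seq.upper() if b in "ACGT"]
    return sum(b in targets for b in acgt) / len(acgt) if acgt else 0.0


def gc_content(seq: str) -> float:
    """G+C fraction of a sequence."""
    return base_fraction(seq, "GC")


def purine_content(seq: str) -> float:
    """Purine (A+G) fraction of a sequence."""
    return base_fraction(seq, "AG")


def split_codons(cds: str) -> list[str]:
    """Split an in-frame coding sequence into codons, dropping a trailing partial codon."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def gc_by_codon_position(cds: str) -> tuple[float, float, float]:
    """GC content at the first, second and third codon positions (GC1, GC2, GC3)."""
    codons = split_codons(cds)
    return tuple(gc_content("".join(c[pos] for c in codons)) for pos in range(3))


if __name__ == "__main__":
    cds = "ATGGCCTAA"
    print(gc_content(cds))            # 4 of the 9 bases are G or C
    print(gc_by_codon_position(cds))  # GC1 = 1/3, GC2 = 1/3, GC3 = 2/3
```

Third codon positions are of particular interest here because many changes at those positions are synonymous.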

Detecting Mutation and Selection
A number of models have been proposed for the evolution of protein-coding sequences. It would be desirable to model sequence evolution and detect selective pressure not only in protein-coding sequences but also in non-coding sequences.
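
For protein-coding sequences, one standard way to detect selective pressure is the ratio of nonsynonymous (Ka) to synonymous (Ks) substitution rates: Ka/Ks > 1 is conventionally read as positive selection, about 1 as neutral evolution, and < 1 as purifying selection. The sketch below is a simplified version of the Nei-Gojobori (1986) counting method with a Jukes-Cantor correction, written for illustration only; it is not the KaKs_Calculator implementation, which offers many more methods plus model selection and model averaging. Assumptions: the standard genetic code; gap-free, aligned, in-frame sequences; stop codons and codons with non-ACGT characters are skipped; and single-base changes to a stop codon count as nonsynonymous sites (implementations differ on this point). All function names are hypothetical.

```python
import math
from itertools import permutations

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...);
# '*' marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def site_counts(codon):
    """(synonymous, nonsynonymous) sites of a sense codon; they sum to 3.
    Single-base changes to a stop codon count as nonsynonymous (an assumption)."""
    syn = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    syn += 1 / 3
    return syn, 3 - syn


def differences(c1, c2):
    """(synonymous, nonsynonymous) differences between two sense codons, averaged
    over every order in which the differing positions could have mutated;
    pathways through an intermediate stop codon are discarded."""
    diff_pos = [p for p in range(3) if c1[p] != c2[p]]
    paths = []
    for order in permutations(diff_pos):
        syn = non = 0
        current = c1
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[nxt] == "*":
                break  # intermediate stop codon: discard this pathway
            if CODE[nxt] == CODE[current]:
                syn += 1
            else:
                non += 1
            current = nxt
        else:  # loop finished without hitting a stop codon
            paths.append((syn, non))
    if not paths:  # every pathway hits a stop; assumption: count all as nonsynonymous
        return 0.0, float(len(diff_pos))
    return (sum(s for s, _ in paths) / len(paths),
            sum(n for _, n in paths) / len(paths))


def jukes_cantor(p):
    """Jukes-Cantor correction; None when p >= 0.75 (distance undefined)."""
    if p >= 0.75:
        return None
    return -0.75 * math.log(1 - 4 * p / 3)


def ka_ks_ng86(seq1, seq2):
    """Estimate (Ka, Ks) for two aligned, in-frame coding sequences."""
    seq1, seq2 = seq1.upper(), seq2.upper()
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must have equal length divisible by 3")
    S = N = Sd = Nd = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        # Skip codons with gaps/ambiguous bases (not in CODE) and stop codons.
        if CODE.get(c1, "*") == "*" or CODE.get(c2, "*") == "*":
            continue
        s1, n1 = site_counts(c1)
        s2, n2 = site_counts(c2)
        S += (s1 + s2) / 2
        N += (n1 + n2) / 2
        sd, nd = differences(c1, c2)
        Sd += sd
        Nd += nd
    ka = jukes_cantor(Nd / N) if N else None
    ks = jukes_cantor(Sd / S) if S else None
    return ka, ks
```

Calling `ka, ks = ka_ks_ng86(seq1, seq2)` yields `None` for a rate whose uncorrected proportion of differences reaches 0.75, where the Jukes-Cantor distance is undefined; the ratio `ka / ks` is only meaningful when both values are defined and `ks` is non-zero.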

Simulating Evolutionary Process
Simulating the evolutionary process of molecular sequences over time is essential for a broad range of evolutionary studies. Biologically realistic simulations must take full account of many parameters, such as mutation rate, functional and structural constraints, site substitution patterns, co-evolving sites, and site-specific evolutionary constraints.
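
A minimal sketch of one ingredient of such simulations: evolving a sequence along a single branch under the Jukes-Cantor (JC69) substitution model, with gamma-distributed per-site rate multipliers standing in for site-specific evolutionary constraints. This is an illustrative assumption, not a description of the group's own simulators; it models neither functional or structural constraints nor co-evolving sites, and the function names are hypothetical.

```python
import math
import random

BASES = "ACGT"


def random_sequence(length, rng):
    """Ancestral sequence with equal base frequencies (the JC69 equilibrium)."""
    return "".join(rng.choice(BASES) for _ in range(length))


def draw_site_rates(length, alpha, rng):
    """Per-site rate multipliers from Gamma(shape=alpha, scale=1/alpha), mean 1;
    a small alpha gives strong rate variation among sites."""
    return [rng.gammavariate(alpha, 1.0 / alpha) for _ in range(length)]


def jc69_change_prob(branch_length):
    """P(site differs from its ancestor) after `branch_length` expected
    substitutions per site under JC69."""
    return 0.75 * (1.0 - math.exp(-4.0 * branch_length / 3.0))


def simulate_branch(ancestor, branch_length, rng, site_rates=None):
    """Evolve `ancestor` (A/C/G/T only) along one branch under JC69."""
    descendant = []
    for i, base in enumerate(ancestor):
        rate = 1.0 if site_rates is None else site_rates[i]
        if rng.random() < jc69_change_prob(branch_length * rate):
            # Under JC69 a changed site is equally likely to be any other base.
            descendant.append(rng.choice([b for b in BASES if b != base]))
        else:
            descendant.append(base)
    return "".join(descendant)


if __name__ == "__main__":
    rng = random.Random(42)
    root = random_sequence(60, rng)
    rates = draw_site_rates(len(root), alpha=0.5, rng=rng)
    # Two sister lineages that share the same site-specific rates.
    left = simulate_branch(root, 0.2, rng, rates)
    right = simulate_branch(root, 0.2, rng, rates)
    print(root, left, right, sep="\n")
```

The site rates are drawn once and shared across branches, so slowly evolving sites stay conserved in every lineage; resampling them per branch would erase that site-specific signal.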

Selected Publications
1. Sun S, Xiao J, Zhang H, Zhang Z: Pangenome evidence for higher codon usage bias and stronger translational selection in core genes of Escherichia coli. Front Microbiol 2016, 7:1180.
2. Yin HY, Ma LN, Wang GY, Li MW, Zhang Z: Old genes experience stronger translational selection than young genes. Gene 2016, 590(1):29-34. [PMID=27259662]
3. Wang GY, Sun SX, Zhang Z: Randomness in sequence evolution increases over time. PLoS One 2016, 11(5):e0155935. [PMID=27224236]
4. IC4R Project Consortium (Zhang Z, corresponding author): Information Commons for Rice (IC4R). Nucleic Acids Res 2016, 44(D1):D1172-1180. [PMID=26519466]
5. Zou D, Sun S, Li R, Liu J, Zhang J, Zhang Z: MethBank: a database integrating next-generation sequencing single-base-resolution DNA methylation programming data. Nucleic Acids Res 2015, 43(Database issue):D54-58. [PMID=25294826]
6. Zou D, Ma L, Yu J, Zhang Z: Biological databases for human research. Genomics Proteomics Bioinformatics 2015, 13(1):55-63. [PMID=25712261]
7. Ma L, Li A, Zou D, Xu X, Xia L, Yu J, Bajic VB, Zhang Z: LncRNAWiki: harnessing community knowledge in collaborative curation of human long non-coding RNAs. Nucleic Acids Res 2015, 43(Database issue):D187-192. [PMID=25399417]
8. Bai B, Zhao WM, Tang BX, Wang YQ, Wang L, Zhang Z, Yang HC, Liu YH, Zhu JW, Irwin DM, Wang GD, Zhang YP: DoGSD: the dog and wolf genome SNP database. Nucleic Acids Res 2015, 43(Database issue):D777-783. [PMID=25404132]
9. Zhao Y, Jia X, Yang J, Ling Y, Zhang Z, Yu J, Wu J, Xiao J: PanGP: a tool for quickly analyzing bacterial pan-genome profile. Bioinformatics 2014, 30(9):1297-1299. [PMID=24420766]
10. Zhang Z, Zhu W, Luo J: Bringing biocuration to China. Genomics Proteomics Bioinformatics 2014, 12(4):153-155. [PMID=25042682]
11. Zhang Z, Sang J, Ma L, Wu G, Wu H, Huang D, Zou D, Liu S, Li A, Hao L, Tian M, Xu C, Wang X, Wu J, Xiao J, Dai L, Chen LL, Hu S, Yu J: RiceWiki: a wiki-based database for community curation of rice genes. Nucleic Acids Res 2014, 42(Database issue):D1222-1228. [PMID=24136999]
12. Xu P, Zhang X, Wang X, Li J, Liu G, Kuang Y, Xu J, Zheng X, Ren L, Wang G, Zhang Y, Huo L, Zhao Z, Cao D, Lu C, Li C, Zhou Y, Liu Z, Fan Z, Shan G, Li X, Wu S, Song L, Hou G, Jiang Y, Jeney Z, Yu D, Wang L, Shao C, Song L, Sun J, Ji P, Wang J, Li Q, Xu L, Sun F, Feng J, Wang C, Wang S, Wang B, Li Y, Zhu Y, Xue W, Zhao L, Wang J, Gu Y, Lv W, Wu K, Xiao J, Wu J, Zhang Z, Yu J, Sun X: Genome sequence and genetic diversity of the common carp, Cyprinus carpio. Nat Genet 2014, 46(11):1212-1219. [PMID=25240282]
13. Wu J, Xiao J, Zhang Z, Wang X, Hu S, Yu J: Ribogenomics: the science and knowledge of RNA. Genomics Proteomics Bioinformatics 2014, 12(2):57-63. [PMID=24769101]
14. Wu H, Fang Y, Yu J, Zhang Z: The quest for a unified view of bacterial land colonization. ISME J 2014, 8(7):1358-1369. [PMID=24451209]
15. Wu G, Zhu J, Yu J, Zhou L, Huang JZ, Zhang Z: Evaluation of five methods for genome-wide circadian gene identification. J Biol Rhythms 2014, 29(4):231-242. [PMID=25238853]
16. Ma L, Cui P, Zhu J, Zhang Z, Zhang Z: Translational selection in human: more pronounced in housekeeping genes. Biol Direct 2014, 9:17. [PMID=25011537]
17. Kang Y, Gu C, Yuan L, Wang Y, Zhu Y, Li X, Luo Q, Xiao J, Jiang D, Qian M, Ahmed Khan A, Chen F, Zhang Z, Yu J: Flexibility and symmetry of prokaryotic genome rearrangement reveal lineage-associated core-gene-defined genome organizational frameworks. mBio 2014, 5(6):e01867. [PMID=25425232]
18. Zhang Z, Yu J: Does the genetic code have a eukaryotic origin? Genomics Proteomics Bioinformatics 2013, 11(1):41-55. [PMID=23402863]
19. Zhang Z, Wong GK, Yu J: Protein coding. Encyclopedia of Life Sciences (eLS) 2013. [Link]
20. Wu J, Xiao J, Wang L, Zhong J, Yin H, Wu S, Zhang Z, Yu J: Systematic analysis of intron size and abundance parameters in diverse lineages. Sci China Life Sci 2013, 56(10):968-974. [PMID=24022126]
21. Tong X, Yang Y, Wang W, Bai Z, Ma L, Zheng X, Sun H, Zhang Z, Zhao M, Yu J, Ge RL: Expression profiling of abundant genes in pulmonary and cardiac muscle tissues of Tibetan Antelope (Pantholops hodgsonii). Gene 2013, 523(2):187-191. [PMID=23612247]
22. Ma L, Bajic VB, Zhang Z: On the classification of long non-coding RNAs. RNA Biol 2013, 10(6):925-933. [PMID=23696037]
23. Dai L, Xu C, Tian M, Sang J, Zou D, Li A, Liu G, Chen F, Wu J, Xiao J, Wang X, Yu J, Zhang Z: Community intelligence in knowledge curation: an application to managing scientific nomenclature. PLoS One 2013, 8(2):e56961. [PMID=23451119]
24. Dai L, Tian M, Wu J, Xiao J, Wang X, Townsend JP, Zhang Z: AuthorReward: increasing community curation in biological knowledge wikis through automated authorship quantification. Bioinformatics 2013, 29(14):1837-1839. [PMID=23732274]
25. Chen M, Xiao J, Zhang Z, Liu J, Wu J, Yu J: Identification of human HK genes and gene expression regulation study in cancer from transcriptomics data analysis. PLoS One 2013, 8(1):e54082. [PMID=23382867]
26. Zhang Z, Yu J: The pendulum model for genome compositional dynamics: from the four nucleotides to the twenty amino acids. Genomics Proteomics Bioinformatics 2012, 10(4):175-180. [PMID=23084772]
27. Zhang Z, Xiao J, Wu J, Zhang H, Liu G, Wang X, Dai L: ParaAT: a parallel tool for constructing multiple protein-coding DNA alignments. Biochem Biophys Res Commun 2012, 419(4):779-781. [PMID=22390928]
28. Zhang Z, Li J, Cui P, Ding F, Li A, Townsend JP, Yu J: Codon Deviation Coefficient: a novel measure for estimating codon usage bias and its statistical significance. BMC Bioinformatics 2012, 13(1):43. [PMID=22435713]
29. Wu H, Zhang Z, Hu S, Yu J: On the molecular mechanism of GC content variation among eubacterial genomes. Biol Direct 2012, 7(1):2. [PMID=22230424]
30. Wu H, Qu H, Wan N, Zhang Z, Hu S, Yu J: Strand-biased gene distribution in bacteria is related to both horizontal gene transfer and strand-biased nucleotide composition. Genomics Proteomics Bioinformatics 2012, 10(4):186-196. [PMID=23084774]
31. Dai L, Gao X, Guo Y, Xiao J, Zhang Z: Bioinformatics clouds for big data manipulation. Biol Direct 2012, 7:43; discussion 43. [PMID=23190475]
32. Cui P, Liu W, Zhao Y, Lin Q, Zhang D, Ding F, Xin C, Zhang Z, Song S, Sun F, Yu J, Hu S: Comparative analyses of H3K4 and H3K27 trimethylations between the mouse cerebrum and testis. Genomics Proteomics Bioinformatics 2012, 10(2):82-93. [PMID=22768982]
33. Cui P, Ding F, Lin Q, Zhang L, Li A, Zhang Z, Hu S, Yu J: Distinct contributions of replication and transcription to mutation rate variation of human genomes. Genomics Proteomics Bioinformatics 2012, 10(1):4-10. [PMID=22449396]
34. Zhang Z, Yu J: On the organizational dynamics of the genetic code. Genomics Proteomics Bioinformatics 2011, 9(1-2):21-29. [PMID=21641559]
35. Zhang Z, Bajic VB, Yu J, Cheung K-H, Townsend JP: Data Integration in Bioinformatics: Current Efforts and Challenges. In: Bioinformatics - Trends and Methodologies. Edited by Mahdavi MA, vol. 1. Rijeka, Croatia: InTech; 2011: 41-56. [Link]
36. Zhang Z, Yu J: Modeling compositional dynamics based on GC and purine contents of protein-coding sequences. Biol Direct 2010, 5(1):63. [PMID=21059261]
37. Zhang Z, Townsend JP: The filamentous fungal gene expression database (FFGED). Fungal Genet Biol 2010, 47(3):199-204. [PMID=20025988]
38. Zhang Z, Lopez-Giraldez F, Townsend JP: LOX: inferring Level Of eXpression from diverse methods of census sequencing. Bioinformatics 2010, 26(15):1918-1919. [PMID=20538728]
39. Wang D, Zhang Y, Zhang Z, Zhu J, Yu J: KaKs_Calculator 2.0: a toolkit incorporating gamma-series methods and sliding window strategies. Genomics Proteomics Bioinformatics 2010, 8(1):77-80. [PMID=20451164]
40. Qu H, Wu H, Zhang T, Zhang Z, Hu S, Yu J: Nucleotide compositional asymmetry between the leading and lagging strands of eubacterial genomes. Res Microbiol 2010, 161(10):838-846. [PMID=20868744]
41. Zhang Z, Townsend JP: Maximum-likelihood model averaging to profile clustering of site types across discrete linear sequences. PLoS Comput Biol 2009, 5(6):e1000421. [PMID=19557160]
42. Zhang Z, Cheung KH, Townsend JP: Bringing Web 2.0 to bioinformatics. Brief Bioinform 2009, 10(1):1-10. [PMID=18842678]
43. Li J, Zhang Z, Vang S, Yu J, Wong GK, Wang J: Correlation between Ka/Ks and Ks is related to substitution model and evolutionary lineage. J Mol Evol 2009, 68(4):414-423. [PMID=19308632]
44. Zheng H, Shi J, Fang X, Li Y, Vang S, Fan W, Wang J, Zhang Z, Wang W, Kristiansen K, Wang J: FGF: a web tool for Fishing Gene Family in a whole genome database. Nucleic Acids Res 2007, 35(Web Server issue):W121-125. [PMID=17584790]
45. Zhao X, Zhang Z, Yan J, Yu J: GC content variability of eubacteria is governed by the pol III alpha subunit. Biochem Biophys Res Commun 2007, 356(1):20-25. [PMID=17336933]
46. Hu J, Zhao X, Zhang Z, Yu J: Compositional dynamics of guanine and cytosine content in prokaryotic genomes. Res Microbiol 2007, 158(4):363-370. [PMID=17449227]
47. Zhang Z, Yu J: Evaluation of six methods for estimating synonymous and nonsynonymous substitution rates. Genomics Proteomics Bioinformatics 2006, 4(3):173-181. [PMID=17127215]
48. Zhang Z, Li J, Zhao XQ, Wang J, Wong GK, Yu J: KaKs_Calculator: calculating Ka and Ks through model selection and model averaging. Genomics Proteomics Bioinformatics 2006, 4(4):259-263. [PMID=17531802]
49. Zhang Z, Li J, Yu J: Computing Ka and Ks with a consideration of unequal transitional substitutions. BMC Evol Biol 2006, 6:44. [PMID=16740169]
50. Li H, Coghlan A, Ruan J, Coin LJ, Heriche JK, Osmotherly L, Li R, Liu T, Zhang Z, Bolund L, Wong GK, Zheng W, Dehal P, Wang J, Durbin R: TreeFam: a curated database of phylogenetic trees of animal gene families. Nucleic Acids Res 2006, 34(Database issue):D572-580. [PMID=16381935]

Group Members

Staff:
HAO Lili, LI Cuiping, LI Rujiao, LIANG Fang, MA Lina, SANG Jian, SONG Shuhui, TIAN Dongmei, ZOU Dong

Graduate Students:
SUN Shixiang, YIN Hongyan, WANG Guangyu, XU Xingjian, XIA Lin, YU Chunlei, LI Mengwei, LIU Lin