Data Resources Department

ZHANG Zhang
zhangzhang(AT)big.ac.cn

Introduction 

With the rapid accumulation of massive multi-omics data, we focus on national strategic needs in big data deposition, integration, and translation to advance the life and health sciences. We develop a suite of biological databases, algorithms, and tools, aiming to translate big data into big discoveries and to support worldwide activities in both academia and industry.

Group leader

ZHANG Zhang

Biography

2024—Present, Head of Department of Data Resources, CNCB

2020—Present, Associate Director of National Genomics Data Center, Beijing Institute of Genomics, CAS & China National Center for Bioinformation (CNCB)

2016—2020, Executive Director of BIG Data Center, Beijing Institute of Genomics, CAS

2016—Present, Professor, University of Chinese Academy of Sciences

2011—Present, Professor in the CAS 100-Talent Program, CAS Key Laboratory of Genome Sciences & Information, Beijing Institute of Genomics, CAS

2009—2011, Research Scientist, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia

2007—2009, Postdoctoral Associate, Yale University, New Haven, Connecticut, United States of America

2004—2007, Ph.D. in Computer Science, Institute of Computing Technology, Chinese Academy of Sciences (CAS)

2002—2004, M.S. in Computer Science, Nanjing University of Science and Technology, Nanjing, China

1998—2002, B.S. in Computer Science, Ningxia University, Yinchuan, China

About Group Leader

Dr. Zhang Zhang is a Distinguished Professor of Beijing Institute of Genomics (BIG), Chinese Academy of Sciences (CAS) & China National Center for Bioinformation (CNCB), and acts as Associate Director of the National Genomics Data Center (NGDC), which is part of BIG & CNCB. His research focuses on biological big data integration, development of biological databases and computational tools, and theoretical biology through new algorithms and models to uncover fundamental principles of life. Dr. Zhang serves as Associate Editor-in-Chief of Genomics Proteomics & Bioinformatics.

Research Fields

1. Big Data Integration and Curation: construction of multi-omics databases and knowledgebases through big data integration and curation, and development of new theory for biological big data commons and ecosystems, with a particular focus on public health and species of national strategic importance.

2. Computational Molecular Evolution: establishment of molecular evolutionary models and theories at the nucleotide and codon levels, and development of new methods and tools for detecting natural selection pressure.

3. Computational Health Genomics: development of new methods and algorithms that associate omics data with health data and support integrative data analysis and deep mining via artificial intelligence and machine learning, with the aim of enabling more effective precision health and medical treatment for brain tumors such as glioma.

Selected Publications

Publications as first author or corresponding author (# co-first author; * corresponding author)

1. Kang Z, Zhu T, Zou D, Liu M, Zhang Y, Wang L, Zhang Z*, Liu F (2025) HemAtlas: A Multi-omics Hematopoiesis Database. Genomics Proteomics Bioinformatics, 10.1093/gpbjnl/qzaf026.

2. Su Y, Han Z, Ji Y, Liu A, Zou D, Yan L, Liu D, Zhang Z*, Wang QF (2025) Patterns and variations of copy number alterations in acute myeloid leukemia: insights from the LeukAtlas database. Leukemia, 39, 827-836.

3. Zhang Z* listed in CNCB-NGDC Members & Partners (2025) Database Resources of the National Genomics Data Center, China National Center for Bioinformation in 2025. Nucleic Acids Res, 53, D30-D44.

4. Zhang Z* (2024) Expanding bioinformatics: Toward a paradigm shift from data to theory. Fundamental Research, doi: 10.1016/j.fmre.2024.11.019.

5. Zhang Z* (2024) Laws of Genome Nucleotide Composition. Genomics Proteomics Bioinformatics, 22, doi: 10.1093/gpbjnl/qzae061.

6. Chen M, Xia L, Tan X, Gao S, Wang S, Li M, Zhang Y, Xu T, Cheng Y, Chu Y, Hu S, Wu S and Zhang Z* (2024) Seeing the unseen in characterizing RNA editome during rice endosperm development. Commun Biol, 7, 1314.

7. Li L, Li C, Li N, Zou D, Zhao W, Luo H, Xue Y, Zhang Z*, Bao Y and Song S (2024) Machine Learning Early Detection of SARS-CoV-2 High-Risk Variants. Adv Sci (Weinh), 10.1002/advs.202405058, e2405058.

8. Gao X, Chen K, Xiong J, Zou D, Yang F, Ma Y, Jiang C, Gao X, Wang G, Gu S, Zhang P, Luo S, Huang K, Bao Y, Zhang Z*, Ma L and Miao W (2024) The P10K database: a data portal for the protist 10 000 genomes project. Nucleic Acids Res, 52, D747-D755.

9. Cao Y, Tian D, Tang Z, Liu X, Hu W, Zhang Z* and Song S (2024) OPIA: an open archive of plant images and related phenotypic traits. Nucleic Acids Res, 52, D1530-D1537.

10. Ma L and Zhang Z*. (2023) The contribution of databases towards understanding the universe of long non-coding RNAs. Nat Rev Mol Cell Biol, 24, 601-602.

11. Liu X, Tian D, Li C, Tang B, Wang Z, Zhang R, Pan Y, Wang Y, Zou D, Zhang Z* and Song S. (2023) GWAS Atlas: an updated knowledgebase integrating more curated associations in plants and animals. Nucleic Acids Res, 51, D969-D976.

12. Jiang S, Qian Q, Zhu T, Zong W, Shang Y, Jin T, Zhang Y, Chen M, Wu Z, Chu Y, Zhang R, Luo S, Jing W, Zou D, Bao Y, Xiao J and Zhang Z*. (2023) Cell Taxonomy: a curated repository of cell types with multifaceted characterization. Nucleic Acids Res, 51, D853-D860.

13. Zhang Z*. (2022) KaKs_Calculator 3.0: Calculating Selective Pressure on Coding and Non-coding Sequences. Genomics Proteomics Bioinformatics, 20, 536-540.

14. Zhang Y, Zou D, Zhu T, Xu T, Chen M, Niu G, Zong W, Pan R, Jing W, Sang J, Liu C, Xiong Y, Sun Y, Zhai S, Chen H, Zhao W, Xiao J, Bao Y, Hao L and Zhang Z*. (2022) Gene Expression Nebulas (GEN): a comprehensive data portal integrating transcriptomic profiles across multiple species at both bulk and single-cell levels. Nucleic Acids Res, 50, D1016-D1024.

15. Ma L, Zou D, Liu L, Shireen H, Abbasi A A, Bateman A, Xiao J, Zhao W, Bao Y and Zhang Z*. (2022) Database Commons: A Catalog of Worldwide Biological Databases. Genomics Proteomics Bioinformatics, 10.1016/j.gpb.2022.12.004.

16. Liu L, Zhang Y, Niu G, Li Q, Li Z, Zhu T, Feng C, Liu X, Zhang Y, Xu T, Chen R, Teng X, Zhang R, Zou D, Ma L and Zhang Z*. (2022) BrainBase: a curated knowledgebase for brain diseases. Nucleic Acids Res, 50, D1131-D1138.

17. Hua Z, Tian D, Jiang C, Song S, Chen Z, Zhao Y, Jin Y, Huang L, Zhang Z* and Yuan Y. (2022) Towards comprehensive integration and curation of chloroplast genomes. Plant Biotechnol J, 20, 2239-2241.

18. Li Z, Liu L, Jiang S, Li Q, Feng C, Du Q, Zou D, Xiao J, Zhang Z* and Ma L. (2021) LncExpDB: an expression database of human long non-coding RNAs. Nucleic Acids Res, 49, D962-D968.

19. Xiong Z, Li M, Yang F, Ma Y, Sang J, Li R, Li Z, Zhang Z* and Bao Y. (2020) EWAS Data Hub: a resource of DNA methylation array data and metadata. Nucleic Acids Res, 48, D890-D895.

20. Tian D, Wang P, Tang B, Teng X, Li C, Liu X, Zou D, Song S and Zhang Z*. (2020) GWAS Atlas: a curated resource of genome-wide variant-trait associations in plants and animals. Nucleic Acids Res, 48, D927-D932.

21. Song S, Ma L, Zou D, Tian D, Li C, Zhu J, Chen M, Wang A, Ma Y, Li M, Teng X, Cui Y, Duan G, Zhang M, Jin T, Shi C, Du Z, Zhang Y, Liu C, Li R, Zeng J, Hao L, Jiang S, Chen H, Han D, Xiao J, Zhang Z*, Zhao W, Xue Y and Bao Y. (2020) The Global Landscape of SARS-CoV-2 Genomes, Variants, and Haplotypes in 2019nCoVR. Genomics Proteomics Bioinformatics, 18, 749-759.

22. Liu L, Wang G, Wang L, Yu C, Li M, Song S, Hao L, Ma L and Zhang Z*. (2020) Computational identification and characterization of glioma candidate biomarkers through multi-omics integrative profiling. Biol Direct, 15, 10.

23. Wang G, Yin H, Li B, Yu C, Wang F, Xu X, Cao J, Bao Y, Wang L, Abbasi A A, Bajic V B, Ma L and Zhang Z*. (2019) Characterization and identification of long non-coding RNAs based on feature relationship. Bioinformatics, 35, 2949-2956.

24. Ma L, Cao J, Liu L, Du Q, Li Z, Zou D, Bajic V B and Zhang Z*. (2019) LncBook: a curated knowledgebase of human long non-coding RNAs. Nucleic Acids Res, 47, D128-D134.

25. Li M, Zou D, Li Z, Gao R, Sang J, Zhang Y, Li R, Xia L, Zhang T, Niu G, Bao Y and Zhang Z*. (2019) EWAS Atlas: a curated knowledgebase of epigenome-wide association studies. Nucleic Acids Res, 47, D983-D988.

26. Song S, Tian D, Li C, Tang B, Dong L, Xiao J, Bao Y, Zhao W, He H and Zhang Z*. (2018) Genome Variation Map: a data repository of genome variations in BIG Data Center. Nucleic Acids Res, 46, D944-D949.

27. Xu X, Ji Z and Zhang Z*. (2017) CloudPhylo: a fast and scalable tool for phylogeny reconstruction. Bioinformatics, 33, 438-440.

28. Wang Y, Song F, Zhu J, Zhang S, Yang Y, Chen T, Tang B, Dong L, Ding N, Zhang Q, Bai Z, Dong X, Chen H, Sun M, Zhai S, Sun Y, Yu L, Lan L, Xiao J, Fang X, Lei H, Zhang Z* and Zhao W. (2017) GSA: Genome Sequence Archive. Genomics Proteomics Bioinformatics, 15, 14-18.

29. Yin H, Wang G, Ma L, Yi S V and Zhang Z*. (2016) What Signatures Dominantly Associate with Gene Age? Genome Biol Evol, 8, 3083-3089.

30. Wang G, Sun S and Zhang Z*. (2016) Randomness in Sequence Evolution Increases over Time. PLoS One, 11, e0155935.

31. Wu H, Fang Y, Yu J and Zhang Z*. (2014) The quest for a unified view of bacterial land colonization. ISME J, 8, 1358-1369.

32. Ma L, Bajic V B and Zhang Z*. (2013) On the classification of long non-coding RNAs. RNA Biol, 10, 925-933.

33. Zhang Z*, Xiao J, Wu J, Zhang H, Liu G, Wang X and Dai L. (2012) ParaAT: a parallel tool for constructing multiple protein-coding DNA alignments. Biochem Biophys Res Commun, 419, 779-781.

34. Zhang Z, Li J, Cui P, Ding F, Li A, Townsend J P and Yu J. (2012) Codon Deviation Coefficient: a novel measure for estimating codon usage bias and its statistical significance. BMC Bioinformatics, 13, 43.

35. Zhang Z and Yu J. (2010) Modeling compositional dynamics based on GC and purine contents of protein-coding sequences. Biol Direct, 5, 63.

36. Zhang Z, Lopez-Giraldez F and Townsend J P. (2010) LOX: inferring Level Of eXpression from diverse methods of census sequencing. Bioinformatics, 26, 1918-1919.

37. Zhang Z and Townsend J P. (2009) Maximum-likelihood model averaging to profile clustering of site types across discrete linear sequences. PLoS Comput Biol, 5, e1000421.

38. Zhao X, Zhang Z#, Yan J and Yu J. (2007) GC content variability of eubacteria is governed by the pol III alpha subunit. Biochem Biophys Res Commun, 356, 20-25.

39. Zhang Z, Li J, Zhao X Q, Wang J, Wong G K and Yu J. (2006) KaKs_Calculator: calculating Ka and Ks through model selection and model averaging. Genomics Proteomics Bioinformatics, 4, 259-263.

Group Members

Staff:

ZOU Dong, XU Tianyi, HAN Zhenxian, LI Zhao, YUAN Zhixiang, ZHANG Yang, ZHU Tongtong, ZHOU Wei, YANG Dechang, ZHENG Bo, CHANG Yetong, LI Mingzhu

Graduate Students:

CHEN Ming, ZHENG Xing, ZHOU Xinyu, QI Yue, CHENG Wenzhuo, MA Kehua, WANG Zihan, LUO Zheng, ZHAN Yiran, LI Pan, WANG Shiting, WANG Miaomiao, WANG Lingjie, JIAO Deming, SHI Jiachen, ZHU Hangbo

Awards and Honors

UCAS Li Pei Excellent Teacher Award, 2025

National Science Fund for Distinguished Young Scholars, 2024

Top 10 Advances in Bioinformatics in China, 2023

UCAS University-Level Excellent Undergraduate Course – Genomics, 2023

Distinguished Professor of Chinese Academy of Sciences, 2022

Top 10 Advances in Bioinformatics in China, 2022

UCAS-BHPB Excellent Supervisor Award, 2021

UCAS Lingyan Golden Award, 2020

National Ten Thousand Talent Program for Young Top-notch Talent, 2019

UCAS-BHPB Excellent Supervisor Award, 2018

Distinguished Professor of Beijing Institute of Genomics, 2017

Excellent Award in the Evaluation of the CAS 100-Talent Program, 2017

Contact

Email: zhangzhang@big.ac.cn