Data Resources Department

DU Zhenglin
duzhl@big.ac.cn

Introduction 

Aligned with national strategic priorities, we are driving the establishment of a Chinese population cohort data platform and leading genomic analysis of cancer datasets. We are also contributing to key initiatives in the construction of the China National Center for Bioinformation, focusing on developing core technologies for biomedical ontologies and AI-driven knowledge databases.

Group leader

DU Zhenglin

Biography

2025.08–Present, Principal Engineer, Beijing Institute of Genomics, CAS (China National Center for Bioinformation)

2024.07–2025.07, Senior Engineer, Beijing Institute of Genomics, CAS (China National Center for Bioinformation)

2022.05–2024.06, Head of Information Department, Shanghai PuxiHeGuang Gene Technology Co., Ltd.

2012.01–2022.04, Senior Engineer, Beijing Institute of Genomics, CAS

2008.06–2011.12, Engineer, Beijing Institute of Genomics, CAS

2007.07–2008.06, Engineer, BGI Research Institute, Shenzhen

2002.09–2007.07, Ph.D. in Biochemistry and Molecular Biology, China Agricultural University

1998.09–2002.07, B.S. in Biology, China Agricultural University

 

About Group Leader

Expert in bioinformatics with extensive experience as Task Leader for multiple national key projects, including the National 863 Program, National Key Research and Development Programs, and the Strategic Priority Research Program of the Chinese Academy of Sciences. Selected as a Key Technical Talent of CAS in 2018. Led critical tasks for the "Chinese Academy of Sciences Precision Medicine Initiative" project, establishing the first high-precision genetic variation map for the Chinese population and the reference genome for Northern Han Chinese. Developed innovative computational methods, including a convolutional neural network-based algorithm for constructing genotype imputation reference panels and a reference-free somatic mutation detection algorithm for tumor samples.

Research Fields

1. Integration and mining of large-scale cohort data

2. AI-driven technologies for omics knowledge databases

3. Bioinformatics tools and online analysis platforms

 

Selected Publications

1. Xia Z#, Du Z#, Zhou X#, et al. Pan-genome and haplotype map of cassava cultivars and wild ancestors provide insights into its adaptive evolution and domestication. Mol Plant. 2025 Jun 2;18(6):1047-1071.

2. Guo S, Huang Z, Zhang Y, He Y, Chen X, Wang W, Li L, Kang Y, Gao Z, Yu J, Du Z*, Chu Y*. Enhancing Variant Calling in Whole-exome Sequencing Data Using Population-matched Reference Genomes. Genomics Proteomics Bioinformatics. 2024 Dec 3;22(5):qzae070.

3. Jiang M, Chen M, Zeng J, Du Z*, Xiao J*. A comprehensive evaluation of the potential of three next-generation short-read-based plant pan-genome construction strategies for the identification of novel non-reference sequence. Front Plant Sci. 2024 Mar 19;15:1371222.

4. Sun X, Kan C, Ma W, Du Z*, Li M*. Genomic Analysis of the Suspicious SARS-CoV-2 Sequences in the Public Sequencing Database. Microbiol Spectr. 2023 Feb 14;11(1):e0342622.

5. CNCB-NGDC Members and Partners. “Database Resources of the National Genomics Data Center, China National Center for Bioinformation in 2022.” Nucleic Acids Research vol. 50,D1 (2022): D27-D38.

6. Shi S, Wang Q, Shang Y, Bu C, Lu M, Jiang M, Zhang H, Yu S, Zeng J, Zhang Z, Du Z*, Xiao J*. TSomVar: a tumor-only somatic and germline variant identification method with random forest. Brief Bioinform. 2022 Sep 20;23(5):bbac381.

7. Shi S, Qian Q, Yu S, Wang Q, Wang J, Zeng J, Du Z*, Xiao J*. RefRGim: an intelligent reference panel reconstruction method for genotype imputation with convolutional neural networks. Brief Bioinform. 2021 Nov 5;22(6):bbab326.

8. Liu Q#, Du Z#, Zhu S, Zhao W, Chen H, Xue Y. Metagenomic evidence for the co-existence of SARS and H1N1 in patients from 2007-2012 flu seasons in France. Biosaf Health. 2021 Nov 9.

9. Gong Z, Zhu JW, Li CP, Jiang S, Ma LN, Tang BX, Zou D, Chen ML, Sun YB, Song SH, Zhang Z, Xiao JF, Xue YB, Bao YM, Du ZL*, Zhao WM*. An online coronavirus analysis platform from the National Genomics Data Center. Zool Res. 2020 Nov 18;41(6):705-708.

10. Zeng Jingyao, Yuan Na, Wei Wenjuan, Li Gen*, Du Zhenglin*. Challenges of High-Throughput Computing in Genomic Data Analysis for Large-Scale Cohort Studies[J]. Frontiers of Data and Computing, 2020, 2(1): 117-127.

11. Du Z, Ma L, Qu H, et al. Whole Genome Analyses of Chinese Population and De Novo Assembly of A Northern Han Genome. Genomics Proteomics Bioinformatics. 2019 Jun;17(3):229-247.

12. Zhang L*, Bai W, Yuan N, Du Z*. Comprehensively benchmarking applications for detecting copy number variation. PLoS Comput Biol. 2019 May 28;15(5):e1007069. PLoS Comput Biol. 2019 Sep 20;15(9):e1007367.

13. BIG Data Center Members. “The BIG Data Center: from deposition to integration to translation.” Nucleic Acids Research vol. 45,D1 (2017): D18-D24.

14. Yu L#, Wang GD#, Ruan J#, Chen YB#, Yang CP#, Cao X#, Wu H#, Liu YH#, Du ZL#, et al. Genomic analysis of snub-nosed monkeys (Rhinopithecus) identifies genes and processes related to high-altitude adaptation. Nat Genet. 2016 Aug;48(8):947-52.

Patent

Group Members

LI Zhibo, YANG Ying