Data Resources Department

SONG Shuhui
songshh@cncb.ac.cn

Introduction 

The Song research group focuses on computational bioinformatics and biological big data science, with strengths in data curation, integration, standardization, and algorithm development for large-scale biological datasets.

The research spans genomics and image-based phenomics, with an emphasis on scalable methods for genomic variation analysis, evolutionary modeling, and machine learning–based data analysis.

The group develops and maintains national-level bioinformatics resources, including the Genome Variation Map (GVM), the Open Plant Image Archive (OPIA) for plant image phenotypes, GWAS Atlas, and RCoV19 for SARS-CoV-2 genome variation and real-time surveillance. These resources support applications in smart agriculture and provide data and methodological infrastructure for infectious disease monitoring and public health response.

Group leader: SONG Shuhui

Biography

2024.08–Present, Professor, Director, Bioinformatics Service Department, China National Center for Bioinformation

2022.01–Present, Professor, Doctoral Supervisor, Beijing Institute of Genomics, CAS (China National Center for Bioinformation)

2016–Present, Lab Head, Genomic Variation Group, National Genomics Science Data Center, Beijing Institute of Genomics, CAS (China National Center for Bioinformation)

2011–2015, Head, RNA Group, Bioinformatics Analysis Department, Beijing Institute of Genomics, CAS

2011–2021, Associate Professor, Beijing Institute of Genomics, CAS

2008–2010, Assistant Professor, Beijing Institute of Genomics, CAS

About Group Leader

Song Shuhui, PhD, is a Professor at the China National Center for Bioinformation (CNCB) and the Beijing Institute of Genomics (BIG), Chinese Academy of Sciences (CAS). She serves as Associate Director of the National Genomics Data Center (NGDC) and Director of the Bioinformatics Service Department at CNCB.

Her research focuses on computational bioinformatics, biological big data integration and curation, and algorithm and database development. She has published extensively in leading international journals, including Nucleic Acids Research, Briefings in Bioinformatics, and PNAS, and has made important contributions to national bioinformatics data resources.

She has received several prestigious honors, including Distinguished Professor of the Chinese Academy of Sciences, recognition for multiple “Top 10 Advances in Bioinformatics in China”, and selection to the Beijing Nova Program and the Youth Innovation Promotion Association of CAS.

Research Fields

1. Computational bioinformatics

2. Big data integration & curation

3. Algorithm & AI model development

Selected Publications

Representative Publications in the Past Five Years:

1. Zhu Z, Wang Y, Liu S, Wang S, Li J, Fang C, Liu Y, Yang X, Tian D, Song S, Tian Z. Genomic atlas of 8,105 accessions reveals stepwise domestication, global dissemination, and improvement trajectories in soybean. Cell, 2025, 188(23): 6519-6535.e6515.

https://pubmed.ncbi.nlm.nih.gov/41038165/

2. Zhang S, Chen X, Jin E, Wang A, Chen T, Zhang X, Zhu J, Dong L, Sun Y, Yu C, Zhou Y, Fan Z, Chen H, Zhai S, Sun Y, Chen Q, Xiao J, Song S, Zhang Z, Bao Y, Wang Y, Zhao W. The GSA Family in 2025: A Broadened Sharing Platform for Multi-omics and Multimodal Data. Genomics Proteomics Bioinformatics, 2025, 23(4).

https://pubmed.ncbi.nlm.nih.gov/40857552/

3. Luo H, Bai X, Wu Z, Ren S, Xie H, Yuan Z, Zhang J, Tian D, Song S. SugarcaneOmics: An integrative multi-omics platform for sugarcane research. Plant Commun, 2025, 6(11): 101489.

https://pubmed.ncbi.nlm.nih.gov/40849681/

4. Bai X, Xie H, Luo H, Ren S, Tang B, Li C, Wang Y, Xu B, Wu Z, Tian D, Song S. Genome Variation Map: a platform for the analysis and integration of genomic variation. Nucleic Acids Res, 2025.

https://pubmed.ncbi.nlm.nih.gov/41359037/

5. Tian D, Xu T, Kang H, Luo H, Wang Y, Chen M, Li R, Ma L, Wang Z, Hao L, Tang B, Zou D, Xiao J, Zhao W, Bao Y, Zhang Z, Song S. Plant genomic resources at National Genomics Data Center: assisting in data-driven breeding applications. aBIOTECH, 2024, 5(1): 94-106.

https://pubmed.ncbi.nlm.nih.gov/38576435/

6. Li L, Li C, Li N, Zou D, Zhao W, Luo H, Xue Y, Zhang Z, Bao Y, Song S. Machine Learning Early Detection of SARS-CoV-2 High-Risk Variants. Adv Sci (Weinh), 2024, 11(45): e2405058.

https://pubmed.ncbi.nlm.nih.gov/39401400/

7. Cao Y, Tian D, Tang Z, Liu X, Hu W, Zhang Z, Song S. OPIA: an open archive of plant images and related phenotypic traits. Nucleic Acids Res, 2024, 52(D1): D1530-D1537.

https://pubmed.ncbi.nlm.nih.gov/37930849/

8. Liu Y, Zhang Y, Liu X, Shen Y, Tian D, Yang X, Liu S, Ni L, Zhang Z, Song S, Tian Z. SoyOmics: A deeply integrated database on soybean multi-omics. Mol Plant, 2023, 16(5): 794-797.

https://pubmed.ncbi.nlm.nih.gov/36950735/

9. Li L, Xu B, Tian D, Wang A, Zhu J, Li C, Li N, Zhao W, Shi L, Xue Y, Zhang Z, Bao Y, Zhao W, Song S. McAN: a novel computational algorithm and platform for constructing and visualizing haplotype networks. Brief Bioinform, 2023, 24(3).

https://pubmed.ncbi.nlm.nih.gov/37170752/

10. Li C, Ma L, Zou D, Zhang R, Bai X, Li L, Wu G, Huang T, Zhao W, Jin E, Bao Y, Song S. RCoV19: A One-stop Hub for SARS-CoV-2 Genome Data Integration, Variant Monitoring, and Risk Pre-warning. Genomics Proteomics Bioinformatics, 2023, 21(5): 1066-1079.

https://pubmed.ncbi.nlm.nih.gov/37898309/

11. Liu X, Tian D, Li C, Tang B, Wang Z, Zhang R, Pan Y, Wang Y, Zou D, Zhang Z, Song S. GWAS Atlas: an updated knowledgebase integrating more curated associations in plants and animals. Nucleic Acids Res, 2023, 51(D1): D969-D976.

https://www.ncbi.nlm.nih.gov/pubmed/36263826

12. Hua Z, Jiang C, Song S, Tian D, Chen Z, Jin Y, Zhao Y, Zhou J, Zhang Z, Huang L, Yuan Y. Accurate identification of taxon-specific molecular markers in plants based on DNA signature sequence. Mol Ecol Resour, 2023, 23(1): 106-117.

https://www.ncbi.nlm.nih.gov/pubmed/35951477

13. Zhang ZW, Teng X, Zhao F, Ma C, Zhang J, Xiao LF, Wang Y, Chang M, Tian Y, Li C, Zhang Z, Song S, Tong WM, Liu P, Niu Y. METTL3 regulates m(6)A methylation of PTCH1 and GLI2 in Sonic hedgehog signaling to promote tumor progression in SHH-medulloblastoma. Cell Rep, 2022, 41(4): 111530.

https://www.ncbi.nlm.nih.gov/pubmed/36288719

14. National Genomics Data Center Members and Partners (Song S, co-first author). Database Resources of the National Genomics Data Center, China National Center for Bioinformation in 2022. Nucleic Acids Res, 2022, 50(D1): D27-D38.

https://www.ncbi.nlm.nih.gov/pubmed/34718731

15. Ma Y, Chen M, Bao Y, Song S, MPoxVR Team. MPoxVR: A comprehensive genomic resource for monkeypox virus variant surveillance. Innovation (Camb), 2022, 3(5): 100296.

https://www.ncbi.nlm.nih.gov/pubmed/36039088

16. Hua Z, Tian D, Jiang C, Song S, Chen Z, Zhao Y, Jin Y, Huang L, Zhang Z, Yuan Y. Towards comprehensive integration and curation of chloroplast genomes. Plant Biotechnol J, 2022, 20(12): 2239-2241.

https://www.ncbi.nlm.nih.gov/pubmed/36069606

17. Teng X, Li Q, Li Z, Zhang Y, Niu G, Xiao J, Yu J, Zhang Z, Song S. Compositional Variability and Mutation Spectra of Monophyletic SARS-CoV-2 Clades. Genomics Proteomics Bioinformatics, 2021.

https://www.sciencedirect.com/science/article/pii/S1672022921000103

18. National Genomics Data Center Members and Partners (Song S, co-first author). Database Resources of the National Genomics Data Center, China National Center for Bioinformation in 2021. Nucleic Acids Res, 2021, 49(D1): D18-D28.

https://www.ncbi.nlm.nih.gov/pubmed/33175170

19. China-WHO joint team (Song S, co-first author). WHO-convened global study of origins of SARS-CoV-2: China Part. 2021.

https://www.who.int/publications/i/item/who-convened-global-study-of-origins-of-sars-cov-2-china-part

20. Song S, Li C, Kang L, Tian D, Badar N, Ma W, Zhao S, Jiang X, Wang C, Sun Y, Li W, Lei M, Li S, Qi Q, Ikram A, Salman M, Umair M, Shireen H, Batool F, Zhang B, Chen H, Yang Y, Ali Abbasi A, Li M, Xue Y, Bao Y. Genomic Epidemiology of SARS-CoV-2 in Pakistan. Genomics Proteomics Bioinformatics, 2021.

https://www.ncbi.nlm.nih.gov/pubmed/34695600

21. Liu X, Wang P, Teng X, Zhang Z, Song S. Comprehensive Analysis of Expression Regulation for RNA m6A Regulators With Clinical Significance in Human Cancers. Front Oncol, 2021, 11: 624395.

https://www.ncbi.nlm.nih.gov/pubmed/33718187

22. Li C, Tian D, Tang B, Liu X, Teng X, Zhao W, Zhang Z, Song S. Genome Variation Map: a worldwide collection of genome variations across multiple species. Nucleic Acids Res, 2021, 49(D1): D1186-D1191.

https://www.ncbi.nlm.nih.gov/pubmed/33170268

23. Zhao WM, Song SH, Chen ML, Zou D, Ma LN, Ma YK, Li RJ, Hao LL, Li CP, Tian DM, Tang BX, Wang YQ, Zhu JW, Chen HX, Zhang Z, Xue YB, Bao YM. The 2019 novel coronavirus resource. Yi Chuan, 2020, 42(2): 212-221.

https://www.ncbi.nlm.nih.gov/pubmed/32102777

24. Yan J, Zou D, Li C, Zhang Z, Song S, Wang X. SR4R: An Integrative SNP Resource for Genomic Breeding and Population Research in Rice. Genomics Proteomics Bioinformatics, 2020, 18(2): 173-185.

https://www.ncbi.nlm.nih.gov/pubmed/32619768

25. Tian D, Wang P, Tang B, Teng X, Li C, Liu X, Zou D, Song S, Zhang Z. GWAS Atlas: a curated resource of genome-wide variant-trait associations in plants and animals. Nucleic Acids Res, 2020, 48(D1): D927-D932.

https://www.ncbi.nlm.nih.gov/pubmed/31566222

26. National Genomics Data Center Members and Partners (Song S, co-first author). Database Resources of the National Genomics Data Center in 2020. Nucleic Acids Res, 2020, 48(D1): D24-D33.

https://www.ncbi.nlm.nih.gov/pubmed/31702008

27. Song S, Ma L, Zou D, Tian D, Li C, Zhu J, Chen M, Wang A, Ma Y, Li M, Teng X, Cui Y, Duan G, Zhang M, Jin T, Shi C, Du Z, Zhang Y, Liu C, Li R, Zeng J, Hao L, Jiang S, Chen H, Han D, Xiao J, Zhang Z, Zhao W, Xue Y, Bao Y. The Global Landscape of SARS-CoV-2 Genomes, Variants, and Haplotypes in 2019nCoVR. Genomics Proteomics Bioinformatics, 2020, 18(6): 749-759.

https://www.ncbi.nlm.nih.gov/pubmed/33704069

28. Liu S, Li C, Wang H, Wang S, Yang S, Liu X, Yan J, Li B, Beatty M, Zastrow-Hayes G, Song S, Qin F. Mapping regulatory variants controlling gene expression in drought response and tolerance in maize. Genome Biol, 2020, 21(1): 163.

https://www.ncbi.nlm.nih.gov/pubmed/32631406

29. Liu Q, Zhao S, Shi CM, Song S, Zhu S, Su Y, Zhao W, Li M, Bao Y, Xue Y, Chen H. Population Genetics of SARS-CoV-2: Disentangling Effects of Sampling Bias and Infection Clusters. Genomics Proteomics Bioinformatics, 2020.

https://www.ncbi.nlm.nih.gov/pubmed/32663617

30. Gong Z, Zhu JW, Li CP, Jiang S, Ma LN, Tang BX, Zou D, Chen ML, Sun YB, Song SH, Zhang Z, Xiao JF, Xue YB, Bao YM, Du ZL, Zhao WM. An online coronavirus analysis platform from the National Genomics Data Center. Zool Res, 2020, 41(6): 705-708.

https://www.ncbi.nlm.nih.gov/pubmed/33045776

Patents

1. Rapid detection method and kit for indicating the total glycoside content range of Cistanche deserticola raw material; 201510648363

2. SARS-CoV-2 sequence surveillance and high-risk variant early warning system V1.0; 2023SR0397611

3. Online analysis software for identification and annotation of SARS-CoV-2 genome sequence variations V1.0; 2021SR1971135

4. Automated monitoring system for SARS-CoV-2 genome sequence variations; 2021SR2039217

5. Genomic variation-associated knowledge database system V1.0 [GWAS Atlas]; 2020SR0054886

6. Genomic variation database system V2.0; 2021SR1324808

7. Genomic variation database system V1.0; 2019SR0347243

8. MeRIP-PF: Software for identifying RNA m6A modification peaks based on MeRIP-Seq data; 2014SR057409

9. wapRNA: Online RNA analysis software based on high-throughput sequencing; 2012SR0832

Group Members

TIAN Dongmei Senior Engineer

LI Cuiping Senior Engineer

LI Lun Associate Researcher

LUO Hong Engineer

BAI Xue Assistant Researcher

Awards and Honors

1. Outstanding Member, Youth Innovation Promotion Association, Chinese Academy of Sciences

2. Member, Youth Innovation Promotion Association, Chinese Academy of Sciences

3. Selected for the Beijing Nova Program (Beijing Science and Technology Rising Star)

4. Member of the Advanced Collective in the National Science and Technology System for COVID-19 Response, Ministry of Science and Technology of China (December 29, 2021)

5. Member of the Advanced Collective in the National Genomics Data Center, Ministry of Science and Technology of China (December 29, 2021)

6. Recognized for one of the “Top 10 Advances in Bioinformatics in China” (National Genomics Data Center)

7. Recognized for one of the “Top 10 Advances in Bioinformatics in China” (National Genomics Data Center, SARS-CoV-2 Resources)

8. Recognized for one of the “Top 10 Advances in Bioinformatics in China” (Big Data Center for Life and Health)

Contact or Others

Internal Tel: +86-(0)10-84097620

External Tel: +86-18618491148

E-mail: songshh@big.ac.cn

Work address: No. 1 Beichen West Road, Chaoyang District, Beijing 100101, China