China National Center for Bioinformation Releases HaploThread, a Software Tool for Haplotype Network Analysis

Understanding genetic variation and evolutionary relationships is fundamental to population genetics and infectious disease research. Haplotype network analysis has become an essential approach for tracing pathogen transmission pathways, investigating evolutionary relationships, and studying population dynamics. However, most existing desktop tools for haplotype network analysis rely on single-threaded algorithms, which limits their ability to efficiently process large genomic datasets. In addition, many tools lack advanced interactive visualization capabilities for exploring temporal and spatial patterns in genetic data, creating challenges for researchers working with rapidly expanding genomic datasets.

To address these limitations, researchers from the National Genomics Data Center (NGDC), China National Center for Bioinformation, led by Prof. Shuhui Song, have developed HaploThread, an open-source desktop software designed for efficient haplotype network construction and interactive visualization. HaploThread integrates multiple state-of-the-art multi-threaded algorithms into a user-friendly graphical interface: McAN, and fastHaN, which incorporates the minimum spanning network (MSN), median-joining network (MJN), and TCS methods. By combining parallel computing with an intuitive desktop environment, the software enables researchers to rapidly construct and explore haplotype networks from large-scale genomic datasets without requiring programming expertise.

HaploThread provides two core functional modules: a network construction module and a network visualization module. The construction module supports sequence data in VCF or PHYLIP formats and allows users to run multiple multi-threaded algorithms with configurable computational threads, enabling efficient use of local computing resources. The visualization module supports network files in GraphML or GML formats and allows researchers to explore networks interactively, including automatic node coloring based on metadata such as sampling time, geographic location, or grouping information. The software also supports dynamic temporal visualization and spatial mapping of haplotypes, helping researchers track evolutionary and transmission patterns.
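For readers unfamiliar with the two steps described above, the following is a minimal, single-threaded Python sketch of the general idea, not HaploThread's implementation. It collapses identical aligned sequences into haplotypes, links them with a plain minimum spanning tree over Hamming distances (a simplified relative of the MSN method, which is not the same algorithm), and labels each node with the most common metadata value among its samples, as a stand-in for metadata-based node coloring. All function names, sequences, and location labels here are hypothetical and chosen only for illustration.

```python
from collections import Counter


def collapse_haplotypes(samples):
    """Group identical aligned sequences; returns {sequence: [sample ids]}."""
    groups = {}
    for sample_id, seq in samples.items():
        groups.setdefault(seq, []).append(sample_id)
    return groups


def hamming(a, b):
    """Number of differing positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def minimum_spanning_tree(haplotypes):
    """Prim's algorithm on the complete Hamming-distance graph.

    Returns (i, j, distance) edges, where i and j index into `haplotypes`.
    """
    n = len(haplotypes)
    if n == 0:
        return []
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        # Cheapest edge leaving the current tree (ties broken by index).
        dist, i, j = min(
            (hamming(haplotypes[i], haplotypes[j]), i, j)
            for i in in_tree
            for j in range(n)
            if j not in in_tree
        )
        edges.append((i, j, dist))
        in_tree.add(j)
    return edges


def majority_label(sample_ids, metadata):
    """Most common metadata value among the samples sharing a haplotype."""
    return Counter(metadata[s] for s in sample_ids).most_common(1)[0][0]


# Hypothetical toy data: five aligned sequences with a sampling location each.
samples = {
    "s1": "ACGTAC",
    "s2": "ACGTAC",
    "s3": "ACGTAT",
    "s4": "ACCTAT",
    "s5": "TCGTAC",
}
metadata = {
    "s1": "Beijing",
    "s2": "Beijing",
    "s3": "Shanghai",
    "s4": "Shanghai",
    "s5": "Wuhan",
}

groups = collapse_haplotypes(samples)
haplotypes = list(groups)
edges = minimum_spanning_tree(haplotypes)
labels = {h: majority_label(ids, metadata) for h, ids in groups.items()}

for i, j, dist in edges:
    print(haplotypes[i], "--", dist, "--", haplotypes[j])
for h, label in labels.items():
    print(h, "samples:", len(groups[h]), "label:", label)
```

The pairwise distance step grows quadratically with the number of haplotypes, which is one reason single-threaded tools struggle with datasets of thousands of genomes and why multi-threaded implementations matter at that scale.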

Performance evaluations demonstrate the significant efficiency advantages of HaploThread. In benchmarking tests using SARS-CoV-2 genome datasets, HaploThread completed both network construction and visualization for 5,000 sequences in just 23 seconds on a standard laptop, while several widely used tools were unable to complete the task within one hour. Accuracy assessments further confirmed that networks generated by HaploThread are topologically consistent with those produced by established methods, demonstrating both high computational performance and reliable network reconstruction.

The research article, titled “HaploThread: A Scalable Desktop Tool for Efficient Haplotype Network Inference and Interactive Visualization,” has been published in the journal Molecular Biology and Evolution. Prof. Shuhui Song is the corresponding author, and Dr. Bo Xu and Associate Professor Lun Li are co-first authors of the study. The software is freely available as open-source software under the GNU General Public License, with source code and installation packages accessible online. This work was supported by the Key Collaborative Research Program of the Alliance of National and International Science Organizations for the Belt and Road Regions, the Biological Breeding-National Science and Technology Major Project, and the National Natural Science Foundation of China.

Figure 1. HaploThread workflow diagram, showing the network construction module and the network visualization module with their main functions and input/output formats.

Article link: https://doi.org/10.1093/molbev/msag052