Phylogenetic Data Integration: Methods and Applications
We have developed methods and software tools for the manipulation, integration, and visualization of phylogenetic trees and associated data. Key contributions include: (1) parsing and integrating phylogenetic data; (2) visualizing phylogenetic trees using the grammar of graphics; (3) mapping heterogeneous data onto evolutionary trees; and (4) ensuring reproducibility with programmable data structures. These efforts aim to support researchers in analyzing data within an evolutionary context.
Two monographs have been published to introduce this series of work: “Data integration, manipulation and visualization of phylogenetic trees” (in English) by CRC Press and 《R实战:系统发育树的数据集成操作与可视化》 (in Chinese) by Publishing House of Electronics Industry (电子工业出版社).
1. Parsing and Integrating Phylogenetic Data
Phylogenetic software often writes its output in non-standard formats, which causes compatibility problems and hinders integration and comparative analysis. To address this, we developed treeio, an R package that parses both standard and a variety of non-standard data formats. It facilitates the integration of external data and can export a phylogenetic tree together with its associated data into a single file. This enables format conversion and data integration in support of downstream analysis. This work was published in Molecular Biology and Evolution in 2020.
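As a minimal sketch of this workflow, the R snippet below parses a tree, joins a small external table to it, and writes both into a single file with treeio. The Newick string and the `traits` table (with its numeric `trait` column) are made up for illustration and are not data from the paper. The join is called as `dplyr::full_join()` on the assumption that the tree methods registered by tidytree are available once treeio is loaded.

```r
library(treeio)

# Parse a toy tree from a Newick string (read.tree() comes from ape)
tree <- ape::read.tree(text = "((A:1,B:1):1,(C:1,D:1):1);")

# Hypothetical external data, keyed by tip label
traits <- data.frame(
  label = c("A", "B", "C", "D"),
  trait = c(1.2, 3.4, 2.1, 0.7)
)

# Join the table to the tree; the result is a treedata object
td <- dplyr::full_join(tree, traits, by = "label")

# Export the tree and its associated data into one BEAST-style Nexus file
out <- tempfile(fileext = ".tree")
write.beast(td, file = out)

# Read the combined file back in and list the annotation fields it carries
td2 <- read.beast(out)
get.fields(td2)
```

The same `treedata` object can be passed to downstream tools, so the join only has to be done once.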
2. Visualizing Phylogenetic Trees with Grammar of Graphics
Numerous software tools exist for visualizing phylogenetic trees, but most focus on displaying the tree's topological structure. ggtree introduces the grammar of graphics into the visualization of phylogenetic trees and related data. With this approach, a figure is built by combining simple layers, which reduces the complexity of data visualization and accommodates a wide range of requirements. This work was published in Methods in Ecology and Evolution in 2017. An invited protocol paper demonstrating the use of this package was also published in Current Protocols in Bioinformatics in 2020.
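The layered style can be illustrated with a short sketch. It assumes a random tree from `ape::rtree()` as a stand-in for real data; the layer choices, colors, and sizes are arbitrary and only show how layers are added one at a time.

```r
library(ggtree)

set.seed(2017)
tree <- ape::rtree(20)  # random 20-tip tree used as placeholder data

# Start from the tree, then add one layer at a time
p <- ggtree(tree, layout = "rectangular") +
  geom_tiplab(size = 3) +                              # tip labels
  geom_nodepoint(color = "steelblue", alpha = 0.6) +   # internal-node symbols
  theme_tree2()                                        # show the branch-length axis

print(p)
```

Changing the figure means adding or removing a layer rather than rewriting the plotting call.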
3. Associating and Visualizing Data on Phylogeny
We proposed two methods for integrating and visualizing phylogenetic data. The first maps data directly onto the tree's topology. The second restructures external data according to the tree's topology, lets users visualize it as they choose, and then aligns that visualization with the phylogenetic tree. Together, these methods enable the integration of heterogeneous data in a phylogenetic context. This work was published in Molecular Biology and Evolution in 2018. The ggtreeExtra package, which enhances these capabilities, was published in Molecular Biology and Evolution in 2021.
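The two approaches might look as follows in ggtree. This is a sketch under stated assumptions: the tree is random, the `info` table and its `region` and `value` columns are invented for illustration, and `geom_facet()` is used here for the second method as one of the interfaces ggtree offers for aligned panels.

```r
library(ggtree)
library(ggplot2)

set.seed(2018)
tree <- ape::rtree(10)

# Hypothetical external data; the first column holds the tip labels
info <- data.frame(
  label  = tree$tip.label,
  region = rep(c("north", "south"), each = 5),
  value  = runif(10)
)

# Method 1: attach the table to the tree with %<+% and map a column
# directly onto the tree's own layers
p <- ggtree(tree) %<+% info +
  geom_tippoint(aes(color = region))

# Method 2: let ggtree reorder the external data by the tree structure
# and draw it with an ordinary ggplot2 geom in a panel aligned to the tree
p2 <- p + geom_facet(
  panel = "Value", data = info, geom = geom_col,
  mapping = aes(x = value), orientation = "y"
)

print(p2)
```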
4. Enhancing Reproducibility
Visualization of phylogenetic trees typically produces static images, which cannot be reused. We devised the ggtree object, which encapsulates the phylogenetic tree, data, and visualization directives. The object can be rendered as an image, and the phylogenetic tree and related data can be extracted from the object. Furthermore, the visualization directives can be transferred to other tree objects. This work, published in iMeta in 2022, supports data reusability and research replicability.
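A brief sketch of both ideas, assuming two unrelated random trees as placeholders: the data stored in a ggtree object is inspected through its `data` element, and the `%<%` operator from ggtree reapplies the existing layers to a different tree.

```r
library(ggtree)

set.seed(2022)
tree1 <- ape::rtree(15)
tree2 <- ape::rtree(25)

# Build a plot object once; it stores the tree, its data, and the layers
p1 <- ggtree(tree1) +
  geom_tiplab(size = 3) +
  geom_nodepoint(color = "firebrick")

# The tree and per-node data remain accessible as a table
head(p1$data)

# Reuse the same visualization directives on another tree
p2 <- p1 %<% tree2
print(p2)
```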
5. Extending to General Hierarchical Structures
We have broadened the scope of our tree data integration and visualization tools to cover other tree-like structures, such as hierarchical clustering results. We developed the ggtreeDendro package to accommodate general hierarchical structures and are working on the ecluster package to support various omics data structures. These advancements enable the interpretation and integration of related data based on their hierarchical relationships.
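As a rough illustration of treating a clustering result as a tree, the sketch below passes an `hclust` object straight to `ggtree()`, which accepts such objects. It uses the built-in `USArrests` data purely as an example and does not show the ggtreeDendro interface itself, which provides `autoplot()` methods built on the same idea.

```r
library(ggtree)

# Hierarchical clustering on a built-in example data set
hc <- hclust(dist(USArrests), method = "average")

# Draw the dendrogram with the same layers used for phylogenetic trees
p <- ggtree(hc) +
  geom_tiplab(size = 2)

print(p)
```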
Feedback from the academic community
Publications
- M Chen#, X Luo#, S Xu#, L Li, J Li, Z Xie, Q Wang, Y Liao, B Liu, W Liang, K Mo, Q Song, X Chen*, TTY Lam*, G Yu*. Scalable method for exploring phylogenetic placement uncertainty with custom visualizations using treeio and ggtree. iMeta. 2025, 4(1):e269.
- L Zhan#, X Luo#, W Xie#, XA Zhu#, Z Xie, J Lin, L Li, W Tang, R Wang, L Deng, Y Liao, B Liu, Y Cai, Q Wang, S Xu*, G Yu*. shinyTempSignal: an R shiny application for exploring temporal and other phylogenetic signals. Journal of Genetics and Genomics. 2024, 51(7):762-768.
- L Li, W Xie, L Zhan, S Wen, X Luo, S Xu, Y Cai, W Tang, Q Wang, M Li, Z Xie, L Deng, H Zhu, G Yu*. Resolving Tumor Evolution: A Phylogenetic Approach. Journal of the National Cancer Center. 2024, 4(2):97-106.
- S Xu, L Li, X Luo, M Chen, W Tang, L Zhan, Z Dai, TT Lam, Y Guan, G Yu*. Ggtree: A serialized data object for visualization of a phylogenetic tree and annotation data. iMeta. 2022, 1(4):e56.
- G Yu. Data Integration, Manipulation and Visualization of Phylogenetic Trees (1st edition). Chapman and Hall/CRC, 2022. doi: 10.1201/9781003279242
- L Zhou#, T Feng#, S Xu, F Gao, TT Lam, Q Wang, T Wu, H Huang, L Zhan, L Li, Y Guan, Z Dai*, G Yu*. ggmsa: a visual exploration tool for multiple sequence alignment and associated data. Briefings in Bioinformatics. 2022, 23(4):bbac222.
- S Xu, Z Dai, P Guo, X Fu, S Liu, L Zhou, W Tang, T Feng, M Chen, L Zhan, T Wu, E Hu, Y Jiang*, X Bo*, G Yu*. ggtreeExtra: Compact visualization of richly annotated phylogenetic data. Molecular Biology and Evolution. 2021, 38(9):4039-4042.
- G Yu*. Using ggtree to Visualize Data on Tree-Like Structure. Current Protocols in Bioinformatics. 2020, 69(1):e96.
- LG Wang, TTY Lam, S Xu, Z Dai, L Zhou, T Feng, P Guo, CW Dunn, BR Jones, T Bradley, H Zhu, Y Guan, Y Jiang, G Yu*. treeio: an R package for phylogenetic tree input and output with richly annotated and associated data. Molecular Biology and Evolution. 2020, 37(2):599-603.
- G Yu*, TTY Lam, H Zhu, Y Guan*. Two methods for mapping and visualizing associated data on phylogeny using ggtree. Molecular Biology and Evolution. 2018, 35(12):3041-3043.
- G Yu, DK Smith, H Zhu, Y Guan, TTY Lam*. ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods in Ecology and Evolution. 2017, 8(1):28-36.







