Evolutionary Knowledge Integration: Foundational Infrastructure and Universal Standards
Redefining the Backbone: A Foundational Infrastructure for Tree-Based Knowledge Integration
Over the past decade, our work has reshaped how the scientific community manipulates, integrates, and interprets tree-structured biological data. By establishing a widely adopted infrastructure for phylogenetics, we have moved the field beyond simple visualization toward programmable knowledge synthesis. Our contributions address the core challenges of data fragmentation, theoretical abstraction, and multi-scale integration, providing the analytical foundations required for modern systems biology.
Two monographs introduce this body of work: “Data Integration, Manipulation and Visualization of Phylogenetic Trees” (in English), published by CRC Press, and 《R实战:系统发育树的数据集成操作与可视化》 (in Chinese; roughly “R in Action: Data Integration, Manipulation and Visualization of Phylogenetic Trees”), published by Publishing House of Electronics Industry (电子工业出版社).
Pillar 1: Bridging the Format Divide - The Universal Infrastructure
The outputs of phylogenetic tools have historically been confined to fragmented, non-standard formats, creating significant barriers to knowledge integration. To resolve this, we developed treeio, which serves as the universal infrastructure for the field.
Format Interoperability: treeio resolved the “Format Fragmentation” problem by providing a robust parser for over 20 standard and non-standard formats. This enables the seamless exchange of evolutionary data across disparate software ecosystems and forms the basis for ESI-highly-cited research published in Molecular Biology and Evolution.
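The idea of reading several formats into one common object can be sketched in a few lines. The Python example below is purely illustrative and is not treeio's API (treeio is an R package): `parse_newick`, `from_edge_list`, and `tip_labels` are hypothetical helpers, the Newick parser handles only labels and branch lengths in a string terminated by `;`, and the edge-list input is an invented stand-in for a second format.

```python
def _new_node(label, length=None):
    """Common in-memory representation shared by every parser below."""
    return {"label": label, "length": length, "children": []}


def parse_newick(text):
    """Parse a minimal Newick string (labels and branch lengths only)."""
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # consume "("
            children.append(parse_node())
            while text[pos] == ",":
                pos += 1  # consume ","
                children.append(parse_node())
            pos += 1  # consume ")"
        start = pos
        while text[pos] not in ",():;":
            pos += 1
        node = _new_node(text[start:pos])
        if text[pos] == ":":
            pos += 1  # consume ":"
            start = pos
            while text[pos] not in ",();":
                pos += 1
            node["length"] = float(text[start:pos])
        node["children"] = children
        return node

    return parse_node()


def from_edge_list(edges, root):
    """Build the same structure from (parent, child, length) rows."""
    nodes = {root: _new_node(root)}
    for parent, child, length in edges:
        nodes.setdefault(child, _new_node(child))["length"] = length
        nodes.setdefault(parent, _new_node(parent))["children"].append(nodes[child])
    return nodes[root]


def tip_labels(node):
    """Collect tip labels in traversal order, whatever the input format was."""
    if not node["children"]:
        return [node["label"]]
    labels = []
    for child in node["children"]:
        labels.extend(tip_labels(child))
    return labels


newick_tree = parse_newick("((A:0.1,B:0.2)AB:0.3,C:0.4);")
edge_tree = from_edge_list(
    [("root", "AB", 0.3), ("root", "C", 0.4), ("AB", "A", 0.1), ("AB", "B", 0.2)],
    root="root",
)
print(tip_labels(newick_tree))
print(tip_labels(edge_tree))
```

Once both inputs are normalized to the same structure, downstream code such as `tip_labels` no longer needs to know which format the tree came from.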
Pillar 2: The Grammar of Graphics for Evolution - Theoretical Leadership
Before our work, tree visualization was largely restricted to topological display. We pioneered the application of the Grammar of Graphics to phylogenetics through ggtree, decoupling evolutionary data from its visual representation.
Global Standards: ggtree has become a widely adopted tool for tree annotation, cited in thousands of studies across high-impact journals. Recognized as a “representative work” for the 10th anniversary of Methods in Ecology and Evolution, it provides a high-level, layer-based abstraction that is open-ended in how omics data can be mapped onto evolutionary histories.
Pillar 3: Multi-Layer Synthesis & Data-Driven Integration - Mastering Complexity
As omics data reached unprecedented scales, our team introduced the “Data-to-Tree” paradigm in foundational work published in Molecular Biology and Evolution in 2018 (ESI highly cited). This work proposed two comprehensive methods that redefined the integration of heterogeneous data within a unified evolutionary context.
Theoretical Foundations & Universal Derivatives: The two methods introduced in 2018 have since evolved from specialized phylogenetic tools into universal visualization standards:
- Method 1 (Topological Mapping): Focused on mapping data directly onto tree structures, this paradigm evolved into ggtangle for universal tidy-network visualization.
- Method 2 (Coordinate Alignment): Focused on reconciling disparate data layers with tree topology, this logic provided the foundational architecture for aplot, a widely adopted tool for multi-layer plot alignment.
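The two ideas above, attaching external data to tree nodes and aligning a separate data panel to the tree's layout, can be illustrated with a small language-neutral sketch. It is written in Python for illustration only and does not reproduce ggtree's implementation or API; `Node`, `assign_y`, `attach`, and `align_panel` are hypothetical names, and the sample values are made up.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    data: dict = field(default_factory=dict)
    y: float = 0.0


def iter_nodes(root):
    """Yield every node in pre-order."""
    yield root
    for child in root.children:
        yield from iter_nodes(child)


def assign_y(root):
    """Lay out the tree: tips get consecutive y positions in traversal
    order, internal nodes sit at the mean y of their children."""
    counter = 0

    def visit(node):
        nonlocal counter
        if not node.children:
            counter += 1
            node.y = float(counter)
        else:
            for child in node.children:
                visit(child)
            node.y = sum(c.y for c in node.children) / len(node.children)

    visit(root)


def attach(root, table):
    """Idea 1: join external records onto tree nodes by matching labels."""
    for node in iter_nodes(root):
        if node.label in table:
            node.data.update(table[node.label])


def align_panel(root, values):
    """Idea 2: order a separate data panel by the tree's tip positions,
    so each row lines up with its tip when drawn side by side."""
    rows = [(n.y, n.label, values.get(n.label))
            for n in iter_nodes(root) if not n.children]
    return sorted(rows, key=lambda row: row[0])


a, b, c = Node("A"), Node("B"), Node("C")
root = Node("root", [Node("AB", [a, b]), c])
assign_y(root)

attach(root, {"A": {"host": "human"}, "C": {"host": "bird"}})
panel = align_panel(root, {"C": 0.9, "A": 0.1})
for row in panel:
    print(row)
```

In both cases the external data never has to be stored in tree order: the label join and the shared y coordinate do the matching, and tips without data (here `B`) simply carry a missing value.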
Expanding the Ecosystem: These principles were further extended to address specialized biological data types and relational structures:
- Molecular Context (ggmsa): Integrating sequence-level information is critical for understanding the molecular basis of evolution. ggmsa provides a modular grammar for multiple sequence alignment (MSA) visualization, enabling the side-by-side alignment of structural and genomic conservation data with phylogenetic trees.
- Relational Flow (ggflow): Beyond static structures, biological evolution and research protocols involve directional transitions. ggflow introduces a grammar for visualizing tree-like flowcharts and process transitions, allowing researchers to document analytical workflows or evolutionary state-change paths within the same ecosystem.
- Layered Complexity (ggtreeExtra): ggtreeExtra handles massive multi-omics layers in complex layouts, enabling the integration of diverse data types around phylogenetic trees.
- Spatial Mapping (ggtreeSpace): ggtreeSpace explores the geometric mapping of evolutionary distances, providing spatial representations of phylogenetic relationships.
Programmable Reproducibility: Our work in iMeta (2022) established the ggtree object—a programmable structure that ensures analytical reproducibility by encapsulating trees, data, and visualization directives.
Pillar 4: Vertical Generalization - From Phylogeny to General Hierarchy
To demonstrate the universal utility of our grammar, we expanded its scope beyond evolutionary biology. By generalizing the framework to encompass all hierarchical structures, we bridged the gap between specialized biological interpretation and general data science.
Universal Scope: Through ggtreeDendro, we extended the phylogenetic grammar to hierarchical clustering and classification/regression trees. This enables the same rigorous data integration methods used in phylogenetics to be applied to any sample-level or feature-level hierarchical relationship (e.g., cell clustering), unifying disparate analytical workflows under a single theoretical umbrella.
Feedback from the academic community
Publications
- M Chen#, X Luo#, S Xu#, L Li, J Li, Z Xie, Q Wang, Y Liao, B Liu, W Liang, K Mo, Q Song, X Chen*, TTY Lam*, G Yu*. Scalable method for exploring phylogenetic placement uncertainty with custom visualizations using treeio and ggtree. iMeta. 2025, 4(1):e269.
- L Zhan#, X Luo#, W Xie#, XA Zhu#, Z Xie, J Lin, L Li, W Tang, R Wang, L Deng, Y Liao, B Liu, Y Cai, Q Wang, S Xu*, G Yu*. shinyTempSignal: an R shiny application for exploring temporal and other phylogenetic signals. Journal of Genetics and Genomics. 2024, 51(7):762-768.
- L Li, W Xie, L Zhan, S Wen, X Luo, S Xu, Y Cai, W Tang, Q Wang, M Li, Z Xie, L Deng, H Zhu, G Yu*. Resolving Tumor Evolution: A Phylogenetic Approach. Journal of the National Cancer Center. 2024, 4(2):97-106.
- S Xu, L Li, X Luo, M Chen, W Tang, L Zhan, Z Dai, TT Lam, Y Guan, G Yu*. Ggtree: A serialized data object for visualization of a phylogenetic tree and annotation data. iMeta. 2022, 1(4):e56.
- G Yu. Data Integration, Manipulation and Visualization of Phylogenetic Trees (1st edition). Chapman and Hall/CRC, 2022. doi: 10.1201/9781003279242
- L Zhou#, T Feng#, S Xu, F Gao, TT Lam, Q Wang, T Wu, H Huang, L Zhan, L Li, Y Guan, Z Dai*, G Yu*. ggmsa: a visual exploration tool for multiple sequence alignment and associated data. Briefings in Bioinformatics. 2022, 23(4):bbac222.
- S Xu, Z Dai, P Guo, X Fu, S Liu, L Zhou, W Tang, T Feng, M Chen, L Zhan, T Wu, E Hu, Y Jiang*, X Bo*, G Yu*. ggtreeExtra: Compact visualization of richly annotated phylogenetic data. Molecular Biology and Evolution. 2021, 38(9):4039-4042.
- G Yu*. Using ggtree to Visualize Data on Tree-Like Structure. Current Protocols in Bioinformatics. 2020, 69(1):e96.
- LG Wang, TTY Lam, S Xu, Z Dai, L Zhou, T Feng, P Guo, CW Dunn, BR Jones, T Bradley, H Zhu, Y Guan, Y Jiang, G Yu*. treeio: an R package for phylogenetic tree input and output with richly annotated and associated data. Molecular Biology and Evolution. 2020, 37(2):599-603.
- G Yu*, TTY Lam, H Zhu, Y Guan*. Two methods for mapping and visualizing associated data on phylogeny using ggtree. Molecular Biology and Evolution. 2018, 35(12):3041-3043.
- G Yu, DK Smith, H Zhu, Y Guan, TTY Lam*. ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods in Ecology and Evolution. 2017, 8(1):28-36.


