Visualizing Complexity: A Universal Ecosystem for Multi-Scale Scientific Discovery
Data Visualization Ecosystem
We have developed a suite of software tools to transform data into intuitive and accurate visual narratives. This ecosystem ranges from foundational plotting frameworks to specialized visualization tools for phylogeny and functional omics.
1. General Purpose Plotting Tools
To address the fragmentation of the R plotting landscape, we developed a suite of tools that unify disparate plotting systems and introduce data-driven alignment.
- plotbb: Brings the Grammar of Graphics to Base R, allowing for structured, layered plotting within the traditional R graphics system.
- ggplotify: Converts virtually any plot object (base, lattice, pheatmap, etc.) into a ggplot2-compatible object, enabling seamless integration and complex figure assembly within the ggplot2 ecosystem.
- aplot & aplotExtra: Moving beyond simple figure assembly, this suite introduces a systematic approach to automatically synchronize coordinate systems based on the underlying data structure (e.g., matching axes and reordering categories), ensuring heterogeneous subplots are both statistically and spatially congruent.
- ggbreak: Provides a seamless, non-destructive method for axis breaks, essential for visualizing datasets with extreme outliers or multi-scale distributions.
- ggtangle & ggflow: ggtangle visualizes networks within the tidy framework, while ggflow provides a dedicated grammar for flowcharts and transition processes.
- ggfun: Provides foundational utilities that enhance the developer and user experience across the entire ecosystem.
These tools establish a unified framework that bridges plotting systems, enables cross-system interoperability, and introduces intelligent data-driven alignment.
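As a rough illustration of the conversion, alignment, and axis-break ideas above, the following minimal sketch uses the built-in `mtcars` data and a small made-up data frame `d`. It assumes ggplot2, ggplotify (with gridGraphics for base plots), aplot, and ggbreak are installed; the variable names and break range are illustrative choices, not part of the packages.

```r
library(ggplot2)
library(ggplotify)
library(aplot)
library(ggbreak)

# ggplotify: wrap a base R plot (passed as a formula) in a ggplot object.
base_as_gg <- as.ggplot(~plot(mtcars$wt, mtcars$mpg))

# aplot: attach a marginal density above a scatter plot.
# insert_top() aligns the x axis of the inserted panel with the main panel.
main <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
top <- ggplot(mtcars, aes(wt)) + geom_density() + theme_void()
aligned <- insert_top(main, top, height = 0.3)

# ggbreak: cut out the empty range between the bulk of the data and one outlier.
# `d` is made-up data used only for this example.
d <- data.frame(x = 1:10, y = c(1:9, 100))
broken <- ggplot(d, aes(x, y)) + geom_col() + scale_y_break(c(10, 95))
```

Printing `aligned` or `broken` at the console renders the figures; `base_as_gg` can be combined with further ggplot2 layers or themes like any other ggplot object.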
2. Specialized Visualization Tools
Beyond general-purpose utilities, we have developed visualization tools for specific biological domains.
- Phylogenetic Visualization (Phylogenetic Contribution): Led by ggtree, ggtreeExtra, ggtreeSpace, and ggtreeDendro, this suite facilitates the integration of tree-structured data with multi-omics layers.
- Functional Enrichment (Knowledge Mining Contribution): enrichplot transforms enrichment results into biologically intuitive visual insights, enabling the automated interpretation of omics datasets.
- Sequence & Genomic Visualization: seqcombo and ggmsa provide a modular grammar for multiple sequence alignment and the visualization of genomic reassortment events.
- Single-Cell & Fine-Scale Omics: ggsc and ivolcano address the unique needs of high-resolution data, providing specialized geometries for single-cell clusters and differential expression.
- Glycobiology: gglycan introduces a grammar for visualizing complex glycan structures, supporting standard symbolic nomenclature (e.g., SNFG).
These tools are widely used in biological research and cited in numerous studies.
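To give a sense of how tree-structured data and external annotations are combined, here is a minimal ggtree sketch. The random tree from `ape::rtree()` and the `traits` table are stand-in data invented for this example; it assumes the ape and ggtree packages are installed.

```r
library(ape)
library(ggtree)

set.seed(42)
tree <- rtree(8)  # random 8-tip tree used as placeholder data

# Hypothetical per-tip annotation; the first column holds the tip labels
# so that it can be matched against the tree.
traits <- data.frame(
  label = tree$tip.label,
  group = rep(c("A", "B"), each = 4)
)

# %<+% attaches the annotation table to the tree view, after which
# its columns can be used in aesthetic mappings of later layers.
p <- ggtree(tree) %<+% traits +
  geom_tiplab(offset = 0.05) +
  geom_tippoint(aes(color = group))
```

The same pattern of drawing the tree first and then layering annotation geoms is what the extension packages build on when adding further data panels.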
3. Semantic Enrichment
To bridge the gap between abstract data and human intuition, we developed tools for semantic enrichment and professional branding.
- ggimage & scatterpie: Extend the visual vocabulary of ggplot2 with external imagery and composite geometries.
- ggstar: Provides a comprehensive suite of easily discernible polygonal shapes for ggplot2.
- emojifont, shadowtext, & meme: Enhance semantic storytelling through advanced typography and cultural icons.
- hexSticker: Helps R developers create hexagon stickers for their packages, supporting professional branding.
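As a small sketch of how two of these layers slot into an ordinary ggplot2 call, the example below draws star-shaped points with ggstar and outlined labels with shadowtext. The data frame `d` and the colour choices are invented for illustration; it assumes ggplot2, ggstar, and shadowtext are installed.

```r
library(ggplot2)
library(ggstar)
library(shadowtext)

# Made-up data used only for this example.
d <- data.frame(
  x = 1:3,
  y = c(2, 3, 1),
  label = c("alpha", "beta", "gamma")
)

p <- ggplot(d, aes(x, y)) +
  # ggstar: polygonal point shapes, filled by category
  geom_star(aes(fill = label), size = 4) +
  # shadowtext: text with a contrasting outline so it stays legible
  geom_shadowtext(aes(label = label), vjust = -1,
                  colour = "white", bg.colour = "black")
```

Both geoms behave like regular ggplot2 layers, so they can be mixed with standard scales, facets, and themes.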
Community Impact
Our visualization frameworks have been widely adopted and integrated into many third-party packages. This work serves as important infrastructure for bioinformatics visualization workflows.
Selected Publications
- S Xu, M Chen, T Feng, L Zhan, L Zhou, G Yu*. Use ggbreak to effectively utilize plotting space to deal with large datasets and outliers. Frontiers in Genetics. 2021, 12:774846.
- G Yu. Data Integration, Manipulation and Visualization of Phylogenetic Trees (1st edition). Chapman and Hall/CRC, 2022. doi: 10.1201/9781003279242
- S Xu, H Dai, X Bo, G Yu*. ggmsa: a visual exploration tool for multiple sequence alignment and associated data. Briefings in Bioinformatics. 2021, 22(6):bbab222.