Visualization Overview

CRESSENT includes a suite of visualization tools that produce publication-ready figures for genomic and phylogenetic analyses. The visualization modules turn analysis results into graphics suitable for scientific publications and presentations.

[Figure: visualization module (fig_module_viz.png)]

Overview

The visualization suite includes four primary modules:

  • Sequence logos for motif visualization

  • Phylogenetic trees with advanced customization options

  • Tanglegrams for comparative phylogenetic analysis

  • Motif mapping for genome-wide pattern distribution

Sequence Logo Generation

Purpose

Create information-rich sequence logos that display motif conservation patterns, nucleotide preferences, and positional information content.

Basic Usage

# From FASTA sequences
cressent seq_logo -i aligned_sequences.fasta \
                 -o visualization/ \
                 -n motif_logo.pdf \
                 --plot_title "CRESS Nonanucleotide"

# From motif table
cressent seq_logo -tb motif_positions.txt \
                 -o visualization/ \
                 -n discovered_motifs.pdf \
                 --width 12 \
                 --height 6

Key Features

  • Multi-row support for long sequences (automatic wrapping)

  • Grouped analysis by metadata categories

  • Information content or probability-based scaling

  • Publication-ready output with customizable dimensions
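The difference between the two scaling options can be illustrated with a small calculation. The sketch below is not CRESSENT's implementation: it assumes aligned DNA sequences, applies no small-sample correction, counts every character in a column (including gaps) as a letter, and uses hypothetical function names. Under probability scaling each letter's height is its frequency in the column; under bits scaling the column's total height is its information content (log2 of the alphabet size minus the Shannon entropy), shared among letters in proportion to their frequency.

```python
import math
from collections import Counter


def column_frequencies(sequences):
    """Return one {letter: frequency} dict per alignment column."""
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned to the same length")
    columns = []
    for i in range(length):
        counts = Counter(seq[i].upper() for seq in sequences)
        total = sum(counts.values())
        columns.append({letter: n / total for letter, n in counts.items()})
    return columns


def information_content(freq, alphabet_size=4):
    """Bits of information: log2(alphabet size) minus Shannon entropy."""
    entropy = -sum(p * math.log2(p) for p in freq.values() if p > 0)
    return math.log2(alphabet_size) - entropy


def letter_heights(freq, method="bits"):
    """Letter heights for one column under "prob" or "bits" scaling."""
    if method == "prob":
        return dict(freq)
    if method == "bits":
        total_bits = information_content(freq)
        return {letter: p * total_bits for letter, p in freq.items()}
    raise ValueError(f"unknown method: {method}")


# Hypothetical input: four aligned nonanucleotides, one with a G-to-A change.
sequences = ["TAGTATTAC", "TAGTATTAC", "TAATATTAC", "TAGTATTAC"]
columns = column_frequencies(sequences)
print(letter_heights(columns[2], method="prob"))
print(letter_heights(columns[2], method="bits"))
```

A fully conserved DNA column reaches the maximum of 2 bits, while the variable third column in this example is shorter under bits scaling but still sums to 1 under probability scaling.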

Advanced Customization

cressent seq_logo -tb motif_table.txt \
                 --plot_title "Viral Family Comparison" \
                 --split \
                 --metadata family_data.csv \
                 --group_label "virus_family" \
                 --ncol 3 \
                 --positions_per_row 40 \
                 --method "bits" \
                 --width 15 \
                 --height 10 \
                 -o grouped_logos/

Parameters

Parameter                    Description                  Default
---------------------------  ---------------------------  ---------------------
--plot_title                 Logo title                   sequence_logo
--width / --height           Dimensions in inches         10 / 10
--method                     Scaling method (bits/prob)   prob
--positions_per_row          Positions before wrapping    50
--max_positions_single_row   Auto-wrap threshold          100
--split                      Group by metadata            False
--ncol                       Columns for grouped display  Required with --split
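How the two wrapping options interact is not spelled out here, so the following sketch rests on an assumption: a logo with at most `--max_positions_single_row` positions stays on one row, and a longer logo is cut into rows of `--positions_per_row` positions. The function name is hypothetical and the defaults are taken from the table above.

```python
def split_into_rows(n_positions, positions_per_row=50, max_positions_single_row=100):
    """Return 1-based inclusive (start, end) position ranges, one per logo row.

    Assumed behavior, not confirmed against CRESSENT: wrap only when the
    logo is longer than max_positions_single_row.
    """
    if n_positions <= 0:
        raise ValueError("n_positions must be positive")
    if positions_per_row <= 0:
        raise ValueError("positions_per_row must be positive")
    if n_positions <= max_positions_single_row:
        return [(1, n_positions)]
    return [
        (start, min(start + positions_per_row - 1, n_positions))
        for start in range(1, n_positions + 1, positions_per_row)
    ]


print(split_into_rows(9))                          # short motif, single row
print(split_into_rows(120))                        # wrapped with the default row length
print(split_into_rows(120, positions_per_row=40))  # wrapped into rows of 40
```

Estimating the row count this way can help choose `--height` before rendering, since each additional row needs vertical space.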

Output Examples

Single sequence logo:

  • Standard logo for conserved motifs

  • Information content visualization

  • Multi-row display for long sequences

Grouped logos:

  • Family-specific motif patterns

  • Comparative analysis across groups

  • Statistical significance indicators

Phylogenetic Tree Visualization

Purpose

Generate sophisticated phylogenetic tree visualizations with metadata integration, alignment display, and publication-quality formatting.

Basic Usage

# Simple tree plot
cressent plot_tree -t phylogeny.treefile \
                  -o tree_output/ \
                  --plot_name tree_analysis.pdf

# Tree with metadata
cressent plot_tree -t phylogeny.treefile \
                  --metadata_1 sequence_info.csv \
                  --metadata_2 name_mapping.tsv \
                  -o tree_output/ \
                  --color TRUE \
                  --tip_label "family"

Layout Options

Multiple tree layouts for different analytical needs:

# Circular tree
cressent plot_tree -t tree.treefile \
                  --layout circular \
                  --open_angle 90 \
                  --offset 0.2 \
                  -o circular_trees/

# Rectangular tree with alignment
cressent plot_tree -t tree.treefile \
                  --layout rectangular \
                  --alignment sequences.fasta \
                  --metadata_1 annotations.csv \
                  --color TRUE \
                  -o aligned_trees/

# Unrooted tree
cressent plot_tree -t tree.treefile \
                  --layout unrooted \
                  --branch_length "branch.length" \
                  --fig_width 12 \
                  --fig_height 12 \
                  -o network_trees/

Distance Matrix Trees

Generate trees directly from distance matrices:

cressent plot_tree --dist_matrix sequences.mldist \
                  --metadata_1 annotations.csv \
                  --layout circular \
                  --color TRUE \
                  --tip_label "species" \
                  -o distance_trees/
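Before plotting from a distance matrix, it can be worth checking that the file is well formed. The sketch below assumes a PHYLIP-style square matrix (a first line with the number of taxa, then one row per taxon with its name followed by the distances), which is the layout commonly used for `.mldist` files; that format, the function names, and the example values are assumptions, not something this page specifies.

```python
def parse_square_matrix(text):
    """Parse a PHYLIP-style square distance matrix into (names, rows)."""
    lines = [line for line in text.splitlines() if line.strip()]
    n = int(lines[0].split()[0])
    names, rows = [], []
    for line in lines[1:n + 1]:
        fields = line.split()
        names.append(fields[0])
        rows.append([float(value) for value in fields[1:n + 1]])
    if len(names) != n or any(len(row) != n for row in rows):
        raise ValueError("matrix is not n x n")
    return names, rows


def is_symmetric(rows, tolerance=1e-9):
    """True when d(i, j) equals d(j, i) for every pair."""
    n = len(rows)
    return all(
        abs(rows[i][j] - rows[j][i]) <= tolerance
        for i in range(n)
        for j in range(i + 1, n)
    )


def closest_pair(names, rows):
    """Return (distance, name_a, name_b) for the two most similar taxa."""
    best = None
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if best is None or rows[i][j] < best[0]:
                best = (rows[i][j], names[i], names[j])
    return best


# Hypothetical three-taxon matrix.
example = """3
seqA 0.00 0.12 0.40
seqB 0.12 0.00 0.35
seqC 0.40 0.35 0.00
"""
names, rows = parse_square_matrix(example)
print(is_symmetric(rows), closest_pair(names, rows))
```

The taxon names parsed here are also the identifiers that the metadata file has to match when coloring or labeling tips.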

Advanced Features

Feature               Description               Usage
--------------------  ------------------------  ---------------------------------------------
Metadata integration  Color coding by groups    --metadata_1 / --metadata_2 with --color TRUE
Alignment display     MSA alongside tree        --alignment sequences.fasta
Custom labeling       Flexible tip labels       --tip_label "column_name"
Branch scaling        Different length metrics  --branch_length "method"
Size control          Publication dimensions    --fig_width / --fig_height

Parameters

Parameter        Description               Default
---------------  ------------------------  -------------
--layout         Tree layout style         rectangular
--branch_length  Branch length method      branch.length
--open_angle     Circular tree opening     0
--offset         Tip label offset          0.14
--tip_label      Metadata column for tips  family
--color          Color by groups           TRUE
--plot_tips      Show tip labels           TRUE

Tanglegram Analysis

Purpose

Compare two phylogenetic trees through tanglegram visualization, highlighting topological differences and calculating Robinson-Foulds distances.

Basic Usage

cressent tanglegram --tree1 nucleotide_tree.treefile \
                   --tree2 protein_tree.treefile \
                   --label1 "Nucleotide Tree" \
                   --label2 "Protein Tree" \
                   -o tanglegram_output/ \
                   --name_tanglegram comparison.pdf

Features

  • Automatic tree comparison with Robinson-Foulds scoring

  • Edge highlighting for topological differences

  • Branch color coding for common subtrees

  • Dynamic sizing based on tree complexity

  • Publication formatting with customizable dimensions

Advanced Analysis

cressent tanglegram --tree1 rep_phylogeny.treefile \
                   --tree2 cap_phylogeny.treefile \
                   --label1 "Rep Protein Tree" \
                   --label2 "Capsid Protein Tree" \
                   --width 25 \
                   --height 15 \
                   --lab_cex 1.2 \
                   -o comparative_analysis/ \
                   --name_tanglegram rep_vs_cap.pdf

Parameters

Parameter            Description        Default
-------------------  -----------------  ---------------
--tree1 / --tree2    Input tree files   Required
--label1 / --label2  Tree labels        Tree 1 / Tree 2
--width / --height   Figure dimensions  20 / 11
--lab_cex            Label size         1.5
--name_tanglegram    Output filename    tanglegram.pdf

Output Features

  • RF score display quantifying tree distance

  • Common subtree highlighting in matching colors

  • Distinctive edge emphasis for conflicting relationships

  • Automatic layout optimization for clarity
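To make the RF score concrete: the Robinson-Foulds distance counts the bipartitions (splits of the tip set induced by internal branches) that appear in one tree but not in the other. The sketch below is a plain-Python illustration under stated assumptions, not the routine CRESSENT calls: trees are written as nested tuples of tip names, both trees must share the same tips, the count is unnormalized, and all function names are hypothetical.

```python
def leaf_set(tree):
    """All tip labels under a node (a tip is a plain string)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree):
    """Non-trivial bipartitions, each stored as the side without a reference tip."""
    all_leaves = leaf_set(tree)
    reference = min(all_leaves)  # fixed tip used to orient every split
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - clade if reference in clade else clade
        # Keep only splits with at least two tips on each side.
        if 2 <= len(side) <= len(all_leaves) - 2:
            found.add(side)
        return clade

    visit(tree)
    return found


def robinson_foulds(tree_a, tree_b):
    """Number of splits present in exactly one of the two trees."""
    if leaf_set(tree_a) != leaf_set(tree_b):
        raise ValueError("trees must share the same tip labels")
    return len(splits(tree_a) ^ splits(tree_b))


# Hypothetical five-tip trees that disagree on the placement of B and C.
tree_1 = ((("A", "B"), "C"), ("D", "E"))
tree_2 = ((("A", "C"), "B"), ("D", "E"))
print(robinson_foulds(tree_1, tree_1))  # identical trees
print(robinson_foulds(tree_1, tree_2))  # one conflicting split in each tree
```

A score of 0 means the two topologies agree on every internal branch; each conflicting relationship highlighted in the tanglegram corresponds to splits that contribute to a higher score.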

Motif Mapping Visualization

Purpose

Create genome-wide visualizations showing motif distribution patterns, functional annotations, and structural relationships.

Input Compatibility

Supports multiple input formats with automatic detection:

# Prosite results
cressent motif_map_viz -f scanprosite_results.csv \
                      -o motif_maps/ \
                      --format prosite

# MEME motif table
cressent motif_map_viz -f motif_table.csv \
                      -o motif_maps/ \
                      --format motif_table

# Auto-detection
cressent motif_map_viz -f results.csv \
                      -o motif_maps/ \
                      --format auto

Visualization Types

Linear genome maps:

  • Sequence-by-sequence motif distribution

  • Position-accurate mapping

  • Color-coded motif types

  • Scalable for multiple sequences

Density plots:

  • Motif frequency analysis

  • Position distribution patterns

  • Comparative statistics

  • Multi-panel layouts

Heatmaps:

  • Presence/absence matrices

  • Quantitative motif counts

  • Hierarchical clustering options

  • Interactive color scaling
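The data behind such a heatmap is a sequence-by-motif matrix. The sketch below shows how counts and a presence/absence matrix could be derived from a list of motif hits; the `(sequence, motif)` record layout, the function names, and the example identifiers are hypothetical and do not reflect the columns of CRESSENT's motif tables.

```python
from collections import Counter


def count_matrix(hits):
    """Build a sequences x motifs matrix of hit counts.

    hits: iterable of (sequence_id, motif_name) pairs, one per occurrence.
    Returns (sequences, motifs, matrix) with rows and columns sorted by name.
    """
    hits = list(hits)
    counts = Counter(hits)
    sequences = sorted({seq for seq, _ in hits})
    motifs = sorted({motif for _, motif in hits})
    matrix = [[counts[(seq, motif)] for motif in motifs] for seq in sequences]
    return sequences, motifs, matrix


def presence_absence(matrix):
    """Collapse counts to 1 (motif present) or 0 (absent)."""
    return [[1 if value > 0 else 0 for value in row] for row in matrix]


# Hypothetical hits: motif_A occurs twice in seq1 and is absent from seq2.
hits = [
    ("seq1", "motif_A"),
    ("seq1", "motif_A"),
    ("seq1", "motif_B"),
    ("seq2", "motif_B"),
]
sequences, motifs, matrix = count_matrix(hits)
print(sequences, motifs)
print(matrix)
print(presence_absence(matrix))
```

The count matrix corresponds to the quantitative view and the collapsed matrix to the presence/absence view; either one can be passed to a clustering routine to order rows and columns.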

Detailed genome maps:

  • High-resolution motif positioning

  • Functional annotation integration

  • Publication-ready formatting

  • Comprehensive legends

Example Workflow

# Step 1: Generate motif data
cressent motif_discovery -i sequences.fasta \
                        -o motif_analysis/ \
                        --scanprosite

# Step 2: Create comprehensive visualizations
cressent motif_map_viz -f motif_analysis/motif_table.csv \
                      -o visualization/ \
                      --format auto

# Step 3: Generate Prosite visualizations
cressent motif_map_viz -f motif_analysis/scanprosite_results.csv \
                      -o visualization/ \
                      --format prosite

Integrated Visualization Workflows

Complete Analysis Pipeline

# 1. Locate the nonanucleotide motif and generate its sequence logo
cressent motif -i sequences.fasta \
               -p "TAGTATTAC" \
               --generate-logo \
               --plot-title "Nonanucleotide Motif" \
               -o complete_analysis/

# 2. Build phylogenetic trees
cressent build_tree -i aligned_sequences.fasta \
                   -o complete_analysis/ \
                   -B 1000

# The glob must match exactly one .treefile in complete_analysis/
cressent plot_tree -t complete_analysis/*.treefile \
                  --metadata_1 sequence_metadata.csv \
                  --layout circular \
                  --color TRUE \
                  -o complete_analysis/

# 3. Comparative phylogenetics
cressent tanglegram --tree1 rep_tree.treefile \
                   --tree2 cap_tree.treefile \
                   --label1 "Rep Proteins" \
                   --label2 "Capsid Proteins" \
                   -o complete_analysis/

# 4. Motif distribution analysis
cressent motif_map_viz -f complete_analysis/motif_table.csv \
                      -o complete_analysis/

Publication-Ready Output

# High-quality figures for publication
cressent seq_logo -tb motif_positions.txt \
                 --plot_title "Conserved Nonanucleotide" \
                 --method "bits" \
                 --width 8 \
                 --height 4 \
                 -n Figure_1A.pdf \
                 -o publication_figures/

cressent plot_tree -t phylogeny.treefile \
                  --metadata_1 annotations.csv \
                  --layout rectangular \
                  --alignment sequences.fasta \
                  --fig_width 10 \
                  --fig_height 8 \
                  --plot_name Figure_2.pdf \
                  -o publication_figures/

cressent tanglegram --tree1 tree1.treefile \
                   --tree2 tree2.treefile \
                   --width 12 \
                   --height 8 \
                   --name_tanglegram Figure_3.pdf \
                   -o publication_figures/

Best Practices

Design Principles

  1. Consistency in color schemes and styling

  2. Clarity in labeling and legends

  3. Scalability for different output formats

  4. Accessibility with colorblind-friendly palettes

Technical Considerations

  1. Resolution appropriate for intended use

  2. File formats suitable for target applications

  3. Size optimization for web or print

  4. Reproducibility through parameter documentation

Quality Control

  1. Preview outputs before final generation

  2. Validate data integrity in visualizations

  3. Test different parameter combinations

  4. Document successful parameter sets
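One lightweight way to document parameter sets is to append every command that produced a figure to a log file. The sketch below is a suggestion rather than a CRESSENT feature: the `record_run` helper and the `figure_runs.jsonl` file name are hypothetical, and the example command reuses the tanglegram call from the publication workflow above.

```python
import json
import shlex
from datetime import datetime, timezone


def record_run(command, log_path):
    """Append one JSON line describing a command to log_path and return it.

    command: list of argument strings, exactly as they would be executed.
    """
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "command": shlex.join(command),
        "arguments": list(command),
    }
    with open(log_path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry) + "\n")
    return entry


if __name__ == "__main__":
    figure_3 = [
        "cressent", "tanglegram",
        "--tree1", "tree1.treefile",
        "--tree2", "tree2.treefile",
        "--width", "12",
        "--height", "8",
        "--name_tanglegram", "Figure_3.pdf",
        "-o", "publication_figures/",
    ]
    print(record_run(figure_3, "figure_runs.jsonl")["command"])
```

Keeping the log next to the figures means each PDF can be traced back to the exact options used, and the stored command string can be pasted into a shell to regenerate it.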

Troubleshooting

Common Issues

R package dependencies:

  • Ensure all required packages are installed

  • Check version compatibility

  • Update packages if necessary

Large dataset visualization:

  • Reduce data complexity where possible

  • Increase memory allocation

  • Use sampling for preview purposes
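For sequence-based inputs, a preview subset can be drawn reproducibly with a fixed random seed. The following sketch is one way to do it and is not part of CRESSENT: the FASTA parser is deliberately minimal, the function names are hypothetical, and subsampling suits quick previews of logos or motif maps rather than tree files, which would need to be pruned instead.

```python
import random


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def sample_records(records, k, seed=42):
    """Pick k records at random, keeping their original order."""
    if k >= len(records):
        return list(records)
    rng = random.Random(seed)  # fixed seed makes the preview repeatable
    keep = sorted(rng.sample(range(len(records)), k))
    return [records[i] for i in keep]


def to_fasta(records):
    """Serialize records back to FASTA text."""
    return "".join(f">{header}\n{sequence}\n" for header, sequence in records)


# Hypothetical input with four short sequences.
example = ">s1\nTAGTATTAC\n>s2\nTAGTATTAC\n>s3\nTAATATTAC\n>s4\nTAGTATTAC\n"
preview = sample_records(parse_fasta(example), k=2)
print(to_fasta(preview))
```

Recording the seed alongside the preview makes it possible to regenerate the same subset later when comparing parameter choices.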

Color scheme problems:

  • Test with colorblind simulation tools

  • Use established palettes (RColorBrewer)

  • Maintain sufficient contrast
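Contrast can be checked numerically before a palette is committed to a figure. The sketch below computes the WCAG contrast ratio between each palette color and the background; it is an independent check, not something CRESSENT runs. The Okabe-Ito palette is used here as an example of a widely used colorblind-friendly set, and the 3:1 cutoff (the WCAG threshold for graphical elements) is an assumed choice that can be tightened.

```python
def relative_luminance(hex_color):
    """WCAG relative luminance of an sRGB color such as "#0072B2"."""
    hex_color = hex_color.lstrip("#")
    channels = [int(hex_color[i:i + 2], 16) / 255 for i in (0, 2, 4)]
    linear = [
        c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
        for c in channels
    ]
    red, green, blue = linear
    return 0.2126 * red + 0.7152 * green + 0.0722 * blue


def contrast_ratio(color_a, color_b):
    """WCAG contrast ratio, from 1 (identical) to 21 (black on white)."""
    lum_a = relative_luminance(color_a)
    lum_b = relative_luminance(color_b)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


def low_contrast_colors(palette, background="#FFFFFF", minimum=3.0):
    """Return the palette colors whose contrast with the background is too low."""
    return [c for c in palette if contrast_ratio(c, background) < minimum]


# Okabe-Ito palette (colorblind-friendly), checked against a white page.
okabe_ito = ["#E69F00", "#56B4E9", "#009E73", "#F0E442",
             "#0072B2", "#D55E00", "#CC79A7", "#000000"]
print(low_contrast_colors(okabe_ito))
```

Colors flagged by the check are not necessarily unusable, but they are better reserved for filled areas than for thin lines or small text on a white background.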

Performance Optimization

Memory management:

  • Process large datasets in chunks

  • Clear R workspace between analyses

  • Monitor system resources

Rendering speed:

  • Use appropriate resolution settings

  • Optimize data structures

  • Consider vector vs. raster formats

Output Validation

Quality checks:

  • Verify data accuracy in plots

  • Check scaling and proportions

  • Validate legends and labels

  • Test across different viewers/platforms