Benchmarking Overview
ARGscape includes a comprehensive benchmarking module for measuring the performance and accuracy of inference methods and visualization tools. This module is designed for researchers who want to evaluate different spatial and temporal inference algorithms or compare visualization performance across datasets of varying sizes.
What the Benchmarking System Measures
The benchmarking system evaluates three key areas:
Inference Performance
Measures how efficiently different inference methods process tree sequences:
Wall time: Total elapsed time from start to finish
CPU time: Actual processor time consumed
Peak memory usage: Maximum memory footprint during inference
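ARGscape's own harness is not shown here, but all three metrics can be captured with the Python standard library alone. A minimal sketch (the `profile` helper and the workload are illustrative, not ARGscape's API; note that `tracemalloc` counts only Python-level allocations, so a harness measuring native code would use process-level RSS instead):

```python
import time
import tracemalloc

def profile(fn, *args, **kwargs):
    """Run fn and report wall time, CPU time, and peak memory (bytes)."""
    tracemalloc.start()
    wall0 = time.perf_counter()   # wall-clock timer
    cpu0 = time.process_time()    # process CPU timer
    result = fn(*args, **kwargs)
    metrics = {
        "wall_time_s": time.perf_counter() - wall0,
        "cpu_time_s": time.process_time() - cpu0,
        "peak_memory_bytes": tracemalloc.get_traced_memory()[1],
    }
    tracemalloc.stop()
    return result, metrics

# Stand-in workload; in practice fn would be an inference call.
result, metrics = profile(sum, range(1_000_000))
```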
Supported inference methods include:
FastGAIA (spatial inference)
GAIA-quadratic (spatial inference)
GAIA-linear (spatial inference)
Midpoint (spatial inference, weighted and unweighted variants)
SPARG (spatial inference)
Spacetrees (spatial inference)
tsdate (temporal inference)
Inference Accuracy
When ground truth data is available (from simulated datasets), the system computes accuracy metrics:
Spatial accuracy metrics:
Mean error (in km or coordinate units)
Median error
Root mean squared error (RMSE)
Standard deviation of errors
Number of nodes compared
Temporal accuracy metrics:
Mean absolute error
Median absolute error
RMSE
Correlation coefficient between inferred and true times
Number of nodes compared
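Both metric families reduce to a few lines of NumPy. A sketch, assuming inferred and true values are already aligned arrays over the compared nodes (function and key names are illustrative, not ARGscape's schema):

```python
import numpy as np

def spatial_accuracy(inferred_xy, true_xy):
    """Summary statistics of per-node Euclidean error between coordinates."""
    errors = np.linalg.norm(inferred_xy - true_xy, axis=1)
    return {
        "mean_error": errors.mean(),
        "median_error": np.median(errors),
        "rmse": np.sqrt((errors ** 2).mean()),
        "std_error": errors.std(),
        "n_nodes": len(errors),
    }

def temporal_accuracy(inferred_t, true_t):
    """Absolute-error and correlation metrics for node times."""
    abs_err = np.abs(inferred_t - true_t)
    return {
        "mean_abs_error": abs_err.mean(),
        "median_abs_error": np.median(abs_err),
        "rmse": np.sqrt((abs_err ** 2).mean()),
        "correlation": np.corrcoef(inferred_t, true_t)[0, 1],
        "n_nodes": len(abs_err),
    }
```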
Visualization Performance
Measures how well the visualization tools handle datasets of different sizes:
Render time: Time to initial render
Time to interactive: Time until the visualization responds to user input
FPS during pan: Frame rate while dragging the view
FPS during zoom: Frame rate while zooming
Heap size: JavaScript memory consumption
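Frame rates are typically derived from `requestAnimationFrame` timestamps collected inside the page during a scripted pan or zoom; the aggregation itself is simple. A sketch of that last step (the in-browser timestamp collection via Playwright is assumed and not shown):

```python
def fps_from_timestamps(timestamps_ms):
    """Average frames per second from a list of frame timestamps in ms."""
    if len(timestamps_ms) < 2:
        return 0.0
    elapsed_s = (timestamps_ms[-1] - timestamps_ms[0]) / 1000.0
    # N timestamps bound N - 1 rendered frame intervals.
    return (len(timestamps_ms) - 1) / elapsed_s

# 61 timestamps spaced 1/60 s apart correspond to 60 FPS.
frames = [i * 1000.0 / 60.0 for i in range(61)]
```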
Supported visualization tools:
argscape-2d (ForceGraph)
argscape-3d (Spatial3D)
lorax (external tool, planned)
tskit_arg_visualizer (external tool, planned)
Requirements
Server Requirement
The benchmarking system requires the ARGscape server to be running. Both inference and visualization benchmarks are executed through the API to ensure consistent measurement conditions.
Start the server before running benchmarks:
argscape --no-browser
Or using uvicorn directly:
uvicorn argscape.api.main:app --port 8000
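Before launching a benchmark run, it is worth confirming that the server port is actually accepting connections. A minimal standard-library sketch (the helper is illustrative; host and port match the uvicorn command above):

```python
import socket
import time

def wait_for_server(host="127.0.0.1", port=8000, timeout=30.0):
    """Poll until a TCP connection to the server succeeds, or time out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True  # port is accepting connections
        except OSError:
            time.sleep(0.5)  # server not up yet; retry
    return False
```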
Installation
Install ARGscape with benchmark dependencies:
pip install "argscape[benchmark]"
This installs additional requirements:
msprime for generating simulated datasets
playwright for visualization benchmarks
matplotlib for generating plots
For visualization benchmarks, also install browser drivers:
playwright install chromium
Output Structure
Benchmark results are organized in a structured directory:
results/
  inference/
    raw_metrics.csv            # Detailed metrics for each run
    outputs/                   # Inference output files (subprocess mode)
  visualization/
    raw_metrics.csv            # Visualization metrics
  tables/
    inference_comparison.tex   # LaTeX table for papers
  figures/
    scaling_plots.png          # Performance scaling plots
    scaling_plots.pdf
  summary.json                 # Complete results with system info
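Because the raw metrics are plain CSV, they are easy to post-process without ARGscape itself. A sketch that averages wall time per method (the `method` and `wall_time_s` column names are assumptions for illustration, not the file's documented schema):

```python
import csv
from collections import defaultdict

def mean_wall_time_by_method(csv_path):
    """Group a wall-time column by method and return per-method means."""
    samples = defaultdict(list)
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            # Column names here are illustrative, not a documented schema.
            samples[row["method"]].append(float(row["wall_time_s"]))
    return {method: sum(v) / len(v) for method, v in samples.items()}
```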
System Information
The benchmark results include detailed system information for reproducibility:
ARGscape version
Python version
Platform and processor details
Dependency versions (tskit, msprime, fastgaia, etc.)
Timestamp of the benchmark run
This metadata is stored in summary.json and helps ensure benchmark results can be compared fairly across different systems.
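Comparable metadata can be gathered with the standard library alone. A sketch of what such a record might contain (the field names are illustrative, not ARGscape's actual summary.json schema):

```python
import platform
import sys
from datetime import datetime, timezone
from importlib import metadata

def system_info(packages=("tskit", "msprime", "fastgaia")):
    """Collect platform, Python, and dependency versions for a benchmark run."""
    versions = {}
    for pkg in packages:
        try:
            versions[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            versions[pkg] = None  # dependency not installed
    return {
        "python_version": sys.version,
        "platform": platform.platform(),
        "processor": platform.processor(),
        "dependencies": versions,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
```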
Next Steps
Running Benchmarks - Learn how to run benchmarks
Interpreting Benchmark Results - Understand the output metrics
Benchmark Datasets - Generate standardized test datasets