Benchmarking Overview
ARGscape includes a comprehensive benchmarking module for measuring the performance and accuracy of inference methods and visualization tools. This module is designed for researchers who want to evaluate different spatial and temporal inference algorithms or compare visualization performance across datasets of varying sizes.
What the Benchmarking System Measures
The benchmarking system evaluates three key areas:
Inference Performance
Measures how efficiently different inference methods process tree sequences:
Wall time: Total elapsed time from start to finish
CPU time: Actual processor time consumed
Peak memory usage: Maximum memory footprint during inference
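ARGscape's own harness is not shown here, but all three metrics can be captured with the Python standard library alone. A minimal sketch (the `profile` helper and the workload are illustrative, not ARGscape's API; note that `tracemalloc` counts only Python-level allocations, so a harness measuring native code would use process-level RSS instead):

```python
import time
import tracemalloc

def profile(fn, *args, **kwargs):
    """Run fn and report wall time, CPU time, and peak memory (bytes)."""
    tracemalloc.start()
    wall0 = time.perf_counter()   # wall-clock timer
    cpu0 = time.process_time()    # process CPU timer
    result = fn(*args, **kwargs)
    metrics = {
        "wall_time_s": time.perf_counter() - wall0,
        "cpu_time_s": time.process_time() - cpu0,
        "peak_memory_bytes": tracemalloc.get_traced_memory()[1],
    }
    tracemalloc.stop()
    return result, metrics

# Stand-in workload; in practice fn would be an inference call.
result, metrics = profile(sum, range(1_000_000))
```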
Supported inference methods include:
FastGAIA (spatial inference)
GAIA-quadratic (spatial inference)
GAIA-linear (spatial inference)
Midpoint (spatial inference, weighted and unweighted variants)
SPARG (spatial inference)
Spacetrees (spatial inference)
tsdate (temporal inference)
Inference Accuracy
When ground truth data is available (from simulated datasets), the system computes accuracy metrics:
Spatial accuracy metrics:
Mean error (in km or coordinate units)
Median error
Root mean squared error (RMSE)
Standard deviation of errors
Number of nodes compared
Temporal accuracy metrics:
Mean absolute error
Median absolute error
RMSE
Correlation coefficient between inferred and true times
Number of nodes compared
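Both metric families reduce to a few lines of NumPy. A sketch, assuming inferred and true values are already aligned arrays over the compared nodes (function and key names are illustrative, not ARGscape's schema):

```python
import numpy as np

def spatial_accuracy(inferred_xy, true_xy):
    """Summary statistics of per-node Euclidean error between coordinates."""
    errors = np.linalg.norm(inferred_xy - true_xy, axis=1)
    return {
        "mean_error": errors.mean(),
        "median_error": np.median(errors),
        "rmse": np.sqrt((errors ** 2).mean()),
        "std_error": errors.std(),
        "n_nodes": len(errors),
    }

def temporal_accuracy(inferred_t, true_t):
    """Absolute-error and correlation metrics for node times."""
    abs_err = np.abs(inferred_t - true_t)
    return {
        "mean_abs_error": abs_err.mean(),
        "median_abs_error": np.median(abs_err),
        "rmse": np.sqrt((abs_err ** 2).mean()),
        "correlation": np.corrcoef(inferred_t, true_t)[0, 1],
        "n_nodes": len(abs_err),
    }
```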
Visualization Performance
Measures how well the visualization tools handle datasets of different sizes:
Render time: Time to initial render
Time to interactive: Time until the visualization responds to user input
FPS during pan: Frame rate while dragging the view
FPS during zoom: Frame rate while zooming
Heap size: JavaScript memory consumption
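Frame rates are typically derived from `requestAnimationFrame` timestamps collected inside the page during a scripted pan or zoom; the aggregation itself is simple. A sketch of that last step (the in-browser timestamp collection via Playwright is assumed and not shown):

```python
def fps_from_timestamps(timestamps_ms):
    """Average frames per second from a list of frame timestamps in ms."""
    if len(timestamps_ms) < 2:
        return 0.0
    elapsed_s = (timestamps_ms[-1] - timestamps_ms[0]) / 1000.0
    # N timestamps bound N - 1 rendered frame intervals.
    return (len(timestamps_ms) - 1) / elapsed_s

# 61 timestamps spaced 1/60 s apart correspond to 60 FPS.
frames = [i * 1000.0 / 60.0 for i in range(61)]
```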
Supported visualization tools:
argscape-2d (ForceGraph)
argscape-3d (Spatial3D)
lorax (external tool, planned)
tskit_arg_visualizer (external tool, planned)
Requirements
Server Requirement
The benchmarking system requires the ARGscape server to be running. Both inference and visualization benchmarks are executed through the API to ensure consistent measurement conditions.
Start the server before running benchmarks:
argscape --no-browser
Or using uvicorn directly:
uvicorn argscape.api.main:app --port 8000
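Before launching a benchmark run, it is worth confirming that the server port is actually accepting connections. A minimal standard-library sketch (the helper is illustrative; host and port match the uvicorn command above):

```python
import socket
import time

def wait_for_server(host="127.0.0.1", port=8000, timeout=30.0):
    """Poll until a TCP connection to the server succeeds, or time out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True  # port is accepting connections
        except OSError:
            time.sleep(0.5)  # server not up yet; retry
    return False
```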
Installation
Install ARGscape with benchmark dependencies:
pip install "argscape[benchmark]"
This installs additional requirements:
msprime for generating simulated datasets
playwright for visualization benchmarks
matplotlib for generating plots
For visualization benchmarks, also install browser drivers:
playwright install chromium
Output Structure
Benchmark results are organized in a structured directory:
results/
  inference/
    raw_metrics.csv            # Detailed metrics for each run
    outputs/                   # Inference output files (subprocess mode)
  visualization/
    raw_metrics.csv            # Visualization metrics
  tables/
    inference_comparison.tex   # LaTeX table for papers
  figures/
    scaling_plots.png          # Performance scaling plots
    scaling_plots.pdf
  summary.json                 # Complete results with system info
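Because the raw metrics are plain CSV, they are easy to post-process without ARGscape itself. A sketch that averages wall time per method (the `method` and `wall_time_s` column names are assumptions for illustration, not the file's documented schema):

```python
import csv
from collections import defaultdict

def mean_wall_time_by_method(csv_path):
    """Group a wall-time column by method and return per-method means."""
    samples = defaultdict(list)
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            # Column names here are illustrative, not a documented schema.
            samples[row["method"]].append(float(row["wall_time_s"]))
    return {method: sum(v) / len(v) for method, v in samples.items()}
```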
System Information
The benchmark results include detailed system information for reproducibility:
ARGscape version
Python version
Platform and processor details
Dependency versions (tskit, msprime, fastgaia, etc.)
Timestamp of the benchmark run
This metadata is stored in summary.json and helps ensure benchmark results can be compared fairly across different systems.
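Comparable metadata can be gathered with the standard library alone. A sketch of what such a record might contain (the field names are illustrative, not ARGscape's actual summary.json schema):

```python
import platform
import sys
from datetime import datetime, timezone
from importlib import metadata

def system_info(packages=("tskit", "msprime", "fastgaia")):
    """Collect platform, Python, and dependency versions for a benchmark run."""
    versions = {}
    for pkg in packages:
        try:
            versions[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            versions[pkg] = None  # dependency not installed
    return {
        "python_version": sys.version,
        "platform": platform.platform(),
        "processor": platform.processor(),
        "dependencies": versions,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
```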
Next Steps
Running Benchmarks - Learn how to run benchmarks
Interpreting Benchmark Results - Understand the output metrics
Benchmark Datasets - Generate standardized test datasets