Performance Benchmarks

This page compares the performance of ModernRoboticsBook.jl against Pinocchio (C++) and RigidBodyDynamics.jl (Julia). All benchmarks use the UR5 (6-DOF) robot at the same configuration.

Results

Function	MRB.jl	MRB.jl (in-place)	Pinocchio 3.9 (C++)	RBD.jl
Forward kinematics	0.31 μs	0.30 μs	0.86 μs	0.41 μs
Jacobian	0.39 μs	0.36 μs	1.0 μs	0.13 μs
Inverse dynamics (RNEA)	1.15 μs	1.0 μs	0.64 μs	0.62 μs
Mass matrix (CRBA)	1.14 μs	0.91 μs	0.65 μs	0.14 μs
Mass matrix (RNEA)	6.79 μs	—	—	—
Velocity quadratic forces	1.19 μs	1.02 μs	—	—
Gravity forces	1.19 μs	1.03 μs	0.48 μs	0.55 μs
End-effector forces	1.19 μs	1.03 μs	—	—
Forward dynamics (ABA)	1.65 μs	1.41 μs	2.82 μs	—
Forward dynamics (CRBA)	2.80 μs	2.39 μs	—	—
Forward dynamics (RNEA)	5.44 μs	4.91 μs	—	—

Measured on Apple M2 (16 GB), Julia 1.12, Python 3.13. Julia timings are median values from BenchmarkTools.jl. RBD.jl: the ee_link frame is grabbed before removing fixed joints; timings are for in-place variants where available.

Analysis

Forward kinematics and Jacobian computation are ~3x faster than Pinocchio thanks to Julia's StaticArrays, which eliminate heap allocations for small fixed-size matrices (4×4, 6×6). The in-place variants (forward_kinematics_space!, jacobian_space!, etc.) achieve zero allocations.

Dynamics functions are now within 1.4–2x of Pinocchio, with forward dynamics on par. Both libraries use the same algorithms (RNEA, CRBA); the remaining gap is due to Pinocchio's hand-tuned C++/Eigen spatial algebra vs our general Julia SMatrix operations.

RigidBodyDynamics.jl achieves extremely fast mass matrix (0.14 μs) and Jacobian (0.13 μs) computation through aggressive caching and code generation specialized to the mechanism topology. It uses a state-based API where MechanismState caches intermediate kinematic results, amortizing costs across queries. With fixed joints removed (the default), the kinematic tree is smaller and dynamics are faster. The ee_link frame is preserved on the parent body for FK queries even after pruning.

The textbook algorithm variants (mass_matrix_rnea, forward_dynamics_rnea) are provided for educational purposes — they directly mirror the textbook equations but are slower than the optimized defaults.

Algorithms

Inverse dynamics: Uses the Recursive Newton-Euler Algorithm (RNEA), the same O(n) algorithm as Pinocchio and RBD.jl. The in-place variant achieves zero allocations.
Mass matrix: Uses the Composite Rigid Body Algorithm (CRBA), which computes the full matrix in a single backward pass over composite spatial inertias — the same algorithm as Pinocchio and RBD.jl. The in-place variant achieves zero allocations. The textbook variant mass_matrix_rnea calls inverse_dynamics_rnea n times with unit accelerations.
Forward dynamics: Three implementations are available: forward_dynamics_aba uses the Articulated Body Algorithm — the same O(n) algorithm as Pinocchio — which solves for joint accelerations without forming the mass matrix. forward_dynamics_crba uses CRBA + a single RNEA call + backslash solve. forward_dynamics_rnea is the textbook variant that explicitly forms M⁻¹ and calls RNEA separately for each term.

Allocations

Function	Allocating	In-place (`!`)
Forward kinematics	2 (208 B)	0 (0 B)
Jacobian	2 (384 B)	0 (0 B)
Inverse dynamics (RNEA)	13 (3.4 KiB)	0 (0 B)
Mass matrix (CRBA)	9 (4.6 KiB)	0 (0 B)
Mass matrix (RNEA)	76 (20.7 KiB)	—
Velocity quadratic forces	17 (3.6 KiB)	0 (0 B)
Gravity forces	17 (3.6 KiB)	0 (0 B)
End-effector forces	17 (3.6 KiB)	0 (0 B)
Forward dynamics (ABA)	23 (6.1 KiB)	0 (0 B)
Forward dynamics (CRBA)	32 (8.8 KiB)	8 (720 B)
Forward dynamics (RNEA)	78 (19.6 KiB)	10 (3.7 KiB)

When do in-place variants matter?

The allocating and in-place versions have similar single-call timings because the workspace allocations are small (~3–6 KiB) and modern allocators handle them efficiently. The real benefit of in-place (!) variants appears in tight loops:

GC pressure: millions of small allocations accumulate garbage, triggering garbage collection pauses that cause unpredictable latency spikes.
Predictable timing: zero allocations means no GC pauses — critical for real-time control where a single missed deadline can cause instability.
Scaling with DOF: workspace arrays grow with robot complexity, so the allocation overhead grows too.

For learning and prototyping, the allocating API is fast enough — a full forward dynamics call takes ~1.6 μs (ABA), allowing ~625,000 evaluations per second.

For real-time control loops (1 kHz+), trajectory optimization, or large-scale simulation (millions of evaluations), use the in-place variants with pre-allocated workspace to avoid GC pauses.

Reproducing

# ModernRoboticsBook.jl benchmark
julia --project benchmarks/bench_julia.jl

# RigidBodyDynamics.jl benchmark
julia benchmarks/bench_rbd.jl

# Pinocchio benchmark (requires: pip install pin robot-descriptions)
python benchmarks/bench_pinocchio.py