This is Part 2 of "What If Probabilistic Programming Were Different?" Part 1 introduced the BEAM process runtime thesis. This part asks the uncomfortable question: does the thing actually work?
An architectural thesis is worthless if the system can't do the job. eXMC must be a competent probabilistic programming framework before it can be an interesting one. That means distributions, inference algorithms, model comparison, visualization, and — critically — performance that doesn't embarrass itself against PyMC.
The Distribution Library
eXMC ships 16 distributions, each implementing logpdf, sample, and constraint transforms with automatic Jacobian correction:
| Category | Distributions |
|---|---|
| Continuous | Normal, HalfNormal, TruncatedNormal, Lognormal, Exponential, Uniform, Cauchy, HalfCauchy, Beta, Gamma, StudentT, Laplace |
| Discrete | Bernoulli, Poisson |
| Multivariate | MvNormal, Dirichlet |
Every distribution declares its own constraint transform. HalfNormal uses :log — the sampler explores unconstrained space while the user thinks in constrained space. The Jacobian correction happens automatically, which means users never need to know it exists.
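To make the mechanics concrete, here is a minimal sketch (hypothetical module, not eXMC's transform API) of what a :log transform plus Jacobian correction does for a positive-support parameter like HalfNormal:

```elixir
defmodule Sketch.LogTransform do
  # The sampler proposes y anywhere on the real line; the model sees
  # x = exp(y) > 0.
  def to_constrained(y), do: :math.exp(y)

  # log|dx/dy| for x = exp(y) is simply y.
  def log_abs_det_jacobian(y), do: y

  # The log-density the sampler actually targets in unconstrained space:
  # the constrained logpdf evaluated at exp(y), plus the Jacobian term.
  def logpdf_unconstrained(logpdf, y) do
    logpdf.(to_constrained(y)) + log_abs_det_jacobian(y)
  end
end
```

Because the correction lives with the transform rather than with any single distribution, it composes across the whole library, which is what lets users stay entirely in constrained space.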
Four Inference Methods
NUTS is the workhorse, but not every model needs gradient-based sampling:
- NUTS — full No-U-Turn Sampler with Stan-style three-phase warmup, diagonal and dense mass matrix adaptation, rho-based generalized U-turn criterion, sub-trajectory checks
- ADVI — Automatic Differentiation Variational Inference for fast approximate posteriors
- SMC — Sequential Monte Carlo for multimodal posteriors and model evidence estimation
- Pathfinder — L-BFGS-based variational method for fast initialization
Model Comparison
eXMC provides WAIC and LOO-CV (Pareto-smoothed importance sampling) for comparing models without held-out data. Both are computed from the pointwise log-likelihood, which eXMC stores automatically during sampling.
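For reference, here is a sketch of the textbook WAIC computation from a {n_draws, n_observations} pointwise log-likelihood matrix, written with Nx (the standard formula, not eXMC's implementation):

```elixir
defmodule Sketch.WAIC do
  # log_lik: {n_draws, n_obs} tensor of pointwise log-likelihoods
  def waic(log_lik) do
    n_draws = Nx.axis_size(log_lik, 0)

    # lppd_i = log(1/S * sum_s exp(log_lik[s, i])), via a stable log-sum-exp
    max = Nx.reduce_max(log_lik, axes: [0], keep_axes: true)

    lse =
      log_lik
      |> Nx.subtract(max)
      |> Nx.exp()
      |> Nx.sum(axes: [0])
      |> Nx.log()
      |> Nx.add(Nx.squeeze(max, axes: [0]))

    lppd = Nx.subtract(lse, Nx.log(n_draws))

    # p_waic_i = Var_s(log_lik[s, i]); elpd = sum_i (lppd_i - p_waic_i)
    p_waic = Nx.variance(log_lik, axes: [0])
    elpd = Nx.sum(Nx.subtract(lppd, p_waic))

    # WAIC on the deviance scale
    Nx.multiply(-2, elpd)
  end
end
```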
The Compilation Pipeline
Models are data before they're computation. The Builder API produces an intermediate representation — a map of random variable nodes with their distributions, transforms, and observation data. The Compiler walks this IR and produces a differentiable closure: one function that maps a flat parameter vector to a scalar log-probability.
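A hand-written stand-in (not the Compiler's actual output) shows the shape of that closure: flat parameter vector in, scalar log-probability out. The model here is invented for illustration: mu ~ Normal(0, 1), sigma ~ HalfNormal(1) sampled as log_sigma, y ~ Normal(mu, sigma).

```elixir
y = Nx.tensor([1.2, 0.4, -0.3])

# Standard Normal log-density, reused for priors and likelihood
normal_logpdf = fn x, loc, scale ->
  z = Nx.divide(Nx.subtract(x, loc), scale)

  Nx.subtract(
    Nx.multiply(-0.5, Nx.multiply(z, z)),
    Nx.add(Nx.log(scale), 0.5 * :math.log(2 * :math.pi()))
  )
end

logp = fn params ->
  mu = params[0]
  log_sigma = params[1]
  sigma = Nx.exp(log_sigma)

  normal_logpdf.(mu, 0.0, 1.0)                     # prior on mu
  |> Nx.add(normal_logpdf.(sigma, 0.0, 1.0))       # HalfNormal(1) body
  |> Nx.add(Nx.add(:math.log(2.0), log_sigma))     # + log 2 + :log Jacobian
  |> Nx.add(Nx.sum(normal_logpdf.(y, mu, sigma)))  # likelihood
end

logp.(Nx.tensor([0.1, -0.2]))
#=> scalar tensor: the joint log-density at that point
```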
Between Builder and Compiler, rewrite passes transform the IR:
- Non-Centered Parameterization — automatically detects funnel geometry and rewrites `x ~ Normal(mu, sigma)` to `z ~ Normal(0, 1)`, `x = mu + sigma * z` (see the sketch after this list)
- Vectorized Observations — `Builder.obs` auto-adds `reduce: :sum` for non-scalar observation data
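To make the first pass concrete, here is a minimal sketch of the two parameterizations it trades between (plain Elixir closures with constants dropped, not the pass itself):

```elixir
# Centered: x is sampled directly, so its scale is tied to sigma and the
# joint geometry narrows into a funnel as sigma shrinks.
centered_logp = fn x, mu, sigma ->
  z = (x - mu) / sigma
  # Normal(mu, sigma) log-density, up to an additive constant
  -0.5 * z * z - :math.log(sigma)
end

# Non-centered: the sampler sees z ~ Normal(0, 1), which has unit-scale
# geometry for any sigma; x is recovered deterministically afterwards.
noncentered = fn z, mu, sigma ->
  {mu + sigma * z, -0.5 * z * z}
end
```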
The closure is then JIT-compiled via EXLA when available. The JIT boundary is precise: tensor-heavy gradient computation goes through XLA, branchy tree logic stays in plain Elixir. Three number systems coexist in one sampler — Erlang floats for adaptation, Nx tensors for the sampler loop, EXLA-compiled closures for the gradient.
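As a rough illustration of that boundary, here is a generic Nx/EXLA sketch (not eXMC's internals, and assuming EXLA is installed):

```elixir
defmodule Sketch.Boundary do
  import Nx.Defn

  # Tensor-heavy side: value and gradient of a scalar log-density,
  # compiled by XLA once EXLA is the configured defn compiler.
  defn logp_and_grad(p) do
    value_and_grad(p, fn p -> -0.5 * Nx.sum(p * p) end)
  end
end

# Plain-Elixir side: pick the compiler once, then drive the compiled
# closure from ordinary, branchy BEAM code (tree building, adaptation).
Nx.Defn.global_default_options(compiler: EXLA)
{logp, grad} = Sketch.Boundary.logp_and_grad(Nx.tensor([0.3, -1.2]))
```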
GPU Acceleration
Pass device: :cuda to sample on GPU. At d=22, GPU gives 2.65× wall-time speedup (ESS/s 3.9→10.3). The batched leapfrog integrator amortizes kernel launch overhead across all steps within a subtree.
Apple Silicon support via EMLX (Metal/MLX backend) works at f32 precision. Exmc.JIT abstracts the backend choice — EXLA, EMLX, or BinaryBackend fallback.
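In Nx terms, that abstraction roughly corresponds to choosing a default backend. A sketch of the underlying mechanism follows; the module and option names are Nx/EXLA/EMLX's as I understand them, while `device: :cuda` remains the eXMC-level switch:

```elixir
# NVIDIA GPU via XLA (requires an EXLA build with CUDA support)
Nx.default_backend({EXLA.Backend, client: :cuda})

# Apple Silicon via MLX (assumes the EMLX package; f32 precision)
# Nx.default_backend(EMLX.Backend)

# Pure-Elixir fallback, no native dependencies
# Nx.default_backend(Nx.BinaryBackend)
```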
Visualization
ExmcViz renders ArviZ-style diagnostics through Scenic, Elixir's scene graph library. No Matplotlib, no PNG encoding, no Jupyter dependency. The scene graph is a live data structure composited at 60fps through OpenGL via NanoVG.
The live streaming feature connects the sampler directly to the visualization through GenServer message passing. Trace plots grow from left to right, amber lines on black. You watch Bayesian inference happen in real time.
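The pattern is ordinary OTP. A minimal sketch (hypothetical module, not ExmcViz's actual API) looks like this:

```elixir
defmodule TraceViz do
  use GenServer

  def start_link(_opts \\ []),
    do: GenServer.start_link(__MODULE__, %{draws: []}, name: __MODULE__)

  # Called from inside the sampling loop after every accepted draw.
  def push(draw), do: GenServer.cast(__MODULE__, {:draw, draw})

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_cast({:draw, draw}, state) do
    # ExmcViz would update the Scenic scene graph here; this sketch just
    # accumulates the trace for rendering.
    {:noreply, %{state | draws: [draw | state.draws]}}
  end
end
```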
The Race
Five-seed medians, 1 chain, 2000 samples + 1000 warmup, same model definitions:
| Model | d | eXMC ESS/s | PyMC ESS/s | Ratio | Winner |
|---|---|---|---|---|---|
| simple | 2 | 469 | 576 | 0.81× | PyMC |
| medium | 5 | 298 | 157 | 1.90× | eXMC |
| stress | 8 | 215 | 185 | 1.16× | eXMC |
| eight_schools | 10 | 12 | 5 | 2.55× | eXMC |
| funnel | 10 | 2 | 6 | 0.40× | PyMC |
| logistic | 21 | 69 | 336 | 0.21× | PyMC |
| stochastic_vol | 102 | 1.2 | 1.0 | 1.20× | eXMC |
eXMC wins 4 models to PyMC's 3. The wins include the canonical Eight Schools benchmark (2.55×) and the 102-dimensional stochastic volatility model (1.20×). PyMC wins the trivial model (per-step overhead) and the logistic regression (dense mass matrix at d=21 favors PyMC's C++ inner loop).
The crossover sits around d=3–4 parameters. Below it, per-step architectural overhead dominates and PyMC wins. Above it, adaptation quality matters more and eXMC usually wins; the funnel and the dense-mass-matrix logistic regression are the exceptions noted above.
Distribution Scaling
With a 5-node Erlang cluster: 2.88× average speedup. No MPI, no Dask, no Ray. Just :erpc.call and location-transparent PIDs.
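The pattern, sketched with a hypothetical `MyModel.sample_chain/1` entry point and nodes that are already connected and code-loaded:

```elixir
# One chain per node; :erpc.call is synchronous, so wrap each call in a Task.
nodes = Enum.take([node() | Node.list()], 5)

chains =
  nodes
  |> Enum.with_index()
  |> Enum.map(fn {n, seed} ->
    Task.async(fn -> :erpc.call(n, MyModel, :sample_chain, [seed]) end)
  end)
  |> Task.await_many(:infinity)
```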
The Stack
| Component | Count |
|---|---|
| Distributions | 16 |
| Inference methods | 4 (NUTS, ADVI, SMC, Pathfinder) |
| Rewrite passes | 2 (NCP, vectorized obs) |
| Architectural decisions | 52 documented |
| Tests (eXMC) | 199 |
| Tests (ExmcViz) | 34 |
| GPU backends | 2 (CUDA via EXLA, Metal via EMLX) |
This is not a toy. It is a research-grade probabilistic programming framework that happens to run on a telephone switch VM, and it holds its own against the most mature PPL in the Python ecosystem.
Part 3 tells the story of how we got from 34× slower to 1.9× faster — seven optimization phases, each revealing something about the nature of mixed-runtime PPL design.