This is Part 2 of "What If Probabilistic Programming Were Different?" Part 1 introduced the BEAM process runtime thesis. This part asks the uncomfortable question: does the thing actually work?
An architectural thesis is worthless if the system can't do the job. eXMC must be a competent probabilistic programming framework before it can be an interesting one. That means distributions, inference algorithms, model comparison, visualization, and — critically — performance that doesn't embarrass itself against PyMC.
The Distribution Library
eXMC ships 16 distributions, each implementing logpdf, sample, and constraint transforms with automatic Jacobian correction:
| Category | Distributions |
|---|---|
| Continuous | Normal, HalfNormal, TruncatedNormal, Lognormal, Exponential, Uniform, Cauchy, HalfCauchy, Beta, Gamma, StudentT, Laplace |
| Discrete | Bernoulli, Poisson |
| Multivariate | MvNormal, Dirichlet |
Every distribution declares its own constraint transform. HalfNormal uses :log — the sampler explores unconstrained space while the user thinks in constrained space. The Jacobian correction happens automatically, which means users never need to know it exists.
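To make the mechanics concrete, here is a minimal sketch (hypothetical module, not eXMC's transform API) of what a :log transform plus Jacobian correction does for a positive-support parameter like HalfNormal:

```elixir
defmodule Sketch.LogTransform do
  # The sampler proposes y anywhere on the real line; the model sees
  # x = exp(y) > 0.
  def to_constrained(y), do: :math.exp(y)

  # log|dx/dy| for x = exp(y) is simply y.
  def log_abs_det_jacobian(y), do: y

  # The log-density the sampler actually targets in unconstrained space:
  # the constrained logpdf evaluated at exp(y), plus the Jacobian term.
  def logpdf_unconstrained(logpdf, y) do
    logpdf.(to_constrained(y)) + log_abs_det_jacobian(y)
  end
end
```

Because the correction lives with the transform rather than with any single distribution, it composes across the whole library, which is what lets users stay entirely in constrained space.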
Four Inference Methods
NUTS is the workhorse, but not every model needs gradient-based sampling:
- NUTS — full No-U-Turn Sampler with Stan-style three-phase warmup, diagonal and dense mass matrix adaptation, rho-based generalized U-turn criterion, sub-trajectory checks
- ADVI — Automatic Differentiation Variational Inference for fast approximate posteriors
- SMC — Sequential Monte Carlo for multimodal posteriors and model evidence estimation
- Pathfinder — L-BFGS-based variational method for fast initialization
Model Comparison
eXMC provides WAIC and LOO-CV (Pareto-smoothed importance sampling) for comparing models without held-out data. Both are computed from the pointwise log-likelihood, which eXMC stores automatically during sampling.
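For reference, here is a sketch of the textbook WAIC computation from a {n_draws, n_observations} pointwise log-likelihood matrix, written with Nx (the standard formula, not eXMC's implementation):

```elixir
defmodule Sketch.WAIC do
  # log_lik: {n_draws, n_obs} tensor of pointwise log-likelihoods
  def waic(log_lik) do
    n_draws = Nx.axis_size(log_lik, 0)

    # lppd_i = log(1/S * sum_s exp(log_lik[s, i])), via a stable log-sum-exp
    max = Nx.reduce_max(log_lik, axes: [0], keep_axes: true)

    lse =
      log_lik
      |> Nx.subtract(max)
      |> Nx.exp()
      |> Nx.sum(axes: [0])
      |> Nx.log()
      |> Nx.add(Nx.squeeze(max, axes: [0]))

    lppd = Nx.subtract(lse, Nx.log(n_draws))

    # p_waic_i = Var_s(log_lik[s, i]); elpd = sum_i (lppd_i - p_waic_i)
    p_waic = Nx.variance(log_lik, axes: [0])
    elpd = Nx.sum(Nx.subtract(lppd, p_waic))

    # WAIC on the deviance scale
    Nx.multiply(-2, elpd)
  end
end
```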
The Compilation Pipeline
Models are data before they're computation. The Builder API produces an intermediate representation — a map of random variable nodes with their distributions, transforms, and observation data. The Compiler walks this IR and produces a differentiable closure: one function that maps a flat parameter vector to a scalar log-probability.
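A hand-written stand-in (not the Compiler's actual output) shows the shape of that closure: flat parameter vector in, scalar log-probability out. The model here is invented for illustration: mu ~ Normal(0, 1), sigma ~ HalfNormal(1) sampled as log_sigma, y ~ Normal(mu, sigma).

```elixir
y = Nx.tensor([1.2, 0.4, -0.3])

# Standard Normal log-density, reused for priors and likelihood
normal_logpdf = fn x, loc, scale ->
  z = Nx.divide(Nx.subtract(x, loc), scale)

  Nx.subtract(
    Nx.multiply(-0.5, Nx.multiply(z, z)),
    Nx.add(Nx.log(scale), 0.5 * :math.log(2 * :math.pi()))
  )
end

logp = fn params ->
  mu = params[0]
  log_sigma = params[1]
  sigma = Nx.exp(log_sigma)

  normal_logpdf.(mu, 0.0, 1.0)                     # prior on mu
  |> Nx.add(normal_logpdf.(sigma, 0.0, 1.0))       # HalfNormal(1) body
  |> Nx.add(Nx.add(:math.log(2.0), log_sigma))     # + log 2 + :log Jacobian
  |> Nx.add(Nx.sum(normal_logpdf.(y, mu, sigma)))  # likelihood
end

logp.(Nx.tensor([0.1, -0.2]))
#=> scalar tensor: the joint log-density at that point
```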
Between Builder and Compiler, rewrite passes transform the IR:
- Non-Centered Parameterization — automatically detects funnel geometry and rewrites `x ~ Normal(mu, sigma)` to `z ~ Normal(0, 1)`, `x = mu + sigma * z` (see the sketch after this list)
- Vectorized Observations — `Builder.obs` auto-adds `reduce: :sum` for non-scalar observation data
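To make the first pass concrete, here is a minimal sketch of the two parameterizations it trades between (plain Elixir closures with constants dropped, not the pass itself):

```elixir
# Centered: x is sampled directly, so its scale is tied to sigma and the
# joint geometry narrows into a funnel as sigma shrinks.
centered_logp = fn x, mu, sigma ->
  z = (x - mu) / sigma
  # Normal(mu, sigma) log-density, up to an additive constant
  -0.5 * z * z - :math.log(sigma)
end

# Non-centered: the sampler sees z ~ Normal(0, 1), which has unit-scale
# geometry for any sigma; x is recovered deterministically afterwards.
noncentered = fn z, mu, sigma ->
  {mu + sigma * z, -0.5 * z * z}
end
```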
The closure is then JIT-compiled via EXLA when available. The JIT boundary is precise: tensor-heavy gradient computation goes through XLA, branchy tree logic stays in plain Elixir. Three number systems coexist in one sampler — Erlang floats for adaptation, Nx tensors for the sampler loop, EXLA-compiled closures for the gradient.
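As a rough illustration of that boundary, here is a generic Nx/EXLA sketch (not eXMC's internals, and assuming EXLA is installed):

```elixir
defmodule Sketch.Boundary do
  import Nx.Defn

  # Tensor-heavy side: value and gradient of a scalar log-density,
  # compiled by XLA once EXLA is the configured defn compiler.
  defn logp_and_grad(p) do
    value_and_grad(p, fn p -> -0.5 * Nx.sum(p * p) end)
  end
end

# Plain-Elixir side: pick the compiler once, then drive the compiled
# closure from ordinary, branchy BEAM code (tree building, adaptation).
Nx.Defn.global_default_options(compiler: EXLA)
{logp, grad} = Sketch.Boundary.logp_and_grad(Nx.tensor([0.3, -1.2]))
```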
GPU Acceleration
Pass device: :cuda to sample on GPU. At d=22, GPU gives 2.65× wall-time speedup (ESS/s 3.9→10.3). The batched leapfrog integrator amortizes kernel launch overhead across all steps within a subtree.
Apple Silicon support via EMLX (Metal/MLX backend) works at f32 precision. Exmc.JIT abstracts the backend choice — EXLA, EMLX, or BinaryBackend fallback.
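In Nx terms, that abstraction roughly corresponds to choosing a default backend. A sketch of the underlying mechanism follows; the module and option names are Nx/EXLA/EMLX's as I understand them, while `device: :cuda` remains the eXMC-level switch:

```elixir
# NVIDIA GPU via XLA (requires an EXLA build with CUDA support)
Nx.default_backend({EXLA.Backend, client: :cuda})

# Apple Silicon via MLX (assumes the EMLX package; f32 precision)
# Nx.default_backend(EMLX.Backend)

# Pure-Elixir fallback, no native dependencies
# Nx.default_backend(Nx.BinaryBackend)
```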
Visualization
ExmcViz renders ArviZ-style diagnostics through Scenic, Elixir's scene graph library. No Matplotlib, no PNG encoding, no Jupyter dependency. The scene graph is a live data structure composited at 60fps through OpenGL via NanoVG.
The live streaming feature connects the sampler directly to the visualization through GenServer message passing. Trace plots grow from left to right, amber lines on black. You watch Bayesian inference happen in real time.
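The pattern is ordinary OTP. A minimal sketch (hypothetical module, not ExmcViz's actual API) looks like this:

```elixir
defmodule TraceViz do
  use GenServer

  def start_link(_opts \\ []),
    do: GenServer.start_link(__MODULE__, %{draws: []}, name: __MODULE__)

  # Called from inside the sampling loop after every accepted draw.
  def push(draw), do: GenServer.cast(__MODULE__, {:draw, draw})

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_cast({:draw, draw}, state) do
    # ExmcViz would update the Scenic scene graph here; this sketch just
    # accumulates the trace for rendering.
    {:noreply, %{state | draws: [draw | state.draws]}}
  end
end
```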
The Race
Five-seed medians, 1 chain, 2000 samples + 1000 warmup, same model definitions:
| Model | d | eXMC ESS/s | PyMC ESS/s | Ratio | Winner |
|---|---|---|---|---|---|
| simple | 2 | 469 | 576 | 0.81× | PyMC |
| medium | 5 | 298 | 157 | 1.90× | eXMC |
| stress | 8 | 215 | 185 | 1.16× | eXMC |
| eight_schools | 10 | 12 | 5 | 2.55× | eXMC |
| funnel | 10 | 2 | 6 | 0.40× | PyMC |
| logistic | 21 | 69 | 336 | 0.21× | PyMC |
| stochastic_vol | 102 | 1.2 | 1.0 | 1.20× | eXMC |
eXMC wins 4 models to PyMC's 3. The wins include the canonical Eight Schools benchmark (2.55×) and the 102-dimensional stochastic volatility model (1.20×). PyMC wins the trivial model (per-step overhead) and the logistic regression (dense mass matrix at d=21 favors PyMC's C++ inner loop).
The crossover sits around d=3–4 parameters. Below it, per-step architectural overhead dominates and PyMC wins. Above it, adaptation quality matters more and eXMC usually wins; the funnel and the dense-mass-matrix logistic regression are the exceptions noted above.
Distribution Scaling
With a 5-node Erlang cluster: 2.88× average speedup. No MPI, no Dask, no Ray. Just :erpc.call and location-transparent PIDs.
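The pattern, sketched with a hypothetical `MyModel.sample_chain/1` entry point and nodes that are already connected and code-loaded:

```elixir
# One chain per node; :erpc.call is synchronous, so wrap each call in a Task.
nodes = Enum.take([node() | Node.list()], 5)

chains =
  nodes
  |> Enum.with_index()
  |> Enum.map(fn {n, seed} ->
    Task.async(fn -> :erpc.call(n, MyModel, :sample_chain, [seed]) end)
  end)
  |> Task.await_many(:infinity)
```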
The Stack
| Component | Count |
|---|---|
| Distributions | 16 |
| Inference methods | 4 (NUTS, ADVI, SMC, Pathfinder) |
| Rewrite passes | 2 (NCP, vectorized obs) |
| Architectural decisions | 52 documented |
| Tests (eXMC) | 199 |
| Tests (ExmcViz) | 34 |
| GPU backends | 2 (CUDA via EXLA, Metal via EMLX) |
This is not a toy. It is a research-grade probabilistic programming framework that happens to run on a telephone switch VM, and it holds its own against the most mature PPL in the Python ecosystem.
Part 3 tells the story of how we got from 34× slower to 1.9× faster — seven optimization phases, each revealing something about the nature of mixed-runtime PPL design.