What if probabilistic programming runtimes were different?

A Probabilistic Programming environment on the BEAM, where liveness, fault tolerance, and distribution aren't features — they're consequences.

The Argument

Three architectural consequences of building a probabilistic programming framework on a virtual machine designed for telephone switches.

Every sample is a message

sample_stream sends each posterior draw as a BEAM message to any process — a Scenic window, a Phoenix LiveView, a GenServer. The sampler doesn't know or care what the receiver does.

Sampler.sample_stream(model, self(), init, opts)

receive do
  {:exmc_sample, i, point_map, step_stat} -> ...
end
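
Anything with a mailbox can be the receiver. A minimal sketch of a GenServer consumer, assuming only the {:exmc_sample, ...} message shape above (the SampleCollector module is illustrative, not part of eXMC):

defmodule SampleCollector do
  use GenServer

  def start_link(_opts), do: GenServer.start_link(__MODULE__, [])

  @impl true
  def init(draws), do: {:ok, draws}

  # Each posterior draw arrives as an ordinary message.
  @impl true
  def handle_info({:exmc_sample, _i, point_map, _step_stat}, draws) do
    {:noreply, [point_map | draws]}
  end
end

# Point the sampler at the collector's mailbox instead of self().
{:ok, pid} = SampleCollector.start_link([])
Sampler.sample_stream(model, pid, init, opts)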

If it crashes, try again

Each subtree build is wrapped in try/rescue. Zero overhead on the happy path: a BEAM try is just exception-handler registration on the process stack. When a subtree build crashes, it is replaced with a divergent placeholder, and the sampler carries on as if it had hit an ordinary divergence.

try do
  build_subtree(state, direction, depth, step_fn)
rescue
  _ -> divergent_placeholder(state)
end
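
The same philosophy scales up a level: a whole chain is just a process, so stock OTP supervision can restart one that dies. A sketch using plain Task.Supervisor rather than any eXMC-specific API:

# Stock OTP: supervise the chain process itself.
{:ok, _sup} = Task.Supervisor.start_link(name: ChainSupervisor)

# async_nolink keeps a crashing chain from taking the caller down too.
Task.Supervisor.async_nolink(ChainSupervisor, fn ->
  Sampler.sample_stream(model, coordinator_pid, init, opts)
end)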

203 lines. No MPI. No Dask. No Ray.

Distribution is just sending messages farther. Start nodes, send the model, collect samples. Erlang's distribution protocol handles serialization. PIDs are location-transparent.

:erpc.call(node, Sampler, :sample_stream,
  [model, coordinator_pid, init, opts])
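
Fanning a model out is one :erpc call per node, all streaming into the same coordinator mailbox. A sketch, assuming the nodes are already connected and the eXMC code is loaded on each:

nodes = [:"n1@10.0.0.2", :"n2@10.0.0.3"]
coordinator = self()

# One independent chain per node; PIDs travel across the wire unchanged.
tasks =
  for node <- nodes do
    Task.async(fn ->
      :erpc.call(node, Sampler, :sample_stream,
        [model, coordinator, init, opts], :infinity)
    end)
  end

Task.await_many(tasks, :infinity)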

The Numbers

Head-to-head against PyMC. 5-seed medians, 1 chain, same model, same data.

Model                  PyMC ESS/s   eXMC ESS/s   Ratio   Winner
simple (d=2)                  576          469   0.81×   PyMC
medium (d=5)                  157          298   1.90×   eXMC
stress (d=8)                  185          215   1.16×   eXMC
eight_schools (d=10)            5           12   2.55×   eXMC
funnel (d=10)                   6            2   0.40×   PyMC
logistic (d=21)               336           69   0.21×   PyMC
sv (d=102)                      1            1   1.20×   eXMC

eXMC wins 4 models to PyMC's 3, including the canonical Eight Schools benchmark (2.55×) and 102-dimensional stochastic volatility (1.20×). With 5-node distribution: 2.88× average scaling.

The Composition Test

Here's the result I'm most proud of, because it wasn't designed — it was discovered.

Streaming inference was built in January: a sampler that sends each posterior draw as a BEAM message, so a Scenic visualization window can update trace plots in real time. Distribution was built in February: four Erlang nodes, each running an independent chain, collecting samples through :erpc.

The two features were developed a month apart. Neither knew the other existed. When we connected them, the change was three lines of code. The distributed coordinator already forwarded messages to a caller PID. The streaming visualization already listened on a PID for sample messages. We pointed one at the other.

Four nodes. Twenty thousand samples. Twenty-one seconds. The Scenic dashboard updated live from all four nodes simultaneously — trace plots interleaving draws from four independent chains running on four separate machines.

This composition worked on the first attempt because both features were built on the same primitive: send(pid, {:exmc_sample, i, point_map, step_stat}).
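
The glue needs nothing beyond that primitive. This is an illustrative reconstruction, not eXMC's actual three lines: any relay that re-sends the message can splice the two features together.

defmodule Relay do
  # Forward every sample message to the Scenic viz process, unchanged.
  def loop(viz_pid) do
    receive do
      {:exmc_sample, _i, _point_map, _step_stat} = msg ->
        send(viz_pid, msg)
        loop(viz_pid)
    end
  end
end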

On the improbable and slightly reckless decision to build a probabilistic programming framework on a virtual machine designed for telephone switches

There is a peculiar constraint at the heart of the dominant language in scientific computing: it cannot truly do two things at once. A global lock, woven into the interpreter's memory model decades ago, ensures that threads yield to one another rather than run alongside. Every framework that requires concurrency — and probabilistic programming requires a great deal of it — must work around this fact, layering foreign runtimes beneath the surface.

The BEAM virtual machine has no such constraint. It was built for telephone switches, systems where dropping a call is not an option and where a million conversations must proceed simultaneously without interference. What if we took that machine — designed for reliability under concurrency — and asked it to explore posterior distributions instead of routing phone calls?

Read the full essay →

Architecture

Four layers, each depending only on the one below.

[Figure: eXMC four-layer architecture diagram]

Layer      Modules                         Responsibility
IR         Builder, DSL, Dist.*            Model as data
Compiler   Compiler, PointMap, Transform   IR → differentiable closure
Sampler    NUTS, ADVI, SMC, Pathfinder     Inference algorithms
Runtime    Streaming, Distribution, Viz    BEAM integration

Getting Started

Three ways to define and sample a model.

Builder API:

alias Exmc.{Builder, Dist.Normal, Dist.HalfNormal}

ir = Builder.new()
  |> Builder.rv("mu", Normal.new(0.0, 10.0))
  |> Builder.rv("sigma", HalfNormal.new(1.0))
  |> Builder.obs("y", Normal.new("mu", "sigma"), data)

{trace, stats} = Exmc.Sampler.sample(ir, %{},
  num_samples: 1000, num_warmup: 1000)

DSL macro:

import Exmc.DSL

model = model do
  mu    ~ Normal.new(0.0, 10.0)
  sigma ~ HalfNormal.new(1.0)
  y     ~ Normal.new(mu, sigma), observed: data
end

Distributed chains:

Exmc.Distributed.sample_chains(model, init,
  nodes: [:"n1@10.0.0.2", :"n2@10.0.0.3"],
  num_samples: 1000, num_warmup: 1000)

Add to your mix.exs dependencies:

{:exmc, "~> 0.1.0"}

Research

The thesis behind the framework.

  1. Architecture
    Why the BEAM runtime enables architectural properties other PPLs cannot express
  2. Compilation
    From declarative IR to differentiable closures via Nx and EXLA
  3. The No-U-Turn Sampler
    NUTS implementation with Stan-style three-phase warmup
  4. Streaming Inference
    Per-sample message passing for live posterior visualization
  5. Distributed MCMC
    Location-transparent multi-node sampling via Erlang distribution
  6. Benchmarks
    Seven-model comparison against PyMC, GPU acceleration, scaling analysis

The full thesis: Probabilistic Programming on BEAM Process Runtimes

Writing

Notes on building a probabilistic programming framework where none was expected.