What if probabilistic programming runtimes were different?

A Probabilistic Programming environment on the BEAM, where liveness, fault tolerance, and distribution aren't features — they're consequences.

The Argument

Three architectural consequences of building a probabilistic programming framework on a virtual machine designed for telephone switches.

Every sample is a message

sample_stream sends each posterior draw as a BEAM message to any process — a Scenic window, a Phoenix LiveView, a GenServer. The sampler doesn't know or care what the receiver does.

Sampler.sample_stream(model, self(), init, opts)

receive do
  {:exmc_sample, i, point_map, step_stat} -> ...
end
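
Anything with a mailbox can be the receiver. A minimal sketch of a GenServer consumer, assuming only the {:exmc_sample, ...} message shape above (the SampleCollector module is illustrative, not part of eXMC):

defmodule SampleCollector do
  use GenServer

  def start_link(_opts), do: GenServer.start_link(__MODULE__, [])

  @impl true
  def init(draws), do: {:ok, draws}

  # Each posterior draw arrives as an ordinary message.
  @impl true
  def handle_info({:exmc_sample, _i, point_map, _step_stat}, draws) do
    {:noreply, [point_map | draws]}
  end
end

# Point the sampler at the collector's mailbox instead of self().
{:ok, pid} = SampleCollector.start_link([])
Sampler.sample_stream(model, pid, init, opts)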

If it crashes, try again

Each subtree build is wrapped in try/rescue. Zero overhead on the happy path: a BEAM try is just exception-handler registration on the process stack. When a subtree build crashes, it is replaced with a divergent placeholder, and the sampler carries on as if it had hit an ordinary divergence.

try do
  build_subtree(state, direction, depth, step_fn)
rescue
  _ -> divergent_placeholder(state)
end
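
The same philosophy scales up a level: a whole chain is just a process, so stock OTP supervision can restart one that dies. A sketch using plain Task.Supervisor rather than any eXMC-specific API:

# Stock OTP: supervise the chain process itself.
{:ok, _sup} = Task.Supervisor.start_link(name: ChainSupervisor)

# async_nolink keeps a crashing chain from taking the caller down too.
Task.Supervisor.async_nolink(ChainSupervisor, fn ->
  Sampler.sample_stream(model, coordinator_pid, init, opts)
end)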

203 lines. No MPI. No Dask. No Ray.

Distribution is just sending messages farther. Start nodes, send the model, collect samples. Erlang's distribution protocol handles serialization. PIDs are location-transparent.

:erpc.call(node, Sampler, :sample_stream,
  [model, coordinator_pid, init, opts])
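
Fanning a model out is one :erpc call per node, all streaming into the same coordinator mailbox. A sketch, assuming the nodes are already connected and the eXMC code is loaded on each:

nodes = [:"n1@10.0.0.2", :"n2@10.0.0.3"]
coordinator = self()

# One independent chain per node; PIDs travel across the wire unchanged.
tasks =
  for node <- nodes do
    Task.async(fn ->
      :erpc.call(node, Sampler, :sample_stream,
        [model, coordinator, init, opts], :infinity)
    end)
  end

Task.await_many(tasks, :infinity)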

The Numbers

Head-to-head against PyMC. 5-seed medians, 1 chain, same model, same data.

Model                  PyMC ESS/s   eXMC ESS/s   Ratio   Winner
simple (d=2)                  576          469   0.81×   PyMC
medium (d=5)                  157          298   1.90×   eXMC
stress (d=8)                  185          215   1.16×   eXMC
eight_schools (d=10)            5           12   2.55×   eXMC
funnel (d=10)                   6            2   0.40×   PyMC
logistic (d=21)               336           69   0.21×   PyMC
sv (d=102)                      1            1   1.20×   eXMC

eXMC wins 4 models to PyMC's 3, including the canonical Eight Schools benchmark (2.55×) and 102-dimensional stochastic volatility (1.20×). With 5-node distribution: 2.88× average scaling.

The Composition Test

Here's the result I'm most proud of, because it wasn't designed — it was discovered.

Streaming inference was built in January: a sampler that sends each posterior draw as a BEAM message, so a Scenic visualization window can update trace plots in real time. Distribution was built in February: four Erlang nodes, each running an independent chain, collecting samples through :erpc.

The two features were developed a month apart. Neither knew the other existed. When we connected them, the change was three lines of code. The distributed coordinator already forwarded messages to a caller PID. The streaming visualization already listened on a PID for sample messages. We pointed one at the other.

Four nodes. Twenty thousand samples. Twenty-one seconds. The Scenic dashboard updated live from all four nodes simultaneously — trace plots interleaving draws from four independent chains running on four separate machines.

This composition worked on the first attempt because both features were built on the same primitive: send(pid, {:exmc_sample, i, point_map, step_stat}).
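
The glue needs nothing beyond that primitive. This is an illustrative reconstruction, not eXMC's actual three lines: any relay that re-sends the message can splice the two features together.

defmodule Relay do
  # Forward every sample message to the Scenic viz process, unchanged.
  def loop(viz_pid) do
    receive do
      {:exmc_sample, _i, _point_map, _step_stat} = msg ->
        send(viz_pid, msg)
        loop(viz_pid)
    end
  end
end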

On the improbable and slightly reckless decision to build a probabilistic programming framework on a virtual machine designed for telephone switches

There is a peculiar constraint at the heart of the dominant language in scientific computing: it cannot truly do two things at once. A global lock, woven into the interpreter's memory model decades ago, ensures that threads yield to one another rather than run alongside. Every framework that requires concurrency — and probabilistic programming requires a great deal of it — must work around this fact, layering foreign runtimes beneath the surface.

The BEAM virtual machine has no such constraint. It was built for telephone switches, systems where dropping a call is not an option and where a million conversations must proceed simultaneously without interference. What if we took that machine — designed for reliability under concurrency — and asked it to explore posterior distributions instead of routing phone calls?

Read the full essay →

Architecture

Four layers, each depending only on the one below.

[Figure: eXMC four-layer architecture diagram]

Layer      Modules                         Responsibility
IR         Builder, DSL, Dist.*            Model as data
Compiler   Compiler, PointMap, Transform   IR → differentiable closure
Sampler    NUTS, ADVI, SMC, Pathfinder     Inference algorithms
Runtime    Streaming, Distribution, Viz    BEAM integration

Getting Started

Three ways to define and sample a model.

Builder API:

alias Exmc.{Builder, Dist.Normal, Dist.HalfNormal}

ir = Builder.new()
  |> Builder.rv("mu", Normal.new(0.0, 10.0))
  |> Builder.rv("sigma", HalfNormal.new(1.0))
  |> Builder.obs("y", Normal.new("mu", "sigma"), data)

{trace, stats} = Exmc.Sampler.sample(ir, %{},
  num_samples: 1000, num_warmup: 1000)

DSL macro:

import Exmc.DSL

model = model do
  mu    ~ Normal.new(0.0, 10.0)
  sigma ~ HalfNormal.new(1.0)
  y     ~ Normal.new(mu, sigma), observed: data
end

Distributed chains:

Exmc.Distributed.sample_chains(model, init,
  nodes: [:"n1@10.0.0.2", :"n2@10.0.0.3"],
  num_samples: 1000, num_warmup: 1000)

Add to your mix.exs dependencies:

{:exmc, "~> 0.1.0"}

Research

The thesis behind the framework.

  1. Architecture
    Why the BEAM runtime enables architectural properties other PPLs cannot express
  2. Compilation
    From declarative IR to differentiable closures via Nx and EXLA
  3. The No-U-Turn Sampler
    NUTS implementation with Stan-style three-phase warmup
  4. Streaming Inference
    Per-sample message passing for live posterior visualization
  5. Distributed MCMC
    Location-transparent multi-node sampling via Erlang distribution
  6. Benchmarks
    Seven-model comparison against PyMC, GPU acceleration, scaling analysis

The full thesis: Probabilistic Programming on BEAM Process Runtimes

Writing

Notes on building a probabilistic programming framework where none was expected.