A Probabilistic Programming environment on the BEAM, where liveness, fault tolerance, and distribution aren't features — they're consequences.
Three architectural consequences of building a probabilistic programming framework on a virtual machine designed for telephone switches.
sample_stream sends each posterior draw as a BEAM message to any process — a Scenic window, a Phoenix LiveView, a GenServer. The sampler doesn't know or care what the receiver does.
```elixir
Sampler.sample_stream(model, self(), init, opts)

receive do
  {:exmc_sample, i, point_map, step_stat} -> ...
end
```
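As a sketch of the receiving side (the module, function names, and fake draws below are hypothetical; only the `{:exmc_sample, i, point_map, step_stat}` message shape comes from the API above), a consumer can be any process running a receive loop:

```elixir
# Hypothetical consumer: any process that understands the
# {:exmc_sample, i, point_map, step_stat} message shape can receive draws.
defmodule SampleConsumer do
  # Tail-recursive receive loop; `handle` is any per-draw callback.
  def loop(handle) do
    receive do
      {:exmc_sample, i, point_map, _step_stat} ->
        handle.(i, point_map)
        loop(handle)

      :done ->
        :ok
    end
  end
end

# Stand-in for the sampler: send one fake draw to show the protocol.
pid = spawn(fn -> SampleConsumer.loop(fn i, pm -> IO.inspect({i, pm}) end) end)
send(pid, {:exmc_sample, 0, %{"mu" => 0.1}, %{}})
send(pid, :done)
```

The same loop works unchanged whether the draws come from a local sampler, a GenServer, or a remote node.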
Each subtree build is wrapped in try/rescue, at zero overhead on the happy path: on the BEAM, entering a try block only registers an exception handler on the process stack. When a subtree crashes, it is replaced with a divergent placeholder.
```elixir
try do
  build_subtree(state, direction, depth, step_fn)
rescue
  _ -> divergent_placeholder(state)
end
```
Distribution is just sending messages farther. Start nodes, send the model, collect samples. Erlang's distribution protocol handles serialization. PIDs are location-transparent.
```elixir
:erpc.call(node, Sampler, :sample_stream,
  [model, coordinator_pid, init, opts])
```
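A minimal coordinator might look like the sketch below. This is an illustration, not the framework's actual coordinator: the `:erpc` call shape and message format are taken from the snippets above, while the module name and the assumption that each chain delivers exactly `opts[:num_samples]` draws are mine.

```elixir
# Hypothetical coordinator: fan out one streaming chain per node,
# then collect every draw from the shared mailbox.
defmodule Coordinator do
  def run(model, init, opts, nodes) do
    me = self()

    # Fire-and-forget a streaming sampler on each node; every draw
    # comes back to this process as an :exmc_sample message.
    for node <- nodes do
      :erpc.cast(node, Sampler, :sample_stream, [model, me, init, opts])
    end

    collect(opts[:num_samples] * length(nodes), [])
  end

  defp collect(0, acc), do: Enum.reverse(acc)

  defp collect(n, acc) do
    receive do
      {:exmc_sample, _i, point_map, _stat} -> collect(n - 1, [point_map | acc])
    end
  end
end
```

Because PIDs are location-transparent, `collect/2` cannot tell, and does not need to tell, which node a given draw came from.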
Head-to-head against PyMC. 5-seed medians, 1 chain, same model, same data.
| Model | PyMC ESS/s | eXMC ESS/s | Ratio | Winner |
|---|---|---|---|---|
| simple (d=2) | 576 | 469 | 0.81× | PyMC |
| medium (d=5) | 157 | 298 | 1.90× | eXMC |
| stress (d=8) | 185 | 215 | 1.16× | eXMC |
| eight_schools (d=10) | 5 | 12 | 2.55× | eXMC |
| funnel (d=10) | 6 | 2 | 0.40× | PyMC |
| logistic (d=21) | 336 | 69 | 0.21× | PyMC |
| sv (d=102) | 1 | 1 | 1.20× | eXMC |
eXMC wins 4 models to PyMC's 3, including the canonical Eight Schools benchmark (2.55×) and 102-dimensional stochastic volatility (1.20×). With 5-node distribution: 2.88× average scaling.
Here's the result I'm most proud of, because it wasn't designed — it was discovered.
Streaming inference was built in January: a sampler that sends each posterior draw as a BEAM message, so a Scenic visualization window can update trace plots in real time. Distribution was built in February: four Erlang nodes, each running an independent chain, collecting samples through :erpc.
The two features were developed months apart. Neither knew the other existed. When we connected them, the change was three lines of code. The distributed coordinator already forwarded messages to a caller PID. The streaming visualization already listened on a PID for sample messages. We pointed one at the other.
Four nodes. Twenty thousand samples. Twenty-one seconds. The Scenic dashboard updated live from all four nodes simultaneously — trace plots interleaving draws from four independent chains running on four separate machines.
This composition worked on the first attempt because both features were built on the same primitive: send(pid, {:exmc_sample, i, point_map, step_stat}).
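In spirit, the connection looks something like this. Every name below is illustrative (the source does not show the actual three lines); only the idea, hand the visualization's PID to the distributed coordinator as the place to forward draws, comes from the text:

```elixir
# Hypothetical sketch of the composition. `ExmcViz.Dashboard` and the
# `stream_to:` option are assumed names, not the documented API.
{:ok, viz_pid} = ExmcViz.Dashboard.start_link([])

Exmc.Distributed.sample_chains(model, init,
  nodes: nodes,
  # Forward every {:exmc_sample, ...} message to the dashboard process.
  stream_to: viz_pid,
  num_samples: 5_000, num_warmup: 1_000)
```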
There is a peculiar constraint at the heart of the dominant language in scientific computing: it cannot truly do two things at once. A global lock, woven into the interpreter's memory model decades ago, ensures that threads yield to one another rather than run alongside. Every framework that requires concurrency — and probabilistic programming requires a great deal of it — must work around this fact, layering foreign runtimes beneath the surface.
The BEAM virtual machine has no such constraint. It was built for telephone switches, systems where dropping a call is not an option and where a million conversations must proceed simultaneously without interference. What if we took that machine — designed for reliability under concurrency — and asked it to explore posterior distributions instead of routing phone calls?
Four layers, each depending only on the one below.
| Layer | Modules | Responsibility |
|---|---|---|
| IR | Builder, DSL, Dist.* | Model as data |
| Compiler | Compiler, PointMap, Transform | IR → differentiable closure |
| Sampler | NUTS, ADVI, SMC, Pathfinder | Inference algorithms |
| Runtime | Streaming, Distribution, Viz | BEAM integration |
Three ways to define and sample a model.
```elixir
alias Exmc.{Builder, Dist.Normal, Dist.HalfNormal}

ir =
  Builder.new()
  |> Builder.rv("mu", Normal.new(0.0, 10.0))
  |> Builder.rv("sigma", HalfNormal.new(1.0))
  |> Builder.obs("y", Normal.new("mu", "sigma"), data)

{trace, stats} = Exmc.Sampler.sample(ir, %{},
  num_samples: 1000, num_warmup: 1000)
```
```elixir
import Exmc.DSL

model = model do
  mu ~ Normal.new(0.0, 10.0)
  sigma ~ HalfNormal.new(1.0)
  y ~ Normal.new(mu, sigma), observed: data
end
```
```elixir
Exmc.Distributed.sample_chains(model, init,
  nodes: [:"n1@10.0.0.2", :"n2@10.0.0.3"],
  num_samples: 1000, num_warmup: 1000)
```
Add to your mix.exs dependencies:
```elixir
{:exmc, "~> 0.1.0"}
```
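In context, the dependency tuple above sits in the standard `deps/0` function of a Mix project (the surrounding scaffold here is the usual `mix new` boilerplate, not something specific to eXMC):

```elixir
# In mix.exs
defp deps do
  [
    {:exmc, "~> 0.1.0"}
  ]
end
```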
The thesis behind the framework.
Probabilistic Programming on BEAM Process Runtimes
Notes on building a probabilistic programming framework where none was expected.
On the improbable and slightly reckless decision to build a probabilistic programming framework on a virtual machine designed for telephone switches. A literary technical memoir in twelve chapters.
Seven optimization phases, each revealing something about mixed-runtime probabilistic programming design. From 3.3 ESS/s to 298 ESS/s without changing the algorithm.
16 distributions, 4 inference methods, GPU acceleration, and a head-to-head race against PyMC. The numbers that prove the thesis.
The architectural thesis: streaming, fault tolerance, and distribution as consequences of the runtime, not features bolted on. Part 1 of the series.