What If Probabilistic Programming Were Different?

December 2025 · Part 1: Architecture

Every probabilistic programming language works the same way. You define a model in Python, click run, wait, and eventually get back a bag of numbers. PyMC, Stan, NumPyro, Turing.jl — the syntax varies, the execution model doesn't. There's a Python process (or a C++ binary, or a Julia session) that grinds through thousands of gradient evaluations, and when it's done, it hands you a trace object. If a numerical error blows up midway through, you start over. If you want four chains, you spawn four OS processes and hope they all finish. If you want to watch what's happening while it runs — you don't. You wait.

This is fine for notebooks. It is not fine for production systems, for distributed clusters, or for the kind of interactive data science where a practitioner needs to see whether their model is working before committing to a 20-minute sampling run.

What if the runtime were designed differently?

The Accidental Architecture

The uniformity of existing PPLs isn't a design choice — it's an accident of history. PyMC was born in the Python scientific ecosystem, so it inherited Python's execution model: single-threaded, synchronous, no fault isolation. Stan was born in C++, so it inherited compiled binaries and MPI for distribution. NumPyro was born on JAX, so it inherited functional transformations and pmap for same-host parallelism.

None of these runtimes were designed for the things modern inference actually needs:

- streaming results out of the sampler while it runs, so you can watch a model converge (or fail) instead of waiting for a finished trace
- isolating numerical failures, so one bad trajectory or one crashed chain doesn't take the whole run down with it
- distributing across machines without bolting on a separate cluster framework

These aren't exotic requirements. They're the baseline for any distributed system built in the last 30 years. Telecom switches, chat servers, and database clusters have solved them since the 1990s — using a runtime called the BEAM.

The Experiment

eXMC is a probabilistic programming framework built in Elixir on the BEAM virtual machine — the same runtime that powers WhatsApp, Discord's backend, and Ericsson's telecom infrastructure. It implements NUTS (the No-U-Turn Sampler), ADVI, SMC, and Pathfinder, with automatic differentiation via Nx and JIT compilation via EXLA (Google's XLA compiler).
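
To make the stack concrete, here is roughly what the Nx/EXLA layer underneath looks like. This is not eXMC code, just a minimal sketch of autodiff plus XLA compilation for a toy standard-normal log-density; the module and function names are illustrative.

defmodule LogDensity do
  import Nx.Defn

  # log p(x) for a standard normal, up to an additive constant
  defn logp(x), do: -0.5 * Nx.sum(x * x)

  # grad/2 (from Nx.Defn) differentiates logp with respect to x
  defn grad_logp(x), do: grad(x, &logp/1)
end

# Compile defn functions through EXLA (XLA) instead of the pure-Elixir compiler
Nx.Defn.default_options(compiler: EXLA)

LogDensity.grad_logp(Nx.tensor([0.5, -1.0]))
# => the gradient of logp at [0.5, -1.0], i.e. [-0.5, 1.0]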

The thesis isn't that Elixir is a better language for math. It's that the runtime gives you properties that other PPLs can't have — not because they chose not to, but because their runtimes can't express them.

Here's what falls out naturally.

Every sample is a message

In eXMC, sample_stream sends each posterior draw as a message to any process:

Sampler.sample_stream(model, self(), init, opts)

# In your process:
receive do
  {:exmc_sample, i, point_map, step_stat} ->
    # Update a chart, feed a web dashboard, log to disk
    IO.inspect({i, point_map, step_stat})
end

The sampler doesn't know or care what the receiver does with the sample. It could be a Scenic window drawing trace plots. It could be a Phoenix LiveView pushing updates to a browser. It could be a GenServer aggregating results from multiple chains. The protocol is one message type. The rest is just processes talking to processes.
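
As one concrete example of the last case, a GenServer that pools draws from any number of chains needs nothing beyond a handle_info clause for that one message type. This is a sketch under those assumptions, not eXMC's actual coordinator:

defmodule ChainAggregator do
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, [], opts)

  @impl true
  def init(draws), do: {:ok, draws}

  # sample_stream delivers plain messages, so draws arrive in handle_info/2
  @impl true
  def handle_info({:exmc_sample, _i, point_map, _step_stat}, draws) do
    {:noreply, [point_map | draws]}
  end

  # Everything collected so far, oldest first
  @impl true
  def handle_call(:draws, _from, draws), do: {:reply, Enum.reverse(draws), draws}
end

Start it, pass its pid as the receiver argument to sample_stream, and every chain pointed at it lands in the same mailbox.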

If it crashes, try again

NUTS explores a binary tree of trajectory points. Each tree expansion runs a sequence of leapfrog integration steps, any of which can fail — NaN gradients, infinite energies, out-of-memory on GPU. In eXMC, each subtree build is wrapped in try/rescue:

try do
  build_subtree(state, direction, depth, step_fn)
rescue
  _ -> divergent_placeholder(state)
end

On the happy path the cost of this is negligible: the BEAM implements try by registering an exception handler on the process stack, and nothing more happens unless an exception is actually raised. When a subtree build crashes, it is replaced by a divergent placeholder and the sampler continues producing valid posterior samples.
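
For reference, the leapfrog update that those tree expansions repeat is only a handful of tensor operations. The sketch below is the textbook step written with Nx, not eXMC's implementation; grad_logp stands in for any function that returns the gradient of the log-density:

defmodule Leapfrog do
  # One leapfrog step of size eps:
  # half step in momentum, full step in position, half step in momentum
  def step(q, p, eps, grad_logp) do
    p_half = Nx.add(p, Nx.multiply(eps / 2, grad_logp.(q)))
    q_new  = Nx.add(q, Nx.multiply(eps, p_half))
    p_new  = Nx.add(p_half, Nx.multiply(eps / 2, grad_logp.(q_new)))
    {q_new, p_new}
  end
end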

203 lines. No MPI. No Dask. No Ray.

Distribution is just sending messages farther. Start nodes, send the model, collect samples. Erlang's distribution protocol handles serialization. PIDs are location-transparent: send(pid, msg) works identically whether pid is local or on a machine across the network.
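
Concretely, "start nodes, send the model, collect samples" can be done with nothing but the stock distribution primitives. In this sketch the node names are placeholders, and model, init, and opts are assumed to be the same values passed to sample_stream earlier:

collector = self()

Enum.each([:"chain1@hostA", :"chain2@hostB"], fn node ->
  true = Node.connect(node)
  # Run a sampler on the remote node; it streams back to our local pid
  Node.spawn_link(node, fn ->
    Sampler.sample_stream(model, collector, init, opts)
  end)
end)

# Draws from every node arrive here as ordinary messages, same as the local case
receive do
  {:exmc_sample, i, point_map, _step_stat} -> IO.inspect({i, point_map})
end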

The entire distributed sampling system is 203 lines of Elixir. With 4-node distribution: 3.4–3.7× near-linear scaling. No tuning, no infrastructure.

The composition test

Streaming inference and distributed sampling were built months apart. Neither knew the other existed. When we connected them — 4 remote nodes streaming samples to a live visualization in real time — the change was three lines of code. The visualization needed zero changes. The coordinator needed zero changes. The sampler needed zero changes.

Four nodes. Twenty thousand samples. Twenty-one seconds. The Scenic dashboard updated live from all four nodes simultaneously.

This composition worked on the first attempt because both features were built on the same primitive: send(pid, {:exmc_sample, i, point_map, step_stat}).
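
In code, the delta is essentially which pid the remote samplers stream to. Here is a sketch of that wiring, where TracePlotDashboard is a hypothetical receiver process (not part of eXMC) and nodes, model, init, and opts are as in the earlier sketch:

{:ok, dashboard} = TracePlotDashboard.start_link([])  # hypothetical Scenic/LiveView receiver

Enum.each(nodes, fn node ->
  Node.spawn_link(node, fn ->
    # Same sample_stream call as before; only the receiver pid changed
    Sampler.sample_stream(model, dashboard, init, opts)
  end)
end)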

The Numbers

Head-to-head against PyMC (5-seed median, 1 chain, same model, same data):

Model                             eXMC ESS/s    PyMC ESS/s    Ratio
Simple (d=2)                         469           576        0.81×
Medium (d=5, hierarchical)           298           157        1.90×
Stress (d=8, 3 groups)               215           185        1.16×
Eight Schools (d=10)                 125                       2.55×
Stochastic Volatility (d=102)        1.2           1.0        1.20×

eXMC wins 4 models to PyMC's 3, including the canonical Eight Schools benchmark and the 102-dimensional stochastic volatility model. The gap on the simple model (0.81×) is the irreducible per-step cost of an interpreted host language. For models where adaptation quality matters more than per-step speed, eXMC wins.

What this means

The point isn't "use Elixir for Bayesian inference." The point is that PPL architecture is downstream of runtime choice, and the runtimes we've been using constrain what PPLs can be.

If your runtime has message passing, you get streaming inference for free. If your runtime has process isolation, you get fault-tolerant sampling for free. If your runtime has transparent distribution, you get multi-node MCMC for free. And if all three properties come from the same mechanism, independently-designed features compose without integration effort.

eXMC is an existence proof that these properties can be emergent. A probabilistic programming framework where liveness, fault tolerance, and distribution aren't features — they're consequences.


This is Part 1 of a series about building a probabilistic programming framework on the BEAM virtual machine. Part 2 covers feature parity and benchmarks. Part 3 tells the optimization story.