Bayesian Inference#
A concise introduction to modern Bayesian inference in zfit covering essential features:
Prior specification: Define prior beliefs about parameters before seeing data
MCMC sampling: Use Markov Chain Monte Carlo (emcee) to sample from posterior distributions
Convergence diagnostics: Monitor R̂ (Gelman-Rubin statistic) and ESS (Effective Sample Size)
ArviZ integration: Advanced diagnostics and visualization tools
Posterior analysis: Extract credible intervals, means, and covariances
from __future__ import annotations
import os
os.environ["ZFIT_DISABLE_TF_WARNINGS"] = "1" # Suppress TensorFlow warnings
os.environ["CUDA_VISIBLE_DEVICES"] = "-1" # disable GPU
import matplotlib.pyplot as plt
import numpy as np
import zfit
np.random.seed(42)
Bayesian Analysis Fundamentals#
Bayesian inference uses Bayes’ theorem to update beliefs about parameters given data:
Where:
P(θ|data): Posterior - updated beliefs after seeing data
P(data|θ): Likelihood - probability of observing data given parameters
P(θ): Prior - initial beliefs about parameters before seeing data
Priors encode domain knowledge or express ignorance. Unlike frequentist methods that treat parameters as fixed unknowns, Bayesian analysis treats them as random variables with probability distributions.
# Available prior distributions in zfit
uniform_prior = zfit.prior.Uniform(lower=0, upper=10)
normal_prior = zfit.prior.Normal(mu=5.0, sigma=1.0)
gamma_prior = zfit.prior.Gamma(alpha=2.0, beta=1.0)
half_normal_prior = zfit.prior.HalfNormal(sigma=0.5)
poisson_prior = zfit.prior.Poisson(lam=3.0)
exponential_prior = zfit.prior.Exponential(lam=2.0)
student_t_prior = zfit.prior.StudentT(ndof=3, mu=0.0, sigma=1.0)
1. Model Setup with Priors#
Signal+background model with physics-motivated priors:
μ: Uniform around expected peak location
σ: HalfNormal (positive, favors smaller widths)
λ: Normal around typical decay rate
Yields: Normal based on expected counts
Priors can be set during creation or modified later:
# Setting and changing priors
param = zfit.Parameter("demo", 1.0, lower=0.0, upper=5.0)
print(f"Initial prior: {param.prior}")
param.set_prior(zfit.prior.Normal(mu=2.0, sigma=0.5))
print(f"Updated prior: {param.prior}")
param.set_prior(zfit.prior.Exponential(lam=1.0))
print(f"Exponential prior: {param.prior}")
param.set_prior(None) # Remove prior
print(f"Removed prior: {param.prior}")
Initial prior: None
Updated prior: Normal(name='None')
Exponential prior: Exponential(name='None')
Removed prior: None
# Define observable and parameters with priors
obs = zfit.Space("mass", 4.0, 6.0)
# Signal parameters
mu = zfit.Parameter("mu", 5.1, 4.5, 5.5, prior=zfit.prior.Uniform(lower=4.8, upper=5.2))
sigma = zfit.Parameter("sigma", 0.2, 0.05, 0.3, prior=zfit.prior.HalfNormal(sigma=0.1))
lambda_bkg = zfit.Parameter("lambda_bkg", -1.2, -3.0, 0.0, prior=zfit.prior.Normal(mu=-1.0, sigma=0.5))
# Yield parameters
n_sig = zfit.Parameter("n_sig", 900, 0, 5000, prior=zfit.prior.Normal(mu=1000, sigma=100))
n_bkg = zfit.Parameter("n_bkg", 600, 0, 2000, prior=zfit.prior.Normal(mu=500, sigma=50))
# Create model
signal = zfit.pdf.Gauss(obs=obs, mu=mu, sigma=sigma, extended=n_sig)
background = zfit.pdf.Exponential(obs=obs, lambda_=lambda_bkg, extended=n_bkg)
model = zfit.pdf.SumPDF([signal, background])
# Generate synthetic data from the model
true_params = {mu: 5.0, sigma: 0.1, lambda_bkg: -1.0, n_sig: 1000, n_bkg: 500}
data = model.sample(n=1500, params=true_params)
data.to_binned(50).to_hist().plot(label="Data", color="black", histtype="step")
[StairsArtists(stairs=<matplotlib.patches.StepPatch object at 0x70e2eda48440>, errorbar=<ErrorbarContainer object of 3 artists>, legend_artist=<ErrorbarContainer object of 3 artists>)]
# Create loss function
nll = zfit.loss.ExtendedUnbinnedNLL(model=model, data=data)
3. MCMC Sampling#
MCMC constructs a Markov chain to sample from the posterior. The emcee ensemble sampler uses multiple walkers for efficiency and affine invariance.
Key parameters:
nwalkers: Ensemble size, typically ≥ 2× parameters
n_warmup: Burn-in steps to reach stationarity, 200-500 for simple models, more for complex ones
n_samples: Production samples, 1000+ for final results, 100-500 for testing
# Initialize MCMC sampler
sampler = zfit.mcmc.EmceeSampler(nwalkers=32, verbosity=8) # 8 shows progressbar
# Sample from posterior
posterior = sampler.sample(
loss=nll,
n_samples=500, # Reduced for tutorial speed
n_warmup=200,
)
Running burn-in phase with 200 steps...
Running production phase with 500 steps...
4. Results Analysis#
The posterior provides parameter estimates and convergence diagnostics:
R̂: Gelman-Rubin statistic, convergence metric comparing within-chain to between-chain variance (≤ 1.1 indicates good convergence)
ESS: Effective sample size accounting for autocorrelation (higher = better sampling efficiency)
Credible intervals: Bayesian confidence intervals
Methods:
mean(),std(),credible_interval(),get_samples()
print(posterior)
PosteriorSamples
from
<ExtendedUnbinnedNLL model=[<zfit.<class 'zfit.models.functor.SumPDF'> params=[Composed_autoparam_40, Composed_autoparam_41]] data=[<zfit.Data: Data obs=('mass',) shape=(1500, 1)>] constraints=[]>
with
EmceeSampler(name='EmceeSampler')
╒═════════╤═════════════╤═════════╤═══════════╤═══════════════════════════════════════╕
│ valid │ converged │ max R̂ │ min ESS │ total samples | warmup | per walker │
╞═════════╪═════════════╪═════════╪═══════════╪═══════════════════════════════════════╡
│
True
│
True
│ 1.0271 │ 11797 │ 16000 | 200 | 500 │
╘═════════╧═════════════╧═════════╧═══════════╧═══════════════════════════════════════╛
Parameters
parameter mean std 95% CI R̂ ESS
----------- --------- --------- ------------------------ ------ -----
n_sig 1001.5992 ± 34.1576 [ 938.3271, 1070.1342] 1.0270 11797
n_bkg 498.1738 ± 24.0257 [ 452.2465, 546.2529] 1.0140 14301
mu 5.0001 ± 0.0038 [ 4.9926, 5.0076] 1.0090 14573
sigma 0.1041 ± 0.0031 [ 0.0982, 0.1104] 1.0090 16715
lambda_bkg -0.8155 ± 0.0845 [ -0.9883, -0.6535] 1.0140 15348
Sampler: EmceeSampler with 32 walkers
# Extract parameter estimates
for param in model.get_params():
mean_val = posterior.mean(param)
std_val = posterior.std(param)
print(f"{param.name}: {mean_val:.4f} ± {std_val:.4f}")
print("\n90% credible intervals:")
for param in model.get_params():
lower, upper = posterior.credible_interval(param, alpha=0.1)
print(f"{param.name}: [{lower:.4f}, {upper:.4f}]")
n_sig: 1001.5992 ± 34.1576
n_bkg: 498.1738 ± 24.0257
mu: 5.0001 ± 0.0038
sigma: 0.1041 ± 0.0031
lambda_bkg: -0.8155 ± 0.0845
90% credible intervals:
n_sig: [947.3732, 1060.4980]
n_bkg: [459.3049, 538.8280]
mu: [4.9940, 5.0063]
sigma: [0.0992, 0.1092]
lambda_bkg: [-0.9569, -0.6794]
5. Visualization#
Posterior plots show parameter uncertainties and compare to true values. Key insights:
Width: Parameter uncertainty
Shape: Non-Gaussian features
Location: How data updated the prior
# Plot posterior distributions
fig, axes = plt.subplots(2, 3, figsize=(12, 8))
axes = axes.flatten()
for i, param in enumerate(model.get_params()):
if i < len(axes):
samples = posterior.get_samples(param)
axes[i].hist(samples, bins=30, alpha=0.7, density=True)
axes[i].axvline(posterior.mean(param), color="red", linestyle="--", label="Mean")
axes[i].axvline(true_params[param], color="green", linestyle="-", label="True")
axes[i].set_title(f"{param.name}")
axes[i].set_xlabel("Value")
axes[i].set_ylabel("Density")
axes[i].legend()
# Remove empty subplot
if len(model.get_params()) < len(axes):
fig.delaxes(axes[-1])
plt.tight_layout()
plt.suptitle("Posterior Distributions", y=1.02)
plt.show()
# ArviZ integration for advanced diagnostics
import arviz as az
# Convert to ArviZ InferenceData format
idata = posterior.to_arviz()
# Print summary with R-hat and ESS
summary = az.summary(idata)
print(summary)
# Plot trace plots
az.plot_trace(idata, compact=True)
plt.tight_layout()
plt.show()
# Check R-hat values
rhat = az.rhat(idata)
print("\nR-hat values (should be ≤ 1.1):")
for var in rhat.data_vars:
print(f"{var}: {float(rhat[var]):.3f}")
# Effective sample size
ess = az.ess(idata)
print("\nEffective sample sizes:")
for var in ess.data_vars:
print(f"{var}: {float(ess[var]):.0f}")
mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk \
lambda_bkg -0.816 0.085 -0.969 -0.651 0.001 0.001 15348.0
mu 5.000 0.004 4.993 5.007 0.000 0.000 14573.0
n_bkg 498.174 24.026 453.181 543.234 0.201 0.135 14301.0
n_sig 1001.599 34.159 942.365 1068.743 0.310 0.183 11797.0
sigma 0.104 0.003 0.098 0.110 0.000 0.000 16715.0
ess_tail r_hat
lambda_bkg 15695.0 1.01
mu 13014.0 1.01
n_bkg 15219.0 1.01
n_sig 14242.0 1.03
sigma 16505.0 1.01
R-hat values (should be ≤ 1.1):
lambda_bkg: 1.014
mu: 1.009
n_bkg: 1.014
n_sig: 1.027
sigma: 1.009
Effective sample sizes:
lambda_bkg: 15348
mu: 14573
n_bkg: 14301
n_sig: 11797
sigma: 16715
After the fit is before the fit#
# 1. Posterior to prior for hierarchical modeling
mu_posterior_prior = posterior.as_prior(mu)
print(f"Created KDE prior from posterior: {mu_posterior_prior}")
Created KDE prior from posterior: KDE(name='mu_posterior_prior')
# Covariance matrix and correlations
cov_matrix = posterior.covariance()
param_names = [p.name for p in model.get_params()]
corr_matrix = np.corrcoef(cov_matrix)
plt.figure(figsize=(8, 6))
plt.imshow(corr_matrix, cmap="coolwarm", vmin=-1, vmax=1)
plt.colorbar(label="Correlation")
plt.xticks(range(len(param_names)), param_names, rotation=45)
plt.yticks(range(len(param_names)), param_names)
plt.title("Parameter Correlation Matrix")
for i in range(len(param_names)):
for j in range(len(param_names)):
plt.text(
j,
i,
f"{corr_matrix[i, j]:.2f}",
ha="center",
va="center",
color="white" if abs(corr_matrix[i, j]) > 0.5 else "black",
)
plt.tight_layout()
plt.show()
# Context manager for setting parameters to posterior means, same as FitResult
original_mu = mu.value()
with posterior:
posterior_mu = mu.value()
print(f"Original mu: {original_mu:.4f}")
print(f"Posterior mean mu: {posterior_mu:.4f}")
print(f"After context: {mu.value():.4f}")
Original mu: 5.1000
Posterior mean mu: 5.0001
After context: 5.1000