Toy studies

Once a model is built, it is convenient to run sensitivity studies and to check the fit with a “toy study”: sample from the model, fit the generated sample, and repeat. The fitted values and their spread characterize whether the fit is biased. The pull, the difference between the fitted and the true value divided by the estimated uncertainty, should follow a standard Gaussian distribution.

import matplotlib.pyplot as plt
import numpy as np
import progressbar
import tensorflow as tf
import zfit
from zfit import z

We will build a simple model, just a Gaussian. Given the well-defined workflow of zfit, the model can be exchanged for any complicated composition or custom model, as sketched below.

obs = zfit.Space('x', -5, 5)

sigma = zfit.Parameter('sigma', 1, 0.1, 10)
mu = zfit.Parameter('mu', 0, -1, 1)
model = zfit.pdf.Gauss(obs=obs, mu=mu, sigma=sigma)
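
For example, a minimal sketch of a composite alternative: a Gaussian signal on top of an exponential background. The parameters lambd and frac and the background shape are illustrative, not part of this tutorial.

lambd = zfit.Parameter('lambd', -0.5, -3, 0)
frac = zfit.Parameter('frac', 0.7, 0, 1)
background = zfit.pdf.Exponential(lambd, obs=obs)
composite = zfit.pdf.SumPDF([model, background], fracs=frac)  # could be used in place of `model` below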

Instead of using sample as before, we will build our loss with a more efficient Data, a “sampler”, created by create_sampler. Like sample, it takes the limits and the number of samples as arguments, but it also supports fixed_params, which is True by default. This means that whenever the object is resampled, the parameters take the values they had when the sampler was created.

sampler = model.create_sampler(n=3000)
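
As a small sketch of the fixed_params behaviour (the values here are purely illustrative): changing a parameter after the sampler has been created does not affect how it resamples.

mu.set_value(0.5)   # change the parameter after the sampler was created
sampler.resample()  # still samples with mu = 0, the value at creation time
mu.set_value(0)     # restore the original value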

This takes a while, as the first sampling happens at creation. Now we build the whole chain, simply using the sampler as data.

nll = zfit.loss.UnbinnedNLL(model, sampler)

# this strategy does not raise an error on NaNs but returns a non-converged `FitResult`
from zfit.minimize import DefaultToyStrategy

minimizer = zfit.minimize.Minuit(strategy=DefaultToyStrategy(), verbosity=0, tol=1e-3, use_minuit_grad=True)
fit_results = []
ntoys = 20
params = nll.get_params()

with progressbar.ProgressBar(max_value=ntoys) as bar:

    while len(fit_results) < ntoys:

        # Generate toys
        sampler.resample()  # this is where the sampling happens

        # Randomise the initial values. They can put the pdf in an unphysical region, making it negative at some
        # points, which produces NaNs in the log of the NLL. Therefore, one would randomise until no NaNs occur.
        for param in params:
            param.randomize()  # or smarter, use `set_value` for your own method

        # The following can be used if the loss may return NaNs: wrap the randomisation
        # above in a `while True` loop and repeat until the loss evaluates without NaNs.
        #        try:
        #            is_nan = np.isnan(zfit.run(nll.value()))
        #        except tf.errors.InvalidArgumentError:  # check_numerics raises this error if NaNs are produced
        #            is_nan = True  # try again with new random values
        #        else:
        #            if not is_nan:
        #                break

        # Minimise the NLL
        result = minimizer.minimize(nll)

        if result.converged:
            # Calculate uncertainties
            result.hesse()
            fit_results.append(result)
            bar.update(len(fit_results))
print(fit_results[:10])
[<zfit.minimizers.fitresult.FitResult object at 0x7f95a77e45d0>, <zfit.minimizers.fitresult.FitResult object at 0x7f95a778ac90>, <zfit.minimizers.fitresult.FitResult object at 0x7f95a745b4d0>, <zfit.minimizers.fitresult.FitResult object at 0x7f95a743a5d0>, <zfit.minimizers.fitresult.FitResult object at 0x7f95a7499a10>, <zfit.minimizers.fitresult.FitResult object at 0x7f95a7478d10>, <zfit.minimizers.fitresult.FitResult object at 0x7f95a74a8390>, <zfit.minimizers.fitresult.FitResult object at 0x7f95a74e3ed0>, <zfit.minimizers.fitresult.FitResult object at 0x7f95a74677d0>, <zfit.minimizers.fitresult.FitResult object at 0x7f95a7321890>]

Evaluate results

From here on, we can use fit_results to compare against the true values, make plots, and so on.
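
For instance, a minimal sketch of the pull distribution for mu. It assumes the fitted values and errors can be read from result.params, with the hesse error stored under the 'hesse' key (the default name); 0 is the true value used to generate the toys.

mu_true = 0  # the value mu had when the toys were generated
pulls = np.array([(res.params[mu]['value'] - mu_true) / res.params[mu]['hesse']['error']
                  for res in fit_results])

plt.hist(pulls, bins=10)
plt.xlabel('pull of mu')
plt.show()
print(f'pull mean: {np.mean(pulls):.2f}, pull std: {np.std(pulls):.2f}')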