Toy studies

Once we have a model, it is often useful to perform sensitivity studies and to validate the fit with a "toy study": sampling from the model and fitting the generated sample, many times over. The distribution of the fitted values and their spread characterize whether the fit is biased. The difference between the fitted and the true value, divided by the estimated uncertainty (the pull), should follow a standard Gaussian distribution.

import matplotlib.pyplot as plt
import numpy as np
import progressbar
import tensorflow as tf
import zfit
from zfit import z

We will build a simple model, just a Gaussian. However, thanks to the well-defined workflow of zfit, the model can be exchanged for any complicated composition or custom model.

obs = zfit.Space('x', -5, 5)

sigma = zfit.Parameter('sigma', 1, 0.1, 10)
mu = zfit.Parameter('mu', 0, -1, 1)
model = zfit.pdf.Gauss(obs=obs, mu=mu, sigma=sigma)
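
For example, the Gaussian could be swapped for a composed model without changing anything downstream. A minimal sketch (the exponential component and the parameters `lambd` and `frac` are illustrative here, not part of the rest of this tutorial):

lambd = zfit.Parameter('lambda', -0.5, -3, -0.01)  # illustrative slope parameter
frac = zfit.Parameter('frac', 0.7, 0, 1)  # fraction of the Gaussian component
expo = zfit.pdf.Exponential(lambd, obs=obs)
model_sum = zfit.pdf.SumPDF([zfit.pdf.Gauss(obs=obs, mu=mu, sigma=sigma), expo], fracs=frac)

For the rest of this study, we keep the simple Gaussian model.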

Instead of using sample as before, we first build our loss with a more efficient Data object, a "sampler", created by create_sampler. Like sample, it takes arguments for the limits and the number of samples, but it also supports fixed_params, which is True by default. This means that whenever this object is resampled, it is resampled with the parameter values that it had when the sampler was created.

sampler = model.create_sampler(n=3000)

This takes a while, as the first sampling happens right here. Next, we build our whole chain, simply using the sampler as our data.

nll = zfit.loss.UnbinnedNLL(model, sampler)
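
Before setting up the minimizer, we can sanity-check the fixed_params behaviour described above. A quick sketch (assuming the sample can be materialized via zfit.run on sampler.value()): changing mu after the sampler was created does not affect the resampling.

mu.set_value(0.5)  # change the parameter *after* sampler creation
sampler.resample()  # still sampled with the creation-time value mu=0
print(np.mean(zfit.run(sampler.value())))  # mean close to 0, not 0.5
mu.set_value(0)  # restore the original value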

# this strategy does not raise an error on NaNs but returns a non-converged `FitResult`
from zfit.minimize import DefaultToyStrategy

minimizer = zfit.minimize.Minuit(strategy=DefaultToyStrategy(), verbosity=0, tol=1e-3, use_minuit_grad=True)
fit_results = []
ntoys = 20
params = nll.get_params()

with progressbar.ProgressBar(max_value=ntoys) as bar:

    while len(fit_results) < ntoys:

        # Generate toys
        sampler.resample()  # this is where the sampling happens

        # Randomise the initial values. They can push the pdf into an unphysical region,
        # making it negative at some points, which produces NaNs in the log of the NLL.
        # Therefore, we would randomize until no more NaNs appear.
        for param in params:
            param.randomize()  # or, smarter: use `set_value` with your own method

        # The following can be used to test whether the loss returns NaNs; wrap the
        # randomization above in a `while True` loop and repeat until no NaN occurs:
        #     try:
        #         is_nan = np.isnan(zfit.run(nll.value()))
        #     except tf.errors.InvalidArgumentError:  # NaNs produced; check_numerics raises this error
        #         is_nan = True  # try again
        #     else:
        #         break

        # Minimise the NLL
        result = minimizer.minimize(nll)

        if result.converged:
            # Calculate uncertainties
            result.hesse()
            fit_results.append(result)
            bar.update(len(fit_results))
print(fit_results[:10])
[<zfit.minimizers.fitresult.FitResult object at 0x7f44a0486cd0>, <zfit.minimizers.fitresult.FitResult object at 0x7f44a04d0a50>, <zfit.minimizers.fitresult.FitResult object at 0x7f44a01114d0>, <zfit.minimizers.fitresult.FitResult object at 0x7f44a02a0d50>, <zfit.minimizers.fitresult.FitResult object at 0x7f44a016bf50>, <zfit.minimizers.fitresult.FitResult object at 0x7f44a013ad10>, <zfit.minimizers.fitresult.FitResult object at 0x7f44a0178790>, <zfit.minimizers.fitresult.FitResult object at 0x7f449a71cd90>, <zfit.minimizers.fitresult.FitResult object at 0x7f449a71e4d0>, <zfit.minimizers.fitresult.FitResult object at 0x7f449a710b10>]

Evaluate results

From here on, we can use the fit_results to compare against the true values, make plots, and so on, as sketched below.
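
As an illustration, here is a minimal, hypothetical sketch of a pull study for mu (assuming each FitResult exposes the fitted value as result.params[mu]['value'] and, after hesse(), the uncertainty as result.params[mu]['hesse']['error']):

mu_true = 0  # the value used to generate the toys
pulls = np.array([
    (res.params[mu]['value'] - mu_true) / res.params[mu]['hesse']['error']
    for res in fit_results
])
print(f'pull mean: {np.mean(pulls):.2f}, pull std: {np.std(pulls):.2f}')

plt.hist(pulls, bins=10)
plt.xlabel('pull of mu')
plt.show()

For an unbiased fit with correct uncertainty estimates, the pull mean should be compatible with 0 and the pull standard deviation with 1.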