Toy studies

Once we have a model, it can be convenient to run sensitivity studies and checks of the fit with a “toy study”: sampling from the model and fitting the generated sample, many times over. The distribution of the fitted values and their spread characterize whether the fit is biased. The difference between the fitted and the true value, divided by the estimated uncertainty (the “pull”), should follow a standard normal distribution.
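Concretely, for a parameter with true value value_true, each toy fit yields a fitted value value_fit and an estimated uncertainty sigma_fit, and the pull is computed as (a standard definition, not specific to zfit):

pull = (value_fit - value_true) / sigma_fit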

import matplotlib.pyplot as plt
import numpy as np
import progressbar
import tensorflow as tf
import zfit
from zfit import z

We will build a simple model, just a Gaussian. However, thanks to the well-defined workflow of zfit, the model can be exchanged for an arbitrarily complicated composition or a custom model.

obs = zfit.Space('x', (-5, 5))

sigma = zfit.Parameter('sigma', 1, 0.1, 10)
mu = zfit.Parameter('mu', 0, -1, 1)
model = zfit.pdf.Gauss(obs=obs, mu=mu, sigma=sigma)
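
For example, the Gaussian could be swapped for a composition such as a signal plus background sum. The following is a minimal sketch with hypothetical parameter names and values; the rest of this tutorial keeps the plain Gaussian.

lam = zfit.Parameter('lam', -0.5, -3, 0)   # hypothetical background slope
frac = zfit.Parameter('frac', 0.5, 0, 1)   # hypothetical signal fraction
background = zfit.pdf.Exponential(lam, obs=obs)
composed_model = zfit.pdf.SumPDF([model, background], fracs=frac)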

Instead of using sample as before, we will build our loss with a more efficient Data object, a “sampler”, created with create_sampler. Like sample, it takes the limits and the number of samples as arguments, but it also supports fixed_params, which is True by default. This means that whenever the object is resampled, the sampling uses the parameter values that the parameters had when the sampler was created.

sampler = model.create_sampler(n=3000, fixed_params=True)
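
For comparison, a one-off dataset could be created with sample (shown commented out as a sketch; it is not used below). Unlike the sampler, such a dataset cannot be resampled in place and would have to be recreated for every toy.

# one-off alternative (samples immediately, cannot be resampled in place):
# static_data = model.sample(n=3000)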

So far, no sampling has happened yet: the sampler is still empty. We first build the whole chain, using the sampler as our data.

nll = zfit.loss.UnbinnedNLL(model, sampler)

from zfit.minimize import \
    DefaultToyStrategy  # this strategy does not raise an error on NaNs but returns a non-converged `FitResult`

minimizer = zfit.minimize.Minuit(strategy=DefaultToyStrategy(), verbosity=0, tol=1e-3, use_minuit_grad=True)
fit_results = []
ntoys = 20
params = nll.get_params()

with progressbar.ProgressBar(max_value=ntoys) as bar:

    while len(fit_results) < ntoys:

        # Generate toys
        sampler.resample()  # this is where the sampling happens

        # Randomise the initial parameter values. They can put the pdf in an unphysical
        # region, making it negative at some points, which produces NaNs in the log of
        # the NLL. If that happens, randomize again (see the commented-out check below).
        for param in params:
            param.randomize()  # or smarter, use `set_value` for your own method

        # The following check can be used if the loss may return NaNs: wrap the
        # randomisation in a while loop and repeat until the loss evaluates to a
        # finite value.
        #     try:
        #         is_nan = np.isnan(zfit.run(nll.value()))
        #     except tf.errors.InvalidArgumentError:  # NaNs produced; check_numerics raises this error
        #         is_nan = True  # try again
        #     else:
        #         break

        # Minimise the NLL
        result = minimizer.minimize(nll)

        if result.converged:
            # Calculate uncertainties
            result.hesse()
            fit_results.append(result)
            bar.update(len(fit_results))
print(fit_results[:10])
[<zfit.minimizers.fitresult.FitResult object at 0x7fd2284babd0>, <zfit.minimizers.fitresult.FitResult object at 0x7fd228490ad0>, <zfit.minimizers.fitresult.FitResult object at 0x7fd22d1fa490>, <zfit.minimizers.fitresult.FitResult object at 0x7fd22851d650>, <zfit.minimizers.fitresult.FitResult object at 0x7fd2296f8a50>, <zfit.minimizers.fitresult.FitResult object at 0x7fd283642090>, <zfit.minimizers.fitresult.FitResult object at 0x7fd2283b9d10>, <zfit.minimizers.fitresult.FitResult object at 0x7fd2283ba810>, <zfit.minimizers.fitresult.FitResult object at 0x7fd228391e50>, <zfit.minimizers.fitresult.FitResult object at 0x7fd228332c50>]

Evaluate results

From here on, we can use fit_results to compare the fitted values against the true ones, make plots, and so on.
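
As a minimal sketch of such an evaluation, the pulls of mu can be computed and histogrammed as follows. This assumes the hesse errors are stored under the 'hesse' key of result.params; depending on the zfit version, the key may differ (e.g. 'minuit_hesse'). mu_true is the value used to generate the toys.

mu_true = 0.0  # the value of mu used to generate the toys

values = np.array([res.params[mu]['value'] for res in fit_results])
# assumption: hesse errors are stored under the 'hesse' key (version dependent)
errors = np.array([res.params[mu]['hesse']['error'] for res in fit_results])
pulls = (values - mu_true) / errors

plt.hist(pulls, bins=10)
plt.xlabel('pull of mu')
plt.show()
print(f'pull mean: {np.mean(pulls):.2f}, pull std: {np.std(pulls):.2f}')

If the fit is unbiased and the uncertainties are estimated correctly, the pull mean should be compatible with 0 and the pull standard deviation with 1 (with only 20 toys, expect sizeable statistical fluctuations).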