Custom Code with zfit#

zfit provides many opportunities to implement your own code, be it in a model, a loss or elsewhere. It is mainly based on TensorFlow for its computations, which offers many advantages in terms of performance. While TensorFlow is similar in nature to Numpy, some things may not be possible to implement with it; for those cases, a fallback is available.

This tutorial introduces the concepts of TensorFlow and explains which mode to use when in zfit.

TL;DR: for immediate advice on what to use, also check out the FAQ on graph and gradient modes.

Defining computations#

Python offers a lot of flexibility as a programming language. In model fitting, though, much of this flexibility is not needed. Fitting a dataset with an NLL built from a Gaussian is a straightforward process: a mathematical expression is evaluated many times, varying the parameters until the minimum is found. The expression itself always stays the same. In fact, we can split the process into two stages:

  • defining the computation: we build our model, whether it’s a Gaussian or something more complicated assembled from several models. Creating the NLL with the data and the model results in a well-defined expression. From then on, there is no need to change this expression (in the vast majority of cases) and it can be ‘cached’

  • running: using our definition, we now do the actual calculation. Change the parameters, calculate again. In this step the computational expression stays the same; just the values change.

If we can split our process into these two parts (and we mostly can), the first part has to be run only once, can be optimized (even using many resources, as it is a one-time process) and can then be executed many times.
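To make the two stages concrete, here is a minimal sketch using zfit’s standard API (the observable range, parameter bounds and toy data are illustrative): the model and the NLL are defined once, then only the parameter values change between evaluations.

import numpy as np
import zfit

# stage 1: define the computational expression once
obs = zfit.Space("x", limits=(-5, 5))
mu = zfit.Parameter("mu", 0.0, -1, 1)
sigma = zfit.Parameter("sigma", 1.0, 0.1, 10)
gauss = zfit.pdf.Gauss(mu=mu, sigma=sigma, obs=obs)
data = zfit.Data.from_numpy(obs=obs, array=np.random.normal(size=1000))
nll = zfit.loss.UnbinnedNLL(model=gauss, data=data)

# stage 2: evaluate many times; only the parameter values change
for mu_val in (-0.1, 0.0, 0.1):
    mu.set_value(mu_val)
    print(nll.value())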

On the other hand, we may have something completely dynamic, say a model that changes its components after every minimization. While unlikely, let’s assume such a case exists. Then we would need to rerun the model building every time, which is rather costly; but that is the inherent price of such flexibility.

Graph vs Eager#

Eager#

Using Numpy corresponds to the latter: it does not remember previous executions, but simply computes whatever is given on every line of code, even if this is done n times. This is also called “eager” execution. It is the “normal” behavior we are used to from Python.

TensorFlow by default acts exactly the same as Numpy, and there is barely a difference to spot. In fact, Tensors can directly be given to Numpy functions (in eager execution). So far, they are similar.

import os

os.environ["ZFIT_DISABLE_TF_WARNINGS"] = "1"  # disables some TF warnings

import numpy as np
import tensorflow as tf
import zfit
import zfit.z.numpy as znp  # this is numpy-like
from zfit import z  # this is basically tf, just wrapped
rnd_np = np.random.normal(size=(5,))
rnd_tf = z.random.normal(shape=(5,))
print(rnd_np)
print(rnd_tf)
[-0.91592026  0.86326117 -0.84781119  1.25087935 -0.58153664]

tf.Tensor([-0.7800776  -0.13786788  0.49046354 -0.91871296  0.54361401], shape=(5,), dtype=float64)

rnd_sum = rnd_np + rnd_tf  # this will be a tf.Tensor
print(rnd_sum)
tf.Tensor([-1.69599786  0.72539329 -0.35734765  0.3321664  -0.03792263], shape=(5,), dtype=float64)

np.square(rnd_sum)  # this will be a numpy ndarray
array([2.87640873e+00, 5.26195421e-01, 1.27697341e-01, 1.10334515e-01,
       1.43812604e-03])

As we see, these two libraries can, so far, be mixed quite freely. Put differently: TensorFlow can act “numpy-like” as a subset of its functionality. But it offers way more:

def eager_func(x):
    print("I am Python, being executed, x is ", x)
    tf.print("I am TensorFlow, being executed, x is ", x)
    return tf.square(x)
eager_func(z.constant(5))
I am Python, being executed, x is  tf.Tensor(5.0, shape=(), dtype=float64)

I am TensorFlow, being executed, x is  5
<tf.Tensor: shape=(), dtype=float64, numpy=25.0>
eager_func(z.constant(7))
I am Python, being executed, x is  tf.Tensor(7.0, shape=(), dtype=float64)

I am TensorFlow, being executed, x is  7
<tf.Tensor: shape=(), dtype=float64, numpy=49.0>

Graph#

TensorFlow offers the possibility to decorate a function with tf.function (or here: use z.function!). This first traces the code, builds a computational graph (= the computational expression) and then executes it. If “similar” arguments are given later (e.g. just different data), the computational expression is actually re-used. Building this graph in the first place has many advantages, as will be seen later.

Graph restrictions#

If a graph is built, every tf.* operation is recorded and remembered as a computation. Any pure Python code, on the other hand, is executed just once (this is not guaranteed, it may also run 2-3 times, but not more <- advanced implementation detail). Numpy operations therefore won’t end up in the computational graph: they are either treated as constants or fail if they try to act on a Tensor, because graph building uses symbolic Tensors (as opposed to the eager Tensors we had above).

Recreating the above function, this time decorated, will demonstrate this.

@z.function
def graph_func(x):
    print("I am Python, being executed, x is ", x)
    tf.print("I am TensorFlow, being executed, x is ", x)
    return tf.square(x)
graph_func(znp.array(5))
I am Python, being executed, x is  Tensor("x:0", shape=(), dtype=int64)

I am TensorFlow, being executed, x is  5
<tf.Tensor: shape=(), dtype=int64, numpy=25>
graph_func(znp.array(7))
I am TensorFlow, being executed, x is  7
<tf.Tensor: shape=(), dtype=int64, numpy=49>

There are two main differences here compared to the eager_func:

  • The Python print statement says that x is a Tensor, but now there is no value associated with it. This is because we are looking at a symbolic Tensor, a special object known only to TensorFlow. It will have a value later on, but only then.

  • In the second run with 7, the Python print statement vanished! This is because only the graph is run again. The graph contains only the tf.* operations (tf.print, tf.square), no Python operations.

Let’s look at the first point more explicitly and see how things fail when a Numpy operation is used instead of its TensorFlow counterpart:

@z.function
def graph_func_fail(x):
    print("I am Python, being executed, x is ", x)
    tf.print("I am TensorFlow, being executed, x is ", x)
    return np.square(x)
try:
    graph_func_fail(znp.array(5.))
except NotImplementedError as error:
    print(f"Error was raised, last line: {error}")
I am Python, being executed, x is  Tensor("x:0", shape=(), dtype=float64)

Error was raised, last line: Cannot convert a symbolic tf.Tensor (x:0) to a numpy array. This error may indicate that you're trying to pass a Tensor to a NumPy call, which is not supported.

The error message is clear: Numpy does not know how to deal with a symbolic Tensor, since it does not speak the symbolic language of TensorFlow. It can only act on concrete numbers.

So far we have seen: when building a graph, only tf.* operations may act on the inputs.
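The graph-friendly fix is to replace the Numpy call with its TensorFlow counterpart, for instance via znp (imported above), which mirrors the Numpy API but creates TensorFlow operations; a minimal sketch:

@z.function
def graph_func_works(x):
    # znp.square dispatches to a TensorFlow operation,
    # so it can be recorded in the graph
    return znp.square(x)

print(graph_func_works(znp.array(5.)))  # tf.Tensor(25.0, shape=(), dtype=float64)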

Trigger a graph build#

Above, we fed Tensors directly into the function. This is usually the most efficient way: any arbitrary Python object can also be used, but that (usually) causes a rebuild of the graph, i.e. it adds a new graph to the function cache.

zfit has some internal logic to mitigate this somewhat and invalidate a graph when an object has changed.

However, calling the above with pure Python numbers will create a new graph every time (unless it is exactly the same Python number). This is usually not what we want, but sometimes it is unavoidable. More on this further down. The retracing can be made visible with a Python side effect, as in the sketch below.
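A small sketch (assuming z.function follows the tf.function retracing rules described above): the Python print only fires while the graph is being (re)built.

@z.function
def retrace_demo(x):
    print("tracing!")  # Python code: runs only while the graph is built
    return x + 1

retrace_demo(znp.array(1.))  # prints "tracing!"
retrace_demo(znp.array(2.))  # silent: the cached graph is re-used
retrace_demo(1.)             # Python number: prints "tracing!" again
retrace_demo(2.)             # another Python number: retraces once more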

Graph modes in zfit#

Most functions in zfit build a graph first, most notably the loss (which will also build a graph in every model). This behavior can be changed with zfit.run.set_graph_mode(False), which will run functions eagerly. With this, our previously failing example should run.

zfit.run.set_graph_mode(False)
<zfit.util.temporary.TemporarilySet at 0x7feed432b450>
graph_func_fail(znp.array(5))
I am Python, being executed, x is  tf.Tensor(5, shape=(), dtype=int64)

I am TensorFlow, being executed, x is  5
25

This can be useful for debugging, as now every Tensor has a value and every operation is executed immediately.

Another issue is that building a graph only pays off if we execute it multiple times, not just once. For plotting a pdf, for example, only a single call to pdf is needed. Furthermore, since different objects (e.g. different datasets, norm_range etc.) create new graphs, things can become very slow, with many cached graphs that are often not needed anymore.

The z.function decorator is in fact more powerful than the pure tf.function: it allows zfit to be told what kind of function is wrapped, which in turn allows zfit to be “smart” about which functions to trace and which not. By default, the methods of models (pdf, integrate, …) are executed eagerly, without graphs. If a loss is built, on the other hand, a graph of everything is built. In most cases, this is the desired behavior.

It implies, though, that building a loss leads to a different execution than calling pdf directly: the former traces a graph and then executes it, while the latter executes eagerly (by default).

Therefore, it can also be beneficial to set zfit.run.set_graph_mode(True), which will always trigger graph tracing for any decorated function.
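For example, using only calls that appear elsewhere in this tutorial:

zfit.run.set_graph_mode(True)  # force graph tracing for all decorated functions
# ... call pdf, integrate etc., now traced into graphs ...
zfit.run.set_mode_default()  # restore the default behavior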

Python code in graph#

So far, to execute arbitrary Python code, including Numpy, also inside losses, we would need to run zfit eagerly (with graph mode off). There is another possibility: z.py_function (wrapping tf.py_function). It allows to wrap an “arbitrary” Python function and put it into the graph; the only restriction is that it takes only Tensors (resp. Numpy arrays) as inputs and outputs.

def numpy_func(x, a):
    return np.square(x) * a


@z.function
def wrapped_numpy_func(x_tensor, a_tensor):
    result = z.py_function(func=numpy_func, inp=[x_tensor, a_tensor], Tout=zfit.ztypes.float)  # or tf.float64
    result.set_shape(x_tensor.shape)  # this is useful and can prevent bugs: it says that the shape of the
    # result is the same as of the input tensor. This does not have to be true always and may be adjusted
    # accordingly. It however prevents some failures e.g. related to sampling.
    result = tf.sqrt(result)  # we can of course continue to execute more tf operations
    return result
wrapped_numpy_func(z.random.uniform(shape=(10,)), z.constant(42))
<tf.Tensor: shape=(10,), dtype=float64, numpy=
array([1.74619969, 5.66170021, 2.17918274, 5.31814976, 6.37765933,
       4.70700287, 1.62384209, 3.17499354, 3.38616613, 3.83643533])>

That’s nice! Now we can execute Python code and still use graphs. There is a drawback, though: these computations are not optimized at all; they won’t run on the GPU, be parallelized or anything of the sort. Most notably, not using pure z.* (or tf.*) functionality has another implication: TensorFlow no longer has the full computational expression, there are unknowns in it, which renders another feature unusable: automatic gradients.

Gradients in TensorFlow#

By tracking every operation that is applied to a Tensor, it is possible to obtain an expression for the analytic gradient by successively applying the chain rule to every operation. This technique is called automatic differentiation.

This is only possible if all operations are performed by TensorFlow, whether it is run eagerly or within a graph creating function.
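As a pure-TensorFlow sketch (tf.GradientTape is standard TensorFlow, not zfit-specific), the gradient of a tracked expression can be obtained like this:

x = tf.Variable(3.0, dtype=tf.float64)
with tf.GradientTape() as tape:
    y = tf.square(x) * 2.0  # y = 2 * x**2
grad = tape.gradient(y, x)  # analytic: dy/dx = 4 * x
print(grad)  # tf.Tensor(12.0, shape=(), dtype=float64)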

If Numpy is used directly with TensorFlow in a dynamic way (i.e. not just for static shapes), such as when using SciPy distributions, this gradient no longer works. zfit can switch to a numerical method for calculating the gradient and Hessian with zfit.run.set_autograd_mode(False). Furthermore, some minimizers such as Minuit have their own internal gradient calculator, which can be more efficient (Minuit(use_minuit_grad=True)).

Numerical gradients (provided by zfit) are less stable and tend to break.
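As a sketch of both fallbacks (set_autograd_mode is assumed to be the autograd counterpart of the set_graph_mode call used earlier; use_minuit_grad is the option mentioned above):

# fall back to numerical gradients when parts of the loss are opaque to TensorFlow
zfit.run.set_autograd_mode(False)

# alternatively, let Minuit compute its own internal gradient
minimizer = zfit.minimize.Minuit(use_minuit_grad=True)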

Resetting#

To reset to the default behavior, use zfit.run.set_mode_default().

Graph caching and slowness#

Repeated calls to graph-building functions are sometimes necessary, e.g. when scanning over a range and changing the norm_range, which invalidates the graph. Let’s have a look at an example.

@z.function
def graph_func2(x):
    x *= (tf.random.uniform(shape=(10000,), dtype=tf.float64) * tf.cast(x, tf.float64) - 0.1) ** 4
    x += (tf.random.uniform(shape=(10000,), dtype=tf.float64) * tf.cast(x, tf.float64) - 0.3) ** 2
    return x

First, we can test this in eager mode to get an approximate idea

n_evals = 50  # how often to evaluate a function, e.g. the loss
n_changes = 100  # how often the loss changes fundamentally and has to be rebuilt
%%timeit -n1 -r1
zfit.run.set_graph_mode(False)  # running in eager mode
for i in range(n_changes):
    for _ in range(n_evals):
        graph_func2(i)
12.6 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)

%%timeit -n1 -r1
zfit.run.set_graph_mode(True)  # running in graph mode
for i in range(n_changes):
    for _ in range(n_evals):
        graph_func2(i)
8.21 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)

%%timeit -n1 -r1
zfit.run.clear_graph_cache()
zfit.run.set_graph_mode(graph=True)  # running in graph mode but clearing unused caches
for i in range(n_changes):
    zfit.run.clear_graph_cache()
    for _ in range(n_evals):
        graph_func2(i)
8.23 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)

zfit.run.set_mode_default() # resetting the mode

We see that for this (simple) example, eager mode and graph building with the cache cleared in between are basically on par. With more n_evals, the graph version with clearing becomes more efficient; with fewer, eager mode does. Building graphs and never clearing them will fill up the cache and significantly slow things down, as demonstrated.

Advantages of graphs#

The main advantages are the optimized execution, including parallelization and dispatching to the GPU. Furthermore, many optimizations such as fusing operations into a more efficient implementation, constant folding and more are performed.

The performance gain is mostly visible with highly parallelizable functions, such as building a sum. Let’s look at an example, re-using the function from above.

@z.function
def sum_func():
    results = []
    for i in range(10):
        results.append(graph_func2(i))
    return z.reduce_sum(results)

To measure the timing, we first call the function once, which builds the graph; this way we essentially exclude the graph building time from the measurement. If something is called multiple times, we are usually interested in the time of the successive calls, not just the first one.

zfit.run.set_graph_mode(False)  # test first in eager mode
<zfit.util.temporary.TemporarilySet at 0x7feecd5055d0>
%%timeit -n1 -r5
print(sum_func())
tf.Tensor(1652230565301.0815, shape=(), dtype=float64)

tf.Tensor(1654978760080.5261, shape=(), dtype=float64)

tf.Tensor(1649231523306.3616, shape=(), dtype=float64)

tf.Tensor(1637563390962.3777, shape=(), dtype=float64)

tf.Tensor(1614438918649.6382, shape=(), dtype=float64)

28.3 ms ± 1.86 ms per loop (mean ± std. dev. of 5 runs, 1 loop each)

This gives us a reference. As we can see, the values differ on each call, as expected. Now let’s run with graph mode on. As mentioned, the first call measures the graph building time plus one single execution.

zfit.run.set_mode_default()
%%timeit -n1 -r1
print(sum_func())
tf.Tensor(1618655459315.5688, shape=(), dtype=float64)

119 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)

This takes significantly more time than the eager execution. Now we can run it again and measure the time of the successive calls.

%%timeit -n1 -r5  # 5 repetitions
print(sum_func())
tf.Tensor(1633236972343.317, shape=(), dtype=float64)

tf.Tensor(1674102824381.7644, shape=(), dtype=float64)

tf.Tensor(1616188010705.4148, shape=(), dtype=float64)

tf.Tensor(1684423231148.3633, shape=(), dtype=float64)

tf.Tensor(1659740373693.6792, shape=(), dtype=float64)

8.05 ms ± 298 µs per loop (mean ± std. dev. of 5 runs, 1 loop each)

That’s a significant speedup! Clearly, for a few evaluations it does not matter too much. But this is about scalability: imagine a large fit, where a minimizer needs hundreds or thousands of evaluations; that’s when the initial graph building becomes negligible and the speedup matters.