The Feldman-Hájek theorem

In other posts, I've touched on the idea of sampling from probability measures defined over spaces of functions. We might be inferring the density of material within the earth from seismic surveys, the elevation of the earth's surface from lidar measurements, or the conductivity of an aquifer from well head measurements. For the sake of definiteness, we'll assume throughout that the function we're looking for is an element of the space $X = L^2(\Omega)$ for some domain $\Omega$. We could consider Sobolev or other spaces but the idea is the same. I'll also tacitly assume that $\Omega$ is regular enough for PDE theory to apply. That usually means that the boundary of $\Omega$ has to be piecewise smooth and have no cusps.

You can think of a random variable with values in $X$, i.e. a function selected at random according to some statistical law, as a stochastic process. I'll use the two terms interchangeably. In the stochastic process literature, it's common to focus on processes that have certain useful statistical properties. For example, we might think of $\Omega$ as some time interval $[0, T]$ and a random function $z(t)$ as a function of time $t$. We'd then like to have some notion of causality. The values of $z(t)$ should depend on the values of $z(s)$ for $s$ coming before $t$ but not after. The Markov property gives us this idea of causality. If $\Omega$ is instead a spatial domain, we might instead be interested in random fields where we specify analytically the covariance between $z(x)$ and $z(y)$ as a function of $x - y$. For example, it's common to assume that the conductivity of an aquifer is a random field whose logarithm has the spatial correlation structure $$\mathbb E[(z(x) - m)(z(y) - m)] = \text{const}\times\exp(-|x - y|^2/\lambda^2).$$ This is all the province of geostatistics. But in the following I'll be focused more on general principles without making any assumptions about the dependence structure of the process.
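
Just to make that kernel concrete, here's a minimal, standalone sketch of drawing one sample of such a field on a 1D grid by factoring the covariance kernel matrix directly. The grid, variance, correlation length, and seed are all arbitrary choices for illustration.

import numpy as np

xs = np.linspace(0.0, 1.0, 256)
variance, correlation_length = 1.0, 0.1
K = variance * np.exp(-(xs[:, None] - xs[None, :])**2 / correlation_length**2)
# A little jitter on the diagonal keeps the Cholesky factorization stable.
L_chol = np.linalg.cholesky(K + 1e-10 * np.eye(len(xs)))
sample = L_chol @ np.random.default_rng(1729).standard_normal(len(xs))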

The normal distribution has a kind of universal character in finite-dimensional statistical inference. We want to generalize the idea of a normal distribution to function spaces -- a Gaussian process. We'll have to think a bit about measure theory. This will lead us by turns to ask about the relation between two distinct normal random variables with values in a function space. This relationship will be summed up in the Feldman-Hájek theorem. It's a doozy. I'll do some numerical experiments to illustrate the theorem.

Normal random variables

A normal random variable $Z$ has a density $$\rho(z) = \frac{\exp\left(-\frac{1}{2}(z - m)^*C^{-1}(z - m)\right)}{\sqrt{(2\pi)^n|\det C|}}$$ where $m$ is the mean and $C$ the covariance matrix. To talk about normal random variables on function spaces, we'll have to think about things that we take for granted. What does it mean to say that a random variable $Z$ has a density at all? In finite dimensions, this means that we can evaluate expectations of any function of $Z$ by evaluating an integral: $$\mathbb E[f(Z)] = \int f(z)\rho(z)\mathrm dz.$$ Fine, but what is $\mathrm dz$? It's a symbolic shorthand for integration with respect to Lebesgue measure. I'll write Lebesgue measure as $\lambda$ for the time being. Let $\mu$ be the distribution for $Z$, i.e. $$\mu(A) = \mathbb P[Z \in A].$$ When we say that $Z$ has a density, we really mean that, for any set $A$ such that $\lambda(A) = 0$, we must also have $\mu(A) = 0$. The fancy term for this is that $\mu$ is absolutely continuous with respect to $\lambda$, usually written as $\mu \ll \lambda$. The Radon-Nikodym theorem then tells us that there is a unique non-negative function $\rho$ such that, for all $f$, $$\int f\,\mathrm d\mu = \int f\,\rho\,\mathrm d\lambda.$$ At a formal level we might say that $\mathrm d\mu = \rho\,\mathrm d\lambda$, hence the notation $$\rho = \frac{\mathrm d\mu}{\mathrm d\lambda}.$$ The usual Lebesgue measure provides a kind of background, reference measure to which other probability measures can be absolutely continuous. Lebesgue measure is not itself a probability measure because the volume of the entire space is infinite. Discrete random variables don't have densities in this sense: the Lebesgue measure of a single point is zero. They instead have densities with respect to a different measure, for example the counting measure on $\mathbb Z$. The counting measure, which is again not itself a probability measure, fulfills the role of the reference measure to which discrete random variables can be absolutely continuous.

In infinite dimensions, there is no spoon Lebesgue measure. To be more specific, there's no measure on an infinite-dimensional vector space that is non-trivial and translation-invariant. There is no background reference measure "$\mathrm dz$" to integrate against. We can't write down a normal density -- a density is defined w.r.t. a background measure and we don't have one. There are still well-defined normal random variables and probability measures.

Not having a measure $\mathrm dz$ or a density $\rho(z)$ to work with doesn't stop physicists from writing functional integrals. I won't use this notation going forward, but I do find it appealing.

To define a normal random variable on a function space, we can instead work with arbitrary finite-dimensional projections. Let $\{g_1, \ldots, g_N\}$ be a finite collection of elements of the dual space $X^*$. We can then define a linear operator $G : X \to \mathbb R^N$ by $$G = \sum_k e_k\otimes g_k$$ where $e_k$ is the $k$-th standard basis vector in $\mathbb R^N$. Given a random variable $Z$ with values in $X$, $$GZ = \left[\begin{matrix}\langle g_1, Z\rangle \\ \vdots \\ \langle g_N, Z\rangle\end{matrix}\right]$$ is a random vector taking values in $\mathbb R^N$. We say that $Z$ is a normal random variable if $GZ$ is normal for all finite choices of $N$ and $G$. The mean of $Z$ is the vector $m$ in $X$ such that $GZ$ has the mean $Gm$ for all linear mappings $G$. Likewise, the covariance $C$ of $Z$ is an element of $X\otimes X$ such that $GZ$ has the covariance matrix $GCG^*$ for all $G$.

Not just any operator can be the covariance of a normal random variable on a function space. It ought to be familiar from the finite-dimensional theory that $C$ has to be symmetric and positive-definite. You might imagine writing down a decomposition for $C$ in terms of its eigenfunctions $\{\psi_n\}$. We know that the eigenfunctions are orthonormal because $C$ is symmetric. Since the eigenvalues are real and positive, we can write them as the squares $\sigma_n^2$ of a sequence $\{\sigma_n\}$ of real numbers as a mnemonic. I'll do this throughout because it becomes convenient in a number of places. The action of $C$ on a function $\phi$ is then $$C\phi = \sum_n\sigma_n^2\langle\psi_n, \phi\rangle\,\psi_n.$$ In the function space setting, we need that the sum of all the eigenvalues is finite: $$\sum_n\sigma_n^2 < \infty.$$ The fancy term for this is that $C$ has to be of trace class. In finite dimensions we don't have to worry about this condition at all. In function spaces, it's easy to try and use a covariance operator with $\sigma_n^2 \sim n^{-1}$ by mistake. The trace then diverges like the harmonic series.
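
To see the trace-class condition in action, here's a quick standalone check: partial sums of $\sigma_n^2 = n^{-1}$ keep growing like the harmonic series, while partial sums of $\sigma_n^2 = n^{-2}$ settle down.

import numpy as np

n_values = np.arange(1, 100_001)
for power in [1, 2]:
    partial_sums = np.cumsum(1.0 / n_values ** power)
    # Partial sums at N = 10², 10⁴, 10⁵
    print(f"σ² ~ n^-{power}: {partial_sums[[99, 9_999, 99_999]]}")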

The Karhunen-Loève expansion

The spectral theorem assures us that the covariance operator of a Gaussian process has an eigenvalue decomposition. The Karhunen-Loève theorem then shows how we can use that decomposition to understand the stochastic process. It states that, if $\{\xi_n\}$ is an infinite sequence of i.i.d. standard normal random variables, then the random variable $$Z = \sum_n\sigma_n\,\xi_n\,\psi_n$$ has the distribution $N(0, C)$. In other words, if we know the eigendecomposition, we can generate samples from the process. We'll use this property in just a moment to simulate Brownian bridges. The eigenfunctions for the covariance operator of a Brownian bridge are trigonometric functions. We're simulating the stochastic process by Fourier synthesis.

Rather than start with a covariance operator and then compute its eigenvalue decomposition, the KL expansion also gives us a way to synthesize random variables with useful properties by picking the eigenvalues and eigenfunctions. The covariance operator is then $$C = \sum_n\sigma_n^2\,\psi_n\otimes\psi_n.$$ We can compute a closed form expression for $C$ as some integral operator $$C\phi = \int_\Omega \mathscr C(x, y)\phi(y)dy.$$ in some cases. But we don't need an explicit expression for the kernel for the KL expansion to be useful.

The code below generates some random functions on the domain $\Omega = [0, 1]$. Here we use the basis functions $$\psi_n = \sin(\pi n x).$$ We can then try eigenvalue sequences $\sigma_n = \alpha n^{-\gamma}$ for some power $\gamma$. Any $\gamma > 1/2$ guarantees that $\sum_n\sigma_n^2$ is finite; we'll take $\gamma \ge 1$ below.

import numpy as np
from numpy import pi as π
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d

def generate_sample(x, σs, rng):
    ξs = rng.standard_normal(size=len(σs))
    return sum(
        σ * ξ * np.sin(π * n * x)
        for n, (σ, ξ) in enumerate(zip(σs, ξs), start=1)
    )
num_points = 512
num_modes = num_points // 2
x = np.linspace(0.0, 1.0, num_points + 1)
ns = np.linspace(1.0, num_modes, num_modes)
rng = np.random.default_rng(seed=1729)

The plot below compares two function-valued random variables with different eigenvalue sequences. The plot on the top shows a classic Brownian bridge: $\sigma_n^2 = 2 / \pi^2n^2$. The plot on the bottom shows $\sigma_n^2 = 2 / \pi^2n^{2\gamma}$ where $\gamma = 1.6$. The second stochastic process is much smoother, which we expect -- the Fourier coefficients decay faster.

γ_1, γ_2 = 1.0, 1.6
σs_1 = np.sqrt(2) / (π * ns ** γ_1)
σs_2 = np.sqrt(2) / (π * ns ** γ_2)

num_samples = 128
ws = np.stack([generate_sample(x, σs_1, rng) for k in range(num_samples)])
zs = np.stack([generate_sample(x, σs_2, rng) for k in range(num_samples)])
fig, axes = plt.subplots(nrows=2, ncols=1, sharex=True, sharey=True)
for ax in axes:
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

for w, z in zip(ws, zs):
    axes[0].plot(x, w, color="tab:blue", alpha=0.25)
    axes[1].plot(x, z, color="tab:orange", alpha=0.25)
[figure: samples of the two stochastic processes]

The Feldman-Hájek theorem

Now consider two $X$-valued normal random variables $W$ and $Z$. To simplify things, we'll assume that both of them have zero mean. The plots above might suggest that these two stochastic processes behave differently in some essential way that we can quantify. For example, the first process is a Brownian bridge. A Brownian bridge is continuous with probability 1, but it is nowhere differentiable. The second process, on the other hand, is almost surely differentiable. We can be a little more formal about this and say that $$\mathbb P[W \in H^1(\Omega)] = 0, \quad \mathbb P[Z \in H^1(\Omega)] = 1$$ where $H^1(\Omega)$ is the space of functions with square-integrable derivatives. The two random variables are not mutually absolutely continuous. That might not be too surprising. We picked the eigenvalue distributions of each process to have different decay rates: $$\sigma_n^2(W) \sim n^{-2}, \quad \sigma_n^2(Z) \sim n^{-2\gamma}.$$ Just to really hammer this point home, let's evaluate an approximation to the mean-square derivative of each group of samples.

δx = 1 / num_points

Dw = np.diff(ws, axis=1) / δx
energies_w = np.sum((Dw * Dw) * δx, axis=1)
print(f"∫|∇w|² dx = {energies_w.mean():.1f} ± {energies_w.std():.1f}")

Dz = np.diff(zs, axis=1) / δx
energies_z = np.sum((Dz * Dz) * δx, axis=1)
print(f"∫|∇z|² dx = {energies_z.mean():.1f} ± {energies_z.std():.1f}")
∫|∇w|² dx = 240.2 ± 21.5
∫|∇z|² dx = 3.8 ± 1.5

If you run this notebook with an even finer discretization of the unit interval, you'll see that $\int|\nabla w|^2\mathrm dx$ keeps increasing while the value of $\int|\nabla z|^2\mathrm dx$ stabilizes.

What conditions guarantee that the distributions of two function-valued random variables are mutually absolutely continuous? In other words, we want to know when it is impossible for there to be some subset $A$ of $X$ such that there is positive probability that $W$ is in $A$ but zero probability that $Z$ is in $A$, or vice versa. What needs to be true about the covariance operators of the two random variables to fulfill this condition? This is the content of the Feldman-Hájek theorem.

The calculation above suggests that two normal distributions will concentrate on different sets if the ratios $\sigma_n^2(W)/\sigma_n^2(Z)$ of the eigenvalues of their covariance operators can go to either zero or infinity. The example above used two eigenvalue sequences that look like $n^{-\alpha}$ for different values of $\alpha$. The gap in the decay rates then lets us construct a quadratic functional for which the process $Z$ takes finite values and $W$ takes infinite values. The fact of this functional taking on finite or infinite values on one process or the other then implies the concentration of the two probability measures on different sets. The fact that the mean square gradient of $W$ is infinite implies that $W$ is not differentiable with probability 1, while $Z$ is differentiable with probability 1. Put another way, the probability distribution for $Z$ is concentrated in the set of differentiable functions while the probability distribution for $W$ is not.

We can always construct an operator like this when the decay rates differ, although the effect might be subtler. For example, the distinction might be in having a higher-order derivative or not.

What if the ratio of the eigenvalue sequences doesn't go to zero or infinity? Are the laws of the two processes equivalent, or can they still concentrate on different sets? We can make it even simpler by supposing that $W \sim N(0, C)$ and $Z \sim N(0, \alpha C)$ for some positive $\alpha$. We'll show that even these random variables concentrate on different sets whenever $\alpha \neq 1$. I got this example from MathOverflow. Let $\{\sigma_n^2\}$ be the eigenvalues of $C$ (note the square) and $\{\psi_n\}$ the eigenfunctions. Define the sequence of random variables $$U_n = \langle \psi_n, W\rangle / \sigma_n.$$ We know from the KL expansion that the $U_n$ are independent, identically-distributed standard normal random variables. If we also define $$V_n = \langle\psi_n, Z\rangle / \sigma_n$$ then these random variables are i.i.d. normals with mean zero and variance $\alpha$.

Now remember that $W$ and $Z$ take values in a function space and so there are infinitely many modes to work with. The strong law of large numbers then implies that $$\frac{1}{N}\sum_{n = 1}^N U_n^2 \to 1$$ as $N \to \infty$ with probability 1. This would not be possible in a finite-dimensional vector space -- the sum would have to terminate at some finite $N$. The average of the $U_n^2$ would then be a random variable with some non-trivial spread around its mean; it can take on any positive value with non-zero probability. That probability could be minute, but it is still positive. Likewise, we also find that $$\frac{1}{N}\sum_{n = 1}^N V_n^2 \to \alpha$$ with probability 1. But we've now shown that the two probability measures concentrate on different sets as long as $\alpha$ is not equal to 1. The limiting average of the squared, normalized expansion coefficients takes distinct values, each with probability 1, under the distributions of $W$ and $Z$. So the two distributions are mutually singular.
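
Here's a quick empirical check of that argument -- a standalone sketch in which I draw the normalized coefficients directly rather than computing inner products, and take $\alpha = 4$, which matches the doubled $\sigma_n$ used in the plot below.

α = 4.0
# A fresh generator so the samples plotted below aren't perturbed.
rng_check = np.random.default_rng(42)
Us = rng_check.standard_normal(100_000)
Vs = np.sqrt(α) * rng_check.standard_normal(100_000)
print(f"Average of U²: {np.mean(Us**2):0.3f}")
print(f"Average of V²: {np.mean(Vs**2):0.3f}")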

Just to illustrate things a bit, the plot below shows samples obtained with a spectrum of $\{\sigma_n\}$ generated as before with $\gamma = 1.5$ and with $2\sigma_n$ respectively.

γ = 1.5
σs = np.sqrt(2) / (π * ns ** γ)
ws = np.stack([generate_sample(x, σs, rng) for k in range(num_samples)])
zs = np.stack([generate_sample(x, 2 * σs, rng) for k in range(num_samples)])
fig, axes = plt.subplots(nrows=2, ncols=1, sharex=True, sharey=True)
for ax in axes:
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

for w, z in zip(ws, zs):
    axes[0].plot(x, w, color="tab:blue", alpha=0.25)
    axes[1].plot(x, z, color="tab:orange", alpha=0.25)
[figure: samples of the two processes with spectra σₙ and 2σₙ]

What this example shows is that it's not enough that $\sigma_n(W)$ and $\sigma_n(Z)$ have the same decay rate. They both decay like some constant $\times$ $n^{-\gamma}$ for the same value of $\gamma$. The two distributions are mutually singular even if the rates are the same but the constants are different.

At this point you might wonder if it's enough to have $$\sigma_n(W)/\sigma_n(Z) \to 1.$$ Even this much tighter condition is not enough!

The Feldman-Hájek theorem states that, in order for $W$ and $Z$ to not concentrate on different sets, we need that $$\sum_n\left(\frac{\sigma_n(W)}{\sigma_n(Z)} - 1\right)^2 < \infty.$$ Not only does the eigenvalue ratio need to go to 1, it has to do so fast enough that the squared deviations are summable. For example, suppose we had a pair of $X$-valued random variables such that $$\frac{\sigma_n(W)}{\sigma_n(Z)} - 1 \sim \frac{\text{const}}{\log n}.$$ The relative error between the two eigenvalue sequences decreases to zero, but not fast enough, so the two random variables would still concentrate on different sets.
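
As a quick numerical illustration of that condition, the check below sums the squared deviations for a ratio that approaches 1 like $1/\log n$ and for one that approaches 1 like $1/n$; only the second sum converges.

n_values = np.arange(2, 1_000_001)
# Partial sums of (ratio - 1)² at n = 10³, 10⁴, 10⁶: the 1/log n case keeps
# growing, while the 1/n case converges.
for deviation, label in [(1 / np.log(n_values), "1/log n"), (1 / n_values, "1/n")]:
    partial_sums = np.cumsum(deviation ** 2)
    print(f"ratio - 1 ~ {label}: {partial_sums[[10**3 - 2, 10**4 - 2, 10**6 - 2]]}")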

Why does this matter? I stumbled on the Feldman-Hájek theorem while trying to understand two papers, Beskos et al. (2008) and Cotter et al. (2013). Both of these papers are about how to do Markov-chain Monte Carlo sampling of probability distributions defined over function spaces. The main ingredient of MCMC is a proposal mechanism, which generates a new candidate sample $z$ from the current sample $w$. We can then also define a reverse proposal -- what is the probability of proposing $w$ if we instead started from $z$? In order for an MCMC method to converge, Tierney (1998) showed that the forward and reverse proposals need to be mutually absolutely continuous. They cannot concentrate probability on different sets. In finite dimensions, it's so hard to violate this condition that it often goes unstated. In function spaces, on the other hand, you're almost guaranteed to violate it unless you're careful. Even for the nicest class of distributions (normal random variables), the spectra of the covariance operators have to have nearly identical asymptotic behavior. In light of that, it's not surprising that MCMC sampling in function spaces is so challenging.

Plate theory

In this post we'll look at a fourth-order equation describing the vibrations of clamped plates. Fourth-order problems are much harder to discretize than second-order ones. Here we'll get around that by using the dual or mixed form of the problem. This will use some of the same techniques that we've already seen with the Stokes equations. But it won't be a perfectly conforming discretization, and as a consequence we'll need some jump terms that ought to be familiar from the discontinuous Galerkin method. Finally we'll compute some eigenfunctions of the plate operator. This will look similar to what we did with Weyl's law but with an extra mathematical subtlety.

Our starting point is linear elasticity. Here we want to solve for the 3D displacement $u$ of the medium. The first equation is momentum conservation: $$\rho\ddot u = \nabla\cdot\sigma + f$$ where $\sigma$ is the stress tensor and $f$ the body forces. To close the system, we need to supply a constitutive equation relating the stress tensor and the strain tensor $$\varepsilon = \frac{1}{2}\left(\nabla u + \nabla u^*\right).$$ The most general linear constitutive equation we can write down is $$\sigma = \mathscr C\,\varepsilon$$ where $\mathscr C$ is the rank-4 elasticity tensor. We need that $\mathscr C$ maps symmetric tensors to symmetric tensors and that $\varepsilon:\mathscr C\varepsilon$ is always positive. The equilibrium displacement of the medium is the minimizer of the energy functional $$J(u) = \int_\Omega\left(\frac{1}{2}\sigma:\varepsilon + f\cdot u\right)\mathrm dx + \ldots$$ where the ellipses stand for various boundary forcings that we'll ignore.

For a medium that is homogeneous and isotropic, the elasticity tensor has to have the form $$\mathscr C\,\varepsilon = 2\,\mu\,\varepsilon + \lambda\,\text{tr}(\varepsilon)I$$ where $\mu$ and $\lambda$ are the Lamé parameters. As an aside, there are a mess of alternate forms of the elasticity equations. The wiki page has a conversion table at the bottom. Now take this with a grain of salt because I do fluid mechanics. But if I were a solid mechanics kinda guy, this would embarrass me. Get it together folks.

Plate theory is what you get when you assume the medium is thin along the vertical dimension and that this restricts the form that the displacements can take. The first and most widely agreed-upon simplification is that the vertical displacement is some function $w$ of the horizontal coordinates $x$, $y$: $$u_z = w(x, y).$$ From here we have to make additional assumptions about the horizontal displacements $u_x$ and $u_y$. Different theories make different sets of assumptions. The classical theory is the Kirchhoff-Love plate. The Kirchhoff theory assumes that any straight line that's perpendicular to the middle of the plate remains straight and perpendicular after deformation. This theory has some deficiencies; see for example this paper. But it's a good starting point for experimentation. These assumptions let us write down the other components of the deformation in terms of $w$: $$u_x = -z\frac{\partial w}{\partial x}, \quad u_y = -z\frac{\partial w}{\partial y}.$$ It's possible that $u_x$ and $u_y$ are also offset by in-plane displacements. I'll assume that the boundary conditions make those equal to zero. We can then work out what the displacement gradient is: $$\nabla u = \left[\begin{matrix}-z\nabla^2 w & -\nabla w \\ +\nabla w^* & 0 \end{matrix}\right].$$ I'm being a little loose in my use of the gradient operator -- it's 3D on the left-hand side of this equation and 2D on the right. Notice how the antisymmetric part of the displacement gradient is all in the $x-z$ and $y-z$ components. When we symmetrize the displacement gradient, we get the strain tensor: $$\varepsilon = -z\left[\begin{matrix}\nabla^2 w & 0 \\ 0 & 0\end{matrix}\right].$$ Now remember that the medium is a thin plate. We can express the spatial domain as the product of a 2D footprint domain $\Phi$ and the interval $[-h / 2, +h / 2]$ where $h$ is the plate thickness. The strain energy is then $$\begin{align} J(w) & = \int_\Phi\int_{-h/2}^{+h/2}\left(\mu z^2|\nabla^2 w|^2 + \frac{\lambda}{2} z^2|\Delta w|^2 + fw\right)\mathrm dz\;\mathrm dx\\ & = \int_\Phi\left\{\frac{h^3}{24}\left(2\mu|\nabla ^2 w|^2 + \lambda|\Delta w|^2\right) + hfw\right\}\mathrm dx. \end{align}$$ Here I've used the fact that $\text{tr}(\nabla^2w) = \Delta w$ where $\Delta$ is the Laplace operator. In a gross abuse of notation I'll write this as $$J(w) = \int_\Omega\left(\frac{1}{2}\mathscr C\nabla^2w :\nabla^2 w + fw\right)h\,\mathrm dx$$ where $\mathscr C$ is an elasticity tensor. We'll need the explicit form $$\mathscr C\kappa = \frac{h^2}{12}\left(2\,\mu\,\kappa + \lambda\,\text{tr}(\kappa)\,I\right)$$ in a moment.

We can't discretize this problem right away using a conventional finite element basis. The usual piecewise polynomial basis functions are continuous across cell edges, but their derivatives are not. A conforming basis for a minimization problem involving 2nd-order derivatives would instead need to be continuously differentiable. It's much harder to come up with $C^1$ bases.

We have a few ways out.

  1. Use a $C^1$ basis like Argyris or Hsieh-Clough-Tocher. This approach makes forming the problem easy, but applying boundary conditions hard. Kirby and Mitchell (2018) implemented the Argyris element in Firedrake but had to use Nitsche's method to enforce the boundary conditions.
  2. Use $C^0$ elements and discretize the second derivatives using an interior penalty formulation of the problem. This approach is analogous to discretizing a 2nd-order elliptic problem using DG elements. Forming the problem is harder but applying the boundary conditions is easier. Bringmann et al. (2023) work out the exact form of the interior penalty parameters necessary to make the discrete form work right.
  3. Use the mixed or dual form of the problem, which introduces the moment tensor explicitly as an unknown. Discretize it with the Hellan-Herrmann-Johnson or HHJ element. Arnold and Walker (2020) describe this formulation of the problem in more detail.

In the following, I'll take the third approach. The dual form of the problem introduces the moment tensor, which in another gross abuse of notation I'll write as $\sigma$, explicitly as an unknown. We then add the constraint that $\sigma = \mathscr C\nabla^2 w$. But switching to the dual form of the problem inverts constitutive relations, so we instead enforce $$\nabla^2w = \mathscr A\sigma$$ where $\mathscr A$ is the inverse to $\mathscr C$. If $\mathscr C$ is like the elasticity tensor in 3D, then $\mathscr A$ is like the compliance tensor. Because we'll need it later, the explicit form of the compliance tensor is $$\mathscr A\sigma = \frac{6}{h^2\mu}\left(\sigma - \frac{\lambda}{2(\mu + \lambda)}\text{tr}(\sigma)I\right).$$ The full Lagrangian for the dual form is $$L(w, \sigma) = \int_\Omega\left(\frac{1}{2}\mathscr A\sigma : \sigma - \sigma : \nabla^2 w - fw\right)h\,\mathrm dx.$$ Here the displacement $w$ acts like a multiplier enforcing the constraint that $\nabla\cdot\nabla\cdot\sigma + f = 0$.

To get a conforming discretization of the dual problem, we'd still need the basis functions for the displacements to have continuous derivatives. That's exactly what we were trying to avoid by using the dual form in the first place. At the same time, we don't assume any continuity properties of the moments. We can work around this problem by relaxing the continuity requirements for the displacements while strengthening them for the moments. The Hellan-Herrmann-Johnson element discretizes the space of symmetric tensor fields with just enough additional regularity to make this idea work. The HHJ element has normal-normal continuity: the quantity $n\cdot\sigma n$ is continuous across cell boundaries. The tangential and mixed components are unconstrained. This extra continuity is enough to let us use the conventional Lagrange finite elements for the displacements as long as we add a correction term to the Lagrangian: $$\ldots + \sum_\Gamma\int_\Gamma (n\cdot\sigma n)\,\left[\!\!\left[\frac{\partial w}{\partial n}\right]\!\!\right]h\,\mathrm dS$$ where the sum ranges over all the edges $\Gamma$ of the mesh and the double square bracket denotes the jump of a quantity across a cell boundary. What is especially nice about the HHJ element is that this correction is all we need to add; there are no mesh-dependent penalty factors.

Experiments

First let's try and solve a plate problem using simple input data. We'll set all the physical constants equal to 1 but you can look these up for the material of your fancy.

import firedrake
from firedrake import Constant, inner, tr, dot, grad, dx, ds, dS, avg, jump
import ufl
f = Constant(1.0)
h = Constant(1.0)
μ = Constant(1.0)
λ = Constant(1.0)

The code below creates the dual form of the problem using the formula I wrote down above for the explicit form of the compliance tensor. Lord Jesus I hope I did all that algebra correct.

def form_hhj_lagrangian(z, f, h, μ, λ):
    mesh = ufl.domain.extract_unique_domain(z)
    w, σ = firedrake.split(z)
    I = firedrake.Identity(2)
    n = firedrake.FacetNormal(mesh)

    Aσ = 6 / (h**2 * μ) * (σ - λ / (2 * (μ + λ)) * tr(σ) * I)

    L_cells = (0.5 * inner(Aσ, σ) - inner(σ, grad(grad(w))) + f * w) * h * dx
    L_facets = avg(inner(n, dot(σ, n))) * jump(grad(w), n) * h * dS
    L_boundary = inner(n, dot(σ, n)) * inner(grad(w), n) * h * ds
    return L_cells + L_facets + L_boundary
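
As a sanity check on that algebra, here's a quick standalone numpy computation verifying that $\mathscr A(\mathscr C\kappa) = \kappa$ for a random symmetric curvature tensor; the values of $h$, $\mu$, and $\lambda$ passed in are arbitrary.

import numpy as np

def check_compliance(h, μ, λ, rng):
    I = np.eye(2)
    M = rng.standard_normal((2, 2))
    κ = 0.5 * (M + M.T)                                # a random symmetric curvature
    σ = h**2 / 12 * (2 * μ * κ + λ * np.trace(κ) * I)  # σ = 𝒞κ
    Aσ = 6 / (h**2 * μ) * (σ - λ / (2 * (μ + λ)) * np.trace(σ) * I)  # 𝒜σ
    return np.max(np.abs(Aσ - κ))

print(check_compliance(h=1.0, μ=1.0, λ=1.0, rng=np.random.default_rng(1729)))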

The right degrees are $p + 1$ for the displacements and $p$ for the moments.

p = 1
cg = firedrake.FiniteElement("CG", "triangle", p + 1)
hhj = firedrake.FiniteElement("HHJ", "triangle", p)

Here we'll work on the unit square. An interesting feature of plate problems is that they don't become much easier to solve analytically on the unit square by separation of variables because of the mixed derivatives. The eigenfunctions of the biharmonic operator (we'll get to them below) are easier to derive on the circle than the square.

n = 64
mesh = firedrake.UnitSquareMesh(n, n, diagonal="crossed")
Q = firedrake.FunctionSpace(mesh, cg)
Σ = firedrake.FunctionSpace(mesh, hhj)
Z = Q * Σ
z = firedrake.Function(Z)
L = form_hhj_lagrangian(z, f, h, μ, λ)
F = firedrake.derivative(L, z)

For plate problems, we almost always have some Dirichlet boundary condition $$w|_{\partial\Omega} = w_0.$$ Since the problem is fourth-order, we need to supply more boundary conditions than we do for, say, the diffusion equation. There are two kinds. The first is the clamped boundary condition: $$\frac{\partial w}{\partial n} = 0.$$ The second is the simply-supported boundary condition: $$n\cdot\sigma n = 0.$$ Before I started writing this I was dreading how I was going to deal with either one. I was shocked at how easy it was to get both using the HHJ element. First, let's see what happens if we only enforce zero displacement at the boundary.

bc = firedrake.DirichletBC(Z.sub(0), 0, "on_boundary")
firedrake.solve(F == 0, z, bc)
w, σ = z.subfunctions
n = firedrake.FacetNormal(mesh)
dw_dn = inner(grad(w), n)
σ_nn = inner(dot(σ, n), n)

boundary_slope = firedrake.assemble(dw_dn ** 2 * ds)
boundary_stress = firedrake.assemble(σ_nn ** 2 * ds)
print(f"Integrated boundary slope:  {boundary_slope}")
print(f"Integrated boundary moment: {boundary_stress}")
Integrated boundary slope:  2.2011469921266892e-10
Integrated boundary moment: 0.004669439784341617

So it looks like the clamped boundary condition is natural with HHJ elements. We can confirm that by looking at a 3D plot.

import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
fig, ax = plt.subplots(subplot_kw={"projection": "3d"})
firedrake.trisurf(w, axes=ax);
[figure: 3D surface plot of the displacement]

Now let's see what happens if we apply a Dirichlet boundary condition to the moment part of the solution.

simply_supported_bc = firedrake.DirichletBC(Z.sub(1), 0, "on_boundary")
firedrake.solve(F == 0, z, bcs=[bc, simply_supported_bc])
w, σ = z.subfunctions
n = firedrake.FacetNormal(mesh)
dw_dn = inner(grad(w), n)
σ_nn = inner(dot(σ, n), n)

boundary_slope = firedrake.assemble(dw_dn ** 2 * ds)
boundary_stress = firedrake.assemble(σ_nn ** 2 * ds)
print(f"Integrated boundary slope:  {boundary_slope}")
print(f"Integrated boundary moment: {boundary_stress}")
Integrated boundary slope:  0.006015973810192354
Integrated boundary moment: 0.0

Now the boundary stresses are zero and so the boundary slopes are unconstrained.

fig, ax = plt.subplots(subplot_kw={"projection": "3d"})
firedrake.trisurf(w, axes=ax);
[figure: 3D surface plot of the displacement with simply-supported boundary conditions]

The reason why this works is that the boundary degrees of freedom for the HHJ element are the normal-normal stress components. So if we set all of those to zero, we get simply-supported boundary conditions. This is analogous to the simpler setting of $H(\text{div})$ elements for vector fields having normal continuity. We can easily enforce boundary conditions like $u\cdot n = 0$ by setting the right degrees of freedom to zero.

We have to work a lot harder to get the boundary conditions right using other finite element bases. For example, the Argyris element has both value and derivative degrees of freedom at the mesh vertices. We can't set the boundary values alone in the same way. In order to set the boundary values with Argyris elements, Kirby and Mitchell had to use Nitsche's method. The Hsieh-Clough-Tocher (HCT) element makes it easy to use clamped boundary conditions but simply-supported is harder. The fact that both types are relatively easy with HHJ is a definite advantage, although it has more total degrees of freedom than HCT.

Eigenfunctions

The eigenfunctions of the plate problem, also known as Chladni patterns, have a fascinating history. This review is good reading. Here I'll show the first few eigenfunctions in a square with simply-supported boundary conditions.

A = firedrake.derivative(F, z)

w, σ = firedrake.split(z)
J = 0.5 * w**2 * dx
M = firedrake.derivative(firedrake.derivative(J, z), z)

bcs = [bc, simply_supported_bc]
problem = firedrake.LinearEigenproblem(A, M, bcs=bcs, restrict=True)

Note the $M$ matrix that we supplied above. We've looked at generalized eigenvalue problems before, like in the post on Weyl's law. In that setting we were solving eigenvalue problems of the form $$A\phi = \lambda M\phi$$ where $M$ was some symmetric positive-definite matrix instead of the identity. In those cases we always used the mass matrix. For saddle-point eigenvalue problems, the matrix on the right-hand side is often no longer positive-definite. Here the form we want is instead $$\left[\begin{matrix} A & B^* \\ B & 0\end{matrix}\right]\left[\begin{matrix}\sigma \\ w\end{matrix}\right] = \lambda\left[\begin{matrix} 0 & 0 \\ 0 & M\end{matrix}\right]\left[\begin{matrix}\sigma \\ w\end{matrix}\right]$$ where $M$ is the mass matrix for the displacement block.

num_values = 40
opts = {
    "solver_parameters": {
        "eps_gen_hermitian": None,
        "eps_target_real": None,
        "eps_smallest_real": None,
        "st_type": "sinvert",
        "st_ksp_type": "gmres",
        "st_pc_type": "lu",
        "st_pc_factor_mat_solver_type": "mumps",
        "eps_tol": 1e-8,
    },
    "n_evals": num_values,
}
eigensolver = firedrake.LinearEigensolver(problem, **opts)
num_converged = eigensolver.solve()
print(f"Number of converged eigenfunctions: {num_converged}")
Number of converged eigenfunctions: 42

The plot below shows the nodal lines of the eigenfunctions. Many of them come in pairs due to the axis-flipping symmetry. These are the patterns that so fascinated Chladni and others.

ϕs = [eigensolver.eigenfunction(n)[0].sub(0) for n in range(num_values)]
fig, axes = plt.subplots(nrows=5, ncols=8, sharex=True, sharey=True)
for ax in axes.flatten():
    ax.set_aspect("equal")
    ax.set_axis_off()

levels = [-10, 0, 10]
for ax, ϕ in zip(axes.flatten(), ϕs):
    firedrake.tricontour(ϕ, levels=levels, axes=ax)
[figure: nodal lines of the first 40 eigenfunctions]

Mantle convection

This code is taken from a chapter in The FEniCS Book, which is available for free online. That chapter is in turn an implementation of a model setup from van Keken et al (1997). I took the thermomechanical parts and removed the chemistry. This paper and the code from the FEniCS book all use a non-dimensional form of the equations, which I've adopted here.

In this notebook, we'll see the Stokes equations again, but we'll couple them to the evolution of temperature through an advection-diffusion equation. The extensive quantity is not the temperature itself but the internal energy density $E = \rho c_p T$ where $\rho$, $c_p$ are the mass density and the specific heat at constant pressure. The flux of heat is $$F = \rho c_p T u - k\nabla T.$$ We'll assume there are no heat sources, but the real physics would include sources from decay of radioactive elements, strain heating, and chemical reactions. The variational form of the heat equation is: $$\int_\Omega\left\{\partial_t(\rho c_p T)\,\phi - (\rho c_p Tu - k\nabla T)\cdot\nabla\phi - Q\phi\right\}dx = 0$$ for all test functions $\phi$. I've written this in such a way that it still makes sense when the density and heat capacity aren't constant. We'll assume they are (mostly) constant in the following. The variational form of the momentum balance equation is $$\int_\Omega\left\{2\mu\,\varepsilon(u): \varepsilon(v) - p\nabla\cdot v - q\nabla\cdot u - \rho g\cdot v\right\}dx = 0$$ for all velocity and pressure test functions $v$, $q$.

The characteristic that makes this problem interesting is that we assume the flow is just barely compressible. This is called the Boussinesq approximation. In practice, it means we do two apparently contradictory things. First, we assume that the flow is incompressible: $$\nabla\cdot u = 0.$$ Second, we assume that the density that we use on the right-hand side of the momentum balance equation actually is compressible, and that the degree of compression is linear in the temperature: $$\rho = \rho_0(1 - \beta(T - T_0))$$ where $\rho_0$ is a reference density and $\beta$ is the thermal expansion coefficient. These assumptions aren't consistent with each other. We're also not consistent in using this everywhere. For example, we only use this expression on the right-hand side of the momentum balance equation, but we'll still take the internal energy density to be equal to $\rho_0c_pT$ in the energy equation. In other words, the material is compressible as far as momentum balance is concerned but not energy balance. I find that uncomfortable. But I would find writing a full compressible solver to be blinding agony. Anyway it lets us make pretty pictures.

I won't go into a full non-dimensionalization of the problem, which is detailed in the van Keken paper. The most important dimensionless numbers for this system are the Prandtl number and the Rayleigh number. The Prandtl number quantifies the ratio of momentum diffusivity to thermal diffusivity: $$\text{Pr} = \frac{\mu / \rho}{k / \rho c_p} = \frac{\nu}{\kappa}$$ where $\nu$ and $\kappa$ are respectively the kinematic viscosity and thermal diffusivity. For mantle flows, the Prandtl number is on the order of $10^{25}$. We can get a sense for a characteristic velocity scale from the momentum balance equation. First, we can take a factor of $\rho_0g$ into the pressure. If we take $U$, $L$, and $\Delta T$ to be the characteristic velocity, length, and temperature difference scales, non-dimensionalizing the momentum balance equation gives $$\frac{\mu U}{L^2} \approx \rho_0g\beta\Delta T$$ which then implies that $U = \rho_0g\beta L^2\Delta T / \mu$. Now we can compute the thermal Péclet number $$\text{Pe} = \frac{UL}{\kappa}.$$ When the Péclet number is large, we're in the convection-dominated regime; when it's small, we're in the diffusion-dominated regime. Taking our estimate of velocity scale from before, we get $$\text{Pe} = \frac{\rho_0 g \beta L^3 \Delta T}{\kappa\mu}.$$ This is identical to the Rayleigh number, which we'll write as $\text{Ra}$. Lord Rayleigh famously used linear stability analysis to show that a fluid that is heated from below and cooled from above is unstable when the Rayleigh number exceeds 600 or so. Here we'll take the Rayleigh number to be equal to $10^6$ as in the van Keken paper. It's a worthwhile exercise in numerical analysis to see what it takes to make a solver go for higher values of the Rayleigh number.
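
As a rough reality check on that formula, here's a back-of-the-envelope estimate of the mantle's Rayleigh number. The parameter values below are only order-of-magnitude guesses, not anything precise.

def rayleigh_number(ρ_0, g, β, L, ΔT, κ, μ):
    return ρ_0 * g * β * L**3 * ΔT / (κ * μ)

Ra_mantle = rayleigh_number(
    ρ_0=3.3e3,  # density, kg/m³
    g=9.8,      # gravity, m/s²
    β=3e-5,     # thermal expansion coefficient, 1/K
    L=2.9e6,    # depth of the mantle, m
    ΔT=1.5e3,   # temperature contrast, K
    κ=1e-6,     # thermal diffusivity, m²/s
    μ=1e21,     # dynamic viscosity, Pa·s
)
print(f"Ra ≈ {Ra_mantle:0.1e}")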

Here we're using a rectangular domain. Again, the spatial scales have been non-dimensionalized.

import numpy as np
from numpy import pi as π
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
from IPython.display import HTML
from tqdm.notebook import trange, tqdm
import firedrake
from firedrake import (
    Constant, sqrt, exp, min_value, max_value, inner, sym, grad, div, dx
)
import irksome
from irksome import Dt

Lx, Ly = Constant(2.0), Constant(1.0)
ny = 32
nx = int(float(Lx / Ly)) * ny
mesh = firedrake.RectangleMesh(
    nx, ny, float(Lx), float(Ly), diagonal="crossed"
)

The initial condition for temperature involves loads of annoyingly long expressions, but it's all in Appendix A of the van Keken paper.

def clamp(z, zmin, zmax):
    return min_value(Constant(zmax), max_value(Constant(zmin), z))

def switch(z):
    return exp(z) / (exp(z) + exp(-z))

Ra = Constant(1e6)

ϵ = Constant(1 / nx)
x = firedrake.SpatialCoordinate(mesh)

q = Lx**(7 / 3) / (1 + Lx**4)**(2 / 3) * (Ra / (2 * np.sqrt(π)))**(2/3)
Q = 2 * sqrt(Lx / (π * q))
T_u = 0.5 * switch((1 - x[1]) / 2 * sqrt(q / (x[0] + ϵ)))
T_l = 1 - 0.5 * switch(x[1] / 2 * sqrt(q / (Lx - x[0] + ϵ)))
T_r = 0.5 + Q / (2 * np.sqrt(π)) * sqrt(q / (x[1] + 1)) * exp(-x[0]**2 * q / (4 * x[1] + 4))
T_s = 0.5 - Q / (2 * np.sqrt(π)) * sqrt(q / (2 - x[1])) * exp(-(Lx - x[0])**2 * q / (8 - 4 * x[1]))
expr = T_u + T_l + T_r + T_s - Constant(1.5)

degree = 1
temperature_space = firedrake.FunctionSpace(mesh, "CG", degree)
T_0 = firedrake.Function(temperature_space).interpolate(clamp(expr, 0, 1))
T = T_0.copy(deepcopy=True)

The important point is to make the fluid hotter below and on one side and cooler above and on the other side, shown in the plot below.

def subplots():
    fig, axes = plt.subplots()
    axes.set_aspect("equal")
    axes.get_xaxis().set_visible(False)
    axes.get_yaxis().set_visible(False)
    axes.set_xlim(0, float(Lx))
    axes.set_ylim(0, float(Ly))
    return fig, axes

fig, axes = subplots()
firedrake.tripcolor(T, cmap="inferno", axes=axes);
[figure: initial temperature field]

Next we need to make some function spaces for the fluid velocity and pressure. Note how the degree of the velocity space is one higher than that of the pressure -- we're using the Taylor-Hood element again.

pressure_space = firedrake.FunctionSpace(mesh, "CG", 1)
velocity_space = firedrake.VectorFunctionSpace(mesh, "CG", 2)
Z = velocity_space * pressure_space

Once we've created a function in the mixed space, we can pull out symbolic references to the two parts with the firedrake.split function.

z = firedrake.Function(Z)
u, p = firedrake.split(z)

The code below creates the variational form of the Stokes equations $$\nabla\cdot\tau - \nabla p - \text{Ra}\;T\;g = 0$$ where $g$ is the downward unit vector, so the temperature-dependent buoyancy $-\text{Ra}\,T\,g$ pushes hot fluid upward.

μ = Constant(1)
ε = lambda u: sym(grad(u))

v, q = firedrake.TestFunctions(z.function_space())

τ = 2 * μ * ε(u)
g = Constant((0, -1))
f = -Ra * T * g
F = (inner(τ, ε(v)) - q * div(u) - p * div(v) - inner(f, v)) * dx

We can use the .sub method to pull parts out of mixed spaces, which we need in order to create the right boundary conditions.

bc = firedrake.DirichletBC(Z.sub(0), Constant((0, 0)), "on_boundary")

A bit of magic to tell the linear solver that the Stokes equations have a null space -- the pressure is only determined up to an additive constant -- which we need to project out.

basis = firedrake.VectorSpaceBasis(constant=True, comm=firedrake.COMM_WORLD)
nullspace = firedrake.MixedVectorSpaceBasis(Z, [Z.sub(0), basis])

Here I'm creating a solver object since we'll need to solve the same system repeatedly.

stokes_problem = firedrake.NonlinearVariationalProblem(F, z, bc)
parameters = {
    "nullspace": nullspace,
    "solver_parameters": {
        "ksp_type": "preonly",
        "pc_type": "lu",
        "pc_factor_mat_solver_type": "mumps",
    },
}
stokes_solver = firedrake.NonlinearVariationalSolver(stokes_problem, **parameters)
stokes_solver.solve()
fig, axes = subplots()
firedrake.streamplot(
    z.sub(0), axes=axes, resolution=1/40, cmap="inferno", seed=1729
);
[figure: streamlines of the initial velocity field]

Here we're setting up the temperature solver, which includes both convection and diffusion. I've set the timestep based on the mesh size and the maximum speed that we just found above from the velocity solution.

ρ, c, k = Constant(1), Constant(1), Constant(1)
δx = mesh.cell_sizes.dat.data_ro[:].min()
umax = z.sub(0).dat.data_ro[:].max()
δt = Constant(δx / umax)

ϕ = firedrake.TestFunction(temperature_space)
f = ρ * c * T * u - k * grad(T)
G = (ρ * c * Dt(T) * ϕ - inner(f, grad(ϕ))) * dx

lower_bc = firedrake.DirichletBC(temperature_space, 1, [3])
upper_bc = firedrake.DirichletBC(temperature_space, 0, [4])
bcs = [lower_bc, upper_bc]

method = irksome.BackwardEuler()
temperature_solver = irksome.TimeStepper(G, method, Constant(0.0), δt, T, bcs=bcs)
print(f"Timestep: {float(δt):0.03g}")
Timestep: 0.000132

And the timestepping loop. Note that the final time is on a non-dimensional scale again; in physical time it works out to be on the order of a hundred million years. Here we're using an operator splitting approach. We first update the temperature, and then we compute a new velocity and pressure. The splitting error goes like $\mathscr{O}(\delta t)$ as the timestep is reduced. So if we use a first-order integration scheme like backward Euler then the splitting error is asymptotically the same as that of the discretization itself. We could get $\mathscr{O}(\delta t^3)$ or higher convergence by using, say, the Radau-IIA method, but the total error would be dominated by splitting, so there would be little point in trying harder.

final_time = 0.25
num_steps = int(final_time / float(δt))
Ts = [T.copy(deepcopy=True)]
zs = [z.copy(deepcopy=True)]

for step in trange(num_steps):
    temperature_solver.advance()
    stokes_solver.solve()

    Ts.append(T.copy(deepcopy=True))
    zs.append(z.copy(deepcopy=True))

The movie below shows the temperature evolution. The rising and sinking plumes of hot and cold fluid correspond to the mantle plumes that produce surface volcanism.

%%capture
fig, axes = subplots()
kw = {"num_sample_points": 4, "vmin": 0, "vmax": 1, "cmap": "inferno"}
colors = firedrake.tripcolor(Ts[0], axes=axes, **kw)
fn_plotter = firedrake.FunctionPlotter(mesh, num_sample_points=4)
animate = lambda T: colors.set_array(fn_plotter(T))
animation = FuncAnimation(fig, animate, frames=tqdm(Ts), interval=1e3/30)
HTML(animation.to_html5_video())
z = zs[-1]
u, p = z.subfunctions
fig, axes = subplots()
firedrake.streamplot(
    u, axes=axes, resolution=1/40, cmap="inferno", seed=1729
);
[figure: streamlines of the final velocity field]

What next

There are a few features in the van Keken paper and otherwise that I didn't attempt here. The original van Keken paper adds chemistry. Solving the associated species transport equation with as little numerical diffusion as possible is hard. You could then make the fluid buoyancy depend on chemical composition and add temperature-dependent chemical reactions.

Real mantle fluid also has a temperature-dependent viscosity. When I tried to add this using the splitting scheme above, the solver quickly diverged. Getting that to work might require a fully coupled time-integration scheme, which I did try: rather than split up the temperature and the velocity/pressure solves, a fully coupled scheme solves for all three fields at once. Whether it would converge or not seemed to depend on the machine and the day of the week. If you could make this work, it would open up the possibility of using higher-order methods like Radau-IIA. I'll explore that more in a future post.

Kármán vortices

In previous posts, I looked at how to discretize the incompressible Stokes equations. The Stokes equations are a good approximation when the fluid speed is small enough that inertial effects are negligible. The relevant dimensionless number is the ratio $$\text{Re} = \frac{\rho UL}{\mu},$$ the Reynolds number. Stokes flow applies when the Reynolds number is substantially less than 1. The incompressibility constraint adds a new difficulty: we have to make good choices of finite element bases for the velocity and pressure. If we fail to do that, the resulting linear systems can have no solution or infinitely many.

Here I'll look at how we can discretize the full Navier-Stokes equations: $$\frac{\partial}{\partial t}\rho u + \nabla\cdot\rho u\otimes u = -\nabla p + \nabla\cdot \tau$$ where the deviatoric stress tensor is $\tau = 2\mu\dot\varepsilon$. The inertial terms are nonlinear, which makes this problem more difficult yet than the Stokes equations.

The goal here will be to simulate the famous von Kármán vortex street.

Making the initial geometry

First, we'll make a domain consisting of a circle punched out of a box. The fluid flow in the wake of the circle will produce vortices.

import gmsh
gmsh.initialize()
import numpy as np
from numpy import pi as π

Lx = 6.0
Ly = 2.0
lcar = 1 / 16

gmsh.model.add("chamber")
geo = gmsh.model.geo
ps = [(0, 0), (Lx, 0), (Lx, Ly), (0, Ly)]
box_points = [geo.add_point(*p, 0, lcar) for p in ps]
box_lines = [
    geo.add_line(i1, i2) for i1, i2 in zip(box_points, np.roll(box_points, 1))
]

for line in box_lines:
    geo.add_physical_group(1, [line])

f = 1 / 3
c = np.array([f * Lx, Ly / 2, 0])
center = geo.add_point(*c)
r = Ly / 8
num_circle_points = 16
θs = np.linspace(0.0, 2 * π, num_circle_points + 1)[:-1]
ss = np.column_stack((np.cos(θs), np.sin(θs), np.zeros(num_circle_points)))
tie_points = [geo.add_point(*(c + r * s), lcar) for s in ss]
circle_arcs = [
    geo.add_circle_arc(p1, center, p2)
    for p1, p2 in zip(tie_points, np.roll(tie_points, 1))
]

geo.add_physical_group(1, circle_arcs)

outer_curve_loop = geo.add_curve_loop(box_lines)
inner_curve_loop = geo.add_curve_loop(circle_arcs)
plane_surface = geo.add_plane_surface([outer_curve_loop, inner_curve_loop])
geo.add_physical_group(2, [plane_surface])
geo.synchronize()
gmsh.model.mesh.generate(2)
gmsh.write("chamber.msh")
Info    : Meshing 1D...
Info    : [  0%] Meshing curve 1 (Line)
Info    : [ 10%] Meshing curve 2 (Line)
Info    : [ 20%] Meshing curve 3 (Line)
Info    : [ 20%] Meshing curve 4 (Line)
Info    : [ 30%] Meshing curve 5 (Circle)
Info    : [ 30%] Meshing curve 6 (Circle)
Info    : [ 40%] Meshing curve 7 (Circle)
Info    : [ 40%] Meshing curve 8 (Circle)
Info    : [ 50%] Meshing curve 9 (Circle)
Info    : [ 50%] Meshing curve 10 (Circle)
Info    : [ 60%] Meshing curve 11 (Circle)
Info    : [ 60%] Meshing curve 12 (Circle)
Info    : [ 70%] Meshing curve 13 (Circle)
Info    : [ 70%] Meshing curve 14 (Circle)
Info    : [ 80%] Meshing curve 15 (Circle)
Info    : [ 80%] Meshing curve 16 (Circle)
Info    : [ 90%] Meshing curve 17 (Circle)
Info    : [ 90%] Meshing curve 18 (Circle)
Info    : [100%] Meshing curve 19 (Circle)
Info    : [100%] Meshing curve 20 (Circle)
Info    : Done meshing 1D (Wall 0.00234985s, CPU 0.002641s)
Info    : Meshing 2D...
Info    : Meshing surface 1 (Plane, Frontal-Delaunay)
Info    : Done meshing 2D (Wall 0.127101s, CPU 0.126774s)
Info    : 4047 nodes 8113 elements
Info    : Writing 'chamber.msh'...
Info    : Done writing 'chamber.msh'
import firedrake
mesh = firedrake.Mesh("chamber.msh")
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.set_aspect("equal")
firedrake.triplot(mesh, axes=ax)
ax.legend(loc="upper right");
[figure: the mesh, with boundary segments labeled]

Initial velocity

We'll use a fixed inflow velocity $$u_x = 4 \frac{y}{L_y}\left(1 - \frac{y}{L_y}\right) u_{\text{in}}.$$ We'll take as characteristic length scale the radius of the disc $L_y / 8$. In order to see the effect we want, we'll need a Reynolds number that's on the order of 100 or greater.

from firedrake import Constant, inner, outer, sym, grad, div, dx, ds
import irksome
from irksome import Dt

μ = Constant(1e-2)

x, y = firedrake.SpatialCoordinate(mesh)
u_in = Constant(5.0)
ly = Constant(Ly)
expr = firedrake.as_vector((4 * u_in * y / ly * (1 - y / ly), 0))

I've used Taylor-Hood elements for this demonstration. It would be a good exercise to repeat this using, say, Crouzeix-Raviart or other elements.

cg1 = firedrake.FiniteElement("CG", "triangle", 1)
cg2 = firedrake.FiniteElement("CG", "triangle", 2)
Q = firedrake.FunctionSpace(mesh, cg1)
V = firedrake.VectorFunctionSpace(mesh, cg2)
Z = V * Q
z = firedrake.Function(Z)

Here we can use the fact that the Stokes problem has a minimization form. The Navier-Stokes equations do not because of the convective term.

u, p = firedrake.split(z)
ε = lambda v: sym(grad(v))

L = (μ * inner(ε(u), ε(u)) - p * div(u)) * dx

inflow_ids = (1,)
outflow_ids = (3,)
side_wall_ids = (2, 4, 5)
side_wall_bcs = firedrake.DirichletBC(Z.sub(0), Constant((0.0, 0.0)), side_wall_ids)
inflow_bcs = firedrake.DirichletBC(Z.sub(0), expr, inflow_ids)
bcs = [side_wall_bcs, inflow_bcs]

F = firedrake.derivative(L, z)
firedrake.solve(F == 0, z, bcs)

A bit of a philosophical point is in order here. We picked the inflow velocity and the viscosity to produce a high Reynolds number (about 200). The Stokes equations, on the other hand, are physically realistic only when the Reynolds number is $\ll 1$. But we can still solve the Stokes equations in the high-Reynolds number limit. In other words, the mathematical model remains well-posed even in regimes where it is not applicable. Here we're only using the result to initialize a simulation using the "right" model. But it's a mistake -- and one I've made -- to lull yourself into a false sense of correctness merely because the model gave you an answer.

fig, ax = plt.subplots()
ax.set_aspect("equal")
ax.set_axis_off()
colors = firedrake.streamplot(
    z.subfunctions[0], resolution=1/16, seed=1729, cmap="Blues", axes=ax
)
fig.colorbar(colors, orientation="horizontal");
[figure: streamlines of the initial velocity]

Solution method

There are a host of methods for solving the Navier-Stokes equations by breaking them up into two simpler problems. These are known as projection methods. I started out trying those but it required effort, which I find disgusting. So I threw backward Euler at it and it worked.

There are some non-trivial decisions to make about both the variational form and the boundary conditions. I started writing this wanting to use pressure boundary conditions at both the inflow and outflow. (Of course I had to be reminded that you can't prescribe pressures but rather tractions.) This went badly. If a wave reflects off of the obstacle, back upstream, and out the inflow boundary, the simulation will crash. So I had to dial back the challenge and use a fixed inflow velocity and a traction boundary condition at the outflow.

Almost any writing you see about the Navier-Stokes equations will express the problem in differential form and will use incompressibility to apply what might appear to be simplifications. For example, if the fluid is incompressible and the viscosity is constant, then you can rewrite the viscous term like so: $$\nabla\cdot \mu(\nabla u + \nabla u^\top) = \mu\nabla^2u.$$ You'll see this form in almost all numerical methods or engineering textbooks. I don't like it for two reasons. First, the apparent simplification only gets in your way as soon as you want to consider fluids with variable viscosity. Mantle convection is one obvious case -- the temperature and chemistry of mantle rock can change the viscosity by several orders of magnitude. Second, it gives the wrong boundary conditions when you try to discretize the problem (see Limache et al. (2007)). I've retained the symmetric gradients of the velocity and test function in the form below.

The second apparent simplification uses incompressibility to rewrite the convection term: $$\nabla\cdot \rho u\otimes u = \rho u\cdot\nabla u.$$ This form is ubiquitous and reflects a preference for thinking about fluid flow in a Lagrangian reference frame. I prefer to avoid it although both are correct, unlike the Laplacian form of the viscosity. Given any extensive density $\phi$, regardless of its tensor rank, the flux will include a term $\phi u$. The conversion of this flux from the conservation to the variational form is then $$-\int_\Omega \phi u\cdot\nabla\psi\,dx$$ and this is true of mass, momentum, energy, whatever you like. Taking something that was a divergence and making it not a divergence obfuscates the original conservation principle. It also stops you from pushing the differential operator over onto a test function. So I've instead coded up the convection term as a discretization of $$-\int_\Omega \rho\,u\otimes u:\dot\varepsilon(v)\,dx + \int_{\partial\Omega\cap\{u\cdot n \ge 0\}}\rho(u\cdot n)(u\cdot v)\,ds.$$ In the first term, I've used the symmetric gradient of $v$ because the contraction of a symmetric and an anti-symmetric tensor is zero.

All together now, the variational form that I'm using is $$\begin{align} 0 & = \int_\Omega\left\{\rho\,\partial_tu\cdot v - \rho u\otimes u:\dot\varepsilon(v) - p\nabla\cdot v - q\nabla\cdot u + 2\mu\,\dot\varepsilon(u):\dot\varepsilon(v)\right\}dx \\ & \qquad\qquad + \int_\Gamma (\rho u\cdot n)(u \cdot v)ds \end{align}$$ for all test functions $v$ and $q$.

v, q = firedrake.TestFunctions(Z)
u, p = firedrake.split(z)

ρ = firedrake.Constant(1.0)

F_1 = (
    ρ * inner(Dt(u), v) -
    ρ * inner(ε(v), outer(u, u)) -
    p * div(v) -
    q * div(u) +
    2 * μ * inner(ε(u), ε(v))
) * dx

n = firedrake.FacetNormal(mesh)
F_2 = ρ * inner(u, v) * inner(u, n) * ds(outflow_ids)

F = F_1 + F_2

We'll need to make some choice about the timestep. Here I've computed the CFL time for the mesh and the initial velocity that we computed above. This choice might not be good enough. If we initialized the velocity by solving the Stokes equations, the fluid could evolve to a much higher speed. We might then find that this timestep is inadequate. A principled solution would be to use an adaptive scheme.

dg0 = firedrake.FiniteElement("DG", "triangle", 0)
Δ = firedrake.FunctionSpace(mesh, dg0)
area = firedrake.Function(Δ).project(firedrake.CellVolume(mesh))
δx_min = np.sqrt(2 * area.dat.data_ro.min())

u, p = z.subfunctions
U_2 = firedrake.Function(Δ).project(inner(u, u))
u_max = np.sqrt(U_2.dat.data_ro.max())
cfl_time = δx_min / u_max
print(f"Smallest cell diameter: {δx_min:0.4f}")
print(f"Max initial velocity:   {u_max:0.4f}")
print(f"Timestep:               {cfl_time:0.4f}")

dt = firedrake.Constant(0.5 * cfl_time)
Smallest cell diameter: 0.0351
Max initial velocity:   6.4170
Timestep:               0.0055
params = {
    "solver_parameters": {
        "snes_monitor": ":navier-stokes-output.log",
        "snes_atol": 1e-12,
        "ksp_atol": 1e-12,
        "snes_type": "newtonls",
        "ksp_type": "gmres",
        "pc_type": "lu",
        "pc_factor_mat_solver_type": "mumps",
    },
    "bcs": bcs,
}

method = irksome.BackwardEuler()
t = firedrake.Constant(0.0)
solver = irksome.TimeStepper(F, method, t, dt, z, **params)

I've added a bit of code to show some diagnostic information in the progress bar. First, I have it print the number of Newton iterations required to compute each timestep; if you see this going much above 20, then something is off. Second, I have it print the maximum pressure. Both of these were useful when I was debugging this code.

from tqdm.notebook import trange

zs = [z.copy(deepcopy=True)]

final_time = 10.0
num_steps = int(final_time / float(dt))
progress_bar = trange(num_steps)
for step in progress_bar:
    solver.advance()
    zs.append(z.copy(deepcopy=True))
    iter_count = solver.solver.snes.getIterationNumber()
    pmax = z.subfunctions[1].dat.data_ro.max()
    progress_bar.set_description(f"{iter_count}, {pmax:0.4f} | ")

Finally, we'll make an animated quiver plot because it looks pretty.

%%capture

from tqdm.notebook import tqdm
from matplotlib.animation import FuncAnimation

fig, ax = plt.subplots()
ax.set_aspect("equal")
ax.set_axis_off()

X = mesh.coordinates.dat.data_ro
V = mesh.coordinates.function_space()
u_t = zs[0].subfunctions[0].copy(deepcopy=True)
interpolator = firedrake.Interpolate(u_t, V)
u_X = firedrake.assemble(interpolator)
u_values = u_X.dat.data_ro

arrows = firedrake.quiver(zs[0].subfunctions[0], axes=ax, cmap="Blues")
def animate(z):
    u_t.assign(z.subfunctions[0])
    u_X = firedrake.assemble(interpolator)
    u_values = u_X.dat.data_ro
    arrows.set_UVC(*(u_values.T))

animation = FuncAnimation(fig, animate, tqdm(zs), interval=1e3/60)
from IPython.display import HTML
HTML(animation.to_html5_video())

There's an empirical formula for the frequency of vortex shedding. A fun follow-up to this would be to compute the shedding frequency from the simulation using, say, a windowed Fourier transform, and compare the result to the empirical formula. Next on the docket is comparing the results using different spatial finite elements.
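Here's a minimal sketch of how that frequency estimate might go: sample the cross-stream velocity at a probe point in the wake over the saved states and look at the power spectrum. The probe coordinates below are a made-up guess, and I'm using a plain periodogram on the second half of the signal rather than a properly windowed transform, so treat this as a starting point rather than the real analysis.

probe = (1.5, 0.0)  # hypothetical point in the wake; adjust for the actual geometry
v_signal = np.array([z_t.subfunctions[0].at(probe)[1] for z_t in zs])

# Drop the initial transient and remove the mean before transforming.
signal = v_signal[len(v_signal) // 2:]
signal = signal - signal.mean()

freqs = np.fft.rfftfreq(len(signal), d=float(dt))
power = np.abs(np.fft.rfft(signal)) ** 2
print(f"Estimated shedding frequency: {freqs[1 + np.argmax(power[1:])]:0.3f}")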

Overland Flow

In this post, we'll look at overland flow -- how rainwater drains across a landscape. The equations of motion are pretty rowdy and have some fascinating effects. To derive them, we'll start from the shallow water or Saint Venant equations for the water layer thickness $h$ and velocity $u$:

$$\begin{align} \frac{\partial h}{\partial t} + \nabla\cdot hu & = \dot a \\ \frac{\partial}{\partial t}hu + \nabla\cdot hu\otimes u & = -gh\nabla (b + h) - k|u|u \end{align}$$

The final term in the momentum equation represents frictional dissipation and $k$ is a (dimensionless) friction coefficient. Using the shallow water equations for predicting overland flow is challenging because the thickness can go to zero.

For many thin open channel flows, however, the fluid velocity can be expressed purely in terms of the surface slope and other factors. You could arrive at one such simplification by assuming that the inertial terms in the momentum equation are zero:

$$k|u|u + gh\nabla(b + h) = 0.$$

This approximation is known as the Darcy-Weisbach equation. We'll use it in the following because it's simple and it illustrates all the major difficulties. For serious work, you'd probably want to use the Manning formula, as it has some theoretical justification for turbulent open channel flows. The overall form of the equation and the resulting numerical challenges are the same in each case.
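Spelling out the algebra: taking the magnitude of both sides of this balance gives $|u| = \sqrt{gh|\nabla(b + h)|/k}$, and dividing through by $k|u|$ gives the velocity $$u = -\sqrt{\frac{gh}{k}}\frac{\nabla(b + h)}{\sqrt{|\nabla(b + h)|}},$$ so the flux $hu$ picks up another factor of $h$.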

Putting together the Darcy-Weisbach equation for the velocity with the mass conservation equation gives a single PDE for the water layer thickness:

$$\frac{\partial h}{\partial t} - \nabla\cdot\left(\sqrt{\frac{gh^3}{k}}\frac{\nabla(b + h)}{\sqrt{|\nabla(b + h)|}}\right) = \dot a.$$

This looks like a parabolic equation, but there's a catch! The diffusion coefficient is proportional to $h^{3/2}$, so it goes to zero when $h = 0$; all the theory for elliptic and parabolic equations assumes that the diffusion coefficient is bounded below. For a non-degenerate parabolic PDE, disturbances propagate with infinite speed. For the degenerate problem we're considering, that's no longer true -- the $h = 0$ contour travels with finite speed! While we're using the Darcy-Weisbach equation to set the velocity here, we still get finite propagation speed if we use the Manning equation instead. What's important is that the velocity is proportional to some power of the thickness and surface slope.

Eliminating the velocity entirely from the problem is convenient for analysis, but not necessarily the best way to go numerically. We'll retain the velocity -- or rather the flux $q = hu$ -- as an unknown, which gives the resulting variational form much of the same character as the mixed discretization of the heat equation.

As our model problem, we'll use the dam break test case from Santillana and Dawson (2009). They discretized the overland flow equations using the local discontinuous Galerkin or LDG method, which extends DG for hyperbolic systems to mixed advection-diffusion problems. We'll use different numerics because Firedrake has all the hipster elements. I'm eyeballing the shape of the domain from their figures.

import numpy as np
import gmsh

gmsh.initialize()
geo = gmsh.model.geo

coords = np.array(
    [
        [0.0, 0.0],
        [3.0, 0.0],
        [3.0, 2.0],
        [2.0, 2.0],
        [2.0, 4.0],
        [3.0, 4.0],
        [3.0, 6.0],
        [0.0, 6.0],
        [0.0, 4.0],
        [1.0, 4.0],
        [1.0, 2.0],
        [0.0, 2.0],
    ]
)

lcar = 0.125
points = [geo.add_point(*x, 0, lcar) for x in coords]
edges = [
    geo.add_line(p1, p2) for p1, p2 in
    zip(points, np.roll(points, 1))
]

geo.add_physical_group(1, edges)
loop = geo.add_curve_loop(edges)

plane_surface = geo.add_plane_surface([loop])
geo.add_physical_group(2, [plane_surface])

geo.synchronize()

gmsh.model.mesh.generate(2)
gmsh.write("dam.msh")

gmsh.finalize()
Info    : Meshing 1D...
Info    : [  0%] Meshing curve 1 (Line)
Info    : [ 10%] Meshing curve 2 (Line)
Info    : [ 20%] Meshing curve 3 (Line)
Info    : [ 30%] Meshing curve 4 (Line)
Info    : [ 40%] Meshing curve 5 (Line)
Info    : [ 50%] Meshing curve 6 (Line)
Info    : [ 60%] Meshing curve 7 (Line)
Info    : [ 60%] Meshing curve 8 (Line)
Info    : [ 70%] Meshing curve 9 (Line)
Info    : [ 80%] Meshing curve 10 (Line)
Info    : [ 90%] Meshing curve 11 (Line)
Info    : [100%] Meshing curve 12 (Line)
Info    : Done meshing 1D (Wall 0.00130549s, CPU 0.002145s)
Info    : Meshing 2D...
Info    : Meshing surface 1 (Plane, Frontal-Delaunay)
Info    : Done meshing 2D (Wall 0.0366912s, CPU 0.036251s)
Info    : 1158 nodes 2326 elements
Info    : Writing 'dam.msh'...
Info    : Done writing 'dam.msh'
import firedrake

mesh = firedrake.Mesh("dam.msh")
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.set_aspect("equal")
firedrake.triplot(mesh, axes=ax);
[Figure: triangulated mesh of the dam-break domain]

The bed profile consists of an upper, elevated basin, together with a ramp down to a lower basin.

from firedrake import Constant, min_value, max_value

x = firedrake.SpatialCoordinate(mesh)

y_0 = Constant(2.0)
y_1 = Constant(4.0)
b_0 = Constant(0.0)
b_1 = Constant(1.0)
b_expr = b_0 + (b_1 - b_0) * max_value(0, min_value(1, (x[1] - y_0) / (y_1 - y_0)))

S = firedrake.FunctionSpace(mesh, "CG", 1)
b = firedrake.Function(S).interpolate(b_expr)
fig = plt.figure()
axes = fig.add_subplot(projection="3d")
axes.set_box_aspect((3.0, 6.0, 1.0))
axes.set_axis_off()
firedrake.trisurf(b, axes=axes);
[Figure: surface plot of the bed elevation, with the upper basin raised above the lower one]

As I alluded to before, rather than eliminate the velocity entirely, we'll keep it as a field to be solved for explicitly. The problem we're solving, while degenerate, is pretty similar to the mixed form of the heat equation. This suggests that we should use element pairs that are stable for mixed Poisson. Here I'm using the MINI element: continuous linear basis functions for the thickness, and continuous linear enriched with cubic bubbles for the velocity. We could also have used a more proper $H(\text{div})$-conforming pair, like discontinuous Galerkin for the thickness and Raviart-Thomas or Brezzi-Douglas-Marini for the velocity.

cg1 = firedrake.FiniteElement("CG", "triangle", 1)
Q = firedrake.FunctionSpace(mesh, cg1)
b3 = firedrake.FiniteElement("B", "triangle", 3)
V = firedrake.VectorFunctionSpace(mesh, cg1 + b3)
Z = Q * V

The dam break problem specifies that the thickness is equal to 1 in the upper basin and 0 elsewhere. I've done a bit of extra work below because the expression for $h$ is discontinuous, and interpolating it directly gives some obvious mesh artifacts. Instead, I've chosen to project the expression and clamp it above and below.

h_expr = firedrake.conditional(x[1] >= y_1, 1.0, 0.0)
h_0 = firedrake.project(h_expr, Q)
h_0.interpolate(min_value(1, max_value(0, h_0)));
fig, ax = plt.subplots()
ax.set_aspect("equal")
colors = firedrake.tripcolor(h_0, axes=ax)
fig.colorbar(colors);
[Figure: initial water thickness, equal to 1 in the upper basin and 0 elsewhere]
z = firedrake.Function(Z)
z_n = firedrake.Function(Z)
δt = Constant(1.0 / 32)
z_n.sub(0).assign(h_0)
z.sub(0).assign(h_0);

The test case in the Santillana and Dawson paper uses a variable friction coefficient in order to simulate the effect of increased drag when flowing over vegetation.

from firedrake import inner

k_0 = Constant(1.0)
δk = Constant(4.0)
r = Constant(0.5)
x_1 = Constant((1.5, 1.0))
x_2 = Constant((1.0, 3.5))
x_3 = Constant((2.0, 2.5))
ψ = sum(
    [
        max_value(0, 1 - inner(x - x_i, x - x_i) / r**2)
        for x_i in [x_1, x_2, x_3]
    ]
)
k = k_0 + δk * ψ
fig, axes = plt.subplots()
axes.set_aspect("equal")
firedrake.tripcolor(firedrake.Function(S).interpolate(k), axes=axes);
[Figure: the spatially variable friction coefficient, elevated in three circular patches]

The code below defines the variational form of the overland flow equations in terms of the thickness $h$ and the flux $q = hu$. Multiplying the Darcy-Weisbach balance through by $h^2$ turns it into an equation for the flux, $k|q|q + gh^3\nabla(b + h) = 0$, which is the second residual below.

from firedrake import div, grad, dx

g = Constant(1.0)

h, q = firedrake.split(z)
h_n = firedrake.split(z_n)[0]
ϕ, v = firedrake.TestFunctions(Z)

F_h = ((h - h_n) / δt + div(q)) * ϕ * dx
friction = k * inner(q, q)**0.5 * q
gravity = -g * h**3 * grad(b + h)
F_q = inner(friction - gravity, v) * dx
F = F_h + F_q

We'll run into trouble if we try and use a Newton-type method on the true variational form. Notice how the $q$-$q$ block of the derivative will go to zero whenever $q = 0$. This will happen whenever the thickness is zero too. The usual hack is to put a fudge factor $\varepsilon$ into the variational form, as shown below.

ϵ = Constant(1e-3)
friction = k * (inner(q, q) + ϵ**2)**0.5 * q
gravity = -g * h**3 * grad(b + h)
F_qϵ = inner(friction - gravity, v) * dx
F_ϵ = F_h + F_qϵ

The disadvantage of this is that we're then solving a slightly different physics problem. We don't have a great idea ahead of time of what $\varepsilon$ should be either. If we choose it too large, the deviation from the true problem is large enough that we can't believe the results. But if we choose it too small, the derivative will fail to be invertible.

We can take a middle course by using the perturbed variational form just to define the derivative in Newton's method, while keeping the true variational form as the quantity to find a root for. To do this, we'll pass the derivative of $F_\varepsilon$ as the Jacobian or J argument to the nonlinear variational problem object. Choosing $\varepsilon$ too small will still cause the solver to crash. Taking it to be too large, instead of causing us to solve a completely different problem, will now only make the solver go slower. We still want to make $\varepsilon$ as small as possible, but to my mind, getting the right answer slowly is a more tolerable failure mode than getting the wrong answer.

bcs = firedrake.DirichletBC(Z.sub(1), firedrake.zero(), "on_boundary")
J = firedrake.derivative(F_ϵ, z)
problem = firedrake.NonlinearVariationalProblem(F, z, bcs, J=J)

We'll have one final difficulty to overcome -- what happens if the thickness inadvertently becomes negative? There's a blunt solution that everyone uses, which is to clamp the thickness to 0 from below at every step. Clamping can work in many cases. But if you're using a Runge-Kutta method, it only assures positivity at the end of each step and not in any of the intermediate stages. We can instead formulate the whole problem, including the non-negativity constraint, as a variational inequality. Much like how some but not all variational problems arise from minimization principles, some variational inequalities arise from minimization principles with inequality constraints, like the obstacle problem. But variational inequalities are a more general class of problem than inequality-constrained minimization. Formulating overland flow as a variational inequality is a bit of overkill for the time discretization that we're using. Nonetheless, I'll show how to do that in the following just for illustrative purposes. We first need two functions representing the upper and lower bounds for the solution. In this case, the upper bound is infinity.

from firedrake.petsc import PETSc

upper = firedrake.Function(Z)
with upper.dat.vec as upper_vec:
    upper_vec.set(PETSc.INFINITY)

The thickness is bounded below by 0, but there's no lower bound at all on the flux, so we'll set only the flux entries to negative infinity.

lower = firedrake.Function(Z)
with lower.sub(1).dat.vec as lower_vec:
    lower_vec.set(PETSc.NINFINITY)

When we want to solve variational inequalities, we can't use the usual Newton solvers in PETSc -- we have a choice between a semi-smooth Newton (vinewtonssls) and an active set solver (vinewtonrsls). I couldn't get the semi-smooth Newton solver to work and I have no idea why.

params = {
    "solver_parameters": {
        "snes_type": "vinewtonrsls",
        "ksp_type": "gmres",
        "pc_type": "lu",
    }
}

solver = firedrake.NonlinearVariationalSolver(problem, **params)

Finally, we'll run the timestepping loop. Here we pass the bounds explicitly on each call to solve.

from tqdm.notebook import trange

final_time = 60.0
num_steps = int(final_time / float(δt))

hs = [z.sub(0).copy(deepcopy=True)]
qs = [z.sub(1).copy(deepcopy=True)]

for step in trange(num_steps):
    solver.solve(bounds=(lower, upper))
    z_n.assign(z)

    h, q = z.subfunctions
    hs.append(h.copy(deepcopy=True))
    qs.append(q.copy(deepcopy=True))

Movie time as always.

%%capture

from matplotlib.animation import FuncAnimation

fig, axes = plt.subplots()
axes.set_aspect("equal")
axes.get_xaxis().set_visible(False)
axes.get_yaxis().set_visible(False)

colors = firedrake.tripcolor(
    hs[0], axes=axes, vmin=0, vmax=1.0, cmap="Blues", num_sample_points=4
)
fn_plotter = firedrake.FunctionPlotter(mesh, num_sample_points=4)

def animate(h):
    colors.set_array(fn_plotter(h))

interval = 1e3 / 60
animation = FuncAnimation(fig, animate, frames=hs, interval=interval)
from IPython.display import HTML

HTML(animation.to_html5_video())

As an a posteriori sanity check, we can evaluate how much the total water volume deviates.

volumes = np.array([firedrake.assemble(h * dx) for h in hs])
volume_error = (volumes.max() - volumes.min()) / volumes.mean()
print(f"Volume relative error: {volume_error:5.2g}")
Volume relative error: 0.013

Where a truly conservative scheme would exactly preserve the volume up to some small multiple of machine precision, we can only get global conservation up to the mesh resolution with our scheme. Instead, there are spurious "sources" at the free boundary. Likewise, there can be spurious sinks in the presence of ablation, so the sign error can go either way. This topic is covered in depth in this paper.

fig, axes = plt.subplots()
ts = np.linspace(0.0, final_time, num_steps + 1)
axes.set_xlabel("time")
axes.set_ylabel("volume ($m^3$)")
axes.plot(ts, volumes);
[Figure: total water volume as a function of time]

We can examine the fluxes after the fact in order to see where the value of $\varepsilon$ that we picked sits.

qms = [firedrake.project(inner(q, q)**0.5, Q) for q in qs]
area = firedrake.assemble(Constant(1) * dx(mesh))
qavgs = np.array([firedrake.assemble(q * dx) / area for q in qms])
print(f"Average flux: {qavgs.mean()*100**2:5.1f} cm²/s")
print(f"Fudge flux:   {float(ϵ)*100**2:5.1f} cm²/s")
Average flux: 266.6 cm²/s
Fudge flux:    10.0 cm²/s

The fudge flux is roughly 1/25 that of the average. This is quite a bit smaller, but not so much so that we should feel comfortable with this large a perturbation to the physics equations themselves. The ability to use it only in the derivative and not in the residual is a huge help.

To wrap things up, the overland flow equations are a perfect demonstration of how trivially equivalent forms of the same physical problem can yield vastly different discretizations. Writing the system as a single parabolic PDE might seem simplest, but there are several potential zeros in the denominator that require some form of regularization. By contrast, using a mixed form introduces more unknowns and a nonlinear equation for the flux, but there's wiggle room within that nonlinear equation. This makes it much easier to come up with a robust solution procedure, even if it includes a few uncomfortable hacks like using a different Jacobian from that of the true problem. Finally, while our discretization still works ok with no positivity constraint, PETSc has variational inequality solvers that make it possible to enforce positivity.

Billiards on surfaces

In the previous post, I showed how to integrate Hamiltonian systems

$$\begin{align} \dot q & = +\frac{\partial H}{\partial p} \\ \dot p & = -\frac{\partial H}{\partial q} \end{align}$$

using methods that approximately preserve the energy. Here I'd like to look at what happens when there are non-trivial constraints

$$g(q) = 0$$

on the configuration of the system. The simplest example is the pendulum problem, where the position $x$ of the pendulum is constrained to lie on the circle of radius $L$ centered at the origin. These constraints are easy to eliminate by instead working with the angle $\theta$. A more complicated example is a problem with rotational degrees of freedom, where the angular configuration $Q$ is a 3 $\times$ 3 matrix. The constraint comes from the fact that this matrix has to be orthogonal:

$$Q^\top Q = I.$$

We could play similar tricks to the case of the pendulum and use Euler angles, but these introduce other problems when used for numerics. For this or other more complex problems, we'll instead enforce the constraints using a Lagrange multiplier $\lambda$ and work with the constrained Hamiltonian

$$H' = H - \lambda\cdot g(q).$$

We're then left with a differential-algebraic equation:

$$\begin{align} \dot q & = +\frac{\partial H}{\partial p} \\ \dot p & = -\frac{\partial H}{\partial q} + \lambda\cdot\nabla g \\ 0 & = g(q). \end{align}$$

If you feel like I pulled this multiplier trick out of a hat, you might find it more illuminating to think back to the Lagrangian formulation of mechanics, which corresponds more directly with optimization via the stationary action principle. Alternatively, you can view the Hamiltonian above as the limit of

$$H_\epsilon' = H + \frac{|p_\lambda|^2}{2\epsilon} - \lambda\cdot g(q)$$

as $\epsilon \to 0$, where $p_\lambda$ is a momentum variable conjugate to $\lambda$. This zero-mass limit is a singular perturbation, so actually building a practical algorithm from this formulation is pretty awful, but it can be pretty helpful conceptually.
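To see roughly why this limit enforces the constraint, write out Hamilton's equations for the extra pair of variables: $$\dot\lambda = \frac{\partial H_\epsilon'}{\partial p_\lambda} = \frac{p_\lambda}{\epsilon}, \qquad \dot p_\lambda = -\frac{\partial H_\epsilon'}{\partial\lambda} = g(q).$$ Any violation of $g(q) = 0$ feeds into $p_\lambda$, which in turn drives $\lambda$ with a stiffness proportional to $1/\epsilon$, and the term $\lambda\cdot\nabla g$ in the momentum equation then pushes the trajectory back toward the constraint surface ever harder as $\epsilon \to 0$.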

For now we'll assume that the Hamiltonian has the form

$$H = \frac{1}{2}p^*M^{-1}p + U(q)$$

for some mass matrix $M$ and potential energy $U$. The 2nd-order splitting scheme to solve Hamilton's equations of motion in the absence of any constraints is

$$\begin{align} p_{n + \frac{1}{2}} & = p_n - \frac{\delta t}{2}\nabla U(q_n) \\ q_{n + 1} & = q_n + \delta t\cdot M^{-1}p_{n + \frac{1}{2}} \\ p_{n + 1} & = p_{n + \frac{1}{2}} - \frac{\delta t}{2}\nabla U(q_{n + 1}). \end{align}$$

To enforce the constraints, we'll add some extra steps where we project back onto the surface or, in the case of the momenta, onto its cotangent space. In the first stage, we solve the system

$$\begin{align} p_{n + \frac{1}{2}} & = p_n - \frac{\delta t}{2}\left\{\nabla U(q_n) - \lambda_{n + 1}\cdot \nabla g(q_n)\right\} \\ q_{n + 1} & = q_n + \delta t\cdot M^{-1}p_{n + \frac{1}{2}} \\ 0 & = g(q_{n + 1}). \end{align}$$

If we substitute the formula for $p_{n + 1/2}$ into the second equation and then substitute the resulting formula for $q_{n + 1}$ into the constraint $0 = g(q_{n + 1})$, we get a nonlinear system of equations for the new Lagrange multiplier $\lambda_{n + 1}$ purely in terms of the current positions and momenta. Having solved this nonlinear system, we can then substitute the value of $\lambda_{n + 1}$ to obtain the values of $p_{n + 1/2}$ and $q_{n + 1}$. Next, we compute the momentum at step $n + 1$, but subject to the constraint that it has to lie in the cotangent space of the surface:
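Written out, that system for $\lambda_{n + 1}$ is $$g\!\left(q_n + \delta t\, M^{-1}\left[p_n - \frac{\delta t}{2}\left(\nabla U(q_n) - \lambda_{n + 1}\cdot\nabla g(q_n)\right)\right]\right) = 0,$$ and this is what the root-finding step in the code further below solves.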

$$\begin{align} p_{n + 1} & = p_{n + \frac{1}{2}} - \frac{\delta t}{2}\left\{\nabla U(q_{n + 1}) - \mu_{n + 1}\cdot \nabla g(q_{n + 1})\right\} \\ 0 & = \nabla g(q_{n + 1})\cdot M^{-1}p_{n + 1}. \end{align}$$

Once again, we can substitute the first equation into the second to obtain a linear system for the momentum-space multiplier $\mu$. Having solved for $\mu$, we can then back-substitute into the first equation to get $p_{n + 1}$. This is the RATTLE algorithm. (I'm pulling heavily from chapter 7 of Leimkuhler and Reich here if you want to see a comparison with other methods and proofs that it's symplectic.)
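To make that linear solve concrete, write $J = \nabla g(q_{n + 1})$ and $\tilde p = p_{n + \frac{1}{2}} - \frac{\delta t}{2}\nabla U(q_{n + 1})$. Substituting the first equation into the second gives $$J M^{-1} J^\top\tilde\mu = J M^{-1}\tilde p, \qquad p_{n + 1} = \tilde p - J^\top\tilde\mu,$$ where $\tilde\mu$ absorbs the factor of $-\delta t / 2$ multiplying $\mu_{n + 1}$. With $M = I$, this is the small normal-equations solve that appears in the implementation below.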

Surfaces

Next we have to pick an example problem to work on. To start out, we'll assume that the potential energy for the problem is 0 and focus solely on the free motion of a particle on some interesting surface. The simplest surface we could look at is the sphere:

$$g(x, y, z) = x^2 + y^2 + z^2 - R^2$$

or the torus:

$$g(x, y, z) = \left(\sqrt{x^2 + y^2} - R\right)^2 + z^2 - r^2.$$

Just for kicks, I'd like to instead look at motion on surfaces of genus 2 or higher. There are simple parametric equations for tracing out spheres and tori in terms of the trigonometric functions, so the machinery of explicitly enforcing constraints isn't really necessary. There is no such direct parameterization for higher-genus surfaces, so we'll actually need to be clever in defining the surface and in simulating motion on it. As an added bonus, the ability to trace out curves on the surface will give us a nice way of visualizing it.

To come up with an implicit equation for a higher-genus surface, we'll start with an implicit equation for a 2D curve and inflate it into 3D. For example, the equation for the torus that we defined above is obtained by inflating the implicit equation $\sqrt{x^2 + y^2} - R = 0$ for the circle in 2D. What we want, in order to generate higher-genus surfaces, is a lemniscate. An ellipse is defined as the set of points such that the sum of the distances to two foci is constant. Likewise, a lemniscate is defined as the set of points such that the product of the distances to two or more foci is constant. The Bernoulli lemniscate is one such example; it traces out a figure-8 in 2D and is the zero set of the polynomial

$$f(x, y) = (x^2 + y^2)^2 - a^2(x^2 - y^2)$$

and it also has the parametric equation

$$\begin{align} x & = a\frac{\sin t}{1 + \cos^2t} \\ y & = a\frac{\sin t\cdot\cos t}{1 + \cos^2t} \end{align}$$

which gives us a simple way to visualize what we're starting with.

import numpy as np
from numpy import pi as π

a = 1
t = np.linspace(0, 2 * π, 256)
xs = a * np.sin(t) / (1 + np.cos(t) ** 2)
ys = a * np.sin(t) * np.cos(t) / (1 + np.cos(t) ** 2)

import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.set_aspect("equal")
ax.plot(xs, ys);
[Figure: the Bernoulli lemniscate traced out by its parametric equations]

We've loosely referred to the idea of inflating the zero-contour of a function $f(x, y)$ into 3D. The 3D function defining the desired implicit surface is

$$g(x, y, z) = f(x, y)^2 + z^2 - r^2,$$

where $r$ is a free parameter that we'll have to tune. I'm going to guess that $r < \frac{a}{\sqrt{2}}$ but it could be much less; beyond that we'll have to figure out what $r$ is by trial and error.

The code below uses the sympy software package to create a symbolic representation of the function $g$ defining our surface. Having a symbolic expression for $g$ allows us to evaluate it and its derivatives, but to actually visualize the surface we'll have to sample points on it somehow.

import sympy
x, y, z = sympy.symbols("x y z")
f = (x ** 2 + y ** 2) ** 2 - a ** 2 * (x ** 2 - y ** 2)

r = a / 6
g = f ** 2 + z ** 2 - r **2
dg = sympy.derive_by_array(g, [x, y, z])

Symbolically evaluating $g$ every time is expensive, so the code below uses the lambdify function from sympy to convert our symbolic expression into an ordinary Python function. I've added some additional wrappers so that we can pass in a numpy array of coordinates rather than the $x$, $y$, and $z$ values as separate arguments.

g_fn = sympy.lambdify([x, y, z], g, modules="numpy")
def G(q):
    return np.array([g_fn(*q)])

dg_fn = sympy.lambdify([x, y, z], dg, modules="numpy")
def dG(q):
    return np.array(dg_fn(*q)).reshape([1, 3])

One of the first algorithms for constrained mechanical systems was called SHAKE, so naturally some clever bastard had to make one called RATTLE and there's probably a ROLL out there too. The code below implements the RATTLE algorithm. You can view this as analogous to the Stormer-Verlet method, which does a half-step of the momentum solve, a full step of the position solve, and finally another half-step of the momentum solve again. In the RATTLE algorithm, we have to exercise a bit of foresight in the initial momentum half-step and position full-step in order to calculate a Lagrange multiplier to project an arbitrary position back onto the zero-contour of $g$. Solving for the position multiplier is a true nonlinear equation, whereas the final momentum half-step is just a linear equation for the momentum and its multiplier, which we've written here as $\mu$. Here we only have one constraint, so each multiplier is a scalar, which is a convenient simplification.

from tqdm.notebook import trange, tqdm
import scipy.linalg
import scipy.optimize

def trajectory(q, v, dt, num_steps, f, g, dg, progressbar=False):
    qs = np.zeros((num_steps + 1,) + q.shape)
    vs = np.zeros((num_steps + 1,) + q.shape)

    g_0 = g(q)
    λs = np.zeros((num_steps + 1,) + g_0.shape)
    μs = np.zeros((num_steps + 1,) + g_0.shape)

    def project_position(λ_0, q, v):
        def fn(λ, q, v):
            v_n = v + 0.5 * dt * (f(q) - λ @ dg(q))
            q_n = q + dt * v_n
            return g(q_n)

        result = scipy.optimize.root(fn, λ_0, args=(q, v))
        return result.x

    def project_velocity(q, v):
        J = dg(q)
        # TODO: Don't solve the normal equations, you're making Anne G sad
        A = J @ J.T
        b = J @ v
        return scipy.linalg.solve(A, b, assume_a="pos")

    qs[0] = q
    μs[0] = project_velocity(q, v)
    vs[0] = v - μs[0] @ dg(q)

    iterator = (trange if progressbar else range)(num_steps)
    for t in iterator:
        λs[t + 1] = project_position(λs[t], qs[t], vs[t])
        v_mid = vs[t] + 0.5 * dt * (f(qs[t]) - λs[t + 1] @ dg(qs[t]))
        qs[t + 1] = qs[t] + dt * v_mid

        v_prop = v_mid + 0.5 * dt * f(qs[t + 1])
        μs[t + 1] = project_velocity(qs[t + 1], v_prop)
        vs[t + 1] = v_mid + 0.5 * dt * f(qs[t + 1]) - μs[t + 1] @ dg(qs[t + 1])

    return qs, vs, λs, μs

I'll add that this algorithm was exceedingly fiddly to implement and I had to debug about 5 or 6 times before I got it right. The sanity checking shown below was essential to making sure it was right.

def potential(q):
    return q[2]

def force(q):
    return np.array((0, 0, -1))
num_trajectories = 25
θs = 2 * π * np.linspace(0, 1, num_trajectories)
num_steps = 2000
Qs = np.zeros((num_steps + 1, 3 * num_trajectories))
Vs = np.zeros((num_steps + 1, 3 * num_trajectories))
Λs = np.zeros((num_steps + 1, num_trajectories))
for i, θ in tqdm(enumerate(θs), total=num_trajectories):
    q = np.array((0, 0, r))
    v = np.array((np.cos(θ), np.sin(θ), 0))
    dt = 1e-2
    qs, vs, λs, μs = trajectory(q, v, dt, num_steps, force, G, dG)
    Qs[:, 3 * i : 3 * (i + 1)] = qs
    Vs[:, 3 * i : 3 * (i + 1)] = vs
    Λs[:, i] = λs.flatten()

As a sanity check, we'll evaluate the change in energy throughout the course of the simulation relative to the mean kinetic energy. The relative differences are on the order of 1%, which suggests that the method is doing a pretty good job. I re-ran this notebook with half the timestep and the energy deviation is cut by a factor of four, indicative of second-order convergence.

fig, ax = plt.subplots()
for i in range(num_trajectories):
    qs, vs = Qs[:, 3 * i : 3 * (i + 1)], Vs[:, 3 * i : 3 * (i + 1)]
    K = 0.5 * np.sum(vs ** 2, axis=1)
    U = np.array([potential(q) for q in qs])
    energies = K + U
    ax.plot((energies - energies[0]) / np.mean(K))
[Figure: relative energy deviation over time for each trajectory]

Finally, let's make a movie of the results.

from mpl_toolkits import mplot3d
from mpl_toolkits.mplot3d.art3d import Line3DCollection
from matplotlib.animation import FuncAnimation

def make_animation(
    Qs, depth=25, duration=30.0, start_width=0.1, end_width=1.5, ax=None
):
    num_steps = Qs.shape[0]
    num_particles = Qs.shape[1] // 3

    widths = np.linspace(start_width, end_width, depth)
    collections = []
    for i in range(num_particles):
        q_i = Qs[:depth, 3 * i : 3 * (i + 1)]
        points = q_i.reshape(-1, 1, 3)
        segments = np.concatenate([points[:-1], points[1:]], axis=1)
        collection = Line3DCollection(segments, linewidths=widths)
        collections.append(collection)
        ax.add_collection(collection)

    def update(step):
        start = max(0, step - depth)
        for i in range(num_particles):
            q_i = Qs[step - depth : step, 3 * i : 3 * (i + 1)]
            points = q_i.reshape(-1, 1, 3)
            segments = np.concatenate([points[:-1], points[1:]], axis=1)
            collections[i].set_segments(segments)

    interval = 1e3 * duration / num_steps
    frames = list(range(depth, num_steps))
    return FuncAnimation(
        ax.figure, update, frames=frames, interval=interval, blit=False
    )

My Riemannian geometry kung fu is weak, but I think that the geodesic flow on this surface is ergodic (see these notes).

%%capture

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.set_xlim((-a, a))
ax.set_ylim((-a, a))
ax.set_zlim((-a, a))
ax.set_axis_off()

animation = make_animation(Qs, depth=100, ax=ax)
from IPython.display import HTML
HTML(animation.to_html5_video())

It's also interesting to have a look at what the respective Lagrange multipliers for position and velocity are doing.

fig, ax = plt.subplots()
ts = np.linspace(0.0, num_steps * dt, num_steps + 1)
ax.plot(ts, Λs[:, 6].reshape(-1));
[Figure: the position Lagrange multiplier over time for one trajectory]

Note how the Lagrange multipliers aren't smooth -- they have pretty sharp transitions. If you think of the Lagrange multipliers as fictitious "forces" that push the trajectories back onto the constraint manifold, then their amplitude is probably some kind of indicator of the local curvature of the constraint surface.

More interesting now

This all worked well enough for a single particle on the surface. Now let's see what happens if we put several particles on the surface and make them interact. I'd like to find some potential that's repulsive at distances shorter than equilibrium, attractive at longer distances, and falls off to zero at infinity. We could use the Lennard-Jones potential shown in the last demo but the singularity at the origin is going to create more difficulty than necessary. Instead, I'll use a variant of the Ricker wavelet, which is plotted below.

r_e = a / 6
U_e = 0.5
r = sympy.symbols("r")
ρ = r / r_e
potential = U_e / 2 * (1 - 3 * ρ ** 2) * sympy.exp(3 / 2 * (1 - ρ ** 2))
rs = np.linspace(0.0, 3 * r_e, 61)
Us = sympy.lambdify(r, potential, modules="numpy")(rs)

fig, ax = plt.subplots()
ax.set_xlabel("distance / equilibrium")
ax.set_ylabel("potential")
ax.plot(rs / r_e, Us);
[Figure: the interaction potential as a function of distance from equilibrium]

I'm using this potential just because it's convenient -- no one thinks there are real particles that act like this.

Now that we're looking at a multi-particle system, we have to evaluate the constraint on every single particle. The derivative matrix has a block structure which a serious implementation would take advantage of.

def G(q):
    return np.array([g_fn(*q[3 * i: 3 * (i + 1)]) for i in range(len(q) // 3)])

# TODO: Make it a sparse matrix
def dG(q):
    n = len(q) // 3
    J = np.zeros((n, 3 * n))
    for i in range(n):
        q_i = q[3 * i: 3 * (i + 1)]
        J[i, 3 * i: 3 * (i + 1)] = dg_fn(*q_i)

    return J

The code below calculates the total forces by summation over all pairs of particles. I added this silly extra variable force_over_r to avoid any annoying singularities at zero distance.

# The force is minus the derivative of the potential with respect to distance.
force = -sympy.diff(potential, r)
force_over_r = sympy.lambdify(r, sympy.simplify(force / r), modules="numpy")

def F(q):
    n = len(q) // 3
    f = np.zeros_like(q)
    for i in range(n):
        q_i = q[3 * i: 3 * (i + 1)]
        for j in range(i + 1, n):
            q_j = q[3 * j: 3 * (j + 1)]
            r_ij = q_i - q_j
            r = np.sqrt(np.inner(r_ij, r_ij))
            f_ij = force_over_r(r) * r_ij

            f[3 * i: 3 * (i + 1)] += f_ij
            f[3 * j: 3 * (j + 1)] -= f_ij

    return f

To initialize the system, we'll take every 100th point from one of the trajectories that we calculated above.

skip = 100
particle = 3
q = Qs[::skip, 3 * particle : 3 * (particle + 1)].flatten()
v = np.zeros_like(q)
dt = 1e-2
N = 2000
qs, vs, λs, μs = trajectory(q, v, dt, N, F, G, dG, progressbar=True)
%%capture

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.set_xlim((-a, a))
ax.set_ylim((-a, a))
ax.set_zlim((-a, a))
ax.set_axis_off()

animation = make_animation(qs, depth=100, ax=ax)
HTML(animation.to_html5_video())

Some of the particles fall into each others' potential wells and become bound, developing oscillatory orbits, while others remain free. For two particles to bind, they have to have just the right momenta and relative positions; if they're moving too fast, they may scatter off of each other, but will ultimately fly off in opposite directions.

Conclusion

Enforcing constraints in solvers for Hamiltonian systems introduces several new difficulties. The basic second-order splitting scheme for unconstrained problems is pretty easy to implement and verify. While the RATTLE algorithm looks to be not much more complicated, it's very easy to introduce subtle off-by-one errors -- for example, accidentally evaluating the constraint derivative at the midpoint instead of the starting position. These mistakes manifest themselves as slightly too large deviations from energy conservation, but these deviations aren't necessarily large in any relative sense. The resulting scheme might still converge to the true solution, in which case the energy deviation will go to zero for any finite time interval. So measuring the reduction in the energy errors asymptotically as $\delta t \to 0$ probably won't catch this type of problem. It may be possible to instead calculate what the next-order term is in the Hamiltonian for the modified system using the Baker-Campbell-Hausdorff formula, but that may be pretty rough in the presence of constraints.

The implementation may be fiddly and annoying, but it is still possible to preserve much of the symplectic structure when constraints are added. The fact that structure-preserving integrators exist at all shouldn't be taken as a given. For example, there don't appear to be any simple structure-preserving adaptive integration schemes; see chapter 9 of Leimkuhler and Reich. The shallow water equations are a Hamiltonian PDE and deriving symplectic schemes that include the necessary upwinding is pretty hard.

There are several respects in which the code I wrote above is sub-optimal. For the multi-particle simulation, the constraints are applied to each particle and consequently the constraint derivative matrix $J$ is very sparse and $J\cdot J^\top$ is diagonal. For expediency's sake, I just used a dense matrix, but this scales very poorly to more particles. A serious implementation would either represent $J$ as a sparse matrix or would go matrix-free by providing routines to calculate the product of $J$ or $J^\top$ with a vector. I also implemented the projection of the momentum back onto the cotangent space by solving the normal equations, which is generally speaking a bad idea. The matrix $J\cdot J^\top$ was diagonal for our problem, so this approach will probably work out fine. For more complex problems, we may be better off solving a least-squares problem with the matrix $J^\top$ using either the QR or singular value decomposition.
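As a sketch of that last point, here's what the cotangent projection looks like as a least-squares solve with $J^\top$ rather than an explicit normal-equations solve. This is only an illustration under the assumption of an identity mass matrix, not what the code above actually does.

import numpy as np

def project_velocity_lstsq(J, v):
    # Solve the least-squares problem min over mu of ||J.T @ mu - v||.
    # Its normal equations are exactly J @ J.T @ mu = J @ v, so this
    # computes the same multiplier as before without forming J @ J.T.
    μ, *_ = np.linalg.lstsq(J.T, v, rcond=None)
    return μ

The projected momentum is then v - J.T @ μ, which satisfies J @ (v - J.T @ μ) = 0 up to roundoff; a QR or singular value decomposition of $J^\top$ would behave even better if the constraints were nearly degenerate.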

Finally, I used a simple interaction potential just so we could see something interesting happen. The potential goes to a finite value at zero separation, which is a little unphysical. A much more serious deficiency was that the potential is defined using the particles' coordinates in 3D Cartesian space. Ideally, we would do everything in a way that relies as little on how the surface is embedded into Euclidean space as possible, which would mean using the geodesic distance instead.

Symplectic integrators

My dad is a physicist of the old school, and what this means is that he has to tell everyone -- regardless of their field -- that what they're doing is so simple as to not even be worth doing and that anyway physicists could do it better. So whenever my job comes up he has to tell the same story about how he once took a problem to a numerical analyst. This poor bastard ran some code for him on a state-of-the-art computer of the time (a deer skeleton with a KT88 vacuum tube in its asshole) but the solution was total nonsense and didn't even conserve energy. Then pops realizes he could solve the Hamilton-Jacobi equation for the system exactly. Numerical analysis is for clowns.

Naturally, every time we have this conversation, I remind him that we figured out all sorts of things since then, like the fact that people who don't own land should be allowed to vote and also symplectic integrators. In this post I'll talk about the latter. A symplectic integrator is a scheme for solving Hamilton's equations of motion of classical mechanics in such a way that the map from the state at one time to the state at a later time preserves the canonical symplectic form. This is a very special property and not every timestepping scheme is symplectic. For those schemes that are symplectic, the trajectory samples exactly from the flow of a slightly perturbed Hamiltonian, which is a pretty nice result.

The two-body problem

First, we'll illustrate things on the famous two-body problem, which has the Hamiltonian

$$H = \frac{|p_1|^2}{2m_1} + \frac{|p_2|^2}{2m_2} - \frac{Gm_1m_2}{|x_1 - x_2|}$$

where $x_1$, $x_2$ are the positions of the two bodies, $m_1$, $m_2$ their masses, and $G$ the Newton gravitation constant. We can simplify this system by instead working in the coordinate system $Q = (m_1x_1 + m_2x_2) / (m_1 + m_2)$, $q = x_2 - x_1$. The center of mass $Q$ moves with constant speed, reducing the Hamiltonian to

$$H = \frac{|p|^2}{2\mu} - \frac{Gm_1m_2}{|q|}$$

where $\mu = m_1m_2 / (m_1 + m_2)$ is the reduced mass of the system. We could go on to write $q$ in polar coordinates and do several transformations to derive an exact solution; you can find this in the books by Goldstein or Kleppner and Kolenkow.

Instead, we'll take the Hamiltonian above as our starting point, but first we want to make the units work out as nicely as possible. The gravitational constant $G$ has to have units of length${}^3\cdot$time${}^{-2}\cdot$mass${}^{-1}$ in order for both terms in the Hamiltonian we wrote above to have units of energy. We'd like for all the lengths and times in the problem to work out to be around 1, which suggests that we measure time in years and length in astronomical units. The depository of all knowledge tells me that, in this unit system, the gravitational constant is

$$G \approx 4\pi^2\, \text{AU}^3 \cdot \text{yr}^{-2}\cdot M_\odot^{-1}.$$
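That value isn't arbitrary: for a circular orbit of radius $a$ and period $T$, balancing gravity against centripetal acceleration gives Kepler's third law, $$GM_\odot = \frac{4\pi^2 a^3}{T^2},$$ and plugging in $a = 1$ AU and $T = 1$ year for the earth gives $GM_\odot \approx 4\pi^2$ in these units.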

The factor of $M_\odot^{-1}$ in the gravitational constant will cancel with the corresponding factor in the Newton force law. For something like the earth-sun system, where the mass of the sun is much larger than that of the earth, the reduced mass of the system is about equal to the mass of the earth. So if we take the earth mass $M_\oplus$ as our basic mass unit, the whole system works out to about

$$H = \frac{|p|^2}{2} - \frac{4\pi^2}{|q|}.$$

Finally, in this unit system we can take the initial position of the earth to be $(1, 0)$ AU; the orbital speed of the earth is about $2\pi$ AU/year, so the initial momentum is $(0, 2\pi)$ in these units. Hamilton's equations of motion are

$$\begin{align} \dot q & = +\frac{\partial H}{\partial p} = p \\ \dot p & = -\frac{\partial H}{\partial q} = -4\pi^2\frac{q}{|q|^3}. \end{align}$$

To start, we'll try out the classic explicit and implicit Euler methods first.

import numpy as np
from numpy import pi as π

q_0 = np.array([1.0, 0.0])
p_0 = np.array([0.0, 2 * π])

final_time = 3.0
num_steps = 3000
dt = final_time / num_steps
def gravitational_force(q):
    return -4 * π ** 2 * q / np.sqrt(np.dot(q, q)) ** 3
def explicit_euler(q, p, dt, num_steps, force):
    qs = np.zeros((num_steps + 1,) + q.shape)
    ps = np.zeros((num_steps + 1,) + p.shape)

    qs[0] = q
    ps[0] = p

    for t in range(num_steps):
        qs[t + 1] = qs[t] + dt * ps[t]
        ps[t + 1] = ps[t] + dt * force(qs[t])
        
    return qs, ps

We'll call out to scipy's nonlinear solver for our implementation of the implicit Euler method. In principle, scipy can solve the resulting nonlinear system of equations solely with the ability to evaluate the forces. But in order to make this approach as competitive as possible we should also provide the derivative of the forces with respect to the positions, which enables using Newton-type methods.

I = np.eye(2)

def gravitational_force_jacobian(q):
    Q = np.sqrt(np.dot(q, q))
    return -4 * π ** 2 / Q ** 3 * (I - 3 * np.outer(q, q) / Q ** 2)
from scipy.optimize import root

def implicit_euler(q, p, dt, num_steps, force, force_jacobian):
    qs = np.zeros((num_steps + 1,) + q.shape)
    ps = np.zeros((num_steps + 1,) + p.shape)

    qs[0] = q
    ps[0] = p

    def f(q, q_t, p_t):
        return q - q_t - dt * (p_t + dt * force(q))
    
    def J(q, q_t, p_t):
        return I - dt ** 2 * force_jacobian(q)
    
    for t in range(num_steps):
        result = root(f, qs[t, :], jac=J, args=(qs[t], ps[t]))
        qs[t + 1] = result.x
        ps[t + 1] = ps[t] + dt * force(qs[t + 1])
        
    return qs, ps
q_ex, p_ex = explicit_euler(
    q_0, p_0, dt, num_steps, gravitational_force
)
q_im, p_im = implicit_euler(
    q_0, p_0, dt, num_steps, gravitational_force, gravitational_force_jacobian
)
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection

def plot_trajectory(q, start_width=1.0, end_width=3.0, **kwargs):
    points = q.reshape(-1, 1, 2)
    segments = np.concatenate([points[:-1], points[1:]], axis=1)
    widths = np.linspace(start_width, end_width, len(points))
    return LineCollection(segments, linewidths=widths, **kwargs)
fig, ax = plt.subplots()
ax.set_aspect("equal")
ax.set_xlim((-1.25, +1.25))
ax.set_ylim((-1.25, +1.25))
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
ax.add_collection(plot_trajectory(q_ex, color="tab:blue", label="explicit"))
ax.add_collection(plot_trajectory(q_im, color="tab:orange", label="implicit"))
ax.legend(loc="upper right");
[Figure: trajectories computed with the explicit and implicit Euler methods]

The explicit Euler method spirals out from what starts out looking like a circular orbit, while the implicit Euler method spirals in. Since the gravitational potential is negative, this means that the explicit Euler scheme is gaining energy, while the implicit Euler scheme is losing energy.

def energies(qs, ps):
    kinetic = 0.5 * np.sum(ps ** 2, axis=1)
    potential = -4 * π ** 2 / np.sqrt(np.sum(qs ** 2, axis=1))
    return kinetic + potential

fig, ax = plt.subplots()
ts = np.linspace(0.0, final_time, num_steps + 1)
ax.plot(ts, energies(q_ex, p_ex), label="explicit")
ax.plot(ts, energies(q_im, p_im), label="implicit")
ax.set_xlabel("time (years)")
ax.set_ylabel("energy")
ax.legend();
[Figure: total energy over time for the explicit and implicit Euler methods]

If we use a slightly longer timestep, the implicit Euler method will eventually cause the earth and sun to crash into each other in the same short time span of three years. This prediction does not match observations, much as we might wish.

We could reduce the energy drift to whatever degree we desire by using a shorter timestep or using a more accurate method. But before we go and look up the coefficients for the usual fourth-order Runge-Kutta method, let's instead try a simple variation on the explicit Euler scheme.

from tqdm.notebook import trange
def semi_explicit_euler(q, p, dt, num_steps, force, progressbar=False):
    qs = np.zeros((num_steps + 1,) + q.shape)
    ps = np.zeros((num_steps + 1,) + p.shape)

    qs[0] = q
    ps[0] = p

    iterator = trange(num_steps) if progressbar else range(num_steps)
    for t in iterator:
        qs[t + 1] = qs[t] + dt * ps[t]
        ps[t + 1] = ps[t] + dt * force(qs[t + 1])
        
    return qs, ps

Rather than use the previous values of the system state to pick the next system state, we first updated the position, then used this new value to update the momentum; we used force(qs[t + 1]) instead of force(qs[t]). This is an implicit scheme in the strictest sense of the word. The particular structure of the central force problem, however, makes the computations explicit. In fancy terms we would refer to the Hamiltonian as separable. Let's see how this semi-explicit Euler scheme does.

q_se, p_se = semi_explicit_euler(
    q_0, p_0, dt, num_steps, gravitational_force
)
fig, ax = plt.subplots()
ax.set_aspect("equal")
ax.set_xlim((-1.5, +1.5))
ax.set_ylim((-1.5, +1.5))
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
ax.add_collection(plot_trajectory(q_ex, color="tab:blue", label="explicit"))
ax.add_collection(plot_trajectory(q_im, color="tab:orange", label="implicit"))
ax.add_collection(plot_trajectory(q_se, color="tab:green", label="symplectic"))
ax.legend(loc="upper right");
[Figure: orbits for the explicit, implicit, and semi-explicit (symplectic) Euler methods]

The orbit of the semi-explicit or symplectic method shown in green seems to be roughly closed, which is pretty good. The most stunning feature is that the energy drift, while non-zero, is bounded and oscillatory. The amplitude of the drift is smaller than the energy itself by a factor of about one in 10,000.

fig, ax = plt.subplots()
Hs = energies(q_se, p_se)
ax.plot(ts, Hs - Hs[0], label="semi-explicit")
ax.set_xlabel("time (years)")
ax.set_ylabel("energy drift");
[Figure: energy drift of the semi-explicit Euler method over time]

Just for kicks, let's try again on an elliptical orbit with some more eccentricity than what we tried here, and on the same circular orbit, for a much longer time window.

final_time = 3e2
num_steps = int(3e4)
dt = final_time / num_steps

q_0 = np.array([1.0, 0.0])
p_0 = np.array([0.0, 2 * π])
q_se, p_se = semi_explicit_euler(q_0, p_0, dt, num_steps, gravitational_force)

ϵ = 0.1
q_0 = np.array([1.0 + ϵ, 0.0])
p_0 = np.array([0.0, 2 * π])
q_el, p_el = semi_explicit_euler(q_0, p_0, dt, num_steps, gravitational_force)
fig, ax = plt.subplots()
ax.set_aspect("equal")
ax.set_xlim((-1.5, +1.5))
ax.set_ylim((-1.5, +1.5))
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
ax.add_collection(plot_trajectory(q_se, color="tab:blue", label="circular"))
ax.add_collection(plot_trajectory(q_el, color="tab:orange", label="elliptical"))
ax.legend(loc="lower right");
[Figure: circular and elliptical orbits traced over a long time window]

The orbits don't exactly trace out circles or ellipses -- the orbits precess a bit. Nonetheless, they still remain roughly closed and have bounded energy drift. For less work than the implicit Euler scheme, we got a vastly superior solution. Why is the semi-explicit Euler method so much better than the explicit or implicit Euler method?

Symplectic integrators

Arguably the most important property of Hamiltonian systems is that the energy is conserved, as well as other quantities such as linear and angular momentum. The explicit and implicit Euler methods are convergent, and so their trajectories reproduce those of the Hamiltonian system for any finite time horizon as the number of steps is increased. These guarantees don't tell us anything about how the discretized trajectories behave using a fixed time step and very long horizons, and they don't tell us anything about the energy conservation properties either. The wonderful property about semi-explicit Euler is that the map from the state of the system at one timestep to the next samples directly from the flow of a slightly perturbed Hamiltonian.

Let's try to unpack that statement a bit more. A fancy way of writing Hamilton's equations of motion is that, for any observable function $f$ of the total state $z = (q, p)$ of the system,

$$\frac{\partial f}{\partial t} = \{f, H\}$$

where $\{\cdot, \cdot\}$ denotes the Poisson bracket. For the simple systems described here, the Poisson bracket of two functions $f$ and $g$ is

$$\{f, g\} = \sum_i\left(\frac{\partial f}{\partial q_i}\frac{\partial g}{\partial p_i} - \frac{\partial f}{\partial p_i}\frac{\partial g}{\partial q_i}\right).$$

We recover the usual Hamilton equations of motion by substituting the positions and momenta themselves for $f$. In general, the Poisson bracket can be any bilinear form that's antisymmetric and satisfies the Leibniz and Jacobi identities. In a later demo, I'll look at rotational kinematics, where the configuration space is no longer flat Euclidean space but the Lie group SO(3). The Poisson bracket is rightfully viewed as a 2-form in this setting. Leaving these complications aside for the moment, the evolution equation in terms of brackets is especially nice in that it allows us to easily characterize the conserved quantities: any function $f$ such that $\{f, H\} = 0$. In particular, due to the antisymmetry of the bracket, the Hamiltonian $H$ itself is always conserved.
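For example, substituting $f = q_i$ and then $f = p_i$ into the bracket form gives $$\dot q_i = \{q_i, H\} = \frac{\partial H}{\partial p_i}, \qquad \dot p_i = \{p_i, H\} = -\frac{\partial H}{\partial q_i},$$ which are exactly Hamilton's equations of motion.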

Solving Hamilton's equations of motion forward in time gives a map $\Phi_t$ from the initial to the final state. The nice part about this solution map is that it obeys the semi-group property: $\Phi_s\circ\Phi_t = \Phi_{s + t}$. In the same way that we can think of a matrix $A$ generating the solution map $e^{tA}$ of the linear ODE $\dot z = Az$, we can also think of the solution map for Hamiltonian systems as being generated by the Poisson bracket with the Hamiltonian:

$$\Phi_t = \exp\left(t\{\cdot, H\}\right)$$

where $\exp$ denotes the exponential map. This isn't a rigorous argument; to really make it one, I'd have to talk about diffeomorphism groups of manifolds. Just believe me and read Jerrold Marsden's books if you don't.

Now comes the interesting part. Suppose we want to solve the linear ODE

$$\dot z = (A + B)z.$$

We'd like to find a way to break down solving this problem into separately solving ODEs defined by $A$ and $B$. It isn't possible to split the problem exactly because, for matrices, $\exp\left(t(A + B)\right)$ is not equal to $\exp(tA)\exp(tB)$ unless $A$ and $B$ commute. But, for small values of $\delta t$, we can express the discrepancy in terms of the commutator $[A, B] = AB - BA$ of the matrices:

$$\exp(\delta t\cdot A)\exp(\delta t\cdot B) = \exp\left(\delta t(A + B) + \frac{\delta t^2}{2}[A, B] + \ldots\right)$$

where the ellipses denote terms of higher order in $\delta t$. Exactly what goes in the higher-order terms is the content of the Baker-Campbell-Hausdorff (BCH) formula. This reasoning is what leads to splitting methods for all kinds of different PDEs. For example, you can show that splitting the solution of an advection-diffusion equation into an explicit step for the advective part and an implicit step for the diffusive part works with an error of order $\mathscr{O}(\delta t)$ using the BCH formula.

The clever part about the analysis of symplectic methods is that we can play a similar trick for Hamiltonian problems (if we're willing to wave our hands a bit). Suppose that a Hamiltonian $H$ can be written as

$$H = H_1 + H_2$$

where exactly solving for the flow of each Hamiltonian $H_1$, $H_2$ is easy. The most obvious splitting is into kinetic and potential energies $K$ and $U$. Integrating the Hamiltonian $K(p)$ is easy because the momenta don't change at all -- the particles continue in linear motion according to what their starting momenta were. Integrating the Hamiltonian $U(q)$ is also easy because, while the momenta will change according to the particles' initial positions, those positions also don't change. To write it down explicitly,

$$\Phi^K_t\left(\begin{matrix}q \\ p\end{matrix}\right) = \left(\begin{matrix}q + t\frac{\partial K}{\partial p} \\ p\end{matrix}\right)$$

and

$$\Phi^U_t\left(\begin{matrix}q \\ p\end{matrix}\right) = \left(\begin{matrix}q \\ p - t\frac{\partial U}{\partial q}\end{matrix}\right)$$

Each of these Hamiltonian systems by itself is sort of silly, but the composition of maps $\Phi^U_{\delta t}\circ \Phi^K_{\delta t}$ gives an $\mathscr{O}(\delta t)$-accurate approximation to $\Phi^{K + U}_{\delta t}$ by the BCH formula. Now if we keep up the analogy and pretend like we can apply the BCH formula to Hamiltonian flows exactly, we'd formally write that

$$\exp\left(\delta t\{\cdot, H_1\}\right)\exp\left(\delta t\{\cdot, H_2\}\right) = \exp\left(\delta t\{\cdot, H_1 + H_2\} + \frac{\delta t^2}{2}\left\{\cdot, \{H_1, H_2\}\right\} + \ldots \right).$$

In other words, it's not just that using the splitting scheme above is giving us a $\mathscr{O}(\delta t)$-accurate approximation to the solution $q(t), p(t)$, it's that the approximate solution is sampled exactly from integrating the flow of the perturbed Hamiltonian

$$H' = H + \frac{\delta t}{2}\{H_1, H_2\} + \mathscr{O}(\delta t^2).$$

All of the things that are true of Hamiltonian systems generally are then true of our numerical approximations. For example, they still preserve volume in phase space (Liouville's theorem); have no stable or unstable equilibrium points, only saddles and centers; and typically have roughly bounded trajectories.

Using the BCH formula to compute the perturbed Hamiltonian helps us design schemes of even higher order. For example, the scheme that we're using throughout this post is obtained by taking a full step of the position solve followed by a full step of the momentum solve. We could eliminate the first-order term in the expansion by taking a half-step of position, a full step of momentum, followed by a half-step of position again:

$$\Psi = \Phi^K_{\delta t / 2}\Phi^U_{\delta t}\Phi^K_{\delta t / 2},$$

i.e. a symmetric splitting. This gives a perturbed Hamiltonian that's accurate to $\delta t^2$ instead:

$$H' = H + \frac{\delta t^2}{24}\left(2\{U, \{U, K\}\} - \{K, \{K, U\}\}\right) + \mathscr{O}(\delta t^4)$$

This scheme is substantially more accurate and also shares a reversibility property with the true problem.
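As a concrete sketch, here's the symmetric splitting $\Psi$ written in the same style as the semi_explicit_euler function above, assuming unit masses as in the earth-sun problem; I haven't run or tuned it here.

def symmetric_splitting(q, p, dt, num_steps, force):
    qs = np.zeros((num_steps + 1,) + q.shape)
    ps = np.zeros((num_steps + 1,) + p.shape)

    qs[0] = q
    ps[0] = p

    for t in range(num_steps):
        # Half-step of the kinetic flow, full step of the potential flow,
        # then another half-step of the kinetic flow.
        q_half = qs[t] + 0.5 * dt * ps[t]
        ps[t + 1] = ps[t] + dt * force(q_half)
        qs[t + 1] = q_half + 0.5 * dt * ps[t + 1]

    return qs, ps

Swapping this in for semi_explicit_euler in the two-body experiment should shrink the energy oscillations from first to second order in the timestep.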

Making all of this analysis really rigorous requires a bit of Lie algebra sorcery that I can't claim to understand at any deep level. But for our purposes it's sufficient to know that symplectic methods like semi-explicit Euler sample exactly from some perturbed Hamiltonian, which is likely to have bounded level surfaces in phase space if the original Hamiltonian did. This fact gives us stability guarantees that are hard to come by any other way.

Molecular dynamics

The two-body gravitational problem is all well and good, but now let's try it for a more interesting and complex example: the motion of atoms. One of the simplest models for interatomic interactions is the Lennard-Jones (LJ) potential, which has the form

$$U = \epsilon\left(\left(\frac{R}{r}\right)^{12} - 2\left(\frac{R}{r}\right)^6\right).$$

The potential is repulsive at distances less than $R$, attractive at distances between $R$ and $2R$, and pretty much zero at distances appreciably greater than $2R$, with a well depth of $\epsilon$. The LJ potential is spherically symmetric, so it's not a good model for polyatomic molecules like water that have a non-trivial dipole moment, but it's thought to be a pretty good approximation for noble gases like argon. We'll work in a geometrized unit system where $\epsilon = 1$ and $R = 1$. The code below calculates the potential and forces for a system of several Lennard-Jones particles.

ϵ = 1.0
R = 1.0

def lennard_jones_potential(q):
    U = 0.0
    n = len(q)
    for i in range(n):
        for j in range(i + 1, n):
            z = q[i] - q[j]
            ρ = np.sqrt(np.dot(z, z)) / R
            U += ϵ / ρ ** 6 * (1 / ρ ** 6 - 2)

    return U

def lennard_jones_force(q):
    fs = np.zeros_like(q)
    n = len(q)
    for i in range(n):
        for j in range(i + 1, n):
            z = q[i] - q[j]
            # ρ is the distance between particles i and j in units of R
            ρ = np.sqrt(np.dot(z, z)) / R
            # force on particle i due to j; j feels the equal and opposite force
            f = -12 * ϵ / R ** 2 / ρ ** 8 * (1 - 1 / ρ ** 6) * z
            fs[i] += f
            fs[j] -= f

    return fs

This code runs in $\mathscr{O}(n^2)$ time for a system of $n$ particles, but the Lennard-Jones interaction is almost completely negligible at distances greater than $3R$. There are approximation schemes that use spatial data structures like quadtrees to index the positions of all the particles and lump together the effects of long-range forces. These schemes reduce the overall computational burden to $\mathscr{O}(n\cdot\log n)$ and are a virtual requirement for running large-scale simulations.
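Short of a full tree code, you can get a lot of mileage out of the cutoff alone. The sketch below -- my own, using `scipy.spatial.cKDTree`, which doesn't appear elsewhere in this post -- only visits pairs closer than $3R$, where the neglected tail of the potential is about $2\epsilon/3^6 \approx 0.003\epsilon$.

from scipy.spatial import cKDTree

def lennard_jones_force_cutoff(q, cutoff=3.0):
    # Same force law as above, but only over pairs within `cutoff` radii,
    # found with a k-d tree instead of the all-pairs double loop.
    fs = np.zeros_like(q)
    pairs = cKDTree(q).query_pairs(r=cutoff * R)
    for i, j in pairs:
        z = q[i] - q[j]
        ρ = np.sqrt(np.dot(z, z)) / R
        f = -12 * ϵ / R ** 2 / ρ ** 8 * (1 - 1 / ρ ** 6) * z
        fs[i] += f
        fs[j] -= f
    return fs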

For the initial setup, we'll look at a square lattice of atoms separated by a distance $R$. We'll start out with zero initial velocity. If you were to imagine an infinite or periodic lattice of Lennard-Jones atoms, a square lattice should be stable: the points immediately to the north, south, east, and west on the grid are exactly at the equilibrium distance, while the forces between an atom and its neighbors to the northwest and southeast should cancel. For this simulation, we won't include any periodicity, so it's an interesting question whether the square lattice structure survives in the presence of edge effects.

num_rows, num_cols = 10, 10
num_particles = num_rows * num_cols

q = np.zeros((num_particles, 2))
for i in range(num_rows):
    for j in range(num_cols):
        q[num_cols * i + j] = (R * i, R * j)
        
p = np.zeros((num_particles, 2))

I've added a progress bar to the simulation so I can see how fast it runs. Each iteration usually takes about the same time, so after about 10 or so you can tell whether you should plan to wait through the next cup of coffee or until next morning.

dt = 1e-2
num_steps = 2000

qs, ps = semi_explicit_euler(
    q, p, dt, num_steps, force=lennard_jones_force, progressbar=True
)

And now for some pretty animations.

%%capture
from matplotlib.animation import FuncAnimation

fig, ax = plt.subplots()
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
ax.set_xlim((qs[:, :, 0].min(), qs[:, :, 0].max()))
ax.set_ylim((qs[:, :, 1].min(), qs[:, :, 1].max()))
ax.set_aspect("equal")
points = ax.scatter(qs[0, :, 0], qs[0, :, 1], animated=True)

def update(timestep):
    points.set_offsets(qs[timestep, :, :])

num_steps = len(qs)
fps = 60
animation = FuncAnimation(fig, update, num_steps, interval=1e3 / fps)
from IPython.display import HTML
HTML(animation.to_html5_video())

The square lattice is unstable -- the particles eventually rearrange into a hexagonal lattice. We can also see this if we plot the potential energy as a function of time. Around halfway into the simulation, the average potential energy suddenly drops by about $\epsilon / 5$.

ts = np.linspace(0, num_steps * dt, num_steps)
Us = np.array([lennard_jones_potential(q) for q in qs]) / num_particles
fig, ax = plt.subplots()
ax.set_xlabel("time")
ax.set_ylabel("potential energy")
ax.plot(ts, Us);
(figure: potential energy per particle vs. time)

By way of a posteriori sanity checking, we can see that the total energy wasn't conserved exactly, but the deviations are bounded and the amplitude is much smaller than the characteristic energy scale $\epsilon$ of the problem.

Ks = 0.5 * np.sum(ps ** 2, axis=(1, 2)) / num_particles
Hs = Us + Ks

fig, ax = plt.subplots()
ax.set_xlabel("time")
ax.set_ylabel("energy")
ax.plot(ts, Us, label="potential")
ax.plot(ts, Hs, label="total")
ax.legend();
(figure: potential and total energy per particle vs. time)

Conclusion

An introductory class in numerical ODEs will show you how to construct convergent discretization schemes. Many real problems, however, have special structure that a general ODE scheme may or may not preserve. Hamiltonian systems are particularly rich in structure -- energy and phase space volume conservation, reversibility. Some very special discretization schemes preserve this structure. In this post, we focused only on the very basic symplectic Euler scheme and hinted at the similar but more accurate Störmer-Verlet scheme. Another simple symplectic method is the implicit midpoint rule

$$\frac{z_{n + 1} - z_n}{\delta t} = f\left(\frac{z_n + z_{n + 1}}{2}\right).$$
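To give a flavor of what "implicit" means in practice, here's a minimal sketch of one midpoint step -- my own, solving the implicit equation with a plain fixed-point iteration, which converges for small enough $\delta t$. For stiff problems you'd use a Newton iteration instead, but the structure is the same.

import numpy as np

def implicit_midpoint_step(z, dt, f, num_iterations=50):
    # Solve z_new = z + dt * f((z + z_new) / 2) by fixed-point iteration,
    # starting from an explicit Euler predictor.
    z_new = z + dt * f(z)
    for iteration in range(num_iterations):
        z_new = z + dt * f(0.5 * (z + z_new))
    return z_new

# One step of the harmonic oscillator z = (q, p), dz/dt = (p, -q).
f = lambda z: np.array([z[1], -z[0]])
print(implicit_midpoint_step(np.array([1.0, 0.0]), 0.1, f))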

There are of course higher-order symplectic schemes, for example Lobatto-type Runge-Kutta methods.

We showed a simulation of several particles interacting via the Lennard-Jones potential, which is spherically symmetric. Things get much more complicated when there are rotational degrees of freedom. The rotational degrees of freedom live not in flat Euclidean space but on the Lie group SO(3), and the angular momenta in the Lie algebra $\mathfrak{so}(3)$. More generally, there are specialized methods for problems with constraints, such as being a rotation matrix, or being confined to a surface.

If you want to learn more, my favorite references are Geometric Numerical Integration by Hairer, Lubich, and Wanner and Simulating Hamiltonian Dynamics by Leimkuhler and Reich.

ADMM

In the previous post, I showed how to use Moreau-Yosida regularization for inverse problems with non-smooth regularization functionals. Specifically, we were looking at the total variation functional

$$R(q) = \alpha\int_\Omega|\nabla q|\, dx$$

as a regularizer, which promotes solutions that are piecewise constant on sets with relatively nice-looking boundary curves. Rather than try to minimize this functional directly, we instead used a smooth approximation, which in many cases is good enough. The smooth approximation is based on penalty-type methods, and one distinct disadvantage of penalty methods is that they tend to wreck the conditioning of the problem. This poor conditioning manifests itself as a multiple order-of-magnitude imbalance in the different terms in the objective. To minimize the objective accurately, say through a line search procedure, you have to do so with an accuracy that matches the magnitude of the smallest term.

In another previous post on Nitsche's method, I looked at how the pure quadratic penalty method compared to the augmented Lagrangian method for imposing Dirichlet boundary conditions. Here we'll proceed in a similar vein: what happens if we go from using a pure penalty method to using an augmented Lagrangian scheme?

Generating the exact data

We'll use the exact same problem as in the previous post on total variation regularization -- a random Fourier series for the boundary data, a quadratic blob for the forcing, and a discontinuous conductivity coefficient.

import firedrake
mesh = firedrake.UnitSquareMesh(32, 32, diagonal='crossed')
Q = firedrake.FunctionSpace(mesh, 'CG', 2)
V = firedrake.FunctionSpace(mesh, 'CG', 2)
import numpy as np
from numpy import random, pi as π
x = firedrake.SpatialCoordinate(mesh)

rng = random.default_rng(seed=1)
def random_fourier_series(std_dev, num_modes, exponent):
    from firedrake import sin, cos
    A = std_dev * rng.standard_normal((num_modes, num_modes))
    B = std_dev * rng.standard_normal((num_modes, num_modes))
    return sum([(A[k, l] * sin(π * (k * x[0] + l * x[1])) +
                 B[k, l] * cos(π * (k * x[0] + l * x[1])))
                / (1 + (k**2 + l**2)**(exponent/2))
                for k in range(num_modes)
                for l in range(int(np.sqrt(num_modes**2 - k**2)))])
from firedrake import Function
g = Function(V).interpolate(random_fourier_series(1.0, 6, 1))
from firedrake import inner, max_value, conditional, Constant
a = -Constant(4.5)
r = Constant(1/4)
ξ = Constant((0.4, 0.5))
q_true = Function(Q).interpolate(a * conditional(inner(x - ξ, x - ξ) < r**2, 1, 0))
firedrake.trisurf(q_true);
(figure: surface plot of the true coefficient q_true)
b = Constant(6.)
R = Constant(1/4)
η = Constant((0.7, 0.5))
f = Function(V).interpolate(b * max_value(0, 1 - inner(x - η, x - η) / R**2))
from firedrake import exp, grad, dx, ds
k = Constant(1.)
h = Constant(10.)
u_true = Function(V)
v = firedrake.TestFunction(V)
F = (
    (k * exp(q_true) * inner(grad(u_true), grad(v)) - f * v) * dx +
    h * (u_true - g) * v * ds
)
opts = {
    'solver_parameters': {
        'ksp_type': 'preonly',
        'pc_type': 'lu',
        'pc_factor_mat_solver_type': 'mumps',
    },
}
firedrake.solve(F == 0, u_true, **opts)
firedrake.trisurf(u_true);
(figure: surface plot of the true solution u_true)

Generating the observational data

To create the synthetic observations, we'll once again need to call out directly to PETSc to get a random field with the right error statistics when using a higher-order finite element approximation.

ξ = Function(V)
n = len(ξ.dat.data_ro)
ξ.dat.data[:] = rng.standard_normal(n)
from firedrake import assemble
from firedrake.petsc import PETSc
ϕ, ψ = firedrake.TrialFunction(V), firedrake.TestFunction(V)
m = inner(ϕ, ψ) * dx
M = assemble(m, mat_type='aij').M.handle
ksp = PETSc.KSP().create()
ksp.setOperators(M)
ksp.setUp()
pc = ksp.pc
pc.setType(pc.Type.CHOLESKY)
pc.setFactorSolverType(PETSc.Mat.SolverType.PETSC)
pc.setFactorSetUpSolverType()
L = pc.getFactorMatrix()
pc.setUp()
area = assemble(Constant(1) * dx(mesh))
z = Function(V)
z.dat.data[:] = rng.standard_normal(n)
with z.dat.vec_ro as Z:
    with ξ.dat.vec as Ξ:
        L.solveBackward(Z, Ξ)
        Ξ *= np.sqrt(area / n)
û = u_true.dat.data_ro[:]
signal = û.max() - û.min()
signal_to_noise = 50
σ = firedrake.Constant(signal / signal_to_noise)

u_obs = u_true.copy(deepcopy=True)
u_obs += σ * ξ

Solution via ADMM

To motivate ADMM, it helps to understand the augmented Lagrangian method. There are two basic ways to solve equality-constrained optimization problems: the Lagrange multiplier method and the penalty method. The augmented Lagrangian method uses both a Lagrange multiplier and a quadratic penalty, which astonishingly works much better than either the pure Lagrange multiplier or penalty methods. ADMM is based on using the augmented Lagrangian method with a consensus constraint to split out non-smooth problems. Specifically, we want to find a minimizer of the functional

$$J(q) = E(G(q) - u^o) + \alpha\int_\Omega|\nabla q|\, dx$$

where $E$ is the model-data misfit and $G$ is the solution operator for the problem

$$F(u, q) = 0.$$

If $F$ is continuously differentiable with respect to both of its arguments and the linearization with respect to $u$ is invertible, then the implicit function theorem in Banach spaces tells us that such a solution operator $u = G(q)$ exists. Minimizing this functional $J(q)$ is more challenging than the case where we used the $H^1$-norm to regularize the problem because the total variation functional is non-smooth. In the previous post, we showed how to work around this challenge by using Moreau-Yosida regularization. You can motivate Moreau-Yosida regularization by introducing an auxiliary vector field $v$ and imposing the constraint that $v = \nabla q$ by a quadratic penalty method. We can then solve for $v$ exactly because we know analytically what the proximal operator for the 1-norm is. The resulting functional upon eliminating $v$ is the Moreau-Yosida regularized form.
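For reference, the proximal operator of the 1-norm mentioned above is just the soft thresholding or shrinkage map. Here's a small numpy sketch of it (the function name is mine; we'll write the same formula in UFL inside the ADMM loop below).

import numpy as np

def soft_threshold(z, τ):
    # prox of τ‖·‖ for a vector z: return zero if |z| ≤ τ, otherwise
    # shrink z toward the origin by τ.
    norm = np.sqrt(np.dot(z, z))
    if norm <= τ:
        return np.zeros_like(z)
    return (1 - τ / norm) * z

print(soft_threshold(np.array([3.0, 4.0]), 1.0))   # |z| = 5 shrinks to 4
print(soft_threshold(np.array([0.3, 0.4]), 1.0))   # inside the ball -> 0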

The idea behind ADMM is to instead use an augmented Lagrangian -- combining both the classical method of Lagrange multipliers with the quadratic penalty method -- to enforce the constraint that $v = \nabla q$. This gives us the augmented Lagrangian

$$\begin{align} L_\rho(q, v, \mu) & = E(G(q) - u^o) + \alpha\int_\Omega|v|\, dx \\ & \qquad + \rho\alpha^2\int_\Omega\left(\mu\cdot(\nabla q - v) + \frac{1}{2}|\nabla q - v|^2\right)dx. \end{align}$$

If you've seen ADMM before, you might notice that we've scaled the penalty parameter a bit. We put in an extra factor of $\alpha^2$ with the penalty term $\|\nabla q - v\|^2$ and an extra factor of $\rho\alpha^2$ with the Lagrange multiplier $\mu$ so that it has the same units as both $\nabla q$ and $v$. In order to highlight the connection with Moreau-Yosida regularization, we'll do a slight rearrangement by completing the square:

$$\begin{align} L_\rho(q, v, \mu) & = E(G(q) - u^o) + \alpha\int_\Omega|v|\, dx \\ & \qquad + \frac{\rho\alpha^2}{2}\int_\Omega\left\{|\nabla q + \mu - v|^2 - |\mu|^2\right\}dx. \end{align}$$

If we look at the parts of the Lagrangian involving only $v$, we get something that looks a lot like Moreau-Yosida regularization of the $L^1$ norm, only the argument to be evaluated at is $\nabla q + \mu$. Likewise, if we look at the parts of the Lagrangian involving only $q$, we have something that looks exactly like $H^1$ regularization, only with the regularization centered around $v - \mu$ instead of around 0.

Each iteration of the method will proceed in three steps:

  1. Minimize $L_\rho$ with respect to $q$ only. This step is very similar to using the squared $H^1$-norm for regularization but for the fact that we're not regularizing around 0, but rather around $v - \mu$.
  2. Minimize $L_\rho$ with respect to $v$: $$v \leftarrow \text{soft threshold}_{\rho\alpha}(\nabla q + \mu)$$
  3. Perform a gradient ascent step for $\mu$: $$\mu \leftarrow \mu + \nabla q - v$$

The final gradient ascent step for $\mu$ seemed a little bit mysterious to me at first. Ultimately the whole problem is a saddle point problem for $\mu$, so intuitively it made sense to me that gradient ascent would move you in the right direction. I still felt in my gut that you should have to do some more sophisticated kind of update for $\mu$. So I fell down a deep rabbit hole trying to understand why the augmented Lagrangian method actually works. The usual references like Nocedal and Wright are deafeningly silent on this issue; the only source I could find that gave a decent explanation was Nonlinear Programming by Bertsekas. You could just believe me that step 3 is the right thing to do, but the reasons why are not at all trivial!

from firedrake.adjoint import (
    Control, ReducedFunctional, MinimizationProblem, ROLSolver
)

First, we'll solve the forward problem with our blunt initial guess for the solution. Under the hood, pyadjoint will tape this operation, thus allowing us to correctly calculate the derivative of the objective functional later.

firedrake.adjoint.continue_annotation()
q = Function(Q)
u = Function(V)
F = (
    (k * exp(q) * inner(grad(u), grad(v)) - f * v) * dx +
    h * (u - g) * v * ds
)
forward_problem = firedrake.NonlinearVariationalProblem(F, u)
forward_solver = firedrake.NonlinearVariationalSolver(forward_problem, **opts)
forward_solver.solve()
firedrake.trisurf(u);
(figure: surface plot of u computed from the initial guess for q)

These variables will store the values of the auxiliary field $v$ and the multiplier $\mu$ that enforces the constraint $v = \nabla q$. An interesting question that I haven't seen addressed anywhere in the literature on total variation regularization is what finite element basis to use for $v$ and $\mu$. Here we're using the usual continuous Lagrange basis of degree 1, which seems to work. I've also tried this with discontinuous basis functions and the estimates for $v$ and $\mu$ seem to have oscillatory garbage. I have a hunch that some bases won't work because they fail to satisfy the LBB conditions. I have another hunch that it would be fun to repeat this experiment with some kind of H(curl)-conforming element for $v$ and $\mu$, but for now we'll just stick with CG(1).

Δ = firedrake.VectorFunctionSpace(mesh, 'CG', 1)
v = Function(Δ)
μ = Function(Δ)

At first, $v$ and $\mu$ are zero. So when we start the iteration, the first value of $q$ that we'll compute is just what we would have found had we used $H^1$-regularization. We'll start with the ADMM penalty of $\rho = 1$ and we'll use the same regularization penalty of $\alpha = 1 / 20$ that we used in the previous demo.

α = Constant(5e-2)
ρ = Constant(1.0)

Next we'll execute a few steps of the ADMM algorithm. I picked the number of iterations out of a hat. You should use an actual stopping criterion if you care about doing this right.
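For what it's worth, the usual ADMM stopping test watches the primal residual $\nabla q - v$ and the dual residual, which is proportional to the change in $v$ between iterations, and quits when both are small. Here's a sketch of what that check might look like inside the loop below -- the variable `v_prev` and the tolerance are made up for illustration.

from firedrake import assemble

# Hypothetical convergence check; `v_prev` stands in for the value of v
# from the previous ADMM iteration and the tolerance is arbitrary.
v_prev = v.copy(deepcopy=True)
tolerance = 1e-3
primal_residual = np.sqrt(assemble(inner(grad(q) - v, grad(q) - v) * dx))
dual_residual = float(ρ) * float(α) ** 2 * np.sqrt(assemble(inner(v - v_prev, v - v_prev) * dx))
converged = max(primal_residual, dual_residual) < tolerance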

rol_options = {
    "Step": {
        "Type": "Line Search",
        "Line Search": {"Descent Method": {"Type": "Quasi-Newton Step"}},
    },
    "Status Test": {
        "Gradient Tolerance": 1e-4,
        "Step Tolerance": 1e-4,
        "Iteration Limit": 500,
    },
    "General": {
        "Print Verbosity": 0,
        "Secant": {"Type": "Limited-Memory BFGS", "Maximum Storage": 10},
    },
}
from firedrake import sqrt
from tqdm.notebook import trange

qs = [q.copy(deepcopy=True)]
vs = [v.copy(deepcopy=True)]
μs = [μ.copy(deepcopy=True)]

num_steps = 15
for step in trange(num_steps):
    # Step 1: Solve the inverse problem for q.
    J = assemble(
        0.5 * ((u - u_obs) / σ)**2 * dx +
        0.5 * ρ * α**2 * inner(grad(q) + μ - v, grad(q) + μ - v) * dx
    )

    q̂ = Control(q)
    Ĵ = ReducedFunctional(J, q̂)
    inverse_problem = MinimizationProblem(Ĵ)
    inverse_solver = ROLSolver(inverse_problem, rol_options, inner_product="L2")
    q_opt = inverse_solver.solve()
    q.assign(q_opt)
    forward_solver.solve()

    # Step 2: soft thresholding for v.
    z = grad(q) + μ
    expr = conditional(
        (ρ * α) ** 2 * inner(z, z) < 1,
        Constant((0, 0)),
        (1 - 1 / (ρ * α * sqrt(inner(z, z)))) * z
    )
    v.project(expr)

    # Step 3: gradient ascent for μ.
    μ.project(μ + grad(q) - v)

    qs.append(q.copy(deepcopy=True))
    vs.append(v.copy(deepcopy=True))
    μs.append(μ.copy(deepcopy=True))

firedrake.adjoint.pause_annotation()
Quasi-Newton Method with Limited-Memory BFGS
Line Search: Cubic Interpolation satisfying Strong Wolfe Conditions
  iter  value          gnorm          snorm          #fval     #grad     ls_#fval  ls_#grad  
  0     9.643037e+00   1.399841e+01   
  1     7.154779e+00   1.335929e+01   6.999207e-01   4         2         3         0         
  2     5.393157e+00   1.204827e+01   2.346359e-01   5         3         1         0         
  ⋮     (iterations 3 through 215 omitted)
  216   7.802457e-01   2.691835e-02   4.692968e-04   224       217       1         0         
  217   7.802447e-01   1.617462e-02   5.732223e-05   225       218       1         0         
Optimization Terminated with Status: Step Tolerance Met
Quasi-Newton Method with Limited-Memory BFGS
Line Search: Cubic Interpolation satisfying Strong Wolfe Conditions
  iter  value          gnorm          snorm          #fval     #grad     ls_#fval  ls_#grad  
  0     1.075368e+00   3.242995e+00   
  1     1.041922e+00   5.667157e+00   3.242995e-02   4         2         3         0         
  2     1.006024e+00   3.333492e+00   1.028643e-02   5         3         1         0         
  3     9.921088e-01   2.751106e+00   8.620322e-03   6         4         1         0         
  4     9.826485e-01   2.359372e+00   6.517574e-03   7         5         1         0         
  5     9.707202e-01   2.902846e+00   1.249302e-02   8         6         1         0         
  ⋮     (iterations 6 through 218 omitted)
  219   6.317638e-01   8.838957e-02   7.000913e-04   224       220       1         0         
  220   6.317473e-01   7.761111e-02   6.982434e-04   225       221       1         0         
  221   6.317351e-01   7.487224e-02   8.205324e-04   226       222       1         0         
  222   6.317285e-01   7.340843e-02   6.847187e-04   227       223       1         0         
  223   6.317223e-01   5.823284e-02   3.517630e-04   228       224       1         0         
  224   6.317159e-01   5.476710e-02   4.360848e-04   229       225       1         0         
  225   6.317109e-01   7.513253e-02   5.922726e-04   230       226       1         0         
  226   6.317066e-01   4.919798e-02   6.081445e-04   231       227       1         0         
  227   6.317008e-01   5.220552e-02   5.264316e-04   232       228       1         0         
  228   6.316996e-01   5.469102e-02   5.709004e-04   233       229       1         0         
  229   6.316990e-01   4.660186e-02   4.566267e-04   234       230       1         0         
  230   6.316965e-01   4.538973e-02   5.660075e-04   235       231       1         0         
  231   6.316933e-01   5.312315e-02   2.974316e-04   236       232       1         0         
  232   6.316911e-01   3.348370e-02   1.479960e-04   237       233       1         0         
  233   6.316894e-01   3.156301e-02   3.319168e-04   238       234       1         0         
  234   6.316878e-01   3.388832e-02   2.505964e-04   239       235       1         0         
  235   6.316868e-01   3.159458e-02   4.464832e-04   240       236       1         0         
  236   6.316863e-01   2.952454e-02   2.348576e-04   241       237       1         0         
  237   6.316847e-01   2.229576e-02   1.127279e-04   242       238       1         0         
  238   6.316832e-01   2.146264e-02   1.645999e-04   243       239       1         0         
  239   6.316824e-01   2.646233e-02   2.337138e-04   244       240       1         0         
  240   6.316818e-01   1.686165e-02   1.604830e-04   245       241       1         0         
  241   6.316812e-01   1.505041e-02   1.682090e-04   246       242       1         0         
  242   6.316807e-01   1.853276e-02   1.431392e-04   247       243       1         0         
  243   6.316800e-01   1.423609e-02   1.517646e-04   248       244       1         0         
  244   6.316795e-01   1.287050e-02   1.039381e-04   249       245       1         0         
  245   6.316790e-01   1.484117e-02   1.131986e-04   250       246       1         0         
  246   6.316786e-01   1.272706e-02   1.245520e-04   251       247       1         0         
  247   6.316784e-01   1.402481e-02   8.142142e-05   252       248       1         0         
Optimization Terminated with Status: Step Tolerance Met
Quasi-Newton Method with Limited-Memory BFGS
Line Search: Cubic Interpolation satisfying Strong Wolfe Conditions
  iter  value          gnorm          snorm          #fval     #grad     ls_#fval  ls_#grad  
  0     5.602561e-01   4.430702e-01   
  ...   (iterations 1-87 omitted)
  88    5.576263e-01   1.825534e-02   4.164730e-05   92        89        1         0         
Optimization Terminated with Status: Step Tolerance Met
Quasi-Newton Method with Limited-Memory BFGS
Line Search: Cubic Interpolation satisfying Strong Wolfe Conditions
  iter  value          gnorm          snorm          #fval     #grad     ls_#fval  ls_#grad  
  0     5.665013e-01   2.063208e-01   
  ...   (iterations 1-47 omitted)
  48    5.652097e-01   3.207642e-02   9.647484e-05   53        49        1         0         
Optimization Terminated with Status: Step Tolerance Met
Quasi-Newton Method with Limited-Memory BFGS
Line Search: Cubic Interpolation satisfying Strong Wolfe Conditions
  iter  value          gnorm          snorm          #fval     #grad     ls_#fval  ls_#grad  
  0     5.744089e-01   1.833979e-01   
  ...   (iterations 1-97 omitted)
  98    5.674428e-01   3.301102e-02   7.445501e-05   102       99        1         0         
Optimization Terminated with Status: Step Tolerance Met
Quasi-Newton Method with Limited-Memory BFGS
Line Search: Cubic Interpolation satisfying Strong Wolfe Conditions
  iter  value          gnorm          snorm          #fval     #grad     ls_#fval  ls_#grad  
  0     5.709818e-01   2.665580e-01   
  ...   (iterations 1-148 omitted)
  149   5.608967e-01   1.802687e-02   7.907516e-05   154       150       1         0         
Optimization Terminated with Status: Step Tolerance Met
Quasi-Newton Method with Limited-Memory BFGS
Line Search: Cubic Interpolation satisfying Strong Wolfe Conditions
  iter  value          gnorm          snorm          #fval     #grad     ls_#fval  ls_#grad  
  0     5.582657e-01   1.925069e-01   
  ...   (iterations 1-31 omitted)
  32    5.576706e-01   2.160823e-02   9.874111e-05   36        33        1         0         
Optimization Terminated with Status: Step Tolerance Met
Quasi-Newton Method with Limited-Memory BFGS
Line Search: Cubic Interpolation satisfying Strong Wolfe Conditions
  iter  value          gnorm          snorm          #fval     #grad     ls_#fval  ls_#grad  
  0     5.586793e-01   1.254662e-01   
  ...   (iterations 1-60 omitted)
  61    5.575752e-01   2.567354e-02   9.159221e-05   66        62        1         0         
Optimization Terminated with Status: Step Tolerance Met
Quasi-Newton Method with Limited-Memory BFGS
Line Search: Cubic Interpolation satisfying Strong Wolfe Conditions
  iter  value          gnorm          snorm          #fval     #grad     ls_#fval  ls_#grad  
  0     5.579681e-01   1.376670e-01   
  ...   (iterations 1-47 omitted)
  48    5.576702e-01   1.765816e-02   9.325744e-05   52        49        1         0         
Optimization Terminated with Status: Step Tolerance Met
Quasi-Newton Method with Limited-Memory BFGS
Line Search: Cubic Interpolation satisfying Strong Wolfe Conditions
  iter  value          gnorm          snorm          #fval     #grad     ls_#fval  ls_#grad  
  0     5.584559e-01   9.494288e-02   
  1     5.584385e-01   7.630859e-02   4.017487e-04   5         2         4         0         
  2     5.584304e-01   8.345819e-02   2.641763e-04   6         3         1         0         
  3     5.584241e-01   5.833427e-02   1.763400e-04   7         4         1         0         
  4     5.584151e-01   5.774724e-02   4.375673e-04   8         5         1         0         
  5     5.584086e-01   7.653317e-02   4.508982e-04   9         6         1         0         
  6     5.584021e-01   6.338211e-02   5.374525e-04   10        7         1         0         
  7     5.583955e-01   4.855761e-02   2.907142e-04   11        8         1         0         
  8     5.583861e-01   5.639752e-02   6.215275e-04   12        9         1         0         
  9     5.583813e-01   8.088708e-02   4.496538e-04   13        10        1         0         
  10    5.583773e-01   4.291415e-02   9.046002e-05   14        11        1         0         
Optimization Terminated with Status: Step Tolerance Met
Quasi-Newton Method with Limited-Memory BFGS
Line Search: Cubic Interpolation satisfying Strong Wolfe Conditions
  iter  value          gnorm          snorm          #fval     #grad     ls_#fval  ls_#grad  
  0     5.594727e-01   7.799980e-02   
  1     5.594658e-01   7.681424e-02   2.025540e-04   5         2         4         0         
  2     5.594604e-01   4.520288e-02   1.466840e-04   6         3         1         0         
  3     5.594562e-01   3.945956e-02   2.073179e-04   7         4         1         0         
  4     5.594519e-01   5.266233e-02   2.565856e-04   8         5         1         0         
  5     5.594446e-01   5.889339e-02   4.464461e-04   9         6         1         0         
  6     5.594346e-01   7.262512e-02   7.230681e-04   10        7         1         0         
  7     5.594252e-01   8.801019e-02   1.111032e-03   11        8         1         0         
  8     5.594148e-01   6.734559e-02   5.867382e-04   12        9         1         0         
  9     5.594054e-01   6.045346e-02   5.618233e-04   13        10        1         0         
  10    5.593961e-01   8.350477e-02   8.112221e-04   14        11        1         0         
  11    5.593879e-01   4.956142e-02   7.291780e-04   15        12        1         0         
  12    5.593846e-01   4.078476e-02   3.606484e-04   16        13        1         0         
  13    5.593808e-01   5.346732e-02   7.049768e-04   17        14        1         0         
  14    5.593765e-01   6.477380e-02   4.651618e-04   18        15        1         0         
  15    5.593719e-01   5.182136e-02   3.976146e-04   19        16        1         0         
  16    5.593673e-01   5.334682e-02   9.546492e-04   20        17        1         0         
  17    5.593626e-01   7.560225e-02   6.688283e-04   21        18        1         0         
  18    5.593577e-01   5.630610e-02   5.772078e-04   22        19        1         0         
  19    5.593533e-01   5.643063e-02   9.951115e-04   23        20        1         0         
  20    5.593495e-01   7.182306e-02   8.125167e-04   24        21        1         0         
  21    5.593469e-01   5.331127e-02   7.731894e-04   25        22        1         0         
  22    5.593417e-01   4.276217e-02   3.202829e-04   26        23        1         0         
  23    5.593381e-01   5.203221e-02   4.680901e-04   27        24        1         0         
  24    5.593347e-01   3.728184e-02   4.514517e-04   28        25        1         0         
  25    5.593307e-01   3.669285e-02   3.627851e-04   29        26        1         0         
  26    5.593272e-01   6.170823e-02   1.413208e-03   30        27        1         0         
  27    5.593244e-01   4.791382e-02   4.417597e-04   31        28        1         0         
  28    5.593227e-01   2.602266e-02   1.402369e-04   32        29        1         0         
  29    5.593218e-01   2.088865e-02   1.394815e-04   33        30        1         0         
  30    5.593211e-01   2.202705e-02   2.247130e-04   34        31        1         0         
  31    5.593209e-01   2.047566e-02   2.256283e-04   36        32        2         0         
  32    5.593209e-01   1.927724e-02   9.692426e-05   39        33        3         0         
Optimization Terminated with Status: Step Tolerance Met
Quasi-Newton Method with Limited-Memory BFGS
Line Search: Cubic Interpolation satisfying Strong Wolfe Conditions
  iter  value          gnorm          snorm          #fval     #grad     ls_#fval  ls_#grad  
  0     5.603279e-01   8.938471e-02   
  1     5.603189e-01   6.491712e-02   2.715498e-04   5         2         4         0         
  2     5.603154e-01   6.508163e-02   2.334216e-04   6         3         1         0         
  3     5.603123e-01   4.577313e-02   1.480712e-04   7         4         1         0         
  4     5.603099e-01   5.048777e-02   2.120154e-04   8         5         1         0         
  5     5.603061e-01   6.367320e-02   3.932201e-04   9         6         1         0         
  6     5.603028e-01   8.048665e-02   4.935821e-04   10        7         1         0         
  7     5.602988e-01   4.594701e-02   2.171550e-04   11        8         1         0         
  8     5.602980e-01   5.169854e-02   4.229400e-04   12        9         1         0         
  9     5.602930e-01   4.548249e-02   2.052410e-04   13        10        1         0         
  10    5.602849e-01   5.231534e-02   6.539559e-04   14        11        1         0         
  11    5.602824e-01   7.350685e-02   4.003177e-04   15        12        1         0         
  12    5.602803e-01   3.602358e-02   7.449563e-05   16        13        1         0         
Optimization Terminated with Status: Step Tolerance Met
Quasi-Newton Method with Limited-Memory BFGS
Line Search: Cubic Interpolation satisfying Strong Wolfe Conditions
  iter  value          gnorm          snorm          #fval     #grad     ls_#fval  ls_#grad  
  0     5.613417e-01   6.403886e-02   
  1     5.613359e-01   6.699672e-02   1.679728e-04   5         2         4         0         
  2     5.613315e-01   3.932849e-02   1.108777e-04   6         3         1         0         
  3     5.613267e-01   3.557532e-02   1.696443e-04   7         4         1         0         
  4     5.613211e-01   5.131123e-02   3.019270e-04   8         5         1         0         
  5     5.613168e-01   6.083608e-02   2.881352e-04   9         6         1         0         
  6     5.613094e-01   4.908436e-02   4.265811e-04   10        7         1         0         
  7     5.613009e-01   7.285660e-02   6.884265e-04   11        8         1         0         
  8     5.612925e-01   7.761328e-02   6.987368e-04   12        9         1         0         
  9     5.612842e-01   6.585311e-02   4.718725e-04   13        10        1         0         
  10    5.612694e-01   7.769539e-02   1.206928e-03   14        11        1         0         
  11    5.612589e-01   8.592217e-02   8.208129e-04   15        12        1         0         
  12    5.612454e-01   6.736984e-02   7.736164e-04   16        13        1         0         
  13    5.612283e-01   6.301179e-02   1.042438e-03   17        14        1         0         
  14    5.612125e-01   7.450786e-02   1.230899e-03   18        15        1         0         
  15    5.611975e-01   7.612093e-02   1.619104e-03   19        16        1         0         
  16    5.611880e-01   8.264325e-02   9.145276e-04   20        17        1         0         
  17    5.611812e-01   4.764783e-02   4.087013e-04   21        18        1         0         
  18    5.611766e-01   3.986307e-02   2.930610e-04   22        19        1         0         
  19    5.611721e-01   4.672971e-02   4.974387e-04   23        20        1         0         
  20    5.611692e-01   5.020174e-02   3.431364e-04   24        21        1         0         
  21    5.611656e-01   3.469544e-02   2.507917e-04   25        22        1         0         
  22    5.611605e-01   3.723557e-02   5.167268e-04   26        23        1         0         
  23    5.611576e-01   3.762584e-02   2.860109e-04   27        24        1         0         
  24    5.611547e-01   3.343568e-02   2.612125e-04   28        25        1         0         
  25    5.611525e-01   3.744914e-02   2.819905e-04   29        26        1         0         
  26    5.611510e-01   2.743625e-02   2.850552e-04   30        27        1         0         
  27    5.611483e-01   3.012766e-02   4.578310e-04   31        28        1         0         
  28    5.611456e-01   3.251921e-02   3.129649e-04   32        29        1         0         
  29    5.611437e-01   2.169555e-02   1.333192e-04   33        30        1         0         
  30    5.611422e-01   2.221321e-02   1.728814e-04   34        31        1         0         
  31    5.611406e-01   2.531675e-02   2.129367e-04   35        32        1         0         
  32    5.611394e-01   2.026923e-02   2.581178e-04   36        33        1         0         
  33    5.611383e-01   1.976094e-02   2.171860e-04   37        34        1         0         
  34    5.611372e-01   2.366452e-02   3.731654e-04   38        35        1         0         
  35    5.611357e-01   2.128490e-02   1.363958e-04   39        36        1         0         
  36    5.611346e-01   1.631791e-02   1.144702e-04   40        37        1         0         
  37    5.611337e-01   1.828666e-02   1.349537e-04   41        38        1         0         
  38    5.611328e-01   1.930851e-02   1.523521e-04   42        39        1         0         
  39    5.611312e-01   4.424640e-02   7.720636e-04   43        40        1         0         
  40    5.611306e-01   2.316886e-02   3.018905e-04   44        41        1         0         
  41    5.611303e-01   1.515467e-02   9.401241e-05   45        42        1         0         
Optimization Terminated with Status: Step Tolerance Met
Quasi-Newton Method with Limited-Memory BFGS
Line Search: Cubic Interpolation satisfying Strong Wolfe Conditions
  iter  value          gnorm          snorm          #fval     #grad     ls_#fval  ls_#grad  
  0     5.621064e-01   7.235609e-02   
  1     5.621002e-01   5.242208e-02   2.209446e-04   5         2         4         0         
  2     5.620993e-01   5.879059e-02   1.914728e-04   6         3         1         0         
  3     5.620982e-01   3.864235e-02   1.096312e-04   7         4         1         0         
  4     5.620949e-01   3.627404e-02   1.660045e-04   8         5         1         0         
  5     5.620915e-01   4.532521e-02   2.816813e-04   9         6         1         0         
  6     5.620875e-01   6.617935e-02   3.479036e-04   10        7         1         0         
  7     5.620825e-01   4.293825e-02   3.267465e-04   11        8         1         0         
  8     5.620778e-01   4.060222e-02   4.475748e-04   12        9         1         0         
  9     5.620741e-01   5.271931e-02   5.492578e-04   13        10        1         0         
  10    5.620715e-01   4.930497e-02   3.836044e-04   14        11        1         0         
  11    5.620687e-01   3.677735e-02   3.231723e-04   15        12        1         0         
  12    5.620659e-01   3.805484e-02   3.344707e-04   16        13        1         0         
  13    5.620638e-01   4.471649e-02   4.302075e-04   17        14        1         0         
  14    5.620609e-01   3.960735e-02   3.340194e-04   18        15        1         0         
  15    5.620600e-01   4.443114e-02   5.350998e-04   20        16        2         0         
  16    5.620596e-01   4.760297e-02   6.465181e-04   21        17        1         0         
  17    5.620560e-01   4.979435e-02   3.114990e-04   22        18        1         0         
  18    5.620536e-01   3.915238e-02   1.967616e-04   23        19        1         0         
  19    5.620524e-01   2.803396e-02   1.740717e-04   24        20        1         0         
  20    5.620509e-01   2.574392e-02   2.395726e-04   25        21        1         0         
  21    5.620491e-01   4.508028e-02   3.752206e-04   26        22        1         0         
  22    5.620475e-01   2.476365e-02   1.804313e-04   27        23        1         0         
  23    5.620463e-01   1.968822e-02   1.505580e-04   28        24        1         0         
  24    5.620457e-01   2.067251e-02   1.601972e-04   29        25        1         0         
  25    5.620454e-01   2.445440e-02   1.498471e-04   31        26        2         0         
  26    5.620454e-01   2.152580e-02   1.810199e-04   33        27        2         0         
  27    5.620448e-01   2.454979e-02   3.237761e-04   34        28        1         0         
  28    5.620446e-01   2.341307e-02   2.553242e-04   35        29        1         0         
  29    5.620441e-01   1.355628e-02   4.202849e-05   36        30        1         0         
Optimization Terminated with Status: Step Tolerance Met
Quasi-Newton Method with Limited-Memory BFGS
Line Search: Cubic Interpolation satisfying Strong Wolfe Conditions
  iter  value          gnorm          snorm          #fval     #grad     ls_#fval  ls_#grad  
  0     5.630775e-01   6.222945e-02   
  1     5.630710e-01   5.247122e-02   2.480069e-04   5         2         4         0         
  2     5.630676e-01   5.372554e-02   1.573669e-04   6         3         1         0         
  3     5.630653e-01   3.761779e-02   1.041340e-04   7         4         1         0         
  4     5.630628e-01   3.172727e-02   1.829048e-04   8         5         1         0         
  5     5.630613e-01   5.200914e-02   4.270347e-04   9         6         1         0         
  6     5.630579e-01   4.973266e-02   1.452540e-04   10        7         1         0         
  7     5.630554e-01   3.209168e-02   1.375445e-04   11        8         1         0         
  8     5.630528e-01   2.761594e-02   2.456504e-04   12        9         1         0         
  9     5.630503e-01   3.767507e-02   4.202661e-04   13        10        1         0         
  10    5.630478e-01   5.054913e-02   3.235570e-04   14        11        1         0         
  11    5.630455e-01   2.563980e-02   9.400150e-05   15        12        1         0         
Optimization Terminated with Status: Step Tolerance Met

The resulting computed value of $q$ does a great job of capturing all of the sharp edges in the true field, despite the fact that we used a penalty parameter of $\rho = 1$. When we used the pure penalty method in the previous demo, we had to take the penalty parameter to be on the order of 400; the inverse solver took much longer to converge and we still missed some of the edges.

import matplotlib.pyplot as plt
fig, axes = plt.subplots()
colors = firedrake.tripcolor(q, axes=axes)
fig.colorbar(colors);

The split variable $v$ matches the gradient of $q$ quite well.

fig, axes = plt.subplots()
colors = firedrake.tripcolor(v, axes=axes)
fig.colorbar(colors);

Finally, let's look at the relative changes in successive iterates of $q$ in the 1-norm at each step in order to get an idea of how fast the method converges.

δs = [
    assemble(abs(q2 - q1) * dx) / assemble(abs(q2) * dx)
    for q1, q2 in zip(qs[:-1], qs[1:])
]
fig, axes = plt.subplots()
axes.set_yscale('log')
axes.set_ylabel('Relative change in $q$')
axes.set_xlabel('Iteration')
axes.plot(δs);

Some iterations seem to hardly advance the solution at all, but when taken in aggregate this looks like the typical convergence of a first-order method.

Discussion

Much like the pure penalty method, the alternating direction method of multipliers offers a way to solve certain classes of non-smooth optimization problems by instead solving a sequence of smooth ones. By introducing an explicit Lagrange multiplier estimate to enforce the consensus constraint, ADMM converges much faster than the pure penalty method, and the penalty parameter does not need to go off to infinity. As a consequence, each of the smooth optimization problems that we have to solve has much better conditioning.
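
As a schematic reminder of the structure of the method: with the consensus constraint $v = \nabla q$, a multiplier estimate $\lambda$, and an augmented Lagrangian $L_\rho$ for that constraint, each outer iteration of ADMM looks like

$$\begin{align} q_{k + 1} & = \underset{q}{\text{argmin}}\; L_\rho(q, v_k, \lambda_k), \\ v_{k + 1} & = \underset{v}{\text{argmin}}\; L_\rho(q_{k + 1}, v, \lambda_k), \\ \lambda_{k + 1} & = \lambda_k + \rho\left(\nabla q_{k + 1} - v_{k + 1}\right). \end{align}$$

The explicit multiplier update in the last line is what lets the constraint be enforced without sending $\rho$ off to infinity.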

For this test case, we were able to take the penalty parameter $\rho$ to be equal to 1 from the outset and still obtain a good convergence rate. For more involved problems it's likely that we would instead have to test for convergence with a given value of $\rho$ and increase it by some factor greater than 1 if need be. Scaling this penalty parameter by an appropriate power of the regularization parameter $\alpha$ ahead of time makes it dimensionless. This property is especially advantageous for realistic problems but it requires you to know something about the objective you're minimizing.

There are obvious grid imprinting artifacts in the solution that we computed. To remedy this undesirable feature, we could use a mesh adaptation strategy that would refine (preferably anisotropically) along any sharp gradients in $q$.

Finally, we motivated ADMM by assuming that we could take an $L^2$-norm difference of $v$ and $\nabla q$. The idealized, infinite-dimensional version of the problem assumes only that $q$ lives in the space $BV(\Omega)$ of functions of bounded variation. The gradient of such a function is a finite, signed Borel measure, and thus may not live in $L^2$ at all. Hintermüller et al. (2014) gives an alternative formulation based on the dual problem, which has the right coercivity properties for Moreau-Yosida regularization to make sense. It's possible that the form I presented here falls afoul of some subtle functional analysis and that the solutions exhibit strong mesh dependence under refinement. Alternatively, it's possible that, while $v$ and $\nabla q$ only live in the space of finite signed measures and thus are not square integrable, their difference $\nabla q - v$ does live in $L^2$. Investigating this more will have to wait for another day.

Rosenbrock schemes

In the previous demo, we looked at a few spatial and temporal discretizations of the nonlinear shallow water equations. One of the challenging parts about solving systems of hyperbolic PDE like the shallow water equations is choosing a timestep that satisfies the Courant-Friedrichs-Lewy condition. You can pick a good timestep ahead of time for a linear autonomous system. A nonlinear system, on the other hand, might wander into weird parts of phase space where the characteristic wave speeds are much higher. You might be able to pick a good timestep from the outset, but it's likely to be overly conservative and waste loads of compute time. The tyranny of the CFL condition is the reason why it's such a common grumble among practitioners that ocean models explode if you look at them sideways.
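
For the shallow water system in particular, the CFL condition says, roughly, that an explicit scheme is stable only if

$$\delta t \lesssim \frac{\delta x}{\max_\Omega\left(|u| + \sqrt{gh}\right)},$$

where $u = q / h$ is the velocity and $|u| + \sqrt{gh}$ is the fastest characteristic wave speed -- the same quantity that shows up in the Lax-Friedrichs flux below.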

All of the timestepping schemes we used in the previous demo were Runge-Kutta methods, which look something like this:

$$z_{n + 1} = z_n + \delta t\cdot \sum_ib_ik_i$$

where $b_i$ are weights and the stages $k_i$ are defined as

$$k_i = f\left(z_n + \delta t\sum_ja_{ij}k_j\right).$$

For the method to be explicit, we would need that $a_{ij} = 0$ if $j \ge i$. You can find all the conditions for a Runge-Kutta method to have a certain order of accuracy in time in books like Hairer and Wanner.
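
As a toy illustration -- plain NumPy, entirely separate from the Firedrake code below, with all names invented for the occasion -- here's a sketch of a generic explicit Runge-Kutta step driven by a Butcher tableau. The tableau shown is the classical RK4 method and the test problem is simple harmonic motion.

import numpy as np

def explicit_rk_step(f, z, dt, a, b):
    # `a` holds the (strictly lower-triangular) coefficients a_ij, `b` the weights b_i
    num_stages = len(b)
    k = np.zeros((num_stages,) + np.shape(z))
    for i in range(num_stages):
        k[i] = f(z + dt * sum(a[i, j] * k[j] for j in range(i)))
    return z + dt * sum(b[i] * k[i] for i in range(num_stages))

# the classical RK4 tableau, applied to simple harmonic motion z'' = -z
a_rk4 = np.array([
    [0.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.0, 0.0],
    [0.0, 0.5, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
])
b_rk4 = np.array([1/6, 1/3, 1/3, 1/6])
z_rk4 = np.array([1.0, 0.0])
dt_rk4 = 2 * np.pi / 1000
for step in range(1000):
    z_rk4 = explicit_rk_step(lambda w: np.array([w[1], -w[0]]), z_rk4, dt_rk4, a_rk4, b_rk4)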

Implicit Runge-Kutta schemes can eliminate many of the frustrating stability issues that occur with explicit schemes. Implicit methods can use timesteps that far exceed the CFL-stable timestep. But they introduce the added complexity of having to solve a nonlinear system at every timestep. What globalization strategy will you use for Newton's method? What preconditioner will you use in solving the associated linear systems? These are all decisions you didn't have to make before. It's possible to reduce some of the pain and suffering by using schemes for which $a_{ii}$ can be nonzero but $a_{ij} = 0$ if $j > i$ -- these are the diagonally-implicit Runge-Kutta schemes. Rather than having to solve one gigantic nonlinear system for all of the stages $k_i$ at once, you only have to solve a sequence of smaller nonlinear systems, one for each stage.

The idea behind Rosenbrock methods is to perform only a single iteration of Newton's method for the nonlinear system defining the Runge-Kutta stages, rather than actually solve that system to convergence. There are two heuristic justifications for Rosenbrock schemes. First, a scheme like implicit Euler is only first-order accurate in time anyway, so there isn't much reason to do a really accurate nonlinear system solve as part of a fairly crude timestepping scheme. Second, for a timestep that isn't too much larger than the characteristic timescale of the problem, the current system state is probably either in the quadratic convergence basin for Newton's method or at least fairly close.

More general Rosenbrock schemes follow from this idea. The best reference I've found is one of the original papers on the subject, Kaps and Rentrop (1979). This paper shows more general schemes in this family, derives the order conditions for the various weights and parameters, and, perhaps most importantly, gives an embedded Rosenbrock scheme that can be used for adaptive timestep control. Here we'll show one of the most basic schemes, which comes from taking a single Newton step for the implicit midpoint rule.
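
To make that concrete: the implicit midpoint rule defines $z_{n + 1}$ through $z_{n + 1} = z_n + \delta t\, f\left(\frac{z_n + z_{n + 1}}{2}\right)$. Taking a single Newton step for this equation, starting from the initial guess $z_{n + 1} = z_n$, gives the linear system

$$\left(I - \frac{\delta t}{2}\frac{\partial f}{\partial z}\bigg|_{z_n}\right)\delta z = \delta t\,f(z_n), \qquad z_{n + 1} = z_n + \delta z,$$

which is exactly the variational problem that the Rosenbrock class below assembles.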

Setup

All of this is copied from the previous demo, so I'll give only cursory explanations.

import firedrake
from firedrake import Constant
g = Constant(9.81)
I = firedrake.Identity(2)

The following functions compute symbolic representations of the various shallow water fluxes.

from firedrake import inner, grad, outer, dx
def cell_flux(z):
    Z = z.function_space()
    h, q = firedrake.split(z)
    ϕ, v = firedrake.TestFunctions(Z)
    
    f_h = -inner(q, grad(ϕ)) * dx

    F = outer(q, q) / h + 0.5 * g * h**2 * I
    f_q = -inner(F, grad(v)) * dx

    return f_h + f_q
from firedrake import avg, outer, dS
def central_facet_flux(z):
    Z = z.function_space()
    h, q = firedrake.split(z)
    ϕ, v = firedrake.TestFunctions(Z)
    
    mesh = z.ufl_domain()
    n = firedrake.FacetNormal(mesh)

    f_h = inner(avg(q), ϕ('+') * n('+') + ϕ('-') * n('-')) * dS

    F = outer(q, q) / h + 0.5 * g * h**2 * I
    f_q = inner(avg(F), outer(v('+'), n('+')) + outer(v('-'), n('-'))) * dS
    
    return f_h + f_q
from firedrake import sqrt, max_value
def lax_friedrichs_facet_flux(z):
    Z = z.function_space()
    h, q = firedrake.split(z)
    ϕ, v = firedrake.TestFunctions(Z)
    
    mesh = h.ufl_domain()
    n = firedrake.FacetNormal(mesh)
    
    c = abs(inner(q / h, n)) + sqrt(g * h)
    α = avg(c)
    
    f_h = -α * (h('+') - h('-')) * (ϕ('+') - ϕ('-')) * dS
    f_q = -α * inner(q('+') - q('-'), v('+') - v('-')) * dS

    return f_h + f_q
def topographic_forcing(z, b):
    Z = z.function_space()
    h = firedrake.split(z)[0]
    v = firedrake.TestFunctions(Z)[1]

    return -g * h * inner(grad(b), v) * dx

For an explicit time discretization and a DG method in space, we can use an ILU preconditioner with a block Jacobi inner preconditioner and this will exactly invert the DG mass matrix.

block_parameters = {
    'ksp_type': 'preonly',
    'pc_type': 'ilu',
    'sub_pc_type': 'bjacobi'
}

parameters = {
    'solver_parameters': {
        'ksp_type': 'preonly',
        'pc_type': 'fieldsplit',
        'fieldsplit_0': block_parameters,
        'fieldsplit_1': block_parameters
    }
}
from firedrake import (
    NonlinearVariationalProblem as Problem,
    NonlinearVariationalSolver as Solver
)

class SSPRK3:
    def __init__(self, state, equation):
        z = state.copy(deepcopy=True)
        dt = firedrake.Constant(1.0)
        
        num_stages = 3
        zs = [state.copy(deepcopy=True) for stage in range(num_stages)]
        Fs = [equation(z), equation(zs[0]), equation(zs[1])]
        
        Z = z.function_space()
        w = firedrake.TestFunction(Z)
        forms = [
            inner(zs[0] - z, w) * dx - dt * Fs[0],
            inner(zs[1] - (3 * z + zs[0]) / 4, w) * dx - dt / 4 * Fs[1],
            inner(zs[2] - (z + 2 * zs[1]) / 3, w) * dx - 2 * dt / 3 * Fs[2]
        ]
        
        problems = [Problem(form, zk) for form, zk in zip(forms, zs)]
        solvers = [Solver(problem, **parameters) for problem in problems]
        
        self.state = z
        self.stages = zs
        self.timestep = dt
        self.solvers = solvers
    
    def step(self, timestep):
        self.timestep.assign(timestep)
        for solver in self.solvers:
            solver.solve()
        self.state.assign(self.stages[-1])

We'll create some auxiliary functions to actually run the simulation and create an animation of it.

from tqdm.notebook import trange

def run_simulation(solver, final_time, num_steps, output_freq):
    hs, qs = [], []
    timestep = final_time / num_steps
    pbar = trange(num_steps)
    for step in pbar:
        if step % output_freq == 0:
            h, q = solver.state.subfunctions
            hmin, hmax = h.dat.data_ro.min(), h.dat.data_ro.max()
            pbar.set_description(f'{hmin:5.3f}, {hmax:5.3f}')
            hs.append(h.copy(deepcopy=True))
            qs.append(q.copy(deepcopy=True))

        solver.step(timestep)
    
    return hs, qs
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
from IPython.display import HTML

def make_animation(hs, b, timestep, output_freq, **kwargs):
    fig, axes = plt.subplots()
    axes.set_aspect('equal')
    axes.set_xlim((0.0, Lx))
    axes.set_ylim((0.0, Ly))
    axes.get_xaxis().set_visible(False)
    axes.get_yaxis().set_visible(False)
    η = firedrake.project(hs[0] + b, hs[0].function_space())
    colors = firedrake.tripcolor(
        hs[0], num_sample_points=1, axes=axes, **kwargs
    )
    
    def animate(h):
        η.project(h + b)
        colors.set_array(η.dat.data_ro[:])

    interval = 1e3 * output_freq * timestep
    animation = FuncAnimation(fig, animate, frames=hs, interval=interval)
    
    plt.close(fig)
    return HTML(animation.to_html5_video())

Rosenbrock scheme

The implementation of the Rosenbrock scheme is fairly similar to the other timestepping methods we've shown before, but we have an extra term in the variational problem describing the linearization of the dynamics. We're also making the initializer take some extra arguments for solver parameters. When we were using explicit schemes, there was really only one sane choice of solver parameters because the matrix we had to invert was just a DG mass matrix. Here, the choice of iterative solvers and preconditioners can become much more involved, as we'll show later.

from firedrake import derivative
class Rosenbrock:
    def __init__(self, state, equation, solver_parameters=None):
        z = state.copy(deepcopy=True)
        F = equation(z)
        
        z_n = z.copy(deepcopy=True)
        Z = z.function_space()
        w = firedrake.TestFunction(Z)

        dt = firedrake.Constant(1.0)

        dF = derivative(F, z, z_n - z)
        problem = Problem(
            inner(z_n - z, w) * dx - dt / 2 * dF - dt * F,
            z_n
        )
        solver = Solver(problem, solver_parameters=solver_parameters)

        self.state = z
        self.next_state = z_n
        self.timestep = dt
        self.solver = solver
        
    def step(self, timestep):
        self.timestep.assign(timestep)
        self.solver.solve()
        self.state.assign(self.next_state)

Demonstration

We'll use the same input data and function spaces as before -- BDFM(2) for the momentum and DG(1) for the thickness.

nx, ny = 32, 32
Lx, Ly = 20., 20.
mesh = firedrake.PeriodicRectangleMesh(
    nx, ny, Lx, Ly, diagonal='crossed'
)

x = firedrake.SpatialCoordinate(mesh)
lx = 5.
y = Constant((lx, lx))
r = Constant(2.5)

h_0 = Constant(1.)
δh = Constant(1/16)
h_expr = h_0 + δh * max_value(0, 1 - inner(x - y, x - y) / r**2)

y = Constant((3 * lx, 3 * lx))
δb = Constant(1/4)
b = δb * max_value(0, 1 - inner(x - y, x - y) / r**2)
DG1 = firedrake.FunctionSpace(mesh, family='DG', degree=1)
BDFM2 = firedrake.FunctionSpace(mesh, family='BDFM', degree=2)
Z = DG1 * BDFM2
z0 = firedrake.Function(Z)
z0.sub(0).project(h_expr - b);
def F(z):
    return (
        cell_flux(z) +
        central_facet_flux(z) +
        lax_friedrichs_facet_flux(z) -
        topographic_forcing(z, b)
    )
SSPRK(3)

To get a baseline solution, we'll use the SSPRK(3) scheme from before.

solver = SSPRK3(z0, F)
final_time = 10.0
timestep = 5e-3
num_steps = int(final_time / timestep)
output_freq = 10
hs, qs = run_simulation(solver, final_time, num_steps, output_freq)
make_animation(
    hs, b, timestep, output_freq, shading='gouraud', vmin=0.96, vmax=1.04
)
energies_ssprk3 = [
    firedrake.assemble(
        0.5 * (inner(q, q) / h + g * (h + b)**2) * dx
    )
    for h, q in zip(hs, qs)
]

fig, axes = plt.subplots()
axes.plot(energies_ssprk3);

So that we have a number to compare against for later, we can calculate the total energy drift from the beginning to the end of the simulation:

energies_ssprk3[-1] - energies_ssprk3[0]
np.float64(0.11940083238732768)
Rosenbrock

Now let's see how the new scheme fares.

solver = Rosenbrock(z0, F)

We can use a much longer timestep than we could with explicit methods.

final_time = 10.0
timestep = 50e-3
num_steps = int(final_time / timestep)
output_freq = 1
hs, qs = run_simulation(solver, final_time, num_steps, output_freq)

A subtle but interesting feature you can see in this animation is that the spurious wave emanating from the bump at the bed has a much smaller magnitude with the Rosenbrock scheme than with any of the explicit schemes.

make_animation(
    hs, b, timestep, output_freq, shading='gouraud', vmin=0.96, vmax=1.04
)

The energy drift is cut by a factor of 5 compared to using an explicit scheme. On top of that, we were able to achieve it using much larger timesteps than were CFL-stable before, and as a consequence the overall time for the simulation is shorter.

energies_rosenbrock = [
    firedrake.assemble(
        0.5 * (inner(q, q) / h + g * (h + b)**2) * dx
    )
    for h, q in zip(hs, qs)
]

fig, axes = plt.subplots()
axes.plot(energies_rosenbrock);
energies_rosenbrock[-1] - energies_rosenbrock[0]
np.float64(0.22949445795711654)

FIXME: the statement above is from the last time I updated this post, around 2022. On a more recent Firedrake install, the SSPRK3 results got better and the Rosenbrock results got worse. I should investigate this and figure out what's different.

Conclusion

In the previous post, we showed some of the difficulties associated with solving the shallow water equations. The two biggest problems we ran into were getting a CFL-stable timestep and controlling energy drift. Rosenbrock schemes almost eliminate the stability problems and, at least in earlier runs of this experiment, substantially reduced the energy drift as well. While they are substantially more expensive per timestep, there are a lot of gains to be had by using a better preconditioner. On top of that, we can gain further efficiencies by approximating the linearized dynamics with a matrix that's easier to invert.

Inverse problems

In previous posts, we've seen how to solve elliptic PDE, sometimes with constraints, assuming we know everything about the coefficients and other input data. Some problems in geophysics and engineering involve going backwards. We have direct measurements of some field that we know is the solution of a PDE, and from that data we want to estimate what the coefficients were. This is what's called an inverse problem. For example, knowing the inflow rate of groundwater and the degree to which the soil and bedrock are porous, we can calculate what the hydraulic head will be by solving the Poisson equation; this is the forward problem. The inverse problem would be to estimate the porosity from measurements of the hydraulic head.

We've already seen many of the techniques that we'll use to solve inverse problems and in this post I'll demonstrate them. Inverse problems can be expressed through PDE-constrained optimization, and the biggest challenge is calculating the gradient of the objective functional with respect to the input parameters. There's a systematic and practical algorithm to do this called the adjoint method. The UFL language for variational forms preserves enough of the high-level semantics of what problem you're solving, and consequently it's possible to generate all of the code necessary to implement the adjoint method solely from the code for the weak form. The package pyadjoint does this and even won a Wilkinson Prize for numerical software. In the following, I'll use pyadjoint to both calculate derivatives and solve optimization problems, but it's instructive to roll your own adjoint method and solvers if you haven't done it before.

The problem

Suppose that the physics we're interested in can be described by the Poisson problem. The field we want to estimate is the conductivity coefficient, and we have measurements of the solution $u$. Rather than solve for the conductivity $K$ itself, I'll instead assume that the field $q$ that we want to infer is the logarithm of the conductivity:

$$K = ke^q,$$

where $k$ is some real constant. The reason for this change of variables is to guarantee that the conductivity is positive, a necessary condition which can be challenging to enforce through other means. For our problem, we'll include some internal sources $f$. By way of boundary conditions, we'll assume that the solution adjusts with some exchange coefficient $h$ to an external field $g$ (these are Robin boundary conditions). The weak form of this equation is

$$\begin{align} \langle F(u, q), v\rangle = & \int_\Omega\left(ke^q\nabla u\cdot\nabla v - fv\right)dx \\ & \qquad\qquad + \int_{\partial\Omega}h(u - g)v\, ds \end{align}$$

I'll assume that we know the sources, external field, and exchange coefficient accurately. The quantity that we want to minimize is the mean-square misfit of the solution $u$ with some observations $u^o$:

$$E(u) = \frac{1}{2}\int_\Omega\left(\frac{u - u^o}{\sigma}\right)^2dx,$$

where $\sigma$ is the standard deviation of the measurement errors in $u^o$. For realistic problems we might want to consider more robust measures of solution quality, like the 1-norm, but for demonstrative purposes the square norm is perfectly fine.

To make our problem as realistic as possible, we'll create a set of synthetic observations that have been polluted with random noise. The presence of noise introduces an additional challenge. The map from the parameters $q$ to the observations $u$ involves solving an elliptic PDE and thus tends to give an output field $u$ that is smoother than the input field $q$. (You can actually write down an analytical form of the linearization of this map that makes the smoothing property evident.) For many practical problems, however, the measurement errors are spatial white noise, which has equal power at all frequencies. If we put white noise through the inverse of a smoothing operator, we'll end up amplifying the high-frequency modes, and the estimated field $q$ will be polluted with spurious oscillations. To remove these unphysical features, we'll also include some metric of how oscillatory the inferred field is, which in our case will be

$$R(q) = \frac{1}{2}\int_\Omega|\nabla q|^2dx.$$

This is called the regularization functional. Depending on the problem you may want to use a different regularization functional, and at the end of this post I'll give an example of when you might want to do that.

All together now

The quantity we want to minimize is the functional

$$J(u, q) = E(u) + \alpha^2 R(q),$$

subject to the constraint that $u$ and $q$ are related by the PDE, which we'll write in abstract form as $F(u, q) = 0$. The parameter $\alpha$ is a length scale that determines how much we want to regularize the inferred field. Making a good choice of $\alpha$ is a bit of an art form best left for another day; in the following demonstration I'll pick a reasonable value and leave it at that. The adjoint method furnishes us with a way to calculate the derivative of $J$, which will be an essential ingredient in any minimization algorithm.

To be more explicit about enforcing those constraints, we can introduce a Lagrange multiplier $\lambda$. We would then seek a critical point of the Lagrangian

$$L(u, q, \lambda) = E(u) + \alpha^2 R(q) + \langle F(u, q), \lambda\rangle.$$

By first solving for $u$ and then for the adjoint state $\lambda$, we can effectively calculate the derivative of our original objective with respect to the parameters $q$. Under the hood, this is exactly what pyadjoint and (more generally) reverse-mode automatic differentiation does. The interface that pyadjoint presents to us hides the existence of a Lagrange multiplier and instead gives us only a reduced functional $\hat J(q)$.
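
Spelling out the standard computation: stationarity of $L$ with respect to $u$ gives the adjoint equation

$$\left(\frac{\partial F}{\partial u}\right)^*\lambda = -\frac{\partial E}{\partial u},$$

and once $u$ and $\lambda$ are known, the derivative of the reduced functional is

$$\frac{d\hat J}{dq} = \alpha^2\frac{\partial R}{\partial q} + \left(\frac{\partial F}{\partial q}\right)^*\lambda.$$

The cost is one linear adjoint solve per gradient evaluation, regardless of the dimension of the discretized parameter $q$.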

Generating the exact data

First, we'll need to make a domain and some synthetic input data, which consist of:

  • the sources $f$
  • the external field $g$
  • the exchange coefficient $h$
  • the true log-conductivity field $q$

We have to be careful about what kind of data we use in order to make the problem interesting and instructive. Ideally, the true log-conductivity field will give a solution that's very different from some kind of blunt, spatially constant initial guess. To do this, we'll first make the external field $g$ a random trigonometric polynomial.

import firedrake
mesh = firedrake.UnitSquareMesh(32, 32, diagonal='crossed')
Q = firedrake.FunctionSpace(mesh, family='CG', degree=2)
V = firedrake.FunctionSpace(mesh, family='CG', degree=2)
import numpy as np
from numpy import random, pi as π
x = firedrake.SpatialCoordinate(mesh)

rng = random.default_rng(seed=1)
def random_fourier_series(std_dev, num_modes, exponent):
    from firedrake import sin, cos
    A = std_dev * rng.standard_normal((num_modes, num_modes))
    B = std_dev * rng.standard_normal((num_modes, num_modes))
    return sum([(A[k, l] * sin(π * (k * x[0] + l * x[1])) +
                 B[k, l] * cos(π * (k * x[0] + l * x[1])))
                / (1 + (k**2 + l**2)**(exponent/2))
                for k in range(num_modes)
                for l in range(int(np.sqrt(num_modes**2 - k**2)))])
g = firedrake.Function(V).interpolate(random_fourier_series(1.0, 6, 1))
import matplotlib.pyplot as plt
firedrake.trisurf(g);

Next, we'll make the medium much more insulating (lower conductivity) near the center of the domain. This part of the medium will tend to soak up any sources much more readily than the rest.

from firedrake import inner, min_value, max_value, Constant
a = -Constant(8.)
r = Constant(1/4)
ξ = Constant((0.4, 0.5))
expr = a * max_value(0, 1 - inner(x - ξ, x - ξ) / r**2)
q_true = firedrake.Function(Q).interpolate(expr)
firedrake.trisurf(q_true);

In order to make the effect most pronounced, we'll stick a blob of sources right next to this insulating patch.

b = Constant(6.)
R = Constant(1/4)
η = Constant((0.7, 0.5))
expr = b * max_value(0, 1 - inner(x - η, x - η) / R**2)
f = firedrake.Function(V).interpolate(expr)
firedrake.trisurf(f);

Once we pick a baseline value $k$ of the conductivity and the exchange coefficient $h$, we can compute the true solution. We'll take the exchange coefficient somewhat arbitrarily to be 10 in this unit system because it makes the results look nice enough.

from firedrake import exp, grad, dx, ds
k = Constant(1.)
h = Constant(10.)
u_true = firedrake.Function(V)
v = firedrake.TestFunction(V)
F = (
    (k * exp(q_true) * inner(grad(u_true), grad(v)) - f * v) * dx +
    h * (u_true - g) * v * ds
)
opts = {
    'solver_parameters': {
        'ksp_type': 'preonly',
        'pc_type': 'lu',
        'pc_factor_mat_solver_type': 'mumps'
    }
}
firedrake.solve(F == 0, u_true, **opts)
firedrake.trisurf(u_true);

The true value of $u$ has a big hot spot in the insulating region, just as we expect.

Generating the observational data

For realistic problems, what we observe is the true solution plus some random noise $\xi$:

$$u_\text{obs} = u_\text{true} + \xi.$$

The ratio of the standard deviation $\sigma$ of the noise to some scale of the solution, e.g. $\max_\Omega u_\text{true} - \min_\Omega u_\text{true}$, determines the degree of accuracy that we can expect in the inferred field.

To make this experiment more realistic, we'll synthesize some observations by adding random noise to the true solution. We'll assume that the noise is spatially white, i.e. the covariance of the measurement errors is

$$\mathbb{E}[\xi(x)\xi(y)] = \sigma^2\delta(x - y)$$

where $\delta$ is the Dirac delta distribution. A naive approach would be to add a vector of normal random variables to the finite element expansion coefficients of the true solution, but this will fail for a subtle reason. Suppose that, at every point, the measurement errors $\xi$ are normal with mean 0 and variance $\sigma^2$. Letting $\mathbb{E}$ denote statistical expectation, we should then have by Fubini's theorem that

$$\mathbb{E}\left[\int_\Omega\xi(x)^2dx\right] = \int_\Omega\mathbb{E}[\xi(x)^2]dx = \sigma^2\cdot|\Omega|.$$

The naive approach to synthesizing the noise will give us the wrong value of the area-averaged variance.

ξ = firedrake.Function(V)
n = len(ξ.dat.data_ro)
ξ.dat.data[:] = rng.standard_normal(n)

firedrake.assemble(ξ**2 * dx)
np.float64(0.6237269211354283)

The "right" thing to do is:

  1. Compute the finite element mass matrix $M$
  2. Compute the Cholesky factorization $M = LL^*$
  3. Generate a standard normal random vector $z$
  4. The finite element expansion coefficients for the noise vector are

$$\hat\xi = \sigma\sqrt{\frac{|\Omega|}{n}}L^{-*}z.$$

You can show that this works out correctly by remembering that

$$\int_\Omega\xi^2dx = \hat\xi^*M\hat\xi.$$
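
Spelling this out: since $M = LL^*$ and $z$ is a standard normal random vector in $\mathbb{R}^n$,

$$\mathbb{E}\left[\int_\Omega\xi^2dx\right] = \mathbb{E}\left[\hat\xi^*M\hat\xi\right] = \sigma^2\frac{|\Omega|}{n}\mathbb{E}\left[z^*L^{-1}(LL^*)L^{-*}z\right] = \sigma^2\frac{|\Omega|}{n}\mathbb{E}\left[z^*z\right] = \sigma^2|\Omega|,$$

which is exactly the area-averaged variance that the naive approach fails to reproduce.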

We'll have to do a bit of hacking with PETSc data structures directly in order to pull out one of the Cholesky factors of the mass matrix.

from firedrake.petsc import PETSc
ϕ, ψ = firedrake.TrialFunction(V), firedrake.TestFunction(V)
m = inner(ϕ, ψ) * dx
M = firedrake.assemble(m, mat_type='aij').M.handle
ksp = PETSc.KSP().create()
ksp.setOperators(M)
ksp.setUp()
pc = ksp.pc
pc.setType(pc.Type.CHOLESKY)
pc.setFactorSolverType(PETSc.Mat.SolverType.PETSC)
pc.setFactorSetUpSolverType()
L = pc.getFactorMatrix()
pc.setUp()

Since our domain is the unit square, it has an area of 1, but for good measure I'll include this just to show the correct thing to do.

area = firedrake.assemble(Constant(1) * dx(mesh))
z = firedrake.Function(V)
z.dat.data[:] = rng.standard_normal(n)
with z.dat.vec_ro as Z:
    with ξ.dat.vec as Ξ:
        L.solveBackward(Z, Ξ)
        Ξ *= np.sqrt(area / n)

The error statistics are within spitting distance of the correct value of 1.

firedrake.assemble(ξ**2 * dx) / area
np.float64(0.9898623684079143)

The answer isn't exactly equal to one, but averaged over a large number of trials or with a larger mesh it will approach it. Finally, we can make the "observed" data. We'll use a signal-to-noise ratio of 50, but it's worth tweaking this value and seeing how the inferred parameters change.

û = u_true.dat.data_ro[:]
signal = û.max() - û.min()
signal_to_noise = 50
σ = firedrake.Constant(signal / signal_to_noise)

u_obs = u_true.copy(deepcopy=True)
u_obs += σ * ξ

The high-frequency noise you can see in the plot below is exactly what makes regularization necessary.

firedrake.trisurf(u_obs);

Calculating derivatives

Now we can import firedrake-adjoint. Under the hood, this will initialize the right data structures to calculate derivatives using the adjoint method, and we can even take a peek at those data structures.

import firedrake.adjoint
firedrake.adjoint.continue_annotation()
True

We'll start with a fairly neutral initial guess that the log-conductivity $q$ is identically 0.

q = firedrake.Function(Q)
u = firedrake.Function(V)
F = (
    (k * exp(q) * inner(grad(u), grad(v)) - f * v) * dx +
    h * (u - g) * v * ds
)
firedrake.solve(F == 0, u, **opts)

The computed solution with a constant conductivity doesn't have the gigantic spike in the insulating region, so it's very easy to tell the two apart. When the differences are this obvious, it's easier to benchmark a putative solution procedure.

firedrake.trisurf(u);

Just to give a sense of how different the initial value of the observed field is from the true value, we can calculate the relative difference in the 2-norm:

print(firedrake.norm(u - u_true) / firedrake.norm(u_true))
0.2985859425270404

Now we can start having some fun with Firedrake's adjoint capabilities. A lot of what we're going to do can seem like magic and I often find it a little bewildering to have no idea what's going on under the hood. Much of this machinery works by overloading functionality within Firedrake and recording operations to a tape. The tape can then in effect be played backwards to perform reverse-mode automatic differentiation. You can access the tape explicitly from the Firedrake adjoint API, which conveniently provides functions to visualise the tape using graphviz or NetworkX. The plot below shows the overall connectivity of the structure of the tape; you can query the nodes using NetworkX to get a better idea of what each one represents. This tape will grow and grow as we calculate more things and it's a common failure mode for an adjoint calculation to eat up all the system memory if you're not careful.

import networkx
tape = firedrake.adjoint.get_working_tape()
graph = tape.create_graph(backend='networkx')
fig, axes = plt.subplots()
networkx.draw_kamada_kawai(graph, ax=axes);

Hopefully this gives you some sense of how all this machinery works at a lower level. For more details you can see the dolfin-adjoint documentation, which has loads of commentary on both the math and the code by its author, Patrick Farrell.

To start on solving the inverse problem, we're going to declare that $q$ is the control variable, i.e. it's the thing that we want to optimize over, as opposed to the field $u$ that we can observe.

q̂ = firedrake.adjoint.Control(q)

Next we'll create the objective functional, which measures both the degree to which our computed solution $u$ differs from the true solution and the oscillations in our guess $q$. Normally, we might create a symbolic variable (a Firedrake Form type) that represents this functional. If we wanted to get an actual number out of this symbolic object, we would then call assemble. So it might stick out as unusual that we're assembling the form right away here.

α = Constant(5e-2)
J = firedrake.assemble(
    0.5 * ((u - u_obs) / σ)**2 * dx +
    0.5 * α**2 * inner(grad(q), grad(q)) * dx
)

In fact there's a bit of magic going on under the hood; J isn't really a floating point number, but a more complex object defined within the pyadjoint package. The provenance of how this number is calculated is tracked through the adjoint tape.

print(type(J))
<class 'pyadjoint.adjfloat.AdjFloat'>

We can get an actual number out of this object by casting it to a float.

print(float(J))
4.975344431060689

The advantage of having this extra layer of indirection is that, as the control variable $q$ changes, so does $J$ and firedrake-adjoint will track the sensitivity under the hood for you. The next step is to somehow wire up this functional with the information that $u$ isn't really an independent variable, but rather a function of the control $q$. This is what the ReducedFunctional class does for us.

Ĵ = firedrake.adjoint.ReducedFunctional(J, q̂)

The reduced functional has a method to calculate its derivative with respect to the control variable.

dĴ_dq = Ĵ.derivative()

This method call is hiding some subtleties that are worth unpacking. The reduced functional $\hat J$ is a differentiable mapping of the function space $Q$ into the real numbers. The derivative $d\hat J/dq$ at a particular value of the control variable is an element of the dual space $Q^*$. As mathematicians, we grow accustomed to thinking of Hilbert spaces as being isometric to their duals. It's easy to forget that isometric does not mean identical; the mapping between primal and dual can be non-trivial. For example, suppose $Q$ is the Sobolev space $H^1(\Omega)$. The dual space $H^{-1}(\Omega)$ is isometric to the primal, but to evaluate the mapping between them, we have to solve an elliptic PDE.

The Sobolev space $H^1(\Omega)$ is a relatively tame one in the grand scheme of things. Real problems might involve controls in Banach spaces with no inner product structure at all. For example, the conductivity coefficient has to be bounded and positive, so we're probably looking in some cone in the space $L^\infty(\Omega)$. In general, conductivity fields can be discontinuous, although not wildly so. We might then want to look in the intersection of $L^\infty$ with the space $BV(\Omega)$ of functions whose first derivatives are finite signed measures.

Nonetheless, the discretization via finite elements can obscure the distinction between the primal and dual spaces. The control $q$ and the derivative $d\hat J/dq$ contain within them a wad of data that happens to look the same: an array of floating point numbers, the size of which is equal to the number of vertices + the number of edges of the mesh for our P2 discretization. What's confusing is that these numbers don't mean the same thing. The array living under $q$ represents its coefficients in the finite element basis for the space $Q$, while the array for $d\hat J/dq$ represents its coefficients in the dual basis. To get the action of $d\hat J/dq$ on some perturbation field $\phi$, we take the (Euclidean) dot product of the wads of data living underneath them. This is in distinct contrast to getting the inner product in, say, $L^2(\Omega)$ of $\phi$ with another function $\psi$, where the inner product is instead calculated using the finite element mass matrix.

So, where does that leave us? We need some way of mapping the dual space $Q^*$ back to the primal. This mapping is referred to in the literature as the Riesz map after the Riesz representation theorem. The laziest way we could possibly do so is to multiply $d\hat J/dq$ by the inverse of the finite element mass matrix. Maybe we should instead use a 2nd-order elliptic operator; we assumed that the controls live in an $H^1$-conforming space. But for illustrative purposes the mass matrix will do fine.

Under the hood, Firedrake automatically applies the mass matrix inverse for you. Let's try and peel back a layer of abstraction here. What if I want access to the raw value of the derivative, which really does live in the dual space? To access that, you can pass another option when you calculate derivatives. We can see the difference in the return types.

print(type(dĴ_dq))
print(type(Ĵ.derivative(options={"riesz_representation": None})))
<class 'firedrake.function.Function'>
<class 'firedrake.cofunction.Cofunction'>

The second object is not a Function but rather a Cofunction, an element of the dual space.

Keeping track of which quantities live in the primal space and which live in the dual space is one of the challenging parts of solving PDE-constrained optimization problems. Most publications on numerical optimization assume the problem is posed over Euclidean space. In that setting, there's no distinction between primal and dual. You can see this bias reflected in software packages that purport to solve numerical optimization problems: almost none of them have support for supplying a matrix other than the identity that defines the dual pairing. The fact that a Sobolev space isn't identical to its dual has some unsettling consequences. For starters, the gradient descent method doesn't make sense over Sobolev spaces. If you can rely on the built-in optimization routines from pyadjoint, you'll largely be insulated from this problem. But if you've read this far, there's a good chance that you'll have to roll your own solvers at some point in your life. To paraphrase the warning at the gate of Plato's Academy, let none ignorant of duality enter there.
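
To make the bookkeeping concrete, here's a rough sketch -- not part of the original demo, with all the variable names invented for illustration -- of what one step of a hand-rolled gradient descent using an $H^1$ Riesz map might look like. It takes the raw derivative in the dual space $Q^*$ and maps it back to $Q$ by solving a linear system with the $H^1$ inner-product matrix.

from firedrake.petsc import PETSc

ϕ, ψ = firedrake.TrialFunction(Q), firedrake.TestFunction(Q)
a_H1 = (inner(ϕ, ψ) + inner(grad(ϕ), grad(ψ))) * dx   # the H1 inner product on Q
A_H1 = firedrake.assemble(a_H1, mat_type='aij').M.handle

ksp_riesz = PETSc.KSP().create()
ksp_riesz.setOperators(A_H1)
ksp_riesz.setType('cg')
ksp_riesz.getPC().setType('jacobi')

dJ = Ĵ.derivative(options={"riesz_representation": None})  # an element of Q*
gradient = firedrake.Function(Q)
with dJ.dat.vec_ro as dJ_vec, gradient.dat.vec as grad_vec:
    ksp_riesz.solve(dJ_vec, grad_vec)                      # Riesz map: Q* -> Q

# one (arbitrarily sized) trial descent step; we leave q itself untouched so
# that the solve below starts from the same initial guess
q_next = firedrake.Function(Q)
q_next.assign(q - Constant(0.1) * gradient)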

Solving the inverse problem

Ok, screed over. Let's do something useful now. The firedrake-adjoint package contains several routines to minimize the reduced objective functional. Here we'll use the Rapid Optimization Library, a sub-package of Trilinos. Let's see how well we can recover the log-conductivity field.

rol_options = {
    "Step": {
        "Type": "Line Search",
        "Line Search": {"Descent Method": {"Type": "Quasi-Newton Step"}},
    },
    "Status Test": {
        "Gradient Tolerance": 1e-4,
        "Step Tolerance": 1e-4,
        "Iteration Limit": 500,
    },
    "General": {
        "Print Verbosity": 0,
        "Secant": {"Type": "Limited-Memory BFGS", "Maximum Storage": 10},
    },
}

inverse_problem = firedrake.adjoint.MinimizationProblem(Ĵ)
inverse_solver = firedrake.adjoint.ROLSolver(
    inverse_problem, rol_options, inner_product="L2"
)
q_opt = inverse_solver.solve()
Quasi-Newton Method with Limited-Memory BFGS
Line Search: Cubic Interpolation satisfying Strong Wolfe Conditions
  iter  value          gnorm          snorm          #fval     #grad     ls_#fval  ls_#grad  
  0     4.975344e+00   6.582629e+00   
  1     4.104155e+00   8.199133e+00   6.582629e-01   3         2         2         0         
  2     3.404251e+00   1.833454e+01   2.304576e-01   4         3         1         0         
  3     3.099951e+00   6.744366e+00   7.305385e-02   6         4         2         0         
  4     3.029697e+00   3.761024e+00   2.488189e-02   7         5         1         0         
  5     2.959139e+00   3.380604e+00   5.514193e-02   8         6         1         0         
  6     2.897303e+00   5.621789e+00   7.281444e-02   9         7         1         0         
  7     2.822914e+00   4.040072e+00   7.902934e-02   10        8         1         0         
  8     2.749733e+00   3.602166e+00   7.183520e-02   11        9         1         0         
  9     2.683305e+00   4.603098e+00   9.295793e-02   12        10        1         0         
  10    2.636852e+00   3.872902e+00   4.458929e-02   13        11        1         0         
  11    2.594263e+00   2.772749e+00   3.218814e-02   14        12        1         0         
  12    2.559857e+00   3.652347e+00   6.459329e-02   15        13        1         0         
  13    2.545504e+00   3.873301e+00   2.885803e-02   16        14        1         0         
  14    2.531511e+00   2.151297e+00   1.167411e-02   17        15        1         0         
  15    2.512647e+00   1.732287e+00   2.402666e-02   18        16        1         0         
  16    2.493983e+00   2.502048e+00   3.415168e-02   19        17        1         0         
  17    2.463296e+00   3.705692e+00   6.612026e-02   20        18        1         0         
  18    2.446190e+00   4.700528e+00   1.047355e-01   21        19        1         0         
  19    2.419338e+00   1.722856e+00   1.969959e-02   22        20        1         0         
  20    2.406269e+00   1.812612e+00   1.456990e-02   23        21        1         0         
  21    2.392068e+00   2.326552e+00   2.526146e-02   24        22        1         0         
  22    2.378897e+00   4.868583e+00   6.208469e-02   25        23        1         0         
  23    2.358154e+00   2.083050e+00   2.750895e-02   26        24        1         0         
  24    2.348105e+00   1.580944e+00   1.677186e-02   27        25        1         0         
  25    2.337611e+00   1.744154e+00   2.387614e-02   28        26        1         0         
  26    2.324023e+00   2.078772e+00   3.451674e-02   29        27        1         0         
  27    2.314619e+00   3.547118e+00   6.152757e-02   30        28        1         0         
  28    2.300081e+00   1.543627e+00   8.616568e-03   31        29        1         0         
  29    2.290756e+00   1.646010e+00   1.324993e-02   32        30        1         0         
  30    2.279534e+00   1.915792e+00   2.570007e-02   33        31        1         0         
  31    2.275079e+00   5.183560e+00   6.011894e-02   34        32        1         0         
  32    2.256911e+00   1.528184e+00   1.588974e-02   35        33        1         0         
  33    2.251933e+00   1.037793e+00   7.938009e-03   36        34        1         0         
  34    2.245828e+00   1.329819e+00   1.883616e-02   37        35        1         0         
  35    2.237707e+00   1.506428e+00   2.419018e-02   38        36        1         0         
  36    2.232216e+00   2.328141e+00   2.654735e-02   40        37        2         0         
  37    2.223981e+00   1.153043e+00   2.752776e-02   41        38        1         0         
  38    2.219046e+00   9.882091e-01   1.437980e-02   42        39        1         0         
  39    2.212375e+00   1.479742e+00   2.492355e-02   43        40        1         0         
  40    2.207450e+00   2.256703e+00   3.198718e-02   44        41        1         0         
  41    2.201515e+00   1.230851e+00   9.547876e-03   45        42        1         0         
  42    2.195141e+00   1.079754e+00   2.144213e-02   46        43        1         0         
  43    2.191252e+00   1.140565e+00   1.741731e-02   47        44        1         0         
  44    2.187177e+00   2.019260e+00   2.756445e-02   49        45        2         0         
  45    2.181434e+00   9.852398e-01   2.760736e-02   50        46        1         0         
  46    2.177967e+00   8.689471e-01   1.450649e-02   51        47        1         0         
  47    2.172445e+00   1.326931e+00   2.727237e-02   52        48        1         0         
  48    2.168306e+00   2.154186e+00   3.680167e-02   53        49        1         0         
  49    2.162921e+00   1.240246e+00   1.069996e-02   54        50        1         0         
  50    2.155701e+00   1.256955e+00   2.650420e-02   55        51        1         0         
  51    2.149636e+00   1.599799e+00   2.932910e-02   56        52        1         0         
  52    2.134263e+00   3.210762e+00   1.020823e-01   57        53        1         0         
  53    2.116000e+00   2.710677e+00   1.194736e-01   58        54        1         0         
  54    2.094611e+00   2.288922e+00   3.691982e-02   59        55        1         0         
  55    2.052839e+00   4.535428e+00   1.570018e-01   60        56        1         0         
  56    2.028576e+00   5.390407e+00   8.026798e-02   61        57        1         0         
  57    1.975270e+00   6.127953e+00   1.052178e-01   62        58        1         0         
  58    1.896740e+00   6.747926e+00   1.234545e-01   63        59        1         0         
  59    1.853679e+00   7.689714e+00   6.525015e-02   65        60        2         0         
  60    1.775895e+00   6.422687e+00   5.000041e-02   67        61        2         0         
  61    1.697854e+00   5.128370e+00   7.969550e-02   68        62        1         0         
  62    1.586493e+00   7.255544e+00   1.047477e-01   69        63        1         0         
  63    1.488954e+00   5.091755e+00   8.159182e-02   70        64        1         0         
  64    1.432592e+00   3.799772e+00   3.031058e-02   71        65        1         0         
  65    1.395463e+00   6.657310e+00   1.051763e-01   72        66        1         0         
  66    1.323481e+00   5.521835e+00   5.875484e-02   73        67        1         0         
  67    1.273267e+00   5.464488e+00   5.027503e-02   74        68        1         0         
  68    1.231977e+00   2.896162e+00   3.091918e-02   75        69        1         0         
  69    1.203903e+00   2.955297e+00   4.076019e-02   76        70        1         0         
  70    1.175580e+00   4.329137e+00   2.421420e-02   77        71        1         0         
  71    1.141048e+00   2.945690e+00   2.918358e-02   78        72        1         0         
  72    1.112611e+00   2.567280e+00   3.708287e-02   79        73        1         0         
  73    1.092111e+00   3.239534e+00   3.026678e-02   80        74        1         0         
  74    1.072205e+00   2.464832e+00   2.316205e-02   81        75        1         0         
  75    1.048340e+00   2.587882e+00   3.605748e-02   82        76        1         0         
  76    1.035829e+00   3.213144e+00   2.440956e-02   83        77        1         0         
  77    1.022998e+00   2.260672e+00   1.192808e-02   84        78        1         0         
  78    1.002493e+00   2.234930e+00   3.672645e-02   85        79        1         0         
  79    9.955890e-01   3.094097e+00   5.079519e-02   86        80        1         0         
  80    9.780270e-01   1.782584e+00   1.253801e-02   87        81        1         0         
  81    9.644715e-01   1.227711e+00   1.778364e-02   88        82        1         0         
  82    9.522476e-01   2.581331e+00   5.037067e-02   89        83        1         0         
  83    9.418739e-01   1.859396e+00   3.056476e-02   90        84        1         0         
  84    9.335170e-01   1.658460e+00   1.144259e-02   91        85        1         0         
  85    9.205583e-01   1.671347e+00   3.515429e-02   92        86        1         0         
  86    9.148372e-01   3.466901e+00   3.078554e-02   93        87        1         0         
  87    9.054329e-01   1.615866e+00   7.897504e-03   94        88        1         0         
  88    8.984115e-01   1.144256e+00   1.966719e-02   95        89        1         0         
  89    8.940375e-01   1.313741e+00   1.324265e-02   96        90        1         0         
  90    8.864384e-01   2.578376e+00   3.995448e-02   97        91        1         0         
  91    8.783536e-01   1.228364e+00   2.558545e-02   98        92        1         0         
  92    8.744475e-01   9.437118e-01   7.492453e-03   99        93        1         0         
  93    8.691686e-01   1.194562e+00   1.995499e-02   100       94        1         0         
  94    8.662082e-01   2.207018e+00   2.487265e-02   101       95        1         0         
  95    8.618325e-01   1.111683e+00   6.323640e-03   102       96        1         0         
  96    8.579859e-01   8.570644e-01   1.435541e-02   103       97        1         0         
  97    8.549364e-01   9.902041e-01   1.496815e-02   104       98        1         0         
  98    8.497474e-01   1.223077e+00   2.985945e-02   105       99        1         0         
  99    8.470611e-01   1.204943e+00   2.001436e-02   107       100       2         0         
  100   8.444828e-01   7.381395e-01   9.137456e-03   108       101       1         0         
  101   8.416830e-01   8.225081e-01   1.249806e-02   109       102       1         0         
  102   8.391645e-01   1.063965e+00   1.488576e-02   110       103       1         0         
  103   8.363372e-01   1.196468e+00   2.498109e-02   111       104       1         0         
  104   8.338076e-01   7.843112e-01   1.341142e-02   112       105       1         0         
  105   8.315499e-01   7.756434e-01   1.235202e-02   113       106       1         0         
  106   8.294470e-01   8.794800e-01   1.327914e-02   114       107       1         0         
  107   8.269589e-01   1.580721e+00   3.190371e-02   115       108       1         0         
  108   8.240537e-01   7.625631e-01   1.228308e-02   116       109       1         0         
  109   8.223473e-01   6.635380e-01   7.750513e-03   117       110       1         0         
  110   8.204024e-01   7.806031e-01   1.376027e-02   118       111       1         0         
  111   8.189751e-01   1.668500e+00   2.793887e-02   119       112       1         0         
  112   8.165372e-01   7.032897e-01   7.601613e-03   120       113       1         0         
  113   8.150667e-01   5.723454e-01   9.069903e-03   121       114       1         0         
  114   8.136089e-01   6.998614e-01   1.251420e-02   122       115       1         0         
  115   8.111043e-01   1.121799e+00   2.433472e-02   123       116       1         0         
  116   8.090906e-01   9.663334e-01   3.423740e-02   124       117       1         0         
  117   8.075289e-01   5.398987e-01   2.982670e-03   125       118       1         0         
  118   8.059373e-01   6.363282e-01   6.941040e-03   126       119       1         0         
  119   8.044784e-01   8.042575e-01   1.269502e-02   127       120       1         0         
  120   8.024809e-01   9.745771e-01   3.256781e-02   128       121       1         0         
  121   8.008560e-01   5.803951e-01   1.445608e-02   129       122       1         0         
  122   7.996492e-01   6.193997e-01   1.192281e-02   130       123       1         0         
  123   7.982751e-01   7.132708e-01   1.527067e-02   131       124       1         0         
  124   7.971247e-01   1.369381e+00   3.117900e-02   132       125       1         0         
  125   7.954892e-01   5.594849e-01   4.832567e-03   133       126       1         0         
  126   7.945991e-01   4.754115e-01   5.081179e-03   134       127       1         0         
  127   7.937026e-01   5.510235e-01   8.993161e-03   135       128       1         0         
  128   7.921704e-01   8.514663e-01   2.102717e-02   136       129       1         0         
  129   7.910503e-01   7.940958e-01   2.959847e-02   137       130       1         0         
  130   7.900820e-01   4.132730e-01   2.622418e-03   138       131       1         0         
  131   7.892365e-01   4.688077e-01   6.187622e-03   139       132       1         0         
  132   7.883827e-01   5.756035e-01   1.126050e-02   140       133       1         0         
  133   7.876485e-01   9.636791e-01   2.755406e-02   141       134       1         0         
  134   7.866620e-01   3.999568e-01   5.409749e-03   142       135       1         0         
  135   7.861849e-01   3.812180e-01   4.158647e-03   143       136       1         0         
  136   7.854771e-01   4.614996e-01   9.754898e-03   144       137       1         0         
  137   7.849466e-01   1.057906e+00   2.075241e-02   145       138       1         0         
  138   7.840589e-01   4.260439e-01   6.543831e-03   146       139       1         0         
  139   7.836122e-01   3.115052e-01   5.007422e-03   147       140       1         0         
  140   7.831666e-01   3.711497e-01   7.254356e-03   148       141       1         0         
  141   7.824766e-01   4.673039e-01   1.214025e-02   149       142       1         0         
  142   7.821948e-01   8.577431e-01   2.283010e-02   150       143       1         0         
  143   7.814527e-01   2.986391e-01   2.905432e-03   151       144       1         0         
  144   7.811577e-01   2.953812e-01   1.729586e-03   152       145       1         0         
  145   7.807049e-01   3.491148e-01   6.019504e-03   153       146       1         0         
  146   7.804933e-01   9.421217e-01   1.537939e-02   154       147       1         0         
  147   7.798531e-01   3.386522e-01   5.398727e-03   155       148       1         0         
  148   7.795847e-01   2.380960e-01   5.437021e-03   156       149       1         0         
  149   7.792769e-01   3.037968e-01   8.741599e-03   157       150       1         0         
  150   7.788447e-01   3.317707e-01   1.054402e-02   158       151       1         0         
  151   7.786174e-01   4.947637e-01   8.608563e-03   160       152       2         0         
  152   7.782499e-01   2.605321e-01   8.351514e-03   161       153       1         0         
  153   7.779957e-01   2.205735e-01   4.813921e-03   162       154       1         0         
  154   7.776604e-01   3.394933e-01   8.233586e-03   163       155       1         0         
  155   7.774468e-01   4.572107e-01   9.023003e-03   164       156       1         0         
  156   7.772108e-01   2.492974e-01   2.145675e-03   165       157       1         0         
  157   7.769722e-01   2.171733e-01   5.178202e-03   166       158       1         0         
  158   7.768125e-01   2.402740e-01   4.959153e-03   167       159       1         0         
  159   7.767177e-01   6.981158e-01   1.408931e-02   168       160       1         0         
  160   7.763748e-01   2.173561e-01   2.358937e-03   169       161       1         0         
  161   7.762651e-01   1.689148e-01   1.202845e-03   170       162       1         0         
  162   7.760988e-01   2.179998e-01   3.939860e-03   171       163       1         0         
  163   7.758928e-01   2.547722e-01   6.489750e-03   172       164       1         0         
  164   7.757554e-01   2.998630e-01   6.798930e-03   174       165       2         0         
  165   7.756060e-01   1.612010e-01   4.981465e-03   175       166       1         0         
  166   7.754904e-01   1.666382e-01   3.546256e-03   176       167       1         0         
  167   7.753549e-01   2.412235e-01   4.813028e-03   177       168       1         0         
  168   7.752432e-01   3.164733e-01   6.677652e-03   178       169       1         0         
  169   7.751252e-01   1.651373e-01   1.333902e-03   179       170       1         0         
  170   7.750190e-01   1.503262e-01   2.496224e-03   180       171       1         0         
  171   7.749418e-01   1.721389e-01   2.787718e-03   181       172       1         0         
  172   7.748649e-01   3.988219e-01   8.238170e-03   182       173       1         0         
  173   7.747275e-01   1.425890e-01   3.101423e-03   183       174       1         0         
  174   7.746768e-01   1.202093e-01   9.853007e-04   184       175       1         0         
  175   7.745945e-01   1.369568e-01   3.001422e-03   185       176       1         0         
  176   7.745235e-01   2.548180e-01   4.653409e-03   186       177       1         0         
  177   7.744421e-01   1.251657e-01   3.817678e-03   187       178       1         0         
  178   7.743915e-01   1.013347e-01   1.885824e-03   188       179       1         0         
  179   7.743261e-01   1.375007e-01   3.015903e-03   189       180       1         0         
  180   7.742911e-01   2.633314e-01   4.154696e-03   190       181       1         0         
  181   7.742314e-01   1.222447e-01   8.878618e-04   191       182       1         0         
  182   7.741890e-01   9.457960e-02   1.457312e-03   192       183       1         0         
  183   7.741553e-01   1.063089e-01   1.731469e-03   193       184       1         0         
  184   7.740902e-01   1.464106e-01   3.684924e-03   194       185       1         0         
  185   7.740798e-01   2.638443e-01   6.055956e-03   195       186       1         0         
  186   7.740170e-01   8.138247e-02   1.168540e-03   196       187       1         0         
  187   7.739986e-01   7.387360e-02   3.544702e-04   197       188       1         0         
  188   7.739622e-01   9.761458e-02   1.647331e-03   198       189       1         0         
  189   7.739438e-01   2.276839e-01   3.804452e-03   199       190       1         0         
  190   7.739032e-01   8.871317e-02   1.234857e-03   200       191       1         0         
  191   7.738840e-01   6.662285e-02   1.394839e-03   201       192       1         0         
  192   7.738632e-01   8.118884e-02   1.933002e-03   202       193       1         0         
  193   7.738310e-01   8.799321e-02   2.768452e-03   203       194       1         0         
  194   7.738157e-01   1.341632e-01   1.725166e-03   205       195       2         0         
  195   7.737905e-01   7.279273e-02   1.952635e-03   206       196       1         0         
  196   7.737729e-01   5.632210e-02   1.121898e-03   207       197       1         0         
  197   7.737553e-01   8.137129e-02   1.420974e-03   208       198       1         0         
  198   7.737410e-01   8.141809e-02   1.355015e-03   209       199       1         0         
  199   7.737283e-01   5.579230e-02   7.587019e-04   210       200       1         0         
  200   7.737132e-01   6.490293e-02   1.544441e-03   211       201       1         0         
  201   7.737056e-01   6.939358e-02   1.155353e-03   212       202       1         0         
  202   7.736983e-01   4.938166e-02   6.006573e-04   213       203       1         0         
  203   7.736847e-01   4.663499e-02   1.820274e-03   214       204       1         0         
  204   7.736789e-01   9.204236e-02   9.369510e-04   215       205       1         0         
  205   7.736716e-01   5.624649e-02   3.808101e-04   216       206       1         0         
  206   7.736620e-01   3.789592e-02   8.979387e-04   217       207       1         0         
  207   7.736576e-01   4.183684e-02   5.307007e-04   218       208       1         0         
  208   7.736469e-01   9.229663e-02   1.972744e-03   219       209       1         0         
  209   7.736381e-01   5.268487e-02   1.574723e-03   220       210       1         0         
  210   7.736335e-01   3.242685e-02   1.939124e-04   221       211       1         0         
  211   7.736274e-01   3.495981e-02   4.608146e-04   222       212       1         0         
  212   7.736220e-01   5.261509e-02   8.218584e-04   223       213       1         0         
  213   7.736151e-01   4.371174e-02   1.408317e-03   224       214       1         0         
  214   7.736098e-01   3.653976e-02   1.012423e-03   225       215       1         0         
  215   7.736052e-01   3.574242e-02   1.016277e-03   226       216       1         0         
  216   7.736021e-01   3.210624e-02   3.500981e-04   227       217       1         0         
  217   7.735985e-01   2.902713e-02   4.221645e-04   228       218       1         0         
  218   7.735961e-01   4.944021e-02   7.139342e-04   229       219       1         0         
  219   7.735934e-01   2.618615e-02   2.272619e-04   230       220       1         0         
  220   7.735913e-01   2.476616e-02   3.217852e-04   231       221       1         0         
  221   7.735890e-01   2.574751e-02   4.844896e-04   232       222       1         0         
  222   7.735872e-01   4.959061e-02   8.566257e-04   233       223       1         0         
  223   7.735848e-01   1.933390e-02   3.226229e-04   234       224       1         0         
  224   7.735836e-01   1.664846e-02   1.285241e-04   235       225       1         0         
  225   7.735820e-01   2.017284e-02   2.706176e-04   236       226       1         0         
  226   7.735811e-01   4.579934e-02   5.470573e-04   237       227       1         0         
  227   7.735793e-01   1.932243e-02   1.710902e-04   238       228       1         0         
  228   7.735781e-01   1.421108e-02   2.371707e-04   239       229       1         0         
  229   7.735772e-01   1.648302e-02   2.863661e-04   240       230       1         0         
  230   7.735757e-01   1.929481e-02   4.525160e-04   241       231       1         0         
  231   7.735748e-01   2.206394e-02   4.401280e-04   243       232       2         0         
  232   7.735739e-01   1.202763e-02   2.374098e-04   244       233       1         0         
  233   7.735733e-01   1.227605e-02   1.533285e-04   245       234       1         0         
  234   7.735726e-01   2.206220e-02   2.894856e-04   246       235       1         0         
  235   7.735719e-01   1.310483e-02   2.275645e-04   247       236       1         0         
  236   7.735713e-01   1.065245e-02   1.682674e-04   248       237       1         0         
  237   7.735707e-01   1.500176e-02   3.172821e-04   249       238       1         0         
  238   7.735703e-01   2.016379e-02   3.429688e-04   250       239       1         0         
  239   7.735699e-01   1.078579e-02   5.487960e-05   251       240       1         0         
Optimization Terminated with Status: Step Tolerance Met
firedrake.trisurf(q_opt);
[Figure: surface plot of the inferred log-conductivity field]

The optimization procedure has correctly identified the drop in the conductivity of the medium to within our smoothness constraints. Nonetheless, it's clear in the eyeball norm that the inferred field doesn't completely match the true one.

firedrake.norm(q_opt - q_true) / firedrake.norm(q_true)
0.28636248010146415

What's a little shocking is the degree to which the computed state matches observations despite these departures. If we plot the computed $u$, it looks very similar to the true value.

q.assign(q_opt)
firedrake.solve(F == 0, u, **opts)
firedrake.trisurf(u);
[Figure: surface plot of the computed state $u$ using the inferred conductivity]

Moreover, if we compute the model-data misfit, weighted by the standard deviation of the measurement errors, we get a value close to 1/2.

firedrake.assemble(0.5 * ((u - u_obs) / σ)**2 * dx)
0.5139410915262156

This value is about what we would expect from statistical estimation theory. Assuming $u$ is an unbiased estimator for the true value of the observable state, the quantity $((u - u^o) / \sigma)^2$ is a $\chi^2$ random variable with one degree of freedom, and its expected value is 1. Integrating over the whole domain and dividing by its area (in this case 1) effectively averages many independent $\chi^2$ variables, so the integral should come out around 1; the factor of 1/2 in the misfit functional then brings the result down to around 1/2.
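As a quick sanity check on that argument, independent of the PDE solve: if the residuals $u - u^o$ really were pure mean-zero Gaussian noise with standard deviation $\sigma$, then averaging $\frac{1}{2}((u - u^o)/\sigma)^2$ over many sample points should land near 1/2. A minimal numpy sketch, separate from the experiment above:

import numpy as np

# Draw a large sample of synthetic residuals with standard deviation σ and
# check that the average of ½ (residual / σ)² comes out close to 1/2.
rng = np.random.default_rng(seed=0)
σ = 0.02
residuals = σ * rng.standard_normal(100_000)
print(np.mean(0.5 * (residuals / σ) ** 2))  # ≈ 0.5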

Recall that we used a measurement error $\sigma$ that was about 2% of the true signal, which is pretty small. You can have an awfully good signal-to-noise ratio and yet only be able to infer the conductivity field to within a relative error of nearly 30%. These kinds of synthetic experiments are invaluable for getting some perspective on how good a result you can expect.
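If you wanted to make that perspective more quantitative, one natural follow-up is to repeat the whole synthetic experiment at several noise levels and record the relative error in the recovered field each time. The sketch below is only schematic: make_observations and solve_inverse_problem are hypothetical wrappers around the data-generation and ROL solve steps shown above, not firedrake functions.

# Hypothetical wrappers (not part of firedrake):
#   make_observations(σ): solve the forward problem with q_true and add
#       mean-zero Gaussian noise of standard deviation σ to get u_obs
#   solve_inverse_problem(u_obs, σ): build the reduced functional for this
#       noise level and run the ROL solver, returning the recovered field
noise_levels = [0.01, 0.02, 0.05, 0.10]
relative_errors = []
for σ in noise_levels:
    u_obs = make_observations(σ)
    q_opt = solve_inverse_problem(u_obs, σ)
    error = firedrake.norm(q_opt - q_true) / firedrake.norm(q_true)
    relative_errors.append(error)

for σ, error in zip(noise_levels, relative_errors):
    print(f"σ = {σ:.2f}: relative error = {error:.3f}")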