The Feldman-Hájek theorem

In other posts, I've touched on the idea of sampling from probability measures defined over spaces of functions. We might be inferring the density of material within the earth from seismic surveys, the elevation of the earth's surface from lidar measurements, or the conductivity of an aquifer from well head measurements. For the sake of definiteness, we'll assume throughout that the function we're looking for is an element of the space $X = L^2(\Omega)$ for some domain $\Omega$. We could consider Sobolev or other spaces but the idea is the same. I'll also tacitly assume that $\Omega$ is regular enough for PDE theory to apply. That usually means that the boundary of $\Omega$ has to be piecewise smooth and have no cusps.

You can think of a random variable with values in $X$, i.e. a function selected at random according to some statistical law, as a stochastic process. I'll use the two terms interchangeably. In the stochastic process literature, it's common to focus on processes that have certain useful statistical properties. For example, we might think of $\Omega$ as some time interval $[0, T]$ and a random function $z(t)$ as a function of time $t$. We'd then like to have some notion of causality. The values of $z(t)$ should depend on the values of $z(s)$ for $s$ coming before $t$ but not after. The Markov property gives us this idea of causality. If $\Omega$ is instead a spatial domain, we might instead be interested in random fields where we specify analytically the covariance between $z(x)$ and $z(y)$ as a function of $x - y$. For example, it's common to assume that the conductivity of an aquifer is a random field whose logarithm has the spatial correlation structure $$\mathbb E[(z(x) - m)(z(y) - m)] = \text{const}\times\exp(-|x - y|^2/\lambda^2).$$ This is all the province of geostatistics. But in the following I'll be focused more on general principles without making any assumptions about the dependence structure of the process.
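To make this concrete, here's a minimal sketch -- with made-up values for the constant, the mean $m$, and the correlation length $\lambda$ -- of how you might draw samples of such a log-conductivity field at finitely many points: build the covariance matrix from the kernel and factor it.

import numpy as np

const, λ, m = 1.0, 0.25, 0.0
x = np.linspace(0.0, 1.0, 201)
# Covariance matrix from the squared-exponential kernel
C = const * np.exp(-((x[:, None] - x[None, :]) / λ)**2)
# A little diagonal jitter keeps the factorization numerically stable
L = np.linalg.cholesky(C + 1e-8 * np.eye(len(x)))
rng = np.random.default_rng(seed=1729)
log_conductivity = m + L @ rng.standard_normal(len(x))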

The normal distribution has a kind of universal character in finite-dimensional statistical inference. We want to generalize the idea of a normal distribution to function spaces -- a Gaussian process. We'll have to think a bit about measure theory. This will lead us by turns to ask about the relation between two distinct normal random variables with values in a function space. This relationship will be summed up in the Feldman-Hájek theorem. It's a doozy. I'll do some numerical experiments to illustrate the theorem.

Normal random variables

A normal random variable $Z$ has a density $$\rho(z) = \frac{\exp\left(-\frac{1}{2}(z - m)^*C^{-1}(z - m)\right)}{\sqrt{(2\pi)^n\det C}}$$ where $m$ is the mean and $C$ the covariance matrix. To talk about normal random variables on function spaces, we'll have to think about things that we usually take for granted. What does it mean to say that a random variable $Z$ has a density at all? In finite dimensions, this means that we can evaluate expectations of any function of $Z$ by evaluating an integral: $$\mathbb E[f(Z)] = \int f(z)\rho(z)\mathrm dz.$$ Fine, but what is $\mathrm dz$? It's a symbolic shorthand for integration with respect to Lebesgue measure. I'll write Lebesgue measure as $\lambda$ for the time being. Let $\mu$ be the distribution of $Z$, i.e. $$\mu(A) = \mathbb P[Z \in A].$$ When we say that $Z$ has a density, we really mean that, for any set $A$ such that $\lambda(A) = 0$, we must have $\mu(A) = 0$ as well. The fancy term for this is that $\mu$ is absolutely continuous with respect to $\lambda$, usually written as $\mu \ll \lambda$. The Radon-Nikodym theorem then tells us that there is a unique nonnegative function $\rho$ such that, for all $f$, $$\int f\,\mathrm d\mu = \int f\,\rho\,\mathrm d\lambda.$$ At a formal level we might say that $\mathrm d\mu = \rho\,\mathrm d\lambda$, hence the notation $$\rho = \frac{\mathrm d\mu}{\mathrm d\lambda}.$$ The usual Lebesgue measure provides a kind of background, reference measure to which other probability measures can be absolutely continuous. Lebesgue measure is not itself a probability measure because the volume of the entire space is infinite. Discrete random variables don't have densities in this sense -- the Lebesgue measure of a single point is zero. They instead have densities with respect to a different measure, for example the counting measure on $\mathbb Z$. The counting measure, which is again not itself a probability measure, fulfills the role of the reference measure to which discrete random variables can be absolutely continuous.
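As a quick sanity check on this way of thinking, the snippet below -- my addition, not part of the original argument -- confirms in one dimension that a Monte Carlo average of $f(Z)$ agrees with integrating $f\rho$ against Lebesgue measure.

import numpy as np
from scipy.integrate import quad

m, σ = 1.0, 2.0
ρ = lambda z: np.exp(-((z - m) / σ)**2 / 2) / np.sqrt(2 * np.pi * σ**2)
f = np.cos

rng = np.random.default_rng(seed=1729)
monte_carlo = f(rng.normal(m, σ, size=10**6)).mean()
integral, error = quad(lambda z: f(z) * ρ(z), -np.inf, np.inf)
print(f"Monte Carlo: {monte_carlo:.4f}, quadrature: {integral:.4f}")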

In infinite dimensions, there is no spoon Lebesgue measure. To be more specific, there's no measure on an infinite-dimensional vector space that is non-trivial, locally finite, and translation-invariant. There is no background reference measure "$\mathrm dz$" to integrate against, so we can't write down a normal density -- a density is defined with respect to a background measure and we don't have one. There are still perfectly well-defined normal random variables and probability measures on function spaces; we just can't describe them through densities.

Not having a measure $\mathrm dz$ or a density $\rho(z)$ to work with doesn't stop physicists from writing functional integrals. I won't use this notation going forward, but I do find it appealing.

To define a normal random variable on a function space, we can instead work with arbitrary finite-dimensional projections. Let $\{g_1, \ldots, g_N\}$ be a finite collection of elements of the dual space $X^*$. We can then define a linear operator $G : X \to \mathbb R^N$ by $$G = \sum_k e_k\otimes g_k$$ where $e_k$ is the $k$-th standard basis vector in $\mathbb R^N$. Given a random variable $Z$ with values in $X$, $$GZ = \left[\begin{matrix}\langle g_1, Z\rangle \\ \vdots \\ \langle g_N, Z\rangle\end{matrix}\right]$$ is a random vector taking values in $\mathbb R^N$. We say that $Z$ is a normal random variable if $GZ$ is normal for all finite choices of $N$ and $G$. The mean of $Z$ is the vector $m$ in $X$ such that $GZ$ has the mean $Gm$ for all linear mappings $G$. Likewise, the covariance $C$ of $Z$ is an element of $X\otimes X$ such that $GZ$ has the covariance matrix $GCG^*$ for all $G$.

Not just any operator can be the covariance of a normal random variable on a function space. It ought to be familiar from the finite-dimensional theory that $C$ has to be symmetric and positive-definite. You might imagine writing down a decomposition for $C$ in terms of its eigenfunctions $\{\psi_n\}$. The eigenfunctions can be chosen orthonormal because $C$ is symmetric. Since the eigenvalues are real and positive, we can write them as the squares $\sigma_n^2$ of a sequence $\{\sigma_n\}$ of real numbers as a mnemonic. I'll do this throughout because it becomes convenient in a number of places. The action of $C$ on a function $\phi$ is then $$C\phi = \sum_n\sigma_n^2\langle\psi_n, \phi\rangle\,\psi_n.$$ In the function space setting, we additionally need the sum of all the eigenvalues to be finite: $$\sum_n\sigma_n^2 < \infty.$$ The fancy term for this is that $C$ has to be of trace class. In finite dimensions we don't have to worry about this condition at all. In function spaces, it's easy to use a covariance operator with $\sigma_n^2 \sim n^{-1}$ by mistake; the trace then diverges like the harmonic series.
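To illustrate -- this check is my addition -- the partial sums below show the trace diverging for $\sigma_n^2 = n^{-1}$ but converging for $\sigma_n^2 = n^{-2}$:

import numpy as np

ns = np.arange(1.0, 10**6 + 1)
for power in [1.0, 2.0]:
    partial_trace = np.sum(ns ** -power)
    print(f"σ_n² = n^(-{power:.0f}): partial trace = {partial_trace:.2f}")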

The Karhunen-Loève expansion

The spectral theorem assures us that the covariance operator of a Gaussian process has to have an eigenvalue factorization. The Karhunen-Loève theorem then shows how we can use that decomposition to understand the stochastic process. It states that, if $\{\xi_n\}$ is an infinite sequence of i.i.d. standard normal random variables, then the random variable $$Z = \sum_n\sigma_n\,\xi_n\,\psi_n$$ has the distribution $N(0, C)$. In other words, if we know the eigendecomposition, we can generate samples from the process. We'll use this property in just a moment to simulate Brownian bridges. The eigenfunctions for the covariance operator of a Brownian bridge are trigonometric functions, so we're simulating the stochastic process by Fourier synthesis.

Rather than start with a covariance operator and then compute its eigenvalue decomposition, the KL expansion also gives us a way to synthesize random variables with useful properties by picking the eigenvalues and eigenfunctions. The covariance operator is then $$C = \sum_n\sigma_n^2\,\psi_n\otimes\psi_n.$$ In some cases, we can compute a closed-form expression for $C$ as an integral operator $$C\phi = \int_\Omega \mathscr C(x, y)\phi(y)\,\mathrm dy.$$ But we don't need an explicit expression for the kernel for the KL expansion to be useful.
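For example -- this check is mine -- the Brownian bridge that we'll simulate below has $\sigma_n^2 = 2/\pi^2n^2$ and $\psi_n = \sin(\pi n x)$, and the series sums to the classical covariance kernel $\mathscr C(x, y) = \min(x, y) - xy$:

import numpy as np
from numpy import pi as π

x, y = 0.3, 0.7
ns = np.arange(1.0, 10**5 + 1)
series = np.sum(2 / (π * ns)**2 * np.sin(π * ns * x) * np.sin(π * ns * y))
print(f"series: {series:.6f}, min(x, y) - xy: {min(x, y) - x * y:.6f}")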

The code below generates some random functions on the domain $\Omega = [0, 1]$. Here we use the basis functions $$\psi_n = \sin(\pi n x).$$ We can then try eigenvalue sequences $\sigma_n = \alpha n^{-\gamma}$ for some power $\gamma$. Any $\gamma > 1/2$ makes $\sum_n\sigma_n^2$ finite.

import numpy as np
from numpy import pi as π
import matplotlib.pyplot as plt

def generate_sample(x, σs, rng):
    # Karhunen-Loève synthesis: draw i.i.d. standard normals ξ_n and
    # form the sum Σ_n σ_n ξ_n sin(πnx)
    ξs = rng.standard_normal(size=len(σs))
    return sum(
        σ * ξ * np.sin(π * n * x)
        for n, (σ, ξ) in enumerate(zip(σs, ξs), start=1)
    )

num_points = 512
num_modes = num_points // 2
x = np.linspace(0.0, 1.0, num_points + 1)
ns = np.arange(1.0, num_modes + 1)
rng = np.random.default_rng(seed=1729)

The plot below compares two function-valued random variables with different eigenvalue sequences. The plot on the top shows a classic Brownian bridge: $\sigma_n^2 = 2 / \pi^2n^2$. The plot on the bottom shows $\sigma_n^2 = 2 / \pi^2n^{2\gamma}$ where $\gamma = 1.6$. The second stochastic process is much smoother, which we expect -- the Fourier coefficients decay faster.

γ_1, γ_2 = 1.0, 1.6
σs_1 = np.sqrt(2) / (π * ns ** γ_1)
σs_2 = np.sqrt(2) / (π * ns ** γ_2)

num_samples = 128
ws = np.stack([generate_sample(x, σs_1, rng) for k in range(num_samples)])
zs = np.stack([generate_sample(x, σs_2, rng) for k in range(num_samples)])
fig, axes = plt.subplots(nrows=2, ncols=1, sharex=True, sharey=True)
for ax in axes:
    ax.set_axis_off()

for w, z in zip(ws, zs):
    axes[0].plot(x, w, color="tab:blue", alpha=0.25)
    axes[1].plot(x, z, color="tab:orange", alpha=0.25)

The Feldman-Hájek theorem

Now consider two $X$-valued normal random variables $W$ and $Z$. To simplify things, we'll assume that both of them have zero mean. The plots above might suggest that these two stochastic processes behave differently in some essential way that we can quantify. For example, the first process is a Brownian bridge. A Brownian bridge is continuous with probability 1, but it is nowhere differentiable. The second process, on the other hand, is almost surely differentiable. We can be a little more formal about this and say that $$\mathbb P[W \in H^1(\Omega)] = 0, \quad \mathbb P[Z \in H^1(\Omega)] = 1$$ where $H^1(\Omega)$ is the space of functions with square-integrable derivatives. The two random variables are not mutually absolutely continuous. That might not be too surprising. We picked the eigenvalue distributions of each process to have different decay rates: $$\sigma_n^2(W) \sim n^{-2}, \quad \sigma_n^2(Z) \sim n^{-2\gamma}.$$ Just to really hammer this point home, let's evaluate an approximation to the mean-square derivative of each group of samples.

δx = 1 / num_points

Dw = np.diff(ws, axis=1) / δx
energies_w = np.sum((Dw * Dw) * δx, axis=1)
print(f"∫|∇w|² dx = {energies_w.mean():.1f} ± {energies_w.std():.1f}")

Dz = np.diff(zs, axis=1) / δx
energies_z = np.sum((Dz * Dz) * δx, axis=1)
print(f"∫|∇z|² dx = {energies_z.mean():.1f} ± {energies_z.std():.1f}")
∫|∇w|² dx = 240.2 ± 21.5
∫|∇z|² dx = 3.8 ± 1.5

If you run this notebook with an even finer discretization of the unit interval, you'll see that $\int|\nabla w|^2\mathrm dx$ keeps increasing while the value of $\int|\nabla z|^2\mathrm dx$ stabilizes.
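Here's a sketch of that refinement experiment, re-using generate_sample and the Brownian bridge spectrum from above:

for n_pts in [512, 1024, 2048, 4096]:
    n_modes = n_pts // 2
    x_fine = np.linspace(0.0, 1.0, n_pts + 1)
    ns_fine = np.arange(1.0, n_modes + 1)
    σs_fine = np.sqrt(2) / (π * ns_fine ** γ_1)
    samples = np.stack(
        [generate_sample(x_fine, σs_fine, rng) for k in range(num_samples)]
    )
    Du = np.diff(samples, axis=1) * n_pts
    energies = np.sum(Du * Du / n_pts, axis=1)
    print(f"{n_pts:5d} points: ∫|∇w|² dx ≈ {energies.mean():.1f}")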

What conditions guarantee that the laws of two function-valued random variables are mutually absolutely continuous? In other words, we want to know when it is impossible for there to be some subset $A$ of $X$ such that there is positive probability that $W$ is in $A$ but zero probability that $Z$ is in $A$, or vice versa. What needs to be true about the covariance operators of the two random variables to fulfill this condition? This is the content of the Feldman-Hájek theorem.

The calculation above suggests that two normal distributions will concentrate on different sets if the ratios $\sigma_n^2(W)/\sigma_n^2(Z)$ of the eigenvalues of their covariance operators can go to either zero or infinity. The example above used two eigenvalue sequences that look like $n^{-\alpha}$ for different values of $\alpha$. The gap in the decay rates lets us construct a quadratic functional -- the mean square gradient -- that takes finite values on the process $Z$ and infinite values on the process $W$, and this in turn implies the concentration of the two probability measures on different sets. Since the mean square gradient of $W$ is infinite, $W$ is not differentiable with probability 1, while $Z$ is differentiable with probability 1. Put another way, the probability distribution for $Z$ is concentrated in the set of differentiable functions while the probability distribution for $W$ assigns that set zero probability.

We can always construct an operator like this when the decay rates differ, although the effect might be subtler. For example, the distinction might be in having a higher-order derivative or not.

What if the ratio of the eigenvalue sequences doesn't go to zero or infinity? Are the laws of the two processes equivalent, or can they still concentrate on different sets? We can make it even simpler by supposing that $W \sim N(0, C)$ and $Z \sim N(0, \alpha C)$ for some positive $\alpha$. We'll show that even these random variables concentrate on different sets. I got this example from MathOverflow. Let $\{\sigma_n^2\}$ be the eigenvalues of $C$ (note the square) and $\{\psi_n\}$ the eigenfunctions. Define the sequence of random variables $$U_n = \langle \psi_n, W\rangle / \sigma_n.$$ We know from the KL expansion that the $U_n$ are independent, identically-distributed standard normal random variables. If we also define $$V_n = \langle\psi_n, Z\rangle / \sigma_n$$ then these random variables are i.i.d. normals with mean zero and variance $\alpha$.

Now remember that $W$ and $Z$ take values in a function space and so there are infinitely many modes to work with. The strong law of large numbers then implies that $$\frac{1}{N}\sum_{n=1}^NU_n^2 \to 1$$ as $N \to \infty$ with probability 1. This would not be possible in a finite-dimensional vector space -- the sum would have to terminate at some fixed $N$, and the average of the $U_n^2$ would be a random variable with some non-trivial spread around its mean, able to take any positive value with non-zero probability. That probability could be minute, but it is still positive. Likewise, we also find that $$\frac{1}{N}\sum_{n=1}^NV_n^2 \to \alpha$$ with probability 1. But we've now shown that the two probability measures concentrate on different sets as long as $\alpha$ is not equal to 1: the average of the squared expansion coefficients takes distinct values, each with probability 1, under the distributions of $W$ and $Z$. So the two distributions are mutually singular.
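We can watch this happen numerically without building any functions at all, since the expansion coefficients are just independent scalar normals. In the snippet below (my addition), the running averages settle down to 1 and to $\alpha = 4$, the value corresponding to the doubled spectrum we're about to plot:

import numpy as np

α = 4.0
gen = np.random.default_rng(seed=1729)
for N in [10, 100, 1000, 10000]:
    Us = gen.standard_normal(size=N)
    Vs = np.sqrt(α) * gen.standard_normal(size=N)
    print(f"N = {N:5d}: mean U² = {np.mean(Us**2):.3f}, mean V² = {np.mean(Vs**2):.3f}")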

Just to illustrate things a bit, the plot below shows samples obtained with a spectrum $\{\sigma_n\}$ generated as before with $\gamma = 1.5$, and samples with the doubled spectrum $\{2\sigma_n\}$.

γ = 1.5
σs = np.sqrt(2) / (π * ns ** γ)
ws = np.stack([generate_sample(x, σs, rng) for k in range(num_samples)])
zs = np.stack([generate_sample(x, 2 * σs, rng) for k in range(num_samples)])
fig, axes = plt.subplots(nrows=2, ncols=1, sharex=True, sharey=True)
for ax in axes:
    ax.set_axis_off()

for w, z in zip(ws, zs):
    axes[0].plot(x, w, color="tab:blue", alpha=0.25)
    axes[1].plot(x, z, color="tab:orange", alpha=0.25)

What this example shows is that it's not enough that $\sigma_n(W)$ and $\sigma_n(Z)$ have the same decay rate. They both decay like some constant $\times$ $n^{-\gamma}$ for the same value of $\gamma$. The two distributions are mutually singular even if the rates are the same but the constants are different.

At this point you might wonder if it's enough to have $$\sigma_n(W)/\sigma_n(Z) \to 1.$$ Even this much tighter condition is not enough!

The Feldman-Hájek theorem states that, in order for $W$ and $Z$ to not concentrate on different sets (assuming, as in our examples, that their covariance operators share the same eigenfunctions), we need that $$\sum_n\left(\frac{\sigma_n(W)}{\sigma_n(Z)} - 1\right)^2 < \infty.$$ Not only does the eigenvalue ratio have to go to 1, it has to go to 1 fast enough for the sum to converge. For example, suppose we had a pair of $X$-valued random variables such that $$\frac{\sigma_n(W)}{\sigma_n(Z)} - 1 \sim \frac{\text{const}}{\log n}.$$ The relative error between the two eigenvalue sequences decreases to zero, but not fast enough, so the two random variables would still concentrate on different sets.
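To see how slowly the $1/\log n$ case fails, here are the partial sums (my addition) for that ratio sequence next to a summable one:

import numpy as np

ns = np.arange(2.0, 10**6)
for name, rs in [("1/log n", 1 / np.log(ns)), ("1/n", 1 / ns)]:
    print(f"ratio - 1 = {name}: Σ (ratio - 1)² = {np.sum(rs**2):.2f}")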

Why does this matter? I stumbled on the Feldman-Hájek theorem while trying to understand two papers, Beskos et al. (2008) and Cotter et al. (2013). Both of these papers are about how to do Markov-chain Monte Carlo sampling of probability distributions defined over function spaces. The main ingredient of MCMC is a proposal mechanism, which generates a new candidate sample $z$ from the current sample $w$. We can then also define a reverse proposal -- what is the probability of proposing $w$ if we instead started from $z$? Tierney (1998) showed that, in order for an MCMC method to converge, the forward and reverse proposals need to be mutually absolutely continuous: they cannot concentrate probability on different sets. In finite dimensions, it's so hard to violate this condition that it often goes unstated. In function spaces, on the other hand, you're almost guaranteed to violate it unless you're careful. Even for the nicest class of distributions (normal random variables), the spectra of the covariance operators have to have nearly identical asymptotic behavior. In light of that, it's not surprising that MCMC sampling in function spaces is so challenging.

Plate theory

In this post we'll look at a fourth-order equation describing the vibrations of clamped plates. Fourth-order problems are much harder to discretize than second-order ones. Here we'll get around that by using the dual or mixed form of the problem. This will use some of the same techniques that we've already seen with the Stokes equations. But it won't be a perfectly conforming discretization, and as a consequence we'll need some jump terms that ought to be familiar from the discontinuous Galerkin method. Finally, we'll compute some eigenfunctions of the plate operator. This will look similar to what we did with Weyl's law but with an extra mathematical subtlety.

Our starting point is linear elasticity. Here we want to solve for the 3D displacement $u$ of the medium. The first equation is momentum conservation: $$\rho\ddot u = \nabla\cdot\sigma + f$$ where $\sigma$ is the stress tensor and $f$ the body forces. To close the system, we need to supply a constitutive equation relating the stress tensor and the strain tensor $$\varepsilon = \frac{1}{2}\left(\nabla u + \nabla u^*\right).$$ The most general linear constitutive equation we can write down is $$\sigma = \mathscr C\,\varepsilon$$ where $\mathscr C$ is the rank-4 elasticity tensor. We need that $\mathscr C$ maps symmetric tensors to symmetric tensors and that $\varepsilon:\mathscr C\varepsilon$ is always positive. The equilibrium displacement of the medium is the minimizer of the energy functional $$J(u) = \int_\Omega\left(\frac{1}{2}\sigma:\varepsilon + f\cdot u\right)\mathrm dx + \ldots$$ where the ellipses stand for various boundary forcings that we'll ignore.

For a medium that is homogeneous and isotropic, the elasticity tensor has to have the form $$\mathscr C\,\varepsilon = 2\,\mu\,\varepsilon + \lambda\,\text{tr}(\varepsilon)I$$ where $\mu$ and $\lambda$ are the Lamé parameters. As an aside, there are a mess of alternate forms of the elasticity equations. The wiki page has a conversion table at the bottom. Now take this with a grain of salt because I do fluid mechanics. But if I were a solid mechanics kinda guy, this would embarrass me. Get it together folks.

Plate theory is what you get when you assume the medium is thin along the vertical dimension and that this restricts the form that the displacements can take. The first and most widely agreed-upon simplification is that the vertical displacement is some function $w$ of the horizontal coordinates $x$, $y$: $$u_z = w(x, y).$$ From here we have to make additional assumptions about the horizontal displacements $u_x$ and $u_y$. Different theories make different sets of assumptions. The classical theory is the Kirchhoff-Love plate. The Kirchhoff theory assumes that any straight line that's perpendicular to the middle of the plate remains straight and perpendicular after deformation. This theory has some deficiencies; see for example this paper. But it's a good starting point for experimentation. These assumptions let us write down the other components of the deformation in terms of $w$: $$u_x = -z\frac{\partial w}{\partial x}, \quad u_y = -z\frac{\partial w}{\partial y}.$$ It's possible that $u_x$ and $u_y$ are also offset by in-plane displacements. I'll assume that the boundary conditions make those equal to zero. We can then work out what the displacement gradient is: $$\nabla u = \left[\begin{matrix}-z\nabla^2 w & -\nabla w \\ +\nabla w^* & 0 \end{matrix}\right]$$ I'm being a little loose in my use of the gradient operator -- it's 3D on the left-hand side of this equation and 2D on the right. Notice how the antisymmetric part of the displacement gradient is all in the $x-z$ and $y-z$ components. When we symmetrize the displacement gradient, we get the strain tensor: $$\varepsilon = -z\left[\begin{matrix}\nabla^2 w & 0 \\ 0 & 0\end{matrix}\right].$$ Now remember that the medium is a thin plate. We can express the spatial domain as the product of a 2D footprint domain $\Phi$ and the interval $[-h / 2, +h / 2]$ where $h$ is the plate thickness. The strain energy is then $$\begin{align} J(w) & = \int_\Phi\int_{-h/2}^{+h/2}\left(\mu z^2|\nabla^2 w|^2 + \frac{\lambda}{2} z^2|\Delta w|^2 + fw\right)\mathrm dz\;\mathrm dx\\ & = \int_\Phi\left\{\frac{h^3}{24}\left(2\mu|\nabla ^2 w|^2 + \lambda|\Delta w|^2\right) + hfw\right\}\mathrm dx \end{align}.$$ Here I've used the fact that $\text{tr}(\nabla^2w) = \Delta w$ where $\Delta$ is the Laplace operator. In a gross abuse of notation I'll write this as $$J(w) = \int_\Omega\left(\frac{1}{2}\mathscr C\nabla^2w :\nabla^2 w + fw\right)h\,\mathrm dx$$ where $\mathscr C$ is an elasticity tensor. We'll need the explicit form $$\mathscr C\kappa = \frac{h^2}{12}\left(2\,\mu\,\kappa + \lambda\,\text{tr}(\kappa)\,I\right)$$ in a moment.
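The through-thickness integral is an easy place to slip up, so here's a one-line check with sympy that $\int_{-h/2}^{+h/2}z^2\,\mathrm dz = h^3/12$, which is where the factors of $h^3/24$ come from:

import sympy

z, h = sympy.symbols("z h", positive=True)
print(sympy.integrate(z**2, (z, -h / 2, h / 2)))  # prints h**3/12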

We can't discretize this problem right away using a conventional finite element basis. The usual piecewise polynomial basis functions are continuous across cell edges, but their derivatives are not. A conforming basis for a minimization problem involving 2nd-order derivatives would instead need to be continuously differentiable, and it's much harder to come up with $C^1$ bases.

We have a few ways out.

  1. Use a $C^1$ basis like Argyris or Hsieh-Clough-Tocher. This approach makes forming the problem easy, but applying boundary conditions hard. Kirby and Mitchell (2018) implemented the Argyris element in Firedrake but had to use Nitsche's method to enforce the boundary conditions.
  2. Use $C^0$ elements and discretize the second derivatives using an interior penalty formulation of the problem. This approach is analogous to discretizing a 2nd-order elliptic problem using DG elements. Forming the problem is harder but applying the boundary conditions is easier. Bringmann et al. (2023) work out the exact form of the interior penalty parameters necessary to make the discrete form work right.
  3. Use the mixed or dual form of the problem, which introduces the moment tensor explicitly as an unknown. Discretize it with the Hellan-Herrmann-Johnson or HHJ element. Arnold and Walker (2020) describe this formulation of the problem in more detail.

In the following, I'll take the third approach. The dual form of the problem introduces the moment tensor, which in another gross abuse of notation I'll write as $\sigma$, explicitly as an unknown. We then add the constraint that $\sigma = \mathscr C\nabla^2 w$. But switching to the dual form of the problem inverts constitutive relations, so we instead enforce $$\nabla^2w = \mathscr A\sigma$$ where $\mathscr A$ is the inverse to $\mathscr C$. If $\mathscr C$ is like the elasticity tensor in 3D, then $\mathscr A$ is like the compliance tensor. Because we'll need it later, the explicit form of the compliance tensor is $$\mathscr A\sigma = \frac{6}{h^2\mu}\left(\sigma - \frac{\lambda}{2(\mu + \lambda)}\text{tr}(\sigma)I\right).$$ The full Lagrangian for the dual form is $$L(w, \sigma) = \int_\Omega\left(\frac{1}{2}\mathscr A\sigma : \sigma - \sigma : \nabla^2 w - fw\right)h\,\mathrm dx.$$ Here the displacement $w$ acts like a multiplier enforcing the constraint that $\nabla\cdot\nabla\cdot\sigma + f = 0$.
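Since it's easy to botch the inversion of the constitutive relation, here's a symbolic check (my addition, again with sympy) that applying $\mathscr A$ to $\mathscr C\kappa$ recovers $\kappa$:

import sympy

h, μ, λ = sympy.symbols("h μ λ", positive=True)
κ_xx, κ_xy, κ_yy = sympy.symbols("κ_xx κ_xy κ_yy")
I = sympy.eye(2)
κ = sympy.Matrix([[κ_xx, κ_xy], [κ_xy, κ_yy]])

σ = h**2 / 12 * (2 * μ * κ + λ * κ.trace() * I)
𝒜σ = 6 / (h**2 * μ) * (σ - λ / (2 * (μ + λ)) * σ.trace() * I)
print(sympy.simplify(𝒜σ - κ))  # prints the zero matrix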

To get a conforming discretization of the dual problem, we'd still need the basis functions for the displacements to have continuous derivatives. That's exactly what we were trying to avoid by using the dual form in the first place. At the same time, we don't assume any continuity properties of the moments. We can work around this problem by relaxing the continuity requirements for the displacements while strengthening them for the moments. The Hellan-Herrmann-Johnson element discretizes the space of symmetric tensor fields with just enough additional regularity to make this idea work. The HHJ element has normal-normal continuity: the quantity $n\cdot\sigma n$ is continuous across cell boundaries. The tangential and mixed components are unconstrained. This extra continuity is enough to let us use the conventional Lagrange finite elements for the displacements as long as we add a correction term to the Lagrangian: $$\ldots + \sum_\Gamma\int_\Gamma (n\cdot\sigma n)\,\left[\!\!\left[\frac{\partial w}{\partial n}\right]\!\!\right]h\,\mathrm dS$$ where $\Gamma$ are all the edges of the mesh and the double square bracket denotes the jump of a quantity across a cell boundary. What is especially nice about the HHJ element is that this correction is all we need to add; there are no mesh-dependent penalty factors.

Experiments

First let's try and solve a plate problem using simple input data. We'll set all the physical constants equal to 1, but you can look these up for whatever material you fancy.

import firedrake
from firedrake import Constant, inner, tr, dot, grad, dx, ds, dS, avg, jump
import ufl
f = Constant(1.0)
h = Constant(1.0)
μ = Constant(1.0)
λ = Constant(1.0)

The code below creates the dual form of the problem using the formula I wrote down above for the explicit form of the compliance tensor. Lord Jesus I hope I did all that algebra correct.

def form_hhj_lagrangian(z, f, h, μ, λ):
    mesh = ufl.domain.extract_unique_domain(z)
    w, σ = firedrake.split(z)
    I = firedrake.Identity(2)
    n = firedrake.FacetNormal(mesh)

    # The compliance tensor 𝒜 applied to the moment tensor σ
    𝒜σ = 6 / (h**2 * μ) * (σ - λ / (2 * (μ + λ)) * tr(σ) * I)

    L_cells = (0.5 * inner(𝒜σ, σ) - inner(σ, grad(grad(w))) + f * w) * h * dx
    L_facets = avg(inner(n, dot(σ, n))) * jump(grad(w), n) * h * dS
    L_boundary = inner(n, dot(σ, n)) * inner(grad(w), n) * h * ds
    return L_cells + L_facets + L_boundary

The right degrees are $p + 1$ for the displacements and $p$ for the moments.

p = 1
cg = firedrake.FiniteElement("CG", "triangle", p + 1)
hhj = firedrake.FiniteElement("HHJ", "triangle", p)

Here we'll work on the unit square. An interesting feature of plate problems is that they don't become much easier to solve analytically on the unit square by separation of variables because of the mixed derivatives. The eigenfunctions of the biharmonic operator (we'll get to them below) are easier to derive on the circle than the square.

n = 64
mesh = firedrake.UnitSquareMesh(n, n, diagonal="crossed")
Q = firedrake.FunctionSpace(mesh, cg)
Σ = firedrake.FunctionSpace(mesh, hhj)
Z = Q * Σ
z = firedrake.Function(Z)
L = form_hhj_lagrangian(z, f, h, μ, λ)
F = firedrake.derivative(L, z)

For plate problems, we almost always have some Dirichlet boundary condition $$w|_{\partial\Omega} = w_0.$$ Since the problem is fourth-order, we need to supply more boundary conditions than we do for, say, the diffusion equation. There are two kinds. The first is the clamped boundary condition: $$\frac{\partial w}{\partial n} = 0.$$ The second is the simply-supported boundary condition: $$n\cdot\sigma n = 0.$$ Before I started writing this I was dreading how I was going to deal with either one. I was shocked at how easy it was to get both using the HHJ element. First, let's see what happens if we only enforce zero displacement at the boundary.

bc = firedrake.DirichletBC(Z.sub(0), 0, "on_boundary")
firedrake.solve(F == 0, z, bc)
w, σ = z.subfunctions
n = firedrake.FacetNormal(mesh)
dw_dn = inner(grad(w), n)
σ_nn = inner(dot(σ, n), n)

boundary_slope = firedrake.assemble(dw_dn ** 2 * ds)
boundary_stress = firedrake.assemble(σ_nn ** 2 * ds)
print(f"Integrated boundary slope:  {boundary_slope}")
print(f"Integrated boundary moment: {boundary_stress}")
Integrated boundary slope:  2.2011469910858315e-10
Integrated boundary moment: 0.004669439781744687

So it looks like the clamped boundary condition is natural with HHJ elements. We can confirm that by looking at a 3D plot.

import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
fig, ax = plt.subplots(subplot_kw={"projection": "3d"})
firedrake.trisurf(w, axes=ax);

Now let's see what happens if we apply a Dirichlet boundary condition to the moment part of the solution.

simply_supported_bc = firedrake.DirichletBC(Z.sub(1), 0, "on_boundary")
firedrake.solve(F == 0, z, bcs=[bc, simply_supported_bc])
w, σ = z.subfunctions
n = firedrake.FacetNormal(mesh)
dw_dn = inner(grad(w), n)
σ_nn = inner(dot(σ, n), n)

boundary_slope = firedrake.assemble(dw_dn ** 2 * ds)
boundary_stress = firedrake.assemble(σ_nn ** 2 * ds)
print(f"Integrated boundary slope:  {boundary_slope}")
print(f"Integrated boundary moment: {boundary_stress}")
Integrated boundary slope:  0.006015973798724048
Integrated boundary moment: 0.0

Now the boundary stresses are zero and so the boundary slopes are unconstrained.

fig, ax = plt.subplots(subplot_kw={"projection": "3d"})
firedrake.trisurf(w, axes=ax);

The reason why this works is that the boundary degrees of freedom for the HHJ element are the normal-normal stress components. So if we set all of those to zero, we get simply-supported boundary conditions. This is analogous to the simpler setting of $H(\text{div})$ elements for vector fields having normal continuity. We can easily enforce boundary conditions like $u\cdot n = 0$ by setting the right degrees of freedom to zero.
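As a sketch of that simpler setting -- this block is just an illustration on my part and isn't used elsewhere in the post -- zeroing the boundary degrees of freedom of a Raviart-Thomas space enforces exactly $u\cdot n = 0$:

rt = firedrake.FiniteElement("RT", "triangle", 1)
V = firedrake.FunctionSpace(mesh, rt)
# The boundary DOFs of an H(div)-conforming element are normal components,
# so this constrains u·n = 0 without touching the tangential part.
no_flux_bc = firedrake.DirichletBC(V, Constant((0.0, 0.0)), "on_boundary")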

We have to work a lot harder to get the boundary conditions right using other finite element bases. For example, the Argyris element has both value and derivative degrees of freedom at the mesh vertices. We can't set the boundary values alone in the same way. In order to set the boundary values with Argyris elements, Kirby and Mitchell had to use Nitsche's method. The Hsieh-Clough-Tocher (HCT) element makes it easy to use clamped boundary conditions but simply-supported is harder. The fact that both types are relatively easy with HHJ is a definite advantage, although it has more total degrees of freedom than HCT.

Eigenfunctions

The eigenfunctions of the plate problem, also known as Chladni patterns, have a fascinating history. This review is good reading. Here I'll show the first few eigenfunctions in a square with simply-supported boundary conditions.

A = firedrake.derivative(F, z)

w, σ = firedrake.split(z)
J = 0.5 * w**2 * dx
M = firedrake.derivative(firedrake.derivative(J, z), z)

bcs = [bc, simply_supported_bc]
problem = firedrake.LinearEigenproblem(A, M, bcs=bcs, restrict=True)

Note the $M$ matrix that we supplied above. We've looked at generalized eigenvalue problems before, like in the post on Weyl's law. In that setting we were solving eigenvalue problems of the form $$A\phi = \lambda M\phi$$ where $M$ was some symmetric positive-definite matrix instead of the identity. In those cases we always used the mass matrix. For saddle-point eigenvalue problems, we're often using something that is no longer definite. Here the form we want is instead $$\left[\begin{matrix} A & B^* \\ B & 0\end{matrix}\right]\left[\begin{matrix}\sigma \\ w\end{matrix}\right] = \lambda\left[\begin{matrix} 0 & 0 \\ 0 & M\end{matrix}\right]\left[\begin{matrix}\sigma \\ w\end{matrix}\right]$$ where $M$ is the mass matrix for the displacement block.

num_values = 40
opts = {
    "solver_parameters": {
        "eps_gen_hermitian": None,
        "eps_target_real": None,
        "eps_smallest_real": None,
        "st_type": "sinvert",
        "st_ksp_type": "gmres",
        "st_pc_type": "lu",
        "st_pc_factor_mat_solver_type": "mumps",
        "eps_tol": 1e-8,
    },
    "n_evals": num_values,
}
eigensolver = firedrake.LinearEigensolver(problem, **opts)
num_converged = eigensolver.solve()
print(f"Number of converged eigenfunctions: {num_converged}")
Number of converged eigenfunctions: 42

The plot below shows the nodal lines of the eigenfunctions. Many of them come in pairs due to the axis-flipping symmetry. These are the patterns that so fascinated Chladni and others.

ϕs = [eigensolver.eigenfunction(n)[0].sub(0) for n in range(num_values)]
fig, axes = plt.subplots(nrows=5, ncols=8, sharex=True, sharey=True)
for ax in axes.flatten():
    ax.set_aspect("equal")
    ax.set_axis_off()

levels = [-10, 0, 10]
for ax, ϕ in zip(axes.flatten(), ϕs):
    firedrake.tricontour(ϕ, levels=levels, axes=ax)

Mantle convection

This code is taken from a chapter in The FEniCS Book, which is available for free online. That chapter is in turn an implementation of a model setup from van Keken et al (1997). I took the thermomechanical parts and removed the chemistry. This paper and the code from the FEniCS book all use a non-dimensional form of the equations, which I've adopted here.

In this notebook, we'll see the Stokes equations again, but we'll couple them to the evolution of temperature through an advection-diffusion equation. The extensive quantity is not the temperature itself but the internal energy density $E = \rho c_p T$ where $\rho$, $c_p$ are the mass density and the specific heat at constant pressure. The flux of heat is $$F = \rho c_p T u - k\nabla T.$$ We'll assume there are no heat sources, but the real physics would include sources from decay of radioactive elements, strain heating, and chemical reactions. The variational form of the heat equation is: $$\int_\Omega\left\{\partial_t(\rho c_p T)\,\phi - (\rho c_p Tu - k\nabla T)\cdot\nabla\phi - Q\phi\right\}dx = 0$$ for all test functions $\phi$. I've written this in such a way that it still makes sense when the density and heat capacity aren't constant. We'll assume they are (mostly) constant in the following. The variational form of the momentum balance equation is $$\int_\Omega\left\{2\mu\,\varepsilon(u): \varepsilon(v) - p\nabla\cdot v - q\nabla\cdot u - \rho g\cdot v\right\}dx = 0$$ for all velocity and pressure test functions $v$, $q$.

The characteristic that makes this problem interesting is that we assume the flow is just barely compressible. This is called the Boussinesq approximation. In practice, it means we do two apparently contradictory things. First, we assume that the flow is incompressible: $$\nabla\cdot u = 0.$$ Second, we assume that the density that we use on the right-hand side of the momentum balance equation actually is compressible, and that the degree of compression is linear in the temperature: $$\rho = \rho_0(1 - \beta(T - T_0))$$ where $\rho_0$ is a reference density and $\beta$ is the thermal expansion coefficient. These assumptions aren't consistent with each other. We're also not consistent in using this everywhere. For example, we only use this expression on the right-hand side of the momentum balance equation, but we'll still take the internal energy density to be equal to $\rho_0c_pT$ in the energy equation. In other words, the material is compressible as far as momentum balance is concerned but not energy balance. I find that uncomfortable. But I would find writing a full compressible solver to be blinding agony. Anyway it lets us make pretty pictures.

I won't go into a full non-dimensionalization of the problem, which is detailed in the van Keken paper. The most important dimensionless numbers for this system are the Prandtl number and the Rayleigh number. The Prandtl number quantifies the ratio of momentum diffusivity to thermal diffusivity: $$\text{Pr} = \frac{\mu / \rho}{k / \rho c_p} = \frac{\nu}{\kappa}$$ where $\nu$ and $\kappa$ are respectively the kinematic viscosity and thermal diffusivity. For mantle flows, the Prandtl number is on the order of $10^{25}$. We can get a sense for a characteristic velocity scale from the momentum balance equation. First, we can absorb a factor of $\rho_0g$ into the pressure. If we take $U$, $L$, and $\Delta T$ to be the characteristic velocity, length, and temperature difference scales, non-dimensionalizing the momentum balance equation gives $$\frac{\mu U}{L^2} \approx \rho_0g\beta\Delta T$$ which then implies that $U = \rho_0g\beta L^2\Delta T / \mu$. Now we can compute the thermal Péclet number $$\text{Pe} = \frac{UL}{\kappa}.$$ When the Péclet number is large, we're in the convection-dominated regime; when it's small, we're in the diffusion-dominated regime. Taking our estimate of velocity scale from before, we get $$\text{Pe} = \frac{\rho_0 g \beta L^3 \Delta T}{\kappa\mu}.$$ This is identical to the Rayleigh number, which we'll write as $\text{Ra}$. Lord Rayleigh famously used linear stability analysis to show that a fluid that is heated from below and cooled from above is unstable when the Rayleigh number exceeds a critical value of roughly 658. Here we'll take the Rayleigh number to be equal to $10^6$ as in the van Keken paper. It's a worthwhile exercise in numerical analysis to see what it takes to make a solver go for higher values of the Rayleigh number.
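As a rough sanity check on that number -- the parameter values here are generic order-of-magnitude figures I'm assuming, not ones from the van Keken paper -- we can evaluate the formula for Ra:

ρ0 = 3300.0   # density, kg/m³
g = 9.8       # gravitational acceleration, m/s²
β = 3e-5      # thermal expansion coefficient, 1/K
L = 2.9e6     # depth of the mantle, m
ΔT = 1500.0   # temperature contrast, K
κ = 1e-6      # thermal diffusivity, m²/s
μ = 1e21      # dynamic viscosity, Pa·s
print(f"Ra ≈ {ρ0 * g * β * L**3 * ΔT / (κ * μ):.1e}")

This lands around $10^7$, so the value of $10^6$ that we'll use is on the modest end of the mantle-relevant range.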

Here we're using a rectangular domain. Again, the spatial scales have been non-dimensionalized.

import numpy as np
from numpy import pi as π
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
from IPython.display import HTML
from tqdm.notebook import trange, tqdm
import firedrake
from firedrake import (
    Constant, sqrt, exp, min_value, max_value, inner, sym, grad, div, dx
)
import irksome
from irksome import Dt

Lx, Ly = Constant(2.0), Constant(1.0)
ny = 32
nx = int(float(Lx / Ly)) * ny
mesh = firedrake.RectangleMesh(
    nx, ny, float(Lx), float(Ly), diagonal="crossed"
)

The initial condition for temperature involves loads of annoyingly long expressions, but it's all in Appendix A of the van Keken paper.

def clamp(z, zmin, zmax):
    return min_value(Constant(zmax), max_value(Constant(zmin), z))

def switch(z):
    return exp(z) / (exp(z) + exp(-z))

Ra = Constant(1e6)

ϵ = Constant(1 / nx)
x = firedrake.SpatialCoordinate(mesh)

q = Lx**(7 / 3) / (1 + Lx**4)**(2 / 3) * (Ra / (2 * np.sqrt(π)))**(2/3)
Q = 2 * sqrt(Lx / (π * q))
T_u = 0.5 * switch((1 - x[1]) / 2 * sqrt(q / (x[0] + ϵ)))
T_l = 1 - 0.5 * switch(x[1] / 2 * sqrt(q / (Lx - x[0] + ϵ)))
T_r = 0.5 + Q / (2 * np.sqrt(π)) * sqrt(q / (x[1] + 1)) * exp(-x[0]**2 * q / (4 * x[1] + 4))
T_s = 0.5 - Q / (2 * np.sqrt(π)) * sqrt(q / (2 - x[1])) * exp(-(Lx - x[0])**2 * q / (8 - 4 * x[1]))
expr = T_u + T_l + T_r + T_s - Constant(1.5)

degree = 1
temperature_space = firedrake.FunctionSpace(mesh, "CG", degree)
T_0 = firedrake.Function(temperature_space).interpolate(clamp(expr, 0, 1))
T = T_0.copy(deepcopy=True)

The important point is to make the fluid hotter below and on one side and cooler above and on the other side, shown in the plot below.

def subplots():
    fig, axes = plt.subplots()
    axes.set_aspect("equal")
    axes.set_axis_off()
    axes.set_xlim(0, float(Lx))
    axes.set_ylim(0, float(Ly))
    return fig, axes

fig, axes = subplots()
firedrake.tripcolor(T, cmap="inferno", axes=axes);

Next we need to make some function spaces for the fluid velocity and pressure. Note how the degree of the velocity space is one higher than that of the pressure -- we're using the Taylor-Hood element again.

pressure_space = firedrake.FunctionSpace(mesh, "CG", 1)
velocity_space = firedrake.VectorFunctionSpace(mesh, "CG", 2)
Z = velocity_space * pressure_space

Once we've created a function in the mixed space, we can then pull out the two parts with the split function.

z = firedrake.Function(Z)
u, p = firedrake.split(z)

The code below creates the variational form of the Stokes equations $$\nabla\cdot\tau - \nabla p - \text{Ra}\,T\,g = 0$$ with temperature-dependent gravitational forcing $f = -\text{Ra}\,T\,g$, where $g$ is the unit vector pointing downward, so that hot fluid is buoyant.

μ = Constant(1)
ε = lambda u: sym(grad(u))

v, q = firedrake.TestFunctions(z.function_space())

τ = 2 * μ * ε(u)
g = Constant((0, -1))
f = -Ra * T * g
F = (inner(τ, ε(v)) - q * div(u) - p * div(v) - inner(f, v)) * dx

We can use the .sub method to pull parts out of mixed spaces, which we need in order to create the right boundary conditions.

bc = firedrake.DirichletBC(Z.sub(0), Constant((0, 0)), "on_boundary")

A bit of magic to tell the linear solver that the Stokes equations have a null space -- the constant pressures -- that we need to project out.

basis = firedrake.VectorSpaceBasis(constant=True, comm=firedrake.COMM_WORLD)
nullspace = firedrake.MixedVectorSpaceBasis(Z, [Z.sub(0), basis])

Here I'm creating a solver object, since we'll need to solve the same system many times.

stokes_problem = firedrake.NonlinearVariationalProblem(F, z, bc)
parameters = {
    "nullspace": nullspace,
    "solver_parameters": {
        "ksp_type": "preonly",
        "pc_type": "lu",
        "pc_factor_mat_solver_type": "mumps",
    },
}
stokes_solver = firedrake.NonlinearVariationalSolver(stokes_problem, **parameters)
stokes_solver.solve()
fig, axes = subplots()
firedrake.streamplot(
    z.sub(0), axes=axes, resolution=1/40, cmap="inferno", seed=1729
);

Here we're setting up the temperature solver, which includes both convection and diffusion. I've set the timestep based on the mesh size and the maximum speed that we just found above from the velocity solution.

ρ, c, k = Constant(1), Constant(1), Constant(1)
δx = mesh.cell_sizes.dat.data_ro[:].min()
# Max over all velocity DOF components -- a cheap stand-in for the max speed
umax = z.sub(0).dat.data_ro[:].max()
δt = Constant(δx / umax)

ϕ = firedrake.TestFunction(temperature_space)
f = ρ * c * T * u - k * grad(T)
G = (ρ * c * Dt(T) * ϕ - inner(f, grad(ϕ))) * dx

lower_bc = firedrake.DirichletBC(temperature_space, 1, [3])
upper_bc = firedrake.DirichletBC(temperature_space, 0, [4])
bcs = [lower_bc, upper_bc]

method = irksome.BackwardEuler()
temperature_solver = irksome.TimeStepper(G, method, Constant(0.0), δt, T, bcs=bcs)
print(f"Timestep: {float(δt):0.03g}")
Timestep: 0.000132

And the timestepping loop. Note that the final time is on a non-dimensional scale again; in physical time it works out to be on the order of a hundred million years. Here we're using an operator splitting approach: we first update the temperature, and then we compute a new velocity and pressure. The splitting error goes like $\mathscr{O}(\delta t)$ as the timestep is reduced. So if we use a first-order integration scheme like backward Euler, the splitting error is asymptotically the same as that of the discretization itself. We could get $\mathscr{O}(\delta t^3)$ or higher convergence from the individual solves by using, say, the Radau-IIA method, but the total error would be dominated by splitting, so there would be little point in trying harder.

final_time = 0.25
num_steps = int(final_time / float(δt))
Ts = [T.copy(deepcopy=True)]
zs = [z.copy(deepcopy=True)]

for step in trange(num_steps):
    temperature_solver.advance()
    stokes_solver.solve()

    Ts.append(T.copy(deepcopy=True))
    zs.append(z.copy(deepcopy=True))

The movie below shows the temperature evolution. The rising and sinking plumes of hot and cold fluid correspond to the mantle plumes that produce surface volcanism.

%%capture
fig, axes = subplots()
kw = {"num_sample_points": 4, "vmin": 0, "vmax": 1, "cmap": "inferno"}
colors = firedrake.tripcolor(Ts[0], axes=axes, **kw)
fn_plotter = firedrake.FunctionPlotter(mesh, num_sample_points=4)
animate = lambda T: colors.set_array(fn_plotter(T))
animation = FuncAnimation(fig, animate, frames=tqdm(Ts), interval=1e3/30)
HTML(animation.to_html5_video())
z = zs[-1]
u, p = z.subfunctions
fig, axes = subplots()
firedrake.streamplot(
    u, axes=axes, resolution=1/40, cmap="inferno", seed=1729
);

What next

There are a few features in the van Keken paper and otherwise that I didn't attempt here. The original van Keken paper adds chemistry. Solving the associated species transport equation with as little numerical diffusion as possible is hard. You could then make the fluid buoyancy depend on chemical composition and add temperature-dependent chemical reactions.

Real mantle fluid also has a temperature-dependent viscosity. When I tried to add this using the splitting scheme above, the solver quickly diverged. Getting that to work might require a fully coupled time-integration scheme: rather than split up the temperature and the velocity/pressure solves, a fully coupled scheme would solve for all three fields at once. I did try this, but whether it would converge or not seemed to depend on the machine and the day of the week. If you could make it work, it would open up the possibility of using higher-order methods like Radau-IIA. I'll explore that more in a future post.

Kármán vortices

In previous posts, I looked at how to discretize the incompressible Stokes equations. The Stokes equations are a good approximation when the fluid speed is small enough that inertial effects are negligible. The relevant dimensionless number is the ratio $$\text{Re} = \frac{\rho UL}{\mu},$$ the Reynolds number. Stokes flow applies when the Reynolds number is substantially less than 1. The incompressibility constraint adds a new difficulty: we have to make good choices of finite element bases for the velocity and pressure. If we fail to do that, the resulting linear systems can have no solution or infinitely many.

Here I'll look at how we can discretize the full Navier-Stokes equations: $$\frac{\partial}{\partial t}\rho u + \nabla\cdot\rho u\otimes u = -\nabla p + \nabla\cdot \tau$$ where the deviatoric stress tensor is $\tau = 2\mu\dot\varepsilon$. The inertial terms are nonlinear, which makes this problem more difficult yet than the Stokes equations.

The goal here will be to simulate the famous von Kármán vortex street.

Making the initial geometry

First, we'll make a domain consisting of a circle punched out of a box. The fluid flow in the wake of the circle will produce vortices.

import gmsh
gmsh.initialize()
import numpy as np
from numpy import pi as π

Lx = 6.0
Ly = 2.0
lcar = 1 / 16

gmsh.model.add("chamber")
geo = gmsh.model.geo
ps = [(0, 0), (Lx, 0), (Lx, Ly), (0, Ly)]
box_points = [geo.add_point(*p, 0, lcar) for p in ps]
box_lines = [
    geo.add_line(i1, i2) for i1, i2 in zip(box_points, np.roll(box_points, 1))
]

for line in box_lines:
    geo.add_physical_group(1, [line])

f = 1 / 3
c = np.array([f * Lx, Ly / 2, 0])
center = geo.add_point(*c)
r = Ly / 8
num_circle_points = 16
θs = np.linspace(0.0, 2 * π, num_circle_points + 1)[:-1]
ss = np.column_stack((np.cos(θs), np.sin(θs), np.zeros(num_circle_points)))
tie_points = [geo.add_point(*(c + r * s), lcar) for s in ss]
circle_arcs = [
    geo.add_circle_arc(p1, center, p2)
    for p1, p2 in zip(tie_points, np.roll(tie_points, 1))
]

geo.add_physical_group(1, circle_arcs)

outer_curve_loop = geo.add_curve_loop(box_lines)
inner_curve_loop = geo.add_curve_loop(circle_arcs)
plane_surface = geo.add_plane_surface([outer_curve_loop, inner_curve_loop])
geo.add_physical_group(2, [plane_surface])
geo.synchronize()
gmsh.model.mesh.generate(2)
gmsh.write("chamber.msh")
Info    : Meshing 1D...
Info    : [  0%] Meshing curve 1 (Line)
Info    : [ 10%] Meshing curve 2 (Line)
Info    : [ 20%] Meshing curve 3 (Line)
Info    : [ 20%] Meshing curve 4 (Line)
Info    : [ 30%] Meshing curve 5 (Circle)
Info    : [ 30%] Meshing curve 6 (Circle)
Info    : [ 40%] Meshing curve 7 (Circle)
Info    : [ 40%] Meshing curve 8 (Circle)
Info    : [ 50%] Meshing curve 9 (Circle)
Info    : [ 50%] Meshing curve 10 (Circle)
Info    : [ 60%] Meshing curve 11 (Circle)
Info    : [ 60%] Meshing curve 12 (Circle)
Info    : [ 70%] Meshing curve 13 (Circle)
Info    : [ 70%] Meshing curve 14 (Circle)
Info    : [ 80%] Meshing curve 15 (Circle)
Info    : [ 80%] Meshing curve 16 (Circle)
Info    : [ 90%] Meshing curve 17 (Circle)
Info    : [ 90%] Meshing curve 18 (Circle)
Info    : [100%] Meshing curve 19 (Circle)
Info    : [100%] Meshing curve 20 (Circle)
Info    : Done meshing 1D (Wall 0.00711856s, CPU 0.003696s)
Info    : Meshing 2D...
Info    : Meshing surface 1 (Plane, Frontal-Delaunay)
Info    : Done meshing 2D (Wall 0.139491s, CPU 0.135442s)
Info    : 4047 nodes 8113 elements
Info    : Writing 'chamber.msh'...
Info    : Done writing 'chamber.msh'
import firedrake
mesh = firedrake.Mesh("chamber.msh")
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.set_aspect("equal")
firedrake.triplot(mesh, axes=ax)
ax.legend(loc="upper right");

Initial velocity

We'll use a fixed inflow velocity $$u_x = 4 \frac{y}{L_y}\left(1 - \frac{y}{L_y}\right) u_{\text{in}}.$$ We'll take as characteristic length scale the radius of the disc $L_y / 8$. In order to see the effect we want, we'll need a Reynolds number that's on the order of 100 or greater.
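Plugging in the values used below ($\rho = 1$, $u_{\text{in}} = 5$, $\mu = 10^{-2}$) with that length scale puts us comfortably in this regime:

radius = 2.0 / 8
print(f"Re = {1.0 * 5.0 * radius / 1e-2:.0f}")  # prints Re = 125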

from firedrake import Constant, inner, outer, sym, grad, div, dx, ds
import irksome
from irksome import Dt

μ = Constant(1e-2)

x, y = firedrake.SpatialCoordinate(mesh)
u_in = Constant(5.0)
ly = Constant(Ly)
expr = firedrake.as_vector((4 * u_in * y / ly * (1 - y / ly), 0))

I've used Taylor-Hood elements for this demonstration. It would be a good exercise to repeat this using, say, Crouzeix-Raviart or other elements.

cg1 = firedrake.FiniteElement("CG", "triangle", 1)
cg2 = firedrake.FiniteElement("CG", "triangle", 2)
Q = firedrake.FunctionSpace(mesh, cg1)
V = firedrake.VectorFunctionSpace(mesh, cg2)
Z = V * Q
z = firedrake.Function(Z)

Here we can use the fact that the Stokes problem has a minimization form. The Navier-Stokes equations do not because of the convective term.

u, p = firedrake.split(z)
ε = lambda v: sym(grad(v))

L = (μ * inner(ε(u), ε(u)) - p * div(u)) * dx

inflow_ids = (1,)
outflow_ids = (3,)
side_wall_ids = (2, 4, 5)
side_wall_bcs = firedrake.DirichletBC(Z.sub(0), Constant((0.0, 0.0)), side_wall_ids)
inflow_bcs = firedrake.DirichletBC(Z.sub(0), expr, inflow_ids)
bcs = [side_wall_bcs, inflow_bcs]

F = firedrake.derivative(L, z)
firedrake.solve(F == 0, z, bcs)

A bit of a philosophical point is in order here. We picked the inflow velocity and the viscosity to produce a high Reynolds number (about 200). The Stokes equations, on the other hand, are physically realistic only when the Reynolds number is $\ll 1$. But we can still solve the Stokes equations in the high-Reynolds number limit. In other words, the mathematical model remains well-posed even in regimes where it is not applicable. Here we're only using the result to initialize a simulation using the "right" model. But it's a mistake -- and one I've made -- to lull yourself into a false sense of correctness merely because the model gave you an answer.

fig, ax = plt.subplots()
ax.set_aspect("equal")
ax.set_axis_off()
colors = firedrake.streamplot(
    z.subfunctions[0], resolution=1/16, seed=1729, cmap="Blues", axes=ax
)
fig.colorbar(colors, orientation="horizontal");

Solution method

There are a host of methods for solving the Navier-Stokes equations by breaking them up into two simpler problems. These are known as projection methods. I started out trying those but it required effort, which I find disgusting. So I threw backward Euler at it and it worked.

There are some non-trivial decisions to make about both the variational form and the boundary conditions. I started writing this wanting to use pressure boundary conditions at both the inflow and outflow. (Of course I had to be reminded that you can't prescribe pressures but rather tractions.) This went badly. If a wave reflects off of the obstacle, back upstream, and out the inflow boundary, the simulation will crash. So I had to dial back the challenge and use a fixed inflow velocity and a traction boundary condition at the outflow.

Almost any writing you see about the Navier-Stokes equations will express the problem in differential form and will use incompressibility to apply what might appear to be simplifications. For example, if the fluid is incompressible and the viscosity is constant, then you can rewrite the viscous term like so: $$\nabla\cdot \mu(\nabla u + \nabla u^\top) = \mu\nabla^2u.$$ You'll see this form in almost all numerical methods or engineering textbooks. I don't like it for two reasons. First, the apparent simplification only gets in your way as soon as you want to consider fluids with variable viscosity. Mantle convection is one obvious case -- the temperature and chemistry of mantle rock can change the viscosity by several orders of magnitude. Second, it gives the wrong boundary conditions when you try to discretize the problem (see Limache et al. (2007)). I've retained the symmetric gradients of the velocity and test function in the form below.

The second apparent simplification uses incompressibility to rewrite the convection term: $$\nabla\cdot(\rho u\otimes u) = \rho u\cdot\nabla u.$$ This form is ubiquitous and reflects a preference for thinking about fluid flow in a Lagrangian reference frame. I prefer to avoid it although both are correct, unlike the Laplacian form of the viscosity. Given any extensive density $\phi$, regardless of its tensor rank, the flux will include a term $\phi\otimes u$. The conversion of this flux from the conservation to the variational form is then $$-\int_\Omega \phi\, u\cdot\nabla\psi\,dx$$ and this is true of mass, momentum, energy, whatever you like. Taking something that was a divergence and making it not a divergence obfuscates the original conservation principle. It also stops you from pushing the differential operator over onto a test function. So I've instead coded up the convection term as a discretization of $$-\int_\Omega \rho\,u\otimes u:\dot\varepsilon(v)\,dx + \int_{\partial\Omega\cap\{u\cdot n \ge 0\}}\rho\,(u\cdot n)(u\cdot v)\,ds.$$ In the first term, I've used the symmetric gradient of $v$ because the contraction of a symmetric and an anti-symmetric tensor is zero.

All together now, the variational form that I'm using is $$\begin{align} 0 & = \int_\Omega\left\{\rho\,\partial_tu\cdot v - \rho u\otimes u:\dot\varepsilon(v) - p\nabla\cdot v - q\nabla\cdot u + 2\mu\,\dot\varepsilon(u):\dot\varepsilon(v)\right\}dx \\ & \qquad\qquad + \int_\Gamma (\rho u\cdot n)(u \cdot v)ds \end{align}$$ for all test functions $v$ and $q$, where $\Gamma$ is the outflow part of the boundary.

v, q = firedrake.TestFunctions(Z)
u, p = firedrake.split(z)

ρ = firedrake.Constant(1.0)

F_1 = (
    ρ * inner(Dt(u), v) -
    ρ * inner(ε(v), outer(u, u)) -
    p * div(v) -
    q * div(u) +
    2 * μ * inner(ε(u), ε(v))
) * dx

n = firedrake.FacetNormal(mesh)
F_2 = ρ * inner(u, v) * inner(u, n) * ds(outflow_ids)

F = F_1 + F_2

We'll need to make some choice about the timestep. Here I've computed the CFL time for the mesh and the initial velocity that we computed above. This choice might not be good enough. If we initialized the velocity by solving the Stokes equations, the fluid could evolve to a much higher speed. We might then find that this timestep is inadequate. A principled solution would be to use an adaptive scheme.

dg0 = firedrake.FiniteElement("DG", "triangle", 0)
Δ = firedrake.FunctionSpace(mesh, dg0)
area = firedrake.Function(Δ).project(firedrake.CellVolume(mesh))
δx_min = np.sqrt(2 * area.dat.data_ro.min())

u, p = z.subfunctions
U_2 = firedrake.Function(Δ).project(inner(u, u))
u_max = np.sqrt(U_2.dat.data_ro.max())
cfl_time = δx_min / u_max
print(f"Smallest cell diameter: {δx_min:0.4f}")
print(f"Max initial velocity:   {u_max:0.4f}")
print(f"Timestep:               {cfl_time:0.4f}")

dt = firedrake.Constant(0.5 * cfl_time)
Smallest cell diameter: 0.0351
Max initial velocity:   6.4170
Timestep:               0.0055
params = {
    "solver_parameters": {
        "snes_monitor": ":navier-stokes-output.log",
        "snes_atol": 1e-12,
        "ksp_atol": 1e-12,
        "snes_type": "newtonls",
        "ksp_type": "gmres",
        "pc_type": "lu",
        "pc_factor_mat_solver_type": "mumps",
    },
    "bcs": bcs,
}

method = irksome.BackwardEuler()
t = firedrake.Constant(0.0)
solver = irksome.TimeStepper(F, method, t, dt, z, **params)

I've added a bit of code to show some diagnostic information in the progress bar. First, I have it print out the number of Newton iterations required to compute each timestep. If you see this going much above 20, then something is off. Second, I have it print out the maximum pressure. Both of these were useful when I was debugging this code.

from tqdm.notebook import trange

zs = [z.copy(deepcopy=True)]

final_time = 10.0
num_steps = int(final_time / float(dt))
progress_bar = trange(num_steps)
for step in progress_bar:
    solver.advance()
    zs.append(z.copy(deepcopy=True))
    iter_count = solver.solver.snes.getIterationNumber()
    pmax = z.subfunctions[1].dat.data_ro.max()
    progress_bar.set_description(f"{iter_count}, {pmax:0.4f} | ")

Finally, we'll make an animated quiver plot because it looks pretty.

%%capture

from tqdm.notebook import tqdm
from matplotlib.animation import FuncAnimation

fig, ax = plt.subplots()
ax.set_aspect("equal")
ax.set_axis_off()

X = mesh.coordinates.dat.data_ro
V = mesh.coordinates.function_space()
u_t = zs[0].subfunctions[0].copy(deepcopy=True)
interpolator = firedrake.Interpolate(u_t, V)
u_X = firedrake.assemble(interpolator)
u_values = u_X.dat.data_ro

arrows = firedrake.quiver(zs[0].subfunctions[0], axes=ax, cmap="Blues")
def animate(z):
    u_t.assign(z.subfunctions[0])
    u_X = firedrake.assemble(interpolator)
    u_values = u_X.dat.data_ro
    arrows.set_UVC(*(u_values.T))
animation = FuncAnimation(fig, animate, tqdm(zs), interval=1e3/60)
from IPython.display import HTML
HTML(animation.to_html5_video())

There's an empirical formula for the frequency of vortex shedding. A fun follow-up to this would be to compute the shedding frequency from the simulation using, say, a windowed Fourier transform, and compare the result to the empirical formula. Next on the docket is comparing the results using different spatial finite elements.
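
As a rough sketch of how that might go -- this is my own aside, with a hypothetical probe location, and a plain FFT rather than a windowed one for brevity -- we could sample the cross-stream velocity at a point in the wake and pick out the dominant frequency:

probe = (1.5, 0.0)  # hypothetical probe point downstream of the obstacle
v_t = np.array([z_t.subfunctions[0].at(probe)[1] for z_t in zs])
v_t -= v_t.mean()  # remove the mean before transforming
freqs = np.fft.rfftfreq(len(v_t), d=float(dt))
spectrum = np.abs(np.fft.rfft(v_t))
print(f"Shedding frequency: {freqs[spectrum.argmax()]:0.3f}")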

Overland Flow

In this post, we'll look at overland flow -- how rainwater drains across a landscape. The equations of motion are pretty rowdy and have some fascinating effects. To derive them, we'll start from the shallow water or Saint Venant equations for the water layer thickness $h$ and velocity $u$:

$$\begin{align} \frac{\partial h}{\partial t} + \nabla\cdot hu & = \dot a \\ \frac{\partial}{\partial t}hu + \nabla\cdot hu\otimes u & = -gh\nabla (b + h) - k|u|u \end{align}$$

The final term in the momentum equation represents frictional dissipation and $k$ is a (dimensionless) friction coefficient. Using the shallow water equations for predicting overland flow is challenging because the thickness can go to zero.

For many thin open channel flows, however, the fluid velocity can be expressed purely in terms of the surface slope and other factors. You could arrive at one such simplification by assuming that the inertial terms in the momentum equation are zero:

$$k|u|u + gh\nabla(b + h) = 0.$$

This approximation is known as the Darcy-Weisbach equation. We'll use it in the following because it's simple and it illustrates all the major difficulties. For serious work, you'd probably want to use the Manning formula, as it has some theoretical justification for turbulent open channel flows. The overall form of the equation and the resulting numerical challenges are the same in each case.

Solving the Darcy-Weisbach relation for the velocity gives $u = -\sqrt{gh/k}\,\nabla(b + h)/\sqrt{|\nabla(b + h)|}$. Putting this together with the mass conservation equation gives a single PDE for the water layer thickness:

$$\frac{\partial h}{\partial t} - \nabla\cdot\left(\sqrt{\frac{gh^3}{k}}\frac{\nabla(b + h)}{\sqrt{|\nabla(b + h)|}}\right) = \dot a.$$

This looks like a parabolic equation, but there's a catch! The diffusion coefficient is proportional to $h^{3/2}$, so it goes to zero when $h = 0$; all the theory for elliptic and parabolic equations assumes that the diffusion coefficient is bounded below. For a non-degenerate parabolic PDE, disturbances propagate with infinite speed. For the degenerate problem we're considering, that's no longer true -- the $h = 0$ contour travels with finite speed! While we're using the Darcy-Weisbach equation to set the velocity here, we still get finite propagation speed if we use the Manning equation instead. What's important is that the velocity is proportional to some power of the thickness and surface slope.

Eliminating the velocity entirely from the problem is convenient for analysis, but not necessarily the best way to go numerically. We'll retain the velocity as an unknown, which gives the resulting variational form much of the same character as the mixed discretization of the heat equation.

As our model problem, we'll use the dam break test case from Santillana and Dawson (2009). They discretized the overland flow equations using the local discontinuous Galerkin or LDG method, which extends DG for hyperbolic systems to mixed advection-diffusion problems. We'll use different numerics because Firedrake has all the hipster elements. I'm eyeballing the shape of the domain from their figures.

import numpy as np
import gmsh

gmsh.initialize()
geo = gmsh.model.geo

coords = np.array(
    [
        [0.0, 0.0],
        [3.0, 0.0],
        [3.0, 2.0],
        [2.0, 2.0],
        [2.0, 4.0],
        [3.0, 4.0],
        [3.0, 6.0],
        [0.0, 6.0],
        [0.0, 4.0],
        [1.0, 4.0],
        [1.0, 2.0],
        [0.0, 2.0],
    ]
)

lcar = 0.125
points = [geo.add_point(*x, 0, lcar) for x in coords]
edges = [
    geo.add_line(p1, p2) for p1, p2 in
    zip(points, np.roll(points, 1))
]

geo.add_physical_group(1, edges)
loop = geo.add_curve_loop(edges)

plane_surface = geo.add_plane_surface([loop])
geo.add_physical_group(2, [plane_surface])

geo.synchronize()

gmsh.model.mesh.generate(2)
gmsh.write("dam.msh")

gmsh.finalize()
Info    : Meshing 1D...
Info    : [  0%] Meshing curve 1 (Line)
Info    : [ 10%] Meshing curve 2 (Line)
Info    : [ 20%] Meshing curve 3 (Line)
Info    : [ 30%] Meshing curve 4 (Line)
Info    : [ 40%] Meshing curve 5 (Line)
Info    : [ 50%] Meshing curve 6 (Line)
Info    : [ 60%] Meshing curve 7 (Line)
Info    : [ 60%] Meshing curve 8 (Line)
Info    : [ 70%] Meshing curve 9 (Line)
Info    : [ 80%] Meshing curve 10 (Line)
Info    : [ 90%] Meshing curve 11 (Line)
Info    : [100%] Meshing curve 12 (Line)
Info    : Done meshing 1D (Wall 0.000991202s, CPU 0.000991s)
Info    : Meshing 2D...
Info    : Meshing surface 1 (Plane, Frontal-Delaunay)
Info    : Done meshing 2D (Wall 0.0347752s, CPU 0.034096s)
Info    : 1158 nodes 2326 elements
Info    : Writing 'dam.msh'...
Info    : Done writing 'dam.msh'
import firedrake

mesh = firedrake.Mesh("dam.msh")
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.set_aspect("equal")
firedrake.triplot(mesh, axes=ax);

The bed profile consists of an upper, elevated basin, together with a ramp down to a lower basin.

from firedrake import Constant, min_value, max_value

x = firedrake.SpatialCoordinate(mesh)

y_0 = Constant(2.0)
y_1 = Constant(4.0)
b_0 = Constant(0.0)
b_1 = Constant(1.0)
b_expr = b_0 + (b_1 - b_0) * max_value(0, min_value(1, (x[1] - y_0) / (y_1 - y_0)))

S = firedrake.FunctionSpace(mesh, "CG", 1)
b = firedrake.Function(S).interpolate(b_expr)
fig = plt.figure()
axes = fig.add_subplot(projection="3d")
axes.set_box_aspect((3.0, 6.0, 1.0))
axes.set_axis_off()
firedrake.trisurf(b, axes=axes);

As I alluded to before, rather than eliminate the velocity entirely, we'll keep it as a field to be solved for explicitly. The problem we're solving, while degenerate, is pretty similar to the mixed form of the heat equation. This suggests that we should use element pairs that are stable for mixed Poisson. Here I'm using the MINI element: continuous linear basis functions for the thickness, and continuous linear enriched with cubic bubbles for the velocity. We could also have used a more proper $H(\text{div})$-conforming pair, like discontinuous Galerkin for the thickness and Raviart-Thomas or Brezzi-Douglas-Marini for the velocity.

cg1 = firedrake.FiniteElement("CG", "triangle", 1)
Q = firedrake.FunctionSpace(mesh, cg1)
b3 = firedrake.FiniteElement("B", "triangle", 3)
V = firedrake.VectorFunctionSpace(mesh, cg1 + b3)
Z = Q * V

The dam break problem specifies that the thickness is equal to 1 in the upper basin and 0 elsewhere. I've done a bit of extra work below because the expression for $h$ is discontinuous, and interpolating it directly gives some obvious mesh artifacts. Instead, I've chosen to project the expression and clamp it above and below.

h_expr = firedrake.conditional(x[1] >= y_1, 1.0, 0.0)
h_0 = firedrake.project(h_expr, Q)
h_0.interpolate(min_value(1, max_value(0, h_0)));
fig, ax = plt.subplots()
ax.set_aspect("equal")
colors = firedrake.tripcolor(h_0, axes=ax)
fig.colorbar(colors);
z = firedrake.Function(Z)
z_n = firedrake.Function(Z)
δt = Constant(1.0 / 32)
z_n.sub(0).assign(h_0)
z.sub(0).assign(h_0);

The test case in the Santillana and Dawson paper uses a variable friction coefficient in order to simulate the effect of increased drag when flowing over vegetation.

from firedrake import inner

k_0 = Constant(1.0)
δk = Constant(4.0)
r = Constant(0.5)
x_1 = Constant((1.5, 1.0))
x_2 = Constant((1.0, 3.5))
x_3 = Constant((2.0, 2.5))
ψ = sum(
    [
        max_value(0, 1 - inner(x - x_i, x - x_i) / r**2)
        for x_i in [x_1, x_2, x_3]
    ]
)
k = k_0 + δk * ψ
fig, axes = plt.subplots()
axes.set_aspect("equal")
firedrake.tripcolor(firedrake.Function(S).interpolate(k), axes=axes);

The code below defines the variational form of the overland flow equations.

from firedrake import div, grad, dx

g = Constant(1.0)

h, q = firedrake.split(z)
h_n = firedrake.split(z_n)[0]
ϕ, v = firedrake.TestFunctions(Z)

F_h = ((h - h_n) / δt + div(q)) * ϕ * dx
friction = k * inner(q, q)**0.5 * q
gravity = -g * h**3 * grad(b + h)
F_q = inner(friction - gravity, v) * dx
F = F_h + F_q

We'll run into trouble if we try to use a Newton-type method on the true variational form. Notice how the $q$-$q$ block of the derivative will go to zero whenever $q = 0$, which happens wherever the thickness is zero too. The usual hack is to put a fudge factor $\varepsilon$ into the variational form, as shown below.

ϵ = Constant(1e-3)
friction = k * (inner(q, q) + ϵ**2)**0.5 * q
gravity = -g * h**3 * grad(b + h)
F_qϵ = inner(friction - gravity, v) * dx
F_ϵ = F_h + F_qϵ

The disadvantage of this approach is that we're then solving a slightly different physics problem. We don't have a great idea ahead of time of what $\varepsilon$ should be either. If we choose it too large, the deviation from the true problem is large enough that we can't believe the results. But if we choose it too small, the derivative will fail to be invertible.

We can take a middle course by using the perturbed variational form only to define the derivative in Newton's method, while keeping the true variational form as the quantity to find a root of. To do this, we'll pass the derivative of $F_\varepsilon$ as the Jacobian or J argument to the nonlinear variational problem object. Choosing $\varepsilon$ too small will still cause the solver to crash. Taking it too large, however, no longer means solving a completely different problem -- it only makes the solver converge more slowly. We still want to make $\varepsilon$ as small as possible, but to my mind, getting the right answer slowly is a more tolerable failure mode than getting the wrong answer.

bcs = firedrake.DirichletBC(Z.sub(1), firedrake.zero(), "on_boundary")
J = firedrake.derivative(F_ϵ, z)
problem = firedrake.NonlinearVariationalProblem(F, z, bcs, J=J)

We'll have one final difficulty to overcome -- what happens if the thickness inadvertently becomes negative? There's a blunt solution that everyone uses, which is to clamp the thickness to 0 from below at every step. Clamping can work in many cases. But if you're using a Runge-Kutta method, it only assures positivity at the end of each step and not in any of the intermediate stages. We can instead formulate the whole problem, including the non-negativity constraint, as a variational inequality. Much like how some but not all variational problems arise from minimization principles, some variational inequalities arise from minimization principles with inequality constraints, like the obstacle problem. But variational inequalities are a more general class of problem than inequality-constrained minimization. Formulating overland flow as a variational inequality is a bit of overkill for the time discretization that we're using. Nonetheless, I'll show how to do so in the following for illustrative purposes. We first need two functions representing the upper and lower bounds for the solution. In this case, the upper bound is infinity.

from firedrake.petsc import PETSc

upper = firedrake.Function(Z)
with upper.dat.vec as upper_vec:
    upper_vec.set(PETSc.INFINITY)

The thickness is bounded below by 0, but there's no lower bound at all on the flux, so we'll set only the flux entries to negative infinity.

lower = firedrake.Function(Z)
with lower.sub(1).dat.vec as lower_vec:
    lower_vec.set(PETSc.NINFINITY)

When we want to solve variational inequalities, we can't use the usual Newton solvers in PETSc -- we have a choice between a semi-smooth Newton (vinewtonssls) and an active set solver (vinewtonrsls). I couldn't get the semi-smooth Newton solver to work and I have no idea why.

params = {
    "solver_parameters": {
        "snes_type": "vinewtonrsls",
        "ksp_type": "gmres",
        "pc_type": "lu",
    }
}

solver = firedrake.NonlinearVariationalSolver(problem, **params)

Finally, we'll run the timestepping loop. Here we pass the bounds explicitly on each call to solve.

from tqdm.notebook import trange

final_time = 60.0
num_steps = int(final_time / float(δt))

hs = [z.sub(0).copy(deepcopy=True)]
qs = [z.sub(1).copy(deepcopy=True)]

for step in trange(num_steps):
    solver.solve(bounds=(lower, upper))
    z_n.assign(z)

    h, q = z.subfunctions
    hs.append(h.copy(deepcopy=True))
    qs.append(q.copy(deepcopy=True))

Movie time as always.

%%capture

from matplotlib.animation import FuncAnimation

fig, axes = plt.subplots()
axes.set_aspect("equal")
axes.set_axis_off()
kw = {"vmin": 0, "vmax": 1, "cmap": "Blues", "num_sample_points": 4}
colors = firedrake.tripcolor(hs[0], **kw, axes=axes)
fn_plotter = firedrake.FunctionPlotter(mesh, num_sample_points=4)
animate = lambda h: colors.set_array(fn_plotter(h))
interval = 1e3 / 60
animation = FuncAnimation(fig, animate, frames=hs, interval=interval)
from IPython.display import HTML

HTML(animation.to_html5_video())

As an a posteriori sanity check, we can evaluate how much the total water volume deviates.

volumes = np.array([firedrake.assemble(h * dx) for h in hs])
volume_error = (volumes.max() - volumes.min()) / volumes.mean()
print(f"Volume relative error: {volume_error:5.2g}")
Volume relative error: 0.013

Where a truly conservative scheme would exactly preserve the volume up to some small multiple of machine precision, we can only get global conservation up to the mesh resolution with our scheme. Instead, there are spurious "sources" at the free boundary. Likewise, there can be spurious sinks in the presence of ablation, so the sign error can go either way. This topic is covered in depth in this paper.

fig, axes = plt.subplots()
ts = np.linspace(0.0, final_time, num_steps + 1)
axes.set_xlabel("time")
axes.set_ylabel("volume ($m^3$)")
axes.plot(ts, volumes);

We can examine the fluxes after the fact in order to see where the value of $\varepsilon$ that we picked sits.

qms = [firedrake.project(inner(q, q)**0.5, Q) for q in qs]
area = firedrake.assemble(Constant(1) * dx(mesh))
qavgs = np.array([firedrake.assemble(q * dx) / area for q in qms])
print(f"Average flux: {qavgs.mean()*100**2:5.1f} cm²/s")
print(f"Fudge flux:   {float(ϵ)*100**2:5.1f} cm²/s")
Average flux: 266.6 cm²/s
Fudge flux:    10.0 cm²/s

The fudge flux is 1/25 that of the average. This is quite a bit smaller, but not so much so that we should feel comfortable with this large a perturbation to the physics equations themselves. The ability to use it only in the derivative and not in the residual is a huge help.

To wrap things up, the overland flow equations are a perfect demonstration of how trivially equivalent forms of the same physical problem can yield vastly different discretizations. Writing the system as a single parabolic PDE might seem simplest, but there are several potential zeros in the denominator that require some form of regularization. By contrast, using a mixed form introduces more unknowns and a nonlinear equation for the flux, but there's wiggle room within that nonlinear equation. This makes it much easier to come up with a robust solution procedure, even if it includes a few uncomfortable hacks like using a different Jacobian from that of the true problem. Finally, while our discretization still works ok with no positivity constraint, PETSc has variational inequality solvers that make it possible to enforce positivity.

Symplectic integrators

My dad is a physicist of the old school, and what this means is that he has to tell everyone -- regardless of their field -- that what they're doing is so simple as to not even be worth doing and that anyway physicists could do it better. So whenever my job comes up he has to tell the same story about how he once took a problem to a numerical analyst. This poor bastard ran some code for him on a state-of-the-art computer of the time (a deer skeleton with a KT88 vacuum tube in its asshole) but the solution was total nonsense and didn't even conserve energy. Then pops realizes he could solve the Hamilton-Jacobi equation for the system exactly. Numerical analysis is for clowns.

Naturally, every time we have this conversation, I remind him that we figured out all sorts of things since then, like the fact that people who don't own land should be allowed to vote and also symplectic integrators. In this post I'll talk about the latter. A symplectic integrator is a scheme for solving Hamilton's equations of motion of classical mechanics in such a way that the map from the state at one time to the state at a later time preserves the canonical symplectic form. This is a very special property and not every timestepping scheme is symplectic. For those schemes that are symplectic, the trajectory samples exactly from the flow of a slightly perturbed Hamiltonian, which is a pretty nice result.

The two-body problem

First, we'll illustrate things on the famous two-body problem, which has the Hamiltonian

$$H = \frac{|p_1|^2}{2m_1} + \frac{|p_2|^2}{2m_2} - \frac{Gm_1m_2}{|x_1 - x_2|}$$

where $x_1$, $x_2$ are the positions of the two bodies, $m_1$, $m_2$ their masses, and $G$ the gravitational constant. We can simplify this system by instead working in the coordinates $Q = (m_1x_1 + m_2x_2) / (m_1 + m_2)$, $q = x_2 - x_1$. The center of mass $Q$ moves with constant velocity, reducing the Hamiltonian to

$$H = \frac{|p|^2}{2\mu} - \frac{Gm_1m_2}{|q|}$$

where $\mu = m_1m_2 / (m_1 + m_2)$ is the reduced mass of the system. We could go on to write $q$ in polar coordinates and do several transformations to derive an exact solution; you can find this in the books by Goldstein or Kleppner and Kolenkow.

Instead, we'll take the Hamiltonian above as our starting point, but first we want to make the units work out as nicely as possible. The gravitational constant $G$ has to have units of length${}^3\cdot$time${}^{-2}\cdot$mass${}^{-1}$ in order for both terms in the Hamiltonian we wrote above to have units of energy. We'd like for all the lengths and times in the problem to work out to be around 1, which suggests that we measure time in years and length in astronomical units. The depository of all knowledge tells me that, in this unit system, the gravitational constant is

$$G \approx 4\pi^2\, \text{AU}^3 \cdot \text{yr}^{-2}\cdot M_\odot^{-1}.$$

The factor of $M_\odot^{-1}$ in the gravitational constant will cancel with the corresponding factor in the Newton force law. For something like the earth-sun system, where the mass of the sun is much larger than that of the earth, the reduced mass of the system is about equal to the mass of the earth. So if we take the earth mass $M_\oplus$ as our basic mass unit, the whole system works out to about

$$H = \frac{|p|^2}{2} - \frac{4\pi^2}{|q|}.$$

Finally, in this unit system we can take the initial position of the earth to be $(1, 0)$ AU; the orbital speed of the earth is about $2\pi$ AU / year, so the initial momentum is $(0, 2\pi)$. Hamilton's equations of motion are

$$\begin{align} \dot q & = +\frac{\partial H}{\partial p} = p \\ \dot p & = -\frac{\partial H}{\partial q} = -4\pi^2\frac{q}{|q|^3}. \end{align}$$

To start, we'll try out the classic explicit and implicit Euler methods.

import numpy as np
from numpy import pi as π

q_0 = np.array([1.0, 0.0])
p_0 = np.array([0.0, 2 * π])

final_time = 3.0
num_steps = 3000
dt = final_time / num_steps
def gravitational_force(q):
    return -4 * π ** 2 * q / np.sqrt(np.dot(q, q)) ** 3
def explicit_euler(q, p, dt, num_steps, force):
    qs = np.zeros((num_steps + 1,) + q.shape)
    ps = np.zeros((num_steps + 1,) + p.shape)

    qs[0] = q
    ps[0] = p

    for t in range(num_steps):
        qs[t + 1] = qs[t] + dt * ps[t]
        ps[t + 1] = ps[t] + dt * force(qs[t])
        
    return qs, ps

We'll call out to scipy's nonlinear solver for our implementation of the implicit Euler method. In principle, scipy can solve the resulting nonlinear system of equations solely with the ability to evaluate the forces. But in order to make this approach as competitive as possible we should also provide the derivative of the forces with respect to the positions, which enables using Newton-type methods.

I = np.eye(2)

def gravitational_force_jacobian(q):
    Q = np.sqrt(np.dot(q, q))
    return -4 * π ** 2 / Q ** 3 * (I - 3 * np.outer(q, q) / Q ** 2)
from scipy.optimize import root

def implicit_euler(q, p, dt, num_steps, force, force_jacobian):
    qs = np.zeros((num_steps + 1,) + q.shape)
    ps = np.zeros((num_steps + 1,) + p.shape)

    qs[0] = q
    ps[0] = p

    def f(q, q_t, p_t):
        return q - q_t - dt * (p_t + dt * force(q))
    
    def J(q, q_t, p_t):
        return I - dt ** 2 * force_jacobian(q)
    
    for t in range(num_steps):
        result = root(f, qs[t, :], jac=J, args=(qs[t], ps[t]))
        qs[t + 1] = result.x
        ps[t + 1] = ps[t] + dt * force(qs[t + 1])
        
    return qs, ps
q_ex, p_ex = explicit_euler(
    q_0, p_0, dt, num_steps, gravitational_force
)
q_im, p_im = implicit_euler(
    q_0, p_0, dt, num_steps, gravitational_force, gravitational_force_jacobian
)
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection

def plot_trajectory(q, start_width=1.0, end_width=3.0, **kwargs):
    points = q.reshape(-1, 1, 2)
    segments = np.concatenate([points[:-1], points[1:]], axis=1)
    widths = np.linspace(start_width, end_width, len(points))
    return LineCollection(segments, linewidths=widths, **kwargs)
fig, ax = plt.subplots()
ax.set_aspect("equal")
ax.set_xlim((-1.25, +1.25))
ax.set_ylim((-1.25, +1.25))
ax.set_axis_off()
ax.add_collection(plot_trajectory(q_ex, color="tab:blue", label="explicit"))
ax.add_collection(plot_trajectory(q_im, color="tab:orange", label="implicit"))
ax.legend(loc="upper right");

The explicit Euler method spirals out from what at first looks like a circular orbit, while the implicit Euler method spirals in. Since the gravitational potential is negative, this means that the explicit Euler scheme is gaining energy, while the implicit Euler scheme is losing energy.

def energies(qs, ps):
    kinetic = 0.5 * np.sum(ps ** 2, axis=1)
    potential = -4 * π ** 2 / np.sqrt(np.sum(qs ** 2, axis=1))
    return kinetic + potential

fig, ax = plt.subplots()
ts = np.linspace(0.0, final_time, num_steps + 1)
ax.plot(ts, energies(q_ex, p_ex), label="explicit")
ax.plot(ts, energies(q_im, p_im), label="implicit")
ax.set_xlabel("time (years)")
ax.set_ylabel("energy")
ax.legend();

If we use a slightly longer timestep, the implicit Euler method will eventually cause the earth and sun to crash into each other in the same short time span of three years. This prediction does not match observations, much as we might wish.

We could reduce the energy drift to whatever degree we desire by using a shorter timestep or a more accurate method. But before we go and look up the coefficients for the usual fourth-order Runge-Kutta method, let's instead try a simple variation on the explicit Euler scheme.

from tqdm.notebook import trange
def semi_explicit_euler(q, p, dt, num_steps, force, progressbar=False):
    qs = np.zeros((num_steps + 1,) + q.shape)
    ps = np.zeros((num_steps + 1,) + p.shape)

    qs[0] = q
    ps[0] = p

    iterator = trange(num_steps) if progressbar else range(num_steps)
    for t in iterator:
        qs[t + 1] = qs[t] + dt * ps[t]
        ps[t + 1] = ps[t] + dt * force(qs[t + 1])
        
    return qs, ps

Rather than use the previous values of the system state to pick the next system state, we first updated the position, then used this new value to update the momentum; we used force(qs[t + 1]) instead of force(qs[t]). This is an implicit scheme in the strictest sense of the word. The particular structure of the central force problem, however, makes the computations explicit. In fancy terms we would refer to the Hamiltonian as separable. Let's see how this semi-explicit Euler scheme does.

q_se, p_se = semi_explicit_euler(
    q_0, p_0, dt, num_steps, gravitational_force
)
fig, ax = plt.subplots()
ax.set_aspect("equal")
ax.set_xlim((-1.5, +1.5))
ax.set_ylim((-1.5, +1.5))
ax.set_axis_off()
ax.add_collection(plot_trajectory(q_ex, color="tab:blue", label="explicit"))
ax.add_collection(plot_trajectory(q_im, color="tab:orange", label="implicit"))
ax.add_collection(plot_trajectory(q_se, color="tab:green", label="symplectic"))
ax.legend(loc="upper right");

The orbit of the semi-explicit or symplectic method shown in green seems to be roughly closed, which is pretty good. The most stunning feature is that the energy drift, while non-zero, is bounded and oscillatory. The amplitude of the drift is smaller than the energy itself by a factor of about one in 10,000.

fig, ax = plt.subplots()
Hs = energies(q_se, p_se)
ax.plot(ts, Hs - Hs[0], label="semi-explicit")
ax.set_xlabel("time (years)")
ax.set_ylabel("energy drift");

Just for kicks, let's try again, both on the same circular orbit as before and on an elliptical orbit with a bit more eccentricity, over a much longer time window.

final_time = 3e2
num_steps = int(3e4)
dt = final_time / num_steps

q_0 = np.array([1.0, 0.0])
p_0 = np.array([0.0, 2 * π])
q_se, p_se = semi_explicit_euler(q_0, p_0, dt, num_steps, gravitational_force)

ϵ = 0.1
q_0 = np.array([1.0 + ϵ, 0.0])
p_0 = np.array([0.0, 2 * π])
q_el, p_el = semi_explicit_euler(q_0, p_0, dt, num_steps, gravitational_force)
fig, ax = plt.subplots()
ax.set_aspect("equal")
ax.set_xlim((-1.5, +1.5))
ax.set_ylim((-1.5, +1.5))
ax.set_axis_off()
ax.add_collection(plot_trajectory(q_se, color="tab:blue", label="circular"))
ax.add_collection(plot_trajectory(q_el, color="tab:orange", label="elliptical"))
ax.legend(loc="lower right");

The orbits don't exactly trace out circles or ellipses -- they precess a bit. Nonetheless, they still remain roughly closed and have bounded energy drift. For less work than the implicit Euler scheme, we got a vastly superior solution. Why is the semi-explicit Euler method so much better than the explicit or implicit Euler method?

Symplectic integrators

Arguably the most important property of Hamiltonian systems is that the energy is conserved, along with other quantities such as linear and angular momentum. The explicit and implicit Euler methods are convergent, and so their trajectories reproduce those of the Hamiltonian system over any finite time horizon as the number of steps is increased. But these guarantees don't tell us anything about how the discretized trajectories behave with a fixed timestep over very long horizons, and they don't tell us anything about energy conservation either. The wonderful property of semi-explicit Euler is that the map from the state of the system at one timestep to the next samples directly from the flow of a slightly perturbed Hamiltonian.

Let's try to unpack that statement a bit more. A fancy way of writing Hamilton's equations of motion is that, for any observable function $f$ of the total state $z = (q, p)$ of the system,

$$\frac{\partial f}{\partial t} = \{f, H\}$$

where $\{\cdot, \cdot\}$ denotes the Poisson bracket. For the simple systems described here, the Poisson bracket of two functions $f$ and $g$ is

$$\{f, g\} = \sum_i\left(\frac{\partial f}{\partial q_i}\frac{\partial g}{\partial p_i} - \frac{\partial f}{\partial p_i}\frac{\partial g}{\partial q_i}\right).$$

We recover the usual Hamilton equations of motion by substituting the positions and momenta themselves for $f$. In general, the Poisson bracket can be any bilinear form that's antisymmetric and satisfies the Leibniz and Jacobi identities. In a later demo, I'll look at rotational kinematics, where the configuration space is no longer flat Euclidean space but the Lie group SO(3). The Poisson bracket is rightfully viewed as a 2-form in this setting. Leaving these complications aside for the moment, the evolution equation in terms of brackets is especially nice in that it allows us to easily characterize the conserved quantities: any function $f$ such that $\{f, H\} = 0$. In particular, due to the antisymmetry of the bracket, the Hamiltonian $H$ itself is always conserved.
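
To make the bracket formalism concrete, here's a small symbolic sketch -- my own aside, using sympy rather than anything above -- that checks these claims for a one-dimensional analogue of the Kepler Hamiltonian:

import sympy

q, p = sympy.symbols("q p")
H = p**2 / 2 - 4 * sympy.pi**2 / q  # 1D analogue of the Kepler Hamiltonian

def poisson_bracket(f, g):
    return f.diff(q) * g.diff(p) - f.diff(p) * g.diff(q)

print(poisson_bracket(H, H))  # 0: the Hamiltonian is conserved
print(poisson_bracket(q, H))  # p: recovers dq/dt = +∂H/∂p
print(poisson_bracket(p, H))  # -4*pi**2/q**2: recovers dp/dt = -∂H/∂q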

Solving Hamilton's equations of motion forward in time gives a map $\Phi_t$ from the initial to the final state. The nice part about this solution map is that it obeys the semi-group property: $\Phi_s\circ\Phi_t = \Phi_{s + t}$. In the same way that we can think of a matrix $A$ generating the solution map $e^{tA}$ of the linear ODE $\dot z = Az$, we can also think of the solution map for Hamiltonian systems as being generated by the Poisson bracket with the Hamiltonian:

$$\Phi_t = \exp\left(t\{\cdot, H\}\right)$$

where $\exp$ denotes the exponential map. This isn't a rigorous argument, and to make it rigorous I'd have to talk about diffeomorphism groups of manifolds. Just believe me and read Jerrold Marsden's books if you don't.

Now comes the interesting part. Suppose we want to solve the linear ODE

$$\dot z = (A + B)z.$$

We'd like to find a way to break down solving this problem into separately solving ODEs defined by $A$ and $B$. It isn't possible to split the problem exactly because, for matrices, $\exp\left(t(A + B)\right)$ is not equal to $\exp(tA)\exp(tB)$ unless $A$ and $B$ commute. But, for small values of $\delta t$, we can express the discrepancy in terms of the commutator $[A, B] = AB - BA$ of the matrices:

$$\exp(\delta t\cdot A)\exp(\delta t\cdot B) = \exp\left(\delta t(A + B) + \frac{\delta t^2}{2}[A, B] + \ldots\right)$$

where the ellipses denote terms of higher order in $\delta t$. Exactly what goes in the higher-order terms is the content of the Baker-Campbell-Hausdorff (BCH) formula. This reasoning is what leads to splitting methods for all kinds of different PDEs. For example, you can show that splitting the solution of an advection-diffusion equation into an explicit step for the advective part and an implicit step for the diffusive part works with an error of order $\mathscr{O}(\delta t)$ using the BCH formula.
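
To see the estimate in action, here's a quick numerical check -- an aside of mine, not part of the simulation code -- showing that including the commutator correction shrinks the splitting error from second to third order in the step size:

import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(42)
A, B = rng.standard_normal((2, 4, 4))  # two random 4x4 matrices
for t in [1e-1, 1e-2, 1e-3]:
    splitting = expm(t * A) @ expm(t * B)
    naive = expm(t * (A + B))
    corrected = expm(t * (A + B) + t**2 / 2 * (A @ B - B @ A))
    # the second error should shrink roughly one order faster than the first
    print(np.linalg.norm(splitting - naive), np.linalg.norm(splitting - corrected))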

The clever part about the analysis of symplectic methods is that we can play a similar trick for Hamiltonian problems (if we're willing to wave our hands a bit). Suppose that a Hamiltonian $H$ can be written as

$$H = H_1 + H_2$$

where exactly solving for the flow of each Hamiltonian $H_1$, $H_2$ is easy. The most obvious splitting is into kinetic and potential energies $K$ and $U$. Integrating the Hamiltonian $K(p)$ is easy because the momenta don't change at all -- the particles continue in linear motion according to their starting momenta. Integrating the Hamiltonian $U(q)$ is also easy because, while the momenta will change according to the particles' initial positions, those positions themselves don't change. To write it down explicitly,

$$\Phi^K_t\left(\begin{matrix}q \\ p\end{matrix}\right) = \left(\begin{matrix}q + t\frac{\partial K}{\partial p} \\ p\end{matrix}\right)$$

and

$$\Phi^U_t\left(\begin{matrix}q \\ p\end{matrix}\right) = \left(\begin{matrix}q \\ p - t\frac{\partial U}{\partial q}\end{matrix}\right)$$

Each of these Hamiltonian systems by itself is sort of silly, but the composition of maps $\Phi^U_{\delta t}\circ \Phi^K_{\delta t}$ gives an $\mathscr{O}(\delta t)$-accurate approximation to $\Phi^{K + U}_{\delta t}$ by the BCH formula. Now if we keep up the analogy and pretend like we can apply the BCH formula to Hamiltonian flows exactly, we'd formally write that

$$\exp\left(\delta t\{\cdot, H_1\}\right)\exp\left(\delta t\{\cdot, H_2\}\right) = \exp\left(\delta t\{\cdot, H_1 + H_2\} + \frac{\delta t^2}{2}\left\{\cdot, \{H_1, H_2\}\right\} + \ldots \right).$$

In other words, it's not just that the splitting scheme above gives us an $\mathscr{O}(\delta t)$-accurate approximation to the solution $q(t), p(t)$; it's that the approximate solution is sampled exactly from integrating the flow of the perturbed Hamiltonian

$$H' = H + \frac{\delta t}{2}\{H_1, H_2\} + \mathscr{O}(\delta t^2).$$

All of the things that are true of Hamiltonian systems generally are then true of our numerical approximations. For example, they still preserve volume in phase space (Liouville's theorem); have no stable or unstable equilibrium points, only saddles and centers; and typically have roughly bounded trajectories.

Using the BCH formula to compute the perturbed Hamiltonian helps us design schemes of even higher order. For example, the scheme that we've used throughout this post is obtained by taking a full step of the position solve followed by a full step of the momentum solve. We could eliminate the first-order term in the expansion by instead taking a half-step of position, a full step of momentum, and then another half-step of position:

$$\Psi = \Phi^K_{\delta t / 2}\Phi^U_{\delta t}\Phi^K_{\delta t / 2},$$

i.e. a symmetric splitting. This gives a perturbed Hamiltonian that's accurate to $\delta t^2$ instead:

$$H' = H + \frac{\delta t^2}{24}\left(2\{U, \{U, K\}\} - \{K, \{K, U\}\}\right) + \mathscr{O}(\delta t^4)$$

This scheme is substantially more accurate and also shares a reversibility property with the true problem.
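
In code, the symmetric splitting is a small variation on semi_explicit_euler from earlier. This is a sketch under the assumption, true for all the examples in this post, that $K = |p|^2/2$, so that $\partial K/\partial p = p$:

def stormer_verlet(q, p, dt, num_steps, force):
    qs = np.zeros((num_steps + 1,) + q.shape)
    ps = np.zeros((num_steps + 1,) + p.shape)
    qs[0], ps[0] = q, p

    for t in range(num_steps):
        q_half = qs[t] + 0.5 * dt * ps[t]          # half-step of position
        ps[t + 1] = ps[t] + dt * force(q_half)     # full step of momentum
        qs[t + 1] = q_half + 0.5 * dt * ps[t + 1]  # half-step of position

    return qs, ps

Swapping this in for semi_explicit_euler in the orbital experiments above should shrink the energy oscillations by roughly another factor of $\delta t$.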

Making all of this analysis really rigorous requires a bit of Lie algebra sorcery that I can't claim to understand at any deep level. But for our purposes it's sufficient to know that symplectic methods like semi-explicit Euler sample exactly from some perturbed Hamiltonian, which is likely to have bounded level surfaces in phase space if the original Hamiltonian did. This fact gives us stability guarantees that are hard to come by any other way.

Molecular dynamics

The two-body gravitational problem is all well and good, but now let's try it for a more interesting and complex example: the motion of atoms. One of the simplest models for interatomic interactions is the Lennard-Jones (LJ) potential, which has the form

$$U = \epsilon\left(\left(\frac{R}{r}\right)^{12} - 2\left(\frac{R}{r}\right)^6\right).$$

The potential is repulsive at distances less than $R$, attractive at distances between $R$ and $2R$, and pretty much zero at distances appreciably greater than $2R$, with a well depth of $\epsilon$. The LJ potential is spherically symmetric, so it's not a good model for polyatomic molecules like water that have a non-trivial dipole moment, but it's thought to be a pretty good approximation for noble gases like argon. We'll work in a geometrized unit system where $\epsilon = 1$ and $R = 1$. The code below calculates the potential and forces for a system of several Lennard-Jones particles.

ϵ = 1.0
R = 1.0

def lennard_jones_potential(q):
    U = 0.0
    n = len(q)
    for i in range(n):
        for j in range(i + 1, n):
            z = q[i] - q[j]
            ρ = np.sqrt(np.dot(z, z)) / R
            U += ϵ / ρ ** 6 * (1 / ρ ** 6 - 2)

    return U

def lennard_jones_force(q):
    fs = np.zeros_like(q)
    n = len(q)
    for i in range(n):
        for j in range(i + 1, n):
            z = q[i] - q[j]
            ρ = np.sqrt(np.dot(z, z)) / R
            f = -12 * ϵ / R ** 2 / ρ ** 8 * (1 - 1 / ρ ** 6) * z
            fs[i] += f
            fs[j] -= f

    return fs

This code runs in $\mathscr{O}(n^2)$ time for a system of $n$ particles, but the Lennard-Jones interaction is almost completely negligible at distances greater than $3R$. There are approximation schemes that use spatial data structures like quadtrees to index the positions of all the particles and lump the effects of long-range forces. These schemes reduce the overall computational burden to $\mathscr{O}(n\cdot\log n)$ and are a virtual requirement for running large-scale simulations.
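
As a simpler illustration of the idea -- a sketch of my own, a neighbor-list shortcut rather than the tree codes just mentioned -- we can use a k-d tree to visit only the pairs within a cutoff radius, reusing the ϵ and R defined above:

from scipy.spatial import cKDTree

def lennard_jones_force_cutoff(q, cutoff=3 * R):
    fs = np.zeros_like(q)
    # query_pairs returns every pair (i, j) with i < j within the cutoff
    for i, j in cKDTree(q).query_pairs(cutoff):
        z = q[i] - q[j]
        ρ = np.sqrt(np.dot(z, z)) / R
        f = -12 * ϵ / R**2 / ρ**8 * (1 - 1 / ρ**6) * z
        fs[i] += f
        fs[j] -= f
    return fs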

For the initial setup, we'll look at a square lattice of atoms separated by a distance $R$. We'll start out with zero initial velocity. If you were to imagine an infinite or periodic lattice of Lennard-Jones atoms, a square lattice should be stable. The points immediately to the north, south, east, and west on the grid are exactly at the equilibrium distance, while the forces between an atom and its neighbors to the northwest and southeast should cancel. For this simulation, we won't include any periodicity, so it's an interesting question to see if the square lattice structure remains even in the presence of edge effects.

num_rows, num_cols = 10, 10
num_particles = num_rows * num_cols

q = np.zeros((num_particles, 2))
for i in range(num_rows):
    for j in range(num_cols):
        q[num_cols * i + j] = (R * i, R * j)
        
p = np.zeros((num_particles, 2))

I've added a progress bar to the simulation so I can see how fast it runs. Each iteration usually takes about the same time, so after about 10 or so you can tell whether you should plan to wait through the next cup of coffee or until next morning.

dt = 1e-2
num_steps = 2000

qs, ps = semi_explicit_euler(
    q, p, dt, num_steps, force=lennard_jones_force, progressbar=True
)

And now for some pretty animations.

%%capture
from matplotlib.animation import FuncAnimation

fig, ax = plt.subplots()
ax.set_axis_off()
ax.set_xlim((qs[:, :, 0].min(), qs[:, :, 0].max()))
ax.set_ylim((qs[:, :, 1].min(), qs[:, :, 1].max()))
ax.set_aspect("equal")
points = ax.scatter(qs[0, :, 0], qs[0, :, 1], animated=True)

def update(timestep):
    points.set_offsets(qs[timestep, :, :])

num_steps = len(qs)
fps = 60
animation = FuncAnimation(fig, update, num_steps, interval=1e3 / fps)
from IPython.display import HTML
HTML(animation.to_html5_video())

The square lattice is unstable -- the particles eventually rearrange into a hexagonal lattice. We can also see this if we plot the potential energy as a function of time. Around halfway into the simulation, the average potential energy suddenly drops by about $\epsilon / 5$.

ts = np.linspace(0, num_steps * dt, num_steps)
Us = np.array([lennard_jones_potential(q) for q in qs]) / num_particles
fig, ax = plt.subplots()
ax.set_xlabel("time")
ax.set_ylabel("potential energy")
ax.plot(ts, Us);

By way of a posteriori sanity checking, we can see that the total energy wasn't conserved exactly, but the deviations are bounded and the amplitude is much smaller than the characteristic energy scale $\epsilon$ of the problem.

Ks = 0.5 * np.sum(ps ** 2, axis=(1, 2)) / num_particles
Hs = Us + Ks

fig, ax = plt.subplots()
ax.set_xlabel("time")
ax.set_ylabel("energy")
ax.plot(ts, Us, label="potential")
ax.plot(ts, Hs, label="total")
ax.legend();

Conclusion

An introductory class in numerical ODE will show you how to construct convergent discretization schemes. Many real problems, however, have special structure that a general ODE scheme may or may not preserve. Hamiltonian systems are particularly rich in structure -- energy and phase space volume conservation, reversibility. Some very special discretization schemes preserve this structure. In this post, we focused only on the very basic symplectic Euler scheme and hinted at the similar but more accurate Störmer-Verlet scheme. Another simple symplectic method is the implicit midpoint rule

$$\frac{z_{n + 1} - z_n}{\delta t} = f\left(\frac{z_n + z_{n + 1}}{2}\right).$$
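
Here's a sketch of what that might look like for the planar Kepler problem from earlier -- my own code, reusing gravitational_force and scipy's root solver; the slicing assumes two position and two momentum components:

from scipy.optimize import root

def implicit_midpoint(q, p, dt, num_steps, force):
    qs = np.zeros((num_steps + 1,) + q.shape)
    ps = np.zeros((num_steps + 1,) + p.shape)
    qs[0], ps[0] = q, p

    # residual of z - z_t - δt f((z_t + z) / 2), with z = (q, p) stacked
    def f(z, q_t, p_t):
        q_mid, p_mid = (q_t + z[:2]) / 2, (p_t + z[2:]) / 2
        return np.concatenate(
            (z[:2] - q_t - dt * p_mid, z[2:] - p_t - dt * force(q_mid))
        )

    for t in range(num_steps):
        z_t = np.concatenate((qs[t], ps[t]))
        result = root(f, z_t, args=(qs[t], ps[t]))
        qs[t + 1], ps[t + 1] = result.x[:2], result.x[2:]

    return qs, ps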

There are of course higher-order symplectic schemes, for example Lobatto-type Runge-Kutta methods.

We showed a simulation of several particles interacting via the Lennard-Jones potential, which is spherically symmetric. Things get much more complicated when there are rotational degrees of freedom. The rotational degrees of freedom live not in flat Euclidean space but on the Lie group SO(3), and the angular momenta in the Lie algebra $\mathfrak{so}(3)$. More generally, there are specialized methods for problems with constraints, such as being a rotation matrix, or being confined to a surface.

If you want to learn more, my favorite references are Geometric Numerical Integration by Hairer, Lubich, and Wanner and Simulating Hamiltonian Dynamics by Leimkuhler and Reich.

Rosenbrock schemes

In the previous demo, we looked at a few spatial and temporal discretizations of the nonlinear shallow water equations. One of the challenging parts about solving systems of hyperbolic PDE like the shallow water equations is choosing a timestep that satisfies the Courant-Friedrichs-Lewy condition. You can pick a good timestep ahead of time for a linear autonomous system. A nonlinear system, on the other hand, might wander into weird parts of phase space where the characteristic wave speeds are much higher. You might be able to pick a good timestep from the outset, but it's likely to be overly conservative and waste loads of compute time. The tyranny of the CFL condition is the reason why it's such a common grumble among practitioners that ocean models explode if you look at them sideways.

All of the timestepping schemes we used in the previous demo were Runge-Kutta methods, which look something like this:

$$z_{n + 1} = z_n + \delta t\cdot \sum_ib_ik_i$$

where $b_i$ are weights and the stages $k_i$ are defined as

$$k_i = f\left(z_n + \delta t\sum_ja_{ij}k_j\right).$$

For the method to be explicit, we would need that $a_{ij} = 0$ if $j \ge i$. You can find all the conditions for a Runge-Kutta method to have a certain order of accuracy in time in books like Hairer and Wanner.

Implicit Runge-Kutta schemes can eliminate many of the frustrating stability issues that occur with explicit schemes. Implicit methods can use timesteps that far exceed the CFL-stable timestep. But they introduce the added complexity of having to solve a nonlinear system at every timestep. What globalization strategy will you use for Newton's method? What preconditioner will you use in solving the associated linear systems? These are all decisions you didn't have to make before. It's possible to reduce some of the pain and suffering by using schemes for which $a_{ii}$ can be nonzero but $a_{ij} = 0$ if $j > i$ -- these are the diagonally-implicit Runge-Kutta schemes. Rather than have to solve a gigantic nonlinear system for all of the stages $k_i$ at once, you only have to solve a sequence of smaller nonlinear systems, one for each stage.

The idea behind Rosenbrock methods is to perform only a single iteration of Newton's method for the nonlinear system defining the Runge-Kutta stages, rather than actually solve that system to convergence. There are two heuristic justifications for Rosenbrock schemes. First, a scheme like implicit Euler is only first-order accurate in time anyway, so there isn't much reason to do a really accurate nonlinear system solve as part of a fairly crude timestepping scheme. Second, for a timestep that isn't too much larger than the characteristic timescale of the problem, the current system state is probably either in the quadratic convergence basin for Newton's method or at least fairly close.

More general Rosenbrock schemes follow from this idea. The best reference I've found is one of the original papers on the subject, Kaps and Rentrop (1979). This paper shows more general schemes in this family, derives the order conditions for the various weights and parameters, and perhaps most importantly derives an embedded Rosenbrock scheme that can be used for adaptive timestep control. Here we'll show one of the most basic schemes, which comes from taking a single Newton step for the implicit midpoint rule.
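
To spell that out -- my own unpacking, consistent with the code later in this post -- write the implicit midpoint rule as the root-finding problem $$G(z_{n + 1}) = z_{n + 1} - z_n - \delta t\,f\left(\frac{z_n + z_{n + 1}}{2}\right) = 0$$ and take a single Newton step starting from the guess $z_{n + 1} = z_n$: $$\left(I - \frac{\delta t}{2}\frac{\partial f}{\partial z}\bigg|_{z_n}\right)(z_{n + 1} - z_n) = \delta t\,f(z_n).$$ One linear solve per timestep, no Newton iteration at all.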

Setup

All of this is copied from the previous demo, so I'll give only cursory explanations.

import firedrake
from firedrake import Constant
g = Constant(9.81)
I = firedrake.Identity(2)

The following functions compute symbolic representations of the various shallow water fluxes.

from firedrake import inner, grad, dx
def cell_flux(z):
    Z = z.function_space()
    h, q = firedrake.split(z)
    ϕ, v = firedrake.TestFunctions(Z)
    
    f_h = -inner(q, grad(ϕ)) * dx

    F = outer(q, q) / h + 0.5 * g * h**2 * I
    f_q = -inner(F, grad(v)) * dx

    return f_h + f_q
from firedrake import avg, outer, dS
def central_facet_flux(z):
    Z = z.function_space()
    h, q = firedrake.split(z)
    ϕ, v = firedrake.TestFunctions(Z)
    
    mesh = z.ufl_domain()
    n = firedrake.FacetNormal(mesh)

    f_h = inner(avg(q), ϕ('+') * n('+') + ϕ('-') * n('-')) * dS

    F = outer(q, q) / h + 0.5 * g * h**2 * I
    f_q = inner(avg(F), outer(v('+'), n('+')) + outer(v('-'), n('-'))) * dS
    
    return f_h + f_q
from firedrake import sqrt, max_value
def lax_friedrichs_facet_flux(z):
    Z = z.function_space()
    h, q = firedrake.split(z)
    ϕ, v = firedrake.TestFunctions(Z)
    
    mesh = h.ufl_domain()
    n = firedrake.FacetNormal(mesh)
    
    c = abs(inner(q / h, n)) + sqrt(g * h)
    α = avg(c)
    
    f_h = -α * (h('+') - h('-')) * (ϕ('+') - ϕ('-')) * dS
    f_q = -α * inner(q('+') - q('-'), v('+') - v('-')) * dS

    return f_h + f_q
def topographic_forcing(z, b):
    Z = z.function_space()
    h = firedrake.split(z)[0]
    v = firedrake.TestFunctions(Z)[1]

    return -g * h * inner(grad(b), v) * dx

For an explicit time discretization and a DG method in space, we can use an ILU preconditioner with a block Jacobi inner preconditioner and this will exactly invert the DG mass matrix.

block_parameters = {
    'ksp_type': 'preonly',
    'pc_type': 'ilu',
    'sub_pc_type': 'bjacobi'
}

parameters = {
    'solver_parameters': {
        'ksp_type': 'preonly',
        'pc_type': 'fieldsplit',
        'fieldsplit_0': block_parameters,
        'fieldsplit_1': block_parameters
    }
}
from firedrake import (
    NonlinearVariationalProblem as Problem,
    NonlinearVariationalSolver as Solver
)

class SSPRK3:
    def __init__(self, state, equation):
        z = state.copy(deepcopy=True)
        dt = firedrake.Constant(1.0)
        
        num_stages = 3
        zs = [state.copy(deepcopy=True) for stage in range(num_stages)]
        Fs = [equation(z), equation(zs[0]), equation(zs[1])]
        
        Z = z.function_space()
        w = firedrake.TestFunction(Z)
        forms = [
            inner(zs[0] - z, w) * dx - dt * Fs[0],
            inner(zs[1] - (3 * z + zs[0]) / 4, w) * dx - dt / 4 * Fs[1],
            inner(zs[2] - (z + 2 * zs[1]) / 3, w) * dx - 2 * dt / 3 * Fs[2]
        ]
        
        problems = [Problem(form, zk) for form, zk in zip(forms, zs)]
        solvers = [Solver(problem, **parameters) for problem in problems]
        
        self.state = z
        self.stages = zs
        self.timestep = dt
        self.solvers = solvers
    
    def step(self, timestep):
        self.timestep.assign(timestep)
        for solver in self.solvers:
            solver.solve()
        self.state.assign(self.stages[-1])

We'll create some auxiliary functions to actually run the simulation and create an animation of it.

from tqdm.notebook import trange

def run_simulation(solver, final_time, num_steps, output_freq):
    hs, qs = [], []
    pbar = trange(num_steps)
    for step in pbar:
        if step % output_freq == 0:
            h, q = solver.state.subfunctions
            hmin, hmax = h.dat.data_ro.min(), h.dat.data_ro.max()
            pbar.set_description(f'{hmin:5.3f}, {hmax:5.3f}')
            hs.append(h.copy(deepcopy=True))
            qs.append(q.copy(deepcopy=True))

        solver.step(timestep)

    return hs, qs
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
from IPython.display import HTML

def make_animation(hs, b, timestep, output_freq, **kwargs):
    fig, axes = plt.subplots()
    axes.set_aspect('equal')
    axes.set_xlim((0.0, Lx))
    axes.set_ylim((0.0, Ly))
    axes.set_axis_off()
    η = firedrake.project(hs[0] + b, hs[0].function_space())
    colors = firedrake.tripcolor(
        hs[0], num_sample_points=1, axes=axes, **kwargs
    )

    def animate(h):
        η.project(h + b)
        colors.set_array(η.dat.data_ro[:])

    interval = 1e3 * output_freq * timestep
    animation = FuncAnimation(fig, animate, frames=hs, interval=interval)

    plt.close(fig)
    return HTML(animation.to_html5_video())

Rosenbrock scheme

The implementation of the Rosenbrock scheme is fairly similar to the other timestepping methods we've shown before, but we have an extra term in the variational problem describing the linearization of the dynamics. We're also making the initializer take some extra arguments for solver parameters. When we were using explicit schemes, there was really only one sane choice of solver parameters because the matrix we had to invert was just a DG mass matrix. Here, the choice of iterative solvers and preconditioners can become much more involved, as we'll show later.

from firedrake import derivative
class Rosenbrock:
    def __init__(self, state, equation, solver_parameters=None):
        z = state.copy(deepcopy=True)
        F = equation(z)
        
        z_n = z.copy(deepcopy=True)
        Z = z.function_space()
        w = firedrake.TestFunction(Z)

        dt = firedrake.Constant(1.0)

        dF = derivative(F, z, z_n - z)
        problem = Problem(
            inner(z_n - z, w) * dx - dt / 2 * dF - dt * F,
            z_n
        )
        solver = Solver(problem, solver_parameters=solver_parameters)

        self.state = z
        self.next_state = z_n
        self.timestep = dt
        self.solver = solver
        
    def step(self, timestep):
        self.timestep.assign(timestep)
        self.solver.solve()
        self.state.assign(self.next_state)

Demonstration

We'll use the same input data and function spaces as before -- BDFM(2) for the momentum and DG(1) for the thickness.

nx, ny = 32, 32
Lx, Ly = 20., 20.
mesh = firedrake.PeriodicRectangleMesh(
    nx, ny, Lx, Ly, diagonal='crossed'
)

x = firedrake.SpatialCoordinate(mesh)
lx = 5.
y = Constant((lx, lx))
r = Constant(2.5)

h_0 = Constant(1.)
δh = Constant(1/16)
h_expr = h_0 + δh * max_value(0, 1 - inner(x - y, x - y) / r**2)

y = Constant((3 * lx, 3 * lx))
δb = Constant(1/4)
b = δb * max_value(0, 1 - inner(x - y, x - y) / r**2)
DG1 = firedrake.FunctionSpace(mesh, family='DG', degree=1)
BDFM2 = firedrake.FunctionSpace(mesh, family='BDFM', degree=2)
Z = DG1 * BDFM2
z0 = firedrake.Function(Z)
z0.sub(0).project(h_expr - b);
def F(z):
    return (
        cell_flux(z) +
        central_facet_flux(z) +
        lax_friedrichs_facet_flux(z) -
        topographic_forcing(z, b)
    )
SSPRK(3)

To get a baseline solution, we'll use the SSPRK(3) scheme from before.

solver = SSPRK3(z0, F)
final_time = 10.0
timestep = 5e-3
num_steps = int(final_time / timestep)
output_freq = 10
hs, qs = run_simulation(solver, final_time, num_steps, output_freq)
make_animation(
    hs, b, timestep, output_freq, shading='gouraud', vmin=0.96, vmax=1.04
)
energies_ssprk3 = [
    firedrake.assemble(
        0.5 * (inner(q, q) / h + g * (h + b)**2) * dx
    )
    for h, q in zip(hs, qs)
]

fig, axes = plt.subplots()
axes.plot(energies_ssprk3);

So that we have a number to compare against for later, we can calculate the total energy drift from the beginning to the end of the simulation:

energies_ssprk3[-1] - energies_ssprk3[0]
np.float64(0.1194008323880098)
Rosenbrock

Now let's see how the new scheme fares.

solver = Rosenbrock(z0, F)

We can use a much longer timestep than we could with explicit methods.

final_time = 10.0
timestep = 50e-3
num_steps = int(final_time / timestep)
output_freq = 1
hs, qs = run_simulation(solver, final_time, num_steps, output_freq)

A subtle but interesting feature you can see in this animation is that the spurious wave emanating from the bump at the bed has a much smaller magnitude with the Rosenbrock scheme than with any of the explicit schemes.

make_animation(
    hs, b, timestep, output_freq, shading='gouraud', vmin=0.96, vmax=1.04
)

The energy drift is cut by a factor of 5 compared to using an explicit scheme. On top of that, we were able to achieve it using much larger timesteps than were CFL-stable before, and as a consequence the overall time for the simulation is shorter.

energies_rosenbrock = [
    firedrake.assemble(
        0.5 * (inner(q, q) / h + g * (h + b)**2) * dx
    )
    for h, q in zip(hs, qs)
]

fig, axes = plt.subplots()
axes.plot(energies_rosenbrock);
energies_rosenbrock[-1] - energies_rosenbrock[0]
np.float64(0.22949445773156185)

FIXME: the statement above was from the last time I updated this in ~2022. The results from SSPRK3 got better and from Rosenbrock worse on a more recent Firedrake install. I should probably investigate this and figure out what's different.

Conclusion

In the previous post, we showed some of the difficulties associated with solving the shallow water equations. The two biggest problems we ran into were getting a CFL-stable timestep and controlling energy drift. Rosenbrock schemes almost eliminate stability problems and decimate the drift as well. While they are substantially more expensive for a single timestep, there are a lot of gains to be had by using a better preconditioner. On top of that, we can gain other efficiencies by approximating the linearized dynamics with a matrix that's easier to invert.

The shallow water equations

In the previous post, we explored how difficult it is to solve the simplest hyperbolic conservation law, the scalar advection equation. To solve this PDE accurately, we had to understand how the partly arbitrary choice of a numerical flux can make or break the stability of the spatial discretization, how low order schemes are very diffusive, and how higher-order explicit schemes introduce spurious maxima and minima that we can only control through a nonlinear flux limiting procedure. The scalar advection equation is comparatively simple in that signals can only propagate along the predetermined velocity field. In this post, we'll look at something more realistic and much more difficult: the shallow water equations. The shallow water equations are a system of equations rather than a scalar problem, and as such they can exhibit non-trivial wave propagation in a way that the advection equation can't. They're a great model for testing numerical solvers because they're both simple enough to keep in your head all at once, and yet at the same time they exhibit many of the complexities of more "serious" models -- nonlinearity, non-trivial conserved quantities, mimetic properties, the list goes on.

The shallow water equations can be derived from the incompressible Euler equations of fluid dynamics with a free surface under the assumption that the horizontal length scale is much longer than the vertical one. This approximation reduces the unknowns to the thickness $h$ of the fluid and the depth-averaged velocity $u$. The conservation laws are for mass and momentum:

$$\begin{align} \frac{\partial}{\partial t}h + \nabla\cdot hu & = 0 \\ \frac{\partial}{\partial t}hu + \nabla\cdot\left(hu\otimes u + \frac{1}{2}gh^2I\right) & = -gh\nabla b \end{align}$$

where $g$ is the acceleration due to gravity, $b$ is the bathymetry, and $I$ is the identity matrix. This problem is a little more complicated because of the time derivative on $h\cdot u$, a combination of two of the state variables. To make things a little easier, we'll instead work with the momentum $q = h\cdot u$ and rewrite the system as

$$\begin{align} \frac{\partial}{\partial t}h + \nabla\cdot q & = 0 \\ \frac{\partial}{\partial t}q + \nabla\cdot\left(h^{-1}q\otimes q + \frac{1}{2}gh^2I\right) & = -gh\nabla b. \end{align}$$

As in the previous post, we'll use a discontinuous Galerkin basis. We showed that there is more than one way to come up with a discretized problem that is consistent with the idealized one, and this freedom manifests itself in the choice of numerical flux. Things get much more interesting for systems of PDE, which can have more than one characteristic speed besides that of the background flow field. In the following, I'll assume you're familiar with the fact that the characteristic wave speed for the shallow water equations is

$$c = |u| + \sqrt{gh}.$$

The fact that the wave speed now depends on the solution and that waves propagate in all directions instead of only along a pre-set vector field has several consequences. First, we can't pick a CFL-stable timestep from the outset because the fluid velocity and thickness could increase well beyond their initial values. The only options for timestepping are to use an adaptive procedure or a whole mess of trial and error. Second, we have to think harder about numerical fluxes. For scalar equations, we can use the numerical flux to mimic the properties of upwind finite difference schemes, but for systems this reasoning doesn't work -- there can be waves simultaneously propagating in both normal directions at a given internal facet.
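
To make this concrete, here's a minimal sketch of what an adaptive timestep estimate based on the current state might look like. The function name, the Courant factor, and the use of a DG(0) projection to get cellwise wave speeds are my own choices here, not from any library.

import firedrake
from firedrake import inner, sqrt

def estimate_timestep(z, g, courant=1/4):
    # Cellwise estimate of the fastest wave speed |u| + sqrt(g * h)
    h, q = firedrake.split(z)
    mesh = z.ufl_domain()
    DG0 = firedrake.FunctionSpace(mesh, family='DG', degree=0)
    speeds = firedrake.project(sqrt(inner(q, q)) / h + sqrt(g * h), DG0)
    diameters = firedrake.project(firedrake.CellDiameter(mesh), DG0)
    # Leave a safety margin below the formal CFL bound
    return courant * diameters.dat.data_ro.min() / speeds.dat.data_ro.max()

An adaptive loop would call something like this before every step and pass the result to the integrator's step method.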

Spatial discretization

We'll examine several different test problems for the shallow water equations, so to avoid a huge amount of repeated code we'll first write a few Python functions to set up the weak form of the problem.

import firedrake
from firedrake import Constant
g = Constant(9.81)
I = firedrake.Identity(2)

The component of the fluxes in the interior of the cells is exactly what you get from applying the divergence theorem to the original system.

from firedrake import inner, grad, outer, dx
def cell_flux(z):
    Z = z.function_space()
    h, q = firedrake.split(z)
    ϕ, v = firedrake.TestFunctions(Z)
    
    f_h = -inner(q, grad(ϕ)) * dx

    F = outer(q, q) / h + 0.5 * g * h**2 * I
    f_q = -inner(F, grad(v)) * dx

    return f_h + f_q

Now we need to add the facet terms and this is where the numerical fluxes come in. We'll start with a function to compute the central flux. This is the easy part -- there are no choices to make.

from firedrake import avg, outer, dS
def central_facet_flux(z):
    Z = z.function_space()
    h, q = firedrake.split(z)
    ϕ, v = firedrake.TestFunctions(Z)
    
    mesh = z.ufl_domain()
    n = firedrake.FacetNormal(mesh)

    f_h = inner(avg(q), ϕ('+') * n('+') + ϕ('-') * n('-')) * dS

    F = outer(q, q) / h + 0.5 * g * h**2 * I
    f_q = inner(avg(F), outer(v('+'), n('+')) + outer(v('-'), n('-'))) * dS
    
    return f_h + f_q

The central flux by itself is unstable with explicit timestepping schemes, and to recover stability we need an extra diffusive flux term. The subtle choices are all in this diffusive flux. For the remainder of this demo, we'll use the Lax-Friedrichs numerical flux. This flux uses the maximum outgoing wave speed to set the local diffusion coefficient. An upper bound for the outgoing wave speed is then $|c| = |q/h \cdot n| + \sqrt{gh}$. I say an upper bound and not the actual maximum value because the system could exhibit the shallow water equivalent of supersonic flow -- where the speed $|u|$ exceeds $\sqrt{gh}$, both waves could be moving in the same direction rather than opposite each other.

The vast majority of the literature you'll find on DG methods uses the Lax-Friedrichs flux. For example, Cockburn and Shu (1998) in the last of their famous series of papers on the Runge-Kutta DG method consider only the Lax-Friedrichs flux and neglect to even mention the alternatives. This can be confusing for beginners to the subject because it isn't clear at what points in the process you (the programmer) have choices to make and where you don't. Part of the reason for this singular focus is that the Lax-Friedrichs flux is the simplest to implement, but several studies have found -- at least for problems without shock waves -- that other flux functions offer negligible gains in accuracy; see for example Qiu (2006) and Bernard et al. (2009). In that case, there isn't much value in using a different flux function that may be more expensive to compute and make the code harder to understand and maintain. The choice of numerical flux is more consequential for problems with shock waves or for more complex systems (see for example Beck et al. (2014), which studied turbulent flow).

from firedrake import sqrt, max_value
def lax_friedrichs_facet_flux(z):
    Z = z.function_space()
    h, q = firedrake.split(z)
    ϕ, v = firedrake.TestFunctions(Z)
    
    mesh = h.ufl_domain()
    n = firedrake.FacetNormal(mesh)
    
    c = abs(inner(q / h, n)) + sqrt(g * h)
    α = avg(c)
    
    f_h = -α * (h('+') - h('-')) * (ϕ('+') - ϕ('-')) * dS
    f_q = -α * inner(q('+') - q('-'), v('+') - v('-')) * dS

    return f_h + f_q

Finally, we'll do a few experiments with variable bottom topography as well.

def topographic_forcing(z, b):
    Z = z.function_space()
    h = firedrake.split(z)[0]
    v = firedrake.TestFunctions(Z)[1]

    return -g * h * inner(grad(b), v) * dx

Time discretization

We'll consider two different timestepping methods: the forward Euler scheme and the strong stability-preserving Runge-Kutta method of order 3, or SSPRK3 for short. Since we'll be using these quite a bit we'll factor them out into classes that store the needed internal state.

In the previous demo on the conservative advection equation, we used solver parameters that specified a block ILU preconditioner for the linear system. One application of this preconditioner gives an exact solution to the linear system because the mass matrix for DG discretizations is block diagonal. We're doing something a little different here by using a mixed function space for the thickness and momentum because it makes the time discretization much easier. But as a consequence the mass matrix that Firedrake will build under the hood uses a nested storage format -- probably compressed row storage for the thicknesses and block CRS for the momentum, but this shouldn't matter. In order to achieve the same exact linear solve effect here that we had for the conservative advection equation, we'll specify an outer-level fieldsplit preconditioner. Fieldsplit lets us separately specify preconditioners for each block, and those inner preconditioners will be our good old ILU + block Jacobi. You're likely to encounter fieldsplit preconditioners again if you ever solve mixed finite element problems like the Stokes equations.

block_parameters = {
    'ksp_type': 'preonly',
    'pc_type': 'bjacobi',
    'sub_pc_type': 'ilu'
}

parameters = {
    'solver_parameters': {
        'ksp_type': 'preonly',
        'pc_type': 'fieldsplit',
        'fieldsplit_0': block_parameters,
        'fieldsplit_1': block_parameters
    }
}

I've defined a time integrator class below that provides the bare minimum amount of functionality to do what we need. The data stored in this class consists of the current value of the state variables, an auxiliary state for the value at the next timestep, and a cached solver object for the mass matrix solve. The step method lets us advance the simulation by a single timestep, whose length we can change from one invocation to the next. In principle we could do adaptive timestepping with this implementation.

from firedrake import (
    NonlinearVariationalProblem as Problem,
    NonlinearVariationalSolver as Solver
)

class ForwardEuler:
    def __init__(self, state, equation):
        z = state.copy(deepcopy=True)
        F = equation(z)
        
        z_n = z.copy(deepcopy=True)
        Z = z.function_space()
        w = firedrake.TestFunction(Z)
        
        dt = firedrake.Constant(1.)

        problem = Problem(inner(z_n - z, w) * dx - dt * F, z_n)
        solver = Solver(problem, **parameters)
        
        self.state = z
        self.next_state = z_n
        self.timestep = dt
        self.solver = solver
        
    def step(self, timestep):
        self.timestep.assign(timestep)
        self.solver.solve()
        self.state.assign(self.next_state)

Demonstration

Our first test problem will be a periodic domain with a flat bottom. The initial state of the system will consist of a spherical perturbation to the water thickness, and we'll look at how this disturbance evolves in time.

nx, ny = 32, 32
Lx, Ly = 20., 20.
mesh = firedrake.PeriodicRectangleMesh(
    nx, ny, Lx, Ly, diagonal='crossed'
)

x = firedrake.SpatialCoordinate(mesh)
lx = 5.
y = Constant((lx, lx))
r = Constant(2.5)

h_0 = Constant(1.)
δh = Constant(1/16)
h_expr = h_0 + δh * max_value(0, 1 - inner(x - y, x - y) / r**2)
DG(0) basis; or, the finite volume method

We'll start by using the simplest discretization possible: piecewise constant basis functions for both the thickness and momentum. This method is identical to the lowest-order finite volume method. We'll also use a mixed function space $Z = Q \times V$ that includes both the thickness and momentum. This choice isn't strictly necessary but it makes it that much easier to write the time integrator you saw above. The code is equivalent to a single update for the combined state vector $z = (h, q)$ rather than two separate updates for each.

Q0 = firedrake.FunctionSpace(mesh, family='DG', degree=0)
V0 = firedrake.VectorFunctionSpace(mesh, family='DG', degree=0)
Z0 = Q0 * V0

The subfunctions attribute of functions in mixed spaces gives us the tuple of components. That way we can initialize the thickness to the expression defined just above. Note that the subfunctions attribute is different from the module-level function split, which gives us symbolic objects.

z0 = firedrake.Function(Z0)
h0, q0 = z0.subfunctions
h0.project(h_expr);
solver = ForwardEuler(
    z0,
    lambda z: (
        cell_flux(z) +
        central_facet_flux(z) +
        lax_friedrichs_facet_flux(z)
    )
)

Since we'll be running the same simulation many times, we'll wrap up the loop in a function.

from tqdm.notebook import trange

def run_simulation(solver, final_time, num_steps, output_freq):
    timestep = final_time / num_steps
    hs, qs = [], []
    pbar = trange(num_steps)
    for step in pbar:
        if step % output_freq == 0:
            h, q = solver.state.subfunctions
            hmin, hmax = h.dat.data_ro.min(), h.dat.data_ro.max()
            pbar.set_description(f'{hmin:5.3f}, {hmax:5.3f}')
            hs.append(h.copy(deepcopy=True))
            qs.append(q.copy(deepcopy=True))

        solver.step(timestep)

    return hs, qs
final_time = 10.
timestep = 5e-3
num_steps = int(final_time / timestep)
output_freq = 10
hs, qs = run_simulation(solver, final_time, num_steps, output_freq)
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
from IPython.display import HTML

def make_animation(hs, b, timestep, output_freq, **kwargs):
    fig, axes = plt.subplots()
    axes.set_aspect('equal')
    axes.set_xlim((0.0, Lx))
    axes.set_ylim((0.0, Ly))
    axes.set_axis_off()
    η = firedrake.project(hs[0] + b, hs[0].function_space())
    kw = {"num_sample_points": 4, "shading": "gouraud"}
    colors = firedrake.tripcolor(hs[0], **kw, **kwargs, axes=axes)

    fn_plotter = firedrake.FunctionPlotter(mesh, num_sample_points=4)
    def animate(h):
        η.project(h + b)
        colors.set_array(fn_plotter(η))

    interval = 1e3 * output_freq * timestep
    animation = FuncAnimation(fig, animate, frames=hs, interval=interval)

    plt.close(fig)
    return HTML(animation.to_html5_video())
make_animation(
    hs, Constant(0), timestep, output_freq, vmin=0.98, vmax=1.03
)

We get the expected propagation at first, but the method is so diffusive that the waves quickly get washed out. Another sanity check that we'll repeat through the following is to track the total energy of the system. The shallow water equations conserve the quantity

$$E = \frac{1}{2}\int_\Omega\left(\frac{|q|^2}{h} + g(h + b)^2\right)dx.$$

(There's a Hamiltonian formulation of the shallow water equations based on this energy functional, although the Poisson bracket is a little weird.) Approximately conserving the total energy is especially important for long-running simulations of systems where physical dissipation mechanisms are relatively weak or non-existent. The plot below shows that the explicit Euler scheme with DG(0) basis functions and the Lax-Friedrichs flux dissipates energy.

energies = [
    firedrake.assemble(
        0.5 * (inner(q, q) / h + g * h**2) * dx
    )
    for h, q in zip(hs, qs)
]
fig, axes = plt.subplots()
axes.plot(energies);
With topography

What happens if we add a bump at the bed? We'll use a similar perturbation to the initial thickness.

y = Constant((3 * lx, 3 * lx))
δb = Constant(1/4)
b = δb * max_value(0, 1 - inner(x - y, x - y) / r**2)

In order to make the initial state have a flat surface, we'll subtract off the bottom topography from the initial thickness.

z0 = firedrake.Function(Z0)
z0.sub(0).project(h_expr - b);

The only thing different here is the presence of the topographic forcing term.

def F(z):
    return (
        cell_flux(z) +
        central_facet_flux(z) +
        lax_friedrichs_facet_flux(z) -
        topographic_forcing(z, b)
    )

solver = ForwardEuler(z0, F)
hs, qs = run_simulation(solver, final_time, num_steps, output_freq)
make_animation(
    hs, b, timestep, output_freq, vmin=0.96, vmax=1.04
)

This is even worse than before; the resolution of the bump at the bed is poor enough that there's a permanent disturbance around it. The energy drift has a very different character this time around, instead oscillating around a higher value than the initial one.

energies = [
    firedrake.assemble(
        0.5 * (inner(q, q) / h + g * (h + b)**2) * dx
    )
    for h, q in zip(hs, qs)
]

fig, axes = plt.subplots()
axes.plot(energies);
DG(1) basis

Now let's try increasing the resolution by using a higher-order finite element basis and seeing what happens.

Q1 = firedrake.FunctionSpace(mesh, family='DG', degree=1)
V1 = firedrake.VectorFunctionSpace(mesh, family='DG', degree=1)
Z1 = Q1 * V1

We can reuse the expression objects for the starting thickness and the bed bump since they're purely symbolic.

z0 = firedrake.Function(Z1)
z0.sub(0).project(h_expr - b);

The integrator and the simulation loop are the same as they were before.

solver = ForwardEuler(z0, F)
hs, qs = run_simulation(solver, final_time, num_steps, output_freq)

With the DG(1) basis, the results are much less diffusive than with DG(0), but we're still getting weird artifacts near the bed bump.

make_animation(hs, b, timestep, output_freq, vmin=0.96, vmax=1.04)

While the DG(1) basis gives substantially better results in the eyeball norm than the DG(0) basis, the energy drift has gotten much worse. If we were to run this simulation for an even longer time, the unphysical injection of energy into the system could eventually lead to such large momenta that the timestep we used is no longer stable.

energies = [
    firedrake.assemble(
        0.5 * (inner(q, q) / h + g * (h + b)**2) * dx
    )
    for h, q in zip(hs, qs)
]

fig, axes = plt.subplots()
axes.plot(energies);

Now let's try a higher-order timestepping scheme.

SSPRK(3) scheme

The strong stability-preserving Runge-Kutta method of order 3 is a popular timestepping scheme for hyperbolic equations. The 3rd-order convergence is obviously a big advantage in addition to simplicity of implementation. We can implement a timestepping class for it in a similar way to the Euler scheme, but with more internal state for the Runge-Kutta stages.

class SSPRK3:
    def __init__(self, state, equation):
        z = state.copy(deepcopy=True)
        dt = firedrake.Constant(1.0)
        
        num_stages = 3
        zs = [state.copy(deepcopy=True) for stage in range(num_stages)]
        Fs = [equation(z), equation(zs[0]), equation(zs[1])]
        
        Z = z.function_space()
        w = firedrake.TestFunction(Z)
        forms = [
            inner(zs[0] - z, w) * dx - dt * Fs[0],
            inner(zs[1] - (3 * z + zs[0]) / 4, w) * dx - dt / 4 * Fs[1],
            inner(zs[2] - (z + 2 * zs[1]) / 3, w) * dx - 2 * dt / 3 * Fs[2]
        ]
        
        problems = [Problem(form, zk) for form, zk in zip(forms, zs)]
        solvers = [Solver(problem, **parameters) for problem in problems]
        
        self.state = z
        self.stages = zs
        self.timestep = dt
        self.solvers = solvers
    
    def step(self, timestep):
        self.timestep.assign(timestep)
        for solver in self.solvers:
            solver.solve()
        self.state.assign(self.stages[-1])
z0 = firedrake.Function(Z1)
z0.sub(0).project(h_expr - b)

solver = SSPRK3(z0, F)

Since there are now three Runge-Kutta stages, the simulation takes about 3x as long to run.

hs, qs = run_simulation(solver, final_time, num_steps, output_freq)

With the timestep we've chosen, the results of using the SSPRK3 scheme are visually indistinguishable from those of the explicit Euler scheme. This improved method has basically the same energy drift as explicit Euler even though it is formally of higher order in time.

energies = [
    firedrake.assemble(
        0.5 * (inner(q, q) / h + g * (h + b)**2) * dx
    )
    for h, q in zip(hs, qs)
]

fig, axes = plt.subplots()
axes.plot(energies);

Since the SSPRK3 scheme is more accurate, we would ideally be able to use a timestep that's several times larger and get more accuracy at less cost if we wanted to. But this scheme has the same CFL requirements as explicit Euler, so if we wanted to do that we'd also have to reduce the resolution in space. There is a 4-stage, 3rd-order accurate SSPRK scheme with twice as large a stability region, which gives us more flexibility in what tradeoffs we make while using the same spatial discretization; a sketch of it appears below. Nonetheless, even that scheme won't help reduce the energy drift that we've incurred by going to the more sophisticated DG(1) discretization in space.
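
For the curious, here's an untested sketch of that four-stage scheme written in the same style as the SSPRK3 class above. The stage coefficients are the standard Spiteri-Ruuth SSP(4,3) ones; the class itself is mine, not from the original post.

class SSPRK43:
    def __init__(self, state, equation):
        z = state.copy(deepcopy=True)
        dt = firedrake.Constant(1.0)

        num_stages = 4
        zs = [state.copy(deepcopy=True) for stage in range(num_stages)]
        Fs = [equation(z), equation(zs[0]), equation(zs[1]), equation(zs[2])]

        Z = z.function_space()
        w = firedrake.TestFunction(Z)
        # Stages: z1 = z + dt/2 F(z); z2 = z1 + dt/2 F(z1);
        # z3 = 2/3 z + 1/3 z2 + dt/6 F(z2); z_n+1 = z3 + dt/2 F(z3)
        forms = [
            inner(zs[0] - z, w) * dx - dt / 2 * Fs[0],
            inner(zs[1] - zs[0], w) * dx - dt / 2 * Fs[1],
            inner(zs[2] - (2 * z + zs[1]) / 3, w) * dx - dt / 6 * Fs[2],
            inner(zs[3] - zs[2], w) * dx - dt / 2 * Fs[3]
        ]

        problems = [Problem(form, zk) for form, zk in zip(forms, zs)]
        solvers = [Solver(problem, **parameters) for problem in problems]

        self.state = z
        self.stages = zs
        self.timestep = dt
        self.solvers = solvers

    def step(self, timestep):
        self.timestep.assign(timestep)
        for solver in self.solvers:
            solver.solve()
        self.state.assign(self.stages[-1])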

Hipster elements

Just going to a higher-order time discretization didn't substantially reduce the energy drift. Let's see what happens if we consider a different spatial discretization instead. The idea behind compatible finite element discretizations is to use function spaces for the thickness and momentum fields such that the divergence operator maps the momentum space exactly onto the thickness space. Compatible elements tend to recover many of the beneficial properties of staggered finite difference grids. If that subject is unfamiliar to you, I learned about it from Dale Durran's book. I learned about compatible finite elements from Colin Cotter, who's done a lot of the work developing this idea and demonstrating its usefulness on practical problems. I can't do justice to the subject here, but if you want to read more Natale et al. (2016) gives a great review.

The same elements that work well for the mixed form of the Poisson equation tend to also work well for the shallow water equations, which is lucky because people have been studying which elements work for mixed Poisson since the 70s. Here, we'll use the Brezzi-Douglas-Fortin-Marini element or BDFM(2) for short.

DG1 = firedrake.FunctionSpace(mesh, family='DG', degree=1)
BDFM2 = firedrake.FunctionSpace(mesh, family='BDFM', degree=2)
Z = DG1 * BDFM2
z0 = firedrake.Function(Z)
z0.sub(0).project(h_expr - b)

solver = ForwardEuler(z0, F)

The BDFM element family has more degrees of freedom than DG(1) and requires a more involved transformation to the reference element, so we should expect that this scheme will be substantially more expensive than the schemes we've used so far. On top of that, since the velocity space uses polynomials of higher degree than 1, we'll also incur a greater penalty from CFL by needing a smaller timestep. We've used a timestep so far of 1/200. We could get away with a timestep of 1/100 for the DG(1)/DG(1) discretization but not for BDFM(2)/DG(1).

hs, qs = run_simulation(solver, final_time, num_steps, output_freq)

While this discretization was much more expensive than DG(1), we make up for it by cutting the energy drift substantially.

energies = [
    firedrake.assemble(
        0.5 * (inner(q, q) / h + g * (h + b)**2) * dx
    )
    for h, q in zip(hs, qs)
]

fig, axes = plt.subplots()
axes.plot(energies);

If we really wanted close to exact energy conservation, we could explore symplectic schemes. The disadvantage of symplectic schemes is that they're typically implicit, for example the implicit midpoint rule.
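
To show what that would look like, here's a minimal, untested sketch of the implicit midpoint rule in the same style as the other integrators in this post. The mass-matrix-only solver parameters no longer apply because each step is now a genuine nonlinear solve, so I've left the solver options at Firedrake's defaults; using replace to evaluate the dynamics at the midpoint is my own shortcut and worth double-checking.

from firedrake import replace

class ImplicitMidpoint:
    def __init__(self, state, equation):
        z = state.copy(deepcopy=True)
        z_n = z.copy(deepcopy=True)
        Z = z.function_space()
        w = firedrake.TestFunction(Z)
        dt = firedrake.Constant(1.0)

        # Evaluate the dynamics at the average of the current and next
        # states; this is what makes the scheme implicit and symplectic.
        F = replace(equation(z), {z: (z + z_n) / 2})
        problem = Problem(inner(z_n - z, w) * dx - dt * F, z_n)
        solver = Solver(problem)

        self.state = z
        self.next_state = z_n
        self.timestep = dt
        self.solver = solver

    def step(self, timestep):
        self.timestep.assign(timestep)
        self.solver.solve()
        self.state.assign(self.next_state)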

Conclusion

Hyperbolic systems of conservation laws, like the shallow water equations, introduce a huge amount of new complexity compared to scalar problems. Here we've looked at some of the issues around the numerical flux and the choices of time and space discretization. Going to higher order in space gave us more accurate solutions but introduced an additional energy drift that could be a serious problem in long-running simulations. Using a more accurate time discretization didn't reduce the drift at all. We had to use a different and even more sophisticated spatial discretization in order to reduce this effect.

We've focused here on explicit timestepping schemes. These have the virtue of being particularly simple to implement. In other scenarios, however, you may end up stuck with a mesh that's finer than you'd like. The CFL condition can then become oppressive if the wave speeds across finer regions grow too high. In the posts that follow, we'll dig into this issue even more and start looking at implicit and Rosenbrock-type schemes.

Convection-diffusion

In a previous post, we looked at how to discretize conservation laws using the discontinuous Galerkin method. The DG method, with an appropriate choice of numerical flux and limiting scheme, can be a great way to solve purely hyperbolic problems. But many realistic problems aren't purely hyperbolic -- there's diffusive transport as well:

$$\frac{\partial\phi}{\partial t} + \nabla\cdot \phi u = \nabla\cdot k\nabla \phi.$$

where $\phi$ is the solution, $u$ is a velocity field, and $k$ is the conductivity. Depending on the ratio $UL / k$, where $U$ is a characteristic speed and $L$ is a characteristic length scale, the problem is either advection- or diffusion-dominated.

What basis functions should we use for a problem with mixed character like convection-diffusion? It might seem that we have only bad choices. We could use continuous basis functions, which makes the diffusive part of the equation easy, but we might then need to stabilize the advective part using, say, streamline upwinding. DG methods work perfectly well for discretizing the advection operator, but it feels a little strange to use basis functions that don't even live in the same function space as the solution. Nonetheless, it is possible to use DG for elliptic problems and we've already seen how in the form of Nitsche's method.

DG for elliptic problems

Recall that Nitsche's method gave us a way to weakly impose Dirichlet boundary conditions for elliptic PDE by modifying the variational principle rather than the discretized linear system. In the post where I introduced Nitsche's method, the application was for Dirichlet conditions at the boundary $\partial\Omega$ of the domain $\Omega$. To arrive at a DG discretization of elliptic problems, we'll instead imagine using Nitsche's method on every cell of the mesh. Rather than impose a set value on the boundary of each cell, we'll instead use Nitsche's method to penalize discontinuous solutions. For this reason the method that I'll describe is called the symmetric interior-penalty discontinuous Galerkin method.

Let's suppose we were only interested in solving the variational problem to minimize

$$J(\phi) = \frac{1}{2}\int_\Omega k|\nabla\phi|^2dx$$

subject to the boundary condition $\phi|_{\partial\Omega} = g$. This is another way of stating the weak form of the elliptic part of our problem above. Rather than enforce the Dirichlet boundary condition at the level of the discretized linear system, as is customary, we could instead use a different action functional:

$$J(\phi) = \frac{1}{2}\int_\Omega k|\nabla \phi|^2dx - \int_{\partial\Omega}k\frac{\partial \phi}{\partial n}(\phi - g)\,ds + \sum_E\int_E\frac{\gamma k}{2h}|\phi - g|^2ds,$$

where $E$ are all of the boundary faces of the mesh, $h$ the diameter of $E$, and $\gamma$ a constant. Let $\theta$ be the smallest angle between any two edges of the mesh, $d$ the spatial dimension, and $p$ the polynomial degree of the finite element basis. In the post on Nitsche's method, we showed that if

$$\gamma > 2 \cdot p \cdot (p + d - 1) \cdot \csc\theta\cdot\cot\left(\frac{\theta}{2}\right)\cdot\frac{\max k}{\min k}$$

the modified action functional is convex and we can be assured that there is a discrete solution. The advantage of Nitsche's method is that we don't have to modify the discretized linear system, which can be error-prone, and that we can compute a good value of $\gamma$ just by examining the mesh and conductivity coefficient.

The idea behind DG discretization of elliptic problems is that, instead of using Nitsche's method to enforce a Dirichlet condition at the boundary of the whole domain, we use it to force the solution to be continuous across element boundaries. Rather than match $\phi$ to some externally defined function $g$, we instead match the value $\phi_+$ on one side of a facet $E$ to the value $\phi_-$ on the other side; terms like $\phi - g$ instead become $[\phi] = \phi_+ - \phi_-$, where we've introduced the shorthand $[\cdot]$ to denote the jump across an inter-element boundary. In that case, the action functional becomes

$$J(\phi) = \frac{1}{2}\sum_K\int_K k|\nabla\phi|^2dx + \sum_E\int_Ek\left[\frac{\partial\phi}{\partial n}\right][\phi]\, dS + \sum_E\int_E\frac{\gamma k}{2h}[\phi]^2dS + \ldots$$

where I've left off the remaining terms. The same considerations apply to picking $\gamma$ and we can actually get away with using the exact same value as before. To illustrate, let's try this on a toy problem.

Demonstration

We'll use the unit square as our domain. Although we know ahead of time that the minimum angle between the edges of any triangle is $\pi / 4$, I've included some code here to calculate it. We'll need to know this value in order to get a good value of the penalty parameter.

import firedrake
nx, ny = 32, 32
mesh = firedrake.UnitSquareMesh(nx, ny)
import numpy as np
from numpy.linalg import norm

coords = mesh.coordinates.dat.data_ro
cells = mesh.coordinates.cell_node_map().values

θ = np.inf
for cell in cells:
    for k in range(3):
        x, y, z = coords[np.roll(cell, k)]
        ζ, ξ = y - x, z - x
        angle = np.arccos(np.inner(ζ, ξ) / (norm(ζ) * norm(ξ)))
        θ = min(angle, θ)

For boundary values, we'll use a random trigonometric polynomial.

from numpy import pi as π
from numpy.random import default_rng
x, y = firedrake.SpatialCoordinate(mesh)

rng = default_rng(seed=1)
def random_fourier_series(std_dev, num_modes, exponent):
    from firedrake import sin, cos
    A = std_dev * rng.standard_normal((num_modes, num_modes))
    B = std_dev * rng.standard_normal((num_modes, num_modes))
    expr = sum([(A[k, l] * sin(π * (k * x + l * y)) +
                 B[k, l] * cos(π * (k * x + l * y)))
                / (1 + (k**2 + l**2)**(exponent/2))
                for k in range(num_modes)
                for l in range(int(np.sqrt(num_modes**2 - k**2)))])
    return expr
g = random_fourier_series(std_dev=.25, num_modes=5, exponent=1)
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d
V = firedrake.FunctionSpace(mesh, family='CG', degree=2)
fig = plt.figure()
axes = fig.add_subplot(projection='3d')
firedrake.trisurf(firedrake.Function(V).interpolate(g), axes=axes);

For basis functions, we'll use polynomials of degree 1. As we saw in the previous post on hyperbolic conservation laws, these gave very good numerical solutions provided that we had a good timestepping scheme. Now that we know the polynomial degree and the mesh quality, we can calculate the penalty parameter for Nitsche's method. We've put in a fudge factor $\alpha$ to make sure the penalty parameter is strictly greater than the minimum possible value.

from firedrake import Constant
p = 1
α = 1/2
C = p * (p + 1)
γ = Constant(2 * C / α**2 / (np.sin(θ) * np.tan(θ/2)))

The action functional for the DG formulation of the problem consists of five parts: an internal cell flux; the across-cell flux; the across-cell penalty; the boundary flux; and the boundary penalty.

from firedrake import inner, grad, dx, ds, dS

Q = firedrake.FunctionSpace(mesh, family='DG', degree=p)
n = firedrake.FacetNormal(mesh)
h = firedrake.CellSize(mesh)

ϕ = firedrake.project(g, Q)
dϕ_dn = inner(grad(ϕ), n)

J_cells = 0.5 * inner(grad(ϕ), grad(ϕ)) * dx
J_facet_flux = -(dϕ_dn('+') - dϕ_dn('-')) * (ϕ('+') - ϕ('-')) * dS
J_facet_penalty = 0.5 * γ / (h('+') + h('-')) * (ϕ('+') - ϕ('-'))**2 * dS

J_flux = -dϕ_dn * (ϕ - g) * ds
J_penalty = 0.5 * γ * (ϕ - g)**2 / h * ds

J = (
    J_cells +
    J_facet_flux +
    J_facet_penalty +
    J_flux +
    J_penalty
)

Finally, we can solve the PDE just like we did before. Note that, since we're using discontinuous piecewise linear basis functions, there are three degrees of freedom for every triangle of the mesh. Had we used continuous piecewise linear basis functions, the solution would have one degree of freedom for every vertex, which is much fewer than for the DG basis. Consequently we can expect the solution process to take much longer when using DG.
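
To put numbers on that claim, we can compare the dimensions of the two function spaces; this little check is mine and assumes the 32 × 32 mesh defined above.

CG1 = firedrake.FunctionSpace(mesh, family='CG', degree=1)
# DG(1) has three unknowns per triangle, CG(1) one per vertex
print(Q.dim(), CG1.dim())

For this mesh that's three unknowns for each of the 2048 triangles against one for each of the 1089 vertices.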

F = firedrake.derivative(J, ϕ)
parameters_lu = {
    'solver_parameters': {
        'ksp_type': 'preonly',
        'pc_type': 'lu',
        'pc_factor_mat_solver_type': 'mumps'
    }
}
firedrake.solve(F == 0, ϕ, **parameters_lu)
fig = plt.figure()
axes = fig.add_subplot(projection='3d')
triangles = firedrake.trisurf(ϕ, axes=axes)
fig.colorbar(triangles);

For comparison, we can compute the solution using continuous basis functions.

CG = firedrake.FunctionSpace(mesh, family='CG', degree=p)
ψ = firedrake.Function(CG)
J = 0.5 * inner(grad(ψ), grad(ψ)) * dx

F = firedrake.derivative(J, ψ)
bc = firedrake.DirichletBC(CG, g, 'on_boundary')
firedrake.solve(F == 0, ψ, bc, **parameters_lu)
firedrake.norm(ϕ - ψ) / firedrake.norm(ψ)
np.float64(0.012794504299351675)

The relative error is quite small, which suggests that our approach isn't totally off base. We're now ready to introduce timestepping, which gets quite interesting for problems of mixed character like ours.

IMEX methods

The natural way to timestep the heat equation is to use an implicit scheme like backward Euler. If you tried to use an explicit method, the CFL condition would become oppressive for parabolic problems because perturbations propagate with infinite speed. For the advection equation, on the other hand, the natural approach is to use an explicit scheme because perturbations travel with finite speed. In our case we'll simplify things considerably by assuming the velocity of the medium doesn't change in time, but for realistic problems it too obeys some PDE and we have to assume that it can change. The argument for using explicit schemes becomes much stronger when the velocity can change because it becomes much harder to reuse information from past implicit solves.

We can reconcile the fact that different timestepping schemes work best for each part of the equation by using a splitting method. The idea of splitting methods is to separately integrate each part of the dynamics with a method that works best for it. To illustrate this it's helpful to imagine the problem as a linear system of ODEs

$$\frac{\partial\phi}{\partial t} = (A + B)\phi$$

where $A$ and $B$ are operators. The formal solution to this problem can be expressed using the matrix exponential:

$$\phi(t) = e^{t(A + B)}\phi(0)$$

but we're then faced with the challenge of simultaneously integrating both parts of the dynamics. If $A$ and $B$ commute, then the exponential factors into a product, but we have to get very lucky for that to happen. Instead, we can recognize that

$$\exp\{t(A + B)\} = \exp(tA)\exp(tB)\exp\left(-\frac{t^2}{2}[A, B]\right)\cdot \ldots$$

In other words, to approximate integrating the problem forward by one timestep, we can first integrate the $A$ dynamics, then the $B$ dynamics. There is a cost in doing so: we incur a splitting error proportional to the commutator $[A, B]$ of the two operators. But this error is only of the order of $\delta t^2$ for a single timestep and thus on the order of $\delta t$ overall. We can derive higher-order splitting schemes that incur even less of a cost, often using the Zassenhaus formula. The equation I wrote above is the Zassenhaus formula up to second order.
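
We can sanity check this scaling numerically with a pair of random matrices; this little experiment is mine and not part of the original argument.

import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(seed=0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

for t in [1e-1, 1e-2, 1e-3]:
    # The splitting error should shrink like t**2, tracking the commutator term
    error = np.linalg.norm(expm(t * (A + B)) - expm(t * A) @ expm(t * B))
    commutator = 0.5 * t**2 * np.linalg.norm(A @ B - B @ A)
    print(f"{t:.0e}: {error:.2e} vs {commutator:.2e}")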

In our particular case, we know that using implicit Euler will work well for the parabolic part of the problem and explicit Euler (with a CFL-stable timestep) will work well for the hyperbolic part. A first-order splitting scheme amounts to using each scheme in succession. This approach to mixed-type problems is often called an implicit/explicit or IMEX discretization in the literature.

Demonstration

We'll use a very similar setup to the post on conservation laws to examine a simple first-order IMEX scheme for the convection-diffusion equation. The velocity field will be uniform solid body rotation about the center of the domain. In order to keep the solution from being immediately smoothed over, we'll use a conductivity coefficient of 1/1000. This puts the problem well in the advection-dominated regime. Nonetheless, the diffusivity is large enough that we won't need to use a flux limiter; any spurious oscillations will naturally get smoothed out.

p = 1
Q = firedrake.FunctionSpace(mesh, family='DG', degree=p)
from firedrake import sqrt, as_vector, min_value, max_value

x = firedrake.SpatialCoordinate(mesh)
x_c = Constant((5/8, 5/8))
R_c = Constant(1/8)

x_b = Constant((3/8, 3/8))
R_b = Constant(1/8)

ϕ_expr = (
    max_value(0, 1 - sqrt(inner(x - x_c, x - x_c) / R_c**2)) +
    max_value(0, 1 - inner(x - x_b, x - x_b) / R_b**2)
)
ϕ_0 = firedrake.project(ϕ_expr, Q)
ϕ = ϕ_0.copy(deepcopy=True)
x = firedrake.SpatialCoordinate(mesh)
y = Constant((.5, .5))
r = x - y
u = as_vector((-r[1], r[0]))
speed = firedrake.Function(Q).interpolate(sqrt(inner(u, u)))
max_speed = speed.dat.data_ro.max()

Q0 = firedrake.FunctionSpace(mesh, family='DG', degree=0)
diameters = firedrake.project(firedrake.CellDiameter(mesh), Q0)
min_diameter = diameters.dat.data_ro.min()

cfl_timestep = min_diameter / max_speed

final_time = 2 * π
num_steps = 4 * int(final_time / cfl_timestep)
dt = firedrake.Constant(final_time / num_steps)

First, we'll create a solver for the diffusive part of the problem. We can take advantage of the fact that the implicit Euler timestep for a parabolic PDE is itself a minimization problem to keep our description as succinct as possible.

J_mass = 0.5 * (ϕ - ϕ_0)**2 * dx

k = Constant(1e-3)
dϕ_dn = inner(grad(ϕ), n)
J_cells = 0.5 * inner(grad(ϕ), grad(ϕ)) * dx
J_facet_flux = -(dϕ_dn('+') - dϕ_dn('-')) * (ϕ('+') - ϕ('-')) * dS
J_facet_penalty = 0.5 * γ / (h('+') + h('-')) * (ϕ('+') - ϕ('-'))**2 * dS

J = J_mass + dt * k * (J_cells + J_facet_flux + J_facet_penalty)

F = firedrake.derivative(J, ϕ)
heat_problem = firedrake.NonlinearVariationalProblem(F, ϕ)
heat_solver = firedrake.NonlinearVariationalSolver(heat_problem, **parameters_lu)

Next we'll create a solver for the convective part of the problem.

from firedrake import min_value, max_value
m = firedrake.TrialFunction(Q) * firedrake.TestFunction(Q) * dx

ψ = firedrake.TestFunction(Q)
cell_flux = -inner(grad(ψ), ϕ * u) * dx

f = ϕ * max_value(0, inner(u, n))
facet_flux = (f('+') - f('-')) * (ψ('+') - ψ('-')) * dS

ϕ_in = firedrake.Constant(0)
influx = ϕ_in * min_value(0, inner(u, n)) * ψ * ds
outflux = ϕ * max_value(0, inner(u, n)) * ψ * ds

dϕ_dt = -(cell_flux + facet_flux + influx + outflux)
δϕ = firedrake.Function(Q)
flow_problem = firedrake.LinearVariationalProblem(m, dt * dϕ_dt, δϕ)
parameters_bjac = {
    'solver_parameters': {
        'ksp_type': 'preonly',
        'pc_type': 'bjacobi',
        'sub_pc_type': 'ilu'
    }
}
flow_solver = firedrake.LinearVariationalSolver(flow_problem, **parameters_bjac)

Now we can try the alternating scheme.

from tqdm.notebook import tqdm, trange

output_freq = 4
ϕs = []

for step in trange(num_steps, unit='timesteps'):
    flow_solver.solve()
    ϕ += δϕ

    ϕ_0.assign(ϕ)
    heat_solver.solve()

    if step % output_freq == 0:
        ϕs.append(ϕ.copy(deepcopy=True))
%%capture
fig, axes = plt.subplots()
axes.set_aspect('equal')
axes.set_axis_off()
colors = firedrake.tripcolor(ϕ, num_sample_points=1, vmin=0., vmax=1., axes=axes)
from firedrake.plot import FunctionPlotter
from matplotlib.animation import FuncAnimation
fn_plotter = FunctionPlotter(mesh, num_sample_points=1)
animate = lambda φ: colors.set_array(fn_plotter(φ))
interval = 1e3 * output_freq * float(dt)
animation = FuncAnimation(fig, animate, frames=tqdm(ϕs), interval=interval)
from IPython.display import HTML
HTML(animation.to_html5_video())

Unlike in the purely hyperbolic case, the solution becomes much smoother by the time the medium has completed a full rotation. If we had used an even smaller conductivity coefficient $k$, the intrinsic diffusion in the problem would be too small to suppress the oscillations that we would get for a purely convective problem using degree-1 basis functions. In that case a flux limiter would be necessary to get a decent solution.

Conclusion

The natural choices to make about spatial and temporal discretization for parabolic and hyperbolic problems are complete opposites. To solve a PDE that combines aspects of both types, finding a good middle ground can be a challenge. Here we showed that non-conforming discretizations of elliptic PDE allow us to apply what we already know about DG discretizations of conservation laws to other problems.

For time discretization, operator splitting helps turn a harder problem into two easy ones. In this post we only showed a first-order splitting scheme, but Strang splitting achieves second-order accuracy with only a little more work; a sketch appears below. Operator splitting is a great trick generally, but achieving more than first-order accuracy for three or more operators is much harder than for two.
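
As an untested sketch, Strang splitting for the problem above would take a half-step of convection, a full step of diffusion, and then another half-step of convection, reusing the solvers defined earlier and reassigning the timestep constant dt around each solve.

δt = final_time / num_steps
for step in trange(num_steps, unit='timesteps'):
    # Half-step of convection
    dt.assign(δt / 2)
    flow_solver.solve()
    ϕ += δϕ

    # Full step of diffusion
    dt.assign(δt)
    ϕ_0.assign(ϕ)
    heat_solver.solve()

    # Second half-step of convection
    dt.assign(δt / 2)
    flow_solver.solve()
    ϕ += δϕ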

Conservation laws

A conservation law is a type of PDE that describes the transport of extensive quantities like mass, momentum, and energy. The most general form of a hyperbolic conservation law for a field $q$ is

$$\frac{\partial q}{\partial t} + \nabla\cdot f(q) = s,$$

where $f$ is the flux function and $s$ the sources. The solution variable $q$ could be a scalar, vector, or tensor field. Here we'll look at the simplest conservation law of them all, the advection equation: $q$ is a scalar field and $f(q) = qu$ for some velocity field $u$. As we'll see, there are lots of ways to screw up something as simple as the advection equation, so learning what the common error modes are will help when attacking harder problems like the shallow water or Euler equations.

One of the challenging things about solving hyperbolic problems is that the class of reasonable solutions includes functions with jump discontinuities, and for some problems this is true even when the initial data are smooth. Compare that to, say, elliptic problems, where the solution is almost always smoother than the input data. For elliptic problems, it's common to use continuous basis functions, but when we try to use the same basis for conservation laws we can run up against some very nasty stability problems. It's possible to work around these issues by using tricks like the streamline upwind Petrov-Galerkin method. But almost any approach to stabilizing a CG discretization introduces (1) free parameters that can be difficult to tune right and (2) an unrealistic level of numerical diffusion. For these reasons, the discontinuous Galerkin method is very popular for these kinds of problems. The DG method has good local conservation properties, it can achieve high-order accuracy where the solution is smooth, and there are more options in how you guarantee a stable scheme.

DG is a huge subject and I couldn't possibly do justice to it here. If you want to read more about it, this paper by Cockburn and Shu is a great reference, as are these notes by Ralf Hartmann and this dissertation by Michael Crabb. Instead, I'll focus here on the effects of some of the choices you have to make when you solve these types of problems.

Input data

First, we want to create a domain, some function spaces, and a divergence-free velocity field $u$. The classic example is a material in uniform solid-body rotation around some fixed point $y$:

$$u(x) = \hat k \times (x - y)$$

where $\hat k$ is the unit vector in the $z$ direction.

import firedrake
from firedrake import inner, Constant, as_vector
mesh = firedrake.UnitSquareMesh(64, 64, diagonal="crossed")
x = firedrake.SpatialCoordinate(mesh)
y = Constant((.5, .5))
r = x - y
u = as_vector((-r[1], r[0]))

To have a stable timestepping scheme, we'll need to satisfy the Courant-Friedrichs-Lewy condition, which means calculating the maximum speed and the minimum cell diameter. Calculating the maximum speed exactly can be challenging; if $u$ is represented with piecewise linear basis functions, then $|u|^2$ is a quadratic function and thus might not attain its maximum value at the interpolation points. You could work around this by changing to a basis of Bernstein polynomials, but for our purposes it'll be enough to evaluate the maximum at the interpolation points and take a smaller timestep than necessary.

import numpy as np
Q = firedrake.FunctionSpace(mesh, family="CG", degree=2)
speed = firedrake.Function(Q).interpolate(inner(u, u))
max_speed = np.sqrt(speed.dat.data_ro.max())

Q0 = firedrake.FunctionSpace(mesh, family="DG", degree=0)
diameters = firedrake.project(firedrake.CellDiameter(mesh), Q0)
min_diameter = diameters.dat.data_ro.min()

cfl_timestep = min_diameter / max_speed
print("Upper bound for CFL-stable timestep: {}".format(cfl_timestep))
Upper bound for CFL-stable timestep: 0.022097086912079608

The initial data we'll use will be the classic bell and cone:

$$q_0 = \max\{0, 1 - |x - x_c| / r_c\} + \max\{0, 1 - |x - x_b|^2 / r_b^2\}$$

where $x_c$, $r_c$ are the center and radius of the cone and $x_b$, $r_b$ for the bell.

from firedrake import sqrt, min_value, max_value

x_c = as_vector((5/8, 5/8))
R_c = Constant(1/8)

x_b = as_vector((3/8, 3/8))
R_b = Constant(1/8)

q_expr = (
    max_value(0, 1 - sqrt(inner(x - x_c, x - x_c) / R_c**2)) +
    max_value(0, 1 - inner(x - x_b, x - x_b) / R_b**2)
)
q0 = firedrake.project(q_expr, Q0)
import matplotlib.pyplot as plt
import mpl_toolkits.mplot3d
fig = plt.figure()
axes = fig.add_subplot(projection='3d')
firedrake.trisurf(q0, axes=axes);

Fluxes

For our first experiment we'll look at the problem of choosing a numerical flux. As we'll see in the following, we have choices to make in how we determine the discrete approximation to the solution of the conservation law. This is very different from elliptic problems -- once we've decided to use the continuous Galerkin method, the only real choice is what polynomial degree we'll use.

The usual procedure to come up with a weak form for a PDE is to multiply by some smooth test function $\phi$, move some derivatives onto $\phi$, and integrate. For the conservation law written above, we would arrive at the weak form

$$\int_\Omega\left(\frac{\partial q}{\partial t}\cdot\phi - f(q)\cdot\nabla\phi\right)dx = \int_\Omega s\cdot\phi\, dx + \ldots$$

where I've used an ellipsis to stand for some boundary terms that don't particularly matter. Unfortunately, this equation doesn't quite tell the whole story. We're using discontinuous basis functions to represent the solution $q$, and ideally we would use the same basis and test functions. What happens when the test functions are discontinuous too?

Let $\phi$ be some basis function and let $K$ be the cell of the domain where $\phi$ is supported. If we apply the usual procedure, we get an element-wise weak form when integrating against $\phi$:

$$\int_K\left(\frac{\partial q}{\partial t}\phi - f(q)\cdot\nabla\phi\right)dx + \int_{\partial K}f(q)\cdot\phi n\, ds = \int_K s\cdot\phi\, dx + \ldots$$

where $n$ is the unit outward normal vector to $K$. Note that we're integrating over only a single element and not the entire domain. The problem here is that if the solution and the basis functions are discontinuous across the element, we can't uniquely define their values on the boundary.

To see why this is so, you can imagine that, instead of having a discontinuous test function, we have a sequence $\phi_\epsilon$ of continuous test functions that converge to $\phi$ in some appropriate norm. If we take the support of each element of the sequence to be contained in the interior of $K$, then the value of $q$ in the boundary integral will be its value approaching the boundary from the inside:

$$q_-(x) = \lim_{\epsilon\to 0}q(x - \epsilon n).$$

Alternatively, if we take $K$ to be contained in the interior of the support of each element of the sequence, then the value of the solution in the boundary integral will be its value approaching the boundary from the outside:

$$q_+(x) = \lim_{\epsilon\to 0}q(x + \epsilon n).$$

Finally, with the right choice of sequence we could get any weighted average of the values on either side of the interface. As a consequence, we need to make some choice of the numerical flux. The numerical flux $f^*$ is a function of the interface values $q_+$ and $q_-$ and the unit normal vector $n$. The discrete approximation to the solution will satisfy the ODE system

$$\sum_K\left\{\int_K\left(\frac{\partial q}{\partial t}\phi - f(q)\cdot\nabla\phi\right)dx + \int_{\partial K}f^*(q_-, q_+, n)\cdot\phi\, ds\right\} = \sum_K\int_K s\cdot\phi\, dx + \ldots$$

for all test functions $\phi$. What kinds of functions can make a good numerical flux? First, if the solution is continuous across an element boundary, the numerical flux should give the same value as the true physical flux:

$$f^*(q, q, n) = f(q)\cdot n.$$

This condition is called consistency and it guarantees that the exact solution is also a discrete solution. The second property we want is to have some analogue of the conservative nature of the true problem. The important thing about fluxes in physical problems is that they can't create or destroy mass, momentum, energy, etc., they only transport it around the domain. To see how we can attain a similar property for our discrete problem, first observe that the sum over all the boundary integrals is telescoping because two neighboring cells $K_-$, $K_+$ share a common face $E$. We can then rewrite the sum of all the boundary integrals as a sum over all faces $E$ of the mesh:

$$\sum_K\int_{\partial K}f^*(q_-, q_+, n)\phi\, ds = \sum_E\int_E\left\{f^*(q_-, q_+, n_-)\phi_- + f^*(q_+, q_-, n_+)\phi_+\right\}ds$$

where $n_-$, $n_+$ are the unit outward normal vectors to $K_-$ and $K_+$ respectively. Note that $n_+ = -n_-$, i.e. the two normals point in opposite directions to each other. What happens if the test function $\phi$ is identically equal to 1 throughout the entire domain? In that case the facet integrals should sum up to 0 -- fluxes only transport, they don't create or destroy. The numerical flux is conservative if

$$f^*(q_-, q_+, n) + f^*(q_+, q_-, -n) = 0.$$

The most braindead way we can come up with a sane numerical flux is to take the average of the solution values across the cell boundary:

$$f^*(q_-, q_+, n) = \frac{1}{2}(q_- + q_+)u\cdot n.$$

This is called the central flux. Let's see how well it works.
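
As a quick sanity check before implementing it, we can verify with plain numpy that the central flux satisfies both conditions above; the function here is just a numerical stand-in for the symbolic version below.

import numpy as np

def f_star(q_minus, q_plus, u_n):
    # Central flux: the average of the physical flux from either side
    return 0.5 * (q_minus + q_plus) * u_n

q, u_n = 0.7, 0.3
assert np.isclose(f_star(q, q, u_n), q * u_n)  # consistency
assert np.isclose(f_star(0.7, 0.2, u_n) + f_star(0.2, 0.7, -u_n), 0.0)  # conservation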

from firedrake import grad, dx, ds, dS

q, ϕ = firedrake.TrialFunction(Q0), firedrake.TestFunction(Q0)
m = q * ϕ * dx

q = q0.copy(deepcopy=True)
cell_flux = -inner(grad(ϕ), q * u) * dx

n = firedrake.FacetNormal(mesh)
f = q * inner(u, n)
face_flux = (f("+") - f("-")) * (ϕ("+") - ϕ("-")) * dS

q_in = Constant(0)
influx = q_in * min_value(0, inner(u, n)) * ϕ * ds
outflux = q * max_value(0, inner(u, n)) * ϕ * ds

We'll take our timestep to be 1/4 of the formal CFL-stable timestep. We need at least a factor of 1/2 for the dimension, and probably another factor of 1/2 for triangle shape.

from numpy import pi as π
final_time = 2 * π
num_steps = 4 * int(final_time / cfl_timestep)
dt = Constant(final_time / num_steps)

Since we're repeatedly solving the same linear system, we'll create problem and solver objects so that this information can be reused from one solve to the next. The solver parameters are specially chosen for the fact that the mass matrix with discontinuous Galerkin methods is block diagonal, so a block Jacobi preconditioner with exact solvers on all the blocks is exact for the whole system.

from firedrake import LinearVariationalProblem, LinearVariationalSolver
dq_dt = -(cell_flux + face_flux + influx + outflux)

δq = firedrake.Function(Q0)
problem = LinearVariationalProblem(m, dt * dq_dt, δq)
parameters = {"ksp_type": "preonly", "pc_type": "bjacobi", "sub_pc_type": "ilu"}
solver = LinearVariationalSolver(problem, solver_parameters=parameters)
import numpy as np
qrange = np.zeros((num_steps, 2))

from tqdm.notebook import trange
for step in trange(num_steps, unit="timesteps"):
    solver.solve()
    q += δq
    qrange[step, :] = q.dat.data_ro.min(), q.dat.data_ro.max()

After only 250 steps the solution is already attaining values two orders of magnitude greater than it should, even while using a CFL-stable timestep. The reason for this is that the central flux, while both consistent and conservative, is numerically unstable with forward time-differencing.

fig, axes = plt.subplots()
axes.set_yscale("log")
axes.plot(qrange[:250, 1])
axes.set_xlabel("timestep")
axes.set_ylabel("solution maximum");

Instead, we'll try the upwind numerical flux. The idea of the upwind flux is to sample from whichever side of the interface has the velocity flowing outward and not in. The numerical flux is defined as

$$f^*(q_-, q_+, n) = \begin{cases}q_-u\cdot n && u\cdot n > 0 \\ q_+u\cdot n && u\cdot n \le 0\end{cases}.$$

We can also write this in a more symmetric form as

$$f^*(q_-, q_+, n) = q_-\max\{0, u\cdot n\} + q_+\min\{0, u\cdot n\}.$$

The upwind flux is designed to mimic the stability properties of one-sided finite difference schemes for transport equations.

q = q0.copy(deepcopy=True)
cell_flux = -inner(grad(ϕ), q * u) * dx

n = firedrake.FacetNormal(mesh)
u_n = max_value(inner(u, n), 0)
f = q * u_n
face_flux = (f("+") - f("-")) * (ϕ("+") - ϕ("-")) * dS

q_in = Constant(0)
influx = q_in * min_value(0, inner(u, n)) * ϕ * ds
outflux = q * max_value(0, inner(u, n)) * ϕ * ds
       
dq_dt = -(cell_flux + face_flux + influx + outflux)

δq = firedrake.Function(Q0)
problem = LinearVariationalProblem(m, dt * dq_dt, δq)
parameters = {"ksp_type": "preonly", "pc_type": "bjacobi", "sub_pc_type": "ilu"}
solver = LinearVariationalSolver(problem, solver_parameters=parameters)

qs = []
output_freq = 5

for step in trange(num_steps, unit="timesteps"):
    solver.solve()
    q += δq
    if step % output_freq == 0:
        qs.append(q.copy(deepcopy=True))

We at least get a finite answer as a result, which is a big improvement. Keeping in mind that the original data capped out at a value of 1, the peaks have shrunk considerably, and we can also see that the sharp cone is much more rounded than before.

%%capture
fig, axes = plt.subplots()
axes.set_aspect("equal")
axes.set_axis_off()
kw = {"num_sample_points": 1, "vmin": 0, "vmax": 1, "shading": "gouraud"}
colors = firedrake.tripcolor(q, **kw, axes=axes)

from matplotlib.animation import FuncAnimation
fn_plotter = firedrake.FunctionPlotter(mesh, num_sample_points=1)
animate = lambda q: colors.set_array(fn_plotter(q))
interval = 1e3 * output_freq * float(dt)
animation = FuncAnimation(fig, animate, frames=qs, interval=interval)
from IPython.display import HTML
HTML(animation.to_html5_video())

Despite this fact, the total volume under the surface has been conserved to within rounding error.

print(firedrake.assemble(q * dx) / firedrake.assemble(q0 * dx))
0.999971350896164

Nonetheless, the relative error in the $L^1$ norm is quite poor.

firedrake.assemble(abs(q - q0) * dx) / firedrake.assemble(q0 * dx)
np.float64(0.6651047426779894)

Let's see if we can improve on that by changing the finite element basis.

Higher-order basis functions

One of the main advantages that the discontinuous Galerkin method has over the finite volume method is that achieving higher-order convergence is straightforward if the problem is nice -- you just increase the polynomial degree. (When the problem is not nice, for example if there are shockwaves, everything goes straight to hell and the finite volume method is much less finicky about stability.) Here we'll look at what happens when we go from piecewise constant basis functions to piecewise linear.

One of the first changes we have to make is that the Courant-Friedrichs-Lewy condition is more stringent for higher-order basis functions. For piecewise constant basis functions, we have that $\delta x / \delta t \ge |u|$; for degree-$p$ polynomials, we instead need that

$$\frac{\delta x}{\delta t} \ge (2p + 1)\cdot|u|.$$

One way of looking at this higher-degree CFL condition is that the introduction of more degrees of freedom makes the effective spacing between the nodes smaller than it might be in the piecewise-constant case. The multiplicative factor of $2p + 1$ accounts for the effective shrinkage in the numerical length scale. (For more, see this paper from 2013.) Once again, we'll use a timestep that's 1/4 of the formal CFL timestep to account for the spatial dimension and the mesh quality.

cfl_timestep = min_diameter / max_speed / 3
num_steps = 4 * int(final_time / cfl_timestep)
dt = Constant(final_time / num_steps)

We have to be a bit careful about creating the initial data. For discontinuous Galerkin discretizations, we would normally project the expression into the discrete function space. Since this is a projection in $L^2$, we might get negative values for an otherwise strictly positive expression. In this case, the positivity of the solution is vital, so instead I'm interpolating the expression for the initial data, but doing so is a little dangerous.

Q1 = firedrake.FunctionSpace(mesh, family="DG", degree=1)
q0 = firedrake.Function(Q1).interpolate(q_expr)
q0.dat.data_ro.min(), q0.dat.data_ro.max()
(np.float64(0.0), np.float64(0.9990945067964072))

In almost every other respect the discretization is the same as before.

q, ϕ = firedrake.TrialFunction(Q1), firedrake.TestFunction(Q1)
m = q * ϕ * dx

q = q0.copy(deepcopy=True)
cell_flux = -inner(grad(ϕ), q * u) * dx

n = firedrake.FacetNormal(mesh)
u_n = max_value(inner(u, n), 0)
f = q * u_n
face_flux = (f("+") - f("-")) * (ϕ("+") - ϕ("-")) * dS

q_in = Constant(0)
influx = q_in * min_value(0, inner(u, n)) * ϕ * ds
outflux = q * max_value(0, inner(u, n)) * ϕ * ds

dq_dt = -(cell_flux + face_flux + influx + outflux)

δq = firedrake.Function(Q1)
problem = LinearVariationalProblem(m, dt * dq_dt, δq)
parameters = {"ksp_type": "preonly", "pc_type": "bjacobi", "sub_pc_type": "ilu"}
solver = LinearVariationalSolver(problem, solver_parameters=parameters)

for step in trange(num_steps, unit="timesteps"):
    solver.solve()
    q += δq

The error in the $L^1$ norm is much smaller than that of the degree-0 solution, which was on the order of 67%, but it's far from perfect.

firedrake.assemble(abs(q - q0) * dx) / firedrake.assemble(q0 * dx)
np.float64(0.08908932229220613)

Worse yet, the final value of the solution has substantial over- and undershoots. The mathematical term for this is that the true dynamics are monotonicity-preserving -- they don't create new local maxima or minima -- but the numerical scheme is not.
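
In particular, the exact solution of the advection equation obeys a maximum principle,

$$\min_x q(x, 0) \le q(x, t) \le \max_x q(x, 0),$$

so any value outside the initial range is purely an artifact of the discretization.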

fig, axes = plt.subplots()
axes.set_aspect("equal")
colors = firedrake.tripcolor(q, axes=axes)
fig.colorbar(colors);

To be precise, and for later comparison, we'll print out exactly how far outside the initial range the solution goes.

q.dat.data_ro.min(), q.dat.data_ro.max()
(np.float64(-0.09991665375509991), np.float64(1.025164306094238))

But of course we're only using the explicit Euler timestepping scheme, which is first-order accurate, while our spatial discretization should be second-order accurate. Can we do better by matching the asymptotic accuracy of the errors in time and space?

Timestepping

Choosing a finite element basis or a numerical flux is part of deciding how we'll discretize the spatial part of the differential operator. After that, we have to decide how to discretize in time. The explicit Euler scheme, which we used in the preceding code, has the virtue of simplicity. Next we'll try out the third-order strong stability-preserving Runge-Kutta method (SSPRK3). First, we'll create a form representing the rate of change of $q$ with the upwind flux, just as we did before.

q = q0.copy(deepcopy=True)
ϕ = firedrake.TestFunction(Q1)
cell_flux = -inner(grad(ϕ), q * u) * dx

n = firedrake.FacetNormal(mesh)
u_n = max_value(inner(u, n), 0)
f = q * u_n
face_flux = (f("+") - f("-")) * (ϕ("+") - ϕ("-")) * dS

q_in = Constant(0)
influx = q_in * min_value(0, inner(u, n)) * ϕ * ds
outflux = q * max_value(0, inner(u, n)) * ϕ * ds

dq_dt = -(cell_flux + face_flux + influx + outflux)

To implement the SSPRK3 timestepping scheme, we'll introduce some auxiliary functions and solvers for the Runge-Kutta stages.

q1 = firedrake.Function(Q1)
q2 = firedrake.Function(Q1)

F2 = firedrake.replace(dq_dt, {q: q1})
F3 = firedrake.replace(dq_dt, {q: q2})

problems = [
    LinearVariationalProblem(m, dt * dq_dt, δq),
    LinearVariationalProblem(m, dt * F2, δq),
    LinearVariationalProblem(m, dt * F3, δq)
]

solvers = [
    LinearVariationalSolver(problem, solver_parameters=parameters)
    for problem in problems
]

The timestepping loop is more involved; we have to separately evaluate the Runge-Kutta stages and then form the solution as an appropriate weighted sum.
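
Written out, with $F$ denoting the discretized rate of change we assembled above, the SSPRK3 update is

$$\begin{aligned} q^{(1)} & = q^n + \delta t\, F(q^n) \\ q^{(2)} & = \tfrac{3}{4}q^n + \tfrac{1}{4}\left(q^{(1)} + \delta t\, F(q^{(1)})\right) \\ q^{n+1} & = \tfrac{1}{3}q^n + \tfrac{2}{3}\left(q^{(2)} + \delta t\, F(q^{(2)})\right). \end{aligned}$$

Each stage is a convex combination of forward Euler steps, which is exactly what makes the scheme strong stability-preserving.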

for step in trange(num_steps, unit="timesteps"):
    solvers[0].solve()
    q1.assign(q + δq)

    solvers[1].solve()
    q2.assign(3 * q / 4 + (q1 + δq) / 4)

    solvers[2].solve()
    q.assign(q / 3 + 2 * (q2 + δq) / 3)

The SSPRK3 scheme gives a huge improvement in how well the computed solution agrees with the true one.

firedrake.assemble(abs(q - q0) * dx) / firedrake.assemble(q0 * dx)
np.float64(0.03158548974444833)

In the eyeball norm, the solution looks like it stays pretty well within the upper and lower limits of the initial data.

fig, axes = plt.subplots()
axes.set_aspect("equal")
colors = firedrake.tripcolor(q, axes=axes)
fig.colorbar(colors);

But if we explicitly calculate the upper and lower bounds, we see that this scheme also fails to be monotonicity preserving!

q.dat.data_ro.min(), q.dat.data_ro.max()
(np.float64(-0.015254743794348664), np.float64(1.0011548806442558))

The departures are relatively small here, but for more challenging or nonlinear problems the overshoots can become much more severe. There is (unfortunately) a can't-do theorem that tells us why: the Godunov barrier. This theorem states that any linear, monotonicity-preserving scheme for hyperbolic conservation laws can be at most first-order accurate.

In principle this might sound like a bit of a bummer; why bother looking for higher-order accurate numerical schemes if they're doomed to do unphysical things that will likely result in instability? The operative word here is linear. The Godunov barrier does not rule out the possibility of nonlinear monotonicity-preserving schemes. I find it profoundly disturbing that we should be using nonlinear schemes to approximate the solutions of linear conservation laws, but ours is but to do and die I suppose.

Flux limiters

The Godunov barrier motivated the development, starting in the early 80s, of post-processing techniques that turn an otherwise oscillatory scheme into one that does not introduce new local maxima or minima. These ideas fall under the aegis of flux limiters or slope limiters: transformations that clamp the solution so as to suppress unrealistic gradients near sharp discontinuities, but which leave the solution unaltered where it is smooth. The design of limiters is part science and part art. Sweby (1984) established some constraints on what a good limiter function can look like in order to guarantee that the numerical scheme is total variation diminishing. But there's a very large design space within those constraints; Sweby's paper showed three different limiters even in 1984, and the Wikipedia article on flux limiters lists 15.
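
To give a flavor of what these limiter functions look like, here's a small sketch in plain numpy of two classic choices, minmod and superbee. This is purely illustrative -- it is not what Firedrake's `VertexBasedLimiter` below does internally.

import numpy as np

# Limiter functions ψ(r), where r is the ratio of successive gradients of
# the solution. Both vanish for r <= 0 (near extrema) and satisfy ψ(1) = 1,
# as needed for second-order accuracy on smooth data; minmod lies on the
# most diffusive edge of Sweby's TVD region, superbee on the least.
def minmod(r):
    return np.maximum(0.0, np.minimum(1.0, r))

def superbee(r):
    return np.maximum(0.0, np.maximum(np.minimum(2 * r, 1.0), np.minimum(r, 2.0)))

r = np.linspace(-1.0, 3.0, 9)
print(minmod(r))
print(superbee(r))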

Flux-corrected transport is a huge subject, and rather than try to do it any kind of justice, I'll refer you to a wonderful book by Dmitri Kuzmin. Instead, let's finish things off by looking at what happens when we add a flux limiter to the simulation above. The application of the limiter is interleaved with the Runge-Kutta stages, and conveniently we can reuse the existing solvers for the SSPRK3 stages.

q.assign(q0)
limiter = firedrake.VertexBasedLimiter(Q1)
for step in trange(num_steps, unit="timesteps"):
    solvers[0].solve()
    q1.assign(q + δq)
    limiter.apply(q1)
    
    solvers[1].solve()
    q2.assign(3 * q / 4 + (q1 + δq) / 4)
    limiter.apply(q2)
    
    solvers[2].solve()
    q.assign(q / 3 + 2 * (q2 + δq) / 3)
    limiter.apply(q)

The relative error in the $L^1$ norm is just as good as before, but with the flux limiter the solution does a much better job of staying within the bounds of the initial data.

firedrake.assemble(abs(q - q0) * dx) / firedrake.assemble(q0 * dx)
np.float64(0.027156220419849435)
q.dat.data_ro.min(), q.dat.data_ro.max()
(np.float64(-0.008129509947194847), np.float64(0.9836403522058957))

Conclusion

Hyperbolic problems are hard. There are difficult decisions to make even at the level of how to formulate the discrete problem. For this demo we were looking at a scalar conservation law, and the upwind flux works quite well. But for systems of conservation laws, like the shallow water equations, things become much more involved: you have to know something at an analytical level about the underlying problem -- the wave speeds. Once we've decided which discrete problem to solve, going beyond first-order accuracy is filled with even more challenges. Some issues, like getting a stable enough timestep, often require manual tuning. For the linear problem shown here, we know the wave speeds from the outset and we can be reasonably confident of picking a good timestep that will work for the entire simulation. The solutions of nonlinear conservation laws can meander into regions of state space where the wave speeds are much higher than where they started, and an initially stable timestep becomes unstable. The Right Thing To Do is to use an adaptive timestepping scheme, but that adds the implementational difficulty of tracking a higher- and lower-order solution with which to inform the adaptation strategy (see the sketch below). Hopefully this has shown what some of the typical pitfalls are and what tools are available to remedy them.
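
As a parting sketch, the core of such an adaptive controller is tiny. Here's a minimal version of the standard proportional step-size rule, assuming an embedded Runge-Kutta pair of orders $p$ and $p + 1$ that yields a local error estimate for each step; none of this appears in the demo above.

# Hypothetical step-size controller: shrink or grow dt so that the local
# error estimate from an embedded pair tracks a user-chosen tolerance.
def new_timestep(dt, error, tolerance, p, safety=0.9):
    return safety * dt * (tolerance / error) ** (1.0 / (p + 1))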