The Feldman-Hájek theorem

In other posts, I've touched on the idea of sampling from probability measures defined over spaces of functions. We might be inferring the density of material within the earth from seismic surveys, the elevation of the earth's surface from lidar measurements, or the conductivity of an aquifer from wellhead measurements. For the sake of definiteness, we'll assume throughout that the function we're looking for is an element of the space $X = L^2(\Omega)$ for some domain $\Omega$. We could consider Sobolev or other spaces, but the idea is the same. I'll also tacitly assume that $\Omega$ is regular enough for PDE theory to apply; that usually means the boundary of $\Omega$ has to be piecewise smooth with no cusps.

You can think of a random variable with values in $X$, i.e. a function selected at random according to some statistical law, as a stochastic process, and I'll use the two terms interchangeably. In the stochastic process literature, it's common to focus on processes with certain useful statistical properties. For example, we might think of $\Omega$ as some time interval $[0, T]$ and consider a random function $z(t)$ of time. We'd then like some notion of causality: the values of $z(t)$ should depend on the values of $z(s)$ for $s$ coming before $t$ but not after. The Markov property gives us this idea of causality. If $\Omega$ is instead a spatial domain, we might be interested in random fields where we specify analytically the covariance between $z(x)$ and $z(y)$ as a function of $x - y$. For example, it's common to assume that the conductivity of an aquifer is a random field whose logarithm has the spatial correlation structure $$\mathbb E[(z(x) - m)(z(y) - m)] = \text{const}\times\exp(-|x - y|^2/\lambda^2).$$ This is all the province of geostatistics. But in the following I'll focus on general principles without making any assumptions about the dependence structure of the process.
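
As a small illustration (my own, not from any of the references), here's one way to draw a sample of a random field with that squared-exponential covariance on a 1D grid; the length scale and grid are arbitrary choices:

import numpy as np

rng = np.random.default_rng(1729)
x = np.linspace(0.0, 1.0, 201)
λ = 0.1
# covariance matrix evaluated from the squared-exponential kernel above
C = np.exp(-(x[:, None] - x[None, :]) ** 2 / λ ** 2)
# add a little diagonal jitter so the factorization inside the sampler succeeds
z = rng.multivariate_normal(np.zeros_like(x), C + 1e-10 * np.eye(len(x)))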

The normal distribution has a kind of universal character in finite-dimensional statistical inference. We want to generalize the idea of a normal distribution to function spaces -- a Gaussian process. To do that, we'll have to think a bit about measure theory. This will lead us in turn to ask about the relationship between two distinct normal random variables with values in a function space. That relationship is summed up in the Feldman-Hájek theorem. It's a doozy. I'll do some numerical experiments to illustrate the theorem.

Normal random variables

A normal random variable $Z$ has a density $$\rho(z) = \frac{\exp\left(-\frac{1}{2}(z - m)^*C^{-1}(z - m)\right)}{\sqrt{(2\pi)^n\det C}}$$ where $m$ is the mean and $C$ the covariance matrix. To talk about normal random variables on function spaces, we'll have to think about things that we usually take for granted. What does it mean to say that a random variable $Z$ has a density at all? In finite dimensions, this means that we can evaluate expectations of any function of $Z$ by evaluating an integral: $$\mathbb E[f(Z)] = \int f(z)\rho(z)\mathrm dz.$$ Fine, but what is $\mathrm dz$? It's a symbolic shorthand for integration with respect to Lebesgue measure. I'll write Lebesgue measure as $\lambda$ for the time being. Let $\mu$ be the distribution of $Z$, i.e. $$\mu(A) = \mathbb P[Z \in A].$$ When we say that $Z$ has a density, we really mean that, for any set $A$ such that $\lambda(A) = 0$, we must also have $\mu(A) = 0$. The fancy term for this is that $\mu$ is absolutely continuous with respect to $\lambda$, usually written $\mu \ll \lambda$. The Radon-Nikodym theorem then tells us that there is a unique non-negative function $\rho$ such that, for all $f$, $$\int f\,\mathrm d\mu = \int f\,\rho\,\mathrm d\lambda.$$ At a formal level we might say that $\mathrm d\mu = \rho\,\mathrm d\lambda$, hence the notation $$\rho = \frac{\mathrm d\mu}{\mathrm d\lambda}.$$ The usual Lebesgue measure provides a kind of background, reference measure to which other probability measures can be absolutely continuous. Lebesgue measure is not itself a probability measure because the volume of the entire space is infinite. Discrete random variables don't have densities: the Lebesgue measure of a single point is zero. They instead have densities with respect to a different measure, for example the counting measure on $\mathbb Z$. The counting measure, which again is not itself a probability measure, fills the role of the reference measure with respect to which discrete random variables can be absolutely continuous.
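
As a one-dimensional sanity check (my own illustration, with an arbitrary test function), the Monte Carlo estimate of an expectation agrees with the integral of the function against the normal density:

import numpy as np

rng = np.random.default_rng(42)
m, s = 1.0, 2.0
f = lambda z: np.cos(z)

# Monte Carlo estimate of E[f(Z)]
mc = f(rng.normal(m, s, size=200_000)).mean()

# quadrature of ∫ f ρ dλ against the normal density
z = np.linspace(m - 10 * s, m + 10 * s, 100_001)
ρ = np.exp(-0.5 * ((z - m) / s) ** 2) / np.sqrt(2 * np.pi * s ** 2)
quad = np.sum(f(z) * ρ) * (z[1] - z[0])

print(mc, quad)  # both ≈ cos(1)·exp(-2) ≈ 0.073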

In infinite dimensions, there is no spoon Lebesgue measure. To be more specific, there's no locally finite measure on an infinite-dimensional separable Banach space that is non-trivial and translation-invariant. There is no background reference measure "$\mathrm dz$" to integrate against, so we can't write down a normal density -- a density is defined with respect to a background measure and we don't have one. There are still well-defined normal random variables and probability measures.

Not having a measure $\mathrm dz$ or a density $\rho(z)$ to work with doesn't stop physicists from writing functional integrals. I won't use this notation going forward, but I do find it appealing.

To define a normal random variable on a function space, we can instead work with arbitrary finite-dimensional projections. Let $\{g_1, \ldots, g_N\}$ be a finite collection of elements of the dual space $X^*$. We can then define a linear operator $G : X \to \mathbb R^N$ by $$G = \sum_k e_k\otimes g_k$$ where $e_k$ is the $k$-th standard basis vector in $\mathbb R^N$. Given a random variable $Z$ with values in $X$, $$GZ = \left[\begin{matrix}\langle g_1, Z\rangle \\ \vdots \\ \langle g_N, Z\rangle\end{matrix}\right]$$ is a random vector taking values in $\mathbb R^N$. We say that $Z$ is a normal random variable if $GZ$ is normal for all finite choices of $N$ and $G$. The mean of $Z$ is the vector $m$ in $X$ such that $GZ$ has the mean $Gm$ for all linear mappings $G$. Likewise, the covariance $C$ of $Z$ is an element of $X\otimes X$ such that $GZ$ has the covariance matrix $GCG^*$ for all $G$.

Not just any operator can be the covariance of a normal random variable on a function space. It ought to be familiar from the finite-dimensional theory that $C$ has to be symmetric and positive-definite. You might imagine writing down a decomposition for $C$ in terms of its eigenfunctions $\{\psi_n\}$, which we can choose to be orthonormal because $C$ is symmetric. Since the eigenvalues are real and positive, we can write them as the squares $\sigma_n^2$ of a sequence $\{\sigma_n\}$ of positive numbers. I'll use this mnemonic throughout because it becomes convenient in a number of places. The action of $C$ on a function $\phi$ is then $$C\phi = \sum_n\sigma_n^2\langle\psi_n, \phi\rangle\,\psi_n.$$ In the function space setting, we also need the sum of all the eigenvalues to be finite: $$\sum_n\sigma_n^2 < \infty.$$ The fancy term for this is that $C$ has to be of trace class. In finite dimensions we don't have to worry about this condition at all. In function spaces, it's easy to try and use a covariance operator with $\sigma_n^2 \sim n^{-1}$ by mistake. The trace then diverges like the harmonic series.
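
To see the difference numerically (my own quick check), partial sums with $\sigma_n^2 = 1/n$ grow like $\log N$ while those with $\sigma_n^2 = 1/n^2$ converge to $\pi^2/6 \approx 1.645$:

import numpy as np

for N in [10 ** 2, 10 ** 4, 10 ** 6]:
    n = np.arange(1.0, N + 1)
    print(f"N = {N:>7}: Σ 1/n = {np.sum(1 / n):6.2f}, Σ 1/n² = {np.sum(1 / n ** 2):.4f}")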

The Karhunen-Loève expansion

The spectral theorem assures us that the covariance operator of a Gaussian process has an eigenvalue factorization. The Karhunen-Loève theorem then shows how we can use that decomposition to understand the stochastic process. It states that, if $\{\xi_n\}$ is an infinite sequence of i.i.d. standard normal random variables, then the random variable $$Z = \sum_n\sigma_n\,\xi_n\,\psi_n$$ has the distribution $N(0, C)$. In other words, if we know the eigendecomposition, we can generate samples from the process. We'll use this property in just a moment to simulate Brownian bridges. The eigenfunctions for the covariance operator of a Brownian bridge are trigonometric functions, so we're simulating the stochastic process by Fourier synthesis.

Rather than start with a covariance operator and then compute its eigenvalue decomposition, the KL expansion also gives us a way to synthesize random variables with useful properties by picking the eigenvalues and eigenfunctions. The covariance operator is then $$C = \sum_n\sigma_n^2\,\psi_n\otimes\psi_n.$$ In some cases we can compute a closed-form expression for $C$ as an integral operator $$C\phi = \int_\Omega \mathscr C(x, y)\phi(y)\,\mathrm dy,$$ but we don't need an explicit expression for the kernel for the KL expansion to be useful.

The code below generates some random functions on the domain $\Omega = [0, 1]$. Here we use the basis functions $$\psi_n = \sin(\pi n x).$$ We can then try eigenvalue sequences $\sigma_n = \alpha n^{-\gamma}$ for some power $\gamma$. Any $\gamma > 1/2$ guarantees that $\sum_n\sigma_n^2$ is finite; we'll use $\gamma \ge 1$ below.

import numpy as np
from numpy import pi as π
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d

def generate_sample(x, σs, rng):
    ξs = rng.standard_normal(size=len(σs))
    return sum(
        σ * ξ * np.sin(π * n * x)
        for n, (σ, ξ) in enumerate(zip(σs, ξs), start=1)
    )
num_points = 512
num_modes = num_points // 2
x = np.linspace(0.0, 1.0, num_points + 1)
ns = np.linspace(1.0, num_modes, num_modes)
rng = np.random.default_rng(seed=1729)

The plot below compares two function-valued random variables with different eigenvalue sequences. The plot on the top shows a classic Brownian bridge: $\sigma_n^2 = 2/(\pi^2n^2)$. The plot on the bottom shows $\sigma_n^2 = 2/(\pi^2n^{2\gamma})$ where $\gamma = 1.6$. The second stochastic process is much smoother, which we expect -- the Fourier coefficients decay faster.

γ_1, γ_2 = 1.0, 1.6
σs_1 = np.sqrt(2) / (π * ns ** γ_1)
σs_2 = np.sqrt(2) / (π * ns ** γ_2)

num_samples = 128
ws = np.stack([generate_sample(x, σs_1, rng) for k in range(num_samples)])
zs = np.stack([generate_sample(x, σs_2, rng) for k in range(num_samples)])
fig, axes = plt.subplots(nrows=2, ncols=1, sharex=True, sharey=True)
for ax in axes:
    ax.set_axis_off()

for w, z in zip(ws, zs):
    axes[0].plot(x, w, color="tab:blue", alpha=0.25)
    axes[1].plot(x, z, color="tab:orange", alpha=0.25)
[Figure: sample paths of the two processes; the Brownian bridge ($\gamma = 1$) on top, the smoother process ($\gamma = 1.6$) on the bottom.]

The Feldman-Hájek theorem

Now consider two $X$-valued normal random variables $W$ and $Z$. To simplify things, we'll assume that both of them have zero mean. The plots above might suggest that these two stochastic processes behave differently in some essential way that we can quantify. For example, the first process is a Brownian bridge. A Brownian bridge is continuous with probability 1, but it is nowhere differentiable. The second process, on the other hand, is almost surely differentiable. We can be a little more formal about this and say that $$\mathbb P[W \in H^1(\Omega)] = 0, \quad \mathbb P[Z \in H^1(\Omega)] = 1$$ where $H^1(\Omega)$ is the space of functions with square-integrable derivatives. The two random variables are not mutually absolutely continuous. That might not be too surprising. We picked the eigenvalue distributions of each process to have different decay rates: $$\sigma_n^2(W) \sim n^{-2}, \quad \sigma_n^2(Z) \sim n^{-2\gamma}.$$ Just to really hammer this point home, let's evaluate an approximation to the mean-square derivative of each group of samples.

δx = 1 / num_points

Dw = np.diff(ws, axis=1) / δx
energies_w = np.sum((Dw * Dw) * δx, axis=1)
print(f"∫|∇w|² dx = {energies_w.mean():.1f} ± {energies_w.std():.1f}")

Dz = np.diff(zs, axis=1) / δx
energies_z = np.sum((Dz * Dz) * δx, axis=1)
print(f"∫|∇z|² dx = {energies_z.mean():.1f} ± {energies_z.std():.1f}")
∫|∇w|² dx = 240.2 ± 21.5
∫|∇z|² dx = 3.8 ± 1.5

If you run this notebook with an even finer discretization of the unit interval, you'll see that $\int|\nabla w|^2\mathrm dx$ keeps increasing while the value of $\int|\nabla z|^2\mathrm dx$ stabilizes.
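
Here's a rough version of that experiment (with fresh variable names so we don't clobber the grid from before). The energy of the rough process grows in proportion to the number of modes, while the smooth process levels off:

for n_pts in [512, 2048, 8192]:
    x_f = np.linspace(0.0, 1.0, n_pts + 1)
    n_f = np.arange(1.0, n_pts // 2 + 1)
    w_f = generate_sample(x_f, np.sqrt(2) / (π * n_f), rng)
    z_f = generate_sample(x_f, np.sqrt(2) / (π * n_f ** 1.6), rng)
    # ∫|∇u|² dx ≈ Σ (δu/δx)² δx with δx = 1/n_pts
    energy = lambda u: np.sum(np.diff(u) ** 2) * n_pts
    print(f"{n_pts:>5}: ∫|∇w|² ≈ {energy(w_f):8.1f}, ∫|∇z|² ≈ {energy(z_f):5.1f}")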

What conditions guarantee that the laws of two function-valued random variables are mutually absolutely continuous? In other words, we want to know when it is impossible for there to be some subset $A$ of $X$ such that there is positive probability that $W$ is in $A$ but zero probability that $Z$ is in $A$, or vice versa. What needs to be true about the covariance operators of the two random variables to fulfill this condition? This is the content of the Feldman-Hájek theorem.

The calculation above suggests that two normal distributions will concentrate on different sets if the ratios $\sigma_n^2(W)/\sigma_n^2(Z)$ of the eigenvalues of their covariance operators go to either zero or infinity. The example above used two eigenvalue sequences that look like $n^{-\alpha}$ for different values of $\alpha$. The gap in the decay rates lets us construct a quadratic functional -- here, the mean square gradient -- that takes finite values on one process and infinite values on the other, and that in turn implies the concentration of the two probability measures on different sets. The mean square gradient of $W$ is infinite with probability 1, while that of $Z$ is finite with probability 1. Put another way, the probability distribution for $Z$ is concentrated on the set of functions with square-integrable derivatives, while the probability distribution for $W$ gives that set zero measure.

We can always construct an operator like this when the decay rates differ, although the effect might be subtler. For example, the distinction might be in having a higher-order derivative or not.

What if the ratio of the eigenvalue sequences doesn't go to zero or infinity? Are the laws of the two processes equivalent, or can they still concentrate on different sets? We can make it even simpler by supposing that $W \sim N(0, C)$ and $Z \sim N(0, \alpha C)$ for some positive $\alpha \neq 1$. We'll show that even these random variables concentrate on different sets. I got this example from MathOverflow. Let $\{\sigma_n^2\}$ be the eigenvalues of $C$ (note the square) and $\{\psi_n\}$ the eigenfunctions. Define the sequence of random variables $$U_n = \langle \psi_n, W\rangle / \sigma_n.$$ We know from the KL expansion that the $U_n$ are independent, identically-distributed standard normal random variables. If we also define $$V_n = \langle\psi_n, Z\rangle / \sigma_n,$$ then these random variables are i.i.d. normals with mean zero and variance $\alpha$.

Now remember that $W$ and $Z$ take values in a function space and so there are infinitely many modes to work with. The strong law of large numbers then implies that $$\frac{1}{N}\sum_{n = 1}^N U_n^2 \to 1$$ as $N \to \infty$ with probability 1. This would not be possible in a finite-dimensional vector space -- the sum would have to terminate at some fixed $N$, and the average of $U_n^2$ would be a random variable with some non-trivial spread around its mean, able to take on any positive value with non-zero probability. That probability could be minute, but it is still positive. Likewise, we also find that $$\frac{1}{N}\sum_{n = 1}^N V_n^2 \to \alpha$$ with probability 1. But we've now shown that the two probability measures concentrate on different sets as long as $\alpha$ is not equal to 1: the average of squared expansion coefficients takes distinct values, each with probability 1, under the distributions of $W$ and $Z$. So the two distributions are mutually singular.

Just to illustrate things a bit, the plot below shows samples drawn with the spectrum $\{\sigma_n\}$ generated as before with $\gamma = 1.5$, and with the doubled spectrum $\{2\sigma_n\}$.

γ = 1.5
σs = np.sqrt(2) / (π * ns ** γ)
ws = np.stack([generate_sample(x, σs, rng) for k in range(num_samples)])
zs = np.stack([generate_sample(x, 2 * σs, rng) for k in range(num_samples)])
fig, axes = plt.subplots(nrows=2, ncols=1, sharex=True, sharey=True)
for ax in axes:
    ax.set_axis_off()

for w, z in zip(ws, zs):
    axes[0].plot(x, w, color="tab:blue", alpha=0.25)
    axes[1].plot(x, z, color="tab:orange", alpha=0.25)
[Figure: sample paths with spectra $\{\sigma_n\}$ (top) and $\{2\sigma_n\}$ (bottom).]

What this example shows is that it's not enough for $\sigma_n(W)$ and $\sigma_n(Z)$ to have the same decay rate. Both sequences decay like a constant times $n^{-\gamma}$ for the same value of $\gamma$, but the two distributions are mutually singular because the constants differ.
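
We can also check the law-of-large-numbers argument directly on the samples we just generated. The code below (a check I've added) recovers the normalized coefficients $\langle\psi_n, u\rangle/\sigma_n$ of each sample by quadrature, remembering that $\int_0^1\sin^2(\pi nx)\,\mathrm dx = 1/2$, and averages their squares over the modes. The averages cluster near 1 for the first ensemble and near $2^2 = 4$ for the second:

ψs = np.stack([np.sin(π * n * x) for n in ns])
δx_ = x[1] - x[0]

def mean_square_coefficients(samples, σs):
    # ⟨ψ_n, u⟩ ≈ Σⱼ u(xⱼ) ψ_n(xⱼ) δx; divide by ‖ψ_n‖² = 1/2 and by σ_n
    coeffs = 2 * (samples @ ψs.T) * δx_ / σs
    return (coeffs ** 2).mean(axis=1)

print(mean_square_coefficients(ws, σs).mean())  # ≈ 1
print(mean_square_coefficients(zs, σs).mean())  # ≈ 4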

At this point you might wonder if it's enough to have $$\sigma_n(W)/\sigma_n(Z) \to 1.$$ Even this much tighter condition is not enough!

The Feldman-Hájek theorem states that, in order for $W$ and $Z$ to not concentrate on different sets, we need (assuming for simplicity that the two covariance operators share a common eigenbasis) that $$\sum_n\left(\frac{\sigma_n(W)}{\sigma_n(Z)} - 1\right)^2 < \infty.$$ Not only do we need the eigenvalue ratio to go to 1, we need it to go to 1 fast enough to be square-summable. For example, suppose we had a pair of $X$-valued random variables such that $$\frac{\sigma_n(W)}{\sigma_n(Z)} - 1 \sim \frac{\text{const}}{\log n}.$$ The relative error between the two eigenvalue sequences decreases to zero, but not fast enough, so the two random variables would still concentrate on different sets.
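
Just to see how slowly that criterion can fail (my own check), the partial sums for a $1/\log n$ discrepancy keep growing without bound:

for N in [10 ** 3, 10 ** 4, 10 ** 5, 10 ** 6]:
    n = np.arange(2.0, N)
    print(f"N = {N:>7}: Σ (1/log n)² = {np.sum(1 / np.log(n) ** 2):8.1f}")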

Why does this matter? I stumbled on the Feldman-Hájek theorem while trying to understand two papers, Beskos et al. (2008) and Cotter et al. (2013). Both of these papers are about how to do Markov-chain Monte Carlo sampling of probability distributions defined over function spaces. The main ingredient of MCMC is a proposal mechanism, which generates a new candidate sample $z$ from the current sample $w$. We can then also define a reverse proposal -- what is the probability of proposing $w$ if we instead started from $z$? Tierney (1998) showed that, in order for an MCMC method to converge, the forward and reverse proposals need to be mutually absolutely continuous; they cannot concentrate probability on different sets. In finite dimensions, it's so hard to violate this condition that it often goes unstated. In function spaces, on the other hand, you're almost guaranteed to violate it unless you're careful. Even for the nicest class of distributions (normal random variables), the spectra of the covariance operators have to have nearly identical asymptotic behavior. In light of that, it's not surprising that MCMC sampling in function spaces is so challenging.
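
The preconditioned Crank-Nicolson proposal that the Cotter et al. paper advocates sidesteps the problem by construction: if $w \sim N(0, C)$, the proposal $\sqrt{1 - \beta^2}\,w + \beta\xi$ with $\xi \sim N(0, C)$ also has law $N(0, C)$, so the forward and reverse proposals stay mutually absolutely continuous. A minimal sketch in terms of the KL coefficients from earlier (the step size $\beta$ is a tuning parameter, and the function name is mine):

def pcn_proposal(w_coeffs, σs, β, rng):
    # z = √(1 − β²) w + β ξ with ξ ~ N(0, C); this preserves N(0, C) exactly
    ξ = σs * rng.standard_normal(size=len(σs))
    return np.sqrt(1 - β ** 2) * w_coeffs + β * ξ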

Plate theory

In this post we'll look at a fourth-order equation describing the vibrations of clamped plates. Fourth-order problems are much harder to discretize than second-order ones. Here we'll get around that by using the dual or mixed form of the problem. This will use some of the same techniques that we've already seen with the Stokes equations. But it won't be a perfectly conforming discretization, and as a consequence we'll need some jump terms that ought to be familiar from the discontinuous Galerkin method. Finally we'll compute some eigenfunctions of the plate operator. This will look similar to what we did with Weyl's law but with an extra mathematical subtlety.

Our starting point is linear elasticity. Here we want to solve for the 3D displacement $u$ of the medium. The first equation is momentum conservation: $$\rho\ddot u = \nabla\cdot\sigma + f$$ where $\sigma$ is the stress tensor and $f$ the body forces. To close the system, we need to supply a constitutive equation relating the stress tensor and the strain tensor $$\varepsilon = \frac{1}{2}\left(\nabla u + \nabla u^*\right).$$ The most general linear constitutive equation we can write down is $$\sigma = \mathscr C\,\varepsilon$$ where $\mathscr C$ is the rank-4 elasticity tensor. We need that $\mathscr C$ maps symmetric tensors to symmetric tensors and that $\varepsilon:\mathscr C\varepsilon$ is always positive. The equilibrium displacement of the medium is the minimizer of the energy functional $$J(u) = \int_\Omega\left(\frac{1}{2}\sigma:\varepsilon + f\cdot u\right)\mathrm dx + \ldots$$ where the ellipses stand for various boundary forcings that we'll ignore.

For a medium that is homogeneous and isotropic, the elasticity tensor has to have the form $$\mathscr C\,\varepsilon = 2\,\mu\,\varepsilon + \lambda\,\text{tr}(\varepsilon)I$$ where $\mu$ and $\lambda$ are the Lamé parameters. As an aside, there are a mess of alternate forms of the elasticity equations. The wiki page has a conversion table at the bottom. Now take this with a grain of salt because I do fluid mechanics. But if I were a solid mechanics kinda guy, this would embarrass me. Get it together folks.

Plate theory is what you get when you assume the medium is thin along the vertical dimension and that this restricts the form that the displacements can take. The first and most widely agreed-upon simplification is that the vertical displacement is some function $w$ of the horizontal coordinates $x$, $y$: $$u_z = w(x, y).$$ From here we have to make additional assumptions about the horizontal displacements $u_x$ and $u_y$. Different theories make different sets of assumptions. The classical theory is the Kirchhoff-Love plate. The Kirchhoff theory assumes that any straight line that's perpendicular to the middle of the plate remains straight and perpendicular after deformation. This theory has some deficiencies, see for example this paper. But it's a good starting point for experimentation. These assumptions let us write down the other components of the deformation in terms of $w$: $$u_x = -z\frac{\partial w}{\partial x}, \quad u_y = -z\frac{\partial w}{\partial y}.$$ It's possible that $u_x$ and $u_y$ are also offset by in-plane displacements. I'll assume that the boundary conditions make those equal to zero. We can then work out what the displacement gradient is: $$\nabla u = \left[\begin{matrix}-z\nabla^2 w & -\nabla w \\ +\nabla w^* & 0 \end{matrix}\right]$$ I'm being a little loose in my use of the gradient operator -- it's 3D on the left-hand side of this equation and 2D on the right -- and here $\nabla^2 w$ denotes the Hessian of $w$. Notice how the antisymmetric part of the displacement gradient is all in the $x-z$ and $y-z$ components. When we symmetrize the displacement gradient, we get the strain tensor: $$\varepsilon = -z\left[\begin{matrix}\nabla^2 w & 0 \\ 0 & 0\end{matrix}\right].$$ Now remember that the medium is a thin plate. We can express the spatial domain as the product of a 2D footprint domain $\Phi$ and the interval $[-h / 2, +h / 2]$ where $h$ is the plate thickness. The strain energy is then $$\begin{align} J(w) & = \int_\Phi\int_{-h/2}^{+h/2}\left(\mu z^2|\nabla^2 w|^2 + \frac{\lambda}{2} z^2|\Delta w|^2 + fw\right)\mathrm dz\;\mathrm dx\\ & = \int_\Phi\left\{\frac{h^3}{24}\left(2\mu|\nabla ^2 w|^2 + \lambda|\Delta w|^2\right) + hfw\right\}\mathrm dx \end{align}.$$ Here I've used the fact that $\text{tr}(\nabla^2w) = \Delta w$ where $\Delta$ is the Laplace operator. In a gross abuse of notation I'll write this as $$J(w) = \int_\Omega\left(\frac{1}{2}\mathscr C\nabla^2w :\nabla^2 w + fw\right)h\,\mathrm dx$$ where $\mathscr C$ is an elasticity tensor. We'll need the explicit form $$\mathscr C\kappa = \frac{h^2}{12}\left(2\,\mu\,\kappa + \lambda\,\text{tr}(\kappa)\,I\right)$$ in a moment.

We can't discretize this problem right away using a conventional finite element basis. The usual piecewise polynomial basis functions are continuous across cell edges, but their derivatives are not. A conforming basis for a minimization problem involving 2nd-order derivatives would instead need to be continuously differentiable, and it's much harder to come up with $C^1$ bases.

We have a few ways out.

  1. Use a $C^1$ basis like Argyris or Hsieh-Clough-Tocher. This approach makes forming the problem easy, but applying boundary conditions hard. Kirby and Mitchell (2018) implemented the Argyris element in Firedrake but had to use Nitsche's method to enforce the boundary conditions.
  2. Use $C^0$ elements and discretize the second derivatives using an interior penalty formulation of the problem. This approach is analogous to discretizing a 2nd-order elliptic problem using DG elements. Forming the problem is harder but applying the boundary conditions is easier. Bringmann et al. (2023) work out the exact form of the interior penalty parameters necessary to make the discrete form work right.
  3. Use the mixed or dual form of the problem, which introduces the moment tensor explicitly as an unknown. Discretize it with the Hellan-Herrmann-Johnson or HHJ element. Arnold and Walker (2020) describe this formulation of the problem in more detail.

In the following, I'll take the third approach. The dual form of the problem introduces the moment tensor, which in another gross abuse of notation I'll write as $\sigma$, explicitly as an unknown. We then add the constraint that $\sigma = \mathscr C\nabla^2 w$. But switching to the dual form of the problem inverts constitutive relations, so we instead enforce $$\nabla^2w = \mathscr A\sigma$$ where $\mathscr A$ is the inverse to $\mathscr C$. If $\mathscr C$ is like the elasticity tensor in 3D, then $\mathscr A$ is like the compliance tensor. Because we'll need it later, the explicit form of the compliance tensor is $$\mathscr A\sigma = \frac{6}{h^2\mu}\left(\sigma - \frac{\lambda}{2(\mu + \lambda)}\text{tr}(\sigma)I\right).$$ The full Lagrangian for the dual form is $$L(w, \sigma) = \int_\Omega\left(\frac{1}{2}\mathscr A\sigma : \sigma - \sigma : \nabla^2 w - fw\right)h\,\mathrm dx.$$ Here the displacement $w$ acts like a multiplier enforcing the constraint that $\nabla\cdot\nabla\cdot\sigma + f = 0$.
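
Since this algebra is easy to fumble, here's a quick symbolic check (my own addition) that $\mathscr A$ really does invert $\mathscr C$ on symmetric $2\times 2$ tensors:

import sympy

μ, λ, h = sympy.symbols("mu lamda h", positive=True)
a, b, c = sympy.symbols("a b c")
κ = sympy.Matrix([[a, b], [b, c]])
I = sympy.eye(2)

C = lambda ε: h ** 2 / 12 * (2 * μ * ε + λ * ε.trace() * I)
A = lambda σ: 6 / (h ** 2 * μ) * (σ - λ / (2 * (μ + λ)) * σ.trace() * I)

print(sympy.simplify(A(C(κ)) - κ))  # the zero matrix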

To get a conforming discretization of the dual problem, we'd still need the basis functions for the displacements to have continuous derivatives. That's exactly what we were trying to avoid by using the dual form in the first place. At the same time, we don't assume any continuity properties of the moments. We can work around this problem by relaxing the continuity requirements for the displacements while strengthening them for the moments. The Hellan-Herrmann-Johnson element discretizes the space of symmetric tensor fields with just enough additional regularity to make this idea work. The HHJ element has normal-normal continuity: the quantity $n\cdot\sigma n$ is continuous across cell boundaries. The tangential and mixed components are unconstrained. This extra continuity is enough to let us use the conventional Lagrange finite elements for the displacements as long as we add a correction term to the Lagrangian: $$\ldots + \sum_\Gamma\int_\Gamma (n\cdot\sigma n)\,\left[\!\!\left[\frac{\partial w}{\partial n}\right]\!\!\right]h\,\mathrm dS$$ where $\Gamma$ are all the edges of the mesh and the double square bracket denotes the jump of a quantity across a cell boundary. What is especially nice about the HHJ element is that this correction is all we need to add; there are no mesh-dependent penalty factors.

Experiments

First let's try and solve a plate problem using simple input data. We'll set all the physical constants equal to 1 but you can look these up for the material of your fancy.

import firedrake
from firedrake import Constant, inner, tr, dot, grad, dx, ds, dS, avg, jump
import ufl
f = Constant(1.0)
h = Constant(1.0)
μ = Constant(1.0)
λ = Constant(1.0)

The code below creates the dual form of the problem using the formula I wrote down above for the explicit form of the compliance tensor. Lord Jesus I hope I did all that algebra correct.

def form_hhj_lagrangian(z, f, h, μ, λ):
    mesh = ufl.domain.extract_unique_domain(z)
    w, σ = firedrake.split(z)
    I = firedrake.Identity(2)
    n = firedrake.FacetNormal(mesh)

    # the compliance tensor 𝒜 from the formula above
    𝒜 = 6 / (h**2 * μ) * (σ - λ / (2 * (μ + λ)) * tr(σ) * I)

    L_cells = (0.5 * inner(𝒜, σ) - inner(σ, grad(grad(w))) + f * w) * h * dx
    L_facets = avg(inner(n, dot(σ, n))) * jump(grad(w), n) * h * dS
    L_boundary = inner(n, dot(σ, n)) * inner(grad(w), n) * h * ds
    return L_cells + L_facets + L_boundary

The right degrees are $p + 1$ for the displacements and $p$ for the moments.

p = 1
cg = firedrake.FiniteElement("CG", "triangle", p + 1)
hhj = firedrake.FiniteElement("HHJ", "triangle", p)

Here we'll work on the unit square. An interesting feature of plate problems is that the mixed derivatives couple the coordinate directions, so separation of variables doesn't make them much easier to solve analytically on the square. The eigenfunctions of the biharmonic operator (we'll get to them below) are easier to derive on the circle than on the square.

n = 64
mesh = firedrake.UnitSquareMesh(n, n, diagonal="crossed")
Q = firedrake.FunctionSpace(mesh, cg)
Σ = firedrake.FunctionSpace(mesh, hhj)
Z = Q * Σ
z = firedrake.Function(Z)
L = form_hhj_lagrangian(z, f, h, μ, λ)
F = firedrake.derivative(L, z)

For plate problems, we almost always have some Dirichlet boundary condition $$w|_{\partial\Omega} = w_0.$$ Since the problem is fourth-order, we need to supply more boundary conditions than we do for, say, the diffusion equation. There are two kinds. The first is the clamped boundary condition: $$\frac{\partial w}{\partial n} = 0.$$ The second is the simply-supported boundary condition: $$n\cdot\sigma n = 0.$$ Before I started writing this I was dreading how I was going to deal with either one. I was shocked at how easy it was to get both using the HHJ element. First, let's see what happens if we only enforce zero displacement at the boundary.

bc = firedrake.DirichletBC(Z.sub(0), 0, "on_boundary")
firedrake.solve(F == 0, z, bc)
w, σ = z.subfunctions
n = firedrake.FacetNormal(mesh)
dw_dn = inner(grad(w), n)
σ_nn = inner(dot(σ, n), n)

boundary_slope = firedrake.assemble(dw_dn ** 2 * ds)
boundary_stress = firedrake.assemble(σ_nn ** 2 * ds)
print(f"Integrated boundary slope:  {boundary_slope}")
print(f"Integrated boundary moment: {boundary_stress}")
Integrated boundary slope:  2.2011469910858315e-10
Integrated boundary moment: 0.004669439781744687

So it looks like the clamped boundary condition is natural with HHJ elements. We can confirm that by looking at a 3D plot.

import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
fig, ax = plt.subplots(subplot_kw={"projection": "3d"})
firedrake.trisurf(w, axes=ax);
[Figure: 3D surface plot of the displacement with clamped boundary conditions.]

Now let's see what happens if we apply a Dirichlet boundary condition to the moment part of the solution.

simply_supported_bc = firedrake.DirichletBC(Z.sub(1), 0, "on_boundary")
firedrake.solve(F == 0, z, bcs=[bc, simply_supported_bc])
w, σ = z.subfunctions
n = firedrake.FacetNormal(mesh)
dw_dn = inner(grad(w), n)
σ_nn = inner(dot(σ, n), n)

boundary_slope = firedrake.assemble(dw_dn ** 2 * ds)
boundary_stress = firedrake.assemble(σ_nn ** 2 * ds)
print(f"Integrated boundary slope:  {boundary_slope}")
print(f"Integrated boundary moment: {boundary_stress}")
Integrated boundary slope:  0.006015973798724048
Integrated boundary moment: 0.0

Now the boundary stresses are zero and so the boundary slopes are unconstrained.

fig, ax = plt.subplots(subplot_kw={"projection": "3d"})
firedrake.trisurf(w, axes=ax);
[Figure: 3D surface plot of the displacement with simply-supported boundary conditions.]

The reason why this works is that the boundary degrees of freedom for the HHJ element are the normal-normal stress components. So if we set all of those to zero, we get simply-supported boundary conditions. This is analogous to the simpler setting of $H(\text{div})$ elements for vector fields having normal continuity. We can easily enforce boundary conditions like $u\cdot n = 0$ by setting the right degrees of freedom to zero.

We have to work a lot harder to get the boundary conditions right using other finite element bases. For example, the Argyris element has both value and derivative degrees of freedom at the mesh vertices. We can't set the boundary values alone in the same way. In order to set the boundary values with Argyris elements, Kirby and Mitchell had to use Nitsche's method. The Hsieh-Clough-Tocher (HCT) element makes it easy to use clamped boundary conditions but simply-supported is harder. The fact that both types are relatively easy with HHJ is a definite advantage, although it has more total degrees of freedom than HCT.

Eigenfunctions

The eigenfunctions of the plate problem, also known as Chladni patterns, have a fascinating history. This review is good reading. Here I'll show the first few eigenfunctions in a square with simply-supported boundary conditions.

A = firedrake.derivative(F, z)

w, σ = firedrake.split(z)
J = 0.5 * w**2 * dx
M = firedrake.derivative(firedrake.derivative(J, z), z)

bcs = [bc, simply_supported_bc]
problem = firedrake.LinearEigenproblem(A, M, bcs=bcs, restrict=True)

Note the $M$ matrix that we supplied above. We've looked at generalized eigenvalue problems before, like in the post on Weyl's law. In that setting we were solving eigenvalue problems of the form $$A\phi = \lambda M\phi$$ where $M$ was some symmetric positive-definite matrix instead of the identity -- in those cases, always the mass matrix. For saddle-point eigenvalue problems, the matrix on the right-hand side is often no longer definite. Here the form we want is instead $$\left[\begin{matrix} A & B^* \\ B & 0\end{matrix}\right]\left[\begin{matrix}\sigma \\ w\end{matrix}\right] = \lambda\left[\begin{matrix} 0 & 0 \\ 0 & M\end{matrix}\right]\left[\begin{matrix}\sigma \\ w\end{matrix}\right]$$ where $M$ is the mass matrix for the displacement block.

num_values = 40
opts = {
    "solver_parameters": {
        "eps_gen_hermitian": None,
        "eps_target_real": None,
        "eps_smallest_real": None,
        "st_type": "sinvert",
        "st_ksp_type": "gmres",
        "st_pc_type": "lu",
        "st_pc_factor_mat_solver_type": "mumps",
        "eps_tol": 1e-8,
    },
    "n_evals": num_values,
}
eigensolver = firedrake.LinearEigensolver(problem, **opts)
num_converged = eigensolver.solve()
print(f"Number of converged eigenfunctions: {num_converged}")
Number of converged eigenfunctions: 42

The plot below shows the nodal lines of the eigenfunctions. Many of them come in pairs due to the axis-flipping symmetry. These are the patterns that so fascinated Chladni and others.

ϕs = [eigensolver.eigenfunction(n)[0].sub(0) for n in range(num_values)]
fig, axes = plt.subplots(nrows=5, ncols=8, sharex=True, sharey=True)
for ax in axes.flatten():
    ax.set_aspect("equal")
    ax.set_axis_off()

levels = [-10, 0, 10]
for ax, ϕ in zip(axes.flatten(), ϕs):
    firedrake.tricontour(ϕ, levels=levels, axes=ax)
[Figure: nodal lines of the first 40 eigenfunctions.]

Mantle convection

This code is taken from a chapter in The FEniCS Book, which is available for free online. That chapter is in turn an implementation of a model setup from van Keken et al. (1997). I took the thermomechanical parts and removed the chemistry. The paper and the code from the FEniCS book both use a non-dimensional form of the equations, which I've adopted here.

In this notebook, we'll see the Stokes equations again, but we'll couple them to the evolution of temperature through an advection-diffusion equation. The extensive quantity is not the temperature itself but the internal energy density $E = \rho c_p T$ where $\rho$, $c_p$ are the mass density and the specific heat at constant pressure. The flux of heat is $$F = \rho c_p T u - k\nabla T.$$ We'll assume there are no heat sources, but the real physics would include sources from decay of radioactive elements, strain heating, and chemical reactions. The variational form of the heat equation is: $$\int_\Omega\left\{\partial_t(\rho c_p T)\,\phi - (\rho c_p Tu - k\nabla T)\cdot\nabla\phi - Q\phi\right\}dx = 0$$ for all test functions $\phi$. I've written this in such a way that it still makes sense when the density and heat capacity aren't constant, although we'll assume they are (mostly) constant in the following. The variational form of the momentum balance equation is $$\int_\Omega\left\{2\mu\,\dot\varepsilon(u): \dot\varepsilon(v) - p\nabla\cdot v - q\nabla\cdot u - \rho g\cdot v\right\}dx = 0$$ for all velocity and pressure test functions $v$, $q$.

The characteristic that makes this problem interesting is that we assume the flow is just barely compressible. This is called the Boussinesq approximation. In practice, it means we do two apparently contradictory things. First, we assume that the flow is incompressible: $$\nabla\cdot u = 0.$$ Second, we assume that the density on the right-hand side of the momentum balance equation is compressible after all, and that the degree of compression is linear in the temperature: $$\rho = \rho_0(1 - \beta(T - T_0))$$ where $\rho_0$ is a reference density and $\beta$ is the thermal expansion coefficient. These assumptions aren't consistent with each other, and we're not even consistent in applying them: we use this expression only on the right-hand side of the momentum balance equation, but we still take the internal energy density to be $\rho_0c_pT$ in the energy equation. In other words, the material is compressible as far as momentum balance is concerned but not energy balance. I find that uncomfortable. But I would find writing a full compressible solver to be blinding agony. Anyway it lets us make pretty pictures.

I won't go into a full non-dimensionalization of the problem, which is detailed in the van Keken paper. The most important dimensionless numbers for this system are the Prandtl number and the Rayleigh number. The Prandtl number quantifies the ratio of momentum diffusivity to thermal diffusivity: $$\text{Pr} = \frac{\mu / \rho}{k / \rho c_p} = \frac{\nu}{\kappa}$$ where $\nu$ and $\kappa$ are respectively the kinematic viscosity and thermal diffusivity. For mantle flows, the Prandtl number is on the order of $10^{25}$. We can get a sense for a characteristic velocity scale from the momentum balance equation. First, we can absorb a factor of $\rho_0g$ into the pressure. If we take $U$, $L$, and $\Delta T$ to be the characteristic velocity, length, and temperature difference scales, non-dimensionalizing the momentum balance equation gives $$\frac{\mu U}{L^2} \approx \rho_0g\beta\Delta T,$$ which implies that $U = \rho_0g\beta L^2\Delta T / \mu$. Now we can compute the thermal Péclet number $$\text{Pe} = \frac{UL}{\kappa}.$$ When the Péclet number is large, we're in the convection-dominated regime; when it's small, we're in the diffusion-dominated regime. Taking our estimate of the velocity scale from before, we get $$\text{Pe} = \frac{\rho_0 g \beta L^3 \Delta T}{\kappa\mu}.$$ This is identical to the Rayleigh number, which we'll write as $\text{Ra}$. Lord Rayleigh famously used linear stability analysis to show that a fluid that is heated from below and cooled from above is unstable when the Rayleigh number exceeds 600 or so. Here we'll take the Rayleigh number to be equal to $10^6$ as in the van Keken paper. It's a worthwhile exercise in numerical analysis to see what it takes to make a solver go for higher values of the Rayleigh number.
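
Just to put a number on it, here's a back-of-the-envelope Rayleigh number for whole-mantle convection. The parameter values are rough textbook figures of my own choosing, not from the van Keken paper:

ρ_0 = 3300.0  # density (kg/m³)
g = 9.8       # gravitational acceleration (m/s²)
β = 3e-5      # thermal expansion coefficient (1/K)
L = 2.9e6     # depth of the mantle (m)
ΔT = 1500.0   # temperature contrast (K)
κ = 1e-6      # thermal diffusivity (m²/s)
μ = 1e21      # dynamic viscosity (Pa·s)
print(f"Ra ≈ {ρ_0 * g * β * L ** 3 * ΔT / (κ * μ):.1e}")  # ~ 3e7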

Here we're using a rectangular domain. Again, the spatial scales have been non-dimensionalized.

import numpy as np
from numpy import pi as π
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
from IPython.display import HTML
from tqdm.notebook import trange, tqdm
import firedrake
from firedrake import (
    Constant, sqrt, exp, min_value, max_value, inner, sym, grad, div, dx
)
import irksome
from irksome import Dt

Lx, Ly = Constant(2.0), Constant(1.0)
ny = 32
nx = int(float(Lx / Ly)) * ny
mesh = firedrake.RectangleMesh(
    nx, ny, float(Lx), float(Ly), diagonal="crossed"
)

The initial condition for temperature involves loads of annoyingly long expressions, but it's all in Appendix A of the van Keken paper.

def clamp(z, zmin, zmax):
    return min_value(Constant(zmax), max_value(Constant(zmin), z))

def switch(z):
    return exp(z) / (exp(z) + exp(-z))

Ra = Constant(1e6)

ϵ = Constant(1 / nx)
x = firedrake.SpatialCoordinate(mesh)

q = Lx**(7 / 3) / (1 + Lx**4)**(2 / 3) * (Ra / (2 * np.sqrt(π)))**(2/3)
Q = 2 * sqrt(Lx / (π * q))
T_u = 0.5 * switch((1 - x[1]) / 2 * sqrt(q / (x[0] + ϵ)))
T_l = 1 - 0.5 * switch(x[1] / 2 * sqrt(q / (Lx - x[0] + ϵ)))
T_r = 0.5 + Q / (2 * np.sqrt(π)) * sqrt(q / (x[1] + 1)) * exp(-x[0]**2 * q / (4 * x[1] + 4))
T_s = 0.5 - Q / (2 * np.sqrt(π)) * sqrt(q / (2 - x[1])) * exp(-(Lx - x[0])**2 * q / (8 - 4 * x[1]))
expr = T_u + T_l + T_r + T_s - Constant(1.5)

degree = 1
temperature_space = firedrake.FunctionSpace(mesh, "CG", degree)
T_0 = firedrake.Function(temperature_space).interpolate(clamp(expr, 0, 1))
T = T_0.copy(deepcopy=True)

The important point is to make the fluid hotter below and on one side and cooler above and on the other side, as shown in the plot below.

def subplots():
    fig, axes = plt.subplots()
    axes.set_aspect("equal")
    axes.set_axis_off()
    axes.set_xlim(0, float(Lx))
    axes.set_ylim(0, float(Ly))
    return fig, axes

fig, axes = subplots()
firedrake.tripcolor(T, cmap="inferno", axes=axes);
[Figure: the initial temperature field.]

Next we need to make some function spaces for the fluid velocity and pressure. Note how the degree of the velocity space is one higher than that of the pressure -- we're using the Taylor-Hood element again.

pressure_space = firedrake.FunctionSpace(mesh, "CG", 1)
velocity_space = firedrake.VectorFunctionSpace(mesh, "CG", 2)
Z = velocity_space * pressure_space

Once we've created a function in the mixed space, we can then pull out the two parts with firedrake.split.

z = firedrake.Function(Z)
u, p = firedrake.split(z)

The code below creates the variational form of the Stokes equations $$\nabla\cdot\tau - \nabla p + f = 0$$ with the temperature-dependent buoyant forcing $f = -\text{Ra}\,T\,g$.

μ = Constant(1)
ε = lambda u: sym(grad(u))

v, q = firedrake.TestFunctions(z.function_space())

τ = 2 * μ * ε(u)
g = Constant((0, -1))
f = -Ra * T * g
F = (inner(τ, ε(v)) - q * div(u) - p * div(v) - inner(f, v)) * dx

We can use the .sub method to pull parts out of mixed spaces, which we need in order to create the right boundary conditions.

bc = firedrake.DirichletBC(Z.sub(0), Constant((0, 0)), "on_boundary")

A bit of magic in order to tell the linear solver that the Stokes equations have a null space -- the pressure is only determined up to an additive constant -- that we need to project out.

basis = firedrake.VectorSpaceBasis(constant=True, comm=firedrake.COMM_WORLD)
nullspace = firedrake.MixedVectorSpaceBasis(Z, [Z.sub(0), basis])

Here I'm creating a solver object, since we'll need to solve the same system many times.

stokes_problem = firedrake.NonlinearVariationalProblem(F, z, bc)
parameters = {
    "nullspace": nullspace,
    "solver_parameters": {
        "ksp_type": "preonly",
        "pc_type": "lu",
        "pc_factor_mat_solver_type": "mumps",
    },
}
stokes_solver = firedrake.NonlinearVariationalSolver(stokes_problem, **parameters)
stokes_solver.solve()
fig, axes = subplots()
firedrake.streamplot(
    z.sub(0), axes=axes, resolution=1/40, cmap="inferno", seed=1729
);
[Figure: streamlines of the initial velocity field.]

Here we're setting up the temperature solver, which includes both convection and diffusion. I've set the timestep based on the mesh size and the maximum speed that we just found above from the velocity solution.

ρ, c, k = Constant(1), Constant(1), Constant(1)
δx = mesh.cell_sizes.dat.data_ro[:].min()
umax = z.sub(0).dat.data_ro[:].max()
δt = Constant(δx / umax)

ϕ = firedrake.TestFunction(temperature_space)
f = ρ * c * T * u - k * grad(T)
G = (ρ * c * Dt(T) * ϕ - inner(f, grad(ϕ))) * dx

lower_bc = firedrake.DirichletBC(temperature_space, 1, [3])
upper_bc = firedrake.DirichletBC(temperature_space, 0, [4])
bcs = [lower_bc, upper_bc]

method = irksome.BackwardEuler()
temperature_solver = irksome.TimeStepper(G, method, Constant(0.0), δt, T, bcs=bcs)
print(f"Timestep: {float(δt):0.03g}")
Timestep: 0.000132

And the timestepping loop. Note that the final time is on a non-dimensional scale again; in physical time it works out to be on the order of a hundred million years. Here we're using an operator splitting approach: we first update the temperature, and then we compute a new velocity and pressure. The splitting error goes like $\mathscr{O}(\delta t)$ as the timestep is reduced. So if we use a first-order integration scheme like backward Euler then the splitting error is asymptotically the same as that of the discretization itself. We could get $\mathscr{O}(\delta t^3)$ or higher convergence by using, say, the Radau-IIA method, but the total error would be dominated by splitting, so there would be little point in trying harder.

final_time = 0.25
num_steps = int(final_time / float(δt))
Ts = [T.copy(deepcopy=True)]
zs = [z.copy(deepcopy=True)]

for step in trange(num_steps):
    temperature_solver.advance()
    stokes_solver.solve()

    Ts.append(T.copy(deepcopy=True))
    zs.append(z.copy(deepcopy=True))

The movie below shows the temperature evolution. The rising and sinking plumes of hot and cold fluid correspond to the mantle plumes that produce surface volcanism.

%%capture
fig, axes = subplots()
kw = {"num_sample_points": 4, "vmin": 0, "vmax": 1, "cmap": "inferno"}
colors = firedrake.tripcolor(Ts[0], axes=axes, **kw)
fn_plotter = firedrake.FunctionPlotter(mesh, num_sample_points=4)
animate = lambda T: colors.set_array(fn_plotter(T))
animation = FuncAnimation(fig, animate, frames=tqdm(Ts), interval=1e3/30)
HTML(animation.to_html5_video())
z = zs[-1]
u, p = z.subfunctions
fig, axes = subplots()
firedrake.streamplot(
    u, axes=axes, resolution=1/40, cmap="inferno", seed=1729
);
[Figure: streamlines of the final velocity field.]

What next

There are a few features in the van Keken paper and otherwise that I didn't attempt here. The original van Keken paper adds chemistry. Solving the associated species transport equation with as little numerical diffusion as possible is hard. You could then make the fluid buoyancy depend on chemical composition and add temperature-dependent chemical reactions.

Real mantle fluid also has a temperature-dependent viscosity. When I tried to add this using the splitting scheme above, the solver quickly diverged. Getting that to work might require a fully coupled time-integration scheme: rather than split up the temperature and the velocity/pressure solves, a fully coupled scheme would solve for all three fields at once. I did try this, but whether it would go or not seemed to depend on the machine and the day of the week. If you could make it work reliably, it would open up the possibility of using higher-order methods like Radau-IIA. I'll explore that more in a future post.

Kármán vortices

In previous posts, I looked at how to discretize the incompressible Stokes equations. The Stokes equations are a good approximation when the fluid speed is small enough that inertial effects are negligible. The relevant dimensionless number is the ratio $$\text{Re} = \frac{\rho UL}{\mu},$$ the Reynolds number. Stokes flow applies when the Reynolds number is substantially less than 1. The incompressibility constraint adds a new difficulty: we have to make good choices of finite element bases for the velocity and pressure. If we fail to do that, the resulting linear systems can have no solution or infinitely many.

Here I'll look at how we can discretize the full Navier-Stokes equations: $$\frac{\partial}{\partial t}\rho u + \nabla\cdot\rho u\otimes u = -\nabla p + \nabla\cdot \tau$$ where the deviatoric stress tensor is $\tau = 2\mu\dot\varepsilon$. The inertial terms are nonlinear, which makes this problem more difficult yet than the Stokes equations.

The goal here will be to simulate the famous von Kármán vortex street.

Making the initial geometry

First, we'll make a domain consisting of a circle punched out of a box. The fluid flow in the wake of the circle will produce vortices.

import gmsh
gmsh.initialize()
import numpy as np
from numpy import pi as π

Lx = 6.0
Ly = 2.0
lcar = 1 / 16

gmsh.model.add("chamber")
geo = gmsh.model.geo
ps = [(0, 0), (Lx, 0), (Lx, Ly), (0, Ly)]
box_points = [geo.add_point(*p, 0, lcar) for p in ps]
box_lines = [
    geo.add_line(i1, i2) for i1, i2 in zip(box_points, np.roll(box_points, 1))
]

for line in box_lines:
    geo.add_physical_group(1, [line])

f = 1 / 3
c = np.array([f * Lx, Ly / 2, 0])
center = geo.add_point(*c)
r = Ly / 8
num_circle_points = 16
θs = np.linspace(0.0, 2 * π, num_circle_points + 1)[:-1]
ss = np.column_stack((np.cos(θs), np.sin(θs), np.zeros(num_circle_points)))
tie_points = [geo.add_point(*(c + r * s), lcar) for s in ss]
circle_arcs = [
    geo.add_circle_arc(p1, center, p2)
    for p1, p2 in zip(tie_points, np.roll(tie_points, 1))
]

geo.add_physical_group(1, circle_arcs)

outer_curve_loop = geo.add_curve_loop(box_lines)
inner_curve_loop = geo.add_curve_loop(circle_arcs)
plane_surface = geo.add_plane_surface([outer_curve_loop, inner_curve_loop])
geo.add_physical_group(2, [plane_surface])
geo.synchronize()
gmsh.model.mesh.generate(2)
gmsh.write("chamber.msh")
Info    : Meshing 1D...
Info    : [  0%] Meshing curve 1 (Line)
Info    : [ 10%] Meshing curve 2 (Line)
Info    : [ 20%] Meshing curve 3 (Line)
Info    : [ 20%] Meshing curve 4 (Line)
Info    : [ 30%] Meshing curve 5 (Circle)
Info    : [ 30%] Meshing curve 6 (Circle)
Info    : [ 40%] Meshing curve 7 (Circle)
Info    : [ 40%] Meshing curve 8 (Circle)
Info    : [ 50%] Meshing curve 9 (Circle)
Info    : [ 50%] Meshing curve 10 (Circle)
Info    : [ 60%] Meshing curve 11 (Circle)
Info    : [ 60%] Meshing curve 12 (Circle)
Info    : [ 70%] Meshing curve 13 (Circle)
Info    : [ 70%] Meshing curve 14 (Circle)
Info    : [ 80%] Meshing curve 15 (Circle)
Info    : [ 80%] Meshing curve 16 (Circle)
Info    : [ 90%] Meshing curve 17 (Circle)
Info    : [ 90%] Meshing curve 18 (Circle)
Info    : [100%] Meshing curve 19 (Circle)
Info    : [100%] Meshing curve 20 (Circle)
Info    : Done meshing 1D (Wall 0.00711856s, CPU 0.003696s)
Info    : Meshing 2D...
Info    : Meshing surface 1 (Plane, Frontal-Delaunay)
Info    : Done meshing 2D (Wall 0.139491s, CPU 0.135442s)
Info    : 4047 nodes 8113 elements
Info    : Writing 'chamber.msh'...
Info    : Done writing 'chamber.msh'
import firedrake
mesh = firedrake.Mesh("chamber.msh")
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.set_aspect("equal")
firedrake.triplot(mesh, axes=ax)
ax.legend(loc="upper right");
[Figure: the triangular mesh of the box with the circle punched out.]

Initial velocity

We'll use a fixed inflow velocity $$u_x = 4 \frac{y}{L_y}\left(1 - \frac{y}{L_y}\right) u_{\text{in}}.$$ We'll take as characteristic length scale the radius of the disc $L_y / 8$. In order to see the effect we want, we'll need a Reynolds number that's on the order of 100 or greater.
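
A quick check on that claim with the numbers used in this post (my own arithmetic; the density and viscosity are set as Constants in the cells below):

ρ_f = 1.0        # fluid density
u_peak = 5.0     # peak inflow speed u_in
radius = Ly / 8  # radius of the disc; Ly = 2 from the meshing step
μ_f = 1e-2       # viscosity
print(f"Re (radius):   {ρ_f * u_peak * radius / μ_f:.0f}")      # 125
print(f"Re (diameter): {ρ_f * u_peak * 2 * radius / μ_f:.0f}")  # 250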

from firedrake import Constant, inner, outer, sym, grad, div, dx, ds
import irksome
from irksome import Dt

μ = Constant(1e-2)

x, y = firedrake.SpatialCoordinate(mesh)
u_in = Constant(5.0)
ly = Constant(Ly)
expr = firedrake.as_vector((4 * u_in * y / ly * (1 - y / ly), 0))

I've used Taylor-Hood elements for this demonstration. It would be a good exercise to repeat this using, say, Crouzeix-Raviart or other elements.

cg1 = firedrake.FiniteElement("CG", "triangle", 1)
cg2 = firedrake.FiniteElement("CG", "triangle", 2)
Q = firedrake.FunctionSpace(mesh, cg1)
V = firedrake.VectorFunctionSpace(mesh, cg2)
Z = V * Q
z = firedrake.Function(Z)

Here we can use the fact that the Stokes problem has a minimization form. The Navier-Stokes equations do not because of the convective term.

u, p = firedrake.split(z)
ε = lambda v: sym(grad(v))

L = (μ * inner(ε(u), ε(u)) - p * div(u)) * dx

inflow_ids = (1,)
outflow_ids = (3,)
side_wall_ids = (2, 4, 5)
side_wall_bcs = firedrake.DirichletBC(Z.sub(0), Constant((0.0, 0.0)), side_wall_ids)
inflow_bcs = firedrake.DirichletBC(Z.sub(0), expr, inflow_ids)
bcs = [side_wall_bcs, inflow_bcs]

F = firedrake.derivative(L, z)
firedrake.solve(F == 0, z, bcs)

A bit of a philosophical point is in order here. We picked the inflow velocity and the viscosity to produce a high Reynolds number (about 200). The Stokes equations, on the other hand, are physically realistic only when the Reynolds number is $\ll 1$. But we can still solve the Stokes equations in the high-Reynolds number limit. In other words, the mathematical model remains well-posed even in regimes where it is not applicable. Here we're only using the result to initialize a simulation using the "right" model. But it's a mistake -- and one I've made -- to lull yourself into a false sense of correctness merely because the model gave you an answer.

fig, ax = plt.subplots()
ax.set_aspect("equal")
ax.set_axis_off()
colors = firedrake.streamplot(
    z.subfunctions[0], resolution=1/16, seed=1729, cmap="Blues", axes=ax
)
fig.colorbar(colors, orientation="horizontal");
[Figure: streamlines of the initial Stokes velocity field, colored by speed.]

Solution method

There are a host of methods for solving the Navier-Stokes equations by breaking them up into two simpler problems. These are known as projection methods. I started out trying those but it required effort, which I find disgusting. So I threw backward Euler at it and it worked.

There are some non-trivial decisions to make about both the variational form and the boundary conditions. I started writing this wanting to use pressure boundary conditions at both the inflow and outflow. (Of course I had to be reminded that you can't prescribe pressures but rather tractions.) This went badly. If a wave reflects off of the obstacle, back upstream, and out the inflow boundary, the simulation will crash. So I had to dial back the challenge and use a fixed inflow velocity and a traction boundary condition at the outflow.

Almost any writing you see about the Navier-Stokes equations will express the problem in differential form and will use incompressibility to apply what might appear to be simplifications. For example, if the fluid is incompressible and the viscosity is constant, then you can rewrite the viscous term like so: $$\nabla\cdot \mu(\nabla u + \nabla u^\top) = \mu\nabla^2u.$$ You'll see this form in almost all numerical methods or engineering textbooks. I don't like it for two reasons. First, the apparent simplification only gets in your way as soon as you want to consider fluids with variable viscosity. Mantle convection is one obvious case -- the temperature and chemistry of mantle rock can change the viscosity by several orders of magnitude. Second, it gives the wrong boundary conditions when you try to discretize the problem (see Limache et al. (2007)). I've retained the symmetric gradients of the velocity and test function in the form below.
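
Here's a small symbolic check (my own addition) of that claim: the divergence of the symmetrized gradient equals the Laplacian plus a gradient-of-divergence term, and the latter only vanishes for incompressible flow:

import sympy

x_, y_ = sympy.symbols("x y")
coords = (x_, y_)
u = [sympy.Function("u1")(x_, y_), sympy.Function("u2")(x_, y_)]

# component i of ∇·(∇u + ∇uᵀ) is Σⱼ ∂ⱼ(∂ⱼuᵢ + ∂ᵢuⱼ)
div_sym = [
    sum(sympy.diff(sympy.diff(u[i], coords[j]) + sympy.diff(u[j], coords[i]), coords[j]) for j in range(2))
    for i in range(2)
]
laplacian = [sum(sympy.diff(u[i], v, 2) for v in coords) for i in range(2)]
div_u = sum(sympy.diff(u[j], coords[j]) for j in range(2))

for i in range(2):
    # each residual is exactly ∂ᵢ(∇·u), so it vanishes only when ∇·u = 0
    print(sympy.simplify(div_sym[i] - laplacian[i] - sympy.diff(div_u, coords[i])))  # 0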

The second apparent simplification uses incompressibility to rewrite the convection term: $$\nabla\cdot \rho u\otimes u = \rho u\cdot\nabla u.$$ This form is ubiquitous and reflects a preference for thinking about fluid flow in a Lagrangian reference frame. I prefer to avoid it although both are correct, unlike the Laplacian form of the viscosity. Given any extensive density $\phi$, regardless of its tensor rank, the flux will include a term $\phi\otimes u$. The conversion of this flux from the conservation to the variational form is then $$-\int_\Omega \phi u\cdot\nabla\psi\,dx$$ and this is true of mass, momentum, energy, whatever you like. Taking something that was a divergence and making it not a divergence obfuscates the original conservation principle. It also stops you from pushing the differential operator over onto a test function. So I've instead coded up the convection term as a discretization of $$-\int_\Omega \rho u\otimes u:\dot\varepsilon(v)\,dx + \int_{\partial\Omega\cap\{u\cdot n \ge 0\}}(\rho u\cdot n)(u\cdot v)\,ds.$$ In the first term, I've used the symmetric gradient of $v$ because the contraction of a symmetric and an anti-symmetric tensor is zero.

All together now, the variational form that I'm using is $$\begin{align} 0 & = \int_\Omega\left\{\rho\,\partial_tu\cdot v - \rho u\otimes u:\dot\varepsilon(v) - p\nabla\cdot v - q\nabla\cdot u + 2\mu\,\dot\varepsilon(u):\dot\varepsilon(v)\right\}dx \\ & \qquad\qquad + \int_\Gamma (\rho u\cdot n)(u \cdot v)ds \end{align}$$ for all test functions $v$ and $q$, where $\Gamma$ is the outflow boundary.

v, q = firedrake.TestFunctions(Z)
u, p = firedrake.split(z)

ρ = firedrake.Constant(1.0)

F_1 = (
    ρ * inner(Dt(u), v) -
    ρ * inner(ε(v), outer(u, u)) -
    p * div(v) -
    q * div(u) +
    2 * μ * inner(ε(u), ε(v))
) * dx

n = firedrake.FacetNormal(mesh)
F_2 = ρ * inner(u, v) * inner(u, n) * ds(outflow_ids)

F = F_1 + F_2

We'll need to make some choice about the timestep. Here I've computed the CFL time for the mesh and the initial velocity that we computed above. This choice might not be good enough. If we initialized the velocity by solving the Stokes equations, the fluid could evolve to a much higher speed. We might then find that this timestep is inadequate. A principled solution would be to use an adaptive scheme.

dg0 = firedrake.FiniteElement("DG", "triangle", 0)
Δ = firedrake.FunctionSpace(mesh, dg0)
area = firedrake.Function(Δ).project(firedrake.CellVolume(mesh))
δx_min = np.sqrt(2 * area.dat.data_ro.min())

u, p = z.subfunctions
U_2 = firedrake.Function(Δ).project(inner(u, u))
u_max = np.sqrt(U_2.dat.data_ro.max())
cfl_time = δx_min / u_max
print(f"Smallest cell diameter: {δx_min:0.4f}")
print(f"Max initial velocity:   {u_max:0.4f}")
print(f"Timestep:               {cfl_time:0.4f}")

dt = firedrake.Constant(0.5 * cfl_time)
Smallest cell diameter: 0.0351
Max initial velocity:   6.4170
Timestep:               0.0055
params = {
    "solver_parameters": {
        "snes_monitor": ":navier-stokes-output.log",
        "snes_atol": 1e-12,
        "ksp_atol": 1e-12,
        "snes_type": "newtonls",
        "ksp_type": "gmres",
        "pc_type": "lu",
        "pc_factor_mat_solver_type": "mumps",
    },
    "bcs": bcs,
}

method = irksome.BackwardEuler()
t = firedrake.Constant(0.0)
solver = irksome.TimeStepper(F, method, t, dt, z, **params)

I've added a bit of code to show some diagnostic information in the progress bar. First, it prints out the number of Newton iterations that were required to compute each timestep; if you see this going much above 20 then something is off. Second, it prints out the maximum pressure. Both of these were useful when I was debugging this code.

from tqdm.notebook import trange

zs = [z.copy(deepcopy=True)]

final_time = 10.0
num_steps = int(final_time / float(dt))
progress_bar = trange(num_steps)
for step in progress_bar:
    solver.advance()
    zs.append(z.copy(deepcopy=True))
    iter_count = solver.solver.snes.getIterationNumber()
    pmax = z.subfunctions[1].dat.data_ro.max()
    progress_bar.set_description(f"{iter_count}, {pmax:0.4f} | ")

Finally, we'll make an animated quiver plot because it looks pretty.

%%capture

from tqdm.notebook import tqdm
from matplotlib.animation import FuncAnimation

fig, ax = plt.subplots()
ax.set_aspect("equal")
ax.set_axis_off()

X = mesh.coordinates.dat.data_ro
V = mesh.coordinates.function_space()
u_t = zs[0].subfunctions[0].copy(deepcopy=True)
interpolator = firedrake.Interpolate(u_t, V)
u_X = firedrake.assemble(interpolator)
u_values = u_X.dat.data_ro

arrows = firedrake.quiver(zs[0].subfunctions[0], axes=ax, cmap="Blues")
def animate(z):
    u_t.assign(z.subfunctions[0])
    u_X = firedrake.assemble(interpolator)
    u_values = u_X.dat.data_ro
    arrows.set_UVC(*(u_values.T))
animation = FuncAnimation(fig, animate, tqdm(zs), interval=1e3/60)
from IPython.display import HTML
HTML(animation.to_html5_video())