Stokes flow

In the last post, we looked at variational principles by studying the minimal surface equation. Much of what you learn in multivariable calculus carries over equally well to infinite-dimensional spaces, and we were able to leverage a lot of this intuition to design efficient solution procedures. For example, the notion of convexity carries over to variational problems, and using this idea we can show that Newton's method is effective in this setting as well.

When we solved the minimal surface equation, our solution space consisted of all functions that satisfy a set of Dirichlet boundary conditions. These conditions are easy to eliminate, so the problem was essentially unconstrained. In this post, we'll look at the Stokes equations, which give rise to a constrained optimization problem. For unconstrained problems, the convexity of the objective implies a kind of stability property that we can use to prove that roughly any finite element basis will give a convergent approximation scheme. For constrained problems we have to be much more careful about the choice of basis, and this is the content of the Ladyzhenskaya-Babuška-Brezzi or LBB conditions, which I'll describe in a later post. For now, we'll focus on solving the Stokes equations using one particular stable discretization, the Taylor-Hood element.

The Stokes equations describe slow flow of very viscous, incompressible fluids. The fields we're solving for are the velocity $u$ and the pressure $p$. The incompressibility condition means that the velocity field is divergence-free:

$$\nabla\cdot u = 0.$$

The remaining equations state that the stresses are in balance with body forces:

$$\nabla\cdot \tau - \nabla p + f = 0,$$

where $\tau$ is the rank-2 stress tensor and $f$ are the body forces. The stress tensor must be related somehow to the velocity field. For a viscous fluid, the stress tensor is related to the rate-of-strain tensor

$$\dot\varepsilon = \frac{1}{2}\left(\nabla u + \nabla u^*\right).$$

(For solids the stress tensor is related to the gradient of the displacement rather than the velocity.) The simplest constitutive relation is that of a Newtonian fluid:

$$\tau = 2\mu\dot\varepsilon,$$

where $\mu$ is the viscosity. There are other nonlinear constitutive relations, but for now we'll just consider Newtonian fluids. If $U$ and $L$ are characteristic velocity and length scales for the particular flow at hand and $\rho$ is the fluid density, the Stokes equations are a good description when the Reynolds number is much less than 1:

$$\text{Re} \equiv \frac{\rho UL}{\mu} \ll 1.$$

When the Reynolds number is close to or larger than 1, we need to use the full Navier-Stokes equations, which include inertial effects as well.

The Stokes equations, like the Poisson equation, have a variational principle, but for two fields instead of one. The principle is that the solution $u$, $p$ is a critical point of the rate of decrease of the Gibbs free energy:

$$\dot{\mathscr{G}}(u, p) = \int_\Omega\left(\mu|\dot\varepsilon(u)|^2 - p\nabla\cdot u - f\cdot u\right)dx.$$

You can show using the usual tricks that the Euler-Lagrange equations for this functional are the Stokes equations. The free energy dissipation functional consists of a positive, quadratic term in the velocity, but the pressure $p$ only acts like a Lagrange multiplier enforcing the incompressibility condition. The lack of any positivity in the pressure is part of what makes the Stokes equations so challenging to discretize and solve. While the second derivative of the objective is still symmetric, it is no longer positive-definite.
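
To make the saddle-point structure concrete, after discretization the first-order optimality conditions for this functional take the block form

$$\begin{pmatrix}A & B^*\\ B & 0\end{pmatrix}\begin{pmatrix}u \\ p\end{pmatrix} = \begin{pmatrix}f \\ 0\end{pmatrix},$$

where $A$ is the symmetric, positive-definite block coming from the viscous term and $B$ is the discrete divergence operator. The zero block on the diagonal is what ruins positive-definiteness, and it's the reason we have to be careful about both the discretization and the linear solver below.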

Demonstration

Here we'll work on a classic problem of flow driven by a moving boundary. The domain will consist of a circle with two holes removed. We'll imagine that these holes are cylindrical turbines that are rotating with some fixed speed and dragging the fluid along with them. As we'll see, getting a linear solver to converge for this problem is much more challenging than for the Poisson equation.

import numpy as np
from numpy import pi as π
def add_ellipse(geometry, x, y, a, b, N, lcar):
    θs = np.array([2 * π * n / N for n in range(N)])
    xs, ys = x + a * np.cos(θs), y + b * np.sin(θs)
    points = [geometry.add_point([x, y, 0], lcar=lcar) for x, y in zip(xs, ys)]
    lines = [geometry.add_line(points[n], points[(n + 1) % N])
             for n in range(N)]

    geometry.add_physical(lines)
    line_loop = geometry.add_line_loop(lines)
    return line_loop
import pygmsh
geometry = pygmsh.built_in.Geometry()

outer_line_loop = add_ellipse(geometry, x=0, y=0, a=1, b=1, N=128, lcar=1/4)
inner_loops = [
    add_ellipse(geometry, x=0, y=+1/2, a=1/8, b=1/8, N=64, lcar=1/4),
    add_ellipse(geometry, x=0, y=-1/2, a=1/8, b=1/8, N=64, lcar=1/4)
]

plane_surface = geometry.add_plane_surface(outer_line_loop, inner_loops)
geometry.add_physical(plane_surface)

with open('mixer.geo', 'w') as geo_file:
    geo_file.write(geometry.get_code())
    
!gmsh -2 -format msh2 -v 0 -o mixer.msh mixer.geo
import firedrake
mesh = firedrake.Mesh('mixer.msh')
import matplotlib.pyplot as plt
fig, axes = plt.subplots()
axes.set_aspect('equal')
firedrake.triplot(mesh, axes=axes)
axes.legend();

For this problem we'll use the Taylor-Hood element: piecewise linear basis functions for the pressure and piecewise quadratic basis functions for the velocity. The Taylor-Hood element is stable for the Stokes equations in the sense that the norm of the inverse of the discretized linear system stays bounded as the mesh is refined. This is a very special property and not just any element will work.

For scalar problems, the solution is a single field, but for the Stokes equations our solution consists of a pair: a velocity field and a pressure field. Firedrake includes a handy algebraic notation for defining the direct product of two function spaces.

Q = firedrake.FunctionSpace(mesh, family='CG', degree=1)
V = firedrake.VectorFunctionSpace(mesh, family='CG', degree=2)
Z = V * Q

We can access the components of a function that lives in this product space using the usual Python indexing operators, but it's more convenient to use the function firedrake.split to give us handles for the two components.

z = firedrake.Function(Z)
u, p = firedrake.split(z)

This way our code to define the objective functional looks as much like the math as possible, rather than having to constantly reference the individual components.

We'll use a viscosity coefficient $\mu$ of 1000. Since the diameter of the domain and the fluid velocity are both on the order of 1, the viscosity would need to be fairly large for the Stokes equations to actually be applicable.
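
As a quick sanity check, we can estimate the Reynolds number for these choices. The density and velocity scales of 1 and the length scale of 2 (the domain diameter) below are rough estimates I'm plugging in by hand, not values computed from the model.

ρ, U, L = 1.0, 1.0, 2.0
print(ρ * U * L / 1e3)   # ≈ 0.002, well below 1, so Stokes flow is a reasonable model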

from firedrake import inner, sym, grad, div, dx
def ε(u):
    return sym(grad(u))

μ = firedrake.Constant(1e3)
𝒢̇ = (μ * inner(ε(u), ε(u)) - p * div(u)) * dx

One of the extra challenges with the Stokes equations is that they have a non-trivial null space. To see this, suppose we have some velocity-pressure pair $u$, $p$. The velocity field $u$ is not necessarily divergence-free, but we do need that $u\cdot n = 0$ on the boundary of the domain. If we add a constant $p_0$ to the pressure, then the value of the objective functional is unchanged:

$$\begin{align} \dot{\mathscr{G}}(u, p) - \dot{\mathscr{G}}(u, p + p_0) & = \int_\Omega p_0\nabla\cdot u\, dx \\ & = p_0\int_{\partial\Omega}u\cdot n\, ds = 0. \end{align}$$

In order to obtain a unique solution to the system, we can impose the additional constraint that

$$\int_\Omega p\, dx = 0,$$

or in other words that the pressure must be orthogonal to all the constant functions.

from firedrake import MixedVectorSpaceBasis, VectorSpaceBasis
basis = VectorSpaceBasis(constant=True)
nullspace = MixedVectorSpaceBasis(Z, [Z.sub(0), basis])

Next we have to create the boundary conditions. The only extra work we have to do here is to get the right component of the mixed function space $Z$.

x = firedrake.SpatialCoordinate(mesh)

x2 = firedrake.as_vector((0, +1/2))
r2 = firedrake.Constant(1/8)
x3 = firedrake.as_vector((0, -1/2))
r3 = firedrake.Constant(1/8)

q2 = (x - x2) / r2
q3 = (x - x3) / r3

u2 = firedrake.as_vector((-q2[1], q2[0]))
u3 = firedrake.as_vector((-q3[1], q3[0]))

from firedrake import DirichletBC, as_vector
bc1 = DirichletBC(Z.sub(0), as_vector((0, 0)), 1)
bc2 = DirichletBC(Z.sub(0), u2, 2)
bc3 = DirichletBC(Z.sub(0), u3, 3)

Now let's see what happens if we invoke the default linear solver.

from firedrake import derivative

try:
    firedrake.solve(
        derivative(𝒢̇, z) == 0, z,
        bcs=[bc1, bc2, bc3],
        nullspace=nullspace
    )
except firedrake.ConvergenceError:
    print("Oh heavens, it didn't converge!")

We'll take the easy way out and use the sparse direct solver MUMPS to make sure we get an answer. This approach will work for now, but even parallel direct solvers scale poorly to large problems, especially in 3D. The proper incantation to invoke the direct solver needs a bit of explaining. For mixed problems like the Stokes equations, Firedrake will assemble a special matrix type that exploits the problem's block structure. Unfortunately MUMPS can't work with this matrix format, so we have to specify that PETSc should use its plain aij matrix format with the option 'mat_type': 'aij'. Next, we'll request an LU factorization with 'pc_type': 'lu'. Without any other options, this will use PETSc's built-in matrix factorization routines. These are fine for positive-definite matrices, but they can fail when the problem is indefinite or has a non-trivial null space. The option 'pc_factor_mat_solver_type': 'mumps' will use the MUMPS package instead of PETSc's built-in sparse direct solver.

firedrake.solve(
    derivative(𝒢̇, z) == 0, z,
    bcs=[bc1, bc2, bc3],
    nullspace=nullspace,
    solver_parameters={
        'mat_type': 'aij',
        'ksp_type': 'preonly',
        'pc_type': 'lu',
        'pc_factor_mat_solver_type': 'mumps'
    }
)

Some cool features you can observe in the stream plot are the saddle point at the center of the domain and the two counter-rotating vortices that form on either side of it.

u, p = z.split()
fig, axes = plt.subplots()
axes.set_aspect('equal')
kwargs = {'resolution': 1/30, 'seed': 4, 'cmap': 'winter'}
streamlines = firedrake.streamplot(u, axes=axes, **kwargs)
fig.colorbar(streamlines);

The velocity field should be close to divergence-free; if we project the divergence into a DG(2) space we can see how far it actually deviates from zero. There are some small deviations, especially around the boundary of the domain. Part of the problem is that the boundary conditions we've specified are exactly tangent to the idealized domain -- a large circle with two circular holes punched out of it -- but not to its discrete approximation by a collection of straight edges.

S = firedrake.FunctionSpace(mesh, family='DG', degree=2)
div_u = firedrake.project(div(u), S)
fig, axes = plt.subplots()
axes.set_aspect('equal')
kwargs = {'vmin': -0.01, 'vmax': +0.01, 'cmap': 'seismic'}
triangles = firedrake.tripcolor(div_u, axes=axes, **kwargs)
fig.colorbar(triangles);

For now we'll calculate and store the norm of the velocity divergence. When we try to improve on this, we'll use this value as a baseline.

linear_coords_divergence = firedrake.norm(div_u)
print(linear_coords_divergence)
0.026532615307004404

Higher-order geometries

We can try to improve on this by using curved edges for the geometry instead of straight ones. The topology of the mesh is the same; we're just adding more data describing how it's embedded into Euclidean space. In principle, gmsh can generate this for us, but reading in the file seems to be awfully annoying. To get a higher-order geometry, we'll proceed by:

  1. making a quadratic vector function space
  2. interpolating the linear coordinates into this space
  3. patching the new coordinate field to conform to the boundary

This approach will work for our specific problem but it requires us to know things about the idealized geometry that aren't always available. So what we're about to do isn't exactly generalizable.

To do the patching in step 3, we'll create boundary condition objects defined on the quadratic function space and then apply them. We need to know the numbering of the various boundary segments in order to do that, so to refresh our memory let's look at the mesh again.

fig, axes = plt.subplots()
axes.set_aspect('equal')
firedrake.triplot(mesh, axes=axes)
axes.legend();

The outer curve is boundary 1, the upper mixer head is boundary 2, and the lower head is boundary 3. With that in mind we can create the new coordinate field.

Vc = firedrake.VectorFunctionSpace(mesh, family='CG', degree=2)
from firedrake import sqrt, Constant
def fixup(x, center, radius):
    distance = sqrt(inner(x - center, x - center))
    return center + radius * (x - center) / distance

centers = [Constant((0., 0.)), Constant((0., +.5)), Constant((0., -0.5))]
radii = [Constant(1.), Constant(1/8), Constant(1/8)]
bcs = [firedrake.DirichletBC(Vc, fixup(x, center, radius), index + 1)
       for index, (center, radius) in enumerate(zip(centers, radii))]

X0 = firedrake.interpolate(mesh.coordinates, Vc)
X = X0.copy(deepcopy=True)
for bc in bcs:
    bc.apply(X)

Just as a sanity check, we'll calculate the average deviation of the new coordinate field from the old one to see how different they are.

from firedrake import ds
length = firedrake.assemble(Constant(1.) * ds(mesh))
firedrake.assemble(sqrt(inner(X - X0, X - X0)) * ds) / length
0.000180710597501049

Now we can solve the Stokes equations again on this new mesh using the exact same procedures as before.

qmesh = firedrake.Mesh(X)
Q = firedrake.FunctionSpace(qmesh, family='CG', degree=1)
V = firedrake.VectorFunctionSpace(qmesh, family='CG', degree=2)
Z = V * Q

z = firedrake.Function(Z)
u, p = firedrake.split(z)

𝒢̇ = (μ * inner(ε(u), ε(u)) - p * div(u)) * dx

basis = VectorSpaceBasis(constant=True)
nullspace = MixedVectorSpaceBasis(Z, [Z.sub(0), basis])

x = firedrake.SpatialCoordinate(qmesh)

x2 = firedrake.as_vector((0, +1/2))
r2 = firedrake.Constant(1/8)
x3 = firedrake.as_vector((0, -1/2))
r3 = firedrake.Constant(1/8)

q2 = (x - x2) / r2
q3 = (x - x3) / r3

u2 = firedrake.as_vector((-q2[1], q2[0]))
u3 = firedrake.as_vector((-q3[1], q3[0]))

from firedrake import DirichletBC, as_vector
bc1 = DirichletBC(Z.sub(0), as_vector((0, 0)), 1)
bc2 = DirichletBC(Z.sub(0), u2, 2)
bc3 = DirichletBC(Z.sub(0), u3, 3)

firedrake.solve(
    derivative(𝒢̇, z) == 0, z,
    bcs=[bc1, bc2, bc3],
    nullspace=nullspace,
    solver_parameters={
        'mat_type': 'aij',
        'ksp_type': 'preonly',
        'pc_type': 'lu',
        'pc_factor_mat_solver_type': 'mumps'
    }
)
S = firedrake.FunctionSpace(qmesh, family='DG', degree=2)
div_u = firedrake.project(div(u), S)

The ring of spurious divergences around the outer edge of the domain is substantially reduced with curved elements. Nonetheless, the boundary doesn't perfectly fit the circle and this imperfection means that at some points around the edge the discretized velocity field will have an unphysical, non-zero normal component.

fig, axes = plt.subplots()
axes.set_aspect('equal')
triangles = firedrake.tripcolor(div_u, axes=axes, **kwargs)
fig.colorbar(triangles);

Using a higher-order geometry reduced the norm of the velocity divergence by almost a factor of 4, which is a big improvement.

quadratic_coords_divergence = firedrake.norm(div_u)
print(linear_coords_divergence / quadratic_coords_divergence)
3.6951400937053247

Conclusion

The code above shows how to get an exact (up to rounding error) solution to the discretized Stokes equations using MUMPS. For larger problems in 3D, using a direct method can become prohibitively expensive. The Firedrake documentation has a demo of how to use PETSc's field split preconditioners, together with matrix-free operators, to solve the Stokes equations efficiently. In subsequent posts, I'll show more about stable discretizations of mixed problems, and how to solve the Stokes equations with more exotic boundary conditions than the standard ones we've shown here.
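
As a rough sketch of what those solver options look like, the idea is to replace the direct solver with a Krylov method and a block preconditioner built around the Schur complement. The dictionary below uses standard PETSc fieldsplit options, but I haven't tuned or tested it for this problem, so treat it as a starting point rather than a recipe; the Firedrake demo shows the full treatment, including how to scale the pressure mass matrix by the viscosity.

schur_parameters = {
    'mat_type': 'aij',
    'ksp_type': 'gmres',
    'pc_type': 'fieldsplit',
    'pc_fieldsplit_type': 'schur',
    'pc_fieldsplit_schur_fact_type': 'lower',
    # algebraic multigrid on the (positive-definite) velocity block
    'fieldsplit_0_ksp_type': 'preonly',
    'fieldsplit_0_pc_type': 'gamg',
    # approximate the Schur complement with a pressure mass matrix
    'fieldsplit_1_ksp_type': 'preonly',
    'fieldsplit_1_pc_type': 'python',
    'fieldsplit_1_pc_python_type': 'firedrake.MassInvPC'
}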

The velocity field we calculated was not exactly divergence-free and part of this was a consequence of using a boundary condition that adapted poorly to a piecewise-linear discretized geometry. We were able to do better by increasing the polynomial degree of the geometry, and in general this is absolutely necessary to achieve the expected rates of convergence with higher-order finite element bases. Nonetheless, the support for higher-order geometries in common finite element and mesh generation packages should be better given how useful they are. I think this is an area where a little investment in resources could make a really outsized difference. The logical endpoint of this line of thinking is isogeometric analysis, which is an active area of research.

The obstacle problem

In this post, we'll look at the obstacle problem. We've seen in previous posts examples of variational problems -- minimization of some functional with respect to a field. The classic example of a variational problem is to find the function $u$ that minimizes the Dirichlet energy

$$\mathscr{J}(u) = \int_\Omega\left(\frac{1}{2}|\nabla u|^2 - fu\right)dx$$

subject to the homogeneous Dirichlet boundary condition $u|_{\partial\Omega} = 0$. The Poisson equation is especially convenient because the objective is convex and quadratic. The obstacle problem is what you get when you add the additional constraint

$$u \ge g$$

throughout the domain. More generally, we can look at the problem of minimizing a convex functional $\mathscr{J}$ subject to the constraint that $u$ has to live in a closed, convex set $K$ of a function space $Q$. For a totally unconstrained problem, $K$ would just be the whole space $Q$.

Newton's method with line search is a very effective algorithm for solving unconstrained convex problems, even for infinite-dimensional problems like PDEs. Things get much harder when you include inequality constraints. To make matters worse, much of the literature you'll find on this subject is focused on finite-dimensional problems, where techniques like the active-set method work quite well. It's not so obvious how to generalize these methods to variational problems. In the following, I'll follow the approach in section 4.1 of this paper by Farrell, Croci, and Surowiec, which was my inspiration for writing this post.

Minimizing the action functional $\mathscr{J}$ over the convex set $K$ can be rephrased as an unconstrained problem to minimize the functional

$$\mathscr{J}(u) + \mathscr{I}(u),$$

where $\mathscr{I}$ is the indicator function of the set $K$:

$$\mathscr{I}(u) = \begin{cases}0 & u \in K \\ \infty & u \notin K\end{cases}.$$

This functional is still convex, but it can take the value $\infty$. The reformulation isn't especially useful by itself, but we can approximate it using the Moreau envelope. The envelope of $\mathscr{I}$ is defined as

$$\mathscr{I}_\gamma(u) = \min_v\left(\mathscr{I}(v) + \frac{1}{2\gamma^2}\|u - v\|^2\right).$$

In the limit as $\gamma \to 0$, $\mathscr{I}_\gamma(u) \to \mathscr{I}(u)$. The Moreau envelope is much easier to work with than the original functional because it's differentiable. In some cases it can be computed analytically; for example, when $\mathscr{I}$ is an indicator function,

$$\mathscr{I}_\gamma(u) = \frac{1}{2\gamma^2}\text{dist}\,(u, K)^2$$

where $\text{dist}$ is the distance to a convex set. We can do even better for our specific case, where $K$ is the set of all functions greater than or equal to $g$. For this choice of $K$, the closest point of $K$ to $u$ is $\max(u, g)$, so the distance to $K$ is

$$\text{dist}(u, K)^2 = \int_\Omega(u - g)_-^2dx,$$

where $v_- = \min(v, 0)$ is the negative part of $v$. So, our approach to solving the obstacle problem will be to find the minimizers of

$$\mathscr{J}_\gamma(u) = \int_\Omega\left(\frac{1}{2}|\nabla u|^2 - fu\right)dx + \frac{1}{2\gamma^2}\int_\Omega(u - g)_-^2dx$$

as $\gamma$ goes to 0. I've written things in such a way that $\gamma$ has units of length. Rather than take $\gamma$ to 0 we can instead stop at some fraction of the finite element mesh spacing. At that point, the errors in the finite element approximation are comparable to the distance of the approximate solution to the constraint set.

This is a lot like the penalty method for optimization problems with equality constraints. One of the main practical considerations when applying this regularization method is that the solution $u$ only satisfies the inequality constraints approximately. For the obstacle problem this deficiency isn't so severe, but for other problems we may need the solution to stay strictly feasible. In those cases, another approach like the logarithmic barrier method might be more appropriate.

Demonstration

For our problem, the domain will be the unit square and the obstacle function $g$ will be the upper half of a sphere.

import firedrake
nx, ny = 64, 64
mesh = firedrake.UnitSquareMesh(nx, ny, quadrilateral=True)
Q = firedrake.FunctionSpace(mesh, family='CG', degree=1)
from firedrake import max_value, sqrt, inner, as_vector, Constant

def make_obstacle(mesh):
    x = firedrake.SpatialCoordinate(mesh)
    y = as_vector((1/2, 1/2))
    z = 1/4
    return sqrt(max_value(z**2 - inner(x - y, x - y), 0))

g = firedrake.interpolate(make_obstacle(mesh), Q)
import matplotlib.pyplot as plt
import mpl_toolkits.mplot3d
fig = plt.figure()
axes = fig.add_subplot(projection='3d')
firedrake.trisurf(g, axes=axes);

Next we'll make a few utility procedures to create the Moreau envelope of the objective functional and to calculate a search direction from a given starting guess.

from firedrake import grad, dx, min_value

def make_objective(u, g, γ):
    J_elastic = 0.5 * inner(grad(u), grad(u)) * dx
    J_penalty = 0.5 / γ**2 * min_value(u - g, 0)**2 * dx
    return J_elastic + J_penalty
from firedrake import derivative
def update_search_direction(J, u, v):
    F = derivative(J, u)
    H = derivative(F, u)

    bc = firedrake.DirichletBC(u.function_space(), 0, 'on_boundary')
    params = {'ksp_type': 'cg', 'pc_type': 'icc'}
    firedrake.solve(H == -F, v, bc, solver_parameters=params)

Let's start from a zero initial guess and see what the first search direction will be.

u = firedrake.Function(Q)
γ = Constant(1.)
J = make_objective(u, g, γ)

v = firedrake.Function(Q)
update_search_direction(J, u, v)
fig = plt.figure()
axes = fig.add_subplot(projection='3d')
firedrake.trisurf(v, axes=axes);

To make sure that a Newton-type method will converge, we'll need a routine to perform a 1D minimization along the search direction.

from scipy.optimize import minimize_scalar
from firedrake import assemble, replace

def line_search(J, u, v):
    def J_line(step):
        t = firedrake.Constant(step)
        J_t = replace(J, {u: u + t * v})
        return assemble(J_t)

    result = minimize_scalar(J_line)
    assert result.success
    return result.x
t = line_search(J, u, v)
print(t)
1.0000833865577647

With these steps out of the way we can define a Newton search procedure and calculate a solution for our initial, rough guess of $\gamma$.

from firedrake import action
def newton_search(J, u, tolerance=1e-10, max_num_steps=30):
    v = firedrake.Function(u.function_space())
    F = derivative(J, u)
    
    for step in range(max_num_steps):
        update_search_direction(J, u, v)
        
        Δ = assemble(action(F, v))
        if abs(Δ) < tolerance * assemble(J):
            return

        t = Constant(line_search(J, u, v))
        u.assign(u + t * v)
newton_search(J, u)
fig = plt.figure()
axes = fig.add_subplot(projection='3d')
firedrake.trisurf(u, axes=axes);

The solution we obtain doesn't do a good job of staying above the obstacle because we haven't used a sufficiently small value of $\gamma$.

δ = firedrake.interpolate(max_value(g - u, 0), Q)
print(firedrake.assemble(δ * dx) / firedrake.assemble(g * dx))
0.9680039453017546

Instead, we can use the solution obtained for one value of $\gamma$ to initialize a search for the solution with $\gamma / 2$ and iterate. We've chosen this slightly indirect route rather than start from a small value of $\gamma$ directly because the problem may be very poorly conditioned. The numerical continuation approach can still give a reasonable answer even for poorly-conditioned problems.

def continuation_search(g, γ0, num_steps, contraction=0.5):
    u = g.copy(deepcopy=True)
    γ = Constant(γ0)

    for step in range(num_steps):
        J = make_objective(u, g, γ)
        newton_search(J, u)
        γ.assign(contraction * γ)

    return u

We'll choose a number of steps so that the final value of $\gamma$ is roughly proportional to the mesh spacing.

import numpy as np
num_steps = int(np.log2(nx)) + 1
print(num_steps)
7

u = continuation_search(g, 1., num_steps)

Finally, I'll plot a cross section of the solution and the constraint $g$ so that you can see where the two coincide.

fig, axes = plt.subplots()
num_points = 51
xs = np.linspace(0., 1., num_points)
ys = 0.5 * np.ones(num_points)
X = np.array((xs, ys)).T
axes.plot(xs, g.at(X), color='tab:orange')
axes.plot(xs, u.at(X), color='tab:blue');

Refinement

The code above worked well enough for a single grid, but one of the hard parts about optimization with PDE constraints is making sure that our algorithms do sane things under mesh refinement. Many common algorithms can have different convergence rates depending on the mesh or the degree of the finite element basis. The reasons for this are a little involved, but if you want to read more, I recommend this book by Málek and Strakos.

To really make sure we're doing things right, we should run this experiment at several levels of mesh refinement. We can do this easily using the MeshHierarchy function in Firedrake.

coarse_mesh = firedrake.UnitSquareMesh(nx, ny, quadrilateral=True)
num_levels = 3
mesh_hierarchy = firedrake.MeshHierarchy(coarse_mesh, num_levels)

for level, mesh in enumerate(mesh_hierarchy):
    Q = firedrake.FunctionSpace(mesh, family='CG', degree=1)
    g = firedrake.interpolate(make_obstacle(mesh), Q)
    num_continuation_steps = int(np.log(nx)) + level + 1
    u = continuation_search(g, 1, num_continuation_steps)
    print(assemble(max_value(g - u, 0) * dx))
0.0034424020045330894
0.0009409740393031862
0.00024126075715710023
6.073519021737773e-05

The quantity we printed above -- the integral of how far $u$ dips below the obstacle -- decreases roughly by a factor of four with each mesh refinement. This rate of decrease makes sense: the area of each cell decreases by the same factor on each refinement. Doing a more thorough convergence study would require more computational power, but for now this is a promising sign that our algorithm works right.
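
As a quick check of that factor-of-four claim, we can compute the successive ratios of the values printed above (copied here by hand):

violations = np.array([3.442e-3, 9.410e-4, 2.413e-4, 6.074e-5])
print(violations[:-1] / violations[1:])   # roughly [3.66, 3.90, 3.97]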

Discussion

We were able to get a convergent approximation scheme for the obstacle problem by expressing the constraint as an indicator functional and then using Moreau-Yosida regularization. The idea of regularizing non-smooth optimization problems is a more general trick; we can use it for things like $L^1$ or total variation penalties as well. The Moreau envelope is another angle to look at proximal algorithms from too.

For the obstacle problem, regularization made it possible to describe every part of the algorithm using higher-level concepts (fields, functionals) without having to dive down to lower levels of abstraction (matrices, vectors). In order to implement other approaches, like the active set method, we would have no choice but to pull out the PETSc matrices and vectors that lie beneath, which is a more demanding prospect.

Variational calculus

In this post I'll look at a classic example of a convex variational problem: computing minimal surfaces. The minimal surface problem has a simple physical interpretation in terms of soap films. Suppose you have a wire loop and you stretch a film of soap over it; what shape does the film take? The available energy that the film has to do mechanical work is proportional to the product of the surface tension and the area of the film. When the film is in equilibrium, it will minimize the energy, so it will find the surface of least area that stretches over the hoop. This shape is called a minimal surface.

Here we'll look at a geometrically simpler case where the surface can be described as the graph of a function defined on some footprint domain $\Omega$ that lives in the plane. We'll describe the position of the hoop as a function $g$ that maps the boundary $\partial\Omega$ to the reals, and the surface as a function $u$ on $\Omega$. The surface area of the graph of $u$ is the quantity

$$J(u) = \int_\Omega\sqrt{1 + |\nabla u|^2}\,dx.$$

So, our goal is to minimize the objective functional $J$ among all functions $u$ such that $u|_{\partial\Omega} = g$. This is a classic example in variational calculus, which I'll assume you're familiar with. If you haven't encountered this topic before, I learned about it from Weinstock's book.

The weak form of the Euler-Lagrange equation for $J$ is

$$\int_\Omega\frac{\nabla u\cdot\nabla v}{\sqrt{1 + |\nabla u|^2}}dx = 0$$

for all $v$ that vanish on the boundary. This PDE is just a specific way of stating the general condition that, for $u$ to be an extremum of $J$, its directional derivative along all perturbations $v$ must be 0:

$$\langle dJ(u), v\rangle = 0.$$

We can go a little bit further and calculate the second derivative of $J$ too:

$$\langle d^2J(u)\cdot v, w\rangle = \int_\Omega\frac{I - \frac{\nabla u\cdot \nabla u^*}{1 + |\nabla u|^2}}{\sqrt{1 + |\nabla u|^2}}\nabla v\cdot \nabla w\, dx.$$

Deriving this equation takes a bit of leg work, but the important part is that it looks like a symmetric, positive-definite elliptic operator, only now the conductivity tensor depends on the gradient of $u$. Since the second derivative of $J$ is positive-definite, the minimization problem is convex and thus has a unique solution.

There are many approaches you could take to solving the minimal surface equation. I'll examine some here using the finite element modeling package Firedrake. If you're unfamiliar with Firedrake or FEniCS, their main selling point is that, rather than write code to fill matrices and vectors yourself, these packages use an embedded domain-specific language to describe the weak form of the PDE. The library then generates efficient C code on the spot to fill these matrices and vectors. Having done all this by hand for several years I can tell you this is a big improvement!

import firedrake

To keep things simple, we'll use the unit square as our spatial domain, and we'll use piecewise quadratic finite elements.

mesh = firedrake.UnitSquareMesh(100, 100, quadrilateral=True)
Q = firedrake.FunctionSpace(mesh, family='CG', degree=2)

I'll use a test case from some course notes from a class that Douglas Arnold teaches on finite element methods. The boundary curve is

$$g = ax\cdot\sin\left(\frac{5}{2}\pi y\right).$$

In the notes, Arnold uses $a = 1/5$. When the numerical range of $g$ is small relative to the diameter of the domain, the minimal surface equation linearizes to the Laplace equation. I want to instead look at the more nonlinear case of $a > 1$, which will stress the nonlinear solver a good deal more.
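
To see why the small-amplitude case linearizes, note that when $|\nabla u| \ll 1$,

$$\sqrt{1 + |\nabla u|^2} \approx 1 + \frac{1}{2}|\nabla u|^2,$$

so, up to an additive constant, the area functional $J$ reduces to the Dirichlet energy, whose Euler-Lagrange equation is the Laplace equation.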

x, y = firedrake.SpatialCoordinate(mesh)

from numpy import pi as π
from firedrake import sin
a = firedrake.Constant(3/2)
g = a * x * sin(5 * π * y / 2)

A picture is worth a thousand words of course.

import matplotlib.pyplot as plt
import mpl_toolkits.mplot3d
fig = plt.figure()
axes = fig.add_subplot(projection='3d')
firedrake.trisurf(firedrake.interpolate(g, Q), axes=axes);

Here we'll create the proposed solution $u$, define the objective functional, and try to find the minimizer naively using Firedrake's built-in solver. With the value for $a$ that I chose, the solver won't converge using its default settings.

u = firedrake.interpolate(g, Q)
bc = firedrake.DirichletBC(Q, g, 'on_boundary')

from firedrake import sqrt, inner, grad, dx
J = sqrt(1 + inner(grad(u), grad(u))) * dx
F = firedrake.derivative(J, u)

try:
    firedrake.solve(F == 0, u, bc)
except firedrake.ConvergenceError:
    print('Woops, nonlinear solver failed to converge!')
Woops, nonlinear solver failed to converge!

We could tweak these settings to make the solver converge, but instead let's try and dive deeper into what does and doesn't make for a good nonlinear solver.

Picard's method

This method is predicated on the idea that many nonlinear PDEs look like a linear problem with coefficients that depend on the solution. If you freeze those coefficients at the current guess for the solution, you get something that's fairly easy to solve and hopefully convergent. Suppose we've got a guess $u_n$ for the solution of the minimal surface equation. The Picard method would give us a next guess $u_{n + 1}$ that solves the linear PDE

$$\int_\Omega\frac{\nabla u_{n + 1}\cdot\nabla v}{\sqrt{1 + |\nabla u_n|^2}}dx = 0$$

for all $v$ that vanish on the boundary. This method is easy to implement if you know the functional form of the problem you're solving. Let's see how fast this decreases the area.

u.interpolate(g)
u_n = u.copy(deepcopy=True)
v = firedrake.TestFunction(Q)

G = inner(grad(u), grad(v)) / sqrt(1 + inner(grad(u_n), grad(u_n))) * dx

import numpy as np
num_iterations = 24
Js = np.zeros(num_iterations)
Js[0] = firedrake.assemble(J)
for step in range(1, num_iterations):
    firedrake.solve(G == 0, u, bc)
    u_n.assign(u)
    Js[step] = firedrake.assemble(J)

The method converges in the eyeball norm in about 6 iterations.

fig, axes = plt.subplots()
axes.scatter(list(range(num_iterations)), Js, label='surface area')
axes.set_xlabel('iteration')

axes = axes.twinx()
axes.scatter(list(range(1, num_iterations)), -np.diff(Js) / Js[1:],
             color='tab:orange', label='relative decrease')
axes.set_ylim(1e-6, 1)
axes.set_yscale('log')

fig.legend(loc='upper center');

This looks pretty good -- the iterates converge very rapidly to the minimizer. There are still reasons to look for something better though. Picard's method relies on the problem having special structure, which is true of the minimal surface equation but harder to find for other problems.

Newton's method (take 1)

One of the best known methods is due to Newton. The idea behind Newton's method is to use the Taylor expansion of the objective at the current guess $u_n$ up to second order to define a quadratic approximation to the objective:

$$J(u_n + v) = J(u_n) + \langle F, v\rangle + \frac{1}{2}\langle Hv, v\rangle + \ldots$$

where $F = dJ(u_n)$, $H = d^2J(u_n)$ are the first and second derivatives of the objective. We can then define a new iterate as the minimizer of this quadratic problem:

$$u_{n + 1} = u_n + \text{argmin}_v\, \langle F, v\rangle + \frac{1}{2}\langle Hv, v\rangle.$$

The big advantage of Newton's method is that, for a starting guess sufficiently close to the solution, the iterates converge quadratically to the minimizer. Picard's method converges at best linearly.
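
To pin down the terminology: linear convergence means an estimate of the form

$$\|u_{n + 1} - u\| \le \rho\|u_n - u\|$$

for some $\rho < 1$, so the error shrinks by a roughly constant factor at each step, whereas quadratic convergence means

$$\|u_{n + 1} - u\| \le C\|u_n - u\|^2,$$

so the number of correct digits roughly doubles with each iteration once the error is small enough.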

Another advantage of Newton's method is that there are many software packages for automatically calculating first and second derivatives of nonlinear functionals, so it's easy to apply to a broad class of problems. It isn't quite so clear how to select the right linear operator for Picard's method.

u.interpolate(g)

F = firedrake.derivative(J, u)
H = firedrake.derivative(F, u)

v = firedrake.Function(Q)

num_iterations = 24
Js = np.zeros(num_iterations + 1)
Js[0] = firedrake.assemble(J)

bc = firedrake.DirichletBC(Q, 0, 'on_boundary')
params = {'ksp_type': 'cg', 'pc_type': 'icc'}
try:
    for step in range(1, num_iterations):
        firedrake.solve(H == -F, v, bc, solver_parameters=params)
        u += v
        Js[step] = firedrake.assemble(J)
except firedrake.ConvergenceError:
    print('Newton solver failed after {} steps!'.format(step))
Newton solver failed after 3 steps!

Doesn't bode very well does it? Let's see what the objective functional did before exploding:

print(Js[:step])
[   4.27174587   13.03299587 2847.93224483]

Not a lot to save from the wreckage here -- the objective functional was increasing, which is just the opposite of what we want. What happened? Newton's method will converge quadratically if initialized close enough to the true solution. We don't have any idea a priori if we're close enough, and if we aren't then there's no guarantee that the iterates will converge at all. The example from Doug Arnold's course notes used a much smaller amplitude $a$ in the boundary data, so the initial guess is already within the convergence basin.

Newton's method (take 2)

But there's always hope! Suppose $v$ is a function such that the directional derivative of $J$ at $u$ along $v$ is negative:

$$\langle dJ(u), v\rangle < 0.$$

Then there must be some sufficiently small real number $t$ such that

$$J(u + t\cdot v) < J(u).$$

If we do have a descent direction in hand, then the problem of finding a better guess $u_{n + 1}$ starting from $u_n$ is reduced to the one-dimensional problem to minimize $J(u_n + t\cdot v)$ with respect to the real variable $t$.

If $H$ is any symmetric, positive-definite linear operator, then

$$v = -H^{-1}dJ(u)$$

is a descent direction for $J$, since $\langle dJ(u), v\rangle = -\langle dJ(u), H^{-1}dJ(u)\rangle < 0$ whenever $dJ(u) \neq 0$. While the pure Newton method can diverge for some starting guesses, it does offer up a really good way to come up with descent directions for convex problems, because the second derivative of the objective is positive-definite. This suggests the following algorithm:

$$\begin{align} v_n & = -d^2J(u_n)^{-1}dJ(u_n) \\ t_n & = \text{argmin}_t\, J(u_n + t\cdot v_n) \\ u_{n + 1} & = u_n + t_n\cdot v_n. \end{align}$$

This is called the damped Newton method or the Newton line search method. We can use standard packages like scipy to do the 1D minimization, as I'll show below.

u.interpolate(g)

F = firedrake.derivative(J, u)
H = firedrake.derivative(F, u)

v = firedrake.Function(Q)
bc = firedrake.DirichletBC(Q, 0, 'on_boundary')

import scipy.optimize
t = firedrake.Constant(1)
def J_t(s):
    t.assign(s)
    return firedrake.assemble(firedrake.replace(J, {u: u + t * v}))

num_iterations = 24
Js = np.zeros(num_iterations)
ts = np.zeros(num_iterations)
Δs = np.zeros(num_iterations)
Js[0] = firedrake.assemble(J)

from firedrake import action
for step in range(1, num_iterations):
    firedrake.solve(H == -F, v, bc, solver_parameters=params)
    Δs[step] = firedrake.assemble(-action(F, v))
    
    line_search_result = scipy.optimize.minimize_scalar(J_t)
    if not line_search_result.success:
        raise firedrake.ConvergenceError('Line search failed at step {}!'
                                         .format(step))
    t_min = firedrake.Constant(line_search_result.x)
    u.assign(u + t_min * v)

    ts[step] = t_min
    Js[step] = firedrake.assemble(J)

The same convergence plot as above for Newton's method paints a very different picture.

fig, axes = plt.subplots()
axes.scatter(list(range(num_iterations)), Js, label='surface area')
axes.set_xlabel('iteration')

axes = axes.twinx()
axes.scatter(list(range(1, num_iterations)), -np.diff(Js) / Js[1:],
             color='tab:orange', label='relative decrease')
axes.set_ylim(1e-16, 1)
axes.set_yscale('log')

fig.legend(loc='upper center');

The algorithm starts out converging linearly, just like the Picard method does. The second phase starts around iteration 15. By this point, the algorithm has entered the quadratic convergence basin and the errors plummet.

fig, axes = plt.subplots()
axes.scatter(list(range(1, num_iterations)), ts[1:])
axes.set_xlabel('iteration')
axes.set_ylabel('step size');

Another revealing way to look at this is to examine the Newton decrement. The Newton decrement $\Delta_k$ at step $k$ is defined as the directional derivative of the objective functional along the search direction:

$$\Delta_k = \langle dJ(u_k), v_k\rangle.$$

The difference between the value of the objective at the next guess and at the current one is approximately half of the Newton decrement:

$$\begin{align} J(u_{k + 1}) - J(u_k) & = \langle dJ(u_k), v_k\rangle + \frac{1}{2}\langle d^2J(u_k)v_k, v_k\rangle + \ldots \\ & = \langle dJ(u_k), v_k\rangle - \frac{1}{2}\langle dJ(u_k), v_k\rangle + \ldots\\ & = \frac{1}{2}\langle dJ(u_k), v_k\rangle + \ldots \end{align}$$

where we have used the fact that $v_k = -d^2J(u_k)^{-1}dJ(u_k)$ in going from the first line to the second. An informal way of describing the Newton decrement is that it gives an upper bound on how much we can expect to reduce the objective functional with one more iteration of Newton's method.

The plot below shows the ratio of the differences in the objective functional to the value of the Newton decrement.

fig, axes = plt.subplots()
axes.scatter(list(range(1, num_iterations)), -2 * np.diff(Js) / Δs[1:])
axes.set_xlabel('iteration')
axes.set_ylabel('Actual decrease / expected decrease');

During the first few iterations while the method is iterating towards the quadratic convergence basin, the Newton decrement is less than half of the actual decrease. Once the method has found the convergence basin the ratio hits almost exactly 1/2 at a few points. Finally, once the method has effectively converged, the ratio drops to 0.

In the code above, I picked a fixed value of the iteration count. For real applications it's better to have a dynamic stopping criterion based on the current and past state. The Newton decrement is a useful quantity in this respect because it depends only on the current state. Here we'll plot the ratio of the Newton decrement to the value of the objective itself.

fig, axes = plt.subplots()
axes.scatter(list(range(1, num_iterations)), Δs[1:] / Js[:-1])
axes.set_ylim(1e-7, 1)
axes.set_yscale('log')
axes.set_xlabel('iteration')
axes.set_ylabel('Newton decrement / objective');

A sensible stopping criterion is that $|\Delta_k| < \epsilon \cdot J(u_k)$ where the tolerance $\epsilon$ is about $10^{-6}$. It's handy to think about this in terms of the informal description of the Newton decrement -- stop iterating when the expected gain is less than some small fraction of the current cost.
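
Here's a sketch of how the loop above might use this criterion, reusing the objects we've already defined; the tolerance is the rough figure just mentioned, not a tuned value.

tolerance = 1e-6
max_num_steps = 24
for step in range(1, max_num_steps):
    firedrake.solve(H == -F, v, bc, solver_parameters=params)
    Δ = firedrake.assemble(-action(F, v))
    # stop when the expected gain is a small fraction of the current cost
    if Δ < tolerance * firedrake.assemble(J):
        break

    t_min = firedrake.Constant(scipy.optimize.minimize_scalar(J_t).x)
    u.assign(u + t_min * v)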

For the minimal surface equation, the objective functional is strictly positive and convex. Other problems might be convex but the objective is a sum of parts that can be either positive or negative. For example, for viscous flow problems, the objective can be divided into internal viscous dissipation of energy (strictly positive) and the power from driving stress (positive or negative). For more general problems it would then be incumbent on you as the modeler to know in advance which parts of the objective are strictly positive and use these to set the scale in the convergence criterion.

fig = plt.figure()
axes = fig.add_subplot(projection='3d')
firedrake.trisurf(u, axes=axes);

Kac's conjecture

In a previous post, we gave a numerical demonstration of the Weyl asymptotic law, which relates the eigenvalues of the Laplace operator on a domain to its area and perimeter. If this domain were a drumhead, the eigenvalues are essentially its resonant frequencies. Weyl's law tells you that you can hear the area and perimeter of a drumhead if only you listen carefully. In 1966, Mark Kac posed the tantalizing question of whether you could hear not just the area and perimeter, but the entire shape of a drum. Knowing all the eigenvalues of a domain, can you reconstruct the whole shape? In what must have been a pretty brutal wet blanket moment, John Milnor showed that there are two distinct 16-dimensional manifolds with the same spectrum, so the answer to Kac's question is no in general. Now I've never played a 16-dimensional drum, but in 1992 a counterexample was found in the plane. Here we'll study this problem numerically.

Geometry

The two shapes we'll use were devised by Gordon, Webb, and Wolpert. I got the coordinates for the polygon vertices by looking at this figure from the wikipedia page on hearing the shape of a drum.

import numpy as np
drum1 = np.array([[1, 3],
                  [0, 2],
                  [1, 1],
                  [2, 1],
                  [2, 0],
                  [3, 1],
                  [3, 2],
                  [1, 2]])

drum2 = np.array([[0, 3],
                  [0, 2],
                  [2, 0],
                  [2, 1],
                  [3, 1],
                  [2, 2],
                  [1, 2],
                  [1, 3]])

To solve PDEs on these shapes, we'll first generate outlines using the package pygmsh and save them to .geo files.

import pygmsh
def make_geo(vertices, name, δx = 0.25):
    geometry = pygmsh.built_in.Geometry()
    N = len(vertices)

    points = [geometry.add_point([x[0], x[1], 0], lcar=δx) for x in vertices]
    lines = [geometry.add_line(points[n], points[(n + 1) % N]) for n in range(N)]
    line_loop = geometry.add_line_loop(lines)
    plane_surface = geometry.add_plane_surface(line_loop)

    for line in lines:
        geometry.add_physical(line)
    geometry.add_physical(plane_surface)

    with open(name, 'w') as geo_file:
        geo_file.write(geometry.get_code())
make_geo(drum1, 'drum1.geo')
make_geo(drum2, 'drum2.geo')

Then we'll use the mesh generator gmsh to triangulate the interiors of each polygon and save the resulting mesh to a .msh file.

!gmsh -2 -format msh2 -v 0 drum1.geo
!gmsh -2 -format msh2 -v 0 drum2.geo

Finally we'll read in the meshes using Firedrake and refine them.

import firedrake
coarse_mesh1 = firedrake.Mesh('drum1.msh')
mesh_hierarchy1 = firedrake.MeshHierarchy(coarse_mesh1, 4)
mesh1 = mesh_hierarchy1[-1]

coarse_mesh2 = firedrake.Mesh('drum2.msh')
mesh_hierarchy2 = firedrake.MeshHierarchy(coarse_mesh2, 4)
mesh2 = mesh_hierarchy2[-1]
%matplotlib inline
import matplotlib.pyplot as plt
fig, axes = plt.subplots(ncols=2, sharex=True, sharey=True)
for ax in axes:
    ax.set_aspect('equal')
    
firedrake.triplot(coarse_mesh1, axes=axes[0])
firedrake.triplot(coarse_mesh2, axes=axes[1]);

Solving the eigenproblem

To solve the eigenproblem, we'll use routines from the package SLEPc, just like we've done before for Weyl's law and Yau's conjecture. Since Firedrake is built on top of PETSc, any assembled linear operator has a PETSc matrix object living somewhere in the murky depths. Setting up the eigenproblem is just a matter of pulling out the underlying matrix objects and passing them to SLEPc.

from firedrake import inner, grad, dx
from petsc4py import PETSc
from slepc4py import SLEPc

def get_eigenvalues(mesh, num_eigenvalues=200):
    Q = firedrake.FunctionSpace(mesh, 'CG', 1)
    ϕ = firedrake.TestFunction(Q)
    ψ = firedrake.TrialFunction(Q)
    a = inner(grad(ϕ), grad(ψ)) * dx
    m = ϕ * ψ * dx

    bc = firedrake.DirichletBC(Q, firedrake.Constant(0), 'on_boundary')
    A = firedrake.assemble(a, bcs=bc).M.handle
    M = firedrake.assemble(m).M.handle

    opts = PETSc.Options()
    opts.setValue('eps_gen_hermitian', None)
    opts.setValue('eps_target_real', None)
    opts.setValue('eps_smallest_real', None)
    opts.setValue('st_type', 'sinvert')
    opts.setValue('st_ksp_type', 'cg')
    opts.setValue('st_pc_type', 'jacobi')
    opts.setValue('eps_tol', 1e-8)
    
    eigensolver = SLEPc.EPS().create(comm=firedrake.COMM_WORLD)
    eigensolver.setDimensions(num_eigenvalues)
    eigensolver.setOperators(A, M)
    eigensolver.setFromOptions()
    
    eigensolver.solve()
    
    num_converged = eigensolver.getConverged()
    λs = np.array([eigensolver.getEigenvalue(k) for k in range(num_converged)]).real
    return λs

As we expect, the eigenvalue distributions of the two domains are practically the same in the eyeball norm.

Λ1 = get_eigenvalues(mesh1)
Λ2 = get_eigenvalues(mesh2)

fig, axes = plt.subplots()
axes.plot(Λ1, color='tab:green')
axes.plot(Λ2, color='tab:blue');

To be more precise, the relative difference in the first 200 eigenvalues is below 0.1%.

fig, axes = plt.subplots()
axes.plot(list(range(200)), np.abs((Λ1[:200] - Λ2[:200]) / Λ2[:200]));

Discussion

Steve Zelditch has a wonderful review of this topic on arXiv. As we've shown here experimentally, there are distinct isospectral planar domains, but the domains we used aren't convex. Zelditch proved that the spectrum uniquely characterizes the domain if it's convex and has an analytic boundary.

This type of problem has ramifications beyond just diverting mathematicians. For example, the same underlying mathematics shows up in eigenvalue problems for the Schroedinger equation. Knowing the interatomic potential, it's possible to calculate the quantum-mechanical energy levels. But if it's possible to infer things about potentials from energy levels -- essentially the Kac conjecture but for quantum mechanics -- then this could be useful in experimental physics as well.

Yau's Conjecture

In the last post we looked at the distribution of eigenvalues of the Laplace operator; this time we'll look at eigenfunctions. As a reminder, a function $\phi$ and a real number $\lambda$ are an eigenfunction and eigenvalue of the Laplace operator on a domain $\Omega$ if

$$-\Delta\phi = \lambda^2\phi$$

together with the Dirichlet boundary condition $\phi|_{\partial\Omega} = 0$. I'm using a slightly different convention that emphasizes the positivity of the Laplace operator, and which gives the eigenvalues $\lambda$ units of an inverse length.

The eigenfunctions grow more and more oscillatory for larger values of $\lambda$, and there are a number of results that quantify this idea. A classical result is the Courant nodal domain theorem. The nodal domain theorem states that the zero set of the $n$-th eigenfunction of the Laplace operator divides the domain into at most $n$ regions, although the number of nodal domains can be less than $n$. Highly symmetric domains like the sphere or the torus provide interesting cases because, for these kinds of domains, you tend to get very degenerate eigenspaces. The Courant nodal domain theorem only gives a kind of upper bound on how oscillatory the eigenfunctions get, not a lower bound.
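
As a concrete example, on the unit interval the $n$-th Dirichlet eigenfunction is $\sin(n\pi x)$, which has exactly $n$ nodal domains, so Courant's bound is attained in one dimension; in higher dimensions and on more symmetric domains the count can fall well short of $n$.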

To refine the concept a bit further, we'll do a little experiment to verify the Yau conjecture. The Yau conjecture states that, as $\lambda \to \infty$, the area of the nodal set of $\phi_\lambda$ is bounded above and below by multiples of $\lambda$:

$$c\lambda \le \text{area}(\{\phi_\lambda = 0\}) \le C\lambda.$$

Donnelly and Fefferman proved the Yau conjecture for analytic manifolds (with and without boundary) in 1990. For smooth manifolds, Logunov announced a proof of the lower bound in May of 2019, but as far as I know the upper bound hasn't been resolved yet.

import numpy as np
from numpy import pi as π
def add_ellipse(geometry