Neural Ordinary Differential Equation Adjoints

Chris Rackauckas
November 20th, 2020

In this homework, we will write an implementation of neural ordinary differential equations from scratch. You may use the DifferentialEquations.jl ODE solver, but not the adjoint sensitivities functionality. Optionally, a second problem is to add GPU support to your implementation.

Due December 9th, 2020 at midnight.

Please email the results to 18337.mit.psets@gmail.com.

Problem 1: Neural ODE from Scratch

In this problem we will work through the development of a neural ODE.

Part 1: Gradients as vjps

Use the definition of the pullback as a vector-Jacobian product (vjp) to show that $B_f^x(1) = \left( \nabla f(x) \right)^{T}$ for a function $f: \mathbb{R}^n \rightarrow \mathbb{R}$.

(Hint: if you put 1 into the pullback, what kind of function is it? What does the Jacobian look like?)
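As a reminder of the definition being referenced (stated under the row-vector vjp convention of the course notes; if your notes use the transposed convention, adjust accordingly), the pullback of $f$ at $x$ applied to a vector $v$ is

\[ B_f^x(v) = v^\ast \frac{\partial f}{\partial x}, \]

and for $f: \mathbb{R}^n \rightarrow \mathbb{R}$ the Jacobian $\frac{\partial f}{\partial x}$ is a $1 \times n$ row vector.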

Part 2: Backpropagation of a neural network

Implement a simple $NN: \mathbb{R}^2 \rightarrow \mathbb{R}^2$ neural network

\[ NN(u;W_i,b_i) = W_2 \tanh.(W_1 u + b_1) + b_2 \]

where $W_1$ is $50 \times 2$, $b_1$ is length 50, $W_2$ is $2 \times 50$, and $b_2$ is length 2. Implement the pullback of the neural network, $B_{NN}^{u,W_i,b_i}(y)$, to calculate the derivative of the neural network with respect to each of these inputs. Check for correctness against ForwardDiff.jl (for example, compare your vjp with $y^{T} J$ where $J$ is the Jacobian computed by ForwardDiff.jl).
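For reference, a minimal sketch of the forward pass and of a ForwardDiff.jl correctness check might look like the following; the names NN and vjp_u_ref and the random initialization are illustrative assumptions, and the pullback itself is left for you to implement.

```julia
using ForwardDiff

# Parameters with the sizes given above (random initialization for testing).
W1, b1 = randn(50, 2), randn(50)
W2, b2 = randn(2, 50), randn(2)

# Forward pass: NN(u) = W2 * tanh.(W1*u + b1) + b2
NN(u, W1, b1, W2, b2) = W2 * tanh.(W1 * u .+ b1) .+ b2

u = randn(2)
y = randn(2)   # the vector being pulled back

# Reference vjp with respect to u via ForwardDiff's Jacobian:
Ju = ForwardDiff.jacobian(u -> NN(u, W1, b1, W2, b2), u)
vjp_u_ref = (y' * Ju)'   # compare (up to transposition) with the u-component of your pullback
```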

Part 3: Implementing an ODE adjoint

The adjoint of an ODE can be described as the set of vector equations:

\[ u' = f(u,p,t) \]

forward, and then

\[ \begin{align} \lambda' &= -\lambda^\ast \frac{\partial f}{\partial u} \\ \mu' &= -\lambda^\ast \frac{\partial f}{\partial p} \end{align} \]

solved in reverse time from $T$ to $0$ for some cost function $C(p)$. For this problem, we will use the L2 loss function.

Note that $\mu(T) = 0$ and $\lambda(T) = \frac{\partial C}{\partial u(T)}$. This form assumes that the only data point is at time $T$. If that is not the case, the reverse solve needs to add the jump $\frac{\partial C}{\partial u(t_i)}$ to $\lambda$ at each data point $u(t_i)$; one way to implement these jumps is with a callback in the reverse solve, as sketched below.
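A minimal sketch of such a callback, where t_data, dCdu, and the assumption that the first two components of the adjoint state are $\lambda$ are all illustrative placeholders:

```julia
using DiffEqCallbacks

# Hypothetical names: t_data is the vector of data times and dCdu[i] is ∂C/∂u(t_i).
function jump_affect!(integrator)
    i = findfirst(isequal(integrator.t), t_data)
    integrator.u[1:2] .+= dCdu[i]   # add the jump to the λ components of the adjoint state
end
cb = PresetTimeCallback(t_data, jump_affect!)   # pass as callback = cb in the reverse solve
```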

Using this formulation of the adjoint, it holds that $\mu(0) = \frac{\partial C}{\partial p}$, and thus solving these ODEs in reverse time yields the gradient as part of the state at time zero.

Notice that $B_f^u(\lambda) = \lambda^\ast \frac{\partial f}{\partial u}$ and $B_f^p(\lambda) = \lambda^\ast \frac{\partial f}{\partial p}$, so the right-hand sides of both adjoint equations are vjps. Implement an adjoint calculation for a neural ordinary differential equation where

\[ u' = NN(u) \]

from above. Solve the ODE forwards using OrdinaryDiffEq.jl's Tsit5() integrator, then use the interpolation from the forward pass for the $u$ values in the reverse pass and solve the adjoint equations.

(Note: you will want to double-check this gradient by using something like ForwardDiff.jl! Start by measuring only the data point at the end, then try multiple data points.)
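For reference, a skeleton of the forward/reverse structure under the end-point-only loss might look as follows; pullback_NN, dCdu_T, nparams, u0, p, and T are placeholder names, and multiple data points would additionally require the jump callback sketched above.

```julia
using OrdinaryDiffEq

# Forward pass (keep the dense output so sol(t) can be interpolated).
f(u, p, t) = NN(u, p)                       # hypothetical: NN unpacking W1, b1, W2, b2 from p
prob = ODEProblem(f, u0, (0.0, T), p)
sol  = solve(prob, Tsit5())

# Reverse pass: state z = [λ; μ], integrated from T back to 0.
function adjoint!(dz, z, p, t)
    λ = z[1:2]
    u = sol(t)                              # interpolated forward solution
    vjp_u, vjp_p = pullback_NN(u, p, λ)     # hypothetical: the pullback from Part 2
    dz[1:2]   .= -vjp_u                     # λ' = -λ* ∂f/∂u
    dz[3:end] .= -vjp_p                     # μ' = -λ* ∂f/∂p
end

z_T      = [dCdu_T; zeros(nparams)]         # λ(T) = ∂C/∂u(T), μ(T) = 0
adj_prob = ODEProblem(adjoint!, z_T, (T, 0.0), p)
adj_sol  = solve(adj_prob, Tsit5())
dCdp     = adj_sol[end][3:end]              # μ(0) ≈ ∂C/∂p
```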

Part 4: Training the neural ODE

Generate data from the ODE $u' = Au$ with $A = \begin{bmatrix} -0.1 & 2.0 \\ -2.0 & -0.1 \end{bmatrix}$ at t = 0.0:0.1:1.0 (use saveat) with $u(0) = [2,0]$. Define the cost function $C(\theta)$ to be the Euclidean distance between the neural ODE's solution and the data. Optimize this cost function by gradient descent, where the gradient is the output of your adjoint method.

(Note: calculate the cost and the gradient at the same time by using the forward pass to compute the cost and then reusing its interpolation in the adjoint pass. In that case you should not use saveat in the forward pass, because then the solution's interpolation is only linear between the saved points; instead, keep the dense solution and evaluate it at the data time points afterwards.)
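As a sketch of the setup (the data generation follows the problem statement; cost_and_grad, initial_params, the learning rate, and the iteration count are hypothetical placeholders for the pieces built in Parts 2 and 3):

```julia
using OrdinaryDiffEq

# Data generation.
A    = [-0.1 2.0; -2.0 -0.1]
u0   = [2.0, 0.0]
data = Array(solve(ODEProblem((u, p, t) -> A * u, u0, (0.0, 1.0)), Tsit5(), saveat = 0.1))

# Gradient descent with the adjoint gradient. `cost_and_grad` should run a dense
# forward solve, evaluate it at 0.0:0.1:1.0 for the cost, and reuse that solution
# in the reverse adjoint solve to return ∂C/∂p.
p = initial_params()                   # hypothetical initializer for W1, b1, W2, b2
η = 0.01                               # learning rate
for iter in 1:500
    C, dCdp = cost_and_grad(p, data)   # hypothetical, built from Parts 2–3
    p .-= η .* dCdp                    # gradient descent step
end
```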

(Optional) Problem 2: Array-Based GPU Computing

If you have access to a GPU, you may wish to try the following.

Part 1: GPU Neural Network

Change your neural network to be GPU-accelerated by using CuArrays.jl for the underlying array types.
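A minimal sketch of what this change might look like, assuming the NN and parameters from Part 2 of Problem 1 (note that in more recent package versions the same functionality is provided by CUDA.jl):

```julia
using CuArrays

# Move the parameters and the input to the GPU; the same NN code then dispatches
# the matrix multiplies and the broadcasted tanh to GPU kernels.
W1g, b1g = cu(W1), cu(b1)
W2g, b2g = cu(W2), cu(b2)
ug = cu(u)
NN(ug, W1g, b1g, W2g, b2g)
```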

Part 2: GPU Neural ODE

Change the initial condition of the ODE solves to a CuArray to make your neural ODE GPU-accelerated.
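A sketch under the assumption that the right-hand side f and the GPU parameters p_gpu from the previous part are GPU-compatible (in particular, no scalar indexing):

```julia
# Only the array types change relative to the CPU version; Float32 is the
# natural GPU precision, so the literals are written as Float32 here.
u0_gpu   = cu([2.0f0, 0.0f0])
prob_gpu = ODEProblem(f, u0_gpu, (0.0f0, 1.0f0), p_gpu)
sol_gpu  = solve(prob_gpu, Tsit5())
```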