
The Hessian Matrix

Previously, we discussed the quadratic approximation of a function. We saw that it is very useful, but that it can become complicated, with many terms to keep track of. In this section, we will introduce the Hessian matrix and show how it can be used to create a vectorized version of the quadratic approximation.


Introduction

Recall that the gradient of a function $f(x_1, \dots, x_n)$ is essentially just a vector of its partial derivatives:

$$\nabla f = \begin{bmatrix} \dfrac{\partial f}{\partial x_1} \\ \vdots \\ \dfrac{\partial f}{\partial x_n} \end{bmatrix}$$

The Hessian matrix is just that, but with the second partial derivatives. For instance, for a two-variable function $f(x, y)$, the Hessian matrix would be a $2 \times 2$ matrix:

$$\mathbf{H}_f = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x^2} & \dfrac{\partial^2 f}{\partial x\, \partial y} \\[1ex] \dfrac{\partial^2 f}{\partial y\, \partial x} & \dfrac{\partial^2 f}{\partial y^2} \end{bmatrix}$$

The Hessian matrix of a three-variable function $f(x, y, z)$ would be a $3 \times 3$ matrix:

$$\mathbf{H}_f = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x^2} & \dfrac{\partial^2 f}{\partial x\, \partial y} & \dfrac{\partial^2 f}{\partial x\, \partial z} \\[1ex] \dfrac{\partial^2 f}{\partial y\, \partial x} & \dfrac{\partial^2 f}{\partial y^2} & \dfrac{\partial^2 f}{\partial y\, \partial z} \\[1ex] \dfrac{\partial^2 f}{\partial z\, \partial x} & \dfrac{\partial^2 f}{\partial z\, \partial y} & \dfrac{\partial^2 f}{\partial z^2} \end{bmatrix}$$

Then, for any $n$-variable function $f(x_1, \dots, x_n)$, the Hessian matrix is an $n \times n$ matrix:

$$\mathbf{H}_f = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x_1^2} & \cdots & \dfrac{\partial^2 f}{\partial x_1\, \partial x_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial^2 f}{\partial x_n\, \partial x_1} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2} \end{bmatrix}$$

where the $(i, j)$-th entry is:

$$(\mathbf{H}_f)_{ij} = \dfrac{\partial^2 f}{\partial x_i\, \partial x_j}$$

Given that each element in the Hessian matrix is a function of the inputs, the Hessian matrix is a function of the inputs as well. In other words, the Hessian is sort of a "matrix-valued function".
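
To make the idea of a matrix-valued function concrete, here is a small sketch using SymPy. The function below is just an illustrative choice (an assumption, not one from this section): the Hessian is built symbolically, then evaluated at two different points to produce two different numeric matrices.

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**3 * y + sp.sin(y)          # illustrative function (an assumption, not from the text)

# Hessian: the matrix of all second partial derivatives
H = sp.hessian(f, (x, y))
print(H)                          # Matrix([[6*x*y, 3*x**2], [3*x**2, -sin(y)]])

# The Hessian is a matrix-valued function: plugging in a point gives a numeric matrix
print(H.subs({x: 1, y: 0}))       # Matrix([[0, 3], [3, 0]])
print(H.subs({x: 2, y: sp.pi}))   # Matrix([[12*pi, 12], [12, 0]])
```

Each choice of input point yields a different matrix, which is exactly what "matrix-valued function" means.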

Evaluating the Hessian Matrix

For instance, consider this function:

The Hessian matrix of $f$ can be written as:

First, calculate its first partial derivatives:

Then the Hessian is:
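
As a concrete illustration (the function here is an assumed example, chosen only for demonstration), take $f(x, y) = x^2 y + y^3$. Its first partial derivatives are

$$\frac{\partial f}{\partial x} = 2xy, \qquad \frac{\partial f}{\partial y} = x^2 + 3y^2,$$

and differentiating each of these once more gives the Hessian:

$$\mathbf{H}_f = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x^2} & \dfrac{\partial^2 f}{\partial x\, \partial y} \\[1ex] \dfrac{\partial^2 f}{\partial y\, \partial x} & \dfrac{\partial^2 f}{\partial y^2} \end{bmatrix} = \begin{bmatrix} 2y & 2x \\ 2x & 6y \end{bmatrix}$$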

Properties of the Hessian Matrix

The Hessian matrix has some interesting properties that can be useful in optimization problems.

Symmetry

If the function is twice continuously differentiable, meaning its second derivatives exist and are continuous, then by Schwarz's theorem, all mixed partial derivatives are equal:

$$\frac{\partial^2 f}{\partial x_i\, \partial x_j} = \frac{\partial^2 f}{\partial x_j\, \partial x_i}$$

Let's consider the Hessian matrix of some function $f$. Recall from the definition above that the $(i, j)$-th entry is $\dfrac{\partial^2 f}{\partial x_i\, \partial x_j}$. Then, by Schwarz's theorem:

$$(\mathbf{H}_f)_{ij} = \frac{\partial^2 f}{\partial x_i\, \partial x_j} = \frac{\partial^2 f}{\partial x_j\, \partial x_i} = (\mathbf{H}_f)_{ji}$$

This means that the Hessian matrix is symmetric. For instance, consider the Hessian matrix of a three-variable function $f(x, y, z)$:

$$\mathbf{H}_f = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x^2} & \dfrac{\partial^2 f}{\partial x\, \partial y} & \dfrac{\partial^2 f}{\partial x\, \partial z} \\[1ex] \dfrac{\partial^2 f}{\partial x\, \partial y} & \dfrac{\partial^2 f}{\partial y^2} & \dfrac{\partial^2 f}{\partial y\, \partial z} \\[1ex] \dfrac{\partial^2 f}{\partial x\, \partial z} & \dfrac{\partial^2 f}{\partial y\, \partial z} & \dfrac{\partial^2 f}{\partial z^2} \end{bmatrix} = \mathbf{H}_f^T$$
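
Here is a quick symbolic check of this symmetry in SymPy, again with an illustrative function chosen only as an assumption for demonstration:

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
f = x**2 * y + sp.exp(y * z)              # illustrative smooth function (an assumption)

H = sp.hessian(f, (x, y, z))

# For a twice continuously differentiable function, mixed partials agree,
# so the Hessian equals its own transpose.
diff_matrix = (H - H.T).applyfunc(sp.simplify)
print(diff_matrix == sp.zeros(3, 3))      # True: the Hessian is symmetric
```

The off-diagonal entries match pairwise precisely because the mixed partial derivatives agree.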

Relationship with the Gradient and Jacobian

Recall that the gradient of a function is a vector of its partial derivatives, and the Jacobian of a function is a matrix of the partial derivatives of its components.

In other words, the gradient is used for scalar-valued functions, and the Jacobian is used for vector-valued functions.

Write out the gradient of a function $f(x_1, \dots, x_n)$:

$$\nabla f = \begin{bmatrix} \dfrac{\partial f}{\partial x_1} \\ \vdots \\ \dfrac{\partial f}{\partial x_n} \end{bmatrix}$$

The gradient is itself a vector-valued function, so we can take its Jacobian. The Jacobian of this gradient is then:

$$\mathbf{J}(\nabla f) = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x_1\, \partial x_1} & \cdots & \dfrac{\partial^2 f}{\partial x_n\, \partial x_1} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial^2 f}{\partial x_1\, \partial x_n} & \cdots & \dfrac{\partial^2 f}{\partial x_n\, \partial x_n} \end{bmatrix}$$

which is exactly the transpose of the Hessian. In other words, $\mathbf{H}_f = \mathbf{J}(\nabla f)^T$ (and since the Hessian is symmetric for twice continuously differentiable functions, the transpose makes no practical difference).
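
This relationship $\mathbf{H}_f = \mathbf{J}(\nabla f)^T$ can also be checked symbolically. The sketch below reuses the same illustrative function as before (an assumption for demonstration):

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**3 * y + sp.sin(y)                  # illustrative function (an assumption)

grad = sp.Matrix([f]).jacobian([x, y]).T  # gradient of f, as a column vector
J_of_grad = grad.jacobian([x, y])         # Jacobian of the gradient
H = sp.hessian(f, (x, y))                 # Hessian of f

print(J_of_grad == H.T)                   # True: Hessian = transpose of Jacobian of the gradient
```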

Quadratic Approximations in Vector Form

Recall that the quadratic approximation of a two-variable function $f(x, y)$ about a point $(x_0, y_0)$ is:

$$\begin{aligned} Q_f(x, y) = f(x_0, y_0) &+ f_x(x_0, y_0)(x - x_0) + f_y(x_0, y_0)(y - y_0) \\ &+ \tfrac{1}{2} f_{xx}(x_0, y_0)(x - x_0)^2 + f_{xy}(x_0, y_0)(x - x_0)(y - y_0) + \tfrac{1}{2} f_{yy}(x_0, y_0)(y - y_0)^2 \end{aligned}$$

In order to vectorize this, first treat the input not as two scalars $x$ and $y$, but as a vector $\mathbf{x} = \begin{bmatrix} x \\ y \end{bmatrix}$. Similarly, treat the point $(x_0, y_0)$ as a vector $\mathbf{x}_0 = \begin{bmatrix} x_0 \\ y_0 \end{bmatrix}$.

The first term, the constant, can simply be written as:

$$f(\mathbf{x}_0)$$

The linear terms can be written using the gradient:

$$\nabla f(\mathbf{x}_0)^T (\mathbf{x} - \mathbf{x}_0) = f_x(x_0, y_0)(x - x_0) + f_y(x_0, y_0)(y - y_0)$$

For the quadratic terms, things get a bit more complicated. First, recall how matrix multiplication works when a matrix is sandwiched between a row vector and a column vector:

$$\begin{bmatrix} x & y \end{bmatrix} \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = a x^2 + (b + c)\, x y + d y^2$$

We can make the elements of the matrix the coefficients of the quadratic terms. The term involving $(x - x_0)(y - y_0)$ does not have a half, since two such terms are added together (recall $f_{xy} = f_{yx}$). Hence, we split its coefficient into two halves and place them separately in the off-diagonal entries. We can write this matrix as:

$$\begin{bmatrix} \tfrac{1}{2} f_{xx}(x_0, y_0) & \tfrac{1}{2} f_{xy}(x_0, y_0) \\[1ex] \tfrac{1}{2} f_{yx}(x_0, y_0) & \tfrac{1}{2} f_{yy}(x_0, y_0) \end{bmatrix} = \tfrac{1}{2}\, \mathbf{H}_f(\mathbf{x}_0)$$

When we multiply this matrix by the vector $\mathbf{x} - \mathbf{x}_0$ on the right, and by the row vector $(\mathbf{x} - \mathbf{x}_0)^T$ on the left, we get exactly the quadratic terms:

$$(\mathbf{x} - \mathbf{x}_0)^T \left[ \tfrac{1}{2}\, \mathbf{H}_f(\mathbf{x}_0) \right] (\mathbf{x} - \mathbf{x}_0) = \tfrac{1}{2} f_{xx}(x_0, y_0)(x - x_0)^2 + f_{xy}(x_0, y_0)(x - x_0)(y - y_0) + \tfrac{1}{2} f_{yy}(x_0, y_0)(y - y_0)^2$$

Simplify this by pulling the $\tfrac{1}{2}$ out front:

$$\tfrac{1}{2} (\mathbf{x} - \mathbf{x}_0)^T\, \mathbf{H}_f(\mathbf{x}_0)\, (\mathbf{x} - \mathbf{x}_0)$$

(This matrix-vector calculation is elaborated on in the appendix.)

Thus, the quadratic approximation in vector form is:

$$Q_f(\mathbf{x}) = f(\mathbf{x}_0) + \nabla f(\mathbf{x}_0)^T (\mathbf{x} - \mathbf{x}_0) + \tfrac{1}{2} (\mathbf{x} - \mathbf{x}_0)^T\, \mathbf{H}_f(\mathbf{x}_0)\, (\mathbf{x} - \mathbf{x}_0)$$

Notice that this is very similar to the 2nd-degree Taylor polynomial for a single-variable function:

$$P_2(x) = f(x_0) + f'(x_0)(x - x_0) + \tfrac{1}{2} f''(x_0)(x - x_0)^2$$
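
To see the vector form in action, here is a small NumPy sketch that compares a function to its quadratic approximation near a point. The function, the point, and the hand-coded gradient and Hessian are all illustrative assumptions:

```python
import numpy as np

# Illustrative function (an assumption): f(x, y) = exp(x) * sin(y)
def f(v):
    x, y = v
    return np.exp(x) * np.sin(y)

def grad_f(v):
    x, y = v
    return np.array([np.exp(x) * np.sin(y), np.exp(x) * np.cos(y)])

def hess_f(v):
    x, y = v
    return np.array([[np.exp(x) * np.sin(y),  np.exp(x) * np.cos(y)],
                     [np.exp(x) * np.cos(y), -np.exp(x) * np.sin(y)]])

def quadratic_approx(v, v0):
    """Vector form: f(x0) + grad(x0)^T (x - x0) + 1/2 (x - x0)^T H(x0) (x - x0)."""
    d = np.asarray(v) - np.asarray(v0)
    return f(v0) + grad_f(v0) @ d + 0.5 * d @ hess_f(v0) @ d

v0 = np.array([0.0, 1.0])
v  = np.array([0.1, 1.1])
print(f(v))                       # exact value
print(quadratic_approx(v, v0))    # close to the exact value near v0
```

Moving the evaluation point farther from $\mathbf{x}_0$ makes the approximation error grow, just as with single-variable Taylor polynomials.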

What about Higher-Degree Polynomials?

Recall from single-variable calculus that we can use the Taylor polynomial as an approximation for a function. The quadratic approximation is essentially a 2nd-degree Taylor polynomial.

Just like in single-variable calculus, we can use higher-degree polynomials for multivariable functions. The problem is that we would need to construct something like a cubic analogue of the Hessian, which would be an $n \times n \times n$ array of third partial derivatives. This is known as a tensor, and it is beyond the scope of this course.

Summary and Next Steps

In this section, we introduced the Hessian matrix, a matrix of second partial derivatives.

  • The Hessian matrix is a matrix of second partial derivatives.

  • The Hessian is symmetric if the function is twice continuously differentiable.

  • The Hessian matrix can also be written as the transpose of the Jacobian of the gradient: $\mathbf{H}_f = \mathbf{J}(\nabla f)^T$.

  • The quadratic approximation of a function can be written in vector form using the Hessian matrix:

    $$Q_f(\mathbf{x}) = f(\mathbf{x}_0) + \nabla f(\mathbf{x}_0)^T (\mathbf{x} - \mathbf{x}_0) + \tfrac{1}{2} (\mathbf{x} - \mathbf{x}_0)^T\, \mathbf{H}_f(\mathbf{x}_0)\, (\mathbf{x} - \mathbf{x}_0)$$

In the next section, we will discuss how all of this can be used in optimization problems.

Appendix: Matrix-Vector Multiplication

Given a matrix $\mathbf{M}$ and a vector $\mathbf{v}$:

$$\mathbf{M} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}, \qquad \mathbf{v} = \begin{bmatrix} x \\ y \end{bmatrix}$$

The matrix-vector multiplication is:

$$\mathbf{M}\mathbf{v} = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} ax + by \\ cx + dy \end{bmatrix}$$

Now consider the quadratic term in the quadratic approximation:

$$\tfrac{1}{2} (\mathbf{x} - \mathbf{x}_0)^T\, \mathbf{H}_f(\mathbf{x}_0)\, (\mathbf{x} - \mathbf{x}_0)$$

First, calculate the product $\mathbf{H}_f(\mathbf{x}_0)\,(\mathbf{x} - \mathbf{x}_0)$, where each second partial derivative is evaluated at $(x_0, y_0)$:

$$\begin{bmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \end{bmatrix} \begin{bmatrix} x - x_0 \\ y - y_0 \end{bmatrix} = \begin{bmatrix} f_{xx}(x - x_0) + f_{xy}(y - y_0) \\ f_{yx}(x - x_0) + f_{yy}(y - y_0) \end{bmatrix}$$

Then, calculate the product $(\mathbf{x} - \mathbf{x}_0)^T \left[ \mathbf{H}_f(\mathbf{x}_0)\,(\mathbf{x} - \mathbf{x}_0) \right]$:

$$\begin{bmatrix} x - x_0 & y - y_0 \end{bmatrix} \begin{bmatrix} f_{xx}(x - x_0) + f_{xy}(y - y_0) \\ f_{yx}(x - x_0) + f_{yy}(y - y_0) \end{bmatrix} = f_{xx}(x - x_0)^2 + (f_{xy} + f_{yx})(x - x_0)(y - y_0) + f_{yy}(y - y_0)^2$$

Finally, multiply by $\tfrac{1}{2}$ and use $f_{xy} = f_{yx}$ to get the quadratic terms:

$$\tfrac{1}{2} f_{xx}(x - x_0)^2 + f_{xy}(x - x_0)(y - y_0) + \tfrac{1}{2} f_{yy}(y - y_0)^2$$
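
As a numeric sanity check of this expansion, the following sketch (with arbitrary, assumed values for the second derivatives and the displacement) evaluates the quadratic term both in matrix form and in expanded form:

```python
import numpy as np

# Assumed sample values, chosen arbitrarily for demonstration
fxx, fxy, fyy = 2.0, -1.0, 3.0
H = np.array([[fxx, fxy],
              [fxy, fyy]])                # symmetric Hessian

d = np.array([0.5, -0.2])                 # plays the role of x - x0

# Matrix form of the quadratic term
matrix_form = 0.5 * d @ H @ d

# Expanded scalar form
expanded = 0.5 * fxx * d[0]**2 + fxy * d[0] * d[1] + 0.5 * fyy * d[1]**2

print(matrix_form, expanded)              # both print the same value
```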