The Hessian Matrix
Previously, we discussed the quadratic approximation of a function. We saw that it is very useful, but that writing it out term by term quickly becomes unwieldy. In this section, we will introduce the Hessian matrix and show how it can be used to write a vectorized version of the quadratic approximation.
Introduction
Recall that the gradient of a function packages all of its first partial derivatives into a vector. For a two-variable function $f(x, y)$:

$$\nabla f = \begin{bmatrix} \dfrac{\partial f}{\partial x} \\[6pt] \dfrac{\partial f}{\partial y} \end{bmatrix}$$

The Hessian matrix is just that, but with the second partial derivatives.
For instance, for a two-variable function $f(x, y)$, the Hessian matrix is:

$$\mathbf{H}_f = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x^2} & \dfrac{\partial^2 f}{\partial x\,\partial y} \\[6pt] \dfrac{\partial^2 f}{\partial y\,\partial x} & \dfrac{\partial^2 f}{\partial y^2} \end{bmatrix}$$
The Hessian matrix of a three-variable function $f(x, y, z)$ is:

$$\mathbf{H}_f = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x^2} & \dfrac{\partial^2 f}{\partial x\,\partial y} & \dfrac{\partial^2 f}{\partial x\,\partial z} \\[6pt] \dfrac{\partial^2 f}{\partial y\,\partial x} & \dfrac{\partial^2 f}{\partial y^2} & \dfrac{\partial^2 f}{\partial y\,\partial z} \\[6pt] \dfrac{\partial^2 f}{\partial z\,\partial x} & \dfrac{\partial^2 f}{\partial z\,\partial y} & \dfrac{\partial^2 f}{\partial z^2} \end{bmatrix}$$
Then, in any number of dimensions, the Hessian of $f(x_1, \ldots, x_n)$ is the $n \times n$ matrix:

$$\mathbf{H}_f = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x_1^2} & \cdots & \dfrac{\partial^2 f}{\partial x_1\,\partial x_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial^2 f}{\partial x_n\,\partial x_1} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2} \end{bmatrix}$$

Where the entry in row $i$, column $j$ is the second partial derivative $\dfrac{\partial^2 f}{\partial x_i\,\partial x_j}$.
Given that each element in the Hessian matrix is a function of the inputs, the Hessian matrix is itself a function of the inputs. In other words, the Hessian is a sort of "matrix-valued function": feed it a point, and it outputs a matrix of numbers.
Evaluating the Hessian Matrix
For instance, consider this function:

$$f(x, y) = x^3 - 2xy - y^6$$

To build the Hessian matrix of $f$, first calculate its first partial derivatives:

$$\frac{\partial f}{\partial x} = 3x^2 - 2y, \qquad \frac{\partial f}{\partial y} = -2x - 6y^5$$

Differentiating each of these once more with respect to $x$ and with respect to $y$ gives the second partial derivatives. Then the Hessian is:

$$\mathbf{H}_f(x, y) = \begin{bmatrix} 6x & -2 \\ -2 & -30y^4 \end{bmatrix}$$
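If you want to check a computation like this by machine, here is a minimal SymPy sketch (assuming SymPy is available; the function $f(x, y) = x^3 - 2xy - y^6$ is just the illustrative example used above):

```python
import sympy as sp

x, y = sp.symbols("x y")
f = x**3 - 2*x*y - y**6  # illustrative example function

# First partial derivatives (the gradient components)
grad = [sp.diff(f, v) for v in (x, y)]

# Hessian: the matrix of all second partial derivatives
H = sp.hessian(f, (x, y))

print(grad)  # [3*x**2 - 2*y, -2*x - 6*y**5]
print(H)     # Matrix([[6*x, -2], [-2, -30*y**4]])
```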
Properties of the Hessian Matrix
The Hessian matrix has some interesting properties that can be useful in optimization problems.
Symmetry
If the function $f$ is twice continuously differentiable, the order in which we take two partial derivatives does not matter (this is the symmetry of second derivatives, sometimes called Schwarz's or Clairaut's theorem):

$$\frac{\partial^2 f}{\partial x\,\partial y} = \frac{\partial^2 f}{\partial y\,\partial x}$$

Let's consider the Hessian matrix for some function $f(x, y)$. Its two off-diagonal entries, $\dfrac{\partial^2 f}{\partial x\,\partial y}$ and $\dfrac{\partial^2 f}{\partial y\,\partial x}$, are therefore equal, so swapping the rows and columns leaves the matrix unchanged. This means that the Hessian matrix is symmetric: $\mathbf{H}_f = \mathbf{H}_f^{\mathsf{T}}$.
For instance, consider the Hessian matrix of a three-variable function $f(x, y, z)$. Its off-diagonal entries pair up across the main diagonal:

$$\frac{\partial^2 f}{\partial x\,\partial y} = \frac{\partial^2 f}{\partial y\,\partial x}, \qquad \frac{\partial^2 f}{\partial x\,\partial z} = \frac{\partial^2 f}{\partial z\,\partial x}, \qquad \frac{\partial^2 f}{\partial y\,\partial z} = \frac{\partial^2 f}{\partial z\,\partial y}$$

so only six of its nine entries are actually distinct.
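As a quick sanity check, here is a small SymPy sketch (the particular function is an arbitrary smooth choice, purely for illustration) confirming that the Hessian equals its own transpose:

```python
import sympy as sp

x, y, z = sp.symbols("x y z")
f = sp.exp(x*y) + y*sp.sin(z) + x**2*z  # arbitrary smooth illustrative function

H = sp.hessian(f, (x, y, z))

# For a twice continuously differentiable function the mixed partials agree,
# so H minus its transpose simplifies to the zero matrix.
print((H - H.T).applyfunc(sp.simplify))  # Matrix([[0, 0, 0], [0, 0, 0], [0, 0, 0]])
```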
Relationship with the Gradient and Jacobian
Recall that the gradient of a scalar-valued function is the vector of its first partial derivatives, while the Jacobian of a vector-valued function is the matrix containing the first partial derivatives of each of its component functions.
In other words, the gradient is used for scalar-valued functions, and the Jacobian is used for vector-valued functions.
Write out the gradient of a function $f(x, y)$:

$$\nabla f = \begin{bmatrix} \dfrac{\partial f}{\partial x} \\[6pt] \dfrac{\partial f}{\partial y} \end{bmatrix}$$

The gradient is itself a vector-valued function of $x$ and $y$, so it has a Jacobian. The Jacobian of this gradient is then:

$$\mathbf{J}(\nabla f) = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x^2} & \dfrac{\partial^2 f}{\partial y\,\partial x} \\[6pt] \dfrac{\partial^2 f}{\partial x\,\partial y} & \dfrac{\partial^2 f}{\partial y^2} \end{bmatrix}$$

which is exactly the transpose of the Hessian: $\mathbf{H}_f = \mathbf{J}(\nabla f)^{\mathsf{T}}$. (Since both matrices are symmetric for twice continuously differentiable functions, the transpose makes no practical difference.)
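This relationship is easy to verify symbolically; here is a brief SymPy sketch using the same illustrative function as before:

```python
import sympy as sp

x, y = sp.symbols("x y")
f = x**3 - 2*x*y - y**6  # illustrative example function

grad = sp.Matrix([sp.diff(f, x), sp.diff(f, y)])  # gradient as a column vector
J = grad.jacobian([x, y])                         # Jacobian of the gradient

# The Hessian is the transpose of the Jacobian of the gradient.
print(J.T == sp.hessian(f, (x, y)))  # True
```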
Quadratic Approximations in Vector Form
Recall that the quadratic approximation of a two-variable function $f(x, y)$ near a point $(x_0, y_0)$ is:

$$\begin{aligned} Q_f(x, y) = f(x_0, y_0) &+ f_x(x_0, y_0)(x - x_0) + f_y(x_0, y_0)(y - y_0) \\ &+ \tfrac{1}{2} f_{xx}(x_0, y_0)(x - x_0)^2 + f_{xy}(x_0, y_0)(x - x_0)(y - y_0) + \tfrac{1}{2} f_{yy}(x_0, y_0)(y - y_0)^2 \end{aligned}$$

(writing $f_x$ for $\partial f / \partial x$, $f_{xy}$ for $\partial^2 f / \partial x\,\partial y$, and so on). Written out term by term, this is already six terms for two variables, and it only gets longer in higher dimensions.
In order to vectorize this, first treat the input not as two scalars $x$ and $y$ but as a single vector, and do the same for the point of approximation:

$$\mathbf{x} = \begin{bmatrix} x \\ y \end{bmatrix}, \qquad \mathbf{x}_0 = \begin{bmatrix} x_0 \\ y_0 \end{bmatrix}$$

The first constant term can be written as:

$$f(\mathbf{x}_0)$$

The linear terms can be written as a dot product between the gradient at $\mathbf{x}_0$ and the displacement $\mathbf{x} - \mathbf{x}_0$:

$$\nabla f(\mathbf{x}_0)^{\mathsf{T}} (\mathbf{x} - \mathbf{x}_0) = f_x(x_0, y_0)(x - x_0) + f_y(x_0, y_0)(y - y_0)$$
For the quadratic terms, things get a bit more complicated. First, recall how matrix multiplication works: sandwiching a matrix between a row vector on the left and a column vector on the right produces a single number, namely a sum of products of pairs of the vector's components weighted by the matrix entries. That is exactly the shape of our quadratic terms.

We can first make the elements in the matrix the constants in the quadratic terms, i.e. the second partial derivatives evaluated at $\mathbf{x}_0$; that matrix is the Hessian $\mathbf{H}_f(\mathbf{x}_0)$. The term involving $(x - x_0)(y - y_0)$ is split evenly between the two off-diagonal entries, which is why it appears without a factor of $\tfrac{1}{2}$ in the expansion while the squared terms keep theirs.

When we multiply this matrix with the vector $\mathbf{x} - \mathbf{x}_0$ on the right, then with the row vector $(\mathbf{x} - \mathbf{x}_0)^{\mathsf{T}}$ on the left, and finally by $\tfrac{1}{2}$, we recover all of the quadratic terms. Simplify this as:

$$\tfrac{1}{2}\, (\mathbf{x} - \mathbf{x}_0)^{\mathsf{T}}\, \mathbf{H}_f(\mathbf{x}_0)\, (\mathbf{x} - \mathbf{x}_0)$$
(This matrix-vector calculation is elaborated on in the appendix.)
Thus, the quadratic approximation in vector form is:

$$Q_f(\mathbf{x}) = f(\mathbf{x}_0) + \nabla f(\mathbf{x}_0)^{\mathsf{T}} (\mathbf{x} - \mathbf{x}_0) + \tfrac{1}{2}\, (\mathbf{x} - \mathbf{x}_0)^{\mathsf{T}}\, \mathbf{H}_f(\mathbf{x}_0)\, (\mathbf{x} - \mathbf{x}_0)$$
Notice that this is very similar to the 2nd-degree Taylor polynomial for a single-variable function:

$$P_2(x) = f(x_0) + f'(x_0)(x - x_0) + \tfrac{1}{2} f''(x_0)(x - x_0)^2$$

The gradient plays the role of the first derivative, and the Hessian plays the role of the second derivative. A nice bonus of the vector form is that nothing in it is specific to two variables; the same formula works for any number of inputs.
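To make this concrete, here is a rough NumPy sketch (purely illustrative, reusing the example function from earlier with its gradient and Hessian typed in by hand) that evaluates the vector form near a point:

```python
import numpy as np

def f(v):
    x, y = v
    return x**3 - 2*x*y - y**6

def grad_f(v):
    x, y = v
    return np.array([3*x**2 - 2*y, -2*x - 6*y**5])

def hess_f(v):
    x, y = v
    return np.array([[6*x, -2.0], [-2.0, -30*y**4]])

def quadratic_approx(v, v0):
    """Q(v) = f(v0) + grad_f(v0)^T (v - v0) + 1/2 (v - v0)^T H(v0) (v - v0)."""
    d = v - v0
    return f(v0) + grad_f(v0) @ d + 0.5 * d @ hess_f(v0) @ d

v0 = np.array([1.0, 1.0])
v = np.array([1.1, 0.9])
print(f(v), quadratic_approx(v, v0))  # the two values are close near v0
```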
What about Higher-Degree Polynomials?
Recall from single-variable calculus that we can use the Taylor polynomial as an approximation for a function. The quadratic approximation is essentially a 2nd-degree Taylor polynomial.
Just as in single-variable calculus, we can use higher-degree polynomials to approximate multivariable functions. The problem is that we would need to construct something like a cubic Hessian, which would no longer be a matrix at all: it would be a three-dimensional $n \times n \times n$ array of third partial derivatives, and the notation and bookkeeping quickly become unwieldy.
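To see why, here is a sketch of what the cubic term of the approximation would have to look like, written as a triple sum over all of the third partial derivatives:

$$\frac{1}{6} \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{k=1}^{n} \frac{\partial^3 f}{\partial x_i\, \partial x_j\, \partial x_k}(\mathbf{x}_0)\, (x_i - x_{0,i})(x_j - x_{0,j})(x_k - x_{0,k})$$

Matrix notation alone cannot package this as neatly as $\tfrac{1}{2}\, (\mathbf{x} - \mathbf{x}_0)^{\mathsf{T}}\, \mathbf{H}_f\, (\mathbf{x} - \mathbf{x}_0)$ does for the quadratic term; it requires a third-order tensor.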
Summary and Next Steps
In this section, we introduced the Hessian matrix, a matrix of second partial derivatives.
- The Hessian matrix is a matrix of second partial derivatives.
- The Hessian is symmetric if the function is twice continuously differentiable.
- The Hessian matrix can also be written as the transpose of the Jacobian of the gradient: $\mathbf{H}_f = \mathbf{J}(\nabla f)^{\mathsf{T}}$.
- The quadratic approximation of a function can be written in vector form using the Hessian matrix: $Q_f(\mathbf{x}) = f(\mathbf{x}_0) + \nabla f(\mathbf{x}_0)^{\mathsf{T}} (\mathbf{x} - \mathbf{x}_0) + \tfrac{1}{2}\, (\mathbf{x} - \mathbf{x}_0)^{\mathsf{T}}\, \mathbf{H}_f(\mathbf{x}_0)\, (\mathbf{x} - \mathbf{x}_0)$.
In the next section, we will discuss how all of this can be used in optimization problems.
Appendix: Matrix-Vector Multiplication
Given a matrix $\mathbf{A}$ and a vector $\mathbf{v}$:

$$\mathbf{A} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}, \qquad \mathbf{v} = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix}$$

The matrix-vector multiplication is:

$$\mathbf{A}\mathbf{v} = \begin{bmatrix} a v_1 + b v_2 \\ c v_1 + d v_2 \end{bmatrix}$$

Now consider that term in the quadratic approximation:

$$\tfrac{1}{2}\, (\mathbf{x} - \mathbf{x}_0)^{\mathsf{T}}\, \mathbf{H}_f(\mathbf{x}_0)\, (\mathbf{x} - \mathbf{x}_0)$$

To keep the notation light, write $\Delta\mathbf{x} = \mathbf{x} - \mathbf{x}_0 = \begin{bmatrix} \Delta x \\ \Delta y \end{bmatrix}$ and abbreviate the second partial derivatives evaluated at $\mathbf{x}_0$ as $f_{xx}$, $f_{xy}$, $f_{yy}$.

First calculate the product $\mathbf{H}_f(\mathbf{x}_0)\, \Delta\mathbf{x}$:

$$\begin{bmatrix} f_{xx} & f_{xy} \\ f_{xy} & f_{yy} \end{bmatrix} \begin{bmatrix} \Delta x \\ \Delta y \end{bmatrix} = \begin{bmatrix} f_{xx}\,\Delta x + f_{xy}\,\Delta y \\ f_{xy}\,\Delta x + f_{yy}\,\Delta y \end{bmatrix}$$

Then, calculate the product of the row vector $\Delta\mathbf{x}^{\mathsf{T}}$ with this result:

$$\begin{bmatrix} \Delta x & \Delta y \end{bmatrix} \begin{bmatrix} f_{xx}\,\Delta x + f_{xy}\,\Delta y \\ f_{xy}\,\Delta x + f_{yy}\,\Delta y \end{bmatrix} = f_{xx}\,\Delta x^2 + 2 f_{xy}\,\Delta x\,\Delta y + f_{yy}\,\Delta y^2$$

Finally, multiply by $\tfrac{1}{2}$:

$$\tfrac{1}{2}\,\Delta\mathbf{x}^{\mathsf{T}}\, \mathbf{H}_f(\mathbf{x}_0)\, \Delta\mathbf{x} = \tfrac{1}{2} f_{xx}\,\Delta x^2 + f_{xy}\,\Delta x\,\Delta y + \tfrac{1}{2} f_{yy}\,\Delta y^2$$

These are exactly the quadratic terms of the approximation.
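For a quick numeric cross-check, here is a small NumPy sketch (with arbitrary illustrative numbers) confirming that the matrix expression and the expanded scalar formula agree:

```python
import numpy as np

fxx, fxy, fyy = 6.0, -2.0, -30.0   # arbitrary illustrative second partials
dx, dy = 0.1, -0.1                 # arbitrary displacement

H = np.array([[fxx, fxy], [fxy, fyy]])
d = np.array([dx, dy])

matrix_form = 0.5 * d @ H @ d
expanded = 0.5*fxx*dx**2 + fxy*dx*dy + 0.5*fyy*dy**2

print(np.isclose(matrix_form, expanded))  # True
```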