The Hessian Matrix
Previously, we discussed the quadratic approximation of a function. We saw that it is very useful, but that it involves a lot of terms and quickly becomes unwieldy. In this section, we will introduce the Hessian matrix and show how it can be used to write a compact, vectorized version of the quadratic approximation.
Introduction
Recall that the gradient of a function $f(x, y, \dots)$ collects all of its first partial derivatives into a vector:

$$\nabla f = \begin{bmatrix} \dfrac{\partial f}{\partial x} \\[6pt] \dfrac{\partial f}{\partial y} \\ \vdots \end{bmatrix}$$
The Hessian matrix is just that, but with the second partial derivatives.
For instance, for a two-variable function $f(x, y)$, the Hessian matrix is:

$$\mathbf{H}_f = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x^2} & \dfrac{\partial^2 f}{\partial x\,\partial y} \\[8pt] \dfrac{\partial^2 f}{\partial y\,\partial x} & \dfrac{\partial^2 f}{\partial y^2} \end{bmatrix}$$
The Hessian matrix of a three-variable function $f(x, y, z)$ is:

$$\mathbf{H}_f = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x^2} & \dfrac{\partial^2 f}{\partial x\,\partial y} & \dfrac{\partial^2 f}{\partial x\,\partial z} \\[8pt] \dfrac{\partial^2 f}{\partial y\,\partial x} & \dfrac{\partial^2 f}{\partial y^2} & \dfrac{\partial^2 f}{\partial y\,\partial z} \\[8pt] \dfrac{\partial^2 f}{\partial z\,\partial x} & \dfrac{\partial^2 f}{\partial z\,\partial y} & \dfrac{\partial^2 f}{\partial z^2} \end{bmatrix}$$
Then, in any number of dimensions, for an $n$-variable function $f(x_1, x_2, \dots, x_n)$, the Hessian is the $n \times n$ matrix

$$\mathbf{H}_f = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x_1^2} & \cdots & \dfrac{\partial^2 f}{\partial x_1\,\partial x_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial^2 f}{\partial x_n\,\partial x_1} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2} \end{bmatrix}$$

where the entry in row $i$ and column $j$ is $\dfrac{\partial^2 f}{\partial x_i\,\partial x_j}$.
Given that each element in the Hessian matrix is a function of the inputs, the Hessian matrix is a function of the inputs as well. In other words, the Hessian is sort of a "matrix-valued function".
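To make the "matrix-valued function" idea concrete, here is a minimal sketch (using Python with SymPy, which is my choice and not part of the original discussion) that builds the Hessian of an arbitrary three-variable example function entry by entry from the definition, then evaluates it at a point:

```python
import sympy as sp

# Arbitrary smooth example function, chosen just for illustration (an assumption)
x1, x2, x3 = sp.symbols('x1 x2 x3')
variables = [x1, x2, x3]
f = x1 * sp.exp(x2) + x3**2

# Build the Hessian entry by entry: H[i][j] = d^2 f / (dx_i dx_j)
H = sp.Matrix([[sp.diff(f, xi, xj) for xj in variables] for xi in variables])
print(H)                               # entries are still functions of x1, x2, x3

# Substituting a point turns the matrix-valued function into an ordinary matrix
print(H.subs({x1: 1, x2: 0, x3: 2}))
```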
Evaluating the Hessian Matrix
For instance, consider this function:

$$f(x, y) = x^3 - 2xy^2$$

To find the Hessian matrix of $f$, first calculate its first partial derivatives:

$$\frac{\partial f}{\partial x} = 3x^2 - 2y^2, \qquad \frac{\partial f}{\partial y} = -4xy$$

Next, differentiate each of these again with respect to each variable:

$$\frac{\partial^2 f}{\partial x^2} = 6x, \qquad \frac{\partial^2 f}{\partial x\,\partial y} = \frac{\partial^2 f}{\partial y\,\partial x} = -4y, \qquad \frac{\partial^2 f}{\partial y^2} = -4x$$

Then the Hessian is:

$$\mathbf{H}_f = \begin{bmatrix} 6x & -4y \\ -4y & -4x \end{bmatrix}$$
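As a sanity check on the worked example, here is a small SymPy sketch (the library choice is mine, not the text's) that computes the same Hessian automatically:

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**3 - 2 * x * y**2          # the example function used above

H = sp.hessian(f, (x, y))        # matrix of second partial derivatives
print(H)                         # Matrix([[6*x, -4*y], [-4*y, -4*x]])

# Evaluating at a specific point, e.g. (1, 2), gives a concrete numeric matrix
print(H.subs({x: 1, y: 2}))      # Matrix([[6, -8], [-8, -4]])
```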
Properties of the Hessian Matrix
The Hessian matrix has some interesting properties that can be useful in optimization problems.
Symmetry
If the function $f$ is twice continuously differentiable (that is, its second partial derivatives exist and are continuous), then by Schwarz's (Clairaut's) theorem the order of differentiation in the mixed partials does not matter:

$$\frac{\partial^2 f}{\partial x\,\partial y} = \frac{\partial^2 f}{\partial y\,\partial x}$$
Let's consider the Hessian matrix for some function $f(x, y)$:

$$\mathbf{H}_f = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x^2} & \dfrac{\partial^2 f}{\partial x\,\partial y} \\[8pt] \dfrac{\partial^2 f}{\partial y\,\partial x} & \dfrac{\partial^2 f}{\partial y^2} \end{bmatrix}$$

The two off-diagonal entries are the mixed partials, which we just saw are equal. This means that the Hessian matrix is symmetric: $\mathbf{H}_f^T = \mathbf{H}_f$.
For instance, consider the Hessian matrix of a three-variable function $f(x, y, z)$:

$$\mathbf{H}_f = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x^2} & \dfrac{\partial^2 f}{\partial x\,\partial y} & \dfrac{\partial^2 f}{\partial x\,\partial z} \\[8pt] \dfrac{\partial^2 f}{\partial y\,\partial x} & \dfrac{\partial^2 f}{\partial y^2} & \dfrac{\partial^2 f}{\partial y\,\partial z} \\[8pt] \dfrac{\partial^2 f}{\partial z\,\partial x} & \dfrac{\partial^2 f}{\partial z\,\partial y} & \dfrac{\partial^2 f}{\partial z^2} \end{bmatrix}$$

Every mixed partial below the diagonal has an equal counterpart above it, so the matrix equals its own transpose.
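If you want to check the symmetry on a concrete case, here is a short sketch (SymPy again, with an arbitrary example function of my choosing) that verifies the Hessian equals its own transpose:

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
f = x * y * sp.cos(z) + y**2 * z     # arbitrary smooth example function

H = sp.hessian(f, (x, y, z))
# For a twice continuously differentiable function the mixed partials agree,
# so the Hessian is symmetric:
print(H == H.T)   # True
```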
Relationship with the Gradient and Jacobian
Recall the gradient of a function $f$ is the vector of its first partial derivatives, while the Jacobian of a vector-valued function is the matrix whose $(i, j)$ entry is the partial derivative of the $i$-th component with respect to the $j$-th variable.
In other words, the gradient is used for scalar-valued functions, and the Jacobian is used for vector-valued functions.
Write out the gradient of a function $f(x, y)$ as a vector-valued function:

$$\nabla f(x, y) = \begin{bmatrix} \dfrac{\partial f}{\partial x} \\[8pt] \dfrac{\partial f}{\partial y} \end{bmatrix}$$

The Jacobian of this gradient is then:

$$\mathbf{J}(\nabla f) = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x^2} & \dfrac{\partial^2 f}{\partial y\,\partial x} \\[8pt] \dfrac{\partial^2 f}{\partial x\,\partial y} & \dfrac{\partial^2 f}{\partial y^2} \end{bmatrix} = \mathbf{H}_f^T$$

In other words, the Hessian is the transpose of the Jacobian of the gradient, $\mathbf{H}_f = \mathbf{J}(\nabla f)^T$ (and, since the Hessian is symmetric, it equals that Jacobian as well).
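Here is a small sketch (SymPy, with another arbitrary example function) checking that the Hessian matches the transpose of the Jacobian of the gradient:

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 * sp.sin(y)                                # arbitrary example function

grad = sp.Matrix([sp.diff(f, x), sp.diff(f, y)])    # gradient as a column vector
J = grad.jacobian([x, y])                           # Jacobian of the gradient

H = sp.hessian(f, (x, y))
print(J.T == H)   # True: the Hessian is the transpose of the Jacobian of the gradient
```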
Quadratic Approximations in Vector Form
Recall that the quadratic approximation of a function $f(x, y)$ around a point $(x_0, y_0)$ is:

$$\begin{aligned} Q_f(x, y) = f(x_0, y_0) &+ f_x(x_0, y_0)(x - x_0) + f_y(x_0, y_0)(y - y_0) \\ &+ \tfrac{1}{2} f_{xx}(x_0, y_0)(x - x_0)^2 + f_{xy}(x_0, y_0)(x - x_0)(y - y_0) + \tfrac{1}{2} f_{yy}(x_0, y_0)(y - y_0)^2 \end{aligned}$$

(Here $f_x$ is shorthand for $\partial f / \partial x$, $f_{xy}$ for the mixed second partial, and so on.)
In order to vectorize this, first treat the input not as two scalars $x$ and $y$, but as a vector, and do the same for the point of approximation:

$$\mathbf{x} = \begin{bmatrix} x \\ y \end{bmatrix}, \qquad \mathbf{x}_0 = \begin{bmatrix} x_0 \\ y_0 \end{bmatrix}$$
The first constant can be written as:

$$f(\mathbf{x}_0)$$
The linear terms can be written as a dot product between the gradient at $\mathbf{x}_0$ and the displacement from $\mathbf{x}_0$:

$$\nabla f(\mathbf{x}_0) \cdot (\mathbf{x} - \mathbf{x}_0) = \nabla f(\mathbf{x}_0)^T (\mathbf{x} - \mathbf{x}_0)$$
For the quadratic terms, things get a bit more complicated. First, recall how matrix-vector multiplication works:

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} ax + by \\ cx + dy \end{bmatrix}$$
We can first make the elements in the matrix the constants in the quadratic terms, namely the second partial derivatives evaluated at $\mathbf{x}_0$. This is exactly the Hessian evaluated at that point:

$$\mathbf{H}_f(\mathbf{x}_0) = \begin{bmatrix} f_{xx}(\mathbf{x}_0) & f_{xy}(\mathbf{x}_0) \\ f_{yx}(\mathbf{x}_0) & f_{yy}(\mathbf{x}_0) \end{bmatrix}$$

The term involving $(x - x_0)(y - y_0)$ appears only once in the approximation, but its coefficient $f_{xy}(\mathbf{x}_0)$ occupies both off-diagonal entries of the Hessian; the factor of $\tfrac{1}{2}$ in front of the whole quadratic form will account for this double counting.

When we multiply this matrix with the vector $\mathbf{x} - \mathbf{x}_0$ on the right, and then with the row vector $(\mathbf{x} - \mathbf{x}_0)^T$ on the left, we get:

$$(\mathbf{x} - \mathbf{x}_0)^T \mathbf{H}_f(\mathbf{x}_0)(\mathbf{x} - \mathbf{x}_0) = f_{xx}(\mathbf{x}_0)(x - x_0)^2 + 2 f_{xy}(\mathbf{x}_0)(x - x_0)(y - y_0) + f_{yy}(\mathbf{x}_0)(y - y_0)^2$$

Simplify this, multiplying by $\tfrac{1}{2}$ so the coefficients match the quadratic terms of the approximation:

$$\tfrac{1}{2}(\mathbf{x} - \mathbf{x}_0)^T \mathbf{H}_f(\mathbf{x}_0)(\mathbf{x} - \mathbf{x}_0)$$
(This matrix-vector calculation is elaborated on in the appendix.)
Thus, the quadratic approximation in vector form is:

$$Q_f(\mathbf{x}) = f(\mathbf{x}_0) + \nabla f(\mathbf{x}_0)^T (\mathbf{x} - \mathbf{x}_0) + \tfrac{1}{2}(\mathbf{x} - \mathbf{x}_0)^T \mathbf{H}_f(\mathbf{x}_0)(\mathbf{x} - \mathbf{x}_0)$$
Notice that this is very similar to the 2nd-degree Taylor polynomial for a single-variable function:

$$P_2(x) = f(x_0) + f'(x_0)(x - x_0) + \tfrac{1}{2} f''(x_0)(x - x_0)^2$$

The gradient plays the role of the first derivative, and the Hessian plays the role of the second derivative.
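To see the vector form in action, here is a minimal numerical sketch (NumPy, my choice; the gradient and Hessian are hard-coded by hand for the example function $f(x, y) = x^3 - 2xy^2$ used earlier) comparing the quadratic approximation against the true function near a point:

```python
import numpy as np

# Example function f(x, y) = x^3 - 2*x*y^2, with its gradient and Hessian
# written out by hand (these match the worked example earlier in the section).
def f(v):
    x, y = v
    return x**3 - 2 * x * y**2

def grad_f(v):
    x, y = v
    return np.array([3 * x**2 - 2 * y**2, -4 * x * y])

def hess_f(v):
    x, y = v
    return np.array([[6 * x, -4 * y],
                     [-4 * y, -4 * x]])

def quadratic_approx(v, v0):
    """Q_f(v) = f(v0) + grad f(v0) . (v - v0) + 1/2 (v - v0)^T H_f(v0) (v - v0)."""
    d = v - v0
    return f(v0) + grad_f(v0) @ d + 0.5 * d @ hess_f(v0) @ d

v0 = np.array([1.0, 1.0])
v = np.array([1.1, 0.9])
print(f(v))                      # -0.451
print(quadratic_approx(v, v0))   # -0.45, close to f(v) for v near v0
```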
What about Higher-Degree Polynomials?
Recall from single-variable calculus that we can use the Taylor polynomial as an approximation for a function. The quadratic approximation is essentially a 2nd-degree Taylor polynomial.
Just like in single-variable calculus, we can use higher-degree polynomials to approximate multivariable functions more accurately.
The problem is that we would need to construct something like a cubic Hessian matrix, which would be a three-dimensional array (a rank-3 tensor) holding every third partial derivative $\dfrac{\partial^3 f}{\partial x_i\,\partial x_j\,\partial x_k}$. The notation quickly becomes unwieldy, so in practice we usually stop at the quadratic approximation.
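For the curious, this cubic analogue can still be assembled with a computer algebra system. Here is a SymPy sketch (reusing the example function from earlier) that builds the rank-3 array of third partial derivatives:

```python
import sympy as sp
from sympy import derive_by_array

x, y = sp.symbols('x y')
f = x**3 - 2 * x * y**2        # the example function from earlier
variables = [x, y]

# Differentiating three times along each axis gives a 2 x 2 x 2 array
# whose (i, j, k) entry is d^3 f / (dx_i dx_j dx_k).
third = derive_by_array(derive_by_array(derive_by_array(f, variables), variables), variables)
print(third.shape)   # (2, 2, 2)
print(third)
```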
Summary and Next Steps
In this section, we introduced the Hessian matrix, a matrix of second partial derivatives.
- The Hessian matrix is a matrix of second partial derivatives.
- The Hessian is symmetric if the function is twice continuously differentiable.
- The Hessian matrix can also be written as the transpose of the Jacobian of the gradient: $\mathbf{H}_f = \mathbf{J}(\nabla f)^T$.
- The quadratic approximation of a function can be written in vector form using the Hessian matrix:
  $$Q_f(\mathbf{x}) = f(\mathbf{x}_0) + \nabla f(\mathbf{x}_0)^T (\mathbf{x} - \mathbf{x}_0) + \tfrac{1}{2}(\mathbf{x} - \mathbf{x}_0)^T \mathbf{H}_f(\mathbf{x}_0)(\mathbf{x} - \mathbf{x}_0)$$
In the next section, we will discuss how all of this can be used in optimization problems.
Appendix: Matrix-Vector Multiplication
Given a matrix $\mathbf{A} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ and a vector $\mathbf{v} = \begin{bmatrix} x \\ y \end{bmatrix}$, the matrix-vector multiplication is:

$$\mathbf{A}\mathbf{v} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} ax + by \\ cx + dy \end{bmatrix}$$
Now consider that term in the quadratic approximation:

$$\tfrac{1}{2}(\mathbf{x} - \mathbf{x}_0)^T \mathbf{H}_f(\mathbf{x}_0)(\mathbf{x} - \mathbf{x}_0)$$

First, calculate the product $\mathbf{H}_f(\mathbf{x}_0)(\mathbf{x} - \mathbf{x}_0)$, with all second partials evaluated at $\mathbf{x}_0$:

$$\begin{bmatrix} f_{xx} & f_{xy} \\ f_{xy} & f_{yy} \end{bmatrix} \begin{bmatrix} x - x_0 \\ y - y_0 \end{bmatrix} = \begin{bmatrix} f_{xx}(x - x_0) + f_{xy}(y - y_0) \\ f_{xy}(x - x_0) + f_{yy}(y - y_0) \end{bmatrix}$$

Then, calculate the product of $(\mathbf{x} - \mathbf{x}_0)^T$ with this vector:

$$(x - x_0)\left[f_{xx}(x - x_0) + f_{xy}(y - y_0)\right] + (y - y_0)\left[f_{xy}(x - x_0) + f_{yy}(y - y_0)\right] = f_{xx}(x - x_0)^2 + 2 f_{xy}(x - x_0)(y - y_0) + f_{yy}(y - y_0)^2$$

Finally, multiply by $\tfrac{1}{2}$ to recover exactly the quadratic terms of the approximation:

$$\tfrac{1}{2} f_{xx}(x - x_0)^2 + f_{xy}(x - x_0)(y - y_0) + \tfrac{1}{2} f_{yy}(y - y_0)^2$$
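As a final check on this appendix, here is a short symbolic sketch (SymPy; the symbols `dx`, `dy`, `f_xx`, `f_xy`, `f_yy` are placeholders I introduce for the displacements and second partials) that expands the quadratic form and recovers the quadratic terms of the approximation:

```python
import sympy as sp

dx, dy = sp.symbols('dx dy')                    # stand-ins for x - x0 and y - y0
fxx, fxy, fyy = sp.symbols('f_xx f_xy f_yy')    # second partials evaluated at the point

H = sp.Matrix([[fxx, fxy],
               [fxy, fyy]])                     # symmetric Hessian at the point
d = sp.Matrix([dx, dy])

quadratic_form = sp.Rational(1, 2) * (d.T * H * d)[0, 0]
print(sp.expand(quadratic_form))
# dx**2*f_xx/2 + dx*dy*f_xy + dy**2*f_yy/2  -- the quadratic terms of the approximation
```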