
Multivariable Chain Rule

The chain rule is a fundamental concept in calculus, and it can be extended to multivariable functions.


Definition

Consider a function $f(x, y)$ and a pair of functions $x(t)$ and $y(t)$.

Then, consider applying $f$ to both:

$$f(x(t), y(t))$$

This can be thought of as a series of transformations:

  1. Start with a number line for $t$.
  2. Transform to a plane with $(x(t), y(t))$.
  3. Transform to a number line with $f(x(t), y(t))$.

Notice that, although a plane is involved, the composition still starts with a single number and ends with a single number. Therefore, it is a single-variable function of $t$.
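The three-stage picture can be sketched as plain function composition. This is a minimal illustration under assumed example functions (the names `x_of_t`, `y_of_t`, `f`, and `composed` are hypothetical, not from the original). The plane appears only as an intermediate tuple, and the composed function still maps one float to one float.

```python
import math


def x_of_t(t: float) -> float:
    # Hypothetical inner function x(t), chosen only for illustration.
    return math.cos(t)


def y_of_t(t: float) -> float:
    # Hypothetical inner function y(t), chosen only for illustration.
    return math.sin(t)


def f(x: float, y: float) -> float:
    # Hypothetical outer function f(x, y) = x^2 * y.
    return x ** 2 * y


def composed(t: float) -> float:
    # Stage 1: t is a single number on a number line.
    # Stage 2: t is mapped to a point (x, y) in the plane.
    point = (x_of_t(t), y_of_t(t))
    # Stage 3: the point is mapped back to a single number by f.
    return f(*point)


print(composed(0.5))  # one float in, one float out
```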

Next, consider taking the derivative of $f(x(t), y(t))$ with respect to $t$. To do so, we can apply the chain rule.

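We can start with an example. Take $f(x, y) = x^2 y$, $x(t) = \cos t$, and $y(t) = \sin t$, so that $f(x(t), y(t)) = \cos^2 t \sin t$.

Then, consider the derivative of $f(x(t), y(t))$, computed directly with the product rule and the single-variable chain rule:

$$\frac{d}{dt} f(x(t), y(t)) = \frac{d}{dt}\left(\cos^2 t \sin t\right) = -2\cos t \sin^2 t + \cos^3 t$$

This is fine, but there's a more general way to think about this.

Consider the partial derivatives of $f$:

$$\frac{\partial f}{\partial x} = 2xy, \qquad \frac{\partial f}{\partial y} = x^2$$

Next, consider the derivative of each of the functions $x(t)$ and $y(t)$:

$$\frac{dx}{dt} = -\sin t, \qquad \frac{dy}{dt} = \cos t$$

Notice that, with the partial derivatives evaluated at $(x(t), y(t)) = (\cos t, \sin t)$, the derivative of $f(x(t), y(t))$ can be written as:

$$\frac{d}{dt} f(x(t), y(t)) = \underbrace{(2\cos t \sin t)}_{\partial f/\partial x}\underbrace{(-\sin t)}_{dx/dt} + \underbrace{(\cos^2 t)}_{\partial f/\partial y}\underbrace{(\cos t)}_{dy/dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt}$$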

This is known as the multivariable chain rule.
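A quick numerical sanity check of the rule follows. It is a sketch assuming, for illustration, $f(x, y) = x^2 y$, $x(t) = \cos t$, and $y(t) = \sin t$. The helper names `direct_derivative` and `chain_rule_derivative` are hypothetical. The code compares a finite-difference derivative of the composition with $\frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt}$.

```python
import math

# Assumed example for illustration: f(x, y) = x^2 y, x(t) = cos t, y(t) = sin t.


def f(x, y):
    return x ** 2 * y


def composed(t):
    # The single-variable function t -> f(x(t), y(t)).
    return f(math.cos(t), math.sin(t))


def direct_derivative(t, h=1e-6):
    # Central finite difference of the composed function.
    return (composed(t + h) - composed(t - h)) / (2 * h)


def chain_rule_derivative(t):
    x, y = math.cos(t), math.sin(t)
    df_dx = 2 * x * y      # partial derivative of f with respect to x
    df_dy = x ** 2         # partial derivative of f with respect to y
    dx_dt = -math.sin(t)   # derivative of x(t)
    dy_dt = math.cos(t)    # derivative of y(t)
    return df_dx * dx_dt + df_dy * dy_dt


for t in (0.0, 0.7, 2.0):
    print(t, direct_derivative(t), chain_rule_derivative(t))
```

The two columns should agree up to finite-difference error at every sample point.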

Intuition for the Multivariable Chain Rule

We've only worked through one example and noticed a (possibly coincidental) pattern, but the rule should also make intuitive sense.

First, recall the intuition for the regular chain rule.

Consider a function $f(g(x))$. This is essentially two transformations:

  1. Start with a number line for $x$.
  2. Transform to a number line with $g(x)$.
  3. Transform to a number line with $f(g(x))$.

Now, consider a small change in $x$, called $dx$, and follow how that change "propagates" through the transformations:

  1. The change in $x$ is $dx$.
  2. The change in $g$ is $dg = \frac{dg}{dx}\,dx$, since the $dx$ in the fraction essentially cancels out.
  3. The change in $f$ is $df = \frac{df}{dg}\,dg$.

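Then:

$$df = \frac{df}{dg}\,dg = \frac{df}{dg}\frac{dg}{dx}\,dx$$

And dividing by $dx$ gives the chain rule:

$$\frac{df}{dx} = \frac{df}{dg}\frac{dg}{dx}$$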
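The nudge-propagation argument can be mimicked numerically. This minimal sketch assumes the hypothetical pair $g(x) = x^2$ and $f(u) = \sin u$. It pushes a small $dx$ through $g$ and then $f$ using each stage's derivative, and compares the result with the change that actually happens.

```python
import math

# Hypothetical example: g(x) = x^2 and f(u) = sin(u), so f(g(x)) = sin(x^2).


def g(x):
    return x ** 2


def g_prime(x):
    return 2 * x


def f(u):
    return math.sin(u)


def f_prime(u):
    return math.cos(u)


x = 1.3
dx = 1e-6                            # a small nudge to x
dg = g_prime(x) * dx                 # change in g: (dg/dx) dx
df = f_prime(g(x)) * dg              # change in f: (df/dg) dg
actual_df = f(g(x + dx)) - f(g(x))   # the change that actually happens

print(df / dx)         # chain-rule estimate of df/dx
print(actual_df / dx)  # finite-difference ratio, which should be close
```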

Let's extend this intuition to the multivariable case.

  1. You start with a number line for $t$. The change in $t$ is $dt$.
  2. The change in $t$ causes a change in $x$ and $y$. The change in $x$ is $dx = \frac{dx}{dt}\,dt$ and the change in $y$ is $dy = \frac{dy}{dt}\,dt$, due to the cancelling of differentials.
  3. Both of these changes result in a change in $f$. You could think of this as the sum of a change in $f$ due to the change in $x$ and a change in $f$ due to the change in $y$.
    • The change in $f$ due to a change in $x$ is $\frac{\partial f}{\partial x}\,dx$.
    • The change in $f$ due to a change in $y$ is $\frac{\partial f}{\partial y}\,dy$.

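Then:

$$df = \frac{\partial f}{\partial x}\,dx + \frac{\partial f}{\partial y}\,dy = \frac{\partial f}{\partial x}\frac{dx}{dt}\,dt + \frac{\partial f}{\partial y}\frac{dy}{dt}\,dt$$

And dividing by $dt$ gives:

$$\frac{df}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt}$$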
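The same bookkeeping works with two intermediate variables. This sketch assumes the hypothetical functions $f(x, y) = x e^y$, $x(t) = t^2$, and $y(t) = 3t$. A small $dt$ produces $dx$ and $dy$, each contributes its own piece of $df$, and the sum is compared with the actual change in $f$.

```python
import math

# Hypothetical example: f(x, y) = x * e^y with x(t) = t^2 and y(t) = 3t.


def f(x, y):
    return x * math.exp(y)


def x_of_t(t):
    return t ** 2


def y_of_t(t):
    return 3 * t


t, dt = 0.4, 1e-6
x, y = x_of_t(t), y_of_t(t)

dx = 2 * t * dt                      # dx = (dx/dt) dt
dy = 3 * dt                          # dy = (dy/dt) dt
df_from_x = math.exp(y) * dx         # (partial f / partial x) dx
df_from_y = x * math.exp(y) * dy     # (partial f / partial y) dy
df = df_from_x + df_from_y           # total change: sum of both contributions

actual_df = f(x_of_t(t + dt), y_of_t(t + dt)) - f(x, y)
print(df / dt, actual_df / dt)
```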

Vector Form of the Multivariable Chain Rule

The multivariable chain rule can be written in vector form.

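We've used $x(t)$ and $y(t)$ as separate parameters, but we can also think of $f$ as a function that takes a vector $\vec{v}(t)$ as input.

The vector $\vec{v}(t)$ can be written as:

$$\vec{v}(t) = \begin{bmatrix} x(t) \\ y(t) \end{bmatrix}$$

Then, the derivative of $\vec{v}(t)$ can be written as:

$$\vec{v}'(t) = \begin{bmatrix} dx/dt \\ dy/dt \end{bmatrix}$$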

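Recall the multivariable chain rule:

$$\frac{d}{dt} f(\vec{v}(t)) = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt}$$

Notice that this is a dot product between the gradient of $f$ and $\vec{v}'(t)$:

$$\frac{d}{dt} f(\vec{v}(t)) = \begin{bmatrix} \partial f/\partial x \\ \partial f/\partial y \end{bmatrix} \cdot \begin{bmatrix} dx/dt \\ dy/dt \end{bmatrix} = \nabla f(\vec{v}(t)) \cdot \vec{v}'(t)$$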

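This should also make intuitive sense, as it closely resembles the single-variable chain rule:

$$\frac{d}{dt} f(g(t)) = f'(g(t))\,g'(t)$$

  • $\nabla f(\vec{v}(t))$ corresponds to $f'(g(t))$; the gradient extends the ordinary derivative to functions of several variables.
  • $\vec{v}'(t)$ corresponds to $g'(t)$.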
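The vector form can also be checked numerically with nothing but a gradient, a velocity, and a dot product. This is a sketch assuming the hypothetical example $f(x, y) = x^2 y$ along $\vec{v}(t) = (\cos t, \sin t)$. The helpers `gradient`, `velocity`, and `dot` are illustrative names that use central differences, not a library API.

```python
import math

# Hypothetical example: f(x, y) = x^2 y evaluated along the curve v(t) = (cos t, sin t).


def f(p):
    x, y = p
    return x ** 2 * y


def v(t):
    return (math.cos(t), math.sin(t))


def gradient(func, p, h=1e-6):
    # Numerical gradient: central difference in each coordinate direction.
    grad = []
    for i in range(len(p)):
        forward = list(p)
        backward = list(p)
        forward[i] += h
        backward[i] -= h
        grad.append((func(forward) - func(backward)) / (2 * h))
    return grad


def velocity(curve, t, h=1e-6):
    # Numerical v'(t), differentiating each component separately.
    ahead, behind = curve(t + h), curve(t - h)
    return [(a - b) / (2 * h) for a, b in zip(ahead, behind)]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


t = 1.1
via_chain_rule = dot(gradient(f, v(t)), velocity(v, t))       # grad f(v(t)) . v'(t)
via_composition = (f(v(t + 1e-6)) - f(v(t - 1e-6))) / 2e-6   # d/dt f(v(t)) directly
print(via_chain_rule, via_composition)
```

Nothing in `gradient` or `velocity` is specific to two dimensions, which is one advantage of thinking in vectors.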

Duality of the Multivariable Chain Rule and the Directional Derivative

One thing to notice is that the multivariable chain rule looks very similar to the directional derivative.

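Recall the directional derivative of $f$ at a point $\vec{a}$ along a vector $\vec{w}$ (here $\vec{w}$ need not be a unit vector):

$$\nabla_{\vec{w}} f(\vec{a}) = \nabla f(\vec{a}) \cdot \vec{w}$$

And the vector-form multivariable chain rule:

$$\frac{d}{dt} f(\vec{v}(t)) = \nabla f(\vec{v}(t)) \cdot \vec{v}'(t)$$

The vector-form rule is exactly the directional derivative of $f$ in the direction of $\vec{v}'(t)$:

$$\frac{d}{dt} f(\vec{v}(t)) = \nabla_{\vec{v}'(t)} f(\vec{v}(t))$$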

Consider why this is the case. Recall that the composition of functions can be thought of as a series of transformations:

  1. Start with a number line for $t$.
  2. Transform to a plane with $\vec{v}(t)$.
  3. Transform to a number line with $f(\vec{v}(t))$.

When you nudge a point in the plane by a vector and measure the resulting change in $f$, that's a directional derivative.

Here, the nudge vector is caused by the change in $t$: a small change $dt$ moves $\vec{v}(t)$ by approximately $\vec{v}'(t)\,dt$. So, the directional derivative is in the direction of $\vec{v}'(t)$.
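To see the duality numerically, the sketch below assumes the hypothetical example $f(x, y) = x^2 y$ along $\vec{v}(t) = (\cos t, \sin t)$. It computes a directional derivative at $\vec{v}(t)$ along the unnormalized velocity $\vec{v}'(t)$ and compares it with the derivative of $f$ along the curve. The helper `directional_derivative` is an illustrative name. It uses the symmetric central-difference form of the limit rather than the one-sided one.

```python
import math

# Hypothetical example: f(x, y) = x^2 y along v(t) = (cos t, sin t),
# whose velocity is v'(t) = (-sin t, cos t).


def f(x, y):
    return x ** 2 * y


def directional_derivative(func, a, w, h=1e-6):
    # Symmetric (central-difference) form of the limit definition,
    # using the direction vector w as-is, without normalizing it.
    ax, ay = a
    wx, wy = w
    return (func(ax + h * wx, ay + h * wy) - func(ax - h * wx, ay - h * wy)) / (2 * h)


t = 0.9
point = (math.cos(t), math.sin(t))       # v(t): where the curve is in the plane
nudge = (-math.sin(t), math.cos(t))      # v'(t): the direction a change in t moves it

along_nudge = directional_derivative(f, point, nudge)
along_curve = (f(math.cos(t + 1e-6), math.sin(t + 1e-6))
               - f(math.cos(t - 1e-6), math.sin(t - 1e-6))) / 2e-6
print(along_nudge, along_curve)
```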

Formalizing the Multivariable Chain Rule

We have shown various ways to intuitively think about the multivariable chain rule, but let's treat it more formally now.

Recall that we used the cancellation of differentials to derive the chain rule. This is not rigorous, but it is still helpful because it very closely resembles the formal treatment.

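Recall the vector form of the multivariable chain rule:

$$\frac{d}{dt} f(\vec{v}(t)) = \nabla f(\vec{v}(t)) \cdot \vec{v}'(t)$$

And the limit definition of the derivative (since $f(\vec{v}(t))$ is a single-variable function of $t$):

$$\frac{d}{dt} f(\vec{v}(t)) = \lim_{h \to 0} \frac{f(\vec{v}(t+h)) - f(\vec{v}(t))}{h}$$

Recall our intuition for the chain rule: the change in $f$ is based on the change in $\vec{v}$, which is based on the change in $t$. As such, consider the derivative of $\vec{v}$:

$$\vec{v}'(t) = \lim_{h \to 0} \frac{\vec{v}(t+h) - \vec{v}(t)}{h}$$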

Now we're going to do something that might be unfamiliar: we'll drop the limit and rewrite the difference quotient as a sum of two terms:

$$\frac{\vec{v}(t+h) - \vec{v}(t)}{h} = \vec{v}'(t) + \vec{E}(h)$$

The term $\vec{E}(h)$ is the error term: the difference between the difference quotient and its limit $\vec{v}'(t)$. It approaches $\vec{0}$ as $h$ approaches zero. This kind of bookkeeping is common in epsilon-delta proofs in real analysis.

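Multiply both sides by $h$:

$$\vec{v}(t+h) - \vec{v}(t) = h\,\vec{v}'(t) + h\,\vec{E}(h)$$

Rewrite $h\,\vec{E}(h)$ as $o(h)$. This is common notation in analysis, and all it represents is a term that shrinks faster than $h$, meaning $\frac{o(h)}{h} \to 0$ as $h \to 0$.

Then, rewrite $\vec{v}(t+h)$ as:

$$\vec{v}(t+h) = \vec{v}(t) + h\,\vec{v}'(t) + o(h)$$

This is based on the definition of the derivative as a slope. We apply the slope $\vec{v}'(t)$ to the step $h$ to get the change in $\vec{v}$, and then add the error term.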
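The claim that the error term vanishes can be observed directly. This sketch assumes the hypothetical curve $\vec{v}(t) = (\cos t, \sin t)$, whose derivative $\vec{v}'(t) = (-\sin t, \cos t)$ is known exactly. The helper `error_size` is an illustrative name that measures the length of the error term $\vec{E}(h)$ for shrinking steps $h$.

```python
import math

# Hypothetical curve v(t) = (cos t, sin t) with known derivative v'(t) = (-sin t, cos t).


def v(t):
    return (math.cos(t), math.sin(t))


def v_prime(t):
    return (-math.sin(t), math.cos(t))


def error_size(t, h):
    # Length of E(h) = (v(t + h) - v(t)) / h - v'(t); equivalently ||o(h)|| / h.
    (x1, y1), (x0, y0), (dx, dy) = v(t + h), v(t), v_prime(t)
    return math.hypot((x1 - x0) / h - dx, (y1 - y0) / h - dy)


for h in (1e-1, 1e-2, 1e-3, 1e-4):
    print(h, error_size(0.5, h))  # shrinks roughly in proportion to h
```

For this particular curve the error is about $h/2$, so $h\,\vec{E}(h)$ shrinks like $h^2$, comfortably faster than $h$.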

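Substitute this back into the definition of the derivative of $f(\vec{v}(t))$:

$$\frac{d}{dt} f(\vec{v}(t)) = \lim_{h \to 0} \frac{f(\vec{v}(t) + h\,\vec{v}'(t) + o(h)) - f(\vec{v}(t))}{h}$$

Since $f$ is differentiable, the extra nudge of size $o(h)$ changes the value of $f$ by only $o(h)$, and $o(h)/h \to 0$ as $h \to 0$. So the $o(h)$ term does not affect the limit, and we can drop it:

$$\frac{d}{dt} f(\vec{v}(t)) = \lim_{h \to 0} \frac{f(\vec{v}(t) + h\,\vec{v}'(t)) - f(\vec{v}(t))}{h}$$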
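A small experiment makes the "drop the $o(h)$ term" step concrete. The sketch assumes the hypothetical example $f(x, y) = x^2 y$ along $\vec{v}(t) = (\cos t, \sin t)$. The helper `quotient_gap` is an illustrative name. It compares the difference quotient that uses the true point $\vec{v}(t+h)$ with the one that uses only the linear nudge $\vec{v}(t) + h\,\vec{v}'(t)$. Their gap goes to zero with $h$, so both quotients share the same limit.

```python
import math

# Hypothetical example: f(x, y) = x^2 y along v(t) = (cos t, sin t).


def f(x, y):
    return x ** 2 * y


t = 1.2
x0, y0 = math.cos(t), math.sin(t)    # v(t)
vx, vy = -math.sin(t), math.cos(t)   # v'(t)


def quotient_gap(h):
    # Difference quotient using the true point v(t + h) = v(t) + h v'(t) + o(h) ...
    with_error = (f(math.cos(t + h), math.sin(t + h)) - f(x0, y0)) / h
    # ... versus the quotient with the o(h) term dropped (linear nudge only).
    without_error = (f(x0 + h * vx, y0 + h * vy) - f(x0, y0)) / h
    return abs(with_error - without_error)


for h in (1e-1, 1e-2, 1e-3):
    print(h, quotient_gap(h))  # the gap goes to 0 along with h
```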

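Finally, recall the limit definition of the directional derivative of $f$ at $\vec{a}$ along $\vec{w}$:

$$\nabla_{\vec{w}} f(\vec{a}) = \lim_{h \to 0} \frac{f(\vec{a} + h\,\vec{w}) - f(\vec{a})}{h}$$

Look familiar? With $\vec{a} = \vec{v}(t)$ and $\vec{w} = \vec{v}'(t)$, this matches the limit we just derived, so the multivariable chain rule is exactly the directional derivative of $f$ in the direction of $\vec{v}'(t)$:

$$\frac{d}{dt} f(\vec{v}(t)) = \nabla_{\vec{v}'(t)} f(\vec{v}(t)) = \nabla f(\vec{v}(t)) \cdot \vec{v}'(t)$$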

This also illustrates the power of vector notation, as well as the interplay between intuition and formalization: the entire manipulation was a formal way of tracking the different nudges.