
Extending the Derivative: Part 1 (Old Notes)

In single-variable calculus, the derivative is defined for functions with a single input and a single output. However, many functions in mathematics and physics have multiple inputs and multiple outputs. For these, we can extend the concept of the derivative to functions of multiple variables.


Second Partial Derivatives

We can take partial derivatives of a function more than once.

Consider the following function:

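A function of two variables $f(x, y)$ has two first partial derivatives, and each of them has two partial derivatives of its own, giving four second partial derivatives. Recall that the Leibniz notation for an ordinary second derivative is $\frac{d^2 f}{dx^2}$. For a second partial derivative:

  • If it's the same variable for both derivatives, we write $\frac{\partial^2 f}{\partial x^2}$.
  • If the variables differ (say $x$ first, then $y$), we write it as an expansion of $\frac{\partial}{\partial y}\left(\frac{\partial f}{\partial x}\right)$, which is $\frac{\partial^2 f}{\partial y \, \partial x}$.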

We can show the different derivatives using a tree.

Let's first compute the first partial derivatives of $f$. The terms held constant are shown in different colors to help follow the computation.

And the partial derivative with respect to $y$:

Now, let's compute the second partial derivatives.


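Notice something interesting: $\frac{\partial^2 f}{\partial y \, \partial x} = \frac{\partial^2 f}{\partial x \, \partial y}$. The order in which the partial derivatives are taken does not matter here. This property, which many (but not all) functions have, is called the symmetry of second derivatives.

This property is made precise by Schwarz's theorem (also known as Clairaut's theorem), which states that if the second partial derivatives of a function are continuous on an open region, then the mixed partial derivatives are equal there. This theorem is discussed in the appendix.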
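To see the symmetry numerically, here is a minimal Python sketch. It uses a hypothetical function, $f(x, y) = x^2 y^3 + \sin(xy)$ (not the function from the example above), whose first partial derivatives were worked out by hand. It then differentiates $\frac{\partial f}{\partial x}$ with respect to $y$ and $\frac{\partial f}{\partial y}$ with respect to $x$ using central differences. The helper name `central_diff` and the step size are illustrative choices.

```python
import math

# Hypothetical example function (not the one from the text): f(x, y) = x^2 y^3 + sin(xy)
def f(x, y):
    return x**2 * y**3 + math.sin(x * y)

# First partial derivatives of f, worked out by hand
def f_x(x, y):
    return 2 * x * y**3 + y * math.cos(x * y)

def f_y(x, y):
    return 3 * x**2 * y**2 + x * math.cos(x * y)

def central_diff(g, t, h=1e-5):
    """Approximate g'(t) with a central difference."""
    return (g(t + h) - g(t - h)) / (2 * h)

def mixed_partials(x, y):
    # d/dy of df/dx: hold x fixed, vary y
    f_xy = central_diff(lambda s: f_x(x, s), y)
    # d/dx of df/dy: hold y fixed, vary x
    f_yx = central_diff(lambda s: f_y(s, y), x)
    return f_xy, f_yx

f_xy, f_yx = mixed_partials(1.2, -0.7)
print(f_xy, f_yx)  # the two values agree to several decimal places
```

Both orders should match the exact mixed partial, $6xy^2 + \cos(xy) - xy\sin(xy)$, up to a small finite-difference error.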

The Gradient

There are a few ways to think about the gradient of a function.

Purely computationally, the gradient is essentially a collection of all the partial derivatives of a function.

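So for a function $f(x, y)$, the gradient is the vector of both partial derivatives:

$$\begin{bmatrix} \frac{\partial f}{\partial x} \\ \frac{\partial f}{\partial y} \end{bmatrix}$$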

Consider the function . The gradient of is then:

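The gradient is denoted $\nabla f$, which is pronounced "nabla f" or "del f".

More generally, for a function of $n$ variables $f(x_1, x_2, \ldots, x_n)$, we can define the gradient as:

$$\nabla f = \begin{bmatrix} \frac{\partial f}{\partial x_1} \\ \frac{\partial f}{\partial x_2} \\ \vdots \\ \frac{\partial f}{\partial x_n} \end{bmatrix}$$

Notice that the gradient is a vector, and its dimension matches the number of inputs to the function. For a function of two variables, in terms of the basis vectors $\hat{\imath}$ and $\hat{\jmath}$, the gradient can be written as:

$$\nabla f = \frac{\partial f}{\partial x}\hat{\imath} + \frac{\partial f}{\partial y}\hat{\jmath}$$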
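Since the gradient is just the vector of partial derivatives, it can be approximated numerically by nudging one input at a time. The sketch below is a minimal Python version using central differences; the function `gradient`, the step size `h`, and the sample function $f(x, y) = x^2 y$ are assumptions made for illustration.

```python
def gradient(f, point, h=1e-6):
    """Approximate the gradient of a scalar function f at `point`
    with central differences, one partial derivative per input."""
    grad = []
    for i in range(len(point)):
        forward = list(point)
        backward = list(point)
        forward[i] += h    # nudge only the i-th input...
        backward[i] -= h   # ...while holding the others constant
        grad.append((f(*forward) - f(*backward)) / (2 * h))
    return grad

# Hypothetical example: f(x, y) = x^2 y, whose gradient is (2xy, x^2)
def f(x, y):
    return x**2 * y

print(gradient(f, (3.0, 2.0)))  # approximately [12.0, 9.0]
```

Because the loop runs over however many inputs the point has, the same helper works for functions of $n$ variables, matching the general definition.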

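Recall that a partial derivative is an "incomplete" measure of the rate of change: it only accounts for a change in one input variable. In this sense, the gradient can be thought of as the "full" derivative of a function of multiple variables, since it collects the rates of change along every input.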

The Nabla

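One convenient way to think about the gradient is to treat the symbol $\nabla$ as a vector of partial derivative operators. It's easier to understand this with an example. For a function $f(x, y)$, $\nabla$ is:

$$\nabla = \begin{bmatrix} \frac{\partial}{\partial x} \\ \frac{\partial}{\partial y} \end{bmatrix}$$

Then, the gradient of $f$ is obtained by "multiplying" this vector by $f$, which means applying each operator to $f$:

$$\nabla f = \begin{bmatrix} \frac{\partial}{\partial x} \\ \frac{\partial}{\partial y} \end{bmatrix} f = \begin{bmatrix} \frac{\partial f}{\partial x} \\ \frac{\partial f}{\partial y} \end{bmatrix}$$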

Gradients in the Context of Graphs

Outside of the computational context, there's a graphical way to think about the gradient. Consider this simple function , and its gradient:

We can plot this as a vector field.

In this graph, the gradient is represented as a vector field, more commonly known as a "gradient field".

One thing to note is that the gradient points in the "direction of the steepest ascent" of the function. So if you were to walk along the surface of the function, you would be walking in the direction where the function increases the fastest. This is not immediately obvious, but will become more apparent in light of directional derivatives.

Gradient in Contour Plots

It's important to understand how the gradient relates to contour plots.

Consider the function . The contour plot of this function is:

The gradient of is:

Like before, we can plot this as a gradient field along with the contour plot:

The important thing to notice is that the gradient vectors appear to point perpendicular to the contour lines. To see why this is the case, zoom in on two contour lines:

Recall that the gradient points in the direction of steepest ascent. Instead of thinking of the steepest ascent, consider the direction in which the function increases from the value of one contour line to the value of the next in the shortest distance. This amounts to finding the shortest path between the two contour lines. When zoomed in far enough, the two contour lines look like nearly parallel straight lines, and the shortest path between two parallel lines is perpendicular to both of them. Therefore, the gradient vector is perpendicular to the contour lines.
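The perpendicularity can also be checked numerically. This minimal sketch assumes the hypothetical function $f(x, y) = x^2 + y^2$, whose contour lines are circles. It estimates the tangent direction of one contour line with a central difference and takes its dot product with the gradient; a dot product near zero means the two vectors are perpendicular.

```python
import math

# Hypothetical example: f(x, y) = x^2 + y^2, whose contour lines are circles
def grad_f(x, y):
    return (2 * x, 2 * y)

def contour_point(radius, t):
    """Point at angle t on the contour f = radius^2."""
    return (radius * math.cos(t), radius * math.sin(t))

def contour_tangent(radius, t, h=1e-6):
    """Tangent direction of the contour, via a central difference in t."""
    (x1, y1), (x0, y0) = contour_point(radius, t + h), contour_point(radius, t - h)
    return ((x1 - x0) / (2 * h), (y1 - y0) / (2 * h))

radius, t = 1.5, 0.8
x, y = contour_point(radius, t)
gx, gy = grad_f(x, y)
tx, ty = contour_tangent(radius, t)
dot = gx * tx + gy * ty
print(dot)  # close to 0: the gradient is perpendicular to the contour line
```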

The Directional Derivative

The directional derivative is somewhat of an extension or generalization of the partial derivative.

Consider a function $f(x, y)$ that outputs a single value. It can be thought of as mapping a point in a 2D plane to a point on a number line.

When we consider the partial derivative with respect to $x$, we consider a change in $x$ alone, and likewise for $y$.

Now, instead of thinking of these individually, consider a change in both $x$ and $y$ at the same time. For instance, if the input $(x, y)$ is incremented by some vector $\vec{v}$, how much would $f$ change?

Recall that the derivative takes a limit as this change approaches $0$. So instead of thinking about an actual vector $\vec{v}$, we're really thinking about a tiny nudge $h\vec{v}$, where $h \to 0$.

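For instance, consider this vector:

$$\vec{v} = \begin{bmatrix} 1 \\ -2 \end{bmatrix}$$

Since we're considering a tiny nudge along this vector, scaled by some $h \to 0$, we can think of this as:

$$h\vec{v} = \begin{bmatrix} h \\ -2h \end{bmatrix}$$

It can be thought of as one nudge in the $x$ direction and negative two nudges in the $y$ direction. Common notations for this directional derivative include:

$$\nabla_{\vec{v}} f, \qquad D_{\vec{v}} f, \qquad \frac{\partial f}{\partial \vec{v}}$$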

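To evaluate this directional derivative, we can combine the partial derivatives, weighted by the components of $\vec{v}$:

$$\nabla_{\vec{v}} f = 1 \cdot \frac{\partial f}{\partial x} + (-2) \cdot \frac{\partial f}{\partial y}$$

Notice that this resembles the dot product of the gradient and $\vec{v}$:

$$\nabla_{\vec{v}} f = \nabla f \cdot \vec{v} = \begin{bmatrix} \frac{\partial f}{\partial x} \\ \frac{\partial f}{\partial y} \end{bmatrix} \cdot \begin{bmatrix} 1 \\ -2 \end{bmatrix}$$

For the directional derivative to equal the slope of the tangent line in that direction, the vector $\vec{v}$ must be a unit vector. Otherwise, the result is scaled by the magnitude $\lVert \vec{v} \rVert$.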
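As a sanity check of the dot-product formula, the following minimal sketch compares $\nabla f \cdot \vec{v}$ against a difference quotient that nudges the input along $\vec{v}$ directly, using the vector $(1, -2)$ (one nudge in $x$, two negative nudges in $y$). The function $f(x, y) = x^2 y + \sin y$ and the point $(1, 2)$ are hypothetical choices. It also normalizes the vector to show that the non-unit result is larger by a factor of $\lVert \vec{v} \rVert = \sqrt{5}$.

```python
import math

# Hypothetical example function and its hand-computed gradient
def f(x, y):
    return x**2 * y + math.sin(y)

def grad_f(x, y):
    return (2 * x * y, x**2 + math.cos(y))

def directional_dot(x, y, v):
    """Directional derivative computed as the dot product grad f . v."""
    gx, gy = grad_f(x, y)
    return gx * v[0] + gy * v[1]

def directional_nudge(x, y, v, h=1e-6):
    """Directional derivative from nudging the input by +/- h*v (central difference)."""
    ahead = f(x + h * v[0], y + h * v[1])
    behind = f(x - h * v[0], y - h * v[1])
    return (ahead - behind) / (2 * h)

v = (1.0, -2.0)                      # one nudge in x, two negative nudges in y
length = math.hypot(*v)              # sqrt(5)
u = (v[0] / length, v[1] / length)   # unit vector in the same direction

along_v = directional_dot(1.0, 2.0, v)
along_u = directional_dot(1.0, 2.0, u)   # the actual slope in that direction
print(along_v, directional_nudge(1.0, 2.0, v))
print(along_u, along_v / length)
```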

Formalizing the Directional Derivative

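We can formalize the directional derivative as a limit. Recall that the partial derivative with respect to $x$ at a point $(x_0, y_0)$ is defined as:

$$\frac{\partial f}{\partial x}(x_0, y_0) = \lim_{h \to 0} \frac{f(x_0 + h, y_0) - f(x_0, y_0)}{h}$$

We can switch up the notation a bit to make it more general. Instead of $x_0$ and $y_0$, we can use a vector $\vec{a} = (x_0, y_0)$ as the input to $f$. Then, the partial derivative is:

$$\frac{\partial f}{\partial x}(\vec{a}) = \lim_{h \to 0} \frac{f(\vec{a} + h\hat{\imath}) - f(\vec{a})}{h}$$

Notice that our change in the input is now $h\hat{\imath}$, where $\hat{\imath}$ is the unit vector in the $x$ direction. The reason we use this notation is that it is much easier to extend: all the information about the direction of the change is contained in $\hat{\imath}$, so we can change the direction of the derivative simply by swapping in a different vector.

So, the directional derivative in the direction of $\vec{v}$ is defined as:

$$\nabla_{\vec{v}} f(\vec{a}) = \lim_{h \to 0} \frac{f(\vec{a} + h\vec{v}) - f(\vec{a})}{h}$$

To visualize a directional derivative, consider once again the input space of $f$, which is a 2D plane. Instead of the input being $(x_0, y_0)$, we can think of it as a vector $\vec{a}$. Then, the directional derivative is the rate of change of $f$ at $\vec{a}$ in the direction of $\vec{v}$: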

The Directional Derivative in the Context of Graphs

Recall that the partial derivative can be visualized as a slice of a surface in a 3D graph. This slice can be taken in two directions, along the $x$-axis or along the $y$-axis.

In a similar way, the directional derivative can be visualized as a slice of the surface in the direction of a vector $\vec{v}$.

Consider the following function:

Suppose we want to find the directional derivative of at the point in the direction of .

Since derivatives are graphically represented as slopes, we need to normalize the direction into a unit vector. Hence:

Let's slice the graph of at , with the direction of :

We can evaluate the directional derivative by first finding the gradient of at :

Recall that the directional derivative is the dot product of the gradient and :

To illustrate why the vector has to be a unit vector, consider the same directional derivative but with a different magnitude :

Therefore, the slope can be defined as the rate of change of the function in the direction of the unit vector :

The Gradient and the Directional Derivative

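Recall the formula for the directional derivative:

$$\nabla_{\vec{v}} f = \nabla f \cdot \vec{v}$$

Let's assume $\vec{v}$ is a unit vector. Then, consider the direction of steepest ascent at some point $\vec{a}$. Finding it amounts to finding the unit vector $\vec{v}$ that maximizes $\nabla f(\vec{a}) \cdot \vec{v}$ over all unit vectors.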

Let's once again consider the input space of , which is a 2D plane.

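Recall that the dot product of two vectors $\vec{u}$ and $\vec{w}$ is equal to:

$$\vec{u} \cdot \vec{w} = \lVert \vec{u} \rVert \lVert \vec{w} \rVert \cos\theta$$

where $\theta$ is the angle between them. Since $\lVert \vec{v} \rVert = 1$, we have $\nabla f \cdot \vec{v} = \lVert \nabla f \rVert \cos\theta$, which is the signed length of the projection of $\nabla f$ onto $\vec{v}$. To maximize the dot product, this projection must be maximized, which happens when $\cos\theta = 1$, that is, when $\vec{v}$ points in the direction of $\nabla f$. Hence:

$$\vec{v} = \frac{\nabla f}{\lVert \nabla f \rVert}$$

This means the gradient points in the direction of steepest ascent, and the rate of increase in that direction is the magnitude of the gradient:

$$\max_{\lVert \vec{v} \rVert = 1} \nabla_{\vec{v}} f = \lVert \nabla f \rVert$$

One important takeaway is that the gradient is more than a list of partial derivatives: combined with the dot product, it tells us how the function changes in every direction.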
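The maximization argument can be checked by brute force. This minimal sketch assumes a hypothetical gradient $\nabla f(\vec{a}) = (3, 4)$ at some point, evaluates the directional derivative along 3600 evenly spaced unit vectors, and keeps the largest. The best direction should be close to $\nabla f / \lVert \nabla f \rVert = (0.6, 0.8)$, with a maximum close to $\lVert \nabla f \rVert = 5$.

```python
import math

grad = (3.0, 4.0)  # hypothetical gradient of f at some point a

def directional(theta):
    """Directional derivative along the unit vector (cos theta, sin theta)."""
    return grad[0] * math.cos(theta) + grad[1] * math.sin(theta)

# Try 3600 evenly spaced unit directions and keep the steepest one
angles = [2 * math.pi * k / 3600 for k in range(3600)]
best = max(angles, key=directional)

print(math.cos(best), math.sin(best))  # close to grad / |grad| = (0.6, 0.8)
print(directional(best))               # close to |grad| = 5
```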

Differentiating Vector-Valued Functions

So far, we've been dealing with functions that output a single value. Now, we consider functions that output a vector.

Consider the following parametric function:

This can be written as a vector-valued function:

Vector-valued functions, like many other functions, can be thought of as transformations. In this case, the function transforms a point on a number line into a point in a 2D plane.

Consider taking the derivative of this function. Recall that by taking the derivative, you increase the input by a small amount, and see how the output changes. A drawing of a parametric function can help illustrate this:

The change in the input is and the change in the output is .

Therefore, the derivative can be written as:

Since , the derivative can be computed as:

This is essentially the same as taking the derivative of a function that outputs a single value, but for each component of the vector.
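Because the derivative is taken component by component, a numerical version only needs to differentiate each output coordinate separately. This is a minimal sketch with a hypothetical curve $(t^2, \sin t)$; the helper name `derivative` and the step size are illustrative.

```python
import math

# Hypothetical vector-valued function: maps a number t to the point (t^2, sin t)
def r(t):
    return (t**2, math.sin(t))

def derivative(vec_f, t, h=1e-6):
    """Differentiate each output component separately with a central difference."""
    ahead, behind = vec_f(t + h), vec_f(t - h)
    return tuple((a - b) / (2 * h) for a, b in zip(ahead, behind))

print(derivative(r, 1.0))  # approximately (2.0, 0.5403), i.e. (2t, cos t) at t = 1
```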

Another way to think about this is to consider the change in visually:

Since , as , , and:

This can be thought of as the implicit differentiation of vector-valued functions.

The Magnitude of the Parametric Derivative

The magnitude of the parametric derivative measures how fast the output point moves along the curve, that is, its speed. To illustrate this, consider two different vector-valued functions:

First, let's graph :

Notice that, at , the curve goes halfway around the unit circle. Now consider :

In this case, at , the curve goes halfway around the unit circle. So it's twice as fast as , if you consider as physical time.

Let's consider the derivatives of these functions:

Let's evaluate the derivatives at certain points and plot them:

The important takeaway is that even when two curves trace the same path and their tangent lines have the same slope at a given point, the magnitudes of their derivatives (their speeds) can be different.
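The difference in speed can be checked numerically. Since the two functions' formulas are not reproduced here, this minimal sketch assumes $(\cos t, \sin t)$ and $(\cos 2t, \sin 2t)$, which trace the unit circle at speeds $1$ and $2$. It compares both curves at the same point on the circle (angle $0.6$), where their velocity vectors are parallel but have different lengths.

```python
import math

# Assumed parameterizations: both trace the unit circle, r2 twice as fast as r1
def r1(t):
    return (math.cos(t), math.sin(t))

def r2(t):
    return (math.cos(2 * t), math.sin(2 * t))

def velocity(r, t, h=1e-6):
    """Componentwise central-difference derivative of a 2D curve."""
    (x1, y1), (x0, y0) = r(t + h), r(t - h)
    return ((x1 - x0) / (2 * h), (y1 - y0) / (2 * h))

angle = 0.6
v1 = velocity(r1, angle)        # r1 reaches this point at t = 0.6
v2 = velocity(r2, angle / 2)    # r2 reaches the same point at t = 0.3

speed1, speed2 = math.hypot(*v1), math.hypot(*v2)
parallel = v1[0] * v2[1] - v1[1] * v2[0]   # 2D cross product; 0 means the vectors are parallel
print(speed1, speed2, parallel)            # roughly 1, 2 and 0
```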

Multivariable Chain Rule

The chain rule is a fundamental concept in calculus, and it can be extended to multivariable functions.

Consider a function $f(x, y)$ and a pair of functions $x(t)$ and $y(t)$.

Then, consider applying $f$ to both:

$$f(x(t), y(t))$$

This can be thought of as a series of transformations:

  1. Start with a number line for $t$.
  2. Transform to a plane with $(x(t), y(t))$.
  3. Transform to a number line with $f(x(t), y(t))$.

Notice how, although a plane is involved, it still starts with a single number and ends with a single number. Therefore, it is still a single-variable function.

Next, consider taking the derivative of $f(x(t), y(t))$ with respect to $t$. To do so, we can apply the chain rule.

We can start with an example:

Then, consider the derivative of :

This is fine, but there's a more general way to think about this.

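Consider the partial derivatives of $f$:

Next, consider the derivatives of the functions $x(t)$ and $y(t)$:

Notice that the derivative of $f(x(t), y(t))$ can be written as:

$$\frac{d}{dt} f(x(t), y(t)) = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt}$$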

This is known as the multivariable chain rule.
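A quick numerical check of the multivariable chain rule: the minimal sketch below assumes the hypothetical choices $f(x, y) = x^2 y$, $x(t) = \cos t$ and $y(t) = \sin t$. It differentiates the composite $f(x(t), y(t))$ directly with a central difference and compares the result with $\frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt}$ built from hand-derived derivatives.

```python
import math

# Hypothetical example: f(x, y) = x^2 y, with x(t) = cos t and y(t) = sin t
def f(x, y):
    return x**2 * y

def composite(t):
    return f(math.cos(t), math.sin(t))

def chain_rule(t):
    x, y = math.cos(t), math.sin(t)
    df_dx, df_dy = 2 * x * y, x**2               # partial derivatives of f
    dx_dt, dy_dt = -math.sin(t), math.cos(t)     # derivatives of x(t) and y(t)
    return df_dx * dx_dt + df_dy * dy_dt

t, h = 0.7, 1e-6
numeric = (composite(t + h) - composite(t - h)) / (2 * h)  # differentiate the composite directly
print(numeric, chain_rule(t))  # the two values agree closely
```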

Intuition for the Multivariable Chain Rule

We've just used one example and noticed a (possibly coincidental) pattern, but this should also make intuitive sense.

First, recall the intuition for the regular chain rule.

Consider a composite function $f(g(x))$. This is essentially two transformations:

  1. Start with a number line for $x$.
  2. Transform to a number line with $g(x)$.
  3. Transform to a number line with $f(g(x))$.

Now, consider a small change in $x$, $dx$, and consider how the change "propagates" through the transformations:

  1. The change in $x$ is $dx$.
  2. The change in $g$ is $dg = \frac{dg}{dx}\,dx$, since the $dx$ in the fraction essentially cancels out.
  3. The change in $f$ is $df = \frac{df}{dg}\,dg$.

Then:

$$df = \frac{df}{dg}\frac{dg}{dx}\,dx$$

And dividing by $dx$ gives the chain rule:

$$\frac{df}{dx} = \frac{df}{dg}\frac{dg}{dx}$$

Let's extend this intuition to the multivariable case.

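  1. You start with a number line for $t$. The change in $t$ is $dt$.
  2. The change in $t$ causes a change in $x$ and $y$. The change in $x$ is $dx = \frac{dx}{dt}\,dt$ and the change in $y$ is $dy = \frac{dy}{dt}\,dt$, due to the cancelling of differentials.
  3. Both of these changes result in a change in $f$. You could think of this as the sum of a change in $f$ due to a change in $x$ and a change in $f$ due to a change in $y$.
    • The change in $f$ due to a change in $x$ is $\frac{\partial f}{\partial x}\,dx$.
    • The change in $f$ due to a change in $y$ is $\frac{\partial f}{\partial y}\,dy$.

Then:

$$df = \frac{\partial f}{\partial x}\frac{dx}{dt}\,dt + \frac{\partial f}{\partial y}\frac{dy}{dt}\,dt$$

And dividing by $dt$ gives:

$$\frac{df}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt}$$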

Vector Form of the Multivariable Chain Rule

The multivariable chain rule can be written in vector form.

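We've used $x(t)$ and $y(t)$ as separate functions, but we can also think of $f$ as a function that takes a vector as input.

The vector can be written as:

$$\vec{v}(t) = \begin{bmatrix} x(t) \\ y(t) \end{bmatrix}$$

Then, the derivative of $\vec{v}(t)$ can be written as:

$$\vec{v}'(t) = \begin{bmatrix} x'(t) \\ y'(t) \end{bmatrix}$$

Recall the multivariable chain rule:

$$\frac{d}{dt} f(\vec{v}(t)) = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt}$$

Notice that this is a dot product:

$$\frac{d}{dt} f(\vec{v}(t)) = \begin{bmatrix} \frac{\partial f}{\partial x} \\ \frac{\partial f}{\partial y} \end{bmatrix} \cdot \begin{bmatrix} x'(t) \\ y'(t) \end{bmatrix} = \nabla f(\vec{v}(t)) \cdot \vec{v}'(t)$$

This should also make intuitive sense, as it is very similar to the regular chain rule, $\frac{d}{dt} f(g(t)) = f'(g(t))\,g'(t)$:

  • $\nabla f$ corresponds to $f'$; the gradient is sort of an extension of the full derivative.
  • $\vec{v}'(t)$ corresponds to $g'(t)$.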

Duality of the Multivariable Chain Rule and the Directional Derivative

One thing to notice is that the multivariable chain rule looks very similar to the directional derivative.

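Recall the directional derivative:

$$\nabla_{\vec{w}} f = \nabla f \cdot \vec{w}$$

And the vector-form multivariable chain rule:

$$\frac{d}{dt} f(\vec{v}(t)) = \nabla f(\vec{v}(t)) \cdot \vec{v}'(t)$$

The vector-form rule is essentially the directional derivative of $f$ in the direction of $\vec{v}'(t)$:

$$\frac{d}{dt} f(\vec{v}(t)) = \nabla_{\vec{v}'(t)} f(\vec{v}(t))$$

Consider why this is the case. Recall that the composition of functions can be thought of as a series of transformations:

  1. Start with a number line for $t$.
  2. Transform to a plane with $\vec{v}(t)$.
  3. Transform to a number line with $f(\vec{v}(t))$.

When you nudge a point in the plane by some vector and measure the resulting change in $f$, that's a directional derivative.

The nudge in question is caused by the change in $t$: a small change $dt$ moves the point $\vec{v}(t)$ by approximately $\vec{v}'(t)\,dt$. So, the directional derivative is in the direction of $\vec{v}'(t)$.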
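The duality can be checked numerically. This minimal sketch assumes a hypothetical setup, $f(x, y) = x^2 y$ along the ellipse $\vec{v}(t) = (2\cos t, \sin t)$. It computes $\frac{d}{dt} f(\vec{v}(t))$ by differentiating the composite, and separately computes a directional derivative from its limit definition by nudging the point $\vec{v}(t)$ along $\vec{v}'(t)$. Note that $\vec{v}'(t)$ is deliberately not normalized, since the chain rule uses the velocity vector as it is.

```python
import math

# Hypothetical example: f(x, y) = x^2 y along the ellipse v(t) = (2 cos t, sin t)
def f(x, y):
    return x**2 * y

def v(t):
    return (2 * math.cos(t), math.sin(t))

def v_prime(t):
    return (-2 * math.sin(t), math.cos(t))   # not a unit vector in general

def directional_derivative(point, w, h=1e-6):
    """Limit definition: nudge the point by +/- h*w and measure the change in f."""
    (x, y), (wx, wy) = point, w
    return (f(x + h * wx, y + h * wy) - f(x - h * wx, y - h * wy)) / (2 * h)

t, h = 0.7, 1e-6
composite_rate = (f(*v(t + h)) - f(*v(t - h))) / (2 * h)   # d/dt f(v(t))
along_velocity = directional_derivative(v(t), v_prime(t))  # directional derivative along v'(t)
print(composite_rate, along_velocity)  # equal up to finite-difference error
```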

Formalizing the Multivariable Chain Rule

We have shown various ways to intuitively think about the multivariable chain rule, but let's treat it more formally now.

Recall that we used the cancellation of differentials to derive the chain rule. This is not rigorous, but it is still helpful because it very closely resembles the formal treatment.

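Recall the vector form of the multivariable chain rule:

$$\frac{d}{dt} f(\vec{v}(t)) = \nabla f(\vec{v}(t)) \cdot \vec{v}'(t)$$

And the limit definition of the derivative (since this is a single-variable function):

$$\frac{d}{dt} f(\vec{v}(t)) = \lim_{h \to 0} \frac{f(\vec{v}(t + h)) - f(\vec{v}(t))}{h}$$

Recall our intuition for the chain rule: the change in $f$ is based on the change in $\vec{v}$, which is based on the change in $t$. As such, consider the derivative of $\vec{v}$:

$$\vec{v}'(t) = \lim_{h \to 0} \frac{\vec{v}(t + h) - \vec{v}(t)}{h}$$

Now we're going to do something that might be unfamiliar: we're going to rewrite the difference quotient inside the limit as a sum of two terms:

$$\frac{\vec{v}(t + h) - \vec{v}(t)}{h} = \vec{v}'(t) + \vec{E}(h)$$

The term $\vec{E}(h)$ represents the error term, which is the difference between the limit and the actual difference quotient. It approaches zero as $h$ approaches zero. This kind of manipulation is common in epsilon-delta proofs in real analysis.

Multiply both sides by $h$:

$$\vec{v}(t + h) - \vec{v}(t) = h\,\vec{v}'(t) + h\,\vec{E}(h)$$

Rewrite $h\,\vec{E}(h)$ as $o(h)$. This is common notation in analysis (little-o notation), and all it represents is a term satisfying $\frac{o(h)}{h} \to 0$ as $h \to 0$.

Then, rewrite $\vec{v}(t + h)$ as:

$$\vec{v}(t + h) = \vec{v}(t) + h\,\vec{v}'(t) + o(h)$$

This is based on the definition of the derivative as a slope. We apply the slope $\vec{v}'(t)$ to the step $h$ to get the change in $\vec{v}$, and then add the error term.

Substitute this back into the definition of the derivative of $f(\vec{v}(t))$:

$$\frac{d}{dt} f(\vec{v}(t)) = \lim_{h \to 0} \frac{f\big(\vec{v}(t) + h\,\vec{v}'(t) + o(h)\big) - f(\vec{v}(t))}{h}$$

Since $o(h)$ shrinks faster than $h$ and $f$ is assumed to be differentiable, the $o(h)$ term does not affect the limit, so we can drop it:

$$\frac{d}{dt} f(\vec{v}(t)) = \lim_{h \to 0} \frac{f\big(\vec{v}(t) + h\,\vec{v}'(t)\big) - f(\vec{v}(t))}{h}$$

Finally, recall the definition of the directional derivative:

$$\nabla_{\vec{w}} f(\vec{a}) = \lim_{h \to 0} \frac{f(\vec{a} + h\vec{w}) - f(\vec{a})}{h}$$

Looks familiar? The multivariable chain rule is essentially the directional derivative of $f$ in the direction of $\vec{v}'(t)$:

$$\frac{d}{dt} f(\vec{v}(t)) = \nabla_{\vec{v}'(t)} f(\vec{v}(t))$$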

This also illustrates the power of using vectors, as well as an interplay between intuition and formalization - our entire manipulation was to evaluate the different nudges in a formal way.

Curvature

In an intuitive sense, curvature measures how much a curve deviates from a straight line.

A good way to think about curvature is to consider a circle for comparison.

Consider the following graph of this function:

The radius of this circle, known as the radius of curvature, measures how much the curve actually "curves".

Curvature is defined as the reciprocal of the radius of curvature $R$. Its symbol is $\kappa$ (kappa):

$$\kappa = \frac{1}{R}$$

The reciprocal is taken because as the radius increases, the curve becomes straighter, so the curvature should decrease.

In other words, small radius, high curvature; large radius, low curvature.

Next, we shall describe this in a more quantitative manner.

Deriving a Formula for Curvature

This will take quite a lot of work, but essentially it boils down to finding the unit tangent vector, then taking its derivative with respect to arc length.

Laying the Groundwork

Below is the same graph, this time with the tangent vector drawn in:

The tangent vector drawn here is a unit vector, so it always has a magnitude of $1$.

To evaluate the curvature, consider how much the tangent vector changes as you move along the curve.

Plot in a separate graph:

It should make intuitive sense that a higher curvature means the tangent vector changes more rapidly.

Next, we need to quantify this change. One might think of taking the derivative of the tangent vector with respect to $t$:

$$\frac{d\mathbf{T}}{dt}$$

This is a good start, but it's not quite right.

The problem is that curvature is a property of the curve's shape, not of time. It shouldn't matter how fast you're moving along the curve, only how much the curve is curving.

Instead, consider the derivative of the tangent vector with respect to arc length, $s$:

$$\frac{d\mathbf{T}}{ds}$$

So instead of considering how much the tangent vector changes as you increment $t$, consider how much it changes per unit of distance traveled along the curve.

Since curvature is a scalar, we take the magnitude of this derivative:

$$\kappa = \left\lVert \frac{d\mathbf{T}}{ds} \right\rVert$$

Now that we have derived a formula for curvature, let's evaluate it for a concrete curve.

The Circle and the Tangent

Previously, we explained the relationship between the curvature and the circle of curvature.

We will consider the case of a simple function:

This is just a circle of some radius $R$.

Simultaneously, we shall generalize our findings to any curve.

We can find the unit tangent vector for this curve; recall that the tangent vector is the derivative of the curve with respect to $t$:

The tangent is then:

We need to normalize the derivative to get the unit tangent vector:

We can evaluate the magnitude using the Pythagorean theorem:

Plugging this back in:

We can generalize this as well. Using the Pythagorean theorem, we can find the magnitude of the derivative of any curve:

Then, the unit tangent vector is:

We have now found a general formula for the unit tangent vector.

Recall that the curvature is defined by $\kappa = \left\lVert \frac{d\mathbf{T}}{ds} \right\rVert$. Hence, in the next part, we shall deal with the derivative with respect to $s$, the arc length.

The Arc Length

The curvature is defined by this derivative:

We can make use of some change-of-variables techniques to evaluate this derivative.

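The derivative $\frac{d\mathbf{T}}{ds}$ considers how changing the arc length affects the tangent vector.

Instead, consider how a change in $t$ affects both the arc length and the tangent vector, and then take the ratio of these changes. The rate of change of arc length, $\frac{ds}{dt}$, is the speed, which is the magnitude of the derivative of the curve.

Hence:

$$\kappa = \left\lVert \frac{d\mathbf{T}}{ds} \right\rVert = \frac{\left\lVert \frac{d\mathbf{T}}{dt} \right\rVert}{\frac{ds}{dt}}$$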

Recall the unit tangent vector for our circle example:

We can find the derivative of this vector with respect to :

The magnitude of this derivative is:

Then, the curvature is:

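This is a very important result: the curvature of a circle of radius $R$ is $\frac{1}{R}$ at every point. This agrees with the definition of curvature as the reciprocal of the radius of curvature, since the circle of curvature of a circle is the circle itself.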
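The circle result can be reproduced numerically by dividing $\lVert d\mathbf{T}/dt \rVert$ by the speed $ds/dt$. This minimal sketch assumes the parameterization $(R\cos t, R\sin t)$ with a hypothetical radius $R = 2.5$. It builds the unit tangent vector from a finite-difference derivative, differentiates that again, and divides by the speed.

```python
import math

R = 2.5  # hypothetical radius

def curve(t):
    # Assumed parameterization of a circle of radius R
    return (R * math.cos(t), R * math.sin(t))

def derivative(func, t, h=1e-5):
    """Componentwise central difference of a vector-valued function."""
    ahead, behind = func(t + h), func(t - h)
    return tuple((a - b) / (2 * h) for a, b in zip(ahead, behind))

def unit_tangent(t):
    vx, vy = derivative(curve, t)
    speed = math.hypot(vx, vy)
    return (vx / speed, vy / speed)

def curvature(t):
    # kappa = |dT/ds| = |dT/dt| / (ds/dt), where ds/dt is the speed |curve'(t)|
    dT_dt = derivative(unit_tangent, t)
    speed = math.hypot(*derivative(curve, t))
    return math.hypot(*dT_dt) / speed

print(curvature(1.0), 1 / R)  # both approximately 0.4
```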

With the curvature fully evaluated, we can now generalize this to any curve.

Generalizing to All Curves

To do this final step, we shall bring together all the formulas we have derived.

First, recall the formula for the curvature:

We have already found the unit tangent vector:

We can rewrite this in a way that is easier to differentiate:

Then, the derivative of the unit tangent vector is:

Because of the length of the expression, we shall use Lagrange (prime) notation, writing $x'$, $y'$, $x''$ and $y''$ for the derivatives of $x(t)$ and $y(t)$:

Note that although $(t)$ is not explicitly written, these are still functions of $t$.

Then, the derivative of the unit tangent vector is rewritten as:

Since this is a very lengthy expression, we shall make the substitution and :

Using the product rule, we can simplify this expression:

By the chain rule:

Now, we can evaluate the magnitude of this derivative:

We shall evaluate each component separately. The square of the $x$-component is:

Similarly, the square of the $y$-component is:

Then, the magnitude of the derivative is:

That was a lot of work, but we're almost there. Recall the formula for curvature:

We can evaluate the denominator using the Pythagorean theorem:

Then, the curvature is:

Substituting back in the definition of and :

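If we were to write this out using proper notation for functions of $t$:

$$\kappa = \frac{\left| x'(t)\,y''(t) - y'(t)\,x''(t) \right|}{\left( x'(t)^2 + y'(t)^2 \right)^{3/2}}$$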

And we're done.
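The final formula translates directly into code. This minimal sketch takes the four derivatives at a single value of $t$ as inputs (the name `curvature_2d` is illustrative) and checks it on two hypothetical curves: the parabola $(t, t^2)$, whose curvature at the vertex is $2$, and a circle of radius $4$, whose curvature should be $\frac{1}{4}$ everywhere.

```python
import math

def curvature_2d(dx, dy, ddx, ddy):
    """kappa = |x'y'' - y'x''| / (x'^2 + y'^2)^(3/2), given the derivatives at one value of t."""
    return abs(dx * ddy - dy * ddx) / (dx**2 + dy**2) ** 1.5

# Hypothetical check 1: the parabola (t, t^2) has x' = 1, y' = 2t, x'' = 0, y'' = 2
def parabola_curvature(t):
    return curvature_2d(1.0, 2 * t, 0.0, 2.0)

# Hypothetical check 2: a circle (R cos t, R sin t) should give 1/R everywhere
def circle_curvature(R, t):
    return curvature_2d(-R * math.sin(t), R * math.cos(t),
                        -R * math.cos(t), -R * math.sin(t))

print(parabola_curvature(0.0))     # 2.0, at the vertex of y = x^2
print(circle_curvature(4.0, 1.3))  # approximately 0.25
```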

Intuition for the Curvature Formula

The formula for curvature is quite complex, but it can be understood intuitively.

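First, consider the numerator:

$$\left| x'y'' - y'x'' \right|$$

This is equivalent to the magnitude of a certain cross product. For two vectors $\vec{a} = (a_1, a_2)$ and $\vec{b} = (b_1, b_2)$ in the plane, the magnitude of the cross product is $\left| a_1 b_2 - a_2 b_1 \right|$.

Hence:

$$\left| x'y'' - y'x'' \right| = \left\lVert \begin{bmatrix} x' \\ y' \end{bmatrix} \times \begin{bmatrix} x'' \\ y'' \end{bmatrix} \right\rVert$$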

Isn't the cross product a 3D operation?

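The cross product is indeed a 3D operation, meaning it's only defined for vectors in $\mathbb{R}^3$.

However, the cross product can be extended to 2D vectors by treating them as 3D vectors with a $z$-component of $0$.

Hence, the proper way to write the cross product is:

$$\begin{bmatrix} x' \\ y' \\ 0 \end{bmatrix} \times \begin{bmatrix} x'' \\ y'' \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ x'y'' - y'x'' \end{bmatrix}$$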

Instead of considering the cross product itself, which has a direction in 3D space, we consider the magnitude of the cross product, which is a scalar, and makes sense in 2D space.

We can consider what both and represent visually. Consider a curve of some parametric equation , and consider points on the curve:

For every point on the curve, consider the position vector pointing to it from the origin. The first derivative measures how this position vector changes.

Doing this at every point gives the tangent vector along the whole curve.

To visualize , consider plotting the tangent vectors separately, placed at the origin. Then, consider the change in these tangent vectors.

In essence, the first derivative measures the change in the tip of the vector function, while the second derivative measures the change in the tangent vector.

Consider the cross product of these two vectors. To illustrate what it means, consider tangent vectors on a curve with high curvature:

Notice that as $\Delta t$ approaches $0$, the change in the tangent vector is almost a pure rotation. Hence, the acceleration, which is the rate of change of the tangent vector, would be approximately perpendicular to the tangent vector.

In other words, the curvature is essentially how perpendicular the acceleration is to the tangent vector.

This should make you instantly think of the cross product, which is exactly what appears in the numerator of the curvature formula.

Recall that geometrically, the magnitude of the cross product is the area of the parallelogram formed by the two vectors.

As such, if they're more perpendicular, the area is larger, and hence the curvature is higher, which is exactly what the formula evaluates.

So, the curvature can be described as:

There's a problem, however.

Recall that the curvature is defined as the rate of change with respect to arc length, since the curvature is a property of the curve itself, not the parameterization. This would pose a problem to our makeshift formulation of the curvature, since a higher speed would imply a longer tangent vector, which would imply a higher curvature.

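For instance, if you double the speed (replacing $t$ with $2t$), the tangent vector doubles in length, but by the chain rule the acceleration quadruples. Their cross product therefore grows by a factor of $8$, even though the curve itself is unchanged.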

We can instead consider "shrinking" the tangent vector to a unit vector:

For the acceleration vector, however, we do not normalize it to a unit vector; instead, we divide it by the magnitude of the velocity vector, so that it corresponds to the unit tangent:

Recall that the second derivative tells you how much the tangent vector changes, and the first derivative tells you how much the curve changes. If we want to express everything in terms of the tangent vector, we should divide by the magnitude of the tangent vector.

Then the cross product of these two vectors is:

The magnitude of this cross product is equal to the magnitude of the derivative of the unit tangent vector with respect to $t$.

Recall that the curvature is the ratio of the magnitude of this derivative to the magnitude of the derivative of the curve:

Hence, the curvature is:

Example Problem: Curvature of a 3D Helix

Find the curvature of the helix given by this function:

Where and .

(Source)

Let's first visualize the helix:

In essence, it is a circle in the $xy$-plane, with the $z$-component increasing linearly.

Let's compute it.

Find the Unit Tangent Vector

Recall the formula for the unit tangent vector:

Hence, we need to first find the derivative of the curve:

Then, the magnitude of this derivative is:

Therefore, the unit tangent vector is:

Find the Derivative of the Unit Tangent Vector

We need to take what we have found and differentiate it with respect to $t$.

We can make a substitution to simplify calculations:

Then, the derivative is:

Find the Magnitude of the Derivative

The magnitude of this derivative can be found using the Pythagorean theorem:

Divide by the Magnitude of the Derivative of the Curve

Recall that the curvature is defined as:

The denominator can be evaluated by taking the magnitude of the derivative of the curve:

Hence, the final curvature is:
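As a cross-check, the cross-product form of the curvature derived earlier, $\kappa = \frac{\lVert \vec{r}\,' \times \vec{r}\,'' \rVert}{\lVert \vec{r}\,' \rVert^3}$, can be evaluated numerically. This minimal sketch assumes the standard helix parameterization $(a\cos t, a\sin t, bt)$ with hypothetical sample values $a = 3$ and $b = 4$; for that parameterization the curvature works out to the constant $\frac{a}{a^2 + b^2}$.

```python
import math

def cross(u, w):
    """3D cross product u x w."""
    return (u[1] * w[2] - u[2] * w[1],
            u[2] * w[0] - u[0] * w[2],
            u[0] * w[1] - u[1] * w[0])

def norm(vec):
    return math.sqrt(sum(c * c for c in vec))

# Assumed helix r(t) = (a cos t, a sin t, b t); a and b are hypothetical sample values
a, b = 3.0, 4.0

def helix_curvature(t):
    r1 = (-a * math.sin(t), a * math.cos(t), b)        # r'(t)
    r2 = (-a * math.cos(t), -a * math.sin(t), 0.0)     # r''(t)
    return norm(cross(r1, r2)) / norm(r1) ** 3         # |r' x r''| / |r'|^3

print(helix_curvature(0.5), a / (a**2 + b**2))  # both approximately 0.12
```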

Example Problem: Curvature of a Cycloid

The cycloid is a curve traced by a point on the rim of a circular wheel as the wheel rolls along a straight line. The equation of the cycloid is given by:

Find the curvature of the cycloid.

We will first visualize the cycloid:

This time we will use the explicit formula for the curvature:

Individually finding the derivatives:

Plugging these into the curvature formula:
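The same explicit formula can be evaluated numerically for the cycloid. This minimal sketch assumes the standard parameterization $x = r(t - \sin t)$, $y = r(1 - \cos t)$ with a hypothetical wheel radius $r = 1$, and compares the result with the closed form $\frac{1}{4r\lvert\sin(t/2)\rvert}$ that this parameterization simplifies to. The formula breaks down at the cusps ($t = 2\pi k$), where the velocity is zero.

```python
import math

r = 1.0  # hypothetical wheel radius

def cycloid_curvature(t):
    # Assumed parameterization: x = r(t - sin t), y = r(1 - cos t)
    dx, dy = r * (1 - math.cos(t)), r * math.sin(t)    # x', y'
    ddx, ddy = r * math.sin(t), r * math.cos(t)        # x'', y''
    return abs(dx * ddy - dy * ddx) / (dx**2 + dy**2) ** 1.5

for t in (0.5, math.pi, 2.0):
    closed_form = 1 / (4 * r * abs(math.sin(t / 2)))
    print(t, cycloid_curvature(t), closed_form)
```

At the top of an arch ($t = \pi$), both values are $\frac{1}{4r}$.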

Partial Derivatives of Vector-Valued Functions

In the previous sections, we have discussed the partial derivatives of a scalar-valued function.

In this section, we shall discuss the partial derivatives of a vector-valued function.

Partial Derivatives of Parametric Surfaces

Consider the following function :

We can find the partial derivatives of this function with respect to and . For instance, the partial derivative with respect to can be evaluated by keeping constant:

Carefully consider the dimensions of both the input and output of the function: the input is a 2D vector, and the output is a 3D vector.

Of course, we can't visualize the entire input-output relationship in a single graph, since that would require a 5D space.

However, it can be visualized by mapping inputs in a 2D plane to outputs in 3D space. This results in a surface in 3D space.

One common way to make this more convenient is to place the input plane inside the 3D space itself, in the $xy$-plane, instead of drawing it as a separate 2D plane.

Here, we start with a 2D plane in the $xy$-plane, and then transform each point to the output of the function. This is visualized by moving the input plane to form the surface.

Next, consider one particular point on the surface.

This point is visualized by seeing how a point on the input plane is transformed in the output surface. We can evaluate the output of the function at this point:

Next, recall the partial derivative with respect to :

Since the partial derivative with respect to means keeping constant, we can think of a line in the input plane where is constant:

This visualization gives an idea of how the output of the function changes as changes, while keeping constant. The entire line is essentially a line where is unchanged, whereas varies along the line.

For a different constant, you would get a different curve along the surface.

For the partial derivative, we visualize it by incrementing by a small amount, and seeing how the output changes. In the visualization, the increment is shown by a vector, and you can see the change in the output by seeing how the vector gets transformed:

Before the transformation, the vector represents the nudge in , .

After the transformation, it represents the change in the output, .

Taking the ratio, you get , and as , it becomes .

The magnitude of the output-change vector represents the magnitude of the derivative, i.e. how fast you are moving along the curve.

Using the above visualization, we can intuitively guess what the partial derivative is. By playing the animation and looking at the direction of the vector, we can see that it is pointing in the , , and directions.

Hence, the partial derivative at the point is:

We can now compute the partial derivative with respect to at this point:

This aligns with the intuition we had from the visualization.

Likewise, we can find the partial derivative with respect to by keeping constant:

Plugging in the values at :

The visualization for both partial derivatives is shown below:

Hopefully this provides an intuitive understanding of how partial derivatives work for parametric surfaces beyond just the algebraic manipulation.
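Numerically, a partial derivative of a parametric surface is found exactly as described: nudge one input while holding the other fixed, and see how every output coordinate moves. This minimal sketch uses a hypothetical surface, a helicoid $(u\cos v, u\sin v, v)$, not the function from the example above; the helper `partial` and its `index` argument are illustrative names.

```python
import math

# Hypothetical parametric surface (a helicoid): 2D input (u, v) -> 3D output
def surface(u, v):
    return (u * math.cos(v), u * math.sin(v), v)

def partial(func, point, index, h=1e-6):
    """Nudge one input (holding the other fixed) and measure the change in every output component."""
    ahead, behind = list(point), list(point)
    ahead[index] += h
    behind[index] -= h
    return tuple((p - q) / (2 * h) for p, q in zip(func(*ahead), func(*behind)))

point = (2.0, 0.5)
print(partial(surface, point, 0))  # approximately (cos 0.5, sin 0.5, 0)
print(partial(surface, point, 1))  # approximately (-2 sin 0.5, 2 cos 0.5, 1)
```

Each result is a 3D vector, matching the dimension of the output space.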

Partial Derivatives of Vector Fields

Another way to express a vector-valued function is as a vector field. Recall that a vector field is a function that assigns a vector to each point in space.

For instance, consider the following vector field:

This vector field can be visualized by plotting the vectors at each point in the $xy$-plane:

Consider a point in the space, . The vector at this point is:

Consider the partial derivative of this vector field with respect to :

Recall that we can conceptualize the partial derivative as the rate of change of the vector field as changes, while keeping constant. We increment by a small amount, and see how the vector field changes, and keep decreasing the increment until it becomes infinitesimally small.

Incrementing in this case means moving in the direction in the field, and drawing a new vector at that point:

To find the difference in the output, the first step is to put the two vectors in a new space such that they are both at the origin:

One way to think about this is that:

As , the ratio of to becomes the partial derivative:

One way to further conceptualize this is to treat each output component as a separate scalar function:

Where:

We can write each component's respective partial derivative:

Next, once again, consider the point . The partial derivatives evaluated at this point are:

Summary

In this chapter, we have extended the concept of the derivative from single-variable functions to multi-variable functions.

We showed that the partial derivative is the derivative of a function with respect to one of its variables, while keeping the other variables constant:

$$\frac{\partial f}{\partial x} = \lim_{h \to 0} \frac{f(x + h, y) - f(x, y)}{h}$$

We introduced these concepts for scalar-valued functions:

  • Gradient: A vector of partial derivatives of a scalar function.

    The gradient points in the direction of the steepest increase of the function.

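  • Directional Derivative: The rate of change of a function in a certain direction:

    $$\nabla_{\vec{v}} f = \nabla f \cdot \vec{v}$$

    Where $\vec{v}$ is a unit vector. Using this, we can show that the directional derivative is maximized when $\vec{v}$ is in the direction of the gradient:

    $$\nabla_{\vec{v}} f = \lVert \nabla f \rVert \cos\theta$$

    Where $\theta$ is the angle between the gradient and $\vec{v}$.

  • Multivariable Chain Rule: The chain rule for functions of multiple variables. For a function $f(\vec{v}(t))$:

    $$\frac{d}{dt} f(\vec{v}(t)) = \nabla f(\vec{v}(t)) \cdot \vec{v}'(t)$$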

We also introduced these concepts for vector-valued functions:

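  • Curvature: The magnitude of the rate of change of the unit tangent vector with respect to arc length. The curvature of a curve is:

    $$\kappa = \left\lVert \frac{d\mathbf{T}}{ds} \right\rVert$$

    Where $\mathbf{T}$ is the unit tangent vector.

    For a 2D parametric curve $(x(t), y(t))$, the curvature is:

    $$\kappa = \frac{\left| x'y'' - y'x'' \right|}{\left( x'^2 + y'^2 \right)^{3/2}}$$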

  • Partial Derivatives of Parametric Surfaces: The partial derivatives of a vector-valued function with respect to its parameters. For a function :

Next, we will apply these fundamental concepts to formulate more advanced topics in multivariable calculus, like the divergence and curl of vector fields.