Slide 114 of 130
mckennab

Why doesn't the gradient for w1 depend on the weights w2 and w3? Aren't those essentially constants in our function?

If I had something like y = f(5*g(x)), dy/dx would include the 5 in there, right?
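Applying the chain rule to that example shows where the constant ends up. The inner function is 5g(x), whose derivative is 5g'(x):

```latex
\frac{dy}{dx} = f'\big(5\,g(x)\big)\cdot 5\,g'(x)
```

So the 5 appears both inside the argument of f' and as a multiplicative factor.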

motoole2

Yes, you're definitely right that the partial derivative with respect to w1 depends on both w2 and w3. (And perhaps the slide is a bit misleading in that respect.)

In the example on this slide, the partial derivative of a3 (the output of a particular layer) with respect to f2 (the input of that layer) is a function of w3. Similarly, the partial derivative of a2 with respect to f1 is a function of w2. As a result, the weights w2 and w3 appear when computing the gradient for w1.
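To make this concrete, here is a minimal sketch that assumes a hypothetical scalar network (the slide's exact layer definitions are not reproduced here): a1 = w1·x, f1 = σ(a1), a2 = w2·f1, f2 = σ(a2), a3 = w3·f2, with σ the sigmoid. The helper names `forward`, `grad_w1`, and `numerical_grad_w1` are illustrative only. The chain-rule product for ∂a3/∂w1 contains the factors w3 (from ∂a3/∂f2) and w2 (from ∂a2/∂f1), and a finite-difference check confirms the result.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w1, w2, w3):
    # Hypothetical 3-layer scalar chain (an assumption for illustration).
    a1 = w1 * x
    f1 = sigmoid(a1)
    a2 = w2 * f1
    f2 = sigmoid(a2)
    a3 = w3 * f2
    return a1, f1, a2, f2, a3


def grad_w1(x, w1, w2, w3):
    # Analytic d(a3)/d(w1) via the chain rule, one local derivative per step.
    _, f1, _, f2, _ = forward(x, w1, w2, w3)
    da3_df2 = w3                 # depends on w3
    df2_da2 = f2 * (1.0 - f2)    # sigmoid'(a2)
    da2_df1 = w2                 # depends on w2
    df1_da1 = f1 * (1.0 - f1)    # sigmoid'(a1)
    da1_dw1 = x
    return da3_df2 * df2_da2 * da2_df1 * df1_da1 * da1_dw1


def numerical_grad_w1(x, w1, w2, w3, h=1e-6):
    # Central finite difference of the output a3 with respect to w1.
    plus = forward(x, w1 + h, w2, w3)[-1]
    minus = forward(x, w1 - h, w2, w3)[-1]
    return (plus - minus) / (2 * h)


if __name__ == "__main__":
    x, w1, w2, w3 = 0.5, 1.2, -0.7, 2.0
    print(grad_w1(x, w1, w2, w3), numerical_grad_w1(x, w1, w2, w3))
    # a3 is linear in w3, so doubling w3 doubles the gradient for w1.
    print(grad_w1(x, w1, w2, 2 * w3) / grad_w1(x, w1, w2, w3))
```

In this toy setup, changing w2 or w3 changes the gradient for w1, which is exactly the dependence asked about above. w2 and w3 are held fixed while differentiating with respect to w1, but they still show up as factors, just like the 5 in f(5*g(x)).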