Slide View : Computer Vision : Fall 2021

Slide 114 of 130

mckennab 3 years ago

Why doesn't the gradient for w1 depend on our weights w2 and w3? Aren't those essentially constants in our function?

If I had something like y = f(5*g(x)), dy/dx would include the 5 in there, right?

motoole2 3 years ago

Yes, you're definitely right that the partial derivative with respect to w1 depends on both w2 and w3. (And perhaps the slide is a bit misleading in that respect.)

In the example on this slide, the partial derivative of a3 (the output of a particular layer) with respect to f2 (the input to that layer) is a function of w3. Similarly, the partial derivative of a2 with respect to f1 is a function of w2. By the chain rule, both of these partial derivatives are factors in the gradient for w1, so the weights w2 and w3 appear when computing it.
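
To make this concrete, here is a small sketch in Python. The slide itself isn't reproduced in this thread, so the network below is an assumption: three scalar layers with a_i = w_i * f_(i-1), f_i = sigmoid(a_i), and f_0 = x, with the loss left out so that we just differentiate the output f3. The function names (`forward`, `grad_w1`) are made up for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w1, w2, w3):
    # Assumed scalar network: a_i = w_i * f_(i-1), f_i = sigmoid(a_i), f_0 = x.
    a1 = w1 * x
    f1 = sigmoid(a1)
    a2 = w2 * f1
    f2 = sigmoid(a2)
    a3 = w3 * f2
    f3 = sigmoid(a3)
    return f1, f2, f3


def grad_w1(x, w1, w2, w3):
    # Chain rule for the assumed network:
    #   df3/dw1 = (df3/da3) * (da3/df2) * (df2/da2) * (da2/df1) * (df1/da1) * (da1/dw1)
    # where da3/df2 = w3 and da2/df1 = w2, so both weights show up as factors.
    f1, f2, f3 = forward(x, w1, w2, w3)

    def dsigmoid(f):
        # Derivative of the sigmoid, written in terms of its output f.
        return f * (1.0 - f)

    return dsigmoid(f3) * w3 * dsigmoid(f2) * w2 * dsigmoid(f1) * x


# Same x and w1, different w3: the gradient for w1 changes.
print(grad_w1(0.5, 0.3, -0.7, 1.2))
print(grad_w1(0.5, 0.3, -0.7, 2.4))
```

Under these assumptions, setting w3 to zero makes the gradient for w1 vanish entirely, which is the same effect as the 5 in y = f(5*g(x)) from the question above.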