zebra25

Hi, I wanted to clarify my understanding of what is happening in this slide and the next. In this situation we ran our filter against the image pyramid (although I believe the professor mentioned that, for the sake of example, we applied the Laplacian filter instead?), and for the section of the image we were interested in, as shown on the next slide, we computed the maximum detection in that image section from our Harris detector. Then, across the scales, we compared those maxima and used the detection result from the scale with the maximum of those maxima?

I guess I want to clarify that we do this for the whole image and then use that scale's result as our set of features, regardless of what we are looking at? Say we are considering whether there are corners in the full-size image. So we will look at all of the scales, and pick scale 9.8. Then we will use the corner detection output at that scale for all portions of the image?

We don't instead consider an individual pixel, or a small subset of pixels, and then perform this algorithm of finding the local max and cross-scale max for that specific subset and use that scale.

If we use the former approach over the latter, wouldn't we miss all of the smaller sunflowers in the back of the image, even though we can detect them at different scales?

motoole2

In this slide, we show the result of convolving an image represented at two different scales (the top row is the full-size image; the bottom row is the same image shrunk to 3/4 size). Note that the image patches shown here have the same number of pixels in both rows; the only difference between these patches is that the scale of the image content is different.

Next, we convolve (or correlate) these image patches with a Laplacian kernel. The number above each response represents the standard deviation used to generate the Laplacian kernel. The response is strongest when the image content and the Laplacian kernel have the same characteristic scale. Because the image content is scaled differently across both rows, this response is maximized by using different standard deviation values.
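
To make the idea concrete, here is a minimal sketch (assuming NumPy/SciPy; the synthetic blob, its radii, and the sigma range are illustrative choices, not values from the slide) showing that the same content at 3/4 size peaks at a smaller standard deviation:

```python
# Sketch of "characteristic scale": the same blob at two sizes produces its
# strongest (scale-normalized) Laplacian response at two different sigmas.
# Assumes NumPy/SciPy; the blob radii and sigma range are illustrative.
import numpy as np
from scipy.ndimage import gaussian_laplace

def make_blob(size=64, radius=10):
    # Bright disk on a dark background, centered in a size x size patch.
    y, x = np.mgrid[:size, :size] - size // 2
    return (x**2 + y**2 <= radius**2).astype(float)

def center_response(patch, sigma):
    # Scale-normalized Laplacian-of-Gaussian response at the patch center;
    # the sigma**2 factor keeps responses comparable across scales.
    response = sigma**2 * gaussian_laplace(patch, sigma)
    return abs(response[patch.shape[0] // 2, patch.shape[1] // 2])

patch_full  = make_blob(radius=12)  # "full size" content
patch_small = make_blob(radius=9)   # same content at 3/4 scale

sigmas = np.linspace(2.0, 20.0, 100)
best_full  = sigmas[np.argmax([center_response(patch_full,  s) for s in sigmas])]
best_small = sigmas[np.argmax([center_response(patch_small, s) for s in sigmas])]
print(best_full, best_small)  # the 3/4-scale patch peaks at a smaller sigma
```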

motoole2

Now to address the questions:

In this situation we ran our filter against the image pyramid

To be clear, there isn't really an image pyramid shown in this particular example. This slide is simply meant as an example of characteristic scale. The Laplacian filter responds most strongly when the image content has the same characteristic scale as the filter.

I guess I want to clarify that we do this for the whole image and then use that scale's result as our set of features, regardless of what we are looking at?

To make our detection algorithms insensitive to scale, we will often (i) build an image pyramid first, and (ii) run a detector across all levels of the image pyramid. Although our detector responds to the image at all positions/scales, we typically look for the position/scale of the peak response.

In the example that you gave, you would run your corner detection algorithm on your entire image pyramid. Next, you will identify the peaks (i.e., the likely positions of corners) at each level of the pyramid. And finally, you will compare the response at different levels of the pyramid to find the characteristic scale of each corner.
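
Here's a rough sketch of that pipeline, assuming OpenCV/NumPy are available; the scale factors and Harris parameters below are arbitrary illustrative choices, not values from lecture:

```python
# Sketch of running a corner detector over every level of an image pyramid.
# Assumes OpenCV/NumPy; scale factors and Harris parameters are illustrative.
import numpy as np
import cv2

def pyramid_responses(gray, scales=(1.0, 0.75, 0.5, 0.25)):
    """Return a list of (scale, response_map) pairs, one per pyramid level."""
    responses = []
    for s in scales:
        level = cv2.resize(gray, None, fx=s, fy=s, interpolation=cv2.INTER_AREA)
        # Harris corner response for this level of the pyramid.
        r = cv2.cornerHarris(np.float32(level), blockSize=3, ksize=3, k=0.04)
        responses.append((s, r))
    return responses

# Usage (hypothetical file name):
# gray = cv2.imread("sunflowers.png", cv2.IMREAD_GRAYSCALE)
# for s, r in pyramid_responses(gray):
#     print(s, r.max())  # compare peak strength across levels
```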

So we will look at all of the scales, and pick scale 9.8. Then we will use the corner detection output at that scale for all portions of the image?

Not quite. We should always take into consideration the corner detection output at all scales. For example, we can interpret the response from our filter as a 3D function $g(x,y,s)$, where $x$ and $y$ represent the pixel location and $s$ represents the scale. We are effectively searching for local maxima in this 3D landscape $g(x,y,s)$. Each local maximum represents a corner at position $(x,y)$ with a characteristic scale $s$.
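
As an illustration, here is a minimal sketch of that 3D search, assuming the per-scale response maps have already been stacked into a single (num_scales, H, W) array (e.g., by resampling each level back to the original resolution); the threshold value is arbitrary:

```python
# Sketch of finding local maxima of g(x, y, s) in scale space.
# Assumes the per-scale responses are stacked into one (num_scales, H, W)
# array, e.g. by resampling each level back to the original resolution.
import numpy as np
from scipy.ndimage import maximum_filter

def scale_space_maxima(g, threshold=0.01):
    # A point is kept if it equals the maximum over its 3x3x3 neighborhood
    # (neighbors in x, y, and scale) and exceeds the threshold.
    is_peak = (g == maximum_filter(g, size=3)) & (g > threshold)
    return np.argwhere(is_peak)  # rows of (scale_index, row, column)
```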

We don't instead consider an individual pixel, or a small subset of pixels, and then perform this algorithm of finding the local max and cross-scale max for that specific subset and use that scale.

When computing corners or applying a Laplacian filter, we do this to the entire image or image pyramid. Only then do we start our search for peak responses.

Hope this addresses your questions!