The goal of researchers in the field of computer vision is to find algorithms that will take digitized images and produce useful analyses of the scenes in those images.
Typically, a computer vision system begins with cameras that take pictures of the world.
If the image is a color image, then the value of each pixel is a color. If the image is black and white, then the value of each pixel is a brightness value that describes a shade of gray.
There are many ways to do smoothing. The algorithm illustrated in the images above involves replacing each pixel value with an average of its neighbors:
The demo program actually considers a 5x5 square of pixels as the "neighborhood" of a pixel. It also smooths the image 3 times, rather than just once. These are default values, but they can be changed as desired.
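The neighborhood-averaging smoother described above can be sketched as follows. This is a minimal illustration, not the demo program itself; the function name `smooth` and the edge-padding strategy are assumptions, and the defaults mirror the 5x5 neighborhood and 3 passes mentioned in the text.

```python
import numpy as np

def smooth(image, size=5, passes=3):
    # Replace each pixel with the mean of the size x size square
    # of pixels ("neighborhood") centered on it, repeated `passes` times.
    image = image.astype(float)
    pad = size // 2
    for _ in range(passes):
        # Pad by repeating edge values so border pixels also
        # have a full neighborhood to average over.
        padded = np.pad(image, pad, mode="edge")
        out = np.zeros_like(image)
        h, w = image.shape
        for i in range(h):
            for j in range(w):
                out[i, j] = padded[i:i + size, j:j + size].mean()
        image = out
    return image
```

A uniform image is unchanged by smoothing, while an isolated bright pixel gets spread out over its neighborhood.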
A pixel can be identified as an edge element if its value is very different from the values of its neighbors. If a pixel's value is different from the neighbors immediately above or below, then the pixel is part of a horizontal edge. If a pixel's value is different from the neighbors immediately to the left or right, then the pixel is part of a vertical edge.
Go through the image, pixel by pixel. For each pixel:
- Compare it to the pixels immediately to its right and below it (i.e., compute the differences between the pixel and each of these two neighbors).
- If either of these differences is above a predefined threshold, then mark the pixel as belonging to an edge.
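The steps above can be sketched as a short routine. The function name `mark_edges` and the default threshold of 30 are assumptions for illustration.

```python
import numpy as np

def mark_edges(image, threshold=30):
    # Mark a pixel as an edge element when its value differs from
    # its right neighbor or its lower neighbor by more than `threshold`.
    image = image.astype(int)
    h, w = image.shape
    edges = np.zeros((h, w), dtype=bool)
    for i in range(h):
        for j in range(w):
            if j + 1 < w and abs(image[i, j] - image[i, j + 1]) > threshold:
                edges[i, j] = True
            if i + 1 < h and abs(image[i, j] - image[i + 1, j]) > threshold:
                edges[i, j] = True
    return edges
```

On an image whose left half is dark and right half is bright, only the pixels just left of the brightness step get marked.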
    -1  1
    -1  1

We then "slide" this mask along the digitized image. At each step, we multiply the numbers in the mask by the brightness values of the image that are below them. Next we compute the sum of the products. If the absolute value of the sum is greater than some threshold, we might conclude that we've found an edge.
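One step of this slide-multiply-sum process can be shown concretely. The helper `mask_response` is a name assumed here for illustration; it computes the sum of products for the 2x2 mask placed at one position.

```python
import numpy as np

# The 2x2 mask from the text.
mask = np.array([[-1, 1],
                 [-1, 1]])

def mask_response(image, row, col, mask):
    # Multiply the mask entries with the image values under them
    # (mask's top-left corner at (row, col)) and sum the products.
    h, w = mask.shape
    window = image[row:row + h, col:col + w]
    return int((mask * window).sum())

# A small image with a vertical brightness step between columns 1 and 2.
image = np.array([[10, 10, 90, 90],
                  [10, 10, 90, 90],
                  [10, 10, 90, 90]])
print(mask_response(image, 0, 1, mask))  # straddles the step: 160
print(mask_response(image, 0, 0, mask))  # flat region: 0
```

The large absolute value where the mask straddles the step, versus zero over a flat region, is exactly what the threshold test exploits.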
    -1  0  1
    -2  0  2
    -1  0  1

and
     1  2  1
     0  0  0
    -1 -2 -1

To determine if a particular pixel is an edge pixel, we place a mask over the image, centered on the pixel under consideration. We then multiply the mask values by the brightness values of the pixels "under" them, and then we add those products together. (That is, we compute a weighted sum.)
There are two Sobel operators in order to detect both vertical lines and horizontal lines. (The 2x2 mask given above is meant to find lines of a particular orientation. What is that orientation?)
Since we have two different masks, we need a principled way to combine their "scores" in order to determine whether a pixel is an edge pixel. There are several ways to do this, including this one:
Let's say that the weighted sums produced by the two masks specify a point in a Cartesian coordinate system. One mask gives us an x value; the other gives us a y value. If the point generated by the two masks is (0, 0), this indicates that the pixel is not an edge pixel. The farther the point is from (0, 0), the more likely the pixel is to be an edge pixel. We will specify a threshold -- i.e., a radius of a circle centered at the origin. All points outside of that circle will be marked as edge points.
To compute whether a pixel is an edge pixel, then, we compute weighted sums using the two masks given above. Let's call those x and y. We then treat those as a point (x, y). Finally, we compute the distance from the origin by calculating √(x² + y²) and then determine whether the distance exceeds the threshold required to be an edge point.
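Putting the two Sobel masks and the distance test together gives a per-pixel check like the following sketch. The function name `is_edge_pixel` and the default threshold of 100 are assumptions; the function only handles interior pixels (those with a full 3x3 neighborhood).

```python
import math
import numpy as np

SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])
SOBEL_Y = np.array([[ 1,  2,  1],
                    [ 0,  0,  0],
                    [-1, -2, -1]])

def is_edge_pixel(image, row, col, threshold=100):
    # Center both masks on (row, col), compute the two weighted
    # sums x and y, and test whether the point (x, y) lies outside
    # the circle of radius `threshold` centered at the origin.
    window = image[row - 1:row + 2, col - 1:col + 2].astype(int)
    x = (SOBEL_X * window).sum()
    y = (SOBEL_Y * window).sum()
    return math.hypot(x, y) > threshold  # sqrt(x**2 + y**2)
```

A pixel next to a sharp vertical brightness step produces a large x, so the distance test fires; a pixel in a flat region produces (0, 0) and does not.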
Therefore, we need to do something else to identify groups of edge pixels as lines.
To find horizontal lines:
- Consider the first row of pixels.
- If you have a long stretch of pixels marked as edge pixels (allowing for small gaps, if desired), mark that as a line segment.
- Do this for each row.

To find vertical lines:
- Consider the first column of pixels.
- If you have a long stretch of pixels marked as edge pixels (allowing for small gaps, if desired), mark that as a line segment.
- Do this for each column.

We can do the same for diagonal lines.
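The row-scanning step can be sketched for a single row of edge markings. The name `find_row_segments` and the defaults (`min_length=5`, `max_gap=1`) are assumptions chosen for illustration; running the same routine on each row (and on each column of the transposed image) covers both cases above.

```python
def find_row_segments(edge_row, min_length=5, max_gap=1):
    # Scan one row of edge markings (booleans) left to right and return
    # (start, end) column spans of runs at least min_length long,
    # tolerating gaps of up to max_gap non-edge pixels inside a run.
    segments = []
    start = None   # column where the current run began
    gap = 0        # consecutive non-edge pixels seen inside the run
    for col, is_edge in enumerate(edge_row):
        if is_edge:
            if start is None:
                start = col
            gap = 0
        elif start is not None:
            gap += 1
            if gap > max_gap:
                end = col - gap  # last edge pixel of the run
                if end - start + 1 >= min_length:
                    segments.append((start, end))
                start, gap = None, 0
    if start is not None:  # run reaching the end of the row
        end = len(edge_row) - 1 - gap
        if end - start + 1 >= min_length:
            segments.append((start, end))
    return segments
```

A run of five edge pixels is reported as one segment, and a single-pixel gap inside a longer run does not break it up.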
Note that it is harder, in general, to identify the closed outlines of objects (e.g., objects with rounded edges).