1. Computer vision
    1. Difficulties with automating vision
    2. Digitized images
    3. A three-phase process
2. Prepping the digital image: Smoothing
3. Finding edges
    1. A simple algorithm for marking edge pixels
    2. A more complex algorithm
    3. Sobel operators
    4. Identifying lines

Computer vision

Giving a robot the gift of sight would be very useful, and researchers have studied the problem for decades.

The goal of researchers in the field of computer vision is to find algorithms that will take digitized images and produce useful analyses of the scenes in those images.

Typically, a computer vision system begins with cameras that take pictures of the world.

Difficulties with automating vision

There is no simple mapping between a concept and a visual image. Difficulties are introduced by

Digitized images

A digitized image is a 2-dimensional grid of pixels. (pixel = picture element)

If the image is a color image, then the value of each pixel is a color. If the image is black and white, then the value of each pixel is a brightness value that describes a shade of gray.
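As a rough illustration, a digitized image can be stored as a list of rows. The 0 to 255 brightness range, the (red, green, blue) triples, and the helper names `dimensions` and `to_gray` are assumptions made for this sketch; they are not specified above.

```python
# A tiny 3x4 grayscale image: each value is a brightness,
# assumed here to run from 0 (black) to 255 (white).
gray_image = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
]

# A color image of the same size: each pixel is assumed to be
# a (red, green, blue) triple.
color_image = [[(v, v, v) for v in row] for row in gray_image]


def dimensions(image):
    """Return (rows, columns) of a rectangular image stored as a list of rows."""
    return len(image), len(image[0])


def to_gray(pixel):
    """Turn one (r, g, b) pixel into a single brightness by averaging
    the three channels (one simple choice among several)."""
    r, g, b = pixel
    return (r + g + b) // 3
```

The later sketches in these notes use the same list-of-rows layout, so `image[r][c]` is the pixel in row `r`, column `c`.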

A three-phase process

Computer vision generally proceeds in three phases.

Prepping the digital image: Smoothing

Before any vision processing is done, we typically smooth the image. We do so for a number of reasons: The process of smoothing "blurs" the image somewhat:

There are many ways to do smoothing. The algorithm illustrated in the images above involves replacing each pixel value with an average of its neighbors:

The demo program actually considers a 5x5 square of pixels as the "neighborhood" of a pixel. It also smooths the image 3 times, rather than just once. These are default values, but they can be changed as desired.
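The averaging idea might be sketched as follows. This is not the demo program's code: the function names are hypothetical, the average includes the pixel itself, and neighbors that fall outside the image are simply skipped. Both of those choices are assumptions, since the notes do not say how the demo handles them. The defaults mirror the 5x5 neighborhood and 3 passes mentioned above.

```python
def smooth_once(image, size=5):
    """Replace each pixel with the mean of the size x size square centered on it.

    Assumption: squares that hang over the border are clipped to the image,
    so border pixels are averaged over fewer values.
    """
    rows, cols = len(image), len(image[0])
    half = size // 2
    result = []
    for r in range(rows):
        new_row = []
        for c in range(cols):
            total = 0
            count = 0
            for rr in range(max(0, r - half), min(rows, r + half + 1)):
                for cc in range(max(0, c - half), min(cols, c + half + 1)):
                    total += image[rr][cc]
                    count += 1
            new_row.append(total / count)
        result.append(new_row)
    return result


def smooth(image, size=5, passes=3):
    """Apply smooth_once repeatedly; each pass blurs the image a little more."""
    for _ in range(passes):
        image = smooth_once(image, size)
    return image
```

Each pass builds a new image rather than overwriting the old one, so every average is computed from the unsmoothed values of that pass.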

Finding edges

The first real phase of image processing for computer vision generally involves identifying the edges in the image. The result is a sort of line drawing of the scene. There are many different ways to do edge detection, but they all typically begin with a process that identifies individual pixels that are edge elements.

A pixel can be identified as an edge element if its value is very different from the values of its neighbors. If a pixel's value is different from the neighbors immediately above or below, then the pixel is part of a horizontal edge. If a pixel's value is different from the neighbors immediately to the left or right, then the pixel is part of a vertical edge.

A simple algorithm for marking edge pixels

A very simple algorithm for identifying edge pixels is as follows:
Go through the image, pixel by pixel.
For each pixel:
   Compare it to the pixels immediately to the right of it and below it
   (i.e., compute the differences between the pixel and each of these two
   neighbors).
   If either of these differences is above a pre-defined threshold
      Then mark the pixel as belonging to an edge.
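
The pseudocode above could be written in Python roughly like this. The function name is hypothetical, and two details are assumptions: the difference is taken as an absolute value, and pixels in the last column or last row are compared only with the neighbor they actually have.

```python
def mark_edges_simple(image, threshold):
    """Return a grid of booleans, True where a pixel is marked as an edge pixel."""
    rows, cols = len(image), len(image[0])
    edges = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            here = image[r][c]
            # Difference with the neighbor immediately to the right, if any.
            if c + 1 < cols and abs(here - image[r][c + 1]) > threshold:
                edges[r][c] = True
            # Difference with the neighbor immediately below, if any.
            if r + 1 < rows and abs(here - image[r + 1][c]) > threshold:
                edges[r][c] = True
    return edges
```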

A more complex algorithm

The simple algorithm above considered only two neighbors of each pixel. A more complex algorithm involves the consideration of several neighbors. We define a mask, i.e., a 2x2 or 3x3 matrix of values such as:
-1  1
-1  1
We then "slide" this along the digitized image. At each step, we multiply the numbers in the mask with the brightness values of the image that are below them. Next we compute the sum of the products. If the absolute value of the sum is greater than some threshold, we might conclude that we've found an edge.
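
A minimal sketch of the sliding step, assuming the image is a list of rows of brightness values. The name `apply_mask` is hypothetical, and the result is recorded at the image position under the mask's top-left corner, which is one convention among several.

```python
def apply_mask(image, mask, threshold):
    """Slide a small mask over the image and mark positions whose
    weighted sum has an absolute value above the threshold."""
    m_rows, m_cols = len(mask), len(mask[0])
    rows, cols = len(image), len(image[0])
    edges = [[False] * cols for _ in range(rows)]
    # Only visit positions where the whole mask fits inside the image.
    for r in range(rows - m_rows + 1):
        for c in range(cols - m_cols + 1):
            total = sum(
                mask[i][j] * image[r + i][c + j]
                for i in range(m_rows)
                for j in range(m_cols)
            )
            edges[r][c] = abs(total) > threshold
    return edges
```

Calling `apply_mask(image, [[-1, 1], [-1, 1]], threshold)` applies the 2x2 mask shown above; the same function works unchanged for a 3x3 mask.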

Sobel operators

There are many possible masks. These include the Sobel operators:
-1  0  1
-2  0  2
-1  0  1

 1  2  1
 0  0  0
-1 -2 -1
To determine if a particular pixel is an edge pixel, we place a mask over the image, centered on the pixel under consideration. We then multiply the mask values with the brightness values of the pixels "under" them, and then we add those products together. (That is, we compute a weighted sum.)

There are two Sobel operators in order to detect both vertical lines and horizontal lines. (The 2x2 mask given above is meant to find lines of a particular orientation. What is that orientation?)

Since we have two different masks, we need a principled way to combine their "scores" in order to determine whether a pixel is an edge pixel. There are several ways to do this, including this one:

Let's say that the weighted sums produced by the two masks specify a point in a Cartesian coordinate system. One mask gives us an x value; the other gives us a y value. If the point generated by the two masks is (0, 0), the pixel is not an edge pixel. The farther the point is from (0, 0), the more likely the pixel is to be an edge pixel. We specify a threshold, i.e., the radius of a circle centered at the origin. Every pixel whose point falls outside that circle is marked as an edge pixel.

To compute whether a pixel is an edge pixel, then, we compute weighted sums using the two masks given above. Let's call those x and y. We then treat those as a point (x, y). Finally, we compute the distance from the origin by calculating the square root of (x^2 + y^2) and then determine whether the distance exceeds the threshold required to be an edge point.
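
The whole test might be sketched as below. The names `SOBEL_X`, `SOBEL_Y`, `weighted_sum`, and `sobel_edges` are hypothetical, and pixels on the outer border are left unmarked because a 3x3 mask cannot be centered on them; that border rule is an assumption, not something stated above.

```python
import math

# The two Sobel masks given above.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[1, 2, 1], [0, 0, 0], [-1, -2, -1]]


def weighted_sum(image, mask, r, c):
    """Center a 3x3 mask on pixel (r, c) and add up mask value * brightness."""
    return sum(
        mask[i][j] * image[r - 1 + i][c - 1 + j]
        for i in range(3)
        for j in range(3)
    )


def sobel_edges(image, threshold):
    """Mark a pixel as an edge pixel when the point (x, y) formed by the two
    weighted sums lies farther than `threshold` from the origin."""
    rows, cols = len(image), len(image[0])
    edges = [[False] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            x = weighted_sum(image, SOBEL_X, r, c)
            y = weighted_sum(image, SOBEL_Y, r, c)
            # math.hypot(x, y) is the square root of (x^2 + y^2).
            edges[r][c] = math.hypot(x, y) > threshold
    return edges
```

A suitable threshold depends on the brightness range of the image and is usually found by experiment.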

Identifying lines

The algorithms just described identify possible edge pixels. If we look at an image that has those pixels highlighted, they will look like lines to us. But we have to remember that they are just a collection of dots. That is, the image processing algorithms "see" them as individual pixels, rather than a collection of lines.

Therefore, we need to do something else to identify groups of edge pixels as lines.

To find horizontal lines:

Consider the first row of pixels.
- If you have a long stretch of pixels marked as edge pixels (allowing for
  small gaps, if desired), mark that as a line segment.
Do this for each row.

For vertical lines:

Consider the first column of pixels.
- If you have a long stretch of pixels marked as edge pixels (allowing for
  small gaps, if desired), mark that as a line segment.
Do this for each column.

We can do the same for diagonal lines.
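
The row and column scans can share one helper. This is only a sketch: the names, the `min_length` parameter (how long a stretch must be to count as "long"), and the `max_gap` parameter (how many unmarked pixels may be bridged) are assumptions introduced for illustration. The input is a grid of booleans with True for each edge pixel.

```python
def find_runs(marks, min_length, max_gap=0):
    """Return (start, end) index pairs, inclusive, for long stretches of
    marked pixels in one row or column, bridging gaps of up to max_gap."""
    segments = []
    start = None  # index where the current stretch began
    last = None   # index of the most recent edge pixel in the stretch
    for i, marked in enumerate(marks):
        if marked:
            if start is None:
                start = i
            last = i
        elif start is not None and i - last > max_gap:
            # The gap is too wide: close the current stretch.
            if last - start + 1 >= min_length:
                segments.append((start, last))
            start = None
    if start is not None and last - start + 1 >= min_length:
        segments.append((start, last))
    return segments


def horizontal_lines(edges, min_length, max_gap=0):
    """Return (row, first_column, last_column) for each horizontal segment."""
    return [
        (r, s, e)
        for r, row in enumerate(edges)
        for s, e in find_runs(row, min_length, max_gap)
    ]


def vertical_lines(edges, min_length, max_gap=0):
    """Return (column, first_row, last_row) for each vertical segment."""
    columns = zip(*edges)  # transpose so each column can be scanned like a row
    return [
        (c, s, e)
        for c, col in enumerate(columns)
        for s, e in find_runs(col, min_length, max_gap)
    ]
```

Diagonal lines would need a third scan that walks each diagonal of the grid and passes it to the same `find_runs` helper.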

Note that it is harder to identify closed outlines of objects in general (e.g., with rounded edges).