Might try linked list or some sort of array.
Try array implementation. Vector has two main fields: an array and the # of elts of the array currently in use.
Note distinction between size of array and # elts in use!
When about to exceed capacity, copy elts into a larger array. Need an efficient strategy for this.
The following is a simplification of the (more complex) code in the structure package:
```java
public class Vector {
    protected Object elementData[];  // the data
    protected int elementCount;      // # of elts in vec
    protected int capacityIncrement; // the rate of growth for vector

    // pre: initialExtent >= 0, extentIncrement >= 0
    // post: construct empty vector with initialExtent capacity
    //       that extends capacity by extentIncrement, or doubles if 0
    public Vector(int initialExtent, int extentIncrement) {
        Assert.pre(initialExtent >= 0, "Non-negative vector extent.");
        elementData = new Object[initialExtent];
        elementCount = 0;
        capacityIncrement = extentIncrement;
    }
    // ... methods follow below ...
```
Accessing and modifying elts is trivial; adding and deleting is trickier. The method ensureCapacity(int n) ensures there is space for at least n elts in the array component of the vector. (Will discuss in detail later.)
```java
// post: add new element to end of possibly extended vector
public void addElement(Object obj) {
    ensureCapacity(elementCount + 1);
    elementData[elementCount] = obj;
    elementCount++;
}

// pre: 0 <= index <= size()
// post: inserts new value in vector at desired index,
//       moving elements from index to size()-1 to the right
public void insertElementAt(Object obj, int index) {
    int i;
    ensureCapacity(elementCount + 1);
    // must copy from right to left to avoid destroying data
    for (i = elementCount; i > index; i--)
        elementData[i] = elementData[i - 1];
    // assertion: i == index and elementData[index] is available
    elementData[index] = obj;
    elementCount++;
}
```
Remove is similar (see the code on-line).
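A minimal sketch of what the remove might look like (illustrative only; the method name removeElementAt and the details are assumptions, so check the on-line code):

```java
// pre: 0 <= where < size()
// post: removes element at index where, shifting elements
//       from where+1 to size()-1 left by one
public void removeElementAt(int where) {
    // copy left to right so each elt fills the gap before it
    for (int i = where; i < elementCount - 1; i++)
        elementData[i] = elementData[i + 1];
    elementCount--;
    elementData[elementCount] = null; // drop stale reference for GC
}
```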
Adding or deleting an element may involve moving up to n elts, if there are n elts in the array (slow!!).
Options:

1. Extend the array by one elt each time it fills. Building up to n elts this way copies 0 + 1 + 2 + 3 + ... + (n-1) = n*(n-1)/2 elts in all.
2. Double the size of the array each time it fills.

(By contrast, no copies at all need to be made if space for n elts is allocated at the beginning.) With the second option (where we assume n is a power of 2 for simplicity), copy 0 + 1 + 2 + 4 + 8 + ... + n/2 = n - 1 elts. While this is overhead, an extra copy of n elts is much less painful than n^2/2.
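To see the difference concretely, here is a small standalone sketch (not part of the Vector class) that counts the copies made under each growth strategy:

```java
// Counts elts copied while growing capacity up to n,
// (a) one elt at a time, (b) by doubling.
public class GrowthCost {
    public static void main(String[] args) {
        int n = 1024;
        long byOne = 0, byDoubling = 0;
        // growing from size to size+1 copies size elts
        for (int size = 1; size < n; size++)
            byOne += size;
        // growing from cap to 2*cap copies cap elts
        for (int cap = 1; cap < n; cap *= 2)
            byDoubling += cap;
        System.out.println("one at a time: " + byOne);      // n*(n-1)/2 = 523776
        System.out.println("doubling:      " + byDoubling); // n - 1 = 1023
    }
}
```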
Let the user decide which strategy to use: if Vector(int initialExtent) is called, set capacityIncrement to 0 and use the doubling strategy; otherwise always extend by capacityIncrement new elts:
```java
// post: capacity of this vector is at least minCapacity
public void ensureCapacity(int minCapacity) {
    if (elementData.length < minCapacity) { // have less than needed
        int newLength = elementData.length; // initial guess
        if (capacityIncrement == 0) {
            // increment of 0 suggests doubling (default)
            if (newLength == 0)
                newLength = 1;
            while (newLength < minCapacity)
                newLength = newLength * 2;
        } else {
            // increment != 0 suggests incremental increase
            while (newLength < minCapacity)
                newLength = newLength + capacityIncrement;
        }
        // assertion: newLength > elementData.length
        Object newElementData[] = new Object[newLength];
        // copy old data to new array
        for (int i = 0; i < elementCount; i++)
            newElementData[i] = elementData[i];
        elementData = newElementData;
        // N.B. garbage collector will pick up old elementData
    }
    // assertion: capacity is at least minCapacity
}
```
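The one-argument constructor mentioned above is not shown; presumably it just delegates with an increment of 0 (a sketch, not necessarily the actual structure package code):

```java
// post: construct empty vector with initialExtent capacity
//       that doubles in capacity when it must grow
public Vector(int initialExtent) {
    this(initialExtent, 0); // increment 0 selects the doubling strategy
}
```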
Ignore differences which are constant factors: e.g., treat n and n/2 as the same order of magnitude. Similarly with 2n^2 and 1000n^2.
In general, if we have a polynomial of the form a_0 n^k + a_1 n^(k-1) + ... + a_k, we say it is O(n^k).
Definition: We say that g(n) is O(f(n)) if there exist two constants C and k such that |g(n)| <= C |f(n)| for all n > k.
Equivalently, say g(n) is O(f(n)) if there is a constant C such that for all sufficiently large n, |g(n) / f(n)| <= C.
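For example, g(n) = 2n^2 + 1000n is O(n^2): for all n > 1000 we have 1000n < n^2, so 2n^2 + 1000n < 3n^2, and we may take C = 3 and k = 1000 in the definition.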
Most common are
O(1) (for any constant), O(log n), O(n), O(n log n), O(n^2), ..., O(2^n).
Usually use these to measure time and space complexity of algorithms.
Insertion of new first element in an array of size n is O(n) since must bump all other elts up by one place.
Insertion of new last element in an array of size n is O(1).
Saw increasing array size by 1 at a time to build up to n takes time n*(n-1)/2, which is O(n^2).
Saw increasing array size to n by doubling each time takes time n-1, which is O(n).
Make table of values to show difference:
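For example, the number of elts copied in building up to n (from the formulas above):

n | one at a time: n*(n-1)/2 | doubling: n-1 |
---|---|---|
10 | 45 | 9 |
100 | 4,950 | 99 |
1,000 | 499,500 | 999 |
1,000,000 | 499,999,500,000 | 999,999 |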
Suppose we have operations with time complexity O(log n), O(n), O(n log n), O(n^2), and O(2^n), and suppose each works on a problem of size n in time t. How much time to do a problem 10, 100, or 1000 times larger?
size | 10 n | 100 n | 1000 n |
---|---|---|---|
O(log n) | ~t+3 | ~t+7 | ~t+10 |
O(n) | 10t | 100t | 1,000t |
O(n log n) | >10t | >100t | >1,000t |
O(n^2) | 100t | 10,000t | 1,000,000t |
O(2^n) | ~t^10 | ~t^100 | ~t^1000 |

*Note that the first and last lines depend on the fact that the constant is 1 (and the logs are base 2); otherwise the times are somewhat different.
Suppose we get a new machine that allows a certain speed-up. How much larger a problem can be solved in the same time? If the original machine allowed solution of a problem of size k in time t, then
speed-up | 1x | 10x | 100x | 1000x |
---|---|---|---|---|
O(log n) | k | k^10 | k^100 | k^1000 |
O(n) | k | 10k | 100k | 1,000k |
O(n log n) | k | <10k | <100k | <1,000k |
O(n^2) | k | 3k+ | 10k | 30k+ |
O(2^n) | k | k+3 | k+7 | k+10 |

(E.g., for O(n^2) a 100x speed-up handles sqrt(100) = 10 times larger problems, while for O(2^n) a 1000x speed-up adds only log2(1000) ≈ 10 to the size of the problem solved.)
We will use big Oh notation to help us measure complexity of algorithms.
Only deal with searches here, come back to do sorts.
Code for all searches is on-line in the Sort program example.
Linear search: if the list has n elements, then n compares in the worst case, since we may have to look at every element.
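A minimal sketch of linear search over an array (names and signature are illustrative, not necessarily those of the on-line Sort example):

```java
// post: returns an index of target in data, or -1 if absent
public static int linearSearch(int[] data, int target) {
    for (int i = 0; i < data.length; i++)
        if (data[i] == target) // one compare per elt: n in the worst case
            return i;
    return -1;
}
```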
Binary search: with each recursive call we do at most two compares. What is the maximum number of recursive calls? About log2(n), since each call halves the portion of the list still under consideration; hence the worst case is about 2 log2(n) compares.
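A sketch of recursive binary search on a sorted array (again illustrative; the on-line version may differ):

```java
// pre: data sorted in increasing order
// post: returns an index of target in data[low..high], or -1 if absent
public static int binarySearch(int[] data, int target, int low, int high) {
    if (low > high)
        return -1;                // empty range: target absent
    int mid = (low + high) / 2;
    if (data[mid] == target)      // compare #1
        return mid;
    else if (data[mid] < target)  // compare #2
        return binarySearch(data, target, mid + 1, high); // right half
    else
        return binarySearch(data, target, low, mid - 1);  // left half
}
```

The initial call is binarySearch(data, target, 0, data.length - 1).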
Concrete comparison of the worst-case # of comparisons:
Search \ # elts | 10 | 100 | 1000 | 1,000,000 |
---|---|---|---|---|
linear | 10 | 100 | 1000 | 1,000,000 |
binary | 8 | 14 | 20 | 40 |