Computer Science 010

Lecture Notes 8

Vectors

Sample Program

Let's try to develop a program that uses something like an array, but friendlier. Here is what we will do to make it friendlier:

When we access an element in this structure, we will check that the index is in bounds and report an error if it is not.
We will allow the structure's size to grow dynamically so we can still use it even if we don't know how big it might ultimately need to be.

We will call our structure Vector. It is basically a watered-down version of Java's Vector class (if you're familiar with that).

There are basically two things we can do with arrays: we can retrieve the value of an element, and we can set the value at a particular position. Our program therefore needs a function for each of these. To get a value from an array, I need to know which array and which element to access, and I get back a value. For my Vector, I therefore need a signature along these lines:

<something> elementAt (Vector *v, int index)

The question is what type of thing should be returned. For arrays, it is whatever type we declared the array to be. We will have just one Vector struct, so the return type must be the type of thing we allow people to put into the Vector. We could make it int, and then our vector could only hold integers. If we made it char, our vector could only hold characters. If we make it "void *", our vector can hold a pointer to any type of data. Let's do that, since it will allow us to use the vector in more situations:

void *elementAt (Vector *v, int index)
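To see what "void *" buys us, here is a small sketch. Any data pointer converts to void * automatically, but nothing records what the pointer refers to, so the code that reads an element back must know the real type and convert before dereferencing. The helpers readInt and readString are hypothetical names used only for illustration; they are not part of the Vector code.

```c
/* Hypothetical helpers, for illustration only.  A void * can hold the
   address of any kind of data, but it does not remember the type:
   the caller must know it and convert back before use. */

/* Read an element the caller knows points to an int. */
int readInt (void *elem) {
  return *(int *) elem;    /* convert to int * before dereferencing */
}

/* Read an element the caller knows points to a string. */
const char *readString (void *elem) {
  return (const char *) elem;
}
```

For example, with int n = 42; char word[] = "hello"; void *slots[2] = { &n, word };, readInt (slots[0]) yields 42 and readString (slots[1]) yields "hello". Calling readInt (slots[1]) would compile without complaint but is undefined behavior, which is the price of this flexibility.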

We also need a function to change a value in the vector. Here we need to know the vector, the value to put in it, and the position to place the value. Let's define a function like this:

void setElementAt (Vector *v, void *elem, int index)

Now, what is a Vector? Well, it contains a collection of values that we index similarly to an array. So, our Vector struct should have an array inside it to hold this collection:

typedef struct {
  /* A dynamically-sized array of pointers */
  void **contents;
} Vector;

In order to do bounds checking, it needs to know how big the array is, so let's add a field to the struct to remember the array's size:

typedef struct {
  void **contents;
  int numElements;
} Vector;

We also indicated that we wanted to allow the vector to grow. To support this, we really should remember two different sizes. First, how many things are in this vector (represented by numElements). Second, how many things this vector can hold before we need to allocate more memory for it. We'll put this second value in another field, called capacity. Finally, our Vector struct is defined as:

typedef struct {
  void **contents;
  int numElements;
  int capacity;
} Vector;

To get us started, we need an initialization function analogous to a constructor in Java:


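#include <stdio.h>   /* printf */
#include <stdlib.h>  /* malloc, realloc, free, NULL */

/* 
 * If we need to increase the size of the vector, this is how
 * much it will grow by.
 */
#define CAPACITY_INCREMENT  100

/* The number of elements a new vector can hold before it must grow. */
#define CAPACITY_INITIAL    10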

/* 
 * Create a new vector.  There should be a
 * corresponding freeVector function to deallocate the memory
 * allocated here!  Returns NULL if memory cannot be allocated.
 */
Vector *newVector (void) {
  Vector *v = malloc (sizeof (Vector));
  if (v == NULL) {
    return NULL;
  }

  /* Initially, it's empty */
  v->numElements = 0;

  /* The array is big enough to hold CAPACITY_INITIAL elements */
  v->capacity = CAPACITY_INITIAL;
  v->contents = malloc (CAPACITY_INITIAL * sizeof (void *));
  if (v->contents == NULL) {
    free (v);          /* don't leak the struct if the array failed */
    return NULL;
  }
  return v;
}
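The comment above asks for a matching freeVector function. Here is a minimal sketch of one, assuming the vector does not own the things its elements point to (so the caller frees those first, if they were dynamically allocated). It repeats the struct and a NULL-checking newVector so that it compiles on its own.

```c
#include <stdlib.h>

#define CAPACITY_INITIAL 10

typedef struct {
  void **contents;
  int numElements;
  int capacity;
} Vector;

/* Create an empty vector; returns NULL if memory runs out. */
Vector *newVector (void) {
  Vector *v = malloc (sizeof (Vector));
  if (v == NULL) {
    return NULL;
  }
  v->numElements = 0;
  v->capacity = CAPACITY_INITIAL;
  v->contents = malloc (CAPACITY_INITIAL * sizeof (void *));
  if (v->contents == NULL) {
    free (v);
    return NULL;
  }
  return v;
}

/* Release the memory allocated by newVector (and by any later growth
   of the contents array).  ASSUMPTION: the elements themselves are NOT
   freed here; the caller is responsible for them. */
void freeVector (Vector *v) {
  if (v == NULL) {
    return;            /* nothing to free */
  }
  free (v->contents);  /* the array of pointers */
  free (v);            /* the struct itself */
}
```

A typical lifetime is then Vector *v = newVector (); followed by the vector operations and finally freeVector (v);.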

Now, let's define the array-like functions we identified earlier:

/* 
 * Return the element at the given index in the vector.  If the
 * index is out of bounds, return NULL.
 */
void *elementAt (Vector *v, int index) {
  if (index >= 0 && index < v->numElements) {
    return v->contents [index];
  }
  return NULL;
}
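One weakness of returning NULL for a bad index is that NULL can also be a stored element, so a caller cannot tell "index out of bounds" from "this slot holds NULL". A common C remedy is to return a success flag and pass the element back through an output parameter. The name elementAtChecked is hypothetical; this is a sketch of the idea, not part of the notes' interface.

```c
typedef struct {
  void **contents;
  int numElements;
  int capacity;
} Vector;

/* Hypothetical variant of elementAt.  If index is in bounds, stores
   the element in *out and returns 1.  Otherwise returns 0 and leaves
   *out unchanged, so a stored NULL is never mistaken for an error. */
int elementAtChecked (Vector *v, int index, void **out) {
  if (index < 0 || index >= v->numElements) {
    return 0;
  }
  *out = v->contents[index];
  return 1;
}
```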

/* 
 * Change an element in the vector if the index is in bounds.
 * Prints an error message, but does nothing, if the index is
 * out of bounds.
 */
void setElementAt (Vector *v, void *elem, int index) {
  if (index >= 0 && index < v->numElements) {
    /* DANGER!  If the last pointer to the previous element value
       was the pointer in this contents array, the next line
       of code causes a memory leak! */
    v->contents[index] = elem;
  }
  else {
    printf ("Bad index in setElementAt call: %d\n", index);
  }
}
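One way to avoid the leak flagged in setElementAt is to hand the old element back to the caller, who knows whether it should be freed. Java's Vector.set works this way, returning the element previously at that position. The function replaceElementAt below is a hypothetical sketch of that idea, not part of the notes' interface.

```c
#include <stdio.h>

typedef struct {
  void **contents;
  int numElements;
  int capacity;
} Vector;

/* Hypothetical variant of setElementAt: stores elem at index and
   returns the element previously stored there, so the caller can
   free it if nothing else refers to it.  If the index is out of
   bounds, prints an error, leaves the vector unchanged, and returns
   NULL. */
void *replaceElementAt (Vector *v, void *elem, int index) {
  void *old;
  if (index < 0 || index >= v->numElements) {
    printf ("Bad index in replaceElementAt call: %d\n", index);
    return NULL;
  }
  old = v->contents[index];
  v->contents[index] = elem;
  return old;
}
```

If the elements are heap-allocated and nothing else refers to the old one, the caller can simply write free (replaceElementAt (v, newElem, i)); since free (NULL) does nothing, this is harmless when the index is bad.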

Now, let's introduce some functions not found in arrays. First, let's define a function addElement that adds an element in the next empty position of the vector. If there are no more empty positions, it reallocates memory so the vector can hold more:

/* 
 * Add an element to the end of the vector.  The vector will grow
 * in size if it is currently full.
 */
void addElement (Vector *v, void *elem) {
  /* If the vector is full, reallocate the array containing the
     vector elements.  realloc returns a pointer to a block of the
     requested size that holds the old values: it either resizes the
     block in place, or allocates a new chunk, copies the values into
     it, and frees the old chunk.  If it fails, it returns NULL and
     leaves the old chunk untouched.
  */
  if (v->numElements == v->capacity) {
    int newCapacity = v->capacity + CAPACITY_INCREMENT;

    /* Save the result in a temporary first.  Writing
       v->contents = realloc (v->contents, ...) directly would
       overwrite our only pointer to the old array with NULL if
       realloc failed, leaking the old array. */
    void **newContents = realloc (v->contents, newCapacity * sizeof (void *));
    if (newContents == NULL) {
      printf ("Out of memory in addElement; element not added\n");
      return;
    }
    v->contents = newContents;
    v->capacity = newCapacity;
  }

  /* Now we know the array is big enough, so put the element
     at the next empty slot of the array and increase the counter
     of elements in the array. This is effectively a new slot
     in the vector.  Since we are not overwriting an existing slot
     as we did for setElementAt, there is no memory leak problem
     here.
  */
  v->contents[v->numElements] = elem;
  v->numElements++;
}
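The notes grow the array by a fixed CAPACITY_INCREMENT. A common alternative, sketched below under the hypothetical name addElementDoubling, doubles the capacity instead. With a fixed increment, the total copying realloc may do over n additions can grow roughly quadratically in n; with doubling it stays proportional to n, so each addition costs amortized constant time, at the price of possibly more unused space. The sketch also reports failure to the caller instead of printing.

```c
#include <stdlib.h>

#define CAPACITY_INITIAL 10

typedef struct {
  void **contents;
  int numElements;
  int capacity;
} Vector;

/* Hypothetical alternative to addElement: doubles the capacity when
   the vector is full.  Returns 1 on success, or 0 if memory could not
   be allocated (the vector is then left unchanged). */
int addElementDoubling (Vector *v, void *elem) {
  if (v->numElements == v->capacity) {
    /* Start at CAPACITY_INITIAL if the vector has no storage yet. */
    int newCapacity = (v->capacity > 0) ? v->capacity * 2 : CAPACITY_INITIAL;

    /* Keep the old pointer until realloc succeeds. */
    void **newContents = realloc (v->contents, newCapacity * sizeof (void *));
    if (newContents == NULL) {
      return 0;
    }
    v->contents = newContents;
    v->capacity = newCapacity;
  }
  v->contents[v->numElements] = elem;
  v->numElements++;
  return 1;
}
```

Starting from a capacity of 10, adding 25 elements grows the capacity to 20 and then to 40.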

There should be an analogous function to remove an arbitrary element from the vector:

/* 
 * Remove an element from a vector at a given position.  If the
 * index is out of range, an error message is printed and the
 * vector is not changed.
 */
void removeElementAt (Vector *v, int index) {
  if (index >= 0 && index < v->numElements) {
    /* We have a potential memory leak here again.  If the last
       pointer to the element being removed is the one in this
       contents array, we have a memory leak!
    */
    for (; index < v->numElements - 1; index++) {
      v->contents[index] = v->contents[index + 1];
    }
    v->numElements--;
  }
  else {
    printf ("Bad index in removeElementAt call: %d\n", index);
  }
}
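To sidestep the leak noted in removeElementAt, a variant can return the removed pointer so the caller can decide whether to free it; Java's Vector.remove(int) likewise returns the removed element. takeElementAt is a hypothetical name, and this is a sketch rather than the notes' interface.

```c
#include <stdio.h>

typedef struct {
  void **contents;
  int numElements;
  int capacity;
} Vector;

/* Hypothetical variant of removeElementAt: removes the element at
   index, shifts the later elements down one slot, and returns the
   removed pointer.  If the index is out of bounds, prints an error,
   leaves the vector unchanged, and returns NULL. */
void *takeElementAt (Vector *v, int index) {
  void *removed;
  int i;
  if (index < 0 || index >= v->numElements) {
    printf ("Bad index in takeElementAt call: %d\n", index);
    return NULL;
  }
  removed = v->contents[index];
  /* Shift each later element down one slot to close the gap. */
  for (i = index; i < v->numElements - 1; i++) {
    v->contents[i] = v->contents[i + 1];
  }
  v->numElements--;
  return removed;
}
```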

Shell Scripts

Compiled programs are not the only way to get things done in Unix. Many tasks can be done using shell scripts: lists of commands to be executed by the command shell.

A shell script is a program, but unlike a C program, which must be compiled before you can run it, a script is interpreted: the commands in it are read and executed by the appropriate shell program.

Shell scripts come in many types, the most important distinction being which shell program is intended to interpret the script. Common Unix shell programs include the Bourne shell (sh), the C shell (csh), the Bourne-Again shell (bash), the TENEX C shell (tcsh), the Korn shell (ksh), and the Z shell (zsh). You may see all of these used as the interpreter for shell scripts you run across. Unfortunately, their syntax differs (the C shell family in particular differs from the Bourne family of sh, bash, ksh, and zsh), but the basic functionality is similar. More elaborate scripting languages are also available, most notably Perl, which is an extremely flexible scripting language.

A shell script typically specifies which shell program should be used to interpret it by specifying the shell on the first line. For example, if you look at your .xinitrc file, the first line is:

#! /usr/local/bin/bash

which specifies that /usr/local/bin/bash should be used to run the script. Most shells use the # character to indicate a comment, but #! at the very start of the file is a special sequence that must be followed by the full path of the interpreter program. If this line is left off, the script is normally interpreted by the shell you run it from (your default shell, which for us is bash). We will use bash for our shell scripting examples.

We start with a very simple script, which just executes a collection of Unix commands that we tend to execute repeatedly. We all know that make is better than shell scripts for automating compiling and linking, but here is a script that compiles two C programs, file1.c and file2.c, producing the executables file1 and file2.

#! /usr/local/bin/bash
#
# Simple script to compile several files.  (Don't do this - use make)
#
gcc -o file1 -g -Wall file1.c
gcc -o file2 -g -Wall file2.c

If we store this in a file called compile_all and give it execute permission with the command "chmod u+x compile_all", we can run it just as we run a compiled C program (for example, by typing ./compile_all).

We can define variables in our compile_all script:

#! /usr/local/bin/bash
#
# Simple script to compile files.  It's a little better, but still use make
#
CC=gcc
CFLAGS="-g -Wall"
$CC $CFLAGS -o file1 file1.c
$CC $CFLAGS -o file2 file2.c
echo "done"

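There are a few things to point out about this script. First, variables are set much as in a Makefile, except that there must be no spaces around the =, and a value containing whitespace must be quoted. Second, you refer to a variable by preceding its name with a $ (no parentheses are needed, unlike $(CC) in a Makefile). Finally, echo simply prints its arguments, so the script reports "done" when it finishes.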

The ability to group commands like this is helpful in itself, but much more is possible. You can use pipes and I/O redirection in scripts. If you have a complex command line where several steps are needed to pipe the output from one command to the next, you can create a script to avoid retyping the command line.

You can pass arguments from the command line to your script. Try this one:

#! /usr/local/bin/bash
#
# Simple script to compile files, taking the names as command-line arguments
# Still use make.
#
CC=gcc
CFLAGS="-g -Wall"
$CC $CFLAGS -o $1 $1.c
$CC $CFLAGS -o $2 $2.c
echo "done"

Running this at the command prompt with compile_all file1 file2 would result in the same functionality as the previous example. We see that the command line arguments can be referenced as $1, $2, $3, etc.

Shell scripts are much more powerful when we start to take advantage of the available control structures. These include all of the usual control structures you'd expect to see in a programming language:

if  list; then list; 
[ elif list; then list; ] 
... 
[ else list; ] 
fi

while list; do list; done

until list; do list; done

case word in [ ( pattern [ | pattern ] ... ) list ;; ] ...
esac

for name [ in word ] ; do list ; done

The syntax for these is a bit different than what you're used to, but each of these does what you'd expect.

You can do things like check the value of a variable:

# Quote $CC so the test still works if the variable is empty or unset
if [ "$CC" == "gcc" ]; then
  echo "We are using gcc"
else
  echo "We are not using gcc"
fi

Or check for the existence of a file:

if [ -f "file1.c" ]; then
  echo "We have a file file1.c"
else
  echo "file1.c not found"
fi

A frequently useful operation is to do something with each of a number of files. For example, to copy each C file in a directory to have a backup copy with a .bak extension:

for file in *.c; do
  cp "$file" "$file.bak"
done

The best way to learn about bash shell scripts is to look at larger examples and the bash man page. A simple example is the .bashrc file in your home directory. This is a bash script that bash runs each time it starts an interactive shell. It is a relatively straightforward script, but includes examples of how to write functions in a bash script. A shell script is also used to start up the FreeBSD systems we use. Take a look at the file /etc/rc on one of the FreeBSD systems and find out what happens when one of these systems is booted.

Assigning the output of a program to a variable

You can run a command and save its output in a shell variable by enclosing the command in backquotes. For example, to save the output of wc for the file Random.java in the variable count:

count=`wc Random.java`
