Computer Science 010

Lecture Notes 10

Working with Large C Programs

Header files

So far we have been working with very small C programs. The entire program fits into a single file. As programs grow, it becomes necessary to split them into multiple files. When you do this, you will want to decide which part of the file you want to make visible to other files and which you want to keep private. C does not have keywords like public and private as Java does. Instead, you decide which parts you want to be public and you put their declarations into a header file.

Header files end in .h. The filenames that we have been giving in our #include statements are the names of header files. Header files typically contain declarations only: types declared with typedef, constants defined with #define, and function prototypes. The function definitions stay in the .c file; only the prototypes go in the .h file. Anybody who wants to use the things declared in a .h file must #include it. Note that the .c file where the function definitions appear should also #include the .h file, so that the compiler can check that each definition matches its prototype. When we include header files provided by Unix, we enclose the filename in <>. When we include our own files, we put the filename in "":

#include <string.h>
#include "myfile.h"
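As a rough sketch of this split, here is a hypothetical module called counter. Both pieces are shown in one listing so that it compiles as is. The names counter.h, counter.c, Counter, COUNTER_START, counter_new and counter_next are all invented for illustration; in a real project the first part would be the contents of counter.h and the second part would be counter.c.

```c
/* ---- would live in counter.h (hypothetical): declarations only ---- */
#define COUNTER_START 0            /* a constant */

typedef struct {
    int value;
} Counter;                         /* a type */

Counter counter_new(void);         /* function prototypes */
int counter_next(Counter *c);

/* ---- would live in counter.c (hypothetical): the definitions ---- */
/* In a real project this file would start with: #include "counter.h" */

/* Creates a counter that starts at COUNTER_START. */
Counter counter_new(void)
{
    Counter c = { COUNTER_START };
    return c;
}

/* Returns the current value, then advances the counter by one. */
int counter_next(Counter *c)
{
    return c->value++;
}
```

A file that wants to use the counter would #include "counter.h" and call counter_new and counter_next without ever seeing the function bodies.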

Compiling and Linking Large Programs

When you want to build an executable program from more than one C file, you need several Unix commands. First, you compile each C file to create object code for that file. Then you link the object code together into an executable program. For example, suppose we have 2 C files: file1.c and file2.c. We wish to create an executable named myprog. We would issue the following Unix commands:

-> gcc -g -Wall -c file1.c
-> gcc -g -Wall -c file2.c
-> gcc -o myprog file1.o file2.o

There are several important differences in the use of the gcc command. When you are compiling, you include a -c argument and omit the -o argument. The compiler will create an object code file with the same name as your source file but ending in .o instead of .c. .o is the traditional extension for object code that represents only part of an executable program. The last gcc command generates the executable program given a complete list of the .o files that make up the object code for the program. Note that this last line does not list any .c files.

Errors when Linking

The most common error by far when linking is "Undefined reference". This indicates that your program has a declaration for something, but not a definition. Most likely, you have a function prototype but not the function body. This can happen if you declare a prototype but forget to define the function. It can also happen if you include a header file, but do not link in the object code that goes with that header file. The object code gets linked in one of several ways:

By listing the .o file on the link line. This is how you link in object code that corresponds to C files that you write.
Automatically. This is how the functions declared in stdio.h, stdlib.h and string.h get linked in.
By specifying the library with a -l argument on the link line. When you use a library function, the man page for that function tells you what .h file to include. It may also show a "cc" line in the synopsis. If it does, note what the -l argument is that it lists. You must include this -l argument on your link line. For example, if you look at the man page for islower by looking in section 3 using xman or saying "man 3 islower" to the shell, you will see near the beginning of the man page:
```
LIBRARY
     Standard C Library (libc, -lc)
   
SYNOPSIS
     #include <ctype.h>
   
     int
     islower(int c);
```
The Library section shows the argument to use when linking the program. The Synopsis section shows what to include and the signature of the function.
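To make the difference between a declaration and a definition concrete, here is a small sketch. The function count_lower is a made-up name; islower comes from ctype.h, as the man page excerpt says, and since it lives in the standard C library it gets linked in automatically.

```c
#include <ctype.h>   /* taken from the SYNOPSIS section of the man page */

/* Declaration (prototype): enough for the compiler to accept calls. */
int count_lower(const char *s);

/* Definition: what the linker needs. If this body were missing from every
   .o file on the link line, linking would fail with an undefined
   reference to count_lower. */
int count_lower(const char *s)
{
    int n = 0;
    for (; *s != '\0'; s++) {
        /* islower expects a value representable as unsigned char */
        if (islower((unsigned char) *s)) {
            n++;
        }
    }
    return n;
}
```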

Make

A large C program might contain 100 or more C and header files. Whenever a header file changes, you should recompile all the C files that use that header file and then relink your program. Keeping track of what has changed and knowing which files include which other files is difficult. Fortunately, there is a program called make that can help us deal with large C programs.

A Makefile consists of a collection of definitions followed by a collection of rules. The definitions define variables that are typically dependent on where you are building the program. They may define such things as the directories where files should be located.

The rules define how the program should be compiled. A rule consists of a target, the names of the files it depends upon, and one or more Unix commands to build the target. When you execute make, it reads in the Makefile. It looks for a file that has the name of each target. For each target, it compares the date and time at which the target was last modified with the date and time at which each file it depends on was modified. If the target is older than any of the files it depends on, the commands associated with the rule are executed. Usually, those will recreate the target. Here's an example:

match.o:  match.c
        gcc -c -g -Wall match.c
   
match:  match.o
        gcc -o match match.o

The first rule says that the object code file match.o depends on the source file match.c. If the source file is changed, its modification time will be more recent than that of match.o. If we run make, make will notice this and will recompile the file for us using the gcc command given in the rule. If the object file is newer than the source file, the source file has not changed since the last time we compiled it, so we do not need to compile it again. In that case, the gcc command will be skipped.

The second rule says that the file match depends on the file match.o. This means that if the object code is newer than the executable program, we should relink the executable program. In this case the second command will be used to link match.

Now, suppose one of the files is missing. If a target is missing, but all of the files it depends on are present, the command(s) associated with the rule are executed. If one of the files in the dependency list is missing, make looks for another rule in which that dependency file appears as a target. It rebuilds the dependency file and then rebuilds the original target in that case. So, suppose the files match.c and match exist, but match.o does not. If we want to rebuild match, we can use the command:

make match

Make tries to compare the modification date of match with that of match.o. match.o does not exist so it looks for a rule with match.o as a target. It finds one. All the dependency files exist for match.o. So, it issues the following commands:

gcc -c -g -Wall match.c
gcc -o match match.o

As a result, it creates both match.o and match. If an error occurs during the execution of any command, make quits at that point. So, if match.c had compilation errors, make would not attempt to link match.

This all becomes more useful when we have large programs. We can put more than one file in a dependency list. The rules for .o targets typically include the corresponding .c file and any .h files #include'd in that .c file. The rules of an executable target typically list all the .o files that need to be linked together to create the executable program.

One extremely important rule about using makefiles is that the commands associated with a rule must be on lines that start with a TAB character. No other whitespace will do. If you do this incorrectly, make will report an error such as "Need an operator" (GNU make reports "missing separator") when it tries to read your makefile.

Also, note that the default name for the makefile is Makefile or makefile. If you use that name, you do not need to tell make where the makefile is.

Now, let's explore a use for the definitions. It is generally a good idea to define some identifiers that indicate what compiler to use and what the compiler arguments should be. Then the rules just use those definitions. Here's how that would look:

# This defines an identifier whose value is the name of the C compiler
CC = gcc
   
# This identifier holds the flags to compile with
CFLAGS = -c -g -Wall
   
# This defines the linker program
LD = gcc
   
# These are my linker arguments.  In this case, there are none.
LDFLAGS = 
   
# Here are rules that use the identifiers.  The syntax $(CC) means use the value of the
# CC identifier.
match.o:  match.c
        $(CC) $(CFLAGS) match.c
   
match:  match.o
        $(LD) $(LDFLAGS) -o match match.o

The advantage of defining identifiers particularly pays off with large makefiles. In that case, I could just change my definition of CFLAGS and it would affect all the rules that use that identifier.

Makefiles can be extremely complicated, but what you see here can go quite far.

C Preprocessor

Some of the lines that we have placed in our C files are not actually C code. Instead they are commands that are interpreted by the C preprocessor at the beginning of compilation. All preprocessor commands begin with a # and consist of a single line. So far, we have seen 2 preprocessor commands:

#include - used to include header files
#define - used to define constants

The C preprocessor is automatically run at the beginning of compilation. It reads the C file looking for preprocessor commands. When it interprets a command, it removes the command so that the rest of the compiler does not see it and performs some transformation on the text that is passed on to the rest of the compiler. So, here is what the two commands we have used do:

#include replaces the #include statement with the contents of the file being included. After including the text, it then continues its preprocessing with the newly-included file. Thus, header files can include preprocessor commands.
#define looks through the remainder of the file and replaces all occurrences of the first word with the remainder of the line. Thus,
```
#define TRUE 1
```
replaces all later occurrences of the identifier TRUE with 1.
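One detail is easier to see in code: the replacement works on whole tokens, so text inside a string literal is left alone. The sketch below uses two made-up functions, always_true and greeting_length, and a made-up GREETING macro to show both cases.

```c
#include <string.h>

#define TRUE 1
#define GREETING "TRUE story"   /* TRUE inside the quotes is not replaced */

/* After preprocessing, the compiler sees: return 1; */
int always_true(void)
{
    return TRUE;
}

/* Length of the untouched string literal "TRUE story". */
size_t greeting_length(void)
{
    return strlen(GREETING);
}
```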

#define is different from const in several ways:

#define is evaluated by the preprocessor rather than the compiler.
All occurrences of the defined name from the point of definition to the end of the file are replaced.
#define can be used to do lots of things other than define constants. See the book for details.
const is evaluated by the compiler. It is an indication that your intent is to define a constant. It will ensure that you do not attempt to change the value.
const uses the same scoping rules as variables. If you declare a const within a function the const name only has meaning within that function.
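Here is a small sketch of the scoping difference. LIMIT, factor, scaled and clamp_to_limit are names invented for this example.

```c
#define LIMIT 10            /* visible from here to the end of the file */

/* Multiplies x by a constant that exists only inside this function. */
int scaled(int x)
{
    const int factor = 3;   /* visible only inside scaled() */
    /* factor = 4;  <- the compiler would reject this assignment */
    return x * factor;
}

/* Caps x at LIMIT. */
int clamp_to_limit(int x)
{
    /* LIMIT is usable here; factor is not in scope in this function */
    return x > LIMIT ? LIMIT : x;
}
```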

There are some other useful preprocessor commands that provide conditional compilation. C is not as portable a language as Java. Nevertheless, we would like to write one C program and be able to compile and run it anywhere. To do this, though, we might need to take into account the features of a particular architecture or operating system. We might not be able to use exactly the same source code everywhere. Conditional compilation allows us to do that. We can surround a chunk of code with a conditional compilation operator and the chunk of code only gets included in the file sent from the preprocessor to the remainder of the compiler if the condition is true. Of course, the condition must be something that can be evaluated at compile time. There are 3 conditional compilation preprocessor commands:

#if <expression> ... #elif <expression> ... #endif
#ifdef <identifier> ... #endif
#ifndef <identifier> ... #endif
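As a minimal sketch of #if with an expression, the following picks a type based on INT_MAX from limits.h, a value the preprocessor knows at compile time. The names count32 and one_million are made up for this example, and #else is the catch-all branch that goes with #if.

```c
#include <limits.h>

/* Chooses a type with at least 32 bits, decided entirely at compile time. */
#if INT_MAX >= 2147483647
typedef int count32;      /* int is wide enough on this platform */
#else
typedef long count32;     /* long is guaranteed to have at least 32 bits */
#endif

/* Computes a value that would not fit in a 16-bit int. */
count32 one_million(void)
{
    return (count32) 1000 * 1000;
}
```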

#ifdef is useful to compile out debugging statements and be able to easily compile them back in as follows:

#define DEBUG 1
....
#ifdef DEBUG
   printf ("Some debugging output.\n");
#endif

If I want to disable all debugging output protected by similar statements, I can remove the #define statement so that DEBUG is not defined and recompile. None of my debugging printfs will get compiled.

One particular expression we can use in #if is:

#if defined (<identifier>)

This will include the enclosed source if and only if the identifier has been defined. It does not matter what value it is defined to. This is particularly useful in conjunction with the -D compiler option. -D is given an identifier and defines it. So, now I would leave out the #define DEBUG 1 line. Instead, if I wanted debugging, I would compile with the -DDEBUG compiler option. If I don't want debugging output, I would just omit that option.
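A sketch of what such a file might look like follows. Notice that it contains no #define DEBUG line at all. The function name add and the file name sum.c are assumptions made for illustration.

```c
#include <stdio.h>

/* Adds two numbers. The trace line is compiled in only when DEBUG is
   defined, for example by building with:
       gcc -DDEBUG -g -Wall -c sum.c                                  */
int add(int a, int b)
{
    int sum = a + b;
#if defined(DEBUG)
    fprintf(stderr, "add(%d, %d) = %d\n", a, b, sum);
#endif
    return sum;
}
```

Compiling the same file without -DDEBUG produces object code with no trace of the fprintf call.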

#ifdef is simply shorthand for "#if defined". Normally, you would use #ifdef unless you wanted to have #elif clauses. Before the C23 standard there was no #elifdef, so you need to say defined if you want a list of them in portable code. #ifndef is just the negation of #ifdef.
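Here is a sketch of such a list, which also ties back to the portability problem. The function name platform_name is made up; _WIN32, __APPLE__ and __linux__ are macros that compilers on those systems commonly predefine, and #else is the catch-all branch.

```c
/* Picks a name for the platform at compile time. Only one of the
   return statements survives preprocessing. */
const char *platform_name(void)
{
#if defined(_WIN32)
    return "Windows";
#elif defined(__APPLE__)
    return "Apple";
#elif defined(__linux__)
    return "Linux";
#else
    return "unknown";
#endif
}
```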

Header files can #include other header files. If we are not careful, we can easily get into a situation in which the same header file gets included multiple times. If this happens, our program may not compile because the compiler will see multiple definitions of the same thing. To avoid this, header files typically use the following format:

#ifndef _FOO_H
#define _FOO_H
   
< contents of header>
   
#endif

The convention is that each header file defines a preprocessor macro that represents that file and is extremely unlikely to be defined elsewhere. This is done through naming conventions. The macro is simply the name of the header file, all capitalized, preceded by _ and with the . replaced with _. If the file is #include'd multiple times, the contents will really only be textually included one time. The other times the contents will be skipped because the header file's macro has already been defined. Strictly speaking, C reserves names that begin with an underscore followed by a capital letter for the compiler and system headers, so in your own header files a name without the leading underscore, such as FOO_H, is the safer choice.
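To see the guard at work, the sketch below pastes the contents of a hypothetical header point.h into one file twice, which is what the preprocessor would produce if the header were #include'd twice. The names POINT_H, Point and point_sum are choices made for this sketch.

```c
/* First "inclusion" of the hypothetical header point.h */
#ifndef POINT_H
#define POINT_H

typedef struct {
    int x;
    int y;
} Point;

#endif /* POINT_H */

/* Second "inclusion": POINT_H is already defined, so the preprocessor
   skips everything up to the matching #endif and Point is not
   declared a second time. */
#ifndef POINT_H
#define POINT_H

typedef struct {
    int x;
    int y;
} Point;

#endif /* POINT_H */

/* Uses the type, which the compiler saw exactly once. */
int point_sum(Point p)
{
    return p.x + p.y;
}
```

Without the #ifndef lines, the compiler would see two conflicting declarations of Point and refuse to compile the file.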

Similarly, sometimes multiple .h files define the same things. For example, both string.h and stdlib.h define NULL (fortunately to the same value!). To prevent errors, they put their definition of NULL inside an #ifndef NULL block.

With everything we know about C now, we should be able to understand most header files, so let's take a look at string.h. Recall that this file lives in the /usr/include directory. I have added my own comments below.

/* Copyright info omitted */

/* Check if this file has already been included.*/
#ifndef _STRING_H_

/* Set flag to indicate we are now including this file. */
#define _STRING_H_

/* A nested include... */
#include <machine/ansi.h>

/* The next two are pretty common and defined in a bunch of different
   .h files so there's a check to see if they've already been 
   defined before defining them. */
#ifdef  _BSD_SIZE_T_
typedef _BSD_SIZE_T_    size_t;
#undef  _BSD_SIZE_T_
#endif

#ifndef NULL
#define NULL    0
#endif

/* Another nested include */
#include <sys/cdefs.h>

/* Prototypes for the functions defined by string.h */
__BEGIN_DECLS
void    *memchr __P((const void *, int, size_t));
int      memcmp __P((const void *, const void *, size_t));
void    *memcpy __P((void *, const void *, size_t));
void    *memmove __P((void *, const void *, size_t));
void    *memset __P((void *, int, size_t));
char    *strcat __P((char *, const char *));
char    *strchr __P((const char *, int));
int      strcmp __P((const char *, const char *));
int      strcoll __P((const char *, const char *));
char    *strcpy __P((char *, const char *));
size_t   strcspn __P((const char *, const char *));
char    *strerror __P((int));
size_t   strlen __P((const char *));
char    *strncat __P((char *, const char *, size_t));
int      strncmp __P((const char *, const char *, size_t));
char    *strncpy __P((char *, const char *, size_t));
char    *strpbrk __P((const char *, const char *));
char    *strrchr __P((const char *, int));
size_t   strspn __P((const char *, const char *));
char    *strstr __P((const char *, const char *));
char    *strtok __P((char *, const char *));
size_t   strxfrm __P((char *, const char *, size_t));

/* Notice the conditional compilation that defines more prototypes.
   We get to use these if we don't define constants elsewhere that
   state we intend our code to be portable. */
/* Nonstandard routines */
#if !defined(_ANSI_SOURCE) && !defined(_POSIX_SOURCE)
int      bcmp __P((const void *, const void *, size_t));
void     bcopy __P((const void *, void *, size_t));
void     bzero __P((void *, size_t));
int      ffs __P((int));
char    *index __P((const char *, int));
void    *memccpy __P((void *, const void *, int, size_t));
char    *rindex __P((const char *, int));
int      strcasecmp __P((const char *, const char *));
char    *strdup __P((const char *));
size_t   strlcat __P((char *, const char *, size_t));
size_t   strlcpy __P((char *, const char *, size_t));
void     strmode __P((int, char *));
int      strncasecmp __P((const char *, const char *, size_t));
char    *strsep __P((char **, const char *));
char    *strsignal __P((int));
char    *strtok_r __P((char *, const char *, char **));
void     swab __P((const void *, void *, size_t));
#endif
__END_DECLS

#endif /* _STRING_H_ */
