[C-tut] WIP - Whirlwind Tour of C

wtd · **Posted:** Sat Jun 17, 2006 6:47 pm

What kind of basic datatypes do we have at our disposal?

short
int
long
long long
float
double
char

The caveat about the int, long, and long long types is that they're machine dependent. On your run of the mill 32-bit system, both int and long will generally be 32 bits. The "long long" type will generally be 64 bits, and short integers are typically 16 bits. The dichotomy between int and long derives from the age of 16-bit computers.

The float and double types represent floating point numbers. A float typically occupies 32 bits of memory, while a double is double that, at 64 bits.

The char type is a small integer, 8 bits long on nearly every system you will meet, and is typically used to hold ASCII characters. As it is an integral data type, it can be used anywhere integers can.
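Since the sizes are machine dependent, it can help to ask the compiler directly. Here is a minimal sketch using sizeof and the CHAR_BIT macro from limits.h. The names bits_in and print_type_sizes are made up for this example.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: convert a size in bytes to a size in bits. */
size_t bits_in(size_t bytes)
{
    return bytes * CHAR_BIT;
}

/* Print the width of each basic type on the current platform. */
void print_type_sizes(void)
{
    /* The standard guarantees this ordering, but not the exact widths. */
    assert(sizeof(short) <= sizeof(int) && sizeof(int) <= sizeof(long));

    printf("char:      %lu bits\n", (unsigned long) bits_in(sizeof(char)));
    printf("short:     %lu bits\n", (unsigned long) bits_in(sizeof(short)));
    printf("int:       %lu bits\n", (unsigned long) bits_in(sizeof(int)));
    printf("long:      %lu bits\n", (unsigned long) bits_in(sizeof(long)));
    printf("long long: %lu bits\n", (unsigned long) bits_in(sizeof(long long)));
    printf("float:     %lu bits\n", (unsigned long) bits_in(sizeof(float)));
    printf("double:    %lu bits\n", (unsigned long) bits_in(sizeof(double)));
}
```

Calling print_type_sizes() from main on two different machines is a quick way to see which of the sizes above actually differ.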

Booleans?

You'll note there is no boolean data type in that list. Zero represents false, while every other value is true. The stdbool.h header, added in C99, gives us macros true, false, and bool. The bool macro resolves to the type _Bool. The macros true and false resolve to the integer constants 1 and 0, respectively.

This header also defines the __bool_true_false_are_defined macro, which resolves to 1. This will allow you to detect if the stdbool.h header has been included.
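A small sketch of how this might look in practice, assuming a C99 compiler. The function names is_even, to_bool and bool_demo are made up for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Refuse to compile if the stdbool.h macros are somehow missing. */
#ifndef __bool_true_false_are_defined
#error "stdbool.h has not been included"
#endif

/* Returns true when n is even. */
bool is_even(int n)
{
    return n % 2 == 0;
}

/* Any nonzero value becomes true (1) when converted to bool. */
bool to_bool(int n)
{
    return n;
}

/* Hypothetical checks documenting what the macros resolve to. */
void bool_demo(void)
{
    assert(true == 1);
    assert(false == 0);
    assert(to_bool(42) == true);
}
```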

What's this talk of macros?

As a convenience C provides a very basic macro facility known as preprocessing directives. These are commands embedded in C source code which are evaluated by the C Preprocessor (cpp) before the program is compiled.

Perhaps the one most programmers are familiar with is #include. This directive takes the name of a file and then includes that file's contents into the program's source code. When provided with a filename surrounded by < > it looks for that file in a default set of directories, as well as others specified when the compilation is done. If the filename is in quotes, then it looks in directories relative to the program's source code file.

Also important is the #define directive, which permits creation of new macros. These work purely on the basis of textual substitution. For example, if I create the following macro:

code:

#define FOO 42

Then the following expression:

code:

FOO + FOO

Resolves to:

code:

42 + 42

That final version is what the compiler actually sees.

A more practical example would be the true macro from the stdbool.h header.

code:

if (true) { ... }

Resolves to:

code:

if (1) { ... }

There are also various conditional directives, such as #if, #ifdef, #ifndef, #else, and #endif. They are all relatively self-explanatory. Consider some code to follow my earlier FOO example:

code:

#ifndef FOO
#define FOO 42
#else
#define BAR 27
#endif

Here if the symbol FOO is not defined, it will be defined. Otherwise the symbol BAR will be defined.

Comments

The C99 standard permits line comments that start with //. However, older compilers cannot be counted on to accept them, so the safest commenting style is as follows:

code:

/* One line comment. */

/*
* Multi-line
* comment.
*/

int main

So, we need a way to designate where the program starts. The main function serves this purpose.

Functions, eh?

Functions in C are reasonably simple. You specify the name of the function, the arguments it takes, and the type of data it returns. That return value is indicated by the return keyword, which also jumps to the end of the function, from anywhere in the function. This behavior is sometimes used where "else" clauses in conditionals should be.

code:

int times_two(int a)
{
return a * 2;
}

double times(double a, double b)
{
return a * b;
}

Function calls are also straightforward.

code:

times(4.5, 6.7)

Every function in a C program must have a unique name.

Functions will not be "aware" of functions declared later in the program. In order to achieve this, you must use forward declarations. A forward declaration looks exactly like the functions we've seen previously, except that it's missing the body.

code:

int foo(int bar);

void baz()
{
int a = foo(5);
}

int foo(int bar)
{
return bar + 1;
}

Parameter names may be omitted in the forward declaration, but you may also wish to keep them so your code is more self-documenting.

Parameters?

Parameters are passed only by value.

code:

int foo(int bar)
{
bar += 1;
return bar;
}

This will return bar plus 1, but if a variable is passed as an argument to the function, that variable's value will not change.
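To see that for yourself, here is a minimal sketch. The names add_one and caller_value_after_call are made up for this example.

```c
#include <assert.h>

/* Receives a copy of the caller's value; changing it has no outside effect. */
int add_one(int bar)
{
    bar += 1;
    return bar;
}

/* Hypothetical demonstration: returns the caller's variable after the call. */
int caller_value_after_call(void)
{
    int original = 5;
    int result = add_one(original);

    assert(result == 6);   /* the function saw 5 and returned 6 */

    return original;       /* still 5: only the copy was incremented */
}
```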

More types

void

The void type is used to indicate that a function returns nothing.

Pointers

Pointers are simply variables which hold the memory address of some other value. This provides a considerable amount of power to the otherwise minimal functionality of C.

Let's declare a pointer to an integer.

code:

int *foo;

As with other variables, we can declare multiple variables at once, and in the case of pointers, this brings up the question: why did I put the asterisk with the variable name, rather than the int type?

code:

int *foo, bar;

The above actually declares foo as a pointer to an int, and bar as an int.

Of course, just declaring a pointer doesn't do much. We'll probably want it to actually point to something. As with any other local variable in C, if you don't give it a value, it holds whatever garbage happened to be in memory. To avoid this, if we have no other meaningful value for a pointer, we should start out initializing it to NULL.

code:

int *foo = NULL;

NULL is a preprocessor macro which resolves to a null pointer constant, usually written as zero. This also means we can treat NULL pointers as false.

Now, if we have an existing value, we can use the & operator to get its memory address.

code:

int foo = 42;
int *bar = &foo;

The * operator does exactly the opposite, taking a pointer and returning the underlying value.

code:

int foo = 42;
int *bar = &foo;
int baz = *bar;

We can change the underlying value like so:

code:

int foo = 42;
int *bar = &foo;
int baz = *bar;

*bar = 27;

Now the value of foo has changed, because bar points to foo and we assigned through bar. The value of baz has not changed, since the initial assignment copied the value.
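Since parameters are passed only by value, handing a function a pointer is how we let it change a variable that belongs to the caller. A minimal sketch, with swap_ints being a name made up for this example:

```c
#include <assert.h>
#include <stddef.h>

/* Exchange the two ints that a and b point to. */
void swap_ints(int *a, int *b)
{
    int temp;

    assert(a != NULL && b != NULL);

    temp = *a;   /* read the value a points to */
    *a = *b;     /* overwrite it with the value b points to */
    *b = temp;
}
```

A caller would write swap_ints(&x, &y), using the & operator to pass the addresses of its own variables.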

Arrays

Arrays in C are closely tied to pointers. An array is a contiguous section of memory that holds values, and in most expressions its name is converted to a pointer to the first of them.

The simplest array is one that is statically allocated.

code:

int foo[10];

This tells the compiler to set aside space for ten integers in memory. When we use foo in an expression, it acts as a pointer to the first of those integers.

We can see this by dereferencing that first element with:

code:

*foo

But we can also use another syntax to do this.

code:

foo[0]

Why zero?

Well, because adding to a pointer moves it forward by whole elements, and adding zero leaves it where it is. So:

code:

*foo

Is equivalent to:

code:

*(foo + 0)

Which is equivalent to:

code:

foo[0]
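The same equivalence holds for every index, not just zero: foo[i] means *(foo + i). A short sketch of two functions that differ only in which spelling they use. Both names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Sum the first len elements using index syntax. */
int sum_with_index(const int *values, size_t len)
{
    int total = 0;
    size_t i;

    assert(values != NULL || len == 0);

    for (i = 0; i < len; i++)
    {
        total += values[i];
    }

    return total;
}

/* The same sum, written with explicit pointer arithmetic. */
int sum_with_pointer(const int *values, size_t len)
{
    int total = 0;
    size_t i;

    assert(values != NULL || len == 0);

    for (i = 0; i < len; i++)
    {
        total += *(values + i);
    }

    return total;
}
```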

So, what about a dynamic array?

Well, array syntax works on any pointer so, to declare a dynamic array:

code:

int *my_array;

We can use the function malloc, declared in stdlib.h, to actually allocate the memory for the array.

code:

int *my_array = malloc(sizeof(int) * 10);

We can actually use malloc to dynamically allocate any amount of memory. Even if only for a single int.

code:

int *foo = malloc(sizeof(int));
*foo = 3;

Anyway, the dynamically allocated array can be used just like any other array.
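Two things are worth adding in real code: malloc returns NULL when it cannot find the memory, and whatever malloc hands out should eventually be given back with free. A minimal sketch, where make_squares is a name made up for this example:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: allocate an array holding 0*0, 1*1, ..., (n-1)*(n-1).
 * Returns NULL if the allocation fails. The caller must free the result. */
int *make_squares(size_t n)
{
    int *squares;
    size_t i;

    assert(n > 0);

    squares = malloc(sizeof(int) * n);

    if (squares == NULL)
    {
        return NULL;
    }

    for (i = 0; i < n; i++)
    {
        squares[i] = (int) (i * i);   /* indexed just like any other array */
    }

    return squares;
}
```

The caller checks the result against NULL, uses it like an ordinary array, and calls free on it when done.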

Strings

Strings in C are just arrays of characters that happen to get terminated by the null character (zero).
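That terminator is the only way a function can tell where a string ends. As an illustration, here is a sketch of how a length function might walk a string. The standard library already provides strlen for this, and count_chars is a name made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical re-implementation of strlen, for illustration only. */
size_t count_chars(const char *str)
{
    size_t len = 0;

    assert(str != NULL);

    /* Keep going until we reach the null character. */
    while (str[len] != '\0')
    {
        len++;
    }

    return len;
}
```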

Let's compile something, for crying out loud!

Let's start with a simple "Hello, world!" program. Here's the source.

code:

#include <stdio.h>

int main()
{
printf("Hello, world!\n");

return 0;
}

It's very important to end the file with a newline after the closing brace. C is meant to be cross-platform, and some compilers won't see a file as properly constructed without a terminating newline.

But I digress. Let's call this file main.c and put it in a new directory called ww-hello-world.

Now, we need to create a makefile. Well, we could get away without it, but we won't. Just like I could have put all of that main function on one line and gotten away with it.

In the same directory, in a file named "makefile":

code:

CC = gcc

# the default
all : hello

main.o : main.c
	${CC} main.c -c -o main.o

hello : main.o
	${CC} main.o -o hello

clean :
	touch main.o
	touch hello
	rm main.o hello

Briefly....

The first line defines a variable CC which is set to equal gcc. This represents our C compiler. If in the future we wish to change compilers, we can do it in one place.

Lines starting with # are comments. As the comment here indicates, "all" is the default rule.

The all rule, by means of what follows the colon indicates which rules it depends on. In this case, hello. So, to do all, we must do hello.

Looking at hello we see that to do hello we must do main.o. To do main.o we must do main.c. Oh, wait... there is no main.c rule. This just checks to make sure there is indeed a main.c file.

So, assuming we have a main.c file, the main.o rule proceeds to compile main.c to object code. The -c option tells our compiler not to link the file, and the -o option tells the compiler to name the output file main.o.

Now that we've done that, we're back to the hello rule. Here we take main.o and link it to get an executable. Again, the -o option specifies the name of the resulting file. We go back to all, and we're done.

The clean rule is there so we can clean up and go back to a pristine state. It first touches the files that would have been created so rm doesn't throw a fit about not being able to find them, if "make all" hasn't already been run.

Let's make main.c a little more interesting:

code:

#include <stdio.h>

void say_hello();

int main()
{
say_hello();

return 0;
}

void say_hello()
{
printf("Hello, world!\n");
}

Now, our makefile doesn't have to change at all. But let's create a situation where it will have to.

Let's say we want to re-use that say_hello function elsewhere. Well, we'll need to put it in a separate file. First off, let's put the forward declaration in say_hello.h.

code:

void say_hello();

But that won't do. What if we include it more than once? Then C goes nuts and the universe implodes. You don't want that. So, we need a guard.

code:

#ifndef __SAY_HELLO_H
#define __SAY_HELLO_H

void say_hello();

#endif

The symbol __SAY_HELLO_H will only ever be undefined once, so the declaration will only get inserted into the final source once.

Now, we'll need the actual implementation somewhere, so let's create say_hello.c.

code:

#include <stdio.h>
#include "say_hello.h"

void say_hello()
{
printf("Hello, world!\n");
}

And now we need to modify main.c.

code:

#include "say_hello.h"

int main()
{
say_hello();

return 0;
}

And our makefile will have to change.

code:

CC = gcc

all : hello

main.o : main.c
	${CC} main.c -c -o main.o

say_hello.o : say_hello.c
	${CC} say_hello.c -c -o say_hello.o

hello : main.o say_hello.o
	${CC} main.o say_hello.o -o hello

clean :
	touch main.o
	touch say_hello.o
	touch hello
	rm main.o say_hello.o hello

Automatic promotion

This is one of the areas where C appears a bit magical. Define a function that takes a double value as a parameter, and then pass it an int. It should balk at this, right? Well, it doesn't. Instead it silently changes the int to a double precision floating point number. The same will happen in reverse.

This also happens between integral types.

It makes the math look a bit better, but it can be a tad confusing.
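A minimal sketch of both directions. The function names are made up for this example. Note that going from double to int silently throws away the fractional part.

```c
#include <assert.h>

/* Takes a double; an int argument is converted silently. */
double half(double value)
{
    return value / 2.0;
}

/* Takes an int; a double argument is converted silently,
 * dropping everything after the decimal point. */
int as_whole_number(int value)
{
    return value;
}

/* Hypothetical demonstration of both directions. */
void conversion_demo(void)
{
    int three = 3;
    double almost_three = 2.9;

    assert(half(three) == 1.5);                  /* 3 became 3.0 */
    assert(as_whole_number(almost_three) == 2);  /* 2.9 became 2 */
}
```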

Yet more data types... structs

So you want to group some data together. For this we'll use structs. Let's also add some functions for dealing with it.

In the file name.h:

code:

#ifndef __NAME_H
#define __NAME_H

struct name
{
char *first, *last;
};

struct name *new_name(char *, char *);
char *full_name(struct name *);

#endif

And in name.c:

code:

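#include "name.h"
#include <stdlib.h>
#include <string.h>

struct name *new_name(char *first, char *last)
{
    struct name *result = malloc(sizeof(struct name));

    /* The fields are only pointers, so each needs memory of its own. */
    (*result).first = malloc(strlen(first) + 1);
    (*result).last = malloc(strlen(last) + 1);

    strcpy((*result).first, first);
    strcpy((*result).last, last);

    return result;
}

char *full_name(struct name *n)
{
    /* Room for both names, the space between them, and the terminator. */
    int len = strlen((*n).first) + 2 + strlen((*n).last);
    char *result = malloc(sizeof(char) * len);

    strcpy(result, (*n).first);
    strcat(result, " ");
    strcat(result, (*n).last);

    return result;
}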

There's a lot that's clumsy about this, though. We don't have to explicitly dereference those pointers to access the fields in the struct.

Additionally, we can typedef the struct so we don't have to include the struct keyword each time.

Oh, and we can use sprintf.

We might even use typedef to remove a lot of the asterisks by typedefing the name_t pointer.

And let's add a static function to allocate enough space for a name. The neat thing about a static function is that if we try to use it outside the file where it's defined, the compiler will complain.

code:

#ifndef __NAME_H
#define __NAME_H

typedef struct name
{
char *first, *last;
} name_t;

typedef name_t *name_ptr;

/* alloc_name is static, so it is declared only in name.c, not here. */
name_ptr new_name(char *, char *);
char *full_name(name_ptr);

#endif

code:

#include "name.h"
#include <stdio.h>  /* sprintf */
#include <stdlib.h> /* malloc */
#include <string.h>

static name_ptr alloc_name(int fn_len, int ln_len)
{
name_ptr result = malloc(sizeof(name_t));

result->first = malloc(sizeof(char) * fn_len + 1);
result->last = malloc(sizeof(char) * ln_len + 1);

return result;
}

name_ptr new_name(char *first, char *last)
{
name_ptr result = alloc_name(strlen(first), strlen(last));

strcpy(result->first, first);
strcpy(result->last, last);

return result;
}

char *full_name(name_ptr n)
{
int len = strlen(n->first) + 2 + strlen(n->last);
char *result = malloc(sizeof(char) * len);

sprintf(result, "%s %s", n->first, n->last);

return result;
}

So, now let's expand the say_hello component of the program.

code:

#ifndef __SAY_HELLO_H
#define __SAY_HELLO_H

#include "name.h"

void say_hello();
void say_hello_to(name_ptr);

#endif

code:

#include <stdio.h>
#include "say_hello.h"
#include "name.h"

void say_hello()
{
printf("Hello, world!\n");
}

void say_hello_to(name_ptr n)
{
printf("Hello, %s!\n", full_name(n));
}

And we need to add some code to main.c.

code:

#include "say_hello.h"
#include "name.h"

int main()
{
name_ptr bobs_name = new_name("Bob", "Smith");

say_hello();
say_hello_to(bobs_name);

return 0;
}

And now we just need a spiffy new makefile.

code:

CC = gcc

all : hello

name.o : name.c
	${CC} name.c -c -o name.o

main.o : main.c
	${CC} main.c -c -o main.o

say_hello.o : say_hello.c
	${CC} say_hello.c -c -o say_hello.o

hello : main.o say_hello.o name.o
	${CC} main.o say_hello.o name.o -o hello

clean :
	touch main.o
	touch say_hello.o
	touch name.o
	touch hello
	rm main.o say_hello.o name.o hello

Conditionals and Loops

What if one or both of the names is empty?

Well, we need a way to determine if a string is empty.

So, let's create string_utils.h:

code:

#ifndef __STRING_UTILS_H
#define __STRING_UTILS_H

#include <stdbool.h>

bool only_contains_whitespace(char *);

#endif

And string_utils.c:

code:

#include "string_utils.h"
#include <stdbool.h>
#include <string.h> /* strlen */

bool only_contains_whitespace(char *str)
{
int counter, len = strlen(str);
for (counter = 0; counter < len; ++counter)
{
char ch = str[counter];

if (!(ch == ' ' || ch == '\t' || ch == '\n'))
{
return false;
}
}

return true;
}

Now let's modify name.c.

code:

#include "name.h"
#include "string_utils.h"
#include <stdio.h>  /* sprintf */
#include <stdlib.h> /* malloc */
#include <string.h>

static name_ptr alloc_name(int fn_len, int ln_len)
{
name_ptr result = malloc(sizeof(name_t));

result->first = malloc(sizeof(char) * fn_len + 1);
result->last = malloc(sizeof(char) * ln_len + 1);

return result;
}

name_ptr new_name(char *first, char *last)
{
name_ptr result = alloc_name(strlen(first), strlen(last));

strcpy(result->first, first);
strcpy(result->last, last);

return result;
}

char *full_name(name_ptr n)
{
int len = strlen(n->first) + 2 + strlen(n->last);
char *result = malloc(sizeof(char) * len);

if (only_contains_whitespace(n->first) &&
only_contains_whitespace(n->last))
{
strcpy(result, "");
}
else if (only_contains_whitespace(n->first))
{
strcpy(result, n->last);
}
else if (only_contains_whitespace(n->last))
{
strcpy(result, n->first);
}
else
{
sprintf(result, "%s %s", n->first, n->last);
}

return result;
}

And we'll make some changes to say_hello.c.

code:

#include <stdio.h>
#include <stdlib.h>
#include "say_hello.h"
#include "name.h"
#include "string_utils.h"

void say_hello()
{
    printf("Hello, world!\n");
}

void say_hello_to(name_ptr n)
{
    char *fn = full_name(n);

    if (only_contains_whitespace(fn))
    {
        printf("Hello?\n");
    }
    else
    {
        /* Reuse fn rather than building the name a second time. */
        printf("Hello, %s!\n", fn);
    }

    /* full_name allocates with malloc, so release the result. */
    free(fn);
}

And the new makefile:

code:

CC = gcc

all : hello

name.o : name.c
	${CC} name.c -c -o name.o

main.o : main.c
	${CC} main.c -c -o main.o

say_hello.o : say_hello.c
	${CC} say_hello.c -c -o say_hello.o

string_utils.o : string_utils.c
	${CC} string_utils.c -c -o string_utils.o

hello : main.o say_hello.o name.o string_utils.o
	${CC} main.o say_hello.o name.o string_utils.o -o hello

clean :
	touch main.o
	touch say_hello.o
	touch name.o
	touch string_utils.o
	touch hello
	rm main.o say_hello.o name.o string_utils.o hello

Let's Build a List!

Every student of C does it. They all construct a linked list. So let's go nuts.

In linked_list.h:

code:

#ifndef __LINKED_LIST_H
#define __LINKED_LIST_H

#include <stdbool.h>

typedef struct linked_list_node
{
void *item;
struct linked_list_node *next;
} linked_list_node_t;

typedef linked_list_node_t *linked_list_node_ptr;

typedef struct linked_list
{
linked_list_node_ptr head;
} linked_list_t;

typedef linked_list_t *linked_list_ptr;

typedef void (*op_on_list_element)(int, linked_list_node_ptr);

linked_list_ptr new_list();
bool is_empty_list(linked_list_ptr);
int list_length(linked_list_ptr);
void list_push(linked_list_ptr, void *);
void list_pop(linked_list_ptr, bool);
void list_do_for_each(linked_list_ptr, op_on_list_element);

#endif

Whoa whoa whoa...

That looks like a pointer to void.

Here's where we encounter the other use of void. A void pointer has no particular type information associated with it. It's just a pointer. Thus we can give it any kind of pointer. This gives us the freedom to use the same linked list implementation for any kind of data, rather than having to create many implementations.

But, this is also extremely risky. The compiler uses type information to help you avoid problems. With void pointers, it has no such information, and you just have to count on your own discipline to keep you out of trouble. For instance, don't pass an int pointer in, then try to treat it like a string. The results are not pretty.

You'll also notice this little gem.

code:

typedef void (*op_on_list_element)(int, linked_list_node_ptr);

This is a function pointer that I'm giving a name with a typedef.
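If the syntax looks alien, a smaller case may help. This sketch names a type for "pointer to a function that takes an int and returns an int", then passes functions around by name. Everything in it (int_op, double_it, negate, apply_to_each) is made up for this example and is not part of the linked list code.

```c
#include <assert.h>
#include <stddef.h>

/* A named type for "pointer to a function taking an int, returning an int". */
typedef int (*int_op)(int);

int double_it(int n)
{
    return n * 2;
}

int negate(int n)
{
    return -n;
}

/* Apply op to every element of values, in place. */
void apply_to_each(int *values, size_t len, int_op op)
{
    size_t i;

    assert(values != NULL && op != NULL);

    for (i = 0; i < len; i++)
    {
        values[i] = op(values[i]);   /* call through the pointer */
    }
}
```

A function's name on its own, without parentheses, gives a pointer to it, so a call looks like apply_to_each(numbers, 3, double_it).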

Let's look at the implementation of these functions:

code:

#include "linked_list.h"
#include <stdbool.h>
#include <stdlib.h>
#include <stdio.h>

linked_list_ptr new_list()
{
linked_list_ptr result = malloc(sizeof(linked_list_t));
result->head = NULL;

return result;
}

bool is_empty_list(linked_list_ptr list)
{
return list->head == NULL;
}

int list_length(linked_list_ptr list)
{
linked_list_node_ptr current;
int len = 0;

for (current = list->head; current != NULL; current = current->next)
{
len++;
}

return len;
}

void list_push(linked_list_ptr list, void *new_item)
{
linked_list_node_ptr new_node = malloc(sizeof(linked_list_node_t));
new_node->item = new_item;
new_node->next = NULL;

if (is_empty_list(list))
{
list->head = new_node;
return;
}

linked_list_node_ptr current = list->head;

while (current->next != NULL)
{
current = current->next;
}

current->next = new_node;
}

void list_pop(linked_list_ptr list, bool free_item)
{
    if (!is_empty_list(list))
    {
        if (list_length(list) == 1)
        {
            linked_list_node_ptr temp = list->head;
            list->head = NULL;
            /* Honor free_item here too, not only in the general case. */
            if (free_item) free(temp->item);
            free(temp);
        }
        else
        {
            linked_list_node_ptr current = list->head;
            linked_list_node_ptr last;

            /* Stop on the next to last node. */
            while (current->next != NULL && current->next->next != NULL)
            {
                current = current->next;
            }

            last = current->next;
            current->next = NULL;
            if (free_item) free(last->item);
            free(last);
        }
    }
}

void list_do_for_each(linked_list_ptr list, op_on_list_element op)
{
linked_list_node_ptr current;
int len = 0;

for (current = list->head; current != NULL; current = current->next)
{
len++;
op(len, current);
}
}

And that's all fairly straightforward, so let's look at how main.c changes.

code:

#include "say_hello.h"
#include "name.h"
#include "linked_list.h"

void greet(int, linked_list_node_ptr);

int main()
{
linked_list_ptr foo = new_list();
list_push(foo, new_name("Bob", "Smith"));
list_push(foo, new_name("", ""));

list_do_for_each(foo, greet);

return 0;
}

void greet(int n, linked_list_node_ptr np)
{
say_hello_to(np->item);
}

And our makefile:

code:

CC = gcc
LN = gcc

all : hello

name.o : name.c
	${CC} name.c -c

main.o : main.c
	${CC} main.c -c

say_hello.o : say_hello.c
	${CC} say_hello.c -c

string_utils.o : string_utils.c
	${CC} string_utils.c -c

linked_list.o : linked_list.c
	${CC} linked_list.c -c

hello : main.o say_hello.o name.o string_utils.o linked_list.o
	${LN} main.o say_hello.o name.o string_utils.o \
		linked_list.o -o hello

clean :
	touch main.o
	touch say_hello.o
	touch name.o
	touch string_utils.o
	touch linked_list.o
	touch hello
	rm main.o say_hello.o name.o string_utils.o \
		linked_list.o hello

A copy of all of the source at this stage is available at:

http://familygeek.com/ww-hello-world.zip

Let's Look at That Makefile

Notice that we have "LN = gcc" as well as the compiler variable.

So, why the difference?

Well, there is a difference between compiling and linking. When we compile the individual source files, we generate machine code. But, we use functions that are defined elsewhere. How does this work?

Well, the function declarations in the header files tell the compiler how the functions look and how we can call them. The compiler checks to make sure we've used the functions properly, and generates the machine code to call those functions. We call this resulting machine code an object file.

The linker then takes a bunch of object files and links them together to form an executable. With gcc this can be one step, or two, by using the -c option.

This is why we'd get a linker error rather than a compiler error if we have a function declared, but missing an implementation.

The ability to compile files separately can be a great way of checking for errors.

What's this stdlib.h?

The stdlib.h header contains many useful functions and macros, among them malloc, calloc, free, and NULL, all of which we've been using. These will almost always be of interest to you.

This brings up an important point...

When you assume...

Don't make assumptions. If you need functions from stdio.h in your main function, for instance, then include it there, even if you're pretty sure some other header file you're including already includes it.

An Idiom

Let's go back to our linked list. Let's say we start creating an iterator library for use with our linked lists.

code:

#ifndef __LINKED_LIST_ITERATOR_H
#define __LINKED_LIST_ITERATOR_H

#include "linked_list.h"
#include <stdlib.h>

typedef struct linked_list_iterator
{
linked_list_ptr list;
linked_list_node_ptr current;
} linked_list_iterator_t;

typedef linked_list_iterator_t *linked_list_iterator_ptr;

linked_list_iterator_ptr new_linked_list_iterator(linked_list_ptr);

#endif

And then the implementation.

code:

#include "linked_list_iterator.h"
#include "linked_list.h"
#include <stdlib.h>

linked_list_iterator_ptr new_linked_list_iterator(linked_list_ptr list)
{
linked_list_iterator_ptr result = malloc(sizeof(linked_list_iterator_t));
result->list = list;
result->current = list ? list->head : NULL;

return result;
}

Pretty straightforward stuff. It isn't until we get to the next to last statement that the idiom shows up.

code:

result->current = list ? list->head : NULL;

It is tempting to write this as "list && list->head", as one would in some scripting languages. In C, boolean operators like && (and) do short-circuit. That is, as soon as the result of the expression is known, they stop evaluating. But the result of && in C is always an int, either 0 or 1, never one of the operands, so it cannot hand us the head pointer. The conditional operator can. So let's look at just the expression.

code:

list ? list->head : NULL

We know list is a pointer to a linked_list struct. It could be NULL. If it is, then it's false, list->head is never evaluated, and NULL is the result, which is exactly what we wanted in that case.

If list is not NULL, then we assume it to be a valid pointer, and the value of its head field is the result. Only one of the two branches is ever evaluated, so we never dereference a NULL pointer. This is exactly the desired outcome, and we managed to avoid a repetitive if/else construct.
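Short-circuiting earns its keep most often inside a condition, where the left side guards a dereference on the right. A minimal sketch, with is_nonempty and short_circuit_demo being names made up for this example:

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 when str points to at least one character, 0 otherwise.
 * The right-hand side is only evaluated when str is not NULL. */
int is_nonempty(const char *str)
{
    return str != NULL && str[0] != '\0';
}

/* Hypothetical checks documenting the behavior. */
void short_circuit_demo(void)
{
    assert(is_nonempty(NULL) == 0);    /* str[0] is never touched */
    assert(is_nonempty("") == 0);
    assert(is_nonempty("Bob") == 1);   /* the result is 1, not a pointer */
}
```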

Scope

Let's look at the function which creates a new name, just for fun.

code:

name_ptr new_name(char *first, char *last)
{
name_ptr result = alloc_name(strlen(first), strlen(last));

strcpy(result->first, first);
strcpy(result->last, last);

return result;
}

No no... I'm actually interested in alloc_name.

code:

static name_ptr alloc_name(int fn_len, int ln_len)
{
name_ptr result = malloc(sizeof(name_t));

result->first = malloc(sizeof(char) * fn_len + 1);
result->last = malloc(sizeof(char) * ln_len + 1);

return result;
}

Now... alloc_name dynamically allocates memory for a name struct, but wouldn't it be easier for me to just declare the struct as a local variable and then get the address of that variable?

code:

static name_ptr alloc_name(int fn_len, int ln_len)
{
    name_t result;

    result.first = malloc(sizeof(char) * fn_len + 1);
    result.last = malloc(sizeof(char) * ln_len + 1);

    return &result;   /* wrong: result ceases to exist when we return */
}

This would compile, though many compilers will warn about it. After all, the function returns a name_ptr, just as the signature says it should.

The problem is at run-time. To understand why this is a problem we have to understand the difference between the stack and the heap. Both are areas in memory. The stack, though, is a relatively small chunk of memory. Ordinary local variables in a function (C calls them automatic variables) get pushed onto the stack when the flow of execution enters that function. When we leave the function, that space gets reused for variables in other functions.

So you see, if we were to make result a local variable, and then return a pointer to it, we'd be returning a pointer to something that, as soon as the function finishes, no longer exists.

With dynamic memory allocation, the variable gets stored on the heap, which is not affected by going in and out of functions.
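There are two safe ways out, sketched below with a small made-up struct. The names point_t, new_point, make_point and point_sum are all hypothetical. One puts the struct on the heap, the other returns it by value so the caller receives its own copy.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct point
{
    int x, y;
} point_t;

/* Safe: the struct lives on the heap and outlives this call.
 * Returns NULL on allocation failure; the caller must free the result. */
point_t *new_point(int x, int y)
{
    point_t *result = malloc(sizeof(point_t));

    if (result != NULL)
    {
        result->x = x;
        result->y = y;
    }

    return result;
}

/* Also safe: the struct is returned by value, so the caller gets a copy. */
point_t make_point(int x, int y)
{
    point_t result;

    result.x = x;
    result.y = y;

    return result;
}

/* Works on either kind, since it only needs a valid address. */
int point_sum(const point_t *p)
{
    assert(p != NULL);

    return p->x + p->y;
}
```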

Speaking of Scope...

Braces introduce a new level of scope. This means that things like conditionals and loops can have their own internal variables. When you know you will not need a variable outside such a scope, then declare it within that scope. This will help keep your code organized.

code:

#include <stdio.h>

int main()
{
int i, j;

for (i = 0; i < 10; i++)
{
j = i * 2 + 1;
printf("%d * 2 + 1 = %d\n", i, j);
}
}

Could easily be:

code:

#include <stdio.h>

int main()
{
int i;

for (i = 0; i < 10; i++)
{
int j = i * 2 + 1;
printf("%d * 2 + 1 = %d\n", i, j);
}
}

A Blast From the Past

In your travels of the world of C, you may encounter functions that look like so:

code:

int main(argc, argv)
int argc;
char **argv;
{
/* ... */
}

This is an old style of C, from before the language was standardized. Compilers have long accepted it for backward compatibility, although the C23 standard finally removes it. You should not write code this way, but you should be prepared to encounter it.

Hiding Details

Let's consider name.h again.

code:

#ifndef __NAME_H
#define __NAME_H

typedef struct name
{
char *first, *last;
} name_t;

typedef name_t *name_ptr;

/* alloc_name is static, so it is declared only in name.c, not here. */
name_ptr new_name(char *, char *);
char *full_name(name_ptr);

#endif

Now the interesting thing to note is that I never actually make reference in the same file to the fields in the name struct. I might as well have the following.

code:

#ifndef __NAME_H
#define __NAME_H

typedef struct name name_t;

typedef name_t *name_ptr;

/* alloc_name is static, so it is declared only in name.c, not here. */
name_ptr new_name(char *, char *);
char *full_name(name_ptr);

#endif

And then in name.c:

code:

#include "name.h"
#include "string_utils.h"
#include <stdio.h>  /* sprintf */
#include <stdlib.h> /* malloc */
#include <string.h>

struct name
{
char *first, *last;
};

static name_ptr alloc_name(int fn_len, int ln_len)
{
name_ptr result = malloc(sizeof(name_t));

result->first = malloc(sizeof(char) * fn_len + 1);
result->last = malloc(sizeof(char) * ln_len + 1);

return result;
}

name_ptr new_name(char *first, char *last)
{
name_ptr result = alloc_name(strlen(first), strlen(last));

strcpy(result->first, first);
strcpy(result->last, last);

return result;
}

char *full_name(name_ptr n)
{
int len = strlen(n->first) + 2 + strlen(n->last);
char *result = malloc(sizeof(char) * len);

if (only_contains_whitespace(n->first) &&
only_contains_whitespace(n->last))
{
strcpy(result, "");
}
else if (only_contains_whitespace(n->first))
{
strcpy(result, n->last);
}
else if (only_contains_whitespace(n->last))
{
strcpy(result, n->first);
}
else
{
sprintf(result, "%s %s", n->first, n->last);
}

return result;
}

So, what's the difference?

With the old code, any file that includes name.h will be able to see the fields that comprise the struct. With the new code, they cannot. If they try to access any of those fields, the compiler will raise an error.

The name.c implementation, however, since it contains the actual definition of the name struct, can directly access those fields.

This technique may be used to prevent accidentally modifying the state of a struct. If this sounds like one of the tenets of object-oriented programming, then you're not the first person to realize that. It is one way in which C can approximate object-oriented programming, despite offering no direct syntactic support for OOP.
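Here is the same technique boiled down to a single listing, using a made-up counter type. All of the names are hypothetical, and in a real project the three marked parts would live in counter.h, a client file, and counter.c. The client code is placed before the struct definition on purpose, so at that point the compiler really does know nothing about the fields.

```c
#include <assert.h>
#include <stdlib.h>

/* --- What a header such as counter.h might contain --- */

typedef struct counter counter_t;   /* incomplete type: the fields are hidden */
typedef counter_t *counter_ptr;

counter_ptr new_counter(void);
void counter_increment(counter_ptr);
int counter_value(counter_ptr);

/* --- Client code: it can only go through the functions above --- */

int count_to_three(void)
{
    counter_ptr c = new_counter();
    int result;

    assert(c != NULL);

    counter_increment(c);
    counter_increment(c);
    counter_increment(c);

    /* Writing c->value here would not compile: the struct is incomplete. */
    result = counter_value(c);
    free(c);

    return result;
}

/* --- What counter.c might contain --- */

struct counter
{
    int value;
};

counter_ptr new_counter(void)
{
    counter_ptr result = malloc(sizeof(counter_t));

    if (result != NULL)
    {
        result->value = 0;
    }

    return result;
}

void counter_increment(counter_ptr c)
{
    c->value++;
}

int counter_value(counter_ptr c)
{
    return c->value;
}
```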

A Caveat

A mistake I made has been made apparent to me. I got lazy because my compiler makes things easier than some. It allows me to declare variables anywhere I'd like, which C99 permits but the older C89 standard does not. Not all compilers are so lenient, and unless you want to get bitten by it later you should take care to code for the standard that is commonly supported, and declare variables at the top of a block.

code:

void list_push(linked_list_ptr list, void *new_item)
{
linked_list_node_ptr new_node = malloc(sizeof(linked_list_node_t));
new_node->item = new_item;
new_node->next = NULL;

if (is_empty_list(list))
{
list->head = new_node;
return;
}

linked_list_node_ptr current = list->head;

while (current->next != NULL)
{
current = current->next;
}

current->next = new_node;
}

Should be:

code:

void list_push(linked_list_ptr list, void *new_item)
{
linked_list_node_ptr new_node = malloc(sizeof(linked_list_node_t));
linked_list_node_ptr current;

new_node->item = new_item;
new_node->next = NULL;

if (is_empty_list(list))
{
list->head = new_node;
return;
}

current = list->head;

while (current->next != NULL)
{
current = current->next;
}

current->next = new_node;
}

Strings and initialization

So let's say I create a string with malloc.

code:

char *my_string = malloc(sizeof(char) * 256);

So, now what does this section of memory contain? As it happens, that's a very good question. You see, malloc does nothing to initialize the contents of the memory.

This is especially important with regards to strings, since strings in C are null-terminated. We could copy in an empty string.

code:

char *my_string = malloc(sizeof(char) * 256);
strcpy(my_string, "");

This does the job since strcpy adds a terminating character.

Or we could use calloc instead of malloc. The calloc function will initialize all of the memory to zero.

code:

char *my_string = calloc(256, sizeof(char));

Since all of the memory is initialized to zero, any string handling function will immediately see a null terminator and decide that the string is empty.

Speaking of strings

Let's talk about buffer overruns.

Normal string operations depend on null-terminators to tell them when a string ends. Let's look at a simple strcpy into a statically allocated string.

code:

char foo[10];
strcpy(foo, "hello");

That works just fine. Essentially, what happens is:

code:

foo[0] = 'h';
foo[1] = 'e';
/* ... */
foo[4] = 'o';
foo[5] = '\0';

And that only uses the first six elements of foo, which has space for ten characters. No problem. Now, what if we had used "hello world"?

code:

foo[0] = 'h';
foo[1] = 'e';
/* ... */
foo[10] = 'd';
foo[11] = '\0';

Whoa... what happened?

Well, strcpy has no idea how large foo is, so it just kept copying until it got to the end of the source string. So now we've written into memory we don't own. This is undefined behavior: your program may crash at this point, or it may quietly corrupt whatever sits next to foo and fail somewhere else entirely. Note that it will compile, though.

How do we prevent this?

There is a whole set of string-handling functions declared in string.h with an 'n' added to the name, indicating that you can provide a maximum length argument.

code:

char foo[10];
strncpy(foo, "hello", 9);
foo[9] = '\0';

Here we can only copy a maximum of nine characters, since the array has room for only ten, and we need to leave a space for the null character.

But the source string is only five characters (six with null). The remainder of that nine character limit is padded with zeros. We manually add a null terminator in case the source string was nine characters or greater in length, because strncpy does not add one when it runs out of room.

For instance.

code:

char foo[10];
strncpy(foo, "hello world", 9);
foo[9] = '\0';

The string foo should now be:

code:

"hello wor"

Macros again

So you've got your function f, which does some math stuff.

code:

int f(int a, int b)
{
return a + b;
}

That's all nice and spiffy, but you think to yourself, "A function just for this? Isn't that kind of inefficient?" It does after all mean adding code to invoke the function. Can't we just insert that code into the program where we need it, without explicitly copying and pasting it?

And the C preprocessor says yes.

code:

#define f(a, b) a + b

So now any instance of "f(a, b)" in the program gets replaced with "a + b".

code:

int wooble = f(3, 5);

Becomes:

code:

int wooble = 3 + 5;

So now I should be able to write the following and have it work.

code:

int foo = 4;
int bar = 27;
int baz = 3;

int ninja = f(bar, baz) * foo;

And that compiles just fine, and just as we expect after using the f function, we get the sum of 27 and 3, which is 30, multiplied by 4, which is 120.

Except that ninja is 39.

Huh?

Well, to understand this, let's see what happened when the preprocessor ran.

code:

int foo = 4;
int bar = 27;
int baz = 3;

int ninja = bar + baz * foo;

And now, when this runs, order of operation rules take over, and the multiplication happens first: 3 times 4 is 12, plus 27 is 39. How can we avoid this?

This can very simply be avoided by modifying our macro slightly.

code:

#define f(a, b) (a + b)

Now the preprocessed code is:

code:

int foo = 4;
int bar = 27;
int baz = 3;

int ninja = (bar + baz) * foo;

Caveat: equality and assignment can look very similar

This one bites a lot of C programmers.

code:

#include <stdio.h>

int main()
{
int foo = 4, bar = 5;

if (foo = bar)
{
printf("Success!\n");
}

return 0;
}

This should print nothing, right? After all, four does not equal five.

But wait! Those variables weren't compared for equality. Instead the value of bar was assigned to foo, and that value was five, so it evaluates to true.

code:

#include <stdio.h>

int main()
{
int foo = 4, bar = 0;

if (foo = bar)
{
printf("Success!\n");
}

return 0;
}

And now nothing happens. This is because the value of foo is zero, which is false.

What we really want is probably:

code:

#include <stdio.h>

int main()
{
int foo = 4, bar = 5;

if (foo == bar)
{
printf("Success!\n");
}

return 0;
}

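And nothing happens. None of these samples are invalid C. They all compile and run just fine. That's what's dangerous about this: the compiler isn't obliged to say a word. Many compilers will warn about an assignment used as a condition if you ask for warnings (gcc does with -Wall), so turn them on.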

Free your mind... and some memory

This is the story of two arrays. Foo was a statically allocated array of ten ints, and bar was a dynamically allocated array of ten ints.

code:

int foo[10];
int *bar = malloc(sizeof(int) * 10);

They both live in a town called baz. Not much goes on in this sleepy little hamlet.

code:

void baz()
{
int foo[10];
int *bar = malloc(sizeof(int) * 10);
}

But as it turns out, bar really lives in the big city of heap, and is just pointed to by the bar in baz.

An adventurous ip named main wandered into baz one day.

code:

void baz()
{
int foo[10];
int *bar = malloc(sizeof(int) * 10);
}

int main()
{
baz();

return 0;
}

Immediately after he left, there was an accident, and all of baz's inhabitants were wiped out. The space they took up was free for others to allocate as they needed.

But wait... while foo was a family of ten, bar was just a single guy with a sign called "bar family" pointing in the general direction of heap. So when baz's inhabitants all went away, the bar family wasn't affected. So the next time we visit baz there's going to be a new foo family, and a new guy pointing to a bar family in the metropolis of heap, but he'll be pointing at a new bar family, and since no one knows where they went, the original bar family will be forgotten.

Colorful stories aside, this is how memory leaks happen. To avoid them, we have to explicitly free memory while a pointer to it still exists.

code:

void baz()
{
int foo[10];
int *bar = malloc(sizeof(int) * 10);

free(bar);
}

int main()
{
baz();

return 0;
}

I am constant as the northern star!

Well, not me. I'm really quite wishy-washy. But some of these "variables" are.

So let's say I want an integer constant in my program. Say, for instance, the upper bound of an array. I could use a preprocessor directive.

code:

#define UPPER_BOUND 256

Or I could avoid leaving the world of C and use a constant.

code:

const int UPPER_BOUND = 256;

Now, what if I want a constant string? Well, that's pretty easy, right?

code:

const char *foo = "foo";

Excellent! Now I have a constant string. So now let's write some code with it.

code:

foo = "hello";

Whoa! That shouldn't have worked. Didn't we say this was a constant string?

Well, as it turns out, that's misleading. It's a pointer to a constant char, but the pointer itself isn't constant. I would have gotten an error if I'd written:

code:

foo[0] = 'b';

But as it stands, I can have foo point to an entirely different constant string. Obviously I need a way to have the pointer be constant.

code:

const char * const foo = "foo";

Now this does what we'd expect, but it looks weird. In the first case, const precedes char, but now const follows the asterisk that indicates that this is a pointer.

In truth, there is another way this can work.

code:

char const * const foo = "foo";

Think of the const as applying to the element directly to its left. The other syntax caught on early on, as many thought "const type" looked nicer than the "type const" syntax.

Constants can also be used with function parameters. One of the dangers in passing pointers to a function is that we're providing the function with the address of some value in memory. With this information, we could go in and change that value.

code:

#include <stdio.h>

void greet(char *);

int main()
{
char *my_name = "wtd";

greet(my_name);
greet(my_name);

return 0;
}

void greet(char *name)
{
printf("Hello, %s!\n", name);
name[0] = 'a';
}

This should say "Hello, wtd!" twice, right?

Well, actually it will most likely crash, because a string literal is inherently read-only (modifying one is undefined behavior), and my_name is just pointing to that read-only string. So let's try something a bit different.

code:

#include <stdio.h>

void greet(char *);

int main()
{
char my_name[] = "wtd";

greet(my_name);
greet(my_name);

return 0;
}

void greet(char *name)
{
printf("Hello, %s!\n", name);
name[0] = 'a';
}

I've made my_name a statically allocated char array that just happens to be initialized with "wtd".

Now it prints "Hello, wtd!" but then it prints "Hello, atd!"

What if I make my_name an array of constant characters?

code:

#include <stdio.h>

void greet(char *);

int main()
{
const char my_name[] = "wtd";

greet(my_name);
greet(my_name);

return 0;
}

void greet(char *name)
{
printf("Hello, %s!\n", name);
name[0] = 'a';
}

This will compile and run, and do the exact same thing, but one thing has changed. The compiler has warned me that by passing a pointer to constant char to a function that expects a plain char pointer, I'm throwing that const qualifier away, and bad things could happen.

Better yet, I could just flip this around.

code:

#include <stdio.h>

void greet(char *);

int main()
{
char my_name[] = "wtd";

greet(my_name);
greet(my_name);

return 0;
}

void greet(const char *name)
{
printf("Hello, %s!\n", name);
name[0] = 'a';
}

Now it won't compile at all.

Why haven't I made the pointer constant here?

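Well, pointers are passed by value. I can change the pointer inside the function all I want, and it will only change the function's own copy; the caller's pointer is untouched. Making the parameter itself constant would protect nothing that belongs to the caller.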

Command-line arguments

Most languages have some sane means of getting access to arguments passed to a program on the command-line. C is no different. The arguments are passed via parameters to the main function.

code:

#include <stdio.h>

int main(int argc, char **argv)
{
int i;

for (i = 0; i < argc; i++)
{
printf("%s\n", argv[i]);
}

return 0;
}

So, what is argc about? Well, argv is an array of char pointers, or in other words an array of strings. However, we know that an array passed to a function arrives as just a pointer to the beginning of a chunk of memory. As a result, it cannot carry information about its own length. The argc parameter handles this, telling us how long the array is.

The rest of the code is straightforward now that we know this.

More on strings... and some other stuff

So you have two strings in a program. What do you suppose the output will be?

code:

#include <stdio.h>

int main()
{
char *foo = "wtd";
char *bar = "wtd";

if (foo == bar)
{
printf("Hello\n");
}

return 0;
}

Well, in this case it actually prints "Hello" even though that's not at all what I wanted to demonstrate. This is because my compiler is smart enough to see that I have two identical string literals, and so it only creates that string in memory once. We can even see this if we use the -S option to gcc so that it outputs the assembly.

code:

.file "basic.c"
.def ___main; .scl 2; .type 32; .endef
.section .rdata,"dr"
LC0:
.ascii "wtd\0"
.text
.globl _main
.def _main; .scl 2; .type 32; .endef
_main:
pushl %ebp
movl %esp, %ebp
subl $24, %esp
andl $-16, %esp
movl $0, %eax
addl $15, %eax
addl $15, %eax
shrl $4, %eax
sall $4, %eax
movl %eax, -12(%ebp)
movl -12(%ebp), %eax
call __alloca
call ___main
movl $LC0, -4(%ebp)
movl $LC0, -8(%ebp)
movl $0, %eax
leave
ret

Don't worry if you don't understand that. Assembly can be fairly vicious, and this is x86 assembly, and it uses AT&T style syntax which is generally regarded with little fondness.

The important part is:

code:

LC0:
.ascii "wtd\0"
.text

This only happens once.

But I digress.

Let's make sure that our strings are located in two different places in memory.

code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main()
{
char *foo = "wtd";
char *bar = malloc(sizeof(char) * 4);

strcpy(bar, foo);

if (foo == bar) printf("Hello\n");

return 0;
}

Now, we compile and run it, and we get nothing. Why?

Again, it comes down to pointers. The foo and bar variables are not the strings themselves. They are simply pointers. They point to the strings. Since these two strings reside in different places in memory, it's natural that their pointers are not equal.

So we need to compare them character by character, with strcmp.

code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main()
{
char *foo = "wtd";
char *bar = malloc(sizeof(char) * 4);

strcpy(bar, foo);

if (strcmp(foo, bar) == 0) printf("Hello\n");

return 0;
}

Wait. Why are we checking to see if the return value of strcmp is zero? Isn't zero false?

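Yes, zero is regarded as false in C, but strcmp does a three-way comparison. It does not merely check whether one string is equal to another. It checks whether the first string is equal to, less than, or greater than the second. It returns zero to indicate equality, a negative value if the first string sorts before the second, and a positive value if it sorts after. The exact nonzero values are not specified, so test the sign rather than comparing against one or negative one.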

jamonathin · **Posted:** Wed Jun 21, 2006 11:16 am

This is perfect wtd. I was talking to one of my soon-to-be professors at Windsor, and he said it would be a good idea to learn C over the summer. I've dealt a bit with C++, but I'm not really sure on the differences between the two, except "C++ will blow your whole leg away" from a quote somewhere on this site.
I haven't read everything, but I will be.

md · **Posted:** Wed Jun 21, 2006 2:25 pm

C++ will *not* blow your whole leg away. That implies that the rest of you will be fine... that is far from the case. C++ can blow you, your immediate surroundings and any living thing therein to kingdom come. And that's on a good day.

C is equally dangerous, and I find it easier to make mistakes in if you're not careful. That doesn't mean it isn't a great language, just that there is a lot of power there and if you really try you can easily break the universe.

wtd · **Posted:** Wed Jun 21, 2006 2:41 pm

The thing with C++....

It is more powerful than C as a language, especially with regard to templates. The standard libraries take advantage of this. However, compilers are not especially great at spitting out template errors that really identify the source of the problem. Thus even simple mistakes can spawn some nasty-looking error messages.

md · **Posted:** Wed Jun 21, 2006 9:23 pm

Exactly, great tutorial too wtd. Definitely worth a read if you're interested in learning C.

Tony · **Posted:** Wed Jun 21, 2006 9:48 pm

Very impressive read.

This pretty much covers half of my first year C course, and then some.

jamonathin · **Posted:** Thu Jun 22, 2006 11:24 am

Well, I've already added it to My Favorites. And since Tony is at Waterloo, I believe, then hopefully this should cover most of my C course at Windsor.

Cervantes · **Posted:** Sat Jun 24, 2006 1:22 pm

Absolutely phenomenal, wtd! Thank you very much.

wtd · **Posted:** Sat Jul 01, 2006 4:39 pm

A test. What is the output of the following program?

code:

#include <stdio.h>

int main()
{
char *foo = "foo";
for (; *foo; ++foo) printf("%c", *foo);
printf("%s\n", foo);

return 0;
}

rdrake · **Posted:** Sat Jul 01, 2006 9:48 pm

wtd wrote:

A test. What is the output of the following program?

code:

#include <stdio.h>

int main()
{
char *foo = "foo";
for (; *foo; ++foo) printf("%c", *foo);
printf("%s\n", foo);

return 0;
}

I would've thought it would be like the following, but was wrong after running it under gcc.

/bin/csh wrote:

% ./atest
foofoo
%

However, it gave me the following.

/bin/csh wrote:

%./atest
foo
%

Care to explain why that is? Strange thing is by commenting out one of the lines which one would think outputs "foo" still outputs the same thing, which further confuses me.

rizzix · **Posted:** Sun Jul 02, 2006 12:49 am

The pointer "foo" has been modified by that for-loop, so by the time the loop ends it points at the terminating '\0'. The second printf therefore prints an empty string followed by the newline.

wtd · **Posted:** Sun Jul 02, 2006 6:09 pm

Another test question:

Using the strstr function in string.h, determine how many characters into a string a substring starts. Explain how to determine if a substring does not occur in a string.

OneOffDriveByPoster · **Posted:** Tue Jan 01, 2008 10:10 pm

What kind of basic datatypes do we have at our disposal?
...

unsigned short
unsigned
unsigned long
unsigned long long
long double
signed char
unsigned char
_Bool

... there might be others, but these are worth mentioning...

Srlancelot39 · **Posted:** Wed Jul 09, 2014 8:03 pm

(I hope this isn't considered necroposting, if it is, sorry...)
Is there a way to efficiently check multiple conditions in an if/while statement?

For example, instead of:

c:

while (menuchoice != 'A' && menuchoice != 'B' && menuchoice != 'C') //imagine 7 more comparisons in this format...

have:

c:

while (menuchoice != ('A' || 'B' || 'C')) //or something to this effect that saves space and redundancy

Thanks!

EDIT: I am posting this here (even so many years later) not only because I would like to know if this is possible, but also because I believe it would be an excellent addition to the tutorial (that is, if the aforementioned syntax exists).

Insectoid · **Posted:** Wed Jul 09, 2014 8:13 pm

As far as I know there isn't any way to do this.
