This is a tutorial about linked lists, primarily using C and C++ for examples. The information presented here is basically taken from the second-semester computer science course at NC State University.
Before you start reading, you should be able to read and understand C and/or C++ (especially all the pointer notation in these languages), and you should have a decent knowledge of how pointers work.
Why learn about linked lists ?
Suppose you are given a programming task that involves storing a list of integer
x–coordinates to be used in drawing a picture. Aha!
you might think, I'll
use an array to store all those values.
And indeed, an array is one solution
to the task. Arrays store lists of data, so you might just go ahead and declare
your data structure as :
const int size = 128; int xcoords[size];
Now, all would be well and good. But what if I came along in your program and tried to add 129 points to your data structure? You might have built-in dumb-user-devices to prevent me from doing such a diabolical deed, or your program might overwrite some of the data that's already in the array, or it might do something altogether different (and completely undesirable). You would start getting bug reports, and then nasty emails would come your way complaining that users want to be able to add 129 points. You really need to face it : you have to be able to dynamically add data to the list. You could simply reserve enough space for 100000 coordinates, but that will eat up memory quickly, and you know that a user will eventually find a way to break a program with such a hard-coded restriction. Even by declaring your array (in C++) as :
int size = 128; // possibly change the value of size here ... int *xcoords = new int[size];
or as :
int getNumXCoords(void); // ... later, inside a function call somewhere ... int *xcoords = new int[getNumXCoords()];
The user will have to know ahead of time how many points she is planning on using. There must be a better way.
Thankfully, there are several better ways, and one of them is a linked list.
What is a linked list ?
A linked list is a very useful data structure ; it is very much like an array, but it can dynamically grow and shrink to fit the number of data points that are entered in a program. Linked lists can be implemented in any computer language that is capable of using pointers (or any other analogous data type), but the code here uses C and sometimes C++.
List nodes
Basically, a non-null list consists of several nodes arranged in a particular linear order, just like an array. Unlike an array, though, each node consists of, in its simplest form, a data structure that has both a data element and a pointer.
The data item holds the node's data element (like the x-coordinate), and the
pointer (often called the next
pointer) points to, or holds the
address of, the next node in the list. Sometimes, as shown in the diagram, the
data item is itself a pointer, but I'll stick with the example case of the data
being integers. To make a list of arrays of floats, for example, just change the
type of the data
element in the linked list nodes.
There is one convention I'd like to clear up before continuing, and that is the
null pointer convention. Pointers can either point to something by holding some
useful data value, or they can point to nothing by holding a special value,
often called NULL
or NUL
or just plain 0
.
Different folks have different ways of representing null pointers in diagrams.
(Usually, it's a slash, or ground, or something else null-looking.)
I prefer the electrical ground symbol (three parallel lines); therefore, that is the notation that I will use here. Before going any further, draw a few list nodes just to get a feel for using your pencil or crayons or whatever you've got near your computer.
The list structure
Here's the conceptual diagram of a small linked list that holds integers as data :
Notice three things about the list :
The pointer at the end of the list is null.
Each node's
next
pointer (except for the last node) points to another unique node in the list, making the list a linear graph.
The first node of the list is pointed to by a pointer named
top
. (It could also be called start, go, fred, or xcoords ...) What is the type of thetop
pointer (e.g. is it a pointer tofloat
or achar
, or something else) ?
Thus a linked list is a linear data structure that is a lot like an array. The tricky part is that linked lists are a little bit more complex to implement than arrays. I'll give a small start, and then I will provide some programming ideas to help the linked list concept along.
Linked list code starter
To write linked list code, you first need to create the nodes. This is an easy enough process using the class concept in C++. A simple example might be :
class IntList {
public:
IntList *next; // the pointer to the next node
int data; // the node's data element
};
In the C programming language, which does not use objects per se, we might write the following instead (note the striking similarity ...) :
typedef struct _IntListNode IntList;
struct _IntListNode {
IntList *next;
int data;
};
So, using our new class (or struct), we could build the list above by writing in C, for example :
IntList *one = malloc(sizeof(IntList)); IntList *two = malloc(sizeof(IntList)); IntList *three = malloc(sizeof(IntList)); IntList *four = malloc(sizeof(IntList));
IntList *top = one;
one->data = 128; one->next = two; two->data = 5; two->next = three; three->data = 1979; three->next = four; four->data = -7; four->next = NULL;
Now, you might think this is a little bit repetitive, and it certainly is. There are much better ways to handle list formation. My current favorite is this C++ code :
class IntList {
public:
// constructor initializes both data members
IntList(int d = 0, IntList *n = 0)
: data(d), next(n) { }
// destructor recursively deletes whole list
~IntList() { if(next) delete next; }
private:
IntList *next; // pointer to next node
int data; // node's data element
};
The constructor basically initializes the class instance's data elements. The
code after the colon on the constructor's line assigns d
to the class
instance's data
element, and n
to the class instance's next
element.
If you're not sure how the destructor destroys the entire list, draw out a list
with your crayon, and then pretend that you called the first node's destructor.
Notice that when you delete the first node by calling its destructor, the
next
pointer gets deleted as well. See what happens to the node
that follows the first one (and the node that follows the second one, and so
on).
To illustrate how this has changed things, let's build the list again using the new class constructor, this time in C++.
// default argument here makes next null IntList *top = new IntList(-7);
top = new IntList(1979, top); top = new IntList(5, top); top = new IntList(128, top);
What a difference ! Of course, you can use this in a loop to automate the process even further.
How do I get data out of the list ?
There are two basic ways to traverse a list : recursively and iteratively. The iterative way makes use of a loop that has a fixed termination point. Recursion relies on nested function calls to traverse the list. Here is an example of using iteration to traverse the list and print out the data elements :
// suppose buildList() returns the list we made above. IntList *top = buildList();
// do other things before you need the list again ...
for (IntList *t = top; t; t = t->next)
cout << "element data is " << t->data << endl;
Note that top could even be null when the for loop starts ; the code would still work.
For the curious, here is an example of a function in C that would traverse a linked list recursively :
void
printListRecursive(IntList *l)
{
if (! l) return;
printf("element data is %d\n", l->data);
printListRecursive(l->next);
}
Traversing the list recursively, like doing anything else recursively, may seem a bit obfuscated until you become familiar with the process of recursion. It is very important to remember to put a terminating condition in a recursive function : we don't want ten million cats in the hat running around here !
Once you've seen the code for doing one thing to each list element (like printing them, for example), it's pretty easy to change it to do something else to each list element. For instance, you could count the elements in the list by doing the following :
IntList *top = buildList();
// do other things before you need the list again ...
int numElem = 0; for (IntList *t = top; t; t = t->next) numElem++;
That was iterative ; the recursive function might look like this :
int
countListRecursive(IntList *l)
{
if (! l) return 0;
return 1 + countListRecursive(l->next);
}
Extending data retrieval
Now that you know how to do things to each item in the list sequentially, it might seem like a logical next step to use this same process to do nothing to each element in the list sequentially until you find a particular element that you need. This is just the way that one goes about, for example, finding an element in the list or counting only specific elements in the list. Of course, there are many other tasks that you might do to elements in a list that fall into this category, but searching is one of the most common.
So here's an example of how you might traverse the list iteratively (in C) to
find a specific element. The function returns NULL
if the number
we're looking for isn't in the list.
IntList *
find(int needle, IntList *haystack)
{
IntList *t;
for(t = haystack; t; t = t->next)
if (t->data == needle) return t;
return NULL; }
Another example of how you might use this functionality would be to extract only those elements from a list that satisfy certain criteria, building up a list of such elements that you find. You might want to find all the prime numbers in a given list of integers, for example. Here's some code in C++ that uses recursion to do that.
IntList *
primesFromListRecursive(IntList *start, IntList *filtered)
{
if (! start) return filtered;
if (isPrime(start->data) {
IntList *newF = new
IntList(start->data, filtered); return
primesFromListRecursive(start->next, newF);
} else {
return primesFromListRecursive(start->next, filtered);
}
}
List manipulation: deleting, reversing, and randomizing
Another step you might want to take with this neato list concept is to change the list once you've built it up. There are several ways to change a list, but they are on different levels of complexity. So far, we've addressed the simplest levels of complexity, with building an unsorted list and traversing the list to do something to each element. Now let's take a look at how you might delete an element.
Basics to cover in linked list programming
Whenever you are dealing with linked lists, it's important to address three types of elements in the list :
the beginning
the end
a general element that's between the middle and the end
Sounds pretty easy, and the above three cases are enough to keep yourself out of hot water when programming a linked list. If you've covered the case of the beginning of the list, covered the case of the end of the list, and covered an arbitrary case in the middle of the list, you're set !
Deleting
So how does this apply to deleting ? Well, let's take a look at the list from above and think of how to delete the beginning, the end, and the middle of the list.
The beginning is pretty easy : we would need to keep a temp pointer to the
beginning (so we don't lose track of it), reroute the top
pointer
to the value of the first element's next
pointer, and then set the
first element's next
pointer to NULL
. Then we'll be
safe to delete that node without destroying the rest of the list, and thanks to
the rerouted top
pointer, we know where the new start of the list
will be.
The end is even easier, once you see how the beginning works : once we find the
end of the list, we keep a temp pointer to that element, set the previous node's
next
pointer to NULL
(because it will become the end
of the list), and delete the last node.
Finally, deleting a node from the middle : this is a little more complex, but it
uses the same process as before. We again need to keep a temp pointer to the
node we're trying to get rid of. Then, like the case for the beginning of the
list, we reroute the pointer that points to the current node to the value of the
current node's next
pointer. After we set the current node's
next
to NULL
, we'll be set to delete the node.
That covers our three cases. Now the question is how to unify the three into one
compact function ... it becomes more clear when you notice that in all three
cases, we take the pointer that points to the node we want to delete and then
reroute it to the node after the current node. There are two somewhat special
cases: in the beginning of the list, the previous pointer is the pointer that
keeps track of the start of the list, and for the end of the list, the node
after the one we want to delete is effectively NULL
. So here is one
way to delete a node from a linked list, iteratively in C :
IntList *
delete(IntList *l, integer value)
{
IntList *ret = l, *cur = l, *prev = l;
/* advance the current node pointer, trailing behind
the previous node pointer, until we find the end
of the list or the value we want. */
while (cur && (cur->data != value)) {
prev = cur;
cur = cur->next;
}
/* if we found a node with the value, delete it. */
if (cur && (cur->data == value)) {
if (cur == prev) ret = cur->next;
else prev->next = cur->next;
cur->next = NULL; free(cur);
}
return ret; }
There are some things to keep in mind about this particular implementation.
First, it's not pretty. Next, this function deletes the first node it encounters
in the list that has a data
value of value
. There are
ways to delete nodes based on a pointer to a particular node. (Say, for
instance, you found a pointer to the node using a search function, then passed
the pointer to the delete function.) In addition, this function deletes a
maximum of one node from the list. There are ways to delete all nodes with a
specified data element.
Sorting (as well as reversing and shuffling)
Sorting is a rather large topic for computer science in general. Because my experience and time are rather limited I really can't do justice to them at this point.
More information
A great intro page is at The Code Project. It discusses use of the CList class included with the Microsoft Foundation Classes (MFC).
For those of you looking for a quick code fix, check the 10 minute question from devx.