-----------------------------------
wtd
Tue Mar 29, 2005 2:39 pm

Education by Example: Linked List
-----------------------------------
Introduction

One of the crazy things about learning, is that you generally have to learn stuff others already have.  You have to reinvent the wheel to really understand how the wheel works.  Computer science is a perfect example of this.

The great, all-important question is how to reinvent the wheel.  What tools should someone use to do so?  

The following takes a look at implementing basic data structures (and functions of them) in three different programming languages.  Two of them I am fond of, and the third is Java, which has risen to quite a bit of popularity in computer science education.  Noteworthy is the fact that all of the included languages use garbage collection, and thus free the programmer from thinking about memory management issues.  I may expand this and change that later.

Linked List

The linked list is a simple data structure, and one that is quite useful for having collections of data that can grow beyond preset bounds.  It's is composed of nodes.  In a singly linked list each node is connected to a succeeding node.  Link a chain, it's then possible to get from one end of the list to the other by following links.  A doubly linked list simply lets one easily go backward as well as forward.  For the same of this discussion we'll look at singly linked lists.

For the purposes of this discussion the linked list need only store integers.  Subsequent discussion will give us the ability to store many other types of data.  Onto the code!

Haskell

data Node = Node Int Node | Empty

Here I've succinctly laid out the rules for a linked list.  A node either holds an int and another node, or is empty.

I need now a way of determining if the list is empty.  Again, we're basically just spelling out the rules for a linked list.

empty Empty = True 
empty _ = False

Now, what if we want to find out if a node is the last in a list?  Clearly it is if the node is empty, or if the next node is Empty.

lastNode Empty = True
lastNode (Node _ Empty) = True
lastNode _ = False

A slightly more complicated question: how long is the list?  To find this out we have to walk through the list.  Basically we start with a node.  If it's not empty we count one, then add that to the result of finding the length of the list for the next node.  An empty node has a length of zero.

listLength Empty = 0
listLength (Node _ next) = 1 + length next

The next question is: how do we print everything in the list?

Perhaps the question should be: how do we generate a string representation of the list?  Once we have a string, we can use standard printing functions.

An empty list gives us an empty string.  Otherwise tack on a string representation of the value, and if it's not the last node, also tack on a comma and the results of stringifying the rest of the list.

instance Show Node where
  show Empty = ""
  show (Node value Empty) = show value
  show (Node value next) = show value ++ ", " ++ show next

Let's create a function which checks to see if a value is in a list.  For this we walk over the list.  If a node has the value we're looking for, it's true.  Otherwise check to see if it's in the rest of the list.  Obviously, an empty node will never contain the desired value.

hasValue _ Empty = False
hasValue desiredValue (Node value next) = 
  value == desiredValue || hasValue desiredValue next

Now let's add something to the end of the list.  If the existing list is empty, then we'll simply create a new node holding that value.  Otherwise we take the exising node, but switch its next node to the result of adding the new value to the rest of the list.

addToList newValue Empty = Node newValue Empty
addToList newValue (Node existingValue next) = 
  Node existingValue (addToList newValue next)

One last trick.  Let's check to see if two lists are equal.  This means they must contain all of the same values in the same order.  Obviously two empty lists are equal.  An empty list is not equal to anything else.  For nodes containing values, the values have to be equal, and the remaining lists have to also be equal.

instance Eq Node where
  Empty == Empty = True
  Empty == _ = False
  _ == Empty = False
  Node value1 next1 == Node value2 next2 =
    value1 == value2 && next1 == next2

So, our linked list code looks like:

module LinkedList where

data Node = Node Int Node | Empty

instance Show Node where
  show Empty = ""
  show (Node value Empty) = show value
  show (Node value next) = show value ++ ", " ++ show next

instance Eq Node where
  Empty == Empty = True
  Empty == _ = False
  _ == Empty = False
  Node value1 next1 == Node value2 next2 =
    value1 == value2 && next1 == next2

addToList newValue Empty = Node newValue Empty
addToList newValue (Node existingValue next) = 
  Node existingValue (addToList newValue next)

hasValue _ Empty = False
hasValue v (Node nv next) = v == nv || hasValue v next

empty Empty = True 
empty _ = False

lastNode Empty = True
lastNode (Node _ Empty) = True
lastNode _ = False

listLength Empty = 0
listLength (Node _ next) = 1 + listLength next

The most important thing to take away from this example implementation is that Haskell allowed me to fairly directly describe the problem.  Next we look at a Java solution.

Java

Java is a rather heavily object-oriented language, so we start by creaing a class to represent a node.  Each node should contain an integer and anothing node.  That node can be null, indidating that there's no next node.  

We'll also create a constructor.

class Node
{
   private int value;
   private Node next;

   public Node(int newValue)
   {
      value = newValue;
      next  = null;
   }
}

Next we create accessors to allow us to get the value, and the next node.

class Node
{
   private int value;
   private Node next;

   public Node(int newValue)
   {
      value = newValue;
      next  = null;
   }

   public int getValue() { return value; }
   public Node getNext() { return next; }
   public void setNext(Node newNextNode) { next = newNextNode; }
}

Now, since we're using a null Node object to represent the empty state, we can't call any methods on it, and testing for the null state will take the place of the "empty" function in the Haskell example.

Next in the previous example, I created a function to see if the current node was the last in the list.  For this we can simply test to see if the next node is null.

public boolean last()
{
   return next == null;
} 

Now, let's create a method to walk a list and determine its length.  Haskell has no looping syntax, so we used the more natural recursion.  Java does have loops, so we'll use those.  Here we use a temporary node which points to each node in turn until the node is null.

public int length()
{
   Node temp = this;
   int count = 0;
   
   while (temp != null)
   {
      temp = temp.getNext();
      count++;
   }

   return count;
}

We can use the same walking pattern to generate a string representation of the list.  We'll also use a buffer to prevent memory problems.

public String toString()
{
   StringBuffer sb = new StringBuffer();
   Node temp = this;
   
   while (temp != null)
   {
      sb.append(temp.getValue());
      if (!temp.last())
         sb.append(", ");

      temp = temp.getNext();
   }

   return sb.toString();
}

Again, we'll apply the walking pattern to the check to see if the list contains a value.  The problem here, that's rather easy to see is that this relies heavily on the particular behavior of the language.  The false value is returned if nothing is returned from within the loop.

public boolean hasValue(int desiredValue)
{ 
   Node temp = this;
      
   while (temp != null)
   {
      if (temp.getValue() == desiredValue)
         return true;

      temp = temp.getNext();
   }

   return false;
}

I now need to be able to add some value to the end of the list.  Again, I need to use a loop, but I don't want to use the same strategy, since that strategy walks off the end of the list and I need to keep a reference to the last node.

public void append(int newValue)
{
   Node temp = this;
   
   while (!temp.last())
   {
      temp = temp.getNext();
   }

   temp.setNext(new Node(newValue));
}

One last trick, again.  I want to be able to test for equality.  We simply walk both lists simultaneously.  If they're of different lengths, we can assume false.  If any one value is not equal, we automatically assume a result of false.  If it walks off the end of either list without returning, we can assume they've passed all equality tests and return true.

public boolean equals(Node otherList)
{
   Node temp1 = this;
   Node temp2 = otherList;

   if (temp1.length() != temp2.length())
      return false;

   while (temp1 != null && temp2 != null)
   {
      if (temp1.getValue() != temp2.getValue())
         return false;

      temp1 = temp1.getNext();
      temp2 = temp2.getNext();
   }

   return true;
}

The full code:

class Node
{
   private int value;
   private Node next;

   public Node(int newValue)
   {
      value = newValue;
      next  = null;
   }

   public int getValue() { return value; }
   public Node getNext() { return next; }
   public void setNext(Node newNextNode) { next = newNextNode; }

   public boolean last()
   {
      return next == null;
   }

   public int length()
   {
      Node temp = this;
      int count = 0;
      
      while (temp != null)
      {
         temp = temp.getNext();
         count++;
      }

      return count;
   }

   public String toString()
   {
      StringBuffer sb = new StringBuffer();
      Node temp = this;
   
      while (temp != null)
      {
         sb.append(temp.getValue());
         if (!temp.last())
            sb.append(", ");

         temp = temp.getNext();
      }

      return sb.toString();
   }

   public boolean hasValue(int desiredValue)
   { 
      Node temp = this;
      
      while (temp != null)
      {
         if (temp.getValue() == desiredValue)
            return true;

         temp = temp.getNext();
      }

      return false;
   }

   public void append(int newValue)
   {
      Node temp = this;
   
      while (!temp.last())
      {
         temp = temp.getNext();
      }

      temp.setNext(new Node(newValue));
   }

   public boolean equals(Node otherList)
   {
      Node temp1 = this;
      Node temp2 = otherList;

      if (temp1.length() != temp2.length())
         return false;

      while (temp1 != null && temp2 != null)
      {
         if (temp1.getValue() != temp2.getValue())
            return false;

         temp1 = temp1.getNext();
         temp2 = temp2.getNext();
      }

      return true;
   }
}

Ruby

The Ruby example will be similar to the Java example, for the most part, save that Ruby's added expressiveness will allow perhaps for more understandable code.  As the various methods have been documented in the Java example,I'll simply post the code here.

class Node
   attr_reader :value
   attr_accessor :next

   def initialize(value)
      @value, @next = value, nil
   end

   def last?
      @next.nil?
   end

   def length
      temp = self
      count = 0

      until temp.nil?
         count += 1
         temp = temp.next
      end

      count
   end

   def to_s
      temp = self
      str = ""

      until temp.nil?
         str += temp.value.to_s
         str += ", " unless temp.last?

         temp = temp.next
      end

      str
   end

   def has_value?(desired_value)
      temp = self
      
      until temp.nil?
         return true if temp.value == desired_value
         temp = temp.next
      end

      false
   end

   def  result1
      Nothing -> result2
  where
    result1 = treeStartingWith desiredValue b1
    result2 = treeStartingWith desiredValue b2

One last trick and then I'll show off the complete module and move onto Java and Ruby.  Let's say I want all trees that begin with a desired value?

For this, and in other languages I'll use list types provided by the language.

Clearly, again, a search on an empty tree finds nothing.  Searching on a branch should check for matching trees in the subtrees, and, if the current branch matches return that as well.

allTreesBeginningWith _ Empty = []
allTreesBeginningWith desiredValue branch@(Branch value b1 b2) 
  | value == desiredValue = branch:all
  | otherwise = all 
  where 
    t1 = allTreesBeginningWith desiredValue b1
    t2 = allTreesBeginningWith desiredValue b2
    all = t1 ++ t2  

Wait, I changed my mind.  :)

One more trick.  We want a string representation of a tree.  Again, not surprisingly, this is a recursive solution.  The output from this will be replicated in the Java and Ruby examples. 

instance (Show a) => Show (Tree a) where
  show Empty = "Empty"
  show (Branch value b1 b2) = 
    "(Branch " ++ show value ++ " "
               ++ show b1    ++ " "
               ++ show b2    ++ ")"

The full source:

module BTree where

data Tree a = Branch a (Tree a) (Tree a) | Empty

treeSize Empty = 0
treeSize (Branch _ b1 b2) = 
  1 + treeSize b1 + treeSize b2

hasValue _ Empty = False
hasValue desiredValue (Branch value b1 b2) =
  value == desiredValue || 
  hasValue desiredValue b1 ||
  hasValue desiredValue b2

instance (Eq a) => Eq (Tree a) where
  Empty == Empty = True
  Empty == _ = False
  _ == Empty = False
  Branch v1 b11 b12 == Branch v2 b21 b22 =
    v1 == v2 && b11 == b21 && b12 == b22

treeStartingWith _ Empty = Nothing
treeStartingWith desiredValue branch@(Branch value b1 b2) 
  | value == desiredValue = Just branch
  | otherwise = 
    case result1 of
      Just _ -> result1
      Nothing -> result2
  where
    result1 = treeStartingWith desiredValue b1
    result2 = treeStartingWith desiredValue b2

allTreesBeginningWith _ Empty = []
allTreesBeginningWith desiredValue branch@(Branch value b1 b2) 
  | value == desiredValue = branch:all
  | otherwise = all 
  where 
    t1 = allTreesBeginningWith desiredValue b1
    t2 = allTreesBeginningWith desiredValue b2
    all = t1 ++ t2

instance (Show a) => Show (Tree a) where
  show Empty = "Empty"
  show (Branch value b1 b2) = 
    "(Branch " ++ show value ++ " "
               ++ show b1    ++ " "
               ++ show b2    ++ ")"

Java

Now, let's look at a comparable program in Java.  As I mentioned at the top, this requires Java 5.0, which includes generics.

The constructors should give us the ability to set just the value, with the branches being null, or to set the value and the branches.  We'll also need read-only (for simplicity) accessors.

class Branch
{
  private T value;
  private Branch left, right;

  public Branch(T initValue)
  {
    value = initValue;
    left = null;
    right = null;
  }

  public Branch(T initValue, Branch initLeft, Branch initRight)
  {
    value = initValue;
    left = initLeft;
    right = initRight;
  }

  public T getValue() { return value; }
  public Branch getLeft() { return left; }
  public Branch getRight() { return right; }
}

The next step is to add a recursive method for calculating the size of the tree.

public int size()
{
  int count = 1;
  if (left != null) count += left.size();
  if (right != null) count += right.size();
  return count;
}

Next we'll want to determine if a tree has a particular value.

public boolean hasValue(T desiredValue)
{
  return value.equals(desiredValue) ||
         (left != null && left.hasValue(desiredValue)) ||
         (right != null && right.hasValue(desiredValue));
}

Now, we'll want to be able to determine if two trees are equal.

public boolean equals(Branch otherTree)
{
  if (otherTree == null)
    return false;
  
  return value.equals(otherTree.getValue()) && 
         ((left != null && left.equals(otherTree.getLeft())) 
          || (left == null && otherTree.getLeft() == null)) &&
         ((right != null && right.equals(otherTree.getRight())) 
          || (right == null && otherTree.getRight() == null));
}

Getting a tree that starts with a particular value.  Returns null if it can't find one.  Here, as before, we have to diligently check to make sure the left and right branches are not null before we can call a method on them.

public Branch treeStartingWith(T desiredValue)
{
  if (value.equals(desiredValue))
    return this;
  else 
  {
    Branch result = 
      left == null ? null 
                   : left.treeStartingWith(desiredValue);

    if (result != null) return result; 

    result = 
      right == null ? null 
                    : right.treeStartingWith(desiredValue);

    if (result != null) return result;

    return null;
  }
}

The corresponding method of course finds all trees that start with a particular value.

public ArrayList allTreesBeginningWith(T desiredValue)
{  
  ArrayList list = new ArrayList();

  if (value.equals(desiredValue))
    list.add(this);

  if (left != null)
    list.addAll(left.allTreesBeginningWith(desiredValue));

  if (right != null)
    list.addAll(right.allTreesBeginningWith(desiredValue));
      
  return list;
}

The final code looks like:

class Branch
{
  private T value;
  private Branch left, right;

  public Branch(T initValue)
  {
    value = initValue;
    left = null;
    right = null;
  }

  public Branch(T initValue, Branch initLeft, Branch initRight)
  {
    value = initValue;
    left = initLeft;
    right = initRight;
  }

  public T getValue() { return value; }
  public Branch getLeft() { return left; }
  public Branch getRight() { return right; }

  public int size()
  {  
    int count = 1;
    if (left != null) count += left.size();
    if (right != null) count += right.size();
    return count;
  }

  public boolean hasValue(T desiredValue)
  {
    return value.equals(desiredValue) ||
           (left != null && left.hasValue(desiredValue)) ||
           (right != null && right.hasValue(desiredValue));
  }

  public boolean equals(Branch otherTree)
  {
    if (otherTree == null)
      return false;
  
    return value.equals(otherTree.getValue()) && 
           ((left != null && left.equals(otherTree.getLeft())) 
            || (left == null && otherTree.getLeft() == null)) &&
           ((right != null && right.equals(otherTree.getRight())) 
            || (right == null && otherTree.getRight() == null));
  }

  public Branch treeStartingWith(T desiredValue)
  {
    if (value.equals(desiredValue))
      return this;
    else 
    {
      Branch result = 
        left == null ? null : left.treeStartingWith(desiredValue);

      if (result != null)
        return result; 

      result =
        right == null ? null : right.treeStartingWith(desiredValue);

      if (result != null)
        return result;

      return null;
    }
  }

  public ArrayList allTreesBeginningWith(T desiredValue)
  {
    ArrayList list = new ArrayList();

    if (value.equals(desiredValue))
      list.add(this);

    if (left != null)
      list.addAll(left.allTreesBeginningWith(desiredValue));

    if (right != null)
      list.addAll(right.allTreesBeginningWith(desiredValue));
      
    return list;
  }

  public String toString()
  {
    StringBuffer sb = new StringBuffer("(Branch ");
    sb.append(value);
    sb.append(" ");
    sb.append(left == null ? "Empty" : left);
    sb.append(" ");
    sb.append(right == null ? "Empty" : right);
    sb.append(")");
    return sb.toString();
  }
}

Ruby

Having fairly well documented both the Haskell and Java example implementations, I feel safe in providing little documentation for the Ruby example.

The only thing I'll point out is that variables in Ruby dn't have types themselves (values do).  Thus no extra syntax is required for the tree to be generic.

class Branch
  attr_reader :value, :left, :right

  def initialize(value, left = nil, right = nil)
    @value, @left, @right = value, left, right
  end

  def size
    1 + (@left.nil? ? 0 : @left.size) + (@right.nil? ? 0 : @right.size)
  end

  def has_value?(desired_value)
     @value == desired_value or 
    (@left.nil? ? false : (@left.has_value? desired_value)) or
    (@right.nil? ? false : (@right.has_value? desired_value))
  end

  def ==(other_tree)
    return false if other_tree.nil?

     @value == other_tree.value and
    (@left.nil? ? other_tree.left.nil? : @left == other_tree.left) and
    (@right.nil? ? other_tree.right.nil? : @right == other_tree.right)
  end

  def tree_starting_with(desired_value)
    if @value == desired_value 
      self
    else
      (@left.nil? ? nil : @left.tree_starting_with(desired_value)) or
      (@right.nil? ? nil : @right.tree_starting_with(desired_value))
    end
  end

  def all_trees_beginning_with(desired_value)
    list = []

    list.push(self) if @value == desired_value
    list.push(*@left.all_trees_beginning_with(desired_value)) unless @left.nil?
    list.push(*@right.all_trees_beginning_with(desired_value)) unless @right.nil?

    list
  end

  def to_s
    "(Branch #{@value} #{@left.nil? ? "Empty" : @left} #{@right.nil? ? "Empty" : @right})"
  end
end

Conclusions?

The single biggest thing to note here is the use of the null or nil values in the Java and Ruby examples.  This may seem okay, but it really presents us with a problem.  We can't treat an empty tree as a tree at all, so we can't have a method that will find the size of an empty tree.  Instead we have to test for the tree being null (or "nil") and then treat it differently.

With Haskell, we can treat an empty tree as a tree, meaning we need not test at run-time for null or nil.  As a direct result of this, the recursion flows far more naturally.

Please, debate!  :)