Computer Science Canada

O'Caml Flexibility: Why You Should Check It Out

Author:  wtd [ Mon May 02, 2005 7:26 pm ]
Post subject:  O'Caml Flexibility: Why You Should Check It Out

I'll say it up front. O'Caml isn't the prettiest of programming languages. It has a lot of syntactic warts.

That said, should you learn it? Absolutely.

O'Caml is winning over more and more people because it's quite flexible. The language itself offers all of the usual functional goodness. Functions are first class and can be partially applied. Types are inferred wherever possible. It's statically typed, catching virtually all type errors at compile time. It has pattern matching. In addition, there are imperative features, good support for object-oriented programming, and a good module system.

Running an O'Caml program is another place where its flexibility can be seen. O'Caml programs can be compiled to bytecode and interpreted. They can be compiled to native machine code. Or, for learners, there's a powerful interactive interpreter.

Author:  wtd [ Tue May 03, 2005 2:34 am ]
Post subject: 

Since O'Caml is often seen as a cleaner, more expressive alternative to C++ that's quicker to program in, and often produces faster code, please feel free to ask for examples implemented in both. Smile

For a starter, let's consider programs to list out all command-line arguments.

Ocaml:
open Array
open Sys;;

Array.iter print_endline Sys.argv;;


c++:
#include <iostream>

int main(int argc, char **argv)
{
   for (int i = 0; i < argc; ++i)
   {
      std::cout << argv[i] << std::endl;
   }

   return 0;
}


Or, if we were to take the imperative approach in O'Caml.

Ocaml:
open Array
open Sys;;

let argc = Array.length Sys.argv in
for i = 0 to argc - 1 do
   print_endline Sys.argv.(i)
done;;


Certainly the first program is the cleanest of the three, but even with the imperative O'Caml program, the for loop is certainly more expressive than C++'s.

Author:  wtd [ Tue May 03, 2005 5:25 pm ]
Post subject: 

The classic grades problem. Read ten grades. Find the average, highest grade and lowest grade.

Ocaml:
open Printf
open Array

include Printf;;

let grades = Array.create 10 0
and sum = Array.fold_left (+) 0
and max = Array.fold_left (fun a b -> if a > b then a else b) 0
and min = Array.fold_left (fun a b -> if a < b then a else b) 10000
and get_grade () = try read_int () with _ -> 0
in
   for i = 0 to Array.length grades - 1 do
      grades.(i) <- get_grade ()
   done;
   let avg = float_of_int (sum grades) /. float_of_int (Array.length grades)
   and max_grade = max grades
   and min_grade = min grades
   in
      printf "Average: %f\n" avg;
      printf "Max: %d\n" max_grade;
      printf "Min: %d\n" min_grade;;


c++:
#include <iostream>

int main()
{
   int grades[10];
 
   for (int i(0); i < 10; ++i)
   { 
      std::cin >> grades[i];
      if (std::cin.fail())
      {
         grades[i] = 0;
      }
   }

   int sum(0);

   for (int i(0); i < 10; ++i)
   { 
      sum += grades[i];
   }

   double average(static_cast<double>(sum) / 10);

   int max(grades[0]);

   for (int i(1); i < 10; ++i)
   { 
      if (grades[i] > max)
      {
         max = grades[i];
      }
   }

   int min(grades[0]);

   for (int i(1); i < 10; ++i)
   { 
      if (grades[i] < max)
      {
         max = grades[i];
      }
   }

   std::cout << "Average: " << average << std::endl
             << "Max: "     << max     << std::endl
             << "Min: "     << min     << std::endl;

   return 0;
}

Author:  Naveg [ Tue May 03, 2005 6:01 pm ]
Post subject: 

did i miss something important or are those two posts strikingly similar?

Author:  wtd [ Tue May 03, 2005 10:47 pm ]
Post subject: 

No. I had some instability just as I posted that screwed things up.

Fortunately I'm a moderator so I can fix my goofs. Wink

Author:  wtd [ Wed May 04, 2005 12:36 am ]
Post subject: 

Ocaml:
class name f l =
   object (self)
      val first = f
      val last = l
      method first = first
      method last = last
      method full_name = first ^ " " ^ last
   end

class formal_name t f l =
   object (self)
      inherit name f l as super
      val title = t
      method title = title
      method full_name = title ^ " " ^ super#full_name
   end

let bobs_name = new formal_name "Sir" "Bob" "Smith" in
   print_endline bobs_name#full_name;;


c++:
class name
{
   private:
      const std::string _first, _last;
   public:
      name(std::string f, std::string l);
     
      std::string first() const;
      std::string last() const;
      std::string full_name() const;
};

class formal_name : public name
{
   private:
      const std::string _title;
   public:
      formal_name(std::string t, std::string f, std::string l);
     
      std::string title() const;
      std::string full_name() const;
};

int main()
{
   formal_name bobs_name("Sir", "Bob", "Smith");

   std::cout << bobs_name.full_name() << std::endl;

   return 0;
}

name::name(std::string f, std::string l)
: _first(f)
, _last(l)
{ }

std::string name::first() const
{
   return _first;
}

std::string name::last() const
{
   return _last;
}

std::string name::full_name() const
{
   return _first + " " + _last;
}

formal_name::formal_name(std::string t, std::string f, std::string l)
: name(f, l)
, _title(t)
{ }

std::string formal_name::title() const
{
   return _title;
}

std::string formal_name::full_name() const
{
   return _title + " " + name::full_name();
}

Author:  wtd [ Wed May 04, 2005 2:26 pm ]
Post subject: 

A basic "pyramid of asterisks program, imperatively in both languages.

Ocaml:
let default_levels = 8
let levels =
   let default_levels = 8
   in
      try int_of_string Sys.argv.(1)
      with
         Failure "int_of_string" ->
            print_endline "Please enter an integer.";
            exit 1
       | _ -> default_levels

let draw_pyramid ch sz =
   for i = 0 to sz - 1 do
      let num_chars = i * 2 + 1
      and num_spaces = sz - i - 1 in
      let chars = String.make num_chars ch
      and spaces = String.make num_spaces ' '
      in
         print_endline (spaces ^ chars)
   done;;

draw_pyramid '*' levels


c++:
#include <iostream>
#include <string>
#include <sstream>

void draw_pyramid(char ch, int sz)
{
   for (int i(0); i < sz; ++i)
   {
      int num_chars = i * 2 + 1;
      int num_spaces = sz - i - 1;
      std::string chars(num_chars, ch);
      std::string spaces(num_spaces, ' ');

      std::cout << spaces << chars << std::endl;
   }
}

int main(int argc, char **argv)
{
   const int default_levels(8);
   int levels;

   if (argc >= 2)
   {
      std::stringstream ss;
      ss << argv[1];
      ss >> levels;
      if (ss.fail())
      {
         std::cerr << "Please enter an integer." << std::endl;
         return 1;
      }
   }
   else
   {
      levels = default_levels;
   }
   
   draw_pyramid('*', levels);

   return 0;
}

Author:  md [ Wed May 04, 2005 3:35 pm ]
Post subject: 

Personally I find the C++ code in all your examples much more strait forward. They might be more lines, but their easy to follow. As for flexibility, does being able to do things in less lines (at the cost of a horrible syntax...) make a language moreflexible then another language that does not make such sacrafices? Given that both languages are turing complete, they are really both equally flexible...

Author:  wtd [ Wed May 04, 2005 5:08 pm ]
Post subject: 

Well, when I speak of flexibility, I actually am primarily referring to the ways to run O'Caml programs. You can compile to native code (ocamlopt is a very good optimizing compiler for 9 different CPU architectures and very frequently produces faster executables than C++ compilers). You can compile to bytecode and use a virtual machine, like Java, but with better garbage collection. You can use the "top-level" to interactively execute commands and test code.

Only one of those is an option with C++.

For the language itself, you have a number of advantages.

The language is statically-typed, and more type-safe than C++. If it compiles, you're pretty much guaranteed there aren't any type problems.

Types are inferred. Did I once actually type out the name of a type in the above code? The compiler is able to figure out from how I used values what type they are.

You get functions as first class items. See:

code:
Array.iter print_endline Sys.argv


The "print_endline" function is passed as a value to the iter function (in the Array module).

You have sane loops. So does C++, you're tempted to say. The problem, though, is that the index in a C++ loop can be mutated within the loop body. Consider:

code:
for (int i(0); i < 10; ++i)
{
   i--;
}


Why should this kind of madness be allowed?

I personally also find the O'Caml for loop much more expressive.

code:
for i = 0 to 9 do ... done


Consider the ability also to have what a C++ programmer would think of as "control structures" return values.

So instead of:

code:
int a;

if (pigs.fly())
   a = 42;
else
   a = 67;


We can have something like:

code:
let a = if pigs#fly then 42 else 67


Now, granted, we could have written:

code:
int a(pigs.fly() ? 42 : 67);


But I find the 'Caml version more expressive.

Functions can easily return more than one value.

Let's say we want to feed a number to a C++ function, then get that number, that number times 2, and that number squared.

Well, we need a type that can hold more than one value.

c++:
struct Result
{
   const int original, times_two, squared;
   Result(int o, int t, int s)
   : original(o)
   , times_two(t)
   , squared(s)
   { }
};

Result foo(int n)
{
   return Result(n, n * 2, n * n);
}


Now, I need to call the function extract the second value and print it.

c++:
int main()
{
   Result r(foo(42));

   std::cout << r.times_two << std::endl;

   return 0;
}


Now, let's see that in O'Caml.

code:
let foo n = (n, n * 2, n * n)
let (_, times_two, _) = foo 42;;

printf "%d\n" times_two


The above also nicely demonstrates pattern-matching. The underscores are values in the pattern that I don't really care about. I can also use much deeper pattern matching.

code:
let bar (m, n, (x, y, z), ((e, f), g)) =
   (* do something with all of those values *)


It's not about lines of code saved, it's about ease of development. The biggest source of lost clock cycles in computing is the time programmers spend typing. Losing that time for trivial stuff like this is silly.

Let's look at OO programming, and inheritance specifically. Resolving naming clashes and referring to methods in a parent class is fairly tedious in C++.

c++:
struct A
{
   std::string a() const { return "a"; }
};

struct B
{
   std::string a() const { return "b"; }
};

struct C : public A, public B
{
   std::string a() const { return A::a() + " " + B::a(); }
};


code:
class a =
   object
      method a = "a"
   end

class b =
   object
      method a = "b"
   end

class c =
   object
      inherit a as a
      inherit b as b
      method a = a#a ^ " " ^ b#a
   end


I don't mind your questions, and hopefully you don't mind my explanation. I can understand your feelings about the syntax. It's very different. I felt the same way too, but I've come to embrace its potential. A lot of people are using it, after all, so there has to be some benefit (that's the philosophy that's driven me to learn new programming languages). Smile

Feel free to keep the questions coming.

Author:  wtd [ Wed May 04, 2005 5:17 pm ]
Post subject: 

An additional benefit is the module system. Let's take a simple example. Let's say in C++ I have a namespace holding a "hello world" function and a string containing the greeting for it to print.

c++:
namespace Hello
{
   std::string greeting("hello world");

   void hello_world()
   {
      std::cout << greeting << std::endl;
   }
}


Now, let's say I want to hide the greeting variable, so it can't be accessed outside of the namespace. Hmmm.... can't do it.

Let's look at O'Caml. First, everything available.

code:
module Hello =
   struct
      let greeting = "hello world"
      let hello_world () = print_endline greeting
   end


Now, if I only want the function to be visible...

code:
module Hello : sig
      val hello_world : unit -> unit
   end =
   struct
      let greeting = "hello world"
      let hello_world () = print_endline greeting
   end

Author:  wtd [ Wed May 04, 2005 9:55 pm ]
Post subject: 

Let's take two lists of integers and check to see if each element in the first list is bigger than its counterpart in the second list. We'll assume they're equal in length.

c++:
bool all_greater_than(const std::vector<int>& list1, const std::vector<int>& list2)
{
   for (int i(0); i < list1.size(); ++i)
   {
      if (list1[i] <= list2[i]) return false;
   }

   return true;
}


Now, let's look at how we can do this in O'Caml.

We can use the for_all2 function in the List module. It loops over two lists, and passes their corresponding elements to a function which returns a boolean value. If all of those boolean values are true, it returns true; otherwise false.

We also need a function to tell us if one element is greater than the other.

code:
let greater_than a b = a > b

let all_greater_than list1 list2 =
   List.for_all2 greater_than list1 list2


But we notice in looking at this that "greater_than" is exactly the same as (>), making it redundant.

code:
let all_greater_than list1 list2 =
   List.for_all2 (>) list1 list2


Now, what's great about O'Caml (and other functional languages) is partial function application. That is, if we don't provide all of the necessary arguments, we get back a function which takes the remaining arguments.

In other words, we can avoid passing the lists explicitly as arguments.

code:
let all_greater_than = List.for_all2 (>)


At this point, of course, we question the point in even making this a separate function. The new function isn't much shorter and it's actually less expressive than simply writing:

code:
List.for_all2 (>) [1;2;3] [0;1;2]


Unfortunately, the same is not possible in C++.

Author:  wtd [ Thu May 05, 2005 2:22 pm ]
Post subject: 

Of course, it's easy enough for me to just say: "use the built-in library function." So, let's create our own function. We'll start a bit less ambitious, and just test for equality.

First we need to define a function. Let's call it "all_equal2" and have it take two lists as arguments.

code:
let all_equal2 list1 list2 = ...


This is obviously psuedo-code until we can fill in the rest.

Now, how do we actually tell if two lists are equal?

  • Well, obviously two empty lists are equal.
  • If the first list is empty, and the second is not, they're not equal.
  • If the first list is not empty, and the second is, they're not equal.
  • If they're both non-empty, check to see if their first elements are equal, and ifthey are, and running the function on both remaining lists yields true, then they're equal.


A pretty simple set of rules, don't you think?

Note the:

Quote:
running the function on both remaining lists yields true, then they're equal.


Yep, it's a recursive function. So we add the "rec" keyword.

code:
let rec all_equal2 list1 list2 = ...


Now, how do we determine whether or not the list is empty, and if not, how do we get the first element, and the remaining list?

Well, pattern matching comes to the rescue. But first, a few things:

  • An empty list in O'Caml looks like:

    code:
    []

  • The :: operator is the "list construction" operator. For instance:

    code:
    [4]


    Is really just:

    code:
    4 :: []


    And:

    code:
    [4;5;6;7;8]


    Is:

    code:
    4 :: [5;6;7;8]


    So we can pattern match with that.

    code:
    match [4;3;5] with x::xs -> print_int x

  • Matches proceed in order, and only one match is made. Therefore we can consider any previous patterns rejected when we write a pattern.


So, let's start filling in our rules:

code:
let rec all_equal2 list1 list2 =
   match (list1, list2) with
      ([], []) -> (* two empty lists *)
    | ([], _) -> (* an empty list and a non-empty list *)
    | (_, []) -> (* a non-empty list and an empty list *)
    | (x::xs, y::ys) -> (* two non-empty lists *)


Now, we know enough to fill in their of those.

code:
let rec all_equal2 list1 list2 =
   match (list1, list2) with
      ([], []) -> true
    | ([], _) -> false
    | (_, []) -> false
    | (x::xs, y::ys) -> (* two non-empty lists *)


But what about the fourth? We need to check to see if x equals y, and then run the function of xs and ys.

code:
let rec all_equal2 list1 list2 =
   match (list1, list2) with
      ([], []) -> true
    | ([], _) -> false
    | (_, []) -> false
    | (x::xs, y::ys) ->
         x = y && all_equal2 xs ys


And that's all there is to it. Not only is it concise, but it's perfectly logical. There's no mucking around with pointers or loops that we need to break out of early. And yet, it's optimized. The function will go no further than absolutely necessary because there are exactly three out four conditions which cause it to stop dead in its tracks and return a value.

Of course, there are a couple more things we can do with this. First, instead of returning false when a length mismatch is encountered, and that's what happens with conditions 2 and 3, let's raise an exception.

The Invalid_argument exception should work nicely.

code:
let rec all_equal2 list1 list2 =
   match (list1, list2) with
      ([], []) -> true
    | ([], _) -> raise (Invalid_argument "List2 too long")
    | (_, []) -> raise (Invalid_argument "List1 too long")
    | (x::xs, y::ys) ->
         x = y && all_equal2 xs ys


Also, we really should be able to use this with any function, as the List.for_all2 function does.

It's pretty simple to do this. Our function need only take an additional argument, which is the function to use as a test. "pred" is short for "predicate".

code:
let rec test_all2 pred list1 list2 =
   match (list1, list2) with
      ([], []) -> true
    | ([], _) -> raise (Invalid_argument "List2 too long")
    | (_, []) -> raise (Invalid_argument "List1 too long")
    | (x::xs, y::ys) ->
         pred x y && all_equal2 pred xs ys


We now need only write:

code:
test_all2 (=) [1;2;3] [1;2;3]


Or, if we have an exception, we can handle it.

code:
try test_all2 (=) [1;2;3] [1;2;3]
with Invalid_argument _ -> false

Author:  rizzix [ Thu May 05, 2005 2:25 pm ]
Post subject: 

i still prefer haskell.. Rolling Eyes

Author:  wtd [ Thu May 05, 2005 2:29 pm ]
Post subject: 

rizzix wrote:
i still prefer haskell


They both have their merits, and it still has a lot of stuff other languages lack (real type-safety, decent generics, pattern-matching, type inferencing, partial function application, really good list support, an interactive interpreter, etc.). Additionally, it has speed and efficiency, a damn good object layer, and for those new to functional programming it's probably a getler intro than Haskell. It's how I wrapped my mind around the concepts.

Have some faith in me. Smile

I wouldn't be talking it up if there wasn't something there.

Author:  wtd [ Fri May 06, 2005 12:10 pm ]
Post subject: 

Mutually recursive functions and good habits

Lets look at a set of mutually recursive functions in C++.

c++:
void a()
{
   std::cout << "Hello world" << std::endl;
   b();
}

void b()
{
   std::cout << "foo bar!" << std::endl;
   a();
}


Useless, but also incorrect. The function "a" doesn't know about "b" existing, so we have to use a forward declaration, which can be a pain in the butt.

c++:
void b();

void a()
{
   std::cout << "Hello world" << std::endl;
   b();
}

void b()
{
   std::cout << "foo bar!" << std::endl;
   a();
}


Now, let's see O'Caml's version.

code:
let rec a () =
   print_endline "Hello world";
   b ()
and b () =
   print_endline "foo bar!";
   a()


The "and" is the critical element here. With it we can define mutually recursive functions. But why do I say this enforces good habits?

Well, in general, if you have mutually recursive functions, they're going to be a potential source of trouble, and should be grouped closely together for easier maintenance. O'Caml not only promotes that, but it enforces it by leaving no other options for defining mutually recursive functions.

Author:  wtd [ Tue May 10, 2005 5:18 pm ]
Post subject: 

A Huge Avantage

Ever have a function that takes multiple arguments, but then forget which order the arguments should be in? While O'Caml is not the only language to offer a way out of this quandry, it also a solution while maintaining the option for high performance code. Few other languages can claim the same.

So, let's take a look at just a few programs:

c++:
#include <iostream>
#include <string>

void print_info(
   const std::string& first_name,
   const std::string& last_name,
   const int age,
   const int income;
   const bool is_married)
{
   std::cout
      << first_name << " " << last_name
      << " is " << age << " years old, makes $"
      << income << " a year, and "
      << (is_married ? " is married." : " is not married.")
      << std::endl;
}

int main()
{
   print_info("Bob", "Smith", 43, 100000, true);

   return 0;
}


And the naive O'Caml version.

code:
include Printf;;

let print_info fname lname age income is_married =
   printf "%s %s is %d years old, makes $%d a year, and %s.\n"
      fname lname age income
      (if is_married then "is married" else "is not married");;

print_info "Bob" "Smith" 43 100_000 true;;


But this doesn't use labels at all, and we still have to remember the order of the arguments (or check documentation). Let's redo this with labels.

code:
include Printf;;

let print_info ~fname ~lname ~age ~income ~is_married =
   printf "%s %s is %d years old, makes $%d a year, and %s.\n"
      fname lname age income
      (if is_married then "is married" else "is not married");;

print_info ~lname:"Smith" ~fname:"Bob" ~is_married:true ~income:100_000 ~age:43;;


Much better, isn't it?

Author:  Naveg [ Tue May 10, 2005 8:20 pm ]
Post subject: 

so ocaml lets you specify with argument is which, while C++ does not, but rather just assumes whatever is in the first spot is the "first" argument etc?

Author:  wtd [ Tue May 10, 2005 8:53 pm ]
Post subject: 

Vladimir wrote:
so ocaml lets you specify with argument is which, while C++ does not, but rather just assumes whatever is in the first spot is the "first" argument etc?


Yes. Of course, O'Caml will let you not use labels, and rely on the position of the arguments, if you want to.

Going back to one of my usual examples, the name class, let's look at how this could remove a common source of confusion.

code:
class name ~first ~last =
   object
      val first = first
      val last = last
      method first = first
      method last = last
      method full_name = first ^ " " ^ last
   end


Now, creating a name, I don't need to remember if the first name comes first, or the last name.

code:
new name "Bob" "Smith"


And

code:
new name ~first:"Bob" ~last:"Smith"


Are equivalent.

This is also handy for partial function application.

code:
let smith = new name ~last:"Smith"
let bob_smith = smith "Bob"
let alice_smith = smith "Alice"

Author:  wtd [ Wed May 11, 2005 4:37 pm ]
Post subject: 

A Fun Trick With Exiting Programs

The Pervasives module, included by default in every O'Caml program, and somewhat akin to Haskell's Prelude module, or Java's java.lang package, provides the handy function "at_exit".

This function takes as its argument a function which takes "unit" as an argument and returns unit. It then calls this function when the program exits.

Let's take a look at this in action.

code:
C:\>ocaml
        Objective Caml version 3.08.3

# at_exit (function _ -> print_endline "Bye bye!");;
- : unit = ()
# exit 0;;
Bye bye!

C:\>


We can also register multiple functoions this way. Keep in mind that the last function registered will be the first called.

code:
C:\>ocaml
        Objective Caml version 3.08.3

# at_exit (function _ -> print_endline "Bye bye!");;
- : unit = ()
# at_exit (function _ -> print_endline "All done?");;
- : unit = ()
# exit 0;;
All done?
Bye bye!

C:\>


: