Computer Science Canada [Haskell-tut] A Gentler Introduction, with pictures! (New!)
Author: wtd [ Thu Dec 23, 2004 11:26 pm ]
Post subject: [Haskell-tut] A Gentler Introduction, with pictures! (New!)
This is a repost of a tutorial posted at compsci.ca. Feel free to ask questions. :)

Disclaimer

This is yet another attempt to bring the ideas of functional programming to the masses here, and an experiment in finding ways to make it easy and interesting to follow. Your feedback would be good as a means of judging my progress.

Why should you care?

Functional programming is fundamentally a very different way of thinking about programming. After all, we can learn several languages and pat ourselves on the back, but if the only real difference is syntactic, then we're not really being challenged.

What do I need?

You'll need either a Haskell interpreter or compiler. For learning purposes the Hugs Haskell interpreter is fine. You can download it from: http://www.haskell.org/hugs/

It installs as easily as any other Windows program, and an entry will be created in the Start menu under: Start -> Programs -> Hugs98. If you're running Windows 98 or ME, it would probably be wise to restart your computer, just to make sure everything the installer did takes effect. Windows 2000 and XP are pretty good about immediately applying the changes.

Oh, and you'll want a good text editor. I suggest TextPad with the Haskell syntax coloring file. Directions for installing the syntax file are available.

A quick look at Hugs

Start -> Programs -> Hugs98 -> Nov2003 -> Hugs (Haskell98 mode)

So, at startup of Hugs we've got some ASCII art, copyright information, some basic usage tips, and a prompt. This is a good start. What can we do with this? Well, we can evaluate expressions and see their results.

What's an expression? An expression is a little bit of code that takes one or more values and gives you a new value. Consider this very simple example.

Moving on

But really, we could do basic math all day and be bored out of our skulls, so let's look at putting together a source file where we can build more complex things. A Haskell source file is just a text file containing Haskell code.
The extension we use is ".hs". So, what will our source file contain? A simple hello world program.
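A minimal sketch of such a source file, assuming it is saved as Main.hs. The exact text printed is an assumption; the module name and the use of putStrLn follow the description in the next paragraphs.

```haskell
-- Main.hs
module Main where

-- main is the entry point; putStrLn prints the string followed by a newline.
main = putStrLn "Hello, world!"
```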
What do we have here? Well, first of all we have to deal with the fact that Haskell code is organized into modules. Modules allow us to easily reuse code in other programs, and they allow the actual language itself to be relatively simple. The name of the file should match the name of the module. Here the module is named "Main". Next we have the "main" function, the center of activity, as in many other programming languages. The ease of creating this function shouldn't come as any surprise given the fact that Haskell focuses on functions.
Here we simply use the putStrLn function to print a string to the screen on its own line. The similar putStr function does the same, but doesn't automatically skip to a new line. Testing the code So, how do we run the code in this file? Well, open up your trusty Command Prompt window and "cd" to the directory where you saved your Main.hs file. Once you're there, start Hugs by simply typing "hugs" and hitting enter. Again we're back to: To load the "Main" module we simply:
The prompt has changed to indicate we're now in the Main module, rather than the Prelude module. And to run the main function:
So, we've seen a little bit of Haskell

Is it scary? If you say yes, that's not bad. New things can be scary. You'll get over it. The real question, though, is: where do we go from here? Well, since Haskell is a functional programming language, I'm thinking it might be good to see some more functions.

Expanding on Hello, world

Anyone even moderately familiar with my other tutorials will recognize the pattern of starting with a simple hello world program and then expanding upon it. So, let's create a function "greet" which takes a name and greets that person. Our Main.hs looked like:
Now we're going to expand it with:
Of course, what if we want just the greeting string?
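One way the expanded file might look, as a sketch: greet prints a greeting, while greeting only builds the string. The wording "Hello, <name>!" and the sample name "Bob" are assumptions made for illustration.

```haskell
module Main where

-- Build the greeting text without printing it.
greeting name = "Hello, " ++ name ++ "!"

-- Print the greeting for a given name.
greet name = putStrLn (greeting name)

main = greet "Bob"
```

Keeping greeting separate from greet means the string can be reused anywhere, not only for printing.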
Tidying the code up - Haskell tricks Haskell by default infers the types of data being used in a program, but for documentation purposes, we can explicitly specify the types a function uses and returns. Doing this, we write a type signature for main. The "main" function does an IO action, and returns the closest thing you can get in Haskell to nothing, so:
Double colons separate the name of a function and its signature. Continuing, we generate a signature for the "greet" and "greeting" functions.
These signatures are saying, "greet takes a string as an argument and does an IO operation," and, "greeting takes a string as an argument and returns another string."
Here we use parentheses because otherwise this would be seen as putStrLn taking two arguments, "greeting" and "name". Since putStrLn only takes one argument, this would clearly be erroneous. But the parentheses can get annoying, so we have the $ operator. Essentially, the $ operator takes the value on its right hand side and gives it to the function on the left hand side. So, now our greet function looks like:
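A short sketch comparing the two spellings side by side. The name greetParens is hypothetical and exists only to keep both versions in one file; the greeting text is likewise an assumption.

```haskell
module Main where

greeting :: String -> String
greeting name = "Hello, " ++ name ++ "!"

-- With parentheses: the argument to putStrLn is grouped explicitly.
greetParens :: String -> IO ()
greetParens name = putStrLn (greeting name)

-- With $: everything to its right is evaluated and handed to putStrLn.
greet :: String -> IO ()
greet name = putStrLn $ greeting name

main :: IO ()
main = do
  greetParens "Bob"
  greet "Bob"
```

Both functions behave identically; $ simply saves a pair of parentheses.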
So, our code now looks like:
Input as well as output All of this greeting isn't very much good unless we can get input from the user as well, to find out their name. IO actions aren't quite like other functions. To "chain" them together in sequence we use the keyword "do".
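A minimal sketch of chaining IO actions with "do", assuming a prompt of "You are? " and the simple greeting text used for illustration here.

```haskell
module Main where

greeting :: String -> String
greeting name = "Hello, " ++ name ++ "!"

main :: IO ()
main = do
  putStr "You are? "        -- first action: show the prompt
  name <- getLine           -- second action: read a line and bind it to name
  putStrLn $ greeting name  -- third action: print the greeting
```

The actions run from top to bottom, in the order they are written.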
Probably the most immediately noticeable change is the use of indentation. Haskell uses what's referred to as "layout", so semi-colons and braces aren't necessary. They are available:
Of course, the "layout" approach is so much nicer that it'd be silly to use braces and semi-colons. The second new bit of syntax is:
The return of getLine is a string, but an IO "tainted" string, which can't be immediately used. Using the <- syntax, "name" is a plain old string we can use elsewhere. Conditionals How about when the name given to the greeting function is "Haskell", the greeting is "Hey, whadda ya know? This is a Haskell program!"
This should look fairly straightforward to a programmer with basic experience in other languages. Also, again we use "layout" to signify the structure of the conditional. Case expressions Let's say we want our program to greet "Matz" with, "You make a good language." Using only "if":
Wow, that's ugly. Fortunately, we have the case expression that should look familiar to programmers.
As with "if", we use layout. Overloading functions Of course, we can do this even more cleanly by overloading the greeting function.
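A sketch of both styles, using the two special greetings quoted above. The fallback text "Hello, <name>!" and the name greetingCase are assumptions introduced only so the two versions can sit in one file.

```haskell
module Main where

-- One clause per special name; the last clause handles everyone else.
greeting :: String -> String
greeting "Haskell" = "Hey, whadda ya know? This is a Haskell program!"
greeting "Matz"    = "You make a good language."
greeting name      = "Hello, " ++ name ++ "!"

-- The same logic written as a case expression, for comparison.
greetingCase :: String -> String
greetingCase name =
  case name of
    "Haskell" -> "Hey, whadda ya know? This is a Haskell program!"
    "Matz"    -> "You make a good language."
    other     -> "Hello, " ++ other ++ "!"

main :: IO ()
main = putStrLn $ greeting "Matz"
```

Clauses are tried from top to bottom, so the general clause has to come last.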
Loops At this point you might be tempted to ask how Haskell handles looping, since that's a pretty basic thing for programmers to learn about in other languages. Haskell provides no special syntax for looping. All looping is achieved via recursion, where a function calls itself.
A few questions that may come up from looking at this:
What does "return" do? This basically turns () into IO (), which is the return our main function wants. Lists So, we can greet a number of people. Of course, what if we want to be able to get a list of people we've greeted? Well, we need a list. A list in Haskell can contain any number of values, as long as they're all the same type. The most basic list is an empty list:
A small list of names might look like:
Anything dealing with such structures in other programming languages, where we often use the term "array", should instantly bring to mind loops. Of course, we've already covered that. Haskell has no explicit looping syntax, but rather recursion. The solution, therefore, is to find a way to define lists in a recursive manner. Thankfully, Haskell lists are naturally recursive. The : operator adds an element to the beginning of a list. Our name list could look like:
Let's look at this in practice in a simple example. A simple range function should create a list of numbers from a start to an end.
This could look fairly cryptic until we break a sample use of it down.
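A sketch of how such a range function could be written, followed by the step-by-step expansion of one call. The argument names s and e and the restriction to Int are assumptions.

```haskell
module Main where

-- range s e builds the list of numbers from s up to e.
range :: Int -> Int -> [Int]
range s e =
  if s > e
    then []                       -- past the end: nothing left to add
    else s : range (s + 1) e      -- put s in front of the rest of the range

-- Expanding a sample call by hand:
--   range 1 3
--   1 : range 2 3
--   1 : 2 : range 3 3
--   1 : 2 : 3 : range 4 3
--   1 : 2 : 3 : []

main :: IO ()
main = print (range 1 3)
```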
Seeing a function with two arguments points out an interesting fact about Haskell. Arguments to a function are simply separated by spaces, rather than commas, as in many other programming languages. So, we might as well jump right in.
Breaking it down As always, breaking a large complex program down into small, understandable components is essential to understanding.
Our new signature for main indicates that it returns a list of strings. Of course it remains IO "tainted".
As before, if the user enters "quit", then we stop "looping". This time, though, we return an empty list, much as we did in the range function.
We can't directly use main, since it returns an IO tainted list. Instead we first extract that list.
Here we add the current name onto the list of names generated by running the main function again, then "return" that list. It may seem odd, but the last run of the main function is the last to finish. Another function Since this is getting fairly complex, perhaps we should break it into a separate function.
And another one Not a lot has changed, but now we can do something with the list of strings greetMultiple returns. Let's introduce a new function to print all of the strings in a list.
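A sketch of what such a printAll function might look like, with one clause for the empty list and one for a list with at least one element. The sample names in main are assumptions.

```haskell
module Main where

printAll :: [String] -> IO ()
printAll []     = return ()   -- nothing left to print
printAll (x:xs) = do
  putStrLn x                  -- print the first string
  printAll xs                 -- then print the rest of the list

main :: IO ()
main = printAll ["Bob", "Alice"]
```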
Here we've overloaded the printAll function so printing an empty list just returns (). When I want to print an actual list, I use the pattern "(x:xs)". We've seen the : before. It's used when we're constructing lists. So here x is the first element in the list. The rest of the list is "xs", which can be read as the plural of "x". Our code now looks like:
There's a simpler way The task of applying a function to each element in a list is such a common one, you'd think there would be a function or functions already present to solve this problem. When we're applying a normal function, we can use "map". Let's say we want to double each number in a list.
Now, when we're using IO "actions", we can't use map, but rather we use the mapM_ function.
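A small sketch contrasting the two, assuming a helper named double (a hypothetical name) for the pure case.

```haskell
module Main where

double :: Int -> Int
double x = x * 2

main :: IO ()
main = do
  -- map applies a normal function to each element and builds a new list.
  print (map double [1, 2, 3])
  -- mapM_ runs an IO action on each element and discards the results.
  mapM_ putStrLn ["Bob", "Alice"]
```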
A different greeting approach

Now, let's say we first want to gather names from a group and then greet them all. First we need a gatherNames function.
Now we can incorporate that into our program and use map to generate the appropriate greetings, then pass the result of that with $ to the "mapM_ putStrLn" we've already seen.
Back to printAll
Of course we may look at this and think it's uglier in use than:
But previously we've seen that the definition of printAll adds quite a bit of unnecessary code. So, let's redefine printAll.
Much better. We can improve this further, though. In Haskell, if we have a function which takes two arguments, and only give it one of those arguments, we get another function which takes the final argument. This is known as "partial application." The partial application of functions allows us to easily create new functions based on already existing functions. The printAll function can be rewritten as:
Similarly we can generate the greetings like so:
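A sketch of both rewrites. The name greetings and the greeting text are assumptions; printAll, map and mapM_ are as described above.

```haskell
module Main where

greeting :: String -> String
greeting name = "Hello, " ++ name ++ "!"

-- mapM_ wants an action and a list. Given only the action,
-- it yields a function that is still waiting for the list.
printAll :: [String] -> IO ()
printAll = mapM_ putStrLn

-- Likewise, map given only greeting is a function from names to greetings.
greetings :: [String] -> [String]
greetings = map greeting

main :: IO ()
main = printAll $ greetings ["Bob", "Alice"]
```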
Now, our code looks like:
Combining data

In many cases, data isn't as simple as just someone's name. We often want different pieces of data grouped together into a single unit. In Haskell we accomplish this via tuples. So, with a name, let's say we want to store an age as well, so our greeting function can come up with greetings based on age as well. An example tuple, then, might be:
But, first, we need to be able to read an integer from the user. Since everything read in is in the form of a string, we need a means to extract an integer from a string. The "reads" function steps in. Now, reads returns something along the lines of:
As a result we need to get the first element from the list, and the first item from the tuple. The !! operator and fst function will do these things quite nicely. So, a basic getAge function might look like:
There's just one problem with this, and it's a big one. What if someone gives you bad input? Well, then reads will give us an empty list, and that makes "reads input !! 0" cause an error. We need a way to avoid the error in the first place. We can do so by adding in a conditional expression which checks to see if the list is empty. Of course, if it is, we should probably tell the user we didn't get their age, and ask them again. Recursion will serve us well here.
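A sketch of a getAge that checks for the empty list before indexing. The helper parseAge, the prompt and the apology text are assumptions for illustration; reads, null, !! and fst are standard Prelude functions.

```haskell
module Main where

-- Hypothetical helper: run reads at type Int.
-- An empty result means the input did not start with a number.
parseAge :: String -> [(Int, String)]
parseAge input = reads input

getAge :: IO Int
getAge = do
  putStr "Your age? "
  input <- getLine
  if null (parseAge input)
    then do
      putStrLn "Sorry, I didn't get your age."
      getAge                                  -- ask again by recursing
    else return $ fst (parseAge input !! 0)   -- first parse, first tuple item

main :: IO ()
main = do
  age <- getAge
  putStrLn $ "You are " ++ show age ++ "."
```

Note that this sketch accepts any input that merely starts with a number; whatever follows the digits ends up in the second item of the tuple and is ignored.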
Taking a step back again

We want to not only get names, but also their ages. Where we had a gatherNames function before, let's replace that with a gatherInfo function.
The basic form looks pretty much like other recursive functions we've defined. Now, even though we have a very different type of data here than a simple string, we can get the string from each piece of info with map and fst.
Doing something with the extra information Now we have both name and age info for each of the people we're greeting, and we can still greet people the old way, but that seems to be missing the point of having that extra information in the first place. So, we want to get different greetings depending on age, as well as name. So, we'll first want to remake the greeting function. Our old greeting function looked like:
The first thing we realize will have to change is the signature of this function.
Now, for the rest, if the age is less than twelve, we'll answer with: "Do your parents know where you are, <name>?" If the age is greater than eighty, we'll answer with: "Do your children know where you are, <name>?" Otherwise we'll use the same rules we applied in the previous greeting function. To this end we'll use "guards" to define multiple versions of the function for these different conditions. The syntax should be fairly obvious.
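A sketch of the guarded version. The helper nameGreeting (a hypothetical name) stands in for the earlier name-only rules, and its fallback text is an assumption; the two age-based messages are the ones quoted above.

```haskell
module Main where

-- The earlier name-only rules, kept as a separate helper.
nameGreeting :: String -> String
nameGreeting "Haskell" = "Hey, whadda ya know? This is a Haskell program!"
nameGreeting "Matz"    = "You make a good language."
nameGreeting name      = "Hello, " ++ name ++ "!"

-- Each guard is a condition; the first one that holds picks the result.
greeting :: (String, Int) -> String
greeting (name, age)
  | age < 12  = "Do your parents know where you are, " ++ name ++ "?"
  | age > 80  = "Do your children know where you are, " ++ name ++ "?"
  | otherwise = nameGreeting name

main :: IO ()
main = putStrLn $ greeting ("Bob", 9)
```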
So, our entire program is now:
Modules So, we've seen input, output, strings, functions, conditionals, lists, tuples, and some very handy functions like map and mapM_. What's next? Well, looking at the above program, most of it handles greeting people, and then one lonely function is the main function where everything happens. What if I want to use the functions related to greeting in another program but I want a different main? Well, then I need to put all of those functions into their own module. Let's call the new module Greeting. Naturally, it'll be located in the file Greeting.hs.
Now our Main module simply looks like:
Of course, if you want it to be explicit where the functions you're using come from, you can prepend the name of the module.
This can be enforced by using the "qualified" import modifier.
And with either we can limit the functions we import.
Introducing our own data types Of course, at this point, we've used a new data type in the form of a tuple. However, we're counting on being able to recognize that a tuple consisting of a string and an integer is a person. Haskell gives us the power to be more expressive, by introducing new data types. In this case it's really quite simple.
This introduces a new type called PersonInfo with a single constructor which takes a string and an int. So, let's start by simply redefining the greeting function to take advantage of this new data type.
And, modifying the rest of our code to use this new data type, we end up with:
To arrive at this, I made very few changes to the code I had previously. This should demonstrate quite nicely the expressive power of Haskell.

Greeting other things

Now, we've defined a set of functions useful for greeting a person. However, people are not the only things we may want to greet. Consider, for instance, the case where we want to greet a dog. For the purposes of our program a dog will be described by its name, build and color. Based on these things we'll formulate a greeting. First things first, though. Let's define the data types we'll need.
In the Build and Color data types, we have a set of constructors which take no arguments. Any one of these constructors creates a value of type Build or Color. This is somewhat analogous to the idea of enumerated types or "enums" in other languages. The Dog constructor then uses these data types.

Let's talk about classes

For the purposes of this document, please forget what you know about classes in object-oriented languages like C++, Java, C#, Python, Ruby, Eiffel, etc. When we talk about classes in Haskell, we're talking about classifying data types, according to what functions we can use on them. The advantage of classifying data types in this way is that we don't need to be as specific when declaring a function. Consider my current declaration of the greeting function.
This requires that the argument be of type PersonInfo. Given this, we can't simply define greeting to take an argument of type DogInfo. What we want instead is to classify data types based on whether or not we can greet them. So, let's create a class that does just that.
This is fairly straightforward. Here "a" represents any type in the Greetable class. Let's add a few more functions related to the first.
Now, the interesting thing about classes in Haskell is that not only can we specify the types of these functions, but if we have related functions which simply depend upon another, we can provide a default definition. Now, when we later define greeting, we get the other two automatically.
One thing to notice is that I've used comments for the first time. Anything following -- on a line is a comment in Haskell.

Instances

A data type is admitted to a class by means of an instance declaration. This is where we define the functions required by the class. To admit the PersonInfo data type into the Greetable class, we declare an instance:
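A sketch of how the class, its default definitions and the instance could fit together. The constructor name Person and the extra functions greet and greetAll are assumptions, since the text only says there are two related functions with defaults; the fallback greeting text is also assumed.

```haskell
module Main where

data PersonInfo = Person String Int

class Greetable a where
  greeting :: a -> String     -- every instance must define this one
  greet    :: a -> IO ()
  greetAll :: [a] -> IO ()
  -- Default definitions, written in terms of greeting.
  greet x     = putStrLn $ greeting x
  greetAll xs = mapM_ greet xs

instance Greetable PersonInfo where
  greeting (Person name age)
    | age < 12  = "Do your parents know where you are, " ++ name ++ "?"
    | age > 80  = "Do your children know where you are, " ++ name ++ "?"
    | otherwise = "Hello, " ++ name ++ "!"

main :: IO ()
main = greetAll [Person "Bob" 30, Person "Tim" 9]
```

The instance supplies only greeting; greet and greetAll come from the defaults in the class.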
This looks pretty similar to our previous definition of greeting, so there's not a lot new to learn here. So, now our existing program looks like:
Of course, none of these changes influences our Main module in the slightest. Sidetracking: the Show class So we've seen one class of our own design. There are other classes already in Haskell 98. One useful class is the Show class, which requires that the function "show" be defined. That class would look like:
This provides a convenient means to get a string representation of something. For instance, to print an int, we'd use something like the following, since putStrLn can only deal with strings.
Or to add an integer to a string:
Let's consider a more pertinent situation. Previously, we had defined a data type Color.
Now, at some point we might wish to actually get a string containing "White" or "Red". We can achieve this by making Color an instance of Show.
It seems like there should be an easier way to achieve such a simple transformation. Well, for such a basic class, we can make Color a member of the Show class without an instance declaration.
Of course, in the course of this program we might want "red" rather than "Red". Fortunately, we can simply use map in this case. A string in Haskell, you see, is simply a list of characters. First, though, we have to import the toLower function from the Char module.
Then we can easily apply the map.
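A sketch combining the derived Show instance with map and toLower. Two things here are assumptions: the Color type lists only the two constructors named in the text, and the import uses Data.Char, which is where current GHC releases provide toLower (the Haskell 98 module mentioned above is called Char). The name lowerName is hypothetical.

```haskell
module Main where

import Data.Char (toLower)

-- deriving (Show) asks the compiler to write the Show instance for us.
data Color = White | Red deriving (Show)

-- show gives "Red"; mapping toLower over its characters gives "red".
lowerName :: Color -> String
lowerName c = map toLower (show c)

main :: IO ()
main = putStrLn $ lowerName Red
```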
Back on topic: greeting a dog So, we have a Greetable class that requires members of the class define a greeting function and gives us a couple of free functions if we do.
And we have a few new data types.
What we need is to make DogInfo an instance of Greetable.
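A sketch of such an instance. The constructor names Skinny, Medium and Fat, the two colors, and every greeting string are assumptions made up for illustration; only the shape of the Dog constructor (name, build, color) comes from the text. The class is reduced to its single required function to keep the example short.

```haskell
module Main where

data Build   = Skinny | Medium | Fat
data Color   = White | Red
data DogInfo = Dog String Build Color

class Greetable a where
  greeting :: a -> String

instance Greetable DogInfo where
  -- For skinny or fat dogs the name and color are ignored.
  greeting (Dog _ Skinny _) = "Somebody feed that dog."
  greeting (Dog _ Fat _)    = "That dog needs a walk."
  -- Any other build: use the name, ignore build and color.
  greeting (Dog name _ _)   = "Good dog, " ++ name ++ "."

main :: IO ()
main = putStrLn $ greeting (Dog "Rex" Medium White)
```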
This example shows one important new feature of Haskell. Namely, what's the deal with all of the underscores? Well, in Haskell, when matching a pattern, you have to include all of the arguments. Of course, sometimes you just don't care what the argument is. When we greet a skinny or fat dog, we're rude and don't care what the dog's name or color is. We're just going to insult it anyway. When we don't care about an argument, we use an underscore. This signifies that yes there's an argument there, but we don't care enough to give it a name. In the end, I have the following program.
The end result At this point, I can greet either a person or a dog using the same function, even though the two share nothing in common except the greeting function. Of course, what more do they really need to share?
And we see on the screen:
Compilation

At this point you might be thinking that this is all well and good, but this method of interpretation, where we write the code and then test it, is tedious and not really suited to writing a program that can run, do its thing, then quit. Fortunately, Haskell can be compiled to native code as well as interpreted. The compiler we'll use is the Glorious Glasgow Haskell Compilation System, or just GHC, available at: http://www.haskell.org/ghc/ An installer for Windows is available at: http://www.haskell.org/ghc/download_ghc_622.html#windows To compile the program we've assembled, the command is:
The -O tells the compiler to perform optimizations, --make tells the compiler we're actually building an executable based on this module, and -o tells the compiler the name of the resulting executable. Now, we can run this program.
And we see something rather odd. Nothing. Why don't we see anything? Well, it's a petty little thing which has nothing to do with functional programming. The problem is that a program compiled with GHC buffers standard output a line at a time when it is writing to a console, so text isn't displayed until a newline is written. When we ask, "You are?", we don't go to a new line, so nothing gets printed to the screen. The fix is simple. We need to flush all of the text in standard output. For this we'll need to import the IO module, then flush standard output after every instance of putStr. The putStrLn function doesn't have this problem. With this taken into mind, our Greeting module now looks like:
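A minimal sketch of the flush in isolation. It assumes a current GHC, where hFlush and stdout are imported from System.IO rather than the Haskell 98 IO module; the helper name prompt and the greeting text are hypothetical.

```haskell
module Main where

import System.IO (hFlush, stdout)

-- Print a prompt without a newline, force it onto the screen, then read a line.
prompt :: String -> IO String
prompt message = do
  putStr message
  hFlush stdout     -- push the buffered prompt out before waiting for input
  getLine

main :: IO ()
main = do
  name <- prompt "You are? "
  putStrLn $ "Hello, " ++ name ++ "!"
```

Wrapping the putStr and the flush in one helper avoids forgetting the flush at some call site.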
Now, repeating the above mentioned compiling will give us an executable that does the right thing.
Author: wtd [ Fri Dec 31, 2004 12:52 am ]
Post subject: Haskell Intro Continued
More on executables Now that we have the ability to compile a Haskell program to a native executable, there are a couple of things we might want to do.
Both are provided by functions in the System module. The first is solved by the system function. The most useful example of this is probably to run the Windows "pause" program, which creates that famous "press any key to continue..." message. This is useful for preventing a console window from instantly closing when the program has finished its work. If I compile and run the following by double-clicking on the resulting executable, it'll print the message, but it'll do it so quickly that the window will close before I can see anything.
Instead, I want to pause at the end of the program, so I compile:
Interestingly, the system function returns an ExitCode, so you can tell if the program you ran was successful or not. If it was successful, ExitSuccess is returned. Otherwise, ExitFailure is returned. The ExitFailure constructor also takes an int, so you can tell exactly what code the program exited with. We could test the success of pause.
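A sketch of testing the result of pause. It assumes a current GHC, where system lives in System.Process and ExitCode in System.Exit rather than in the Haskell 98 System module; the helper name describe and the messages are hypothetical.

```haskell
module Main where

import System.Exit (ExitCode (..))
import System.Process (system)

-- Turn an exit code into a readable message.
describe :: ExitCode -> String
describe ExitSuccess     = "pause succeeded"
describe (ExitFailure n) = "pause failed with exit code " ++ show n

main :: IO ()
main = do
  putStrLn "Hello, world!"
  code <- system "pause"    -- runs the Windows pause command
  putStrLn $ describe code
```

On a system without a pause command, the shell reports a failure and the second clause of describe is used.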
Now, I mentioned getting variables passed to the program. The getArgs function will do that, in the form of a list of strings. So, let's greet the names passed to a program.
And now for something completely different Let's build a quick replacement for the typical copy program, which takes one file and copies its contents into another file. This should cover:
To get at the file handling functions, we'll need the IO module. Let's take a look at opening a file and reading the contents, then printing them to the screen.
Dissecting that...
The first argument is obviously the name of the file to open. The second argument is the mode in which to open the file. We could have also chosen WriteMode, AppendMode, or ReadWriteMode.
Obviously this gets the contents of "bar.txt".
Here we're closing the file. So, now we can open the file, get the contents, and close the file. The next goal is to be able to open a file, write to it, and close it.
There's not a whole lot new here. Simplifying the handling of files But if this is all we're doing with files, we can replace all of that with two functions: readFile and writeFile. So, let's write a program that copies one file's contents into another file.
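A sketch of a copy built from those two functions. The source name "bar.txt" is the file mentioned earlier; the destination name "copy.txt" is an assumption.

```haskell
module Main where

main :: IO ()
main = do
  text <- readFile "bar.txt"   -- read the whole source file as a string
  writeFile "copy.txt" text    -- create or overwrite the destination
```

The source file has to exist for this to run; the two names must also refer to different files.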
Arguments Of course, to more accurately emulate the copy command, our program should work on user-supplied filenames, rather than hard-coded ones.
Error handling

If we supply fewer than two arguments, this will fail due to "o = args !! 1". So we should check to see if there are at least two arguments. If not, remind the user.
Now, if we try to open a file for reading that doesn't exist, our program will raise an error and terminate. We could try to prevent this, but that's pretty tedious, so instead we'll let the error happen and "catch" it, using, appropriately enough, the catch function. Our desired behavior is, if the file to be read doesn't exist, the text should be an empty string. The catch function allows us to do that by providing a function to run if an error occurs.
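A sketch of the copy program with both checks in place. It assumes a current GHC, where the Prelude no longer exports catch; catchIOError from System.IO.Error plays the same role for IO errors. The helper name readOrEmpty and the usage message are hypothetical.

```haskell
module Main where

import System.Environment (getArgs)
import System.IO.Error (catchIOError)

-- Read a file, falling back to an empty string if an IO error occurs.
readOrEmpty :: FilePath -> IO String
readOrEmpty path = catchIOError (readFile path) (\_ -> return "")

main :: IO ()
main = do
  args <- getArgs
  if length args < 2
    then putStrLn "Usage: copy <source> <destination>"
    else do
      text <- readOrEmpty (args !! 0)
      writeFile (args !! 1) text
```

With a missing source file, this writes an empty destination file instead of terminating with an error.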
The only thing that should look odd here is:
What we're seeing here is another way of writing a function. The backslash introduces the function's arguments, and the -> leads to the actual guts of the function. The argument in this case is the error itself, but we aren't concerned with what the error actually is, so we use an underscore. The function itself simply returns an empty string. One last thing In the last code sample, we saw:
What you might also see is the use of:
Here the backticks around catch turn a function with two arguments into an operator. Which of these two forms you use is purely a matter of style, but it's important to understand both.