Computer Science Canada

WIP - Perl Whirlwind

Author:  wtd [ Sat Aug 12, 2006 6:36 pm ]
Post subject:  WIP - Perl Whirlwind

The Perl Whirlwind

Datatypes

Perl has four basic datatypes, and variables that go with those.

The most basic are scalars. A scalar is a number, or a string. Scalar variables begin with a dollar sign sigil.

code:
$foo = "bar";
$baz = 42;
$qux = 3.14;


The next kind of datatype is an array. Perl arrays contain Perl scalars. Array variables begin with an at sign sigil.

code:
@values = ("bar", 42, 3.14);


Accessing elements of an array is simple. Perl arrays are zero-indexed.

code:
$foo = $values[0];


Whoa whoa whoa. Why'd the sigil change? Well, the sigil changed because once I accessed the scalar inside the array, I was then dealing with a scalar.

Next up we have another useful datatype: the hash. Hashes store multiple scalars, just like arrays, but they don't use integers as indexes. Hash variables begin with a percent symbol sigil.

code:
%math_stuff = ("pi" => 3.14);


We can then get that out with a very simple syntax.

code:
$pi = $math_stuff{"pi"};
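
A hash can of course hold more than one pair, and a few built-in functions make it easy to inspect. The following is only a sketch; the extra keys ("e" and "phi") are made up for illustration.

```perl
# A hash with more than one pair (the extra keys are made up for illustration).
%math_stuff = ("pi" => 3.14, "e" => 2.72);

# Assigning to a new key adds it to the hash.
$math_stuff{"phi"} = 1.62;

# exists tests for a key without caring about its value.
print "We know pi\n" if exists $math_stuff{"pi"};

# keys returns every key; sort gives them a predictable order.
@names = sort keys %math_stuff;
print "@names\n";   # prints e phi pi
```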


Wait, I said there were four datatypes

And there is a fourth. Wondering at all by now how you'd create an array of arrays?

I suppose we could try:

code:
@arr = (1, 2, (3, 4, 5));


Here it looks like we have an array as one of the elements of the array. In fact, we don't. What we have is equivalent to the following.

code:
@arr = (1, 2, 3, 4, 5);


Growing more desperate we might try:

code:
@sub_arr = (3, 4, 5);
@arr = (1, 2, @sub_arr);


But again, we're left with the same result.
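
One way to convince yourself that the list really has been flattened is to count the elements. A small sketch, using the built-in scalar function to get the array's length:

```perl
@sub_arr = (3, 4, 5);
@arr = (1, 2, @sub_arr);

# scalar(@arr) gives the number of elements in the array.
print scalar(@arr), "\n";   # prints 5, not 3

# The third element is the plain number 3, not an array.
print $arr[2], "\n";        # prints 3
```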

What we need is the almighty reference. The Perl reference is essentially another sort of scalar, so it can live inside an array or hash as a single item. This is because it really is a single item. It's just a single item holding a sign that points at something else. Variables holding references use the same dollar sign sigil as any other scalar variable.

We can get a reference to a scalar, array or hash using the backslash operator.

code:
@values = ("bar", 42, 3.14);
%math_stuff = ("pi" => 3.14);
$values_ref = \@values;
$math_stuff_ref = \%math_stuff;
@arr = ($values_ref, $math_stuff_ref);


But that's kind of tedious, so let's streamline it a bit.

code:
@values = ("bar", 42, 3.14);
%math_stuff = ("pi" => 3.14);
@arr = (\@values, \%math_stuff);


And we can even do a bit better by using the syntax for creating anonymous array and hash references.

code:
@arr = (["bar", 42, 3.14], {"pi" => 3.14});


So, now let's say we want to get at the value 42. We can get the reference to the array it's in the normal way.

code:
$arr[0]


Now, though, we have to dereference that array.

code:
@{$arr[0]}


And then we can access 42 in the normal way. Since the element we want is a scalar, the sigil changes to a dollar sign, just as it did before.

code:
${$arr[0]}[1]


But that looks rather tedious, so we'll use the arrow operator.

code:
$arr[0]->[1]


Of course we still have to use the explicit dereference if we want to give a name to the array contained within.

Similarly, if we want the value of pi, or at least our incredibly bad approximation of it.

code:
${$arr[1]}{"pi"}


Or:

code:
$arr[1]->{"pi"}
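
Putting the pieces together, here is a minimal sketch of the array of arrays we set out to build. The @grid name and its contents are made up for illustration.

```perl
# Each element of @grid is a reference to an anonymous array.
@grid = ([1, 2, 3], [4, 5, 6]);

# Reach into the second row, third column.
print $grid[1]->[2], "\n";        # prints 6

# Between adjacent subscripts the arrow is optional.
print $grid[1][2], "\n";          # prints 6

# Dereference a whole row to get a real array back.
@first_row = @{$grid[0]};
print scalar(@first_row), "\n";   # prints 3
```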


Subroutines

So far all of this code has just been out on its own. Let's start giving it names and organizing it into subroutines.

code:
sub hello_world {
   print "Hello world\n";
}


That's a pretty simple subroutine. It doesn't even take arguments. Let's make one that does. There's just one problem. Perl doesn't provide any place in the subroutine heading to give names to parameters/arguments.

Instead, all arguments are passed as items in a special array.

code:
sub hello {
   # args are in the @_ array
   $name = $_[0];
   
   print "Hello $name\n";
}
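
The same array carries every argument, in order. A sketch with a hypothetical "add" subroutine (not part of the examples above) that takes two arguments and returns their sum:

```perl
sub add {
   # first argument is $_[0], second is $_[1]
   return $_[0] + $_[1];
}

$sum = add(2, 3);
print "$sum\n";   # prints 5
```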


But there are a couple of idioms for dealing with arguments. The built-in shift function will take the first element off of an array and return it. This is very useful for dealing with arguments.

code:
sub hello {
   # args are in the @_ array
   $name = shift @_;
   
   print "Hello $name\n";
}


Even better, shift does the right thing in this case if no array is specified.

code:
sub hello {
   $name = shift;
   
   print "Hello $name\n";
}


The other idiom we'll see in a little while.

Scope

There's just one significant problem. Let's expand that sample a bit.

code:
$name = "hello";

sub hello {
   $name = shift;
   
   print "Hello $name\n";
}

print "$name\n";
hello "world";
print "$name\n";


The problem here is that the output is:

code:
hello
Hello world
world


The "hello" subroutine modified the value in the outer variable.

To fix this we'll give our variables explicit scope with "my".

code:
my $name = "hello";

sub hello {
   my $name = shift;
   
   print "Hello $name\n";
}

print "$name\n";
hello "world";
print "$name\n";


The output is now:

code:
hello
Hello world
hello


Just as we would expect.
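
A variable declared with "my" lives only until the end of the enclosing block, and that applies to any pair of braces, not just subroutines. A minimal sketch (the variable name is arbitrary):

```perl
my $count = 1;

{
   # This $count is a new variable that hides the outer one.
   my $count = 2;
   print "inner: $count\n";   # prints inner: 2
}

print "outer: $count\n";      # prints outer: 1
```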

Packages

But what about when we want to organize our subroutines? Well, that's where packages come in.

We can have different packages in one source file. The default package is "main". Let's move our previous subroutine into a "Greetings" package.

code:
package Greetings;

sub hello {
   my $name = shift;
   
   print "Hello $name\n";
}

package main;

Greetings::hello "world";


Far more useful though, is when we separate a package out into its own file as a module, so it can be reused by many different programs.

In file Greetings.pm:

code:
package Greetings;

sub hello {
   my $name = shift;
   
   print "Hello $name\n";
}

1;


The trailing one tells the Perl interpreter that the module is returning true. If it does not do this, the program will terminate when it tries to use this module.

If this is located in a directory listed in Perl's @INC array, we can then easily use it in our program.

code:
use Greetings;

Greetings::hello "world";


Getting stuff out of a package

It's all well and good that we've got our subroutine packaged up in a module that we can reuse, but what if we don't want to prepend it with the module name all the time?

Well then, we can use the Exporter module.

code:
package Greetings;

use Exporter;

@ISA = qw(Exporter);
@EXPORT = qw(&hello);

sub hello {
   my $name = shift;

   print "Hello $name\n";
}

1;


And now we can write:

code:
use Greetings;

hello "world";


The "hello" subroutine (subroutines have an ampersand sigil) has been automatically exported into our program from the Greetings module. The only thing is... there really isn't much control here. What if I don't want to export the subroutine?

Well, then I'll specify in that module that hello can be exported, if the user wishes.

code:
package Greetings;

use Exporter;

@ISA = qw(Exporter);
@EXPORT_OK = qw(&hello);

sub hello {
   my $name = shift;

   print "Hello $name\n";
}

1;


And now to achieve the same as the previous version:

code:
use Greetings qw(&hello);

hello "world";


It should be noted that the "qw" construct takes a string, splits it on whitespace, and creates a list of strings. Very handy little bit of code that is.
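
To see what "qw" buys us, compare it with the longhand list it stands for. A quick sketch with made-up words:

```perl
my @short = qw(hello goodbye);
my @long  = ("hello", "goodbye");

# Both arrays hold the same two strings.
print "$short[0] $short[1]\n";   # prints hello goodbye
print "$long[0] $long[1]\n";     # prints hello goodbye
```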

What about objects?

All of the other cool kids get to play with objects. Ruby, for instance:

code:
class Name
   def initialize(first, last)
      @first, @last = first, last
   end
   
   def full_name
      "#{@first} #{@last}"
   end
end


So why not Perl?

Well that's silly, of course Perl gets to do objects. For instance, we'll put this code in Name.pm.

code:
package Name;

use strict;
use warnings;

sub new {
   my ($class, $first, $last) = @_;
   my $self = {first => $first, last => $last};
   bless $self, $class
}

sub full_name {
   my $self = shift;
   "$self->{first} $self->{last}"
}

1;


We aren't using Exporter here since we don't want to export methods.

You should be noticing something new. We're using the strict and warnings modules. These enforce stricter controls on variables and print warning messages, respectively. As a result, we now cannot get away with not specifying the scope of variables.

Next you'll note:

code:
my ($class, $first, $last) = @_;


Here we're assigning the variables in list context from the arguments array. This will assign the first three values in that array to the variables, and ignore the rest.

The first argument will be the actual classname. The next two are just arguments supplied to the constructor.

We next create our $self variable as a hash reference. We can then bless this hash reference as a Name object. You'll note that there is no semi-colon. In Perl, semi-colons exist only as separators, and not terminators. Since the closing brace for the subroutine serves this purpose, there is no need for the semi-colon. Additionally, the value of the last expression in a subroutine is automatically returned. Since the self argument to bless is returned by bless, we do not need to explicitly return $self, though we can if we wish.

In the full_name method, the first argument is the self object. We can then gain access to the data in that hash. We use string interpolation to incorporate that data into a string which, using the semantics explained earlier, is returned from the subroutine. Only strings in double quotes will do interpolation.

We can then use this class in a program.

code:
use strict;
use warnings;

use Name;

my $n = Name->new('Bob', 'Smith');

print $n->full_name . "\n";


The dot operator here is used to concatenate strings.
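
Because a blessed reference is still an ordinary hash reference underneath, we can poke at it to see what bless actually did. The sketch below repeats the Name package inline so that it runs as a single file; "ref" is a built-in that reports which package a reference has been blessed into.

```perl
use strict;
use warnings;

package Name;

sub new {
   my ($class, $first, $last) = @_;
   my $self = {first => $first, last => $last};
   bless $self, $class
}

sub full_name {
   my $self = shift;
   "$self->{first} $self->{last}"
}

package main;

my $n = Name->new('Bob', 'Smith');

# ref returns the package the reference was blessed into.
print ref($n), "\n";          # prints Name

# The data is still reachable as plain hash entries.
print $n->{first}, "\n";      # prints Bob

# A method call passes the object as the first argument.
print $n->full_name, "\n";    # prints Bob Smith
```

Reaching into the hash from outside the class works, but it is usually better manners to go through a method.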

Overloading

Wow, the last line of code demoed here is obnoxious. Why can't we just use string interpolation?

Well, that is because method calls such as the one used don't get evaluated in string interpolation.

But in Ruby we could simply override the to_s method, which gets called automagically. In Perl, we use the overload module.

code:
package Name;

use strict;
use warnings;

use overload '""' => sub {
   my $self = shift;
   $self->full_name
};

sub new {
   my ($class, $first, $last) = @_;
   my $self = {first => $first, last => $last};
   bless $self, $class
}

sub full_name {
   my $self = shift;
   "$self->{first} $self->{last}"
}

1;


Here we pass a list of arguments to overload. Yes, we've used the same arrow we used with hashes, but you see... a hash is really just initialized from a plain list, and the arrow is little more than a fancy comma. The arrows can be replaced with commas and it will still work. This way, though, it looks nice.
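
To check the claim about arrows and commas, we can build the same hash both ways. A minimal sketch; the one extra thing the arrow does is quote a bare word on its left, which is why the keys need explicit quotes in the comma version.

```perl
use strict;
use warnings;

my %with_arrows = (first => 'Bob', last => 'Smith');
my %with_commas = ('first', 'Bob', 'last', 'Smith');

print "$with_arrows{first} $with_arrows{last}\n";   # prints Bob Smith
print "$with_commas{first} $with_commas{last}\n";   # prints Bob Smith
```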

The other important thing to note here is that we've created an anonymous subroutine reference. We could have instead used a reference to an actual subroutine.

code:
package Name;

use strict;
use warnings;

use overload '""' => \&full_name;

sub new {
   my ($class, $first, $last) = @_;
   my $self = {first => $first, last => $last};
   bless $self, $class
}

sub full_name {
   my $self = shift;
   "$self->{first} $self->{last}"
}

1;


Inheritance

code:
package FormalName;

use strict;
use warnings;

use Name;

our @ISA = qw(Name);

use overload '""' => \&full_name;

sub new {
   my ($class, $title, $first, $last) = @_;
   my $self = Name->new($first, $last);
   $self->{title} = $title;
   bless $self, $class
}

sub full_name {
   my $self = shift;
   $self->{title} . ' ' . Name::full_name($self)
}

1;


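The only real new thing here is setting up the inheritance. Perl looks for a package's parent classes in its @ISA array. Since we're using strict we have to declare that array, and by using "our" instead of "my" we've made it a package variable that is visible outside of the package, rather than a private lexical one that Perl's method lookup would never see.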
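
To watch the inheritance work, the sketch below folds both packages into a single file (so no "use Name;" is needed) and adds a hypothetical "initials" method to Name that FormalName never defines. The overload lines are left out to keep it short.

```perl
use strict;
use warnings;

package Name;

sub new {
   my ($class, $first, $last) = @_;
   my $self = {first => $first, last => $last};
   bless $self, $class
}

sub full_name {
   my $self = shift;
   "$self->{first} $self->{last}"
}

# Hypothetical extra method, only defined in the parent class.
sub initials {
   my $self = shift;
   substr($self->{first}, 0, 1) . substr($self->{last}, 0, 1)
}

package FormalName;

our @ISA = qw(Name);

sub new {
   my ($class, $title, $first, $last) = @_;
   my $self = Name->new($first, $last);
   $self->{title} = $title;
   bless $self, $class
}

sub full_name {
   my $self = shift;
   $self->{title} . ' ' . Name::full_name($self)
}

package main;

my $n = FormalName->new('Dr.', 'Bob', 'Smith');

print $n->full_name, "\n";   # prints Dr. Bob Smith
print $n->initials, "\n";    # found in Name via @ISA, prints BS
print "It's a Name too\n" if $n->isa('Name');
```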

Normally we'd just save these in Name.pm and FormalName.pm somewhere in our include path and be done with it. But... what if we want to call the FormalName class in a way that takes advantage of Perl's capabilities? Something like Name::Formal, perhaps?

Well, then in the same directory as Name.pm, we'd create a Name directory. Into that we'd put the following, as Formal.pm.

code:
package Name::Formal;

use strict;
use warnings;

use Name;

our @ISA = qw(Name);

use overload '""' => \&full_name;

sub new {
   my ($class, $title, $first, $last) = @_;
   my $self = Name->new($first, $last);
   $self->{title} = $title;
   bless $self, $class
}

sub full_name {
   my $self = shift;
   $self->{title} . ' ' . Name::full_name($self)
}

1;


We could then use that in a program with:

code:
use Name::Formal;


Of include paths

What if we can't put these files somewhere in the predefined include path? How then would we use them?

In that event, we would use yet another module, "lib", to tell Perl where to look.

code:
use lib '/some/directory';


If you wish to test this, then simply print the @INC array.

code:
use lib 'foo';

print "\@INC is @INC\n";


Let's read some user input already!

code:
#!/usr/bin/perl

use strict;
use warnings;

print "Your name is? ";
chomp(my $name = <STDIN>);

print "Hello, $name!\n";


First off, I added a shebang line to the top of my program. This tells the operating system what program to use to run this script. So now I can chmod my little program executable and run it with:

code:
./hello.pl


The rest looks pretty normal until you get to "chomp." The chomp function removes the trailing newline from the end of a string, which is exactly what a line read from STDIN ends with. I could have written this as:

code:
my $name = <STDIN>;
chomp $name;


But the first version has the same effect, and puts it all nicely into one line.

We then just use string interpolation to print the greeting.
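
Strictly speaking, chomp only removes a trailing newline and leaves any other whitespace alone, which we can see by trying it on a fixed string rather than live input. A small sketch (the sample string is made up):

```perl
use strict;
use warnings;

my $line = "Bob  \n";
chomp $line;

# The newline is gone, but the trailing spaces are still there.
print "[$line]\n";   # prints [Bob  ]

# A second chomp does nothing, since there is no newline left.
chomp $line;
print "[$line]\n";   # prints [Bob  ]
```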

Arguments

But instead of asking for a name, let's just let the user pass the name to greet in as a command-line argument.

code:
#!/usr/bin/perl

use strict;
use warnings;

my $name = $ARGV[0];

print "Hello, $name!\n";


Array length

But what if the @ARGV array that holds all of the supplied arguments is empty? Then trying to get the first element in it gives us an undefined value, and with warnings turned on Perl will grumble about an uninitialized value all over your beautiful shell as soon as we try to print it. That's no good at all.

To avoid this we need to be able to detect whether there is at least one argument in there. We need to be able to find the length of the array.

Now, in list context, an array is just a list. That makes perfect sense. But, in scalar context, an array evaluates to an integer. That integer just happens to be the array's length.

code:
#!/usr/bin/perl

use strict;
use warnings;

if (scalar @ARGV >= 1) {
   my $name = $ARGV[0];

   print "Hello, $name!\n";
}


Let's shorten that up a bit.

code:
#!/usr/bin/perl

use strict;
use warnings;

if (scalar @ARGV >= 1) {
   print "Hello, $ARGV[0]!\n";
}


And now let's shorten it up a bit further with a magical postfix conditional.

code:
#!/usr/bin/perl

use strict;
use warnings;

print "Hello, $ARGV[0]!\n" if scalar @ARGV >= 1;


But, in Perl, 0 is false. Any other integer is true.

code:
#!/usr/bin/perl

use strict;
use warnings;

print "Hello, $ARGV[0]!\n" if scalar @ARGV;


But we don't even have to go that far. An empty array is false.

code:
#!/usr/bin/perl

use strict;
use warnings;

print "Hello, $ARGV[0]!\n" if @ARGV;


But postfix conditionals don't help much if we want to have some "else" conditions.

code:
#!/usr/bin/perl

use strict;
use warnings;

if (@ARGV) {
   print "Hello, $ARGV[0]!\n";
} else {
   print "Please enter a name next time.\n";
}


STDERR

So far we've just been printing to standard output, which is a nifty filehandle, but not the only one we have out of the box.

Let's print the error message from our previous example to the standard error filehandle.

code:
#!/usr/bin/perl

use strict;
use warnings;

if (@ARGV) {
   print "Hello, $ARGV[0]!\n";
} else {
   print STDERR "Please enter a name next time.\n";
}


Yes, it works the same way for any other filehandle.
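
As a sketch of that, here is the same print syntax aimed at a filehandle we open ourselves. To keep the example self-contained it writes into an in-memory string rather than a real file; the $buffer and $fh names are arbitrary.

```perl
use strict;
use warnings;

my $buffer = '';

# Opening a reference to a scalar gives a filehandle that writes to it.
open(my $fh, '>', \$buffer) or die "Cannot open in-memory file: $!";

# Note: no comma between the filehandle and the text.
print $fh "Please enter a name next time.\n";

close($fh);

print "Captured: $buffer";
```

Swapping the reference for a filename would send the same text to a file on disk instead.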

Greeting a whole bunch of names

code:
#!/usr/bin/perl

use strict;
use warnings;

my $i = 0;

while ($i <= $#ARGV) {
   print "Hello, $ARGV[$i]!\n";
   $i++;
}


That's one way to do it. The while loop isn't a very sophisticated means of accomplishing the goal, but it works. The $#ARGV construct retrieves the last index of the @ARGV array.
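
The last index and the length are easy to mix up. A short sketch on a made-up @names array showing both:

```perl
use strict;
use warnings;

my @names = ('Bob', 'Alice', 'Larry');

my $length = @names;      # array in scalar context: 3
my $last   = $#names;     # last valid index: 2

print "length $length, last index $last\n";
print "last element is $names[$last]\n";   # prints last element is Larry

# For an empty array the last index is -1.
my @empty = ();
print "empty: ", $#empty, "\n";            # prints empty: -1
```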

code:
#!/usr/bin/perl

use strict;
use warnings;

for (my $i = 0; $i <= $#ARGV; $i++) {
   print "Hello, $ARGV[$i]!\n";
}


There's nothing particularly more sophisticated here.

code:
#!/usr/bin/perl

use strict;
use warnings;

for (@ARGV) {
   print "Hello, $_!\n";
}


The above represents a for-each loop. Instead of dealing with indexes, we just loop over the elements. Each element is assigned to the $_ variable. We can even use this as a postfix control structure.

code:
#!/usr/bin/perl

use strict;
use warnings;

print "Hello, $_!\n" for @ARGV;


But that isn't a very meaningful variable name.

code:
#!/usr/bin/perl

use strict;
use warnings;

for my $name (@ARGV) {
   print "Hello, $name!\n";
}


Let's map it!

code:
#!/usr/bin/perl

use strict;
use warnings;

print for map { "Hello, $_!\n" } @ARGV;


Map takes each element in an array, calls it $_, and changes it to some other value, generating a new list.

When we use print with no argument, it will automatically print the value of $_. The for loop in postfix form will call each element $_. Thus we get the desired behavior.
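
Map isn't tied to printing or to @ARGV. A minimal sketch on a made-up list of numbers:

```perl
use strict;
use warnings;

my @numbers = (1, 2, 3);

# Each element is visible as $_ inside the block.
my @doubled = map { $_ * 2 } @numbers;

print "@doubled\n";   # prints 2 4 6
print "@numbers\n";   # the original is untouched: 1 2 3
```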

Being selective

Let's not greet anyone named Larry. Don't let Larry sneak by using odd capitalization.

code:
#!/usr/bin/perl

use strict;
use warnings;

print for map {
   "Hello, $_!\n"
} grep {
   ! /\b larry \b/ix
} @ARGV;


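A regular expression on its own automatically tries to match $_. The exclamation point operator will negate that. The "i" flag makes the match case-insensitive, and the "x" flag lets us put whitespace in the pattern for readability. This only returns true if the name does not contain "larry" as a whole word.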
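
To try the filter without typing command-line arguments, we can run it over a fixed list. A sketch with made-up names:

```perl
use strict;
use warnings;

my @names = ('Bob', 'LARRY', 'larry', 'Alice', 'Larryson');

# Keep only the names that do not match "larry" as a whole word.
my @allowed = grep { ! /\b larry \b/ix } @names;

print "@allowed\n";   # prints Bob Alice Larryson
```

Larryson gets through because the \b anchors require "larry" to stand on its own as a word.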

Map and grep: what's the point?

With map and grep, we've taken what are multiple steps, and expressed them separately in code. This makes it easier for us to modify the code down the road.

If we were to do this without such tools:

code:
#!/usr/bin/perl

use strict;
use warnings;

for my $name (@ARGV) {
   if ($name !~ /\b larry \b/ix) {
      print "Hello, $name!\n";
   }
}

Author:  Clayton [ Sun Aug 13, 2006 12:53 pm ]
Post subject: 

w00t good job wtd, im very interested in the idea of hashes, much more readable and much better organization tools than an array

Author:  wtd [ Sun Aug 13, 2006 12:57 pm ]
Post subject: 

Hashes exist in other languages as well. Ruby and Python both have very nice implementations of this. Ruby again calls them hashes, but Python refers to them as dictionaries.

You may also see them referred to as "associative arrays," as in PHP and Javascript, if memory serves.

Author:  Clayton [ Sun Aug 13, 2006 1:28 pm ]
Post subject: 

actually one thing i have to ask, is a package a sort of object in Perl, or is it a kind of module?

Author:  wtd [ Sun Aug 13, 2006 1:31 pm ]
Post subject: 

A module. Though you perhaps are getting ahead of yourself with the question about objects. ;)

Author:  Clayton [ Sun Aug 13, 2006 1:37 pm ]
Post subject: 

i was just wondering as to what it was specifically (i was more inclined to think it was a module), although i wasnt sure :P

Author:  octopi [ Mon Aug 14, 2006 2:49 pm ]
Post subject: 

man, you just reminded me of how much I love perl

it was the first language I learned, but I haven't used it in sooo long, it's so great for making small programs quickly, and it's got an amazing library of modules (http://www.cpan.org), which pretty much makes it easy to do anything! (even make windows gui apps)

Author:  wtd [ Mon Aug 14, 2006 2:51 pm ]
Post subject: 

Perl was the first language I spent much time studying as well.

