Computer Science Canada

[Perl5-tut] Perl5 Primer

Author:  wtd [ Thu Dec 02, 2004 12:47 am ]

Goal

The goal of this tutorial is to introduce Perl5 to programmers who already have experience in another programming language.

What is Perl?

Depending on who you ask, Perl is the "Practical Extraction and Report Language", or the "Pathologically Eclectic Rubbish Lister". Either way, it's a powerful programming language that's installed on pretty much every Unix or Unix-like system, and is available for just about any platform you can imagine.

Further, Perl, despite not having a large company like Sun or Microsoft strongly advocating it, has become incredibly popular. Due to the amount of work already done in Perl, and the number of people continuing to work on it, it's also a well-known name, and has a lot of power on a resume.


A first program

code:
use strict;
use warnings;

print "Hello, world!\n";


The first two lines turn on strict mode, which catches many common mistakes (such as using a variable that was never declared), and enable warnings, which let us know when we're doing things that could be dangerous. As in any language, warnings should not be ignored.

The last line simply prints "Hello, world!" to the screen.

Comments

Comments in Perl are anything following a #.

code:
# this is a comment


Variables

Variables in Perl5 in strict mode are "declared" with the keyword "my".

code:
my $hw = "Hello, world!\n";

print $hw;


There are only four types of variables in Perl, and they aren't what you're probably used to.

Scalars

Scalars are either numbers or strings. The language goes back and forth easily between the two.

String concatenation with a number and a string...

code:
print "foo" . 42;
# foo42


Similarly, two numbers...

code:
print 1 . 5;
# 15


Mathematical operators work much the same way.

code:
print "4" + 3;
# 7

print "3" + "12";
# 15

print 1 + "5";
# 6


To repeat a string, use the "x" operator rather than "*".

code:
print "hello " x 3;
# hello hello hello
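
Putting the scalar operators side by side may make the conversions easier to see. This is only a small sketch, and the variable names are made up for illustration.

```perl
use strict;
use warnings;

my $count = "12";          # a string that happens to look like a number

my $sum   = $count + 8;    # + treats both sides as numbers
my $label = $count . 8;    # . treats both sides as strings
my $rule  = "-" x 5;       # x repeats the string on its left

print "$sum $label $rule\n";
# 20 128 -----
```

The same scalar gives 20 or "128" depending only on which operator is applied to it.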


Arrays

The second kind of variable is an array. Array variables are denoted with a leading @, just as scalars are indicated by a $. Each element in the array is a scalar. An array can contain any number of elements. Array indices start at zero.

A simple array:

code:
my @foo = (42, "foo", 57, "hello world");


Accessing an individual element in the array is done like so:

code:
my $greeting = $foo[3];


The @ switches to a $ since the thing being accessed really is a scalar.

Inserting a value into an array is similarly simple, and you can insert at any index.

code:
my @foo;
$foo[10000] = 42;


Getting the length of an array is a matter of:

code:
my $length = scalar @foo;
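
To see what assigning past the end of an array does, here is a short sketch. It uses a smaller index than the example above, and $last is just an illustrative name.

```perl
use strict;
use warnings;

my @foo;
$foo[3] = 42;                  # the array grows to fit; indices 0 to 2 read as undef

my $length = scalar @foo;      # number of elements
my $last   = $#foo;            # index of the last element

print "length: $length, last index: $last\n";
# length: 4, last index: 3

print "element 0 is undefined\n" unless defined $foo[0];
```

So the length is always one more than the highest index, whether or not the slots in between were ever filled.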


Hashes

The third type of variable in Perl is the hash, or associative array. It's kind of like an array, but it uses a scalar as an index. Also, hash variables are prefixed with %, and {} are used instead of square brackets when accessing elements.

code:
my %bar = ("greeting" => "Hello, world");

print $bar{"greeting"};

$bar{"name"} = "Bob";
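
A few built-in functions come up constantly when working with hashes. The sketch below, which reuses %bar from above, shows exists, delete and keys.

```perl
use strict;
use warnings;

my %bar = ("greeting" => "Hello, world");
$bar{"name"} = "Bob";

# exists checks for a key without looking at its value
print "has a name\n" if exists $bar{"name"};

# delete removes the key and its value
delete $bar{"greeting"};

my $count = scalar keys %bar;   # number of keys left
print "$count\n";
# 1
```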


References

References are a fancy twist on scalars, and can go anywhere a scalar can. Instead of a number, or string, though, they hold references to other variables (scalars, arrays, hashes).

The \ operator gets a reference.

code:
my $str = "hello";
my $ref = \$str;

# or just

$ref = \"world";


Of course, we can also get references to arrays and hashes with \...

code:
my @arr = (1, 4, 9);
my $arr_ref = \@arr;
my %hash = ("hello" => "world");
my $hash_ref = \%hash;


But, it's easier if we just directly create the reference using different brackets.

code:
my $arr_ref = [1, 4, 9];
my $hash_ref = {"hello" => "world"};


Dereferencing a reference is easy.

code:
my $scalar_ref = \"hello";
my $scalar = ${$scalar_ref};

my $arr_ref = [1, 4, 9];
my @arr = @{$arr_ref};

my $hash_ref = {"hello" => "world"};
my %hash = %{$hash_ref};


For simple examples like these, we can elide the brackets.

code:
my $scalar_ref = \"hello";
my $scalar = $$scalar_ref;

my $arr_ref = [1, 4, 9];
my @arr = @$arr_ref;

my $hash_ref = {"hello" => "world"};
my %hash = %$hash_ref;


And if we want an element from an array or hash:

code:
my $arr_ref = [1, 4, 9];
print $arr_ref->[2];

my $hash_ref = {"hello" => "world"};
print $hash_ref->{"hello"};
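
It may help to see that a reference points at the original data rather than at a copy. A minimal sketch, with @copy as an illustrative name:

```perl
use strict;
use warnings;

my @arr = (1, 4, 9);
my $arr_ref = \@arr;

$arr_ref->[0] = 100;     # changes $arr[0]
push @$arr_ref, 16;      # appends to @arr

print "@arr\n";
# 100 4 9 16

my @copy = @$arr_ref;    # dereferencing into a new array makes a copy
$copy[1] = 0;            # so this does not touch @arr
print "$arr[1]\n";
# 4
```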


The real power of references is the ability to create arrays and hashes that contain other arrays or hashes.

code:
my $person = {
   "name" => {
      "first" => "Bob",
      "last"  => "Smith"
   },
   "car" => {
      "make"  => "AMC",
      "model" => "Gremlin",
      "year"  => 1970
   }
};

print $person->{"car"}->{"make"};
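
Arrays and hashes can be nested in any combination. The sketch below uses a made-up array of hashes, each of which holds an array.

```perl
use strict;
use warnings;

my $people = [
   { "name" => "Bob",  "languages" => ["Perl", "C"] },
   { "name" => "Jane", "languages" => ["Ruby"]      },
];

# the arrows between adjacent brackets are optional
print $people->[0]->{"languages"}->[1], "\n";
print $people->[0]{"languages"}[1], "\n";
# C (both times)

for my $person (@$people) {
   my $count = scalar @{ $person->{"languages"} };
   print "$person->{name} knows $count language(s)\n";
}
# Bob knows 2 language(s)
# Jane knows 1 language(s)
```

Note that the braces in @{ ... } are needed here, since the thing being dereferenced is more than a simple variable.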


String interpolation

Few small features in Perl are more useful than string interpolation.

I could write my "Hello, world" using the string concatenation operator:

code:
my $hw = "Hello, world";
print $hw . "\n";


Or I could just put the variable into the string...

code:
my $hw = "Hello, world";
print "$hw\n";
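
Interpolation is not limited to plain scalars. Here is a short sketch with array elements, whole arrays, hash elements and references; the data is made up for illustration.

```perl
use strict;
use warnings;

my @langs  = ("Perl", "C");
my %person = ("name" => "Bob");
my $car    = { "make" => "AMC" };

print "First language: $langs[0]\n";     # array element
print "All languages: @langs\n";         # whole array, joined with spaces
print "Name: $person{name}\n";           # hash element
print "Make: $car->{make}\n";            # element through a reference
print 'No interpolation: $langs[0]\n';   # single quotes leave the text as-is
print "\n";
```

Only double-quoted strings interpolate. In single quotes, both the variable and the \n are printed literally.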


Comparisons

Comparing numbers is done with:

code:
==, !=, >, <, >=, <=


While strings are compared with:

code:
eq, ne, gt, lt, ge, le


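Since scalars convert freely between numbers and strings, either set of operators will accept any scalar. The operator decides how the comparison is done: == compares its operands as numbers, and eq compares them as strings.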

Three way comparisons can be done with <=> for numbers and "cmp" for strings. If the two things being compared are equal, 0 is returned. If the left hand thing is greater, 1 is returned. Otherwise -1 is returned.

Either operator will accept any scalar, but they do not always give the same answer: "cmp" compares numbers as strings, so 10 cmp 9 returns -1. For numbers, <=> is the form to use.
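
The difference between the numeric and string operators shows up as soon as the two orderings disagree. A small sketch, with illustrative variable names:

```perl
use strict;
use warnings;

my $by_number = 10 <=> 9;        # numerically, 10 is greater than 9
my $by_string = "10" cmp "9";    # as strings, "1" sorts before "9"

print "$by_number $by_string\n";
# 1 -1

print "equal\n"     if "1.0" == 1;      # numeric comparison: 1.0 equals 1
print "different\n" if "1.0" ne "1";    # string comparison: the text differs
```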

What's true and what's false?

It's easier to just say the things that are false.


  • undef (the undefined value)
  • zero
  • empty strings
  • "0" or '0'
  • empty arrays
  • empty hashes
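
A quick way to get a feel for these rules is to test a few values. The sketch below includes some strings that look like zero but are true; @verdicts is just there to collect the results.

```perl
use strict;
use warnings;

my @empty;
my @verdicts;

for my $value (0, "", "0", "0.0", "00", " ") {
   my $verdict = $value ? "true" : "false";
   push @verdicts, $verdict;
   print "'$value' is $verdict\n";
}
# '0' is false
# '' is false
# '0' is false
# '0.0' is true
# '00' is true
# ' ' is true

print "an empty array is false\n" unless @empty;
```

Only the exact string "0" is false. Any other non-empty string is true, even if it would convert to the number zero.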


What does an "if" statement look like?

code:
my $foo = "hello";

if ($foo eq "hello") {
   print "$foo\n";
} elsif ($foo eq "world") {
   print "that's weird\n";
} else {
   print "this is also weird\n";
}


When it's more convenient, we can write:

code:
my $foo = "hello";

print "$foo\n" if $foo eq "hello";


There's also a shortcut for "if not":

code:
my $foo = "hello";

print "$foo\n" unless $foo eq "world";


Connecting conditions

Those familiar with C, C++, Java, etc. should be familiar with:

code:
if ($foo ne "world" && $foo ne "foo") { }


Or the equivalent:

code:
unless ($foo eq "world" || $foo eq "foo") { }
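
To check that the two forms really are equivalent, we can run both against a few values. This is only a sketch; @lines is there to collect the output.

```perl
use strict;
use warnings;

my @lines;

for my $foo ("hello", "world", "foo") {
   my ($with_if, $with_unless) = ("no", "no");

   if ($foo ne "world" && $foo ne "foo") {
      $with_if = "yes";
   }

   unless ($foo eq "world" || $foo eq "foo") {
      $with_unless = "yes";
   }

   push @lines, "$foo: $with_if $with_unless";
}

print "$_\n" for @lines;
# hello: yes yes
# world: no no
# foo: no no
```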


For loop

The C/C++/Java style for loop works just fine in Perl to allow us to count up from one number to another:

code:
for (my $i = 0; $i <= 10; $i++) {
   print $i;
}


A simpler form, though, is:

code:
for my $i (0 .. 10) {
   print $i;
}


Also:

code:
for my $i (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10) {
   print $i;
}


This can also be used to loop over an array.

code:
my @arr = ("hello", "world");

for my $item (@arr) {
   print $item;
}


Or:

code:
my @arr = ("hello", "world");

print for @arr;   # with no arguments, print prints $_, the current item


Or even:

code:
print for "hello", "world";


The for loop can also be used to loop over the elements in a hash:

code:
my %hash = ("hello" => "world");

for my $key (keys %hash) {
   print "$key: $hash{$key}\n";
}
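
One thing to keep in mind is that keys returns the keys in no particular order. If the order matters, sorting the keys first is a common approach. A sketch with made-up data:

```perl
use strict;
use warnings;

my %ages = ("Bob" => 42, "Alice" => 37, "Carol" => 29);

# sort the keys so the output is the same on every run
for my $name (sort keys %ages) {
   print "$name: $ages{$name}\n";
}
# Alice: 37
# Bob: 42
# Carol: 29
```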


Subroutines

Subroutines in Perl5 correspond to functions or procedures in other languages. One notable difference is that they do not have parameter lists.

A basic subroutine looks like:

code:
sub hello_world {
   print "Hello, world!\n";
}


Calling it is as simple as:

code:
sub hello_world {
   print "Hello, world!\n";
}

&hello_world;


In most situations the & can be elided, leaving:

code:
sub hello_world {
   print "Hello, world!\n";
}

hello_world;


Return

Of course, subroutines can return values.

code:
sub greeting {
   return "Hello, world\n";
}

print greeting;


They can return more than one thing, in fact.

code:
sub foo {
   return 5, 4, 3;
}


There are several ways we can use this output. Assigning it to a single scalar:

code:
my $bar = foo;


Results in 3 being stored in $bar.

If $bar is in parentheses, the assignment happens in list context:

code:
my ($bar) = foo;


Then $bar is 5, the first value returned. The other values are discarded.

Assigning it to an array:

code:
my @bar = foo;


Is equivalent to:

code:
my @bar = (5, 4, 3);


You can also assign each value to its own scalar. (The names $a and $b are best avoided for ordinary variables, since sort uses them.)

code:
my ($x, $y, $z) = foo;


$x is 5, $y is 4, and $z is 3.

And we can mix scalars and an array:

code:
my ($x, @rest) = foo;


$x is 5 and @rest is (4, 3).
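
One subtlety worth a sketch: what a subroutine hands back in scalar context depends on whether it returns a comma-separated list or an array. The subroutine names below are hypothetical, and the values are chosen so that the last value and the count differ.

```perl
use strict;
use warnings;

sub list_values {
   return 5, 4, 9;            # a comma-separated list
}

sub array_values {
   my @values = (5, 4, 9);
   return @values;            # an array
}

my $from_list  = list_values();       # scalar context: the last value, 9
my $from_array = array_values();      # scalar context: the element count, 3
my ($first)    = list_values();       # list context: the first value, 5
my ($head, @rest) = array_values();   # 5 and (4, 9)

print "$from_list $from_array $first $head @rest\n";
# 9 3 5 5 4 9
```

When in doubt, assigning to a list on the left-hand side avoids the surprise.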

Parameters

I mentioned that Perl subroutines don't have formal parameter lists. This does not mean that you can't pass parameters to a subroutine. They just end up in the @_ array.

Consider a subroutine "greeting" which takes the name of a person to greet.

code:
sub greeting {
   my ($name) = @_;
   return "Hello, $name\n";
}


When we call a subroutine with a parameter, we can either use parentheses or not. Using them simply makes it easier to understand which parameters go with which subroutine.

code:
print greeting "Bob";


Or:

code:
print greeting("Bob");
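
Since everything arrives in @_, optional parameters and variable-length argument lists need no special syntax. A rough sketch: the second parameter of greeting and the total subroutine are additions made up for this example.

```perl
use strict;
use warnings;

sub greeting {
   my ($name, $punctuation) = @_;
   $punctuation = "!" unless defined $punctuation;   # default when omitted
   return "Hello, $name$punctuation\n";
}

sub total {
   my $sum = 0;
   $sum += $_ for @_;      # @_ holds however many arguments were passed
   return $sum;
}

print greeting("Bob");          # Hello, Bob!
print greeting("Bob", "?");     # Hello, Bob?
print total(1, 2, 3), "\n";     # 6
```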


Subroutine References

In addition to getting references to scalars, arrays and hashes, we can get references to subroutines.

code:
sub foo {
   return 42;
}

my $sub_ref = \&foo;


It's worth noting that if we try:

code:
sub foo {
   return 42;
}

my $sub_ref = \foo;


Then we get a reference to whatever foo returns.

A more convenient syntax for directly getting a subroutine reference is:

code:
my $sub_ref = sub { return 42; };


Dereferencing a subroutine reference looks familiar:

code:
sub foo {
   return 42;
}

my $sub_ref = \&foo;

my $result = &{$sub_ref};
# or...
# my $result = &$sub_ref;


Or, we can call it directly.

code:
sub foo {
   return 42;
}

my $sub_ref = \&foo;

print $sub_ref->();
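
Subroutine references become useful once we pass them around or store them. The sketch below shows both; apply_twice and %operations are hypothetical names used only for illustration.

```perl
use strict;
use warnings;

# takes a subroutine reference and a value,
# and calls the subroutine on the value two times
sub apply_twice {
   my ($code, $value) = @_;
   return $code->($code->($value));
}

my $double = sub { my ($n) = @_; return $n * 2; };

print apply_twice($double, 5), "\n";
# 20

# a hash of subroutine references, looked up by name
my %operations = (
   "add"      => sub { my ($x, $y) = @_; return $x + $y; },
   "multiply" => sub { my ($x, $y) = @_; return $x * $y; },
);

print $operations{"multiply"}->(6, 7), "\n";
# 42
```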


Map, Grep, and Sort: invaluable tools for dealing with arrays

Let's consider the case where we want to create a new array based on an old array, but with each item in the array multiplied by two.

code:
my @first = (1, 2, 3);
my @second;

for my $item (@first) {
   push @second, $item * 2;
}


"push" literally pushes a new item onto the end of an array.

The rest is pretty self-explanatory.

But, there's an easier way.

code:
my @first = (1, 2, 3);
my @second = map { $_ * 2 } @first;


"map" runs the code in the braces once for each item in the input array and collects the results into a new list. $_ is a "magical" variable which holds the current item.

Another case: we want to get each item in an array that's less than 42.

code:
my @first = (23, 12, 78, 42, 57, 15, 31);
my @second;

for my $item (@first) {
   if ($item < 42) {
      push @second, $item;
   }
}


Granted we could make this a bit more concise with:

code:
my @first = (23, 12, 78, 42, 57, 15, 31);
my @second;

for my $item (@first) {
   push @second, $item if $item < 42;
}


But an even better solution is:

code:
my @first = (23, 12, 78, 42, 57, 15, 31);
my @second = grep { $_ < 42 } @first;
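
Because map and grep both take a list and return a list, they can be chained. A short sketch, with @doubled and $count as illustrative names:

```perl
use strict;
use warnings;

my @first = (23, 12, 78, 42, 57, 15, 31);

# read right to left: filter first, then transform what is left
my @doubled = map { $_ * 2 } grep { $_ < 42 } @first;
print "@doubled\n";
# 46 24 30 62

# in scalar context grep returns how many items matched
my $count = grep { $_ < 42 } @first;
print "$count\n";
# 4
```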


Let's consider a case where we have an array of array references:

code:
my @names = (
   ["Bob", "Smith"],
   ["John", "Doe"],
   ["Michael", "Fitzgerald"]
);


Each item in the array represents a name. What if we want to sort by last name?

The straightforward solution is to build an array of last names and use the built-in sort to sort it. Then we can use that sorted array to build a new array of sorted names. Of course, we need a way to eliminate duplicate last names, or people who share a last name would be added more than once. A hash can only hold each key once, so if we make the last names keys in a hash, the duplicates disappear.

code:
my @names = (
   ["Bob", "Smith"],
   ["John", "Doe"],
   ["Michael", "Fitzgerald"]
);

my @sorted_names;
my @last_names;
my %unique;

for my $name (@names) {
   push @last_names, $name->[1];
}

for my $last_name (@last_names) {
   $unique{$last_name} = 1;
}

@last_names = sort keys %unique;

for my $last_name (@last_names) {
   for my $name (@names) {
      push @sorted_names, $name if $name->[1] eq $last_name;
   }
}


Intimidated?

I would hope you would be. That's some mind-blowing code.

Now, let's see how the capabilities of sort can make this much easier:

code:
my @names = (
   ["Bob", "Smith"],
   ["John", "Doe"],
   ["Michael", "Fitzgerald"]
);

my @sorted_names = sort { $a->[1] cmp $b->[1] } @names;


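$a and $b are the two items being compared at any given moment. The block must return a negative number, zero, or a positive number, just as cmp and <=> do, and sort uses that result to decide which item comes first. In this case we just do a standard string comparison, but on the last name of each item.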
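
The comparison block can be extended in the same style. The sketch below breaks ties on first name and also shows why numbers want <=>; the extra name and the number list are made up for illustration.

```perl
use strict;
use warnings;

my @names = (
   ["Bob",   "Smith"],
   ["John",  "Doe"],
   ["Alice", "Smith"]
);

# compare last names; when they are equal (cmp returns 0), fall back to first names
my @sorted_names = sort { $a->[1] cmp $b->[1] or $a->[0] cmp $b->[0] } @names;

print "$_->[0] $_->[1]\n" for @sorted_names;
# John Doe
# Alice Smith
# Bob Smith

# without a block, sort compares as strings, which is rarely right for numbers
my @as_strings = sort (10, 9, 100);
my @as_numbers = sort { $a <=> $b } (10, 9, 100);

print "@as_strings\n";
# 10 100 9
print "@as_numbers\n";
# 9 10 100
```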

