Writing Perl Programs

		A Quick Introduction to Perl
		============================

* What is perl?  What's it good for?
  ----------------------------------
Perl is an interpreted language invented around 1987 by Larry Wall.
The name "perl" stands for "Practical Extraction and Report Language".
Perl was invented to make life easier for system administrators, who
often ran into the limitations of shell scripts.  

Perl combines the functionality of tools like awk, sed, grep and the 
shell itself into a single self-contained interpreter.  So how is a perl
script different from a shell script?  

When you write a shell script, each command (like "grep") causes the
shell to spawn off a new process.  That process does its job, returns
some data to the shell, and then exits.  Unfortunately, there's quite
a bit of time, memory and CPU cycles required to spawn off a new
process and kill it off when it's done.  Also, since the external
programs your shell uses aren't part of the shell itself, you may find
that they're missing when you copy your script to a different
computer, or they may require different parameters in order to run the
way you want.  In contrast, perl bundles up much of the functionality
you need inside a single program, so perl scripts are faster and more
portable than shell scripts.

Perl also provides you with many features that aren't available
in most shells.  Perhaps most importantly, perl provides a rich
set of tools for creating and using complicated data structures.

The best sources of information about perl are:

* "man perl", at the command line.  (If you're using Ubuntu, you may need
  to type "sudo apt-get install perl-doc" in order to make this available.)

* the following two books:
  - "Programming Perl", by Larry Wall, et al.
    		 (http://shop.oreilly.com/product/9780596004927.do)
  - "The Perl Cookbook", by Tom Christiansen, et al.
    	      	 (http://shop.oreilly.com/product/9781565922433.do)

* and these web sites:
  - http://www.perl.org/
  - http://www.perlmonks.org/
  - http://search.cpan.org/

The last web site is a search engine for the "Comprehensive Perl
Archive Network", a massive collection of extensions written for
perl.  The availability of this archive, with its wealth of useful
tools, is one of the chief strengths of perl.

* OK, I've read a little about perl.  But what are "TMTOWTDI" and "DWIM"?
  -----------------------------------------------------------------------
Just as every programmer has his or her own personality, so does
every programming language.  Built deep into perl's character are two
guiding bits of philosophy called "TMTOWTDI" and "DWIM".

"TMTOWTDI" stands for "There's More Than One Way To Do It."  Perl
has a rich syntax that often lets you accomplish the same goals by
several different methods.  You get to choose which one makes the most
sense to you, or makes your program the most readable.  In some
cases there may be performance tradeoffs in using one method or
another.  In any case, perl offers you a lot of options, and you
get to make the choice.  

"DWIM" stands for "Do What I Mean".  Perl tries very hard to guess what 
you want the program to do, based on context, and it's very successful
at doing so.  Of course, you always have the option of forcing
a particular interpretation, to eliminate any possible confusion.

We'll see examples of both of these principles as we go along.

* Basic perl syntax:
  ------------------
Perl's syntax is superficially much like C/C++.  For example:

- Statements can extend across multiple lines, and are terminated
  by a semicolon.

- Blocks of statements are delimited by curly brackets.

- Available operators are a superset of those available in C
  (e.g., + - / * += ++ = == and so forth).

- Arithmetic is done using binary infix notation, and assignments
  are like "left = right", as in C.

But there are differences too.  Probably the first thing you'll
notice is that variable names begin with a special character,
usually either $, @ or %.  These characters identify the variable's
type.  There are three main types of variables in perl:

$variable - This would be a scalar variable.  It can contain a
	    single value, either text or numeric.

@variable - This would be an array.  It contains a list of
	    values, each associated with a numeric index.

Perl doesn't distinguish between variables containing numbers
and variables containing text.  The values stored in the
variables are treated appropriately according to the context
within which the variable is used.  Of course, you can always
force a variable to be interpreted in a particular way, if you
want to.
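
As a small sketch of what "context" means here (the variable names
are made up for this illustration, and the "my" keyword is explained
further below), the same scalar can act as a number or as text,
depending on the operator applied to it:

```perl
use strict;

my $value = "10";         # Stored as text...

my $sum  = $value + 5;    # ...but "+" treats it as a number, giving 15.
my $text = $value . 5;    # "." treats both sides as text, giving "105".

print "sum is $sum, text is $text\n";
```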

So far, these types should be familiar from your experience with
other programming languages.  Perl also introduces a third
variable type, though:

%variable - This represents an "associative array", also called a "hash".
	    Like an array, it's a list of values, but in this case
	    the values are indexed by names, not by numbers.  We'll
	    see how this is used in the examples below.

Perl allows you to build up complicated structures using any
combination of these variable types.  You can have arrays of
hashes, hashes of arrays, arrays of arrays of hashes of arrays,
etc.  Your data structures can be arbitrarily complex, and they
can easily be created as you need them, without any need to
predefine them.

* The structure of your program:
  ------------------------------
As with shell scripts, it's important to start perl programs with
a "shebang", like this:

#!/usr/bin/perl

This tells the operating system what program should be used to
interpret the program.  It will also help many editors identify
the program as a perl script, so that they can do appropriate
syntax highlighting.

There are two ways to run your perl program.  You can either
type "perl program.pl", where program.pl is the file containing
your program, or you can make the file executable by typing
"chmod +x program.pl" and then just type the file's name to
run it.  (Depending on your login shell, you may need to type
"rehash" first.  This also assumes that the directory containing
your script is located somewhere in your search path.)

The second line in your perl program should always be the following:

use strict;

This tells perl to turn on some extra sanity-checking that will
very likely keep you out of trouble later on.  In particular
"use strict" will catch typos in variable names.  Perl allows
you to create variables on the fly.  Without "use strict", perl 
won't complain if you use $veriable instead of $variable, and will 
just assume that $veriable has some empty value.

* The scope of variables in perl:
  -------------------------------
When you use "use strict", perl requires you to define the scope
of each variable.  Several scoping options are available in perl,
but you'll almost always want to use "lexical scoping".  To do this,
just put the word "my" in front of the variable the first time
you use it.  For example:

my $variable = 1;

This tells perl that you intentionally want to use a variable of
this name, and that it should be available anywhere within the
current block of code (defined by enclosing curly brackets, or
the whole file if the "my" statement occurs outside any brackets).
This is fairly intuitive, and similar to scoping in C++.

Outside a variable's scope, the variable doesn't exist.  If you
try to use it, you'll get an error message from perl.
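
Here's a minimal sketch of lexical scoping, using two made-up
variables.  The last line is commented out on purpose, because
with "use strict" in effect it would stop the program from running:

```perl
use strict;

my $outer = "outside";

{
    my $inner = "inside";
    print "In the block: $outer and $inner\n";   # Both are in scope here.
}

print "After the block: $outer\n";   # $outer is still in scope.

# Uncommenting the next line would make perl refuse to run the program,
# because $inner no longer exists once its block has ended:
# print "After the block: $inner\n";
```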

* Finally! An example:
  --------------------
All of that being said, here's a simple perl script:

#!/usr/bin/perl
use strict;

# This program prints square roots of 0-9, with explanatory text:

my $limit = 10; # How many numbers to print.

for ( my $i=0; $i<$limit; $i++ ) {
    print "The square root of $i is ";
    print sqrt($i)."\n";
}

Notice several features of the program above:

- Comments begin with #, and can occur on a line by themselves or
  at the end of a command line.

- Perl offers several kinds of loops.  One of them is very
  similar to the C/C++ "for" loop.

- Perl's "print" command just prints out some text to your terminal.

- You can use several special characters in perl.  The expression
  "\n" in the print command above causes perl to print out a newline
  character.  There's no newline at the end of the first print
  statement, so the second statement just appends text onto the
  end of the first.

- Perl includes several math functions by default.  Others
  are available as loadable modules like Math::Trig and Math::Complex
  which are part of a set of "core modules" installed along with 
  perl on most computers.

- The dot operator, ".", concatenates strings.  In the second print
  statement, if we'd just put the "sqrt($i)" inside the quotes the
  program would have printed "sqrt(0)", etc., instead of the value
  of the square root.

- Finally notice that the scope of $i is the block of statements
  in the "for" loop.  Outside the loop, $i isn't defined.

* Getting arguments from the command line:
  ----------------------------------------
Consider the following example:

#!/usr/bin/perl

use strict;
use File::stat;

# Print out the sizes of a list of files, with names given
# on the command line.

for my $file (@ARGV) {
    my $stat = stat($file);
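    # Method calls such as $stat->size aren't expanded inside quotes,
    # so we join the pieces together with the dot operator instead.
    print "$file is " . $stat->size . " bytes long\n";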
}

Here are some things to notice in this example:

- We're pulling in an extra module, File::stat, that's not available 
  unless we ask for it.  This module provides a convenient set of tools
  for getting file properties.  In particular, it gives us a friendlier
  replacement for perl's built-in "stat" function.

- Notice that we're using a different form of the "for" loop.  In
  this form, we're looping through all of the elements of the array
  called @ARGV.  Each time around the loop, we set the value of
  $file to one of the values in @ARGV.

- @ARGV is an array that's automatically defined for us by perl.
  It contains any command-line arguments we give when we run our
  program.  For example, if our program is called prog.pl and
  we run it like this:

     prog.pl file1 file2 file3

  then @ARGV would contain three elements, "file1", "file2" and "file3".

- The "stat" function provided by File::stat returns information
  about the given file.  This information is returned as a reference
  to a perl object.  Objects are beyond the scope of this introduction,
  but you don't need to know much about them to use them.  The documentation
  for File::stat (which you can see by typing "man File::stat") will
  tell you that you can get properties of the file by referring to
  $stat->size, $stat->mode, $stat->perm, etc.  In this case, we use
  $stat->size, which gives us the size of the file.

Note that we could get individual elements of the @ARGV array if
we wanted them.  For example, I could say:

my $firstfile = $ARGV[0];

Notice a few things about the line above:

- Array indices in perl start with zero, as they do in C/C++.

- We use square brackets around the index number to identify the
  element we want.

- The array name is @ARGV, but an individual element is $ARGV[0]
  (or "1" or "2" or whatever).  This is because the element itself
  is a scalar, not an array.

- Finally, notice that the array of command-line arguments is a little
  different from the $0, $1, $2, etc., variables in a shell script.
  In shell scripts, $0 is the name of the command itself and the
  first argument is in $1.  In perl $ARGV[0] is the first argument.
  If you want the name of the command, that's stored by perl in an 
  automatically-defined variable called "$0".

Usually, instead of referring explicitly to $ARGV[0], $ARGV[1] or
whatever, we'll use perl's "shift" function.  Here's another example:

#!/usr/bin/perl
use strict;

my $min = shift;
my $max = shift;
my $nstep = shift;

# Print x,y values for y=exp(-x**2):

my $step = ($max - $min)/$nstep;
for (my $i=0; $i<$nstep; $i++) {
    my $x = $min + $i*$step;
    my $y = exp(-$x**2);
    print "$x $y\n";
}

This program accepts three values on the command line (minimum and
maximum values for x, and the number of steps between).  We could
have said "my $min = $ARGV[0]", but instead we've used the "shift"
function.  Each time "shift" is called, it takes the first
element off of the @ARGV array and shifts the other elements to the
left by one position.

You can use "shift" on any other array just by giving it the array
name as an argument, like "shift @variable" or "shift(@variable)".
(Note that perl allows you to omit the parentheses in many function
calls, to improve readability.)

Finally, note that there's a corresponding "unshift" function,
that can be used to undo "shift"'s work.  Unshift shifts all
of the array elements to the right, and sticks a new value into
element number zero.  There are also "push" and "pop" functions
that work similarly on the LAST element of the array.  See
"man perlfunc" for more details.

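To make these four functions concrete, here's a minimal sketch
using a made-up array (none of these names come from the programs
above):

```perl
use strict;

my @queue = ("b", "c");

unshift @queue, "a";        # Add to the front:     a b c
push    @queue, "d";        # Add to the end:       a b c d

my $first = shift @queue;   # Remove from the front: "a", leaving b c d
my $last  = pop   @queue;   # Remove from the end:   "d", leaving b c

# An array inside double quotes is printed with spaces between elements.
print "first=$first last=$last remaining=@queue\n";
```
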
* Printing special characters:
  ----------------------------
We've seen that we can print a newline with "\n".  There are several
other special characters that can be used similarly.  Here are a few:

\n - Newline
\r - Carriage return
\t - Tab
\f - Formfeed

* Single and double quotes:
  -------------------------
As is the case with shell scripts, single-quotes (') and double-quotes (")
work in different ways in perl.  Inside double-quotes, variables are 
replaced by their values, and special characters like \n do what you
expect them to do.  Inside single-quotes none of this happens.  In
single-quotes everything is literal (apart from \\ and \', which let
you write a backslash or a single-quote).  For example:

print '$file\n';

would print:

$file\n

Note that, if you need to use variables in combination with single-quotes,
you can always concatenate strings together.  For example:

print 'This is a literal string '."and this one is not\n";

* Operators:
  ----------
As I mentioned above, many perl operators look just like their
counterparts in C/C++.  Here's a list of some commonly-used perl
operators:

+	Add
-	Subtract
*	Multiply
/	Divide
%	Modulo
**	Exponentiation (not in C/C++, but in Fortran)
++	Increment
--	Decrement
+=	Add and assign
-=	Subtract and assign
==	Test numbers for equality
!=	Test numbers for inequality
<	Less than
>	Greater than
>=	Greater or equal
<=	Less or equal

Here are some more commonly-used perl operators that differ from C/C++:

.	String concatenate
eq	Test strings for equality
ne	Test strings for inequality
cmp	Compare strings for sorting
<=>	Compare numbers for sorting
=~	Regular expression operations

We've already seen that the string concatenation operator makes
string concatenation very easy in perl.  In some examples below
we'll see how to use the sorting operators and the almost-omnipotent
"=~" operator.

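As a quick preview of why perl has separate operators for numbers
and for strings, here's a small sketch (the values are arbitrary)
that compares the same pair of values both ways:

```perl
use strict;

# The same two values, compared first as numbers and then as text.
my $as_numbers = ( "10" <=> "9" );   #  1: the number 10 is greater than 9.
my $as_strings = ( "10" cmp "9" );   # -1: the text "10" sorts before "9".

print "As numbers: $as_numbers\n";
print "As strings: $as_strings\n";
```
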
* Conditional statements:
  -----------------------
A programming language wouldn't be complete without conditional statements.
Perl offers several variations on the if/elsif/else statements found in
other languages.

Consider the following variation on our "file size" program:

#!/usr/bin/perl

use strict;
use File::stat;

# Print out the sizes of a list of files, with names given
# on the command line.

for my $file (@ARGV) {
    my $stat = stat($file);
    if ( $stat->size > 100000 ) {
        # $stat->size must sit outside the quotes to be evaluated.
        print "$file is BIG!  (" . $stat->size . " bytes long)\n";
    }
}

This syntax is similar to "if" statements in C/C++.  We could
deal with other conditions by adding an "else" statement:

if ( $stat->size > 100000 ) {
  print "$file is BIG!  (" . $stat->size . " bytes long)\n";
} else {
  print "$file is not too big.  (" . $stat->size . " bytes long)\n";
}

or we could even say:

if ( $stat->size > 100000 ) {
  print "$file is BIG!  (" . $stat->size . " bytes long)\n";
} elsif ( $stat->size > 10000 ) {
  print "$file is medium-sized.\n";
} else {
  print "$file is dinky.\n";
}

We can add as many "elsif" statements as we want to.

We can use the boolean operators || and && to combine several
tests.  For example, we could say:

if ( $n > 100 && $n < 1000 )

Perl often provides us with alternative ways of saying things,
in order to make code more readable.  We could re-write the line
above like this:

if ( $n > 100 and $n < 1000 )

using "and" instead of &&.  We can also use "or" instead of ||.

We can use && and || outside of an "if" statement, too.  For
example, this is a complete, valid statement in perl:

$n < 100 && print "N is less than 100\n";

We also have the option of using an "if" at the end of a 
statement.  For example:

print "x has the value $x\n" if ($debug);

This statement will only be executed if $debug has a true
value.  In perl, the value of a variable is false if it
is numerically zero, the string "0", an empty string,
an empty array, or the special value "undef".  All other
values are true.
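
The rules above can be checked with a short sketch like this one
(the list of test values is just an illustration).  Note that the
string "0.0" counts as true, even though it's zero when used as a
number:

```perl
use strict;

# Count how many of these values perl treats as true and as false.
my $true  = 0;
my $false = 0;

for my $value ( 0, "0", "", undef, 1, -1, "abc", "0.0" ) {
    if ($value) {
        $true++;
    } else {
        $false++;
    }
}

print "$true true values, $false false values\n";
```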

Perl also provides us with a convenient inverse for the "if" 
statement:

unless ( $n == 10 ) {
	print "n is not equal to 10\n";
}

"unless" is just an "if not".  It often helps make code more
readable.

As in C/C++, perl also has a terse ternary conditional statement.
For example, we can write:

my $y = ($x>1) ? 1.0 : 0.0;

This would set the value of $y to 1.0 if $x is greater than 1,
or 0 otherwise.  This is just a short, one-line version of an
if/else statement.

* Regular expressions
  -------------------
Just like the "grep" command used in shell scripts, perl 
understands "regular expressions" (often called "regexp").
Regular expressions are a way of specifying a pattern that
you want to match.  

Perl understands all of the regexp symbols known to grep (in its
extended "grep -E" form), and several more besides:

Symbol		Meaning						
------		-------						
.		Match any single character.
*		Match zero or more of the preceding item.
+		Match one or more of the preceding item.	
?		Match zero or one of the preceding item.	
{n,m}		Match at least n, but not more than m, of the	
		preceding item.
^		Match the beginning of the line.
$		Match the end of the line.
[abc123]	Match any of the enclosed list of characters.
[^abc123]	Match any character not in this list.
[a-zA-Z0-9]	Match any of the enclosed ranges of characters.
this|that	Match "this" or "that".				  
\., \*, etc.	Match a literal ".", "*", etc.

All of the above are known to grep, but perl also understands
these:

\w	Match a "word" character (alphanumeric plus "_")
\s	Match a whitespace character (space, tab, newline, etc.)
\d	Match a digit.
...and some others.

(Type "man perlre" for full information about perl's regular
expression syntax.)

In perl, regular expressions are often used with the =~ operator.
Here's an example:

#!/usr/bin/perl

use strict;

for my $file (@ARGV) {
  if ( $file =~ /\w+\.jpg/ ) {
    # do stuff with the file.
  } else {
    print "$file doesn't appear to be a JPEG file. Skipping.\n";
  }
}

In the example above, we accept a list of file names on the command
line, then loop through the list of files.  For each file, we check
to see if the file's name matches a particular regular expression,
and act appropriately.  In this case, the =~ operator returns
a true value if the file name matches the regexp.

We can also save some matching parts of the string for later use.
Consider the following:

#!/usr/bin/perl

use strict;

for my $file (@ARGV) {
  if ( $file =~ /(\w+)\.jpg/ ) {

    my $firstpart = $1;
    my $newname = "$firstpart.JPG";

    rename $file, $newname;

  } else {
    print "$file doesn't appear to be a JPEG file. Skipping.\n";
  }
}

In the example above, we enclose part of the regexp in parentheses.
The matching part of the string will then be available in the 
variable "$1".   The parentheses in this example are called 
"capturing parentheses".  If we had several sets of parentheses 
in the pattern, matches would be captured in $1, $2, $3, etc.

The example above also shows perl's "rename" function, which
just renames a file.
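
To show several sets of capturing parentheses at work, here's a
minimal sketch.  The file name and the pattern are made up for
this illustration:

```perl
use strict;

# A hypothetical file name, used only for illustration.
my $file = "vacation_2014.jpg";

my ($base, $year, $ext);
if ( $file =~ /([a-z]+)_(\d+)\.(\w+)/ ) {
    $base = $1;   # First set of parentheses:  "vacation"
    $year = $2;   # Second set of parentheses: "2014"
    $ext  = $3;   # Third set of parentheses:  "jpg"
    print "base=$base year=$year extension=$ext\n";
}
```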

* Substitutions:
  --------------
The =~ operator can do other things, too.  For example:

my $string = "East is East and West is West.";
$string =~ s/East/Octarine/;

The lines above would cause $string to have the value "Octarine
is East and West is West.".  In this case, the "s" operator is used 
to substitute "Octarine" in place of the first occurrence of
the regular expression "East".

By default "s" only makes the substitution the first time it
finds a match.  If we want it to replace all matches, we need
to append "g" (for global), like this:

$string =~ s/East/Octarine/g;

which would produce "Octarine is Octarine and West is West.".


* Other ways to do loops:
  -----------------------
We've seen examples of a couple of different ways to do loops in
perl.  There are others.  Consider the following:

#!/usr/bin/perl
use strict;

# Use Newton's method to find the square root.

my $n = shift;

my $epsilon = 0.001; # Desired precision.
my $root = ($n > 1) ? $n : 1; # First guess (at or above the true root).
my $diff = $root**2 - $n;
 
while ( $diff > $epsilon ) {
    $root = $root - $diff/(2.0*$root);
    $diff = $root**2 - $n;
}

print "Sqrt of $n is $root\n";

In the example above, we use Newton's method to search for the 
square root of a number given on the command line.  To do this,
we loop through iterations until the answer is close enough
(as defined by $epsilon).

The "while" loop continues until the given condition is no longer
satisfied (in this case, until $diff becomes less than or equal to
$epsilon).

If we wanted to skip out of this loop (or any other perl loop)
prematurely, we could use the "last" command.  For example, we
could re-write the loop above as:

while (1) {
    $root = $root - $diff/(2.0*$root);
    $diff = $root**2 - $n;
    last if ( $diff <= $epsilon );
}

If, on the other hand, we just wanted to skip the rest of the
current iteration and go immediately to the next trip around
the loop, we could use "next".  This is often used when you're
searching for something:

for my $file (@files) {
	next unless ( $file =~ /\.jpg/ );  #skip non-jpeg files.
	# do some stuff with jpeg files....
}

As with if/unless, perl also offers an opposite for "while".
An "until" loop will continue until a given condition is met.
Thus, one more way to write our Newton's Method loop
would be this:

until ( $diff <= $epsilon ) {
    $root = $root - $diff/(2.0*$root);
    $diff = $root**2 - $n;
}

* Reading and writing files:
  --------------------------
As in C/C++ and many other languages, files must be "opened" before
perl can read or write them.  Here's a typical "open" statement
in perl:

open (my $fh, "<", $filename);

Perl offers several ways to write an "open" statement (TMTOWTDI!),
but this is arguably the best one to use most of the time.  It tells
perl to open the file that's named in $filename.  The "<" says that
we want to open the file for reading only (if we tried to write to
it, we'd get an error message).  Finally, "open" creates a "file
handle" that we'll use for referring to the file.  The file handle,
among other things, remembers where we are in the file as we
read through it.

Here's a complete example:

#!/usr/bin/perl
use strict;

# Add HTML markup to section headers to make them bold:

my $filename = shift;
open (my $fh, "<", $filename) or die "Can't open $filename: $!\n";
while (my $line = <$fh>) {
	$line =~ s/^Section (\d+):/<b>Section $1:<\/b>/;
	print $line;
}
close ($fh);

There's a lot going on in this short example.  Here's a
breakdown:

- First, notice that we've added some stuff at the end of the
  "open" statement, to deal with possible errors.  We use the
  "or" logical operator (we could alternatively have used ||)
  to do something if the "open" statement fails.

- If "open" fails, we use perl's "die" function to terminate
  the program and print out some descriptive text.  The text
  includes the special variable $!, which contains a readable
  description of the error (maybe something like "file not found").

- Then we start a loop, reading one line of the file at a time.
  The <$fh> construction reads data from the file (specified by
  its file handle).  

- Perl sees that we're going to put the data into a scalar variable,
  so it only reads one line at a time.  If we'd said "my @lines =
  <$fh>" perl would have sucked in the whole file at once, putting one
  line into each of the elements of @lines.

  Perl also offers you the option of sucking the whole file into 
  a scalar variable, newlines and all.  This is called "slurp
  mode".  You can find out about doing this here:

	http://www.perl.com/pub/2003/11/21/slurp.html

- For each line, we use =~ and the substitution operator to 
  re-write the line.  We look for strings like "Section 123:"
  at the beginning of lines, and then we put "<b>" and "</b>"
  around these strings.  (This is how HTML marks bold text.)

- If we were just trying to make "Section" bold, that would
  be easy. We'd just write "s/Section/<b>Section<\/b>/".  But
  we want the section number to be bold, too.  This is easy
  to match with a regular expression, but how do we know what
  number to write back into the string?  We do it by using
  capture parentheses to catch the number, and using $1 as
  part of what we write back.

- Finally, the "</b>" we want to write contains a slash
  character.  Perl would get confused by this, since slashes
  are used to delimit our substitution string.  We avoid
  this by escaping the slash with a backslash and writing "<\/b>".
  This tells perl that this is a literal slash, not the end of
  our substitution string.

Notice that we've used the "close" function at the end, to
explicitly close the file.  This is good practice, even though
perl will automatically close the file for you when your program
ends.
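
For the "slurp mode" mentioned in the list above, one common
approach (not the only one) is to temporarily undefine perl's
input record separator, $/.  The sketch below is an illustration
under its own assumptions: it creates a scratch file with the core
File::Temp module so that it can be run anywhere, and the file's
contents are made up:

```perl
use strict;
use File::Temp qw(tempfile);

# Create a small scratch file so the example is self-contained.
my ($tmp, $filename) = tempfile( UNLINK => 1 );
print $tmp "line one\nline two\n";
close ($tmp);

open (my $fh, "<", $filename) or die "Can't open $filename: $!\n";
my $contents;
{
    local $/;             # Temporarily undefine the line separator...
    $contents = <$fh>;    # ...so this single read returns the whole file.
}
close ($fh);

print "Read ", length($contents), " characters\n";
```

Putting "local $/" inside its own block means the normal
line-at-a-time behavior comes back as soon as the block ends.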

The file above was opened for reading, but we can also open
a file for writing, or for both reading and writing.  To open
a file for writing, use ">" in place of "<" in the open statement.
To do both, use "+<".  You can also open a file for "appending"
by using ">>".
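
Here's a minimal sketch of the difference between ">" and ">>".
It works in a scratch directory created with the core File::Temp
module, and the file name "example.log" is made up.  (Writing to
a file handle with "print" is explained in more detail below.)

```perl
use strict;
use File::Temp qw(tempdir);

# Work in a scratch directory that is removed when the program exits.
my $dir     = tempdir( CLEANUP => 1 );
my $logfile = "$dir/example.log";   # Hypothetical file name.

open (my $out, ">", $logfile) or die "Can't write $logfile: $!\n";
print $out "first run\n";           # ">" starts the file afresh.
close ($out);

open ($out, ">>", $logfile) or die "Can't append to $logfile: $!\n";
print $out "second run\n";          # ">>" adds to the end.
close ($out);

open (my $in, "<", $logfile) or die "Can't read $logfile: $!\n";
my @lines = <$in>;
close ($in);

print "The file now holds ", scalar(@lines), " lines\n";
```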

Here's an example showing how writing works, using a modified version
of our Newton's Method program:

#!/usr/bin/perl
use strict;
# Use Newton's method to find the square root.
my $n = shift;
my $epsilon = 0.001; # Desired precision.
my $root = ($n > 1) ? $n : 1; # First guess (at or above the true root).
my $diff = $root**2 - $n;
 
open ( my $outfile, ">", 'newton.dat' );

my $nloop = 0;
while ( $diff > $epsilon ) {
    $root = $root - $diff/(2.0*$root);
    $diff = $root**2 - $n;
    $nloop++;
    print $outfile "$nloop $root $diff\n";
}

close ($outfile);
print "Sqrt of $n is $root\n";

The program above opens the file "newton.dat" for writing, and then
each time around the loop it writes some data into the file.  To write
into the file, we just use a slightly-modified version of the "print"
statement.  If print is given a filehandle as its first argument, it
will write to this file instead of sending output to your terminal.

The "open" statement can also be used to create a pipe connected
to another program.  We could modify the program above by 
adding the following lines after the "close" statement:

open (my $gnuplot, "|gnuplot -persist");
print $gnuplot "plot 'newton.dat' using 1:2 with lines\n";
close ($gnuplot);

This would cause the program to invoke gnuplot to show us a graph
of the data in the file 'newton.dat'.  Once we've created the
$gnuplot file handle, we can use "print" to send any commands
we like to the running gnuplot program.

We can also use pipes to read data from another program, by
saying something like "zcat file.gz|" instead of "|gnuplot -persist".
In that case, we could read from the pipe just as we read from the
file in the earlier example.  If you want to be able to both
read AND write to another program, you'll need to do something
more complicated. (Take a look at 
http://search.cpan.org/~rjbs/perl-5.18.2/ext/IPC-Open3/lib/IPC/Open3.pm ).
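
As a sketch of reading from a pipe, the example below uses the
list form of "open" with "-|" (a different spelling of the same
idea as "command|").  It assumes a Unix-like system.  To stay
self-contained, the "other program" is simply a second copy of
perl printing two made-up lines:

```perl
use strict;

# $^X holds the path of the perl interpreter that's running this script.
open (my $pipe, "-|", $^X, "-e", 'print "alpha\nbeta\n"')
    or die "Can't start child process: $!\n";

my @output = <$pipe>;   # One line of the child's output per element.
close ($pipe);

print "Got ", scalar(@output), " lines from the pipe\n";
```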

Note, however, that if you just want to get the output from
a command there are other ways to do that under perl (TMTOWTDI!).
One of them is backticks.  For example:

my $me = `whoami`;

The command above would capture the output of the whoami command and
put it into the scalar variable $me.  Backticks and <filehandles>
behave a little differently for multi-line output.  If I write:

my @list = `ls -al`;

I'll get one line of the ls command's output in each array element.
If I write:

my $list = `ls -al`;

all of the output, newlines and all, will be stuffed into the
single scalar variable $list.

When reading a file, we often want to omit the newlines at the end
of each line we read.  Perl provides a convenient function called
"chomp" to do that.  Chomp removes the trailing newline from a
string, if there is one.  If you give it an array, it chomps each
of the elements of the array.  Chomp is often used while reading
files, like this:

while ( my $line = <$fh> ) {
  # chomp returns the number of characters it removed, so it
  # shouldn't be used as the loop test itself.
  chomp $line;
  # Do stuff with the cleaned-up lines.
}

* Splitting lines:
  ----------------
Perl's "split" function lets you split up a line of text.
Here's a snippet of code that illustrates the usage:

# Split comma-separated columns:
while (my $line = <$fh>) {
    chomp $line;
    my @data = split(/\s*,\s*/,$line);
    # The elements of @data now contain the data from
    # the columns in this row of the file.
}

The first argument to split is a regular expression, telling split
where to split the line.  In the example above it's looking for
zero or more white-space characters followed by a comma and zero or
more white-space characters.  That would successfully split a line
like this:

1, 2 ,3 , 4

which might have some extra white-space characters in it.
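
Here's a short sketch that applies that same pattern to the sample
line above, so you can see what ends up in the array:

```perl
use strict;

my $line = "1, 2 ,3 , 4";
my @data = split(/\s*,\s*/, $line);

print "Found ", scalar(@data), " columns\n";
for my $column (@data) {
    print "[$column]\n";   # The brackets show that no spaces are left.
}
```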

* Associative Arrays (Hashes):
  ----------------------------
Hashes are one of the features that set perl apart from many other
programming languages.  A hash is like the arrays you see in C/C++,
but instead of numeric indices, the elements are indexed by keywords.

Take a look at the following example:

#!/usr/bin/perl
use strict;

my %height; # Tell perl that we want to use a hash variable called "height".

$height{bryan}  = 6.0; # Units are "feet".
$height{jed}    = 6.4;
$height{jethro} = 6.6;
$height{granny} = 5.0;

print "Bryan's height is $height{bryan}\n";

As you can see, you can create hash elements on the fly, without
predefining anything.  You just add new keys to the hash.

We might want to modify the program above to read data from a file:

#!/usr/bin/perl
use strict;

my %height; # Tell perl that we want to use a hash variable called "height".

open (my $fh, "<", "heights.dat");
while ( my $line = <$fh> ) {
  chomp $line;
  my ($name,$height) = split(/\s+/,$line);
  $height{$name} = $height;
}
close( $fh );

print "Bryan's height is $height{bryan}\n";

There are a couple of things you should notice about this example:

- As you can see, we can put the output of "split" into a group
  of scalar variables instead of into an array.  We can do this
  whenever a function spits out multiple values.  If split were
  to provide more values than the variables we've defined, the 
  extra values would be discarded.

- Notice that we can have a scalar variable $height and a
  completely different hash variable %height in the same
  program.  Perl doesn't get confused because it can always
  distinguish between them:  $height{bryan} refers to a value
  in the hash, and $height refers to the scalar variable.
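
As a small sketch of assigning split's output to a group of
scalars (the input strings and variable names here are made up),
notice what happens when the number of values doesn't match the
number of variables:

```perl
use strict;

# One more value than we have variables for.
my ($label, $first) = split(/\s+/, "alpha 10 20");
print "label=$label first=$first\n";   # The "20" was discarded.

# If there are fewer values than variables, the extras are left undefined.
my ($name, $x, $y) = split(/\s+/, "beta 30");
print "y was not set\n" unless defined $y;
```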

But what if our data file doesn't include a value for "bryan"?
We could modify the print statement to say something like this:

if ( exists $height{bryan} ) {
  print "Bryan's height is $height{bryan}\n";
} else {
  print "Bryan's height isn't defined.\n";
}

The "exists" function returns "true" if the given hash key exists.

What if we wanted to print out all of the data?  We could replace
the print statement with something like this:

for my $k (keys %height) {
  print "Height of $k is $height{$k}\n";
}

The "keys" function returns a list of keys for the given hash.
(There's also a "values" function that will return a list of
the values stored in the hash.)
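
As a brief sketch of "keys" and "values" side by side, the snippet
below fills the hash from a single list (a shorthand not used in
the programs above) and relies on the fact that both functions
walk an unmodified hash in the same order:

```perl
use strict;

# The same data as in the examples above, written as a single list.
my %height = ( bryan => 6.0, jed => 6.4, jethro => 6.6, granny => 5.0 );

my @names   = keys %height;
my @heights = values %height;

print "We have ", scalar(@names), " names and ", scalar(@heights), " heights\n";

# The first key and the first value belong to the same hash element.
print "$names[0] goes with $heights[0]\n";
```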

What if the list of people is very long, and we want to sort
it?  We could modify our "for" loop like this:

for my $k (sort keys %height) {
  print "Height of $k is $height{$k}\n";
}

By default, the "keys" function supplies the list of keys in
no easily-predictable order.  The "sort" function here sorts
them in standard string order (alphabetical, except that
uppercase letters sort before lowercase ones).  This would
cause the program to start with "bryan" and end with "jethro",
using the keys in the examples above.

What if we wanted to print the list sorted by height, instead
of by name?  Sort can do that for us, too.  Consider this
example:

for my $k (sort {$height{$a} <=> $height{$b}} keys %height) {
  print "Height of $k is $height{$k}\n";
}

In this case, we've given "sort" an optional first argument
that specifies a block of code to use when comparing two keys
to see which should come first in the list.  There are two 
things to note about this expression:

- First, the <=> operator compares two arguments and returns
  -1, 0, or 1 depending on whether the left argument is 
  numerically less than, equal to, or greater than the right 
  argument.

- Second, the sort function automatically sets the special
  variables $a and $b to each pair of keys it's comparing.

The code above would cause the program to print the key with
the smallest height first, and proceed through the list until
it came to the key with the largest height.  We could reverse
the order by swapping $a and $b, or we could use perl's
handy "reverse" function, which reverses the order of a list.
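
Here's a short sketch of both ways of reversing the order, again
with made-up data.  It also shows "cmp", which is the string
counterpart of the numeric <=> operator:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %height = ( bryan => 70, jethro => 74, elvis => 72 );

# Ascending numeric order by height.
my @shortest_first = sort { $height{$a} <=> $height{$b} } keys %height;

# Descending order, two ways: swap $a and $b, or reverse the sorted list.
my @tallest_first = sort { $height{$b} <=> $height{$a} } keys %height;
my @also_tallest  = reverse @shortest_first;

# "cmp" compares strings, so this sorts the names in reverse string order.
my @names_backward = sort { $b cmp $a } keys %height;

print "Shortest first:  @shortest_first\n";
print "Tallest first:   @tallest_first\n";
print "Names backward:  @names_backward\n";
```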

As I mentioned earlier, perl allows you to build up
arbitrarily complicated data structures.  Let's look at
another example that illustrates a more complex structure:

#!/usr/bin/perl
use strict;

# Read ADC and TDC calibration data.
my %calib;

open (my $fh, "<", "calib.dat") or die "Can't open calib.dat: $!\n";
while ( my $line = <$fh> ) {
    chomp($line); # Remove the trailing newline, if there is one.
    my ($name,$offset,$gain) = split(/\s+/,$line);
    $calib{$name}{offset} = $offset;
    $calib{$name}{gain} = $gain;
}
close( $fh );

In the example above, we read some calibration data from a file.
Each line of the file contains a name identifying a particular
ADC or TDC and an offset and gain for that device.  We read
the data into a "hash of hashes".  Each element in the %calib
hash is another hash, containing the keys "offset" and "gain".
If we had a device named "tdc1", we could get its calibration
parameters later in the program by referring to $calib{tdc1}{gain}
and $calib{tdc1}{offset}.  This is similar to a two-dimensional
array in C/C++, but using strings as indices instead of
integers.

What if we wanted to print out all of the data in the %calib
hash?  We could do this:

for my $k (keys %calib) {
	print "Device $k: ";
	# Print out the device's calibration attributes:
	# (We avoid the names $a and $b, which "sort" uses for itself.)
	for my $attr (keys %{$calib{$k}}) {
		print "$attr=$calib{$k}{$attr} ";
	}
	print "\n";
}

This would print out lines like:

Device tdc1: gain=1.0 offset=201

One thing to notice about this example:

- In the second "for" loop, we want to get the keys of
  the hash stored at, e.g., $calib{tdc1}.  (These keys
  will be "gain", "offset", or whatever other things we've stored
  in the hash.)  To do this, we need to tell perl to treat
  $calib{$k} as a hash.  To accomplish this, we do something
  similar to a "cast" in C/C++:  %{$calib{$k}}.  If we didn't
  do this, perl would treat $calib{$k} as a scalar value (after
  all, it begins with a "$").  And it really is a scalar value
  internally.  In perl's memory, this is a "reference" to the
  inner hash, which is similar to a pointer in C/C++.
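
If you want to see this for yourself, perl's built-in "ref"
function reports what kind of thing a reference points to.  Here's
a minimal sketch with made-up calibration values:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up calibration values for illustration.
my %calib = (
    tdc1 => { offset => 201, gain => 1.5 },
    adc1 => { offset => 15,  gain => 2.5 },
);

# $calib{tdc1} is a scalar that holds a reference to the inner hash.
my $type = ref $calib{tdc1};
print "calib{tdc1} holds a reference of type $type\n";

# Wrapping it in %{...} lets us use it wherever perl wants a hash.
my @attrs = sort keys %{ $calib{tdc1} };
print "Attributes: @attrs\n";
```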

What if, instead of names like "tdc1", we wanted to have arrays
of devices, like "tdc[1]"? We could modify our program and
our data file to do that.  Here's what the program might look
like:

#!/usr/bin/perl
use strict;

my %calib;

open (my $fh, "<", "calib.dat") or die "Can't open calib.dat: $!\n";
while ( my $line = <$fh> ) {
    chomp($line); # Remove the trailing newline, if there is one.
    my ($type,$i,$offset,$gain) = split(/\s+/,$line);
    $calib{$type}[$i]{offset} = $offset;
    $calib{$type}[$i]{gain} = $gain;
}
close( $fh );

This would read in data from "calib.dat" in the form:

tdc 0 123 1.1

where the columns are "type", "number", "offset" and "gain".
As you can see, all of the devices of each type are now
stored in arrays, and they can be referred to by number,
like "$calib{tdc}[0]".

If we wanted to print out all of the calibration data,
we might do something like this:

for my $t (keys %calib) {
    print "Devices of type $t:\n";
    for (my $i=0; $i<@{$calib{$t}}; $i++) {
        print "$i ";
        for my $attr (keys %{$calib{$t}[$i]}) {
            print "$attr=$calib{$t}[$i]{$attr} ";
        }
        print "\n";
    }
}

Notice two things about the second "for" loop:

- Here, we tell perl that the data at $calib{$t} should
  be treated as an array, by using @{$calib{$t}}.  This
  is similar to the "cast" we used above.

- Note also that, when you refer to an array in a place
  where perl expects a number, perl just inserts the number
  of elements in the array.  So, this for loop goes from
  i=0 to i=N-1, where N is the number of array elements.
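
Here's a quick sketch of a few ways to get at the size of an
array.  The device names are made up:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @devices = ("tdc0", "tdc1", "tdc2");

my $n    = @devices;     # Assigning an array to a scalar gives its size.
my $last = $#devices;    # $#devices is the index of the last element.
print "There are $n devices, numbered 0 through $last\n";

# "scalar" forces the count in places where perl would expect a list.
print "Count: ", scalar(@devices), "\n";
```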

We could do more complicated examples, but I think you
get the idea.  We can build up any data structure we want
by combining hashes and arrays.
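
For instance, here's a sketch of a "hash of arrays", which is
handy when each name can have several values.  The data lines
below are made up, and are kept in an array instead of a file so
the example is self-contained:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up readings.  Each line is "device value".
my @lines = ( "tdc1 12", "adc1 7", "tdc1 15", "tdc1 9" );

my %readings;
for my $line (@lines) {
    my ($name,$value) = split(/\s+/,$line);
    # Perl creates the inner array automatically the first time we push.
    push @{ $readings{$name} }, $value;
}

for my $name (sort keys %readings) {
    my $n = @{ $readings{$name} };   # Number of readings for this device.
    print "$name has $n reading(s): @{ $readings{$name} }\n";
}
```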

* Subroutines:
  ------------
You can create your own functions in perl by writing
"subroutines".  Say, for example, that we wanted to 
make our Newton's Method routine into a "function" that
we could use in a perl program.  We could do so by
writing something like this:

sub newton {
  my $n = shift;

  my $epsilon = 0.001; # Desired precision.
  my $root = $n; # First guess.
  my $diff = $root**2 - $n;
 
  while ( abs($diff) > $epsilon ) { # abs() so numbers below 1 work, too.
      $root = $root - $diff/(2.0*$root);
      $diff = $root**2 - $n;
  }

  return($root);
}

We could then use this function in our program just like any
other perl function:

my $x = newton(1024.5);

Note that the subroutine gets the values of its arguments
using "shift", just as the main program uses shift to
get the command-line arguments.  (Inside a subroutine, the
arguments arrive in the special array @_, and "shift" pulls
them off one at a time.)  The subroutine returns a value by
using the "return" statement.  You can return a single
scalar or a list of values.
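
As a small sketch of a subroutine that takes several arguments
and returns more than one value, consider the following.  The
subroutine name "min_max" is made up for this example:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the smallest and largest of some numbers.
sub min_max {
    my @numbers = @_;    # Copy all of the arguments at once.
    my ($min,$max) = ($numbers[0],$numbers[0]);
    for my $x (@numbers) {
        if ( $x < $min ) { $min = $x; }
        if ( $x > $max ) { $max = $x; }
    }
    return ($min,$max);  # Return a list of two values.
}

my ($smallest,$largest) = min_max(70, 74, 68, 72);
print "Smallest is $smallest, largest is $largest\n";
```

The caller collects the returned list into a group of scalar
variables, just as we did with the output of "split".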

You don't need to pre-define subroutines before you use
them in your program.  You're free to just stick the
subroutines at the bottom of the program.  Perl parses
your entire program before it starts running it.

* References:
  -----------
When you use a variable as a subroutine argument, like
"newton($number)", and the subroutine copies the argument
into a variable of its own (as "my $n = shift" does), the
subroutine is working with a copy.  Nothing it does to that
copy can affect the value of the original variable.

Perl gives us the option of passing the variable's 
storage address to the subroutine.  The subroutine can
then modify the actual values stored at that address,
and thereby modify the original value.

Consider the following snippet of code:

my $string = "This is my string and I'm sticking to it.\n";

mangle(\$string);

print $string;

sub mangle {
  my $ref = shift;
  ${$ref} = "No it's not\n";
}

When you run this, it will print out "No it's not".

The expression "\$string" is a reference to the variable
$string.  It can be used in a way similar to a pointer
in C/C++.  We pass this reference to our subroutine,
where the reference is used to change the actual
value of $string.  The expression ${$ref} just refers
to $string.

References in perl are often used to pass complex
data structures around.  In fact, perl has a special
syntax for getting the values from a reference to
a hash or an array.  Consider the following, where
%calib is one of the calibration data structures
in our earlier example:

my $ref = \%calib;

print $ref->{tdc}[0]{offset};

The -> operator tells perl that we want to "de-reference"
$ref and get the data inside the hash it points to.

Similarly, we could have a reference to an array and
use expressions like $ref->[0] to get the array elements.
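
Here's a sketch of a subroutine that uses an array reference to
change the caller's array.  The name "add_to_all" and the numbers
are made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @heights = (70, 74, 72);

# Hypothetical helper: add an amount to every element, in place.
sub add_to_all {
    my ($aref,$amount) = @_;
    for (my $i=0; $i<@{$aref}; $i++) {
        $aref->[$i] += $amount;   # Changes the caller's array.
    }
}

add_to_all(\@heights, 2);
print "First height is now $heights[0]\n";
print "All heights: @heights\n";
```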

* Perl modules:
  -------------
As I mentioned in the beginning, there are thousands of
modules that extend perl's functionality.  Many of them
are archived and documented at http://search.cpan.org.
A core set of modules is usually bundled up with perl
when it's installed on your computer.  To find out about
a given module, use the "man" command.  For example,
to find out about the Getopt::Std module, type "man Getopt::Std".

To use a module in your program, add a line like:

use Getopt::Std;

A module can add new functions and variables to your
program.  (Many modules allow you to have some control
over which functions and variables are added.  See the
module's documentation for information.)

Here are a few notes about some particularly useful
modules.

- Data::Dumper
	For debugging perl programs, Data::Dumper is
	invaluable.  It allows you to dump out complicated
	data structures in a human-readable format.  Basic
	usage would be something like this:

	use Data::Dumper;

	print Dumper(\%calib);

	The Data::Dumper module provides a new function, Dumper,
	which accepts a reference to a data structure as an
	argument.  It returns a text description of the
	structure, which you can then print out.  The text is
	itself valid perl code, so it can be read back in later
	(with perl's "eval" function, for example) and made back
	into a perl data structure.
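
	Here's a sketch of a round trip, using made-up calibration
	values.  Reading the text back with "eval" is just one
	possible approach, and since "eval" runs whatever code it's
	given, you should only use it on text you trust:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;   # Print hash keys in sorted order.

my %calib = ( tdc1 => { offset => 201, gain => 1.5 } );

my $text = Dumper(\%calib);
print $text;

# The dumped text is perl code that assigns the structure to $VAR1,
# so we declare $VAR1 and then run the text with "eval".
my $VAR1;
eval $text;
die "Couldn't read the data back: $@" if $@;

print "Recovered offset: $VAR1->{tdc1}{offset}\n";
```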

- LWP::Simple
	This module provides many tools for interacting with Web pages.
	Here's an example:

	use LWP::Simple;
	my $content = get("http://www.virginia.edu/");
	die "Couldn't fetch the page\n" unless defined $content;

	print $content;

	The "get" function provided by LWP::Simple fetches the
	content of a given web page.  You can then extract data
	from the content, save it, print it out, or whatever else
	you need to do.

- CGI
	The CGI module provides a rich set of tools for creating
	program-driven web pages.  See "man CGI" for more.

- Text::CSV
	This module provides tools for parsing CSV ("comma-separated
	values") files.  It understands quotation marks, and can
	differentiate between commas within quotes and commas that
	actually separate columns.  It provides a function that can
	reliably split a line from the file into an array of column
	values.

- Spreadsheet::ParseExcel
	This module provides tools for reading Microsoft xls files.
	(A companion module, Spreadsheet::WriteExcel, writes them.)
	There's also a module called Spreadsheet::XLSX for parsing
	newer-format xlsx files.

- Getopt::Std
	Often you'll want your program to accept some options
	on the command line.  Maybe you want to do something like
	this:

	myprogram -o output.dat -v -n 10

	Getopt::Std gives you tools to conveniently parse command-line
	options.  You supply it with a list of one-letter switches,
	tell it which switches take arguments, and it will parse
	the command line for you and stick the result into a hash
	that you can use later in your program.  Here's an example:

	use Getopt::Std;
	my %args;
	getopts("ho:vn:",\%args);
	$args{h} && HELP_MESSAGE();

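	This says that valid switches are h, o, v and n.  The
	letters followed by colons (o and n) require arguments.  The
	command line is parsed and put into the hash "%args" for later
	use.  The last line calls HELP_MESSAGE, a subroutine you'd
	write yourself to print usage information, if the -h switch
	was given.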
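
	Here's a sketch of how the results might be used afterward.
	To keep the example self-contained, it fills in @ARGV by hand
	as though the program had been run with the command line shown
	above plus a made-up file name, "input.dat".  The default
	values are also made up:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Std;

# Pretend we were run as:  myprogram -o output.dat -v -n 10 input.dat
# (Normally perl fills in @ARGV for you.)
@ARGV = ("-o", "output.dat", "-v", "-n", "10", "input.dat");

my %args;
getopts("ho:vn:",\%args)
    or die "Usage: myprogram [-h] [-v] [-o file] [-n count] file\n";

# Switches that weren't given have no key in the hash.
my $outfile = exists $args{o} ? $args{o} : "default.dat";
my $count   = exists $args{n} ? $args{n} : 1;

if ( $args{v} ) {
    print "Verbose mode is on\n";
}
print "Writing $count items to $outfile\n";

# Anything that isn't a switch is left behind in @ARGV.
print "Remaining arguments: @ARGV\n";
```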

A few others you might want to look into:

- DateTime::Format::Natural for parsing strings like "thursday last week".

- DBI for interacting with databases.

- Net::LDAP for querying LDAP servers.

- Time::HiRes for counting time in microseconds or nanoseconds.

- File::Basename for parsing file paths and extracting file and directory names.

- Digest::MD5 for computing MD5 checksums.  (MD5 is no longer considered
  secure for cryptographic use; see Digest::SHA for stronger hashes.)
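
As a quick taste of two of these, here's a sketch that uses
File::Basename and Digest::MD5.  The file path and the text
being hashed are made up:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename;
use Digest::MD5 qw(md5_hex);

my $path = "/home/bryan/data/calib.dat";   # Made-up path.

my $file = basename($path);   # The file name at the end of the path.
my $dir  = dirname($path);    # Everything before the file name.
print "File $file lives in $dir\n";

# md5_hex returns the checksum as a string of 32 hexadecimal digits.
my $checksum = md5_hex("hello world");
print "MD5 checksum is $checksum\n";
```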


* Conclusion:
  -----------
I hope some of the stuff above has been useful.  Remember to check
"man perl" and "man perlintro" for much more information about perl.