Subroutines and local variables

Create subroutines through use of the sub keyword. The syntax is simple enough:

sub nameOfSubroutine { ...body of subroutine... }

It doesn't matter where the subroutine is declared within the file; Perl parses the entire file before beginning execution. Conventionally, the subroutines are listed at the bottom of the file, and appear in alphabetical order.
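
As a small sketch of the point about declaration order (the subroutine name greet is made up for this illustration), the call below appears before the definition and still works:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The call comes first; the definition is further down the file.
my $message = greet("world");
print $message;

# Hypothetical subroutine, used only for this illustration.
sub greet {
    my ($name) = @_;
    return "Hello, $name!\n";
}
```

Running it should print "Hello, world!".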

Within subroutines use the my keyword to declare local variables. Perl is lexically scoped, like C++ and Java, so the scope of local variables should be easily understood by most everyone in this class. If you're unsure of how lexical scoping works, you should read the scoping section in chapter 6. If you are still confused, please speak with me. Here are some examples:

my $bob = 0;
my @arr = (1,2,3);
As any good software engineer would proclaim: all variables should be declared with lexical scope (via my). I.e., avoid global variables as much as possible.
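
To make the scoping rule concrete, here is a minimal sketch (the names scopeDemo and $count are assumptions chosen for the example). Each my declaration creates a new variable that lives only until the end of its enclosing block:

```perl
use strict;
use warnings;

my $count = 10;              # visible from here to the end of the file

sub scopeDemo {
    my $count = 1;           # a new variable; hides the outer $count inside this sub
    {
        my $count = 2;       # visible only inside this inner block
        $count++;
    }
    return $count;           # the inner block's variable is gone; this one is still 1
}

print "scopeDemo returns ", scopeDemo(), "\n";
print "outer count is still $count\n";
```

The subroutine returns 1 and the outer variable keeps the value 10, because the three declarations name three separate variables.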

Arguments (Parameters)

Notice that a subroutine declaration does not include a formal parameter list. In Perl, all arguments are passed via the implicit array variable @_. The first argument will be the first element of the array, the second will be the second, and so on. While it would be legal to always access the arguments via @_ (e.g., @_[1], or more correctly, $_[1]), this would not provide mnemonic identifiers for the values. So the first few lines of most subroutines are usually centered around pulling the values out of the @_ array and placing them in variables. So, if a subroutine takes two integer arguments, representing, say, the height and weight of something, the first line would probably be something like:

my ($height, $weight) = @_;

The parentheses indicate a list context, so $_[0] goes into the local variable $height, and $_[1] goes into $weight.
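
Put together, a subroutine using this idiom might look like the following sketch. The subroutine name describeBox and the values passed to it are made up for the example:

```perl
use strict;
use warnings;

# Hypothetical subroutine: it only reports the two values it was given.
sub describeBox {
    my ($height, $weight) = @_;   # $_[0] goes to $height, $_[1] goes to $weight
    return "height=$height, weight=$weight";
}

print describeBox(12, 40), "\n";
```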

Here is a simple program illustrating both the sub and my keywords.

Now the big weakness of subroutine declarations not having formal parameter lists is that there is no argument checking by the compiler: if your program calls a subroutine with only 3 arguments, but that subroutine expects 4, there will not be a compiler error! Likewise, there is no type-checking. So again the compiler will not complain if a subroutine expecting an integer value is instead passed an array as its argument.
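
Since the compiler will not count the arguments, a subroutine can do so itself at run time. The following is only a sketch of one way to do that; the subroutine name area and its error message are assumptions made for the example:

```perl
use strict;
use warnings;

# Hypothetical subroutine that expects exactly two arguments.
sub area {
    # In a numeric comparison, @_ yields the number of arguments passed.
    if (@_ != 2) {
        die "area expects 2 arguments but got " . scalar(@_) . "\n";
    }
    my ($width, $length) = @_;
    return $width * $length;
}

print "area(3, 4) is ", area(3, 4), "\n";

# eval traps the die so that the program can report the problem and continue.
my $result = eval { area(3) };
print "area(3) failed: $@" unless defined $result;
```

The check costs one line per subroutine, and it turns a silent mistake into an immediate, readable error.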

Command-Line Arguments

The code in your program that is not defined within a subroutine forms the program's mainline. You can think of it as a single, undeclared subroutine that is invoked by the operating system when the program is run. The special array variable @ARGV is used to carry any command-line arguments. (C programmers will recall that the variables argc and argv are used to access command-line arguments in that language.) You access these parameters the same way you do those passed via @_ to subroutines. This program demonstrates the use of @ARGV to handle command-line arguments. Note, too, that there are no global variables. Indeed, the use strict line tells the compiler to reject any variable that has not been explicitly declared, which in practice means declaring variables with my.

From now on, all of your Perl files should have a use strict; line.
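
As a rough illustration of @ARGV, the sketch below hands the command-line arguments to a subroutine in the same way any other list would be passed. The subroutine name summarize is an assumption made for the example. One difference from C is worth noting: $ARGV[0] is the first argument, not the program name, which Perl keeps in $0.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical subroutine: formats an argument list for printing.
sub summarize {
    my @args = @_;
    return "no arguments" unless @args;
    return scalar(@args) . " argument(s): " . join(", ", @args);
}

# @ARGV holds only the arguments; the program name is in $0.
print summarize(@ARGV), "\n";
```

If this were saved under a name such as args.pl (a made-up file name), then running perl args.pl a b should print "2 argument(s): a, b".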

How are arguments passed?

Tisdall's book is a little misleading about this. In it, he claims that Perl uses a pass-by-value technique to pass parameters, by default. In effect, the formal parameters (the ones in @_) would be copies of the actual parameters (those given where the subroutine is invoked). Thus, changes to the formal parameters would not change the values of the actual parameters.

Actually, Perl uses pass-by-reference, as can be seen in this small program. The first subroutine invoked, change, only accesses the formal parameter value directly, via the @_ variable. Because Perl uses pass-by-reference, the line

$_[0] += 1;

in the subroutine changes both the formal and actual parameter (i.e., $arg in the mainline). The second subroutine invoked, change2, first places the formal parameter in the local variable $theArg via the line

my($theArg) = @_ ;

It is the assignment operator here that makes a copy of the formal parameter. Thus, changes to $theArg in this subroutine have no effect on the actual parameter.

When this program is run we get the following output:

In main, the variable arg is 1 before any subroutine calls.
In change, the formal parameter is 1.
In change, after adding, the parameter is 2.
In main, the actual parameter, arg, is 2 after the subroutine call.
Now we call a subroutine that retrieves its parameter into a local variable.
In change2, the formal parameter is 2.
In change2, after adding, the formal parameter is 3.
In main, after that subroutine call, the actual parameter, arg, is 2.
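
A program along the following lines would produce that output. It is a reconstruction from the description and the output above, not necessarily the original program:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $arg = 1;
print "In main, the variable arg is $arg before any subroutine calls.\n";
change($arg);
print "In main, the actual parameter, arg, is $arg after the subroutine call.\n";

print "Now we call a subroutine that retrieves its parameter into a local variable.\n";
change2($arg);
print "In main, after that subroutine call, the actual parameter, arg, is $arg.\n";

# Works directly on $_[0], which is an alias for the caller's variable.
sub change {
    print "In change, the formal parameter is $_[0].\n";
    $_[0] += 1;
    print "In change, after adding, the parameter is $_[0].\n";
}

# Copies the argument first, so only the copy is modified.
sub change2 {
    my ($theArg) = @_;
    print "In change2, the formal parameter is $theArg.\n";
    $theArg += 1;
    print "In change2, after adding, the formal parameter is $theArg.\n";
}
```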

Now Tisdall is mostly correct because, as I indicated above, you should almost always store parameters in mnemonically named local variables. In that case, Perl does act as if it used pass-by-value parameters.

References

It turns out that we can have our cake and eat it too, because Perl provides reference variables. We will be able to pass parameters by reference and still access them via mnemonic identifiers. Reference variables, in effect, allow us to create two or more variables that refer to the same location in memory. Or, put another way, they allow us to create aliases for a value. (C++ programmers will be familiar with the &identifier syntax for declaring reference variables and parameters in that language.) Here is a small code fragment demonstrating the creation of an alias:

my($bob) = 1;
my ($bobAlias) = \$bob;
The '\' syntax returns a reference to the variable $bob. This reference is stored in $bobAlias. To access the latter, we have to use a dereference operator. There are three, one for each of the three value types: a '$' prefix for scalar references, '@' for array references, and '%' for hash references. Because $bobAlias is a reference to a scalar value, the following code fragment increments the value that it refers to:
print "Original is $bob, while alias is $$bobAlias.\n";
# Increment the alias.
++$$bobAlias;

Admittedly, the '$$' looks a bit weird, but that's what is needed. Now if we execute all of this together:

my($bob) = 1;
my ($bobAlias) = \$bob;

print "Original is $bob, while alias is $$bobAlias.\n";
# Increment the alias.
++$$bobAlias;

# Changing the value of the alias changes the value of the original, since the
# two are simply aliases for the same location in memory.
print "After incrementing the alias, the original is $bob, while the alias is $$bobAlias.\n  Changing one changed the other!\n\n";

we get the following output:

Original is 1, while alias is 1.
After incrementing the alias, the original is 2, while the alias is 2.
  Changing one changed the other!
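
The '@' and '%' prefixes mentioned earlier work the same way for array and hash references. Here is a brief sketch; the variable names are invented for the example:

```perl
use strict;
use warnings;

my @scores = (90, 85);
my %ages   = (bob => 30);

my $scoresRef = \@scores;   # reference to an array
my $agesRef   = \%ages;     # reference to a hash

push @$scoresRef, 70;       # '@' prefix dereferences the whole array
$$scoresRef[0] = 95;        # '$' prefix plus a subscript reaches one element
$$agesRef{sue} = 25;        # same idea for a single hash entry

my @names = sort keys %$agesRef;   # '%' prefix dereferences the whole hash

print "scores are @scores\n";
print "names are @names\n";
```

Every change made through a reference shows up in the original @scores and %ages, just as with the scalar alias.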

Okay, so now we just have to harness this capability within a subroutine to have pass-by-reference parameters. To do that we pass references as the actual parameters, and then dereference the formal parameters within the subroutine. Below is a modification of our earlier program that demonstrated that all parameters are passed by reference in Perl. In this version, though, we use reference variables to provide mnemonically named identifiers for the parameters, yet still retain the by-reference nature of the parameters. Notice that both the invocation of the subroutine and the use of the parameter within the subroutine are slightly different:

my($arg) = 1;
print "In main, the variable arg is $arg before any subroutine calls.\n";

#We invoke the subroutine, but pass a *reference* to the subroutine
change3(\$arg);

#After the subroutine is called, the actual parameter's value is changed!
print "In main, the actual parameter, arg, is $arg after the subroutine call.\n";

# This subroutine accesses the parameter as a reference.
sub change3 {

#Notice that the subroutine is actually making a copy of a reference with the 
# assignment operator.

    my ($theArg) = @_;
    print "In change, the formal parameter is $$theArg.\n";
    $$theArg += 1;
    print "In change, after adding, the parameter is $$theArg.\n";
}

The entire program demonstrating all of this can be found here. Execution of the program yields this output:

Original is 1, while alias is 1.
After incrementing the alias, the original is 2, while the alias is 2.
  Changing one changed the other!

In main, the variable arg is 1 before any subroutine calls.
In change, the formal parameter is 1.
In change, after adding, the parameter is 2.
In main, the actual parameter, arg, is 2 after the subroutine call.

Arrays and Hashes as Arguments

Probably the nicest thing about reference parameters is that they allow us to easily pass arrays and hashes as arguments to subroutines. The problem is that when arrays are passed as parameters they are "flattened": their identity as an array can be lost within the called subroutine. Here is a program that demonstrates this:

#!/usr/bin/perl 
use strict;
use warnings;

my @bob = (1, 2, 3);
my @sue = (4, 5, 6);

print "In main before calling subroutine, bob = @bob and sue = @sue\n";

refSub(@bob, @sue);

print "In main after calling subroutine, bob = @bob and sue = @sue\n";

sub refSub {
    my(@firstArg, @secondArg) = @_;
    print "In the subroutine, firstArg = @firstArg and secondArg = @secondArg\n";
}

When this program is run, the elements of @bob and @sue are flattened into a single array, @_. Consequently, the line

my(@firstArg, @secondArg) = @_;

places all the elements of both arrays in @firstArg. There are none left to go into @secondArg. The output looks like this:

In main before calling subroutine, bob = 1 2 3 and sue = 4 5 6
In the subroutine, firstArg = 1 2 3 4 5 6 and secondArg =
In main after calling subroutine, bob = 1 2 3 and sue = 4 5 6

Notice the second line of output: the values of both arrays have been glommed into a single array.

To retain the "array-ness" of the actual parameters, we use our reference parameter trick again. We pass references to the arrays, and then dereference those via the "@$" syntax within the subroutines. So now our code will look like this:

my @bob = (1, 2, 3);
my @sue = (4, 5, 6);
print "In main, we call a subroutine, but pass references to bob and sue.\n";

refSub2 (\@bob, \@sue);
print "In main after calling that subroutine, bob = @bob and sue = @sue\n";

sub refSub2 {
    my($firstArg, $secondArg) = @_;
    print "In the reference-oriented subroutine, firstArg = @$firstArg and secondArg = @$secondArg\n";
}
The output from running this code is:
In main, we call a subroutine, but pass references to bob and sue.
In the reference-oriented subroutine, firstArg = 1 2 3 and secondArg = 4 5 6
In main after calling that subroutine, bob = 1 2 3 and sue = 4 5 6

In the second line of the output, you can see that we've successfully retained the array-ness of the second parameter. The entire program demonstrating this concept can be found here.
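
Hashes can be passed the same way, and because the subroutine holds references it can also modify the caller's data. The sketch below assumes a made-up subroutine named addEntry that takes an array reference, a hash reference, and an ordinary scalar:

```perl
use strict;
use warnings;

# Hypothetical subroutine: appends an item to an array and counts it in a hash,
# both of which are passed by reference.
sub addEntry {
    my ($listRef, $countsRef, $item) = @_;
    push @$listRef, $item;      # modifies the caller's array
    $$countsRef{$item}++;       # modifies the caller's hash
    return scalar(@$listRef);   # number of elements now in the array
}

my @items  = ('a', 'b');
my %counts = (a => 1, b => 1);

my $size = addEntry(\@items, \%counts, 'a');

print "items = @items, size = $size, count of a = $counts{a}\n";
```

After the call, @items holds three elements and the count for 'a' is 2, so both structures were changed through their references.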

The "Warning" Compilation Flag

In the above program you may also have noticed the "use warnings;" line. This has the same effect as compiling with the "-w" flag. In other words, if you attempt to use a variable in an r-expression that the compiler has not previously seen, it will issue a warning. This is especially useful in tracking down misspelled identifiers. Here is a small program with a typo:

# 
# Author: Matthew Evett
# Copyright 2004.
#
# This program demonstrates the usefulness of compiling files with the 
# warning flag turned on.

$bobby = 1;
print "bobby is $boby.\n";

Running this program without the warning flag yields this:

$ perl testWarnings.pl
bobby is .

While the output here is a strong indication that something is wrong with the program, bugs of this sort are often very difficult to detect. Now if we run the program with the warning flag we get several indications from the compiler that something might be amiss:

$ perl -w testWarnings.pl
Name "main::boby" used only once: possible typo at testWarnings.pl line 9.
Name "main::bobby" used only once: possible typo at testWarnings.pl line 8.
Use of uninitialized value in concatenation (.) or string at testWarnings.pl line 9.
bobby is .

The use of the warning flag is almost always a good idea. Consequently:

From now on, all of your Perl files should have a use warnings; line.

Once you've added this line to your files, you won't need to explicitly include the '-w' flag when you compile. It doesn't hurt, however.
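
To see the pragma at work without the command-line flag, here is a small sketch that collects warnings in an array instead of letting them go to the terminal. The handler installed in $SIG{__WARN__} and the array name @caught are choices made for this example only:

```perl
use strict;
use warnings;

my @caught;
# Collect warnings in an array instead of printing them to STDERR.
local $SIG{__WARN__} = sub { push @caught, $_[0] };

my $bobby;                          # declared but never given a value
my $text = "bobby is $bobby.\n";    # triggers an "uninitialized value" warning

print $text;
print "caught ", scalar(@caught), " warning(s)\n";
print $caught[0];
```

The first line printed is "bobby is ." as before, followed by a count of one warning and the text of that warning.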

Modules & Libraries

You'll want to form collections of your oft-used subroutines so that you won't have to cut and paste them into every program that wants to make use of them. A module is the Perl equivalent of what some languages call a library and others call a package. Place your subroutines in a file with a name having the suffix ".pm" (Perl Module). For simplicity, make sure this file is in the same directory as the ones that want to use the module. Now, the last line of the module file must be

1;

(the digit one, followed by a semicolon), or your module won't function properly. Suppose that our module file is mattsModules.pm. Now, in the files that want to access the module we place the statement

use mattsModules;

If you want to get more complicated as to the location of modules, you can specify directories where Perl should look for modules either with a command-line argument to Perl, or via the statement

use lib 'directory';

where directory is the pathname of the directory in which your module file is located. Tisdall's textbook comes with an on-line module named BeginPerlBioinfo.

Sometimes, you'll want to run a program and specify dynamically where Perl is to look for modules. In many implementations you can use the -I command-line flag for this purpose:

perl -I'directory' program.pl
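
The whole arrangement can be tried out in a single file with the sketch below. It writes a tiny module into a temporary directory, adds that directory to Perl's module search path (the @INC array, which is also what use lib and -I modify), and then loads the module. The module name demoModule and the subroutine triple are invented for the demonstration, and require is used instead of use only because the module file does not exist until the program is already running.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Temporary directory and module file, created only for this demonstration.
my $dir        = tempdir(CLEANUP => 1);
my $moduleFile = "$dir/demoModule.pm";

open(my $out, '>', $moduleFile) or die "Cannot write $moduleFile: $!";
print $out <<'END_MODULE';
use strict;
use warnings;

sub triple {
    my ($n) = @_;
    return 3 * $n;
}

1;
END_MODULE
close($out) or die "Cannot close $moduleFile: $!";

# Add the directory to the search path at run time, then load the module.
unshift @INC, $dir;
require demoModule;

my $tripled = triple(7);
print "triple(7) is $tripled\n";
```

In an ordinary program the module file would already exist, so a plain use statement, together with use lib or -I if the file lives elsewhere, would do the same job.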