Sorting
CHAPTER 17: SORTING
Sorting In Perl
- As mentioned earlier, the Perl sort function can be used to sort
the elements of an array in ascending ASCII order
- But the sort function is actually more general-purpose and can
be used to perform many different sorting operations
Basic Sorting With Sort
- By default, the sort function sorts in ascending ASCII order
- sort (LIST)
sort LIST
- Returns the sorted list
- Ex.
@x = ("the", "bob", "Bob");
@y = sort (@x); # @y is ("Bob", "bob", "the")
@x = ("the", 123, 91);
@y = sort (@x); # @y is (123, 91, "the")
Note that the numeric literals are treated as strings!
Advanced Sorting With Sort
- The sort function can also take a user-defined subroutine which
defines how two elements of the array will be compared to do the
sort
- sort (SUBROUTINE LIST)
sort SUBROUTINE LIST
- The subroutine is written to compare the two scalar values,
$a and $b, and return the following:
-1 if $a should appear BEFORE $b in the sorted list
0 if $a and $b are equal in the sort order
1 if $a should appear AFTER $b in the sorted list
- The subroutine is repeatedly invoked by Perl on two array
elements until the list is sorted
- The two elements to be compared are NOT passed to the subroutine
via @_, but rather as the variables $a and $b. (Any variable $a
or $b you may have is protected.)
- SUBROUTINE may be the value of a scalar variable (which is why
there is NO comma after SUBROUTINE in the argument list for sort!)
Ascending Numerical Sorting
- This is how to sort the elements of an array in ascending
numerical order:
sub by_number_ascending
{
$a <=> $b;
}
Note that we are using the numeric compare operator (<=>) here:
$a <=> $b evaluates to
-1 if $a is numerically less than $b
0 if $a is numerically equal to $b
1 if $a is numerically greater than $b
Now we can do the following:
@x = (25, 14, -3);
@y = sort (by_number_ascending @x); # @y is (-3, 14, 25)
Or we could just say:
@y = sort ({$a <=> $b} @x);
Descending Numerical Sorting
- Once we have an ascending numerical sort, we can use the reverse
function to create a descending numerical sort:
@x = (25, 14, -3);
@y = reverse sort (by_number_ascending @x);
# @y is (25, 14, -3)
- But if speed is important, it is better to define another sort
subroutine:
sub by_number_descending
{
$b <=> $a;
}
And now we have:
@x = (25, 14, -3);
@y = sort (by_number_descending @x); # @y is (25, 14, -3)
Fundamental Sort Subroutines
- So our fundamental sort routines are:
sub by_string_ascending
{ # Really NO reason to ever define
$a cmp $b; # this one. It's the default!
}
sub by_string_descending
{
$b cmp $a;
}
sub by_number_ascending
{
$a <=> $b;
}
sub by_number_descending
{
$b <=> $a;
}
Numerical, Then String Sorting
- What happens is we numerically sort an array which has some
string values?
Consider:
@x = (25, "123Bob", "99Joe");
@y = sort (by_number_ascending @x); # @y is (25, "99Joe",
# "123Bob")
Note that when Perl did the numeric comparison using the <=>
operator, it converted "123Bob" to 123 and "99Joe" to 99 and
numerically compared them, putting the string that started
with 99 before the string that started with 123, which makes
sense for a numerical sort.
But what happens if we have:
@x = (25, "test", "goody");
@y = sort (by_number_ascending @x); # @y is ("test",
# "goody", 25)
Here the string "test" came before the string "goody" in the
sorted array. Why? Because both "test" and "goody" are
converted to numeric 0!
- Perhaps we want to first sort numerically, but then if two
elements are equal to sort by string:
sub by_number_ascending_then_string
{
($a <=> $b) || ($a cmp $b);
}
Now we have:
@x = (25, "test", "goody");
@y = sort (by_number_ascending_then_string @x);
# @y is ("goody",
# "test", 25)
Associative Array Sorting
- If you want to display the key-value pairs of an associative
array sorted by the key value, you can use the keys function
and the sort function in a straight-forward manner
- For example, suppose the %accounts associative array has keys
which are the login names of each user and values which are
their real names. We can display the key-value pairs of this
array sorted by ascending alphabetic order of its keys using:
foreach $key (sort (keys (%accounts)))
{
print ("Key $key has value $accounts{$key}\n");
}
- But what if we want to display the key-value pairs of this array
sorted by ascending alphabetic order of its values?? Well, we can
use:
sub by_values
{
($accounts{$a} cmp $accounts{$b}) || ($a cmp $b);
}
@sortedkeys_by_values = sort by_values (keys (%accounts));
foreach $key (@sortedkeys_by_values)
{
print ("Key $key has value $accounts{$key}\n");
}
Example Program
- Here is a program which sorts the /etc/passwd file in ascending
numerical UID order. This is a general technique that can be
used to sort an array by a computable field of each element of
the array. In this case each element of the array is a line
from the /etc/passwd file and the computable field is the third
colon-separated field (the UID) of the element.
#!/usr/bin/perl
# Read in each line of the /etc/passwd file as a element
# of the @data array.
@lines = split ("\n", `cat /etc/passwd`);
# Now create an array @uids with the UID of each user.
# The UIDs are kept in this array in the order
# encountered in the /etc/passwd file.
foreach (@lines)
{
push (@uids, (split (/:/)[2]);
}
# Now define our sort subroutine.
sub by_uids
{
$uids[$a] <=> $uids[$b];
}
# Now do the sort.
@sortedlines = @lines[sort (by_uids (0..$#lines))];
# How did the above work??!!??
# Suppose there were 10 lines in the /etc/passwd file.
# The (0..$#lines) list generates the list (0..9).
# We then sort those numbers based on the UID
# values in the @uids array. But remember the @uids
# array has the UIDs in the order they appeared in
# the /etc/passwd file. The result is a sorted list
# of the numbers 0 to 9 which indicates the order
# we should display the /etc/passwd file lines for
# ascending UID order. We then use this sorted list
# of the numbers 0 to 9 as a slice of the original
# @lines array to get the desired sorted array!
# So print it.
foreach $line (@sortedlines)
{
print ("$line\n");
}
Bob Tarr
University of Maryland, Baltimore County
tarr@umbc.edu