Sorting





                    CHAPTER 17: SORTING 


   Sorting In Perl

     - As mentioned earlier, the Perl sort function can be used to sort 
       the elements of an array in ascending ASCII order

     - But the sort function is actually more general-purpose and can 
       be used to perform many different sorting operations


   Basic Sorting With Sort

     - By default, the sort function sorts in ascending ASCII order

     - sort (LIST)
       sort LIST

     - Returns the sorted list
 
     - Ex.

         @x = ("the", "bob", "Bob"); 
         @y = sort (@x);           # @y is ("Bob", "bob", "the")

         @x = ("the", 123, 91); 

         @y = sort (@x);           # @y is (123, 91, "the")

         Note that the numeric literals are treated as strings!


   Advanced Sorting With Sort

     - The sort function can also take a user-defined subroutine which 
       defines how two elements of the array will be compared to do the 
       sort

     - sort (SUBROUTINE LIST)
       sort SUBROUTINE LIST

     - The subroutine is written to compare the two scalar values, 
       $a and $b, and return the following:

           -1 if $a should appear BEFORE $b in the sorted list

            0 if $a and $b are equal in the sort order

            1 if $a should appear AFTER $b in the sorted list

     - The subroutine is repeatedly invoked by Perl on two array
       elements until the list is sorted

     - The two elements to be compared are NOT passed to the subroutine 
       via @_, but rather as the variables $a and $b.  (Any variable $a 
       or $b you may have is protected.) 
   
     - SUBROUTINE may be the value of a scalar variable (which is why 
       there is NO comma after SUBROUTINE in the argument list for sort!)


   Ascending Numerical Sorting

     - This is how to sort the elements of an array in ascending
       numerical order:

       sub by_number_ascending
       {
         $a <=> $b;
       }
  
       Note that we are using the numeric compare operator (<=>) here:

           $a <=> $b evaluates to

               -1 if $a is numerically less than $b
                0 if $a is numerically equal to $b
                1 if $a is numerically greater than $b

       Now we can do the following:

         @x = (25, 14, -3);
         @y = sort (by_number_ascending @x);  # @y is (-3, 14, 25)

       Or we could just say:

         @y = sort ({$a <=> $b} @x);


   Descending Numerical Sorting

     - Once we have an ascending numerical sort, we can use the reverse 
       function to create a descending numerical sort:

         @x = (25, 14, -3);
         @y = reverse sort (by_number_ascending @x);  
                                              # @y is (25, 14, -3)


     - But if speed is important, it is better to define another sort 
       subroutine:

       sub by_number_descending
       {
         $b <=> $a;
       }
  
       And now we have:

         @x = (25, 14, -3);
         @y = sort (by_number_descending @x);  # @y is (25, 14, -3)


   Fundamental Sort Subroutines

     - So our fundamental sort routines are:

       sub by_string_ascending
       {                          # Really NO reason to ever define
         $a cmp $b;               #   this one.  It's the default!
       }
  
       sub by_string_descending
       { 
         $b cmp $a;
       }
  
       sub by_number_ascending
       {
         $a <=> $b;
       }
  
       sub by_number_descending
       {
         $b <=> $a;
       }
  

   Numerical, Then String Sorting

     - What happens is we numerically sort an array which has some
       string values?

       Consider:

         @x = (25, "123Bob", "99Joe");
         @y = sort (by_number_ascending @x);  # @y is (25, "99Joe",
                                              #    "123Bob")

       Note that when Perl did the numeric comparison using the <=>
       operator, it converted "123Bob" to 123 and "99Joe" to 99 and
       numerically compared them, putting the string that started
       with 99 before the string that started with 123, which makes
       sense for a numerical sort.  

       But what happens if we have:   

         @x = (25, "test", "goody");
         @y = sort (by_number_ascending @x);  # @y is ("test",
                                              #    "goody", 25)

       Here the string "test" came before the string "goody" in the
       sorted array.  Why?  Because both "test" and "goody" are 
       converted to numeric 0!

     - Perhaps we want to first sort numerically, but then if two
       elements are equal to sort by string:

       sub by_number_ascending_then_string
       {
         ($a <=> $b) || ($a cmp $b);
       }
  
       Now we have:

         @x = (25, "test", "goody");
         @y = sort (by_number_ascending_then_string @x);  
                                              # @y is ("goody",
                                              #    "test", 25)


   Associative Array Sorting

     - If you want to display the key-value pairs of an associative
       array sorted by the key value, you can use the keys function
       and the sort function in a straight-forward manner

     - For example, suppose the %accounts associative array has keys 
       which are the login names of each user and values which are 
       their real names.  We can display the key-value pairs of this 
       array sorted by ascending alphabetic order of its keys using:

       foreach $key (sort (keys (%accounts)))
       { 
         print ("Key $key has value $accounts{$key}\n");
       }

     - But what if we want to display the key-value pairs of this array 
       sorted by ascending alphabetic order of its values?? Well, we can 
       use:

       sub by_values
       {
         ($accounts{$a} cmp $accounts{$b}) || ($a cmp $b);
       } 

       @sortedkeys_by_values = sort by_values (keys (%accounts));

       foreach $key (@sortedkeys_by_values)
       { 
         print ("Key $key has value $accounts{$key}\n");
       }


   Example Program

     - Here is a program which sorts the /etc/passwd file in ascending 
       numerical UID order.  This is a general technique that can be 
       used to sort an array by a computable field of each element of 
       the array.  In this case each element of the array is a line 
       from the /etc/passwd file and the computable field is the third 
       colon-separated field (the UID) of the element.

       #!/usr/bin/perl

       # Read in each line of the /etc/passwd file as a element
       #   of the @data array.

       @lines = split ("\n", `cat /etc/passwd`);

       # Now create an array @uids with the UID of each user.
       #   The UIDs are kept in this array in the order
       #   encountered in the /etc/passwd file.

       foreach (@lines)
       {
         push (@uids, (split (/:/)[2]);
       }

       # Now define our sort subroutine.

       sub by_uids
       {
         $uids[$a] <=> $uids[$b];
       }

       # Now do the sort.

       @sortedlines = @lines[sort (by_uids (0..$#lines))];

       # How did the above work??!!??
       # Suppose there were 10 lines in the /etc/passwd file.
       #   The (0..$#lines) list generates the list (0..9).
       #   We then sort those numbers based on the UID
       #   values in the @uids array.  But remember the @uids
       #   array has the UIDs in the order they appeared in
       #   the /etc/passwd file.  The result is a sorted list
       #   of the numbers 0 to 9 which indicates the order
       #   we should display the /etc/passwd file lines for
       #   ascending UID order.  We then use this sorted list
       #   of the numbers 0 to 9 as a slice of the original
       #   @lines array to get the desired sorted array!

       # So print it.

       foreach $line (@sortedlines)
       {
         print ("$line\n");
       }




Bob Tarr
University of Maryland, Baltimore County
tarr@umbc.edu