String Functions





                    CHAPTER 16: STRING FUNCTIONS


   String Manipulation 

     - Perl provides several functions to perform various operations 
       on strings

     - These are similar to the corresponding awk built-in string 
       functions


   Length Function

     - Returns the length in characters of an expression evaluated in 
       a scalar context

     - length (SCALAR)
       length SCALAR

     - If SCALAR is omitted, the length of $_ is returned

     - Ex.

         $x = length ("toy1.c");   # $x is 6
         $x = length (6 + 6);      # $x is 2
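     A minimal sketch of the no-argument form. It assumes a line read
     from input still carries its trailing newline, simulated here with
     a literal string; the sample text is made up for illustration.
     length counts that newline, a common source of off-by-one surprises.

```perl
use strict;
use warnings;

# Simulate a line as it would arrive from <STDIN>, newline included.
$_ = "toy1.c\n";

my $with_newline = length;      # no argument: length of $_ (7)
chop;                           # chop with no argument also works on $_
my $without_newline = length;   # only the six visible characters remain

print "$with_newline $without_newline\n";   # prints "7 6"
```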


   Index Function

     - Returns the position (starting at 0) of the first (leftmost)
       occurrence of a substring (SUBSTRING) in a string (STRING)

     - index (STRING, SUBSTRING, POSITION)
       index (STRING, SUBSTRING)

     - If the substring is not found in the string, -1 is returned

     - If POSITION is specified, the search starts at that position and 
       the returned value will be greater than or equal to POSITION 
       (or -1)
 
     - Ex.

         $x = index ("testing", "t");     # $x is 0
         $x = index ("testing", "bob");   # $x is -1
         $x = index ("testing", "t", 2);  # $x is 3
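     The POSITION argument is what makes it possible to find every
     occurrence, not just the first.  A minimal sketch: after each hit,
     restart the search one character past it until index returns -1.
     The variable names are illustrative.

```perl
use strict;
use warnings;

my $string = "testing";
my $target = "t";
my @positions;

# First search starts at 0; each later search resumes after the last hit.
my $pos = index($string, $target);
while ($pos != -1) {
    push @positions, $pos;
    $pos = index($string, $target, $pos + 1);
}

print "@positions\n";    # prints "0 3"
```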


   Rindex Function

     - Returns the position (starting at 0) of the last (rightmost)
       occurrence of a substring (SUBSTRING) in a string (STRING)

     - rindex (STRING, SUBSTRING, POSITION)
       rindex (STRING, SUBSTRING)

     - If the substring is not found in the string, -1 is returned
 
     - If POSITION is specified, then it is the rightmost position
       that can be returned.  So the returned value will be less than 
       or equal to POSITION (or -1).

     - Note that the returned value of rindex is still the position of 
       the substring from the LEFT end of the string

     - Ex.

         $x = rindex ("testing", "t");     # $x is 3
         $x = rindex ("testing", "t", 2);  # $x is 0
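     A typical use of rindex is splitting off a file name extension,
     where only the LAST dot matters.  A minimal sketch with a made-up
     file name; it borrows substr (described below) to take the text
     after the dot, and contrasts rindex with index.

```perl
use strict;
use warnings;

my $file = "toy1.tar.gz";

my $first = index($file, ".");     # leftmost dot: position 4
my $last  = rindex($file, ".");    # rightmost dot: position 8

# Everything after the chosen dot; guard against "no dot at all" (-1).
my $long_ext  = $first == -1 ? "" : substr($file, $first + 1);
my $short_ext = $last  == -1 ? "" : substr($file, $last + 1);

print "$long_ext $short_ext\n";    # prints "tar.gz gz"
```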


   Substr Function

     - Extracts a substring from a string

     - substr (STRING, OFFSET, LENGTH)
       substr (STRING, OFFSET)

     - Extracts the substring starting at position OFFSET of length 
       LENGTH from the string STRING
 
     - If LENGTH is not specified, everything to the end of the string 
       is extracted.  If LENGTH is zero, the null string is returned;
       if LENGTH is negative, Perl 5 leaves that many characters off
       the end of the string.  The extracted characters NEVER go beyond
       the end of the string.

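     - If OFFSET is beyond the end of the string, Perl 5 returns the
       undefined value (with a warning under "use warnings"); an OFFSET
       exactly equal to the string length gives the null string.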
       If OFFSET is a negative number, the extraction begins at |OFFSET| 
       characters from the end of the string.  If a negative OFFSET 
       would cause the extraction to begin before the start of the string,
       an offset of 0 is used.

     - Ex.

         $x = substr ("testing", 2);      # $x is "sting"
         $x = substr ("testing", 2, 3);   # $x is "sti"
         $x = substr ("testing", -2, 3);  # $x is "ng"
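     A minimal sketch of the clipping rules, using the same sample
     string: a LENGTH that runs past the end is cut short, a negative
     OFFSET counts from the right, and an OFFSET equal to the length
     yields the null string rather than an error.

```perl
use strict;
use warnings;

my $s = "testing";    # 7 characters, positions 0..6

my $clipped = substr($s, 5, 10);   # asks for 10 characters, gets "ng"
my $tail    = substr($s, -3);      # last three characters: "ing"
my $at_end  = substr($s, 7);       # OFFSET == length: null string

print "[$clipped] [$tail] [$at_end]\n";   # prints "[ng] [ing] []"
```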


   Using Substr As An Lvalue

     - If the STRING argument to the substr function is a scalar variable, 
       substr can itself be used on the left side of an assignment

     - In this case, that part of the string which would have been 
       extracted is changed.  The original string automatically grows 
       or shrinks as appropriate.

     - Because the string is modified in place, this is usually more
       efficient than building a new string by concatenation

     - Ex.

         $x = "Testing";
         substr ($x, 4) = "ed";         # $x is now "Tested"

         $x = "Testing";
         substr ($x, 0, 0) = "Start ";  # $x is now "Start Testing"
                                        #   A way to prepend!

         $x = "Testing";
         substr ($x, length ($x), 0) = " Over";     
                                        # $x is now "Testing Over"
                                        #   A way to append!


         Note, however, that the offset must not be past the end of
         the string.  In Perl 5, assigning to a substr that starts
         beyond the end is a fatal error ("substr outside of string"):

         $x = "Testing";
         substr ($x, length ($x) + 1, 0) = " Over";
                                        # dies; $x is not changed
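     The replacement does not have to be the same length as the part it
     replaces.  A minimal sketch showing the string growing and then
     shrinking in place (the sample values are made up):

```perl
use strict;
use warnings;

my $x = "Testing";

# Replace one character ("s" at position 2) with three: the string grows.
substr($x, 2, 1) = "SSS";
my $grown = $x;                # "TeSSSting"

# Replace three characters with nothing: the string shrinks.
substr($x, 2, 3) = "";         # "Teting"

print "$grown $x\n";           # prints "TeSSSting Teting"
```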


   Sprintf Function

     - Returns a string formatted by the usual "printf" format
       specifications

     - sprintf (FORMAT, LIST)

     - Sprintf is useful in many circumstances.  In particular, consider 
       the following.  Suppose you want to invoke the system function as 
       follows:

           system ("/bin/chmod 0755 toy1");

       (BTW, it is MUCH better to use the chmod function to do the above, 
       but for the sake of an example, bear with me!)

       Suppose the mode is stored in a scalar variable:

           $mode = 0755;

       If you use:

          system ("/bin/chmod $mode toy1");

       Perl interpolates $mode as a DECIMAL value (493 in this case) and 
       /bin/chmod complains about an invalid mode (9 is not an octal
       digit).  To solve this problem, use sprintf to create a string
       with the proper octal value for the mode:
 
           $string = sprintf ("/bin/chmod %o toy1", $mode);
           system ($string);
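     A minimal sketch comparing the three command strings without
     actually running them.  %o gives the octal digits, and a width with
     a leading zero (%04o) restores the conventional 0755 spelling.  The
     file name toy1 is just the example above.

```perl
use strict;
use warnings;

my $mode = 0755;                                      # numeric value 493

my $naive  = "/bin/chmod $mode toy1";                 # interpolates "493"
my $octal  = sprintf("/bin/chmod %o toy1", $mode);    # "755"
my $padded = sprintf("/bin/chmod %04o toy1", $mode);  # "0755"

print "$naive\n$octal\n$padded\n";

# The builtin avoids the shell entirely:  chmod $mode, "toy1";
```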


   Hex Function

     - Returns the decimal value of an expression interpreted as
       a hex string

     - hex (EXPR)
       hex EXPR

     - If EXPR is omitted, uses $_
 
     - The hex function is used to convert input data in hex format
       to the proper numeric value

     - The hex function can handle strings with or without a leading 
       0x or 0X

     - Ex.

         $x = hex ("0xa2");                # $x is 162
         $x = hex ("a2");                  # $x is 162
         $x = hex (0xa2);                  # $x is 354 (!)
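     A minimal sketch of converting hex input, assuming a hypothetical
     six-digit RGB colour string: substr picks out each two-digit field
     and hex converts it.  sprintf with %x goes the other way.

```perl
use strict;
use warnings;

my $color = "ff8000";    # hypothetical input: an RGB colour in hex

my $red   = hex(substr($color, 0, 2));   # "ff" -> 255
my $green = hex(substr($color, 2, 2));   # "80" -> 128
my $blue  = hex(substr($color, 4, 2));   # "00" -> 0

print "$red $green $blue\n";             # prints "255 128 0"

# Back to hex digits, two per field, zero-padded.
my $back = sprintf("%02x%02x%02x", $red, $green, $blue);   # "ff8000"
```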


   Oct Function

     - Returns the decimal value of an expression interpreted as
       an octal string

     - oct (EXPR)
       oct EXPR

     - If EXPR is omitted, uses $_
 
     - The oct function is used to convert input data in octal format
       to the proper numeric value

     - The oct function can also handle strings with a leading 0x or
       0X

     - Ex.

         $x = oct ("042");                 # $x is 34
         $x = oct ("42");                  # $x is 34
         $x = oct ("0x42");                # $x is 66
         $x = oct (042);                   # $x is 28 (!)
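     The (!) cases above come from passing a number instead of a
     string.  The normal use is the opposite: a mode typed by a user
     arrives as a string.  A minimal sketch, with a made-up input value,
     of why ordinary numeric conversion is wrong there and oct is right:

```perl
use strict;
use warnings;

# A permission mode read from input is a STRING such as "755".
my $input = "755";

my $mode_wrong = $input + 0;     # numeric conversion reads it as decimal: 755
my $mode_right = oct($input);    # interpreted as octal: 493

# Round trip back to octal digits for display.
my $shown = sprintf("%o", $mode_right);     # "755"

print "$mode_wrong $mode_right $shown\n";   # prints "755 493 755"
```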


   Transliteration

     - Translates all occurrences of the characters found in a search 
       list (SL) to the corresponding character in a replacement list 
       (RL).

     - tr/SL/RL/
       y/SL/RL/    (y is an alias for tr, for you sed fanatics!)

     - Returns the number of characters matched by the SL (that is,
       replaced or deleted)

     - Similar to the UNIX "tr" command

     - Operates on $_ by default.  The target can be changed with the 
       =~ operator.

     - If the RL is shorter than the SL, the last character of the RL 
       is repeated until the lists are equal length (but NOT if the d 
       (delete) option is used)

     - If the RL is empty, a copy of the SL is used for the RL (but 
       NOT if the d (delete) option is used)

     - A range of characters can be indicated by two characters
       separated by a dash.  (Use \- to get a literal dash.)

     - For simple character-for-character changes, more efficient than
       the s/// substitution command

     - Ex.

         $x = "Testing";
         $x =~ tr/et/ET/;          # $x is now "TEsTing"

         $x = "Testing";
         $x =~ tr/a-z/x/;          # $x is now "Txxxxxx"

         $x = "Testing";
         $x =~ tr/A-Z/a-z/;        # $x is now "testing"
                                   #   (Converts uppercase to
                                   #    lowercase)

         $x = "baacaad";
         $y = $x =~ tr/a//;        # $x is still "baacaad"
                                   # $y is 4 (the number of a's in $x)
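     A minimal sketch combining the two common uses shown above:
     counting characters through the return value (an empty RL leaves
     the string unchanged) and case conversion on a copy, so the
     original survives.  The sample text is made up.

```perl
use strict;
use warnings;

my $text = "Perl String Functions";

# Count vowels: empty RL, so nothing changes but the count is returned.
my $vowels = ($text =~ tr/aeiouAEIOU//);

# Copy first, then translate the copy to upper case.
(my $upper = $text) =~ tr/a-z/A-Z/;

print "$vowels $upper\n";    # prints "5 PERL STRING FUNCTIONS"
```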


   Options For The Transliteration Command

     - d (delete) - deletes all characters in the SL which do NOT 
                    have a corresponding character in the RL.  
                    Those characters from the SL which do have a 
                    corresponding character in the RL are translated 
                    normally.  If the RL is shorter than the SL, it
                    is NOT extended and if the RL is empty, it is
                    NOT equated to the SL.

     - Ex.

         $x = "Testing";
         $x =~ tr/tei/w/d;         # $x is now "Tswng"

     - c (complement) - complements the SL with respect to the
                        full character set (\000 - \377 for 8-bit
                        data).  So the actual SL is the set of all
                        256 possible characters minus the original SL.

     - Ex.

         $x = "Good&Plenty";
         $x =~ tr/a-zA-Z/ /c;      # $x is now "Good Plenty"
                                   #   (All non-alphabetics are
                                   #    changed to blanks)

     - s (squeeze) - squeezes sequences of the same TRANSLATED
                     characters to a single occurrence of that
                     character.  Note that sequences of the
                     same character which occurred in the
                     original string and did NOT result from
                     the translation are NOT squeezed.

     - Ex.

         $x = "Good&Plenty";
         $x =~ tr/len/x/s;        # $x is now "Good&Pxty"

         $x = "Good&Plenty";
         $x =~ tr/len/t/s;        # $x is now "Good&Ptty"
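     The options can be combined.  A minimal sketch, assuming the
     made-up input strings below: c with s turns each run of
     non-letters into a single blank, c with d deletes non-letters
     outright, and s with an empty RL squeezes repeated letters without
     changing anything else.

```perl
use strict;
use warnings;

my $raw = "Good&&Plenty,  OK?";

# c + s: every run of non-letters becomes ONE blank.
(my $words = $raw) =~ tr/a-zA-Z/ /cs;              # "Good Plenty OK "

# c + d: every non-letter is deleted.
(my $letters = $raw) =~ tr/a-zA-Z//cd;             # "GoodPlentyOK"

# s alone, empty RL (so RL = SL): squeeze doubled letters.
(my $squeezed = "Good&Plenty") =~ tr/a-zA-Z//s;    # "God&Plenty"

print "[$words] [$letters] [$squeezed]\n";
```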


   Other String Functions

     - Don't forget our old favorites!

       chop
       print
       printf
       s///
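     A minimal sketch using several of these together on a made-up
     input line: chop removes the trailing newline, s/// rewrites the
     suffix, and sprintf/printf format the result.

```perl
use strict;
use warnings;

my $line = "toy1.c\n";                   # as if read from a file
chop($line);                             # drops the last character: "\n"

(my $obj = $line) =~ s/\.c$/.o/;         # "toy1.o"

# Left-justify the source name in an 8-character field.
my $report = sprintf("%-8s -> %s", $line, $obj);
printf("%s\n", $report);                 # prints "toy1.c   -> toy1.o"
```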




Bob Tarr
University of Maryland, Baltimore County
tarr@umbc.edu