                    CHAPTER 11: BASIC I/O AND FILEHANDLES


   What Is A Filehandle?

     - Name for an I/O connection to a file or device

     - Filehandle is an identifier similar to a variable name, but 
       without any special prefix character

     - Recommended that filehandles be all uppercase to avoid conflict 
       with reserved words

     - Filehandles have their own namespace


   STDIN, STDOUT, STDERR

     - Perl automatically provides three filehandles
    
     - STDIN: Standard Input

     - STDOUT: Standard Output
   
     - STDERR: Standard Error


   The Open Function

     - Used to associate a filehandle with a file or device

     - open (FILEHANDLE, FILENAME)
       open (FILEHANDLE)
       open FILEHANDLE

     - open (FILEHANDLE, "<filename");    # Read Only (RO) [Note #1]
       open (FILEHANDLE, "filename");     # Read Only (RO) [Note #1]
       open (FILEHANDLE, ">filename");    # Write Only (WO) [Note #2]
       open (FILEHANDLE, ">>filename");   # Write Append Only (WAO) 
                                          #   [Note #3]
       open (FILEHANDLE, "+<filename");   # Read/Write (RW)  [Note #1]
       open (FILEHANDLE, "+>filename");   # Read/Write (RW) [Note #2]
       open (FILEHANDLE, "+>>filename");  # Read/Write Append (RWA) 
                                          #   [Note #3]
 
       Note #1 : Returns failure if file does not exist.

       Note #2 : If file does not exist, it is created.  If file 
                 does exist, it is truncated to zero bytes.

       Note #3 : If file does not exist, it is created. 

     - Returns 1 (True) on success, undef (False) on failure

     - On failure $! is set to the value of errno

     - If FILENAME is omitted, the scalar variable of the same name
       as the FILEHANDLE contains the filename

     - All of the ">" forms create the file if it does not exist

     - Ex.

         open (IN, "infile.dat");     # Opens file "infile.dat"
                                      #   RO with FH IN
         open (IN, "<infile.dat");    # Same as above
         open (OUT, ">outfile.dat");  # Opens file "outfile.dat"
                                      #   WO with FH OUT
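
     - The open modes above can be sketched as follows.  This is only
       an illustration: the scratch file name (demo_open_$$.tmp) is
       hypothetical, and the sketch simply writes, appends and reads
       back two lines, then shows that a read-only open of a missing
       file fails.

```perl
use strict;
use warnings;

# Hypothetical scratch file; $$ (the process ID) keeps the name unique.
my $file = "demo_open_$$.tmp";

# ">" creates the file (or truncates it) and opens it write-only.
open (OUT, ">$file") || die ("Could not create $file: $!\n");
print (OUT "first line\n");
close (OUT);

# ">>" opens for appending; the existing contents are kept.
open (OUT, ">>$file") || die ("Could not append to $file: $!\n");
print (OUT "second line\n");
close (OUT);

# "<" opens read-only.
open (IN, "<$file") || die ("Could not read $file: $!\n");
my @lines = <IN>;
close (IN);

unlink ($file);

# A read-only open of a missing file fails and sets $!.
my $ok = open (IN, "<$file");
my $error = "$!";

print ("Read ", scalar (@lines), " lines\n");
print ("Reopen after unlink ", ($ok ? "succeeded" : "failed: $error"), "\n");
```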


   The Close Function

     - Used to disassociate a filehandle from a file or device

     - close (FILEHANDLE)
       close FILEHANDLE

     - Reopening an existing filehandle closes the previously opened 
       file

     - All filehandles are closed automatically upon exit

     - Returns 1 (True) on success, undef (False) on failure

     - On failure $! is set to the value of errno

     - Ex.

         close (IN);               # Closes FH IN


   The Input Operator

     - <FILEHANDLE> 
       <SCALAR>

     - When evaluated in a scalar context, gives the next line of input 
       from the file or device associated with the filehandle (including 
       the terminating newline) or undef if there are no more lines

     - When evaluated in an array context, gives all the remaining lines 
       of input from the file or device associated with the filehandle 
       (including the terminating newlines) as a list (each list element 
       is one line) or the empty list if there are no more lines

     - The string inside the angle brackets may be a scalar variable
       that contains the name of the filehandle

     - Ex.

         $line = <STDIN>;          # Read next line from STDIN
         @lines = <STDIN>;         # Read all remaining lines
                                   #   from STDIN
         $fh = "IN";               # Name of the filehandle
         $line = <$fh>;            # Read a line from FH IN
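
     - A minimal sketch of the two contexts, assuming a hypothetical
       three-line scratch file (demo_input_$$.tmp) in place of real
       input.  It also shows what each context gives at end of file.

```perl
use strict;
use warnings;

# Hypothetical scratch file holding three lines.
my $file = "demo_input_$$.tmp";
open (OUT, ">$file") || die ("Could not create $file: $!\n");
print (OUT "one\n", "two\n", "three\n");
close (OUT);

open (IN, "<$file") || die ("Could not read $file: $!\n");
my $first = <IN>;       # Scalar context: one line, newline included
my @rest  = <IN>;       # Array context: all the remaining lines
my $more  = <IN>;       # Scalar context at end of file: undef
my @none  = <IN>;       # Array context at end of file: empty list
close (IN);
unlink ($file);

print ("First line: $first");
print ("Remaining lines: ", scalar (@rest), "\n");
print ("Scalar read at EOF is ", (defined ($more) ? "defined" : "undef"), "\n");
print ("List read at EOF has ", scalar (@none), " elements\n");
```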


   Use Of $_ In Input

     - Whenever a loop test consists solely of an input operator, Perl 
       copies the input line into the special default variable, $_

     - So the following loops are equivalent:

       while ($line = <STDIN>)
       {
         chop ($line);
         print ("The input line is $line\n");
       }

       while (<STDIN>)
       {
         chop;
         print ("The input line is $_\n");
       }

     - Note that the default operand for the chop operator is also $_.  
       (But if you use chop(), chop will work on the empty list!)

  
   Use Of The Input Operator With No Filehandle

     - If the input operator is used without a filehandle, the data is 
       read from the files specified on the command line at the time 
       the Perl script was invoked

     - Ex.

         #!/usr/bin/perl

         while (<>)
         {
           print;                  # Same as print ($_)
         }

         Suppose the above Perl program is called perlcat.  If invoked
         as "perlcat file1 file2", the contents of file1 and file2
         will be printed, just as the cat program would do.


   @ARGV And $ARGV And ARGV

     - Perl stores the command line arguments (excluding any Perl
       options and the command name itself) in the special array @ARGV

     - The <> operator reads data from the files specified in the @ARGV 
       array, one file at a time, until the last line of the last file 
       is read

     - The name of the file currently being processed is kept in the 
       special $ARGV variable

     - $ARGV is not given a value until the first line of the first
       file is read.  So we have:

         print ("$ARGV:\n");       # $ARGV is undefined here
         while (<>)
         {
           print ("$ARGV:\n");     # $ARGV has the name of the current
                                   #    file here
         }

     - The filehandle used by the <> operator is ARGV

     - Note that $ARGV[0] is NOT the command name, but rather the first
       argument after the command name and any Perl options.  Use the
       special variable $0 to get the command name.

     - If the @ARGV array is initially empty, the <> operator will read
       from STDIN

     - The filename "-" can be used on the command line to cause the <> 
       operator to read from STDIN 
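
     - The behavior described above can be tried without a real command
       line.  The sketch below assumes two hypothetical scratch files
       and assigns their names to @ARGV, which has the same effect as
       naming them when the script is invoked.

```perl
use strict;
use warnings;

# Two hypothetical scratch files stand in for command line arguments.
my @files = ("demo_argv_a_$$.tmp", "demo_argv_b_$$.tmp");
foreach my $file (@files)
{
  open (OUT, ">$file") || die ("Could not create $file: $!\n");
  print (OUT "line from $file\n");
  close (OUT);
}

# Assigning to @ARGV has the same effect as naming the files
# on the command line.
@ARGV = @files;

my @seen;
while (<>)
{
  push (@seen, $ARGV);    # $ARGV names the file now being read
  print ("$ARGV: $_");
}

# The <> operator shifts each name off @ARGV as it opens the file.
print ("Arguments left: ", scalar (@ARGV), "\n");

unlink (@files);
```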


   Other Ways To Process Command Line Files

     - Instead of using the <> operator, you can access the @ARGV array 
       directly

     - Ex.

         # Here is one way.

         foreach $file (@ARGV)
         {
           open (IN, "<$file") || die ("Could not open $file: $!\n");
           while (<IN>)
           {
             # Process that file!
           }
         }

     - Ex.

         # And here is another.

         while ($file = shift (@ARGV))
         {
           open (IN, "<$file") || die ("Could not open $file: $!\n");
           while (<IN>)
           {
             # Process that file!
           }
         }


   The Eof Function

     - Used to detect end of file on a filehandle

     - eof (FILEHANDLE)
       eof ()
       eof 

     - Returns 1 (True) if the next read on FILEHANDLE will return EOF
       or if FILEHANDLE does not refer to an open file.  Returns the
       empty string (False) otherwise.

     - If the "eof" form is used, returns the eof status of the last
       file read

     - If the "eof ()" form is used inside a "while(<>)" loop, it
       returns True only when all of the files specified on the
       command line (@ARGV) have been read

     - To test each file inside a "while(<>)" loop, use "eof(ARGV)"
       or "eof"

     - Note that the special meaning of "eof()" is only valid
       inside a "while(<>)" loop.   Outside of such a loop, both
       "eof()" and "eof" return the eof status of the last file
       read

     - Typical use:

         $status = eof (IN);
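
     - One possible use of eof (FILEHANDLE) is to spot the last line
       while reading.  The sketch below assumes a hypothetical scratch
       file and marks its final line; it also checks the closed
       filehandle case mentioned above.

```perl
use strict;
use warnings;

# Hypothetical scratch file holding three lines.
my $file = "demo_eof_$$.tmp";
open (OUT, ">$file") || die ("Could not create $file: $!\n");
print (OUT "alpha\n", "beta\n", "gamma\n");
close (OUT);

open (IN, "<$file") || die ("Could not read $file: $!\n");
my @marked;
while (<IN>)
{
  chop;
  # eof (IN) is True once the line just read was the last one.
  push (@marked, eof (IN) ? "$_ (last)" : $_);
}
close (IN);
unlink ($file);

# eof is also True for a filehandle that is not open.
my $closed = eof (IN) ? "True" : "False";

print (join (", ", @marked), "\n");
print ("eof on a closed filehandle: $closed\n");
```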


   The $. Variable

     - The special variable $. contains the current line number
       (starting from 1) of the last filehandle that was read

     - Each filehandle maintains its own copy of the last line number
       read from its associated file

     - Only an explicit close of a filehandle resets the line number

     - The <> operator does NOT explicitly close the ARGV filehandle,
       so the line number keeps increasing across the @ARGV files
       unless an explicit close of ARGV is done

     - Ex.

         #!/usr/bin/perl

         # Here's another version of our perlcat program.
         # This one prints out line numbers for each file.

         while (<>)
         {
           print ("$. : $_");      # $_ already ends with a newline
           close (ARGV) if (eof);
         }
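
     - To compare the two numbering behaviors side by side, here is a
       sketch that assumes two hypothetical scratch files of two lines
       each and records $. with and without the close of ARGV.

```perl
use strict;
use warnings;

# Two hypothetical scratch files of two lines each.
my @files = ("demo_lineno_a_$$.tmp", "demo_lineno_b_$$.tmp");
foreach my $file (@files)
{
  open (OUT, ">$file") || die ("Could not create $file: $!\n");
  print (OUT "first\n", "second\n");
  close (OUT);
}

# Closing ARGV at the end of each file restarts the numbering.
my @per_file;
@ARGV = @files;
while (<>)
{
  push (@per_file, $.);
  close (ARGV) if (eof);
}

# Without the close, $. keeps counting across the files.
my @running;
@ARGV = @files;
while (<>)
{
  push (@running, $.);
}

unlink (@files);

print ("Closing ARGV each time: @per_file\n");
print ("Never closing ARGV:     @running\n");
```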


   The Range Operator

     - The range operator (..) is really two different operators 
       depending on whether it is used in an array context or a scalar
       context

     - In an array context, the range operator is the list constructor
       operator described previously

     - Ex.
       
         @x = (1 .. 5);            # @x is (1, 2, 3, 4, 5)

     - In a scalar context, the range operator is either True (a
       sequence number starting at 1) or False (the empty string).  It
       is False as long as its left operand is False.  It becomes True
       when the left operand becomes True.  It stays True until the
       right operand is True, after which it becomes False again.

     - Ex.

         ($a > 1) .. ($b > 1);    # False while $a <= 1.  If $a exceeds
                                  #   1, becomes True.  Stays True while
                                  #   $b <= 1.  If $b exceeds 1, becomes
                                  #   False again.
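
     - The on/off behavior can be traced with a short sketch.  The list
       of values and the thresholds (4 and 8) are made up for
       illustration; the operator is evaluated once per value and its
       state is recorded as T or F.

```perl
use strict;
use warnings;

# Made-up sample values, tested one at a time.
my @values = (0, 2, 5, 7, 9, 1);
my @state;

foreach my $n (@values)
{
  # Becomes True at the first value above 4 and stays True through
  # the first value above 8; after that it is False again.
  push (@state, (($n > 4) .. ($n > 8)) ? "T" : "F");
}

print ("Values: @values\n");
print ("States: @state\n");
```

       Note that the value 9 still counts as True: the operator only
       turns False on the evaluation after its right operand was True.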

     - What is the purpose of this scalar range operator???
       Line Number Ranges!!!


   Using The Range Operator For Line Number Ranges

     - If either operand of the scalar range operator is a constant
       expression (one that can be evaluated at compile time), that
       operand is implicitly compared to the $. (current line number)
       variable

     - This allows line number ranges in a fashion similar to awk

     - Ex.

         while (<>)
         {
           print if (10 .. 20);   # Print lines 10 to 20
                                  # The expression (10 .. 20) is really
                                  #   ($. == 10 .. $. == 20)
         }

     - Ex.

         while (<>)
         {
           next if (1 .. /^$/);   # Skip header lines
         }

     - Ex.

         while (<>)
         {
           s/^/> / if (/^$/ .. eof());  # Put "> " before each body line
         }

     - Note that if you want to use variables as line number ranges, you
       must explicitly make the comparison to $.

     - Ex.

         $a = 10;
         $b = 20;
         while (<>)
         {
           print if ($. == $a .. $. == $b);  # Print lines 10 to 20
         }
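
     - Both forms can be checked against each other.  The sketch below
       assumes a hypothetical six-line scratch file and the range 2 to
       4; the variable names $from and $to are illustrative only.

```perl
use strict;
use warnings;

# Hypothetical scratch file holding six numbered lines.
my $file = "demo_range_$$.tmp";
open (OUT, ">$file") || die ("Could not create $file: $!\n");
foreach my $n (1 .. 6)
{
  print (OUT "line $n\n");
}
close (OUT);

my ($from, $to) = (2, 4);
my (@by_constant, @by_variable);

open (IN, "<$file") || die ("Could not read $file: $!\n");
while (<IN>)
{
  # Constant operands are compared to $. implicitly.
  push (@by_constant, $.) if (2 .. 4);

  # Variable operands need the comparison spelled out.
  push (@by_variable, $.) if ($. == $from .. $. == $to);
}
close (IN);
unlink ($file);

print ("Constant range matched lines: @by_constant\n");
print ("Variable range matched lines: @by_variable\n");
```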


   The Print Function

     - Used to output a list to the file or device associated with a 
       filehandle 

     - print (FILEHANDLE LIST)
       print (LIST)
       print FILEHANDLE LIST
       print LIST
       print 

     - Sends each string in the specified LIST to the file or device 
       associated with FILEHANDLE, without any intervening or trailing 
       characters added or any formatting done

     - If FILEHANDLE is omitted, data is output to the currently
       selected output filehandle, which is initially STDOUT

     - Note that there is NO comma after the filehandle.  This allows 
       the FILEHANDLE to be the value of a scalar variable.

     - Ex.

         print ("Hello, World!\n");          # Prints greeting to STDOUT
         print ("Hello, ", "World!\n");      # Same thing
         print (STDERR "Error! Error!\n");   # Prints to STDERR
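
     - A sketch of printing through a filehandle held in a scalar
       variable.  Two things here are assumptions rather than part of
       the notes above: the scratch file name, and the use of a
       typeglob reference (\*OUT) to hold the filehandle, chosen
       because that form also works under "use strict".

```perl
use strict;
use warnings;

# Hypothetical scratch file.
my $file = "demo_print_$$.tmp";
open (OUT, ">$file") || die ("Could not create $file: $!\n");

# Assumption: a typeglob reference holds the filehandle in a scalar.
my $fh = \*OUT;

# No comma after the filehandle; the strings are written back to
# back, with nothing added between or after them.
print ($fh "Hello, ", "World!", "\n");
close (OUT);

open (IN, "<$file") || die ("Could not read $file: $!\n");
my $text = <IN>;
close (IN);
unlink ($file);

print ("The file contains: $text");
```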


   The Printf Function

     - Used to output a list to the file or device associated with a 
       filehandle with formatting similar to the C printf() function 

     - printf (FILEHANDLE FORMAT, LIST)
       printf (FORMAT, LIST)
       printf FILEHANDLE FORMAT, LIST
       printf FORMAT, LIST

     - Uses the first element of list as a format specification string.  
       Sends each subsequent string in the specified LIST to the file 
       or device associated with FILEHANDLE, with the formatting done 
       according to the format specification.

     - If FILEHANDLE is omitted, data is output to the currently
       selected output filehandle, which is initially STDOUT

     - Note that there is NO comma after the filehandle.  This allows 
       the FILEHANDLE to be the value of a scalar variable.

     - Ex.

         printf ("Hello, World!\n");       # Prints greeting to STDOUT
         printf ("%20s\n", $line);         # Prints $line right-
                                           #   justified in field of
                                           #   width 20 to STDOUT
         printf (STDERR "%6.2f", $line);   # Prints to STDERR
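
     - A few format specifications, sketched with made-up values.  The
       sketch uses sprintf, which accepts the same format strings but
       returns the result instead of printing it, so that the padding
       can be inspected.

```perl
use strict;
use warnings;

# Made-up sample values.
my $word  = "perl";
my $value = 3.14159;

# sprintf builds the same string that printf would print.
my $right = sprintf ("%20s", $word);     # Right-justified, width 20
my $left  = sprintf ("%-20s", $word);    # Left-justified, width 20
my $fixed = sprintf ("%6.2f", $value);   # Width 6, 2 decimal places

# The brackets make the padding visible.
printf ("[%s]\n", $right);
printf ("[%s]\n", $left);
printf ("[%s]\n", $fixed);
```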


   The Die Function

     - Used to write a message to STDERR and exit the program

     - die (LIST)
       die LIST
       die

     - The exit status will be the current value of $! (errno).  If $!
       is 0, the exit status is ($? >> 8); if that is also 0, the exit
       status is 255.

     - Useful when opening files critical to the execution of the
       program

     - If the last element of LIST does not end in a newline, the
       current script filename, line number and input line number
       (if any) are appended to the message, along with a
       final newline

     - Ex.

         unless (open (IN, "infile.dat"))
         {
           die ("Program X: Could not open input file: $!\n");
         }

         This is more commonly written as:

         open (IN, "infile.dat") || 
           die ("Program X: Could not open input file: $!\n");
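
     - The effect of the trailing newline can be seen in a sketch.  It
       assumes a file name that does not exist, and it wraps each die
       in an eval block (which traps the message in $@) purely so the
       sketch can keep running and show both messages.

```perl
use strict;
use warnings;

# Hypothetical file name, assumed not to exist.
my $missing = "no_such_file_$$.dat";

# With a trailing newline, the message is used as is.
eval { open (IN, "<$missing") || die ("Could not open $missing: $!\n"); };
my $with_newline = $@;

# Without one, the script name and line number are appended.
eval { open (IN, "<$missing") || die ("Could not open $missing: $!"); };
my $without_newline = $@;

print ($with_newline);
print ($without_newline);
```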


   The Warn Function

     - Used to write a message to STDERR in a fashion similar to the die 
       function, but does NOT exit the program

     - warn (LIST)
       warn LIST

     - Useful for printing out warning messages

     - Ex.

         warn ("You are about to do something dangerous!\n") if ($danger);


   Example Program

     - Here is a simple implementation of a copy program:

       open (IN, "<$infile") || die ("Could not open input file: $!\n");
       open (OUT, ">$outfile") || die ("Could not open output file: $!\n");

       while (<IN>)
       {
         print (OUT $_);
       }

       close (IN);
       close (OUT);


   The Select Function

     - Used to select the current default filehandle (CDF)

     - select (FILEHANDLE)
       select

     - The CDF is the default filehandle used by the print, printf and 
       write functions.  Perl initially sets the CDF to STDOUT.  So 
       initially, the following statements are the same:

           print (STDOUT "Hello, World\n");
           print ("Hello, World\n");

     - Returns the name of the previous CDF as a string scalar

     - Changing the CDF is useful when using Perl formats

     - The CDF is also known as the currently selected filehandle since
       the select function is used to specify its value

     - Ex.

         #!/usr/bin/perl

         print ("Hello, World\n");  # Goes to STDOUT
         open (NEWFH, ">test.dat"); 
         $oldcdf = select (NEWFH);  # $oldcdf is "main::STDOUT"
         print ("Hello, World\n");  # Goes to NEWFH ("test.dat")
         $oldcdf = select ($oldcdf);# $oldcdf is "main::NEWFH"
         print ("Hello, World\n");  # Goes to STDOUT
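
     - A self-checking variant of the example above, assuming a
       hypothetical scratch file.  It selects a new CDF, prints without
       naming a filehandle, restores the old CDF and then reads the
       file back to confirm where the output went.

```perl
use strict;
use warnings;

# Hypothetical scratch file.
my $file = "demo_select_$$.tmp";
open (NEWFH, ">$file") || die ("Could not create $file: $!\n");

my $oldcdf = select (NEWFH);      # print now defaults to NEWFH
print ("sent to the file\n");
select ($oldcdf);                 # Restore the previous CDF
close (NEWFH);

open (IN, "<$file") || die ("Could not read $file: $!\n");
my @lines = <IN>;
close (IN);
unlink ($file);

print ("Previous CDF was $oldcdf\n");
print ("The file received: @lines");
```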


   The $| Variable

     - The special variable $| is used to control whether or not
       output directed to a filehandle is buffered

     - Each filehandle maintains its own copy of the $| variable

     - If the $| variable is set to a non-zero value, output is flushed
       after every write or print on the filehandle

     - By default, $| is zero and buffering is enabled

     - To change the value of $| for a filehandle, simply make the
       filehandle be the CDF and assign a value to $|

     - Ex.

         $oldcdf = select (OUT);
         $| = 1;                    # No buffering for OUT
         select ($oldcdf);          # Restore the previous CDF
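
     - One way to observe the effect of $| is to read a file back while
       it is still open for writing.  The sketch below assumes a
       hypothetical scratch file; with $| set for OUT, the printed line
       should already be in the file before OUT is closed.

```perl
use strict;
use warnings;

# Hypothetical scratch file.
my $file = "demo_flush_$$.tmp";
open (OUT, ">$file") || die ("Could not create $file: $!\n");

my $oldcdf = select (OUT);
$| = 1;                           # Flush OUT after every print
select ($oldcdf);                 # Restore the previous CDF

print (OUT "flushed at once\n");

# OUT is still open, yet the line can already be read back.
open (IN, "<$file") || die ("Could not read $file: $!\n");
my @lines = <IN>;
close (IN);

close (OUT);
unlink ($file);

print ("Lines visible before close: ", scalar (@lines), "\n");
```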


   Special Variables For Filehandles

     - Each filehandle maintains its own copy of the following
       variables:

         $.             Current Line Number
         $|             Buffering Flag

         $~             Format Name
         $^             Top-Of-Page Format Name
         $%             Current Page Number
         $=             Current Page Length
         $-             Number Of Lines Left On Current Page

     - These last five are described in the chapter on Formats.




Bob Tarr
University of Maryland, Baltimore County
tarr@umbc.edu