Basic I/O And Filehandles
CHAPTER 11: BASIC I/O AND FILEHANDLES
What Is A Filehandle?
- Name for an I/O connection to a file or device
- Filehandle is an identifier similar to a variable name, but
without any special prefix character
- Recommended that filehandles be all uppercase to avoid conflict
with reserved words
- Filehandles have their own namespace
STDIN, STDOUT, STDERR
- Perl automatically provides three filehandles
- STDIN: Standard Input
- STDOUT: Standard Output
- STDERR: Standard Error
The Open Function
- Used to associate a filehandle with a file or device
- open (FILEHANDLE, FILENAME)
open (FILEHANDLE)
open FILEHANDLE
- open (FILEHANDLE, "<filename"); # Read Only (RO) [Note #1]
open (FILEHANDLE, "filename"); # Read Only (RO) [Note #1]
open (FILEHANDLE, ">filename"); # Write Only (WO) [Note #2]
open (FILEHANDLE, ">>filename"); # Write Append Only (WAO)
# [Note #3]
open (FILEHANDLE, "+<filename"); # Read/Write (RW) [Note #1]
open (FILEHANDLE, "+>filename"); # Read/Write (RW) [Note #2]
open (FILEHANDLE, "+>>filename"); # Read/Write Append (RWA)
# [Note #3]
Note #1 : Returns failure if file does not exist.
Note #2 : If file does not exist, it is created. If file
does exist, it is truncated to zero bytes.
Note #3 : If file does not exist, it is created.
- Returns 1 (True) on success, undef (False) on failure
- On failure $! is set to the value of errno
- If FILENAME is omitted, the scalar variable of the same name
as the FILEHANDLE contains the filename
- All of the ">" forms create the file if it does not exist
- Ex.
open (IN, "infile.dat"); # Opens file "infile.dat"
# RO with FH IN
open (IN, "<infile.dat"); # Same as above
open (OUT, ">outfile.dat"); # Opens file "outfile.dat"
# WO with FH OUT
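- The open modes above can be sketched as follows. This is a minimal
example, not a definitive recipe: the scratch file comes from the core
File::Temp module and the ".missing" name is a made-up path assumed not
to exist.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical scratch file; tempfile() creates it for us.
my ($tmp, $path) = tempfile (UNLINK => 1);
close ($tmp);

# ">" truncates (or creates) the file.
open (OUT, ">$path") || die ("Could not open $path for writing: $!\n");
print (OUT "first\n");
close (OUT);

# ">>" appends to whatever is already there.
open (OUT, ">>$path") || die ("Could not open $path for appending: $!\n");
print (OUT "second\n");
close (OUT);

# "<" opens read-only.
open (IN, "<$path") || die ("Could not open $path for reading: $!\n");
my @lines = <IN>;
close (IN);
print (@lines);

# Opening a nonexistent file for reading returns False and sets $!.
my $ok = open (IN, "<$path.missing");
print ("open failed: $!\n") unless ($ok);
```

- Testing the return value of every open, as done here, is what makes
the failure cases in Notes #1 to #3 visible to the program.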
The Close Function
- Used to disassociate a filehandle from a file or device
- close (FILEHANDLE)
close FILEHANDLE
- Reopening an existing filehandle implicitly closes the previously
opened file
- All filehandles are closed automatically upon exit
- Returns 1 (True) on success, undef (False) on failure
- On failure $! is set to the value of errno
- Ex.
close (IN); # Closes FH IN
The Input Operator
- <FILEHANDLE>
<$SCALAR>
- When evaluated in a scalar context, gives the next line of input
from the file or device associated with the filehandle (including
the terminating newline) or undef if there are no more lines
- When evaluated in an array context, gives all the remaining lines
of input from the file or device associated with the filehandle
(including the terminating newlines) as a list (each list element
is one line) or an empty list if there are no more lines
- The string inside the angle brackets may instead be a simple
scalar variable (such as $fh) that contains the name of the filehandle
- Ex.
$line = <STDIN>; # Read next line from STDIN
@lines = <STDIN>; # Read all remaining lines
# from STDIN
$fh = "IN"; # Name of the filehandle, as a string
$line = <$fh>; # Read a line from FH IN
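- The difference between the two contexts can be seen in a short
sketch. The three-line scratch file is hypothetical sample data created
with the core File::Temp module.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Build a three-line scratch file (hypothetical data).
my ($tmp, $path) = tempfile (UNLINK => 1);
print ($tmp "one\ntwo\nthree\n");
close ($tmp);

open (IN, "<$path") || die ("Could not open $path: $!\n");
my $first = <IN>;               # Scalar context: just the next line
my @rest  = <IN>;               # Array context: every remaining line
my $after = <IN>;               # Scalar context at end of file: undef
my @none  = <IN>;               # Array context at end of file: empty list
close (IN);

print ("first line: $first");
print ("remaining:  ", scalar (@rest), " lines\n");
print ("at EOF:     ", defined ($after) ? "defined" : "undef", "\n");
```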
Use Of $_ In Input
- Whenever the test of a while loop consists solely of an input
operator, Perl copies the input line into the special default
variable, $_
- So the following loops are equivalent:
while ($line = <STDIN>)
{
chop ($line);
print ("The input line is $line\n");
}
while (<STDIN>)
{
chop;
print ("The input line is $_\n");
}
- Note that the default operand for the chop operator is also $_.
(In Perl 5, chop() with empty parentheses also works on $_.)
Use Of The Input Operator With No Filehandle
- If the input operator is used without a filehandle, the data is
read from the files specified on the command line at the time
the Perl script was invoked
- Ex.
#!/usr/bin/perl
while (<>)
{
print; # Same as print ($_)
}
Suppose the above Perl program is called perlcat. If invoked as
"perlcat file1 file2", the contents of file1 and file2
will be printed, just as the cat program would do.
@ARGV And $ARGV And ARGV
- Perl stores the command line arguments (excluding any Perl
options and the command name itself) in the special array @ARGV
- The <> operator reads data from the files specified in the @ARGV
array, one file at a time, until the last line of the last file
is read
- The name of the file currently being processed is kept in the
special $ARGV variable
- $ARGV is not given a value until the first line of the first
file is read. So we have:
print ("$ARGV:\n"); # $ARGV is null here
while (<>)
{
print ("$ARGV:\n"); # $ARGV has the name of the file
# currently being read here
}
- The filehandle used by the <> operator is ARGV
- Note that $ARGV[0] is NOT the command name, but rather the first
argument after the command name and any Perl options. Use the
special variable $0 to get the command name.
- If the @ARGV array is initially empty, the <> operator will read
from STDIN
- The filename "-" can be used on the command line to cause the <>
operator to read from STDIN
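- A minimal sketch of how <>, @ARGV and $ARGV fit together. To keep it
self-contained, @ARGV is filled in by hand with two scratch files
instead of coming from a real command line; scratch_file is a
hypothetical helper, and chomp is used to strip the trailing newline.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write $text to a scratch file and return its path.
sub scratch_file
{
    my ($text) = @_;
    my ($tmp, $path) = tempfile (UNLINK => 1);
    print ($tmp $text);
    close ($tmp);
    return $path;
}

my @paths = (scratch_file ("a1\na2\n"), scratch_file ("b1\n"));

@ARGV = @paths;                 # Pretend these came from the command line
my @seen;
while (<>)
{
    chomp;
    push (@seen, "$ARGV: $_");  # $ARGV names the file being read
}
print ("$_\n") foreach (@seen);
print ("arguments left: ", scalar (@ARGV), "\n");
```

- The last line shows that <> removes each filename from @ARGV as it
opens the file, so the array is empty once all input has been read.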
Other Ways To Process Command Line Files
- Instead of using the <> operator, you can access the @ARGV array
directly
- Ex.
# Here is one way.
foreach $file (@ARGV)
{
open (IN, "<$file") || die ("Could not open $file: $!\n");
while (<IN>)
{
# Process that file!
}
}
- Ex.
# And here is another.
while ($file = shift (@ARGV))
{
open (IN, "<$file") || die ("Could not open $file: $!\n");
while (<IN>)
{
# Process that file!
}
}
The Eof Function
- Used to detect end of file on a file
- eof (FILEHANDLE)
eof ()
eof
- Returns 1 (True) if the next read on FILEHANDLE will return EOF
or if FILEHANDLE does not refer to an open file. Returns undef
(False) otherwise.
- If the "eof" form is used, returns the eof status of the last
file read
- If the "eof ()" form is used inside a "while(<>)" loop, it
returns True only when all of the files specified on the
command line (@ARGV) have been read
- To test each file inside a "while(<>)" loop, use "eof(ARGV)"
or "eof"
- Note that the special meaning of "eof()" is only valid
inside a "while(<>)" loop. Outside of such a loop, both
"eof()" and "eof" return the eof status of the last file
read
- Typical use:
$status = eof (IN);
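- The difference between "eof" and "eof ()" inside a "while(<>)" loop
can be sketched like this. The two input files are hypothetical scratch
files, and scratch_file is a made-up helper, not part of Perl.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write $text to a scratch file and return its path.
sub scratch_file
{
    my ($text) = @_;
    my ($tmp, $path) = tempfile (UNLINK => 1);
    print ($tmp $text);
    close ($tmp);
    return $path;
}

@ARGV = (scratch_file ("a1\na2\n"), scratch_file ("b1\n"));
my @marks;
while (<>)
{
    chomp;
    my $mark = $_;
    $mark .= " [last line of a file]" if (eof);      # True once per file
    my $done = eof ();                               # True only at the very end
    $mark .= " [last line of all input]" if ($done);
    push (@marks, $mark);
    last if ($done);   # Otherwise the next <> could fall back to STDIN
}
print ("$_\n") foreach (@marks);
```

- Leaving the loop as soon as "eof ()" is True is a precaution: once
the last file has been used up, a further <> may start reading STDIN.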
The $. Variable
- The special variable $. contains the line number (starting from
1) of the last filehandle that was read
- Each filehandle maintains its own copy of the last line number
read from its associated file
- Only an explicit close of a filehandle resets the line number
- The <> operator does NOT explicitly close the ARGV filehandle,
so the line number keeps increasing across the @ARGV files unless
an explicit close of ARGV is done
- Ex.
#!/usr/bin/perl
# Here's another version of our perlcat program.
# This one prints out line numbers for each file.
while (<>)
{
print ("$. : $_"); # $_ already ends in a newline
close (ARGV) if (eof);
}
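- The effect of the explicit close on $. can be compared side by side.
This is a sketch under stated assumptions: line_numbers and
scratch_file are hypothetical helpers, and @ARGV is set locally instead
of coming from the command line.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write $text to a scratch file and return its path.
sub scratch_file
{
    my ($text) = @_;
    my ($tmp, $path) = tempfile (UNLINK => 1);
    print ($tmp $text);
    close ($tmp);
    return $path;
}

# Hypothetical helper: return the value of $. seen for every input line.
sub line_numbers
{
    my ($reset, @files) = @_;
    local @ARGV = @files;
    my @numbers;
    while (<>)
    {
        push (@numbers, $.);
        close (ARGV) if ($reset && eof);   # Explicit close resets $.
    }
    return @numbers;
}

my @files    = (scratch_file ("a1\na2\n"), scratch_file ("b1\n"));
my @running  = line_numbers (0, @files);
my @per_file = line_numbers (1, @files);
print ("@running\n");           # 1 2 3
print ("@per_file\n");          # 1 2 1
```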
The Range Operator
- The range operator (..) is really two different operators
depending on whether it is used in an array context or a scalar
context
- In an array context, the range operator is the list constructor
operator described previously
- Ex.
@x = (1 .. 5); # @x is (1, 2, 3, 4, 5)
- In a scalar context, the range operator is either True (a
non-zero value) or False (the empty string). It is False as long
as its left operand is False. It becomes True when the left
operand becomes True. It stays True up to and including the
evaluation on which the right operand becomes True; after that it
is False again.
- Ex.
($a > 1) .. ($b > 1); # False while $a <= 1. If $a exceeds
# 1, becomes True. Stays True while
# $b <= 1 and on the first test where
# $b exceeds 1, then becomes False again.
- What is the purpose of this scalar range operator???
Line Number Ranges!!!
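- Before moving on to line numbers, the on/off behavior of the scalar
range operator can be sketched with an ordinary list. The values are
made-up sample data, and neither operand is a constant, so $. plays no
part here.

```perl
use strict;
use warnings;

my @values = (0, 2, 5, 7, 9, 3, 1);   # hypothetical sample data
my @inside;
foreach my $v (@values)
{
    # False until $v == 5 is seen, then True through the element
    # on which $v == 9 is seen, then False again.
    push (@inside, $v) if (($v == 5) .. ($v == 9));
}
print ("@inside\n");            # 5 7 9
```

- Note that the element that makes the right operand True (9) is still
inside the range; the operator only turns False on the next test.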
Using The Range Operator For Line Number Ranges
- If either operand of the scalar range operator is a static number
and can be evaluated at compile time, that operand is implicitly
compared to the $. (current line number) variable
- This allows line number ranges in a fashion similar to awk
- Ex.
while (<>)
{
print if (10 .. 20); # Print lines 10 to 20
# The expression (10 .. 20) is really
# ($. == 10 .. $. == 20)
}
- Ex.
while (<>)
{
next if (1 .. /^$/); # Skip header lines
}
- Ex.
while (<>)
{
s/^/> / if (/^$/ .. eof()); # Put "> " before each body line
}
- Note that if you want to use variables as line number ranges, you
must explicitly make the comparison to $.
- Ex.
$a = 10;
$b = 20;
while (<>)
{
print if ($. == $a .. $. == $b); # Print lines 10 to 20
}
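- Both forms can be run against the same input to check that they pick
the same lines. This is a minimal sketch; the six-line scratch file and
the bounds 2 and 4 are hypothetical sample values.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical six-line input file.
my ($tmp, $path) = tempfile (UNLINK => 1);
my @text = map { "line $_\n" } (1 .. 6);
print ($tmp @text);
close ($tmp);

my ($from, $to) = (2, 4);
my (@by_constant, @by_variable);

open (IN, "<$path") || die ("Could not open $path: $!\n");
while (<IN>)
{
    # Constant operands are compared to $. implicitly.
    push (@by_constant, $_) if (2 .. 4);
    # Variable operands need the comparison spelled out.
    push (@by_variable, $_) if ($. == $from .. $. == $to);
}
close (IN);

print (@by_variable);           # line 2, line 3 and line 4
```

- Each ".." in a program keeps its own True/False state, so a range
that is still True when the input ends stays True the next time that
same expression is evaluated.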
The Print Function
- Used to output a list to the file or device associated with a
filehandle
- print (FILEHANDLE LIST)
print (LIST)
print FILEHANDLE LIST
print LIST
print
- Sends each string in the specified LIST to the file or device
associated with FILEHANDLE, without any intervening or trailing
characters added or any formatting done
- If FILEHANDLE is omitted, data is output to the currently
selected output filehandle, which is initially STDOUT
- Note that there is NO comma after the filehandle. This allows
the FILEHANDLE to be the value of a scalar variable.
- Ex.
print ("Hello, World!\n"); # Print usual to STDOUT
print ("Hello, ", "World!\n"); # Same thing
print (STDERR "Error! Error!\n"); # Prints to STDERR
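- A short sketch of print with an explicit filehandle, including one
held in a scalar variable. The scratch file is hypothetical. Because
"use strict" is on, the scalar holds a reference to the filehandle
(\*OUT) rather than its name as a string.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($tmp, $path) = tempfile (UNLINK => 1);   # hypothetical scratch file
close ($tmp);

open (OUT, ">$path") || die ("Could not open $path: $!\n");
print (OUT "Hello, ", "World!", "\n");   # List elements are written back to back
my $fh = \*OUT;                          # Scalar holding the filehandle
print ($fh "second line\n");             # Still no comma after the filehandle
close (OUT);

# Read the file back to see exactly what was written.
open (IN, "<$path") || die ("Could not open $path: $!\n");
my @written = <IN>;
close (IN);
print (@written);
```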
The Printf Function
- Used to output a list to the file or device associated with a
filehandle with formatting similar to the C printf() function
- printf (FILEHANDLE FORMAT, LIST)
printf (FORMAT, LIST)
printf FILEHANDLE FORMAT, LIST
printf FORMAT, LIST
- Uses the first element of list as a format specification string.
Sends each subsequent string in the specified LIST to the file
or device associated with FILEHANDLE, with the formatting done
according to the format specification.
- If FILEHANDLE is omitted, data is output to the currently
selected output filehandle, which is initially STDOUT
- Note that there is NO comma after the filehandle. This allows
the FILEHANDLE to be the value of a scalar variable.
- Ex.
printf ("Hello, World!\n"); # Print usual to STDOUT
printf ("%20s\n", $line); # Prints $line right-
# justified in field of
# width 20 to STDOUT
printf (STDERR "%6.2f", $line); # Prints to STDERR
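- A few common format specifications, as a hedged sketch. The values
of $name and $price are made-up sample data. sprintf, which takes the
same FORMAT and LIST but returns the string instead of printing it, is
used so the results can be inspected.

```perl
use strict;
use warnings;

my $name  = "widget";           # hypothetical sample data
my $price = 3.14159;

printf ("[%20s]\n", $name);     # Right-justified in a field of width 20
printf ("[%-20s]\n", $name);    # Left-justified in the same field
printf ("[%6.2f]\n", $price);   # Width 6, two decimal places
printf (STDERR "%s costs %.2f\n", $name, $price);   # To STDERR, no comma

# sprintf builds the same strings without printing them.
my $right = sprintf ("[%20s]", $name);
my $money = sprintf ("[%6.2f]", $price);
print ("$right $money\n");
```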
The Die Function
- Used to write a message to STDERR and exit the program
- die (LIST)
die LIST
die
- The exit status will be the current value of $! (errno) if it is
non-zero; otherwise it is ($? >> 8) if that is non-zero, and 255
otherwise
- Useful when opening files critical to the execution of the
program
- If the last element of LIST does not end in a newline, the
current script filename, line number and input line number
(if any) are appended to the message, along with a
final newline
- Ex.
unless (open (IN, "infile.dat"))
{
die ("Program X: Could not open input file: $!\n");
}
This is more commonly written as:
open (IN, "infile.dat") ||
die ("Program X: Could not open input file: $!\n");
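- The effect of the trailing newline on the die message can be
sketched as follows. Two assumptions: the path is a made-up name that
should not exist, and an eval block is used purely as a device to trap
die (leaving its message in $@) so that both messages can be shown
without ending the program.

```perl
use strict;
use warnings;

# Hypothetical path that is assumed not to exist.
my $missing = "/nonexistent/dir/infile.dat";

# Message ends in a newline: printed exactly as written.
eval { open (IN, "<$missing") || die ("Could not open input file: $!\n"); };
my $with_newline = $@;

# No trailing newline: Perl appends the script name and line number.
eval { open (IN, "<$missing") || die ("Could not open input file: $!"); };
my $without_newline = $@;

print ("With newline:    $with_newline");
print ("Without newline: $without_newline");
```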
The Warn Function
- Used to write a message to STDERR in a fashion similar to the die
function, but does NOT exit the program
- warn (LIST)
warn LIST
- Useful for printing out warning messages
- Ex.
warn ("You are about to do something dangerous!\n") if ($danger);
Example Program
- Here is a simple implementation of a copy program:
open (IN, "<$infile") || die ("Could not open input file: $!\n");
open (OUT, ">$outfile") || die ("Could not open output file: $!\n");
while (<IN>)
{
print (OUT $_);
}
close (IN);
close (OUT);
The Select Function
- Used to select the current default filehandle (CDF)
- select (FILEHANDLE)
select
- The CDF is the default filehandle used by the print, printf and
write functions. Perl initially sets the CDF to STDOUT. So
initially, the following statements are the same:
print (STDOUT "Hello, World\n");
print ("Hello, World\n");
- Returns the name of the previous CDF as a string scalar,
qualified with its package (for example "main::STDOUT")
- Changing the CDF is useful when using Perl formats
- The CDF is also known as the currently selected filehandle since
the select function is used to specify its value
- Ex.
#!/usr/bin/perl
print ("Hello, World\n"); # Goes to STDOUT
open (NEWFH, ">test.dat");
$oldcdf = select (NEWFH); # $oldcdf is "main::STDOUT"
print ("Hello, World\n"); # Goes to NEWFH ("test.dat")
$oldcdf = select ($oldcdf); # $oldcdf is "main::NEWFH"
print ("Hello, World\n"); # Goes to STDOUT
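- The same idea as a self-contained sketch that also reads the file
back to confirm where each line went. The scratch file is hypothetical
and stands in for "test.dat".

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($tmp, $path) = tempfile (UNLINK => 1);   # hypothetical scratch file
close ($tmp);

open (NEWFH, ">$path") || die ("Could not open $path: $!\n");
my $oldcdf = select (NEWFH);    # NEWFH becomes the CDF
print ("Goes to the file\n");
printf ("%s\n", "So does this");
select ($oldcdf);               # Put the previous CDF back
print ("Goes to STDOUT again\n");
close (NEWFH);

# Read the file back to see what the CDF captured.
open (IN, "<$path") || die ("Could not open $path: $!\n");
my @captured = <IN>;
close (IN);
print ("captured: $_") foreach (@captured);
```

- Saving the return value of select and passing it back later is a
common way to make sure the original CDF is restored.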
The $| Variable
- The special variable $| is used to control whether or not
output directed to a filehandle is buffered
- Each filehandle maintains its own copy of the $| variable
- If the $| variable is set to a non-zero value, no buffering
is done
- By default, $| is zero and buffering is enabled
- To change the value of $| for a filehandle, simply make the
filehandle be the CDF and assign a value to $|
- Ex.
$oldcdf = select (OUT);
$| = 1; # No buffering for OUT
select ($oldcdf); # Restore the previous CDF
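- A sketch of the per-filehandle nature of $|. The scratch file is
hypothetical, and the -s file test (size in bytes) is used only to
check that the data reached the file before the close.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($tmp, $path) = tempfile (UNLINK => 1);   # hypothetical scratch file
close ($tmp);

open (OUT, ">$path") || die ("Could not open $path: $!\n");

my $oldcdf = select (OUT);      # Make OUT the CDF
$| = 1;                         # Applies to OUT only
my $out_flag = $|;              # Read back OUT's copy of $|
select ($oldcdf);               # Restore the previous CDF

print (OUT "progress line\n");
my $size_while_open = -s $path; # Non-zero: the line was written at once
close (OUT);

print ("OUT flag: $out_flag, bytes on disk before close: $size_while_open\n");
```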
Special Variables For Filehandles
- Each filehandle maintains its own copy of the following
variables:
$. Current Line Number
$| Buffering Flag
$~ Format Name
$^ Top-Of-Page Format Name
$% Current Page Number
$= Current Page Length
$- Number Of Lines Left On Current Page
- These last five are described in the chapter on Formats.
Bob Tarr
University of Maryland, Baltimore County
tarr@umbc.edu