                    CHAPTER 12: FILE MANIPULATION


   File Manipulation 

     - Perl provides a large number of functions to perform various 
       operations on files

     - Most are very similar to the corresponding UNIX system calls or 
       commands

     - See the UNIX man pages for details


   File Test Operators

     - Operate on a filename or filehandle argument (except for -t 
       which only operates on a filehandle argument)

     - Test the associated file to determine whether something is true 
       about it

     - If the argument is omitted, $_ is tested (except for -t which 
       tests STDIN)

     - Most of these operators return 1 for True and the empty string 
       for False, or undef if the file does not exist (except for -s 
       which returns the file size and -M, -A and -C which return the 
       file age)

     - Precedence is higher than logical and relational operators, but 
       lower than arithmetic operators

     - For superuser, -r, -R, -w and -W always return True and -x and
       -X return True if ANY execute bit is set

     - Ex.

         if (-e "/etc/passwd")                # Does it exist?
         {
           print ("Let's start hacking!\n");
         }
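
     - Sketch (an addition, not from the original notes) of the return 
       values described above.  The scratch file sample.txt and the 
       use of File::Temp are assumptions made only so the example is 
       self-contained.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical scratch file; the name sample.txt is only for illustration.
my $dir  = tempdir (CLEANUP => 1);
my $file = "$dir/sample.txt";

open (my $fh, '>', $file) || die "Cannot create $file: $!\n";
print $fh "hello\n";                    # 6 bytes
close ($fh);

my $exists  = -e $file;                 # 1 (True)
my $is_dir  = -d $file;                 # empty string (False)
my $size    = -s $file;                 # size in bytes, here 6
my $missing = -e "$dir/no-such-file";   # undef: the file does not exist

print "exists=$exists size=$size\n";
print "missing is undef\n" unless defined ($missing);
```

     - Testing with defined() is what separates "test failed" (empty 
       string) from "file does not exist" (undef).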


   File Test Operator List

         -r             File is readable by effective uid/gid
         -w             File is writable by effective uid/gid
         -x             File is executable by effective uid/gid
         -o             File is owned by effective uid
         -R             File is readable by real uid/gid
         -W             File is writable by real uid/gid
         -X             File is executable by real uid/gid
         -O             File is owned by real uid
         -e             File exists
         -z             File exists and has zero size
         -s             File exists and has nonzero size 
                          (returns size in bytes)
         -f             File is a plain file
         -d             File is a directory
         -l             File is a symbolic link
         -p             File is a named pipe (FIFO)
         -S             File is a socket
         -b             File is a block special file
         -c             File is a character special file
         -u             File has its setuid bit set
         -g             File has its setgid bit set
         -k             File has its sticky bit set
         -t             Filehandle is a tty
         -T             File is a text file (heuristic guess)
         -B             File is a binary file (heuristic guess)
         -M             Modification age in days (at script start)
         -A             Access age in days (at script start)
         -C             Inode-change age in days (at script start)
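
     - Sketch (an addition, not from the original notes) that combines 
       several of the operators listed above.  The sub name describe() 
       and the scratch file are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Returns a short description of a path, using the tests listed above.
# The sub name describe() is hypothetical, chosen for this sketch.
sub describe
{
  my ($path) = @_;
  return "symlink"    if -l $path;      # tested first: other tests follow links
  return "missing"    unless -e $path;
  return "directory"  if -d $path;
  return "empty file" if -z $path;
  return "plain file" if -f $path;
  return "other";
}

my $dir = tempdir (CLEANUP => 1);
open (my $fh, '>', "$dir/empty") || die "Cannot create file: $!\n";
close ($fh);

print describe ($dir), "\n";            # directory
print describe ("$dir/empty"), "\n";    # empty file
print describe ("$dir/nope"), "\n";     # missing

# Age is relative to script start, so a file made later is slightly negative.
my $age = -M "$dir/empty";
printf "modified %.4f days before script start\n", $age;
```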


   Stat Function

     - Returns a 13-element list of info on a file (an empty list if 
       the stat fails)

     - stat (FILEHANDLE)
       stat FILEHANDLE
       stat (FILENAME)

     - Useful for file info which the file test operators do not 
       provide (such as number of links) or for finding the true 
       mode when superuser

     - Typical use:

         ($dev, $ino, $mode, $nlink, $uid, $gid, $rdev, $size,
          $atime, $mtime, $ctime, $blksize, $blocks)
          = stat ($filename);

         $file = "toy1.c";
         ($uid, $gid) = (stat ($file))[4,5];   # list slice needs parens
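
     - Sketch (an addition, not from the original notes) of pulling the 
       permission bits and link count out of the stat list.  The 
       scratch copy of toy1.c is created only for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir (CLEANUP => 1);
my $file = "$dir/toy1.c";               # scratch file, for illustration only
open (my $fh, '>', $file) || die "Cannot create $file: $!\n";
print $fh "int main () { return 0; }\n";
close ($fh);

my @info = stat ($file);
die "stat failed on $file: $!\n" unless @info;   # empty list on failure

my ($mode, $nlink, $size) = @info[2, 3, 7];
my $perms = $mode & 07777;              # strip the file-type bits

printf "%s: perms %04o, %d link(s), %d bytes\n", $file, $perms, $nlink, $size;
```

     - Masking $mode with 07777 leaves only the permission, setuid, 
       setgid and sticky bits.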


   Lstat Function

     - Same as the stat() function, but if the file is a symbolic link, 
       gives info on the link itself rather than on the file it 
       points to

     - lstat (FILEHANDLE)
       lstat FILEHANDLE
       lstat (FILENAME)
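
     - Sketch (an addition, not from the original notes) contrasting 
       stat and lstat on a symbolic link.  The names toy1.c and toy2.c 
       are reused from the other examples; the scratch directory is an 
       assumption.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir    = tempdir (CLEANUP => 1);
my $target = "$dir/toy1.c";
my $alias  = "$dir/toy2.c";             # will be a symbolic link to toy1.c

open (my $fh, '>', $target) || die "Cannot create $target: $!\n";
print $fh "0123456789";                 # 10 bytes
close ($fh);
symlink ("toy1.c", $alias) || die "Cannot create symlink: $!\n";

my $followed = (stat  ($alias))[7];     # size of toy1.c: 10
my $own      = (lstat ($alias))[7];     # size of the link itself,
                                        # normally the length of the stored path

print "stat size: $followed, lstat size: $own\n";
```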


   The _ Filehandle

     - Whenever a file test operator, stat function or lstat function 
       is used, Perl invokes the proper system call (stat(2) on UNIX) 
       to get the required info

     - Doing a file test, stat or lstat on the special _ filehandle
       causes Perl to reuse the stat buffer (memory cache) of file 
       info left by the previous file test, stat or lstat

     - Ex.

         if (-r $file && -w _)
         {
           print ("$file is both readable and writable\n");
         }

         The above makes only one call to stat(2), which is more 
         efficient than the following, which makes two calls to 
         stat(2):

         if (-r $file && -w $file)
         {
           print ("$file is both readable and writable\n");
         }
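
     - Sketch (an addition, not from the original notes) of the same 
       idea with an explicit stat call filling the buffer.  The paths 
       tested are hypothetical scratch paths.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir (CLEANUP => 1);

my @kinds;
foreach my $path ($dir, "$dir/missing")
{
  if (! stat ($path))                   # one stat(2) call fills the buffer
  {
    push (@kinds, "missing");
    next;
  }
  # The tests below read the cached buffer; no further system calls.
  push (@kinds, (-d _) ? "directory" : (-f _) ? "plain file" : "other");
}
print "@kinds\n";                       # directory missing
```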


   File Name Expansion (Globbing)

     - If the string inside angle brackets is NOT a filehandle, it is 
       interpreted as a C-Shell Filename Expansion (Globbing) pattern 
       (versus the input operator)

     - All C-Shell globbing metacharacters are valid:
         *, ?, [], -, {}, ~

     - In a list context, the glob returns a list of all filenames
       that match (or an empty list if none match).  In a scalar context, 
       the next filename that matches is returned (or undef if there are 
       no more matches).  This is similar to how the input operator 
       with a filehandle works.

     - One level of scalar variable interpolation is done

     - But since <$x> indicates an indirect filehandle, use <${x}> for 
       globbing 

     - Ex. 

         <*.c>                     # All files that end in .c
         <ch[1-3]>                 # Files ch1, ch2 and ch3
         
         $x = "*.c";
         <${x}>                    # All files that end in .c
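
     - Sketch (an addition, not from the original notes) of globbing in 
       a list context.  It also uses the glob() function, which takes 
       the pattern as an ordinary expression and so avoids the <${x}> 
       workaround.  The file names are made up for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir (CLEANUP => 1);
foreach my $name ("ch1", "ch2", "ch4", "toy1.c", "toy2.c", "notes.txt")
{
  open (my $fh, '>', "$dir/$name") || die "Cannot create $name: $!\n";
  close ($fh);
}

chdir ($dir) || die "Cannot chdir to $dir: $!\n";

my @c_files  = <*.c>;                   # list context: all matches at once
my $pattern  = "ch[1-3]";
my @chapters = glob ($pattern);         # same effect as <${pattern}>

print "@c_files\n";                     # toy1.c toy2.c
print "@chapters\n";                    # ch1 ch2

chdir ("/");                            # leave the scratch dir before cleanup
```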


   Unlink Function

     - Removes one or more files (strictly, it deletes links, so a file 
       with other hard links remains accessible)

     - unlink (LIST)
       unlink LIST

     - Returns the number of files successfully deleted

     - On failure $! is set to the value of errno

     - Uses unlink(2)

     - Typical use:

         $count = unlink ("toy1.c", "toy2.c", "toy3.c");

         $count = unlink (<*.c>);

     - Ex.

         #!/usr/bin/perl
         # Simple rm program

         foreach $file (@ARGV)
         {
           unlink ($file) || print ("Could not unlink $file: $!\n");
         }
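
     - Sketch (an addition, not from the original notes) of checking 
       the count returned for a list.  The file names are hypothetical; 
       toy3.c is deliberately never created.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir   = tempdir (CLEANUP => 1);
my @files = map { "$dir/toy$_.c" } (1 .. 3);

foreach my $file (@files[0, 1])         # create toy1.c and toy2.c only
{
  open (my $fh, '>', $file) || die "Cannot create $file: $!\n";
  close ($fh);
}

my $count = unlink (@files);            # toy3.c does not exist
if ($count != @files)
{
  my $failed = @files - $count;
  print "Removed $count file(s), $failed could not be removed\n";
}
```

     - The count does not say which file failed, and $! only reflects 
       one error.  To report each failure, unlink one file at a time 
       as in the rm program above.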


   Rename Function

     - Renames a file 

     - rename (OLDNAME, NEWNAME)

     - Returns 1 for success, 0 for failure

     - On failure $! is set to the value of errno

     - Uses rename(2); similar to mv(1), but does NOT rename across 
       filesystems and does not work if OLDNAME is a regular file and 
       NEWNAME is an existing directory

     - Typical use:

         $status = rename ("toy1.c", "toy2.c");
         $status = rename ("toy1.c", "toys/toy1.c");
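
     - Sketch (an addition, not from the original notes) of the 
       directory restriction described above.  The scratch directory 
       and its toys subdirectory are assumptions for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir (CLEANUP => 1);
mkdir ("$dir/toys") || die "Cannot create $dir/toys: $!\n";
open (my $fh, '>', "$dir/toy1.c") || die "Cannot create toy1.c: $!\n";
close ($fh);

# A directory as NEWNAME fails, unlike "mv toy1.c toys".
my $into_dir = rename ("$dir/toy1.c", "$dir/toys");
print "rename to directory failed: $!\n" unless $into_dir;

# Spelling out the full destination path works.
my $status = rename ("$dir/toy1.c", "$dir/toys/toy1.c");
print "moved toy1.c into toys\n" if $status;
```

     - For moves that may cross filesystems, the move() function of 
       the standard File::Copy module is one alternative; it falls 
       back to copying and deleting when a plain rename is not 
       possible.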


   Link Function

     - Creates a new hard link for a file

     - link (OLDNAME, NEWNAME)

     - Returns 1 for success, 0 for failure

     - On failure $! is set to the value of errno

     - Uses link(2)

     - Typical use:

         $status = link ("toy1.c", "toy2.c");
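
     - Sketch (an addition, not from the original notes) showing that a 
       hard link is a second name for the same inode.  It assumes a 
       filesystem that supports hard links; the file names are only 
       for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir (CLEANUP => 1);
open (my $fh, '>', "$dir/toy1.c") || die "Cannot create toy1.c: $!\n";
close ($fh);

my $before = (stat ("$dir/toy1.c"))[3];            # link count: 1
link ("$dir/toy1.c", "$dir/toy2.c") || die "Cannot link: $!\n";
my $after  = (stat ("$dir/toy1.c"))[3];            # link count: 2

# Both names refer to the same inode.
my $same = (stat ("$dir/toy1.c"))[1] == (stat ("$dir/toy2.c"))[1];
print "links: $before -> $after\n";
```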


   Symlink Function

     - Creates a new symbolic (soft) link for a file

     - symlink (OLDNAME, NEWNAME)

     - Returns 1 for success, 0 for failure

     - On failure $! is set to the value of errno

     - Uses symlink(2)

     - Typical use:

         $status = symlink ("toy1.c", "toy2.c");


   Readlink Function

     - Returns the contents of a symbolic link (the path it points to)

     - readlink (FILENAME)
       readlink FILENAME

     - Returns link contents on success, undef on failure

     - On failure $! is set to the value of errno

     - Uses readlink(2)

     - Uses $_ if FILENAME is omitted

     - Typical use:

         $link = readlink ("toy2.c");
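
     - Sketch (an addition, not from the original notes) of symlink and 
       readlink used together.  It assumes a system that supports 
       symbolic links; the names are only for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir (CLEANUP => 1);

# OLDNAME is stored as-is; it need not exist when the link is made.
symlink ("toy1.c", "$dir/toy2.c") || die "Cannot create symlink: $!\n";

my $link = readlink ("$dir/toy2.c");    # "toy1.c"
print "toy2.c -> $link\n";

# readlink fails on anything that is not a symbolic link.
my $none = readlink ($dir);
print "readlink on a directory: $!\n" unless defined ($none);
```

     - A relative OLDNAME is resolved relative to the directory that 
       holds the link, not the current directory.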


   Chmod Function

     - Changes the mode (permissions) of a list of files

     - chmod (LIST)
       chmod LIST

     - Returns the number of files successfully changed

     - On failure $! is set to the value of errno

     - Uses chmod(2)

     - The first element of the list must be the numerical mode, 
       normally written as an octal literal (0755, not "0755" or 755)

     - Typical use:

         $count = chmod (0755, "toy1.c");
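
     - Sketch (an addition, not from the original notes) that checks 
       the result with stat and converts a mode held in a string.  
       The scratch file and the variable $mode_string are 
       hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir (CLEANUP => 1);
my $file = "$dir/toy1.c";
open (my $fh, '>', $file) || die "Cannot create $file: $!\n";
close ($fh);

my $count = chmod (0755, $file);        # octal literal: rwxr-xr-x
my $perms = (stat ($file))[2] & 07777;
printf "changed %d file(s), mode is now %04o\n", $count, $perms;

# A mode held in a string must be converted with oct() first.
my $mode_string = "0644";
chmod (oct ($mode_string), $file) || die "Cannot chmod $file: $!\n";
my $perms_after = (stat ($file))[2] & 07777;
printf "mode is now %04o\n", $perms_after;
```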


   Chown Function

     - Changes the owner and group of a list of files

     - chown (LIST)
       chown LIST

     - Returns the number of files successfully changed

     - On failure $! is set to the value of errno

     - Uses chown(2)

     - The first two elements of the list must be the numerical uid 
       and gid

     - Typical use:

         $count = chown ($uid, $gid, <toy*.c>);
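
     - Sketch (an addition, not from the original notes) of a chown 
       call with numerical ids.  To stay runnable without superuser 
       rights it reuses the ids the scratch file already has, which is 
       an assumption of this example rather than a typical use.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir (CLEANUP => 1);
my $file = "$dir/toy1.c";
open (my $fh, '>', $file) || die "Cannot create $file: $!\n";
close ($fh);

# Reuse the file's current numeric ids so that the call is permitted
# for an ordinary user; changing the owner normally needs superuser.
my ($uid, $gid) = (stat ($file))[4, 5];

my $count = chown ($uid, $gid, $file);
print "changed $count file(s)\n";
print "chown failed: $!\n" if $count == 0;
```

     - To start from a user name instead, getpwnam returns the uid and 
       gid as the third and fourth elements of its list.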


   Utime Function

     - Changes the access (atime) and modification (mtime) times of a 
       list of files

     - utime (LIST)
       utime LIST

     - Returns the number of files successfully changed

     - On failure $! is set to the value of errno

     - Similar to touch(1)

     - The first two elements of the list must be the numerical access 
       and modification times

     - The inode modification time (ctime) is set to the current time

     - Typical use:

         $count = utime ($atime, $mtime, "toy1.c");
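
     - Sketch (an addition, not from the original notes) that sets both 
       times to one day ago and reads them back with stat.  The 
       scratch file is an assumption for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir (CLEANUP => 1);
my $file = "$dir/toy1.c";
open (my $fh, '>', $file) || die "Cannot create $file: $!\n";
close ($fh);

# Times are seconds since the epoch; set both to one day ago.
my $atime = time () - 24 * 60 * 60;
my $mtime = $atime;
my $count = utime ($atime, $mtime, $file);

my ($new_atime, $new_mtime) = (stat ($file))[8, 9];
print "changed $count file(s)\n";
print "mtime is one day old\n" if $new_mtime == $mtime;
```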




Bob Tarr
University of Maryland, Baltimore County
tarr@umbc.edu