Learning Perl, 5th Edition

5.6. Filehandles

A filehandle is the name in a Perl program for an I/O connection between your Perl process and the outside world. That is, it's the name of a connection and not necessarily the name of a file.

Filehandles are named like other Perl identifiers (letters, digits, and underscores, but they can't start with a digit); since they don't have any prefix character, they might be confused with present or future reserved words, or with labels, which we will cover in Chapter 10. Once again, as with labels, the recommendation from Larry is that you use all uppercase letters in the name of your filehandle. It will stand out better and will guarantee your program won't fail when a future (lowercase) reserved word is introduced.

Perl uses six special filehandle names for its own purposes: STDIN, STDOUT, STDERR, DATA, ARGV, and ARGVOUT.[*] Though you may choose any filehandle name you'd like, you shouldn't choose one of those six unless you intend to use that one's special properties.[†]

[†] In some cases, you could (re)use these names without a problem. But your maintenance programmer may think that you're using the name for its built-in features and may be confused.

Maybe you've recognized some of those names. When your program starts, STDIN is the filehandle naming the connection between the Perl process and wherever the program should get its input, known as the standard input stream. This is generally the user's keyboard unless the user asked for something else to be the source of input, such as a file or the output of another program through a pipe.[‡] STDOUT is the standard output stream. By default, this one goes to the user's display screen, but the user may send the output to a file or to another program, as we'll see shortly. These standard streams come to us from the Unix standard I/O library, but they work in much the same way on most modern operating systems.[§] The general idea is that your program should blindly read from STDIN and blindly write to STDOUT, trusting in the user (or generally whichever program is starting your program) to have set those up. In that way, the user can type a command like this one at the shell prompt:

[‡] The defaults we speak of in this chapter for the three main I/O streams are what the Unix shells do by default. But it's not just shells that launch programs, of course. We'll see in Chapter 14 what happens when you launch another program from Perl.

[§] If you're not familiar with how your non-Unix system provides standard input and output, see the perlport manpage and the documentation for that system's equivalent to the Unix shell (the program that runs programs based upon your keyboard input).

$ ./your_program <dino >wilma

That command tells the shell that the program's input should be read from the file dino, and the output should go to the file wilma. As long as the program blindly reads its input from STDIN, processes it (in whatever way we need), and blindly writes its output to STDOUT, this will work just fine.
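
To make "blindly reads from STDIN and blindly writes to STDOUT" concrete, here is a minimal sketch of what your_program might contain. The processing step (uppercasing each line) and the subroutine name process_line are assumptions made up for illustration; the original doesn't say what your_program does.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical processing step (an assumption for this sketch):
# return the line converted to uppercase.
sub process_line {
    my ($line) = @_;
    return uc $line;
}

# Read standard input one line at a time until end-of-file, and write
# each processed line to standard output. The program never asks where
# STDIN comes from or where STDOUT goes; the shell has set that up.
while (defined(my $line = <STDIN>)) {
    print STDOUT process_line($line);
}
```

Run as ./your_program <dino >wilma, this sketch would read the lines of dino and write the uppercased lines to wilma. Run with no redirection, it would read from the keyboard and write to the screen, with no change to the code.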

And at no extra charge, the program will work in a pipeline. This is another concept from Unix, which lets us write command lines like this one:

$ cat fred barney | sort | ./your_program | grep something | lpr

Now, if you're unfamiliar with these Unix commands, that's okay. This line says that the cat command should print out all of the lines of file fred followed by all of the lines of file barney. That output should be the input of the sort command, which sorts those lines and passes them on to your_program. After it has done its processing, your_program will send the data on to grep, which discards certain lines in the data, sending the others on to the lpr command, which should print everything that it gets on a printer. Whew!

Pipelines like that are common in Unix and many other systems today because they let you build powerful, complex commands out of simple, standard building blocks. Each building block does one thing well, and it's your job to use them together in the right way.

There's one more standard I/O stream. If (in the previous example) your_program had to emit any warnings or other diagnostic messages, those shouldn't go down the pipeline. The grep command is set to discard anything that it hasn't specifically been told to look for, so it will most likely discard the warnings. Even if it did keep the warnings, you probably don't want to pass them downstream to the other programs in the pipeline. That's why there's the standard error stream: STDERR. Even if the standard output is going to another program or file, the errors will go to wherever the user desires. By default, the errors will generally go to the user's display screen,[*] but the user may send the errors to a file with a shell command like this one:

[*] Generally, errors aren't buffered. That means that if the standard error and standard output streams are going to the same place (such as the monitor), the errors may appear earlier than the normal output. For example, if your program prints a line of ordinary text and tries to divide by zero, the output may show the message about dividing by zero first, and the ordinary text second.

$ netstat | ./your_program 2>/tmp/my_errors
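
Inside the program, keeping diagnostics out of the pipeline is a matter of naming STDERR in the print statement. The following is a minimal sketch, assuming a made-up rule that empty input lines deserve a warning; the subroutine name is_empty_line and the wording of the message are hypothetical, not from the original.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical check (an assumption for this sketch): a line is
# considered bad if it holds nothing but the newline.
sub is_empty_line {
    my ($line) = @_;
    return $line eq "\n";
}

while (defined(my $line = <STDIN>)) {
    if (is_empty_line($line)) {
        # Diagnostics go to STDERR, so they stay out of the pipeline.
        # $. holds the current input line number.
        print STDERR "warning: skipping empty line at input line $.\n";
        next;
    }
    # Ordinary output goes to STDOUT and continues downstream.
    print STDOUT $line;
}
```

With the command shown above, the warnings from this sketch would end up in /tmp/my_errors, while the ordinary lines would still go to standard output.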
