Learning Perl, 5th Edition
5.6. Filehandles
A filehandle is the name in a Perl program for an I/O connection between your Perl process and the outside world. That is, it's the name of a connection and not necessarily the name of a file.
Filehandles are named like other Perl identifiers (letters, digits, and underscores, but they can't start with a digit). Since they don't have any prefix character, they might be confused with present or future reserved words, or with labels, which we will cover in Chapter 10. Once again, as with labels, the recommendation from Larry is that you use all uppercase letters in the name of your filehandle. It will stand out better and will guarantee your program won't fail when a future (lowercase) reserved word is introduced.
Perl uses six special filehandle names for its own purposes: STDIN, STDOUT, STDERR, DATA, ARGV, and ARGVOUT. Though you may choose any filehandle name you'd like, you shouldn't choose one of those six unless you intend to use that one's special properties.
Maybe you've recognized some of those names. When your program starts, STDIN is the filehandle naming the connection between the Perl process and wherever the program should get its input, known as the standard input stream. This is generally the user's keyboard unless the user asked for something else to be the source of input, such as a file or the output of another program through a pipe. Likewise, STDOUT is the filehandle for the standard output stream, which generally goes to the user's display screen unless the user asked for it to go somewhere else.[§]
[§] If you're not familiar with how your non-Unix system provides standard input and output, see the perlport manpage and the documentation for that system's equivalent to the Unix shell (the program that runs programs based upon your keyboard input).
The user can redirect both streams from the shell with a command like this one:
    $ ./your_program <dino >wilma
That command tells the shell that the program's input should be read from the file dino, and the output should go to the file wilma. As long as the program blindly reads its input from STDIN, processes it (in whatever way we need), and blindly writes its output to STDOUT, this will work just fine.
And at no extra charge, the program will work in a pipeline. This is another concept from Unix, which lets us write command lines like this one:
    $ cat fred barney | sort | ./your_program | grep something | lpr
Now, if you're unfamiliar with these Unix commands, that's okay. This line says that the cat command should print out all of the lines of file fred followed by all of the lines of file barney. That output should be the input of the sort command, which sorts those lines and passes them on to your_program. After it has done its processing, your_program will send the data on to grep, which discards certain lines in the data, sending the others on to the lpr command, which should print everything that it gets on a printer. Whew!
Pipelines like that are common in Unix and many other systems today because they let you build powerful, complex commands out of simple, standard building blocks. Each building block does one thing well, and it's your job to use them together in the right way.
There's one more standard I/O stream. If (in the previous example) your_program had to emit any warnings or other diagnostic messages, those shouldn't go down the pipeline. The grep command is set to discard anything that it hasn't specifically been told to look for, so it will most likely discard the warnings. Even if it did keep the warnings, you probably don't want to pass them downstream to the other programs in the pipeline. That's why there's the standard error stream: STDERR. Even if the standard output is going to another program or file, the errors will go to wherever the user desires. By default, the errors will generally go to the user's display screen,[*] but the user may send the errors to a file with a shell command like this one:
    $ netstat | ./your_program 2>/tmp/my_errors
[*] Generally, errors aren't buffered. That means that if the standard error and standard output streams are going to the same place (such as the monitor), the errors may appear earlier than the normal output. For example, if your program prints a line of ordinary text and tries to divide by zero, the output may show the message about dividing by zero first, and the ordinary text second.
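To make the three standard streams concrete, here is a minimal sketch of what a program like your_program might look like. The processing step (uppercasing each line), the process_line subroutine, and the choice to treat blank lines as worth a diagnostic are all assumptions made up for illustration, not anything the text above prescribes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical processing step: uppercase one line of text.
sub process_line {
    my ($line) = @_;
    return uc $line;
}

# Blindly read from STDIN until the input runs out.
while (defined(my $line = <STDIN>)) {
    if ($line =~ /\S/) {
        # Normal output goes to STDOUT, and so on down the pipeline.
        print STDOUT process_line($line);
    } else {
        # Diagnostics go to STDERR, so they stay out of the pipeline.
        # $. holds the current input line number.
        print STDERR "Skipping blank line $.\n";
    }
}
```

Run as ./your_program <dino >wilma 2>/tmp/my_errors, a program like this should leave the uppercased lines in wilma and the blank-line notices in /tmp/my_errors, without the program itself knowing where any of the three streams ended up.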