Standard Streams
Module sys is also the place where the standard input, output, and error streams of your Python programs live:
>>> for f in (sys.stdin, sys.stdout, sys.stderr): print f
...
<open file '<stdin>', mode 'r' at 762210>
<open file '<stdout>', mode 'w' at 762270>
<open file '<stderr>', mode 'w' at 7622d0>
The standard streams are simply pre-opened Python file objects that are automatically connected to your program's standard streams when Python starts up. By default, they are all tied to the console window where Python (or a Python program) was started. Because the print statement and the raw_input function are really nothing more than user-friendly interfaces to the standard output and input streams, they are similar to using stdout and stdin in sys directly:
>>> print 'hello stdout world'
hello stdout world

>>> sys.stdout.write('hello stdout world' + '\n')
hello stdout world

>>> raw_input('hello stdin world>')
hello stdin world>spam
'spam'

>>> print 'hello stdin world>',; sys.stdin.readline( )[:-1]
hello stdin world>eggs
'eggs'
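For readers on Python 3, where print and raw_input became the print() and input() functions, the following sketch pairs the same calls with their direct sys.stdout and sys.stdin equivalents. It is a minimal illustration rather than one of this book's examples; so that it runs without a keyboard, the demo temporarily swaps in io.StringIO objects for sys.stdin and sys.stdout, a technique discussed under redirecting streams to Python objects later in this section.

```python
import io
import sys


def stream_equivalents():
    """Pair print()/input() with direct sys.stdout/sys.stdin calls (Python 3)."""
    print('hello stdout world')                 # print() writes to sys.stdout
    sys.stdout.write('hello stdout world\n')    # same text, written directly

    first = input('hello stdin world>')         # input() reads a line, drops the '\n'
    sys.stdout.write('hello stdin world>')      # prompt written by hand
    second = sys.stdin.readline()[:-1]          # readline() keeps the '\n'; slice it off
    return first, second


if __name__ == '__main__':
    # Canned input and captured output, so the demo needs no keyboard.
    saved = sys.stdin, sys.stdout
    sys.stdin, sys.stdout = io.StringIO('spam\neggs\n'), io.StringIO()
    try:
        replies = stream_equivalents()
        captured = sys.stdout.getvalue()
    finally:
        sys.stdin, sys.stdout = saved
    print(replies)       # ('spam', 'eggs')
    print(captured)
```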
2.10.1 Redirecting Streams to Files and Programs
Technically, standard output (and print) text appears in the console window where a program was started, standard input (and raw_input) text comes from the keyboard, and standard error is used to print Python error messages to the console window. At least that's the default. It's also possible to redirect these streams both to files and other programs at the system shell, and to arbitrary objects within a Python script. On most systems, such redirections make it easy to reuse and combine general-purpose command-line utilities.
2.10.1.1 Redirecting streams to files
Redirection is useful for things like canned (precoded) test inputs: we can apply a single test script to any set of inputs by simply redirecting the standard input stream to a different file each time the script is run. Similarly, redirecting the standard output stream lets us save and later analyze a program's output; for example, testing systems might compare the saved standard output of a script with a file of expected output, to detect failures.
Although it's a powerful paradigm, redirection turns out to be straightforward to use. For instance, consider the simple read-evaluate-print loop program in Example 2-6.
Example 2-6. PP2E\System\Streams\teststreams.py
# read numbers till eof and show squares

def interact( ):
    print 'Hello stream world'                     # print sends to sys.stdout
    while 1:
        try:
            reply = raw_input('Enter a number>')   # raw_input reads sys.stdin
        except EOFError:
            break                                  # raises an except on eof
        else:                                      # input given as a string
            num = int(reply)
            print "%d squared is %d" % (num, num ** 2)
    print 'Bye'

if __name__ == '__main__':
    interact( )                                    # when run, not imported
As usual, the interact function here is automatically executed when this file is run, not when it is imported. By default, running this file from a system command line connects its standard streams to the console window where you typed the Python command. The script simply reads numbers until it reaches end-of-file in the standard input stream (on Windows, end-of-file is usually the two-key combination Ctrl+Z; on Unix, type Ctrl+D instead[8]):
[8] Notice that raw_input raises an exception to signal end-of-file, but file read methods simply return an empty string for this condition. Because raw_input also strips the end-of-line character at the end of lines, an empty string result means an empty line, so an exception is necessary to specify the end-of-file condition. File read methods retain the end-of-line character, and denote an empty line as "\n" instead of "". This is one way in which reading sys.stdin directly differs from raw_input. The latter also accepts a prompt string that is automatically printed before input is accepted.
C:\...\PP2E\System\Streams>python teststreams.py
Hello stream world
Enter a number>12
12 squared is 144
Enter a number>10
10 squared is 100
Enter a number>
But on both Windows and Unix-like platforms, we can redirect the standard input stream to come from a file with the < filename shell syntax. Here is a command session in a DOS console box on Windows that forces the script to read its input from a text file, input.txt. It's the same on Linux, but replace the DOS type command with a Unix cat command:
C:\...\PP2E\System\Streams>type input.txt
8
6

C:\...\PP2E\System\Streams>python teststreams.py < input.txt
Hello stream world
Enter a number>8 squared is 64
Enter a number>6 squared is 36
Enter a number>Bye
Here, the input.txt file automates the input we would normally type interactively -- the script reads from this file instead of the keyboard. Standard output can be similarly redirected to go to a file, with the > filename shell syntax. In fact, we can combine input and output redirection in a single command:
C:\...\PP2E\System\Streams>python teststreams.py < input.txt > output.txt

C:\...\PP2E\System\Streams>type output.txt
Hello stream world
Enter a number>8 squared is 64
Enter a number>6 squared is 36
Enter a number>Bye
This time, the Python script's input and output are both mapped to text files, not the interactive console session.
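The same file redirection can also be requested from inside a Python program. The sketch below assumes Python 3.7 or later and uses the standard subprocess module; SCRIPT is a hypothetical Python 3 stand-in for teststreams.py, passed with the interpreter's -c option so the example needs no extra files. Checking the saved output against expected text, as a test harness might, then reduces to a string comparison.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical Python 3 stand-in for teststreams.py, run with "python -c".
SCRIPT = '''
print('Hello stream world')
while True:
    try:
        reply = input('Enter a number>')
    except EOFError:
        break
    num = int(reply)
    print('%d squared is %d' % (num, num ** 2))
print('Bye')
'''


def run_redirected(input_path, output_path):
    """Roughly 'python script < input_path > output_path', started from Python."""
    with open(input_path) as fin, open(output_path, 'w') as fout:
        subprocess.run([sys.executable, '-c', SCRIPT],
                       stdin=fin, stdout=fout, check=True)


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        inp, out = Path(tmp, 'input.txt'), Path(tmp, 'output.txt')
        inp.write_text('8\n6\n')
        run_redirected(inp, out)
        expected = ('Hello stream world\n'
                    'Enter a number>8 squared is 64\n'
                    'Enter a number>6 squared is 36\n'
                    'Enter a number>Bye\n')
        print('matches expected output:', out.read_text() == expected)
```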
2.10.1.2 Chaining programs with pipes
On Windows and Unix-like platforms, it's also possible to send the standard output of one program to the standard input of another, using the | shell character between two commands. This is usually called a "pipe" operation -- the shell creates a pipeline that connects the output and input of two commands. Let's send the output of the Python script to the standard "more" command-line program's input to see how this works:
C:\...\PP2E\System\Streams>python teststreams.py < input.txt | more
Hello stream world
Enter a number>8 squared is 64
Enter a number>6 squared is 36
Enter a number>Bye
Here, teststreams's standard input comes from a file again, but its output (written by print statements) is sent to another program, not a file or window. The receiving program is more -- a standard command-line paging program available on Windows and Unix-like platforms. Because Python ties scripts into the standard stream model, though, Python scripts can be used on both ends -- one Python script's output can always be piped into another Python script's input:
C:\...\PP2E\System\Streams>type writer.py
print "Help! Help! I'm being repressed!"
print 42

C:\...\PP2E\System\Streams>type reader.py
print 'Got this" "%s"' % raw_input( )
import sys
data = sys.stdin.readline( )[:-1]
print 'The meaning of life is', data, int(data) * 2

C:\...\PP2E\System\Streams>python writer.py | python reader.py
Got this" "Help! Help! I'm being repressed!"
The meaning of life is 42 84
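A pipeline like writer.py | reader.py can likewise be built from within Python. This is a sketch assuming Python 3.7+: WRITER and READER are hypothetical Python 3 stand-ins for the two scripts (the reader's label text is simplified), and the wiring follows the usual subprocess pattern of handing one child's stdout pipe to the next child as its stdin.

```python
import subprocess
import sys

# Hypothetical Python 3 stand-ins for writer.py and reader.py.
WRITER = '''
print("Help! Help! I'm being repressed!")
print(42)
'''

READER = '''
import sys
print('Got this: "%s"' % input())
data = sys.stdin.readline()[:-1]
print('The meaning of life is', data, int(data) * 2)
'''


def run_pipeline():
    """Roughly 'python writer.py | python reader.py', wired with subprocess."""
    writer = subprocess.Popen([sys.executable, '-c', WRITER],
                              stdout=subprocess.PIPE)
    reader = subprocess.Popen([sys.executable, '-c', READER],
                              stdin=writer.stdout, stdout=subprocess.PIPE,
                              text=True)
    writer.stdout.close()            # drop the parent's copy; reader owns that pipe end
    output, _ = reader.communicate()
    writer.wait()
    return output


if __name__ == '__main__':
    print(run_pipeline(), end='')
```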
This time, two Python programs are connected. Script reader gets input from script writer; both scripts simply read and write, oblivious to stream mechanics. In practice, such chaining of programs is a simple form of cross-program communication. It makes it easy to reuse utilities written to communicate via stdin and stdout in ways we never anticipated. For instance, a Python program that sorts stdin text could be applied to any data source we like, including the output of other scripts. Consider the Python command-line utility scripts in Example 2-7 and Example 2-8 that sort and sum lines in the standard input stream.
Example 2-7. PP2E\System\Streams\sorter.py
import sys
lines = sys.stdin.readlines( )      # sort stdin input lines,
lines.sort( )                       # send result to stdout
for line in lines: print line,      # for further processing
Example 2-8. PP2E\System\Streams\adder.py
import sys, string
sum = 0
while 1:
    try:
        line = raw_input( )              # or call sys.stdin.readlines( ):
    except EOFError:                     # or sys.stdin.readline( ) loop
        break
    else:
        sum = sum + string.atoi(line)    # int(line[:-1]) treats 042 as octal
print sum
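Because both tools only read and write streams, their logic can also be packaged as functions that accept any file-like object. The following Python 3 sketch (the function names are hypothetical) keeps sorter's read-everything-then-sort approach and adder's line-at-a-time approach. In Python 3, int() accepts surrounding whitespace and leading zeros in a decimal string, so '042\n' converts to 42.

```python
import io


def sort_lines(instream, outstream):
    """Like sorter.py: read all lines at once, sort them, write them out."""
    lines = instream.readlines()      # sorting needs the complete list
    lines.sort()
    outstream.writelines(lines)


def add_lines(instream):
    """Like adder.py: total the integers in the stream, one line at a time."""
    total = 0
    for line in instream:             # file objects hand back lines lazily
        total += int(line)            # '042\n' -> 42: whitespace and leading zeros are fine
    return total


if __name__ == '__main__':
    data = '123\n000\n999\n042\n'     # same lines as the data.txt file used below
    out = io.StringIO()
    sort_lines(io.StringIO(data), out)
    print(out.getvalue(), end='')     # 000, 042, 123, 999 on separate lines
    print(add_lines(io.StringIO(data)))   # 1164
```

As a script, such a module might simply call sort_lines(sys.stdin, sys.stdout) in its main block, so shell redirections and pipes like those that follow would work unchanged.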
We can apply such general-purpose tools in a variety of ways at the shell command line, to sort and sum arbitrary files and program outputs:
C:\...\PP2E\System\Streams>type data.txt
123
000
999
042

C:\...\PP2E\System\Streams>python sorter.py < data.txt             sort a file
000
042
123
999

C:\...\PP2E\System\Streams>type data.txt | python adder.py         sum program output
1164

C:\...\PP2E\System\Streams>type writer2.py
for data in (123, 0, 999, 42): print '%03d' % data

C:\...\PP2E\System\Streams>python writer2.py | python sorter.py    sort py output
000
042
123
999

C:\...\PP2E\System\Streams>python writer2.py | python sorter.py | python adder.py
1164
The last command here connects three Python scripts by standard streams -- the output of each prior script is fed to the input of the next via pipeline shell syntax.
If you look closely, you'll notice that sorter reads all of stdin at once with the readlines method, but adder reads one line at a time. If the input source is another program, some platforms run programs connected by pipes in parallel. On such systems, reading line-by-line works better if the data streams being passed about are large -- readers need not wait until writers are completely finished to get busy processing data. Because raw_input just reads stdin, the line-by-line scheme used by adder can always be coded with sys.stdin too:
C:\...\PP2E\System\Streams>type adder2.py
import sys, string
sum = 0
while 1:
    line = sys.stdin.readline( )
    if not line: break
    sum = sum + string.atoi(line[:-1])
print sum
Changing sorter to read line-by-line may not be a big performance boost, though, because the list sort method requires the list to already be complete. As we'll see in Chapter 17, manually coded sort algorithms are likely to be much slower than the Python list sorting method.
2.10.1.3 Redirected streams and user interaction
At the start of the last section, we piped teststreams.py output into the standard more command-line program with a command like this:
C:\...\PP2E\System\Streams>python teststreams.py < input.txt | more
But since we already wrote our own "more" paging utility in Python near the start of this chapter, why not set it up to accept input from stdin too? For example, if we change the last three lines of file more.py listed earlier in this chapter to this:
if __name__ == '__main__':                     # when run, not when imported
    if len(sys.argv) == 1:                     # page stdin if no cmd args
        more(sys.stdin.read( ))
    else:
        more(open(sys.argv[1]).read( ))
Then it almost seems as if we should be able to redirect the standard output of teststreams.py into the standard input of more.py:
C:\...\PP2E\System\Streams>python teststreams.py < input.txt | python ..\more.py
Hello stream world
Enter a number>8 squared is 64
Enter a number>6 squared is 36
Enter a number>Bye
This technique works in general for Python scripts. Here, teststreams.py takes input from a file again. And, as in the last section, one Python program's output is piped to another's input -- the more.py script in the parent ("..") directory.
2.10.1.3.1 Reading keyboard input
But there's a subtle problem lurking in the preceding more.py command. Really, chaining only worked there by sheer luck: if the first script's output is long enough for more to have to ask the user if it should continue, the script will utterly fail. The problem is that the augmented more.py uses stdin for two disjoint purposes. It reads a reply from an interactive user on stdin by calling raw_input, but now also accepts the main input text on stdin. When the stdin stream is really redirected to an input file or pipe, we can't use it to input a reply from an interactive user; it contains only the text of the input source. Moreover, because stdin is redirected before the program even starts up, the program has no way to know what stdin referred to before the command line redirected it.
If we intend to accept input on stdin and use the console for user interaction, we have to do a bit more. Example 2-9 shows a modified version of the more script that pages the standard input stream if called with no arguments, but also makes use of lower-level and platform-specific tools to converse with a user at a keyboard if needed.
Example 2-9. PP2E\System\moreplus.py
#############################################################
# split and interactively page a string, file, or stream of
# text to stdout; when run as a script, page stdin or file
# whose name is passed on cmdline; if input is stdin, can't
# use it for user reply--use platform-specific tools or gui;
#############################################################

import sys, string

def getreply( ):
    """
    read a reply key from an interactive user
    even if stdin redirected to a file or pipe
    """
    if sys.stdin.isatty( ):                       # if stdin is console
        return raw_input('?')                     # read reply line from stdin
    else:
        if sys.platform[:3] == 'win':             # if stdin was redirected
            import msvcrt                         # can't use to ask a user
            msvcrt.putch('?')
            key = msvcrt.getche( )                # use windows console tools
            msvcrt.putch('\n')                    # getch( ) does not echo key
            return key
        elif sys.platform[:5] == 'linux':         # use linux console device
            print '?',                            # strip eoln at line end
            console = open('/dev/tty')
            line = console.readline( )[:-1]
            return line
        else:
            print '[pause]'                       # else just pause--improve me
            import time                           # see also modules curses, tty
            time.sleep(5)                         # or copy to temp file, rerun
            return 'y'                            # or gui popup, tk key bind

def more(text, numlines=10):
    """
    split multi-line string to stdout
    """
    lines = string.split(text, '\n')
    while lines:
        chunk = lines[:numlines]
        lines = lines[numlines:]
        for line in chunk: print line
        if lines and getreply( ) not in ['y', 'Y']: break

if __name__ == '__main__':                        # when run, not when imported
    if len(sys.argv) == 1:                        # if no command-line arguments
        more(sys.stdin.read( ))                   # page stdin, no raw_inputs
    else:
        more(open(sys.argv[1]).read( ))           # else page filename argument
Most of the new code in this version shows up in its getreply function. The file isatty method tells us if stdin is connected to the console; if it is, we simply read replies on stdin as before. Unfortunately, there is no portable way to input a string from a console user independent of stdin, so we must wrap the non-stdin input logic of this script in a sys.platform test:
- On Windows, the built-in msvcrt module supplies low-level console input and output calls (e.g., msvcrt.getch( ) reads a single key press).
- On Linux, the system device file named /dev/tty gives access to keyboard input (we can read it as though it were a simple file).
- On other platforms, we simply run a built-in time.sleep call to pause for five seconds between displays (this is not at all ideal, but is better than not stopping at all, and serves until a better nonportable solution can be found).
Of course, we only have to add such extra logic to scripts that intend to interact with console users and take input on stdin. In a GUI application, for example, we could instead pop up dialogs, bind keyboard-press events to callbacks, and so on (we'll meet GUIs in Chapter 6).
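One way to keep such platform-specific logic out of the pager itself is to pass the reply source in. The sketch below is a Python 3 variant of more with two hypothetical hook parameters, getreply and out, which are not part of moreplus.py. The tty_reply function shows the /dev/tty approach as one possible hook on Unix-like systems; a GUI program could pass a dialog-based function instead, and a test can pass a simple stub.

```python
import io
import sys


def more(text, numlines=10, getreply=input, out=None):
    """Page a multiline string, numlines lines at a time.

    getreply and out are hypothetical hooks for this sketch: getreply is any
    callable that takes a prompt and returns the reply (input, a /dev/tty
    reader, a GUI dialog, a test stub); out is any object with a write method.
    """
    if out is None:
        out = sys.stdout              # looked up at call time, so redirection still works
    lines = text.split('\n')
    while lines:
        chunk, lines = lines[:numlines], lines[numlines:]
        for line in chunk:
            out.write(line + '\n')
        if lines and getreply('?') not in ('y', 'Y'):
            break


def tty_reply(prompt):
    """A possible hook on Unix-like systems: ask the console device, not stdin."""
    sys.stdout.write(prompt)
    sys.stdout.flush()
    with open('/dev/tty') as console:
        return console.readline().rstrip('\n')


if __name__ == '__main__':
    # Page 25 numbered lines into a string, answering 'n' at the first pause.
    buffer = io.StringIO()
    more('\n'.join(str(i) for i in range(25)), getreply=lambda prompt: 'n', out=buffer)
    print(buffer.getvalue(), end='')  # lines 0 through 9 only
```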
Armed with the reusable getreply function, though, we can safely run our moreplus utility in a variety of ways. As before, we can import and call this module's function directly, passing in whatever string we wish to page:
>>> from moreplus import more
>>> more(open('System.txt').read( ))
This directory contains operating system interface examples.
Many of the examples in this unit appear elsewhere in the examples distribution tree, because they are actually used to manage other programs. See the README.txt files in the subdirectories here for pointers.
Also as before, when run with a command-line argument, this script interactively pages through the named file's text:
C:\...\PP2E\System>python moreplus.py System.txt
This directory contains operating system interface examples.
Many of the examples in this unit appear elsewhere in the examples distribution tree, because they are actually used to manage other programs. See the README.txt files in the subdirectories here for pointers.

C:\...\PP2E\System>python moreplus.py moreplus.py
#############################################################
# split and interactively page a string, file, or stream of
# text to stdout; when run as a script, page stdin or file
# whose name is passed on cmdline; if input is stdin, can't
# use it for user reply--use platform-specific tools or gui;
#############################################################

import sys, string

def getreply( ):
?n
But now the script also correctly pages text redirected into stdin from either a file or command pipe, even if that text is too long to fit in a single display chunk. On most shells, we send such input via redirection or pipe operators like these:
C:\...\PP2E\System>python moreplus.py < moreplus.py
#############################################################
# split and interactively page a string, file, or stream of
# text to stdout; when run as a script, page stdin or file
# whose name is passed on cmdline; if input is stdin, can't
# use it for user reply--use platform-specific tools or gui;
#############################################################

import sys, string

def getreply( ):
?n

C:\...\PP2E\System>type moreplus.py | python moreplus.py
#############################################################
# split and interactively page a string, file, or stream of
# text to stdout; when run as a script, page stdin or file
# whose name is passed on cmdline; if input is stdin, can't
# use it for user reply--use platform-specific tools or gui;
#############################################################

import sys, string

def getreply( ):
?n
This works the same on Linux, but again use the cat command instead of type. Finally, piping one Python script's output into this script's input now works as expected, without botching user interaction (and not just because we got lucky):
C:\...\...\System\Streams>python teststreams.py < input.txt | python ..\moreplus.py
Hello stream world
Enter a number>8 squared is 64
Enter a number>6 squared is 36
Enter a number>Bye
Here, the standard output of one Python script is fed to the standard input of another Python script located in the parent directory: moreplus.py reads the output of teststreams.py.
All of the redirections in such command lines work only because scripts don't care what standard input and output really are -- interactive users, files, or pipes between programs. For example, when run as a script, moreplus.py simply reads stream sys.stdin; the command-line shell (e.g., DOS on Windows, csh on Linux) attaches such streams to the source implied by the command line before the script is started. Scripts use the preopened stdin and stdout file objects to access those sources, regardless of their true nature.
And for readers keeping count, we have run this single more pager script in four different ways: by importing and calling its function, by passing a filename command-line argument, by redirecting stdin to a file, and by piping a command's output to stdin. By supporting importable functions, command-line arguments, and standard streams, Python system tools code can be reused in a wide variety of modes.
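The dual-mode main block behind those last three modes can be factored into a small function, which makes the choice between a filename argument and stdin easy to reuse and test. This Python 3 sketch and its read_source name are assumptions for illustration; a script would call it as more(read_source(sys.argv, sys.stdin)).

```python
import io
import sys


def read_source(argv, stdin):
    """Return the text to page: the file named in argv[1], else all of stdin."""
    if len(argv) > 1:                 # a filename argument was given
        with open(argv[1]) as f:
            return f.read()
    return stdin.read()               # no arguments: use (possibly redirected) stdin


if __name__ == '__main__':
    # Simulate "python more.py < somefile" without touching the real stdin.
    print(read_source(['more.py'], io.StringIO('text from a pipe\n')), end='')
    print(len(sys.argv) > 0)          # argv always holds at least the script name
```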
2.10.2 Redirecting Streams to Python Objects
All of the above standard stream redirections work for programs written in any language that hooks into the standard streams, and rely more on the shell's command-line processor than on Python itself. Command-line redirection syntax like < filename and | program is evaluated by the shell, not Python. A more Pythonesque form of redirection can be done within scripts themselves, by resetting sys.stdin and sys.stdout to file-like objects.
The main trick behind this mode is that anything that looks like a file in terms of methods will work as a standard stream in Python. The object's protocol, not the object's specific datatype, is all that matters. That is:
- Any object that provides file-like read methods can be assigned to sys.stdin to make input come from that object's read methods.
- Any object that defines file-like write methods can be assigned to sys.stdout; all standard output will be sent to that object's methods.
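A minimal Python 3 illustration of that rule: the hypothetical Recorder class below has nothing in common with real files beyond a write method (plus a no-op flush for callers that expect one), yet print output lands in it once it is assigned to sys.stdout.

```python
import sys


class Recorder:
    """Hypothetical file-like object: a write method is all print() needs."""
    def __init__(self):
        self.parts = []

    def write(self, text):
        self.parts.append(text)
        return len(text)              # mimic real text files, which return a count

    def flush(self):                  # harmless no-op for callers that flush
        pass


def capture(func, *args):
    """Run func(*args) with sys.stdout temporarily set to a Recorder."""
    recorder = Recorder()
    saved, sys.stdout = sys.stdout, recorder
    try:
        func(*args)
    finally:
        sys.stdout = saved            # restore the real stream even on errors
    return ''.join(recorder.parts)


if __name__ == '__main__':
    text = capture(print, 'spam', 42)
    print(repr(text))                 # 'spam 42\n'
```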
Because print and raw_input simply call the write and readline methods of whatever objects sys.stdout and sys.stdin happen to reference, we can use this trick to both provide and intercept standard stream text with objects implemented as classes. Example 2-10 shows a utility module that demonstrates this concept.
Example 2-10. PP2E\System\Streams\redirect.py
##########################################################
# file-like objects that save all standard output text in
# a string, and provide standard input text from a string;
# redirect runs a passed-in function with its output and
# input streams reset to these file-like class objects;
##########################################################

import sys, string                             # get built-in modules

class Output:                                  # simulated output file
    def __init__(self):
        self.text = ''                         # empty string when created
    def write(self, string):                   # add a string of bytes
        self.text = self.text + string
    def writelines(self, lines):               # add each line in a list
        for line in lines: self.write(line)

class Input:                                   # simulated input file
    def __init__(self, input=''):              # default argument
        self.text = input                      # save string when created
    def read(self, *size):                     # optional argument
        if not size:                           # read N bytes, or all
            res, self.text = self.text, ''
        else:
            res, self.text = self.text[:size[0]], self.text[size[0]:]
        return res
    def readline(self):
        eoln = string.find(self.text, '\n')    # find offset of next eoln
        if eoln == -1:                         # slice off through eoln
            res, self.text = self.text, ''
        else:
            res, self.text = self.text[:eoln+1], self.text[eoln+1:]
        return res

def redirect(function, args, input):           # redirect stdin/out
    savestreams = sys.stdin, sys.stdout        # run a function object
    sys.stdin   = Input(input)                 # return stdout text
    sys.stdout  = Output( )
    try:
        apply(function, args)
    except:
        sys.stderr.write('error in function! ')
        sys.stderr.write("%s, %s\n" % (sys.exc_type, sys.exc_value))
    result = sys.stdout.text
    sys.stdin, sys.stdout = savestreams
    return result
This module defines two classes that masquerade as real files:
- Output provides the write method protocol expected of output files, but saves all output as it is written, in an in-memory string.
- Input provides the protocol expected of input files, but provides input on demand from an in-memory string, passed in at object construction time.
The redirect function at the bottom of this file combines these two objects to run a single function with input and output redirected entirely to Python class objects. The passed-in function so run need not know or care that its print statements, raw_input calls, and stdin and stdout method calls are talking to a class instead of a real file, pipe, or user.
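For comparison, here is a Python 3 sketch of the same redirect idea. It is an adaptation, not the module above: io.StringIO stands in for the Output and Input classes, function(*args) replaces apply, and a try/finally block guarantees the real streams are restored. (In Python 3.4 and later, the standard library's contextlib.redirect_stdout covers the output half of this; contextlib has no stdin counterpart.)

```python
import io
import sys


def redirect(function, args, input_text):
    """Run function(*args) with stdin read from input_text; return its stdout text."""
    saved = sys.stdin, sys.stdout
    output = io.StringIO()
    sys.stdin, sys.stdout = io.StringIO(input_text), output
    try:
        function(*args)
    except Exception as exc:          # report the error, as redirect.py does
        sys.stderr.write('error in function! %r\n' % (exc,))
    finally:
        sys.stdin, sys.stdout = saved # always restore the real streams
    return output.getvalue()


if __name__ == '__main__':
    def shout():
        print(input('word?').upper())
    print(repr(redirect(shout, (), 'spam\n')))   # 'word?SPAM\n'
```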
To demonstrate, import and run the interact function at the heart of the teststreams script we've been running from the shell (to use the redirection utility function, we need to deal in terms of functions, not files). When run directly, the function reads from the keyboard and writes to the screen, just as if it were run as a program without redirection:
C:\...\PP2E\System\Streams>python
>>> from teststreams import interact
>>> interact( )
Hello stream world
Enter a number>2
2 squared is 4
Enter a number>3
3 squared is 9
Enter a number
>>>
Now, let's run this function under the control of the redirection function in redirect.py, and pass in some canned input text. In this mode, the interact function takes its input from the string we pass in ('4\n5\n6\n' -- three lines with explicit end-of-line characters), and the result of running the function is a string containing all the text written to the standard output stream:
>>> from redirect import redirect
>>> output = redirect(interact, ( ), '4\n5\n6\n')
>>> output
'Hello stream world\012Enter a number>4 squared is 16\012Enter a number>5 squared is 25\012Enter a number>6 squared is 36\012Enter a number>Bye\012'
The result is a single, long string, containing the concatenation of all text written to standard output. To make this look better, we can split it up with the standard string module:
>>> from string import split
>>> for line in split(output, '\n'): print line
...
Hello stream world
Enter a number>4 squared is 16
Enter a number>5 squared is 25
Enter a number>6 squared is 36
Enter a number>Bye
Better still, we can reuse the more.py module we saw earlier in this chapter; it's less to type and remember, and is already known to work well:
>>> from PP2E.System.more import more
>>> more(output)
Hello stream world
Enter a number>4 squared is 16
Enter a number>5 squared is 25
Enter a number>6 squared is 36
Enter a number>Bye
This is an artificial example, of course, but the techniques illustrated are widely applicable. For example, it's straightforward to add a GUI interface to a program written to interact with a command-line user. Simply intercept standard output with an object like the Output class shown earlier, and throw the text string up in a window. Similarly, standard input can be reset to an object that fetches text from a graphical interface (e.g., a popped-up dialog box). Because classes are plug-and-play compatible with real files, we can use them in any tool that expects a file. Watch for a GUI stream-redirection module named guiStreams in Chapter 9.
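As a sketch of that idea, the hypothetical CallbackOutput class below (Python 3) buffers written text and hands each completed line to a callback, which in a GUI might append the line to a text widget. Flushing passes along any partial line, so a prompt written and then flushed (as input() normally does) reaches the callback as its own piece.

```python
import sys


class CallbackOutput:
    """Hypothetical file-like object that passes each finished line to a callback."""
    def __init__(self, callback):
        self.callback = callback      # e.g. a function that appends to a GUI widget
        self.pending = ''             # text written since the last newline

    def write(self, text):
        self.pending += text
        while '\n' in self.pending:
            line, self.pending = self.pending.split('\n', 1)
            self.callback(line)
        return len(text)

    def flush(self):
        if self.pending:              # deliver a partial line, such as a prompt
            self.callback(self.pending)
            self.pending = ''


if __name__ == '__main__':
    saved = sys.stdout
    sys.stdout = CallbackOutput(lambda line: sys.stderr.write('[window] ' + line + '\n'))
    try:
        print('Hello stream world')   # shows up on stderr as "[window] Hello stream world"
    finally:
        sys.stdout = saved
```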
2.10.3 Other Redirection Options
Earlier in this chapter, we also studied the built-in os.popen function, which provides a way to redirect another command's streams from within a Python program. As we saw, this function runs a shell command line (e.g., a string we would normally type at a DOS or csh prompt), but returns a Python file-like object connected to the command's input or output stream. Because of that, the os.popen tool can be considered another way to redirect streams of spawned programs, and a cousin to the techniques we just met: Its effect is much like the shell | command-line pipe syntax for redirecting streams to programs (in fact its name means "pipe open"), but it is run within a script and provides a file-like interface to piped streams. It's similar in spirit to the redirect function, but is based on running programs (not calling functions), and the command's streams are processed in the spawning script as files (not tied to class objects).
By passing in the desired mode flag, we redirect a spawned program's input or output stream to a file object in the calling script:
C:\...\PP2E\System\Streams>type hello-out.py
print 'Hello shell world'

C:\...\PP2E\System\Streams>type hello-in.py
input = raw_input( )
open('hello-in.txt', 'w').write('Hello ' + input + '\n')

C:\...\PP2E\System\Streams>python
>>> import os
>>> pipe = os.popen('python hello-out.py')     # 'r' is default--read stdout
>>> pipe.read( )
'Hello shell world\012'
>>> pipe = os.popen('python hello-in.py', 'w')
>>> pipe.write('Gumby\n')                       # 'w'--write to program stdin
>>> pipe.close( )                               # \n at end is optional
>>> open('hello-in.txt').read( )
'Hello Gumby\012'
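os.popen still exists in Python 3, and the subprocess module covers the same ground with more control. The sketch below assumes Python 3.7+; HELLO_OUT and HELLO_IN are hypothetical one-line stand-ins for the two scripts, and the stdin-side version writes its greeting to standard output rather than to hello-in.txt so the result can be returned directly.

```python
import os
import subprocess
import sys

# Hypothetical one-line stand-ins for hello-out.py and hello-in.py.
HELLO_OUT = "print('Hello shell world')"
HELLO_IN = "import sys; sys.stdout.write('Hello ' + input() + '\\n')"


def read_from_command():
    """os.popen in its default 'r' mode: read the command's standard output."""
    command = '"%s" -c "%s"' % (sys.executable, HELLO_OUT)
    with os.popen(command) as pipe:
        return pipe.read()


def write_to_command(text):
    """Send text to a command's stdin and collect what it prints, via subprocess."""
    result = subprocess.run([sys.executable, '-c', HELLO_IN],
                            input=text, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == '__main__':
    print(repr(read_from_command()))          # 'Hello shell world\n'
    print(repr(write_to_command('Gumby\n')))  # 'Hello Gumby\n'
```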
The popen call is also smart enough to run the command string as an independent process on Unix and Linux. There are additional popen-like tools in the Python library that allow scripts to connect to more than one of a command's streams. For instance, the popen2 module includes functions for hooking into both a command's input and output streams (popen2.popen2), and another for connecting to standard error as well (popen2.popen3):
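import popen2
input = 'Gumby\n'                                 # text to send (assumed for this fragment)
childStdout, childStdin = popen2.popen2('python hello-in-out.py')
childStdin.write(input)
childStdin.close( )                               # send end-of-file to the child
output = childStdout.read( )

childStdout, childStdin, childStderr = popen2.popen3('python hello-in-out.py')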
These two calls work much like os.popen, but connect additional streams. When I originally wrote this, these calls only worked on Unix-like platforms, not on Windows, because they relied on a fork call in Python 1.5.2. As of the Python 2.0 release, they now work well on Windows too.
Speaking of which: on Unix-like platforms, the combination of the calls os.fork, os.pipe, os.dup, and some os.exec variants can be used to start a new independent program with streams connected to the parent program's streams (that's how popen2 works its magic). As such, it's another way to redirect streams, and a low-level equivalent to tools like os.popen. See Chapter 3 for more on all these calls, especially its section on pipes.
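The popen2 module was deprecated in Python 2.6 and removed in Python 3.0; its popen2/popen3 role is filled by subprocess.Popen with a pipe for each stream. In this sketch, CHILD is a hypothetical stand-in for hello-in-out.py. Using communicate(), rather than writing and then reading by hand, avoids the deadlock that can occur when a child fills an output pipe while the parent is still writing to its input.

```python
import subprocess
import sys

# Hypothetical stand-in for hello-in-out.py: upper-cases its input and
# writes a short note to standard error.
CHILD = '''
import sys
data = sys.stdin.read()
sys.stdout.write(data.upper())
sys.stderr.write('done\\n')
'''


def popen3_style(text):
    """Talk to a child's stdin, stdout, and stderr at once (popen3-like)."""
    child = subprocess.Popen([sys.executable, '-c', CHILD],
                             stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                             stderr=subprocess.PIPE, text=True)
    out, err = child.communicate(text)    # write input, close stdin, read both outputs
    return out, err


if __name__ == '__main__':
    print(popen3_style('Gumby\n'))        # ('GUMBY\n', 'done\n')
```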