12.3 Climbing the CGI Learning Curve
Okay, it's time to get into concrete programming details. This section introduces CGI coding one step at a time -- from simple, noninteractive scripts to larger programs that utilize all the common web page user input devices (what we called "widgets" in the Tkinter GUI chapters of Part II). We'll move slowly at first, to learn all the basics; the next two chapters will use the ideas presented here to build up larger and more realistic web site examples. For now, let's work through a simple CGI tutorial, with just enough HTML thrown in to write basic server-side scripts.
12.3.1 A First Web Page
As mentioned, CGI scripts are intimately bound up with HTML, so let's start with a simple HTML page. The file test0.html, shown in Example 12-1, defines a bona fide, fully functional web page -- a text file containing HTML code, which specifies the structure and contents of a simple web page.
Example 12-1. PP2E\Internet\Cgi-Web\Basics\test0.html
    <HTML><BODY>
    <TITLE>HTML 101</TITLE>
    <H1>A First HTML page</H1>
    <P>Hello, HTML World!</P>
    </BODY></HTML>
If you point your favorite web browser to the Internet address of this file (or to its local path on your own machine), you should see a page like that shown in Figure 12-2. This figure shows the Internet Explorer browser at work; other browsers render the page similarly.
Figure 12-2. A simple web page from an HTML file
To truly understand how this little file does its work, you need to know something about permission rules, HTML syntax, and Internet addresses. Let's take a quick first look at each of these topics before we move on to larger examples.
12.3.1.1 HTML file permission constraints
First of all, if you want to install this code on a different machine, it's usually necessary to grant web page files and their directories world-readable permission. That's because they are loaded by arbitrary people over the Web (actually, by someone named "nobody", who we'll introduce in a moment). An appropriate chmod command can be used to change permissions on Unix-like machines. For instance, a chmod 755 filename shell command usually suffices; it makes filename readable and executable by everyone, and writable by you only.[2] These directory and file permission details are typical, but they can vary from server to server. Be sure to find out about the local server's conventions if you upload this file to your site.
[2] These are not necessarily magic numbers. On Unix machines, mode 755 is a bit mask. The first 7 simply means that you (the file's owner) can read, write, and execute the file (7 in binary is 111 -- each bit enables an access mode). The two 5s (binary 101) say that everyone else (your group and others) can read and execute (but not write) the file. See your system's manpage on the chmod command for more details.
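The bit-mask arithmetic in this footnote is easy to check from Python itself. The following is only a minimal sketch, written in Python 3 syntax (an assumption; the listings in this chapter use Python 2), and the helper name describe_mode is invented for illustration:

```python
import stat

def describe_mode(mode):
    """Return (owner, group, other) permission strings for a numeric mode."""
    # stat.filemode expects a full mode word, so mark it as a regular file
    text = stat.filemode(stat.S_IFREG | mode)
    return text[1:4], text[4:7], text[7:10]

owner, group, other = describe_mode(0o755)   # 0o755 is the Python 3 octal literal
print(owner, group, other)                   # one rwx-style string per class
print(format(0o755, '09b'))                  # the nine permission bits
```

Each group of three bits corresponds to one digit of the octal number, which is why 7 grants read, write, and execute, and 5 grants read and execute only.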
12.3.1.2 HTML basics
I promised that I wouldn't teach much HTML in this book, but you need to know enough to make sense of examples. In short, HTML is a descriptive markup language, based on tags -- items enclosed in <> pairs. Some tags stand alone (e.g., <HR> specifies a horizontal rule). Others appear in begin/end pairs where the end tag includes an extra slash.
For instance, to specify the text of a level-1 header line, we write HTML code of the form <H1>text</H1>; the text between the tags shows up on the web page. Some tags also allow us to specify options. For example, a tag pair like <A href="address">text</A> specifies a hyperlink: pressing the link's text in the page directs the browser to access the Internet address (URL) listed in the href option.
It's important to keep in mind that HTML is used only to describe pages: your web browser reads it and translates its description to a web page with headers, paragraphs, links, and the like. Notably absent are both layout information -- the browser is responsible for arranging components on the page -- and syntax for programming logic -- there are no "if" statements, loops, and so on. There is also no Python code anywhere in this file; raw HTML is strictly for defining pages, not for coding programs or specifying all user-interface details.
HTML's lack of user interface control and programmability is both a strength and a weakness. It's well-suited to describing pages and simple user interfaces at a high level. The browser, not you, handles physically laying out the page on your screen. On the other hand, HTML does not directly support full-blown GUIs and requires us to introduce CGI scripts (and other technologies) to web sites, in order to add dynamic programmability to otherwise static HTML.
12.3.1.3 Internet addresses (URLs)
Once you write an HTML file, you need to put it some place where the outside world can find it. Like all HTML files, test0.html must be stored in a directory on the server machine, from which the resident web server program allows browsers to fetch pages. On the server where this example lives, the page's file must be stored in or below the public_html directory of my personal home directory -- that is, somewhere in the directory tree rooted at /home/lutz/public_html. For this section, examples live in a Basics subdirectory, so the complete Unix pathname of this file on the server is:
/home/lutz/public_html/Basics/test0.html
This path is different from its PP2E\Internet\Cgi-Web\Basics location on the book's CD (see http://examples.oreilly.com/python2), as given in the example file listing's title. When you reference this file on the client, though, you must specify its Internet address, sometimes called a URL, instead. To load the remote page, type the following text in your browser's address field (or click the example root page's test0.html hyperlink, which refers to the same address):
http://starship.python.net/~lutz/Basics/test0.html
This string is a URL composed of multiple parts:
Protocol name: http
The protocol part of this URL tells the browser to communicate with the HTTP server program on the server machine, using the HTTP message protocol. URLs used in browsers can also name different protocols -- for example, ftp:// to reference a file managed by the FTP protocol and server, telnet to start a Telnet client session, and so on.
Server machine name: starship.python.net
A URL also names the target server machine following the protocol type. Here, we list the domain name of the server machine where the examples are installed; the machine name listed is used to open a socket to talk to the server. For HTTP, the socket is usually connected to port number 80.
File path: ~lutz/Basics/test0.html
Finally, the URL gives the path to the desired file on the remote machine. The HTTP web server automatically translates the URL's file path to the file's true Unix pathname: on my server, ~lutz is automatically translated to the public_html directory in my home directory. URLs typically map to such files, but can reference other sorts of items as well.
Parameters (used in later examples)
URLs may also be followed by additional input parameters for CGI programs. When used, they are introduced by a ? and separated by & characters; for instance, a string of the form ?name=bob&job=hacker at the end of a URL passes parameters named name and job to the CGI script named earlier in the URL. These values are sometimes called URL query string parameters and are treated the same as form inputs. More on both forms and parameters in a moment.
For completeness, you should also know that URLs can contain additional information (e.g., the server name part can specify a port number following a :), but we'll ignore these extra formatting rules here. If you're interested in more details, you might start by reading the urlparse module's entry in Python's library manual, as well as its source code in the Python standard library. You might also notice that a URL you type to access a page looks a bit different after the page is fetched (spaces become + characters, %s are added, etc.). This is simply because browsers must also generally follow URL escaping (i.e., translation) conventions, which we'll explore later in this chapter.
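To see these URL parts in code, here is a small sketch using Python 3's urllib.parse module, which took over the job of the urlparse module mentioned above. Treat it as an illustration only: the query string is tacked onto the test0.cgi address purely for demonstration, and the explicit port is shown just to illustrate the colon syntax.

```python
from urllib.parse import urlsplit, parse_qs, quote_plus

# Hypothetical address: the book's server and path plus sample parameters
url = 'http://starship.python.net/~lutz/Basics/test0.cgi?name=bob&job=hacker'

parts = urlsplit(url)
print(parts.scheme)              # protocol name
print(parts.netloc)              # server machine name
print(parts.path)                # file path on the server
params = parse_qs(parts.query)   # maps each parameter name to a list of values
print(params)

# A port number may follow the server name after a colon
with_port = urlsplit('http://starship.python.net:80/~lutz/Basics/test0.html')
print(with_port.hostname, with_port.port)

# URL escaping: spaces become +, other special characters become %XX
escaped = quote_plus('a b&c')
print(escaped)
```

Note that parse_qs returns a list for each name, because a parameter may legally appear more than once in a query string.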
12.3.1.4 Using minimal URLs
Because browsers remember the prior page's Internet address, URLs embedded in HTML files can often omit the protocol and server names, as well as the file's directory path. If missing, the browser simply uses these components' values from the last page's address. This minimal syntax works both for URLs embedded in hyperlinks and form actions (we'll meet forms later in this chapter). For example, within a page that was fetched from directory dirpath on server www.server.com, minimal hyperlinks and form actions such as:
<a href="more.html"> </a>
are treated exactly as if we had specified a complete URL with explicit server and path components, like the following:
<a href="http://www.server.com/dirpath/more.html"> </a>
The minimal URL here refers to file more.html on the same server and in the same directory that the page containing this hyperlink was fetched from; it is expanded to a complete URL within the browser. URLs can also employ Unix-style relative path syntax in the file path component. For instance, a hyperlink tag with the option HREF="../spam.gif" names a GIF file on the same server machine, in the parent directory of the file that contains this link's URL.
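The expansion a browser performs on minimal URLs can be approximated with urljoin from Python 3's urllib.parse module. This is only a sketch of the idea, not a description of any browser's internals, and the containing page name page.html is made up for the example:

```python
from urllib.parse import urljoin

# Hypothetical address of the page that contains the minimal links
page = 'http://www.server.com/dirpath/page.html'

same_dir = urljoin(page, 'more.html')      # same server, same directory
parent = urljoin(page, '../spam.gif')      # same server, parent directory
print(same_dir)
print(parent)
```

In both calls the protocol and server name come from the containing page's address, which is why such links keep working when a site moves.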
Why all the fuss about shorter URLs? Besides extending the life of your keyboard and eyesight, the main advantage of such minimal URLs is that they don't need to be changed if you ever move your pages to a new directory or server -- the server and path are inferred when the page is used, not hardcoded into its HTML. The flipside of this can be fairly painful: examples that do include explicit site and pathnames in URLs embedded within HTML code cannot be copied to other servers without source code changes. Scripts can help here, but editing source code can be error-prone.[3]
[3] To make this process easier, the fixsitename.py script presented in the next section largely automates the necessary changes by performing global search-and-replace operations and directory walks. A few book examples do use complete URLs, so be sure to run this script after copying examples to a new site.
The downside of minimal URLs is that they don't trigger automatic Internet connections when followed. This becomes apparent only when you load pages from local files on your computer. For example, we can generally open HTML pages without connecting to the Internet at all, by pointing a web browser to a page's file that lives on the local machine (e.g., by clicking on its file icon). When browsing a page locally like this, following a fully specified URL makes the browser automatically connect to the Internet to fetch the referenced page or script. Minimal URLs, though, are opened on the local machine again; usually, the browser simply displays the referenced page or script's source code.
The net effect is that minimal URLs are more portable, but tend to work better when running all pages live on the Internet. To make it easier to work with the examples in this book, they will often omit the server and path components in URLs they contain. In this book, to derive a page or script's true URL from a minimal URL, imagine that the string:
http://starship.python.net/~lutz/subdir
appears before the filename given by the URL. Your browser will, even if you don't.
12.3.2 A First CGI Script
The HTML file we just saw is just that -- an HTML file, not a CGI script. When referenced by a browser, the remote web server simply sends back the file's text to produce a new page in the browser. To illustrate the nature of CGI scripts, let's recode the example as a Python CGI program, as shown in Example 12-2.
Example 12-2. PP2E\Internet\Cgi-Web\Basics\test0.cgi
    #!/usr/bin/python
    #######################################################
    # runs on the server, prints html to create a new page;
    # executable permissions, stored in ~lutz/public_html,
    # url=http://starship.python.net/~lutz/Basics/test0.cgi
    #######################################################

    print "Content-type: text/html\n"
    print "<TITLE>CGI 101</TITLE>"
    print "<H1>A First CGI script</H1>"
    print "<P>Hello, CGI World!</P>"
This file, test0.cgi, makes the same sort of page if you point your browser at it (simply replace .html with .cgi in the URL). But it's a very different kind of animal -- it's an executable program that is run on the server in response to your access request. It's also a completely legal Python program, in which the page's HTML is printed dynamically, rather than being precoded in a static file. In fact, there is little that is CGI-specific about this Python program at all; if run from the system command line, it simply prints HTML rather than generating a browser page:
    C:\...\PP2E\Internet\Cgi-Web\Basics>python test0.cgi
    Content-type: text/html

    <TITLE>CGI 101</TITLE>
    <H1>A First CGI script</H1>
    <P>Hello, CGI World!</P>
When run by the HTTP server program on a web server machine, however, the standard output stream is tied to a socket read by the browser on the client machine. In this context, all the output is sent across the Internet to your browser. As such, it must be formatted per the browser's expectations. In particular, when the script's output reaches your browser, the first printed line is interpreted as a header, describing the text that follows. There can be more than one header line in the printed response, but there must always be a blank line between the headers and the start of the HTML code (or other data).
In this script, the first header line tells the browser that the rest of the transmission is HTML text (text/html), and the newline character (\n) at the end of the first print statement generates one more line-feed than the print statement itself. The rest of this program's output is standard HTML and is used by the browser to generate a web page on a client, exactly as if the HTML lived in a static HTML file on the server.[4]
[4] Notice that the script does not generate the enclosing tags (such as a BODY tag pair) used in the static HTML file of the prior section. Strictly speaking, it should -- HTML without such tags is invalid. But all commonly used browsers simply ignore the omission.
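The header-then-blank-line layout of a CGI reply can be made explicit in code. The sketch below is written in Python 3 syntax (an assumption; the example script itself is Python 2), and the function names build_reply and split_reply are invented for illustration rather than taken from any library:

```python
def build_reply(html, content_type='text/html'):
    """Join the header lines and the page text with the required blank line."""
    headers = ['Content-type: ' + content_type]
    return '\n'.join(headers) + '\n\n' + html

def split_reply(reply):
    """Split a reply at the first blank line, roughly as a browser would."""
    head, _, body = reply.partition('\n\n')
    return head.split('\n'), body

page = ('<TITLE>CGI 101</TITLE>\n'
        '<H1>A First CGI script</H1>\n'
        '<P>Hello, CGI World!</P>\n')
reply = build_reply(page)
headers, body = split_reply(reply)
print(reply, end='')     # a CGI script would send this text to standard output
```

Everything before the first blank line is header text; everything after it is the page. Leaving out the blank line is a common cause of server error pages.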
CGI scripts are accessed just like HTML files: you either type the full URL of this script into your browser's address field, or click on the test0.cgi link line in the examples root page (which follows a minimal hyperlink that resolves to the script's full URL). Figure 12-3 shows the result page generated if you point your browser at this script to make it go.
Figure 12-3. A simple web page from a CGI script
12.3.2.1 Installing CGI scripts
Like HTML files, CGI scripts are simple text files that you can either create on your local machine and upload to the server by FTP, or write with a text editor running directly on the server machine (perhaps using a telnet client). However, because CGI scripts are run as programs, they have some unique installation requirements that differ from simple HTML files. In particular, they usually must be stored and named specially, and they must be configured as programs that are executable by arbitrary users. Depending on your needs, CGI scripts may also need help finding imported modules and may need to be converted to the server platform's text file format after being uploaded. Let's look at each install constraint in more depth:
Directory and filename conventions
First of all, CGI scripts need to be placed in a directory that your web server recognizes as a program directory, and they need to be given a name that your server recognizes as a CGI script. On the server where these examples reside, CGI scripts can be stored in each user's public_html directory just like HTML files, but must have a filename ending in a .cgi suffix, not .py. Some servers allow .py filename suffixes too, and may recognize other program directories (cgi-bin is common), but this varies widely, too, and can sometimes be configured per server or user.
Execution conventions
Because they must be executed by the web server on behalf of arbitrary users on the Web, CGI script files also need to be given executable file permissions to mark them as programs, and they must be made executable by others. Again, a shell command chmod 0755 filename does the trick on most servers. CGI scripts also generally need the special #! line at the top, to identify the Python interpreter that runs the file's code. The text after the #! in the first line simply gives the directory path to the Python executable on your server machine. See Chapter 2 for more details on this special first line, and be sure to check your server's conventions for more details on non-Unix platforms.
One subtlety is worth noting. As we saw earlier in the book, the special first line in executable text files can normally contain either a hardcoded path to the Python interpreter (e.g., #!/usr/bin/python) or an invocation of the env program (e.g., #!/usr/bin/env python), which deduces where Python lives from environment variable settings (i.e., your $PATH). The env trick is less useful in CGI scripts, though, because their environment settings are those of user "nobody" (not your own), as explained in the next paragraph.
Module search path configuration (optional)
HTTP servers generally run CGI scripts with username "nobody" for security reasons (this limits the user's access to the server machine). That's why files you publish on the Web must have special permission settings that make them accessible to other users. It also means that CGI scripts can't rely on the Python module search path to be configured in any particular way. As we've seen, the module path is normally initialized from the user's PYTHONPATH setting plus defaults. But because CGI scripts are run by user "nobody", PYTHONPATH may be arbitrary when a CGI script runs.
Before you puzzle over this too hard, you should know that this is often not a concern in practice. Because Python usually searches the current directory for imported modules by default, this is not an issue if all of your scripts and any modules and packages they use are stored in your web directory (which is the installation structure on the book's site). But if a module lives elsewhere, you may need to tweak the sys.path list in your scripts to adjust the search path manually before imports (e.g., with sys.path.append(dirname) calls, index assignments, and so on).
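A manual search-path tweak of this sort might look like the following sketch. It is written in Python 3 syntax, the helper name add_module_dir is invented here, and the directory /home/lutz/modules is purely hypothetical:

```python
import sys

def add_module_dir(dirname, first=False):
    """Add dirname to the module search path once; return True if it was added."""
    if dirname in sys.path:
        return False                  # already searched, nothing to do
    if first:
        sys.path.insert(0, dirname)   # searched before everything else
    else:
        sys.path.append(dirname)      # searched after the defaults
    return True

# Hypothetical directory holding modules stored outside the web directory
add_module_dir('/home/lutz/modules')
# imports of modules stored there would follow this call
```

The call has to run before the import statements that depend on it, since Python consults sys.path at the moment each import executes.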
End-of-line conventions (optional)
Finally, on some Unix (and Linux) servers, you might also have to make sure that your script text files follow the Unix end-of-line convention (\n), not DOS (\r\n). This isn't an issue if you edit and debug right on the server (or on another Unix machine) or FTP files one by one in text mode. But if you edit and upload your scripts from a PC to a Unix server in a tar file (or in FTP binary mode), you may need to convert end-of-lines after the upload. For instance, the server that was used to develop this text returns a default error page for scripts whose end-of-lines are in DOS format (see later in this chapter for a converter script).
This installation process may sound a bit complex at first glance, but it's not bad once you've worked through it on your own: it's only a concern at install time and can usually be automated to some extent with Python scripts run on the server. To summarize, most Python CGI scripts are text files of Python code, which:
- Are named according to your web server's conventions (e.g., file.cgi)
- Are stored in a directory recognized by your web server (e.g., cgi-bin/)
- Are given executable file permissions (e.g., chmod 755 file.cgi)
- Usually have the special #!pythonpath line at the top (but not env)
- Configure sys.path only if needed to see modules in other directories
- Use Unix end-of-line conventions, only if your server rejects DOS format
- Print headers and HTML to generate a response page in the browser, if any
- Use the cgi module to parse incoming form data, if any (more about forms later in this chapter)
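Some of the items in this checklist can be verified mechanically. The following sketch, in Python 3 syntax, checks three of them for a single file on a Unix-like server: the #! line, the end-of-line format, and the permission bits. The function name check_cgi_file and its messages are invented for illustration, and naming and directory conventions are left out because they depend on the server:

```python
import os
import stat

def check_cgi_file(path):
    """Return a list of likely install problems for one CGI script file."""
    problems = []
    with open(path, 'rb') as script:       # binary mode keeps line ends intact
        data = script.read()
    if not data.startswith(b'#!'):
        problems.append('no #! line at the top')
    if b'\r\n' in data:
        problems.append('DOS end-of-lines')
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & 0o555 != 0o555:              # r and x bits for owner, group, others
        problems.append('not readable and executable by everyone')
    return problems
```

A call such as check_cgi_file('Basics/test0.cgi') returns an empty list when none of the three problems is found.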
Even if you must use a server machine configured by someone else, most of the machine's conventions should be easy to root out. For instance, on some servers you can rename this example to test0.py and it will continue to be run when accessed. On others, you might instead see the file's source code in a popped-up text editor when you access it. Try a .cgi suffix if the text is displayed rather than executed. CGI directory conventions can vary, too, but try the directory where you normally store HTML files first. As usual, you should consult the conventions for any machine that you plan to copy these example files to.
12.3.2.2 Automating installation steps
But wait -- why do things the hard way? Before you start installing scripts by hand, remember that Python programs can usually do much of your work for you. It's easy to write Python scripts that automate some of the CGI installation steps using the operating system tools that we met earlier in the book.
For instance, while developing the examples in this chapter, I did all editing on my PC (it's generally more dependable than a telnet client). To install, I put all the examples in a tar file, which is uploaded to the Linux server by FTP in a single step. Unfortunately, my server expects CGI scripts to have Unix (not DOS) end-of-line markers; unpacking the tar file did not convert end-of-lines or retain executable permission settings. But rather than tracking down all the web CGI scripts and fixing them by hand, I simply run the Python script in Example 12-3 from within a Unix find command after each upload.
Example 12-3. PP2E\Internet\Cgi-Web\fixcgi.py
    ########################################################################
    # run from a unix find command to automate some cgi script install steps;
    # example: find . -name "*.cgi" -print -exec python fixcgi.py {} \;
    # which converts all cgi scripts to unix line-feed format (needed on
    # starship) and gives all cgi files executable mode, else won't be run;
    # do also: chmod 777 PyErrata/DbaseFiles/*, vi Extern/Email/mailconfig*;
    # related: fixsitename.py, PyTools/fixeoln*.py, System/Filetools
    ########################################################################

    # after: ungzip, untar, cp -r Cgi-Web/* ~/public_html

    import sys, string, os
    fname = sys.argv[1]
    old   = open(fname, 'rb').read()
    new   = string.replace(old, '\r\n', '\n')       # DOS to Unix line ends
    open(fname, 'wb').write(new)
    if fname[-3:] == 'cgi': os.chmod(fname, 0755)   # note octal int: rwx,sgo
This script is kicked off at the top of the Cgi-Web directory, using a Unix csh shell command to apply it to every CGI file in a directory tree, like this:
    % find . -name "*.cgi" -print -exec python fixcgi.py {} \;
    ./Basics/languages-src.cgi
    ./Basics/getfile.cgi
    ./Basics/languages.cgi
    ./Basics/languages2.cgi
    ./Basics/languages2reply.cgi
    ./Basics/putfile.cgi
    ...more...
Recall from Chapter 2 that there are various ways to walk directory trees and find matching files in pure Python code, including the find module, os.path.walk, and one we'll use in the next section's script. For instance, a pure Python and more portable alternative could be kicked off like this:
    C:\...\PP2E\Internet\Cgi-Web>python
    >>> import os
    >>> from PP2E.PyTools.find import find
    >>> for filename in find('*.cgi', '.'):
    ...     print filename
    ...     stat = os.system('python fixcgi.py ' + filename)
    ...
    .\Basics\getfile.cgi
    .\Basics\languages-src.cgi
    .\Basics\languages.cgi
    .\Basics\languages2.cgi
    ...more...
The Unix find command simply does the same, but outside the scope of Python: the command line after -exec is run for each matching file found. For more details about the find command, see its manpage. Within the Python script, string.replace translates to Unix end-of-line markers, and os.chmod works just like a shell chmod command. There are other ways to translate end-of-lines, too; see Chapter 5.
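For comparison, the same two install steps can be folded into a single tree walk. The sketch below uses Python 3 syntax and the standard os.walk call in place of the find command and the book's find module; the function name fix_cgi_tree is invented here, and this is not the book's fixcgi.py:

```python
import os

def fix_cgi_tree(root='.'):
    """Give *.cgi files under root Unix line ends and mode 755; return their paths."""
    fixed = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith('.cgi'):
                continue
            path = os.path.join(dirpath, name)
            with open(path, 'rb') as script:        # binary: no line-end translation
                old = script.read()
            new = old.replace(b'\r\n', b'\n')       # DOS to Unix end-of-lines
            if new != old:
                with open(path, 'wb') as script:
                    script.write(new)
            os.chmod(path, 0o755)                   # same effect as: chmod 755 path
            fixed.append(path)
    return fixed
```

Run from the top of the examples directory, a call like fix_cgi_tree('.') plays the role of the find command shown earlier, without spawning a new Python process for each file.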
12.3.2.3 Automating site move edits
Speaking of installation tasks, a common pitfall of web programming is that hardcoded site names embedded in HTML code stop working the minute you relocate the site to a new server. Minimal URLs (just the filename) are more portable, but for various reasons are not always used. Somewhere along the way, I also grew tired of updating URLs in hyperlinks and form actions, and wrote a Python script to do it all for me (see Example 12-4).
Example 12-4. PP2E\Internet\Cgi-Web\fixsitename.py
    #!/usr/bin/env python
    ###############################################################
    # run this script in Cgi-Web dir after copying book web
    # examples to a new server--automatically changes all starship
    # server references in hyperlinks and form action tags to the
    # new server/site; warns about references that weren't changed
    # (may need manual editing); note that starship references are
    # not usually needed or used--since browsers have memory, server
    # and path can usually be omitted from a URL in the prior page
    # if it lives at the same place (e.g., "file.cgi" is assumed to
    # be in the same server/path as a page that contains this name,
    # with a real url like "http://lastserver/lastpath/file.cgi"),
    # but a handful of URLs are fully specified in book examples;
    # reuses the Visitor class developed in the system chapters,
    # to visit and convert all files at and below current dir;
    ###############################################################

    import os, string
    from PP2E.PyTools.visitor import FileVisitor          # os.path.walk wrapper

    listonly = 0
    oldsite  = 'starship.python.net/~lutz'                # server/rootdir in book
    newsite  = 'XXXXXX/YYYYYY'                            # change to your site
    warnof   = ['starship.python', 'lutz']                # warn if left after fix
    fixext   = ['.py', '.html', '.cgi']                   # file types to check

    class FixStarship(FileVisitor):
        def __init__(self, listonly=0):                   # replace oldsite refs
            FileVisitor.__init__(self, listonly=listonly) # in all web text files
            self.changed, self.warning = [], []           # need diff lists here
        def visitfile(self, fname):                       # or use find.find list
            FileVisitor.visitfile(self, fname)
            if self.listonly:
                return
            if os.path.splitext(fname)[1] in fixext:
                text = open(fname, 'r').read()
                if string.find(text, oldsite) != -1:
                    text = string.replace(text, oldsite, newsite)
                    open(fname, 'w').write(text)
                    self.changed.append(fname)
                for word in warnof:
                    if string.find(text, word) != -1:
                        self.warning.append(fname); break

    if __name__ == '__main__':
        # don't run auto if clicked
        go = raw_input('This script changes site in all web files; continue?')
        if go != 'y':
            raw_input('Canceled - hit enter key')
        else:
            walker = FixStarship(listonly)
            walker.run()
            print 'Visited %d files and %d dirs' % (walker.fcount, walker.dcount)

            def showhistory(label, flist):
                print '\n%s in %d files:' % (label, len(flist))
                for fname in flist:
                    print '=>', fname
            showhistory('Made changes', walker.changed)
            showhistory('Saw warnings', walker.warning)

            def edithistory(flist):
                for fname in flist:                       # your editor here
                    os.system('vi ' + fname)
            if raw_input('Edit changes?') == 'y': edithistory(walker.changed)
            if raw_input('Edit warnings?') == 'y': edithistory(walker.warning)
This is a more complex script that reuses the visitor.py module we wrote in Chapter 5 to wrap the os.path.walk call. If you read that chapter, this script will make sense. If not, we won't repeat the details here. Suffice it to say that this program visits all source code files at and below the directory where it is run, globally changing all starship.python.net/~lutz appearances to whatever you've assigned to variable newsite within the script. On request, it will also launch your editor to view files changed, as well as files that contain potentially suspicious strings. As coded, it launches the Unix vi text editor at the end, but you can change this to start whatever editor you like (this is Python, after all):
    C:\...\PP2E\Internet\Cgi-Web>python fixsitename.py
    This script changes site in all web files; continue?y
    . ...
    1 => .\PyInternetDemos.html
    2 => .\README.txt
    3 => .\fixcgi.py
    4 => .\fixsitename.py
    5 => .\index.html
    6 => .\python_snake_ora.gif
    .\Basics ...
    7 => .\Basics\mlutz.jpg
    8 => .\Basics\languages.html
    9 => .\Basics\languages-src.cgi
    ...more...
    146 => .\PyMailCgi\temp\secret.doc.txt
    Visited 146 files and 16 dirs

    Made changes in 8 files:
    => .\fixsitename.py
    => .\Basics\languages.cgi
    => .\Basics\test3.html
    => .\Basics\test0.py
    => .\Basics\test0.cgi
    => .\Basics\test5c.html
    => .\PyMailCgi\commonhtml.py
    => .\PyMailCgi\sendurl.py

    Saw warnings in 14 files:
    => .\PyInternetDemos.html
    => .\fixsitename.py
    => .\index.html
    => .\Basics\languages.cgi
    ...more...
    => .\PyMailCgi\pymailcgi.html
    => .\PyMailCgi\commonhtml.py
    => .\PyMailCgi\sendurl.py
    Edit changes?n
    Edit warnings?y
The net effect is that this script automates part of the site relocation task: running it will update all pages' URLs for the new site name automatically, which is considerably less aggravating than hunting down and editing each such reference by hand.
<a href="http://www.server.com/dirpath/more.html">There aren't many hardcoded starship site references in web examples in this book (the script found and fixed eight above), but be sure to run this script in the Cgi-Web directory from a command line, after copying the book examples to your own site. To use this script for other site moves, simply set both oldsite and newsite as appropriate. The truly ambitious scriptmaster might even run such a script from within another that first copies a site's contents by FTP (see ftplib in the previous chapter).[5] </a>
<a href="http://www.server.com/dirpath/more.html">[5] As I mentioned at the start of this chapter, there are often multiple ways to accomplish any given webmaster-y task. For instance, the HTML <BASE> tag may provide an alternative way to map absolute URLs, and FTPing your web site files to your server individually and in text mode might obviate line-end issues. There are undoubtedly other ways to handle such tasks, too. On the other hand, such alternatives wouldn't be all that useful in a book that illustrates Python coding techniques. </a>
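To make the idea concrete without depending on visitor.py, here is a minimal sketch of the same kind of site-name replacement. It is not the book's fixsitename.py: it is written in present-day Python 3 rather than the Python 2 used in this chapter's listings, it uses os.walk instead of the visitor classes, and the function name replace_site and its extensions parameter are made up for illustration.

```python
import os

def replace_site(root, oldsite, newsite,
                 extensions=(".html", ".py", ".cgi", ".txt")):
    """Rewrite oldsite to newsite in text files at and below root.

    Returns the list of files that were changed. The extensions
    filter is an assumption: it simply keeps us away from images
    and other binary files.
    """
    changed = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(extensions):
                continue                       # skip non-text files
            path = os.path.join(dirpath, name)
            # newline="" leaves line ends exactly as they are on disk
            with open(path, encoding="utf-8",
                      errors="surrogateescape", newline="") as f:
                text = f.read()
            if oldsite in text:
                with open(path, "w", encoding="utf-8",
                          errors="surrogateescape", newline="") as f:
                    f.write(text.replace(oldsite, newsite))
                changed.append(path)
    return changed
```

A call such as replace_site('.', 'starship.python.net/~lutz', 'your.server.name/~you') would play the role of the run shown above, minus the warnings and the editor launch. The second site name here is only a placeholder.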
<a href="http://www.server.com/dirpath/more.html">12.3.2.4 Finding Python on the server</a>
<a href="http://www.server.com/dirpath/more.html">One last install pointer: even though Python doesn't have to be installed on any clients in the context of a server-side web application, it does have to exist on the server machine where your CGI scripts are expected to run. If you are using a web server that you did not configure yourself, you must be sure that Python lives on that machine. Moreover, you need to find where it is on that machine so that you can specify its path in the #! line at the top of your script. </a>
<a href="http://www.server.com/dirpath/more.html">By now, Python is a pervasive tool, so this generally isn't as big a concern as it once was. As time goes by, it will become even more common to find Python as a standard component of server machines. But if you're not sure if or where Python lives on yours, here are some tips: </a>
- <a href="http://www.server.com/dirpath/more.html">Especially on Unix systems, you should first assume that Python lives in a standard place (e.g., /usr/local/bin/python), and see if it works. Chances are that Python already lives on such machines. If you have Telnet access on your server, a Unix find command starting at /usr may help. </a>
- <a href="http://www.server.com/dirpath/more.html">If your server runs Linux, you're probably set to go. Python ships as a standard part of Linux distributions these days, and many web sites and Internet Service Providers (ISPs) run the Linux operating system; at such sites, Python probably already lives at /usr/bin/python. </a>
- <a href="http://www.server.com/dirpath/more.html">In other environments where you cannot control the server machine yourself, it may be harder to obtain access to an already-installed Python. If so, you can relocate your site to a server that does have Python installed, talk your ISP into installing Python on the machine you're trying to use, or install Python on the server machine yourself. </a>
<a href="http://www.server.com/dirpath/more.html">If your ISP is unsympathetic to your need for Python and you are willing to relocate your site to one that is, you can find lists of Python-friendly ISPs by searching http://www.python.org. And if you choose to install Python on your server machine yourself, be sure to check out the freeze tool shipped with the Python source distribution (in the Tools directory). With freeze, you can create a single executable program file that contains the entire Python interpreter, as well as all the standard library modules. Such a frozen interpreter can be uploaded to your web account by FTP in a single step, and it won't require a full-blown Python installation on the server. </a>
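If you can run any Python at all on the server, a few lines of code can do the hunting for you. The following is only a sketch in present-day Python 3 (shutil.which did not exist when the listings in this chapter were written); the function name find_pythons and its default search names and directories are assumptions chosen for illustration.

```python
import os
import shutil
import sys

def find_pythons(names=("python3", "python"),
                 extra_dirs=("/usr/bin", "/usr/local/bin")):
    """Return candidate interpreter paths, without duplicates."""
    found = []
    for name in names:
        hit = shutil.which(name)               # search the directories on PATH
        if hit:
            found.append(hit)
        for d in extra_dirs:                   # then the conventional places
            candidate = os.path.join(d, name)
            if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
                found.append(candidate)
    found.append(sys.executable)               # the interpreter running this code
    return list(dict.fromkeys(found))          # drop repeats, keep order

if __name__ == "__main__":
    for path in find_pythons():
        print("#!" + path)                     # candidate first lines for a script
```

Any path it prints is a candidate for the #! line at the top of your CGI scripts, though the web server may run under a different account and search path than your login shell does.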
<a href="http://www.server.com/dirpath/more.html">12.3.3 Adding Pictures and Generating Tables</a>
<a href="http://www.server.com/dirpath/more.html">Now let's get back to writing server-side code. As anyone who's ever surfed the Web knows, web pages usually consist of more than simple text. Example 12-5 is a Python CGI script that prints an HTML <img> tag in its output to produce a graphic image in the client browser. There's not much Python-specific about this example, but note that just as for simple HTML files, the image file (ppsmall.gif) lives on and is downloaded from the server machine when the browser interprets the output of this script. </a>
<a href="http://www.server.com/dirpath/more.html">Example 12-5. PP2E\Internet\Cgi-Web\Basics\test1.cgi </a>
<a href="http://www.server.com/dirpath/more.html">#!/usr/bin/python

text = """Content-type: text/html

<TITLE>CGI 101</TITLE>
<H1>A Second CGI script</H1>
<P>Hello, CGI World!</P>
<IMG src="ppsmall.gif">
"""

print text </a>
<a href="http://www.server.com/dirpath/more.html">Notice the use of the triple-quoted string block here; the entire HTML string is sent to the browser in one fell swoop, with the print statement at the end. If client and server are both functional, a page that looks like Figure 12-4 will be generated when this script is referenced and run. </a>
<a href="http://www.server.com/dirpath/more.html">Figure 12-4. A page with an image generated by test1.cgi</a>
<a href="http://www.server.com/dirpath/more.html">
<a href="http://www.server.com/dirpath/more.html">So far, our CGI scripts have been putting out canned HTML that could have just as easily been stored in an HTML file. But because CGI scripts are executable programs, they can also be used to generate HTML on the fly, dynamically -- even, possibly, in response to a particular set of user inputs sent to the script. That's the whole purpose of CGI scripts, after all. Let's start using this to better advantage now, and write a Python script that builds up response HTML programmatically (see Example 12-6). </a>
<a href="http://www.server.com/dirpath/more.html">Example 12-6. PP2E\Internet\Cgi-Web\Basics\test2.cgi </a>
<a href="http://www.server.com/dirpath/more.html">#!/usr/bin/python

print """Content-type: text/html

<TITLE>CGI 101</TITLE>
<H1>A Third CGI script</H1>
<P>Hello, CGI World!</P>

<table>
"""

for i in range(5):
    print "<tr>"
    for j in range(4):
        print "<td>%d.%d</td>" % (i, j)
    print "</tr>"

print """
</table>
""" </a>
<a href="http://www.server.com/dirpath/more.html">Despite all the tags, this really is Python code -- the test2.cgi script uses triple-quoted strings to embed blocks of HTML again. But this time, the script also uses nested Python for loops to dynamically generate part of the HTML that is sent to the browser. Specifically, it emits HTML to lay out a two-dimensional table in the middle of a page, as shown in Figure 12-5. </a>
<a href="http://www.server.com/dirpath/more.html">Figure 12-5. A page with a table generated by test2.cgi</a>
<a href="http://www.server.com/dirpath/more.html">
<a href="http://www.server.com/dirpath/more.html">Each row in the table displays a "row.column" pair, as generated by the executing Python script. If you're curious how the generated HTML looks, select your browser's View Source option after you've accessed this page. It's a single HTML page composed of the HTML generated by the first print in the script, then the for loops, and finally the last print. In other words, the concatenation of this script's output is an HTML document with headers. </a>
<a href="http://www.server.com/dirpath/more.html">12.3.3.1 Table tags</a>
<a href="http://www.server.com/dirpath/more.html">This script generates HTML table tags. Again, we're not out to learn HTML here, but we'll take a quick look just so you can make sense of the example. Tables are declared by the text between <table> and </table> tags in HTML. Typically, a table's text in turn declares the contents of each table row between <tr> and </tr> tags and each column within a row between <td> and </td> tags. The loops in our script build up HTML to declare five rows of four columns each, by printing the appropriate tags, with the current row and column number as column values. For instance, here is part of the script's output, defining the first two rows: </a>
<a href="http://www.server.com/dirpath/more.html"><table>
<tr>
<td>0.0</td>
<td>0.1</td>
<td>0.2</td>
<td>0.3</td>
</tr>
<tr>
<td>1.0</td>
<td>1.1</td>
<td>1.2</td>
<td>1.3</td>
</tr>
. . . </a>
<a href="http://www.server.com/dirpath/more.html">Other table tags and options let us specify a row title (<th>), layout borders, and so on. We'll see more table syntax put to use to lay out forms in a later section. </a>
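The same table can also be built as a string first and printed in one step, which makes the generated HTML easier to inspect and test. This is just a sketch in present-day Python 3, not code from the example; the function name table_html and its rows and cols parameters are hypothetical.

```python
def table_html(rows=5, cols=4):
    """Build the table markup as one string instead of printing piecemeal."""
    lines = ["<table>"]
    for i in range(rows):
        lines.append("<tr>")
        for j in range(cols):
            lines.append("<td>%d.%d</td>" % (i, j))   # "row.column" cell text
        lines.append("</tr>")
    lines.append("</table>")
    return "\n".join(lines)

if __name__ == "__main__":
    print(table_html())
```

With the default arguments this yields five rows of four columns, the same shape the nested for loops in the script produce.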
<a href="http://www.server.com/dirpath/more.html">12.3.4 Adding User Interaction</a>
<a href="http://www.server.com/dirpath/more.html">CGI scripts are great at generating HTML on the fly like this, but they are also commonly used to implement interaction with a user typing at a web browser. As described earlier in this chapter, web interactions usually involve a two-step process and two distinct web pages: you fill out a form page and press submit, and a reply page eventually comes back. In between, a CGI script processes the form input. </a>
<a href="http://www.server.com/dirpath/more.html">12.3.4.1 Submission</a>
<a href="http://www.server.com/dirpath/more.html">That description sounds simple enough, but the process of collecting user inputs requires an understanding of a special HTML tag, <form>. Let's look at the implementation of a simple web interaction to see forms at work. First off, we need to define a form page for the user to fill out, as shown in Example 12-7. </a>
<a href="http://www.server.com/dirpath/more.html">Example 12-7. PP2E\Internet\Cgi-Web\Basics\test3.html </a>
<a href="http://www.server.com/dirpath/more.html"><TITLE>CGI 101</TITLE>
<H1>A first user interaction: forms</H1>
<form method=POST action="http://starship.python.net/~lutz/Basics/test3.cgi">
    <P>Enter your name: <input type=text name=user></P>
    <P><input type=submit></P>
</form> </a>
<a href="http://www.server.com/dirpath/more.html">test3.html is a simple HTML file, not a CGI script (though its contents could be printed from a script as well). When this file is accessed, all the text between its <form> and </form> tags generates the input fields and Submit button shown in Figure 12-6. </a>
<a href="http://www.server.com/dirpath/more.html">Figure 12-6. A simple form page generated by test3.html</a>
<a href="http://www.server.com/dirpath/more.html">
<a href="http://www.server.com/dirpath/more.html">12.3.4.2 More on form tags</a>
<a href="http://www.server.com/dirpath/more.html">We won't go into all the details behind coding HTML forms, but a few highlights are worth underscoring. Within a form's HTML code: </a>
- <a href="http://www.server.com/dirpath/more.html">The form's action option gives the URL of a CGI script that will be invoked to process submitted form data. This is the link from a form to its handler program -- in this case, a program called test3.cgi in my web home directory, on a server machine called starship.python.net. The action option is the moral equivalent to command options in Tkinter buttons -- it's where a callback handler (here, a remote handler) is registered to the browser. </a>
- <a href="http://www.server.com/dirpath/more.html">Input controls are specified with nested <input> tags. In this example, input tags have two key options. The type option accepts values such as text for text fields and submit for a Submit button (which sends data to the server and is labeled "Submit Query" by default). The name option is the hook used to identify the entered value by key, once all the form data reaches the server. For instance, the server-side CGI script we'll see in a moment uses the string user as a key to get the data typed into this form's text field. As we'll see in later examples, other input tag options can specify initial values (value=X), display-only mode (readonly), and so on. Other input type option values may transmit hidden data (type=hidden), reinitialize fields (type=reset), or make multiple-choice buttons (type=checkbox). </a>
- <a href="http://www.server.com/dirpath/more.html">Forms also include a method option to specify the encoding style to be used to send data over a socket to the target server machine. Here, we use the post style, which contacts the server and then ships it a stream of user input data in a separate transmission. An alternative get style ships input information to the server in a single transmission step, by adding user inputs to the end of the URL used to invoke the script, usually after a ? character (more on this soon). With get, inputs typically show up on the server in environment variables or as arguments in the command line used to start the script. With post, they must be read from standard input and decoded. Luckily, Python's cgi module transparently handles either encoding style, so our CGI scripts don't need to know or care which is used. </a>
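To see what the two method styles actually put on the wire, it can help to encode a form's inputs by hand. The sketch below uses present-day Python 3's urllib.parse module (not the Python 2 dialect of this chapter's listings); the function names encode_get and encode_post are made up for illustration.

```python
from urllib.parse import urlencode

def encode_get(action, fields):
    """get style: inputs ride along at the end of the URL, after a '?'."""
    return action + "?" + urlencode(fields)

def encode_post(fields):
    """post style: the same encoded text is sent as the request body."""
    return urlencode(fields).encode("ascii")

if __name__ == "__main__":
    url = "http://starship.python.net/~lutz/Basics/test3.cgi"
    print(encode_get(url, {"user": "Brian"}))
    print(encode_post({"user": "Brian"}))
```

Either way the encoded text is the same name=value string; only where it travels differs, which is why a script that relies on the cgi module need not care which style the form used.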
<a href="http://www.server.com/dirpath/more.html">Notice that the action URL in this example's form spells out the full address for illustration. Because the browser remembers where the enclosing HTML page came from, it works the same with just the script's filename, as shown in Example 12-8. </a>
<a href="http://www.server.com/dirpath/more.html">Example 12-8. PP2E\Internet\Cgi-Web\Basics\test3-minimal.html </a>
<a href="http://www.server.com/dirpath/more.html"><TITLE>CGI 101</TITLE>
<H1>A first user interaction: forms</H1>
<form method=POST action="test3.cgi">
    <P>Enter your name: <input type=text name=user></P>
    <P><input type=submit></P>
</form> </a>
<a href="http://www.server.com/dirpath/more.html">It may help to remember that URLs embedded in form action tags and hyperlinks are directions to the browser first, not the script. The test3.cgi script itself doesn't care which URL form is used to trigger it -- minimal or complete. In fact, all parts of a URL through the script filename (and up to URL query parameters) are used in the conversation between browser and HTTP server, before a CGI script is ever spawned. As long as the browser knows which server to contact, the URL will work, but URLs outside of a page (e.g., typed into a browser's address field or sent to Python's urllib module) usually must be completely specified, because there is no notion of a prior page. </a>
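The way a browser fills in the missing parts of a minimal URL can be imitated with the urljoin function in present-day Python 3's urllib.parse module. This is a small sketch rather than anything the examples themselves do, and the wrapper name resolve_action is hypothetical.

```python
from urllib.parse import urljoin

def resolve_action(page_url, action):
    """Combine the enclosing page's URL with a form's action value."""
    return urljoin(page_url, action)

if __name__ == "__main__":
    page = "http://starship.python.net/~lutz/Basics/test3-minimal.html"
    print(resolve_action(page, "test3.cgi"))   # server and directory come from page
```

A complete action URL passes through unchanged, which is why both versions of the form reach the same script.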
<a href="http://www.server.com/dirpath/more.html">12.3.4.3 Response</a>
<a href="http://www.server.com/dirpath/more.html">So far, we've created only a static page with an input field. But the Submit button on this page is loaded to work magic. When pressed, it triggers the remote program whose URL is listed in the form's action option, and passes this program the input data typed by the user, according to the form's method encoding style option. On the server, a Python script is started to handle the form's input data while the user waits for a reply on the client, as shown in Example 12-9. </a>
<a href="http://www.server.com/dirpath/more.html">Example 12-9. PP2E\Internet\Cgi-Web\Basics\test3.cgi </a>
<a href="http://www.server.com/dirpath/more.html">#!/usr/bin/python
#######################################################
# runs on the server, reads form input, prints html;
# url=http://server-name/root-dir/Basics/test3.cgi
#######################################################

import cgi
form = cgi.FieldStorage( )            # parse form data
print "Content-type: text/html"       # plus blank line

html = """
<TITLE>test3.cgi</TITLE>
<H1>Greetings</H1>
<P>%s</P>
"""

if not form.has_key('user'):
    print html % "Who are you?"
else:
    print html % ("Hello, %s." % form['user'].value) </a>
<a href="http://www.server.com/dirpath/more.html">As before, this Python CGI script prints HTML to generate a response page in the client's browser. But this script does a bit more: it also uses the standard cgi module to parse the input data entered by the user on the prior web page (see Figure 12-6). Luckily, this is all automatic in Python: a call to the cgi module's FieldStorage class automatically does all the work of extracting form data from the input stream and environment variables, regardless of how that data was passed -- in a post style stream or in get style parameters appended to the URL. Inputs sent in both styles look the same to Python scripts. </a>
<a href="http://www.server.com/dirpath/more.html">Scripts should call cgi.FieldStorage only once and before accessing any field values. When called, we get back an object that looks like a dictionary -- user input fields from the form (or URL) show up as values of keys in this object. For example, in the script, form['user'] is an object whose value attribute is a string containing the text typed into the form's text field. If you flip back to the form page's HTML, you'll notice that the input field's name option was user -- the name in the form's HTML has become a key we use to fetch the input's value from a dictionary. The object returned by FieldStorage supports other dictionary operations, too -- for instance, the has_key method may be used to check if a field is present in the input data. </a>
<a href="http://www.server.com/dirpath/more.html">Before exiting, this script prints HTML to produce a result page that echoes back what the user typed into the form. Two string-formatting expressions (%) are used to insert the input text into a reply string, and the reply string into the triple-quoted HTML string block. The body of the script's output looks like this: </a>
<a href="http://www.server.com/dirpath/more.html"><TITLE>test3.cgi</TITLE>
<H1>Greetings</H1>
<P>Hello, King Arthur.</P> </a>
<a href="http://www.server.com/dirpath/more.html">In a browser, the output is rendered into a page like the one in Figure 12-7. </a>
<a href="http://www.server.com/dirpath/more.html">Figure 12-7. test3.cgi result for parameters in a form</a>
<a href="http://www.server.com/dirpath/more.html">
<a href="http://www.server.com/dirpath/more.html">12.3.4.4 Passing parameters in URLs</a>
<a href="http://www.server.com/dirpath/more.html">Notice that the URL address of the script that generated this page shows up at the top of the browser. We didn't type this URL itself -- it came from the action tag of the prior page's form HTML. However, there is nothing stopping us from typing the script's URL explicitly in our browser's address field to invoke the script, just as we did for our earlier CGI script and HTML file examples. </a>
<a href="http://www.server.com/dirpath/more.html">But there's a catch here: where does the input field's value come from if there is no form page? That is, if we type the CGI script's URL ourselves, how does the input field get filled in? Earlier, when we talked about URL formats, I mentioned that the get encoding scheme tacks input parameters onto the end of URLs. When we type script addresses explicitly, we can also append input values on the end of URLs, where they serve the same purpose as fields in forms. Moreover, the Python cgi module makes URL and form inputs look identical to scripts. </a>
<a href="http://www.server.com/dirpath/more.html">For instance, we can skip filling out the input form page completely, and directly invoke our test3.cgi script by visiting a URL of the form: </a>
<a href="http://www.server.com/dirpath/more.html">http://starship.python.net/~lutz/Basics/test3.cgi?user=Brian</a>
<a href="http://www.server.com/dirpath/more.html">In this URL, a value for the input named user is specified explicitly, as if the user had filled out the input page. When called this way, the only constraint is that the parameter name user must match the name expected by the script (and hardcoded in the form's HTML). We use just one parameter here, but in general, URL parameters are typically introduced with a ? and followed by one or more name=value assignments, separated by & characters if there is more than one. Figure 12-8 shows the response page we get after typing a URL with explicit inputs. </a>
<a href="http://www.server.com/dirpath/more.html">Figure 12-8. test3.cgi result for parameters in a URL</a>
<a href="http://www.server.com/dirpath/more.html">
<a href="http://www.server.com/dirpath/more.html">In general, any CGI script can be invoked either by filling out and submitting a form page or by passing inputs at the end of a URL. When CGI scripts are invoked with explicit input parameters this way, it's difficult not to see their similarity to functions, albeit ones that live remotely on the Net. Passing data to scripts in URLs is similar to keyword arguments in Python functions, both operationally and syntactically. In fact, in Chapter 15 we will meet a system called Zope that makes the relationship between URLs and Python function calls even more literal (URLs become more direct function calls). </a>
<a href="http://www.server.com/dirpath/more.html">Incidentally, if you clear out the name input field in the form input page (i.e., make it empty) and press submit, the user name field becomes empty. More accurately, the browser may not send this field along with the form data at all, even though it is listed in the form layout HTML. The CGI script detects such a missing field with the dictionary has_key method and produces the page captured in Figure 12-9 in response. </a>
<a href="http://www.server.com/dirpath/more.html">Figure 12-9. An empty name field produces an error page</a>
<a href="http://www.server.com/dirpath/more.html">
<a href="http://www.server.com/dirpath/more.html">In general, CGI scripts must check to see if any inputs are missing, partly because they might not be typed by a user in the form, but also because there may be no form at all -- input fields might not be tacked on to the end of an explicitly typed URL. For instance, if we type the script's URL without any parameters at all (i.e., omit the text ? and beyond), we get this same error response page. Since we can invoke any CGI through a form or URL, scripts must anticipate both scenarios. </a>
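The missing-input check is easy to try out away from a server. The following sketch mimics the test3.cgi logic on a raw query string, using present-day Python 3's urllib.parse.parse_qs in place of the cgi module. The function name greeting is hypothetical, and the html.escape call is an addition of this sketch, not something the original script does.

```python
from html import escape
from urllib.parse import parse_qs

def greeting(query_string):
    """Return the reply line for a raw query string such as 'user=Brian'."""
    fields = parse_qs(query_string)            # blank values are dropped by default
    if "user" not in fields:
        return "Who are you?"                  # no form, no parameter, or empty field
    return "Hello, %s." % escape(fields["user"][0])

if __name__ == "__main__":
    for query in ("user=Brian", "user=", ""):
        print(repr(query), "=>", greeting(query))
```

An empty field and an absent field take the same path here, just as both lead to the same error page in the text above.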
<a href="http://www.server.com/dirpath/more.html">12.3.5 Using Tables to Lay Out Forms</a>
<a href="http://www.server.com/dirpath/more.html">Now let's move on to something a bit more realistic. In most CGI applications, input pages are composed of multiple fields. When there is more than one, input labels and fields are typically laid out in a table, to give the form a well-structured appearance. The HTML file in Example 12-10 defines a form with two input fields. </a>
<a href="http://www.server.com/dirpath/more.html">Example 12-10. PP2E\Internet\Cgi-Web\Basics\test4.html </a>
<a href="http://www.server.com/dirpath/more.html"><TITLE>CGI 101</TITLE>
<H1>A second user interaction: tables</H1>
<form method=POST action="test4.cgi">
    <table>
    <tr>
        <th>Enter your name:</th>
        <td><input type=text name=user></td>
    </tr>
    <tr>
        <th>Enter your age:</th>
        <td><input type=text name=age></td>
    </tr>
    </table>
    <P><input type=submit value="Send"></P>
</form> </a>
<a href="http://www.server.com/dirpath/more.html">The <th> tag defines a column like <td>, but also tags it as a header column, which generally means it is rendered in a bold font. By placing the input fields and labels in a table like this, we get an input page like that shown in Figure 12-10. Labels and inputs are automatically lined up vertically in columns much as they were by the Tkinter GUI geometry managers we met earlier in this book. </a>
<a href="http://www.server.com/dirpath/more.html">Figure 12-10. A form laid out with table tags</a>
<a href="http://www.server.com/dirpath/more.html">
<a href="http://www.server.com/dirpath/more.html">When this form's Submit button (labeled "Send" by the page's HTML) is pressed, it causes the script in Example 12-11 to be executed on the server machine, with the inputs typed by the user. </a>
<a href="http://www.server.com/dirpath/more.html">Example 12-11. PP2E\Internet\Cgi-Web\Basics\test4.cgi </a>
<a href="http://www.server.com/dirpath/more.html">#!/usr/bin/python
#######################################################
# runs on the server, reads form input, prints html;
# url http://server-name/root-dir/Basics/test4.cgi
#######################################################

import cgi, sys
sys.stderr = sys.stdout              # errors to browser
form = cgi.FieldStorage( )           # parse form data
print "Content-type: text/html\n"    # plus blank line

# class dummy:
#     def __init__(self, s): self.value = s
# form = {'user': dummy('bob'), 'age':dummy('10')}

html = """
<TITLE>test4.cgi</TITLE>
<H1>Greetings</H1>
<P>%s</P>
<P>%s</P>
<P>%s</P>
"""

if not form.has_key('user'):
    line1 = "Who are you?"
else:
    line1 = "Hello, %s." % form['user'].value

line2 = "You're talking to a %s server." % sys.platform

line3 = ""
if form.has_key('age'):
    try:
        line3 = "Your age squared is %d!" % (int(form['age'].value) ** 2)
    except:
        line3 = "Sorry, I can't compute %s ** 2." % form['age'].value

print html % (line1, line2, line3) </a>
<a href="http://www.server.com/dirpath/more.html">The table layout comes from the HTML file, not this Python CGI script. In fact, this script doesn't do much new -- it uses string formatting to plug input values into the response page's HTML triple-quoted template string as before, this time with one line per input field. There are, however, a few new tricks here worth noting, especially regarding CGI script debugging and security. We'll talk about them in the next two sections. </a>
<a href="http://www.server.com/dirpath/more.html">12.3.5.1 Converting strings in CGI scripts</a>
<a href="http://www.server.com/dirpath/more.html">Just for fun, the script echoes back the name of the server platform, fetched from sys.platform, along with the square of the age input field. Notice that the age input's value must be converted to an integer with the built-in int function; in the CGI world, all inputs arrive as strings. We could also convert to an integer with the string module's atoi function or the built-in eval function. Conversion (and other) errors are trapped gracefully in a try statement to yield an error line, rather than letting our script die. </a>
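The conversion step is worth isolating, since it is the one place where raw user text meets program logic. Here is a minimal sketch in present-day Python 3; the function name age_line is made up, and it catches only ValueError rather than every exception, which is a narrower choice than the script's bare except. It sticks with int because eval would run whatever text the user typed as Python code.

```python
def age_line(raw):
    """Return the reply line for an age field that arrived as a string."""
    try:
        return "Your age squared is %d!" % (int(raw) ** 2)
    except ValueError:
        # int() rejects anything that is not an integer literal
        return "Sorry, I can't compute %s ** 2." % raw

if __name__ == "__main__":
    for text in ("10", "ten"):
        print(age_line(text))
```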
<a href="http://www.server.com/dirpath/more.html">12.3.5.2 Debugging CGI scripts</a>
<a href="http://www.server.com/dirpath/more.html">Errors happen, even in the brave new world of the Internet. Generally speaking, debugging CGI scripts can be much more difficult than debugging programs that run on your local machine. Not only do errors occur on a remote machine, but scripts generally won't run without the context implied by the CGI model. The script in Example 12-11 demonstrates the following two common debugging tricks. </a>
<a href="http://www.server.com/dirpath/more.html">Error message trapping</a>
<a href="http://www.server.com/dirpath/more.html">This script assigns sys.stderr to sys.stdout so that Python error messages wind up being displayed in the response page in the browser. Normally, Python error messages are written to stderr. To route them to the browser, we must make stderr reference the same file object as stdout (which is connected to the browser in CGI scripts). If we don't do this assignment, Python errors, including program errors in our script, never show up in the browser. </a>
<a href="http://www.server.com/dirpath/more.html">Test case mock-up</a>
<a href="http://www.server.com/dirpath/more.html">The dummy class definition, commented out in this final version, was used to debug the script before it was installed on the Net. Besides not seeing stderr messages by default, CGI scripts also assume an enclosing context that does not exist if they are tested outside the CGI environment. For instance, if run from the system command line, this script has no form input data. Uncomment this code to test from the system command line. The dummy class masquerades as a parsed form field object, and form is assigned a dictionary containing two form field objects. The net effect is that form will be plug-and-play compatible with the result of a cgi.FieldStorage call. As usual in Python, object interfaces (not datatypes) are all we must adhere to. </a>
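The mock-up trick works best when the reply logic lives in a function that accepts any dictionary-like object. The sketch below restates the idea in present-day Python 3; the class name Dummy mirrors the commented-out dummy class, while the function name reply_lines is hypothetical and not part of the original script.

```python
import sys

class Dummy:
    """Stands in for a parsed form field: only the .value attribute matters."""
    def __init__(self, s):
        self.value = s

def reply_lines(form):
    """Build the three reply lines from any dictionary-like form object."""
    if "user" not in form:
        line1 = "Who are you?"
    else:
        line1 = "Hello, %s." % form["user"].value
    line2 = "You're talking to a %s server." % sys.platform
    line3 = ""
    if "age" in form:
        try:
            line3 = "Your age squared is %d!" % (int(form["age"].value) ** 2)
        except ValueError:
            line3 = "Sorry, I can't compute %s ** 2." % form["age"].value
    return line1, line2, line3

if __name__ == "__main__":
    mock = {"user": Dummy("bob"), "age": Dummy("10")}   # same test case as above
    print("\n".join(reply_lines(mock)))
```

Because reply_lines asks only for key lookup and a value attribute, a plain dictionary of Dummy objects is enough to exercise it from the system command line.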
<a href="http://www.server.com/dirpath/more.html">Here are a few general tips for debugging your server-side CGI scripts: </a>
<a href="http://www.server.com/dirpath/more.html">Run the script from the command line.</a>
It probably won't generate HTML as is, but running it standalone will detect any syntax errors in your code. Recall that a Python command line can run source code files regardless of their extension: e.g., python somescript.cgi works fine.
Assign sys.stderr to sys.stdout as early as possible in your script.
This will make the text of Python error messages and stack dumps appear in your client browser when accessing the script. In fact, short of wading through server logs, this may be the only way to see the text of error messages after your script aborts.
Mock up inputs to simulate the enclosing CGI context.
For instance, define classes that mimic the CGI inputs interface (as done with the dummy class in this script), so that you can view the script's output for various test cases by running it from the system command line.[6] Setting environment variables to mimic form or URL inputs sometimes helps, too (we'll see how later in this chapter).
[6] This technique isn't unique to CGI scripts, by the way. In Chapter 15, we'll meet systems that embed Python code inside HTML. There is no good way to test such code outside the context of the enclosing system, without extracting the embedded Python code (perhaps by using the htmllib HTML parser that comes with Python) and running it with a passed-in mock-up of the API that it will eventually use.
Call utilities to display CGI context in the browser.
The cgi module includes utility functions that send a formatted dump of CGI environment variables and input values to the browser (e.g., cgi.test, cgi.print_form). Sometimes, this is enough to resolve connection problems. We'll use some of these in the mailer case study in the next chapter.
Show exceptions you catch.
If you catch an exception that Python raises, the Python error message won't be printed to stderr (printing it is simply the default behavior for uncaught exceptions). In such cases, it's up to your script to display the exception's name and value in the response page; exception details are available in the built-in sys module. We'll use this in a later example, too.
Run it live.
Of course, once your script is at least half working, your best bet is likely to start running it live on the server, with real inputs coming from a browser.
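Several of these tips can be combined in one small test harness. The following is a minimal sketch in modern Python 3, not code from this chapter's examples: DummyField, make_mock_form, and respond are hypothetical names, and the reply text is invented for illustration. It mocks up form inputs as objects with a value attribute, reports any caught exception by name and value using sys.exc_info, and routes sys.stderr to sys.stdout when run as a script.

```python
import sys


class DummyField:
    """Stands in for one parsed form field; only .value is mimicked."""

    def __init__(self, value):
        self.value = value


def make_mock_form(**inputs):
    """Build a dict that looks like parsed form input (hypothetical helper)."""
    return {name: DummyField(value) for name, value in inputs.items()}


def respond(form):
    """Build the reply text, reporting any exception instead of aborting."""
    try:
        user = form["user"].value
        age = int(form["age"].value)
        return "Hello, %s. You are %d." % (user, age)
    except Exception:
        # Exception details come from the sys module, as the text describes.
        exc_type, exc_value = sys.exc_info()[:2]
        return "Error: %s: %s" % (exc_type.__name__, exc_value)


if __name__ == "__main__":
    sys.stderr = sys.stdout              # error text now reaches the reply stream
    print("Content-type: text/html\n")   # header line plus blank line
    print(respond(make_mock_form(user="Bob", age="40")))
    print(respond(make_mock_form(user="Bob", age="forty")))
```

Run from the system command line, the second call shows a ValueError report in place of a reply, which is the kind of feedback that would otherwise be buried in a server log.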
When this script is run by submitting the input form page, it produces the new reply page shown in Figure 12-11.
Figure 12-11. Reply page generated by test4.cgi
As usual, we can pass parameters to this CGI script at the end of a URL, too. Figure 12-12 shows the page we get when passing a user and age explicitly in the URL. Notice that we have two parameters after the ? this time; we separate them with &. Also note that we've specified a blank space in the user value with +. This is a common URL encoding convention; on the server side, the + is automatically replaced with a space again. It's also part of the standard escape rule for URL strings, which we'll revisit later.
Figure 12-12. Reply page generated by test4.cgi for parameters in URL
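To see the & and + conventions at work outside a browser, here is a small sketch that uses Python 3's urllib.parse.parse_qs to decode a query string. The user and age values are made up for illustration, and parse_qs stands in for whatever parsing the server-side script really does.

```python
from urllib.parse import parse_qs

# Hypothetical query string: everything after the ? in the URL.
query = "user=Joe+Blow&age=30"

# parse_qs splits parameters on & and =, and turns each + back into a space.
params = parse_qs(query)

user = params["user"][0]   # each value arrives as a list of strings
age = params["age"][0]
print(user)
print(age)
```

Note that every value comes back as a list, because a parameter name may legally appear more than once in a URL.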
12.3.6 Adding Common Input Devices
So far, we've been typing inputs into text fields. HTML forms support a handful of input controls (what we'd call widgets in the traditional GUI world) for collecting user inputs. Let's look at a CGI program that shows all the common input controls at once. As usual, we define both an HTML file to lay out the form page and a Python CGI script to process its inputs and generate a response. The HTML file is presented in Example 12-12.
Example 12-12. PP2E\Internet\Cgi-Web\Basics\test5a.html
CGI 101
Common input devices
Please complete the following form and click Send
Name: | |
---|---|
Shoe size: | |
Occupation: | Developer Manager Student Evangelist Other |
Political affiliations: | |
Comments: | Enter text here |
When rendered by a browser, the page in Figure 12-13 appears.
Figure 12-13. Form page generated by test5a.html
This page contains a simple text field as before, but it also has radiobuttons, a pull-down selection list, a set of multiple-choice checkbuttons, and a multiple-line text input area. All have a name option in the HTML file, which identifies their selected value in the data sent from client to server. When we fill out this form and click the Send submit button, the script in Example 12-13 runs on the server to process all the input data typed or selected in the form.
Example 12-13. PP2E\Internet\Cgi-Web\Basics\test5.cgi
#!/usr/bin/python
#######################################################
# runs on the server, reads form input, prints html;
# url=http://server-name/root-dir/Basics/test5.cgi
#######################################################

import cgi, sys, string
form = cgi.FieldStorage()              # parse form data
print "Content-type: text/html"        # plus blank line

html = """
test5.cgi
Greetings
Your name is %(name)s
You wear rather %(shoesize)s shoes
Your current job: %(job)s
You program in %(language)s
You also said:
%(comment)s
"""

data = {}
for field in ['name', 'shoesize', 'job', 'language', 'comment']:
    if not form.has_key(field):
        data[field] = '(unknown)'
    else:
        if type(form[field]) != type([]):
            data[field] = form[field].value
        else:
            values = map(lambda x: x.value, form[field])
            data[field] = string.join(values, ' and ')
print html % data
This Python script doesn't do much; it mostly just copies form field information into a dictionary called data, so that it can be easily plugged into the triple-quoted response string. A few of its tricks merit explanation:
Field validation
As usual, we need to check all expected fields to see if they really are present in the input data, using the dictionary has_key method. Any or all of the input fields may be missing if they weren't entered on the form or appended to an explicit URL.
String formatting
We're using dictionary key references in the format string this time -- recall that %(name)s means pull out the value for key name in the data dictionary and perform a to-string conversion on its value.
Multiple-choice fields
We're also testing the type of all the expected fields' values to see if they arrive as a list instead of the usual string. Values of multiple-choice input controls, like the language choice field in this input page, are returned from cgi.FieldStorage as a list of objects with value attributes, rather than a simple single object with a value. This script copies simple field values to the dictionary verbatim, but uses map to collect the value fields of multiple-choice selections, and string.join to construct a single string with an and inserted between each selection value (e.g., Python and Tcl).[7]
[7] Two forward references are worth noting here. Besides simple strings and lists, later we'll see a third type of form input object, returned for fields that specify file uploads. The script in this example should really also escape the echoed text inserted into the HTML reply to be robust, lest it contain HTML operators. We will discuss escapes in detail later.
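The three tricks above can be tried without a web server. The sketch below is a Python 3 approximation, not the book's script: it swaps in urllib.parse.parse_qs for cgi.FieldStorage (so every value arrives as a list, single or not), and the collect function, the FIELDS list, and the sample query strings are assumptions made for illustration. It also applies html.escape to the echoed text, as the footnote recommends.

```python
from html import escape
from urllib.parse import parse_qs

# The fields the reply page expects, as in test5.cgi.
FIELDS = ["name", "shoesize", "job", "language", "comment"]


def collect(query):
    """Map each expected field to one display string (hypothetical helper)."""
    inputs = parse_qs(query)               # every value arrives as a list
    data = {}
    for field in FIELDS:
        values = inputs.get(field)
        if not values:
            data[field] = "(unknown)"      # field absent from the input
        else:
            # One value passes through; several are joined with ' and '.
            data[field] = escape(" and ".join(values))
    return data


reply = "Your name is %(name)s; you program in %(language)s"
print(reply % collect("name=Bob&language=Python&language=Tcl"))
```

Because the dictionary always holds every expected key, the %(name)s style format string can never fail on a missing field.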
When the form page is filled out and submitted, the script creates the response shown in Figure 12-14 -- essentially just a formatted echo of what was sent.
Figure 12-14. Response page created by test5.cgi (1)
12.3.6.1 Changing input layouts
Suppose that you've written a system like this, and your users, clients, and significant other start complaining that the input form is difficult to read. Don't worry. Because the CGI model naturally separates the user interface (the HTML page definition) from the processing logic (the CGI script), it's completely painless to change the form's layout. Simply modify the HTML file; there's no need to change the CGI code at all. For instance, Example 12-14 contains a new definition of the input form that uses tables a bit differently to provide a nicer layout with borders.
Example 12-14. PP2E\Internet\Cgi-Web\Basics\test5b.html
CGI 101
Common input devices: alternative layout
Use the same test5.cgi server-side script, but change the layout of the form itself. Notice the separation of user interface and processing logic here; the CGI script is independent of the HTML used to interact with the user/client.
Please complete the following form and click Submit
Name: | |
---|---|
Shoe size: | Small Medium Large |
Occupation: | Developer Manager Student Evangelist Other |
Political affiliations: | Pythonista Perlmonger Tcler |
Comments: | Enter spam here |
When we visit this alternative page with a browser, we get the interface shown in Figure 12-15.
Figure 12-15. Form page created by test5b.html
Now, before you go blind trying to detect the differences between this and the prior HTML file, I should note that the HTML differences that produce this page are much less important than the fact that the action fields in these two pages' forms reference identical URLs. Pressing this version's Submit button triggers the exact same and totally unchanged Python CGI script again, test5.cgi (Example 12-13).
That is, scripts are completely independent of the layout of the user interface used to send them information. Changes in the response page require changing the script, of course; but we can change the input page's HTML as much as we like, without impacting the server-side Python code. Figure 12-16 shows the response page produced by the script this time around.
Figure 12-16. Response page created by test5.cgi (2)
12.3.7 Passing Parameters in Hardcoded URLs
Earlier, we passed parameters to CGI scripts by listing them at the end of a URL typed into the browser's address field (after a ?). But there's nothing sacred about the browser's address field. In particular, there's nothing stopping us from using the same URL syntax in hyperlinks that we hardcode in web page definitions. For example, the web page from Example 12-15 defines three hyperlinks (the text between the <a> and </a> tags), which all trigger our original test5.cgi script again, but with three different precoded sets of parameters.
Example 12-15. PP2E\Internet\Cgi-Web\Basics\test5c.html
CGI 101
Common input devices: URL parameters
This demo invokes the test5.cgi server-side script again, but hardcodes input data to the end of the script's URL, within a simple hyperlink (instead of packaging up a form's inputs). Click your browser's "show page source" button to view the links associated with each list item below.
This is really more about CGI than Python, but notice that Python's cgi module handles both this form of input (which is also produced by GET form actions), as well as POST-ed forms; they look the same to the Python CGI script. In other words, cgi module users are independent of the method used to submit data.
Also notice that URLs with appended input values like this can be generated as part of the page output by another CGI script, to direct a next user click to the right place and context; together with type 'hidden' input fields, they provide one way to save state between clicks.
- <a href="test5.cgi?name=Bob&shoesize=small">Send Bob, small</a>
- <a href="test5.cgi?name=Tom&language=Python">Send Tom, Python</a>
- <a href="http://starship.python.net/~lutz/Basics/test5.cgi?job=Evangelist&comment=spam">Send Evangelist, spam</a>
This static HTML file defines three hyperlinks -- the first two are minimal and the third is fully specified, but all work similarly (again, the target script doesn't care). When we visit this file's URL, we see the page shown in Figure 12-17. It's mostly just a page for launching canned calls to the CGI script.
Figure 12-17. Hyperlinks page created by test5c.html
Clicking on this page's second link creates the response page in Figure 12-18. This link invokes the CGI script, with the name parameter set to "Tom" and the language parameter set to "Python," simply because those parameters and values are hardcoded in the URL listed in the HTML for the second hyperlink. It's exactly as if we had manually typed the line shown at the top of the browser in Figure 12-18.
Figure 12-18. Response page created by test5.cgi (3)
Notice that lots of fields are missing here; the test5.cgi script is smart enough to detect and handle missing fields and generate an "(unknown)" message in the reply page. It's also worth pointing out that we're reusing the Python CGI script again here. The script itself is completely independent of both the user-interface format of the submission page and the technique used to invoke it (from a submitted form or a hardcoded URL). By separating user interface from processing logic, CGI scripts become reusable software components, at least within the context of the CGI environment.
12.3.7.1 Saving CGI script state information
But the real reason for showing this technique is that we're going to use it extensively in the larger case studies in the next two chapters to implement lists of dynamically generated selections that "know" what to do when clicked. Precoded parameters in URLs are a way to retain state information between pages -- they can be used to direct the action of the next script to be run. As such, hyperlinks with such parameters are sometimes known as "smart links."
Normally, CGI scripts run autonomously, with no knowledge of any other scripts that may have run before. That hasn't mattered in our examples so far, but larger systems are usually composed of multiple user interaction steps and many scripts, and we need a way to keep track of information gathered along the way. Generating hardcoded URLs with parameters is one way for a CGI script to pass data to the next script in the application. When clicked, such URL parameters send pre-programmed selection information back to another server-side handler script.
For example, a site that lets you read your email may present you with a list of viewable email messages, implemented in HTML as a list of hyperlinks generated by another script. Each hyperlink might include the name of the message viewer script, along with parameters identifying the selected message number, email server name, and so on -- as much data as is needed to fetch the message associated with a particular link. A retail site may instead serve up a generated list of product links, each of which is a hardcoded hyperlink containing the product number, its price, and so on.
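As a rough illustration of how a script might generate such links, here is a minimal Python 3 sketch. The smart_link function, the view.cgi script name, the parameter names, and the message data are all hypothetical; urllib.parse.urlencode builds the ?name=value&name=value part, and html.escape makes the result safe to embed in an href attribute (which is why & appears as &amp; in the output, unlike the hand-typed links in Example 12-15).

```python
from html import escape
from urllib.parse import urlencode


def smart_link(script, label, **params):
    """Return an HTML hyperlink whose URL carries precoded parameters."""
    url = script + "?" + urlencode(params)     # e.g. view.cgi?msgnum=1&...
    return '<a href="%s">%s</a>' % (escape(url, quote=True), escape(label))


# Hypothetical message list: (message number, subject line).
messages = [(1, "Hello"), (2, "Meeting at noon")]

for number, subject in messages:
    link = smart_link("view.cgi", subject,
                      msgnum=number, server="pop.example.com")
    print("<li>" + link)
```

Each generated link remembers which message it stands for, so the viewer script needs no other record of what the user was looking at.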
In general, there are a variety of ways to pass or retain state information between CGI script executions:
- Hardcoded URL parameters in dynamically generated hyperlinks and embedded in web pages (as discussed here)
- Hidden form input fields that are attached to form data and embedded in web pages, but not displayed on web pages
- HTTP "cookies" that are stored on the client machine and transferred between client and server in HTTP message headers
- General server-side data stores that include databases, persistent object shelves, flat files, and so on
We'll meet most of these mediums in later examples in this chapter and in the two chapters that follow.
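Of the alternatives listed above, hidden form fields are the closest cousin of URL parameters, so a brief sketch may help until the later examples arrive. This Python 3 fragment is illustrative only: hidden_inputs, next_step_form, the step2.cgi script name, and the state values are all assumptions. It writes each piece of saved state into the reply page as an input field of type hidden, so the data rides along with the next form submission without being displayed.

```python
from html import escape


def hidden_inputs(state):
    """Render each saved state item as a hidden form input field."""
    lines = []
    for name, value in state.items():
        lines.append('<input type="hidden" name="%s" value="%s">'
                     % (escape(name, quote=True),
                        escape(str(value), quote=True)))
    return "\n".join(lines)


def next_step_form(action, state):
    """Build a form that passes the saved state on to the next script."""
    return ('<form method="POST" action="%s">\n%s\n'
            '<input type="submit" value="Next">\n</form>'
            % (escape(action, quote=True), hidden_inputs(state)))


print(next_step_form("step2.cgi", {"user": "Bob", "msgnum": 3}))
```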
12.4 The Hello World Selector