The Hello World Selector

It's now time for something a bit more useful (well, more entertaining, at least). This section presents a program that displays the basic syntax required by various programming languages to print the string "Hello World", the classic language benchmark. To keep this simple, it assumes the string shows up in the standard output stream, not a GUI or web page. It also gives just the output command itself, not the complete programs. The Python version happens to be a complete program, but we won't hold that against its competitors here.

Structurally, the first cut of this example consists of a main page HTML file, along with a Python-coded CGI script that is invoked by a form in the main HTML page. Because no state or database data is stored between user clicks, this is still a fairly simple example. In fact, the main HTML page implemented by Example 12-16 is really just one big pull-down selection list within a form.

Example 12-16. PP2EInternetCgi-WebBasicslanguages.html

 

Languages

Hello World selector

This demo shows how to display a "hello world" message in various programming languages' syntax. To keep this simple, only the output command is shown (it takes more code to make a complete program in some of these languages), and only text-based solutions are given (no GUI or HTML construction logic is included). This page is a simple HTML file; the one you see after pressing the button below is generated by a Python CGI script which runs on the server. Pointers:


Select a programming language:

AllPythonPerlTclSchemeSmallTalkJavaCC++BasicFortranPascalOther

For the moment, let's ignore some of the hyperlinks near the middle of this file; they introduce bigger concepts like file transfers and maintainability that we will explore in the next two sections. When visited with a browser, this HTML file is downloaded to the client and rendered into the new browser page shown in Figure 12-19.

Figure 12-19. The "Hello World" main page

That widget above the Submit button is a pull-down selection list that lets you choose one of the

tag values in the HTML file. As usual, selecting one of these language names and pressing the Submit button at the bottom (or pressing your Enter key) sends the selected language name to an instance of the server-side CGI script program named in the form's action option. Example 12-17 contains the Python script that runs on the server upon submission.

Example 12-17. PP2EInternetCgi-WebBasicslanguages.cgi

#!/usr/bin/python ######################################################## # show hello world syntax for input language name; # note that it uses r'...' raw strings so that ' ' # in the table are left intact, and cgi.escape( ) on # the string so that things like '<<' don't confuse # browsers--they are translated to valid html code; # any language name can arrive at this script: e.g., # can type "http://starship.python.net/~lutz/Basics # /languages.cgi?language=Cobol" in any web browser. # caveats: the languages list appears in both the cgi # and html files--could import from a single file if # selection list generated by another cgi script too; ######################################################## debugme = 0 # 1=test from cmd line inputkey = 'language' # input parameter name hellos = { 'Python': r" print 'Hello World' ", 'Perl': r' print "Hello World "; ', 'Tcl': r' puts "Hello World" ', 'Scheme': r' (display "Hello World") (newline) ', 'SmallTalk': r" 'Hello World' print. ", 'Java': r' System.out.println("Hello World"); ', 'C': r' printf("Hello World "); ', 'C++': r' cout << "Hello World" << endl; ', 'Basic': r' 10 PRINT "Hello World" ', 'Fortran': r" print *, 'Hello World' ", 'Pascal': r" WriteLn('Hello World'); " } class dummy: # mocked-up input obj def __init__(self, str): self.value = str import cgi, sys if debugme: form = {inputkey: dummy(sys.argv[1])} # name on cmd line else: form = cgi.FieldStorage( ) # parse real inputs print "Content-type: text/html " # adds blank line print "

Languages" print "

Syntax


" def showHello(form): # html for one language choice = form[inputkey].value print "

%s

" % choice try: print cgi.escape(hellos[choice]) except KeyError: print "Sorry--I don't know that language" print "

" if not form.has_key(inputkey) or form[inputkey].value == 'All': for lang in hellos.keys( ): mock = {inputkey: dummy(lang)} showHello(mock) else: showHello(form) print '


'

And as usual, this script prints HTML code to the standard output stream to produce a response page in the client's browser. There's not much new to speak of in this script, but it employs a few techniques that merit special focus:

Raw strings

Notice the use of raw strings (string constants preceded by an "r" character) in the language syntax dictionary. Recall that raw strings retain backslash characters in the string literally, rather than interpreting them as string escape-code introductions. Without them, the newline character sequences in some of the language's code snippets would be interpreted by Python as line-feeds, rather than being printed in the HTML reply as .

Escaping text embedded in HTML and URLs

This script takes care to format the text of each language's code snippet with the cgi.escape utility function. This standard Python utility automatically translates characters that are special in HTML into HTML escape code sequences, such that they are not treated as HTML operators by browsers. Formally, cgi.escape translates characters to escape code sequences, according to the standard HTML convention: <, >, and & become <, >, and &. If you pass a second true argument, the double-quote character (") is also translated to ".

For example, the << left-shift operator in the C++ entry is translated to << -- a pair of HTML escape codes. Because printing each code snippet effectively embeds it in the HTML response stream, we must escape any special HTML characters it contains. HTML parsers (including Python's standard htmllib module) translate escape codes back to the original characters when a page is rendered.

More generally, because CGI is based upon the notion of passing formatted stringsacross the Net, escaping special characters is a ubiquitous operation. CGI scripts almost always need to escape text generated as part of the reply to be safe. For instance, if we send back arbitrary text input from a user or read from a data source on the server, we usually can't be sure if it will contain HTML characters or not, so we must escape it just in case.

In later examples, we'll also find that characters inserted into URL address strings generated by our scripts may need to be escaped as well. A literal & in a URL is special, for example, and must be escaped if it appears embedded in text we insert into a URL. However, URL syntax reserves different special characters than HTML code, and so different escaping conventions and tools must be used. As we'll see later in this chapter, cgi.escape implements escape translations in HTML code, but urllib.quote (and its relatives) escapes characters in URL strings.

Mocking up form inputs

Here again, form inputs are "mocked up" (simulated), both for debugging and for responding to a request for all languages in the table. If the script's global debugme variable is set to a true value, for instance, the script creates a dictionary that is plug-and-play compatible with the result of a cgi.FieldStorage call -- its "languages" key references an instance of the dummy mock-up class. This class in turn creates an object that has the same interface as the contents of a cgi.FieldStorage result -- it makes an object with a value attribute set to a passed-in string.

The net effect is that we can test this script by running it from the system command line: the generated dictionary fools the script into thinking it was invoked by a browser over the Net. Similarly, if the requested language name is "All," the script iterates over all entries in the languages table, making a mocked-up form dictionary for each (as though the user had requested each language in turn). This lets us reuse the existing showHello logic to display each language's code in a single page. As always in Python, object interfaces and protocols are what we usually code for, not specific datatypes. The showHello function will happily process any object that responds to the syntax form['language'].value.[8]

[8] If you are reading closely, you might notice that this is the second time we've used mock-ups in this chapter (see the earlier test4.cgi example). If you find this technique generally useful, it would probably make sense to put the dummy class, along with a function for populating a form dictionary on demand, into a module so it can be reused. In fact, we will do that in the next section. Even for two-line classes like this, typing the same code the third time around will do much to convince you of the power of code reuse.

Now let's get back to interacting with this program. If we select a particular language, our CGI script generates an HTML reply of the following sort (along with the required content-type header and blank line):

 

Languages

Syntax


Scheme

(display "Hello World") (newline)

 


Program code is marked with a

tag to specify preformatted text (the browser won't reformat it like a normal text paragraph). This reply code shows what we get when we pick "Scheme." Figure 12-20 shows the page served up by the script after selecting "Python" in the pull-down selection list.

Figure 12-20. Response page created by languages.cgi

Our script also accepts a language name of "All," and interprets it as a request to display the syntax for every language it knows about. For example, here is the HTML that is generated if we set global variable debugme to 1 and run from the command line with a single argument, "All." This output is the same as what's printed to the client's browser in response to an "All" request:[9]

[9] Interestingly, we also get the "All" reply if debugme is set to when we run the script from the command line. The cgi.FieldStorage call returns an empty dictionary if called outside the CGI environment rather than throwing an exception, so the test for a missing key kicks in. It's likely safer to not rely on this behavior, however.

C:...PP2EInternetCgi-WebBasics>python languages.cgi All Content-type: text/html

Languages

Syntax


Perl

print "Hello World ";

 

SmallTalk

'Hello World' print.

 

Basic

10 PRINT "Hello World"

 

Scheme

(display "Hello World") (newline)

 

Python

print 'Hello World'

 

C++

cout << "Hello World" << endl;

 

Pascal

WriteLn('Hello World');

 

Java

System.out.println("Hello World");

 

C

printf("Hello World ");

 

Tcl

puts "Hello World"

 

Fortran

print *, 'Hello World'

 


Each language is represented here with the same code pattern -- the showHello function is called for each table entry, along with a mocked-up form object. Notice the way that C++ code is escaped for embedding inside the HTML stream; this is the cgi.escape call's handiwork. When viewed with a browser, the "All" response page is rendered as shown in Figure 12-21.

Figure 12-21. Response page for "all languages" choice

12.4.1 Checking for Missing and Invalid Inputs

So far, we've been triggering the CGI script by selecting a language name from the pull-down list in the main HTML page. In this context, we can be fairly sure that the script will receive valid inputs. Notice, though, that there is nothing to prevent a user from passing the requested language name at the end of the CGI script's URL as an explicit parameter, instead of using the HTML page form. For instance, a URL of the form:

http://starship.python.net/~lutz/Basics/languages.cgi?language=Python

yields the same "Python" response page shown in Figure 12-20.[10] However, because it's always possible for a user to bypass the HTML file and use an explicit URL, it's also possible that a user could invoke our script with an unknown language name that is not in the HTML file's pull-down list (and so not in our script's table). In fact, the script might be triggered with no language input at all, if someone explicitly types its URL with no parameter at the end.

[10] See the urllib module examples in the prior and following chapters for a way to send this URL from a Python script. urllib lets programs fetch web pages and invoke remote CGI scripts by building and submitting URL strings like this one, with any required parameters filled in at the end of the string. You could use this module, for instance, to automatically send information to order Python books at an online bookstore from within a Python script, without ever starting a web browser.

To be robust, the script checks for both cases explicitly, as all CGI scripts generally should. For instance, here is the HTML generated in response to a request for the fictitious language "GuiDO":

 

Languages

Syntax


GuiDO

Sorry--I don't know that language

 


If the script doesn't receive any language name input, it simply defaults to the "All" case. If we didn't detect these cases, chances are that our script would silently die on a Python exception and leave the user with a mostly useless half-complete page or with a default error page (we didn't assign stderr to stdout here, so no Python error message would be displayed). In pictures, Figure 12-22 shows the page generated if the script is invoked with an explicit URL like this:

http://starship.python.net/~lutz/Basics/languages.cgi?language=COBOL

To test this error case, the pull-down list includes an "Unknown" name, which produces a similar error page reply. Adding code to the script's table for the COBOL "Hello World" program is left as an exercise for the reader.

Figure 12-22. Response page for unknown language

12 5 Coding for Maintainability

Категории