CGI Script Trade-offs

As shown in this chapter, PyMailCgi is still something of a system in the making, but it does work as advertised: by pointing a browser at the main page's URL, I can check and send email from anywhere I happen to be, as long as I can find a machine with a web browser. In fact, any machine and browser will do: Python doesn't even have to be installed.[5] That's not the case with the PyMailGui client-side program we wrote in Chapter 11.

[5] This property can be especially useful when visiting government institutions, which seem to generally provide web browser accessibility, but restrict administrative functions and broader network connectivity to officially cleared system administrators (and international spies).

But before we all jump on the collective Internet bandwagon and utterly abandon traditional APIs like Tkinter, a few words of larger context are in order. Besides illustrating larger CGI applications in general, this example was chosen to underscore some of the trade-offs you run into when building applications to run on the Web. PyMailGui and PyMailCgi do roughly the same things, but are radically different in implementation:

On a basic level, both systems use the Python POP and SMTP modules to fetch and send email through sockets. But the implementation alternatives they represent have some critical ramifications that you should know about when considering delivering systems on the Web:

Performance costs

Networks are slower than CPUs . As implemented, PyMailCgi isn't nearly as fast or as complete as PyMailGui. In PyMailCgi, every time the user clicks a submit button, the request goes across the network. More specifically, every user request incurs a network transfer overhead, every callback handler (usually) takes the form of a newly spawned process on the server, parameters come in as text strings that must be parsed out, and the lack of state information on the server between pages means that mail needs to be reloaded often. In contrast, user clicks in PyMailGui trigger in-process function calls instead of network traffic and process forks, and state is easily saved as Python in-process variables (e.g., the loaded-mail list is retained between clicks). Even with an ultra-fast Internet connection, a server-side CGI system is slower than a client-side program.[6]

[6] To be fair, some Tkinter operations are sent to the underlying Tcl library as strings too, which must be parsed. This may change in time; but the contrast here is with CGI scripts versus GUI libraries in general, not with a particular library's implementation.

Some of these bottlenecks may be designed away at the cost of extra program complexity. For instance, some web servers use threads and process pools to minimize process creation for CGI scripts. Moreover, some state information can be manually passed along from page to page in hidden form fields and generated URL parameters, and state can be saved between pages in a concurrently accessible database to minimize mail reloads (see the PyErrata case study in Chapter 14 for an example). But there's no getting past the fact that routing events over a network to scripts is much slower than calling a Python function directly.

Complexity costs

HTML isn't pretty . Because PyMailCgi must generate HTML to interact with the user in a web browser, it is also more complex (or at least, less readable) than PyMailGui. In some sense, CGI scripts embed HTML code in Python. Because the end result of this is a mixture of two very different languages, creating an interface with HTML in a CGI script can be much less straightforward than making calls to a GUI API such as Tkinter.

Witness, for example, all the care we've taken to escape HTML and URLs in this chapter's examples; such constraints are grounded in the nature of HTML. Furthermore, changing the system to retain loaded-mail list state in a database between pages would introduce further complexities to the CGI-based solution. Secure sockets (e.g., OpenSSL, to be supported in Python 1.6) would eliminate manual encryption costs, but introduce other overheads.

Functionality costs

HTML can only say so much. HTML is a portable way to specify simple pages and forms, but is poor to useless when it comes to describing more complex user interfaces. Because CGI scripts create user interfaces by writing HTML back to a browser, they are highly limited in terms of user-interface constructs.

For example, consider implementing an image-processing and animation program as CGI scripts: HTML doesn't apply once we leave the domain of fill-out forms and simple interactions. This is precisely the limitation that Java applets were designed to address -- programs that are stored on a server but pulled down to run on a client on demand, and given access to a full-featured GUI API for creating richer user interfaces. Nevertheless, strictly server-side programs are inherently limited by the constraints of HTML. The animation scripts we wrote at the end of Chapter 8, for example, are well beyond the scope of server-side scripts.

Portability benefits

All you need is a browser . On the client side, at least. Because PyMailCgi runs over the Web, it can be run on any machine with a web browser, whether that machine has Python and Tkinter installed or not. That is, Python needs to be installed on only one computer: the web server machine where the scripts actually live and run. As long as you know that the users of your system have an Internet browser, installation is simple.

Python and Tkinter, you will recall, are very portable too -- they run on all major window systems (X, Windows, Mac) -- but to run a client-side Python/Tk program such as PyMailGui, you need Python and Tkinter on the client machine itself. Not so with an application built as CGI scripts: they will work on Macintosh, Linux, Windows, and any other machine that can somehow render HTML web pages. In this sense, HTML becomes a sort of portable GUI API language in CGI scripts, interpreted by your web browser. You don't even need the source code or bytecode for the CGI scripts themselves -- they run on a remote server that exists somewhere else on the Net, not on the machine running the browser.

Execution requirements

But you do need a browser. That is, the very nature of web-enabled systems can render them useless in some environments. Despite the pervasiveness of the Internet, there are still plenty of applications that run in settings that don't have web browsers or Internet access. Consider, for instance, embedded systems, real-time systems, and secure government applications. While an Intranet (a local network without external connections) can sometimes make web applications feasible in some such environments, I have recently worked at more than one company whose client sites had no web browsers to speak of. On the other hand, such clients may be more open to installing systems like Python on local machines, as opposed to supporting an internal or external network.

Administration requirements

You really need a server too . You can't write CGI-based systems at all without access to a web sever. Further, keeping programs on a centralized server creates some fairly critical administrative overheads. Simply put, in a pure client/server architecture, clients are simpler, but the server becomes a critical path resource and a potential performance bottleneck. If the centralized server goes down, you, your employees, and your customers may be knocked out of commission. Moreover, if enough clients use a shared server at the same time, the speed costs of web-based systems become even more pronounced. In fact, one could make the argument that moving towards a web server architecture is akin to stepping backwards in time -- to the time of centralized mainframes and dumb terminals. Whichever way we step, offloading and distributing processing to client machines at least partially avoids this processing bottleneck.

So what's the best way to build applications for the Internet -- as client-side programs that talk to the Net, or as server-side programs that live and breathe on the Net? Naturally, there is no one answer to that question, since it depends upon each application's unique constraints. Moreover, there are more possible answers to it than we have proposed here; most common CGI problems already have common proposed solutions. For example:

Client-side solutions

Client- and server-side programs can be mixed in many ways. For instance, applet programs live on a server, but are downloaded to and run as client-side programs with access to rich GUI libraries (more on applets when we discuss JPython in Chapter 15). Other technologies, such as embedding JavaScript or Python directly in HTML code, also support client-side execution and richer GUI possibilities; such scripts live in HTML on the server, but run on the client when downloaded and access browser components through an exposed object model (see the discussion Section 15.8 near the end of Chapter 15). The emerging Dynamic HTML (DHTML) extensions provide yet another client-side scripting option for changing web pages after they have been constructed. All of these client-side technologies add extra complexities all their own, but ease some of the limitations imposed by straight HTML.

State retention solutions

Some web application servers (e.g., Zope, described in Chapter 15) naturally support state retention between pages by providing concurrently accessible object databases. Some of these systems have a real underlying database component (e.g., Oracle and mySql); others may make use of files or Python persistent object shelves with appropriate locking (as we'll explore in the next chapter). Scripts can also pass state information around in hidden form fields and generated URL parameters, as done in PyMailCgi, or store it on the client machine itself using the standard cookie protocol.

Cookies are bits of information stored on the client upon request from the server. A cookie is created by sending special headers from the server to the client within the response HTML (Set-Cookie: name=value). It is then accessed in CGI scripts as the value of a special environment variable containing cookie data uploaded from the client (HTTP_COOKIE). Search http://www.python.org for more details on using cookies in Python scripts, including the freely available cookie.py module, which automates the cookie translation process.[7] Cookies are more complex than program variables and are somewhat controversial (some see them as intrusive), but they can offload some simple state retention tasks.

[7] Also see the new standard cookie module in Python release 2.0.

HTML generation solutions

Add-ons can also take some of the complexity out of embedding HTML in Python CGI scripts, albeit at some cost to execution speed. For instance, the HTMLgen system described in Chapter 15 lets programs build pages as trees of Python objects that "know" how to produce HTML. When a system like this is employed, Python scripts deal only with objects, not the syntax of HTML itself. Other systems such as PHP and Active Server Pages (described in the same chapter) allow scripting language code to be embedded in HTML and executed on the server, to dynamically generate or determine part of the HTML that is sent back to a client in response to requests.

Clearly, Internet technology does imply some design trade-offs, and is still evolving rapidly. It is nevertheless an appropriate delivery context for many (though not all) applications. As with every design choice, you must be the judge. While delivering systems on the Web may have some costs in terms of performance, functionality, and complexity, it is likely that the significance of those overheads will diminish with time.

Категории