Collecting Web Input

18.6.1 Problem

You want to extract the input parameters that were submitted as part of a form or specified at the end of a URL.

18.6.2 Solution

Each API provides a means of accessing the names and values of the input parameters in the execution environment of a web script.

18.6.3 Discussion

Earlier sections of this chapter discuss how to retrieve information from MySQL and use it to generate various forms of output, such as static text, hyperlinks, or form elements. In this section, we'll discuss the opposite problemhow to collect input from the Web. Applications for such input are many. For example, you can use the techniques shown in this section to extract the contents of a form submitted by a user. You might interpret the information as search keywords, then run a query against a product catalog and show the matching items to a customer. In this case, you use the Web to collect information from which you can determine the client's interests. From that you construct an appropriate search query and display the results. If a form represents a survey, a mailing list sign-up sheet, or a poll, you might just store the values, using the data to create a new database record (or perhaps to update an existing record).

A script that receives input over the Web and uses it to interact with MySQL generally processes the information in a series of stages:

  1. Extract the input from the execution environment. When a request arrives that contains input parameters, the web server places the input into the environment of the script that handles the request, and the script queries its environment to obtain the parameters. It may be necessary to decode special characters in the parameters to recover the actual values submitted by the client, if the extraction mechanism provided by your API doesn't do it for you. (For example, you may need to convert %20 to space.)
  2. Validate the input to make sure it's legal. You cannot trust users to send legal values, so it's a good idea to check input parameters to make sure they look reasonable. For example, if you expect a user to enter a number into a field, you should check the value to be sure it's really numeric. If a form contains a pop-up menu that was constructed using the allowable values of an ENUM column, you might expect the value that you actually get back to be one of these values. But there's no way to be sure except to check. If you don't, you run the risk of entering garbage into your database.
  3. Construct a query based on the input. Typically, input parameters are used to add a record to a database, or to to retrieve information from the database for display to the client. Either way, you use the input to construct a query and send it to the MySQL server. Query construction based on user input should be done with care, using proper escaping to avoid creating malformed or dangerous SQL statements.

The rest of this section explores the first of these three stages of input processing. Recipe 18.7Recipe 18.7 and Recipe 18.8 cover the second and third stages. The first stage (pulling input from the execution environment) has little to do with MySQL, but is covered here because it's necessarily the means by which you obtain the information used in the later processing stages.

Input obtained over the Web can be received in several ways, two of which are most common:

You may also have occasion to process other types of input, such as uploaded files. Those are sent using POST requests, but as part of a special kind of form that is discussed in Recipe 18.9.

When you gather input for a web script, you may need to be concerned with how the input was sent. (Some APIs distinguish between input sent via GET and POST, others do not.) However, once you have pulled out the information that was sent, the request method doesn't matter. The validation and query construction stages do not need to know whether parameters were sent using GET or POST.

The recipes distribution includes some scripts in the apache/params directory (tomcat/mcb for JSP) that process input parameters. Each script allows you to submit GET or POST requests, and shows how to extract and display the parameter values thus submitted. Examine these scripts to see how the parameter extraction methods for the various APIs are used. Utility routines invoked by the scripts can be found in the library modules in the lib directory of the distribution.

18.6.4 Web Input Extraction Conventions

To obtain input parameters passed to a script, you should familiarize yourself with your API's conventions so that you know what it does for you, and what you must do yourself. For example, you should know the answers to these questions:

18.6.5 Perl

The Perl CGI.pm module makes input parameters available to scripts through the param( ) function. param( ) provides access to input submitted via either GET or POST, which simplifies your task as the script writer. If a form containing id and name parameters was submitted via POST, you can process it the same way as if the parameters were submitted at the end of the URL via GET. You don't need to perform any decoding, either; param( ) handles that as well.

To obtain a list of names of all available parameters, call param( ) with no arguments:

@names = param ( );

To obtain the value of a specific parameter, pass its name to param( ).

$id = param ("id"); @options = param ("options");

In scalar context, param( ) returns the parameter value if it is single-valued, the first value if it is multiple-values, or undef if the parameter is not available. In array context, param( ) returns a list containing all the parameter's values, or an empty list if the parameter is not available.

A parameter with a given name might not be available if the form field with the same name was left blank, or if there isn't any field with that name. Note too that a parameter value may be defined but empty. For good measure, you may want to check both possibilities. For example, to check for an age parameter and assign a default value of unknown if the parameter is missing or empty, you can do this:

$age = param ("age"); $age = "unknown" if !defined ($age) || $age eq "";

CGI.pm understands both ; and & as URL parameter separator characters.

18.6.6 PHP

Input parameters can be available to PHP in several ways, depending on your version of PHP and on your configuration settings:

The track_vars and register_globals settings can be compiled in or configured in the PHP php.ini file. As of PHP 4.0.3, track_vars is always on, and I suspect that most installations of earlier versions enable this setting as well. For this reason, I'll assume your version of PHP has track_vars enabled.

register_globals makes it convenient to access input parameters through global variables, but the PHP developers recommend that it be disabled for security reasons. Why is that? Well, suppose you write a script that requires the user to supply a password, which is represented by the $password variable. You might check the password in a script like this:

if (check_password ($password)) $password_is_ok = 1;

The intent here is that if the password matches, the script sets $password_is_ok to 1. Otherwise $password_is_ok is left unset (which compares false in Boolean expressions). But suppose register_variables is enabled and someone invokes your script as follows:

http://your.host.com/chkpass.php?password_is_ok=1

In this case, PHP sees that the password_is_ok parameter is set to 1, and sets the $password_is_ok variable to 1. The result is that when your script executes, $password_is_ok is 1 no matter what password was given, or even if no password was given! The problem with register_globals is that it allows outside users to supply default values for global variables in your scripts. One solution is to disable register_globals, in which case you'll need to check the global arrays ($_GET, $_POST) for input parameter values. If you don't want to do that, you should take care not to assume that PHP variables have no value initially. Unless you're expecting a global variable to be set from an input parameter, it's best to initialize it explicitly to a known value. The password-checking code should be written like this to make sure that $password_is_ok is assigned a value whatever the result of the test:

$password_is_ok = 0; if (check_password ($password)) $password_is_ok = 1;

The PHP scripts in this book do not rely on register_globals. Instead, they obtain input through the global parameter arrays.

Another complicating factor when retrieving input parameters in PHP is that they may need some decoding, depending on the value of the magic_quotes_gpc configuration setting. If magic quotes are enabled, any quote, backslash, and NUL characters in input parameter values will be escaped with backslashes. I guess this is supposed to save you a step by allowing you to extract values and use them directly in query strings. However, that's only useful if you plan to use web input in a query with no preprocessing or validity checking, which is dangerous. You should check your input first, in which case it's necessary to strip out the slashes anyway. That means having magic quotes turned on isn't really very useful.

Given the various sources through which input parameters may be available, and the fact that they may or may not contain extra backslashes, extracting input in PHP scripts can be an interesting problem. If you have control of your server and can set the values of the various configuration settings, you can of course write your scripts based on those settings. But if you do not control your server or are writing scripts that need to run on several machines, you may not know in advance what the settings are. Fortunately, with a bit of effort it's possible to write reasonably general purpose parameter extraction code that works correctly with very few assumptions about your PHP operating environment. The following utility function, get_param_val( ), takes a parameter name as its argument and returns the corresponding parameter value. If the parameter is not available, the function returns an unset value.

function get_param_val ($name) { global $HTTP_GET_VARS, $HTTP_POST_VARS; unset ($val); if (isset ($_GET[$name])) $val = $_GET[$name]; else if (isset ($_POST[$name])) $val = $_POST[$name]; else if (isset ($HTTP_GET_VARS[$name])) $val = $HTTP_GET_VARS[$name]; else if (isset ($HTTP_POST_VARS[$name])) $val = $HTTP_POST_VARS[$name]; if (isset ($val) && get_magic_quotes_gpc ( )) $val = strip_slash_helper ($val); return (@$val); }

To use this function to obtain the value of a single-valued parameter named id, call it like this:

$id = get_param_val ("id");

You can test $id to determine whether the id parameter was present in the input:

if (isset ($id)) ... id parameter is present ... else ... id parameter is not present ...

For a form field that may have multiple values (such as a checkbox group or a multiple-pick scrolling list), you should represent it in the form using a name that ends in [ ]. For example, a list element constructed from the SET column accessories in the cow_order table has one item for each allowable set value. To make sure PHP treats the element value as an array, don't name the field accessories, name it accessories[ ]. (See Recipe 18.4 for an example.) When the form is submitted, PHP places the array of values in a parameter named without the [ ], so to access it, do this:

$accessories = get_param_val ("accessories");

The $accessories variable will be an array. (This will be true whether the parameter has multiple values, a single value, or even no values. The determining factor is not whether the parameter actually has multiple values, but whether you name the corresponding field in the form using [ ] notation.)

The get_param_val( ) function checks the $_GET, $_POST, $HTTP_GET_VARS, and $HTTP_POST_VARS arrays for parameter values. Thus, it works correctly for PHP 3 and PHP 4, whether the request was made by GET or POST, and whether or not register_globals is turned on. The only thing that the function assumes is that track_vars is enabled.

get_param_val( ) also works correctly regardless of whether magic quoting is enabled. It uses a helper function strip_slash_helper( ) that performs backslash stripping from parameter values if necessary:

function strip_slash_helper ($val) { if (!is_array ($val)) $val = stripslashes ($val); else { reset ($val); while (list ($k, $v) = each ($val)) $val[$k] = strip_slash_helper ($v); } return ($val); }

strip_slash_helper( ) checks whether a value is a scalar or an array and processes it accordingly. The reason it uses a recursive algorithm for array values is that in PHP 4 it's possible to create nested arrays from input parameters.

To make it easy to obtain a list of all parameter names, write another utility function:

function get_param_names ( ) { global $HTTP_GET_VARS, $HTTP_POST_VARS; # construct an associative array in which each element has a # parameter name as both key and value. (Using an associative # array eliminates duplicates.) $keys = array ( ); if (isset ($_GET)) { reset ($_GET); while (list ($k, $v) = each ($_GET)) $keys[$k] = $k; } else if (isset ($HTTP_GET_VARS)) { reset ($HTTP_GET_VARS); while (list ($k, $v) = each ($HTTP_GET_VARS)) $keys[$k] = $k; } if (isset ($_POST)) { reset ($_POST); while (list ($k, $v) = each ($_POST)) $keys[$k] = $k; } else if (isset ($HTTP_POST_VARS)) { reset ($HTTP_POST_VARS); while (list ($k, $v) = each ($HTTP_POST_VARS)) $keys[$k] = $k; } return ($keys); }

get_param_names( ) returns a list of parameter names present in the HTTP variable arrays, with duplicate names removed if there is overlap between the arrays. The return value is an associative array with both the keys and values set to the parameter names. This way you can use either the keys or the values as the list of names. The following example prints the names, using the values:

$param_names = get_param_names ( ); while (list ($k, $v) = each ($param_names)) print (htmlspecialchars ($v) . " ");

For PHP 3 scripts, the parameters in URLs should be separated by & characters. That's also the default for PHP 4, although you can change it using the arg_separator configuration setting in the PHP initialization file.

18.6.7 Python

The Python cgi module provides access to the input parameters that are present in the script environment. Import that module, then create a FieldStorage object using a method of the same name:

import cgi param = cgi.FieldStorage ( )

The FieldStorage method returns information for parameters submitted via either GET or POST requests, so you need not know which method was used to send the request. The FieldStorage object contains an element for each parameter present in the environment. You can get a list of available parameter names like this:

names = param.keys ( )

If a given parameter, name, is single-valued, the value associated with it will be a scalar that you can access as follows:

val = param[name].value

If the parameter is multiple-valued, param[name] will be a list of MiniFieldStorage objects that have name and value attributes. Each of these has the same name (it will be equal to name) and one of the parameter's values. To create a list containing all the values for such a parameter, do this:

val = [ ] for item in param[name]: val.append (item.value)

You can distinguish single-valued from multiple-valued parameters by checking the type. The following code shows how to get the parameter names and loop through each parameter to print its name and value, printing multiple-valued parameters as a comma-separated list:[3]

[3] This code requires that you import the string and types module in addition to the cgi module.

param = cgi.FieldStorage ( ) param_names = param.keys ( ) param_names.sort ( ) print "

Parameter names:", param_names, "

" items = [ ] for name in param_names: if type (param[name]) is not types.ListType: # it's a scalar ptype = "scalar" val = param[name].value else: # it's a list ptype = "list" val = [ ] for item in param[name]: # iterate through MiniFieldStorage val.append (item.value) # items to get item values val = string.join (val, ",") # convert to string for printing items.append ("type=" + ptype + ", name=" + name + ", value=" + val) print make_unordered_list (items)

Python will raise an exception if you try to access a parameter that is not present in the FieldStorage object. To avoid this, use has_key( ) to find out if the parameter exists:

if param.has_key (name): print "parameter " + name + " exists" else: print "parameter " + name + " does not exist"

Single-valued parameters have attributes other than value. For example, a parameter representing an uploaded file has additional attributes you can use to get the file's contents. This is discussed further in Recipe 18.9.

The cgi module expects URL parameters to be separated by & characters. If you generate a hyperlink to a script based on the cgi module and the URL includes parameters, don't separate them with ; characters.

18.6.8 Java

Within JSP pages, the implicit request object provides access to the request parameters through the following methods:

getParameterNames( )

Returns an enumeration of String objects, one for each parameter name present in the request.

getParameterValues(String name)

Returns an array of String objects, one for each value associated with the parameter, or null if the parameter does not exist.

getParameterValue(String name)

Returns the first value associated with the parameter, or null if the parameter does not exist.

The following example shows one way to use these methods to display request parameters:

<%@ page import="java.util.*" %>

Request parameters are also available within JSTL tags, using the special variables param and paramValues. param[name] returns the first value for a given parameter and thus is most suited for single-valued parameters:

color value:

paramValues[name] returns an array of values for the parameter, so it's useful for parameters that may have multiple values:

accessory values:

You can also access a parameter using dot notation if the parameter name is legal as an object property name:

color value: accessory values:

To produce a list of parameter objects with key and value attributes, iterate over the paramValues variable:

 

To construct URLs that point to JSP pages and that have parameters at the end, you should separate the parameters by & characters.

Категории