Including Special Characters and NULL Values in Queries
2.8.1 Problem
You've having trouble constructing queries that include data values containing special characters such as quotes or backslashes, or special values such as NULL.
2.8.2 Solution
Use your API's placeholder mechanism or quoting function.
2.8.3 Discussion
Up to this point, our queries have used "safe" data values requiring no special treatment. This section describes how to construct queries when you're using values that contain special characters such as quotes, backslashes, binary data, or values that are NULL. The difficulty with such values is as follows. Suppose you have the following INSERT query:
INSERT INTO profile (name,birth,color,foods,cats) VALUES('Alison','1973-01-12','blue','eggroll',4);
There's nothing unusual about that. But if you change the name column value to something like De'Mont that contains a single quote, the query becomes syntactically invalid:
INSERT INTO profile (name,birth,color,foods,cats) VALUES('De'Mont','1973-01-12','blue','eggroll',4);
The problem is that there is a single quote inside a single-quoted string. To make the query legal, the quote could be escaped by preceding it either with a single quote or with a backslash:
INSERT INTO profile (name,birth,color,foods,cats) VALUES('De''Mont','1973-01-12','blue','eggroll',4); INSERT INTO profile (name,birth,color,foods,cats) VALUES('De'Mont','1973-01-12','blue','eggroll',4);
Alternatively, you could quote the name value itself within double quotes rather than within single quotes:
INSERT INTO profile (name,birth,color,foods,cats) VALUES("De'Mont",'1973-01-12','blue','eggroll',4);
Naturally, if you are writing a query literally in your program, you can escape or quote the name value by hand because you know what the value is. But if you're using a variable to provide the name value, you don't necessarily know what the variable's value is. Worse yet, single quote isn't the only character you must be prepared to deal with; double quotes and backslashes cause problems, too. And if you want to store binary data such as images or sound clips in your database, such values might contain anythingnot just quotes or backslashes, but other characters such as nulls (zero-valued bytes). The need to handle special characters properly is particularly acute in a web environment where queries are constructed using form input (for example, if you're searching for records that match search terms entered by the remote user). You must be able to handle any kind of input in a general way, because you can't predict in advance what kind of information people will supply. In fact, it is not uncommon for malicious users to enter garbage values containing problematic characters in a deliberate attempt to break your scripts.
The SQL NULL value is not a special character, but it too requires special treatment. In SQL, NULL indicates "no value." This can have several meanings depending on context, such as "unknown," "missing," "out of range," and so forth. Our queries thus far have not used NULL values, to avoid dealing with the complications that they introduce, but now it's time to address these issues. For example, if you don't know De'Mont's favorite color, you can set the color column to NULLbut not by writing the query like this:
INSERT INTO profile (name,birth,color,foods,cats) VALUES('De''Mont','1973-01-12','NULL','eggroll',4);
Instead, the NULL value shouldn't have any surrounding quotes at all:
INSERT INTO profile (name,birth,color,foods,cats) VALUES('De''Mont','1973-01-12',NULL,'eggroll',4);
If you were writing the query literally in your program, you'd simply write the word "NULL" without surrounding quotes. But if the color value comes from a variable, the proper action is not so obvious. You must know something about the variable's value to be able to determine whether or not to surround it with quotes when you construct the query.
There are two general means at your disposal for dealing with special characters such as quotes and backslashes, and with special values such as NULL:
- Use placeholders if your API supports them. Generally, this is the preferred method, because the API itself will do all or most of the work for you of providing quotes around values as necessary, quoting or escaping special characters within the data value, and possibly interpreting a special value to map onto NULL without surrounding quotes. Recipe 2.7 provides general background on placeholder support; you should read that section if you haven't already.
- Use a quoting function if your API provides one for converting data values to a safe form that is suitable for use in query strings.
The remainder of this section shows how to handle special characters for each API. The examples demonstrate how to insert a profile table record that contains De'Mont for the name value and NULL for the color value. The techniques shown work generally to handle any special characters, including those found in binary data. (The techniques are not limited to INSERT queries. They work for other kinds of statements as well, such as SELECT queries.) Examples showing specifically how to work with a particular kind of binary dataimagesare provided in Chapter 17.
A related issue not covered here is the inverse operation of transforming special characters in values returned from your database for display in various contexts. For example, if you're generating HTML pages that include values taken from your database, you have to convert < and > characters in those values to the HTML entities < and > to make sure they display properly. This topic is discussed in Chapter 16.
2.8.4 Perl
DBI supports a placeholder mechanism for binding data values to queries, as discussed in Recipe 2.7. Using this mechanism, we can add the profile record for De'Mont by using do( ):
my $count = $dbh->do ("INSERT INTO profile (name,birth,color,foods,cats) VALUES(?,?,?,?,?)", undef, "De'Mont", "1973-01-12", undef, "eggroll", 4);
Alternatively, use prepare( ) plus execute( ):
my $sth = $dbh->prepare ("INSERT INTO profile (name,birth,color,foods,cats) VALUES(?,?,?,?,?)"); my $count = $sth->execute ("De'Mont", "1973-01-12", undef, "eggroll", 4);
In either case, the resulting query generated by DBI is as follows:
INSERT INTO profile (name,birth,color,foods,cats) VALUES('De'Mont','1973-01-12',NULL,'eggroll','4')
Note how DBI adds quotes around data values, even though there were none around the ? placeholder characters in the original query string. (The placeholder mechanism adds quotes around numeric values, too, but that's okay, because the MySQL server performs type conversion as necessary to convert strings to numbers.) Also note the DBI convention that when you bind undef to a placeholder, DBI puts a NULL into the query and correctly refrains from adding surrounding quotes.
DBI also provides a quote( ) method as an alternative to using placeholders. quote( ) is a database handle method, so you must have a connection open to the server before you can use it. (This is because the proper quoting rules cannot be selected until the driver is known; some databases have different quoting rules than others.) Here's how to use quote( ) to create a query string for inserting a new record in the profile table:
my $stmt = sprintf ( "INSERT INTO profile (name,birth,color,foods,cats) VALUES(%s,%s,%s,%s,%s)", $dbh->quote ("De'Mont"), $dbh->quote ("1973-01-12"), $dbh->quote (undef), $dbh->quote ("eggroll"), $dbh->quote (4)); my $count = $dbh->do ($stmt);
The query string generated by this code is the same as when you use placeholders. The %s format specifiers are written without surrounding quotes because quote( ) provides them automatically as necessary: undef values are inserted as NULL without quotes, and non-undef values are inserted with quotes.
2.8.5 PHP
PHP has no placeholder capability, but does provide an addslashes( ) function that you can use to make values safe for insertion into query strings. addslashes( ) escapes special characters such as quotes and backslashes, but does not add surrounding quotes around values; you must add them yourself. We also need a convention for specifying NULL values; let's try using unset( ) to force a variable to have "no value" (somewhat like Perl's undef value). Here is some PHP code for adding De'Mont's profile table record:
unset ($null); # create a "null" value $stmt = sprintf (" INSERT INTO profile (name,birth,color,foods,cats) VALUES('%s','%s','%s','%s','%s')", addslashes ("De'Mont"), addslashes ("1973-01-12"), addslashes ($null), addslashes ("eggroll"), addslashes (4)); $result_id = mysql_query ($stmt, $conn_id);
In the example, the %s format specifiers in the query string are surrounded with quotes because addslashes( ) doesn't provide them. Unfortunately, the resulting query string looks like this, which isn't quite correct:
INSERT INTO profile (name,birth,color,foods,cats) VALUES('De'Mont','1973-01-12','','eggroll','4')
The quote in the name field has been escaped properly, but the "null" (unset) value we passed for the color column turned into an empty string, not NULL. Let's fix this by writing a helper function sql_quote( ) to use in place of addslashes( ). sql_quote( ) is similar to addslashes( ), but returns NULL (without surrounding quotes) for unset values and adds quotes around the value otherwise. Here's what it looks like:
function sql_quote ($str) { return (isset ($str) ? "'" . addslashes ($str) . "'" : "NULL"); }
Because sql_quote( ) itself adds quote characters around the data value if they're needed, we can remove the quotes that surround the %s format specifiers in the query string and generate the INSERT statement like this:
unset ($null); # create a "null" value $stmt = sprintf (" INSERT INTO profile (name,birth,color,foods,cats) VALUES(%s,%s,%s,%s,%s)", sql_quote ("De'Mont"), sql_quote ("1973-01-12"), sql_quote ($null), sql_quote ("eggroll"), sql_quote (4)); $result_id = mysql_query ($stmt, $conn_id);
After making the preceding changes, the value of $stmt includes a properly unquoted NULL value:
INSERT INTO profile (name,birth,color,foods,cats) VALUES('De'Mont','1973-01-12',NULL,'eggroll','4')
If you're using PHP 4, you have some additional options for handling NULL values and special characters. First, PHP 4 has a special value NULL that is like an unset value, so you could use that in place of $null in the preceding code that generated the INSERT statement. (However, to write code that works for both PHP 3 and PHP 4, use an unset variable such as $null.) Second, as of PHP 4.0.3, an alternative to addslashes( ) is to use mysql_escape_string( ), which is based on the function of the same name in the MySQL C API. For example, you could rewrite sql_quote( ) to use mysql_escape_string( ) like this:
function sql_quote ($str) { return (isset ($str) ? "'" . mysql_escape_string ($str) . "'" : "NULL"); }
If you want a version that uses mysql_escape_string( ) if it's present and falls back to addslashes( ) otherwise, write sql_quote( ) like this:
function sql_quote ($str) { if (!isset ($str)) return ("NULL"); $func = function_exists ("mysql_escape_string") ? "mysql_escape_string" : "addslashes"; return ("'" . $func ($str) . "'"); }
Whichever version of sql_quote( ) you use, it's the kind of routine that is a good candidate for inclusion in a library file. I'll assume its availability for PHP scripts in the rest of this book. You can find it as part of the Cookbook_Utils.php file in the lib directory of the recipes distribution. To use the file, install it in the same location where you put Cookbook.php and reference it from scripts like this:
include "Cookbook_Utils.php";
2.8.6 Python
Python provides a placeholder mechanism that you can use for handling special characters in data values, as described in Recipe 2.7. To add the profile table record for De'Mont, the code looks like this:
try: cursor = conn.cursor ( ) cursor.execute (""" INSERT INTO profile (name,birth,color,foods,cats) VALUES(%s,%s,%s,%s,%s) """, ("De'Mont", "1973-01-12", None, "eggroll", 4)) print "%d row was inserted" % cursor.rowcount except: print "Oops, the query failed"
The parameter binding mechanism adds quotes around data values where necessary. DB-API treats None as logically equivalent to the SQL NULL value, so you can bind None to a placeholder to produce a NULL in the query string. The query that is sent to the server by the preceding execute( ) call looks like this:
INSERT INTO profile (name,birth,color,foods,cats) VALUES('De'Mont','1973-01-12',NULL,'eggroll',4)
With MySQLdb 0.9.1 or newer, an alternative method of quoting data values is to use the literal( ) method. To produce the INSERT statement for De'Mont by using literal( ), do this:
try: cursor = conn.cursor ( ) str = """ INSERT INTO profile (name,birth,color,foods,cats) VALUES(%s,%s,%s,%s,%s) """ % (conn.literal ("De'Mont"), conn.literal ("1973-01-12"), conn.literal (None), conn.literal ("eggroll"), conn.literal (4)) cursor.execute (str) print "%d row was inserted" % cursor.rowcount except: print "Oops, the query failed"
2.8.7 Java
Java provides a placeholder mechanism that you can use to handle special characters in data values, as described in Recipe 2.7. To add the profile table record for De'Mont, create a prepared statement, bind the data values to it, then execute the statement:
PreparedStatement s; int count; s = conn.prepareStatement ( "INSERT INTO profile (name,birth,color,foods,cats)" + " VALUES(?,?,?,?,?)"); s.setString (1, "De'Mont"); s.setString (2, "1973-01-12"); s.setNull (3, java.sql.Types.CHAR); s.setString (4, "eggroll"); s.setInt (5, 4); count = s.executeUpdate ( ); s.close ( ); // close statement
Each value-binding call here is chosen to match the data type of the column to which the value is bound: setString( ) to bind a string to the name column, setInt( ) to bind an integer to the cats column, and so forth. (Actually, I cheated a bit by using setString( ) to treat the date value for birth as a string.) The setXXX( ) calls add quotes around data values if necessary, so no quotes are needed around the ? placeholder characters in the query string. One difference between JDBC and the other APIs is that you don't specify a special value to bind a NULL to a placeholder by specifying some special value (such as undef in Perl or None in Python). Instead, you invoke a special method setNull( ), where the second argument indicates the type of the column (java.sql.Types.CHAR for a string, java.sql.Types.INTEGER for an integer, etc.).
To achieve some uniformity in the value-binding calls, a helper function bindParam( ) can be defined that takes a Statement object, a placeholder position, and a data value. This allows the same function to be used to bind any data value. We can even use the convention that passing the Java null value binds a SQL NULL to the query. After rewriting the previous example to use bindParam( ), it looks like this:
PreparedStatement s; int count; s = conn.prepareStatement ( "INSERT INTO profile (name,birth,color,foods,cats)" + " VALUES(?,?,?,?,?)"); bindParam (s, 1, "De'Mont"); bindParam (s, 2, "1973-01-12"); bindParam (s, 3, null); bindParam (s, 4, "eggroll"); bindParam (s, 5, 4); count = s.executeUpdate ( ); s.close ( ); // close statement
The implementation of bindParam( ) requires multiple functions, because the third argument can be of different types, so we need one function for each type. The following code shows versions that handle integer and string data values (the string version handles null and binds it to NULL):
public static void bindParam (PreparedStatement s, int pos, int val) { try { s.setInt (pos, val); } catch (Exception e) { /* catch and ignore */ } } public static void bindParam (PreparedStatement s, int pos, String val) { try { if (val == null) s.setNull (pos, java.sql.Types.CHAR); else s.setString (pos, val); } catch (Exception e) { /* catch and ignore */ } }
To handle additional data types, you'd write other versions of bindParam( ) that accept arguments of the appropriate type.