Processing File Uploads
18.9.1 Problem
You want to allow files to be uploaded to your web server and stored in your database.
18.9.2 Solution
Present the user with a web form that includes a file field. When the user submits the form, extract the file and store it in MySQL.
18.9.3 Discussion
One special kind of web input is an uploaded file. A file is sent as part of a POST request, but it's handled differently than other POST parameters, because a file is represented by several pieces of information such as its contents, its MIME type, its original filename on the client, and its name in temporary storage on the web server host.
To handle file uploads, you must send a special kind of form to the user; this is true no matter what API you use to create the form. However, when the user submits the form, the operations that check for and process an uploaded file are API-specific.
To create a form that allows files to be uploaded, the opening <form> tag should specify the POST method and must also include an enctype (encoding type) attribute with a value of multipart/form-data:
<form method="POST" enctype="multipart/form-data">
If you don't specify this kind of encoding, the form will be submitted using the default encoding type (application/x-www-form-urlencoded) and file uploads will not work properly.
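To make the difference between the two encodings concrete, the following Python 3 sketch assembles the body of a multipart/form-data request for a single file field. The function name build_multipart_body, the boundary string, and the sample values are illustrative assumptions; a real browser chooses its own boundary and escapes special characters in names, which this sketch does not.

```python
def build_multipart_body(field_name, filename, mime_type, contents, boundary):
    """Return the bytes of a multipart/form-data body holding one file field.

    Illustrative only: no escaping is applied to field_name or filename.
    """
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        f"Content-Type: {mime_type}\r\n"
        "\r\n"
    ).encode("utf-8")
    # The closing delimiter is the boundary followed by two extra hyphens.
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + contents + tail


if __name__ == "__main__":
    # Hypothetical sample values, chosen only for the demonstration.
    body = build_multipart_body("upload_file", "logo.gif", "image/gif",
                                b"GIF89a", "XYZ")
    print(body.decode("latin-1"))
```

Each part carries the original filename and MIME type alongside the file contents, which is information the default application/x-www-form-urlencoded encoding has no place for.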
To include a file upload field in the form, use an <input> element of type file. For example, to present a 60-character file field named upload_file, the element looks like this:
<input type="file" name="upload_file" size="60" />
The browser displays this field as a text input box into which the user can enter the filename manually. It also presents a Browse button for selecting the file via the standard file-browsing system dialog. When the user chooses a file and submits the form, the browser encodes the file contents for inclusion in the resulting POST request. At that point, the web server receives the request and invokes your script to process it. The specifics vary for particular APIs, but file uploads generally work like this:
- The file will already have been uploaded and stored in a temporary directory by the time your upload-handling script begins executing. All your script has to do is read it. The temporary file will be available to your script either as an open file descriptor or the temporary filename, or perhaps both. The size of the file can be obtained through the file descriptor. The API may also make available other information about the file, such as its MIME type. (But note that some browsers may not send a MIME value.)
- Uploaded files are deleted automatically by the web server when your script terminates. If you want a file's contents to persist beyond the end of your script's execution, you'll have to save it to a more permanent location (for example, in a database or somewhere else in the filesystem). If you save it in the filesystem, the directory where you store it must be accessible to the web server.
- The API may allow you to control the location of the temporary file directory or the maximum size of uploaded files. Changing the directory to one that is accessible only to your web server may improve security a bit against local exploits by other users with login accounts on the server host.
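The first two points above can be sketched in Python 3 as follows: obtain the size through the file descriptor, then copy the temporary file somewhere permanent before the script ends. The function name persist_upload and the destination directory are hypothetical, and the sketch assumes the API has handed you an open binary file object.

```python
import os
import shutil
import tempfile


def persist_upload(tmp_file, dest_dir, dest_name):
    """Copy an open temporary upload file into dest_dir.

    Returns (destination path, size in bytes). Only the final component
    of dest_name is used, so a client-supplied name cannot point outside
    dest_dir.
    """
    size = os.fstat(tmp_file.fileno()).st_size   # size via the file descriptor
    tmp_file.seek(0)                             # copy from the beginning
    dest_path = os.path.join(dest_dir, os.path.basename(dest_name))
    with open(dest_path, "wb") as out:
        shutil.copyfileobj(tmp_file, out)
    return dest_path, size


if __name__ == "__main__":
    # Simulate the server's temporary upload file with made-up contents.
    with tempfile.TemporaryDirectory() as dest_dir:
        with tempfile.TemporaryFile() as tmp:
            tmp.write(b"example image bytes")
            tmp.flush()
            path, size = persist_upload(tmp, dest_dir, "example.img")
            print(size, os.path.exists(path))
```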
This section discusses how to create forms that include a file upload field. It also demonstrates how to handle uploads using a Perl script, post_image.pl. The script is somewhat similar to the store_image.pl script for loading images from the command line (Recipe 17.7). post_image.pl differs in that it allows you to store images over the Web by uploading them, and it stores images only in MySQL, whereas store_image.pl stores them in both MySQL and the filesystem.
This section also discusses how to obtain file upload information using PHP and Python. It does not repeat the entire image-posting scenario shown for Perl, but the recipes distribution contains equivalent implementations of post_image.pl for PHP and Python.
18.9.4 Perl
You can specify multipart encoding for a form several ways using the CGI.pm module. The following statements are all equivalent:
print start_form (-action => url ( ), -enctype => "multipart/form-data");
print start_form (-action => url ( ), -enctype => MULTIPART ( ));
print start_multipart_form (-action => url ( ));
The first statement specifies the encoding type literally. The second uses the CGI.pm MULTIPART( ) function, which is easier than trying to remember the literal encoding value. The third statement is easiest of all, because start_multipart_form( ) supplies the enctype parameter automatically. (Like start_form( ), start_multipart_form( ) uses a default request method of POST, so you need not include a method argument.)
Here's a simple form that includes a text field for assigning a name to an image, a file field for selecting the image file, and a submit button:
print start_multipart_form (-action => url ( )),
      "Image name:", br ( ),
      textfield (-name =>"image_name", -size => 60),
      br ( ), "Image file:", br ( ),
      filefield (-name =>"upload_file", -size => 60),
      br ( ), br ( ),
      submit (-name => "choice", -value => "Submit"),
      end_form ( );
When the user submits an uploaded file, begin processing it by extracting the parameter value for the file field:
$file = param ("upload_file");
The value for a file upload parameter is special in CGI.pm because you can use it two ways. You can treat it as an open file handle to read the file's contents, or pass it to uploadInfo( ) to obtain a reference to a hash that provides information about the file such as its MIME type.
The following listing shows how post_image.pl presents the form and processes a submitted form. When first invoked, post_image.pl generates a form with an upload field. For the initial invocation, no file will have been uploaded, so the script does nothing else. If the user submitted an image file, the script gets the image name, reads the file contents, determines its MIME type, and stores a new record in the image table. For illustrative purposes, post_image.pl also displays all the information that the uploadInfo( ) function makes available about the uploaded file.
#! /usr/bin/perl -w
# post_image.pl - allow user to upload image files via POST requests

use strict;
use lib qw(/usr/local/apache/lib/perl);
use CGI qw(:standard escapeHTML);
use Cookbook;

print header ( ), start_html (-title => "Post Image", -bgcolor => "white");

# Use multipart encoding because the form contains a file upload field

print start_multipart_form (-action => url ( )),
      "Image name:", br ( ),
      textfield (-name =>"image_name", -size => 60),
      br ( ), "Image file:", br ( ),
      filefield (-name =>"upload_file", -size => 60),
      br ( ), br ( ),
      submit (-name => "choice", -value => "Submit"),
      end_form ( );

# Get a handle to the image file and the name to assign to the image

my $image_file = param ("upload_file");
my $image_name = param ("image_name");

# Must have either no parameters (in which case the script was just
# invoked for the first time) or both parameters (in which case the form
# was filled in). If only one was filled in, the user did not fill in the
# form completely.

my $param_count = 0;
++$param_count if defined ($image_file) && $image_file ne "";
++$param_count if defined ($image_name) && $image_name ne "";

if ($param_count == 0)      # initial invocation
{
  print p ("No file was uploaded.");
}
elsif ($param_count == 1)   # incomplete form
{
  print p ("Please fill in BOTH fields and resubmit the form.");
}
else                        # a file was uploaded
{
  my ($size, $data);

  # If an image file was uploaded, print some information about it,
  # then save it in the database.

  # Get reference to hash containing information about file
  # and display the information in "key=x, value=y" format
  my $info_ref = uploadInfo ($image_file);
  print p ("Information about uploaded file:");
  foreach my $key (sort (keys (%{$info_ref})))
  {
    print p ("key=" . escapeHTML ($key)
             . ", value=" . escapeHTML ($info_ref->{$key}));
  }
  $size = (stat ($image_file))[7]; # get file size from file handle
  print p ("File size: " . $size);

  binmode ($image_file);  # helpful for binary data
  if (sysread ($image_file, $data, $size) != $size)
  {
    print p ("File contents could not be read.");
  }
  else
  {
    print p ("File contents were read without error.");

    # Get MIME type, use generic default if not present

    my $mime_type = $info_ref->{'Content-Type'};
    $mime_type = "application/octet-stream" unless defined ($mime_type);

    # Save image in database table. (Use REPLACE to kick out any
    # old image with same name.)

    my $dbh = Cookbook::connect ( );
    $dbh->do ("REPLACE INTO image (name,type,data) VALUES(?,?,?)",
              undef,
              $image_name, $mime_type, $data);
    $dbh->disconnect ( );
  }
}

print end_html ( );

exit (0);
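The three-way parameter check in post_image.pl does not depend on Perl. As a rough Python 3 sketch of the same decision (the function name classify_submission and its return strings are hypothetical and not part of the recipes distribution):

```python
def classify_submission(image_name, image_file):
    """Classify a form submission by how many of the two fields were filled in.

    Returns "initial" when neither field is present, "incomplete" when
    exactly one is present, and "complete" when both are present.
    """
    filled = sum(1 for value in (image_name, image_file)
                 if value is not None and value != "")
    return ("initial", "incomplete", "complete")[filled]


if __name__ == "__main__":
    print(classify_submission(None, None))
    print(classify_submission("logo", ""))
    print(classify_submission("logo", "logo.gif"))
```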
18.9.5 PHP
To write an upload form in PHP, include a file field. If you wish, you may also include a hidden field preceding the file field that has a name of MAX_FILE_SIZE and a value of the largest file size you're willing to accept:
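<form method="POST" enctype="multipart/form-data">
<input type="hidden" name="MAX_FILE_SIZE" value="max_size_in_bytes" />
Image name:<br />
<input type="text" name="image_name" size="60" />
<br />
Image file:<br />
<input type="file" name="upload_file" size="60" />
<br /><br />
<input type="submit" name="choice" value="Submit" />
</form>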
Be aware that MAX_FILE_SIZE is advisory only, because it can be subverted easily. To specify a value that cannot be exceeded, use the upload_max_filesize configuration setting in the PHP initialization file. There is also a file_uploads setting that controls whether or not file uploads are allowed at all.
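The general principle, that a size limit has to be enforced on the server because anything the client sends can be altered, can be illustrated with a short Python 3 sketch. This is not how PHP implements upload_max_filesize; the names read_limited and UploadTooLarge are hypothetical, and the sketch assumes a buffered binary stream.

```python
import io


class UploadTooLarge(Exception):
    """Raised when an upload exceeds the server-side size limit."""


def read_limited(stream, max_bytes):
    """Read an upload from a buffered binary stream, refusing oversized data.

    Reading one byte more than the limit is enough to detect an oversized
    upload without pulling the whole thing into memory.
    """
    data = stream.read(max_bytes + 1)
    if len(data) > max_bytes:
        raise UploadTooLarge("upload exceeds %d bytes" % max_bytes)
    return data


if __name__ == "__main__":
    print(read_limited(io.BytesIO(b"small file"), 100))
```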
When the user submits the form, file upload information may be obtained as follows:
- As of PHP 4.1, file upload information from POST requests is placed in a separate array, $_FILES, which has one entry for each uploaded file. Each entry is itself an array with four elements. For example, if a form has a file field named upload_file and the user submits a file, information about it is available in the following variables:
$_FILES["upload_file"]["name"]        original filename on client host
$_FILES["upload_file"]["tmp_name"]    temporary filename on server host
$_FILES["upload_file"]["size"]        file size, in bytes
$_FILES["upload_file"]["type"]        file MIME type
Be careful here, because there may be an entry for an upload field even if the user submitted no file. In this case, the tmp_name value will be the empty string or the string none.
- Earlier PHP 4 releases have file upload information in a separate array, $HTTP_POST_FILES, which has entries that are structured like those in $_FILES. For a file field named upload_file, information about it is available in the following variables:
$HTTP_POST_FILES["upload_file"]["name"]        original filename on client host
$HTTP_POST_FILES["upload_file"]["tmp_name"]    temporary filename on server host
$HTTP_POST_FILES["upload_file"]["size"]        file size, in bytes
$HTTP_POST_FILES["upload_file"]["type"]        file MIME type
- Prior to PHP 4, file upload information for a field named upload_file is available in a set of four $HTTP_POST_VARS variables:
$HTTP_POST_VARS["upload_file_name"]    original filename on client host
$HTTP_POST_VARS["upload_file"]         temporary filename on server host
$HTTP_POST_VARS["upload_file_size"]    file size, in bytes
$HTTP_POST_VARS["upload_file_type"]    file MIME type
$_FILES is a superglobal array (global in any scope). $HTTP_POST_FILES and $HTTP_POST_VARS must be declared with the global keyword if used in a non-global scope, such as within a function.
To avoid having to fool around figuring out which array contains file upload information, it makes sense to write a utility routine that does all the work. The following function, get_upload_info( ), takes an argument corresponding to the name of a file upload field. Then it examines the $_FILES, $HTTP_POST_FILES, and $HTTP_POST_VARS arrays as necessary and returns an associative array of information about the file, or an unset value if the information is not available. For a successful call, the array element keys are "tmp_name", "name", "size", and "type" (that is, the keys are the same as those in the entries within the $_FILES or $HTTP_POST_FILES arrays).
function get_upload_info ($name)
{
global $HTTP_POST_FILES, $HTTP_POST_VARS;

  unset ($unset);
  # Look for information in PHP 4.1 $_FILES array first.
  # Check the tmp_name member to make sure there is a file. (The entry
  # in $_FILES might be present even if no file was uploaded.)
  if (isset ($_FILES))
  {
    if (isset ($_FILES[$name])
        && $_FILES[$name]["tmp_name"] != ""
        && $_FILES[$name]["tmp_name"] != "none")
      return ($_FILES[$name]);
    return (@$unset);
  }
  # Look for information in PHP 4 $HTTP_POST_FILES array next.
  if (isset ($HTTP_POST_FILES))
  {
    if (isset ($HTTP_POST_FILES[$name])
        && $HTTP_POST_FILES[$name]["tmp_name"] != ""
        && $HTTP_POST_FILES[$name]["tmp_name"] != "none")
      return ($HTTP_POST_FILES[$name]);
    return (@$unset);
  }
  # Look for PHP 3 style upload variables.
  # Check the _name member, because $HTTP_POST_VARS[$name] might not
  # actually be a file field.
  if (isset ($HTTP_POST_VARS[$name])
      && isset ($HTTP_POST_VARS[$name . "_name"]))
  {
    # Map PHP 3 elements to PHP 4-style element names
    $info = array ( );
    $info["name"] = $HTTP_POST_VARS[$name . "_name"];
    $info["tmp_name"] = $HTTP_POST_VARS[$name];
    $info["size"] = $HTTP_POST_VARS[$name . "_size"];
    $info["type"] = $HTTP_POST_VARS[$name . "_type"];
    return ($info);
  }
  return (@$unset);
}
See the post_image.php script for details about how to use this function to get image information and store it in MySQL.
The upload_tmp_dir PHP configuration setting controls where uploaded files are saved. This is /tmp by default on many systems, but you may want to reconfigure PHP to use a different directory that's owned by the web server user ID and thus more private.
18.9.6 Python
A simple upload form in Python can be written like this:
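print "<form method=\"POST\" enctype=\"multipart/form-data\" action=\"%s\">" \
                                            % (os.environ["SCRIPT_NAME"])
print "Image name:<br />"
print "<input type=\"text\" name=\"image_name\" size=\"60\" />"
print "<br />"
print "Image file:<br />"
print "<input type=\"file\" name=\"upload_file\" size=\"60\" />"
print "<br /><br />"
print "<input type=\"submit\" name=\"choice\" value=\"Submit\" />"
print "</form>"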
When the user submits the form, its contents can be obtained using the FieldStorage( ) method of the cgi module. (See Recipe 18.6.) The resulting object contains an element for each input parameter. For a file upload field, you get this information as follows:
form = cgi.FieldStorage ( )
if form.has_key ("upload_file") and form["upload_file"].filename != "":
  image_file = form["upload_file"]
else:
  image_file = None
According to most of the documentation that I have read, the file attribute of an object that corresponds to a file field should be true if a file has been uploaded. Unfortunately, the file attribute seems to be true even when the user submits the form but leaves the file field blank. It may even be the case that the type attribute is set when no file actually was uploaded (for example, to application/octet-stream). In my experience, a more reliable way to determine whether a file really was uploaded is to test the filename attribute:
form = cgi.FieldStorage ( )
if form.has_key ("upload_file") and form["upload_file"].filename:
  print "<p>A file was uploaded</p>"
else:
  print "<p>A file was not uploaded</p>"
Assuming that a file was uploaded, access the parameter's value attribute to read the file and obtain its contents:
data = form["upload_file"].value
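The filename test and the content read can also be tried outside a web server. The cgi module was removed from the standard library in Python 3.13, so the following Python 3 sketch substitutes the email package to parse a multipart/form-data body. This is a stand-in chosen for illustration, not the way cgi.FieldStorage works internally. The function names uploaded_parts and file_was_uploaded and the sample request body are hypothetical.

```python
from email.parser import BytesParser


def uploaded_parts(content_type, body):
    """Map each form field name to (filename, payload bytes).

    filename is None for an ordinary field and "" for a file field
    that the user left blank.
    """
    header = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n"
    message = BytesParser().parsebytes(header + body)
    parts = {}
    if not message.is_multipart():
        return parts
    for part in message.get_payload():
        name = part.get_param("name", header="content-disposition")
        parts[name] = (part.get_filename(), part.get_payload(decode=True))
    return parts


def file_was_uploaded(parts, field_name):
    """Apply the filename test: a file counts only if its filename is non-empty."""
    return field_name in parts and bool(parts[field_name][0])


if __name__ == "__main__":
    # Made-up request body in which the file field was left blank.
    blank = (b'--XYZ\r\n'
             b'Content-Disposition: form-data; name="upload_file"; filename=""\r\n'
             b'Content-Type: application/octet-stream\r\n\r\n'
             b'\r\n'
             b'--XYZ--\r\n')
    parts = uploaded_parts("multipart/form-data; boundary=XYZ", blank)
    print(file_was_uploaded(parts, "upload_file"))
```

In the blank-field case the part still arrives with a content type, which matches the observation above that only the filename reliably distinguishes a real upload.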
See the post_image.py script for details about how to use these techniques to get image information and store it in MySQL.
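For orientation, the storage step might look roughly like the following Python 3 sketch. It is not taken from post_image.py. It assumes a DB-API cursor from a driver that uses the %s placeholder style (as MySQLdb and PyMySQL do), and it reuses the image table and the REPLACE statement from the Perl listing. The function name store_image is hypothetical.

```python
DEFAULT_MIME_TYPE = "application/octet-stream"


def store_image(cursor, name, mime_type, data):
    """Insert or replace one row in the image table using a DB-API cursor.

    Assumes the driver accepts %s placeholders; the caller is responsible
    for committing and closing the connection.
    """
    if not mime_type:
        # Some browsers send no MIME type, so fall back to a generic default.
        mime_type = DEFAULT_MIME_TYPE
    cursor.execute(
        "REPLACE INTO image (name,type,data) VALUES(%s,%s,%s)",
        (name, mime_type, data),
    )
```

Passing the values as parameters rather than interpolating them into the statement lets the driver handle quoting of the binary image data.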