Storing Images or Other Binary Data

17.7.1 Problem

You want to store images in MySQL.

17.7.2 Solution

That's not difficult, provided you follow the proper precautions for encoding the image data.

17.7.3 Discussion

Web sites are not limited to displaying text. They can also serve various forms of binary data such as images, sounds, PDF documents, and so forth. However, images are by far the most common kind of binary data, and because image storage is a natural application for a database, a very common question is "How do I store images in MySQL?" Many people will answer this question by saying, "Don't do it!" and some of the reasons for this are discussed in the sidebar "Should You Store Images in Your Database?" Because it's important to know how to work with binary data, this section does show how to store images in MySQL. Nevertheless, in recognition that that may not always be the best thing to do, the section also shows how to store images in the filesystem.

Although the discussion here is phrased in terms of working with images, the principles apply to any kind of binary data, such as PDF files or compressed text. In fact, they apply to any kind of data at all, including text; people tend to think of images as special somehow, but they're not.

One reason image storage confuses people more often than does storing other types of information like text strings or numbers is that it's difficult to type in an image value manually. For example, you can easily use mysql to enter an INSERT statement to store a number like 3.48 or a string like Je voudrais une bicyclette rouge, but images contain binary data and it's not easy to refer to them by value. So you need to do something else. Your options are:

Should You Store Images in Your Database?

Deciding where to store images is a matter of trade-offs. There are pros and cons whether you store images in the database or in the filesystem:

  • Storing images in a database table bloats the table. With a lot of images, you're more likely to approach any limits your operating system places on table size. On the other hand, if you store images in the filesystem, directory lookups may become slow. To avoid this, you may be able to implement some kind of hierarchical storage or use a filesystem that has good lookup performance for large directories (such as the Reiser filesystem).
  • Using a database localizes storage for images that are used across multiple web servers on different hosts. Images stored in the filesystem must be stored locally on the web server host. In a multiple-host situation, that means you must replicate the set of images to the filesystem of each host. If you store the images in MySQL, only one copy of the images is required; each web server can get the images from the same database server.
  • When images are stored in the filesystem, they constitute in essence a foreign key. Image manipulation requires two operations: one in the database and one in the filesystem. This in turn means that if you require transactional behavior, it's more difficult to implementnot only do you have two operations, they take place in different domains. Storing images in the database is simpler because adding, updating, or removing an image requires only a single record operation. It becomes unnecessary to make sure the image table and the filesystem remain in sync.
  • It can be faster to serve images over the Web from the filesystem than from the database, because the web server itself opens the file, reads it, and writes it to the client. Images stored in the database must be read and written twice. First, the MySQL server reads the image from the database and writes it to your web script. Then the script reads the image and writes it to the client.
  • Images stored in the filesystem can be referred to directly in web pages by means of tag links that point to the image files. Images stored in MySQL must be served by a script that retrieves an image and sends it to the client. However, even if images are stored in the filesystem and accessible to the web server, you might still want to serve them through a script. This would be appropriate if you need to account for the number of times you serve each image (such as for banner ad displays where you charge customers by the number of ad impressions) or if you want to select an image at request time (such as when you pick an ad at random).
  • If you store images in the database, you need to use a column type such as a BLOB. This is a variable length type, so the table itself will have variable-length rows. Operations on fixed-length rows are often quicker, so you may gain some table lookup speed by storing images in the filesystem and using fix-length types for the columns in the image table.

17.7.4 Storing Images with LOAD_FILE( )

The LOAD_FILE( ) function takes an argument indicating a file to be read and stored in the database. For example, an image stored in /tmp/myimage.png might be loaded into a table like this:

mysql> INSERT INTO mytbl (image_data) VALUES(LOAD_FILE('/tmp/myimage.png'));

To load images into MySQL with LOAD_FILE( ), certain requirements must be satisfied:

These constraints mean that LOAD_FILE( ) is available only to some MySQL users.

17.7.5 Storing Images Using a Script

If LOAD_FILE( ) is not an option or you don't want to use it, you can write a short program to load your images. The program should either read the contents of an image file and create a record that contains the image data, or create a record that indicates where in the filesystem the image file is located. If you elect to store the image in MySQL, you include the image data in the record-creation statement the same way as any other kind of data. That is, you either use a placeholder and bind the data value to it, or else encode the data and put it directly into the query string.

The script shown in this section, store_image.pl, runs from the command line and stores an image file for later use. The script takes no side in the debate over whether to store images in the database or the filesystem. Instead, it demonstrates how to implement both approaches! Of course, this takes double the storage space, so to adapt this script for your own use, you'll want to retain only the parts that are appropriate for whichever storage method you want to implement. The necessary modifications are discussed at the end of this section.

The store_image.pl script uses an image table that includes columns for the image ID, name, and MIME type, and a column in which to store the image data:

CREATE TABLE image ( id INT UNSIGNED NOT NULL AUTO_INCREMENT, # image ID number name VARCHAR(30) NOT NULL, # image name type VARCHAR(20) NOT NULL, # image MIME type data MEDIUMBLOB NOT NULL, # image data PRIMARY KEY (id), # id and name are unique UNIQUE (name) );

The name column indicates the name of the image file in the directory where images are stored in the filesystem. The data column is a MEDIUMBLOB, which is good for images smaller than 16 MB. If you need larger images, use a LONGBLOB column.

It is possible to use the name column to store full pathnames to images in the database, but if you put them all under the same directory, you can store names that are relative to that directory and name values will take less space. store_image.pl does this, but of course it needs to know the pathname of the image storage directory. That's what the $image_dir variable is for. You should check this variable's value and modify it as necessary before running the script. The default value reflects where I like to store images, but you'll need to change it according to your own preferences. Make sure to create the directory if it doesn't exist before you run the script. You'll also need to check and possibly change the image directory pathname in the display_image.pl script discussed later in this chapter.

store_image.pl looks like this:

#! /usr/bin/perl -w # store_image.pl - read an image file, store in that image table and # in the filesystem. (Normally, you'd store images only in one # place or another; this script demonstrates how to do both.) use strict; use lib qw(/usr/local/apache/lib/perl); use Fcntl; # for O_RDONLY, O_WRONLY, O_CREAT use FileHandle; use Cookbook; # Default image storage directory and pathname separator # (CHANGE THESE AS NECESSARY) my $image_dir = "/usr/local/apache/htdocs/mcb/images"; my $path_sep = "/"; # Reset directory and pathname separator for Windows/DOS if ($^O =~ /^MSWin/i || $^O =~ /^dos/) { $image_dir = "D:\apache\htdocs\mcb\images"; $path_sep = "\"; } -d $image_dir or die "$0: image directory ($image_dir) does not exist "; # Print help message if script was not invoked properly (@ARGV == 2 || @ARGV == 3) or die <close ( ); # Save image file in filesystem under $image_dir. (Overwrite file # if an old version exists.) my $image_path = $image_dir . $path_sep . $image_name; sysopen ($fh, $image_path, O_WRONLY|O_CREAT) or die "Cannot open $image_path: $! "; binmode ($fh); # helpful for binary data syswrite ($fh, $data, $size) == $size or die "Failed to write entire image file $image_path: $! "; $fh->close ( ); # Save image in database table. (Use REPLACE to kick out any old image # with same name.) my $dbh = Cookbook::connect ( ); $dbh->do ("REPLACE INTO image (name,type,data) VALUES(?,?,?)", undef, $image_name, $mime_type, $data); $dbh->disconnect ( ); exit (0);

If you invoke the script with no arguments, it displays a short help message. Otherwise, it requires two arguments that specify the name of the image file and the MIME type of the image. By default, the file's basename (final component) is also used as the name of the image stored in the database and in the image directory. To use a different name, provide it using an optional third argument.

The script is fairly straightforward. It performs the following actions:

  1. Check that the proper number of arguments was given and initialize some variables from them.
  2. Make sure the image directory exists. If it does not, the script cannot continue.
  3. Open and read the contents of the image file.
  4. Store the image as a file in the image directory.
  5. Store a record containing identifying information and the image data in the image table.

store_image.pl uses REPLACE rather than INSERT so that you can replace an old image with a new version having the same name simply by loading the new one. The query specifies no id column value; id is an AUTO_INCREMENT column, so MySQL assigns it a unique sequence number automatically. (Note that if you replace an image by loading a new one with the same name as an existing image, the REPLACE statement will generate a new id value. If you want to keep the old value, you should issue a SELECT first to see if the name already exists, then modify the REPLACE to specify the existing id value if a record was found, and NULL otherwise.)

The REPLACE statement that stores the image information into MySQL is relatively mundane:

$dbh->do ("REPLACE INTO image (name,type,data) VALUES(?,?,?)", undef, $image_name, $mime_type, $data);

If you examine the statement looking for some special indicator of how to handle binary data, you'll be disappointed, because the $data variable that contains the image isn't treated as special in any way. The query refers to all column values uniformly using ? placeholder characters and the values are passed at the end of the do( ) call. Another way to accomplish the same result is to perform escape processing on the column values explicitly and then insert them directly into the query string:

$image_name = $dbh->quote ($image_name); $mime_type = $dbh->quote ($mime_type); $data = $dbh->quote ($data); $dbh->do ("REPLACE INTO image (name,type,data) VALUES($image_name,$mime_type,$data)");

Many people make image-handling a lot more troublesome than it really is. If you properly handle image data in a query by using placeholders or by encoding it, you'll have no problems. If you don't, you'll get errors. Simple as that. This is no different than how you should handle other kinds of data, even text. After all, if you insert into a query a piece of text that contains quotes or other special characters without escaping them, the query will blow up in your face. So the need for placeholders or encoding is not some special thing that's necessary only for imagesit's necessary for all data. Say it with me: "I will always use placeholders or encode my column values. Always. Always, always, always." (Having said that, I feel obliged to point out that if you know enough about a given valuefor example, if you're absolutely certain that it's an integerthere are times you can get away with breaking this rule. Nevertheless, it's never wrong to follow it.)

To try out the script, change location into the apache/images directory of the recipes distribution. This directory contains the store_image.pl script, and some sample images are in its flags subdirectory (they're pictures of national flags for several countries). To load one of these images, run the script like this under Unix:

% ./store_image.pl flags/iceland.jpg image/jpeg

Or like this under Windows:

C:> store_image.pl flagsiceland.jpg image/jpeg

store_image.pl takes care of image storage, and the next section discusses how to retrieve images to serve them over the Web. What about deleting images? I'll leave it to you to write a utility to remove images that you no longer want. If you are storing images in the filesystem, remember to delete both the database record and the image file that the record points to.

store_image.pl stores each image both in the database and in the filesystem for illustrative purposes, but of course that makes it inefficient. Earlier, I mentioned that if you use this script as a basis for your own applications, you should modify it to store images only in one placeeither in the database or in the filesystemnot in both places. The modifications are as follows:

These modifications also apply to the display_image.pl image processing script shown in Recipe 17.8.

17.7.6 See Also

Recipe 17.8 shows how to retrieve images for display over the Web. Recipe 18.9 discusses how to upload images from a web page for storage into MySQL.

Категории