High Performance MySQL: Optimization, Backups, Replication, and More

Each table in a database is made up of one or more columns. When you create a table using a CREATE TABLE statement, you specify a type for each column. A column type is more specific than a data type, which is just a general category, such as "number" or "string." A column type precisely characterizes the kind of values a given table column can contain, such as SMALLINT or VARCHAR(32).

MySQL's column types are the means by which you describe what kinds of values a table's columns contain, which in turn determines how MySQL treats those values. For example, if you have numeric values, you can store them using a numeric or a string column type, but MySQL will treat the values somewhat differently depending on how you store them. Each column type has several characteristics:

  • What kind of values you can store in it

  • How much space values take up, and whether the values are fixed-length (all values of the type taking the same amount of space) or variable-length (the amount of space depending on the particular value being stored)

  • How values of the type are compared and sorted

  • Whether the type allows NULL values

  • Whether the type can be indexed

The following discussion surveys MySQL's column types briefly in a broad overview and then describes in more detail the properties that characterize each type.

Overview of Column Types

MySQL provides column types for values from all the general data type categories except the NULL value. NULL spans all types in the sense that the property of whether a column can contain NULL values is treated as a type attribute.

MySQL has numeric column types for both integer and floating-point values, as shown in Table 2.2. Integer columns can be signed or unsigned. A special attribute allows sequential integer column values to be generated automatically, which is useful in applications that require a series of unique identification numbers.

Table 2.2. Numeric Column Types

Type Name Meaning
TINYINT A very small integer
SMALLINT A small integer
MEDIUMINT A medium-sized integer
INT A standard integer
BIGINT A large integer
FLOAT A single-precision floating-point number
DOUBLE A double-precision floating-point number
DECIMAL A floating-point number, represented as a string

MySQL string column types are shown in Table 2.3. Strings can hold anything, even arbitrary binary data such as images or sounds. Strings can be compared according to whether or not they are case sensitive. In addition, you can perform pattern matching on strings. (Actually, in MySQL you can perform pattern matching on any column type, but it's most often done with string types.)

Table 2.3. String Column Types

Type Name Meaning
CHAR A fixed-length character string
VARCHAR A variable-length character string
TINYBLOB A very small BLOB (binary large object)
BLOB A small BLOB
MEDIUMBLOB A medium-sized BLOB
LONGBLOB A large BLOB
TINYTEXT A very small text string
TEXT A small text string
MEDIUMTEXT A medium-sized text string
LONGTEXT A large text string
ENUM An enumeration; column values may be assigned one enumeration member
SET A set; column values may be assigned multiple set members

MySQL date and time types are shown in Table 2.4, where CC, YY, MM, DD hh, mm, and ss represent century, year, month, day, hour, minute, and second. For temporal values, MySQL provides types for dates and times (either combined or separate) and timestamps (a special type that allows you to track when changes were last made to a record). There is also a type for efficiently representing year values when you don't need an entire date.

Table 2.4. Date and Time Column Types

Type Name Meaning
DATE A date value, in 'CCYY-MM-DD' format
TIME A time value, in 'hh:mm:ss' format
DATETIME A date and time value, in 'CCYY-MM-DD hh:mm:ss' format
TIMESTAMP A timestamp value, in CCYYMMDDhhmmss format
YEAR A year value, in CCYY format

Creating Tables

To create a table, issue a CREATE TABLE statement and specify a list of the columns that make up the table. Each column has a name and a type, and various attributes can be associated with each type. The following example creates a table named mytbl containing three columns named f, c, and i:

CREATE TABLE mytbl ( f FLOAT(10,4), c CHAR(15) NOT NULL DEFAULT 'none', i TINYINT UNSIGNED NULL );

The syntax for declaring a column is as follows:

col_name col_type [col_attributes] [general_attributes]

The name of the column, col_name, is always first in the definition. The precise rules for naming columns are given in the "MySQL Naming Rules" section of Chapter 3. Briefly summarized, column names can be up to 64 characters long and can consist of alphanumeric characters from the server's default character set, as well as the underscore and dollar sign characters ('_' and '$'). Function names (words such as POS and MIN) are not reserved and can be used as column names; but keywords, such as SELECT, DELETE, and CREATE, normally are reserved and cannot be used, but as of MySQL 3.23.6, you can include other characters within a name or use reserved words by enclosing the name within backtick ('`') characters. A column name can begin with any character that is legal in a name, including a digit. However, unless quoted within backticks, a name cannot consist entirely of digits because then it would appear to be a number.

The column type col_type indicates the specific kind of values the column can hold. The type specifier can also indicate the maximum length of the values you store in the column. For some types, you specify the length explicitly as a number. For others, the length is implied by the type name. For example, CHAR(10) specifies an explicit length of 10 characters, whereas TINYBLOB values have an implicit maximum length of 255 characters. Some of the type specifiers allow you to indicate a maximum display width (how many characters to use for displaying values). Floating-point types allow the number of decimal places to be specified, so you can control how precise values are.

Following the column type, you can specify optional type-specific attributes as well as more general attributes. These attributes function as type modifiers. They cause MySQL to change the way it treats column values in some way:

  • The type-specific attributes that are allowable depend on the column type you choose. For example, UNSIGNED is allowable only for numeric types, and BINARY is allowable only for CHAR and VARCHAR.

  • The general attributes can be given for any column type, with a few exceptions. You can specify NULL or NOT NULL to indicate whether a column can hold NULL values. For all but BLOB and TEXT types, you can specify DEFAULT def_value to indicate that a column should be assigned the value def_value when a new row is created that does not explicitly specify the column's value. The value of def_value must be a constant; it cannot be an expression or refer to other columns.

If multiple column attributes are given, there are some constraints on the order in which they may appear. In general, you should be safe if you specify column type-specific attributes such as UNSIGNED or ZEROFILL before general attributes such as NULL or NOT NULL.

The rest of this section discusses the syntax for declaring each of MySQL's column types and the properties that characterize them, such as their range and storage requirements. The type specifications are shown as you use them in CREATE TABLE statements. Optional information is indicated by square brackets ([]). For example, the syntax MEDIUMINT[(M)] indicates that the maximum display width, specified as (M), is optional. On the other hand, for VARCHAR(M), the lack of brackets indicates that (M) is required.

Numeric Column Types

MySQL's numeric column types fall into two general classifications:

  • Integer types are used for numbers that have no fractional part, such as 1, 43, -3, 0, or -798432. You can use integer columns for data represented by whole numbers, such as weight to the nearest pound, height to the nearest inch, number of stars in a galaxy, number of people in a household, or number of bacteria in a petri dish.

  • Floating-point types are used for numbers that may have a fractional part, such as 3.14159, -.00273, -4.78, or 39.3E+4. You can use floating-point column types for values that may have a fractional part or that are extremely large or small. Some types of data you might represent as floating-point values are average crop yield, distances, money values, unemployment rates, or stock prices.

Integer types are the simplest. Floating-point types are more complex, particularly because their behavior has changed at certain points in MySQL's development.

Floating-point values can be assigned to integer columns but will be rounded to the nearest integer. Conversely, integer values can be assigned to floating-point columns. They are treated as floating-point values with a fractional part of zero.

Table 2.5 shows the name and range of each numeric type, and Table 2.6 shows the amount of storage required for values of each type.

Table 2.5. Numeric Column Type Ranges

Type Specification Range
TINYINT[(M)] Signed values: 128 to 127 ( 27 to 27 1)
  Unsigned values: 0 to 255 (0 to 28 1)
SMALLINT[(M)] Signed values: 32768 to 32767 ( 215 to 215 1)
  Unsigned values: 0 to 65535 (0 to 216 1)
MEDIUMINT[(M)] Signed values: 8388608 to 8388607 ( 223 to 223 1)
  Unsigned values: 0 to 16777215 (0 to 224 1)
INT[(M)] Signed values: 2147683648 to 2147483647 ( 231 to 231 1)
  Unsigned values: 0 to 4294967295 (0 to 232 1)
BIGINT[(M)] Signed values: 9223372036854775808 to 9223372036854775807 ( 263 to 263 1)
  Unsigned values: 0 to 18446744073709551615 (0 to 264 1)
FLOAT[(M,D)] Minimum non-zero values: ±1.175494351E 38
  Maximum non-zero values: ±3.402823466E+38
DOUBLE[(M,D)] Minimum non-zero values: ±2.2250738585072014E 308
  Maximum non-zero values: ±1.7976931348623157E+308
DECIMAL([M[,D]]) Varies; range depends on M and D

Table 2.6. Numeric Column Type Storage Requirements

Type Specification Storage Required
TINYINT[(M)] 1 byte
SMALLINT[(M)] 2 bytes
MEDIUMINT[(M)] 3 bytes
INT[(M)] 4 bytes
BIGINT[(M)] 8 bytes
FLOAT[(M,D)] 4 bytes
DOUBLE[(M,D)] 8 bytes
DECIMAL([M[,D]]) M bytes (MySQL < 3.23), M+2 bytes (MySQL 3.23)

Integer Column Types

MySQL provides five integer types: TINYINT, SMALLINT, MEDIUMINT, INT, and BIGINT. INTEGER is a synonym for INT. These types vary in the range of values they can represent and in the amount of storage space they require. (Types with a larger range require more storage.) Integer columns can be declared as UNSIGNED to disallow negative values; this shifts the range for the column upward to begin at 0.

When you declare an integer column, you can specify an optional display size M. If given, M should be an integer from 1 to 255. It represents the number of characters used to display values for the column. For example, MEDIUMINT(4) specifies a MEDIUMINT column with a display width of 4. If you declare an integer column without an explicit width, a default width is assigned. The defaults are the lengths of the "longest" values for each type. Note that displayed values are not chopped to fit within M characters. If the printable representation of a particular value requires more than M characters, MySQL displays the full value.

The display size M for an integer column is related only to the number of characters used to display column values. It has nothing to do with the number of bytes of storage space required. For example, BIGINT values require 8 bytes of storage regardless of the display width. It is not possible to magically cut the required storage space for a BIGINT column in half by declaring it as BIGINT(4). Nor does M have anything to do with the range of values allowed. If you declare a column as INT(3), that will not restrict it to a maximum value of 999.

The following statement creates a table to illustrate the default values of M and D for integer column types:

CREATE TABLE mytbl ( itiny TINYINT, itiny_u TINYINT UNSIGNED, ismall SMALLINT, ismall_u SMALLINT UNSIGNED, imedium MEDIUMINT, imedium_u MEDIUMINT UNSIGNED, ireg INT, ireg_u INT UNSIGNED, ibig BIGINT, ibig_u BIGINT UNSIGNED );

If you issue a DESCRIBE mytbl statement after creating the table, the number following each type name shows the value that MySQL uses by default in the absence of an explicit display width specifier: [1]

[1] Due to a minor glitch, the display width for BIGINT will be 21 (not 20) if you run this query using a version of MySQL older than 3.23.

+-----------+-----------------------+ | Field | Type | +-----------+-----------------------+ | itiny | tinyint(4) | | itiny_u | tinyint(3) unsigned | | ismall | smallint(6) | | ismall_u | smallint(5) unsigned | | imedium | mediumint(9) | | imedium_u | mediumint(8) unsigned | | ireg | int(11) | | ireg_u | int(10) unsigned | | ibig | bigint(20) | | ibig_u | bigint(20) unsigned | +-----------+-----------------------+

Floating-Point Column Types

MySQL provides three floating-point types: FLOAT, DOUBLE, and DECIMAL. Synonymous types are DOUBLE PRECISION and REAL for DOUBLE, and NUMERIC for DECIMAL. Ranges for these types differ from ranges for integer types in the sense that there is not only a maximum value a floating-point type can represent, but also a minimum non-zero value. The minimum values provide a measure of how precise the type is, which is often important for recording scientific data. (There are, of course, corresponding negative maximum and minimum values.)

Floating-point types can be declared as UNSIGNED, although not until MySQL 4.0.2 for FLOAT and DOUBLE. Unlike the integer types, declaring a floating-point type UNSIGNED doesn't shift the type's range upward, it merely eliminates the negative end.

For each floating-point type, you can specify a maximum display size M and the number of decimal places D. The value of M should be from 1 to 255. The value of D can be from 0 to 30, but should be no more than M 2. (If you're more familiar with ODBC terms, M and D correspond to the ODBC concepts of "precision" and "scale.")

For FLOAT and DOUBLE, M and D are optional. If they are omitted, these types are treated as follows:

  • Prior to MySQL 3.23.6, FLOAT and DOUBLE are treated as FLOAT(10,2) and DOUBLE(16,4) with stored values rounded to 2 and 4 decimals, respectively.

  • For MySQL 3.23.6 and later, FLOAT and DOUBLE are stored to the full precision allowed by your hardware.

For DECIMAL, M and D may or may not be optional, depending on your version of MySQL:

  • Prior to MySQL 3.23.6, M and D are required for DECIMAL columns.

  • For MySQL 3.23.6 and later, if D is omitted, it defaults to 0. If M is omitted as well, it defaults to 10. In other words, the following equivalences hold:

    DECIMAL = DECIMAL(10) = DECIMAL(10,0) DECIMAL(n) = DECIMAL(n,0)

FLOAT(p) syntax is also allowed for ODBC compatibility. However, the precise behavior of columns specified using this syntax is somewhat complicated:

  • Prior to MySQL 3.23, the allowable values of p are 4 and 8, indicating the number of bytes of storage per value. FLOAT(4) and FLOAT(8) are treated as FLOAT(10,2) and DOUBLE(16,4) with stored values rounded to 2 and 4 decimals, respectively.

  • For MySQL 3.23.0 to 3.23.5, the allowable values of p are still 4 and 8 and indicate the number of bytes of storage, but FLOAT(4) and FLOAT(8) are treated as single-precision and double-precision columns with values stored to full hardware precision.

  • For MySQL 3.23.6 and later, p can range from 0 to 53 and indicates the minimum number of bits of precision required for stored values. For p values from 0 to 24, the column is treated as single-precision. For values from 25 to 53, the column is treated as double-precision.

More confusing still is that MySQL allows FLOAT4 and FLOAT8 as synonyms, but what they are synonyms for depends on your version of MySQL:

  • Prior to MySQL 3.23.6, FLOAT4 and FLOAT8 are equivalent to FLOAT(10,2) and DOUBLE(16,4).

  • For MySQL 3.23.6 and later, FLOAT4 and FLOAT8 are equivalent to FLOAT and DOUBLE.

If you carefully compare these equivalences to those for FLOAT(4) and FLOAT(8), you'll see that FLOAT4 and FLOAT8 are not quite the same as FLOAT(4) and FLOAT(8), although you might have expected them to be.

Checking How MySQL Treats a Type Specification

If you're not sure how your version of MySQL will treat a given floating-point column specification, try the following. Create a table that contains a column defined the way you're wondering about and then use DESCRIBE to see how MySQL reports the type. For example, in MySQL 3.23.0, if you create a column using FLOAT4, you'd see the following:

mysql> CREATE TABLE t (f FLOAT4); mysql> DESCRIBE t; +-------+-------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-------+-------------+------+-----+---------+-------+ | f | float(10,2) | YES | | NULL | | +-------+-------------+------+-----+---------+-------+

In MySQL 3.23.6, you'd see the following instead:

mysql> CREATE TABLE t (f FLOAT4); mysql> DESCRIBE t; +-------+-------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-------+-------+------+-----+---------+-------+ | f | float | YES | | NULL | | +-------+-------+------+-----+---------+-------+

The lack of a (M,D) indicator in the latter case indicates that values are stored to the full precision allowed by the hardware.

This technique actually works to see how MySQL treats any column definition, but I have found it most useful for floating-point types.

Choosing Numeric Column Types

When you choose a numeric type, consider the range of values you need to represent and choose the smallest type that will cover the range. Choosing a larger type wastes space, leading to tables that are unnecessarily large and that cannot be processed as efficiently as if you had chosen a smaller type. For integer values, TINYINT is the best if the range of values in your data is small, such as a person's age or number of siblings. MEDIUMINT can represent millions of values and can be used for many more types of values, at some additional cost in storage space. BIGINT has the largest range of all but requires twice as much storage as the next smallest integer type (INT) and should be used only when really necessary. For floating-point values, DOUBLE takes twice as much space as FLOAT. Unless you need exceptionally high precision or an extremely large range of values, you can probably represent your data at half the storage cost by using FLOAT.

Every numeric column's range of values is determined by its type. If you attempt to insert a value that lies outside the column's range, truncation occurs; MySQL clips the value to the appropriate endpoint of the range and uses the result. No truncation occurs when values are retrieved.

Value truncation occurs according to the range of the column type, not the display width. For example, a SMALLINT(3) column has a display width of 3 and a range from -32768 to 32767. The value 12345 is wider than the display width but within the range of the column, so it is inserted without clipping and retrieved as 12345. The value 99999 is outside the range, so it is clipped to 32767 when inserted. Subsequent retrievals retrieve the value 32767.

In general, values assigned to a floating-point column are rounded to the number of decimals indicated by the column specification. If you store 1.23456 in a FLOAT(8,1) column, the result is 1.2. If you store the same value in a FLOAT(8,4) column, the result is 1.2346. This means you should declare floating-point columns with a sufficient number of decimals to give you values as precise as you require. If you need accuracy to thousandths, don't declare a type with only two decimal places.

The DECIMAL type is a floating-point type, but it differs from FLOAT and DOUBLE in that DECIMAL values actually are stored as strings and have a fixed number of decimals. The significance of this fact is that DECIMAL values are not subject to roundoff error the way that FLOAT and DOUBLE columns are a property that makes DECIMAL especially applicable to currency calculations. The corresponding tradeoff is that DECIMAL values are not as efficient as floating-point values stored in native format that the processor can operate on directly.

The maximum possible range for DECIMAL is the same as for DOUBLE, but the effective range is determined by the values of M and D. If you vary M and hold D fixed, the range becomes larger as M becomes larger. This is illustrated by Table 2.7. If you hold M fixed and vary D, the range becomes smaller as D becomes larger, although the precision increases. This is shown by Table 2.8.

Table 2.7. How M Affects the Range of DECIMAL(M,D)

Type Specification Range (for MySQL < 3.23) Range (for MySQL 3.23)
DECIMAL(4,1) 9.9 to 99.9 999.9 to 9999.9
DECIMAL(5,1) 99.9 to 999.9 9999.9 to 99999.9
DECIMAL(6,1) 999.9 to 9999.9 99999.9 to 999999.9

Table 2.8. How D Affects the Range of DECIMAL(M,D)

Type Specification Range (for MySQL < 3.23) Range (for MySQL 3.23)
DECIMAL(4,0) 999 to 9999 9999 to 99999
DECIMAL(4,1) 9.9 to 99.9 999.9 to 9999.9
DECIMAL(4,2) .99 to 9.99 99.99 to 999.99

The range for a given DECIMAL type depends on your version of MySQL. As of MySQL 3.23, DECIMAL values are handled according to the ANSI specification, which states that a type of DECIMAL(M,D) must be able to represent any value with M digits and D decimal places. For example, DECIMAL(4,2) must be able to represent values from 99.99 to 99.99. Because the sign character and decimal point must still be stored, this requires an extra two bytes, so DECIMAL(M,D) values for MySQL 3.23 and later use M+2 bytes. For DECIMAL(4,2), six bytes are needed for the "widest" value ( 99.99). At the positive end of the range, the sign byte is not needed to hold a sign character, so MySQL uses it to extend the range beyond that required by the ANSI specification. In other words, for DECIMAL(4,2), the maximum value that can be stored in the six bytes available is 999.99.

There are two special conditions that reduce the DECIMAL storage requirement of M+2 bytes to a lesser value:

  • If D is 0, DECIMAL values have no fractional part and no byte need be allocated to store the decimal point. This reduces the required storage by one byte.

  • If a DECIMAL column is UNSIGNED, no sign character need be stored, also reducing the required storage by one byte.

For versions of MySQL prior to 3.23, DECIMAL values are represented in a slightly different fashion. A DECIMAL(M,D) column is stored using M bytes per value, and the sign character and decimal point (if needed) are included in the M bytes. Thus, for a type DECIMAL(4,2), the range is .99 to 9.99 because those cover all the possible 4-character values. If D is 0, no decimal point need be stored, and the byte usually used for that purpose can be used to store another digit. The effect is to extend the range of the column by an extra order of magnitude. (This explains why the pre-3.23 range in Table 2.8 shifts by a factor of 10 for DECIMAL(4,2) compared to DECIMAL(4,1), but by a factor of 100 for DECIMAL(4,1) compared to DECIMAL(4,0). I bet you didn't even notice that!)

Numeric Column Type Attributes

The ZEROFILL attribute can be specified for all numeric types. It causes displayed values for the column to be padded with leading zeros to the display width. You can use ZEROFILL when you want to make sure column values always display using a given number of digits. Actually, it's more accurate to say "a given minimum number of digits" because values wider than the display width are displayed in full without being chopped. You can see this by issuing the following statements:

mysql> DROP TABLE IF EXISTS mytbl; mysql> CREATE TABLE mytbl (my_zerofill INT(5) ZEROFILL); mysql> INSERT INTO mytbl VALUES(1),(100),(10000),(1000000); mysql> SELECT my_zerofill FROM mytbl; +-------------+ | my_zerofill | +-------------+ | 00001 | | 00100 | | 10000 | | 1000000 | +-------------+

Note that the final value, which is wider than the column's display width, is displayed in full.

The UNSIGNED attribute disallows negative values. It is most often used with integer types. Making an integer column UNSIGNED doesn't change the "size" of the underlying data type's range; it just shifts the range upward. Consider this table specification:

CREATE TABLE mytbl ( itiny TINYINT, itiny_u TINYINT UNSIGNED );

itiny and itiny_u are both TINYINT columns with a range of 256 values but differ in the particular allowable values. The range of itiny is 128 to 127, whereas the range of itiny_u is shifted up, resulting in a range of 0 to 255.

UNSIGNED is useful for integer columns into which you plan to store information that doesn't take on negative values, such as population counts or attendance figures. If you use a signed column for such values, you use only half of the column type's range. By making the column UNSIGNED, you effectively double your range. If you use the column for sequence numbers, it will take twice as long to run out of values if you make it UNSIGNED.

You can also specify UNSIGNED for floating-point columns, although the effect is slightly different than for integer columns. The range does not shift upward; instead, the upper end remains unchanged and the lower end becomes zero. A precaution to observe is that you should not use UNSIGNED with FLOAT or DOUBLE columns prior to MySQL 4.0.2. In earlier versions, MySQL allows these types to be declared as UNSIGNED, but doing so may result in unpredictable column behavior. (This prohibition does not apply to DECIMAL.)

One other attribute, AUTO_INCREMENT, can be specified for integer column types only. Use the AUTO_INCREMENT attribute when you want to generate unique identifiers or values in a series. When you insert NULL into an AUTO_INCREMENT column, MySQL generates the next sequence value and stores it in the column. Normally, unless you take steps to cause otherwise, AUTO_INCREMENT values begin at 1 and increase by 1 for each new row. The sequence can be affected if you delete rows from the table. This depends on the table type, which determines whether or not sequence values are reused.

You can have at most one AUTO_INCREMENT column in a table. The column should be NOT NULL, and it should be declared as a PRIMARY KEY or as a UNIQUE key. Also, because sequence values are always positive, you normally declare the column UNSIGNED as well. For example, you can declare an AUTO_INCREMENT column in any of the following ways:

CREATE TABLE ai (i INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY); CREATE TABLE ai (i INT UNSIGNED AUTO_INCREMENT NOT NULL, PRIMARY KEY (i)); CREATE TABLE ai (i INT UNSIGNED AUTO_INCREMENT NOT NULL, UNIQUE (i));

It is always allowable to declare an AUTO_INCREMENT column explicitly NOT NULL, as shown. However, for versions 3.23 and later, MySQL treats AUTO_INCREMENT columns as NOT NULL automatically.

The behavior of AUTO_INCREMENT columns is discussed further in the "Working with Sequences" section later in this chapter.

Following the attributes just described, which are specific to numeric columns, you can also specify the general attributes NULL or NOT NULL. If you do not specify NULL or NOT NULL for a numeric column, the default is NULL. You can also specify a default value using the DEFAULT attribute. If you do not specify a default value, one is chosen automatically. For all numeric column types, the default is NULL for columns that may contain NULL, and 0 otherwise.

The following table contains three INT columns, having default values of -1, 1, and NULL:

CREATE TABLE t ( i1 INT DEFAULT -1, i2 INT DEFAULT 1, i3 INT DEFAULT NULL );

String Column Types

MySQL provides several string types to hold character data. Strings are often used for values like the following:

'N. Bertram, et al.' 'Pencils (no. 2 lead)' '123 Elm St.' 'Monograph Series IX'

But strings are actually "generic" types in a sense because you can use them to represent any value. For example, you can use string types to hold binary data, such as images or sounds, or output from gzip, should you want to store compressed data.

Table 2.9 shows the types provided by MySQL for declaring string-valued columns and the maximum size and storage requirements of each type. For variable-length column types, the amount of storage taken by a value varies from row to row and depends on the length of the values actually stored in the column. This length is represented by L in the table.

The extra bytes required in addition to L are the number of bytes needed to store the length of the value. MySQL handles variable-length values by storing both the content of the value and its length. These extra bytes are treated as an unsigned integer. Notice the correspondence between a variable-length type's maximum length, the number of extra bytes required for that type, and the range of the unsigned integer type that uses the same number of bytes. For example, MEDIUMBLOB values can be up to 224 1 bytes long and require 3 bytes to record the result. The 3-byte integer type MEDIUMINT has a maximum unsigned value of 224 1. That's not a coincidence.

Table 2.9. String Column Types

Type Specification Maximum Size Storage Required
CHAR[(M)] M bytes M bytes
VARCHAR(M) M bytes L+1 bytes
TINYBLOB, TINYTEXT 28 1 bytes L+1 bytes
BLOB, TEXT 216 1 bytes L+2 bytes
MEDIUMBLOB, MEDIUMTEXT 224 1 bytes L+3 bytes
LONGBLOB, LONGTEXT 232 1 bytes L+4 bytes
ENUM('value1','value2',...) 65535 members 1 or 2 bytes
SET('value1','value2',...) 64 members 1, 2, 3, 4, or 8 bytes

For ENUM and SET, the column definition includes a list of legal values. Attempting to store a value other than those causes the value to be converted to '' (the empty string). For the other string types, values that are too long are chopped to fit. But string types range from very small to very large, with the largest type able to hold nearly 4GB of data, so you should be able to find something long enough to avoid truncation of your information.[2]

[2] The effective maximum column size is actually imposed by the maximum packet size of the client/server communication protocol. This value is 16MB prior to MySQL 4, and 1GB for MySQL 4 and later.

ENUM and SET values are stored internally as numbers, as detailed later in the "ENUM and SET Column Types" section. Values for the other string types are stored as a sequence of bytes and treated either as bytes or characters, depending on whether the type holds binary or non-binary strings:

  • A binary string is treated as a generic sequence of bytes, without respect to any character set. BLOB columns hold binary values, as do CHAR and VARCHAR columns if they are declared with the BINARY attribute.

  • A non-binary string is treated as a sequence of characters and interpreted with respect to the properties of a particular character set. TEXT columns hold non-binary strings, as do CHAR and VARCHAR columns if they are declared without the BINARY attribute. For a single-byte character set, each character takes one byte. For multi-byte character sets, characters can take more than one byte. In MySQL 4.1 and later, columns can be assigned character sets individually. Prior to MySQL 4.1, the server's default character set is used to interpret character strings.

Use of a character set causes non-binary strings to be compared and sorted using the character set's collating sequence. By contrast, a binary string has no character set and thus no collating sequence. This results in some differences in the way binary and non-binary strings are interpreted:

  • Binary strings are processed byte-by-byte in comparisons based only on the underlying numeric value of each byte. One implication of this property is that binary values are case sensitive, because the lowercase and uppercase versions of a given letter have different numeric codes.

  • Non-binary strings are processed character-by-character in comparisons using the character set collating sequence. For most character sets, uppercase and lowercase versions of a given letter have the same collating value, which means that non-binary string comparisons are not case sensitive. Similar characters with different accents also may have the same collating value. For example, 'E' and 'É' compare as the same character in the latin1 character set.

There are a few character sets that do treat uppercase and lowercase as having different collating values and that distinguish between accent marks: cp1521csas, cp1527ltlvcsas, latin1csas, maccecsas, and macromancsas. Note that these character set names each end with csas, which means "case sensitive, accent sensitive." They're something of a special case, so although elsewhere in this book I discuss non-binary strings as not case sensitive, keep in mind that these character sets exist as exceptions to the rule.

The distinction between characters and bytes can be seen easily by considering the length of a string containing multi-byte characters. For example, in MySQL 4.1 and later, you can use the CONVERT() function to generate a string in any available character set. The following statement creates @s as a string using ucs2, a character set that uses two bytes to encode each character:

mysql> SET @s = CONVERT('ABC' USING ucs2);

What is the "length" of the string @s? It depends. If you measure with CHAR_LENGTH(), which is multi-byte aware, you get the length in characters. If you measure with LENGTH(), which is not multi-byte aware, you get the length in bytes:

mysql> SELECT CHAR_LENGTH(@s), LENGTH(@s); +-----------------+------------+ | CHAR_LENGTH(@s) | LENGTH(@s) | +-----------------+------------+ | 3 | 6 | +-----------------+------------+

A binary string has no character set and is treated simply as a sequence of individual bytes. Consequently, the length of the string is the same whether measured in characters or bytes:

mysql> SET @s = BINARY CONVERT('ABC' USING ucs2); mysql> SELECT CHAR_LENGTH(BINARY @s), LENGTH(BINARY @s); +------------------------+-------------------+ | CHAR_LENGTH(BINARY @s) | LENGTH(BINARY @s) | +------------------------+-------------------+ | 6 | 6 | +------------------------+-------------------+

The difference between lengths in characters and in bytes is significant for interpreting the meaning of string column types. For example, a column declaration of VARCHAR(20) doesn't really mean "20 characters maximum," it means "as many characters as will fit in 20 bytes." For single-byte character sets, the two are the same because the number of characters is the same as the number of bytes. But, for a multi-byte character set, the number of characters can be many less than 20.

The CHAR and VARCHAR Column Types

CHAR and VARCHAR are the most commonly used string types. The difference between them is that CHAR is a fixed-length type and VARCHAR is a variable-length type. Values in a CHAR(M) column each take M bytes; shorter values are right-padded with spaces when they are stored. (Trailing spaces are stripped off on retrieval, however.) Values in a VARCHAR(M) column are stored using only as many bytes as necessary, plus one byte to record the length. Trailing spaces are stripped from VARCHAR values when they are stored; this differs from the ANSI SQL standard for VARCHAR values. (A VARCHAR type for which trailing spaces are not stripped may be introduced in a future version of MySQL.)

CHAR and VARCHAR columns can be declared with a maximum length M from 1 to 255. M is optional for CHAR and defaults to 1 if missing. Beginning with MySQL 3.23, CHAR(0) is also legal. CHAR(0) is useful as a placeholder when you want to declare a column but don't want to allocate space for it if you're not sure yet how wide to make it. You can use ALTER TABLE to widen the column later. A CHAR(0) column can also be used to represent on/off values if you allow it to be NULL. Values in such a column can have two values NULL or the empty string. A CHAR(0) column takes very little storage space in the table only a single bit. As of MySQL 4.0.2, VARCHAR(0) is allowable as well, but it's treated as CHAR(0).

Keep two general principles in mind when choosing between CHAR and VARCHAR column types:

  • If your values are all the same length, VARCHAR actually will use more space due to the extra byte required to record the length of values. On the other hand, if your values vary in length, VARCHAR columns have the advantage of taking less space. A CHAR(n) column always takes n bytes, even if it is empty or NULL.

  • If your values don't vary much in length, CHAR is a better choice than VARCHAR if you're using MyISAM or ISAM tables. For such table types, tables with fixed-length rows can be processed more efficiently than tables with variable-length rows.

With a few limited exceptions, you cannot mix CHAR and VARCHAR within the same table. MySQL will even change columns from one type to another, depending on the circumstances. (This is something that other databases do not do.) The principles that apply are as follows:

  • Table rows are fixed-length only if all the columns in the table are fixed-length types.

  • If even a single column has a variable length, table rows become variable-length as well.

  • If table rows are variable-length, any fixed-length columns in the column may as well be converted to variable-length equivalents when that will save space.

What this means is that if you have VARCHAR, BLOB, or TEXT columns in a table, you cannot also have CHAR columns; MySQL silently converts them to VARCHAR. Suppose you create a table as follows:

CREATE TABLE mytbl ( c1 CHAR(10), c2 VARCHAR(10) );

If you issue a DESCRIBE query, the output is as follows:

mysql> DESCRIBE mytbl; +-------+-------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-------+-------------+------+-----+---------+-------+ | c1 | varchar(10) | YES | | NULL | | | c2 | varchar(10) | YES | | NULL | | +-------+-------------+------+-----+---------+-------+

Notice that the presence of the VARCHAR column causes MySQL to convert c1 to VARCHAR as well. If you try using ALTER TABLE to convert c1 to CHAR, it won't work. The only way to convert a VARCHAR column to CHAR is to convert all VARCHAR columns in the table at the same time:

mysql> ALTER TABLE mytbl MODIFY c1 CHAR(10), MODIFY c2 CHAR(10); mysql> DESCRIBE mytbl; +-------+----------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-------+----------+------+-----+---------+-------+ | c1 | char(10) | YES | | NULL | | | c2 | char(10) | YES | | NULL | | +-------+----------+------+-----+---------+-------+

The BLOB and TEXT column types are variable-length like VARCHAR, but they have no fixed-length equivalent, so you cannot use CHAR columns in the same table as BLOB or TEXT columns. Any CHAR column will be converted to VARCHAR.

The exception to non-mixing of fixed- and variable-length columns is that CHAR columns shorter than four characters are not converted to VARCHAR. For example, MySQL will not change the CHAR column in the following table to VARCHAR:

CREATE TABLE mytbl ( c1 CHAR(2), c2 VARCHAR(10) );

You can see this from the output of DESCRIBE:

mysql> DESCRIBE mytbl; +-------+-------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-------+-------------+------+-----+---------+-------+ | c1 | char(2) | YES | | NULL | | | c2 | varchar(10) | YES | | NULL | | +-------+-------------+------+-----+---------+-------+

The reason columns shorter than four characters are not converted is that, on average, any savings you might gain by not storing trailing spaces are offset by the extra byte needed in a VARCHAR column to record the length of each value. In fact, if all your columns are short, MySQL will convert any that you declare as VARCHAR to CHAR. MySQL does this because the conversion will decrease storage requirements on average and, for MyISAM and ISAM tables, will improve performance by making table rows fixed-length. Suppose you create a table with the following specification:

CREATE TABLE mytbl ( c0 VARCHAR(0), c1 VARCHAR(1), c2 VARCHAR(2), c3 VARCHAR(3) );

DESCRIBE reveals that MySQL silently changes all the VARCHAR columns to CHAR:

mysql> DESCRIBE mytbl; +-------+---------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-------+---------+------+-----+---------+-------+ | c0 | char(0) | YES | | NULL | | | c1 | char(1) | YES | | NULL | | | c2 | char(2) | YES | | NULL | | | c3 | char(3) | YES | | NULL | | +-------+---------+------+-----+---------+-------+

The BLOB and TEXT Column Types

A "BLOB" is a binary large object basically, a container that can hold anything you want to toss into it, and that you can make about as big as you want. In MySQL, the BLOB type is really a family of types (TINYBLOB, BLOB, MEDIUMBLOB, LONGBLOB), which are identical except for the maximum amount of information they can hold (see Table 2.9). BLOB columns are useful for storing data that may grow very large or that can vary widely in size from row to row. Some examples are word-processing documents, images and sounds, compound data, and news articles. MySQL also has a family of TEXT types (TINYTEXT, TEXT, MEDIUMTEXT, LONGTEXT). These are similar to the corresponding BLOB types, except that they are associated with a character set and operations on TEXT columns take character set into account. (For MySQL 4.1 and later, this is the character set assigned to the TEXT column itself. Prior to 4.1, it is the server's default character set.) This results in the general differences between binary and non-binary strings that were described earlier. For example, in comparison and sorting operations, BLOB values are case sensitive and TEXT values are not.

BLOB or TEXT columns sometimes can be indexed, depending on the table type you're using:

  • MyISAM tables support BLOB and TEXT indexing (for MySQL 3.23.2 and later), as do BDB tables. However, you must specify a prefix size to be used for the index. This avoids creating index entries that might be huge and thereby defeat any benefits to be gained by that index. The exception is that no prefix is specified for FULLTEXT indexes on TEXT columns, because FULLTEXT searches are based on the entire content of the indexed columns.

  • ISAM, HEAP, and InnoDB tables do not support BLOB and TEXT indexes.

BLOB or TEXT columns may require special care:

  • Due to the typical large variation in the size of BLOB and TEXT values, tables containing them are subject to high rates of fragmentation if many deletes and updates are done. You'll want to run OPTIMIZE TABLE periodically to reduce fragmentation and maintain good performance. See Chapter 4, "Query Optimization," for more information.

  • If you're using very large values, you may need to tune the server to increase the value of the max_allowed_packet parameter. See Chapter 11, "General MySQL Administration," for more information. You will also need to increase the packet size for any client that wishes to use very large values. Appendix E, "MySQL Program Reference," describes how to do this for the mysql client program.

The ENUM and SET Column Types

ENUM and SET are special string column types for which values must be chosen from a fixed (predefined) list of allowable strings. The primary difference between them is that ENUM column values must consist of exactly one member of the list of values, whereas SET column values can contain any or all members of the list. In other words, ENUM is used for values that are mutually exclusive, whereas SET allows multiple choices from the list.

The ENUM column type defines an enumeration. ENUM columns can be assigned values consisting of exactly one member chosen from a list of values specified at table-creation time. You can define an enumeration to have up to 65,535 members. Enumerations are commonly used to represent category values. For example, values in a column declared as ENUM('N','Y') can be either 'N' or 'Y'. Or you can use ENUM for such things as answers to multiple-choice questions in a survey or questionnaire, or available sizes or colors for a product:

employees ENUM('less than 100','100-500','501-1500','more than 1500') color ENUM('red','green','blue','black') size ENUM('S','M','L','XL','XXL')

If you are processing selections from Web pages, you can use an ENUM to represent the option that a visitor to your site chooses from a set of mutually exclusive radio buttons on a page. For example, if you run an online pizza ordering service, an ENUM can be used to represent the type of crust a customer orders:

crust ENUM('thin','regular','pan style','deep dish')

If enumeration categories represent counts, it's important to choose your categories properly when you create the enumeration. For example, when recording white blood cell counts from a laboratory test, you may group the counts into categories as follows:

wbc ENUM('0-100','101-300','>300')

When a test result comes in as an exact count, you can record the value in the wbc column terms of the category into which the count falls. But you cannot recover the original count if you decide you want to convert the column from a category-based ENUM to an integer column based on exact count. (If you really need the exact count, use an integer column instead.)

The SET type is similar to ENUM in the sense that when you create a SET column, you specify a list of legal set members. But unlike ENUM, each column value can consist of any number of members from the set. The set can have up to 64 members. You can use a SET when you have a fixed set of values that are not mutually exclusive, as they are in an ENUM column. For example, you might use a SET to represent options available for an automobile:

SET('luggage rack','cruise control','air conditioning','sun roof')

Then, particular SET values would represent those options actually ordered by customers:

'cruise control,sun roof' 'luggage rack,air conditioning' 'luggage rack,cruise control,air conditioning' 'air conditioning' ''

The final value shown (the empty string) means that the customer ordered no options. This is a legal SET value.

SET column values are represented as a single string. If a value consists of multiple set members, the members are separated in the string by commas. Obviously, this means you shouldn't use a string containing a comma as a SET member.

Other uses for SET columns might be for representing information, such as patient diagnoses or results from selections on Web pages. For a diagnosis, there may be a standard list of symptoms to ask a patient about, and the patient might exhibit any or all of them. For your online pizza service, the Web page for ordering could have a set of check boxes for ingredients that a customer wants on a pizza, several of which might be chosen.

The way you declare the legal value list for an ENUM or SET column is significant in several ways:

  • The list determines the possible legal values for the column, as has already been discussed.

  • You can insert ENUM or SET values in any lettercase, but the lettercase of the strings specified in the column declaration determines the lettercase of column values when they are retrieved later. For example, if you have an ENUM('Y','N') column and you store 'y' and 'n' in it, the values are displayed as 'Y' and 'N' when you retrieve them. This does not affect comparison or sorting behavior because ENUM and SET columns are not case sensitive.

  • The order of values in an ENUM declaration is the order used for sorting. The order of values in a SET declaration also determines sort order, although the relationship is more complicated because column values can contain multiple set members.

  • The order of values in a SET declaration determines the order in which set members appear when SET column values consisting of multiple members are displayed.

ENUM and SET are classified as string types because enumeration and set members are specified as strings when you create columns of these types. However, the members are stored internally as numbers and you can operate on them as such. This means that ENUM and SET types are more efficient than other string types because they often can be handled using numeric operations rather than string operations. It also means that ENUM and SET values can be used in either string or numeric contexts.

ENUM members in the column declaration are numbered sequentially beginning with 1. (0 is reserved by MySQL for the error member, which is represented in string form by the empty string.) The number of enumeration values determines the storage size of an ENUM column. One byte can represent 256 values, two bytes can represent 65,536 values. (Compare this to the ranges of the one-byte and two-byte integer types TINYINT UNSIGNED and SMALLINT UNSIGNED.) Thus, the maximum number of enumeration members is 65,536 (counting the error member) and the storage size depends on whether or not there are more than 256 members. You can specify a maximum of 65,535 (not 65,536) members in the ENUM declaration because MySQL reserves a spot for the error member as an implicit member of every enumeration. When you assign an illegal value to an ENUM column, MySQL assigns the error member instead.

The following is an example you can try using the mysql client. It demonstrates that you can retrieve ENUM values in either string or numeric form (which shows the numeric ordering of enumeration members and also that the NULL value has no number in the ordering):

mysql> CREATE TABLE e_table (e ENUM('jane','fred','will','marcia')); mysql> INSERT INTO e_table -> VALUES('jane'),('fred'),('will'),('marcia'),(''),(NULL); mysql> SELECT e, e+0, e+1, e*3 FROM e_table; +--------+------+------+------+ | e | e+0 | e+1 | e*3 | +--------+------+------+------+ | jane | 1 | 2 | 3 | | fred | 2 | 3 | 6 | | will | 3 | 4 | 9 | | marcia | 4 | 5 | 12 | | | 0 | 1 | 0 | | NULL | NULL | NULL | NULL | +--------+------+------+------+

You can compare ENUM members either by name or number:

mysql> SELECT e FROM e_table WHERE e='will'; +------+ | e | +------+ | will | +------+ mysql> SELECT e FROM e_table WHERE e=3; +------+ | e | +------+ | will | +------+

It is possible to declare the empty string as a legal enumeration member. It will be assigned a non-zero numeric value, just as any other member listed in the declaration would be. However, using an empty string may cause some confusion because that string is also used for the error member that has a numeric value of 0. In the following example, assigning the illegal enumeration value 'x' to the ENUM column causes the error member to be assigned. This is distinguishable from the empty string member only when retrieved in numeric form:

mysql> CREATE TABLE t (e ENUM('a','','b')); mysql> INSERT INTO t VALUES('a'),(''),('b'),('x'); mysql> SELECT e, e+0 FROM t; +------+------+ | e | e+0 | +------+------+ | a | 1 | | | 2 | | b | 3 | | | 0 | +------+------+

The numeric representation of SET columns is a little different than for ENUM columns. Set members are not numbered sequentially. Instead, each member corresponds to an individual bit in the SET value. The first set member corresponds to bit 0, the second member corresponds to bit 1, and so on. A numeric SET value of 0 corresponds to the empty string. SET members are maintained as bit values. Eight set values per byte can be stored this way, so the storage size for a SET column is determined by the number of set members, up to a maximum of 64 members. SET values take 1, 2, 3, 4, or 8 bytes for set sizes of 1 to 8, 9 to 16, 17 to 24, 25 to 32, and 33 to 64 members.

The representation of a SET as a set of bits is what allows a SET value to consist of multiple set members. Any combination of bits can be turned on in the value, so the value can consist of any combination of the strings in the SET declaration that correspond to those bits.

The following is an example that shows the relationship between the string and numeric forms of a SET column; the numeric value is displayed in both decimal and binary form:

mysql> CREATE TABLE s_table (s SET('jane','fred','will','marcia')); mysql> INSERT INTO s_table -> VALUES('jane'),('fred'),('will'),('marcia'),(''),(NULL); mysql> SELECT s, s+0, BIN(s+0) FROM s_table; +--------+------+----------+ | s | s+0 | BIN(s+0) | +--------+------+----------+ | jane | 1 | 1 | | fred | 2 | 10 | | will | 4 | 100 | | marcia | 8 | 1000 | | | 0 | 0 | | NULL | NULL | NULL | +--------+------+----------+

If you assign a value containing substrings that are not listed as set members to a SET column, those strings drop out and the column is assigned a value consisting of the remaining substrings. When you assign values to SET columns, the substrings don't need to be listed in the same order that you used when you declared the column. However, when you retrieve the value later, members will be listed in declaration order. Suppose you declare a SET column to represent furniture items using the following declaration:

SET('table','lamp','chair')

If you assign a value of 'chair,couch,table' to this column, two things happen. First, 'couch' drops out because it's not a member of the set. Second, when you retrieve the value later, it appears as 'table,chair'. This occurs because MySQL determines which bits correspond to each substring of the value to be assigned and turns them on in the stored value. 'couch' corresponds to no bit and is ignored. On retrieval, MySQL constructs the string value from the numeric value by scanning the bits in order, which automatically reorders the substrings to the order used when the column was declared. This behavior also means that if you specify a set member more than once in a value, it will appear only once when you retrieve the value. If you assign 'lamp,lamp,lamp' to a SET column, it will be simply 'lamp' when retrieved.

The fact that MySQL reorders members in a SET value means that if you search for values using a string, you must list members in the proper order. If you insert 'chair,table' and then search for 'chair,table' you won't find the record; you must look for it as 'table,chair'.

Sorting and indexing of ENUM and SET columns is done according to the internal (numeric) values of column values. The following example might appear to be incorrect otherwise because the values are not sorted in alpha numeric order:

mysql> SELECT e FROM e_table ORDER BY e; +--------+ | e | +--------+ | NULL | | | | jane | | fred | | will | | marcia | +--------+

The placement of the NULL value depends how your version of MySQL sorts NULL values. (See the "Sorting Query Results" section in Chapter 1.)

If you have a fixed set of values and you want them to sort in a particular order, you can exploit the ENUM sorting properties. Represent the values as an ENUM column in a table and list the enumeration values in the column declaration in the order that you want them to be sorted. Suppose you have a table representing personnel for a sports organization, such as a football team, and that you want to sort output by personnel position so that it comes out in a particular order, such as the coaches, assistant coaches, quarterbacks, running backs, receivers, linemen, and so on. Define the column as an ENUM and list the enumeration elements in the order that you want to see them. Sort operations on that column will automatically come out in the order you specify.

For cases where you want an ENUM to sort in regular lexical order, you can convert the column to a non-ENUM string by using CONCAT() and sorting the result:

mysql> SELECT CONCAT(e) AS e_str FROM e_table ORDER BY e_str; +--------+ | e_str | +--------+ | NULL | | | | fred | | jane | | marcia | | will | +--------+

CONCAT() doesn't change the displayed values but has the side effect in this query of performing an ENUM-to-string conversion that alters their sorting properties.

String Column Type Attributes

The BINARY attribute can be specified for the CHAR and VARCHAR types to cause column values to be treated as binary strings (that is, as a string of bytes rather than as a string of characters). A common use for this is to cause column values to be case sensitive.

In MySQL 4.1 and later, you can specify a CHARACTER SET charset attribute for CHAR, VARCHAR, and TEXT columns. charset should be a valid character set name. The character set may differ among columns. For example, the following table contains latin1_de (German), utf8 (Unicode), and sjis (Japanese) columns:

CREATE TABLE mytbl ( c1 CHAR(10) CHARACTER SET latin1_de, c2 VARCHAR(40) CHARACTER SET utf8, t MEDIUMTEXT CHARACTER SET sjis );

In versions of MySQL for which individual columns may be assigned character sets, DESCRIBE output will show that information:

mysql> DESCRIBE mytbl; +-------+-------------------------------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-------+-------------------------------------+------+-----+---------+-------+ | c1 | varchar(10) character set latin1_de | YES | | NULL | | | c2 | varchar(40) character set utf8 | YES | | NULL | | | t | mediumtext character set sjis | YES | | NULL | | +-------+-------------------------------------+------+-----+---------+-------+

Binary strings do not have character sets, so the CHARACTER SET attribute is not applicable to CHAR BINARY, VARCHAR BINARY, or any of the BLOB types. Character sets cannot be assigned to ENUM or SET columns, either, because values in such columns are represented numerically.[3]

[3] The string values that correspond to ENUM and SET values currently are interpreted with respect to the server's default character set. There is work in progress in MySQL 4.1 to allow such columns to be associated with a named character set or to be declared as BINARY a feature that may in fact be available by the time this book reaches you.

In MySQL 4.1 and later, every non-binary character column has a character set; one will be assigned, even if you do not specify one explicitly in the column definition. Character sets can be designated at the column, table, database, or server level, so when you create a character column, MySQL determines which character set to assign to it by trying the following rules in order:

  1. If the column definition includes a character set, use that set.

  2. Otherwise, if the table definition includes a table-level character set other than DEFAULT, use that set.

  3. Otherwise, if the database has been assigned a character set other than DEFAULT, use that set.

  4. Otherwise, use the server's default character set.

In other words, MySQL searches up through the levels at which character sets can be specified until it finds an explicit character set and then uses that for the column's set. The server always has a default character set, so the search process is guaranteed to terminate at the server level even if no character set is specified explicitly at any of the lower levels.

Suppose the server's character set is greek and that the current database has a character set of DEFAULT. The following CREATE TABLE statement specifies no character set at either the column or table level:

CREATE TABLE t (c CHAR(10));

The database has no explicit set either, so MySQL searches all the way up to the server level to find a character set (greek) to use for the column c. You can verify that with DESCRIBE:

mysql> DESCRIBE t; +-------+------------------------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-------+------------------------------+------+-----+---------+-------+ | c | char(10) character set greek | YES | | NULL | | +-------+------------------------------+------+-----+---------+-------+

The next statement specifies a table-level character set, so MySQL searches only up to that level to determine that the character set for column c should be czech:

CREATE TABLE t (c CHAR(10)) CHARACTER SET czech;

Again, you can verify that with DESCRIBE:

mysql> DESCRIBE t; +-------+------------------------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-------+------------------------------+------+-----+---------+-------+ | c | char(10) character set czech | YES | | NULL | | +-------+------------------------------+------+-----+---------+-------+

Character sets are described further in the "Character Set Support" section later in this chapter.

The general attributes NULL or NOT NULL can be specified for any of the string types. If you don't specify either of them, NULL is the default. However, declaring a string column as NOT NULL does not prevent entry of an empty string. An empty value is different than a missing value, so don't make the mistake of thinking that you can force a string column to contain non-empty values by declaring it NOT NULL. If you require string values to be non-empty, that is a constraint you must enforce from within your own applications.

You can also specify a default value using the DEFAULT attribute for all string column types except the BLOB and TEXT types. If you don't specify a default value, one is chosen automatically. The default is NULL for columns that may contain NULL. For columns that may not contain NULL, the default is the empty string except for ENUM, where the default is the first enumeration member. (For SET, the default when the column cannot contain NULL is actually the empty set, but that is equivalent to the empty string.)

Date and Time Column Types

MySQL provides several column types for temporal values DATE, DATETIME, TIME, TIMESTAMP, and YEAR. Table 2.10 shows the types provided by MySQL for declaring columns that hold date and time values and the range of legal values for each type. The storage requirements for each type are shown in Table 2.11.

Table 2.10. Date and Time Column Types

Type Specification Range
DATE '1000-01-01' to '9999-12-31'
TIME '-838:59:59' to '838:59:59'
DATETIME '1000-01-01 00:00:00' to '9999-12-31 23:59:59'
TIMESTAMP[(M)] 19700101000000 to sometime in the year 2037
YEAR[(M)] 1901 to 2155 for YEAR(4), and 1970 to 2069 for YEAR(2)

Table 2.11. Date and Time Column Type Storage Requirements

Type Specification Storage Required
DATE 3 bytes (4 bytes prior to MySQL 3.22)
TIME 3 bytes
DATETIME 8 bytes
TIMESTAMP 4 bytes
YEAR 1 byte

Each date and time type has a "zero" value that is stored when you insert a value that is illegal for the type, as shown in Table 2.12. This value is also the default value for date and time columns that are declared NOT NULL.

Table 2.12. Date and Time Type "Zero" Values

Type Specification Zero Value
DATE '0000-00-00'
TIME '00:00:00'
DATETIME '0000-00-00 00:00:00'
TIMESTAMP 00000000000000
YEAR 0000

MySQL always represents dates with the year first, in accordance with the ANSI SQL and ISO 8601 specifications. For example, December 3, 2004 is represented as '2004-12-03'. MySQL does allow some leeway in the way it allows input dates to be specified. For example, it will convert two-digit year values to four digits, and you need not supply a leading zero digit for month and day values that are less than 10. However, you must specify the year first and the day last. Formats that you may be more used to, such as '12/3/99' or '3/12/99', will be interpreted incorrectly. The date interpretation rules MySQL uses are discussed further in the "Working with Date and Time Columns" section later in this chapter.

Time values are returned in the time zone local to the server; MySQL doesn't make any time zone adjustments for the values that it returns to the client.

The DATE, TIME, and DATETIME Column Types

The DATE, TIME, and DATETIME types hold date, time, and combined date and time values. The formats are 'CCYY-MM-DD', 'hh:mm:ss', and 'CCYY-MM-DD hh:mm:ss', where CC, YY, MM, DD hh, mm, and ss represent century, year, month, day, hour, minute, and second. For the DATETIME type, the date and time parts are both required; if you assign a DATE value to a DATETIME column, MySQL automatically adds a time part of '00:00:00'. (Conversely, if you assign a DATETIME value to a DATE column, MySQL discards the time part.)

MySQL treats the time in DATETIME and TIME values slightly differently. For DATETIME, the time part represents a time of day. A TIME value, on the other hand, represents elapsed time that's why the range for TIME columns is so great and why negative values are allowed.

One thing to watch out for when inserting TIME values into a table is that if you use a "short" (not fully qualified) value, it may not be interpreted as you expect. For example, you'll probably find that if you insert '30' and '12:30' into a TIME column, one value will be interpreted from right to left and the other from left to right, resulting in stored values of '00:00:30' and '12:30:00'. If you consider '12:30' to represent a value of "12 minutes, 30 seconds," you should specify it in fully qualified form as '00:12:30'.

The TIMESTAMP Column Type

TIMESTAMP columns represent values in CCYYMMDDhhmmss format, with a range from 19700101000000 to sometime in the year 2037. The range is tied to UNIX time, where the first day of 1970 is "day zero," also known as "the epoch." The beginning of 1970 determines the lower end of the TIMESTAMP range. The upper end of the range corresponds to the four-byte limit on UNIX time, which can represent values into the year 2037.[4]

[4] The upper limit on TIMESTAMP values will increase as operating systems are modified to extend the upper range of UNIX time values. This is something that must be addressed at the system library level. MySQL will take advantage of these changes as they are made.

The TIMESTAMP type is so called because it has some special properties for recording when a row is created or modified:

  • If you insert a NULL into any TIMESTAMP column, the column value is set automatically to the current date and time.

  • The current date and time are also used if you create or update a row without assigning an explicit value to the column, but only for the first TIMESTAMP column in a row.

  • For any TIMESTAMP column, you can update its value to the current timestamp by setting it to NULL, or you can defeat timestamping by inserting an explicit date and time value into the column rather than NULL.

A TIMESTAMP column declaration can include a specification for a maximum display width M. Table 2.13 shows the display formats for the allowed values of M. If M is omitted from a TIMESTAMP declaration or has a value of 0 or greater than 14, the column is treated as TIMESTAMP(14). Odd values of M are treated as the next higher even number.

Table 2.13. TIMESTAMP Display Formats

Type Specification Display Format
TIMESTAMP(14) CCYYMMDDhhmmss
TIMESTAMP(12) YYMMDDhhmmss
TIMESTAMP(10) YYMMDDhhmm
TIMESTAMP(8) CCYYMMDD
TIMESTAMP(6) YYMMDD
TIMESTAMP(4) YYMM
TIMESTAMP(2) YY

The display width for TIMESTAMP columns has nothing to do with storage size or the values stored internally. TIMESTAMP values are always stored in 4 bytes and used in calculations to full 14-digit precision, regardless of the display width. To see this, suppose you declare a table as follows and then insert some rows into it and retrieve them:

mysql> CREATE TABLE mytbl (ts TIMESTAMP(8), i INT); mysql> INSERT INTO mytbl VALUES(20020801120000,3); mysql> INSERT INTO mytbl VALUES(20020801120001,2); mysql> INSERT INTO mytbl VALUES(20020801120002,1); mysql> INSERT INTO mytbl VALUES(20020801120003,0); mysql> SELECT * FROM mytbl ORDER BY ts, i; +----------+------+ | ts | i | +----------+------+ | 20020801 | 3 | | 20020801 | 2 | | 20020801 | 1 | | 20020801 | 0 | +----------+------+

On the face of it, the rows produced by the SELECT statement appear to be sorted in the wrong order the values in the first column are all the same, so it seems the sort should order the rows according to the values in the second column. This apparently anomalous result is due to the fact that MySQL is using the full 14-digit values inserted into the TIMESTAMP column for sorting. These values are all distinct, so they entirely determine the sort order of the result.

MySQL has no column type that can be set to the current date and time when a record is created and that remains immutable thereafter. If you want to achieve that, you can do it two ways:

  • Use a TIMESTAMP column. When you create a new record, set the column to NULL to initialize it to the current date and time:

    INSERT INTO tbl_name (ts_col, ...) VALUES(NULL, ...);

    Whenever you update the record thereafter, explicitly set the column to the value it already has. Assigning an explicit value defeats the timestamping mechanism because it prevents the column's value from being automatically updated:

    UPDATE tbl_name SET ts_col=ts_col WHERE ... ;

  • Use a DATETIME column. When you create a record, initialize the column to NOW():

    INSERT INTO tbl_name (dt_col, ...) VALUES(NOW(), ...);

    Whenever you update the record thereafter, leave the column alone:

    UPDATE tbl_name SET ... anything BUT dt_col here ... WHERE ... ;

If you want to use TIMESTAMP columns to maintain both a time-created value and a last-modified value, you can do so by using one TIMESTAMP for the time-modified value and a second TIMESTAMP for the time-created value. Make sure the time-modified column is the first TIMESTAMP, so that it's set when the record is created or changed. Make the time-created column the second TIMESTAMP, and initialize it to NOW() when you create new records. That way, its value will reflect the record creation time and will not change after that.

The YEAR Column Type

YEAR is a one-byte column type used for efficient representation of year values. A YEAR column declaration can include a specification for a display width M, which should be either 4 or 2. If M is omitted from a YEAR declaration, the default is 4. YEAR(4) has a range of 1901 to 2155. YEAR(2) has a range of 1970 to 2069, but only the last two digits are displayed. You can use the YEAR type when you want to store date information but only need the year part of the date, such as year of birth, year of election to office, and so forth. When you do not need a full date value, YEAR is much more space-efficient than other date types.

TINYINT has the same storage size as YEAR (one byte), but not the same range. To cover the same range of years as YEAR by using an integer type, you would need a SMALLINT, which takes twice as much space. If the range of years you need to represent coincides with the range of the YEAR type, YEAR is more space-efficient than SMALLINT. Another advantage of YEAR over an integer column is that MySQL will convert two-digit values into four-digit values for you using MySQL's usual year-guessing rules. For example, 97 and 14 become 1997 and 2014. However, be aware that inserting the numeric value 00 into a four-digit YEAR column will result in the value 0000 being stored, not 2000. If you want a value of 00 to convert to 2000, you must specify it in string form as '00'.

Date and Time Column Type Attributes

There are no attributes that are specific to the date and time column types. The general attributes NULL or NOT NULL can be specified for any of the date and time types. If you don't specify either of them, NULL is the default. You can also specify a default value using the DEFAULT attribute. If you don't specify a default value, one is chosen automatically. The default is NULL for columns that may contain NULL. Otherwise, the default is the "zero" value for the type. TIMESTAMP columns are special; the default for the first such column in a table is the current date and time and the "zero" value for any others.

Note that because default values must be constants. you cannot use a function such as NOW() to supply a value of "the current date and time" as the default for a DATETIME column. To achieve that result, set the column value explicitly to NOW() whenever you create a new record or else use a TIMESTAMP column (assuming that the special properties of TIMESTAMP are suitable for your purposes).

Working with Date and Time Columns

MySQL tries to interpret date and time values in a variety of formats, including both string and numeric forms. Table 2.14 shows the allowable formats for each of the date and time types.

Table 2.14. Date and Time Type Input Formats

Type Allowable Formats
DATETIME, TIMESTAMP 'CCYY-MM-DD hh:mm:ss'
  'YY-MM-DD hh:mm:ss'
  'CCYYMMDDhhmmss'
  'YYMMDDhhmmss'
  CCYYMMDDhhmmss
  YYMMDDhhmmss
DATE 'CCYY-MM-DD'
  'YY-MM-DD'
  'CCYYMMDD'
  'YYMMDD'
  CCYYMMDD
  YYMMDD
TIME 'hh:mm:ss'
  'hhmmss'
  hhmmss
YEAR 'CCYY'
  'YY'
  CCYY
  YY

Formats that have no century part (CC) are interpreted using the rules described in next section, "Interpretation of Ambiguous Year Values." For string formats that include delimiter characters, you don't have to use '-' for dates and ':' for times. Any punctuation character can be used as the delimiter. Interpretation of values depends on context, not on the delimiter. For example, although times are typically specified using a delimiter of ':', MySQL won't interpret a value containing ':' as a time in a context where a date is expected. In addition, for the string formats that include delimiters, you need not specify two digits for month, day, hour, minute, or second values that are less than 10. The following are all equivalent:

'2012-02-03 05:04:09' '2012-2-03 05:04:09' '2012-2-3 05:04:09' '2012-2-3 5:04:09' '2012-2-3 5:4:09' '2012-2-3 5:4:9'

Note that values with leading zeroes may be interpreted differently depending on whether they are specified as strings or numbers. The string '001231' will be seen as a six-digit value and interpreted as '2000-12-31' for a DATE and as '2000-12-31 00:00:00' for a DATETIME. On the other hand, the number 001231 will be seen as 1231 after the parser gets done with it and then the interpretation becomes problematic. This is a case where it's best to supply a string value '001231' or else use a fully qualified value if you are using numbers (that is, 20001231 for DATE and 200012310000 for DATETIME).

In general, you can freely assign values between the DATE, DATETIME, and TIMESTAMP types, although there are certain restrictions to keep in mind:

  • If you assign a DATETIME or TIMESTAMP value to a DATE, the time part is discarded.

  • If you assign a DATE value to a DATETIME or TIMESTAMP, the time part of the resulting value is set to zero ('00:00:00').

  • The types have different ranges. In particular, TIMESTAMP has a more limited range (1970 to 2037), so, for example, you cannot assign a pre-1970 DATETIME value to a TIMESTAMP and expect reasonable results. Nor can you assign values that are far in the future to a TIMESTAMP.

MySQL provides many functions for working with date and time values. See Appendix C for more information.

Interpretation of Ambiguous Year Values

For all date and time types that include a year part (DATE, DATETIME TIMESTAMP, YEAR), MySQL handles values that contain two-digit years by converting them to four-digit years. This conversion is performed according to the following rules:

  • Year values from 00 to 69 become 2000 to 2069.

  • Year values from 70 to 99 become 1970 to 1999.

You can see the effect of these rules most easily by assigning different two-digit values to a YEAR column and then retrieving the results. This will also demonstrate something you should take note of:

mysql> CREATE TABLE y_table (y YEAR); mysql> INSERT INTO y_table VALUES(68),(69),(99), (00); mysql> SELECT * FROM y_table; +------+ | y | +------+ | 2068 | | 2069 | | 1999 | | 0000 | +------+

Notice that 00 was converted to 0000, not to 2000. That's because as a number, 00 is the same as 0 and is a perfectly legal value for the YEAR type. If you insert a numeric zero, that's what you get. To get 2000 using a value that does not contain the century, insert the string '0' or '00'. You can make sure MySQL sees a string and not a number by inserting YEAR values using CONCAT(). This function returns a string result uniformly regardless of whether its argument is a string or a number.

In any case, keep in mind that the rules for converting two-digit to four-digit year values provide only a reasonable guess. There is no way for MySQL to be certain about the meaning of a two-digit year when the century is unspecified. If MySQL's conversion rules don't produce the values that you want, the solution is to provide unambiguous data with four-digit years.

Is MySQL Year-2000 Safe?

MySQL itself is year-2000 safe because it stores dates internally with four-digit years, but it's your responsibility to provide data that result in the proper values being stored in the first place. The real problem with two-digit year interpretation comes not from MySQL but from the human desire to take a shortcut and enter ambiguous data. If you're willing to take the risk, go ahead. It's your risk to take, and MySQL's guessing rules are adequate for many situations. Just be aware that there are times when you really do need to enter four digits. For example, to enter birth and death dates into the president table that lists U.S. presidents back into the 1700s, four-digit year values are in order. Values in these columns span several centuries, so letting MySQL guess the century from a two-digit year is definitely the wrong thing to do.

Категории