Specifying the Datafile Format

10.4.1 Problem

You have a datafile that's not in LOAD DATA's default format.

10.4.2 Solution

Use FIELDS and LINES clauses to tell LOAD DATA how to interpret the file.

10.4.3 Discussion

By default, LOAD DATA assumes that datafiles contain lines that are terminated by linefeeds (newlines) and that data values within a line are separated by tabs. The following statement does not specify anything about the format of the datafile, so MySQL assumes the default format:

mysql> LOAD DATA LOCAL INFILE 'mytbl.txt' INTO TABLE mytbl;

To specify a file format explicitly, use a FIELDS clause to describe the characteristics of fields within a line, and a LINES clause to specify the line-ending sequence. The following LOAD DATA statement specifies that the datafile contains values separated by colons and lines terminated by carriage returns:

mysql> LOAD DATA LOCAL INFILE 'mytbl.txt' INTO TABLE mytbl
    -> FIELDS TERMINATED BY ':'
    -> LINES TERMINATED BY '\r';

Each clause follows the table name. If both are present, the FIELDS clause must precede the LINES clause. The line and field termination indicators can contain multiple characters. For example, \r\n indicates that lines are terminated by carriage return/linefeed pairs.
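To see what the FIELDS and LINES terminators describe, it can help to produce a file in a given format yourself. The following Python sketch is not part of MySQL. The helper names write_datafile and read_datafile are hypothetical, and the code assumes that no data value contains a terminator string, so it attempts no quoting or escaping:

```python
def write_datafile(path, rows, field_term="\t", line_term="\n"):
    """Write rows to path using the given field and line terminators.

    The defaults (tab and linefeed) mirror LOAD DATA's default format.
    Assumption: values never contain the terminator strings.
    """
    # newline="" stops Python from translating line endings on write
    with open(path, "w", newline="") as f:
        for row in rows:
            f.write(field_term.join(str(v) for v in row) + line_term)


def read_datafile(path, field_term="\t", line_term="\n"):
    """Split a datafile back into rows of string values."""
    # newline="" returns line endings untranslated on read
    with open(path, newline="") as f:
        data = f.read()
    # Every line ends with line_term, so the final split piece is empty
    lines = data.split(line_term)[:-1]
    return [line.split(field_term) for line in lines]
```

Calling write_datafile("mytbl.txt", rows, field_term=":", line_term="\r") would produce a file in the colon-separated, carriage return-terminated format that the preceding LOAD DATA statement describes. Passing line_term="\r\n" produces the CRLF variant.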

If you use mysqlimport, command-line options provide the format specifiers. mysqlimport commands that correspond to the preceding two LOAD DATA statements look like this:

% mysqlimport --local cookbook mytbl.txt
% mysqlimport --local --fields-terminated-by=":" --lines-terminated-by="\r" cookbook mytbl.txt

The order in which you specify the options doesn't matter for mysqlimport, except that they should all precede the database name.

Specifying Binary Format Option Characters

As of MySQL 3.22.10, you can use hex notation to specify arbitrary format characters for FIELDS and LINES clauses. Suppose a datafile has lines with Ctrl-A between fields and Ctrl-B at the end of lines. The ASCII values for Ctrl-A and Ctrl-B are 1 and 2, so you represent them as 0x01 and 0x02:

FIELDS TERMINATED BY 0x01
LINES TERMINATED BY 0x02

mysqlimport understands hex constants for format specifiers as of MySQL 3.23.30. You may find this capability helpful if you don't want to remember how to type escape sequences on the command line or when they need to be quoted. Tab is 0x09, linefeed is 0x0a, and carriage return is 0x0d. Here's an example that indicates that the datafile contains tab-delimited lines terminated by CRLF pairs:

% mysqlimport --local --lines-terminated-by=0x0d0a --fields-terminated-by=0x09 cookbook mytbl.txt
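If you'd rather not look up ASCII codes by hand, a short script can produce the hex notation for you. The following Python sketch is one way to do it. The function name hex_constant is hypothetical, and the code assumes the terminator consists only of ASCII characters:

```python
def hex_constant(terminator):
    """Return the 0x... notation for an ASCII terminator string."""
    if not terminator:
        raise ValueError("terminator must not be empty")
    # encode() raises UnicodeEncodeError for non-ASCII input;
    # hex() yields two lowercase hex digits per byte
    return "0x" + terminator.encode("ascii").hex()


print(hex_constant("\t"))      # tab
print(hex_constant("\r\n"))    # carriage return/linefeed pair
print(hex_constant("\x01"))    # Ctrl-A
```

The values printed for tab and CRLF should match the constants used in the mysqlimport command above, and the value for Ctrl-A should match the one used in the FIELDS clause.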
