Dont Assume LOAD DATA Knows More than It Does
Don t Assume LOAD DATA Knows More than It Does
10.10.1 Problem
You think LOAD DATA is smarter than it really is.
10.10.2 Solution
Don't assume that LOAD DATA knows anything at all about the format of your datafile. And make sure you yourself know what its format is. If the file has been transferred from one machine to another, its contents may have been changed in subtle ways of which you're not aware.
10.10.3 Discussion
Many LOAD DATA frustrations occur because people expect MySQL to know things that it cannot possibly know. LOAD DATA makes certain assumptions about the structure of input files, represented as the default settings for the line and field terminators, and for the quote and escape character settings. If your input doesn't match those assumptions, you need to tell MySQL about it.
When in doubt, check the contents of your datafile using a hex dump program or other utility that displays a visible representation of whitespace characters like tab, carriage return, and linefeed. Under Unix, the od program can display file contents in a variety of formats. If you don't have od or some comparable utility, the transfer directory of the recipes distribution contains hex dumpers written in Perl and Python (hexdump.pl and hexdump.py), as well as a couple of programs that display printable representations of all characters of a file (see.pl and see.py). You may find them useful for examining files to see what they really contain. In some cases, you may be surprised to discover that a file's contents are different than you think. This is in fact quite likely if the file has been transferred from one machine to another:
- An FTP transfer between machines running different operating systems typically translates line endings to those that are appropriate for the destination machine if the transfer is performed in text mode rather than in binary (image) mode. Suppose you have tab-delimited linefeed-terminated records in a datafile that load into MySQL on a Unix system just fine using the default LOAD DATA settings. If you copy the file to a Windows machine with FTP using a text transfer mode, the linefeeds probably will be converted to carriage return/linefeed pairs. On that machine, the file will not load properly with the same LOAD DATA statement, because its contents will have been changed. Does MySQL have any way of knowing that? No. So it's up to you to tell it, by adding a LINES TERMINATED BY ' ' clause to the statement. Transfers between any two systems with dissimilar default line endings can cause these changes. For example, a Macintosh file containing carriage returns may contain linefeeds after transfer to a Unix system. You should either account for such changes with a LINES TERMINATED BY clause that reflects the modified line-ending sequence, or transfer the file in binary mode so that its contents do not change.
- Datafiles pasted into email messages often do not survive intact. Mail software may wrap (break) long lines or convert line-ending sequences. If you must transfer a datafile by email, it's best sent as an attachment.