Dealing with NULL Values
10.36.1 Problem
You're not sure how to represent NULL values in a datafile.
10.36.2 Solution
Try to use a value not otherwise present, so that you can distinguish NULL from all other legitimate non-NULL values.
10.36.3 Discussion
There's no particular standard for representing NULL values in datafiles, which makes them a bit of a problem for import and export operations. Some of the difficulty arises from the fact that NULL indicates the lack of a value, and something that's not there is not easy to represent literally in a datafile. Using an empty column value is the most obvious thing to do, but that's ambiguous for string-valued columns because there is no way to distinguish a NULL represented that way from a true empty string. Empty values can be a problem for other column types as well. For example, if you load an empty value with LOAD DATA into a numeric column, it gets stored as 0 rather than as NULL, and thus becomes indistinguishable from a true 0 in the input.
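The effect is easy to demonstrate. The table and file names below are only illustrative, and the exact outcome can vary with the server's SQL mode, but on a typically configured server the empty fields load as non-NULL values:

CREATE TABLE null_test (c CHAR(10), i INT, d DATE);
-- null_test.txt contains one line consisting of three empty
-- tab-separated fields
LOAD DATA LOCAL INFILE 'null_test.txt' INTO TABLE null_test;
SELECT * FROM null_test;

The row comes back with c as the empty string, i as 0, and d as 0000-00-00; none of the columns is NULL.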
The usual strategy for dealing with this problem is to represent NULL using a value that doesn't otherwise occur in the data. This is how LOAD DATA and mysqlimport handle the issue, with \N as the value that is understood by convention to mean NULL. Based on that fact, it's sometimes helpful to convert empty fields in a datafile to \N so that LOAD DATA will interpret them as NULL. It's easy to write a script that does this:
#! /usr/bin/perl -w
# empty_to_null.pl - Convert empty input fields to \N.
# \N is the MySQL LOAD DATA convention for NULL.  Running a file
# through this script and loading the result causes NULL to be loaded
# rather than empty strings.
# Assumes tab-delimited, linefeed-terminated input lines.

use strict;

while (<>)
{
    chomp;
    my @val = split (/\t/, $_, 10000);  # split, preserving all fields
    # map empty fields to \N, write as tab-delimited output line
    print join ("\t", map { /^$/ ? "\\N" : $_ } @val) . "\n";
}
exit (0);
You might use the script like this:
% empty_to_null.pl mytbl.txt > mytbl.txt2
% mysqlimport --local cookbook mytbl.txt2
Loading a file that has been run through the empty_to_null.pl script can often produce better results for columns that allow NULL values. This is shown by the following table, which compares the values that result when you use LOAD DATA or mysqlimport to load either an empty string or \N (meaning NULL) into various column types when those columns allow NULL values:
| Column type | Result of loading empty string | Result of loading \N |
| --- | --- | --- |
| CHAR | Empty string | NULL |
| INT | 0 | NULL |
| DATE | 0000-00-00 | NULL |
But what happens if you load \N rather than an empty string into columns that are defined as NOT NULL? As it happens, it doesn't make any difference either way, as shown in the next table:
| Column type | Result of loading empty string | Result of loading \N |
| --- | --- | --- |
| CHAR | Empty string | Empty string |
| INT | 0 | 0 |
| DATE | 0000-00-00 | 0000-00-00 |
This means it isn't worth writing a smarter version of empty_to_null.pl that looks at the structure of the target table and converts empty strings to \N only for columns that allow NULL values.
On the other hand, one reason that it is worth writing a smarter script is that you might want to interpret values other than empty values as signifying NULL, and you might have different conventions in different columns. Consider the following datafile, has_nulls.txt:
str1	13
str2	0
Unknown	15
Unknown	0
The first column contains strings, and Unknown signifies NULL. The second column contains integers, and 0 signifies NULL. What to do? To handle that kind of file, the transfer directory of the recipes distribution contains a to_null.pl script. It provides options allowing you to specify both which columns to look in and what value to look for:
--columns=column_list
Convert values only in the named columns. By default, to_null.pl looks in all columns. If this option is given, column_list should be a list of one or more column positions separated by commas. Positions begin at 1.
--null=value
Interpret value as the NULL value indicator and convert instances of it to \N. The default is to convert empty values, like empty_to_null.pl.
--case-insensitive, -i
Perform a case-insensitive comparison when looking for the NULL value indicator.
Because the has_nulls.txt datafile has two different NULL value indicators, it's necessary to process it using two invocations of to_null.pl:
% to_null.pl --columns=1 --null=Unknown has_nulls.txt | to_null.pl --columns=2 --null=0 > tmp
The resulting file, tmp, looks like this:
str1	13
str2	\N
\N	15
\N	\N
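To give a sense of what the conversion involves, here is a minimal sketch of a to_null.pl-style filter. It is not the script from the recipes distribution, only an illustration under the same assumptions of tab-delimited, linefeed-terminated input; the script name and error handling are simplified:

#! /usr/bin/perl -w
# null_sketch.pl - illustrative sketch of a to_null.pl-style filter:
# replace a designated NULL indicator with \N in selected columns of
# tab-delimited, linefeed-terminated input.

use strict;
use Getopt::Long;

my $columns = "";       # comma-separated column positions (1-based); "" = all
my $null_val = "";      # value that signifies NULL; default is an empty field
my $ignore_case = 0;

GetOptions (
    "columns=s"          => \$columns,
    "null=s"             => \$null_val,
    "case-insensitive|i" => \$ignore_case
) or die "usage: null_sketch.pl [--columns=list] [--null=value] [-i] [file ...]\n";

my %check;
$check{$_ - 1} = 1 for split (/,/, $columns);   # 0-based positions to examine

while (<>)
{
    chomp;
    my @val = split (/\t/, $_, 10000);          # split, preserving all fields
    for my $i (0 .. $#val)
    {
        next if $columns ne "" && !$check{$i};  # look only in requested columns
        my $hit = $ignore_case ? lc ($val[$i]) eq lc ($null_val)
                               : $val[$i] eq $null_val;
        $val[$i] = "\\N" if $hit;               # convert the indicator to \N
    }
    print join ("\t", @val) . "\n";
}
exit (0);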
Sometimes you don't need to preprocess your input file if you can postprocess it after importing it. For example, if a datafile contains a numeric column that uses -1 to represent NULL values, you can easily convert all -1 values after loading the file using a simple UPDATE statement:
UPDATE tbl_name SET col_name = NULL WHERE col_name = -1;
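The same post-import approach works for other column types. For example, if empty fields were loaded into a DATE column and stored as the zero date shown in the earlier tables, a statement like this (using hypothetical table and column names) converts them afterward:

UPDATE tbl_name SET date_col = NULL WHERE date_col = '0000-00-00';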
The preceding discussion pertains to interpreting NULL values for import into MySQL. It's also necessary to think about NULL values when transferring data in the other direction, from MySQL into other programs. Here are some examples:
- SELECT ... INTO OUTFILE writes NULL values as \N. Will another program understand that convention? If not, you'll need to convert \N to something the program will understand, for example with a one-line command like the sketch following this list.
- You can use mysql in batch mode as an easy way to produce tab-delimited output (see Recipe 10.14), but one problem with doing so is that NULL values appear in the output as instances of the word "NULL." If that word occurs nowhere else in the output, you may be able to postprocess it to convert instances of the word to something more appropriate, such as \N. A script similar to empty_to_null.pl would be easy to write, or you could use a one-line sed command:
% sed -e "s/NULL/\\\\N/g" data.txt > tmp
If the word "NULL" does appear where it represents something other than a NULL value, then it's ambiguous and you should probably use a different method of exporting your data.
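For the first of those cases, the conversion can often be handled with another one-line command. The following is only a sketch: it assumes that the sequence \N never occurs in the output except as a NULL marker, that the receiving program expects the word NULL (substitute whatever it actually expects), and that dump.txt stands in for the SELECT ... INTO OUTFILE result file:

% sed -e "s/\\\\N/NULL/g" dump.txt > tmp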