Converting Datafiles from One Format to Another
10.19.1 Problem
You want to convert a file to a different format to make it easier to work with, or so that another program can understand it.
10.19.2 Solution
Use the cvt_file.pl converter script described here.
10.19.3 Discussion
The mysql_to_text.pl script discussed in Recipe 10.18 uses MySQL as a data source and produces output in the format you specify via the --delim, --quote, and --eol options. This section describes cvt_file.pl, a utility that provides similar formatting options, but for both input and output. It reads data from a file rather than from MySQL, and converts it from one format to another. For example, to read a tab-delimited file data.txt, convert it to colon-delimited format, and write the result to tmp, you would invoke cvt_file.pl like this:
% cvt_file.pl --idelim=" " --odelim=":" data.txt > tmp
The cvt_file.pl script has separate options for input and output. Thus, whereas mysql_to_text.pl has just a --delim for specifying the column delimiter, cvt_file.pl has separate --idelim and --odelim options to set the input and output line column delimiters. But as a shortcut, --delim is also supported; it sets the delimiter for both input and output. The full set of options that cvt_file.pl understands is as follows:
--idelim= str, --odelim= str, --delim= str
Set the column delimiter sequence for input, output, or both. The option value may consist of one or more characters.
--iquote= c, --oquote= c, --quote= c
Set the column quote character for input, output, or both.
--ieol= str, --oeol= str, --eol= str
Set the end-of-line sequence for input, output, or both. The option value may consist of one or more characters.
--iformat= format, --oformat= format, --format= format,
Specify an input format, an output format, or both. This option is shorthand for setting the quote and delimiter values. For example, --iformat=csv sets the input quote and delimiter characters to double quote and comma. --iformat=tab sets them to "no quotes" and tab.
--ilabels, --olabels, --labels
Expect an initial line of column labels for input, write an initial line of labels for output, or both. If you request labels for the output but do not read labels from the input, cvt_file.pl uses column labels of c1, c2, and so forth.
cvt_file.pl assumes the same default file format as LOAD DATA and SELECT INTO ... OUTFILE, that is, tab-delimited lines terminated by linefeeds.
cvt_file.pl can be found in the transfer directory of the recipes distribution. If you expect to use it regularly, you should install it in some directory that's listed in your search path so that you can invoke it from anywhere. Much of the source for the script is similar to mysql_to_text.pl, so rather than showing the code and discussing how it works, I'll just give some examples illustrating how to use it:
- Read a file in CSV format with CRLF line termination, write tab-delimited output with linefeed termination:
% cvt_file.pl --iformat=csv --ieol=" " --oformat=tab --oeol=" " data.txt > tmp
- Read and write CSV format, converting CRLF line terminators to carriage returns:
% cvt_file.pl --format=csv --ieol=" " --oeol=" " data.txt > tmp
- Produce a tab-delimited file from the colon-delimited /etc/passwd file:
% cvt_file.pl --idelim=":" /etc/passwd > tmp
- Convert tab-delimited query output from mysql into CSV format:
% mysql -e "SELECT * FROM profile" cookbook | cvt_file.pl --oformat=csv > profile.csv