Hack 91. Export and Import Table Definitions
You want to move your data from one vendor's database platform to another. The first stage is to extract the metadata.
It should be possible for a system to export an entire database as a sequence of SQL commands. It also should be possible to import it again, on the same machine or on another machine, running the same database platform or a different one.
That's the theory. In reality, each database vendor has its own variation on SQL, which makes moving a database between platforms difficult, but certainly not impossible. The first step is to extract the table definitions and the relationships between tables. After that you can move the data itself.
Unsurprisingly, the vendors have invested in making it easy for you to import your database, but have not put as much investment into exporting. So, each platform is better at accepting ANSI standard SQL than it is at exporting that format. And, of course, each platform can import the SQL that it has exported (assuming you're importing into the same version you exported from), so this causes a problem only for moving between different vendors or versions.
11.2.1. MySQL
MySQL has facilities for exporting table definitions. The mysqldump command-line utility can display CREATE TABLE commands as well as the data itself. You can find the documentation at http://dev.mysql.com/doc/refman/5.0/ en/mysqldump.html. For exporting just the schema you can use --no-data, and if you were interested only in the data you can use --no-create-info. The output from mysqldump will need a little coaxing if you plan to import those CREATE statements into another system. The SQL that comes out of mysqldump with the default options is not going to be acceptable to any other system:
andrew@SQLZoo3:~> mysqldump --no-data u username -ppassword dbname staff -- MySQL dump 10.10 -- -- Host: localhost Database: dbname -- ------------------------------------------------------ -- Server version 5.0.18-standard /*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */; --SNIP-- -- -- Table structure for table QstaffQ -- DROP TABLE IF EXISTS QstaffQ; CREATE TABLE QstaffQ ( QidQ varchar(20) character set utf8 NOT NULL, QnmQ varchar(200) character set utf8 default NULL, PRIMARY KEY (QidQ) ) ENGINE=MyISAM DEFAULT CHARSET=latin1; --SNIP-- /*!40111 SET SQL_NOTES=@OLD_SQL_NOTES */;
First, a number of comments appear at the start and end of the file. Your target system can safely ignore these, because /* */ delimiters indicates a comment in most SQL implementations. At worst, a target database platform will generate error messages without stopping the process. The back quotes and the ENGINE instructions will stop the other platforms from continuing, and the character set utf8 may also have to be filtered out.
A host of switches can improve the output, and by piping the output through sed you can also remove the character set information. You still get all the comments as before, but the output is much cleaner now:
andrew@SQLZoo3:~> mysqldump u username -ppassword --skip-opt > --compatible=ansi --no-data --skip-quote-names > dbname staff | sed 's/character set utf8 //' --SNIP-- CREATE TABLE staff ( id varchar(20) NOT NULL, nm varchar(200) default NULL, PRIMARY KEY (id) ); --SNIP--
|
11.2.2. PostgreSQL
PostgreSQL has a simple export command, pg_dump, which produces relatively normal SQL. It does produce commented lines starting with --, blank lines, and unwanted SET commands, but these are easy to remove with a sed command. PostgreSQL uses the term character varying for VARCHAR; you can fix that with another sed substitution:
$ pg_dump --table=staff --schema-only --no-owner scott | > sed -e '/^--/d' > -e '/^SET/d' > -e '/^ *$/d' > -e 's/character varying/VARCHAR/' CREATE TABLE staff ( id VARCHAR(20) NOT NULL, nm VARCHAR(200), n integer ); ALTER TABLE ONLY staff ADD CONSTRAINT staff_pkey PRIMARY KEY (id);
11.2.3. SQL Server
In SQL Server, you can generate a CREATE script from the Query Analyzer utility (replaced by Server Management Studio in SQL Server 2005). You can do this for a table by right-clicking on it from the Object Browser, or you can do a whole database at once by selecting the Tasks menu item from the Database right-click menu. You should set a few options to make your SQL more portable; set Include If NOT EXISTS and Script Owner to false.
Here is a typical generated script:
SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO CREATE TABLE [staff]( [id] [nvarchar](20) NOT NULL, [nm] [nvarchar](200) NULL, PRIMARY KEY CLUSTERED ( [id] ASC )WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY] ) ON [PRIMARY] GO
You have to deal with a fair amount of SQL Serverspecific code, and you are going to have to hack the output before any other system will deal with it.
You can search and replace the GO instructions with a semicolon. Most systems generally accept a semicolon as a delimiter.
You can usually leave instructions such as SET ANSI_NULLS ON in the file. Most database system command interfaces support a mechanism for merely generating an error message on unrecognized commands, but continuing anyway onto the next instruction. So, for instance, if you were importing this into MySQL:
This fails with an error message $ mysql -u username -p password dbname < file.sql This prints an error messages, but succeeds anyway $ mysql -u username -p password dbname mysql> source file.sql
it would be easy to remove the open and close square brackets (SQL Server 2000 had an option to suppress these, but 2005 has taken away that feature).
You can fix most of the details using search and replace on any old text editor. You could instead use a few lines of Perl in the file convertMSSQL.pl:
use strict; while (<>){ #Foreach line of the file s/^GO$/;/; #Replace GO with semicolon s/[datetime]/TIMESTAMP/I; #Replace [datetime] with TIMESTAMP s/[//g; #Delete [ s/]//g; #Delete ] s/).*/)/; #Delete all characters following ) print; #Print out the new version of each line }
You can run this script from the operating system prompt:
perl convertMSSQL.pl < mssqloutput.sql
11.2.3.1. Dates in SQL Server
You must be especially careful when importing or exporting dates from SQL Server. SQL Server uses the DATETIME data type; other systems call this TIMESTAMP. To confuse matters further SQL Server has a TIMESTAMP type, but that does something else entirely.
When exporting from SQL Server, change DATETIME to TIMESTAMP.
When importing, change each of DATE, TIME, and TIMESTAMP to DATETIME.
11.2.4. Oracle
You need the exp and imp utilities to get the SQL CREATE statements out of Oracle. The exp command creates a binary file called expdat.dmp that contains all of your data and metadata. There is no option to allow exp to create SQL directly. The command imp is used to reimport the .dmp file. You can run the imp command so that it doesn't actually import anything, but instead prints out the SQL commands that it would have used:
[gordon@db book]$ exp USERNAME/password 'TABLES= ( DBBOSS )' Export: Release 10.1.0.3.0 - Production on Mon Jul 17 12:10:41 2006 Copyright (c) 1982, 2004, Oracle. All rights reserved. Connected to: Oracle Database 10g Enterprise Edition Release 10.1.0.3.0 - Production With the Partitioning, OLAP and Data Mining options Export done in UTF8 character set and AL16UTF16 NCHAR character set server uses AL32UTF8 character set (possible charset conversion) About to export specified tables via Conventional Path ... . . exporting table DBBOSS 30 rows exported EXP-00091: Exporting questionable statistics. EXP-00091: Exporting questionable statistics. Export terminated successfully with warnings. [gordon@db book]$ imp USERNAME/password SHOW=Y Import: Release 10.1.0.3.0 - Production on Mon Jul 17 12:10:46 2006 Copyright (c) 1982, 2004, Oracle. All rights reserved. Connected to: Oracle Database 10g Enterprise Edition Release 10.1.0.3.0 - Production With the Partitioning, OLAP and Data Mining options Export file created by EXPORT:V10.01.00 via conventional path import done in UTF8 character set and AL16UTF16 NCHAR character set import server uses AL32UTF8 character set (possible charset conversion) . importing USERNAME's objects into USERNAME "CREATE TABLE "DBBOSS" ("USERNAME" VARCHAR2(100), "MODID" NUMBER) PCTFREE 1" "0 PCTUSED 40 INITRANS 1 MAXTRANS 255 STORAGE(INITIAL 65536 FREELISTS 1 FREE" "LIST GROUPS 1 BUFFER_POOL DEFAULT) TABLESPACE "USERS" LOGGING NOCOMPRESS" . . skipping table "DBBOSS" "ALTER TABLE "DBBOSS" ADD PRIMARY KEY ("USERNAME", "MODID") USING INDEX PCT" "FREE 10 INITRANS 2 MAXTRANS 255 STORAGE(INITIAL 65536 FREELISTS 1 FREELIST " "GROUPS 1 BUFFER_POOL DEFAULT) TABLESPACE "USERS" LOGGING ENABLE" "ALTER TABLE "DBBOSS" ADD FOREIGN KEY ("USERNAME") REFERENCES "DBUSERS" ("US" "ERNAME") ENABLE" "ALTER TABLE "DBBOSS" ADD FOREIGN KEY ("MODID") REFERENCES "DBPROG" ("MID") " "ENABLE" Import terminated successfully without warnings. [gordon@db book]$
The SQL that you can capture and reuse on another system is highlighted in the preceding code. However, the imp command has unhelpfully broken up lines and added double quotes. A little processing would take care of this, but it is actually easier to dive into the binary file expdat.dmp. This file has the SQL DDL (Data Definition Languagethe CREATE and ALTER commands) commands buried in it, and you can find them using the Unix command strings:
[andrew@db book]$ strings expdat.dmp gEXPORT:V10.01.00 UDBRW RTABLES 8192 Mon Jul 17 12:10:41 2006expdat.dmp #C## #C## +00:00 BYTE UNUSED INTERPRETED DISABLE:ALL METRICST TABLE "DBBOSS" CREATE TABLE "DBBOSS" ("USERNAME" VARCHAR2(100), "MODID" NUMBER) PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 STORAGE(INITIAL 65536 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT) TABLESPACE "USERS" LOGGING NOCOMPRESS INSERT INTO "DBBOSS" ("USERNAME", "MODID") VALUES (:1, :2) -- snip -- ALTER TABLE "DBBOSS" ADD FOREIGN KEY ("USERNAME") REFERENCES "DBUSERS" ("USERNAME") ENABLE ENDTABLE TABLE "DBBOSS" ALTER TABLE "DBBOSS" ADD FOREIGN KEY ("MODID") REFERENCES "DBPROG" ("MID") ENABLE
Once you've got a cleaned-up version you've still got to remove the Oracle-specific code. There are simple issues, such as replacing VARCHAR2 with VARCHAR, but there are also Oracle-specific additions to the end of the statement, and these can be harder to process automatically.
11.2.5. Access
Microsoft Access is handy for moving data around. It has a relatively simple wizard for importing a single table from a flat file; it is also fairly easy and intuitive to set up an ODBC connection. See "Tunnel into MySQL from Microsoft Access" [Hack #44] for an example.
Getting metadata such as schemas out of Access is a little harder. The Visual Basic code shown here will output a basic table definition into the debug window; you can copy and paste from there. The script has several limitations, however. It assumes that each table has only one index, the primary key. Also, it requires that the primary key be a single field. You should be able to customize this if you need to:
Private Sub Command0_Click( ) Dim sql As String For i = 0 To CurrentDb.TableDefs.Count - 1 If Left(CurrentDb.TableDefs(i).Name, 4) <> "MSys" And _ Left(CurrentDb.TableDefs(i).Name, 4) <> "~TMP" Then Debug.Print "CREATE TABLE " & CurrentDb.TableDefs(i).Name & "(" For j = 0 To CurrentDb.TableDefs(i).Fields.Count - 1 Debug.Print " " & CurrentDb.TableDefs(i).Fields(j).Name; Debug.Print " " & _ dataType(CurrentDb.TableDefs(i).Fields(j).Type, _ CurrentDb.TableDefs(i).Fields(j).Size); If CurrentDb.TableDefs(i).Fields(j).Required Then _ Debug.Print " NOT NULL "; Debug.Print " , " Next Debug.Print " PRIMARY KEY(" & _ CurrentDb.TableDefs(i).Indexes(0).Fields(0).Name & ")" Debug.Print ");" Else Debug.Print "--Skipping table: " & CurrentDb.TableDefs(i).Name End If Next i DoCmd.RunCommand acCmdDebugWindow End Sub Private Function dataType(t As Integer, sz As Integer) If t = 10 Then dataType = "VARCHAR(" & sz & ")": Exit Function If t = 20 Then dataType = "DECIMAL(" & sz & ")": Exit Function If t = 3 Then dataType = "INTEGER": Exit Function If t = 7 Then dataType = "FLOAT": Exit Function dataType = "UNKNOWN_DATA_TYPE_" & t & " " End Function
Here is some sample output from the preceding Visual Basic script:
--Skipping table: ~TMPCLP295561 CREATE TABLE bbcRemote( name VARCHAR(50) NOT NULL , region VARCHAR(60) , area DECIMAL(16) , population DECIMAL(16) , gdp DECIMAL(16) , PRIMARY KEY(region) ); --Skipping table: MSysAccessObjects --Skipping table: MSysACEs --Skipping table: MSysObjects --Skipping table: MSysQueries --Skipping table: MSysRelationships CREATE TABLE t( a UNKNOWN_DATA_TYPE_4 NOT NULL , b VARCHAR(50) , PRIMARY KEY(a) ); CREATE TABLE Table1( test VARCHAR(50) , PRIMARY KEY(test) );
As it happens, you can't put this output back into Access because the data types have nonstandard names. You should use LONG in place of DECIMAL(16).
All of the properties of the database are available through this Visual Basic interface. For example, you can access all of the foreign keys through the CurrentDb.Relations collection. You can iterate over that using the same techniques as shown for the CurrentDb.TableDefs and the CurrentDb.TableDefs( i ).Indexes collections.
11.2.6. Potential Showstoppers
Some factors can make the process of switching platforms harder.
11.2.6.1. Auto-numbers
If you've used autonumbering schemes you will have a problem. You can convert the metadata without too much difficultly (see Table 11-1 for details). However, getting the data across will be challenging. You need to suspend the generation of new numbers during import, but then you have to switch it back on again when your new system is up and running.
"Generate Unique Sequential Numbers" [Hack #57] and "Generate Sequential or Missing Data" [Hack #82] may be of use in transporting autogenerated sequences.
System | Column type | Notes |
---|---|---|
SQL Server |
IDENTITY |
|
MySQL | INTEGER AUTO_INCREMENT | |
PostgreSQL |
INTEGER |
Use NEXTVAL('seqName') to calculate the value to insert. |
Oracle |
INTEGER |
Use seqName.NEXTVAL to calculate the value to insert. |
Access |
COUNTER |
11.2.6.2. Spaces in table names and column names
If you have spaces in your table names or field names you will have to use the appropriate quoting mechanism. SQL Server and Access use square brackets; MySQL allows back quotes; and Oracle will let you use double quotes. You could instead convert the names to use _ rather than a space, which will solve the problem for future exports and imports.
11.2.6.3. Nonstandard functions
It is difficult to write complex queries while sticking to the standard, and in many cases it is just not possible. You will probably find that much of your SQL has to be rewritten. Table 11-2 and Table 11-3 show some of the most commonly used functions.
ANSI | PostgreSQL | MySQL |
---|---|---|
COALESCE(x, y) | ANSI | ANSI |
CASE WHEN b THEN t ELSE f END | ANSI | ANSI andIF (b, t, f) |
EXTRACT(MONTH FROM d) | ANSI andDATE_PART('month', d) | ANSI andMONTH( d ) |
whn + INTERVAL '5' DAY | whn + INTERVAL '5 DAY'andwhn + 5 | ANSI andwhn + 5 |
SUBSTRING(s FROM x FOR y) | ANSI | ANSI andSUBSTRING (s, x, y) |
a||b||c |
ANSI |
CONCAT(a,b,c) |
ANSI | Oracle | SQL Server | Access |
---|---|---|---|
COALESCE(x, y) | ANSI andNLV(x, y) | ANSI | Nz(x, y) |
CASE WHEN b THEN tELSE f END | ANSI | ANSI | IIF(b, t, f) |
EXtrACT(MONTH FROM d) | ANSI and MONTH( d ) | DatePart( mm, d) | Month( d ) |
whn + INTERVAL '5' DAY | ANSI andwhn + 5 | DateAdd( d , whn, 5)andwhn + 5 | DateAdd( d, whn, 5)andwhn + 5 |
SUBSTRING(s FROM x FOR y) | SUBSTR(s, x, y) | Substring(s, x, y) | MID(s, x, y) |
a||b||c |
ANSI | a + b + c | a & b & c |