Reloading the Data

Now that you know how to extract data from Oracle, you need to know how to load it back in again. This is easy if you have generated a file of SQL statements; you execute that file from within SQL*Plus. Loading data is a little tougher, however, if you have a file of comma-delimited or fixed-width data.

One way to load data into Oracle from a flat file is to use a tool called SQL*Loader, which is a generic utility provided by Oracle for the express purpose of loading data into the database from a file. Another approach is to use the new, external table feature introduced in Oracle9 i Database. SQL*Loader is convenient for loading from a relatively small file residing on your client PC, and it's a utility you can use without DBA support, which is a consideration if you don't happen to be the DBA.

An entire book could be written about SQL*Loader and external tables, so it's not possible to cover those topics exhaustively in the remainder of this chapter. What I can do is show you how to use the two methods to reload the employee table from a comma-delimited or fixed-width text file, the same files you learned how to create in this chapter. That should be enough to get you started.

9.5.1 Executing DDL and DML

If you extract data by using SQL*Plus to create a file of INSERT statements, loading the data somewhere else is as simple as creating the necessary table and executing the file. If you created a file of DDL statements, such as the CREATE PUBLIC SYNONYM commands shown earlier in Example 9-7, you only need to execute that file.

You may want to turn ECHO on, and spool the output of any files that you execute, so you can go back and check for errors later.

 

9.5.2 Running SQL*Loader

SQL*Loader is an Oracle utility to load data into a database from operating-system files. It's a general-purpose utility that can be configured to read and load data from various record formats. SQL*Loader is a powerful and versatile utility, and possibly because of that, it can be frustrating to learn. Certainly the manual can be a bit overwhelming the first time you look at it.

If you're loading data from a file residing on a client PC, SQL*Loader is the way to go. The other option, that of using an external table, requires that your datafile reside on the database server. SQL*Loader is a good option for one-off, ad hoc loads, even from files that are on the server. External table loads require a bit of upfront setup, which can be a bother to do for a load you want to run only one time.

9.5.2.1 The control file

To load data from a flat file into a database, you need to provide several types of information to SQL*Loader. First of all, SQL*Loader needs to know what database to connect to, how to connect to it, and what table to load. Then SQL*Loader needs to know the format of the input file. It needs to know where the fields are, how long they are, and how they are represented. If, for example, your input file has date fields, SQL*Loader needs to know whether they are in MM/DD/YYYY format, MM/DD/YY format, or some other format.

The database connection and login information are usually passed to SQL*Loader as command-line arguments. The remaining information, describing the input file, needs to be placed in a text file called the control file . When you run SQL*Loader, you tell it where the control file is. Then SQL*Loader reads the control file and uses that information to interpret the data in the flat file you are loading. Figure 9-3 illustrates this, and shows the information flow into and out of SQL*Loader.

Figure 9-3. SQL*Loader and the control file

In addition to describing the input file, the control file can be used to tell SQL*Loader what to do with badly formatted data records, and it can be used to specify conditions limiting the data that are loaded. You can read more about SQL*Loader in the Oracle Database Utilities manual.

9.5.2.2 Building a control file for comma-delimited data

Example 9-8 produced comma-delimited output that looks like this:

"ID","Billing Rate","Hire Date","Name" 101,169,11/15/1961,"Marusia Churai" 105,121,06/15/2004,"Mykola Leontovych" 107,45,01/02/2004,"Lesia Ukrainka" 111,100,08/23/1976,"Taras Shevchenko" 114,,07/05/2004,"Marusia Bohuslavka" 116,,07/05/2004,"Roxolana Lisovsky"

To load this same data back into the employee table or into another copy of the employee table, you need the control file shown in Example 9-11.

Example 9-11. Control file to load comma-delimited data produced by Example 9-8

OPTIONS (SKIP=1) LOAD DATA INFILE 'ex9-11.csv' APPEND INTO TABLE employee_copy ( employee_id INTEGER EXTERNAL TERMINATED BY ',', employee_billing_rate DECIMAL EXTERNAL TERMINATED BY ',' employee_hire_date DATE "MM/DD/YYYY" TERMINATED BY ',', employee_name CHAR TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"', )

You can think of the above as one long SQL*Loader command. The keywords LOAD DATA tell SQL*Loader to load data, and the rest of the command tells SQL*Loader where to get the data and how the data are formatted. The clauses are interpreted as described below:

OPTIONS (SKIP=1)

Causes SQL*Loader to skip over the first line containing column headings. The OPTIONS command embeds command-line options in the control file. The SKIP option specifies the number of input records to skip.

INFILE current_employees.csv

Tells SQL*Loader to read data from the file named current_employees.csv in the current working directory.

APPEND INTO TABLE employee_copy

Tells SQL*Loader to insert the data into the employee_copy table owned by the current user . SQL*Loader queries Oracle's data dictionary tables for the columns and datatypes used in this table. The keyword APPEND specifies that SQL*Loader should load the data even if the target table is not empty, and that SQL*Loader should preserve rows already in the target table.

Before running the load in Example 9-11, create the target table by executing the following SQL statement:

CREATE TABLE employee_copy AS SELECT * FROM employee WHERE 1=2;  

No rows, of course, will satisfy the WHERE clause. The result will be an empty table having the same column structure as the original.

 

( . . . column_specifications . . . )

Is a comma-delimited list of column specifications. Each column specification consists of the column name, followed by the representation (in the flat file) of the column, followed by the delimiter information.

The column names in the control file must correspond to the column names used in the database table you are loading, and they control the destination of each data element. For a delimited file, the order in which the column specifications appear in the control file must match the field order in the record.

The four elements of the column specifications used in this example are described in the following list. Table 9-1 describes the datatypes that are used.

column_name

Must be a column name in the destination table.

datatype

A SQL*Loader datatype. (See Table 9-1.)

TERMINATED BY ', '

Tells SQL*Loader that a comma marks the end of the value for the data element.

OPTIONALLY ENCLOSED BY '" '

Tells SQL*Loader that the data element may optionally be enclosed in quotes. If quotes are present, they are stripped off before the value is loaded.

Table 9-1. SQL*Loader data elements

Datatype

Description

CHAR

Used for character data.

DATE " format_string "

The data is a date, and the date is in the format specified by the format string. See Appendix B for information on writing such format strings.

DECIMAL EXTERNAL

Similar to INTEGER EXTERNAL, except that the number may contain a decimal point. This type is used for the employee_billing_rate field because the billing rate is a dollar and cent value.

FLOAT EXTERNAL

Enables loading of floating-point data, including that formatted using the E notation, as in 1.2345E+4.

INTEGER EXTERNAL

The data is numeric integer data stored as a character string. The character string must consist of the digits 0 through 9. Leading or trailing spaces are OK. Leading positive or negative signs (+ or -) are also OK.

SQL*Loader has its own set of datatypes, and they aren't the same as the ones used by the database. The most common datatypes used for loading data from text files are the numeric EXTERNAL types, CHAR, and DATE. These are described in Table 9-1.

9.5.2.3 Building a control file for fixed-width data

The control file used to load fixed-width employee data is similar to that used for delimited data. The only difference is that instead of specifying a delimiter for each field, you specify the starting and ending columns. Earlier in this chapter, in Example 9-3, you saw how to create a fixed-width file of employee data that looked like this:

101 169.00 11/15/1961 Marusia Churai 105 121.00 06/15/2004 Mykola Leontovych 107 045.00 01/02/2004 Lesia Ukrainka 111 100.00 08/23/1976 Taras Shevchenko 114 07/05/2004 Marusia Bohuslavka 116 07/05/2004 Roxolana Lisovsky

Example 9-12 specifies a control file that will load this fixed-width data into the employee_copy table.

Example 9-12. Control file to load fixed-width data generated by Example 9-3

LOAD DATA INFILE 'ex9-12.dat' APPEND INTO TABLE employee_copy ( employee_id POSITION (2:4) INTEGER EXTERNAL, employee_billing_rate POSITION (7:12) DECIMAL EXTERNAL NULLIF employee_billing_rate=BLANKS employee_hire_date POSITION (14:23) DATE "MM/DD/YYYY", employee_name POSITION (25:44) CHAR )

Each column in this control file contains a position specification that tells SQL*Loader where each field begins and ends. For some reason I have never been able to fathom, the position specification must precede the datatype, whereas a delimiter specification must follow the datatype. The position specification takes the following form:

POSITION ( starting_column : ending_column )

The starting and ending column numbers tell SQL*Loader where in the record to find the data, and the first character of a record is considered position 1. Unlike the case with delimited files, you do not have to list the column specifications for a fixed-width datafile in any particular order.

The employee_billing_rate column in this control file contains an extra element, a NULLIF clause. The NULLIF clause (the way it is written in the example) tells SQL*Loader to set the employee_billing_rate column to null when the input data record contains spaces instead of a rate. This clause isn't necessary in Example 9-11, which loads the comma-delimited file, because a null rate in that file is represented by an empty string between two adjacent commas. In the case of this fixed-width data, a null rate is represented as a string of spaces, or blanks. Hence, the use of NULLIF to specify that an all-blank field be treated as a null.

9.5.2.4 Loading the data

Once you have the control file written, you can invoke SQL*Loader to load the data into the database. You can pass the following three items as command-line parameters:

The last item, the log file name, is optional. If you include a log file name, SQL*Loader will generate a log of its activity and write it to that file. Among other things, any bad data records will result in log entries being made. At the end of the log file, SQL*Loader will print a summary showing how many records were loaded successfully and how many were rejected because of data errors. You won't get this information without a log file, so it's a good idea to generate one.

SQL*Loader is implemented as a command-line utility. From Oracle8 i onward, the command to run SQL*Loader is sqlldr . In a Windows environment, and prior to Oracle8 i Database, the command has the Oracle version number appended to it. If you have Oracle8 installed on Windows, the command is sqlldr80 .

Example 9-13 shows a run of SQL*Loader using the control file from Example 9-11. The comma-delimited data from ex9-11.csv is loaded into employee_copy , as per the control file's specifications. A log of the load operation is written to ex9-11.log , and Example 9-14 shows the contents of that log file. To load fixed-width data instead, substitute ex9-12.ctl and ex9-12.log for the control and log file names respectively.

Example 9-13. SQL*Loader being used to load comma-delimited data

oracle@gennick02:~/sqlplus> sqlldr gennick/secret control=ex9-11.ctl log=ex9-11.log SQL*Loader: Release 10.1.0.2.0 - Production on Wed Jul 7 21:16:21 2004 Copyright (c) 1982, 2004, Oracle. All rights reserved. Commit point reached - logical record count 6

 

Example 9-14. The log from Example 9-13s load

SQL*Loader: Release 10.1.0.2.0 - Production on Wed Jul 7 21:16:21 2004 Copyright (c) 1982, 2004, Oracle. All rights reserved. Control File: ex9-11.ctl Data File: ex9-11.csv Bad File: ex9-11.bad Discard File: none specified (Allow all discards) Number to load: ALL Number to skip: 1 Errors allowed: 50 Bind array: 64 rows, maximum of 256000 bytes Continuation: none specified Path used: Conventional Table EMPLOYEE_COPY, loaded from every logical record. Insert option in effect for this table: APPEND Column Name Position Len Term Encl Datatype ------------------------------ ---------- ----- ---- ---- --------------------- EMPLOYEE_ID FIRST * , CHARACTER EMPLOYEE_BILLING_RATE NEXT * , CHARACTER EMPLOYEE_HIRE_DATE NEXT * , DATE MM/DD/YYYY EMPLOYEE_NAME NEXT * , O(") CHARACTER Table EMPLOYEE_COPY: 6 Rows successfully loaded. 0 Rows not loaded due to data errors. 0 Rows not loaded because all WHEN clauses were failed. 0 Rows not loaded because all fields were null. Space allocated for bind array: 66048 bytes(64 rows) Read buffer bytes: 1048576 Total logical records skipped: 1 Total logical records read: 6 Total logical records rejected: 0 Total logical records discarded: 0 Run began on Wed Jul 07 21:16:21 2004 Run ended on Wed Jul 07 21:16:22 2004 Elapsed time was: 00:00:00.14 CPU time was: 00:00:00.05

The most important part of the log file to look at is the summary near the bottom, where SQL*Loader tells you how many rows were successfully loaded. In this case, one row was skipped, and six were successfully loaded. The skipped row was the first row with the column headings. If any records were rejected because of bad data, there would be an entry for each in the log file telling you which record was rejected and why.

There is a lot more to SQL*Loader than what you have seen in this chapter. Here are some of the other things you can do with SQL*Loader:

Bad data detection

You can specify a bad file , which is where SQL*Loader places records that are rejected because of bad data. After a load, you can review the bad file, fix the records, and attempt to load them again.

Record restriction

You can use a WHEN clause to place a restriction on the records to be loaded. Only those records that match the criteria in the WHEN clause will be loaded. Other records are ignored or may optionally be placed in a discard file .

Data manipulation

You can build expressions, using any of Oracle's built-in SQL functions, to manipulate the data in the input file before they are loaded into Oracle.

To learn more about SQL*Loader, consult the Oracle Database Utilities manual, which presents several case studies showing you how to use SQL*Loader's various features. You might also look at the book SQL*Loader: The Definitive Guide (O'Reilly), which is a joint effort between Sanjay Mishra and myself .

9.5.3 Using an External Table

Oracle9 i Database introduced a powerful new mechanism for loading data from operating system files. Using a new type of table known as the external table , you can retrieve data by issuing a SELECT statement; you can bring the full power of SQL to bear on filtering and transforming the data you are loading. A load then, becomes as simple and as powerful as issuing a CREATE TABLE . . . AS SELECT FROM or an INSERT INTO . . . SELECT FROM statement.

Learn more about external tables by reading the online Oracle Magazine article "Load Up with the Latest," which you can find at http://www.oracle.com/oramag/oracle/01-sep/index.html?o51o9i.html.

 

9.5.3.1 Creating a directory

The external table mechanism requires that the file to be loaded reside on the database server because it's the database instance rather than a client utility that reads the file. The file must reside in a directory accessible via an Oracle directory object. Your DBA (or you if you are the DBA) can create such a directory object as follows :

CREATE DIRECTORY loads AS '/home/oracle/sqlplus/ExampleScripts';

The command in this example creates an Oracle directory object named loads that points to the operating system directory /home/oracle/sqlplus/ExampleScripts . To issue the CREATE DIRECTORY statement, you must hold the CREATE ANY DIRECTORY system privilege. If you aren't the DBA, you'll need to get together with your DBA, to decide on an operating-system directory to use for loads, and then your DBA can issue the necessary CREATE DIRECTORY statement. The Oracle software owner will need read access to that directory and to any files in it that you wish to load via the external table mechanism. If you plan to write log files to that directory, the Oracle software will also need write access.

You need create a directory object only once. Once it has been created, you can use that directory repeatedly for any loads that you do. Avoid creating a new directory for each new load.

 

9.5.3.2 Creating an external table

Your next step is to create an external table. The statement to create such a table looks like a cross between a traditional CREATE TABLE statement and a SQL*Loader control file. Example 9-15 creates an external table to read the same comma-delimited data as loaded earlier by Example 9-11. Likewise, Example 9-16 creates an external table to access the fixed-width data loaded by Example 9-12.

Example 9-15. An external table to read comma-delimited data generated by Example 9-11

CREATE TABLE employee_comma ( employee_id NUMBER, employee_billing_rate NUMBER(5,2), employee_hire_date DATE, employee_name VARCHAR2(40) ) ORGANIZATION EXTERNAL ( TYPE oracle_loader DEFAULT DIRECTORY loads ACCESS PARAMETERS ( RECORDS DELIMITED BY NEWLINE CHARACTERSET US7ASCII BADFILE 'employee_comma.bad' LOGFILE 'employee_comma.log' FIELDS ( employee_id CHAR(255) TERMINATED BY ",", employee_billing_rate CHAR(255) TERMINATED BY ",", employee_hire_date CHAR(255) TERMINATED BY "," DATE_FORMAT DATE MASK "MM/DD/YYYY", employee_name CHAR(255) TERMINATED BY "," OPTIONALLY ENCLOSED BY '"' ) ) LOCATION ('ex9-11.csv') ) REJECT LIMIT UNLIMITED;

 

Example 9-16. An external table to read fixed-width data generated by Example 9-12

CREATE TABLE employee_fixed ( employee_id NUMBER, employee_billing_rate NUMBER(5,2), employee_hire_date DATE, employee_name VARCHAR2(40) ) ORGANIZATION external ( TYPE oracle_loader DEFAULT DIRECTORY loads ACCESS PARAMETERS ( RECORDS DELIMITED BY NEWLINE CHARACTERSET US7ASCII BADFILE 'employee_fixed.bad' LOGFILE 'employee_fixed.log' FIELDS ( employee_id (2:4) CHAR(3), employee_billing_rate (7:12) CHAR(6) NULLIF (employee_billing_rate = BLANKS), employee_hire_date (14:23) CHAR(10) DATE_FORMAT DATE MASK "MM/DD/YYYY", employee_name (25:44) CHAR(20) ) ) LOCATION ('ex9-12.dat') ) REJECT LIMIT UNLIMITED;

Several things are worth noticing about Example 9-15. The ORGANIZATION EXTERNAL clause identifies the table as one that is mapped to an external datafile. TYPE oracle_loader specifies the SQL*Loader-like access driver. The DEFAULT DIRECTORY clause identifies loads as the Oracle directory object pointing to the operating system directory in which the datafile resides, and in which the log file will be created.

The definitions in the FIELDS clause are similar to those used for SQL*Loader in Example 9-11, but they are not quite the same. All the field types are CHAR, because all the values in the input file are in character form. Unlike the case with SQL*Loader, when using the external table oracle_loader access driver, you do need to specify a length, even for delimited columns. Example 9-15 specifies a length of 255 characters. This is a maximum length; no values in the external file even come close to approaching that many characters .

The oracle_loader access driver does support types such as INTEGER EXTERNAL and DECIMAL EXTERNAL. INTEGER EXTERNAL seems to work like its SQL*Loader namesake, but DECIMAL EXTERNAL does not. The oracle_loader driver does not support DATE as a datatype. It's easiest to use CHAR across the board.

Example 9-15 includes a BADFILE clause, specifying a file to which records that can't be loaded will be written. The file being loaded includes one such record, the one at the beginning with the column headings. There is no provision for skipping records during an external table load. However, the column headings record will always fail to load because it won't contain a valid number for the employee_id column.

Each time you SELECT data from the employee_comma table, a log of the load will be generated and appended to the log file. Thus, the employee_comma.log file will grow in size over time, and you'll need to delete it periodically. Oddly, the bad file is overwritten in each new load and, thus, does not grow in size.

One final thing to notice about Example 9-15 is the optional REJECT LIMIT UNLIMITED clause at the end of the statement. That clause allows for an unlimited number of bad records in the external datafile. The default is to abort an external table load in the event that any bad record is encountered .

Example 9-16 is similar to Example 9-15. The only differences are that each field definition specifies an explicit beginning and ending character position, and the field lengths are calculated to correspond to those character positions . employee_id , for example, runs from position 2 through 4 , for a total of three characters.

Migrating to External Tables

If you happen to have a SQL*Loader control file for a particular load, you can migrate that load to the external table mechanism by having SQL*Loader generate all the necessary SQL statements for you. For example, to migrate the load in Example 9-11 to external tables, issue this command:

sqlldr gennick/secret control=ex9-11.ctl log=ex9-11.log external_table=generate_only  

The external_table=generate_only option causes SQL*Loader not to do the load; instead, it tells it to write the following SQL statements into the log file:

  • A CREATE DIRECTORY statement, unless an existing directory can be found that refers to the directory containing your target datafile
  • A CREATE TABLE statement with a list of field definitions corresponding to those in your SQL*Loader control file
  • An INSERT statement that you can use to load the data from the external datafile, through the external table mechanism, and into your target database table

You may want to tweak SQL*Loader's CREATE TABLE statement to specify your own name for the external table rather than the one SQL*Loader generated for you. Not all SQL*Loader features translate directly to the external table mechanism. For example, there seems to be no external table equivalent of the SKIP parameter. If you run into a feature that doesn't translate, you may need to do some tweaking somewhere to make your load work, but you've still saved yourself plenty of drudgery by having SQL*Loader do so much of the work.

 

9.5.3.3 Loading the data

Once you have an external table in place, accessing the data from the external datafile to which the external table points is simply a matter of issuing a SELECT statement, as in Example 9-17.

Example 9-17. SELECTing from an external table

SQL> SELECT * FROM employee_comma; EMPLOYEE_ID EMPLOYEE_BILLING_RATE EMPLOYEE_ EMPLOYEE_NAME ----------- --------------------- --------- -------------------- 101 169 15-NOV-61 Marusia Churai 105 121 15-JUN-04 Mykola Leontovych 107 45 02-JAN-04 Lesia Ukrainka 111 100 23-AUG-76 Taras Shevchenko 114 05-JUL-04 Marusia Bohuslavka 116 05-JUL-04 Roxolana Lisovsky 6 rows selected.

 

You can just imagine the possibilities here. You can bring the full power of Oracle SQL, and of Oracle PL/SQL (via stored functions), to bear on transforming the data that you are loading. Example 9-18 demonstrates a load, giving you just a taste of the power that external tables place in your hands.

Example 9-18. Loading from an external table

SQL> COLUMN employee_name FORMAT A20 SQL> SQL> DELETE FROM employee_copy; 4 rows deleted. SQL> SQL> SELECT * FROM employee_copy; no rows selected SQL> SQL> INSERT /*+ APPEND */ INTO employee_copy ecc 2 (employee_id, employee_billing_rate, employee_hire_date, employee_name) 3 SELECT employee_id, 4 employee_billing_rate, 5 employee_hire_date, 6 UPPER(employee_name) 7 FROM employee_comma ec 8 WHERE EXISTS (SELECT * FROM project_hours ph 9 WHERE ph.employee_id = ec.employee_id); 4 rows created. SQL> SELECT * FROM employee_copy; SELECT * FROM employee_copy * ERROR at line 1: ORA-12838: cannot read/modify an object after modifying it in parallel SQL> COMMIT; Commit complete. SQL> SELECT * FROM employee_copy; EMPLOYEE_ID EMPLOYEE_NAME EMPLOYEE_ EMPLOYEE_ EMPLOYEE_BILLING_RATE ----------- -------------------- --------- --------- --------------------- 101 MARUSIA CHURAI 15-NOV-61 169 105 MYKOLA LEONTOVYCH 15-JUN-04 121 107 LESIA UKRAINKA 02-JAN-04 45 111 TARAS SHEVCHENKO 23-AUG-76 100

 

The INSERT statement in Example 9-18 performs the actual load. It uses the APPEND hint to cause the database to insert the loaded data, via the so-called direct path , into new blocks above the table's current highwater mark. Using APPEND is optional, but you'll get your best performance on large loads by using it. The APPEND hint is the reason why the first SELECT failed. You have to COMMIT before you can access a table that you've modified via the direct path.

The INSERT...SELECT FROM statement transforms the load in two ways. It applies the built-in UPPER function to the external table's employee_name column to convert all employee names to uppercase. It incorporates an EXISTS predicate that restricts the load to only those employees who have logged time to a project. Uppercasing a name is something you can do easily using SQL*Loader. Filtering records to be loaded based on other data in the database is something you can't do at all using SQL*Loader.

The SELECT statement at the end of the example shows the results of the load. Compare this to the output in Example 9-17, and you will see that two employees were omitted because they haven't charged any time to a project. All the names are uppercase.

External tables give you a great deal more power and flexibility when you are filtering and transforming data to be loaded. Moreover, you may also be able to use such tables to load data faster and more efficiently than with SQL*Loader. One reason for this is because the external table mechanism access driver can parallelize a load with practically no effort on your part. (Read the Oracle Database Utilities manual to learn how.) Another reason is that external tables may eliminate the need for temporary staging tables that consume disk space and require CPU and I/O to create. I still find SQL*Loader convenient for one-off, ad hoc loads, but for any load that you perform on a regular basis you should first think of using an external table.

     

Категории

© amp.flylib.com,