Creating Tables
Now that you have seen how to connect to a database and issue a simple query, it's time to create some sample data to work with.
Because you are pretending to model a movie-rental business (that is, a video store), you will create tables that model the data that you might need in a video store. Start by creating three tables: tapes, customers, and rentals.
The tapes table is simple: For each videotape, you want to store the name of the movie, the duration, and a unique identifier (remember that you may have more than one copy of any given movie, so the movie name is not sufficient to uniquely identify a specific tape).
Here is the command you should use to create the tapes table:
CREATE TABLE tapes ( tape_id CHARACTER(8) UNIQUE, title CHARACTER VARYING(80), duration INTERVAL );
Let's take a close look at this command.
The verb in this command is CREATE TABLE, and its meaning should be obviousyou want to create a table. Following the CREATE TABLE verb is the name of the table (tapes) and then a comma-separated list of column definitions, enclosed within parentheses.
Each column in a table is defined by a name and a data type. The first column in tapes is named tape_id. Column names (and table names) must begin with a letter or an underscore character[2] and should be 31 characters or fewer[3]. The tape_id column is created with a data type of CHARACTER(8). The data type you define for a column determines the set of values that you can put into that column. For example, if you want a column to hold numeric values, you should use a numeric data type; if you want a column to hold date (or time) values, you should use a date/time data type. tape_id holds alphanumeric values (a mixture of numbers and letters), so I chose a character data type, with a length of eight characters.
[2] You can begin a column or table name with nonalphabetic characters, but you must enclose the name in double quotes. You have to quote the name not only when you create it, but each time you reference it.
[3] You can increase the maximum identifier length beyond 31 characters if you build PostgreSQL from a source distribution. If you do so, you'll have to remember to increase the identifier length each time you upgrade your server, or whenever you migrate to a different server.
The tape_id column is defined as UNIQUE. The word UNIQUE is not a part of the data typethe data type is CHARACTER(8). The keyword 'UNIQUE' specifies a column constraint. A column constraint is a condition that must be met by a column. In this case, each row in the tapes table must have a unique tape_id. PostgreSQL supports a variety of column constraints (and table constraints). I'll cover constraints in Chapter 2.
The title is defined as CHARACTER VARYING(80). The difference between CHARACTER(n) and CHARACTER VARYING(n) is that a CHARACTER(n) column is fixed lengthit will always contain a fixed number of characters (namely, n characters). A CHARACTER VARYING(n) column can contain a maximum of n characters. I'll mention here that CHARACTER(n) can be abbreviated as CHAR(n), and CHARACTER VARYING(n) can be abbreviated as VARCHAR(n). I chose CHAR(8) as the data type for tape_id because I know that a tape_id will always contain exactly eight characters, never more and never less. Movie titles, on the other hand, are not all the same length, so I chose VARCHAR(80) for those columns. A fixed length data type is a good choice when the data that you store is in fact fixed length; and in some cases, fixed length data types can give you a performance boost. A variable length data type saves space (and often gives you better performance) when the data that you are storing is not all the same length and can vary widely.
The duration column is defined as an INTERVALan INTERVAL stores a period of time such as 2 weeks, 1 hour 45 minutes, and so on.
I'll be discussing PostgreSQL data types in detail in Chapter 2. Let's move on to creating the other tables in this example database.
The customers table is used to record information about each customer for the video store.
CREATE TABLE customers ( customer_id INTEGER UNIQUE, customer_name VARCHAR(50), phone CHAR(8), birth_date DATE, balance NUMERIC(7,2) );
Each customer will be assigned a unique customer_id. Notice that customer_id is defined as an INTEGER, whereas the identifier for a tape was defined as a CHAR(8). A tape_id can contain alphabetic characters, but a customer_id is entirely numeric[4].
[4] The decision to define customer_id as an INTEGER was arbitrary. I simply wanted to show a few more data types here.
I've used two other data types here that you may not have seen before: DATE and NUMERIC. A DATE column can hold date values (century, year, month, and day). PostgreSQL offers other date/time data types that can store different date/time components. For example, a TIME column can store time values (hours, minutes, seconds, and microseconds). A TIMESTAMP column gives you both date and time componentscenturies through microseconds.
A NUMERIC column, obviously, holds numeric values. When you create a NUMERIC column, you have to tell PostgreSQL the total number of digits that you want to store and the number of fractional digits (that is, the number of digits to the right of the decimal point). The balance column contains a total of seven digits, with two digits to the right of the decimal point.
Now, let's create the rentals table:
CREATE TABLE rentals ( tape_id CHARACTER(8), customer_id INTEGER, rental_date DATE );
When a customer comes in to rent a tape, you will add a row to the rentals table to record the transaction. There are three pieces of information that you need to record for each rental: the tape_id, the customer_id, and the date that the rental occurred. Notice that each row in the rentals table refers to a customer (customer_id) and a tape (tape_id). In most cases, when one row refers to another row, you want to use the same data type for both columns.