Character Values
There are three character (or, as they are more commonly known, string) data types offered by PostgreSQL. A string value is just thata string of zero or more characters. The three string data types are CHARACTER(n), CHARACTER VARYING(n), and TEXT.
A value of type CHARACTER(n) can hold a fixed-length string of n characters. If you store a value that is shorter than n, the value is padded with spaces to increase the length to exactly n characters. You can abbreviate CHARACTER(n) to CHAR(n). If you omit the "(n) " when you create a CHARACTER column, the length is assumed to be 1.
The CHARACTER VARYING(n) type defines a variable-length string of at most n characters. VARCHAR(n) is a synonym for CHARACTER VARYING(n). If you omit the "(n) " when creating a CHARACTER VARYING column, you can store strings of any length in that column.
The last string type is TEXT. A TEXT column is equivalent to a VARCHAR column without a specified lengtha TEXT column can store strings of any length.
Syntax for Literal Values
A string value is a sequence of characters surrounded by a pair of delimiters. Prior to PostgreSQL version 8.0, you had to use a pair of single quote characters to delimit a string value. Starting with version 8.0, you can also define your own delimiters for each string value using a form known as dollar quoting. Each of the following is a valid string value:
'I am a string' '3.14159265' ''
You can also write these same string values using dollar quoting as follows:
$$I am a string$$ $$3.14159265$$ $$$$
The first example is obviously a string value. '3.14159265' is also a string valueat first glance it may look like a numeric value but the fact it is surrounded by single quotes tells you that it is really a string. The third example ('') is also a valid string: It is the string composed of zero characters (that is, it has a length of zero). It is important to understand that an empty string is not the same as a NULL value. An empty string means that you have a known value that just happens to be empty, whereas NULL implies that the value is unknown. Consider, for example, that you are storing an employee name in your database. You might create three columns to hold the complete name: first_name, middle_name, and last_name. If you find an employee whose middle_name is NULL, that should imply that the employee might have a middle name, but you don't know what it is. On the other hand, if you find an employee who has no middle name, you should store that middle_name as an empty string. Again, NULL implies that you don't have a piece of information; an empty string means that you do have the information, but it just happens to be empty.
If a string is delimited with single quotes, how do you represent a string that happens to include a single quote? There are four choices. First, you can embed a single quote within a string by entering two adjacent quotes. For example, the string "Where's my car?" could be entered as:
'Where''s my car?'
Two other alternatives involve an escape character. An escape is a special character that tells PostgreSQL that the character (or characters) following the escape is to be interpreted as a directive instead of as a literal value. In PostgreSQL, the escape character is the backslash (). When PostgreSQL sees a backslash in a string literal, it discards the backslash and interprets the following characters according to the following rules:
is the backspace character f is the form feed character is the carriage-return character is the newline character is the tab character xxx (where xxx is an octal number) means the character whose ASCII value is xxx.
If any character, other than those mentioned, follows the backslash, it is treated as its literal value. So, if you want to include a single quote in a string, you can escape the quote by preceding it with a backslash:
'Where's my car?'
Or you can embed a single quote (or any character) within a string by escaping its ASCII value (in octal), as in
'Where 47s my car?'
Finally, you can use dollar quoting. To write the string "Where's my car?" in dollar-quoted form, use this format:
$$Where's my car?$$
Notice that in this form, the embedded single quote doesn't cause any problems. When you write a string in dollar-quoted form, the single quote character has no special meaningit's just another character. You may be thinking that dollar quoting just trades one special delimiter (a single quote) for another (two dollar signs). After all, what happens if you want to embed two consecutive dollar signs in a string value? OK, that's not very likely, but PostgreSQL doesn't just ignore the problem; it lets you define your own delimiters.
In its most simple form, a dollar-quote delimiter is just a pair of dollar signs. To define your own delimiter, simply include a tag between the two dollar signs at the beginning of the string and include the same tag between the two dollar signs at the end of the string. You get to choose the tag but be aware that tags are case sensitive. Here's a string written using a custom delimiter:
$MyTag$That restaurant's rated 3 $$$; it must be expensive$MyTag$
When you define your own delimiter, embedded single quotes lose their special meaning and so do consecutive dollar signs. You can define a custom delimiter for each string value that you write, but remember that you don't have to define a custom delimiter unless your string contains consecutive dollar signs.
To summarize, here are the four ways that you can embed a single quote within a string:
'It''s right where you left it' 'It's right where you left it' 'It 47s right where you left it' $$It's right where you left it$$
Supported Operators
PostgreSQL offers a large number of string operators. One of the most basic operations is string concatenation. The concatenation operator (||) is used to combine two string values into a single TEXT value. For example, the expression
'This is ' || 'one string'
will evaluate to the value: 'This is one string'. And the expression
'The current time is ' || now()
will evaluate to a TEXT value such as, 'The current time is 2002-01-01 19:45:17-04'.
PostgreSQL also gives you a variety of ways to compare string values. All comparison operators return a BOOLEAN value; the result will be trUE, FALSE, or NULL. A comparison operator will evaluate to NULL if either of the operands are NULL.
The equality (=) and inequality (<>) operators behave the way you would expecttwo strings are equal if they contain the same characters (in the same positions); otherwise, they are not equal. You can also determine whether one string is greater than or less than another (and of course, greater than or equal to and less than or equal to).
Table 2.1[1] shows a few sample string comparisons.
[1] You might find the format of this table a bit confusing at first. In the first column, I use the 'q' character to represent any one of the operators listed in the remaining columns. So, the first row of the table tells you that 'string' < 'string' evaluates to FALSE, 'string' <= 'string' evaluates to trUE, 'string' = 'string' evaluates to trUE, and so forth. I'll use the 'q' character throughout this chapter to indicate an operator.
Operator (q) |
||||||
---|---|---|---|---|---|---|
Expression |
< |
<= |
= |
<> |
>= |
> |
'string' q 'string' |
FALSE |
trUE |
trUE |
FALSE |
trUE |
FALSE |
'string1' q 'string' |
FALSE |
FALSE |
FALSE |
trUE |
TRUE |
trUE |
'String1' q 'string' |
trUE |
trUE |
FALSE |
trUE |
FALSE |
FALSE |
You can also use pattern-matching operators with string values. PostgreSQL defines eight pattern-matching operators, but the names are a bit contrived and not particularly intuitive.
Table 2.2 contains a summary of the string operators.
Operator |
Meaning |
Case Sensitive? |
---|---|---|
|| |
Concatenation |
Not applicable |
~ |
Matches regular expression |
Yes |
~~ |
Matches LIKE expression |
Yes |
~* |
Matches regular expression |
No |
~~* |
Matches LIKE expression |
No |
!~ |
Does not match regular expression |
Yes |
!~~ |
Does not match LIKE expression |
Yes |
!~* |
Does not match regular expression |
No |
!~~* |
Does not match LIKE expression |
No |
The first set of pattern-matching operators is related to the LIKE keyword. ~~ is equivalent to LIKE. The ~~* operator is equivalent to ILIKE it is a case-insensitive version of LIKE. !~~ and !~~* are equivalent to NOT LIKE and NOT ILIKE, respectively.
The second set of pattern-matching operators is used to match a string value against a regular expression (regular expression s are described in more detail in Chapter 1, "Introduction to PostgreSQL and SQL"). The naming convention for the regular expression operators is similar to that for the LIKE operatorsregular expression operators are indicated with a single tilde and LIKE operators use two tildes. The ~ operator compares a string against a regular expression (returning true if the string satisfies the regular expression). ~* compares a string against a regular expression, ignoring differences in case. The !~ operator returns False if the string value matches the regular expression (and returns true if the string satisfies the regular expression). The !~* operator returns False if the string value matches the regular expression, ignoring differences in case, and returns TRue otherwise.