BLOBs
Most database systems provide a data type that can store raw data, and PostgreSQL is no exception. I use the term raw data to mean that the database doesn't understand the structure or meaning of a value. In contrast, PostgreSQL does understand the structure and meaning of other data types. For example, when you define an INTEGER column, PostgreSQL knows that the bytes of data that you place into that column are supposed to represent an integer value. PostgreSQL knows what an integer isit can add integers, multiply them, convert them to and from string form, and so on. Raw data, on the other hand, is just a collection of bitsPostgreSQL can't infer any meaning in the data.
PostgreSQL offers the type BYTEA for storing raw data. A BYTEA column can theoretically hold values of any length, but it appears that the maximum length is 1GB.
The size of a BYTEA value is 4 bytes plus the actual number of bytes in the value.
Syntax for Literal Values
Entering a BYTEA value can be a little tricky. A BYTEA literal is entered as a string literal: It is just a string of characters enclosed within single quotes. Given that, how do you enter a BYTEA value that includes a single quote? If you look back to the discussion of string literal values (earlier in this chapter), you'll see that you can include special characters in a string value by escaping them. In particular, a single quote can by escaped in one of three ways:
- Double up the single quotes ('This is a single quote''')
- Precede the single quote with a backslash ('This is a single quote '')
- Include the octal value of the character instead ('This is a single quote 47')
There are two other characters that you must escape when entering BYTEA literals. A byte whose value is zero (not the character 0, but the null byte) must be escaped, and the backslash character must be escaped. You can escape any character using the "ddd " form (where ddd is an octal number). You can escape any printable character using the "c " form. So, if you want to store a BYTEA value that includes a zero byte, you could enter it like this:
'This is a zero byte \000'
If you want to store a BYTEA value that includes a backslash, you can enter it in either of the following forms:
'This is a backslash \\' 'This is also a backslash \134'
If you compare these rules to the rules for quoting string literals, you'll notice that BYTEA literals require twice as many backslash characters. This is a quirk of the design of the PostgreSQL parser. BYTEA literals are processed by two different parsers. The main PostgreSQL parser sees a BYTEA literal as a string literal (gobbling up the first set of backslash characters). Then, the BYTEA parser processes the result, gobbling up the second set of backslash characters.
So, if you have a BYTEA value such as This is a backslash , you quote it as 'This is a backslash \\'. After the string parser processes this string, it has been turned into 'This is a backslash \'. The BYTEA parser finally transforms this into This is a backslash .
Supported Operators
PostgreSQL offers a single BYTEA operator: concatenation. You can append one BYTEA value to another BYTEA value using the concatenation (||) operator.
Note that you can't compare two BYTEA values, even for equality/inequality. You can, of course, convert a BYTEA value into another value using the CAST() operator, and that opens up other operators.
Large-Objects
The BYTEA data type is currently limited to storing values no larger than 1GB. If you need to store values larger than will fit into a BYTEA column, you can use large-objects. A large-object is a value stored outside of a table. For example, if you want to store a photograph with each row in your tapes table, you would add an OID column to hold a reference to the corresponding large-object:
movies=# ALTER TABLE tapes ADD COLUMN photo_id OID; ALTER
Each value in the photo_id column refers to an entry in the pg_largeobject system table. PostgreSQL provides a function that will load an external file (such as a JPEG file) into the pg_largeobject table:
movies=# INSERT INTO tapes VALUES movies-# ( movies(# 'AA-55892', movies(# 'Casablanca', movies(# '102 min', movies(# lo_import('/tmp/casablanca.jpg' ) movies(# );
The lo_import() function loads the named file into pg_largeobject and returns an OID value that refers to the large-object. Now when you SELECT this row, you see the OID, not the actual bits that make up the photo:
movies=# SELECT * FROM tapes WHERE tape_id = 'AA-55892'; tape_id | title | duration | photo_id ----------+------------+----------+---------- AA-55892 | Casablanca | 01:42:00 | 510699
If you want to write the photo back into a file, you can use the lo_export() function:
movies=# SELECT lo_export( 510699, '/tmp/Casablanca.jpg' ); lo_export ----------- 1 (1 row)
To see all large-objects in the current database, use psql 's lo_list metacommand:
movies=# lo_list Large objects ID | Description --------+------------- 510699 | (1 row)
You can remove large-objects from your database using the lo_unlink() function:
movies=# SELECT lo_unlink( 510699 ); lo_unlink ----------- 1 (1 row) movies=# lo_list Large objects ID | Description ----+------------- (0 rows)
How do you get to the actual bits behind the reference OID ? You can'tat least not with psql. Large-object support must be built into the client application that you are using. psql is a text-oriented tool and has no way to display a photograph, so the best that you can do is to look at the raw data in the pg_largeobject table. A few client applications, such as the Conjectrix Workstation, do support large-objects and can interpret the raw data properly, in most cases.