Extending the PostgreSQL Server with Custom Functions
An extension function is loaded into a running PostgreSQL server process as needed. If you don't actually use an extension, it will not be loaded. Extension functions must be created in the form of a dynamically loadable object module. In the Windows world, an extension is contained within a DLL. In the Linux/Unix environment, an extension is contained within a shared object module.
There are two phases to the process of adding an extension function to the PostgreSQL server. First, you create the extension function in the language of your choice, compiling it into a dynamic object module (.dll or .so). Next, tell the PostgreSQL server about the function. The CREATE FUNCTION command adds a new function to a database.
I'll show you two examples that should help clarify this process.
PostgreSQL and Portability
Some of the steps required to write a PostgreSQL extension function in C may seem rather odd at first. You may feel more comfortable with the process if you understand the problem that the PostgreSQL authors were trying to fix.
When you call a function in a typical C program, you know at the time you write your code how to call that function. You know how many arguments are required and you know the data type of each argument. If you provide an incorrect number of parameters or incorrect data types, it is highly likely that your program will crash. For example, the fopen() function (from the C Runtime Library) requires two parameters:
If you omit the mode parameter or send a numeric data type instead of a pointer, your program will fail in some way.
Now, suppose that your program prompts the user for the name of a dynamic object module and the name of a function within that module. After you load the given module into your program, you have to call the named function. If you know which function the user will select, you can formulate your function call properly at the time you write your code. What happens if the user selects some other function that takes a completely different argument list? How can you formulate the function call if you don't know the parameter list? There is no portable way to do that, and PostgreSQL aims to be extremely portable.
So, the PostgreSQL authors decided to change the way you pass arguments to an extension function. Rather than declaring a separate formal parameter for each value passed to the function, PostgreSQL marshals all the arguments into a separate data structure and passes the address of the marshaled form to your extension. When you need to access function parameters, you get to them through the marshaled form.
This is similar in concept to the way the main() function of a C program behaves. You can't know, at the time you write the main() function, how many command-line parameters you will receive. (You might know how many parameters you should receive, but how many you will receive is not quite the same animal.) The startup routine on the C Runtime Library marshals the command-line arguments into a data structure (the argv[] array) and passes you the address of that structure. To find the actual values specified on the command line, you must use the data structure rather than formal parameters.
Older versions of PostgreSQL used a strategy that became less portable as operating systems advanced into the 64-bit arena. The old strategy is known as the "version-0 calling convention." The new strategy is called the "version-1 calling convention." PostgreSQL still supports both calling conventions, but you should stick to the version-1 convention for better portability.
For more information on the difference between the version-0 and version-1 conventions, see section 12 of the PostgreSQL Programmer's Guide.
There are two important consequences to the version-1 convention. First, all version-1 functions return the same data type: a Datum. A Datum is a sort of universal data type. Any PostgreSQL data type can be accessed through a Datum. PostgreSQL provides a set of macros that make it easy to work with Datums. Second, a version-1 function makes use of a set of macros to access function arguments. Every version-1 function is declared in the same way:
Datum function-name(PG_FUNCTION_ARGS);
As you read through the examples in this chapter, keep in mind that the PostgreSQL authors had to solve the portability problem.
The first example adds a simple function, named filesize, to the PostgreSQL server. Given the name of a file, it returns the size of the file (in bytes). If the file does not exist, cannot be examined, or is not a regular[1] file, this function returns NULL. You might find this function (and the filelist() function shown later) useful for performing system administration tasks from within a PostgreSQL application. After you have created the filesize function, you can call it like this:
[1] In this context, a file is considered "regular" if it is not a directory, named pipe, symbolic link, device file, or socket.
We'll develop the filesize function in C (see Listing 6.1).
The filesize function takes a single argumenta pathname in the form of a TEXT value. This function returns the size of the named file as an INTEGER value.