A Simple C Extension Module

2017-11-03 09:05:09

At least that's the short story; we need to turn to some code to make this more concrete. C types generally export a C module with a constructor function. Because of that (and because they are simpler), let's start off by studying the basics of C module coding with a quick example.

When you add new or existing C components to Python, you need to code an interface (or "glue") logic layer in C that handles cross-language dispatching and data translation. The C source file in Example 19-1 shows how to code one by hand. It implements a simple C extension module named hello for use in Python scripts, with a function named message that simply returns its input string argument with extra text prepended.

Example 19-1. PP2E/Integrate/Extend/Hello/hello.c

    /********************************************************************
     * A simple C extension module for Python, called "hello"; compile
     * this into a ".so" on python path, import and call hello.message;
     ********************************************************************/

    #include <Python.h>
    #include <string.h>

    /* module functions */
    static PyObject *                                 /* returns object */
    message(PyObject *self, PyObject *args)           /* self unused in modules */
    {                                                 /* args from python call */
        char *fromPython, result[64];
        if (! PyArg_Parse(args, "(s)", &fromPython))  /* convert Python -> C */
            return NULL;                              /* null=raise exception */
        else {
            strcpy(result, "Hello, ");                /* build up C string */
            strncat(result, fromPython,               /* add passed Python string, */
                    sizeof(result) - strlen(result) - 1);  /* cut to fit result[] */
            return Py_BuildValue("s", result);        /* convert C -> Python */
        }
    }

    /* registration table */
    static struct PyMethodDef hello_methods[] = {
        {"message", message, 1},       /* method name, C func ptr, always-tuple */
        {NULL, NULL}                   /* end of table marker */
    };

    /* module initializer */
    void inithello()                   /* called on first import */
    {                                  /* name matters if loaded dynamically */
        (void) Py_InitModule("hello", hello_methods);   /* mod name, table ptr */
    }

Ultimately, Python code will call this C file's message function with a string object and get a new string object back. First, though, it has to be somehow linked into the Python interpreter. To use this C file in a Python script, compile it into a dynamically loadable object file (e.g., hello.so on Linux) with a makefile like the one listed in Example 19-2, and drop the resulting object file into a directory listed on your PYTHONPATH module search path setting exactly as though it were a .py or .pyc file.^[2]

^[2] Because Python always searches the current working directory on imports, this chapter's examples will run from the directory you compile them in (".") without any file copies or moves. Being on PYTHONPATH matters more in larger programs and installs.

Example 19-2. PP2E/Integrate/Extend/Hello/makefile.hello

    #############################################################
    # Compile hello.c into a shareable object file on Linux,
    # to be loaded dynamically when first imported by Python.
    # MYPY is the directory where your Python header files live.
    #############################################################

    PY = $(MYPY)

    hello.so: hello.c
    	gcc hello.c -g -I$(PY)/Include -I$(PY) -fpic -shared -o hello.so

    clean:
    	rm -f hello.so core

This is a Linux makefile (other platforms will vary); to use it to build the extension module, simply type make -f makefile.hello at your shell. Be sure to include the path to Python's install directory with -I flags to access Python include (a.k.a. "header") files. When compiled this way, Python automatically loads and links the C module when it is first imported by a Python script.

Finally, to call the C function from a Python program, simply import module hello and call its hello.message function with a string:

    [mark@toy ~/.../PP2E/Integrate/Extend/Hello]$ make -f makefile.hello
    [mark@toy ~/.../PP2E/Integrate/Extend/Hello]$ python
    >>> import hello                     # import a C module
    >>> hello.message('world')           # call a C function
    'Hello, world'
    >>> hello.message('extending')
    'Hello, extending'

And that's it -- you've just called an integrated C module's function from Python. The most important thing to notice here is that the C function looks exactly as if it were coded in Python. Python callers send and receive normal string objects from the call; the Python interpreter handles routing calls to the C function, and the C function itself handles Python/C data conversion chores.

In fact, there is little to distinguish hello as a C extension module at all, apart from its filename. Python code imports the module and fetches its attributes as if it had been written in Python. C extension modules even respond to dir calls as usual, and have the standard module and filename attributes (though the filename doesn't end in a .py or .pyc this time around):

    >>> dir(hello)                       # C module attributes
    ['__doc__', '__file__', '__name__', 'message']
    >>> hello.__name__, hello.__file__
    ('hello', './hello.so')
    >>> hello.message                    # a C function object
    <built-in function message>
    >>> hello                            # a C module object
    <module 'hello' from './hello.so'>

Like any module in Python, you can also access the C extension from a script file. The Python file in Example 19-3, for instance, imports and uses the C extension module.

Example 19-3. PP2E/Integrate/Extend/Hello/hellouse.py

    import hello

    print hello.message('C')
    print hello.message('module ' + hello.__file__)

    for i in range(3):
        print hello.message(str(i))

Run this script as any other -- when the script first imports module hello, Python automatically finds the C module's .so object file in a directory on PYTHONPATH and links it into the process dynamically. All of this script's output represents strings returned from the C function in file hello.c:

    [mark@toy ~/.../PP2E/Integrate/Extend/Hello]$ python hellouse.py
    Hello, C
    Hello, module ./hello.so
    Hello, 0
    Hello, 1
    Hello, 2

.3.1 Compilation and Linking

Now that I've shown you the somewhat longer story, let's fill in the rest of the details. You always must compile and somehow link C extension files like the hello.c example with the Python interpreter to make them accessible to Python scripts, but there is some flexibility on how you go about doing so. For example, the following rule could be used to compile this C file on Linux too:

    hello.so: hello.c
    	gcc hello.c -c -g -fpic -I$(PY)/Include -I$(PY) -o hello.o
    	gcc -shared hello.o -o hello.so
    	rm -f hello.o

To compile the C file into a shareable object file on Solaris, you might instead say something like this:

    hello.so: hello.c
    	cc hello.c -c -KPIC -o hello.o
    	ld -G hello.o -o hello.so
    	rm hello.o

Other platforms differ further still. Because compiler options vary widely, you'll have to consult your C or C++ compiler's documentation or Python's extension manuals for platform- and compiler-specific details. The point is to determine how to compile a C source file into your platform's notion of a shareable or dynamically loaded object file. Once you have, the rest is easy; Python supports dynamic loading of C extensions on all major platforms today.

.3.1.1 Dynamic binding

Technically, what I've been showing you so far is called "dynamic binding," and represents one of two ways to link compiled C extensions with the Python interpreter. Since the alternative, "static binding," is more complex, dynamic binding is almost always the way to go. To bind dynamically, simply:

Compile hello.c into a shareable object file

Put the object file in a directory on Python's module search path

That is, once you've compiled the source code file into a shareable object file, simply copy or move the object file to a directory listed in PYTHONPATH. It will be automatically loaded and linked by the Python interpreter at runtime when the module is first imported anywhere in the Python process (e.g., from the interactive prompt, a standalone or embedded Python program, or a C API call).
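The search described above can be sketched from the Python side. This is only an illustration, and it assumes a modern CPython 3 interpreter rather than the older Python release this chapter's C listings target. The helper name extension_candidates is hypothetical, and the real import system does more than this (it also considers .py files, packages, and other finders).

```python
import importlib.machinery
import os
import sys

# Filename suffixes this interpreter accepts for compiled extension
# modules. The list is platform- and build-specific.
suffixes = importlib.machinery.EXTENSION_SUFFIXES


def extension_candidates(module_name, search_path=None):
    """List the file paths at which a compiled extension named
    module_name would be recognized on the given search path.

    Hypothetical helper, for illustration only.
    """
    if search_path is None:
        search_path = sys.path          # includes PYTHONPATH entries
    candidates = []
    for directory in search_path:
        for suffix in suffixes:
            # An empty path entry means the current directory.
            candidates.append(os.path.join(directory or ".", module_name + suffix))
    return candidates
```

Calling extension_candidates("hello", ["."]) lists the filenames under which a compiled hello module in the current directory would be picked up by the running interpreter.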

Notice that the only non-static name in the hello.c example C file is the initialization function. Python calls this function by name after loading the object file, so its name must be a C global and should generally be of the form "initX", where "X" is both the name of the module in Python import statements and the name passed to Py_InitModule. All other names in C extension files are arbitrary, because they are accessed by C pointer, not by name (more on this later). The name of the C source file is arbitrary too -- at import time, Python cares only about the compiled object file.

.3.1.2 Static binding

Under static binding, extensions are added to the Python interpreter permanently. This is more complex, though, because you must rebuild Python itself, and hence need access to the Python source distribution (an interpreter executable won't do). To link this example statically, add a line like:

hello ~/PP2E/Integrate/Extend/Hello/hello.c

to the Modules/Setup configuration file in the Python source code tree. Alternatively, you can copy your C file to the Modules directory (or add a link to it there with an ln command) and add a line to Setup like hello hello.c.

Then, rebuild Python itself by running a make command at the top level of the Python source tree. Python reconstructs its own makefiles to include the module you added to Setup, such that your code becomes part of the interpreter and its libraries. In fact, there's really no distinction between C extensions written by Python users and services that are a standard part of the language; Python is built with this same interface. The full format of module declaration lines looks like this (but see the Modules/Setup configuration file for more details):

    <module> ... [<sourcefile> ...] [<cpparg> ...] [<library> ...]

Under this scheme, the name of the module's initialization function must match the name used in the Setup file, or you'll get linking errors when you rebuild Python. The name of the source or object file doesn't have to match the module name; the leftmost name is the resulting Python module's name.

.3.1.3 Static versus dynamic binding

Static binding works on any platform and requires no extra makefile to compile extensions. It can be useful if you don't want to ship extensions as separate files, or if you're on a platform without dynamic linking support. Its downsides are that you need to update the Python Setup configuration file and rebuild the Python interpreter itself, so you must therefore have the full source distribution of Python to use static linking at all. Moreover, all statically linked extensions are always added to your interpreter, whether or not they are used by a particular program. This can needlessly increase the memory needed to run all Python programs.

With dynamic binding, you still need Python include files, but can add C extensions even if all you have is a binary Python interpreter executable. Because extensions are separate object files, there is no need to rebuild Python itself or to access the full source distribution. And because object files are only loaded on demand in this mode, it generally makes for smaller executables too -- Python loads into memory only the extensions actually imported by each program run. In other words, if you can use dynamic linking on your platform, you probably should.
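The two binding modes can be observed in a running interpreter. The sketch below assumes a modern CPython 3 build. sys.builtin_module_names is a standard attribute naming the modules compiled into the interpreter, but the helper binding_kind is hypothetical and is only a rough heuristic (it ignores frozen modules and namespace packages, for instance).

```python
import sys

# Names of the modules compiled directly into this interpreter
# (the static binding case). The exact list varies from build to build.
static_names = set(sys.builtin_module_names)


def binding_kind(module):
    """Roughly classify an imported module as 'static', 'python',
    or 'dynamic'. Hypothetical helper, for illustration only."""
    if module.__name__ in static_names:
        return "static"                 # linked into the interpreter itself
    filename = getattr(module, "__file__", None) or ""
    if filename.endswith((".py", ".pyc")):
        return "python"                 # ordinary Python source or bytecode
    return "dynamic"                    # assumed: a separately loaded object file
```

For example, binding_kind(sys) reports "static", since sys is always built into the interpreter. Whether a given standard extension module is static or dynamic depends on how that particular Python was built.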

.3.2 Anatomy of a C Extension Module

Though simple, the hello.c example illustrates the structure common to all C modules. This structure can vary somewhat, but this file consists of fairly typical boilerplate code:

Python header files: The C file first includes the standard Python.h header file (from the installed Python Include directory). This file defines almost every name exported by the Python API to C, and serves as a starting point for exploring the API itself.
Method functions: The file then defines a function to be called from the Python interpreter in response to calls in Python programs. C functions receive two Python objects as input, and send either a Python object back to the interpreter as the result, or a NULL to trigger an exception in the script (more on this later). In C, a PyObject* represents a generic Python object pointer; you can use more specific type names, but don't always have to. C module functions can all be declared C "static" (local to the file), because Python calls them by pointer, not name.
Registration table: Near the end, the file provides an initialized table (array) that maps function names to function pointers (addresses). Names in this table become module attribute names that Python code uses to call the C functions. Pointers in this table are used by the interpreter to dispatch C function calls. In effect, the table "registers" attributes of the module. A NULL entry terminates the table.
Initialization function: Finally, the C file provides an initialization function, which Python calls the first time this module is imported into a Python program. This function calls the API function Py_InitModule to build up the new module's attribute dictionary from the entries in the registration table and create an entry for the C module on the sys.modules table (described in Chapter 12). Once so initialized, calls from Python are routed directly to the C function through the registration table's function pointers.
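The effects of the registration table and initialization function are visible from Python. As a small sketch, the following uses the standard math module as a stand-in for hello, assuming CPython, where math is implemented in C. It illustrates the results of initialization, not how math itself is coded.

```python
import math
import sys

# After the first import, the module object is cached in sys.modules.
cached = sys.modules["math"]
same_object = cached is math

# Names registered by the C module show up as ordinary entries in the
# module's attribute dictionary.
sqrt_func = vars(math)["sqrt"]

# The entry is a built-in (C) function object, callable like any other.
kind = type(sqrt_func).__name__
root = sqrt_func(16.0)

print(same_object, kind, root)
```

Here the names cached, sqrt_func, kind, and root are just illustrative variables; the point is that a C function fetched from the module's dictionary behaves like any other callable.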

.3.3 Data conversions

C module functions are responsible for converting Python objects to and from C datatypes. In Example 19-1, message gets two Python input objects passed from the Python interpreter: args is a Python tuple holding the arguments passed from the Python caller (the values listed in parentheses in a Python program), and self is ignored; it is useful only for extension types (discussed later in this chapter).

After finishing its business, the C function can return any of the following to the Python interpreter: a Python object (known in C as PyObject*), for an actual result; a Python None (known in C as Py_None), if the function returns no real result; or a C NULL pointer, to flag an error and raise a Python exception.

There are distinct API tools for handling input conversions (Python to C) and output conversions (C to Python). It's up to C functions to implement their call signatures (argument lists and types) by using these tools properly.

.3.3.1 Python to C: Using Python argument lists

When the C function is run, the arguments passed from a Python script are available in the args Python tuple object. The API function PyArg_Parse (and PyArg_ParseTuple, its cousin that assumes it is converting a tuple object) is probably the easiest way to extract and convert passed arguments to C form.

PyArg_Parse takes a Python object, a format string, and a variable-length list of C target addresses. It converts the items in the tuple to C datatype values according to the format string, and stores the results in the C variables whose addresses are passed in. The effect is much like C's scanf function. For example, the hello module converts a passed-in Python string argument to a C char* using the s convert code:

PyArg_Parse(args, "(s)", &fromPython) # or PyArg_ParseTuple(args, "s",...

To handle multiple arguments, simply string format codes together and include corresponding C targets for each code in the string. For instance, to convert an argument list holding a string, an integer, and another string to C, say this:

PyArg_Parse(args, "(sis)", &s1, &i, &s2) # or PyArg_ParseTuple(args, "sis",...

To verify that no arguments were passed, use an empty format string like this: PyArg_Parse(args, "( )"). This API call checks that the number and types of the arguments passed from Python match the format string in the call. If there is a mismatch, it sets an exception and returns zero to C (more on errors below).
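From the caller's side, an argument mismatch detected in C surfaces as an ordinary Python exception. The sketch below uses the standard math.sqrt function as a stand-in, assuming CPython, where it is coded in C. Its internal argument parsing is not necessarily PyArg_Parse, but the visible effect of a mismatch is the same. The helper call_and_report is hypothetical.

```python
import math


def call_and_report(func, *args):
    """Call func(*args) and return its result, or the name of the
    exception class if the call fails. Hypothetical helper."""
    try:
        return func(*args)
    except Exception as exc:
        return type(exc).__name__


right = call_and_report(math.sqrt, 9.0)           # matches the C signature
wrong_count = call_and_report(math.sqrt)          # too few arguments
wrong_type = call_and_report(math.sqrt, "nine")   # wrong argument type

print(right, wrong_count, wrong_type)
```

Both mismatched calls fail with a TypeError that was set on the C side; the Python caller never sees a NULL, only the exception.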

.3.3.2 Python to C: Using Python return values

As we'll see in Chapter 20, Embedding Python, API functions may also return Python objects to C as results when Python is being run as an embedded language. Converting Python return values in this mode is almost the same as converting Python arguments passed to C extension functions, except that Python return values are not always tuples. To convert returned Python objects to C form, simply use PyArg_Parse. Unlike PyArg_ParseTuple, this call takes the same kinds of arguments but doesn't expect the Python object to be a tuple.

.3.3.3 C to Python: Returning values to Python

There are two ways to convert C data to Python objects: by using type-specific API functions, or the general object-builder function Py_BuildValue. The latter is more general, and is essentially the inverse of PyArg_Parse, in that Py_BuildValue converts C data to Python objects according to a format string. For instance, to make a Python string object from a C char*, the hello module uses an s convert code:

return Py_BuildValue("s", result) # "result" is a C char []/*

More specific object constructors can be used instead:

return PyString_FromString(result) # same effect

Both calls make a Python string object from a C character array pointer. See the now-standard Python extension and runtime API manuals for an exhaustive list of such calls available. Besides being easier to remember, though, Py_BuildValue has syntax that allows you to build lists in a single step, described next.

.3.3.4 Common conversion codes

With a few exceptions, PyArg_Parse(Tuple) and Py_BuildValue use the same conversion codes in format strings. A list of all supported conversion codes appears in Python's extension manuals. The most commonly used are shown in Table 19-1; the tuple, list, and dictionary formats can be nested.

Table 19-1. Common Python/C Data Conversion Codes
Format-String Code	C Datatype	Python Object Type
`s`	`char*`	String
`s#`	`char*, int`	String, length
`i`	`int`	Integer
`l`	`long int`	Integer
`c`	`char`	String
`f`	`float`	Floating-point
`d`	`double`	Floating-point
`O`	`PyObject*`	Raw (unconverted) object
`O&`	`&converter`, `void*`	Converted object (calls converter)
`(`items`)`	Targets or values	Nested tuple
`[`items`]`	Series of arguments/values	List
`{`items`}`	Series of `key,value` arguments	Dictionary

These codes are mostly what you'd expect (e.g., i maps between a C int and a Python integer object), but here are a few usage notes on this table's entries:

Pass in the address of a char* for s codes when converting to C, not the address of a char array: Python copies out the address of an existing C string (and you must copy it to save it indefinitely on the C side: use strdup).

The O code is useful to pass raw Python objects between languages; once you have a raw object pointer, you can use lower-level API tools to access object attributes by name, index and slice sequences, and so on.

The O& code lets you pass in C converter functions for custom conversions. This comes in handy for special processing to map an object to a C datatype not directly supported by conversion codes (for instance, when mapping to or from an entire C struct or C++ class-instance). See the extensions manual for more details.

The last two entries, [...] and {...}, are currently supported only by Py_BuildValue: you can construct lists and dictionaries with format strings, but can't unpack them. Instead, the API includes type-specific routines for accessing sequence and mapping components given a raw object pointer.

PyArg_Parse supports some extra codes, which must not be nested in tuple formats ((...)):

|: The remaining arguments are all optional (varargs). The C targets are unchanged if arguments are missing in the Python tuple. For instance, si|sd requires two arguments but allows up to four.
:: The function name follows, for use in error messages set by the call (argument mismatches). Normally Python sets the error message to a generic string.
;: A full error message follows, running to the end of the format string.

This format code list isn't exhaustive, and the set of convert codes may expand over time; refer to Python's extension manual for further details.

.3.4 Error Handling

When you write C extensions, you need to be aware that errors can occur on either side of the language fence. The following sections address both possibilities.

.3.4.1 Raising Python exceptions in C

C extension module functions return a C NULL value for the result object to flag an error. When control returns to Python, the NULL result triggers a normal Python exception in the Python code that called the C function. To name an exception, C code can also set the type and extra data of the exceptions it triggers. For instance, the PyErr_SetString API function sets the exception object to a Python object and sets the exception's extra data to a character string:

PyErr_SetString(ErrorObject, message)

We will use this in the next example to be more specific about exceptions raised when C detects an error. C modules may also set a built-in Python exception; for instance, returning NULL after saying this:

PyErr_SetString(PyExc_IndexError, "index out-of-bounds")

raises a standard Python IndexError exception with the message string data. When an error is raised inside a Python API function, both the exception object and its associated "extra data" are automatically set by Python; there is no need to set it again in the calling C function. For instance, when an argument-passing error is detected in the PyArg_Parse function, the hello module just returns NULL to propagate the exception to the enclosing Python layer, instead of setting its own message.
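An exception raised this way in C is indistinguishable, to the Python caller, from one raised by a raise statement. As a brief sketch, the following relies on the built-in list.pop method, which in CPython is coded in C and raises IndexError on an empty list. The helper pop_or_message is hypothetical.

```python
def pop_or_message(items):
    """Pop the last item of a list. On an empty list, return the
    message string (the "extra data") carried by the IndexError.
    Hypothetical helper, for illustration only."""
    try:
        return items.pop()
    except IndexError as exc:           # set in C, caught like any other
        return "error: " + str(exc)


print(pop_or_message([1, 2, 3]))
print(pop_or_message([]))
```

The exception type selects the except clause, and the string set alongside it on the C side is what str(exc) returns.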

.3.4.2 Detecting errors that occur in Python

Python API functions may be called from C extension functions, or from an enclosing C layer when Python is embedded. In either case, C callers simply check the return value to detect errors raised in Python API functions. For pointer result functions, Python returns NULL pointers on errors. For integer result functions, Python generally returns a status code of -1 to flag an error and a 0 or positive value on success. (PyArg_Parse is an exception to this rule: it returns 0 when it detects an error.) To make your programs robust, you should check return codes for error indicators after most Python API calls; some calls can fail for reasons you may not have expected (e.g., memory overflow).

.3.5 Reference Counts

The Python interpreter uses a reference-count scheme to implement garbage collection. Each Python object carries a count of the number of places it is referenced; when that count reaches zero, Python reclaims the object's memory space automatically. Normally, Python manages the reference counts for objects behind the scenes; Python programs simply make and use objects without concern for managing storage space.

When extending or embedding Python, though, integrated C code is responsible for managing the reference counts of the Python objects it uses. How important this becomes depends on how many raw Python objects a C module processes and which Python API functions it calls. In simple programs, reference counts are of minor, if any, concern; the hello module, for instance, makes no reference-count management calls at all.

When the API is used extensively, however, this task can become significant. In later examples, we'll see calls of these forms show up:

Py_INCREF(obj) increments an object's reference count.

Py_DECREF(obj) decrements an object's reference count (reclaim if zero).

Py_XINCREF(obj) is similar to Py_INCREF(obj), but ignores a NULL object pointer.

Py_XDECREF(obj) is similar to Py_DECREF(obj), but ignores a NULL object pointer.

C module functions are expected to return either an object with an incremented reference count, or NULL to signal an error. As a general rule, API functions that create new objects increment their reference counts before returning them to C; unless a new object is to be passed back to Python, the C program that creates it should eventually decrement the object's count. In the extending scenario, things are relatively simple; argument object reference counts need not be decremented, and new result objects are passed back to Python with their reference counts intact.

The upside of reference counts is that Python will never reclaim a Python object held by C as long as C increments the object's reference count (or doesn't decrement the count on an object it owns). Although it requires counter management calls, Python's garbage collector scheme is fairly well-suited to C integration.
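The counts themselves can be watched from Python. This is a CPython-specific sketch that uses the standard sys.getrefcount function. The absolute numbers it reports include a temporary reference held by the call itself, so only the differences between readings are meaningful here.

```python
import sys

obj = object()                      # a fresh object, referenced by the name obj
base = sys.getrefcount(obj)

holder = [obj]                      # the list now holds a second reference
with_extra = sys.getrefcount(obj)

del holder                          # the list is reclaimed, releasing its reference
after_drop = sys.getrefcount(obj)

print(with_extra - base, after_drop - base)
```

Storing the object in a list raises its count by one, much as Py_INCREF would in C, and discarding the list lowers it again, much as Py_DECREF would.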