A Simple C Extension Module
At least that's the short story; we need to turn to some code to make this more concrete. C types generally export a C module with a constructor function. Because of that (and because they are simpler), let's start off by studying the basics of C module coding with a quick example.
When you add new or existing C components to Python, you need to code an interface (or "glue") logic layer in C that handles cross-language dispatching and data translation. The C source file in Example 19-1 shows how to code one by hand. It implements a simple C extension module named hello for use in Python scripts, with a function named message that simply returns its input string argument with extra text prepended.
Example 19-1. PP2E\Integrate\Extend\Hello\hello.c
/********************************************************************
 * A simple C extension module for Python, called "hello"; compile
 * this into a ".so" on python path, import and call hello.message;
 ********************************************************************/
#include <Python.h>
#include
Ultimately, Python code will call this C file's message function with a string object and get a new string object back. First, though, it has to be somehow linked into the Python interpreter. To use this C file in a Python script, compile it into a dynamically loadable object file (e.g., hello.so on Linux) with a makefile like the one listed in Example 19-2, and drop the resulting object file into a directory listed on your PYTHONPATH module search path setting, exactly as though it were a .py or .pyc file.[2]
[2] Because Python always searches the current working directory on imports, this chapter's examples will run from the directory you compile them in (".") without any file copies or moves. Being on PYTHONPATH matters more in larger programs and installs.
Example 19-2. PP2E\Integrate\Extend\Hello\makefile.hello
#############################################################
# Compile hello.c into a shareable object file on Linux,
# to be loaded dynamically when first imported by Python.
# MYPY is the directory where your Python header files live.
#############################################################
PY = $(MYPY)

hello.so: hello.c
	gcc hello.c -g -I$(PY)/Include -I$(PY) -fpic -shared -o hello.so

clean:
	rm -f hello.so core
This is a Linux makefile (other platforms will vary); to use it to build the extension module, simply type make -f makefile.hello at your shell. Be sure to include the path to Python's install directory with -I flags to access Python include (a.k.a. "header") files. When compiled this way, Python automatically loads and links the C module when it is first imported by a Python script.
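If you are unsure which directory to pass with -I, a running interpreter can report where its own header files live. The sketch below is only an illustration: it assumes a modern Python 3 interpreter and its standard sysconfig module (not the Python 2 release used in this chapter), and the helper name build_command is hypothetical. It assembles a gcc command line that mirrors the makefile rule, without actually running the compiler.

```python
import sysconfig


def build_command(source="hello.c", output="hello.so"):
    """Assemble a gcc command like the makefile rule (illustrative only)."""
    include_dir = sysconfig.get_paths()["include"]  # directory holding Python.h
    return ["gcc", source, "-g", "-I" + include_dir,
            "-fpic", "-shared", "-o", output]


if __name__ == "__main__":
    print(" ".join(build_command()))
```

The flags are the Linux ones from the makefile; other platforms and compilers will need different options.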
Finally, to call the C function from a Python program, simply import
module hello and call its
hello.message function with a string:
[mark@toy ~/.../PP2E/Integrate/Extend/Hello]$ make -f makefile.hello
[mark@toy ~/.../PP2E/Integrate/Extend/Hello]$ python
>>> import hello # import a C module
>>> hello.message('world') # call a C function
'Hello, world'
>>> hello.message('extending')
'Hello, extending'
And that's it: you've just called an integrated C module's function from Python. The most important thing to notice here is that the C function looks exactly as if it were coded in Python. Python callers send and receive normal string objects from the call; the Python interpreter handles routing calls to the C function, and the C function itself handles Python/C data conversion chores.
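To make the "looks exactly as if it were coded in Python" point concrete, here is a pure-Python stand-in that behaves the same way from a caller's point of view. It is a sketch written for a modern Python 3 interpreter, not code from the hello example itself, and the wording of its error message is invented for illustration.

```python
def message(text):
    """Pure-Python stand-in for the C function hello.message."""
    if not isinstance(text, str):
        # The C version's "s" conversion also rejects non-string arguments.
        raise TypeError("message() argument must be a string")
    return "Hello, " + text


if __name__ == "__main__":
    print(message("world"))
```

A script that imports a module defining this function could not tell, from the call alone, whether message was written in C or in Python.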
In fact, there is little to distinguish hello as a C extension module at all, apart from its filename. Python code imports the module and fetches its attributes as if it had been written in Python. C extension modules even respond to dir calls as usual, and have the standard module and filename attributes (though the filename doesn't end in a .py or .pyc this time around):
>>> dir(hello) # C module attributes
['__doc__', '__file__', '__name__', 'message']
>>> hello.__name__, hello.__file__
('hello', './hello.so')
>>> hello.message # a C function object
<built-in function message>
>>> hello # a C module object
<module 'hello' from './hello.so'>
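You can see the same thing without compiling anything by poking at a module that ships with Python. The sketch below assumes the standard CPython interpreter (Python 3), where the math module is implemented in C; the variable names are just for illustration.

```python
import math
import types

# A C-coded module is an ordinary module object to Python code.
is_module = isinstance(math, types.ModuleType)
attribute_names = dir(math)            # lists its C functions like any module
is_c_function = isinstance(math.sqrt, types.BuiltinFunctionType)

if __name__ == "__main__":
    print(math.__name__, is_module, "sqrt" in attribute_names, is_c_function)
```

The only visible hint that sqrt is not a Python function is its type, which is the same "built-in function" type shown for hello.message.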
Like any module in Python, you can also access the C extension from a
script file. The Python file in Example 19-3, for
instance, imports and uses the C extension module.
Example 19-3. PP2E\Integrate\Extend\Hello\hellouse.py
import hello
print hello.message('C')
print hello.message('module ' + hello.__file__)
for i in range(3):
    print hello.message(str(i))
Run this script as any other: when the script first imports module hello, Python automatically finds the C module's .so object file in a directory on PYTHONPATH and links it into the process dynamically. All of this script's output represents strings returned from the C function in file hello.c:
[mark@toy ~/.../PP2E/Integrate/Extend/Hello]$ python hellouse.py
Hello, C
Hello, module ./hello.so
Hello, 0
Hello, 1
Hello, 2
Compilation and Linking
Now that I've shown you the somewhat longer story, let's fill in the rest of the details. You always must compile and somehow link C extension files like the hello.c example with the Python interpreter to make them accessible to Python scripts, but there is some flexibility in how you go about doing so. For example, the following rule could also be used to compile this C file on Linux:
hello.so: hello.c
	gcc hello.c -c -g -fpic -I$(PY)/Include -I$(PY) -o hello.o
	gcc -shared hello.o -o hello.so
	rm -f hello.o
To compile the C file into a shareable object file on Solaris, you
might instead say something like this:
hello.so: hello.c
	cc hello.c -c -KPIC -o hello.o
	ld -G hello.o -o hello.so
	rm hello.o
On other platforms, the details differ further. Because compiler options vary widely, you'll have to consult your C or C++ compiler's documentation or Python's extension manuals for platform- and compiler-specific details. The point is to determine how to compile a C source file into your platform's notion of a shareable or dynamically loaded object file. Once you have, the rest is easy; Python supports dynamic loading of C extensions on all major platforms today.
Dynamic binding
Technically, what I've been showing you so far is called "dynamic binding," and represents one of two ways to link compiled C extensions with the Python interpreter. Since the alternative, "static binding," is more complex, dynamic binding is almost always the way to go. To bind dynamically, simply:
1. Compile hello.c into a shareable object file
2. Put the object file in a directory on Python's module search path
That is, once you've compiled the source code file into a shareable object file, simply copy or move the object file to a directory listed in PYTHONPATH. It will be automatically loaded and linked by the Python interpreter at runtime when the module is first imported anywhere in the Python process (e.g., from the interactive prompt, a standalone or embedded Python program, or a C API call).
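The search for a shareable object file can be pictured with a few lines of Python. This is a simplified model, not the interpreter's real import machinery: it assumes Python 3, whose importlib.machinery.EXTENSION_SUFFIXES lists the filename endings the running interpreter accepts for C extensions, and the helper name find_extension is hypothetical.

```python
import os
import tempfile
from importlib.machinery import EXTENSION_SUFFIXES


def find_extension(module_name, search_path):
    """Return the first file on search_path that could be a C extension
    named module_name, or None (a simplified model of the import search)."""
    for directory in search_path:
        for suffix in EXTENSION_SUFFIXES:      # platform-specific endings
            candidate = os.path.join(directory, module_name + suffix)
            if os.path.isfile(candidate):
                return candidate
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        # An empty placeholder file; a real build would put hello.so here.
        placeholder = os.path.join(folder, "hello" + EXTENSION_SUFFIXES[-1])
        open(placeholder, "w").close()
        print(find_extension("hello", [folder]))
```

Passing sys.path as the search path would check the same directories an import statement consults, in the same order.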
Notice that the only non-static name in the
hello.c example C file is the initialization
function. Python calls this function by name after loading the object
file, so its name must be a C global and should generally be of the
form "initX", where "X" is both the name of
the module in Python import statements and the name passed to
Py_InitModule. All other names in C extension
files are arbitrary, because they are accessed by C pointer, not by
name (more on this later). The name of the C source file is arbitrary
too -- at import time, Python cares only about the compiled object
file.
Static binding
Under static binding, extensions are added to the Python interpreter permanently. This is more complex, though, because you must rebuild Python itself, and hence need access to the Python source distribution (an interpreter executable won't do). To link this example statically, add a line like:
hello ~/PP2E/Integrate/Extend/Hello/hello.c
to the Modules/Setup configuration file in the
Python source code tree. Alternatively, you can copy your C file to
the Modules directory (or add a link to it there
with an ln command) and add a line to
Setup like hello hello.c.
Then, rebuild Python itself by running a make command at the top level of the Python source tree. Python reconstructs its own makefiles to include the module you added to Setup, such that your code becomes part of the interpreter and its libraries. In fact, there's really no distinction between C extensions written by Python users and services that are a standard part of the language; Python is built with this same interface. The full format of module declaration lines looks like this (but see the Modules/Setup configuration file for more details):
Under this scheme, the name of the module's initialization function must match the name used in the Setup file, or you'll get linking errors when you rebuild Python. The name of the source or object file doesn't have to match the module name; the leftmost name is the resulting Python module's name.
Static versus dynamic binding
Static binding works on any platform and requires no extra makefile to compile extensions. It can be useful if you don't want to ship extensions as separate files, or if you're on a platform without dynamic linking support. Its downsides are that you need to update the Python Setup configuration file and rebuild the Python interpreter itself, so you must have the full source distribution of Python to use static linking at all. Moreover, all statically linked extensions are always added to your interpreter, whether or not they are used by a particular program. This can needlessly increase the memory needed to run all Python programs.
With dynamic binding, you still need Python include files, but can
add C extensions even if all you have is a binary Python interpreter
executable. Because extensions are separate object files, there is no
need to rebuild Python itself or to access the full source
distribution. And because object files are only loaded on demand in
this mode, it generally makes for smaller executables
too -- Python loads into memory only the extensions actually
imported by each program run. In other words, if you can use dynamic
linking on your platform, you probably should.
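To check which modules were bound statically into the interpreter you are running, you can ask Python itself. The sketch below assumes a Python 3 interpreter; sys.builtin_module_names is the standard tuple of module names compiled into the executable, while the helper name is_statically_linked is made up for this example.

```python
import sys


def is_statically_linked(module_name):
    """True if the module is compiled into this interpreter executable."""
    return module_name in sys.builtin_module_names


if __name__ == "__main__":
    print(is_statically_linked("sys"))
    print(sorted(sys.builtin_module_names)[:5])
```

A dynamically bound extension such as hello would not appear in this tuple; it shows up in sys.modules only after it has been imported.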
Anatomy of a C Extension Module
Though simple, the hello.c example illustrates the structure common to all C modules. This structure can vary somewhat, but this file consists of fairly typical boilerplate code:
The C file first includes the standard
Python.h header file (from the installed Python
Include directory). This file defines almost every
name exported by the Python API to C, and serves as a starting point
for exploring the API itself.
The file then defines a function to be called from the Python interpreter in response to calls in Python programs. C functions receive two Python objects as input, and send either a Python object back to the interpreter as the result, or a NULL to trigger an exception in the script (more on this later). In C, a PyObject* represents a generic Python object pointer; you can use more specific type names, but don't always have to. C module functions can all be declared C "static" (local to the file), because Python calls them by pointer, not name.
Near
the end, the file provides an initialized table (array) that maps
function names to function
pointers (addresses). Names in this table become
module attribute names that Python code uses to call the C functions.
Pointers in this table are used by the interpreter to dispatch C
function calls. In effect, the table "registers"
attributes of the module. A NULL entry terminates
the table.
Finally, the C file provides an initialization function, which Python calls the first time this module is imported into a Python program. This function calls the API function Py_InitModule to build up the new module's attribute dictionary from the entries in the registration table and create an entry for the C module on the sys.modules table (described in Chapter 12). Once so initialized, calls from Python are routed directly to the C function through the registration table's function pointers.
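The registration table and initialization function have a rough Python-level analogue, which may help in picturing what happens at import time. The sketch below is an analogy only, written for Python 3: it does not call Py_InitModule, and the names HELLO_METHODS, init_module, and hello_sketch are invented for illustration.

```python
import sys
import types


def message(text):
    return "Hello, " + text


# Analogue of the C registration table: attribute names mapped to functions.
HELLO_METHODS = {"message": message}


def init_module(name, methods):
    """Rough Python analogue of what the C initializer does on first import."""
    module = types.ModuleType(name)            # new, empty module object
    for attr_name, function in methods.items():
        setattr(module, attr_name, function)   # fill the attribute dictionary
    sys.modules[name] = module                 # register so imports find it
    return module


if __name__ == "__main__":
    init_module("hello_sketch", HELLO_METHODS)
    import hello_sketch                        # satisfied from sys.modules
    print(hello_sketch.message("table"))
```

As in the C case, the names in the table become module attributes, and the functions themselves are reached through the table rather than by their own names.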
Data conversions
C module functions are responsible for converting Python objects to and from C datatypes. In Example 19-1, message gets two Python input objects passed from the Python interpreter: args is a Python tuple holding the arguments passed from the Python caller (the values listed in parentheses in a Python program), and self is ignored; it is useful only for extension types (discussed later in this chapter).
After finishing its business, the C function can return any of the following to the Python interpreter: a Python object (known in C as PyObject*), for an actual result; a Python None (known in C as Py_None), if the function returns no real result; or a C NULL pointer, to flag an error and raise a Python exception.
There are distinct API tools for handling input conversions (Python to C) and output conversions (C to Python). It's up to C functions to implement their call signatures (argument lists and types) by using these tools properly.
Python to C: Using Python argument lists
When the C function is run, the arguments passed from a Python script are available in the args Python tuple object. The API function PyArg_Parse (and PyArg_ParseTuple, its cousin that assumes it is converting a tuple object) is probably the easiest way to extract and convert passed arguments to C form.
PyArg_Parse takes a Python object, a format string, and a variable-length list of C target addresses. It converts the items in the tuple to C datatype values according to the format string, and stores the results in the C variables whose addresses are passed in. The effect is much like C's scanf string function. For example, the hello module converts a passed-in Python string argument to a C char* using the s convert code:
PyArg_Parse(args, "(s)", &fromPython) # or PyArg_ParseTuple(args, "s",...
To handle multiple arguments, simply string format codes together and
include corresponding C targets for each code in the string. For
instance, to convert an argument list holding a string, an integer,
and another string to C, say this:
PyArg_Parse(args, "(sis)", &s1, &i, &s2) # or PyArg_ParseTuple(args, "sis",...
To verify that no arguments were passed, use an empty format string like this: PyArg_Parse(args, "( )"). This API call checks that the number and types of the arguments passed from Python match the format string in the call. If there is a mismatch, it sets an exception and returns zero to C (more on errors below).
Python to C: Using Python return values
As we'll see in Chapter 20, Embedding Python, API functions may also return Python objects to C as results when Python is being run as an embedded language. Converting Python return values in this mode is almost the same as converting Python arguments passed to C extension functions, except that Python return values are not always tuples. To convert returned Python objects to C form, simply use PyArg_Parse. Unlike PyArg_ParseTuple, this call takes the same kinds of arguments but doesn't expect the Python object to be a tuple.
C to Python: Returning values to Python
There are two ways to convert C data to Python objects: by using type-specific API functions, or the general object-builder function Py_BuildValue. The latter is more general, and is essentially the inverse of PyArg_Parse, in that Py_BuildValue converts C data to Python objects according to a format string. For instance, to make a Python string object from a C char*, the hello module uses an s convert code:
return Py_BuildValue("s", result) # "result" is a C char []/*
More specific object constructors can be used instead:
return PyString_FromString(result) # same effect
Both calls make a Python string object from a C character array
pointer. See the now-standard Python extension and runtime API
manuals for an exhaustive list of such calls available. Besides being
easier to remember, though, Py_BuildValue has
syntax that allows you to build lists in a single step, described
next.
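If you want to watch a type-specific constructor at work without compiling a module, the standard ctypes module can call C API functions in the running interpreter. The sketch below rests on several assumptions: it targets CPython 3, where the text-string constructor is named PyUnicode_FromString (PyString_FromString belongs to the Python 2 API used in this chapter), and the wrapper name to_python_string is invented for illustration.

```python
import ctypes

# Look up the C API function in the running interpreter and declare its
# C signature: PyObject *PyUnicode_FromString(const char *).
from_c_string = ctypes.pythonapi.PyUnicode_FromString
from_c_string.argtypes = [ctypes.c_char_p]
from_c_string.restype = ctypes.py_object


def to_python_string(c_text):
    """Build a Python string object from a NUL-terminated C string (bytes)."""
    return from_c_string(c_text)


if __name__ == "__main__":
    print(to_python_string(b"Hello, world"))
```

In a real extension you would call the constructor directly from C and return its result; ctypes is used here only so that the call can be tried from a script.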
Common conversion codes
With a few exceptions, PyArg_Parse(Tuple) and Py_BuildValue use the same conversion codes in format strings. A list of all supported conversion codes appears in Python's extension manuals. The most commonly used are shown in Table 19-1; the tuple, list, and dictionary formats can be nested.
These codes are mostly what you'd expect (e.g., i maps between a C int and a Python integer object), but here are a few usage notes on this table's entries:
PyArg_Parse supports some extra codes, which must not be nested in tuple formats ((...)):
|
The remaining arguments are all optional (varargs). The C targets are unchanged if arguments are missing in the Python tuple. For instance, si|sd requires two arguments but allows up to four.
:
The function name follows, for use in error messages set by the call (argument mismatches). Normally Python sets the error message to a generic string.
;
A full error message follows, running to the end of the format string.
This format code list isn't exhaustive, and the set of convert codes may expand over time; refer to Python's extension manual for further details.
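The way a format string drives argument checking, including the optional-argument marker used in si|sd, can be modeled in a few lines of Python. This is a toy model written for Python 3, not the API's actual implementation: it handles only the s, i, and d codes, returns the converted values instead of storing them through C pointers, and the names CONVERTERS and parse_args are hypothetical.

```python
CONVERTERS = {
    "s": (str, str),              # code: (accepted Python type, conversion)
    "i": (int, int),
    "d": ((int, float), float),
}


def parse_args(args, fmt):
    """Toy model of format-driven argument checking.

    Returns the converted values; raises TypeError on a mismatch.
    """
    required, _, optional = fmt.partition("|")
    codes = required + optional
    if not len(required) <= len(args) <= len(codes):
        raise TypeError("expected %d to %d arguments, got %d"
                        % (len(required), len(codes), len(args)))
    results = []
    for code, value in zip(codes, args):
        expected_type, convert = CONVERTERS[code]
        # bool is a subclass of int; a stricter checker might reject it.
        if not isinstance(value, expected_type):
            raise TypeError("argument for code %r has wrong type" % code)
        results.append(convert(value))
    return tuple(results)


if __name__ == "__main__":
    print(parse_args(("spam", 3), "si|sd"))
    print(parse_args(("spam", 3, "eggs", 1.5), "si|sd"))
```

With the format si|sd, two, three, or four arguments are accepted, while a single argument or a wrongly typed one raises TypeError, much as a mismatch in the real call sets an exception and returns zero.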
Error Handling
When you write C extensions, you need to be aware that errors can occur on either side of the language fence. The following sections address both possibilities.
Raising Python exceptions in C
C extension module functions return a C NULL value for the result object to flag an error. When control returns to Python, the NULL result triggers a normal Python exception in the Python code that called the C function. To name an exception, C code can also set the type and extra data of the exceptions it triggers. For instance, the PyErr_SetString API function sets the exception object to a Python object and sets the exception's extra data to a character string:
PyErr_SetString(ErrorObject, message)
We will use this in the next example to be more specific about
exceptions raised when C detects an error. C modules may also set a
built-in Python exception; for instance, returning
NULL after saying this:
PyErr_SetString(PyExc_IndexError, "index out-of-bounds")
raises a standard Python IndexError exception with the message string data. When an error is raised inside a Python API function, both the exception object and its associated "extra data" are automatically set by Python; there is no need to set it again in the calling C function. For instance, when an argument-passing error is detected in the PyArg_Parse function, the hello module just returns NULL to propagate the exception to the enclosing Python layer, instead of setting its own message.
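From the Python side, exceptions raised in C are caught like any others. The sketch below assumes CPython 3, where math.sqrt is coded in C: passing it a string makes its argument conversion fail with a TypeError, and passing a negative number makes the C code raise a ValueError explicitly. The helper name call_and_report is made up for this example.

```python
import math


def call_and_report(function, argument):
    """Call a C-coded function and report the exception it raises, if any."""
    try:
        return ("ok", function(argument))
    except (TypeError, ValueError) as error:
        return (type(error).__name__, str(error))


if __name__ == "__main__":
    print(call_and_report(math.sqrt, 4.0))
    print(call_and_report(math.sqrt, "text"))
    print(call_and_report(math.sqrt, -1.0))
```

Nothing in the try statement needs to know that the exception started life as a NULL return value in C.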
Detecting errors that occur in Python
Python API functions may be called from C extension functions, or from an enclosing C layer when Python is embedded. In either case, C callers simply check the return value to detect errors raised in Python API functions. For pointer result functions, Python returns NULL pointers on errors. For integer result functions, Python generally returns a status code of -1 to flag an error and a 0 or positive value on success. (PyArg_Parse is an exception to this rule: it returns 0 when it detects an error.) To make your programs robust, you should check return codes for error indicators after most Python API calls; some calls can fail for reasons you may not have expected (e.g., memory overflow).
Reference Counts
The Python interpreter uses a reference-count scheme to implement garbage collection. Each Python object carries a count of the number of places it is referenced; when that count reaches zero, Python reclaims the object's memory space automatically. Normally, Python manages the reference counts for objects behind the scenes; Python programs simply make and use objects without concern for managing storage space.
When extending or embedding Python, though, integrated C code is
responsible for managing the reference counts of the Python objects
it uses. How important this becomes depends on how many raw Python
objects a C module processes and which Python API functions it calls.
In simple programs, reference counts are of minor, if any, concern;
the hello module, for instance, makes no
reference-count management calls at all.
When the API is used extensively, however, this task can become significant. In later examples, we'll see calls of these forms show up:
C module functions are expected to return either an object with an incremented reference count, or NULL to signal an error. As a general rule, API functions that create new objects increment their reference counts before returning them to C; unless a new object is to be passed back to Python, the C program that creates it should eventually decrement the object's count. In the extending scenario, things are relatively simple; argument object reference counts need not be decremented, and new result objects are passed back to Python with their reference counts intact.
The upside of reference counts is that Python will never reclaim a Python object held by C as long as C increments the object's reference count (or doesn't decrement the count on an object it owns). Although it requires counter management calls, Python's garbage collector scheme is fairly well-suited to C integration.
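Reference counts can be observed from Python itself, which gives a feel for what the C-level management calls are adjusting. The sketch below assumes CPython 3, where sys.getrefcount reports an object's current count and objects are reclaimed as soon as their count drops to zero; the class Thing and the function reference_count_demo are invented for illustration.

```python
import sys
import weakref


class Thing(object):
    """Placeholder class; its instances support weak references."""


def reference_count_demo():
    """Return (change after adding a reference, change after dropping it,
    whether the object was reclaimed once no references remained)."""
    obj = Thing()
    start = sys.getrefcount(obj)
    holder = [obj]                       # a second reference to the object
    added = sys.getrefcount(obj) - start
    del holder[:]                        # drop that reference again
    dropped = sys.getrefcount(obj) - start
    watcher = weakref.ref(obj)           # a weak reference adds nothing
    del obj                              # the last reference goes away
    reclaimed = watcher() is None
    return added, dropped, reclaimed


if __name__ == "__main__":
    print(reference_count_demo())
```

The sketch compares counts rather than printing absolute values, because the number reported by sys.getrefcount includes temporary references made by the call itself.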