Basic Embedding Techniques

2017-11-03 09:05:09

As you can probably tell from the preceding overview, there is much flexibility in the embedding domain. To illustrate common embedding techniques in action, this section presents a handful of short C programs that run Python code in one form or another. Most of these examples make use of the simple Python module file shown in Example 20-1.

Example 20-1. PP2E\Integrate\Embed\Basics\usermod.py

```python
#########################################################
# C runs Python code in this module in embedded mode.
# Such a file can be changed without changing the C layer.
# There is just standard Python code (C does conversions).
# You can also run code in standard modules like string.
#########################################################

import string

message = 'The meaning of life...'

def transform(input):
    input = string.replace(input, 'life', 'Python')
    return string.upper(input)
```

If you know any Python at all, you know that this file defines a string and a function; the function returns whatever it is passed with string substitution and uppercase conversions applied. It's easy to use from Python:

```
[mark@toy ~/.../PP2E/Integrate/Embed/Basics]$ python
>>> import usermod                            # import a module
>>> usermod.message                           # fetch a string
'The meaning of life...'
>>> usermod.transform(usermod.message)        # call a function
'THE MEANING OF PYTHON...'
```

With proper API use, it's not much more difficult to use this module the same way in C.

20.3.1 Running Simple Code Strings

Perhaps the simplest way to run Python code from C is by calling the PyRun_SimpleString API function. With it, C programs can execute Python programs represented as C character string arrays. This call is also very limited: all code runs in the same namespace (module __main__), the code strings must be Python statements (not expressions), and there is no easy way to communicate inputs or outputs with the Python code run. Still, it's a simple place to start; the C program in Example 20-2 runs Python code to accomplish the same results as the interactive session listed above.

Example 20-2. PP2E\Integrate\Embed\Basics\embed-simple.c

```c
/*******************************************************
 * simple code strings: C acts like the interactive
 * prompt, code runs in __main__, no output sent to C;
 *******************************************************/

#include <Python.h>    /* standard API def */

int main(void)
{
    printf("embed-simple\n");
    Py_Initialize();
    PyRun_SimpleString("import usermod");                /* load .py file */
    PyRun_SimpleString("print usermod.message");         /* on python path */
    PyRun_SimpleString("x = usermod.message");           /* compile and run */
    PyRun_SimpleString("print usermod.transform(x)");
    return 0;
}
```

The first thing you should notice here is that when Python is embedded, C programs always call Py_Initialize to initialize linked-in Python libraries before using any other API functions. The rest of this code is straightforward: C submits hardcoded strings to Python that are roughly what we typed interactively. Internally, PyRun_SimpleString invokes the Python compiler and interpreter to run the strings sent from C; as usual, the Python compiler is always available in systems that contain Python.
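To make the namespace and result rules concrete, here is a rough Python-level model of what PyRun_SimpleString does. It is not the C API itself. It assumes Python 3 syntax, unlike the Python 1.5 listings, and `run_simple_string` is a hypothetical name. Every string is compiled as statements and run in the `__main__` module's global namespace. The function returns 0 on success, or prints the traceback and returns -1 on error, which matches the C call's documented return convention.

```python
import sys
import traceback

def run_simple_string(code):
    """Model of PyRun_SimpleString: run statements in __main__; 0 on success, -1 on error."""
    main_ns = sys.modules["__main__"].__dict__   # one shared namespace for every call
    try:
        exec(compile(code, "<string>", "exec"), main_ns)
    except Exception:
        traceback.print_exc()                    # the C call prints the error too
        return -1
    return 0

# Names set by one string are visible to the next, because all code shares __main__
run_simple_string("embed_x = 'The meaning of life...'")
run_simple_string("embed_y = embed_x.upper()")
# An expression string runs as an expression statement; its value is simply discarded
run_simple_string("2 + 2")
```

The last call illustrates why inputs and outputs are awkward with this interface: the only way to get a value back is to assign it to a global in `__main__` and fetch it some other way.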

20.3.1.1 Compiling and running

To build a standalone executable from this C source file, you need to link its compiled form with the Python library file. In this chapter, "library" usually means the binary library file (e.g., an .a file on Unix) that is generated when Python is compiled, not the Python source code library.

Today, everything about Python you need in C is compiled into a single .a library file when the interpreter is built. The program's main function comes from your C code, and depending on the extensions installed in your Python, you may also need to link any external libraries referenced by the Python library.

Assuming no extra extension libraries are needed, Example 20-3 is a minimal Linux makefile for building the C program in Example 20-2. Again, makefile details vary per platform, but see Python manuals for hints. This makefile uses the Python include-files path to find Python.h in the compile step, and adds the Python library file to the final link step to make API calls available to the C program.

Example 20-3. PP2E\Integrate\Embed\Basics\makefile.1

```make
# a linux makefile that builds a C executable that embeds
# Python, assuming no external module libs must be linked in;
# uses Python header files, links in the Python lib file;
# both may be in other dirs (e.g., /usr) in your install;
# set MYPY to your Python install tree, change lib version;

PY    = $(MYPY)
PYLIB = $(PY)/libpython1.5.a
PYINC = -I$(PY)/Include -I$(PY)

embed-simple: embed-simple.o
	gcc embed-simple.o $(PYLIB) -g -export-dynamic -lm -ldl -o embed-simple

embed-simple.o: embed-simple.c
	gcc embed-simple.c -c -g $(PYINC)
```

Things may not be quite this simple in practice, though, at least not without some coaxing. The makefile in Example 20-4 is the one I actually used to build all of this section's C programs on Linux.

Example 20-4. PP2E\Integrate\Embed\Basics\makefile.basics

```make
# build all 5 basic embedding examples
# with external module libs linked in;
# source setup-pp-embed.csh if needed

PY    = $(MYPY)
PYLIB = $(PY)/libpython1.5.a
PYINC = -I$(PY)/Include -I$(PY)

LIBS = -L/usr/lib \
       -L/usr/X11R6/lib \
       -lgdbm -ltk8.0 -ltcl8.0 -lX11 -lm -ldl

BASICS = embed-simple embed-string embed-object embed-dict embed-bytecode

all: $(BASICS)

embed%: embed%.o
	gcc embed$*.o $(PYLIB) $(LIBS) -g -export-dynamic -o embed$*

embed%.o: embed%.c
	gcc embed$*.c -c -g $(PYINC)

clean:
	rm -f *.o *.pyc $(BASICS) core
```

This version links in Tkinter libraries because the Python library file it uses was built with Tkinter enabled. You may have to link in arbitrarily many more externals for your Python library, and frankly, chasing down all the linker dependencies can be tedious. Required libraries may vary per platform and Python install, so there isn't a lot of advice I can offer to make this process simple (this is C, after all).

But if you're going to do much embedding work, you might want to build Python on your machine from its source with all unnecessary extensions disabled in the Modules/Setup file. This produces a Python library with minimal external dependencies, which links much more easily. For example, if your embedded code won't be building GUIs, Tkinter can simply be removed from the library; see the Setup file for details. You can also find a list of external libraries referenced from your Python in the generated makefiles located in the Python source tree. In any event, the good news is that you only need to resolve linker dependencies once.
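Python releases newer than the one used here expose the same build-time information through the standard `sysconfig` module. That module reads values recorded when the interpreter was configured. The sketch below is an approximation of what `python3-config --embed --cflags --ldflags` reports, not a replacement for it. It assumes a Unix-style build, where `LIBDIR`, `VERSION`, `LIBS`, and `SYSLIBS` are defined. `embed_build_flags` is a hypothetical helper name.

```python
import sysconfig

def embed_build_flags():
    """Suggest (cflags, ldflags) for compiling and linking a C program that embeds Python."""
    include_dir = sysconfig.get_paths()["include"]          # directory holding Python.h
    libdir = sysconfig.get_config_var("LIBDIR") or ""       # where libpythonX.Y lives
    version = sysconfig.get_config_var("VERSION") or ""     # e.g. "3.12" on Unix builds
    abiflags = sysconfig.get_config_var("ABIFLAGS") or ""   # e.g. "" or "d" for debug builds
    # External libraries the interpreter itself was linked against (may be None on Windows)
    extra = " ".join(filter(None, [sysconfig.get_config_var("LIBS"),
                                   sysconfig.get_config_var("SYSLIBS")]))
    cflags = "-I" + include_dir
    ldflags = "-L%s -lpython%s%s %s" % (libdir, version, abiflags, extra)
    return cflags, ldflags.strip()

cflags, ldflags = embed_build_flags()
```

The results could be pasted into a makefile's PYINC and LIBS-style variables. Treat them as a starting point, because platform-specific linker flags may still be needed.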

Once you've gotten the makefile to work, run it to build the C program with Python libraries linked in. Run the resulting C program as usual:^[2]

^[2] My build environment is a little custom (really, odd), so I first need to source $PP2E/Config/setup-pp-embed.csh to set up PYTHONPATH to point to the source library directory of a custom Python build on my machine. In Python 1.5.2, at least, Python may have trouble locating standard library directories when it is embedded, especially if there are multiple Python installs on the same machine (e.g., the interpreter and library versions may not match). This probably won't be an issue in your build environment, but see the sourced file's contents for more details if you get startup errors when you try to run a C program that embeds Python. You may need to customize your login scripts or source such a setup configuration file before running the embedding examples, but only if your Python lives in dark places.

```
[mark@toy ~/.../PP2E/Integrate/Embed/Basics]$ embed-simple
embed-simple
The meaning of life...
THE MEANING OF PYTHON...
```

Most of this output is produced by Python print statements sent from C to the linked-in Python library. It's as if C has become an interactive Python programmer.

However, strings of Python code run by C probably would not be hardcoded in a C program file like this. They might instead be loaded from a text file, extracted from HTML or XML files, fetched from a persistent database or socket, and so on. With such external sources, the Python code strings that are run from C could be changed arbitrarily without having to recompile the C program that runs them. They may even be changed onsite, and by end users of a system. To make the most of code strings, though, we need to move on to more flexible API tools.
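Here is a minimal Python-level sketch of the "load the code from outside" idea. The source text is read from a file and run as one code string in a fresh namespace, so editing the file changes behavior without rebuilding anything. It uses Python 3 syntax. `run_code_file` is a hypothetical helper, and the temporary file merely stands in for whatever external source a real system would use.

```python
import os
import tempfile

def run_code_file(path, namespace=None):
    """Read Python source text from path and run it as a single code string."""
    if namespace is None:
        namespace = {}                       # fresh namespace unless the caller supplies one
    with open(path) as f:
        source = f.read()                    # newlines in the text terminate statements as usual
    exec(compile(source, path, "exec"), namespace)
    return namespace                         # names assigned by the code serve as outputs

# Usage: the "external" code lives in a file that can be edited independently
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write("message = 'The meaning of life...'\n"
            "result = message.replace('life', 'Python')\n")
    code_path = f.name
demo_ns = run_code_file(code_path)
os.remove(code_path)
```

In C, the same effect comes from reading the file's text into a buffer and passing it to one of the string-running API calls described in this section.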

20.3.2 Running Code Strings with Results and Namespaces

Example 20-5 uses the following API calls to run code strings that return expression results back to C:

Py_Initialize initializes linked-in Python libraries as before

PyImport_ImportModule imports a Python module, returns pointer to it

PyModule_GetDict fetches a module's attribute dictionary object

PyRun_String runs a string of code in explicit namespaces

PyObject_SetAttrString assigns an object attribute by name string

PyArg_Parse converts a Python return value object to C form

The import calls are used to fetch the namespace of the usermod module listed in Example 20-1 earlier, so that code strings can be run there directly (and will have access to names defined in that module without qualifications). PyImport_ImportModule is like a Python import statement, but the imported module object is returned to C, not assigned to a Python variable name. Because of that, it's probably more similar to the Python __import__ built-in function we used in Example 7-32.

The PyRun_String call is the one that actually runs code here, though. It takes a code string, a parser mode flag, and dictionary object pointers to serve as the global and local namespaces for running the code string. The mode flag can be Py_eval_input to run an expression, or Py_file_input to run a statement; when running an expression, the result of evaluating the expression is returned from this call (it comes back as a PyObject* object pointer). The two namespace dictionary pointer arguments allow you to distinguish global and local scopes, but they are typically passed the same dictionary such that code runs in a single namespace.^[3]

^[3] A related function lets you run files of code but is not demonstrated in this chapter: PyObject* PyRun_File(FILE *fp, char *filename, mode, globals, locals). Because you can always load a file's text and run it as a single code string with PyRun_String, the PyRun_File call is not always necessary. In such multiline code strings, the \n character terminates lines and indentation groups blocks as usual.

Example 20-5. PP2E\Integrate\Embed\Basics\embed-string.c

```c
/* code-strings with results and namespaces */

#include <Python.h>

int main(void)
{
    char *cstr;
    PyObject *pstr, *pmod, *pdict;
    printf("embed-string\n");
    Py_Initialize();

    /* get usermod.message */
    pmod  = PyImport_ImportModule("usermod");
    pdict = PyModule_GetDict(pmod);              /* borrowed reference */
    pstr  = PyRun_String("message", Py_eval_input, pdict, pdict);

    /* convert to C */
    PyArg_Parse(pstr, "s", &cstr);
    printf("%s\n", cstr);

    /* assign usermod.X */
    PyObject_SetAttrString(pmod, "X", pstr);

    /* print usermod.transform(X) */
    (void) PyRun_String("print transform(X)", Py_file_input, pdict, pdict);
    Py_DECREF(pmod);
    Py_DECREF(pstr);
    return 0;
}
```

When compiled and run, this file produces the same result as its predecessor:

```
[mark@toy ~/.../PP2E/Integrate/Embed/Basics]$ embed-string
embed-string
The meaning of life...
THE MEANING OF PYTHON...
```

But very different work goes into producing this output. This time, C fetches, converts, and prints the value of the Python module's message attribute directly by running a string expression, and assigns a global variable (X) within the module's namespace to serve as input for a Python print statement string.

Because the string execution call in this version lets you specify namespaces, you can better partition the embedded code your system runs: each grouping can have a distinct namespace to avoid overwriting other groups' variables. And because this call returns a result, you can better communicate with the embedded code: expression results are outputs, and assignments to globals in the namespace in which code runs can serve as inputs.
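The two parser modes and the namespace partitioning can be modeled in a few lines of Python. The sketch is an analogy under Python 3, not the C API. The 'eval' mode of Python's `compile` plays the role of Py_eval_input and returns the expression's value. The 'exec' mode plays the role of Py_file_input and yields None. `run_string` and the two group dictionaries are hypothetical names.

```python
def run_string(code, mode, globals_, locals_):
    """Model of PyRun_String: 'eval' ~ Py_eval_input (value), 'exec' ~ Py_file_input (None)."""
    codeobj = compile(code, "<string>", mode)
    return eval(codeobj, globals_, locals_)   # eval of an 'exec'-mode code object returns None

# Two groups of embedded code, each with its own namespace, so same-named variables don't clash
group_a = {"message": "The meaning of life..."}     # preset names act as inputs
group_b = {"message": "Another group's message"}

run_string("X = message.upper()", "exec", group_a, group_a)
run_string("X = len(message)", "exec", group_b, group_b)
result = run_string("X", "eval", group_a, group_a)  # expression results act as outputs
```

After these calls, each dictionary holds its own X, and neither group's assignment overwrote the other's.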

Before we move on, I need to explain two coding issues here. First, this program also decrements the reference count on objects passed to it from Python, using the Py_DECREF call introduced in Chapter 19. These calls are not strictly needed here (the objects' space is reclaimed when the program exits anyhow), but they demonstrate how embedding interfaces must manage reference counts when Python passes their ownership to C. If this were a function called from a larger system, for instance, you would generally want to decrement the count to allow Python to reclaim the objects.

Second, in a realistic program, you should generally test the return values of all the API calls in this program immediately to detect errors (e.g., import failure). Error tests are omitted in this section's example to keep the code simple, but they will appear in later code listings and should be included in your programs to make them more robust.
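Seen from the Python side, "test the return value" means: an API call that fails returns NULL and leaves a Python exception pending. The following sketch mirrors that contract in Python 3, with None standing in for NULL. `import_or_none` is a hypothetical helper. In C, the corresponding step after a NULL result would be something like calling PyErr_Print and skipping any further use of the missing object.

```python
import importlib

def import_or_none(name):
    """Mimic PyImport_ImportModule's contract: a module object on success, None (NULL) on failure."""
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        # In C, the pending exception would be reported here before bailing out
        print("import failed:", exc)
        return None

mod = import_or_none("string")
if mod is None:
    raise SystemExit(1)          # the C analogue: stop instead of using a NULL pointer
```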

20.3.3 Calling Python Objects

The last two sections dealt with running strings of code, but it's easy for C programs to deal in terms of Python objects too. Example 20-6 accomplishes the same task as Examples 20-2 and 20-5, but uses other API tools to interact with objects in the Python module directly:

PyImport_ImportModule imports the module from C as before

PyObject_GetAttrString fetches an object's attribute value by name

PyEval_CallObject calls a Python function (or class, or method)

PyArg_Parse converts Python objects to C values

Py_BuildValue converts C values to Python objects

We met both the data conversion functions in the last chapter. The PyEval_CallObject call is the key call in this version: it runs the imported function with a tuple of arguments, much like the Python apply built-in function. The Python function's return value comes back to C as a PyObject*, a generic Python object pointer.

Example 20-6. PP2E\Integrate\Embed\Basics\embed-object.c

```c
/* fetch and call objects in modules */

#include <Python.h>

int main(void)
{
    char *cstr;
    PyObject *pstr, *pmod, *pfunc, *pargs;
    printf("embed-object\n");
    Py_Initialize();

    /* get usermod.message */
    pmod = PyImport_ImportModule("usermod");
    pstr = PyObject_GetAttrString(pmod, "message");

    /* convert string to C */
    PyArg_Parse(pstr, "s", &cstr);           /* cstr points into pstr's data */
    printf("%s\n", cstr);

    /* call usermod.transform(usermod.message) */
    pfunc = PyObject_GetAttrString(pmod, "transform");
    pargs = Py_BuildValue("(s)", cstr);      /* copies cstr into a new tuple */
    Py_DECREF(pstr);                         /* safe now: cstr no longer used */
    pstr  = PyEval_CallObject(pfunc, pargs);
    PyArg_Parse(pstr, "s", &cstr);
    printf("%s\n", cstr);

    /* free owned objects */
    Py_DECREF(pmod);
    Py_DECREF(pstr);
    Py_DECREF(pfunc);        /* not really needed in main() */
    Py_DECREF(pargs);        /* since all memory goes away  */
    return 0;
}
```

When compiled and run, the result is the same again:

```
[mark@toy ~/.../PP2E/Integrate/Embed/Basics]$ embed-object
embed-object
The meaning of life...
THE MEANING OF PYTHON...
```

But this output is all generated by C this time: first by fetching the Python module's message attribute value, and then by fetching and calling the module's transform function object directly and printing its return value that is sent back to C. Input to the transform function is a function argument here, not a preset global variable. Notice that message is fetched as a module attribute this time, instead of by running its name as a code string; there is often more than one way to accomplish the same goals with different API calls.

Running functions in modules like this is a simple way to structure embedding; code in the module file can be changed arbitrarily without having to recompile the C program that runs it. It also provides a direct communication model: inputs and outputs to Python code can take the form of function arguments and return values.
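The fetch-then-call sequence in Example 20-6 maps onto ordinary Python operations, sketched below in Python 3. The comments pair each line with the C call it stands in for. To keep the sketch self-contained, it builds a hypothetical in-memory stand-in for usermod instead of importing the real file. The stand-in uses string methods in place of the string module functions.

```python
import types

# Hypothetical in-memory stand-in for usermod.py, so the sketch needs no file on disk
usermod = types.ModuleType("usermod")
exec(
    "message = 'The meaning of life...'\n"
    "def transform(input):\n"
    "    return input.replace('life', 'Python').upper()\n",
    usermod.__dict__,
)

def call_transform(module):
    message = getattr(module, "message")     # ~ PyObject_GetAttrString(pmod, "message")
    func = getattr(module, "transform")      # ~ PyObject_GetAttrString(pmod, "transform")
    args = (message,)                        # ~ Py_BuildValue("(s)", cstr)
    return func(*args)                       # ~ PyEval_CallObject(pfunc, pargs)

print(call_transform(usermod))
```

Inputs travel as the argument tuple and the output is the return value, which is the communication model described above.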

20.3.4 Running Strings in Dictionaries

When we used PyRun_String earlier to run expressions with results, code was executed in the namespace of an existing Python module. However, sometimes it's more convenient to create a brand new namespace for running code strings that is independent of any existing module files. The C file in Example 20-7 shows how; the new namespace is created as a new Python dictionary object, and a handful of new API calls are employed in the process:

PyDict_New makes a new empty dictionary object

PyDict_SetItemString assigns a value to a dictionary key

PyDict_GetItemString fetches (indexes) a dictionary value by key

PyRun_String runs a code string in namespaces, as before

PyEval_GetBuiltins gets the built-in scope's dictionary

The main trick here is the new dictionary. Inputs and outputs for the embedded code strings are mapped to this dictionary by passing it as the code's namespace dictionaries in the PyRun_String call. The net effect is that the C program in Example 20-7 works exactly like this Python code:

```
>>> d = {}
>>> d['Y'] = 2
>>> exec 'X = 99' in d, d
>>> exec 'X = X + Y' in d, d
>>> print d['X']
101
```

But here, each Python operation is replaced by a C API call.

Example 20-7. PP2E\Integrate\Embed\Basics\embed-dict.c

```c
/***************************************************
 * make a new dictionary for code string namespace;
 ***************************************************/

#include <Python.h>

int main(void)
{
    int cval;
    PyObject *pdict, *pval;
    printf("embed-dict\n");
    Py_Initialize();

    /* make a new namespace */
    pdict = PyDict_New();
    PyDict_SetItemString(pdict, "__builtins__", PyEval_GetBuiltins());

    PyDict_SetItemString(pdict, "Y", PyInt_FromLong(2));   /* dict['Y'] = 2   */
    PyRun_String("X = 99",  Py_file_input, pdict, pdict);  /* run statements  */
    PyRun_String("X = X+Y", Py_file_input, pdict, pdict);  /* same X and Y    */
    pval = PyDict_GetItemString(pdict, "X");               /* fetch dict['X'] */

    PyArg_Parse(pval, "i", &cval);                         /* convert to C */
    printf("%d\n", cval);                                  /* result=101   */
    Py_DECREF(pdict);
    return 0;
}
```

When compiled and run, this C program creates this sort of output:

```
[mark@toy ~/.../PP2E/Integrate/Embed/Basics]$ embed-dict
embed-dict
101
```

The output is different this time: it reflects the value of Python variable X assigned by the embedded Python code strings and fetched by C. In general, C can fetch module attributes either by calling PyObject_GetAttrString with the module, or by using PyDict_GetItemString to index the module's attribute dictionary (expression strings work too, but are less direct). Here, there is no module at all, so dictionary indexing is used to access the code's namespace in C.

Besides allowing you to partition code string namespaces independent of any Python module files on the underlying system, this scheme provides a natural communication mechanism. Values stored in the new dictionary before code is run serve as inputs, and names assigned by the embedded code can later be fetched out of the dictionary to serve as code outputs. For instance, the variable Y in the second string run refers to a name set to 2 by C; X is assigned by the Python code and fetched later by C code as the printed result.

There is one trick in this code that I need to explain. Each module namespace in Python has a link to the built-in scope's namespace, where names like open and len live. In fact, this is the link Python follows during the last step of its local/global/built-in three-scope name lookup procedure.^[4] Today, embedding code is responsible for setting the __builtins__ scope link in dictionaries that serve as namespaces. Python sets this link automatically in all other namespaces that host code execution, and this embedding requirement may be lifted in the future (it seems a bit too magical to be required for long). For now, simply do what this example does to initialize the built-ins link in dictionaries you create for running code in C.

^[4] This link also plays a part in Python's restricted-execution mode, described in Chapter 15. By changing the built-in scope link to a module with limited attribute sets and customized versions of built-in calls like open, the rexec module can control machine access from code run through its interface.
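The role of the `__builtins__` entry is easy to observe from Python itself. The Python 3 sketch below shows two things. First, Python-level exec inserts the link automatically when a namespace dictionary lacks it, which is the automatic behavior mentioned above. Second, replacing the link with an empty dictionary makes names like len unresolvable, which is the mechanism the footnote's restricted-execution remark relies on. `builtins_demo` is a hypothetical name, and an empty dictionary is not a real security boundary.

```python
def builtins_demo():
    """Return (computed value, whether exec added __builtins__, whether len was found)."""
    ns = {}                                   # no __builtins__ entry yet
    exec("n = len('abc')", ns)                # exec inserts the built-in scope link itself
    linked = "__builtins__" in ns

    restricted = {"__builtins__": {}}         # link points at an empty built-in scope
    try:
        exec("n = len('abc')", restricted)    # len can't be found in any of the three scopes
        found = True
    except NameError:
        found = False
    return ns["n"], linked, found

print(builtins_demo())
```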

20.3.5 Precompiling Strings to Bytecode

When you call Python function objects from C, you are actually running the already-compiled bytecode associated with the object (e.g., a function body). When running strings, Python must compile the string before running it. Because compilation is relatively slow, this can be a substantial overhead if you run a code string more than once. Instead, precompile the string to a bytecode object to be run later, using the API calls illustrated in Example 20-8:^[5]

^[5] Just in case you flipped ahead to this chapter early: bytecode is simply an intermediate representation for already compiled program code in the current standard Python implementation. It's a low-level binary format that can be quickly interpreted by the Python runtime system. Bytecode is usually generated automatically when you import a module, but there may be no notion of an import when running raw strings from C.

Py_CompileString compiles a string of code, returns a bytecode object

PyEval_EvalCode runs a compiled bytecode object

The first of these takes the mode flag normally passed to PyRun_String, and a second string argument that is only used in error messages. The second takes two namespace dictionaries. These two API calls are used in Example 20-8 to compile and execute three strings of Python code.

Example 20-8. PP2E\Integrate\Embed\Basics\embed-bytecode.c

```c
/* precompile code strings to bytecode objects */

#include <Python.h>
#include <compile.h>
#include <eval.h>

int main(void)
{
    int i;
    char *cval;
    PyObject *pcode1, *pcode2, *pcode3, *presult, *pdict;
    char *codestr1, *codestr2, *codestr3;
    printf("embed-bytecode\n");

    Py_Initialize();
    codestr1 = "import usermod\nprint usermod.message";    /* statements  */
    codestr2 = "usermod.transform(usermod.message)";      /* expression  */
    codestr3 = "print '%d:%d' % (X, X ** 2),";            /* use input X */

    /* make new namespace dictionary */
    pdict = PyDict_New();
    if (pdict == NULL) return -1;
    PyDict_SetItemString(pdict, "__builtins__", PyEval_GetBuiltins());

    /* precompile strings of code to bytecode objects */
    pcode1 = Py_CompileString(codestr1, "<embed>", Py_file_input);
    pcode2 = Py_CompileString(codestr2, "<embed>", Py_eval_input);
    pcode3 = Py_CompileString(codestr3, "<embed>", Py_file_input);

    /* run compiled bytecode in namespace dict */
    if (pcode1 && pcode2 && pcode3) {
        (void) PyEval_EvalCode((PyCodeObject *)pcode1, pdict, pdict);
        presult = PyEval_EvalCode((PyCodeObject *)pcode2, pdict, pdict);
        PyArg_Parse(presult, "s", &cval);
        printf("%s\n", cval);
        Py_DECREF(presult);

        /* rerun code object repeatedly */
        for (i = 0; i <= 10; i++) {
            PyDict_SetItemString(pdict, "X", PyInt_FromLong(i));
            (void) PyEval_EvalCode((PyCodeObject *)pcode3, pdict, pdict);
        }
        printf("\n");
    }

    /* free referenced objects */
    Py_XDECREF(pdict);
    Py_XDECREF(pcode1);
    Py_XDECREF(pcode2);
    Py_XDECREF(pcode3);
    return 0;
}
```

This program combines a variety of techniques we've already seen. The namespace in which the compiled code strings run, for instance, is a newly created dictionary (not an existing module object), and inputs for code strings are passed as preset variables in the namespace. When built and executed, the first part of the output is similar to previous examples in this section, but the last line represents running the same precompiled code string 11 times:

```
[mark@toy ~/.../PP2E/Integrate/Embed/Basics]$ embed-bytecode
embed-bytecode
The meaning of life...
THE MEANING OF PYTHON...
0:0 1:1 2:4 3:9 4:16 5:25 6:36 7:49 8:64 9:81 10:100
```
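The compile-once, run-many pattern has a direct Python-level counterpart: the built-in compile plays the part of Py_CompileString, and exec of the resulting code object plays the part of PyEval_EvalCode. The sketch below uses Python 3. `run_precompiled` is a hypothetical name, and it collects strings into a result instead of printing, so the output is easy to check. It reproduces the last line of the program's output.

```python
def run_precompiled(n=10):
    """Compile one code string once, then rerun it with inputs X = 0..n preset in a namespace."""
    ns = {}                                                   # ~ PyDict_New()
    code = compile("result = '%d:%d' % (X, X ** 2)",
                   "<embed>", "exec")                         # ~ Py_CompileString(..., Py_file_input)
    pieces = []
    for i in range(n + 1):                                    # 0..n inclusive, like i <= 10 in C
        ns["X"] = i                                           # ~ PyDict_SetItemString(pdict, "X", ...)
        exec(code, ns, ns)                                    # ~ PyEval_EvalCode(pcode3, pdict, pdict)
        pieces.append(ns["result"])
    return " ".join(pieces)

print(run_precompiled())   # 0:0 1:1 2:4 3:9 4:16 5:25 6:36 7:49 8:64 9:81 10:100
```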

If your system executes the same strings multiple times, precompiling them to bytecode in this fashion can be a major speedup.
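How large the speedup is depends on the code string and the machine, so it is worth measuring rather than assuming. The Python 3 sketch below times two approaches with the standard timeit module. The first passes the raw string to exec, which compiles it on every run. The second runs one code object compiled ahead of time. `compare_compile_cost` is a hypothetical helper. No particular ratio is claimed, because actual numbers vary by system and Python version.

```python
import timeit

def compare_compile_cost(code="X = X + 1", runs=10000):
    """Return (seconds re-compiling each run, seconds reusing one precompiled code object)."""
    ns = {"X": 0}
    codeobj = compile(code, "<embed>", "exec")                        # compile once, up front
    t_string = timeit.timeit(lambda: exec(code, ns, ns), number=runs)  # compiles on every call
    t_code = timeit.timeit(lambda: exec(codeobj, ns, ns), number=runs) # reuses the bytecode
    return t_string, t_code

t_string, t_code = compare_compile_cost()
print("string: %.4fs  precompiled: %.4fs" % (t_string, t_code))
```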