Pickled Objects
Probably the biggest limitation of DBM keyed files lies in what they can store: data stored under a key must be a simple text string. If you want to store Python objects in a DBM file, you can sometimes convert them to and from strings manually on writes and reads (e.g., with str and eval calls), but this only takes you so far. For arbitrarily complex Python objects such as class instances, you need something more. Class instance objects, for example, cannot later be recreated from their standard string representations.
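The str/eval limitation is easy to demonstrate. The following is a minimal sketch, assuming Python 3. Point is a hypothetical class defined only for this illustration. A dictionary of built-in types survives the round trip, but an instance's default string form is only a description of the object, not code that eval can run.

```python
class Point:
    """A hypothetical class, defined only for this illustration."""
    def __init__(self, x, y):
        self.x = x
        self.y = y

# Simple built-in objects survive a str/eval round trip...
record = {'name': 'bob', 'jobs': ['dev', 'mgr']}
assert eval(str(record)) == record

# ...but an instance's default string form is only a description,
# something like "<__main__.Point object at 0x...>", not Python code.
p = Point(1, 2)
text = str(p)
try:
    eval(text)
    recreated = True
except SyntaxError:
    recreated = False
print(text)
print('recreated:', recreated)   # recreated: False
```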
The Python pickle module, a standard part of the Python system, provides the conversion step needed. It converts Python in-memory objects to and from a single linear string format, suitable for storing in flat files, shipping across network sockets, and so on. This conversion from object to string is often called serialization -- arbitrary data structures in memory are mapped to a serial string form. The string representation used for objects is also sometimes referred to as a byte-stream, due to its linear format.
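Pickling provides exactly the conversion step that DBM files lack. The following is a minimal sketch, assuming Python 3, where pickled data is a bytes object rather than a str. pickle.dumps serializes a dictionary so it can be stored under a DBM key, and pickle.loads rebuilds it on fetch. The sketch uses dbm.dumb, the pure-Python DBM implementation included with Python 3, so it does not depend on which native DBM library is installed. The file name people and the record contents are made up for the example.

```python
import dbm.dumb
import os
import pickle
import tempfile

record = {'name': 'bob', 'jobs': ['dev', 'mgr'], 'pay': 40000}

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'people')      # hypothetical DBM file name

    with dbm.dumb.open(path, 'c') as db:
        db['bob'] = pickle.dumps(record)       # object -> serialized bytes on write

    with dbm.dumb.open(path, 'r') as db:
        fetched = pickle.loads(db['bob'])      # serialized bytes -> new object on read

print(fetched == record, fetched is record)    # True False
```

The fetched dictionary is equal to the original but is a newly built object, because unpickling recreates the data structure in memory.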
16.4.1 Using Object Pickling
Pickling may sound complicated the first time you encounter it, but the good news is that Python hides all the complexity of object-to-string conversion. In fact, the pickle module's interfaces are incredibly simple to use. The following list describes a few details of this interface.
P = pickle.Pickler(file)
Make a new pickler for pickling to an open output file object file.
P.dump(object)
Write an object onto the pickler's file/stream.
pickle.dump(object, file)
Same as the last two calls combined: pickle an object onto an open file.
U = pickle.Unpickler(file)
Make an unpickler for unpickling from an open input file object file.
object = U.load()
Read an object from the unpickler's file/stream.
object = pickle.load(file)
Same as the last two calls combined: unpickle an object from an open file.
string = pickle.dumps(object)
Return the pickled representation of object as a character string.
object = pickle.loads(string)
Read an object from a character string instead of a file.
Pickler and Unpickler are exported classes. In all of these, file is either an open file object or any object that implements the same attributes as file objects:
- Pickler calls the file's write method with a string argument.
- Unpickler calls the file's read method with a byte count, and readline without arguments.
Any object that provides these attributes can be passed in as the "file" parameter. In particular, file can be an instance of a Python class that provides the read/write methods. This lets you map pickled streams to in-memory objects for arbitrary use. It also lets you ship Python objects across a network, by passing sockets wrapped to look like files to pickle calls at the sender and unpickle calls at the receiver (see "Making Sockets Look Like Files" in Chapter 10 for more details).
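The file-object protocol can be exercised without a real file. The following is a minimal sketch, assuming Python 3, where pickled data is bytes. MemoryStream is a hypothetical class written only for this illustration. It provides write for the Pickler and read/readline for the Unpickler, and nothing else. The sketch also shows that one Pickler can dump several objects onto a stream, and that one Unpickler loads them back in the same order.

```python
import pickle


class MemoryStream:
    """Hypothetical file-like class: only the methods pickle needs."""

    def __init__(self):
        self.data = b''
        self.pos = 0

    def write(self, chunk):
        # Pickler calls write() with the serialized bytes.
        chunk = bytes(chunk)
        self.data += chunk
        return len(chunk)

    def read(self, size=-1):
        # Unpickler calls read() with a byte count.
        if size is None or size < 0:
            size = len(self.data) - self.pos
        chunk = self.data[self.pos:self.pos + size]
        self.pos += len(chunk)
        return chunk

    def readline(self):
        # Unpickler calls readline() with no arguments.
        end = self.data.find(b'\n', self.pos)
        end = len(self.data) if end == -1 else end + 1
        chunk = self.data[self.pos:end]
        self.pos = end
        return chunk


stream = MemoryStream()
P = pickle.Pickler(stream)
P.dump([1, 2, 3])                 # several objects, one after another
P.dump({'name': 'bob'})

U = pickle.Unpickler(stream)      # starts reading at position 0
first = U.load()                  # objects come back in dump order
second = U.load()
print(first, second)              # [1, 2, 3] {'name': 'bob'}
```

A socket wrapped to look like a file plays the same role as MemoryStream here. Pickle never needs to know what kind of object sits behind the methods.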
In more typical use, to pickle an object to a flat file, we just open the file in binary write mode and call the dump function; to unpickle, we reopen it in binary read mode and call load:
% python
>>> import pickle
>>> table = {'a': [1, 2, 3], 'b': ['spam', 'eggs'], 'c': {'name': 'bob'}}
>>> mydb = open('dbase', 'wb')
>>> pickle.dump(table, mydb)
>>> mydb.close()

% python
>>> import pickle
>>> mydb = open('dbase', 'rb')
>>> table = pickle.load(mydb)
>>> table
{'b': ['spam', 'eggs'], 'a': [1, 2, 3], 'c': {'name': 'bob'}}
To make this process simpler still, the module in Example 16-1 wraps pickling and unpickling calls in functions that also open the files where the serialized form of the object is stored.
Example 16-1. PP2E\Dbase\filepickle.py
import pickle

def saveDbase(filename, object):
    file = open(filename, 'wb')       # binary mode: pickled data is not plain text
    pickle.dump(object, file)         # pickle to file
    file.close()                      # any file-like object will do

def loadDbase(filename):
    file = open(filename, 'rb')
    object = pickle.load(file)        # unpickle from file
    file.close()                      # recreates object in memory
    return object
To store and fetch now, simply call these module functions:
C:\...\PP2E\Dbase>python
>>> from filepickle import *
>>> L = [0]
>>> D = {'x':0, 'y':L}
>>> table = {'A':L, 'B':D}           # L appears twice
>>> saveDbase('myfile', table)       # serialize to file

C:\...\PP2E\Dbase>python
>>> from filepickle import *
>>> table = loadDbase('myfile')      # reload/unpickle
>>> table
{'B': {'x': 0, 'y': [0]}, 'A': [0]}
>>> table['A'][0] = 1
>>> saveDbase('myfile', table)

C:\...\PP2E\Dbase>python
>>> from filepickle import *
>>> print(loadDbase('myfile'))       # both L's updated as expected
{'B': {'x': 0, 'y': [1]}, 'A': [1]}
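Both L's are updated because pickle preserves shared references. The following is a minimal in-memory sketch, assuming Python 3, that checks this directly with dumps/loads instead of a file. Within one pickled object, an object referenced twice comes back as a single shared object. Separate pickling calls, however, produce separate copies.

```python
import pickle

L = [0]
D = {'x': 0, 'y': L}
table = {'A': L, 'B': D}                      # L appears twice

restored = pickle.loads(pickle.dumps(table))  # one pickle holds the whole structure
print(restored['A'] is restored['B']['y'])    # True: still one shared list
restored['A'][0] = 1                          # change it through one path...
print(restored['B']['y'])                     # [1]  ...and it shows up through the other
print(table['A'])                             # [0]  the original is untouched

# Sharing is tracked per pickle: two separate dumps make two separate lists.
a = pickle.loads(pickle.dumps(L))
b = pickle.loads(pickle.dumps(L))
print(a == b, a is b)                         # True False
```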
Python can pickle just about anything, except compiled code objects, instances of classes that do not follow importability rules we'll meet later, and instances of some built-in and user-defined types that are coded in C or depend upon transient operating system states (e.g., open file objects cannot be pickled). An exception is raised if an object cannot be pickled: usually a PicklingError, though some types, such as open files, raise a TypeError instead.
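The failure cases can be probed with a small helper. The following is a minimal sketch, assuming Python 3. can_pickle is a hypothetical helper, not part of the pickle module. Because the exception class varies with the object and the Python version, the helper catches PicklingError, TypeError, and AttributeError.

```python
import pickle
import tempfile


def can_pickle(obj):
    """Hypothetical helper: report whether obj survives pickle.dumps."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, TypeError, AttributeError):
        return False


with tempfile.TemporaryFile() as f:
    print(can_pickle(f))                  # False: open file
print(can_pickle(lambda x: x))            # False: anonymous function
print(can_pickle({'a': [1, 2, 3]}))       # True: built-in data
```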
Refer to Python's library manual for more information on the pickler. And while you are flipping (or clicking) through that manual, be sure to also see the entries for the cPickle module -- a reimplementation of pickle coded in C for faster performance. Also check out marshal, a module that serializes an object, too, but can only handle simple object types. If available in your Python, the shelve module automatically chooses the cPickle module for faster serialization, not pickle. I haven't explained shelve yet, but I will now.
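The difference between marshal and pickle shows up as soon as a non-simple type appears. The following is a small sketch, assuming Python 3, where cPickle no longer exists as a separate module and pickle uses its C accelerator automatically. A datetime.date object serves here only as an example of a type that marshal does not support.

```python
import datetime
import marshal
import pickle

simple = {'a': [1, 2, 3], 'b': ('spam', 2.5)}
assert marshal.loads(marshal.dumps(simple)) == simple   # simple built-ins: fine

when = datetime.date(2000, 1, 1)
try:
    marshal.dumps(when)
    marshaled = True
except ValueError:               # marshal rejects types it does not support
    marshaled = False

pickled = pickle.loads(pickle.dumps(when)) == when
print(marshaled, pickled)        # False True
```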