DBM Files
Flat files are handy for simple persistence tasks, but are generally geared towards a sequential processing mode. Although it is possible to jump around to arbitrary locations with seek calls, flat files don't provide much structure to data beyond the notion of bytes and text lines.
DBM files, a standard tool in the Python library for database management, improve on that by providing key-based access to stored text strings. They implement a random-access, single-key view on stored data. For instance, information related to objects can be stored in a DBM file using a unique key per object and later can be fetched back directly with the same key. DBM files are implemented by a variety of underlying modules (including one coded in Python), but if you have Python, you have a DBM.
16.3.1 Using DBM Files
Although DBM filesystems have to do a bit of work to map chunks of stored data to keys for fast retrieval (technically, they generally use a technique called hashing to store data in files), your scripts don't need to care about the action going on behind the scenes. In fact, DBM is one of the easiest ways to save information in Python -- DBM files behave so much like in-memory dictionaries that you may forget you're actually dealing with a file. For instance, given a DBM file object:
- Indexing by key fetches data from the file.
- Assigning to an index stores data in the file.
DBM file objects also support common dictionary operations such as fetching the list of keys, testing for a key, and deleting a key. The DBM library itself is hidden behind this simple model. Since it is so simple, let's jump right into an interactive example that creates a DBM file and shows how the interface works:
    % python
    >>> import anydbm                            # get interface: dbm, gdbm, ndbm,..
    >>> file = anydbm.open('movie', 'c')         # make a dbm file called 'movie'
    >>> file['Batman'] = 'Pow!'                  # store a string under key 'Batman'
    >>> file.keys()                              # get the file's key directory
    ['Batman']
    >>> file['Batman']                           # fetch value for key 'Batman'
    'Pow!'
    >>> who  = ['Robin', 'Cat-woman', 'Joker']
    >>> what = ['Bang!', 'Splat!', 'Wham!']
    >>> for i in range(len(who)):
    ...     file[who[i]] = what[i]               # add 3 more "records"
    ...
    >>> file.keys()
    ['Joker', 'Robin', 'Cat-woman', 'Batman']
    >>> len(file), file.has_key('Robin'), file['Joker']
    (4, 1, 'Wham!')
    >>> file.close()                             # close sometimes required
Internally, importing anydbm automatically loads whatever DBM interface is available in your Python interpreter, and opening the new DBM file creates one or more external files with names that start with the string "movie" (more on the details in a moment). But after the import and open, a DBM file is virtually indistinguishable from a dictionary. In effect, the object called file here can be thought of as a dictionary mapped to an external file called movie.
Unlike normal dictionaries, though, the contents of file are retained between Python program runs. If we come back later and restart Python, our dictionary is still available. DBM files are like dictionaries that must be opened:
    % python
    >>> import anydbm
    >>> file = anydbm.open('movie', 'c')         # open existing dbm file
    >>> file['Batman']
    'Pow!'
    >>> file.keys()                              # keys gives an index list
    ['Joker', 'Robin', 'Cat-woman', 'Batman']
    >>> for key in file.keys(): print key, file[key]
    ...
    Joker Wham!
    Robin Bang!
    Cat-woman Splat!
    Batman Pow!
    >>> file['Batman'] = 'Ka-Boom!'              # change Batman slot
    >>> del file['Robin']                        # delete the Robin entry
    >>> file.close()                             # close it after changes
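For readers on Python 3, the same store-then-reopen pattern can be sketched as follows. This is a rough equivalent rather than a port of the sessions above: it assumes Python 3, where anydbm was renamed dbm, has_key gave way to the in operator, and keys and values come back as bytes. The movie_db helper and the temporary directory are hypothetical names used only for illustration.

```python
import dbm
import os
import tempfile

def movie_db(path):
    """Store two records, close the file, then reopen it and read them back."""
    db = dbm.open(path, 'c')             # create the file if it does not exist
    try:
        db['Batman'] = 'Pow!'            # str keys and values are stored as bytes
        db['Robin'] = 'Bang!'
    finally:
        db.close()

    db = dbm.open(path, 'c')             # reopen, as a later program run would
    try:
        return len(db), b'Robin' in db, db['Batman']
    finally:
        db.close()

workdir = tempfile.mkdtemp()             # hypothetical scratch location
result = movie_db(os.path.join(workdir, 'movie'))
print(result)
```

The second open finds both records even though the first file object was closed in between, which is the persistence the sessions above demonstrate across interpreter restarts.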
Apart from having to import the interface and open and close the DBM file, Python programs don't have to know anything about DBM itself. DBM modules achieve this integration by overloading the indexing operations and routing them to more primitive library tools. But you'd never know that from looking at this Python code -- DBM files look like normal Python dictionaries, stored on external files. Changes made to them are retained indefinitely:
    % python
    >>> import anydbm                            # open dbm file again
    >>> file = anydbm.open('movie', 'c')
    >>> for key in file.keys(): print key, file[key]
    ...
    Joker Wham!
    Cat-woman Splat!
    Batman Ka-Boom!
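The overloading of indexing operations mentioned earlier can be illustrated with a small class. This is only a sketch of the general technique, not how any real DBM module is written: the KeyedFile class and its put_record, get_record, and drop_record "primitive" methods are hypothetical stand-ins, backed here by an in-memory dictionary instead of a disk file.

```python
class KeyedFile:
    """Hypothetical keyed store: dictionary syntax routed to primitive calls."""

    def __init__(self):
        self._records = {}               # stand-in for the on-disk structure

    # "Primitive" operations (hypothetical names, for illustration only)
    def put_record(self, key, value):
        self._records[key] = value

    def get_record(self, key):
        return self._records[key]

    def drop_record(self, key):
        del self._records[key]

    # Operator overloading: map dictionary syntax onto the primitives
    def __setitem__(self, key, value):   # store[key] = value
        self.put_record(key, value)

    def __getitem__(self, key):          # store[key]
        return self.get_record(key)

    def __delitem__(self, key):          # del store[key]
        self.drop_record(key)

    def __len__(self):                   # len(store)
        return len(self._records)

    def keys(self):                      # store.keys()
        return list(self._records)


store = KeyedFile()
store['Batman'] = 'Pow!'
store['Robin'] = 'Bang!'
del store['Robin']
```

Client code touches only the indexing syntax; the primitive calls underneath could be swapped for real file operations without changing how the object is used.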
As you can see, this is about as simple as it can be. Table 16-1 lists the most commonly used DBM file operations. Once such a file is opened, it is processed just as though it were an in-memory Python dictionary. Items are fetched by indexing the file object by key and stored by assigning to a key.
Python Code | Action | Description |
---|---|---|
import anydbm | Import | Get dbm, gdbm, ... whatever is installed |
file = anydbm.open('filename', 'c') | Open[1] | Create or open an existing DBM file |
file['key'] = 'value' | Store | Create or change the entry for key |
value = file['key'] | Fetch | Load the value for entry key |
count = len(file) | Size | Return the number of entries stored |
index = file.keys() | Index | Fetch the stored keys list |
found = file.has_key('key') | Query | See if there's an entry for key |
del file['key'] | Delete | Remove the entry for key |
file.close() | Close | Manual close, not always needed |
[1] In Python versions 1.5.2 and later, be sure to pass the string 'c' as a second argument when calling anydbm.open, to force Python to create the file if it does not yet exist and simply open it otherwise. This used to be the default behavior but no longer is. You do not need the 'c' argument when opening the shelves discussed ahead; they still use an "open or create" mode by default if passed no open mode argument. Other open mode strings can be passed to anydbm.open (e.g., 'n' to always create the file, and 'r' for read-only, the new default); see the library reference manuals for more details.
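The open-mode behavior described in this note can be checked with a short script. The sketch below assumes Python 3, where anydbm is named dbm and the same mode strings apply; the temporary path is a hypothetical scratch location.

```python
import dbm
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'modes')   # hypothetical scratch file

# The default mode is 'r': opening a file that does not exist is an error
try:
    dbm.open(path)
    opened_missing = True
except dbm.error:
    opened_missing = False

# 'c' creates the file if needed, and otherwise opens the existing one
db = dbm.open(path, 'c')
db['key'] = 'value'
db.close()

db = dbm.open(path, 'c')
count_after_c = len(db)       # the entry stored earlier is still present
db.close()

# 'n' always starts a new, empty file, discarding earlier contents
db = dbm.open(path, 'n')
count_after_n = len(db)
db.close()
```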
Despite the dictionary-like interface, DBM files really do map to one or more external files. For instance, the underlying gdbm interface writes two files, movie.dir and movie.pag, when a GDBM file called movie is made. If your Python was built with a different underlying keyed-file interface, different external files might show up on your computer.
Technically, module anydbm is really an interface to whatever DBM-like filesystem you have available in your Python. When creating a new file, anydbm today tries to load the dbhash, gdbm, and dbm keyed-file interface modules; Pythons without any of these automatically fall back on an all-Python implementation called dumbdbm. When opening an already-existing DBM file, anydbm tries to determine the system that created it with the whichdb module instead. You normally don't need to care about any of this, though (unless you delete the files your DBM creates).
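To see the detection step in isolation, the following sketch creates a file with the all-Python implementation and then asks whichdb what produced it. It assumes Python 3, where dumbdbm and whichdb live on as dbm.dumb and dbm.whichdb; the temporary path is a hypothetical scratch location.

```python
import dbm
import dbm.dumb
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'movie')   # hypothetical scratch file

db = dbm.dumb.open(path, 'c')     # force the pure-Python implementation
db['Batman'] = 'Pow!'
db.close()                        # write the index out to disk

creator = dbm.whichdb(path)       # name of the module that made the file
print(creator)

db = dbm.open(path, 'r')          # the generic open dispatches to that module
value = db['Batman']
db.close()
```

The generic open call never names the implementation; it relies on the same detection to pick the right module for an existing file.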
Note that DBM files may or may not need to be explicitly closed, per the last entry in Table 16-1. Some DBM files don't require a close call, but some depend on it to flush changes out to disk. On such systems, your file may be corrupted if you omit the close call. Unfortunately, the default DBM on the 1.5.2 Windows Python port, dbhash (a.k.a. bsddb), is one of the DBM systems that requires a close call to avoid data loss. As a rule of thumb, always close your DBM files explicitly after making changes and before your program exits, to avoid potential problems. This rule extends by proxy to shelves, a topic we'll meet later in this chapter.
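One way to follow this rule even when an exception interrupts the updates is a try/finally block. The sketch below is written for Python 3 (dbm in place of anydbm, with bytes coming back from the file); the save_records helper and the temporary path are hypothetical names used for illustration.

```python
import dbm
import os
import tempfile

def save_records(path, records):
    """Write a dict of records to a DBM file, always closing it afterward."""
    db = dbm.open(path, 'c')
    try:
        for key, value in records.items():
            db[key] = value
    finally:
        db.close()                # runs even if a store above raises

path = os.path.join(tempfile.mkdtemp(), 'movie')   # hypothetical scratch file
save_records(path, {'Batman': 'Ka-Boom!', 'Joker': 'Wham!'})

# Reopen read-only to confirm the changes reached the disk
db = dbm.open(path, 'r')
try:
    saved = {key: db[key] for key in db.keys()}
finally:
    db.close()
```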