Parser Generators
If you have any background in parsing theory, you may know that neither regular expressions nor string splitting is powerful enough to handle more complex language grammars (roughly, they lack the "memory" needed to keep track of arbitrarily nested constructs, which true grammars can describe). For more sophisticated language analysis tasks, we sometimes need a full-blown parser. Because Python is built for integrating C tools, we can write integrations to traditional parser generator systems such as yacc and bison. Better yet, we could use an integration that already exists.
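To make the "memory" point concrete, here is a minimal sketch that checks for balanced parentheses. The names ONE_LEVEL and balanced are made up for this illustration. The pattern is hand-written for exactly one extra level of nesting; a deeper input needs a longer pattern, and no single pattern of this kind covers every depth. The counter-based function, by contrast, remembers how many parentheses are still open.

```python
import re

# Hypothetical pattern for illustration: it accepts an outer pair of
# parentheses containing at most one further level of nesting.
ONE_LEVEL = re.compile(r'\((?:[^()]|\([^()]*\))*\)')

def balanced(text):
    """Return True if every '(' in text is closed by a later ')'."""
    depth = 0                      # the "memory": how many '(' are still open
    for ch in text:
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth < 0:          # a ')' arrived before its '('
                return False
    return depth == 0

if __name__ == '__main__':
    for sample in ('(a(b)c)', '(a(b(c)))'):
        print(sample, bool(ONE_LEVEL.fullmatch(sample)), balanced(sample))
```

The second sample is properly balanced, but it nests one level deeper than the pattern was written for, so only the counter-based check accepts it. A parser generalizes the same idea by using a stack (or recursion) in place of a single counter.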
There are also Python-specific parsing systems accessible from Python's web site. Among them, the kwParsing system, developed by Aaron Watters, is a parser generator written in Python, and the SPARK toolkit, developed by John Aycock, is a lightweight system that employs the Earley algorithm to work around technical problems with LALR parser generation (if you don't know what that means, you probably don't need to care). Since these are all complex tools, though, we'll skip their details in this text. Consult http://www.python.org for information on parser generator tools available for use in Python programs.
Of special interest to this chapter is YAPPS (Yet Another Python Parser System), a parser generator written in Python. It uses supplied rules to generate human-readable Python code that implements a recursive descent parser. The parsers generated by YAPPS look much like (and are inspired by) the hand-coded expression parsers shown in the next section. YAPPS creates LL(1) parsers, which are not as powerful as LALR parsers but are sufficient for many language tasks. For more on YAPPS, see http://theory.stanford.edu/~amitp/Yapps.
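For a rough feel of what a recursive descent LL(1) parser looks like, here is a small hand-written sketch for arithmetic expressions. It is not YAPPS output and does not use the YAPPS rule syntax; the grammar, the tokenize function, and the Parser class are all assumptions made up for this illustration. Each grammar rule becomes one method, and each method decides what to do by looking at only the next token, which is the "1" in LL(1).

```python
import re

# Assumed grammar for this sketch:
#   expr   : term   (('+' | '-') term)*
#   term   : factor (('*' | '/') factor)*
#   factor : NUMBER | '(' expr ')'

TOKEN = re.compile(r'\s*(?:(\d+\.?\d*)|(.))')

def tokenize(text):
    """Split text into ('num', value) and ('op', char) tokens."""
    tokens = []
    for number, op in TOKEN.findall(text):
        if number:
            tokens.append(('num', float(number)))
        elif op.strip():                      # ignore stray whitespace
            tokens.append(('op', op))
    tokens.append(('end', None))              # sentinel marks end of input
    return tokens

class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]          # one token of lookahead

    def advance(self):
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def parse(self):
        value = self.expr()
        if self.peek()[0] != 'end':
            raise SyntaxError('unexpected token %r' % (self.peek()[1],))
        return value

    def expr(self):
        value = self.term()
        while self.peek() in (('op', '+'), ('op', '-')):
            op = self.advance()[1]
            right = self.term()
            value = value + right if op == '+' else value - right
        return value

    def term(self):
        value = self.factor()
        while self.peek() in (('op', '*'), ('op', '/')):
            op = self.advance()[1]
            right = self.factor()
            value = value * right if op == '*' else value / right
        return value

    def factor(self):
        kind, val = self.advance()
        if kind == 'num':
            return val
        if (kind, val) == ('op', '('):
            value = self.expr()               # recursion handles nesting
            if self.advance() != ('op', ')'):
                raise SyntaxError("expected ')'")
            return value
        raise SyntaxError('unexpected token %r' % (val,))

if __name__ == '__main__':
    print(Parser('2 + 3 * (4 - 1)').parse())
```

This sketch evaluates the expression as it parses rather than building a tree, which keeps it short. Operator precedence falls out of the rule layering: term binds tighter than expr because expr calls term, not the other way around.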