Mastering Regular Expressions
4.7. Summary
If you understood everything in this chapter the first time you read it, you probably didn't need to read it in the first place. It's heady stuff, to say the least. It took me quite a while to understand it, and then longer still to understand it. I hope this one concise presentation makes it easier for you. I've tried to keep the explanation simple without falling into the trap of oversimplification (an unfortunately all-too-common occurrence which hinders true understanding). This chapter has a lot in it, so I've included a lot of page references in the following summary, for when you'd like to quickly check back on something. There are two underlying technologies commonly used to implement a regex match engine, " regex-directed NFA" (˜ 153) and "text-directed DFA" (˜ 155). The abbreviations are spelled out on page 156. Combine the two technologies with the POSIX standard (˜ 178), and for practical purposes, there are three types of engines:
To get the most out of a utility, you need to understand which type of engine it uses, and craft your regular expressions appropriately. The most common type is the Traditional NFA, followed by the DFA. Table 4-1 (˜ 145) lists a few common tools and their engine types, and the section "Testing the Engine Type" (˜ 146) shows how you can test the type yourself. One overriding rule regardless of engine type: matches starting sooner take precedence over matches starting later. This is due to how the engine's "transmission" tests the regex at each point in the string (˜ 148). For the match attempt starting at any given spot:
Understanding the concepts and practices covered in this chapter is the foundation for writing correct and efficient regular expressions, which just happens to be the subject of the next two chapters. |