Mastering Regular Expressions

5.1. Regex Balancing Act

Writing a good regex involves striking a balance among several concerns:

  • Matching what you want, but only what you want

  • Keeping the regex manageable and understandable

  • For an NFA, being efficient (creating a regex that leads the engine quickly to a match or a non-match, as the case may be)

These concerns are often context-dependent. If I'm working on the command line and just want to grep something quickly, I probably don't care if I match a bit more than I need, and I usually won't bother to craft just the right regex for it. I'll allow myself to be sloppy in the interest of time, since I can quickly peruse the output for what I want. However, when I'm working on an important program, it's worth the time and effort to get it right: a complex regular expression is OK if that's what it takes. There is a balance among all these issues.

Efficiency is context-dependent, even in a program. For example, with an NFA, something long like ^-(display|geometry|cemap|…|quick24|random|raw)$ to check command-line arguments is inefficient because of all that alternation, but since it is only checking command-line arguments (something done perhaps a few times at the start of the program), it wouldn't matter if it took 100 times longer than needed. It's just not an important place to worry much about efficiency. Were it used to check each line of a potentially large file, the inefficiency would penalize you for the duration of the program.
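
As a rough illustration of that tradeoff, here is a minimal Python sketch (Python's re module is a backtracking, NFA-style engine). It uses only the option names visible in the pattern above; the alternatives the text elides are left out rather than guessed. The names OPTION_RE, OPTION_NAMES, is_option_regex, and is_option_lookup are hypothetical, and the set lookup is just one possible cheaper check, not a method the text prescribes.

```python
import re

# Only the option names shown in the text; the elided ones are omitted.
OPTION_RE = re.compile(r"^-(display|geometry|cemap|quick24|random|raw)$")
OPTION_NAMES = frozenset(
    {"display", "geometry", "cemap", "quick24", "random", "raw"}
)


def is_option_regex(arg: str) -> bool:
    """Check an argument with the alternation-heavy regex.

    A backtracking engine may try each alternative in turn before it can
    report a match or a failure.
    """
    return OPTION_RE.match(arg) is not None


def is_option_lookup(arg: str) -> bool:
    """Check the same thing with a leading-dash test and a set lookup."""
    return arg.startswith("-") and arg[1:] in OPTION_NAMES


if __name__ == "__main__":
    # A handful of command-line arguments: either check is fine here.
    for arg in ["-geometry", "-raw", "-bogus", "geometry"]:
        print(arg, is_option_regex(arg), is_option_lookup(arg))
```

For a few arguments checked once at startup, the difference between the two functions is not worth thinking about. If the same check ran on every line of a large file, measuring both (for instance with the standard timeit module) would show whether the alternation cost matters in practice.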
