Strings and Text
Introduction
This chapter contains recipes for working with strings and text files. Most C++ programs, regardless of their application, manipulate strings and text files to some degree. Despite the variety of applications, however, the requirements are often the same. For strings, they include trimming, padding, searching, and splitting; for text files, they include wrapping, reformatting, reading delimited files, and more. The recipes that follow provide solutions to many of these common needs that do not have ready-made solutions in the C++ standard library.
The standard library is portable, standardized, and, in general, at least as efficient as homemade solutions, so in the following examples I have preferred it over code from scratch. It contains a rich framework for manipulating and managing strings and text, much of which is in the form of the class templates basic_string (for strings), basic_istream, and basic_ostream (for input and output text streams). Almost all of the techniques in this chapter use or extend these class templates. In cases where they didn't have what I wanted, I turned to another area of the standard library that is full of generic, prebuilt solutions: algorithms and containers.
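To illustrate how a generic algorithm can fill a gap in basic_string, here is a minimal sketch that trims trailing whitespace with std::find_if. The names trimRight and notSpace are hypothetical and chosen only for this illustration. The sketch assumes narrow-character strings and the classification rules of the current C locale.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: true if c is not whitespace.
// The cast to unsigned char avoids undefined behavior in std::isspace
// when char is signed and c is negative.
inline bool notSpace(char c) {
    return !std::isspace(static_cast<unsigned char>(c));
}

// Hypothetical function: remove trailing whitespace from s in place.
void trimRight(std::string& s) {
    // Search backward for the last non-whitespace character.
    std::string::reverse_iterator last =
        std::find_if(s.rbegin(), s.rend(), notSpace);

    // last.base() points one past that character in forward order,
    // so everything from there to the end is whitespace.
    s.erase(last.base(), s.end());
}
```

Calling trimRight on a string holding "hello" followed by spaces and a newline leaves "hello". A string containing only whitespace becomes empty, because find_if returns rend() and its base() is begin().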
Everybody uses strings, so chances are that if what you need isn't in the standard library, someone has written it. The Boost String Algorithms library, written by Pavol Droba, fills many of the gaps in the standard library by implementing most of the algorithms that you've had to use at one time or another, and it does so in a portable, efficient way. Check out the Boost project at www.boost.org for more information and documentation of the String Algorithms library. There is some overlap between the String Algorithms library and the solutions I present in this chapter. In most cases, I provide examples of, or at least mention, Boost algorithms that are related to the solutions presented.
For most examples, I have provided both a nontemplate and a template version. I did this for two reasons. First, most of the areas of the standard library that use character data are class or function templates that are parameterized on the type of character, narrow (char) or wide (wchar_t). By following this model, you will help maximize the compatibility of your software with the standard library. Second, whether you are working with the standard library or not, class and function templates provide an excellent facility for writing generic software. If you do not need templates, however, you can use the nontemplate versions, though I recommend experimenting with templates if you are new to them.
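As a rough sketch of what such a pair of versions can look like, consider padding a string on the right to a given width. The names padRight and padRightT are hypothetical and used only for illustration; they are not taken from the standard library or from Boost.

```cpp
#include <string>

// Nontemplate version: works only with narrow-character strings.
void padRight(std::string& s, std::string::size_type width, char c) {
    if (s.length() < width)
        s.append(width - s.length(), c);  // append the missing characters
}

// Template version: parameterized on the character type (and on the
// traits and allocator), so it works with string, wstring, or any
// other basic_string specialization.
template <typename CharT, typename Traits, typename Alloc>
void padRightT(std::basic_string<CharT, Traits, Alloc>& s,
               typename std::basic_string<CharT, Traits, Alloc>::size_type width,
               CharT c) {
    if (s.length() < width)
        s.append(width - s.length(), c);
}
```

The two bodies are identical; only the signature changes. With a std::string s, padRight(s, 10, '.') and padRightT(s, 10, '.') behave the same way, while a std::wstring w requires the template version, as in padRightT(w, 10, L'.').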
The standard library makes heavy use of templates and uses typedefs to insulate programmers from some of the verbose syntax that templates use. As a result, I use the terms basic_string, string, and wstring interchangeably, since what applies to one usually applies to them all. string and wstring are just typedefs for basic_string<char> and basic_string<wchar_t>.
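The relationship between these names can be checked at compile time. The following sketch assumes a C++11 or later compiler (for static_assert and the type_traits header); countChar is a hypothetical function included only to show one template accepting both typedefs.

```cpp
#include <string>
#include <type_traits>

// string and wstring name the same types as the corresponding
// basic_string specializations, so these assertions compile cleanly.
static_assert(std::is_same<std::string, std::basic_string<char> >::value,
              "string is basic_string<char>");
static_assert(std::is_same<std::wstring, std::basic_string<wchar_t> >::value,
              "wstring is basic_string<wchar_t>");

// Hypothetical example: one function template that accepts either typedef.
template <typename CharT>
typename std::basic_string<CharT>::size_type
countChar(const std::basic_string<CharT>& s, CharT c) {
    typename std::basic_string<CharT>::size_type n = 0;
    for (typename std::basic_string<CharT>::const_iterator p = s.begin();
         p != s.end(); ++p) {
        if (*p == c)
            ++n;  // count each occurrence of c
    }
    return n;
}
```

Because the typedefs are aliases and not distinct types, countChar can be called with a std::string and a char, or with a std::wstring and a wchar_t, without any conversion.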
Finally, you will probably notice that none of the recipes in this chapter use C-style strings, i.e., null-terminated character arrays. The standard library provides such a wealth of efficient and extensible support for C++ strings that to use C-style string functions (which were provided primarily for backward compatibility anyway) is to forgo the flexibility, safety, and generic nature of what you get for free with your compiler: C++ string classes.