Trimming a String
Problem
You need to trim some number of characters from the end(s) of a string, usually whitespace.
Solution
Use iterators to identify the portion of the string you want to remove, and the erase member function to remove it. Example 4-2 presents the function rtrim, which removes all trailing occurrences of a given character from a string.
Example 4-2. Trimming characters from a string
#include <string>
#include <iostream>

// The approach for narrow character strings
void rtrim(std::string& s, char c)
{
   if (s.empty( ))
      return;

   // Walk backward from the end while the characters match c
   std::string::iterator p;
   for (p = s.end( ); p != s.begin( ) && *--p == c;);

   // If the loop stopped on a nonmatching character, keep it
   if (*p != c)
      p++;

   s.erase(p, s.end( ));
}

int main( )
{
   std::string s = "zoo";

   rtrim(s, 'o');

   std::cout << s << '\n';
}
Discussion
Example 4-2 will do the trick, but only for strings of chars. Just like you saw in Example 4-1, you can take advantage of the generic design of basic_string and use a function template instead. Example 4-3 uses a function template to trim characters from the end of any kind of character string.
Example 4-3. A generic version of rtrim
#include <string>
#include <iostream>

using namespace std;

// The generic approach for trimming single
// characters from a string
template<typename T>
void rtrim(basic_string<T>& s, T c)
{
   if (s.empty( ))
      return;

   typename basic_string<T>::iterator p;
   for (p = s.end( ); p != s.begin( ) && *--p == c;);

   if (*p != c)
      p++;

   s.erase(p, s.end( ));
}

int main( )
{
   string s = "Great!!!!";
   wstring ws = L"Super!!!!";

   rtrim(s, '!');
   rtrim(ws, L'!');

   cout << s << '\n';
   wcout << ws << L'\n';
}
This function works exactly the same way as the previous, nongeneric, version in Example 4-2, but since it is parameterized on the type of character being used, it will work for basic_strings of any kind.
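As an alternative to the hand-written iterator loop, the same trimming can be sketched with basic_string's own find_last_not_of member function. The name rtrim_fl is hypothetical and is used here only for illustration; it does not appear in the examples above.

```cpp
#include <string>

// Hypothetical helper (not from the examples above): trims trailing
// occurrences of c using find_last_not_of instead of an explicit loop.
template<typename T>
void rtrim_fl(std::basic_string<T>& s, T c)
{
   // Index of the last character that is NOT c, or npos if every
   // character is c (or the string is empty).
   typename std::basic_string<T>::size_type pos = s.find_last_not_of(c);

   if (pos == std::basic_string<T>::npos)
      s.clear();         // Nothing but c: remove everything
   else
      s.erase(pos + 1);  // Drop everything after the last character kept
}
```

Calling rtrim_fl(s, '!') on a string holding "Great!!!!" leaves "Great" in s, and the same call works on a wstring with L'!'. The trade-off is that this version works on indexes rather than iterators, so it is tied to basic_string and does not carry over to other containers.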
Example 4-2 and Example 4-3 remove sequences of a single character from a string. Trimming whitespace is different, however, because whitespace can be one of several characters. Conveniently, the standard library provides a concise way to test for it: the isspace function in the <cctype> header (and its wchar_t equivalent, iswspace, in <cwctype>). Example 4-4 defines a generic function that trims trailing whitespace.
Example 4-4. Trim trailing whitespace
#include <string>
#include <iostream>
#include <cctype>
#include <cwctype>

using namespace std;

template<typename T, typename F>
void rtrimws(basic_string<T>& s, F f)
{
   if (s.empty( ))
      return;

   typename basic_string<T>::iterator p;
   for (p = s.end( ); p != s.begin( ) && f(*--p););

   if (!f(*p))
      p++;

   s.erase(p, s.end( ));
}

// Overloads to make cleaner calling for client code.
// The cast selects the <cctype> isspace rather than the <locale>
// template of the same name, which can otherwise make the call ambiguous.
void rtrimws(string& s)
{
   rtrimws(s, static_cast<int (*)(int)>(isspace));
}

void rtrimws(wstring& ws)
{
   rtrimws(ws, iswspace);
}

int main( )
{
   string s = "zing ";
   wstring ws = L"zong ";

   rtrimws(s);
   rtrimws(ws);

   cout << s << "|\n";
   wcout << ws << L"|\n";
}
The function template in Example 4-4, rtrimws, is similar to the previous examples: it accepts a basic_string and trims whitespace from the end of it. But unlike the other examples, it takes a function object instead of a character, and it calls that function object on each element of the string to determine whether the element should be removed.
You don't need to overload rtrimws as I did in the example, but it makes the syntax cleaner when using the function, since the calling code can omit the predicate function argument.
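The same predicate-based idea extends to the front of a string, which covers the other end mentioned in the problem statement. The following is a minimal sketch under stated assumptions: ltrimws and is_space_char are hypothetical names introduced here, not part of Example 4-4.

```cpp
#include <cctype>
#include <string>

// Hypothetical counterpart to rtrimws: erases leading characters for
// which the predicate f returns true.
template<typename T, typename F>
void ltrimws(std::basic_string<T>& s, F f)
{
   typename std::basic_string<T>::iterator p = s.begin();

   // Advance past every leading character that satisfies f
   while (p != s.end() && f(*p))
      ++p;

   s.erase(s.begin(), p);
}

// Hypothetical narrow-character predicate. The cast to unsigned char
// avoids undefined behavior in isspace for negative char values.
inline bool is_space_char(char c)
{
   return std::isspace(static_cast<unsigned char>(c)) != 0;
}

// Convenience overload so callers can omit the predicate
inline void ltrimws(std::string& s)
{
   ltrimws(s, is_space_char);
}
```

Because the predicate is a template parameter, the two-argument form also accepts a lambda or any other callable, for example ltrimws(ws, [](wchar_t c) { return c == L'x'; }) to strip leading x characters from a wstring.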
But alas, this solution requires that you write the code yourself. If you would rather use a library, and a good one at that, Boost's String Algorithms library supplies many functions for trimming strings, and chances are that what you need is already there. Table 4-1 lists the function templates in the library that you can use for trimming strings, including some miscellaneous functions. Since these are function templates, they have template parameters that represent the different types used. Here is what each of them means:
Seq
This is a type that satisfies the sequence requirements as defined in the C++ standard.
Coll
This is a type that satisfies a less-restrictive set of requirements than a standard sequence. See the Boost String Algorithms definitions for a detailed description of the requirements a collection satisfies.
Pred
This is a function object or function pointer that takes a single argument and returns a bool; in other words, a unary predicate. You can supply your own unary predicates to some of the trimming functions to trim elements that satisfy certain criteria.
OutIt
This is a type that satisfies the requirements of an output iterator as defined in the C++ standard, namely that you can increment it and assign to the new location to add an element to the end of the sequence to which it points.
Declaration | Description |
---|---|
template<typename Seq> void trim(Seq& s, const locale& loc = locale( )); | Trim spaces from both ends of a string in place using the locale's classification function for identifying the space character. |
template<typename Seq, typename Pred> void trim_if(Seq& s, Pred p); | Trim elements from each end of the sequence s for which p(*it) is true, where it is an iterator that refers to an element in the sequence. The trimming ceases when p(*it) is false. |
template<typename Seq> Seq trim_copy(const Seq& s, const locale& loc = locale( )); | Does the same thing as trim, but instead of modifying s it returns a new sequence with the trimmed results. |
template<typename Seq, typename Pred> Seq trim_copy_if(const Seq& s, Pred p); | Does the same thing as trim_if, but instead of modifying s it returns a new sequence with the trimmed results. |
template<typename OutIt, typename Coll, typename Pred> OutIt trim_copy_if(OutIt out, const Coll& c, Pred p); | Does the same thing as the previous version of trim_copy_if, with a few differences. First, it gives the guarantee of strong exception safety. Second, it takes an output iterator as the first argument and returns an output iterator that refers to one position past the end of the destination sequence. Finally, it takes a collection type instead of a sequence type; see the list before this table for more information. |
trim_left, trim_right | Works like trim, but only for the left or right end of a string. |
trim_left_if, trim_right_if | Works like trim_if, but only for the left or right end of a string. |
trim_left_copy, trim_right_copy | Works like trim_copy, but only for the left or right end of a string. |
trim_left_copy_if, trim_right_copy_if | Works like trim_copy_if, but only for the left or right end of a string. Both have two versions, one that operates on a sequence and another that operates on a collection. |
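To make the output-iterator form of trim_copy_if more concrete, here is a rough sketch of the contract the table describes, written with the standard library only. It is not Boost's implementation, and the name sketch_trim_copy_if is hypothetical; the template parameter names simply mirror the ones in the list before the table.

```cpp
#include <iterator>
#include <string>

// Hypothetical illustration of the OutIt-based trim_copy_if contract;
// this is not Boost's implementation. Elements satisfying p are skipped
// at both ends of c, and everything in between is copied to out.
template<typename OutIt, typename Coll, typename Pred>
OutIt sketch_trim_copy_if(OutIt out, const Coll& c, Pred p)
{
   typename Coll::const_iterator first = c.begin();
   typename Coll::const_iterator last = c.end();

   while (first != last && p(*first))           // Skip the left run
      ++first;
   while (first != last && p(*std::prev(last))) // Skip the right run
      --last;

   for (; first != last; ++first)
      *out++ = *first;                          // Copy what remains

   return out; // One position past the last element written
}
```

With a std::string src holding "**abc*d**" and an empty std::string dst, calling sketch_trim_copy_if(std::back_inserter(dst), src, pred), where pred tests for '*', appends "abc*d" to dst and leaves src untouched. Note that only the runs at the two ends are removed; the '*' in the middle survives.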
The first four function templates described in Table 4-1 are the core functionality of the String Algorithms library's trim functions. The rest are variations on those themes. To see some of them in action, take a look at Example 4-5. It shows some of the advantages of using these functions over string member functions.
Example 4-5. Using Boost's string trim functions
#include <iostream>
#include <string>
#include <boost/algorithm/string.hpp>

using namespace std;
using namespace boost;

int main( )
{
   string s1 = " leading spaces?";

   trim_left(s1); // Trim the original
   string s2 = trim_left_copy(s1); // Trim, but leave original intact

   cout << "s1 = " << s1 << endl;
   cout << "s2 = " << s2 << endl;

   s1 = "YYYYboostXXX";
   s2 = trim_copy_if(s1, is_any_of("XY")); // Use a predicate to select what to trim
   trim_if(s1, is_any_of("XY"));

   cout << "s1 = " << s1 << endl;
   cout << "s2 = " << s2 << endl;

   s1 = "1234 numbers 9876";
   s2 = trim_copy_if(s1, is_digit( ));

   cout << "s1 = " << s1 << endl;
   cout << "s2 = " << s2 << endl;

   // Nest calls to trim functions if you like
   s1 = " ****Trim!*** ";
   s2 = trim_copy_if(trim_copy(s1), is_any_of("*"));

   cout << "s1 = " << s1 << endl;
   cout << "s2 = " << s2 << endl;
}
Example 4-5 demonstrates how to use the Boost string trim functions. They are generally self-explanatory, so I won't go into a detailed explanation beyond what's in Table 4-1. The one function in the example that isn't in the table is is_any_of. This is a function template that returns a predicate function object that can be used by the trim_if-style functions. Use it when you want to trim a set of characters. There is a similar classification function named is_from_range that takes two arguments and returns a unary predicate that returns true when a character is within the range. For example, to trim the characters a through d from a string, you could do something like this:
s1 = "abcdXXXabcd";
trim_if(s1, is_from_range('a', 'd'));
cout << "s1 = " << s1 << endl; // Now s1 = XXX
Note that this works in a case-sensitive way, since the range a through d does not include the uppercase versions of those letters.