Sorting Localized Strings

Problem

You have a sequence of strings that contain non-ASCII characters, and you need to sort according to local convention.

Solution

The locale class has built-in support for comparing strings in a given locale by overloading operator( ). You can use an instance of the locale class as your comparison functor when you call any standard function that takes a functor for comparison. Example 13-8 performs the equivalent comparison explicitly through the locale's collate facet.

Example 13-8. Locale-specific sorting

#include <iostream>
#include <locale>
#include <string>
#include <vector>
#include <algorithm>

using namespace std;

bool localeLessThan (const string& s1, const string& s2) {

   const collate<char>& col =
     use_facet<collate<char> >(locale( )); // Use the global locale

   const char* pb1 = s1.data( );
   const char* pb2 = s2.data( );

   return (col.compare(pb1, pb1 + s1.size( ),
                       pb2, pb2 + s2.size( )) < 0);
}

int main( ) {

   // Create two strings, one with a German character
   string s1 = "diät";
   string s2 = "dich";

   vector<string> v;
   v.push_back(s1);
   v.push_back(s2);

   // Sort without giving a locale, which will sort according to the
   // current global locale's rules.
   sort(v.begin( ), v.end( ));
   for (vector<string>::const_iterator p = v.begin( );
        p != v.end( ); ++p)
      cout << *p << endl;

   // Set the global locale to German, and then sort.
   // (Locale names are implementation-defined; "german" may not
   // be recognized on every platform.)
   locale::global(locale("german"));
   sort(v.begin( ), v.end( ), localeLessThan);
   for (vector<string>::const_iterator p = v.begin( );
        p != v.end( ); ++p)
      cout << *p << endl;
}

The first sort follows ASCII sorting convention, and therefore the output looks like this:

dich
diät

The second sort uses the proper ordering according to German semantics, and it is just the opposite:

diät
dich
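The Solution mentions that a locale object can itself serve as the comparison functor. The following is a minimal sketch of that approach, not part of Example 13-8; the helper name sortWithLocale is hypothetical, and which locales are available depends on your platform.

```cpp
#include <algorithm>
#include <locale>
#include <string>
#include <vector>

// Hypothetical helper (not from Example 13-8): returns a sorted copy of v,
// using the locale object itself as the comparison functor.
// std::locale::operator() compares two strings through the locale's
// collate facet and returns true if the first orders before the second.
std::vector<std::string> sortWithLocale(std::vector<std::string> v,
                                        const std::locale& loc) {
    std::sort(v.begin(), v.end(), loc);
    return v;
}
```

Passing std::locale::classic( ) gives the plain "C" ordering; passing a German locale, where one is installed, should give the same ordering as localeLessThan with the global locale set to German.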

 

Discussion

Sorting becomes more complicated when strings must follow the conventions of different locales, and the standard library provides the collate facet for this purpose. Its member function compare works much like strcmp: it returns -1 if the first string is less than the second, 0 if they are equal, and 1 if the first string is greater than the second. Unlike strcmp, which compares raw character values, collate::compare uses the character semantics of the target locale.
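To make the three-way result concrete, here is a small sketch that wraps collate::compare. The helper name collateCompare is an assumption made for illustration, and it defaults to the classic "C" locale because that is the only locale every implementation must provide.

```cpp
#include <locale>
#include <string>

// Hypothetical helper: three-way comparison of two strings using the
// collate facet of the given locale (the classic "C" locale by default).
// The result is negative, zero, or positive, like strcmp.
int collateCompare(const std::string& a, const std::string& b,
                   const std::locale& loc = std::locale::classic()) {
    const std::collate<char>& col =
        std::use_facet<std::collate<char> >(loc);
    return col.compare(a.data(), a.data() + a.size(),
                       b.data(), b.data() + b.size());
}
```

With the classic locale the ordering is plain character-by-character, so "abc" compares less than "abd". With another locale the result may differ for strings containing accented characters.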

Example 13-8 presents the function localeLessThan, which returns true if the first argument is less than the second according to the global locale. The most important part of the function is the call to compare:

col.compare(pb1,              // Pointer to the first char
            pb1 + s1.size( ), // Pointer to one past the last char
            pb2,
            pb2 + s2.size( ))

Depending on the execution character set of your implementation, Example 13-8 may or may not produce the results I showed earlier. But if you want to ensure string comparison works in a locale-specific manner, you should use collate::compare. Of course, the standard does not require an implementation to support any locales other than "C", so be sure to test on every platform and for every locale you intend to support.
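Because a named locale may be missing on a given system, it can help to guard its construction. The sketch below relies on the fact that the locale constructor throws runtime_error when it does not recognize a name; the helper name localeOrClassic and the choice to fall back to the "C" locale are assumptions made for illustration.

```cpp
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: try to construct the named locale and fall back
// to the classic "C" locale if the implementation does not support it.
std::locale localeOrClassic(const std::string& name) {
    try {
        return std::locale(name.c_str());
    } catch (const std::runtime_error&) {
        // The name is not recognized on this platform.
        return std::locale::classic();
    }
}
```

A program could call localeOrClassic("german") before sorting and check the name( ) of the returned locale to see whether the fallback was taken.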
