Autocorrect Text as a Buffer Changes
Problem
You have a class that represents some kind of text field or document, and as text is appended to it, you want to automatically correct misspelled words the way Microsoft Word's Autocorrect feature does.
Solution
Using a map (defined in <map>), strings, and a few other standard library features, you can implement this with relatively little code. Example 4-31 shows how to do it.
Example 4-31. Autocorrect text
#include <iostream>
#include <string>
#include <cctype>
#include <map>

using namespace std;

typedef map<string, string> StrStrMap;

// Class for holding text fields
class TextAutoField {

public:
   TextAutoField(StrStrMap* const p) : pDict_(p) {}
   ~TextAutoField( ) {}
   void append(char c);
   void getText(string& s) {s = buf_;}

private:
   TextAutoField( );
   string buf_;
   StrStrMap* const pDict_;
};

// Append with autocorrect
void TextAutoField::append(char c) {

   if ((isspace(c) || ispunct(c)) &&         // Only do the auto-
       buf_.length( ) > 0 &&                 // correct when ws or
       !isspace(buf_[buf_.length( ) - 1])) { // punct is entered

      // Find where the last word starts: one past the last whitespace
      string::size_type i = buf_.find_last_of(" \f\n\r\t\v");
      i = (i == string::npos) ? 0 : i + 1;

      string tmp = buf_.substr(i, buf_.length( ) - i);
      StrStrMap::const_iterator p = pDict_->find(tmp);

      if (p != pDict_->end( )) {             // Found it, so erase
         buf_.erase(i, buf_.length( ) - i);  // and replace
         buf_ += p->second;
      }
   }
   buf_ += c;
}

int main( ) {

   // Set up the map
   StrStrMap dict;
   TextAutoField txt(&dict);

   dict["taht"] = "that";
   dict["right"] = "wrong";
   dict["bug"] = "feature";

   string tmp = "He's right, taht's a bug.";
   cout << "Original: " << tmp << '\n';

   for (string::iterator p = tmp.begin( ); p != tmp.end( ); ++p) {
      txt.append(*p);
   }

   txt.getText(tmp);
   cout << "Corrected version is: " << tmp << '\n';
}
The output of Example 4-31 is:
Original: He's right, taht's a bug.
Corrected version is: He's wrong, that's a feature.
Discussion
strings and maps are handy for situations when you have to keep track of string associations. TextAutoField is a simple text buffer that uses a string to hold its data. What makes TextAutoField interesting is its append method, which "listens" for whitespace or punctuation, and does some processing when either one occurs.
To make this autocorrect behavior a reality, you need two things. First, you need a dictionary of sorts that contains the common misspelling of a word and the associated correct spelling. A map stores key-value pairs, where the key and value can be of almost any type (the key type only needs an ordering, which is operator< by default), so it's an ideal candidate. At the top of Example 4-31, there is a typedef for a map from string to string:
typedef map<string, string> StrStrMap;
See Recipe 4.18 for a more detailed explanation of maps. TextAutoField stores a pointer to the map, because most likely you would want a single dictionary for use by all fields.
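To illustrate why a pointer is convenient here, the following is a minimal sketch and not part of Example 4-31. Field is a hypothetical stand-in for TextAutoField that only stores the dictionary pointer and looks words up. Because each field refers to the same map rather than copying it, entries added after a field is constructed are still visible to it, which is what main in Example 4-31 relies on.

```cpp
#include <map>
#include <string>

typedef std::map<std::string, std::string> StrStrMap;

// Hypothetical stand-in for TextAutoField: it keeps only the pointer
// to the shared dictionary and does no buffering of its own.
class Field {
public:
    explicit Field(const StrStrMap* p) : pDict_(p) {}

    // True if the shared dictionary has an entry for word
    bool knows(const std::string& word) const {
        return pDict_->find(word) != pDict_->end();
    }

private:
    const StrStrMap* pDict_;   // not owned; the map must outlive the Field
};
```

If two Field objects are constructed from the address of the same StrStrMap and the entry "taht" is added afterward, knows("taht") returns true for both. The price of sharing is a lifetime requirement: the map has to outlive every field that points to it.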
Assuming client code puts something meaningful in the map, append only has to look words up at the right moments. In Example 4-31, append waits for whitespace or punctuation before doing a lookup. You can test a character for whitespace with isspace, or for punctuation with ispunct, both of which are declared in <cctype> for narrow characters (take a look at Table 4-3).
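The character test can be isolated in a small helper, sketched here under some assumptions. The names isWordBoundary and countBoundaries are hypothetical and do not appear in Example 4-31. The sketch also converts the character to unsigned char first, since the <cctype> functions expect a value representable as unsigned char (or EOF), and a plain char may be negative on some platforms.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: true if c should trigger a dictionary lookup,
// that is, if it is whitespace or punctuation.
bool isWordBoundary(char c) {
    // Convert before calling the <cctype> functions, which require a
    // value representable as unsigned char (or EOF).
    const unsigned char uc = static_cast<unsigned char>(c);
    return std::isspace(uc) || std::ispunct(uc);
}

// Hypothetical helper: count the boundary characters in s.
std::string::size_type countBoundaries(const std::string& s) {
    std::string::size_type n = 0;
    for (std::string::const_iterator p = s.begin(); p != s.end(); ++p) {
        if (isWordBoundary(*p)) {
            ++n;
        }
    }
    return n;
}
```

Note that the apostrophe counts as punctuation in the default "C" locale. That is why Example 4-31 corrects "taht" in "taht's": the lookup fires when the apostrophe arrives, while the buffer still ends in "taht".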
The code that does a lookup requires some explanation if you are not familiar with using iterators and find methods on STL containers. The string tmp contains the last chunk of text that was appended to the TextAutoField. To see if it is a commonly misspelled word, look it up in the dictionary like this:
StrStrMap::const_iterator p = pDict_->find(tmp);
if (p != pDict_->end( )) {
The important point here is that map::find returns an iterator that points to the pair containing the matching key, if it was found. If not, an iterator pointing to one past the end of the map is returned, which is exactly what map::end returns (this is how all STL containers that support find work). If the word was found in the map, erase the old word from the buffer and replace it with the correct version:
buf_.erase(i, buf_.length( ) - i);
buf_ += p->second;
Append the character that started the process (either whitespace or punctuation) and you're done.
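The lookup-and-replace step described above can also be written as a standalone function, which makes it easier to test in isolation. This is only a sketch: replaceLastWord is a hypothetical name, and it takes the buffer and dictionary as parameters instead of using the buf_ and pDict_ members of TextAutoField.

```cpp
#include <map>
#include <string>

typedef std::map<std::string, std::string> StrStrMap;

// Hypothetical free function: if the last whitespace-delimited word in
// buf is a key in dict, replace it with the mapped value.
// Returns true if a replacement was made.
bool replaceLastWord(std::string& buf, const StrStrMap& dict) {
    // Index of the first character after the last whitespace, or 0
    // if the buffer contains no whitespace at all
    std::string::size_type i = buf.find_last_of(" \f\n\r\t\v");
    i = (i == std::string::npos) ? 0 : i + 1;

    // find returns end() when the key is absent
    StrStrMap::const_iterator p = dict.find(buf.substr(i));
    if (p == dict.end()) {
        return false;
    }

    buf.erase(i);          // drop everything from i to the end
    buf += p->second;      // p->first is the key, p->second the value
    return true;
}
```

With a dictionary that maps "taht" to "that", calling replaceLastWord on a buffer holding "He said taht" changes it to "He said that" and returns true. A buffer holding "He said this" is left untouched. The sketch uses find rather than operator[] for the lookup because operator[] inserts a default-constructed value when the key is missing, which would quietly fill the dictionary with every word that passes through.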
See Also
Recipe 4.17, Recipe 4.18, and Table 4-3