Storing Strings in a Sequence
Problem
You want to store a set of strings in a sequence that looks and feels like an array.
Solution
Use a vector for array-like storage of your strings. Example 4-6 offers a simple example.
Example 4-6. Store strings in a vector
#include #include #include using namespace std; int main( ) { vector v; string s = "one"; v.push_back(s); s = "two"; v.push_back(s); s = "three"; v.push_back(s); for (int i = 0; i < v.size( ); ++i) { cout << v[i] << ' '; } }
vectors follow array semantics for random access (they also do a lot more), so they are easy and familiar to use. vectors are just one of many sequences in the standard library, however; read on for more of this broad subject.
Discussion
A vector is a dynamically sized sequence of objects that provides array-style operator[] random access. The member function push_back copies its argument via copy constructor, adds that copy as the last item in the vector, and increments its size by one. pop_back does the exact opposite, by removing the last element. Inserting or deleting items from the end of a vector takes amortized constant time, and inserting or deleting from any other location takes linear time. These are the basics of vectors. There is a lot more to them.
In most cases, a vector should be your first choice over a C-style array. First of all, they are dynamically sized, which means they can grow as needed. You don't have to do all sorts of research to figure out an optimal static size, as in the case of C arrays; a vector grows as needed, and it can be resized larger or smaller manually if you need to. Second, vectors offer bounds checking with the at member function (but not with operator[]), so that you can do something if you reference a nonexistent index instead of simply watching your program crash or worse, continuing execution with corrupt data. Look at Example 4-7. It shows how to deal with out-of-bounds indexes.
Example 4-7. Bounds-checking on vectors
#include #include #include using namespace std; int main( ) { char carr[] = {'a', 'b', 'c', 'd', 'e'}; cout << carr[100000] << ' '; // Whoops, who knows what's going // to happen vector v; v.push_back('a'); v.push_back('b'); v.push_back('c'); v.push_back('d'); v.push_back('e'); try { cout << v.at(10000) << ' '; // at checks bounds and throws } catch(out_of_range& e) { // out_of_range if it's invalid cerr << e.what( ) << ' '; } }
If you catch out_of_range, defined in , you can deal with invalid indexes in a meaningful way. And you can call the what member function to, depending on your implementation, get a useful error message, like this one returned by the code in Example 4-7:
invalid vector subscript
vectors aren't your only option though. There are lots of ways to store sequences of things in C++. In addition to vectors, there are lists, sets, and double-ended queues (deques). All support many of the same operations, and each supports operations of its own. In addition, each has different algorithmic complexity guarantees, storage requirements, and semantics in general. There is a lot to choose from.
Look at Example 4-6 closely. You will probably notice that I keep changing the value of the string s before I add it to the back of the container with push_back. You could reasonably expect the output to look like this:
three three three
I pushed the same string on the end of the vector three times, so each time I reassign the string, don't all vector elements now just refer to the same thing? No. This is an important point about STL containers.
STL containers store copies of the objects you put into them, not the objects themselves. So after I've put all three of my strings in the container, there are four strings in memory: the three copies that were made and are now "in" the container, and the one copy I've been assigning values to.
Who cares? So a few extra copies have been made: big deal. It is a big deal, because if whatever you are writing uses a lot of strings, you are going to pay for all of that copying with processor time, or memory, or both. Copying elements in and out of containers is the intentional behavior of the STL, and all containers work that way.
A solution to this (certainly not the solution) is to store pointers in the container instead. Just remember that the container doesn't delete the pointers when it is destroyed. Your code allocated the memory for the pointer, so your code has to clean it up. This goes for when the container is destroyed entirely, or when the element is removed.
In the interest of providing alternative solutions, let's explore another option. Consider the class template list, defined in , which is a doubly linked list. If you plan on having lots of inserts and deletes in the middle of the sequence, or if you want to ensure that iterators pointing to elements of the sequence are not invalidated when you modify the sequence, you may want to use a list. Example 4-8 uses a list instead of a vector to store a few strings; it also uses for_each to iterate through them and print them out instead of the index operator, as you would have to do with a simple array.
Example 4-8. Storing strings in a list
#include #include #include #include using namespace std; void write(const string& s) { cout << s << ' '; } int main( ) { list lst; string s = "knife"; lst.push_front(s); s = "fork"; lst.push_back(s); s = "spoon"; lst.push_back(s); // A list doesn't have random access, so // use for_each( ) instead for_each(lst.begin( ), lst.end( ), write); }
The point of this digression from the original problem (storing strings in a sequence) is to give a brief introduction to the sequences in the STL. I can't give comprehensive coverage of the topic here. For an overview of the STL, see Chapter 10 of C++ in a Nutshell, by Ray Lischner (O'Reilly).