Reading a Comma-Separated Text File
Problem
You want to read in a text file that is delimited by commas and new lines (or any other pair of delimiters for that matter). Records are delimited by one character, and fields within a record are delimited by another. For example, a comma-separated text file of employee information may look like the following:
Smith, Bill, 5/1/2002, Active
Stanford, John, 4/5/1999, Inactive
Such files are usually interim storage for data sets exported from spreadsheets, databases, or other file formats.
Solution
See Example 4-32 for how to do this. If you read the text into strings one line at a time using getline (the function template defined in <string>), you can use the split function I presented in Recipe 4.6 to break each line into fields and put them in a data structure, in this case a vector of pointers to vectors of strings.
Example 4-32. Reading in a delimited file
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

using namespace std;

// Split s on the delimiter c, appending each field to v
void split(const string& s, char c, vector<string>& v) {
   string::size_type i = 0;
   string::size_type j = s.find(c);

   while (j != string::npos) {
      v.push_back(s.substr(i, j - i));
      i = ++j;
      j = s.find(c, j);
   }
   v.push_back(s.substr(i)); // Last field (or the whole string if c never occurs)
}

void loadCSV(istream& in, vector<vector<string>*>& data) {
   vector<string>* p = NULL;
   string tmp;

   while (getline(in, tmp, '\n')) { // Grab the next line; stop when none is left
      p = new vector<string>();
      split(tmp, ',', *p);           // Use split from Recipe 4.6
      data.push_back(p);
      cout << tmp << '\n';
   }
}

int main(int argc, char** argv) {
   if (argc < 2)
      return(EXIT_FAILURE);

   ifstream in(argv[1]);

   if (!in)
      return(EXIT_FAILURE);

   vector<vector<string>*> data;

   loadCSV(in, data);

   // Go do something useful with the data...

   for (vector<vector<string>*>::iterator p = data.begin();
        p != data.end(); ++p) {
      delete *p; // Be sure to dereference p!
   }
}
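To see what the parsing step produces for the sample records, here is a minimal sketch that reads lines from any istream and splits each one on commas. It repeats split so it compiles on its own, and readRecords is a hypothetical helper name; for brevity it stores each record by value rather than through a pointer as Example 4-32 does. Note that getline itself is the loop condition: a loop guarded by `!in.eof()` would run one extra time when the file ends with a newline and push an empty record.

```cpp
#include <istream>
#include <sstream>  // std::istringstream, handy for trying readRecords on in-memory text
#include <string>
#include <vector>

// Same field splitter as in Example 4-32, repeated so this sketch stands alone.
void split(const std::string& s, char c, std::vector<std::string>& v) {
    std::string::size_type i = 0;
    std::string::size_type j = s.find(c);
    while (j != std::string::npos) {
        v.push_back(s.substr(i, j - i)); // Field before the delimiter
        i = ++j;                         // Step past the delimiter
        j = s.find(c, j);
    }
    v.push_back(s.substr(i));            // Last field on the line
}

// Hypothetical helper: read every line from in and split it on commas.
// The loop stops as soon as getline fails, so no phantom empty record
// is produced at end of input.
std::vector<std::vector<std::string> > readRecords(std::istream& in) {
    std::vector<std::vector<std::string> > records;
    std::string line;
    while (std::getline(in, line)) {
        std::vector<std::string> fields;
        split(line, ',', fields);
        records.push_back(fields);
    }
    return records;
}
```

Feeding it a std::istringstream that holds the two sample records gives two records of four fields each. Because the sample puts a space after each comma, every field but the first keeps a leading space (" Bill", " Active"), so you may want to trim fields before using them.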
Discussion
There isn't much in Example 4-32 that hasn't been covered already. I discussed getline in Recipe 4.19 and vectors in Recipe 4.3. The only piece worth mentioning has to do with memory allocation.
loadCSV creates a new vector for each line of data it reads and stores a pointer to it in another vector, a vector of pointers to vectors. Since the memory for each of these inner vectors is allocated on the heap, somebody has to deallocate it, and that somebody is you (not the vector implementation).
A vector doesn't know whether its elements are values, pointers to values, or anything else. All it knows is that when it's destroyed, it must call the destructor for each element it contains. If the vector stores objects, this is fine: each object is properly destroyed. But if the vector stores pointers, only the pointers are destroyed, not the objects they point to, so those objects are never freed.
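The difference is easy to observe with a type that counts its live instances. The sketch below is illustrative only; Tracker and the two helper functions are hypothetical names. A vector of Tracker objects leaves no live instances behind when it goes out of scope, while a vector of Tracker* leaves all three objects alive.

```cpp
#include <vector>

// Hypothetical type for this sketch: counts how many instances are alive.
struct Tracker {
    static int live;
    Tracker() { ++live; }
    Tracker(const Tracker&) { ++live; }
    ~Tracker() { --live; }
};
int Tracker::live = 0;

// Objects stored by value are destroyed along with the vector.
int liveAfterValueVector() {
    {
        std::vector<Tracker> objects(3);
    } // The vector's destructor runs ~Tracker for each element
    return Tracker::live;
}

// With a vector of pointers, only the pointers are destroyed.
int liveAfterPointerVector(std::vector<Tracker*>& keep) {
    {
        std::vector<Tracker*> ptrs;
        for (int i = 0; i < 3; ++i)
            ptrs.push_back(new Tracker);
        keep = ptrs; // Copy the pointers so the caller can still free them
    } // ptrs is gone, but the three Trackers are still alive
    return Tracker::live;
}
```

Calling liveAfterValueVector() returns 0, while liveAfterPointerVector(keep) returns 3 until you delete each pointer in keep yourself.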
There are two ways to ensure the memory is freed. First, you can do what I did in Example 4-32 and free it manually, like this:
for (vector<vector<string>*>::iterator p = data.begin();
     p != data.end(); ++p) {
   delete *p;
}
Or you can use a reference-counted smart pointer, such as shared_ptr from the Boost project's Smart Pointers library, which was adopted into the C++0x standard (published as C++11) as std::shared_ptr. Using one correctly takes some understanding, though, so I recommend reading up on what shared_ptr is and how it works. For more information on Boost in general, see the homepage at www.boost.org.
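As a sketch of the smart-pointer approach, assuming a C++11 or later compiler, here is a variant of loadCSV that holds each record through std::shared_ptr instead of a raw pointer. The names Record and loadCSVShared are hypothetical, and split is repeated so the block compiles on its own. When the last shared_ptr to a record is destroyed, the record is deleted automatically, so the cleanup loop goes away.

```cpp
#include <istream>
#include <memory>
#include <sstream>  // std::istringstream, handy for trying the function on in-memory text
#include <string>
#include <vector>

// Hypothetical alias: one parsed line, owned through a reference-counted pointer.
typedef std::shared_ptr<std::vector<std::string> > Record;

// Same field splitter as in Example 4-32.
void split(const std::string& s, char c, std::vector<std::string>& v) {
    std::string::size_type i = 0;
    std::string::size_type j = s.find(c);
    while (j != std::string::npos) {
        v.push_back(s.substr(i, j - i));
        i = ++j;
        j = s.find(c, j);
    }
    v.push_back(s.substr(i));
}

// Like loadCSV, but the vector of Records frees every record itself
// when it (and any other copies of the pointers) are destroyed.
void loadCSVShared(std::istream& in, std::vector<Record>& data) {
    std::string tmp;
    while (std::getline(in, tmp)) {
        Record p = std::make_shared<std::vector<std::string> >();
        split(tmp, ',', *p);
        data.push_back(p);
    }
}
```

If you don't need shared ownership, std::unique_ptr, or simply a vector<vector<string>> that stores each record by value, also frees the memory without any manual cleanup.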