Effective C++ Third Edition 55 Specific Ways to Improve Your Programs and Designs

So you go into your C++ program and make a minor change to the implementation of a class. Not the class interface, mind you, just the implementation; only the private stuff. Then you rebuild the program, figuring that the exercise should take only a few seconds. After all, only one class has been modified. You click on Build or type make (or some equivalent), and you are astonished, then mortified, as you realize that the whole world is being recompiled and relinked! Don't you just hate it when that happens?

The problem is that C++ doesn't do a very good job of separating interfaces from implementations. A class definition specifies not only a class interface but also a fair number of implementation details. For example:

class Person { public: Person(const std::string& name, const Date& birthday, const Address& addr); std::string name() const; std::string birthDate() const; std::string address() const; ... private: std::string theName; // implementation detail Date theBirthDate; // implementation detail Address theAddress; // implementation detail };

Here, class Person can't be compiled without access to definitions for the classes the Person implementation uses, namely, string, Date, and Address. Such definitions are typically provided through #include directives, so in the file defining the Person class, you are likely to find something like this:

#include <string> #include "date.h" #include "address.h"

Unfortunately, this sets up a compilation dependency between the file defining Person and these header files. If any of these header files is changed, or if any of the header files they depend on changes, the file containing the Person class must be recompiled, as must any files that use Person. Such cascading compilation dependencies have caused many a project untold grief.

You might wonder why C++ insists on putting the implementation details of a class in the class definition. For example, why can't you define Person this way, specifying the implementation details of the class separately?

namespace std { class string; // forward declaration (an incorrect } // one see below) class Date; // forward declaration class Address; // forward declaration class Person { public: Person(const std::string& name, const Date& birthday, const Address& addr); std::string name() const; std::string birthDate() const; std::string address() const; ... };

If that were possible, clients of Person would have to recompile only if the interface to the class changed.

There are two problems with this idea. First, string is not a class, it's a typedef (for basic_string<char>). As a result, the forward declaration for string is incorrect. The proper forward declaration is substantially more complex, because it involves additional templates. That doesn't matter, however, because you shouldn't try to manually declare parts of the standard library. Instead, simply use the proper #includes and be done with it. Standard headers are unlikely to be a compilation bottleneck, especially if your build environment allows you to take advantage of precompiled headers. If parsing standard headers really is a problem, you may need to change your interface design to avoid using the parts of the standard library that give rise to the undesirable #includes.

The second (and more significant) difficulty with forward-declaring everything has to do with the need for compilers to know the size of objects during compilation. Consider:

int main() { int x; // define an int Person p( params ); // define a Person ... }

When compilers see the definition for x, they know they must allocate enough space (typically on the stack) to hold an int. No problem. Each compiler knows how big an int is. When compilers see the definition for p, they know they have to allocate enough space for a Person, but how are they supposed to know how big a Person object is? The only way they can get that information is to consult the class definition, but if it were legal for a class definition to omit the implementation details, how would compilers know how much space to allocate?

This question fails to arise in languages like Smalltalk and Java, because, when an object is defined in such languages, compilers allocate only enough space for a pointer to an object. That is, they handle the code above as if it had been written like this:

int main() { int x; // define an int Person *p; // define a pointer to a Person ... }

This, of course, is legal C++, so you can play the "hide the object implementation behind a pointer" game yourself. One way to do that for Person is to separate it into two classes, one offering only an interface, the other implementing that interface. If the implementation class is named PersonImpl, Person would be defined like this:

#include <string> // standard library components // shouldn't be forward-declared #include <memory> // for tr1::shared_ptr; see below class PersonImpl; // forward decl of Person impl. class class Date; // forward decls of classes used in class Address; // Person interface class Person { public: Person(const std::string& name, const Date& birthday, const Address& addr); std::string name() const; std::string birthDate() const; std::string address() const; ... private: // ptr to implementation; std::tr1::shared_ptr<PersonImpl> pImpl; // see Item 13 for info on }; // std::tr1::shared_ptr

Here, the main class (Person) contains as a data member nothing but a pointer (here, a tr1::shared_ptr see Item 13) to its implementation class (PersonImpl). Such a design is often said to be using the pimpl idiom ("pointer to implementation"). Within such classes, the name of the pointer is often pImpl, as it is above.

With this design, clients of Person are divorced from the details of dates, addresses, and persons. The implementations of those classes can be modified at will, but Person clients need not recompile. In addition, because they're unable to see the details of Person's implementation, clients are unlikely to write code that somehow depends on those details. This is a true separation of interface and implementation.

The key to this separation is replacement of dependencies on definitions with dependencies on declarations. That's the essence of minimizing compilation dependencies: make your header files self-sufficient whenever it's practical, and when it's not, depend on declarations in other files, not definitions. Everything else flows from this simple design strategy. Hence:

  • Avoid using objects when object references and pointers will do. You may define references and pointers to a type with only a declaration for the type. Defining objects of a type necessitates the presence of the type's definition.

  • Depend on class declarations instead of class definitions whenever you can. Note that you never need a class definition to declare a function using that class, not even if the function passes or returns the class type by value:

    class Date; // class declaration Date today(); // fine no definition void clearAppointments(Date d); // of Date is needed

    Of course, pass-by-value is generally a bad idea (see Item 20), but if you find yourself using it for some reason, there's still no justification for introducing unnecessary compilation dependencies.

    The ability to declare today and clearAppointments without defining Date may surprise you, but it's not as curious as it seems. If anybody calls those functions, Date's definition must have been seen prior to the call. Why bother to declare functions that nobody calls, you wonder? Simple. It's not that nobody calls them, it's that not everybody calls them. If you have a library containing dozens of function declarations, it's unlikely that every client calls every function. By moving the onus of providing class definitions from your header file of function declarations to clients' files containing function calls, you eliminate artificial client dependencies on type definitions they don't really need.

  • Provide separate header files for declarations and definitions. In order to facilitate adherence to the above guidelines, header files need to come in pairs: one for declarations, the other for definitions. These files must be kept consistent, of course. If a declaration is changed in one place, it must be changed in both. As a result, library clients should always #include a declaration file instead of forward-declaring something themselves, and library authors should provide both header files. For example, the Date client wishing to declare today and clearAppointments shouldn't manually forward-declare Date as shown above. Rather, it should #include the appropriate header of declarations:

    #include "datefwd.h" // header file declaring (but not // defining) class Date Date today(); // as before void clearAppointments(Date d);

    The name of the declaration-only header file "datefwd.h" is based on the header <iosfwd> from the standard C++ library (see Item 54). <iosfwd> contains declarations of iostream components whose corresponding definitions are in several different headers, including <sstream>, <streambuf>, <fstream>, and <iostream>.

    <iosfwd> is instructive for another reason, and that's to make clear that the advice in this Item applies as well to templates as to non-templates. Although Item 30 explains that in many build environments, template definitions are typically found in header files, some build environments allow template definitions to be in non-header files, so it still makes sense to provide declaration-only headers for templates. <iosfwd> is one such header.

    C++ also offers the export keyword to allow the separation of template declarations from template definitions. Unfortunately, compiler support for export is scanty, and real-world experience with export is scantier still. As a result, it's too early to say what role export will play in effective C++ programming.

Classes like Person that employ the pimpl idiom are often called Handle classes. Lest you wonder how such classes actually do anything, one way is to forward all their function calls to the corresponding implementation classes and have those classes do the real work. For example, here's how two of Person's member functions could be implemented:

#include "Person.h" // we're implementing the Person class, // so we must #include its class definition #include "PersonImpl.h" // we must also #include PersonImpl's class // definition, otherwise we couldn't call // its member functions; note that // PersonImpl has exactly the same // member functions as Person their // interfaces are identical Person::Person(const std::string& name, const Date& birthday, const Address& addr) : pImpl(new PersonImpl(name, birthday, addr)) {} std::string Person::name() const { return pImpl->name(); }

Note how the Person constructor calls the PersonImpl constructor (by using new see Item 16) and how Person::name calls PersonImpl::name. This is important. Making Person a Handle class doesn't change what Person does, it just changes the way it does it.

An alternative to the Handle class approach is to make Person a special kind of abstract base class called an Interface class. The purpose of such a class is to specify an interface for derived classes (see Item 34). As a result, it typically has no data members, no constructors, a virtual destructor (see Item 7), and a set of pure virtual functions that specify the interface.

Interface classes are akin to Java's and .NET's Interfaces, but C++ doesn't impose the restrictions on Interface classes that Java and .NET impose on Interfaces. Neither Java nor .NET allow data members or function implementations in Interfaces, for example, but C++ forbids neither of these things. C++'s greater flexibility can be useful. As Item 36 explains, the implementation of non-virtual functions should be the same for all classes in a hierarchy, so it makes sense to implement such functions as part of the Interface class that declares them.

An Interface class for Person could look like this:

class Person { public: virtual ~Person(); virtual std::string name() const = 0; virtual std::string birthDate() const = 0; virtual std::string address() const = 0; ... };

Clients of this class must program in terms of Person pointers and references, because it's not possible to instantiate classes containing pure virtual functions. (It is, however, possible to instantiate classes derived from Person see below.) Like clients of Handle classes, clients of Interface classes need not recompile unless the Interface class's interface is modified.

Clients of an Interface class must have a way to create new objects. They typically do it by calling a function that plays the role of the constructor for the derived classes that are actually instantiated. Such functions are typically called factory functions (see Item 13) or virtual constructors. They return pointers (preferably smart pointers see Item 18) to dynamically allocated objects that support the Interface class's interface. Such functions are often declared static inside the Interface class:

class Person { public: ... static std::tr1::shared_ptr<Person> // return a tr1::shared_ptr to a new create(const std::string& name, // Person initialized with the const Date& birthday, // given params; see Item 18 for const Address& addr); // why a tr1::shared_ptr is returned ... };

Clients use them like this:

std::string name; Date dateOfBirth; Address address; ... // create an object supporting the Person interface std::tr1::shared_ptr<Person> pp(Person::create(name, dateOfBirth, address)); ... std::cout << pp->name() // use the object via the << " was born on " // Person interface << pp->birthDate() << " and now lives at " << pp->address(); ... // the object is automatically // deleted when pp goes out of // scope see Item 13

At some point, of course, concrete classes supporting the Interface class's interface must be defined and real constructors must be called. That all happens behind the scenes inside the files containing the implementations of the virtual constructors. For example, the Interface class Person might have a concrete derived class RealPerson that provides implementations for the virtual functions it inherits:

class RealPerson: public Person { public: RealPerson(const std::string& name, const Date& birthday, const Address& addr) : theName(name), theBirthDate(birthday), theAddress(addr) {} virtual ~RealPerson() {} std::string name() const; // implementations of these std::string birthDate() const; // functions are not shown, but std::string address() const; // they are easy to imagine private: std::string theName; Date theBirthDate; Address theAddress; };

Given RealPerson, it is truly trivial to write Person::create:

std::tr1::shared_ptr<Person> Person::create(const std::string& name, const Date& birthday, const Address& addr) { return std::tr1::shared_ptr<Person>(new RealPerson(name, birthday,addr)); }

A more realistic implementation of Person::create would create different types of derived class objects, depending on e.g., the values of additional function parameters, data read from a file or database, environment variables, etc.

RealPerson demonstrates one of the two most common mechanisms for implementing an Interface class: it inherits its interface specification from the Interface class (Person), then it implements the functions in the interface. A second way to implement an Interface class involves multiple inheritance, a topic explored in Item 40.

Handle classes and Interface classes decouple interfaces from implementations, thereby reducing compilation dependencies between files. Cynic that you are, I know you're waiting for the fine print. "What does all this hocus-pocus cost me?" you mutter. The answer is the usual one in computer science: it costs you some speed at runtime, plus some additional memory per object.

In the case of Handle classes, member functions have to go through the implementation pointer to get to the object's data. That adds one level of indirection per access. And you must add the size of this implementation pointer to the amount of memory required to store each object. Finally, the implementation pointer has to be initialized (in the Handle class's constructors) to point to a dynamically allocated implementation object, so you incur the overhead inherent in dynamic memory allocation (and subsequent deallocation) and the possibility of encountering bad_alloc (out-of-memory) exceptions.

For Interface classes, every function call is virtual, so you pay the cost of an indirect jump each time you make a function call (see Item 7). Also, objects derived from the Interface class must contain a virtual table pointer (again, see Item 7). This pointer may increase the amount of memory needed to store an object, depending on whether the Interface class is the exclusive source of virtual functions for the object.

Finally, neither Handle classes nor Interface classes can get much use out of inline functions. Item 30 explains why function bodies must typically be in header files in order to be inlined, but Handle and Interface classes are specifically designed to hide implementation details like function bodies.

It would be a serious mistake, however, to dismiss Handle classes and Interface classes simply because they have a cost associated with them. So do virtual functions, and you wouldn't want to forgo those, would you? (If so, you're reading the wrong book.) Instead, consider using these techniques in an evolutionary manner. Use Handle classes and Interface classes during development to minimize the impact on clients when implementations change. Replace Handle classes and Interface classes with concrete classes for production use when it can be shown that the difference in speed and/or size is significant enough to justify the increased coupling between classes.

Things to Remember

  • The general idea behind minimizing compilation dependencies is to depend on declarations instead of definitions. Two approaches based on this idea are Handle classes and Interface classes.

  • Library header files should exist in full and declaration-only forms. This applies regardless of whether templates are involved.

Категории