Perl Best Practices
15.5. Encapsulation
Always use fully encapsulated objects. The voluntary nature of the security that restricted hashes offer is a genuine problem. Lack of encapsulation is one of the reasons why plain, unrestricted hashes aren't a suitable basis for objects either. Objects without effective encapsulation are vulnerable. Instead of politely respecting their public interface, like so:
    # Use our company's proprietary OO file system interface...
    use File::Hierarchy;

    # Make an object representing the user's home directory...
    my $fs = File::Hierarchy->new('~');

    # Ask for the list of files in it...
    for my $file ( $fs->get_files( ) ) {
        # ...then ask for the name of each file, and print it...
        print $file->get_name( ), "\n";
    }

some clever client coder inevitably will realize that it's marginally faster to interact directly with the underlying implementation:

    # Use our company's proprietary OO file system interface...
    use File::Hierarchy;

    # Make an object representing the user's home directory...
    my $fs = File::Hierarchy->new('~');

    # Then poke around inside the (hash-based) object
    # and pull out its embedded file objects...
    for my $file (@{$fs->{files}}) {
        # Then poke around inside each (hash-based) file object,
        # pull out its name, and print it...
        print $file->{name}, "\n";
    }
From the moment someone does that, your class is no longer cleanly decoupled from the code that uses it. You can't be sure that any bugs in your class are actually caused by the internals of your class, and are not the result of some kind of monkeying by the client code. And to make matters worse, now you can't ever change those internals without the risk of breaking some other part of the system.

Of course, if the client programmers have deliberately flouted the (unenforced) encapsulation of your objects, and your subsequent essential class modifications unavoidably and necessarily break several thousands of errant lines of their malignant code, surely that's just instant justice, isn't it? Unfortunately, your pointy-haired boss will probably only hear that "your sub ... essential class modifications un ... necessarily break ... thousands of ... lines of ... code". Now, guess who's going to have to fix it all.

So you have to be aggressively pre-emptive about enforcing object encapsulation. If the first attempt to circumvent your interface fails, there won't be a second. Or a thousandth. From the very start, you need to enforce the encapsulation of your class rigorously; fatally, if possible.

Fortunately, that's not difficult in Perl. There is a simple, convenient, and utterly secure way to prevent client code from accessing the internals of the objects you provide. Happily, that approach guards against misspelling attribute names, as well as being just as fast as (and often more memory-efficient than) ordinary hash-based objects.

That approach is referred to by various names (flyweight scalars, warehoused attributes, inverted indices) but is most commonly known as: inside-out objects. They're aptly named, too, because they reverse all of Perl's standard object-oriented conventions.

For example, instead of storing the collected attributes of an object in an individual hash, inside-out objects store the individual attributes of an object in a collection of hashes.
And, rather than using the object's attributes as separate keys into an object hash, they use each object as a key into separate attribute hashes.

That description might sound horribly convoluted, but the technique itself certainly isn't. For example, consider the two typical hash-based Perl classes shown in Example 15-1. Each declares a constructor named new( ), which blesses an anonymous hash to produce a new object. The constructor then initializes the attributes of the nascent object by assigning values to appropriate keys within the blessed hash. The other methods defined in the classes (get_files( ) and get_name( )) then access the state of the object using the standard hash look-up syntax: $self->{attribute}.

Example 15-1. Typical hash-based Perl classes
    package File::Hierarchy;

    # Objects of this class have the following attributes...
    #     'root'   - The root directory of the file hierarchy
    #     'files'  - An array storing an object for each file in the root directory

    # Constructor takes path of file system root directory...
    sub new {
        my ($class, $root) = @_;

        # Bless a hash to instantiate the new object...
        my $new_object = bless {}, $class;

        # Initialize the object's "root" attribute...
        $new_object->{root} = $root;

        return $new_object;
    }

    # Retrieve files from root directory...
    sub get_files {
        my ($self) = @_;

        # Load up the "files" attribute, if necessary...
        if (!exists $self->{files}) {
            $self->{files}
                = File::System->list_files($self->{root});
        }

        # Flatten the "files" attribute's array to produce a file list...
        return @{$self->{files}};
    }

    package File::Hierarchy::File;

    # Objects of this class have the following attributes...
    #     'name' - the name of the file

    # Constructor takes name of file...
    sub new {
        my ($class, $filename) = @_;

        # Bless a hash to instantiate the new object...
        my $new_object = bless {}, $class;

        # Initialize the object's "name" attribute...
        $new_object->{name} = $filename;

        return $new_object;
    }

    # Retrieve name of file...
    sub get_name {
        my ($self) = @_;

        return $self->{name};
    }
Example 15-2 shows the same two classes, reimplemented using inside-out objects. The first thing to note is that the inside-out version of each class requires exactly the same number of lines of code as the hash-based version[*]. Moreover, the structure of each class is line-by-line identical to that of its previous version, with only minor syntactic differences on a few corresponding lines.

[*] Okay, so there's a small fudge there: the hash-based versions could each save three lines by leaving out the comments describing the class attributes. Of course, in that case the two versions, although still functionally identical, would no longer be identically maintainable.

Example 15-2. Atypical inside-out Perl classes
    package File::Hierarchy;
    use Class::Std::Utils;
    {
        # Objects of this class have the following attributes...
        my %root_of;     # The root directory of the file hierarchy
        my %files_of;    # An array storing an object for each file in the root directory

        # Constructor takes path of file system root directory...
        sub new {
            my ($class, $root) = @_;

            # Bless a scalar to instantiate the new object...
            my $new_object = bless \do{my $anon_scalar}, $class;

            # Initialize the object's "root" attribute...
            $root_of{ident $new_object} = $root;

            return $new_object;
        }

        # Retrieve files from root directory...
        sub get_files {
            my ($self) = @_;

            # Load up the "files" attribute, if necessary...
            if (!exists $files_of{ident $self}) {
                $files_of{ident $self}
                    = File::System->list_files($root_of{ident $self});
            }

            # Flatten the "files" attribute's array to produce a file list...
            return @{ $files_of{ident $self} };
        }
    }

    package File::Hierarchy::File;
    use Class::Std::Utils;
    {
        # Objects of this class have the following attributes...
        my %name_of;    # the name of the file

        # Constructor takes name of file...
        sub new {
            my ($class, $filename) = @_;

            # Bless a scalar to instantiate the new object...
            my $new_object = bless \do{my $anon_scalar}, $class;

            # Initialize the object's "name" attribute...
            $name_of{ident $new_object} = $filename;

            return $new_object;
        }

        # Retrieve name of file...
        sub get_name {
            my ($self) = @_;

            return $name_of{ident $self};
        }
    }
But although those few differences are minor and syntactic, their combined effect is enormous, because they make the resulting classes significantly more robust, completely encapsulated, and considerably more maintainable[*].

[*] They can be made thread-safe, too, provided each attribute hash is declared as being :shared and the attribute entries themselves are consistently passed to lock( ) before each attribute access. See the perlthrtut documentation for more details.

The first difference between the two approaches is that, unlike the hash-based classes, each inside-out class is specified inside a surrounding code block:

    package File::Hierarchy;
    {
        # [Class specification here]
    }

    package File::Hierarchy::File;
    {
        # [Class specification here]
    }
That block is vital, because it creates a limited scope, to which any lexical variables that are declared as part of the class will automatically be restricted. The benefits of that constraint will be made apparent shortly.

Speaking of lexical variables, the next difference between the two versions of the classes is that the descriptions of attributes in Example 15-1:

    # Objects of this class have the following attributes...
    #     'root'   - The root directory of the file hierarchy
    #     'files'  - An array storing an object for each file in the root directory
have become declarations of attributes in Example 15-2:
    # Objects of this class have the following attributes...
    my %root_of;     # The root directory of the file hierarchy
    my %files_of;    # An array storing an object for each file in the root directory
This is an enormous improvement. By telling Perl what attributes you expect to use, you enable the compiler to check (via use strict) that you do indeed use only those attributes.

That's possible because of the third difference in the two approaches. Each attribute of a hash-based object is stored in an entry in the object's hash: $self->{name}. In other words, the name of a hash-based attribute is symbolic: specified by the string value of a hash key. In contrast, each attribute of an inside-out object is stored in an entry of the attribute's hash: $name_of{ident $self}. So the name of an inside-out attribute isn't symbolic; it's a hard-coded variable name.

With hash-based objects, if an attribute name is accidentally misspelled in some method:

    sub set_name {
        my ($self, $new_name) = @_;

        $self->{naem} = $new_name;    # Oops!

        return;
    }

then the $self hash will obligingly (and silently!) create a new entry in the hash, with the key 'naem', then assign the new name to it. But since every other method in the class correctly refers to the attribute as $self->{name}, assigning the new value to $self->{naem} effectively makes that assigned value "vanish".

With inside-out objects, however, an object's "name" attribute is stored as an entry in the class's lexical %name_of hash. If the attribute name is misspelled, then you're attempting to refer to an entirely different hash: %naem_of. Like so:

    sub set_name {
        my ($self, $new_name) = @_;

        $naem_of{ident $self} = $new_name;    # Kaboom!

        return;
    }
But, because there's no such hash declared in the scope, use strict will complain (with extreme prejudice):

    Global symbol "%naem_of" requires explicit package name at Hierarchy.pm line 86
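The contrast between the two failure modes can be reproduced in a few lines. The following is only a sketch: the class names Demo::HashFile and Demo::InsideOutFile are hypothetical, and it uses refaddr( ) from the core Scalar::Util module in place of ident( ) so that it runs without any CPAN modules. The inside-out class is compiled from a string purely so that its compile-time failure can be trapped and inspected.

```perl
use strict;
use warnings;

# Hash-based class (hypothetical): the misspelled key compiles and runs
package Demo::HashFile;

sub new {
    my ($class, $name) = @_;
    return bless { name => $name }, $class;
}

sub get_name {
    my ($self) = @_;
    return $self->{name};
}

sub set_name {
    my ($self, $new_name) = @_;
    $self->{naem} = $new_name;    # Misspelled, but silently accepted
    return;
}

package main;

my $file = Demo::HashFile->new('old.txt');
$file->set_name('new.txt');
my $name_after_set = $file->get_name();    # The new name has "vanished"

# Inside-out class (hypothetical): held in a string so that its
# compile-time failure can be trapped rather than aborting this script
my $inside_out_source = <<'END_CLASS';
package Demo::InsideOutFile;
use strict;
use Scalar::Util qw(refaddr);
{
    my %name_of;

    sub set_name {
        my ($self, $new_name) = @_;
        $naem_of{ refaddr($self) } = $new_name;    # Misspelled hash name
        return;
    }
}
1;
END_CLASS

my $compiled_ok   = eval $inside_out_source;
my $compile_error = $@;

print "Hash-based name after set_name(): $name_after_set\n";
print "Inside-out class failed to compile: $compile_error" if !$compiled_ok;
```

The hash-based getter still returns the old name, with no diagnostic of any kind, whereas the inside-out class never gets as far as running. In real code the class would live in its own file, and the same message would simply stop the program at compile time.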
Not only is that consistency check now automatic, it's also performed at compile time.

The next difference is even more important and beneficial. Instead of blessing an empty anonymous hash as the new object:

    my $new_object = bless {}, $class;
the inside-out constructor blesses an empty anonymous scalar:

    my $new_object = bless \do{my $anon_scalar}, $class;
That odd-looking \do{my $anon_scalar} construct is needed because there's no built-in syntax in Perl for creating a reference to an anonymous scalar; you have to roll-your-own (see the upcoming "Nameless Scalars" sidebar for details). Alternatively, you may prefer to avoid the oddity and just use the anon_scalar( ) function that's provided by the Class::Std::Utils CPAN module:

    use Class::Std::Utils;

    # and later...

    my $new_object = bless anon_scalar( ), $class;
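As a quick check of what the \do{my $anon_scalar} idiom actually produces, the sketch below wraps it in a helper and inspects the results with the core Scalar::Util module. The helper name make_anon_scalar( ) and the class name Demo::Class are made up for this illustration; they are not part of Class::Std::Utils.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr reftype blessed);

# Hypothetical helper: every call returns a reference to a fresh,
# nameless, uninitialized scalar
sub make_anon_scalar {
    return \do { my $anon_scalar };
}

my $first  = make_anon_scalar();
my $second = make_anon_scalar();

# Bless one of them, as an inside-out constructor would
my $object = bless make_anon_scalar(), 'Demo::Class';

print 'Reference type:   ', reftype($object), "\n";
print 'Blessed into:     ', blessed($object), "\n";
print 'Distinct scalars: ',
    ( refaddr($first) != refaddr($second) ? 'yes' : 'no' ), "\n";
print 'Contents defined: ', ( defined ${$object} ? 'yes' : 'no' ), "\n";
```

Each reference points at a different scalar for as long as both are alive, and the scalar itself holds nothing at all, which is exactly what the constructor needs.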
Whichever way the anonymous scalar is created, it's immediately passed to bless, which anoints it as an object of the appropriate class. The resulting object reference is then stored in $new_object.

Once the object exists, it's used to create a unique key (ident $new_object) under which each attribute that belongs to the object will be stored (e.g., $root_of{ident $new_object} or $name_of{ident $self}).

The ident( ) utility that produces this unique key is provided by the Class::Std::Utils module and is identical in effect to the refaddr( ) function in the standard Scalar::Util module. That is, ident($obj) simply returns the memory address of the object as an integer. That integer is guaranteed to be unique to the object, because only one object can be stored at any given memory address. You could use refaddr( ) directly to get the address if you prefer, but Class::Std::Utils gives it a shorter, less obtrusive name, which makes the resulting code more readable.

To recap: every inside-out object is a blessed scalar, and has, intrinsic to it, a unique identifying integer. That integer can be obtained from the object reference itself, and then used to access a unique entry for the object in each of the class's attribute hashes.

But why is that so much better than just using hashes as objects? Because it means that every inside-out object is nothing more than an uninitialized scalar. When your constructor passes a new inside-out object back to the client code, all that comes back is an empty scalar, which makes it impossible for that client code to gain direct access to the object's internal state.

Oh, sure, the client code could pass an object reference to refaddr( ) or ident( ) to obtain the unique identifier under which that object's state is stored. But that won't help. The client code is outside the block that surrounds the object's class. So, by the time the client code gets hold of an object, the lexical attribute hashes inside the class block (such as %name_of and %files_of) will be out of scope. The client code won't even be able to see them, let alone access them.

At this point you might be wondering: if those attribute hashes are out of scope, why didn't they cease to exist? As explained in the "Nameless Scalars" sidebar, variables are garbage-collected only when nothing refers to them anymore. But the attribute hashes in each class are permanently referred to, by name, in the code of the various methods of the class. It's those references that keep the hashes "alive" even after their scope ends.

Interestingly, that also means that if you declare an attribute hash and then don't actually refer to it in any of the class's methods, that hash will be garbage-collected as soon as the declaration scope finishes. So you don't even pay a storage penalty for attributes you mistakenly declare but never use.

With a hash-based object, object state is protected only by the client coder's self-discipline and sense of honour (that is, not at all):

    # Find the user's videos...
    my $vid_lib = File::Hierarchy->new('~/videos');

    # Replace the first three with titles that aren't
    # actually in the directory (bwah-ha-ha-hah!!!!)...
    $vid_lib->{files}[0] = q{Phantom Menace};
    $vid_lib->{files}[1] = q{The Man Who Wasn't There};
    $vid_lib->{files}[2] = q{Ghost};
But if the File::Hierarchy constructor returns an inside-out object instead, then the client code gets nothing but an empty scalar, and any attempt to mess with the object's internal state by treating the object as a raw hash will now produce immediate and fatal results:

    Not a HASH reference at client_code.pl line 6
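Both halves of that guarantee (the fatal error, and the invisibility of the attribute hashes) can be observed in a small self-contained sketch. The Demo::Library class below is hypothetical, uses the core refaddr( ) rather than ident( ), and deliberately omits the destructor that a complete inside-out class would need; the eval block is there only to trap the exception so the script can report it.

```perl
use strict;
use warnings;

package Demo::Library;    # Hypothetical inside-out class, for illustration
use Scalar::Util qw(refaddr);
{
    # Attribute hash: lexical, so visible only inside this block
    my %files_of;

    sub new {
        my ($class, @files) = @_;
        my $new_object = bless \do { my $anon_scalar }, $class;
        $files_of{ refaddr($new_object) } = [@files];
        return $new_object;
    }

    sub get_files {
        my ($self) = @_;
        return @{ $files_of{ refaddr($self) } };
    }
}

package main;

my $vid_lib = Demo::Library->new('Brazil', 'Gattaca');

# Try to vandalize the object as if it were a hash; eval traps the exception
my $tampering_succeeded = eval { $vid_lib->{files}[0] = 'Ghost'; 1 };
my $tampering_error     = $@;

# The lexical %files_of is not a package variable, so even a fully
# qualified name reaches only an unrelated (and empty) package hash
my $reachable_entries = do {
    no warnings 'once';
    scalar keys %Demo::Library::files_of;
};

my @files = $vid_lib->get_files();

print "Tampering failed: $tampering_error" if !$tampering_succeeded;
print "Entries reachable from outside: $reachable_entries\n";
print "Files still intact: @files\n";
```

The object's file list is unchanged after the attempted assignment, and nothing the client code can name leads to the hash that holds it.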
By implementing all your classes using inside-out objects from the very beginning, you can ensure that client code never has the opportunity to rely on the internals of your class, as it will never be given access to those internals. That guaranteed isolation of internals from interface makes inside-out objects intrinsically more maintainable, because it leaves you free to make changes to the class's implementation whenever you need to.

Of the several popular methods of reliably enforcing encapsulation in Perl[*], inside-out objects are also by far the cheapest. The run-time performance of inside-out classes is effectively identical to that of regular hash-based classes. In particular, in both schemes, every attribute access requires only a single hash look-up. The only appreciable difference in speed occurs when an inside-out object is destroyed (see the "Destructors" guideline later in this chapter).

[*] Including subroutine-based objects, "flyweight" objects, and the Class::Securehash module; see Chapter 11 of Object Oriented Perl (Manning, 1999).

The relative memory overheads of the two schemes are a little more complex to analyze. Hash-based classes require one hash per object (obviously). On the other hand, inside-out classes require one (empty) scalar per object, plus one hash per declared attribute (i.e., %name_of, %files_of, and so on). Both schemes also need one scalar per attribute per object (the actual storage for their data inside the various hashes), but that cancels out in the comparison and can be ignored.

All of which means that, given the relative sizes of an empty hash and an empty scalar (about 7.7 to 1), inside-out objects are more space-efficient than hash-based objects whenever the number of objects to be created is at least 15% higher than the number of attributes per object. In practical terms, inside-out classes scale better than hash-based classes as the total number of objects increases.
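The 15% break-even figure follows from the storage counts just described. In the derivation below, the symbols N, A, h, and s are introduced only for this illustration and do not come from the text; the sole input taken from the text is the size ratio of about 7.7 to 1. The per-attribute, per-object data scalars are left out because they are the same in both schemes.

```latex
% Requires amsmath.
% N = number of objects, A = number of attributes per object,
% h = size of an empty hash, s = size of an empty scalar, h/s \approx 7.7
\begin{align*}
M_{\text{hash-based}} &= N h \\
M_{\text{inside-out}} &= N s + A h \\
N s + A h < N h
  &\iff N > \frac{h}{h - s}\, A
   = \frac{7.7}{7.7 - 1}\, A \approx 1.15\, A
\end{align*}
```

So a class with 10 attributes, for example, would come out ahead once roughly a dozen objects exist, under these simplified assumptions.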
The only serious drawback of inside-out objects stems directly from their greatest benefit: encapsulation. Because their internals cannot be accessed outside their class, you can't use Data::Dumper (or any other serialization tool) to help you debug the structure of your objects. "Automating Class Hierarchies" in Chapter 16 describes a simple means of overcoming this limitation.