Perl Best Practices

19.7. Encapsulated Cleverness

If you must rely on cleverness, encapsulate it.

Very occasionally a genuine need for efficiency may (appear to) make it essential to use non-obvious Perl idioms such as:

# Make sure the requests are unique... @requests = keys %{ {map {$_=>1} @raw_requests} };

This statement takes each request in @raw_requests, converts it to a pair ($_=>1) in which the request is now the key, and uses that list of pairs to initialize an anonymous hash ({map {$_=>1} @raw_requests }), which folds every repeated request into the same hash key. The hash is then dereferenced (%{ {map {$_=>1} @raw_requests}), and its unique keys are retrieved (keys %{ {map {$_=>1} @raw_requests} }) and finally assigned into @requests.

But an expression that complex should never be left raw in code. If it's kept at all, it should be kept as a dirty little secret, shamefully hidden away in a subroutine in some dark corner of your code:

sub unique { return keys %{ { map {$_=>1} @_ } };

# Mea culpa! }

# and later... @requests = unique(@raw_requests);

Apart from the obvious advantage that the request-handling code becomes vastly more readable, encapsulating the cleverness has another important benefit: when the cleverness proves not to be as clever as you first thought, it's very easy to replace it with something that's both slightly more readable and very much more efficient:

sub unique { my %uniq;

# Use keys of this hash to track unique values @uniq{@_} = ( );

# Use the args as those keys (the values don't matter) return keys %uniq;

# Return those unique values }

In this version, the list of values that's passed in (@_) is used as the list of keys in a hash slice (@uniq{@_}) of the %uniq hash. Slicing that hash creates all the requested keys within the hash, with the values for repeated keys overwriting earlier ones. The slice produces a list of entries, which is assigned an empty list (@uniq{@_} = ( )), as only the keys themselves matter. The assignment is required, though, because the hash keys are created only when the slice is an lvalue. The keys of the post-sliced hash are therefore the unique values from the original argument list, which the subroutine then simply returns (return keys %uniq).

This version is faster than the previous one because it doesn't have to build an interim key/value/key/value... list and then use that to initialize the hash. Instead, the slicing operation creates all the required keys at once, and the empty list assignment means that no values need to be assigned (or have space allocated for them).

Outside the subroutine there are important benefits too. Because the "clever" code remains hidden away, every line of client code that uses unique( ) will immediately benefit from the improved performance, but without needing to be rewritten in any way.

Of course, later you'll probably realize that both versions of unique( ) share two flaws: they don't preserve the order of any list they're given (which is a problem if that list is a search path), and they convert all the list data to strings (which is a problem if that data was originally a list of objects or references).

However, once you realize that, you still won't have to change the client code, because you can just rewrite the encapsulated nastiness once again:

sub unique { my %seen;

# Keys track values already seen return grep {!$seen{$_}++} @_;

# Keep only those not yet seen }

This version uses a hash (%seen) merely as a look-up table indicating which values have already been encountered. Then it filters the original argument list, keeping only those values that have not yet been seen and incrementing the count of "sightings" for each element as it's examined ($seen{$_}++). The use of a post-increment is critical here as it's essential that the count be incremented every time, but that the count be zero the first time the grep encounters a particular element, so the filter will let that first instance through.

Because grep is merely a filter which passes acceptable values through unchanged, this version of unique( ) preserves both the type and the order of the arguments it's given. And, though it's not quite as fast as the sliced solution shown earlier, it still runs faster than the "clever" anonymous hash version.

Категории