Hash Functions
Hash functions are used for several cryptographic applications. They can be used for secure password verification or storage and are also a base component for data authentication.
Hashing is a one-way function of input data that produces fixed-length output data, the digest. The digest identifies the input data in practice, because for a well-designed hash function it is computationally infeasible to find two inputs with the same digest. It is also cryptographically very strong: it is computationally infeasible to recover the input data from its digest, and if the input data changes even slightly, the digest changes substantially (the avalanche effect). Therefore, high-volume data can be identified by its much shorter digest, which is why the digest is also called a fingerprint of the data.
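As a quick illustration of the fixed-length output and the avalanche effect, the following sketch uses Python's standard `hashlib` module. The two messages are arbitrary examples that differ in a single character, and `bit_difference` is a small helper written only for this illustration. The exact number of differing bits depends on the inputs, so no specific count is shown.

```python
import hashlib

def bit_difference(a, b):
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

# Two example messages that differ in one character.
d1 = hashlib.sha1(b"Transfer $100 to Alice").digest()
d2 = hashlib.sha1(b"Transfer $900 to Alice").digest()

print(len(d1) * 8, len(d2) * 8)  # both digests are 160 bits, whatever the input size
print(bit_difference(d1, d2))    # for a good hash, typically around half of the 160 bits
```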
Figure 24-4 illustrates how hashing is performed. Data of arbitrary length is input to the hash function, and the result of the hash function is the fixed-length hash (digest, fingerprint). Hashing is similar to the calculation of cyclic redundancy check (CRC) checksums, except that it is much stronger from a cryptographic point of view. With CRC, given a CRC value, it is easy to generate data with the same CRC. However, with hash functions, this is not computationally feasible for an attacker.
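One concrete reason CRC offers no cryptographic protection is that it is affine over GF(2): for inputs of equal length, the CRC of the XOR of three messages equals the XOR of their three CRCs. The sketch below checks this with Python's `zlib.crc32`; the random 32-byte messages are purely illustrative. No comparable algebraic shortcut is known for a cryptographic hash such as SHA-1.

```python
import hashlib
import os
import zlib

# Three arbitrary messages of the same length.
x, y, z = (os.urandom(32) for _ in range(3))
combined = bytes(a ^ b ^ c for a, b, c in zip(x, y, z))

# CRC-32: the result for the combined message is predictable from the
# individual CRCs, without any secret, which lets an attacker steer the CRC.
assert zlib.crc32(combined) == zlib.crc32(x) ^ zlib.crc32(y) ^ zlib.crc32(z)

# SHA-1: no such relation is expected to hold for a cryptographic hash.
def h(m):
    return int.from_bytes(hashlib.sha1(m).digest(), "big")

print(h(combined) == h(x) ^ h(y) ^ h(z))
```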
Figure 24-4. Hashing Process
The two best-known hashing functions are these:
- Message Digest 5 (MD5), with 128-bit digests
- Secure Hash Algorithm 1 (SHA-1), with 160-bit digests
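The digest sizes listed above can be checked with Python's `hashlib`. This is a minimal sketch; `digest_bits` is a hypothetical helper name. On systems running in FIPS mode, `hashlib.new("md5")` may be refused.

```python
import hashlib

def digest_bits(algorithm):
    """Return the digest length in bits for a hashlib algorithm name."""
    return hashlib.new(algorithm).digest_size * 8

for name in ("md5", "sha1"):
    print(name, digest_bits(name), hashlib.new(name, b"abc").hexdigest())
```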
MD5 is not as strong as originally envisioned: collisions (different inputs resulting in the same fingerprint) are far easier to find than designed for, and practical MD5 collisions were demonstrated in 2004. Therefore, MD5 should be avoided as an algorithm of choice, and SHA-1 should be used instead. Note that practical SHA-1 collisions were also demonstrated in 2017, so new designs generally prefer the SHA-2 family.
NIST developed SHA, the algorithm specified in the Secure Hash Standard. SHA-1 is a revision to SHA that was published in 1994; the revision corrected an unpublished flaw in SHA. Its design is very similar to that of the MD4 family of hash functions developed by Rivest. The algorithm takes a message of less than 2^64 bits in length and produces a 160-bit message digest. The algorithm is slightly slower than MD5, but the larger message digest makes it more secure against brute-force collision and inversion attacks.
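To make the description concrete, here is an educational, unoptimized SHA-1 written in Python and checked against `hashlib.sha1`. It shows the padding step that appends the message length as a 64-bit field (which is why the input must be shorter than 2^64 bits), the 512-bit blocks, and the 160-bit state of five 32-bit words. The one-bit rotation in the message schedule is widely documented as the change that distinguishes SHA-1 from the original SHA. This is a sketch for understanding only; real code should call `hashlib.sha1`.

```python
import hashlib
import struct

MASK32 = 0xFFFFFFFF

def rotl(x, n):
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK32

def sha1_pad(message):
    """Append 0x80, zero bytes up to 56 mod 64, then the 64-bit bit length."""
    bit_length = len(message) * 8
    if bit_length >= 2**64:
        raise ValueError("SHA-1 input must be shorter than 2**64 bits")
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded)) % 64)
    return padded + struct.pack(">Q", bit_length)

def sha1(message):
    """Educational SHA-1; use hashlib.sha1 in real code."""
    h = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]
    data = sha1_pad(message)
    for offset in range(0, len(data), 64):  # process one 512-bit block at a time
        # Expand the 16-word block into an 80-word message schedule.
        w = list(struct.unpack(">16I", data[offset:offset + 64]))
        for t in range(16, 80):
            w.append(rotl(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))
        a, b, c, d, e = h
        for t in range(80):
            if t < 20:
                f, k = (b & c) | (~b & d), 0x5A827999
            elif t < 40:
                f, k = b ^ c ^ d, 0x6ED9EBA1
            elif t < 60:
                f, k = (b & c) | (b & d) | (c & d), 0x8F1BBCDC
            else:
                f, k = b ^ c ^ d, 0xCA62C1D6
            temp = (rotl(a, 5) + f + e + k + w[t]) & MASK32
            a, b, c, d, e = temp, a, rotl(b, 30), c, d
        h = [(x + y) & MASK32 for x, y in zip(h, (a, b, c, d, e))]
    return struct.pack(">5I", *h)  # 5 x 32 bits = 160-bit digest

assert sha1(b"abc") == hashlib.sha1(b"abc").digest()
```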
Figure 24-5 illustrates hashing in action. The sender wants to ensure that the message will not be altered on its way to the receiver. The sender uses the message as the input to a hashing algorithm and computes its fixed-length digest or fingerprint. This fingerprint is then attached to the message (the message and the hash are cleartext) and sent to the receiver. The receiver removes the fingerprint from the message and uses the message as input to the same hashing algorithm. If the hash computed by the receiver is equal to the one attached to the message, the message has not been altered during transit.
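The exchange in Figure 24-5 can be sketched in a few lines of Python. The function names `send` and `receive` are hypothetical, and SHA-1 is used to match the text. The message and its fingerprint travel together in cleartext.

```python
import hashlib

def send(message):
    """Sender: compute the fingerprint and attach it to the cleartext message."""
    return message, hashlib.sha1(message).digest()

def receive(message, fingerprint):
    """Receiver: recompute the fingerprint and compare it with the attached one."""
    return hashlib.sha1(message).digest() == fingerprint

msg, fp = send(b"Meeting moved to 10:00")
print(receive(msg, fp))                        # True: the message arrived unchanged

corrupted = bytes([msg[0] ^ 0x01]) + msg[1:]   # simulate a one-bit transmission error
print(receive(corrupted, fp))                  # False: the change is detected
```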
Figure 24-5. Hashing Example
Be aware that there is no security added to the message in this example. When the message traverses the network, a potential attacker could intercept the message, change it, recalculate the hash, and append the newly recalculated fingerprint to the message (a man-in-the-middle interception attack). Hashing alone only detects accidental changes to the message (for example, changes caused by a communication error). There is nothing unique to the sender in the hashing procedure; anyone who knows the hash algorithm can compute a hash for any data.
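The attack just described takes only a few lines, because computing an unkeyed hash requires no secret. In this sketch the messages and the names `fingerprint` and `receiver_accepts` are illustrative assumptions.

```python
import hashlib

def fingerprint(message):
    return hashlib.sha1(message).digest()

def receiver_accepts(message, attached):
    return fingerprint(message) == attached

original = b"Pay 100 to Alice"
sent = (original, fingerprint(original))

# Man in the middle: replace the message and simply recompute the hash.
forged_message = b"Pay 900 to Mallory"
forged = (forged_message, fingerprint(forged_message))

print(receiver_accepts(*sent))    # True
print(receiver_accepts(*forged))  # True: the receiver cannot detect the substitution
```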
Thus, hash functions are helpful to ensure that data was not changed accidentally but cannot ensure that data was not deliberately changed. For the latter, you need to employ hash functions in the context of Hash-based Message Authentication Code (HMAC), which extends hashing by adding a secret component, a key.
HMAC uses existing hash functions, but with one significant difference: a secret key is added as an input to the hash function when calculating the digest (fingerprint). Only the sender and the receiver share the secret key, so the output of the hash function now depends on both the input data and the secret key. Therefore, only parties who have access to that secret key can compute or verify the digest of an HMAC function. This defeats the man-in-the-middle modification attack described earlier and also provides authentication of data origin. If only two parties share a secret HMAC key and use HMAC functions for authentication, the receiver of a properly constructed HMAC digest with a message can be sure that the other party was the originator of the message, because that other party is the only other entity possessing the secret key. However, because both parties know the key, HMAC does not provide nonrepudiation: either party could have produced a given digest. Nonrepudiation requires that every entity have its own private key, known to no one else, instead of a secret key shared between two parties, as with digital signatures.
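The same exchange with HMAC can be sketched using Python's standard `hmac` module. The 20-byte random key is an illustrative assumption (how the two parties obtain the shared key is outside the scope of this sketch), and `tag` and `verify` are hypothetical helper names. Without the shared key, the attacker's substituted message no longer verifies.

```python
import hashlib
import hmac
import os

shared_key = os.urandom(20)  # illustrative: known only to sender and receiver

def tag(key, message):
    """Compute an HMAC-SHA1 digest of the message under the given key."""
    return hmac.new(key, message, hashlib.sha1).digest()

def verify(key, message, received_tag):
    # compare_digest compares in constant time to avoid leaking timing information.
    return hmac.compare_digest(tag(key, message), received_tag)

message = b"Pay 100 to Alice"
sent = (message, tag(shared_key, message))

# The attacker can replace the message but must guess a key for the new tag.
forged_message = b"Pay 900 to Mallory"
attacker_tag = tag(os.urandom(20), forged_message)

print(verify(shared_key, *sent))                         # True
print(verify(shared_key, forged_message, attacker_tag))  # False unless the key is guessed
```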
HMAC functions are generally fast and are often applied in these situations:
- To provide a fast proof of message authenticity and integrity among parties sharing the secret key, such as with IPsec packets or routing protocol authentication
- To generate one-time (and one-way) responses to challenges in authentication protocols (such as PPP Challenge Handshake Authentication Protocol [CHAP], Microsoft NT Domain, and Extensible Authentication Protocol-MD5 [EAP-MD5])
- To provide proof of integrity of bulk data, such as with file-integrity checkers (for example, Tripwire), or with document signing (digitally signed contracts, Public Key Infrastructure [PKI] certificates), where the hash of the data is signed with a private key rather than keyed with a shared secret
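The challenge-response use in the second item can be sketched as follows. This is a generic HMAC-SHA1 challenge-response written for illustration, not the exact construction of any protocol listed above; CHAP (RFC 1994), for example, computes an MD5 hash over an identifier, the secret, and the challenge. The secret value and all function names are hypothetical.

```python
import hashlib
import hmac
import os

shared_secret = b"example-shared-secret"  # hypothetical; provisioned out of band

def authenticator_challenge():
    # A fresh random challenge per attempt, so an old response cannot be replayed.
    return os.urandom(16)

def peer_response(secret, challenge):
    # Only this one-way response crosses the network, never the secret itself.
    return hmac.new(secret, challenge, hashlib.sha1).digest()

def authenticator_verify(secret, challenge, response):
    expected = hmac.new(secret, challenge, hashlib.sha1).digest()
    return hmac.compare_digest(expected, response)

challenge = authenticator_challenge()
response = peer_response(shared_secret, challenge)
print(authenticator_verify(shared_secret, challenge, response))  # True
```

Because each challenge is random, a response captured from one exchange is useless against the next challenge.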
Some well-known HMAC functions are as follows:
- Keyed MD5, based on the MD5 hashing algorithm, which should be avoided
- Keyed SHA-1, based on the SHA-1 hashing algorithm, which is recommended
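Assuming that "keyed" here means the standard HMAC construction of RFC 2104, keyed MD5 and keyed SHA-1 differ only in the underlying hash function. The sketch below implements the construction HMAC(K, m) = H((K' XOR opad) || H((K' XOR ipad) || m)) generically and checks it against Python's `hmac` module for both hashes. `hmac_rfc2104` is a hypothetical name; real code should use `hmac.new`. On FIPS-mode systems, MD5 may be unavailable.

```python
import hashlib
import hmac

def hmac_rfc2104(key, message, hash_name):
    """HMAC built by hand from a hashlib hash, following RFC 2104."""
    block_size = hashlib.new(hash_name).block_size  # 64 bytes for MD5 and SHA-1
    if len(key) > block_size:
        key = hashlib.new(hash_name, key).digest()  # keys longer than a block are hashed first
    key = key.ljust(block_size, b"\x00")            # then zero-padded to the block size
    inner_key = bytes(b ^ 0x36 for b in key)        # ipad byte is 0x36
    outer_key = bytes(b ^ 0x5C for b in key)        # opad byte is 0x5C
    inner = hashlib.new(hash_name, inner_key + message).digest()
    return hashlib.new(hash_name, outer_key + inner).digest()

for name in ("md5", "sha1"):
    ours = hmac_rfc2104(b"key", b"message", name)
    assert ours == hmac.new(b"key", b"message", name).digest()
    print(name, len(ours) * 8, "bits:", ours.hex())
```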
Cisco IP telephony uses SHA-1 HMAC for protecting signaling traffic and media exchange.