Writing Secure Code, Second Edition

Normalization

Many character set encodings, Unicode especially, allow multiple binary representations of the same string. For example, dozens of distinct code point sequences can render as the same glyph. This multiplicity complicates operations such as indexing and validation, and the added complexity increases the risk of coding errors that compromise security. To reduce complexity in your code, normalize strings to a single form before you process them.
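As a minimal sketch of the problem, the Python snippet below compares two code point sequences that typically render identically. The choice of "é" and of form NFC is an assumption made for illustration, and normalize_input is a hypothetical helper name, not something taken from the text.

```python
import unicodedata

# Two different code point sequences that typically render identically.
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT


def normalize_input(text: str, form: str = "NFC") -> str:
    """Hypothetical helper: put text into one normalization form
    before indexing or validating it."""
    return unicodedata.normalize(form, text)


# A raw comparison treats the two spellings as different strings.
raw_equal = precomposed == decomposed

# After normalization both spellings collapse to the same sequence.
normalized_equal = normalize_input(precomposed) == normalize_input(decomposed)

# Lengths differ before normalization, which is why indexing is fragile.
lengths = (len(precomposed), len(decomposed), len(normalize_input(decomposed)))
```

Normalizing once at the trust boundary, and validating only the normalized value, keeps later comparisons and length checks consistent.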

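Several normalization forms already exist. The Unicode standard defines four (NFC, NFD, NFKC, and NFKD), and platform APIs offer their own mappings.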

The Win32 FoldString function provides several useful options for normalizing strings. Unfortunately, it doesn't cover the full range of Unicode characters, and the mappings do not always match any of the Unicode normalization forms. If you do use FoldString, be sure to test your code with the full Unicode repertoire. For example, if you use FoldString with the MAP_FOLDDIGITS option, it will normalize many but not all of the characters with the numeric Unicode property.
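The kind of repertoire-wide test suggested above might look like the following Python sketch. It does not call FoldString. Instead, fold_digits is a hypothetical stand-in that folds only characters with a Unicode decimal-digit value, and the scan then lists characters that have a numeric value but are left unchanged. Which characters FoldString itself misses is not assumed here; the same scan would have to be run against the real function on Windows.

```python
import sys
import unicodedata


def fold_digits(text: str) -> str:
    """Hypothetical stand-in for a digit-folding step (NOT FoldString).

    Maps every character that has a Unicode decimal-digit value to the
    corresponding ASCII digit and leaves everything else unchanged.
    """
    out = []
    for ch in text:
        value = unicodedata.decimal(ch, None)
        out.append(str(value) if value is not None else ch)
    return "".join(out)


def unfolded_numeric_chars() -> list:
    """Scan every code point for numeric characters the fold leaves alone."""
    missed = []
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if unicodedata.numeric(ch, None) is None:
            continue  # character has no numeric value at all
        if "0" <= ch <= "9":
            continue  # already an ASCII digit
        if fold_digits(ch) == ch:
            missed.append(ch)
    return missed


missed = unfolded_numeric_chars()
```

In this sketch, fractions and Roman numerals end up in missed because they carry a numeric value but no decimal-digit value. A validator that assumes "numeric" input has been fully folded would be wrong about them.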
