Data Types
The fundamental unit of data storage in PHP is known as the zval, or Zend Value. It's a small, four-member struct defined in Zend/zend.h with the following format:
    typedef struct _zval_struct {
        zvalue_value value;
        zend_uint refcount;
        zend_uchar type;
        zend_uchar is_ref;
    } zval;
It should be a simple matter to intuit the basic storage type for most of these members: unsigned integer for refcount, and unsigned character for type and is_ref. The value member, however, is actually a union defined, as of PHP5, as:
    typedef union _zvalue_value {
        long lval;
        double dval;
        struct {
            char *val;
            int len;
        } str;
        HashTable *ht;
        zend_object_value obj;
    } zvalue_value;
This union allows Zend to store the many different types of data a PHP variable is capable of holding in a single, unified structure.
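To see why a union keeps this structure compact, here is a minimal sketch that can be compiled without the Zend headers. The names `demo_value`, `demo_value_as_struct`, and the helper functions are assumptions made for illustration, and `void *` stands in for `HashTable *`; this is not the engine's own definition.

```c
#include <stddef.h>

/* Simplified stand-in for zvalue_value. The HashTable pointer is replaced
 * by a void * placeholder and the object member is omitted, because this
 * sketch does not include the Zend headers. */
typedef union demo_value {
    long lval;
    double dval;
    struct {
        char *val;
        int len;
    } str;
    void *ht; /* placeholder for HashTable * */
} demo_value;

/* The same members laid out as a struct, for size comparison only. */
typedef struct demo_value_as_struct {
    long lval;
    double dval;
    struct {
        char *val;
        int len;
    } str;
    void *ht;
} demo_value_as_struct;

/* A union only needs enough room for its largest member. */
size_t demo_union_size(void)
{
    return sizeof(demo_value);
}

/* A struct needs room for every member at once. */
size_t demo_struct_size(void)
{
    return sizeof(demo_value_as_struct);
}

/* All union members share the same storage, so writing lval replaces
 * whatever was previously stored through another member. */
long demo_store_long(demo_value *v, long n)
{
    v->lval = n;
    return v->lval;
}
```

Comparing `demo_union_size()` with `demo_struct_size()` shows the saving: the union is smaller because only one member is live at a time. That is also why a separate type tag is needed to record which member is currently valid.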
Zend currently defines the eight data types listed in Table 2.1.
| Type Value | Purpose |
|---|---|
| IS_NULL | This type is automatically assigned to uninitialized variables upon their first use and can also be explicitly assigned in userspace using the built-in NULL constant. This variable type provides a special "non-value," which is distinct from a Boolean FALSE or an integer 0. |
| IS_BOOL | Boolean variables can have one of two possible states, either TRUE or FALSE. Conditional expressions in userspace control structures (if, while, ternary, for) are implicitly typecast to Boolean during evaluation. |
| IS_LONG | Integer data types in PHP are stored using the host system's signed long data type. On most 32-bit platforms this yields a storage range of -2147483648 to +2147483647. With a few exceptions, whenever a userspace script attempts to store an integer value outside of this range, it is automatically converted to a double-precision floating point type (IS_DOUBLE). |
| IS_DOUBLE | Floating point data types use the host system's double data type. Floating point numbers are not stored with exact precision; rather, a formula is used to express the value as a fraction of limited precision (mantissa) times 2 raised to a certain power (exponent). This representation allows the computer to store a wide range of values (positive or negative) from as small as 2.225x10^(-308) to an upper limit of around 1.798x10^308 in only 8 bytes. Unfortunately, numbers that evaluate to exact figures in decimal don't always store cleanly as binary fractions. For example, the decimal expression 0.5 evaluates to an exact binary figure of 0.1, while decimal 0.8 becomes a repeating binary representation of 0.1100110011.... When converted back to decimal, the truncated binary digits yield a slightly offset value because they are not able to store the entire figure. Think of it like trying to express the number 1/3 as a decimal: 0.333333 comes very close, but it's not precise, as evidenced by the fact that 3 * 0.333333 is not 1.0. This imprecision often leads to confusion when dealing with floating point numbers on computers. (These range limits are based on common 32-bit platforms; range may vary from system to system.) |
| IS_STRING | PHP's most universal data type is the string, which is stored in just the way an experienced C programmer would expect. A block of memory, sufficiently large to hold all the bytes/characters of the string, is allocated and a pointer to that string is stored in the host zval. What's worth noting about PHP strings is that the length of the string is always explicitly stated in the zval structure. This allows strings to contain NULL bytes without being truncated. This aspect of PHP strings will be referred to hereafter as binary safety because it makes them safe to contain any type of binary data. Note that the amount of memory allocated for a given PHP string is always, at minimum, its length plus one. This last byte is populated with a terminating NULL character so that functions that do not require binary safety can simply pass the string pointer through to their underlying method. |
| IS_ARRAY | An array is a special purpose variable whose sole function is to carry around other variables. Unlike C's notion of an array, a PHP array is not a vector of a uniform data type (such as zval arrayofzvals[]; ). Instead, a PHP array is a complex set of data buckets linked into a structure known as a HashTable. Each HashTable element (bucket) contains two relevant pieces of information: label and data. In the case of PHP arrays, the label is the associative or numeric index within the array, and the data is the variable (zval) to which that key refers. |
| IS_OBJECT | Objects take the multi-element data storage of arrays and go one further by adding methods, access modifiers, scoped constants, and special event handlers. As an extension developer, building object-oriented code that functions equally well in PHP4 and PHP5 presents a special challenge because the internal object model has changed so much between Zend Engine 1 (PHP4) and Zend Engine 2 (PHP5). |
| IS_RESOURCE | Some data types simply cannot be mapped to userspace. For example, stdio's FILE pointer or libmysqlclient's connection handle can't be simply mapped to an array of scalar values, nor would they make sense if they could. To shield the userspace script writer from having to deal with these issues, PHP provides a generic resource data type. The details of how resources are implemented will be covered in Chapter 9, "The Resource Datatype"; for now just be aware that they exist. |
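The binary safety described for IS_STRING can be illustrated outside the engine. The following is a minimal sketch, assuming a hypothetical `demo_string` type that mirrors the val/len pair of the zval's string member; the type and both helper functions are invented for this example and are not part of the Zend API.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical length-carrying string, mirroring the val/len pair. */
typedef struct demo_string {
    char *val;
    int len;
} demo_string;

/* Copy exactly len bytes, then append a terminating NUL byte, so the
 * allocation is len + 1 bytes long. Embedded NUL bytes are preserved
 * because memcpy() does not stop at them.
 * Returns 0 on success, -1 if the allocation fails. */
int demo_string_init(demo_string *s, const char *bytes, int len)
{
    s->val = malloc((size_t)len + 1);
    if (s->val == NULL) {
        s->len = 0;
        return -1;
    }
    memcpy(s->val, bytes, (size_t)len);
    s->val[len] = '\0';
    s->len = len;
    return 0;
}

/* Release the buffer and reset the struct to an empty state. */
void demo_string_free(demo_string *s)
{
    free(s->val);
    s->val = NULL;
    s->len = 0;
}
```

Initializing a `demo_string` from the seven bytes `"foo\0bar"` leaves `len` at 7, while `strlen()` on the same buffer reports 3 because it stops at the embedded NULL byte. Code that trusts the stored length sees the whole value; code that relies on the terminator sees a truncated one, but never reads past the end of the allocation.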
The IS_* constants listed in Table 2.1 are stored in the type element of the zval struct and determine which part of the value element of the zval struct should be looked at when examining its value.
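The relationship between the type tag and the value union can be sketched as a plain C tagged union. Everything named `demo_*` or `DEMO_*` below is an assumption made for illustration; the real constants and struct come from the Zend headers, and the numeric values of this enum are arbitrary.

```c
#include <stdio.h>

/* Hypothetical type tags standing in for a few of the IS_* constants. */
enum demo_type { DEMO_NULL, DEMO_LONG, DEMO_DOUBLE, DEMO_STRING };

/* Cut-down stand-in for a zval: a value union plus a type tag. */
typedef struct demo_zval {
    union {
        long lval;
        double dval;
        struct {
            char *val;
            int len;
        } str;
    } value;
    unsigned char type;
} demo_zval;

/* Write a description of v into buf, reading only the union member that
 * the type tag says is valid. Returns whatever snprintf() returns. */
int demo_describe(const demo_zval *v, char *buf, size_t buflen)
{
    switch (v->type) {
    case DEMO_NULL:
        return snprintf(buf, buflen, "NULL");
    case DEMO_LONG:
        return snprintf(buf, buflen, "long: %ld", v->value.lval);
    case DEMO_DOUBLE:
        return snprintf(buf, buflen, "double: %g", v->value.dval);
    case DEMO_STRING:
        /* %.*s writes at most len bytes, stopping early at a NUL byte. */
        return snprintf(buf, buflen, "string(%d): %.*s",
                        v->value.str.len, v->value.str.len,
                        v->value.str.val);
    default:
        return snprintf(buf, buflen, "unknown type %d", v->type);
    }
}
```

Reading a member other than the one the tag names would reinterpret unrelated bytes, which is why the type is checked before the value is touched.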
The most obvious way to inspect the value of type would probably be to dereference it from a given zval as in the following code snippet:
    void describe_zval(zval *foo)
    {
        if (foo->type == IS_NULL) {
            php_printf("The variable is NULL");
        } else {
            php_printf("The variable is of type %d", foo->type);
        }
    }
Obvious, but wrong.
Well, not wrong, but certainly not the preferred approach. The Zend header files contain a large block of zval access macros that extension authors are expected to use when examining zval data. The primary reason for this is to avoid incompatibilities when and if the engine's API changes, but as a side benefit the code often becomes easier to read. Here's that same code snippet again, this time using the Z_TYPE_P() macro:
    void describe_zval(zval *foo)
    {
        if (Z_TYPE_P(foo) == IS_NULL) {
            php_printf("The variable is NULL");
        } else {
            php_printf("The variable is of type %d", Z_TYPE_P(foo));
        }
    }
The _P suffix on this macro indicates that the parameter passed contains a single level of indirection. Two more macros exist in this set, Z_TYPE() and Z_TYPE_PP(), which expect parameters of type zval (no indirection) and zval** (two levels of indirection), respectively.
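The three levels of indirection can be sketched with stand-in macros that compile without the engine. The `mini_zval` struct and the `MINI_Z_TYPE*` macros below are hypothetical names modeled on the pattern just described; they are not copied from the Zend headers, so treat the layering as an illustration of the idea only.

```c
/* Minimal stand-in for a zval; only the type member matters here. */
typedef struct mini_zval {
    long value;
    unsigned char type;
} mini_zval;

/* No indirection: the argument is the struct itself. */
#define MINI_Z_TYPE(zv)        ((zv).type)
/* One level of indirection: dereference once, then reuse the base macro. */
#define MINI_Z_TYPE_P(zv_p)    MINI_Z_TYPE(*(zv_p))
/* Two levels of indirection: dereference twice. */
#define MINI_Z_TYPE_PP(zv_pp)  MINI_Z_TYPE(**(zv_pp))

/* Each helper reads the same field through a different level of
 * indirection, matching the suffix of the macro it uses. */
int mini_type_direct(mini_zval zv)
{
    return MINI_Z_TYPE(zv);
}

int mini_type_ptr(mini_zval *zv_p)
{
    return MINI_Z_TYPE_P(zv_p);
}

int mini_type_ptr_ptr(mini_zval **zv_pp)
{
    return MINI_Z_TYPE_PP(zv_pp);
}
```

Given a `mini_zval zv`, a pointer `p = &zv`, and a pointer-to-pointer `pp = &p`, all three helpers return the same type value. Because callers go through the macros, a change to the struct's layout would only require updating the base macro.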
Note
In this example a special output function, php_printf(), was used to display a piece of data. This function is syntactically identical to stdio's printf() function; however, it handles special processing for web server SAPIs and takes advantage of PHP's output buffering mechanism. You'll learn more about this function and its cousin PHPWRITE() in Chapter 5, "Your First Extension."