The Data Hierarchy
Ultimately, all data items that digital computers process are reduced to combinations of zeros and ones. This occurs because it is simple and economical to build electronic devices that can assume two stable states: one state represents 0 and the other represents 1. It is remarkable that the impressive functions performed by computers involve only the most fundamental manipulations of 0s and 1s.
The smallest data item that computers support is called a bit (short for "binary digit," a digit that can assume one of two values). Each data item, or bit, can assume either the value 0 or the value 1. Computer circuitry performs various simple bit manipulations, such as examining the value of a bit, setting the value of a bit and reversing a bit (from 1 to 0 or from 0 to 1).
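The three bit manipulations just described can be sketched in C++ with the bitwise operators. This is a minimal illustration, not how any particular circuit works; the function names testBit, setBit and flipBit are hypothetical, and bit position 0 is assumed to be the least significant bit.

```cpp
// Examine a bit: returns true if the bit at position pos is 1.
bool testBit(unsigned value, unsigned pos) {
    return ((value >> pos) & 1u) != 0;
}

// Set the value of a bit: returns value with the bit at pos forced to 1 or 0.
unsigned setBit(unsigned value, unsigned pos, bool on) {
    const unsigned mask = 1u << pos;
    return on ? (value | mask) : (value & ~mask);
}

// Reverse a bit: returns value with the bit at pos changed from 1 to 0 or from 0 to 1.
unsigned flipBit(unsigned value, unsigned pos) {
    return value ^ (1u << pos);
}
```

For example, the value 5 has the bit pattern 101, so testBit(5, 1) is false, setBit(5, 1, true) yields 7 (111) and flipBit(5, 0) yields 4 (100).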
Programming with data in the low-level form of bits is cumbersome. It is preferable to program with data in forms such as decimal digits (i.e., 0, 1, 2, 3, 4, 5, 6, 7, 8 and 9), letters (i.e., A through Z and a through z) and special symbols (e.g., $, @, %, &, *, (, ), -, +, ", :, ?, / and many others). Digits, letters and special symbols are referred to as characters. The set of all characters used to write programs and represent data items on a particular computer is called that computer's character set. Because computers can process only 1s and 0s, every character in a computer's character set is represented as a pattern of 1s and 0s. Programmers create programs and data items with characters; computers manipulate and process these characters as patterns of bits. These patterns are stored in bytes; on most systems, a byte is composed of eight bits. For example, C++ provides data type char. Each char occupies one byte of memory. C++ also provides data type wchar_t, which can occupy more than one byte (to support larger character sets, such as the Unicode® character set). For more information on Unicode®, visit www.unicode.org.
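To make the idea of a character as a pattern of bits concrete, the following sketch returns the bit pattern stored for a char. The function name bitPattern is hypothetical, and the code assumes 8-bit bytes (checked at compile time). The particular pattern for a given character depends on the character set in use.

```cpp
#include <bitset>
#include <climits>
#include <string>

// Return the bits that store character c, most significant bit first.
// Assumption: a byte has exactly eight bits on this system.
std::string bitPattern(char c) {
    static_assert(CHAR_BIT == 8, "this sketch assumes 8-bit bytes");
    return std::bitset<8>(static_cast<unsigned char>(c)).to_string();
}
```

On a system whose character set is ASCII-compatible, bitPattern('A') yields "01000001". The expression sizeof(char) is always 1 in C++, whereas sizeof(wchar_t) is implementation defined and is commonly 2 or 4.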
Just as characters are composed of bits, fields are composed of characters. A field is a group of characters that conveys some meaning. For example, a field consisting of uppercase and lowercase letters can represent a person's name.
Data items processed by computers form a data hierarchy (Fig. 17.1), in which data items become larger and more complex in structure as we progress from bits, to characters, to fields and to larger data aggregates.
Figure 17.1. Data hierarchy.
Typically, a record (which can be represented as a class in C++) is composed of several fields (called data members in C++). In a payroll system, for example, a record for a particular employee might include the following fields:
- Employee identification number
- Name
- Address
- Hourly pay rate
- Number of exemptions claimed
- Year-to-date earnings
- Amount of taxes withheld
Thus, a record is a group of related fields. In the preceding example, each field is associated with the same employee. A file is a group of related records.[1] A company's payroll file normally contains one record for each employee. Thus, a payroll file for a small company might contain only 22 records, whereas one for a large company might contain 100,000 records. It is not unusual for a company to have many files, some containing millions, billions, or even trillions of characters of information.
[1] Generally, a file can contain arbitrary data in arbitrary formats. In some operating systems, a file is viewed as nothing more than a collection of bytes. In such an operating system, any organization of the bytes in a file (such as organizing the data into records) is a view created by the application programmer.
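The payroll record listed earlier might be written in C++ roughly as follows. This is only a sketch: the type name EmployeeRecord, the member names and the helper netYearToDate are hypothetical, and double is used for money amounts purely to keep the example short.

```cpp
#include <string>

// One record: a group of related fields, each stored as a data member.
struct EmployeeRecord {
    int id;                     // employee identification number
    std::string name;           // name
    std::string address;        // address
    double hourlyRate;          // hourly pay rate
    int exemptions;             // number of exemptions claimed
    double yearToDateEarnings;  // year-to-date earnings
    double taxesWithheld;       // amount of taxes withheld
};

// Hypothetical helper: combines two fields of the same record.
double netYearToDate(const EmployeeRecord& record) {
    return record.yearToDateEarnings - record.taxesWithheld;
}
```

A struct is used here because every member is public data; in C++ a struct is a class whose members are public by default.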
To facilitate the retrieval of specific records from a file, at least one field in each record is chosen as a record key. A record key identifies a record as belonging to a particular person or entity and distinguishes that record from all others. In the payroll record described previously, the employee identification number normally would be chosen as the record key.
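One way to picture retrieval by record key is an in-memory index that maps each key to its record. The sketch below uses std::map for that purpose; the names Employee, PayrollIndex, addRecord and findByKey are hypothetical, and the record is cut down to two fields for brevity.

```cpp
#include <map>
#include <string>

// A reduced record: the id field serves as the record key.
struct Employee {
    int id;
    std::string name;
};

// Maps each record key to its record.
using PayrollIndex = std::map<int, Employee>;

// Add a record; returns false if a record with the same key already exists,
// because a key must distinguish one record from all others.
bool addRecord(PayrollIndex& index, const Employee& employee) {
    return index.emplace(employee.id, employee).second;
}

// Retrieve the record with the given key, or nullptr if there is none.
const Employee* findByKey(const PayrollIndex& index, int id) {
    const auto it = index.find(id);
    return it == index.end() ? nullptr : &it->second;
}
```

Rejecting a duplicate key in addRecord reflects the requirement that a key identify exactly one record.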
There are many ways of organizing records in a file. A common type of organization is called a sequential file, in which records typically are stored in order by a record-key field. In a payroll file, records usually are placed in order by employee identification number. The first employee record in the file contains the lowest employee identification number, and subsequent records contain increasingly higher ones.
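A sequential organization can be sketched as writing the records in ascending key order and then reading them back in the order stored. The record layout (one text line per record, key first), the type name PayrollEntry and the function names are all assumptions made for illustration. The functions take generic streams, so a std::ofstream or std::ifstream could be passed for a real file.

```cpp
#include <algorithm>
#include <istream>
#include <ostream>
#include <sstream>  // in-memory streams can stand in for a file when experimenting
#include <string>
#include <vector>

// A reduced record: the id field serves as the record key.
struct PayrollEntry {
    int id;
    std::string name;
};

// Write the records in ascending key order, one per line: "<id> <name>".
void writeSequential(std::ostream& out, std::vector<PayrollEntry> records) {
    std::sort(records.begin(), records.end(),
              [](const PayrollEntry& a, const PayrollEntry& b) { return a.id < b.id; });
    for (const PayrollEntry& entry : records) {
        out << entry.id << ' ' << entry.name << '\n';
    }
}

// Read the records back, first to last, in the order they were stored.
std::vector<PayrollEntry> readSequential(std::istream& in) {
    std::vector<PayrollEntry> records;
    PayrollEntry entry;
    // Read the key, skip the separating space, then take the rest of the line as the name.
    while (in >> entry.id && std::getline(in >> std::ws, entry.name)) {
        records.push_back(entry);
    }
    return records;
}
```

Because the records are sorted before they are written, the first record read back holds the lowest employee identification number.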
Most businesses use many different files to store data. For example, a company might have payroll files, accounts-receivable files (listing money due from clients), accounts-payable files (listing money due to suppliers), inventory files (listing facts about all the items handled by the business) and many other types of files. A group of related files often is stored in a database. A collection of programs designed to create and manage databases is called a database management system (DBMS).