Data Hierarchy

Ultimately, all data items that computers process are reduced to combinations of 0s and 1s. This occurs because it is simple and economical to build electronic devices that can assume two stable states: one state represents 0 and the other represents 1. It is remarkable that the impressive functions performed by computers involve only the most fundamental manipulations of 0s and 1s.

The smallest data item that computers support is called a bit (short for "binary digit," a digit that can assume one of two values). Each data item, or bit, can assume either the value 0 or the value 1. Computer circuitry performs various simple bit manipulations, such as examining the value of a bit, setting the value of a bit and reversing a bit (from 1 to 0 or from 0 to 1).
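The three bit manipulations just mentioned (examining, setting and reversing a bit) can be sketched in C# with the bitwise operators &, | and ^. This is only an illustrative sketch; the class name BitDemo and its method names are hypothetical and chosen for this example.

```csharp
using System;

public static class BitDemo
{
    // Examine: true if the bit at the given position (0 = least significant) is 1.
    public static bool IsSet(byte value, int position) =>
        (value & (1 << position)) != 0;

    // Set: returns a copy of value with the bit at position forced to 1.
    public static byte Set(byte value, int position) =>
        (byte)(value | (1 << position));

    // Reverse: returns a copy of value with the bit at position flipped.
    public static byte Toggle(byte value, int position) =>
        (byte)(value ^ (1 << position));

    public static void Main()
    {
        byte b = 0b0000_0101; // bits 0 and 2 are 1

        Console.WriteLine(IsSet(b, 2));                                       // True
        Console.WriteLine(Convert.ToString(Set(b, 1), 2).PadLeft(8, '0'));    // 00000111
        Console.WriteLine(Convert.ToString(Toggle(b, 0), 2).PadLeft(8, '0')); // 00000100
    }
}
```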

Programming with data in the low-level form of bits is cumbersome. It is preferable to program with data in forms such as decimal digits (i.e., 0, 1, 2, 3, 4, 5, 6, 7, 8 and 9), letters (i.e., A–Z and a–z) and special symbols (e.g., $, @, %, &, *, (, ), -, +, ", :, ?, / and many others). Digits, letters and special symbols are referred to as characters. The set of all characters used to write programs and represent data items on a particular computer is called that computer's character set. Because computers can process only 0s and 1s, every character in a computer's character set is represented as a pattern of 0s and 1s. Bytes are composed of eight bits. C# uses the Unicode® character set (www.unicode.org); each C# char value is composed of 2 bytes (16 bits). Programmers create programs and data items with characters; computers manipulate and process these characters as patterns of bits.
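To make the idea of a character as a pattern of bits concrete, the following sketch prints the size of a C# char and the 16-bit pattern behind two characters. The class name CharBits and the method ToBitPattern are hypothetical names introduced only for this illustration.

```csharp
using System;

public static class CharBits
{
    // Returns the 16-bit pattern of a C# char as a string of 0s and 1s.
    public static string ToBitPattern(char c) =>
        Convert.ToString((int)c, 2).PadLeft(16, '0');

    public static void Main()
    {
        Console.WriteLine(sizeof(char));      // 2 (bytes per char)
        Console.WriteLine(ToBitPattern('A')); // 0000000001000001
        Console.WriteLine(ToBitPattern('$')); // 0000000000100100
    }
}
```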

Just as characters are composed of bits, fields are composed of characters. A field is a group of characters that conveys meaning. For example, a field consisting of uppercase and lowercase letters can represent a person's name.

Data items processed by computers form a data hierarchy (Fig. 18.1), in which data items become larger and more complex in structure as we progress from bits to characters to fields to larger data aggregates.

Figure 18.1. Data hierarchy.

Typically, a record (which can be represented as a class) is composed of several related fields. In a payroll system, for example, a record for a particular employee might include the following fields:

  1. Employee identification number
  2. Name
  3. Address
  4. Hourly pay rate
  5. Number of exemptions claimed
  6. Year-to-date earnings
  7. Amount of taxes withheld
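As a rough sketch of how such a record might be represented as a class, the following C# type declares one property per field in the list above. The class name EmployeeRecord, the property names, the chosen types and the sample values are all assumptions made for illustration; a real payroll system would choose its own.

```csharp
using System;

// Hypothetical record type: one property per payroll field.
public class EmployeeRecord
{
    public int EmployeeId { get; set; }             // 1. Employee identification number
    public string Name { get; set; } = "";          // 2. Name
    public string Address { get; set; } = "";       // 3. Address
    public decimal HourlyPayRate { get; set; }      // 4. Hourly pay rate
    public int ExemptionsClaimed { get; set; }      // 5. Number of exemptions claimed
    public decimal YearToDateEarnings { get; set; } // 6. Year-to-date earnings
    public decimal TaxesWithheld { get; set; }      // 7. Amount of taxes withheld
}

public static class RecordDemo
{
    public static void Main()
    {
        // Invented sample values, for demonstration only.
        var employee = new EmployeeRecord
        {
            EmployeeId = 1001,
            Name = "Sample Employee",
            Address = "1 Example Street",
            HourlyPayRate = 25.50m,
            ExemptionsClaimed = 2,
            YearToDateEarnings = 12000.00m,
            TaxesWithheld = 1800.00m
        };

        Console.WriteLine($"{employee.EmployeeId}: {employee.Name}");
    }
}
```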

In the preceding example, each field is associated with the same employee. A file is a group of related records.[1] A company's payroll file normally contains one record for each employee. A payroll file for a small company might contain only 22 records, whereas one for a large company might contain 100,000 records. It is not unusual for a company to have many files, some containing millions, billions or even trillions of characters of information.

[1] Generally, a file can contain arbitrary data in arbitrary formats. In some operating systems, a file is viewed as nothing more than a collection of bytes, and any organization of the bytes in a file (such as organizing the data into records) is a view created by the application programmer.

To facilitate the retrieval of specific records from a file, at least one field in each record is chosen as a record key, which identifies a record as belonging to a particular person or entity and distinguishes that record from all others. For example, in a payroll record, the employee identification number normally would be the record key.
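One way to picture the role of a record key is an in-memory lookup table that maps each employee identification number to its record. The sketch below uses a Dictionary for this and stores only a name per record to stay short; the class RecordKeyDemo, its methods and the sample data are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class RecordKeyDemo
{
    // Builds a small index from record key (employee ID) to employee name.
    // The entries are invented sample data.
    public static Dictionary<int, string> BuildIndex()
    {
        return new Dictionary<int, string>
        {
            [1001] = "First Sample",
            [1002] = "Second Sample",
            [1003] = "Third Sample"
        };
    }

    // Returns the name stored under the given key, or null if no record has that key.
    public static string FindName(Dictionary<int, string> index, int employeeId) =>
        index.TryGetValue(employeeId, out var name) ? name : null;

    public static void Main()
    {
        var index = BuildIndex();
        Console.WriteLine(FindName(index, 1002));         // Second Sample
        Console.WriteLine(FindName(index, 9999) == null); // True
    }
}
```

Because dictionary keys must be unique, the key distinguishes each record from all others, which is the property the text requires of a record key.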

There are many ways to organize records in a file. A common organization is called a sequential file, in which records typically are stored in order by a record-key field. In a payroll file, records usually are placed in order by employee identification number. The first employee record in the file contains the lowest employee identification number, and subsequent records contain increasingly higher ones.
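The ordering described above can be sketched by sorting records on their key before writing them to a text file, one record per line. This is a minimal illustration, not a prescribed file format: the class SequentialFileDemo, the comma-separated "id,name" layout and the sample data are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class SequentialFileDemo
{
    // Writes one line per record ("id,name"), ordered by the record key (employee ID).
    public static void WriteSequential(string path, IEnumerable<(int Id, string Name)> records)
    {
        var lines = records
            .OrderBy(r => r.Id)
            .Select(r => $"{r.Id},{r.Name}");
        File.WriteAllLines(path, lines);
    }

    // Reads the record keys back in the order in which they are stored.
    public static List<int> ReadKeys(string path) =>
        File.ReadAllLines(path)
            .Select(line => int.Parse(line.Split(',')[0]))
            .ToList();

    public static void Main()
    {
        string path = Path.GetTempFileName();

        // Invented sample records, deliberately out of order.
        var records = new List<(int Id, string Name)>
        {
            (1003, "Third Sample"),
            (1001, "First Sample"),
            (1002, "Second Sample")
        };

        WriteSequential(path, records);
        Console.WriteLine(string.Join(" ", ReadKeys(path))); // 1001 1002 1003
        File.Delete(path);
    }
}
```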

Most businesses use many different files to store data. For example, a company might have payroll files, accounts-receivable files (listing money due from clients), accounts-payable files (listing money due to suppliers), inventory files (listing facts about all the items handled by the business) and many other files. A group of related files often is stored in a database. A collection of programs designed to create and manage databases is called a database management system (DBMS). We discuss databases in Chapter 20.
