Non-Generic Collections

The System.Collections namespace in the .NET Framework Class Library is the primary source for non-generic collections. These classes provide standard implementations of many of the data structures discussed in Chapter 25 with collections that store references of type object. In this section, we demonstrate classes ArrayList, Stack and Hashtable.

27.4.1. Class ArrayList

In most programming languages, conventional arrays have a fixed sizethey cannot be changed dynamically to conform to an application's execution-time memory requirements. In some applications, this fixed-size limitation presents a problem for programmers. They must choose between using fixed-size arrays that are large enough to store the maximum number of elements the application may require and using dynamic data structures that can grow and shrink the amount of memory required to store data in response to the changing requirements of an application at execution time.

The .NET Framework's ArrayList collection class mimics the functionality of conventional arrays and provides dynamic resizing of the collection through the class's methods. At any time, an ArrayList contains a certain number of elements less than or equal to its capacitythe number of elements currently reserved for the ArrayList. An application can manipulate the capacity with ArrayList property Capacity. If an ArrayList needs to grow, it by default doubles its Capacity.

Performance Tip 27 1

As with linked lists, inserting additional elements into an ArrayList whose current size is less than its capacity is a fast operation.

Performance Tip 27 2

It is a slow operation to insert an element into an ArrayList that needs to grow larger to accommodate a new element. An ArrayList that is at its capacity must have its memory reallocated and the existing values copied into it.

Performance Tip 27 3

If storage is at a premium, use method TrimToSize of class ArrayList to trim an ArrayList to its exact size. This will optimize an ArrayList's memory use. Be carefulif the application needs to insert additional elements, the process will be slower because the ArrayList must grow dynamically (trimming leaves no room for growth).

Performance Tip 27 4

The default capacity increment, doubling the size of the ArrayList, may seem to waste storage, but doubling is an efficient way for an ArrayList to grow quickly to "about the right size." This is a much more efficient use of time than growing the ArrayList by one element at a time in response to insert operations.

ArrayLists store references to objects. All classes derive from class object, so an ArrayList can contain objects of any type. Figure 27.4 lists some useful methods and properties of class ArrayList.

Figure 27.4. Some methods and properties of class ArrayList.

Method or Property

Description

Add

Adds an object to the ArrayList and returns an int specifying the index at which the object was added.

Capacity

Property that gets and sets the number of elements for which space is currently reserved in the ArrayList.

Clear

Removes all the elements from the ArrayList.

Contains

Returns true if the specified object is in the ArrayList; otherwise, returns false.

Count

Read-only property that gets the number of elements stored in the ArrayList.

IndexOf

Returns the index of the first occurrence of the specified object in the ArrayList.

Insert

Inserts an object at the specified index.

Remove

Removes the first occurrence of the specified object.

RemoveAt

Removes an object at the specified index.

RemoveRange

Removes a specified number of elements starting at a specified index in the ArrayList.

Sort

Sorts the ArrayList.

trimToSize

Sets the Capacity of the ArrayList to the number of elements the ArrayList currently contains (Count).

Figure 27.5 demonstrates class ArrayList and several of its methods. Class ArrayList belongs to the System.Collections namespace (line 4). Lines 811 declare two arrays of strings (colors and removeColors) that we will use to fill two ArrayList objects. Recall from Section 9.11 that constants must be initialized at compile-time, but readonly variables can be initialized at execution time. Arrays are objects created at execution time, so we declare colors and removeColors with readonlynot constto make them unmodifiable. When the application begins execution, we create an ArrayList with an initial capacity of one element and store it in variable list (line 16). The foreach statement in lines 2021 adds the five elements of array colors to list via ArrayList's Add method, so list grows to accommodate these new elements. Line 25 uses ArrayList's overloaded constructor to create a new ArrayList initialized with the contents of array removeColors, then assigns it to variable removeList. This constructor can initialize the contents of an ArrayList with the elements of any ICollection passed to it. Many of the collection classes have such a constructor. Notice that the constructor call in line 25 performs the task of lines 2021.

Figure 27.5. Using class ArrayList.

(This item is displayed on pages 1402 - 1403 in the print version)

1 // Fig. 27.5: ArrayListTest.cs 2 // Using class ArrayList. 3 using System; 4 using System.Collections; 5 6 public class ArrayListTest 7 { 8 private static readonly string[] colors = 9 { "MAGENTA", "RED", "WHITE", "BLUE", "CYAN" }; 10 private static readonly string[] removeColors = 11 { "RED", "WHITE", "BLUE" }; 12 13 // create ArrayList, add colors to it and manipulate it 14 public static void Main( string[] args ) 15 { 16 ArrayList list = new ArrayList( 1 ); // initial capacity of 1 17 18 // add the elements of the colors array 19 // to the ArrayList list 20 foreach ( string color in colors ) 21 list.Add( color ); // add color to the ArrayList list 22 23 // add elements in the removeColors array to 24 // the ArrayList removeList with the ArrayList constructor 25 ArrayList removeList = new ArrayList( removeColors ); 26 27 Console.WriteLine( "ArrayList: " ); 28 DisplayInformation( list ); // output the list 29 30 // remove from ArrayList list the colors in removeList 31 RemoveColors( list, removeList ); 32 33 Console.WriteLine( " ArrayList after calling RemoveColors: " ); 34 DisplayInformation( list ); // output list contents 35 } // end method Main 36 37 // displays information on the contents of an array list 38 private static void DisplayInformation( ArrayList arrayList ) 39 { 40 // iterate through array list with a foreach statement 41 foreach ( object element in arrayList ) 42 Console.Write( "{0} ", element ); // invokes ToString 43 44 // display the size and capacity 45 Console.WriteLine( " Size = {0}; Capacity = {1}", 46 arrayList.Count, arrayList.Capacity ); 47 48 int index = arrayList.IndexOf( "BLUE" ); 49 50 if ( index != -1 ) 51 Console.WriteLine( "The array list contains BLUE at index {0}.", 52 index ); 53 else 54 Console.WriteLine( "The array list does not contain BLUE." ); 55 } // end method DisplayInformation 56 57 // remove colors specified in secondList from firstList 58 private static void RemoveColors( ArrayList firstList, 59 ArrayList secondList ) 60 { 61 // iterate through second ArrayList like an array 62 for ( int count = 0; count < secondList.Count; count++ ) 63 firstList.Remove( secondList[ count ] ); 64 } // end method RemoveColors 65 } // end class ArrayListTest

ArrayList: MAGENTA RED WHITE BLUE CYAN Size = 5; Capacity = 8 The array list contains BLUE at index 3. ArrayList after calling RemoveColors: MAGENTA CYAN Size = 2; Capacity = 8 The array list does not contain BLUE.

Line 28 calls method DisplayInformation (lines 3855) to output the contents of the list. This method uses a foreach statement to traverse the elements of an ArrayList. As we discussed in Section 27.3, the foreach statement is a convenient shorthand for calling ArrayList's GetEnumerator method and using an enumerator to traverse the elements of the collection. Also, we must use an iteration variable of type object because class ArrayList is non-generic and stores references to objects.

We use the Count and Capacity properties in line 46 to display the current number of elements and the maximum number of elements that can be stored without allocating more memory to the ArrayList. The output of Fig. 27.5 indicates that the ArrayList has capacity 8recall that an ArrayList doubles its capacity whenever it needs more space.

In line 48, we invoke method IndexOf to determine the position of the string "BLUE" in arrayList and store the result in local variable index. IndexOf returns -1 if the element is not found. The if statement in lines 5054 checks if index is -1 to determine whether arrayList contains "BLUE". If it does, we output its index. ArrayList also provides method Contains, which simply returns true if an object is in the ArrayList, and false otherwise. Method Contains is preferred if we do not need the index of the element.

Performance Tip 27 5

ArrayList methods IndexOf and Contains each perform a linear search, which is a costly operation for large ArrayLists. If the ArrayList is sorted, use ArrayList method BinarySearch to perform a more efficient search. Method BinarySearch returns the index of the element, or a negative number if the element is not found.

After method DisplayInformation returns, we call method RemoveColors (lines 5864) with the two ArrayLists. The for statement in lines 6263 iterates over ArrayList secondList. Line 63 uses an indexer to access an ArrayList elementby following the ArrayList reference name with square brackets ([]) containing the desired index of the element. An ArgumentOutOfRangeException occurs if the specified index is not both greater than 0 and less than the number of elements currently stored in the ArrayList (specified by the ArrayList's Count property).

We use the indexer to obtain each of secondList's elements, then remove each one from firstList with the Remove method. This method deletes a specified item from an ArrayList by performing a linear search and removing (only) the first occurrence of the specified object. All subsequent elements shift toward the beginning of the ArrayList to fill the emptied position.

After the call to RemoveColors, line 34 again outputs the contents of list, confirming that the elements of removeList were, indeed, removed.

27.4.2. Class Stack

The Stack class implements a stack data structure and provides much of the functionality that we defined in our own implementation in Section 25.3. Refer to that section for a discussion of stack data structure concepts. We created a test application in Fig. 25.11 to demonstrate the StackInheritance data structure that we developed. We adapt Fig. 25.14 in Fig. 27.6 to demonstrate the .NET Framework collection class Stack.

Figure 27.6. Demonstrating class Stack.

(This item is displayed on pages 1405 - 1406 in the print version)

1 // Fig. 27.6: StackTest.cs 2 // Demonstrating class Stack. 3 using System; 4 using System.Collections; 5 6 public class StackTest 7 { 8 public static void Main( string[] args ) 9 { 10 Stack stack = new Stack(); // default Capacity of 10 11 12 // create objects to store in the stack 13 bool aBoolean = true; 14 char aCharacter = '$'; 15 int anInteger = 34567; 16 string aString = "hello"; 17 18 // use method Push to add items to (the top of) the stack 19 stack.Push( aBoolean ); 20 PrintStack( stack ); 21 stack.Push( aCharacter ); 22 PrintStack( stack ); 23 stack.Push( anInteger ); 24 PrintStack( stack ); 25 stack.Push( aString ); 26 PrintStack( stack ); 27 28 // check the top element of the stack 29 Console.WriteLine( "The top element of the stack is {0} ", 30 stack.Peek() ); 31 32 // remove items from stack 33 try 34 { 35 while ( true ) 36 { 37 object removedObject = stack.Pop(); 38 Console.WriteLine( removedObject + " popped" ); 39 PrintStack( stack ); 40 } // end while 41 } // end try 42 catch ( InvalidOperationException exception ) 43 { 44 // if exception occurs, print stack trace 45 Console.Error.WriteLine( exception ); 46 } // end catch 47 } // end Main 48 49 // print the contents of a stack 50 private static void PrintStack( Stack stack ) 51 { 52 if ( stack.Count == 0 ) 53 Console.WriteLine( "stack is empty " ); // the stack is empty 54 else 55 { 56 Console.Write( "The stack is: " ); 57 58 // iterate through the stack with a foreach statement 59 foreach ( object element in stack ) 60 Console.Write( "{0} ", element ); // invokes ToString 61 62 Console.WriteLine( " " ); 63 } // end else 64 } // end method PrintStack 65 } // end class StackTest

The stack is: True The stack is: $ True The stack is: 34567 $ True The stack is: hello 34567 $ True The top element of the stack is hello hello popped The stack is: 34567 $ True 34567 popped The stack is: $ True $ popped The stack is: True True popped stack is empty System.InvalidOperationException: Stack empty. at System.Collections.Stack.Pop() at StackTest.Main(String[] args) in C:examplesch27 fig27_06StackTestStackTest.cs:line 37

The using directive in line 4 allows us to use the Stack class with its unqualified name from the System.Collections namespace. Line 10 creates a Stack with the default initial capacity (10 elements). As one might expect, class Stack has methods Push and Pop to perform the basic stack operations.

Method Push takes an object as an argument and inserts it at the top of the Stack. If the number of items on the Stack (the Count property) is equal to the capacity at the time of the Push operation, the Stack grows to accommodate more objects. Lines 1926 use method Push to add four elements (a bool, a char, an int and a string) to the stack and invoke method PrintStack (lines 5064) after each Push to output the contents of the stack. Notice that this non-generic Stack class can store only references to objects, so each of the value-type itemsthe bool, the char and the intare implicitly boxed before they are added to the Stack. (Namespace System.Collections.Generic provides a generic Stack class that has many of the same methods and properties used in Fig. 27.6.)

Method PrintStack (lines 5064) uses Stack property Count (implemented to fulfill the contract of interface ICollection) to obtain the number of elements in stack. If the stack is not empty (i.e., Count is not equal to 0), we use a foreach statement to iterate over the stack and output its contents by implicitly invoking the ToString method of each element. The foreach statement implicitly invokes Stack's GetEnumerator method, which we could have called explicitly to traverse the stack via an enumerator.

Method Peek returns the value of the top stack element, but does not remove the element from the Stack. We use Peek at line 30 to obtain the top object of the Stack, then output that object, implicitly invoking the object's ToString method. An InvalidOperationException occurs if the Stack is empty when the application calls Peek. (We do not need an exception handling block because we know the stack is not empty here.)

Method Pop takes no argumentsit removes and returns the object currently on top of the Stack. An infinite loop (lines 3540) pops objects off the stack and outputs them until the stack is empty. When the application calls Pop on the empty stack, an InvalidOperationException is thrown. The catch block (lines 4246) outputs the exception, implicitly invoking the InvalidOperationException's ToString method to obtain its error message and stack trace.

Common Programming Error 27 3

Attempting to Peek or Pop an empty Stack (a Stack whose Count property is 0) causes an InvalidOperationException.

Although Fig. 27.6 does not demonstrate it, class Stack also has method Contains, which returns true if the Stack contains the specified object, and returns false otherwise.

27.4.3. Class Hashtable

When an application creates objects of new or existing types, it needs to manage those objects efficiently. This includes sorting and retrieving objects. Sorting and retrieving information with arrays is efficient if some aspect of your data directly matches the key value and if those keys are unique and tightly packed. If you have 100 employees with nine-digit Social Security numbers and you want to store and retrieve employee data by using the Social Security number as a key, it would nominally require an array with 999,999,999 elements, because there are 999,999,999 unique nine-digit numbers. If you have an array that large, you could get very high performance storing and retrieving employee records by simply using the Social Security number as the array index, but it would be a large waste of memory.

Many applications have this problemeither the keys are of the wrong type (i.e., not non-negative integers), or they are of the right type, but they are sparsely spread over a large range.

What is needed is a high-speed scheme for converting keys such as Social Security numbers and inventory part numbers to unique array indices. Then, when an application needs to store something, the scheme could convert the application key rapidly to an index and the record of information could be stored at that location in the array. Retrieval occurs the same wayonce the application has a key for which it wants to retrieve the data record, the application simply applies the conversion to the key, which produces the array subscript where the data resides in the array and retrieves the data.

The scheme we describe here is the basis of a technique called hashing, in which we store data in a data structure called a hash table. Why the name? Because, when we convert a key into an array subscript, we literally scramble the bits, making a "hash" of the number. The number actually has no real significance beyond its usefulness in storing and retrieving this particular data record.

A glitch in the scheme occurs when collisions occur (i.e., two different keys "hash into" the same cell, or element, in the array). Since we cannot sort two different data records to the same space, we need to find an alternative home for all records beyond the first that hash to a particular array subscript. One scheme for doing this is to "hash again" (i.e., to reapply the hashing transformation to the key to provide a next candidate cell in the array). The hashing process is designed to be quite random, so the assumption is that with just a few hashes, an available cell will be found.

Another scheme uses one hash to locate the first candidate cell. If the cell is occupied, successive cells are searched linearly until an available cell is found. Retrieval works the same waythe key is hashed once, the resulting cell is checked to determine whether it contains the desired data. If it does, the search is complete. If it does not, successive cells are searched linearly until the desired data is found.

The most popular solution to hash-table collisions is to have each cell of the table be a hash "bucket"typically, a linked list of all the keyvalue pairs that hash to that cell. This is the solution that the .NET Framework's Hashtable class implements.

The load factor affects the performance of hashing schemes. The load factor is the ratio of the number of objects stored in the hash table to the total number of cells of the hash table. As this ratio gets higher, the chance of collisions tends to increase.

Performance Tip 27 6

The load factor in a hash table is a classic example of a space/time trade-off: By increasing the load factor, we get better memory utilization, but the application runs slower due to increased hashing collisions. By decreasing the load factor, we get better application speed because of reduced hashing collisions, but we get poorer memory utilization because a larger portion of the hash table remains empty.

Computer science students study hashing schemes in courses called "Data Structures" and "Algorithms." Recognizing the value of hashing, the .NET Framework provides class Hashtable to enable programmers to easily employ hashing in applications.

This concept is profoundly important in our study of object-oriented programming. Classes encapsulate and hide complexity (i.e., implementation details) and offer userfriendly interfaces. Crafting classes to do this properly is one of the most valued skills in the field of object-oriented programming.

A hash function performs a calculation that determines where to place data in the hash table. The hash function is applied to the key in a keyvalue pair of objects. Class Hashtable can accept any object as a key. For this reason, class object defines method GetHashCode, which all objects inherit. Most classes that are candidates to be used as keys in a hash table override this method to provide one that performs efficient hash code calculations for a specific type. For example, a string has a hash code calculation that is based on the contents of the string. Figure 27.7 uses a Hashtable to count the number of occurrences of each word in a string.

Figure 27.7. Application counts the number of occurrences of each word in a string and stores them in a hash table.

1 // Fig. 27.7: HashtableTest.cs 2 // Application counts the number of occurrences of each word in a string 3 // and stores them in a hash table. 4 using System; 5 using System.Text.RegularExpressions; 6 using System.Collections; 7 8 public class HashtableTest 9 { 10 public static void Main( string[] args ) 11 { 12 // create hash table based on user input 13 Hashtable table = CollectWords(); 14 15 // display hash table content 16 DisplayHashtable( table ); 17 } // end method Main 18 19 // create hash table from user input 20 private static Hashtable CollectWords() 21 { 22 Hashtable table = new Hashtable(); // create a new hash table 23 24 Console.WriteLine( "Enter a string: " ); // prompt for user input 25 string input = Console.ReadLine(); // get input 26 27 // split input text into tokens 28 string[] words = Regex.Split( input, @"s+" ); 29 30 // processing input words 31 foreach ( string word in words ) 32 { 33 string wordKey = word.ToLower(); // get word in lowercase 34 35 // if the hash table contains the word 36 if ( table.ContainsKey( wordKey ) ) 37 { 38 table[ wordKey ] = ( ( int ) table[ wordKey ] ) + 1; 39 } // end if 40 else 41 // add new word with a count of 1 to hash table 42 table.Add( wordKey, 1 ); 43 } // end foreach 44 45 return table; 46 } // end method CollectWords 47 48 // display hash table content 49 private static void DisplayHashtable( Hashtable table ) 50 { 51 Console.WriteLine( " Hashtable contains: {0,-12}{1,-12}", 52 "Key:", "Value:" ); 53 54 // generate output for each key in hash table 55 // by iterating through the Keys property with a foreach statement 56 foreach ( object key in table.Keys ) 57 Console.WriteLine( "{0,-12}{1,-12}", key, table[ key ] ); 58 59 Console.WriteLine( " size: {0}", table.Count ); 60 } // end method DisplayHashtable 61 } // end class HashtableTest

Enter a string: As idle as a painted ship upon a painted ocean Hashtable contains: Key: Value: painted 2 a 2 upon 1 as 2 ship 1 idle 1 ocean 1 size: 7

Lines 46 contain using directives for namespaces System (for class Console), System.Text.RegularExpressions (for class Regex, discussed in Chapter 16, Strings, Characters and Regular Expressions) and System.Collections (for class Hashtable). Class HashtableTest declares three static methods. Method CollectWords (lines 2046) inputs a string and returns a Hashtable in which each value stores the number of times that word appears in the string and the word is used for the key. Method DisplayHashtable (lines 4960) displays the Hashtable passed to it in column format. The Main method (lines 1017) simply invokes CollectWords (line 13), then passes the Hashtable returned by CollectWords to DisplayHashtable in line 16.

Method CollectWords (lines 2046) begins by initializing local variable table with a new Hashtable (line 22) that has a default initial capacity of 0 elements and a default maximum load factor of 1.0. When the number of items in the Hashtable becomes greater than the number of cells times the load factor, the capacity is increased automatically. (This implementation detail is invisible to clients of the class.) Lines 2425 prompt the user and input a string. We use static method Split of class Regex in line 28 to divide the string by its whitespace characters. This creates an array of "words," which we then store in local variable words.

The foreach statement in lines 3143 loops over every element of array words. Each word is converted to lowercase with string method ToLower, then stored in variable wordKey (line 33). Then line 36 calls Hashtable method ContainsKey to determine whether the word is in the hash table (and thus has occurred previously in the string). If the Hashtable does not contain an entry for the word, line 42 uses Hashtable method Add to create a new entry in the hash table, with the lowercase word as the key and an object containing 1 as the value. Note that autoboxing occurs when the application passes integer 1 to method Add, because the hash table stores both the key and value in references to type object.

Common Programming Error 27 4

Using the Add method to add a key that already exists in the hash table causes an ArgumentException.

If the word is already a key in the hash table, line 38 uses the Hashtable's indexer to obtain and set the key's associated value (the word count) in the hash table. We first downcast the value obtained by the get accessor from an object to an int. This unboxes the value so that we can increment it by 1. Then, when we use the indexer's set accessor to assign the key's associated value, the incremented value is implicitly reboxed so that it can be stored in the hash table.

Notice that invoking the get accessor of a Hashtable indexer with a key that does not exist in the hash table obtains a null reference. Using the set accessor with a key that does not exist in the hash table creates a new entry, as if you had used the Add method.

Line 45 returns the hash table to the Main method, which then passes it to method DisplayHashtable (lines 4960), which displays all the entries. This method uses readonly property Keys (line 56) to get an ICollection that contains all the keys. Because ICollection extends IEnumerable, we can use this collection in the foreach statement in lines 5657 to iterate over the keys of the hash table. This loop accesses and outputs each key and its value in the hash table using the iteration variable and table's get accessor. Each key and its value is displayed in a field width of -12. The negative field width indicates that the output is left justified. Note that a hash table is not sorted, so the keyvalue pairs are not displayed in any particular order. Line 59 uses Hashtable property Count to get the number of keyvalue pairs in the Hashtable.

Lines 5657 could have also used the foreach statement with the Hashtable object itself, instead of using the Keys property. If you use a foreach statement with a Hashtable object, the iteration variable will be of type DictionaryEntry. The enumerator of a Hashtable (or any other class that implements IDictionary) uses the DictionaryEntry structure to store keyvalue pairs. This structure provides properties Key and Value for retrieving the key and value of the current element. If you do not need the key, class Hashtable also provides a read-only Values property that gets an ICollection of all the values stored in the Hashtable. We can use this property to iterate through the values stored in the Hashtable without regard for where they are stored.

Problems with Non-Generic Collections

In the word-counting application of Fig. 27.7, our Hashtable stores its keys and data as object references, even though we store only string keys and int values by convention. This results in some awkward code. For example, line 38 was forced to unbox and box the int data stored in the Hashtable every time it incremented the count for a particular key. This is inefficient. A similar problem occurs in line 56the iteration variable of the foreach statement is an object reference. If we need to use any of its string-specific methods, we need an explicit downcast.

This can cause subtle bugs. Suppose we decide to improve the readability of Fig. 27.7 by using the indexer's set accessor instead of the Add method to add a key/value pair in line 42, but accidentally type:

table[ wordKey ] = wordKey; // initialize to 1

This statement will create a new entry with a string key and string value instead of an int value of 1. Although the application will compile correctly, this is clearly incorrect. If a word appears twice, line 38 will try to downcast this string to an int, causing an InvalidCastException at execution time. The error that appears at execution time will indicate that the problem is at line 38, where the exception occurred, not at line 42. This makes the error more difficult to find and debug, especially in large software applications where the exception may occur in a different fileand even in a different assembly.

In Chapter 26, we introduced generics. In the next two sections, we demonstrate how to use generic collections.

Категории