Searching Algorithms
Looking up a phone number, accessing a Web site and checking the definition of a word in a dictionary all involve searching large amounts of data. The next two sections discuss two common search algorithmsone that is easy to program yet relatively inefficient and one that is relatively efficient but more complex and difficult to program.
16.2.1. Linear Search
The linear search algorithm searches each element in an array sequentially. If the search key does not match an element in the array, the algorithm tests each element, and when the end of the array is reached, informs the user that the search key is not present. If the search key is in the array, the algorithm tests each element until it finds one that matches the search key and returns the index of that element.
As an example, consider an array containing the following values
34 56 2 10 77 51 93 30 5 52
and a program that is searching for 51. Using the linear search algorithm, the program first checks whether 34 matches the search key. It does not, so the algorithm checks whether 56 matches the search key. The program continues moving through the array sequentially, testing 2, then 10, then 77. When the program tests 51, which matches the search key, the program returns the index 5, which is the location of 51 in the array. If, after checking every array element, the program determines that the search key does not match any element in the array, the program returns a sentinel value (e.g. -1).
Figure 16.2 declares the LinearArray class. This class has two private instance variablesan array of ints named data, and a static Random object to fill the array with randomly generated ints. When an object of class LinearArray is instantiated, the constructor (lines 1219) creates and initializes the array data with random ints in the range 1099. If there are duplicate values in the array, linear search returns the index of the first element in the array that matches the search key.
Figure 16.2. LinearArray class.
(This item is displayed on page 788 in the print version)
1 // Fig 16.2: LinearArray.java 2 // Class that contains an array of random integers and a method 3 // that will search that array sequentially 4 import java.util.Random; 5 6 public class LinearArray 7 { 8 private int [] data; // array of values 9 private static Random generator = new Random(); 10 11 // create array of given size and fill with random numbers 12 public LinearArray( int size ) 13 { 14 data = new int [ size ]; // create space for array 15 16 // fill array with random ints in range 10-99 17 for ( int i = 0 ; i < size; i++ ) 18 data[ i ] = 10 + generator.nextInt( 90 ); 19 } // end LinearArray constructor 20 21 // perform a linear search on the data 22 public int linearSearch( int searchKey ) 23 { 24 // loop through array sequentially 25 for ( int index = 0 ; index < data.length; index++ ) 26 if ( data[ index ] == searchKey ) 27 return index; // return index of integer 28 29 return -1; // integer was not found 30 } // end method linearSearch 31 32 // method to output values in array 33 public String toString() 34 { 35 StringBuffer temporary = new StringBuffer(); 36 37 // iterate through array 38 for ( int element : data ) 39 temporary.append( element + " " ); 40 41 temporary.append( " " ); // add endline character 42 return temporary.toString(); 43 } // end method toString 44 } // end class LinearArray |
Lines 2230 perform the linear search. The search key is passed to parameter searchKey. Lines 2527 loop through the elements in the array. Line 26 compares each element in the array with searchKey. If the values are equal, line 27 returns the index of the element. If the loop ends without finding the value, line 29 returns -1. Lines 3343 declare method toString, which returns a String representation of the array for printing.
Figure 16.3 creates a LinearArray object containing an array of 10 ints (line 16) and allows the user to search the array for specific elements. Lines 2022 prompt the user for the search key and store it in searchInt. Lines 2541 then loop until the searchInt is equal to -1. The array holds ints from 10-99 (line 18 of Fig. 16.2). Line 28 calls method linearSearch to determine whether searchInt is in the array. If searchInt is not in the array, linearSearch returns -1 and the program notifies the user (lines 3132). If searchInt is in the array, linearSearch returns the position of the element, which the program outputs in lines 3435. Lines 3840 retrieve the next integer from the user.
Figure 16.3. LinearSearchTest class.
(This item is displayed on pages 789 - 790 in the print version)
1 // Fig 16.3: LinearSearchTest.java 2 // Sequentially search an array for an item. 3 import java.util.Scanner; 4 5 public class LinearSearchTest 6 { 7 public static void main( String args[] ) 8 { 9 // create Scanner object to input data 10 Scanner input = new Scanner( System.in ); 11 12 int searchInt; // search key 13 int position; // location of search key in array 14 15 // create array and output it 16 LinearArray searchArray = new LinearArray( 10 ); 17 System.out.println( searchArray ); // print array 18 19 // get input from user 20 System.out.print( 21 "Please enter an integer value (-1 to quit): " ); 22 searchInt = input.nextInt(); // read first int from user 23 24 // repeatedly input an integer; -1 terminates the program 25 while ( searchInt != -1 ) 26 { 27 // perform linear search 28 position = searchArray.linearSearch( searchInt ); 29 30 if ( position == -1 ) // integer was not found 31 System.out.println( "The integer " + searchInt + 32 " was not found. " ); 33 else // integer was found 34 System.out.println( "The integer " + searchInt + 35 " was found in position " + position + ". " ); 36 37 // get input from user 38 System.out.print( 39 "Please enter an integer value (-1 to quit): " ); 40 searchInt = input.nextInt(); // read next int from user 41 } // end while 42 } // end main 43 } // end class LinearSearchTest
|
Efficiency of Linear Search
Searching algorithms all accomplish the same goalfinding an element that matches a given search key, if such an element does, in fact, exist. There are, however, a number of things that differentiate search algorithms from another. The major difference is the amount of effort they require to complete the search. One way to describe this effort is with Big O notation, which indicates the worst-case run time for an algorithm, that is, how hard an algorithm may have to work to solve a problem. For searching and sorting algorithms, this is particularly dependent on how many data elements there are.
Suppose an algorithm is designed to test whether the first element of an array is equal to the second element of the array. If the array has 10 elements, this algorithm requires one comparison. If the array has 1,000 elements, the algorithm still requires one comparison. In fact, the algorithm is completely independent of the number of elements in the array. This algorithm is said to have a constant run time, which is represented in Big O notation as O(1). An algorithm that is O(1) does not necessarily require only one comparison. O(1) just means that the number of comparisons is constantit does not grow as the size of the array increases. An algorithm that tests whether the first element of an array is equal to any of the next three elements is still O(1) even though it requires three comparisons.
An algorithm that tests whether the first element of an array is equal to any of the other elements of the array will require at most n 1 comparisons, where n is the number of elements in the array. If the array has 10 elements, this algorithm requires up to nine comparisons. If the array has 1,000 elements, this algorithm requires up to 999 comparisons. As n grows larger, the n part of the expression "dominates" and subtracting one becomes inconsequential. Big O is designed to highlight these dominant terms and ignore terms that become unimportant as n grows. For this reason, an algorithm that requires a total of n 1 comparisons (such as the one we described earlier) is said to be O(n). An O(n) algorithm is referred to as having a linear run time. O(n) is often pronounced "on the order of n" or more simply "order n."
Now suppose you have an algorithm that tests whether any element of an array is duplicated elsewhere in the array. The first element must be compared with every other element in the array. The second element must be compared with every other element except the first (it was already compared to the first). The third element must be compared with every other element except the first two. In the end, this algorithm will end up making (n 1) + (n 2) + ... + 2 + 1 or n2/2 n/ 2 comparisons. As n increases, the n2 term dominates and the n term becomes inconsequential. Again, Big O notation highlights the n2 term, leaving n2/2. But as we will soon see, constant factors are omitted in Big O notation.
Big O is concerned with how an algorithm's run time grows in relation to the number of items processed. Suppose an algorithm requires n2 comparisons. With four elements, the algorithm will require 16 comparisons; with eight elements, the algorithm will require 64 comparisons. With this algorithm, doubling the number of elements quadruples the number of comparisons. Consider a similar algorithm requiring n2/2 comparisons. With four elements, the algorithm will require eight comparisons; with eight elements, the algorithm will require 32 comparisons. Again, doubling the number of elements quadruples the number of comparisons. Both of these algorithms grow as the square of n, so Big O ignores the constant and both algorithms are considered to be O(n2), which is referred to as quadratic run time and pronounced "on the order of n-squared" or more simply "order n-squared."
When n is small, O(n2) algorithms (running on today's billion-operation-per-second personal computers) will not noticeably affect performance. But as n grows you will start to notice the performance degradation. An O(n2) algorithm running on a million-element array would require a trillion "operations" (where each could actually require several machine instructions to execute). This could require a few hours to execute. A billion-element array would require a quintillion operations, a number so large that the algorithm could take decades! O(n2) algorithms, unfortunately, are easy to write, as you will see in this chapter. You will also see algorithms with more favorable Big O measures. These efficient algorithms often take a bit more cleverness and work to create, but their superior performance can be well worth the extra effort, especially as n gets large and as algorithms are compounded into larger programs.
The linear search algorithm runs in O(n) time. The worst case in this algorithm is that every element must be checked to determine whether the search item exists in the array. If the size of the array is doubled, the number of comparisons that the algorithm must perform is also doubled. Note that linear search can provide outstanding performance if the element matching the search key happens to be at or near the front of the array. But we seek algorithms that perform well, on average, across all searches, including those where the element matching the search key is near the end of the array.
Linear search is the easiest search algorithm to program, but it can be slow compared to other search algorithms. If a program needs to perform many searches on large arrays, it may be better to implement a different, more efficient algorithm, such as the binary search which we present in the next section.
Performance Tip 16.1
Sometimes the simplest algorithms perform poorly. Their virtue is that they are easy to program, test and debug. Sometimes more complex algorithms are required to realize maximum performance. |
16.2.2. Binary Search
The binary search algorithm is more efficient than the linear search algorithm, but it requires that the array be sorted. The first iteration of this algorithm tests the middle element in the array. If this matches the search key, the algorithm ends. Assuming the array is sorted in ascending order, then if the search key is less than the middle element, the search key cannot match any element in the second half of the array and the algorithm continues with only the first half of the array (i.e., the first element up to, but not including the middle element). If the search key is greater than the middle element, the search key cannot match any element in the first half of the array and the algorithm continues with only the second half of the array (i.e., the element after the middle element through the last element). Each iteration tests the middle value of the remaining portion of the array. If the search key does not match the element, the algorithm eliminates half of the remaining elements. The algorithm ends either by finding an element that matches the search key or reducing the sub-array to zero size.
As an example consider the sorted 15-element array
2 3 5 10 27 30 34 51 56 65 77 81 82 93 99
and a search key of 65. A program implementing the binary search algorithm would first check whether 51 is the search key (because 51 is the middle element of the array). The search key (65) is larger than 51, so 51 is discarded along with the first half of the array (all elements smaller than 51.) Next, the algorithm checks whether 81 (the middle element of the remainder of the array) matches the search key. The search key (65) is smaller than 81, so 81 is discarded along with the elements larger than 81. After just two tests, the algorithm has narrowed the number of values to check to three (56, 65 and 77). The algorithm then checks 65 (which indeed matches the search key), and returns the index of the array element containing 65. This algorithm required just three comparisons to determine whether the search key matched an element of the array. Using a linear search algorithm would have required 10 comparisons. [Note: In this example, we have chosen to use an array with 15 elements so that there will always be an obvious middle element in the array. With an even number of elements, the middle of the array lies between two elements. We implement the algorithm to chose the lower of those two elements.]
Figure 16.4 declares class BinaryArray. This class is similar to LinearArrayit has two private instance variables, a constructor, a search method (binarySearch), a remainingElements method and a toString method. Lines 1322 declare the constructor. After initializing the array with random ints from 1099 (lines 1819), line 21 calls the Arrays.sort method on the array data. Method sort is a static method of class Arrays that sorts the elements in an array in ascending order. Recall that the binary search algorithm will work only on a sorted array.
Figure 16.4. BinaryArray class.
(This item is displayed on pages 792 - 793 in the print version)
1 // Fig 16.4: BinaryArray.java 2 // Class that contains an array of random integers and a method 3 // that uses binary search to find an integer. 4 import java.util.Random; 5 import java.util.Arrays; 6 7 public class BinaryArray 8 { 9 private int [] data; // array of values 10 private static Random generator = new Random(); 11 12 // create array of given size and fill with random integers 13 public BinaryArray( int size ) 14 { 15 data = new int [ size ]; // create space for array 16 17 // fill array with random ints in range 10-99 18 for ( int i = 0; i < size; i++ ) 19 data[ i ] = 10 + generator.nextInt( 90 ); 20 21 Arrays.sort( data ); 22 } // end BinaryArray constructor 23 24 // perform a binary search on the data 25 public int binarySearch( int searchElement ) 26 { 27 int low = 0 ; // low end of the search area 28 int high = data.length - 1 ; // high end of the search area 29 int middle = ( low + high + 1 ) / 2 ; // middle element 30 int location = -1; // return value; -1 if not found 31 32 do // loop to search for element 33 { 34 // print remaining elements of array 35 System.out.print( remainingElements( low, high ) ); 36 37 // output spaces for alignment 38 for ( int i = 0; i < middle; i++ ) 39 System.out.print( " " ); 40 System.out.println( " * " ); // indicate current middle 41 42 // if the element is found at the middle 43 if ( searchElement == data[ middle ] ) 44 location = middle; // location is the current middle 45 46 // middle element is too high 47 else if ( searchElement < data[ middle ] ) 48 high = middle - 1 ; // eliminate the higher half 49 else // middle element is too low 50 low = middle + 1 ; // eliminate the lower half 51 52 middle = ( low + high + 1 ) / 2 ; // recalculate the middle 53 } while ( ( low <= high ) && ( location == -1 ) ); 54 55 return location; // return location of search key 56 } // end method binarySearch 57 58 // method to output certain values in array 59 public String remainingElements( int low, int high ) 60 { 61 StringBuffer temporary = new StringBuffer(); 62 63 // output spaces for alignment 64 for ( int i = 0; i < low; i++ ) 65 temporary.append( " " ); 66 67 // output elements left in array 68 for ( int i = low; i <= high; i++ ) 69 temporary.append( data[ i ] + " " ); 70 71 temporary.append( " " ); 72 return temporary.toString(); 73 } // end method remainingElements 74 75 // method to output values in array 76 public String toString() 77 { 78 return remainingElements( 0, data.length - 1 ); 79 } // end method toString 80 } // end class BinaryArray |
Lines 2556 declare method binarySearch. The search key is passed into parameter searchElement (line 25). Lines 2729 calculate the low end index, high end index and middle index of the portion of the array that the program is currently searching. At the beginning of the method, the low end is 0, the high end is the length of the array minus 1 and the middle is the average of these two values. Line 30 initializes the location of the element to -1the value that will be returned if the element is not found. Lines 3253 loop until low is greater than high (this occurs when the element is not found) or location does not equal -1 (indicating that the search key was found). Line 43 tests whether the value in the middle element is equal to searchElement. If this is TRue, line 44 assigns middle to location. Then the loop terminates and location is returned to the caller. Each iteration of the loop tests a single value (line 43) and eliminates half of the remaining values in the array (line 48 or 50).
Lines 2644 of Fig. 16.5 loop until the user enters -1. For each other number the user enters, the program performs a binary search on the data to determine whether it matches an element in the array. The first line of output from this program is the array of ints, in increasing order. When the user instructs the program to search for 23, the program first tests the middle element, which is 42 (as indicated by *). The search key is less than 42, so the program eliminates the second half of the array and tests the middle element from the first half of the array. The search key is smaller than 34, so the program eliminates the second half of the array, leaving only three elements. Finally, the program checks 23 (which matches the search key) and returns the index 1.
Figure 16.5. BinarySearchTest class.
(This item is displayed on pages 794 - 795 in the print version)
1 // Fig 16.5: BinarySearchTest.java 2 // Use binary search to locate an item in an array. 3 import java.util.Scanner; 4 5 public class BinarySearchTest 6 { 7 public static void main( String args[] ) 8 { 9 // create Scanner object to input data 10 Scanner input = new Scanner( System.in ); 11 12 int searchInt; // search key 13 int position; // location of search key in array 14 15 // create array and output it 16 BinaryArray searchArray = new BinaryArray( 15 ); 17 System.out.println( searchArray ); 18 19 // get input from user 20 System.out.print( 21 "Please enter an integer value (-1 to quit): " ); 22 searchInt = input.nextInt(); // read an int from user 23 System.out.println(); 24 25 // repeatedly input an integer; -1 terminates the program 26 while ( searchInt != -1 ) 27 { 28 // use binary search to try to find integer 29 position = searchArray.binarySearch( searchInt ); 30 31 // return value of -1 indicates integer was not found 32 if ( position == -1 ) 33 System.out.println( "The integer " + searchInt + 34 " was not found. " ); 35 else 36 System.out.println( "The integer " + searchInt + 37 " was found in position " + position + ". " ); 38 39 // get input from user 40 System.out.print( 41 "Please enter an integer value (-1 to quit): " ); 42 searchInt = input.nextInt(); // read an int from user 43 System.out.println(); 44 } // end while 45 } // end main 46 } // end class BinarySearchTest
|
Efficiency of Binary Search
In the worst-case scenario, searching a sorted array of 1,023 elements will take only 10 comparisons when using a binary search. Repeatedly dividing 1,023 by 2 (because after each comparison, we are able to eliminate half of the array) and rounding down (because we also remove the middle element) yields the values 511, 255, 127, 63, 31, 15, 7, 3, 1 and 0. The number 1023 (210 1) is divided by 2 only 10 times to get the value 0, which indicates that there are no more elements to test. Dividing by 2 is equivalent to one comparison in the binary-search algorithm. Thus, an array of 1,048,575 (220 1) elements takes a maximum of 20 comparisons to find the key, and an array of over one billion elements takes a maximum of 30 comparisons to find the key. This is a tremendous improvement in performance over the linear search. For a one-billion-element array, this is a difference between an average of 500 million comparisons for the linear search and a maximum of only 30 comparisons for the binary search! The maximum number of comparisons needed for the binary search of any sorted array is the exponent of the first power of 2 greater than the number of elements in the array which is represented as log2n. All logarithms grow at roughly the same rate, so in big O notation the base can be omitted. This results in a big O of O(log n) for a binary search which is also known as logarithmic run time.