Monday 2 April 2012

Simple Array


 2. Simple Array

The array is the most commonly used data storage structure; it’s built into most programming languages. Because arrays are so well known, they offer a convenient jumping off place for introducing data structures and for seeing
how object-oriented programming and data structures relate to one another. In this chapter we’ll introduce arrays in Java and demonstrate a home-made array class. We’ll also examine a special kind of array, the ordered array, in which the data is stored in ascending (or descending) key order. This arrangement makes possible a fast way of searching for a data item: the binary search.
We’ll start the chapter with a Java Workshop applet that shows insertion, searching, and deletion in an array. Then we’ll show some sample Java code that carries out these same operations.
Later we’ll examine ordered arrays, again starting with a Workshop applet. This applet will demonstrate a binary search. At the end of the chapter we’ll talk about Big O notation, the most widely used measure of algorithm efficiency. 

Suppose you’re coaching kids-league baseball, and you want to keep track of which players are present at the practice field. What you need is an attendance-monitoring program for your laptop—a program that maintains a database of the players who have shown up for practice. You can use a simple data structure to hold this data. There are several actions you would like to be able to perform:
• Insert a player into the data structure when the player arrives at the field.
• Check to see whether a particular player is present, by searching for the player’s number in the structure.
• Delete a player from the data structure when that player goes home. These three operations—insertion, searching, and deletion—will be the fundamental ones in most of the data storage structures we’ll study in this book.
We’ll often begin the discussion of a particular data structure by demonstrating it with a Workshop applet. This approach will give you a feeling for what the structure and its algorithms do, before we launch into a detailed explanation and demonstrate sample code. The Workshop applet called Array shows how an array can be used to implement insertion, searching, and deletion.
Now start up the Array Workshop applet, as described in Appendix A, “Running the Workshop Applets and Example Programs,” with
C:\>appletviewer Array.html
Following figure shows the resulting array with 20 elements, 10 of which have data items in them. You can think of these items as representing your baseball players. Imagine that each player has been issued a team shirt with the player’s number on the back.
To make things visually interesting, the shirts come in a variety of colors. You can see each player’s number and shirt color in the array.


                                         The Array Workshop applet. 


This applet demonstrates the three fundamental procedures mentioned earlier:
• The Ins button inserts a new data item.
• The Find button searches for specified data item.
• The Del button deletes a specified data item.
Using the New button, you can create a new array of a size you specify. You can fill this array with as many data items as you want using the Fill button. Fill creates a set of items and randomly assigns them numbers and colors. The numbers are in the range 0 to 999. You can’t create an array of more than 60 cells, and you can’t, of course, fill more data items than there are array cells.
Also, when you create a new array, you’ll need to decide whether duplicate items will be allowed; we’ll return to this question in a moment. The default value is no duplicates, so the No Dups radio button is initially selected to indicate this setting.
Insertion
Start with the default arrangement of 20 cells and 10 data items, and the No Dups button selected. You insert a baseball player’s number into the array when the player arrives at the practice field, having been dropped off by a parent. To insert a new item, press the Ins button once. You’ll be prompted to enter the value of the item:
Enter key of item to insert
Type a number, say 678, into the text field in the upper-right corner of the applet. (Yes, it is hard to get three digits on the back of a kid’s shirt.) Press Ins again and the applet will confirm your choice:
Will insert item with key 678
A final press of the button will cause a data item, consisting of this value and a random color, to appear in the first empty cell in the array. The prompt will say something like
Inserted item with key 678 at index 10
Each button press in a Workshop applet corresponds to a step that an algorithm carries out. The more steps required, the longer the algorithm takes. In the Array Workshop applet the insertion process is very fast, requiring only a single step. This is true because a new item is always inserted in the first vacant cell in  the array, and the algorithm knows this location because it knows how many items are already in the array. The new item is simply inserted in the next available space. Searching and deletion, however, are not so fast.
In no-duplicates mode you’re on your honor not to insert an item with the same key as an existing item. If you do, the applet displays an error message, but it won’t prevent the insertion. The assumption is that you won’t make this mistake.
Searching
To begin a search, click the Find button. You’ll be prompted for the key number of the person you’re looking for. Pick a number that appears on an item somewhere in the middle of the array. Type in the number and repeatedly press the Find button. At each button press, one step in the algorithm is carried out. You’ll see the red arrow start at cell 0 and move methodically down the cells, examining a new one each time you press the button. The index number in the message Checking next cell, index = 2
will change as you go along. When you reach the specified item, you’ll see the message Have found item with key 505 or whatever key value you typed in. Assuming duplicates are not allowed, the search will terminate as soon as an item with the specified key value is found.
If you have selected a key number that is not in the array, the applet will examine every occupied cell in the array before telling you that it can’t find that item. Notice that (again assuming duplicates are not allowed) the search algorithm must look through an average of half the data items to find a specified item. Items close to the beginning of the array will be found sooner, and those toward the end will be found later. If N is the number of items, the average number of steps needed to find an item is N/2. In the worst-case scenario, the specified item is in the last occupied cell, and N steps will be required to find it.
As we noted, the time an algorithm takes to execute is proportional to the number of steps, so searching takes much longer on the average (N/2 steps) than insertion (one step).
Deletion
To delete an item, you must first find it. After you type in the number of the item to be deleted, repeated button presses will cause the arrow to move, step by step, down the array until the item is located. The next button press deletes the item, and the cell becomes empty. (Strictly speaking, this step isn’t necessary because we’re going to copy over this cell anyway, but deleting the item makes it clearer what’s happening.)
Implicit in the deletion algorithm is the assumption that holes are not allowed in the array. A hole is one or more empty cells that have filled cells above them (at higher index numbers). If holes are allowed, all the algorithms become more complicated because they must check to see whether a cell is empty before examining its contents. Also, the algorithms become less efficient because they must waste time looking at unoccupied cells. For these reasons, occupied cells must be arranged contiguously: no holes allowed.
Therefore, after locating the specified item and deleting it, the applet must shift the contents of each subsequent cell down one space to fill in the hole. Figure 2.2 shows an example. 


                                                  Deleting an item
If the item in cell 5 (38, in Figure 2.2) is deleted, the item in 6 shifts into 5, the item in 7 shifts into 6, and so on to the last occupied cell. During the deletion process, when the item is located, the applet shifts down the contents of the higher-indexed cells as you continue to press the Del button.
A deletion requires (assuming no duplicates are allowed) searching through an average of N/2 elements and then moving the remaining elements (an average of N/2 moves) to fill up the resulting hole. This is N steps in all.
The Duplicates Issue
When you design a data storage structure, you need to decide whether items with duplicate keys will be allowed. If you’re working with a personnel file and the key is an employee number, duplicates don’t make much sense; there’s no point in assigning the same number to two employees. On the other hand, if the key value is last names, then there’s a distinct possibility several employees will have the same key value, so duplicates should be allowed. 
 Of course, for the baseball players, duplicate numbers should not be allowed. Keeping track of the players would be hard if more than one wore the same number.
The Array Workshop applet lets you select either option. When you use New to create a new array, you’re prompted to specify both its size and whether duplicates
are permitted. Use the radio buttons Dups OK or No Dups to make this selection. If you’re writing a data storage program in which duplicates are not allowed, you may need to guard against human error during an insertion by checking all the data items in the array to ensure that none of them already has the same key value as the item being inserted. This check is inefficient, however, and increases the number of steps required for an insertion from one to N. For this reason, our applet does not perform this check.
Searching with Duplicates
Allowing duplicates complicates the search algorithm, as we noted. Even if it finds a match, it must continue looking for possible additional matches until the last occupied cell. At least, this is one approach; you could also stop after the first match. How you proceed depends on whether the question is “Find me everyone with blue eyes” or “Find me someone with blue eyes.” When the Dups OK button is selected, the applet takes the first approach, finding all items matching the search key. This approach always requires N steps because the algorithm must go all the way to the last occupied cell.
Insertion with Duplicates
Insertion is the same with duplicates allowed as when they’re not: A single step inserts the new item. But remember, if duplicates are not allowed, and there’s a possibility the user will attempt to input the same key twice, you may need to check every existing item before doing an insertion.
Deletion with Duplicates
Deletion may be more complicated when duplicates are allowed, depending on exactly how “deletion” is defined. If it means to delete only the first item with a specified value, then, on the average, only N/2 comparisons and N/2 moves are necessary. This is the same as when no duplicates are allowed.
If, however, deletion means to delete every item with a specified key value, the same operation may require multiple deletions. Such an operation will require checking N cells and (probably) moving more than N/2 cells. The average depends on how the duplicates are distributed throughout the array.
The applet assumes this second meaning and deletes multiple items with the same key. This is complicated because each time an item is deleted, subsequent items must be shifted farther. For example, if three items are deleted, then items beyond the last deletion will need to be shifted three spaces. To see how this operation works, set the applet to Dups OK and insert three or four items with the same key. Then try deleting them.
Table 2.1 shows the average number of comparisons and moves for the three operations, first where no duplicates are allowed and then where they are allowed. N is the number of items in the array. Inserting a new item counts as one move.

 

You can explore these possibilities with the Array Workshop applet. The difference between N and N/2 is not usually considered very significant, except when you’re fine-tuning a program. Of more importance, as we’ll discuss toward the end of this chapter, is whether an operation takes one step, N steps, log(N) steps, or N2 steps.
Not Too Swift
One of the significant things to notice when you’re using the Array applet is the slow and methodical nature of the algorithms. With the exception of insertion, the algorithms involve stepping through some or all of the cells in the array. Different data structures offer much faster (but more complex) algorithms. We’ll see one, the binary search on an ordered array, later in this chapter, and others throughout this book.
2.2 The Basics of Arrays in Java
The preceding section showed graphically the primary algorithms used for arrays. Now we’ll see how to write programs to carry out these algorithms, but we first want to cover a few of the fundamentals of arrays in Java.
If you’re a Java expert, you can skip ahead to the next section, but even C and C++ programmers should stick around. Arrays in Java use syntax similar to that in C and C++ (and not that different from other languages), but there are nevertheless some unique aspects to the Java approach.
Creating an Array
As we noted in Chapter 1, “Overview,” there are two kinds of data in Java: primitive types (such as int and double) and objects. In many programming languages (even object-oriented ones such as C++), arrays are primitive types, but in Java they’re treated as objects. Accordingly, you must use the new operator to create an array:
int[] intArray;// defines a reference to an array
intArray = new int[100]; // creates the array, and
                                       // sets intArray to refer to it
Or you can use the equivalent single-statement approach:
int[] intArray = new int[100];
The [] operator is the sign to the compiler we’re naming an array object and not an ordinary variable. You can also use an alternative syntax for this operator, placing it after the name instead of the type:
int intArray[] = new int[100]; // alternative syntax
However, placing the [] after the int makes it clear that the [] is part of the type, not the name. Because an array is an object, its name—intArray in the preceding code—is a reference to an array; it’s not the array itself. The array is stored at an address elsewhere in memory, and intArray holds only this address.
Arrays have a length field, which you can use to find the size (the number of elements) of an array:
int arrayLength = intArray.length; // find array size
As in most programming languages, you can’t change the size of an array after it’s been created.
Accessing Array Elements
Array elements are accessed using an index number in square brackets. This is similar to how other languages work:
temp = intArray[3]; // get contents of fourth element of array
intArray[7] = 66; // insert 66 into the eighth cell
Remember that in Java, as in C and C++, the first element is numbered 0, so that the indices in an array of 10 elements run from 0 to 9.
The Basics of Arrays in Java
If you use an index that’s less than 0 or greater than the size of the array less 1, you’ll get the Array Index Out of Bounds runtime error.
Initialization
Unless you specify otherwise, an array of integers is automatically initialized to 0 when it’s created. Unlike C++, this is true even of arrays defined within a method (function). Say you create an array of objects like this:
autoData[] carArray = new autoData[4000];
Until the array elements are given explicit values, they contain the special null object. If you attempt to access an array element that contains null, you’ll get the runtime error Null Pointer Assignment. The moral is to make sure you assign something to an element before attempting to access it. You can initialize an array of a primitive type to something besides 0 using this syntax:
int[] intArray = { 0, 3, 6, 9, 12, 15, 18, 21, 24, 27 };
Perhaps surprisingly, this single statement takes the place of both the reference declaration and the use of new to create the array. The numbers within the curly brackets are called the initialization list. The size of the array is determined by the number of values in this list.
An Array Example
Let’s look at some example programs that show how an array can be used. We’ll start with an old-fashioned procedural version and then show the equivalent object oriented approach. Listing 2.1 shows the old-fashioned version, called array.java.
class ArrayApp
{
public static void main(String[] args)
{
long[] arr;  // reference to array
arr = new long[100];  // make array
int nElems = 0;  // number of items
int j;  // loop counter
long searchKey;  // key of item to search for
arr[0] = 77;  // insert 10 items
arr[1] = 99;
arr[2] = 44;
arr[3] = 55;
arr[4] = 22;
arr[5] = 88;
arr[6] = 11;
arr[7] = 00;
arr[8] = 66;
arr[9] = 33;
nElems = 10;  // now 10 items in array
//--------------------------------------------------------------
for(j=0; j<nElems; j++)  // display items
System.out.print(arr[j] + “ “);
System.out.println(“”);
//--------------------------------------------------------------
searchKey = 66;  // find item with key 66
for(j=0; j<nElems; j++)  // for each element,
if(arr[j] == searchKey)  // found item?
break;   // yes, exit before end
if(j == nElems)  // at the end?
System.out.println(“Can’t find “ + searchKey); // yes
else
System.out.println(“Found “ + searchKey);  // no
//--------------------------------------------------------------
searchKey = 55;  // delete item with key 55
for(j=0; j<nElems; j++)  // look for it
if(arr[j] == searchKey)
break;
for(int k=j; k<nElems-1; k++)  // move higher ones down
arr[k] = arr[k+1];
nElems--;   // decrement size
//--------------------------------------------------------------
for(j=0; j<nElems; j++)   // display items
System.out.print( arr[j] + “ “);
System.out.println(“”);
} // end main()
} // end class ArrayApp
In this program, we create an array called arr, place 10 data items (kids’ numbers) in it, search for the item with value 66 (the shortstop, Louisa), display all the items, remove the item with value 55 (Freddy, who had a dentist appointment), and then display the remaining 9 items. The output of the program looks like this:
77 99 44 55 22 88 11 0 66 33
Found 66
77 99 44 22 88 11 0 66 33
The data we’re storing in this array is type long. We use long to make it clearer that this is data; type int is used for index values. We’ve chosen a primitive type to simplify the coding. Generally, the items stored in a data structure consist of several fields, so they are represented by objects rather than primitive types. We’ll see such an example toward the end of this chapter.
Insertion
Inserting an item into the array is easy; we use the normal array syntax:
arr[0] = 77;
We also keep track of how many items we’ve inserted into the array with the nElems variable.
Searching
The searchKey variable holds the value we’re looking for. To search for an item, we step through the array, comparing searchKey with each element. If the loop variable j reaches the last occupied cell with no match being found, the value isn’t in the array. Appropriate messages are displayed: Found 66 or Can’t find 27.
Deletion
Deletion begins with a search for the specified item. For simplicity, we assume (perhaps rashly) that the item is present. When we find it, we move all the items with higher index values down one element to fill in the “hole” left by the deleted element, and we decrement nElems. In a real program, we would also take appropriate action if the item to be deleted could not be found.
Display
Displaying all the elements is straightforward: We step through the array, accessing each one with arr[j] and displaying it.
Program Organization
The organization of array.java leaves something to be desired. The program has only one class, ArrayApp, and this class has only one method, main(). array.java is essentially an old-fashioned procedural program. Let’s see if we can make it easier to understand (among other benefits) by making it more object oriented. We’re going to provide a gradual introduction to an object-oriented approach, using two steps. In the first, we’ll separate the data storage structure (the array) from the rest of the program. The remaining part of the program will become a user of the structure. In the second step, we’ll improve the communication between the storage structure and its user.
2.3 Dividing a Program into Classes
The array.java program in Listing 2.1 essentially consists of one big method. We can reap many benefits by dividing the program into classes. What classes? The data storage structure itself is one candidate, and the part of the program that uses this data structure is another. By dividing the program into these two classes, we can clarify the functionality of the program, making it easier to design and understand (and in real programs to modify and maintain).
In array.java we used an array as a data storage structure, but we treated it simply as a language element. Now we’ll encapsulate the array in a class, called LowArray. We’ll also provide class methods by which objects of other classes (the LowArrayApp class in this case) can access the array. These methods allow communication between LowArray and LowArrayApp.
Our first design of the LowArray class won’t be entirely successful, but it will demonstrate the need for a better approach.
// lowArray.java
// demonstrates array class with low-level interface
// to run this program: C>java LowArrayApp
////////////////////////////////////////////////////////////////
class LowArray
{
private long[] a;  // ref to array a
//--------------------------------------------------------------
public LowArray(int size)  // constructor
{ a = new long[size]; }  // create array
//--------------------------------------------------------------
public void setElem(int index, long value)  // set value
{ a[index] = value; 
}
//--------------------------------------------------------------
public long getElem(int index)   // get value
{ return a[index]; }
//--------------------------------------------------------------
} // end class LowArray
////////////////////////////////////////////////////////////////
class LowArrayApp
{
public static void main(String[] args)
{
LowArray arr;   // reference
arr = new LowArray(100);   // create LowArray object
int nElems = 0;    // number of items in array
int j;    // loop variable
arr.setElem(0,77);   // insert 10 items
arr.setElem(1,99);
arr.setElem(2,44);
arr.setElem(3, 55);
arr.setElem(4,22);
arr.setElem(5,88);
arr.setElem(6,11);
arr.setElem(7,00);
arr.setElem(8,66);
arr.setElem(9,33);    // now 10 items in array
nElems = 10; 
for(j=0; j<nElems; j++)    // display items
System.out.print(arr.getElem(j) + “ “);
System.out.println(“”);
int searchKey = 26;    // search for data item
for(j=0; j<nElems; j++)   // for each element,
if(arr.getElem(j) == searchKey) // found item?
break;
if(j == nElems)   // no
System.out.println(“Can’t find “ + searchKey);
else   // yes
System.out.println(“Found “ + searchKey);
for(j=0; j<nElems; j++)
if(arr.getElem(j) == 55)
break;
for(int k=j; k<nElems; k++)
// delete value 55
// look for it
// higher ones down
arr.setElem(k, arr.getElem(k+1) );
nElems--;   // decrement size
}
for(j=0; j<nElems; j++)   // display items
System.out.print( arr.getElem(j) + “ “);
System.out.println(“”);
} // end main()
// end class LowArrayApp

The output from the lowArray.java program is similar to that from array.java, except that we try to find a non-existent key value (26) before deleting the item with the key value 55:
77 99 44 55 22 88 11 0 66 33
Can’t find 26
77 99 44 22 88 11 0 66 33
Classes LowArray and LowArrayApp
In lowArray.java, we essentially wrap the class LowArray around an ordinary Java array. The array is hidden from the outside world inside the class; it’s private, so only LowArray class methods can access it. There are three LowArray methods: setElem() and getElem(), which insert and retrieve an element, respectively; and a constructor, which creates an empty array of a specified size.
Another class, LowArrayApp, creates an object of the LowArray class and uses it to store and manipulate data. Think of LowArray as a tool and LowArrayApp as a user of the tool. We’ve divided the program into two classes with clearly defined roles. This is a valuable first step in making a program object oriented. A class used to store data objects, as is LowArray in the lowArray.java program, is sometimes called a container class. Typically, a container class not only stores the data but also provides methods for accessing the data and perhaps also sorting it and performing other complex actions on it.
2.4 Class Interfaces
We’ve seen how a program can be divided into separate classes. How do these classes interact with each other? Communication between classes and the division of responsibility between them are important aspects of object-oriented programming. This point is especially true when a class may have many different users. Typically, a class can be used over and over by different users (or the same user) for different purposes. For example, someone might use the LowArray  class in some other program to store the serial numbers of his traveler’s checks. The class can handle this task just as well as it can store the numbers of baseball players.
If a class is used by many different programmers, the class should be designed so that it’s easy to use. The way that a class user relates to the class is called the class interface. Because class fields are typically private, when we talk about the interface, we usually mean the class methods—what they do and what their arguments are. By calling these methods, a class user interacts with an object of the class. One of the important advantages conferred by object-oriented programming is that a class interface can be designed to be as convenient and efficient as possible.
Not So Convenient
The interface to the LowArray class in lowArray.java is not particularly convenient. The methods setElem() and getElem() operate on a low conceptual level, performing exactly the same tasks as the [] operator in an ordinary Java array. The class user, represented by the main() method in the LowArrayApp class, ends up having to carry out the same low-level operations it did in the non-class version of an array in the array.java program. The only difference was that it related to setElem() and getElem() instead of the [] operator. It’s not clear that this approach is an improvement.
Also notice that there’s no convenient way to display the contents of the array. Somewhat crudely, the LowArrayApp class simply uses a for loop and the getElem() method for this purpose. We could avoid repeated code by writing a separate method for LowArrayApp that it could call to display the array contents, but is it really the responsibility of the LowArrayApp class to provide this method?
Thus, lowArray.java demonstrates how to divide a program into classes, but it really doesn’t buy us too much in practical terms. Let’s see how to redistribute responsibilities between the classes to obtain more of the advantages of OOP.
Who’s Responsible for What?
In the lowArray.java program, the main()routine in the LowArrayApp class, the user of the data storage structure, must keep track of the indices to the array. For some users of an array, who need random access to array elements and don’t mind keeping track of the index numbers, this arrangement might make sense. For example, sorting an array, as we’ll see in the next chapter, can make efficient use of this direct hands-on approach.
In a typical program, however, the user of the data storage device won’t find access to the array indices to be helpful or relevant.
The highArray.java Example
Out next example program shows an improved interface for the storage structure class, called HighArray. Using this interface, the class user (the HighArrayApp class) no longer needs to think about index numbers. The setElem() and getElem() methods are gone; they’re replaced by insert(), find(), and delete(). These new methods don’t require an index number as an argument because the class takes responsibility for handling index numbers. The user of the class (HighArrayApp) is free to concentrate on the what instead of the how—what’s going to be inserted, deleted, and accessed, instead of exactly how these activities are carried out.
Figure 2.4 shows the HighArray interface, and Listing 2.3 shows the highArray.java
program.
// highArray.java
// demonstrates array class with high-level interface
// to run this program: C>java HighArrayApp
////////////////////////////////////////////////////////////////
class HighArray
{
private long[] a; // ref to array a
private int nElems; // number of data items
//-----------------------------------------------------------
public HighArray(int max) // constructor
{
a = new long[max]; // create the array
nElems = 0; // no items yet
}
//-----------------------------------------------------------
public boolean find(long searchKey){  // find specified value
int j;
for(j=0; j<nElems; j++) // for each element,
if(a[j] == searchKey) // found item?

break;  // exit loop before end
if(j == nElems)  // gone to end?
return false; // yes, can’t find it
else
return true; // no, found it
} // end find()
//-----------------------------------------------------------
public void insert(long value) // put element into array
{
a[nElems] = value; // insert it
nElems++; // increment size
}
//-----------------------------------------------------------
public boolean delete(long value)
{
int j;
for(j=0; j<nElems; j++) // look for it
if( value == a[j] )
break;
if(j==nElems) // can’t find it
return false;
else // found it
{
for(int k=j; k<nElems; k++) // move higher ones down
a[k] = a[k+1];
nElems--; // decrement size
return true;
}
} // end delete()
//-----------------------------------------------------------
public void display() // displays array contents
{
for(int j=0; j<nElems; j++) // for each element,
System.out.print(a[j] + “ “); // display it
System.out.println(“”);
}
//-----------------------------------------------------------
} // end class HighArray
////////////////////////////////////////////////////////////////
class HighArrayApp
{
public static void main(String[] args)


{
int maxSize = 100;// array size
HighArray arr; // reference to array
arr = new HighArray(maxSize); // create the array
arr.insert(77); // insert 10 items
arr.insert(99);
arr.insert(44);
arr.insert(55);
arr.insert(22);
arr.insert(88);
arr.insert(11);
arr.insert(00);
arr.insert(66);
arr.insert(33);
arr.display(); // display items
int searchKey = 35; // search for item
if( arr.find(searchKey) )
System.out.println(“Found “ + searchKey);
else
System.out.println(“Can’t find “ + searchKey);
arr.delete(00);
arr.delete(55);
arr.delete(99); // delete 3 items
arr.display(); // display items again
} // end main()
} // end class HighArrayApp
////////////////////////////////////////////////////////////////

 The HighArray class is now wrapped around the array. In main(), we create an array of this class and carry out almost the same operations as in the lowArray.java program: We insert 10 items, search for an item—one that isn’t there—and display the array contents. Because deleting is so easy, we delete 3 items (0, 55, and 99) instead of 1 and finally display the contents again. Here’s the output:
77 99 44 55 22 88 11 0 66 33
Can’t find 35
77 44 22 88 11 66 33
Notice how short and simple main() is. The details that had to be handled by main() in lowArray.java are now handled by HighArray class methods. In the HighArray class, the find() method looks through the array for the item whose key value was passed to it as an argument. It returns true or false, depending on whether it finds the item.
The insert() method places a new data item in the next available space in the array. A field called nElems keeps track of the number of array cells that are actually filled with data items. The main() method no longer needs to worry about how many items are in the array.
The delete() method searches for the element whose key value was passed to it as an argument and, when it finds that element, shifts all the elements in higher index cells down one cell, thus writing over the deleted value; it then decrements nElems.
We’ve also included a display() method, which displays all the values stored in the array.
The User’s Life Made Easier
In lowArray.java (Listing 2.2), the code in main() to search for an item required eight lines; in highArray.java, it requires only one. The class user, the HighArrayApp class, need not worry about index numbers or any other array details. Amazingly, the class user doesn’t even need to know what kind of data structure the HighArray class is using to store the data. The structure is hidden behind the interface. In fact, in the next section, we’ll see the same interface used with a somewhat different data structure.
Abstraction
The process of separating the how from the what—how an operation is performed inside a class, as opposed to what’s visible to the class user—is called abstraction. Abstraction is an important aspect of software engineering. By abstracting class functionality, we make it easier to design a program because we don’t need to think about implementation details at too early a stage in the design process.
The Ordered Workshop Applet
Imagine an array in which the data items are arranged in order of ascending key values—that is, with the smallest value at index 0, and each cell holding a value larger than the cell below. Such an array is called an ordered array.
When we insert an item into this array, the correct location must be found for the insertion: just above a smaller value and just below a larger one. then all the larger values must be moved up to make room.
Why would we want to arrange data in order? One advantage is that we can speed up search times dramatically using a binary search. Start the Ordered Workshop applet, using the procedure described in Chapter 1.
You’ll see an array; it’s similar to the one in the Array Workshop applet, but the data is ordered.Following Figure shows this applet.



                                        The Ordered Workshop applet.
In the ordered array we’ve chosen not to allow duplicates. As we saw earlier, this decision speeds up searching somewhat but slows down insertion.
Linear Search
Two search algorithms are available for the Ordered Workshop applet: linear and binary. Linear search is the default. Linear searches operate in much the same way as the searches in the unordered array in the Array applet: The red arrow steps along, looking for a match. The difference is that in the ordered array, the search quits if an item with a larger key is found.
Try out a linear search. Make sure the Linear radio button is selected. Then use the Find button to search for a non-existent value that, if it were present, would fit somewhere in the middle of the array. In Figure 2.5, this number might be 400.
You’ll see that the search terminates when the first item larger than 400 is reached; it’s 427 in the figure. The algorithm knows there’s no point looking further. Try out the Ins and Del buttons as well. Use Ins to insert an item with a key value that will go somewhere in the middle of the existing items. You’ll see that insertion requires moving all the items with key values larger than the item being inserted.
Use the Del button to delete an item from the middle of the array. Deletion works much the same as it did in the Array applet, shifting items with higher index numbers down to fill in the hole left by the deletion. In the ordered array, however, the deletion algorithm can quit partway through if it doesn’t find the item, just as the search routine can.
Binary Search
The payoff for using an ordered array comes when we use a binary search. This kind of search is much faster than a linear search, especially for large arrays.
The Guess-a-Number Game
Binary search uses the same approach you did as a kid (if you were smart) to guess a number in the well-known children’s guessing game. In this game, a friend asks you to guess a number she’s thinking of between 1 and 100. When you guess a number, she’ll tell you one of three things: Your guess is larger than the number she’s thinking of, it’s smaller, or you guessed correctly.
To find the number in the fewest guesses, you should always start by guessing 50. If your friend says your guess is too low, you deduce the number is between 51 and 100, so your next guess should be 75 (halfway between 51 and 100). If she says it’s too high, you deduce the number is between 1 and 49, so your next guess should be 25.
Each guess allows you to divide the range of possible values in half. Finally, the range is only one number long, and that’s the answer.
Notice how few guesses are required to find the number. If you used a linear search, guessing first 1, then 2, then 3, and so on, finding the number would take you, on the average, 50 guesses. In a binary search each guess divides the range of possible values in half, so the number of guesses required is far fewer. Table 2.2 shows a game session when the number to be guessed is 33.



The correct number is identified in only seven guesses. This is the maximum. You might get lucky and guess the number before you’ve worked your way all the way down to a range of one. This would happen if the number to be guessed was 50, for example, or 34.
Binary Search in the Ordered Workshop Applet
To perform a binary search with the Ordered Workshop applet, you must use the New button to create a new array. After the first press, you’ll be asked to specify the size of the array (maximum 60) and which kind of searching scheme you want: linear or binary. Choose binary by clicking the Binary radio button. After the array is created, use the Fill button to fill it with data items. When prompted, type the amount (not more than the size of the array). A few more presses fills in all the items.
When the array is filled, pick one of the values in the array and see how you can use the Find button to locate it. After a few preliminary presses, you’ll see the red arrow pointing to the algorithm’s current guess, and you’ll see the range shown by a vertical blue line adjacent to the appropriate cells. Figure 2.6 depicts the situation when the range is the entire array.



                                 Initial range in the binary search.
At each press of the Find button, the range is halved and a new guess is chosen in the middle of the range.Following Figure  shows the next step in the process.




                                       Range in step 2 of the binary search.
Even with a maximum array size of 60 items, a half-dozen button presses suffices to locate any item.
Try using the binary search with different array sizes. Can you figure out how many steps are necessary before you run the applet? We’ll return to this question in the last section of this chapter.
Notice that the insertion and deletion operations also employ the binary search (when it’s selected). The place where an item should be inserted is found with a binary search, as is an item to be deleted. In this applet, items with duplicate keys are not permitted.
2.5 Java Code for an Ordered Array
Let’s examine some Java code that implements an ordered array. We’ll use the OrdArray class to encapsulate the array and its algorithms. The heart of this class is the find() method, which uses a binary search to locate a specified data item. We’ll examine this method in detail before showing the complete program.
Binary Search with the find() Method
The find() method searches for a specified item by repeatedly dividing in half the range of array elements to be considered. The method looks like this:
public int find(long searchKey)
{
int lowerBound = 0;
int upperBound = nElems-1;

int curIn;
while(true)
{
curIn = (lowerBound + upperBound
if(a[curIn]==searchKey)
return curIn;
//
else if(lowerBound > upperBound)
return nElems;
//
else
//
{
if(a[curIn] < searchKey)
lowerBound = curIn + 1; //
else
upperBound = curIn - 1; //
} // end else divide range
} // end while
} // end find()
) / 2;
found it
can’t find it
divide range
it’s in upper half
it’s in lower half
The method begins by setting the lowerBound and upperBound variables to the first and last occupied cells in the array. Setting these variables specifies the range where the item we’re looking for, searchKey, may be found. Then, within the while loop,the current index, curIn, is set to the middle of this range.
If we’re lucky, curIn may already be pointing to the desired item, so we first check if this is true. If it is, we’ve found the item, so we return with its index, curIn. Each time through the loop we divide the range in half. Eventually, the range will get so small that it can’t be divided any more. We check for this in the next statement: If lowerBound is greater than upperBound, the range has ceased to exist. (When lowerBound equals upperBound, the range is one and we need one more pass through the loop.) We can’t continue the search without a valid range, but we haven’t found the desired item, so we return nElems, the total number of items. This isn’t a valid index because the last filled cell in the array is nElems-1. The class user interprets this value to mean that the item wasn’t found.
If curIn is not pointing at the desired item, and the range is still big enough, we’re ready to divide the range in half. We compare the value at the current index, a[curIn], which is in the middle of the range, with the value to be found,
searchKey. If searchKey is larger, we know we should look in the upper half of the range. Accordingly, we move lowerBound up to curIn. Actually, we move it one cell beyond curIn because we’ve already checked curIn itself at the beginning of the loop.

If searchKey is smaller than a[curIn], we know we should look in the lower half of the range. So we move upperBound down to one cell below curIn. Figure 2.8 shows how the range is altered in these two situations.



                                    Dividing the range in a binary search.
 
The OrdArray Class
In general, the orderedArray.java program is similar to highArray.java (Listing 2.3). The main difference is that find() uses a binary search, as we’ve seen.
We could have used a binary search to locate the position where a new item will be inserted. This operation involves a variation on the find() routine, but for simplicity we retain the linear search in insert(). The speed penalty may not be important because, as we’ve seen, an average of half the items must be moved anyway when an insertion is performed, so insertion will not be very fast even if we locate the item with a binary search. However, for the last ounce of speed, you could change the initial part of insert() to a binary search (as is done in the Ordered Workshop applet). Similarly, the delete() method could call find() to figure out the location of the item to be deleted.
The OrdArray class includes a new size() method, which returns the number of data items currently in the array. This information is helpful for the class user, main(), when it calls find(). If find() returns nElems, which main() can discover with size(), then the search was unsuccessful. Listing 2.4 shows the complete listing for the orderedArray.java program.// orderedArray.java
// demonstrates ordered array class
// to run this program: C>java OrderedApp
////////////////////////////////////////////////////////////////
class OrdArray
{
private long[] a;
// ref to array a
private int nElems;
// number of data items
//-----------------------------------------------------------
public OrdArray(int max)
// constructor
{
a = new long[max];
// create array
nElems = 0;
}
//-----------------------------------------------------------
public int size()
{ return nElems; }
//-----------------------------------------------------------
public int find(long searchKey)
{
int lowerBound = 0;
int upperBound = nElems-1;
int curIn;
while(true)
{
curIn = (lowerBound + upperBound ) / 2;
if(a[curIn]==searchKey)
return curIn;
// found it
else if(lowerBound > upperBound)
return nElems;
// can’t find it
else
// divide range
{
if(a[curIn] < searchKey)
lowerBound = curIn + 1; // it’s in upper half
else
upperBound = curIn - 1; // it’s in lower half
} // end else divide range
} // end while
} // end find()
//-----------------------------------------------------------

public void insert(long value)
// put element into array
{
int j;
for(j=0; j<nElems; j++)
// find where it goes
if(a[j] > value)
// (linear search)
break;
for(int k=nElems; k>j; k--)
// move bigger ones up
a[k] = a[k-1];
a[j] = value;
// insert it
nElems++;
// increment size
} // end insert()
//-----------------------------------------------------------
public boolean delete(long value)
{
int j = find(value);
if(j==nElems)
// can’t find it
return false;
else
// found it
{
for(int k=j; k<nElems; k++) // move bigger ones down
a[k] = a[k+1];
nElems--;
// decrement size
return true;
}
} // end delete()
//-----------------------------------------------------------
public void display()
// displays array contents
{
for(int j=0; j<nElems; j++)
// for each element,
System.out.print(a[j] + “ “); // display it
System.out.println(“”);
}
//-----------------------------------------------------------
} // end class OrdArray
////////////////////////////////////////////////////////////////
class OrderedApp
{
public static void main(String[] args)
{
int maxSize = 100;
// array size
OrdArray arr;
// reference to array
arr = new OrdArray(maxSize); // create the array
arr.insert(77); // insert 10 items
arr.insert(99);
arr.insert(44);
arr.insert(55);
arr.insert(22);
arr.insert(88);
arr.insert(11);
arr.insert(00);
arr.insert(66);
arr.insert(33);
int searchKey = 55;
// search for item
if( arr.find(searchKey) != arr.size() )
System.out.println(“Found “ + searchKey);
else
System.out.println(“Can’t find “ + searchKey);
arr.display(); // display items
arr.delete(00); // delete 3 items
arr.delete(55);
arr.delete(99);
arr.display();
// display items again
} // end main()
} // end class OrderedApp
////////////////////////////////////////////////////////////////
Advantages of Ordered Arrays
What have we gained by using an ordered array? The major advantage is that search times are much faster than in an unordered array. The disadvantage is that insertion takes longer because all the data items with a higher key value must be moved up to make room. Deletions are slow in both ordered and unordered arrays because items must be moved down to fill the hole left by the deleted item.
Ordered arrays are therefore useful in situations in which searches are frequent, but insertions and deletions are not. An ordered array might be appropriate for a database of company employees, for example. Hiring new employees and laying off existing ones would probably be infrequent occurrences compared with accessing an existing employee’s record for information, or updating it to reflect changes in salary, address, and so on.
A retail store inventory, on the other hand, would not be a good candidate for an ordered array because the frequent insertions and deletions, as items arrived in the store and were sold, would run slowly.
2.6 Logarithms
In this section we’ll explain how logarithms are used to calculate the number of steps necessary in a binary search. If you’re a math major, you can probably skip this section. If math makes you break out in a rash, you can also skip it, except for taking a long hard look at Table 2.3.
We’ve seen that a binary search provides a significant speed increase over a linear search. In the number-guessing game, with a range from 1 to 100, a maximum of seven guesses is needed to identify any number using a binary search; just as in an array of 100 records, seven comparisons are needed to find a record with a specified key value. How about other ranges? Table 2.3 shows some representative ranges and the number of comparisons needed for a binary search.

Range                   Comparisons Needed
10                                      4
100                                    7
1,000                                 10
10,000                               14
100,000                             17
1,000,000                          20
10,000,000                        24
100,000,000                      27
1,000,000,000                   30


Notice the differences between binary search times and linear search times. For very small numbers of items, the difference isn’t dramatic. Searching 10 items would take an average of five comparisons with a linear search (N/2) and a maximum of four comparisons with a binary search. But the more items there are, the bigger the difference. With 100 items, there are 50 comparisons in a linear search, but only 7 in a binary search. For 1,000 items, the numbers are 500 versus 10, and for 1,000,000 items, they’re 500,000 versus 20. We can conclude that for all but very small arrays, the binary search is greatly superior.
The Equation
You can verify the results of Table 2.3 by repeatedly dividing a range (from the first column) in half until it’s too small to divide further. The number of divisions this process requires is the number of comparisons shown in the second column. Repeatedly dividing the range by two is an algorithmic approach to finding the number of comparisons. You might wonder if you could also find the number using a simple equation. Of course, there is such an equation, and it’s worth exploring here because it pops up from time to time in the study of data structures. This formula involves logarithms. (Don’t panic yet.)
The numbers in Table 2.3 leave out some interesting data. They don’t answer such questions as, What is the exact size of the maximum range that can be searched in five steps? To solve this problem, we must create a similar table, but one that starts at the beginning, with a range of one, and works up from there by multiplying the range by two each time. Table 2.4 shows how this looks for the first seven steps. For our original problem with a range of 100, we can see that 6 steps don’t produce a range quite big enough (64), while 7 steps cover it handily (128). Thus, the 7 steps that are shown for 100 items in Table 2.3 are correct, as are the 10 steps for a range of 1000.
Doubling the range each time creates a series that’s the same as raising two to a power, as shown in the third column of Table 2.4. We can express this power as a formula. If s represents steps (the number of times you multiply by two—that is, the power to which two is raised) and r represents the range, then the equation is
 r = 2^s
If you know s, the number of steps, this tells you r, the range. For example, if s is 6, the range is 26, or 64.
The Opposite of Raising Two to a Power
Our original question was the opposite of the one just described: Given the range, we want to know how many comparisons are required to complete a search. That is, given r, we want an equation that gives us s.
The inverse of raising something to a power is called a logarithm. Here’s the formula we want, expressed with a logarithm:
s = log2(r)
This equation says that the number of steps (comparisons) is equal to the logarithm to the base 2 of the range. What’s a logarithm? The base 2 logarithm of a number r is the number of times you must multiply two by itself to get r. In Table 2.4, we show that the numbers in the first column, s, are equal to log2(r). How do you find the logarithm of a number without doing a lot of dividing? Pocket calculators and most computer languages have a log function. It is usually log to the base 10, but you can convert easily to base 2 by multiplying by 3.322. For example, log10(100) = 2, so log2(100) = 2 times 3.322, or 6.644. Rounded up to the whole number 7, this is what appears in the column to the right of 100 in Table 2.4.
In any case, the point here isn’t to calculate logarithms. It’s more important to understand the relationship between a number and its logarithm. Look again at Table 2.3, which compares the number of items and the number of steps needed to find a particular item. Every time you multiply the number of items (the range) by a factor of 10, you add only three or four steps (actually 3.322, before rounding off to whole numbers) to the number needed to find a particular element. This is true because, as a number grows larger, its logarithm doesn’t grow nearly as fast. We’ll compare this logarithmic growth rate with that of other mathematical functions when we talk about Big O notation later in this chapter.
2.7 Storing Objects
In the Java examples we’ve shown so far, we’ve stored primitive variables of type long in our data structures. Storing such variables simplifies the program examples, but it’s not representative of how you use data storage structures in the real world.
Usually, the data items (records) you want to store are combinations of many fields. For a personnel record, you would store last name, first name, age, Social Security number, and so forth. For a stamp collection, you would store the name of the country that issued the stamp, its catalog number, condition, current value, and so on.
In our next Java example, we’ll show how objects, rather than variables of primitive types, can be stored.
The Person Class
In Java, a data record is usually represented by a class object. Let’s examine a typical class used for storing personnel data. Here’s the code for the Person class:
class Person
{
private String lastName;
private String firstName;
private int age;
//-----------------------------------------------------------
public Person(String last, String first, int a)
{
// constructor
lastName = last;
firstName = first;
age = a;
}
//-----------------------------------------------------------
public void displayPerson()
{
System.out.print(“
Last name: “ + lastName);
System.out.print(“, First name: “ + firstName);
System.out.println(“, Age: “ + age);
}
//-----------------------------------------------------------
public String getLast()
// get last name
{ return lastName; }
} // end class Person
We show only three variables in this class, for a person’s last name, first name, and age. Of course, records for most applications would contain many additional fields. A constructor enables a new Person object to be created and its fields initialized. The displayPerson() method displays a Person object’s data, and the getLast() method returns the Person’s last name; this is the key field used for searches.
The classDataArray.java Program
The program that makes use of the Person class is similar to the highArray.java program (Listing 2.3) that stored items of type long. Only a few changes are necessary to adapt that program to handle Person objects. Here are the major changes:
• The type of the array a is changed to Person.
• The key field (the last name) is now a String object, so comparisons require the equals() method rather than the == operator. The getLast() method of Person obtains the last name of a Person object, and equals() does the comparison:
if( a[j].getLast().equals(searchName) )
// found item?
• The insert() method creates a new Person object and inserts it in the array, instead of inserting a long value. The main() method has been modified slightly, mostly to handle the increased quantity of output. We still insert 10 items, display them, search for 1 item, delete 3 items, and display them all again. Listing 2.5 shows the complete classDataArray.java program.
LISTING 2.5
The classDataArray.java Program
// classDataArray.java
// data items as class objects
// to run this program: C>java ClassDataApp
////////////////////////////////////////////////////////////////
class Person
{
private String lastName;
private String firstName;
private int age;
//--------------------------------------------------------------
public Person(String last, String first, int a)
{
// constructor
lastName = last;
firstName = first;
age = a;
}
//--------------------------------------------------------------
public void displayPerson()
{
System.out.print(“
Last name: “ + lastName);
System.out.print(“, First name: “ + firstName);
System.out.println(“, Age: “ + age);
}
//--------------------------------------------------------------

public String getLast()
// get last name
{ return lastName; }
} // end class Person
////////////////////////////////////////////////////////////////
class ClassDataArray
{
private Person[] a;
// reference to array
private int nElems;
// number of data items
public ClassDataArray(int max)
// constructor
{
a = new Person[max];
// create the array
nElems = 0;
// no items yet
}
//--------------------------------------------------------------
public Person find(String searchName)
{
// find specified value
int j;
for(j=0; j<nElems; j++)
// for each element,
if( a[j].getLast().equals(searchName) ) // found item?
break;
// exit loop before end
if(j == nElems)
// gone to end?
return null;
// yes, can’t find it
else
return a[j];
// no, found it
} // end find()
//-------------------------------------------------------------
// put person into array
public void insert(String last, String first, int age)
{
a[nElems] = new Person(last, first, age);
nElems++;
// increment size
}
//--------------------------------------------------------------
public boolean delete(String searchName)
{
// delete person from array
int j;
for(j=0; j<nElems; j++)
// look for it
if( a[j].getLast().equals(searchName) )
break;
if(j==nElems)
// can’t find it
return false;
else
// found it
{
for(int k=j; k<nElems; k++)
// shift down
a[k] = a[k+1];
nElems--;
// decrement size
return true;
}
} // end delete()
//--------------------------------------------------------------
public void displayA()
// displays array contents
{
for(int j=0; j<nElems; j++)
// for each element,
a[j].displayPerson();
// display it
}
//--------------------------------------------------------------
} // end class ClassDataArray
////////////////////////////////////////////////////////////////
class ClassDataApp
{
public static void main(String[] args)
{
int maxSize = 100;
// array size
ClassDataArray arr;
// reference to array
arr = new ClassDataArray(maxSize); // create the array
// insert 10 items
arr.insert(“Evans”, “Patty”, 24);
arr.insert(“Smith”, “Lorraine”, 37);
arr.insert(“Yee”, “Tom”, 43);
arr.insert(“Adams”, “Henry”, 63);
arr.insert(“Hashimoto”, “Sato”, 21);
arr.insert(“Stimson”, “Henry”, 29);
arr.insert(“Velasquez”, “Jose”, 72);
arr.insert(“Lamarque”, “Henry”, 54);
arr.insert(“Vang”, “Minh”, 22);
arr.insert(“Creswell”, “Lucinda”, 18);
arr.displayA(); // display items
String searchKey = “Stimson”;
    Person found;
found=arr.find(searchKey);
if(found != null)
{
System.out.print(“Found “);
found.displayPerson();
}
else
System.out.println(“Can’t find “ + searchKey);
System.out.println(“Deleting Smith, Yee, and Creswell”);
arr.delete(“Smith”);
// delete 3 items
arr.delete(“Yee”);
arr.delete(“Creswell”);
arr.displayA();
// display items again
} // end main()
} // end class ClassDataApp
////////////////////////////////////////////////////////////////
Here’s the output of this program:
Last name: Evans, First name: Patty, Age: 24
Last name: Smith, First name: Lorraine, Age: 37
Last name: Yee, First name: Tom, Age: 43
Last name: Adams, First name: Henry, Age: 63
Last name: Hashimoto, First name: Sato, Age: 21
Last name: Stimson, First name: Henry, Age: 29
Last name: Velasquez, First name: Jose, Age: 72
Last name: Lamarque, First name: Henry, Age: 54
Last name: Vang, First name: Minh, Age: 22
Last name: Creswell, First name: Lucinda, Age: 18
Found     Last name: Stimson, First name: Henry, Age: 29
Deleting Smith, Yee, and Creswell
Last name: Evans, First name: Patty, Age: 24
Last name: Adams, First name: Henry, Age: 63
Last name: Hashimoto, First name: Sato, Age: 21
Last name: Stimson, First name: Henry, Age: 29
Last name: Velasquez, First name: Jose, Age: 72
Last name: Lamarque, First name: Henry, Age: 54
Last name: Vang, First name: Minh, Age: 22

 The classDataArray.java program shows that class objects can be handled by data storage structures in much the same way as primitive types. (Note that a serious program using the last name as a key would need to account for duplicate last names, which would complicate the programming as discussed earlier.)
2.8 Big O Notation
Automobiles are divided by size into several categories: subcompacts, compacts, midsize, and so on. These categories provide a quick idea what size car you’re talking about, without needing to mention actual dimensions. Similarly, it’s useful to have a shorthand way to say how efficient a computer algorithm is. In computer science, this rough measure is called “Big O” notation. You might think that in comparing algorithms you would say things like “Algorithm A is twice as fast as algorithm B,” but in fact this sort of statement isn’t too meaningful. Why not? Because the proportion can change radically as the number of items changes. Perhaps you increase the number of items by 50%, and now A is three times as fast as B. Or you have half as many items, and A and B are now equal. What you need is a comparison that tells how an algorithm’s speed is related to the number of items. Let’s see how this looks for the algorithms we’ve seen so far.
Insertion in an Unordered Array: Constant
Insertion into an unordered array is the only algorithm we’ve seen that doesn’t depend on how many items are in the array. The new item is always placed in the next available position, at a[nElems], and nElems is then incremented. Insertion requires the same amount of time no matter how big N—the number of items in the array—is. We can say that the time, T, to insert an item into an unsorted array is a constant K:
T=K
In a real situation, the actual time (in microseconds or whatever) required by the insertion is related to the speed of the microprocessor, how efficiently the compiler has generated the program code, and other factors. The constant K in the preceding equation is used to account for all such factors. To find out what K is in a real situation, you need to measure how long an insertion took. (Software exists for this very purpose.) K would then be equal to that time.
Linear Search: Proportional to N
We’ve seen that, in a linear search of items in an array, the number of comparisons that must be made to find a specified item is, on the average, half of the total number of items. Thus, if N is the total number of items, the search time T is proportional to half of N:
T=K*N/2
As with insertions, discovering the value of K in this equation would require timing a search for some (probably large) value of N and then using the resulting value of T to calculate K. When you know K, you can calculate T for any other value of N. For a handier formula, we could lump the 2 into the K. Our new K is equal to the old K divided by 2. Now we have
T=K*N
This equation says that average linear search times are proportional to the size of the array. If an array is twice as big, searching it will take twice as long.
Binary Search: Proportional to log(N)
Similarly, we can concoct a formula relating T and N for a binary search:
T = K * log2(N)
As we saw earlier, the time is proportional to the base 2 logarithm of N. Actually, because any logarithm is related to any other logarithm by a constant (3.322 to go from base 2 to base 10), we can lump this constant into K as well. Then we don’t need to specify the base:
T = K * log(N)
Don’t Need the Constant
Big O notation looks like the formulas just described, but it dispenses with the constant K. When comparing algorithms, you don’t really care about the particular microprocessor chip or compiler; all you want to compare is how T changes for different values of N, not what the actual numbers are. Therefore, the constant isn’t needed.
Big O notation uses the uppercase letter O, which you can think of as meaning “order of.” In Big O notation, we would say that a linear search takes O(N) time, and a binary search takes O(log N) time. Insertion into an unordered array takes O(1), or constant time. (That’s the numeral 1 in the parentheses.)
Figure 2.9 graphs some Big O relationships between time and number of items. Based on this graph, we might rate the various Big O values (very subjectively) like this:
O(1) is excellent, O(log N) is good, O(N) is fair, and O(N2) is poor. O(N2) occurs in the bubble sort and also in certain graph algorithms that we’ll look at later in this book.



                                                   Graph of Big O times.
The idea in Big O notation isn’t to give actual figures for running times but to convey how the running times are affected by the number of items. This is the most meaningful way to compare algorithms, except perhaps actually measuring running times in a real installation.
2.9 Why Not Use Arrays for Everything?

Arrays seem to get the job done, so why not use them for all data storage? We’ve already seen some of their disadvantages. In an unordered array you can insert items quickly, in O(1) time, but searching takes slow O(N) time. In an ordered array you can search quickly, in O(logN) time, but insertion takes O(N) time. For both kinds of arrays, deletion takes O(N) time because half the items (on the average) must be moved to fill in the hole.
It would be nice if there were data structures that could do everything—insertion, deletion, and searching—quickly, ideally in O(1) time, but if not that, then in O(logN) time. In the chapters ahead, we’ll see how closely this ideal can be approached, and the price that must be paid in complexity.
Another problem with arrays is that their size is fixed when they are first created
with new. Usually, when the program first starts, you don’t know exactly how many items will be placed in the array later, so you guess how big it should be. If your guess is too large, you’ll waste memory by having cells in the array that are never filled. If your guess is too small, you’ll overflow the array, causing at best a message to the program’s user, and at worst a program crash.
Other data structures are more flexible and can expand to hold the number of items inserted in them. The linked list, discussed in Chapter 5, “Linked Lists,” is such a structure.
We should mention that Java includes a class called Vector that acts much like an array but is expandable. This added capability comes at the expense of some loss of efficiency.
You might want to try creating your own vector class. If the class user is about to overflow the internal array in this class, the insertion algorithm creates a new array of larger size, copies the old array contents to the new array, and then inserts the new item. This whole process would be invisible to the class user.

No comments:

Post a Comment