Monday, 2 April 2012

3. Simple Sorting

As soon as you create a significant database, you’ll probably think of reasons to sort it in various ways. You need to arrange names in alphabetical order, students by grade, customers by ZIP code, home sales by price, cities in order
of increasing population, countries by GNP, stars by magnitude, and so on.
Sorting data may also be a preliminary step to searching it.
As we saw in Chapter 2, “Arrays,” a binary search, which can be applied only to sorted data, is much faster than a linear search.
Because sorting is so important and potentially so time-consuming, it has been the subject of extensive research in computer science, and some very sophisticated methods have been developed. In this chapter we’ll look at three of
the simpler algorithms: the bubble sort, the selection sort, and the insertion sort. Each is demonstrated with its own Workshop applet. In Chapter 7, “Advanced Sorting,” we’ll look at more sophisticated approaches: Shellsort and
quicksort.
The techniques described in this chapter, while unsophisticated and comparatively slow, are nevertheless worth examining. Besides being easier to understand, they are actually better in some circumstances than the more
sophisticated algorithms. The insertion sort, for example, is preferable to quicksort for small files and for almost-sorted files. In fact, an insertion sort is commonly used as a part of a quicksort implementation.
The example programs in this chapter build on the array classes we developed in the preceding chapter. The sorting algorithms are implemented as methods of similar array classes.
Be sure to try out the Workshop applets included in this chapter. They are more
effective in explaining how the sorting algorithms work than prose and static
pictures could ever be.
How Would You Do It?
Imagine that your kids-league baseball team (mentioned in Chapter 1, “Overview”) is lined up on the field, as shown in Figure 3.1. The regulation nine players, plus an extra, have shown up for practice. You want to arrange the players in order of increasing height (with the shortest player on the left) for the team picture. How would you go about this sorting process?


As a human being, you have advantages over a computer program. You can see all the kids at once, and you can pick out the tallest kid almost instantly. You don’t need to laboriously measure and compare everyone. Also, the kids don’t need to occupy particular places. They can jostle each other, push each other a little to make room, and stand behind or in front of each other. After some ad hoc rearranging, you would have no trouble in lining up all the kids, as shown in Figure 3.2.


A computer program isn’t able to glance over the data in this way. It can compare only two players at one time because that’s how the comparison operators work. This tunnel vision on the part of algorithms will be a recurring theme. Things may seem simple to us humans, but the algorithm can’t see the big picture and must, therefore, concentrate on the details and follow some simple rules. The three algorithms in this chapter all involve two steps, executed over and over until the data is sorted:
1. Compare two items.
2. Swap two items, or copy one item.
However, each algorithm handles the details in a different way.
3.1 Bubble Sort
The bubble sort is notoriously slow, but it’s conceptually the simplest of the sorting algorithms and for that reason is a good beginning for our exploration of sorting techniques.
Bubble Sort on the Baseball Players
Imagine that you’re near-sighted (like a computer program) so that you can see only two of the baseball players at the same time, if they’re next to each other and if you stand very close to them. Given this impediment, how would you sort them? Let’s assume there are N players, and the positions they’re standing in are numbered from 0 on the left to N-1 on the right.
The bubble sort routine works like this: You start at the left end of the line and
compare the two kids in positions 0 and 1. If the one on the left (in 0) is taller, you swap them. If the one on the right is taller, you don’t do anything. Then you move over one position and compare the kids in positions 1 and 2. Again, if the one on the left is taller, you swap them. This sorting process is shown in Figure 3.3. Here are the rules you’re following:
1. Compare two players.
2. If the one on the left is taller, swap them.
3. Move one position right.





You continue down the line this way until you reach the right end. You have
by no means finished sorting the kids, but you do know that the tallest kid is
on the right. This must be true because, as soon as you encounter the tallest
kid, you’ll end up swapping him (or her) every time you compare two kids,
until eventually he (or she) will reach the right end of the line. This is why it’s
called the bubble sort: As the algorithm progresses, the biggest items “bubble
up” to the top end of the array. Figure 3.4 shows the baseball players at the end
of the first pass.

After this first pass through all the data, you’ve made N-1 comparisons and
somewhere between 0 and N-1 swaps, depending on the initial arrangement of
the players. The item at the end of the array is sorted and won’t be moved
again. Now you go back and start another pass from the left end of the line. Again, you go toward the right, comparing and swapping when appropriate. However, this time you can stop one player short of the end of the line, at position N-2,
because you know the last position, at N-1, already contains the tallest player.
This rule could be stated as:
4. When you reach the first sorted player, start over at the left end of the line.
You continue this process until all the players are in order. Describing this process is much harder than demonstrating it, so let’s watch the BubbleSort Workshop applet at work.
The BubbleSort Workshop Applet
Start the BubbleSort Workshop applet. You’ll see something that looks like a bar
graph, with the bar heights randomly arranged, as shown in Figure 3.5.

The Run Button
This Workshop applet contains a two-speed graph: You can either let it run by itself, or you can single-step through the process. To get a quick idea of what happens, click the Run button. The algorithm will bubble-sort the bars. When it finishes, in 10 seconds or so, the bars will be sorted, as shown in the following figure.

The New Button
To do another sort, press the New button. New creates a new set of bars and initializes the sorting routine. Repeated presses of New toggle between two arrangements of bars: a random order, as shown in Figure 3.5, and an inverse ordering where the bars are sorted backward. This inverse ordering provides an extra challenge for many sorting algorithms.
The Step Button
The real payoff for using the BubbleSort Workshop applet comes when you single step through a sort. You can see exactly how the algorithm carries out each step. Start by creating a new randomly arranged graph with New. You’ll see three arrows pointing at different bars. Two arrows, labeled inner and inner+1, are side by side on the left. Another arrow, outer, starts on the far right. (The names are chosen to correspond to the inner and outer loop variables in the nested loops used in the algorithm.)
Click once on the Step button. You’ll see the inner and the inner+1 arrows move
together one position to the right, swapping the bars if appropriate. These arrows correspond to the two players you compared, and possibly swapped, in the baseball scenario.
A message under the arrows tells you whether the contents of inner and inner+1 will be swapped, but you know this just from comparing the bars: If the taller one is on the left, they’ll be swapped. Messages at the top of the graph tell you how many swaps and comparisons have been carried out so far. (A complete sort of 10 bars requires 45 comparisons and, on the average, about 22 swaps.)
Continue pressing Step. Each time inner and inner+1 finish going all the way from 0 to outer, the outer pointer moves one position to the left. At all times during the sorting process, all the bars to the right of outer are sorted; those to the left of (and at) outer are not.
The Size Button
The Size button toggles between 10 bars and 100 bars. Figure 3.7 shows what the 100 random bars look like.
You probably don’t want to single-step through the sorting process for 100 bars,
unless you’re unusually patient. Press Run instead, and watch how the blue inner
and inner+1 pointers seem to find the tallest unsorted bar and carry it down the row to the right, inserting it just to the left of the previously sorted bars.
Figure 3.8 shows the situation partway through the sorting process. The bars to the right of the red (longest) arrow are sorted. The bars to the left are beginning to look sorted, but much work remains to be done.
                                      The BubbleSort applet with 100 bars.

                                            The 100 partly sorted bars.

If you started a sort with Run and the arrows are whizzing around, you can freeze the process at any point by pressing the Step button. You can then single-step to watch the details of the operation or press Run again to return to high-speed mode.
The Draw Button
Sometimes while running the sorting algorithm at full speed, the computer takes
time off to perform some other task. This can result in some bars not being drawn. If this happens, you can press the Draw button to redraw all the bars. Doing so pauses the run, so you’ll need to press the Run button again to continue. You can press Draw at any time there seems to be a glitch in the display.
Java Code for a Bubble Sort
In the bubbleSort.java program, shown in Listing 3.1, a class called ArrayBub encapsulates an array a[], which holds variables of type long.
In a more serious program, the data would probably consist of objects, but we use a primitive type for simplicity. (We’ll see how objects are sorted in the objectSort.java program in Listing 3.4.) Also, to reduce the size of the listing, we don’t show find() and delete() methods with the ArrayBub class, although they would normally be part of such a class.

// bubbleSort.java
// demonstrates bubble sort
// to run this program: C>java BubbleSortApp
////////////////////////////////////////////////////////////////
class ArrayBub
   {
   private long[] a;                 // ref to array a
   private int nElems;               // number of data items
//--------------------------------------------------------------
   public ArrayBub(int max)          // constructor
      {
      a = new long[max];                 // create the array
      nElems = 0;                        // no items yet
      }
//--------------------------------------------------------------
   public void insert(long value)    // put element into array
      {
      a[nElems] = value;             // insert it
      nElems++;                      // increment size
      }
//--------------------------------------------------------------
   public void display()             // displays array contents
      {
      for(int j=0; j<nElems; j++)    // for each element,
         System.out.print(a[j] + " ");   // display it
      System.out.println("");
      }
//--------------------------------------------------------------
   public void bubbleSort()
      {
      int out, in;

      for(out=nElems-1; out>0; out--)     // outer loop (backward)
         for(in=0; in<out; in++)          // inner loop (forward)
            if( a[in] > a[in+1] )         // out of order?
               swap(in, in+1);            // swap them
      }  // end bubbleSort()
//--------------------------------------------------------------
   private void swap(int one, int two)
      {
      long temp = a[one];
      a[one] = a[two];
      a[two] = temp;
      }
//--------------------------------------------------------------
   }  // end class ArrayBub
////////////////////////////////////////////////////////////////
class BubbleSortApp
   {
   public static void main(String[] args)
      {
      int maxSize = 100;            // array size
      ArrayBub arr;                 // reference to array
      arr = new ArrayBub(maxSize);  // create the array

      arr.insert(77);               // insert 10 items
      arr.insert(99);
      arr.insert(44);
      arr.insert(55);
      arr.insert(22);
      arr.insert(88);
      arr.insert(11);
      arr.insert(00);
      arr.insert(66);
      arr.insert(33);

      arr.display();                // display items

      arr.bubbleSort();             // bubble sort them

      arr.display();                // display them again
      }  // end main()
   }  // end class BubbleSortApp
////////////////////////////////////////////////////////////////

The constructor and the insert() and display() methods of this class are similar to
those we’ve seen before. However, there’s a new method: bubbleSort(). When this method is invoked from main(), the contents of the array are rearranged into sorted order.
The main() routine inserts 10 items into the array in random order, displays the
array, calls bubbleSort() to sort it, and then displays it again. Here’s the output:
77 99 44 55 22 88 11 0 66 33
0 11 22 33 44 55 66 77 88 99
The bubbleSort() method is only four lines long. Here it is, extracted from the listing:
public void bubbleSort()
   {
   int out, in;

   for(out=nElems-1; out>0; out--)     // outer loop (backward)
      for(in=0; in<out; in++)          // inner loop (forward)
         if( a[in] > a[in+1] )         // out of order?
            swap(in, in+1);            // swap them
   }  // end bubbleSort()
The idea is to put the smallest item at the beginning of the array (index 0) and the largest item at the end (index nElems-1). The loop counter out in the outer for loop starts at the end of the array, at nElems-1, and decrements itself each time through the loop. The items at indices greater than out are always completely sorted. The out variable moves left after each pass by in so that items that are already sorted are no longer involved in the algorithm.
The inner loop counter in starts at the beginning of the array and increments itself each cycle of the inner loop, exiting when it reaches out. Within the inner loop, the two array cells pointed to by in and in+1 are compared, and swapped if the one in in is larger than the one in in+1.
For clarity, we use a separate swap() method to carry out the swap. It simply
exchanges the two values in the two array cells, using a temporary variable to hold the value of the first cell while the first cell takes on the value in the second and then setting the second cell to the temporary value. Actually, using a separate swap() method may not be a good idea in practice because the function call adds a small amount of overhead. If you’re writing your own sorting routine, you may prefer to put the swap instructions in line to gain a slight increase in speed.
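For instance, an in-line version might look like the sketch below. The bubbleSortInline() method is our own illustration rather than part of Listing 3.1, though it assumes the same a and nElems fields of the ArrayBub class.

// A hedged sketch: the bubble sort loop with the swap written in line
// rather than calling swap(). A drop-in variant for the ArrayBub class
// of Listing 3.1 (assumes its a and nElems fields).
public void bubbleSortInline()
   {
   for(int out=nElems-1; out>0; out--)     // outer loop (backward)
      for(int in=0; in<out; in++)          // inner loop (forward)
         if( a[in] > a[in+1] )             // out of order?
            {                              // swap them in line
            long temp = a[in];
            a[in] = a[in+1];
            a[in+1] = temp;
            }
   }  // end bubbleSortInline()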
Invariants
In many algorithms there are conditions that remain unchanged as the algorithm
proceeds. These conditions are called invariants. Recognizing invariants can be useful in understanding the algorithm. In certain situations they may also be helpful in debugging; you can repeatedly check that the invariant is true, and signal an error if it isn’t.
In the bubbleSort.java program, the invariant is that the data items to the right of out are sorted. This remains true throughout the running of the algorithm. (On the first pass, nothing has been sorted yet, and there are no items to the right of out because it starts on the rightmost element.)
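As a concrete illustration, here is a sketch of such a check for the bubble sort invariant. The checkInvariant() method is our own addition, not part of Listing 3.1; it assumes the a and nElems fields of the ArrayBub class and could be called at the end of each pass of the outer loop.

// A sketch of an invariant check for the bubble sort: every item to
// the right of index out must already be in sorted order. (Assumes the
// a and nElems fields of the ArrayBub class in Listing 3.1.)
private void checkInvariant(int out)
   {
   for(int j=out; j<nElems-1; j++)
      if( a[j] > a[j+1] )                  // right-hand portion unsorted?
         throw new IllegalStateException("invariant violated at index " + j);
   }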
Efficiency of the Bubble Sort
As you can see by watching the BubbleSort Workshop applet with 10 bars, the inner and inner+1 arrows make nine comparisons on the first pass, eight on the second, and so on, down to one comparison on the last pass. For 10 items, this is
9 + 8 + 7 + 6 + 5 + 4 + 3 + 2 + 1 = 45
In general, where N is the number of items in the array, there are N-1 comparisons on the first pass, N-2 on the second, and so on. The formula for the sum of such a series is
(N–1) + (N–2) + (N–3) + ... + 1 = N*(N–1)/2
N*(N–1)/2 is 45 (10*9/2) when N is 10.
Thus, the algorithm makes about N²/2 comparisons (ignoring the –1, which doesn’t make much difference, especially if N is large).
There are fewer swaps than there are comparisons because two bars are swapped only
if they need to be. If the data is random, a swap is necessary about half the time, so there will be about N²/4 swaps. (Although in the worst case, with the initial data inversely sorted, a swap is necessary with every comparison.)
Both swaps and comparisons are proportional to N². Because constants don’t count in Big O notation, we can ignore the 2 and the 4 and say that the bubble sort runs in O(N²) time. This is slow, as you can verify by running the BubbleSort Workshop applet with 100 bars.
Whenever you see one loop nested within another, such as those in the bubble sort and the other sorting algorithms in this chapter, you can suspect that an algorithm runs in O(N²) time. The outer loop executes N times, and the inner loop executes N (or perhaps N divided by some constant) times for each cycle of the outer loop. This means you’re doing something approximately N*N, or N², times.
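If you want to verify these counts, the following stand-alone sketch (a class of our own, BubbleCount, not one of this chapter’s listings) instruments a bubble sort on random data. For 10 items it should report 45 comparisons and, on average, somewhere around 22 swaps; the exact swap count varies from run to run.

// A hedged sketch that counts comparisons and swaps made by a bubble
// sort on random data. For N items it makes N*(N-1)/2 comparisons and,
// on average, about half that many swaps.
class BubbleCount
   {
   public static void main(String[] args)
      {
      int n = 10;
      long[] a = new long[n];
      java.util.Random rand = new java.util.Random();
      for(int j=0; j<n; j++)
         a[j] = rand.nextInt(100);          // fill with random values

      long comparisons = 0, swaps = 0;
      for(int out=n-1; out>0; out--)        // bubble sort, instrumented
         for(int in=0; in<out; in++)
            {
            comparisons++;
            if( a[in] > a[in+1] )
               {
               long temp = a[in];           // swap a[in] and a[in+1]
               a[in] = a[in+1];
               a[in+1] = temp;
               swaps++;
               }
            }
      System.out.println(comparisons + " comparisons, " + swaps + " swaps");
      }
   }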

3.2 Selection Sort
The selection sort improves on the bubble sort by reducing the number of swaps
necessary from O(N²) to O(N). Unfortunately, the number of comparisons remains O(N²). However, the selection sort can still offer a significant improvement for large records that must be physically moved around in memory, causing the swap time to be much more important than the comparison time. (Typically, this isn’t the case in Java, where references are moved around, not entire objects.)
Selection Sort on the Baseball Players
Let’s consider the baseball players again. In the selection sort, you can no longer
compare only players standing next to each other. Thus, you’ll need to remember a certain player’s height; you can use a notebook to write it down. A magenta-colored towel will also come in handy.
A Brief Description
What’s involved in the selection sort is making a pass through all the players and
picking (or selecting, hence the name of the sort) the shortest one. This shortest
player is then swapped with the player on the left end of the line, at position 0. Now the leftmost player is sorted and won’t need to be moved again. Notice that in this algorithm the sorted players accumulate on the left (lower indices), whereas in the bubble sort they accumulated on the right.
The next time you pass down the row of players, you start at position 1, and, finding the minimum, swap with position 1. This process continues until all the players are sorted.
A More Detailed Description
In more detail, start at the left end of the line of players. Record the leftmost player’s height in your notebook and throw the magenta towel on the ground in front of this person. Then compare the height of the next player to the right with the height in your notebook. If this player is shorter, cross out the height of the first player and record the second player’s height instead. Also move the towel, placing it in front of this new “shortest” (for the time being) player. Continue down the row, comparing each player with the minimum. Change the minimum value in your notebook and move the towel whenever you find a shorter player. When you’re done, the magenta towel will be in front of the shortest player.
Swap this shortest player with the player on the left end of the line. You’ve now
sorted one player. You’ve made N-1 comparisons, but only one swap.
On the next pass, you do exactly the same thing, except that you can completely
ignore the player on the left because this player has already been sorted. Thus, the algorithm starts the second pass at position 1, instead of 0. With each succeeding pass, one more player is sorted and placed on the left, and one less player needs to be considered when finding the new minimum. Figure 3.9 shows how this sort looks for the first three passes.
The SelectSort Workshop Applet
To see how the selection sort looks in action, try out the SelectSort Workshop applet. The buttons operate the same way as those in the BubbleSort applet. Use New to create a new array of 10 randomly arranged bars. The red arrow called outer starts on the left; it points to the leftmost unsorted bar. Gradually, it will move right as more bars are added to the sorted group on its left.
The magenta min arrow also starts out pointing to the leftmost bar; it will move to record the shortest bar found so far. (The magenta min arrow corresponds to the towel in the baseball analogy.) The blue inner arrow marks the bar currently being compared with the minimum.
As you repeatedly press Step, inner moves from left to right, examining each bar in turn and comparing it with the bar pointed to by min. If the inner bar is shorter, min jumps over to this new, shorter bar. When inner reaches the right end of the graph, min points to the shortest of the unsorted bars. This bar is then swapped with outer, the leftmost unsorted bar.
Figure 3.10 shows the situation midway through a sort. The bars to the left of outer are sorted, and inner has scanned from outer to the right end, looking for the shortest bar. The min arrow has recorded the position of this bar, which will be swapped with outer.
Use the Size button to switch to 100 bars, and sort a random arrangement. You’ll see how the magenta min arrow hangs out with a prospective minimum value for a while and then jumps to a new one when the blue inner arrow finds a smaller candidate. The red outer arrow moves slowly but inexorably to the right, as the sorted bars accumulate to its left.
                                              Selection sort on baseball players.

                                            The SelectSort Workshop applet.


Java Code for Selection Sort
The listing for the selectSort.java program is similar to that for bubbleSort.java,
except that the container class is called ArraySel instead of ArrayBub, and the
bubbleSort() method has been replaced by selectionSort(). Here’s how this method
looks:
public void selectionSort()
   {
   int out, in, min;

   for(out=0; out<nElems-1; out++)    // outer loop
      {
      min = out;                      // minimum
      for(in=out+1; in<nElems; in++)  // inner loop
         if(a[in] < a[min] )          // if min greater,
            min = in;                 //    we have a new min
      swap(out, min);                 // swap them
      }  // end for(out)
   }  // end selectionSort()
The outer loop, with loop variable out, starts at the beginning of the array (index 0) and proceeds toward higher indices. The inner loop, with loop variable in, begins at out and likewise proceeds to the right.
At each new position of in, the elements a[in] and a[min] are compared. If a[in] is
smaller, then min is given the value of in. At the end of the inner loop, min points to the minimum value, and the array elements pointed to by out and min are swapped.
Listing 3.2 shows the complete selectSort.java program.
// selectSort.java
// demonstrates selection sort
// to run this program: C>java SelectSortApp
////////////////////////////////////////////////////////////////
class ArraySel
   {
   private long[] a;                 // ref to array a
   private int nElems;               // number of data items
//--------------------------------------------------------------
   public ArraySel(int max)          // constructor
      {
      a = new long[max];                 // create the array
      nElems = 0;                        // no items yet
      }
//--------------------------------------------------------------
   public void insert(long value)    // put element into array
      {
      a[nElems] = value;             // insert it
      nElems++;                      // increment size
      }
//--------------------------------------------------------------
   public void display()             // displays array contents
      {
      for(int j=0; j<nElems; j++)    // for each element,
         System.out.print(a[j] + " ");   // display it
      System.out.println("");
      }
//--------------------------------------------------------------
   public void selectionSort()
      {
      int out, in, min;

      for(out=0; out<nElems-1; out++)    // outer loop
         {
         min = out;                      // minimum
         for(in=out+1; in<nElems; in++)  // inner loop
            if(a[in] < a[min] )          // if min greater,
               min = in;                 //    we have a new min
         swap(out, min);                 // swap them
         }  // end for(out)
      }  // end selectionSort()
//--------------------------------------------------------------
   private void swap(int one, int two)
      {
      long temp = a[one];
      a[one] = a[two];
      a[two] = temp;
      }
//--------------------------------------------------------------
   }  // end class ArraySel
////////////////////////////////////////////////////////////////
class SelectSortApp
   {
   public static void main(String[] args)
      {
      int maxSize = 100;            // array size
      ArraySel arr;                 // reference to array
      arr = new ArraySel(maxSize);  // create the array

      arr.insert(77);               // insert 10 items
      arr.insert(99);
      arr.insert(44);
      arr.insert(55);
      arr.insert(22);
      arr.insert(88);
      arr.insert(11);
      arr.insert(00);
      arr.insert(66);
      arr.insert(33);

      arr.display();                // display items

      arr.selectionSort();          // selection-sort them

      arr.display();                // display them again
      }  // end main()
   }  // end class SelectSortApp
////////////////////////////////////////////////////////////////
The output from selectSort.java is identical to that from bubbleSort.java:
77 99 44 55 22 88 11 0 66 33
0 11 22 33 44 55 66 77 88 99
Invariant
In the selectSort.java program, the data items with indices less than or equal to out are always sorted.
Efficiency of the Selection Sort
The selection sort performs the same number of comparisons as the bubble sort:
N*(N-1)/2. For 10 data items, this is 45 comparisons. However, 10 items require fewer than 10 swaps. With 100 items, 4,950 comparisons are required, but fewer than 100 swaps. For large values of N, the comparison times will dominate, so we would have to say that the selection sort runs in O(N²) time, just as the bubble sort did. However, it is unquestionably faster because there are so few swaps. For smaller values of N, the selection sort may in fact be considerably faster, especially if the swap times are much larger than the comparison times.
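To make the savings in swaps concrete, here is a sketch (a class of our own, SwapCount, not part of Listing 3.2) that counts the swaps performed by a bubble sort and a selection sort on copies of the same random data. With 100 items the selection sort needs at most 99 swaps, while the bubble sort typically needs a few thousand.

// A hedged sketch that counts swaps made by bubble sort and selection
// sort on copies of the same random data. Selection sort performs at
// most N-1 swaps, while bubble sort averages about N*N/4.
class SwapCount
   {
   public static void main(String[] args)
      {
      int n = 100;
      long[] data = new long[n];
      java.util.Random rand = new java.util.Random();
      for(int j=0; j<n; j++)
         data[j] = rand.nextInt(1000);

      long[] a = data.clone();                  // bubble sort copy
      long bubbleSwaps = 0;
      for(int out=n-1; out>0; out--)
         for(int in=0; in<out; in++)
            if(a[in] > a[in+1])
               { long t=a[in]; a[in]=a[in+1]; a[in+1]=t; bubbleSwaps++; }

      long[] b = data.clone();                  // selection sort copy
      long selectSwaps = 0;
      for(int out=0; out<n-1; out++)
         {
         int min = out;
         for(int in=out+1; in<n; in++)
            if(b[in] < b[min])
               min = in;
         long t=b[out]; b[out]=b[min]; b[min]=t;
         selectSwaps++;
         }

      System.out.println("bubble sort swaps:    " + bubbleSwaps);
      System.out.println("selection sort swaps: " + selectSwaps);
      }
   }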
3.3 Insertion Sort
In most cases the insertion sort is the best of the elementary sorts described in this chapter. It still executes in O(N²) time, but it’s about twice as fast as the bubble sort and somewhat faster than the selection sort in normal situations. It’s also not too complex, although it’s slightly more involved than the bubble and selection sorts. It’s often used as the final stage of more sophisticated sorts, such as quicksort.
Insertion Sort on the Baseball Players
To begin the insertion sort, start with your baseball players lined up in random order. (They wanted to play a game, but clearly there’s no time for that.) It’s easier to think about the insertion sort if we begin in the middle of the process, when the team is half sorted.
Partial Sorting
At this point there’s an imaginary marker somewhere in the middle of the line.
(Maybe you threw a red T-shirt on the ground in front of a player.) The players to
the left of this marker are partially sorted. This means that they are sorted among themselves; each one is taller than the person to his or her left. However, the players aren’t necessarily in their final positions because they may still need to be moved when previously unsorted players are inserted between them.
Note that partial sorting did not take place in the bubble sort and selection sort. In these algorithms a group of data items was completely sorted at any given time; in the insertion sort a group of items is only partially sorted.
The Marked Player
The player where the marker is, whom we’ll call the “marked” player, and all the
players on her right, are as yet unsorted. This is shown in Figure 3.11.a.


                                 The insertion sort on baseball players.


What we’re going to do is insert the marked player in the appropriate place in the (partially) sorted group. However, to do this, we’ll need to shift some of the sorted players to the right to make room. To provide a space for this shift, we take the marked player out of line. (In the program this data item is stored in a temporary variable.) This step is shown in Figure 3.11.b.
Now we shift the sorted players to make room. The tallest sorted player moves into the marked player’s spot, the next-tallest player into the tallest player’s spot, and so on.
When does this shifting process stop? Imagine that you and the marked player are walking down the line to the left. At each position you shift another player to the right, but you also compare the marked player with the player about to be shifted. The shifting process stops when you’ve shifted the last player that’s taller than the marked player. The last shift opens up the space where the marked player, when inserted, will be in sorted order. This step is shown in Figure 3.11.c.
Now the partially sorted group is one player bigger, and the unsorted group is one player smaller. The marker T-shirt is moved one space to the right, so it’s again in front of the leftmost unsorted player. This process is repeated until all the unsorted players have been inserted (hence the name insertion sort) into the appropriate place in the partially sorted group.
The InsertSort Workshop Applet
Use the InsertSort Workshop applet to demonstrate the insertion sort. Unlike the
other sorting applets, it’s probably more instructive to begin with 100 random bars rather than 10.
Sorting 100 Bars
Change to 100 bars with the Size button, and click Run to watch the bars sort themselves before your very eyes. You’ll see that the short red outer arrow marks the dividing line between the partially sorted bars to the left and the unsorted bars to the right. The blue inner arrow keeps starting from outer and zipping to the left, looking for the proper place to insert the marked bar. Figure 3.12 shows how this process looks when about half the bars are partially sorted.
The marked bar is stored in the temporary variable pointed to by the magenta arrow at the right end of the graph, but the contents of this variable are replaced so often that it’s hard to see what’s there (unless you slow down to single-step mode). 
Sorting 10 Bars
To get down to the details, use Size to switch to 10 bars. (If necessary, use New to make sure they’re in random order.)
                           The InsertSort Workshop applet with 100 bars.


At the beginning, inner and outer point to the second bar from the left (array index 1), and the first message is Will copy outer to temp. This will make room for the shift. (There’s no arrow for inner-1, but of course it’s always one bar to the left of inner.)
Click the Step button. The bar at outer will be copied to temp. We say that items are copied from a source to a destination. When performing a copy, the applet removes the bar from the source location, leaving a blank. This is slightly misleading because in a real Java program the reference in the source would remain there. However, blanking the source makes it easier to see what’s happening.
What happens next depends on whether the first two bars are already in order
(smaller on the left). If they are, you’ll see the message Have compared inner-1 and temp, no copy necessary.
If the first two bars are not in order, the message is Have compared inner-1 and temp, will copy inner-1 to inner. This is the shift that’s necessary to make room for the value in temp to be reinserted. There’s only one such shift on this first pass; more shifts will be necessary on subsequent passes. The situation is shown in Figure 3.13.
On the next click, you’ll see the copy take place from inner-1 to inner. Also, the
inner arrow moves one space left. The new message is Now inner is 0, so no copy necessary. The shifting process is complete.
No matter which of the first two bars was shorter, the next click will show you Will copy temp to inner. This will happen, but if the first two bars were initially in order, you won’t be able to tell a copy was performed because temp and inner hold the same bar. Copying data over the top of the same data may seem inefficient, but the algorithm runs faster if it doesn’t check for this possibility, which happens comparatively infrequently.

Now the first two bars are partially sorted (sorted with respect to each other), and the outer arrow moves one space right, to the third bar (index 2). The process repeats, with the Will copy outer to temp message. On this pass through the sorted data, there may be no shifts, one shift, or two shifts, depending on where the third bar fits among the first two.
Continue to single-step the sorting process. Again, you can see what’s happening
more easily after the process has run long enough to provide some sorted bars on the left. Then you can see how just enough shifts take place to make room for the reinsertion of the bar from temp into its proper place.
Java Code for Insertion Sort
Here’s the method that carries out the insertion sort, extracted from the
insertSort.java program:
public void insertionSort()
   {
   int in, out;

   for(out=1; out<nElems; out++)     // out is dividing line
      {
      long temp = a[out];            // remove marked item
      in = out;                      // start shifts at out
      while(in>0 && a[in-1] >= temp) // until one is smaller,
         {
         a[in] = a[in-1];            // shift item right,
         --in;                       // go left one position
         }
      a[in] = temp;                  // insert marked item
      }  // end for
   }  // end insertionSort()
In the outer for loop, out starts at 1 and moves right. It marks the leftmost unsorted data. In the inner while loop, in starts at out and moves left, until either temp is smaller than the array element there, or it can’t go left any further. Each pass through the while loop shifts another sorted element one space right.
It may be hard to see the relation between the steps in the InsertSort Workshop
applet and the code, so Figure 3.14 is an activity diagram of the insertionSort()
method, with the corresponding messages from the InsertSort Workshop applet.
Listing 3.3 shows the complete insertSort.java program.

// insertSort.java
// demonstrates insertion sort
// to run this program: C>java InsertSortApp
////////////////////////////////////////////////////////////////
class ArrayIns
   {
   private long[] a;                 // ref to array a
   private int nElems;               // number of data items
//--------------------------------------------------------------
   public ArrayIns(int max)          // constructor
      {
      a = new long[max];                 // create the array
      nElems = 0;                        // no items yet
      }
//--------------------------------------------------------------
   public void insert(long value)    // put element into array
      {
      a[nElems] = value;             // insert it
      nElems++;                      // increment size
      }
//--------------------------------------------------------------
   public void display()             // displays array contents
      {
      for(int j=0; j<nElems; j++)    // for each element,
         System.out.print(a[j] + " ");   // display it
      System.out.println("");
      }
//--------------------------------------------------------------
   public void insertionSort()
      {
      int in, out;

      for(out=1; out<nElems; out++)     // out is dividing line
         {
         long temp = a[out];            // remove marked item
         in = out;                      // start shifts at out
         while(in>0 && a[in-1] >= temp) // until one is smaller,
            {
            a[in] = a[in-1];            // shift item right,
            --in;                       // go left one position
            }
         a[in] = temp;                  // insert marked item
         }  // end for
      }  // end insertionSort()
//--------------------------------------------------------------
   }  // end class ArrayIns
////////////////////////////////////////////////////////////////
class InsertSortApp
   {
   public static void main(String[] args)
      {
      int maxSize = 100;            // array size
      ArrayIns arr;                 // reference to array
      arr = new ArrayIns(maxSize);  // create the array

      arr.insert(77);               // insert 10 items
      arr.insert(99);
      arr.insert(44);
      arr.insert(55);
      arr.insert(22);
      arr.insert(88);
      arr.insert(11);
      arr.insert(00);
      arr.insert(66);
      arr.insert(33);

      arr.display();                // display items

      arr.insertionSort();          // insertion-sort them

      arr.display();                // display them again
      }  // end main()
   }  // end class InsertSortApp
////////////////////////////////////////////////////////////////
Here’s the output from the insertSort.java program; it’s the same as that from the other programs in this chapter:
77 99 44 55 22 88 11 0 66 33
0 11 22 33 44 55 66 77 88 99
Invariants in the Insertion Sort
At the end of each pass, following the insertion of the item from temp, the data items with smaller indices than outer are partially sorted.
Efficiency of the Insertion Sort
How many comparisons and copies does this algorithm require? On the first pass, it compares a maximum of one item. On the second pass, it’s a maximum of two items, and so on, up to a maximum of N-1 comparisons on the last pass. This is
1 + 2 + 3 + ... + N-1 = N*(N-1)/2
However, because on each pass an average of only half of the maximum number of items are actually compared before the insertion point is found, we can divide by 2, which gives
N*(N-1)/4
The number of copies is approximately the same as the number of comparisons.
However, a copy isn’t as time-consuming as a swap, so for random data this algorithm runs twice as fast as the bubble sort and faster than the selection sort.
In any case, like the other sort routines in this chapter, the insertion sort runs in
O(N²) time for random data.
For data that is already sorted or almost sorted, the insertion sort does much better. When data is in order, the condition in the while loop is never true, so it becomes a simple statement in the outer loop, which executes N-1 times. In this case the algorithm runs in O(N) time. If the data is almost sorted, insertion sort runs in almost O(N) time, which makes it a simple and efficient way to order a file that is only slightly out of order.
However, for data arranged in inverse sorted order, every possible comparison and shift is carried out, so the insertion sort runs no faster than the bubble sort. You can check this using the reverse-sorted data option (toggled with New) in the InsertSort Workshop applet.
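You can confirm this behavior by counting comparisons for the three orderings, as in the sketch below. The InsertCount class and its countComparisons() helper are our own additions, not part of Listing 3.3; for 10 items, sorted input should cost about 9 comparisons, reverse-sorted input 45, and random input something in between.

// A sketch that counts while-loop comparisons made by insertion sort
// on already-sorted, random, and reverse-sorted data. Sorted input
// should cost about N-1 comparisons; reverse-sorted about N*(N-1)/2.
class InsertCount
   {
   static long countComparisons(long[] a)
      {
      long comparisons = 0;
      for(int out=1; out<a.length; out++)
         {
         long temp = a[out];
         int in = out;
         while(in > 0)
            {
            comparisons++;
            if(a[in-1] < temp)          // smaller item found; stop shifting
               break;
            a[in] = a[in-1];            // shift item right
            --in;
            }
         a[in] = temp;                  // insert marked item
         }
      return comparisons;
      }

   public static void main(String[] args)
      {
      int n = 10;
      long[] sorted = new long[n], reversed = new long[n], random = new long[n];
      java.util.Random rand = new java.util.Random();
      for(int j=0; j<n; j++)
         {
         sorted[j] = j;                 // ascending order
         reversed[j] = n - j;           // descending order
         random[j] = rand.nextInt(100); // random order
         }
      System.out.println("sorted:   " + countComparisons(sorted));
      System.out.println("random:   " + countComparisons(random));
      System.out.println("reversed: " + countComparisons(reversed));
      }
   }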
3.4 Sorting Objects
For simplicity we’ve applied the sorting algorithms we’ve looked at thus far to a
primitive data type: long. However, sorting routines will more likely be applied to
objects than primitive types. Accordingly, we show a Java program in Listing 3.4,
objectSort.java, that sorts an array of Person objects (last seen in the
classDataArray.java program in Chapter 2).
Java Code for Sorting Objects
The algorithm used in our Java program is the insertion sort from the preceding
section. The Person objects are sorted on lastName; this is the key field. The
objectSort.java program is shown in Listing 3.4.
// objectSort.java
// demonstrates sorting objects (uses insertion sort)
// to run this program: C>java ObjectSortApp
////////////////////////////////////////////////////////////////
class Person
   {
   private String lastName;
   private String firstName;
   private int age;
//-----------------------------------------------------------
   public Person(String last, String first, int a)
      {                               // constructor
      lastName = last;
      firstName = first;
      age = a;
      }
//-----------------------------------------------------------
   public void displayPerson()
      {
      System.out.print("   Last name: " + lastName);
      System.out.print(", First name: " + firstName);
      System.out.println(", Age: " + age);
      }
//-----------------------------------------------------------
   public String getLast()            // get last name
      { return lastName; }
   }  // end class Person
////////////////////////////////////////////////////////////////
class ArrayInOb
   {
   private Person[] a;                // ref to array a
   private int nElems;                // number of data items
//--------------------------------------------------------------
   public ArrayInOb(int max)          // constructor
      {
      a = new Person[max];                // create the array
      nElems = 0;                         // no items yet
      }
//--------------------------------------------------------------
                                      // put person into array
   public void insert(String last, String first, int age)
      {
      a[nElems] = new Person(last, first, age);
      nElems++;                       // increment size
      }
//--------------------------------------------------------------
   public void display()              // displays array contents
      {
      for(int j=0; j<nElems; j++)     // for each element,
         a[j].displayPerson();        //    display it
      System.out.println("");
      }
//--------------------------------------------------------------
   public void insertionSort()
      {
      int in, out;

      for(out=1; out<nElems; out++)   // out is dividing line
         {
         Person temp = a[out];        // remove marked person
         in = out;                    // start shifting at out

         while(in>0 &&                // until smaller one found,
               a[in-1].getLast().compareTo(temp.getLast())>0)
            {
            a[in] = a[in-1];          // shift item to the right
            --in;                     // go left one position
            }
         a[in] = temp;                // insert marked item
         }  // end for
      }  // end insertionSort()
//--------------------------------------------------------------
   }  // end class ArrayInOb
////////////////////////////////////////////////////////////////
class ObjectSortApp
   {
   public static void main(String[] args)
      {
      int maxSize = 100;             // array size
      ArrayInOb arr;                 // reference to array
      arr = new ArrayInOb(maxSize);  // create the array

      arr.insert("Evans", "Patty", 24);
      arr.insert("Smith", "Doc", 59);
      arr.insert("Smith", "Lorraine", 37);
      arr.insert("Smith", "Paul", 37);
      arr.insert("Yee", "Tom", 43);
      arr.insert("Hashimoto", "Sato", 21);
      arr.insert("Stimson", "Henry", 29);
      arr.insert("Velasquez", "Jose", 72);
      arr.insert("Vang", "Minh", 22);
      arr.insert("Creswell", "Lucinda", 18);

      System.out.println("Before sorting:");
      arr.display();                 // display items

      arr.insertionSort();           // insertion-sort them

      System.out.println("After sorting:");
      arr.display();                 // display them again
      }  // end main()
   }  // end class ObjectSortApp
////////////////////////////////////////////////////////////////
Here’s the output of this program:
Before sorting:
Last name: Evans, First name: Patty, Age: 24
Last name: Smith, First name: Doc, Age: 59
Last name: Smith, First name: Lorraine, Age: 37
Last name: Smith, First name: Paul, Age: 37
Last name: Yee, First name: Tom, Age: 43
Last name: Hashimoto, First name: Sato, Age: 21
Last name: Stimson, First name: Henry, Age: 29
Last name: Velasquez, First name: Jose, Age: 72
Last name: Vang, First name: Minh, Age: 22
Last name: Creswell, First name: Lucinda, Age: 18
After sorting:
Last name: Creswell, First name: Lucinda, Age: 18
Last name: Evans, First name: Patty, Age: 24
Last name: Hashimoto, First name: Sato, Age: 21
Last name: Smith, First name: Doc, Age: 59
Last name: Smith, First name: Lorraine, Age: 37
Last name: Smith, First name: Paul, Age: 37
Last name: Stimson, First name: Henry, Age: 29
Last name: Vang, First name: Minh, Age: 22
Last name: Velasquez, First name: Jose, Age: 72
Last name: Yee, First name: Tom, Age: 43
Lexicographical Comparisons
The insertionSort() method in objectSort.java is similar to the one in insertSort.java, but it has been adapted to compare the lastName key values of records rather than the value of a primitive type.
We use the compareTo() method of the String class to perform the comparisons in the insertionSort() method. Here’s the expression that uses it:
a[in-1].getLast().compareTo(temp.getLast()) > 0
The compareTo() method returns different integer values depending on the lexicographical (that is, alphabetical) ordering of the String for which it’s invoked and the String passed to it as an argument, as shown in Table 3.1.
TABLE 3.1  Operation of the compareTo() Method

Comparison        Return Value of s1.compareTo(s2)
s1 < s2           less than 0
s1 equals s2      0
s1 > s2           greater than 0
For example, if s1 is “cat” and s2 is “dog”, the function will return a number less
than 0. In the objectSort.java program, this method is used to compare the last
name of a[in-1] with the last name of temp.
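As a quick illustration (not part of Listing 3.4), the following sketch prints a few compareTo() results so you can see the sign convention at work:

// A small sketch showing lexicographical comparison with compareTo().
class CompareToDemo
   {
   public static void main(String[] args)
      {
      System.out.println("cat".compareTo("dog"));       // negative: "cat" < "dog"
      System.out.println("dog".compareTo("cat"));       // positive: "dog" > "cat"
      System.out.println("cat".compareTo("cat"));       // zero: equal strings
      System.out.println("Smith".compareTo("Stimson")); // negative: 'm' < 't'
      }
   }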
Stability
Sometimes it matters what happens to data items that have equal keys. For example, you may have employee data arranged alphabetically by last names. (That is, the last names were used as key values in the sort.) Now you want to sort the data by ZIP code, but you want all the items with the same ZIP code to continue to be sorted by last names. You want the algorithm to sort only what needs to be sorted, and leave everything else in its original order. Some sorting algorithms retain this secondary ordering; they’re said to be stable.
All the algorithms in this chapter are stable. For example, notice the output of the objectSort.java program (Listing 3.4). Three persons have the last name of Smith. Initially, the order is Doc Smith, Lorraine Smith, and Paul Smith. After the sort, this ordering is preserved, despite the fact that the various Smith objects have been moved to new locations.
3.5 Comparing the Simple Sorts
There’s probably no point in using the bubble sort, unless you don’t have your algorithm book handy. The bubble sort is so simple that you can write it from memory.
Even so, it’s practical only if the amount of data is small. (For a discussion of what “small” means, see Chapter 15, “When to Use What.”)
The selection sort minimizes the number of swaps, but the number of comparisons is still high. This sort might be useful when the amount of data is small and swapping data items is very time-consuming compared with comparing them. The insertion sort is the most versatile of the three and is the best bet in most situations, assuming the amount of data is small or the data is almost sorted. For larger amounts of data, quicksort is generally considered the fastest approach; we’ll examine quicksort in Chapter 7.
We’ve compared the sorting algorithms in terms of speed. Another consideration for any algorithm is how much memory space it needs. All three of the algorithms in this chapter carry out their sort in place, meaning that, besides the initial array, very little extra memory is required. All the sorts require an extra variable to store an item temporarily while it’s being swapped.
You can recompile the example programs, such as bubbleSort.java, to sort larger amounts of data. By timing them for larger sorts, you can get an idea of the differences between them and the time required to sort different amounts of data on your particular system.
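A minimal timing harness might look like the sketch below. The SortTimer class is our own; it assumes the ArrayBub class from Listing 3.1 is compiled alongside it, and the figure it reports is only rough because it includes JVM startup and warm-up effects. Swap in ArraySel or ArrayIns to compare the other sorts.

// A hedged sketch for timing a sort. It assumes ArrayBub from
// Listing 3.1 is available on the classpath.
class SortTimer
   {
   public static void main(String[] args)
      {
      int n = 10000;
      ArrayBub arr = new ArrayBub(n);
      java.util.Random rand = new java.util.Random();
      for(int j=0; j<n; j++)
         arr.insert(rand.nextInt(1000000));   // fill with random items

      long start = System.nanoTime();
      arr.bubbleSort();
      long elapsed = System.nanoTime() - start;
      System.out.println("Sorted " + n + " items in " +
                         elapsed / 1000000 + " ms");
      }
   }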

2. Simple Array

The array is the most commonly used data storage structure; it’s built into most programming languages. Because arrays are so well known, they offer a convenient jumping off place for introducing data structures and for seeing
how object-oriented programming and data structures relate to one another. In this chapter we’ll introduce arrays in Java and demonstrate a home-made array class. We’ll also examine a special kind of array, the ordered array, in which the data is stored in ascending (or descending) key order. This arrangement makes possible a fast way of searching for a data item: the binary search.
We’ll start the chapter with a Java Workshop applet that shows insertion, searching, and deletion in an array. Then we’ll show some sample Java code that carries out these same operations.
Later we’ll examine ordered arrays, again starting with a Workshop applet. This applet will demonstrate a binary search. At the end of the chapter we’ll talk about Big O notation, the most widely used measure of algorithm efficiency. 

2.1 The Array Workshop Applet
Suppose you’re coaching kids-league baseball, and you want to keep track of which players are present at the practice field. What you need is an attendance-monitoring program for your laptop—a program that maintains a database of the players who have shown up for practice. You can use a simple data structure to hold this data. There are several actions you would like to be able to perform:
• Insert a player into the data structure when the player arrives at the field.
• Check to see whether a particular player is present, by searching for the player’s number in the structure.
• Delete a player from the data structure when that player goes home.
These three operations—insertion, searching, and deletion—will be the fundamental ones in most of the data storage structures we’ll study in this book.
We’ll often begin the discussion of a particular data structure by demonstrating it with a Workshop applet. This approach will give you a feeling for what the structure and its algorithms do, before we launch into a detailed explanation and demonstrate sample code. The Workshop applet called Array shows how an array can be used to implement insertion, searching, and deletion.
Now start up the Array Workshop applet, as described in Appendix A, “Running the Workshop Applets and Example Programs,” with
C:\>appletviewer Array.html
The following figure shows the resulting array with 20 elements, 10 of which have data items in them. You can think of these items as representing your baseball players. Imagine that each player has been issued a team shirt with the player’s number on the back.
To make things visually interesting, the shirts come in a variety of colors. You can see each player’s number and shirt color in the array.


                                         The Array Workshop applet. 


This applet demonstrates the three fundamental procedures mentioned earlier:
• The Ins button inserts a new data item.
• The Find button searches for a specified data item.
• The Del button deletes a specified data item.
Using the New button, you can create a new array of a size you specify. You can fill this array with as many data items as you want using the Fill button. Fill creates a set of items and randomly assigns them numbers and colors. The numbers are in the range 0 to 999. You can’t create an array of more than 60 cells, and you can’t, of course, fill more data items than there are array cells.
Also, when you create a new array, you’ll need to decide whether duplicate items will be allowed; we’ll return to this question in a moment. The default value is no duplicates, so the No Dups radio button is initially selected to indicate this setting.
Insertion
Start with the default arrangement of 20 cells and 10 data items, and the No Dups button selected. You insert a baseball player’s number into the array when the player arrives at the practice field, having been dropped off by a parent. To insert a new item, press the Ins button once. You’ll be prompted to enter the value of the item:
Enter key of item to insert
Type a number, say 678, into the text field in the upper-right corner of the applet. (Yes, it is hard to get three digits on the back of a kid’s shirt.) Press Ins again and the applet will confirm your choice:
Will insert item with key 678
A final press of the button will cause a data item, consisting of this value and a random color, to appear in the first empty cell in the array. The prompt will say something like
Inserted item with key 678 at index 10
Each button press in a Workshop applet corresponds to a step that an algorithm carries out. The more steps required, the longer the algorithm takes. In the Array Workshop applet the insertion process is very fast, requiring only a single step. This is true because a new item is always inserted in the first vacant cell in the array, and the algorithm knows this location because it knows how many items are already in the array. The new item is simply inserted in the next available space. Searching and deletion, however, are not so fast.
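In code, that single-step insertion might look like the sketch below. The InsertDemo class, its arr array, and its nElems counter are stand-ins of our own for the applet’s internal storage, which isn’t shown in this chapter.

// A sketch of one-step insertion: the new item goes into the first
// vacant cell, whose index equals the current item count.
class InsertDemo
   {
   public static void main(String[] args)
      {
      long[] arr = new long[20];        // 20 cells, like the applet
      int nElems = 0;                   // number of occupied cells

      long[] players = {77, 99, 44, 55, 22, 88, 11, 0, 66, 33, 678};
      for(long key : players)
         {
         arr[nElems] = key;             // one step: drop into next free cell
         nElems++;
         }
      System.out.println("Inserted " + nElems + " items; last key " +
                         arr[nElems-1] + " at index " + (nElems-1));
      }
   }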
In no-duplicates mode you’re on your honor not to insert an item with the same key as an existing item. If you do, the applet displays an error message, but it won’t prevent the insertion. The assumption is that you won’t make this mistake.
Searching
To begin a search, click the Find button. You’ll be prompted for the key number of the person you’re looking for. Pick a number that appears on an item somewhere in the middle of the array. Type in the number and repeatedly press the Find button. At each button press, one step in the algorithm is carried out. You’ll see the red arrow start at cell 0 and move methodically down the cells, examining a new one each time you press the button. The index number in the message Checking next cell, index = 2
will change as you go along. When you reach the specified item, you’ll see the message Have found item with key 505 or whatever key value you typed in. Assuming duplicates are not allowed, the search will terminate as soon as an item with the specified key value is found.
If you have selected a key number that is not in the array, the applet will examine every occupied cell in the array before telling you that it can’t find that item. Notice that (again assuming duplicates are not allowed) the search algorithm must look through an average of half the data items to find a specified item. Items close to the beginning of the array will be found sooner, and those toward the end will be found later. If N is the number of items, the average number of steps needed to find an item is N/2. In the worst-case scenario, the specified item is in the last occupied cell, and N steps will be required to find it.
As we noted, the time an algorithm takes to execute is proportional to the number of steps, so searching takes much longer on the average (N/2 steps) than insertion (one step).
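A linear search like the applet’s can be sketched as follows; again, the FindDemo class and its find() helper are stand-ins of our own, not code from this chapter.

// A sketch of linear search: step through occupied cells until the
// key is found or every occupied cell has been examined.
class FindDemo
   {
   static int find(long[] arr, int nElems, long searchKey)
      {
      for(int j=0; j<nElems; j++)       // examine cells 0..nElems-1
         if(arr[j] == searchKey)
            return j;                   // found it
      return -1;                        // not found
      }

   public static void main(String[] args)
      {
      long[] arr = {77, 99, 44, 55, 22, 88, 11, 0, 66, 33};
      int index = find(arr, arr.length, 22);
      System.out.println(index >= 0 ? "Found at index " + index
                                    : "Can't find item");
      }
   }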
Deletion
To delete an item, you must first find it. After you type in the number of the item to be deleted, repeated button presses will cause the arrow to move, step by step, down the array until the item is located. The next button press deletes the item, and the cell becomes empty. (Strictly speaking, this step isn’t necessary because we’re going to copy over this cell anyway, but deleting the item makes it clearer what’s happening.)
Implicit in the deletion algorithm is the assumption that holes are not allowed in the array. A hole is one or more empty cells that have filled cells above them (at higher index numbers). If holes are allowed, all the algorithms become more complicated because they must check to see whether a cell is empty before examining its contents. Also, the algorithms become less efficient because they must waste time looking at unoccupied cells. For these reasons, occupied cells must be arranged contiguously: no holes allowed.
Therefore, after locating the specified item and deleting it, the applet must shift the contents of each subsequent cell down one space to fill in the hole. Figure 2.2 shows an example. 


                                                  Deleting an item
If the item in cell 5 (38, in Figure 2.2) is deleted, the item in 6 shifts into 5, the item in 7 shifts into 6, and so on to the last occupied cell. During the deletion process, when the item is located, the applet shifts down the contents of the higher-indexed cells as you continue to press the Del button.
A deletion requires (assuming no duplicates are allowed) searching through an average of N/2 elements and then moving the remaining elements (an average of N/2 moves) to fill up the resulting hole. This is N steps in all.
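Here is a sketch of that delete-and-shift operation. The DeleteDemo class and its delete() helper are our own stand-ins, and the sample values are arbitrary except that, as in Figure 2.2, the key 38 sits in cell 5.

// A sketch of deletion with shifting: find the key, then move every
// later item one cell down so no hole is left.
class DeleteDemo
   {
   static int delete(long[] arr, int nElems, long key)
      {
      int j;
      for(j=0; j<nElems; j++)           // search for the item (about N/2 steps)
         if(arr[j] == key)
            break;
      if(j == nElems)
         return nElems;                 // not found; size unchanged

      for(int k=j; k<nElems-1; k++)     // shift higher cells down (about N/2 moves)
         arr[k] = arr[k+1];
      return nElems - 1;                // one fewer item
      }

   public static void main(String[] args)
      {
      long[] arr = {84, 61, 15, 73, 26, 38, 11, 49, 53, 32};
      int nElems = delete(arr, arr.length, 38);
      for(int j=0; j<nElems; j++)
         System.out.print(arr[j] + " ");
      System.out.println();
      }
   }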
The Duplicates Issue
When you design a data storage structure, you need to decide whether items with duplicate keys will be allowed. If you’re working with a personnel file and the key is an employee number, duplicates don’t make much sense; there’s no point in assigning the same number to two employees. On the other hand, if the key value is last names, then there’s a distinct possibility several employees will have the same key value, so duplicates should be allowed. 
 Of course, for the baseball players, duplicate numbers should not be allowed. Keeping track of the players would be hard if more than one wore the same number.
The Array Workshop applet lets you select either option. When you use New to create a new array, you’re prompted to specify both its size and whether duplicates
are permitted. Use the radio buttons Dups OK or No Dups to make this selection. If you’re writing a data storage program in which duplicates are not allowed, you may need to guard against human error during an insertion by checking all the data items in the array to ensure that none of them already has the same key value as the item being inserted. This check is inefficient, however, and increases the number of steps required for an insertion from one to N. For this reason, our applet does not perform this check.
Searching with Duplicates
Allowing duplicates complicates the search algorithm, as we noted. Even if it finds a match, it must continue looking for possible additional matches until the last occupied cell. At least, this is one approach; you could also stop after the first match. How you proceed depends on whether the question is “Find me everyone with blue eyes” or “Find me someone with blue eyes.” When the Dups OK button is selected, the applet takes the first approach, finding all items matching the search key. This approach always requires N steps because the algorithm must go all the way to the last occupied cell.
Insertion with Duplicates
Insertion is the same with duplicates allowed as when they’re not: A single step inserts the new item. But remember, if duplicates are not allowed, and there’s a possibility the user will attempt to input the same key twice, you may need to check every existing item before doing an insertion.
Deletion with Duplicates
Deletion may be more complicated when duplicates are allowed, depending on exactly how “deletion” is defined. If it means to delete only the first item with a specified value, then, on the average, only N/2 comparisons and N/2 moves are necessary. This is the same as when no duplicates are allowed.
If, however, deletion means to delete every item with a specified key value, the same operation may require multiple deletions. Such an operation will require checking N cells and (probably) moving more than N/2 cells. The average depends on how the duplicates are distributed throughout the array.
The applet assumes this second meaning and deletes multiple items with the same key. This is complicated because each time an item is deleted, subsequent items must be shifted farther. For example, if three items are deleted, then items beyond the last deletion will need to be shifted three spaces. To see how this operation works, set the applet to Dups OK and insert three or four items with the same key. Then try deleting them.
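One way to code a delete-everything operation, shown here only as a sketch (the applet’s own algorithm may differ), is to make a single pass that copies down just the items being kept. It still examines all N cells, and it assumes the same a and nElems fields as the array classes later in this chapter:

public int deleteAll(long value)         // delete every item with this key
   {
   int dest = 0;                         // next position for a kept item
   for(int j=0; j<nElems; j++)           // examine all N cells
      if(a[j] != value)                  // keep this one?
         a[dest++] = a[j];               // shift it down if necessary
   int removed = nElems - dest;          // how many items were deleted
   nElems = dest;                        // new size
   return removed;
   }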
Table 2.1 shows the average number of comparisons and moves for the three operations, first where no duplicates are allowed and then where they are allowed. N is the number of items in the array. Inserting a new item counts as one move.

                        No Duplicates                      Duplicates OK
Search                  N/2 comparisons                    N comparisons
Insertion               No comparisons, one move           No comparisons, one move
Deletion                N/2 comparisons, N/2 moves         N comparisons, more than N/2 moves

You can explore these possibilities with the Array Workshop applet. The difference between N and N/2 is not usually considered very significant, except when you’re fine-tuning a program. Of more importance, as we’ll discuss toward the end of this chapter, is whether an operation takes one step, N steps, log(N) steps, or N² steps.
Not Too Swift
One of the significant things to notice when you’re using the Array applet is the slow and methodical nature of the algorithms. With the exception of insertion, the algorithms involve stepping through some or all of the cells in the array. Different data structures offer much faster (but more complex) algorithms. We’ll see one, the binary search on an ordered array, later in this chapter, and others throughout this book.
2.2 The Basics of Arrays in Java
The preceding section showed graphically the primary algorithms used for arrays. Now we’ll see how to write programs to carry out these algorithms, but we first want to cover a few of the fundamentals of arrays in Java.
If you’re a Java expert, you can skip ahead to the next section, but even C and C++ programmers should stick around. Arrays in Java use syntax similar to that in C and C++ (and not that different from other languages), but there are nevertheless some unique aspects to the Java approach.
Creating an Array
As we noted in Chapter 1, “Overview,” there are two kinds of data in Java: primitive types (such as int and double) and objects. In many programming languages (even object-oriented ones such as C++), arrays are primitive types, but in Java they’re treated as objects. Accordingly, you must use the new operator to create an array:
int[] intArray;                // defines a reference to an array
intArray = new int[100];       // creates the array, and
                               // sets intArray to refer to it
Or you can use the equivalent single-statement approach:
int[] intArray = new int[100];
The [] operator is the sign to the compiler that we’re naming an array object and not an ordinary variable. You can also use an alternative syntax for this operator, placing it after the name instead of the type:
int intArray[] = new int[100]; // alternative syntax
However, placing the [] after the int makes it clear that the [] is part of the type, not the name. Because an array is an object, its name—intArray in the preceding code—is a reference to an array; it’s not the array itself. The array is stored at an address elsewhere in memory, and intArray holds only this address.
Arrays have a length field, which you can use to find the size (the number of elements) of an array:
int arrayLength = intArray.length; // find array size
As in most programming languages, you can’t change the size of an array after it’s been created.
Accessing Array Elements
Array elements are accessed using an index number in square brackets. This is similar to how other languages work:
temp = intArray[3]; // get contents of fourth element of array
intArray[7] = 66; // insert 66 into the eighth cell
Remember that in Java, as in C and C++, the first element is numbered 0, so that the indices in an array of 10 elements run from 0 to 9.
If you use an index that’s less than 0 or greater than the size of the array less 1, you’ll get an ArrayIndexOutOfBoundsException runtime error.
Initialization
Unless you specify otherwise, an array of integers is automatically initialized to 0 when it’s created. Unlike C++, this is true even of arrays defined within a method (function). Say you create an array of objects like this:
autoData[] carArray = new autoData[4000];
Until the array elements are given explicit values, they contain the special null object. If you attempt to access an array element that contains null, you’ll get a NullPointerException runtime error. The moral is to make sure you assign something to an element before attempting to access it. You can initialize an array of a primitive type to something besides 0 using this syntax:
int[] intArray = { 0, 3, 6, 9, 12, 15, 18, 21, 24, 27 };
Perhaps surprisingly, this single statement takes the place of both the reference declaration and the use of new to create the array. The numbers within the curly brackets are called the initialization list. The size of the array is determined by the number of values in this list.
An Array Example
Let’s look at some example programs that show how an array can be used. We’ll start with an old-fashioned procedural version and then show the equivalent object-oriented approach. Listing 2.1 shows the old-fashioned version, called array.java.
class ArrayApp
{
public static void main(String[] args)
{
long[] arr;  // reference to array
arr = new long[100];  // make array
int nElems = 0;  // number of items
int j;  // loop counter
long searchKey;  // key of item to search for
arr[0] = 77;  // insert 10 items
arr[1] = 99;
arr[2] = 44;
arr[3] = 55;
arr[4] = 22;
arr[5] = 88;
arr[6] = 11;
arr[7] = 00;
arr[8] = 66;
arr[9] = 33;
nElems = 10;  // now 10 items in array
//--------------------------------------------------------------
for(j=0; j<nElems; j++)  // display items
System.out.print(arr[j] + " ");
System.out.println("");
//--------------------------------------------------------------
searchKey = 66;  // find item with key 66
for(j=0; j<nElems; j++)  // for each element,
if(arr[j] == searchKey)  // found item?
break;   // yes, exit before end
if(j == nElems)  // at the end?
System.out.println("Can't find " + searchKey); // yes
else
System.out.println("Found " + searchKey);  // no
//--------------------------------------------------------------
searchKey = 55;  // delete item with key 55
for(j=0; j<nElems; j++)  // look for it
if(arr[j] == searchKey)
break;
for(int k=j; k<nElems-1; k++)  // move higher ones down
arr[k] = arr[k+1];
nElems--;   // decrement size
//--------------------------------------------------------------
for(j=0; j<nElems; j++)   // display items
System.out.print( arr[j] + " ");
System.out.println("");
} // end main()
} // end class ArrayApp
In this program, we create an array called arr, place 10 data items (kids’ numbers) in it, search for the item with value 66 (the shortstop, Louisa), display all the items, remove the item with value 55 (Freddy, who had a dentist appointment), and then display the remaining 9 items. The output of the program looks like this:
77 99 44 55 22 88 11 0 66 33
Found 66
77 99 44 22 88 11 0 66 33
The data we’re storing in this array is type long. We use long to make it clearer that this is data; type int is used for index values. We’ve chosen a primitive type to simplify the coding. Generally, the items stored in a data structure consist of several fields, so they are represented by objects rather than primitive types. We’ll see such an example toward the end of this chapter.
Insertion
Inserting an item into the array is easy; we use the normal array syntax:
arr[0] = 77;
We also keep track of how many items we’ve inserted into the array with the nElems variable.
Searching
The searchKey variable holds the value we’re looking for. To search for an item, we step through the array, comparing searchKey with each element. If the loop variable j reaches the last occupied cell with no match being found, the value isn’t in the array. Appropriate messages are displayed: Found 66 or Can’t find 66.
Deletion
Deletion begins with a search for the specified item. For simplicity, we assume (perhaps rashly) that the item is present. When we find it, we move all the items with higher index values down one element to fill in the “hole” left by the deleted element, and we decrement nElems. In a real program, we would also take appropriate action if the item to be deleted could not be found.
Display
Displaying all the elements is straightforward: We step through the array, accessing each one with arr[j] and displaying it.
Program Organization
The organization of array.java leaves something to be desired. The program has only one class, ArrayApp, and this class has only one method, main(). array.java is essentially an old-fashioned procedural program. Let’s see if we can make it easier to understand (among other benefits) by making it more object-oriented. We’re going to provide a gradual introduction to an object-oriented approach, using two steps. In the first, we’ll separate the data storage structure (the array) from the rest of the program. The remaining part of the program will become a user of the structure. In the second step, we’ll improve the communication between the storage structure and its user.
2.3 Dividing a Program into Classes
The array.java program in Listing 2.1 essentially consists of one big method. We can reap many benefits by dividing the program into classes. What classes? The data storage structure itself is one candidate, and the part of the program that uses this data structure is another. By dividing the program into these two classes, we can clarify the functionality of the program, making it easier to design and understand (and in real programs to modify and maintain).
In array.java we used an array as a data storage structure, but we treated it simply as a language element. Now we’ll encapsulate the array in a class, called LowArray. We’ll also provide class methods by which objects of other classes (the LowArrayApp class in this case) can access the array. These methods allow communication between LowArray and LowArrayApp.
Our first design of the LowArray class won’t be entirely successful, but it will demonstrate the need for a better approach.
// lowArray.java
// demonstrates array class with low-level interface
// to run this program: C>java LowArrayApp
////////////////////////////////////////////////////////////////
class LowArray
{
private long[] a;  // ref to array a
//--------------------------------------------------------------
public LowArray(int size)  // constructor
{ a = new long[size]; }  // create array
//--------------------------------------------------------------
public void setElem(int index, long value)  // set value
{ a[index] = value; }
//--------------------------------------------------------------
public long getElem(int index)   // get value
{ return a[index]; }
//--------------------------------------------------------------
} // end class LowArray
////////////////////////////////////////////////////////////////
class LowArrayApp
{
public static void main(String[] args)
{
LowArray arr;   // reference
arr = new LowArray(100);   // create LowArray object
int nElems = 0;    // number of items in array
int j;    // loop variable
arr.setElem(0,77);   // insert 10 items
arr.setElem(1,99);
arr.setElem(2,44);
arr.setElem(3, 55);
arr.setElem(4,22);
arr.setElem(5,88);
arr.setElem(6,11);
arr.setElem(7,00);
arr.setElem(8,66);
arr.setElem(9,33);    // now 10 items in array
nElems = 10; 
for(j=0; j<nElems; j++)    // display items
System.out.print(arr.getElem(j) + " ");
System.out.println("");
int searchKey = 26;    // search for data item
for(j=0; j<nElems; j++)   // for each element,
if(arr.getElem(j) == searchKey) // found item?
break;
if(j == nElems)   // no
System.out.println("Can't find " + searchKey);
else   // yes
System.out.println("Found " + searchKey);
for(j=0; j<nElems; j++)   // delete value 55
if(arr.getElem(j) == 55)  // look for it
break;
for(int k=j; k<nElems; k++)  // move higher ones down
arr.setElem(k, arr.getElem(k+1) );
nElems--;   // decrement size
for(j=0; j<nElems; j++)   // display items
System.out.print( arr.getElem(j) + " ");
System.out.println("");
} // end main()
} // end class LowArrayApp

The output from the lowArray.java program is similar to that from array.java, except that we try to find a non-existent key value (26) before deleting the item with the key value 55:
77 99 44 55 22 88 11 0 66 33
Can’t find 26
77 99 44 22 88 11 0 66 33
Classes LowArray and LowArrayApp
In lowArray.java, we essentially wrap the class LowArray around an ordinary Java array. The array is hidden from the outside world inside the class; it’s private, so only LowArray class methods can access it. There are three LowArray methods: setElem() and getElem(), which insert and retrieve an element, respectively; and a constructor, which creates an empty array of a specified size.
Another class, LowArrayApp, creates an object of the LowArray class and uses it to store and manipulate data. Think of LowArray as a tool and LowArrayApp as a user of the tool. We’ve divided the program into two classes with clearly defined roles. This is a valuable first step in making a program object oriented. A class used to store data objects, as is LowArray in the lowArray.java program, is sometimes called a container class. Typically, a container class not only stores the data but also provides methods for accessing the data and perhaps also sorting it and performing other complex actions on it.
2.4 Class Interfaces
We’ve seen how a program can be divided into separate classes. How do these classes interact with each other? Communication between classes and the division of responsibility between them are important aspects of object-oriented programming. This point is especially true when a class may have many different users. Typically, a class can be used over and over by different users (or the same user) for different purposes. For example, someone might use the LowArray  class in some other program to store the serial numbers of his traveler’s checks. The class can handle this task just as well as it can store the numbers of baseball players.
If a class is used by many different programmers, the class should be designed so that it’s easy to use. The way that a class user relates to the class is called the class interface. Because class fields are typically private, when we talk about the interface, we usually mean the class methods—what they do and what their arguments are. By calling these methods, a class user interacts with an object of the class. One of the important advantages conferred by object-oriented programming is that a class interface can be designed to be as convenient and efficient as possible.
Not So Convenient
The interface to the LowArray class in lowArray.java is not particularly convenient. The methods setElem() and getElem() operate on a low conceptual level, performing exactly the same tasks as the [] operator in an ordinary Java array. The class user, represented by the main() method in the LowArrayApp class, ends up having to carry out the same low-level operations it did in the non-class version of an array in the array.java program. The only difference is that it uses setElem() and getElem() instead of the [] operator. It’s not clear that this approach is an improvement.
Also notice that there’s no convenient way to display the contents of the array. Somewhat crudely, the LowArrayApp class simply uses a for loop and the getElem() method for this purpose. We could avoid repeated code by writing a separate method for LowArrayApp that it could call to display the array contents, but is it really the responsibility of the LowArrayApp class to provide this method?
Thus, lowArray.java demonstrates how to divide a program into classes, but it really doesn’t buy us too much in practical terms. Let’s see how to redistribute responsibilities between the classes to obtain more of the advantages of OOP.
Who’s Responsible for What?
In the lowArray.java program, the main()routine in the LowArrayApp class, the user of the data storage structure, must keep track of the indices to the array. For some users of an array, who need random access to array elements and don’t mind keeping track of the index numbers, this arrangement might make sense. For example, sorting an array, as we’ll see in the next chapter, can make efficient use of this direct hands-on approach.
In a typical program, however, the user of the data storage device won’t find access to the array indices to be helpful or relevant.
The highArray.java Example
Our next example program shows an improved interface for the storage structure class, called HighArray. Using this interface, the class user (the HighArrayApp class) no longer needs to think about index numbers. The setElem() and getElem() methods are gone; they’re replaced by insert(), find(), and delete(). These new methods don’t require an index number as an argument because the class takes responsibility for handling index numbers. The user of the class (HighArrayApp) is free to concentrate on the what instead of the how—what’s going to be inserted, deleted, and accessed, instead of exactly how these activities are carried out.
Figure 2.4 shows the HighArray interface, and Listing 2.3 shows the highArray.java program.
// highArray.java
// demonstrates array class with high-level interface
// to run this program: C>java HighArrayApp
////////////////////////////////////////////////////////////////
class HighArray
{
private long[] a; // ref to array a
private int nElems; // number of data items
//-----------------------------------------------------------
public HighArray(int max) // constructor
{
a = new long[max]; // create the array
nElems = 0; // no items yet
}
//-----------------------------------------------------------
public boolean find(long searchKey) // find specified value
{
int j;
for(j=0; j<nElems; j++) // for each element,
if(a[j] == searchKey) // found item?
break;  // exit loop before end
if(j == nElems)  // gone to end?
return false; // yes, can't find it
else
return true; // no, found it
} // end find()
//-----------------------------------------------------------
public void insert(long value) // put element into array
{
a[nElems] = value; // insert it
nElems++; // increment size
}
//-----------------------------------------------------------
public boolean delete(long value)
{
int j;
for(j=0; j<nElems; j++) // look for it
if( value == a[j] )
break;
if(j==nElems) // can’t find it
return false;
else // found it
{
for(int k=j; k<nElems; k++) // move higher ones down
a[k] = a[k+1];
nElems--; // decrement size
return true;
}
} // end delete()
//-----------------------------------------------------------
public void display() // displays array contents
{
for(int j=0; j<nElems; j++) // for each element,
System.out.print(a[j] + " "); // display it
System.out.println("");
}
//-----------------------------------------------------------
} // end class HighArray
////////////////////////////////////////////////////////////////
class HighArrayApp
{
public static void main(String[] args)
{
int maxSize = 100;// array size
HighArray arr; // reference to array
arr = new HighArray(maxSize); // create the array
arr.insert(77); // insert 10 items
arr.insert(99);
arr.insert(44);
arr.insert(55);
arr.insert(22);
arr.insert(88);
arr.insert(11);
arr.insert(00);
arr.insert(66);
arr.insert(33);
arr.display(); // display items
int searchKey = 35; // search for item
if( arr.find(searchKey) )
System.out.println("Found " + searchKey);
else
System.out.println("Can't find " + searchKey);
arr.delete(00);
arr.delete(55);
arr.delete(99); // delete 3 items
arr.display(); // display items again
} // end main()
} // end class HighArrayApp
////////////////////////////////////////////////////////////////

 The HighArray class is now wrapped around the array. In main(), we create an array of this class and carry out almost the same operations as in the lowArray.java program: We insert 10 items, search for an item—one that isn’t there—and display the array contents. Because deleting is so easy, we delete 3 items (0, 55, and 99) instead of 1 and finally display the contents again. Here’s the output:
77 99 44 55 22 88 11 0 66 33
Can’t find 35
77 44 22 88 11 66 33
Notice how short and simple main() is. The details that had to be handled by main() in lowArray.java are now handled by HighArray class methods. In the HighArray class, the find() method looks through the array for the item whose key value was passed to it as an argument. It returns true or false, depending on whether it finds the item.
The insert() method places a new data item in the next available space in the array. A field called nElems keeps track of the number of array cells that are actually filled with data items. The main() method no longer needs to worry about how many items are in the array.
The delete() method searches for the element whose key value was passed to it as an argument and, when it finds that element, shifts all the elements in higher index cells down one cell, thus writing over the deleted value; it then decrements nElems.
We’ve also included a display() method, which displays all the values stored in the array.
The User’s Life Made Easier
In lowArray.java (Listing 2.2), the code in main() to search for an item required eight lines; in highArray.java, it requires only one. The class user, the HighArrayApp class, need not worry about index numbers or any other array details. Amazingly, the class user doesn’t even need to know what kind of data structure the HighArray class is using to store the data. The structure is hidden behind the interface. In fact, in the next section, we’ll see the same interface used with a somewhat different data structure.
Abstraction
The process of separating the how from the what—how an operation is performed inside a class, as opposed to what’s visible to the class user—is called abstraction. Abstraction is an important aspect of software engineering. By abstracting class functionality, we make it easier to design a program because we don’t need to think about implementation details at too early a stage in the design process.
The Ordered Workshop Applet
Imagine an array in which the data items are arranged in order of ascending key values—that is, with the smallest value at index 0, and each cell holding a value larger than the cell below. Such an array is called an ordered array.
When we insert an item into this array, the correct location must be found for the insertion: just above a smaller value and just below a larger one. Then all the larger values must be moved up to make room.
Why would we want to arrange data in order? One advantage is that we can speed up search times dramatically using a binary search. Start the Ordered Workshop applet, using the procedure described in Chapter 1.
You’ll see an array; it’s similar to the one in the Array Workshop applet, but the data is ordered. The following figure shows this applet.



                                        The Ordered Workshop applet.
In the ordered array we’ve chosen not to allow duplicates. As we saw earlier, this decision speeds up searching somewhat but slows down insertion.
Linear Search
Two search algorithms are available for the Ordered Workshop applet: linear and binary. Linear search is the default. Linear searches operate in much the same way as the searches in the unordered array in the Array applet: The red arrow steps along, looking for a match. The difference is that in the ordered array, the search quits if an item with a larger key is found.
Try out a linear search. Make sure the Linear radio button is selected. Then use the Find button to search for a non-existent value that, if it were present, would fit somewhere in the middle of the array. In Figure 2.5, this number might be 400.
You’ll see that the search terminates when the first item larger than 400 is reached; it’s 427 in the figure. The algorithm knows there’s no point looking further. Try out the Ins and Del buttons as well. Use Ins to insert an item with a key value that will go somewhere in the middle of the existing items. You’ll see that insertion requires moving all the items with key values larger than the item being inserted.
Use the Del button to delete an item from the middle of the array. Deletion works much the same as it did in the Array applet, shifting items with higher index numbers down to fill in the hole left by the deletion. In the ordered array, however, the deletion algorithm can quit partway through if it doesn’t find the item, just as the search routine can.
Binary Search
The payoff for using an ordered array comes when we use a binary search. This kind of search is much faster than a linear search, especially for large arrays.
The Guess-a-Number Game
Binary search uses the same approach you did as a kid (if you were smart) to guess a number in the well-known children’s guessing game. In this game, a friend asks you to guess a number she’s thinking of between 1 and 100. When you guess a number, she’ll tell you one of three things: Your guess is larger than the number she’s thinking of, it’s smaller, or you guessed correctly.
To find the number in the fewest guesses, you should always start by guessing 50. If your friend says your guess is too low, you deduce the number is between 51 and 100, so your next guess should be 75 (halfway between 51 and 100). If she says it’s too high, you deduce the number is between 1 and 49, so your next guess should be 25.
Each guess allows you to divide the range of possible values in half. Finally, the range is only one number long, and that’s the answer.
Notice how few guesses are required to find the number. If you used a linear search, guessing first 1, then 2, then 3, and so on, finding the number would take you, on the average, 50 guesses. In a binary search each guess divides the range of possible values in half, so the number of guesses required is far fewer. Table 2.2 shows a game session when the number to be guessed is 33.



The correct number is identified in only seven guesses. This is the maximum. You might get lucky and guess the number before you’ve worked your way all the way down to a range of one. This would happen if the number to be guessed was 50, for example, or 34.
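To see the halving strategy in code, here is a small stand-alone sketch (it is not one of the Workshop applets) that plays the 1-to-100 game against itself for a secret number of 33; the exact guesses depend on how the midpoint is rounded, but the count never exceeds seven:

class GuessGame
   {
   public static void main(String[] args)
      {
      int secret = 33;                    // the number to be guessed
      int low = 1, high = 100;            // current range of possible values
      int guesses = 0;
      while(true)
         {
         int guess = (low + high) / 2;    // guess the middle of the range
         guesses++;
         System.out.println("Guess " + guesses + ": " + guess);
         if(guess == secret)
            break;                        // found it
         else if(guess < secret)
            low = guess + 1;              // secret is in the upper half
         else
            high = guess - 1;             // secret is in the lower half
         }
      System.out.println("Found " + secret + " in " + guesses + " guesses");
      }
   }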
Binary Search in the Ordered Workshop Applet
To perform a binary search with the Ordered Workshop applet, you must use the New button to create a new array. After the first press, you’ll be asked to specify the size of the array (maximum 60) and which kind of searching scheme you want: linear or binary. Choose binary by clicking the Binary radio button. After the array is created, use the Fill button to fill it with data items. When prompted, type the amount (not more than the size of the array). A few more presses fill in all the items.
When the array is filled, pick one of the values in the array and see how you can use the Find button to locate it. After a few preliminary presses, you’ll see the red arrow pointing to the algorithm’s current guess, and you’ll see the range shown by a vertical blue line adjacent to the appropriate cells. Figure 2.6 depicts the situation when the range is the entire array.



                                 Initial range in the binary search.
At each press of the Find button, the range is halved and a new guess is chosen in the middle of the range. The following figure shows the next step in the process.




                                       Range in step 2 of the binary search.
Even with a maximum array size of 60 items, a half-dozen button presses suffice to locate any item.
Try using the binary search with different array sizes. Can you figure out how many steps are necessary before you run the applet? We’ll return to this question in the last section of this chapter.
Notice that the insertion and deletion operations also employ the binary search (when it’s selected). The place where an item should be inserted is found with a binary search, as is an item to be deleted. In this applet, items with duplicate keys are not permitted.
2.5 Java Code for an Ordered Array
Let’s examine some Java code that implements an ordered array. We’ll use the OrdArray class to encapsulate the array and its algorithms. The heart of this class is the find() method, which uses a binary search to locate a specified data item. We’ll examine this method in detail before showing the complete program.
Binary Search with the find() Method
The find() method searches for a specified item by repeatedly dividing in half the range of array elements to be considered. The method looks like this:
public int find(long searchKey)
   {
   int lowerBound = 0;
   int upperBound = nElems-1;
   int curIn;

   while(true)
      {
      curIn = (lowerBound + upperBound ) / 2;
      if(a[curIn]==searchKey)
         return curIn;              // found it
      else if(lowerBound > upperBound)
         return nElems;             // can't find it
      else                          // divide range
         {
         if(a[curIn] < searchKey)
            lowerBound = curIn + 1; // it's in upper half
         else
            upperBound = curIn - 1; // it's in lower half
         } // end else divide range
      } // end while
   } // end find()
The method begins by setting the lowerBound and upperBound variables to the first and last occupied cells in the array. Setting these variables specifies the range where the item we’re looking for, searchKey, may be found. Then, within the while loop, the current index, curIn, is set to the middle of this range.
If we’re lucky, curIn may already be pointing to the desired item, so we first check if this is true. If it is, we’ve found the item, so we return with its index, curIn. Each time through the loop we divide the range in half. Eventually, the range will get so small that it can’t be divided any more. We check for this in the next statement: If lowerBound is greater than upperBound, the range has ceased to exist. (When lowerBound equals upperBound, the range is one and we need one more pass through the loop.) We can’t continue the search without a valid range, but we haven’t found the desired item, so we return nElems, the total number of items. This isn’t a valid index because the last filled cell in the array is nElems-1. The class user interprets this value to mean that the item wasn’t found.
If curIn is not pointing at the desired item, and the range is still big enough, we’re ready to divide the range in half. We compare the value at the current index, a[curIn], which is in the middle of the range, with the value to be found,
searchKey. If searchKey is larger, we know we should look in the upper half of the range. Accordingly, we move lowerBound up to curIn. Actually, we move it one cell beyond curIn because we’ve already checked curIn itself at the beginning of the loop.

If searchKey is smaller than a[curIn], we know we should look in the lower half of the range. So we move upperBound down to one cell below curIn. Figure 2.8 shows how the range is altered in these two situations.



                                    Dividing the range in a binary search.
 
The OrdArray Class
In general, the orderedArray.java program is similar to highArray.java (Listing 2.3). The main difference is that find() uses a binary search, as we’ve seen.
We could have used a binary search to locate the position where a new item will be inserted. This operation involves a variation on the find() routine, but for simplicity we retain the linear search in insert(). The speed penalty may not be important because, as we’ve seen, an average of half the items must be moved anyway when an insertion is performed, so insertion will not be very fast even if we locate the item with a binary search. However, for the last ounce of speed, you could change the initial part of insert() to a binary search (as is done in the Ordered Workshop applet). Similarly, the delete() method could call find() to figure out the location of the item to be deleted.
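As a sketch of that idea (this is not part of Listing 2.4, and it assumes the same a and nElems fields as the OrdArray class), insert() might locate the insertion point with a binary search like this:

public void insert(long value)           // insertion point found by binary search
   {
   int lowerBound = 0;
   int upperBound = nElems - 1;
   while(lowerBound <= upperBound)       // narrow the range to one spot
      {
      int curIn = (lowerBound + upperBound) / 2;
      if(a[curIn] < value)
         lowerBound = curIn + 1;         // insertion point is in upper half
      else
         upperBound = curIn - 1;         // insertion point is in lower half
      }
   for(int k=nElems; k>lowerBound; k--)  // move bigger ones up (still about N/2 moves)
      a[k] = a[k-1];
   a[lowerBound] = value;                // insert it
   nElems++;                             // increment size
   } // end insert()

Finding the spot now takes O(log N) comparisons, but the shifting step keeps the insertion at O(N) overall, as noted above.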
The OrdArray class includes a new size() method, which returns the number of data items currently in the array. This information is helpful for the class user, main(), when it calls find(). If find() returns nElems, which main() can discover with size(), then the search was unsuccessful. Listing 2.4 shows the complete listing for the orderedArray.java program.
// orderedArray.java
// demonstrates ordered array class
// to run this program: C>java OrderedApp
////////////////////////////////////////////////////////////////
class OrdArray
   {
   private long[] a;                 // ref to array a
   private int nElems;               // number of data items
   //-----------------------------------------------------------
   public OrdArray(int max)          // constructor
      {
      a = new long[max];             // create array
      nElems = 0;
      }
   //-----------------------------------------------------------
   public int size()
      { return nElems; }
   //-----------------------------------------------------------
   public int find(long searchKey)
      {
      int lowerBound = 0;
      int upperBound = nElems-1;
      int curIn;

      while(true)
         {
         curIn = (lowerBound + upperBound ) / 2;
         if(a[curIn]==searchKey)
            return curIn;              // found it
         else if(lowerBound > upperBound)
            return nElems;             // can't find it
         else                          // divide range
            {
            if(a[curIn] < searchKey)
               lowerBound = curIn + 1; // it's in upper half
            else
               upperBound = curIn - 1; // it's in lower half
            } // end else divide range
         } // end while
      } // end find()
   //-----------------------------------------------------------
   public void insert(long value)    // put element into array
      {
      int j;
      for(j=0; j<nElems; j++)        // find where it goes
         if(a[j] > value)            // (linear search)
            break;
      for(int k=nElems; k>j; k--)    // move bigger ones up
         a[k] = a[k-1];
      a[j] = value;                  // insert it
      nElems++;                      // increment size
      } // end insert()
   //-----------------------------------------------------------
   public boolean delete(long value)
      {
      int j = find(value);
      if(j==nElems)                  // can't find it
         return false;
      else                           // found it
         {
         for(int k=j; k<nElems; k++) // move bigger ones down
            a[k] = a[k+1];
         nElems--;                   // decrement size
         return true;
         }
      } // end delete()
   //-----------------------------------------------------------
   public void display()             // displays array contents
      {
      for(int j=0; j<nElems; j++)    // for each element,
         System.out.print(a[j] + " "); // display it
      System.out.println("");
      }
   //-----------------------------------------------------------
   } // end class OrdArray
////////////////////////////////////////////////////////////////
class OrderedApp
   {
   public static void main(String[] args)
      {
      int maxSize = 100;             // array size
      OrdArray arr;                  // reference to array
      arr = new OrdArray(maxSize);   // create the array

      arr.insert(77);                // insert 10 items
      arr.insert(99);
      arr.insert(44);
      arr.insert(55);
      arr.insert(22);
      arr.insert(88);
      arr.insert(11);
      arr.insert(00);
      arr.insert(66);
      arr.insert(33);

      int searchKey = 55;            // search for item
      if( arr.find(searchKey) != arr.size() )
         System.out.println("Found " + searchKey);
      else
         System.out.println("Can't find " + searchKey);

      arr.display();                 // display items

      arr.delete(00);                // delete 3 items
      arr.delete(55);
      arr.delete(99);

      arr.display();                 // display items again
      } // end main()
   } // end class OrderedApp
////////////////////////////////////////////////////////////////
Advantages of Ordered Arrays
What have we gained by using an ordered array? The major advantage is that search times are much faster than in an unordered array. The disadvantage is that insertion takes longer because all the data items with a higher key value must be moved up to make room. Deletions are slow in both ordered and unordered arrays because items must be moved down to fill the hole left by the deleted item.
Ordered arrays are therefore useful in situations in which searches are frequent, but insertions and deletions are not. An ordered array might be appropriate for a database of company employees, for example. Hiring new employees and laying off existing ones would probably be infrequent occurrences compared with accessing an existing employee’s record for information, or updating it to reflect changes in salary, address, and so on.
A retail store inventory, on the other hand, would not be a good candidate for an ordered array because the frequent insertions and deletions, as items arrived in the store and were sold, would run slowly.
2.6 Logarithms
In this section we’ll explain how logarithms are used to calculate the number of steps necessary in a binary search. If you’re a math major, you can probably skip this section. If math makes you break out in a rash, you can also skip it, except for taking a long hard look at Table 2.3.
We’ve seen that a binary search provides a significant speed increase over a linear search. In the number-guessing game, with a range from 1 to 100, a maximum of seven guesses is needed to identify any number using a binary search; just as in an array of 100 records, seven comparisons are needed to find a record with a specified key value. How about other ranges? Table 2.3 shows some representative ranges and the number of comparisons needed for a binary search.

Range                   Comparisons Needed
10                                      4
100                                    7
1,000                                 10
10,000                               14
100,000                             17
1,000,000                          20
10,000,000                        24
100,000,000                      27
1,000,000,000                   30


Notice the differences between binary search times and linear search times. For very small numbers of items, the difference isn’t dramatic. Searching 10 items would take an average of five comparisons with a linear search (N/2) and a maximum of four comparisons with a binary search. But the more items there are, the bigger the difference. With 100 items, there are 50 comparisons in a linear search, but only 7 in a binary search. For 1,000 items, the numbers are 500 versus 10, and for 1,000,000 items, they’re 500,000 versus 20. We can conclude that for all but very small arrays, the binary search is greatly superior.
The Equation
You can verify the results of Table 2.3 by repeatedly dividing a range (from the first column) in half until it’s too small to divide further. The number of divisions this process requires is the number of comparisons shown in the second column. Repeatedly dividing the range by two is an algorithmic approach to finding the number of comparisons. You might wonder if you could also find the number using a simple equation. Of course, there is such an equation, and it’s worth exploring here because it pops up from time to time in the study of data structures. This formula involves logarithms. (Don’t panic yet.)
The numbers in Table 2.3 leave out some interesting data. They don’t answer such questions as, What is the exact size of the maximum range that can be searched in five steps? To solve this problem, we must create a similar table, but one that starts at the beginning, with a range of one, and works up from there by multiplying the range by two each time. Table 2.4 shows how this looks for the first seven steps. For our original problem with a range of 100, we can see that 6 steps don’t produce a range quite big enough (64), while 7 steps cover it handily (128). Thus, the 7 steps that are shown for 100 items in Table 2.3 are correct, as are the 10 steps for a range of 1000.
Doubling the range each time creates a series that’s the same as raising two to a power, as shown in the third column of Table 2.4. We can express this power as a formula. If s represents steps (the number of times you multiply by two—that is, the power to which two is raised) and r represents the range, then the equation is
 r = 2^s
If you know s, the number of steps, this tells you r, the range. For example, if s is 6, the range is 2^6, or 64.
The Opposite of Raising Two to a Power
Our original question was the opposite of the one just described: Given the range, we want to know how many comparisons are required to complete a search. That is, given r, we want an equation that gives us s.
The inverse of raising something to a power is called a logarithm. Here’s the formula we want, expressed with a logarithm:
s = log2(r)
This equation says that the number of steps (comparisons) is equal to the logarithm to the base 2 of the range. What’s a logarithm? The base 2 logarithm of a number r is the number of times you must multiply two by itself to get r. In Table 2.4, we show that the numbers in the first column, s, are equal to log2(r). How do you find the logarithm of a number without doing a lot of dividing? Pocket calculators and most computer languages have a log function. It is usually log to the base 10, but you can convert easily to base 2 by multiplying by 3.322. For example, log10(100) = 2, so log2(100) = 2 times 3.322, or 6.644. Rounded up to the whole number 7, this is what appears in the column to the right of 100 in Table 2.3.
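If you want to let Java do the arithmetic, a tiny throwaway program (not one of the chapter’s listings) can reproduce the numbers in Table 2.3; Math.log() returns the natural logarithm, so dividing by Math.log(2) gives the base 2 logarithm, which is the same idea as the 3.322 conversion factor:

class LogDemo
   {
   public static void main(String[] args)
      {
      long[] ranges = {10, 100, 1000, 10000, 100000, 1000000};
      for(long r : ranges)
         {
         double log2 = Math.log(r) / Math.log(2);    // log base 2 of the range
         System.out.println("range " + r + " needs "
                            + (int)Math.ceil(log2) + " comparisons");
         }
      }
   }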
In any case, the point here isn’t to calculate logarithms. It’s more important to understand the relationship between a number and its logarithm. Look again at Table 2.3, which compares the number of items and the number of steps needed to find a particular item. Every time you multiply the number of items (the range) by a factor of 10, you add only three or four steps (actually 3.322, before rounding off to whole numbers) to the number needed to find a particular element. This is true because, as a number grows larger, its logarithm doesn’t grow nearly as fast. We’ll compare this logarithmic growth rate with that of other mathematical functions when we talk about Big O notation later in this chapter.
2.7 Storing Objects
In the Java examples we’ve shown so far, we’ve stored primitive variables of type long in our data structures. Storing such variables simplifies the program examples, but it’s not representative of how you use data storage structures in the real world.
Usually, the data items (records) you want to store are combinations of many fields. For a personnel record, you would store last name, first name, age, Social Security number, and so forth. For a stamp collection, you would store the name of the country that issued the stamp, its catalog number, condition, current value, and so on.
In our next Java example, we’ll show how objects, rather than variables of primitive types, can be stored.
The Person Class
In Java, a data record is usually represented by a class object. Let’s examine a typical class used for storing personnel data. Here’s the code for the Person class:
class Person
   {
   private String lastName;
   private String firstName;
   private int age;
   //-----------------------------------------------------------
   public Person(String last, String first, int a)   // constructor
      {
      lastName = last;
      firstName = first;
      age = a;
      }
   //-----------------------------------------------------------
   public void displayPerson()
      {
      System.out.print("   Last name: " + lastName);
      System.out.print(", First name: " + firstName);
      System.out.println(", Age: " + age);
      }
   //-----------------------------------------------------------
   public String getLast()                            // get last name
      { return lastName; }
   } // end class Person
We show only three variables in this class, for a person’s last name, first name, and age. Of course, records for most applications would contain many additional fields. A constructor enables a new Person object to be created and its fields initialized. The displayPerson() method displays a Person object’s data, and the getLast() method returns the Person’s last name; this is the key field used for searches.
The classDataArray.java Program
The program that makes use of the Person class is similar to the highArray.java program (Listing 2.3) that stored items of type long. Only a few changes are necessary to adapt that program to handle Person objects. Here are the major changes:
• The type of the array a is changed to Person.
• The key field (the last name) is now a String object, so comparisons require the equals() method rather than the == operator. The getLast() method of Person obtains the last name of a Person object, and equals() does the comparison:
if( a[j].getLast().equals(searchName) )   // found item?
• The insert() method creates a new Person object and inserts it in the array, instead of inserting a long value.
The main() method has been modified slightly, mostly to handle the increased quantity of output. We still insert 10 items, display them, search for 1 item, delete 3 items, and display them all again. Listing 2.5 shows the complete classDataArray.java program.
LISTING 2.5
The classDataArray.java Program
// classDataArray.java
// data items as class objects
// to run this program: C>java ClassDataApp
////////////////////////////////////////////////////////////////
class Person
   {
   private String lastName;
   private String firstName;
   private int age;
   //--------------------------------------------------------------
   public Person(String last, String first, int a)   // constructor
      {
      lastName = last;
      firstName = first;
      age = a;
      }
   //--------------------------------------------------------------
   public void displayPerson()
      {
      System.out.print("   Last name: " + lastName);
      System.out.print(", First name: " + firstName);
      System.out.println(", Age: " + age);
      }
   //--------------------------------------------------------------
   public String getLast()                            // get last name
      { return lastName; }
   } // end class Person
////////////////////////////////////////////////////////////////
class ClassDataArray
   {
   private Person[] a;                    // reference to array
   private int nElems;                    // number of data items

   public ClassDataArray(int max)         // constructor
      {
      a = new Person[max];                // create the array
      nElems = 0;                         // no items yet
      }
   //--------------------------------------------------------------
   public Person find(String searchName)  // find specified value
      {
      int j;
      for(j=0; j<nElems; j++)             // for each element,
         if( a[j].getLast().equals(searchName) )  // found item?
            break;                        // exit loop before end
      if(j == nElems)                     // gone to end?
         return null;                     // yes, can't find it
      else
         return a[j];                     // no, found it
      } // end find()
   //--------------------------------------------------------------
   public void insert(String last, String first, int age)  // put person into array
      {
      a[nElems] = new Person(last, first, age);
      nElems++;                           // increment size
      }
   //--------------------------------------------------------------
   public boolean delete(String searchName)  // delete person from array
      {
      int j;
      for(j=0; j<nElems; j++)             // look for it
         if( a[j].getLast().equals(searchName) )
            break;
      if(j==nElems)                       // can't find it
         return false;
      else                                // found it
         {
         for(int k=j; k<nElems; k++)      // shift down
            a[k] = a[k+1];
         nElems--;                        // decrement size
         return true;
         }
      } // end delete()
   //--------------------------------------------------------------
   public void displayA()                 // displays array contents
      {
      for(int j=0; j<nElems; j++)         // for each element,
         a[j].displayPerson();            // display it
      }
   //--------------------------------------------------------------
   } // end class ClassDataArray
////////////////////////////////////////////////////////////////
class ClassDataApp
   {
   public static void main(String[] args)
      {
      int maxSize = 100;                  // array size
      ClassDataArray arr;                 // reference to array
      arr = new ClassDataArray(maxSize);  // create the array
                                          // insert 10 items
      arr.insert("Evans", "Patty", 24);
      arr.insert("Smith", "Lorraine", 37);
      arr.insert("Yee", "Tom", 43);
      arr.insert("Adams", "Henry", 63);
      arr.insert("Hashimoto", "Sato", 21);
      arr.insert("Stimson", "Henry", 29);
      arr.insert("Velasquez", "Jose", 72);
      arr.insert("Lamarque", "Henry", 54);
      arr.insert("Vang", "Minh", 22);
      arr.insert("Creswell", "Lucinda", 18);

      arr.displayA();                     // display items

      String searchKey = "Stimson";       // search for item
      Person found;
      found = arr.find(searchKey);
      if(found != null)
         {
         System.out.print("Found ");
         found.displayPerson();
         }
      else
         System.out.println("Can't find " + searchKey);

      System.out.println("Deleting Smith, Yee, and Creswell");
      arr.delete("Smith");                // delete 3 items
      arr.delete("Yee");
      arr.delete("Creswell");

      arr.displayA();                     // display items again
      } // end main()
   } // end class ClassDataApp
////////////////////////////////////////////////////////////////
Here’s the output of this program:
Last name: Evans, First name: Patty, Age: 24
Last name: Smith, First name: Lorraine, Age: 37
Last name: Yee, First name: Tom, Age: 43
Last name: Adams, First name: Henry, Age: 63
Last name: Hashimoto, First name: Sato, Age: 21
Last name: Stimson, First name: Henry, Age: 29
Last name: Velasquez, First name: Jose, Age: 72
Last name: Lamarque, First name: Henry, Age: 54
Last name: Vang, First name: Minh, Age: 22
Last name: Creswell, First name: Lucinda, Age: 18
Found     Last name: Stimson, First name: Henry, Age: 29
Deleting Smith, Yee, and Creswell
Last name: Evans, First name: Patty, Age: 24
Last name: Adams, First name: Henry, Age: 63
Last name: Hashimoto, First name: Sato, Age: 21
Last name: Stimson, First name: Henry, Age: 29
Last name: Velasquez, First name: Jose, Age: 72
Last name: Lamarque, First name: Henry, Age: 54
Last name: Vang, First name: Minh, Age: 22

 The classDataArray.java program shows that class objects can be handled by data storage structures in much the same way as primitive types. (Note that a serious program using the last name as a key would need to account for duplicate last names, which would complicate the programming as discussed earlier.)
2.8 Big O Notation
Automobiles are divided by size into several categories: subcompacts, compacts, midsize, and so on. These categories provide a quick idea what size car you’re talking about, without needing to mention actual dimensions. Similarly, it’s useful to have a shorthand way to say how efficient a computer algorithm is. In computer science, this rough measure is called “Big O” notation. You might think that in comparing algorithms you would say things like “Algorithm A is twice as fast as algorithm B,” but in fact this sort of statement isn’t too meaningful. Why not? Because the proportion can change radically as the number of items changes. Perhaps you increase the number of items by 50%, and now A is three times as fast as B. Or you have half as many items, and A and B are now equal. What you need is a comparison that tells how an algorithm’s speed is related to the number of items. Let’s see how this looks for the algorithms we’ve seen so far.
Insertion in an Unordered Array: Constant
Insertion into an unordered array is the only algorithm we’ve seen that doesn’t depend on how many items are in the array. The new item is always placed in the next available position, at a[nElems], and nElems is then incremented. Insertion requires the same amount of time no matter how big N—the number of items in the array—is. We can say that the time, T, to insert an item into an unsorted array is a constant K:
T=K
In a real situation, the actual time (in microseconds or whatever) required by the insertion is related to the speed of the microprocessor, how efficiently the compiler has generated the program code, and other factors. The constant K in the preceding equation is used to account for all such factors. To find out what K is in a real situation, you need to measure how long an insertion took. (Software exists for this very purpose.) K would then be equal to that time.
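For instance, a rough (and admittedly crude) way to estimate K is to time a large batch of insertions with System.nanoTime() and divide; the sketch below uses the HighArray class from Listing 2.3 and is only meant as an illustration:

class TimeInsert
   {
   public static void main(String[] args)
      {
      int n = 1000000;                    // number of insertions to time
      HighArray arr = new HighArray(n);   // array big enough to hold them all
      long start = System.nanoTime();
      for(int i=0; i<n; i++)
         arr.insert(i);                   // each insertion is one step
      long elapsed = System.nanoTime() - start;
      System.out.println("K is roughly " + elapsed/(double)n
                         + " nanoseconds per insertion");
      }
   }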
Linear Search: Proportional to N
We’ve seen that, in a linear search of items in an array, the number of comparisons that must be made to find a specified item is, on the average, half of the total number of items. Thus, if N is the total number of items, the search time T is proportional to half of N:
T=K*N/2
As with insertions, discovering the value of K in this equation would require timing a search for some (probably large) value of N and then using the resulting value of T to calculate K. When you know K, you can calculate T for any other value of N. For a handier formula, we could lump the 2 into the K. Our new K is equal to the old K divided by 2. Now we have
T=K*N
This equation says that average linear search times are proportional to the size of the array. If an array is twice as big, searching it will take twice as long.
Binary Search: Proportional to log(N)
Similarly, we can concoct a formula relating T and N for a binary search:
T = K * log2(N)
As we saw earlier, the time is proportional to the base 2 logarithm of N. Actually, because any logarithm is related to any other logarithm by a constant (3.322 to go from base 2 to base 10), we can lump this constant into K as well. Then we don’t need to specify the base:
T = K * log(N)
Don’t Need the Constant
Big O notation looks like the formulas just described, but it dispenses with the constant K. When comparing algorithms, you don’t really care about the particular microprocessor chip or compiler; all you want to compare is how T changes for different values of N, not what the actual numbers are. Therefore, the constant isn’t needed.
Big O notation uses the uppercase letter O, which you can think of as meaning “order of.” In Big O notation, we would say that a linear search takes O(N) time, and a binary search takes O(log N) time. Insertion into an unordered array takes O(1), or constant time. (That’s the numeral 1 in the parentheses.)
Figure 2.9 graphs some Big O relationships between time and number of items. Based on this graph, we might rate the various Big O values (very subjectively) like this:
O(1) is excellent, O(log N) is good, O(N) is fair, and O(N²) is poor. O(N²) occurs in the bubble sort and also in certain graph algorithms that we’ll look at later in this book.



                                                   Graph of Big O times.
The idea in Big O notation isn’t to give actual figures for running times but to convey how the running times are affected by the number of items. This is the most meaningful way to compare algorithms, except perhaps actually measuring running times in a real installation.
2.9 Why Not Use Arrays for Everything?

Arrays seem to get the job done, so why not use them for all data storage? We’ve already seen some of their disadvantages. In an unordered array you can insert items quickly, in O(1) time, but searching takes slow O(N) time. In an ordered array you can search quickly, in O(log N) time, but insertion takes O(N) time. For both kinds of arrays, deletion takes O(N) time because half the items (on the average) must be moved to fill in the hole.
It would be nice if there were data structures that could do everything—insertion, deletion, and searching—quickly, ideally in O(1) time, but if not that, then in O(log N) time. In the chapters ahead, we’ll see how closely this ideal can be approached, and the price that must be paid in complexity.
Another problem with arrays is that their size is fixed when they are first created with new. Usually, when the program first starts, you don’t know exactly how many items will be placed in the array later, so you guess how big it should be. If your guess is too large, you’ll waste memory by having cells in the array that are never filled. If your guess is too small, you’ll overflow the array, causing at best a message to the program’s user, and at worst a program crash.
Other data structures are more flexible and can expand to hold the number of items inserted in them. The linked list, discussed in Chapter 5, “Linked Lists,” is such a structure.
We should mention that Java includes a class called Vector that acts much like an array but is expandable. This added capability comes at the expense of some loss of efficiency.
You might want to try creating your own vector class. If the class user is about to overflow the internal array in this class, the insertion algorithm creates a new array of larger size, copies the old array contents to the new array, and then inserts the new item. This whole process would be invisible to the class user.
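A bare-bones sketch of such a class might look like this; the name GrowArray and the doubling strategy are just illustrative choices, not how Vector itself is necessarily implemented:

class GrowArray
   {
   private long[] a = new long[4];        // internal array, starts small
   private int nElems = 0;                // number of items stored

   public void insert(long value)
      {
      if(nElems == a.length)              // internal array full?
         {
         long[] bigger = new long[a.length * 2];  // make a larger array
         for(int j=0; j<nElems; j++)      // copy old contents into it
            bigger[j] = a[j];
         a = bigger;                      // switch to the new array
         }
      a[nElems++] = value;                // then insert as usual
      }

   public long getElem(int index)         // get value at index
      { return a[index]; }

   public int size()                      // number of items
      { return nElems; }
   }

From the class user’s point of view, insert() never fails for lack of room; the occasional copy is the efficiency cost mentioned above.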