3. Simple Shorting
- 3.1 Bubble sort
- 3.2 Selection Sort
- 3.3 Insertion Sort
- 3.4 Sorting Objects
- 3.5 Comparing the Simple Sorts
As soon as you create a significant database, you’ll probably think of reasons to sort it in various ways. You need to arrange names in alphabetical order, students by grade, customers by ZIP code, home sales by price, cities in order
of increasing population, countries by GNP, stars by magnitude, and so on.
Sorting data may also be a preliminary step to searching it.
As we saw in Chapter 2, “Arrays,” a binary search, which can be applied only to sorted data, is much faster than a linear search.
Because sorting is so important and potentially so timeconsuming, it has been the subject of extensive research in computer science, and some very sophisticated methods have been developed. In this chapter we’ll look at three of
the simpler algorithms: the bubble sort, the selection sort, and the insertion sort. Each is demonstrated with its own Workshop applet. In Chapter 7, “Advanced Sorting,” we’ll look at more sophisticated approaches: Shellsort and
quicksort.
The techniques described in this chapter, while unsophisticated and comparatively slow, are nevertheless worth examining. Besides being easier to understand, they are actually better in some circumstances than the more
sophisticated algorithms. The insertion sort, for example, is preferable to quicksort for small files and for almost-sorted files. In fact, an insertion sort is commonly used as a part of a quicksort implementation.
The example programs in this chapter build on the array classes we developed in the preceding chapter. The sorting algorithms are implemented as methods of similar array classes.
Be sure to try out the Workshop applets included in this chapter. They are more
effective in explaining how the sorting algorithms work than prose and static
pictures could ever be.
How Would You Do It?
Imagine that your kids-league baseball team (mentioned in Chapter 1, “Overview”) is lined up on the field, as shown in Figure 3.1. The regulation nine players, plus an extra, have shown up for practice. You want to arrange the players in order of increasing height (with the shortest player on the left) for the team picture. How would you go about this sorting process?
As a human being, you have advantages over a computer program. You can see all the kids at once, and you can pick out the tallest kid almost instantly. You don’t need to laboriously measure and compare everyone. Also, the kids don’t need to occupy particular places. They can jostle each other, push each other a little to make room, and stand behind or in front of each other. After some ad hoc rearranging, you would have no trouble in lining up all the kids, as shown in Figure 3.2.
A computer program isn’t able to glance over the data in this way. It can compare only two players at one time because that’s how the comparison operators work. This tunnel vision on the part of algorithms will be a recurring theme. Things may seem simple to us humans, but the algorithm can’t see the big picture and must, therefore, concentrate on the details and follow some simple rules. The three algorithms in this chapter all involve two steps, executed over and over until the data is sorted:
1. Compare two items.
2. Swap two items, or copy one item.
However, each algorithm handles the details in a different way.
3.1 Bubble Sort
The bubble sort is notoriously slow, but it’s conceptually the simplest of the sorting algorithms and for that reason is a good beginning for our exploration of sorting techniques.
Bubble Sort on the Baseball Players
Imagine that you’re near-sighted (like a computer program) so that you can see only two of the baseball players at the same time, if they’re next to each other and if you stand very close to them. Given this impediment, how would you sort them? Let’s assume there are N players, and the positions they’re standing in are numbered from 0 on the left to N-1 on the right.
The bubble sort routine works like this: You start at the left end of the line and
compare the two kids in positions 0 and 1. If the one on the left (in 0) is taller, you swap them. If the one on the right is taller, you don’t do anything. Then you move over one position and compare the kids in positions 1 and 2. Again, if the one on the left is taller, you swap them. This sorting process is shown in Figure 3.3. Here are the rules you’re following:
1. Compare two players.
2. If the one on the left is taller, swap them.
3. Move one position right.
You continue down the line this way until you reach the right end. You have
by no means finished sorting the kids, but you do know that the tallest kid is
on the right. This must be true because, as soon as you encounter the tallest
kid, you’ll end up swapping him (or her) every time you compare two kids,
until eventually he (or she) will reach the right end of the line. This is why it’s
called the bubble sort: As the algorithm progresses, the biggest items “bubble
up” to the top end of the array. Figure 3.4 shows the baseball players at the end
of the first pass.
After this first pass through all the data, you’ve made N-1 comparisons and
somewhere between 0 and N-1 swaps, depending on the initial arrangement of
the players. The item at the end of the array is sorted and won’t be moved
again. Now you go back and start another pass from the left end of the line. Again, you go toward the right, comparing and swapping when appropriate. However, this time you can stop one player short of the end of the line, at position N-2,
because you know the last position, at N-1, already contains the tallest player.
This rule could be stated as:
4. When you reach the first sorted player, start over at the left end of the line.
You continue this process until all the players are in order. Describing this process is much harder than demonstrating it, so let’s watch the BubbleSort Workshop applet at work.
The BubbleSort Workshop Applet
Start the BubbleSort Workshop applet. You’ll see something that looks like a bar
graph, with the bar heights randomly arranged, as shown in following Figure
The Run Button
This Workshop applet contains a two-speed graph: You can either let it run by itself, or you can single-step through the process. To get a quick idea what happens, click the Run button. The algorithm will bubble-sort the bars. When it finishes, in 10 seconds or so, the bars will be sorted, as shown in following figure
The New Button
To do another sort, press the New button. New creates a new set of bars and initializes the sorting routine. Repeated presses of New toggle between two arrangements of bars: a random order, as shown in Figure 3.5, and an inverse ordering where the bars are sorted backward. This inverse ordering provides an extra challenge for many sorting algorithms.
The Step Button
The real payoff for using the BubbleSort Workshop applet comes when you single step through a sort. You can see exactly how the algorithm carries out each step. Start by creating a new randomly arranged graph with New. You’ll see three arrows pointing at different bars. Two arrows, labeled inner and inner+1, are side by side on the left. Another arrow, outer, starts on the far right. (The names are chosen to correspond to the inner and outer loop variables in the nested loops used in the algorithm.)
Click once on the Step button. You’ll see the inner and the inner+1 arrows move
together one position to the right, swapping the bars if appropriate. These arrows correspond to the two players you compared, and possibly swapped, in the baseball scenario.
A message under the arrows tells you whether the contents of inner and inner+1 will be swapped, but you know this just from comparing the bars: If the taller one is on the left, they’ll be swapped. Messages at the top of the graph tell you how many swaps and comparisons have been carried out so far. (A complete sort of 10 bars requires 45 comparisons and, on the average, about 22 swaps.)
Continue pressing Step. Each time inner and inner+1 finish going all the way from 0 to outer, the outer pointer moves one position to the left. At all times during the sorting process, all the bars to the right of outer are sorted; those to the left of (and at) outer are not.
The Size Button
The Size button toggles between 10 bars and 100 bars. Figure 3.7 shows what the 100 random bars look like.
You probably don’t want to single-step through the sorting process for 100 bars,
unless you’re unusually patient. Press Run instead, and watch how the blue inner
and inner+1 pointers seem to find the tallest unsorted bar and carry it down the row to the right, inserting it just to the left of the previously sorted bars.
Figure 3.8 shows the situation partway through the sorting process. The bars to the right of the red (longest) arrow are sorted. The bars to the left are beginning to look sorted, but much work remains to be done.
The BubbleSort applet with 100 bars.
The 100 partly sorted bars.
If you started a sort with Run and the arrows are whizzing around, you can freeze the process at any point by pressing the Step button. You can then single-step to watch the details of the operation or press Run again to return to high-speed mode.
The Draw Button
Sometimes while running the sorting algorithm at full speed, the computer takes
time off to perform some other task. This can result in some bars not being drawn. If this happens, you can press the Draw button to redraw all the bars. Doing so pauses the run, so you’ll need to press the Run button again to continue. You can press Draw at any time there seems to be a glitch in the display.
Java Code for a Bubble Sort
In the bubbleSort.java program, shown in Listing 3.1, a class called ArrayBub encapsulates an array a[], which holds variables of type long.
In a more serious program, the data would probably consist of objects, but we use a primitive type for simplicity. (We’ll see how objects are sorted in the objectSort.java program in Listing 3.4.) Also, to reduce the size of the listing, we don’t show find() and delete() methods with the ArrayBub class, although they would normally be part of a such a class.
// bubbleSort.java
// demonstrates bubble sort
// to run this program: C>java BubbleSortApp
////////////////////////////////////////////////////////////////
class ArrayBub
{
private long[] a;
// ref to array a
private int nElems;
// number of data items
//--------------------------------------------------------------
public ArrayBub(int max)
// constructor
{
a = new long[max];
// create the array
nElems = 0;
// no items yet
}
//--------------------------------------------------------------
public void insert(long value)
// put element into array
{
a[nElems] = value;
// insert it
nElems++;
// increment size
}
//--------------------------------------------------------------
public void display()
// displays array contents
{
for(int j=0; j<nElems; j++)
// for each element,
System.out.print(a[j] + “ “); // display it
System.out.println(“”);
}
//--------------------------------------------------------------
public void bubbleSort()
{
int out, in;
for(out=nElems-1; out>1; out--)
// outer loop (backward)
for(in=0; in<out; in++)
// inner loop (forward)
if( a[in] > a[in+1] )
// out of order?
swap(in, in+1);
// swap them
} // end bubbleSort()
//--------------------------------------------------------------
private void swap(int one, int two)
{
long temp = a[one];
a[one] = a[two];
a[two] = temp;
}
//--------------------------------------------------------------
} // end class ArrayBub
////////////////////////////////////////////////////////////////
class BubbleSortApp
{
public static void main(String[] args)
{
int maxSize = 100;
// array size
ArrayBub arr;
// reference to array
arr = new ArrayBub(maxSize); // create the array
arr.insert(77); // insert 10 items
arr.insert(99);
arr.insert(44);
arr.insert(55);
arr.insert(22);
arr.insert(88);
arr.insert(11);
arr.insert(00);
arr.insert(66);
arr.insert(33);
arr.display(); // display items
arr.bubbleSort(); // bubble sort them
arr.display();
// display them again
} // end main()
} // end class BubbleSortApp
////////////////////////////////////////////////////////////////
The constructor and the insert() and display() methods of this class are similar to
those we’ve seen before. However, there’s a new method: bubbleSort(). When this method is invoked from main(), the contents of the array are rearranged into sorted order.
The main() routine inserts 10 items into the array in random order, displays the
array, calls bubbleSort() to sort it, and then displays it again. Here’s the output:
77 99 44 55 22 88 11 0 66 33
0 11 22 33 44 55 66 77 88 99
The bubbleSort() method is only four lines long. Here it is, extracted from the listing:
public void bubbleSort()
{
int out, in;
for(out=nElems-1; out>1; out--)
for(in=0; in<out; in++)
if( a[in] > a[in+1] )
swap(in, in+1);
} // end bubbleSort()
//
//
//
//
outer loop (backward)
inner loop (forward)
out of order?
swap them
The idea is to put the smallest item at the beginning of the array (index 0) and the largest item at the end (index nElems-1). The loop counter out in the outer for loop starts at the end of the array, at nElems-1, and decrements itself each time through the loop. The items at indices greater than out are always completely sorted. The out variable moves left after each pass by in so that items that are already sorted are no longer involved in the algorithm.
The inner loop counter in starts at the beginning of the array and increments itself each cycle of the inner loop, exiting when it reaches out. Within the inner loop, the two array cells pointed to by in and in+1 are compared, and swapped if the one in in is larger than the one in in+1.
For clarity, we use a separate swap() method to carry out the swap. It simply
exchanges the two values in the two array cells, using a temporary variable to hold the value of the first cell while the first cell takes on the value in the second and then setting the second cell to the temporary value. Actually, using a separate swap() method may not be a good idea in practice because the function call adds a small amount of overhead. If you’re writing your own sorting routine, you may prefer to put the swap instructions in line to gain a slight increase in speed.
Invariants
In many algorithms there are conditions that remain unchanged as the algorithm
proceeds. These conditions are called invariants. Recognizing invariants can be useful in understanding the algorithm. In certain situations they may also be helpful in debugging; you can repeatedly check that the invariant is true, and signal an error if it isn’t.
In the bubbleSort.java program, the invariant is that the data items to the right of out are sorted. This remains true throughout the running of the algorithm. (On the first pass, nothing has been sorted yet, and there are no items to the right of out because it starts on the rightmost element.)
Efficiency of the Bubble Sort
As you can see by watching the BubbleSort Workshop applet with 10 bars, the inner and inner+1 arrows make nine comparisons on the first pass, eight on the second, and so on, down to one comparison on the last pass. For 10 items, this is
9 + 8 + 7 + 6 + 5 + 4 + 3 + 2 + 1 = 45
In general, where N is the number of items in the array, there are N-1 comparisons on the first pass, N-2 on the second, and so on. The formula for the sum of such a series is
(N–1) + (N–2) + (N–3) + ... + 1 = N*(N–1)/2
N*(N–1)/2 is 45 (10*9/2) when N is 10.
Thus, the algorithm makes about N ⁄2 comparisons (ignoring the –1, which doesn’t make much difference, especially if N is large).
There are fewer swaps than there are comparisons because two bars are swapped only
if they need to be. If the data is random, a swap is necessary about half the time, so there will be about N ⁄4 swaps. (Although in the worst case, with the initial data inversely sorted, a swap is necessary with every comparison.)
Both swaps and comparisons are proportional to N2. Because constants don’t count in Big O notation, we can ignore the 2 and the 4 and say that the bubble sort runs in O(N2) time. This is slow, as you can verify by running the BubbleSort Workshop applet with 100 bars.
Whenever you see one loop nested within another, such as those in the bubble sort and the other sorting algorithms in this chapter, you can suspect that an algorithm runs in O(N2) time. The outer loop executes N times, and the inner loop executes N (or perhaps N divided by some constant) times for each cycle of the outer loop. This means you’re doing something approximately N*N or N2 times.
3.2 Selection Sort
The selection sort improves on the bubble sort by reducing the number of swaps
necessary from O(N2) to O(N). Unfortunately, the number of comparisons remains O(N2). However, the selection sort can still offer a significant improvement for large records that must be physically moved around in memory, causing the swap time to be much more important than the comparison time. (Typically, this isn’t the case in Java, where references are moved around, not entire objects.)
Selection Sort on the Baseball Players
Let’s consider the baseball players again. In the selection sort, you can no longer
compare only players standing next to each other. Thus, you’ll need to remember a certain player’s height; you can use a notebook to write it down. A magenta-colored towel will also come in handy.
A Brief Description
What’s involved in the selection sort is making a pass through all the players and
picking (or selecting, hence the name of the sort) the shortest one. This shortest
player is then swapped with the player on the left end of the line, at position 0. Now the leftmost player is sorted and won’t need to be moved again. Notice that in this algorithm the sorted players accumulate on the left (lower indices), whereas in the bubble sort they accumulated on the right.
The next time you pass down the row of players, you start at position 1, and, finding the minimum, swap with position 1. This process continues until all the players are sorted.
A More Detailed Description
In more detail, start at the left end of the line of players. Record the leftmost player’s height in your notebook and throw the magenta towel on the ground in front of this person. Then compare the height of the next player to the right with the height in your notebook. If this player is shorter, cross out the height of the first player and record the second player’s height instead. Also move the towel, placing it in front of this new “shortest” (for the time being) player. Continue down the row, comparing each player with the minimum. Change the minimum value in your notebook and move the towel whenever you find a shorter player. When you’re done, the magenta towel will be in front of the shortest player.
Swap this shortest player with the player on the left end of the line. You’ve now
sorted one player. You’ve made N-1 comparisons, but only one swap.
On the next pass, you do exactly the same thing, except that you can completely
ignore the player on the left because this player has already been sorted. Thus, the algorithm starts the second pass at position 1, instead of 0. With each succeeding pass, one more player is sorted and placed on the left, and one less player needs to be considered when finding the new minimum. Figure 3.9 shows how this sort looks for the first three passes.
The SelectSort Workshop Applet
To see how the selection sort looks in action, try out the SelectSort Workshop applet. The buttons operate the same way as those in the BubbleSort applet. Use New to create a new array of 10 randomly arranged bars. The red arrow called outer starts on the left; it points to the leftmost unsorted bar. Gradually, it will move right as more bars are added to the sorted group on its left.
The magenta min arrow also starts out pointing to the leftmost bar; it will move to record the shortest bar found so far. (The magenta min arrow corresponds to the towel in the baseball analogy.) The blue inner arrow marks the bar currently being compared with the minimum.
As you repeatedly press Step, inner moves from left to right, examining each bar in turn and comparing it with the bar pointed to by min. If the inner bar is shorter, min jumps over to this new, shorter bar. When inner reaches the right end of the graph, min points to the shortest of the unsorted bars. This bar is then swapped with outer, the leftmost unsorted bar.
Figure 3.10 shows the situation midway through a sort. The bars to the left of outer are sorted, and inner has scanned from outer to the right end, looking for the shortest bar. The min arrow has recorded the position of this bar, which will be swapped with outer.
Use the Size button to switch to 100 bars, and sort a random arrangement. You’ll see how the magenta min arrow hangs out with a perspective minimum value for a while and then jumps to a new one when the blue inner arrow finds a smaller candidate. The red outer arrow moves slowly but inexorably to the right, as the sorted bars accumulate to its left.
Selection sort on baseball players.
The SelectSort Workshop applet.
Java Code for Selection Sort
The listing for the selectSort.java program is similar to that for bubbleSort.java,
except that the container class is called ArraySel instead of ArrayBub, and the
bubbleSort() method has been replaced by selectSort(). Here’s how this method
looks:
public void selectionSort()
{
int out, in, min;
for(out=0; out<nElems-1; out++)
{
min = out;
for(in=out+1; in<nElems; in++)
if(a[in] < a[min] )
min = in;
swap(out, min);
} // end for(out)
} // end selectionSort()
// outer loop
//
//
//
//
//
minimum
inner loop
if min greater,
we have a new min
swap them
The outer loop, with loop variable out, starts at the beginning of the array (index 0) and proceeds toward higher indices. The inner loop, with loop variable in, begins at out and likewise proceeds to the right.
At each new position of in, the elements a[in] and a[min] are compared. If a[in] is
smaller, then min is given the value of in. At the end of the inner loop, min points to the minimum value, and the array elements pointed to by out and min are swapped.
Listing 3.2 shows the complete selectSort.java program.
// selectSort.java
// demonstrates selection sort
// to run this program: C>java SelectSortApp
////////////////////////////////////////////////////////////////
class ArraySel
{
private long[] a;
// ref to array a
private int nElems;
// number of data items
//--------------------------------------------------------------
public ArraySel(int max)
// constructor
{
a = new long[max];
// create the array
nElems = 0;
// no items yet
}
//--------------------------------------------------------------
public void insert(long value)
// put element into array
{
a[nElems] = value;
// insert it
nElems++;
// increment size
}
//--------------------------------------------------------------
public void display()
// displays array contents
{
for(int j=0; j<nElems; j++)
// for each element,
System.out.print(a[j] + “ “); // display it
System.out.println(“”);
}
//--------------------------------------------------------------
public void selectionSort()
{
int out, in, min;
for(out=0; out<nElems-1; out++)
{
min = out;
for(in=out+1; in<nElems; in++)
if(a[in] < a[min] )
min = in;
// outer loop
//
//
//
//
minimum
inner loop
if min greater,
we have a new min
swap(out, min);
// swap them
} // end for(out)
} // end selectionSort()
//--------------------------------------------------------------
private void swap(int one, int two)
{
long temp = a[one];
a[one] = a[two];
a[two] = temp;
}
//--------------------------------------------------------------
} // end class ArraySel
////////////////////////////////////////////////////////////////
class SelectSortApp
{
public static void main(String[] args)
{
int maxSize = 100;
// array size
ArraySel arr;
// reference to array
arr = new ArraySel(maxSize); // create the array
arr.insert(77); arr.display(); // display items
arr.insert(99);
arr.insert(44);
arr.insert(55);
arr.insert(22);
arr.insert(88);
arr.insert(11);
arr.insert(00);
arr.insert(66);
arr.insert(33);
arr.selectionSort();
}
// insert 10 items
// selection-sort them
arr.display(); // display them again
} // end main()
// end class SelectSortApp
////////////////////////////////////////////////////////////////
The output from selectSort.java is identical to that from bubbleSort.java:
77 99 44 55 22 88 11 0 66 33
0 11 22 33 44 55 66 77 88 99
Invariant
In the selectSort.java program, the data items with indices less than or equal to out are always sorted.
Efficiency of the Selection Sort
The selection sort performs the same number of comparisons as the bubble sort:
N*(N-1)/2. For 10 data items, this is 45 comparisons. However, 10 items require fewer than 10 swaps. With 100 items, 4,950 comparisons are required, but fewer than 100 swaps. For large values of N, the comparison times will dominate, so we would have to say that the selection sort runs in O(N2) time, just as the bubble sort did. However, it is unquestionably faster because there are so few swaps. For smaller values of N, the selection sort may in fact be considerably faster, especially if the swap times are much larger than the comparison times.
3.3 Insertion Sort
In most cases the insertion sort is the best of the elementary sorts described in this chapter. It still executes in O(N2) time, but it’s about twice as fast as the bubble sort and somewhat faster than the selection sort in normal situations. It’s also not too complex, although it’s slightly more involved than the bubble and selection sorts.It’s often used as the final stage of more sophisticated sorts, such as quicksort.
Insertion Sort on the Baseball Players
To begin the insertion sort, start with your baseball players lined up in random order. (They wanted to play a game, but clearly there’s no time for that.) It’s easier to think about the insertion sort if we begin in the middle of the process, when the team is half sorted.
Partial Sorting
At this point there’s an imaginary marker somewhere in the middle of the line.
(Maybe you threw a red T-shirt on the ground in front of a player.) The players to
the left of this marker are partially sorted. This means that they are sorted among themselves; each one is taller than the person to his or her left. However, the players aren’t necessarily in their final positions because they may still need to be moved when previously unsorted players are inserted between them
Note that partial sorting did not take place in the bubble sort and selection sort. In these algorithms a group of data items was completely sorted at any given time; in the insertion sort a group of items is only partially sorted.
The Marked Player
The player where the marker is, whom we’ll call the “marked” player, and all the
players on her right, are as yet unsorted. This is shown in Figure 3.11.a.
The insertion sort on baseball players.
What we’re going to do is insert the marked player in the appropriate place in the (partially) sorted group. However, to do this, we’ll need to shift some of the sorted players to the right to make room. To provide a space for this shift, we take the marked player out of line. (In the program this data item is stored in a temporary variable.) This step is shown in Figure 3.11.b.
Now we shift the sorted players to make room. The tallest sorted player moves into the marked player’s spot, the next-tallest player into the tallest player’s spot, andso on.
When does this shifting process stop? Imagine that you and the marked player are walking down the line to the left. At each position you shift another player to the right, but you also compare the marked player with the player about to be shifted. The shifting process stops when you’ve shifted the last player that’s taller than the marked player. The last shift opens up the space where the marked player, when inserted, will be in sorted order. This step is shown in Figure 3.11.c.
Now the partially sorted group is one player bigger, and the unsorted group is one player smaller. The marker T-shirt is moved one space to the right, so it’s again in front of the leftmost unsorted player. This process is repeated until all the unsorted players have been inserted (hence the name insertion sort) into the appropriate place in the partially sorted group.
The InsertSort Workshop Applet
Use the InsertSort Workshop applet to demonstrate the insertion sort. Unlike the
other sorting applets, it’s probably more instructive to begin with 100 random bars rather than 10.
Sorting 100 Bars
Change to 100 bars with the Size button, and click Run to watch the bars sort them- selves before your very eyes. You’ll see that the short red outer arrow marks the dividing line between the partially sorted bars to the left and the unsorted bars to the right. The blue inner arrow keeps starting from outer and zipping to the left, looking for the proper place to insert the marked bar. Figure 3.12 shows how this process looks when about half the bars are partially sorted.
The marked bar is stored in the temporary variable pointed to by the magenta arrow at the right end of the graph, but the contents of this variable are replaced so often that it’s hard to see what’s there (unless you slow down to single-step mode).
Sorting 10 Bars
To get down to the details, use Size to switch to 10 bars. (If necessary, use New to make sure they’re in random order.)
The InsertSort Workshop applet with 100 bars.
At the beginning, inner and outer point to the second bar from the left (array index 1), and the first message is Will copy outer to temp. This will make room for the shift. (There’s no arrow for inner-1, but of course it’s always one bar to the left of inner.)
Click the Step button. The bar at outer will be copied to temp. We say that items are copied from a source to a destination. When performing a copy, the applet removes the bar from the source location, leaving a blank. This is slightly misleading because in a real Java program the reference in the source would remain there. However, blanking the source makes it easier to see what’s happening.
What happens next depends on whether the first two bars are already in order
(smaller on the left). If they are, you’ll see the message Have compared inner-1 and temp, no copy necessary.
If the first two bars are not in order, the message is Have compared inner-1 and temp, will copy inner-1 to inner. This is the shift that’s necessary to make room for the value in temp to be reinserted. There’s only one such shift on this first pass; more shifts will be necessary on subsequent passes. The situation is shown in Figure 3.13.
On the next click, you’ll see the copy take place from inner-1 to inner. Also, the
inner arrow moves one space left. The new message is Now inner is 0, so no copy necessary. The shifting process is complete.
No matter which of the first two bars was shorter, the next click will show you Will copy temp to inner. This will happen, but if the first two bars were initially in order, you won’t be able to tell a copy was performed because temp and inner hold the same bar. Copying data over the top of the same data may seem inefficient, but the algorithm runs faster if it doesn’t check for this possibility, which happens comparatively infrequently.
Now the first two bars are partially sorted (sorted with respect to each other), and the outer arrow moves one space right, to the third bar (index 2). The process repeats, with the Will copy outer to temp message. On this pass through the sorted data, there may be no shifts, one shift, or two shifts, depending on where the third bar fits among the first two.
Continue to single-step the sorting process. Again, you can see what’s happening
more easily after the process has run long enough to provide some sorted bars on the left. Then you can see how just enough shifts take place to make room for the reinsertion of the bar from temp into its proper place.
Java Code for Insertion Sort
Here’s the method that carries out the insertion sort, extracted from the
insertSort.java program:
public void insertionSort()
{
int in, out;
for(out=1; out<nElems; out++)
// out is dividing line
{
long temp = a[out];
// remove marked item
in = out;
// start shifts at out
while(in>0 && a[in-1] >= temp) // until one is smaller,
{
a[in] = a[in-1];
// shift item right,
--in;
}
a[in] = temp;
} // end for
// end insertionSort()
// go left one position
// insert marked item
In the outer for loop, out starts at 1 and moves right. It marks the leftmost unsorted data. In the inner while loop, in starts at out and moves left, until either temp is smaller than the array element there, or it can’t go left any further. Each pass through the while loop shifts another sorted element one space right.
It may be hard to see the relation between the steps in the InsertSort Workshop
applet and the code, so Figure 3.14 is an activity diagram of the insertionSort()
method, with the corresponding messages from the InsertSort Workshop applet.
Listing 3.3 shows the complete insertSort.java program.
// insertSort.java
// demonstrates insertion sort
// to run this program: C>java InsertSortApp
//--------------------------------------------------------------
class ArrayIns
{
private long[] a;
// ref to array a
private int nElems;
// number of data items
//--------------------------------------------------------------
public ArrayIns(int max)
// constructor
{
a = new long[max];
// create the array
nElems = 0;
// no items yet
}
//--------------------------------------------------------------
public void insert(long value)
// put element into array
{
a[nElems] = value;
// insert it
nElems++;
// increment size
}
//--------------------------------------------------------------
public void display()
// displays array contents
{
for(int j=0; j<nElems; j++)
// for each element,
System.out.print(a[j] + “ “); // display it
System.out.println(“”);
}
//--------------------------------------------------------------
public void insertionSort()
{
int in, out;
for(out=1; out<nElems; out++)
{
long temp = a[out];
in = out;
while(in>0 && a[in-1] >= temp)
{
a[in] = a[in-1];
--in;
}
a[in] = temp;
// insert marked item
} // end for
} // end insertionSort()
//--------------------------------------------------------------
} // end class ArrayIns
////////////////////////////////////////////////////////////////
class InsertSortApp
{
public static void main(String[] args)
{
int maxSize = 100;
// array size
ArrayIns arr;
// reference to array
arr = new ArrayIns(maxSize); // create the array
arr.insert(77); // insert 10 items
arr.insert(99);
arr.insert(44);
arr.insert(55);
arr.insert(22);
arr.insert(88);
arr.insert(11);
arr.insert(00);
arr.insert(66);
arr.insert(33);
arr.display(); // display items
arr.insertionSort(); // insertion-sort them
arr.display();
// display them again
} // end main()
} // end class InsertSortApp
////////////////////////////////////////////////////////////////
Here’s the output from the insertSort.java program; it’s the same as that from the other programs in this chapter:
77 99 44 55 22 88 11 0 66 33
0 11 22 33 44 55 66 77 88 99
Invariants in the Insertion Sort
At the end of each pass, following the insertion of the item from temp, the data items with smaller indices than outer are partially sorted.
Efficiency of the Insertion Sort
How many comparisons and copies does this algorithm require? On the first pass, it compares a maximum of one item. On the second pass, it’s a maximum of two items, and so on, up to a maximum of N-1 comparisons on the last pass. This is
1 + 2 + 3 + ... + N-1 = N*(N-1)/2
However, because on each pass an average of only half of the maximum number of items are actually compared before the insertion point is found, we can divide by 2, which gives
N*(N-1)/4
The number of copies is approximately the same as the number of comparisons.
However, a copy isn’t as time-consuming as a swap, so for random data this algorithm runs twice as fast as the bubble sort and faster than the selection sort.
In any case, like the other sort routines in this chapter, the insertion sort runs in
O(N2) time for random data.
For data that is already sorted or almost sorted, the insertion sort does much better. When data is in order, the condition in the while loop is never true, so it becomes a simple statement in the outer loop, which executes N-1 times. In this case the algorithm runs in O(N) time. If the data is almost sorted, insertion sort runs in almost O(N) time, which makes it a simple and efficient way to order a file that is only slightly out of order.
However, for data arranged in inverse sorted order, every possible comparison and shift is carried out, so the insertion sort runs no faster than the bubble sort. You can check this using the reverse-sorted data option (toggled with New) in the InsertSort Workshop applet.
3.4 Sorting Objects
For simplicity we’ve applied the sorting algorithms we’ve looked at thus far to a
primitive data type: long. However, sorting routines will more likely be applied to
objects than primitive types. Accordingly, we show a Java program in Listing 3.4,
objectSort.java, that sorts an array of Person objects (last seen in the
classDataArray.java program in Chapter 2).
Java Code for Sorting Objects
The algorithm used in our Java program is the insertion sort from the preceding
section. The Person objects are sorted on lastName; this is the key field. The
objectSort.java program is shown in Listing 3.4.
// objectSort.java
// demonstrates sorting objects (uses insertion sort)
// to run this program: C>java ObjectSortApp
////////////////////////////////////////////////////////////////
class Person
{
private String lastName;
private String firstName;
private int age;
//-----------------------------------------------------------
public Person(String last, String first, int a)
{
// constructor
lastName = last;
firstName = first;
age = a;
}
//-----------------------------------------------------------
public void displayPerson()
{
System.out.print(“
Last name: “ + lastName);
System.out.print(“, First name: “ + firstName);
System.out.println(“, Age: “ + age);
}
//-----------------------------------------------------------
public String getLast()
// get last name
{ return lastName; }
} // end class Person
////////////////////////////////////////////////////////////////
class ArrayInOb
{
private Person[] a;
// ref to array a
private int nElems;
// number of data items
//--------------------------------------------------------------
public ArrayInOb(int max)
// constructor
{
a = new Person[max];
// create the array
nElems = 0;
// no items yet
}
//--------------------------------------------------------------
// put person into array
public void insert(String last, String first, int age)
{
a[nElems] = new Person(last, first, age);
nElems++;
// increment size
}
//--------------------------------------------------------------
public void display()
// displays array contents
{
for(int j=0; j<nElems; j++)
// for each element,
a[j].displayPerson();
// display it
System.out.println(“”);
}
//--------------------------------------------------------------
public void insertionSort()
{
int in, out;
for(out=1; out<nElems; out++)
{
Person temp = a[out];
in = out;
// out is dividing line
// remove marked person
// start shifting at out
while(in>0 &&
// until smaller one found,
a[in-1].getLast().compareTo(temp.getLast())>0)
{
a[in] = a[in-1];
// shift item to the right
--in;
// go left one position
}
a[in] = temp;
// insert marked item
} // end for
} // end insertionSort()
//--------------------------------------------------------------
} // end class ArrayInOb
////////////////////////////////////////////////////////////////
class ObjectSortApp
{
public static void main(String[] args)
{
int maxSize = 100;
// array size
ArrayInOb arr;
// reference to array
arr = new ArrayInOb(maxSize); // create the array
arr.insert(“Evans”, “Patty”, 24);
arr.insert(“Smith”, “Doc”, 59);
arr.insert(“Smith”, “Lorraine”, 37);
arr.insert(“Smith”, “Paul”, 37);
arr.insert(“Yee”, “Tom”, 43);
arr.insert(“Hashimoto”, “Sato”, 21);
arr.insert(“Stimson”, “Henry”, 29);
arr.insert(“Velasquez”, “Jose”, 72);
arr.insert(“Vang”, “Minh”, 22);
arr.insert(“Creswell”, “Lucinda”, 18);
System.out.println(“Before sorting:”);
arr.display();
// display items
arr.insertionSort();
// insertion-sort them
System.out.println(“After sorting:”);
arr.display();
// display them again
} // end main()
} // end class ObjectSortApp
////////////////////////////////////////////////////////////////
Here’s the output of this program:
Before sorting:
Last name: Evans, First name: Patty, Age: 24
Last name: Smith, First name: Doc, Age: 59
Last name: Smith, First name: Lorraine, Age: 37
Last name: Smith, First name: Paul, Age: 37
Last name: Yee, First name: Tom, Age: 43
Last name: Hashimoto, First name: Sato, Age: 21
Last name: Stimson, First name: Henry, Age: 29
Last name: Velasquez, First name: Jose, Age: 72
Last name: Vang, First name: Minh, Age: 22
Last name: Creswell, First name: Lucinda, Age: 18
After sorting:
Last name: Creswell, First name: Lucinda, Age: 18
Last name: Evans, First name: Patty, Age: 24
Last name: Hashimoto, First name: Sato, Age: 21
Last name: Smith, First name: Doc, Age: 59
Last name: Smith, First name: Lorraine, Age: 37
Last name: Smith, First name: Paul, Age: 37
Last name: Stimson, First name: Henry, Age: 29
Last name: Vang, First name: Minh, Age: 22
Last name: Velasquez, First name: Jose, Age: 72
Last name: Yee, First name: Tom, Age: 43
Lexicographical Comparisons
The insertSort() method in objectSort.java is similar to that in insertSort.java, but it has been adapted to compare the lastName key values of records rather than the value of a primitive type.
We use the compareTo() method of the String class to perform the comparisons in the insertSort() method. Here’s the expression that uses it:
a[in-1].getLast().compareTo(temp.getLast()) > 0
The compareTo() method returns different integer values depending on the lexico graphical (that is, alphabetical) ordering of the String for which it’s invoked and the String passed to it as an argument, as shown in Table 3.1.
TABLE 3.1
Operation of the compareTo() Method s2.compareTo(s1) Return Value
s1 < s2 <0
0
>0
s1 equals s2
s1 > s2
For example, if s1 is “cat” and s2 is “dog”, the function will return a number less
than 0. In the objectSort.java program, this method is used to compare the last
name of a[in-1] with the last name of temp.
Stability
Sometimes it matters what happens to data items that have equal keys. For example, you may have employee data arranged alphabetically by last names. (That is, the last names were used as key values in the sort.) Now you want to sort the data by ZIP code, but you want all the items with the same ZIP code to continue to be sorted by last names. You want the algorithm to sort only what needs to be sorted, and leave everything else in its original order. Some sorting algorithms retain this secondary ordering; they’re said to be stable.
All the algorithms in this chapter are stable. For example, notice the output of the objectSort.java program (Listing 3.4). Three persons have the last name of Smith. Initially, the order is Doc Smith, Lorraine Smith, and Paul Smith. After the sort, this ordering is preserved, despite the fact that the various Smith objects have been moved to new locations.
3.5 Comparing the Simple Sorts
There’s probably no point in using the bubble sort, unless you don’t have your algorithm book handy. The bubble sort is so simple that you can write it from memory.
Even so, it’s practical only if the amount of data is small. (For a discussion of what “small” means, see Chapter 15, “When to Use What.”)
The selection sort minimizes the number of swaps, but the number of comparisons is still high. This sort might be useful when the amount of data is small and swapping data items is very time-consuming compared with comparing them. The insertion sort is the most versatile of the three and is the best bet in most situations, assuming the amount of data is small or the data is almost sorted. For larger amounts of data, quicksort is generally considered the fastest approach; we’ll examine quicksort in Chapter 7.
We’ve compared the sorting algorithms in terms of speed. Another consideration for any algorithm is how much memory space it needs. All three of the algorithms in this chapter carry out their sort in place, meaning that, besides the initial array, very little extra memory is required. All the sorts require an extra variable to store an item temporarily while it’s being swapped.
You can recompile the example programs, such as bubbleSort.java, to sort larger amounts of data. By timing them for larger sorts, you can get an idea of the differences between them and the time required to sort different amounts of data on your particular system.