Sorting

Sorting

There are several reasons why sorting algorithms are usually covered in great detail in algorithms books. First, they are interesting and demonstrate several useful techniques, such as recursion, problem subdivision, heaps, and trees. Second, sorting algorithms are well studied and are some of the few algorithms for which exact run times are known. It can be shown that the fastest possible algorithm that uses comparisons to sort N items must use O(N log N) time. Several sorting algorithms actually achieve that performance, so in some sense they are optimal. Finally, sorting algorithms are useful. Almost any data is more useful when it is sorted in various ways, so sorting algorithms play an important role in many applications.

Bubble-sort

Bubblesort uses the fairly obvious fact that, if an array is not sorted, it must contain two adjacent elements that are out of order. The algorithm repeatedly passes through the array, swapping items that are out of order, until it can’t nd any more swaps. The following pseudocode shows the bubblesort algorithm:


Bubblesort(Data: values[])
      // Repeat until the array is sorted.
      Boolean: not_sorted = True
      While (not_sorted)
          // Assume we won't find a pair to swap.
          not_sorted = False
          // Search the array for adjacent items that are out of order.
          For i = 0 To <length of values> - 1
              // See if items i and i - 1 are out of order.
              If (values[i] < values[i - 1]) Then
                  // Swap them.
                  Data: temp = values[i]
                  values[i] = values[i - 1]
                  values[i - 1] = temp
                  // The array isn't sorted after all.
                  not_sorted = True
              End If
          Next i
      End While
End Bubblesort
image

imgur.com

Heapsort

Heapsort uses a data structure called a heap and is a useful technique for storing a complete binary tree in an array.

Storing Complete Binary Trees in Arrays A binary tree is a tree where every node is connected to, at most, two children. In a complete tree (binary or otherwise), all the tree’s levels are completely filled, except possibly the last level, where all the nodes are pushed to the left. Storing the heap in an array makes this process particularly easy, because when you need to add a new item to the end of the tree, it’s already in the proper position in the array. The following pseudocode shows the algorithm to turn an array into a heap:


MakeHeap(Data: values[])
      // Add each item to the heap one at a time.
      For i = 0 To <length of values> - 1
          // Start at the new item, and work up to the root.
          Integer: index = i
          While (index != 0)
          // Find the parent's index.
          Integer: parent = (index - 1) / 2
            // If child <= parent, we're done, so
             // break out of the While loop.
                If (values[index] <= values[parent]) Then Break
              // Swap the parent and child.
                     Data: temp = values[index]
                     values[index] = values[parent]
                     values[parent] = temp
                     // Move to the parent.
                     index = parent
                 End While
             Next i
         End MakeHeap


Heaps are useful for creating priority queues because the largest item in the tree is always at the root node. When you need to remove an item from the queue, you simply use the item at the root. Unfortunately, that breaks the heap, so it is no longer a tree. Fortunately, there’s an easy way to x it: Move the last item in the tree to the root. Doing so breaks the tree’s heap property, but you can x that using a method similar to the one you used to build the heap. If the new value is smaller than one of its child values, swap it with the larger child. That fixes the heap property at this node, but it may have broken it at the child’s level, so move down to that node and repeat the process. Continue swapping the node down into the tree until you find a spot where the heap property is already satisfied or you reach the bottom of the tree. The following pseudocode shows the algorithm to remove an item from the heap and restore the heap property:



         Data: RemoveTopItem (Data: values[], Integer: count)
             // Save the top item to return later.
             Data: result = values[0]
             // Move the last item to the root.
             values[0] = values[count - 1]
             // Restore the heap property.
             Integer: index = 0
             While (True)
                 // Find the child indices.
                 Integer: child1 = 2 * index + 1
                 Integer: child2 = 2 * index + 2
// If a child index is off the end of the tree,
          // use the parent's index.
          If (child1 >= count) Then child1 = index
          If (child2 >= count) Then child2 = index
          // If the heap property is satisfied,
          // we're done, so break out of the While loop.
          If ((values[index] >= values[child1]) And
              (values[index] >= values[child2])) Then Break
          // Get the index of the child with the larger value.
          Integer: swap_child
          If (values[child1] > values[child2]) Then
              swap_child = child1
          Else
              swap_child = child2
          // Swap with the larger child.
          Data: temp = values[index]
          values[index] = values[swap_child]
          values[swap_child] = temp
          // Move to the child node.
          index = swap_child
      End While
      // Return the value we removed from the root.
      return result
  End RemoveTopItem


Implementing Heapsort

Now that we know how to build and maintain a heap, implementing the heapsort algorithm is easy. The algorithm builds a heap. It then repeatedly swaps the first and last items in the heap, and rebuilds the heap excluding the last item. During each pass, one item is removed from the heap and added to the end of the array where the items are placed in sorted order. The following pseudocode shows how the algorithm works:



  Heapsort(Data: values)
      <Turn the array into a heap.>
      For i = <length of values> - 1 To 0 Step -1
          // Swap the root item and the last item.
          Data: temp = values[0]
          values[0] = values[i]
          values[i] = temp
          <Consider the item in position i to be removed from the heap, so
           the heap now holds i - 1 items. Push the new root value down
           into the heap to restore the heap property.>
      Next i
  End Heapsort


image

Quicksort

The quicksort algorithm works by subdividing an array into two pieces and then calling itself recursively to sort the pieces. The following pseudocode shows the algorithm at a high level:


Quicksort(Data: values[], Integer: start, Integer: end)
      <Pick a dividing item from the array. Call it divider.>
      <Move items < divider to the front of the array.
       Move items >= divider to the end of the array.
       Let middle be the index between the pieces where divider is put.>
      // Recursively sort the two halves of the array.
      Quicksort(values, start, middle - 1)
      Quicksort(values, middle + 1, end)
  End Quicksort


The following pseudocode shows the entire quicksort algorithm at a low level:


// Sort the indicated part of the array.
         Quicksort(Data: values[], Integer: start, Integer: end)
             // If the list has no more than one element, it's sorted.
             If (start >= end) Then Return
             // Use the first item as the dividing item.
             Integer: divider = values[start]
             // Move items < divider to the front of the array and
             // items >= divider to the end of the array.
             Integer: lo = start
             Integer: hi = end
             While (True)
                 // Search the array from back to front starting at "hi"
                 // to find the last item where value < "divider."
                 // Move that item into the hole. The hole is now where
                 // that item was.
                 While (values[hi] >= divider)
                       hi = hi - 1
                     If (hi <= lo) Then <Break out of the inner While loop.>
                 End While
                 If (hi <= lo) Then
                     // The left and right pieces have met in the middle
                     // so we're done. Put the divider here, and
                     // break out of the outer While loop.
                     values[lo] = divider
<Break out of the outer While loop.>
               End If
          // Move the value we found to the lower half.
          values[lo] = values[hi]
          // Search the array from front to back starting at "lo"
          // to find the first item where value >= "divider."
          // Move that item into the hole. The hole is now where
          // that item was.
          lo = lo + 1
          While (values[lo] < divider)
               lo = lo + 1
              If (lo >= hi) Then <Break out of the inner While loop.>
          End While
          If (lo >= hi) Then
              // The left and right pieces have met in the middle
              // so we're done. Put the divider here, and
              // break out of the outer While loop.
              lo = hi
              values[hi] = divider
              <Break out of the outer While loop.>
           End If
          // Move the value we found to the upper half.
          values[hi] = values[lo]
      End While
      // Recursively sort the two halves.
      Quicksort(values, start, lo - 1)
      Quicksort(values, lo + 1, end)
  End Quicksort


image

Mergesort

Like quicksort, mergesort uses a divide-and-conquer strategy. Instead of picking a dividing item and splitting the items into two groups holding items that are larger and smaller than the dividing item, mergesort splits the items into two halves of equal size. It then recursively calls itself to sort the two halves. When the recursive calls to mergesort return, the algorithm merges the two sorted halves into a combined sorted list. The following pseudocode shows the algorithm:


  Mergesort(Data: values[], Data: scratch[], Integer: start, Integer: end)
      // If the array contains only one item, it is already sorted.
      If (start == end) Then Return
      // Break the array into left and right halves.
      Integer: midpoint = (start + end) / 2
      // Call Mergesort to sort the two halves.
      Mergesort(values, scratch, start, midpoint)
      Mergesort(values, scratch, midpoint + 1, end)
      // Merge the two sorted halves.
      Integer: left_index = start
      Integer: right_index = midpoint + 1
      Integer: scratch_index = left_index
      While ((left_index <= midpoint) And (right_index <= end))
          If (values[left_index] <= values[right_index]) Then
              scratch[scratch_index] = values[left_index]
              left_index = left_index + 1
          Else
              scratch[scratch_index] = values[right_index]
              right_index = right_index + 1
          End If
          scratch_index = scratch_index + 1    End While
      // Finish copying whichever half is not empty.
      For i = left_index To midpoint
          scratch[scratch_index] = values[i]
          scratch_index = scratch_index + 1
      Next i
      For i = right_index To end
          scratch[scratch_index] = values[i]
          scratch_index = scratch_index + 1
Next i
      // Copy the values back into the original values array.
      For i = start To end
          values[i] = scratch[i]
      Next i
  End Mergesort


image

Countingsort

Countingsort works if the values you are sorting are integers that lie in a relatively small range. For example, if you need to sort 1 million integers with values between 0 and 1,000, countingsort can provide amazingly fast performance. The basic idea behind countingsort is to count the number of items in the array that have each value. Then it is relatively easy to copy each value, in order, the required number of times back into the array. Then the following pseudocode shows the countingsort algorithm:


Countingsort(Integer: values[], Integer: max_value)
             // Make an array to hold the counts.
             Integer: counts[0 To max_value]
             // Initialize the array to hold the counts.
             // (This is not necessary in all programming languages.)
             For i = 0 To max_value
                 counts[i] = 0
             Next i
             // Count the items with each value.
             For i = 0 To <length of values> - 1
                 // Add 1 to the count for this value.
                 counts[values[i]] = counts[values[i]] + 1
             Next i
             // Copy the values back into the array.
             Integer: index = 0
             For i = 0 To max_value
                 // Copy the value i into the array counts[i] times.
                 For j = 1 To counts[i]
                     values[index] = i
                     index = index + 1
                 Next j
             Next i
         End Countingsort


image

Bucketsort

The bucketsort algorithm (also called bin sort) works by dividing items into buckets. It sorts the buckets either by recursively calling bucketsort or by using some other algorithm and then concatenates the buckets’ contents back into the original array. The following pseudocode shows the algorithm at a high level:


  Bucketsort(Data: values[])
      <Make buckets.>
      <Distribute the items into the buckets.>
      <Sort the buckets.>
      <Gather the bucket values back into the original array.>
  End Bucketsort

comments powered by Disqus