HeapSort: A Python example


In this post, I will give an introduction to a famous sorting algorithm as well as it implementation in Python.

Heapsort is a sorting algorithm, which sorts in place like the insertion sort but it has a running time of O(n logn) as the merge-sort.  This algorithm use a data structure called “heap”, that can be compared nearly to a binary tree. Furthermore, this tree can be represented as an array and each element of this array corresponds to a node of the tree.

The array that represents the heap has two properties, which are:

  • length[Array]: number of elements in the array
  • heap-size[Array]: number of elements in the heap stored within the array.

The array will contain valid numbers between 1 and lenght[A]. The root of the heap can be found in the first element of the array and given an index j, it is possibile to find the indices of the parent element in the tree ( Parent( j ) ), the left element ( Left( j ) ) as well as the right element ( Right( j ) ).

They can be calculated as follows:

def Parent( j ): 
    return j/2
def Left( j ): 
    return 2*j
def Right( j ): 
    return 2*j+1

There are two kind of binary heaps, called max-heap and min-heap and each of them will satisfy an different heap-property. In the case of the max-heap, the property states that, given the index j, for each node of the tree other than the root, the value contained in the array to the index Parent( j ) will be higher or equal to the value stored to the index j.

Array[ Parent( j ) ] >= Array[ j ]

The min-heap will satisfy the opposite property described as follows:

Array[ Parent( j ) ] <= Array[ j ]

Given this property, it is possible to know if the element in the root is the biggest or the smallest one.
The algorithm is implemented using three procedures called Heapify, BuildHeapSort and HeapSort.

The first procedure that is described is Heapify. It runs in O( logn ) and it allows to maintain the heap property in the binary tree. Its implementation in Python is shown below.

def Heapify( A, i, n ): 
    l = Left( i )
    r = Right( i )
    if l  A[ i ]: largest = l
    else: largest = i
    if r  A[ largest ]:
        largest = r
    if largest != i:
        A[ i ], A[ largest ] = A[ largest ], A[ i ]
        Heapify( A, largest, n )

BuildHeapSort procedure, which runs in O( n ), produces a heap from an unordered input array.It uses the Heapify procedure in a bottom-up manner from the element to the index n/2 to 1(the root), where n is the length of the array.

def HeapLength( A ): return len( A ) - 1
def BuildHeap( A ): 
    n = HeapLength( A )
    for i in range( n/2 ,0 ,-1 ):
        Heapify( A, i, n )

The last procedure is called HeapSort and it runs in O( n logn ). The first step in this procedure use the BuildHeapSort subroutine to build a heap given an input array and then use the Heapify procedure to mantain the heap property.

def HeapSort( A ): 
    BuildHeap( A )
    HeapSize = HeapLength( A )
    for i in range( HeapSize, 1 , -1 ):
        A[ 1 ],A[ i ] = A[ i ],A[ 1 ]
        HeapSize = HeapSize-1 
        Heapify( A,1,HeapSize )

The original code can be found here.
An implementation in Java as well as a nice animation of how this algorithm works can be found at http://www.cs.auckland.ac.nz/software/AlgAnim/heapsort.html.

The algorithm is described in detail in Introduction to Algorithms,
Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein
MIT PRESS