1. Introduction
In this tutorial, we’ll see how Heap Sort works, and we’ll implement it in Java.
Heap Sort is based on the Heap data structure. In order to understand Heap Sort properly, we’ll first dig into Heaps and how they are implemented.
2. Heap Data Structure
A Heap is a specialized tree-based data structure. Therefore it’s composed of nodes. We assign the elements to nodes: every node contains exactly one element.
Also, nodes can have children. If a node doesn’t have any children, we call it a leaf.
Two things make a Heap special:
- every node’s value must be less than or equal to all values stored in its children
- it’s a complete tree, which means it has the least possible height
Because of the 1st rule, the least element will always be in the root of the tree.
How we enforce these rules is implementation-dependent.
Heaps are usually used to implement priority queues because a Heap can extract the least (or greatest) element very efficiently.
2.1. Heap Variants
Heap has many variants, which differ in some implementation details.
For example, what we described above is a Min-Heap, because a parent is always less than or equal to all of its children. Alternatively, we could have defined a Max-Heap, in which case a parent is always greater than or equal to its children. Hence, the greatest element will be in the root node.
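To make the difference between the two variants concrete, here is a minimal sketch that uses java.util.PriorityQueue as a stand-in for a Heap. The class and method names are hypothetical and only serve the illustration, and both methods assume a non-empty list:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class HeapVariantsSketch {

    // Returns the element a Min-Heap would keep in its root (assumes a non-empty list).
    static int minRoot(List<Integer> values) {
        PriorityQueue<Integer> minHeap = new PriorityQueue<>(values);
        return minHeap.peek();
    }

    // Returns the element a Max-Heap would keep in its root (assumes a non-empty list).
    static int maxRoot(List<Integer> values) {
        PriorityQueue<Integer> maxHeap = new PriorityQueue<>(Comparator.<Integer>reverseOrder());
        maxHeap.addAll(values);
        return maxHeap.peek();
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(3, 5, 1, 4, 2);
        System.out.println(minRoot(values)); // prints 1
        System.out.println(maxRoot(values)); // prints 5
    }
}
```

The only difference between the two is the ordering: the natural order gives a Min-Heap, and the reversed order gives a Max-Heap.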
We can choose from many tree implementations. The most straightforward is a Binary Tree. In a Binary Tree, every node can have at most two children. We call them left child and right child.
The easiest way to enforce the 2nd rule is to use a Full Binary Tree (a shape that many texts call a complete binary tree). A Full Binary Tree follows some simple rules:
- if a node has only one child, that should be its left child
- only the rightmost parent on the second-deepest level can have exactly one child
- leaves can only be on the two deepest levels
Let’s see these rules with some examples:
1      2    3        4          5        6          7          8        9          10
()     ()   ()       ()         ()       ()         ()         ()       ()         ()
      /       \     /  \       /  \     /  \       /  \       /        /          /  \
     ()       ()   ()  ()     ()  ()   ()  ()     ()  ()     ()       ()         ()  ()
                             /           \       /  \       /  \     /          /  \
                            ()           ()     ()  ()     ()  ()   ()         ()  ()
                                                                              /
                                                                             ()
The trees 1, 2, 4, 5 and 7 follow the rules.
Trees 3 and 6 violate the 1st rule, trees 8 and 9 violate the 2nd rule, and tree 10 violates the 3rd rule.
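If we stored the tree as linked nodes, we could check its shape with a level-by-level walk. The following is only a sketch: the Node type and the followsRules method are hypothetical helpers, not part of the implementation we build later. The idea is that, reading the tree from top to bottom and left to right, no existing node may come after a missing one:

```java
import java.util.ArrayDeque;
import java.util.Queue;

class CompleteTreeSketch {

    // Hypothetical minimal node type, used only for this illustration.
    static class Node {
        final Node left;
        final Node right;

        Node(Node left, Node right) {
            this.left = left;
            this.right = right;
        }
    }

    // Walks the tree level by level. Once a missing child has been seen,
    // any later existing child means there is a gap in the shape.
    static boolean followsRules(Node root) {
        if (root == null) {
            return true;
        }
        Queue<Node> queue = new ArrayDeque<>();
        queue.add(root);
        boolean gapSeen = false;
        while (!queue.isEmpty()) {
            Node current = queue.remove();
            for (Node child : new Node[] {current.left, current.right}) {
                if (child == null) {
                    gapSeen = true;
                } else if (gapSeen) {
                    return false;
                } else {
                    queue.add(child);
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Node onlyLeftChild = new Node(new Node(null, null), null);
        Node onlyRightChild = new Node(null, new Node(null, null));
        System.out.println(followsRules(onlyLeftChild));  // prints true
        System.out.println(followsRules(onlyRightChild)); // prints false
    }
}
```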
In this tutorial, we’ll focus on Min-Heap with a Binary Tree implementation.
2.2. Inserting Elements
We should implement all operations in a way that keeps the Heap invariants. This way, we can build the Heap with repeated insertions, so we’ll focus on the single insert operation.
We can insert an element with the following steps:
- create a new leaf in the leftmost available slot on the deepest level (starting a new level if the deepest one is full) and store the item in that node
- if the element is less than its parent, we swap them
- continue with step 2 until the element is no longer less than its parent or it becomes the new root
Note that step 2 won’t violate the Heap rule: if we replace a node’s value with a smaller one, it will still be less than or equal to its children.
Let’s see an example! We want to insert 4 into this Heap:
         2
        / \
       /   \
      3     6
     / \
    5   7
The first step is to create a new leaf which stores 4:
         2
        / \
       /   \
      3     6
     / \   /
    5   7 4
Since 4 is less than its parent, 6, we swap them:
         2
        / \
       /   \
      3     4
     / \   /
    5   7 6
Now we check whether 4 is less than its parent or not. Since its parent is 2, which is smaller, we stop. The Heap is still valid, and we inserted number 4.
Let’s insert 1:
         2
        / \
       /   \
      3     4
     / \   / \
    5   7 6   1
We have to swap 1 and 4:
         2
        / \
       /   \
      3     1
     / \   / \
    5   7 6   4
Now we should swap 1 and 2:
         1
        / \
       /   \
      3     2
     / \   / \
    5   7 6   4
Since 1 is the new root, we stop.
3. Heap Implementation in Java
Since we use a Full Binary Tree, we can implement it with an array: every element in the array is a node in the tree. We label the nodes with the array indices from left to right and from top to bottom, in the following way:
         0
        / \
       /   \
      1     2
     / \   /
    3   4 5
The only thing we need is to keep track of how many elements we store in the tree. This way, the index of the next element we want to insert is the current number of stored elements.
Using this indexing, we can calculate the index of the parent and child nodes:
- parent: (index - 1) / 2
- left child: 2 * index + 1
- right child: 2 * index + 2
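As a quick sanity check, we can apply this arithmetic to the six-node tree above. The class name below is hypothetical. The sketch relies on Java’s integer division, which drops the remainder, so both children of a node map back to the same parent:

```java
class HeapIndexSketch {

    static int parentIndex(int index) {
        return (index - 1) / 2;
    }

    static int leftChildIndex(int index) {
        return 2 * index + 1;
    }

    static int rightChildIndex(int index) {
        return 2 * index + 2;
    }

    public static void main(String[] args) {
        // node 1 has the children 3 and 4
        System.out.println(leftChildIndex(1));  // prints 3
        System.out.println(rightChildIndex(1)); // prints 4
        // both children lead back to node 1
        System.out.println(parentIndex(3));     // prints 1
        System.out.println(parentIndex(4));     // prints 1
        // node 5 is the left child of node 2
        System.out.println(parentIndex(5));     // prints 2
    }
}
```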
Since we don’t want to bother with array reallocation, we’ll simplify the implementation even more and use an ArrayList.
A basic Binary Tree implementation looks like this:
import java.util.ArrayList;
import java.util.List;

class BinaryTree<E> {

    // the nodes of the tree, stored level by level
    List<E> elements = new ArrayList<>();

    void add(E e) {
        elements.add(e);
    }

    boolean isEmpty() {
        return elements.isEmpty();
    }

    E elementAt(int index) {
        return elements.get(index);
    }

    int parentIndex(int index) {
        return (index - 1) / 2;
    }

    int leftChildIndex(int index) {
        return 2 * index + 1;
    }

    int rightChildIndex(int index) {
        return 2 * index + 2;
    }

}
The code above only adds the new element to the end of the tree. Therefore, we need to move the new element up if necessary. We can do it with the following code:
class Heap<E extends Comparable<E>> {

    // ...

    void add(E e) {
        elements.add(e);
        int elementIndex = elements.size() - 1;
        // move the new element up while it's smaller than its parent
        while (!isRoot(elementIndex) && !isCorrectChild(elementIndex)) {
            int parentIndex = parentIndex(elementIndex);
            swap(elementIndex, parentIndex);
            elementIndex = parentIndex;
        }
    }

    boolean isRoot(int index) {
        return index == 0;
    }

    boolean isCorrectChild(int index) {
        return isCorrect(parentIndex(index), index);
    }

    boolean isCorrect(int parentIndex, int childIndex) {
        // a missing node can't violate the Heap rule
        if (!isValidIndex(parentIndex) || !isValidIndex(childIndex)) {
            return true;
        }

        // equal elements are allowed, so we compare with <= instead of <
        return elementAt(parentIndex).compareTo(elementAt(childIndex)) <= 0;
    }

    boolean isValidIndex(int index) {
        return index < elements.size();
    }

    void swap(int index1, int index2) {
        E element1 = elementAt(index1);
        E element2 = elementAt(index2);
        elements.set(index1, element2);
        elements.set(index2, element1);
    }

    // ...

}
Note that since we need to compare the elements, they need to implement java.lang.Comparable.
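To see the insertion at work, we can replay the example from section 2.2 on a plain list. This is a simplified, standalone sketch with a hypothetical class name. It works on Integer values only and inlines the index arithmetic instead of reusing the Heap class:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class SiftUpSketch {

    // Appends the value, then swaps it with its parent while it is smaller.
    static void insert(List<Integer> heap, int value) {
        heap.add(value);
        int index = heap.size() - 1;
        while (index > 0) {
            int parent = (index - 1) / 2;
            if (heap.get(parent) <= heap.get(index)) {
                break; // the parent is not greater, so the Heap rule holds
            }
            Collections.swap(heap, index, parent);
            index = parent;
        }
    }

    public static void main(String[] args) {
        // the Heap 2, 3, 6, 5, 7 stored level by level
        List<Integer> heap = new ArrayList<>(Arrays.asList(2, 3, 6, 5, 7));
        insert(heap, 4);
        System.out.println(heap); // prints [2, 3, 4, 5, 7, 6]
        insert(heap, 1);
        System.out.println(heap); // prints [1, 3, 2, 5, 7, 6, 4]
    }
}
```

Reading the printed lists level by level gives the same trees we drew by hand earlier.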
4. Heap Sort
Since the root of the Heap always contains the smallest element, the idea behind Heap Sort is pretty simple: remove the root node until the Heap becomes empty.
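Before we write our own remove operation, we can try the idea with java.util.PriorityQueue, which the JDK implements as a binary heap. This is only a sketch with hypothetical class and method names. It uses the JDK class as a stand-in, not the Heap we build in this tutorial:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

class RepeatedRemovalSketch {

    // Puts every element into a heap, then removes the root until the heap is empty.
    static List<Integer> sortWithPriorityQueue(List<Integer> input) {
        PriorityQueue<Integer> heap = new PriorityQueue<>(input);
        List<Integer> sorted = new ArrayList<>();
        while (!heap.isEmpty()) {
            sorted.add(heap.poll()); // poll() removes the smallest remaining element
        }
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(sortWithPriorityQueue(Arrays.asList(3, 5, 1, 4, 2))); // prints [1, 2, 3, 4, 5]
    }
}
```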
The only thing we need is a remove operation, which keeps the Heap in a consistent state. We must ensure that we don’t violate the structure of the Binary Tree or the Heap property.
To keep the structure, we can’t delete any node except the rightmost leaf on the deepest level. So the idea is to remove the element from the root node and store the element of that rightmost leaf in the root node.
But this operation will most likely violate the Heap property. So if the new root is greater than any of its child nodes, we swap it with its smallest child node. Since the smallest child node is less than or equal to all other child nodes, it doesn’t violate the Heap property.
We keep swapping until the element becomes a leaf, or it’s less than all of its children.
Let’s delete the root from this tree:
         1
        / \
       /   \
      3     2
     / \   / \
    5   7 6   4
First, we place the last leaf in the root:
         4
        / \
       /   \
      3     2
     / \   /
    5   7 6
Then, since it’s greater than both of its children, we swap it with its least child, which is 2:
         2
        / \
       /   \
      3     4
     / \   /
    5   7 6
4 is less than 6, so we stop.
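We can replay this removal on a plain list as well. Again, this is a simplified sketch with a hypothetical class name. It works on Integer values only and assumes that the list is a valid, non-empty Min-Heap stored level by level:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class SiftDownSketch {

    // Removes and returns the root, then restores the Heap rule by moving
    // the former last leaf down. Assumes a non-empty, valid Min-Heap.
    static int removeRoot(List<Integer> heap) {
        int root = heap.get(0);
        int lastLeaf = heap.remove(heap.size() - 1);
        if (heap.isEmpty()) {
            return root;
        }
        heap.set(0, lastLeaf);
        int index = 0;
        while (true) {
            int left = 2 * index + 1;
            int right = left + 1;
            if (left >= heap.size()) {
                break; // the element became a leaf
            }
            int smaller = left;
            if (right < heap.size() && heap.get(right) < heap.get(left)) {
                smaller = right;
            }
            if (heap.get(index) <= heap.get(smaller)) {
                break; // no child is smaller, so the Heap rule holds
            }
            Collections.swap(heap, index, smaller);
            index = smaller;
        }
        return root;
    }

    public static void main(String[] args) {
        List<Integer> heap = new ArrayList<>(Arrays.asList(1, 3, 2, 5, 7, 6, 4));
        System.out.println(removeRoot(heap)); // prints 1
        System.out.println(heap);             // prints [2, 3, 4, 5, 7, 6]
    }
}
```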
5. Heap Sort Implementation in Java
With all we have, removing the root (popping) looks like this:
class Heap<E extends Comparable<E>> {

    // ...

    E pop() {
        if (isEmpty()) {
            throw new IllegalStateException("You cannot pop from an empty heap");
        }

        E result = elementAt(0);

        // move the last leaf to the root and drop the old root
        int lastElementIndex = elements.size() - 1;
        swap(0, lastElementIndex);
        elements.remove(lastElementIndex);

        // move the new root down until the Heap rule holds again
        int elementIndex = 0;
        while (!isLeaf(elementIndex) && !isCorrectParent(elementIndex)) {
            int smallerChildIndex = smallerChildIndex(elementIndex);
            swap(elementIndex, smallerChildIndex);
            elementIndex = smallerChildIndex;
        }

        return result;
    }

    boolean isLeaf(int index) {
        return !isValidIndex(leftChildIndex(index));
    }

    boolean isCorrectParent(int index) {
        return isCorrect(index, leftChildIndex(index)) && isCorrect(index, rightChildIndex(index));
    }

    int smallerChildIndex(int index) {
        int leftChildIndex = leftChildIndex(index);
        int rightChildIndex = rightChildIndex(index);

        if (!isValidIndex(rightChildIndex)) {
            return leftChildIndex;
        }

        if (elementAt(leftChildIndex).compareTo(elementAt(rightChildIndex)) < 0) {
            return leftChildIndex;
        }

        return rightChildIndex;
    }

    // ...

}
As we said before, sorting is just creating a Heap and removing the root repeatedly:
class Heap<E extends Comparable<E>> {

    // ...

    static <E extends Comparable<E>> List<E> sort(Iterable<E> elements) {
        Heap<E> heap = of(elements);

        List<E> result = new ArrayList<>();

        while (!heap.isEmpty()) {
            result.add(heap.pop());
        }

        return result;
    }

    static <E extends Comparable<E>> Heap<E> of(Iterable<E> elements) {
        Heap<E> result = new Heap<>();
        for (E element : elements) {
            result.add(element);
        }
        return result;
    }

    // ...

}
We can verify it’s working with the following test:
@Test
void givenNotEmptyIterable_whenSortCalled_thenItShouldReturnElementsInSortedList() {
    // given
    List<Integer> elements = Arrays.asList(3, 5, 1, 4, 2);

    // when
    List<Integer> sortedElements = Heap.sort(elements);

    // then
    assertThat(sortedElements).isEqualTo(Arrays.asList(1, 2, 3, 4, 5));
}
Note that we could provide an implementation that sorts in place, which means we return the result in the same array in which we received the elements. Additionally, this way we don’t need any intermediate memory allocation. However, that implementation would be a bit harder to understand.
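For the curious, an in-place variant could look like the following sketch. It makes a few choices that differ from the rest of this tutorial, so we should treat it as an illustration only. It works on an int array, it uses a Max-Heap so that the greatest element can be moved to the end of the array, and it builds the Heap by moving every parent down instead of inserting the elements one by one. The class and method names are hypothetical:

```java
import java.util.Arrays;

class InPlaceHeapSortSketch {

    static void sort(int[] array) {
        int size = array.length;
        // build a Max-Heap: move every parent down, starting with the last parent
        for (int index = size / 2 - 1; index >= 0; index--) {
            siftDown(array, index, size);
        }
        // repeatedly move the current maximum right behind the shrinking Heap
        for (int end = size - 1; end > 0; end--) {
            swap(array, 0, end);
            siftDown(array, 0, end);
        }
    }

    // Moves the element at index down within the first size elements
    // until neither of its children is greater.
    static void siftDown(int[] array, int index, int size) {
        while (true) {
            int left = 2 * index + 1;
            if (left >= size) {
                return; // the element became a leaf
            }
            int right = left + 1;
            int larger = left;
            if (right < size && array[right] > array[left]) {
                larger = right;
            }
            if (array[index] >= array[larger]) {
                return;
            }
            swap(array, index, larger);
            index = larger;
        }
    }

    static void swap(int[] array, int i, int j) {
        int temp = array[i];
        array[i] = array[j];
        array[j] = temp;
    }

    public static void main(String[] args) {
        int[] elements = {3, 5, 1, 4, 2};
        sort(elements);
        System.out.println(Arrays.toString(elements)); // prints [1, 2, 3, 4, 5]
    }
}
```

The sorted part grows from the end of the array while the Heap shrinks at the front, so no second array is needed.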
6. Time Complexity
Heap Sort consists of two key steps: inserting an element and removing the root node. Each step has a complexity of O(log n).
Since we repeat both steps n times, the overall sorting complexity is O(n log n).
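The O(log n) bound comes from the height of the tree: a complete binary tree with n nodes has a height of floor(log2(n)), and a single insert or remove moves an element along at most that many edges. The following sketch computes this bound for a few sizes. The class and method names are hypothetical:

```java
class HeapHeightSketch {

    // Returns floor(log2(n)), the height of a complete binary tree with n nodes.
    static int maxStepsForOneOperation(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be positive");
        }
        return 31 - Integer.numberOfLeadingZeros(n);
    }

    public static void main(String[] args) {
        System.out.println(maxStepsForOneOperation(7));         // prints 2
        System.out.println(maxStepsForOneOperation(8));         // prints 3
        System.out.println(maxStepsForOneOperation(1_000_000)); // prints 19
    }
}
```

So even with a million elements, one operation needs at most 19 swaps.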
Note that we didn’t mention the cost of array reallocation, but since it’s O(n), it doesn’t affect the overall complexity. Also, as we mentioned before, it’s possible to implement in-place sorting, which means no array reallocation is necessary.
It’s also worth mentioning that roughly 50% of the elements are leaves, and roughly 75% of the elements are on the two bottommost levels. Therefore, on randomly ordered input, most insert operations tend to take no more than two steps.
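We can check these shares by counting nodes in the array layout. The sketch below uses hypothetical helper names and relies on two facts about that layout: a node is a leaf if its left child index falls outside the array, and the node at index i sits on level floor(log2(i + 1)):

```java
class LeafShareSketch {

    // A node is a leaf if its left child index is outside the array.
    static int countLeaves(int size) {
        int leaves = 0;
        for (int index = 0; index < size; index++) {
            if (2 * index + 1 >= size) {
                leaves++;
            }
        }
        return leaves;
    }

    // Counts the nodes on the deepest level and on the level right above it.
    static int countBottomTwoLevels(int size) {
        int deepestLevel = level(size - 1);
        int count = 0;
        for (int index = 0; index < size; index++) {
            if (level(index) >= deepestLevel - 1) {
                count++;
            }
        }
        return count;
    }

    // Level of a node in the array layout; the root is on level 0.
    static int level(int index) {
        return 31 - Integer.numberOfLeadingZeros(index + 1);
    }

    public static void main(String[] args) {
        int size = 1023; // a tree with 10 completely filled levels
        System.out.println(countLeaves(size));          // prints 512
        System.out.println(countBottomTwoLevels(size)); // prints 768
    }
}
```

For 1023 elements, that is about 50% and 75% of the nodes, respectively.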
Note that on real-world data, Quicksort usually performs better than Heap Sort. The silver lining is that Heap Sort has an O(n log n) time complexity even in the worst case.
7. Conclusion
In this tutorial, we saw an implementation of Binary Heap and Heap Sort.
Even though its time complexity is O(n log n), in most cases, it isn’t the best algorithm on real-world data.
As usual, the examples are available over on GitHub.