1. Overview
In this tutorial, we'll implement different solutions to the problem of finding the k largest elements in an array with Java. To describe time complexity we`ll be using Big-O notation.
2. Brute-Force Solution
The brute-force solution to this problem is to iterate through the given array k times. In each iteration, we'll find the largest value. Then we'll remove this value from the array and put into the output list:
public List findTopK(List input, int k) {
List array = new ArrayList<>(input);
List topKList = new ArrayList<>();
for (int i = 0; i < k; i++) {
int maxIndex = 0;
for (int j = 1; j < array.size(); j++) {
if (array.get(j) > array.get(maxIndex)) {
maxIndex = j;
}
}
topKList.add(array.remove(maxIndex));
}
return topKList;
}
If we suppose n to be the size of the given array, the time complexity of this solution is O(n * k). Furthermore, this is the most inefficient solution.
3. Java Collections Approach
However, more efficient solutions to this problem exist. In this section, we'll explain two of them using Java Collections.
3.1. TreeSet
TreeSet has a Red-Black Tree data structure as a backbone. As a result, putting a value to this set costs O(log n). TreeSet is a sorted collection. Therefore, we can put all the values in the TreeSet and extract the first k of them:
public List<Integer> findTopK(List<Integer> input, int k) {
Set<Integer> sortedSet = new TreeSet<>(Comparator.reverseOrder());
sortedSet.addAll(input);
return sortedSet.stream().limit(k).collect(Collectors.toList());
}
The time complexity of this solution is O(n * log n). Above all, this is supposed to be more efficient than the brute-force approach if k ≥ log n.
It's important to remember that TreeSet contains no duplicates. As a result, the solution works only for an input array with distinct values.
3.2. PriorityQueue
PriorityQueue is a Heap data structure in Java. With its help, we can achieve an O(n * log k) solution. Moreover, this will be a faster solution than the previous one. Due to the stated problem, k is always less than the size of the array. So, it means that O(n * log k) ≤ O(n * log n).
The algorithm iterates once through the given array. At each iteration, we'll add a new element to the heap. Also, we'll keep the size of the heap to be less than or equal to k. So, we'll have to remove extra elements from the heap and add new ones. As a result, after iterating through the array, the heap will contain the k largest values:
public List<Integer> findTopK(List<Integer> input, int k) {
PriorityQueue<Integer> maxHeap = new PriorityQueue<>();
input.forEach(number -> {
maxHeap.add(number);
if (maxHeap.size() > k) {
maxHeap.poll();
}
});
List<Integer> topKList = new ArrayList<>(maxHeap);
Collections.reverse(topKList);
return topKList;
}
4. Selection Algorithm
There are many approaches to solve the given problem. And, although it's beyond the scope of this tutorial, using the Selection algorithm approach will be the best because it yields a linear time complexity.
5. Conclusion
In this tutorial, we've described several solutions for finding the k largest elements in an array.
As usual, the example code is available over on GitHub.