1. Overview
Calculating percentiles is a fundamental step in understanding the statistical distribution and characteristics of a numeric dataset.
In this tutorial, we’ll walk through the process of calculating percentiles in Java, providing code examples and explanations along the way.
2. Understanding Percentiles
Before discussing the implementation details, let’s first understand what percentiles are and how they’re commonly used in data analysis.
A percentile is a statistical measure indicating the value at or below which a given percentage of observations fall. For instance, the 50th percentile (also known as the median) is the value at or below which 50% of the data points fall.
It’s worth noting that percentiles are expressed in the same unit of measurement as the input dataset, not in percent. For example, if the dataset refers to monthly salary, the corresponding percentiles will be expressed in USD, EUR, or other currencies.
Next, let’s see a couple of concrete examples:
Input: A dataset with numbers 1-100 unsorted
-> sorted dataset: [1, 2, ..., 49, (50), 51, 52, ..., 100]
-> The 50th percentile: 50
Input: [-1, 200, 30, 42, -5, 7, 8, 92]
-> sorted dataset: [-5, -1, 7, (8), 30, 42, 92, 200]
-> The 50th percentile: 8
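To make the “at or below” wording concrete, here’s a small sketch that counts how many values in the second dataset don’t exceed a candidate value. The class and helper names are ours, introduced for illustration only:

```java
import java.util.List;

class AtOrBelow {

    // Hypothetical helper: the percentage of values that are <= candidate
    static double percentAtOrBelow(List<Integer> data, int candidate) {
        long count = data.stream()
          .filter(value -> value <= candidate)
          .count();
        return 100.0 * count / data.size();
    }

    public static void main(String[] args) {
        List<Integer> data = List.of(-1, 200, 30, 42, -5, 7, 8, 92);
        // -5, -1, 7 and 8 are at or below 8: four of eight values
        System.out.println(percentAtOrBelow(data, 8)); // prints 50.0
    }
}
```

Since half of the values are at or below 8, it qualifies as the 50th percentile of this dataset.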
Percentiles are often used to understand data distribution, identify outliers, and compare different datasets. They’re particularly useful when dealing with large datasets or when succinctly summarizing a dataset’s characteristics.
Next, let’s see how to calculate percentiles in Java.
3. Calculating Percentile From a Collection
Now that we understand what percentiles are, let’s summarize the steps for implementing the percentile calculation:
- Sort the given dataset in ascending order
- Calculate the rank of the required percentile as (percentile / 100) * dataset.size
- Take the ceiling value of the rank, as the rank can be a decimal number
- The final result is the element at the index ceiling(rank) – 1 in the sorted dataset
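Before wrapping these steps in a reusable method, we can trace them by hand on the eight-element dataset from the previous section. The sketch below is only an illustration, and rankOf() is a hypothetical helper name:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class NearestRankTrace {

    // Hypothetical helper: steps 2 and 3, the rank rounded up to a whole number
    static int rankOf(double percentile, int size) {
        return (int) Math.ceil(percentile / 100.0 * size);
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>(List.of(-1, 200, 30, 42, -5, 7, 8, 92));
        Collections.sort(data); // step 1: [-5, -1, 7, 8, 30, 42, 92, 200]

        int rank = rankOf(50, data.size()); // ceil(0.5 * 8) = 4
        System.out.println(data.get(rank - 1)); // step 4: index 3, prints 8

        rank = rankOf(75.3, data.size()); // ceil(0.753 * 8) = ceil(6.024) = 7
        System.out.println(data.get(rank - 1)); // index 6, prints 92
    }
}
```

Note that rankOf(0, 8) evaluates to 0, which would lead to index -1. That’s why the 0th percentile needs to be handled as a special case.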
Next, let’s create a generic method to implement the above logic:
static <T extends Comparable<T>> T getPercentile(Collection<T> input, double percentile) {
    if (input == null || input.isEmpty()) {
        throw new IllegalArgumentException("The input dataset cannot be null or empty.");
    }
    if (percentile < 0 || percentile > 100) {
        throw new IllegalArgumentException("Percentile must be between 0 and 100 inclusive.");
    }
    List<T> sortedList = input.stream()
      .sorted()
      .collect(Collectors.toList());
    // Multiply before dividing: 7 / 100.0 * 100 evaluates to slightly more than 7,
    // which would push the ceiling up to 8
    int rank = percentile == 0 ? 1 : (int) Math.ceil(percentile * input.size() / 100.0);
    return sortedList.get(rank - 1);
}
As we can see, the implementation above is pretty straightforward. However, it’s worth mentioning a couple of things:
- The percentile parameter must be validated (0 <= percentile <= 100)
- We sorted the input dataset using the Stream API and collected the sorted result in a new list to avoid modifying the original dataset
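One more detail is worth a quick check: since the rank goes through Math.ceil(), the order in which the floating-point operations are evaluated can change the result. The following sketch compares the two orderings. Both helper names are hypothetical and exist only for this comparison:

```java
class RankRounding {

    // Rank computed in the order the steps are written: divide, then multiply
    static int divideFirst(double percentile, int size) {
        return (int) Math.ceil(percentile / 100.0 * size);
    }

    // The same formula with the multiplication performed first
    static int multiplyFirst(double percentile, int size) {
        return (int) Math.ceil(percentile * size / 100.0);
    }

    public static void main(String[] args) {
        // 7 / 100.0 isn't exactly representable, so the product lands just above 7
        System.out.println(7 / 100.0 * 100 > 7); // prints true

        System.out.println(divideFirst(7, 100)); // prints 8
        System.out.println(multiplyFirst(7, 100)); // prints 7
    }
}
```

For a dataset of 100 elements, the nearest-rank definition puts the 7th percentile at rank 7. Multiplying before dividing keeps the intermediate value exact for whole-number percentiles and realistic dataset sizes, so it avoids the off-by-one result.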
Next, let’s test our getPercentile() method.
4. Testing the getPercentile() Method
First, the method should throw an IllegalArgumentException if the percentile is out of the valid range:
assertThrows(IllegalArgumentException.class, () -> getPercentile(List.of(1, 2, 3), -1));
assertThrows(IllegalArgumentException.class, () -> getPercentile(List.of(1, 2, 3), 101));
We used the assertThrows() method to verify that the expected exception was raised.
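The method also rejects null and empty input, and we could exercise that guard in the same way. Here’s a plain-Java sketch that runs without a test framework. It carries a local copy of getPercentile() so that it compiles on its own, and rejects() is a hypothetical stand-in for assertThrows():

```java
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

class InputValidationCheck {

    // Local copy of getPercentile(), included only to keep the sketch self-contained
    static <T extends Comparable<T>> T getPercentile(Collection<T> input, double percentile) {
        if (input == null || input.isEmpty()) {
            throw new IllegalArgumentException("The input dataset cannot be null or empty.");
        }
        if (percentile < 0 || percentile > 100) {
            throw new IllegalArgumentException("Percentile must be between 0 and 100 inclusive.");
        }
        List<T> sortedList = input.stream()
          .sorted()
          .collect(Collectors.toList());
        int rank = percentile == 0 ? 1 : (int) Math.ceil(percentile * input.size() / 100.0);
        return sortedList.get(rank - 1);
    }

    // Hypothetical helper: true if the call throws an IllegalArgumentException
    static boolean rejects(Runnable call) {
        try {
            call.run();
            return false;
        } catch (IllegalArgumentException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(rejects(() -> getPercentile((List<Integer>) null, 50))); // prints true
        System.out.println(rejects(() -> getPercentile(Collections.<Integer>emptyList(), 50))); // prints true
        System.out.println(rejects(() -> getPercentile(List.of(1, 2, 3), 50))); // prints false
    }
}
```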
Next, let’s take a List of 1-100 as the input to verify whether the method can produce the expected result:
List<Integer> list100 = IntStream.rangeClosed(1, 100)
  .boxed()
  .collect(Collectors.toList());
Collections.shuffle(list100);
assertEquals(1, getPercentile(list100, 0));
assertEquals(10, getPercentile(list100, 10));
assertEquals(25, getPercentile(list100, 25));
assertEquals(50, getPercentile(list100, 50));
assertEquals(76, getPercentile(list100, 75.3));
assertEquals(100, getPercentile(list100, 100));
In the above code, we prepared the input list through an IntStream. Then, we used the shuffle() method to put the 100 numbers in random order, so the test doesn’t rely on the input being sorted already.
Additionally, let’s test our method with another dataset input:
List<Integer> list8 = IntStream.of(-1, 200, 30, 42, -5, 7, 8, 92)
  .boxed()
  .collect(Collectors.toList());
assertEquals(-5, getPercentile(list8, 0));
assertEquals(-5, getPercentile(list8, 10));
assertEquals(-1, getPercentile(list8, 25));
assertEquals(8, getPercentile(list8, 50));
assertEquals(92, getPercentile(list8, 75.3));
assertEquals(200, getPercentile(list8, 100));
5. Calculating Percentile From an Array
Sometimes, the given dataset input is an array instead of a Collection. In this case, we can first convert the input array to a List and then utilize our getPercentile() method to calculate the required percentiles.
Next, let’s demonstrate how to achieve this by taking a long array as the input:
long[] theArray = new long[] { -1, 200, 30, 42, -5, 7, 8, 92 };
// Convert the long[] array to a List<Long>
List<Long> list8 = Arrays.stream(theArray)
  .boxed()
  .toList();
assertEquals(-5, getPercentile(list8, 0));
assertEquals(-5, getPercentile(list8, 10));
assertEquals(-1, getPercentile(list8, 25));
assertEquals(8, getPercentile(list8, 50));
assertEquals(92, getPercentile(list8, 75.3));
assertEquals(200, getPercentile(list8, 100));
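As the code shows, since our input is an array of primitives (long[]), we used Arrays.stream() together with boxed() to convert it to a List<Long>. Then, we passed the converted List to getPercentile() to get the expected result. It’s worth noting that Stream.toList() requires Java 16 or later. On earlier versions, collect(Collectors.toList()) does the same job.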
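If boxing every element is undesirable, for example with very large arrays, an alternative is to sort a copy of the primitive array directly. The overload below isn’t part of the original example. It’s a minimal sketch that applies the same nearest-rank steps to a long[]:

```java
import java.util.Arrays;

class ArrayPercentile {

    // Hypothetical overload for long[] that avoids boxing
    static long getPercentile(long[] input, double percentile) {
        if (input == null || input.length == 0) {
            throw new IllegalArgumentException("The input dataset cannot be null or empty.");
        }
        if (percentile < 0 || percentile > 100) {
            throw new IllegalArgumentException("Percentile must be between 0 and 100 inclusive.");
        }
        // Sort a copy so the caller's array keeps its original order
        long[] sorted = Arrays.copyOf(input, input.length);
        Arrays.sort(sorted);
        int rank = percentile == 0 ? 1 : (int) Math.ceil(percentile * sorted.length / 100.0);
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        long[] theArray = { -1, 200, 30, 42, -5, 7, 8, 92 };
        System.out.println(getPercentile(theArray, 50)); // prints 8
        System.out.println(getPercentile(theArray, 75.3)); // prints 92
    }
}
```

The same pattern would work for int[] and double[], since Arrays.copyOf() and Arrays.sort() have overloads for those types as well.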
6. Conclusion
In this article, we first discussed the underlying principles of percentiles. Then, we explored how to compute percentiles for a dataset in Java.
As always, the complete source code for the examples is available over on GitHub.