As senior Java developers, we frequently come across scenarios where we need to process arrays and eliminate duplicate elements. This task might seem simple, but choosing an optimal approach is crucial for handling large datasets effectively.
In this blog post, I’ll walk you through different approaches to removing duplicates from an array in Java, focusing on simplicity, readability, and performance.
Problem Statement:
Given an array of integers, the objective is to remove the duplicate elements and return an array with only unique elements, while preserving the order of first occurrences.
Example:
Input:
int[] arr = {4, 2, 2, 3, 4, 5, 3}
Output:
int[] uniqueArr = {4, 2, 3, 5}
Approach 1: Using Set
to Remove Duplicates
Java’s Set
interface provides a straightforward way to eliminate duplicates, as it inherently stores only unique elements. Here's an implementation using LinkedHashSet
, which preserves the order of insertion.
Code:
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;
public class RemoveDuplicates {
public static int[] removeDuplicatesUsingSet(int[] arr) {
Set<Integer> uniqueElements = new LinkedHashSet<>();
for (int num : arr) {
uniqueElements.add(num);
}
// Convert Set back to an array
int[] result = new int[uniqueElements.size()];
int i = 0;
for (Integer num : uniqueElements) {
result[i++] = num;
}
return result;
}
public static void main(String[] args) {
int[] arr = {4, 2, 2, 3, 4, 5, 3};
int[] result = removeDuplicatesUsingSet(arr);
System.out.println("Array with unique elements: " + Arrays.toString(result));
}
}
Output:
Array with unique elements: [4, 2, 3, 5]
Explanation:
- A
LinkedHashSet
is used to store the elements of the array. Since sets do not allow duplicates, this automatically removes any repeated elements. LinkedHashSet
maintains the order of insertion, which ensures the first occurrence of an element is retained.
Time Complexity: O(n), where n
is the length of the array. Adding an element to a Set
is generally O(1), and we iterate over the array once.
Approach 2: Using a Temporary Array (Brute Force)
While sets offer a clean and efficient solution, we can also solve this problem without extra data structures. This approach involves scanning through the array and copying only unique elements to a new array.
Code:
import java.util.Arrays;
public class RemoveDuplicates {
public static int[] removeDuplicatesBruteForce(int[] arr) {
int n = arr.length;
if (n == 0 || n == 1) {
return arr;
}
// Temporary array to store unique elements
int[] temp = new int[n];
int j = 0;
for (int i = 0; i < n - 1; i++) {
if (arr[i] != arr[i + 1]) {
temp[j++] = arr[i];
}
}
temp[j++] = arr[n - 1];
// Copy the unique elements back to a new array of the correct size
return Arrays.copyOf(temp, j);
}
public static void main(String[] args) {
int[] arr = {4, 2, 2, 3, 4, 5, 3};
Arrays.sort(arr); // Array needs to be sorted for this approach to work
int[] result = removeDuplicatesBruteForce(arr);
System.out.println("Array with unique elements: " + Arrays.toString(result));
}
}
Output:
Array with unique elements: [2, 3, 4, 5]
Explanation:
- The input array is sorted first, then iterated over. Only the first occurrence of each element is copied into a temporary array.
- Finally, we create a new array that contains only the unique elements.
Time Complexity: O(n log n) due to the sorting step, followed by O(n) for the array scan. Overall complexity is O(n log n).
Approach 3: Using Streams (Java 8+)
In modern Java, the Stream
API offers a highly readable and concise solution. Using a combination of distinct()
and toArray()
, we can remove duplicates with minimal boilerplate.
Code:
import java.util.Arrays;
public class RemoveDuplicates {
public static int[] removeDuplicatesUsingStreams(int[] arr) {
return Arrays.stream(arr)
.distinct()
.toArray();
}
public static void main(String[] args) {
int[] arr = {4, 2, 2, 3, 4, 5, 3};
int[] result = removeDuplicatesUsingStreams(arr);
System.out.println("Array with unique elements: " + Arrays.toString(result));
}
}
Output:
Array with unique elements: [4, 2, 3, 5]
Explanation:
- The
stream()
method converts the array to a stream of integers. distinct()
removes duplicates by internally using a set.toArray()
converts the stream back to an array.
Time Complexity: O(n), as the stream’s distinct()
operation internally leverages a hash-based set to filter out duplicates efficiently.
Which Approach to Choose?
- For simplicity and performance in most cases, Approach 1 (using a Set) is a solid choice. It offers O(n) time complexity and keeps the implementation simple.
- For older Java versions (pre-Java 8), Approach 2 can be an option, though its performance is suboptimal due to sorting.
- If you are working with Java 8+ and prefer concise, functional-style code, Approach 3 with the
Stream
API is elegant and easy to maintain.
Conclusion
Removing duplicates from an array is a common problem, and Java provides multiple ways to solve it. The right approach depends on the requirements—whether it’s simplicity, performance, or a modern functional style.
By understanding these approaches, you can choose the optimal solution for your specific use case.