Efficient Java Program to Remove Duplicate Elements from an Array

As senior Java developers, we frequently come across scenarios where we need to process arrays and eliminate duplicate elements. This task might seem simple, but choosing an optimal approach is crucial for handling large datasets effectively.

In this blog post, I’ll walk you through different approaches to removing duplicates from an array in Java, focusing on simplicity, readability, and performance.


Problem Statement:

Given an array of integers, the objective is to remove the duplicate elements and return an array with only unique elements, while preserving the order of first occurrences.

Example:

Input:
int[] arr = {4, 2, 2, 3, 4, 5, 3}

Output:
int[] uniqueArr = {4, 2, 3, 5}


Approach 1: Using Set to Remove Duplicates

Java’s Set interface provides a straightforward way to eliminate duplicates, as it inherently stores only unique elements. Here's an implementation using LinkedHashSet, which preserves the order of insertion.

Code:


import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

public class RemoveDuplicates {
    public static int[] removeDuplicatesUsingSet(int[] arr) {
        Set<Integer> uniqueElements = new LinkedHashSet<>();
        for (int num : arr) {
            uniqueElements.add(num);
        }
        // Convert Set back to an array
        int[] result = new int[uniqueElements.size()];
        int i = 0;
        for (Integer num : uniqueElements) {
            result[i++] = num;
        }
        return result;
    }

    public static void main(String[] args) {
        int[] arr = {4, 2, 2, 3, 4, 5, 3};
        int[] result = removeDuplicatesUsingSet(arr);
        System.out.println("Array with unique elements: " + Arrays.toString(result));
    }
}

Output:


Array with unique elements: [4, 2, 3, 5]

Explanation:

  • A LinkedHashSet is used to store the elements of the array. Since sets do not allow duplicates, this automatically removes any repeated elements.
  • LinkedHashSet maintains the order of insertion, which ensures the first occurrence of an element is retained.

Time Complexity: O(n), where n is the length of the array. Adding an element to a Set is generally O(1), and we iterate over the array once.
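
The same technique carries over to arrays of reference types, where LinkedHashSet relies on the element type's equals() and hashCode() to detect duplicates. Below is a minimal sketch for a String[] input (the class and method names are purely illustrative):

import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

public class RemoveDuplicateStrings {
    // Same LinkedHashSet technique, applied to a reference-type array.
    // Duplicates are detected via equals()/hashCode() of the element type.
    public static String[] removeDuplicates(String[] arr) {
        Set<String> unique = new LinkedHashSet<>(Arrays.asList(arr));
        return unique.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] arr = {"java", "set", "java", "array", "set"};
        System.out.println(Arrays.toString(removeDuplicates(arr)));
        // Prints: [java, set, array]
    }
}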


Approach 2: Using a Temporary Array (Brute Force)

While sets offer a clean and efficient solution, we can also solve this problem without any collection classes. This approach first sorts the array and then scans it, copying one representative of each run of equal values into a temporary array. Note that because the array must be sorted, the original order of first occurrences is not preserved.

Code:


import java.util.Arrays;

public class RemoveDuplicates {
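    // Note: this method assumes the input array is already sorted,
    // since duplicates are detected by comparing adjacent elements.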
    public static int[] removeDuplicatesBruteForce(int[] arr) {
        int n = arr.length;
        if (n == 0 || n == 1) {
            return arr;
        }
        // Temporary array to store unique elements
        int[] temp = new int[n];
        int j = 0;

        for (int i = 0; i < n - 1; i++) {
            if (arr[i] != arr[i + 1]) {
                temp[j++] = arr[i];
            }
        }
        temp[j++] = arr[n - 1];

        // Copy the unique elements back to a new array of the correct size
        return Arrays.copyOf(temp, j);
    }

    public static void main(String[] args) {
        int[] arr = {4, 2, 2, 3, 4, 5, 3};
        Arrays.sort(arr); // Array needs to be sorted for this approach to work
        int[] result = removeDuplicatesBruteForce(arr);
        System.out.println("Array with unique elements: " + Arrays.toString(result));
    }
}

Output:


Array with unique elements: [2, 3, 4, 5]

Explanation:

  • The input array is sorted first, then scanned: an element is copied into the temporary array only when it differs from its successor, so exactly one copy of each distinct value is kept (the final element is always copied).
  • Finally, Arrays.copyOf trims the temporary array to a new array containing only the unique elements.
  • Because the array has to be sorted, the result is in ascending order rather than in the order of first occurrences from the original input.

Time Complexity: O(n log n) due to the sorting step, followed by O(n) for the array scan. Overall complexity is O(n log n).
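
Since this approach already requires a sorted array, the temporary array can even be avoided: the classic two-pointer variant overwrites duplicates in place and then trims the result. A minimal sketch, assuming the input is already sorted (class and method names are illustrative):

import java.util.Arrays;

public class RemoveDuplicatesInPlace {
    // Two-pointer variant for a sorted array: each distinct value is written
    // forward into the same array, then the unique prefix is copied out.
    public static int[] removeDuplicatesSorted(int[] arr) {
        if (arr.length <= 1) {
            return Arrays.copyOf(arr, arr.length);
        }
        int write = 1; // arr[0] is always kept
        for (int read = 1; read < arr.length; read++) {
            if (arr[read] != arr[write - 1]) {
                arr[write++] = arr[read];
            }
        }
        return Arrays.copyOf(arr, write);
    }

    public static void main(String[] args) {
        int[] arr = {2, 2, 3, 3, 4, 4, 5};
        System.out.println(Arrays.toString(removeDuplicatesSorted(arr)));
        // Prints: [2, 3, 4, 5]
    }
}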


Approach 3: Using Streams (Java 8+)

In modern Java, the Stream API offers a highly readable and concise solution. Using a combination of distinct() and toArray(), we can remove duplicates with minimal boilerplate.

Code:


import java.util.Arrays;

public class RemoveDuplicates {
    public static int[] removeDuplicatesUsingStreams(int[] arr) {
        return Arrays.stream(arr)
                     .distinct()
                     .toArray();
    }

    public static void main(String[] args) {
        int[] arr = {4, 2, 2, 3, 4, 5, 3};
        int[] result = removeDuplicatesUsingStreams(arr);
        System.out.println("Array with unique elements: " + Arrays.toString(result));
    }
}

Output:


Array with unique elements: [4, 2, 3, 5]

Explanation:

  • Arrays.stream(arr) converts the array into an IntStream.
  • distinct() removes duplicates by internally tracking the elements it has already seen, keeping the first occurrence of each value in encounter order.
  • toArray() converts the stream back to an int[].

Time Complexity: O(n), as the stream’s distinct() operation internally leverages a hash-based set to filter out duplicates efficiently.
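
The same one-liner style works for object streams as well, for example when deduplicating strings; Stream.distinct() likewise keeps the first occurrence of each element in encounter order. A minimal sketch:

import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class DistinctStrings {
    public static void main(String[] args) {
        List<String> names = Arrays.asList("ann", "bob", "ann", "carl", "bob");
        // distinct() keeps the first occurrence of each element in encounter order
        List<String> unique = names.stream()
                                   .distinct()
                                   .collect(Collectors.toList());
        System.out.println(unique); // [ann, bob, carl]
    }
}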


Which Approach to Choose?

  • For simplicity and performance in most cases, Approach 1 (using a LinkedHashSet) is a solid choice. It offers O(n) time complexity, keeps the implementation simple, and preserves the order of first occurrences (see the rough timing sketch after this list).
  • Approach 2 avoids collection classes and autoboxing altogether, but it requires sorting the array first, so it runs in O(n log n) and does not preserve the original order.
  • If you are working with Java 8+ and prefer concise, functional-style code, Approach 3 with the Stream API is elegant and easy to maintain.
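
If you want to sanity-check these trade-offs on your own data sizes, a rough timing harness like the one below can help. This is only a sketch: it assumes the methods from the previous sections are combined into a single RemoveDuplicates class, and naive System.nanoTime() measurements are skewed by JIT warm-up and garbage collection, so prefer a dedicated tool such as JMH for serious benchmarking.

import java.util.Arrays;
import java.util.Random;
import java.util.function.UnaryOperator;

public class DedupTiming {
    // Rough timing helper: runs a dedup function several times over a fresh copy
    // of the data and reports the best observed time in nanoseconds.
    static long bestTimeNanos(UnaryOperator<int[]> dedup, int[] data, int rounds) {
        long best = Long.MAX_VALUE;
        for (int r = 0; r < rounds; r++) {
            int[] copy = Arrays.copyOf(data, data.length); // copying is not timed
            long start = System.nanoTime();
            dedup.apply(copy);
            best = Math.min(best, System.nanoTime() - start);
        }
        return best;
    }

    public static void main(String[] args) {
        // One million random ints with plenty of duplicates
        int[] data = new Random(42).ints(1_000_000, 0, 10_000).toArray();
        System.out.printf("Set:    %.1f ms%n",
                bestTimeNanos(RemoveDuplicates::removeDuplicatesUsingSet, data, 10) / 1e6);
        System.out.printf("Stream: %.1f ms%n",
                bestTimeNanos(RemoveDuplicates::removeDuplicatesUsingStreams, data, 10) / 1e6);
    }
}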

Conclusion

Removing duplicates from an array is a common problem, and Java provides multiple ways to solve it. The right approach depends on the requirements—whether it’s simplicity, performance, or a modern functional style.

By understanding these approaches, you can choose the optimal solution for your specific use case.
