How to Find Duplicate Elements in a Stream in Java

When working with collections in Java, a common task is finding duplicate elements. With the introduction of the Stream API in Java 8, this kind of operation can be written concisely and readably. In this blog post, I’ll walk you through three ways to find duplicate elements in a stream using Java, with explanations and code examples.



Why Streams?

The Stream API allows developers to process sequences of elements in a functional style. It abstracts away the underlying mechanics of iteration, focusing on what you want to achieve rather than how to achieve it.

Problem Overview

Let’s say we have a list of integers, and we want to find all the elements that appear more than once in it.

Approach 1: Using a Set and a Filter

One of the simplest ways to find duplicates is by leveraging two sets:

  • A set, seen, to store every element encountered so far.
  • A set, duplicates, to store the elements that turn up again.

The logic is straightforward: if an element is already present in the seen set, it’s a duplicate and should be added to the duplicates set.

Code Example:


import java.util.*;
import java.util.stream.Collectors;

public class FindDuplicates {
    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 1, 2, 6, 7, 3);

        // Tracks every element encountered so far
        Set<Integer> seen = new HashSet<>();

        // Set.add returns false when the element is already present,
        // so only repeated elements pass the filter
        Set<Integer> duplicates = numbers.stream()
                                         .filter(n -> !seen.add(n))
                                         .collect(Collectors.toSet());

        System.out.println("Duplicates: " + duplicates);
    }
}

Explanation:

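We start by initializing a set seen to store the elements encountered so far. Inside filter, seen.add(n) returns true the first time an element is added and false if the element is already in the set. Negating that result means only repeated elements pass the filter, and they are collected into the result set (duplicates). Because the result is a Set, an element that appears three or more times is still reported only once.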

Output:


Duplicates: [1, 2, 3]
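
The same idea can be wrapped in a small generic helper so it works for any element type. This is only a minimal sketch: the class name DuplicateFinder and the method findDuplicates are names I made up for illustration, and collecting into a LinkedHashSet is my own assumption for cases where you want the duplicates reported in the order they were detected.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper class, not part of the JDK
class DuplicateFinder {

    // Returns each duplicated element once, in the order in which
    // its second occurrence is encountered.
    static <T> Set<T> findDuplicates(List<T> items) {
        Set<T> seen = new HashSet<>();
        return items.stream() // sequential stream only: 'seen' is not thread-safe
                .filter(item -> !seen.add(item))
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("b", "a", "b", "c", "a", "b");
        System.out.println("Duplicates: " + findDuplicates(words)); // Duplicates: [b, a]
    }
}
```

Note that the lambda mutates the seen set, so this version is best kept to sequential streams.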

Approach 2: Using Collectors.groupingBy()

Another method uses Collectors.groupingBy() to group elements by their value and count each group, then keeps only the elements that occur more than once.

Code Example:


import java.util.*;
import java.util.function.Function;
import java.util.stream.Collectors;

public class FindDuplicates {
    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 1, 2, 6, 7, 3);

        Set<Integer> duplicates = numbers.stream()
                                         // Build a Map<Integer, Long> of element -> occurrence count
                                         .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
                                         .entrySet()
                                         .stream()
                                         // Keep only elements seen more than once
                                         .filter(entry -> entry.getValue() > 1)
                                         .map(Map.Entry::getKey)
                                         .collect(Collectors.toSet());

        System.out.println("Duplicates: " + duplicates);
    }
}

Explanation:

We first use groupingBy(Function.identity(), Collectors.counting()) to create a map where the keys are the elements and the values are their counts. We then filter the entries where the count is greater than 1, meaning they are duplicates. Finally, we collect the keys (the duplicate elements) into a set.

Output:


Duplicates: [1, 2, 3]
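
If you also want to know how many times each duplicate occurs, the count map can be kept instead of reduced to a set. Here is a rough sketch under a couple of assumptions of mine: the class DuplicateCounts and the method duplicateCounts are illustrative names, and I pass LinkedHashMap::new to the three-argument groupingBy overload so the keys stay in order of first appearance.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper class, not part of the JDK
class DuplicateCounts {

    // Maps each duplicated element to its number of occurrences,
    // keyed in order of first appearance.
    static <T> Map<T, Long> duplicateCounts(List<T> items) {
        Map<T, Long> counts = items.stream()
                .collect(Collectors.groupingBy(
                        Function.identity(),
                        LinkedHashMap::new,      // preserves insertion order
                        Collectors.counting()));
        // Drop every element that occurs only once
        counts.values().removeIf(count -> count < 2);
        return counts;
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 1, 2, 6, 7, 3);
        System.out.println("Duplicate counts: " + duplicateCounts(numbers)); // Duplicate counts: {1=2, 2=2, 3=2}
    }
}
```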

Approach 3: Using a Map for Counting

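This is a variation on the groupingBy() approach: instead of chaining everything into one pipeline, we first store the counts in an explicit Map variable and then stream over its entries in a second step. Keeping the map around is handy if you need the counts for something else as well.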

Code Example:


import java.util.*;
import java.util.stream.Collectors;

public class FindDuplicates {
    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 1, 2, 6, 7, 3);

        // Step 1: count how often each element occurs
        Map<Integer, Long> elementCountMap = numbers.stream()
                                                    .collect(Collectors.groupingBy(n -> n, Collectors.counting()));

        // Step 2: keep the keys whose count is greater than 1
        Set<Integer> duplicates = elementCountMap.entrySet()
                                                 .stream()
                                                 .filter(entry -> entry.getValue() > 1)
                                                 .map(Map.Entry::getKey)
                                                 .collect(Collectors.toSet());

        System.out.println("Duplicates: " + duplicates);
    }
}

Explanation:

We use groupingBy to create a Map<Integer, Long> that holds the element as the key and its count as the value. We then filter entries where the count is greater than 1 and collect the duplicate elements into a Set.

Output:


Duplicates: [1, 2, 3]
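
For comparison, the count map can also be filled by hand with a plain loop and Map.merge, leaving only the filtering step to the Stream API. This is a sketch of that alternative rather than the code shown above; the class name ManualCount and the method findDuplicates are hypothetical names chosen for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper class, not part of the JDK
class ManualCount {

    static <T> Set<T> findDuplicates(List<T> items) {
        // Count occurrences by hand: merge inserts 1 for a new key,
        // otherwise adds 1 to the existing count.
        Map<T, Integer> counts = new HashMap<>();
        for (T item : items) {
            counts.merge(item, 1, Integer::sum);
        }

        // Keep only the keys that were counted more than once
        return counts.entrySet().stream()
                .filter(entry -> entry.getValue() > 1)
                .map(Map.Entry::getKey)
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 1, 2, 6, 7, 3);
        System.out.println("Duplicates: " + findDuplicates(numbers));
    }
}
```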

Conclusion

Finding duplicate elements in a stream in Java can be accomplished in several ways, each with its pros and cons. The Set-based approach is short and easy to follow, but it relies on a lambda that mutates external state, so it should only be used with sequential streams. Collectors.groupingBy() takes a little more code, but it also gives you the number of occurrences of each element and does not depend on side effects. Depending on your specific use case, you can choose the method that best fits your needs.

Happy Coding!
