Deleting duplicates in std::vector
The methods described are aimed at vectors storing primitive data types like int
, char
, float
, etc. The 5 methods that are described below are taken from here and they are:-
Introduction
- Using std::sort + std::unique
- Convert to a set using a constructor
- Convert to a set using manually
- Convert to an unordered set using a constructor
- Convert to an unordered set manually - The fastest of them all
Remove duplicates using std::sort and std::unique
We first need to know the syntax of std::sort
, std::unique
and std::vector::erase
.
-
std::sort
sorts the elements in the range[first, last)
in ascending order. Its syntax isstd::sort(begin(vec), end(vec))
where begin and end arestd::begin
andstd::end
respectively. The time complexity isO(NlogN)
-
std::unique
is an STL algorithm that provides the functionality to remove all consecutive duplicate elements from a given range. Hence,std::sort
precedesstd::unique
. But it is important to remember thatstd::unique
will not decrease the actual size of given range i.e. it will only overwrite the duplicate elements and returns an iterator to the new end of the range. Hence, we need to erase the rest of the array. Eg. 1,2,3,3,3,4,4,5,5 becomes 1,2,3,4,5,4,4,5,5 -
std::vector::erase
is a member function of the vector datatype. It can be used to delete an individual element or a range of elements in the vector.iterator erase (iterator position);
iterator erase (iterator first, iterator last);
By combining these three operations, we can delete all duplicate elements in an array.
vector<int> vec({1,1,3,2,5,4,5,5,2});
sort(begin(vec), end(vec));
vec.erase(unique(begin(vec), end(vec)), end(vec));
// vec will now be: 1, 2, 3, 4, 5
Convert to a set using a constructor
This method like the next few are relatively simple and intutive. The vector is copied into a set. A set can only contain unique elements. Any duplicates that are inserted are ignored. We then convert the set back into vector using std::vector::assign
which will replace the existing vector. We have to remember that std::set
is ordered, hence insertion will take O(NlogN)
time at worst. However, this is still better than using sort + unique.
vector<int> vec({1,1,3,2,5,4,5,5,2});
set<int> s(begin(vec), end(vec));
vec.assign(begin(s), end(s));
// vec will now be: 1, 2, 3, 4, 5
Convert to a set manually
The issue with using a constructor is it adds some additional overhead. The author of the stack overflow comment, which this post is based on, discovered that the constructor of std::set
and std::unordered_set
actually constructs a new node for every element, before checking its value to determine if it should actually be inserted. This can largely be avoided by populating the set manually.
vector<int> vec({1,1,3,2,5,4,5,5,2});
set<int> s;
for (int i : vec)
s.insert(i);
vec.assign( s.begin(), s.end() );
// vec will now be: 1, 2, 3, 4, 5
Convert to a unordered_set using a constructor
std::unordered set
as the name suggests does not store the elements in ascending order like std::set
does. Hence, insertion times will be far better.
vector<int> vec({1,1,3,2,5,4,5,5,2});
unordered_set<int> s(begin(vec), end(vec));
vec.assign( s.begin(), s.end() );
// vec will now be: 1, 2, 3, 4, 5
Convert to a unordered_set manually
Finally we arrive at the fasted performing method. Using an unordered_set and avoiding the overhead caused by the constructor. This method performs much faster than the others mentioned above even as the size of vector increases.
vector<int> vec({1,1,3,2,5,4,5,5,2});
unordered_set<int> s;
for (int i : vec)
s.insert(i);
vec.assign( s.begin(), s.end() );
// vec will now be: 1, 2, 3, 4, 5