Sunday, April 13, 2008

Remove duplicates from vector

Q. How do you remove duplicates from a vector?

A. This is how:

    template<typename T>
    void removeDuplicates(std::vector<T>& vec)
    {
        std::sort(vec.begin(), vec.end());
        vec.erase(std::unique(vec.begin(), vec.end()), vec.end());
    }

If I needed a container to keep itself populated of only unique values, I would probably choose std::set but then choosing the right container depends on so many other different factors as well.

Edit Comment : Corrected above by replacing remove with erase as there isn't such a member function in std::vector. std::erase, however, is and the code works with just that replacement.

9 comments:

Alex K said...

Hi Abhishek,

I just tried your solution to remove duplicates - as I was looking for that.

First of all remove is not a method of vector - it should be erase. remove can be used with std::remove. So how can you compile that?

Second: When I tried to compile with the erase() function even more errors appeared pointing to a file called "algorithm".

Can you give a working solution - I'm using Visuall C++ 2005 Express.

Regards, Alex

abnegator said...

Hi Alex,

Thanks for pointing out the error. Yes, it should be vec.erase since there isn't a member remove(). I will correct the blog post. As for your problem, the following should compile fine:

//------CODE-------
#include <iostream>
#include <vector>
#include <algorithm>

void removeDuplicates(std::vector<int>& vec)
{

   std::sort(vec.begin(), vec.end());
   vec.erase(std::unique(vec.begin(), vec.end()), vec.end());
}

int main()
{

   std::vector<int> vec(10);
   vec.push_back(20); vec.push_back(20);
   vec.push_back(30); vec.push_back(20);
   std::copy(vec.begin(), vec.end(), std::ostream_iterator<int>(std::cout, "\t"));
   removeDuplicates(vec);
   std::cout << std::endl;
   std::copy(vec.begin(), vec.end(), std::ostream_iterator<int>(std::cout, "\t"));
}
//---------CODE---------

Cheers!

Alex K said...

Hi again,

thx for your effort :) . In VC++ 2005 it works.

i didn't know that copying to an output-stream is so easy - nice version.

But there is still one thing i don't get. I have a special type I use in the vector, so the vector declaration looks like this:

typedef std::vector<alex::color> my_vec_t;

my_vec_t v;

Well, if I use your code with my type in the vector there are 18 erros pointing to the file "algoritm" (as before) - so what is the key to the solution aka door ^^?

Why does it only work with int? Any idea?

thx in advance,

Rgds, Alex

abnegator said...

Hi Alex,

Do you have operator< and operator== defined for your class? Those two would be needed. operator< for std::sort and operator== for std::unique.

For fundamental types like an "int" those operators are in-built but for the user defined classes they should be implemented. Usually, you should be able to find which operator is needed for which algorithm from the documentation. Sometimes though, it might not be as clear.

Hope this helps.

Cheers!

Alex K said...

Hi Abhishek,

actually I have a user-defined operator created as a struct - implemented with a "bigger than" operator (which should be no problem).

The code for such a user-defined sort is like this, right?:

std::sort(v.begin(), v.end(),userDefinedOperator);

this is how it worked for me.

But I have no idea how to code a "== operator".

I think the problem is that there's code from my lecturer at university which is a template to work with as a basis.

The error I get is that "the bool std::operator ==(...) could not be derived from alex::color".

So in fact it's a problem of implementation (as you said) - well I think I'm gonna stop trying to code as it is just an extra task to do - i think i've done more than i had to actually ;)

Thx for your time & effort,

Greets, Alex

Angelo said...

I didn't know about std::unique before... So this was a good illustration. However, I would like to point out that this code is quite inefficient because it employs sorting of the vector elements. Another problem is that the ordering of the vector elements will be lost once sorting is done. This may not be feasible in some cases.

Arthur said...

std::vector::erase() is also quite inefficient. If the first element is erased, every remaining element will be copied to its new position (an O(N) operation). This makes the remove duplicates O(N^2) because of erase...so the sort (O(N log N)) isn't even the biggest potential bottleneck.

If the original ordering does not matter, the following is an alternate approach that's a bit faster:

if (!v.empty())
{
std::vector u;
u.reserve(std::max(v.size() - guessAtNumberOfDuplicates, 1));

std::sort(v.begin(), v.end());

u.push_back(v.front());
for (std::vector::iterator it = v.begin() + 1; it != v.end(); ++it)
{
if (u.back() != *it)
u.push_back(*it);
}
u.swap(v);
}

(Sorry, blogger has removed the indentation and the template T on vector!)

admin said...

Apologies - I misread the example. erase() will only be called once in the original post. It will not be O(N^2) as I previously suggested. The erase(unique()) approach is likely to be as efficient as the vector copy approach.

jesus.dallas said...

amigo,

i found your example very useful.

thank you for your help,
jh, in dallas tx.