Learning C++: boost


Saturday, April 19, 2008

boost::tokenizer and BOOST_FOREACH

I looked through the documentation to see what the foreach macro supports. The list did not include boost::tokenizer, which was expected, but one statement suggested that it should work: "The support for STL containers is very general; anything that looks like an STL container counts. If it has nested iterator and const_iterator types and begin() and end() member functions, BOOST_FOREACH will automatically know how to iterate over it."

So I tried it out, and yes, it worked. The following compiled and ran fine with VC++ 2005. The code should be easy to follow: it reads a file line by line and tokenizes each line, treating spaces and punctuation as separators.

[code]
    //standard headers
    #include <iostream>
    #include <fstream>
    #include <string>

    //boost headers
    #include <boost/tokenizer.hpp>
    #include <boost/foreach.hpp>

    //make BOOST_FOREACH prettier! ;-) :-)
    #define foreach BOOST_FOREACH

    void tokenizeLines(std::istream& inputStream)
    {
        if(inputStream)
        {
            std::string line;
            while(std::getline(inputStream, line, '\n'))
            {
                boost::tokenizer<> tokens(line);
                foreach(const std::string & str, tokens)
                {
                    std::cout << str << "\n";
                }
            }
        }
        else
        {
            std::cerr << "Error: Invalid stream object\n";
        }
    }

    int main()
    {
        std::ifstream inputStream("C:\\testfiles\\myfile.txt");
        tokenizeLines(inputStream);
    }
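The rule quoted from the documentation can be illustrated without Boost. Below is a minimal sketch, assuming a C++11 compiler: WordList and totalLength are names I made up for illustration, and the range-based for loop stands in for BOOST_FOREACH, since both only need begin() and end() on something that looks like a container.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical class for illustration: it is not an STL container, but it
// exposes the nested iterator types and begin()/end() member functions that
// the quoted documentation asks for.
class WordList
{
public:
    typedef std::vector<std::string>::iterator iterator;
    typedef std::vector<std::string>::const_iterator const_iterator;

    void add(const std::string& word) { words_.push_back(word); }

    iterator begin() { return words_.begin(); }
    iterator end() { return words_.end(); }
    const_iterator begin() const { return words_.begin(); }
    const_iterator end() const { return words_.end(); }

private:
    std::vector<std::string> words_;
};

// Sums the lengths of all words. The range-based for loop is a stand-in
// for BOOST_FOREACH here (an assumption made to keep the sketch Boost-free).
inline std::size_t totalLength(const WordList& list)
{
    std::size_t total = 0;
    for (const std::string& word : list)
    {
        total += word.size();
    }
    return total;
}
```

boost::tokenizer fits the same shape, which is presumably why the macro accepted it.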

Wednesday, April 16, 2008

Case insensitive string comparison

While looking for a case-insensitive comparison function, I came across this nice article by Matt Austern: Case Insensitive String Comparison. I decided to try writing a sample that does that. Below is the result. I am not sure that it is perfect and free of issues, but it is the best I could do. At least it is better than ignoring the locale completely! Yes, ignoring it works most of the time because it is rarely needed, but what would you do if the need did come up?

I tested the code with VS 2005. I could not test it with gcc because my installation had no support for the German locale, the only sample strings I had were the German ones from the article above, and I felt too lazy to find more samples. If you find an issue with words from another language for which your compiler provides locale support, please point it out.

Code:

    #include<iostream>
    #include<locale>
    #include<string>
    #include<algorithm>
    #include<functional>
    #include<iterator> //for std::back_inserter
    //used boost::bind due to buggy bind2nd
    //#include<tr1/bind.hpp>
    #include<boost/bind.hpp>

    //using namespace std;
    //using namespace std::tr1;
    //using namespace std::tr1::placeholders;

    using namespace boost;

    struct CaseInsensitiveCompare
    {
        bool operator()(const std::string& lhs, const std::string& rhs) const
        {
            std::string lhs_lower;
            std::string rhs_lower;
            //std::transform(lhs.begin(), lhs.end(), std::back_inserter(lhs_lower), std::bind2nd(std::ptr_fun(std::tolower<char>), loc));
            //std::transform(rhs.begin(), rhs.end(), std::back_inserter(rhs_lower), std::bind2nd(std::ptr_fun(std::tolower<char>), loc));
            std::transform(lhs.begin(), lhs.end(), std::back_inserter(lhs_lower), bind(std::tolower<char>, _1, loc));
            std::transform(rhs.begin(), rhs.end(), std::back_inserter(rhs_lower), bind(std::tolower<char>, _1, loc));
            return lhs_lower < rhs_lower;
        }
        CaseInsensitiveCompare(const std::locale& loc_): loc(loc_){}
    private:
        std::locale loc;
    };

    int main()
    {
        std::string lhs = "GEW\334RZTRAMINER";
        std::string rhs = "gew\374rztraminer";
        std::cout << "lhs : " << lhs << std::endl;
        std::cout << "rhs : " << rhs << std::endl;
        CaseInsensitiveCompare cis((std::locale("German_germany")));
        //CaseInsensitiveCompare cis((std::locale()));
        std::cout << "compare result : " << cis(lhs,rhs) << std::endl;
    }

One obvious improvement is to avoid copying the strings and instead convert and compare one character at a time with tolower/toupper. This has a second advantage: the comparison stops at the first mismatching character, without converting both strings in full to a common case.

Herb Sutter, in one of his GotW articles, writes about a case-insensitive string class and also shows the basic problems it has. Simply put, it does not work with iostreams (cout/cerr etc.). Here: Strings: A case insensitive string class.
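A minimal sketch of that per-character idea, assuming a C++11 compiler (for the lambda). lessIgnoreCase is a name I made up for illustration; the sketch uses std::lexicographical_compare from the standard library, which stops at the first position where the two strings differ.

```cpp
#include <algorithm>
#include <locale>
#include <string>

// Hypothetical helper (not from the sample above): returns true when lhs
// sorts before rhs, ignoring case, using the given locale's tolower.
inline bool lessIgnoreCase(const std::string& lhs, const std::string& rhs,
                           const std::locale& loc = std::locale())
{
    return std::lexicographical_compare(
        lhs.begin(), lhs.end(), rhs.begin(), rhs.end(),
        [&loc](char a, char b) {
            // Lower-case one character from each side and compare them;
            // no temporary lower-cased copies of the strings are made.
            return std::tolower(a, loc) < std::tolower(b, loc);
        });
}
```

Like the sample above, this works one char at a time, so it inherits the same limits for multi-byte encodings.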

Friday, September 14, 2007

boost::any

boost::any is a powerful concept and a much better replacement for void* when you need to hold data of any type. You can also build heterogeneous containers with it. Let us see how it works, in a very simplified way. The idea is to have a class template that can wrap a value of any type. Something like this:

template<typename T>
class HoldData
{
    T t;
};

Next, give this wrapper a base class to derive from, and add a constructor, which requires the stored type to be copy constructible. The above becomes:

class BaseHolder
{
    public:
        virtual ~BaseHolder(){}
};

template<typename T>
class HoldData : public BaseHolder
{
    public:
        HoldData(const T& t_) : t(t_){}
    private:
        T t;
};

Now add a class, call it Variant, that accepts input of any type and holds a pointer to the wrapper's base type. So now you have (including the classes above):

class BaseHolder
{
    public:
        virtual ~BaseHolder(){}
};

template<typename T>
class HoldData : public BaseHolder
{
    public:
        HoldData(const T& t_) : t(t_){}
    private:
        T t;
};

class Variant
{
    public:
        template<typename T>
        Variant(const T& t) : data(new HoldData<T>(t)){}
        ~Variant(){delete data;}
    private:
        BaseHolder* data;
};

You construct the wrapper object for the corresponding type and save a pointer to it in another class, here called Variant, that can hold any data type and help retrieve it without losing the type information. That is essentially what boost::any does. Take a look at the code here - boost::any code.

The documentation on it can be found here - boost::any documentation.
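The Variant above can store a value but cannot give it back, and copying it would delete the same holder twice. Here is a hedged sketch of how both gaps could be closed, assuming a C++11 compiler. The clone() and get<T>() members are my additions for illustration; boost::any itself retrieves values through any_cast, whereas this sketch uses dynamic_cast as a simpler stand-in.

```cpp
#include <string>

class BaseHolder
{
public:
    virtual ~BaseHolder() {}
    // Added for the sketch: lets Variant make deep copies of its holder.
    virtual BaseHolder* clone() const = 0;
};

template<typename T>
class HoldData : public BaseHolder
{
public:
    explicit HoldData(const T& t_) : t(t_) {}
    BaseHolder* clone() const { return new HoldData<T>(t); }
    T t; // public here so that Variant::get can reach it
};

class Variant
{
public:
    template<typename T>
    Variant(const T& t) : data(new HoldData<T>(t)) {}

    // Deep copy, so two Variants never own the same holder.
    Variant(const Variant& other) : data(other.data->clone()) {}

    Variant& operator=(const Variant& other)
    {
        if (this != &other)
        {
            BaseHolder* copy = other.data->clone();
            delete data;
            data = copy;
        }
        return *this;
    }

    ~Variant() { delete data; }

    // Returns a pointer to the stored value if it really is a T,
    // or a null pointer otherwise.
    template<typename T>
    const T* get() const
    {
        const HoldData<T>* holder = dynamic_cast<const HoldData<T>*>(data);
        return holder ? &holder->t : nullptr;
    }

private:
    BaseHolder* data;
};
```

With this, Variant v(42); gives a non-null v.get<int>() and a null v.get<std::string>(), so asking for the wrong type is detected instead of silently misread as it would be with void*.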

Wednesday, July 11, 2007

Handling dynamic array allocations

One of the recent questions on the forums that I answered was about using std::auto_ptr<> with allocations from the array form of new. The question was whether the following would work:

std::auto_ptr<int> ptr(new int[100]);

If not, why not?

The answer is simple once you know what std::auto_ptr<> is good for and what its deleter is. std::auto_ptr<> uses the delete operator to destroy the owned object (a dynamic allocation made via the new operator). So it comes down to mixing the array form of new with plain delete, which is undefined behaviour as per the standard. You can read more about these operators here - Free New, Delete Malloc

By design, the standard auto_ptr<> smart pointer is not suitable for dynamic array allocations. There are several alternatives, though. A few of them are listed below:

Avoid self memory management and instead use std::vector<> / std::deque<> for dynamic arrays.
Use boost::scoped_array<> or boost::shared_array<>, or boost::shared_ptr<> with a custom deleter (delete[]) passed in.
Write your own smart pointer, auto_ptr_array<>, with the same implementation as std::auto_ptr<> except that it uses the array form of delete.

For the second point above - you can read more about custom deleters and shared_ptr<> here - Custom deleters with smart pointers
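The first two alternatives can be sketched as follows. This is a minimal example under stated assumptions: it uses std::shared_ptr with std::default_delete from C++11 rather than boost::shared_ptr, purely so that it needs only the standard library, and Tracked is a hypothetical element type that counts live objects so the cleanup is observable.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical element type that counts live instances, so the effect of
// the array cleanup can be observed.
struct Tracked
{
    static int alive;
    Tracked() { ++alive; }
    Tracked(const Tracked&) { ++alive; }
    ~Tracked() { --alive; }
};
int Tracked::alive = 0;

// Alternative 1: let std::vector own the dynamic array.
inline int aliveInsideVector(std::size_t n)
{
    std::vector<Tracked> items(n);
    return Tracked::alive;
}   // the vector destroys all n elements here

// Alternative 2: a shared pointer with an array deleter, so that delete[]
// (not delete) is applied to the pointer that came from new[].
inline int aliveInsideSharedPtr(std::size_t n)
{
    std::shared_ptr<Tracked> items(new Tracked[n],
                                   std::default_delete<Tracked[]>());
    return Tracked::alive;
}   // the deleter runs delete[] here, destroying all n elements
```

After either function returns, Tracked::alive is back to 0, which is the behaviour std::auto_ptr<> cannot guarantee for arrays.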

A boost::shared_array<>/boost::shared_ptr<> might not be the right choice if you do not want the reference-count overhead of shared ownership, which should not be needed when auto_ptr<> is what you had in mind. If that overhead is not a concern, it should work fine.

Another point of confusion was: why doesn't std::auto_ptr<> call delete[] instead of delete? Again, because it is not intended for array allocations. If it did call delete[], it would not work for single objects created on the free store, because of the way delete[] might be implemented by the compiler.

There are many ways in which delete[] could be implemented. Two of them are explained here - How does delete[] know how many elements to destroy? delete[] must know how many elements to destroy, particularly because it is also used with non-POD types, where the destructor has to be called for each element of the array.

The first method listed in the C++ FAQ Lite over-allocates to remember, at run time, the size you asked for. If you used delete[] on a single-object allocation, the compiler might interpret the block just before the pointer as the size. Since that block was never allocated in the first place (new[] was not used!), expect anything to happen: a crash or any other form of runtime error (depending on how that error surfaces on a particular system).

For the second method, where no array-length association would exist for the pointer, anything could happen, depending on a number of factors in the compiler implementation. One possibility is that the size variable is left uninitialized (yes, that is a bad thing and may not actually happen, but who knows for sure about all compilers? We are only talking about possibilities here), in which case the size could be anything. Even a size of 2 goes beyond the single object you allocated and can again cause any kind of fault in the system or its free store. If the size is initialized to 0, the memory could be left untouched and hence lost, resulting in a leak. Memory leaks can be really dangerous, particularly on systems that do not reclaim a process's leaked blocks.

So, when the standard says it is undefined behaviour, it can result in anything. Worse could happen, and speculating about these issues beyond the statement "undefined behaviour" is rather pointless. One could write many such "interesting" stories. The key is: use std::auto_ptr<> for what it is suited for, and use delete/delete[] for what they are suited for. Mixing them is really dangerous.
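For the third alternative, here is a rough sketch of an array owner without a reference count, assuming a C++11 compiler. ScopedArray and sumOfOwnedArray are names I made up; unlike std::auto_ptr<>, this sketch simply forbids copying instead of transferring ownership, which makes it closer in spirit to boost::scoped_array<>.

```cpp
#include <cstddef>

// Hypothetical minimal array owner: sole ownership, no reference count,
// and it releases the array with delete[] instead of delete.
template<typename T>
class ScopedArray
{
public:
    explicit ScopedArray(T* p = nullptr) : ptr_(p) {}
    ~ScopedArray() { delete[] ptr_; }

    T& operator[](std::size_t i) const { return ptr_[i]; }
    T* get() const { return ptr_; }

    // Copying is disabled so that two owners never delete the same array.
    ScopedArray(const ScopedArray&) = delete;
    ScopedArray& operator=(const ScopedArray&) = delete;

private:
    T* ptr_;
};

// Fills an owned array with 0..n-1 and returns the sum of its elements.
inline int sumOfOwnedArray(std::size_t n)
{
    ScopedArray<int> values(new int[n]);
    int sum = 0;
    for (std::size_t i = 0; i < n; ++i)
    {
        values[i] = static_cast<int>(i);
        sum += values[i];
    }
    return sum;
}   // ~ScopedArray runs delete[] here
```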

There is a peculiarity to note, though. If you allocate a single element using the array form of new, you can (and must) use delete[] for that single element. That is:

int * ptr = new int[1];
delete[] ptr; //OK

Well, not really a peculiarity, but yeah, something to take note of.

Forum reference : What's wrong with std::auto_ptr<int> ptr(new int[100]);?

Monday, May 07, 2007

Custom deleters with smart pointers

There is just one smart pointer in the standard C++ library as of now (excluding TR1 and proposals): std::auto_ptr.

It is crucial to know that you can only use it for allocations made via the new operator. It does not handle allocations made via the array form of new, malloc, or any other allocation routine, for that matter.

How can you extend it to work with those? Basically, you can't. There is an option, though: write your own versions of auto_ptr specific to those cases, with the delete call replaced by free, delete[], or the relevant deleter. This means writing multiple classes and using each as appropriate. There is another way to extend it, but before that let's look at how boost::shared_ptr handles it.

Boost shared_ptr has the concept of a custom deleter. The deleter operates on the pointer when the reference count drops to 0 (remember, the ownership is shared!), and its effect should be that of freeing the resource. The following is an illustration:

[CODE]

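#include <cstdlib>
#include <boost/shared_ptr.hpp>

//deleter for memory obtained from malloc
template<typename T>
struct mallocDeleter
{
    void operator() (T* ptr) const
    {
        std::free(ptr); //free(NULL) is a no-op, so no check is needed
    }
};
void somefunction()
{
    MyType* ptr = getAllocatedPointer();//returns pointer to memory allocated by malloc
    //the deleter is a constructor argument, not a template parameter
    boost::shared_ptr<MyType> object(ptr, mallocDeleter<MyType>());
}

//When the scope ends, shared_ptr's destructor is called, which in turn calls mallocDeleter<MyType>()(ptr). The default deleter is "delete".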

Notice how this expresses the flexibility. You could acquire any other resource, wrap it inside a shared_ptr, and provide a deleter to free that resource. For example, a file handle, on which the custom deleter calls close (for C++ fstream objects you do not need that, as they already use RAII). Or a db connection handle, for which the deleter closes the connection. Just about anything that is manually allocated! It is amazing how the ability to provide the deleter makes this smartness so generic.

I wonder why auto_ptr and boost scoped_ptr do not provide this ability. RAII is truly magic and takes care of resource leak issues so well.
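To make the handle case concrete, here is a small sketch. Everything in it is an assumption for illustration: Connection, openConnection and closeConnection are made-up stand-ins for a C-style handle API, and std::shared_ptr from C++11 is used in place of boost::shared_ptr so that the example needs only the standard library.

```cpp
#include <memory>

// Hypothetical resource and C-style API, standing in for a file or
// database handle. None of these names come from a real library.
struct Connection
{
    bool open;
};

int closedCount = 0; // counts how many times a connection was closed

inline Connection* openConnection()
{
    Connection* c = new Connection;
    c->open = true;
    return c;
}

inline void closeConnection(Connection* c)
{
    if (c)
    {
        c->open = false;
        ++closedCount;
        delete c;
    }
}

// Wraps the raw handle so that closeConnection runs exactly once, when the
// last shared_ptr owning the handle goes away.
inline std::shared_ptr<Connection> makeConnection()
{
    return std::shared_ptr<Connection>(openConnection(), closeConnection);
}
```

Copies of the returned shared_ptr share the same handle, and closeConnection is called only when the last copy is destroyed, so callers never have to remember to close it themselves.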

Cheers!
