Tuesday, July 03, 2007

Pointers or references (function arguments)

There is always a long discussion whenever a choice has to be made between a pointer and a reference, be it when deciding function arguments (part of the interface) or when creating objects and working with them.

The fundamental difference is the NULLability of pointers, which usually helps decide whether a reference or a pointer should be used. Some people are also of the mind that a pointer in the interface is a good signal to the caller that the value can be modified. But a reference can be modified just as well (of course, you would need to look at the interface to know). The point is further weakened by the const keyword: what if the pointer is to a const? There are some more interesting points to think about, which I touch upon here.
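A minimal sketch of that distinction, assuming C++11 and using hypothetical names (`Config`, `verbosity_or`, `verbosity_of`, `make_quiet`) that are only for illustration: the pointer parameter can express "no object", the reference cannot, and const removes the "pointer means modification" signal.

```cpp
#include <cassert>

// Hypothetical type used only for illustration.
struct Config {
    int verbosity;
};

// A pointer parameter may legitimately be null, so the callee must check.
// The pointee is const, so this pointer does not signal modification.
int verbosity_or(const Config* cfg, int fallback) {
    return cfg ? cfg->verbosity : fallback;
}

// A reference parameter always names an object; no null check is needed.
int verbosity_of(const Config& cfg) {
    return cfg.verbosity;
}

// A non-const reference can modify the caller's object just as a pointer can.
void make_quiet(Config& cfg) {
    cfg.verbosity = 0;
}
```

At the call site, `make_quiet(c)` gives no visual hint of modification, while `verbosity_or(&c, 1)` uses a pointer yet modifies nothing, which is why the "pointer signals modification" convention only goes so far.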

Pointers can point to raw memory and not just to valid, invalid or NULL objects. Object construction has two steps: memory allocation, and then construction via the constructor call. You would not want to pass the memory being pointed to as a reference when the second phase is yet to complete. For example, with boost serialization:

    // load data required for construction and invoke constructor in place
    template<class Archive, class T>
    inline void load_construct_data(
        Archive & ar, T * t, const unsigned int file_version ){
        // default just uses the default constructor to initialize
        // previously allocated memory.
        ::new(t)T();
    }

In the above snippet, passing the second argument by reference would not make sense because the object is not yet constructed. For such lazy construction scenarios, a pointer parameter type looks clearer, as I would not want to associate a reference with a not-fully-constructed object. And quite reasonably, I am yet to see any code doing this.
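The same two-phase idea can be sketched outside of Boost. This is an illustration under my own assumptions, not Boost's implementation: `Point`, `construct_in_place`, `make_point` and `destroy_point` are hypothetical names, and C++11 is assumed for `nullptr`.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

// Hypothetical type used only for illustration.
struct Point {
    int x;
    int y;
    Point() : x(1), y(2) {}
};

// Phase two: construct a T in storage that was allocated earlier.
// The parameter is a pointer because no T object lives there yet.
template <class T>
T* construct_in_place(T* t) {
    return ::new (static_cast<void*>(t)) T();
}

// Phase one (allocation) followed by phase two (construction).
Point* make_point() {
    void* raw = std::malloc(sizeof(Point));  // raw, uninitialized memory
    if (!raw) return nullptr;
    return construct_in_place(static_cast<Point*>(raw));
}

// Teardown mirrors the two phases in reverse order.
void destroy_point(Point* p) {
    if (!p) return;
    p->~Point();   // end the object's lifetime
    std::free(p);  // release the raw memory
}
```

A `Point&` parameter in `construct_in_place` would claim that a `Point` already exists, which is exactly what is not true at that moment.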

In addition to that, pointers support arithmetic; references don't. So, if a function needs it, passing a reference makes the code much less clear. A pointer is more intuitive and natural. A reference implies a single object (at least I haven't seen anyone working with a reference to the first element of an array; containers aside, and even there we work with iterators or random access, which is closer to pointer semantics). To me, passing a reference as the start element of an array would be a really poor choice. I have never seen such code. In this context, think about why STL algorithms don't take references. Couldn't they have converted back to pointers internally? They take iterators (and pointers), and it is not as if all of them always modify the objects.
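A small sketch of the range idiom, with hypothetical function names (`sum_range`, `sum_array`, `sum_vector`): a pair of pointers marks a half-open range, and `std::accumulate` accepts raw pointers and container iterators alike, even though it never modifies the elements.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// A pointer pair marks the half-open range [first, last);
// pointer arithmetic walks it.
int sum_range(const int* first, const int* last) {
    int total = 0;
    for (const int* p = first; p != last; ++p) {
        total += *p;
    }
    return total;
}

// The same STL algorithm works on raw pointers...
int sum_array(const int* data, std::size_t count) {
    return std::accumulate(data, data + count, 0);
}

// ...and on container iterators, without modifying anything.
int sum_vector(const std::vector<int>& values) {
    return std::accumulate(values.begin(), values.end(), 0);
}
```

A sub-range such as `sum_range(a + 1, a + 3)` falls out naturally with pointers; a "reference to the first element" would first have to be turned back into a pointer.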

The whole of memory management in C and C++ revolves around pointers. new returns a pointer, malloc returns a pointer, free and delete work on pointers. So, to preserve the naturalness of expressions and statements, a pointer is the better choice there. On the same grounds, references can be the right choice in many other cases.

For a piece of C++ code that has to deal with C, or for that matter interoperate with any other language, there is no alternative to pointers. Why make an unnecessary transformation from pointer to reference and then back to pointer in such scenarios?
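As an illustration, here is a sketch that wraps the C library's `qsort`; the wrapper names `compare_ints` and `sort_ints` are hypothetical. Because the C function wants a pointer and a count, a wrapper with the same shape passes its arguments straight through.

```cpp
#include <cassert>
#include <cstdlib>

// Comparison callback with the signature the C library requires.
extern "C" int compare_ints(const void* lhs, const void* rhs) {
    const int a = *static_cast<const int*>(lhs);
    const int b = *static_cast<const int*>(rhs);
    return (a > b) - (a < b);  // negative, zero or positive
}

// Taking a pointer and a count lets the arguments go to C unchanged.
void sort_ints(int* data, std::size_t count) {
    std::qsort(data, count, sizeof(int), compare_ints);
}
```

Had `sort_ints` taken something like `int& first`, it would immediately need `&first` again to talk to `qsort`.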

Sometimes there are restrictions, for example exception handling being disallowed in the code. In those nothrow scenarios, there are operations that throw when used with references but not with pointers (for example, dynamic_cast<>). What would be the more natural choice: a pointer argument or a reference? Well, that could be argued either way depending on how the object was declared and used beforehand, but if the cast happens inside a function, I (as the interface writer) would expect it to take a pointer rather than a reference. When you return the result of the cast, you would not want to dereference it and pass it back. All this pointer-to-reference transformation makes the code look a lot less clean. Not to mention the case of a NULL return if the cast fails.
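The two failure modes of dynamic_cast can be sketched as follows, with hypothetical types (`Shape`, `Circle`, `Square`) and helper names; C++11 is assumed for `nullptr` in the usage.

```cpp
#include <cassert>
#include <typeinfo>

// Hypothetical polymorphic hierarchy used only for illustration.
struct Shape {
    virtual ~Shape() {}
};
struct Circle : Shape {
    double radius;
    Circle() : radius(1.0) {}
};
struct Square : Shape {};

// Pointer form: failure is reported as a null pointer, no exception.
const Circle* as_circle(const Shape* s) {
    return dynamic_cast<const Circle*>(s);
}

// Reference form: failure throws std::bad_cast, so the caller needs
// exception handling just to ask "is this a Circle?".
bool is_circle_via_reference(const Shape& s) {
    try {
        const Circle& c = dynamic_cast<const Circle&>(s);
        (void)c;  // only the success of the cast matters here
        return true;
    } catch (const std::bad_cast&) {
        return false;
    }
}
```

In code where exceptions are off the table, only `as_circle` is usable, and its pointer-in, pointer-out signature needs no conversions on either side.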

Now, I wonder why boost::lexical_cast<> doesn't take a pointer. For example:

    template<typename Target, typename Source>
    Target lexical_cast(const Source &arg)
    {
        typedef typename detail::array_to_pointer_decay<Source>::type NewSource;
        detail::lexical_stream<Target, NewSource> interpreter;
        Target result;
        if(!(interpreter << arg && interpreter >> result))
            throw_exception(bad_lexical_cast(typeid(NewSource), typeid(Target)));
        return result;
    }

This template generates a separate instantiation for each char array size it is called with. That is plain code bloat. The Boost writers prevent lexical_stream<> from bloating with the decay trick, but the lexical_cast template itself isn't saved. It needs compiler optimizations to prevent that (with VS2005 and gcc 4.1, that I know of). That would have been avoided had it accepted the Source argument as a pointer, in which case all arrays passed would have automatically decayed to pointers. No hacks would have been required!
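The deduction difference can be observed directly. This is a sketch with hypothetical function templates, not Boost code: `sizeof` on the parameter reveals what `Source` was deduced as.

```cpp
#include <cassert>
#include <cstddef>

// By const reference: Source is deduced as char[N], so every distinct
// array length yields a separate instantiation.
template <typename Source>
std::size_t param_size_by_reference(const Source& arg) {
    return sizeof(arg);  // size of the whole array, including '\0'
}

// By pointer: any char array decays to const char*, so Source is char
// and one instantiation serves arrays of every length.
template <typename Source>
std::size_t param_size_by_pointer(const Source* arg) {
    return sizeof(arg);  // size of a pointer, whatever the array length
}
```

Calling `param_size_by_reference("ab")` and `param_size_by_reference("abcd")` gives 3 and 5, which shows two different instantiations, while the pointer version reports `sizeof(const char*)` for both.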

On the initial point of a pointer/reference signalling modification inside the function: what if pointers are themselves passed by reference? How much clearer would the code be at the call site (for the caller)? :-)
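A sketch of that situation, with hypothetical function names: both functions are called with exactly the same syntax, yet only one of them moves the caller's pointer.

```cpp
#include <cassert>

// Pointer passed by reference: the callee reseats the caller's pointer.
void skip_zeros(const int*& p, const int* last) {
    while (p != last && *p == 0) ++p;
}

// Pointer passed by value: only the callee's local copy moves.
void skip_zeros_on_copy(const int* p, const int* last) {
    while (p != last && *p == 0) ++p;
}
```

Given `const int* p = a;`, the calls `skip_zeros(p, a + 4)` and `skip_zeros_on_copy(p, a + 4)` look identical to the caller, so the presence of a pointer says nothing about what gets modified.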

What I feel is that the natural semantics should be respected. Be it a reference or a pointer, it does not matter as long as the code does what it needs to do (checking for non-NULL pointers, not changing an object), and the restrictions can be implemented using either. References make member access easier, and you don't need the '->' operator, which can be a typing pain sometimes. :) I am with the C++ FAQ Lite on this - "Use references when you can, and pointers when you have to." A little juggling between them is not really a concern except when it becomes too much, causing loss of code clarity/readability and maybe performance at some point.
