Sunday, July 27, 2008

C++ unions: how to find the currently active member

C++ unions can have member functions. What may surprise you even more is that they can have constructors and a destructor. That is not all I intend to write about, though. A question comes to mind straightaway: if unions can have member functions, how would those functions know which data member is currently 'active' or 'set'? (The question comes up even for unions without member functions; it is not specific to them.)
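Before getting to that question, here is a minimal sketch of a union with a constructor, a destructor and member functions. The name Number and its members are hypothetical, picked only for illustration:

```cpp
#include <cassert>

// Hypothetical example type; the name and members are not from any library.
union Number
{
    int   i;
    float f;

    // The constructor makes 'i' the active member.
    Number(int value) : i(value) {}

    // A union may declare a destructor; there is nothing to release here.
    ~Number() {}

    // Reads 'i'; only valid while 'i' is the active member.
    int as_int() const { return i; }

    // Writing 'f' makes it the active member from now on.
    void set_float(float value) { f = value; }
};

// Usage sketch: construct with an int and read it back.
inline int demo_number()
{
    Number n(42);
    assert(n.as_int() == 42);
    return n.as_int();
}
```

Note that as_int() simply assumes 'i' is the active member. After a call to set_float() that assumption no longer holds, and the union itself has no way of telling, which is exactly the question at hand.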

This is important because if, say, member1 of the union is set, then that is the active member and the only one whose value you can access. Reading the value of any other member is undefined behavior as per the standard (barring one special guarantee discussed below), and we all know what undefined behavior means: it may work in your case, on your machine, and then fail in front of a promising potential client or your managers while you are demoing your application.

Here is one way to let your member functions know which data member is currently set (it applies to unions in general):

[code]
union myunion
{
    int member; //if == 1 then mem1 is set, use it; if ==2 mem2 is set, use it
    struct
    {
        int struct_id; //1

        //other members
    } mem1;
    struct
    {
        int struct_id; //2

        //other members
    } mem2;
    //and so on...
    void memfunc()
    {
        //could use a switch
        if(member==1)
        {
            //use mem1
        }
        else if (member ==2)
        {
            //use mem2
        }
        //and so on...
    }
};
[/code]

The inline comments are self-explanatory. The member variable 'member' serves as a signifier of which of the struct members is currently active, and reading it is safe. What makes it safe, you might wonder? To understand the guarantee the programmer is given, let's go through what the C++ standard has to say about it. 9.5 [class.union]/1 provides the rules that support the explanation above. I quote that paragraph in full below:

[quote]
In a union, at most one of the data members can be active at any time, that is, the value of at most one of the data members can be stored in a union at any time. [Note: one special guarantee is made in order to simplify the use of unions: If a standard-layout union contains several standard-layout structs that share a common initial sequence (9.2), and if an object of this standard-layout union type contains one of the standard-layout structs, it is permitted to inspect the common initial sequence of any of standard-layout struct members; see 9.2. —end note] The size of a union is sufficient to contain the largest of its data members. Each data member is allocated as if it were the sole member of a struct. A union can have member functions (including constructors and destructors), but not virtual (10.3) functions. A union shall not have base classes. A union shall not be used as a base class. An object of a non-trivial class (clause 9) shall not be a member of a union, nor shall an array of such objects. If a union contains a static data member or a member of reference type the program is ill-formed.
[/quote]

The above tells you that unions may have member functions, including constructors and destructors. It also states that when structs inside a union share a common initial sequence, it is legal to inspect (i.e. query or read the value of) that common initial sequence through any of them. It also says that each data member is allocated as if it were the sole member of a struct. In our example above, 'member' is such a data member: it is as if the union had a struct member containing just a single data member of type 'int'.
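The common-initial-sequence guarantee can be sketched as follows. This is a conservative variant of the union above, assuming the tag is wrapped in a small struct of its own so that every union member is a standard-layout struct starting with the same int. The names Header, Circle, Rect and Shape, and the helper functions, are hypothetical:

```cpp
#include <cassert>

// Hypothetical illustration types; all of them start with 'int struct_id'.
struct Header { int struct_id; };                  // just the tag
struct Circle { int struct_id; double radius; };   // struct_id == 1
struct Rect   { int struct_id; double w, h; };     // struct_id == 2

union Shape
{
    Header hdr;      // shares its initial 'int struct_id' with the others
    Circle circle;
    Rect   rect;
};

// Reads the tag through 'hdr' no matter which struct is active; this is
// the inspection of the common initial sequence that the note permits.
inline int active_id(const Shape& s) { return s.hdr.struct_id; }

inline Shape make_circle(double r)
{
    Shape s;
    s.circle = Circle{1, r};   // 'circle' becomes the active member
    return s;
}

inline Shape make_rect(double w, double h)
{
    Shape s;
    s.rect = Rect{2, w, h};    // 'rect' becomes the active member
    return s;
}

inline double area(const Shape& s)
{
    switch (active_id(s))
    {
    case 1:  return 3.14159265358979 * s.circle.radius * s.circle.radius;
    case 2:  return s.rect.w * s.rect.h;
    default: assert(false && "unknown struct_id"); return 0.0;
    }
}
```

Each factory function writes the tag together with the rest of the struct, so the tag and the active member cannot drift apart as long as the union is only filled through them.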

Moreover, there is another mention in section 9.2/18 as follows:

[quote]
A pointer to a standard-layout struct object, suitably converted using a reinterpret_cast, points to its initial member (or if that member is a bit-field, then to the unit in which it resides) and vice versa. [Note: There might therefore be unnamed padding within a standard-layout struct object, but not at its beginning, as necessary to achieve appropriate alignment. —end note]
[/quote]

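Considering the above quotes, each 'struct_id' (in the code sample above) is guaranteed to sit at the very start of its struct, and therefore at the same location and with the same layout as the union's first member, the int 'member'. Hence reading 'member' (the same initial sequence) is safe, regardless of which struct member was set last (or is active). Strictly speaking, the note in 9.5/1 is worded in terms of struct members, so wrapping the tag in a struct of its own keeps the code within the letter of that guarantee.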
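The 9.2/18 guarantee can be checked in isolation. A small sketch, assuming a C++11 compiler (for static_assert and <type_traits>) and a hypothetical struct named Tagged:

```cpp
#include <cassert>
#include <cstddef>       // offsetof
#include <type_traits>   // std::is_standard_layout

// Hypothetical standard-layout struct used only for this illustration.
struct Tagged
{
    int    struct_id;
    double payload;
};

static_assert(std::is_standard_layout<Tagged>::value,
              "Tagged must be standard-layout for 9.2/18 to apply");
static_assert(offsetof(Tagged, struct_id) == 0,
              "no padding is allowed before the initial member");

// Converts a pointer to the struct into a pointer to its initial member
// and reads the tag through it.
inline int first_member_via_cast(Tagged& t)
{
    int* p = reinterpret_cast<int*>(&t);
    assert(p == &t.struct_id);   // both point at the same object
    return *p;
}
```

Padding may still appear between struct_id and payload; the guarantee only covers the beginning of the struct.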

This int member ('member') gives us a convenient way to see which data member of the union is currently active without peeking into any specific struct member (not that doing so is impossible) and without affecting the size of the union.
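The size claim is easy to verify. A rough sketch, assuming two hypothetical structs A and B that already begin with an int tag; on mainstream implementations the extra int member fits inside the storage the structs occupy anyway:

```cpp
#include <cassert>
#include <cstddef>   // std::size_t

// Hypothetical structs; both already begin with an int tag.
struct A { int struct_id; char   name[12]; };
struct B { int struct_id; double value;    };

union WithoutTag { A a; B b; };
union WithTag    { int member; A a; B b; };

// Expected to hold on mainstream compilers: the int overlaps the structs.
static_assert(sizeof(WithTag) == sizeof(WithoutTag),
              "the extra int member should not enlarge the union");

// Returns the size of the tagged union after checking that it is large
// enough to hold each of its struct members.
inline std::size_t tagged_union_size()
{
    assert(sizeof(WithTag) >= sizeof(A) && sizeof(WithTag) >= sizeof(B));
    return sizeof(WithTag);
}
```

The int cannot raise the alignment requirement either, since each struct already contains an int.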

Having member functions find the active member this way is not very extensible. What if you add another struct member? You would need to extend the switch statement or the if-else-if chain to accommodate it. But then, even when a union has no member functions, you may well need to know which member is active, and the above method helps.

1 comment:

marvellous thoughts said...

You have covered an excellent topic here. I bet I would never have come to know about it in my whole life if I hadn't stumbled upon this post.

Cheers! Regards from India.
~Shakti