Team LiB
Previous Section Next Section

19.6. union: A Space-Saving Class

 

A union is a special kind of class. A union may have multiple data members, but at any point in time, only one of the members may have a value. When a value is assigned to one member of the union, all other members become undefined. The amount of storage allocated for a union is at least as much as is needed to contain its largest data member. Like any class, a union defines a new type.

 

Some, but not all, class features apply equally to unions. A union cannot have a member that is a reference, but it can have members of most other types, including, under the new standard, class types that have constructors or destructors. A union can specify protection labels to make members public, private, or protected. By default, like structs, members of a union are public.

 

A union may define member functions, including constructors and destructors. However, a union may not inherit from another class, nor may a union be used as a base class. As a result, a union may not have virtual functions.

 

Defining a union

 

unions offer a convenient way to represent a set of mutually exclusive values of different types. As an example, we might have a process that handles different kinds of numeric or character data. That process might define a union to hold these values:

 

 

// objects of type Token have a single member, which could be of any of the listed types
union Token {
// members are public by default
    char   cval;
    int    ival;
    double dval;
};

 

A union is defined starting with the keyword union, followed by an (optional) name for the union and a set of member declarations enclosed in curly braces. This code defines a union named Token that can hold a value that is either a char, an int, or a double.

 

Using a union Type

 

The name of a union is a type name. Like the built-in types, by default unions are uninitialized. We can explicitly initialize a union in the same way that we can explicitly initialize aggregate classes (§ 7.5.5, p. 298) by enclosing the initializer in a pair of curly braces:

 

 

Token first_token = {'a'}; // initializes the cval member
Token last_token;          // uninitialized Token object
Token *pt = new Token;     // pointer to an uninitialized Token object

 

If an initializer is present, it is used to initialize the first member. Hence, the initialization of first_token gives a value to its cval member.

 

The members of an object of union type are accessed using the normal member access operators:

 

last_token.cval = 'z';
pt->ival = 42;

 

Assigning a value to a data member of a union object makes the other data members undefined. As a result, when we use a union, we must always know what type of value is currently stored in the union. Depending on the types of the members, retrieving or assigning to the value stored in the union through the wrong data member can lead to a crash or other incorrect program behavior.

 

Anonymous unions

 

An anonymous union is an unnamed union that does not include any declarations between the close curly that ends its body and the semicolon that ends the union definition (§ 2.6.1, p. 73). When we define an anonymous union the compiler automatically creates an unnamed object of the newly defined union type:

 

 

union {           // anonymous union
    char   cval;
    int    ival;
    double dval;
};  // defines an unnamed object, whose members we can access directly
cval = 'c'; // assigns a new value to the unnamed, anonymous union object
ival = 42;  // that object now holds the value 42

 

The members of an anonymous union are directly accessible in the scope where the anonymous union is defined.

 

Image Note

An anonymous union cannot have private or protected members, nor can an anonymous union define member functions.

 

 

unions with Members of Class Type

 
Image

Under earlier versions of C++, unions could not have members of a class type that defined its own constructors or copy-control members. Under the new standard, this restriction is lifted. However, unions with members that define their own constructors and/or copy-control members are more complicated to use than unions that have members of built-in type.

 

When a union has members of built-in type, we can use ordinary assignment to change the value that the union holds. Not so for unions that have members of nontrivial class types. When we switch the union’s value to and from a member of class type, we must construct or destroy that member, respectively: When we switch the union to a member of class type, we must run a constructor for that member’s type; when we switch from that member, we must run its destructor.

 

When a union has members of built-in type, the compiler will synthesize the memberwise versions of the default constructor or copy-control members. The same is not true for unions that have members of a class type that defines its own default constructor or one or more of the copy-control members. If a union member’s type defines one of these members, the compiler synthesizes the corresponding member of the union as deleted (§ 13.1.6, p. 508).

 

For example, the string class defines all five copy-control members and the default constructor. If a union contains a string and does not define its own default constructor or one of the copy-control members, then the compiler will synthesize that missing member as deleted. If a class has a union member that has a deleted copy-control member, then that corresponding copy-control operation(s) of the class itself will be deleted as well.

 

Using a Class to Manage union Members

 

Because of the complexities involved in constructing and destroying members of class type, unions with class-type members ordinarily are embedded inside another class. That way the class can manage the state transitions to and from the member of class type. As an example, we’ll add a string member to our union. We’ll define our union as an anonymous union and make it a member of a class named Token. The Token class will manage the union’s members.

 

To keep track of what type of value the union holds, we usually define a separate object known as a discriminant. A discriminant lets us discriminate among the values that the union can hold. In order to keep the union and its discriminant in sync, we’ll make the discriminant a member of Token as well. Our class will define a member of an enumeration type (§ 19.3, p. 832) to keep track of the state of its union member.

 

The only functions our class will define are the default constructor, the copy-control members, and a set of assignment operators that can assign a value of one of our union’s types to the union member:

 

 

class Token {
public:
    // copy control needed because our class has a union with a string member
    // defining the move constructor and move-assignment operator is left as an exercise
    Token(): tok(INT), ival{0} { }
    Token(const Token &t): tok(t.tok) { copyUnion(t); }
    Token &operator=(const Token&);
    // if the union holds a string, we must destroy it; see § 19.1.2 (p. 824)
    ~Token() { if (tok == STR) sval.~string(); }
    // assignment operators to set the differing members of the union
    Token &operator=(const std::string&);
    Token &operator=(char);
    Token &operator=(int);
    Token &operator=(double);
private:
    enum {INT, CHAR, DBL, STR} tok; // discriminant
    union {                         // anonymous union
        char   cval;
        int    ival;
        double dval;
        std::string sval;
    }; // each Token object has an unnamed member of this unnamed union type
    // check the discriminant and copy the union member as appropriate
    void copyUnion(const Token&);
};

 

Our class defines a nested, unnamed, unscoped enumeration (§ 19.3, p. 832) that we use as the type for the member named tok. We defined tok following the close curly and before the semicolon that ends the definition of the enum, which defines tok to have this unnamed enum type (§ 2.6.1, p. 73).

 

We’ll use tok as our discriminant. When the union holds an int value, tok will have the value INT; if the union has a string, tok will be STR; and so on.

 

The default constructor initializes the discriminant and the union member to hold an int value of 0.

 

Because our union has a member with a destructor, we must define our own destructor to (conditionally) destroy the string member. Unlike ordinary members of a class type, class members that are part of a union are not automatically destroyed. The destructor has no way to know which type the union holds, so it cannot know which member to destroy.

 

Our destructor checks whether the object being destroyed holds a string. If so, the destructor explicitly calls the string destructor (§ 19.1.2, p. 824) to free the memory used by that string. The destructor has no work to do if the union holds a member of any of the built-in types.

 

Managing the Discriminant and Destroying the string

 

The assignment operators will set tok and assign the corresponding member of the union. Like the destructor, these members must conditionally destroy the string before assigning a new value to the union:

 

 

Token &Token::operator=(int i)
{
    if (tok == STR) sval.~string(); // if we have a string, free it
    ival = i;                       // assign to the appropriate member
    tok = INT;                      // update the discriminant
    return *this;
}

 

If the current value in the union is a string, we must destroy that string before assigning a new value to the union. We do so by calling the string destructor. Once we’ve cleaned up the string member, we assign the given value to the member that corresponds to the parameter type of the operator. In this case, our parameter is an int, so we assign to ival. We update the discriminant and return.

 

The double and char assignment operators behave identically to the int version and are left as an exercise. The string version differs from the others because it must manage the transition to and from the string type:

 

 

Token &Token::operator=(const std::string &s)
{
    if (tok == STR) // if we already hold a string, just do an assignment
        sval = s;
    else
        new(&sval) string(s); // otherwise construct a string
    tok = STR;                // update the discriminant
    return *this;
}

 

In this case, if the union already holds a string, we can use the normal string assignment operator to give a new value to that string. Otherwise, there is no existing string object on which to invoke the string assignment operator. Instead, we must construct a string in the memory that holds the union. We do so using placement new19.1.2, p. 824) to construct a string at the location in which sval resides. We initialize that string as a copy of our string parameter. We next update the discriminant and return.

 

Managing Union Members That Require Copy Control

 

Like the type-specific assignment operators, the copy constructor and assignment operators have to test the discriminant to know how to copy the given value. To do this common work, we’ll define a member named copyUnion.

 

When we call copyUnion from the copy constructor, the union member will have been default-initialized, meaning that the first member of the union will have been initialized. Because our string is not the first member, we know that the union member doesn’t hold a string. In the assignment operator, it is possible that the union already holds a string. We’ll handle that case directly in the assignment operator. That way copyUnion can assume that if its parameter holds a string, copyUnion must construct its own string:

 

 

void Token::copyUnion(const Token &t)
{
    switch (t.tok) {
        case Token::INT: ival = t.ival; break;
        case Token::CHAR: cval = t.cval; break;
        case Token::DBL: dval = t.dval; break;
        // to copy a string, construct it using placement new; see (§ 19.1.2 (p. 824))
        case Token::STR: new(&sval) string(t.sval); break;
    }
}

 

This function uses a switch statement (§ 5.3.2, p. 178) to test the discriminant. For the built-in types, we assign the value to the corresponding member; if the member we are copying is a string, we construct it.

 

The assignment operator must handle three possibilities for its string member: Both the left-hand and right-hand operands might be a string; neither operand might be a string; or one but not both operands might be a string:

 

 

Token &Token::operator=(const Token &t)
{
    // if this object holds a string and t doesn't, we have to free the old string
    if (tok == STR && t.tok != STR) sval.~string();
    if (tok == STR && t.tok == STR)
        sval = t.sval;  // no need to construct a new string
    else
        copyUnion(t);   // will construct a string if t.tok is STR
    tok = t.tok;
    return *this;
}

 

If the union in the left-hand operand holds a string, but the union in the right-hand does not, then we have to first free the old string before assigning a new value to the union member. If both unions hold a string, we can use the normal string assignment operator to do the copy. Otherwise, we call copyUnion to do the assignment. Inside copyUnion, if the right-hand operand is a string, we’ll construct a new string in the union member of the left-hand operand. If neither operand is a string, then ordinary assignment will suffice.

 

Exercises Section 19.6

 

Exercise 19.21: Write your own version of the Token class.

Exercise 19.22: Add a member of type Sales_data to your Token class.

Exercise 19.23: Add a move constructor and move assignment to Token.

Exercise 19.24: Explain what happens if we assign a Token object to itself.

Exercise 19.25: Write assignment operators that take values of each type in the union.


 
Team LiB
Previous Section Next Section