union: A Space-Saving Class
A union is a special kind of class. A union may have multiple data members, but at any point in time, only one of the members may have a value. When a value is assigned to one member of the union, all other members become undefined. The amount of storage allocated for a union is at least as much as is needed to contain its largest data member. Like any class, a union defines a new type.
Some, but not all, class features apply equally to unions. A union cannot have a member that is a reference, but it can have members of most other types, including, under the new standard, class types that have constructors or destructors. A union can specify protection labels to make members public, private, or protected. By default, like structs, members of a union are public.
A union may define member functions, including constructors and destructors. However, a union may not inherit from another class, nor may a union be used as a base class. As a result, a union may not have virtual functions.
Defining a union
unions offer a convenient way to represent a set of mutually exclusive values of different types. As an example, we might have a process that handles different kinds of numeric or character data. That process might define a union to hold these values:
// objects of type Token have a single member, which could be of any of the listed types
union Token {
    // members are public by default
    char   cval;
    int    ival;
    double dval;
};
A union is defined starting with the keyword union, followed by an (optional) name for the union and a set of member declarations enclosed in curly braces. This code defines a union named Token that can hold a value that is either a char, an int, or a double.
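As a rough check of the storage claim, the following sketch repeats the Token union and states the guarantee at compile time. The static_assert messages are our own wording, and the exact value of sizeof(Token) is implementation dependent, so we assert only the lower bound.

```cpp
#include <type_traits>

// Same members as the Token union defined above.
union Token {
    char   cval;
    int    ival;
    double dval;
};

// A union defines a new type, and the library reports it as a union.
static_assert(std::is_union<Token>::value, "Token is a union type");

// Storage is at least as large as the largest member (here, double).
static_assert(sizeof(Token) >= sizeof(double),
              "a Token can hold its largest member");
```

On many implementations sizeof(Token) equals sizeof(double), but the language promises only "at least as much."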
Using a union Type
The name of a union is a type name. Like the built-in types, by default unions are uninitialized. We can explicitly initialize a union in the same way that we can explicitly initialize aggregate classes (§ 7.5.5, p. 298) by enclosing the initializer in a pair of curly braces:
Token first_token = {'a'}; // initializes the cval member
Token last_token; // uninitialized Token object
Token *pt = new Token; // pointer to an uninitialized Token object
If an initializer is present, it is used to initialize the first member. Hence, the initialization of first_token gives a value to its cval member.
The members of an object of union type are accessed using the normal member access operators:
last_token.cval = 'z';
pt->ival = 42;
Assigning a value to a data member of a union object makes the other data members undefined. As a result, when we use a union, we must always know what type of value is currently stored in the union. Depending on the types of the members, retrieving or assigning to the value stored in the union through the wrong data member can lead to a crash or other incorrect program behavior.
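The following sketch pulls the initialization and access rules together. The function names first_member_demo and switch_member_demo are hypothetical; they exist only to make the behavior easy to check.

```cpp
#include <cassert>

union Token {
    char   cval;
    int    ival;
    double dval;
};

// Brace initialization always targets the first member, here cval.
char first_member_demo() {
    Token first_token = {'a'};
    return first_token.cval;   // reading the member we wrote is well defined
}

// Assigning to ival makes it the only member that has a value.
int switch_member_demo() {
    Token tok = {'z'};         // cval holds 'z'
    Token *pt = &tok;
    pt->ival = 42;             // ival now holds a value; cval no longer does
    // tok.cval must not be read at this point
    assert(pt->ival == 42);
    return tok.ival;
}
```

In both functions we read only the member that was most recently written, which is the discipline the text describes.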
Anonymous unions
An anonymous union is an unnamed union that does not include any declarations between the close curly that ends its body and the semicolon that ends the union definition (§ 2.6.1, p. 73). When we define an anonymous union the compiler automatically creates an unnamed object of the newly defined union type:
union { // anonymous union
    char   cval;
    int    ival;
    double dval;
}; // defines an unnamed object, whose members we can access directly
cval = 'c'; // assigns a new value to the unnamed, anonymous union object
ival = 42; // that object now holds the value 42
The members of an anonymous union are directly accessible in the scope where the anonymous union is defined.
An anonymous union cannot have private or protected members, nor can an anonymous union define member functions.
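A minimal sketch of an anonymous union at function scope follows. The enclosing function anonymous_union_demo is a hypothetical name; inside it, the union's members behave like local variables that share storage.

```cpp
#include <cassert>

int anonymous_union_demo() {
    union {             // anonymous union: no type name and no declarator
        char   cval;
        int    ival;
        double dval;
    };
    cval = 'c';         // members are named directly in the enclosing scope
    ival = 42;          // the unnamed object now holds 42; cval has no value
    assert(ival == 42); // we read only the member we last wrote
    return ival;
}
```

Note that an anonymous union declared at namespace scope must be declared static; placing it inside a function or a class, as here, avoids that requirement.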
unions with Members of Class Type
Under earlier versions of C++, unions could not have members of a class type that defined its own constructors or copy-control members. Under the new standard, this restriction is lifted. However, unions with members that define their own constructors and/or copy-control members are more complicated to use than unions that have members of built-in type.
When a union has members of built-in type, we can use ordinary assignment to change the value that the union holds. Not so for unions that have members of nontrivial class types. When we switch the union’s value to and from a member of class type, we must construct or destroy that member, respectively: When we switch the union to a member of class type, we must run a constructor for that member’s type; when we switch from that member, we must run its destructor.
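To make the construct-and-destroy steps concrete, here is a small sketch. The union Slot and the function round_trip are hypothetical names, not part of the text's Token example, and the sketch assumes a using declaration for std::string so that the destructor can be named as ~string.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>

using std::string;

// Hypothetical union with one built-in member and one class-type member.
// Because string has its own constructors and destructor, the union has
// to supply a constructor and a destructor itself.
union Slot {
    int    ival;
    string sval;
    Slot() : ival(0) { }   // start out holding an int
    ~Slot() { }            // cannot know which member is live; the user cleans up
};

std::size_t round_trip(const string &s) {
    Slot slot;                        // currently holds ival
    new (&slot.sval) string(s);       // switch to the string: run a constructor
    std::size_t n = slot.sval.size();
    slot.sval.~string();              // switch away from the string: run its destructor
    slot.ival = static_cast<int>(n);  // built-in member: ordinary assignment
    assert(slot.ival >= 0);
    return n;
}
```

Forgetting the explicit destructor call would leak whatever memory the string had allocated, which is why such unions are usually wrapped in a class that tracks the active member.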
When a union has members of built-in type, the compiler will synthesize the memberwise versions of the default constructor or copy-control members. The same is not true for unions that have members of a class type that defines its own default constructor or one or more of the copy-control members. If a union member’s type defines one of these members, the compiler synthesizes the corresponding member of the union as deleted (§ 13.1.6, p. 508).
For example, the string class defines all five copy-control members and the default constructor. If a union contains a string and does not define its own default constructor or one of the copy-control members, then the compiler will synthesize that missing member as deleted. If a class has a union member that has a deleted copy-control member, then the corresponding copy-control operation(s) of the class itself will be deleted as well.
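We can observe the deleted members with the type traits library. In this sketch the union Raw and the class Holder are hypothetical names; neither defines any special members of its own, so everything is left to the compiler.

```cpp
#include <string>
#include <type_traits>

// Hypothetical union with a string member and no user-defined special members.
union Raw {
    int         ival;
    std::string sval;
};

// A class with a Raw member picks up the deleted operations.
struct Holder {
    Raw r;
};

// The union's synthesized members are deleted ...
static_assert(!std::is_default_constructible<Raw>::value, "default ctor deleted");
static_assert(!std::is_copy_constructible<Raw>::value,    "copy ctor deleted");
static_assert(!std::is_copy_assignable<Raw>::value,       "copy assignment deleted");
static_assert(!std::is_destructible<Raw>::value,          "destructor deleted");

// ... and so are the corresponding members of the enclosing class.
static_assert(!std::is_copy_constructible<Holder>::value, "Holder copy ctor deleted");
static_assert(!std::is_copy_assignable<Holder>::value,    "Holder copy assignment deleted");
```

Defining these types is legal; the errors would appear only if we tried to create, copy, or destroy an object of either type.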
Using a Class to Manage union Members
Because of the complexities involved in constructing and destroying members of class type, unions with class-type members ordinarily are embedded inside another class. That way the class can manage the state transitions to and from the member of class type. As an example, we’ll add a string member to our union. We’ll define our union as an anonymous union and make it a member of a class named Token. The Token class will manage the union’s members.
To keep track of what type of value the union holds, we usually define a separate object known as a discriminant. A discriminant lets us discriminate among the values that the union can hold. In order to keep the union and its discriminant in sync, we’ll make the discriminant a member of Token as well. Our class will define a member of an enumeration type (§ 19.3, p. 832) to keep track of the state of its union member.
The only functions our class will define are the default constructor, the copy-control members, and a set of assignment operators that can assign a value of one of our union’s types to the union member:
class Token {
public:
    // copy control needed because our class has a union with a string member
    // defining the move constructor and move-assignment operator is left as an exercise
    Token(): tok(INT), ival{0} { }
    Token(const Token &t): tok(t.tok) { copyUnion(t); }
    Token &operator=(const Token&);
    // if the union holds a string, we must destroy it; see § 19.1.2 (p. 824)
    ~Token() { if (tok == STR) sval.~string(); }
    // assignment operators to set the differing members of the union
    Token &operator=(const std::string&);
    Token &operator=(char);
    Token &operator=(int);
    Token &operator=(double);
private:
    enum {INT, CHAR, DBL, STR} tok; // discriminant
    union {                         // anonymous union
        char        cval;
        int         ival;
        double      dval;
        std::string sval;
    }; // each Token object has an unnamed member of this unnamed union type
    // check the discriminant and copy the union member as appropriate
    void copyUnion(const Token&);
};
Our class defines a nested, unnamed, unscoped enumeration (§ 19.3, p. 832) that we use as the type for the member named tok. We defined tok following the close curly and before the semicolon that ends the definition of the enum, which defines tok to have this unnamed enum type (§ 2.6.1, p. 73).
We’ll use tok as our discriminant. When the union holds an int value, tok will have the value INT; if the union has a string, tok will be STR; and so on.
The default constructor initializes the discriminant and the union member to hold an int value of 0.
Because our union has a member with a destructor, we must define our own destructor to (conditionally) destroy the string member. Unlike ordinary members of a class type, class members that are part of a union are not automatically destroyed. The destructor has no way to know which type the union holds, so it cannot know which member to destroy.
Our destructor checks whether the object being destroyed holds a string. If so, the destructor explicitly calls the string destructor (§ 19.1.2, p. 824) to free the memory used by that string. The destructor has no work to do if the union holds a member of any of the built-in types.
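The following sketch illustrates that a class-type member of a union is destroyed only if we ask for it. The names Tracer, Holder, make_tracer, and destructor_calls are hypothetical; Tracer stands in for string and simply counts how often its destructor runs.

```cpp
#include <cassert>
#include <new>

// Hypothetical stand-in for string: counts destructor calls.
struct Tracer {
    static int destroyed;
    ~Tracer() { ++destroyed; }
};
int Tracer::destroyed = 0;

class Holder {
public:
    Holder() : live(false), ival(0) { }        // start out holding an int
    void make_tracer() {
        assert(!live);                         // only switch from the int state
        new (&tr) Tracer();                    // construct the class-type member
        live = true;
    }
    ~Holder() { if (live) tr.~Tracer(); }      // conditionally destroy it
private:
    bool live;                                 // discriminant: is tr the active member?
    union { int ival; Tracer tr; };            // anonymous union
};

// Number of Tracer destructor calls caused by destroying one Holder.
int destructor_calls(bool store_tracer) {
    int before = Tracer::destroyed;
    {
        Holder h;
        if (store_tracer) h.make_tracer();
    }                                          // ~Holder runs here
    return Tracer::destroyed - before;
}
```

When the Holder still holds its int, no Tracer destructor runs; when it holds a Tracer, the destructor runs exactly once, and only because ~Holder calls it explicitly.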
Managing the Discriminant and Destroying the string
The assignment operators will set tok and assign the corresponding member of the union. Like the destructor, these members must conditionally destroy the string before assigning a new value to the union:
Token &Token::operator=(int i)
{
    if (tok == STR) sval.~string(); // if we have a string, free it
    ival = i;                       // assign to the appropriate member
    tok = INT;                      // update the discriminant
    return *this;
}
If the current value in the union is a string, we must destroy that string before assigning a new value to the union. We do so by calling the string destructor. Once we’ve cleaned up the string member, we assign the given value to the member that corresponds to the parameter type of the operator. In this case, our parameter is an int, so we assign to ival. We update the discriminant and return.
The double and char assignment operators behave identically to the int version and are left as an exercise. The string version differs from the others because it must manage the transition to and from the string type:
Token &Token::operator=(const std::string &s)
{
    if (tok == STR)              // if we already hold a string, just do an assignment
        sval = s;
    else
        new(&sval) std::string(s); // otherwise construct a string
    tok = STR;                   // update the discriminant
    return *this;
}
In this case, if the union already holds a string, we can use the normal string assignment operator to give a new value to that string. Otherwise, there is no existing string object on which to invoke the string assignment operator. Instead, we must construct a string in the memory that holds the union. We do so using placement new (§ 19.1.2, p. 824) to construct a string at the location in which sval resides. We initialize that string as a copy of our string parameter. We next update the discriminant and return.
Like the type-specific assignment operators, the copy constructor and assignment operators have to test the discriminant to know how to copy the given value. To do this common work, we’ll define a member named copyUnion.
When we call copyUnion from the copy constructor, the constructor initializer has set only tok; no member of the union has been initialized yet, so we know that the union member doesn’t hold a string. In the assignment operator, it is possible that the union already holds a string. We’ll handle that case directly in the assignment operator. That way copyUnion can assume that if its parameter holds a string, copyUnion must construct its own string:
void Token::copyUnion(const Token &t)
{
    switch (t.tok) {
        case Token::INT:  ival = t.ival; break;
        case Token::CHAR: cval = t.cval; break;
        case Token::DBL:  dval = t.dval; break;
        // to copy a string, construct it using placement new; see § 19.1.2 (p. 824)
        case Token::STR:  new(&sval) std::string(t.sval); break;
    }
}
This function uses a switch statement (§ 5.3.2, p. 178) to test the discriminant. For the built-in types, we assign the value to the corresponding member; if the member we are copying is a string, we construct it.
The assignment operator must handle three possibilities for its string member: Both the left-hand and right-hand operands might be a string; neither operand might be a string; or one but not both operands might be a string:
Token &Token::operator=(const Token &t)
{
    // if this object holds a string and t doesn't, we have to free the old string
    if (tok == STR && t.tok != STR) sval.~string();
    if (tok == STR && t.tok == STR)
        sval = t.sval;  // no need to construct a new string
    else
        copyUnion(t);   // will construct a string if t.tok is STR
    tok = t.tok;
    return *this;
}
If the union in the left-hand operand holds a string, but the union in the right-hand does not, then we have to first free the old string before assigning a new value to the union member. If both unions hold a string, we can use the normal string assignment operator to do the copy. Otherwise, we call copyUnion to do the assignment. Inside copyUnion, if the right-hand operand is a string, we’ll construct a new string in the union member of the left-hand operand. If neither operand is a string, then ordinary assignment will suffice.
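For readers who want to compile the pieces together, the following sketch assembles the members shown in this section into one translation unit. It assumes a using declaration for std::string, omits the char and double assignment operators that the text leaves as an exercise, and adds three hypothetical accessors (holds_string, str, as_int) plus a function token_demo purely so that the behavior can be checked.

```cpp
#include <cassert>
#include <new>
#include <string>

using std::string;

class Token {
public:
    Token() : tok(INT), ival{0} { }
    Token(const Token &t) : tok(t.tok) { copyUnion(t); }
    Token &operator=(const Token&);
    ~Token() { if (tok == STR) sval.~string(); }
    Token &operator=(const string&);
    Token &operator=(int);
    // Hypothetical accessors, not part of the text's class.
    bool holds_string() const { return tok == STR; }
    const string &str() const { assert(tok == STR); return sval; }
    int as_int() const { assert(tok == INT); return ival; }
private:
    enum {INT, CHAR, DBL, STR} tok;  // discriminant
    union {                          // anonymous union
        char   cval;
        int    ival;
        double dval;
        string sval;
    };
    void copyUnion(const Token&);
};

Token &Token::operator=(int i)
{
    if (tok == STR) sval.~string();  // free the string, if any
    ival = i;
    tok = INT;
    return *this;
}

Token &Token::operator=(const string &s)
{
    if (tok == STR)
        sval = s;                    // already a string: ordinary assignment
    else
        new (&sval) string(s);       // otherwise construct one in place
    tok = STR;
    return *this;
}

void Token::copyUnion(const Token &t)
{
    switch (t.tok) {
        case INT:  ival = t.ival; break;
        case CHAR: cval = t.cval; break;
        case DBL:  dval = t.dval; break;
        case STR:  new (&sval) string(t.sval); break;
    }
}

Token &Token::operator=(const Token &t)
{
    if (tok == STR && t.tok != STR) sval.~string();
    if (tok == STR && t.tok == STR)
        sval = t.sval;
    else
        copyUnion(t);
    tok = t.tok;
    return *this;
}

// Exercises each state transition once and returns the final string.
string token_demo()
{
    Token a;                  // holds the int 0
    a = string("hello");      // int -> string: constructed with placement new
    Token b(a);               // copy constructor calls copyUnion
    a = 7;                    // string -> int: a's string is destroyed first
    a = b;                    // int <- string: copyUnion constructs a new string
    return a.str();
}
```

The demo passes through all three cases the copy-assignment operator has to handle except string-to-string, which reduces to the ordinary string assignment operator.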
Exercises Section 19.6
Exercise 19.22: Add a member of type Sales_data to your Token class.
Exercise 19.23: Add a move constructor and move assignment to Token.
Exercise 19.24: Explain what happens if we assign a Token object to itself.
Exercise 19.25: Write assignment operators that take values of each type in the union.