A compound type is a type that is defined in terms of another type. C++ has several compound types, two of which—references and pointers—we’ll cover in this chapter.
Defining variables of compound type is more complicated than the declarations we’ve seen so far. In § 2.2 (p. 41) we said that simple declarations consist of a type followed by a list of variable names. More generally, a declaration is a base type followed by a list of declarators. Each declarator names a variable and gives the variable a type that is related to the base type.
The declarations we have seen so far have declarators that are nothing more than variable names. The type of such variables is the base type of the declaration. More complicated declarators specify variables with compound types that are built from the base type of the declaration.
The new standard introduced a new kind of reference: an “rvalue reference,” which we’ll cover in § 13.6.1 (p. 532). These references are primarily intended for use inside classes. Technically speaking, when we use the term reference, we mean “lvalue reference.”
A reference defines an alternative name for an object. A reference type “refers to” another type. We define a reference type by writing a declarator of the form &d, where d is the name being declared:
int ival = 1024;
int &refVal = ival; // refVal refers to (is another name for) ival
int &refVal2; // error: a reference must be initialized
Ordinarily, when we initialize a variable, the value of the initializer is copied into the object we are creating. When we define a reference, instead of copying the initializer’s value, we bind the reference to its initializer. Once initialized, a reference remains bound to its initial object. There is no way to rebind a reference to refer to a different object. Because there is no way to rebind a reference, references must be initialized.
A reference is not an object. Instead, a reference is just another name for an already existing object.
After a reference has been defined, all operations on that reference are actually operations on the object to which the reference is bound:
refVal = 2; // assigns 2 to the object to which refVal refers, i.e., to ival
int ii = refVal; // same as ii = ival
When we assign to a reference, we are assigning to the object to which the reference is bound. When we fetch the value of a reference, we are really fetching the value of the object to which the reference is bound. Similarly, when we use a reference as an initializer, we are really using the object to which the reference is bound:
// ok: refVal3 is bound to the object to which refVal is bound, i.e., to ival
int &refVal3 = refVal;
// initializes i from the value in the object to which refVal is bound
int i = refVal; // ok: initializes i to the same value as ival
Because references are not objects, we may not define a reference to a reference.
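As a minimal sketch of this aliasing, the function below binds two references to one int and writes through each of them. The function name write_through_references is our own, chosen only for illustration. Every write lands in ival, because neither reference is an object of its own.

```cpp
// Hypothetical helper for illustration: returns the final value of ival.
int write_through_references()
{
    int ival = 1024;
    int &refVal = ival;      // refVal is another name for ival
    int &refVal3 = refVal;   // also bound to ival, not to refVal itself

    refVal = 2;              // assigns 2 to ival
    int ii = refVal;         // copies ival's current value (2) into ii
    refVal3 = ii + 40;       // assigns 42 to ival; ii is still 2

    return ival;
}
```

Calling write_through_references() yields 42, the value most recently assigned through refVal3.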
We can define multiple references in a single definition. Each identifier that is a reference must be preceded by the & symbol:
int i = 1024, i2 = 2048; // i and i2 are both ints
int &r = i, r2 = i2; // r is a reference bound to i; r2 is an int
int i3 = 1024, &ri = i3; // i3 is an int; ri is a reference bound to i3
int &r3 = i3, &r4 = i2; // both r3 and r4 are references
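The difference between r and r2 above can be observed by assigning to each. The sketch below wraps the same definitions in a function whose name, mixed_definition, is hypothetical. Assigning to the reference changes i, whereas assigning to r2 changes only that separate int.

```cpp
// Hypothetical helper for illustration: returns i + i2 after the assignments.
int mixed_definition()
{
    int i = 1024, i2 = 2048;   // i and i2 are both ints
    int &r = i, r2 = i2;       // r is a reference bound to i; r2 is an int

    r = 1;    // changes i, because r is another name for i
    r2 = 2;   // changes only r2, which is a copy; i2 is still 2048

    return i + i2;
}
```

The result is 2049: i was changed to 1 through r, and i2 kept its original value.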
With two exceptions that we’ll cover in § 2.4.1 (p. 61) and § 15.2.3 (p. 601), the type of a reference and the object to which the reference refers must match exactly. Moreover, for reasons we’ll explore in § 2.4.1, a reference may be bound only to an object, not to a literal or to the result of a more general expression:
int &refVal4 = 10; // error: initializer must be an object
double dval = 3.14;
int &refVal5 = dval; // error: initializer must be an int object
Exercises Section 2.3.1
Exercise 2.15: Which of the following definitions, if any, are invalid? Why?
(a) int ival = 1.01;
(b) int &rval1 = 1.01;
(c) int &rval2 = ival;
(d) int &rval3;
Exercise 2.16: Which, if any, of the following assignments are invalid? If they are valid, explain what they do.
int i = 0, &r1 = i; double d = 0, &r2 = d;
(a) r2 = 3.14159;
(b) r2 = r1;
(c) i = r2;
(d) r1 = d;
Exercise 2.17: What does the following code print?
int i, &ri = i;
i = 5; ri = 10;
std::cout << i << " " << ri << std::endl;
A pointer is a compound type that “points to” another type. Like references, pointers are used for indirect access to other objects. Unlike a reference, a pointer is an object in its own right. Pointers can be assigned and copied; a single pointer can point to several different objects over its lifetime. Unlike a reference, a pointer need not be initialized at the time it is defined. Like other built-in types, pointers defined at block scope have undefined value if they are not initialized.
Pointers are often hard to understand. Debugging problems due to pointer errors bedevil even experienced programmers.
We define a pointer type by writing a declarator of the form *d, where d is the name being defined. The * must be repeated for each pointer variable:
int *ip1, *ip2; // both ip1 and ip2 are pointers to int
double dp, *dp2; // dp2 is a pointer to double; dp is a double
A pointer holds the address of another object. We get the address of an object by using the address-of operator (the & operator):
int ival = 42;
int *p = &ival; // p holds the address of ival; p is a pointer to ival
The second statement defines p as a pointer to int and initializes p to point to the int object named ival. Because references are not objects, they don’t have addresses. Hence, we may not define a pointer to a reference.
With two exceptions, which we cover in § 2.4.2 (p. 62) and § 15.2.3 (p. 601), the types of the pointer and the object to which it points must match:
double dval;
double *pd = &dval; // ok: initializer is the address of a double
double *pd2 = pd; // ok: initializer is a pointer to double
int *pi = pd; // error: types of pi and pd differ
pi = &dval; // error: assigning the address of a double to a pointer to int
The types must match because the type of the pointer is used to infer the type of the object to which the pointer points. If a pointer addressed an object of another type, operations performed on the underlying object would fail.
The value (i.e., the address) stored in a pointer can be in one of four states:
1. It can point to an object.
2. It can point to the location immediately past the end of an object.
3. It can be a null pointer, indicating that it is not bound to any object.
4. It can be invalid; values other than the preceding three are invalid.
It is an error to copy or otherwise try to access the value of an invalid pointer. As when we use an uninitialized variable, this error is one that the compiler is unlikely to detect. The result of accessing an invalid pointer is undefined. Therefore, we must always know whether a given pointer is valid.
Although pointers in cases 2 and 3 are valid, there are limits on what we can do with such pointers. Because these pointers do not point to any object, we may not use them to access the (supposed) object to which the pointer points. If we do attempt to access an object through such pointers, the behavior is undefined.
When a pointer points to an object, we can use the dereference operator (the * operator) to access that object:
int ival = 42;
int *p = &ival; // p holds the address of ival; p is a pointer to ival
cout << *p; // * yields the object to which p points; prints 42
Dereferencing a pointer yields the object to which the pointer points. We can assign to that object by assigning to the result of the dereference:
*p = 0; // * yields the object; we assign a new value to ival through p
cout << *p; // prints 0
When we assign to *p, we are assigning to the object to which p points.
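Reading and writing through a dereference can be sketched as a pair of small functions. The names store_through and load_through are hypothetical, and both functions assume that p points to an object; neither checks for a null or invalid pointer.

```cpp
// Hypothetical helper: stores value in the int to which p points.
// Assumes p points to an object; passing a null or invalid pointer is undefined.
void store_through(int *p, int value)
{
    *p = value;   // * yields the object; the assignment changes that object
}

// Hypothetical helper: fetches the value of the int to which p points.
int load_through(int *p)
{
    return *p;    // * yields the object; we copy its value
}
```

Given int ival = 42; and int *p = &ival;, calling store_through(p, 0) changes ival itself, and a later load_through(p) returns that new value.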
Some symbols, such as & and *, are used as both an operator in an expression and as part of a declaration. The context in which a symbol is used determines what the symbol means:
int i = 42;
int &r = i; // & follows a type and is part of a declaration; r is a reference
int *p; // * follows a type and is part of a declaration; p is a pointer
p = &i; // & is used in an expression as the address-of operator
*p = i; // * is used in an expression as the dereference operator
int &r2 = *p; // & is part of the declaration; * is the dereference operator
In declarations, & and * are used to form compound types. In expressions, these same symbols are used to denote an operator. Because the same symbol is used with very different meanings, it can be helpful to ignore appearances and think of them as if they were different symbols.
A null pointer does not point to any object. Code can check whether a pointer is null before attempting to use it. There are several ways to obtain a null pointer:
int *p1 = nullptr; // equivalent to int *p1 = 0;
int *p2 = 0; // directly initializes p2 from the literal constant 0
// must #include cstdlib
int *p3 = NULL; // equivalent to int *p3 = 0;
The most direct approach is to initialize the pointer using the literal nullptr, which was introduced by the new standard. nullptr is a literal that has a special type that can be converted (§ 2.1.2, p. 35) to any other pointer type. Alternatively, we can initialize a pointer to the literal 0, as we do in the definition of p2.
Older programs sometimes use a preprocessor variable named NULL, which the cstdlib header defines as 0.
We’ll describe the preprocessor in a bit more detail in § 2.6.3 (p. 77). What’s useful to know now is that the preprocessor is a program that runs before the compiler. Preprocessor variables are managed by the preprocessor, and are not part of the std namespace. As a result, we refer to them directly without the std:: prefix.
When we use a preprocessor variable, the preprocessor automatically replaces the variable by its value. Hence, initializing a pointer to NULL is equivalent to initializing it to 0. Modern C++ programs generally should avoid using NULL and use nullptr instead.
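The three spellings of a null pointer produce pointers that compare equal to one another. The sketch below shows this with a function we have named null_forms_agree for illustration; it relies on the equality operator, which the text describes under pointer comparisons.

```cpp
#include <cstdlib>   // defines the preprocessor variable NULL

// Hypothetical helper: true if all three initializations yield equal pointers.
bool null_forms_agree()
{
    int *p1 = nullptr;   // the literal introduced by the new standard
    int *p2 = 0;         // the literal constant 0
    int *p3 = NULL;      // the preprocessor variable from cstdlib

    return p1 == p2 && p2 == p3;
}
```

All three pointers are null, so the function returns true.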
It is illegal to assign an int variable to a pointer, even if the variable’s value happens to be 0.
int zero = 0;
pi = zero; // error: cannot assign an int to a pointer
Uninitialized pointers are a common source of run-time errors.
As with any other uninitialized variable, what happens when we use an uninitialized pointer is undefined. Using an uninitialized pointer almost always results in a run-time crash. However, debugging the resulting crashes can be surprisingly hard.
Under most compilers, when we use an uninitialized pointer, the bits in the memory in which the pointer resides are used as an address. Using an uninitialized pointer is a request to access a supposed object at that supposed location. There is no way to distinguish a valid address from an invalid one formed from the bits that happen to be in the memory in which the pointer was allocated.
Our recommendation to initialize all variables is particularly important for pointers. If possible, define a pointer only after the object to which it should point has been defined. If there is no object to bind to a pointer, then initialize the pointer to nullptr or zero. That way, the program can detect that the pointer does not point to an object.
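One way to sketch that detection is a function that tests for null before dereferencing. The name value_or and its fallback parameter are our own assumptions, not part of the text. Note that the test can only recognize a null pointer; it cannot recognize an uninitialized or otherwise invalid one.

```cpp
// Hypothetical helper: returns the int to which p points,
// or fallback when p is null. An invalid (e.g., uninitialized) pointer
// cannot be detected this way and must never be passed in.
int value_or(int *p, int fallback)
{
    if (p == nullptr)
        return fallback;   // p addresses no object, so we do not dereference it
    return *p;             // p points to an object; fetch its value
}
```

With int *none = nullptr;, the call value_or(none, -1) returns -1 without touching memory; with int x = 7;, the call value_or(&x, -1) returns 7.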
Both pointers and references give indirect access to other objects. However, there are important differences in how they do so. The most important is that a reference is not an object. Once we have defined a reference, there is no way to make that reference refer to a different object. When we use a reference, we always get the object to which the reference was initially bound.
There is no such identity between a pointer and the address that it holds. As with any other (nonreference) variable, when we assign to a pointer, we give the pointer itself a new value. Assignment makes the pointer point to a different object:
int i = 42;
int *pi = 0; // pi is initialized but addresses no object
int *pi2 = &i; // pi2 initialized to hold the address of i
int *pi3; // if pi3 is defined inside a block, pi3 is uninitialized
pi3 = pi2; // pi3 and pi2 address the same object, e.g., i
pi2 = 0; // pi2 now addresses no object
It can be hard to keep straight whether an assignment changes the pointer or the object to which the pointer points. The important thing to keep in mind is that assignment changes its left-hand operand. When we write
pi = &ival; // value in pi is changed; pi now points to ival
we assign a new value to pi, which changes the address that pi holds. On the other hand, when we write
*pi = 0; // value in ival is changed; pi is unchanged
then *pi (i.e., the value to which pi points) is changed.
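The two kinds of assignment can be checked side by side. In the sketch below, the function name pointer_vs_pointee and the extra variable other are hypothetical. After each assignment, the function records which of the three values (the pointer, ival, or other) actually changed.

```cpp
// Hypothetical helper: true if each assignment changed only its left-hand operand.
bool pointer_vs_pointee()
{
    int ival = 1024, other = 7;
    int *pi = &other;   // pi starts out pointing to other

    pi = &ival;         // value in pi is changed; pi now points to ival
    bool repointed = (pi == &ival) && (ival == 1024) && (other == 7);

    *pi = 0;            // value in ival is changed; pi is unchanged
    bool wrote_object = (pi == &ival) && (ival == 0) && (other == 7);

    return repointed && wrote_object;
}
```

The function returns true: assigning to pi left both ints alone, and assigning to *pi changed ival but left pi pointing where it was.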
So long as the pointer has a valid value, we can use a pointer in a condition. Just as when we use an arithmetic value in a condition (§ 2.1.2, p. 35), if the pointer is 0, then the condition is false:
int ival = 1024;
int *pi = 0; // pi is a valid, null pointer
int *pi2 = &ival; // pi2 is a valid pointer that holds the address of ival
if (pi) // pi has value 0, so condition evaluates as false
// ...
if (pi2) // pi2 points to ival, so it is not 0; the condition evaluates as true
// ...
Any nonzero pointer evaluates as true.
Given two valid pointers of the same type, we can compare them using the equality (==) or inequality (!=) operators. The result of these operators has type bool. Two pointers are equal if they hold the same address and unequal otherwise. Two pointers hold the same address (i.e., are equal) if they are both null, if they address the same object, or if they are both pointers one past the same object. Note that it is possible for a pointer to an object and a pointer one past the end of a different object to hold the same address. Such pointers will compare equal.
Because these operations use the value of the pointer, a pointer used in a condition or in a comparison must be a valid pointer. Using an invalid pointer as a condition or in a comparison is undefined.
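As a brief sketch of these rules, the two functions below use a pointer in a condition and in an equality comparison. The names is_null and same_address are hypothetical, and both functions assume their arguments are valid pointers (null or pointing to an object).

```cpp
// Hypothetical helper: true when p is a null pointer.
// Assumes p is valid; testing an invalid pointer is undefined.
bool is_null(int *p)
{
    if (p)              // any nonzero pointer evaluates as true
        return false;
    return true;
}

// Hypothetical helper: true when a and b hold the same address.
bool same_address(int *a, int *b)
{
    return a == b;      // the result of == has type bool
}
```

Two pointers to the same int compare equal, two null pointers compare equal, and pointers to two distinct int objects compare unequal.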
§ 3.5.3 (p. 117) will cover additional pointer operations.
void* Pointers
The type void* is a special pointer type that can hold the address of any object. Like any other pointer, a void* pointer holds an address, but the type of the object at that address is unknown:
double obj = 3.14, *pd = &obj;
// ok: void* can hold the address value of any data pointer type
void *pv = &obj; // obj can be an object of any type
pv = pd; // pv can hold a pointer to any type
There are only a limited number of things we can do with a void* pointer: We can compare it to another pointer, we can pass it to or return it from a function, and we can assign it to another void* pointer. We cannot use a void* to operate on the object it addresses: we don’t know that object’s type, and the type determines what operations we can perform on the object.
Generally, we use a void* pointer to deal with memory as memory, rather than using the pointer to access the object stored in that memory. We’ll cover using void* pointers in this way in § 19.1.1 (p. 821). § 4.11.3 (p. 163) will show how we can retrieve the address stored in a void* pointer.
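The permitted operations (holding any object's address, assignment, and comparison) can be sketched as follows. The function name void_pointer_demo and the variable n are our own additions for illustration. The function never dereferences pv, because the type of the object it addresses is unknown.

```cpp
// Hypothetical helper: exercises only the operations allowed on a void*.
bool void_pointer_demo()
{
    double obj = 3.14, *pd = &obj;
    void *pv = &obj;                // void* can hold the address of a double
    bool same = (pv == pd);         // compare with another pointer: same address

    int n = 0;
    pv = &n;                        // the same void* can now hold the address of an int
    bool differs = (pv != pd);      // n and obj are distinct objects

    return same && differs;
}
```

The function returns true: pv first holds the address stored in pd, and later holds the address of a different object of a different type.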
Exercises Section 2.3.2
Exercise 2.18: Write code to change the value of a pointer. Write code to change the value to which the pointer points.
Exercise 2.19: Explain the key differences between pointers and references.
Exercise 2.20: What does the following program do?
int i = 42;
int *p1 = &i;
*p1 = *p1 * *p1;
Exercise 2.21: Explain each of the following definitions. Indicate whether any are illegal and, if so, why.
int i = 0;
(a) double* dp = &i;
(b) int *ip = i;
(c) int *p = &i;
Exercise 2.22: Assuming p is a pointer to int, explain the following code:
if (p) // ...
if (*p) // ...
Exercise 2.23: Given a pointer p, can you determine whether p points to a valid object? If so, how? If not, why not?
Exercise 2.24: Why is the initialization of p legal but that of lp illegal?
int i = 42; void *p = &i; long *lp = &i;
As we’ve seen, a variable definition consists of a base type and a list of declarators. Each declarator can relate its variable to the base type differently from the other declarators in the same definition. Thus, a single definition might define variables of different types:
// i is an int; p is a pointer to int; r is a reference to int
int i = 1024, *p = &i, &r = i;
Many programmers are confused by the interaction between the base type and the type modification that may be part of a declarator.
It is a common misconception to think that the type modifier (* or &) applies to all the variables defined in a single statement. Part of the problem arises because we can put whitespace between the type modifier and the name being declared:
int* p; // legal but might be misleading
We say that this definition might be misleading because it suggests that int* is the type of each variable declared in that statement. Despite appearances, the base type of this declaration is int, not int*. The * modifies the type of p. It says nothing about any other objects that might be declared in the same statement:
int* p1, p2; // p1 is a pointer to int; p2 is an int
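We can ask the compiler to confirm the types in such a definition. The sketch below uses decltype and std::is_same from the type_traits header, library facilities that this section has not introduced and that appear here only as a checking aid. The function name modifier_applies_to_one_name is hypothetical.

```cpp
#include <type_traits>   // std::is_same, used only to inspect the declared types

// Hypothetical helper: true if p1 is a pointer to int and p2 is a plain int.
bool modifier_applies_to_one_name()
{
    int* p1 = nullptr, p2 = 0;   // p1 is a pointer to int; p2 is an int
    p1 = &p2;                    // legal only because p2 is an int object

    return std::is_same<decltype(p1), int*>::value
        && std::is_same<decltype(p2), int>::value
        && *p1 == 0;             // p1 now points to p2, whose value is 0
}
```

The function returns true, which confirms that the * in the definition modified only p1.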
There are two common styles used to define multiple variables with pointer or reference type. The first places the type modifier adjacent to the identifier:
int *p1, *p2; // both p1 and p2 are pointers to int
This style emphasizes that the variable has the indicated compound type.
The second places the type modifier with the type but defines only one variable per statement:
int* p1; // p1 is a pointer to int
int* p2; // p2 is a pointer to int
This style emphasizes that the declaration defines a compound type.
There is no single right way to define pointers or references. The important thing is to choose a style and use it consistently.
In this book we use the first style and place the * (or the &) with the variable name.
In general, there are no limits to how many type modifiers can be applied to a declarator. When there is more than one modifier, they combine in ways that are logical but not always obvious. As one example, consider a pointer. A pointer is an object in memory, so like any object it has an address. Therefore, we can store the address of a pointer in another pointer.
We indicate each pointer level by its own *. That is, we write ** for a pointer to a pointer, *** for a pointer to a pointer to a pointer, and so on:
int ival = 1024;
int *pi = &ival; // pi points to an int
int **ppi = &pi; // ppi points to a pointer to an int
Here pi is a pointer to an int and ppi is a pointer to a pointer to an int. We might represent these objects as a chain: ppi holds the address of pi, and pi holds the address of ival, whose value is 1024.
Just as dereferencing a pointer to an int yields an int, dereferencing a pointer to a pointer yields a pointer. To access the underlying object, we must dereference the original pointer twice:
cout << "The value of ival\n"
<< "direct value: " << ival << "\n"
<< "indirect value: " << *pi << "\n"
<< "doubly indirect value: " << **ppi
<< endl;
This program prints the value of ival three different ways: first, directly; then, through the pointer to int in pi; and finally, by dereferencing ppi twice to get to the underlying value in ival.
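Double indirection works for writing as well as for reading. In the sketch below, the function name write_through_two_levels and the variable inner are hypothetical. One dereference of ppi yields the pointer pi, and two dereferences yield ival itself.

```cpp
// Hypothetical helper: writes to ival through a pointer to a pointer
// and returns the resulting value of ival.
int write_through_two_levels()
{
    int ival = 1024;
    int *pi = &ival;      // pi points to an int
    int **ppi = &pi;      // ppi points to a pointer to an int

    int *inner = *ppi;    // one dereference yields the pointer pi
    **ppi = 2048;         // two dereferences yield ival; assigns 2048 to it

    return (inner == pi) ? ival : -1;   // inner holds the same address as pi
}
```

The function returns 2048, the value stored in ival through ppi.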
A reference is not an object. Hence, we may not have a pointer to a reference. However, because a pointer is an object, we can define a reference to a pointer:
int i = 42;
int *p; // p is a pointer to int
int *&r = p; // r is a reference to the pointer p
r = &i; // r refers to a pointer; assigning &i to r makes p point to i
*r = 0; // dereferencing r yields i, the object to which p points; changes i to 0
The easiest way to understand the type of r is to read the definition right to left. The symbol closest to the name of the variable (in this case the & in &r) is the one that has the most immediate effect on the variable’s type. Thus, we know that r is a reference. The rest of the declarator determines the type to which r refers. The next symbol, * in this case, says that the type r refers to is a pointer type. Finally, the base type of the declaration says that r is a reference to a pointer to an int.
It can be easier to understand complicated pointer or reference declarations if you read them from right to left.
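A reference to a pointer is often seen as a function parameter, which lets a function change where the caller's pointer points. Function parameters of reference type are not covered in this section, so treat the following as a forward-looking sketch; the function name point_at and its parameters are hypothetical.

```cpp
// Hypothetical helper: r is a reference to a pointer to int (read right to left),
// so assigning to r changes the caller's pointer itself.
void point_at(int *&r, int &target)
{
    r = &target;   // the pointer to which r refers now points to target
}
```

Given int i = 42; and int *p = nullptr;, the call point_at(p, i) makes p point to i, after which *p = 0 changes i to 0.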
Exercises Section 2.3.3
Exercise 2.25: Determine the types and values of each of the following variables.
(a) int* ip, &r = ip;
(b) int i, *ip = 0;
(c) int* ip, ip2;