17.5. The IO Library Revisited

In Chapter 8 we introduced the basic architecture and most commonly used parts of the IO library. In this section we’ll look at three of the more specialized features that the IO library supports: format control, unformatted IO, and random access.

17.5.1. Formatted Input and Output

In addition to its condition state (§ 8.1.2, p. 312), each iostream object also maintains a format state that controls the details of how IO is formatted. The format state controls aspects of formatting such as the notational base for integral values, the precision of floating-point values, the width of an output element, and so on.

The library defines a set of manipulators (§ 1.2, p. 7), listed in Tables 17.17 (p. 757) and 17.18 (p. 760), that modify the format state of a stream. A manipulator is a function or object that affects the state of a stream and can be used as an operand to an input or output operator. Like the input and output operators, a manipulator returns the stream object to which it is applied, so we can combine manipulators and data in a single statement.

Our programs have already used one manipulator, endl, which we “write” to an output stream as if it were a value. But endl isn’t an ordinary value; instead, it performs an operation: It writes a newline and flushes the buffer.

Many Manipulators Change the Format State

Manipulators are used for two broad categories of output control: controlling the presentation of numeric values and controlling the amount and placement of padding. Most of the manipulators that change the format state provide set/unset pairs; one manipulator sets the format state to a new value and the other unsets it, restoring the normal default formatting.

Controlling the Format of Boolean Values

Indicating Base on the Output

Exercises Section 17.5.1

Exercise 17.34: Write a program that illustrates the use of each manipulator in Tables 17.17 (p. 757) and 17.18.

Exercise 17.35: Write a version of the program from page 758 that printed the square root of 2, but this time print hexadecimal digits in uppercase.

Exercise 17.36: Modify the program from the previous exercise to print the various floating-point values so that they line up in a column.

17.5.2. Unformatted Input/Output Operations

So far, our programs have used only formatted IO operations. The input and output operators (>> and <<) format the data they read or write according to the type being handled. The input operators ignore whitespace; the output operators apply padding, precision, and so on.

The library also provides a set of low-level operations that support unformatted IO. These operations let us deal with a stream as a sequence of uninterpreted bytes.

Single-Byte Operations

Several of the unformatted operations deal with a stream one byte at a time. These operations, which are described in Table 17.19, read rather than ignore whitespace. For example, we can use the unformatted IO operations get and put to read and write the characters one at a time:

This program preserves the whitespace in the input. Its output is identical to the input. It executes the same way as the previous program that used noskipws.

This program operates identically to the one on page 761, the only difference being the version of get that is used to read the input.

Multi-Byte Operations

Some unformatted IO operations deal with chunks of data at a time. These operations can be important if speed is an issue, but like other low-level operations, they are error-prone. In particular, these operations require us to allocate and manage the character arrays (§ 12.2, p. 476) used to store and retrieve data. The multi-byte operations are listed in Table 17.20.

Caution: Low-Level Routines Are Error-Prone

In general, we advocate using the higher-level abstractions provided by the library. The IO operations that return int are a good example of why.

It is a common programming error to assign the return from get or peek to a char rather than an int. Doing so is an error, but an error the compiler will not detect. Instead, what happens depends on the machine and on the input data. For example, on a machine in which chars are implemented as unsigned chars, this loop will run forever:

    char ch;    // using a char here invites disaster!
    // the return from cin.get is converted to char and then compared to an int
    while ((ch = cin.get()) != EOF)
        cout.put(ch);

The problem is that when get returns EOF, that value will be converted to an unsigned char value. That converted value is no longer equal to the int value of EOF, and the loop will continue forever. Such errors are likely to be caught in testing.

On machines for which chars are implemented as signed chars, we can’t say with confidence what the behavior of the loop might be. What happens when an out-of-bounds value is assigned to a signed value is up to the compiler. On many machines, this loop will appear to work, unless a character in the input matches the EOF value. Although such characters are unlikely in ordinary data, presumably low-level IO is necessary only when we read binary values that do not map directly to ordinary characters and numeric values. For example, on our machine, if the input contains a character whose value is '\377', then the loop terminates prematurely. '\377' is the value on our machine to which −1 converts when used as a signed char. If the input has this value, then it will be treated as the (premature) end-of-file indicator.

Such bugs do not happen when we read and write typed values. If you can use the more type-safe, higher-level operations supported by the library, do so.

Seek and Tell Functions

There Is Only One Marker

The fact that the library distinguishes between the “putting” and “getting” versions of the seek and tell functions can be misleading. Even though the library makes this distinction, it maintains only a single marker in a stream—there is not a distinct read marker and write marker.

When we’re dealing with an input-only or output-only stream, the distinction isn’t even apparent. We can use only the g or only the p versions on such streams. If we attempt to call tellp on an ifstream, the compiler will complain. Similarly, it will not let us call seekg on an ostringstream.

The fstream and stringstream types can read and write the same stream. In these types there is a single buffer that holds data to be read and written and a single marker denoting the current position in the buffer. The library maps both the g and p positions to this single marker.

The arguments, new_position and offset, have machine-dependent types named pos_type and off_type, respectively. These types are defined in both istream and ostream. pos_type represents a file position and off_type represents an offset from that position. A value of type off_type can be positive or negative; we can seek forward or backward in the file.

Accessing the Marker

The tellg or tellp functions return a pos_type value denoting the current position of the stream. The tell functions are usually used to remember a location so that we can subsequently seek back to it:

Reading and Writing to the Same File

Let’s look at a programming example. Assume we are given a file to read. We are to write a new line at the end of the file that contains the relative position at which each line begins. For example, given the following file,

Because our program writes to its input file, we can’t use end-of-file to signal when it’s time to stop reading. Instead, our loop must end when it reaches the point at which the original input ended. As a result, we must first remember the original end-of-file position. Because we opened the file in ate mode, inOut is already positioned at the end. We store the current (i.e., the original end) position in end_mark. Having remembered the end position, we reposition the read marker at the beginning of the file by seeking to the position 0 bytes from the beginning of the file.

The while loop has a three-part condition: We first check that the stream is valid; if so, we check whether we’ve exhausted our original input by comparing the current read position (returned by tellg) with the position we remembered in end_mark. Finally, assuming that both tests succeeded, we call getline to read the next line of input. If getline succeeds, we perform the body of the loop.

The loop body starts by remembering the current position in mark. We save that position in order to return to it after writing the next relative offset. The call to seekp repositions the write marker to the end of the file. We write the counter value and then seekg back to the position we remembered in mark. Having restored the marker, we’re ready to repeat the condition in the while.

Each iteration of the loop writes the offset of the next line. Therefore, the last iteration of the loop takes care of writing the offset of the last line. However, we still need to write a newline at the end of the file. As with the other writes, we call seekp to position the file at the end before writing the newline.