Lab 4 - Advanced I/O in C
Lab goals:
- Introduce advanced C input/output functions and how they are used.
Formatted I/O
We have seen printf() in previous labs and examples, but we have not yet covered all of its features. In this lab, we will start with a deeper look at printf() and will also introduce the related Standard I/O library function scanf():
int printf(const char *format, ...);
int fprintf(FILE *stream, const char *format, ...);
int scanf(const char *format, ...);
int fscanf(FILE *stream, const char *format, ...);
Included above are also the related functions fprintf() and fscanf(). Calling printf() always outputs to stdout, whereas fprintf() outputs to the specified stream. Similarly, calling scanf() always reads from stdin, whereas fscanf() reads from the specified stream.
Below is a simple, but buggy, example that uses printf() and scanf() (this code is also available as printf_bug.c in your repo for this lab):
#include <stdio.h>

int
main(void)
{
	int cnt, i1, i2;

	printf("Enter two integers: ");
	cnt = scanf("%d %d", i1, i2);
	if (cnt == EOF) {
		fprintf(stderr, "Error during scanf.\n");
		return (1);	/* non-zero for error */
	} else if (cnt < 2) {
		fprintf(stderr, "scanf matched %d input items instead of 2.\n",
		    cnt);
		return (2);	/* non-zero for error */
	}
	printf("\nThe product of %d and %d is %d.\n", i1, i2, i1 * i2);
	return (0);	/* no error */
}
Key points:
- The first argument of printf() and scanf() is a format string. In a format string, the percent sign (%) acts as an escape character to mark the placement of one of the arguments following the format string. The character(s) following the percent sign indicate the output (for printf()) or input (for scanf()) format for the argument. The common formats are: %d (for an integer formatted in decimal), %u (for an unsigned decimal integer), %x (for an unsigned integer formatted in hexadecimal), %f (for a floating-point number), %c (for a single character), %s (for a string), and %p (for a pointer).
- The escape sequences in the format string can also include modifiers, e.g., to indicate the number of digits to display. For example, the format string %02x will print out a hex number with a minimum width of two digits, printing leading zeroes as necessary ("zero padding"). See man 3 printf for more information on modifiers. Remember that section 3 of the manual documents library procedures, whereas section 1 documents commands; what happens if you use just man printf instead of man 3 printf?
- The values of i1 and i2 above are undefined before the call to scanf(), but afterwards they have the inputted values. For scanf(), the arguments following the format string must be pointers to space that has been allocated! (See also the class notes on allocation.)
String I/O
Recall the echo_bug.c program from Lab 2. It is also possible to do I/O — and to echo — with strings instead of characters, using the following functions in the Standard I/O library:
char *gets(char *s);
char *fgets(char *s, int size, FILE *stream);
int puts(const char *s);
int fputs(const char *s, FILE *stream);
Here is a simple, but buggy, example (see echostr_bug.c in your repo):
#include <stdio.h>

int
main(void)
{
	char input[10];

	while (gets(input) != NULL) {
		puts(input);
	}
	return (0);	/* no error */
}
Key points:
- The functions gets() and fgets() return s on success, or return NULL on any error or on end of file on the input stream.
- In the code above, if the length of the input string is greater than 9 characters, where do the extra characters go? They overrun the end of the array into whatever memory follows it! Never use gets(). Note the warning message that cc reports when you try to use it!
- Calling gets(input) is (almost) equivalent to scanf("%s", input), and it suffers from the same buffer overflow bug. Never use scanf("%s", input). Instead, use scanf("%9s", input) for a maximum number of 9 characters (for example), or use scanf("%ms", &p), where p is a char *, to have scanf() allocate a buffer of sufficient size (you must remember to then free(p) yourself!). See man scanf for more information.
- The function fgets() is the file version of gets(), except that fgets() also includes a maximum length argument that makes it safe to use. Calling fgets() is the only preferred way to read a string in C!
Byte and Word I/O
When processing large data sets, it is more efficient to handle the data as raw binary data rather than as character strings. For example, representing the number 1234567 as a character string requires 7 bytes (really 8 bytes, with the '\0' character to terminate this string), whereas representing it as an integer (raw binary data), requires only 4 bytes. The following functions in the Standard I/O library are useful for I/O on such raw binary data:
int getw(FILE *stream);
int putw(int w, FILE *stream);
size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream);
size_t fwrite(const void *ptr, size_t size, size_t nmemb, FILE *stream);
Here is a simple, but buggy, example (this code is also available as sumints_bug.c in your repo):
#include <stdio.h>

int
main(int argc, char *argv[])
{
	FILE *input_file, *output_file;
	int error, number, sum = 0;
	char *input_filename, *output_filename = "SUM.bin";

	/* Filename must be the only argument. */
	if (argc == 2)
		input_filename = argv[argc - 1];
	else {
		fprintf(stderr, "Wrong number of arguments.\n");
		return (1);
	}
	input_file = fopen(input_filename, "r");
	if (input_file == NULL) {
		fprintf(stderr, "Can't open %s.\n", input_filename);
		return (1);	/* non-zero for error */
	}
	output_file = fopen(output_filename, "w");
	if (output_file == NULL) {
		fclose(input_file);
		fprintf(stderr, "Can't open %s.\n", output_filename);
		return (1);	/* non-zero for error */
	}
	while ((number = getw(input_file)) != EOF)
		sum += number;
	printf("The sum is %d.\n", sum);
	if (putw(sum, output_file) == EOF) {
		fprintf(stderr, "Unable to write sum.\n");
		error = 1;
	} else
		error = 0;
	fclose(input_file);
	fclose(output_file);
	return (error);
}
These functions can be used to effectively serialize your data, so that you can save the data to a file and restore it later. However, note that this serialization is actually machine-dependent, unlike the machine-independent serialization you may be used to from languages like Java. Therefore, one important caveat when dealing with binary data in C is that the data will be interpreted differently depending on the endianness (little-endian vs. big-endian format) of the host computer. The terms little-endian and big-endian refer to the two different, incompatible ways of laying out the bytes of a multi-byte value (such as an int) in memory. (The use of the word endian in describing this issue comes originally from Jonathan Swift's story Gulliver's Travels; see this classic paper, if you are curious about how we got from there to here.)
As an example, to represent a 4-byte integer on computers using little-endian format such as the Intel x86, the least significant byte of the integer comes first in memory (i.e., of the 4 bytes of memory used to store the integer, the memory byte with the lowest address stores the least significant byte of the integer, with the other 3 bytes following in order, at each higher-addressed byte in memory). On computers using big-endian format such as most SPARC processors, the most significant byte of the integer comes first in memory (i.e., of the 4 bytes, the memory byte with the lowest address stores the most significant byte of the integer). This difference in byte ordering means that you would need to byte swap the values in your data if you write it on a computer of one endianness and then read it on a computer of the opposite endianness.
Also, note that the function getw() returns either the integer that was read or the value EOF. However, as described in the previous I/O lab, the value EOF is traditionally just the integer -1. So it is impossible to tell from the return value alone whether getw() actually read and is returning the integer -1 or instead is returning the value EOF. The correct approach is to use feof() to check for the end-of-file (and ferror() to check for an error) when using getw().
But the functions getw() and putw() are deprecated, and using the functions fread() and fwrite() instead is preferred.
Seeking to a New Position within a Stream
When you first open a stream, the current position within that stream is at the beginning of the file. If you read or write n bytes on that stream, the current position on the stream advances by n bytes. You can thus read or write the stream sequentially by simply making repeated read or write calls on the stream.
But reading through an entire large file just to access data that is near the end of the file would be inefficient. Similarly, for example, it would be inefficient to close a file and re-open it just to go back to the beginning of the file and be able to read the first parts of the file again. To make moving around within an open file more efficient and convenient, the Standard I/O library provides the function fseek() to allow you to explicitly move around within the stream by modifying the stream's current position in the file, seeking in the stream to a specified position:
int fseek(FILE *stream, long offset, int whence);
The whence parameter should be one of SEEK_SET, SEEK_CUR, or SEEK_END to indicate that the offset parameter is relative to, respectively, the beginning of the file, the current position in the file for that stream, or the end of the file. See man fseek.
GitHub Repository for This Lab
To obtain your private repo for this lab, please point your browser to the starter code for the lab at:
https://classroom.github.com/a/V7d1IpFb
Follow the same steps as for previous labs and assignments to create your repository on GitHub and to then clone it onto CLEAR. The directory for your repository for this lab will be
lab-4-advanced-i-o-in-c-name
where name is your GitHub userid.