Lab 4 - Advanced I/O in C

Lab goals:


Formatted I/O

We have used printf() in previous labs and examples, but we have not yet covered all of its features. In this lab, we take a deeper look at printf() and introduce the related Standard I/O library function scanf():

    int printf(const char *format, ...);

    int fprintf(FILE *stream, const char *format, ...);

    int scanf(const char *format, ...);

    int fscanf(FILE *stream, const char *format, ...);

Included above are also the related functions fprintf() and fscanf(). Calling printf() always outputs to stdout, whereas fprintf() outputs to the specified stream, and calling scanf() always reads from stdin, whereas fscanf() reads from the specified stream.
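As a small illustration of the stream argument and of a few format features, here is a minimal sketch. The helper name print_row is hypothetical and is not part of the lab code; it simply forwards to fprintf() so that the caller chooses the destination stream.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not part of the lab code): print a labeled
 * value to whichever stream the caller chooses.
 *   %-10s  left-justifies the label in a 10-character field
 *   %8.2f  right-justifies the value in 8 characters, with 2 digits
 *          after the decimal point
 * Returns what fprintf() returns: the number of characters written,
 * or a negative value on error.
 */
int
print_row(FILE *stream, const char *label, double value)
{
        return (fprintf(stream, "%-10s|%8.2f|\n", label, value));
}
```

Calling print_row(stdout, "width", 3.14159) behaves like the equivalent printf() call and prints "width     |    3.14|" followed by a newline; passing stderr instead sends the same line to the standard error stream.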

Below is a simple, but buggy, example that uses printf() and scanf() (this code is also available as printf_bug.c in your repo for this lab):

     #include <stdio.h>

     int
     main(void)
     {
             int cnt, i1, i2;

             printf("Enter two integers: ");
     
             cnt = scanf("%d %d", i1, i2);
             if (cnt == EOF) {
                     fprintf(stderr, "Error during scanf.\n");
                     return (1);  /* non-zero for error */
             } else if (cnt < 2) {
                     fprintf(stderr, "scanf matched %d input items instead of 2.\n",
                         cnt);
                     return (2);  /* non-zero for error */
             }

             printf("\nThe product of %d and %d is %d.\n", i1, i2, i1 * i2);

             return (0);  /* no error */
     }

Key points:


String I/O

Recall the echo_bug.c program from Lab 2. It is also possible to do I/O — and to echo — with strings instead of characters, using the following functions in the Standard I/O library:

    char *gets(char *s);

    char *fgets(char *s, int size, FILE *stream);

    int puts(const char *s);

    int fputs(const char *s, FILE *stream);
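As a rough sketch of how the stream-based pair fgets() and fputs() fit together, the helper below copies one stream to another a buffer at a time. The names copy_lines and LINE_MAX_LEN are hypothetical and chosen only for this illustration; the buffer size is arbitrary.

```c
#include <stdio.h>

#define LINE_MAX_LEN 128  /* hypothetical buffer size for this sketch */

/*
 * Copy "in" to "out" one chunk at a time. fgets() stores at most
 * sizeof(buf) - 1 characters plus a terminating '\0', and it keeps the
 * newline if one was read. fputs() writes the string without adding
 * anything, so the output reproduces the input exactly, even when a
 * line is longer than the buffer and arrives in several chunks.
 * Returns 0 on success and -1 on an I/O error.
 */
int
copy_lines(FILE *in, FILE *out)
{
        char buf[LINE_MAX_LEN];

        while (fgets(buf, sizeof(buf), in) != NULL) {
                if (fputs(buf, out) == EOF)
                        return (-1);
        }
        return (ferror(in) ? -1 : 0);
}
```

For example, copy_lines(stdin, stdout) would echo standard input to standard output. Note the asymmetry with the other pair: puts() appends a newline to what it writes, whereas fputs() does not.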

Here is a simple, but buggy, example (see echostr_bug.c in your repo):

     #include <stdio.h>

     int
     main(void)
     {
             char input[10];

             while (gets(input) != NULL) {
                     puts(input);
             }

             return (0);  /* no error */
     }

Key points:


Byte and Word I/O

When processing large data sets, it is more efficient to handle the data as raw binary data rather than as character strings. For example, representing the number 1234567 as a character string requires 7 bytes (really 8 bytes, counting the '\0' character that terminates the string), whereas representing it as an int (raw binary data) requires only 4 bytes on most current systems. The following functions in the Standard I/O library are useful for I/O on such raw binary data:

    int getw(FILE *stream);

    int putw(int w, FILE *stream);

    size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream);

    size_t fwrite(const void *ptr, size_t size, size_t nmemb, FILE *stream);

Here is a simple, but buggy, example (this code is also available as sumints_bug.c in your repo):

     #include <stdio.h>

     int
     main(int argc, char *argv[])
     {
             FILE *input_file, *output_file;
             int error, number, sum = 0;
             char *input_filename, *output_filename = "SUM.bin";
     
             /* Filename must be the only argument. */
             if (argc == 2)
                     input_filename = argv[argc - 1];
             else {
                     fprintf(stderr, "Wrong number of arguments.\n");
                     return (1);
             }
     
             input_file = fopen(input_filename, "r");
             if (input_file == NULL) {
                     fprintf(stderr, "Can't open %s.\n", input_filename);
                     return (1);  /* non-zero for error */
             }
      
             output_file = fopen(output_filename, "w");
             if (output_file == NULL) {
                     fclose(input_file);
                     fprintf(stderr, "Can't open %s.\n", output_filename);
                     return (1);  /* non-zero for error */
             }
      
             while ((number = getw(input_file)) != EOF)
                     sum += number;
             
             printf("The sum is %d.\n", sum);
      
             if (putw(sum, output_file) == EOF) {
                     fprintf(stderr, "Unable to write sum.\n");
                     error = 1;
             } else
                     error = 0;
      
             fclose(input_file);
             fclose(output_file);
      
             return (error);
     }

These functions can be used to effectively serialize your data, so that you can save the data to a file and restore it later. However, note that this serialization is actually machine-dependent, unlike the machine-independent serialization you may be used to from languages like Java. Therefore, one important caveat when dealing with binary data in C is that the data will be interpreted differently depending on the endianness (little-endian vs. big-endian format) of the host computer. The terms little-endian and big-endian refer to the two different, incompatible ways of laying out the bytes of a multi-byte value (such as an int) in memory. (The use of the word endian in describing this issue comes originally from Jonathan Swift's story Gulliver's Travels; see this classic paper, if you are curious about how we got from there to here.)

As an example, consider how a 4-byte integer is stored. On computers using little-endian format, such as the Intel x86, the least significant byte of the integer comes first in memory: of the 4 bytes of memory used to store the integer, the byte with the lowest address stores the least significant byte, and the other 3 bytes follow in order at each higher address. On computers using big-endian format, such as most SPARC processors, the most significant byte comes first: the byte with the lowest address stores the most significant byte of the integer. Because of this difference in byte ordering, data written on a computer of one endianness must be byte swapped when it is read on a computer of the opposite endianness.
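The byte swap described above can be sketched as follows. The function names swap32 and is_little_endian are hypothetical, and the sketch assumes 32-bit values (uint32_t from <stdint.h>).

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit value: 0x12345678 becomes 0x78563412. */
uint32_t
swap32(uint32_t x)
{
        return (((x & 0x000000FFu) << 24) |
            ((x & 0x0000FF00u) << 8) |
            ((x & 0x00FF0000u) >> 8) |
            ((x & 0xFF000000u) >> 24));
}

/*
 * Return 1 if this machine stores the least significant byte at the
 * lowest address (little-endian), and 0 otherwise. The value 1 has
 * only its least significant byte set, so looking at the first byte
 * in memory reveals the layout.
 */
int
is_little_endian(void)
{
        uint32_t probe = 1;

        return (*(const unsigned char *)&probe == 1);
}
```

A program reading data written on a machine of the opposite endianness would apply swap32() to each 32-bit value after reading it. Swapping twice returns the original value.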

Also, note that the function getw() returns either the integer that was read or the value EOF. However, as described in the previous I/O lab, the value EOF is traditionally just the integer -1. So the return value alone cannot tell you whether getw() actually read the integer -1 or instead reached the end of the file or encountered an error. The correct approach is to call feof() and ferror() to check for the end-of-file or an error when using getw().

The functions getw() and putw() are deprecated, however, and the functions fread() and fwrite() are preferred instead.
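A minimal sketch of the same kind of task using fread() and fwrite() might look like the following. The helper names sum_ints and write_int are hypothetical, and the sketch assumes the file holds ints written on a machine with the same int size and endianness.

```c
#include <stdio.h>

/*
 * Sum every int stored in "in" as raw binary data. fread() returns the
 * number of complete items it read, so a stored value of -1 is just
 * another number here and cannot be confused with end-of-file.
 * A trailing partial int, if any, is ignored.
 * Returns 0 on success, or -1 on a read error.
 */
int
sum_ints(FILE *in, int *sum)
{
        int number;

        *sum = 0;
        while (fread(&number, sizeof(number), 1, in) == 1)
                *sum += number;
        return (ferror(in) ? -1 : 0);
}

/* Write one int as raw binary data. Returns 0 on success, -1 on failure. */
int
write_int(FILE *out, int value)
{
        return (fwrite(&value, sizeof(value), 1, out) == 1 ? 0 : -1);
}
```

Because the item count is returned separately from the data, the loop ends only when no complete int could be read, and ferror() then distinguishes a read error from a normal end-of-file.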


Seeking to a New Position within a Stream

When you first open a stream, the current position within that stream is at the beginning of the file (a stream opened in append mode is the exception, since its writes always go to the end of the file). If you read or write n bytes on that stream, the current position on the stream advances by n bytes. You can thus read or write the stream sequentially by simply making repeated read or write calls on the stream.

But reading through an entire large file just to access data that is near the end of the file would be inefficient. Similarly, it would be inefficient to close a file and re-open it just to go back to the beginning and read the first parts of the file again. To make moving around within an open file more efficient and convenient, the Standard I/O library provides the function fseek(), which lets you explicitly set the stream's current position in the file to a specified location:

    int fseek(FILE *stream, long offset, int whence);

The whence parameter should be one of SEEK_SET, SEEK_CUR, or SEEK_END to indicate that the offset parameter is relative to, respectively, the beginning of the file, the current position in the file for that stream, or the end of the file. See man fseek.
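To make the whence values concrete, here is a small sketch for a file of raw binary ints. The helper names read_last_int and read_first_int are hypothetical; the sketch assumes the stream was opened for reading in binary mode on a system (such as Linux) where seeking relative to the end of a file is supported.

```c
#include <stdio.h>

/*
 * Read the last int in a file of raw binary ints without reading
 * everything before it. Returns 0 on success, -1 on failure.
 */
int
read_last_int(FILE *stream, int *value)
{
        /* Move to sizeof(int) bytes before the end of the file. */
        if (fseek(stream, -(long)sizeof(int), SEEK_END) != 0)
                return (-1);
        return (fread(value, sizeof(*value), 1, stream) == 1 ? 0 : -1);
}

/*
 * Read the first int in the file by seeking to offset 0 from the
 * beginning, with no need to close and re-open the file.
 * Returns 0 on success, -1 on failure.
 */
int
read_first_int(FILE *stream, int *value)
{
        if (fseek(stream, 0L, SEEK_SET) != 0)
                return (-1);
        return (fread(value, sizeof(*value), 1, stream) == 1 ? 0 : -1);
}
```

After either call, the current position is just past the int that was read, so a later fseek(stream, (long)sizeof(int), SEEK_CUR) would skip over the next int from there.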


GitHub Repository for This Lab

To obtain your private repo for this lab, please point your browser to the starter code for the lab at:

https://classroom.github.com/a/V7d1IpFb

Follow the same steps as for previous labs and assignments to create your repository on GitHub and then clone it onto CLEAR. The directory for your repository for this lab will be

    lab-4-advanced-i-o-in-c-name

where name is your GitHub userid.

Submission

Again, be sure to git push the appropriate C source files for this lab before 11:55 PM tonight to get credit for this lab.