Lab 4 - Advanced C I/O

Lab goals:


GitHub Repository for This Lab

To obtain your private repo for this lab, please point your browser to this starting link. Follow the protocol from the previous labs and assignments to get your repository link and clone the repository. The repository for this lab is
  lab-4-advanced-i-o-in-c-[YOUR GITHUB ID]

Formatted I/O

We have used printf() in previous labs and examples, but we have not covered the function completely. Let's take a deeper look at printf() and introduce scanf():

    int printf(const char *format, ...);

    int scanf(const char *format, ...);

Here is a simple, but buggy, example that uses printf() and scanf() (see printf_bug.c in your repo):

     #include <stdio.h>

     int
     main(void)
     {
             int cnt, i1, i2;

             printf("Enter two integers: ");
     
             cnt = scanf("%d %d", i1, i2);
             if (cnt == EOF) {
                     fprintf(stderr, "Error during scanf.\n");
                     return (1);  /* non-zero for error */
             } else if (cnt < 2) {
                     fprintf(stderr, "scanf matched %d input items instead of 2.\n",
                         cnt);
                     return (2);  /* non-zero for error */
             }

             printf("\nThe product of %d and %d is %d.\n", i1, i2, i1 * i2);

             return (0);  /* no error */
     }

Key points:

Scanf Exercise #1

Fix the bug in the above code. The compiler will provide some useful warnings.

Printf Exercise #1

Modify the above code to input and multiply two floating-point numbers. Use mul-fp.c as the name of your source code file for this new program.

Print out the inputs and result with 3 digits of precision. (Use man 3 printf to determine how to do this.)

Scanf Exercise #2

Take a look at the following buggy code from scanf_bug.c.

     #include <stdio.h>
     #include <string.h>

     int
     main(void)
     {
             int cnt;
             char string[10];
     
             printf("Enter a string: ");
     
             cnt = scanf("%s", string);
             if (cnt == EOF) {
                     fprintf(stderr, "Error during scanf.\n");
                     return (1);  /* non-zero for error */
             } else if (cnt < 1) {
                     fprintf(stderr, "scanf matched %d input items instead of 1.\n",
                         cnt);
                     return (2);  /* non-zero for error */
             }
 
             printf("\nThe length of the string is %zu.\n",
                 strlen(string));

             return (0);  /* no error */
     }

The return type of strlen is size_t. Note the use of the format specifier %zu in the format string for the final call to printf in order to properly format a number of type size_t. For comparison, if it were an unsigned long, you would use %lu.

Find, demonstrate, and explain the buffer overflow bug in the above code.

scanf(), like printf(), allows the programmer to specify a field width in a conversion specification. For scanf(), this is a maximum field width. Changing the scanf() line to scanf("%9s", string); ensures that at most 9 characters, followed by the terminating null character, are ever stored into string. Make this change and verify that it fixes the buffer overflow bug.


String I/O

Recall the echo_bug.c program from Lab 2. It is also possible to echo with strings instead of characters with the following functions:

    char *gets(char *s);

    char *fgets(char *s, int size, FILE *stream);

    int puts(const char *s);

    int fputs(const char *s, FILE *stream);

Here is a simple, but buggy, example (see echostr_bug.c in your repo):

     #include <stdio.h>

     int
     main(void)
     {
             char input[10];

             while (gets(input) != NULL) {
                     puts(input);
             }

             return (0);  /* no error */
     }

Key points:

String I/O Exercise

Rewrite the above buggy example to use fgets() instead of gets().


Byte and Word I/O

When processing large data sets, it is more efficient to handle the data as raw binary data than as characters. For example, representing the number 1234567 with characters requires 7 bytes (one per digit), whereas representing it as an integer (raw binary data) requires only 4 bytes. The following functions are used for manipulating binary data:

    int getw(FILE *stream);

    int putw(int w, FILE *stream);

    size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream);

    size_t fwrite(const void *ptr, size_t size, size_t nmemb, FILE *stream);

Here is a simple, but buggy, sumints_bug.c example:

     #include <stdio.h>

     int
     main(int argc, char *argv[])
     {
             FILE *input_file, *output_file;
             int error, number, sum = 0;
             char *input_filename, *output_filename = "SUM.bin";
     
             /* Filename must be the only argument. */
             if (argc == 2)
                     input_filename = argv[argc - 1];
             else {
                     fprintf(stderr, "Wrong number of arguments.\n");
                     return (1);
             }
     
             input_file = fopen(input_filename, "r");
             if (input_file == NULL) {
                     fprintf(stderr, "Can't open %s.\n", input_filename);
                     return (1);  /* non-zero for error */
             }
      
             output_file = fopen(output_filename, "w");
             if (output_file == NULL) {
                     fclose(input_file);
                     fprintf(stderr, "Can't open %s.\n", output_filename);
                     return (1);  /* non-zero for error */
             }
      
             while ((number = getw(input_file)) != EOF)
                     sum += number;
             
             printf("The sum is %d.\n", sum);
      
             if (putw(sum, output_file) == EOF) {
                     fprintf(stderr, "Unable to write sum.\n");
                     error = 1;
             } else
                     error = 0;
      
             fclose(input_file);
             fclose(output_file);
      
             return (error);
     }

These functions can be used to serialize your data, so that you can save the data to a file and restore it later. However, note that this serialization is machine-dependent, unlike the machine-independent serialization you are used to from languages like Java. One important caveat when dealing with binary data is that the data will be interpreted differently depending on the "endianness" (little-endian vs. big-endian format) of the host machine. These terms refer to the two different, incompatible ways of laying out the bytes of a multi-byte value in memory.

For example, to represent a 4-byte integer on machines using little-endian format, such as the Intel x86, the least significant byte of the integer comes first in memory (of the 4 bytes used to store the integer, the memory byte with the lowest address stores the least significant byte of the integer). On machines using big-endian format, such as most SPARC processors, the most significant byte of the integer comes first in memory (of the 4 bytes, the memory byte with the lowest address stores the most significant byte of the integer). This difference means that you would need to byte swap the values in your data set if you move the data between machines with different endianness.

The function getw() returns either the read integer or EOF. However, as you may recall from the previous I/O lab, EOF is traditionally just the integer -1. So it is impossible to tell if getw() is returning the integer -1 or EOF. It would be correct to use feof() to check for the end-of-file, but the functions fread() and fwrite() are preferred.

Byte I/O Exercise #1

Run the above sumints_bug.c program on the binary file INTS.bin containing the integers from 10 to -1. The sum should be 54, but because of the bug, the result turns out to be 55. Fix the bug using the feof() function.

Byte I/O Exercise #2

The getw() and putw() functions are deprecated. Modify the above sumints_bug.c to use the preferred fread() and fwrite() functions instead. man fread and man fwrite will be useful for this.


File Seeking

Reading through an entire large file looking for data at the end is inefficient. Similarly, it would be inefficient to close and re-open a file just to read the beginning of the file again. The fseek() function allows you to explicitly modify the current position for a file.

    int fseek(FILE *stream, long offset, int whence);

The whence parameter is one of SEEK_SET, SEEK_CUR, or SEEK_END to indicate if the offset parameter is relative to the beginning of the file, the current position indicator, or the end of the file, respectively.

File Seeking Exercise

Modify the above sumints_bug.c example so that only the last 5 integers in INTS.bin are summed. The result should be 5.

It is important to note that the integers in INTS.bin are 4 bytes each, which is also equal to sizeof(int).


Submission

Please push the appropriate C source files for this lab before 11:55 PM on Saturday at the end of this week.