Lab 2 - Basic I/O in C

Lab goals:


GitHub Repository for This Lab

To obtain your private repo for this lab, please point your browser to this starting link. From there, follow the same procedure as in lab01. This starting link will take you to a page labeled RICE-COMP321-SPRING19.

As with lab01, please accept the assignment invitation found on the RICE-COMP321 page. (This step might require logging in to GitHub.)

After accepting the assignment, you will be redirected to your invitation acceptance page. On the page will be a link to your main assignment repository page. Please click the link.

Your main repository link has a button labeled Clone or download. Click the button.

You should now see a text field with the URL of your personal repository for this assignment. Save/remember this URL, as you will need it to clone your repository.

Log into CLEAR. Once you are on the CLEAR system, clone your lab 2 repository.

    git clone [YOUR PERSONAL LAB 2 URL]
  


Unix Command Line Essentials

Many of you have never navigated around a Unix filesystem using the command line before. This section of the lab will help you acquire some rudimentary command line skills. Roughly speaking, a Unix filesystem is a hierarchical collection of directories, where each directory may contain either files, other directories, or some combination. When working on a Unix system, you are always in some directory. Thus, navigating the file system relies on the abstraction of a current working directory that is provided by the shell. The pwd command will print the working directory. For our first example of pwd, we will use it right after logging in. When you first log into a Unix system, you are in your home directory. To see the full name of your home directory, please type

pwd
right after logging into CLEAR. As you move around in the Unix filesystem, pwd tells you where you are.

Our next command is ls. The ls command will list the contents of a directory. When invoked with no arguments, ls operates on the current working directory. Let's use ls to look at the contents of the current directory. This will be your home directory if you have just logged in. Type

ls
You should see several files here. In particular, you should see the lab 2 repo directory:
  lab-2-basic-i-o-in-c-[YOUR github ID]
(We cannot say what else may be in your personal home directory.)

Before describing the fundamental Unix navigation command, we need to understand the notion of a path. In Unix, there are two types of paths: absolute paths and relative paths. An absolute path points to the same location on the file system regardless of the working directory. Absolute paths are written in reference to the top of the directory hierarchy, called the root directory, represented by the symbol "/". Relative paths are resolved relative to the working directory and do not begin with the symbol "/".

The fundamental Unix navigation command is cd, which is short for change directory. Normally, cd takes a path argument:

    cd path
  
This command takes you to the directory specified by path. If you use cd without a path argument, then cd will take you to your home directory. To get some practice with what you know so far, first type

      cd /usr/bin
  
Then, use pwd to confirm that your cd worked correctly. Also, type ls to see what is in the /usr/bin directory.

Next, type cd (no path), followed by pwd to confirm that you are back to your home directory.

The ls command can also operate on paths. Let's use ls to operate on the absolute path /usr/bin:

 ls /usr/bin
You should see the same files as you saw when you ran ls after cd /usr/bin. Using ls with a path, however, does not change your working directory.

Now, we will explore cd with a relative path. Assuming you are still in your home directory, you want to get to your lab 2 repo directory. This means that you should type
  cd lab-2-basic-i-o-in-c-[YOUR github ID]
THAT IS A LOT OF TYPING!! So, before you attempt to type such a long string, we will show you some veteran Unix command line techniques.

Digression: Unix command line tips

To see Tip #1 work, start typing

  cd lab-2
Right after typing the 2 character, hit the tab key. You should be pleasantly surprised to see that Unix has completed the file name for you! (This technique is called Tab completion, by the way).

Tab completion works for any Unix path/file name. If you type the tab key and do not see a completion, then either there are no completions or there is more than one valid completion. To see which case applies, type tab a second time. If you get a list of possibilities, then you know that you did not type enough characters to uniquely determine the name. Type enough additional characters to disambiguate your choice, then type tab again to get a successful completion. Depending on the configuration of your shell, the possibilities may be listed automatically after the first tab, without you needing to type the second tab. In other shell configurations, typing a control-D character, also called Ctrl+D (hold down the Control key, type a d, and then release the Control key), rather than the second tab will show the list of possibilities.

Tip #2 is a little more general purpose. The technique we will show you is called history editing. Unix shells save a history of your previous commands. To access your command history, use the arrow keys. The up arrow takes you to older commands, whereas the down arrow moves towards newer commands. You can do more than just re-execute old commands with the history feature. You can also edit them, and then execute your modified command. To edit a command, use the right and left arrow keys to position the cursor, then use delete to delete characters, or just type what you want to insert.

We encourage you to use tab completion and history editing to greatly simplify and enhance your command line usage.

End Digression

Now that we can navigate a Unix directory structure, we turn our attention to creating and maintaining a directory structure of our own. Leaving all of your files in your home directory can get messy, so here is a crash course on setting up a basic directory structure. The first step in organizing your home directory is to create a directory for your comp 321 work. The mkdir command makes directories with the name of the command line argument. Get to your home directory (if you are not already there), and issue the following command to create your initial comp 321 directory:

 cd
 mkdir comp321
 ls
The ls command should verify that your comp321 directory was created.

Our next organizational step will be to create 2 new directories, 1 for lab work (labs), and 1 for programming assignments (assignments):

  cd comp321
  mkdir labs
  mkdir assignments
  ls

Now, we will move the lab repo directories from the home directory to comp321/labs. The Unix file moving command is mv. This command moves one or more files or directories into the directory specified as the last argument. So, our next organizational steps will be to change to the home directory, move all of the lab repo directories into comp321/labs, and verify.


    cd
    mv [lab 1 repo directory] [lab 2 repo directory] comp321/labs
    ls comp321/labs
NOTE: You might want to use tab completion for the repo directory names. For a final sanity check, try:

   cd comp321/labs
   pwd
   ls
Hopefully, everything works as expected.

An additional note about mv: The mv command also serves as the rename command. Use the rename feature like this:

    mv [Old File Name] [New File Name]
  
You need to be careful, though. If New File Name is a directory, then mv will move Old File Name into it.

One more useful navigation convention: In Unix, .. is a special name used to refer to the parent directory of the current working directory (unless you are in the root directory, in which case .. refers to the current working directory). So, cd .. will change into the parent directory, and ls .. will list the parent directory.

Similarly, . is a special name used to refer to the current directory, which is why typing ./program tells the shell to look for program in the current directory.

Our final command is the remove command, rm. The rm command is useful for removing files that you no longer want, but be careful! rm does not move deleted files to a Trash directory from which they can be resurrected if you make a mistake. Once a file is removed with rm, it is gone forever!

As always, see the man pages for more complex invocations of any of the above commands.


Terminal I/O

As in most programming languages, to do any useful work in C, you must use I/O to get and return data. The simplest form of I/O is character-based, using the functions getchar() and putchar(). These functions are defined by the include file stdio.h. Their prototypes are as follows:

    int getchar(void);

    int putchar(int c);

It is useful to remember that characters are really just 8-bit integers. Below is a simple program that prints the character 'A' 3 times, terminating with a '\n'.

     #include <stdio.h>

     /*
      * Requires:
      *   Nothing.
      *
      * Effects:
      *   Illustrates 3 ways to print the character 'A'.
      */
     int
     main(void)
     {
     
             putchar('A');      // character literal constant
             putchar(0x41);     // hexadecimal constant
             putchar(65);       // decimal constant
             putchar('\n');
     }

Here is a simple example that is intended to echo characters typed in the terminal:

     #include <stdio.h>

     /*
      * Requires:
      *   Nothing.
      *
      * Effects:
      *   Echo stdin to stdout.
      */
     int
     main(void)
     {
             char c;

             while ((c = getchar()) != EOF)
                     putchar(c);
             /* Returning zero says, "No errors occurred." */
             return (0);
     }

However, this example has a common but subtle bug that we will explore below.

Key points:

Echo Exercise #1

Create a file named my_echo.c that contains the above program.

Compile this program using the command:

          clang -Wall -Wextra -Werror my_echo.c -o my_echo
         

Run the my_echo program and type several lines of text on the keyboard. As you type each line, none of the characters on that line will be echoed a second time until Enter is pressed. (We will learn the reason for this below.) To terminate the program, enter Ctrl+D on the keyboard. Typing this causes getchar() to return EOF, meaning end of file.

Although the my_echo program appears to work, it has a common but subtle bug that would only be discovered through more extensive testing. In particular, the program terminates prematurely if getchar() ever returns the character that is represented by the value 255. Find and fix this bug.

NOTE: Do not try to test your program by entering the character that is represented by the value 255 from the keyboard. It can be done, but it is not straightforward, and does not work the same on all computers or keyboards. The next section on redirection describes how to test your fix in a way that works on all computers or keyboards.

Once you have fixed the bug, remove #include <stdio.h> from the code and look at the errors and warnings the compiler gives you.

Some useful character manipulation functions are provided by <ctype.h>.
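As a small illustration of the <ctype.h> calling convention, here is a minimal sketch that counts the decimal digits in a string using isdigit(). The function name count_digits is hypothetical and chosen only for this example.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Requires:
 *   "s" points to a NUL-terminated string.
 *
 * Effects:
 *   Returns the number of decimal digits ('0' through '9') in "s".
 */
size_t
count_digits(const char *s)
{
        size_t count = 0;

        for (; *s != '\0'; s++) {
                /*
                 * The <ctype.h> functions take an int whose value must be
                 * representable as an unsigned char (or be EOF), so a plain
                 * char is cast before the call.
                 */
                if (isdigit((unsigned char)*s))
                        count++;
        }
        return (count);
}
```

For example, count_digits("COMP 321, lab 2") returns 4. The other classification functions in <ctype.h> follow the same pattern: they take an int and return non-zero when the character belongs to the class.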

Echo Exercise #2

Modify your fixed echo program to convert all upper case characters to lower case and all spaces to dashes.

Use man isupper, man isspace and man tolower to determine how to do this.


Redirection

In Unix, if a program reads its input from the keyboard and writes its output to the terminal, you can still make it use a file for either its input or its output by using redirection. For example, to redirect its input (standard input) from a file:

      program args… < inputfile

Similarly, you can redirect its output (standard output) to a file:

      program args… > outputfile

Of course, you can combine these:

      program args… < inputfile > outputfile

Redirection Exercises
  1. Create a file containing text.
  2. Use the echo program from the previous exercise. Redirect its standard input to get the text from this file. Also redirect its output to a file.
  3. Write a new program to construct a test file that demonstrates the bug in the original version of the echo program. Then, use this same test file to demonstrate that your change to this program fixes the bug.

There are other forms of redirection, but these are the basics.
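A program does not need any special code to support redirection. The sketch below is a hypothetical helper, count_input, that counts the characters on standard input using only getchar(). It behaves the same whether standard input is the keyboard or a file supplied with <.

```c
#include <stdio.h>

/*
 * Requires:
 *   Nothing.
 *
 * Effects:
 *   Reads standard input until getchar() returns EOF and returns the
 *   number of characters that were read.
 */
long
count_input(void)
{
        long count = 0;

        /* The shell, not this code, decides where standard input comes from. */
        while (getchar() != EOF)
                count++;
        return (count);
}
```

If a main() function printed the value returned by count_input(), then running that program with < inputfile would report the size of inputfile in characters, without the program ever opening the file itself.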


The Big Picture

So far, we have only discussed terminal-based I/O, but C's Standard I/O library, which is declared in stdio.h, also supports file I/O. Thus, the Standard I/O library can meet most programs' needs.

Before we delve into the details of file I/O, let's first discuss the Standard I/O library's place in the overall computer system.

[Figure: Stdio Stack (diagram of the Unix I/O stack)]

Everything above the dashed line will be discussed in this class, either now or later in the semester. Other classes, such as COMP 421 and ELEC 425, are concerned with things below the dashed line.

As shown in the figure, the Standard I/O library is implemented above the operating system. As implied by its name, it provides a set of I/O functions that are standard to all implementations of C. Moreover, they are not tied to any operating system. Thus, if you write a program using only the functions provided by the Standard I/O library, it can be compiled and run on any implementation of C, whether on Red Hat Enterprise Linux or Microsoft Windows. Of course, on each of these operating systems the code implementing the Standard I/O library will be different, but that is not your problem! Any differences in the underlying operating system are effectively hidden from you.


File I/O

As in most programming languages, C's terminal I/O is just a special case of file I/O. The previous functions, getchar() and putchar(), implicitly use the following "files" provided by Standard I/O: stdin (standard input) and stdout (standard output).

There is one other predefined file that is used to output error messages: stderr (standard error).

To use a file, you must open and eventually close it, except that the special terminal files are already open and don't need to be closed. You should also check whether the file was successfully opened, as it might not exist or you might not have the necessary permissions. Here are the prototypes of some useful file manipulation functions:

    FILE *fopen(const char *path, const char *mode);

    int fclose(FILE *fp);

    int fgetc(FILE *stream);

    int fputc(int c, FILE *stream);

    int feof(FILE *stream);

Thus, getchar() is equivalent to fgetc(stdin), and putchar(c) is equivalent to fputc(c, stdout).

An example:

     #include <stdio.h>

     /*
      * Requires:
      *   Nothing.
      *
      * Effects:
      *   Tries to copy the contents of the file "input.txt" to the file
      *   "output.txt".  Returns 0 if the copy completed successfully.
      *   Otherwise, returns 1.  
      */
     int
     main(void)
     {
             FILE *input_file, *output_file;
             int c, error = 0;  /* no error */
             char *input_filename = "input.txt";
             char *output_filename = "output.txt";

             input_file = fopen(input_filename, "r");
             if (input_file == NULL) {
                     fprintf(stderr, "Can't open %s.\n", input_filename);
                     return (1);  /* non-zero for error */
             }
             output_file = fopen(output_filename, "w");
             if (output_file == NULL) {
                     fprintf(stderr, "Can't open %s.\n", output_filename);
                     fclose(input_file);
                     return (1);  /* non-zero for error */
             }
             while ((c = fgetc(input_file)) != EOF)
                     fputc(c, output_file);
             if (!feof(input_file)) {
                     /*
                      * If feof() returns FALSE, then the above while loop
                      * didn't reach the end of file.  The EOF returned by
                      * fgetc() instead meant that an error occurred while
                      * reading from the input file. 
                      */
                     fprintf(stderr, "An error occurred reading %s.\n",
                         input_filename);
                     error = 1;  /* non-zero for error */
             }
             fclose(input_file);
             fclose(output_file);
             return (error);
     }

You always declare variables for referencing files using the type "FILE *", not "FILE".

Some input functions, like fgetc(), return EOF not only when the end-of-file is reached but also when an error occurs while reading from the file. The function feof() can be used to distinguish between these two cases. It returns TRUE if and only if the end-of-file has been reached. Moreover, it only returns TRUE after fgetc() has already returned EOF. Thus, the following usage of feof() would be wrong:

         while (!feof(input_file)) {
             c = fgetc(input_file);
             fputc(c, output_file);
         }

This loop would write an extra character, specifically, the character that is represented by the value 255, to the output file.
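To make the difference concrete, here is a minimal sketch that counts how many times each style of loop body runs on the same stream. The function names count_with_feof_loop and count_with_fgetc_loop are hypothetical, and the sketch assumes that no read error occurs (the feof()-controlled loop would never terminate if one did).

```c
#include <stdio.h>

/*
 * Requires:
 *   "fp" is open for reading and no read error occurs.
 *
 * Effects:
 *   Returns the number of iterations performed by the incorrect
 *   feof()-controlled loop, i.e., the number of characters that loop
 *   would have written.
 */
long
count_with_feof_loop(FILE *fp)
{
        long n = 0;

        while (!feof(fp)) {
                (void)fgetc(fp);  /* the final call returns EOF */
                n++;              /* ... but is still counted */
        }
        return (n);
}

/*
 * Requires:
 *   "fp" is open for reading.
 *
 * Effects:
 *   Returns the number of characters read before fgetc() returns EOF.
 */
long
count_with_fgetc_loop(FILE *fp)
{
        long n = 0;

        while (fgetc(fp) != EOF)
                n++;
        return (n);
}
```

On a stream containing the three characters "abc", count_with_feof_loop() returns 4 whereas count_with_fgetc_loop() returns 3. The fourth iteration is the one in which fgetc() returns EOF, because feof() does not report end-of-file until a read has already failed.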

By default, with the notable exception of stderr, streams are buffered by Standard I/O. Essentially, this means that each of your program's calls to fgetc() doesn't necessarily result in a corresponding call to a Unix I/O function. That would make fgetc() run too slowly! Instead, depending on whether the stream is a terminal or a file, the Standard I/O library reads (or writes) an entire line or a multi-kilobyte buffer at once from Unix. (The extra characters are stored in a private buffer that is accessible to the Standard I/O library.) This buffering explains why your echo program only echoed the typed characters after you entered a newline.

Another benefit of this buffering is that characters can be put back with ungetc(). This can be useful for writing parsers that require lookahead. Output stream buffers are flushed when they are full, the file is closed, or a newline is written to a terminal. fflush() will manually flush the buffer.
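As a sketch of the lookahead idea, the hypothetical function read_number below reads a run of decimal digits from a stream. It can only tell that the number has ended by reading one character too many, so it puts that character back with ungetc() for the next reader. The sketch assumes the value fits in a long and does not check for overflow.

```c
#include <ctype.h>
#include <stdio.h>

/*
 * Requires:
 *   "stream" is open for reading.
 *
 * Effects:
 *   Consumes the decimal digits at the current position of "stream" and
 *   returns their value (0 if there are none).  The first non-digit
 *   character is left unread.
 */
long
read_number(FILE *stream)
{
        long value = 0;
        int c;

        while ((c = fgetc(stream)) != EOF && isdigit(c))
                value = value * 10 + (c - '0');
        /* Push back the one character of lookahead that was not a digit. */
        if (c != EOF)
                (void)ungetc(c, stream);
        return (value);
}
```

Given a stream containing "12+34", the first call returns 12, a subsequent fgetc() returns '+', and a second call returns 34. Only one character of pushback is guaranteed by the C standard, which is sufficient here.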

There are many other I/O functions. For example, fprintf() is the file version of printf(). See the manual by typing man stdio, or find online man pages for details of other I/O functions.

File I/O Exercises
  1. Create a file, called input.txt, containing text.
  2. Modify the above example to echo the full contents of input.txt to output.txt, converting all upper case characters to lower case and all spaces to dashes.