Lab 2 - Basic I/O in C

Lab goals:

Introduce some of the essentials for using the Unix command line.
Introduce C character input and output functions and how they are used.
Introduce simple file I/O using the Standard I/O library.

Using SSH Keys to Login to CLEAR

As explained in the last lab, you will be using CLEAR for all of your work in COMP 321. You can continue to log in to CLEAR using your password and DUO, if you would like. However, ssh allows the use of a public-private key pair to provide authentication. This is more secure than a password, which can be guessed: it is significantly more difficult to guess a properly generated private key. It is also more convenient. To authenticate, ssh uses the private key to sign information, and that signature can then be verified using only the public key. The private key cannot feasibly be derived from the public key, so the public key can be freely shared. And if the signature verifies with the public key, you know that it came from someone who has the corresponding private key.

If you would like to use ssh keys to log in to CLEAR from your laptop without using a password (or using DUO!), do the following in Terminal on a Mac or PowerShell on Windows (do this on your laptop, not on CLEAR):

Generate a public-private key pair by typing the following command:
```
    ssh-keygen -t ed25519
```
Press Enter to accept the default location for the key pair.
Press Enter to accept the default passphrase (i.e., no passphrase).
Press Enter again to confirm the passphrase.
Copy the public key to CLEAR by typing the following command (substituting your own NetID for yournetid and substituting the path to the key file as output by ssh-keygen for key):
```
    ssh-copy-id -i key yournetid@ssh.clear.rice.edu
```
Enter your password for CLEAR when prompted.
Use DUO to complete the authentication.

If your system does not have ssh-copy-id, you can instead do the following:

    cat key.pub | ssh yournetid@ssh.clear.rice.edu "cat >> ~/.ssh/authorized_keys"

Note that you must specify the public key file (the one ending in .pub) if you use this command. ssh-copy-id makes sure to copy the public key without being told.

After you do this, you should be able to ssh into CLEAR without typing a password. Test that your SSH keys work by trying to ssh into CLEAR from Terminal or PowerShell:

    ssh yournetid@ssh.clear.rice.edu

If everything is working, you should be logged in to CLEAR without having had to type a password or use DUO.

Similarly, when you connect to CLEAR through VSCode, it will no longer prompt you for a password. However, you may need to edit your ssh config file to make that work. If VSCode continues to ask for a password when you use Remote-SSH to connect, do the following:

Run the Remote-SSH: Add New SSH Host command in VSCode.
Type ssh yournetid@ssh.clear.rice.edu as the command, where yournetid is your NetID.
It will ask which SSH configuration file to update. Select the first one, which should be somewhere in your home/user directory. If it doesn't explicitly ask, there will be a dialog at the bottom with an Open Config button, which you should click.
VSCode will open the configuration file which should have an entry with Host ssh.clear.rice.edu in it. There should be fields for HostName and User indented beneath it.
Add an entry that is indented to the same level as HostName and User, with the value IdentityFile keyfilename where keyfilename is the full path of the private key file you created with ssh-keygen above.
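As a rough sketch of the result, the finished entry might look like the fragment below. This assumes that you accepted the default key location when running ssh-keygen -t ed25519, which is normally the file id_ed25519 in the .ssh directory of your home directory; if ssh-keygen reported a different path, use that path instead. As before, yournetid is a placeholder for your own NetID.

```
Host ssh.clear.rice.edu
    HostName ssh.clear.rice.edu
    User yournetid
    IdentityFile ~/.ssh/id_ed25519
```

Note that IdentityFile names the private key file, not the file ending in .pub.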

Now when you run the Remote-SSH: Connect to Host... command, ssh.clear.rice.edu should be an option that will connect without needing to enter any passwords or to use Duo. (If you need to fix anything, there's also a Configure SSH Hosts... option that will let you edit the configuration file further.)

Unix Command Line Essentials

Many of you may never have navigated around a Unix filesystem using the command line before, other than the small amount that you did in Lab 1. Some of this was briefly described in the last lab, but this section of this week's lab will help you acquire a few more command line skills.

Roughly speaking, a Unix filesystem is a hierarchical (tree-like) collection of directories (sometimes referred to as folders on other types of systems), where each directory may contain either files, other directories, or some combination of both. When working on a Unix system, you are always in some directory, known as your current working directory or just your current directory or working directory. Navigating the filesystem relies on this abstraction of a current directory. The pwd command will print the name of your working directory. For our first example of pwd, we will use it right after logging in.

When you first log in to a Unix system, you are in your home directory as your current directory. To see the full name of your home directory then, just type

pwd

right after logging in to CLEAR. As you move around in the Unix filesystem, you can always use the pwd command to see where you are.

Our next command is ls (mentioned in the previous lab). The ls command will list the contents of a directory. When used with no command-line arguments, ls operates on the current working directory. Let's use ls to look at the contents of the current directory. This will be your home directory if you have just logged in. If you type the shell command

ls

you should see several files listed. Most importantly, you should see your Lab 1 repo directory (that you created last week). This Lab 1 repo directory will be named

lab-1-introduction-to-c-name

where name is your GitHub userid. We cannot say what else may be in your personal home directory, but you will in general also see other names listed from this ls command.

Before describing more on navigating in a Unix filesystem, we need to understand the notion of a pathname. In Unix, there are two types of pathnames, absolute pathnames and relative pathnames. An absolute pathname refers to the same location in the file system regardless of what your current working directory is. Absolute pathnames begin with a / (called slash) character, in reference to the top of the directory hierarchy, called the root directory, itself represented by the pathname just /. Relative pathnames, on the other hand, are resolved relative to the current working directory and do not begin with a / character.
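To connect this rule to C, here is a minimal sketch of a function that classifies a pathname using only the leading-slash test described above. The function name is_absolute_pathname is made up for this illustration and is not part of any library.

```c
/*
 * Requires:
 *   pathname is a valid, NUL-terminated C string.
 *
 * Effects:
 *   Returns 1 if pathname is an absolute pathname (it begins with '/').
 *   Otherwise, returns 0, meaning that it is a relative pathname.
 */
int
is_absolute_pathname(const char *pathname)
{
        return (pathname[0] == '/');
}
```

For example, is_absolute_pathname("/usr/bin") returns 1, whereas is_absolute_pathname("comp321/labs") returns 0.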

The fundamental Unix navigation command is cd (also mentioned in the previous lab). The name cd is short for change directory.

Normally, cd takes a pathname argument, such as

cd pathname

for some specified pathname (which, again, may be an absolute pathname or a relative pathname). This command takes you to the directory specified by the pathname; in other words, it changes your current working directory to be the directory specified by that pathname.

If you use cd without a pathname argument, then cd will take you to your home directory.

To get some practice with what you know so far, first type

     cd /usr/bin

Then, use the pwd command to confirm that your cd worked correctly and that your current working directory is now /usr/bin. Also, type the ls command (with no arguments) to see what is in the /usr/bin directory.

Next, run the cd command (with no pathname), followed by the pwd command to confirm that you are back to your home directory.

The ls command can also operate on pathnames. Let's use ls to operate on the absolute pathname /usr/bin:

     ls /usr/bin

You should see the same files as you saw when you ran cd /usr/bin and ls above. Using ls with a pathname, however, does not change your current working directory as the cd command did above.

Now, we will explore cd with a relative pathname. Assuming you are still in your home directory, you want to get to your Lab 1 repo directory. This means that you should type

cd lab-1-introduction-to-c-name

where name is your GitHub userid. But that is a lot of typing! So, before you attempt to type such a long name, we will give you some tips on Unix command line techniques.

Digression: Unix Command Line Tips

Tip #1: To save yourself a lot of typing, you can get the shell to complete a pathname for you. To see this working, type

     cd lab-1

but do not hit Enter at the end of it. Instead right after typing the 1 character above, hit the Tab key on the keyboard. You should be pleasantly surprised to see that Unix has completed the file name for you! This technique is called tab completion.

Tab completion works for any Unix pathname/file name. If you type the Tab key and do not see a completion, that means that either there are no matching pathnames/file names, or there is more than one valid completion. To see which is the case, type the Tab key a second time. If you get a list of possibilities, then there is more than one valid completion for what you had typed (you did not type enough characters to uniquely determine the name). Type enough additional characters to disambiguate what you want among the choices shown, then type Tab again to get a successful completion. Note that, depending on the configuration of your shell, the list of possibilities may be listed automatically after the first Tab, without you needing to type the second Tab to see the list. Or in other shell configurations, typing a control-D character, also referred to as Ctrl+D or written as ^D (as you saw in the introduction to nano in last week's lab), rather than the second Tab will show the list of possibilities. To type a control-D, hold down the Control key, type a d, and then release the Control key.

Tip #2: This tip is a little more general purpose. The technique we will show you here is called history editing. Unix shells save a history of your previous commands. To access your command history, use the arrow keys. The up-arrow key takes you to older commands in your history, whereas the down-arrow key moves towards newer commands in your history. When you see the command you want to re-execute, hit the Enter key, and the shell will then run that command just as if you had just now typed it from scratch.

And, with this history feature, you can do more than just re-execute old commands. You can also edit these old commands and then execute your modified command. To edit a command, use the left-arrow and right-arrow keys to position the cursor, then use the Delete key to delete characters, or just type there what you want to insert.

We encourage you to use tab completion and history editing to greatly simplify and enhance your command line usage.

Now that we can navigate a Unix directory structure, we turn to creating and maintaining a directory structure of your own to organize your files for COMP 321. Leaving all of your files in your home directory can get messy, so here is a crash course on setting up a basic directory structure. The first step in organizing your home directory is to create a directory for your COMP 321 work. The mkdir command makes a directory with a name given by the command line argument. Use cd to go to your home directory (if you are not already there), and issue the following command to create your new directory for your COMP 321 work:

     cd
     mkdir comp321

Then use the ls command to verify that your comp321 directory was created.

Our next organizational step will be to create two new directories within your new comp321 directory, one for lab work (which we will call labs) and one for programming assignments (which we will call assignments):

     cd comp321
     mkdir labs
     mkdir assignments
     ls

The ls command at the end there is just used to allow you, as above, to see that your two new directories were created. Note that you could instead have said mkdir labs assignments as a single command instead of two separate commands.

Now, we will move your existing lab repo directory from your home directory to comp321/labs. The Unix command for moving files is mv. This command moves one or more files or directories into the directory specified as the last command line argument. So, our next organization steps will be to change to your home directory, move your existing lab repo directory into comp321/labs, and then verify the result.

Specifically, if lab1dir is the name of your Lab 1 repo directory then run the following three commands:

cd
mv lab1dir comp321/labs
ls comp321/labs

You might want to use tab completion (described above) to save some typing for the repo directory names. For a final sanity check, try:

     cd comp321/labs
     pwd
     ls

Hopefully, everything worked as expected and you can see your Lab 1 repo directory there. We recommend cloning future lab git repositories directly into your newly created labs directory.

You can mv the directories you created for the Factors assignment into your new comp321/assignments directory in the same way.

The mv command also serves as the rename command. Use the rename feature like this:

mv oldname newname

where oldname is the old (existing) name of the file or directory that you want to rename, and newname is the new name that you want to rename it to be. You need to be careful, though. If newname is the name of a directory that already exists, then this mv command will move oldname into the directory newname rather than renaming oldname to be newname.

As explained in the last lab, in Unix, .. is a special name used to refer to the parent directory of the current directory (unless you are in the root directory, in which case .. also refers to the current directory, since the parent of the root directory is the root directory). So, cd .. will change your current directory to the parent directory, and ls .. will list the contents of the parent directory. Similarly, . is a special name used to refer to the current directory itself, which is why typing ./program tells the shell to look for program (only) in the current directory.

Our final command here is the rm command. This command can be used to remove files that you no longer want. But be careful! The rm command cannot be undone. The rm command does not move the files to a Trash directory or the like from which they can be resurrected if you made a mistake. Once a file has been removed with rm, it is gone forever!

As always, see the man pages for more complete details on any of these commands (e.g., man rm).

Character I/O

As in most programming languages, to do any useful work in C, you must use I/O to get and return data. The simplest, most basic form of I/O is character-based I/O, using the C functions getchar() and putchar(). These functions are defined by the header file stdio.h, which is used for all of C's "Standard I/O" library. The function prototypes for getchar() and putchar() are as follows:

    int getchar(void);

    int putchar(int c);

It is useful to remember that characters are really just 8-bit integers. Below is a simple program that prints the character 'A' three times (in three different ways), terminating the output with a newline character ('\n'):

     #include <stdio.h>

     /*
      * Requires:
      *   Nothing.
      *
      * Effects:
      *   Illustrates 3 ways to print the character 'A'.
      */
     int
     main(void)
     {
             putchar('A');      // 'A' as a character literal constant
             putchar(0x41);     // 'A' as a hexadecimal constant
             putchar(65);       // 'A' as a decimal constant
             putchar('\n');     // newline character terminates the line
     }

Here is a simple example program intended to echo characters typed at the terminal:

     #include <stdio.h>

     /*
      * Requires:
      *   Nothing.
      *
      * Effects:
      *   Echo stdin to stdout.
      */
     int
     main(void)
     {
             char c;

             while ((c = getchar()) != EOF)
                     putchar(c);
             /* Returning zero says, "No errors occurred." */
             return (0);
     }

However, this example program has a common but subtle bug that we will explore below.

Key points:

To use these I/O functions, you should include stdio.h, the header file that provides their declarations.
This program illustrates a construct that is commonly used in C programs but is sometimes confusing to beginners. This while loop condition expression combines an assignment together with an inequality test. In effect, the expression c = getchar() is evaluated first (since it is inside the parentheses), followed by the expression c != EOF to compare c (i.e., the value returned by getchar()) against the value EOF.

This combination is possible because the result of the assignment operation in C (i.e., =) has a value, just like the result of an addition or subtraction operation has a value. However, care must be taken to ensure that the assignment takes place before the test. Otherwise, the variable c would be assigned the result of the inequality comparison getchar() != EOF, which has a value of either 0 (false) or 1 (true). As with arithmetic expressions, the order of evaluation can be controlled by surrounding the assignment operation, as here, with parentheses, to make it evaluate before the inequality comparison.
getchar() reads one character from the input and returns it. However, getchar() actually has 257 possible different return values: the values 0 through 255, representing one of the 256 possible different characters, and EOF, which is not a character but a special value that is returned to indicate the end of file (or an error) on the input. (Traditionally, EOF is represented by the value -1.) This is why getchar()'s return type is int and not char. A char can represent only 256 different values, but getchar() can return any of 257 different values.
putchar() outputs one character to standard output (by default, the terminal). It returns the value EOF on any error.

I/O Redirection

In Unix, if a program expects its input from the keyboard and outputs to the terminal, you can still force it to instead use a file either for its input or output (or both) by using I/O redirection on the command line when you run the program. For example, putchar() writes its output to what is referred to as standard output, and getchar() reads its input from what is referred to as standard input. By default both standard output and standard input are connected to the terminal.

Suppose that some program named program uses putchar() and/or getchar(). You can run program from the current directory (here with the command-line arguments arg1, arg2, and arg3), redirecting the program's standard input from a file such as inputfile:

      ./program arg1 arg2 arg3 < inputfile

The less than sign < indicates the input redirection from the file inputfile (the < and the name inputfile are treated specially by the shell and are not command-line arguments passed to program).

Similarly, you can redirect the program's standard output to a file such as outputfile:

      ./program arg1 arg2 arg3 > outputfile

The greater than sign > indicates the output redirection to the file outputfile (the > and the name outputfile are treated specially by the shell and are not command-line arguments passed to program).

And, of course, you can combine both of these forms of I/O redirection to redirect both its standard input and its standard output:

      ./program arg1 arg2 arg3 < inputfile > outputfile

There are other forms of I/O redirection, but these are the basics.

The Bigger Picture

So far, we have discussed only terminal-based I/O (and command-line I/O redirection to redirect it to/from files), but C's Standard I/O library, described by the header file stdio.h, also directly supports file I/O. Thus, the Standard I/O library can meet most programs' I/O needs.

Before we delve into the details of file I/O, let's first discuss the Standard I/O library's place in an overall computer system. The figure below represents the various layers of the Unix I/O stack, with each layer utilizing the layer below it:

**The Unix I/O Stack**

Everything above the horizontal dashed line will be discussed in this class, either now or later in the semester. Other classes, such as COMP 421 and ELEC 425, are concerned with things below this line.

The device drivers shown in the figure above are part of the operating system (i.e., part of Unix), and the Standard I/O library is implemented above this. As implied by its name, the Standard I/O library provides a set of I/O functions that are standard across all implementations of C. Moreover, they are not tied to any particular operating system. Thus, if you write a program using only the functions provided by the Standard I/O library, it can be compiled and run on any implementation of C, whether on Red Hat Enterprise Linux or Microsoft Windows or otherwise. Of course, on each of these operating systems, the code implementing the Standard I/O library will be different, but that is not your problem! Any differences in the underlying operating system are effectively hidden from you by this library.

File I/O Using the Standard I/O Library

As in most programming languages, C's terminal I/O is just a special case of its file I/O. The I/O functions described above, getchar() and putchar(), implicitly use the following files (commonly also known as streams) provided by the Standard I/O library:

stdin (standard input, typically the keyboard) or
stdout (standard output, typically the monitor).

There is also one other predefined file (i.e., stream), generally used for outputting error messages:

stderr (standard error, typically the monitor).

For more information, see man stdio.

To use a file (i.e., stream), you must open and eventually close the file, except that the special files stdin, stdout, and stderr are already open and don't need to be closed (also, other files that you have open are automatically closed when the entire program terminates). When you open a file, you should always check whether the file was successfully opened, as the file might, for example, not exist or not have the correct permissions for accessing it. Here are the prototypes of some useful file manipulation functions in the Standard I/O library (use the man command for more information on any of them):

    FILE *fopen(const char *pathname, const char *mode);

    int fclose(FILE *fp);

    int fgetc(FILE *stream);

    int fputc(int c, FILE *stream);

    int feof(FILE *stream);

The function fopen(const char *pathname, const char *mode) opens the file identified by the name pathname for the indicated mode (e.g., for reading the file ("r"), for writing the file ("w"), or both ("r+" or "w+")). It returns a FILE *, which is a pointer indicating the open stream, or returns NULL on any error. The function fclose(FILE *fp) closes the open stream indicated by the given FILE * pointer.

The call fgetc(stdin) is essentially equivalent to getchar(), and the call fputc(c, stdout) is essentially equivalent to putchar(c). In particular, fgetc() reads and returns the next character from the indicated stream (rather than always just from stdin), and fputc() outputs the given character to the indicated stream (rather than always just to stdout).

For each open stream, the Standard I/O library remembers whether or not reading from that stream has previously encountered the end-of-file (attempting to read past that point, not just reading up to that point). The function feof() tests this internal remembered end-of-file indicator for the indicated stream, returning nonzero (meaning true) if it is set (meaning that reading from that stream has previously encountered the end-of-file). The use of feof() is explained more fully below.

An example:

     #include <stdio.h>

     /*
      * Requires:
      *   Nothing.
      *
      * Effects:
      *   Tries to copy the contents of the file "input.txt" to the file
      *   "output.txt".  Returns 0 if the copy completed successfully.
      *   Otherwise, returns 1.  
      */
     int
     main(void)
     {
             FILE *input_file, *output_file;
             int c, error = 0;  /* no error */
             char *input_filename = "input.txt";
             char *output_filename = "output.txt";

             input_file = fopen(input_filename, "r");
             if (input_file == NULL) {
                     fprintf(stderr, "Can't open %s.\n", input_filename);
                     return (1);  /* non-zero for error */
             }
             output_file = fopen(output_filename, "w");
             if (output_file == NULL) {
                     fprintf(stderr, "Can't open %s.\n", output_filename);
                     fclose(input_file);
                     return (1);  /* non-zero for error */
             }
             while ((c = fgetc(input_file)) != EOF)
                     fputc(c, output_file);
             if (!feof(input_file)) {
                     /*
                      * If feof() returns FALSE, then the above while loop
                      * didn't reach the end of file.  The EOF returned by
                      * fgetc() instead meant that an error occurred while
                      * reading from the input file. 
                      */
                     fprintf(stderr, "An error occurred reading %s.\n",
                         input_filename);
                     error = 1;  /* non-zero for error */
             }
             fclose(input_file);
             fclose(output_file);
             return (error);
     }

Always declare variables for referencing streams using the C type FILE *, not just FILE.
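The same open, check, read, and close pattern can also be packaged as a function that works on any pathname rather than on fixed file names. The following is only a sketch: the name count_chars and the choice of -1 as its error value are assumptions made for this illustration.

```c
#include <stdio.h>

/*
 * Requires:
 *   pathname is a valid, NUL-terminated C string.
 *
 * Effects:
 *   Returns the number of characters in the file named by pathname.
 *   Returns -1 if the file can't be opened or if an error occurs while
 *   reading from it.
 */
long
count_chars(const char *pathname)
{
        FILE *fp;
        long count = 0;
        int c;          /* int, not char, so that EOF stays distinct */

        fp = fopen(pathname, "r");
        if (fp == NULL)
                return (-1);
        while ((c = fgetc(fp)) != EOF)
                count++;
        if (!feof(fp))
                count = -1;     /* EOF was due to an error, not end-of-file */
        fclose(fp);
        return (count);
}
```

For a file containing the single line hello followed by a newline character, count_chars() returns 6.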

Some input functions, like fgetc(), return the value EOF not only when the end-of-file has been reached (that is, when you try to read past the actual end of the file) but also when an error occurs while reading from the file. The function feof() can be used to distinguish between these two cases. Its return value is nonzero (meaning true) if and only if the end-of-file has been reached (attempting to read past that point, not just reading up to that point) and fgetc() (or some other input function on that stream) has already returned EOF due to that. Thus, the following usage of feof() would be incorrect:

     while (!feof(input_file)) {
             c = fgetc(input_file);
             fputc(c, output_file);
     }

This loop would output an extra final character, specifically the character that is represented by the value 255, to the output file. Think about why this is the case.
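One way to observe the difference is to run both loop shapes over the same input and count what each one writes. In the sketch below, the two function names are made up for this illustration; only fgetc(), fputc(), and feof() come from the discussion above.

```c
#include <stdio.h>

/*
 * Requires:
 *   in is open for reading and out is open for writing.
 *
 * Effects:
 *   Copies in to out, testing the value returned by fgetc() before
 *   writing it.  Returns the number of characters written.
 */
long
copy_testing_fgetc(FILE *in, FILE *out)
{
        long count = 0;
        int c;

        while ((c = fgetc(in)) != EOF) {
                fputc(c, out);
                count++;
        }
        return (count);
}

/*
 * Requires:
 *   in is open for reading and out is open for writing.
 *
 * Effects:
 *   Copies in to out using the incorrect loop, which tests feof()
 *   before each read.  Returns the number of characters written.
 */
long
copy_testing_feof_first(FILE *in, FILE *out)
{
        long count = 0;
        int c;

        while (!feof(in)) {
                c = fgetc(in);
                fputc(c, out);  /* written even when c is EOF */
                count++;
        }
        return (count);
}
```

For an input stream holding just the two characters hi, copy_testing_fgetc() writes 2 characters, whereas copy_testing_feof_first() writes 3.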

By default, with the exception of stderr, Standard I/O streams are buffered by the Standard I/O library. Essentially, for input, this means that each of your program's calls, for example to fgetc(), doesn't necessarily result in a corresponding call to a Unix operating system I/O input function. Likewise, for output, each of your program's calls, for example to fputc(), doesn't necessarily result in a corresponding call to a Unix operating system I/O output function. Performing all of these operating system calls would make these Standard I/O library calls run very slowly!

Instead, depending on whether the stream is a terminal or a file, the Standard I/O library reads (or writes) an entire line or a multi-kilobyte buffer at once from the operating system. The extra characters are stored in a private internal buffer within the Standard I/O library. This buffering explains why your echo program above only echoed the full line of typed characters after you typed Enter on the keyboard. Another benefit of this buffering is that input characters can be put back with ungetc(). This can be useful for writing parsers that require lookahead. Output stream buffers by default are flushed when they are full, when the file is closed, or, for streams open to a terminal, when a newline character \n is written to the stream. The function fflush(FILE *stream) can also be used anytime to manually force the output buffer of the indicated stream to be flushed.
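As a minimal sketch of the lookahead idea, the function below reads one character and immediately puts it back with ungetc(), so that the caller can inspect the next character without consuming it. The name peek_char is made up for this illustration.

```c
#include <stdio.h>

/*
 * Requires:
 *   stream is open for reading.
 *
 * Effects:
 *   Returns the next character on stream without consuming it, so that
 *   the next read from stream returns that same character.  Returns EOF
 *   if there is no next character.
 */
int
peek_char(FILE *stream)
{
        int c;

        c = fgetc(stream);
        if (c != EOF)
                ungetc(c, stream);      /* put it back for the next read */
        return (c);
}
```

A parser could call peek_char(stdin) to decide, for example, whether the next character starts a number before committing to read it.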

There are many other functions in the Standard I/O library. For example fprintf() is the file version of printf. See man stdio, as suggested above. In addition to describing the standard streams stdin, stdout, and stderr, this manual page also has a list of all library functions that are part of the Standard I/O library. You can also use the man command on any of those other function names to find more information on it.
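To illustrate fprintf(), here is a small sketch of a function that writes one formatted line to whichever stream it is given, such as stdout, stderr, or a file opened with fopen(). The name report_count and the message format are assumptions made for this illustration.

```c
#include <stdio.h>

/*
 * Requires:
 *   stream is open for writing and filename is a valid C string.
 *
 * Effects:
 *   Writes a line of the form "filename: count characters" to stream.
 *   Returns the number of characters written, or a negative value on
 *   any error, exactly as fprintf() does.
 */
int
report_count(FILE *stream, const char *filename, long count)
{
        return (fprintf(stream, "%s: %ld characters\n", filename, count));
}
```

Calling report_count(stdout, "input.txt", 42) has the same effect as calling printf() with the same format string and arguments.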

GitHub Repository for This Lab

To set up your private repo for this lab, follow the same steps as for the previous lab and the first assignment. In particular, the first step for any lab or assignment in this class is to use your browser to go to the provided link for the starter code. For this lab, that link is:

https://classroom.github.com/a/1-5-KAvz

If you are not already logged in to GitHub, it will ask you to do so now.

You should then be presented with a web page asking you to "Accept the assignment". Please click on the green Accept this assignment button. GitHub will then do a bit of work to create a new remote repo for you for this assignment, based on the starter code we have provided. This may take only a few seconds, or it may take a few minutes. Occasionally use your browser to refresh the view of this web page, until you get a new page saying You're ready to go!.

On this You're ready to go! page, you should also see a link to your new remote repo. This link will begin with https:// and will end with your own GitHub username. Use the mouse on your computer to copy that entire link (beginning with the https:// text and ending with your GitHub username) to the clipboard on your computer. You will use this link in a moment to clone this remote repo onto CLEAR as your local repo for this lab.

Log in to the CLEAR system if you have not already done so. Assuming you have already created a comp321/labs directory as directed in the Unix Command Line Essentials section above, first use cd comp321/labs to change into your labs directory, then type the following command:

git clone paste your repo link from the clipboard here.git

In other words, type the words

git clone

followed by a space, then paste the link from your clipboard (the link you copied above) onto that same command line, and then type .git (with no space before it) onto the end of the command line, completing the command (hit the Enter key on your keyboard). You will be prompted for your GitHub username and password. You must use your GitHub personal access token as your GitHub password here (e.g., use your mouse to copy and paste your personal access token here, from whatever file you saved it in when you generated it during the first lab).

Once the clone operation is complete, you will have a directory named

lab-2-basic-i-o-in-c-name

where name is your GitHub username.

Now, change your current working directory into this new directory by typing

cd lab-2-basic-i-o-in-c-name

where name, again, is your GitHub username.

You are now ready to begin working on the exercises in the README.md file in the repository. You can view it nicely formatted on GitHub. The file has that name so that GitHub will display it below the source code listing when you navigate to the repository. You do not need to actually open the file.

Submission

As with all labs in this course, be sure to git push your lab before 11:55 PM tonight to get credit for this lab.

And you should always include in your repo all files that you created in the lab, other than those (such as the output of the compiler) generated automatically from other files. Think of this as including just the files necessary to back up and to be able to recreate your work. Do not simply add all files to your repo; for example, do not simply say something like git add . or git add *, and never add any core files to your repo. Only add the actual files that should be in your repo.

COMP 321: Introduction to Computer Systems
