Rice University - Comp 212 - Intermediate Programming

Spring 2006

Lecture #30 - Stream and File IO


Introduction

In Java, input and output are defined in terms of an abstract concept called a "stream".  A stream is a sequence of data.  An input stream has a source; an output stream has a destination.  There are two kinds of streams: byte streams and character streams.  The java.io package provides a large number of classes to perform stream I/O.  Mastering Java stream I/O may seem like a daunting task.  Do not fret.  In the beginning, you only need to learn how to use a few I/O classes.  As you progress, you can figure out on your own how to use the other I/O classes and even design new customized I/O classes.  This lecture describes a few commonly used I/O classes, as well as StreamTokenizer, a class used to parse input streams.

More details on I/O streams can be found in the Java online tutorial:

http://java.sun.com/docs/books/tutorial/essential/io/index.html

 


The InputStream and OutputStream classes

In Java, input and output of byte (i.e. binary) streams are handled through subclasses of the java.io.InputStream and java.io.OutputStream classes.  These classes are abstractions that Java provides for reading and writing information sequentially from/to anything you want, be it a disk, a string buffer, an enumeration, or an array of bytes.  We will not spend time on these classes.  Click on this link to see more details on InputStream and OutputStream.
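Even though we will not dwell on byte streams, a short sketch may help show what the abstraction buys you.  The hypothetical class ByteCopy below copies bytes using only the abstract InputStream/OutputStream API, so the same copy() method would work whether the streams are backed by a file or, as in this demo, by an in-memory byte array (ByteArrayInputStream and ByteArrayOutputStream are standard java.io classes).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class ByteCopy {
    // Copies every byte from in to out.  Because it only uses the abstract
    // InputStream/OutputStream API, it works for files, byte arrays, sockets, etc.
    public static void copy(InputStream in, OutputStream out) throws IOException {
        int b;
        // read() returns the next byte as an int in 0..255, or -1 at end of stream.
        while ((b = in.read()) != -1) {
            out.write(b);   // write(int) writes the low-order 8 bits
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {10, 20, 30};
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        copy(new ByteArrayInputStream(data), sink);
        System.out.println(sink.size());   // prints 3
    }
}
```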


The Reader and Writer classes

Input and output of character (i.e. text) streams are handled through subclasses of the java.io.Reader and java.io.Writer classes, respectively.  These classes are abstractions that Java provides for reading and writing character data.  Java characters are 16-bit Unicode (UTF-16) values, so character streams are not limited to ASCII.

Reader is an abstract class that abstractly knows how to read characters from an abstract character source stream.  To read from a concrete character source stream, you will need to instantiate a concrete subclass of Reader that concretely knows how to read characters from the source.  For example, to read from a text file with file name fn, you can instantiate a FileReader (a concrete subclass of Reader) object as follows:

FileReader fReader = new FileReader(fn);

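Afterwards, you can start reading by calling fReader.read() to get the next (Unicode) character from the source.  read() returns the character as an int, or -1 when the end of the stream has been reached.  Note that the FileReader constructor throws a FileNotFoundException if the file cannot be opened.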
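As a sketch of the read() loop described above, the hypothetical CharCount class below counts the characters in a text file.  It assumes the file can be read with the platform's default character encoding (which is what FileReader uses), and it uses try-with-resources (Java 7 and later) so the reader is closed even if an error occurs.

```java
import java.io.FileReader;
import java.io.IOException;

public class CharCount {
    // Counts the characters in the text file named fn by calling read()
    // until it returns -1, which signals the end of the stream.
    public static int countChars(String fn) throws IOException {
        int count = 0;
        // try-with-resources closes the FileReader even if an exception occurs.
        try (FileReader fReader = new FileReader(fn)) {
            int c;
            while ((c = fReader.read()) != -1) {
                count++;   // c holds the character code; (char) c would convert it
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countChars(args[0]));
    }
}
```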

Writer is an abstract class that abstractly knows how to write characters to an abstract character destination stream.  To write to a concrete character destination stream, you will need to instantiate a concrete subclass of Writer that concretely knows how to write characters to the destination.  For example, to write to a text file with file name fn, you can instantiate a FileWriter (a concrete subclass of Writer) object as follows:

FileWriter fWriter = new FileWriter(fn);

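Afterwards, you can start writing by calling fWriter.write() to write the next (Unicode) character to the destination.  When you are done, call fWriter.close() to flush any buffered characters to the file and release the file.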
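A matching sketch for writing, with a hypothetical class TextWrite: it writes a string to a file one character at a time with fWriter.write().  Note that new FileWriter(fn) replaces any existing contents of the file; the two-argument constructor FileWriter(fn, true) appends instead.  As above, try-with-resources (Java 7+) takes care of calling close().

```java
import java.io.FileWriter;
import java.io.IOException;

public class TextWrite {
    // Writes text to the file named fn, one character at a time.
    public static void writeText(String fn, String text) throws IOException {
        // Closing the writer (done automatically by try-with-resources)
        // flushes any buffered characters to the file.
        try (FileWriter fWriter = new FileWriter(fn)) {
            for (int i = 0; i < text.length(); i++) {
                fWriter.write(text.charAt(i));   // write(int c) writes a single character
            }
        }
    }

    public static void main(String[] args) throws IOException {
        writeText(args[0], "Hello, file!\n");
    }
}
```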

The link below from the online Sun Java tutorial shows a simple example of how to read and write text files:

http://java.sun.com/docs/books/tutorial/essential/io/filestreams.html

Below is the UML class diagram for a few commonly used Reader classes.

 

And here is a UML class diagram for common Writer classes.

 

The section on parsing below will illustrate the use of Reader and Writer streams to read a text file and parse it into "tokens".


StreamTokenizer and Parsing

Many times when reading an input stream of characters, we need to "parse" it to determine whether what we are reading is a word or a number.  This process is called "tokenizing". A "token" is a sequence of characters that represents some abstract entity.  For example, the string "public" is a word token, while the string "123" is a number token. Tokenizing is the first thing a compiler does in order to figure out whether or not the program source is syntactically correct.  In standard compiler techniques, each kind of token is represented by an integer code.
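To make the idea concrete, here is a deliberately naive tokenizer.  It is only an illustration, not how StreamTokenizer works internally: it splits the input on whitespace and labels each piece with one of two hypothetical integer codes, WORD and NUMBER, treating a piece as a number only if it consists entirely of digits.

```java
import java.util.ArrayList;
import java.util.List;

public class SimpleTokenizer {
    // Hypothetical integer codes for the two token kinds (illustration only).
    public static final int WORD = 1;
    public static final int NUMBER = 2;

    // Classifies one piece of text: all digits -> NUMBER, otherwise WORD.
    public static int classify(String chunk) {
        for (int i = 0; i < chunk.length(); i++) {
            if (!Character.isDigit(chunk.charAt(i))) {
                return WORD;
            }
        }
        return NUMBER;
    }

    // Splits the input on whitespace and returns the token code of each piece.
    public static List<Integer> tokenize(String input) {
        List<Integer> codes = new ArrayList<>();
        for (String chunk : input.trim().split("\\s+")) {
            if (!chunk.isEmpty()) {   // skip the empty piece produced by blank input
                codes.add(classify(chunk));
            }
        }
        return codes;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("public 123 x"));   // prints [1, 2, 1]
    }
}
```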

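java.io.StreamTokenizer is a class that can do simple tokenization.  It scans an input stream and returns the type of the next token as an integer (int) via the method nextToken(); the token's value is stored in the field sval (for a word) or nval (for a number).  The only constructor you should use to instantiate a StreamTokenizer object is StreamTokenizer(Reader inp); the constructor that takes an InputStream is deprecated.  For example, given the FileReader fReader from above, new StreamTokenizer(fReader) creates a tokenizer that reads its tokens from the file.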
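The sketch below (hypothetical class TokenDemo) shows the usual StreamTokenizer loop.  nextToken() returns the token type, which is also stored in the field ttype: StreamTokenizer.TT_WORD for a word (text in sval), StreamTokenizer.TT_NUMBER for a number (value, as a double, in nval), StreamTokenizer.TT_EOF at end of input, or the character itself for an "ordinary" character such as ';'.  A StringReader stands in for a file so the example is self-contained.  Be aware that with the default settings '/' starts a comment, so the rest of that line is skipped.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class TokenDemo {
    // Returns a description of each token in the input, e.g. "WORD:public".
    public static List<String> describeTokens(Reader input) throws IOException {
        StreamTokenizer st = new StreamTokenizer(input);
        List<String> tokens = new ArrayList<>();
        // nextToken() returns the token type as an int; TT_EOF signals the end.
        while (st.nextToken() != StreamTokenizer.TT_EOF) {
            switch (st.ttype) {
                case StreamTokenizer.TT_WORD:
                    tokens.add("WORD:" + st.sval);
                    break;
                case StreamTokenizer.TT_NUMBER:
                    tokens.add("NUMBER:" + st.nval);   // numbers are parsed as doubles
                    break;
                default:
                    // An ordinary character: ttype holds the character itself.
                    tokens.add("CHAR:" + (char) st.ttype);
            }
        }
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        // prints [WORD:public, NUMBER:123.0, WORD:x, CHAR:;]
        System.out.println(describeTokens(new StringReader("public 123 x;")));
    }
}
```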

TestStreamTokenizer.java is a sample program showing how to use StreamTokenizer.  The program uses a while loop to scan the input file until the end of the file is reached.  This may be the first time you have seen a while loop.  The syntax of a while loop is:

while (boolean loop condition) {
    loop body code
}

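The semantics of the while loop are as follows: the boolean loop condition is evaluated first.  If it is true, the loop body is executed, and then the condition is evaluated again.  This repeats until the condition evaluates to false, at which point execution continues with the first statement after the loop.  If the condition is false the very first time it is evaluated, the loop body is never executed.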
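A small example of these semantics, using a hypothetical method halvings(): the loop keeps halving n while it is positive, so the condition is re-checked before every pass, and when n starts out at 0 or below the body never runs at all.

```java
public class WhileDemo {
    // Counts how many times n can be halved (integer division) before reaching 0.
    public static int halvings(int n) {
        int count = 0;
        while (n > 0) {     // condition is checked before every iteration
            n = n / 2;      // loop body: makes progress toward termination
            count++;
        }
        return count;       // if n <= 0 initially, the body never ran
    }

    public static void main(String[] args) {
        System.out.println(halvings(8));   // prints 4 (8 -> 4 -> 2 -> 1 -> 0)
        System.out.println(halvings(0));   // prints 0
    }
}
```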

The main() method expects two command-line arguments: the name of an input file and the name of an output file.  The program opens the input text file, scans it, and writes the tokens out to both System.out and the output text file.  For example, suppose you have an input text file called input.txt; executing

java TestStreamTokenizer input.txt output.txt

will read from input.txt and write to both System.out and output.txt.
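We don't reproduce TestStreamTokenizer.java here.  The following is a separate, hypothetical sketch (TokenCopy) of a program with the same overall shape: it takes the input and output file names from the command line, tokenizes the input with StreamTokenizer, and writes one line per token to both System.out and the output file through PrintWriter objects.  The output format ("word: ...", "number: ...") is an arbitrary choice for illustration.

```java
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.Reader;
import java.io.StreamTokenizer;

public class TokenCopy {
    // Writes one line per token read from in, to every PrintWriter in outs.
    public static void copyTokens(Reader in, PrintWriter... outs) throws IOException {
        StreamTokenizer st = new StreamTokenizer(in);
        while (st.nextToken() != StreamTokenizer.TT_EOF) {
            String line;
            if (st.ttype == StreamTokenizer.TT_WORD) {
                line = "word: " + st.sval;
            } else if (st.ttype == StreamTokenizer.TT_NUMBER) {
                line = "number: " + st.nval;
            } else if (st.ttype == '"' || st.ttype == '\'') {
                line = "string: " + st.sval;   // quoted text; sval holds the contents
            } else {
                line = "char: " + (char) st.ttype;
            }
            for (PrintWriter out : outs) {
                out.println(line);
            }
        }
        for (PrintWriter out : outs) {
            out.flush();
        }
    }

    // Usage: java TokenCopy input.txt output.txt
    public static void main(String[] argv) throws IOException {
        if (argv.length != 2) {
            System.err.println("Usage: java TokenCopy <input file> <output file>");
            return;
        }
        try (FileReader in = new FileReader(argv[0]);
             PrintWriter fileOut = new PrintWriter(new FileWriter(argv[1]))) {
            PrintWriter console = new PrintWriter(System.out, true);
            copyTokens(in, console, fileOut);
        }
    }
}
```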

Copy TestStreamTokenizer.java and the file input.txt to your local directory and test it out.


Command-line arguments

Sometimes when running large programs you want to be able to set options about how your program should behave at runtime.  For instance, you might have a hierarchy of debugging statements that you want to suppress at runtime, or you might have a variety of features that you want your users to have access to (check out the manual page for the Unix command "ls"--there is a vast number of options controlling how to display your data, what to display, ad nauseam).  You can handle this in several ways, but most people use command-line arguments to do it.

Remember how the main method of every program you write has to have the same signature?

        public static void main(String[] argv) { ... }

The array of Strings is where your command-line arguments are stored.  Every argument typed after the class name on the command line becomes one element of argv; the class name itself is not included. Let's say we wanted to write a program that just echoes the command-line arguments back to stdout. Here's how we'd do it:

    public class Echo {

        public static void main(String[] argv) {
            for(int j = 0; j < argv.length; j++)
                System.out.print(argv[j] + " ");  // don't insert an end-line character yet
            System.out.println();  // now print an end-line ('\n')
        }
    }
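Building on Echo, here is a sketch of how a program might recognize an option such as the debugging switch mentioned earlier.  The flag name "-debug" and the rule that any argument starting with '-' is an option are assumptions made for this example, not a Java convention.  Running java ArgOptions -debug a.txt b.txt would print the debug line on stderr and "Processing 2 file(s)" on stdout.

```java
import java.util.ArrayList;
import java.util.List;

public class ArgOptions {
    // Returns true if the given flag (e.g. the hypothetical "-debug") appears in argv.
    public static boolean hasFlag(String[] argv, String flag) {
        for (String arg : argv) {
            if (arg.equals(flag)) {
                return true;
            }
        }
        return false;
    }

    // Returns the arguments that are not options (i.e. do not start with '-').
    public static List<String> operands(String[] argv) {
        List<String> result = new ArrayList<>();
        for (String arg : argv) {
            if (!arg.startsWith("-")) {
                result.add(arg);
            }
        }
        return result;
    }

    public static void main(String[] argv) {
        boolean debug = hasFlag(argv, "-debug");
        List<String> files = operands(argv);
        if (debug) {
            System.err.println("debug: files = " + files);
        }
        System.out.println("Processing " + files.size() + " file(s)");
    }
}
```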
 


The System class

The java.lang.System class is a useful class containing useful static members doing useful things.  This should look familiar:

    System.out.println("Hello world!");

I'm sure you've made calls like this a million times already, but what's really going on?  out is a static field of the System class; it's a PrintStream object.  PrintStream is a grandchild of OutputStream in the Java class hierarchy (PrintStream extends FilterOutputStream, which extends OutputStream); it adds methods such as print() and println() that write text representations of strings, numbers and other values, rather than writing one raw byte at a time.  System.out is initialized when a program starts to what is known as standard output (stdout).  Stdout is usually the monitor screen, but you can also send stdout to a file at runtime by redirecting it from the Unix command line.  For example, to send stdout to the file "outfile.txt", do the following:

    % java MyClass > outfile.txt
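Because System.out is just a static PrintStream reference, a program can also redirect it itself with System.setOut, the in-program counterpart of the shell redirection above.  The hypothetical helper capture() below temporarily points System.out at an in-memory buffer and then restores the original stream; it uses a lambda, so it needs Java 8 or later.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

public class CaptureOut {
    // Runs task with System.out temporarily redirected into a byte buffer,
    // and returns whatever the task printed.
    public static String capture(Runnable task) {
        PrintStream original = System.out;
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        System.setOut(new PrintStream(buffer, true));
        try {
            task.run();
        } finally {
            System.setOut(original);   // always restore the real stdout
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        String printed = capture(() -> System.out.print("Hello world!"));
        System.out.println("Captured: " + printed);   // prints Captured: Hello world!
    }
}
```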

There is also System.in, a static field popularly known as standard input (stdin).  Stdin is an InputStream, initially set to take input from the keyboard, but it can also read from a file at runtime like this:

    % java MyClass < infile.txt
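System.in delivers bytes, so to read text from it you would normally wrap it in an InputStreamReader, which converts bytes to characters, and usually in a BufferedReader for its readLine() method.  The hypothetical LineCount program below counts the lines on stdin, so % java LineCount < infile.txt would print the number of lines in infile.txt.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;

public class LineCount {
    // Counts the lines available from any character source.
    public static int countLines(Reader source) throws IOException {
        BufferedReader in = new BufferedReader(source);
        int lines = 0;
        // readLine() returns null at end of stream (e.g. the end of a redirected
        // file, or Ctrl-D typed at a Unix terminal).
        while (in.readLine() != null) {
            lines++;
        }
        return lines;   // the reader is deliberately not closed: it may wrap System.in
    }

    public static void main(String[] args) throws IOException {
        // InputStreamReader bridges the byte stream System.in to a character Reader.
        System.out.println(countLines(new InputStreamReader(System.in)));
    }
}
```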

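There's a third System I/O stream called standard error (stderr).  System.err is another PrintStream, designed for error messages in case you don't want your output and your error messages going to the same place.  Stderr is also initialized to the monitor.  In the C shell (csh/tcsh), whose conventional prompt is %, the >& operator redirects stdout and stderr together to the same file:

    % java MyClass >& errfile.txt

To send them to different files in csh/tcsh, redirect stdout inside a subshell and redirect the subshell's stderr; you can combine this with input redirection:

    % (java MyClass < infile.txt > outfile.txt) >& errfile.txt

In sh/bash, 2> redirects stderr alone:

    $ java MyClass < infile.txt > outfile.txt 2> errfile.txt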
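To see why separating the two streams is useful, consider this hypothetical Divide program: normal results go to System.out and the error message goes to System.err.  With % java Divide 7 0 > outfile.txt, outfile.txt stays empty and the error message still appears on the screen.

```java
public class Divide {
    // Prints a / b on stdout, or an error message on stderr when b is zero.
    // Returns the exit status the program uses: 0 = success, 1 = error.
    public static int run(int a, int b) {
        if (b == 0) {
            System.err.println("error: division by zero");
            return 1;
        }
        System.out.println(a / b);   // integer division
        return 0;
    }

    public static void main(String[] args) {
        int a = Integer.parseInt(args[0]);
        int b = Integer.parseInt(args[1]);
        System.exit(run(a, b));
    }
}
```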

There's lots of other stuff the System class provides for you--check it out.

NOTE: System.out and System.err are the ONLY PrintStream objects you should use.  Although the PrintStream class itself is not deprecated (only its constructors were deprecated, in JDK 1.1, and that deprecation was later reversed), you should use PrintWriter (shown in the above UML diagram and illustrated in TestStreamTokenizer.java) instead to output character streams.
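Following that advice, one common pattern (sketched here with a hypothetical report() method) is to write output code against PrintWriter and decide the destination at the call site: new PrintWriter(System.out, true) writes to stdout with automatic flushing on println(), while a PrintWriter wrapped around a FileWriter or a StringWriter sends the same text to a file or to a string.

```java
import java.io.PrintWriter;

public class Report {
    // Writes a small report to any PrintWriter: the console, a file, or a string.
    public static void report(PrintWriter out, String[] names) {
        out.println("Count: " + names.length);
        for (String name : names) {
            out.println("- " + name);
        }
        out.flush();   // make sure everything reaches the underlying destination
    }

    public static void main(String[] args) {
        // Wrap stdout in a PrintWriter; 'true' turns on auto-flush for println().
        PrintWriter console = new PrintWriter(System.out, true);
        report(console, args);
    }
}
```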



Dung X. Nguyen
Last revised 03/22/2006