Comp 212 Lab 11: Stream IO and Tokenizing


This lab has two main goals:


Java Reader & Writer IO

Raw, sequential data is one of the cornerstones of computer science. It consists of a sequence of bytes and, as such, can only be read in one way: sequentially, byte by byte. The Reader class provides a basic interface for reading this data as a sequence of characters. Similarly, the Writer class provides an interface for writing it.

However, even though its format is universal, raw data can be stored in many ways: files, strings, websites, arrays. Of course, most programs don't care where the data is being read from or written to. As a result, the Reader and Writer classes are subclassed to provide specific implementations of readers and writers for these different ways of storing data.

For an example of how the Reader/Writer API works, look at the following page in the Java Tutorial on Java I/O. It shows how the copy operation can be implemented using a FileReader and a FileWriter.
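As a rough illustration of that copy operation (not the tutorial's own code), here is a minimal sketch. The class name CopyExample and the file names are placeholders invented for this example. Note that the copy method is written against Reader and Writer, so it does not care where the data lives.

```java
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;

// Hypothetical class name; not part of the lab's skeleton code.
class CopyExample {
    /** Copies every character from in to out, for any Reader/Writer pair. */
    static void copy(Reader in, Writer out) throws IOException {
        int c;
        // read() returns the next character as an int, or -1 at end of stream.
        while ((c = in.read()) != -1) {
            out.write(c);
        }
    }

    public static void main(String[] args) throws IOException {
        // "input.txt" and "output.txt" are placeholder file names.
        try (Reader in = new FileReader("input.txt");
             Writer out = new FileWriter("output.txt")) {
            copy(in, out);
        }
    }
}
```

The same copy method works unchanged with a StringReader and a StringWriter, which is exactly the point of hiding the source and destination behind the Reader and Writer interfaces.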

The Reader/Writer pairs listed above are useful because they hide the source or the destination of the data. Other readers and writers are useful because they format information as sequential data. PrintWriter, for example, provides many useful helper methods (such as print and println) that format seemingly non-sequential data as sequential data. Note that java.io has no matching PrintReader class.
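To make the formatting role of PrintWriter concrete, here is a small sketch that writes an int, a char, and a double into a StringWriter as one character sequence. The class and method names are made up for illustration.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical class name; not part of the lab's skeleton code.
class PrintWriterExample {
    /** Formats "left op right = result" as sequential character data. */
    static String format(int left, char op, int right, double result) {
        StringWriter buffer = new StringWriter();   // in-memory destination
        PrintWriter out = new PrintWriter(buffer);  // adds the print helpers
        out.print(left);    // int    -> characters
        out.print(' ');
        out.print(op);      // char   -> characters
        out.print(' ');
        out.print(right);
        out.print(" = ");
        out.print(result);  // double -> characters
        out.flush();        // push any buffered output into the StringWriter
        return buffer.toString();
    }
}
```

Swapping the StringWriter for a FileWriter would send the same formatted text to a file without changing any of the print calls.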


The Calculator

Now that we know the basics of how to get sequential data into our program, let's build a calculator. In doing so, we will build a scanner and a parser.

Functional Description

Our calculator will evaluate expressions entered on the standard input. It will support one type of expression:

[num] [op] [num] where [num] is any number and [op] is either '+', '-', '*', or '/' - representing the usual operations plus, minus, multiply, and divide.

If the user enters a string that doesn't parse to an expression, an error message should be printed. If the user enters an expression that divides a number by zero, a different error message should be printed.

Overall Design

Our program will have 4 distinct parts:

For the sake of simplicity, all of our code will sit in a single package "calc". The runnable class will be "CalcApp.java".

Download the skeleton code here.

In order to motivate how each piece works, we will start with the part that ties everything together and work towards the tokenizer - the lowest level component.

The Application

The application, once started, reads a line from the standard input, parses it, and then evaluates the expression. All this logic is contained in the main method of the CalcApp class.

Reading a single line

In Java, stdin is accessed via the System.in public field (why it is a field and not a method is still a mystery to me). However, notice that System.in is an InputStream. From our earlier discussion, we'd prefer to use a Reader. Fortunately, Java gives us a class that will turn an InputStream into a Reader. What is this class?

Unfortunately, this Reader only returns characters, one at a time or a buffer at a time, which isn't a huge help if we want to read a whole line at a time as a java.lang.String. We'd prefer to use a reader that has a method that returns a whole line as a String. The java.io package also contains this handy Reader. What is this class?

Reading Multiple Lines

At this point, we know enough to write the code that reads a single line from System.in. What must we do in order to read a line and process it over and over again? This code forms the core of our CalcApp.main method.
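One possible shape for this loop is sketched below. It assumes the two classes asked about above are InputStreamReader and BufferedReader (try to answer the questions yourself before reading on). The class ReadLoopSketch and the method processLines are hypothetical names, and the "processing" step here only echoes each line.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PrintStream;

// Hypothetical class name; the lab's real loop belongs in CalcApp.main.
class ReadLoopSketch {
    /** Reads lines until end of input and returns how many were processed. */
    static int processLines(InputStream in, PrintStream out) throws IOException {
        // InputStreamReader adapts bytes to characters;
        // BufferedReader adds readLine().
        BufferedReader reader = new BufferedReader(new InputStreamReader(in));
        int count = 0;
        String line;
        // readLine() returns null once the input is exhausted.
        while ((line = reader.readLine()) != null) {
            out.println("read: " + line);  // placeholder for parse + evaluate
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        processLines(System.in, System.out);
    }
}
```

In the real application, the body of the loop would hand the line to the tokenizer and parser instead of printing it.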

Processing a line

Now we have a line to do something with. As mentioned earlier, we must first parse it into an expression and then evaluate it. In class we discussed that parsing actually consists of first scanning or tokenizing the string, and then parsing these tokens into an expression. Let's modularize each of these steps into separate objects. Tokenizing will be done by the CalcTokenizer, parsing by the CalcParser, and evaluation by the Expression itself.

In our design, we assume that CalcTokenizer sequentially returns the tokens for a string provided during construction. We assume that CalcParser provides a single static method CalcParser.parseExpression that, given a tokenizer, returns the parsed Expression. Finally, the Expression object has the intelligence it needs to evaluate itself. This evaluation process is performed when the Expression.evaluate() method is called.

Given all this information, we can write the entire body of the CalcApp.main method. Add the appropriate stub methods to the other classes directly used by the main method.

When writing this, consider how to handle the following kinds of errors:

Evaluation

Now we move to implementing the evaluation process. We already have the starting point: the Expression.evaluate method. The Expression represents the structured form of the statement "[num] [op] [num]". It is useful to think of each of these tokens as having its own object type, or in other words, its own class. Knowing this, what information will every Expression contain? What fields should Expression contain?

For the sake of making the evaluation process as simple as possible, it behooves us to delegate the responsibility of applying the operation "[op]" to the class that represents an operation. What class is most capable of providing a method that knows how to perform the operation? What are the inputs for such a method? What is the output of this method?

Write this method, calling it Operator.apply, taking into account what errors might occur while applying the operation.

Using the Operator and Number classes, write the Expression.evaluate method. Do any errors need to be handled here?
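The delegation described above might look roughly like the following sketch. It makes several assumptions that the skeleton code may not share: operands are plain doubles rather than Number objects, Operator is an enum rather than a class hierarchy, and division by zero is reported with an ArithmeticException. Everything is nested inside a hypothetical EvalSketch class to keep the example self-contained.

```java
// Hypothetical wrapper class; the lab's real classes live in package calc.
class EvalSketch {
    /** Each operator knows how to apply itself to two operands. */
    enum Operator {
        PLUS, MINUS, TIMES, DIVIDE;

        double apply(double left, double right) {
            switch (this) {
                case PLUS:  return left + right;
                case MINUS: return left - right;
                case TIMES: return left * right;
                case DIVIDE:
                    // Assumption: signal division by zero with an exception
                    // so the application can print its own error message.
                    if (right == 0.0) {
                        throw new ArithmeticException("division by zero");
                    }
                    return left / right;
                default:
                    throw new AssertionError("unknown operator " + this);
            }
        }
    }

    /** Structured form of "[num] [op] [num]". */
    static class Expression {
        private final double left;
        private final Operator op;
        private final double right;

        Expression(double left, Operator op, double right) {
            this.left = left;
            this.op = op;
            this.right = right;
        }

        /** Delegates the arithmetic to the operator. */
        double evaluate() {
            return op.apply(left, right);
        }
    }
}
```

With this split, Expression.evaluate needs no error handling of its own; it simply lets any exception from apply propagate to the caller.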

Parsing

Since we haven't discussed parsers in detail yet, our parser will be VERY simple. The job of the CalcParser.parseExpression method is to take a CalcTokenizer and return a valid Expression if possible.

Write the CalcParser.parseExpression method (creating any other helper methods that might be useful). What sequence of tokens must your code see in order to create an Expression? What errors does your code need to handle?
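As a hint at the overall shape, the sketch below parses the sequence number, operator, number, end of line using static helper methods. To stay self-contained it takes an Iterator of String tokens instead of a CalcTokenizer, treats the end of the iterator as the end of the line, and reports errors with IllegalArgumentException. All of these are assumptions of this example, not requirements of the lab.

```java
import java.util.Iterator;

// Hypothetical stand-in for CalcParser, using String tokens for brevity.
class ParseSketch {
    /** Parsed form of "[num] [op] [num]". */
    static class Expression {
        final double left;
        final char op;
        final double right;

        Expression(double left, char op, double right) {
            this.left = left;
            this.op = op;
            this.right = right;
        }
    }

    /** Expects exactly: number, operator, number, then end of line. */
    static Expression parseExpression(Iterator<String> tokens) {
        double left = parseNumber(tokens);
        char op = parseOperator(tokens);
        double right = parseNumber(tokens);
        if (tokens.hasNext()) {
            throw new IllegalArgumentException("unexpected token: " + tokens.next());
        }
        return new Expression(left, op, right);
    }

    private static String next(Iterator<String> tokens, String expected) {
        if (!tokens.hasNext()) {
            throw new IllegalArgumentException("expected " + expected + " but the line ended");
        }
        return tokens.next();
    }

    private static double parseNumber(Iterator<String> tokens) {
        String text = next(tokens, "a number");
        try {
            return Double.parseDouble(text);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("expected a number but saw: " + text);
        }
    }

    private static char parseOperator(Iterator<String> tokens) {
        String text = next(tokens, "an operator");
        if (text.length() == 1 && "+-*/".indexOf(text.charAt(0)) >= 0) {
            return text.charAt(0);
        }
        throw new IllegalArgumentException("expected an operator but saw: " + text);
    }
}
```

Each helper consumes exactly one token and either returns a value or fails with a message naming what it expected, which keeps parseExpression itself a direct transcription of the grammar.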

NOTE: Since we have not learned an OO approach to designing a parser, ours is non-object oriented and uses static methods for simplicity.

Tokenizing

We have already designated CalcTokenizer as the tokenizer for our application. An instance of a CalcTokenizer returns the tokens in a given java.lang.String. The CalcTokenizer recognizes three types of tokens: Numbers, Operators, and End Of Line (Eol).

Write the constructor of CalcTokenizer to configure a java.io.StreamTokenizer to do the work of actually tokenizing the string.

Write the body of the method Token CalcTokenizer.getNextToken(). Modify other classes as necessary. What errors need to be handled here?
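The sketch below shows one way a java.io.StreamTokenizer could be configured for this job. The class TokenizeSketch and its String-based token labels are invented for illustration; the real CalcTokenizer would return Token objects instead. Two StreamTokenizer defaults matter here: '/' starts a comment unless it is made an ordinary character, and end of line is treated as plain whitespace unless eolIsSignificant(true) is called.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for CalcTokenizer, labelling tokens as Strings.
class TokenizeSketch {
    static StreamTokenizer configure(String line) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(line));
        st.ordinaryChar('/');       // by default '/' begins a comment
        st.eolIsSignificant(true);  // report '\n' as TT_EOL, not whitespace
        return st;                  // number parsing is already on by default
    }

    /** Returns labels for every token up to and including end of line. */
    static List<String> tokens(String line) throws IOException {
        StreamTokenizer st = configure(line);
        List<String> out = new ArrayList<>();
        while (true) {
            int type = st.nextToken();
            if (type == StreamTokenizer.TT_EOL || type == StreamTokenizer.TT_EOF) {
                out.add("EOL");  // assumption: treat end of input like end of line
                return out;
            } else if (type == StreamTokenizer.TT_NUMBER) {
                out.add("NUM(" + st.nval + ")");
            } else if ("+-*/".indexOf(type) >= 0) {
                out.add("OP(" + (char) type + ")");  // ordinary chars are their own type
            } else {
                throw new IllegalArgumentException("unexpected token: " + st);
            }
        }
    }
}
```

One quirk to be aware of: because StreamTokenizer treats '-' as part of a number when a digit follows immediately, input such as "2-3" is read as the two numbers 2 and -3, while "2 - 3" yields the number, operator, number sequence the parser expects.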

Finishing

Enjoy your Calculator program by compiling it and running calc.CalcApp. You should be able to enter expressions into the Interactions pane and see the result after pressing enter (signaling to the Reader that the line is complete).

In reviewing the design we used, where has our design limited us? Where will it be hard to extend our calculator?

How would we add a new token type to our design?

How would we add a new type of expression to our design? For example, what if we wanted to support expressions with parentheses, such as "(2 + 3) * 4"?

Note, however, that even though our design is limited, it is modular: fixing a limitation does not require overhauling everything, only the modules where the limitation lives. In class on Wednesday, we will learn methods for making our Parser more extensible, a desirable property since most languages, even programming languages, are not completely static.


 

Last revised 04/03/2005 08:30:17 PM

Dung X. Nguyen at dxnguyen at rice dot edu