Comp 201 - Data Abstraction

Introduction

Abstraction is the process of hiding the details and exposing only the essential features of a particular concept or object.
Computer scientists use abstraction to understand and solve problems and to communicate their solutions to the computer. The way we communicate with the computer is fashioned after the way we communicate with humans: that is, we use "languages." The languages that computers "understand" are called computer languages. They use specific symbols and have specific rules, called syntax rules, on how these symbols should be put together. There are also rules that prescribe what each specific syntactic construct means; they are called semantic rules.

There are many computer languages, each of which is designed for use in some specific problem domain. The language we will learn and use in this course is called Java. It was introduced to the computing community in 1995 as a language for programming computers across the Internet.

Since computers were originally designed to perform calculations (see Computers of Yore), we illustrate the problem-solving process in Java with the following numerical example.

Problem: Given a rectangle 4.5 ft wide and 7.2 ft high, compute its area.

Solution: We know the area of a rectangle is its width times its height. So all we have to do to solve the above problem is to multiply 4.5 by 7.2 and get the answer. Note that in order to solve this problem, we need to know what a rectangle is and, in particular, how to compute its area. This requires knowledge that is external to programming knowledge. Programming knowledge involves how to express the above (mathematical) solution in a particular computer language (e.g. Java), so that the computer can perform the computation and communicate back the answer.

Data Abstraction

The product of 4.5 by 7.2 is expressed in Java as: 4.5 * 7.2. In this expression, the symbol * represents the multiplication operation. 4.5 and 7.2 are called number literals. Using DrJava, we can type in the expression

4.5 * 7.2

directly in the interactions window and see the answer.
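Outside the Interactions pane, the same expression has to sit inside a complete program. The following is only a minimal sketch of what that might look like; the class name RectangleAreaDemo and the method name area are made up for illustration and are not part of the course materials.

```java
// Hypothetical stand-alone equivalent of typing 4.5 * 7.2 into the Interactions pane.
class RectangleAreaDemo {
    // Returns the area of the one rectangle from the problem statement.
    static double area() {
        return 4.5 * 7.2; // * is multiplication; 4.5 and 7.2 are number literals
    }

    public static void main(String[] args) {
        // Print the computed area to the console.
        System.out.println(area());
    }
}
```

Note that the numbers are hard-wired into the program, so it answers exactly one question.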

Now suppose we change the problem to compute the area of a rectangle of width 3.6 and height 9.3. Has the original problem really changed at all? To put it another way, has the essence of the original problem changed? After all, the formula for computing the answer is still the same. All we have to do is to enter 3.6 * 9.3. What is it that has not changed (the invariant)? And what is it that has changed (the variant)?

Type Abstraction

The problem has not changed in that it still deals with the same geometric shape, a rectangle, described in terms of the same dimensions, its width and height. Only the values of the width and the height vary. The formula to compute the area of a rectangle given its width and height does not change:

width * height

It does not care what the actual specific values of width and height are. What it cares about is that the values of width and height must be such that the multiplication operation makes sense. How do we express the above invariants in Java?

We just want to think of the width and height of a given rectangle as elements of the set of real numbers. In computing, we group values with common characteristics into a set and call it a type. In Java, the type double is the set of real numbers as they are implemented inside the computer in some specific way. The details of this internal representation are immaterial for our purpose and thus can be ignored. In addition to the type double, Java provides many more pre-built types, such as int to represent the set of integers and char to represent the set of characters. We will examine and use them as the need arises in future examples. As for our problem, we only need the type double.
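To make the idea of a type a little more concrete, here is a small sketch showing that the same operation symbol can mean different things for values of different types. The class and method names (TypeDemo, intQuotient, doubleQuotient) are hypothetical and chosen only for this illustration.

```java
// Hypothetical illustration of three of Java's pre-built types.
class TypeDemo {
    // Division of two int values discards the fractional part.
    static int intQuotient() {
        return 7 / 2;
    }

    // Division of two double values keeps the fractional part.
    static double doubleQuotient() {
        return 7.0 / 2.0;
    }

    public static void main(String[] args) {
        char initial = 'R'; // char: a single character, written in single quotes
        System.out.println(intQuotient());    // prints 3
        System.out.println(doubleQuotient()); // prints 3.5
        System.out.println(initial);          // prints R
    }
}
```

This is one reason the rectangle problem uses double: widths and heights such as 4.5 cannot be represented by int values.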

We can define the width and the height of a rectangle as double in Java as follows.

double width;
double height;

The above two statements are called variable definitions, where width and height are said to be variable names. In Java, a variable represents a memory location inside the computer. We define a variable by first declaring its type, then following the type with the name of the variable, and terminating the definition with a semi-colon. This is a Java syntax rule. Violating a syntax rule constitutes an error. When we define a variable in this manner in the Interactions pane, its associated memory content is initialized to a default value specified by the Java language. For variables of type double, the default value is 0. (In a compiled Java program, this default applies to the fields of a class; a local variable defined inside a method must be assigned a value before it is used.)

Finger Exercise: Use the interactions pane of DrJava to evaluate width and height and verify that their values are set to 0.
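In a compiled Java program, the default-value behavior applies to fields of a class, while a local variable declared inside a method must be assigned before it is read. The sketch below illustrates this under that assumption; the class name DefaultValueDemo is hypothetical.

```java
// Hypothetical illustration of default values for fields of type double.
class DefaultValueDemo {
    double width;  // field: Java initializes it to 0.0
    double height; // field: Java initializes it to 0.0

    public static void main(String[] args) {
        DefaultValueDemo demo = new DefaultValueDemo();
        System.out.println(demo.width);  // prints 0.0
        System.out.println(demo.height); // prints 0.0

        // A local variable gets no default value. The two lines below,
        // if uncommented, would be rejected by the compiler:
        // double local;
        // System.out.println(local);
    }
}
```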

Once we have defined the width and height variables, we can solve our problem by writing the expression that computes the area of the associated rectangle in terms of width and height as follows.

width * height

Observe that the two variable definitions, together with the expression to compute the area presented above, directly translate the description of the problem (two real numbers representing the width and the height of a rectangle) and the high-level thinking of what the solution should be (the area is the width times the height). We have just expressed the invariants of the problem and its solution. Now, how do we vary width and height in Java? We use what is called the assignment operation. To assign the value 4.5 to the variable width and the value 7.2 to the variable height, we write the following Java assignment statements.

width = 4.5;
height = 7.2;

The syntax rule for the assignment statement in Java is: first write the name of the variable, then follow it with the equal sign, then follow the equal sign with a Java expression, and terminate the statement with a semi-colon. The semantics (i.e. meaning) of such an assignment is: evaluate the expression on the right hand side of the equal sign and store the resulting value in the memory location represented by the variable name on the left hand side of the equal sign. It is an error if the type of the expression on the right hand side is not a subset of the type of the variable on the left hand side.
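The assignment rule, including the "subset" condition on types, can be sketched as follows. This is an illustration only: the class AssignmentDemo, its method names, and the variable wholeFeet are hypothetical.

```java
// Hypothetical illustration of assignment statements and the type rule.
class AssignmentDemo {
    // Define the variables, assign values, then evaluate the invariant formula.
    static double areaAfterAssignment() {
        double width;
        double height;
        width = 4.5;  // evaluate the right hand side, store it in width
        height = 7.2; // evaluate the right hand side, store it in height
        return width * height;
    }

    // Assigning an int value to a double variable is allowed.
    static double widened() {
        int wholeFeet = 4;
        double width;
        width = wholeFeet;    // allowed: every int value is also a double value
        // wholeFeet = width; // rejected by the compiler: a double need not be an int
        return width;
    }

    public static void main(String[] args) {
        System.out.println(areaAfterAssignment());
        System.out.println(widened()); // prints 4.0
    }
}
```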

Now if we evaluate width * height again (using the Interactions pane of DrJava), we should get the desired answer. Life is good so far, though there is a little bit of inconvenience here: we have to type the expression width * height each time we are asked to compute the area of a rectangle with a given width and a given height. This may be OK for such a simple formula, but what if the formula is something much more complex, like computing the length of the diagonal of a rectangle? Re-typing the formula each time is quite an error-prone process. Is there a way to have the computer memorize the formula and perform the computation behind the scenes so that we do not have to memorize it and rewrite it ourselves? The answer is yes, and it takes a little bit more work to achieve this goal in Java.
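To see why re-typing is error-prone, consider the diagonal of a rectangle, which by the Pythagorean theorem is the square root of width squared plus height squared. The sketch below writes that formula out for a 3 by 4 rectangle using the standard Math.sqrt method; the class name DiagonalDemo, the method name, and the sample dimensions are assumptions made for this illustration.

```java
// Hypothetical illustration: the diagonal formula written out for specific values.
class DiagonalDemo {
    // Every number appears twice, so each re-typing is a chance for a mistake.
    static double diagonalOf3By4() {
        return Math.sqrt(3.0 * 3.0 + 4.0 * 4.0);
    }

    public static void main(String[] args) {
        System.out.println(diagonalOf3By4()); // prints 5.0
    }
}
```

Changing the rectangle means editing four literals consistently, which is exactly the kind of repetition we would like the computer to handle for us.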

What we would like to do is to build the equivalent of a black box with a button that takes in two real numbers (recall type double) as inputs. When we put in two numbers and press the button, "magically" the black box will compute the product of the two input numbers and spit out the result, which we will interpret as the area of a rectangle whose width and height are given by the two input numbers. This black box is in essence a specialized calculator that can only compute one thing: the area of a rectangle given a width and a height. To build this box in Java, we use a construct called a class, which looks like the following.

class AreaCalc { 
    double rectArea(double width, double height) {
        return width * height;
    }
}

What this Java code means is something like: AreaCalc is a blueprint of a specialized computing machine that is capable of accepting two input doubles, one labeled width and the other labeled height, computing their product and returning the result. This computation is given a name: rectArea. In Java parlance, it is called a method of the class AreaCalc.

Here is an example of how we use AreaCalc to compute the area of a rectangle of width 4.5 and height 7.2. In the Interactions pane of DrJava, enter the following lines of code.

AreaCalc calc = new AreaCalc();
calc.rectArea(4.5, 7.2)

The first line of code defines calc as a variable of type AreaCalc and assigns to it an instance of the class AreaCalc. new is a keyword in Java. It is an example of what is called a class operator. It operates on a class and creates an instance (also called an object) of the given class. The second line of code is a call to the object calc to perform the rectArea computation, where width is assigned the value 4.5 and height is assigned the value 7.2. To get the area of a 5.6 by 8.4 rectangle, we simply use the same calculator calc again:

calc.rectArea(5.6, 8.4)
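For readers who want to run the same session outside the Interactions pane, here is a sketch of a complete program. AreaCalc is reproduced from above; the class AreaCalcDemo and its main method are hypothetical scaffolding added for this example.

```java
// The calculator class from the text.
class AreaCalc {
    // Returns the area of a rectangle with the given width and height.
    double rectArea(double width, double height) {
        return width * height;
    }
}

// Hypothetical driver that replays the Interactions pane session.
class AreaCalcDemo {
    public static void main(String[] args) {
        AreaCalc calc = new AreaCalc();              // create one calculator object
        System.out.println(calc.rectArea(4.5, 7.2)); // first rectangle
        System.out.println(calc.rectArea(5.6, 8.4)); // same calculator, new inputs
    }
}
```

The formula is written exactly once, inside rectArea; each call only supplies new values for width and height.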

So instead of solving just one problem (given a rectangle 4.5 ft wide and 7.2 ft high, compute its area), we have built a "machine" that can compute the area of any given rectangle. But what about computing the area of a right triangle with height 5 and base 4? We cannot simply use this calculator. We need another specialized calculator, the kind that can compute the area of a right triangle. (See lab #2, which leads you through the process of writing various calculators that compute various kinds of areas.)

There are at least two distinct designs for such a calculator:

Create a new class called AreaCalc2 with one method called rightTriangleArea with two input parameters of type double. This corresponds to designing a different area calculator with one button labeled rightTriangleArea with two input slots.
Add to AreaCalc a method called rightTriangleArea with two input parameters of type double. This corresponds to designing an area calculator with two buttons: one labeled rectArea with two input slots and the other labeled rightTriangleArea, also with two input slots.
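The two designs might be sketched as follows. The class and method names come from the descriptions above; the parameter names base and height, and the method bodies (half the base times the height for a right triangle), are assumptions filled in for illustration.

```java
// Design 1: a separate calculator with a single "button".
class AreaCalc2 {
    // Area of a right triangle: half the base times the height.
    double rightTriangleArea(double base, double height) {
        return 0.5 * base * height;
    }
}

// Design 2: one calculator with two "buttons".
class AreaCalc {
    // Area of a rectangle: width times height.
    double rectArea(double width, double height) {
        return width * height;
    }

    // Area of a right triangle: half the base times the height.
    double rightTriangleArea(double base, double height) {
        return 0.5 * base * height;
    }
}

// Hypothetical driver showing both designs on the triangle from the text.
class TwoDesignsDemo {
    public static void main(String[] args) {
        System.out.println(new AreaCalc2().rightTriangleArea(4, 5)); // prints 10.0
        System.out.println(new AreaCalc().rightTriangleArea(4, 5));  // prints 10.0
    }
}
```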

In either design, it is the responsibility of the calculator user to pick the appropriate calculator or press the appropriate button on the calculator to correctly obtain the area of the given geometric shape. Since the two computations require exactly the same number of input parameters of exactly the same type, the calculator user must be careful not to get mixed up. This may not be too much of an inconvenience if there are only two kinds of shape to choose from: rectangle and right triangle. But what if the user has to choose from hundreds of different shapes, or better yet, an open-ended number of shapes? How can we, as programmers, build a calculator that can handle an infinite number of shapes? The answer lies in abstraction. To motivate how to conceptualize the problem, let us digress and contemplate the behavior of a child!

Glossary

assignment: To set a variable to a particular data value. See Also: variable.

invariant: The parts of a program, such as values or programmatic behaviors, that do not vary from one invocation to the next. Note that while a value may be variant, an abstraction of that value may be invariant. See Also: variant.

literal: An explicit, concrete textual representation of a value of a given type. Literals are often used to set the values for variables. See Also: variable.

type: A set of values with certain common characteristics. In Java, all data values must be of some type. Examples:

int is a type that is used to represent integer number values.
double is a type that is used to represent real number values.
String is a type that is used to represent a string of characters.

See Also: variable.

variable: A memory location to hold a particular value of a given type. In a strongly-typed language such as Java, all variables must have a type. This is not true in all languages, however. In a Java program, variables have names called identifiers, which are sequences of characters put together according to the following rule. An identifier must begin with an alphabetic character (e.g. 'a', 'b', 'X', 'Y', etc.) and may be followed by zero or more alphabetic characters and/or digit characters (e.g. '0', '1', etc.) and/or the underscore character ('_'). For example, cp3PO is a valid variable name, while Darth Vader is not because it has a blank character between the 'h' and the 'V'. See Also: type.

variant: The parts of a program, such as values or programmatic behaviors, that vary from one invocation to the next. Note that while a value may be variant, its abstraction may be invariant. See Also: invariant.

Last Revised Thursday, 03-Jun-2010 09:50:19 CDT

Comp201: Principles of Object-Oriented Programming I Spring 2007 -- Lec 02: Data Abstraction
