20-CS-122-001 Computer Science II Spring 2012
Brownie Point Assignment

Virtual functions, classes, inheritance, lists, queues, stacks, applications

Evaluate Simple Arithmetic Expressions

For 15 brownie points (not required)
(submit instructions: here)

Rationale:

Simple use of the Stack class.

Purpose:

Use two stacks, one for holding operators and one for holding operands, to evaluate a simple arithmetic expression. Operands are numbers. Operators include plus (+), minus (-), multiply (*), divide (/). Input should be taken from stdin to allow the user to apply the evaluator to a file containing a large expression or to an argument on the command line.

Background:

Consider a simple arithmetic expression using parens, arithmetic operators, and numbers. The following is an example

    (7 + (3 * (4 + 4) * 2 * 2 * (1 + 2)) * 3)

How would you evaluate this expression? Before you answer, in order to simplify code later, let us suppose + and * have the same level of precedence and that operations are evaluated from left to right. So, for example, the expression

    (8 + 7 * 6)

would be evaluated by first adding 8 and 7 to get 15, then multiplying 15 by 6 to get 90. This is different from what one normally does which would be to multiply first then add. Due to the simplification stated above, operations on the same nested level may be paired from left to right. Doing so to the first expression above gives

    ((7 + (((((3 * (4 + 4)) * 2) * 2) * (1 + 2)))) * 3)

This expression can be evaluated easily. First, find the most deeply nested parenthesized subexpression and evaluate that. In the above example its (4 + 4). Evaluating and substituting gives

    ((7 + (((((3 * 8) * 2) * 2) * (1 + 2)))) * 3)

Next, find the most deeply nested subexpression to be (3 * 8). Evaluating and substituting gives

    ((7 + ((((24 * 2) * 2) * (1 + 2)))) * 3)

Repeating (realizing that (number) can be replaced by number) gives the following sequence

    ((7 + (((48 * 2) * (1 + 2)))) * 3)
    ((7 + ((96 * (1 + 2)))) * 3)
    ((7 + ((96 * 3))) * 3)
    ((7 + (288)) * 3)
    ((7 + 288) * 3)
    ((295) * 3)
    (295 * 3)
    (885)
    885

Try this  

Now consider how a computer program might be written to find this answer. It is desireable to scan the input from left to right, reading every token just once. This means that some scanned operations must be delayed until they become the currently most deeply nested. Accordingly, operands associated with delayed operators are also delayed. The delays are easily implemented using two stacks: one for the operators, called Operator, and one for the operands, called Operands. Stacks are used because it will be easy to push items onto the stacks in such a way that the next operands to use can be found quickly at the top of the Operands stack and the next operator to be applied can be found quickly at the top of the Operator stack.

Let's express the rules for pushing and popping in English by considering the possibilities one at a time:

First, observe when reading a token from the input line there are four possibilities: namely, the token can be a number, a left paren, a right paren, or an operator. Let's start with the easiest one...

The remaining cases are more complex and involve popping data from the stacks. Consider first the case of ...

The next case is reading a number. But a number may be read before or after the operator it associates with, or there may be no operator associated with the number (for example, consider the case (23)). However, when reading a number, there is no way to tell whether or not an associated operator exists for that number if it is read before an associated operator is read. Hence, we will ignore the latter case for now and say the case () is undefined. So there are only two subcases to consider. First consider...

The second subcase is...

The last case is the hardest...

Now that we understand the above rules, the question is how to organize the code needed to solve the problem. From the rules above it seems there will be sections in the code where some action is invoked and there will be points where decisions (if or switch statements) will be made about which action section to invoke next, given the type of token that is read and the action section most recently invoked. For example, we can identify four actions:

  1. Push an operator onto the Operator stack.
  2. Push an operand onto the Operands stack.
  3. Pop the top operand from the Operands stack and the top operator from the Operator stack, apply the operator to the operand and the input token, push the result onto the Operands stack.
  4. Pop the top two operands from the Operands stack, pop the top operator from the Operator stack, apply the operator to the operands, and push the result on the Operands stack.

We can also identify four points at which decisions are made:

It helps considerably to visualize the above. Let's use circles to represent decision points and lines between circles to represent actions. Each line will have a label indicating input token type and one end will contain an arrowhead. The orientation of a line, indicated by the arrowhead, specifies the next decision point, after the action associated with the line is completed. The picture below visually describes the relationship between action and decision which is described above in terms of circles and oriented lines. The numbers in square brackets refer to the actions as numbered in the list above. On lines, the labels Number, Operator, and ) refer to read tokens that are numbers, operators, and right parentheses, respectively. In circles, the labels Number, Operator, ( and ) refer to previously read token types that are numbers, operators, left parentheses and right parentheses, respectively.

What is interesting is that the action taken depends not only on the currently read token type (shown as a line in the Figure), but also on the previously read token type (shown as a circle in the Figure). In the discussion of sample code, given below, we will refer to this figure in the comments. The lower left circle of the Figure corresponds to previous type Number, the top circle to previous type Operator, and the lower right circle to previous type RParen. The previous type LParen turns out not to be necessary in this case.

Look at this powerpoint presentation to observe the movement between decision points, the decisions made, and the actions invoked as tokens are read, left to right, from the input:

    (7 + (3 * (4 + 4) * 2 * 2 * (1 + 2)) * 3)

Alternatively, the following progression shows the contents of both stacks as tokens are read, left to right, from the above input expression.

Input Token Stacks Notes
 
Operator: +
Operands: 0
Initially, stack + and 0, previous type is Operator.
(
Operator: + +
Operands: 0 0
Stack + and 0, previous type is Operator.
7
Operator: +
Operands: 0 7
Pop the 0 and +, add to the 7. Push the resulting 7 onto Operands. Previous type is Number.
+
Operator: + +
Operands: 0 7
Push the + onto Operator. Previous type is Operator.
(
Operator: + + +
Operands: 0 7 0
Push + and 0, previous type is Operator.
3
Operator: + +
Operands: 0 7 3
Pop the + and 0, add the 3 to the 0 and push the resulting 3 onto Operands. Previous type is Number.
*
Operator: + + *
Operands: 0 7 3
Push the * on Operator. Previous type is Operator.
(
Operator: + + * +
Operands: 0 7 3 0
Stack + and 0, previous type is Operator.
4
Operator: + + *
Operands: 0 7 3 4
Pop + and 0, add 0 to 4, push the resulting 4 onto Operands. Previous type is Number.
+
Operator: + + * +
Operands: 0 7 3 4
Push the plus on Operator. Previous type is Operator.
4
Operator: + + *
Operands: 0 7 3 8
Pop the plus and 4, add the popped 4 to the read 4, push the result onto Operands. Previous type is Number.
)
Operator: + +
Operands: 0 7 24
Pop the 8 and the 3 from Operands, pop the * from Operator, multiply 3 and 8 and push the resulting 24 onto Operand. Previous type is Operator.
*
Operator: + + *
Operands: 0 7 24
Push the * onto Operator. Previous type is Operator.
2
Operator: + +
Operands: 0 7 48
Pop the 24 from Operands, pop the * from Operator, multiply the read 2 by the popped 24, push the result onto Operands. Previous type is Number.
*
Operator: + + *
Operands: 0 7 48
Push the * onto Operator. Previous type is Operator.
2
Operator: + +
Operands: 0 7 96
Pop the 48 from Operands, pop the * from Operator, multiply the read 2 by the popped 48, push the resulting 96 onto Operands. Previous type is Number.
*
Operator: + + *
Operands: 0 7 96
Push the * onto Operator. Previous type is Operator.
(
Operator: + + * +
Operands: 0 7 96 0
Stack + and 0, previous type is Operator.
1
Operator: + + *
Operands: 0 7 96 1
Pop the 0 and +, add the 0 to the read 1, push the resulting 1 onto Operands. Previous type is Number.
+
Operator: + + * +
Operands: 0 7 96 1
Push the + onto Operator. Previous type is Operator.
2
Operator: + + *
Operands: 0 7 96 3
Pop the 1 from Operands and + from Operator, add the read 2 and popped 1 and push the result onto Operands. Previous type is Number.
)
Operator: + +
Operands: 0 7 288
Pop the 3, and 96 from Operands, pop the * from Operator, multiply the popped 3 and popped 96, and push the resulting 288 onto Operands. Previous type is RParen.
)
Operator: +
Operands: 0 295
Pop the 288, and 7 from Operands, pop the + from Operator, add the popped 288 and popped 7, and push the resulting 295 onto Operands. Previous type is RParen.
*
Operator: + *
Operands: 0 295
Push the * onto Operator. Previous type is Operator.
3
Operator: +
Operands: 0 885
Pop the 295 from Operands, pop the * from Operator, multiply the popped 295 and read 3, and push the resulting 885 onto Operands. Previous type is Number.
)
Operator: empty
Operands: 885
Pop the 885 and 0 from Operands, and + from Operator, add the 885 to 0, and push the resulting 885 onto Operands. Done.

The code can be organized around the decision points and action sections as follows:

   // Action sections of code are implemented as functions whose
   // prototypes are given below.
   // All input tokens are treated as strings even if they are numbers.
   // Hence, arithmetic operations can be performed only through 
   // appropriate conversions to C++ type int or long
   void pushOperator(char *op) { /* push op onto Operator stack */ }
   void pushOperand(char *operand) { /* push operand onto Operand stack */ }
   void computeWithInputToken(char *number) {
      /* performs task of action 3) above where <number> is the
         token read from the input line */
   }
   void computeWithOperandsOnly() {
      /* performs task of action 4 above */
   }

   // Use this type to distinguish token and previous token (circle) types
   enum Type { LParen, Number, Operator, RParen };

   // This function determines the type of input token.  Token types are
   // LParen (that is, '('), Number, Operator, and RParen (that is, ')')
   Type tokenType(char *token) { ... }

   // The following outlines the decision points.
   // Comments refer to circles of the figure above. The value of variable 
   // "previous_type" signifies both the current decision point and a circle.
   // Values are Number, Operator, RParen representing, respectively, lower 
   // left, top, lower right circles in the above Figure.  The variable
   // is never given the value LParen since there is no such state.
   Type previous_type = Operator;  // The initial circle (top circle in the figure)
   ...
   Type type = tokenType (token);

   // From the lower right circle or the lower left circle, if the token is an
   // operator, perform action [1] and enter the top circle
   if ((previous_type == RParen || previous_type == Number) && type == Operator)
      pushOperator(token);           // action [1]

   // From the top circle, if the token is a Number,
   // perform action [3] and enter the lower left circle.
   else if (previous_type == Operator && type == Number) 
      computeWithInputToken(token);  // action [3]

   // From the lower right circle or the lower left circle, if the token is a 
   // right paren, perform action [4] and enter the lower right circle.
   else if ((previous_type == RParen || previous_type == Number) && type == RParen)
      computeWithOperandsOnly();     // action [4]

   // If there is a left paren, stack 0 and + and bypass the
   // LParen state, landing in the Operator state.
   else if (previous_type == Operator && type == LParen) {
      pushOperand("0");   // action [2]
      pushOperator("+");  // action [1]
      type = Operator;
   }   

   // Otherwise there is some parse error.
   else return ...;

   // The following line "moves" us from the previous circle to the new one.
   previous_type = type;
   ...

The Homework Problem:

Write a C++ program implementing the rules above and able to evaluate simple arithmetic expression as described above.

Requirements:

  1. Functional requirements:
    1. The program shall read an arithmetic expression from the command line. Tokens will be separated by blanks. For example, if the name of the executable is arith, a sample execution is
            arith "(7+(3*(4+4)*2*2*(1+2))*3)"
      
    2. Sample code for doing this, based on the Tokenizer class found in tokenizer.cc and tokenizer.h, is given here but you are encouraged to supply your own classes to do this.
         #include "tokenizer.h"
         ...
         int main (int argc, char **argv) {
            Parser *parser = new Parser();
            Tokenizer *tokenizer = new Tokenizer(argv[1]);
            char *token;
            while ((token = tokenizer->nextToken()) != NULL) parser->parse(token);
         }
      
      where class Parser is described below as is the method parse of Parser. The purpose of a tokenizer object is to separate input arithmetic expression tokens, which are all in argv[1] due to the double quotes surrounding the expression, and operate on each one in turn by passing them to method parse, one at a time. All tokens returned by a tokenizer object are of C++ type char * for convenience.

  2. Performance requirements:
    1. Time: None.
    2. Space: None.

  3. Implementation requirements:
    1. Use the code stacker.cc and stacker.h to supply the Stack class for maintaining the Operator and Operands stacks. Optionally, use the Tokenizer class implemented in tokenizer.h and tokenizer.cc to separate tokens on the command line (assumes tokens are delimited by at least one blank). If your code is in file arith.cc, say, then you can compile and link as follows:
         g++ -c tokenizer.cc
         g++ -c stacker.cc
         g++ -c arith.cc
         g++ arith.o stacker.o tokenizer.o -o arith
      
      which results in an executable named arith. Do not forget to add #include "tokenizer.h" and #include "stacker.h" to arith.cc.

      When linked to stacker.o you can create a stack using:

            Stack *stack = new Stack(disp);
      
      where disp is defined below. To push an item onto the stack use:
            stack->push("A string item");
      
      An example of popping an item from a stack is the following:
            char *item = (char *)stack->pop();
      
    2. The Stack class implemented in stacker.cc requires a display function to be passed to the constructor as in the example above. You should define and use
         void disp (void *object) {  cout << (char *)object << " ";  }
      
      for this purpose.
    3. Build class Parser to handle the evaluation. The class should contain the following elements.
      1. Code that has the effect of the following (important: you cannot put the code below into the constructor!! If you get errors, look at the initialization of the stacks first!!):
           Stack *operator_stack = new Stack(disp);
           Stack *operands_stack = new Stack(disp);
        
      2. Functions implementing each of the four action prototypes given above.
      3. Use
           enum Type { LParen, Number, Operator, RParen };
        
        to define token types.
      4. A method called tokenType which takes a char *token as input and returns an object of type Type which corresponds to the arithmetic type of token. For example, if token is the single character string * then the return value of tokenType is Operator.
      5. A variable called previous_type of type Type to hold the current decision point.
      6. A method called parse taking a char *token as input and having no return value. The token character string is the arithmetic token that is currently being read from the input expression. Depending on the current decision point and the arithmetic type of token, some action function is invoked, usually with token as an argument, and causing previous_type to be updated to reflect a new decision point for the next input token.
      7. A method prototyped as follows:
           char *compute (char *opr, char *opnd1, char *opnd2);
        
        which converts opnd1 and opnd2 to numbers, applies the operator opr to the numbers (for example, opnd1 * opnd2), converts the result back to a character string, and returns that string. Important: opnd2 should always be the operand on the right side of the operator opr and opnd1 should always be the left operand. Otherwise, this might work OK for + and * but not for - and /.

        Also, do not use something like the following:

           char *compute (char *opr, char *opnd1, char *opnd2) {
              ...
              char number[1024];
              ...
              sprintf(number, "%d", opnd1+opnd2);
              ...
              return number;
           }
        
        This is a classic error made by novice C++ coders. This cannot give predictable results because the memory taken by number is returned to the operating system as the invocation of compute is completed. Hence, there is no telling what value is returned.