Project 2: Sally FORTH

Due: Tuesday, October 9, 8:59:59pm

Addendum

[Sun Oct 7, 18:10pm] Added a third driver file, which takes the name of the Sally script file from a command line arg.

Objectives

The objective of this programming assignment is to have you become familiar with some of the templated data structures in the C++ Standard Template Library (STL).

Introduction

In this project, you will implement an interpreter for a FORTH-like programming language. In FORTH, arithmetic operations are written in Reverse Polish Notation (RPN). In RPN, the operands are written first, followed by the operation. For example,

2 + 3 would be written as 2 3 + The main "advantage" of RPN is that parentheses are unnecessary. For example, (2 + 3) * (4 + 5) in RPN is 2 3 + 4 5 + * This greatly simplifies the job of a FORTH interpreter or compiler since it does not have to do very much parsing. (A standard joke about FORTH is that it is a programming language that uses the programmer as the parser.) In fact, a FORTH compiler can be written in assembly language from scratch quite quickly. (Using STL, you will put together a FORTH interpreter in hours.) FORTH was implemented on many small computer systems in the 1970's and 1980's and was popular among astronomers. Today, vestiges of FORTH can still be found in the Postscript language and, to a lesser extent, in PDF.

FORTH is a stack-based language. The best way to think about the RPN expression

2 3 + 4 5 + * is to imagine a stack that holds the operands for each operation. Initially, 2 and 3 are pushed onto this parameter stack. When + is executed, it takes the two parameters 2 and 3 off the top of the stack and pushes their sum, 5, on the stack. Next, 4 and 5 are pushed onto the stack, and then, the second + replaces them with 9. Finally, the * operation takes 9 and 5 off the stack and replaces them with the product, 45. In FORTH, almost everything is done in reversed notation, not just the arithmetic operators. Even if statements and loops use the parameter stack.

We will call our FORTH interpreter "Sally" which is an acronym for Yet Another Little Stack-based Language. (Since ordering is scrambled in FORTH, the acronym is "Sally" rather than the unpronounceable "Yalsl".)

Assignment

For this project, we will start you off with a working interpreter that only has arithmetic operations ( +, -, *, /, % and NEG ) and printing commands ( ., SP and CR ). The source code for this calculator is here:

Your assignment is to expand the interpreter to handle stack operations, variables, comparison operators, logical operators, if statements and loops.

The calculator program has a lexical parser that groups the characters of the input into tokens. The lexical parser also ignores comments that come after // and recognizes string literals between ." and ". Other than comments and string literals, the tokens are the characters separated by white space (the space character, tabs and newlines). If a token can be converted into an integer, the lexical analyzer will indicate that the token is an integer. String literals are also indicated. Everything else is labeled "unknown". In the code above, the Sally class has a nextToken() member function that the interpreter calls to get the next Token of the input. The nextToken() function in turn calls fillBuffer() which implements the lexical parser.

The lexical parser will keep placing tokens into a buffer until the end of the input or a blank line is encountered. (A blank line has no characters, not even spaces and tabs.) When a blank line or end of file is reached, control is given back to the "main loop" of the interpreter. This arrangement allows us to use the interpreter in an interactive manner (type in a code fragment followed by a blank line) and also have the interpreter read from a file. Here's an example of a program that the calculator can handle:

// File: example1.sally // // CMSC 341 Fall 2018 Project 2 // // Sally FORTH source code // // Basic test // ."In FORTH, speak like Yoda we do" // string literals between ." and " . // . prints top of the stack CR // prints carriage return ."Three plus five, what is?" . SP // prints a space character 3 5 + // computes the sum . // prints the top of the stack again CR

Running this source through the calculator program produces this output:

In FORTH, speak like Yoda we do Three plus five, what is? 8 End of Program Parameter stack empty.

The main loop of the interpreter does the following:

Get the next token.
If the token is an integer, push the token on the parameter stack.
If the token is a string literal, push the token on the stack.
Otherwise, look for the token in the symbol table.
- If the token is found to be a keyword in the symbol table, then execute the operation for the keyword. This is accomplished by invoking a function through the function pointer stored in the symbol table entry.
- If the token is a variable, push the token on the stack, but re-label the token as a variable.
- Otherwise, the token is really an unknown item. We will push the token onto the parameter stack, but this is likely to cause an error later.

Here is a visual diagram of the interpreter:

The code for the main loop of the calculator program looks like: (The loop exits when nextToken() throws a EOProgram exception.)

Token tk ; while( 1 ) { tk = nextToken() ; if (tk.m_kind == INTEGER || tk.m_kind == STRING) { // if INTEGER or STRING just push onto stack params.push(tk) ; } else { it = symtab.find(tk.m_text) ; if ( it == symtab.end() ) { // not in symtab params.push(tk) ; } else if (it->second.m_kind == KEYWORD) { // invoke the function for this operation // it->second.m_dothis(this) ; } else if (it->second.m_kind == VARIABLE) { // variables are pushed as tokens // tk.m_kind = VARIABLE ; params.push(tk) ; } else { // default action // params.push(tk) ; } } }

Here Token is a data-only class with 3 data members. The member m_kind says whether the token is an INTEGER, STRING or UNKNOWN. (See class definition of Token below.) The member m_value is the binary int value of the token, if the token happens to be a number. Finally, m_text is the original text string of the token.

enum TokenKind { UNKNOWN, KEYWORD, INTEGER, VARIABLE, STRING } ; class Token { public: Token(TokenKind kind=UNKNOWN, int val=0, string txt="" ) ; TokenKind m_kind ; int m_value ; // if it's a known numeric value string m_text ; // original text that created this token } ;

In the interpreter, the token buffer tkBuffer is an STL list of Tokens. Similarly, the parameter stack params is an STL stack of Tokens. The symbol table, on the other hand, is an STL map from string to SymTabEntry. The class SymTabEntry is defined as:

typedef void (* operation_t)(Sally *Sptr) ; class SymTabEntry { public: SymTabEntry(TokenKind kind=UNKNOWN, int val=0, operation_t fptr=NULL) ; TokenKind m_kind ; int m_value ; // variables' values are stored here operation_t m_dothis ; // pointer to a function that does the work } ;

Note that SymTabEntry does not have an m_text member. This is because the symbol table is a map from string to SymTabEntry and the m_text member in Token is the string used as the key in the map. For example, for the CR operation, the lexical parser will put the string "CR" in the m_text member of a token. This is how the interpreter can get hold of the string "CR" to search in the symbol table for the entry for the CR operation. The m_kind member of the symbol table entry will then say that CR is a keyword, whereupon the interpreter will invoke the function stored in the member m_dothis.

How did this function pointer get stored in the symbol table entry? The constructor for the interpreter must initialize the symbol table. (See Sally.cpp.) The constructor for the Sally class includes the assignment

symtab["CR"] = SymTabEntry(KEYWORD,0,&doCR) ;

That adds an entry to the symbol table for the keyword CR with the address of the function doCR. What is doCR? It is declared as a static member function of the Sally class:

static void doCR(Sally *Sptr) ;

The definition of doCR in Sally.cpp is:

void Sally::doCR(Sally *Sptr) { cout << endl ; }

Thus, when interpreter encounters CR in the Sally FORTH program, it will invoke doCR() and print a newline to the standard output. It may seem rather convoluted, but using a map this way reduces the number of string comparisons that we have to make. (And, it gets you to practice using the STL map class.)

Why is doCR() a static member function?? There are two reasons. First, it is rather difficult to invoke a non-static member function using a function pointer. Some tricks are needed just to get a pointer to a non-static member function. In contrast, the pointer to doCR() is just &doCR. Secondly, we want a function that will be allowed to work on the parameter stack and the symbol table, which should be private members of the Sally class. Member functions, even static ones, are allowed access to the private members.

Now, static member functions do not have a host object. So, how does it get hold of the parameter stack? Through its own parameter. Here is the function that handles the + operation. Notice that every access of the parameter stack params uses Sptr.

void Sally::doPlus(Sally *Sptr) { Token p1, p2 ; if ( Sptr->params.size() < 2 ) { throw out_of_range("Need two parameters for +.") ; } p1 = Sptr->params.top() ; Sptr->params.pop() ; p2 = Sptr->params.top() ; Sptr->params.pop() ; int answer = p2.m_value + p1.m_value ; Sptr->params.push( Token(INTEGER, answer, "") ) ; }

Where does Sptr come from? When the functions doCR and doPlus are invoked in the main interpreter loop, the pointer to the host, the this pointer, is passed as a parameter:

it->second.m_dothis(this) ;

Finally, these are the operations that you have to add to the Sally FORTH interpreter:

Stack operations: DUP DROP SWAP ROT
Variable operations: SET @ !
Comparison operations: < <= == != >= >
Logic operations: AND OR NOT
If statements: IFTHEN ELSE ENDIF
Loop construct: DO UNTIL

Specifications

Here are the specifics of the assignment, including a description of each operation that you have to add to the interpreter.

Requirement: You are allowed to modify Sally.h and to change the implementation of the functions given to you, but you must use the STL list class for the token buffer, the STL stack class for the parameter stack and the STL map class for the symbol table. (For example, you can't change the implementation and use vector for any of these data structures.)

Requirement: For all operations involving the parameter stack, if there are not enough items on the stack, your interpreter should throw an out_of_range exception.

Stack operations: DUP DROP SWAP ROT

The DUP operation makes a copy of the top item of the parameter stack. For example, after the top two items of the parameter stack are 15 and 15.
The DROP operation removes the top item from the parameter stack.
The SWAP operation exchanges the top two items of the parameter stack. For example, after the top item of the parameter stack is 41 and the second item from the top is 79.
The ROT operation "rotates" the top three items of the parameter stack. For example, after the top item is 17, the next item is 19 and then 18.

Variable operations: SET @ !

Variables are stored in the symbol table. The symbol table entry for a variable also holds the value of the variable. Only integer variables are allowed. The SET operation defines the variable. For example

13 Lucky SET

will create a symbol table entry for a variable named Lucky and initialize it with the value 13. Duplicate definitions are not allowed. If the symbol table already has a value for "Lucky", an error message should be printed out and the interpreter should carry on. (This also prevents variable definitions from redefining keywords.)

The @ operation (pronounced "at" or "fetch") retrieves the value of a variable and places that value on the parameter stack. For example:

Lucky @

will place the value 13 on the stack. If no such variable exists, the interpreter should print out an error message and carry on.

The ! operation stores a value in a variable. For example:

17 Lucky !

stores the value 17 in the entry for Lucky in the symbol table. If no entry exists for a variable, the interpreter should print out an error message and carry on.

Comparison operations: < <= == != >= >

We will use the C/C++ convention that 0 represents the Boolean value false and anything else is true. Each of these operations will take two values from the top of the parameter stack, compare them and push the Boolean result of the comparison. The order of the operands is the same as the infix comparison. That is,

3 10 < should result in "true" being pushed on the stack. On the other hand 3 10 > should result in false.

Logic operations: AND OR NOT

Again, we will use the C/C++ convention that 0 represents the Boolean value false and anything else is true. The logical operations AND and OR are binary operations. The NOT operator is unary and negates the Boolean value at the top of the stack.

If statements: IFTHEN ELSE ENDIF

If statements in FORTH assume that the top of the stack has the Boolean value to test. For example:

1 2 > IFTHEN ."It's true!" . CR ELSE ."It's false!" . CR ENDIF

should print It's false!. On the other hand,

1 2 <= IFTHEN ."It's true!" . CR ELSE ."It's false!" . CR ENDIF

should print It's true!.

All if statements should have an IFTHEN part and an ELSE part. Nested if statements are allowed.

Loop construct: DO UNTIL

A DO UNTIL loop will execute the body of the loop until a Boolean value is true. For example, the following loop will print out 1 through 10 (inclusive).

0 DO 1 + DUP . CR DUP 10 >= UNTIL DROP

Test Programs

The following test programs may be used to check the compatibility of your Sally FORTH interpreter. These programs do not guarantee the correctness of your implementation. Even if your interpreter runs these programs correctly, it does not mean your interpreter is error-free. Grading will be done using programs that exercise your interpreter much more thoroughly. You must do the testing yourself --- testing is part of programming. Conversely, if your interpreter does not run these test programs correctly, then it is unlikely that it will run the grading programs correctly.

These files are also available on GL in the directory:

/afs/umbc.edu/users/p/a/park/pub/www/cs341.f18/projects/proj2files/

Implementation Notes

One of the main objectives for this project is for you to understand the STL classes by reading the documentation. Here are two sources that document the STL classes:
- Documentation for STL on SGI.com
- Documentation for STL on www.cplusplus.com
Before you use a feature of an STL class that you have not used before, write a separate little program that tests this feature. Make sure that the feature works as you expect before you attempt to incorporate it in the interpreter.
Practice incremental development. Test your code after you have implemented each group of Sally FORTH operations.
As declared, the Sally class does not have a destructor, a copy constructor or an overloaded assignment operator. Why is this OK? If you change the declarations in Sally.h, is it still OK?
If statements can be nested. After executing the IFTHEN part, the interpreter should skip over the ELSE part until it reaches ENDIF. However, there could be several IFTHEN ELSE ENDIF statements in the original ELSE part. So, you can't stop after the first ENDIF you find, because that ENDIF might belong to a different IFTHEN statement. You should count the number of IFTHEN tokens you see and the number of ENDIF tokens. You should stop only after the IFTHEN and ENDIF tokens are "balanced".
Be careful with ENDIF tokens. They don't do anything, but you should still put them in the symbol table. (Why?)
Your interpreter is allowed to misbehave if the programmer does not match the IFTHEN, ELSE and ENDIF tokens. For example, the code below has undefined behavior because the inner IFTHEN does not have a ELSE part.
When the interpreter encounters a DO UNTIL loop, it must start saving the tokens it has interpreted instead of discarding them. The interpreter can switch to the "save" mode when the DO token is encountered. The input tokens that come after DO should be saved in an STL list of Tokens. When UNTIL is executed, if the value on top of the stack is false, the saved tokens should be inserted back in the token buffer, at the front. Then, the next operation to be interpreted will be the first operation of the body of the DO UNTIL loop.

If the UNTIL operation stops the loop, the saved tokens should be discarded and the interpreter should switch back to "discard" mode.
This technique does not work for nested loops. (Why?)
Look up the splice() member function of the STL list class. It will be helpful for implementing DO UNTIL loops.
A DUMP operation is included. This is for you to implement any sort of debugging functionality that you may need. For example, DUMP might print out the contents of the parameter stack to help you figure out what's going on. (Hint: make a copy of the stack and print out the copy.) Then the DUMP operation can be inserted in the Sally FORTH source code where you would like to inspect the parameter stack.

Extra Credit

For 10 points extra credit, implement nested DO UNTIL loops. Hint: make a stack of lists of Tokens to save the bodies of the loops. Be careful that you do the right thing when an inner loop terminates. What should you do with the inner loop's body?

What to Submit

You must submit the following files to the proj2 directory.

Sally.h
Sally.cpp
Driver.cpp
test.sally

The test.sally program (written in Sally FORTH) should include tests showing the parts of your interpreter that work correctly.

If you followed the instructions in the Project Submission page to set up your directories, you can submit your code using this Unix command command.

cp Sally.h Sally.cpp Driver.cpp test.sally ~/cs341proj/proj2/

CMSC 341 Data Structures — All Sections — Fall 2018