Project 2: Sally FORTH

Due: Tuesday, October 9, 8:59:59pm


Addendum

[Sun Oct 7, 18:10pm] Added a third driver file, which takes the name of the Sally script file from a command line arg.


Objectives

The objective of this programming assignment is to have you become familiar with some of the templated data structures in the C++ Standard Template Library (STL).

Introduction

In this project, you will implement an interpreter for a FORTH-like programming language. In FORTH, arithmetic operations are written in Reverse Polish Notation (RPN). In RPN, the operands are written first, followed by the operation. For example,

2 + 3 would be written as 2 3 + The main "advantage" of RPN is that parentheses are unnecessary. For example, (2 + 3) * (4 + 5) in RPN is 2 3 + 4 5 + * This greatly simplifies the job of a FORTH interpreter or compiler since it does not have to do very much parsing. (A standard joke about FORTH is that it is a programming language that uses the programmer as the parser.) In fact, a FORTH compiler can be written in assembly language from scratch quite quickly. (Using STL, you will put together a FORTH interpreter in hours.) FORTH was implemented on many small computer systems in the 1970's and 1980's and was popular among astronomers. Today, vestiges of FORTH can still be found in the Postscript language and, to a lesser extent, in PDF.

FORTH is a stack-based language. The best way to think about the RPN expression

2 3 + 4 5 + * is to imagine a stack that holds the operands for each operation. Initially, 2 and 3 are pushed onto this parameter stack. When + is executed, it takes the two parameters 2 and 3 off the top of the stack and pushes their sum, 5, on the stack. Next, 4 and 5 are pushed onto the stack, and then, the second + replaces them with 9. Finally, the * operation takes 9 and 5 off the stack and replaces them with the product, 45. In FORTH, almost everything is done in reversed notation, not just the arithmetic operators. Even if statements and loops use the parameter stack.

We will call our FORTH interpreter "Sally" which is an acronym for Yet Another Little Stack-based Language. (Since ordering is scrambled in FORTH, the acronym is "Sally" rather than the unpronounceable "Yalsl".)


Assignment

For this project, we will start you off with a working interpreter that only has arithmetic operations ( +, -, *, /, % and NEG ) and printing commands ( ., SP and CR ). The source code for this calculator is here:

Your assignment is to expand the interpreter to handle stack operations, variables, comparison operators, logical operators, if statements and loops.

The calculator program has a lexical parser that groups the characters of the input into tokens. The lexical parser also ignores comments that come after // and recognizes string literals between ." and ". Other than comments and string literals, the tokens are the characters separated by white space (the space character, tabs and newlines). If a token can be converted into an integer, the lexical analyzer will indicate that the token is an integer. String literals are also indicated. Everything else is labeled "unknown". In the code above, the Sally class has a nextToken() member function that the interpreter calls to get the next Token of the input. The nextToken() function in turn calls fillBuffer() which implements the lexical parser.

The lexical parser will keep placing tokens into a buffer until the end of the input or a blank line is encountered. (A blank line has no characters, not even spaces and tabs.) When a blank line or end of file is reached, control is given back to the "main loop" of the interpreter. This arrangement allows us to use the interpreter in an interactive manner (type in a code fragment followed by a blank line) and also have the interpreter read from a file. Here's an example of a program that the calculator can handle:

// File: example1.sally // // CMSC 341 Fall 2018 Project 2 // // Sally FORTH source code // // Basic test // ."In FORTH, speak like Yoda we do" // string literals between ." and " . // . prints top of the stack CR // prints carriage return ."Three plus five, what is?" . SP // prints a space character 3 5 + // computes the sum . // prints the top of the stack again CR


Running this source through the calculator program produces this output:

In FORTH, speak like Yoda we do Three plus five, what is? 8 End of Program Parameter stack empty.


The main loop of the interpreter does the following:

Here is a visual diagram of the interpreter:

The code for the main loop of the calculator program looks like: (The loop exits when nextToken() throws a EOProgram exception.)

Token tk ; while( 1 ) { tk = nextToken() ; if (tk.m_kind == INTEGER || tk.m_kind == STRING) { // if INTEGER or STRING just push onto stack params.push(tk) ; } else { it = symtab.find(tk.m_text) ; if ( it == symtab.end() ) { // not in symtab params.push(tk) ; } else if (it->second.m_kind == KEYWORD) { // invoke the function for this operation // it->second.m_dothis(this) ; } else if (it->second.m_kind == VARIABLE) { // variables are pushed as tokens // tk.m_kind = VARIABLE ; params.push(tk) ; } else { // default action // params.push(tk) ; } } }

Here Token is a data-only class with 3 data members. The member m_kind says whether the token is an INTEGER, STRING or UNKNOWN. (See class definition of Token below.) The member m_value is the binary int value of the token, if the token happens to be a number. Finally, m_text is the original text string of the token.

enum TokenKind { UNKNOWN, KEYWORD, INTEGER, VARIABLE, STRING } ; class Token { public: Token(TokenKind kind=UNKNOWN, int val=0, string txt="" ) ; TokenKind m_kind ; int m_value ; // if it's a known numeric value string m_text ; // original text that created this token } ;

In the interpreter, the token buffer tkBuffer is an STL list of Tokens. Similarly, the parameter stack params is an STL stack of Tokens. The symbol table, on the other hand, is an STL map from string to SymTabEntry. The class SymTabEntry is defined as:

typedef void (* operation_t)(Sally *Sptr) ; class SymTabEntry { public: SymTabEntry(TokenKind kind=UNKNOWN, int val=0, operation_t fptr=NULL) ; TokenKind m_kind ; int m_value ; // variables' values are stored here operation_t m_dothis ; // pointer to a function that does the work } ;

Note that SymTabEntry does not have an m_text member. This is because the symbol table is a map from string to SymTabEntry and the m_text member in Token is the string used as the key in the map. For example, for the CR operation, the lexical parser will put the string "CR" in the m_text member of a token. This is how the interpreter can get hold of the string "CR" to search in the symbol table for the entry for the CR operation. The m_kind member of the symbol table entry will then say that CR is a keyword, whereupon the interpreter will invoke the function stored in the member m_dothis.

How did this function pointer get stored in the symbol table entry? The constructor for the interpreter must initialize the symbol table. (See Sally.cpp.) The constructor for the Sally class includes the assignment

symtab["CR"] = SymTabEntry(KEYWORD,0,&doCR) ;

That adds an entry to the symbol table for the keyword CR with the address of the function doCR. What is doCR? It is declared as a static member function of the Sally class:

static void doCR(Sally *Sptr) ;

The definition of doCR in Sally.cpp is:

void Sally::doCR(Sally *Sptr) { cout << endl ; }

Thus, when interpreter encounters CR in the Sally FORTH program, it will invoke doCR() and print a newline to the standard output. It may seem rather convoluted, but using a map this way reduces the number of string comparisons that we have to make. (And, it gets you to practice using the STL map class.)

Why is doCR() a static member function?? There are two reasons. First, it is rather difficult to invoke a non-static member function using a function pointer. Some tricks are needed just to get a pointer to a non-static member function. In contrast, the pointer to doCR() is just &doCR. Secondly, we want a function that will be allowed to work on the parameter stack and the symbol table, which should be private members of the Sally class. Member functions, even static ones, are allowed access to the private members.

Now, static member functions do not have a host object. So, how does it get hold of the parameter stack? Through its own parameter. Here is the function that handles the + operation. Notice that every access of the parameter stack params uses Sptr.

void Sally::doPlus(Sally *Sptr) { Token p1, p2 ; if ( Sptr->params.size() < 2 ) { throw out_of_range("Need two parameters for +.") ; } p1 = Sptr->params.top() ; Sptr->params.pop() ; p2 = Sptr->params.top() ; Sptr->params.pop() ; int answer = p2.m_value + p1.m_value ; Sptr->params.push( Token(INTEGER, answer, "") ) ; }

Where does Sptr come from? When the functions doCR and doPlus are invoked in the main interpreter loop, the pointer to the host, the this pointer, is passed as a parameter:

it->second.m_dothis(this) ;

Finally, these are the operations that you have to add to the Sally FORTH interpreter:



Specifications

Here are the specifics of the assignment, including a description of each operation that you have to add to the interpreter.

Requirement: You are allowed to modify Sally.h and to change the implementation of the functions given to you, but you must use the STL list class for the token buffer, the STL stack class for the parameter stack and the STL map class for the symbol table. (For example, you can't change the implementation and use vector for any of these data structures.)

Requirement: For all operations involving the parameter stack, if there are not enough items on the stack, your interpreter should throw an out_of_range exception.


Stack operations: DUP DROP SWAP ROT


Variable operations: SET @ !

Variables are stored in the symbol table. The symbol table entry for a variable also holds the value of the variable. Only integer variables are allowed. The SET operation defines the variable. For example

13 Lucky SET

will create a symbol table entry for a variable named Lucky and initialize it with the value 13. Duplicate definitions are not allowed. If the symbol table already has a value for "Lucky", an error message should be printed out and the interpreter should carry on. (This also prevents variable definitions from redefining keywords.)

The @ operation (pronounced "at" or "fetch") retrieves the value of a variable and places that value on the parameter stack. For example:

Lucky @

will place the value 13 on the stack. If no such variable exists, the interpreter should print out an error message and carry on.

The ! operation stores a value in a variable. For example:

17 Lucky !

stores the value 17 in the entry for Lucky in the symbol table. If no entry exists for a variable, the interpreter should print out an error message and carry on.


Comparison operations: < <= == != >= >

We will use the C/C++ convention that 0 represents the Boolean value false and anything else is true. Each of these operations will take two values from the top of the parameter stack, compare them and push the Boolean result of the comparison. The order of the operands is the same as the infix comparison. That is,

3 10 < should result in "true" being pushed on the stack. On the other hand 3 10 > should result in false.


Logic operations: AND OR NOT

Again, we will use the C/C++ convention that 0 represents the Boolean value false and anything else is true. The logical operations AND and OR are binary operations. The NOT operator is unary and negates the Boolean value at the top of the stack.


If statements: IFTHEN ELSE ENDIF

If statements in FORTH assume that the top of the stack has the Boolean value to test. For example:

1 2 > IFTHEN ."It's true!" . CR ELSE ."It's false!" . CR ENDIF

should print It's false!. On the other hand,

1 2 <= IFTHEN ."It's true!" . CR ELSE ."It's false!" . CR ENDIF

should print It's true!.

All if statements should have an IFTHEN part and an ELSE part. Nested if statements are allowed.


Loop construct: DO UNTIL

A DO UNTIL loop will execute the body of the loop until a Boolean value is true. For example, the following loop will print out 1 through 10 (inclusive).

0 DO 1 + DUP . CR DUP 10 >= UNTIL DROP


Test Programs

The following test programs may be used to check the compatibility of your Sally FORTH interpreter. These programs do not guarantee the correctness of your implementation. Even if your interpreter runs these programs correctly, it does not mean your interpreter is error-free. Grading will be done using programs that exercise your interpreter much more thoroughly. You must do the testing yourself --- testing is part of programming. Conversely, if your interpreter does not run these test programs correctly, then it is unlikely that it will run the grading programs correctly.

These files are also available on GL in the directory:

/afs/umbc.edu/users/p/a/park/pub/www/cs341.f18/projects/proj2files/


Implementation Notes



Extra Credit

For 10 points extra credit, implement nested DO UNTIL loops. Hint: make a stack of lists of Tokens to save the bodies of the loops. Be careful that you do the right thing when an inner loop terminates. What should you do with the inner loop's body?



What to Submit

You must submit the following files to the proj2 directory.

The test.sally program (written in Sally FORTH) should include tests showing the parts of your interpreter that work correctly.

If you followed the instructions in the Project Submission page to set up your directories, you can submit your code using this Unix command command.

cp Sally.h Sally.cpp Driver.cpp test.sally ~/cs341proj/proj2/