Project 2: Sally FORTH
Due: Tuesday, October 9, 8:59:59pm
Addendum
[Sun Oct 7, 18:10pm] Added a third driver file, which takes the name of the Sally script file from a command line arg.
Objectives
The objective of this programming assignment is to have you become familiar with some of the templated data structures in the C++ Standard Template Library (STL).Introduction
In this project, you will implement an interpreter for a FORTH-like programming language. In FORTH, arithmetic operations are written in Reverse Polish Notation (RPN). In RPN, the operands are written first, followed by the operation. For example,
FORTH is a stack-based language. The best way to think about the RPN expression
We will call our FORTH interpreter "Sally" which is an acronym for Yet Another Little Stack-based Language. (Since ordering is scrambled in FORTH, the acronym is "Sally" rather than the unpronounceable "Yalsl".)
Assignment
For this project, we will start you off with a working interpreter that only has arithmetic operations ( +, -, *, /, % and NEG ) and printing commands ( ., SP and CR ). The source code for this calculator is here:
Your assignment is to expand the interpreter to handle stack operations, variables, comparison operators, logical operators, if statements and loops.
The calculator program has a lexical parser that groups the characters of the input into tokens. The lexical parser also ignores comments that come after // and recognizes string literals between ." and ". Other than comments and string literals, the tokens are the characters separated by white space (the space character, tabs and newlines). If a token can be converted into an integer, the lexical analyzer will indicate that the token is an integer. String literals are also indicated. Everything else is labeled "unknown". In the code above, the Sally class has a nextToken() member function that the interpreter calls to get the next Token of the input. The nextToken() function in turn calls fillBuffer() which implements the lexical parser.
The lexical parser will keep placing tokens into a buffer until the end of the input or a blank line is encountered. (A blank line has no characters, not even spaces and tabs.) When a blank line or end of file is reached, control is given back to the "main loop" of the interpreter. This arrangement allows us to use the interpreter in an interactive manner (type in a code fragment followed by a blank line) and also have the interpreter read from a file. Here's an example of a program that the calculator can handle:
Running this source through the calculator program produces this output:
The main loop of the interpreter does the following:
- Get the next token.
- If the token is an integer, push the token on the parameter stack.
- If the token is a string literal, push the token on the stack.
- Otherwise, look for the token in the symbol table.
- If the token is found to be a keyword in the symbol table, then execute the operation for the keyword. This is accomplished by invoking a function through the function pointer stored in the symbol table entry.
- If the token is a variable, push the token on the stack, but re-label the token as a variable.
- Otherwise, the token is really an unknown item. We will push the token onto the parameter stack, but this is likely to cause an error later.
Here is a visual diagram of the interpreter:
Here Token is a data-only class with 3 data members. The member m_kind says whether the token is an INTEGER, STRING or UNKNOWN. (See class definition of Token below.) The member m_value is the binary int value of the token, if the token happens to be a number. Finally, m_text is the original text string of the token.
In the interpreter, the token buffer tkBuffer is an STL list of Tokens. Similarly, the parameter stack params is an STL stack of Tokens. The symbol table, on the other hand, is an STL map from string to SymTabEntry. The class SymTabEntry is defined as:
Note that SymTabEntry does not have an m_text member. This is because the symbol table is a map from string to SymTabEntry and the m_text member in Token is the string used as the key in the map. For example, for the CR operation, the lexical parser will put the string "CR" in the m_text member of a token. This is how the interpreter can get hold of the string "CR" to search in the symbol table for the entry for the CR operation. The m_kind member of the symbol table entry will then say that CR is a keyword, whereupon the interpreter will invoke the function stored in the member m_dothis.
How did this function pointer get stored in the symbol table entry? The constructor for the interpreter must initialize the symbol table. (See Sally.cpp.) The constructor for the Sally class includes the assignment
That adds an entry to the symbol table for the keyword CR with the address of the function doCR. What is doCR? It is declared as a static member function of the Sally class:
Thus, when interpreter encounters CR in the Sally FORTH program, it will invoke doCR() and print a newline to the standard output. It may seem rather convoluted, but using a map this way reduces the number of string comparisons that we have to make. (And, it gets you to practice using the STL map class.)
Why is doCR() a static member function?? There are two reasons. First, it is rather difficult to invoke a non-static member function using a function pointer. Some tricks are needed just to get a pointer to a non-static member function. In contrast, the pointer to doCR() is just &doCR. Secondly, we want a function that will be allowed to work on the parameter stack and the symbol table, which should be private members of the Sally class. Member functions, even static ones, are allowed access to the private members.
Now, static member functions do not have a host object. So, how does it get hold of the parameter stack? Through its own parameter. Here is the function that handles the + operation. Notice that every access of the parameter stack params uses Sptr.
Where does Sptr come from? When the functions doCR and doPlus are invoked in the main interpreter loop, the pointer to the host, the this pointer, is passed as a parameter:
Finally, these are the operations that you have to add to the Sally FORTH interpreter:
- Stack operations: DUP DROP SWAP ROT
- Variable operations: SET @ !
- Comparison operations: < <= == != >= >
- Logic operations: AND OR NOT
- If statements: IFTHEN ELSE ENDIF
- Loop construct: DO UNTIL
Specifications
Here are the specifics of the assignment, including a description of each operation that you have to add to the interpreter.
Requirement: You are allowed to modify Sally.h and to change the implementation of the functions given to you, but you must use the STL list class for the token buffer, the STL stack class for the parameter stack and the STL map class for the symbol table. (For example, you can't change the implementation and use vector for any of these data structures.)
Requirement: For all operations involving the parameter stack, if there are not enough items on the stack, your interpreter should throw an out_of_range exception.
Stack operations: DUP DROP SWAP ROT
- The DUP operation makes a copy of the top item of
the parameter stack. For example, after
15 DUP the top two items of the parameter stack are 15 and 15. - The DROP operation removes the top item from the parameter stack.
- The SWAP operation exchanges the top two items of the parameter
stack. For example, after
41 79 SWAP the top item of the parameter stack is 41 and the second item from the top is 79. - The ROT operation "rotates" the top three items of the parameter
stack. For example, after
17 18 19 ROT the top item is 17, the next item is 19 and then 18.
Variable operations: SET @ !
Variables are stored in the symbol table. The symbol table entry for a variable also holds the value of the variable. Only integer variables are allowed. The SET operation defines the variable. For example
will create a symbol table entry for a variable named Lucky and initialize it with the value 13. Duplicate definitions are not allowed. If the symbol table already has a value for "Lucky", an error message should be printed out and the interpreter should carry on. (This also prevents variable definitions from redefining keywords.)
The @ operation (pronounced "at" or "fetch") retrieves the value of a variable and places that value on the parameter stack. For example:
will place the value 13 on the stack. If no such variable exists, the interpreter should print out an error message and carry on.
The ! operation stores a value in a variable. For example:stores the value 17 in the entry for Lucky in the symbol table. If no entry exists for a variable, the interpreter should print out an error message and carry on.
Comparison operations: < <= == != >= >
We will use the C/C++ convention that 0 represents the Boolean value false and anything else is true. Each of these operations will take two values from the top of the parameter stack, compare them and push the Boolean result of the comparison. The order of the operands is the same as the infix comparison. That is,
Logic operations: AND OR NOT
Again, we will use the C/C++ convention that 0 represents the Boolean value false and anything else is true. The logical operations AND and OR are binary operations. The NOT operator is unary and negates the Boolean value at the top of the stack.
If statements: IFTHEN ELSE ENDIF
If statements in FORTH assume that the top of the stack has the Boolean value to test. For example:
All if statements should have an IFTHEN part and an ELSE part. Nested if statements are allowed.
Loop construct: DO UNTIL
A DO UNTIL loop will execute the body of the loop until a Boolean value is true. For example, the following loop will print out 1 through 10 (inclusive).
Test Programs
The following test programs may be used to check the compatibility of your Sally FORTH interpreter. These programs do not guarantee the correctness of your implementation. Even if your interpreter runs these programs correctly, it does not mean your interpreter is error-free. Grading will be done using programs that exercise your interpreter much more thoroughly. You must do the testing yourself --- testing is part of programming. Conversely, if your interpreter does not run these test programs correctly, then it is unlikely that it will run the grading programs correctly.
- example1.sally and example1.txt
- example2.sally and example2.txt
- example3.sally and example3.txt
- example4.sally and example4.txt
- example5.sally and example5.txt
- example6.sally and example6.txt
- example7.sally and example7.txt
- example8.sally and example8.txt
- example9.sally and example9.txt
These files are also available on GL in the directory:
Implementation Notes
- One of the main objectives for this project is for you to understand the STL classes by reading the documentation. Here are two sources that document the STL classes:
- Before you use a feature of an STL class that you have not used before, write a separate little program that tests this feature. Make sure that the feature works as you expect before you attempt to incorporate it in the interpreter.
- Practice incremental development. Test your code after you have implemented each group of Sally FORTH operations.
- As declared, the Sally class does not have a destructor, a copy constructor or an overloaded assignment operator. Why is this OK? If you change the declarations in Sally.h, is it still OK?
- If statements can be nested. After executing the IFTHEN part, the interpreter should skip over the ELSE part until it reaches ENDIF. However, there could be several IFTHEN ELSE ENDIF statements in the original ELSE part. So, you can't stop after the first ENDIF you find, because that ENDIF might belong to a different IFTHEN statement. You should count the number of IFTHEN tokens you see and the number of ENDIF tokens. You should stop only after the IFTHEN and ENDIF tokens are "balanced".
- Be careful with ENDIF tokens. They don't do anything, but you should still put them in the symbol table. (Why?)
-
Your interpreter is allowed to misbehave if the programmer does not
match the IFTHEN, ELSE and ENDIF tokens.
For example, the code below has undefined behavior because the inner
IFTHEN does not have a ELSE part.
1 < 2 IFTHEN 3 10 > IFTHEN ."something" . CR ENDIF ELSE ."another thing" . CR ENDIF
-
When the interpreter encounters a DO UNTIL loop, it must start
saving the tokens it has interpreted instead of discarding them. The
interpreter can switch to the "save" mode when the DO token is
encountered. The input tokens that come after DO
should be saved in an STL list of
Tokens. When UNTIL is executed, if the value on top of
the stack is false, the saved tokens should be inserted back in the
token buffer, at the front. Then, the next operation to be interpreted
will be the first operation of the body of the DO UNTIL loop.
If the UNTIL operation stops the loop, the saved tokens should be discarded and the interpreter should switch back to "discard" mode.
This technique does not work for nested loops. (Why?)
- Look up the splice() member function of the STL list class. It will be helpful for implementing DO UNTIL loops.
- A DUMP operation is included. This is for you to implement any sort of debugging functionality that you may need. For example, DUMP might print out the contents of the parameter stack to help you figure out what's going on. (Hint: make a copy of the stack and print out the copy.) Then the DUMP operation can be inserted in the Sally FORTH source code where you would like to inspect the parameter stack.
Extra Credit
For 10 points extra credit, implement nested DO UNTIL loops. Hint: make a stack of lists of Tokens to save the bodies of the loops. Be careful that you do the right thing when an inner loop terminates. What should you do with the inner loop's body?
What to Submit
You must submit the following files to the proj2 directory.
- Sally.h
- Sally.cpp
- Driver.cpp
- test.sally
The test.sally program (written in Sally FORTH) should include tests showing the parts of your interpreter that work correctly.
If you followed the instructions in the Project Submission page to set up your directories, you can submit your code using this Unix command command.