Grammars that have the same languages as DFA's

  A grammar is defined as  G = (V, T, P, S)  where

  V is a set of variables. We usually use capital letters for variables.
  T is a set of terminal symbols. This is the same as Sigma for a machine.
  P is a list of productions (rules) of the form:
        variable -> concatenation of variables and terminals
  S is the starting variable. S is in V.

  A string z is accepted by a grammar G if some sequence of rules from P
  can be applied to z, each step replacing the right side of a rule by
  the variable on its left side, with a result that is exactly the
  variable S.

  We say that L(G) is the language generated (accepted) by the grammar G.

  To start, we restrict the productions P to be of the form

        A -> w      w is a concatenation of terminal symbols
        B -> wC     w is a concatenation of terminal symbols
                    A, B and C are variables in V

  and thus get a grammar that generates (accepts) a regular language.

  Suppose we are given a machine  M = (Q, Sigma, delta, q0, F)  with

        Q     = { S }
        Sigma = { 0, 1 }
        q0    = S
        F     = { S }

        delta | 0 | 1 |
           ---+---+---+
            S | S | S |
           ---+---+---+

  This looks strange because we would normally use q0 in place of S.
  The regular expression for M is  (0+1)*

  We can write the corresponding grammar for this machine as
  G = (V, T, P, S)  where

        V = { S }                    the set of states in the machine
        T = { 0, 1 }                 same as Sigma for the machine
        P = S -> epsilon | 0S | 1S
        S = S                        the q0 state from the machine

  The construction of the rules for P is directly from M's delta.
  If delta has an entry from state S with input symbol 0 go to state S,
  the rule is S -> 0S.
  If delta has an entry from state S with input symbol 1 go to state S,
  the rule is S -> 1S.

  There is a rule generated for every entry in delta.
        delta(qi,a) = qj   yields a rule   qi -> a qj

  An additional rule is generated for each final state, i.e. S -> epsilon

  (An optional encoding is to generate an extra rule for every
  transition to a final state:
        delta(qi,a) = any final state,   qi -> a
  With this option, if the start state is a final state, the production
  S -> epsilon is still required.)

  See g_reg.g file for worked example. See g_reg.out for simulation check.

  The shorthand notation S -> epsilon | 0S | 1S is the same as writing
  the three rules S -> epsilon, S -> 0S and S -> 1S. Read "|" as "or".

  Grammars can be more powerful (read: accept a larger class of
  languages) than finite state machines (DFA's, NFA's, NFA-epsilon,
  regular expressions).

  For example the language  L = { 0^i 1^i | i = 0, 1, 2, ... }  is not
  a regular language. Yet, this language has a simple grammar

        S -> epsilon | 0S1

  Note that this grammar violates the restriction needed to make the
  grammar's language a regular language, i.e. rules can only have
  terminal symbols and then one variable. The rule S -> 0S1 has a
  terminal after the variable.

  A grammar for matching parentheses might be

        G = (V, T, P, S)
        V = { S }
        T = { ( , ) }
        P = S -> epsilon | (S) | SS
        S = S

  We can check this by rewriting an input string:

        ( ( ( ) ( ) ( ( ) ) ) )     the input string
        ( ( ( ) ( ) ( S ) ) )       S -> (S) where the inside S is epsilon
        ( ( ( ) ( ) S ) )           S -> (S)
        ( ( ( ) S S ) )             S -> (S) where the inside S is epsilon
        ( ( ( ) S ) )               S -> SS
        ( ( S S ) )                 S -> (S) where the inside S is epsilon
        ( ( S ) )                   S -> SS
        ( S )                       S -> (S)
        S                           S -> (S)

  Thus the string ((()()(()))) is accepted by G because the rewriting
  produced exactly S, the start variable.

More examples of constructing grammars from language descriptions:

  Construct a CFG for non empty Palindromes over T = { 0, 1 }
  The strings in this language read the same forward and backward.

        G = ( V, T, P, S)   T = { 0, 1 },  V = { S },  S = S,  P is below:

        S -> 0 | 1 | 00 | 11 | 0S0 | 1S1

  We started the construction with S -> 0 and S -> 1,
  the shortest strings in the language.
  S -> 0S0 is a palindrome with a zero added to either end.
  S -> 1S1 is a palindrome with a one added to either end.
  But, we needed S -> 00 and S -> 11 to get the even length
  palindromes started.
  "Non empty" means there can be no rule S -> epsilon.
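One way to gain some confidence in the six palindrome rules above is to generate every string they produce up to a small length and compare against a brute-force list of palindromes. The following Python sketch is not part of the course files; the names PALINDROME_RULES, generate and brute_force_palindromes are made up for illustration. It assumes that no rule has an empty right side (true for this grammar), so a partially rewritten string never gets shorter and anything longer than the bound can be dropped.

```python
from collections import deque
from itertools import product

# Productions for non-empty palindromes over {0, 1}, as given in the notes.
PALINDROME_RULES = {"S": ["0", "1", "00", "11", "0S0", "1S1"]}


def generate(rules, start, max_len):
    """Return every terminal string of length <= max_len derivable from start.

    Assumption: no rule has an empty right side, so a form never shrinks
    and any form longer than max_len can be discarded.
    """
    found = set()
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Find the leftmost variable (variables are the keys of rules).
        pos = next((i for i, ch in enumerate(form) if ch in rules), None)
        if pos is None:
            found.add(form)  # only terminals remain: a string in L(G)
            continue
        for rhs in rules[form[pos]]:
            new_form = form[:pos] + rhs + form[pos + 1:]
            if len(new_form) <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return found


def brute_force_palindromes(max_len):
    """All non-empty palindromes over {0, 1} up to max_len, by enumeration."""
    result = set()
    for n in range(1, max_len + 1):
        for chars in product("01", repeat=n):
            s = "".join(chars)
            if s == s[::-1]:
                result.add(s)
    return result


if __name__ == "__main__":
    same = generate(PALINDROME_RULES, "S", 6) == brute_force_palindromes(6)
    print("grammar matches brute force up to length 6:", same)
```

A bounded comparison like this is only a sanity check, not a proof, but dropping S -> 00 and S -> 11 from the rules makes the comparison fail, which shows why those two rules are needed for the even length palindromes.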
  Construct the grammar for the language  L = { a^n b^n | n > 0 }

        G = ( V, T, P, S )   T = { a, b }   V = { S }   S = S   P is:

        S -> ab | aSb

  Because n>0 there can be no S -> epsilon.
  The shortest string in the language is ab.
  a's have to be on the front, b's have to be on the back.
  When either an "a" or a "b" is added, the other must be added
  in order to keep the count the same. Thus S -> aSb.

  The toughest decision is when to stop adding rules.
  In this case start "generating" strings in the language
  (each => is one application of a rule):

        S => ab                         ab       for n=1
        S => aSb => aabb                aabb     for n=2
        S => aSb => aaSbb => aaabbb     aaabbb   for n=3
        etc.

  Thus, no more rules are needed.

  "Generating" the strings in a language defined by a grammar
  is also called "derivation" of the strings in a language.

Homework 6 is assigned
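The derivation of a^n b^n from S -> ab | aSb can be written out mechanically: apply S -> aSb (n - 1) times, then finish with S -> ab. The short Python sketch below records each intermediate form. The function names derive and in_language are illustrative only and do not come from the course material.

```python
def derive(n):
    """Return the forms in the derivation of a^n b^n, for n > 0.

    Applies S -> aSb (n - 1) times, then finishes with S -> ab.
    """
    if n < 1:
        raise ValueError("n must be at least 1; the language excludes epsilon")
    current = "S"
    forms = [current]
    for _ in range(n - 1):
        current = current.replace("S", "aSb")  # rule S -> aSb
        forms.append(current)
    current = current.replace("S", "ab")       # rule S -> ab
    forms.append(current)
    return forms


def in_language(s):
    """Check membership in { a^n b^n | n > 0 } by direct counting."""
    n = len(s) // 2
    return n > 0 and s == "a" * n + "b" * n


if __name__ == "__main__":
    for n in range(1, 4):
        print(" => ".join(derive(n)))
```

For n = 3 the recorded forms are S, aSb, aaSbb, aaabbb. Every form holds exactly one S until the last step, which is why two rules are enough.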