Grammars that have the same languages as DFA's
A grammar is defined as G = (V, T, P, S) where
V is a set of variables. We usually use capital letters for variables.
T is a set of terminal symbols. This is the same as Sigma for a machine.
P is a list of productions (rules) of the form:
variable -> concatenation of variables and terminals
S is the starting variable. S is in V.
A string z is accepted by a grammar G if some sequence of rules from P,
applied in reverse (replacing a right-hand side by its variable), can
rewrite z into exactly the variable S.
We say that L(G) is the language generated (accepted) by the grammar G.
To start, we restrict the productions P to be of the form
A -> w      where w is a concatenation of terminal symbols
B -> wC     where w is a concatenation of terminal symbols
            and A, B and C are variables in V
and thus get a grammar that generates (accepts) a regular language.
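The restriction above can be checked mechanically. The following is a minimal sketch, not part of the original notes: the function name is_right_linear and the representation of a rule as a (head, body) pair with body a tuple of symbols are assumptions made for illustration. It also assumes w may be empty, so A -> epsilon and B -> C pass the check.

```python
def is_right_linear(variables, terminals, productions):
    """Return True if every rule has the form A -> w or B -> wC.

    productions is a list of (head, body) pairs; body is a tuple of symbols.
    This representation is an assumption of the sketch.
    """
    for head, body in productions:
        if head not in variables:
            return False
        if not body:
            # A -> epsilon: w is the empty concatenation of terminals.
            continue
        prefix, last = body[:-1], body[-1]
        # Every symbol before the last one must be a terminal.
        if any(sym not in terminals for sym in prefix):
            return False
        # The last symbol may be a terminal (A -> w) or a variable (B -> wC).
        if last not in terminals and last not in variables:
            return False
    return True


V = {"S"}
T = {"0", "1"}
print(is_right_linear(V, T, [("S", ()), ("S", ("0", "S")), ("S", ("1", "S"))]))
print(is_right_linear(V, T, [("S", ()), ("S", ("0", "S", "1"))]))
```

The first call describes S -> epsilon | 0S | 1S and satisfies the restriction. The second contains S -> 0S1, which has a terminal after the variable, and does not.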
Suppose we are given a machine M = (Q, Sigma, delta, q0, F) with
Q = { S }
Sigma = { 0, 1 }
q0 = S
F = { S }
delta | 0 | 1 |
------+---+---+
  S   | S | S |
------+---+---+
This looks strange because we would normally use q0 in place of S.
The regular expression for M is (0+1)*
We can write the corresponding grammar for this machine as
G = (V, T, P, S) where
V = { S } the set of states in the machine
T = { 0, 1 } same as Sigma for the machine
P =
S -> epsilon | 0S | 1S
S = S the q0 state from the machine
the construction of the rules for P is directly from M's delta
If delta has an entry from state S with input symbol 0 go to state S,
the rule is S -> 0S.
If delta has an entry from state S with input symbol 1 go to state S,
the rule is S -> 1S.
There is a rule generated for every entry in delta.
delta(qi,a) = qj yields a rule qi -> a qj
An additional rule qf -> epsilon is generated for each final state qf;
here that rule is S -> epsilon.
(An optional encoding generates an extra rule for every transition
into a final state: if delta(qi,a) is a final state, add the rule qi -> a.
Even with this option, if the start state is a final state, the production
S -> epsilon is still required.)
See g_reg.g file for worked example.
See g_reg.out for simulation check.
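The construction of P from delta can be sketched in a few lines. This is an illustration only, not the contents of g_reg.g: the function name dfa_to_grammar, the dictionary encoding of delta, and the tuple encoding of rule bodies are all assumptions of the sketch.

```python
def dfa_to_grammar(delta, start, finals):
    """Build grammar rules from a DFA transition table.

    delta maps (state, symbol) -> state. Each rule is returned as a
    (head, body) pair, where body is a tuple of symbols and the empty
    tuple stands for epsilon. These encodings are assumptions.
    """
    rules = []
    for (qi, a), qj in sorted(delta.items()):
        # delta(qi, a) = qj yields the rule qi -> a qj
        rules.append((qi, (a, qj)))
    for qf in sorted(finals):
        # each final state yields the rule qf -> epsilon
        rules.append((qf, ()))
    return start, rules


delta = {("S", "0"): "S", ("S", "1"): "S"}
start, rules = dfa_to_grammar(delta, "S", {"S"})
for head, body in rules:
    print(head, "->", " ".join(body) if body else "epsilon")
```

For the one-state machine M above this prints the three rules S -> 0 S, S -> 1 S and S -> epsilon.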
The shorthand notation S -> epsilon | 0S | 1S is the same as writing
the three rules. Read "|" as "or".
Grammars can be more powerful (that is, accept a larger class of languages)
than finite state machines (DFA's, NFA's, NFA-epsilon, regular expressions).
For example, the language L = { 0^i 1^i | i = 0, 1, 2, ... } is not a regular
language. Yet, this language has a simple grammar
S -> epsilon | 0S1
Note that this grammar violates the restriction needed to make the grammar's
language a regular language, i.e. a rule may contain only terminal symbols
optionally followed by one variable. The rule S -> 0S1 has a terminal after
the variable.
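As a rough check of the grammar S -> epsilon | 0S1, the sketch below undoes the rule S -> 0S1 one layer at a time and accepts when only the epsilon case remains. The function name in_L is hypothetical and the code is an illustration, not part of the original notes.

```python
def in_L(z):
    """Return True if z can be generated by S -> epsilon | 0S1."""
    while z:
        # Undo S -> 0S1: the string must start with 0 and end with 1.
        if len(z) < 2 or z[0] != "0" or z[-1] != "1":
            return False
        z = z[1:-1]
    # Nothing left: this is the rule S -> epsilon.
    return True


for z in ["", "01", "000111", "00111", "10"]:
    print(repr(z), in_L(z))
```

The first three strings have equal counts of 0's followed by 1's and are accepted; the last two are rejected.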
A grammar for matching parentheses might be
G = (V, T, P, S)
V = { S }
T = { ( , ) }
P = S -> epsilon | (S) | SS
S = S
We can check this by rewriting an input string
( ( ( ) ( ) ( ( ) ) ) )    input string
( ( ( ) ( ) ( S ) ) )      S -> (S) where the inside S is epsilon
( ( ( ) ( ) S ) )          S -> (S)
( ( ( ) S S ) )            S -> (S) where the inside S is epsilon
( ( ( ) S ) )              S -> SS
( ( S S ) )                S -> (S) where the inside S is epsilon
( ( S ) )                  S -> SS
( S )                      S -> (S)
S                          S -> (S)
Thus the string ((()()(()))) is accepted by G because the rewriting
produced exactly S, the start variable.
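The rewriting above can be automated. The sketch below is an illustration under stated assumptions, not the course's simulator: the function name accepts_parens is hypothetical, and "()" is rewritten to S in a single step, which combines S -> epsilon with S -> (S) as in the annotated lines above.

```python
def accepts_parens(s):
    """Rewrite s with the rules of G in reverse; accept if the result is S."""
    if s == "":
        return True  # S -> epsilon generates the empty string
    if any(ch not in "()" for ch in s):
        return False  # only terminals from T may appear in the input
    changed = True
    while changed:
        changed = False
        # Right-hand sides to replace by S: "()" is (S) with S = epsilon.
        for rhs in ("()", "(S)", "SS"):
            if rhs in s:
                s = s.replace(rhs, "S", 1)  # rewrite the leftmost occurrence
                changed = True
                break
    return s == "S"


print(accepts_parens("((()()(())))"))
print(accepts_parens("(()"))
```

The order in which the rewrites are tried may differ from the hand derivation, but any balanced string still reduces to S and an unbalanced one gets stuck.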
More examples of constructing grammars from language descriptions:
Construct a CFG for non empty Palindromes over T = { 0, 1 }
The strings in this language read the same forward and backward.
G = ( V, T, P, S ), T = { 0, 1 }, V = { S }, S = S, P is below:
S -> 0 | 1 | 00 | 11 | 0S0 | 1S1
We started the construction with S -> 0 and S -> 1
the shortest strings in the language.
S -> 0S0 is a palindrome with a zero added to either end
S -> 1S1 is a palindrome with a one added to either end
But, we needed S -> 00 and S -> 11 to get the even length
palindromes started.
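One way to gain confidence that these six rules are enough is to generate every string up to some length from the grammar and compare with a brute-force list of palindromes. The sketch below is an illustration only; both function names are hypothetical.

```python
from itertools import product


def palindromes_from_grammar(max_len):
    """Strings of length <= max_len from S -> 0 | 1 | 00 | 11 | 0S0 | 1S1."""
    result = {"0", "1", "00", "11"}  # the four rules without a variable
    frontier = set(result)
    while frontier:
        grown = set()
        for w in frontier:
            for c in "01":
                v = c + w + c  # apply S -> 0S0 or S -> 1S1 around w
                if len(v) <= max_len and v not in result:
                    grown.add(v)
        result |= grown
        frontier = grown
    return {w for w in result if len(w) <= max_len}


def all_palindromes(max_len):
    """Brute force: every non-empty palindrome over {0, 1} up to max_len."""
    found = set()
    for n in range(1, max_len + 1):
        for p in product("01", repeat=n):
            if p == p[::-1]:
                found.add("".join(p))
    return found


print(palindromes_from_grammar(6) == all_palindromes(6))
```

Agreement up to a fixed length is only a sanity check, not a proof that the grammar is correct for all lengths.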
"Non empty" means there can be no rule S -> epsilon.
Construct the grammar for the language L = { a^n b^n | n > 0 }
G = ( V, T, P, S ), T = { a, b }, V = { S }, S = S, P is:
S -> ab | aSb
Because n>0 there can be no S -> epsilon
The shortest string in the language is ab
a's have to be on the front, b's have to be on the back.
When either an "a" or a "b" is added the other must be added
in order to keep the count the same. Thus S -> aSb.
The toughest decision is when to stop adding rules.
In this case start "generating" strings in the language
S => ab                         ab for n=1
S => aSb => aabb                aabb for n=2
S => aSb => aaSbb => aaabbb     aaabbb for n=3 etc.
(each => applies one rule, either S -> aSb or S -> ab)
Thus, no more rules needed.
"Generating" the strings in a language defined by a grammar
is also called "derivation" of the strings in a language.
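A derivation for the grammar S -> ab | aSb can be written out step by step. The sketch below is an illustration, with the hypothetical function name derive_anbn; it applies S -> aSb n-1 times, finishes with S -> ab, and returns every intermediate form.

```python
def derive_anbn(n):
    """Return the derivation of a^n b^n as a list of sentential forms."""
    if n < 1:
        raise ValueError("n must be at least 1; there is no S -> epsilon")
    forms = ["S"]
    for _ in range(n - 1):
        forms.append(forms[-1].replace("S", "aSb"))  # apply S -> aSb
    forms.append(forms[-1].replace("S", "ab"))  # finish with S -> ab
    return forms


print(" => ".join(derive_anbn(3)))
```

For n = 3 this prints S => aSb => aaSbb => aaabbb.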
Homework 6 is assigned