The goal here is to take an arbitrary Context Free Grammar
G = (V, T, P, S) and perform transformations on the grammar that
preserve the language generated by the grammar but reach a
specific format for the productions.
Overview:
Step 1a) Eliminate useless variables that cannot derive a string of terminals
Step 1b) Eliminate useless variables that cannot be reached from S
Step 2) Eliminate epsilon productions
Step 3) Eliminate unit productions
Step 4) Convert productions to Chomsky Normal Form
Step 5) Convert productions to Greibach Normal Form
The CYK parsing algorithm takes a grammar in Chomsky Normal Form as input.
The CFG to NPDA construction takes a grammar in Greibach Normal Form as input.
Details: one step at a time
1a) Eliminate useless variables that can not become terminals
See 1st Ed. book p88, Lemma 4.1, figure 4.7
2nd Ed. section 7.1
Basically: Build the set NEWV, starting from productions of the form
A -> w where A is a variable and w is one or more terminals.
Insert each such A into the set NEWV.
Then iterate over the productions repeatedly, until NEWV stops growing,
now accepting any variable in w as a terminal if it is in NEWV.
Thus NEWV is the set of all variables that can derive a string of
terminals.
Now, all productions containing a variable not in NEWV
can be thrown away. Thus T is unchanged, S is unchanged,
V=NEWV and P stays the same or becomes smaller.
The new grammar G=(V,T,P,S) represents the same language.
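A minimal Python sketch of step 1a follows. The representation is an
assumption made for illustration only: a production is a (head, body) pair,
body is a tuple of symbols, and a symbol is a variable exactly when it is in
the set passed as variables. The function names and the example grammar are
hypothetical, not taken from the book or from any existing program.

```python
def generating_variables(variables, productions):
    """Return NEWV: the variables that can derive a string of terminals."""
    newv = set()
    changed = True
    while changed:  # repeat until NEWV stops growing
        changed = False
        for head, body in productions:
            if head in newv:
                continue
            # Accept the body if every symbol is a terminal or already in NEWV.
            # An empty body (epsilon) is accepted too, since all(()) is True.
            if all(sym not in variables or sym in newv for sym in body):
                newv.add(head)
                changed = True
    return newv


def remove_nongenerating(variables, productions):
    """Drop every production that mentions a variable outside NEWV."""
    newv = generating_variables(variables, productions)
    kept = [(head, body) for head, body in productions
            if head in newv
            and all(sym not in variables or sym in newv for sym in body)]
    return newv, kept


# Hypothetical grammar: S -> A B | a,  A -> a,  B -> B b
V = {"S", "A", "B"}
P = [("S", ("A", "B")), ("S", ("a",)), ("A", ("a",)), ("B", ("B", "b"))]
new_v, new_p = remove_nongenerating(V, P)
```

In this example B can never be reduced to terminals, so B and both
productions that mention it are dropped. A is kept here even though it is no
longer reachable from S; removing it is the job of step 1b.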
1b) Eliminate useless variables that can not be reached from S
See 1st Ed. book p89, Lemma 4.2, 2nd Ed. book 7.1.
Set V'={S}, T'=phi, mark all productions as unused.
Iterate repeatedly through all productions until no change
in V' or T'. For any production A -> w with A in V',
insert the terminals from w into the set T', insert
the variables from w into the set V', and mark the
production as used.
Now, delete all productions from P that are marked unused.
V=V', T=T', S is unchanged.
The new grammar G=(V,T,P,S) represents the same language.
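A minimal Python sketch of step 1b, under the same illustrative assumptions:
a production is a (head, body) pair with body a tuple of symbols, and the
function name and example grammar are hypothetical.

```python
def remove_unreachable(variables, productions, start):
    """Keep only the symbols and productions reachable from the start symbol."""
    reach_v = {start}   # V'
    reach_t = set()     # T'
    used = set()        # indices of productions marked as used
    changed = True
    while changed:      # repeat until no production gets newly marked
        changed = False
        for i, (head, body) in enumerate(productions):
            if head in reach_v and i not in used:
                used.add(i)
                changed = True
                for sym in body:
                    if sym in variables:
                        reach_v.add(sym)
                    else:
                        reach_t.add(sym)
    kept = [p for i, p in enumerate(productions) if i in used]
    return reach_v, reach_t, kept


# Hypothetical grammar: S -> a C,  C -> c,  A -> b   (A is never reached)
V = {"S", "A", "C"}
P = [("S", ("a", "C")), ("C", ("c",)), ("A", ("b",))]
new_v, new_t, new_p = remove_unreachable(V, P, "S")
```

Note that the terminal b disappears from T together with A, because the
only production that used it is unreachable.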
2) Eliminate epsilon productions.
See 1st Ed. book p90, Theorem 4.3, 2nd Ed. book 7.1
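This is complex. First find the nullable variables, those that can
derive epsilon. Then, for every production whose right side contains
nullable variables, add copies with each combination of those variables
left out, and delete the epsilon productions themselves.
If the language of the grammar contains the null string, epsilon,
record that fact separately, because epsilon is removed from the grammar.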
The new grammar G=(V,T,P,S) represents the same language except
the new language does not contain epsilon.
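A minimal Python sketch of step 2, assuming productions are (head, body)
pairs with body a tuple of symbols and epsilon written as the empty tuple.
The function names and example grammar are hypothetical. The sketch returns
a flag telling whether the original language contained epsilon, since the
new grammar can no longer generate it.

```python
from itertools import product


def nullable_variables(productions):
    """Return the set of variables that can derive epsilon."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, body in productions:
            # True for an empty body, or a body made only of nullable variables.
            if head not in nullable and all(sym in nullable for sym in body):
                nullable.add(head)
                changed = True
    return nullable


def remove_epsilon(productions, start):
    """Return (new productions, whether the start symbol was nullable)."""
    nullable = nullable_variables(productions)
    new_p = set()
    for head, body in productions:
        # Each nullable symbol may be kept or dropped; others must be kept.
        choices = [((sym,), ()) if sym in nullable else ((sym,),)
                   for sym in body]
        for combo in product(*choices):
            new_body = tuple(sym for part in combo for sym in part)
            if new_body:  # never emit an epsilon production
                new_p.add((head, new_body))
    return sorted(new_p), start in nullable


# Hypothetical grammar: S -> A b B,  A -> a | epsilon,  B -> A
P = [("S", ("A", "b", "B")), ("A", ("a",)), ("A", ()), ("B", ("A",))]
new_p, accepts_epsilon = remove_epsilon(P, "S")
```

Here A and B are nullable, so S -> A b B expands into S -> A b B, S -> A b,
S -> b B and S -> b. A variable whose only production was an epsilon
production is left with no productions at all, so it may be worth running
step 1 again afterwards.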
3) Eliminate unit productions.
See 1st Ed. book p91, Theorem 4.4, 2nd Ed. 7.1
Iterate through the productions finding A -> B type "unit productions",
where the right side is a single variable.
Delete each such production from P.
Make a copy of every production B -> gamma, replacing B with A.
Be careful of A -> B, B -> C, C -> D type cases:
A needs copies of the B -> gamma, C -> gamma and D -> gamma productions.
Delete duplicate productions (sort and remove adjacent duplicates).
The new grammar G=(V,T,P,S) represents the same language.
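A minimal Python sketch of step 3, again assuming (head, body) pairs with
body a tuple of symbols. To handle A -> B, B -> C, C -> D chains it first
computes, for each variable, every variable reachable through unit
productions alone. That closure is a substitute chosen for this sketch in
place of repeated copying; the function name and example grammar are
hypothetical.

```python
def remove_unit_productions(variables, productions):
    """Replace unit productions A -> B by copies of B's non-unit productions."""
    def is_unit(body):
        return len(body) == 1 and body[0] in variables

    # unit_reach[A] = all B such that A derives B using unit productions only.
    unit_reach = {a: {a} for a in variables}
    changed = True
    while changed:
        changed = False
        for head, body in productions:
            if not is_unit(body):
                continue
            for a in variables:
                if head in unit_reach[a] and body[0] not in unit_reach[a]:
                    unit_reach[a].add(body[0])
                    changed = True

    # Copy every non-unit production B -> gamma to A -> gamma.
    # Using a set removes duplicate productions automatically.
    new_p = set()
    for a in variables:
        for head, body in productions:
            if head in unit_reach[a] and not is_unit(body):
                new_p.add((a, body))
    return sorted(new_p)


# Hypothetical grammar: A -> B,  B -> C | b,  C -> D,  D -> d
V = {"A", "B", "C", "D"}
P = [("A", ("B",)), ("B", ("C",)), ("B", ("b",)),
     ("C", ("D",)), ("D", ("d",))]
new_p = remove_unit_productions(V, P)
```

In this example A ends up with copies of B -> b and D -> d, and a
production of the form A -> A is simply never copied.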
Briefly, some pseudo code for the above steps.
Step 1a) The set V' = phi
         loop through the productions, P, to find:
            A -> w  where w is all terminals
               union V' with A
         n := 0
         while n /= |V'|
            n := |V'|
            loop through productions to find:
               A -> alpha  where alpha is only terminals and variables in V'
                  union V' with A
         end while
         Eliminate := V - V'
         loop through productions
            delete any production containing a variable in Eliminate
         V := V'
Step 1b) The set V' = {S}
         The set T' = phi
         n := 0
         while n /= |V'| + |T'|
            n := |V'| + |T'|
            loop through productions to find:
               A -> alpha  where A in V'
                  union V' with variables in alpha
                  union T' with terminals in alpha
         end while
         loop through productions
            delete any production containing a symbol that is
            not in V', not in T' and not epsilon
         V := V'
         T := T'
Step 2)  The set N = phi
         n := -1
         while n /= |N|
            n := |N|
            loop through productions to find:
               A -> epsilon
                  union N with A
                  delete production
               A -> alpha  where no terminals in alpha and
                           all variables in alpha are in N
                  union N with A
                  keep production (it is expanded below)
         end while
         if S in N set null string accepted
         loop through productions
            A -> alpha  where at least one variable in alpha in N
               generate rules A -> alpha' where alpha'
               is all combinations of eliminating the
               variables in N, except the empty alpha'
Step 3)  P' := all non unit productions ( not A -> B )
         U  := all unit productions
         loop through productions in U, |U| times, to find:
            A -> A
               ignore this
            A -> B
               loop through productions in P'
                  copy/substitute B -> gamma to A -> gamma in P'
         P := P'
         eliminate duplicate productions (e.g. sort and check i+1 against i)
See link to "Turing machines and parsers."
CYKP, a CYK parser, has the above steps coded in C++. With
"verbose 3" in the grammar file, most of the simplification is printed.
Of possible interest is the test case g_elim.g,
input data to cykp, and its output g_elim.out