The goal here is to take an arbitrary Context Free Grammar G = (V, T, P, S) and perform transformations on the grammar that preserve the language generated by the grammar but reach a specific format for the productions.

Overview:
  Step 1a) Eliminate useless variables that cannot become terminals
  Step 1b) Eliminate useless variables that cannot be reached
  Step 2)  Eliminate epsilon productions
  Step 3)  Eliminate unit productions
  Step 4)  Make productions Chomsky Normal Form
  Step 5)  Make productions Greibach Normal Form

  The CYK parsing uses Chomsky Normal Form as input.
  The CFG to NPDA uses Greibach Normal Form as input.

Details: one step at a time

1a) Eliminate useless variables that cannot become terminals
    See 1st Ed. book p88, Lemma 4.1, figure 4.7; 2nd Ed. section 7.1.
    Basically: Build the set NEWV from productions of the form A -> w, where A is a variable and w is a string of terminals only (possibly epsilon). Insert A into the set NEWV. Then iterate over the productions, now accepting any variable in w as a terminal if it is in NEWV, and repeat until NEWV stops growing. Thus NEWV is all the variables that can be reduced to all terminals. Now, all productions containing a variable not in NEWV can be thrown away. Thus T is unchanged, S is unchanged, V = NEWV and P may become the same or smaller. The new grammar G=(V,T,P,S) represents the same language.

1b) Eliminate useless variables that cannot be reached from S
    See 1st Ed. book p89, Lemma 4.2; 2nd Ed. book 7.1.
    Set V' = {S}, T' = phi, and mark all productions as unused. Iterate repeatedly through all productions until there is no change in V' or T'. For any production A -> w with A in V', insert the terminals from w into the set T', insert the variables from w into the set V', and mark the production as used. Now, delete all productions from P that are marked unused. V = V', T = T', S is unchanged. The new grammar G=(V,T,P,S) represents the same language.

2) Eliminate epsilon productions
    See 1st Ed. book p90, Theorem 4.3; 2nd Ed. book 7.1.
    This is complex.
    First find the set N of nullable variables, those that can derive epsilon. Then delete every production A -> epsilon and, for each production A -> alpha where alpha contains variables in N, add copies of it with every combination of those variables removed, never adding a new A -> epsilon. If S is in N, the language of the grammar contains the null string, epsilon; in principle epsilon is removed from the grammar, and the fact that the null string is accepted is recorded separately. The new grammar G=(V,T,P,S) represents the same language except that the new language does not contain epsilon.

3) Eliminate unit productions
    See 1st Ed. book p91, Theorem 4.4; 2nd Ed. 7.1.
    Iterate through the productions finding "unit productions" of the form A -> B, where B is a single variable. Delete this production from P. Make a copy of all productions B -> gamma, replacing B with A. Be careful of chains such as A -> B, B -> C, C -> D: there need to be copies of B -> gamma, C -> gamma and D -> gamma for A. Delete duplicate productions (sort and remove adjacent duplicates). The new grammar G=(V,T,P,S) represents the same language.

Briefly, some pseudo code for the above steps.

Step 1a)
  The set V' = phi
  loop through the productions, P, to find:
    A -> w where w is all terminals (possibly epsilon)
      union V' with A
  n := 0
  while n /= |V'|
    n := |V'|
    loop through productions to find:
      A -> alpha where alpha is only terminals and variables in V'
        union V' with A
  end while
  Eliminate := V - V'
  loop through productions
    delete any production containing a variable in Eliminate
  V := V'

Step 1b)
  The set V' = {S}
  The set T' = phi
  n := 0
  while n /= |V'| + |T'|
    n := |V'| + |T'|
    loop through productions to find:
      A -> alpha where A in V'
        union V' with variables in alpha
        union T' with terminals in alpha
  end while
  loop through productions
    delete any production containing a symbol outside V', T' and epsilon
  V := V'
  T := T'

Step 2)
  The set N = phi
  n := -1
  while n /= |N|
    n := |N|
    loop through productions to find:
      A -> epsilon
        union N with A
        delete production
      A -> alpha where no terminals in alpha and
                 all variables in alpha are in N
        union N with A
        (keep this production; it may also derive non-null strings)
  end while
  if S in N set null string accepted
  loop through productions
    A -> alpha where at least one variable in alpha in N
      generate rules A -> alpha' where alpha' is all combinations
      of eliminating the variables in N (never alpha' = epsilon)

Step 3)
  P' := all non unit productions ( not A -> B )
  U  := all unit productions
  repeat |U| times
    loop through productions in U to find:
      A -> A   ignore this
      A -> B
        loop through productions in P'
          copy/substitute B -> gamma to A -> gamma in P'
  end repeat
  P := P'
  eliminate duplicate productions (e.g. sort and check i+1 against i)

See the link to "Turing machines and parsers."
The CYKP, CYK parser, has the above steps coded in C++; with "verbose 3" in the grammar file, most of the simplification is printed.
Of possible interest is a test case: g_elim.g, the input data to cykp, and its output, g_elim.out.
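As an illustration of Step 1a above, here is a minimal Python sketch. It is not the CYKP C++ code; it assumes a hypothetical representation in which a production A -> alpha is the pair (A, alpha), alpha is a tuple of symbols, () stands for epsilon, and the variable set V is passed in explicitly. The names generating_variables and remove_nongenerating are made up for this example.

```python
from typing import List, Set, Tuple

# Assumed representation: a production A -> alpha is the pair (A, alpha),
# where alpha is a tuple of symbols and () stands for epsilon.
Production = Tuple[str, Tuple[str, ...]]


def generating_variables(variables: Set[str],
                         productions: List[Production]) -> Set[str]:
    """Step 1a: NEWV, the variables that can derive a string of terminals."""
    new_v: Set[str] = set()
    changed = True
    while changed:  # repeat until NEWV stops growing
        changed = False
        for head, body in productions:
            # Accept a variable in the body as a "terminal" once it is in NEWV.
            # An epsilon body () passes trivially.
            if head not in new_v and all(
                    s not in variables or s in new_v for s in body):
                new_v.add(head)
                changed = True
    return new_v


def remove_nongenerating(variables: Set[str], productions: List[Production]):
    """Throw away every production that contains a variable not in NEWV."""
    new_v = generating_variables(variables, productions)
    kept = [(head, body) for head, body in productions
            if head in new_v
            and all(s not in variables or s in new_v for s in body)]
    return new_v, kept  # the new V and P; T and S are unchanged


# Hypothetical example: C can never become all terminals.
V = {"S", "A", "C"}
P = [("S", ("a", "C")), ("S", ("A",)), ("A", ("b",)), ("C", ("c", "C"))]
V1, P1 = remove_nongenerating(V, P)
print(sorted(V1))  # ['A', 'S']
print(P1)          # [('S', ('A',)), ('A', ('b',))]
```

The loop simply reruns the scan until nothing new is added, which is the same fixed point as the "while n /= |V'|" test in the pseudocode.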
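Step 1b above can be sketched the same way. This is a hedged Python illustration, not the CYKP implementation, under the same assumed (head, body-tuple) representation; remove_unreachable is a hypothetical name. Instead of marking productions as used, it keeps exactly the productions whose left side ends up in V', which is the same set, because every symbol in such a production's body is added to V' or T'.

```python
from typing import List, Set, Tuple

# Assumed representation: (A, alpha) with alpha a tuple of symbols; () is epsilon.
Production = Tuple[str, Tuple[str, ...]]


def remove_unreachable(start: str, variables: Set[str],
                       productions: List[Production]):
    """Step 1b: keep only symbols and productions reachable from start."""
    v_reach: Set[str] = {start}  # V' = {S}
    t_reach: Set[str] = set()    # T' = phi
    n = 0
    while n != len(v_reach) + len(t_reach):
        n = len(v_reach) + len(t_reach)
        for head, body in productions:
            if head in v_reach:
                for s in body:
                    if s in variables:
                        v_reach.add(s)  # union V' with variables in alpha
                    else:
                        t_reach.add(s)  # union T' with terminals in alpha
    # A production is "used" exactly when its left side is in V'.
    kept = [(head, body) for head, body in productions if head in v_reach]
    return v_reach, t_reach, kept  # the new V, T and P; S is unchanged


# Hypothetical example: B is never reached from S.
V = {"S", "A", "B"}
P = [("S", ("a", "A")), ("A", ("b",)), ("B", ("c",))]
reach_v, reach_t, kept = remove_unreachable("S", V, P)
print(sorted(reach_v), sorted(reach_t))  # ['A', 'S'] ['a', 'b']
print(kept)  # [('S', ('a', 'A')), ('A', ('b',))]
```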
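For Step 2 above, the following Python sketch (an illustration under the same assumed representation, not the CYKP code; the function names are hypothetical) first computes the nullable set N and then expands each production into every combination of keeping or dropping its nullable variables using itertools.product. A production such as S -> A B, whose variables are all nullable, is kept and shortened rather than deleted, since it can also derive non-null strings; only bodies that would become empty are skipped.

```python
from itertools import product
from typing import List, Set, Tuple

# Assumed representation: (A, alpha) with alpha a tuple of symbols; () is epsilon.
Production = Tuple[str, Tuple[str, ...]]


def nullable_variables(productions: List[Production]) -> Set[str]:
    """The set N of variables that can derive epsilon."""
    nullable: Set[str] = set()
    n = -1
    while n != len(nullable):
        n = len(nullable)
        for head, body in productions:
            # True for A -> epsilon and for bodies made only of variables in N
            # (a terminal is never in N, so any terminal makes this False).
            if all(s in nullable for s in body):
                nullable.add(head)
    return nullable


def eliminate_epsilon(start: str, productions: List[Production]):
    """Step 2: return (new productions, True if epsilon was in the language)."""
    nullable = nullable_variables(productions)
    result: List[Production] = []
    for head, body in productions:
        # Each nullable symbol may be kept or dropped; other symbols are kept.
        options = [((s,), ()) if s in nullable else ((s,),) for s in body]
        for choice in product(*options):
            new_body = tuple(s for part in choice for s in part)
            # Never produce A -> epsilon, and do not add duplicates.
            if new_body and (head, new_body) not in result:
                result.append((head, new_body))
    return result, start in nullable


# Hypothetical example: S -> A B, A -> a A | epsilon, B -> b B | epsilon.
P = [("S", ("A", "B")), ("A", ("a", "A")), ("A", ()),
     ("B", ("b", "B")), ("B", ())]
P_no_eps, has_epsilon = eliminate_epsilon("S", P)
print(has_epsilon)  # True
for head, body in P_no_eps:
    print(head, "->", " ".join(body))
```

The returned flag plays the role of "set null string accepted" in the pseudocode.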
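Step 3 above can be sketched in Python as well. This hedged illustration (hypothetical name eliminate_unit, same assumed representation, not the CYKP code) follows the pseudocode directly: it splits P into P' and U, makes |U| passes over U so that chains like A -> B -> C -> D are followed to their end, and finally sorts the list and drops adjacent duplicates.

```python
from typing import List, Set, Tuple

# Assumed representation: (A, alpha) with alpha a tuple of symbols; () is epsilon.
Production = Tuple[str, Tuple[str, ...]]


def eliminate_unit(variables: Set[str],
                   productions: List[Production]) -> List[Production]:
    """Step 3: replace unit productions A -> B by copies of B's productions."""
    def is_unit(p: Production) -> bool:
        return len(p[1]) == 1 and p[1][0] in variables

    p_new = [p for p in productions if not is_unit(p)]  # P'
    units = [p for p in productions if is_unit(p)]      # U
    # |U| passes over U are enough to follow chains A -> B -> C -> D.
    for _ in range(len(units)):
        for a, (b,) in units:
            if a == b:
                continue  # A -> A: ignore this
            # Copy/substitute each B -> gamma in P' to A -> gamma.
            for head, gamma in list(p_new):
                if head == b and (a, gamma) not in p_new:
                    p_new.append((a, gamma))
    # Eliminate duplicates: sort, then keep an entry only if it differs
    # from the entry just before it.
    p_new.sort()
    return [p for i, p in enumerate(p_new) if i == 0 or p != p_new[i - 1]]


# Hypothetical example: the expression grammar with E -> T and T -> F.
V = {"E", "T", "F"}
P = [("E", ("E", "+", "T")), ("E", ("T",)), ("T", ("T", "*", "F")),
     ("T", ("F",)), ("F", ("(", "E", ")")), ("F", ("a",))]
for head, body in eliminate_unit(V, P):
    print(head, "->", " ".join(body))
```

Skipping copies that are already in P' keeps the list small between passes; the final sort-and-compare pass then only has to catch duplicates that were present in the input grammar.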