<- previous index next ->
Greibach Normal Form of a CFG has all productions of the form A ->aV Where 'A' is a variable, 'a' is exactly one terminal and 'V' is a string of none or more variables. Every CFG can be rewritten in Greibach Normal Form. This is step 5 in the sequence of "simplification" steps for CFG's. Greibach Normal Form will be used to construct a PushDown Automata that recognizes the language generated by a Context Free Grammar. Starting with a grammar: G = ( V, T, P, S) 1a) Eliminate useless variables that can not become terminals 1b) Eliminate useless variables that can not be reached 2) Eliminate epsilon productions 3) Eliminate unit productions 4) Convert productions to Chomsky Normal Form 5) Convert productions to Greibach Normal Form using algorithm below: Re-label all variables such that the names are A1, A2, ... , Am . The notation A(j) refers to the variable named Aj . New variables may be created with names B1, B2, ... referred to as B(j) . Productions may be created and/or removed (a mechanical implementation may use coloring to track processed, added and deleted productions.) Step 1 (repeat until no productions are added, m-1 times at most) begin (m is the number of variables, can change on each repeat) for k := 1 to m do begin for j := 1 to k-1 do for each production of the form A(k) -> A(j) alpha do begin for all productions A(j) -> beta do add production A(k) -> beta alpha remove production A(k) -> A(j) alpha end; end; for each production of form A(k) -> A(k) alpha do begin add production b(k) -> alpha and B(k) -> alpha B(k) remove production A(k) -> A(k) alpha end; for each production A(k) -> beta where beta does not begin A(K) do add production A(k) -> beta B(k) end; Step 2 (sort productions and delete any duplicates ) Remove A -> A beta by creating B -> alpha B Substitute so all rules become Greibach with starting terminal. For neatness, sort productions and delete any duplicates see book: 2nd Ed. page 271, section 7.1, Exercise 7.1.11 construction Example (in file format, input given to program cykp, output file greibach.pda) // g1.g a test grammar // G = (V, T, P, S) V={S,A} T={a,b} S=S // start S terminal a b ; verbose 7 // causes every step of ever loop to be printed // runs a very long time ! S -> a A S ; S -> a ; A -> S b A ; A -> S S ; A -> b a ; enddef abaa // greibach.pda , edited, from cykp < g1.g > g1_cykp.out start S terminal a ; terminal b ; variable A ; variable C_AS ; variable C_T_bA ; variable S ; variable T_a ; A -> a C_AS C_T_bA ; A -> a C_AS S ; A -> a C_T_bA ; A -> a S ; A -> b T_a ; C_AS -> a C_AS C_T_bA S ; C_AS -> a C_AS S S ; C_AS -> a C_T_bA S ; C_AS -> a S S ; C_AS -> b T_a S ; C_T_bA -> b A ; S -> a C_AS ; S -> a ; T_a -> a ; enddef Note that an optional optimization could delete the last rule, and replace T_a by S. Push Down Automata M = ( Q, Sigma, Gamma, delta, q0, Z0, F) Q = a finite set of states including q0 Sigma = a finite alphabet of input symbols (on the input tape) Gamma = a finite set of push down stack symbols including Z0 delta = a set of nondeterministic transitions from-state tape-input top-of-stack to-state symbol(s)-to-push q0 = the initial state, an element of Q Z0 = the initial stack contents, an element of Gamma F = the set of final accepting states, if empty then accept on an empty stack at end of input +-------------------------+ | input string | +-------------------------+ ^ read, move right | | read or write (pop or push) +-------------------------------------> |Z0 | | +-----+ | | | | |--> accept | | push down +--+ FSM | | | stack | |--> reject | | +-----+ accepts at end of input string, when in a final state, and stack contains only Z0. Standard Finite State Machine we have been using. The only extra part is the push down stack. Many parsers use a push down stack. We are covering the basic abstract, theory, that can prove parsers work.BNF Backus-Naur Form
history 4000BC to 1963 The similarity of our "productions" with the computer BNF is shown in the following example. Variables can be any string, we use uppercase letters and numbers. Terminals are enclosed in apostrophes. Colon, :, replaces arrow, -> Vertical bar, |, is "or" Computer BNF Production file type .y file type .g S : 'a' S -> a | 'a' S S -> aS S -> a | aS term : var '*' var T1 : V1 '*' V1 T1 -> V1*V1 when * in Sigma B : B -> epsilon | Bab | Bb | B 'a' B | B 'b' Additional actions/output/statements can be used in computer BNF, { stuff } C : 'c' { printf("found a 'c' \n); } | 'c" C { printf("here comes more 'c' \n); } Thus, formatting, put "|" under ":". May also need or use ";" big ugly "C" language example Note needs "tokenizer" for variables and numbers, punctuation are tokens, reserved words are tokens, ... then rules, ... Review for Quiz 2
<- previous index next ->