Amu_Ke_Fundye
Context-Free Grammars and Push-Down Automata
Context Free Language
- The languages which are generated by context-free grammars are called Context-Free Languages (CFLs).
- CFLs are accepted by Push down Automata.
- CFLs are also called as non-deterministic CFL.
Definition: If v is one-step derivable from u, written u ⇒ v. If v is derivable from u, written u if there is a chain of one derivations of the form u ⇒ u1 ⇒ u2 ⇒ … ⇒ v
Example: Consider the context free grammar G = ({s}, {0, 1}, P, S) where Productions are:
(i) S → 0S1
(ii) S →ε
Derivations are:
Derivations:
A derivation of a string is a sequence of rule applications. The language defined by a context-free grammar is the set of strings derivable from the start symbol S (for sentence).
A string can be derived with any of the following derivations.
Left Most Derivation:
- In each sentential form, left most non-terminal substituted first to derive a string from the starting symbol.
- A derivation is left most, if at each step in the derivation a production is applied to the left most non-terminal in the sentential form.
Right Most Derivation:
- In each sentential form, right most non-terminal substituted first to derive a string from the starting symbol.
- A derivation is left most, if at each step in the derivation a production is applied to the left most non-terminal in the sentential form.
Example:
- Every derivation corresponds to one derivation tree.
S ⇒ AB
⇒ aAAB
⇒ aaAB
⇒ aaaB
⇒ aaab
- Every derivation tree corresponds to one or more derivations.
S ⇒ AB S ⇒ AB S ⇒ AB
⇒ aAAB ⇒ Ab ⇒ AB
⇒ aaAB ⇒ aAAb ⇒ aAAb
⇒ aaaB ⇒ aAab ⇒ aaAb
⇒ aaab ⇒ aaab ⇒ aaab
Derivation Tree (Parse Tree)
- A derivation tree (or parse tree) can be defined with any non-terminal as the root, internal nodes are non-terminals and leaf nodes are terminals.
- Every derivation corresponds to one derivation tree.
- If a vertex A has k children with labels A1, A2, A3,…Ak, then A → A1 A2 A3…Ak will be a production in context-free grammar G.
Example:
S → AB, A → aAA, A → aA, B → bB, B → b
Ambiguous Grammar
A context-free grammar G is ambiguous if there is atleast one string in L(G) having two or more distinct derivation trees (or equivalently, two or more distinct left most derivations or two or more distinct right most derivations).
e.g., consider the context-free grammar G having productions E → E + E/a. The string a + a + a has two left most derivations.
Let’s see the derivations
E ⇒ E + E ⇒ E + E + E ⇒ a + E + E ⇒ a + a + E ⇒ a + a + a
E ⇒ E + E ⇒ a + E ⇒ a ⇒ E + E ⇒ a + a + E ⇒ a + a + a
and the derivation trees are
CFG Simplification
The four main steps will be followed in CFG simplification
- Eliminate ambiguity.
- Eliminate useless symbols productions.
- Eliminate ∧ productions: A → ∧
- Eliminate unit productions: A → B
Eliminate the Ambiguity
We can remove the ambiguity by removing the left recursing and left factoring.
Left Recursion
A production of the context free grammar G = (VN, E, P, S) is said to be left recursive if it is of the form
A → Aα
Where, A is a non-terminal and α∈(VN ∪ E)*
Removal of Left Recursion
Let the variable A has left recursive productions as follows
(i) A → Aα1 |Aα2|Aα3|…|Aαn|β1|β2|β3|…|βn
Where β1, β2 ….. βn do not begin with A. Then we replace A production in the form of
(ii) A → β1A1 |β2A1|…|βmA1 where
A1 → α1A1|α2A1|α3A1|…,|αnA1|∧
Left Factoring
Two or more productions of a variable A of the grammar G = (VN, E, S, P) are said to have left factoring, if the productions are of the form
A → αβ1|αβ2|…αβn where β1,…βn(VN ∪Σ)
Removal of Left Factoring
Let the variable A has left factoring productions as follows
A → αβ1|αβ2|…|αβn|y1|y2|y3|…|ym
where, β1, β2…..,βn have a common factor
α and y1, y2,….ym does not contain a as a prefix, then we replace the production into the form as follows
A → αA1|Y1Y2|…..|YM, where
A1 → β1|β2|…..|βn
Eliminate the Useless Productions/Symbols
The symbols that cannot be used in any productions due to their unavailability in the productions or inability in deriving the terminals, are known as useless symbols.
e,g., consider the grammar G with the following production rules
S → aS |A| C
A → a
B → aa
C → ab
Step 1 Generate the list of variables those produce terminal symbols
U = {A, B, S}
Because C does not produce terminal symbols so this production will be deleted. Now the modified productions are
S → aS |A
A → a
B → aa
Step 2 Identity the variables dependency graph
S → AB
In this graph, B variable is not reachable from S so it will be deleted also. Now the productions are
S → aS |A
A → a
Eliminate Null Productions
If any variable goes to ∧ then that is called as nullable variable.
e.g., A → ∧, then variable A is said to be nullable variable
Step 1 Scan the nullable variables in the given production list.
Step 2 Find all productions which does not include null productions.
e.g., consider the CFG has following productions
S → ABaC
A → BC
B → b|∧
C → D|∧
D → d
solve step find the nulable variables firstly the set is empty
N = {}
N = {B, C}
N = {A, B, C}
Due to B, C variables, A will also be a nullable variable.
Step 3 {Null productions}
S → BaC | AaC | ABa | aC | Ba | Aa | a
A → B | C
B → b
C → D
D → d
The above grammar is the every possible combination except ∧ Now put this new grammar with original grammar with null.
S → ABaC | BaC | AaC | ABa | aC | Ba | Aa | a
Eliminate the Unit-Productions
A production of the type A → B, where A, B are variables is called unit productions.
Step 1 Using productions, we create dependency graph
S ⇒ B
∵ S B & B A
∴ S A
Step 2 Now the production without unit productions
S → Aa S → bb | a | bc
B → bb + A → bb
A → a | bc B → a | bc
Now the final grammar is
S → Aa | bb | a | bc
B → bb | a | bc
A → a | bc | bb
Normal Forms of CFGs
Ambiguity is the undesirable property of a context-free grammar that we might wish to eliminate. To convert a context-free grammar into normal form, we start by trying to eliminate null productions of the form A → ∧ and the unit productions of the form B → C.
There are two normal forms
- Chomsky Normal Form (CNF)
- Greibach Normal Form (GNF)
Chomsky Normal Form (CNF)
A context-free grammar G is said to be in Chomsky Normal Form, if every production is of the form either A → a, (exactly a single terminal in the right hand side of the production) or A → BC (exactly two variables in the right hand side of the production).
e.g., the context-free grammar G with productions S → AB, A → a, B → b is in Chomsky normal form.
Chomsky Normal Form Properties
- The number of steps in derivation of any string ω of length n is 2n – 1, where the grammar should be in CNF.
- The minimum height of derivation tree of any ω of length n is [log2 n] + 1.
- The maximum height of derivation tree of any ω of length n = n.
Greibach Normal Form (GNF)
A context-free grammar is said to be in Greibach Normal Form, if every production is of the form
A → aα
where, a∈Σ, A∈VN and α∈
Deterministic Context-Free Language (DCFL)
The set of deterministic context-free languages is a proper subset of the set of context-free languages that possess an unambiguous context-free grammar.
Key Points
- The problem of whether a given context-free language is deterministic is undividable.
- Deterministic context-free languages can be recognized by a deterministic turning machine in polynomial time and O (log2 n) space.
- The language of this class have great practical importance in computer science as they can be passed much more efficiently then non deterministic context-free languages.
Pushdown Automata (PDA)
A Pushdown Automata (PDA) is essentially an NFA with a stack. A PDA is inherently non-deterministic. To handle a language like {an bn |n ≥ 0}, the machine needs to remember the number of a’s and b’s. To do this, we use a stack. So, a PDA is a finite automaton with a stack. A stack is a data structure that can contain an number of elements but for which only the top element may be accessed.
Definition of PDA
A Pushdown Automaton (PDA) is defined as 7-tuple.
M = (Q, Σ, Γ, δ, q0, Z,F)
where, Q is a finite set of sates
Σ is the input alphabet
Γ is the stack alphabet
δ is the transition function which maps
(Q × (Σ ∪ {ε}) × (Γ ∪{ε}) = (Q × (Γ ∪ {ε}))
q0 is the start state and ε denotes the empty string.
q0∈Q is start state
Z ∈Γ is the initial stack symbol
F ⊆ Q is the set of final or accepting states.
Acceptance of PDA
Tape is divided into finitely many cells. Each cell contains a symbol in an alphabet L. The stack head always scans the top symbol of the stack as shown in figure.
It performs two basic operations
Push add a new symbol at the top.
Pop read and remove the top symbol.
δ (q, a, v) = (p, u)
It means that if the tape head reads input a, the stack head read v and the finite control is in state q, then one of the possible moves is that the next state is p, v is replaced by u at stack and the tape head moves one cell to the right.
δ (q, ε, v) = (p, u)
It means that this is a ε -move (null move)
δ (q, a, ε) = (p, u)
It means that a push operation performs on stack.
δ (q, a, v) = (p, ε)
It means that a pop operation performs on stack.
PDA can accept a string in three ways:
- PDA acceptance by Empty Stack: If the stack is empty after reading the entire input string then PDA accepted the given string, otherwise rejected.
- PDA acceptance by Final State: If the stack reaches final state after reading the input string then PDA accepted the given string, otherwise rejected.
- PDA acceptance by Final State and Empty Stack: If the stack reaches final state and also stack is empty after reading the entire input string then PDA accepted the given string, otherwise rejected.
Non-deterministic PDA: Like NFA, Non-deterministic PDA (NPDA) has number of choices for its inputs. An NPDA accepts an input, if sequence of choices leads to some final state or causes PDA to empty its stack.
Deterministic PDA
Deterministic PDA (DPDA) is a pushdown automata whose action is an situation is fully determined rather than facing a choice between multiple alternative actions. DPDAs cannot handle languages or grammars with ambiguity. A deterministic context-free language is a language recognised by some deterministic pushdown automata.
Following languages are DCFLs
- L = {anbn : n ≥ 0}
- L = {ancb2n : n ≥ 0}
- L = {ωcωR : ω∈(a + b) * but not L = {ωωR : (a + b)*}
- For every regular set, there exists a CFG e such that L = L (G).
- Every regular language is a CFL.
- Let G1 and G2 be context-free grammars. Then, G1 and G2 are equivalent if and only if L (G1) = L (G2).
- The intersection of a context-free language and a regular language is a context-free language.
- The reverse of a context-free language is context-free.
- A DFA can remember only a finite amount of information whereas a PDA can remember an infinite amount of information.
- For every PDA, there is a context-free grammar and for every context-free grammar, there is a PDA.
- If L1 is a DCFL and L2 is regular then, L1 ∪ L2 is also DCFL.
- If L1 is a DCFL and L2 is a regular language, then L1 ∩ L2 is also DCFL.
- Every regular language is DCFL.
- The power of non-deterministic pushdown automata and deterministic pushdown automata is not same. But the power of non-deterministic pushdown automata and deterministic pushdown is same.
- A FSM (Finite State Machine) with one stack is more powerful than FSM without stack.
- If left recursion or left factoring is present, it is not sure that the grammar is ambiguous but there may be chance of ambiguity.
Closure Properties of CFLs
CFLs are closed under following properties:
- Union
- Concatenation
- Kleene closure
- Positive closure
- Substitution
- Homomorphism
- Inverse homomorphism
- Reversal
- Intersection with regular
- Union with regular
- Difference with regular
CFLs are not closed under following properties :
- Intersection
- Complementation
- Difference
Decision Properties of CFL’s
Decidable Problems:
- Emptyness Problem: Is a given L(G) accepts empty?
- Membership problem: Is a string w accepted by given PDA?
- Finiteness Problem: Is a given L(G) accepts finite language?
Undecidable Problems:
- Equivalence problem: Are two CFLs the same?
- Ambiguity problem: Is a given CFG ambiguous?
- Is a given CFL inherently ambiguous?
- Is the intersection of two CFLs empty?
- Totality problem: Is a given L(G) equal to ∑* ?
Regards
Amrut Jagdish Gupta
Comments
Post a Comment