PUMPING LEMMA FOR REGULAR LANGUAGES

In the theory of formal languages, the 'pumping lemma for regular languages' is a lemma that gives a property that all regular languages have. It is a particular example of pumping lemmas in general. Its primary use is to prove a language is not regular.
The pumping lemma was first articulated by Y. Bar-Hillel, M. Perles, E. Shamir in 1961[1]

Contents
Formal statement
Informal statement and explanation
Use of lemma
Proof of lemma
Lemma not sufficient
General version of Pumping Lemma for Regular Languages
References

Formal statement


Let ''L'' be a regular language. Then there exists an integer ''p'' ≥ 1 such that every string ''w'' in ''L'' of
length at least ''p'' (''p'' is called the "pumping length") can be written as ''w'' = ''xyz'' (i.e., ''w'' can be divided into three substrings), satisfying the following conditions:
# |''y''| > 0
# |''xy''| ≤ ''p''
# for all ''i'' ≥ 0, ''xy''''i''''z'' ∈ ''L''
''y'' is the substring that can be pumped (removed or repeated any number of times, and the resulting string is always in ''L''). (i) means the loop ''y'' to be pumped must be of length at least one; (ii) means the loop must occur within the first ''p'' characters. There is no restriction on ''x'' and ''z''.

Informal statement and explanation


The pumping lemma describes a property that all regular languages are guaranteed to have. More precisely, it describes a property of some strings belonging to regular languages. The strings in the language it concerns itself with are those that are of length at least ''p'' (this is the same ''p'' as in the formal statement above), where ''p'' is a constant -- called the pumping length -- that varies between regular languages. Say ''s'' is a string of length at least ''p'' that belongs to a regular language ''L''. The pumping lemma states that there is some way to split ''s'' into three substrings, s = xyz, such that the middle portion, ''y'' (which must not be empty), can be repeated an arbitrary number of times (including zero times, which removes it) yielding a string that is still in ''L''. This process of "pumping" the string by repeating a section of it is what has given the pumping lemma its name. In addition, the pumping lemma guarantees that the length of ''xy'' will be at most ''p'' for some such way to split up ''s''.
The pumping lemma is often used to prove languages non-regular (that's why it's called a "lemma" -- it is a useful result for proving other things), by showing that some string in the language that should have the properties outlined in the pumping lemma -- provided that the language is regular -- in fact does not have them (i.e. it cannot be pumped without getting some strings that are not in the language). See the ''Use of lemma'' section.

Use of lemma


When using this lemma to prove a language non-regular, it is generally done with a proof by contradiction. If we assume that a language ''L'' is regular, then the pumping lemma for regular languages must hold for all strings in ''L''. By showing that there exists an instance where the lemma does ''not'' hold, we also show that our assumption is incorrect: ''L'' cannot be regular. This is done by finding a string ''w'' ∈ ''L'' and integer ''i'' where the pumping lemma does not hold.
Using this lemma, one can, for instance, prove that the language
''L'' = {''anbn'' : ''n'' ≥ 0}
over the alphabet Σ = {''a'', ''b''} is not regular. We will let ''w'', ''x'', ''y'', ''z'', ''p'', and ''i'' be as stated in the pumping lemma for regular languages. Let ''w'' = ''apbp''. It is obviously in ''L'', and the pumping lemma therefore yields a decomposition of ''w'' = ''xyz'' with |''xy''| ≤ ''p'', |''y''| ≥ 1 and
''xyiz'' in L for every ''i'' ≥ 0. If we let |''xy''|=''p'' and |''z''|=''p'', then ''xy'' is the first half of ''w'', or all ''p'' of the ''a''s. Because |''y''| ≥ 1, it consists of a non-zero number of ''a''s, and ''xy2z'' has more ''a''s than ''b''s and is therefore not in ''L'' (note that any value of ''i'' ≠ 1 will give us a contradiction). We have reached a contradiction because, in this case, the pumping lemma does not hold. Therefore, the assumption that ''L'' is regular must be incorrect. ''L'' is not regular.
The proof that the language of balanced (i.e., properly nested) parentheses is not regular follows the same idea. Given ''p'', there is a string of balanced parentheses that begins with more than ''p'' left parentheses, so that ''y'' will consist entirely of left parentheses. By repeating ''y'', we can produce a string that does not contain the same number of left and right parentheses, and so they cannot be balanced.

Proof of lemma


The proof idea uses the fact that for every regular language there is a finite state machine (FSA) that accepts the language. If the regular language is finite, then the pumping lemma is trivially true since there are no strings in the language greater than the pumping length.
If the regular language is infinite, then there must be some minimal FSA that accepts the language. The number of states in this FSA are counted, and then, that count is used as the pumping length ''p''. If the string's length is longer than ''p'', then there must be at least one state that is repeated (which we will call state ''S''). The transitions that take the machine from state ''S'' back to state ''S'' match some string. This string is called ''y'' in the lemma, and since the machine will match a string without the ''y'' portion, or the string ''y'' can be repeated, the conditions of the lemma are satisfied.
This makes more sense with an example. The following image shows an FSA.
An automat accepting the language a(bc)
★ d.png

The FSA accepts the string: 'abcbcd' . Since this is longer than the number of states there are repeated states. In this example, the repeated states are states q1 and q2. Since the substring 'bcbc' of string 'abcbcd' takes the machine through transitions that start at state q1 and end at state q1, the string portion could be repeated and the FSA would still accept giving the string 'abcbcbcbcd', and the string portion could be removed and the FSA would still accept giving the string 'ad'. In terms of the pumping lemma, the string 'abcbcd' is broken into a ''x'' portion 'a', a ''y'' portion 'bc' and a ''z'' portion 'bcd'. (Note, it could be broken in different ways such as a ''x'' portion 'a' , a ''y'' portion 'bcbc' and a ''z'' portion 'd' and all the conditions hold except that |''xy''| ≤ ''p'')

Lemma not sufficient


Note that the pumping lemma does not give a ''sufficient'' condition for a language to be regular. That is, the lemma holds for some non-regular languages. If the lemma does 'not' hold for a language ''L'', then ''L'' is 'not' regular. If the lemma 'does' hold for a language ''L'', ''L'' 'might' be regular. For example, the language ''{uuRv : u,v in {0,1}+}'' (strings over the alphabet ''{0,1}'' consisting of a nonempty even palindrome followed by another nonempty string) is not regular but can still be "pumped" with ''p'' = 4: Suppose ''w=uuRv'' has length at least 4. If ''u'' has length 1, then |''v''| ≥ 2 and we can take ''y'' to be the first character in ''v''. Otherwise, take ''y'' to be the first character of ''u'' and note that ''yk'' for ''k'' ≥ 2 starts with the nonempty palindrome ''yy''. For a practical test that exactly characterizes regular languages, see the Myhill-Nerode theorem. The typical method for proving that a language is regular is to construct either a finite state machine or a regular expression for the language.

General version of Pumping Lemma for Regular Languages


If a language ''L'' is regular, then there exists a number ''p'' > 0 such that every string ''uwv'' in ''L'' with |''w''| ≥ ''p'' can be written in the form; where ''p'' is a pumping length.
:''uwv'' = ''uxyzv''
with strings ''x'', ''y'' and ''z'' such that |''xy''| ≤ ''p'', |''y''| > 0 and
:''uxyizv'' is in ''L'' for every integer ''i'' ≥ 0.
This version can be used to prove many more languages are non-regular, since it imposes stricter requirements on the language.

References


1. Y. Bar-Hillel, M. Perles, E. Shamir, "On formal properties of simple phrase structure grammars", ''Zeitschrift für Phonetik, Sprachweissenshaft und Kommunikationsforschung'' '14' (1961) pp. 143-172.


Introduction to the Theory of Computation, Michael Sipser, , , PWS Publishing, 1997, ISBN 0-534-94728-X Section 1.4: Nonregular Languages, pp.77–83.

★ John E. Hopcroft and Jeffrey D. Ullman, ''Introduction to Automata Theory, Languages and Computation'', Addison-Wesley Publishing, Reading Massachusetts, 1979. ISBN 0-201-029880-X. ''(See chapter 3.)''

This article provided by Wikipedia. To edit the contents of this article, click here for original source.

psst.. try this: add to faves