FORMAL LANGUAGE
:''This article is about the term 'formal language' as it is used in mathematics, logic and computer science. For information about a mode of expression that is more disciplined or precise than everyday speech, see Register (linguistics).''
In mathematics, logic, and computer science, a 'formal language' is a language that is defined by precise mathematical or machine processable formulas. Like languages in linguistics, formal languages generally have two aspects:
★ the syntax of a language is what the language looks like (more formally: the set of possible expressions that are valid utterances in the language)
★ the semantics of a language are what the utterances of the language mean (which is formalized in various ways, depending on the type of language in question)
The branch of mathematics and computer science which studies exclusively the theory of language syntax is known as 'formal language theory'. In formal language theory, a language is nothing more than its syntax; questions of semantics are not addressed in this specialty.
In mathematics, a 'formal language' consists of two parts, an ''alphabet'' and rules of ''syntax''. The ''alphabet'' is any set of symbols; the rules of ''syntax'' are rules that a string of these symbols must follow if it is to be considered part of the 'formal language'.
As a simple example, consider the ''alphabet'' {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, +, =} together with the following rules of ''syntax'':
★ Every string that does not contain + or = and does not start with 0 is in the language.
★ The string consisting of 0 is in the language.
★ A string containing = is in the language if and only if there is exactly one =, and it separates two strings in the language.
★ Any number of + symbols can appear in a string of the language as long as each of them is surrounded by symbols other than + or =.
★ No string is in the language other than those implied by the previous rules.
Under these rules, the string "23+4=555" is in the language, the string "=234=+" is not. This 'formal language' expresses whole numbers, well-formed addition statements, and well-formed addition equalities, but it expresses only what they look like (their syntax), not what they mean (semantics). For instance, nowhere in these rules is it defined that 0 means the number zero, or that + means addition.
The rest of this article is devoted to the basic notions of formal language theory.
In formal language theory, a language is defined as a possibly infinite set of finite-length sequences of elements drawn from a specified finite set, say .
The set is called the ''alphabet'' of the language;
an element of is called a ''symbol'' or ''character'';
a finite sequence of symbols/characters is called a ''string'';
an element of the language is called a ''word''.
However, the theory is not concerned with applications at all, and therefore, completely neutral to what symbols and words actually stand for.
For instance, in linguistics, formal language theory can be applied at many different levels of language description simultaneously:
★ in syntax, which describes how ''words'' in the ''lexicon'' (or ''vocabulary'') combine to form ''sentences'';
★ in morphology, which describes how ''parts of words'' combine to form ''words'';
★ in orthography (or spelling), which describes how ''characters'' (in the ''alphabet'') combine to form ''words''
★ in phonology, which describes how ''phonemes'' combine to form ''words''.
As an example of a formal language, an alphabet might be , and a string over that alphabet might be .
A typical language over that alphabet, containing that string, would be the set of all strings which contain the same number of symbols and .
The 'empty word' (that is, length-zero string) is allowed and is often denoted by , or . While the alphabet is a finite set and every string has finite length, a language may very well have infinitely many member strings (because the length of words belonging to it may be unbounded).
Some more examples:
★ the set of all words over
★ the set , where is a natural number and means repeated times
★ Finite languages, such as or
★ the set of syntactically correct programs in a given programming language; or
★ the set of inputs upon which a certain Turing machine halts; or,
★ the set of the longest strings of alphanumeric ASCII characters on this line, (i.e., the set {the, set, of, longest, strings, of, alphanumeric, ASCII, characters, on, this, line, i, e})
Formal language theory rarely concerns itself with particular languages (except as examples), but is mainly concerned with the study of various types of formalisms to describe languages. For instance, a language can be given as
★ the set of strings that can be generated by some grammar (see Chomsky hierarchy);
★ the set of strings described or matched by a particular regular expression;
★ the set of strings accepted by some automaton, such as a Turing machine or finite state automaton;
★ the set of strings for which some decision procedure (an algorithm that asks a sequence of related YES/NO questions) produces the answer YES.
Typical questions asked about such formalisms include:
★ What is their expressive power? (Can formalism X describe every language that formalism Y can describe? Can it describe other languages?)
★ What is their recognizability? (How difficult is it to decide whether a given word belongs to a language described by formalism X?)
★ What is their comparability? (How difficult is it to decide whether two languages, one described in formalism X and one in formalism Y, or in X again, are actually the same language?)
Surprisingly often, the answer to these decision problems is "it cannot be done at all", or "it is extremely expensive" (with a precise characterization of how expensive exactly). Therefore, formal language theory is a major application area of computability theory and complexity theory.
Certain operations on languages are common. This includes the standard set operations, such as union, intersection, and complementation. Another class of operation is the elementwise application of string operations.
Examples: suppose and are languages over some common alphabet.
★ The ''concatenation'' consists of all strings of the form where is a string from and is a string from .
★ The ''intersection'' of and consists of all strings which are contained in both languages
★ The ''complement'' of a language w.r.t. a given alphabet consists of all strings over the alphabet that are not in the language.
Such operations are used to investigate closure properties of classes of languages.
A class of languages is closed under a particular operation when the operation, applied to languages in the class, always produces a language in the same class again.
For instance, the context-free languages are known to be closed under union, concatenation, and intersection with regular languages, but not closed under intersection or complementation.
★ A. G. Hamilton, ''Logic for Mathematicians'', Cambridge University Press, 1978, ISBN 0 521 21838 1.
★ Michael A. Harrison: Introduction to Formal Language Theory, Addison-Wesley, 1978.
★ Grzegorz Rozenberg, Arto Salomaa, ''Handbook of Formal Languages: Volume I-III'', Springer, 1997, ISBN 3 540 61486 9.
★ Patrick Suppes, ''Introduction to Logic'', D. Van Nostrand, 1957, ISBN 0 442 08072 7.
★ Formal grammar
★ Formal method
★ Formal science
★ Formal system
★ Mathematical notation
★ Programming language
★ Formal Language Definitions website 1/24/04
★ James Power, Notes on Formal Language Theory and Parsing, 29 November 2002.
★ Alexandru Mateescu and Arto Salomaa, "Preface" in Vol.1, pp. v-viii, and "Formal Languages: An Introduction and a Synopsis", Chapter 1 in Vol. 1, pp.1-39
★ Sheng Yu, "Regular Languages", Chapter 2 in Vol. 1
★ Jean-Michel Autebert, Jean Berstel, Luc Boasson, "Context-Free Languages and Push-Down Automata", Chapter 3 in Vol. 1
★ Christian Choffrut and Juhani Karhumäki, "Combinatorics of Words", Chapter 6 in Vol. 1
★ Tero Harju and Juhani Karhumäki, "Morphisms", Chapter 7 in Vol. 1, pp. 439 - 510
★ Jean-Eric Pin, "Syntactic semigroups", Chapter 10 in Vol. 1, pp. 679-746
★ M. Crochemore and C. Hancart, "Automata for matching patterns", Chapter 9 in Vol. 2
★ Dora Giammarresi, Antonio Restivo, "Two-dimensional Languages", Chapter 4 in Vol. 3, pp. 215 - 267
In mathematics, logic, and computer science, a 'formal language' is a language that is defined by precise mathematical or machine processable formulas. Like languages in linguistics, formal languages generally have two aspects:
★ the syntax of a language is what the language looks like (more formally: the set of possible expressions that are valid utterances in the language)
★ the semantics of a language are what the utterances of the language mean (which is formalized in various ways, depending on the type of language in question)
Formal language theory
The branch of mathematics and computer science which studies exclusively the theory of language syntax is known as 'formal language theory'. In formal language theory, a language is nothing more than its syntax; questions of semantics are not addressed in this specialty.
Elementary example
In mathematics, a 'formal language' consists of two parts, an ''alphabet'' and rules of ''syntax''. The ''alphabet'' is any set of symbols; the rules of ''syntax'' are rules that a string of these symbols must follow if it is to be considered part of the 'formal language'.
As a simple example, consider the ''alphabet'' {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, +, =} together with the following rules of ''syntax'':
★ Every string that does not contain + or = and does not start with 0 is in the language.
★ The string consisting of 0 is in the language.
★ A string containing = is in the language if and only if there is exactly one =, and it separates two strings in the language.
★ Any number of + symbols can appear in a string of the language as long as each of them is surrounded by symbols other than + or =.
★ No string is in the language other than those implied by the previous rules.
Under these rules, the string "23+4=555" is in the language, the string "=234=+" is not. This 'formal language' expresses whole numbers, well-formed addition statements, and well-formed addition equalities, but it expresses only what they look like (their syntax), not what they mean (semantics). For instance, nowhere in these rules is it defined that 0 means the number zero, or that + means addition.
The rest of this article is devoted to the basic notions of formal language theory.
Definition and basic terminology
In formal language theory, a language is defined as a possibly infinite set of finite-length sequences of elements drawn from a specified finite set, say .
The set is called the ''alphabet'' of the language;
an element of is called a ''symbol'' or ''character'';
a finite sequence of symbols/characters is called a ''string'';
an element of the language is called a ''word''.
However, the theory is not concerned with applications at all, and therefore, completely neutral to what symbols and words actually stand for.
For instance, in linguistics, formal language theory can be applied at many different levels of language description simultaneously:
★ in syntax, which describes how ''words'' in the ''lexicon'' (or ''vocabulary'') combine to form ''sentences'';
★ in morphology, which describes how ''parts of words'' combine to form ''words'';
★ in orthography (or spelling), which describes how ''characters'' (in the ''alphabet'') combine to form ''words''
★ in phonology, which describes how ''phonemes'' combine to form ''words''.
Examples
As an example of a formal language, an alphabet might be , and a string over that alphabet might be .
A typical language over that alphabet, containing that string, would be the set of all strings which contain the same number of symbols and .
The 'empty word' (that is, length-zero string) is allowed and is often denoted by , or . While the alphabet is a finite set and every string has finite length, a language may very well have infinitely many member strings (because the length of words belonging to it may be unbounded).
Some more examples:
★ the set of all words over
★ the set , where is a natural number and means repeated times
★ Finite languages, such as or
★ the set of syntactically correct programs in a given programming language; or
★ the set of inputs upon which a certain Turing machine halts; or,
★ the set of the longest strings of alphanumeric ASCII characters on this line, (i.e., the set {the, set, of, longest, strings, of, alphanumeric, ASCII, characters, on, this, line, i, e})
Language specification formalisms
Formal language theory rarely concerns itself with particular languages (except as examples), but is mainly concerned with the study of various types of formalisms to describe languages. For instance, a language can be given as
★ the set of strings that can be generated by some grammar (see Chomsky hierarchy);
★ the set of strings described or matched by a particular regular expression;
★ the set of strings accepted by some automaton, such as a Turing machine or finite state automaton;
★ the set of strings for which some decision procedure (an algorithm that asks a sequence of related YES/NO questions) produces the answer YES.
Typical questions asked about such formalisms include:
★ What is their expressive power? (Can formalism X describe every language that formalism Y can describe? Can it describe other languages?)
★ What is their recognizability? (How difficult is it to decide whether a given word belongs to a language described by formalism X?)
★ What is their comparability? (How difficult is it to decide whether two languages, one described in formalism X and one in formalism Y, or in X again, are actually the same language?)
Surprisingly often, the answer to these decision problems is "it cannot be done at all", or "it is extremely expensive" (with a precise characterization of how expensive exactly). Therefore, formal language theory is a major application area of computability theory and complexity theory.
Operations on languages
Certain operations on languages are common. This includes the standard set operations, such as union, intersection, and complementation. Another class of operation is the elementwise application of string operations.
Examples: suppose and are languages over some common alphabet.
★ The ''concatenation'' consists of all strings of the form where is a string from and is a string from .
★ The ''intersection'' of and consists of all strings which are contained in both languages
★ The ''complement'' of a language w.r.t. a given alphabet consists of all strings over the alphabet that are not in the language.
Such operations are used to investigate closure properties of classes of languages.
A class of languages is closed under a particular operation when the operation, applied to languages in the class, always produces a language in the same class again.
For instance, the context-free languages are known to be closed under union, concatenation, and intersection with regular languages, but not closed under intersection or complementation.
References
★ A. G. Hamilton, ''Logic for Mathematicians'', Cambridge University Press, 1978, ISBN 0 521 21838 1.
★ Michael A. Harrison: Introduction to Formal Language Theory, Addison-Wesley, 1978.
★ Grzegorz Rozenberg, Arto Salomaa, ''Handbook of Formal Languages: Volume I-III'', Springer, 1997, ISBN 3 540 61486 9.
★ Patrick Suppes, ''Introduction to Logic'', D. Van Nostrand, 1957, ISBN 0 442 08072 7.
See also
★ Formal grammar
★ Formal method
★ Formal science
★ Formal system
★ Mathematical notation
★ Programming language
External links
★ Formal Language Definitions website 1/24/04
★ James Power, Notes on Formal Language Theory and Parsing, 29 November 2002.
Drafts of some chapters in the "Handbook of Formal Language Theory", Vol. 1-3, G. Rozenberg and A. Salomaa (eds.), Springer Verlag, (1997):
★ Alexandru Mateescu and Arto Salomaa, "Preface" in Vol.1, pp. v-viii, and "Formal Languages: An Introduction and a Synopsis", Chapter 1 in Vol. 1, pp.1-39
★ Sheng Yu, "Regular Languages", Chapter 2 in Vol. 1
★ Jean-Michel Autebert, Jean Berstel, Luc Boasson, "Context-Free Languages and Push-Down Automata", Chapter 3 in Vol. 1
★ Christian Choffrut and Juhani Karhumäki, "Combinatorics of Words", Chapter 6 in Vol. 1
★ Tero Harju and Juhani Karhumäki, "Morphisms", Chapter 7 in Vol. 1, pp. 439 - 510
★ Jean-Eric Pin, "Syntactic semigroups", Chapter 10 in Vol. 1, pp. 679-746
★ M. Crochemore and C. Hancart, "Automata for matching patterns", Chapter 9 in Vol. 2
★ Dora Giammarresi, Antonio Restivo, "Two-dimensional Languages", Chapter 4 in Vol. 3, pp. 215 - 267
This article provided by Wikipedia. To edit the contents of this article, click here for original source.
psst.. try this: add to faves

العربية
中国
Français
Deutsch
Ελληνική
हिन्दी
Italiano
日本語
Português
Русский
Español