LARGE DEVIATIONS THEORY
'Large Deviations Theory' concerns the remote tail of a probability distribution. For example, consider a sequence of independent tosses of a fair
coin. The possible outcomes could be head or tail. Let us denote the possible outcome of the i-th trial by
, where we encode head as -1 and tail as 1. Now let denote the mean value after trials, i.e
Then lies between -1 and 1; but from the weak or strong law of large numbers (and also
from our experience) we know that as N become very large (tends to infinity) becomes
increasingly likely to lie increasingly close to . Now let us make the preceding statement more precise : For a given value what is the probability that
is greater than ? Let us denote
this probability by . It is known that
, and that in fact this upper bound is rather sharp, in a certain technical sense.
In other words the probability is decaying exponentially rapidly as N grows large, at a rate depending on x. Roughly speaking, large deviation
theory concerns itself with the exponential decay of the probability of certain kinds of extreme or "tail" events, as the number of observations grows arbitrarily large.
In the above mentioned example of coin-tossing we tacitly assumed that each toss is an
independent trial. And for each toss, the probability of getting head or tail is always the
same. This makes the random number Independent and Identically Distributed (I.I.D.). For I.I.D. variables
whose common distribution satisfies a certain growth condition, large deviation theory states that the following limit exists:
The function is called the "rate function" or "Cramer function" or sometimes the "entropy function". Roughly speaking, the existence of this limit is what establishes the above mentioned exponential decay and allows us to conclude that for large ,
takes the form:
(The inequality given in the first paragraph, as opposed to the asymptotic formula presented here, requires an additional argument.) Also is a monotonically increasing convex function.
This is the basic result of Large Deviations Theory.
If we know the probability distribution of , an explicit
expression for the rate function can be obtained. This is given by a
Legendre transform
where the function is called the "Cumulant Generating
Function (CGF)", given by
Here denotes expectation value with respect to the probability
distribution function of and is any one of
s. If follows a Gaussian distribution,
the rate function becomes a parabola with its apex at the mean of the Gaussian
distribution.
If the condition of Independent Identical Distribution is relaxed, particularly
if the numbers are not independent but nevertheless
satisfies Markov Property, the basic large deviations result stated above can be generalized.
Large deviations theory was perhaps discovered by Swedish mathematician
Harald Cramér who applied it to model the insurance business. From the point
of view of an insurance company, the earning is at a constant rate per month
(the monthly premium) but the claims come randomly. For the company to be successful
over a certain period of time (preferably many months), the total earning should
exceed the total claim. Thus to estimate the premium you have to ask the following
question : "What should we choose as the premium such that over
months the total claim should
be less than ? " This is clearly the same question asked by
the large deviations theory. Cramer gave a solution to this question for I.I.D. Gaussian
random variables, where the rate function is expressed as a power series.
The results we have quoted above were later obtained by H. Chernoff, among other people. A very
incomplete list of mathematicians who have made important advances would
include S.R.S. Varadhan (who has won the Abel prize), D. Ruelle and O.E. Landford.
The rate function is related to the entropy is statistical mechanics. This can be heuristically seen
in the following way. In statistical mechanics the entropy of a particular macro-state is related
to the number of micro-states which corresponds to this macro-state. In our coin tossing example the
mean value could designate a particular macro-state. And the particular sequence of
heads and tails which gives rise to a particular value of constitutes a particular
micro-state. Loosely speaking a macro-state having more number of micro-states giving rise to it,
has higher entropy. And a state with higher entropy has more chance of being realised in actual
experiments. The macro-state with mean value of half has the highest number micro-states giving
rise to it and it is indeed the state with the highest entropy. And in most practical situation
we shall indeed obtain this macro-state for large number of trials. The "rate function" on the other
hand measures the probability of appearance of a particular macro-state. The smaller the rate function
the higher is the chance of a macro-state appearing. In our coin-tossing the value of the "rate function" for mean value
equal to half is zero. In this way one can see the "rate function" as the negative of the "entropy".
★ A very useful elementary introduction is available at [1]
★ Entropy, Large Deviations and Statistical Mechanics by R.S. Ellis, Springer Publication. ISBN 3-540-29059-1
★ Large Deviations for Performance Analysis by Alan Weiss and Adam Shwartz. Chapman and Hall ISBN 0-412-06311-5
★ Large Deviations Techniques and Applications by Amir Dembo and Ofer Zeitouni. Springer ISBN 0-387-98406-2
★ Random Perturbations of Dynamical Systems by M.I. Freidlin and A.D. Wentzell. Springer ISBN 0-387-98362-7
★ Chernoff's inequality
★ Varadhan's integral
★ Abel Prize 2007 - to S.R.S. Varadhan
coin. The possible outcomes could be head or tail. Let us denote the possible outcome of the i-th trial by
, where we encode head as -1 and tail as 1. Now let denote the mean value after trials, i.e
Then lies between -1 and 1; but from the weak or strong law of large numbers (and also
from our experience) we know that as N become very large (tends to infinity) becomes
increasingly likely to lie increasingly close to . Now let us make the preceding statement more precise : For a given value what is the probability that
is greater than ? Let us denote
this probability by . It is known that
, and that in fact this upper bound is rather sharp, in a certain technical sense.
In other words the probability is decaying exponentially rapidly as N grows large, at a rate depending on x. Roughly speaking, large deviation
theory concerns itself with the exponential decay of the probability of certain kinds of extreme or "tail" events, as the number of observations grows arbitrarily large.
| Contents |
| Basic statement |
| Brief History |
| Large Deviation and Entropy |
| Further reading |
| See also |
Basic statement
In the above mentioned example of coin-tossing we tacitly assumed that each toss is an
independent trial. And for each toss, the probability of getting head or tail is always the
same. This makes the random number Independent and Identically Distributed (I.I.D.). For I.I.D. variables
whose common distribution satisfies a certain growth condition, large deviation theory states that the following limit exists:
The function is called the "rate function" or "Cramer function" or sometimes the "entropy function". Roughly speaking, the existence of this limit is what establishes the above mentioned exponential decay and allows us to conclude that for large ,
takes the form:
(The inequality given in the first paragraph, as opposed to the asymptotic formula presented here, requires an additional argument.) Also is a monotonically increasing convex function.
This is the basic result of Large Deviations Theory.
If we know the probability distribution of , an explicit
expression for the rate function can be obtained. This is given by a
Legendre transform
where the function is called the "Cumulant Generating
Function (CGF)", given by
Here denotes expectation value with respect to the probability
distribution function of and is any one of
s. If follows a Gaussian distribution,
the rate function becomes a parabola with its apex at the mean of the Gaussian
distribution.
If the condition of Independent Identical Distribution is relaxed, particularly
if the numbers are not independent but nevertheless
satisfies Markov Property, the basic large deviations result stated above can be generalized.
Brief History
Large deviations theory was perhaps discovered by Swedish mathematician
Harald Cramér who applied it to model the insurance business. From the point
of view of an insurance company, the earning is at a constant rate per month
(the monthly premium) but the claims come randomly. For the company to be successful
over a certain period of time (preferably many months), the total earning should
exceed the total claim. Thus to estimate the premium you have to ask the following
question : "What should we choose as the premium such that over
months the total claim should
be less than ? " This is clearly the same question asked by
the large deviations theory. Cramer gave a solution to this question for I.I.D. Gaussian
random variables, where the rate function is expressed as a power series.
The results we have quoted above were later obtained by H. Chernoff, among other people. A very
incomplete list of mathematicians who have made important advances would
include S.R.S. Varadhan (who has won the Abel prize), D. Ruelle and O.E. Landford.
Large Deviation and Entropy
The rate function is related to the entropy is statistical mechanics. This can be heuristically seen
in the following way. In statistical mechanics the entropy of a particular macro-state is related
to the number of micro-states which corresponds to this macro-state. In our coin tossing example the
mean value could designate a particular macro-state. And the particular sequence of
heads and tails which gives rise to a particular value of constitutes a particular
micro-state. Loosely speaking a macro-state having more number of micro-states giving rise to it,
has higher entropy. And a state with higher entropy has more chance of being realised in actual
experiments. The macro-state with mean value of half has the highest number micro-states giving
rise to it and it is indeed the state with the highest entropy. And in most practical situation
we shall indeed obtain this macro-state for large number of trials. The "rate function" on the other
hand measures the probability of appearance of a particular macro-state. The smaller the rate function
the higher is the chance of a macro-state appearing. In our coin-tossing the value of the "rate function" for mean value
equal to half is zero. In this way one can see the "rate function" as the negative of the "entropy".
Further reading
★ A very useful elementary introduction is available at [1]
★ Entropy, Large Deviations and Statistical Mechanics by R.S. Ellis, Springer Publication. ISBN 3-540-29059-1
★ Large Deviations for Performance Analysis by Alan Weiss and Adam Shwartz. Chapman and Hall ISBN 0-412-06311-5
★ Large Deviations Techniques and Applications by Amir Dembo and Ofer Zeitouni. Springer ISBN 0-387-98406-2
★ Random Perturbations of Dynamical Systems by M.I. Freidlin and A.D. Wentzell. Springer ISBN 0-387-98362-7
See also
★ Chernoff's inequality
★ Varadhan's integral
★ Abel Prize 2007 - to S.R.S. Varadhan
This article provided by Wikipedia. To edit the contents of this article, click here for original source.
psst.. try this: add to faves

العربية
中国
Français
Deutsch
Ελληνική
हिन्दी
Italiano
日本語
Português
Русский
Español