HYPERGEOMETRIC DISTRIBUTION



Contents
Introduction
Application and example
Symmetries
Relationship to Fisher's exact test
Related distributions
Multivariate hypergeometric distribution
See also
External links

Introduction


{{Probability distribution |
name =Hypergeometric|
type =mass|
pdf_image =|
cdf_image =|
parameters =N \in \{1,2,3,\dots\}
m \in \{0,1,\dots,N\}
n \in \{1,2,\dots,N\}
|
support =k \in \{\max(0,\,n+m-N),\dots,\min(m,\,n)\}|
pdf =\frac{\binom{m}{k}\binom{N-m}{n-k}}{\binom{N}{n}}|
cdf =|
mean =\frac{nm}{N}|
median =|
mode =\left\lfloor \frac{(n+1)(m+1)}{N+2} \right\rfloor|
variance =\frac{n\,(m/N)(1-m/N)(N-n)}{N-1}|
skewness =\frac{(N-2m)(N-1)^{1/2}(N-2n)}{[nm(N-m)(N-n)]^{1/2}(N-2)}|
kurtosis =\frac{(N-1)N^2\left[N(N+1)-6m(N-m)-6n(N-n)\right]+6nm(N-m)(N-n)(5N-6)}{nm(N-m)(N-n)(N-2)(N-3)}|
entropy =|
mgf =\frac{\binom{N-m}{n}}{\binom{N}{n}}\,{}_2F_1(-n,-m;N-m-n+1;e^{t})|
char =\frac{\binom{N-m}{n}}{\binom{N}{n}}\,{}_2F_1(-n,-m;N-m-n+1;e^{it})
}}
In probability theory and statistics, the 'hypergeometric distribution' is a discrete probability distribution that describes the number of successes in a sequence of ''n'' draws from a finite population without replacement.
A typical example is illustrated by this contingency table:

                 drawn    not drawn        total
defective        k        m − k            m
non-defective    n − k    N + k − n − m    N − m
total            n        N − n            N

There is a shipment of ''N'' objects in which ''m'' are defective. The hypergeometric distribution describes the probability that, in a sample of ''n'' distinct objects drawn from the shipment, exactly ''k'' objects are defective.
In general, if a random variable ''X'' follows the hypergeometric distribution with parameters ''N'', ''m'' and ''n'', then the probability of getting exactly ''k'' successes is given by
: f(k;N,m,n) = \frac{\binom{m}{k}\binom{N-m}{n-k}}{\binom{N}{n}}
The probability is positive when ''k'' lies between \max\{0,\,n+m-N\} and \min\{m,\,n\} (inclusive).
The formula can be understood as follows. There are \binom{N}{n} possible samples (without replacement), all equally likely. There are \binom{m}{k} ways to obtain ''k'' defective objects and \binom{N-m}{n-k} ways to fill out the rest of the sample with non-defective objects, so exactly \binom{m}{k}\binom{N-m}{n-k} of the samples contain ''k'' defective objects.
The fact that the sum of the probabilities, as ''k'' runs through the range of possible values, is equal to 1, is essentially Vandermonde's identity from combinatorics.
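The formula and the Vandermonde argument can be checked directly. The following is a minimal sketch in Python (3.9+, standard library only); `hypergeom_pmf`, `vandermonde_holds` and `exact_moments` are hypothetical helper names chosen for illustration, not part of any library. It evaluates f(k; N, m, n) with `math.comb`, verifies in exact integer arithmetic that the numerators sum to \binom{N}{n}, and computes the mean and variance by direct summation so they can be compared with the values in the table at the top of the article.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(k: int, N: int, m: int, n: int) -> float:
    """f(k; N, m, n): probability of exactly k defective objects in a sample of
    n drawn without replacement from N objects, m of which are defective."""
    if k < max(0, n + m - N) or k > min(m, n):
        return 0.0  # outside the support
    return comb(m, k) * comb(N - m, n - k) / comb(N, n)


def vandermonde_holds(N: int, m: int, n: int) -> bool:
    """Check sum_k C(m, k) C(N - m, n - k) == C(N, n) using exact integers."""
    lo, hi = max(0, n + m - N), min(m, n)
    numerators = (comb(m, k) * comb(N - m, n - k) for k in range(lo, hi + 1))
    return sum(numerators) == comb(N, n)


def exact_moments(N: int, m: int, n: int) -> tuple[Fraction, Fraction]:
    """Mean and variance of X by summing over the support with exact fractions."""
    lo, hi = max(0, n + m - N), min(m, n)
    probs = {k: Fraction(comb(m, k) * comb(N - m, n - k), comb(N, n))
             for k in range(lo, hi + 1)}
    mean = sum(k * p for k, p in probs.items())
    variance = sum((k - mean) ** 2 * p for k, p in probs.items())
    return mean, variance


if __name__ == "__main__":
    print(hypergeom_pmf(4, 50, 5, 10))   # the urn example in the next section
    print(vandermonde_holds(50, 5, 10))  # True
    print(exact_moments(50, 5, 10))      # (Fraction(1, 1), Fraction(36, 49))
```

Exact fractions are used for the moments so that the comparison with nm/N and the variance formula is an equality test rather than a floating-point tolerance.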

Application and example


The classical application of the hypergeometric distribution is 'sampling without replacement'. Think of an urn with two types of marbles, black ones and white ones. Define drawing a white marble as a success and drawing a black marble as a failure (analogous to the binomial distribution). If the variable ''N'' describes the total number of marbles in the urn (see the contingency table above) and ''m'' describes the number of white marbles (called ''defective'' in the example above), then ''N'' − ''m'' corresponds to the number of black marbles.

Now, assume that there are 5 white and 45 black marbles in the urn. Standing next to the urn, you close your eyes and draw 10 marbles without replacement. What is the probability that you draw exactly 4 white marbles (and therefore 6 black marbles)?
This problem is summarized by the following contingency table:

                 drawn                 not drawn                               total
white marbles    4 (k)                 1 = 5 − 4 (m − k)                       5 (m)
black marbles    6 = 10 − 4 (n − k)    39 = 50 + 4 − 10 − 5 (N + k − n − m)    45 (N − m)
total            10 (n)                40 (N − n)                              50 (N)

If ''X'' denotes the number of white marbles drawn, the probability of drawing exactly ''k'' white marbles is given by the formula
: \Pr(X = k) = f(k;N,m,n) = \frac{\binom{m}{k}\binom{N-m}{n-k}}{\binom{N}{n}}.
Hence, in this example, with ''k'' = 4,
: \Pr(X = 4) = f(4;50,5,10) = \frac{\binom{5}{4}\binom{45}{6}}{\binom{50}{10}} = 0.003964583\dots
So the probability of drawing exactly 4 white marbles is quite low (approximately 0.004), and the event is very unlikely: if you repeated the random experiment (drawing 10 marbles from the urn of 50 marbles without replacement) 1000 times, you would expect to obtain this result only about 4 times.
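The "1000 repetitions" reading of this probability can be illustrated with a small Monte Carlo simulation. This is a sketch in Python using the standard `random` module; the function name `simulate_white_draws`, the seed and the number of trials are arbitrary choices for illustration, not part of the original example. With 100,000 repetitions the estimated frequency should land close to 0.004, i.e. roughly 4 occurrences per 1000 experiments, although the exact figure depends on the random draws.

```python
import random


def simulate_white_draws(trials: int, seed: int = 0) -> float:
    """Estimate Pr(exactly 4 white) when 10 marbles are drawn without replacement
    from an urn with 5 white and 45 black marbles (hypothetical helper)."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    urn = ["white"] * 5 + ["black"] * 45
    hits = 0
    for _ in range(trials):
        sample = rng.sample(urn, 10)  # 10 distinct positions: no replacement
        if sample.count("white") == 4:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    estimate = simulate_white_draws(100_000)
    print(f"estimated probability: {estimate:.4f}")
    print(f"expected count per 1000 repetitions: {1000 * estimate:.1f}")
```

`random.sample` picks distinct positions from the list, which is exactly drawing without replacement; replacing it with repeated `rng.choice` calls would instead simulate the binomial (with-replacement) case.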
But what about the probability of drawing all 5 white marbles? Intuitively, this should be even less likely than drawing exactly 4 white marbles. Let us calculate the probability of this extreme event.
The contingency table is as follows:

                 drawn                 not drawn                               total
white marbles    5 (k)                 0 = 5 − 5 (m − k)                       5 (m)
black marbles    5 = 10 − 5 (n − k)    40 = 50 + 5 − 10 − 5 (N + k − n − m)    45 (N − m)
total            10 (n)                40 (N − n)                              50 (N)

The probability is calculated in the same way (note that the denominator stays the same):
: \Pr(X = 5) = f(5;50,5,10) = \frac{\binom{5}{5}\binom{45}{5}}{\binom{50}{10}} = 0.0001189375\dots
As expected, the probability of drawing all 5 white marbles is much lower than that of drawing exactly 4 white marbles.
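Because both probabilities share the denominator \binom{50}{10}, their ratio reduces to a ratio of numerators. A short Python sketch with exact rational arithmetic (standard library only; the variable names are illustrative) makes this concrete: the ratio comes out to exactly 3/100, so drawing all 5 white marbles is about 33 times less likely than drawing exactly 4.

```python
from fractions import Fraction
from math import comb

N, m, n = 50, 5, 10       # 50 marbles in total, 5 white, 10 drawn
denominator = comb(N, n)  # C(50, 10) does not depend on k

p4 = Fraction(comb(m, 4) * comb(N - m, n - 4), denominator)  # exactly 4 white
p5 = Fraction(comb(m, 5) * comb(N - m, n - 5), denominator)  # all 5 white

print(float(p4))  # approximately 0.003964583
print(float(p5))  # approximately 0.0001189375
print(p5 / p4)    # 3/100
```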

Symmetries



f(k;N,m,n) = f(n-k;N,N-m,n)
This symmetry can be understood intuitively by repainting all the black marbles white and vice versa: the black and white marbles simply swap roles.

f(k;N,m,n) = f(m-k;N,m,N-n)
This symmetry can be intuitively understood as swapping the roles of ''taken'' and ''not taken'' marbles.

f(k;N,m,n) = f(k;N,n,m)
This symmetry can be understood intuitively if, instead of drawing marbles, you label the marbles that you would have drawn. Both expressions give the probability that exactly ''k'' marbles are white and labeled "drawn".
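All three identities can be confirmed by brute force for small urns. The sketch below (Python, standard library only; `f` and `symmetries_hold` are illustrative names) evaluates f(k; N, m, n) exactly with `fractions.Fraction`, returns zero outside the support, and compares both sides of each identity for every N up to a chosen bound. It is a numerical check, not a proof.

```python
from fractions import Fraction
from math import comb


def f(k: int, N: int, m: int, n: int) -> Fraction:
    """Exact hypergeometric pmf f(k; N, m, n); zero outside the support."""
    if k < max(0, n + m - N) or k > min(m, n):
        return Fraction(0)
    return Fraction(comb(m, k) * comb(N - m, n - k), comb(N, n))


def symmetries_hold(max_N: int) -> bool:
    """Check the three identities for all N <= max_N, 0 <= m, n <= N and every k."""
    for N in range(1, max_N + 1):
        for m in range(N + 1):
            for n in range(N + 1):
                for k in range(-1, N + 2):  # also test points outside the support
                    p = f(k, N, m, n)
                    if p != f(n - k, N, N - m, n):  # swap the two colours
                        return False
                    if p != f(m - k, N, m, N - n):  # swap drawn / not drawn
                        return False
                    if p != f(k, N, n, m):          # swap colour and "drawn" label
                        return False
    return True


if __name__ == "__main__":
    print(symmetries_hold(12))  # True
```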

Relationship to Fisher's exact test


A test based on the hypergeometric distribution (the hypergeometric test) is identical to the corresponding one-tailed version of Fisher's exact test. Conversely, the p-value of a two-sided Fisher's exact test can be calculated as the sum of two appropriate hypergeometric tests.
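As an illustration, consider the urn example with 4 white marbles among the 10 drawn. A one-tailed hypergeometric test asks how likely it is to draw at least that many white marbles by chance, i.e. it sums f(j; N, m, n) over all j ≥ k. The Python sketch below (standard library only; `hypergeom_upper_tail` is a hypothetical helper name) computes this upper-tail p-value, which for the example is f(4) + f(5).

```python
from math import comb


def hypergeom_upper_tail(k: int, N: int, m: int, n: int) -> float:
    """One-tailed p-value Pr(X >= k) for X with pmf f(k; N, m, n)."""
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so values of j below the support contribute nothing to the sum.
    numerator = sum(comb(m, j) * comb(N - m, n - j)
                    for j in range(max(k, 0), min(m, n) + 1))
    return numerator / comb(N, n)


if __name__ == "__main__":
    # 4 white marbles among 10 drawn from 50 (5 white): f(4) + f(5)
    print(hypergeom_upper_tail(4, 50, 5, 10))  # approximately 0.0040835
```

If SciPy is installed, `scipy.stats.fisher_exact([[4, 1], [6, 39]], alternative="greater")` should report the same one-sided p-value for this table; that cross-check is a suggestion, not something taken from the article.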

Related distributions


Let X \sim \mathrm{Hypergeometric}(N, m, n) and p = m/N.

★ If n=1 then X has a Bernoulli distribution with parameter p.

★ If N and m are large compared to n, and p is not close to 0 or 1, then P[X \le x] \approx P[Y \le x], where Y has a binomial distribution with parameters n and p.

★ If n is large, N and m are large compared to n, and p is not close to 0 or 1, then
P[X \le x] \approx \Phi\left(\frac{x - np}{\sqrt{np(1-p)}}\right),
where \Phi is the standard normal cumulative distribution function.
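The quality of these approximations can be examined numerically. The following Python sketch (standard library only; all function names are hypothetical) computes the exact hypergeometric CDF, the binomial CDF with p = m/N, and the normal approximation written with `math.erf`, using \Phi(z) = (1 + \operatorname{erf}(z/\sqrt{2}))/2. The parameter values in the usage lines are arbitrary and chosen so that N and m are much larger than n.

```python
from math import comb, erf, sqrt


def hypergeom_cdf(x: int, N: int, m: int, n: int) -> float:
    """Exact P[X <= x] for X with pmf f(k; N, m, n)."""
    lo, hi = max(0, n + m - N), min(x, m, n)
    return sum(comb(m, k) * comb(N - m, n - k) for k in range(lo, hi + 1)) / comb(N, n)


def binomial_cdf(x: int, n: int, p: float) -> float:
    """P[Y <= x] for Y ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(0, min(x, n) + 1))


def normal_approx_cdf(x: float, n: int, p: float) -> float:
    """Phi((x - n p) / sqrt(n p (1 - p))) using Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    z = (x - n * p) / sqrt(n * p * (1 - p))
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


if __name__ == "__main__":
    N, m, n = 10_000, 3_000, 10  # N and m much larger than n; p = 0.3
    p = m / N
    for x in range(0, n + 1, 2):
        print(x, round(hypergeom_cdf(x, N, m, n), 5), round(binomial_cdf(x, n, p), 5))
```

Python's arbitrary-precision integers keep the exact CDF free of overflow even for large N; the normal approximation here omits any continuity correction, matching the formula in the text.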

Multivariate hypergeometric distribution


{{Probability distribution |
name =Multivariate Hypergeometric Distribution|
type =mass|
pdf_image =|
cdf_image =|
parameters =c \in \mathbb{N}
(m_1,\ldots,m_c) \in \mathbb{N}^c
N = \sum_{i=1}^c m_i
n \in [0,N]|
support =\left\{ \mathbf{k} \in \mathbb{Z}_{0+}^c \,:\, \sum_{i=1}^{c} k_i = n \right\}|
pdf =\frac{\prod_{i=1}^{c} \binom{m_i}{k_i}}{\binom{N}{n}}|
cdf =|
mean =E(X_i) = \frac{n m_i}{N}|
median =|
mode =|
variance =\operatorname{var}(X_i) = \frac{m_i}{N}\left(1-\frac{m_i}{N}\right) n \frac{N-n}{N-1}
\operatorname{cov}(X_i,X_j) = -\frac{n m_i m_j}{N^2} \frac{N-n}{N-1}|
skewness =|
kurtosis =|
entropy =|
mgf =|
char =
}}
The model of an urn with black and white marbles can be extended to the case where there are more than two colors of marbles. If there are m_i marbles of color ''i'' in the urn and you take ''n'' marbles at random without replacement, then the numbers of marbles of each color in the sample, (k_1, k_2, \ldots, k_c), follow the multivariate hypergeometric distribution.
The properties of this distribution are given in the adjacent table, where ''c'' is the number of different colors and N = \sum_{i=1}^{c} m_i is the total number of marbles.
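A direct implementation of the multivariate probability mass function is a convenient check on the formulas in the table. The Python sketch below (standard library only, Python 3.9+; `mv_hypergeom_pmf` and `support` are hypothetical names) enumerates every admissible count vector for a small urn and evaluates \prod_i \binom{m_i}{k_i} / \binom{N}{n} with exact fractions, so that the total probability, the means and the covariances can be compared with the closed forms.

```python
from fractions import Fraction
from itertools import product
from math import comb, prod


def mv_hypergeom_pmf(k: tuple[int, ...], m: tuple[int, ...]) -> Fraction:
    """prod_i C(m_i, k_i) / C(N, n), with N = sum(m) and n = sum(k)."""
    N, n = sum(m), sum(k)
    return Fraction(prod(comb(mi, ki) for mi, ki in zip(m, k)), comb(N, n))


def support(m: tuple[int, ...], n: int):
    """Yield every count vector (k_1, ..., k_c) with 0 <= k_i <= m_i and sum n."""
    for k in product(*(range(mi + 1) for mi in m)):
        if sum(k) == n:
            yield k


if __name__ == "__main__":
    m, n = (3, 4, 5), 4  # three colors, N = 12 marbles, 4 drawn
    dist = {k: mv_hypergeom_pmf(k, m) for k in support(m, n)}
    print(sum(dist.values()))                      # 1
    print(sum(k[0] * p for k, p in dist.items()))  # E(X_1) = n m_1 / N = 1
```

Enumerating with `itertools.product` is only practical for small urns, since the number of candidate vectors grows as the product of (m_i + 1); it is meant as a check of the formulas, not as an efficient sampler.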


See also



Binomial distribution

Fisher's exact test

Noncentral hypergeometric distributions

Sampling (statistics)

Urn problem

External links



Hypergeometric Distribution Calculator

Hypergeometric Distribution Calculator with source (Ruby, C++)

This article is provided by Wikipedia.
