Bernoulli distribution

Probability mass function

Three examples of the Bernoulli distribution:

  {\displaystyle P(x=0)=0{.}2} and {\displaystyle P(x=1)=0{.}8}

  {\displaystyle P(x=0)=0{.}8} and {\displaystyle P(x=1)=0{.}2}

  {\displaystyle P(x=0)=0{.}5} and {\displaystyle P(x=1)=0{.}5}

Parameters {\displaystyle 0\leq p\leq 1}; {\displaystyle q=1-p}
Support {\displaystyle k\in \{0,1\}}
PMF {\displaystyle {\begin{cases}q=1-p&{\text{if }}k=0\\p&{\text{if }}k=1\end{cases}}}
CDF {\displaystyle {\begin{cases}0&{\text{if }}k<0\\1-p&{\text{if }}0\leq k<1\\1&{\text{if }}k\geq 1\end{cases}}}
Mean {\displaystyle p}
Median {\displaystyle {\begin{cases}0&{\text{if }}p<1/2\\\left[0,1\right]&{\text{if }}p=1/2\\1&{\text{if }}p>1/2\end{cases}}}
Mode {\displaystyle {\begin{cases}0&{\text{if }}p<1/2\\0,1&{\text{if }}p=1/2\\1&{\text{if }}p>1/2\end{cases}}}
Variance {\displaystyle p(1-p)=pq}
MAD {\displaystyle 2p(1-p)=2pq}
Skewness {\displaystyle {\frac {q-p}{\sqrt {pq}}}}
Excess kurtosis {\displaystyle {\frac {1-6pq}{pq}}}
Entropy {\displaystyle -q\ln q-p\ln p}
MGF {\displaystyle q+pe^{t}}
CF {\displaystyle q+pe^{it}}
PGF {\displaystyle q+pz}
Fisher information {\displaystyle {\frac {1}{pq}}}

In probability theory and statistics, the Bernoulli distribution, named after Swiss mathematician Jacob Bernoulli,[1] is the discrete probability distribution of a random variable which takes the value 1 with probability {\displaystyle p} and the value 0 with probability {\displaystyle q=1-p}. Less formally, it can be thought of as a model for the set of possible outcomes of any single experiment that asks a yes–no question. Such questions lead to outcomes that are Boolean-valued: a single bit whose value is success/yes/true/one with probability p and failure/no/false/zero with probability q. It can be used to represent a (possibly biased) coin toss where 1 and 0 would represent "heads" and "tails", respectively, and p would be the probability of the coin landing on heads (or vice versa where 1 would represent tails and p would be the probability of tails). In particular, unfair coins would have {\displaystyle p\neq 1/2.}
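
As an informal illustration of this coin-toss interpretation, the short Python sketch below (assuming NumPy is available; the helper name bernoulli_trial is an arbitrary choice for this example, not a standard function) draws repeated Bernoulli samples and checks that the fraction of ones is close to {\displaystyle p}.

    import numpy as np

    def bernoulli_trial(p, size, rng=None):
        # Draw `size` independent Bernoulli(p) samples as 0/1 integers.
        rng = np.random.default_rng() if rng is None else rng
        # A uniform draw on [0, 1) falls below p with probability exactly p.
        return (rng.random(size) < p).astype(int)

    samples = bernoulli_trial(p=0.3, size=100_000)
    print(samples.mean())  # empirical frequency of 1s, close to p = 0.3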

The Bernoulli distribution is a special case of the binomial distribution where a single trial is conducted (so n would be 1 for such a binomial distribution). It is also a special case of the two-point distribution, for which the possible outcomes need not be 0 and 1.[2]

Properties

If {\displaystyle X} is a random variable with a Bernoulli distribution, then:

{\displaystyle \Pr(X=1)=p=1-\Pr(X=0)=1-q.}

The probability mass function {\displaystyle f} of this distribution, over possible outcomes k, is

{\displaystyle f(k;p)={\begin{cases}p&{\text{if }}k=1,\\q=1-p&{\text{if }}k=0.\end{cases}}}[3]

This can also be expressed as

{\displaystyle f(k;p)=p^{k}(1-p)^{1-k}\quad {\text{for }}k\in \{0,1\}}

or as

{\displaystyle f(k;p)=pk+(1-p)(1-k)\quad {\text{for }}k\in \{0,1\}.}
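
As a quick sanity check on these closed forms, a minimal plain-Python sketch (the function names below are ad hoc, introduced only for this comparison) confirms that both expressions reproduce the case-by-case definition for {\displaystyle k\in \{0,1\}}.

    def pmf_cases(k, p):
        # Case-by-case definition of the Bernoulli PMF.
        return p if k == 1 else 1 - p

    def pmf_power(k, p):
        # f(k; p) = p^k (1-p)^(1-k)
        return p**k * (1 - p)**(1 - k)

    def pmf_linear(k, p):
        # f(k; p) = p*k + (1-p)*(1-k)
        return p * k + (1 - p) * (1 - k)

    p = 0.8
    for k in (0, 1):
        assert pmf_cases(k, p) == pmf_power(k, p) == pmf_linear(k, p)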

The Bernoulli distribution is a special case of the binomial distribution with {\displaystyle n=1.}[4]

The excess kurtosis goes to infinity as {\displaystyle p} approaches 0 or 1, but for {\displaystyle p=1/2} the two-point distributions, including the Bernoulli distribution, have a lower excess kurtosis, namely −2, than any other probability distribution.
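
This behaviour can be verified directly from the excess-kurtosis formula {\displaystyle (1-6pq)/(pq)} given in the table above; a minimal plain-Python sketch, with no external libraries assumed:

    def excess_kurtosis(p):
        # (1 - 6pq) / (pq): the excess kurtosis of a Bernoulli(p) variable.
        q = 1 - p
        return (1 - 6 * p * q) / (p * q)

    print(excess_kurtosis(0.5))    # -2.0, the lowest value of any distribution
    print(excess_kurtosis(0.01))   # ~95.0, large when p is near 0 or 1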

The Bernoulli distributions for {\displaystyle 0\leq p\leq 1} form an exponential family.

The maximum likelihood estimator of {\displaystyle p} based on a random sample is the sample mean.
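
For instance, under the usual assumption of independent and identically distributed observations, the maximum likelihood estimate is obtained simply by averaging the observed 0/1 outcomes. A minimal sketch in plain Python, with made-up data used purely for illustration:

    # Hypothetical sample of 0/1 outcomes from repeated Bernoulli trials.
    data = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]

    # For i.i.d. Bernoulli observations the MLE of p is the sample mean.
    p_hat = sum(data) / len(data)
    print(p_hat)  # 0.7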

The probability mass function of a Bernoulli experiment along with its corresponding cumulative distribution function.

Mean

The expected value of a Bernoulli random variable {\displaystyle X} is

{\displaystyle \operatorname {E} [X]=p}

This is because for a Bernoulli distributed random variable {\displaystyle X} with {\displaystyle \Pr(X=1)=p} and {\displaystyle \Pr(X=0)=q} we find

{\displaystyle \operatorname {E} [X]=\Pr(X=1)\cdot 1+\Pr(X=0)\cdot 0=p\cdot 1+q\cdot 0=p.}[3]

Variance

The variance of a Bernoulli distributed {\displaystyle X} is

{\displaystyle \operatorname {Var} [X]=pq=p(1-p)}

We first find

{\displaystyle \operatorname {E} [X^{2}]=\Pr(X=1)\cdot 1^{2}+\Pr(X=0)\cdot 0^{2}}
{\displaystyle =p\cdot 1^{2}+q\cdot 0^{2}=p=\operatorname {E} [X]}

From this follows

{\displaystyle \operatorname {Var} [X]=\operatorname {E} [X^{2}]-\operatorname {E} [X]^{2}=\operatorname {E} [X]-\operatorname {E} [X]^{2}}
{\displaystyle =p-p^{2}=p(1-p)=pq}[3]

From this result it follows that, for any Bernoulli distribution, the variance lies in the interval {\displaystyle [0,1/4]}, with the maximum value {\displaystyle 1/4} attained at {\displaystyle p=1/2}.
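
A short plain-Python sketch, with no external libraries assumed, confirms both the formula {\displaystyle \operatorname {Var} [X]=p(1-p)} and the bound, with the maximum attained at {\displaystyle p=1/2}:

    # Evaluate p*(1-p) on a grid of p values between 0 and 1.
    variances = [(p / 1000) * (1 - p / 1000) for p in range(1001)]

    print(max(variances))  # 0.25, attained at p = 0.5
    print(min(variances))  # 0.0, attained at p = 0 or p = 1
    assert all(0 <= v <= 0.25 for v in variances)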

Skewness

The skewness is {\displaystyle {\frac {q-p}{\sqrt {pq}}}={\frac {1-2p}{\sqrt {pq}}}}. When we take the standardized Bernoulli distributed random variable {\displaystyle {\frac {X-\operatorname {E} [X]}{\sqrt {\operatorname {Var} [X]}}}} we find that this random variable attains {\displaystyle {\frac {q}{\sqrt {pq}}}} with probability {\displaystyle p} and attains {\displaystyle -{\frac {p}{\sqrt {pq}}}} with probability {\displaystyle q}. Thus we get

{\displaystyle {\begin{aligned}\gamma _{1}&=\operatorname {E} \left[\left({\frac {X-\operatorname {E} [X]}{\sqrt {\operatorname {Var} [X]}}}\right)^{3}\right]\\&=p\cdot \left({\frac {q}{\sqrt {pq}}}\right)^{3}+q\cdot \left(-{\frac {p}{\sqrt {pq}}}\right)^{3}\\&={\frac {1}{{\sqrt {pq}}^{3}}}\left(pq^{3}-qp^{3}\right)\\&={\frac {pq}{{\sqrt {pq}}^{3}}}(q^{2}-p^{2})\\&={\frac {(1-p)^{2}-p^{2}}{\sqrt {pq}}}\\&={\frac {1-2p}{\sqrt {pq}}}={\frac {q-p}{\sqrt {pq}}}.\end{aligned}}}
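
The same closed form can be checked numerically; the sketch below assumes SciPy is available and compares the formula against the skewness reported by scipy.stats.bernoulli:

    from math import sqrt
    from scipy.stats import bernoulli

    p = 0.3
    q = 1 - p

    closed_form = (q - p) / sqrt(p * q)               # skewness from the derivation above
    library_value = bernoulli(p).stats(moments='s')   # SciPy's skewness for Bernoulli(p)

    print(closed_form, float(library_value))  # both are approximately 0.873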

Higher moments and cumulants

The raw moments are all equal to {\displaystyle p}, because {\displaystyle 1^{k}=1} and {\displaystyle 0^{k}=0}:

{\displaystyle \operatorname {E} [X^{k}]=\Pr(X=1)\cdot 1^{k}+\Pr(X=0)\cdot 0^{k}=p\cdot 1+q\cdot 0=p=\operatorname {E} [X].}

The central moment of order {\displaystyle k} is given by

{\displaystyle \mu _{k}=(1-p)(-p)^{k}+p(1-p)^{k}.}

The first six central moments are

{\displaystyle {\begin{aligned}\mu _{1}&=0,\\\mu _{2}&=p(1-p),\\\mu _{3}&=p(1-p)(1-2p),\\\mu _{4}&=p(1-p)(1-3p(1-p)),\\\mu _{5}&=p(1-p)(1-2p)(1-2p(1-p)),\\\mu _{6}&=p(1-p)(1-5p(1-p)(1-p(1-p))).\end{aligned}}}

The higher central moments can be expressed more compactly in terms of {\displaystyle \mu _{2}} and {\displaystyle \mu _{3}}:

{\displaystyle {\begin{aligned}\mu _{4}&=\mu _{2}(1-3\mu _{2}),\\\mu _{5}&=\mu _{3}(1-2\mu _{2}),\\\mu _{6}&=\mu _{2}(1-5\mu _{2}(1-\mu _{2})).\end{aligned}}}

The first six cumulants are

{\displaystyle {\begin{aligned}\kappa _{1}&=p,\\\kappa _{2}&=\mu _{2},\\\kappa _{3}&=\mu _{3},\\\kappa _{4}&=\mu _{2}(1-6\mu _{2}),\\\kappa _{5}&=\mu _{3}(1-12\mu _{2}),\\\kappa _{6}&=\mu _{2}(1-30\mu _{2}(1-4\mu _{2})).\end{aligned}}}
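
These compact expressions can be verified against the defining formula for the central moments given above; a minimal plain-Python sketch, with no external libraries assumed:

    def central_moment(k, p):
        # mu_k = (1-p) * (-p)^k + p * (1-p)^k, the definition given above.
        q = 1 - p
        return q * (-p)**k + p * q**k

    p = 0.3
    mu2 = central_moment(2, p)
    mu3 = central_moment(3, p)

    # Compact forms of the higher central moments in terms of mu2 and mu3.
    assert abs(central_moment(4, p) - mu2 * (1 - 3 * mu2)) < 1e-12
    assert abs(central_moment(5, p) - mu3 * (1 - 2 * mu2)) < 1e-12
    assert abs(central_moment(6, p) - mu2 * (1 - 5 * mu2 * (1 - mu2))) < 1e-12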

Entropy and Fisher's Information

Entropy is a measure of uncertainty or randomness in a probability distribution. For a Bernoulli random variable {\displaystyle X} with success probability {\displaystyle p} and failure probability {\displaystyle q=1-p}, the entropy {\displaystyle H(X)} is defined as:

{\displaystyle {\begin{aligned}H(X)&=\mathbb {E} _{p}\ln({\frac {1}{P(X)}})=-[P(X=0)\ln P(X=0)+P(X=1)\ln P(X=1)]\\H(X)&=-(q\ln q+p\ln p),\quad q=P(X=0),p=P(X=1)\end{aligned}}}

The entropy is maximized when {\displaystyle p=0.5}, indicating the highest level of uncertainty when both outcomes are equally likely. The entropy is zero when {\displaystyle p=0} or {\displaystyle p=1}, where one outcome is certain.
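
A brief plain-Python sketch, using only the standard library, illustrates these two extremes; the endpoints {\displaystyle p=0} and {\displaystyle p=1} are handled with the usual convention {\displaystyle 0\ln 0=0}:

    from math import log

    def bernoulli_entropy(p):
        # H(X) = -q ln q - p ln p, with the convention 0 * ln(0) = 0.
        q = 1 - p
        return -sum(x * log(x) for x in (p, q) if x > 0)

    print(bernoulli_entropy(0.5))  # ln 2, about 0.693: the maximum
    print(bernoulli_entropy(0.9))  # about 0.325: less uncertain
    print(bernoulli_entropy(1.0))  # 0.0: the outcome is certain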

Fisher's Information

Fisher information measures the amount of information that an observable random variable {\displaystyle X} carries about an unknown parameter {\displaystyle p} upon which the probability of {\displaystyle X} depends. For the Bernoulli distribution, the Fisher information with respect to the parameter {\displaystyle p} is given by:

{\displaystyle {\begin{aligned}I(p)={\frac {1}{pq}}\end{aligned}}}

Proof:

  • The likelihood function for a Bernoulli random variable {\displaystyle X} is:
{\displaystyle {\begin{aligned}L(p;X)=p^{X}(1-p)^{1-X}\end{aligned}}}

This represents the probability of observing {\displaystyle X} given the parameter {\displaystyle p}.

  • The log-likelihood function is:
{\displaystyle {\begin{aligned}\ln L(p;X)=X\ln p+(1-X)\ln(1-p)\end{aligned}}}
  • The score function (the first derivative of the log-likelihood with respect to {\displaystyle p}) is:
{\displaystyle {\begin{aligned}{\frac {\partial }{\partial p}}\ln L(p;X)={\frac {X}{p}}-{\frac {1-X}{1-p}}\end{aligned}}}
  • The second derivative of the log-likelihood function is:
{\displaystyle {\begin{aligned}{\frac {\partial ^{2}}{\partial p^{2}}}\ln L(p;X)=-{\frac {X}{p^{2}}}-{\frac {1-X}{(1-p)^{2}}}\end{aligned}}}
  • Fisher information is calculated as the negative expected value of the second derivative of the log-likelihood:
{\displaystyle {\begin{aligned}I(p)=-E\left[{\frac {\partial ^{2}}{\partial p^{2}}}\ln L(p;X)\right]=-\left(-{\frac {p}{p^{2}}}-{\frac {1-p}{(1-p)^{2}}}\right)={\frac {1}{p(1-p)}}={\frac {1}{pq}}\end{aligned}}}

Unlike the entropy, the Fisher information {\displaystyle I(p)=1/(pq)} is minimized at {\displaystyle p=0.5}, where the outcome of a single trial is most uncertain; it grows without bound as {\displaystyle p} approaches 0 or 1, where even a single observation is highly informative about the parameter {\displaystyle p}.
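
The closed form can also be checked by taking the expectation of the negative second derivative directly over the two outcomes; a minimal plain-Python sketch, with no external libraries assumed:

    def fisher_information(p):
        # I(p) = -E[ d^2/dp^2 ln L(p; X) ], expectation taken over X in {0, 1}.
        q = 1 - p

        def second_derivative(x):
            # d^2/dp^2 ln L(p; x) = -x/p^2 - (1 - x)/(1 - p)^2
            return -x / p**2 - (1 - x) / q**2

        return -(p * second_derivative(1) + q * second_derivative(0))

    for p in (0.1, 0.5, 0.9):
        print(p, fisher_information(p), 1 / (p * (1 - p)))  # the last two values agree

    # I(p) is smallest at p = 0.5 (value 4) and blows up as p approaches 0 or 1.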

Related distributions

The Bernoulli distribution is simply {\displaystyle \operatorname {B} (1,p)}, also written as {\textstyle \mathrm {Bernoulli} (p).}

References

  1. Uspensky, James Victor (1937). Introduction to Mathematical Probability. New York: McGraw-Hill. p. 45. OCLC 996937.
  2. Dekking, Frederik; Kraaikamp, Cornelis; Lopuhaä, Hendrik; Meester, Ludolf (9 October 2010). A Modern Introduction to Probability and Statistics (1st ed.). Springer London. pp. 43–48. ISBN 9781849969529.
  3. Bertsekas, Dimitri P.; Tsitsiklis, John N. (2002). Introduction to Probability. Belmont, Mass.: Athena Scientific. ISBN 188652940X. OCLC 51441829.
  4. McCullagh, Peter; Nelder, John (1989). Generalized Linear Models (2nd ed.). Boca Raton: Chapman and Hall/CRC. Section 4.2.2. ISBN 0-412-31760-5.
  5. Orloff, Jeremy; Bloom, Jonathan. "Conjugate priors: Beta and normal" (PDF). math.mit.edu. Retrieved October 20, 2023.

Further reading

  • Johnson, N. L.; Kotz, S.; Kemp, A. (1993). Univariate Discrete Distributions (2nd ed.). Wiley. ISBN 0-471-54897-9.
  • Peatman, John G. (1963). Introduction to Applied Statistics. New York: Harper & Row. pp. 162–171.