Inside–outside algorithm

In computer science, the inside–outside algorithm is a way of re-estimating production probabilities in a probabilistic context-free grammar. It was introduced by James K. Baker in 1979 as a generalization of the forward–backward algorithm for parameter estimation on hidden Markov models to stochastic context-free grammars. It is used to compute expectations, for example as part of the expectation–maximization algorithm (an unsupervised learning algorithm).

Inside and outside probabilities

The inside probability $\beta_j(p,q)$ is the total probability of generating the words $w_p \cdots w_q$, given the root nonterminal $N^j$ and a grammar $G$:[1]

$$\beta_j(p,q) = P(w_{pq} \mid N_{pq}^j, G)$$

The outside probability $\alpha_j(p,q)$ is the total probability of beginning with the start symbol $N^1$ and generating the nonterminal $N_{pq}^j$ and all the words outside $w_p \cdots w_q$ in a sentence $w_1 \cdots w_n$, given a grammar $G$:[1]

$$\alpha_j(p,q) = P(w_{1(p-1)}, N_{pq}^j, w_{(q+1)n} \mid G)$$
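
The two probabilities fit together: their product is the total probability of the sentence with $N^j$ spanning $w_p \cdots w_q$, and the inside probability of the start symbol over the whole sentence is the sentence probability itself. These standard identities, found in the same pages of Manning & Schütze,[1] are useful sanity checks on the computations below:

$$\alpha_j(p,q)\,\beta_j(p,q) = P(w_{1n}, N_{pq}^j \mid G), \qquad P(w_{1n} \mid G) = \beta_1(1,n)$$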

Computing inside probabilities

Base case:

$$\beta_j(p,p) = P(w_p \mid N^j, G)$$

General case:

Suppose there is a rule $N^j \rightarrow N^r N^s$ in the grammar. Then the probability of generating $w_p \cdots w_q$ by first applying this rule at the root $N^j$, summed over all split points $k$, is:

$$\sum_{k=p}^{q-1} P(N^j \rightarrow N^r N^s)\,\beta_r(p,k)\,\beta_s(k+1,q)$$

The inside probability $\beta_j(p,q)$, for $p < q$, is the sum over all such rules:

$$\beta_j(p,q) = \sum_{N^r,\,N^s} \sum_{k=p}^{q-1} P(N^j \rightarrow N^r N^s)\,\beta_r(p,k)\,\beta_s(k+1,q)$$
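
In code, this recursion is a bottom-up dynamic program over spans of increasing width. Below is a minimal sketch in Python, assuming a grammar in Chomsky normal form; the representation (binary rules as a dict from (j, r, s) to $P(N^j \rightarrow N^r N^s)$, lexical rules as a dict from (j, word) to the probability of $N^j$ rewriting to that word) and the function name are illustrative choices, not from the source.

    from collections import defaultdict

    def inside_probabilities(words, binary_rules, lexical_rules):
        # beta[j, p, q] = inside probability of nonterminal j over words p..q
        # (positions are 1-based and inclusive, matching the text above).
        n = len(words)
        beta = defaultdict(float)
        # Base case: beta_j(p, p) = P(N^j -> w_p)
        for p in range(1, n + 1):
            for (j, word), prob in lexical_rules.items():
                if word == words[p - 1]:
                    beta[j, p, p] += prob
        # General case: build wider spans from narrower ones, summing over
        # rules N^j -> N^r N^s and split points k.
        for span in range(2, n + 1):
            for p in range(1, n - span + 2):
                q = p + span - 1
                for (j, r, s), rule_prob in binary_rules.items():
                    for k in range(p, q):
                        beta[j, p, q] += rule_prob * beta[r, p, k] * beta[s, k + 1, q]
        return beta

Each entry depends only on strictly narrower spans, so iterating by span width guarantees the needed values are already in place.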

Computing outside probabilities

Base case:

$$\alpha_j(1,n) = \begin{cases} 1 & \text{if } j = 1 \\ 0 & \text{otherwise} \end{cases}$$

Here the start symbol is $N^1$.

General case:

Suppose there is a rule $N^r \rightarrow N^j N^s$ in the grammar, in which $N^j$ appears as the left child. Then the left contribution of that rule to the outside probability $\alpha_j(p,q)$ is:

$$\sum_{k=q+1}^{n} P(N^r \rightarrow N^j N^s)\,\alpha_r(p,k)\,\beta_s(q+1,k)$$

Now suppose there is a rule $N^r \rightarrow N^s N^j$ in the grammar, with $N^j$ as the right child. Then the right contribution of that rule to $\alpha_j(p,q)$ is:

$$\sum_{k=1}^{p-1} P(N^r \rightarrow N^s N^j)\,\alpha_r(k,q)\,\beta_s(k,p-1)$$

The outside probability $\alpha_j(p,q)$ is the sum of the left and right contributions over all such rules:

$$\alpha_j(p,q) = \sum_{N^r,\,N^s} \sum_{k=q+1}^{n} P(N^r \rightarrow N^j N^s)\,\alpha_r(p,k)\,\beta_s(q+1,k) + \sum_{N^r,\,N^s} \sum_{k=1}^{p-1} P(N^r \rightarrow N^s N^j)\,\alpha_r(k,q)\,\beta_s(k,p-1)$$
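
The outside table is filled in the opposite direction, from the widest span down, reusing the inside table. A matching sketch under the same illustrative representation as above (rule-dict keys are (parent, left child, right child); again an assumption, not prescribed by the source):

    from collections import defaultdict  # as in the inside sketch above

    def outside_probabilities(words, binary_rules, beta, start=1):
        # alpha[j, p, q] = outside probability of nonterminal j over words p..q,
        # computed top-down from the inside table beta.
        n = len(words)
        alpha = defaultdict(float)
        alpha[start, 1, n] = 1.0  # base case: only the start symbol spans the sentence
        for span in range(n - 1, 0, -1):
            for p in range(1, n - span + 2):
                q = p + span - 1
                for (r, lc, rc), rule_prob in binary_rules.items():
                    # Left contribution: the left child lc plays the role of N^j
                    for k in range(q + 1, n + 1):
                        alpha[lc, p, q] += rule_prob * alpha[r, p, k] * beta[rc, q + 1, k]
                    # Right contribution: the right child rc plays the role of N^j
                    for k in range(1, p):
                        alpha[rc, p, q] += rule_prob * alpha[r, k, q] * beta[lc, k, p - 1]
        return alpha

A toy run with the hypothetical grammar S → S S (probability 0.3) and S → 'a' (probability 0.7), encoding S as 1:

    binary_rules = {(1, 1, 1): 0.3}
    lexical_rules = {(1, 'a'): 0.7}
    words = ['a', 'a']
    beta = inside_probabilities(words, binary_rules, lexical_rules)
    alpha = outside_probabilities(words, binary_rules, beta)
    print(beta[1, 1, 2])                   # 0.147 = P(w | G) = 0.3 * 0.7 * 0.7
    print(alpha[1, 1, 1] * beta[1, 1, 1])  # also 0.147, matching the identity above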

  1. Manning, Christopher D.; Schütze, Hinrich (1999). Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press. pp. 388–402. ISBN 0-262-13360-1.