Quadratic classifier

From Wikipedia, the free encyclopedia

This article is about statistical classification. For other uses of the word "quadratic" in mathematics, see Quadratic (disambiguation).

In statistics, a quadratic classifier is a statistical classifier that uses a quadratic decision surface to separate measurements of two or more classes of objects or events. It is a more general version of the linear classifier.

The classification problem

Statistical classification considers a set of vectors of observations x of an object or event, each of which has a known type y. This set is referred to as the training set. The problem is then to determine, for a given new observation vector, what the best class should be. For a quadratic classifier, the correct solution is assumed to be quadratic in the measurements, so y will be decided based on $\mathbf{x}^{\mathsf T}A\mathbf{x} + \mathbf{b}^{\mathsf T}\mathbf{x} + c$.
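
As a purely illustrative sketch of this decision rule, the function below scores an observation with fixed coefficients A, b and c and assigns one of two classes by the sign of the score; the coefficient values and names are assumptions for the example, not quantities fitted to data.

```python
import numpy as np

def quadratic_decision(x, A, b, c):
    """Score an observation x with a quadratic decision function.

    A is a symmetric matrix of quadratic coefficients, b a vector of
    linear coefficients, and c a scalar offset; in the two-class case
    the class is chosen from the sign of the score.
    """
    score = x @ A @ x + b @ x + c
    return 1 if score > 0 else 0

# Illustrative coefficients for two measurements (not fitted to data).
A = np.array([[1.0, 0.3],
              [0.3, -0.5]])
b = np.array([0.2, -1.0])
c = -0.1

print(quadratic_decision(np.array([0.5, 1.5]), A, b, c))
```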

In the special case where each observation consists of two measurements, this means that the surfaces separating the classes will be conic sections (i.e., either a line, a circle or ellipse, a parabola or a hyperbola). In this sense, we can state that a quadratic model is a generalization of the linear model, and its use is justified by the desire to extend the classifier's ability to represent more complex separating surfaces.

Quadratic discriminant analysis

Quadratic discriminant analysis (QDA) is closely related to linear discriminant analysis (LDA), where it is assumed that the measurements from each class are normally distributed.[1] Unlike LDA, however, QDA does not assume that the covariance matrix of each class is identical.[2] When the normality assumption is true, the best possible test for the hypothesis that a given measurement is from a given class is the likelihood ratio test. Suppose there are only two groups, with means $\mu_0, \mu_1$ and covariance matrices $\Sigma_0, \Sigma_1$ corresponding to $y=0$ and $y=1$ respectively. Then the likelihood ratio is given by

$$\text{Likelihood ratio} = \frac{\sqrt{2\pi|\Sigma_1|}^{\,-1}\exp\left(-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_1)^{\mathsf T}\Sigma_1^{-1}(\mathbf{x}-\boldsymbol{\mu}_1)\right)}{\sqrt{2\pi|\Sigma_0|}^{\,-1}\exp\left(-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_0)^{\mathsf T}\Sigma_0^{-1}(\mathbf{x}-\boldsymbol{\mu}_0)\right)} < t$$

for some threshold $t$. After some rearrangement, it can be shown that the resulting separating surface between the classes is a quadratic. The sample estimates of the mean vectors and covariance matrices are substituted for the population quantities in this formula.
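
The following minimal sketch mirrors this procedure: per-class means and covariance matrices are estimated from the training data, and a new observation is assigned to the class whose Gaussian log-density is larger, which amounts to thresholding the likelihood ratio at $t = 1$. The threshold and the implicit assumption of equal class priors are simplifications made for the example; the article leaves the threshold unspecified.

```python
import numpy as np

def fit_qda(X, y):
    """Estimate per-class means and covariance matrices from training data."""
    params = {}
    for k in np.unique(y):
        Xk = X[y == k]
        params[k] = (Xk.mean(axis=0), np.cov(Xk, rowvar=False))
    return params

def log_gaussian(x, mean, cov):
    """Log-density of a multivariate normal, omitting the (2*pi)^(d/2)
    factor, which is shared by both classes and cancels in the ratio."""
    diff = x - mean
    return -0.5 * (np.log(np.linalg.det(cov))
                   + diff @ np.linalg.solve(cov, diff))

def predict_qda(x, params):
    """Pick the class with the larger log-density, i.e. threshold the
    likelihood ratio at t = 1 under equal priors."""
    return max(params, key=lambda k: log_gaussian(x, *params[k]))

# Toy usage with synthetic two-class data (illustrative only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(2, 0.5, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
print(predict_qda(np.array([1.8, 2.1]), fit_qda(X, y)))
```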

While QDA is the most commonly used method for obtaining a classifier, other methods are also possible. One such method is to create a longer measurement vector from the old one by adding all pairwise products of individual measurements. For instance, the vector $[x_1,\; x_2,\; x_3]$ would become $[x_1,\; x_2,\; x_3,\; x_1^2,\; x_1x_2,\; x_1x_3,\; x_2^2,\; x_2x_3,\; x_3^2]$.
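
A minimal sketch of this expansion (the function name is illustrative):

```python
import numpy as np
from itertools import combinations_with_replacement

def expand_quadratic(x):
    """Append all pairwise products (including squares) to a measurement
    vector, e.g. [x1, x2, x3] -> [x1, x2, x3, x1^2, x1*x2, x1*x3,
    x2^2, x2*x3, x3^2]."""
    products = [x[i] * x[j]
                for i, j in combinations_with_replacement(range(len(x)), 2)]
    return np.concatenate([x, products])

print(expand_quadratic(np.array([1.0, 2.0, 3.0])))
# -> [1. 2. 3. 1. 2. 3. 4. 6. 9.]
```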

Finding a quadratic classifier for the original measurements would then become the same as finding a linear classifier based on the expanded measurement vector. This observation has been used in extending neural network models;[3] the "circular" case, which corresponds to introducing only the sum of pure quadratic terms $x_1^2 + x_2^2 + x_3^2 + \cdots$ with no mixed products ($x_1x_2,\; x_1x_3,\; \ldots$), has been proven to be the optimal compromise between extending the classifier's representation power and controlling the risk of overfitting (Vapnik–Chervonenkis dimension).[4]
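
For comparison with the full expansion above, the "circular" case adds only a single extra feature, the sum of squares; a minimal sketch with an illustrative function name:

```python
import numpy as np

def expand_circular(x):
    """Append only the sum of the squared measurements (the 'circular'
    case): one extra feature instead of all pairwise products."""
    return np.append(x, np.sum(x ** 2))

print(expand_circular(np.array([1.0, 2.0, 3.0])))  # -> [1. 2. 3. 14.]
```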

For linear classifiers based only on dot products, these expanded measurements do not have to be actually computed, since the dot product in the higher-dimensional space is simply related to that in the original space. This is an example of the so-called kernel trick, which can be applied to linear discriminant analysis as well as the support vector machine.
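As a small numerical illustration of this identity, the homogeneous quadratic kernel $(\mathbf{x}^{\mathsf T}\mathbf{z})^2$ equals the ordinary dot product of the explicitly expanded vectors of all pairwise products. The sketch below checks this; the linear terms of the expansion above are omitted for brevity, and the function names are illustrative.

```python
import numpy as np

def phi(x):
    """Explicit map to all ordered pairwise products x_i * x_j (n^2 features)."""
    return np.outer(x, x).ravel()

def poly_kernel(x, z):
    """Homogeneous quadratic kernel: the same dot product as phi(x) . phi(z),
    computed in the original space without forming the expanded vectors."""
    return (x @ z) ** 2

x = np.array([1.0, 2.0, 3.0])
z = np.array([0.5, -1.0, 2.0])

print(phi(x) @ phi(z))    # 20.25
print(poly_kernel(x, z))  # 20.25
```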

References

  1. ^ Tharwat, Alaa (2016). "Linear vs. quadratic discriminant analysis classifier: a tutorial". International Journal of Applied Pattern Recognition. 3 (2): 145. doi:10.1504/IJAPR.2016.079050. ISSN 2049-887X.
  2. ^ "Linear & Quadratic Discriminant Analysis · UC Business Analytics R Programming Guide". uc-r.github.io. Retrieved 2020-03-29.
  3. ^ Cover TM (1965). "Geometrical and Statistical Properties of Systems of Linear Inequalities with Applications in Pattern Recognition". IEEE Transactions on Electronic Computers. EC-14 (3): 326–334. doi:10.1109/pgec.1965.264137.
  4. ^ Ridella S, Rovetta S, Zunino R (1997). "Circular backpropagation networks for classification". IEEE Transactions on Neural Networks. 8 (1): 84–97. doi:10.1109/72.554194. PMID 18255613.