Multinomial logit: Information from Answers.com

In statistics, economics, and genetics, a multinomial logit model is a regression model which generalizes logistic regression by allowing more than two discrete outcomes.

Introduction

Multinomial logit regression is used when the dependent variable in question is nominal (a set of categories which cannot be ordered in any meaningful way) and consists of more than two categories. For example, multinomial logit regression would be appropriate when trying to determine what factors predict which major college students choose.

Multinomial logit regression is appropriate when the response is nominal rather than ordinal. Ordered logit regression, by contrast, is used when the dependent variable consists of more than two categories that can be ordered in a meaningful way (for example, highest degree or social class), while multinomial logit is used when there is no apparent order (e.g. the choice of muffins, bagels or donuts for breakfast).

Assumptions

The multinomial logit model assumes that data are case specific; that is, each independent variable has a single value for each case. It also assumes that the dependent variable cannot be perfectly predicted from the independent variables for any case. As with other types of regression, collinearity among the independent variables should be relatively low, because it becomes difficult to separate the effects of several variables if they are highly correlated. Finally, the model relies on the independence of irrelevant alternatives (IIA), which must either be assumed to hold or be accommodated in the error structure. This assumption states that the odds of choosing one alternative over another do not depend on which other alternatives are available (e.g. the relative odds of taking a car versus a bus to work do not change if blue and orange buses are added as possibilities).
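The independence of irrelevant alternatives can be checked numerically. The following is a minimal sketch, assuming each alternative has a fixed systematic utility and that choice probabilities are proportional to the exponential of that utility; the function name `choice_probabilities` and the utility values are made up for illustration.

```python
import math

def choice_probabilities(utilities):
    """Logit choice probabilities: exp(utility) normalized over the choice set."""
    total = sum(math.exp(v) for v in utilities.values())
    return {alt: math.exp(v) / total for alt, v in utilities.items()}

# Hypothetical utilities, chosen only for illustration.
two_options = choice_probabilities({"car": 1.0, "bus": 0.5})
three_options = choice_probabilities({"car": 1.0, "bus": 0.5, "orange bus": 0.5})

# Odds of car versus bus, before and after a third alternative is added.
odds_before = two_options["car"] / two_options["bus"]
odds_after = three_options["car"] / three_options["bus"]

print(odds_before, odds_after)
```

Both odds equal exp(1.0 - 0.5), even though the individual probabilities of car and bus shrink once the third alternative enters the choice set. Whether this is behaviorally plausible for near-identical alternatives is exactly what the assumption asks the analyst to accept.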

Estimation of intercept

When using multinomial logistic regression, one category of the dependent variable is chosen as the comparison category. Separate relative risk ratios are determined for all independent variables for each category of the dependent variable except the comparison category, which is omitted from the analysis. A relative risk ratio, the exponentiated beta coefficient, represents the multiplicative change in the odds of being in a given category of the dependent variable versus the comparison category associated with a one-unit change in the independent variable.
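The interpretation of a relative risk ratio can be verified with a small calculation. This is a sketch under stated assumptions: a single independent variable, two non-comparison categories, and made-up intercepts and slopes; the helper names are hypothetical.

```python
import math

def probabilities(x, intercepts, slopes):
    """Return [Pr(comparison), Pr(category 1), Pr(category 2), ...] at value x."""
    scores = [a + b * x for a, b in zip(intercepts, slopes)]
    denom = 1.0 + sum(math.exp(s) for s in scores)
    return [1.0 / denom] + [math.exp(s) / denom for s in scores]

# Hypothetical coefficients for two non-comparison categories.
intercepts = [-1.0, 0.3]
slopes = [0.4, -0.2]

def odds_vs_comparison(x, j):
    """Odds of category j relative to the comparison category (index 0)."""
    p = probabilities(x, intercepts, slopes)
    return p[j] / p[0]

# Relative risk ratio for category 1: the exponentiated slope.
rrr = math.exp(slopes[0])

# Ratio of the odds after and before a one-unit increase in x.
ratio = odds_vs_comparison(3.0, 1) / odds_vs_comparison(2.0, 1)

print(rrr, ratio)
```

The two printed values agree up to floating-point error, because the shared denominator cancels when the odds against the comparison category are formed.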

Model

$\Pr(y_{i}=j)=\frac{\exp(X_{i}\beta_{j})}{1+\sum_{k=1}^{J}\exp(X_{i}\beta_{k})},\quad j=1,\dots,J,$

and

$\Pr(y_{i}=0)=\frac{1}{1+\sum_{k=1}^{J}\exp(X_{i}\beta_{k})},$

where for the ith individual, y_i is the observed outcome (with outcome 0 serving as the comparison category) and X_i is a vector of explanatory variables. The unknown parameters β_j are typically estimated by maximum likelihood.
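The two formulas, and the log-likelihood that maximum likelihood estimation would maximize, can be sketched in a few lines. This is an illustration rather than an estimation routine: the coefficient values, the data, and the function names `mnl_probabilities` and `log_likelihood` are all made up, and no optimizer is included.

```python
import math

def mnl_probabilities(x, betas):
    """Return [Pr(y=0), Pr(y=1), ..., Pr(y=J)] for one case.

    x     : explanatory values for the case (X_i)
    betas : J coefficient vectors, one per non-comparison outcome
    """
    # Linear index X_i * beta_j for each outcome j = 1..J.
    scores = [sum(xv * bv for xv, bv in zip(x, beta)) for beta in betas]
    denom = 1.0 + sum(math.exp(s) for s in scores)
    return [1.0 / denom] + [math.exp(s) / denom for s in scores]

def log_likelihood(X, y, betas):
    """Sum over cases of the log probability of the observed outcome."""
    return sum(math.log(mnl_probabilities(x, betas)[outcome])
               for x, outcome in zip(X, y))

# Hypothetical coefficients (J = 2) and data; first entry of each x is a constant.
betas = [[0.5, -0.25], [-1.0, 0.75]]
X = [[1.0, 2.0], [1.0, 0.0], [1.0, 4.0]]
y = [2, 1, 0]

probs = mnl_probabilities(X[0], betas)
ll = log_likelihood(X, y, betas)

print(probs, ll)
```

An estimation routine would search over `betas` for the values that make `ll` as large as possible; the choice of optimizer is left open here.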

Applications

Random multinomial logit models combine a random ensemble of multinomial logit models for use as a classifier.