Dummy Variables
Menu location: Data_Dummy Variables
This function creates dummy (or design) variables from one categorical variable.
The reference cell coding model is used (Kleinbaum et al., 1998).
The source data may be numerical or text, representing categories. The reference cell coding scheme is applied to your data in reverse alphanumeric order for the k categories found. For three categories, say race equal to black, white or other, white (being the last in an alphabetical sorting) is coded 1,0,0, which reduces to dummy variables X (3) = 1, X (2) = 0.
To represent a categorical variable with more than two levels in a regression model, you may wish to convert it to a series of dummy variables using this function.
Say a linear regression model is specified with three predictors; the first and third predictors are continuous data, and the second predictor is a classifier (categorical data) with three levels. The second predictor should be converted to two dichotomous dummy variables (as in the example below) and entered into a multiple linear regression as two predictors.
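The three-predictor model described above can be sketched in plain Python. This is only an illustration under stated assumptions: the data values, the coefficient values and the helper names (design_row, solve, least_squares) are hypothetical, and the fit uses ordinary least squares via the normal equations rather than any routine of this software.

```python
# Hypothetical data: x1 and x3 are continuous, group is a three-level classifier.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
group = [1, 1, 1, 2, 2, 2, 3, 3]
x3 = [0.5, 1.5, 0.2, 2.2, 1.1, 0.7, 3.0, 0.9]

def design_row(a, g, c):
    # Columns: intercept, x1, dummy for level 2, dummy for level 3, x3.
    # Level 1 is the reference category, so it has no dummy column.
    return [1.0, a, 1.0 if g == 2 else 0.0, 1.0 if g == 3 else 0.0, c]

X = [design_row(a, g, c) for a, g, c in zip(x1, group, x3)]

# Outcome generated from known (made-up) coefficients so the fit can be checked.
true_b = [1.0, 2.0, 3.0, 5.0, 0.5]
y = [sum(b * v for b, v in zip(true_b, row)) for row in X]

def solve(A, rhs):
    """Solve A b = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    b = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * b[j] for j in range(i + 1, n))
        b[i] = (M[i][n] - s) / M[i][i]
    return b

def least_squares(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)

coef = least_squares(X, y)
```

The classifier contributes two columns to the design matrix, so the model has five coefficients in total: an intercept, one for each continuous predictor and one for each dummy variable.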
The naming scheme for dummy variables is the original variable name suffixed with (1) if there are only two categories, or with suffixes up to (j+1) where there are j+1 categories giving rise to j dummy variables; in the example below, three categories give Group ID (2) and Group ID (3).
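The coding and naming rules can be sketched as a small Python function. This is a minimal illustration, not the software's own implementation: the function name make_dummies is hypothetical, the first category in sorted order is assumed to be the reference, and for two categories the later category is assumed to receive the value 1.

```python
def make_dummies(name, values):
    """Reference cell coding: the first category in sorted order is the reference."""
    levels = sorted(set(values))
    if len(levels) == 2:
        # Two categories: a single dummy variable suffixed (1).
        # Assumption: the later category in sorted order is coded 1.
        return {f"{name} (1)": [1 if v == levels[1] else 0 for v in values]}
    # More than two categories: one dummy per non-reference level,
    # suffixed (2), (3), ... up to the number of categories.
    return {
        f"{name} ({i})": [1 if v == level else 0 for v in values]
        for i, level in enumerate(levels[1:], start=2)
    }

# Hypothetical data using the race categories mentioned earlier.
race = ["black", "white", "other", "white", "black"]
dummies = make_dummies("race", race)
```

Here the sorted categories are black, other and white, so black is the reference, "race (2)" marks other and "race (3)" marks white.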
In general form, a regression model where the jth predictor variable is a classifier with k levels can be interpreted as follows, provided the jth variable is converted to dummy variables:
- where Y is the outcome variable, b is a regression coefficient, D is a dummy variable for a classifier variable of k levels and x is a non-classifier predictor variable.
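A general form consistent with these definitions might be written as follows. The subscript layout, the intercept b_0 and the total of p predictors are assumptions made for illustration, with level 1 of the classifier taken as the reference category:

```latex
Y = b_0 + b_1 x_1 + \dots + b_{j-1} x_{j-1}
    + b_{j,2} D_{j,2} + \dots + b_{j,k} D_{j,k}
    + b_{j+1} x_{j+1} + \dots + b_p x_p
```

Under this form, each coefficient b_{j,m} is the expected difference in Y between level m of the classifier and the reference level, with the other predictors held fixed.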
Example
Group ID ---> | Group ID (2) | Group ID (3) |
1 | 0 | 0 |
1 | 0 | 0 |
1 | 0 | 0 |
1 | 0 | 0 |
2 | 1 | 0 |
2 | 1 | 0 |
2 | 1 | 0 |
2 | 1 | 0 |
3 | 0 | 1 |
3 | 0 | 1 |
3 | 0 | 1 |