12.6 Indicator Variables
A special case of a variable with multiple categories is an indicator variable. These variables are also referred to as binary or dummy variables.
You can think of them as variables with just two categories: 0 and 1. Usually the level 1 is reserved for the characteristic of interest.
For example, you may want to indicate those states with incomes higher than average. For the states data, you could do this in multiple ways.
- One way is the following:
dat$HighIncome <- 1*(dat$Income > mean(dat$Income))
head(dat)
Listing the first few lines of the data shows a 1 for Alaska, Arizona, California, and Colorado, and a 0 for Alabama and Arkansas. Since the mean income is 4435.8, we can check each state's income against its indicator value.
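The comparison described above can be sketched as follows. This is only an illustration: it assumes that the states data is R's built-in state.x77 matrix converted to a data frame, and the name avg_income is introduced here for convenience and does not appear in the text.

```r
# Assumption: "dat" is built from R's built-in state.x77 matrix.
dat <- as.data.frame(state.x77)

# Store the mean income once so it can be reused and inspected.
avg_income <- mean(dat$Income)

# A logical comparison multiplied by 1 gives a 0/1 numeric indicator.
dat$HighIncome <- 1 * (dat$Income > avg_income)

# Show income next to the indicator for the first few states,
# then print the mean that each income is compared against.
head(dat[, c("Income", "HighIncome")])
round(avg_income, 1)
```

Showing only the Income and HighIncome columns makes it easier to see that every state with a 1 has an income above the mean.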
- A second way is the following:
dat$HighIncome1 <- ifelse(dat$Income > mean(dat$Income), 1, 0)
head(dat)
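To confirm that the two approaches produce the same indicator, a quick check along the following lines may help. It is a minimal sketch that again assumes the states data comes from R's built-in state.x77 matrix.

```r
# Assumption: "dat" is built from R's built-in state.x77 matrix.
dat <- as.data.frame(state.x77)

# First way: multiply the logical comparison by 1.
dat$HighIncome <- 1 * (dat$Income > mean(dat$Income))

# Second way: use ifelse() to assign 1 or 0 explicitly.
dat$HighIncome1 <- ifelse(dat$Income > mean(dat$Income), 1, 0)

# TRUE when both indicators agree for every state.
all(dat$HighIncome == dat$HighIncome1)

# Count how many states fall in each category.
table(dat$HighIncome)
```

The ifelse() form is a little longer, but it states the two output values explicitly, which is convenient if you later want labels other than 0 and 1.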