12.6 Indicator Variables

A special case of a variable with multiple categories is an indicator variable. These variables are sometimes also referred to as binary or dummy variables.

You can think of these as variables with just two categories, coded 0 and 1. Usually the value 1 is reserved for the characteristic of interest.

For example, you may want to indicate those states with incomes higher than average. For the states data, you could do this in multiple ways.

  • One way is the following:
# The comparison returns TRUE/FALSE; multiplying by 1 converts it to 1/0
dat$HighIncome <- 1 * (dat$Income > mean(dat$Income))
head(dat)

Listing the first few lines of the data shows a 1 for Alaska, Arizona, California, and Colorado, and a 0 for Alabama and Arkansas. Since the mean income is 4435.8, we can check each state's income against its indicator value.

  • A second way is the following:
dat$HighIncome1 <- ifelse(dat$Income > mean(dat$Income), 1, 0)
head(dat)
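
To confirm that the two approaches give the same result, the following sketch builds both indicators and compares them. It assumes that `dat` was created from R's built-in `state.x77` matrix; if your `dat` was built differently, skip the first line.

```r
# Assumption: dat comes from the built-in state.x77 matrix
dat <- as.data.frame(state.x77)

# Method 1: multiply the logical comparison by 1 to get 1/0
dat$HighIncome <- 1 * (dat$Income > mean(dat$Income))

# Method 2: ifelse() assigns 1 or 0 explicitly
dat$HighIncome1 <- ifelse(dat$Income > mean(dat$Income), 1, 0)

# Check that the two columns agree for every state
all(dat$HighIncome == dat$HighIncome1)

# Show income next to both indicators for the first few states
head(dat[, c("Income", "HighIncome", "HighIncome1")])

# Count how many states fall in each category
table(dat$HighIncome)
```

Viewing only the income and indicator columns makes it easier to compare each state's income with the mean than scanning the full output of head(dat).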