12.2 Example Data

The following data, which ship with base R in the datasets package, are used to illustrate the concepts in this chapter. This chapter assumes that you know about data frames, introduced in Section 10.

Some of the data are combined into a data frame, called dat, for easier viewing.

  rm(list = ls())  # clear the workspace
  ?state           # documentation for the state data
  data(state)      # loads state data into the workspace

  dat <- data.frame(state.abb, state.division, state.region, state.x77)
  str(dat)
'data.frame':   50 obs. of  11 variables:
 $ state.abb     : chr  "AL" "AK" "AZ" "AR" ...
 $ state.division: Factor w/ 9 levels "New England",..: 4 9 8 5 9 8 1 3 3 3 ...
 $ state.region  : Factor w/ 4 levels "Northeast","South",..: 2 4 4 2 4 4 1 2 2 2 ...
 $ Population    : num  3615 365 2212 2110 21198 ...
 $ Income        : num  3624 6315 4530 3378 5114 ...
 $ Illiteracy    : num  2.1 1.5 1.8 1.9 1.1 0.7 1.1 0.9 1.3 2 ...
 $ Life.Exp      : num  69 69.3 70.5 70.7 71.7 ...
 $ Murder        : num  15.1 11.3 7.8 10.1 10.3 6.8 3.1 6.2 10.7 13.9 ...
 $ HS.Grad       : num  41.3 66.7 58.1 39.9 62.6 63.9 56 54.6 52.6 40.6 ...
 $ Frost         : num  20 152 15 65 20 166 139 103 11 60 ...
 $ Area          : num  50708 566432 113417 51945 156361 ...

In the str() output for dat, num marks a numeric column, and every variable from Population to Area is numeric. Population and Income, for example, hold whole-number values (although str() reports them as num rather than int), while Illiteracy and Life.Exp are continuous variables.
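The distinction between how a column is stored and what values it holds can be checked directly. The following is a minimal sketch, not part of the original example; the helper names num_cols and is_whole are introduced here for illustration only.

```r
# Rebuild dat as in the text so this block runs on its own.
dat <- data.frame(state.abb, state.division, state.region, state.x77)

# Names of the numeric columns (Population through Area).
num_cols <- names(dat)[sapply(dat, is.numeric)]

# Storage type of each numeric column; str() labels doubles as "num".
sapply(dat[num_cols], typeof)

# Which numeric columns contain only whole-number values?
# (is_whole is a hypothetical helper name, not from the text.)
is_whole <- sapply(dat[num_cols], function(x) all(x == round(x)))
is_whole
```

A column such as Population can therefore be integer-valued while still being stored as a double; as.integer() would convert it if an int column were wanted.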

The variables state.abb, state.division, and state.region in dat are categorical. In the str() output, state.division and state.region are shown as factors, while state.abb is stored as a character vector (chr).
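The categorical columns can be inspected with a few base R functions. This is a sketch under the assumption of R 4.0 or later, where data.frame() leaves character vectors as chr by default; converting state.abb to a factor is an optional step added here for illustration and is not something the text itself does.

```r
# Rebuild dat as in the text so this block runs on its own.
dat <- data.frame(state.abb, state.division, state.region, state.x77)

# Class of each categorical column.
sapply(dat[c("state.abb", "state.division", "state.region")], class)

# Levels of a factor, and the number of states in each level.
levels(dat$state.region)
table(dat$state.region)

# Optional: store the abbreviations as a factor as well.
dat$state.abb <- factor(dat$state.abb)
nlevels(dat$state.abb)  # one level per state
```

Whether to convert state.abb depends on its use: as an identifier for each row a character vector is usually enough, whereas grouping variables such as state.region benefit from being factors.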