12.3 Summarize Data
We can summarize numerical data by examining their:
- Means
- Modes
- Medians
- Percentiles
- Ranges
- Variances/Standard Deviations
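Each of these numerical summaries can be computed with base R. The sketch below uses a small made-up vector, x, not the data from Section 12.2. Base R has no built-in function for the statistical mode, because the built-in mode() returns an object's storage mode instead. The sketch therefore defines a hypothetical helper, stat_mode(), for that one statistic.

```r
# Hypothetical example vector (not the Section 12.2 data)
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# Hypothetical helper: base R has no statistical-mode function, so this
# returns the most frequent value(s) of a numeric vector
stat_mode <- function(v) {
  counts <- table(v)
  as.numeric(names(counts)[counts == max(counts)])
}

mean(x)                            # mean
stat_mode(x)                       # mode
median(x)                          # median (50th percentile)
quantile(x, c(0.25, 0.50, 0.75))   # 25th, 50th, and 75th percentiles
range(x)                           # minimum and maximum
diff(range(x))                     # range as one number (max - min)
var(x)                             # sample variance
sd(x)                              # sample standard deviation
```

Both var() and sd() use the sample denominator, n - 1, rather than n.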
We can summarize categorical data by examining their:
- Frequencies
- Proportions
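For categorical data, base R's table() gives the frequencies and prop.table() converts them to proportions. This is a minimal sketch on a made-up factor, color, which is not part of the Section 12.2 data.

```r
# Hypothetical categorical variable (not the Section 12.2 data)
color <- factor(c("red", "blue", "red", "green", "red", "blue"))

freq <- table(color)      # frequency (count) of each category
prop <- prop.table(freq)  # proportion of each category (count / total)

freq
prop
round(100 * prop, 1)      # the same proportions expressed as percentages
```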
Using our example data frame, dat, from Section 12.2, we can summarize the data with the summary() function.
summary(dat)
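Note that summary() automatically chooses statistics appropriate to the type of each column. For numerical data, it shows the minimum, 25th percentile (first quartile), median (50th percentile), mean (average), 75th percentile (third quartile), and maximum. For categorical data stored as factors, it shows the frequency of each level. By default, at most 7 lines are shown per column, and when a factor has more levels than that, the least frequent levels are pooled into a final line labeled (Other).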
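To see this behavior on individual vectors, here is a small sketch using made-up data that is not part of dat. The maxsum argument controls how many lines a factor summary shows. summary() on a data frame passes maxsum = 7 by default, and the sketch lowers it to 3 so that the pooling into (Other) is visible with only four levels. It also shows that a plain character vector is not summarized as categorical data.

```r
# Hypothetical numeric vector: summary() reports six statistics
x <- c(1, 2, 3, 4, 100)
s_num <- summary(x)
s_num    # Min. 1, 1st Qu. 2, Median 3, Mean 22, 3rd Qu. 4, Max. 100

# Hypothetical factor with four levels
f <- factor(c(rep("a", 5), rep("b", 3), rep("c", 2), "d"))
summary(f)    # counts for every level: a 5, b 3, c 2, d 1

# With maxsum = 3, the 2 most frequent levels are listed and the
# remaining levels are pooled into a final "(Other)" entry
s_fac <- summary(f, maxsum = 3)
s_fac    # a 5, b 3, (Other) 3

# A character vector is not treated as categorical: summary() reports
# only its length, class, and mode
summary(c("a", "b", "a"))
```

The single large value 100 pulls the mean (22) far above the median (3), which is one reason summary() reports both.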
Note that summary() does not report the variance or the standard deviation; these can be computed with the base R functions var() and sd().
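One way to get the variance and standard deviation of every numerical column at once is to select the numeric columns and apply var() or sd() to each with sapply(). The data frame below, example_dat, is a hypothetical stand-in for dat.

```r
# Hypothetical data frame standing in for dat (not the Section 12.2 data)
example_dat <- data.frame(
  height = c(160, 170, 180),
  weight = c(55, 65, 90),
  group  = factor(c("A", "B", "A"))
)

# Logical vector: TRUE for numeric columns, FALSE for the factor
num_cols <- sapply(example_dat, is.numeric)

vars <- sapply(example_dat[num_cols], var)   # variance of each numeric column
sds  <- sapply(example_dat[num_cols], sd)    # standard deviation of each column
vars
sds
```

Selecting the numeric columns first avoids calling var() or sd() on the factor column, which would not produce a meaningful result.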
The psych package offers a convenient way to obtain descriptive statistics. Refer to Sections 3.1 and 3.2 for details on installing and loading packages. To use this package, check that it is listed in the Packages tab in the lower right-hand pane of RStudio. If it is not listed, click Install and follow the instructions in Section 3.1.
library(psych)
describe(dat)
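The output now includes the standard deviation (sd), along with several other statistics such as the trimmed mean, the median absolute deviation (mad), skew, kurtosis, and the standard error (se). The describe() function also computes these statistics for the categorical variables, where they are meaningless, but it marks those variables with an asterisk (*) after their names.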
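If you prefer not to see the meaningless statistics for categorical variables at all, one option is to keep only the numeric columns before calling describe(). This sketch uses a hypothetical data frame, example_dat, in place of dat. It calls psych::describe() only when the psych package is installed, so the column selection can be tried on its own.

```r
# Hypothetical data frame standing in for dat
example_dat <- data.frame(
  score = c(12, 15, 9, 20),
  age   = c(21, 34, 28, 45),
  sex   = factor(c("F", "M", "F", "M"))
)

# Keep only the numeric columns so that describe() does not compute
# statistics for the categorical variable
num_dat <- example_dat[sapply(example_dat, is.numeric)]

# Run describe() only if the psych package is installed
if (requireNamespace("psych", quietly = TRUE)) {
  print(psych::describe(num_dat))
}
```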