13.2 Normal Distribution

The normal distribution is used to model continuous variables, typically ones whose values cluster symmetrically around a central value. Its density is usually described as a bell-shaped curve. To simulate a value from a normal distribution we need two numbers: a mean and a standard deviation. (The mean and standard deviation, or equivalently the variance, are the parameters of a normal distribution.)
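For reference, the standard formula for the bell-shaped density, written with mean μ and standard deviation σ, makes explicit that these two values are the only parameters involved. With μ = 10 and σ = 4, the curve peaks at x = 10 with height 1/(4√(2π)) ≈ 0.0997.

```latex
f(x \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)
```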

We can easily simulate 1000 values from a normal distribution with a mean of 10 and a standard deviation of 4 as follows:

x <- rnorm(n = 1000, mean = 10, sd = 4)
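Because rnorm() draws random values, each run produces a different x. If you want the same sample every time, for example to match numbers in a report, a common approach is to call set.seed() first. The seed value 123 below is arbitrary.

```r
# Fix the random number generator state so the draws are reproducible
set.seed(123)
x <- rnorm(n = 1000, mean = 10, sd = 4)

# Re-seeding with the same value reproduces exactly the same sample
set.seed(123)
y <- rnorm(n = 1000, mean = 10, sd = 4)
identical(x, y)  # TRUE
```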

Notice the argument names: n = (the number of values to simulate), mean =, and sd =. If we plot a histogram, we can see a roughly bell-shaped curve.

hist(x)
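A histogram drawn on the density scale can be compared directly with the theoretical bell curve. The sketch below uses base R's hist() with freq = FALSE and overlays dnorm() using curve(); the title, line colour, and width are arbitrary choices.

```r
x <- rnorm(n = 1000, mean = 10, sd = 4)

# freq = FALSE puts the bars on the density scale (total bar area = 1)
h <- hist(x, freq = FALSE, main = "1000 draws from a normal with mean 10, sd 4")

# Overlay the theoretical normal density with the same mean and sd;
# inside curve(), 'x' is curve's own plotting variable, not our sample
curve(dnorm(x, mean = 10, sd = 4), add = TRUE, col = "red", lwd = 2)
```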

We can check the mean and standard deviation of the simulated values. We could start with the summary() function:

summary(x)

Note that the mean appears in the output. While it is not exactly 10, it is close to 10. The statistics of a simulated sample will almost never exactly equal the parameters used to generate it (in this case the mean and standard deviation). The larger the sample, the closer the sample statistics tend to be to the values set in the rnorm() function.
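To see the effect of sample size concretely, the sketch below computes the sample mean for increasingly large samples drawn from the same distribution. The particular sample sizes and seed are arbitrary. The error tends to shrink as n grows, though it need not shrink at every single step.

```r
set.seed(1)
sizes <- c(10, 100, 1000, 10000, 100000)

# Sample mean for each sample size, all drawn with mean = 10 and sd = 4
sample_means <- sapply(sizes, function(n) mean(rnorm(n = n, mean = 10, sd = 4)))

# Distance of each sample mean from the true mean of 10
data.frame(n = sizes, mean = sample_means, error = abs(sample_means - 10))
```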

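The standard deviation is not shown. We can compute it with the sd() function, or use the describe() function from the psych package (install the package once with install.packages("psych") if needed).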

sd(x)
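Note that sd() returns the sample standard deviation, which divides by n - 1 rather than n. A quick check by hand:

```r
x <- rnorm(n = 1000, mean = 10, sd = 4)

# Sample standard deviation: square root of (sum of squared deviations / (n - 1))
n <- length(x)
manual_sd <- sqrt(sum((x - mean(x))^2) / (n - 1))

all.equal(manual_sd, sd(x))  # TRUE
```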

library(psych)
describe(x)
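If you would rather not depend on an extra package, a few base-R calls give the key statistics. In this sketch describe_base is a hypothetical helper name, not a function from psych or base R, and it reproduces only some of the columns describe() reports.

```r
# Hypothetical helper: a small base-R stand-in for a few columns of psych::describe()
describe_base <- function(v) {
  c(n = length(v), mean = mean(v), sd = sd(v),
    median = median(v), min = min(v), max = max(v))
}

x <- rnorm(n = 1000, mean = 10, sd = 4)
round(describe_base(x), 2)
```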

Again we see that the standard deviation is close to the parameter of 4.

Simulation is a useful tool because you can rerun it and get a different sample each time. This way you can see how quantities such as the mean, standard deviation, and histogram vary from one sample to another, even though every sample comes from the same underlying distribution.
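Repeating the simulation many times makes this sample-to-sample variation visible. The sketch below uses replicate(); the 500 repetitions and the seed are arbitrary. The spread of the 500 sample means should be close to the theoretical standard error of the mean, 4 / sqrt(1000) ≈ 0.126.

```r
set.seed(42)

# Draw 500 independent samples of size 1000 and record each sample's mean
means <- replicate(500, mean(rnorm(n = 1000, mean = 10, sd = 4)))

hist(means, main = "Sample means from 500 simulated samples")
sd(means)       # empirical spread of the sample means
4 / sqrt(1000)  # theoretical standard error of the mean
```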

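You can change the sample size, the mean, or the standard deviation. Plotting the values helps you see the data. See what happens if we change only the sample size, to 10 instead of 1000. With so few values, expect the sample mean and standard deviation to land further from 10 and 4, and the histogram to look much less bell-shaped.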

x <- rnorm(n = 10, mean = 10, sd = 4)
describe(x)
hist(x)
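Repeating that comparison many times shows how much more the mean varies with only 10 values. A sketch, again using replicate() with an arbitrary seed and repetition count:

```r
set.seed(7)

# 1000 sample means at each sample size
means_n10   <- replicate(1000, mean(rnorm(n = 10,   mean = 10, sd = 4)))
means_n1000 <- replicate(1000, mean(rnorm(n = 1000, mean = 10, sd = 4)))

# Spread of the sample means: roughly 4/sqrt(10) versus 4/sqrt(1000)
c(n10 = sd(means_n10), n1000 = sd(means_n1000))
```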

For more depth on choosing the number of values to simulate, see your favorite statistics book.