10.5 Draw a Random Sample

In certain analyses, you may want to take a random sample from your original data.

Again, let us remind ourselves of the data in dat from Section 10.1 with 11 observations and 3 variables.

         time   t    accum
1        Zero 0.0 1.000000
2    One Half 0.5 1.024695
3         One 1.0 1.050000
4    One+Half 1.5 1.075930
5         Two 2.0 1.102500
6    Two+Half 2.5 1.129726
7       Three 3.0 1.157625
8  Three+Half 3.5 1.186213
9        Four 4.0 1.215506
10  Four+Half 4.5 1.245523
11       Five 5.0 1.276282

10.5.1 Without Replacement

Suppose we want to create a new data frame that randomly selects a unique row with each draw. Thus, once a row is selected, that row is not permitted to be sampled again. This type of sampling is called sampling without replacement.

To sample five rows without replacement from dat we use the following command:

dat.wo <- dat[sample(nrow(dat), size = 5, replace = FALSE), ]
dat.wo

Take a look at your new data frame dat.wo. It has 5 rows, but note that they are generally not ordered by time as in the original dat, because the rows are drawn at random.

The function sample() takes nrow(dat), the number of rows in dat, and randomly samples size = 5 row numbers from 1 to nrow(dat). The rows associated with the sampled row numbers are retained in the new data frame. These row numbers go in the r part of the [r, c] index of the data frame, which is why the sample() call appears before the comma; leaving the c part empty keeps all columns. This is similar to how we took a small subset of the data in Section 10.4. Finally, replace = FALSE tells sample() to sample without replacement.
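To make the two steps explicit, the following minimal sketch separates the drawing of row numbers from the row indexing. It rebuilds dat from the printed values (the formula accum = 1.05^t is an assumption that matches the table, not necessarily how Section 10.1 created it). The seed value and the names idx and dat.wo.sorted are illustrative choices, not part of the original code.

```r
# Rebuild dat; the 1.05^t formula is an assumption consistent with the table
dat <- data.frame(
  time = c("Zero", "One Half", "One", "One+Half", "Two", "Two+Half",
           "Three", "Three+Half", "Four", "Four+Half", "Five"),
  t = seq(0, 5, by = 0.5)
)
dat$accum <- 1.05^dat$t

set.seed(123)  # arbitrary seed so the draw can be repeated

# Step 1: draw 5 distinct row numbers from 1 to nrow(dat)
idx <- sample(nrow(dat), size = 5, replace = FALSE)

# Step 2: keep those rows (before the comma) and all columns (after it)
dat.wo <- dat[idx, ]
dat.wo

# Optional: put the sampled rows back in time order
dat.wo.sorted <- dat.wo[order(dat.wo$t), ]
dat.wo.sorted
```

Setting a seed is optional; without it, each run returns a different sample.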

10.5.2 With Replacement

Suppose we want to create a new data frame that randomly selects a row with each draw, but this time we do not care if we repeat rows selected previously. This type of sampling is called sampling with replacement.

To sample five rows with replacement from dat we use the following command:

dat.with <- dat[sample(nrow(dat), size = 5, replace = TRUE), ]
dat.with

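Take a look at your new data frame dat.with. It has 5 rows, but note that they are generally not ordered by time as in the original dat, as the rows are sampled at random. Because each row remains available after it is drawn, the same row may appear more than once.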

Here we use the same code as in Section 10.5.1 and simply change replace = FALSE to replace = TRUE.
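One practical consequence of this switch can be sketched as follows: with replace = TRUE we may draw more rows than dat contains, whereas the same request with replace = FALSE stops with an error. The sketch rebuilds dat from the printed values (the formula accum = 1.05^t is an assumption that matches the table); the seed and the names idx, dat.big, n.repeats and res are illustrative only.

```r
# Rebuild dat; the 1.05^t formula is an assumption consistent with the table
dat <- data.frame(
  time = c("Zero", "One Half", "One", "One+Half", "Two", "Two+Half",
           "Three", "Three+Half", "Four", "Four+Half", "Five"),
  t = seq(0, 5, by = 0.5)
)
dat$accum <- 1.05^dat$t

set.seed(123)  # arbitrary seed so the draw can be repeated

# Draw 15 rows from 11: only possible because rows may repeat
idx <- sample(nrow(dat), size = 15, replace = TRUE)
dat.big <- dat[idx, ]
dat.big  # repeated rows get row names with suffixes such as .1

# Count how many draws repeated an earlier row number
n.repeats <- sum(duplicated(idx))
n.repeats

# The same request without replacement fails; capture the error message
res <- tryCatch(
  sample(nrow(dat), size = 15, replace = FALSE),
  error = function(e) conditionMessage(e)
)
res
```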

The statistical technique called bootstrapping uses random sampling with replacement.
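As a rough illustration of the idea, the sketch below repeatedly resamples all rows of dat with replacement and records the mean of accum each time; the spread of those means gives a bootstrap estimate of the standard error. This is a minimal sketch under stated assumptions, not a full bootstrap analysis: dat is rebuilt using accum = 1.05^t (an assumption that matches the table), the number of resamples B = 1000 and the seed are arbitrary, and accum is used only because it is the numeric column at hand.

```r
# Rebuild dat; the 1.05^t formula is an assumption consistent with the table
dat <- data.frame(
  time = c("Zero", "One Half", "One", "One+Half", "Two", "Two+Half",
           "Three", "Three+Half", "Four", "Four+Half", "Five"),
  t = seq(0, 5, by = 0.5)
)
dat$accum <- 1.05^dat$t

set.seed(123)  # arbitrary seed so the resamples can be repeated
B <- 1000      # number of bootstrap resamples (illustrative choice)

# Each resample draws nrow(dat) row numbers with replacement
# and computes the mean of accum over the resampled rows
boot.means <- replicate(B, {
  idx <- sample(nrow(dat), size = nrow(dat), replace = TRUE)
  mean(dat$accum[idx])
})

# Standard deviation of the resampled means: bootstrap standard error
boot.se <- sd(boot.means)
boot.se
```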