Homework

This section provides optional exercises for those who want to practise and solidify the concepts introduced in the practicals. The exercises focus on subsetting data frames using the penguins dataset and more advanced vector subsetting with conditionals.

Important: This is entirely optional

Some of the later exercises are quite advanced, so you may choose to skip them for now and revisit them later, after the course, to practise what you’ve learnt.

Subsetting data frames

The penguins dataset contains information about 344 penguins, including 3 different species, collected from 3 islands in the Palmer Archipelago, Antarctica.

  1. Load the penguins dataset by typing data(penguins) and inspect the first few rows.
data(penguins)
head(penguins)
  species    island bill_len bill_dep flipper_len body_mass    sex year
1  Adelie Torgersen     39.1     18.7         181      3750   male 2007
2  Adelie Torgersen     39.5     17.4         186      3800 female 2007
3  Adelie Torgersen     40.3     18.0         195      3250 female 2007
4  Adelie Torgersen       NA       NA          NA        NA   <NA> 2007
5  Adelie Torgersen     36.7     19.3         193      3450 female 2007
6  Adelie Torgersen     39.3     20.6         190      3650   male 2007
names(penguins)
[1] "species"     "island"      "bill_len"    "bill_dep"    "flipper_len"
[6] "body_mass"   "sex"         "year"       
nrow(penguins)
[1] 344

The penguins dataset was added to R in version 4.5.0. If you’re using an earlier version of R, you’ll need to install the palmerpenguins package to access the data:

library(palmerpenguins)
head(penguins)

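You can check your current R version with sessionInfo() or R.version.string. Note that some of the variable names differ in the package version of the dataset: bill_len, bill_dep, flipper_len and body_mass are called bill_length_mm, bill_depth_mm, flipper_length_mm and body_mass_g there, so adjust the code in these exercises accordingly.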
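If you are working with the package version, one option is to rename its columns so that the rest of the exercises run unchanged. The snippet below is a minimal sketch of that idea. It uses a three-row stand-in data frame (pkg_penguins, a hypothetical name) rather than the real data, so that it runs on any R version; with the package installed you would apply the same renaming to palmerpenguins::penguins.

```r
# Toy stand-in with the package-style column names (first three rows only).
# 'pkg_penguins' and 'new_names' are illustrative names, not part of the course.
pkg_penguins <- data.frame(
  species = c("Adelie", "Adelie", "Adelie"),
  island = c("Torgersen", "Torgersen", "Torgersen"),
  bill_length_mm = c(39.1, 39.5, 40.3),
  bill_depth_mm = c(18.7, 17.4, 18.0),
  flipper_length_mm = c(181L, 186L, 195L),
  body_mass_g = c(3750L, 3800L, 3250L),
  sex = c("male", "female", "female"),
  year = c(2007L, 2007L, 2007L)
)

# Lookup table: package-style name on the left, base R (>= 4.5.0) name on the right
new_names <- c(
  bill_length_mm = "bill_len",
  bill_depth_mm = "bill_dep",
  flipper_length_mm = "flipper_len",
  body_mass_g = "body_mass"
)

# Rename only the columns that appear in the lookup table
to_rename <- names(pkg_penguins) %in% names(new_names)
names(pkg_penguins)[to_rename] <- unname(new_names[names(pkg_penguins)[to_rename]])

names(pkg_penguins)
```

Columns that are not in the lookup table (species, island, sex, year) are left untouched.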

  1. Select the third column (bill_len) by name:
     1. Using the $ syntax;
     2. Using single square brackets (penguins[...]);
     3. Using double square brackets (penguins[[...]]).
penguins$bill_len
  [1] 39.1 39.5 40.3   NA 36.7 39.3 38.9 39.2 34.1 42.0 37.8 37.8 41.1 38.6 34.6
 [16] 36.6 38.7 42.5 34.4 46.0 37.8 37.7 35.9 38.2 38.8 35.3 40.6 40.5 37.9 40.5
 [31] 39.5 37.2 39.5 40.9 36.4 39.2 38.8 42.2 37.6 39.8 36.5 40.8 36.0 44.1 37.0
 [46] 39.6 41.1 37.5 36.0 42.3 39.6 40.1 35.0 42.0 34.5 41.4 39.0 40.6 36.5 37.6
 [61] 35.7 41.3 37.6 41.1 36.4 41.6 35.5 41.1 35.9 41.8 33.5 39.7 39.6 45.8 35.5
 [76] 42.8 40.9 37.2 36.2 42.1 34.6 42.9 36.7 35.1 37.3 41.3 36.3 36.9 38.3 38.9
 [91] 35.7 41.1 34.0 39.6 36.2 40.8 38.1 40.3 33.1 43.2 35.0 41.0 37.7 37.8 37.9
[106] 39.7 38.6 38.2 38.1 43.2 38.1 45.6 39.7 42.2 39.6 42.7 38.6 37.3 35.7 41.1
[121] 36.2 37.7 40.2 41.4 35.2 40.6 38.8 41.5 39.0 44.1 38.5 43.1 36.8 37.5 38.1
[136] 41.1 35.6 40.2 37.0 39.7 40.2 40.6 32.1 40.7 37.3 39.0 39.2 36.6 36.0 37.8
[151] 36.0 41.5 46.1 50.0 48.7 50.0 47.6 46.5 45.4 46.7 43.3 46.8 40.9 49.0 45.5
[166] 48.4 45.8 49.3 42.0 49.2 46.2 48.7 50.2 45.1 46.5 46.3 42.9 46.1 44.5 47.8
[181] 48.2 50.0 47.3 42.8 45.1 59.6 49.1 48.4 42.6 44.4 44.0 48.7 42.7 49.6 45.3
[196] 49.6 50.5 43.6 45.5 50.5 44.9 45.2 46.6 48.5 45.1 50.1 46.5 45.0 43.8 45.5
[211] 43.2 50.4 45.3 46.2 45.7 54.3 45.8 49.8 46.2 49.5 43.5 50.7 47.7 46.4 48.2
[226] 46.5 46.4 48.6 47.5 51.1 45.2 45.2 49.1 52.5 47.4 50.0 44.9 50.8 43.4 51.3
[241] 47.5 52.1 47.5 52.2 45.5 49.5 44.5 50.8 49.4 46.9 48.4 51.1 48.5 55.9 47.2
[256] 49.1 47.3 46.8 41.7 53.4 43.3 48.1 50.5 49.8 43.5 51.5 46.2 55.1 44.5 48.8
[271] 47.2   NA 46.8 50.4 45.2 49.9 46.5 50.0 51.3 45.4 52.7 45.2 46.1 51.3 46.0
[286] 51.3 46.6 51.7 47.0 52.0 45.9 50.5 50.3 58.0 46.4 49.2 42.4 48.5 43.2 50.6
[301] 46.7 52.0 50.5 49.5 46.4 52.8 40.9 54.2 42.5 51.0 49.7 47.5 47.6 52.0 46.9
[316] 53.5 49.0 46.2 50.9 45.5 50.9 50.8 50.1 49.0 51.5 49.8 48.1 51.4 45.7 50.7
[331] 42.5 52.2 45.2 49.3 50.2 45.6 51.9 46.8 45.7 55.8 43.5 49.6 50.8 50.2
penguins[, "bill_len"]
  [1] 39.1 39.5 40.3   NA 36.7 39.3 38.9 39.2 34.1 42.0 37.8 37.8 41.1 38.6 34.6
 [16] 36.6 38.7 42.5 34.4 46.0 37.8 37.7 35.9 38.2 38.8 35.3 40.6 40.5 37.9 40.5
 [31] 39.5 37.2 39.5 40.9 36.4 39.2 38.8 42.2 37.6 39.8 36.5 40.8 36.0 44.1 37.0
 [46] 39.6 41.1 37.5 36.0 42.3 39.6 40.1 35.0 42.0 34.5 41.4 39.0 40.6 36.5 37.6
 [61] 35.7 41.3 37.6 41.1 36.4 41.6 35.5 41.1 35.9 41.8 33.5 39.7 39.6 45.8 35.5
 [76] 42.8 40.9 37.2 36.2 42.1 34.6 42.9 36.7 35.1 37.3 41.3 36.3 36.9 38.3 38.9
 [91] 35.7 41.1 34.0 39.6 36.2 40.8 38.1 40.3 33.1 43.2 35.0 41.0 37.7 37.8 37.9
[106] 39.7 38.6 38.2 38.1 43.2 38.1 45.6 39.7 42.2 39.6 42.7 38.6 37.3 35.7 41.1
[121] 36.2 37.7 40.2 41.4 35.2 40.6 38.8 41.5 39.0 44.1 38.5 43.1 36.8 37.5 38.1
[136] 41.1 35.6 40.2 37.0 39.7 40.2 40.6 32.1 40.7 37.3 39.0 39.2 36.6 36.0 37.8
[151] 36.0 41.5 46.1 50.0 48.7 50.0 47.6 46.5 45.4 46.7 43.3 46.8 40.9 49.0 45.5
[166] 48.4 45.8 49.3 42.0 49.2 46.2 48.7 50.2 45.1 46.5 46.3 42.9 46.1 44.5 47.8
[181] 48.2 50.0 47.3 42.8 45.1 59.6 49.1 48.4 42.6 44.4 44.0 48.7 42.7 49.6 45.3
[196] 49.6 50.5 43.6 45.5 50.5 44.9 45.2 46.6 48.5 45.1 50.1 46.5 45.0 43.8 45.5
[211] 43.2 50.4 45.3 46.2 45.7 54.3 45.8 49.8 46.2 49.5 43.5 50.7 47.7 46.4 48.2
[226] 46.5 46.4 48.6 47.5 51.1 45.2 45.2 49.1 52.5 47.4 50.0 44.9 50.8 43.4 51.3
[241] 47.5 52.1 47.5 52.2 45.5 49.5 44.5 50.8 49.4 46.9 48.4 51.1 48.5 55.9 47.2
[256] 49.1 47.3 46.8 41.7 53.4 43.3 48.1 50.5 49.8 43.5 51.5 46.2 55.1 44.5 48.8
[271] 47.2   NA 46.8 50.4 45.2 49.9 46.5 50.0 51.3 45.4 52.7 45.2 46.1 51.3 46.0
[286] 51.3 46.6 51.7 47.0 52.0 45.9 50.5 50.3 58.0 46.4 49.2 42.4 48.5 43.2 50.6
[301] 46.7 52.0 50.5 49.5 46.4 52.8 40.9 54.2 42.5 51.0 49.7 47.5 47.6 52.0 46.9
[316] 53.5 49.0 46.2 50.9 45.5 50.9 50.8 50.1 49.0 51.5 49.8 48.1 51.4 45.7 50.7
[331] 42.5 52.2 45.2 49.3 50.2 45.6 51.9 46.8 45.7 55.8 43.5 49.6 50.8 50.2
penguins[["bill_len"]]
  [1] 39.1 39.5 40.3   NA 36.7 39.3 38.9 39.2 34.1 42.0 37.8 37.8 41.1 38.6 34.6
 [16] 36.6 38.7 42.5 34.4 46.0 37.8 37.7 35.9 38.2 38.8 35.3 40.6 40.5 37.9 40.5
 [31] 39.5 37.2 39.5 40.9 36.4 39.2 38.8 42.2 37.6 39.8 36.5 40.8 36.0 44.1 37.0
 [46] 39.6 41.1 37.5 36.0 42.3 39.6 40.1 35.0 42.0 34.5 41.4 39.0 40.6 36.5 37.6
 [61] 35.7 41.3 37.6 41.1 36.4 41.6 35.5 41.1 35.9 41.8 33.5 39.7 39.6 45.8 35.5
 [76] 42.8 40.9 37.2 36.2 42.1 34.6 42.9 36.7 35.1 37.3 41.3 36.3 36.9 38.3 38.9
 [91] 35.7 41.1 34.0 39.6 36.2 40.8 38.1 40.3 33.1 43.2 35.0 41.0 37.7 37.8 37.9
[106] 39.7 38.6 38.2 38.1 43.2 38.1 45.6 39.7 42.2 39.6 42.7 38.6 37.3 35.7 41.1
[121] 36.2 37.7 40.2 41.4 35.2 40.6 38.8 41.5 39.0 44.1 38.5 43.1 36.8 37.5 38.1
[136] 41.1 35.6 40.2 37.0 39.7 40.2 40.6 32.1 40.7 37.3 39.0 39.2 36.6 36.0 37.8
[151] 36.0 41.5 46.1 50.0 48.7 50.0 47.6 46.5 45.4 46.7 43.3 46.8 40.9 49.0 45.5
[166] 48.4 45.8 49.3 42.0 49.2 46.2 48.7 50.2 45.1 46.5 46.3 42.9 46.1 44.5 47.8
[181] 48.2 50.0 47.3 42.8 45.1 59.6 49.1 48.4 42.6 44.4 44.0 48.7 42.7 49.6 45.3
[196] 49.6 50.5 43.6 45.5 50.5 44.9 45.2 46.6 48.5 45.1 50.1 46.5 45.0 43.8 45.5
[211] 43.2 50.4 45.3 46.2 45.7 54.3 45.8 49.8 46.2 49.5 43.5 50.7 47.7 46.4 48.2
[226] 46.5 46.4 48.6 47.5 51.1 45.2 45.2 49.1 52.5 47.4 50.0 44.9 50.8 43.4 51.3
[241] 47.5 52.1 47.5 52.2 45.5 49.5 44.5 50.8 49.4 46.9 48.4 51.1 48.5 55.9 47.2
[256] 49.1 47.3 46.8 41.7 53.4 43.3 48.1 50.5 49.8 43.5 51.5 46.2 55.1 44.5 48.8
[271] 47.2   NA 46.8 50.4 45.2 49.9 46.5 50.0 51.3 45.4 52.7 45.2 46.1 51.3 46.0
[286] 51.3 46.6 51.7 47.0 52.0 45.9 50.5 50.3 58.0 46.4 49.2 42.4 48.5 43.2 50.6
[301] 46.7 52.0 50.5 49.5 46.4 52.8 40.9 54.2 42.5 51.0 49.7 47.5 47.6 52.0 46.9
[316] 53.5 49.0 46.2 50.9 45.5 50.9 50.8 50.1 49.0 51.5 49.8 48.1 51.4 45.7 50.7
[331] 42.5 52.2 45.2 49.3 50.2 45.6 51.9 46.8 45.7 55.8 43.5 49.6 50.8 50.2
  1. Select the fifth column (flipper_len) by position:
     1. Using single square brackets (penguins[...]);
     2. Using double square brackets (penguins[[...]]).
penguins[[5]]
  [1] 181 186 195  NA 193 190 181 195 193 190 186 180 182 191 198 185 195 197
 [19] 184 194 174 180 189 185 180 187 183 187 172 180 178 178 188 184 195 196
 [37] 190 180 181 184 182 195 186 196 185 190 182 179 190 191 186 188 190 200
 [55] 187 191 186 193 181 194 185 195 185 192 184 192 195 188 190 198 190 190
 [73] 196 197 190 195 191 184 187 195 189 196 187 193 191 194 190 189 189 190
 [91] 202 205 185 186 187 208 190 196 178 192 192 203 183 190 193 184 199 190
[109] 181 197 198 191 193 197 191 196 188 199 189 189 187 198 176 202 186 199
[127] 191 195 191 210 190 197 193 199 187 190 191 200 185 193 193 187 188 190
[145] 192 185 190 184 195 193 187 201 211 230 210 218 215 210 211 219 209 215
[163] 214 216 214 213 210 217 210 221 209 222 218 215 213 215 215 215 216 215
[181] 210 220 222 209 207 230 220 220 213 219 208 208 208 225 210 216 222 217
[199] 210 225 213 215 210 220 210 225 217 220 208 220 208 224 208 221 214 231
[217] 219 230 214 229 220 223 216 221 221 217 216 230 209 220 215 223 212 221
[235] 212 224 212 228 218 218 212 230 218 228 212 224 214 226 216 222 203 225
[253] 219 228 215 228 216 215 210 219 208 209 216 229 213 230 217 230 217 222
[271] 214  NA 215 222 212 213 192 196 193 188 197 198 178 197 195 198 193 194
[289] 185 201 190 201 197 181 190 195 181 191 187 193 195 197 200 200 191 205
[307] 187 201 187 203 195 199 195 210 192 205 210 187 196 196 196 201 190 212
[325] 187 198 199 201 193 203 187 197 191 203 202 194 206 189 195 207 202 193
[343] 210 198
penguins[5]
    flipper_len
1           181
2           186
3           195
4            NA
5           193
6           190
7           181
8           195
9           193
10          190
11          186
12          180
13          182
14          191
15          198
16          185
17          195
18          197
19          184
20          194
21          174
22          180
23          189
24          185
25          180
26          187
27          183
28          187
29          172
30          180
31          178
32          178
33          188
34          184
35          195
36          196
37          190
38          180
39          181
40          184
41          182
42          195
43          186
44          196
45          185
46          190
47          182
48          179
49          190
50          191
51          186
52          188
53          190
54          200
55          187
56          191
57          186
58          193
59          181
60          194
61          185
62          195
63          185
64          192
65          184
66          192
67          195
68          188
69          190
70          198
71          190
72          190
73          196
74          197
75          190
76          195
77          191
78          184
79          187
80          195
81          189
82          196
83          187
84          193
85          191
86          194
87          190
88          189
89          189
90          190
91          202
92          205
93          185
94          186
95          187
96          208
97          190
98          196
99          178
100         192
101         192
102         203
103         183
104         190
105         193
106         184
107         199
108         190
109         181
110         197
111         198
112         191
113         193
114         197
115         191
116         196
117         188
118         199
119         189
120         189
121         187
122         198
123         176
124         202
125         186
126         199
127         191
128         195
129         191
130         210
131         190
132         197
133         193
134         199
135         187
136         190
137         191
138         200
139         185
140         193
141         193
142         187
143         188
144         190
145         192
146         185
147         190
148         184
149         195
150         193
151         187
152         201
153         211
154         230
155         210
156         218
157         215
158         210
159         211
160         219
161         209
162         215
163         214
164         216
165         214
166         213
167         210
168         217
169         210
170         221
171         209
172         222
173         218
174         215
175         213
176         215
177         215
178         215
179         216
180         215
181         210
182         220
183         222
184         209
185         207
186         230
187         220
188         220
189         213
190         219
191         208
192         208
193         208
194         225
195         210
196         216
197         222
198         217
199         210
200         225
201         213
202         215
203         210
204         220
205         210
206         225
207         217
208         220
209         208
210         220
211         208
212         224
213         208
214         221
215         214
216         231
217         219
218         230
219         214
220         229
221         220
222         223
223         216
224         221
225         221
226         217
227         216
228         230
229         209
230         220
231         215
232         223
233         212
234         221
235         212
236         224
237         212
238         228
239         218
240         218
241         212
242         230
243         218
244         228
245         212
246         224
247         214
248         226
249         216
250         222
251         203
252         225
253         219
254         228
255         215
256         228
257         216
258         215
259         210
260         219
261         208
262         209
263         216
264         229
265         213
266         230
267         217
268         230
269         217
270         222
271         214
272          NA
273         215
274         222
275         212
276         213
277         192
278         196
279         193
280         188
281         197
282         198
283         178
284         197
285         195
286         198
287         193
288         194
289         185
290         201
291         190
292         201
293         197
294         181
295         190
296         195
297         181
298         191
299         187
300         193
301         195
302         197
303         200
304         200
305         191
306         205
307         187
308         201
309         187
310         203
311         195
312         199
313         195
314         210
315         192
316         205
317         210
318         187
319         196
320         196
321         196
322         201
323         190
324         212
325         187
326         198
327         199
328         201
329         193
330         203
331         187
332         197
333         191
334         203
335         202
336         194
337         206
338         189
339         195
340         207
341         202
342         193
343         210
344         198

What’s the difference between the single ([]) and double brackets ([[]])?
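One way to explore this question is to compare the class of each result. The sketch below uses a small stand-in data frame (peng, a hypothetical name holding three rows of the data) so that it runs on any R version.

```r
# Toy stand-in for the penguins data frame (values from the first three rows)
peng <- data.frame(
  species = c("Adelie", "Adelie", "Adelie"),
  flipper_len = c(181L, 186L, 195L)
)

single <- peng["flipper_len"]    # still a data frame, with one column
double <- peng[["flipper_len"]]  # the column itself, as a plain vector

class(single)
class(double)

# With a comma, single brackets on a base data frame drop a single
# column to a vector by default; drop = FALSE keeps the data frame.
class(peng[, "flipper_len"])
class(peng[, "flipper_len", drop = FALSE])
```

In short, single brackets select a subset of the container, while double brackets extract one element from it. If you loaded the data from palmerpenguins, the object is a tibble, and tibbles keep the tibble structure for the comma form as well.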

Creating new columns

With the penguins data frame:

  1. Tabulate the number of missing values (NA) in the sex column.
Hint

You’ll need to use the is.na and table functions.

table(is.na(penguins[["sex"]]))

FALSE  TRUE 
  333    11 
# Or, equivalently:
table(is.na(penguins$sex))

FALSE  TRUE 
  333    11 
# This would also work, but is fairly verbose:
sum(is.na(penguins$sex))
[1] 11
sum(!is.na(penguins$sex))
[1] 333
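As a further variation, table can count the missing values alongside the observed categories through its useNA argument. This is a small sketch on a six-element stand-in vector (the sex values from the first six rows shown earlier), not on the full dataset.

```r
# Stand-in for penguins$sex, using the first six rows of the data
sex <- factor(c("male", "female", "female", NA, "female", "male"))

# Missing versus non-missing, as in the answers above
na_counts <- table(is.na(sex))
na_counts

# Counts per category, with NA reported as its own group when present
sex_counts <- table(sex, useNA = "ifany")
sex_counts
```

By default table silently drops NA values, so useNA is worth remembering whenever a column may contain missing data.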
  1. Using assignment, create a new column containing bill_len stored as an integer.
penguins$bill_int <- as.integer(penguins$bill_len)
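Note that as.integer discards the fractional part rather than rounding to the nearest whole number. The sketch below illustrates the difference on the first four bill_len values; whether truncation or rounding is wanted depends on the analysis, so treat the second line as an alternative rather than a correction.

```r
# First four bill_len values from the dataset
bill_len <- c(39.1, 39.5, 40.3, NA)

truncated <- as.integer(bill_len)         # drops the decimals: 39 39 40 NA
rounded <- as.integer(round(bill_len))    # rounds first:       39 40 40 NA

truncated
rounded
```

Missing values stay missing in both cases.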
  1. Create a new column containing bill_dep formatted to two decimal places.
Hint

You’ll need to use sprintf for this, setting the fmt argument to "%.2f". See the help file (?sprintf) for details.

penguins$bill_depth_2dp <- sprintf("%.2f", penguins$bill_dep)
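One thing to be aware of is that sprintf returns text, not numbers, so the new column can no longer be used in arithmetic. The sketch below shows this on three stand-in values and, as one possible approach, restores missing values that sprintf has turned into the string "NA".

```r
# Stand-in for penguins$bill_dep (two observed values and one missing)
bill_dep <- c(18.7, 17.4, NA)

formatted <- sprintf("%.2f", bill_dep)
class(formatted)   # "character": the values are now strings

# sprintf writes missing values as text; put real NAs back if needed
formatted[is.na(bill_dep)] <- NA
formatted
```

If you need to keep a numeric column, round(bill_dep, 2) is the usual alternative, although it does not pad with trailing zeros when printed.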
  1. Use paste to concatenate the species and island columns. Store the result as a new column.
penguins$species_island <- paste(penguins$species, penguins$island)
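By default paste joins its arguments with a single space. The sketch below, on three stand-in values, shows how the sep argument and paste0 change the separator; the underscore is only an example choice.

```r
# Stand-in values; the real columns are factors, which paste converts to text
species <- factor(c("Adelie", "Adelie", "Gentoo"))
island <- factor(c("Torgersen", "Torgersen", "Biscoe"))

with_space <- paste(species, island)                 # default: sep = " "
with_underscore <- paste(species, island, sep = "_") # custom separator
no_separator <- paste0(species, island)              # no separator at all

with_space
with_underscore
no_separator
```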

Months of the year

  1. Define a character vector containing the months of the year (i.e., January, February, …, December).
moy <- c(
  "January",
  "February",
  "March",
  "April",
  "May",
  "June",
  "July",
  "August",
  "September",
  "October",
  "November",
  "December"
)

# Or use the built-in constant 'month.name'
moy <- month.name

# NOTE: I'm using 'moy' as short for 'Months of the year', but you can use any
# label you like.
  1. Select the third element of this vector.
moy[3]
[1] "March"
  1. Select the 6th, 7th, and 8th elements of this vector.
moy[c(6, 7, 8)]
[1] "June"   "July"   "August"
# Or, even shorter:
moy[6:8]
[1] "June"   "July"   "August"
  1. Select the last five elements of this vector.
moy[8:12]
[1] "August"    "September" "October"   "November"  "December" 
# The above answer will work, but it assumes that
# there are always 12 elements in the vector. This
# is true for this example, but it's good practice
# to write code that can handle various input lengths.

tail(moy, 5)
[1] "August"    "September" "October"   "November"  "December" 
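If you prefer to stay with square brackets, the same length-independent result can be built from length(). This is a minimal sketch; n is just a convenience variable.

```r
moy <- month.name
n <- length(moy)

# The parentheses matter: ':' is evaluated before '-',
# so n - 4:n would not give the intended range.
last_five <- moy[(n - 4):n]
last_five

# Same result as the tail() answer
identical(last_five, tail(moy, 5))
```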
  1. (Harder) Select all months ending in “er” (Hint: str_ends).

str_ends is a function from the stringr package. It returns a logical vector (TRUE or FALSE for each element) indicating whether each string ends with a specified pattern.

To use the function, you’ll first need to load stringr, most conveniently by loading the tidyverse, which includes it:

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.1     ✔ stringr   1.5.2
✔ ggplot2   4.0.0     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.1.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
moy[str_ends(moy, "er")]
[1] "September" "October"   "November"  "December" 
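If you would rather not load any packages, base R offers endsWith, which matches a fixed suffix. The sketch below shows the intermediate logical vector as well, since that is what does the subsetting.

```r
moy <- month.name

# One TRUE/FALSE per month: does the name end in "er"?
ends_er <- endsWith(moy, "er")
ends_er

# Logical indexing keeps the elements where the mask is TRUE
er_months <- moy[ends_er]
er_months
```

Unlike str_ends, endsWith does not interpret its second argument as a regular expression, which makes no difference for a plain suffix such as "er".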
  1. (Harder) Select all months starting with “A” (Hint: str_starts).
moy[str_starts(moy, "A")]
[1] "April"  "August"
# As always, there are multiple ways of achieving this.
# The above method is quite efficient, but you could
# also try:

# Using regular expressions:
moy[str_detect(moy, "^A")]
[1] "April"  "August"
# Using purrr
keep(moy, \(x) str_detect(x, "^A"))
[1] "April"  "August"
# Or, if you really want to over-complicate it:
a_months <- c()
for (m in moy) {
  if (substring(m, 1, 1) == "A") {
    a_months <- c(a_months, m)
  }
}  # You shouldn't do this! But try to see what's happening here.
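To see what the loop is doing, it can help to break the vectorised version into the same steps. The sketch below uses only base R; the intermediate names (first_letters, is_a) are illustrative.

```r
moy <- month.name

# Step 1: take the first letter of every month in a single call
first_letters <- substring(moy, 1, 1)

# Step 2: compare each letter with "A" to get a TRUE/FALSE mask
is_a <- first_letters == "A"

# Step 3: keep only the months where the mask is TRUE
a_months <- moy[is_a]
a_months
```

The loop performs the same test one month at a time and grows a_months by one element on each match, which is slower and harder to read than working on the whole vector at once.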