This section provides optional exercises for those who want to practise and solidify the concepts introduced in the practicals. The exercises focus on subsetting data frames using the penguins dataset and more advanced vector subsetting with conditionals.
Some of the later exercises are quite advanced, so you may choose to skip them for now and revisit them later, after the course, to practise what you’ve learnt.
Subsetting data frames
The penguins dataset contains information about 344 penguins, including 3 different species, collected from 3 islands in the Palmer Archipelago, Antarctica.
Load the penguins dataset by typing data(penguins), then inspect the first few rows and the column names.
data(penguins)
head(penguins)
names(penguins)
species island bill_len bill_dep flipper_len body_mass sex year
1 Adelie Torgersen 39.1 18.7 181 3750 male 2007
2 Adelie Torgersen 39.5 17.4 186 3800 female 2007
3 Adelie Torgersen 40.3 18.0 195 3250 female 2007
4 Adelie Torgersen NA NA NA NA <NA> 2007
5 Adelie Torgersen 36.7 19.3 193 3450 female 2007
6 Adelie Torgersen 39.3 20.6 190 3650 male 2007
[1] "species" "island" "bill_len" "bill_dep" "flipper_len"
[6] "body_mass" "sex" "year"
The penguins dataset was added to R in version 4.5.0. If you’re using an earlier version of R, you’ll need to install the palmerpenguins package to access the data:
# install.packages("palmerpenguins")  # run once if the package is not installed
library(palmerpenguins)
head(penguins)
You can check your current R version with sessionInfo() or R.version.string. Note that some variable names differ in the package version of the dataset: for example, bill_len is called bill_length_mm and body_mass is called body_mass_g.
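The version check and the naming difference can be sketched as follows. This is a minimal illustration, not part of the original exercises: `toy` is a hypothetical one-row stand-in for the package data, and `name_map` is an illustrative helper that maps the palmerpenguins column names to the shorter names used in this section.

```r
# Is the running R new enough to ship `penguins` in the datasets package?
has_builtin_penguins <- getRversion() >= "4.5.0"
print(R.version.string)
print(has_builtin_penguins)

# palmerpenguins column names (left) and their base R counterparts (right).
name_map <- c(
  bill_length_mm    = "bill_len",
  bill_depth_mm     = "bill_dep",
  flipper_length_mm = "flipper_len",
  body_mass_g       = "body_mass"
)

# Hypothetical stand-in for the package data, used only to show the renaming.
toy <- data.frame(
  species = "Adelie",
  bill_length_mm = 39.1,
  bill_depth_mm = 18.7,
  flipper_length_mm = 181L,
  body_mass_g = 3750L
)

# Rename only the columns that appear in the map; leave the rest untouched.
to_rename <- names(toy) %in% names(name_map)
names(toy)[to_rename] <- unname(name_map[names(toy)[to_rename]])
print(names(toy))
```

With the real package data, the same renaming step would let the rest of the exercises run unchanged.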
Select the third column (bill_len) by name:
Using the $ syntax;
Using single square brackets (penguins[...]);
Using double square brackets (penguins[[...]]).
[1] 39.1 39.5 40.3 NA 36.7 39.3 38.9 39.2 34.1 42.0 37.8 37.8 41.1 38.6 34.6
[16] 36.6 38.7 42.5 34.4 46.0 37.8 37.7 35.9 38.2 38.8 35.3 40.6 40.5 37.9 40.5
[31] 39.5 37.2 39.5 40.9 36.4 39.2 38.8 42.2 37.6 39.8 36.5 40.8 36.0 44.1 37.0
[46] 39.6 41.1 37.5 36.0 42.3 39.6 40.1 35.0 42.0 34.5 41.4 39.0 40.6 36.5 37.6
[61] 35.7 41.3 37.6 41.1 36.4 41.6 35.5 41.1 35.9 41.8 33.5 39.7 39.6 45.8 35.5
[76] 42.8 40.9 37.2 36.2 42.1 34.6 42.9 36.7 35.1 37.3 41.3 36.3 36.9 38.3 38.9
[91] 35.7 41.1 34.0 39.6 36.2 40.8 38.1 40.3 33.1 43.2 35.0 41.0 37.7 37.8 37.9
[106] 39.7 38.6 38.2 38.1 43.2 38.1 45.6 39.7 42.2 39.6 42.7 38.6 37.3 35.7 41.1
[121] 36.2 37.7 40.2 41.4 35.2 40.6 38.8 41.5 39.0 44.1 38.5 43.1 36.8 37.5 38.1
[136] 41.1 35.6 40.2 37.0 39.7 40.2 40.6 32.1 40.7 37.3 39.0 39.2 36.6 36.0 37.8
[151] 36.0 41.5 46.1 50.0 48.7 50.0 47.6 46.5 45.4 46.7 43.3 46.8 40.9 49.0 45.5
[166] 48.4 45.8 49.3 42.0 49.2 46.2 48.7 50.2 45.1 46.5 46.3 42.9 46.1 44.5 47.8
[181] 48.2 50.0 47.3 42.8 45.1 59.6 49.1 48.4 42.6 44.4 44.0 48.7 42.7 49.6 45.3
[196] 49.6 50.5 43.6 45.5 50.5 44.9 45.2 46.6 48.5 45.1 50.1 46.5 45.0 43.8 45.5
[211] 43.2 50.4 45.3 46.2 45.7 54.3 45.8 49.8 46.2 49.5 43.5 50.7 47.7 46.4 48.2
[226] 46.5 46.4 48.6 47.5 51.1 45.2 45.2 49.1 52.5 47.4 50.0 44.9 50.8 43.4 51.3
[241] 47.5 52.1 47.5 52.2 45.5 49.5 44.5 50.8 49.4 46.9 48.4 51.1 48.5 55.9 47.2
[256] 49.1 47.3 46.8 41.7 53.4 43.3 48.1 50.5 49.8 43.5 51.5 46.2 55.1 44.5 48.8
[271] 47.2 NA 46.8 50.4 45.2 49.9 46.5 50.0 51.3 45.4 52.7 45.2 46.1 51.3 46.0
[286] 51.3 46.6 51.7 47.0 52.0 45.9 50.5 50.3 58.0 46.4 49.2 42.4 48.5 43.2 50.6
[301] 46.7 52.0 50.5 49.5 46.4 52.8 40.9 54.2 42.5 51.0 49.7 47.5 47.6 52.0 46.9
[316] 53.5 49.0 46.2 50.9 45.5 50.9 50.8 50.1 49.0 51.5 49.8 48.1 51.4 45.7 50.7
[331] 42.5 52.2 45.2 49.3 50.2 45.6 51.9 46.8 45.7 55.8 43.5 49.6 50.8 50.2
# The other two approaches print exactly the same vector, so their output is not repeated here.
Select the fifth column (flipper_len) by position:
Using single square brackets (penguins[...]);
Using double square brackets (penguins[[...]]).
[1] 181 186 195 NA 193 190 181 195 193 190 186 180 182 191 198 185 195 197
[19] 184 194 174 180 189 185 180 187 183 187 172 180 178 178 188 184 195 196
[37] 190 180 181 184 182 195 186 196 185 190 182 179 190 191 186 188 190 200
[55] 187 191 186 193 181 194 185 195 185 192 184 192 195 188 190 198 190 190
[73] 196 197 190 195 191 184 187 195 189 196 187 193 191 194 190 189 189 190
[91] 202 205 185 186 187 208 190 196 178 192 192 203 183 190 193 184 199 190
[109] 181 197 198 191 193 197 191 196 188 199 189 189 187 198 176 202 186 199
[127] 191 195 191 210 190 197 193 199 187 190 191 200 185 193 193 187 188 190
[145] 192 185 190 184 195 193 187 201 211 230 210 218 215 210 211 219 209 215
[163] 214 216 214 213 210 217 210 221 209 222 218 215 213 215 215 215 216 215
[181] 210 220 222 209 207 230 220 220 213 219 208 208 208 225 210 216 222 217
[199] 210 225 213 215 210 220 210 225 217 220 208 220 208 224 208 221 214 231
[217] 219 230 214 229 220 223 216 221 221 217 216 230 209 220 215 223 212 221
[235] 212 224 212 228 218 218 212 230 218 228 212 224 214 226 216 222 203 225
[253] 219 228 215 228 216 215 210 219 208 209 216 229 213 230 217 230 217 222
[271] 214 NA 215 222 212 213 192 196 193 188 197 198 178 197 195 198 193 194
[289] 185 201 190 201 197 181 190 195 181 191 187 193 195 197 200 200 191 205
[307] 187 201 187 203 195 199 195 210 192 205 210 187 196 196 196 201 190 212
[325] 187 198 199 201 193 203 187 197 191 203 202 194 206 189 195 207 202 193
[343] 210 198
flipper_len
1 181
2 186
3 195
4 NA
5 193
6 190
7 181
8 195
9 193
10 190
11 186
12 180
13 182
14 191
15 198
16 185
17 195
18 197
19 184
20 194
21 174
22 180
23 189
24 185
25 180
26 187
27 183
28 187
29 172
30 180
31 178
32 178
33 188
34 184
35 195
36 196
37 190
38 180
39 181
40 184
41 182
42 195
43 186
44 196
45 185
46 190
47 182
48 179
49 190
50 191
51 186
52 188
53 190
54 200
55 187
56 191
57 186
58 193
59 181
60 194
61 185
62 195
63 185
64 192
65 184
66 192
67 195
68 188
69 190
70 198
71 190
72 190
73 196
74 197
75 190
76 195
77 191
78 184
79 187
80 195
81 189
82 196
83 187
84 193
85 191
86 194
87 190
88 189
89 189
90 190
91 202
92 205
93 185
94 186
95 187
96 208
97 190
98 196
99 178
100 192
101 192
102 203
103 183
104 190
105 193
106 184
107 199
108 190
109 181
110 197
111 198
112 191
113 193
114 197
115 191
116 196
117 188
118 199
119 189
120 189
121 187
122 198
123 176
124 202
125 186
126 199
127 191
128 195
129 191
130 210
131 190
132 197
133 193
134 199
135 187
136 190
137 191
138 200
139 185
140 193
141 193
142 187
143 188
144 190
145 192
146 185
147 190
148 184
149 195
150 193
151 187
152 201
153 211
154 230
155 210
156 218
157 215
158 210
159 211
160 219
161 209
162 215
163 214
164 216
165 214
166 213
167 210
168 217
169 210
170 221
171 209
172 222
173 218
174 215
175 213
176 215
177 215
178 215
179 216
180 215
181 210
182 220
183 222
184 209
185 207
186 230
187 220
188 220
189 213
190 219
191 208
192 208
193 208
194 225
195 210
196 216
197 222
198 217
199 210
200 225
201 213
202 215
203 210
204 220
205 210
206 225
207 217
208 220
209 208
210 220
211 208
212 224
213 208
214 221
215 214
216 231
217 219
218 230
219 214
220 229
221 220
222 223
223 216
224 221
225 221
226 217
227 216
228 230
229 209
230 220
231 215
232 223
233 212
234 221
235 212
236 224
237 212
238 228
239 218
240 218
241 212
242 230
243 218
244 228
245 212
246 224
247 214
248 226
249 216
250 222
251 203
252 225
253 219
254 228
255 215
256 228
257 216
258 215
259 210
260 219
261 208
262 209
263 216
264 229
265 213
266 230
267 217
268 230
269 217
270 222
271 214
272 NA
273 215
274 222
275 212
276 213
277 192
278 196
279 193
280 188
281 197
282 198
283 178
284 197
285 195
286 198
287 193
288 194
289 185
290 201
291 190
292 201
293 197
294 181
295 190
296 195
297 181
298 191
299 187
300 193
301 195
302 197
303 200
304 200
305 191
306 205
307 187
308 201
309 187
310 203
311 195
312 199
313 195
314 210
315 192
316 205
317 210
318 187
319 196
320 196
321 196
322 201
323 190
324 212
325 187
326 198
327 199
328 201
329 193
330 203
331 187
332 197
333 191
334 203
335 202
336 194
337 206
338 189
339 195
340 207
341 202
342 193
343 210
344 198
What’s the difference between the single ([]) and double brackets ([[]])?
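One way to explore this question is to compare the class of each result. The sketch below uses a small hypothetical data frame `df` (not the real penguins data) so that it runs on any R version.

```r
# Hypothetical three-row data frame standing in for penguins.
df <- data.frame(
  bill_len = c(39.1, 39.5, NA),
  flipper_len = c(181L, 186L, 195L)
)

single <- df["bill_len"]     # single brackets keep the data frame structure
double <- df[["bill_len"]]   # double brackets extract the column itself

print(class(single))  # "data.frame"
print(class(double))  # "numeric"

# `$` and `[[` return the same vector.
print(identical(double, df$bill_len))

# With a row/column comma, a single column of a base data frame is also
# dropped to a vector.
print(identical(df[[2]], df[, 2]))

# Only single brackets can select several columns at once.
print(df[c("bill_len", "flipper_len")])
```

In short, single brackets return a smaller data frame, while double brackets (and `$`) return the contents of one column.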
Creating new columns
With the penguins data frame:
Tabulate the number of missing values (NA) in the sex column.
You’ll need to use the is.na and table functions.
table(is.na(penguins[["sex"]]))
# Or, equivalently:
table(is.na(penguins$sex))
# This would also work, but is fairly verbose:
sum(is.na(penguins$sex))
sum(!is.na(penguins$sex))
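To see what these calls return, here is a small sketch on a hypothetical five-element factor `sex` (made-up values, not the real column).

```r
# Hypothetical stand-in for penguins$sex with two missing values.
sex <- factor(c("male", "female", NA, "female", NA))

# is.na() gives TRUE/FALSE per element; table() counts each value.
na_counts <- table(is.na(sex))
print(na_counts)

# sum() counts the TRUEs directly.
n_missing <- sum(is.na(sex))
print(n_missing)

# table() can also report NA alongside the ordinary levels.
print(table(sex, useNA = "ifany"))
```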
Using assignment, create a new column, containing bill_len stored as an integer.
penguins$bill_int <- as.integer(penguins$bill_len)
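Note that as.integer drops the fractional part rather than rounding. A brief sketch on a hypothetical vector `bill_len` (made-up values) shows the difference:

```r
# Hypothetical bill lengths, including a missing value.
bill_len <- c(39.1, 39.9, 40.3, NA)

# as.integer() truncates towards zero: 39.9 becomes 39.
as_int <- as.integer(bill_len)
print(as_int)

# Round first if the nearest whole number is wanted instead.
rounded <- as.integer(round(bill_len))
print(rounded)
```

Missing values stay missing in both cases.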
Create a new column containing bill_dep formatted to two decimal places.
You’ll need to use sprintf for this, setting the fmt argument to "%.2f". See the help file (?sprintf) for details.
penguins$bill_depth_2dp <- sprintf("%.2f", penguins$bill_dep)
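It may help to remember that sprintf returns text, not numbers. The sketch below uses a hypothetical vector `bill_dep` (made-up values) and contrasts it with round, which keeps the values numeric.

```r
# Hypothetical bill depths.
bill_dep <- c(18.7, 17.4, 20.66)

# sprintf() pads to two decimal places and returns a character vector.
formatted <- sprintf("%.2f", bill_dep)
print(formatted)
print(is.character(formatted))

# round() keeps the values numeric, so trailing zeros are not stored.
rounded <- round(bill_dep, 2)
print(is.numeric(rounded))
```

Because the formatted column is character, it is suited to display rather than further arithmetic.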
Use paste to concatenate the species and island columns. Store the result as a new column.
penguins$species_island <- paste(penguins$species, penguins$island)
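By default paste joins its arguments with a single space. The sketch below, on two hypothetical factors, shows the default alongside the sep argument and paste0.

```r
# Hypothetical stand-ins for the species and island columns.
species <- factor(c("Adelie", "Gentoo"))
island <- factor(c("Torgersen", "Biscoe"))

# Factors are converted to their labels before joining.
with_space <- paste(species, island)
print(with_space)

# `sep` changes the separator; paste0() uses none.
with_underscore <- paste(species, island, sep = "_")
no_separator <- paste0(species, island)
print(with_underscore)
print(no_separator)
```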
Months of the year
Define a character vector containing the months of the year (i.e., January, February, …, December).
moy <- c(
  "January",
  "February",
  "March",
  "April",
  "May",
  "June",
  "July",
  "August",
  "September",
  "October",
  "November",
  "December"
)
# Or use the built-in constant 'month.name'
moy <- month.name
# NOTE: I'm using 'moy' as short for 'months of the year', but you can use any
# label you like.
Select the third element of this vector.
Select the 6th, 7th, and 8th elements of this vector.
moy[c(6, 7, 8)]
[1] "June" "July" "August"
# Or, even shorter:
moy[6:8]
[1] "June" "July" "August"
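The different forms of positional indexing can be compared side by side. This is a small sketch using the built-in month.name constant; the variable names are illustrative.

```r
moy <- month.name

# A single position returns one element.
third <- moy[3]
print(third)

# A vector of positions returns several elements; 6:8 is shorthand for c(6, 7, 8).
summer_a <- moy[c(6, 7, 8)]
summer_b <- moy[6:8]
print(identical(summer_a, summer_b))

# Negative positions exclude elements instead of selecting them.
last_three <- moy[-(1:9)]
print(last_three)
```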
Select the last five elements of this vector.
moy[8:12]
[1] "August" "September" "October" "November" "December"
# The above answer will work, but it assumes that
# there are always 12 elements in the vector. This
# is true for this example, but it's good practice
# to write code that can handle various input lengths.
tail(moy, 5)
[1] "August" "September" "October" "November" "December"
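If you prefer to stay with square brackets, the positions can be computed from the vector's length. The sketch below is one possible approach and also shows how tail behaves on a shorter, hypothetical vector `short`.

```r
moy <- month.name

# Compute the positions from the length instead of hard-coding 8:12.
n <- length(moy)
last_five <- moy[(n - 4):n]
print(last_five)
print(identical(last_five, tail(moy, 5)))

# tail() copes with vectors shorter than the requested count:
# it simply returns everything that is there.
short <- c("a", "b", "c")
print(tail(short, 5))
```

The length-based index only works when the vector has at least five elements, which is one reason tail is the more robust choice.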
(Harder) Select all months ending in “er” (Hint: str_ends).
str_ends is a function that returns a logical value (TRUE or FALSE) indicating whether a string ends with a specified pattern.
To use the function, you’ll first need to load the tidyverse package (str_ends comes from stringr, which is part of the tidyverse):
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.1 ✔ stringr 1.5.2
✔ ggplot2 4.0.0 ✔ tibble 3.3.0
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.1.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
moy[str_ends(moy, "er")]
[1] "September" "October" "November" "December"
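If the tidyverse is not available, base R offers close equivalents. The sketch below swaps in endsWith and grepl in place of str_ends; it is an alternative, not the approach the exercise asks for.

```r
moy <- month.name

# endsWith() returns one TRUE/FALSE per month.
ends_er <- endsWith(moy, "er")
print(ends_er)

# Use the logical vector to keep only the matching months.
er_months <- moy[ends_er]
print(er_months)

# The same selection with a regular expression: "$" anchors the end of the string.
er_months_regex <- moy[grepl("er$", moy)]
print(identical(er_months, er_months_regex))
```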
(Harder) Select all months starting with “A” (Hint: str_starts).
moy[str_starts(moy, "A")]
# As always, there are multiple ways of achieving this.
# The above method is quite efficient, but you could
# also try:
# Using regular expressions:
moy[str_detect(moy, "^A")]
# Using purrr:
keep(moy, \(x) str_detect(x, "^A"))
# Or, if you really want to over-complicate it:
a_months <- c()
for (m in moy) {
  if (substring(m, 1, 1) == "A") {
    a_months <- c(a_months, m)
  }
} # You shouldn't do this! But try to see what's happening here.
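All of these approaches rely on the same idea: build a logical vector with one TRUE/FALSE per month, then use it to subset. The sketch below breaks that into steps using base R's startsWith as a stand-in for str_starts, and checks that the loop reaches the same answer.

```r
moy <- month.name

# Step 1: one TRUE/FALSE per month.
starts_a <- startsWith(moy, "A")
print(starts_a)

# Step 2: which() shows the positions of the TRUE values.
print(which(starts_a))

# Step 3: subsetting with the logical vector keeps only those months.
a_months <- moy[starts_a]
print(a_months)

# The loop builds the same result one element at a time.
a_months_loop <- c()
for (m in moy) {
  if (substring(m, 1, 1) == "A") {
    a_months_loop <- c(a_months_loop, m)
  }
}
print(identical(a_months, a_months_loop))
```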