Practical 2

This practical will guide you through creating, manipulating, and subsetting data frames, importing and exporting data, and practicing functions in R.

This practical will require you to install a few new packages:

  • haven
  • writexl

Remember to avoid putting install.packages statements in your scripts, since this would mean the packages are re-installed every time the script is run. Instead, install them once, and load them with library each time you need them.

install.packages(c("haven", "writexl"))

You may want to use pak, a modern approach to installing R packages with several useful features:

install.packages("pak")
pak::pak(c("haven", "writexl"))
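One way to keep scripts free of unconditional install.packages calls is to check first which packages are missing. The sketch below is a minimal illustration: missing_packages is a hypothetical helper (it is not part of base R or pak), and the install step is left commented out so that nothing is installed unless you choose to run it.

```r
# Hypothetical helper: return the packages in `pkgs` that are not installed.
# requireNamespace() returns TRUE when a package can be loaded, FALSE otherwise.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("haven", "writexl"))
to_install

# Run this once, interactively, only if `to_install` is not empty:
# install.packages(to_install)
```

Because the helper only reports what is missing, it is safe to leave in a script that is run repeatedly.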

Data frames

  1. Combine the three vectors we created in Practical 1 into a new data frame. Remember to store the result as a new object.
x <- 1:5
y <- c("a", "b", "c", "d", "e")
z <- rnorm(5)
my_data <- data.frame(x, y, z)
  2. Count the number of rows and columns (e.g., nrow, ncol) and get the column names (names) of this data frame.
# Assuming your data frame above was stored in an object called 'my_data':
nrow(my_data)
[1] 5
ncol(my_data)
[1] 3
names(my_data)
[1] "x" "y" "z"
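A few other base R functions give the same information in a more compact form. The sketch below rebuilds the data frame with named columns and a fixed seed (the seed value is an arbitrary choice, made only so the random column is reproducible).

```r
set.seed(1)  # arbitrary seed so that z is the same on every run
my_data <- data.frame(
  x = 1:5,
  y = c("a", "b", "c", "d", "e"),
  z = rnorm(5)
)

dim(my_data)       # number of rows, then number of columns
colnames(my_data)  # same result as names() for a data frame
str(my_data)       # type and first values of each column
```

str is often the quickest first look at an unfamiliar data frame, since it shows the dimensions, column names, and column types together.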

Built-in datasets

R has many built-in datasets that can be loaded with the data function. You can view available datasets by typing data(). For example:

data()        # List all available datasets.
data(mtcars)  # Load mtcars, one of the built-in datasets.
head(mtcars)  # Preview the first few rows of mtcars.
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
Note

Once you’ve loaded the dataset with data, it appears in the “Environment” pane in RStudio.

Many packages also come with datasets. We can see the datasets provided by a given package by typing, for example:

library(lme4)  # To run this line, you’ll need to install lme4 first.
data(package = "lme4")
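The list of datasets can also be inspected programmatically. The sketch below uses the datasets package, which ships with R, so nothing needs to be installed; the object name available is just an illustrative choice.

```r
# data() returns an object whose `results` element is a character matrix
# with one row per dataset.
available <- data(package = "datasets")$results

# Keep only the dataset names and titles, and preview the first few rows.
head(available[, c("Item", "Title")])

# Check whether a particular dataset ships with the package.
"mtcars" %in% available[, "Item"]
```

The same pattern should work for any installed package by changing the package argument.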

Importing data

The starwars dataset is a built-in dataset provided by the dplyr package. It contains data on characters from the Star Wars universe, including variables such as name, height, mass, species, homeworld, and films.

To practice importing data, let’s first export the first 10 columns from the starwars dataset in a variety of formats:

library(tidyverse)  # We’ll introduce the tidyverse package next session.
library(haven)      # A package for importing and exporting data in different formats (e.g., Stata, SPSS).
library(writexl)    # A package for exporting data frames to Microsoft Excel files.

# Keep only the first 10 columns
starwars <- starwars[, 1:10]

write_csv(starwars, file = "starwars.csv")
write_xlsx(starwars, path = "starwars.xlsx")
write_dta(starwars, path = "starwars.dta")
write_sav(starwars, path = "starwars.sav")
  1. Import starwars.csv using read_csv from the readr package.
starwars_csv <- read_csv("starwars.csv")
Questions
  • What does the message printed by read_csv mean?
  • What does the col_types argument do?
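As a starting point for these questions, the sketch below reads the same file twice: once letting readr guess the column types, and once declaring them. It assumes the readr package is installed, and it uses a tiny made-up data frame written to a temporary file rather than starwars.csv, so that it runs on its own.

```r
library(readr)

# Write a tiny, made-up CSV to a temporary file so the example is self-contained.
path <- tempfile(fileext = ".csv")
write_csv(data.frame(name = c("A", "B"), height = c(172, 96)), path)

# Let readr guess the column types (it prints a column specification message).
guessed <- read_csv(path)

# Declare the types explicitly; columns are then parsed as requested.
declared <- read_csv(
  path,
  col_types = cols(name = col_character(), height = col_integer())
)

class(guessed$height)
class(declared$height)
```

Comparing the two class results shows how an explicit specification can differ from what readr infers.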
  2. Import starwars.xlsx using read_xlsx from the readxl package.
library(readxl)
starwars_xlsx <- read_xlsx("starwars.xlsx")
Tip

If you want to use a function without loading its package, you can prefix the function name with the package name and ::. For example:

readxl::read_xlsx("starwars.xlsx")

You might want to do this when you only need a single function once (e.g., importing).
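The same syntax works for any installed package, including those that come with R. A small sketch using the stats and utils packages, chosen here only because they need no installation:

```r
# Call functions by their full package::function name, without library().
stats::median(c(1, 3, 5))
utils::head(letters, 3)

# The same syntax makes it explicit which package a function comes from
# when two loaded packages export a function with the same name.
middle <- stats::median(c(10, 20, 30, 40))
middle
```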

  3. Import starwars.dta (a Stata dataset) and starwars.sav (an SPSS dataset) using appropriate functions from the haven package.
library(haven)

starwars_dta <- read_dta("starwars.dta")
starwars_sav <- read_sav("starwars.sav")
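After importing, it can be worth checking that the data survived the round trip. The sketch below illustrates the idea with base R only (write.csv and read.csv rather than the readr or haven functions used above) and a small made-up data frame, so the object names and values are assumptions for illustration.

```r
# A small, made-up data frame standing in for the exported data.
original <- data.frame(name = c("Luke", "Leia"), height = c(172L, 150L))

# Export to a temporary CSV file, then import it again.
path <- tempfile(fileext = ".csv")
write.csv(original, path, row.names = FALSE)
imported <- read.csv(path)

# Compare dimensions and column names between the two versions.
identical(dim(original), dim(imported))
identical(names(original), names(imported))
```

The same two comparisons could be applied to starwars_csv, starwars_xlsx, starwars_dta, and starwars_sav against the exported starwars object.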

Subsetting

Subsetting is the process of selecting specific elements, rows, or columns from a data object based on conditions or indices.

  1. Define a vector containing letters A to F.
a_to_f <- c("A", "B", "C", "D", "E", "F")
Note

You could also create this vector using the built-in LETTERS vector and subsetting:

a_to_f <- LETTERS[1:6]
  2. Use subsetting to select:
     i. The first element (i.e., A)
     ii. The third element (i.e., C)
     iii. All except the last element (i.e., A, B, C, D, E)

We can select elements by position:

# First element
a_to_f[1]
[1] "A"
# Third element
a_to_f[3]
[1] "C"

To select ‘all except the last’ element, we need to use a negative index, which drops the element at that position:

a_to_f[-6]
[1] "A" "B" "C" "D" "E"

Note that this would only work if the vector has exactly six elements. We could instead define the position using length (i.e., the number of elements in the vector):

a_to_f[-length(a_to_f)]
[1] "A" "B" "C" "D" "E"

This would work for vectors of any length, for example:

week_days <- c("Monday", "Tuesday", "Wednesday")

# This works:
week_days[-length(week_days)]
[1] "Monday"  "Tuesday"
# Whereas our previous solution would silently return every element:
week_days[-6]
[1] "Monday"    "Tuesday"   "Wednesday"
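Base R also offers head with a negative n as another way to drop elements from the end of a vector. A short sketch, reusing week_days and adding an empty vector (named empty here purely for illustration) as an edge case:

```r
week_days <- c("Monday", "Tuesday", "Wednesday")

# A negative n in head() drops that many elements from the end.
all_but_last <- head(week_days, -1)
all_but_last

# Both approaches return an empty vector when given an empty vector.
empty <- character(0)
head(empty, -1)
empty[-length(empty)]
```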
  3. Define a list containing a character vector, a logical value, and another list:
mylist <- list(
  a = c("Monday", "Tuesday", "Wednesday"),
  b = TRUE,
  c = list(rnorm(5), 21.3, "firstname")
)
  4. Use subsetting to select:
     i. The first element
     ii. The element named b
     iii. Tuesday
     iv. 21.3
# i. The first element

mylist[[1]]
[1] "Monday"    "Tuesday"   "Wednesday"
# ii. The element named `b`

mylist$b
[1] TRUE
mylist[["b"]]
[1] TRUE
mylist["b"]
$b
[1] TRUE
mylist[names(mylist) == "b"]
$b
[1] TRUE
# iii. `Tuesday`

mylist[["a"]][2]
[1] "Tuesday"
mylist[[1]][2]
[1] "Tuesday"
# iv. `21.3`

mylist$c[[2]]
[1] 21.3
mylist[[3]][[2]]
[1] 21.3
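The solutions above mix single brackets and double brackets, which return different things: single brackets keep the list wrapper, whereas double brackets (and $) extract the element itself. A minimal sketch using a smaller list, example_list, made up for illustration:

```r
example_list <- list(a = c("Monday", "Tuesday"), b = TRUE)

single <- example_list["b"]    # a list of length 1 that contains b
double <- example_list[["b"]]  # the value stored in b

class(single)
class(double)

# Only the extracted value can be used directly in a logical test.
isTRUE(double)
isTRUE(single)
```

This is why mylist["b"] prints with a $b header in the output above, while mylist[["b"]] and mylist$b print only the value.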