Practical 2
This practical will guide you through creating, manipulating, and subsetting data frames, importing and exporting data, and practicing functions in R.
This practical will require you to install a few new packages:
- haven
- writexl
install.packages(c("haven", "writexl"))
Remember to avoid putting install.packages statements in your scripts, since this would mean the packages are re-installed every time the script is run. Instead, install them once, and load them with library each time you need them.
You may want to use pak, a modern approach to installing R packages with several useful features:
install.packages("pak")
pak::pak(c("haven", "writexl"))
Data frames
- Combine the three vectors we created in Practical 1 into a new data frame. Remember to store the result as a new object.
x <- 1:5
y <- c("a", "b", "c", "d", "e")
z <- rnorm(5)
- Count the number of rows and columns (e.g., nrow, ncol) and get the column names (names) of this data frame.
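As a minimal sketch of the functions mentioned above, the example below builds a data frame from three equal-length vectors and inspects its dimensions. The vectors id, group, and score and the object name df are hypothetical and chosen only for illustration; they are not the vectors from Practical 1.

```r
# Hypothetical example vectors (not the ones from Practical 1)
id    <- 1:4
group <- c("ctl", "ctl", "trt", "trt")
score <- c(2.5, 3.1, 4.8, 5.0)

# Combine equal-length vectors into a data frame and store the result
df <- data.frame(id, group, score)

nrow(df)   # number of rows
ncol(df)   # number of columns
names(df)  # column names, taken from the vector names
```

The same pattern should apply to x, y, and z, since all three have length 5.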
Built-in datasets
R has many built-in datasets that can be loaded with the data function. You can view available datasets by typing data(). For example:
- 1: List all available datasets.
- 2: Load mtcars, one of the built-in datasets.
- 3: Preview the first few rows of mtcars.
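The three annotated steps above presumably correspond to calls like the following. This is a reconstruction from the annotations, not necessarily the exact code used in the original practical.

```r
data()         # 1: list all available datasets
data(mtcars)   # 2: load mtcars, one of the built-in datasets
head(mtcars)   # 3: preview the first few rows (six by default)
```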
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
Note that, once you’ve loaded the dataset with data, you can see it in the “Environment” pane in RStudio.
Many packages also come with datasets. We can see the datasets provided by a given package by typing, for example:
library(lme4)
data(package = "lme4")
- 1: To run this line, you’ll need to install lme4 first.
Importing data
The starwars dataset is a built-in dataset provided by the dplyr package. It contains data on characters from the Star Wars universe, including variables such as name, height, mass, species, homeworld, and films.
To practice importing data, let’s first export the first 10 columns from the starwars dataset in a variety of formats:
- 1: We’ll introduce the tidyverse package next session.
- 2: A package for importing and exporting data in different formats (e.g., Stata, SPSS).
- 3: A package for importing and exporting Microsoft Excel files.
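The export step described above could look roughly like the sketch below. It assumes dplyr, haven, and writexl are installed, loads dplyr directly rather than the full tidyverse, and uses base R's write.csv for the CSV file. The object name sw and the file name starwars.csv are assumptions; the other file names match those used later in this practical.

```r
library(dplyr)    # provides the starwars dataset
library(haven)    # Stata and SPSS formats
library(writexl)  # Excel format

# Keep the first 10 columns (this drops the list columns, which
# cannot be written to these flat file formats)
sw <- starwars[, 1:10]

write.csv(sw, "starwars.csv", row.names = FALSE)  # CSV (assumed file name)
write_dta(sw, "starwars.dta")                     # Stata
write_sav(sw, "starwars.sav")                     # SPSS
write_xlsx(sw, "starwars.xlsx")                   # Excel
```

All four files are written to the current working directory, which you can check with getwd().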
- What does this output mean?
- What does the col_types argument do?
If you want to use a function without loading its package, you can use the :: operator (package::function) to refer to that function directly. For example:
readxl::read_xlsx("starwars.xlsx")
You might want to do this when you only need a single function once (e.g., importing).
- Import starwars.dta (a Stata dataset) and starwars.sav (an SPSS dataset) using appropriate functions from the haven package.
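To illustrate the haven import functions without depending on files from earlier steps, the sketch below round-trips a small data frame through temporary Stata and SPSS files. The data frame toy and the temporary paths are hypothetical; for the exercise itself you would pass the starwars file names instead.

```r
library(haven)

# Hypothetical toy data, written to temporary files so the example
# is self-contained
toy <- data.frame(name = c("Ann", "Bob"), height = c(170, 182))
dta_path <- tempfile(fileext = ".dta")
sav_path <- tempfile(fileext = ".sav")
write_dta(toy, dta_path)
write_sav(toy, sav_path)

from_stata <- read_dta(dta_path)  # read a Stata .dta file
from_spss  <- read_sav(sav_path)  # read an SPSS .sav file
```

Both functions return a tibble, so the imported objects print slightly differently from a base data frame.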
Subsetting
Subsetting is the process of selecting specific elements, rows, or columns from a data object based on conditions or indices.
- Define a vector containing letters A to F.
- Use subsetting to select:
  - The first element (i.e., A)
  - The third element (i.e., C)
  - All except the last element (i.e., A, B, C, D, E)
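The kinds of index used in these tasks can be sketched on a different vector. The vector v below is hypothetical; the same square-bracket syntax should work on a character vector of letters.

```r
v <- c(10, 20, 30, 40)  # hypothetical numeric vector

first    <- v[1]            # positive index: the first element
third    <- v[3]            # positive index: the third element
all_but1 <- v[-length(v)]   # negative index: drop the last element
```

Negative indices exclude positions rather than counting from the end, so v[-1] drops the first element.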
- Define a list containing a character vector, a logical value, and another list:
mylist <- list(
a = c("Monday", "Tuesday", "Wednesday"),
b = TRUE,
c = list(rnorm(5), 21.3, "firstname")
)
- Use subsetting to select:
  - The first element
  - The element named b
  - Tuesday
  - 21.3
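List subsetting uses three related operators: single brackets return a smaller list, while double brackets and $ extract the element itself. The sketch below shows them on a hypothetical list named recipe, structured like mylist but with different contents.

```r
# Hypothetical list: a character vector, a logical value, and a nested list
recipe <- list(
  steps = c("mix", "bake", "cool"),
  vegan = FALSE,
  extra = list(c(1, 2, 3), 180, "oven")
)

as_list   <- recipe[1]             # single brackets: a list of length 1
as_vector <- recipe[[1]]           # double brackets: the character vector itself
by_name   <- recipe$vegan          # $ extracts an element by name
one_step  <- recipe[["steps"]][2]  # chain indices to reach inside a vector
nested    <- recipe$extra[[2]]     # second element of the nested list
```

Chaining works from left to right: each index is applied to whatever the previous one returned.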