The following code is required for proper rendering of the document. Please do not modify it. When working in the Rmd version of this file, do not attempt to run this chunk.

knitr::opts_chunk$set(error = TRUE)

Module 2: Communicating results, plotting and data visualization in R

Last updated on 2023-Feb-03.

Original text: Chad M. Eliason, PhD

Revisions: Nick M. A. Crouch, PhD; Lucas J. Legendre, PhD; and Carlos A. Rodriguez-Saltos, PhD

Exercises: Lucas J. Legendre, PhD

Rmarkdown implementation: Carlos A. Rodriguez-Saltos, PhD

Principal course instructor: Julia A. Clarke, PhD

These modules are part of the course “Curiosity to Question: Research Design, Data Analysis and Visualization”, taught by Dr. Julia A. Clarke and Dr. Adam Papendieck at UT Austin.

For questions or comments, please send an email to Dr. Clarke.

How to cite

Eliason, C. M., Proffitt, J. V., Crouch, N. M. A., Legendre, L. J., Rodriguez-Saltos, C. A., Papendieck, A., & Clarke, J. A. (2020). The Clarke Lab’s Introductory Course to R. Retrieved from https://juliaclarke.squarespace.com

Importing your data

Before you begin

When using an RMarkdown file, the working directory will be the folder containing the Rmd file. The data will be stored in a separate folder, the “data” folder. It is good practice to keep your unmodified data in a folder of their own. You will store scripts, documents, and results in other folders.

For today’s class, download all the datasets available on Canvas and place them in the data folder.

.txt files

We will import data.txt, which contains a rectangular matrix written in plain text (ASCII) format. Data files such as these can be exported from Excel or a database program.

The easiest way to define your working directory in RStudio (so that you don’t need to redefine it later on) is to go to:
Session > Set Working Directory > Choose Directory…
and choose the folder that contains both your script and your data.

dat<-read.table("data.txt")
## Warning in file(file, "rt"): cannot open file 'data.txt': No such file or
## directory
## Error in file(file, "rt"): cannot open the connection

If you have a very large dataset, you can use head() to visualize just the first six lines rather than the whole data frame.

head(dat)
## Error in head(dat): object 'dat' not found
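
As a rough illustration of what read.table expects, the sketch below writes a small whitespace-separated table to a temporary file and reads it back. The file contents and the column names (id, mass) are made up for this example, and header = TRUE assumes that the first row holds column names, which may or may not be the case for data.txt.

```r
# Write a tiny, made-up table to a temporary file (hypothetical data)
tmp <- tempfile(fileext = ".txt")
writeLines(c("id mass", "a 1.5", "b 2.0", "c 3.2"), tmp)

# header = TRUE tells read.table that the first row holds the column names
example_dat <- read.table(tmp, header = TRUE)

# Inspect the first rows and the type of each column
head(example_dat)
str(example_dat)
```

If the header is not picked up, the column names appear as the first row of data and every column is read as text, which is a quick way to spot the problem.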

.csv files

Sometimes, data files have columns that are separated by commas. Files written in this format usually end in .csv. To open these files, we use read.csv.

flowers <- read.csv(file = "iris.csv")
## Warning in file(file, "rt"): cannot open file 'iris.csv': No such file or
## directory
## Error in file(file, "rt"): cannot open the connection
head(flowers)
## Error in head(flowers): object 'flowers' not found
names(flowers)
## Error in eval(expr, envir, enclos): object 'flowers' not found
# Summary of dataset
summary(flowers)
## Error in summary(flowers): object 'flowers' not found
# Structure of dataset and of each column
str(flowers)
## Error in str(flowers): object 'flowers' not found
# Class of a column
class(flowers$Sepal.Width)
## Error in eval(expr, envir, enclos): object 'flowers' not found

If you check the iris.csv file, you will see that the first row is the header containing the names of the variables. It is advisable to always include a header in your data files. Optionally, the first column in the file may contain the row names (labels).
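
The following sketch shows one way to read a file whose first column holds row labels. The file and its contents (label, length, width) are invented for the example; row.names = 1 is a standard read.csv argument that takes the row names from the first column.

```r
# Write a small, made-up .csv file with a header and a label column
tmp <- tempfile(fileext = ".csv")
writeLines(c("label,length,width",
             "plant1,5.1,3.5",
             "plant2,4.9,3.0"), tmp)

# Use the first column as row names instead of as a regular variable
with_labels <- read.csv(tmp, row.names = 1)

rownames(with_labels)  # the labels from the first column
names(with_labels)     # the remaining columns
```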

Other file types

read.delim allows you to read delimited files, such as tab-delimited ones. Check the help file of read.table to find out more.

You can also import Excel data into R. For that, you need to install and load the gdata package. The function for reading Excel files is read.xls.

By using functions from other R packages you can import a huge variety of data files. However, the most common files you will probably deal with are .txt and .csv.

This tutorial has more info on importing data files in R: https://www.datacamp.com/community/tutorials/r-tutorial-read-excel-into-r

Importing files interactively

You can launch a dialog box to ask the user to pick a file. You launch the box using file.choose.

flowers <- read.csv(file = file.choose())
## Error in file.choose(): file choice cancelled

This method, however, is not recommended because it cannot be automated, and therefore, it may present difficulties when other researchers (or yourself in the future) want to replicate your results.

Exercise

Use the functions head, names, class, and summary to figure out what is wrong with the format of the flowers dataset (hint: there are at least 3 things that are wrong).

Here is a corrected version of the file.

flowers <- read.csv(file = "iris_good.csv")
## Warning in file(file, "rt"): cannot open file 'iris_good.csv': No such file or
## directory
## Error in file(file, "rt"): cannot open the connection
head(flowers)
## Error in head(flowers): object 'flowers' not found
summary(flowers)
## Error in summary(flowers): object 'flowers' not found
class(flowers)
## Error in eval(expr, envir, enclos): object 'flowers' not found

For the rest of the module, we will work with this corrected version of the dataset.

Required packages

We will install a package with many custom color palettes to choose from.

install.packages("RColorBrewer")
## Error in contrib.url(repos, "source"): trying to use CRAN without setting a mirror
library(RColorBrewer)

Here is a sample of the color palettes available.

display.brewer.all()

For this document we will also need ggplot2, so install it if you haven’t already:

install.packages("ggplot2")
## Error in contrib.url(repos, "source"): trying to use CRAN without setting a mirror
library(ggplot2)

Common functions - reminder

The following are functions commonly used in R. If you are unsure what any of them does, search for its help file. For example: ?length

length()
rev()
sum(), cumsum(), prod(), cumprod()
mean(), sd(), var(), median()
min(), max(), range(), summary()
exp(), log(), sin(), cos(), tan() ## radians, not degrees
round(), ceiling(), floor(), signif()
sort(), order(), rank()
which(), which.max()
any(), all()
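
Some of these functions are easy to confuse, in particular sort(), order(), and rank(). The short sketch below applies them to a small made-up vector so you can compare their output.

```r
x <- c(30, 10, 20)

sort(x)        # the values themselves, in increasing order
order(x)       # the positions that would put x in increasing order
rank(x)        # the rank of each value within x
which(x > 15)  # the positions where the condition is TRUE
which.max(x)   # the position of the largest value
any(x > 25)    # is at least one value larger than 25?
all(x > 25)    # are all values larger than 25?
```

A common use of order() is to sort a whole data frame by one of its columns, for example mtcars[order(mtcars$mpg), ].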

We can apply these functions to a given variable in a dataset.

mean(flowers$Sepal.Length)
## Error in mean(flowers$Sepal.Length): object 'flowers' not found
sd(flowers$Sepal.Length)
## Error in is.data.frame(x): object 'flowers' not found
range(flowers$Sepal.Length)
## Error in eval(expr, envir, enclos): object 'flowers' not found

Many of these functions return NA when the input contains missing values (NAs): you have to specify how the missing values should be handled.

age <- c(32, 25, NA, 52)

# The following line does not produce meaningful output
mean(age)
## [1] NA
# The following does
mean(age, na.rm = TRUE)
## [1] 36.33333
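
Before removing missing values, it is often useful to know how many there are. The sketch below, which reuses the age vector from the previous chunk, counts the NAs and shows that na.rm works the same way in other summary functions.

```r
age <- c(32, 25, NA, 52)

# Which values are missing, and how many?
is.na(age)
sum(is.na(age))

# na.rm = TRUE is also accepted by other summary functions
median(age, na.rm = TRUE)
range(age, na.rm = TRUE)

# Alternatively, drop the missing values explicitly before computing
age_complete <- age[!is.na(age)]
mean(age_complete)
```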

You can use apply to execute a function over each column in a dataset. In the following example, we will do so with the mean function. Note, however, that this function requires numeric data; therefore, we need to exclude some columns from the dataset when we send it to apply.

# Species column contains characters. Note the use of the minus (-) sign to
# exclude this column when using subset.
apply(subset(flowers, select = -Species), MARGIN = 2, mean)
## Error in subset(flowers, select = -Species): object 'flowers' not found
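
If you prefer not to name the excluded column by hand, one possible alternative is to select the numeric columns programmatically. This sketch uses the iris dataset that ships with R as a stand-in for flowers, on the assumption that the two have the same columns.

```r
# Flag the numeric columns (the built-in iris data stands in for flowers)
num_cols <- sapply(iris, is.numeric)
num_cols

# Column means over the numeric columns only
col_means <- apply(iris[, num_cols], MARGIN = 2, FUN = mean)
col_means

# colMeans() is a shortcut for this particular case
colMeans(iris[, num_cols])
```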

We can also use the aggregate function.

?aggregate
aggregate(Sepal.Length ~ Species, data = flowers, FUN = mean)
## Error in eval(m$data, parent.frame()): object 'flowers' not found

‘.’ in a formula, such as the one below, means “all other variables in the dataset”.

aggregate(. ~ Species, data = flowers, FUN = mean)
## Error in eval(m$data, parent.frame()): object 'flowers' not found
aggregate(. ~ Species, data = flowers, FUN = sd)
## Error in eval(m$data, parent.frame()): object 'flowers' not found
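
To summarize only some of the variables, you can combine them with cbind() on the left side of the formula. The sketch below again uses the built-in iris dataset as a stand-in for flowers; tapply() is shown as a lighter alternative when there is a single variable to summarize.

```r
# Mean of two selected variables for each species
agg <- aggregate(cbind(Sepal.Length, Petal.Length) ~ Species,
                 data = iris, FUN = mean)
agg

# For a single variable, tapply() returns a named vector instead of a data frame
by_species <- tapply(iris$Sepal.Length, iris$Species, mean)
by_species
```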

Types of graphic functions

There are four types of graphic functions in R (some of which we have already encountered in module 1):

  1. High level plotting functions, which create complete plots. Examples: plot(), hist(), barplot(), boxplot(), qqnorm(), qqplot(), pairs().

  2. Low level plotting functions, which add features to a plot. Examples: lines(), points(), text(), mtext(), abline(), qqline(), title().

  3. Interactive plotting functions.

  4. par(), which changes plot settings.
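
As a minimal sketch of how these types work together, the chunk below changes a setting with par(), draws a complete plot with a high-level function, and then adds features with low-level functions. It uses the built-in pressure dataset; the title text is only an example.

```r
# par(): change the plot margins (bottom, left, top, right)
par(mar = c(4, 4, 2, 1))

# High-level function: creates a complete plot
plot(pressure$temperature, pressure$pressure)

# Low-level functions: add features to the plot that is already open
mean_pressure <- mean(pressure$pressure)
abline(h = mean_pressure, lty = 2)       # dashed horizontal line at the mean
title(main = "Vapor pressure of mercury")
```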

Histogram

Histograms display the distribution of your records along a continuous variable. To make a histogram, use hist() on a numeric, continuous vector:

hist(flowers$Sepal.Length)
## Error in hist(flowers$Sepal.Length): object 'flowers' not found

We can specify the approximate number of bins using the breaks argument.

hist(flowers$Sepal.Length, breaks=5)
## Error in hist(flowers$Sepal.Length, breaks = 5): object 'flowers' not found
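
When breaks is a single number, R treats it as a suggestion and may choose a different number of bins. To set the bins exactly, you can pass a vector of break points instead. The sketch below uses the built-in iris dataset as a stand-in for flowers and sets plot = FALSE to inspect the bins without drawing them.

```r
# Exact break points every 0.5 units; they must cover the whole range of the data
h <- hist(iris$Sepal.Length, breaks = seq(4, 8, by = 0.5), plot = FALSE)

h$breaks       # the break points that were used
h$counts       # the number of records in each bin
sum(h$counts)  # every record falls in exactly one bin
```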

With the ggplot2 package, you can get a similar result using qplot():

library(ggplot2)
qplot(flowers$Sepal.Length)
## Warning: `qplot()` was deprecated in ggplot2 3.4.0.
## Error in eval_tidy(mapping$x, data, caller_env): object 'flowers' not found

If the vector is in a data frame, you can use the following syntax:

qplot(Sepal.Length, data=flowers, binwidth=.25)
## Error in eval_tidy(mapping$x, data, caller_env): object 'flowers' not found

Alternatively, we can use the function ggplot. This function allows greater flexibility when modifying the plot, which is done by “summing” plotting functions.

ggplot(flowers, aes(x=Sepal.Length)) +
  geom_histogram(binwidth=.25, colour = "black")
## Error in ggplot(flowers, aes(x = Sepal.Length)): object 'flowers' not found

Here is a way to color the data by species:

ggplot(flowers, aes(x=Sepal.Length, fill=Species)) + 
  geom_histogram(binwidth=.25, alpha=.75, colour = "black")
## Error in ggplot(flowers, aes(x = Sepal.Length, fill = Species)): object 'flowers' not found

Scatter plot

A scatter plot shows the distribution of your records along two continuous variables. Use plot() on a vector of x values followed by a vector of y values or, as in the chunk that follows, on a formula of the form y ~ x together with a data frame:

names(flowers)
## Error in eval(expr, envir, enclos): object 'flowers' not found
plot(Sepal.Length ~ Sepal.Width, data = flowers)
## Error in eval(m$data, eframe): object 'flowers' not found

With qplot():

qplot(flowers$Sepal.Width, flowers$Sepal.Length)
## Error in `geom_point()`:
## ! Problem while computing aesthetics.
## ℹ Error occurred in the 1st layer.
## Caused by error in `FUN()`:
## ! object 'flowers' not found
# Same output, but using the `data` argument
qplot(Sepal.Width, Sepal.Length, data = flowers)
## Error in ggplot(data, mapping, environment = caller_env): object 'flowers' not found

With ggplot():

ggplot(flowers, aes(x=Sepal.Width, y=Sepal.Length)) + geom_point()
## Error in ggplot(flowers, aes(x = Sepal.Width, y = Sepal.Length)): object 'flowers' not found

Line graph

Making line graphs with plot() is similar to making scatterplots, but you set type to “l”.

?pressure
plot(pressure$temperature, pressure$pressure, type="l")

To add points and/or multiple lines, use the functions points() and lines(). These functions are not standalone: plot() must be called first (i.e., the lines in the following chunk will not run by themselves; the whole chunk has to be run at once for it to work).

plot(pressure$temperature, pressure$pressure, type="l")
points(pressure$temperature, pressure$pressure)
lines(pressure$temperature, pressure$pressure/2, col="red")
points(pressure$temperature, pressure$pressure/2, col="red")

With ggplot2, you can draw a line graph using qplot() with geom=“line”:

qplot(pressure$temperature, pressure$pressure, geom="line")

If the two vectors are already in the same data frame:

qplot(temperature, pressure, data=pressure, geom="line")

Which is equivalent to:

ggplot(pressure, aes(x=temperature, y=pressure)) + geom_line()

Lines and points together:

qplot(temperature, pressure, data=pressure, geom=c("line", "point"))

Which is equivalent to:

ggplot(pressure, aes(x=temperature, y=pressure)) + 
geom_line() + 
geom_point()

Or a plot with both lines.

ggplot(pressure) + 
geom_line(aes(x=temperature, y=pressure)) + 
geom_point(aes(x=temperature, y=pressure)) + 
# A fixed colour is set outside aes(), so no legend is created
geom_line(aes(x=temperature, y=pressure/2), colour = "red") +
geom_point(aes(x=temperature, y=pressure/2), colour = "red")

Check the Canvas supplement on resources for learning R; included there are cheat sheets on the various options available in ggplot2. The R graph gallery is strongly recommended: https://www.r-graph-gallery.com/.

Bar graph

Bar graphs represent the relationship between a continuous variable, plotted on the y-axis, and a categorical one, plotted on the x-axis.

?BOD
barplot(BOD$demand, names.arg=BOD$Time)

Sometimes we want a bar graph to represent the number of cases in each level of a categorical variable. In this sense, the bar graph is similar to a histogram; the difference is that in a histogram the x-axis is continuous and divided into bins, whereas in a bar graph it is categorical.

Let’s say that in the mtcars dataset, we want to know the number of cars for each category of number of cylinders. This information is not given explicitly in the dataset, but we can generate it using the table function.

table(mtcars$cyl)
## 
##  4  6  8 
## 11  7 14

There are 11 cases of the value 4, 7 cases of 6, and 14 cases of 8. Simply pass the table to barplot() to generate a bar graph.

barplot(table(mtcars$cyl))
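
If you are interested in proportions rather than counts, one option is to convert the table with prop.table() before plotting. The axis labels in this sketch are only suggestions.

```r
# Counts of cars for each number of cylinders
cyl_counts <- table(mtcars$cyl)

# Convert the counts to proportions of the total
cyl_props <- prop.table(cyl_counts)
round(cyl_props, 2)

# Bar graph of the proportions
barplot(cyl_props, xlab = "Number of cylinders", ylab = "Proportion of cars")
```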

With the ggplot2 package, you can get a similar result using ggplot(). If you generate a bar graph with values that are given explicitly in the dataset, use geom_bar() with stat=“identity”. Notice the difference in the output when the x variable is continuous and when it is discrete:

ggplot(data=BOD, aes(Time, demand)) +
         geom_bar(stat="identity")

# Converting a numeric variable to a factor results in it being treated as a
# discrete variable
ggplot(data=BOD, aes(factor(Time), demand)) +
  geom_bar(stat="identity")

When you want to generate counts that are not included in the dataset:

qplot(factor(cyl), data=mtcars)

Which is equivalent to:

ggplot(mtcars, aes(x=factor(cyl))) + geom_bar()

Boxplot

Boxplots also allow you to explore the relationship between a categorical variable and a continuous one; in addition, they allow you to see, within each category, several summary statistics, such as the median, the quartiles, the minimum, and the maximum. To make a box plot, use plot() on a factor of x values and a numeric vector of y values.

plot(as.factor(flowers$Species), flowers$Sepal.Length)
## Error in is.factor(x): object 'flowers' not found

If the two vectors are in the same data frame, you can also use a formula.

boxplot(Sepal.Length ~ Species, data = flowers)
## Error in eval(m$data, parent.frame()): object 'flowers' not found

Check the help file of boxplot to see what the boxes and lines mean.
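
To see the numbers behind a box, you can call boxplot.stats() on a numeric vector. The sketch below does so for the sepal lengths of one species, using the built-in iris dataset as a stand-in for flowers. By default, the whiskers extend to the most extreme data points that lie within 1.5 times the interquartile range from the box; points beyond that are drawn individually.

```r
# Sepal lengths of a single species
setosa_len <- iris$Sepal.Length[iris$Species == "setosa"]

stats <- boxplot.stats(setosa_len)

# Lower whisker, lower hinge, median, upper hinge, upper whisker
stats$stats

# Points drawn individually beyond the whiskers (may be empty)
stats$out

# The middle value of stats$stats is the median
median(setosa_len)
```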

You can plot the interaction of two variables.

?ToothGrowth
table(ToothGrowth$dose)
## 
## 0.5   1   2 
##  20  20  20
boxplot(len ~ supp + dose, data = ToothGrowth)

With the ggplot2 package, you can get a similar result using qplot(), with geom=“boxplot”:

qplot(flowers$Species, flowers$Sepal.Length, geom="boxplot")
## Error in `geom_boxplot()`:
## ! Problem while computing aesthetics.
## ℹ Error occurred in the 1st layer.
## Caused by error in `FUN()`:
## ! object 'flowers' not found

If the two vectors are already in the same data frame, you can use the following syntax:

qplot(Species, Sepal.Length, data=flowers, geom="boxplot")
## Error in ggplot(data, mapping, environment = caller_env): object 'flowers' not found

Which is equivalent to:

ggplot(flowers, aes(x=Species, y=Sepal.Length)) + geom_boxplot()
## Error in ggplot(flowers, aes(x = Species, y = Sepal.Length)): object 'flowers' not found

It’s also possible to make box plots for multiple variables, by combining the variables using the function interaction(). In this case, the dose variable is numeric. interaction() converts it to a factor before combining it with another variable.

qplot(
  interaction(ToothGrowth$supp, ToothGrowth$dose), 
  ToothGrowth$len, 
  geom="boxplot"
)

Alternatively, when the variables are in the same data frame:

qplot(interaction(supp, dose), len, data=ToothGrowth, geom="boxplot")

Which is equivalent to:

ggplot(ToothGrowth, aes(x=interaction(supp, dose), y=len)) + geom_boxplot()

Plot of a function

To plot a function, use curve() on an expression using the object x, which does not need to be an object in your environment.

curve(x ^ 3 - 5 * x, from= -4, to= 4)

curve(x ^ 2, from= -10, to =10)

You can plot any function that takes a numeric vector as input and returns another numeric vector, including functions that you define yourself. Using add=TRUE will add a curve to the previously created plot.

myfun <- function(xvar) { 
  1/(1 + exp(-xvar + 10))
}

curve(myfun(x), from= 0, to= 20)

# Adding a line to plot defined in previous line of code
curve(1 - myfun(x), add = TRUE, col = "red")

With the ggplot2 package, you can get a similar result using ggplot() together with stat_function(), setting fun to the name of your function and geom to “line”.

ggplot(data.frame(x=c(0, 20)), aes(x=x)) +
  stat_function(fun=myfun, geom="line")

3D plots

We will work with the volcano dataset.

?volcano
data(volcano)
head(volcano)
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
## [1,]  100  100  101  101  101  101  101  100  100   100   101   101   102   102
## [2,]  101  101  102  102  102  102  102  101  101   101   102   102   103   103
## [3,]  102  102  103  103  103  103  103  102  102   102   103   103   104   104
## [4,]  103  103  104  104  104  104  104  103  103   103   103   104   104   104
## [5,]  104  104  105  105  105  105  105  104  104   103   104   104   105   105
## [6,]  105  105  105  106  106  106  106  105  105   104   104   105   105   106
##      [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25] [,26]
## [1,]   102   102   103   104   103   102   101   101   102   103   104   104
## [2,]   103   103   104   105   104   103   102   102   103   105   106   106
## [3,]   104   104   105   106   105   104   104   105   106   107   108   110
## [4,]   105   105   106   107   106   106   106   107   108   110   111   114
## [5,]   105   106   107   108   108   108   109   110   112   114   115   118
## [6,]   106   107   109   110   110   112   113   115   116   118   119   121
##      [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35] [,36] [,37] [,38]
## [1,]   105   107   107   107   108   108   110   110   110   110   110   110
## [2,]   107   109   110   110   110   110   111   112   113   114   116   115
## [3,]   111   113   114   115   114   115   116   118   119   119   121   121
## [4,]   117   118   117   119   120   121   122   124   125   126   127   127
## [5,]   121   122   121   123   128   131   129   130   131   131   132   132
## [6,]   124   126   126   129   134   137   137   136   136   135   136   136
##      [,39] [,40] [,41] [,42] [,43] [,44] [,45] [,46] [,47] [,48] [,49] [,50]
## [1,]   110   110   108   108   108   107   107   108   108   108   108   108
## [2,]   114   112   110   110   110   109   108   109   109   109   109   108
## [3,]   120   118   116   114   112   111   110   110   110   110   109   109
## [4,]   126   124   122   120   117   116   113   111   110   110   110   109
## [5,]   131   130   128   126   122   119   115   114   112   110   110   110
## [6,]   136   135   133   129   126   122   118   116   115   113   111   110
##      [,51] [,52] [,53] [,54] [,55] [,56] [,57] [,58] [,59] [,60] [,61]
## [1,]   107   107   107   107   106   106   105   105   104   104   103
## [2,]   108   108   108   107   107   106   106   105   105   104   104
## [3,]   109   109   108   108   107   107   106   106   105   105   104
## [4,]   109   109   109   108   108   107   107   106   106   105   105
## [5,]   110   110   109   109   108   107   107   107   106   106   105
## [6,]   110   110   110   109   108   108   108   107   107   106   106

We can represent the information in volcano with a heatmap.

image(volcano)

Or with a contour plot.

contour(volcano)

We can also combine layers into a plot.

image(volcano, col = terrain.colors(50))
contour(volcano, add = TRUE, lwd = .5)

We can also produce a 3D perspective plot.

persp(volcano)

We can change the angle of the perspective using the theta and phi arguments. Check the help file of persp for more information.

persp(volcano, theta=20)

persp(volcano, theta=20, phi = 35)
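
By default, persp() places the rows and columns of the matrix on axes that run from 0 to 1. According to the help file of volcano, the heights were recorded on a 10 m by 10 m grid, so you can supply real coordinates. The sketch below does that; the axis labels, color, and the values of expand and shade are arbitrary choices for illustration.

```r
# Coordinates in meters: one value per row and per column of the matrix
x <- 10 * seq_len(nrow(volcano))
y <- 10 * seq_len(ncol(volcano))

persp(
  x, y, volcano,
  theta = 20, phi = 35,   # viewing angles
  expand = 0.5,           # shrink the vertical axis relative to x and y
  col = "lightgreen",
  shade = 0.5,            # add shading to make the relief easier to read
  xlab = "x (m)", ylab = "y (m)", zlab = "Height (m)"
)
```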

The website R Graph Gallery contains useful instructions on reproducing 3D plots, including information on how to make them with ggplot2.

Interactive plots

The locator function allows you to find values in a plot. The function, however, won’t work if you run it inside a chunk. Copy the entire code in the chunk and paste it to the console. The plot will appear in the lower right pane. Click on regions of interest to find the corresponding values. When you are done, hit Esc.

plot(Sepal.Length ~ Sepal.Width, data = flowers)
## Error in eval(m$data, eframe): object 'flowers' not found
xy <- locator()
## Error in locator(): plot.new has not been called yet
xy
## Error in eval(expr, envir, enclos): object 'xy' not found

Another function that might interest you is identify(). Check its help file.

Take into account that interactive functions get very slow with large datasets.

The package plotly can help you with implementing advanced interactivity. If you are interested, check their website: https://plot.ly/r/

Low level plotting functions

You can generate pretty much any plot using low-level functions, but it can be time-consuming. For example, here we will show you how to generate a cool thermometer display.

First, generate the x values.

x <- 1:2

Then, generate random temperature measurements.

y <- runif(2, 0, 100)

Now we will generate the plot. Because we will use low-level plotting functions, we require an open plot (see “Types of graphic functions” in this document). Thus, we need to run everything in a single chunk. Read the comments for information on what each function does.

# Generate an empty plot
par(mar=c(4,4,2,4))
plot(x, y, type='n', xlim=c(.5, 2.5), ylim=c(-10, 110), axes=FALSE, ann=FALSE)

# Create an axis for C scale
axis(2, at=seq(0, 100, 20))
mtext("Temp (C)", side=2, line=3)

# Create a treatment (x) axis
axis(1, at=1:2, labels=c("Trt 1", "Trt 2"))

# Create a secondary y-axis for F scale
axis(4, at=seq(0, 100, 20), labels=seq(0, 100, 20)*9/5 + 32)
mtext("Temp (F)", side=4, line=3)

# Add a box around the plot
box()

# Plot the temperature measurements
segments(x, 0, x, 100, lwd=20, col="dark grey")
segments(x, 0, x, 100, lwd=16, col="white")
segments(x, 0, x, y, lwd=16, col="light pink")

Alternatively, you can display the thermometer in two plots side by side. Note the mfrow argument inside par(). Check the corresponding help file to read how to use it.

# Generate an empty plot
par(
  mfrow= c(1,2),
  mar= c(4,4,2,4)
  )

## Celsius
plot(1, y[1], type='n', 
     xlim=c(.5, 1.5), ylim=c(-10, 110), ylab= "Temp (C)",
     axes=FALSE, ann=FALSE)
axis(2, at=seq(0, 100, 20))
axis(1, at=1, labels= "Trt 1")
box()
segments(1, 0, 1, 100, lwd=20, col="dark grey")
segments(1, 0, 1, 100, lwd=16, col="white")
segments(1, 0, 1, y[1], lwd=16, col="light pink")

## Fahrenheit
plot(1, y[2], type='n', 
     xlim=c(.5, 1.5), ylim=c(-10, 110), ylab= "Temp (F)",
     axes=FALSE, ann=FALSE)
axis(1, at=1, labels= "Trt 2")
axis(2, at=seq(0, 100, 20), labels=seq(0, 100, 20)*9/5 + 32)
box()
segments(1, 0, 1, 100, lwd=20, col="dark grey")
segments(1, 0, 1, 100, lwd=16, col="white")
segments(1, 0, 1, y[2], lwd=16, col="light pink")

Fine-tuning plots for publication

OK, now let’s create a publication-ready plot.

This is our original plot:

plot(Petal.Length ~ Petal.Width, data=flowers)
## Error in eval(m$data, eframe): object 'flowers' not found

We can use par() to change plot settings. Note: these changes will be applied to all plots within a chunk. For example, to change margins:

par(mar=c(4,4,2,2))

plot(Petal.Length~Petal.Width, data=flowers)
## Error in eval(m$data, eframe): object 'flowers' not found
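
Because changes made with par() persist for later plots, it can be convenient to save the previous settings and restore them when you are done. par() returns the old values of the settings it changes, which makes this straightforward. The sketch uses the built-in iris dataset as a stand-in for flowers.

```r
# par() returns the previous values of the settings it changes
old_par <- par(mar = c(4, 4, 2, 2))

plot(Petal.Length ~ Petal.Width, data = iris)

# Restore the settings that were in place before the change
par(old_par)
par("mar")
```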

To change the typeface to Times or Times New Roman (depending on your operating system):

par(family="serif")
plot(Petal.Length ~ Petal.Width, data= flowers)
## Error in eval(m$data, eframe): object 'flowers' not found

To modify the axis labels.

plot(
  Petal.Length ~ Petal.Width, 
  data= flowers, 
  xlab= "Petal width (mm)", 
  ylab= "Petal length (mm)"
)
## Error in eval(m$data, eframe): object 'flowers' not found

To adjust axis label orientation.

plot(
  Petal.Length ~ Petal.Width, 
  data=flowers, 
  xlab="Petal width (mm)", 
  ylab="Petal length (mm)", 
  las = 1
) 
## Error in eval(m$data, eframe): object 'flowers' not found

To adjust point type (pch) and size (cex).

plot(
  Petal.Length ~ Petal.Width, 
  data= flowers, 
  xlab= "Petal width (mm)", 
  ylab= "Petal length (mm)", 
  las = 1, 
  pch = 21, 
  cex = 1.5
) 
## Error in eval(m$data, eframe): object 'flowers' not found

Color management

We will create a palette for our plot. We will use it to color each species. Given that there are three species, we will select 3 colors from a palette from RColorBrewer.

library(RColorBrewer)

## custom 3-color palette from the "Set2" base palette in RColorBrewer
pal <-  brewer.pal(3, "Set2")
pal 
## [1] "#66C2A5" "#FC8D62" "#8DA0CB"

To look at the colors:

barplot(c(1,1,1), col = pal)

We can select elements of the color vector using indexing (numbers inside brackets).

barplot(c(1,1,1,1), col = pal[c(1,1,1,3)])

We can assign a color to each level of a categorical variable, such as species in flowers. The categorical variable can be used as an indexing vector, but only if it is coded as a factor. The reason is that R uses a numeric vector to code the levels of a factor. For example, see the structure of the variable species:

str(flowers$Species)
## Error in str(flowers$Species): object 'flowers' not found
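
The small sketch below illustrates why a factor can be used as an index. The species vector is made up for the example, and the three colors are the hexadecimal codes printed earlier for the “Set2” palette, typed in by hand so that the chunk does not depend on RColorBrewer.

```r
# The three "Set2" colors shown above, typed in by hand
pal_example <- c("#66C2A5", "#FC8D62", "#8DA0CB")

# A short, made-up factor of species names
sp <- factor(c("setosa", "virginica", "setosa", "versicolor"))

levels(sp)      # levels are sorted alphabetically by default
as.integer(sp)  # the integer codes that R stores internally

# Indexing with the factor uses the integer codes, one color per observation
pal_example[sp]
```

Note that the order of the colors follows the order of the levels, not the order in which the species appear in the data.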

We will now assign a color to each observation in flowers, according to its species.

# Species must be a factor for the code to work
flowers$Species <- factor(flowers$Species)
## Error in factor(flowers$Species): object 'flowers' not found
species.cols <- pal[flowers$Species]
## Error in eval(expr, envir, enclos): object 'flowers' not found

Now, we will repeat our scatter plot showing petal length versus width. This time, though, we will color the dots according to species.

plot(
  Petal.Length ~ Petal.Width, 
  data= flowers, 
  xlab= "Petal width (mm)", 
  ylab= "Petal length (mm)", 
  las = 1, 
  pch = 21, 
  cex = 1.5,
  bg= species.cols,  # Filling color
  col= "black"  # Outline color
)
## Error in eval(m$data, eframe): object 'flowers' not found
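
A plot colored by group usually needs a legend. The sketch below shows one way to add it with the low-level function legend(). It assumes the built-in iris dataset as a stand-in for flowers and types in the three “Set2” colors by hand; the position of the legend and the absence of a box around it (bty = "n") are stylistic choices.

```r
# The three "Set2" colors, one per species level
pal_example <- c("#66C2A5", "#FC8D62", "#8DA0CB")
species_cols <- pal_example[iris$Species]

plot(
  Petal.Length ~ Petal.Width,
  data = iris,
  las = 1, pch = 21, cex = 1.5,
  bg = species_cols,  # Filling color
  col = "black"       # Outline color
)

# The legend entries follow the order of the factor levels
legend(
  "topleft",
  legend = levels(iris$Species),
  pch = 21, pt.bg = pal_example, pt.cex = 1.5,
  bty = "n"
)
```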

Exporting plots

You can use the export button in the Plots pane of RStudio. To do that, however, you need to recreate the plot in the console (copy the code from the chunk to the console). When you export in this way, choose PDF, because it contains vector images, which are easier to edit later in image editing software (e.g., Adobe Illustrator, Inkscape, PowerPoint).

Alternatively, you can use code to export your image. We will show an example, in which we export a plot to a PDF. We need the functions pdf() and dev.off(). In the former, we can specify attributes of the file such as its name and the size of the exported plot (default is in inches). Any code generating and modifying the plot should be written between those two functions. All the code needs to be in the same chunk. Check the example below.

# Generate the PDF and open it for exporting a plot
pdf(file = "flowerplot.pdf", width = 6, height = 6)

# Change the margins of the graphic device (the PDF)
par(mar=c(4,4,2,2), family="Times")

# Generate the plot
plot(
  Petal.Length ~ Petal.Width, 
  data= flowers, 
  xlab= "Petal width (mm)", 
  ylab= "Petal length (mm)", 
  las = 1, 
  pch = 21, 
  cex = 1.5,
  bg= species.cols,  # Filling color
  col= "black"  # Outline color
)
## Error in eval(m$data, eframe): object 'flowers' not found
# Close the PDF
dev.off()
## quartz_off_screen 
##                 2

Check your working directory. After running the above code, you should see the plot in a file named flowerplot.pdf.
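
You can also confirm from code that the file was written. The sketch below exports a plot to a temporary folder, so that it does not clutter your working directory, and then checks that the file exists. The file name example_plot.pdf is hypothetical, and the built-in iris dataset stands in for flowers.

```r
# Build a path to a file in R's temporary folder (hypothetical file name)
out_file <- file.path(tempdir(), "example_plot.pdf")

# Open the PDF device, draw the plot, and close the device
pdf(file = out_file, width = 6, height = 6)
plot(Petal.Length ~ Petal.Width, data = iris, las = 1)
dev.off()

# Check that the file was created and is not empty
file.exists(out_file)
file.size(out_file) > 0
```

If you forget to call dev.off(), the file stays open and may appear empty or corrupted when you try to view it.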

HOMEWORK – DUE NEXT CLASS

You need to do only one of the exercises.

Exercise 1

  • Load the dataset sleep from the package datasets (take some time to look at it). Then, rearrange the data so that the values of the extra variable are in two columns, one for each drug. The columns should be labeled drug1 and drug2. The rearranged dataset, which should be a data frame, should not contain the group variable.

  • Using the barplot function, make two bar plots of the increase in hours of sleep, one for drug1 and another one for drug2. Plot them side by side. The plots should have different colors. In each one, label the y-axis with the variable name.

  • Generate the same barplots from the previous chunk, and in the same arrangement (side by side), but using functions from ggplot2. Tip: Use the function grid.arrange from package gridExtra to have both plots side by side. Check this link from the R Graph Gallery to learn how to use it: https://www.r-graph-gallery.com/261-multiple-graphs-on-same-page.html

Exercise 2

  • Load the dataset VADeaths.

  • Make a bar graph of death rate vs. age group. Within each bar, the population groups should be stacked. Label the y-axis, and add a legend of those four population groups at the top left of the plot.

Exercise 3

  • Load the crabs dataset.

  • Using the function ggplot, make a scatter plot of FL ~ CL, with point colors matching species colors (orange and blue). Hint: use the function scale_color_manual.

  • Add dashed regression lines for each species, using geom_smooth. Your code below should reproduce the entire plot with the added line.

  • Write code to save your plot as “crabsplot.pdf”.

  • Make a boxplot of BD with ggplot. The boxes should be sorted by sex. The box colors should match species colors (orange and blue).