The following code is required for proper rendering of the document. Please do not modify it.
knitr::opts_chunk$set(error = TRUE)
Last updated on 2023-Feb-03.
Original text: Chad M. Eliason, PhD
Revisions: Nick M. A. Crouch, PhD; Lucas J. Legendre, PhD; and Carlos A. Rodriguez-Saltos, PhD
Exercises: Lucas J. Legendre, PhD
RMarkdown implementation: Carlos A. Rodriguez-Saltos, PhD
Principal course instructor: Julia A. Clarke, PhD
These modules are part of the course “Curiosity to Question: Research Design, Data Analysis and Visualization”, taught by Dr. Julia A. Clarke and Dr. Adam Papendieck at UT Austin.
For questions or comments, please send an email to Dr. Clarke (julia_clarke@jsg.utexas.edu).
Eliason, C. M., Proffitt, J. V., Crouch, N. M. A., Legendre, L. J., Rodriguez-Saltos, C. A., Papendieck, A., & Clarke, J. A. (2020). The Clarke Lab’s Introductory Course to R. Retrieved from https://juliaclarke.squarespace.com
We will make our analyses reproducible. This means that we will work with R markdown documents (.Rmd files). Writing in R markdown makes it easy to document every part of the code. Think of it as having a notebook for your code. Documenting your code helps others understand what you did; and by others, I also mean you, after one week of not having seen your code. If your code is not documented and organized, it will be totally strange to you.
In an RMarkdown document, text is written as one would normally do in a text editor. Code is written inside boxes known as “chunks”. The following is a chunk, and it runs code to sum two numbers. Go ahead and click the “play” button at the upper right of the chunk to see the result. Alternatively, you can place your cursor inside the chunk and press CTRL + SHIFT + Enter (Windows) or Command + SHIFT + Enter (Mac).
1 + 1
## [1] 2
In general, before each chunk we will write what the code does. For example:
We will do some basic math. You can run all operations at once by hitting the play button, or you can run them one by one by placing the cursor at the desired line and typing CTRL + Enter (Windows) or Command + Enter (Macintosh) (note that, contrary to what you would do if running the whole chunk, you do not press the SHIFT key).
1 + 1
## [1] 2
10 * 20
## [1] 200
The difference between R and RStudio: RStudio is a graphical user interface (i.e. a pretty “shell” for running R). It is often said that R is like the engine of a car, while RStudio is like the dashboard that allows you to drive the car.
There are 4 panes in RStudio:
R Console (where stuff happens)
Plot/Files window (where your graphs will show up, where you will browse files on your computer)
Source window (where we will write code that will be saved for running later on)
Environment/history (where we can see what is in the current R environment: numbers, data, etc.)
More information on using RStudio can be found in the RStudio cheatsheet, available here: https://raw.githubusercontent.com/rstudio/cheatsheets/main/rstudio-ide.pdf
Objects in R come in different types, e.g. numeric, factor, character, matrix, data frame, logical, list.
The type (or class) of an object is important. For example, some commands require objects of a particular class. If you use the wrong object class, commands can return an error.
The class() command can tell you what class an object belongs to.
class(1)
## [1] "numeric"
R also understands text, but…
class(a)
## Error in eval(expr, envir, enclos): object 'a' not found
The text needs to be in quotes.
class("a")
## [1] "character"
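As a small illustration of why quotes matter, the lines below ask for the class of a few values typed directly into `class()`. The values are made up for this example.

```r
class(TRUE)      # "logical"
class("TRUE")    # "character": the quotes turn it into text
class(3.5)       # "numeric"
class("3.5")     # "character"
```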
We already saw that R can be used to run mathematical operations. We can also ask questions to R, such as ‘are two numbers the same?’ or ‘is one larger or smaller than the other’?
# Testing equality
1 == 1
## [1] TRUE
2 + 2 == 2 * 2
## [1] TRUE
3 + 3 == 3 * 3
## [1] FALSE
# Is the number on the left larger than the number on the right?
3 > 2
## [1] TRUE
# Are these numbers unequal?
2 != 2
## [1] FALSE
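A few more comparisons, as a quick sketch: `>=` and `<=` include equality, and the same operators also work on text (where case matters).

```r
# Larger than or equal to, smaller than or equal to
3 >= 3        # TRUE
2 <= 1        # FALSE
# Comparing text
"a" == "a"    # TRUE
"a" == "A"    # FALSE
```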
The color scheme of RStudio inside the chunk is as follows: numbers in blue, comments in green, executable code in black, etc.
For really small numbers, R uses scientific notation.
1/2000
## [1] 5e-04
e-whatever means “times 10^-whatever”: 5e-04 is 5 × 10^-4, i.e. 0.0005. You can type the e in lowercase or uppercase.
5e-4
## [1] 5e-04
5E-4
## [1] 5e-04
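If you would rather see the number written out in full, one option (a sketch, not the only way) is the `format()` function with `scientific = FALSE`. Note that `format()` returns text, not a number.

```r
format(5e-04, scientific = FALSE)   # "0.0005"
# Scientific notation also works for large numbers
2.5e3                               # 2500
2.5e3 == 2500                       # TRUE
```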
sin(1)
## [1] 0.841471
cos(1)
## [1] 0.5403023
Remember, case matters…
SIN(1)
## Error in SIN(1): could not find function "SIN"
Spaces do not, except in the name of an object or function.
sin (1)
si n(1)
## Error: <text>:2:4: unexpected symbol
## 1: sin (1)
## 2: si n
## ^
There are also some built-in constants, like pi.
pi
## [1] 3.141593
sin(pi)
## [1] 1.224647e-16
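Mathematically, the sine of pi is exactly 0. R returns a tiny non-zero number because pi is stored with a finite number of digits. The sketch below shows why testing such results with `==` can be misleading, and how `all.equal()` compares numbers within a small tolerance instead.

```r
sin(pi) == 0                    # FALSE: the exact comparison fails
isTRUE(all.equal(sin(pi), 0))   # TRUE: equal within a small tolerance
round(sin(pi), digits = 10)     # 0
```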
Let’s calculate the natural log of 10 (ln):
log(10)
## [1] 2.302585
If you want the decimal log…
log10(10)
## [1] 1
or…
log(10, base= 10)
## [1] 1
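The `base` argument works for any base, and `exp()` undoes the natural log. A short sketch with made-up numbers:

```r
log2(8)             # 3, because 2^3 = 8
log(8, base = 2)    # 3, same thing using the base argument
exp(log(10))        # 10: exp() is the inverse of log()
```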
R is a language: it has a grammar and a vocabulary. The R language is made of objects and functions.
Objects contain information. We store information by using the arrow ’ <- ’, which you can type by pressing ALT (or Option on Mac) and -.
x <- 100
It can also go the other way, if you want to assign the number 10 to the object “x”:
10 -> x
This is now stored in the R environment (displayed on the top right pane in RStudio):
x
## [1] 10
Note that the object x, without quotes, contains a number. See what happens when you run the following chunk:
class(x)
## [1] "numeric"
class("x")
## [1] "character"
Question for you: Why is the output of each line different?
You can also assign more complex expressions as objects:
x <- 500 - 200
x
## [1] 300
Or even text:
greetings <- "Hello world!"
greetings
## [1] "Hello world!"
Listing and removing things in the R environment:
# listing
ls()
## [1] "greetings" "x"
# removing
bird <- "sparrow"
bird
## [1] "sparrow"
rm(bird)
bird
## Error in eval(expr, envir, enclos): object 'bird' not found
# The following line clears all objects from the workspace (use with caution!)
rm(list=ls())
Functions “do things” to objects. The names of functions are usually followed by (). Arguments go inside the parentheses. Many functions require at least one argument: the name of the object to manipulate. For example, when we use the function class(), we include inside the parentheses the name of the object of which we want to know the class.
class(1)
## [1] "numeric"
Arguments go inside parentheses. What the function does is written inside curly braces:
add <- function(firstnum, secondnum) {
firstnum + secondnum
}
add(firstnum = 10, secondnum = 5)
## [1] 15
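Arguments can also be given a default value, which is used whenever the argument is left out. The function below is a sketch of that idea; `add_default` is a hypothetical name chosen for this example.

```r
# secondnum defaults to 1 when it is not supplied
add_default <- function(firstnum, secondnum = 1) {
  firstnum + secondnum
}

add_default(10)                             # 11: uses the default
add_default(10, secondnum = 5)              # 15: overrides the default
add_default(secondnum = 5, firstnum = 10)   # 15: named arguments can come in any order
```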
Exercise: In the following chunk, write a function that adds 5 to the product of two numbers. Tip: Think carefully about giving your objects/functions informative names.
R comes pre-loaded with many common functions. Other people have written more advanced/specific ones for certain needs. Packages are collections of functions/R code. For example, you can use the ls() function to list all objects in the current R environment:
ls()
## [1] "add"
Each of the following lines of code searches for the help file assigned to sort. The lines are equivalent to each other.
?sort
help(sort)
Sometimes you just want to look for a word among the help files, and not necessarily open the help file assigned to a particular function. In that case, use two question marks.
??sort
R often provides examples of how to use a function (examples are also found at the end of help files).
example(sort)
##
## sort> require(stats)
##
## sort> x <- swiss$Education[1:25]
##
## sort> x; sort(x); sort(x, partial = c(10, 15))
## [1] 12 9 5 7 15 7 7 8 7 13 6 12 7 12 5 2 8 28 20 9 10 3 12 6 1
## [1] 1 2 3 5 5 6 6 7 7 7 7 7 8 8 9 9 10 12 12 12 12 13 15 20 28
## [1] 3 2 5 5 1 6 6 7 7 7 7 8 7 8 9 9 10 12 12 12 12 20 28 13 15
##
## sort> ## illustrate 'stable' sorting (of ties):
## sort> sort(c(10:3, 2:12), method = "shell", index.return = TRUE) # is stable
## $x
## [1] 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10 10 11 12
##
## $ix
## [1] 9 8 10 7 11 6 12 5 13 4 14 3 15 2 16 1 17 18 19
##
##
## sort> ## $x : 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10 10 11 12
## sort> ## $ix: 9 8 10 7 11 6 12 5 13 4 14 3 15 2 16 1 17 18 19
## sort> sort(c(10:3, 2:12), method = "quick", index.return = TRUE) # is not
## $x
## [1] 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10 10 11 12
##
## $ix
## [1] 9 8 10 11 7 6 12 5 13 4 14 3 15 2 16 17 1 18 19
##
##
## sort> ## $x : 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10 10 11 12
## sort> ## $ix: 9 10 8 7 11 6 12 5 13 4 14 3 15 16 2 17 1 18 19
## sort>
## sort> x <- c(1:3, 3:5, 10)
##
## sort> is.unsorted(x) # FALSE: is sorted
## [1] FALSE
##
## sort> is.unsorted(x, strictly = TRUE) # TRUE : is not (and cannot be)
## [1] TRUE
##
## sort> # sorted strictly
## sort> ## Not run:
## sort> ##D ## Small speed comparison simulation:
## sort> ##D N <- 2000
## sort> ##D Sim <- 20
## sort> ##D rep <- 1000 # << adjust to your CPU
## sort> ##D c1 <- c2 <- numeric(Sim)
## sort> ##D for(is in seq_len(Sim)){
## sort> ##D x <- rnorm(N)
## sort> ##D c1[is] <- system.time(for(i in 1:rep) sort(x, method = "shell"))[1]
## sort> ##D c2[is] <- system.time(for(i in 1:rep) sort(x, method = "quick"))[1]
## sort> ##D stopifnot(sort(x, method = "shell") == sort(x, method = "quick"))
## sort> ##D }
## sort> ##D rbind(ShellSort = c1, QuickSort = c2)
## sort> ##D cat("Speedup factor of quick sort():\n")
## sort> ##D summary({qq <- c1 / c2; qq[is.finite(qq)]})
## sort> ##D
## sort> ##D ## A larger test
## sort> ##D x <- rnorm(1e7)
## sort> ##D system.time(x1 <- sort(x, method = "shell"))
## sort> ##D system.time(x2 <- sort(x, method = "quick"))
## sort> ##D system.time(x3 <- sort(x, method = "radix"))
## sort> ##D stopifnot(identical(x1, x2))
## sort> ##D stopifnot(identical(x1, x3))
## sort> ## End(Not run)
## sort>
## sort>
The following code takes you to the home page of the R online documentation.
help.start()
## starting httpd help server ... done
## If the browser launched by '/usr/bin/open' is already running, it is
## *not* restarted, and you must switch to its window.
## Otherwise, be patient ...
If you think there might be a default (base) function called sort, but you are not sure, use the apropos function.
apropos("sort")
## [1] ".doSortWrap" "is.unsorted" "sort" "sort.default" "sort.int"
## [6] "sort.list" "sort.POSIXlt" "sortedXyData"
Do not be shy about searching for answers in online forums. We do it all the time. You can just google your question, and in the search results you will see suggestions for forums in which that question has been asked. In particular, try to get familiar with Stack Overflow (https://stackoverflow.com/). The community of people that use R is so large, and so widely distributed all over the world, that many of the problems you will face with data analysis have probably been discussed already and an answer is waiting for you. In general, it is a good idea to try to solve a problem by yourself, but try not to get stuck. As a rule of thumb, if 5 minutes have passed and you still cannot find an answer (either by reading help files or trying out stuff), then try to search for it in an online forum.
Importantly, do not take everything you read in online forums as the right answer. For simple stuff, such as debugging problems that are common in R or finding the code for a basic routine, you will probably have no need to look any further. But for things such as finding the type of analysis you need to test a hypothesis, you can get recommendations in forums but then you will need to go to the peer-reviewed literature to confirm that the recommendations are valid. Of course, it will be much easier to navigate the literature if you know what to search for, and online forums can be good places to get the keywords you need.
If you ever need to ask a question in an online forum, Stack Overflow has this useful guide on how to do so: https://stackoverflow.com/help/how-to-ask
The c() function allows us to create a vector (c is short for combine):
c(0, 1, 2, 3, 4, 5, 6)
## [1] 0 1 2 3 4 5 6
The arguments for this function are the objects or data you want to combine.
c(0,1,2,3,4,5,6,3,4,5,2,2,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3)
## [1] 0 1 2 3 4 5 6 3 4 5 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
Style note: You don’t need spaces between commas and the next number, but it is good coding practice to include these spaces. It will make it easier for you when you have to re-read your code.
The values in a vector can be numbers, strings, logical values, or any other type, but they must all be of the same type (can’t mix strings and numbers).
Try creating a vector of characters like this:
c("a", "b", "c")
## [1] "a" "b" "c"
Try mixing modes and see what happens.
c(1, TRUE, "three")
## [1] "1" "TRUE" "three"
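What happened here is that R converted every value to the most flexible type present: logical values can become numbers, and anything can become text. The sketch below checks this two values at a time.

```r
class(c(1, TRUE))         # "numeric": TRUE is converted to 1
c(1, TRUE)                # 1 1
class(c(1, "three"))      # "character": 1 is converted to "1"
class(c(TRUE, "three"))   # "character": TRUE is converted to "TRUE"
```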
In general, “as” functions are useful to change the class of an object.
cc <- 10:34
as.vector(cc)
## [1] 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
as.factor(cc)
## [1] 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
## 25 Levels: 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 ... 34
as.character(cc)
## [1] "10" "11" "12" "13" "14" "15" "16" "17" "18" "19" "20" "21" "22" "23" "24"
## [16] "25" "26" "27" "28" "29" "30" "31" "32" "33" "34"
…and “is” functions to check their class.
is.vector(cc)
## [1] TRUE
is.factor(cc)
## [1] FALSE
is.numeric(cc)
## [1] TRUE
is.character(cc)
## [1] FALSE
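One conversion deserves extra care: turning a factor back into numbers. The sketch below uses a small made-up factor (`f_demo` is a hypothetical name) to show that `as.numeric()` returns the internal level codes, so you usually want to go through `as.character()` first.

```r
f_demo <- as.factor(c(10, 20, 30))
as.numeric(f_demo)                 # 1 2 3: the level codes, not the labels
as.numeric(as.character(f_demo))   # 10 20 30: the original values
```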
We can subset a vector by using square brackets []. Inside the brackets we write the position of the element or elements we want to extract. Hint: We can use vectors inside [].
cc
## [1] 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
cc[1]
## [1] 10
cc[10:15]
## [1] 19 20 21 22 23 24
cc[c(2,4,5)]
## [1] 11 13 14
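Two other kinds of index are worth knowing, shown here as a sketch: negative positions drop elements, and a logical test keeps only the elements for which the test is TRUE. The vector `v_demo` is a hypothetical copy of the values in `cc`.

```r
v_demo <- 10:34
v_demo[-1]                    # everything except the first element
v_demo[-(1:20)]               # drops the first 20 elements: 30 31 32 33 34
v_demo[v_demo > 30]           # keeps values larger than 30: 31 32 33 34
v_demo[c(1, length(v_demo))]  # first and last elements: 10 34
```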
The function which allows us to search data in a vector.
bases <- c("A", "G", "T", "C")
# The following line creates a vector by randomly selecting one of the letters
# in `bases`, 20 times.
ADN <- sample(bases, size = 20, replace = T)
which(ADN == "G")
## [1] 3 9 15 19 20
# What is the proportion of "G"?
length(which(ADN == "G")) / length(ADN)
## [1] 0.25
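The same count and proportion can be obtained without `which()`, because R treats TRUE as 1 and FALSE as 0 in arithmetic. The sketch below uses a fixed, made-up sequence (`dna` is a hypothetical name) so that the result does not depend on random sampling.

```r
dna <- c("A", "G", "G", "T", "C", "G", "A", "T")
sum(dna == "G")    # number of matches: 3
mean(dna == "G")   # proportion of matches: 0.375
table(dna)         # counts for every letter at once
```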
When we construct our search string, we can use logical operators: & (AND) and | (OR).
which(ADN == "G" | ADN == "C")
## [1] 3 5 7 8 9 13 14 15 17 18 19 20
which(ADN == "G" & ADN == "C")
## integer(0)
As you have seen, the output of which gives the positions of the matches. We can use these positions to extract the data.
ADN[which(ADN == "G" | ADN == "C")]
## [1] "G" "C" "C" "C" "G" "C" "C" "G" "C" "C" "G" "G"
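As a side note, square brackets also accept the logical test directly, and the `%in%` operator is a compact way to test against several values. A sketch with a fixed, made-up sequence (`dna` is a hypothetical name):

```r
dna <- c("A", "G", "G", "T", "C", "G", "A", "T")
dna[which(dna == "G" | dna == "C")]   # "G" "G" "C" "G"
dna[dna == "G" | dna == "C"]          # same result without which()
dna[dna %in% c("G", "C")]             # same result using %in%
```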
which can also be used to replace part of the data.
ARN <- ADN
ARN[which(ADN == "T")] <- "U"
ADN
## [1] "A" "T" "G" "A" "C" "T" "C" "C" "G" "A" "A" "T" "C" "C" "G" "A" "C" "C" "G"
## [20] "G"
ARN
## [1] "A" "U" "G" "A" "C" "U" "C" "C" "G" "A" "A" "U" "C" "C" "G" "A" "C" "C" "G"
## [20] "G"
We can perform operations using vectors…
x <- c(1, 2, 10, 20)
x
## [1] 1 2 10 20
x * 2
## [1] 2 4 20 40
min(x)
## [1] 1
max(x)
## [1] 20
sum(x)
## [1] 33
prod(x)
## [1] 400
sqrt(x)
## [1] 1.000000 1.414214 3.162278 4.472136
exp(x)
## [1] 2.718282e+00 7.389056e+00 2.202647e+04 4.851652e+08
length(x)
## [1] 4
…or calculate basic statistics.
mean(x)
## [1] 8.25
median(x)
## [1] 6
quantile(x)
## 0% 25% 50% 75% 100%
## 1.00 1.75 6.00 12.50 20.00
sd(x)
## [1] 8.80814
var(x)
## [1] 77.58333
Two very useful functions to give you a summary of an object and/or its structure:
summary(x)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 1.75 6.00 8.25 12.50 20.00
str(x)
## num [1:4] 1 2 10 20
Let’s create a new vector y.
y <- c(2, 1, .2, .1)
And multiply it by our stored vector x.
x * y
## [1] 2 2 2 2
x
## [1] 1 2 10 20
y
## [1] 2.0 1.0 0.2 0.1
What do you think will happen if the vectors don’t have the same length?
x * c(2, -2)
## [1] 2 -4 20 -40
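Answer: R will recycle the shorter vector (i.e. keep repeating it until it matches the length of the longer one). This works cleanly only if the length of the longer vector is a multiple of the length of the shorter one. Otherwise, R still recycles the shorter vector, but it cuts the last repetition short and gives a warning.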
x * c(2, -2, 1)
## Warning in x * c(2, -2, 1): longer object length is not a multiple of shorter
## object length
## [1] 2 -4 10 40
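To make recycling visible, the sketch below builds the repeated vectors by hand with `rep()`. The name `x_demo` is a hypothetical copy of `x`.

```r
x_demo <- c(1, 2, 10, 20)
# c(2, -2) repeated twice has the same length as x_demo
rep(c(2, -2), times = 2)                     # 2 -2 2 -2
x_demo * rep(c(2, -2), times = 2)            # 2 -4 20 -40
# c(2, -2, 1) does not fit evenly, so the last repetition is cut short
rep(c(2, -2, 1), length.out = 4)             # 2 -2 1 2
x_demo * rep(c(2, -2, 1), length.out = 4)    # 2 -4 10 40, this time without a warning
```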
Create a function called mymean that calculates the mean of the vector x, but do not use the mean function. When making your own functions, make sure that you do not use names of pre-made functions in R. Write your function in the following box and test it with x.
So far, we have been working with vectors, which are 1-dimensional objects.
x
## [1] 1 2 10 20
But, R can handle objects with more than 1 dimension. Here’s one way to do this: let’s create a matrix.
m <- matrix(1:6, nrow = 2)
m
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 4 6
Note that the values 1 to 6 are added to the 2-row matrix by column: R fills up the 1st column, then moves on to the 2nd, etc. You can change the way the matrix is filled using the argument ‘byrow’.
m2 <- matrix(1:6, nrow = 2, byrow = TRUE)
m2
## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 4 5 6
The logical values TRUE and FALSE can be abbreviated to T and F (however, this can create issues, because T and F are ordinary objects that can be overwritten; be careful!).
m2 <- matrix(1:6, nrow = 2, byrow = T)
You can subset a matrix using square brackets []. For example, we might want to know what is in row 1 and column 3 of our matrix.
m[1, 3]
## [1] 5
Or the 1st two rows of column one.
m[c(1, 2), 1]
## [1] 1 2
When subsetting, if you want the whole row or column, just leave the column or row blank, respectively. Remember, the row index comes before the comma and the column index, after it.
m[1, ]
## [1] 1 3 5
m[, 1]
## [1] 1 2
You can also get one specific element by using double brackets. In this case, you just provide one number and R counts the elements of the matrix by moving down each column.
m[[4]]
## [1] 4
m2[[4]]
## [1] 5
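If you want to know which row and column a single position corresponds to, one option is the base function `arrayInd()`. The sketch below uses `m_demo`, a hypothetical copy of `m`.

```r
m_demo <- matrix(1:6, nrow = 2)
m_demo[4]                    # 4: single brackets with one number work too
arrayInd(4, dim(m_demo))     # position 4 is row 2, column 2
m_demo[2, 2]                 # 4 again
```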
To know the dimensions of a matrix:
dim(m)
## [1] 2 3
nrow(m)
## [1] 2
ncol(m)
## [1] 3
You can use matrices in functions:
m * 2
## [,1] [,2] [,3]
## [1,] 2 6 10
## [2,] 4 8 12
log(m)
## [,1] [,2] [,3]
## [1,] 0.0000000 1.098612 1.609438
## [2,] 0.6931472 1.386294 1.791759
If you need to do matrix multiplication, use the special operator “%*%”. Here, the t() function transposes the matrix.
cc <- matrix(10:34, ncol = 5, nrow = 5)
t(m)
## [,1] [,2]
## [1,] 1 2
## [2,] 3 4
## [3,] 5 6
t(cc)
## [,1] [,2] [,3] [,4] [,5]
## [1,] 10 11 12 13 14
## [2,] 15 16 17 18 19
## [3,] 20 21 22 23 24
## [4,] 25 26 27 28 29
## [5,] 30 31 32 33 34
m %*% t(m)
## [,1] [,2]
## [1,] 35 44
## [2,] 44 56
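Matrix multiplication only works when the number of columns of the first matrix equals the number of rows of the second, which is why the transpose was needed above. The sketch below contrasts `%*%` with the element-wise `*`, using `a_demo`, a hypothetical copy of `m`.

```r
a_demo <- matrix(1:6, nrow = 2)    # 2 rows, 3 columns
dim(a_demo %*% t(a_demo))          # 2 2
dim(t(a_demo) %*% a_demo)          # 3 3
a_demo * a_demo                    # element-wise: every entry is squared
# a_demo %*% a_demo would return an error (non-conformable arguments),
# because a 2 x 3 matrix cannot be multiplied by another 2 x 3 matrix.
```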
To add new rows or new columns to a matrix:
rbind(cc, -5:-1)
## [,1] [,2] [,3] [,4] [,5]
## [1,] 10 15 20 25 30
## [2,] 11 16 21 26 31
## [3,] 12 17 22 27 32
## [4,] 13 18 23 28 33
## [5,] 14 19 24 29 34
## [6,] -5 -4 -3 -2 -1
cbind(cc, -5:-1)
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 10 15 20 25 30 -5
## [2,] 11 16 21 26 31 -4
## [3,] 12 17 22 27 32 -3
## [4,] 13 18 23 28 33 -2
## [5,] 14 19 24 29 34 -1
cc
## [,1] [,2] [,3] [,4] [,5]
## [1,] 10 15 20 25 30
## [2,] 11 16 21 26 31
## [3,] 12 17 22 27 32
## [4,] 13 18 23 28 33
## [5,] 14 19 24 29 34
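Notice that cc itself did not change: `rbind()` and `cbind()` return a new matrix and leave the original alone. To keep the result, assign it to an object. A sketch with a small made-up matrix (`small` and `bigger` are hypothetical names):

```r
small <- matrix(1:4, nrow = 2)
bigger <- rbind(small, c(9, 9))   # store the result under a new name
dim(small)                        # 2 2: unchanged
dim(bigger)                       # 3 2
```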
We can use the which function and its variations to locate data in a matrix.
which(cc==17)
## [1] 8
which(cc==17, arr.ind=T)
## row col
## [1,] 3 2
which(cc>15)
## [1] 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
which.min(cc)
## [1] 1
which.max(cc)
## [1] 25
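When several elements match, `arr.ind = TRUE` returns one row per match, and that result can itself be used inside square brackets to extract the values. A sketch using `cc_demo`, a hypothetical copy of `cc`:

```r
cc_demo <- matrix(10:34, nrow = 5, ncol = 5)
hits <- which(cc_demo > 31, arr.ind = TRUE)
hits             # rows 3, 4 and 5 of column 5
cc_demo[hits]    # 32 33 34
```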
What if we want to “glue together” multiple kinds of data? Enter… data frames!
This is an example dataset on two species of crabs. It includes information on their color (orange or blue, in column ‘sp’), their sex (M or F), and 5 shell measurements.
library(MASS)
crabs
## sp sex index FL RW CL CW BD
## 1 B M 1 8.1 6.7 16.1 19.0 7.0
## 2 B M 2 8.8 7.7 18.1 20.8 7.4
## 3 B M 3 9.2 7.8 19.0 22.4 7.7
## 4 B M 4 9.6 7.9 20.1 23.1 8.2
## 5 B M 5 9.8 8.0 20.3 23.0 8.2
## 6 B M 6 10.8 9.0 23.0 26.5 9.8
## 7 B M 7 11.1 9.9 23.8 27.1 9.8
## 8 B M 8 11.6 9.1 24.5 28.4 10.4
## 9 B M 9 11.8 9.6 24.2 27.8 9.7
## 10 B M 10 11.8 10.5 25.2 29.3 10.3
## 11 B M 11 12.2 10.8 27.3 31.6 10.9
## 12 B M 12 12.3 11.0 26.8 31.5 11.4
## 13 B M 13 12.6 10.0 27.7 31.7 11.4
## 14 B M 14 12.8 10.2 27.2 31.8 10.9
## 15 B M 15 12.8 10.9 27.4 31.5 11.0
## 16 B M 16 12.9 11.0 26.8 30.9 11.4
## 17 B M 17 13.1 10.6 28.2 32.3 11.0
## 18 B M 18 13.1 10.9 28.3 32.4 11.2
## 19 B M 19 13.3 11.1 27.8 32.3 11.3
## 20 B M 20 13.9 11.1 29.2 33.3 12.1
## 21 B M 21 14.3 11.6 31.3 35.5 12.7
## 22 B M 22 14.6 11.3 31.9 36.4 13.7
## 23 B M 23 15.0 10.9 31.4 36.4 13.2
## 24 B M 24 15.0 11.5 32.4 37.0 13.4
## 25 B M 25 15.0 11.9 32.5 37.2 13.6
## 26 B M 26 15.2 12.1 32.3 36.7 13.6
## 27 B M 27 15.4 11.8 33.0 37.5 13.6
## 28 B M 28 15.7 12.6 35.8 40.3 14.5
## 29 B M 29 15.9 12.7 34.0 38.9 14.2
## 30 B M 30 16.1 11.6 33.8 39.0 14.4
## 31 B M 31 16.1 12.8 34.9 40.7 15.7
## 32 B M 32 16.2 13.3 36.0 41.7 15.4
## 33 B M 33 16.3 12.7 35.6 40.9 14.9
## 34 B M 34 16.4 13.0 35.7 41.8 15.2
## 35 B M 35 16.6 13.5 38.1 43.4 14.9
## 36 B M 36 16.8 12.8 36.2 41.8 14.9
## 37 B M 37 16.9 13.2 37.3 42.7 15.6
## 38 B M 38 17.1 12.6 36.4 42.0 15.1
## 39 B M 39 17.1 12.7 36.7 41.9 15.6
## 40 B M 40 17.2 13.5 37.6 43.9 16.1
## 41 B M 41 17.7 13.6 38.7 44.5 16.0
## 42 B M 42 17.9 14.1 39.7 44.6 16.8
## 43 B M 43 18.0 13.7 39.2 44.4 16.2
## 44 B M 44 18.8 15.8 42.1 49.0 17.8
## 45 B M 45 19.3 13.5 41.6 47.4 17.8
## 46 B M 46 19.3 13.8 40.9 46.5 16.8
## 47 B M 47 19.7 15.3 41.9 48.5 17.8
## 48 B M 48 19.8 14.2 43.2 49.7 18.6
## 49 B M 49 19.8 14.3 42.4 48.9 18.3
## 50 B M 50 21.3 15.7 47.1 54.6 20.0
## 51 B F 1 7.2 6.5 14.7 17.1 6.1
## 52 B F 2 9.0 8.5 19.3 22.7 7.7
## 53 B F 3 9.1 8.1 18.5 21.6 7.7
## 54 B F 4 9.1 8.2 19.2 22.2 7.7
## 55 B F 5 9.5 8.2 19.6 22.4 7.8
## 56 B F 6 9.8 8.9 20.4 23.9 8.8
## 57 B F 7 10.1 9.3 20.9 24.4 8.4
## 58 B F 8 10.3 9.5 21.3 24.7 8.9
## 59 B F 9 10.4 9.7 21.7 25.4 8.3
## 60 B F 10 10.8 9.5 22.5 26.3 9.1
## 61 B F 11 11.0 9.8 22.5 25.7 8.2
## 62 B F 12 11.2 10.0 22.8 26.9 9.4
## 63 B F 13 11.5 11.0 24.7 29.2 10.1
## 64 B F 14 11.6 11.0 24.6 28.5 10.4
## 65 B F 15 11.6 11.4 23.7 27.7 10.0
## 66 B F 16 11.7 10.6 24.9 28.5 10.4
## 67 B F 17 11.9 11.4 26.0 30.1 10.9
## 68 B F 18 12.0 10.7 24.6 28.9 10.5
## 69 B F 19 12.0 11.1 25.4 29.2 11.0
## 70 B F 20 12.6 12.2 26.1 31.6 11.2
## 71 B F 21 12.8 11.7 27.1 31.2 11.9
## 72 B F 22 12.8 12.2 26.7 31.1 11.1
## 73 B F 23 12.8 12.2 27.9 31.9 11.5
## 74 B F 24 13.0 11.4 27.3 31.8 11.3
## 75 B F 25 13.1 11.5 27.6 32.6 11.1
## 76 B F 26 13.2 12.2 27.9 32.1 11.5
## 77 B F 27 13.4 11.8 28.4 32.7 11.7
## 78 B F 28 13.7 12.5 28.6 33.8 11.9
## 79 B F 29 13.9 13.0 30.0 34.9 13.1
## 80 B F 30 14.7 12.5 30.1 34.7 12.5
## 81 B F 31 14.9 13.2 30.1 35.6 12.0
## 82 B F 32 15.0 13.8 31.7 36.9 14.0
## 83 B F 33 15.0 14.2 32.8 37.4 14.0
## 84 B F 34 15.1 13.3 31.8 36.3 13.5
## 85 B F 35 15.1 13.5 31.9 37.0 13.8
## 86 B F 36 15.1 13.8 31.7 36.6 13.0
## 87 B F 37 15.2 14.3 33.9 38.5 14.7
## 88 B F 38 15.3 14.2 32.6 38.3 13.8
## 89 B F 39 15.4 13.3 32.4 37.6 13.8
## 90 B F 40 15.5 13.8 33.4 38.7 14.7
## 91 B F 41 15.6 13.9 32.8 37.9 13.4
## 92 B F 42 15.6 14.7 33.9 39.5 14.3
## 93 B F 43 15.7 13.9 33.6 38.5 14.1
## 94 B F 44 15.8 15.0 34.5 40.3 15.3
## 95 B F 45 16.2 15.2 34.5 40.1 13.9
## 96 B F 46 16.4 14.0 34.2 39.8 15.2
## 97 B F 47 16.7 16.1 36.6 41.9 15.4
## 98 B F 48 17.4 16.9 38.2 44.1 16.6
## 99 B F 49 17.5 16.7 38.6 44.5 17.0
## 100 B F 50 19.2 16.5 40.9 47.9 18.1
## 101 O M 1 9.1 6.9 16.7 18.6 7.4
## 102 O M 2 10.2 8.2 20.2 22.2 9.0
## 103 O M 3 10.7 8.6 20.7 22.7 9.2
## 104 O M 4 11.4 9.0 22.7 24.8 10.1
## 105 O M 5 12.5 9.4 23.2 26.0 10.8
## 106 O M 6 12.5 9.4 24.2 27.0 11.2
## 107 O M 7 12.7 10.4 26.0 28.8 12.1
## 108 O M 8 13.2 11.0 27.1 30.4 12.2
## 109 O M 9 13.4 10.1 26.6 29.6 12.0
## 110 O M 10 13.7 11.0 27.5 30.5 12.2
## 111 O M 11 14.0 11.5 29.2 32.2 13.1
## 112 O M 12 14.1 10.4 28.9 31.8 13.5
## 113 O M 13 14.1 10.5 29.1 31.6 13.1
## 114 O M 14 14.1 10.7 28.7 31.9 13.3
## 115 O M 15 14.2 10.6 28.7 31.7 12.9
## 116 O M 16 14.2 10.7 27.8 30.9 12.7
## 117 O M 17 14.2 11.3 29.2 32.2 13.5
## 118 O M 18 14.6 11.3 29.9 33.5 12.8
## 119 O M 19 14.7 11.1 29.0 32.1 13.1
## 120 O M 20 15.1 11.4 30.2 33.3 14.0
## 121 O M 21 15.1 11.5 30.9 34.0 13.9
## 122 O M 22 15.4 11.1 30.2 33.6 13.5
## 123 O M 23 15.7 12.2 31.7 34.2 14.2
## 124 O M 24 16.2 11.8 32.3 35.3 14.7
## 125 O M 25 16.3 11.6 31.6 34.2 14.5
## 126 O M 26 17.1 12.6 35.0 38.9 15.7
## 127 O M 27 17.4 12.8 36.1 39.5 16.2
## 128 O M 28 17.5 12.0 34.4 37.3 15.3
## 129 O M 29 17.5 12.7 34.6 38.4 16.1
## 130 O M 30 17.8 12.5 36.0 39.8 16.7
## 131 O M 31 17.9 12.9 36.9 40.9 16.5
## 132 O M 32 18.0 13.4 36.7 41.3 17.1
## 133 O M 33 18.2 13.7 38.8 42.7 17.2
## 134 O M 34 18.4 13.4 37.9 42.2 17.7
## 135 O M 35 18.6 13.4 37.8 41.9 17.3
## 136 O M 36 18.6 13.5 36.9 40.2 17.0
## 137 O M 37 18.8 13.4 37.2 41.1 17.5
## 138 O M 38 18.8 13.8 39.2 43.3 17.9
## 139 O M 39 19.4 14.1 39.1 43.2 17.8
## 140 O M 40 19.4 14.4 39.8 44.3 17.9
## 141 O M 41 20.1 13.7 40.6 44.5 18.0
## 142 O M 42 20.6 14.4 42.8 46.5 19.6
## 143 O M 43 21.0 15.0 42.9 47.2 19.4
## 144 O M 44 21.5 15.5 45.5 49.7 20.9
## 145 O M 45 21.6 15.4 45.7 49.7 20.6
## 146 O M 46 21.6 14.8 43.4 48.2 20.1
## 147 O M 47 21.9 15.7 45.4 51.0 21.1
## 148 O M 48 22.1 15.8 44.6 49.6 20.5
## 149 O M 49 23.0 16.8 47.2 52.1 21.5
## 150 O M 50 23.1 15.7 47.6 52.8 21.6
## 151 O F 1 10.7 9.7 21.4 24.0 9.8
## 152 O F 2 11.4 9.2 21.7 24.1 9.7
## 153 O F 3 12.5 10.0 24.1 27.0 10.9
## 154 O F 4 12.6 11.5 25.0 28.1 11.5
## 155 O F 5 12.9 11.2 25.8 29.1 11.9
## 156 O F 6 14.0 11.9 27.0 31.4 12.6
## 157 O F 7 14.0 12.8 28.8 32.4 12.7
## 158 O F 8 14.3 12.2 28.1 31.8 12.5
## 159 O F 9 14.7 13.2 29.6 33.4 12.9
## 160 O F 10 14.9 13.0 30.0 33.7 13.3
## 161 O F 11 15.0 12.3 30.1 33.3 14.0
## 162 O F 12 15.6 13.5 31.2 35.1 14.1
## 163 O F 13 15.6 14.0 31.6 35.3 13.8
## 164 O F 14 15.6 14.1 31.0 34.5 13.8
## 165 O F 15 15.7 13.6 31.0 34.8 13.8
## 166 O F 16 16.1 13.6 31.6 36.0 14.0
## 167 O F 17 16.1 13.7 31.4 36.1 13.9
## 168 O F 18 16.2 14.0 31.6 35.6 13.7
## 169 O F 19 16.7 14.3 32.3 37.0 14.7
## 170 O F 20 17.1 14.5 33.1 37.2 14.6
## 171 O F 21 17.5 14.3 34.5 39.6 15.6
## 172 O F 22 17.5 14.4 34.5 39.0 16.0
## 173 O F 23 17.5 14.7 33.3 37.6 14.6
## 174 O F 24 17.6 14.0 34.0 38.6 15.5
## 175 O F 25 18.0 14.9 34.7 39.5 15.7
## 176 O F 26 18.0 16.3 37.9 43.0 17.2
## 177 O F 27 18.3 15.7 35.1 40.5 16.1
## 178 O F 28 18.4 15.5 35.6 40.0 15.9
## 179 O F 29 18.4 15.7 36.5 41.6 16.4
## 180 O F 30 18.5 14.6 37.0 42.0 16.6
## 181 O F 31 18.6 14.5 34.7 39.4 15.0
## 182 O F 32 18.8 15.2 35.8 40.5 16.6
## 183 O F 33 18.9 16.7 36.3 41.7 15.3
## 184 O F 34 19.1 16.0 37.8 42.3 16.8
## 185 O F 35 19.1 16.3 37.9 42.6 17.2
## 186 O F 36 19.7 16.7 39.9 43.6 18.2
## 187 O F 37 19.9 16.6 39.4 43.9 17.9
## 188 O F 38 19.9 17.9 40.1 46.4 17.9
## 189 O F 39 20.0 16.7 40.4 45.1 17.7
## 190 O F 40 20.1 17.2 39.8 44.1 18.6
## 191 O F 41 20.3 16.0 39.4 44.1 18.0
## 192 O F 42 20.5 17.5 40.0 45.5 19.2
## 193 O F 43 20.6 17.5 41.5 46.2 19.2
## 194 O F 44 20.9 16.5 39.9 44.7 17.5
## 195 O F 45 21.3 18.4 43.8 48.4 20.0
## 196 O F 46 21.4 18.0 41.2 46.2 18.7
## 197 O F 47 21.7 17.1 41.7 47.2 19.6
## 198 O F 48 21.9 17.2 42.6 47.4 19.5
## 199 O F 49 22.5 17.2 43.0 48.7 19.8
## 200 O F 50 23.1 20.2 46.2 52.5 21.1
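Printing all 200 rows is rarely necessary. The base functions `head()`, `tail()` and `dim()` give a quicker look at a data frame. The sketch below uses a made-up data frame (`toy_long` is a hypothetical name) so that it runs without loading MASS; the same calls work on crabs.

```r
toy_long <- data.frame(id = 1:20, value = (1:20) * 1.5)
head(toy_long)       # first 6 rows
head(toy_long, 3)    # first 3 rows
tail(toy_long, 2)    # last 2 rows
dim(toy_long)        # 20 2: number of rows and columns
```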
We can subset columns and check which classes the data belong to. They can be different, because we have a data frame rather than a matrix. Are they?
class(crabs[, 1])
## [1] "factor"
class(crabs[, 4])
## [1] "numeric"
class(crabs[, 1]) == class(crabs[, 4])
## [1] FALSE
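To check the class of every column at once, one option is `sapply()`, which applies a function to each column of a data frame. The sketch below uses a three-row made-up data frame (`toy_df` is a hypothetical stand-in for crabs, with invented rows).

```r
toy_df <- data.frame(
  sp  = factor(c("B", "B", "O")),
  sex = factor(c("M", "F", "M")),
  FL  = c(8.1, 7.2, 9.1)
)
sapply(toy_df, class)   # sp and sex are "factor", FL is "numeric"
str(toy_df)             # compact overview of every column
```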
In some ways, data frames behave like matrices.
crabs[, 2]
## [1] M M M M M M M M M M M M M M M M M M M M M M M M M M M M M M M M M M M M M
## [38] M M M M M M M M M M M M M F F F F F F F F F F F F F F F F F F F F F F F F
## [75] F F F F F F F F F F F F F F F F F F F F F F F F F F M M M M M M M M M M M
## [112] M M M M M M M M M M M M M M M M M M M M M M M M M M M M M M M M M M M M M
## [149] M M F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F
## [186] F F F F F F F F F F F F F F F
## Levels: F M
crabs[, 2:5]
## sex index FL RW
## 1 M 1 8.1 6.7
## 2 M 2 8.8 7.7
## 3 M 3 9.2 7.8
## 4 M 4 9.6 7.9
## 5 M 5 9.8 8.0
## 6 M 6 10.8 9.0
## 7 M 7 11.1 9.9
## 8 M 8 11.6 9.1
## 9 M 9 11.8 9.6
## 10 M 10 11.8 10.5
## 11 M 11 12.2 10.8
## 12 M 12 12.3 11.0
## 13 M 13 12.6 10.0
## 14 M 14 12.8 10.2
## 15 M 15 12.8 10.9
## 16 M 16 12.9 11.0
## 17 M 17 13.1 10.6
## 18 M 18 13.1 10.9
## 19 M 19 13.3 11.1
## 20 M 20 13.9 11.1
## 21 M 21 14.3 11.6
## 22 M 22 14.6 11.3
## 23 M 23 15.0 10.9
## 24 M 24 15.0 11.5
## 25 M 25 15.0 11.9
## 26 M 26 15.2 12.1
## 27 M 27 15.4 11.8
## 28 M 28 15.7 12.6
## 29 M 29 15.9 12.7
## 30 M 30 16.1 11.6
## 31 M 31 16.1 12.8
## 32 M 32 16.2 13.3
## 33 M 33 16.3 12.7
## 34 M 34 16.4 13.0
## 35 M 35 16.6 13.5
## 36 M 36 16.8 12.8
## 37 M 37 16.9 13.2
## 38 M 38 17.1 12.6
## 39 M 39 17.1 12.7
## 40 M 40 17.2 13.5
## 41 M 41 17.7 13.6
## 42 M 42 17.9 14.1
## 43 M 43 18.0 13.7
## 44 M 44 18.8 15.8
## 45 M 45 19.3 13.5
## 46 M 46 19.3 13.8
## 47 M 47 19.7 15.3
## 48 M 48 19.8 14.2
## 49 M 49 19.8 14.3
## 50 M 50 21.3 15.7
## 51 F 1 7.2 6.5
## 52 F 2 9.0 8.5
## 53 F 3 9.1 8.1
## 54 F 4 9.1 8.2
## 55 F 5 9.5 8.2
## 56 F 6 9.8 8.9
## 57 F 7 10.1 9.3
## 58 F 8 10.3 9.5
## 59 F 9 10.4 9.7
## 60 F 10 10.8 9.5
## 61 F 11 11.0 9.8
## 62 F 12 11.2 10.0
## 63 F 13 11.5 11.0
## 64 F 14 11.6 11.0
## 65 F 15 11.6 11.4
## 66 F 16 11.7 10.6
## 67 F 17 11.9 11.4
## 68 F 18 12.0 10.7
## 69 F 19 12.0 11.1
## 70 F 20 12.6 12.2
## 71 F 21 12.8 11.7
## 72 F 22 12.8 12.2
## 73 F 23 12.8 12.2
## 74 F 24 13.0 11.4
## 75 F 25 13.1 11.5
## 76 F 26 13.2 12.2
## 77 F 27 13.4 11.8
## 78 F 28 13.7 12.5
## 79 F 29 13.9 13.0
## 80 F 30 14.7 12.5
## 81 F 31 14.9 13.2
## 82 F 32 15.0 13.8
## 83 F 33 15.0 14.2
## 84 F 34 15.1 13.3
## 85 F 35 15.1 13.5
## 86 F 36 15.1 13.8
## 87 F 37 15.2 14.3
## 88 F 38 15.3 14.2
## 89 F 39 15.4 13.3
## 90 F 40 15.5 13.8
## 91 F 41 15.6 13.9
## 92 F 42 15.6 14.7
## 93 F 43 15.7 13.9
## 94 F 44 15.8 15.0
## 95 F 45 16.2 15.2
## 96 F 46 16.4 14.0
## 97 F 47 16.7 16.1
## 98 F 48 17.4 16.9
## 99 F 49 17.5 16.7
## 100 F 50 19.2 16.5
## 101 M 1 9.1 6.9
## 102 M 2 10.2 8.2
## 103 M 3 10.7 8.6
## 104 M 4 11.4 9.0
## 105 M 5 12.5 9.4
## 106 M 6 12.5 9.4
## 107 M 7 12.7 10.4
## 108 M 8 13.2 11.0
## 109 M 9 13.4 10.1
## 110 M 10 13.7 11.0
## 111 M 11 14.0 11.5
## 112 M 12 14.1 10.4
## 113 M 13 14.1 10.5
## 114 M 14 14.1 10.7
## 115 M 15 14.2 10.6
## 116 M 16 14.2 10.7
## 117 M 17 14.2 11.3
## 118 M 18 14.6 11.3
## 119 M 19 14.7 11.1
## 120 M 20 15.1 11.4
## 121 M 21 15.1 11.5
## 122 M 22 15.4 11.1
## 123 M 23 15.7 12.2
## 124 M 24 16.2 11.8
## 125 M 25 16.3 11.6
## 126 M 26 17.1 12.6
## 127 M 27 17.4 12.8
## 128 M 28 17.5 12.0
## 129 M 29 17.5 12.7
## 130 M 30 17.8 12.5
## 131 M 31 17.9 12.9
## 132 M 32 18.0 13.4
## 133 M 33 18.2 13.7
## 134 M 34 18.4 13.4
## 135 M 35 18.6 13.4
## 136 M 36 18.6 13.5
## 137 M 37 18.8 13.4
## 138 M 38 18.8 13.8
## 139 M 39 19.4 14.1
## 140 M 40 19.4 14.4
## 141 M 41 20.1 13.7
## 142 M 42 20.6 14.4
## 143 M 43 21.0 15.0
## 144 M 44 21.5 15.5
## 145 M 45 21.6 15.4
## 146 M 46 21.6 14.8
## 147 M 47 21.9 15.7
## 148 M 48 22.1 15.8
## 149 M 49 23.0 16.8
## 150 M 50 23.1 15.7
## 151 F 1 10.7 9.7
## 152 F 2 11.4 9.2
## 153 F 3 12.5 10.0
## 154 F 4 12.6 11.5
## 155 F 5 12.9 11.2
## 156 F 6 14.0 11.9
## 157 F 7 14.0 12.8
## 158 F 8 14.3 12.2
## 159 F 9 14.7 13.2
## 160 F 10 14.9 13.0
## 161 F 11 15.0 12.3
## 162 F 12 15.6 13.5
## 163 F 13 15.6 14.0
## 164 F 14 15.6 14.1
## 165 F 15 15.7 13.6
## 166 F 16 16.1 13.6
## 167 F 17 16.1 13.7
## 168 F 18 16.2 14.0
## 169 F 19 16.7 14.3
## 170 F 20 17.1 14.5
## 171 F 21 17.5 14.3
## 172 F 22 17.5 14.4
## 173 F 23 17.5 14.7
## 174 F 24 17.6 14.0
## 175 F 25 18.0 14.9
## 176 F 26 18.0 16.3
## 177 F 27 18.3 15.7
## 178 F 28 18.4 15.5
## 179 F 29 18.4 15.7
## 180 F 30 18.5 14.6
## 181 F 31 18.6 14.5
## 182 F 32 18.8 15.2
## 183 F 33 18.9 16.7
## 184 F 34 19.1 16.0
## 185 F 35 19.1 16.3
## 186 F 36 19.7 16.7
## 187 F 37 19.9 16.6
## 188 F 38 19.9 17.9
## 189 F 39 20.0 16.7
## 190 F 40 20.1 17.2
## 191 F 41 20.3 16.0
## 192 F 42 20.5 17.5
## 193 F 43 20.6 17.5
## 194 F 44 20.9 16.5
## 195 F 45 21.3 18.4
## 196 F 46 21.4 18.0
## 197 F 47 21.7 17.1
## 198 F 48 21.9 17.2
## 199 F 49 22.5 17.2
## 200 F 50 23.1 20.2
Unlike matrices, data frame columns can also be subsetted by using the dollar sign $ or by entering the name of the column in quotes.
crabs$sex
## [1] M M M M M M M M M M M M M M M M M M M M M M M M M M M M M M M M M M M M M
## [38] M M M M M M M M M M M M M F F F F F F F F F F F F F F F F F F F F F F F F
## [75] F F F F F F F F F F F F F F F F F F F F F F F F F F M M M M M M M M M M M
## [112] M M M M M M M M M M M M M M M M M M M M M M M M M M M M M M M M M M M M M
## [149] M M F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F
## [186] F F F F F F F F F F F F F F F
## Levels: F M
crabs["sex"]
## sex
## 1 M
## 2 M
## 3 M
## 4 M
## 5 M
## 6 M
## 7 M
## 8 M
## 9 M
## 10 M
## 11 M
## 12 M
## 13 M
## 14 M
## 15 M
## 16 M
## 17 M
## 18 M
## 19 M
## 20 M
## 21 M
## 22 M
## 23 M
## 24 M
## 25 M
## 26 M
## 27 M
## 28 M
## 29 M
## 30 M
## 31 M
## 32 M
## 33 M
## 34 M
## 35 M
## 36 M
## 37 M
## 38 M
## 39 M
## 40 M
## 41 M
## 42 M
## 43 M
## 44 M
## 45 M
## 46 M
## 47 M
## 48 M
## 49 M
## 50 M
## 51 F
## 52 F
## 53 F
## 54 F
## 55 F
## 56 F
## 57 F
## 58 F
## 59 F
## 60 F
## 61 F
## 62 F
## 63 F
## 64 F
## 65 F
## 66 F
## 67 F
## 68 F
## 69 F
## 70 F
## 71 F
## 72 F
## 73 F
## 74 F
## 75 F
## 76 F
## 77 F
## 78 F
## 79 F
## 80 F
## 81 F
## 82 F
## 83 F
## 84 F
## 85 F
## 86 F
## 87 F
## 88 F
## 89 F
## 90 F
## 91 F
## 92 F
## 93 F
## 94 F
## 95 F
## 96 F
## 97 F
## 98 F
## 99 F
## 100 F
## 101 M
## 102 M
## 103 M
## 104 M
## 105 M
## 106 M
## 107 M
## 108 M
## 109 M
## 110 M
## 111 M
## 112 M
## 113 M
## 114 M
## 115 M
## 116 M
## 117 M
## 118 M
## 119 M
## 120 M
## 121 M
## 122 M
## 123 M
## 124 M
## 125 M
## 126 M
## 127 M
## 128 M
## 129 M
## 130 M
## 131 M
## 132 M
## 133 M
## 134 M
## 135 M
## 136 M
## 137 M
## 138 M
## 139 M
## 140 M
## 141 M
## 142 M
## 143 M
## 144 M
## 145 M
## 146 M
## 147 M
## 148 M
## 149 M
## 150 M
## 151 F
## 152 F
## 153 F
## 154 F
## 155 F
## 156 F
## 157 F
## 158 F
## 159 F
## 160 F
## 161 F
## 162 F
## 163 F
## 164 F
## 165 F
## 166 F
## 167 F
## 168 F
## 169 F
## 170 F
## 171 F
## 172 F
## 173 F
## 174 F
## 175 F
## 176 F
## 177 F
## 178 F
## 179 F
## 180 F
## 181 F
## 182 F
## 183 F
## 184 F
## 185 F
## 186 F
## 187 F
## 188 F
## 189 F
## 190 F
## 191 F
## 192 F
## 193 F
## 194 F
## 195 F
## 196 F
## 197 F
## 198 F
## 199 F
## 200 F
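The two approaches above return different kinds of objects, which matters when the result is passed to another function. The following is a minimal sketch; it assumes that crabs is the dataset shipped with the MASS package, and the object names sex_vec, sex_df, and sex_vec2 are made up for illustration.

```r
# Assumption: crabs is the dataset shipped with the MASS package
data(crabs, package = "MASS")

sex_vec  <- crabs$sex       # dollar sign: returns the column itself (a factor)
sex_df   <- crabs["sex"]    # single brackets: returns a one-column data frame
sex_vec2 <- crabs[["sex"]]  # double brackets: same result as the dollar sign

class(sex_vec)                # "factor"
class(sex_df)                 # "data.frame"
identical(sex_vec, sex_vec2)  # TRUE
```

This is why crabs$sex prints as a compact row of values with its levels, while crabs["sex"] prints as a table with row numbers.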
Let’s say we want to select only the rows with FL > 19 (remember that within [ ], whatever comes before the comma refers to rows, and whatever comes after refers to columns):
crabs[crabs$FL > 19, ]
## sp sex index FL RW CL CW BD
## 45 B M 45 19.3 13.5 41.6 47.4 17.8
## 46 B M 46 19.3 13.8 40.9 46.5 16.8
## 47 B M 47 19.7 15.3 41.9 48.5 17.8
## 48 B M 48 19.8 14.2 43.2 49.7 18.6
## 49 B M 49 19.8 14.3 42.4 48.9 18.3
## 50 B M 50 21.3 15.7 47.1 54.6 20.0
## 100 B F 50 19.2 16.5 40.9 47.9 18.1
## 139 O M 39 19.4 14.1 39.1 43.2 17.8
## 140 O M 40 19.4 14.4 39.8 44.3 17.9
## 141 O M 41 20.1 13.7 40.6 44.5 18.0
## 142 O M 42 20.6 14.4 42.8 46.5 19.6
## 143 O M 43 21.0 15.0 42.9 47.2 19.4
## 144 O M 44 21.5 15.5 45.5 49.7 20.9
## 145 O M 45 21.6 15.4 45.7 49.7 20.6
## 146 O M 46 21.6 14.8 43.4 48.2 20.1
## 147 O M 47 21.9 15.7 45.4 51.0 21.1
## 148 O M 48 22.1 15.8 44.6 49.6 20.5
## 149 O M 49 23.0 16.8 47.2 52.1 21.5
## 150 O M 50 23.1 15.7 47.6 52.8 21.6
## 184 O F 34 19.1 16.0 37.8 42.3 16.8
## 185 O F 35 19.1 16.3 37.9 42.6 17.2
## 186 O F 36 19.7 16.7 39.9 43.6 18.2
## 187 O F 37 19.9 16.6 39.4 43.9 17.9
## 188 O F 38 19.9 17.9 40.1 46.4 17.9
## 189 O F 39 20.0 16.7 40.4 45.1 17.7
## 190 O F 40 20.1 17.2 39.8 44.1 18.6
## 191 O F 41 20.3 16.0 39.4 44.1 18.0
## 192 O F 42 20.5 17.5 40.0 45.5 19.2
## 193 O F 43 20.6 17.5 41.5 46.2 19.2
## 194 O F 44 20.9 16.5 39.9 44.7 17.5
## 195 O F 45 21.3 18.4 43.8 48.4 20.0
## 196 O F 46 21.4 18.0 41.2 46.2 18.7
## 197 O F 47 21.7 17.1 41.7 47.2 19.6
## 198 O F 48 21.9 17.2 42.6 47.4 19.5
## 199 O F 49 22.5 17.2 43.0 48.7 19.8
## 200 O F 50 23.1 20.2 46.2 52.5 21.1
The subset function can do the same. Use whichever method you are more comfortable with.
subset(crabs, crabs$FL > 19)
## sp sex index FL RW CL CW BD
## 45 B M 45 19.3 13.5 41.6 47.4 17.8
## 46 B M 46 19.3 13.8 40.9 46.5 16.8
## 47 B M 47 19.7 15.3 41.9 48.5 17.8
## 48 B M 48 19.8 14.2 43.2 49.7 18.6
## 49 B M 49 19.8 14.3 42.4 48.9 18.3
## 50 B M 50 21.3 15.7 47.1 54.6 20.0
## 100 B F 50 19.2 16.5 40.9 47.9 18.1
## 139 O M 39 19.4 14.1 39.1 43.2 17.8
## 140 O M 40 19.4 14.4 39.8 44.3 17.9
## 141 O M 41 20.1 13.7 40.6 44.5 18.0
## 142 O M 42 20.6 14.4 42.8 46.5 19.6
## 143 O M 43 21.0 15.0 42.9 47.2 19.4
## 144 O M 44 21.5 15.5 45.5 49.7 20.9
## 145 O M 45 21.6 15.4 45.7 49.7 20.6
## 146 O M 46 21.6 14.8 43.4 48.2 20.1
## 147 O M 47 21.9 15.7 45.4 51.0 21.1
## 148 O M 48 22.1 15.8 44.6 49.6 20.5
## 149 O M 49 23.0 16.8 47.2 52.1 21.5
## 150 O M 50 23.1 15.7 47.6 52.8 21.6
## 184 O F 34 19.1 16.0 37.8 42.3 16.8
## 185 O F 35 19.1 16.3 37.9 42.6 17.2
## 186 O F 36 19.7 16.7 39.9 43.6 18.2
## 187 O F 37 19.9 16.6 39.4 43.9 17.9
## 188 O F 38 19.9 17.9 40.1 46.4 17.9
## 189 O F 39 20.0 16.7 40.4 45.1 17.7
## 190 O F 40 20.1 17.2 39.8 44.1 18.6
## 191 O F 41 20.3 16.0 39.4 44.1 18.0
## 192 O F 42 20.5 17.5 40.0 45.5 19.2
## 193 O F 43 20.6 17.5 41.5 46.2 19.2
## 194 O F 44 20.9 16.5 39.9 44.7 17.5
## 195 O F 45 21.3 18.4 43.8 48.4 20.0
## 196 O F 46 21.4 18.0 41.2 46.2 18.7
## 197 O F 47 21.7 17.1 41.7 47.2 19.6
## 198 O F 48 21.9 17.2 42.6 47.4 19.5
## 199 O F 49 22.5 17.2 43.0 48.7 19.8
## 200 O F 50 23.1 20.2 46.2 52.5 21.1
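Both methods also accept several conditions at once, joined with & (and) or | (or), and both can keep only some of the columns. The sketch below assumes crabs comes from the MASS package; the object names big_females and big_females2 are hypothetical.

```r
# Assumption: crabs is the dataset shipped with the MASS package
data(crabs, package = "MASS")

# Bracket notation: rows where both conditions hold, and three chosen columns
big_females <- crabs[crabs$FL > 19 & crabs$sex == "F", c("sp", "sex", "FL")]

# Same selection with subset(); column names can be written without crabs$
big_females2 <- subset(crabs, FL > 19 & sex == "F", select = c(sp, sex, FL))

nrow(big_females)  # 18
```

Inside subset, the condition is evaluated within the data frame, so writing FL instead of crabs$FL is enough.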
Let’s sort the data frame based on sex rather than on species.
crabs[order(crabs$sex), ]
## sp sex index FL RW CL CW BD
## 51 B F 1 7.2 6.5 14.7 17.1 6.1
## 52 B F 2 9.0 8.5 19.3 22.7 7.7
## 53 B F 3 9.1 8.1 18.5 21.6 7.7
## 54 B F 4 9.1 8.2 19.2 22.2 7.7
## 55 B F 5 9.5 8.2 19.6 22.4 7.8
## 56 B F 6 9.8 8.9 20.4 23.9 8.8
## 57 B F 7 10.1 9.3 20.9 24.4 8.4
## 58 B F 8 10.3 9.5 21.3 24.7 8.9
## 59 B F 9 10.4 9.7 21.7 25.4 8.3
## 60 B F 10 10.8 9.5 22.5 26.3 9.1
## 61 B F 11 11.0 9.8 22.5 25.7 8.2
## 62 B F 12 11.2 10.0 22.8 26.9 9.4
## 63 B F 13 11.5 11.0 24.7 29.2 10.1
## 64 B F 14 11.6 11.0 24.6 28.5 10.4
## 65 B F 15 11.6 11.4 23.7 27.7 10.0
## 66 B F 16 11.7 10.6 24.9 28.5 10.4
## 67 B F 17 11.9 11.4 26.0 30.1 10.9
## 68 B F 18 12.0 10.7 24.6 28.9 10.5
## 69 B F 19 12.0 11.1 25.4 29.2 11.0
## 70 B F 20 12.6 12.2 26.1 31.6 11.2
## 71 B F 21 12.8 11.7 27.1 31.2 11.9
## 72 B F 22 12.8 12.2 26.7 31.1 11.1
## 73 B F 23 12.8 12.2 27.9 31.9 11.5
## 74 B F 24 13.0 11.4 27.3 31.8 11.3
## 75 B F 25 13.1 11.5 27.6 32.6 11.1
## 76 B F 26 13.2 12.2 27.9 32.1 11.5
## 77 B F 27 13.4 11.8 28.4 32.7 11.7
## 78 B F 28 13.7 12.5 28.6 33.8 11.9
## 79 B F 29 13.9 13.0 30.0 34.9 13.1
## 80 B F 30 14.7 12.5 30.1 34.7 12.5
## 81 B F 31 14.9 13.2 30.1 35.6 12.0
## 82 B F 32 15.0 13.8 31.7 36.9 14.0
## 83 B F 33 15.0 14.2 32.8 37.4 14.0
## 84 B F 34 15.1 13.3 31.8 36.3 13.5
## 85 B F 35 15.1 13.5 31.9 37.0 13.8
## 86 B F 36 15.1 13.8 31.7 36.6 13.0
## 87 B F 37 15.2 14.3 33.9 38.5 14.7
## 88 B F 38 15.3 14.2 32.6 38.3 13.8
## 89 B F 39 15.4 13.3 32.4 37.6 13.8
## 90 B F 40 15.5 13.8 33.4 38.7 14.7
## 91 B F 41 15.6 13.9 32.8 37.9 13.4
## 92 B F 42 15.6 14.7 33.9 39.5 14.3
## 93 B F 43 15.7 13.9 33.6 38.5 14.1
## 94 B F 44 15.8 15.0 34.5 40.3 15.3
## 95 B F 45 16.2 15.2 34.5 40.1 13.9
## 96 B F 46 16.4 14.0 34.2 39.8 15.2
## 97 B F 47 16.7 16.1 36.6 41.9 15.4
## 98 B F 48 17.4 16.9 38.2 44.1 16.6
## 99 B F 49 17.5 16.7 38.6 44.5 17.0
## 100 B F 50 19.2 16.5 40.9 47.9 18.1
## 151 O F 1 10.7 9.7 21.4 24.0 9.8
## 152 O F 2 11.4 9.2 21.7 24.1 9.7
## 153 O F 3 12.5 10.0 24.1 27.0 10.9
## 154 O F 4 12.6 11.5 25.0 28.1 11.5
## 155 O F 5 12.9 11.2 25.8 29.1 11.9
## 156 O F 6 14.0 11.9 27.0 31.4 12.6
## 157 O F 7 14.0 12.8 28.8 32.4 12.7
## 158 O F 8 14.3 12.2 28.1 31.8 12.5
## 159 O F 9 14.7 13.2 29.6 33.4 12.9
## 160 O F 10 14.9 13.0 30.0 33.7 13.3
## 161 O F 11 15.0 12.3 30.1 33.3 14.0
## 162 O F 12 15.6 13.5 31.2 35.1 14.1
## 163 O F 13 15.6 14.0 31.6 35.3 13.8
## 164 O F 14 15.6 14.1 31.0 34.5 13.8
## 165 O F 15 15.7 13.6 31.0 34.8 13.8
## 166 O F 16 16.1 13.6 31.6 36.0 14.0
## 167 O F 17 16.1 13.7 31.4 36.1 13.9
## 168 O F 18 16.2 14.0 31.6 35.6 13.7
## 169 O F 19 16.7 14.3 32.3 37.0 14.7
## 170 O F 20 17.1 14.5 33.1 37.2 14.6
## 171 O F 21 17.5 14.3 34.5 39.6 15.6
## 172 O F 22 17.5 14.4 34.5 39.0 16.0
## 173 O F 23 17.5 14.7 33.3 37.6 14.6
## 174 O F 24 17.6 14.0 34.0 38.6 15.5
## 175 O F 25 18.0 14.9 34.7 39.5 15.7
## 176 O F 26 18.0 16.3 37.9 43.0 17.2
## 177 O F 27 18.3 15.7 35.1 40.5 16.1
## 178 O F 28 18.4 15.5 35.6 40.0 15.9
## 179 O F 29 18.4 15.7 36.5 41.6 16.4
## 180 O F 30 18.5 14.6 37.0 42.0 16.6
## 181 O F 31 18.6 14.5 34.7 39.4 15.0
## 182 O F 32 18.8 15.2 35.8 40.5 16.6
## 183 O F 33 18.9 16.7 36.3 41.7 15.3
## 184 O F 34 19.1 16.0 37.8 42.3 16.8
## 185 O F 35 19.1 16.3 37.9 42.6 17.2
## 186 O F 36 19.7 16.7 39.9 43.6 18.2
## 187 O F 37 19.9 16.6 39.4 43.9 17.9
## 188 O F 38 19.9 17.9 40.1 46.4 17.9
## 189 O F 39 20.0 16.7 40.4 45.1 17.7
## 190 O F 40 20.1 17.2 39.8 44.1 18.6
## 191 O F 41 20.3 16.0 39.4 44.1 18.0
## 192 O F 42 20.5 17.5 40.0 45.5 19.2
## 193 O F 43 20.6 17.5 41.5 46.2 19.2
## 194 O F 44 20.9 16.5 39.9 44.7 17.5
## 195 O F 45 21.3 18.4 43.8 48.4 20.0
## 196 O F 46 21.4 18.0 41.2 46.2 18.7
## 197 O F 47 21.7 17.1 41.7 47.2 19.6
## 198 O F 48 21.9 17.2 42.6 47.4 19.5
## 199 O F 49 22.5 17.2 43.0 48.7 19.8
## 200 O F 50 23.1 20.2 46.2 52.5 21.1
## 1 B M 1 8.1 6.7 16.1 19.0 7.0
## 2 B M 2 8.8 7.7 18.1 20.8 7.4
## 3 B M 3 9.2 7.8 19.0 22.4 7.7
## 4 B M 4 9.6 7.9 20.1 23.1 8.2
## 5 B M 5 9.8 8.0 20.3 23.0 8.2
## 6 B M 6 10.8 9.0 23.0 26.5 9.8
## 7 B M 7 11.1 9.9 23.8 27.1 9.8
## 8 B M 8 11.6 9.1 24.5 28.4 10.4
## 9 B M 9 11.8 9.6 24.2 27.8 9.7
## 10 B M 10 11.8 10.5 25.2 29.3 10.3
## 11 B M 11 12.2 10.8 27.3 31.6 10.9
## 12 B M 12 12.3 11.0 26.8 31.5 11.4
## 13 B M 13 12.6 10.0 27.7 31.7 11.4
## 14 B M 14 12.8 10.2 27.2 31.8 10.9
## 15 B M 15 12.8 10.9 27.4 31.5 11.0
## 16 B M 16 12.9 11.0 26.8 30.9 11.4
## 17 B M 17 13.1 10.6 28.2 32.3 11.0
## 18 B M 18 13.1 10.9 28.3 32.4 11.2
## 19 B M 19 13.3 11.1 27.8 32.3 11.3
## 20 B M 20 13.9 11.1 29.2 33.3 12.1
## 21 B M 21 14.3 11.6 31.3 35.5 12.7
## 22 B M 22 14.6 11.3 31.9 36.4 13.7
## 23 B M 23 15.0 10.9 31.4 36.4 13.2
## 24 B M 24 15.0 11.5 32.4 37.0 13.4
## 25 B M 25 15.0 11.9 32.5 37.2 13.6
## 26 B M 26 15.2 12.1 32.3 36.7 13.6
## 27 B M 27 15.4 11.8 33.0 37.5 13.6
## 28 B M 28 15.7 12.6 35.8 40.3 14.5
## 29 B M 29 15.9 12.7 34.0 38.9 14.2
## 30 B M 30 16.1 11.6 33.8 39.0 14.4
## 31 B M 31 16.1 12.8 34.9 40.7 15.7
## 32 B M 32 16.2 13.3 36.0 41.7 15.4
## 33 B M 33 16.3 12.7 35.6 40.9 14.9
## 34 B M 34 16.4 13.0 35.7 41.8 15.2
## 35 B M 35 16.6 13.5 38.1 43.4 14.9
## 36 B M 36 16.8 12.8 36.2 41.8 14.9
## 37 B M 37 16.9 13.2 37.3 42.7 15.6
## 38 B M 38 17.1 12.6 36.4 42.0 15.1
## 39 B M 39 17.1 12.7 36.7 41.9 15.6
## 40 B M 40 17.2 13.5 37.6 43.9 16.1
## 41 B M 41 17.7 13.6 38.7 44.5 16.0
## 42 B M 42 17.9 14.1 39.7 44.6 16.8
## 43 B M 43 18.0 13.7 39.2 44.4 16.2
## 44 B M 44 18.8 15.8 42.1 49.0 17.8
## 45 B M 45 19.3 13.5 41.6 47.4 17.8
## 46 B M 46 19.3 13.8 40.9 46.5 16.8
## 47 B M 47 19.7 15.3 41.9 48.5 17.8
## 48 B M 48 19.8 14.2 43.2 49.7 18.6
## 49 B M 49 19.8 14.3 42.4 48.9 18.3
## 50 B M 50 21.3 15.7 47.1 54.6 20.0
## 101 O M 1 9.1 6.9 16.7 18.6 7.4
## 102 O M 2 10.2 8.2 20.2 22.2 9.0
## 103 O M 3 10.7 8.6 20.7 22.7 9.2
## 104 O M 4 11.4 9.0 22.7 24.8 10.1
## 105 O M 5 12.5 9.4 23.2 26.0 10.8
## 106 O M 6 12.5 9.4 24.2 27.0 11.2
## 107 O M 7 12.7 10.4 26.0 28.8 12.1
## 108 O M 8 13.2 11.0 27.1 30.4 12.2
## 109 O M 9 13.4 10.1 26.6 29.6 12.0
## 110 O M 10 13.7 11.0 27.5 30.5 12.2
## 111 O M 11 14.0 11.5 29.2 32.2 13.1
## 112 O M 12 14.1 10.4 28.9 31.8 13.5
## 113 O M 13 14.1 10.5 29.1 31.6 13.1
## 114 O M 14 14.1 10.7 28.7 31.9 13.3
## 115 O M 15 14.2 10.6 28.7 31.7 12.9
## 116 O M 16 14.2 10.7 27.8 30.9 12.7
## 117 O M 17 14.2 11.3 29.2 32.2 13.5
## 118 O M 18 14.6 11.3 29.9 33.5 12.8
## 119 O M 19 14.7 11.1 29.0 32.1 13.1
## 120 O M 20 15.1 11.4 30.2 33.3 14.0
## 121 O M 21 15.1 11.5 30.9 34.0 13.9
## 122 O M 22 15.4 11.1 30.2 33.6 13.5
## 123 O M 23 15.7 12.2 31.7 34.2 14.2
## 124 O M 24 16.2 11.8 32.3 35.3 14.7
## 125 O M 25 16.3 11.6 31.6 34.2 14.5
## 126 O M 26 17.1 12.6 35.0 38.9 15.7
## 127 O M 27 17.4 12.8 36.1 39.5 16.2
## 128 O M 28 17.5 12.0 34.4 37.3 15.3
## 129 O M 29 17.5 12.7 34.6 38.4 16.1
## 130 O M 30 17.8 12.5 36.0 39.8 16.7
## 131 O M 31 17.9 12.9 36.9 40.9 16.5
## 132 O M 32 18.0 13.4 36.7 41.3 17.1
## 133 O M 33 18.2 13.7 38.8 42.7 17.2
## 134 O M 34 18.4 13.4 37.9 42.2 17.7
## 135 O M 35 18.6 13.4 37.8 41.9 17.3
## 136 O M 36 18.6 13.5 36.9 40.2 17.0
## 137 O M 37 18.8 13.4 37.2 41.1 17.5
## 138 O M 38 18.8 13.8 39.2 43.3 17.9
## 139 O M 39 19.4 14.1 39.1 43.2 17.8
## 140 O M 40 19.4 14.4 39.8 44.3 17.9
## 141 O M 41 20.1 13.7 40.6 44.5 18.0
## 142 O M 42 20.6 14.4 42.8 46.5 19.6
## 143 O M 43 21.0 15.0 42.9 47.2 19.4
## 144 O M 44 21.5 15.5 45.5 49.7 20.9
## 145 O M 45 21.6 15.4 45.7 49.7 20.6
## 146 O M 46 21.6 14.8 43.4 48.2 20.1
## 147 O M 47 21.9 15.7 45.4 51.0 21.1
## 148 O M 48 22.1 15.8 44.6 49.6 20.5
## 149 O M 49 23.0 16.8 47.2 52.1 21.5
## 150 O M 50 23.1 15.7 47.6 52.8 21.6
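The order function can take more than one sorting key, and it can sort from largest to smallest. A minimal sketch, assuming crabs comes from the MASS package (the names sorted and by_fl are made up for this example):

```r
# Assumption: crabs is the dataset shipped with the MASS package
data(crabs, package = "MASS")

# Sort by sex first, then by FL from largest to smallest within each sex
sorted <- crabs[order(crabs$sex, -crabs$FL), ]
head(sorted, 3)

# Sort the whole data frame by FL in decreasing order
by_fl <- crabs[order(crabs$FL, decreasing = TRUE), ]
head(by_fl, 3)
```

Note that order only returns row positions; the data frame is rearranged by placing those positions before the comma inside [ ].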
To quickly locate data in a dataset, we can use the function which, which we used above for matrices. It returns the positions of the elements that meet a condition.
which(crabs$sp == "O")
## [1] 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118
## [19] 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136
## [37] 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154
## [55] 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172
## [73] 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190
## [91] 191 192 193 194 195 196 197 198 199 200
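The positions returned by which can be stored and reused to pull out the matching rows. The sketch below assumes crabs comes from the MASS package; orange_rows, orange_crabs, and big_orange are hypothetical names.

```r
# Assumption: crabs is the dataset shipped with the MASS package
data(crabs, package = "MASS")

# Positions (row numbers) of the orange crabs
orange_rows <- which(crabs$sp == "O")
length(orange_rows)  # 100

# Use those positions to extract the matching rows
orange_crabs <- crabs[orange_rows, ]

# Conditions can be combined: positions of orange crabs with FL > 22
big_orange <- which(crabs$sp == "O" & crabs$FL > 22)
big_orange  # 148 149 150 199 200
```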
Let’s make a quick plot of our data (we will cover plots in more depth in a different class). Exploratory plots can help us see general patterns in our data. There are a few ways to do this:
plot(x=crabs$CL, y=crabs$CW)
plot(CW ~ CL, data=crabs) # formula notation is y ~ x
There are MANY arguments that let us add titles, captions, legends, regression lines, species names, etc. If you want to learn more about them, check the plot help file.
?plot
## Help on topic 'plot' was found in the following packages:
##
## Package Library
## graphics /Library/Frameworks/R.framework/Versions/4.2/Resources/library
## base /Library/Frameworks/R.framework/Resources/library
##
##
## Using the first match ...
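As an illustration of a few of those arguments, here is a minimal sketch of a labelled scatter plot. It assumes crabs comes from the MASS package, where sp is coded B (blue) and O (orange) and the measurements are in millimetres; the object name point_cols and the label text are choices made for this example only.

```r
# Assumption: crabs is the dataset shipped with the MASS package
data(crabs, package = "MASS")

# One colour per crab, based on species (B = blue, O = orange)
point_cols <- ifelse(crabs$sp == "B", "blue", "orange")

plot(CW ~ CL, data = crabs,
     col  = point_cols,
     pch  = 16,                          # filled circles
     xlab = "Carapace length (mm)",
     ylab = "Carapace width (mm)",
     main = "Carapace width vs. length in crabs")

# Add a legend and a simple regression line to the existing plot
legend("topleft", legend = c("Blue", "Orange"),
       col = c("blue", "orange"), pch = 16)
abline(lm(CW ~ CL, data = crabs))
```

The functions legend and abline draw on top of the plot that is already open, so they must be run after plot.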
A nice thing about R is that it is free and open source. This means that we can easily find many packages that extend the functionality of R. Many of these packages have been written by students and researchers. They are available for free at R’s official repository, CRAN.
We can install packages from CRAN with the install.packages function. In RStudio, we can also go to Packages => Install (lower right pane).
install.packages("ggplot2")
## Error in contrib.url(repos, "source"): trying to use CRAN without setting a mirror
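The error above appears when no CRAN mirror has been chosen, which typically happens when the code runs non-interactively (for example, while knitting a document). One way around it is to pass a mirror through the repos argument of install.packages. The following is a sketch only: install_if_missing is a hypothetical helper written for this example, and https://cloud.r-project.org is one of several mirrors that could be used.

```r
# Hypothetical helper: install only the packages that are not yet available
install_if_missing <- function(pkgs, repos = "https://cloud.r-project.org") {
  # TRUE for each package that can already be loaded
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing_pkgs <- pkgs[!installed]
  if (length(missing_pkgs) > 0) {
    install.packages(missing_pkgs, repos = repos)
  }
  invisible(missing_pkgs)  # names of the packages that had to be installed
}

# "stats" ships with R, so this call finds nothing to install
install_if_missing("stats")
```

Calling install_if_missing("ggplot2") would download ggplot2 from that mirror only if it is not already installed, which avoids reinstalling the package every time the document is knitted.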
Some packages are loaded automatically when you launch R. These include the base, stats, and graphics packages. If you want to know what they do, check the help file of each package by using the following code.
library(help= "base")
Many packages are not preloaded by R, and you need to load them manually. You use the library function to do so.
library(ggplot2)
You can also access a function without loading its package with library, by writing the package name and two colons (::) before the function name.
str(gls)
## Error in str(gls): object 'gls' not found
str(nlme::gls)
## function (model, data = sys.frame(sys.parent()), correlation = NULL, weights = NULL,
## subset, method = c("REML", "ML"), na.action = na.fail, control = list(),
## verbose = FALSE)
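To see that the :: notation does not attach the package, we can inspect the search path before and after using it. This is a minimal sketch; it assumes nlme is installed (it normally comes with R as a recommended package), and gls_fun is a made-up object name.

```r
# Assumption: nlme is installed (it is usually distributed together with R)
"package:nlme" %in% search()   # FALSE unless library(nlme) was called earlier

gls_fun <- nlme::gls           # fetch the function without calling library()
is.function(gls_fun)           # TRUE

"package:nlme" %in% search()   # unchanged: using :: does not attach the package
```

This notation is handy when only one function from a package is needed, or when two loaded packages contain functions with the same name.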
Download and install the packages dplyr, nlme, ape, and strap. Once installed, load them.
Throughout this course, you will be working with data (loading it, analysing it, making graphs). It will make life easier if you have a directory structure to keep things organized.
We recommend the following structure:
ProjectName (folder that contains the subfolders below)
- data (where you will store your datasets)
- docs (where you will store Rmd files)
- figs (where you will put figures that you generate with an R script)
- R (where you will store small R functions that you write to work with your data)
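The recommended folders can be created by hand in a file browser, or directly from R. The sketch below builds them inside a temporary folder purely for illustration; the location is an assumption, and project_dir and subfolders are names made up for this example.

```r
# Assumption: the project is created inside a temporary folder for illustration;
# replace tempdir() with the location where you keep your coursework
project_dir <- file.path(tempdir(), "ProjectName")
subfolders  <- c("data", "docs", "figs", "R")

for (folder in subfolders) {
  # recursive = TRUE also creates ProjectName itself if it does not exist yet
  dir.create(file.path(project_dir, folder),
             recursive = TRUE, showWarnings = FALSE)
}

list.files(project_dir)  # lists the four subfolders
```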
Next time, we are going to go over how to get data into R. You will use this a lot, and it will be VERY important later on in the class as you are working with your own datasets.
Use the dataset airquality:
data("airquality")
What kind of object is it and what is its structure?
What are its dimensions?
Create a new object temp that contains the temperature data (column Temp).
Extract the first 30 values of temp into a new object temp2.
Convert temp2 into a 2-column matrix.
Check if temp2 is a matrix.
The data in airquality stop at September 30. Add these data for October 1:
ozone = 18, solar = 215, wind = 9.3, temp = 67, month = 10, day = 1
Conversion formula: °C = (°F − 32) * 5 / 9
Using the crabs dataset:
Create a dataset with only the males in crabs.
Create a dataset with only the orange females in crabs.
Create a dataset with all crabs for which CL is larger than the median of that trait.
Comments
Comments go after a hash sign (#). They can help organize your code.