R can be used as a calculator. Enter the following in the console.
2 + 2*sin(pi)
[1] 2
When we assign a value to a variable, it is better to use the assignment operator “<-” as below.
There is a general preference among the R community to use “<-” (instead of “=”) for assignment for compatibility with (very) old versions of S-Plus.
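As a minimal sketch of assignment with <- (the variable names x and y are hypothetical, chosen only for illustration):

```r
# Assign values with the <- operator (x and y are illustrative names)
x <- 5
y <- x + 2 * 3

# Typing a variable's name prints its value
y
```

After running these lines, both x and y are stored in the current environment.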
Now you can click on the Environment tab (top right panel in RStudio) to see the variables.
You will make lots of assignments, and <- can be annoying to type. You can save time with RStudio’s keyboard shortcut: Alt + - (the minus sign). Notice that RStudio automatically surrounds <- with spaces, which is good code formatting practice.
command | usage |
---|---|
rm(x) | remove variable x |
rm(list = ls()) | remove all variables in the current environment |
getwd() | print the current working directory |
setwd("/home/username/folder") | change the current working directory |
cat("\014") | clear the console (same as Ctrl + L) |
help(mean) or ?mean | get help on the function mean() |
class() | find out the class (data structure) of an object |
str() | display the internal structure of an object compactly; usually more detailed than class() |
The # symbol begins a comment. Comments are used regularly to annotate the code immediately below them. If a commented line is run in the R console, nothing happens.
Keyboard shortcuts:
shortcut | function |
---|---|
[Alt] + [-] | insert " <- " |
[Tab] | auto-complete |
[Cmd/Ctrl] + [Enter] | execute code |
[↑] | recall previous commands from the history |
[Cmd/Ctrl] + [↑] | recall previous commands that start with the letters typed in the console |
There are several types of structures (or variable types) in R.
The following defines a vector in R.
You can have a vector of strings.
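A minimal sketch of both kinds of vector, built with c(). The names s and fruits and all values are assumptions made for illustration, since the original definitions are not shown here.

```r
# A numeric vector (values are illustrative)
s <- c(1.5, 2, 3.7, 4, 10)

# A vector of strings (a character vector)
fruits <- c("apple", "banana", "cherry")

length(s)       # number of elements: 5
class(fruits)   # "character"
```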
In R, indexing starts at 1. Retrieve the 4th coordinate of s.
Retrieve the first 3 coordinates of s.
Retrieve the 1st, 2nd, and 5th coordinates of s.
Retrieve all coordinates but the 3rd.
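The four retrievals above can be sketched as follows. The values stored in s are hypothetical; any vector with at least five elements behaves the same way.

```r
s <- c(10, 20, 30, 40, 50)  # hypothetical values

s[4]           # 4th element: 40
s[1:3]         # first three elements: 10 20 30
s[c(1, 2, 5)]  # 1st, 2nd, and 5th elements: 10 20 50
s[-3]          # everything except the 3rd: 10 20 40 50
```

A negative index drops the element at that position instead of selecting it.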
The function seq() is very handy in creating equally spaced numbers.
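For instance, seq() can be given either a step size or a desired number of points. The endpoints below are arbitrary and the variable names are illustrative.

```r
# Fixed step size
by_step <- seq(0, 1, by = 0.25)
by_step        # 0.00 0.25 0.50 0.75 1.00

# Fixed number of equally spaced points
by_count <- seq(1, 10, length.out = 4)
by_count       # 1 4 7 10
```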
Basic arithmetic on a vector is applied to every element of the vector.
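A short sketch of element-wise arithmetic, using a hypothetical vector v:

```r
v <- c(1, 2, 3, 4)  # hypothetical values

v + 10                 # 11 12 13 14
v * 2                  # 2 4 6 8
v^2                    # 1 4 9 16
v + c(10, 20, 30, 40)  # 11 22 33 44 (two vectors are combined element by element)
```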
To create a data frame (df),
name Height
1 Amy 160
2 Bob 170
3 Cindy 165
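The data frame printed above can be built with data.frame(). The sketch below assumes the object is called my_df, the name used later in this section. The argument stringsAsFactors = FALSE is spelled out so that the name column stays a character vector on older versions of R as well.

```r
# Each argument becomes one column; all columns must have the same length
my_df <- data.frame(
  name = c("Amy", "Bob", "Cindy"),
  Height = c(160, 170, 165),
  stringsAsFactors = FALSE
)

my_df
```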
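To retrieve the entry in the 3rd row and 2nd column, index with the row number followed by the column number.
To retrieve the 2nd row of a df, give the row number and leave the column position empty.
There are multiple ways to retrieve the 1st column: df[, 1], df[[1]], and df$colname.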
[1] "Amy" "Bob" "Cindy"
[1] "Amy" "Bob" "Cindy"
[1] "Amy" "Bob" "Cindy"
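The retrievals described above can be sketched as follows, assuming my_df is the three-row data frame of names and heights shown earlier (it is rebuilt here so the example runs on its own).

```r
my_df <- data.frame(
  name = c("Amy", "Bob", "Cindy"),
  Height = c(160, 170, 165),
  stringsAsFactors = FALSE
)

my_df[3, 2]   # entry in the 3rd row and 2nd column: 165
my_df[2, ]    # the 2nd row, returned as a one-row data frame

# Three equivalent ways to get the 1st column as a vector
my_df[, 1]
my_df[[1]]
my_df$name
```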
In all the subsetting commands above, you can always replace the column index with the corresponding column name. For example, my_df[, 1] is the same as my_df[, "name"].
The following selects the first column while keeping the data frame structure.
Recall that we can use class() to find out the data structure of any object.
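One way to keep the data frame structure is to index with a single pair of brackets and no comma; this is a sketch and not necessarily the command used originally. Comparing the result of class() shows the difference (my_df is rebuilt here so the example runs on its own).

```r
my_df <- data.frame(
  name = c("Amy", "Bob", "Cindy"),
  Height = c(160, 170, 165),
  stringsAsFactors = FALSE
)

first_col <- my_df[1]        # still a data frame with one column
class(first_col)             # "data.frame"
class(my_df[, 1])            # "character": the column was simplified to a vector

# drop = FALSE is another way to prevent the simplification
class(my_df[, 1, drop = FALSE])  # "data.frame"
```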
The function data() lists the data sets available in the currently loaded packages. “CO2” is a data frame that ships with R.
Learn more about this data set by opening its help page.
The following commands are useful for exploring the data.
Plant Type Treatment conc uptake
1 Qn1 Quebec nonchilled 95 16.0
2 Qn1 Quebec nonchilled 175 30.4
3 Qn1 Quebec nonchilled 250 34.8
4 Qn1 Quebec nonchilled 350 37.2
5 Qn1 Quebec nonchilled 500 35.3
6 Qn1 Quebec nonchilled 675 39.2
[1] "Plant" "Type" "Treatment" "conc" "uptake"
[1] 84 5
[1] 84
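The outputs above are consistent with the following exploration commands. This is a sketch: the exact calls used originally are not shown, and names() could equally have been colnames().

```r
# ?CO2        # opens the help page for the data set (run interactively)

head(CO2)     # first six rows
names(CO2)    # column names
dim(CO2)      # number of rows and columns: 84 5
nrow(CO2)     # number of rows: 84
```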
We will be working with data frames a lot throughout this course.
Plant Type Treatment conc uptake
11 Qn2 Quebec nonchilled 350 41.8
12 Qn2 Quebec nonchilled 500 40.6
13 Qn2 Quebec nonchilled 675 41.4
14 Qn2 Quebec nonchilled 1000 44.3
17 Qn3 Quebec nonchilled 250 40.3
18 Qn3 Quebec nonchilled 350 42.1
19 Qn3 Quebec nonchilled 500 42.9
20 Qn3 Quebec nonchilled 675 43.9
21 Qn3 Quebec nonchilled 1000 45.5
35 Qc2 Quebec chilled 1000 42.4
42 Qc3 Quebec chilled 1000 41.4
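The rows printed above all have an uptake greater than 40, so they look like the result of a logical filter on that column. The command below is an assumption about what produced them, and high_uptake is a hypothetical name.

```r
# Keep only the rows whose uptake exceeds 40; the empty slot after the comma keeps all columns
high_uptake <- CO2[CO2$uptake > 40, ]
high_uptake
```

The expression CO2$uptake > 40 produces a vector of TRUE and FALSE values, one per row, and only the rows marked TRUE are kept.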
The 2nd column of “CO2” is a factor. Factors are very useful for categorical variables.
Also see Section 27.3.3 of Hadley
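A brief sketch of how to inspect a factor, using the Type column of CO2:

```r
class(CO2$Type)    # "factor"
levels(CO2$Type)   # the distinct categories the factor can take
table(CO2$Type)    # number of rows in each category
```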
A list can be thought of as a vector whose elements can be of different types and structures.
[ extracts a sub-list. For example, l[1:2] is still a list.
[[ or $ extracts a single component from a list. For example, l[[1]] is a vector.
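The difference between the two kinds of extraction can be sketched with a small list. The list l below and its component names are hypothetical.

```r
# A list holding a numeric vector, a string, and a logical value
l <- list(numbers = c(1, 2, 3), label = "abc", flag = TRUE)

class(l[1:2])   # "list": single brackets return a sub-list
class(l[[1]])   # "numeric": double brackets return the component itself
l$label         # "abc": $ extracts a component by name
```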
Good examples:
Read 4.1 and 4.2 of this book for more details.
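R is well suited to probability computations and statistical analysis.
For each kind of probability distribution, R has four accompanying functions, prefixed with d (density), p (cumulative probability), q (quantile), and r (random generation). For example, for the Gaussian (normal) distribution,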
# generate 3 random numbers following the normal distribution with mean 2 and sd 0.2.
rnorm(n = 3, mean = 2, sd = 0.2)
[1] 1.757635 2.092836 2.035169
If we use set.seed(), then the results are repeatable.
[1] 0.01874617 -0.18425254 -1.37133055 -0.59916772
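A minimal sketch of repeatability with set.seed(). The seed 123 is arbitrary and is not the seed that produced the output above, which is not shown.

```r
set.seed(123)            # arbitrary seed, chosen for illustration
first_draw <- rnorm(4)

set.seed(123)            # resetting the same seed restarts the random sequence
second_draw <- rnorm(4)

identical(first_draw, second_draw)  # TRUE
```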
The density function of \(X \sim N(\mu, \sigma^2)\) is \[f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}.\] This function is dnorm() in R.
[1] 0.3910427
[1] 0.3910427
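The two identical values above are consistent with evaluating the density once through dnorm() and once through the formula directly. The inputs below (x = 1.8, mu = 1.6, sigma = 1) are assumptions chosen because they reproduce 0.3910427; the original inputs are not shown.

```r
x <- 1.8      # assumed evaluation point
mu <- 1.6     # assumed mean
sigma <- 1    # assumed standard deviation

# Built-in density function
dnorm(x, mean = mu, sd = sigma)

# The same value computed from the formula
1 / sqrt(2 * pi * sigma^2) * exp(-(x - mu)^2 / (2 * sigma^2))
```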
The following computes \(P(X<2)\) where \(X\) follows the normal distribution with \(\mu=1.6, \sigma=1\).
The following computes \(q\) such that \(P(X<q) = 0.8\) where \(X\) follows the normal distribution with \(\mu=1.6, \sigma=1\).
pnorm() and qnorm() are inverse functions of each other.
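The two computations described above, and the inverse relationship between them, can be sketched as follows. The names p and q are illustrative.

```r
# P(X < 2) for X ~ N(mean 1.6, sd 1)
p <- pnorm(2, mean = 1.6, sd = 1)
p

# q such that P(X < q) = 0.8
q <- qnorm(0.8, mean = 1.6, sd = 1)
q

# Applying one function after the other recovers the input (up to floating-point error)
qnorm(p, mean = 1.6, sd = 1)   # recovers 2
pnorm(q, mean = 1.6, sd = 1)   # recovers 0.8
```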
The other distributions work similarly. For example, runif() generates random numbers coming from the uniform distribution.
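As a sketch, the uniform distribution has the same four-function family. The interval from 0 to 1 is the default and is written out here only for clarity.

```r
u <- runif(n = 5, min = 0, max = 1)  # five random numbers between 0 and 1
u

dunif(0.3, min = 0, max = 1)   # density at 0.3: 1
punif(0.3, min = 0, max = 1)   # P(U < 0.3): 0.3
qunif(0.3, min = 0, max = 1)   # 30% quantile: 0.3
```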
The syntax for defining a new function in R is
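A minimal sketch of the general form, using a hypothetical function square_plus with one required argument and one argument that has a default value:

```r
# General form: name <- function(arguments) { body }
square_plus <- function(x, offset = 0) {
  # The value of the last evaluated expression is returned
  x^2 + offset
}

square_plus(3)              # 9
square_plus(3, offset = 1)  # 10
```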
To be filled
to be continued
In the near future, we will use the package “ggplot2” for its data visualization capabilities. However, it is still very useful to know how to plot with base R because of its convenience and simplicity.
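As a small illustration of base R plotting, the sketch below draws a scatter plot and a histogram from the CO2 data set. The choice of columns, labels, and titles is an assumption made for illustration.

```r
conc <- CO2$conc
uptake <- CO2$uptake

# Scatter plot of uptake against concentration
plot(conc, uptake,
     xlab = "conc", ylab = "uptake",
     main = "CO2: uptake vs. conc")

# Histogram of the uptake values
hist(uptake, main = "Distribution of uptake", xlab = "uptake")
```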