You should have completed lessons 1 and 3 of R programming in swirl before reading this supplementary note.

One thing you notice when opening an R console is that you can use it as a calculator. In addition to the basic arithmetic operations addition (+), subtraction (-), multiplication (*) and division (/), R has built-in standard mathematical functions. The following is a short list of standard mathematical functions.

| Function | Name | Example |
|---|---|---|
| abs | absolute value | abs(3-6) = 3 |
| sqrt | square root | sqrt(16) = 4 |
| ^ | exponentiation | 3^10 = \(3^{10}\) = 59049 |
| exp | exponential function | exp(1.7) = \(e^{1.7}\) = 5.473947 |
| log | log function (base e) | log(10) = 2.302585 |
| log10 | base 10 log (\(\log_{10}\)) | log10(100) = 2 |
| pi | mathematical constant \(\pi\) | pi = 3.141593 |
| sin, cos, tan | trigonometric functions (argument in radians) | sin(pi/2) = 1 |
| asin, acos, atan | inverse trigonometric functions | acos(1) = 0 |
| sinh, cosh, tanh | hyperbolic functions | cosh(0) = 1 |
| asinh, acosh, atanh | inverse hyperbolic functions | atanh(tanh(12)) = 12 |
| round(x,n) | round x to n decimal places | round(pi,2) = 3.14 |
| floor | rounds down | floor(14.7) = 14 |
| ceiling | rounds up | ceiling(14.7) = 15 |
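A few of the entries above can be checked directly at the console. The sketch below uses only functions from the list; the variable name deg is just for illustration, and the comments show what R prints with its default settings.

```r
# Arithmetic and a few built-in functions from the list above
abs(3 - 6)        # 3
sqrt(16)          # 4
3^10              # 59049
log10(100)        # 2
round(pi, 2)      # 3.14
floor(14.7)       # 14
ceiling(14.7)     # 15

# Trigonometric functions expect radians, so convert degrees first
deg <- 90
sin(deg * pi / 180)   # 1
```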

There are also useful statistical functions that we will talk about later. Being able to use R as a calculator lets us analyze data interactively, as we will demonstrate later in the course. One distinct advantage of R over a conventional scientific calculator is its ability to perform vectorized operations, as demonstrated below.

Vectorized Operations

As explained in lesson 3 of R programming in swirl, we can generate a vector of consecutive integers using the : operator. For example, 1:100 generates a vector of length 100 with values 1, 2, 3, …, 100. Here “vector” simply means an array of numbers/characters/objects of the same class. We can also store the integer vector in a variable using the assignment operator <-:

x <- 1:100

Many other programming languages use = as an assignment operator. In R, you can use = as an assignment operator too. For example, x = 1:100 is equivalent to x <- 1:100 in this context. However, as you will learn later, the = operator in R has other uses. Assignments can also be made in the other direction by reversing the arrow. For example, 1:100 -> x is equivalent to x <- 1:100.
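The equivalence of the three assignment forms can be checked with identical(). This is a small sketch; the variable names a, b and d are chosen here only for illustration.

```r
# Three equivalent ways to assign the same integer vector
a <- 1:100
b = 1:100
1:100 -> d

identical(a, b)   # TRUE
identical(a, d)   # TRUE
length(a)         # 100
class(a)          # "integer"
```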

Suppose now I type

x <- 2*x - 1

What happens is that the original integer vector (1, 2, 3, …, 100) is replaced by (1, 3, 5, 7, …, 199): each element is multiplied by 2, and then 1 is subtracted from it. You can confirm this by typing x to auto-print its content:

x
  [1]   1   3   5   7   9  11  13  15  17  19  21  23  25  27  29  31  33
 [18]  35  37  39  41  43  45  47  49  51  53  55  57  59  61  63  65  67
 [35]  69  71  73  75  77  79  81  83  85  87  89  91  93  95  97  99 101
 [52] 103 105 107 109 111 113 115 117 119 121 123 125 127 129 131 133 135
 [69] 137 139 141 143 145 147 149 151 153 155 157 159 161 163 165 167 169
 [86] 171 173 175 177 179 181 183 185 187 189 191 193 195 197 199

What happens if I type the following?

x <- c(1,-1)*x

As explained in lesson 1 of R programming in swirl, since c(1,-1) is a vector of length 2 and x is a vector of length 100, R “recycles” the c(1,-1) vector 50 times to carry out the multiplication. The result is that the first element of the original vector is multiplied by 1, the second element is multiplied by -1, the third is multiplied by 1 and so on. So the content of x becomes:

x
  [1]    1   -3    5   -7    9  -11   13  -15   17  -19   21  -23   25  -27
 [15]   29  -31   33  -35   37  -39   41  -43   45  -47   49  -51   53  -55
 [29]   57  -59   61  -63   65  -67   69  -71   73  -75   77  -79   81  -83
 [43]   85  -87   89  -91   93  -95   97  -99  101 -103  105 -107  109 -111
 [57]  113 -115  117 -119  121 -123  125 -127  129 -131  133 -135  137 -139
 [71]  141 -143  145 -147  149 -151  153 -155  157 -159  161 -163  165 -167
 [85]  169 -171  173 -175  177 -179  181 -183  185 -187  189 -191  193 -195
 [99]  197 -199
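Recycling is easier to see on a short vector. The sketch below uses a hypothetical vector y of length 6, so c(1,-1) is reused three times, and compares the result with the recycling written out explicitly using rep().

```r
# Recycling on a short vector: c(1, -1) is reused three times
y <- 1:6
c(1, -1) * y                    # 1 -2  3 -4  5 -6

# The same result with the recycling written out explicitly
rep(c(1, -1), times = 3) * y    # 1 -2  3 -4  5 -6
```

If the length of the longer vector is not a multiple of the length of the shorter one, R still recycles but issues a warning.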

Now let’s do the following:

x <- sum(1/x)

sum() is a built-in R function that returns the sum of a vector. Thus, x is now just a number (a numeric vector of length 1):

x
[1] 0.7828982

It is the result of the sum \[ 1-\frac{1}{3}+\frac{1}{5}-\frac{1}{7}+\cdots - \frac{1}{199}=\sum_{n=1}^{100} \frac{(-1)^{n-1}}{2n-1} \] The same calculation can be compressed into a one-line expression:

sum( c(1,-1)/(2*(1:100)-1) )
[1] 0.7828982

or this one-line expression

sum(c(1,-1)/seq(1,199,2))
[1] 0.7828982

The seq() function is introduced in lesson 3 of R programming in swirl. If you forget how it is used, type ?seq in the R console to pull up a help page.
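To see that the two one-line expressions use the same denominators, the odd numbers can be built in several ways and compared. The names odd1, odd2 and odd3 are illustrative only.

```r
odd1 <- 2 * (1:100) - 1                           # 1, 3, 5, ..., 199
odd2 <- seq(1, 199, 2)                            # positional arguments: from, to, by
odd3 <- seq(from = 1, by = 2, length.out = 100)   # same vector, specified by its length

all(odd1 == odd2)   # TRUE
all(odd1 == odd3)   # TRUE
```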

We see that a lengthy calculation can be carried out by just a one-line expression. If you are not impressed yet, try this:

s <- 4*sum( c(1,-1)/(2*(1:1e6)-1) )

With this single line, we have told R to carry out a sum over one million terms and then multiply the result by 4. The number 1e6 means 1000000 (1 followed by 6 zeros, or \(10^6\)), which is one million. The value stored in the variable s is equal to \[4\left( 1-\frac{1}{3}+\frac{1}{5}-\frac{1}{7}+\cdots - \frac{1}{1999999}\right) = 4\sum_{n=1}^{10^6} \frac{(-1)^{n-1}}{2n-1} \] The value is

s
[1] 3.141592

If this number seems familiar to you, it’s because it’s close to the number \(\pi= 3.141592653589...\). In R, the variable pi stores this number:

pi
[1] 3.141593

By default, R displays floating-point numbers to 7 significant figures even though R uses 8 bytes to store a floating-point number, corresponding to about 16 significant figures. You can change this default by the command options(digits=n), where n is the number of significant figures you want R to display. For example,

options(digits=15)

sets the default printout to 15 significant figures:

c(s,pi)
[1] 3.14159165358979 3.14159265358979
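options(digits=15) changes the setting for the rest of the session. A possible alternative, sketched below, is to save the old setting and restore it afterwards, or to ask for extra digits in a single print() call. The outputs later in this note were produced with digits=15 still in effect, so skip the restore step if you want to reproduce them exactly.

```r
s <- 4 * sum(c(1, -1) / (2 * (1:1e6) - 1))

# options() returns the previous settings, so a global change can be undone
old <- options(digits = 15)
pi                              # 3.14159265358979
options(old)                    # back to the previous setting

# One-off printout with 15 significant figures; the global option is untouched
print(c(s, pi), digits = 15)
```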

So we see that s and pi agree only in the first 6 digits. In fact, it is well known to mathematicians that the infinite series \[ 4 \sum_{n=1}^{\infty} \frac{(-1)^{n-1}}{2n-1} \] converges to \(\pi\), but the convergence is very slow: the error after \(N\) terms is roughly \(1/N\), which is why one million terms give only about 6 correct digits.
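The slow convergence can be seen by comparing the partial sums with pi for several values of N. This is a sketch; leibniz() is a hypothetical helper defined here, not a built-in function, and N is kept even so that c(1,-1) recycles without a warning.

```r
# Hypothetical helper: 4 times the sum of the first N terms of the series
leibniz <- function(N) 4 * sum(c(1, -1) / (2 * (1:N) - 1))

N <- c(1e2, 1e4, 1e6)
err <- sapply(N, function(n) abs(pi - leibniz(n)))

err        # each error is close to 1/N
err * N    # all three values are close to 1
```

Multiplying the number of terms by 100 gains only about two more correct digits.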

Now you can tell your friends that you have just learned a new skill to quickly compute the sum of a long series. You show your friends that you can sum the series of two million terms \[ 1+\frac{1}{2^2}+\frac{1}{3^2}+\cdots + \frac{1}{(2\times 10^6)^2}\] by just typing

sum(1/(1:2e6)^2)
[1] 1.64493356684835

Be aware, though, that if one of your friends is knowledgeable in math, they will laugh and point out that \[ \sum_{n=1}^{\infty} \frac{1}{n^2}=\frac{\pi^2}{6} \ ,\] which was first figured out by the famous mathematician Leonhard Euler in 1735. You will be amazed and type

pi^2/6
[1] 1.64493406684823

to confirm that your sum is indeed close to this number.
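How close is "close"? The terms left out of the sum, those with n greater than N, add up to slightly less than 1/N, so with two million terms the gap should be about 5e-07. The sketch below checks this; the names N, partial and limit are illustrative.

```r
N <- 2e6
partial <- sum(1 / (1:N)^2)   # sum of the first two million terms
limit <- pi^2 / 6             # value of the infinite series

limit - partial               # about 5e-07
1 / N                         # 5e-07, the approximate size of the neglected tail
```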

The sum() function returns the sum of a vector. The prod() function returns the product of a vector. For example, the vector 3:5 is an integer vector consisting of 3, 4, and 5, whereas prod(3:5) returns \(3\times 4\times 5\) or 60:

prod(3:5)
[1] 60

Like the sum() function, prod() can be useful in some problems. Let’s consider the birthday problem encountered in Stat 100 to further illustrate its use.
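As a small warm-up, prod() reproduces factorials and binomial coefficients, which R also provides as factorial() and choose(). The numbers below are chosen only for illustration.

```r
prod(1:5)                 # 120
factorial(5)              # 120

# Number of ways to choose 2 items out of 5, written as a ratio of products
prod(5:4) / prod(1:2)     # 10
choose(5, 2)              # 10
```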

The Birthday Problem

What is the probability that at least two people share the same birthday in a set of \(n\) randomly chosen people? This is the famous birthday problem. If you forget how to solve it, look up your Stat 100 notes or visit this website. Ignoring leap days and assuming all 365 birthdays are equally likely, the answer is given by the expression \[P = 1 - \frac{365}{365}\cdot \frac{364}{365}\cdot \frac{363}{365}\cdots \frac{366-n}{365}\] In a class of 100 students, the probability that at least two students share the same birthday is \[ P = 1 - \frac{365}{365}\cdot \frac{364}{365}\cdot \frac{363}{365}\cdots \frac{266}{365} \] What is the numerical value of this \(P\)? Your Stat 100 instructor told you that it’s very close to 1. Do you believe it? Have you checked the calculation? When I first read about the birthday problem in a book, I was suspicious of the claim. I used a calculator to carry out the calculation and confirmed the result. It took me a few minutes to get the answer. With R, we can easily get the answer with the following one-line expression:

1-prod((365:266)/365)
[1] 0.999999692751072

We see that it is indeed very close to 1. Let’s break down the expression to see why it gives the desired result. 365:266 is an integer vector containing 365, 364, 363, …, 266. (365:266)/365 divides each element in 365:266 by 365, so it’s a vector containing \[ \frac{365}{365}, \ \ \ \ \ \frac{364}{365}, \ \ \ \ \ \frac{363}{365}, \ \ \ \ \ \cdots \ \ \ \ \ \frac{266}{365}\] prod((365:266)/365) returns the product of the vector in (365:266)/365. That is, \[ {\rm prod((365:266)/365)} = \frac{365}{365}\cdot \frac{364}{365}\cdot \frac{363}{365}\cdots \frac{266}{365}\] which is the probability that all of the 100 students have different birthdays. This is a very small number:

prod((365:266)/365)
[1] 3.07248927851577e-07

Finally, 1-prod((365:266)/365) is the probability that at least two students share the same birthday. Since prod((365:266)/365) is very small (\(\approx 3 \times 10^{-7}\)), 1-prod((365:266)/365) \(\approx 1\) as claimed.
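The same expression works for any class size if 266 is replaced by 366-n. The sketch below wraps it in a hypothetical helper p_shared(), which is not part of R, under the same assumptions as above (365 equally likely birthdays, n no larger than 365), and uses it to find the smallest group in which a shared birthday is more likely than not.

```r
# Hypothetical helper: probability that at least two of n people
# share a birthday, assuming 365 equally likely days and n <= 365
p_shared <- function(n) 1 - prod((365:(366 - n)) / 365)

p_shared(100)         # same value as 1-prod((365:266)/365)
p_shared(23)          # about 0.507

# Evaluate the probability for group sizes 1 to 60
p <- sapply(1:60, p_shared)
which(p > 0.5)[1]     # 23
```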

Interestingly, I recently found out that R has a built-in function for the birthday problem, documented under the title “Probability of coincidences”. The function pbirthday(n) returns the probability that at least 2 people share the same birthday among \(n\) randomly chosen people. For \(n=100\), the function gives

pbirthday(100)
[1] 0.999999692751072

exactly the same value as the one calculated above. pbirthday() has other optional parameters you can specify for generalized birthday problems. There is also an associated function qbirthday(). Type ?pbirthday for more information.
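A short sketch of both functions follows. The classes argument and the 12-month example are taken as an illustration of the optional parameters; see ?pbirthday for the full list.

```r
# pbirthday() and qbirthday() are in the stats package, which is loaded by default
all.equal(pbirthday(100), 1 - prod((365:266) / 365))   # TRUE

# Smallest group size for which the probability of a shared birthday is at least 0.5
qbirthday(0.5)                                          # 23

# Generalized problem: 5 people, 12 equally likely birth months
pbirthday(5, classes = 12)
1 - prod((12:8) / 12)                                   # same probability computed by hand
```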