Programming
We will learn basic programming techniques in R to prepare ourselves to write programs:
- Conditional statements
- Iterative Statements
Learning Objectives:
Learn how to use relational operators
Learn how to use logical operators
Learn how to use conditional statements (if)
Learn how to use basic iterative statements (while, for)
Learn how to use advanced iterative statements (apply, Reduce)
Logical statements
A logical variable (or Boolean variable) is either FALSE or TRUE; in arithmetic, R treats these values as 0 and 1.
The basic way to define a logical variable is to compare two expressions with a relational operator. For example, we often ask whether a variable x is bigger than a certain number.
A more sophisticated way is to combine two simple logical statements using logical operators.
Relational Operators
A relational operator compares two numerical expressions and returns a logical value: either FALSE (0) or TRUE (1). The following table shows the six commonly used relational operators.
Relational operator | Interpretation |
---|---|
< | less than |
<= | less than or equal to |
> | greater than |
>= | greater than or equal to |
== | equal to |
!= | not equal to |
Relational operators have lower precedence than arithmetic operators. To avoid confusion, it is good practice to use brackets to control the order of evaluation. For example, if we write 2 + 2 < 1 + 3, then R evaluates the arithmetic operators first. This gives 4 < 4, which is a false statement.
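The precedence rule above can be checked directly in the console. The following is a minimal sketch; the variable names x, a and b are illustrative and not from the original text.

```r
# Relational operators return a logical value (TRUE or FALSE).
x <- 5
x > 3        # TRUE
x == 5       # TRUE
x != 5       # FALSE

# Arithmetic is evaluated before the comparison,
# so both lines below compare 4 with 4.
a <- 2 + 2 <= 1 + 3
b <- (2 + 2) <= (1 + 3)   # same result, but the intent is clearer
print(c(a, b))
## [1] TRUE TRUE
```

The bracketed form costs nothing at run time and removes any doubt about the order of evaluation.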
Logical Operators
In practice, we often use two or more conditions to decide what to do. For example, scholarship is often given if the candidate has done well in both academic and extra-curricular activities.
To combine different conditions, there are three logical operators: (1) &&, (2) ||, and (3) !.
First, && is similar to AND in English, that is, A && B is true only if both A and B are true. Second, || is similar to OR in English, that is, A || B is false only if both A and B are false. Third, ! is similar to NOT in English, that is, !A is true only if A is false.
A | B | A && B | A || B | !A | !B |
---|---|---|---|---|---|
false | false | false | false | true | true |
false | true | false | true | true | false |
true | false | false | true | false | true |
true | true | true | true | false | false |
Unlike relational operators, logical operators do not share a single precedence level. In R, all three rank below the relational operators: among them, ! is evaluated first, then &&, then ||. For example, !x > 0 is read as !(x > 0). Hence, it is always good practice to use brackets to control the order of operations.
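As a small illustration of combining conditions, the sketch below reuses the scholarship example. The variables academic_ok and activity_ok are hypothetical names introduced only for this example.

```r
academic_ok <- TRUE    # hypothetical: candidate did well academically
activity_ok <- FALSE   # hypothetical: candidate did poorly in activities

both   <- academic_ok && activity_ok   # FALSE: AND needs both to be TRUE
either <- academic_ok || activity_ok   # TRUE: OR needs at least one TRUE
neither <- !(academic_ok || activity_ok)  # FALSE: NOT flips the result

# && is evaluated before ||, so brackets can change the answer.
r1 <- TRUE || FALSE && FALSE      # read as TRUE || (FALSE && FALSE)
r2 <- (TRUE || FALSE) && FALSE    # brackets force || to go first
print(c(r1, r2))
## [1]  TRUE FALSE
```

Writing the brackets explicitly, as in r2, makes the intended grouping visible to the reader as well as to R.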
Conditional
A conditional statement performs a different set of statements depending on whether the condition is true or false.
if (condition){
  # block to execute when the condition holds
} else{
  # block to execute when the condition fails
}
Example of if-else statement
x <- 1
y <- 2
if(x < y){print("Yes!")}
## [1] "Yes!"
if(x<y){
print("Yes!")
} else{
print("No~")
}
## [1] "Yes!"
Exercises
Write a function f(x) that returns absolute value of x.
If x is positive, then f(x) =x.
If x is negative, then f(x) = -x.
Write a function f(x) that returns CAP given exam score x.
If x is 90 or above, then CAP is 5.
If x is between 80 and 90, then CAP is 4.
If x is between 70 and 80, then CAP is 3.
If x is less than 70, then CAP is 2.
- Write a function f(x,y) that returns grade using test score x and exam score y.
- If x+y is no less than 90 and y is no less than 40, then grade is A+.
- If x+y is no less than 80 but less than 90, then grade is A.
- If x+y is no less than 70 but less than 80, then grade is A-.
- If x+y is less than 70, then grade is B+.
Application: Checking Data Type
One important use of conditional statements is to check whether the data used in a calculation is valid.
Very often, we only want to execute the code if the data point exists. We can use is.na() for this. It returns TRUE if its argument is NA and FALSE otherwise.
if(!is.na(x)){
  # code that needs a non-missing x goes here
}
A related check is whether the data is a number. The function is.numeric() returns TRUE if its argument is a numerical variable and FALSE otherwise.
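The two checks can be combined to guard a calculation. The sketch below assumes a single value as input; safe_sqrt is a hypothetical helper written for this illustration, not a base R function.

```r
# safe_sqrt is a hypothetical helper, assuming x is a single value.
safe_sqrt <- function(x) {
  if (!is.numeric(x)) {
    return(NA_real_)   # not a number, so nothing to compute
  }
  if (is.na(x)) {
    return(NA_real_)   # missing value
  }
  if (x < 0) {
    return(NA_real_)   # square root of a negative number is undefined here
  }
  sqrt(x)
}

safe_sqrt(16)
## [1] 4
safe_sqrt("16")   # a character string, not a number
## [1] NA
```

The order of the checks matters: is.numeric() comes first so that the comparison x < 0 is only reached for a non-missing number.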
Application: Missing data
Note that a comparison involving NA returns NA rather than TRUE or FALSE, and R reports an error if that result is used as the condition of an if statement.
x <- NA
x > 0
## [1] NA
Very often, we want to ignore the cases with missing data (NA). Then we can use isTRUE(), which returns TRUE only when its argument is TRUE.
x <- NA
isTRUE(x > 0)
## [1] FALSE
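To see how this fits into a conditional statement, here is a minimal sketch. The variable msg and its text are illustrative only.

```r
x <- NA

# if (x > 0) print("positive")   # would stop with an error: the condition is NA

# isTRUE() turns the NA into FALSE, so the else branch runs instead.
if (isTRUE(x > 0)) {
  msg <- "positive"
} else {
  msg <- "not positive or missing"
}
print(msg)
## [1] "not positive or missing"
```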
Loop
We often want to execute code multiple times under the same or similar conditions. There are two basic forms of repetition statement in R, the while loop and the for loop, and the for loop can run over integers or over a vector:
- while loop: repeat as long as the condition is true,
- for loop over integers: repeat a pre-specified number of times, and
- for loop over a vector: repeat once for each element.
While loop
The syntax of a while loop is as follows:
while (condition){
  # block to repeat as long as the condition holds
}
The following example shows how to calculate the sum of all integers from 1 to 10.
n <- 10
i <- 1
sum <- 0
while (i <= n){ # i is the control variable
sum <- sum + i # accumulate i into sum
i <- i + 1 # increment i by 1
}
For Loop over integers
A for loop over integers is controlled by an integer variable that increases by the same amount on each pass; the loop ends when the variable reaches a specified value.
The syntax for a for loop is simple.
m <- 1
n <- 10
for (i in m:n){
  # block to run once for each value of i from m to n
}
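As a concrete instance of this syntax, the sketch below sums the integers from 1 to 10 with a for loop. The names total and count are chosen for this illustration; total is used instead of sum to avoid masking the built-in function.

```r
m <- 1
n <- 10
total <- 0
for (i in m:n) {
  total <- total + i   # accumulate i into total
}
total
## [1] 55

# Caution: 1:0 counts down (1, 0), so a loop over 1:0 runs twice.
# seq_len(0) is empty, so the loop body below never runs.
count <- 0
for (i in seq_len(0)) {
  count <- count + 1
}
count
## [1] 0
```

Using seq_len() or seq_along() is a common safeguard when the upper bound of the loop may be zero.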
Loop over vectors
We can define a vector and construct a for loop that executes for each element in the vector:
students <- c("Amy", "Tom")
for (student in students){
cat(student,"\n")
}
## Amy
## Tom
Example: Technical Indicators
An n-day simple moving average (n-day SMA) is the arithmetic average of the prices of the past n days:
\[ SMA_t(n) = \frac{P_t+\ldots+P_{t-n+1}}{n}\]
The following is an SMA function:
mySMA <- function (price,n){
  sma <- c()
  sma[1:(n-1)] <- NA              # no SMA until n prices are available
  for (i in n:length(price)){
    sma[i]<-mean(price[(i-n+1):i]) # average of the latest n prices
  }
  return(sma)
}
mySMA(1:10,5)
## [1] NA NA NA NA 3 4 5 6 7 8
An n-day exponential moving average (n-day EMA) is a weighted average of prices in which more recent prices carry higher weights. The first EMA is set to the n-day SMA, and the updating formula is simple.
\[ EMA_t(n) = \beta P_t +(1-\beta) EMA_{t-1}(n)\]
where \(\beta =\frac{2}{n+1}\), \(n\) is the number of days and \(P_t\) is the price at time \(t\).
This is an EMA function:
myEMA <- function (price,n){
  ema <- c()
  ema[1:(n-1)] <- NA
  ema[n]<- mean(price[1:n])       # first EMA is the n-day SMA
  beta <- 2/(n+1)
  for (i in (n+1):length(price)){
    ema[i]<-beta * price[i] +
      (1-beta) * ema[i-1]
  }
  return(ema)
}
myEMA(1:10,5)
## [1] NA NA NA NA 3 4 5 6 7 8
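An n-day relative strength index (RSI) is a normalized n-day relative strength (RS), where RS is the ratio of the number of days on which the stock price went up to the number of days on which it went down. In the function below, the normalization is RSI = RS/(1+RS) × 100, and RSI is set to 100 when the price did not fall on any day in the window.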
This is an RSI function:
myRSI <- function (price,n){
  U <- rep(0,length(price))       # 1 on days the price went up
  D <- rep(0,length(price))       # 1 on days the price went down
  rs <- rep(NA,length(price))
  rsi <- rep(NA,length(price))
  for (i in 2:length(price)){
    if (price[i]>price[(i-1)]){
      U[i] <- 1
    } else if (price[i]<price[(i-1)]){
      D[i] <- 1
    }
    if (i>n){
      if (sum(D[(i-n+1):i])==0){
        rsi[i] <- 100
      } else{
        rs[i] <- sum(U[(i-n+1):i])/
          sum(D[(i-n+1):i])
        rsi[i] <- rs[i]/(rs[i]+1)*100
      }
    }
  }
  return(rsi)
}
myRSI(sample(4:10,20,replace=TRUE),6)
## [1] NA NA NA NA NA NA 50.00000
## [8] 40.00000 20.00000 33.33333 33.33333 20.00000 40.00000 60.00000
## [15] 75.00000 50.00000 66.66667 75.00000 50.00000 33.33333
Exercise
Use while loop to calculate the product of all numbers from 50 to 100.
Use for loop to calculate the product of all numbers from 50 to 100.
- Write a MACD function that returns both macd and signal line using four inputs:
- price: price vector
- S: number of periods for fast moving average
- L: number of periods for slow moving average
- K: number of periods for signal moving average
Apply Family
The apply family is a very convenient set of tools for looping over data structures (vector, array, matrix and list). The most useful for our purposes are:
- apply(),
- sapply(), and
- lapply().
apply
We often want to apply a function to each row or each column of a matrix. The second argument of apply() selects the direction: 1 applies the function by row and 2 applies it by column.
x <- matrix(1:9, nrow = 3, ncol=3)
x
## [,1] [,2] [,3]
## [1,] 1 4 7
## [2,] 2 5 8
## [3,] 3 6 9
apply(x,1,mean) # average each row
## [1] 4 5 6
apply(x,2,mean) # average each column
## [1] 2 5 8
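The function passed to apply() does not have to be a built-in one. The following sketch, using the same matrix, passes an anonymous function that computes the range (maximum minus minimum); the names row_range and col_range are illustrative.

```r
x <- matrix(1:9, nrow = 3, ncol = 3)

# Range of each row: the rows are (1,4,7), (2,5,8), (3,6,9).
row_range <- apply(x, 1, function(v) max(v) - min(v))
row_range
## [1] 6 6 6

# Range of each column: the columns are (1,2,3), (4,5,6), (7,8,9).
col_range <- apply(x, 2, function(v) max(v) - min(v))
col_range
## [1] 2 2 2
```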
sapply and lapply
Sometimes, we want to apply a function repeatedly to each element of a vector (or a list). Then sapply() is more convenient than a for loop.
Consider the following for loop that returns the square of each element:
x <- c(1,3,5)
for (i in 1:length(x)) {
x[i] <- x[i]^2
}
x
## [1] 1 9 25
We can do the same using sapply() with much cleaner code.
x <- c(1,3,5)
x <- sapply(x, function(x) x^2)
x
## [1] 1 9 25
lapply() works like sapply(), but it returns a list instead of a vector. We can use unlist() to get back the same result as sapply().
x <- c(1,3,5)
x <- lapply(x, function(x) x^2)
x
## [[1]]
## [1] 1
##
## [[2]]
## [1] 9
##
## [[3]]
## [1] 25
x <- unlist(x)
Reduce
Sometimes, we want to get a final result by applying an operation to the elements one by one, carrying the running result forward at each step.
On vector
For example, the 2-day exponential moving average has \(\beta = \frac{2}{3}\), so \[EMA_{t}=\frac{2}{3}p_{t}+\frac{1}{3}EMA_{t-1}\].
The following example computes the final 2-day EMA with the initial EMA being 5.
price <- 1:10
ema <- 5
for (i in 1:length(price)) {
ema <- (ema + 2*price[i])/3
}
ema
## [1] 9.500093
The following example uses Reduce() to simplify the code.
price <- 1:10
initial <- 5
ema <- Reduce(function(x,y){(x+2*y)/3},price,initial)
ema
## [1] 9.500093
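Reduce() can also keep every intermediate result instead of only the last one, through its accumulate argument. The sketch below is one way to obtain the whole EMA path; the name path is illustrative.

```r
price <- 1:10
initial <- 5

# accumulate = TRUE returns the initial value followed by
# the running result after each price.
path <- Reduce(function(x, y) (x + 2 * y) / 3,
               price, initial, accumulate = TRUE)

length(path)    # one initial value plus ten updates
## [1] 11
path[length(path)]   # the last element is the final EMA
## [1] 9.500093
```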
On list
The following example shows how we can combine data sets.
HI <- data.frame(stock=c(1,2), high=c(7,8))
LO <- data.frame(stock=c(1,2), low=c(1,2))
OP <- data.frame(stock=c(1,2), open=c(3,6))
CL <- data.frame(stock=c(1,2), close=c(5,5))
df <-merge(HI,LO)
df <-merge(df,OP)
df <-merge(df,CL)
df
## stock high low open close
## 1 1 7 1 3 5
## 2 2 8 2 6 5
Since the data frames are merged one by one, we can put them in a list and let Reduce() apply merge() repeatedly.
HI <- data.frame(stock=c(1,2), high=c(7,8))
LO <- data.frame(stock=c(1,2), low=c(1,2))
OP <- data.frame(stock=c(1,2), open=c(3,6))
CL <- data.frame(stock=c(1,2), close=c(5,5))
df <- Reduce(function(x,y) {merge(x,y)}, list(HI, LO, OP, CL))
df
## stock high low open close
## 1 1 7 1 3 5
## 2 2 8 2 6 5
Tricks
There are some minor tricks to help you in coding. For long programs, we sometimes want to measure the time required to complete a task, so that we can improve the efficiency of the code if it is too slow.
Timing code
To get the current time, we may use Sys.time(). Put this at the beginning and the end of the code. Then the difference is the run time.
t1 <- Sys.time()
#…run code here …#
t2 <- Sys.time()
t2-t1
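Base R also provides system.time(), which wraps an expression and reports how long it took. The loop being timed below is only a placeholder workload chosen for this sketch.

```r
# Time a placeholder workload: summing the integers 1 to 100000 in a loop.
elapsed <- system.time({
  total <- 0
  for (i in 1:100000) {
    total <- total + i
  }
})

total                 # the result computed inside the timed block
## [1] 5000050000
elapsed["elapsed"]    # wall-clock seconds; the value varies by machine
```

No expected timing is shown because it depends on the machine and the current load.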
Suppress annoying message
R tends to give a lot of warnings and messages when you run long code. Sometimes you want to suppress them to keep your console window clean.
suppressMessages(…)
suppressWarnings(…)
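As a small illustration, converting text to numbers normally warns when an entry cannot be converted. The input vector below is made up for this sketch.

```r
# as.numeric() warns that NAs were introduced by coercion;
# suppressWarnings() hides the warning but keeps the result.
x <- suppressWarnings(as.numeric(c("1", "two", "3")))
x
## [1]  1 NA  3

# suppressMessages() does the same for output produced by message().
y <- suppressMessages({
  message("this message is not shown")
  42
})
y
## [1] 42
```

Suppressing output does not fix the underlying issue: the NA is still in x, so it is worth checking the result afterwards.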
All file names in a folder
The function list.files will find all files in a folder.
# full.names = TRUE for full path
filenames <- list.files(path = "./folder/", full.names = TRUE)
Using Reduce() and lapply(), we can read all csv files from a folder and merge them together.
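setwd("C:/path")
# pattern is a regular expression: match names ending in .csv
file.names <-list.files(path = ".",
                        pattern="\\.csv$",
                        full.names = TRUE)
# read each file into a data frame
file.list <- lapply(file.names,
                    function(x){
                      read.csv(file = x,
                               header = TRUE,
                               stringsAsFactors = FALSE)})
# merge the data frames one by one, keeping unmatched rows
df<-Reduce(function(x,y) {merge(x, y, all=TRUE)},
           file.list)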