Programming

We will learn basic programming techniques in R to prepare ourselves to write programs:

  • Conditional statements
  • Iterative statements

Learning Objectives:

  • Learn how to use relational operators

  • Learn how to use logical operators

  • Learn how to use conditional statements (if)

  • Learn how to use basic iterative statements (while, for)

  • Learn how to use advanced iterative statements (apply, Reduce)

Logical statements

A logical (or Boolean) variable takes one of two values, TRUE or FALSE. When used in arithmetic, R treats TRUE as 1 and FALSE as 0.

The basic way to create a logical value is to compare two expressions with a relational operator. For example, we often ask whether a variable x is bigger than a certain number.

A more sophisticated way is to combine two simple logical statements using logical operators.

Relational Operators

A relational operator compares two numerical expressions and returns a logical value: either FALSE (0) or TRUE (1). The following table shows the six commonly used relational operators:

Relational operator Interpretation
< less than
<= less than or equal to
> greater than
>= greater than or equal to
== equal to
!= not equal to

Relational operators have lower precedence than arithmetic operators, so the arithmetic is done first. To avoid confusion, it is good practice to use brackets to control the ordering. For example, if we write 2 + 2 < 1 + 3, then R evaluates the arithmetic operators first. This gives 4 < 4, which is a false statement.
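The following minimal sketch illustrates the point about precedence. The variable names x, a and b are chosen for illustration only, and < is used as the example comparison.

```r
# Relational operators return logical values (TRUE or FALSE).
x <- 5
x > 3      # TRUE
x == 5     # TRUE
x != 5     # FALSE

# Arithmetic is evaluated before the comparison,
# so the two statements below are equivalent.
a <- 2 + 2 < 1 + 3
b <- (2 + 2) < (1 + 3)   # 4 < 4 is FALSE

# Logical values behave like 1 and 0 in arithmetic.
as.numeric(x > 3)        # 1
```

Writing the brackets as in b costs nothing and makes the intended order obvious to the reader.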

Logical Operators

In practice, we often use two or more conditions to decide what to do. For example, scholarship is often given if the candidate has done well in both academic and extra-curricular activities.

To combine different conditions, there are three logical operators: (1) &&, (2) ||, and (3) !.

First, && is similar to AND in English, that is, A && B is true only if both A and B are true. Second, || is similar to OR in English, that is, A || B is false only if both A and B are false. Third, ! is similar to NOT in English, that is, !A is true only if A is false.

A B A && B A || B !A !B
false false false false true true
false true false true true false
true false false true false true
true true true true false false

Unlike relational operators, logical operators do not all share the same precedence. In R, all three rank below the relational operators: ! is evaluated after the comparisons but before && and ||, and && is evaluated before ||. For example, !x > 0 is read as !(x > 0). Hence, it is always good practice to use brackets to control the order of operations.
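As a rough illustration of the scholarship example, the sketch below combines two conditions. The variable names and the thresholds (4.5, 4.8 and 7) are hypothetical and not part of any real rule.

```r
# Hypothetical candidate data.
gpa <- 4.6
activity_score <- 8

# AND: TRUE only if both conditions are TRUE.
eligible <- (gpa >= 4.5) && (activity_score >= 7)

# OR: FALSE only if both conditions are FALSE.
either <- (gpa >= 4.8) || (activity_score >= 7)

# NOT: R's ?Syntax help page lists ! below the comparison operators,
# so the two lines below give the same answer. The bracketed form is clearer.
not_high <- !gpa >= 4.8
not_high_explicit <- !(gpa >= 4.8)
```

Here eligible, either and both versions of not_high are all TRUE.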

Conditional

A conditional statement performs a different set of statements depending on whether the condition is true or false.

if (condition){
  # statements to execute when the condition holds
} else{
  # statements to execute when the condition fails
}

Example of if-else statement

x <- 1
y <- 2
if(x < y){print("Yes!")}
## [1] "Yes!"
if(x<y){
  print("Yes!")
} else{
  print("No~")
}
## [1] "Yes!"
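When there are more than two cases, conditions can be chained with else if. The sketch below is one possible illustration; describe_sign is a hypothetical helper name, and it assumes x is a single non-missing number.

```r
# Hypothetical helper: labels a single number by its sign.
describe_sign <- function(x) {
  if (x > 0) {
    "positive"
  } else if (x < 0) {
    "negative"
  } else {
    "zero"
  }
}

describe_sign(-3)   # "negative"
describe_sign(0)    # "zero"
```

The conditions are checked from top to bottom, and only the first branch whose condition holds is executed.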

Exercises

  1. Write a function f(x) that returns absolute value of x.

    • If x is positive, then f(x) =x.

    • If x is negative, then f(x) = -x.

  2. Write a function f(x) that returns CAP given exam score x.

    • If x >= 90, then CAP is 5.

    • If x is between 80 and 90, then CAP is 4.

    • If x is between 70 and 80, then CAP is 3.

    • If x is less than 70, then CAP is 2.

  3. Write a function f(x,y) that returns grade using test score x and exam score y.
    • If x+y is no less than 90 and y is no less than 40, then grade is A+.
    • If x+y is no less than 80 but less than 90, then grade is A.
    • If x+y is no less than 70 but less than 80, then grade is A-.
    • If x+y is less than 70, then grade is B+.

Application: Checking Data Type

One important use of conditional statements is to check whether the data used in a calculation are valid.

Very often, we only want to execute the code if the data point exists. We can use is.na(), which returns TRUE if its argument is NA and FALSE otherwise.

if(!is.na(x)){
  # code to run when x is not missing
}

A related check is whether the data are numeric. The function is.numeric() returns TRUE if its argument is a numeric variable and FALSE otherwise.
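The two checks can be combined before doing a calculation. The sketch below is only an illustration: double_if_valid is a hypothetical helper, and it assumes x is a single value rather than a vector.

```r
# Hypothetical helper: doubles x only when x is a non-missing number.
double_if_valid <- function(x) {
  if (is.numeric(x) && !is.na(x)) {
    2 * x
  } else {
    NA
  }
}

double_if_valid(4)      # 8
double_if_valid("4")    # NA, because a character string is not numeric
double_if_valid(NA)     # NA, because the value is missing
```

Because && stops as soon as the first condition is FALSE, is.na() is only evaluated for numeric input.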

Application: Missing data

Note that a comparison involving NA returns NA rather than TRUE or FALSE, and an if statement stops with an error when its condition is NA.

x <- NA
x > 0
## [1] NA

Very often, we want to ignore the cases with missing data (NA). We can then use isTRUE(), which returns TRUE only when its argument is TRUE:

x <- NA
isTRUE(x > 0)
## [1] FALSE
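A minimal sketch of how isTRUE() protects an if statement from a missing value follows. The variable name result and the two messages are chosen for illustration.

```r
x <- NA

# if (x > 0) { ... } would stop with an error here, because the condition is NA.
# Wrapping the condition in isTRUE() turns NA into FALSE instead.
if (isTRUE(x > 0)) {
  result <- "positive"
} else {
  result <- "not positive or missing"
}
result
```

Note that the else branch now covers both genuinely non-positive values and missing ones, so use this pattern only when the two cases can be treated alike.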

Loop

We very often want to execute code multiple times under the same or similar conditions. There are three basic forms of repetition statement in R:

  1. while loop: repeat as long as a condition is true,
  2. for loop over integers: repeat for a pre-specified number of times, and
  3. for loop over a vector.

While loop

The syntax of while loop is as follows:

while (condition){
  # statements to repeat while the condition holds
}

The following example shows how to calculate the sum of all integers from 1 to 10.

n <- 10
i <- 1
total <- 0                # named total so that the built-in sum() is not masked
while (i <= n){           # i is the control variable
  total <- total + i      # accumulate i into total
  i <- i + 1              # increment i by 1
}
total
## [1] 55

For Loop over integers

A for loop over integers is driven by an integer variable. The variable increases by the same amount on each pass (by 1 for m:n), and the loop ends after the variable reaches a specified value.

The syntax for a for loop is simple.

m <- 1
n <- 10
for (i in m:n){
  # statements to run once for each i from m to n
}
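As an illustration, the sum of the integers from 1 to 10 can be written as a for loop. This is a minimal sketch; the accumulator name total is chosen here for illustration.

```r
m <- 1
n <- 10
total <- 0
for (i in m:n) {
  total <- total + i   # add the current integer to the running total
}
total   # 55
```

Unlike a while loop, the for loop updates the control variable i by itself, so there is no increment step to forget.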

Loop over vectors

We can define a vector and construct a for loop that executes for each element in the vector:

students <- c("Amy", "Tom")
for (student in students){
   cat(student,"\n")
}
## Amy 
## Tom

Example: Technical Indicators

An n-day simple moving average (n-day SMA) is the arithmetic average of the prices over the past n days:

\[ SMA_t(n) = \frac{P_t+\ldots+P_{t-n+1}}{n}\]

The following is an SMA function:

mySMA <- function (price,n){
  sma <- c()
  sma[1:(n-1)] <- NA
  for (i in n:length(price)){
    sma[i]<-mean(price[(i-n+1):i])
  }
  return(sma)
}
mySMA(1:10,5)
##  [1] NA NA NA NA  3  4  5  6  7  8

An n-day exponential moving average (n-day EMA) is a weighted average of prices in which more recent prices carry higher weights. The first EMA value is the n-day SMA, and later values follow a simple updating formula.

\[ EMA_t(n) = \beta P_t +(1-\beta) EMA_{t-1}(n)\]

where \(\beta =\frac{2}{n+1}\), \(n\) is number of days and \(P_t\) is price of time \(t\).

This is an EMA function:

myEMA <- function (price,n){
  ema <- c()
  ema[1:(n-1)] <- NA
  ema[n]<- mean(price[1:n])
  beta <- 2/(n+1)
  for (i in (n+1):length(price)){
    ema[i]<-beta * price[i] + 
      (1-beta) * ema[i-1]
  }
  return(ema)
}
myEMA(1:10,5)
##  [1] NA NA NA NA  3  4  5  6  7  8

An n-day relative strength index (RSI) is a normalized n-day relative strength (RS), where RS is the ratio of the number of days on which the stock price has gone up to the number of days on which it has gone down.
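Read off the function that follows, the definition can be written as a formula. The symbols \(U_t(n)\) and \(D_t(n)\) are notation introduced here for the number of up days and down days among the last \(n\) days ending at time \(t\):

\[ RS_t(n) = \frac{U_t(n)}{D_t(n)}, \qquad RSI_t(n) = 100 \times \frac{RS_t(n)}{1+RS_t(n)}\]

When \(D_t(n) = 0\), the ratio is undefined and the function sets the RSI to 100.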

This is an RSI function:

myRSI <- function (price,n){
  U <- rep(0,length(price))
  D <- rep(0,length(price))
  rs <- rep(NA,length(price))
  rsi <- rep(NA,length(price))
  for (i in 2:length(price)){
    if (price[i]>price[(i-1)]){
      U[i] <- 1
    } else if (price[i]<price[(i-1)]){
      D[i] <- 1
    }
    if (i>n){
      if (sum(D[(i-n+1):i])==0){
        rsi[i] <- 100
      } else{
        rs[i] <- sum(U[(i-n+1):i])/
          sum(D[(i-n+1):i])
        rsi[i] <- rs[i]/(rs[i]+1)*100
      }
    }
  }
  return(rsi)
}
myRSI(sample(4:10,20,replace=TRUE),6)
##  [1]       NA       NA       NA       NA       NA       NA 50.00000
##  [8] 40.00000 20.00000 33.33333 33.33333 20.00000 40.00000 60.00000
## [15] 75.00000 50.00000 66.66667 75.00000 50.00000 33.33333

Exercise

  1. Use a while loop to calculate the product of all numbers from 50 to 100.

  2. Use a for loop to calculate the product of all numbers from 50 to 100.

  3. Write a MACD function that returns both macd and signal line using four inputs:
  • price: price vector
  • S: number of periods for fast moving average
  • L: number of periods for slow moving average
  • K: number of periods for signal moving average

Apply Family

The apply family is a very convenient set of tools for looping over a data structure (vector, array, matrix or list). The most useful for our purposes are:

  1. apply(),
  2. sapply(), and
  3. lapply().

apply

We often want to apply a function to each row or each column of a matrix. The second argument of apply() selects the direction: 1 applies the function by row and 2 applies it by column.

x <- matrix(1:9, nrow = 3, ncol=3)
x
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
apply(x,1,mean) # average each row
## [1] 4 5 6
apply(x,2,mean) # average each column
## [1] 2 5 8
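The function passed to apply() does not have to be a built-in one. The sketch below passes an anonymous function to get the range (maximum minus minimum) of each row. The names row_range and col_total are chosen for illustration.

```r
x <- matrix(1:9, nrow = 3, ncol = 3)

# Range of each row: the anonymous function receives one row at a time.
row_range <- apply(x, 1, function(r) max(r) - min(r))
row_range   # 6 6 6

# Total of each column.
col_total <- apply(x, 2, sum)
col_total   # 6 15 24
```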

sapply and lapply

Sometimes, we want to apply a function repeatedly to each element of a vector (or a list). In that case sapply() is more convenient than a for loop.

Consider the following for loop that returns the square of each element

x <- c(1,3,5)
for (i in 1:length(x)) {
  x[i] <- x[i]^2
}
x
## [1]  1  9 25

We can do the same using sapply() with much cleaner code.

x <- c(1,3,5)
x <- sapply(x, function(x) x^2)
x
## [1]  1  9 25

lapply() works like sapply(), but it returns a list instead of a vector. We can use unlist() to get back the same result as sapply().

x <- c(1,3,5)
x <- lapply(x, function(x) x^2)
x
## [[1]]
## [1] 1
## 
## [[2]]
## [1] 9
## 
## [[3]]
## [1] 25
x <- unlist(x)
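Both functions also work on lists whose elements have different lengths, which is awkward to handle with a matrix. A small sketch follows, with made-up scores for the two students used earlier.

```r
# Hypothetical scores: each student has a different number of tests.
scores <- list(Amy = c(80, 90), Tom = c(70, 75, 95))

avg <- sapply(scores, mean)      # named numeric vector
avg                              # Amy 85, Tom 80

counts <- lapply(scores, length) # list with one element per student
counts$Tom                       # 3
```

sapply() keeps the names of the list, so the result can be indexed by student name.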

Reduce

Sometimes, we want to get a final result by applying an operation to the elements one by one, carrying the intermediate result forward each time.

On vector

For example, the 2-day exponential moving average is \[EMA_{t}=\frac{2}{3}p_{t}+\frac{1}{3}EMA_{t-1}.\]

The following example computes the final 2-day EMA, starting from an initial EMA of 5.

price <- 1:10
ema <- 5
for (i in 1:length(price)) {
 ema <- (ema + 2*price[i])/3 
}
ema
## [1] 9.500093

The following example uses Reduce() to simplify the code.

price <- 1:10
initial <- 5
ema <- Reduce(function(x,y){(x+2*y)/3},price,initial)
ema
## [1] 9.500093
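If the intermediate values are needed as well, Reduce() has an accumulate argument that keeps every step. The sketch below reuses the same updating function; the name ema_path is chosen for illustration.

```r
price <- 1:10
initial <- 5

# accumulate = TRUE returns the initial value followed by every intermediate EMA.
ema_path <- Reduce(function(x, y) (x + 2 * y) / 3,
                   price, initial, accumulate = TRUE)

length(ema_path)    # 11: the initial value plus one EMA per price
tail(ema_path, 1)   # the same final value as the loop
```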

On list

The following example shows how we can combine data sets.

HI <- data.frame(stock=c(1,2), high=c(7,8))
LO <- data.frame(stock=c(1,2), low=c(1,2))
OP <- data.frame(stock=c(1,2), open=c(3,6))
CL <- data.frame(stock=c(1,2), close=c(5,5))
df <-merge(HI,LO)
df <-merge(df,OP)
df <-merge(df,CL)
df
##   stock high low open close
## 1     1    7   1    3     5
## 2     2    8   2    6     5

Since the data frames are merged one by one, we just need to put them in a list and let Reduce() do the merging.

HI <- data.frame(stock=c(1,2), high=c(7,8))
LO <- data.frame(stock=c(1,2), low=c(1,2))
OP <- data.frame(stock=c(1,2), open=c(3,6))
CL <- data.frame(stock=c(1,2), close=c(5,5))
df <- Reduce(function(x,y) {merge(x,y)}, list(HI, LO, OP, CL))
df
##   stock high low open close
## 1     1    7   1    3     5
## 2     2    8   2    6     5

Tricks

There are some minor tricks that help in coding. For long programs, we sometimes want to measure the time required to complete a task, so that we can improve the efficiency of the code if it is too slow.

Timing code

To get the current time, we may use Sys.time(). Call it at the beginning and at the end of the code. The difference is the run time.

t1 <- Sys.time()
#…run code here …#
t2 <- Sys.time()
t2-t1
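Base R also provides system.time(), which times a single expression. The sketch below is one possible use; the loop inside is just a placeholder workload.

```r
# Time a block of code; the braces group several statements into one expression.
timing <- system.time({
  total <- 0
  for (i in 1:100000) {
    total <- total + i
  }
})

timing["elapsed"]   # wall-clock seconds taken by the block
```

The elapsed entry is usually the one of interest; the exact figure will vary from run to run and from machine to machine.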

Suppress annoying message

R tends to give a lot of warnings and messages when you run long code. Sometimes you want to suppress them to keep your console window clean.

suppressMessages(…)
suppressWarnings(…)
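Both functions wrap the expression whose output should be silenced and return its value unchanged. A minimal sketch follows; the function noisy is a hypothetical example that emits a message.

```r
# as.numeric("abc") returns NA with a coercion warning;
# wrapping the call hides the warning but keeps the NA.
value <- suppressWarnings(as.numeric("abc"))

# Hypothetical function that prints a message before returning a value.
noisy <- function() {
  message("working...")
  42
}
answer <- suppressMessages(noisy())
```

Suppress warnings only when you already know why they occur, since they often point to real problems in the data.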

All file names in a folder

The function list.files will find all files in a folder.

# full.names = TRUE for full path
filenames <- list.files(path = "./folder/", full.names = TRUE)

Using Reduce() and lapply() we can read all csv files from the folder, and merge them together

setwd("C:/path")
file.names <-list.files(path = ".", 
                        pattern="\\.csv$",
                        full.names = TRUE)
file.list <- lapply(file.names, 
                    function(x){
                      read.csv(file = x,
                               header = TRUE,
                               stringsAsFactors = FALSE)})
df<-Reduce(function(x,y) {merge(x, y, all=TRUE)},
           file.list)
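To try the same idea without touching real data, the sketch below first writes two small csv files to a temporary folder and then reads and merges them. The folder name csv_demo, the file names and the values are all made up for illustration.

```r
# Create a scratch folder with two small csv files (hypothetical data).
folder <- file.path(tempdir(), "csv_demo")
dir.create(folder, showWarnings = FALSE)
write.csv(data.frame(stock = c(1, 2), high = c(7, 8)),
          file.path(folder, "high.csv"), row.names = FALSE)
write.csv(data.frame(stock = c(1, 2), low = c(1, 2)),
          file.path(folder, "low.csv"), row.names = FALSE)

# List the csv files; pattern is a regular expression, so "\\.csv$"
# matches names that end in ".csv".
file.names <- list.files(path = folder, pattern = "\\.csv$", full.names = TRUE)

# Read every file into a list, then merge the list one pair at a time.
file.list <- lapply(file.names, read.csv, stringsAsFactors = FALSE)
df <- Reduce(function(x, y) merge(x, y, all = TRUE), file.list)
df
```

The merge is done on the common column stock, so each file adds its own column to the combined data frame.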