Introduction to loops in R

Loops are a way to repeat a block of code so that the same task is performed over and over again. Like most programming languages, R provides loops for repetitive tasks, allowing you to write less, and more efficient, code.

This guide introduces you to loops in R and explains how to write them, including for loops and while loops, along with some best practices for writing efficient loops. I'll also take a look at vectorization in R, explaining what it is and why you should use vectorized functions where possible. This guide will introduce you to the basics of programming and automating tasks in R.

Guide Information

Title Introduction to loops in R
Author Benjamin Bell
Published 24 Oct 2021
Last updated
R version 4.1.1
Packages base
© Benjamin Bell. All Rights Reserved. https://www.benjaminbell.co.uk

What are loops in R?

In programming, a loop is a block of code that will be repeated over and over again for a specified number of times, or until a condition is met. In R, there are three main types of loops: the for loop, which will repeat the code a specified number of times; the while loop, which will repeat the code for as long as a logical condition remains true; and the repeat loop, which will repeat the code indefinitely. To stop a repeat loop, you must also include a break statement and condition within the loop code.

So why would you want to write a loop? If you want to perform the same task multiple times, writing a loop will save you time as you can write the code once, rather than writing the same code over and over.

But maybe you heard that using loops in R was a bad idea, and that they can be slow?

Loops are not "bad" per se, but it is easy to write bad code whether this involves a loop or not. There are also certain things you can do to avoid writing bad loops. However, in terms of speed, a loop in R will be slower than using a vectorized method or function to perform the same task, but how much slower depends on what it is you are doing (and indeed how fast your system is).

So, should you avoid using loops?

You can read hundreds of articles or comments that say you should avoid using loops in R, and you can read hundreds more that say using loops is fine. Whether to use a loop or not depends entirely on your specific task and requirements.

For example, is there a built-in function in R that achieves the same task that your loop does? If yes, then use that built-in function and avoid using a loop. Do you need to perform thousands of operations on massive datasets where speed is critical? If yes, then avoid using a loop. However, if you need to quickly perform a task with a small dataset where speed is not an issue, or you need to perform a task that is not easily vectorized, then using a loop is fine.

But what about using apply() instead?

Using the apply() family of functions is often considered the "preferred" method in R for achieving the same result as a for loop. In fact, these functions are essentially wrappers for loops, and might not actually perform any faster (again, this depends on the task). They do, however, work in such a way as to eliminate many of the problems associated with badly written loops, and the code is often considered more legible. It is still useful to have an understanding of loops, so you can decide whether to use them or apply().
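To make the comparison concrete, here is a minimal sketch that squares each number in a vector, first with a for loop and then with vapply(), one of the apply() family. The object names (num, res_loop, res_apply) are my own and are only for illustration:

```r
# Vector of numbers (illustrative data)
num <- c(5, 10, 8, 7, 2)

# for loop, with a results vector created in advance
res_loop <- numeric(length(num))
for (i in seq_along(num)) {
  res_loop[i] <- num[i]^2
}

# vapply(): apply the function to each element;
# numeric(1) states that each result is a single number
res_apply <- vapply(num, function(x) x^2, numeric(1))

# Both approaches return the same values
identical(res_loop, res_apply)
```

Which version you find more readable is partly a matter of taste. For this particular task, the vectorized expression num^2 would be simpler than either.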

Vectorization in R

Okay, so what about these "vectorized" functions in R?

Many of the functions in R are vectorized. But what exactly does this mean, and why are vectorized functions faster than loops written in R? A vectorized function is a function that operates on every element of a vector in a single call, so you do not have to loop over the elements yourself. Effectively, the loop has already been implemented for you behind the scenes. Vectorized functions can simplify your code, they are easy to use and they are fast.
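As a quick illustration, here is a small sketch of a few vectorized operations from base R. The data and object names are made up for the example; each line works on the whole vector at once, with no loop written in R:

```r
# Vector of numbers (illustrative data)
num <- c(5, 10, 8, 7, 2)

# Arithmetic is vectorized: every element is doubled
doubled <- num * 2

# So are many functions, such as sqrt()
roots <- sqrt(num)

# Comparisons are vectorized too, returning one TRUE/FALSE per element
big <- num > 6

doubled
roots
big
```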

To understand why they are faster, it is important to understand how R works. As R is an interpreted language, when you send R code to the console, it is translated (or interpreted) into code that can be understood by the computer and then processed, before the output is finally returned to the console. This process happens every time you execute code in R, line by line. This differs from a compiled language, where the entire source code is compiled into machine-readable code before the program can be run.

Because R code goes through an interpreter each time it is run, it is inherently slower than using a compiled language due to the additional overhead of the translation. This is why loops written in R code can be slow(er), because the code goes through the interpreter on every iteration of the loop. To keep R speedy, vectorized functions are written using C or Fortran, which are compiled languages and are very fast. A vectorized function does in fact use a loop to perform the task, but it is a highly optimised loop written in C/Fortran that is pre-compiled. Vectorized functions combine the benefits of using a compiled language (speed) with the benefits of an interpreted language (ease of use).

To help illustrate the concept, consider the following example. Let's say you want to sum all the numbers in a vector. If you were to use a loop written in R, it might look something like this:

# Vector of numbers
num <- c(5, 10, 8, 7, 2)

# Initialize "res"
res <- 0

# Loop to sum numbers
for(i in seq_along(num)) {
	res <- res + num[i]
}

Compare to using the equivalent vectorized function:

# Vector of numbers
num <- c(5, 10, 8, 7, 2)

# Vectorized function to sum numbers
res <- sum(num)

In this example, the loop requires more code to perform the same task as the vectorized function, and will run slightly slower, because every iteration has to pass through the interpreter. Note that res here is a single running total that is overwritten on each iteration, so it does not "grow"; growing a results vector inside a loop is a separate problem, covered later in this guide. To perform a task like this, you should always use the vectorized function.
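If you want to see the difference for yourself, the sketch below times both approaches on a larger vector using system.time(). The data (one million random numbers) and the object names are my own choices for illustration, and the timings will depend entirely on your system:

```r
# Illustrative test data: one million random numbers
set.seed(1)
big <- runif(1e6)

# Loop version
loop_time <- system.time({
  res_loop <- 0
  for (i in seq_along(big)) {
    res_loop <- res_loop + big[i]
  }
})

# Vectorized version
vec_time <- system.time(res_vec <- sum(big))

# Same answer (allowing for tiny floating point differences)
all.equal(res_loop, res_vec)

# Elapsed times in seconds (your results will vary)
loop_time["elapsed"]
vec_time["elapsed"]
```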

However, sometimes you will want to do something that is not easily vectorized, or where speed is not critical (or doesn't make any difference), and this is where loops are useful.

Overall, the issue is quite overblown (especially with fast modern computers) and it is easy to waste a lot of time trying to avoid the use of loops, or rewriting perfectly good existing code that uses a loop, rather than just using one. Remember that "premature optimization is the root of all evil" - Donald Knuth.

If you do want or need to write advanced or high performance loops, you might want to take a look at the Rcpp package which allows you to write efficient C++ code in R. Check out Hadley Wickham's guide to using Rcpp to learn how to use it.

Otherwise, read on to get started with loops!

For loops

The for loop allows you to perform a task a specific number of times. The diagram below helps to illustrate the for loop:

[Diagram: flowchart of a for loop. START, execute code, last iteration? If NO, execute the code again; if YES, END.]

Here's an example of a for loop to print the same statement five times:

for(i in 1:5) {
     print("Hello World!")
}

Which would result in the following output in the R console:

[1] "Hello World!"
[1] "Hello World!"
[1] "Hello World!"
[1] "Hello World!"
[1] "Hello World!"

So how does this work? The loop code is made up of several components. i is the iterator variable, while 1:5 is our object which we will iterate through one by one.

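But what is i exactly? It is an ordinary variable that R creates (or overwrites, if an object named i already exists) in the environment where the loop is run. The value of i changes on each iteration of the loop. Note that, unlike in some other languages, i is not local to the loop: it still exists after the loop has finished, holding the last value it was given.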
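It is worth checking how i behaves once a loop has finished. The minimal sketch below (the loop body is deliberately empty, and rm() is only there to start from a clean slate) suggests that, in R, i is not removed when the loop ends:

```r
# Make sure no object named i exists before we start
if (exists("i")) rm(i)

# A loop that does nothing except step through 1, 2, 3
for (i in 1:3) {
}

# i still exists after the loop and holds the last value
exists("i")
i
```

This is one reason to be careful about re-using i for something else in the same script.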

But why i, and what does it mean? Well, depending on who you ask, it could mean "iterator", "integer", "index", "imaginary", or a number of other things. You can actually use anything in place of i, e.g. a, b, x, mysuperdooperiterator, well perhaps not that last one. It doesn't matter what you call it as it will perform the same task. But, i is the convention when writing loops and it is used across most programming languages. This means it will be easily understood by others who are reading your code. For nested loops, you will typically see i and j used as the iterator variables (since j comes after i).
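For instance, a nested loop that fills in a small matrix might look something like the sketch below. The matrix mat and the calculation (row number multiplied by column number) are made up purely to show i and j working together:

```r
# Empty 2 x 3 matrix to hold the results
mat <- matrix(0, nrow = 2, ncol = 3)

# Outer loop steps through the rows (i),
# inner loop steps through the columns (j)
for (i in seq_len(nrow(mat))) {
  for (j in seq_len(ncol(mat))) {
    mat[i, j] <- i * j
  }
}

mat
```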

The length of the object (to be iterated through) will dictate the maximum number of times that the for loop will run. In this example, 1:5 creates an integer vector composed of five sequentially numbered elements, which has a length of five. You can confirm this in the R console:

> 1:5
[1] 1 2 3 4 5
> class(1:5)
[1] "integer"
> length(1:5)
[1] 5

On each iteration of the loop, the value of i changes. In this example, and on the first iteration of the loop, i becomes 1, on the second iteration, i becomes 2, for the third i becomes 3 and so on.

Lastly, the code contained inside the curly brackets is the code that is executed on each iteration of the loop. In this example, we only have one line of code, but you can have multiple lines of code in a loop. In fact, since there is only one, you do not actually need to use the curly brackets, and you could also write the code like this:

for(i in 1:5) print("Hello World!")

Let's consider another example. If you wanted to print a different statement each time, you could use the following code:

# Vector of names
names <- c("Buffy", "Xander", "Willow", "Rupert", "Cordelia")
# Loop
for(i in 1:5) {
     print(paste("Hello", names[i]))
}

Which would result in the following output in the R console:

[1] "Hello Buffy"
[1] "Hello Xander"
[1] "Hello Willow"
[1] "Hello Rupert"
[1] "Hello Cordelia"

As we iterate through the loop now, the code prints a different name, because we have now subsetted the names vector using the iterator variable i. Since this variable changes on each iteration of the loop, it prints a different name (the subset changes). On the first iteration, the code that is executed effectively becomes:

print(paste("Hello", names[1]))

Whilst on the second iteration of the loop, the code effectively becomes:

print(paste("Hello", names[2]))

And so on.
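The object you iterate through does not have to be a sequence of numbers. As a small sketch (using a shortened names vector, and an iterator variable called name that I have chosen for the example), you can also step through the elements of a vector directly, so no subsetting is needed:

```r
# Vector of names
names <- c("Buffy", "Xander", "Willow")

# On each iteration, name takes the value of the next element
for (name in names) {
  print(paste("Hello", name))
}
```

Iterating over positions, as in the examples above, is still the better choice when you need to know where you are in the vector, for example to store results.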

Another way to create the object to be iterated through is to use the seq_along function. For example:

# Loop
for(i in seq_along(names)) {
     print(paste("Hello", names[i]))
}

seq_along effectively does the same thing as using 1:5 here, creating an integer vector of sequentially numbered elements with the same length as the object. Take a look:

> seq_along(names)
[1] 1 2 3 4 5
> class(seq_along(names))
[1] "integer"
> length(seq_along(names))
[1] 5

seq_along is useful in loops when you do not know the length of the object that you are iterating through. Another option is to use the length() function. For example:

# Loop
for(i in 1:length(names)) {
     print(paste("Hello", names[i]))
}

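All of these approaches are valid, and they give the same result here. However, seq_along() is the safer choice: if the object is empty, 1:length(names) evaluates to 1:0, which is the vector 1 0, so the loop runs twice, whereas seq_along(names) returns an empty vector and the loop does not run at all.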
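One case where the approaches differ is an empty object. The sketch below counts how many times each style of loop runs for a vector with no elements (the names empty, count_colon and count_seq are just for illustration):

```r
# A character vector with no elements
empty <- character(0)

# 1:length(empty) is 1:0, i.e. the vector c(1, 0)
count_colon <- 0
for (i in 1:length(empty)) {
  count_colon <- count_colon + 1
}

# seq_along(empty) is an empty integer vector
count_seq <- 0
for (i in seq_along(empty)) {
  count_seq <- count_seq + 1
}

count_colon  # 2: the loop ran twice
count_seq    # 0: the loop did not run
```

For this reason, seq_along() is usually the safer habit when the length of the object is not known in advance.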

Break and next statements

In the previous examples we iterated through every element within our vector. But you don't have to do this; you might only want to iterate through some of them. For example:

# Vector of names
names <- c("Buffy", "Xander", "Willow", "Rupert", "Cordelia")
# Loop
for(i in 2:4) {
     print(paste("Hello", names[i]))
}

Would result in the following output:

[1] "Hello Xander"
[1] "Hello Willow"
[1] "Hello Rupert"

The loop only prints three of the names in the names vector because the object that is being iterated through now only has three elements:

> 2:4
[1] 2 3 4
> class(2:4)
[1] "integer"
> length(2:4)
[1] 3

So, on the first iteration of the loop, i becomes 2, on the second iteration, i becomes 3 and on the third iteration i becomes 4.

A smarter way to decide what to iterate through in a loop can be achieved by using conditional statements. Let's say you want to stop the loop when it meets a certain condition, you can use the break statement after setting a condition. Or, if you wanted to skip an iteration of the loop (and keep it running afterwards), you could use the next statement after setting a condition.

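[Diagram: flowcharts of a for loop with a next statement (NEXT? YES skips the remaining code for that iteration) and with a break statement (BREAK? YES ends the loop). In both, the loop otherwise executes the code and repeats until the last iteration.]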

For example, let's stop running the loop once we reach the name "Rupert":

# Vector of names
names <- c("Buffy", "Xander", "Willow", "Rupert", "Cordelia")
# Stop the loop once the name equals "Rupert"
for(i in 1:5) {
     # Condition
     if(names[i]=="Rupert")
          break
     # Code to be executed
     print(paste("Hello", names[i]))
}

Which would result in the following output:

[1] "Hello Buffy"
[1] "Hello Xander"
[1] "Hello Willow"

But what if we wanted to stop the loop after it has printed the name "Rupert"? We could simply re-order the code within the loop:

# Stop the loop after "Rupert"
for(i in 1:5) {
     # Code to be executed
     print(paste("Hello", names[i]))
     # Condition
     if(names[i]=="Rupert")
          break
}

Which would result in the following output:

[1] "Hello Buffy"
[1] "Hello Xander"
[1] "Hello Willow"
[1] "Hello Rupert"

Now let's take a look at the next statement. For example, let's skip the iteration of the loop when the name is "Rupert":

# Skip "Rupert"
for(i in 1:5) {
     # Condition
     if(names[i]=="Rupert")
          next
     # Code to be executed
     print(paste("Hello", names[i]))
}

Which would result in the following output:

[1] "Hello Buffy"
[1] "Hello Xander"
[1] "Hello Willow"
[1] "Hello Cordelia"

You can also use both break and next statements and multiple conditions in a loop:

# Multiple conditions
for(i in 1:5) {
     # Condition 1
     if(names[i]=="Buffy")
          next
     # Code to be executed     
     print(paste("Hello", names[i]))
     # Condition 2
     if(names[i]=="Rupert")
          break
}

Which would result in the following output:

[1] "Hello Xander"
[1] "Hello Willow"
[1] "Hello Rupert"

Storing the output from a loop

The above examples send the output of the loop to the R console. This might be fine if you just want to gather some information, but it is likely that you will want to store the results from your loop. You can store the results in a vector, a matrix, a list, or any other R object.

Before running the loop, you will first need to create a storage container for the results or output. IMPORTANT! The container should have the same length as the final results output, and it is also a good idea to give the container the same class as the results output (if possible).

For example, let's say you want to calculate the square root of every number in a vector and store the results in another vector. The original data is stored as a numeric vector and has ten elements. Therefore, your results vector should also have a length of ten, and since calculating the square root will generate numeric data, the results vector should also be numeric:

# Random data (10 numbers)
dat <- runif(10)

# Create a storage/container vector with a length of 10
results <- vector(mode="numeric", length=10)

# Run loop (calculate square root of each element)
for(i in 1:10) {
     results[i] <- sqrt(dat[i])
}

Here we create a numeric vector named "results" with ten elements, each initialized to 0. Before the loop is run, it will look like this in the R console:

> results
 [1] 0 0 0 0 0 0 0 0 0 0

And after you run the loop, it will be filled with the results (your data will vary):

> results
 [1] 0.57213147 0.96330053 0.45254744 0.02192411 0.97023437 0.81988137 0.36956970 0.36543316
 [9] 0.95555391 0.49002287

Both the "before" and "after" versions of the results vector will have the same memory allocation. You can test this by using the tracemem() function (your results will vary):

> tracemem(results)
[1] "<0000000011FAC248>"

If you didn't create a matching length results vector, this vector would have to "grow" on each iteration of the loop, with R re-allocating memory each time. This is inefficient and will slow down the performance of the loop.

You can see this for yourself by running the following code:

# Bad container
results2 <- vector()
# Bad loop
for(i in 1:10) {
     results2[i] <- sqrt(dat[i])
     print(tracemem(results2))
}

For a small loop like this, you probably won't notice the speed difference. But what if the "dat" vector contained 100 million elements? Let's compare their performance:

# Large random data
large_dat <- runif(100000000)

# Good loop
results <- vector("numeric", 100000000)
system.time(
for(i in 1:100000000) {
     results[i] <- sqrt(large_dat[i])
}
)

# Bad loop
results2 <- vector()
system.time(
for(i in 1:100000000) {
     results2[i] <- sqrt(large_dat[i])
}
)

"good loop" speed:

   user  system elapsed 
   4.61    0.02    4.63 

vs. "bad loop" speed:

   user  system elapsed 
  18.50    1.41   19.90 

Your results will vary, but as you can see here, the "bad loop" took roughly four times as long as the "good loop" (19.90 vs. 4.63 seconds elapsed).
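In practice you will not always know the length of your data in advance, so rather than typing 10 by hand, you can size the container from the data itself. The sketch below does this with numeric(length(dat)) and seq_along(), and also shows that for this particular task the vectorized sqrt() gives the same answer without a loop (set.seed() and results_vec are my own additions for the example):

```r
# Random data (10 numbers); set.seed() only makes the example repeatable
set.seed(1)
dat <- runif(10)

# Container sized from the data, rather than a hard-coded length
results <- numeric(length(dat))

# Loop over the positions of dat
for (i in seq_along(dat)) {
  results[i] <- sqrt(dat[i])
}

# Vectorized equivalent: no loop or container needed
results_vec <- sqrt(dat)

all.equal(results, results_vec)
```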

You can also store the results in a matrix, just follow the practice of preparing your results matrix before the loop:

# Create a storage/container matrix 
results_mat <- matrix(0, nrow=10, ncol=10)

# Run loop (calculate square root of each element)
for(i in 1:10) {
     # sqrt(dat[i]) is a single value, so R recycles it to fill every column of row i
     results_mat[i,] <- sqrt(dat[i])
}
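A matrix is most useful when each iteration produces more than one value. As a rough sketch, the loop below stores both the original number and its square root in a two-column matrix; the column names "value" and "sqrt" are my own choice for the example:

```r
# Random data (10 numbers); set.seed() only makes the example repeatable
set.seed(42)
dat <- runif(10)

# Container matrix: one row per element of dat, two named columns
results_mat <- matrix(0, nrow = length(dat), ncol = 2,
                      dimnames = list(NULL, c("value", "sqrt")))

# Fill one row on each iteration
for (i in seq_along(dat)) {
  results_mat[i, ] <- c(dat[i], sqrt(dat[i]))
}

head(results_mat, 3)
```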

You can also store results in a list, which is useful when you want to separate out results, or your output has mixed data classes, or you want to store objects themselves. As with vectors, you should first create an empty list of the same length as your results or output:

# Create a storage/container list 
results_list <- vector("list", 10)

# Run loop (calculate square root of each element)
for(i in 1:10) {
     results_list[[i]] <- sqrt(dat[i])
}
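To show why a list is handy for mixed data classes, here is a small sketch where each iteration stores a number, its square root and a text label together. The inputs vector and the element names (input, root, label) are made up for the example:

```r
# Numbers to process (illustrative data)
inputs <- c(4, 9, 16)

# Container list with one slot per input
results_list <- vector("list", length(inputs))

# Each slot holds a small list with numeric and character elements
for (i in seq_along(inputs)) {
  results_list[[i]] <- list(
    input = inputs[i],
    root  = sqrt(inputs[i]),
    label = paste("sqrt of", inputs[i])
  )
}

# Access the stored results by position and name
results_list[[2]]$root   # 3
results_list[[3]]$label  # "sqrt of 16"
```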

While loops

The while loop allows you to perform a task repeatedly for as long as a logical condition remains TRUE. The diagram below helps to illustrate the while loop:

[Diagram: flowchart of a while loop. START, logical condition? If TRUE, execute code and check the condition again; if FALSE, END.]

Here's an example of a while loop which will print the same statement five times:

# Container
i <- 1
# While loop
while(i <= 5) {
	print("Hello World!")
	i <- i + 1
}

The while loop requires a logical condition to work. In this example, the condition tells R to run the loop while the value of i is less than, or equal to 5.

For this example to work, we first have to create a "container" object (a counter) with a value of 1 outside the loop. We then increment this value within the loop itself, increasing the value of i by 1 on each iteration. Therefore, by the end of the fifth iteration, the value of i will have become 6 (the starting value of 1, plus 1 for each of the five iterations). When the loop checks whether it should execute the code a sixth time, the condition is no longer met, so the loop exits.

You can also use break and next statements within while loops.
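As a rough sketch of how that might look, the loop below prints only odd numbers and stops once they exceed 7. The numbers are arbitrary; the important detail is that i is incremented before the next statement, otherwise the loop would skip the increment and never end:

```r
# Container (counter)
i <- 0

# While loop with next and break
while (i < 10) {
     # Increment first, so next cannot skip it
     i <- i + 1
     # Skip even numbers
     if (i %% 2 == 0)
          next
     # Stop once i is greater than 7
     if (i > 7)
          break
     # Code to be executed
     print(i)
}
# Prints 1, 3, 5 and 7; the loop exits when i reaches 9
```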

Repeat loops

The repeat loop will perform a task repeatedly. It will only stop when it meets a break statement and condition. Without this condition, the loop will not stop (unless you hit the escape key). The diagram below helps to illustrate the repeat loop:

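[Diagram: flowcharts of a repeat loop. With a break condition: START, execute code, BREAK? If FALSE, execute the code again; if TRUE, END. Without a break condition: START, execute code, repeated indefinitely.]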

Here's an example of a repeat loop without a break statement and condition:

# Repeat loop
repeat {
     print("Hit escape key to stop this loop!")
     flush.console()
     Sys.sleep(1)
}

That loop would continue to run over and over again, until you intervened (hitting escape key). For a repeat loop to stop automatically, it must include a break statement and condition:

# Container
i <- 1
# Repeat loop
repeat {
     # Code to be executed
     print("Hello World!")
     # Container increment
     i <- i + 1
     # Condition
     if(i > 5) {
          break
     }
}

Like the while loop in the previous example, we created a container outside the loop which is incremented on each iteration of the loop. The loop now includes a condition to break (exit) the loop once the condition is met. The condition checks the value of i, and exits the loop once the value of i exceeds 5.

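You'll notice that even though this performs the same task as the while loop in the previous example, the condition is written in a different way. One way of thinking about the difference is that a while loop keeps running while its condition is TRUE (here, i <= 5), whereas a repeat loop keeps running until its break condition becomes TRUE (here, i > 5). A while loop also checks its condition before each iteration, so its code may never run at all, whereas the code in a repeat loop always runs at least once.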
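Another difference shows up when the condition is already decided before the loop starts. In the sketch below, i starts at 10, and the counters while_runs and repeat_runs (names chosen for the example) record how many times each loop executes its code:

```r
# While loop: the condition is checked BEFORE the code runs
i <- 10
while_runs <- 0
while (i <= 5) {
     while_runs <- while_runs + 1
     i <- i + 1
}

# Repeat loop: the code runs first, then the break condition is checked
i <- 10
repeat_runs <- 0
repeat {
     repeat_runs <- repeat_runs + 1
     i <- i + 1
     if (i > 5) {
          break
     }
}

while_runs   # 0: the code never ran
repeat_runs  # 1: the code ran once before breaking
```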

So, those are the basics of writing loops in R. They can be very useful for automating or repeating tasks, and if there is not a native vectorized function that could perform the same task, don't be afraid to write a loop!

Thanks for reading, and please leave any comments or questions below.

Further reading

Getting started with R - An introduction to R, everything you need to know to get started with R.
