Quick guide to annotating plots in R

R has powerful graphical capabilities, and it's possible to create almost any kind of graph, chart or plot. It also has powerful annotation options, allowing you to write and draw all over your plot using labels, shapes, highlighting, and more.

You might have previously created plots in R and then annotated them in a separate graphics program (e.g. Photoshop, Corel Draw etc.). But you could just do it all in R! This guide will show you some of the ways in which you can scribble on your plots, which can be useful for keeping notes, or for highlighting certain features of your data.

© 2018 Benjamin Bell. All Rights Reserved. https://www.benjaminbell.co.uk

Plot annotation

Note: This guide was written using R version 3.4.2 on Windows 10.

In this guide, I'll show you some of the ways in which you can annotate your plots. I'll cover text annotation with text() and mtext(), drawing arrows with arrows(), and drawing shapes, including rectangles with rect() and polygons with polygon().

This guide refers to plot and figure regions - for a quick overview of these, take a look at my guide to layout.

First, we'll generate some random data for use in this guide, and plot it:

# Generate random data
x <- runif(50, min=1, max=100)
y <- rnorm(50)
# Create data frame and add random row names
df <- data.frame(x, y)
rownames(df) <- c(letters[1:26], LETTERS[1:24])
# Plot
par(mar=c(5, 5, 5, 5)) # Make large margins
plot(df) # Scatter plot of y against x

Which will look like this:

[Figure: scatter plot of the random data, with x on the horizontal axis and y on the vertical axis]

Adding text

If you want to annotate your plot or figure with labels, there are two basic options: text() will allow you to add labels to the plot region, and mtext() will allow you to add labels to the margins.

For the plot region, to add labels you need to specify the coordinates and the label. For example:

text(x=50, y=-1.5, labels="1st label")

This code would add a single label to the plot at the specified coordinates. Using text() gives you complete control over the positioning of your labels. If you wanted to label the first 5 points from the data frame, you could use the following code:

text(x=df$x[1:5], y=df$y[1:5], labels=rownames(df[1:5,]), pos=4, col="red")

This time, the coordinates are taken directly from the data frame, and the labels are the row names for the first 5 points. So that the labels are not plotted directly on top of the points, pos=4 will plot them on the right-hand side of the point, and col="red" will make them red, so they are easily distinguishable.
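To get a feel for the pos= argument, the short sketch below places a label on each side of a single point. It is only an illustration: the point at (1, 1), the axis limits and the pos_names vector are made up for this example and are not part of the data used in this guide.

```r
# The four values of pos= and where each one places the label
pos_names <- c("below", "left", "above", "right")

# One point in the middle of an otherwise empty plot (illustrative values)
plot(1, 1, xlim=c(0, 2), ylim=c(0, 2), pch=19)

# Draw the same point's label four times, once for each pos value
for (p in 1:4) {
  text(x=1, y=1, labels=paste0("pos=", p, " (", pos_names[p], ")"), pos=p, col="red")
}
```

If the labels sit too close to the point, the offset= argument of text() controls the gap (in character widths).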

You could also use a vector to specify either the coordinates or the labels:

text(x=df$x[c(10, 20, 30)], y=df$y[c(10, 20, 30)], labels=c("Point 10", "Point 20", "Point 30"), pos=4, col="blue")

In this code, the x and y coordinates are pulled from the data frame for the specified points [c(10, 20, 30)], and the labels are now user defined (you can probably come up with better labels for your own data!).

So far we have added labels to the plot region. If you want to add labels to the margins (within the figure region), use mtext() instead. For example, let's add some labels to the x axis:

mtext(c("Lower", "Higher"), side=1, line=3, at=c(10, 80), col=c("blue", "red"))

In this code, we add two labels c("Lower", "Higher") to the bottom x axis side=1, positioning them on the "third line" of the margin line=3, at specific locations on the x axis at=c(10, 80), and colour each label differently col=c("blue", "red").

If you add a label to the y axis, it will automatically be rotated to 90 degrees, unless you use las=1.

mtext("Another label", side=4, line=1, at=2, col="green2") # Rotated y axis label
mtext("Another \nlabel", side=4, line=1, at=-1, col="green2", las=1) # Horizontal label

The first line of code will add a label rotated 90 degrees to match the y axis, while the second line will not rotate the label. The addition of \n before "label" will start a new line.
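The side= argument accepts the values 1 to 4, which stand for the bottom, left, top and right margins. As a rough sketch (the empty plot and the sides vector are made up for this illustration), the following labels all four margins so you can see where each one ends up:

```r
# Margin numbers used by side=, named by their position
sides <- c(bottom=1, left=2, top=3, right=4)

par(mar=c(5, 5, 5, 5))        # Large margins, as used elsewhere in this guide
plot(1:10, xlab="", ylab="")  # Blank axis titles so the margin labels stand out

# Write one label in each margin, on the third margin line
for (nm in names(sides)) {
  mtext(paste0("side=", sides[[nm]], " (", nm, ")"), side=sides[[nm]], line=3)
}
```

The labels on sides 2 and 4 are drawn rotated, as described above; adding las=1 would draw them horizontally.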

Let's see what all the labels look like on our plot:

[Figure: the plot with all of the text labels added; the "Point 30" label is cut off at the right-hand edge of the plot region]

You'll notice that one of the labels ("Point 30") is cut off. This is because the label extends outside the plot region, and into the figure region. In order for the label to be displayed fully, you should add xpd=TRUE as an argument in the text() function.
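As a small sketch of the xpd=TRUE fix, the code below uses three made-up points (not the random data from this guide) so that the last label is guaranteed to run past the right-hand edge of the plot region. The overflows variable is only there to confirm that the label really does extend beyond the edge.

```r
# Three illustrative points; the last one sits near the right-hand edge
x <- c(10, 50, 98)
y <- c(1, 2, 3)
plot(x, y)

# Check whether the label would extend past the right edge of the plot region
# (strwidth() and par("usr") both work in the plot's x axis units)
label_end <- x[3] + strwidth("Point 30")
overflows <- label_end > par("usr")[2]

# xpd=TRUE allows drawing outside the plot region, so the label is not clipped
text(x=x[3], y=y[3], labels="Point 30", pos=4, col="blue", xpd=TRUE)
```

The label can still be cut off if it runs beyond the figure region, so you may also need a larger right-hand margin via par(mar=).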


Identifying points and labelling

Another way to label points is to use the identify() function. This lets you click on the points you want to "identify" and will add a label. To stop identifying points, hit the ESC key, or press the stop button on the R menu bar. Alternatively, you can specify how many points you want to identify by adding n=10 to the command (to identify 10 points).

For example, to automatically label points with the row name use the following code:

# Identify points
identify(x=df, labels=rownames(df), col="red")

Here, x= specifies the data whose plotted points you want to identify (in our case the data frame, which holds the x and y coordinates), and labels=rownames(df) tells R to label the points using the row names from the data frame. You could also use a character string or vector to label the points.

If you do not specify the labels= argument, it will default to plotting the row number from the data frame (or, if using vectors, the position of the point in the vector). This is useful if you want to find out which row of data a point belongs to. You could also store the result as an object, for example:

ident <- identify(df)

And typing "ident" into the R console will give you a vector of the points you clicked (which in our example, relate to the row numbers from our data frame).

> ident
 [1] 13 35 42 49

(Your results will vary)
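Once you have the vector of identified points, you can use it to look up the matching rows in the data frame. The sketch below cannot click on a plot for you, so it uses a hard-coded stand-in for ident (the four values shown above) and a fixed random seed; both are assumptions made purely so the example runs without any interaction.

```r
# Recreate the example data (set.seed() is an assumption, for reproducibility)
set.seed(1)
df <- data.frame(x=runif(50, min=1, max=100), y=rnorm(50))
rownames(df) <- c(letters[1:26], LETTERS[1:24])

# Stand-in for the result of identify(df): the row numbers shown above
ident <- c(13, 35, 42, 49)

# Look up the identified rows
picked <- df[ident, ]
print(picked)

# Highlight and label the identified points on the plot
plot(df)
points(picked$x, picked$y, pch=19, col="red")
text(picked$x, picked$y, labels=rownames(picked), pos=4, col="red")
```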



Drawing arrows

You can draw arrows on your plot to point to specific data points. Like text(), you need to specify the x and y coordinates for the arrow, but you need to do this for the "start" and "end" (where the arrow head is drawn) positions.

Let's say we want to draw an arrow pointing at the first data point. You can get the coordinates for the "end" position x1= and y1= from the data frame, and then specify any coordinates for the "start" position (depending on where you want the arrow to be drawn from). For example:

arrows(x0=40, y0=-1, x1=df$x[1], y1=df$y[1], col="blue", lwd=2)

This will draw a blue arrow from the starting coordinates x0=40, y0=-1 to the specified data point x1=df$x[1], y1=df$y[1]. However, the arrow head will now cover the data point. To avoid this, you can offset the coordinates for the "end" point:

arrows(x0=40, y0=-1, x1=df$x[1]-2, y1=df$y[1]+0.02, col="blue", lwd=2)

The exact numbers you offset by will depend on your own data, as they are in the units of the x and y axes.
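One way to avoid guessing the offsets is to pull the arrow tip back along its own line by a small fraction of its length. The helper below, shorten_tip(), is a hypothetical function written for this sketch (it is not part of R), and the start and end coordinates are made-up values rather than points from the random data.

```r
# Hypothetical helper: move the arrow tip back towards the start point
# by a fraction (frac) of the arrow's length
shorten_tip <- function(x0, y0, x1, y1, frac=0.05) {
  list(x1 = x1 - frac * (x1 - x0),
       y1 = y1 - frac * (y1 - y0))
}

# Illustrative start point (40, -1) and target data point (70, 1)
plot(c(40, 70), c(-1, 1), xlab="x", ylab="y")
tip <- shorten_tip(x0=40, y0=-1, x1=70, y1=1, frac=0.05)

# The arrow now stops just short of the target point
arrows(x0=40, y0=-1, x1=tip$x1, y1=tip$y1, col="blue", lwd=2)
```

Because the offset is a fraction of the arrow itself, it scales with your data, although you may still want to adjust frac by eye.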

This method might be a bit cumbersome if you want to draw several arrows, as you'll need to figure out the best coordinates to use for each arrow. An easier way to get the coordinates is to use the locator() function. Similar to identify(), you click on a position on the plot and/or figure region to get the coordinates for where you clicked.

For example, let's draw three arrows using locator(), starting each arrow outside the plot region. First, we'll get the coordinates:

# Get coordinates
a1 <- locator(2)
a2 <- locator(2)
a3 <- locator(2)
# Create matrices of the x and y coordinates (one column per arrow)
co.x <- cbind(a1$x, a2$x, a3$x)
co.y <- cbind(a1$y, a2$y, a3$y)

Each call to locator(2) creates a list object that contains the x and y coordinates of the two points you clicked on the figure. You should click the "start" position first, followed by the "end" position, which is where you want the arrow to point. We then combine the coordinates into two matrices (row 1 holds the start positions, row 2 the end positions), which we'll use to draw the arrows on our plot:

arrows(x0=co.x[1,], y0=co.y[1,], x1=co.x[2,], y1=co.y[2,], col=c("red", "green", "blue"), lwd=2, xpd=TRUE)

In this code we point the coordinates to the matrix we created, we use a vector of colours to colour each arrow differently, and specify xpd=TRUE to draw the arrows outside the plot region. The resulting plot will look something like this (your results will vary):

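If you want to see how the coordinate matrices are laid out without clicking on a plot, the sketch below uses hard-coded lists in place of the three locator(2) results. The numbers are invented for illustration; real values will depend on where you click.

```r
# Stand-ins for three locator(2) results: first value = start, second = end
a1 <- list(x=c(5, 20),   y=c(2.5, 1.0))
a2 <- list(x=c(50, 45),  y=c(2.8, 0.5))
a3 <- list(x=c(105, 80), y=c(0, -0.5))

# One column per arrow; row 1 = start positions, row 2 = end positions
co.x <- cbind(a1$x, a2$x, a3$x)
co.y <- cbind(a1$y, a2$y, a3$y)
print(co.x)

# Empty plot with large margins, then draw the three arrows
par(mar=c(5, 5, 5, 5))
plot(c(1, 100), c(-2, 2), type="n", xlab="x", ylab="y")
arrows(x0=co.x[1,], y0=co.y[1,], x1=co.x[2,], y1=co.y[2,],
       col=c("red", "green", "blue"), lwd=2, xpd=TRUE)
```

Hard-coding coordinates like this is also a handy way to make a figure reproducible once you are happy with the positions returned by locator().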

Drawing shapes

You can draw shapes on your plot using rect() (to add squares or rectangles) or polygon() (to add polygons).

For rect(), you need to specify the left, bottom, right and top edges of the rectangle as plot coordinates. For example:

rect(xleft=20, ybottom=-1, xright=80, ytop=2, col=NA, border="orange", lwd=2)

This code will draw a large orange rectangle on our plot at the specified coordinates (which correspond to the x and y axes). col=NA stops the rectangle from being "filled", i.e. it will be transparent, whereas specifying a colour will create a filled rectangle.
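A filled rectangle will hide anything drawn underneath it, so one option is a semi-transparent fill. The sketch below uses adjustcolor() from the grDevices package to make "orange" partly transparent; the alpha value of 0.3 and the empty plot are arbitrary choices for illustration, and transparency is only shown on graphics devices that support it.

```r
# Empty plot covering the same ranges as the rectangle example above
plot(c(1, 100), c(-2, 2), type="n", xlab="x", ylab="y")

# Semi-transparent version of "orange" (alpha.f=0.3 is an arbitrary choice)
fill <- adjustcolor("orange", alpha.f=0.3)

# Filled rectangle with a solid orange border
rect(xleft=20, ybottom=-1, xright=80, ytop=2, col=fill, border="orange", lwd=2)
```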

polygon() might be more useful for drawing on the plot, for example, to draw around a group of data points. To draw the polygon, you need to specify the x and y coordinates of each corner of the shape. The easiest way to do this is with locator().

# Get coordinates
p1 <- locator(8)
p2 <- locator(12)
# Draw polygons
polygon(p1, border="green", lwd=2)
polygon(p2, border="blue", lwd=2)

In this code we created two polygons, the first used 8 sets of coordinates (from clicking on the plot 8 times), and the second used 12 sets of coordinates. They were added to the plot using polygon(). If you want the polygons to be filled, you would need to specify a colour col=.

The plot will now look similar to this:

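If you would rather not click around the points by hand, a convex hull is one alternative for drawing around a group. This is a different technique from the locator() approach above: chull() from the grDevices package returns the points that form the outer boundary of a group. In this sketch the seed and the "x below 40" grouping rule are assumptions chosen purely for illustration.

```r
# Recreate example data (set.seed() is an assumption, for reproducibility)
set.seed(42)
df <- data.frame(x=runif(50, min=1, max=100), y=rnorm(50))
plot(df)

# Illustrative group: all points with x below 40
grp <- df[df$x < 40, ]

# Indices of the points on the outer boundary of the group
hull <- chull(grp$x, grp$y)

# Draw the hull as a polygon around the group
polygon(grp$x[hull], grp$y[hull], border="green", lwd=2)
```

The hull passes through the outermost points, so it will look tighter than a hand-drawn outline.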

Bringing it all together

So far in this guide, I have shown you different ways to annotate your plot. Let's bring them all together in a single plot:

par(mar=c(5, 5, 5, 5)) # Make large margins
plot(df) # Redraw the scatter plot before annotating
# Add polygons
polygon(p1, border="green", lwd=2)
polygon(p2, border="blue", lwd=2)
# Label polygons
text(x=c(18, 62), y=c(-0.7, -0.9), labels=c("Green Group", "Blue Group"), col=c("green2", "blue2"))
# Add arrows
arrows(x0=co.x[1,], y0=co.y[1,], x1=co.x[2,], y1=co.y[2,], col=c("red2", "green2", "blue2"), lwd=2, xpd=TRUE)
# Label arrows
text(x=co.x[1,], y=co.y[1,], labels=c("Red Arrow", "Green Arrow", "Blue \nArrow"), col=c("red2", "green2", "blue2"), pos=c(3, 3, 4), xpd=TRUE)
# Add margin text
mtext(c("Lower", "Higher"), side=1, line=3, at=c(10, 80), col=c("blue", "red"))
# Margin arrows
arrows(x0=c(40, 60), y0=-2.65, x1=c(20, 72), y1=-2.65, col=c("blue", "red"), length=0.15, lwd=3, xpd=TRUE)
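[Figure: the final plot, showing the "Lower" and "Higher" margin labels, the green and blue polygons with their group labels, and the labelled red, green and blue arrows]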

This results in a plot with some crazy annotations! Hopefully, this guide has given you some ideas for your own plots. Another way in which you might "annotate" your plot is to highlight specific data points by overplotting them with larger or different symbols. This was covered in the second part of my guide to PCA in R.

Thanks for reading, please leave any comments or questions below.



Further reading

A quick guide to pch symbols - A quick guide to the different pch symbols which are available in R, and how to use them. [R Graphics]

A quick guide to line types (lty) - A quick guide to the different line types available in R, and how to use them. [R Graphics]

A quick guide to layout() in R - How to create multi-panel plots and figures using the layout() function. Also covers plot and figure regions. [R Graphics]

Principal components analysis (PCA) - Part 2 - The second part of this guide for PCA, that covers loadings plots, convex hulls, specifying/limiting labels and/or variable arrows, and more biplot customisations - including over plotting data points.
