Benjamin Bell: Blog: Pollen diagrams in R using rioja

Pollen diagrams in R using rioja - Part 1

by Ben on Wednesday, February 07, 2018

If you have ever needed to create a pollen or stratigraphic diagram, you'll know there are various ways to do this. And of course, it can be done in R! In this two-part guide series I will show you how to plot a pollen diagram using the "rioja" package in R. You can also use this guide to create other stratigraphic diagrams, for example, diatom diagrams. Read on for more...

! Never used R before? Check out my getting started with R guide first.

Guide Information

Title	Pollen diagrams in R using rioja - Part 1
Author	Benjamin Bell
Published	February 07, 2018
Last updated
R version	3.4.2
Packages	base; rioja; vegan
Navigation	Pollen diagrams using rioja Importing pollen data into R Plotting pollen diagrams Bar plots Bar plots with different coloured rows Line plots Symbol plots Filled (silhouette) plots Filled (silhouette) plots with exaggerated curves Additional options, including text rotation Cluster analysis Further reading: Advanced techniques

This is a 3 part guide:

Part 1: Pollen diagrams using rioja	How to plot pollen diagrams using rioja in R.
Part 2: Pollen diagrams using rioja	"Hacks" to further customise the plot.
Part 3: Pollen diagrams using rioja	How to use different plot types on the same plot.

Pollen diagrams using rioja

In this guide I will show you how to plot a pollen diagram in R using the strat.plot() function in the "rioja" package. "rioja" is written by Steve Juggins who also wrote the excellent "C2" program for visualising palaeoenvironmental data. This guide can also be used for plotting other palaeoenvironmental / chemical / biological etc. data. Part 1 (this guide) covers all of the possible plot types, and the main options which are available for creating pollen diagrams, while part 2 looks at some additional features and "hacks" you can do to change your pollen diagram. Part 3 shows you how to combine different plot types into a single pollen diagram.

Go ahead and install.packages("rioja") to download it.

Pollen diagrams consist of multiple plots side by side, each showing a different species. Of course, it's possible to construct a pollen diagram in R "manually" (and this guide shows you how using ggplot2), but rioja makes it nice and simple - once you know how to use all the different plotting options available.

Importing pollen data into R

For this guide, I'll use the pollen data from my first published paper (Modern surface pollen assemblages from the Middle and High Atlas, Morocco: insights into pollen representation and transport.), which you can download from Mendeley Data. The data is freely available, and you do not need to sign in to download it. The file can be found under "Experiment data files" titled "Morocco Pollen Surface Sample Data.csv". If you have your own pollen data, feel free to use that when following this guide!

This dataset contains pollen data for 33 modern surface samples collected in the Middle, and High Atlas Mountains in Morocco. Typically, pollen diagrams would show pollen changes down a "core", with each pollen sample representing a different depth or age. Surface samples on the other hand are collected from "the surface" (usually the first few centimetres of soil), and their age should be representative of the last few years. For the moment, we'll ignore that these are surface samples.

First, we'll import the data into R, and remove the unnecessary columns.

# Import data
ma.pollen.raw <- read.csv("Morocco Pollen Surface Sample Data.csv", header=TRUE, row.names=1, sep=",", check.names=FALSE)
# Remove the first four columns
ma.pollen <- ma.pollen.raw[-1:-4]
# Remove total AP/NAP and pollen concentration columns
ma.pollen <- ma.pollen[-49:-51]

In total, there are 48 pollen types in the dataset. Since some of the pollen types occur very infrequently, we'll remove them from the dataset.

# Remove pollen types where total abundanace is less than 10%
ma.sum <- colSums(ma.pollen) # Calculate column sums
ma.pollen1 <- ma.pollen[, which(ma.sum > 10)] # Subset data

In this code, we created a new object (ma.sum) which contains the total values for each column in the pollen dataset, which represent the different pollen types. The second part of the code creates another object, by subsetting the pollen dataset (ma.pollen) to keep only the pollen types where total abundance is more than 10%.

If you are using your own pollen data (or other data), the data frame should be formatted with columns as the pollen type or species, and the rows should contain the abundances. Row names should be the sample number, while column names should be the species.

Now the dataset is ready for plotting!

Plotting pollen diagrams

Bar plots

The strat.plot() function can plot pollen diagrams in several different ways, and this guide will cover them all. If you type ?strat.plot into the R console, you will invoke the help page, which details all of the arguments (options) available. As you can see, there is quite a lot available, and some arguments only work when used in combination with others.

We'll start by plotting a pollen diagram using bar plots.

# Load package
library("rioja")
# Define colour scheme
p.col <- c(rep("forestgreen", times=7), rep("gold2", times=20))
# Define y axis 
y.scale <- 1:33
# Plot bar plot
pol.plot <- strat.plot(ma.pollen1, yvar=y.scale, y.tks=y.scale, y.rev=TRUE, plot.line=FALSE, plot.poly=FALSE, plot.bar=TRUE, col.bar=p.col, lwd.bar=10, scale.percent=TRUE, xSpace=0.01, x.pc.lab=TRUE, x.pc.omit0=TRUE, las=2)

Which results in the following plot:

As you can see from the code, lots of arguments were used to produce this diagram! We'll go through each of these arguments to explain what it does and how to use it, starting with the first argument (and most crucial), which tells R which object your pollen data is stored in (ma.pollen1). The actual argument is d, although it is not necessary to write this in your code.

yvar

This argument is a vector that specifies the y axis. This can be the sample number, core depth or age. By default, it will use the row number if this is not specified.

y.tks

This argument is a numerical vector that specifies where to draw tick lines on the y axis, as well as what to label them. This cannot be a character vector (i.e. it can only consist of numbers), but there is a way around this (shown in part 2).

In this example, I have used the "y.scale" object to specify both the yvar, and y.tks arguments, depending on your own data, you may need to specify different objects for these arguments.

y.rev

This argument is a logical argument (TRUE or FALSE) to specify whether the y axis should be reversed. By default, the value is FALSE which would plot the diagram from bottom to top e.g. "sample 1" would appear at the bottom of the pollen diagram rather than the top. As pollen diagrams are usually plotted from top to bottom, you should specify this as TRUE.

Next, you'll notice that I have included three plot type arguments plot.line, plot.poly and plot.bar (each of which is logical), even though we are just plotting a bar plot. By default, the strat.plot() function will always plot a line graph unless told not to, so its a good idea to state explicitly what you want to plot, and what you do not want to plot.

col.bar

This argument specifies the colours of the bars in the diagram. This can either be a vector of colours, or a single colour to plot all bars the same.

In this example, the colours were defined in the "p.col" object, which tells R to use "forestgreen" for the first 7 bar charts (arboreal pollen), and "gold2" for the remaining (20) bar charts (non-arboreal pollen). If you wanted, you can specify a different colour for each bar chart, just create your vector object specifying each colour.

lwd.bar

This argument specifies the width of the bars that are plotted.

scale.percent

This is a logical argument which specifies whether the width of each graph (each pollen type or species) is scaled relative to the max abundance for that species. e.g. a pollen type which has a max abundance of 40% will have a wider graph than a pollen type with a max abundance of 10%. If the value of this argument is FALSE, then each graph will have the same width. You can also specify the width of each graph manually by using the graph.widths argument.

xSpace

This argument specifies the space between each graph, the values are "relative units". You can experiment with the value to increase or decrease the space between the graphs. If you have a lot of pollen types (or species) you may need to use a smaller value e.g. 0.005

x.pc.lab

This is a logical argument that tells R whether or not to draw the x-values (along the bottom of the diagram) when you use the scale.percent=TRUE argument.

x.pc.omit0

This is a logical argument that if TRUE, will not draw the "0" value label on the bottom x axis.

las

This argument specifies the style of axis labels - "2" tells R to draw the axis labels perpendicular to the axis. Type ?par in the R console for all options.

Bar plots with different coloured rows

With a bar plot pollen diagram, it is also possible to colour each row differently. This is instead of specifying the colour for each separate chart (pollen type or species).

# Specify row colours
p.col.group <- c(rep("blue", times=16), rep("red", times=14), rep("green2", times=3))
# plot
pol.plot <- strat.plot(ma.pollen1, yvar=y.scale, y.tks=y.scale, y.rev=TRUE, plot.line=FALSE, plot.poly=FALSE, plot.bar=TRUE, col.bar=p.col.group, lwd.bar=10, sep.bar=TRUE, scale.percent=TRUE, xSpace=0.01, x.pc.lab=TRUE, x.pc.omit0=TRUE, las=2)

In this example, sep.bar is also included. This is a logical argument which if TRUE tells R to colour each row separately, instead of colouring each chart separately. Note, that col.bar now specifies a different vector containing a different colour scheme.

This option is useful if you want to highlight or group certain samples together. For example, in the following plot, I have colour-coded the samples by their geographical location.

Line plots

To plot your pollen diagram as a line plot, use the following code:

# Line plot
pol.plot <- strat.plot(ma.pollen1, yvar=y.scale, y.tks=y.scale, y.rev=TRUE, plot.line=TRUE, plot.poly=FALSE, plot.bar=FALSE, col.line=p.col, lwd.line=2, scale.percent=TRUE, xSpace=0.01, x.pc.lab=TRUE, x.pc.omit0=TRUE, las=2)

This code uses most of the same arguments used to plot the bar plot, with the addition of:

col.line

This argument specifies the colour of the line plots. Works the same as bar.col

lwd.line

This argument specifies the width of the line plots. Works the same as lwd.bar

Symbol plots

You could also plot your pollen diagram as symbols (although the customisation options are more limited)

# Symbol plot
pol.plot <- strat.plot(ma.pollen1, yvar=y.scale, y.tks=y.scale, y.rev=TRUE, plot.line=FALSE, plot.poly=FALSE, plot.bar=FALSE, plot.symb=TRUE, symb.pch=16, symb.cex=1, scale.percent=TRUE, xSpace=0.01, x.pc.lab=TRUE, x.pc.omit0=TRUE, las=2)

plot.symb

This is a logical argument telling R whether to plot the diagram as symbols.

symb.pch

This argument tells R which symbol to plot. Type ?pch into the R console to see a list of symbols. It is possible to use more than 1 symbol, but the different symbols will be plotted on the same chart i.e. it won't plot different symbols for different charts.

symb.cex

This argument specifies the symbol size.

Filled (silhouette) plots

To plot a filled, or silhouette plot, use the following code:

# Filled plot
pol.plot <- strat.plot(ma.pollen1, yvar=y.scale, y.tks=y.scale, y.rev=TRUE, plot.line=FALSE, plot.poly=TRUE, plot.bar=FALSE, col.poly=p.col, col.poly.line="black", lwd.poly=1, scale.percent=TRUE, xSpace=0.01, x.pc.lab=TRUE, x.pc.omit0=TRUE, las=2)

col.poly

This argument specifies the colour of the filled region. Works the same as bar.col

col.poly.line

This specifies the colour of the border line for the filled region. This can be a single colour, or a vector of colours.

lwd.poly

This specifies the width of the border line.

To make it easier to pick out individual samples (or depths) on the silhouette plot, you could also add bars for each sample.

# Filled plot with bars
pol.plot <- strat.plot(ma.pollen1, yvar=y.scale, y.tks=y.scale, y.rev=TRUE, plot.line=FALSE, plot.poly=TRUE, plot.bar=TRUE, col.bar="black", col.poly=p.col, col.poly.line="black", scale.percent=TRUE, xSpace=0.01, x.pc.lab=TRUE, x.pc.omit0=TRUE, las=2)

In this code, we have specified both plot arguments plot.poly and plot.bar as TRUE, which adds the bars to the plot. You can configure the appearance of the bars using the options previously discussed in the bar plot example.

Filled (silhouette) plots with exaggerated curves

Often, when you plot a pollen diagram, you'll have several pollen types where the abundances are low, and so changes in abundance between samples are more difficult to see. To make changes in abundance easier to see, you could additionally plot an exaggerated curve for these pollen types.

# Specify which pollen types will have an exaggeration curve
ex <- c(FALSE, rep(TRUE, times=5), FALSE, rep(TRUE, times=7), FALSE, rep(TRUE, times=6), FALSE, rep(TRUE, times=3), FALSE, TRUE)
# Plot
pol.plot <- strat.plot(ma.pollen1, yvar=y.scale, y.tks=y.scale, y.rev=TRUE, plot.line=FALSE, plot.poly=TRUE,plot.bar=TRUE, col.bar="black", col.poly=p.col, col.poly.line="black", scale.percent=TRUE, xSpace=0.01, x.pc.lab=TRUE, x.pc.omit0=TRUE, las=2, exag=ex, col.exag="auto", exag.alpha=0.50, exag.mult=3)

exag

This is a logical argument to add exaggerated curves to the plot. This only works with filled (silhouette) plots plot.poly=TRUE. This can be a single value (TRUE or FALSE) or a vector which specifies a TRUE or FALSE value for every chart in the diagram. In this example, we created a vector of logical values (ex), and specify this in the argument exag=ex

col.exag

This specifies the colours to use for the exaggerated curves. You could specify a single colour or a vector of colours. col.exag="auto" tells R to use the same colours as the main curve - which will be semi-transparent.

exag.alpha

This argument sets the transparency of the exaggerated curves. This should be a value between 0 and 1. This only works if you specify col.exag="auto"

exag.mult

This specifies the multiplier for the exaggerated curves. e.g. a value of 3, will multiply all the abundances by 3. This can be a single value, or you can use a vector to specify different multiplier values for different curves.

Additional options, including text rotation

There are several other arguments (options) which you can also specify to add or change things on your pollen diagram. Check the help page for full details, but some of the common ones you might want to include are:

title

This argument adds a title to your diagram.

x.names

This is a character vector that specifies the names for each graph. By default, it will use the column names from your data frame (hence, why it was not specified in the examples), but should you want to use different names, you can define them with this argument.

ylabel

This argument adds a label to the y axis e.g. "Depth" and/or "Age"

srt.xlabel

This is a numerical argument that allows you to rotate the text for the pollen types (or species). They are usually rotated by 45 degrees.

x.pc.inc

This will change the increments on the x axis to the value specified. This only works when scale.percent=TRUE. For example, to draw the x axis with increments every 5%, you would use x.pc.inc=5

Cluster analysis

If you want to add cluster analysis to your pollen diagram to show pollen assemblage zones, you can do this by creating a "chclust" object, and then adding it to your plot. You'll need to install the "vegan" package to calculate the dissimilarity indices which are used to create the "chclust" object. Go ahead and install "vegan" if you do not already have it installed install.packages("vegan")

Then, use the following code to create the "chclust" object for the pollen data.

# Cluster analysis
library("vegan")
ma.dist <- vegdist(ma.pollen1, method="bray", binary=FALSE, diag=FALSE, upper=FALSE, na.rm = FALSE) 
ma.chclust <- chclust(ma.dist, method="coniss")

The first part of the code calculates "dissimilarity indices" for the pollen data using the "bray" method. There are many options available, and are detailed in the help page, type ?vegdist in the R console for full details.

The second part of the code creates the "chclust" object from the dissimilarity indices, using the "coniss" method (Constrained Incremental Sums of Squares), which you can use to add to your pollen diagram.

To plot the pollen diagram and the cluster analysis, use the clust argument to specify the "chclust" object:

# Filled plot with cluster analysis
pol.plot <- strat.plot(ma.pollen1, yvar=y.scale, y.tks=y.scale, y.rev=TRUE, plot.line=FALSE, plot.poly=TRUE, plot.bar=TRUE, col.bar="black", col.poly=p.col, col.poly.line="black", scale.percent=TRUE, xSpace=0.01, x.pc.inc=5, x.pc.lab=TRUE, x.pc.omit0=TRUE, srt.xlabel=45, las=2, exag=ex, col.exag="auto", exag.mult=3, exag.alpha=0.50, title="Morocco Pollen Surface Samples", ylabel="Sample", clust=ma.chclust)

Next, to show the pollen assemblage zones on the diagram, you'll need to run the following code:

# Add cluster zones
addClustZone(pol.plot, ma.chclust, nZone=6, lwd=1.5, lty=2, col="grey25")

The addClustZone() function plots lines on your pollen diagram (pol.plot), based on the "chclust" object (ma.chclust), to create the number of "zones" as specified by the nZone argument.

This will result in the following plot:

And that's how you create pollen diagrams in R using rioja. Thanks for reading! As always, please leave any questions or comments below.

If you want to take these plots a little further, part 2 of this guide covers some additional options and "hacks" that you can do to add or change features on your pollen diagram. And part 3 shows you how to create a pollen diagram using multiple plot types. See below for details.