Pollen diagrams in R using rioja - Part 1

If you have ever needed to create a pollen or stratigraphic diagram, you'll know there are various ways to do this. And of course, it can be done in R! In this two-part guide series I will show you how to plot a pollen diagram using the "rioja" package in R. You can also use this guide to create other stratigraphic diagrams, for example, diatom diagrams. Read on for more...


Never used R before? Check out my getting started with R guide first.
© 2018 Benjamin Bell. All Rights Reserved. http://www.benjaminbell.co.uk

Pollen diagrams using rioja

! This guide was written using R version 3.4.2 on Windows 10.
And using the "rioja" package, version 0.9-15.1

In this guide I will show you how to plot a pollen diagram in R using the strat.plot() function in the "rioja" package. "rioja" is written by Steve Juggins who also wrote the excellent "C2" program for visualising palaeoenvironmental data. This guide can also be used for plotting other palaeoenvironmental / chemical / biological etc. data. Part 1 (this guide) covers all of the possible plot types, and the main options which are available for creating pollen diagrams, while part 2 looks at some additional features and "hacks" you can do to change your pollen diagram. Part 3 shows you how to combine different plot types into a single pollen diagram.

Go ahead and install.packages("rioja") to download it.

Pollen diagrams consist of multiple plots side by side, each showing a different species. Of course, it's possible to construct a pollen diagram in R "manually" (and this guide shows you how using ggplot2), but rioja makes it nice and simple - once you know how to use all the different plotting options available.

© 2018 Benjamin Bell. All Rights Reserved. http://www.benjaminbell.co.uk

Importing pollen data into R

For this guide, I'll use the pollen data from my first published paper (Modern surface pollen assemblages from the Middle and High Atlas, Morocco: insights into pollen representation and transport.), which you can download from Mendeley Data. The data is freely available, and you do not need to sign in to download it. The file can be found under "Experiment data files" titled "Morocco Pollen Surface Sample Data.csv". If you have your own pollen data, feel free to use that when following this guide!

This dataset contains pollen data for 33 modern surface samples collected in the Middle, and High Atlas Mountains in Morocco. Typically, pollen diagrams would show pollen changes down a "core", with each pollen sample representing a different depth or age. Surface samples on the other hand are collected from "the surface" (usually the first few centimetres of soil), and their age should be representative of the last few years. For the moment, we'll ignore that these are surface samples.

First, we'll import the data into R, and remove the unnecessary columns.

# Import data
ma.pollen.raw <- read.csv("Morocco Pollen Surface Sample Data.csv", header=TRUE, row.names=1, sep=",", check.names=FALSE)
# Remove the first four columns
ma.pollen <- ma.pollen.raw[-1:-4]
# Remove total AP/NAP and pollen concentration columns
ma.pollen <- ma.pollen[-49:-51]

In total, there are 48 pollen types in the dataset. Since some of the pollen types occur very infrequently, we'll remove them from the dataset.

# Remove pollen types where total abundanace is less than 10%
ma.sum <- colSums(ma.pollen) # Calculate column sums
ma.pollen1 <- ma.pollen[, which(ma.sum > 10)] # Subset data

In this code, we created a new object (ma.sum) which contains the total values for each column in the pollen dataset, which represent the different pollen types. The second part of the code creates another object, by subsetting the pollen dataset (ma.pollen) to keep only the pollen types where total abundance is more than 10%.

If you are using your own pollen data (or other data), the data frame should be formatted with columns as the pollen type or species, and the rows should contain the abundances. Row names should be the sample number, while column names should be the species.

Now the dataset is ready for plotting!

© 2018 Benjamin Bell. All Rights Reserved. http://www.benjaminbell.co.uk

Plotting pollen diagrams

Bar plots

The strat.plot() function can plot pollen diagrams in several different ways, and this guide will cover them all. If you type ?strat.plot into the R console, you will invoke the help page, which details all of the arguments (options) available. As you can see, there is quite a lot available, and some arguments only work when used in combination with others.

We'll start by plotting a pollen diagram using bar plots.

# Load package
library("rioja")
# Define colour scheme
p.col <- c(rep("forestgreen", times=7), rep("gold2", times=20))
# Define y axis 
y.scale <- 1:33
# Plot bar plot
pol.plot <- strat.plot(ma.pollen1, yvar=y.scale, y.tks=y.scale, y.rev=TRUE, plot.line=FALSE, plot.poly=FALSE, plot.bar=TRUE, col.bar=p.col, lwd.bar=10, scale.percent=TRUE, xSpace=0.01, x.pc.lab=TRUE, x.pc.omit0=TRUE, las=2)

Which results in the following plot:

As you can see from the code, lots of arguments were used to produce this diagram! We'll go through each of these arguments to explain what it does and how to use it, starting with the first argument (and most crucial), which tells R which object your pollen data is stored in (ma.pollen1). The actual argument is d=, although it is not necessary to write this in your code.

yvar is a vector that specifies the y axis. This can be the sample number, core depth or age. By default, it will use the row number if this is not specified.

y.tks is a numerical vector that specifies where to draw tick lines on the y axis, as well as what to label them. This cannot be a character vector (i.e. it can only consist of numbers), but there is a way around this (shown in part 2).

In this example, I have used the "y.scale" object to specify both the yvar, and y.tks arguments, depending on your own data, you may need to specify different objects for these arguments.

y.rev is a logical argument (TRUE or FALSE) to specify whether the y axis should be reversed. By default, the value is FALSE which would plot the diagram from bottom to top e.g. "sample 1" would appear at the bottom of the pollen diagram rather than the top. As pollen diagrams are usually plotted from top to bottom, you should specify this as TRUE.

Next, you'll notice that I have included three plot type arguments plot.line, plot.poly and plot.bar (each of which is logical), even though we are just plotting a bar plot. By default, the strat.plot() function will always plot a line graph unless told not to, so its a good idea to state explicitly what you want to plot, and what you do not want to plot.

col.bar specifies the colours of the bars in the diagram. This can either be a vector of colours, or a single colour to plot all bars the same.

In this example, the colours were defined in the "p.col" object, which tells R to use "forestgreen" for the first 7 bar charts (arboreal pollen), and "gold2" for the remaining (20) bar charts (non-arboreal pollen). If you wanted, you can specify a different colour for each bar chart, just create your vector object specifying each colour.

lwd.bar specifies the width of the bars that are plotted.

scale.percent is a logical argument which specifies whether the width of each graph (each pollen type or species) is scaled relative to the max abundance for that species. e.g. a pollen type which has a max abundance of 40% will have a wider graph than a pollen type with a max abundance of 10%. If the value of this argument is FALSE, then each graph will have the same width. You can also specify the width of each graph manually by using the graph.widths argument.

xSpace specifies the space between each graph, the values are "relative units". You can experiment with the value to increase or decrease the space between the graphs. If you have a lot of pollen types (or species) you may need to use a smaller value e.g. 0.005

x.pc.lab is a logical argument that tells R whether or not to draw the x-values (along the bottom of the diagram) when you use the scale.percent=TRUE argument.

x.pc.omit0 is a logical argument that if TRUE, will not draw the "0" value label on the bottom x axis.

las specifies the style of axis labels - "2" tells R to draw the axis labels perpendicular to the axis. Type ?par in the R console for all options.

© 2018 Benjamin Bell. All Rights Reserved. http://www.benjaminbell.co.uk

Bar plots with different coloured rows

With a bar plot pollen diagram, it is also possible to colour each row differently. This is instead of specifying the colour for each separate chart (pollen type or species).

# Specify row colours
p.col.group <- c(rep("blue", times=16), rep("red", times=14), rep("green2", times=3))
# plot
pol.plot <- strat.plot(ma.pollen1, yvar=y.scale, y.tks=y.scale, y.rev=TRUE, plot.line=FALSE, plot.poly=FALSE, plot.bar=TRUE, col.bar=p.col.group, lwd.bar=10, sep.bar=TRUE, scale.percent=TRUE, xSpace=0.01, x.pc.lab=TRUE, x.pc.omit0=TRUE, las=2)

In this example, sep.bar is also included. This is a logical argument which if TRUE tells R to colour each row separately, instead of colouring each chart separately. Note, that col.bar now specifies a different vector containing a different colour scheme.

This option is useful if you want to highlight or group certain samples together. For example, in the following plot, I have colour-coded the samples by their geographical location.

© 2018 Benjamin Bell. All Rights Reserved. http://www.benjaminbell.co.uk

Line plots

To plot your pollen diagram as a line plot, use the following code:

# Line plot
pol.plot <- strat.plot(ma.pollen1, yvar=y.scale, y.tks=y.scale, y.rev=TRUE, plot.line=TRUE, plot.poly=FALSE, plot.bar=FALSE, col.line=p.col, lwd.line=2, scale.percent=TRUE, xSpace=0.01, x.pc.lab=TRUE, x.pc.omit0=TRUE, las=2)

This code uses most of the same arguments used to plot the bar plot, with the addition of:

col.line specifies the colour of the line plots. Works the same as bar.col

lwd.line specifies the width of the line plots. Works the same as lwd.bar

© 2018 Benjamin Bell. All Rights Reserved. http://www.benjaminbell.co.uk

Symbol plots

You could also plot your pollen diagram as symbols (although the customisation options are more limited)

# Symbol plot
pol.plot <- strat.plot(ma.pollen1, yvar=y.scale, y.tks=y.scale, y.rev=TRUE, plot.line=FALSE, plot.poly=FALSE, plot.bar=FALSE, plot.symb=TRUE, symb.pch=16, symb.cex=1, scale.percent=TRUE, xSpace=0.01, x.pc.lab=TRUE, x.pc.omit0=TRUE, las=2)

plot.symb is a logical argument telling R whether to plot the diagram as symbols.

symb.pch tells R which symbol to plot. Type ?pch into the R console to see a list of symbols. It is possible to use more than 1 symbol, but the different symbols will be plotted on the same chart i.e. it won't plot different symbols for different charts.

symb.cex specifies the symbol size.


Ad

© 2018 Benjamin Bell. All Rights Reserved. http://www.benjaminbell.co.uk

Filled (silhouette) plots

To plot a filled, or silhouette plot, use the following code:

# Filled plot
pol.plot <- strat.plot(ma.pollen1, yvar=y.scale, y.tks=y.scale, y.rev=TRUE, plot.line=FALSE, plot.poly=TRUE, plot.bar=FALSE, col.poly=p.col, col.poly.line="black", lwd.poly=1, scale.percent=TRUE, xSpace=0.01, x.pc.lab=TRUE, x.pc.omit0=TRUE, las=2)

col.poly specifies the colour of the filled region. Works the same as bar.col

col.poly.line specifies the colour of the border line for the filled region. This can be a single colour, or a vector of colours.

lwd.poly specifies the width of the border line.

© 2018 Benjamin Bell. All Rights Reserved. http://www.benjaminbell.co.uk

To make it easier to pick out individual samples (or depths) on the silhouette plot, you could also add bars for each sample.

# Filled plot with bars
pol.plot <- strat.plot(ma.pollen1, yvar=y.scale, y.tks=y.scale, y.rev=TRUE, plot.line=FALSE, plot.poly=TRUE, plot.bar=TRUE, col.bar="black", col.poly=p.col, col.poly.line="black", scale.percent=TRUE, xSpace=0.01, x.pc.lab=TRUE, x.pc.omit0=TRUE, las=2)

In this code, we have specified both plot arguments plot.poly and plot.bar as TRUE, which adds the bars to the plot. You can configure the appearance of the bars using the options previously discussed in the bar plot example.

© 2018 Benjamin Bell. All Rights Reserved. http://www.benjaminbell.co.uk

Filled (silhouette) plots with exaggerated curves

Often, when you plot a pollen diagram, you'll have several pollen types where the abundances are low, and so changes in abundance between samples are more difficult to see. To make changes in abundance easier to see, you could additionally plot an exaggerated curve for these pollen types.

# Specify which pollen types will have an exaggeration curve
ex <- c(FALSE, rep(TRUE, times=5), FALSE, rep(TRUE, times=7), FALSE, rep(TRUE, times=6), FALSE, rep(TRUE, times=3), FALSE, TRUE)
# Plot
pol.plot <- strat.plot(ma.pollen1, yvar=y.scale, y.tks=y.scale, y.rev=TRUE, plot.line=FALSE, plot.poly=TRUE,plot.bar=TRUE, col.bar="black", col.poly=p.col, col.poly.line="black", scale.percent=TRUE, xSpace=0.01, x.pc.lab=TRUE, x.pc.omit0=TRUE, las=2, exag=ex, col.exag="auto", exag.alpha=0.50, exag.mult=3)

exag is a logical argument to add exaggerated curves to the plot. This only works with filled (silhouette) plots plot.poly=TRUE. This can be a single value (TRUE or FALSE) or a vector which specifies a TRUE or FALSE value for every chart in the diagram. In this example, we created a vector of logical values (ex), and specify this in the argument exag=ex

col.exag specifies the colours to use for the exaggerated curves. You could specify a single colour or a vector of colours. col.exag="auto" tells R to use the same colours as the main curve - which will be semi-transparent.

exag.alpha sets the transparency of the exaggerated curves. This should be a value between 0 and 1. This only works if you specify col.exag="auto"

exag.mult specifies the multiplier for the exaggerated curves. e.g. a value of 3, will multiply all the abundances by 3. This can be a single value, or you can use a vector to specify different multiplier values for different curves.


Update: If you want to create a pollen diagram using multiple plot types, check out part 3 of this guide series.

© 2018 Benjamin Bell. All Rights Reserved. http://www.benjaminbell.co.uk

Additional options, including text rotation

There are several other arguments (options) which you can also specify to add or change things on your pollen diagram. Check the help page for full details, but some of the common ones you might want to include are:

title which will add a title to your diagram.

x.names which is a character vector that specifies the names for each graph. By default, it will use the column names from your data frame (hence, why it was not specified in the examples), but should you want to use different names, you can define them with this argument.

ylabel adds a label to the y axis e.g. "Depth" and/or "Age"

In many pollen or stratigraphic diagrams, the text for the pollen types (or species) is often rotated 45 degrees. You can also do this in R by adding the srt.xlabel argument, and specifying the angle.

x.pc.inc will change the increments on the x axis to the value specified. This only works when scale.percent=TRUE. For example, to draw the x axis with increments every 5%, you would use x.pc.inc=5

© 2018 Benjamin Bell. All Rights Reserved. http://www.benjaminbell.co.uk

Cluster analysis

If you want to add cluster analysis to your pollen diagram to show pollen assemblage zones, you can do this by creating a "chclust" object, and then adding it to your plot. You'll need to install the "vegan" package to calculate the dissimilarity indices which are used to create the "chclust" object. Go ahead and install "vegan" if you do not already have it installed install.packages("vegan")

Then, use the following code to create the "chclust" object for the pollen data.

# Cluster analysis
library("vegan")
ma.dist <- vegdist(ma.pollen1, method="bray", binary=FALSE, diag=FALSE, upper=FALSE, na.rm = FALSE) 
ma.chclust <- chclust(ma.dist, method="coniss")

The first part of the code calculates "dissimilarity indices" for the pollen data using the "bray" method. There are many options available, and are detailed in the help page, type ?vegdist in the R console for full details.

The second part of the code creates the "chclust" object from the dissimilarity indices, using the "coniss" method (Constrained Incremental Sums of Squares), which you can use to add to your pollen diagram.

To plot the pollen diagram and the cluster analysis, use the clust argument to specify the "chclust" object:

# Filled plot with cluster analysis
pol.plot <- strat.plot(ma.pollen1, yvar=y.scale, y.tks=y.scale, y.rev=TRUE, plot.line=FALSE, plot.poly=TRUE, plot.bar=TRUE, col.bar="black", col.poly=p.col, col.poly.line="black", scale.percent=TRUE, xSpace=0.01, x.pc.inc=5, x.pc.lab=TRUE, x.pc.omit0=TRUE, srt.xlabel=45, las=2, exag=ex, col.exag="auto", exag.mult=3, exag.alpha=0.50, title="Morocco Pollen Surface Samples", ylabel="Sample", clust=ma.chclust)

Next, to show the pollen assemblage zones on the diagram, you'll need to run the following code:

# Add cluster zones
addClustZone(pol.plot, ma.chclust, nZone=6, lwd=1.5, lty=2, col="grey25")

The addClustZone() function plots lines on your pollen diagram (pol.plot), based on the "chclust" object (ma.chclust), to create the number of "zones" as specified by the nZone argument.

This will result in the following plot:

And that's how you create pollen diagrams in R using rioja. Thanks for reading! As always, please leave any questions or comments below.

If you want to take these plots a little further, part 2 of this guide covers some additional options and "hacks" that you can do to add or change features on your pollen diagram. And part 3 shows you how to create a pollen diagram using multiple plot types. See below for details.


Ad

© 2018 Benjamin Bell. All Rights Reserved. http://www.benjaminbell.co.uk

Further reading: Advanced techniques

Pollen diagrams using rioja - Part 2 - In the second part of this guide, I show you some additional options available for plotting pollen diagrams.

Pollen diagrams using rioja - Part 3 - In the third part of this guide, I show you how to use multiple plot types on the same pollen diagram.


4 comments:

  1. Can you define the nZone argument using a broken stick method? Something like:
    nZone = min( which(bstick(clust)$bstick < bstick(clust)$dispersion) )-1

    ReplyDelete
    Replies
    1. As far as I know, the nZone argument should be an integer - the addClustZone() function uses the cutree function in stats... take a look at ?cutree for more details.

      You could use addZone(x, upper=) for total control on the placement of zones.

      Also take a look at: plot(ma.chclust), and bstick(ma.chclust) to see the broken stick model versus the cluster analysis.

      Delete
  2. I have a few questions: 1) How would you add a secondary y-axis to the stratigraphic plot (e.g. containing 210Pb Ages); 2) How do you have different x labels at the top for different types of plots (e.g. species names at an angle in italics, dendrogram with label CONISS at no angle, and PCA sample scores (line graph), also at no angle etc..); 3) How do you add x-axis labels at the bottom of the stratigraphic plot for Percentage abundance (%) (under all graphs with species), Total sum of squares (for the dendrogram), etc? Thank you.

    ReplyDelete
    Replies
    1. Hi Sarah, thanks for your questions!

      1) This should be a case of using something like: axis(2, at="", labels="", line=3) - which will add a second y axis to your plot.Take a look at part 3 of this guide for an example.

      The problem will be lining up the axis correctly - this will involve adjusting the par settings (There's an example in part 3 of this guide). Best way is to play around with the different settings and see what works. Unfortunately, i don't think there is an easy way to do this.

      2) There doesn't appear to be an easy way to do this either - it appears the srt.xlabel= argument can only take a single value.

      I would suggest that you use mtext - this will add text to the plot "margins" - again, you'll need to play around with the settings, something like: mtext("CONISS", side=3, line=-2, at=6), Experiment with different values until it appears in the right place :)

      3) This is fairly easy, mtext will do what you want - something like: mtext("Percentages", side=1, line=4) and mtext("Sum of Squares", side=1, line=4, at=6).

      Again, you'll need to experiment with the arguments here, adjusting the "at=" value, and "line=" values.

      If you have access to a graphics editor - it might be easier to add them in that way :)

      Delete