Getting Climate Data

Have you ever needed weather or climate data for a project you are working on, such as a dissertation or thesis? Depending on which part of the world you need data for, it can be very difficult to obtain good, reliable climate datasets. While some agencies (e.g. NOAA) make weather station data freely accessible, others restrict access. You might also want to look at climate data on large or global scales, where obtaining data from individual agencies becomes more complex.

Luckily, several global climate datasets are available whose creators have already done the hard work of collating and processing the climate data into an easily accessible and consistent format.

In part one of this guide, I will show you how to obtain climate data from the Climatic Research Unit (CRU) and extract only the data you need for your project using R. In part two of this guide (coming soon!), I will show you how you can manipulate this data, while part three will show you how to extract climate data from WorldClim datasets.



© 2018 Benjamin Bell. All Rights Reserved. http://www.benjaminbell.co.uk

Getting CRU climate data

Note: This guide was written using R version 3.4.2 on Windows 10.

This is part one of a guide that will take you through the complete process of downloading climate data, opening it in R, and then extracting only the data you want for your sites, step by step. In the next guide, I will show you how you can work with and manipulate this data, while future guides will look at plotting the climate data.

The Climatic Research Unit (CRU) at the University of East Anglia provides gridded climate datasets on past and present climate. For this guide, we will be using the high-resolution gridded dataset CRU TS, which provides time-series global land climate data on a 0.5° x 0.5° grid. Each grid cell is approximately 55 km across at the equator (an area of roughly 3,000 km²), and the cells become narrower from east to west towards the poles.
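As a rough check on the grid size, the sketch below estimates the dimensions of a 0.5° cell at a given latitude. It assumes a spherical Earth with about 111.32 km per degree at the equator, and the function name cell_size_km is hypothetical (it is not part of any package).

```r
# Approximate length of one degree at the equator in km.
# Assumption: spherical Earth; real values vary slightly with latitude.
km_per_degree <- 111.32

# Hypothetical helper: approximate size of a grid cell of resolution `res`
# (in degrees) centred at latitude `lat` (in degrees)
cell_size_km <- function(lat, res = 0.5) {
  north_south <- res * km_per_degree
  east_west <- res * km_per_degree * cos(lat * pi / 180)
  c(north_south = north_south,
    east_west = east_west,
    area_km2 = north_south * east_west)
}

round(cell_size_km(0), 1)   # at the equator
round(cell_size_km(55), 1)  # at roughly the latitude of northern Britain
```

The north-south size stays the same everywhere, while the east-west size shrinks with the cosine of the latitude, so cells over the UK cover a smaller area than cells in the tropics.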

CRU TS provides data for: cloud cover, diurnal temperature range, potential evapotranspiration (PET), daily mean temperature, mean maximum temperature, mean minimum temperature, precipitation, and vapour pressure. Full details about the dataset are available in the associated research paper (Harris et al. 2013).

The latest version of the dataset, v4.01, provides monthly data for the period 1901 to 2016. The 4.xx series uses an updated method for calculating the gridded climate data, which differs from the previous 3.xx releases. You can still download v3.25.01 of the dataset, which covers the same period, but the 3.xx series will no longer be maintained. Details about the changes and how the datasets are created can be found in the release notes.


Downloading the data

For this guide, we will be using CRU TS v4.01 and looking at precipitation and mean temperature. We will make use of several R packages, but first we need to download the data.

The data can be downloaded from the Centre for Environmental Data Analysis using this permalink: http://doi.org/10/gcmcz3 (these instructions are correct as at 10/01/2018, but the website may be updated and change).

You may first need to register to access the data, so click on the "Register/Login for Access" button and follow the instructions. Once you are registered and have logged in, click the "Get Data" button to browse the available data.

Firstly, we are going to get precipitation data, so you should click on the "data" folder, then the "pre" folder. We want precipitation data for the full period of 1901 to 2016, so you need to find the file "cru_ts4.01.1901.2016.pre.dat.nc.gz" and then click the download link to save the file to your computer.

We also want mean temperature data, so, starting from the data folder, click the "tmp" folder to find the file "cru_ts4.01.1901.2016.tmp.dat.nc.gz" and download this to your computer. These files are compressed ".gz" files which you need to first extract, in order to open them in R. A list of programs that can extract .gz files can be found here for Windows, Mac OS X and Linux. On Windows, I would recommend 7-Zip (free and open source) which can open just about any archive file.
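If you would rather decompress the files from within R, a minimal base R sketch is shown here. The helper name gunzip_file is hypothetical; it streams the compressed file through R's built-in gzfile() connection and writes the result next to the original. The R.utils package also offers a ready-made gunzip() function as an alternative.

```r
# Hypothetical helper: decompress a .gz file using base R only.
# By default the output file name is the input name without ".gz".
gunzip_file <- function(src, dest = sub("\\.gz$", "", src), chunk = 1e6) {
  con_in <- gzfile(src, open = "rb")    # reads decompressed bytes
  on.exit(close(con_in), add = TRUE)
  con_out <- file(dest, open = "wb")
  on.exit(close(con_out), add = TRUE)
  repeat {
    bytes <- readBin(con_in, what = raw(), n = chunk)
    if (length(bytes) == 0) break       # end of file reached
    writeBin(bytes, con_out)
  }
  invisible(dest)
}

# Example call, assuming the file is in the current working directory:
# gunzip_file("cru_ts4.01.1901.2016.pre.dat.nc.gz")
```

Reading in chunks keeps memory use low, which matters because the decompressed CRU files are much larger than the downloads.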

For each R project you are going to work on, it is generally a good idea to create a separate folder to work in. So, I would suggest you save the climate data to a new folder, e.g. "CRU climate" within your R working directory.


Extracting the climate data using R

To work with the CRU datasets in R, you first need to install some additional packages: "ncdf4" in order to read the data files, and "raster" which can extract and plot the data.

To install a package in R, use the command install.packages() and put the package name (case sensitive) within the parentheses, enclosed in quotation marks. For example, to install the "raster" package, type the following command into the R console:

> install.packages("raster")

The first time you install a package after opening R, you will be asked to select a download mirror. Select whichever is closest to your location, and the package will then download and install automatically. Then repeat the process to install ncdf4.
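If you are not sure whether the packages are already installed, the following sketch checks before installing anything. The helper name missing_packages is hypothetical, and the install call is left commented out so that nothing is downloaded until you choose to run it.

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("raster", "ncdf4"))

if (length(to_install) > 0) {
  message("Not yet installed: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install the missing packages
}
```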

After you have installed the packages, you will need to load them for this R session. If you were to close the R environment and reopen it, you would need to reload the packages.

Use the following code to load the packages in R:
# Extracting CRU Climate data: CRU TS v4.01
# Complete guide available at: http://www.benjaminbell.co.uk

# Load packages
library(raster)
library(ncdf4)

Before you extract any climate data, it is a good idea to first take a look at the information provided by the data file, which is in NetCDF format. This format is often used for large datasets, and can be easily read using R.

In the R console, first change the working directory using the setwd() command to the same folder in which you have downloaded the climate data (or your R project folder).

For example, if you are using Windows and you have an R working directory on your C:\ drive, and created the subfolder "CRU climate", you would set the working directory using the following command:

> setwd("C:/R/CRU climate")

Note that when using R on Windows, file locations must be written with forward slashes (/) or doubled backslashes (\\) instead of single backslashes (\), because R treats a single backslash in a string as an escape character. Forward slashes are standard for file locations on Mac OS X and Linux.
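As a small illustration of handling file locations, this sketch builds the same path in two ways using base R. The folder names are the example ones used in this guide, so adjust them to match your own computer.

```r
# Build a path from its parts; file.path() joins them with forward slashes
project_dir <- file.path("C:", "R", "CRU climate")
project_dir

# A path copied from Windows Explorer uses backslashes, which must be
# doubled inside an R string. Convert them to forward slashes:
windows_path <- "C:\\R\\CRU climate"
converted <- gsub("\\", "/", windows_path, fixed = TRUE)
converted
```

Both results can be passed to setwd() once the folder exists.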

You can confirm which directory you are currently working in by using the getwd() command, e.g.

> getwd()
 [1] "C:/R/CRU climate"

To view information about the CRU precipitation data file, use the nc_open() command from the "ncdf4" package, followed by the print() command, which will output the following information:

> nc.pre <- nc_open("cru_ts4.01.1901.2016.pre.dat.nc")
> print(nc.pre)
File C:/R/CRU climate/cru_ts4.01.1901.2016.pre.dat.nc (NC_FORMAT_CLASSIC):

     2 variables (excluding dimension variables):
        float pre[lon,lat,time]   
            long_name: precipitation
            units: mm/month
            correlation_decay_distance: 450
            _FillValue: 9.96920996838687e+36
            missing_value: 9.96920996838687e+36
        int stn[lon,lat,time]   
            description: number of stations contributing to each datum

     3 dimensions:
        lon  Size:720
            long_name: longitude
            units: degrees_east
        lat  Size:360
            long_name: latitude
            units: degrees_north
        time  Size:1392   *** is unlimited ***
            long_name: time
            units: days since 1900-1-1
            calendar: gregorian

    8 global attributes:
        Conventions: CF-1.4
        title: CRU TS4.01 Precipitation
        institution: Data held at British Atmospheric Data Centre, RAL, UK.
        source: Run ID = 1709081022. Data generated from:pre.1704241136.dtb
        history: Fri  8 Sep 2017 12:54:11 BST : User ianharris : Program makegridsauto.for called by update.for
        references: Information on the data is available at http://badc.nerc.ac.uk/data/cru/
        comment: Access to these data is available to any registered CEDA user.
        contact: BADC  

This tells us that the precipitation data file has two variables "pre" and "stn", and it provides a description of what they tell us: "pre" being total precipitation (mm/month), and "stn" being the number of climate stations that were used for the grid square. For now, we are only concerned with the "pre" variable.

The first two dimensions refer to longitude and latitude, so from the file information we can see that the file contains climate data for 720 x 360 grid squares covering the entire globe. It also tells us the units of the coordinate system. The third dimension represents time, covering each month between January 1901 and December 2016. This represents 1392 data entries, which can be confirmed by multiplying the number of years (116) by the number of months (12), which equals 1392.
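The arithmetic for the time dimension can be confirmed in base R, as in this short sketch. The offset of 380 days is an illustrative value rather than one read from the file; it shows how a "days since 1900-1-1" value converts to a calendar date.

```r
# One entry per month from January 1901 to December 2016
months <- seq(as.Date("1901-01-01"), as.Date("2016-12-01"), by = "month")
length(months)           # 1392
length(1901:2016) * 12   # 1392

# Converting "days since 1900-1-1" to a date.
# Assumption: 380 is an illustrative offset, not a value taken from the file.
as.Date(380, origin = "1900-01-01")   # "1901-01-16"
```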

Now that we have looked at the file information, we know the name of the variable we want to extract, so we can go ahead and use R to load the data.
# Load the CRU TS precipitation dataset into R 
pre <- brick("C:/R/CRU climate/cru_ts4.01.1901.2016.pre.dat.nc", varname="pre")

The above code uses the raster package to create a "RasterBrick" object (similar to an array) that is linked to the NetCDF file. You can view information about the RasterBrick by simply typing the object name into R. For example:

> pre
class       : RasterBrick 
dimensions  : 360, 720, 259200, 1392  (nrow, ncol, ncell, nlayers)
resolution  : 0.5, 0.5  (x, y)
extent      : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 
data source : C:\R\CRU climate\cru_ts4.01.1901.2016.pre.dat.nc 
names       : X1901.01.16, X1901.02.15, X1901.03.16, X1901.04.16, X1901.05.16, X1901.06.16, X1901.07.16, X1901.08.16, X1901.09.16, X1901.10.16, X1901.11.16, X1901.12.16, X1902.01.16, X1902.02.15, X1902.03.16, ... 
Date        : 1901-01-16, 2016-12-16 (min, max)
varname     : pre 
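The layer names shown above encode the date of each monthly layer. As a minimal sketch, they can be converted to proper dates in base R. The three names used here are copied from the output above, and the object names are illustrative.

```r
# Layer names copied from the RasterBrick summary
layer_names <- c("X1901.01.16", "X1901.02.15", "X1901.03.16")

# Drop the leading "X" and parse the remainder as year.month.day
layer_dates <- as.Date(sub("^X", "", layer_names), format = "%Y.%m.%d")
layer_dates

# Year and month only
format(layer_dates, "%Y-%m")   # "1901-01" "1901-02" "1901-03"
```

With the full dataset loaded, the same conversion could be applied to names(pre).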

It is possible to plot the information directly from a "RasterBrick" object. Let's quickly plot a map showing global precipitation for January 1901, just to see how the data looks, using this command in the R console:

> plot(pre$X1901.01.16)

With raster objects, you can also look more closely at a particular area by using extent() to define the area using coordinates, and then using crop() to create a new object containing the original raster data cropped to that area. Let's take a closer look at the UK:

> uk.area <- extent(-12, 4, 48, 64)
> uk <- crop(pre, uk.area)
> plot(uk$X1901.01.16)

Later guides will look at plotting climate data in more detail, so let's get back to extracting climate data from the CRU TS dataset.

If, for example, you wanted climate data for a number of your sample sites, you would need to tell R the coordinates of these sites for it to extract the relevant data. You can do this by creating a matrix or data.frame containing your sample sites, putting your sample names as the row names, column 1 as longitude, and column 2 as latitude. If you are unsure how to do this, please check out my Getting Started with R Guide for help.
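As a minimal sketch of that layout, the following creates such a data.frame directly in R. The site names and coordinates are approximate, illustrative values for four UK cities, and the object name "sites" is hypothetical.

```r
# Illustrative sample sites: row names are the site names,
# column 1 is longitude and column 2 is latitude (decimal degrees)
sites <- data.frame(
  lon = c(-0.13, -3.19, -4.25, -1.55),
  lat = c(51.51, 55.95, 55.86, 53.80),
  row.names = c("London", "Edinburgh", "Glasgow", "Leeds")
)

sites
```

Note that longitude comes first: western longitudes and southern latitudes are written as negative numbers.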

But say you already have a spreadsheet of sample sites: rather than recreate it in R, we can import the existing data.

Go ahead and create a spreadsheet using your favourite spreadsheet software (Google Docs will also work). Create a simple spreadsheet which contains 4 sample sites and their coordinates, using the column names "site", "lon" and "lat", and save the spreadsheet as a .csv file.

Or, you can download the spreadsheet I have already created, available on Google Docs.

Save the spreadsheet to the same directory you are currently working in (CRU climate) as "samples.csv", and then import the data into R using the following code:
# Import sample site information
samples <- read.csv("samples.csv", header=TRUE, row.names="site", sep=",")

Don't worry if you get a warning message here - it can safely be ignored.
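A quick check on the imported table can catch swapped or mistyped coordinates before you extract any data. This is a sketch only: the CSV content below is made up for illustration (your own "samples.csv" will differ), and it is read from a text string so that the example runs on its own.

```r
# Hypothetical contents of a samples.csv file (illustrative values only)
csv_text <- "site,lon,lat
Site A,-3.2,55.9
Site B,-1.5,53.8
Site C,-0.1,51.5
Site D,-4.1,50.4"

check <- read.csv(text = csv_text, header = TRUE, row.names = "site")

# Columns must be in the order longitude, latitude
stopifnot(identical(names(check), c("lon", "lat")))

# Coordinates must be valid decimal degrees
stopifnot(all(check$lon >= -180 & check$lon <= 180))
stopifnot(all(check$lat >= -90 & check$lat <= 90))

check
```

The same stopifnot() lines can be run on your own "samples" object after importing it.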

If you wanted to check that your sample sites have the correct coordinates and appear in the right place, you could plot them on the map you made earlier:

> plot(uk$X1901.01.16)
> points(samples, pch=16)

Which looks about right!

Now, let's go ahead and extract the climate data for our 4 sample sites using the extract() command from the raster package:

# Extract climate data from the RasterBrick as a data.frame
pre.sites <- data.frame(extract(pre, samples, ncol=2))

The above code tells R to create a new data.frame object called "pre.sites", where it will extract information from the RasterBrick object "pre" (which contains all the climate data), using the coordinates found in the "samples" object (which we imported our sample site data into).
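To see what extract() returns without loading the full dataset, here is a small sketch using a made-up 2 x 2 RasterBrick with 3 layers. All values and object names are invented for illustration, and it assumes the raster package is installed.

```r
library(raster)

# Synthetic brick: 2 x 2 cells covering 0-2 degrees, with 3 layers
b <- brick(nrows = 2, ncols = 2, xmn = 0, xmx = 2, ymn = 0, ymx = 2, nl = 3)

# One row per cell (numbered from the top left, row by row), one column per layer
b <- setValues(b, matrix(1:12, nrow = 4, ncol = 3))

# Two illustrative points: top-left cell and bottom-right cell
pts <- data.frame(lon = c(0.5, 1.5), lat = c(1.5, 0.5))

vals <- extract(b, pts)
vals   # one row per point, one column per layer
```

The result has one row for each point and one column for each layer, which is why the CRU extraction gives 4 rows (sites) and 1392 columns (months).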

If you were to then look at the data using fix("pre.sites") you should now see a table with your 4 sample sites and the monthly precipitation data for these sites.

We can make this table a bit more user friendly. Let's add the sample site names to the table using the row.names() command:

# Add sample site names
row.names(pre.sites) <- row.names(samples)

Here, we have changed the row names of the data.frame object "pre.sites" to use the same names from our imported spreadsheet.

Let's also change the column names to be a bit easier to work with - we'll name them using the convention: year followed by month. If you were to rename 1392 columns manually, this could take quite a long time! But, in R, there is a much easier and faster way.

First, create two vector objects containing the years and months. There is no need to type each year, simply the start and end years, separated by a colon.
# Change column names
years <- 1901:2016
month <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

Now, we will rename the columns of our data.frame using the names() command, the paste() command, and the rep() command.

names(pre.sites) <- paste(rep(years, each=12), rep(month, times=116), sep="_")

This code tells R to first repeat each year (taken from the "years" object) 12 times, once for each month. It then repeats the full set of month names (taken from the "month" object) 116 times, once for each year, and "pastes" the two together using an underscore to separate them.
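The way rep() and paste() combine is easier to see on a small scale. This sketch uses just 2 years and 3 months; the object names ending in "_demo" are illustrative only.

```r
years_demo <- 2015:2016
month_demo <- c("Jan", "Feb", "Mar")

rep(years_demo, each = 3)    # 2015 2015 2015 2016 2016 2016
rep(month_demo, times = 2)   # "Jan" "Feb" "Mar" "Jan" "Feb" "Mar"

demo_names <- paste(rep(years_demo, each = 3), rep(month_demo, times = 2), sep = "_")
demo_names
# "2015_Jan" "2015_Feb" "2015_Mar" "2016_Jan" "2016_Feb" "2016_Mar"

# Base R also provides the month abbreviations as the built-in constant month.abb
full_names <- paste(rep(1901:2016, each = 12), rep(month.abb, times = 116), sep = "_")
length(full_names)   # 1392
```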

When you look at the data now, using fix("pre.sites"), it should appear a lot more user friendly!

Now that we have extracted the precipitation data for our sample sites, let's export that data to a spreadsheet. You can use different packages to export to any file format, including Excel files, but for now, we will stick to the defaults and export a .csv file of our data, which you should be able to open in most software programs.

# Save the extracted climate data to a .csv file
write.csv(pre.sites, file="Precipitation Data.csv")
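If you later read the exported file back into R, be aware that read.csv() alters column names that start with a number unless told otherwise. The sketch below shows this with a small made-up table written to a temporary folder; the values, site names and file name are illustrative only.

```r
# Small made-up table using the same naming convention as pre.sites
demo <- data.frame(a = c(10.5, 20.1), b = c(30.2, 40.8),
                   row.names = c("Site1", "Site2"))
names(demo) <- c("1901_Jan", "1901_Feb")

out_file <- file.path(tempdir(), "demo_precipitation.csv")
write.csv(demo, file = out_file)

# Default behaviour: names are altered to be syntactically valid
mangled <- read.csv(out_file, row.names = 1)
names(mangled)   # "X1901_Jan" "X1901_Feb"

# Keep the original names by switching off the name check
back <- read.csv(out_file, row.names = 1, check.names = FALSE)
names(back)      # "1901_Jan" "1901_Feb"
```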

And that's it! You have successfully downloaded CRU climate data, opened it in R, and extracted only the data you want for your sample sites!

You should now be able to follow this guide to extract mean temperature data from the other CRU file you downloaded - just remember to change object names from "pre" to "tmp", and change the variable name when loading the dataset to "tmp". You could also experiment with some of the other datasets available from CRU.

Here's the complete code to extract precipitation and temperature data in case you get stuck:
# Extracting CRU Climate data: CRU TS v4.01
# Complete guide available at: http://www.benjaminbell.co.uk

# Load packages
library(raster)
library(ncdf4)

# Load the CRU TS datasets into R 
pre <- brick("cru_ts4.01.1901.2016.pre.dat.nc", varname="pre") # Precipitation
tmp <- brick("cru_ts4.01.1901.2016.tmp.dat.nc", varname="tmp") # Mean monthly temperature

# Import sample site information
samples <- read.csv("samples.csv", header=TRUE, row.names="site", sep=",")

# Extract climate data from the RasterBrick as a data.frame
pre.sites <- data.frame(extract(pre, samples, ncol=2)) # Precipitation
tmp.sites <- data.frame(extract(tmp, samples, ncol=2)) # Mean monthly temperature

# Add sample site names to the data.frame
row.names(pre.sites) <- row.names(samples)
row.names(tmp.sites) <- row.names(samples)

# Change column names
years <- 1901:2016
month <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
names(pre.sites) <- paste(rep(years, each=12), rep(month, times=116), sep="_")
names(tmp.sites) <- paste(rep(years, each=12), rep(month, times=116), sep="_")

# Save the extracted climate data to a .csv file
write.csv(pre.sites, file="Precipitation Data.csv")
write.csv(tmp.sites, file="Temperature Data.csv")

Thanks for reading! If you have any comments, please leave them below.



Further reading

Working with the extracted climate data - Part 2 of this guide!

Extracting Worldclim climate data - Part 3 of this guide!


9 comments:

  1. Really cool Benjamim. Thank you for sharing!

    1. You’re welcome. Glad it is of some use!

  2. Thanks, the most accessible for understanding guide on cruts from those that I saw.

    1. Thanks for the comments, glad this has been helpful!

  3. You are a very good teacher

  4. Thank you very much. This was so easy to follow and super helpful.

  5. Hi! Thank you for this tutorial. It's very helpful! I'm having trouble extracting the files. I downloaded 7-zip but when I try to extract the file, a message pops up telling me that it's not a file...
    What could be happening? Thanks for your help in advance!

    1. Hi Antonella,

      Can I clarify if you are having trouble with 7-Zip or the file you downloaded?

      For 7-Zip, after you have downloaded and installed it, you should get a new option on the context menu when you right click on a file which says "7-Zip". When you hover over this option, you'll get several new options including "Open Archive" and "Extract files..."

      "Extract files..." is the one you want for opening the climate data.

      Assuming that 7-Zip is working fine - have you tried to re-download the climate data? Make sure you download the file ending ".nc.gz" which is the "gzipped" file. Instead of using the download link on the website, you could try right-clicking the link and selecting "Save Link As..." or "Save Target as..." to download.

      Hope this helps,
      Ben