In the final part of this expanded guide series introducing you to R, this guide gives you an overview of the graphics systems in R, and gets you started creating plots using base graphics.
Guide Information
Title | Getting started with R: An overview of graphics in R |
Author | Benjamin Bell |
Published | August 24, 2021 |
Last updated | |
R version | 4.1.1 |
Packages | base |
Navigation |
This is a 3 part guide:
Part 1: Introduction to R | An introduction to R - what is R, where to get R, how to get started using R, a look at objects and functions, an overview of data structures, and getting started with manipulating data. |
Part 2: Importing data into R | This guide shows you how to import your existing data into R from .csv or Excel files. |
Part 3: An overview of graphics in R | This guide gives you an overview of some of the graphical capabilities of base R to produce high quality plots. |
R graphics systems
As well as being a programming language for statistical computing, R has powerful functions for creating professional and publication-quality graphics. This introductory guide will give you an overview of the graphical systems available in R, and the basics for producing plots.
When it comes to producing graphics or plots in R, there is no shortage of options. Two graphical systems are included in R, and many additional graphics packages are available, so there are usually several ways to produce any given plot.
The base functionality for producing graphics is provided by the "graphics" package, usually referred to as "base graphics". It is included by default when installing R. Base graphics may also be referred to as "traditional graphics", and is based on S graphics. Most default functions in R use base graphics for producing graphical output.
R also includes Grid graphics, which is a graphical system designed to improve upon base graphics, offering better and more flexible layout support.
Grid graphics is a low-level system and doesn't generate plots itself. Instead, packages can be built on top of it. These include the "lattice" package and, notably, "ggplot2", which can generate many of the same plots that base graphics can.
Lattice is an implementation of Trellis graphics, and is ideal for plotting multivariate data.
ggplot2 is a plotting system that is "based on the grammar of graphics". The philosophy behind the package is to take the best bits of base and lattice graphics, but none of the bad bits. ggplot2 is part of the "tidyverse", and all tidyverse functions for producing graphics will use ggplot2.
The different graphical systems each use their own set of arguments to customise and configure the plot or graphical output, so code for generating a plot in base graphics will usually differ from code for generating the same plot in ggplot2. You may find that you favour one plotting system over the other. Generally, it comes down to personal preference, the types of analysis you do, and the packages that you use.
If you are just getting started with R, base graphics is a good place to start!
Graphic devices
R is able to output graphics to the screen or save them directly to a file (e.g. postscript, pdf, svg, png, jpeg etc.). The different functions for producing graphical output are known as "Devices". For example, pdf() would invoke the pdf device, while png() would invoke the png device. Type ?Devices into the R console to see a list of graphical devices that are available to R on your system.
By default, graphical output is sent to the screen. As R is cross-platform, the graphics device for producing "screen" graphics differs by system. The available fonts may also differ by system and graphical device.
For Windows, the default graphics device for outputting to screen is windows(). To see which fonts are available to the windows graphics device, type windowsFonts() into the R console (this code will only work on Windows systems).
For macOS, the default graphics device for outputting to screen is quartz(). This device can also be used to create a number of graphics files on macOS systems. To see the fonts available to the quartz graphics device, type quartzFonts() into the R console.
For Linux and UNIX systems, the default graphics device is X11(), which outputs to screen. The X11() function is a wrapper for two devices: "Xlib" and "Cairo" graphics. Most pre-compiled R builds for Linux will default to Cairo graphics. To see the fonts available to the X11(type="Xlib") graphics device, type X11Fonts() into the R console.
As well as sending graphical output to the screen, R can write it directly to a graphics file using R code. You can also write files using the "Cairo" graphics library for R, which is based on the device-independent, open-source Cairo graphics library, but let's not worry about that for now!
Since sending graphical output to the screen is the default, you can get plotting without worrying about what graphical device your system is using.
If you are using the default R environment, when you generate a plot, it will appear on screen in a new window (the graphics device). If you were to generate a second plot, it would replace the previous plot with the new one in the same window. In RStudio, plots appear in a separate plot viewer, and you can scroll between different plots.
You can have multiple graphics devices on the screen at the same time (graphics windows).
To create a new screen device for the new plot, simply run the relevant device code before your plot code. For example:
# Windows systems:
# Open new screen device
windows()
### Plot code here ###
# MacOS:
# Open new screen device
quartz()
### Plot code here ###
# Linux/UNIX systems:
# Open new screen device
x11()
### Plot code here ###
This works in the default R environment, and also RStudio where a new window will open showing the plot (rather than sending it to the plot viewer).
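If you are unsure which screen device your system uses, one platform-independent option is dev.new(), which opens whatever device R is configured to use by default. The sketch below is an illustration rather than part of the original examples. Note that in a non-interactive session (e.g. when running a script with Rscript), the default device is usually a PDF file rather than a window.

```r
# Count the graphics devices that are open before we start
n_before <- length(dev.list())

# Open a new device using the system default
# (windows, quartz or X11 in an interactive session)
dev.new()

# Plot code goes here; this plot is drawn on the new device
plot(1:10)

# List all open devices, and show which one is currently active
dev.list()
dev.cur()
n_after <- length(dev.list())

# Close the active device when you have finished with it
dev.off()
```

dev.list() and dev.cur() are handy when several graphics windows are open and you have lost track of which one the next plot will be sent to.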
If you were to write a plot directly to a file (e.g. using the pdf() device), it would not appear on screen.
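As a minimal sketch of writing a plot to a file, the code below opens the pdf() device, draws a plot, and then closes the device with dev.off(). The temporary file name and the plot itself are placeholders for illustration, so substitute your own file path and plot code.

```r
# Write the plot to a temporary PDF file (use your own file name in practice)
plot_file <- tempfile(fileext = ".pdf")

# Open the pdf device; width and height are given in inches
pdf(plot_file, width = 7, height = 5)

# This plot is written to the file, and is not shown on screen
plot(1:10, main = "Saved to a file")

# Close the device to finish writing the file
dev.off()

# Check that the file was created
file.exists(plot_file)
```

The file is only complete once dev.off() has been called, so forgetting that step is a common cause of empty or unreadable output files.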
Plotting with base graphics: plot()
When using base graphics, or any default function to plot data, the main function you'll use for creating plots is plot(). This is a generic function, which means it will use a different method for plotting data depending on the type or class of data (or object) you want to plot.
If you type methods(plot) into the R console, you'll get a list of the different plot methods that are available on your system (these may differ depending on what packages you have installed).
The methods may work differently and may have different arguments available. For example, check the help pages for ?plot and ?plot.default (type these in the R console) to see some of the differences.
For general usage, you don't really need to worry about this since plot() will call the right method automatically. However, it's worth bearing in mind in case you run into any problems - check the object class, and check the help page for the plot method for that class.
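To see method dispatch in action, here is a short sketch that passes three objects of different classes to plot(). The data and object names are made up for illustration; the point is that the same function call produces a different kind of plot for each class.

```r
# Create some example data
set.seed(5431)
x <- rnorm(30)

# A numeric vector is plotted by the default method (plot.default)
class(x)
plot(x)

# A kernel density estimate has class "density", so plot()
# dispatches to the method for that class instead
d <- density(x)
class(d)
plot(d)

# A factor is drawn as a bar plot of the counts in each level
f <- factor(c("a", "b", "b", "c", "c", "c"))
class(f)
plot(f)
```

If a plot does not look the way you expect, running class() on the object is a quick way to find out which method (and which help page) applies.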
When using plot(), the output is sent to the graphics device. This creates a "plot region" on the graphics device. The visible area of the graphics device is also known as the "device region". The plot region, on the other hand, is the area of the plot (usually bounded by a box), and does not include the plot axes, labels, or margins.
The plot region, plus the axes, labels and margins is known as the "figure region". Often, the device region and figure region can be the same size - but they are not the same thing.
Almost all kinds of plots, charts and graphs can be produced using base graphics, and these plots can be fully customised using par(). Check out the help page (?par) for details of the customisations.
The distinction between these regions may seem confusing at first, and generally it is not something you need to worry about, unless you are creating multi-panel plots, or heavily customising the plot. The diagram below illustrates these regions.
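The sizes of these regions can be queried with par(). The sketch below (an illustration with made-up data, not a definitive recipe) draws two plots side by side, so that each figure region takes up only part of the device region, and then reads back the size of each region in inches.

```r
# Example data
set.seed(5431)
x <- rnorm(30)
y <- rnorm(30)

# Split the device into 1 row and 2 columns, and set the margins
# (bottom, left, top, right) in lines of text; keep the old settings
old_par <- par(mfrow = c(1, 2), mar = c(4, 4, 2, 1))

plot(x, y)
plot(y, x)

# Width and height, in inches, of each region
device_size <- par("din")  # device region
figure_size <- par("fin")  # figure region of the current plot
plot_size   <- par("pin")  # plot region of the current plot

# Restore the previous settings
par(old_par)
```

With two panels, the figure width is smaller than the device width, and the plot region is smaller again because the margins, axes and labels sit outside it.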
Plotting data in R
Now that you're more familiar with graphics in R, let's create a simple plot to show how easy it is in R. For these examples, we'll create some random data to plot.
# Set seed (to get the same random values)
set.seed(5431)
# Create random data
x <- rnorm(30) # 30 random numbers with a normal distribution
y <- rnorm(30)
In this code, set.seed() is used so that the same "random" numbers can be created each time. rnorm() creates a set of random numbers with a normal distribution. x and y are the vectors containing the data.
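To illustrate what set.seed() does, the short sketch below draws random numbers twice from the same seed, and then once more without resetting it. The object names are made up for this example.

```r
# Draw 5 random numbers after setting the seed
set.seed(5431)
first_draw <- rnorm(5)

# Resetting the same seed reproduces exactly the same numbers
set.seed(5431)
second_draw <- rnorm(5)
identical(first_draw, second_draw)  # TRUE

# Without resetting the seed, the next draw gives different numbers
third_draw <- rnorm(5)
identical(first_draw, third_draw)
```

This is why the plots in this guide can be reproduced exactly, as long as the seed is set before the data are created.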
To plot these two vectors, use the following code:
plot(x, y)
Which results in a simple scatter plot:
If the data was contained within a matrix or data frame, you could use the following code to plot:
# Create matrix
m <- matrix(c(x, y), ncol=2)
# Plot matrix
plot(m)
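However, if the matrix or data frame contained more than two columns of data, you would also need to specify which columns to plot (a data frame with more than two columns is otherwise drawn as a grid of pairwise scatter plots). Columns can be selected by name using $, but this only works for data frames, not matrices. For example:
# Convert the matrix to a data frame with named columns
d <- data.frame(x = m[, 1], y = m[, 2])
# Specify data to be plotted
plot(d$x, d$y)
This code will plot the data from the column named "x" from the "d" data frame, against the data from the column named "y" from the "d" data frame.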
This code is the equivalent of using:
# Specify data to be plotted
plot(m[,1], m[,2])
Which tells R to plot the first column of data (x axis) against the second column of data (y axis).
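As a further sketch of selecting columns, the example below uses a data frame with three columns. The third column "z" and the object names are made up for illustration.

```r
# Example data frame with three columns
set.seed(5431)
d <- data.frame(x = rnorm(30), y = rnorm(30), z = rnorm(30))

# Plotting the whole data frame draws every pair of columns
plot(d)

# Select two columns by name using $
plot(d$x, d$z)

# Or use a formula: z (y axis) plotted against x (x axis)
plot(z ~ x, data = d)

# For a matrix, select columns by name or position using [ ]
m <- as.matrix(d)
plot(m[, "x"], m[, "z"])
```

The formula version has the advantage that the axis labels default to the column names rather than to the full expressions such as d$x.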
plot() arguments
Since plot() is a generic function, there are a lot of arguments available to control the plot, and these can vary depending on the type of plot you are creating.
For generic plotting, by default plot() will plot "points" to create a scatter plot (x y plot). You can change the plot type to join the points (line plots), or create bar or stair plots using the type argument.
The type argument accepts the following values:
- "p" for points.
- "l" for lines.
- "b" for both points and lines.
- "c" for the lines and a space (where a point would be).
- "o" for both points and lines, with the lines overplotted on the points.
- "h" for vertical lines.
- "s" for stair steps (horizontal step first).
- "S" for other steps (vertical step first).
- "n" for no plot. Using type="n" sets up the axes but draws no data.
Examples of each plot are below.
These are just the basic plot types available using plot(), created without any customisations. They are useful for quickly visualising data using minimal code. But you can do so much more with graphics in R!
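One way to compare the plot types listed above is to loop over them and draw each one in its own panel. This is only a sketch: the data are sorted by x first (an assumption made here so that the lines and steps run from left to right), and the 3 x 3 layout and margins are arbitrary choices.

```r
# Example data
set.seed(5431)
x <- rnorm(30)
y <- rnorm(30)

# Sort the data by x so that lines and steps run from left to right
ord <- order(x)
x_sorted <- x[ord]
y_sorted <- y[ord]

# The plot types to compare
plot_types <- c("p", "l", "b", "c", "o", "h", "s", "S", "n")

# Arrange the plots in a 3 x 3 grid, keeping the old settings
old_par <- par(mfrow = c(3, 3), mar = c(3, 3, 2, 1))

# Draw one panel per plot type, using the type as the title
for (plot_type in plot_types) {
  plot(x_sorted, y_sorted, type = plot_type,
       main = paste0("type = \"", plot_type, "\""))
}

# Restore the previous settings
par(old_par)
```

The last panel, drawn with type="n", shows only the axes and the box, which is useful when you want to add the data yourself afterwards.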
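Here are a few of the available arguments you are likely to use with plot().
- xlim: the limits of the x axis, set using the c() function, and providing 2 numbers. E.g. xlim=c(0, 10).
- ylim: the limits of the y axis, set using the c() function, and providing 2 numbers. E.g. ylim=c(0, 10).
- col: the colour of the points or lines.
- pch: the plotting symbol used for points.
- cex: the size of the plotting symbols and text, as a multiplier relative to the default.
- lty: the line type.
- lwd: the line width.
- main: the main title of the plot. You can also add a title using the title() function (separately from the plot() code), which gives further options (including subtitles).
- xlab: the label for the x axis.
- ylab: the label for the y axis.
There are many more options available, and further customisations can be done by using par(). Check out the respective help pages for more information.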
Here's an example of a customised plot:
# Basic plot customisations
plot(x, y, type="b", col=rainbow(30), pch=18, cex=3, lty=2, lwd=2, main="My Plot", xlab="X axis", ylab="Y axis")
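As a second sketch, the code below sets the axis limits explicitly and adds the title with title() rather than the main argument. The limits, colour and subtitle text are arbitrary values chosen for illustration.

```r
# Example data
set.seed(5431)
x <- rnorm(30)
y <- rnorm(30)

# Fix both axes to the same range so that the scales are comparable
plot(x, y, xlim = c(-3, 3), ylim = c(-3, 3),
     col = "blue", pch = 16, xlab = "X axis", ylab = "Y axis")

# Add the title and a subtitle separately from the plot() call
title(main = "My Plot", sub = "30 random points")

# par("usr") gives the extremes of the plot region in axis units,
# in the order c(x1, x2, y1, y2)
axis_range <- par("usr")
axis_range
```

By default the plot region extends slightly beyond the limits you request, which is why the values returned by par("usr") are a little wider than c(-3, 3).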
As well as these simple plots, you can create pretty much any type of plot or graphics that you can think of, either using base graphics, ggplot2, built-in functions or packages. This guide barely scratches the surface of what is possible, but it has hopefully given you a good overview of the basics.
Thanks for reading this guide and please leave any comments below.
Further reading
A quick guide to pch symbols - A quick guide to the different pch symbols which are available in R, and how to use them. [R Graphics]
A quick guide to line types (lty) - A quick guide to the different line types available in R, and how to use them. [R Graphics]
Quick guide to annotating plots in R - A quick guide to annotating plots with text, arrows, and shapes. [R Graphics]
Pollen diagrams using rioja - Part 1 of a 3 part guide series where I show you how to plot pollen diagrams using rioja.
Principal components analysis (PCA) in R - A guide showing you how to perform PCA in R, and how to create great looking biplots.