The yarrr package (0.0.8) is (finally!) on CRAN

Great news R pirates! The yarrr package, which contains the pirateplot, has now been updated to version 0.0.8 and is up on CRAN (after hiding in plain sight on GitHub). Let’s install the latest version and go over some of the updates:

[code language=”r”]
install.packages("yarrr") # Install package from CRAN
library("yarrr") # Load the package
yarrr.guide() # Open the package guide
[/code]

The most important function in the yarrr package is pirateplot(). What the heck is a pirateplot? A pirateplot is a modern way of visualising the relationship between a categorical independent variable, and a continuous dependent variable. Unlike traditional plotting methods, like barplots and boxplots, a pirateplot is an RDI plotting trifecta which presents Raw data (all data as points), Descriptive statistics (as a horizontal line at the mean — or any other function you wish), and Inferential statistics (95% Bayesian Highest Density Intervals, and smoothed densities).

pirateplot-elements

For a full guide to the package, check out the package guide on CRAN. For now, here are some examples of pirateplots showing off some of the package updates.

Up to 3 IVs

You can now include up to three independent variables in your pirateplot. The first IV is presented as adjacent beans, the second is presented in different groups of beans in the same plot, and the third IV is shown in separate plots.

Here is a pirateplot of the heights of pirates based on three separate IVs: sex, headband (whether the pirate wears a headband or not), and eyepatch (whether the pirate wears an eye patch or not):

[code language=”r”]
pirateplot(formula = height ~ sex + headband + eyepatch,
point.o = .1,
data = pirates)
[/code]

threeivpp

Here, we can see that male pirates tend to be the tallest, but there doesn’t seem to be a difference between those who wear headbands and those who don’t, or between those who have eye patches and those who don’t.

New color palettes

The updated package has a few fun new color palettes contained in the piratepal() function. The first, called ‘xmen’, is inspired by my 90s Saturday morning cartoon nostalgia.

[code language=”r”]
# Display the xmen palette
piratepal(palette = "xmen",
trans = .1, # Slightly transparent colors
plot.result = TRUE)
[/code]

xmen_display

Here, I’ll use the xmen palette to plot the distribution of the weights of chickens over time (if someone has a more suitable dataset for the xmen palette let me know!):

[code language=”r”]
pirateplot(formula = weight ~ Time,
data = ChickWeight,
main = "Weights of chickens by Time",
pal = "xmen",
gl.col = "gray")

mtext(text = "Using the xmen palette!",
side = 3,
font = 3)

mtext(text = "*The mean and variance of chicken\nweights tend to increase over time.",
side = 1,
adj = 1,
line = 3.5,
font = 3,
cex = .7)
[/code]

xmen_chikens

The second palette, called “pony”, is inspired by the Bronys in our IT department.

[code language=”r”]
# Display the pony palette
piratepal(palette = "pony",
trans = .1, # Slightly transparent colors
plot.result = TRUE)
[/code]

pony_image

Here, I’ll plot the distribution of the lengths of movies as a function of their MPAA ratings (where G is suitable for children, and R is suitable for adults) using the pony palette:

[code language=”r”]
pirateplot(formula= time ~ rating,
data = subset(movies, time > 0 & rating %in% c("G", "PG", "PG-13", "R")),
pal = "pony",
point.o = .05,
bean.o = 1,
main = "Movie times by rating",
bean.lwd = 2,
gl.col = "gray")

mtext(text = "Using the pony palette!",
side = 3,
font = 3)

mtext(text = "*Movies rated for children\n(G and PG) tend to be longer \nthan those rated for adults",
side = 1,
adj = 1,
font = 3,
line = 3.5,
cex = .7)
[/code]

pony_times

I have to be honest, the pony palette colors are not terribly well suited for this pirateplot — but I think they look better in a basic scatterplot. Because the piratepal function returns a vector of colors (when plot.result = F), you can also use it in other plots. Here, I’ll use the pony palette in a scatterplot:

[code language=”r”]
set.seed(100) # for replicability
x <- rnorm(100, mean = 10, sd = 1)
y <- x + rnorm(100, mean = 0, sd = 1)
point.sizes <- runif(100, min = .2, max = 2) # Just for fun

plot(x, y,
main = "Scatterplot with the pony palette",
pch = 21,
bg = piratepal("pony", trans = .1),
col = "white",
bty = "n",
cex = point.sizes)

grid() # Add gridlines
[/code]

ponyscatter

To see all of the palettes (including those inspired by movies and a transit map of Basel), just run the function with “all” as the main argument:

[code language=”r”]
piratepal(palette = "all")
[/code]

Of course, if you find that these color palettes give you a headache, you can always set a pirateplot to grayscale (or any other color), by specifying a single color in the palette argument. Here, I’ll create a grayscale pirateplot showing the distribution of movie budgets by their creative type:

[code language=”r”]
pirateplot(formula = budget ~ creative.type,
data = subset(movies, budget > 0 &
creative.type %in% c("Multiple Creative Types", "Factual") == FALSE),
point.o = .02,
xlab = "Movie Creative Type",
main = "Movie budgets (in millions) by creative type",
gl.col = "gray",
pal = "black")

mtext("Using a grayscale pirateplot",
side = 3,
font = 3)

mtext("*Superhero movies tend to have the highest budgets\n…by far!",
side = 1, adj = 1, line = 3,
cex = .8, font = 3)
[/code]

moviebudgetpp

Looks like super hero movies have the highest budgets…by far!

And again, to get more tips on how to customise your palettes and pirateplots, check out the main package guide at https://cran.r-project.org/web/packages/yarrr/vignettes/guide.html, or open it by running the following code:

[code language=”r”]
yarrr.guide() # Open the yarrr package guide
[/code]

Acknowledgements and Comments

– The pirateplot is largely inspired by the great beanplot package by Peter Kampstra.
– Bayesian 95% HDIs are calculated using the truly amazing BayesFactor package by Richard Morey [Note: a previous version of this post incorrectly called Richard “Brian” — I blame lack of caffeine].
– The latest developer version of yarrr is always available at https://github.com/ndphillips/yarrr. Please post any bugs, issues, or feature requests at https://github.com/ndphillips/yarrr/issues

The Pirate Plot (2.0) – The RDI plotting choice of R pirates


pirateplotex3
Note: this post refers to an outdated version of the pirateplot() function. For the latest description of the function, check out http://rpubs.com/yarrr/pirateplot

Barplots suck

Plain vanilla barplots are as uninformative (and ugly) as they are popular. And boy, are they popular. From the floors of congress to our latest scientific articles, barplots surround us. Barplots are popular because they are simple and easy to understand. However, that simplicity carries costs: barplots can mask important patterns in data, like multiple modes and skewness.

Instead of barplots, we should be using RDI plots, where RDI stands for Raw (data), Description and Inference. Specifically, an RDI plot should present complete raw data (including smoothed densities), descriptive statistics (like means and medians), and inferential statistics (like a Bayesian 95% Highest Density Interval, or HDI). The R community already has access to many great examples of plots that come close to the RDI trifecta. For example, beanplots, created by the beanplot() function, show complete raw data and smoothed distributions (Kampstra, 2008).
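
To make the three RDI ingredients a bit more concrete, here is a minimal base R sketch that draws them by hand for the built-in ChickWeight data. This is not how pirateplot() is implemented: the helper name rdi.sketch is made up for this illustration, the smoothed densities are left out, and a classical t-based 95% confidence interval stands in for the Bayesian HDI.

[code language=”r”]
# A rough RDI sketch in base R: Raw data, Description, Inference.
# NOTE: rdi.sketch is a hypothetical helper, and the interval is a
# t-based 95% confidence interval, NOT a Bayesian HDI.
rdi.sketch <- function(dv, iv) {
  iv <- factor(iv)
  groups <- split(dv, iv)
  x <- seq_along(groups)

  # Description: the mean of each group
  means <- sapply(groups, mean)

  # Inference: half-width of a t-based 95% interval around each mean
  half.width <- sapply(groups, function(g) {
    qt(.975, df = length(g) - 1) * sd(g) / sqrt(length(g))
  })

  # Raw data: one jittered, semi-transparent point per observation
  plot(jitter(as.numeric(iv), amount = .1), dv,
       pch = 16, col = gray(.2, alpha = .2),
       xaxt = "n", xlab = "", ylab = "", bty = "n")
  axis(1, at = x, labels = names(groups))

  # Add the interval as a shaded band and the mean as a thick line
  rect(x - .3, means - half.width, x + .3, means + half.width,
       col = gray(.5, alpha = .3), border = NA)
  segments(x - .3, means, x + .3, means, lwd = 3)

  # Return the plotted summary values invisibly
  invisible(data.frame(group = names(groups),
                       mean = means,
                       lower = means - half.width,
                       upper = means + half.width))
}

result <- rdi.sketch(dv = ChickWeight$weight, iv = ChickWeight$Diet)
[/code]

Even this bare-bones version shows more than a barplot would: you can see every chicken, where the group averages sit, and how uncertain those averages are.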

Today, the R community has access to a new RDI plot: the pirate plot. I discovered the original code underlying the pirate plot during a late night swim on the Bodensee in Konstanz, Germany. The pirate plot function was written in an archaic German pirate dialect on an old beer bottle and is unfortunately unusable. However, I have taken the time to painstakingly translate the original pirate code into a new R function called pirateplot(). The latest version (now 2.0) of the translation is stored in the yarrr package on GitHub (www.github.com/ndphillips/yarrr). To access the pirateplot() function within R, first install and load the yarrr package:

[code language=”r”]

install.packages("yarrr")
library("yarrr")

[/code]

 

Now you’re ready to make some pirate plots! Let’s create a pirate plot from the pirates dataset in the yarrr package. This dataset contains results from a survey of several pirates at the Bodensee in Konstanz. We’ll create a pirateplot showing the distribution of ages of pirates based on their favorite pirate:

[code language=”r”]

pirateplot(formula = age ~ favorite.pirate,
data = pirates,
xlab = "Favorite Pirate",
ylab = "Age",
main = "My First Pirate Plot!")

[/code]

pp1

The arguments for pirateplot() are very similar to those of other plotting functions like barplot() and beanplot(). The key arguments are formula, where you specify one (or two) categorical variable(s) for the x-axis and a numerical variable for the y-axis, and data, the dataframe containing those variables.

In addition to the data arguments, there are arguments that dictate the opacity of the 5 key elements of a pirate plot: bar.o, the opacity of the bars; bean.o, the opacity of the beans; point.o, the opacity of the points; line.o, the opacity of the average lines at the top of the bars; and hdi.o, the opacity of the 95% Bayesian Highest Density Interval (HDI). The HDIs are calculated using the BEST package (Kruschke, 2013). Because calculating HDIs can be time-consuming, they are turned off by default (i.e., hdi.o = 0). In the next plots, I’ll turn them on so you can see them.
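
If opacity values are new to you, here is a rough base R illustration of the general idea, using adjustcolor() from grDevices. I’m assuming here that the .o arguments follow the usual convention (0 is invisible, 1 is fully opaque); pirateplot() may well build its colors differently under the hood.

[code language=”r”]
# Opacity illustrated with base R (no yarrr needed).
# Assumption: an opacity of 0 is invisible and 1 is fully opaque.
opacities <- c(0, .1, .5, 1)

# adjustcolor() rescales the alpha channel of a color
cols <- sapply(opacities, function(o) adjustcolor("black", alpha.f = o))

plot(1, xlim = c(.5, 4.5), ylim = c(0, 1), type = "n",
     xaxt = "n", yaxt = "n", xlab = "Opacity", ylab = "", bty = "n")
axis(1, at = 1:4, labels = opacities)

set.seed(1) # for replicability

# Draw one cloud of 50 points per opacity level
for (i in 1:4) {
  points(rep(i, 50) + rnorm(50, 0, .1), runif(50),
         pch = 16, cex = 2, col = cols[i])
}
[/code]

Low opacities are handy when points overlap a lot, because dense regions end up darker than sparse ones.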

The pirateplot() function has built-in color arguments. You can control the overall color palette of the plot with pal, and the color of the plot background with back.col. Let’s change a few of these arguments. I’ll also include the 95% Highest Density Intervals (HDIs) by setting hdi.o = .7.

[code language=”r”]

pirateplot(formula = age ~ favorite.pirate,
data = pirates,
xlab = "Favorite Pirate",
ylab = "Age",
main = "Black and White Pirate Plot",
pal = "black",
hdi.o = .7,
line.o = 1,
bar.o = .1,
bean.o = .1,
point.o = .1,
point.pch = 16,
back.col = gray(.97))
[/code]

pp2

As you can see, the entire plot is now grayscale, and different elements of the plot have been emphasised by changing the opacity arguments. For example, now that we’ve set the opacity of the HDI to .7 (the default is 0), we can see the Bayesian 95% Highest Density Interval for the mean of each group.

Hopefully it’s clear how much better RDI plots are than standard bar plots. Now, in addition to just seeing one piece of information (the mean) of each group, we can see all the raw data, a smoothed density curve of the data (helpful for detecting multiple modes and skewness), as well as Bayesian inference.

Oh, and just for comparison purposes, we can create a standard barplot within the pirateplot() function by adjusting the opacity arguments:

[code language=”r”]

pirateplot(formula = age ~ favorite.pirate,
data = pirates,
xlab = "Favorite Pirate",
ylab = "Age",
main = "Black and White Pirate Plot",
pal = "black",
hdi.o = 0,
line.o = 0,
bar.o = 1,
bean.o = 0,
point.o = 0)
[/code]

sad

 

Now how awful does that barplot look in comparison to the far superior pirate plot?!

You can also include multiple independent variables as arguments to the pirateplot() function. For example, I can plot the pirates’ beard lengths separated by sex and the college the pirate went to. For this plot, I’ll use the southpark palette and emphasize the HDI by turning its opacity up to .6:

[code language=”r”]

pirateplot(formula = beard.length ~ sex + college,
data = pirates,
main = "Beard lengths",
pal = "southpark",
xlab = "",
ylab = "Beard Length",
point.pch = 16,
point.o = .2,
hdi.o = .6,
bar.o = .1,
line.o = .5)
[/code]

pp4

As you can see, it’s very easy to customise the look and focus of your pirate plot. Here are 6 different plots of the weights of chickens given one of 4 diets (from the ChickWeight dataframe in R). You can see the code for each by accessing the help menu for the pirateplot() function within R.

ppmatrix

 

Have fun creating your own pirate plots! If you have suggestions for further improvements, don’t hesitate to write my squire at yarrr.book@gmail.com.

References

Kampstra, P. (2008) Beanplot: A Boxplot Alternative for Visual Comparison of Distributions. Journal of Statistical Software, Code Snippets, 28(1), 1-9. URL http://www.jstatsoft.org/v28/c01/

Kruschke, J. K. 2013. Bayesian estimation supersedes the t test. Journal of Experimental Psychology: General 142(2):573-603. doi: 10.1037/a0029146

piratepal – An R pirate’s source of palettes inspired by everything from the Mona Lisa to the Evil Dead

When most people think about R, they think of statistics. They don’t think of horror films. I’m here to change that.

Let’s back up. While R is certainly one of the top two (I’m looking over my back at you Python) languages for statistical analysis, it’s just as good for creating beautiful and informative plots – but only if you have the right colors. No matter what kind of plot you create, if you use colors that suck, your plot will look like shit. To pick colors that go well together, you need to use a color palette.

How can you select a color palette in R? Up until now I’ve been using the great RColorBrewer package. However, I’ve lost my enthusiasm for the palettes in the package – not because they’re bad, but because they didn’t mean anything to me. To fix this, I collected several palettes, either created by graphic artists or inspired by movies, works of art, and my own aimless Googling. They are all stored in a function called piratepal() contained in the yarrr package.

To use piratepal(), you first need to download and install the yarrr package (if you don’t have the devtools package installed, you’ll need to install that first):

[code language=”r”]
install.packages("devtools")
library("devtools")
install_github("ndphillips/yarrr")
library("yarrr")
[/code]

Once you’ve installed the yarrr package, you can learn about the piratepal() function using the help menu:

[code language=”r”]
?piratepal
[/code]

Here, you’ll see that piratepal() has three arguments:

  • palette: A string defining the color palette to use (see examples). To use a random palette, use “random”. To view all palettes, use “all” combined with action = “show”.
  • action: Either “return” to return a vector of colors, or “show” to show the palette. You can also use “r” or “s” for shorthand.
  • trans: A number in the interval [0, 1] indicating how transparent to make the colors. A value of 0 means no transparency and a value of 1 means complete transparency. Defaults to 0.
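
Note that trans runs in the opposite direction from an opacity value. Here is a small base R sketch of that relationship. The helper add.trans is made up for this illustration (it is not part of yarrr), and it assumes that the alpha channel is simply 1 minus trans, which matches the description above but is not taken from the package code.

[code language=”r”]
# Hypothetical helper (not part of yarrr): apply a 'trans' value to colors.
# Assumption: trans = 0 means fully opaque and trans = 1 means fully
# transparent, so the alpha channel is 1 - trans.
add.trans <- function(colors, trans = 0) {
  stopifnot(trans >= 0, trans <= 1)
  adjustcolor(colors, alpha.f = 1 - trans)
}

my.colors <- c("steelblue", "goldenrod", "gray50")

opaque <- add.trans(my.colors, trans = 0)
faded <- add.trans(my.colors, trans = .5)

# The "alpha" row of col2rgb(alpha = TRUE) is the alpha channel (0 to 255)
col2rgb(opaque, alpha = TRUE)["alpha", ]
col2rgb(faded, alpha = TRUE)["alpha", ]
[/code]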

Let’s start by seeing all the different palettes in the package. To do this, set palette = “all”, and action = “show”:

[code language=”r”]
piratepal(palette = "all", action = "show")
[/code]

piratepalettes

Here, you can see a brief overview of all the palettes in the package. The names of the palettes are on the left, and the colors in each palette are displayed to the right of the names. If you want to see the colors in a specific palette in more detail, you can see them (combined with an inspirational image) by using that palette’s name together with action = “show”.

For example, here is a palette called “scholar” that I got from this blog on the Shutterstock website:

scholar

Now if you want to use the colors in a palette, just use the action = “return” argument to return a vector of the palette colors:

[code language=”r”]
piratepal(palette = "scholar", action = "return")
[/code]

This code will return the following vector of the colors in the palette:

blue1 yellow blue2 gray
#2B3E56FF #CDA725FF #C5CBD3FF #F1F1E7FF

Once you’ve assigned this vector of colors to an object (like my.colors), you can use them in any plot that you’d like!
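
For example, here is a minimal sketch that uses the four “scholar” colors printed above in a plain base R barplot. I’ve typed the hex codes in by hand so the code runs even without yarrr installed, and the treasure data is made up purely for illustration.

[code language=”r”]
# The "scholar" colors printed above, typed in by hand so that this
# example runs without the yarrr package
my.colors <- c(blue1 = "#2B3E56FF", yellow = "#CDA725FF",
               blue2 = "#C5CBD3FF", gray = "#F1F1E7FF")

# Made-up data, purely for illustration
treasure <- c(North = 12, South = 7, East = 9, West = 4)

barplot(treasure,
        col = my.colors,
        border = NA,
        main = "Barplot with the scholar colors",
        ylab = "Chests of treasure (made up)")
[/code]

With yarrr loaded, you could replace the hand-typed vector with the result of the piratepal() call shown above.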

You’ll notice that while some of these palettes have vague, generic names (like “scholar”), others are suspiciously recognisable. For example, what is this “monalisa” palette? Well, I discovered this blog post from a site called artvarsity that contains color palettes inspired by classical art. So naturally, I stole some and put them into the package. For example, let’s look at the “monalisa” palette:

[code language=”r”]
piratepal(palette = "monalisa", action = "show")
[/code]

monalisa

Oh yeah, I’ve also got several palettes inspired by movies. Oh yes, palettes from movies. I discovered this badass site called Movies in Color which provides color palettes for tons of movies. Don’t get me wrong, these palettes aren’t necessarily the most beautiful in the world – but who doesn’t want to use a palette from a horror movie?!

Here’s a palette from “Eternal Sunshine of the Spotless Mind” – the film shown at every psychology undergraduate social event

[code language=”r”]
piratepal(palette = "eternal", action = "show")
[/code]

eternal

Since I love Pixar movies, and found this great website showing palettes for all the Pixar films, I added lots of palettes from Pixar movies. Let’s take a look at the “up” palette:

[code language=”r”]
piratepal(palette = "up", action = "show")
[/code]

up

Just for fun, let’s draw the house from the movie “Up” using the “google” palette (using my handy digital color meter, I also stole the colors from the Google logo).

[code language=”r”]
google.colors <- piratepal("google", trans = .1)
n.balloons <- 500
x <- rnorm(n.balloons, 0)
y <- rnorm(n.balloons, 2, 1)
plot(1, xlim = c(-7, 7), ylim = c(-7, 7),
xlab = "", ylab = "", type = "n", xaxt = "n", yaxt = "n", bty = "n")

rect(-2, -6, 2, -2)
polygon(c(-2, 0, 2),
c(-2, 0, -2)
)
rect(-7, -7, -2, 100)
rect(2, -7, 7, 100)
rect(-.5, -6, .5, -4)
points(.3, -5)

line.start.x <- rnorm(n.balloons, 0, .4)
line.start.y <- -1 + rnorm(n.balloons, 0, .1)

# Draw a string from the roof to each balloon, then the balloons themselves
segments(line.start.x, line.start.y, x, y, lty = 1, col = gray(.7), lwd = .2)
points(x, y, pch = 21, bg = sample(google.colors, n.balloons, replace = TRUE),
col = gray(.9), cex = rnorm(n.balloons, 2, .3))
[/code]

Here’s the result! Oh and did you know that the Up house really exists in Seattle?!?! Here’s the real one!:

uphouse

Finally, for all you horror fans out there: here’s the Evil Dead palette. I’ll warn you, it’s pretty bleak – but honestly what did you expect?

[code language=”r”]
piratepal(palette = "evildead", action = "show")
[/code]

evildead

Let’s take advantage of this friendly, colorful palette. Here are the domestic revenues of the 4 Evil Dead films (from the-numbers.com):

[code language=”r”]

evil.colors <- piratepal(
palette = "evildead", action = "return",
trans = .2)

revenues <- c(2400000, 5923044, 11502976, 54239856)

plot(1, xlim = c(.5, 4.5), ylim = c(0, 59000000), type = "n",
xlab = "", ylab = "", xaxt = "n", bty = "n",
yaxt = "n", main = "Evil Dead Movie Domestic Revenues")

mtext("Using the evildead palette in the yarrr package!", side = 3,
cex = .8, line = .5)

segments(1:4, rep(0, 4), 1:4, revenues, lty = 2)

abline(h = seq(0, 55000000, 10000000), col = gray(.9))

adj <- 2000000
text(1:4, revenues + adj,
labels = c("$2.4 Mil", "$5.9 Mil", "$11.5 Mil", "$54 Mil"), pos = 3)

points(1:4, revenues,
col = evil.colors, cex = 4, pch = 16)

mtext(c("Evil Dead\n(1983)", "Evil Dead 2\n(1987)", "Army of Darkness\n(1993)",
"Evil Dead\n(2013)"), at = 1:4, side = 1, line = 1)
[/code]

evildeadrevenue

Here are a few of the other movie-based palettes:

[code language=”r”]
piratepal(palette = "ghostbusters", action = "show")
[/code]

Dr Ray Stantz: Everything was fine with our system until the power grid was shut off by dickless here.
Walter Peck: They caused an explosion!
Mayor: Is this true?
Dr. Peter Venkman: Yes it’s true.
[pause]
Dr. Peter Venkman: This man has no dick.

ghostbusters

[code language=”r”]
piratepal(palette = "rat", action = "show")
[/code]
Remy: [as Emile tastes a piece of cheese] Creamy, salty-sweet, an oaky nuttiness… You detect that?
Emile: Oh, I’m detecting nuttiness…

rat

I hope these art / movie / random googling inspired palettes inspire people to make plots with better colors. Oh and if you have a favorite movie that you’d like me to add to the function in the form of a palette, let me know. I can always use a good movie recommendation. Bonus points if it’s available on German Netflix.

NDP Consulting

NDPConsulting

As of May 17 2015, I am proud to announce NDP Consulting! NDP Consulting is my new umbrella for providing consulting services for both academic and non-academic problems. While I have provided consulting services to many people in the past, this will be my first time officially advertising my services (and hey, it was fun creating my new logo – in R of course).

If you have a data analysis, forecasting, experimental design, or modelling problem that you need help with, please don’t hesitate to contact me with a brief outline of the project and I will let you know what I can do to help.

My R course…now on YouTube!

Well, after talking about it for about a year now, I’ve finally started converting my R course to a YouTube friendly format. Since I’m in the middle of my university R course, the videos don’t quite start at the beginning. I’ll add the beginning 3-5 lectures after the semester ends. In the meantime, here’s the permanent link: http://goo.gl/a6L0up