This page is now outdated! Please go to

My new page is at, or the easier-to-remember link! From now on, all my future posts will be there.

From WordPress to GitHub Pages

Over the past two years, I’ve really enjoyed creating and maintaining this website with WordPress to share my R-related musings — from my e-book YaRrr! The Pirate’s Guide to R, to my R packages like yarrr and FFTrees.

However, over the past few months, I’ve realised that my content creation and sharing process has been flawed. Until now, I’ve been writing my posts in WordPress. To share a new post, I’ve had to write the post, including all code and output, in R Markdown, and then find a way to transfer the content to the WordPress editor. This often meant having to save and upload separate plotting images, hack HTML (which I don’t really know) to make sure the code is nicely formatted, all while trying to navigate the at times overwhelming assault of menus and customisations on the WordPress admin page. In short — it has worked, but it has been a bit of a pain.

Last week, my colleague Dirk Wulff introduced me to a much better solution to website management: R Markdown combined with GitHub pages. This solution allows you to do all webpage management within RStudio, while providing a much cleaner user experience. Thanks to the great tutorials provided by RStudio at and Florian Prive at, I was able to create my elegant new webpage in an afternoon.

Discovering this new process of website management with GitHub pages has been like getting my first iPhone, discovering R, and getting my first stick of deodorant: I didn’t know I needed it until I got it. And when I did, I couldn’t imagine how I lived without it.

I love the open-source community

I can only begin to express how much I appreciate the open-source programming community. With free online tutorials, free hosting provided by GitHub, the open-source programming language R, the simple Markdown language, and the programming environment RStudio, I was able to create something that would have taken me months, if not years, to build on my own from scratch.

I am really looking forward to using this new platform to share my thoughts and pass what I’ve learned on to someone else. It’s going to be fun 🙂

The yarrr package (0.0.8) is (finally!) on CRAN

Great news R pirates! The yarrr package, which contains the pirateplot, has now been updated to version 0.0.8 and is up on CRAN (after hiding in plain sight on GitHub). Let’s install the latest version and go over some of the updates:

[code language=”r”]
install.packages("yarrr") # Install the package from CRAN
library("yarrr")          # Load the package
# Open the package guide

The most important function in the yarrr package is pirateplot(). What the heck is a pirateplot? A pirateplot is a modern way of visualising the relationship between a categorical independent variable, and a continuous dependent variable. Unlike traditional plotting methods, like barplots and boxplots, a pirateplot is an RDI plotting trifecta which presents Raw data (all data as points), Descriptive statistics (as a horizontal line at the mean — or any other function you wish), and Inferential statistics (95% Bayesian Highest Density Intervals, and smoothed densities).
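To make this concrete, here is a minimal call with all default settings, using the pirates dataset that ships with the package (the same dataset used in the examples below):

```r
library("yarrr")

# A minimal pirateplot: raw points, means, HDIs and smoothed
# densities are all shown by default
pirateplot(formula = height ~ sex, # DV ~ IV
           data = pirates)
```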


For a full guide to the package, check out the package guide at CRAN here. For now, here are some examples of pirateplots showing off some of the package updates.

Up to 3 IVs

You can now include up to three independent variables in your pirateplot. The first IV is presented as adjacent beans, the second is presented in different groups of beans in the same plot, and the third IV is shown in separate plots.

Here is a pirateplot of the heights of pirates based on three separate IVs: headband (whether the pirate wears a headband or not), sex, and eyepatch (whether the pirate wears an eye patch or not):

[code language=”r”]
pirateplot(formula = height ~ sex + headband + eyepatch,
           point.o = .1,
           data = pirates)


Here, we can see that male pirates tend to be the tallest, but there doesn’t seem to be a difference between those who do and don’t wear headbands, or between those who do and don’t have eye patches.

New color palettes

The updated package has a few fun new color palettes contained in the piratepal() function. The first, called ‘xmen’, is inspired by my 90s Saturday morning cartoon nostalgia.

[code language=”r”]
# Display the xmen palette
piratepal(palette = "xmen",
          trans = .1, # Slightly transparent colors
          plot.result = TRUE)


Here, I’ll use the xmen palette to plot the distribution of the weights of chickens over time (if someone has a more suitable dataset for the xmen palette let me know!):

[code language=”r”]
pirateplot(formula = weight ~ Time,
           data = ChickWeight,
           main = "Weights of chickens by Time",
           pal = "xmen",
           gl.col = "gray")

mtext(text = "Using the xmen palette!",
      side = 3,
      font = 3)

mtext(text = "*The mean and variance of chicken\nweights tend to increase over time.",
      side = 1,
      adj = 1,
      line = 3.5,
      font = 3,
      cex = .7)


The second palette, called “pony”, is inspired by the Bronies in our IT department.

[code language=”r”]
# Display the pony palette
piratepal(palette = "pony",
          trans = .1, # Slightly transparent colors
          plot.result = TRUE)


Here, I’ll plot the distribution of the lengths of movies as a function of their MPAA ratings (where G is suitable for children and R is suitable for adults) using the pony palette:

[code language=”r”]
pirateplot(formula = time ~ rating,
           data = subset(movies, time > 0 & rating %in% c("G", "PG", "PG-13", "R")),
           pal = "pony",
           point.o = .05,
           bean.o = 1,
           main = "Movie times by rating",
           bean.lwd = 2,
           gl.col = "gray")

mtext(text = "Using the pony palette!",
      side = 3,
      font = 3)

mtext(text = "*Movies rated for children\n(G and PG) tend to be longer \nthan those rated for adults",
      side = 1,
      adj = 1,
      font = 3,
      line = 3.5,
      cex = .7)


I have to be honest, the pony palette colors are not terribly well suited for this pirateplot, but I think they look better in a basic scatterplot. Because the piratepal() function returns a vector of colors (when plot.result = FALSE), you can also use it in other plots. Here, I’ll use the pony palette in a scatterplot:

[code language=”r”]
set.seed(100) # For replicability
x <- rnorm(100, mean = 10, sd = 1)
y <- x + rnorm(100, mean = 0, sd = 1)
point.sizes <- runif(100, min = .2, max = 2) # Just for fun

plot(x, y,
     main = "Scatterplot with the pony palette",
     pch = 21,
     bg = piratepal("pony", trans = .1),
     col = "white",
     bty = "n",
     cex = point.sizes)

grid() # Add gridlines


To see all of the palettes (including those inspired by movies and a transit map of Basel), just run the function with “all” as the palette argument:

[code language=”r”]
piratepal(palette = "all")

Of course, if you find that these color palettes give you a headache, you can always set a pirateplot to grayscale (or any other color) by specifying a single color in the pal argument. Here, I’ll create a grayscale pirateplot showing the distribution of movie budgets by their creative type:

[code language=”r”]
pirateplot(formula = budget ~ creative.type,
           data = subset(movies, budget > 0 &
                           creative.type %in% c("Multiple Creative Types", "Factual") == FALSE),
           point.o = .02,
           xlab = "Movie Creative Type",
           main = "Movie budgets (in millions) by rating",
           gl.col = "gray",
           pal = "black")

mtext("Using a grayscale pirateplot",
      side = 3,
      font = 3)

mtext("*Superhero movies tend to have the highest budgets\n…by far!",
      side = 1, adj = 1, line = 3,
      cex = .8, font = 3)


Looks like superhero movies have the highest budgets…by far!

And again, to get more tips on how to customise your palettes and pirateplots, check out the main package guide at, or run the following code:

[code language=”r”]
# Open the yarrr package guide

Acknowledgements and Comments

– The pirateplot is largely inspired by the great beanplot package by Peter Kampstra.
– Bayesian 95% HDIs are calculated using the truly amazing BayesFactor package by Richard Morey [Note: a previous version of this post incorrectly called Richard “Brian” — I blame lack of caffeine].
– The latest developer version of yarrr is always available at. Please post any bugs, issues, or feature requests at.

Making fast, good decisions with the FFTrees R package


“…[W]e are suspicious of rapid cognition. We live in a world that assumes that the quality of a decision is directly related to the time and effort that went into making it.” ~ Malcolm Gladwell

In his book Blink, Malcolm Gladwell summarises a common misconception about good decision making. According to folk wisdom, the more time, information, and effort you put into a decision, the better it gets. In other words, “More is better.” If you are a doctor making a diagnosis, more medical tests are always better. If you are trying to decide if your new Tinder match is worthy of a date, try to find them on Facebook, Instagram and Snapchat first. If you are deciding how to invest in the stock market, get as much data as you can and build a statistical model so complex that it describes the past perfectly.

However, decades of research in cognitive science and machine learning have shown that the “More is better” theory is, in many real-world decisions, flat wrong. In contrast, there are many cases where, as Dr. Gerd Gigerenzer has put it, “Less is more.” Why? For two key reasons. The first reason is statistical: Complex decision models with many parameters can lead to overfitting. In essence, overfitting occurs when a model is very good at describing one specific past dataset, but fails in predicting new, unseen data (Gigerenzer & Brighton, 2009). For example, a complex economic model might describe changes in past stock prices very well, but is largely unable to predict future changes. This is why, as Burton Malkiel shows, it is so hard for complex trading strategies to outperform simple index funds in the stock market (Malkiel, 1999). The second reason is psychological: Even if a complex decision model is good at predicting new data, if a person can’t understand it, or easily implement it in a real-world decision, like a doctor trying to use logistic regression in an emergency room, they won’t use it.
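As a small illustration of the statistical point (a sketch with simulated data, not taken from the references above), here is what overfitting looks like in R: a 16-parameter polynomial regression describes a training sample at least as well as a 2-parameter line, but typically predicts a fresh sample worse:

```r
set.seed(42)

# Simulate a simple linear truth: y = 2x + noise
train <- data.frame(x = runif(50, 0, 10))
train$y <- 2 * train$x + rnorm(50, sd = 3)
test <- data.frame(x = runif(50, 0, 10))
test$y <- 2 * test$x + rnorm(50, sd = 3)

simple.mod  <- lm(y ~ x, data = train)            # 2 parameters
complex.mod <- lm(y ~ poly(x, 15), data = train)  # 16 parameters

rmse <- function(model, data) sqrt(mean((predict(model, data) - data$y)^2))

# The complex model always fits the training data at least as well...
rmse(simple.mod, train); rmse(complex.mod, train)

# ...but it typically predicts the new (test) data worse
rmse(simple.mod, test); rmse(complex.mod, test)
```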

What simple decision rules can people use to make good decisions? One popular class of simple decision rules are Fast and Frugal Trees (FFTs, Gigerenzer & Todd, 1999; Martignon, Katsikopoulos & Woike, 2008). Fast and frugal trees make very fast decisions based on a few (usually 1 to 5) pieces of information and ignore all other information. In other words, Fast and frugal trees are noncompensatory, meaning that once they make a decision based on a few pieces of information, no additional information can ever change the decision. Because they are so simple to use, they have been used in many real-world decision tasks from making coronary artery disease diagnoses (Green & Mehr, 1997), to diagnosing depression (Jenny, Pachur, Williams, Becker & Margraf, 2013). However, lest you think that fast and frugal trees are only useful when time is limited, research has shown that fast and frugal trees can out-predict more complex models in decidedly non-human simulations (Gigerenzer, Czerlinski & Martignon, 1999).
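To make the noncompensatory idea concrete, here is a toy sketch of an FFT's decision logic written as plain R code. The cue names and thresholds are invented for illustration (loosely modeled on the breast cancer tree discussed later in this post); they do not come from any fitted model:

```r
# A hypothetical two-cue fast and frugal tree for a binary diagnosis.
# Cues and thresholds are made up for illustration only.
fft.classify <- function(cue1, cue2) {

  if (cue1 < 3) return("healthy")   # First exit: decide immediately
  if (cue2 <= 2) return("healthy")  # Second exit: consult one more cue

  "cancer"                          # Final exit: no later cue can overturn this
}

fft.classify(cue1 = 2, cue2 = 9) # "healthy" (the second cue is never consulted)
fft.classify(cue1 = 5, cue2 = 4) # "cancer"
```

Once an exit is reached, no amount of information on the remaining cues can change the decision; that is what makes the rule noncompensatory.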

While fast and frugal trees have shown promise, there are currently no off-the-shelf methods to create them. How can you create your own fast and frugal decision trees for your own dataset? Starting today, you can use the FFTrees R package available on CRAN. The main function in the package is FFTrees(), which takes a standard formula and data argument, and returns a fast and frugal tree (FFTrees) object. From this object, you can view its underlying trees, along with many standard classification statistics (e.g.; hit-rate, false alarm rate, AUC) applied to both training and test (i.e.; prediction) datasets. Finally, the function has two alternative classification algorithms, logistic regression and CART, built in, so you can always compare the accuracy of your fast and frugal trees to two gold-standards in the classification literature. If you’re like me, you’ll be amazed at how well simple, transparent fast and frugal trees perform relative to these gold-standards, especially in predicting new data!

The FFTrees package in action

You can install and load the FFTrees package from CRAN:

[code language=”r”]
install.packages("FFTrees") # Install FFTrees from CRAN
library("FFTrees")          # Load the package
Once you’ve installed the package, you can view the overview vignette by running the code. However, for this blog post I’ll show you how to create fast and frugal trees for predicting breast cancer. The data we’ll use comes from the Wisconsin Breast Cancer Database (data source). The data is stored as a dataframe with 699 rows, representing 699 patients, and 10 columns. The 10 columns represent 9 physiological measurements, from cell sizes to cell shapes, and 1 binary variable (diagnosis) indicating whether the woman truly does, or does not, have breast cancer. Here is how the first few rows of the dataframe look:

thickness cellsize.unif cellshape.unif adhesion epithelial nuclei.bare chromatin nucleoli mitoses diagnosis
        5             1              1        1          2           1         3        1       1     FALSE
        5             4              4        5          7          10         3        2       1     FALSE
        3             1              1        1          2           2         3        1       1     FALSE
        6             8              8        1          3           4         3        7       1     FALSE
        4             1              1        3          2           1         3        1       1     FALSE
        8            10             10        8          7          10         9        7       1      TRUE

To create a fast and frugal tree from the dataset, we’ll use the FFTrees() function, entering formula = diagnosis ~ ., meaning that we want to predict diagnosis as a function of (potentially) all other variables, and data = breastcancer. We’ll assign the result to a new object of class FFTrees called breastcancer.fft:

[code language=”r”]
breastcancer.fft <- FFTrees(formula = diagnosis ~ .,
                            data = breastcancer)

Now that we’ve created the object, we can print it to the console to get basic information:

[code language=”r”]
breastcancer.fft # Print the object
# "An FFTrees object containing 6 trees using 4 cues {cellsize.unif,cellshape.unif,nuclei.bare,epithelial} out of an original 9"
# "Data were trained on 683 exemplars. There were no test data"
# "FFT AUC: (Train = 0.98, Test = NA)"
# "My favorite tree is #3 [Training: HR = 0.93, FAR = 0.05], [Testing: HR = NA, FAR = NA]"

The printout tells us that the final FFTrees object contains 6 different trees, and the largest tree only uses 4 of the original 9 cues. To see the best tree, we can simply plot the FFTrees object:

[code language=”r”]
plot(breastcancer.fft,
     main = "Breastcancer FFT",
     decision.names = c("Healthy", "Cancer"))




There’s one of our fast and frugal trees! In the top section of the plot, we see that the data had 444 true healthy cases and 239 true cancer cases. In the middle section, we see the tree itself. The tree starts by looking at the cue cellsize.unif. If the value is less than 3, the tree decides that the person is healthy. If the value is not less than 3, the tree looks at cellshape.unif. If cellshape.unif <= 2, the tree decides the patient is healthy. If cellshape.unif > 2, the tree decides that the person does have cancer. That’s the whole decision algorithm! Now isn’t that a lot easier to interpret than something like logistic regression? Imagine giving a full logistic regression equation to anyone without a degree in statistics.


Ok, so our fast and frugal tree is easy to understand and use, but how well does it perform? The bottom section of the above plot shows a series of performance statistics. In the bottom left-hand corner, we can see a classification table, which shows how the tree’s decisions compare to the truth. Entries on the main diagonal (Cor Rej and Hit) correspond to correct decisions, while the other entries correspond to incorrect decisions. As you can see, the tree performed exceptionally well: it made correct diagnoses in 646 (424 + 222) out of all 683 cases (95% correct). Additional performance statistics, including specificity (1 – false alarm rate), hit rate, d-prime, and AUC (area under the curve), are also displayed. Finally, in the bottom right plot, you can see an ROC curve which compares the performance of the trees to CART (in red) and logistic regression (in blue).

Viewing other trees

While this tree did well, it still made some errors in both detecting true cancer cases (i.e.; hit rate) and in correctly rejecting true healthy cases (i.e.; specificity). Now, what if you want a tree that rarely misses true cases, at the cost of additional false alarms? As Luan, Schooler, & Gigerenzer (2011) have shown, you can easily shift the balance of errors in a fast and frugal tree by adjusting the decisions it makes at each level of the tree. The FFTrees() function automatically builds several versions of the same general tree that make different error trade-offs. We can see the performance of each of these trees in the bottom-right ROC curve. Looking at the ROC curve, we can see that tree number 5 has a very high hit rate, but a smaller specificity. We can look at this tree by adding the tree = 5 argument to plot():

[code language=”r”]
plot(breastcancer.fft,
     main = "Breastcancer FFT",
     decision.names = c("Healthy", "Cancer"),
     tree = 5)




Here is the resulting tree. As you can see, this tree uses an extra cue. It has a perfect hit rate of 100% (just as we wanted), but at the cost of a lower specificity of 80%.

Additional arguments

Cross-validation: The FFTrees() function allows you to easily create trees from a training dataset, and test the performance of the trees with a test dataset (aka, cross-validation). You can do this by either entering an explicit test dataset in the data.test argument, or by randomly splitting the main dataset into a separate training and testing sample with train.p. For example, train.p = .5 will randomly split the data into a 50% training set, which will be used to build the trees, and a 50% test set, which will be used to evaluate their prediction performance.

[code language=”r”]
# Create a 50% training and 50% testing dataset with train.p = .5
breastcancer.test.fft <- FFTrees(formula = diagnosis ~ .,
                                 data = breastcancer,
                                 train.p = .5)

Restricting trees: If you want to explicitly decide which cues you want in the tree, you can specify this in the formula argument. For example, the following code will generate a tree from the breast cancer data, but only using the cues thickness, mitoses, and adhesion.

[code language=”r”]
# Only use 3 cues in the trees
breastcancer.r.fft <- FFTrees(formula = diagnosis ~ thickness + mitoses + adhesion,
                              data = breastcancer)


The FFTrees package contains lots of other functions for visualising and comparing trees. In addition, it contains several real world datasets you can use to predict who survived the Titanic (titanic), whether a mushroom is poisonous or not (mushrooms), or how a politician will vote on a new bill (voting). To see all the details, check out the package vignettes either in the package or on CRAN (here). For all you judgment and decision making researchers out there, I will also be presenting the package at the annual meeting of the Society for Judgment and Decision Making (SJDM) in Boston in November 2016.

The package is also very much in development, so I am grateful for any recommendations, inevitable bug-reports, or criticisms. You can post bug-reports at, or email me directly at.

Alternative decision tree R packages

I hope you have as much fun with the package as I do, but I must warn you: this package was written by a psychologist who just happens to love programming in R for fun, rather than a statistician who really knows their shit about how best to build trees. For this reason, the code is messy, dreadfully inefficient, and without proper reference to established tree-building algorithms. While I hope to collaborate with statisticians in the future to improve the tree-building algorithm (likely by implementing one that already exists), it’s not quite there yet. For more established methods, I strongly recommend that you check out the rpart and party packages.


This package was developed in collaboration with Dr. Hansjoerg Neth and Dr. Wolfgang Gaissmaier at the University of Konstanz, and Dr. Jan Woike at the Max Planck Institute for Human Development in Berlin.

Code / Package Updates

This blog post has been updated to incorporate changes made in version 1.1.4 of the FFTrees package. Specifically, the original fft() function was renamed to FFTrees() to avoid confusion with the fast Fourier transform. All the original code (should) still work, but with some warnings.


References

Gigerenzer, G., & Brighton, H. (2009). Homo heuristicus: Why biased minds make better inferences. Topics in Cognitive Science, 1(1), 107–143.

Gigerenzer, G., & Todd, P. M. (1999). Fast and frugal heuristics: The adaptive toolbox. In Simple heuristics that make us smart (pp. 3–34). Oxford University Press.

Gigerenzer, G., Czerlinski, J., & Martignon, L. (1999). How good are fast and frugal heuristics? In Decision science and technology (pp. 81–103). Springer.

Gladwell, M. (2007). Blink: The power of thinking without thinking. Back Bay Books.

Green, L., & Mehr, D. R. (1997). What alters physicians’ decisions to admit to the coronary care unit? Journal of Family Practice, 45(3), 219–226.

Jenny, M. A., Pachur, T., Williams, S. L., Becker, E., & Margraf, J. (2013). Simple rules for detecting depression. Journal of Applied Research in Memory and Cognition, 2(3), 149–157.

Malkiel, B. G. (1999). A random walk down Wall Street: including a life-cycle guide to personal investing. WW Norton & Company.

Marewski, J. N., & Gigerenzer, G. (2012). Heuristic decision making in medicine. Dialogues Clin Neurosci, 14(1), 77–89.

Martignon, L., Katsikopoulos, K. V., & Woike, J. K. (2008). Categorization with limited resources: A family of simple heuristics. Journal of Mathematical Psychology, 52(6), 352-361.

Luan, S., Schooler, L. J., & Gigerenzer, G. (2011). A signal-detection analysis of fast-and-frugal trees. Psychological Review, 118(2), 316.

The new and improved pirateplot()! Now with themes!

Note: this post refers to an outdated version of the pirateplot() function. For the latest description of the function, check out

Hello fellow R pirates! I recently wrote a post demonstrating the pirateplot() function in the yarrr package. The pirateplot() function replaces deceptive (and boring) bar plots with a full RDI plot, capable of plotting Raw data (and smoothed densities), Descriptive statistics (like means and medians), and Inferential statistics (a 95% Bayesian Highest Density Interval, or “HDI”).

In the past 48 hours I’ve had an overwhelmingly positive response to the pirateplot() function. At the same time, I received several great recommendations for tweaks and improvements to the function to make it even better. After consulting the original pirate text, I decided that the changes stay true to the original piratery intent of the pirate plot. I’m sure the founding members of the pirate plot would be proud.

Let’s go over the changes! First, let’s install and load the yarrr package, which contains the pirateplot() function:

[code language=”r”]
install.packages("yarrr") # Install the package from CRAN
library("yarrr")          # Load the package

Now you’re ready to make some pirate plots! Here are the major updates to the function:

Opacity Themes

The five critical aspects of a pirate plot are the bars, beans, points, (average) lines, and HDIs. You can adjust the opacity of each of these elements with opacity arguments, such as bars.o, beans.o, and so on.

The biggest update to pirateplot() is the addition of opacity themes, which are designated by a new argument called theme.o. The input to this argument is an integer (currently 0, 1, 2, or 3) that defines an opacity theme across all five elements. Themes 1, 2, and 3 create specific opacity values for each of the elements, while Theme 0 sets all opacities to 0. Thankfully, the themes just set default values for the individual element opacities: you can still override the opacity of any specific element within a theme by including an element-specific opacity value.

Here are examples of the three different themes applied to the ChickWeight dataset:

Theme 1

Theme 1 emphasises the bar, with light points and beans (I’ll use the appletv palette for this one):

[code language=”r”]

pirateplot(formula = weight ~ Diet,
           data = ChickWeight,
           main = "Theme 1\nappletv palette",
           theme.o = 1,
           pal = "appletv")



Theme 2

Theme 2 emphasises the points and beans (using the southpark palette):

[code language=”r”]

pirateplot(formula = weight ~ Diet,
           data = ChickWeight,
           main = "Theme 2\nsouthpark palette",
           theme.o = 2,
           pal = "southpark")



Theme 3

Theme 3 emphasises the 95% Highest Density Intervals (HDIs). Keep in mind that calculating HDIs can take a few seconds for each bean. Here I’ll use the basel palette:

[code language=”r”]

pirateplot(formula = weight ~ Diet,
           data = ChickWeight,
           main = "Theme 3\nbasel palette",
           theme.o = 3,
           pal = "basel")



Theme 0

In Theme 0, all opacities are set to 0 by default, so you can individually specify the opacity of each element. In this example, I’ll turn the lines on full-blast and turn the points on slightly. I’ll also increase the amount of jittering and the size of the points, and use the google palette.

[code language=”r”]

pirateplot(formula = weight ~ Diet,
           data = ChickWeight,
           main = "Theme 0\ngoogle palette",
           pal = "google",
           point.o = .2,
           line.o = 1,
           theme.o = 0,
           line.lwd = 10,
           point.pch = 16,
           point.cex = 1.5,
           jitter.val = .1)



Of course, you can still change the colors of the plotting elements with the pal argument, and the background using the back.col argument. Here’s an x-ray version of Theme 3:

[code language=”r”]
pirateplot(formula = weight ~ Diet,
           data = ChickWeight,
           main = "Theme 3\nlight color with black background",
           pal = "white",
           theme.o = 3,
           point.pch = 16,
           back.col = gray(.2))




Gridlines

You can now include gridlines in your plot with the gl.col argument. Just specify the color of the lines and the function will put them in reasonable places. The following plot also shows how pirateplot() handles two independent variables:

[code language=”r”]
pirateplot(formula = weight ~ Diet + Time,
           data = subset(ChickWeight, Time < 10),
           theme.o = 2,
           pal = "basel",
           point.pch = 16,
           gl.col = gray(.8),
           main = "Two IVs\nWith gridlines")


Other minor changes

I’ve also made the following smaller changes

  • The function no longer automatically sorts the levels of the IV. It will plot the levels of the IV(s) in the order they are found in the original dataframe.
  • You can now manually change the color of the bar borders with the bar.border.col argument.

Happy pirate plotting!

As always, I am very happy to receive more comments and feedback on the function. You can write me at.

Coin flips and the Hot Hand

The Hot Hand is the belief that an actor, such as a basketball player, is more likely to be successful if he has just experienced success than if he has just experienced a failure. For example, a basketball player is believed to be ‘hot’ when he has just made one (or more) shots in a row – and as a result, is more likely to make his next shot than if he was not in this ‘hot’ state. In statistical language, the hot hand is the claim that the conditional probability of a success given a prior success (p(Success | Success)) is higher than the conditional probability of a success given a previous failure (p(Success | Failure)).
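In R, these two conditional probabilities are easy to compute from a sequence of outcomes. Here is a sketch using a made-up sequence of 10 shots (1 = hit, 0 = miss); the data are purely illustrative:

```r
# A made-up sequence of 10 shots (1 = hit, 0 = miss)
shots <- c(1, 1, 0, 1, 0, 0, 1, 1, 1, 0)

prev <- shots[-length(shots)] # outcome of the previous shot
curr <- shots[-1]             # outcome of the current shot

p.hit.given.hit  <- mean(curr[prev == 1]) # p(Success | Success)
p.hit.given.miss <- mean(curr[prev == 0]) # p(Success | Failure)

p.hit.given.hit  # 0.5
p.hit.given.miss # 0.667
```

The hot hand claim is simply that, for real players, the first number should be reliably larger than the second.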

While belief in the hot hand is pervasive, several decades of research, notably a classic study by Gilovich, Vallone and Tversky, have suggested that it in fact does not exist. Rather, it is the result of a cognitive fallacy wherein people expect random sequences to have fewer strings of successes than are actually expected by chance.

However, in the past few weeks, an unpublished manuscript written by Joshua Miller and Adam Sanjurjo has apparently dropped an atom bomb on this debate. It was even picked up by the Wall Street Journal! The fact that the WSJ is writing about the results from an unpublished manuscript shows how big of a deal this is. So what are Miller and Sanjurjo claiming that set off this massive reaction? They claim that in fact scientists, rather than basketball fans, have been making a statistical mistake. Specifically, Miller and Sanjurjo claim that, for a fair coin, the probability of a head given a previous head may not be 0.50!

Miller and Sanjurjo take the following example:

Take a coin and flip it 3 times, then calculate the percentage of times where a head was followed by a head. Call this the “Head Repetition Probability.” Repeat this procedure 1 million times. What will the average Head Repetition Probability be?

If your answer is 50%, then you’re similar to most people. And like most people, you’re wrong. In fact, the probability is 41.7% – substantially lower than the intuitive 50% number that most people, and prior researchers, had believed. Why is this important? Well, if researchers find that real basketball players have a Basket Repetition Probability of 50% (they have), then the players have a higher repetition probability than is expected by chance (i.e.; 41.7%)! Thus, the often detected 50% repetition probability in real people may in fact be evidence for the Hot Hand!
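If you don’t believe the 41.7% figure, you can check it with a quick Monte Carlo sketch in R. Here I use sequences of 3 flips (the same sequence length used in the analytic calculation later in this post) and 100,000 simulated sequences rather than 1 million, just for speed:

```r
set.seed(2016)
n.flips <- 3   # sequence length
n.sims <- 100000

rep.prob <- replicate(n.sims, {
  flips <- sample(c(0, 1), size = n.flips, replace = TRUE) # 1 = Head
  prev <- flips[-n.flips]
  curr <- flips[-1]
  if (sum(prev) == 0) return(NA) # no head early on: no repetition opportunity
  sum(prev == 1 & curr == 1) / sum(prev == 1)
})

mean(rep.prob, na.rm = TRUE) # about 0.417, not 0.50
```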

But where does this 41.7% figure come from? Does it really reflect the expected probability of getting a Head given that the previous outcome was a Head? If so, then coins should be inherently anti-streaky, which would pretty much contradict everything we know from Probability 101. Thankfully, that’s not what the 41.7% number reflects. Rather, the number comes from an improper statistic that equates sequences that are actually quite different.

Where the 41.7% figure comes from (and how to fix it)

To see where the 41.7% figure comes from, I will follow Andrew Gelman’s example and simulate the result in R using flips of a virtual fair coin. Using the coin analogy, we want to know “What is the probability of flipping a H given that a H occurred on the previous flip?”

However, in contrast to Gelman, I will not randomly simulate coin flips. Instead, I’ll use R to calculate all solutions analytically.

First, we’ll create a dataframe called possible showing of all possible sequences of 3 flips of a fair coin (though you can easily change the number of flips in the code by changing the n.flips value). We’ll use 1 to denote a Head and 0 to denote a Tail:

[code language=”r”]
n.flips <- 3
possible <- expand.grid(lapply(1:n.flips, function(x) {c(0, 1)}))
names(possible) <- paste("f", 1:n.flips, sep = "")

possible # Print all possible sequences


##   f1 f2 f3
## 1  0  0  0
## 2  1  0  0
## 3  0  1  0
## 4  1  1  0
## 5  0  0  1
## 6  1  0  1
## 7  0  1  1
## 8  1  1  1

As you can see, there are 8 possible sequences (2 to the power of 3). However, we need to remove the two cases where there are no heads in the first 2 flips. Why? Because if there are no heads in the first two flips, then there is no possibility for a sequence of two heads anywhere in the sequence.

[code language=”r”]
possible <- possible[rowMeans(possible[, 1:(n.flips - 1)]) != 0, ]

##   f1 f2 f3
## 2  1  0  0
## 3  0  1  0
## 4  1  1  0
## 6  1  0  1
## 7  0  1  1
## 8  1  1  1

Now we have 6 possible sequences that are equally likely to occur. For each of these sequences, how often is there an HH (a Head-Head repetition)? To answer this, let’s calculate three statistics:

streak.opportunities: How many opportunities are there for an HH streak? In other words, how many H values are there in the first 2 flips?

actual.streaks: How many actual HH streaks are there?

p.HH: Of those possible streak opportunities, what percentage of the time was there an actual streak?

[code language=”r”]
for(i in 1:nrow(possible)) {

  seq.values <- as.numeric(possible[i, 1:n.flips])

  # Which of the first (n.flips - 1) flips are heads,
  # and which of the following flips are heads?
  heads1 <- seq.values[1:(n.flips - 1)] == 1
  heads2 <- seq.values[2:n.flips] == 1

  streak.opportunities.i <- sum(heads1)
  actual.streaks.i <- sum(heads1 & heads2)

  p.hh <- actual.streaks.i / streak.opportunities.i

  possible$hh.streak.opportunities[i] <- streak.opportunities.i
  possible$hh.streaks[i] <- actual.streaks.i
  possible$p.hh[i] <- p.hh
}

possible # Print the results

##   f1 f2 f3 hh.streak.opportunities hh.streaks p.hh
## 2  1  0  0                       1          0  0.0
## 3  0  1  0                       1          0  0.0
## 4  1  1  0                       2          1  0.5
## 6  1  0  1                       1          0  0.0
## 7  0  1  1                       1          1  1.0
## 8  1  1  1                       2          2  1.0

We’re almost to the 41.7% number! So, to calculate the expected probability of getting HH, we can calculate the average p.HH value across all 6 sequences:

[code language=”r”]
mean(possible$p.hh)
## [1] 0.4166667

Wow! The value is indeed 41.67%! But does this number really represent the conditional probability of getting an H given an H? No, it doesn’t.

The Problem

Why is the 41.7% number misleading? In our previous calculation, we treated all flip sequences as equally informative. That is, because each sequence is equally likely, we simply took the average proportion of hits given hits (p.HH) across all 6 sequences. But this is a false assumption: some of these sequences are more informative than others.

To see this, we’ll take an extreme example. Consider flipping a coin 10 times. Here are two possible sequences. Assuming the coin is fair, both of these sequences are equally likely. For each sequence, I’ve calculated the number of HH streak opportunities, the number of actual HH streaks, and the percentage of HH streaks.

Sequence                        Streak opportunities   Number of streaks   Streak percentage
1, 1, 1, 1, 1, 1, 1, 1, 1, 1    9                      9                   1.0
0, 0, 0, 0, 0, 0, 0, 0, 1, 1    1                      1                   1.0

In the first sequence, there are 9 opportunities for a streak (in the 2nd through the 10th flip). Additionally, there were 9 actual streaks, so the proportion of streaks was 1.0.

Now, consider the second sequence. Here, there is only one opportunity for an HH streak (on the final 10th flip), and indeed there was an H on that flip. Therefore, the proportion of HH streaks out of all possible HH streaks was 100%, just like in the first sequence! But are these sequences really equally informative? I think not. In the first sequence, we considered all 10 data points, while in the second sequence we ignored the first 8 flips and considered only the last two.
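You can verify the two rows in the table above with a small helper function (streak.stats is my own illustrative name, not part of the original analysis) that counts streak opportunities and actual streaks in any 0/1 sequence:

[code language=”r”]
# Illustrative helper: count HH streak opportunities and actual HH streaks
streak.stats <- function(seq) {
  n <- length(seq)
  heads1 <- seq[1:(n - 1)] == 1  # flips that could start an HH streak
  heads2 <- seq[2:n] == 1        # flips that complete one
  c(opportunities = sum(heads1), streaks = sum(heads1 & heads2))
}

streak.stats(rep(1, 10))          # 9 opportunities, 9 streaks
streak.stats(c(rep(0, 8), 1, 1))  # 1 opportunity, 1 streak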

Think about a real-world example: if you saw a player hit 10 shots in a row, wouldn’t you think she provided stronger evidence for a hot hand than a player who missed 8 shots and then made only 2 in a row?

Thankfully, there is an easy fix to the problem.

Fixing the Problem (and finally getting p(H|H) = 50%)

To fix the problem, we need to weight sequences according to how many opportunities they contain for an HH sequence. To do this, we can simply calculate the total number of HH streaks across all sequences and divide it by the total number of HH streak opportunities across all sequences. In other words, we pool the data across sequences instead of averaging within them.

Let’s go back to our coin flip dataframe and make this calculation. If I’m right, we should get a p(H|H) probability of 50%:

f1 f2 f3 hh.streak.opportunities hh.streaks p.hh
1 0 0 1 0 0.0
0 1 0 1 0 0.0
1 1 0 2 1 0.5
1 0 1 1 0 0.0
0 1 1 1 1 1.0
1 1 1 2 2 1.0

To make this calculation, we’ll take the total number of actual streaks (the sum of the hh.streaks column) and divide it by the total number of streak opportunities (the sum of the hh.streak.opportunities column). We should get 4 / 8 = 50%:

[code language=”r”]
sum(possible$hh.streaks) / sum(possible$hh.streak.opportunities)
## [1] 0.5

Phew, we got it! Again, the reason our 50% number is larger than the previous 41.7% number is that we’ve given more weight to sequences with more opportunities for HH events than to those with fewer opportunities.


While the 41.7% number reported by Joshua Miller and Adam Sanjurjo (and subsequently confirmed by Andrew Gelman) sounds like a scary violation of the laws of statistics, it is in fact an artefact of an improper statistical procedure. As I’ve shown, taking the simple average proportion of HH events across many sequences is improper because it gives equal weight to sequences with many opportunities for HH events and to those with only a few. By analogy, consider two basketball players who each shoot 100 shots. Player A misses his first 97 shots and makes his last 3; player B makes all 100. Which player was more hot? According to the improper statistic, both are equally hot because both have a p(H|H) value of 100%! Following this logic, both provide equivalent (positive) evidence for the hot hand. Of course, this is poor reasoning: we should give player B’s performance more weight. Moving back to the more general example of coin flips, if we avoid this bias by simply dividing the total number of HH events by the total number of possible HH events, we reach the correct (and intuitive) 50% figure.

Addendum: Using Bayes to help correct the error

After finishing the previous analysis, I wondered whether we could use Bayesian statistics to correct the bias inherent in the improper statistic. Specifically, if we allow Bayes’ theorem to weight evidence according to the number of opportunities for success (which, as I’ve shown, is critical), can we return to using the improper statistic?

To answer this question, I will repeat the improper-statistic procedure, but instead of taking the mean of the sample pHH proportions, I’ll first calculate a Bayesian posterior estimate (the posterior median) of the pHH value for each player (e.g., each coin), and then average these posterior estimates across all players.
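Concretely, if a player has s actual streaks out of n streak opportunities, updating a flat Beta(1, 1) prior gives a Beta(1 + s, 1 + n - s) posterior, whose quantiles come from qbeta(). Here is a minimal sketch for a single player (the numbers are made up for illustration):

[code language=”r”]
# Beta-binomial updating for one player: s streaks in n opportunities
s <- 3
n <- 7
post.a <- 1 + s        # flat Beta(1, 1) prior plus successes
post.b <- 1 + (n - s)  # plus failures

# Posterior median and 95% credible interval
qbeta(c(.5, .025, .975), post.a, post.b)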

I’ll make this example a bit more interesting by simulating baskets in a shooting tournament. I’ll generate outcomes of 10 sequential shots from 1,000 basketball players. Importantly, each player will have a true 50% chance of success on each shot.

[code language=”r”]
n.players <- 1000
n.shots <- 10

# Each cell is one shot: 1 = hit, 0 = miss, each with probability .5
shots <- as.data.frame(matrix(sample(c(0, 1), size = n.players * n.shots, replace = TRUE),
                              nrow = n.players, ncol = n.shots))


Again, let’s delete cases where the first 9 shots are all misses:

[code language=”r”]
# Remove players with no hits in their first (n.shots - 1) shots
shots <- shots[rowMeans(shots[, 1:(n.shots - 1)]) != 0, ]

Now, for each player, let’s calculate four statistics:

  • possible.HH: The number of hits in shots 1 through 9; this equals the number of possible HH streaks in the sequence
  • actual.HH: The number of actual HH streaks in the sequence
  • actual.HH.p: The proportion of actual HH streaks out of possible HH streaks
  • bayes.HH.p: The Bayesian posterior estimate (posterior median) of the probability of an HH streak, given the player’s streaks and opportunities

[code language=”r”]
for(i in 1:nrow(shots)) {

  player.seq <- shots[i, ]

  possible.HH.sequences <- sum(player.seq[1:(n.shots - 1)] == 1)
  actual.HH.sequences <- sum(player.seq[1:(n.shots - 1)] == 1 & player.seq[2:n.shots] == 1)

  n <- possible.HH.sequences
  p <- actual.HH.sequences / possible.HH.sequences

  # Update a flat Beta(1, 1) prior with the observed streaks and non-streaks
  post.a <- 1 + p * n
  post.b <- 1 + (1 - p) * n

  post.median <- qbeta(.5, post.a, post.b)   # posterior median
  lb <- qbeta(.025, post.a, post.b)          # 95% credible interval bounds
  ub <- qbeta(.975, post.a, post.b)

  shots$possible.HH[i] <- possible.HH.sequences
  shots$actual.HH[i] <- actual.HH.sequences
  shots$actual.HH.p[i] <- p
  shots$bayes.HH.p[i] <- post.median
  shots$lb[i] <- lb
  shots$ub[i] <- ub
}

Now, let’s calculate the average p(H|H) estimate from each of our measures.

First, let’s use the original biased measure (the same one that gave us the ugly 41.7% value for 3 coin flips).

[code language=”r”]
mean(shots$actual.HH.p)
## [1] 0.4431845

In my simulation, I got a value of 44.3%. Better than the 41.7% value, but still far away from the desired 50% value.

Now, let’s see how our average Bayesian solution did:

[code language=”r”]
mean(shots$bayes.HH.p)
## [1] 0.4701428

The mean Bayesian posterior estimate is 47.0%: closer to 50% than before, but still not good enough. To be honest, I’m not totally sure why this value does not hit 50% (ignoring sampling error). I’ll still need to work on this.

Finally, let’s use our simple unbiased measure:

[code language=”r”]
sum(shots$actual.HH) / sum(shots$possible.HH)
## [1] 0.4975391

Here, I got a value of 49.8% – close enough to be within sampling error of 50%.

The yarrr R package!

After reading Hilary Parker’s and Hadley Wickham’s great tutorials on building R packages, I’ve decided to build the yarrr package. The package contains all of the datasets, functions, and the game-changing jokes contained in YaRrr! The Pirate’s Guide to R.

To install the package from github, run the following code:

[code language=”r”]
# install.packages("devtools") # if you don't already have devtools
devtools::install_github("ndphillips/yarrr") # repository name assumed
Once you do, you should be able to see the pirates dataset – a historical survey of hundreds of pirates now in R form for your data analysis pleasures:

[code language=”r”]
library(yarrr)
head(pirates)

P.S. Thanks to Hilary Parker for showing me how to do proper code formatting in WordPress!

Press Release: The Janus face of Darwinian competition


This week, Michigan State University released the following press release on a recent publication in Scientific Reports that I was privileged to co-author! The paper, titled “The Janus face of Darwinian competition”, was first-authored by Dr. Arend Hintze at Michigan State University. The paper uses computational techniques to test how competition affects the evolution of decision-making strategies. Here is the punchline: some competition is good, but too much can spur the evolution of poor decision-making strategies. The paper builds on the competitive sampling paradigm I co-developed with Dr. Yaakov Kareev, Dr. Judith Avrahami, and Dr. Ralph Hertwig in our paper “Rivals in the dark: How competition influences search in decisions under uncertainty”, published last year in Cognition.

Maximum-Likelihood parameter estimation in R: A brief tutorial

A brief tutorial on ML estimation in R

Recently, a colleague asked me to demonstrate how one can calculate maximum-likelihood (ML) parameter estimates in R. As I had been meaning to do this for my R course anyway, I decided to write up the following brief tutorial. In it, I go through the three basic steps of ML estimation in R for both a statistical example (the simple linear model) and a psychological model of choice (the Proportional Difference model; Gonzalez-Vallejo, 2001).
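As a taste of what the tutorial covers, the three basic steps can be sketched for the simple linear model (the code below is my own illustrative sketch, not an excerpt from the tutorial): (1) write a function returning the negative log-likelihood of the data given the parameters, (2) choose starting values, and (3) minimize it with optim():

[code language=”r”]
set.seed(1)
x <- rnorm(100)
y <- 2 + 3 * x + rnorm(100, sd = 1.5)

# Step 1: negative log-likelihood of y ~ Normal(b0 + b1 * x, sigma)
nll <- function(par) {
  b0 <- par[1]
  b1 <- par[2]
  sigma <- exp(par[3])  # exp() keeps sigma positive
  -sum(dnorm(y, mean = b0 + b1 * x, sd = sigma, log = TRUE))
}

# Steps 2 and 3: starting values, then minimization
fit <- optim(par = c(0, 0, 0), fn = nll)
fit$par  # ML estimates of b0, b1, and log(sigma); compare with coef(lm(y ~ x))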

You can view the tutorial on the preceding link. If you have any comments, criticisms or questions, please let me know and I will try to update the document!

Note: I am indebted to Dr. Stephen Lewandowsky and Dr. Simon Farrell’s summer school on computational cognitive modelling for teaching me these techniques. You can purchase their great book on the subject here: Computational Modeling in Cognition: Principles and Practice.

My R course…now on YouTube!

Well, after talking about it for about a year now, I’ve finally started converting my R course to a YouTube friendly format. Since I’m in the middle of my university R course, the videos don’t quite start at the beginning. I’ll add the beginning 3-5 lectures after the semester ends. In the meantime, here’s the permanent link: