Nassim Nicholas Taleb recently tweeted out an interesting problem. I didn’t know how to solve it mathematically, so I wanted to build a simulation in R.
Here’s the problem:
PROBABILITY QUIZ: How randomness should almost never look random.
— NassimNicholasTaleb (@nntaleb) November 2, 2017
(spurious cancer cluster in Fooled by Randomness)@CutTheKnotMath pic.twitter.com/1Svrn0IB69
I like to visualize things, so I decided to simulate throwing 8 darts on a board first, then simulate the dart throwing model many times using the Monte Carlo method.
Simulating dart throws
This uses the raster
and tidyverse
packages^{1}. The raster
package simulate throwing darts, while the tidyverse
gives us flexible ways to filter the data we’re generating.
We need to do four main things:
 Create a 4x4 grid that will function as our dart board
 Generate 8 random points on the grid that function as our dart throws
 Count the number of darts in each section of our dart board
 Visualize the results
Constructing a dart board
To build the dart board, we can use the raster package:
# create a raster with 4x4 grids
board < raster(xmn=0, ymn=0, xmx=1, ymx=1, res=0.25)
Generating dart throws
We need to generate an extent from our dart board raster. This forms a bounding box around our dart board raster, so we can calculate coordinates within the extent that map to the raster.
To generate a spatial sample, we pass in our dart board extent^{2}, the number of times we want to throw darts, and our method for throwing the darts, which is random
.
# throw our darts on the dart board
throws < spsample(as(extent(board), 'SpatialPolygons'), 8, 'random')
Counting darts
The raster package has a function called rasterize
that will tablulate how many times each grid was hit by a dart.^{3}
# calculate which grid each of our darts hit
board < rasterize(throws, board, fun='count', background = 0)
Visualizing the dart board
Let’s take a look at our dart board now:
# plot the board
plot(board)
# plot the points
points(throws, pch=20)
In this exampe, none of our grids were hit by 3 or more darts.
Simulate our dart board model many times
Now that we can visualize hitting one dart board with 8 darts, we want to determine the likelihood of one of our grids getting hit by 3 or more darts, per Taleb’s challenge.
One simulation, shown above, gave us a little information about this probability. Using the Monte Carlo simulation method, we can run many simulations rapidly, then compare the results to get a much better idea of this probability. While we’ll never prove the correct answer with this method, it will give us a practical answer very quickly.^{4}
Turn our earlier work into a function
To create a Monte Carlo simulation, we’re going to turn our earlier R snippets into a function that returns TRUE
or FALSE
, depending on if the dart board contains any grids with 3 or more darts:
throwDarts < function(raster, n_throws) {
# throw our darts on the dart board
throws < spsample(as(extent(board), 'SpatialPolygons'), n_throws, 'random')
# calculate which grid each of our darts hit
board < rasterize(throws, board, fun='count', background = 0)
# save board to a table
board_table < tbl_df(data.frame(coordinates(board), count=board[]))
# return TRUE if there are 3 or more darts in any grid, else return FALSE
ifelse(nrow(board_table %>% filter(count >= 3)) > 0, return(TRUE), return(FALSE))
}
The primary changes in this function are:
 We’re saving the results of the board after counting the darts in each grid to a table, instead of plotting them in R.
 We’re evaluating whether or not there are 3 or more darts in any grid within the function and returning
TRUE
orFALSE
depending on the results.  The parameter
raster
needs to be passed in (we’ll use ourboard
raster from earlier), as well as the number of throws,n_throws
.
We can run this function with throwDarts(board, 8)
.
Building the Monte Carlo simulator
Now that we have the throwDarts
function, all we need to do is run it many times and calculate the frequency of cases where it returns TRUE
divided by the number of simulations. This will return an estimated average for how frequently 3 or more darts will hit the same grid during a simulation.
Since we already did the heavy lifting, the Monte Carlo simulator is easy to build:
dartSims < function(raster, n_throws, n_sims) {
# save an empty vector to store sim results
boardEval < c()
# for loop to run throwDarts sims
for (i in seq(n_sims)) {
boardEval < c(boardEval,
throwDarts(raster,
n_throws))
}
# calculates the frequency of boards with 3 or more darts in a single grid
hitRate < sum(boardEval)/length(boardEval)
# returns this value when the function is evaluated
return(hitRate)
}
The bulk of the work is done in the for loop
, which runs through our throwDarts
function as many times as we tell it to, storing the results in the boardEval
vector.
Since TRUE
values equal 1 and FALSE
values equal 0 in R, we can write a quick oneliner to add up all the boards that have TRUE
values, then divide that by the number of boards we simulated.
This gives us a frequency for how often boards have a grid with 3 or more darts!
Final calculation
Running dartSims(board, 8, 100)
will simulate throwing 8 darts at our dart board 100 times, returning the frequency of grids with 3 or more darts.
However… this value isn’t likely the best answer! Running more simulations increases the likelihood of attaining a closer approximation of the true probability.
This is where judgement needs to come into play, as increasing the number of simulations increases the time and computing power required. There are diminishing returns past a certain point, too, so starting smaller and increasing until things start to stabilize is a good approach.
dartSims(board, 8, 100000)
gives a good approximation given these tradeoffs with a value of 0.16912
^{5}. Taleb tweeted out that the solution is 0.169057
, so this is a fantastic estimation for our purposes.
Recap
We covered a lot of ground, first building a visualization of throwing darts at a virtual dart board before abstracting the dart throwing into a function that we could use in a Monte Carlo simulation. While we didn’t answer Taleb’s question definitively, we did calculate a practical answer quickly.
The Monte Carlo method is portable to similar problems, too. We sneakily implemented a technique that will work for geospatial problems, so keep an eye out for future writeups along those lines (no pun intended).
You can get the complete R script here.

One of the best things about R is it’s ecosystem. There’s a package for almost any statistical or visualization task you can think of. ↩

You may notice the
SpatialPolygon
parameter, too. This is telling the sampling function that the dart board contains distinct grids. ↩ 
We’re using
fun='count'
because we want the counts of darts hitting each grid.background=0
tells rasterize to set grid values to 0 for the lonely grids that don’t get hit by a dart. ↩ 
As a quick philosophical aside, a practical, mostly correct answer now is good enough for me in most situations. ↩

Running
set.seed(2001)
just before executingdartSims(board, 8, 10000)
allows you to reproduce this result. R’s random functions aren’t really random, rather it uses a pseudorandom number generator behind the scenes.set.seed
allows us to tap into this and reproduce results by resetting to a starting point for a random number generation function. ↩