Nassim Nicholas Taleb recently tweeted out an interesting problem. I didn’t know how to solve it mathematically, so I wanted to build a simulation in R.

Here’s the problem:

PROBABILITY QUIZ: How randomness should almost never look random.
(spurious cancer cluster in Fooled by Randomness)@CutTheKnotMath pic.twitter.com/1Svrn0IB69
— NassimNicholasTaleb (@nntaleb) November 2, 2017

I like to visualize things, so I decided to simulate throwing 8 darts on a board first, then simulate the dart throwing model many times using the Monte Carlo method.

Simulating dart throws

This uses the raster and tidyverse packages¹. The raster package simulate throwing darts, while the tidyverse gives us flexible ways to filter the data we’re generating.

We need to do four main things:

Create a 4x4 grid that will function as our dart board
Generate 8 random points on the grid that function as our dart throws
Count the number of darts in each section of our dart board
Visualize the results

Constructing a dart board

To build the dart board, we can use the raster package:

# create a raster with 4x4 grids
board <- raster(xmn=0, ymn=0, xmx=1, ymx=1, res=0.25)

Generating dart throws

We need to generate an extent from our dart board raster. This forms a bounding box around our dart board raster, so we can calculate coordinates within the extent that map to the raster.

To generate a spatial sample, we pass in our dart board extent², the number of times we want to throw darts, and our method for throwing the darts, which is random.

# throw our darts on the dart board
throws <- spsample(as(extent(board), 'SpatialPolygons'), 8, 'random')

Counting darts

The raster package has a function called rasterize that will tablulate how many times each grid was hit by a dart.³

# calculate which grid each of our darts hit
board <- rasterize(throws, board, fun='count', background = 0)

Visualizing the dart board

Let’s take a look at our dart board now:

# plot the board
plot(board)
# plot the points
points(throws, pch=20)

A visualization of our dart board, with the points our darts and counts for how many darts hit each grid

In this exampe, none of our grids were hit by 3 or more darts.

Simulate our dart board model many times

Now that we can visualize hitting one dart board with 8 darts, we want to determine the likelihood of one of our grids getting hit by 3 or more darts, per Taleb’s challenge.

One simulation, shown above, gave us a little information about this probability. Using the Monte Carlo simulation method, we can run many simulations rapidly, then compare the results to get a much better idea of this probability. While we’ll never prove the correct answer with this method, it will give us a practical answer very quickly.⁴

Turn our earlier work into a function

To create a Monte Carlo simulation, we’re going to turn our earlier R snippets into a function that returns TRUE or FALSE, depending on if the dart board contains any grids with 3 or more darts:

throwDarts <- function(raster, n_throws) {

  # throw our darts on the dart board
  throws <- spsample(as(extent(board), 'SpatialPolygons'), n_throws, 'random')

  # calculate which grid each of our darts hit
  board <- rasterize(throws, board, fun='count', background = 0)

  # save board to a table
  board_table <- tbl_df(data.frame(coordinates(board), count=board[]))

  # return TRUE if there are 3 or more darts in any grid, else return FALSE
  ifelse(nrow(board_table %>% filter(count >= 3)) > 0, return(TRUE), return(FALSE))
}

The primary changes in this function are:

We’re saving the results of the board after counting the darts in each grid to a table, instead of plotting them in R.
We’re evaluating whether or not there are 3 or more darts in any grid within the function and returning TRUE or FALSE depending on the results.
The parameter raster needs to be passed in (we’ll use our board raster from earlier), as well as the number of throws, n_throws.

We can run this function with throwDarts(board, 8).

Building the Monte Carlo simulator

Now that we have the throwDarts function, all we need to do is run it many times and calculate the frequency of cases where it returns TRUE divided by the number of simulations. This will return an estimated average for how frequently 3 or more darts will hit the same grid during a simulation.

Since we already did the heavy lifting, the Monte Carlo simulator is easy to build:

dartSims <- function(raster, n_throws, n_sims) {

  # save an empty vector to store sim results
  boardEval <- c()

  # for loop to run throwDarts sims
  for (i in seq(n_sims)) {
    boardEval <- c(boardEval,
                   throwDarts(raster,
                              n_throws))
  }

  # calculates the frequency of boards with 3 or more darts in a single grid
  hitRate <- sum(boardEval)/length(boardEval)

  # returns this value when the function is evaluated
  return(hitRate)
}

The bulk of the work is done in the for loop, which runs through our throwDarts function as many times as we tell it to, storing the results in the boardEval vector.

Since TRUE values equal 1 and FALSE values equal 0 in R, we can write a quick one-liner to add up all the boards that have TRUE values, then divide that by the number of boards we simulated.

This gives us a frequency for how often boards have a grid with 3 or more darts!

Final calculation

Running dartSims(board, 8, 100) will simulate throwing 8 darts at our dart board 100 times, returning the frequency of grids with 3 or more darts.

However… this value isn’t likely the best answer! Running more simulations increases the likelihood of attaining a closer approximation of the true probability.

This is where judgement needs to come into play, as increasing the number of simulations increases the time and computing power required. There are diminishing returns past a certain point, too, so starting smaller and increasing until things start to stabilize is a good approach.

dartSims(board, 8, 100000) gives a good approximation given these tradeoffs with a value of 0.16912⁵. Taleb tweeted out that the solution is 0.169057, so this is a fantastic estimation for our purposes.

Recap

We covered a lot of ground, first building a visualization of throwing darts at a virtual dart board before abstracting the dart throwing into a function that we could use in a Monte Carlo simulation. While we didn’t answer Taleb’s question definitively, we did calculate a practical answer quickly.

The Monte Carlo method is portable to similar problems, too. We sneakily implemented a technique that will work for geospatial problems, so keep an eye out for future write-ups along those lines (no pun intended).

You can get the complete R script here.

One of the best things about R is it’s ecosystem. There’s a package for almost any statistical or visualization task you can think of. ↩
You may notice the SpatialPolygon parameter, too. This is telling the sampling function that the dart board contains distinct grids. ↩
We’re using fun='count' because we want the counts of darts hitting each grid. background=0 tells rasterize to set grid values to 0 for the lonely grids that don’t get hit by a dart. ↩
As a quick philosophical aside, a practical, mostly correct answer now is good enough for me in most situations. ↩
Running set.seed(2001) just before executing dartSims(board, 8, 10000) allows you to reproduce this result. R’s random functions aren’t really random, rather it uses a pseudorandom number generator behind the scenes. set.seed allows us to tap into this and reproduce results by resetting to a starting point for a random number generation function. ↩

Throwing darts with a Monte Carlo simulation