A “Minority Report” Monte Carlo simulation in Scala


“The Precogs are never wrong. But occasionally they do disagree.”
~ Minority Report
 

This article shares the source code for a Monte Carlo simulation that I originally wrote in Scala many years ago. It was inspired by the movie Minority Report, as well as my own experience.

Background

For the purposes of this simulation, imagine that you have three people that are each “right” roughly 80% of the time. For instance, if they take a test with 100 questions, each of the three individuals will get 80 of the questions right, although they may not get the same questions right or wrong.

Given these three people, my question to several statistician friends was:

“If two of the people have the same answer to a given question, what are the odds that they are correct? Furthermore, if all three of them give the same answer to a question, what are the odds that they are right?”

Maybe I phrased my question wrong, but the statisticians kept saying, “80%”, which I strongly felt was wrong.

Therefore, what I’m doing in this Monte Carlo simulation program is assuming that there are N questions in a given “test”. To keep this simple I’m assuming that the “correct” answer to each question is A. I then randomly populate the “answers” of three simulated people, making sure that very close to 80% of each person’s answers are A. Once I have N answers for each user (in this case N is 100,000), I compare the answers.

If you think of each element in the Person1 (P1) array as being a series of answers, you’ll find that P1 answered A 80% of the time. You can therefore think that this person was correct in all of these answers. The same thing is true for Person2 and Person3. However, because the results are random, there’s no guarantee that for any value of N that the three “people” will have the same answer. For the case of N=1, all three people may have the correct answer A, but for N=2, P1 may be A, P2 may be B, and P1 may be A, or any other possible combination.

So now the question becomes, if all three people have the same answer, what are the odds that they are correct? I do this by walking through the three arrays and finding the cases where all three people have the same answer. It turns out that when all three people have the same answer (where the answer is either A or B) they have the correct answer — A — roughly 98.5% of the time.

Similarly, it turns out that when you compare P1 and P2, when they agree on an answer, they are correct ~94% of the time. Intuitively I believe that it is correct that this value is greater than 80%, and also less than the value when all three people agree.

It may be that these results are skewed somewhat by the fact that there are only two possible answers in my questions, A and B. For instance, the results will surely be different if there are something like four possible answers to each question (A, B, C, D). How will adding more possible answers modify the test results? I encourage you to modify this Monte Carlo simulation and find out; after all, that’s what simulations like this are for. :)

The Scala source code

Given that background, here’s my Scala 3 source code for this ’Minority Report’ Monte Carlo simulation:

package montecarlo

import scala.util.*
import java.text.NumberFormat

/**
 * As you’ll see below, a simple way to simulate correct and incorrect
 * answers is to assume that 'A' is always the correct answers, and
 * 'b' (or any other character) is the incorrect answer.
 */
val A = 'a'   // the correct answer to all questions
val B = 'b'   // the incorrect answer

/**
 * A 'Minority Report' Monte Carlo simulation written by Alvin Alexander.
 * @see https://alvinalexander.com/scala/scala-monte-carlo-simulation-minority-report
 * for more details and information.
 */
@main def MinorityReportSimulation =

    // the number of simulated "questions" that we ask.
    // with a Monte Carlo simulation you want this to be a relatively
    // high number, because each test depends on randomly-generated
    // values.
    val N = 100_000

    // these arrays will be randomly populated with ~80% 'A' values,
    // correlating to each person being right 80% of the time.
    // first, create the empty arrays:
    val person1 = new Array[Char](N)
    val person2 = new Array[Char](N)
    val person3 = new Array[Char](N)

    // now populate each array with ~80% 'A' values, where those values
    // are stored in random locations in each array. it isn’t necessary
    // to populate the arrays with different high/low values, but i did
    // it to make it more interesting:
    populateArrayValues(person1,  0, 80, N)
    populateArrayValues(person2, 10, 90, N)
    populateArrayValues(person3, 15, 95, N)
    
    // verify that the person1 array has approximately 80% 'A' values:
    println(s"\n# of P1 'A' values = ${person1.filter(_ == A).size}")

    // run the simulations, then print the results
    val (numSame1, numCorrect1) = casesWhereP1AndP2HaveSameAnswer(person1, person2, N)
    printResults("P1 = P2", numSame1, numCorrect1)

    val (numSame2, numCorrect2) = casesWhereP1AndP2AndP3HaveSameAnswer(person1, person2, person3, N)
    printResults("P1 = P2 = P3", numSame2, numCorrect2)

end MinorityReportSimulation

/**
 * A method to populate the array values so that their answers are ~80% correct,
 * but the correct answers are randomly distributed in the array. (Some of this
 * is overkill, but I wanted to have some fun.)
 */
def populateArrayValues(
    personValues: Array[Char],
    lowRandom: Int,
    highRandom: Int,
    numberOfRuns: Int
): Unit =
    assert(highRandom - lowRandom == 80)   // the diff must be 80 for these tests
    for i <- 0 until numberOfRuns do
        val ri = Random.nextInt(100)
        if ri > lowRandom && ri <= highRandom then
            personValues(i) = A
        else
            personValues(i) = B

/**
 * Look at the situation where person1 and person2 have the same answer.
 * out of these, what percentage is correct? (i.e., what % of these are 'a'?)
 */
def casesWhereP1AndP2HaveSameAnswer(
    p1: Array[Char],
    p2: Array[Char],
    numberOfRuns: Int
): (Int, Int) =
    var numSame = 0
    var numCorrect = 0
    for i <- 0 until numberOfRuns do
        if p1(i) == p2(i) then
            numSame += 1
            if p1(i) == A then numCorrect += 1
        end if
    (numSame, numCorrect)

/**
 * Look at the situation where p1, p2, and p3 have the same answer.
 * out of these, what percentage is correct? (i.e., what % of these are 'a'?)
 */
def casesWhereP1AndP2AndP3HaveSameAnswer(
    p1: Array[Char],
    p2: Array[Char],
    p3: Array[Char],
    numberOfRuns: Int
): (Int, Int) =
    var numSame = 0
    var numCorrect = 0
    for i <- 0 until numberOfRuns do
        if p1(i) == p2(i) && p2(i) == p3(i) then
            numSame += 1
            if p1(i) == A then numCorrect += 1
        end if
    end for
    (numSame, numCorrect)

/**
 * Print the final results in the desired format.
 */
def printResults(header: String, numSame: Int, numCorrect: Int): Unit =
    val percentCorrect = numCorrect.asInstanceOf[Float]/numSame.asInstanceOf[Float] * 100F
    val formatter = NumberFormat.getIntegerInstance
    println(s"\n$header")
    println(s"# same answers:    ${formatter.format(numSame)}")
    println(s"# correct answers: ${formatter.format(numCorrect)}")
    println(f"percent correct:   $percentCorrect%2.2f")

Sample output

By definition, a Monte Carlo simulation produces different results each time it’s run, but when you run a simulation many times, you’ll find that the results tend to an average. For instance, here are the results from three sample runs:

P1 = P2
# same answers:    67,970
# correct answers: 63,972
percent correct:   94.12

P1 = P2 = P3
# same answers:    52,283
# correct answers: 51,462
percent correct:   98.43

---

P1 = P2
# same answers:    67,950
# correct answers: 63,942
percent correct:   94.10

P1 = P2 = P3
# same answers:    52,052
# correct answers: 51,271
percent correct:   98.50

---

P1 = P2
# same answers:    67,934
# correct answers: 63,965
percent correct:   94.16

P1 = P2 = P3
# same answers:    51,867
# correct answers: 51,114
percent correct:   98.55

As shown, when P1 and P2 values are the same, they have the ’correct’ answer ~94% of the time, and when P1, P2, and P3 are the same, they are correct ~98.5% of the time.

If you’re interested in performance, this code runs in about 25ms on my two-year old computer.

Summary

If you’ve never worked with a Monte Carlo simulation before, I hope this article has been helpful. I’ve found the technique useful in solving several problems that I didn’t know how to work out mathematically. The technique is:

  1. State the problem you’re trying to solve.
  2. Write code to simulate the problem. Generate random data for your problem, where that random data conforms on average to what you expect, or what you know from real-world observations.
  3. Run the simulation a large enough number of times so that the results have a chance to “settle down”.

I know I didn’t state that very well, but I’m a little short on time, so I’ll just say, “Please read the Monte Carlo Method entry on Wikipedia for more information”.

Finally, I hope this example shows one thing I really like about Scala: It looks and feels like a dynamic scripting language. Go back and look at the code and see how rarely data types have to be specified, and notice how very little “boilerplate” there is in the language. Scala is known as a “low ceremony” language, and I think this example helps to show that.

Functional programming

As a final note, at some point I’ll rewrite this code in a more functional style. Until then, see my book, Functional Programming, Simplified for more information on a “functional” approach.