A Java Monte Carlo simulation for my “Minority Report” problem

This Java Monte Carlo simulation tutorial, and the corresponding Java program, was inspired by the story and subsequent movie "Minority Report", as well as my recent interest in Monte Carlo simulations.

My Monte Carlo simulation - The problem statement

Imagine that you have three people who are each "right" 80% of the time. My question to several statistician friends was, "If two of the people have the same answer, what are the odds that they are correct?" They all came back and said the same thing: "80%". Since I didn't like or agree with that answer, lol, I created a Java Monte Carlo simulation of this problem.

What I'm doing in my Java Monte Carlo simulation program is assuming that there are 'N' questions in a given "test". For simplicity I'm assuming that the "correct" answer is always 'a'. I then randomly populate the "answers" of three simulated people so that, on average, very close to 80% of each person's answers are 'a'. Once I have 'N' answers for each person (in this case 'N' is 100,000), I compare the answers.

Java Monte Carlo simulation of the problem

So, without any further discussion, here is my Java-based Monte Carlo simulation program for my "Minority Report" problem:

package com.devdaily.montecarlo;

import java.util.*;

/**
 * This is a little crazy, but imagine that you have three people that are each
 * "right" 80% of the time. My question to a statistician was "If two of the
 * people have the same answer, what are the odds that they are correct?"
 *
 * Since I didn't get an answer I liked, I created this Monte Carlo simulation.
 * What I'm doing here is assuming that there are 'N' questions, and that the
 * "correct" answer is always 'a'.
 */
public class RandomTests
{
  final static int N = 100000;
  char person1[] = new char[N];
  char person2[] = new char[N];
  char person3[] = new char[N];

  public static void main(String[] args)
  {
    new RandomTests();
  }

  public RandomTests() {
    populateFirstPerson();
    populateSecondPerson();
    populateThirdPerson();

    p1AndP2BothHaveAForAnswer();
    p1AndP2HaveSameAnswer();
    p1p2AndP3HaveSameAnswer();
  }

  private void populateThirdPerson() {
    //person3
    Random random = new Random();
    for ( int i=0; i<N; i++ )
    {
      int randomInt = random.nextInt(100);
      if (randomInt<41 || randomInt>60) person3[i] = 'a'; else person3[i] = 'b';
    }
  }

  private void populateSecondPerson() {
    //person2
    Random random = new Random();
    for ( int i=0; i<N; i++ )
    {
      int randomInt = random.nextInt(100);
      if (randomInt>19) person2[i] = 'a'; else person2[i] = 'b';
    }
  }

  private void populateFirstPerson() {
    //person1
    Random random = new Random();
    for ( int i=0; i<N; i++ )
    {
      int randomInt = random.nextInt(100);
      if (randomInt<80) person1[i] = 'a'; else person1[i] = 'b';
    }
  }

  private void p1AndP2BothHaveAForAnswer() {
    //--------------------------------------------------------------------------
    // look at the situation where person1 and person2 both have 'a' for their
    // answer
    //--------------------------------------------------------------------------
    int numOfA = 0;
    for ( int i=0; i<N; i++ )
    {
      char c1 = person1[i];
      char c2 = person2[i];
      if (c1=='a' && c2=='a') numOfA++;
    }
    System.out.println("# of cases where person1 and person2 have 'a' for an answer: " + numOfA + " out of " + N);
  }

  //--------------------------------------------------------------------------
  // look at the situation where person1 and person2 have the same answer.
  // out of these, what percentage is correct? (i.e., what % of these are 'a'?)
  //--------------------------------------------------------------------------
  private void p1AndP2HaveSameAnswer() {
    int numSame = 0;
    int numCorrect = 0;
    for ( int i=0; i<N; i++ )
    {
      char c1 = person1[i];
      char c2 = person2[i];
      if (c1==c2) {
        numSame++;
        if (c1=='a') numCorrect++;
      }
    }
    float percentCorrect = (float)numCorrect/(float)numSame * 100.0f;
    System.out.println("\nP1 = P2");
    System.out.println("numSame: " + numSame);
    System.out.println("numCorrect: " + numCorrect);
    System.out.println("% correct:  " + percentCorrect);
  }

  //--------------------------------------------------------------------------
  // look at the situation where p1, p2, and p3 have the same answer.
  // out of these, what percentage is correct? (i.e., what % of these are 'a'?)
  //--------------------------------------------------------------------------
  private void p1p2AndP3HaveSameAnswer() {
    int numSame = 0;
    int numCorrect = 0;
    for ( int i=0; i<N; i++ )
    {
      char c1 = person1[i];
      char c2 = person2[i];
      char c3 = person3[i];
      if (c1==c2 && c2==c3) {
        numSame++;
        if (c1=='a') numCorrect++;
      }
    }
    float percentCorrect = (float)numCorrect/(float)numSame * 100.0f;
    System.out.println("\nP1 = P2 = P3");
    System.out.println("numSame: " + numSame);
    System.out.println("numCorrect: " + numCorrect);
    System.out.println("% correct:  " + percentCorrect);
  }
}

(In retrospect I can write that code much better. I just whipped this up one day in 2005 on my day off.)
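As one idea of what a tidier version could look like, here is a minimal sketch that folds the three populate methods into a single helper and handles any number of people with one comparison method. The names `CompactRandomTests`, `simulatePerson`, and `percentCorrectWhenAllAgree` are made up for this illustration; it keeps the same assumptions as the program above (the correct answer is always 'a', and each person is right 80% of the time, independently of the others).

```java
import java.util.Random;

class CompactRandomTests {
    static final int N = 100_000;

    // Fill an array so that each entry is 'a' (the correct answer) with probability 0.8.
    static char[] simulatePerson(Random random) {
        char[] answers = new char[N];
        for (int i = 0; i < N; i++) {
            answers[i] = random.nextInt(100) < 80 ? 'a' : 'b';
        }
        return answers;
    }

    // Among the questions where every person gave the same answer,
    // return the percentage where that shared answer was 'a'.
    static double percentCorrectWhenAllAgree(char[]... people) {
        int numSame = 0;
        int numCorrect = 0;
        for (int i = 0; i < N; i++) {
            boolean same = true;
            for (char[] person : people) {
                if (person[i] != people[0][i]) same = false;
            }
            if (same) {
                numSame++;
                if (people[0][i] == 'a') numCorrect++;
            }
        }
        return 100.0 * numCorrect / numSame;
    }

    public static void main(String[] args) {
        Random random = new Random();
        char[] p1 = simulatePerson(random);
        char[] p2 = simulatePerson(random);
        char[] p3 = simulatePerson(random);
        System.out.println("P1 = P2, % correct:      " + percentCorrectWhenAllAgree(p1, p2));
        System.out.println("P1 = P2 = P3, % correct: " + percentCorrectWhenAllAgree(p1, p2, p3));
    }
}
```

Because the random generator is unseeded, the exact percentages will differ a little from run to run.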

Java Monte Carlo simulation - Results and discussion

Given this program, here are my results for one run:

# of cases where person1 and person2 have 'a' for an answer: 64215 out of 100000

P1 = P2
numSame: 68193
numCorrect: 64215
% correct:  94.16656

P1 = P2 = P3
numSame: 52190
numCorrect: 51532
% correct:  98.73923

Based on this run, and several similar runs, here are my results:

When looking at the data from two simulated people:

If one person is right about something 80% of the time,
and a second person is also right about the same thing 80% of the time,
and they both have the same answer,
there is a ~94% chance that the answer is correct.

As you can see from the current version of the program, I also ran the same test with three "people" and the "correctness" increased to ~98%.

The lesson of this for me is this: I always knew of Monte Carlo simulations before, but I never knew why they were useful. But now I know. When you can't figure out the right statistical algorithm for a given problem, run a few thousand simulations so you'll know what the answer "should" be.

Comments

Permalink

No need for simulations for this. Just basic probability theory (as I was taught at the age of 14).

Begin by assuming that each person's odds of being right or wrong are independent of the others'. This is a very strong assumption, especially if these people have the same hints/observations/worldview in relation to the question you're asking them, but you're inadvertently making it anyway.

The two people have four possible answers, one of which is "they both agree and they're right" (P=0.8*0.8=0.64), another "they agree and they're wrong" (P=0.2*0.2=0.04), and the other two are uninteresting to the question you ask.

The probability of both being right, conditioned on both giving the same answer, is thus:

0.64/(0.64+0.04) = 0.94117...

With three people having to be coincident, it is:
(0.8^3)/(0.8^3+0.2^3) = 0.98461...
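The two formulas above follow one pattern, p^k / (p^k + (1-p)^k), which can be checked with a few lines of Java. This is only a sketch under the same assumptions as the comment: the people are independent, equally reliable, and the question has just two possible answers, so everyone who is wrong gives the same wrong answer. The class and method names are invented for the example.

```java
class AgreementOdds {

    // p is each person's chance of being right; k is how many people agree.
    // Returns the probability that the shared answer is right, given that
    // all k people gave the same answer.
    static double probCorrectGivenAgreement(double p, int k) {
        double allRight = Math.pow(p, k);        // everyone is right
        double allWrong = Math.pow(1.0 - p, k);  // everyone is wrong together
        return allRight / (allRight + allWrong);
    }

    public static void main(String[] args) {
        for (int k = 1; k <= 3; k++) {
            System.out.println(k + " agreeing: " + probCorrectGivenAgreement(0.8, k));
        }
    }
}
```

With p = 0.8 this gives roughly 0.80, 0.941, and 0.985 for one, two, and three people in agreement.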

However, beware of your original question:

"If two of the people have the same answer, what are the odds that they are correct?"

Note that if three people answer a yes/no question, there will always be two who coincide. No conditioned probability here. What are the odds that two [or more] of the three will be right?

0.8^3+3*0.2*0.8^2 = 0.896
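That sum is a binomial tail: the chance that at least 2 of 3 independent people are right. A small sketch of the calculation in Java, with hypothetical names `MajorityOdds`, `probAtLeast`, and `binomial`, and assuming independence as above:

```java
class MajorityOdds {

    // Probability that at least `needed` of n independent people are right,
    // where each one is right with probability p.
    static double probAtLeast(int n, int needed, double p) {
        double total = 0.0;
        for (int k = needed; k <= n; k++) {
            total += binomial(n, k) * Math.pow(p, k) * Math.pow(1.0 - p, n - k);
        }
        return total;
    }

    // "n choose k", computed iteratively; fine for the small n used here.
    static double binomial(int n, int k) {
        double result = 1.0;
        for (int i = 1; i <= k; i++) {
            result = result * (n - k + i) / i;
        }
        return result;
    }

    public static void main(String[] args) {
        // At least 2 of 3 people right, each right 80% of the time.
        System.out.println(probAtLeast(3, 2, 0.8));
    }
}
```

For n = 3, needed = 2, and p = 0.8 the result is 0.896, up to floating-point rounding.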

I can't understand how your statistician friends couldn't answer this correctly. As said, this is high-school math!

Permalink

I'd say your statisticians answered a different question to the one that your MC solved. I initially agreed with your statisticians, and wondered where you were going with the whole MC thing. Then I had a look at your MC, and it solves a different problem to the question I had thought you asked. I haven't seen Minority Report, so that could be why I didn't make the 'correct' assumptions. It depends which question you meant to ask!

If you have 3 people who are right 80% of the time, but you ask only two of them, and they both answer "Yes", then the likelihood that "Yes" is the correct answer is about 94%. However, if you ask all three of them a Yes/No question, and two people say "Yes", and one person says "No", then the probability of "Yes" being true is in fact 80%.

If this is the question you want to solve, then run the MC again, but exclude cases where all three agree. Then count the proportion where the majority were actually right. It will turn out to be 80%.
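The rerun described above could be sketched like this. It is not the original program: the class name, method name, and fixed seed are made up for the example, and it assumes a yes/no question with three independent people who are each right with probability p. Only the 2-versus-1 splits are counted.

```java
import java.util.Random;

class SplitVoteSimulation {

    // Returns the fraction of 2-vs-1 split votes in which the majority was right.
    static double majorityRightOnSplit(int trials, double p, long seed) {
        Random random = new Random(seed);
        int splits = 0;
        int majorityRight = 0;
        for (int i = 0; i < trials; i++) {
            // Count how many of the three people are right on this question.
            int right = 0;
            for (int person = 0; person < 3; person++) {
                if (random.nextDouble() < p) right++;
            }
            // right == 0 or right == 3 means all three agree, so skip those cases.
            if (right == 1 || right == 2) {
                splits++;
                if (right == 2) majorityRight++;
            }
        }
        return (double) majorityRight / splits;
    }

    public static void main(String[] args) {
        System.out.println(majorityRightOnSplit(1_000_000, 0.8, 42L));
    }
}
```

The printed fraction should land close to 0.8, matching 0.8^2*0.2 / (0.8^2*0.2 + 0.2^2*0.8) = 0.128/0.160.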

I'm not sure which question you are trying to solve. It sure sounds like my question. Since you have supposed the existence of three persons, but only mentioned that two of them agree, presumably the third person does not agree. The MC that you used, and the answer that Anonymous provided, is correct only in the case where the answer of the third person is unknown.

This just shows how important it is to be perfectly clear with the statistical question you ask!

Thank you very much for your comments. I have to say, I probably don't know how to phrase my own question at this point, lol. I'm not sure I understand the difference between (a) two people having the same answer, and (b) two people having the same answer, and a third person having a different answer. If I take this to an extreme, and say 1,000 people agree on an answer, but one person disagrees, I don't think the odds that the 1,000 people are right is still 80%, but again, I think that's on me to figure out, you've certainly tried to explain it. (Or perhaps I just misstated the problem again, lol.)

FWIW, the scenario from the Minority Report involves having three people that are capable of seeing the future, and the three of them work together to report "pre-crimes", which are crimes that someone will commit in the future. They didn't really get into the theme of two people having the same answer and a third person having a different answer, and I don't want to give away any more than this, but the movie just stimulated this thinking more than anything else.

Permalink

Well, actually the problem isn't well defined enough to have a correct answer. First you need to assume that the two outcomes are equally probable a priori (before you get information from the predictors), and that the probabilities of them being correct are independent. If you make these assumptions the correct answer is 0.8.

"I'm not sure I understand the difference between (a) two people having the same answer, and (b) two people having the same answer, and a third person having a different answer"

What about two people giving one answer and *two* others giving the other answer? I think this thought experiment makes it easy to see (from symmetry) that the probability of the first two being correct is not 0.8.

"If I take this to an extreme, and say 1,000 people agree on an answer, but one person disagrees, I don't think the odds that the 1,000 people are right is still 80%"

You're right. They're the same as the odds that 999 people who all agree are correct. An intuitive way to think of this is that each "disagreeing" prediction cancels one of the "agreeing" predictions. So 2 out of 3 gives you the same probability as 1 out of 1, or 3 out of 5, etc.
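The cancellation can be checked numerically with Bayes' rule. The sketch below assumes what this comment assumes: two equally likely outcomes a priori and independent predictors who are each right with probability p. `VoteCancellation` and `posterior` are names invented for the example.

```java
class VoteCancellation {

    // Probability that the answer backed by `agree` people is right when
    // `disagree` people back the other answer (equal priors, independence).
    static double posterior(double p, int agree, int disagree) {
        // Likelihood of this vote pattern if the `agree` side is right ...
        double ifAgreeSideRight = Math.pow(p, agree) * Math.pow(1.0 - p, disagree);
        // ... and if the `disagree` side is right instead.
        double ifAgreeSideWrong = Math.pow(1.0 - p, agree) * Math.pow(p, disagree);
        return ifAgreeSideRight / (ifAgreeSideRight + ifAgreeSideWrong);
    }

    public static void main(String[] args) {
        System.out.println("1 vs 0: " + posterior(0.8, 1, 0));
        System.out.println("2 vs 1: " + posterior(0.8, 2, 1));
        System.out.println("3 vs 2: " + posterior(0.8, 3, 2));
        System.out.println("2 vs 2: " + posterior(0.8, 2, 2));
    }
}
```

The first three cases all come out at about 0.8, and the 2-versus-2 tie comes out at 0.5, which is the symmetry argument made earlier in this comment.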

If you want you can read up on conditional probability and Bayes' rule and do the numbers.

I guess that one lesson of this is that Monte Carlo methods are not only powerful, but also dangerous - you need to know the theory well enough to know you're solving the correct problem :-)
