Bayes Theorem and Bayesian Inference

What Is Bayes Theorem?

Bayes Theorem is widely used in statistics and probability, making it central to the field of data science. At its essence, Bayesian inference is a continuous re-evaluation of a hypothesis as data becomes available.

What Are its Applications?

Many and varied. From determining the likelihood that a credit card transaction is fraudulent to assessing the accuracy of a medical test.

How Does it Work?

Bayesian inference evaluates the probability of an event, given specific evidence.

A few things to keep in mind:

  • Events must be independent

  • Tests are not perfect, and so produce false positives (a transaction evaluated as fraudulent when it is non-fraudulent), and false negatives (a fraudulent transaction misclassified as non-fraudulent)

An Example

Consider a test for cancer where we want to ascertain the probability of a patient having cancer given a particular test result.

  • Chances a patient has this type of cancer are 1%, written as P(C) = 1% – the prior probability

  • Test result is 90% Positive if you have C – written as P(Pos | C) – the sensitivity (we can take 100%-90% = 10% as the remaining Positive percentage where there is no C but the test misdiagnoses it – the false positives

  • Test result is 90% Negative if you do not have C, written as P(Neg | ¬C) – the specificity (we can take 100%-90% = 10% as the percentage of negative results, but there is C, which the test misses – the false negatives

Bayes_1.png


  • Our Posterior probability is what we’re trying to predict – the chances of cancer actually being present, given a Positive Test – written as P( C | Pos ) – that is, we account for the chances of false positives and false negatives

  • Posterior P( C | Pos ) = P ( Pos | C) x P( C ) = .9 x .001 = 0.009

  • While P( ¬C | Pos) = P ( Pos | ¬C) x P(¬C) = .1 x .99 = 0.099

Bayes_2.png

We also need to account for the number of ways it could happen given all possible outcomes.

Baes_3.png

The chance of getting a real, positive result is .009. The chance of getting any type of positive result is the chance of a true positive plus the chance of a false positive (0.009 + 0.099 = 0.108).

So, our actual posterior probability of cancer given a positive test is .009/.108 = 0.0883, or about 8.3%.

In Bayes Theorem terms, this is written as follows, where c is the chance a patient has cancer, and x is the positive result

Bayes_4.png
  • P(c|x) = Chance of having cancer (c) given a positive test (x). This is what we want to know: How likely is it to have cancer with a positive result? In our case it was 8.3%.

  • P(x|c) = Chance of a positive test (x) given that you had cancer (c). This is the chance of a true positive, 90% in our case.

  • P(c) = Chance of having cancer (1%).

  • P(¬ c) = Chance of not having cancer (99%).

  • P(x|¬ c) = Chance of a positive test (x) given that you didn’t have cancer (¬ c). This is a false positive, 9.9% in our case.

Refrences

Previous
Previous

How To: Solve Business Problems with Data

Next
Next

Amazon Web Services: The Elephant in the Cloud