Bayes Theorem and Bayesian Inference
What Is Bayes Theorem?
Bayes Theorem is widely used in statistics and probability, making it central to the field of data science. At its essence, Bayesian inference is a continuous re-evaluation of a hypothesis as data becomes available.
What Are its Applications?
Many and varied – from determining the likelihood that a credit card transaction is fraudulent to assessing the accuracy of a medical test.
How Does it Work?
Bayesian inference evaluates the probability of an event, given specific evidence.
A few things to keep in mind:
When combining multiple pieces of evidence (as in the Naive Bayes classifier), the pieces of evidence are assumed to be independent
Tests are not perfect, and so produce false positives (a transaction evaluated as fraudulent when it is non-fraudulent), and false negatives (a fraudulent transaction misclassified as non-fraudulent)
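The two kinds of error above can be made concrete with a small sketch. The function name `outcome` and the fraud framing are illustrative, not from any particular library:

```python
# Illustrative sketch: how a test's predictions break down against reality.
# A "positive" prediction on a negative case is a false positive;
# a "negative" prediction on a positive case is a false negative.
def outcome(actual_fraud, predicted_fraud):
    if actual_fraud and predicted_fraud:
        return "true positive"
    if not actual_fraud and predicted_fraud:
        return "false positive"   # legitimate transaction flagged as fraud
    if actual_fraud and not predicted_fraud:
        return "false negative"   # fraudulent transaction missed
    return "true negative"

print(outcome(False, True))   # false positive
print(outcome(True, False))   # false negative
```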
An Example
Consider a test for cancer where we want to ascertain the probability of a patient having cancer given a particular test result.
Chances a patient has this type of cancer are 1%, written as P(C) = 1% – the prior probability
Test result is 90% Positive if you have C – written as P(Pos | C) – the sensitivity. The remaining 100% − 90% = 10% are cases where C is present but the test comes back Negative – the false negatives, P(Neg | C)
Test result is 90% Negative if you do not have C, written as P(Neg | ¬C) – the specificity. The remaining 10% are Positive results where there is no C – the false positives, P(Pos | ¬C)
Our posterior probability is what we're trying to predict – the chance that cancer is actually present given a positive test – written as P(C | Pos). To compute it, we must account for the chance that a positive result is a false positive
The unnormalized posterior P(C | Pos) ∝ P(Pos | C) × P(C) = 0.9 × 0.01 = 0.009
While P(¬C | Pos) ∝ P(Pos | ¬C) × P(¬C) = 0.1 × 0.99 = 0.099
We also need to account for the number of ways it could happen given all possible outcomes.
The chance of getting a real, positive result is .009. The chance of getting any type of positive result is the chance of a true positive plus the chance of a false positive (0.009 + 0.099 = 0.108).
So, our actual posterior probability of cancer given a positive test is 0.009 / 0.108 ≈ 0.0833, or about 8.3%.
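The arithmetic above can be checked with a few lines of code. This is a minimal sketch using only the numbers given in the text; the variable names are my own:

```python
# Posterior probability of cancer given a positive test.
p_c = 0.01                # P(C): prior probability of cancer
p_not_c = 0.99            # P(not C)
p_pos_given_c = 0.90      # sensitivity, P(Pos | C)
p_pos_given_not_c = 0.10  # false positive rate, P(Pos | not C)

true_pos = p_pos_given_c * p_c            # 0.009, a real positive
false_pos = p_pos_given_not_c * p_not_c   # 0.099, a false alarm
p_pos = true_pos + false_pos              # 0.108, any positive result

posterior = true_pos / p_pos
print(round(posterior, 4))  # 0.0833
```

Note how the low prior (1%) dominates: even with a 90% accurate test, a positive result means cancer only about 8.3% of the time.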
In Bayes Theorem terms, this is written as follows, where c is the event that a patient has cancer, and x is a positive test result:
P(c|x) = Chance of having cancer (c) given a positive test (x). This is what we want to know: How likely is it to have cancer with a positive result? In our case it was 8.3%.
P(x|c) = Chance of a positive test (x) given that you had cancer (c). This is the chance of a true positive, 90% in our case.
P(c) = Chance of having cancer (1%).
P(¬ c) = Chance of not having cancer (99%).
P(x|¬ c) = Chance of a positive test (x) given that you didn't have cancer (¬ c). This is the false positive rate, 10% in our case.
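The quantities defined above combine into the full formula P(c|x) = P(x|c)·P(c) / [P(x|c)·P(c) + P(x|¬c)·P(¬c)], which can be wrapped in a small helper. The function name `bayes_posterior` is my own, chosen for illustration:

```python
def bayes_posterior(prior, sensitivity, false_positive_rate):
    """Return P(c | x): probability of the condition given a positive test.

    prior               -- P(c), base rate of the condition
    sensitivity         -- P(x | c), true positive rate
    false_positive_rate -- P(x | not c)
    """
    true_pos = sensitivity * prior
    false_pos = false_positive_rate * (1 - prior)
    return true_pos / (true_pos + false_pos)

# The cancer example from the text:
print(round(bayes_posterior(0.01, 0.90, 0.10), 4))  # 0.0833
```

The same helper applies unchanged to the fraud-detection example: plug in the base rate of fraud, the detector's sensitivity, and its false positive rate.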