Bayes’ theorem is so fundamental and ubiquitous that an entire field, “Bayesian statistics”, is built on it.
The prior probability is the belief held before seeing the evidence, and the posterior probability is that belief after accounting for it. The two therefore differ depending on the evidence.
P(A|B) = posterior probability of “A” (the hypothesis) given the evidence “B”
P(B|A) = likelihood of the evidence “B” given that the hypothesis “A” is true
P(A) = prior probability of the hypothesis (the marginal probability of the event “A”)
P(B) = marginal probability of the evidence “B”, whether or not the hypothesis is true
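For reference, the four quantities defined above fit together in the standard statement of Bayes’ theorem. The formula is written here in LaTeX using the same symbols A and B:

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}
```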
The conditional probability of H given E, written P(H|E), represents the probability of H occurring given that E also occurs (or has occurred). In our example, H is the hypothesis that Team B will win, and E is the evidence that I gave you about Team B bribing the referees.
P(H) is the frequentist probability, 10%.
P(E|H) is the probability that what I told you about the bribe is true, given that Team B wins. (If Team B wins tonight, would you believe what I told you?)
P(E) is the probability that Team B has in fact bribed the referees. Am I a trustworthy source of information? You can see that this approach incorporates more information than just the outcomes of the two teams’ previous 10 match-ups.
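To make the update concrete, here is a minimal Python sketch of the calculation for the Team B example. Only the 10% prior comes from the text. The values chosen for the likelihood and for the probability of the evidence are assumptions made up for illustration, and the function name `posterior` is hypothetical.

```python
def posterior(prior_h, likelihood_e_given_h, prob_e):
    """Apply Bayes' theorem: P(H|E) = P(E|H) * P(H) / P(E)."""
    if not 0 < prob_e <= 1:
        raise ValueError("P(E) must be in the interval (0, 1]")
    return likelihood_e_given_h * prior_h / prob_e


p_h = 0.10          # prior from the text: Team B's frequentist win rate
p_e_given_h = 0.60  # ASSUMED: chance the bribe story is true if Team B wins
p_e = 0.20          # ASSUMED: overall chance the bribe story is true

p_h_given_e = posterior(p_h, p_e_given_h, p_e)
print(f"P(H|E) = {p_h_given_e:.2f}")  # prints "P(H|E) = 0.30"
```

Under these assumed numbers the evidence raises the estimate of a Team B win from 10% to 30%. Different assumptions about how trustworthy the source is would give a different posterior.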
The naive Bayes algorithm combines Bayes’ theorem with a “naive” assumption: that the features are independent of each other given the class, with no correlation between them.
The direct application of Bayes’ theorem to classification becomes intractable as the number of variables or features (n) increases. Instead, we can simplify the calculation by assuming that each input variable is independent of the others.
Although this is a dramatic simplification, the simpler calculation often gives very good performance, even when the input variables are highly dependent.
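The simplification can be written out as follows. The notation is an assumption introduced here, not taken from the text: y stands for a class label and x_1, ..., x_n for the input variables. The first line is Bayes’ theorem applied to classification, the second applies the independence assumption, and the third picks the most likely class. The denominator can be dropped because it is the same for every class.

```latex
\begin{align}
P(y \mid x_1, \dots, x_n) &= \frac{P(x_1, \dots, x_n \mid y)\, P(y)}{P(x_1, \dots, x_n)} \\
P(y \mid x_1, \dots, x_n) &\propto P(y) \prod_{i=1}^{n} P(x_i \mid y) \\
\hat{y} &= \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)
\end{align}
```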
We can implement this from scratch by assuming a probability distribution for each input variable, calculating the probability of each specific input value under each class, and multiplying the results together to give a score that is used to select the most likely class.
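The procedure just described could be sketched in Python roughly as follows. This is one possible illustration, not a definitive implementation. It assumes a Gaussian distribution for every input variable, which is a choice made here and not stated in the text. It also multiplies in the class prior, as Bayes’ theorem requires. The function names (`fit`, `class_scores`, `predict`) and the toy data are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev


def gaussian_pdf(x, mu, sigma):
    """Density of a normal distribution N(mu, sigma^2) evaluated at x."""
    exponent = math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))
    return exponent / (math.sqrt(2 * math.pi) * sigma)


def fit(rows, labels):
    """Estimate a prior and per-feature (mean, std) pairs for each class."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    model = {}
    for label, members in by_class.items():
        prior = len(members) / len(rows)
        # zip(*members) regroups the rows column by column (one per feature).
        # A zero std is replaced by a tiny value to avoid division by zero.
        stats = [(mean(col), pstdev(col) or 1e-9) for col in zip(*members)]
        model[label] = (prior, stats)
    return model


def class_scores(model, row):
    """Score each class as prior * product of per-feature densities."""
    scores = {}
    for label, (prior, stats) in model.items():
        score = prior
        for x, (mu, sigma) in zip(row, stats):
            score *= gaussian_pdf(x, mu, sigma)
        scores[label] = score
    return scores


def predict(model, row):
    """Return the class with the highest score."""
    scores = class_scores(model, row)
    return max(scores, key=scores.get)


# Toy data invented for illustration: two features, two classes.
rows = [(1.0, 2.1), (1.2, 1.9), (0.8, 2.0), (3.0, 4.1), (3.2, 3.9), (2.8, 4.0)]
labels = ["a", "a", "a", "b", "b", "b"]

model = fit(rows, labels)
print(predict(model, (1.1, 2.0)))  # a point near the "a" cluster
print(predict(model, (3.1, 4.0)))  # a point near the "b" cluster
```

The scores are unnormalized, so they rank the classes but are not probabilities. With many features the product of small densities can underflow, so practical implementations usually sum log-probabilities instead of multiplying.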