Bayesian Statistics

Bayes Theorem

Bayes’ theorem is so fundamental and ubiquitous that an entire field called “Bayesian statistics” exists.

In Bayesian statistics, the probability of an event or hypothesis is updated as evidence comes into play.

Therefore, the prior and posterior probabilities of a hypothesis differ depending on the evidence observed.

Bayes’ Theorem

$$P(A|B)=\frac{P(B|A) \times P(A)}{P(B)}$$

P(A|B) = the posterior probability of the hypothesis “A” given the evidence “B”
P(B|A) = the likelihood of the evidence “B” given that the hypothesis “A” is true
P(A) = the prior probability of “A” (the marginal probability of the event “A”)
P(B) = the prior probability that the evidence itself is true

The same theorem is often written in terms of a hypothesis H and evidence E:

$$P(H|E)=\frac{P(E|H) \times P(H)}{P(E)}$$
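The posterior is computed directly from the three quantities on the right-hand side. As a quick illustration, here is a minimal sketch in Python (the function name, argument names, and numbers are my own, chosen only for illustration):

```python
def bayes_posterior(p_e_given_h: float, p_h: float, p_e: float) -> float:
    """Return the posterior P(H|E) = P(E|H) * P(H) / P(E)."""
    return p_e_given_h * p_h / p_e

# Hypothetical values, purely for illustration:
print(bayes_posterior(p_e_given_h=0.8, p_h=0.3, p_e=0.5))  # 0.48
```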

Example

Team A and Team B have played each other 10 times, and Team A has won 9 of those times. If the teams are playing each other tonight, and I ask you who you think will win, you’d probably say Team A! What if I also told you that Team B has bribed tonight’s referees? Well, then you might guess that Team B will win.

Bayesian statistics allows you to incorporate this extra information into your calculations, while frequentist statistics focuses solely on the 9 out of 10 win percentage.

The conditional probability of H given E, written P(H|E), represents the probability of H occurring given that E also occurs (or has occurred). In our example, H is the hypothesis that Team B will win, and E is the evidence I gave you about Team B bribing the referees.

P(H) is the frequentist probability that Team B wins, 10% (1 win in the previous 10 match-ups). P(E|H) is the probability that what I told you about the bribe is true, given that Team B wins. (If Team B wins tonight, would you believe what I told you?)

Finally, P(E) is the probability that Team B has in fact bribed the referees. Am I a trustworthy source of information? You can see that this approach incorporates more information than just the outcomes of the two teams’ previous 10 match-ups.
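To make this concrete, suppose we plug in some illustrative numbers (these values are assumptions for the sake of the example, not part of the original story): P(H) = 0.10 from the match history, P(E|H) = 0.90 (if Team B really does win tonight, the bribe story looks very believable), and P(E) = 0.15 (how likely the bribe claim is to be true overall). Then

$$P(H|E) = \frac{0.90 \times 0.10}{0.15} = 0.60$$

so the evidence about the bribe raises the estimated probability of Team B winning from 10% to 60%.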

Bayes Theorem to Naive Bayes Classifier

The naive Bayes algorithm is built by combining Bayes’ theorem with a “naive” assumption: the features are independent of each other, i.e. there is no correlation between features.

The direct application of Bayes’ theorem to classification becomes intractable as the number of variables or features (n) grows, because it requires estimating the joint probability of every combination of feature values. Instead, we can simplify the calculation by assuming that each input variable is independent of the others given the class.

Although this is a dramatic simplification, the resulting calculation often gives very good performance, even when the input variables are in fact highly dependent.

We can implement this from scratch by assuming a probability distribution for each input variable, calculating the probability of each specific input value belonging to each class, and multiplying the results together to give a score used to select the most likely class (see the sketch after the formula below).

Under the independence assumption, the score for a class $y_i$ given an input $(x_1, x_2, ..., x_n)$ is:

$$P(y_i|x_1,x_2,...,x_n) \propto P(x_1|y_i) \times P(x_2|y_i) \times ... \times P(x_n|y_i) \times P(y_i)$$
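As an illustration, here is a minimal from-scratch sketch of this idea in Python, assuming each feature follows a Gaussian distribution within each class. The toy data, function names, and values are invented for the example, not taken from the original text; a production implementation would typically use a library such as scikit-learn.

```python
import numpy as np

# Toy training data (made up for the example): two features per sample, two classes.
X = np.array([[1.0, 2.1], [1.2, 1.9], [0.9, 2.2],   # class 0
              [3.0, 4.1], [3.2, 3.9], [2.9, 4.2]])  # class 1
y = np.array([0, 0, 0, 1, 1, 1])

def fit_gaussian_nb(X, y):
    """Estimate the class prior P(y) and per-feature Gaussian parameters for each class."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = {
            "prior": len(Xc) / len(X),     # P(y = c)
            "mean": Xc.mean(axis=0),       # per-feature mean within class c
            "var": Xc.var(axis=0) + 1e-9,  # per-feature variance (small term avoids division by zero)
        }
    return params

def gaussian_pdf(x, mean, var):
    """P(x_j | y = c) under the Gaussian assumption, evaluated per feature."""
    return np.exp(-((x - mean) ** 2) / (2 * var)) / np.sqrt(2 * np.pi * var)

def predict(x, params):
    """Score each class as P(y) * prod_j P(x_j | y) and return the highest-scoring class."""
    scores = {c: p["prior"] * np.prod(gaussian_pdf(x, p["mean"], p["var"]))
              for c, p in params.items()}
    return max(scores, key=scores.get)

params = fit_gaussian_nb(X, y)
print(predict(np.array([1.1, 2.0]), params))  # expected: 0
print(predict(np.array([3.1, 4.0]), params))  # expected: 1
```

In practice, the product of many small probabilities underflows quickly, so implementations usually sum log-probabilities instead of multiplying raw probabilities.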