Ai Cheat Sheet

Confidence Interval


Last updated 5 years ago


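A confidence interval proposes a range of plausible values for an unknown parameter. The interval has an associated confidence level, which the investigator chooses (95% and 99% are common choices).

The confidence level is the long-run proportion of intervals, built the same way from repeated samples, that would contain the unknown parameter. It is often loosely described as the probability that the parameter lies in the stated interval, but the parameter is fixed; it is the interval that changes from sample to sample.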

Imagine you want to find the mean height of all the people in a particular US state. You could go to each person in that state and ask for their height, or you could do the smarter thing and take a sample of 1000 people in the state.

Then you can use the mean of their heights (the estimated mean) to estimate the average height in the state (the true mean).
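
As a rough illustration of the sampling idea above, here is a minimal sketch in Python. The population is synthetic: the 170 cm mean, 10 cm spread, and population size are assumptions made up for the example, not real height data.

```python
import random
import statistics

random.seed(0)

# Hypothetical population of heights in cm (synthetic, assumed roughly normal).
population = [random.gauss(170, 10) for _ in range(200_000)]
true_mean = statistics.fmean(population)

# Instead of measuring everyone, measure a random sample of 1000 people.
sample = random.sample(population, 1000)
estimated_mean = statistics.fmean(sample)

print(f"true mean:      {true_mean:.2f}")
print(f"estimated mean: {estimated_mean:.2f}")
```

The two numbers should land close to each other, but not exactly on top of each other. That gap is what a confidence interval tries to quantify.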

Calculating Confidence Interval

We cast a net around the value we know, the sample mean $\overline{x}$.

To get such a range, we go 1.96 standard errors away from $\overline{x}$ in both directions, where the standard error of the mean is $\frac{s}{\sqrt{n}}$. This range is the 95% confidence interval.

When we say that we estimate the true mean to be $\overline{x}$ with a 95% confidence interval of $\left[\overline{x} - 1.96\frac{s}{\sqrt{n}},\ \overline{x} + 1.96\frac{s}{\sqrt{n}}\right]$, we are saying that intervals built this way capture the true population mean in about 95% of repeated samples. This is often shortened to "the true mean is within these limits with 95% probability", although strictly the 95% describes the procedure, not any single interval.

When you take a 99% CI, you increase the proportion covered and thus cast a wider net: 2.576 standard errors in each direction instead of 1.96. (Three standard errors would correspond to roughly 99.7%, not 99%.)

In general, the interval is:

$$\overline{x} \pm z \frac{s}{\sqrt{n}}$$

Here, $\overline{x}$ is the sample mean (the mean of the 1000 heights we sampled). $z$ is the number of standard errors to go away from the sample mean, set by the level of confidence we want (1.96 for 95%, 2.576 for 99%). $s$ is the standard deviation of the sample. $n$ is the size of the sample.
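
The formula above can be sketched in Python using only the standard library. The function names `mean_confidence_interval` and `coverage` are made up for this example, and the heights are synthetic (assumed normal with mean 170 cm and SD 10 cm). The second function is a small simulation of the "repeated samples" reading of the confidence level.

```python
import math
import random
import statistics
from statistics import NormalDist


def mean_confidence_interval(sample, confidence=0.95):
    """Return (lower, upper) for the mean using x_bar +/- z * s / sqrt(n)."""
    n = len(sample)
    x_bar = statistics.fmean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    # Two-sided critical value of the standard normal:
    # about 1.96 for 95% and about 2.576 for 99%.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin


def coverage(true_mean=170.0, true_sd=10.0, n=1000, trials=500,
             confidence=0.95, seed=0):
    """Fraction of simulated samples whose interval contains the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        lower, upper = mean_confidence_interval(sample, confidence)
        hits += lower <= true_mean <= upper
    return hits / trials


# Synthetic sample of 1000 heights (hypothetical data).
rng = random.Random(1)
heights = [rng.gauss(170, 10) for _ in range(1000)]

ci95 = mean_confidence_interval(heights, 0.95)
ci99 = mean_confidence_interval(heights, 0.99)
cov95 = coverage()

print("95% CI:", ci95)
print("99% CI:", ci99)
print("share of 95% intervals containing the true mean:", cov95)
```

The 99% interval should come out wider than the 95% one, and the simulated share should sit near 0.95. With a small sample (say, under 30), a t critical value is usually preferred over $z$; this sketch sticks to the $z$ form used on this page.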

Reference: Confidence Intervals Explained Simply for Data Scientists (Medium)