Ai Cheat Sheet

P Value


Last updated 4 years ago


The p-value is the probability of observing data at least as extreme as the data actually observed, given that the null hypothesis is true.

For example, given a random variable A and an observed value x, the p-value of x is the probability that A takes the value x or any value that is equally or less likely to be observed.
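The definition above can be sketched for a discrete random variable. This is a minimal illustration, not part of the original text: the coin-flip setup and the helper name `discrete_p_value` are assumptions chosen for the example.

```python
from math import comb


def discrete_p_value(pmf, x, tol=1e-12):
    """Sum the probabilities of every outcome that is as likely as x or less likely."""
    threshold = pmf[x]
    return sum(p for p in pmf.values() if p <= threshold + tol)


# Hypothetical setup: A = number of heads in 10 flips of a fair coin (H0: the coin is fair).
n = 10
pmf = {k: comb(n, k) / 2**n for k in range(n + 1)}

# Observed value x = 8 heads. Outcomes no more likely than 8 are 0, 1, 2, 8, 9, 10.
p_value = discrete_p_value(pmf, x=8)
print(p_value)  # 0.109375
```

Here the p-value is 112/1024, so 8 heads out of 10 would not be unusual enough to reject a fair coin at the 0.05 level.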

Example

A neurologist is testing the effect of a drug on response time by injecting 100 rats with a unit dose of the drug, subjecting each to neurological stimulus, and recording its response time. The neurologist knows that the mean response time for rats not injected with the drug is 1.2 seconds. The mean of the 100 injected rats' response time is 1.05 seconds with a sample standard deviation of 0.5 seconds.

Do you think that the drug has an effect on response time?

Hypothesis

Null Hypothesis: Drug has no effect. (The mean response time would still be 1.2 sec with the drug.)

Alternative Hypothesis: Drug has an effect. (The mean response time is not equal to 1.2 sec when the drug is given.)

Given,

$\sigma_{\overline{x}}=\frac{\sigma}{\sqrt{n}}=\frac{0.5}{\sqrt{100}}=\frac{0.5}{10}=0.05$ [Best estimate of the standard deviation of the sample mean (the standard error), using the sample standard deviation 0.5 in place of $\sigma$]

$z=\frac{1.2-1.05}{0.05}=3$ [Z-score, or how many standard errors the sample mean is away from the mean under the null hypothesis]

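That means the z-score is 3: the sample mean is 3 standard errors away from the mean expected under the null hypothesis. About 99.7% of a normal distribution lies within 3 standard deviations of its mean, so the probability of landing at least this far away, in either direction, is about 0.3% = 0.003. Therefore, the p-value is approximately 0.003 (more precisely, about 0.0027).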

The probability of getting a result at least as extreme as 1.05 seconds, given that the null hypothesis is true, is about 0.3%; this is the p-value. Because it is far below the usual 0.05 threshold, we reject the null hypothesis.
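The arithmetic of the rat example can be checked with a short script. This is a sketch under the same assumptions as the text (a two-sided z-test with the sample standard deviation standing in for σ); the variable and function names are illustrative.

```python
import math


def two_sided_p_from_z(z):
    """Two-sided p-value for a standard normal statistic: P(|Z| >= |z|)."""
    return math.erfc(abs(z) / math.sqrt(2))


mu0 = 1.2    # mean response time without the drug (null hypothesis)
xbar = 1.05  # mean response time of the injected rats
s = 0.5      # sample standard deviation
n = 100      # number of rats

standard_error = s / math.sqrt(n)   # 0.5 / 10 = 0.05
z = (xbar - mu0) / standard_error   # about -3: three standard errors below 1.2
p_value = two_sided_p_from_z(z)     # about 0.0027

print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

The sign of z only records the direction (the injected rats responded faster); the two-sided p-value depends on its absolute value.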

The p-value tells us how likely it is to get a result at least as extreme as the observed sample statistic (e.g., the sample mean) if the null hypothesis is true.

The threshold for a p-value does not have to be 0.05; it depends on the experimental scenario. A large p-value means the data are consistent with H0 (the default assumption), so we fail to reject it. A small value, such as below 5% (0.05), suggests that the observed result would be unlikely under H0, so we can reject H0 in favor of H1, i.e., something is likely to be different (a significant result).
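To illustrate how the conclusion depends on the chosen threshold, here is a minimal sketch. The helper `decide` and the list of thresholds are hypothetical; the p-value of 0.0027 is the approximate two-sided value for a z-score of 3.

```python
def decide(p_value, alpha=0.05):
    """Compare a p-value against a significance threshold alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


p_value = 0.0027  # approximate two-sided p-value for z = 3

for alpha in (0.05, 0.01, 0.001):
    print(f"alpha = {alpha}: {decide(p_value, alpha)}")
```

With this p-value, H0 is rejected at 0.05 and 0.01 but not at the stricter 0.001 threshold, which is why the threshold should be fixed before looking at the data.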

In this example, we performed a parametric statistical hypothesis test.

In some tests, such as a correlation test, the p-value is the probability of observing an association at least as strong as the one measured if the two variables were in fact independent.

T-value

The larger the absolute value of the t-value, the smaller the p-value, and the greater the evidence against the null hypothesis.
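The relationship between the t-value and the p-value can be sketched numerically. The code below is an illustration only: it integrates the Student's t density with Simpson's rule using the standard library, and the function names, the step count, and the choice of 99 degrees of freedom (n − 1 for the 100 rats) are assumptions made for the example. In practice a statistics library would normally be used instead.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(t * t / df))


def two_sided_p_from_t(t, df, steps=2000):
    """Two-sided p-value P(|T| >= |t|), via Simpson's rule on [0, |t|]."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * total * h / 3  # area on [-|t|, |t|], by symmetry
    return 1.0 - central_area


df = 99  # assumed: n - 1 for a sample of 100
for t in (1.0, 2.0, 3.0):
    print(f"t = {t}: p-value = {two_sided_p_from_t(t, df):.4f}")
```

The printed p-values shrink as |t| grows. For the same statistic value, the t-based p-value is slightly larger than the normal-based one because the t distribution has heavier tails.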

References

P-values Explained By Data Scientist (Medium)