Ai Cheat Sheet

Hypothesis Test


Last updated 4 years ago


A hypothesis is a claim or statement about a population parameter such as the mean, variance, or proportion. A hypothesis test uses sample data to assess that claim.

If the population standard deviation σ is known, we perform a z-test; otherwise we perform a t-test. However, based on the central limit theorem, if the sample is large enough, we can still perform a z-test, using the sample standard deviation as an estimate of σ.
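
As a minimal sketch of the difference between the two tests, the snippet below computes a one-sample z statistic (σ treated as known) and a one-sample t statistic (σ estimated from the sample) using only the Python standard library. The data, the hypothesized mean `mu0`, and the "known" `sigma` are made-up values chosen for illustration, and the helper names `z_test` and `t_statistic` are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def z_test(sample, mu0, sigma):
    """Two-sided one-sample z-test; sigma is the known population std dev."""
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / math.sqrt(n))
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def t_statistic(sample, mu0):
    """One-sample t statistic; sigma is replaced by the sample std dev."""
    n = len(sample)
    t = (mean(sample) - mu0) / (stdev(sample) / math.sqrt(n))
    return t, n - 1  # statistic and degrees of freedom


sample = [5.1, 4.9, 5.6, 5.8, 5.2, 5.4, 5.0, 5.5]  # illustrative data
z, p = z_test(sample, mu0=5.0, sigma=0.3)           # sigma assumed known
t, dof = t_statistic(sample, mu0=5.0)

print(f"z = {z:.3f}, two-sided p = {p:.4f}")
print(f"t = {t:.3f} with {dof} degrees of freedom")
```

The p-value for the t statistic comes from Student's t distribution with `dof` degrees of freedom, which the standard library does not provide; `scipy.stats.ttest_1samp` is one common way to obtain it.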

Alternative Discussion

A Statistical Hypothesis is a probabilistic explanation about the presence of a relationship between observations.

For example, we may be interested in evaluating the relationship between the means of two samples, that is, whether the samples were drawn from the same distribution or whether there is a difference between them.

One hypothesis is that there is no difference between the population means, based on the data samples. This is a hypothesis of no effect and is called the null hypothesis. We can use a statistical hypothesis test to either reject this hypothesis or fail to reject (retain) it.

We don’t say “accept” because the outcome is probabilistic: failing to reject the null hypothesis only means the data did not provide enough evidence against it, and the conclusion could still be wrong.

If the null hypothesis is rejected, then we assume the alternative hypothesis that there exists some difference between the means.

Null Hypothesis (H0): Suggests no effect.

Alternative Hypothesis (H1): Suggests some effect.
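
The reject / fail-to-reject decision can be sketched with a permutation test for a difference in means, again using only the standard library. This is one illustrative test among many, not the only way to compare two samples; the data, the significance level `alpha = 0.05`, and the helper names `permutation_test` and `decide` are assumptions made for this example.

```python
import random


def permutation_test(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    H0: both samples come from the same distribution (no effect).
    H1: the means differ (some effect).
    """
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # relabel the observations at random
        left, right = pooled[:len(a)], pooled[len(a):]
        diff = abs(sum(left) / len(left) - sum(right) / len(right))
        # Small tolerance guards against floating-point ties.
        if diff >= observed - 1e-12:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (extreme + 1) / (n_permutations + 1)


def decide(p_value, alpha=0.05):
    """Never 'accept H0': we either reject it or fail to reject it."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


control = [12.1, 11.8, 12.4, 12.0, 11.9, 12.2]  # illustrative data
treated = [13.0, 12.9, 13.4, 12.8, 13.1, 13.3]
similar = [12.0, 12.3, 11.9, 12.1, 12.2, 11.8]

p_diff = permutation_test(control, treated)
p_same = permutation_test(control, similar)
print("control vs treated:", decide(p_diff))
print("control vs similar:", decide(p_same))
```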

Machine Learning Hypothesis

A machine learning hypothesis is a candidate model that approximates a target function for mapping inputs to outputs.

Learning is a search through the space of possible hypotheses for one that will perform well, even on new examples beyond the training set. The choice of algorithm and algorithm configuration involves choosing a hypothesis space that is believed to contain a hypothesis that is a good or best approximation for the target function.
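
A toy sketch of this search: the hypothesis space below is a coarse grid of lines `y = w*x + b`, and "learning" is picking the candidate with the lowest training error, then checking it on held-out examples. The data points, the grid, and the use of mean squared error are all assumptions made for illustration; real algorithms search far larger spaces with far better strategies than exhaustive enumeration.

```python
# Toy data, roughly following y = 2x + 1 (the target is unknown to the learner).
train = [(0.0, 1.1), (1.0, 2.9), (2.0, 5.2), (3.0, 6.8)]
held_out = [(4.0, 9.1), (5.0, 10.9)]


def mse(hypothesis, data):
    """Mean squared error of the line (w, b) on a list of (x, y) pairs."""
    w, b = hypothesis
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


# Hypothesis space: every line with slope in [0, 4] and intercept in [-2, 3],
# both in steps of 0.5.
hypothesis_space = [(0.5 * i, 0.5 * j) for i in range(0, 9) for j in range(-4, 7)]

# Learning as search: keep the candidate that fits the training set best.
best = min(hypothesis_space, key=lambda h: mse(h, train))

print("chosen hypothesis (w, b):", best)
print("training error:", mse(best, train))
print("held-out error:", mse(best, held_out))
```

Choosing the grid is the "choice of hypothesis space" step: if the grid did not contain a line close to the target, no amount of searching within it would find a good approximation.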

Reasons for Hypothesis Tests

  1. Normality Tests: tests that you can use to check if your data has a Gaussian distribution.

    1. Shapiro-Wilk Test

    2. D’Agostino’s K^2 Test

    3. Anderson-Darling Test

  2. Correlation Tests: tests that you can use to check if two samples are related.

    H0: the two samples are independent.

    H1: there is a dependency between the samples.

    1. Pearson’s Correlation Coefficient

    2. Spearman’s Rank Correlation

    3. Kendall’s Rank Correlation

    4. Chi-Squared Test

  3. Stationarity Tests: tests that you can use to check if a time series is stationary or not.

    1. Augmented Dickey-Fuller

    2. Kwiatkowski-Phillips-Schmidt-Shin

  4. Parametric Statistical Hypothesis Tests: tests that you can use to compare data samples.

    H0: the means of the samples are equal.

    H1: the means of the samples are unequal.

    1. Student’s t-test

    2. Paired Student’s t-test

    3. Analysis of Variance Test (ANOVA)

    4. Repeated Measures ANOVA Test

  5. Nonparametric Statistical Hypothesis Tests: tests that you can use to compare data samples without assuming that the data is Gaussian.

    H0: the distributions of both samples are equal.

    H1: the distributions of both samples are not equal.

    1. Mann-Whitney U Test

    2. Wilcoxon Signed-Rank Test

    3. Kruskal-Wallis H Test

    4. Friedman Test
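
As a rough sketch, one test from four of the families above can be run with SciPy as shown below. The synthetic data, the seed, and the variable names are assumptions for illustration; the functions used (`shapiro`, `pearsonr`, `ttest_ind`, `mannwhitneyu`) are from `scipy.stats`. Stationarity tests are omitted here because they are usually taken from statsmodels (`adfuller` and `kpss` in `statsmodels.tsa.stattools`) rather than SciPy.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Synthetic samples (illustrative only).
a = rng.normal(loc=50.0, scale=5.0, size=200)
b = 0.8 * a + rng.normal(loc=0.0, scale=3.0, size=200)  # depends on a
c = rng.normal(loc=53.0, scale=5.0, size=200)           # shifted mean

# 1. Normality: H0 = the sample was drawn from a Gaussian distribution.
_, p_shapiro = stats.shapiro(a)

# 2. Correlation: H0 = no linear relationship between the samples.
_, p_pearson = stats.pearsonr(a, b)

# 4. Parametric comparison: H0 = the means of the samples are equal.
_, p_ttest = stats.ttest_ind(a, c)

# 5. Nonparametric comparison: H0 = the distributions are equal.
_, p_mwu = stats.mannwhitneyu(a, c, alternative="two-sided")

alpha = 0.05  # significance level, chosen by convention
for name, p in [
    ("Shapiro-Wilk", p_shapiro),
    ("Pearson", p_pearson),
    ("Student's t-test", p_ttest),
    ("Mann-Whitney U", p_mwu),
]:
    verdict = "reject H0" if p < alpha else "fail to reject H0"
    print(f"{name}: p = {p:.4g} -> {verdict}")
```

Each call returns a test statistic and a p-value; the interpretation is the same in every case, comparing the p-value to the chosen significance level.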

References

What is a Hypothesis in Machine Learning? - Machine Learning Mastery
17 Statistical Hypothesis Tests in Python (Cheat Sheet) - Machine Learning Mastery