Ai Cheat Sheet

Correlation and Covariance

Last updated 5 years ago

Correlation

  • Positive correlation exists when larger values of $x$ correspond to larger values of $y$ and vice versa.

  • Negative correlation exists when larger values of $x$ correspond to smaller values of $y$ and vice versa.

  • Weak or no correlation exists if there is no such apparent relationship.

Covariance

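Covariance is a measure of how a pair of variables vary together. Its sign gives the direction of the linear relationship: positive when the variables tend to move in the same direction, negative when they move in opposite directions. Its magnitude depends on the units of both variables, so on its own it is hard to read as the strength of the relationship.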

$$cov(x,y)=\frac{1}{n}\sum_i^n(x_i-\overline{x})(y_i-\overline{y})$$
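The formula above can be sketched in plain Python. This is a minimal illustration, not a library implementation: the function name `covariance` and the `hours`/`scores` data are made up for the example, and it divides by $n$ exactly as the formula does.

```python
def covariance(xs, ys):
    """Covariance with the 1/n normalization used in the formula above."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Average of the products of the deviations from each mean.
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


# Hypothetical data: y grows with x, so the covariance is positive.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 6, 8, 10]

print(covariance(hours, scores))        # 4.0
print(covariance(hours, scores[::-1]))  # -4.0 (y shrinks as x grows)
```

Note that many libraries default to the sample covariance, which divides by $n-1$ instead of $n$ (for example, `numpy.cov` does so unless told otherwise), so their results differ slightly from this formula on small datasets.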

Correlation Coefficient

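The correlation coefficient, or Pearson product-moment correlation coefficient, is another measure of the correlation between two variables. You can think of it as a standardized covariance: dividing by the standard deviations removes the units, so the result always lies between $-1$ and $1$.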

$$r_{xy}=\frac{cov(x,y)}{\sigma(x)\sigma(y)}=\frac{\sum_i^n(x_i-\overline{x})(y_i-\overline{y})}{\sqrt{\sum_i^n(x_i-\overline{x})^2\sum_i^n(y_i-\overline{y})^2}}$$
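As a rough sketch of the right-hand form of this formula, the coefficient can be computed from three sums of deviations. The function name `pearson_r` and the sample data are assumptions made for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient from sums of deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        # A constant variable has zero spread, so r is undefined.
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data with a perfect linear relationship.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]

print(pearson_r(xs, ys))                     # 1.0
print(pearson_r(xs, [100 * y for y in ys]))  # 1.0, rescaling y does not change r
```

Unlike the covariance, the result does not change when either variable is rescaled, which is what makes it comparable across datasets.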

Make a scatter plot and look at it! The correlation coefficient only captures linear relationships, so the plot may reveal a relationship that the calculation does not.
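A small made-up example of what the calculation can miss: below, $y$ is completely determined by $x$ (it is $x^2$), yet the coefficient comes out as zero because the relationship is symmetric rather than linear. The helper `pearson_r` and the data are illustrative assumptions.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient (assumes non-constant input)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# y depends entirely on x, but through a parabola rather than a line.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print(abs(r) < 1e-12)  # True: no linear correlation despite a clear pattern
```

A scatter plot of these points would show the parabola immediately, which is why the plot is worth checking before trusting the number.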

Correlation is not causation: a correlation between two variables does not mean that one causes the other.