
Data Modeling


Last updated 2 years ago


Good data governance requires that stakeholders within an organization understand its data, and that the data is managed in a way that supports the organization's objectives and goals. This shared understanding helps staff define how data can be used to solve business problems. Data modeling is the process of analyzing and defining the business requirements for data in support of the organization.

Skilled data modelers work with business stakeholders, and the end product is a data model.

Model Could Be

  • Conceptual
  • Logical
  • Physical

May Include

  • Data Description
  • Constraints
  • Rules
  • Default Values
  • Security
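As a rough illustration of how several of these elements can appear in a physical model, here is a minimal sketch using Python's built-in sqlite3 module. The `customer` and `customer_order` tables and their columns are hypothetical, invented only for this example. Security is left out because SQLite has no user permissions; in a server database it would typically be expressed with roles and grants.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Hypothetical tables: column names and types act as the data description.
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    email       TEXT NOT NULL UNIQUE                  -- constraint
);
CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL
                REFERENCES customer(customer_id),     -- constraint
    quantity    INTEGER NOT NULL CHECK (quantity > 0), -- rule
    status      TEXT NOT NULL DEFAULT 'pending'       -- default value
);
""")

conn.execute("INSERT INTO customer (customer_id, email) VALUES (1, 'a@example.com')")
conn.execute(
    "INSERT INTO customer_order (order_id, customer_id, quantity) VALUES (10, 1, 3)"
)

# The default fills in the status that the insert did not provide.
status = conn.execute(
    "SELECT status FROM customer_order WHERE order_id = 10"
).fetchone()[0]


def violates_rule(sql):
    """Return True if the database rejects the statement."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


# A zero quantity breaks the CHECK rule; customer 99 does not exist.
rejected_quantity = violates_rule(
    "INSERT INTO customer_order (order_id, customer_id, quantity) VALUES (11, 1, 0)"
)
rejected_customer = violates_rule(
    "INSERT INTO customer_order (order_id, customer_id, quantity) VALUES (12, 99, 1)"
)
```

Putting these rules in the model, rather than only in application code, means every writer to the database is held to the same requirements.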

One common form of data model is the entity-relationship diagram (ERD), also called an ER diagram.

Design Schemas

Star Schema

Star Schema is a modeling technique for designing data warehouses that organizes data into a central fact table and a set of dimension tables. The fact table contains the measures or metrics that the business wants to analyze, while the dimension tables contain the attributes that provide context to the measures. This approach provides a simplified, denormalized view of the data that facilitates fast querying and analysis.

Suppose a retail company wants to analyze sales data for its products across different regions and time periods. In a star schema, the fact table would contain the sales measures, such as revenue and units sold, and the dimension tables would contain the attributes, such as product, region, and date. This approach would allow the company to easily analyze sales trends and patterns across different dimensions.
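The retail example can be sketched with Python's built-in sqlite3 module. The table names (`fact_sales`, `dim_product`, `dim_region`, `dim_date`), the columns, and the sample rows are all assumptions made up for illustration, not a prescribed layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One central fact table surrounded by three dimension tables.
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE dim_region  (region_id  INTEGER PRIMARY KEY, region_name  TEXT);
CREATE TABLE dim_date    (date_id    INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    region_id  INTEGER REFERENCES dim_region(region_id),
    date_id    INTEGER REFERENCES dim_date(date_id),
    units_sold INTEGER,   -- measure
    revenue    REAL       -- measure
);

-- Made-up sample data
INSERT INTO dim_product VALUES (1, 'Laptop'), (2, 'Phone');
INSERT INTO dim_region  VALUES (1, 'North'), (2, 'South');
INSERT INTO dim_date    VALUES (1, 2024, 1), (2, 2024, 2);
INSERT INTO fact_sales VALUES
    (1, 1, 1, 2, 2000.0),
    (2, 1, 1, 5, 2500.0),
    (1, 2, 2, 1, 1000.0),
    (2, 2, 2, 3, 1500.0);
""")

# Analyzing a measure by a dimension takes a single join from the fact table.
revenue_by_region = conn.execute("""
    SELECT r.region_name, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_region AS r ON r.region_id = f.region_id
    GROUP BY r.region_name
    ORDER BY r.region_name
""").fetchall()

print(revenue_by_region)  # [('North', 4500.0), ('South', 2500.0)]
```

Swapping `dim_region` for `dim_product` or `dim_date` in the query gives the same analysis along a different dimension, which is the main convenience of the star layout.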

Snowflake

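The snowflake schema is an extension of the star schema. It still has a central fact table, but its dimension tables are normalized into further sub-dimension tables, which gives the diagram its layered, snowflake-like shape.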
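One way to picture the extra layer is a product dimension whose category is moved into its own lookup table. The sketch below uses Python's sqlite3 module; the `dim_category` table, the other names, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

conn.executescript("""
-- Outer layer: a sub-dimension split out of the product dimension
CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, category_name TEXT);

-- Inner layer: the product dimension now points at the category table
CREATE TABLE dim_product (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT,
    category_id  INTEGER REFERENCES dim_category(category_id)
);

CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    revenue    REAL
);

-- Made-up sample data
INSERT INTO dim_category VALUES (1, 'Computers'), (2, 'Mobile');
INSERT INTO dim_product  VALUES (1, 'Laptop', 1), (2, 'Desktop', 1), (3, 'Phone', 2);
INSERT INTO fact_sales   VALUES (1, 2000.0), (2, 1200.0), (3, 800.0), (3, 700.0);
""")

# Reaching the category attribute now takes two joins instead of one.
revenue_by_category = conn.execute("""
    SELECT c.category_name, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product  AS p ON p.product_id  = f.product_id
    JOIN dim_category AS c ON c.category_id = p.category_id
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()

print(revenue_by_category)  # [('Computers', 3200.0), ('Mobile', 1500.0)]
```

The trade-off is less repeated data in the dimensions at the cost of more joins per query.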

Galaxy

The galaxy schema, also known as a fact constellation, contains two or more fact tables that share dimension tables between them.
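A minimal sketch of this idea, again with Python's sqlite3 module: two fact tables, `fact_sales` and `fact_inventory`, share a single `dim_product` dimension. All names and rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

conn.executescript("""
-- Shared dimension
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT);

-- Two fact tables that both reference the shared dimension
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    units_sold INTEGER
);
CREATE TABLE fact_inventory (
    product_id     INTEGER REFERENCES dim_product(product_id),
    units_in_stock INTEGER
);

-- Made-up sample data
INSERT INTO dim_product    VALUES (1, 'Laptop'), (2, 'Phone');
INSERT INTO fact_sales     VALUES (1, 2), (1, 1), (2, 5);
INSERT INTO fact_inventory VALUES (1, 10), (2, 4);
""")

# Aggregate each fact table on its own first, then join the results through
# the shared dimension; joining the raw fact tables directly would multiply rows.
sold_vs_stock = conn.execute("""
    SELECT p.product_name, s.units_sold, i.units_in_stock
    FROM dim_product AS p
    JOIN (SELECT product_id, SUM(units_sold) AS units_sold
          FROM fact_sales GROUP BY product_id) AS s
      ON s.product_id = p.product_id
    JOIN (SELECT product_id, SUM(units_in_stock) AS units_in_stock
          FROM fact_inventory GROUP BY product_id) AS i
      ON i.product_id = p.product_id
    ORDER BY p.product_name
""").fetchall()

print(sold_vs_stock)  # [('Laptop', 3, 10), ('Phone', 5, 4)]
```

Because both fact tables use the same product keys, measures from different business processes can be compared side by side.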

Star and Snowflake Schema in Data Warehouse with Model Examples (Guru99)