MetricFlow Explained: The Open Source Metric Computation Engine
MetricFlow is the open-source metric computation engine powering dbt Semantic Layer. Learn how MetricFlow works, its query semantics, and its role in the modern data stack.
MetricFlow is an open-source metric computation engine that translates high-level metric queries into optimized SQL. Originally developed by Transform Data and later acquired by dbt Labs, MetricFlow now powers the dbt Semantic Layer. Understanding MetricFlow helps data teams grasp how modern semantic layers convert business metric requests into executable database queries.
What MetricFlow Does
At its core, MetricFlow solves a fundamental problem: how do you define a business metric once and compute it correctly across any set of dimensions and filters?
Consider "Revenue by Region." Sounds simple, but:
- What joins are needed to connect revenue to region?
- What if some orders lack region data?
- What if the underlying data has duplicates that could cause fan-out?
- How should the query be structured for optimal performance?
MetricFlow handles these complexities, ensuring that "Revenue by Region" returns the same correct answer regardless of how the query is constructed.
Core Concepts
Semantic Models
Semantic models describe the data structures MetricFlow operates on:
semantic_models:
- name: orders
model: ref('fct_orders')
defaults:
agg_time_dimension: order_date
entities:
- name: order_id
type: primary
- name: customer_id
type: foreign
- name: product_id
type: foreign
dimensions:
- name: order_date
type: time
type_params:
time_granularity: day
- name: order_status
type: categorical
measures:
- name: order_total
agg: sum
expr: amount
- name: order_count
agg: count
expr: order_id
Entities
Entities define the keys and relationships between semantic models. They tell MetricFlow how to join data:
- Primary: The unique identifier for the model
- Foreign: References to other semantic models
- Unique: Unique but not the primary key
- Natural: Business keys like email or SKU
Measures
Measures are the aggregatable values that form the foundation of metrics:
measures:
- name: revenue
agg: sum
expr: order_amount
- name: order_count
agg: count_distinct
expr: order_id
Metrics
Metrics build on measures with additional semantics:
metrics:
- name: revenue
type: simple
type_params:
measure: revenue
- name: average_order_value
type: derived
type_params:
expr: revenue / order_count
metrics:
- name: revenue
- name: order_count
- name: cumulative_revenue
type: cumulative
type_params:
measure: revenue
window: 1 month
Metric Types
Simple Metrics
Direct exposure of a measure as a metric:
- name: total_orders
type: simple
type_params:
measure: order_count
Derived Metrics
Calculations combining multiple metrics:
- name: profit_margin
type: derived
type_params:
expr: (revenue - costs) / revenue
metrics:
- name: revenue
- name: costs
Cumulative Metrics
Running totals over time:
- name: trailing_30d_revenue
type: cumulative
type_params:
measure: revenue
window: 30 days
Conversion Metrics
Funnel calculations between events:
- name: purchase_conversion
type: conversion
type_params:
base_measure: visitors
conversion_measure: purchasers
entity: user_id
calculation: conversion_rate
window: 7 days
Query Semantics
MetricFlow's query engine handles the complexity of turning metric requests into correct SQL.
Join Resolution
Given a metric query with dimensions from different semantic models, MetricFlow:
- Identifies which models contain required data
- Determines valid join paths through entities
- Constructs joins that preserve correct aggregation
Fan-Out Prevention
A common analytics pitfall: joining a fact table to a dimension table with multiple matches, causing inflated counts. MetricFlow's entity system prevents this by understanding cardinality and generating correct queries.
Time Spine Handling
Time-series metrics need special handling for missing periods, cumulative calculations, and time zone management. MetricFlow uses a time spine table to ensure correct temporal behavior.
How MetricFlow Generates SQL
When you query a metric:
mf query --metrics revenue --group-by product_category,order_date
MetricFlow:
- Parses the request
- Identifies required semantic models
- Determines join paths
- Generates optimized SQL
- Executes against the warehouse
The generated SQL handles all aggregation, joining, and filtering correctly - even for complex derived metrics across multiple semantic models.
MetricFlow in the dbt Ecosystem
dbt Core Integration
MetricFlow integrates with dbt Core for local development:
- Define semantic models in your dbt project
- Validate definitions with
mf validate-configs - Generate SQL with
mf query - Test metric calculations locally
dbt Cloud Integration
In dbt Cloud, MetricFlow powers production semantic layer features:
- GraphQL API for metric queries
- BI tool connectors
- Access control and governance
- Caching and performance optimization
Strengths of MetricFlow
Open Source Transparency
MetricFlow's Apache 2.0 license provides visibility into how calculations work. You can inspect generated SQL and understand exactly what executes.
Correct Query Semantics
MetricFlow's careful handling of joins and aggregation prevents common calculation errors. Trust comes from knowing the system understands fan-out and cardinality.
dbt Alignment
For dbt users, MetricFlow extends familiar concepts. Semantic models reference dbt models, maintaining the connection between transformation and semantics.
Active Development
dbt Labs continues investing in MetricFlow, with regular releases adding capabilities and improving performance.
Limitations
dbt Dependency
MetricFlow is designed for dbt projects. Using it with non-dbt transformations is possible but not the primary use case.
Learning Curve
Understanding entities, semantic models, and metric types requires investment. The concepts are powerful but not immediately intuitive.
AI Integration
MetricFlow provides excellent structure for AI, but integrating with LLMs requires additional work beyond what MetricFlow provides natively.
The Codd AI Perspective
MetricFlow represents an important evolution in metric computation - moving from scattered definitions to centralized, governed calculation engines. Its query semantics handle complexities that trip up manual SQL.
Codd AI builds on similar principles but extends them for AI-native analytics. Where MetricFlow focuses on correct SQL generation, Codd AI's semantic layer also provides the context that LLMs need to understand metrics conversationally. This includes not just calculations but business context, common usage patterns, and relationships between metrics - enabling natural language analytics that maintains the rigor MetricFlow brings to computation.
Questions
MetricFlow is the query engine that powers dbt Semantic Layer, but they are not identical. MetricFlow handles metric computation - translating metric queries into SQL. dbt Semantic Layer is the broader product that includes MetricFlow plus dbt Cloud integration, APIs, and BI connectors.