MetricFlow Explained: The Open Source Metric Computation Engine

MetricFlow is the open-source metric computation engine powering dbt Semantic Layer. Learn how MetricFlow works, its query semantics, and its role in the modern data stack.

5 min read·

MetricFlow is an open-source metric computation engine that translates high-level metric queries into optimized SQL. Originally developed by Transform Data and later acquired by dbt Labs, MetricFlow now powers the dbt Semantic Layer. Understanding MetricFlow helps data teams grasp how modern semantic layers convert business metric requests into executable database queries.

What MetricFlow Does

At its core, MetricFlow solves a fundamental problem: how do you define a business metric once and compute it correctly across any set of dimensions and filters?

Consider "Revenue by Region." Sounds simple, but:

  • What joins are needed to connect revenue to region?
  • What if some orders lack region data?
  • What if the underlying data has duplicates that could cause fan-out?
  • How should the query be structured for optimal performance?

MetricFlow handles these complexities, ensuring that "Revenue by Region" returns the same correct answer regardless of how the query is constructed.

Core Concepts

Semantic Models

Semantic models describe the data structures MetricFlow operates on:

semantic_models:
  - name: orders
    model: ref('fct_orders')
    defaults:
      agg_time_dimension: order_date

    entities:
      - name: order_id
        type: primary
      - name: customer_id
        type: foreign
      - name: product_id
        type: foreign

    dimensions:
      - name: order_date
        type: time
        type_params:
          time_granularity: day
      - name: order_status
        type: categorical

    measures:
      - name: order_total
        agg: sum
        expr: amount
      - name: order_count
        agg: count
        expr: order_id

Entities

Entities define the keys and relationships between semantic models. They tell MetricFlow how to join data:

  • Primary: The unique identifier for the model
  • Foreign: References to other semantic models
  • Unique: Unique but not the primary key
  • Natural: Business keys like email or SKU

Measures

Measures are the aggregatable values that form the foundation of metrics:

measures:
  - name: revenue
    agg: sum
    expr: order_amount
  - name: order_count
    agg: count_distinct
    expr: order_id

Metrics

Metrics build on measures with additional semantics:

metrics:
  - name: revenue
    type: simple
    type_params:
      measure: revenue

  - name: average_order_value
    type: derived
    type_params:
      expr: revenue / order_count
      metrics:
        - name: revenue
        - name: order_count

  - name: cumulative_revenue
    type: cumulative
    type_params:
      measure: revenue
      window: 1 month

Metric Types

Simple Metrics

Direct exposure of a measure as a metric:

- name: total_orders
  type: simple
  type_params:
    measure: order_count

Derived Metrics

Calculations combining multiple metrics:

- name: profit_margin
  type: derived
  type_params:
    expr: (revenue - costs) / revenue
    metrics:
      - name: revenue
      - name: costs

Cumulative Metrics

Running totals over time:

- name: trailing_30d_revenue
  type: cumulative
  type_params:
    measure: revenue
    window: 30 days

Conversion Metrics

Funnel calculations between events:

- name: purchase_conversion
  type: conversion
  type_params:
    base_measure: visitors
    conversion_measure: purchasers
    entity: user_id
    calculation: conversion_rate
    window: 7 days

Query Semantics

MetricFlow's query engine handles the complexity of turning metric requests into correct SQL.

Join Resolution

Given a metric query with dimensions from different semantic models, MetricFlow:

  1. Identifies which models contain required data
  2. Determines valid join paths through entities
  3. Constructs joins that preserve correct aggregation

Fan-Out Prevention

A common analytics pitfall: joining a fact table to a dimension table with multiple matches, causing inflated counts. MetricFlow's entity system prevents this by understanding cardinality and generating correct queries.

Time Spine Handling

Time-series metrics need special handling for missing periods, cumulative calculations, and time zone management. MetricFlow uses a time spine table to ensure correct temporal behavior.

How MetricFlow Generates SQL

When you query a metric:

mf query --metrics revenue --group-by product_category,order_date

MetricFlow:

  1. Parses the request
  2. Identifies required semantic models
  3. Determines join paths
  4. Generates optimized SQL
  5. Executes against the warehouse

The generated SQL handles all aggregation, joining, and filtering correctly - even for complex derived metrics across multiple semantic models.

MetricFlow in the dbt Ecosystem

dbt Core Integration

MetricFlow integrates with dbt Core for local development:

  • Define semantic models in your dbt project
  • Validate definitions with mf validate-configs
  • Generate SQL with mf query
  • Test metric calculations locally

dbt Cloud Integration

In dbt Cloud, MetricFlow powers production semantic layer features:

  • GraphQL API for metric queries
  • BI tool connectors
  • Access control and governance
  • Caching and performance optimization

Strengths of MetricFlow

Open Source Transparency

MetricFlow's Apache 2.0 license provides visibility into how calculations work. You can inspect generated SQL and understand exactly what executes.

Correct Query Semantics

MetricFlow's careful handling of joins and aggregation prevents common calculation errors. Trust comes from knowing the system understands fan-out and cardinality.

dbt Alignment

For dbt users, MetricFlow extends familiar concepts. Semantic models reference dbt models, maintaining the connection between transformation and semantics.

Active Development

dbt Labs continues investing in MetricFlow, with regular releases adding capabilities and improving performance.

Limitations

dbt Dependency

MetricFlow is designed for dbt projects. Using it with non-dbt transformations is possible but not the primary use case.

Learning Curve

Understanding entities, semantic models, and metric types requires investment. The concepts are powerful but not immediately intuitive.

AI Integration

MetricFlow provides excellent structure for AI, but integrating with LLMs requires additional work beyond what MetricFlow provides natively.

The Codd AI Perspective

MetricFlow represents an important evolution in metric computation - moving from scattered definitions to centralized, governed calculation engines. Its query semantics handle complexities that trip up manual SQL.

Codd AI builds on similar principles but extends them for AI-native analytics. Where MetricFlow focuses on correct SQL generation, Codd AI's semantic layer also provides the context that LLMs need to understand metrics conversationally. This includes not just calculations but business context, common usage patterns, and relationships between metrics - enabling natural language analytics that maintains the rigor MetricFlow brings to computation.

Questions

MetricFlow is the query engine that powers dbt Semantic Layer, but they are not identical. MetricFlow handles metric computation - translating metric queries into SQL. dbt Semantic Layer is the broader product that includes MetricFlow plus dbt Cloud integration, APIs, and BI connectors.

Related