Activity Schema Modeling: Event-Centric Data Modeling for Modern Analytics

Activity schema is a data modeling approach that organizes data around business activities rather than entities. Learn how activity schemas work, when to use them, and how they support AI-powered analytics.

8 min read·

Activity schema is a data modeling approach that organizes analytical data around activities - the actions and events that entities perform over time. Rather than modeling separate fact tables for different business processes, activity schema consolidates all activities into a unified structure optimized for behavioral analysis.

The approach has gained popularity with the rise of product analytics, customer journey analysis, and behavioral segmentation. When the primary analytical questions focus on "what did users do and in what sequence," activity schema provides a simpler, more flexible foundation than traditional dimensional models.

Core Concept: Activities as the Center

In traditional dimensional modeling, different business processes get separate fact tables:

  • Orders fact table for purchases
  • Page views fact table for web activity
  • Support tickets fact table for service interactions
  • Email events fact table for marketing engagement

Each table has its own schema optimized for that process. Cross-process analysis requires joining these tables - often complex and slow.

Activity schema inverts this by creating a single activities table:

CREATE TABLE activities (
  activity_id BIGINT PRIMARY KEY,
  entity_id VARCHAR(100),            -- Customer, user, or account
  activity_type VARCHAR(100),         -- Order, page_view, ticket_opened, email_clicked
  activity_timestamp TIMESTAMP,
  activity_date DATE,

  -- Common dimensions
  channel VARCHAR(50),
  device_type VARCHAR(50),
  geo_country VARCHAR(50),

  -- Flexible attributes
  activity_attributes JSONB,          -- Activity-specific data

  -- Metrics (when applicable)
  revenue_amount DECIMAL(12,2),
  quantity INT
);

Every activity - regardless of type - goes in this table. Behavioral analysis becomes straightforward:

-- User journey: all activities for a customer in sequence
SELECT activity_type, activity_timestamp
FROM activities
WHERE entity_id = 'customer_123'
ORDER BY activity_timestamp;

-- Conversion funnel: activities leading to purchase
SELECT entity_id,
       MAX(CASE WHEN activity_type = 'page_view' THEN activity_timestamp END) as first_view,
       MAX(CASE WHEN activity_type = 'add_to_cart' THEN activity_timestamp END) as cart_add,
       MAX(CASE WHEN activity_type = 'purchase' THEN activity_timestamp END) as purchase_time
FROM activities
WHERE entity_id IN (SELECT entity_id FROM activities WHERE activity_type = 'purchase')
GROUP BY entity_id;

Schema Design Patterns

The Standard Activity Table

The minimal activity schema includes:

ColumnDescription
activity_idUnique identifier for each activity
entity_idThe actor performing the activity (user, customer, account)
activity_typeClassification of the activity
activity_timestampWhen the activity occurred
activity_attributesFlexible storage for type-specific data

Additional common columns might include:

  • Session ID for grouping related activities
  • Channel/source information
  • Geographic attributes
  • Revenue/quantity when applicable

Handling Activity-Specific Attributes

Different activities have different attributes - an order has products and amounts, a page view has URL and duration, an email click has campaign and link. Options for handling this:

JSON columns: Store activity-specific data in a JSON/JSONB column

-- Order activity
{"order_id": "123", "product_ids": ["A", "B"], "total": 99.50}

-- Page view activity
{"url": "/products/widget", "duration_seconds": 45, "referrer": "google"}

Pros: Fully flexible, no schema changes needed Cons: Query performance on JSON, no type safety

Denormalized columns: Include common attributes as columns, NULL when not applicable

CREATE TABLE activities (
  ...
  order_id VARCHAR(50),          -- Only for order activities
  page_url VARCHAR(500),         -- Only for page view activities
  email_campaign_id VARCHAR(50), -- Only for email activities
  ...
);

Pros: Query performance, type safety Cons: Sparse columns, schema changes for new attributes

Linked detail tables: Keep activities lean, link to type-specific tables

CREATE TABLE activity_order_details (
  activity_id BIGINT PRIMARY KEY,
  order_id VARCHAR(50),
  product_count INT,
  total_amount DECIMAL(12,2)
);

Pros: Clean schema, type-specific optimization Cons: Requires joins for complete data

In practice, hybrid approaches work well - common high-value attributes as columns, everything else in JSON.

Entity Resolution

Activity schema requires consistent entity identification. When activities come from different systems:

  • Web analytics tracks anonymous cookie IDs
  • App analytics tracks device IDs
  • CRM tracks customer IDs
  • Support tracks email addresses

An identity resolution layer maps these to a single entity_id:

CREATE TABLE entity_identities (
  entity_id VARCHAR(100),
  identity_type VARCHAR(50),
  identity_value VARCHAR(200),
  confidence_score DECIMAL(3,2),
  PRIMARY KEY (identity_type, identity_value)
);

Activities are joined through this mapping to create unified behavioral views.

Query Patterns

Sequential Analysis

Activity schema excels at sequential queries - what happened before/after specific activities:

-- What do users do after viewing pricing page?
WITH pricing_views AS (
  SELECT entity_id, activity_timestamp as view_time
  FROM activities
  WHERE activity_type = 'page_view'
    AND activity_attributes->>'url' = '/pricing'
)
SELECT a.activity_type, COUNT(*) as occurrences
FROM activities a
JOIN pricing_views p ON a.entity_id = p.entity_id
WHERE a.activity_timestamp > p.view_time
  AND a.activity_timestamp < p.view_time + INTERVAL '1 hour'
GROUP BY a.activity_type
ORDER BY occurrences DESC;

Funnel Analysis

Multi-step conversion funnels are natural in activity schema:

-- Signup funnel completion rates
SELECT
  COUNT(DISTINCT CASE WHEN step >= 1 THEN entity_id END) as visited_signup,
  COUNT(DISTINCT CASE WHEN step >= 2 THEN entity_id END) as started_form,
  COUNT(DISTINCT CASE WHEN step >= 3 THEN entity_id END) as completed_signup
FROM (
  SELECT entity_id,
         MAX(CASE WHEN activity_type = 'page_view' AND activity_attributes->>'url' = '/signup' THEN 1
                  WHEN activity_type = 'form_start' THEN 2
                  WHEN activity_type = 'signup_complete' THEN 3 END) as step
  FROM activities
  WHERE activity_date >= CURRENT_DATE - 30
  GROUP BY entity_id
) funnel;

Cohort Analysis

Grouping entities by shared characteristics and comparing behaviors:

-- Retention by signup cohort
WITH cohorts AS (
  SELECT entity_id, DATE_TRUNC('week', MIN(activity_timestamp)) as cohort_week
  FROM activities
  WHERE activity_type = 'signup_complete'
  GROUP BY entity_id
)
SELECT
  c.cohort_week,
  COUNT(DISTINCT c.entity_id) as cohort_size,
  COUNT(DISTINCT CASE WHEN a.activity_timestamp >= c.cohort_week + INTERVAL '7 days'
                       AND a.activity_timestamp < c.cohort_week + INTERVAL '14 days'
                  THEN a.entity_id END) as week_2_active
FROM cohorts c
LEFT JOIN activities a ON c.entity_id = a.entity_id
GROUP BY c.cohort_week;

Session Analysis

Grouping activities into sessions for engagement analysis:

-- Session duration and depth
WITH session_bounds AS (
  SELECT entity_id,
         activity_timestamp,
         SUM(CASE WHEN activity_timestamp - LAG(activity_timestamp)
                  OVER (PARTITION BY entity_id ORDER BY activity_timestamp)
                  > INTERVAL '30 minutes' THEN 1 ELSE 0 END)
             OVER (PARTITION BY entity_id ORDER BY activity_timestamp) as session_num
  FROM activities
)
SELECT entity_id, session_num,
       MIN(activity_timestamp) as session_start,
       MAX(activity_timestamp) as session_end,
       COUNT(*) as activities_in_session
FROM session_bounds
GROUP BY entity_id, session_num;

AI Analytics Considerations

Activity schema's flat structure is well-suited for AI analytics, but requires proper semantic definition.

Clear Activity Type Taxonomy

AI systems need to understand what each activity type represents:

activity_types:
  - name: page_view
    description: User viewed a page on the website or app
    entity_type: user
    attributes:
      - url: The page URL viewed
      - duration_seconds: Time spent on page

  - name: purchase
    description: User completed a purchase transaction
    entity_type: user
    attributes:
      - order_id: Unique order identifier
      - revenue: Transaction amount in USD

Without this documentation, AI cannot reliably interpret activity_type values.

Behavioral Metric Definitions

Common behavioral metrics should be explicitly defined:

metrics:
  - name: conversion_rate
    description: Percentage of users who purchased after viewing product
    calculation: |
      COUNT(DISTINCT entity_id WHERE activity_type = 'purchase') /
      COUNT(DISTINCT entity_id WHERE activity_type = 'product_view')
    valid_dimensions: [channel, device_type, cohort_week]

AI systems can use these definitions rather than inferring calculations.

Temporal Relationship Clarity

Behavioral analysis often involves temporal relationships:

  • Activities within a session
  • Activities within N days of signup
  • Activities before/after specific events

The semantic layer should define standard temporal contexts:

temporal_contexts:
  - name: first_session
    description: Activities in the user's first session
    definition: Activities within 30 minutes of first activity

  - name: trial_period
    description: Activities during the 14-day trial period
    definition: Activities within 14 days of signup_complete activity

When to Use Activity Schema

Strong fit:

  • Product analytics and user behavior analysis
  • Customer journey mapping
  • Funnel and conversion analysis
  • Behavioral segmentation
  • Engagement and retention metrics

Less suitable:

  • Detailed transactional reporting
  • Financial analysis requiring GAAP compliance
  • Inventory and supply chain operations
  • Complex multi-entity business processes

Activity schema complements rather than replaces dimensional modeling. Organizations often use both - activity schema for behavioral analytics, dimensional models for operational and financial reporting.

Implementation Considerations

Partitioning: Partition by date for query performance - most behavioral queries are time-bounded.

Indexing: Index on (entity_id, activity_timestamp) for journey queries and (activity_type, activity_timestamp) for funnel queries.

Data volume: Activity tables grow quickly. Plan for data lifecycle management - archiving older activities or aggregating to summaries.

Real-time vs. batch: Decide latency requirements. Many activity schemas are near-real-time with streaming ingestion.

Activity schema provides a powerful foundation for behavioral analytics when the primary questions center on what entities did and how those actions relate to outcomes. Combined with proper semantic layer documentation, it enables both human analysts and AI systems to understand user behavior effectively.

Questions

Traditional dimensional modeling centers on fact tables for specific business processes (orders, shipments, support tickets). Activity schema centers on a single activities table capturing all customer or user actions. Rather than joining across fact tables, you query one table with activity type filters. This simplifies behavioral analysis but requires different query patterns.

Related